Snow model comparison to simulate snow depth evolution and sublimation at point scale in the semi-arid Andes of Chile

Abstract. Physically based snow models provide valuable information on snow cover evolution and are therefore key to provide water availability projections. Yet, uncertainties related to snow modelling remain large as a result of differences in the representation of snow physics and meteorological forcing. While many studies focus on evaluating these uncertainties, no snow model comparison has been done in environments where sublimation is the main ablation process. This study evaluates a case study in the semi-arid Andes of Chile and aims to compare two snow models with different complexities, SNOWPACK and SnowModel, at a local point over one snow season and to evaluate their sensitivity relative to parameterisation and forcing. For that purpose, the two models are forced with (i) the most ideal set of input parameters, (ii) an ensemble of different physical parameterisations, and (iii) an ensemble of biased forcing.
Results indicate large uncertainties depending on forcing, the snow roughness length z0, albedo parameterisation, and fresh snow density parameterisation. The uncertainty caused by the forcing is directly related to the bias chosen. Even though the models show significant differences in their physical complexity, the snow model choice is of least importance, as the sensitivity of both models to the forcing data was on the same order of magnitude and highly influenced by the precipitation uncertainties. The sublimation ratio ranges are in agreement for the two models: 36.4 % to 80.7 % for SnowModel and 36.3 % to 86.0 % for SNOWPACK, and are related to the albedo parameterisation and snow roughness length choice for the two models.


equation, coupled with snow models, enables a more complete understanding of snow physical processes and are essential for understanding the interaction between snow cover evolution and climate change.
Physically-based snow models have different complexities in their physical representations, from a single layer approach (e.g. Strasser and Marke, 2010), to more sophisticated multi-layer detailed models representing the evolution of snow mi-25 crostructure and the layering of snow physical properties (e.g. Bartelt and Lehning, 2002;Vionnet et al., 2012), leading to a wide variety of snow models with a wide variety of parameterizations. In a snow model intercomparison study , Etchevers et al. (2004) highlighted the importance of parameterization choice, especially regarding the net longwave and albedo characterisation. After comparing 33 snow models, Rutter et al. (2009) concluded that no universal 'best' model exists and model performance mainly depends on the study site. Furthermore, the Earth System Model -Snow Model Intercomparison Project 30 (ESM-SnowMIP) compared several snow models to improve the models in the context of local-and global scale modelling (Krinner et al., 2018) and indicated scientific and human errors in snow model intercomparisons (Menard et al., 2021), but the study sites did not include semi-arid regions.
In addition to the development of new models, many studies have focused on model improvements offering different parameterizations in a single model (e.g. Douville et al., 1995;Dutra et al., 2010;Essery, 2015). In such frameworks, many parameters 35 need to be calibrated and are often difficult to be set according to local measurements, such as the albedo and aerodynamic roughness length (Brock et al., 2000(Brock et al., , 2006. To address this issue, and to consider and quantify parameter uncertainty propagation in simulated snow depth changes, recent studies have started to use ensemble approaches. Here models are evaluated based on different likely combinations of values of variables such as snow albedo, snow compaction, fresh snow density and liquid water transport (e.g. Essery et al., 2013;Lafaysse et al., 2017;Günther et al., 2019). 40 In addition, forcing data uncertainty has a significant influence on the simulated snow depth changes (e.g. Magnusson et al., 2015;Raleigh et al., 2015;Günther et al., 2019) and needs to be considered in model evaluations. While point scale simulations forced by direct observations generally reduce forcing uncertainties, measurement errors can be considerable due to the complexity of both measuring certain parameters as well as maintaining measurement sites (e.g. for precipitation (MacDonald and Pomeroy, 2007;Smith, 2007;Wolff et al., 2015), sensor inclination (Weiser et al., 2016) or sensor failure). 45 Methods such as stochastic perturbation with random noise (e.g. Charrois et al., 2016) or following a uniform or normally distributed bias with different magnitudes (e.g. Raleigh et al., 2015) can be used to build an ensemble of meteorological forcing and explicitly simulate the impact of forcing uncertainty on the simulated snow depth (e.g. Charrois et al., 2016;Zolles et al., 2019;Günther et al., 2019).
Despite past efforts to improve snow models and quantify uncertainty propagation, the uncertainties regarding snow physics 50 representation and meteorological forcing remains (e.g. Essery et al., 2013;Raleigh et al., 2015;Günther et al., 2019); in particular in regions where sublimation is the main ablation process, due to the lack of snow modelling studies in semi-arid regions (Gascoin et al., 2013;Réveillet et al., 2020;MacDonell et al., 2013a;Mengual Henríquez, 2017).
This study aims to evaluate two physical snow models with different complexities, considering parameterization and forcing uncertainties. We simulate snow depth changes in the semi-arid Andes of Chile using data from an automated weather station. 55 In this region, snow model uncertainty is a key concern as snow melt is an essential water resource for the population (Favier et al., 2009). Despite the importance of snow as water resource, quantifying and understanding the snow cover evolution remains limited and challenging due to i) high sublimation rates related to high levels of incoming solar radiation, cold air temperatures, arid atmosphere, and high wind speeds (e.g. MacDonell et al., 2013a;Réveillet et al., 2020), and ii) shallow snow depths due to very low precipitation amounts (Scaff et al., 2017;Réveillet et al., 2020;Ayala et al., 2017). In previous 60 studies the effect of wind on snow cover pattern distribution has been assessed by Gascoin et al. (2013) and the relative importance of melt versus sublimation has been studied over one catchment by Réveillet et al. (2020), both making use of the physically-based snow model SnowModel (Liston and Elder, 2006b). The study performed by Mengual Henríquez (2017) assessed the snow types in different Chilean regions with SNOWPACK (Bartelt and Lehning, 2002;Lehning et al., 2002a, b). Nevertheless, an accurate assessment of different snow models' sensitivity to parameterization choice or input forcing is 65 currently missing, although it is expected to have a large impact.
In this work, the sensitivity of SnowModel and SNOWPACK, the common snow models previously used in this region, is assessed based on parameterization choices and forcing uncertainty. First, the models are calibrated similarly to allow later comparisons and a most ideal setup for both models is designed to acquire a precipitation dataset that corrects for the underestimation of precipitation. Second, both models are run with different combinations of parameterizations to assess the uncertainty 70 of parameterizations. Subsequently, forcing uncertainty propagation in the snow model is considered by running the models with 1000 sets of perturbed forcing. The combination of sensitivity analysis to model parameterizations and meteorological forcing allows to evaluate and compare the two models.

Study area and data
We assess the sensitivity of both models using data from an AWS over the snow season of 2017. First this study area and the 75 meteorological observations are described, followed by the data preprocessing procedure.

Study area
The study area is located in the La Laguna catchment, in the Chilean Coquimbo region, close to the Argentinian border (Fig. 1a).
To assess the sensitivity of the models to the representation of snow physics and meteorological forcing, we use data from the Tapado Automatic Weather Station (AWS), a permanent meteorological tower since 2009 located close to the terminus of the 80 Tapado Glacier at 30 • S, 69 • W, 4306 m a.s.l. (Fig. 1c). The site shows a complex topographic setting with average (maximum) wind speeds of 4.2 m s −1 (>15 m s −1 ) in 2017 and little precipitation (<200 mm a −1 ) that falls as snow during fewer than 10 events per year. Precipitation events mainly occur during the winter season (>90%) (Rabatel et al., 2011;Réveillet et al., 2020). Therefore the area surrounding the AWS is only covered with snow in austral winter. At this elevation, vegetation is extremly sparse.
This gauge is an unshielded, unheated weighing bucket precipitation gauge filled with anti-freeze liquid and oil to prevent 90 freezing and evaporation respectively. During the snow season, defined as the period with snow on the ground (i.e. between 10 May and 6 November 2017), the station recorded meteorological observations continuously except for the TA and RH, for which gaps have been filled using three nearby AWSs (Fig. 1c, Sect. 2.3). Scientific which passively detects the change in naturally emitted terrestrial gamma radiation from the ground after it passes through snow cover. It provided two independent SWE observations measuring both potassium and thallium gamma rays (Wright, 2011). The uncertainty given by the manufacturer is ±15 mm from 0 to 300 mm and ±15% from 300 to 600 mm, but biases of up to 82 mm w.e. between gamma ray measurements at ∼300 mm w.e. were measured. The manufacturer suggests that the output with the higher count is generally the most reliable, which were the potassium gamma rays measurements 100 (Randall, 2018, personal communication). We display both data sets and estimate an accuracy of ±25 mm for this data set.

Preprocessing of forcing data
The period between 5 May and 30 November 2017 has been covered to model the snow evolution in the austral winter, as this was a season where SWE data was available to validate the models. Since continuous data are required for both snow models, preprocessing was necessary to fill the gaps in the TA and RH data sets (23 June 11:00 to 31 October 10:00 due to sensor 105 failure) and to correct the wind-induced undercatch in the precipitation data. Therefore, TA and RH data were interpolated based on lapse rates from nearby AWSs (Agua Negra (4774 m a.s.l.), Llano de las Liebres (3565 m a.s.l.) and La Laguna (3209 m a.s.l.; Fig. 1c).
For TA, a daily temperature lapse rate (Blandford et al., 2008) was calculated using TA measured at La Laguna and Paso Agua Negra AWSs (1565 m elevation difference) between 2014 and 2017. We fitted a sinusoidal trend over these lapse rates 110 for the four-year period and found daily lapse rates with a maximum of -6.9 • C km −1 in winter and a minimum of -8.0 • C km −1 in summer. These daily lapse rates were subsequently applied to TA observations of Llano de las Liebres AWS which is the only AWS that covers the entire period of missing data in 2017. For RH a similar approach was applied using the lapse rate of the daily dew point temperature between the Paso Agua Negra and La Laguna AWSs and applying it to data measured at the Llano de las Liebres AWS. Dew point temperature was converted to RH following Liston and Elder (2006a). Evaluation 115 of this lapse rate interpolation, based on 1638 overlapping observations at Tapado, shows an uncertainty (i.e. RMSE) of 2.8 • C and 9.97% for TA and RH respectively.
Since the precipitation observations were directly influenced by wind, an undercatch in the precipitation gauge is likely (e.g. MacDonald and Pomeroy, 2007;Smith, 2007;Wolff et al., 2015). As there are different possibilities to correct for this, the assimilation and correction of precipitation data is explained in Sect.  (Bartelt and Lehning, 2002;Lehning et al., 2002a, b). It is a one-dimensional model, but can be implemented in the spatially distributed, three-125 dimensional snow cover and earth surface model Alpine3D (Lehning et al., 2006). SNOWPACK includes the MeteoIO preprocessing library for meteorological data (Bavay and Egger, 2014) which was not used, as we implemented a homogeneous preprocessing approach for both models (see Sect. 2.3). SNOWPACK is a physically-based model which has the ability to simulate snow physical properties (e.g. snowpack temperature, layer thickness, snow microstructure and density) and snow processes (e.g. refreezing, sublimation, melt, evaporation) for multiple layers, which are merged if layers become too thin.

130
Sublimation and evaporation are calculated for the top element of the snowpack and melt is simulated using a water transport bucket scheme. In this bucket scheme, all the liquid water exceeding a threshold water content is transported downward in the snowpack or soil (Wever et al., 2014). An extensive description of the model can be found in Bartelt and Lehning (2002); Lehning et al. (2002a, b).

135
SnowModel is a spatially distributed snowpack evolution modelling system composed of four submodels MicroMet, EnBal, SnowPack and SnowTran3D (Liston and Elder, 2006b). MicroMet is a preprocessing library for meteorological data interpolation, which was not used in this study as we focused on one location only while we implemented a homogeneous preprocessing approach for both models (see Sect. 2.3). EnBal calculates standard surface energy balance exchanges (Liston and Hall, 1995).
SnowModel's SnowPack subroutine is a single or multi-layer (max. six layers) snowpack evolution and runoff model that de-140 scribes snowpack changes in response to precipitation and melt fluxes defined by MicroMet and EnBal (Liston and Hall, 1995;Liston and Elder, 2006b). In SnowModel, the melted snow is redistributed over the remaining snow depth up to a maximum density threshold of 550 kg m −3 . Any additional melt water is added to the runoff. In this study the model was run with the maximum of snow layers (i.e. six layers) to be comparable with the multiple amount of layers in SNOWPACK. Finally, the three-dimensional model SnowTran3D (Liston and Sturm, 1998), which simulates snow erosion and deposition, is not activated 145 in this study; this choice is discussed in Sect. 5.2.

Model setup and sensitivity analysis
To assess the sensitivity of both models to parameterization choice and input uncertainty, we applied a four step approach.
First, we calibrated both models similarly to allow later comparisons (Sect. 3.2.1). Second, we designed the most ideal setup for both models to acquire a precipitation dataset that corrects for the underestimation of precipitation. Third, we varied the

155
Initially, both models were set up using similar parameters to facilitate intercomparison. These parameters were derived from observations or previous studies (Table 2). For example, the soil albedo was set to 0.15, as this is the average observed albedo at the moment when there is no snow. The observed daily albedo is defined as the daily sum of the average hourly S ↑ divided by the daily sum of average hourly S ↓ (Fig. 4e,f). In the absence of roughness length measurements, the roughness length of the bare soil is set to 0.020 m, corresponding to the default roughness length of pebbles and rocks in SnowModel. As surface 160 ground temperature measurements are not available, we set it to -1 • C in both models. -1 • C is the default value in SnowModel and ensures that the first snow does not immediately melt.

Idealised setup
Preliminary results showed simulated SWE and SD to be more than two times lower than the observed SWE. This is caused by an underestimation of the precipitation measurements, as the AWS is placed in a concave area that collects more snow than the 165 Geonor precipitation gauge. This is in correspondence with research by Grünewald and Lehning (2014) on the spatial variability of SD measurements. Therefore, to determine the corrected precipitation input for the models, an idealised setup is designed, making use of all the data that the models allow as forcing. Two approaches are designed to adjust the precipitation data set.
First, it is chosen to assimilate a precipitation data set, which both models perform in different ways. SNOWPACK assimilates the data if SD is given as input. Reflected SW radiation is also given as input, to prevent inaccurate parameterizations of the 170 albedo. The precipitation data set is assimilated with the five possible fresh snow density parameterizations in SNOWPACK.
SnowModel allows the possiblity to assimilate the precipitation when SWE is given, but is not able to cope with reflected SW radiation as input. Therefore, six ensembles are made out of two albedo and three fresh snow density parameterizations to find an assimilated precipitation data set. Second, the precipitation is reconstructed from the SWE observations, which was computed using the cumulative positive SWE (potassium) changes during precipitation events (detected by the Geonor This idealised case corresponds therefore to simulations using the best possible combination of input data. As such ob-185 servations are not always available or used to evaluate models, the idealised simulations are not used for the sensitivity study and model comparisons, which are based on optimal simulations (i.e. without assimilating precipitation and albedo, see Sect. 3.2.3). The simulated SWE and SD are compared to the observed SWE and SD and the assimilated precipitation data sets are shown.The precipitation data set (P cor ) that leads to a simulation with best correspondence to the observed SWE and SD is used in the further study.

Sensitivity analysis of variable parameterizations
To assess the impact of the parameterizations on the snowpack simulation, an ensemble approach based on different combinations of albedo and snow density parameterizations and z 0 was used (e.g. Essery et al., 2013;Lafaysse et al., 2017, and Sect. 3.2.2). The choice to limit the sensitivity tests to these three parameters is discussed in Sect. 5.2. For SnowModel, an ensemble of 12 simulations was run, considering two albedo, three snow density parameterizations and two 200 different z 0 . The albedo parameterizations range between 0.6 and 0.9 depending i) on TA solely (more details in Liston and Hall (1995), Liston and Elder (2006b) and in Sect. S1 (SM)) or ii) on TA and time (Strack et al., 2004, and Sect. S1). SnowModel's default fresh snow density parameterization depends on the wet bulb temperature, but we included two fresh snow density parameterizations from SNOWPACK depending on TA, RH, WS and surface temperature to test the model more extensively.
In these additional parameterizations, we preserved the SnowModel defaults for minimum (50 kg

Forcing uncertainty estimation
To assess the model sensitivity to meteorological measurement uncertainties, a bias has been applied to the meteorological forcing presented in Sect. 2.3 to generate an ensemble of 1000 forcing files. Raleigh et al. (2015) have shown that the model outputs are more sensitive to forcing biases than random errors. Therefore, all input variables except P were modified by 215 Table 3. Forcing data for the snow models with the corresponding uncertainty σ used in the ensemble simulation. The ranges of PA, TA, S ↓ , L ↓ , RH and WS are ranges as used by Raleigh et al. (2015). The WD range is according to the uncertainty given by the manufacturer and the P range is described in Sect. • adding hourly biases with a normal distribution N (µ = 0, σ 2 ) with σ the uncertainty range taken from Raleigh et al. (2015) and reported in Table 3. The biases has been kept within their corresponding range (

Model evaluation
Model evaluation consists of comparing the model output of SD, SWE and albedo with the corresponding observations. For the idealised case this consists of evaluating the RMSE and R 2 between the modelled and the observed SD to acquire precipitation 230 data that approaches the correct mass balance. For the parameterization uncertainty, this consists of evaluating the RMSE and R 2 between the modelled and the observed albedo, to select the best reference for each model (i.e. 40 for SNOWPACK and 12 for SnowModel). In this case, it is chosen to only compare between modelled and observed albedo, as this ensures the best possible net shortwave radiation term in the energy balance equation. The forcing uncertainty is evaluated by comparing the differences of end of snow season. Last, the differences in ablation processes of the parameterizations are shown.

Idealised simulations
The assimilated precipitation datasets markedly differ between SNOWPACK and SnowModel (blue lines, Fig. 3e,f). For clarity Fig. 3 shows only the results of the idealised simulations for the z 0 value of 1 cm; the results for z 0 of 1 cm and 1 mm are displayed in Sect. S2. For SNOWPACK, eight out of ten runs with z 0 =1 cm crashed, thus only two simulations are shown. The 240 reason for these crashes has not been further investigated.
Assimilation of SD in SNOWPACK results in SWE that approximates the PSWE (Fig. 3), leading to assimilated precipitation amounts of 2.55 to 3.02 times the observed precipitation and a good correspondence with the observed SD (i.e. RMSE between 9.2 and 11.5 cm and R 2 between 0.90 and 0.93 calculated with the observed and simulated SD, Fig. 3a).
Assimilation of SWE in SnowModel only adjusts the precipitation between 22 and 27 June and between 7 and 12 August.

245
The amount of precipitation is not adjusted at the beginning of the season and thus, the assimilated data by SnowModel still leads to an underestimation of the SD and SWE (Fig. 3b,d). The missing adjustment of the SWE is probably caused by SnowModel taking a maximum of 99 SWE observations and the observations do not exactly allign with the precipitation events, which leads to correction factors of one (i.e. no change) to the precipitation data. The assimilated precipitation is approx. 1.6 times larger than the observed precipitation and the agreement between modelled and observed SD is better for SNOWPACK 250 than for SnowModel (i.e. RMSE between 7.1 and 17.1 cm and R 2 between 0.19 and 0.90 calculated with the observed and simulated SD, Fig. 3b).
Both models overestimate the SWE between mid-July and September when PSWE was given as input (red lines in Fig. 3c,d).
This is likely caused by an overestimation of the PSWE at the end of June. Only small amounts of precipitation are observed at the precipitation gauge, but the observed SWE distinctly increases probably due to snow drift, as strong winds were also 255 observed. A similar thing occurs at the end of September. The models markedly increase the amount of precipitation (between 97 and 137 mm w.e.) in the assimilation runs, but only small amounts of precipitation and strong winds were observed. This is related to a strong melt in September, not simulated by the model, along with models trying to get the SWE and SD to the observational levels. The strong melt in September is caused by a sudden decrease of the albedo (observations in Fig. 4), as it is likely that the snow got covered with dust after some days with strong wind at the end of September, but the simulations 260 overestimate the melt caused by this albedo decrease. Likewise, high wind speeds and a strong SWE and SD decrease is observed mid-July, which is likely snow erosion and also not considered in our simulations. The overestimation of and the need for SWE as validation data are indications that the PSWE is not a valid precipitation dataset for our simulations, but it is also unfeasible to select one of the assimilated precipitation sets by SNOWPACK as the amount of precipitation markedly increases at the end of September and we want to use SD as validation data.

265
Therefore, three different precipitation corrections depending on WS (Smith, 2007;MacDonald and Pomeroy, 2007) or on TA and WS (Wolff et al., 2015) were applied to the observed precipitation (See Sect. S3). Eq. (12) from Wolff et al. (2015) with WS corrected to gauge height using a logarithmic wind profile (e.g. Lehning et al., 2002a) and a z 0 of 0.01 m is used as precipitation data (P cor ) in the further study, as this correction approaches the PSWE and shows an increase in precipitation of 2.35 times the observed precipitation at the end of the season.

4.2 Sensitivity analysis of parameterizations
Evaluation of simulated SD and SWE based on various parameterizations shows that both models are in good agreement with observations (Fig. 4), although they overestimate SWE at the beginning of the season (May/June). The correction of the precipitation with the equation from Wolff et al. (2015) overestimates the precipitation in this period, and also leads to an overestimation in the simulations.

275
For SNOWPACK, the spread of the simulated SD from the 40 different parameterizations (20 simulations for z 0 = 1 mm and 20 for z 0 = 1 cm) is the largest at the end of the snow season (i.e. October) (Fig. 4a). The date of snow-free surface is the earliest at 8 October and exceeds the simulated period (i.e. after 30 November), depending on parameterization choice, and covers the observed date of snow removal (i.e. 16 October). The different SNOWPACK parameterizations (equations in Sect. S1) show a mean SD difference of 32 cm (which corresponds to 28.9% of the total SD) between the minimum and maximum simulated SD 280 (Fig. 4a), with a maximum of 127 cm observed at 27 June. For the SWE, this corresponds to a mean difference of 98.3 mm w.e.
(i.e 28.2% of the total SWE) (Fig. 4c). The large modelled SD spread in May and June can be explained by the different density parameterization choices as it is not apparent in the SWE simulations (Fig. 4a,c). The rapid decrease (3-8 cm d −1 ) of snow depth until July, caused by compaction of the snowpack, is simulated by the majority of fresh snow density parameterizations, while only one fresh density parameterization models a more moderate compaction (i.e. the bold red line has a moderate slope, 285 compared to the light red lines in Fig. 4a, until July). From July onward, the measured snow depth decreases 10 centimetres per 25 days, which is only simulated by the fresh density parameterization that simulated moderate compaction before July. Snow density measurements were unavailable in 2017 and the observed snow density in Fig. 4g is calculated with SWE /SD. The observed snow density is only shown until the end of August, as the calculation led to unrealistic decreasing snow densities after August. This is likely caused by higher readings at the SD sensor than at the SWE sensor, as those sensors were placed in terms of snow compaction after snowfall events, end of snow season, and albedo evolution (Fig. 4e). Therefore this simu-295 lation with a z 0 of 1 cm is selected as the reference simulation (represented in bold lines in Fig. 4) for the forcing uncertainty simulations.
For SnowModel, the largest SD spread of the 12 ensembles (six for every z 0 , equations in Sect. S1) occurs at the end of the simulated snow season (i.e. August, September and October) with complete snow removal between 21 October and 12 November (i.e. 22 days) (Fig. 4b). The mean SD difference between the parameterizations is 20 cm (i.e. 18% of the total SD), 300 with a maximum of 152 cm at the first snowfall event (12 May), while for SWE the mean difference is 57 mm (i.e. 19.2% of the total SWE) with a maximum of 317 mm w.e. at 12 August (Fig. 4d). Quantitative analysis (Sect. S4) shows best performance scores for the time-evolution albedo approach in combination with the reference snow density parameterization and a z 0 of 1 cm (RMSE of 0.150 (-) and R 2 of 0.600). Therefore, these are used as the reference simulation (bold line in Fig. 4).

305
Comparison of the SNOWPACK and SnowModel output shows similar SD variations attributed to snow density parameterizations that simulate low density snowfall with notable subsequent compaction (Fig. 3g,h). In reality, this happens at Tapado until June, followed by a different regime with denser fresh snow and less compaction. The biggest difference between the models, however, is the result of the albedo parameterizations. Where SnowModel relies on two albedo models based on TA and albedo ranges, SNOWPACK relies on empirical relations calibrated with measurements in Switzerland and not adapted to 310 the arid Tapado climate. Nevertheless, the albedo of the reference run of SNOWPACK performs well in a semi-arid area. Last, the simulations by SnowModel all approximate the end of snow season within a period of 22 days, whereas the simulations at the end of season noticeably diverge for SNOWPACK. This is likely caused by TA being above the freezing point at the end of October, resulting in a fast melt simulated for all ensembles by SnowModel.

Including precipitation uncertainty
The forcing perturbations including precipitation uncertainty shows that precipitation uncertainty has a large impact on SD and SWE ensemble spread (Fig. 5b,d). Averaged over the season this results in SD/SWE biases of 75 cm/257 mm w.e. and 70 cm/262 mm w.e. for SNOWPACK and SnowModel, respectively. Along with the similar average spread over the entire season 330 observed for both models, the range of the simulated day of snow-free surface is also similar; for SnowModel this date ranges between 20 August and 29 November (i.e. 101 days) and the range is similar but a bit later in the season for SNOWPACK (i.e. between 29 August and early December). Again, the main differences are found in the settling of the snowpack (See Sect. 5).

Consequences of the model choice and parameterizations on sublimation
Ablation rates (Fig. 6) show that sublimation is the dominant mode of mass loss in both models until September (i.e. cold 335 period), and followed by melt from September to the end of the season (i.e. end of November, and called the melting period).
Note that for SNOWPACK, the first day of snow-free surface of the reference run is 11 October and for SnowModel 27 October.
For SNOWPACK, the spread of the averaged sublimation rates corresponding to the ensemble runs from the first day of snow to 20 September have a minimum of 1.41 and a maximum of 2.96 mm w.e. d −1 (Fig. 6a). During the cold period, when no melt occurs, the sublimation amounts mainly depend on the z 0 , with sublimation rates ranging between 1.40 and 3.18 mm 340 w.e. d −1 , but this is mainly clustered according to the implemented z 0 . At the end of the season, the total sublimation ranges between 153 and 364 mm w.e. (corresponding to 36.2 to 86.0% of the total ablation, again stronly depending on the z 0 ).
During the melting period, the ensemble runs show a large spread of melt rates ranging between 0.97 to 17.7 mm w.e. d −1 . The total amount of runoff is between 28.9 and 236 mm w.e. for SNOWPACK and this model also simulates evaporation, which contributes between 2.5 and 10.2% of total ablation (Fig. 6a).

345
For SnowModel, sublimation differences between the parameterizations are similar (Fig. 6b) with average sublimation rates from the first day of snow to 20 September ranging between 1.27 to 2.79 mm w.e. d −1 . At the end of the winter season the sublimation totals range between 154 and 342 mm w.e. (which corresponds to 36.4 to 80.7% of the total ablation). The runoff is between 81.7 and 269 mm w.e.. A closer analysis of Fig. 6b shows that SnowModel's output clusters into four groups, where the grouping is determined by the albedo parameterization and z 0 with limited influence of fresh snow density parameterizations.

350
The two lower clusters are linked to the z 0 value of 1 mm. The differences between clusters for different z 0 values increase as the difference in albedo between the parameterizations increase at the end of June.
While the ensemble parameterization simulations do not lead to significant differences in the modelled end date of the snow season (i.e. difference of 22 days), the albedo parameterization and z 0 value directly impact the proportion of sublimation versus melt to the total ablation (Fig. 6b,d). During the cold period, simulations considering the lowest albedo and z 0 of 1 cm (the 355 reference simulation), lead to a higher sublimation rate (Fig. 6b). Indeed, a lower albedo increases the energy absorbed by the snowpack, and as the temperature is below the freezing point, this energy leads to an increase in the sublimation. A higher z 0 enhances this process even more as this leads to a more negative latent heat flux. Second, the increase of net shortwave radiation also affects the physical properties of the snowpack resulting in an increase of compaction (Fig. 4b,h). The snow density of the snowpack is therefore higher, which directly affects the thermal conductivity of the upper snow layers (Yen, 1981). Surface 360 temperature variation is directly linked to the latent heat flux and therefore to sublimation, explaining the different sublimation ratios simulated depending on the albedo parameterizations and z 0 values.
In contrast to SnowModel, the albedo parameterization in SNOWPACK does not affect the sublimation but noticeably influences the melting rate (Fig.6c), which can be attributed to the more complex characteristics of this model. SNOWPACK allows 365 refreezing and evaporation of melting snow within the snowpack, which can lead to a longer melt season, whereas calculated evaporation leads to a lower amount of runoff from melt. Also, SNOWPACK considers a more complex representation of snow physics, such as the grain size and microstructure, which directly impacts the albedo and can help to explain the wide diversity of melt simulations. The representation of turbulent fluxes is an important variable to consider to simulate sublimation and in snow models this is commonly based on the bulk method; the Richardson number is often used, together with the Monin-Obukhov similarity theory, to evaluate the atmosphere stability (e.g. Liston and Hall, 1995;Vionnet et al., 2012). Here, only the Richardson number is used as both models offered this option and the uncertainty associated to the turbulent fluxes parameterization is only 380 considered by implementing two different z 0 values, while it can have major implications in surface energy balance modelling (e.g. Dadic et al., 2013;Conway and Cullen, 2013;Litt et al., 2017;Réveillet et al., 2020). While the stability function cannot be compared between the two models chosen in this study, the sensitivity of SNOWPACK to the six possibilities available in the current version is low (i.e. max SD bias of a few centimeters, results not shown). In a study over Brewster Glacier in New-Zealand, Conway and Cullen (2013) pointed out the importance of the stability functions to properly simulate the heat fluxes with low wind speed and large temperature gradients and also that the modelled latent heat fluxes were unaffected by the choice of exchange coefficient parameterization. The present study takes place in a dry and windy environment without a large temperature gradient and this helps to explain the small differences observed related to the different atmospheric stability functions. The turbulent fluxes parameterization is sensitive to the z 0 value and observations, such as Eddy Covariance measurements, are essential to accurately parameterize the turbulent fluxes (e.g. Conway and Cullen, 2013;Litt et al., 2017;390 Réveillet et al., 2018).
Due to the absence of such measurements, the variability of this value over time (e.g. MacDonell et al., 2013b;Pellicciotti et al., 2005;Nicholson et al., 2016) and due to other values in literature at other locations showing a wide variety (two orders of magnitude) of the snow roughness length (Gromke et al., 2011;Poggi, 1977;Bintanja and Broeke, 1995;Andreas et al., 2005), it was decided to use two different values for z 0 (1 mm and 1 cm). Similar sensitity ranges for SNOWPACK and SnowModel 395 were found (e.g. Fig. 4), along with similar sublimation rates, but this directly depended on the value for z 0 . For both models, a z 0 of 1 cm led to better simulations (Fig. 3, 4), but as applied more often, this can also be seen as optimizing parameter (e.g. Stigter et al., 2018). In future work, the z 0 can be verified with eddy covariance measurements.
The biggest differences between the models are found as the snow settles and therefore depends on the snow density parameterization. The challenge in this study was that the snow settling showed two distinct regimes. From May to mid-June, 400 high compaction rates were found, whereas the compaction afterwards was more moderate. SNOWPACK is able to model both moderate and high compaction, depending on the parameterization chosen, but the mode of compaction remains the same over the season. SnowModel simulates high compaction rates for all parameterizations, which is correct for the start of the season but an overestimation after mid-June. These compaction rates implicate changes in the thermal conductivity of the snowpack and thus changes in the melting. The different snow density parameterizations in SNOWPACK are still developed and im-405 proved (e.g. Keenan et al., 2021), but an improvement of snow density parameterizations in semi-arid regions shows a demand for snow density measurements, as deriving density from SWE/SD measurements is biased over direct density observations using manual measurements (Smith et al., 2017).
Subsequently, the albedo parameterization appears an important parameter to be properly assessed (Fig. 4, 6), as also reported by the studies performed in Alpine regions (Etchevers et al., 2004;Zolles et al., 2019). This can be surprising at first glance as 410 in the semi-arid Andes the ablation is mainly driven by the sublimation and the albedo parameterization is generally crucial to accurately simulate the melt. However, according to the results presented here, the two models agree with the larger sensitivity to the albedo parameterization. The impact of parameterization choice differs for the two models as the uncertainty is directly related to the difference in snow physical representation and the characteristics of the models. Indeed, the range of the ensemble approach simulated by SNOWPACK is higher than that simulated by SnowModel, which is directly related to both higher 415 number of parameterization possibilities for SNOWPACK and more complex physical representation of the processes.
Likewise, results presented here show that the main sensitivity remains in the forcing uncertainty, in agreement with previous studies (Magnusson et al., 2015;Günther et al., 2019;Raleigh et al., 2015;Schlögl et al., 2016). For instance, Magnusson et al. (2015) found that the models of different complexity (temperature-index models vs. physical models) show similar ability to reproduce daily observed snowpack runoff, and concluded that the forcing uncertainties is the greatest factor affecting model 420 performance, rather than model parameterizations. However, as mentioned by Raleigh et al. (2015), simulated SD and SWE are critically sensitive to the relative magnitude of errors in forcing. Raleigh et al. (2015) and Schlögl et al. (2016) also mentioned that precipitation bias (or correction of the undercatch of the precipitation gauge) was the most important factor, in agreement with the findings of our study.
Finally, Rutter et al. (2009)  The sensitivity study of the two models to the forcing is done by adding a bias to the meteorological variables with ranges 430 derived from literature. It was also possible to add random noise to the data, but this does not necessarily preserve the physical consistency and would lead to low sensitivity of the models (results not shown), as random noise counterbalances each other, which has also been investigated by Raleigh et al. (2015). It would also have been possible to apply a random perturbation (e.g. Charrois et al., 2016) using a first-order auto-regressive model (Deodatis and Shinozuka, 1988). However, the forcing bias does not affect the conclusion of the relative comparison of the two models which only requires the exact same forcing 435 as input to be relevant. For the same reason, the choice of method applied for the forcing correction (i.e. for precipitation) and reconstruction (i.e. for the TA and RH) would not affect the conclusions of the model comparison. However, due to the precipitation uncertainty related to measurements errors, and also because the sensor locations may not be representative of the area, different ways to correct the precipitation data were proposed. Günther et al. (2019) and Grünewald and Lehning (2014) already outlined that the snow cover is spatially heterogeneous even at very small scales due to topographic and microclimatic 440 effects on accumulation, redistribution, and ablation processes, introducing an uncertainty in validation data. We also show that in any case, due to i) the question of the sensor location representativity of the area, ii) the precipitation undercatch because of the wind, and, iii) the high sensitivity of models to precipitation uncertainty, this study highlights the complexity and necessity of accurately measuring precipitation. Additionally, possible corrections also depend on the availability of observations, but this study was restricted to not using SWE and SD as forcing, as these parameters were needed as validation data. Therefore, 445 we chose a precipitation correction that overestimates snowfall at the start of the season, but does not capture the increase of SD and SWE in mid-June, resulting in a good agreement between simulated and observed SWE from the beginning of July (e.g. Fig. 4). Testing different albedo parameterizations is chosen as i) different options are possible in both models and ii) previous studies 455 concluded that the largest absolute uncertainties originate from the shortwave radiation and the albedo parameterizations (e.g. Zolles et al., 2019). The sensitivity test to different fresh snow density parameterizations was also chosen as previous studies identified this parameterization as a significant uncertainty in model calibration (e.g. Essery et al., 2013). Finally, energy balance models are known to be sensitive to z 0 , especially in cold and dry regions where sublimation is the main ablation process (e.g. Réveillet et al., 2020). However, due to this important sensitivity and the absence of measurements to properly calibrate 460 this value, two values for z 0 were implemented, but this still might underestimate the possible range of z 0 values.
Otherwise, despite the choice of limiting the parameterization options, SNOWPACK's sensitivity to model parameterizations is evaluated based on 40 simulations, whereas SnowModel's evaluation is based on 12 simulations only. However, the difference of the number of simulations does not impact the conclusion, as the width of the spread of different parameterizations was not quantitatively assessed.

465
Among the possible settings of the model, the snow transport option has not been activated, while the option is available in both models. However, due to the strong wind speed characteristics of the study area (Gascoin et al., 2013, and Fig. 2), snow transport is expected to be considerable. Yet, snow transport estimation remains out of the scope of this study, focused on energy balance comparisons, mainly to assess differences in sublimation rates. Also, in a study performed in the Pascua-Lama catchment, a region to the north of Tapado AWS, Gascoin et al. (2013) highlighted that the inclusion of SnowTran3D does not 470 change the fact that the model is unable to capture the small-scale snow depth spatial variability (as captured by in-situ snow depth sensors). Finally, snow transport in SNOWPACK can only be simulated in the 3D domain with SD as forcing, which then could not be used as validation data. However, due to the importance and complexity in modelling snow transport, properly assessing its impact could be assessed in future work.

Conclusions
Snow models are key to quantify runoff and provide accurate water availability projections. The aim of this study is to compare two snow models, SNOWPACK and SnowModel, and evaluate their sensitivity relative to parameterization and forcing. For that purpose, the two models are run over the 2017 snow season, at local point, and forced with i) the most ideal set of input parameters, ii) an ensemble of different physical parameterizations and iii) an ensemble of biased forcing.

480
The most ideal set of input parameters consisted of observed forcing and the validation parameters (SD and S ↓ for SNOW-PACK; SWE for SnowModel) given as input. Hence, the models were able to assimilate the forced precipitation to correct for undercatch in the precipitation gauge. SNOWPACK is able to approach the observation very well (i.e. min. RMSE of 9.2 cm, max. R 2 of 0.93 calculated with the observed and simulated SD), but SnowModel only adjusts the precipitation at two precipitation events, still leading to undercatch. The final correction of the precipitation data was done with an equation based 485 on TA and WS, as it was unwanted to adjust the precipitation with SNOWPACK's assimilated data, as this assimilated data is built from data, which were required for model evaluation. The parameterization simulations were done considering different parameterizations of the albedo and the fresh snow density and different values for z 0 . Results indicated a significant difference related mainly to the parameterization choice of the albedo and z 0 . However, the impact of the albedo affects the two models differently. For SnowModel, the albedo parameterization has a significant impact on the simulated sublimation during the cold 490 period while SNOWPACK simulates similar sublimation rates for all the possible parameterizations. The choice of albedo parameterization in SNOWPACK has direct consequences on melt at the end of the season. The model differences are mainly related to the model characteristics (e.g. the consideration of the water evaporation and refreezing into the snowpack), and the more complex representation of the snow physics in SNOWPACK. However, the models are both sensitive to the chosen z 0 , leading to sublimation rates ranging from 36% up to 86% of total ablation.

495
In addition, results presented in this study highlight a larger uncertainty depending on the model parameterization (despite the limited number of options chosen) than between the two models (despite the significant differences in their physical complexity).
Otherwise, for both models, results show high levels of uncertainty related to forcings which is directly related to the bias chosen, but the spread of the uncertainty for both models is approximately the same. SNOWPACK and SnowModel are highly 500 influenced by precipitation uncertainties. Both models show similar levels of uncertainty, in modelling the end of the season. .
Our study covers the winter season of 2017, and our conclusions on model sensitivity to various parameterizations are specific to that period. In further studies, simulations could be performed over a larger time period, and at distinct places to complement our results. Furthermore, additional models could be used, in particular snow models with similar physical 505 complexity. Such work would provide additional information of the parameterization sensitivity by allowing a comparison based on a larger choice of possible parameterizations.
Data availability. Part of the data used in this paper (AWS data) can be accessed at https://www.ceazamet.cl. SnowModel can be accessed by contacting the administrator, Glen E. Liston. SNOWPACK is an Open Source model and can be accessed at https://gitlabext.wsl.ch/snowmodels/snowpack. For any other access to the data presented in this study, please contact the authors.

510
Author contributions. AV conducted data preparation, ran the numerical experiments and produced the figures. AV and MR designed the modelling strategy. MR and SM designed the study. All authors contributed to the results analysis and to the preparation of the paper.
Competing interests. The authors declare that they have no conflict of interest.