Comparison of modeled snow properties in Afghanistan, Pakistan, and Tajikistan

Ice and snowmelt feed the Indus River and Amu Darya in western High Mountain Asia, yet there are limited in situ measurements of these resources. Previous work in the region has shown promise using snow water equivalent (SWE) reconstruction, which requires no in situ measurements, but validation has been a problem. However, recently we were provided with daily manual snow depth measurements from Afghanistan, Tajikistan, and Pakistan by the Aga Khan Agency for Habitat (AKAH). To validate SWE reconstruction, at each station, accumulated precipitation and SWE were derived from snow depth using the numerical snow cover model SNOWPACK. High-resolution (500 m) reconstructed SWE estimates from the Parallel Energy Balance Model (ParBal) were then compared to the modeled SWE at the stations. The Alpine3D model was then used to create spatial estimates at 25 km resolution to compare with estimates from other snow models. Additionally, the coupled SNOWPACK and Alpine3D system has the advantage of simulating snow profiles, which provides stability information. The median number of critical layers and percentage of faceted layers across all of the pixels containing the AKAH stations were computed. For SWE at the point scale, the reconstructed estimates showed a bias of − 42 mm (−19 %) at peak SWE. For the coarser spatial SWE estimates, the various models showed a wide range, with reconstruction being on the lower end. A heavily faceted snowpack was observed in both years, but 2018, a dry year, according to most of the models, showed more critical layers that persisted for a longer period.


Introduction
There are many parts of the world where little is known about the snowpack. This lack of knowledge presents a challenge for water managers and for avalanche forecasters. Afghanistan is particularly austere in this respect, as there have been no snow measurements available since the early 1980s. This lack of information about the snowpack potentially creates a humanitarian crisis, as snowmelt-fed streams run dry in the fall without warning (USAID, 2008). Accurate historical estimates of basin-wide snow water equivalent (SWE) are crucial for creating a baseline of climatological conditions, which can then aid in predicting today's SWE. For example, SWE climatology is the most important predictor in machine-learning statistical models for this region (Bair et al., 2018a).
To improve our knowledge about the snowpack in these areas, we have developed an approach that requires no in situ measurements. Using satellite-based estimates of the fractional snow-covered area (fSCA) and downscaled forcings in an energy balance model, we build up the snowpack in reverse, from melt out to its peak, using a technique called SWE reconstruction (Martinec and Rango, 1981). This technique has been shown to accurately estimate SWE in mountain ranges across the world, including the Sierra Nevada, USA Bair et al., 2016); the Rocky Mountains, USA (Jepsen et al., 2012;Molotch, 2009); and the Andes of South America (Cornwell et al., 2016) -all areas with relatively abundant independent ground validation measurements. For the so-called Third Pole of High Published by Copernicus Publications on behalf of the European Geosciences Union. 332 E. H. Bair et al.: Snow properties in Afghanistan, Pakistan, and Tajikistan Mountain Asia, and especially the northwestern parts of this region, e.g., Afghanistan, Tajikistan, and Pakistan, groundbased validation is challenging.

Aga Khan Agency for Habitat (AKAH) stations
In 2017, we received daily manual snow depth and other meteorological measurements from nearly 100 stations (Fig. 1) in an operational avalanche network (Chabot and Kaba, 2016). These stations are funded by the Aga Khan Agency for Habitat (AKAH) and are the first snowpack measurements available, at least that we are aware of, in Afghanistan in nearly 40 years. Hence, we refer to the region as the AKAH study region and the weather stations as the AKAH stations. The AKAH stations contain manual daily snow depth (also called height of snow), height of new (24 h) snow, daily high and low air temperature, instantaneous wind speed and direction, rainfall, and some text fields on weather and avalanche conditions. For mountainous areas, precipitation is the most uncertain term in the water balance (Milly and Dunne, 2002;Adam et al., 2006) because it exhibits high spatial variability and is difficult to measure with traditional gauges. Measuring snow on the ground has many advantages compared to using precipitation gauges, which suffer from undercatch, especially in the windy and treeless areas (Lehning et al., 2002a;Kochendorfer et al., 2017;Goodison et al., 1998) typical of this part of the world. Likewise, a strength of the SWE reconstruction technique is that it does not depend on precipitation measurements to build the snowpack.
Additionally, many of the AKAH stations are at high elevation, with 64 stations above 2500 m and 17 stations above 3000 m. Unfortunately, most of these stations are located in deep valleys, where the villages are, rather than on the mountains above, and the daily resolution is too coarse to use in a snow model without temporal interpolation. Additionally, many of the stations are near glacierized areas, which complicates spatially interpolated snow estimates, as some of the snow is on top of ice. The area covered by glaciers in Fig. 1 is 7.8 %.
Although there have been a large number of studies examining the glaciers of High Mountain Asia, there are fewer studies examining snowfall in High Mountain Asia, which is odd, since hydrologically in this region, snow on land melt provides the vast majority of runoff compared to snow on ice and melting glacier ice (Armstrong et al., 2018). Many of these studies are focused on the region to the east of the AKAH study area, shown in Fig. 1. To our knowledge, there have been no studies on snowpack stratigraphy in the AKAH study area, and we were unable to obtain any snow pit measurements from this area.  (Raup et al., 2007). All of the stations in Afghanistan and Tajikistan are in areas that eventually flow into the Amu Darya. All of the stations in Pakistan are in areas that eventually flow into the Indus River. Imagery courtesy of NASA Worldview.

Literature review
A few studies have specifically examined snowfall in larger regions that include some of the AKAH stations, mostly for stations in the southern basins that flow into the Indus River -that is, all of the stations in Pakistan. The rest of the stations in Afghanistan and Tajikistan are in basins that flow into the Amu Darya. The most comparable study (Shakoor and Ejaz, 2019) examines the Passu catchment in the Hunza River basin, on the right in Fig. 1. As in this study (Sect. 5.1), Shakoor and Ejaz (2019) also use the SNOWPACK and Alpine3D models. Model parameters were calibrated using a single weather station, Urdukas, at 3926 m elevation near the Baltoro Glacier (Ev-K2-CNR, 2014), with 1 year of precipitation measurements, using snow depth for validation. The authors report overestimation of the measured snow depth at the calibration station, even after questionable adjustments to the snow albedo and other model parameters. For example, the snow and ice albedo is given as 0.20 to 0.30 (Table 3, Shakoor and Ejaz, 2019), which would make it 0.10 to 0.20 lower than some of the lowest measured broadband albedo values for dirty snow (Bair et al., 2019b;Skiles and Painter, 2016). They attribute the overestimation to problems with the precipitation measurements, common for high-elevation stations. One problem with the Urdukas station in particular is that the tipping-bucket precipitation gauge is unheated, mak- ing it unusable for measuring solid precipitation. Temperatures at this station were well below freezing for the winter and most of the spring, which explains why no precipitation was recorded from January until sometime in March during 2012, the calibration year. Viste and Sorteberg (2015) study several gridded precipitation products throughout High Mountain Asia, including the Indus Basin. They report that while total precipitation was similar across the products -including MERRA (Rienecker et al., 2011), APHRODITE (Yatagai et al., 2012), TRMM (Huffman et al., 2007), and CRU (Harris et al., 2014) -the total snowfall varied by a factor of 2 to 4. Smith and Bookhagen (2018) used 24 years (1987to 2009) of satellitebased passive microwave SWE estimates to examine trends throughout High Mountain Asia, including the Amu Darya and Indus basins. Their SWE estimates show most 25 km pixels in this region in the 50-100 mm range for December through February, with a few over 100 mm in the Amu Darya basin (i.e., all the AKAH stations in Afghanistan and Tajikistan) and none over 100 mm in the Indus Basin (i.e., all the AKAH stations in Pakistan), likely too low by an order of magnitude for some pixels given our previous reconstructed SWE values and limited climate measurements in Afghanistan (Bair et al., 2018a).
For the AKAH stations in Tajikistan, the most comprehensive snow measurements come from Soviet snow surveys (mostly depth, but with some SWE and density measurements) that have been digitized (Bedford and Tsarev, 2001). Most of these measurements begin in the late 1950s and end around the fall of the Soviet Union, in either 1990 or 1992, making them useful for climatological studies but not for validation of modern satellite-based estimates.
The sole source of accessible snow measurements in Afghanistan was a table of outdated World Meteorological Organization (WMO) monthly climatological data from Kabul (elevation 1791 m) and North Salang (elevation 3366 m), showing the maximum monthly snow depth and the mean number of days with snow ( Table 1 in Bair et al., 2018a). Again, these measurements are not useful for validating more modern snow estimates.
There have been many other studies that have attempted to estimate basin-wide precipitation (including snowfall) for larger areas that include the AKAH region, especially in the Indus. Several climate studies of the Indus have focused on using lower-elevation precipitation gauges, which are then used to spatially interpolate basin-wide precipitation. Dahri et al. (2016) and Dahri et al. (2018) have assembled perhaps the largest collection of climatological measurements covering the AKAH region, mostly based on gauge measurements, as part of a study on the hydrometeorology of the Indus Basin. Using undercatch corrections based on wind, often from reanalysis, they increased precipitation estimates by 21 % on average throughout the Indus Basin (Dahri et al., 2018). For example, in the Gilgit sub-basin, they find an unadjusted precipitation estimate of 582 mm yr −1 , adjusted to 787 mm yr −1 , a 35 % increase. Although some of the measurements are taken from publicly available sources, as with most publications for this region, the comprehensive data used are not publicly accessible.
A similar but less sophisticated approach was used by Lutz et al. (2014), who used a constant increase of 17 % across the APHRODITE precipitation dataset, which covers all of High Mountain Asia. Immerzeel et al. (2015) used glacier mass balance estimates with streamflow measurements as validation to show that high-altitude precipitation in the upper Indus Basin is 2 to 10 times what is shown using gridded precipitation products like APHRODITE. Bookhagen and Burbank (2010) estimate that snowmelt contributes 66 % of annual discharge to the Indus and averages 424 mm across the basin.
In summary, quite a few studies have produced varying precipitation and snowfall estimates for the AKAH region, with no recent in situ snow measurements from Afghanistan or Tajikistan.

Previous work with AKAH snow measurements
Our previous work (Bair et al., 2018a) used a simple density model (Sturm et al., 2010) based on snow climatology (Sturm et al., 1995) and day of year to model SWE from the manual snow depth measurements. The density model itself has −12 % to 26 % bias in predicting SWE. When taking into account geolocational uncertainty of the reconstructed SWE estimates and uncertainty in the density model, errors are on the order of 11 %-13 % mean absolute error (MAE) and −2 % to 4 % bias, depending on the date. However, we only examined 1 year of the AKAH station data (2017), and the high uncertainty in the density model itself begs a more sophisticated approach.
From recent work (Bair et al., 2018b), we have shown that the SNOWPACK (Lehning et al., 2002a, b;Bartelt and Lehning, 2002) model is capable of accurate SWE prediction when supplied only with snow depth for precipitation as well as the other requisite forcings (i.e., radiation, snow albedo, temperatures, and wind speed). Over a 5-year period using hourly in situ measured energy balance forcings and a snow pillow for validation at a high-elevation site in the western US, the SWE modeled by the numerical snow cover model SNOWPACK showed a bias of −17 mm, or 1 % (Bair et al., 2018b). Likewise, the success of the Airborne Snow Observatory  has demonstrated that given accurate snow depth measurements, SWE can be well modeled.

Methods
Our modeling approach consisted of (a) downscaling forcings in the Parallel Energy Balance Model (ParBal) and reconstructing SWE, (b) combining the downscaled forcings for each AKAH station with temporally interpolated man-334 E. H. Bair et al.: Snow properties in Afghanistan, Pakistan, and Tajikistan ual snow measurements, (c) running SNOWPACK for each of the AKAH stations with the downscaled and interpolated measurements from (a) and (b), and (d) running Alpine3D using the output from SNOWPACK, notably the hourly precipitation. In addition to predicting SWE, the SNOWPACK-Alpine3D coupled model also predicts stratigraphic parameters useful for avalanche forecasting, thereby giving us an idea of the layering and stability in this region. For comparison, we also ran the Noah-Multiparameterization (Noah-MP) land surface model over the region with widely used forcings. We also compared spatial estimates of SWE from GLDAS-2. Methods are summarized in Table 1 and explained below, with more detail provided in Appendix A.

SNOWPACK and Alpine3D
SNOWPACK and Alpine3D are freely available (https:// models.slf.ch, last access: 15 May 2019) point and spatially distributed snow models, courtesy of the Swiss Snow and Avalanche Research Institute SLF. SNOWPACK is the older of the two and uses finite elements to model all of the layers in snowpack at a point.
SNOWPACK has shown promising results in both operational (e.g., Lehning et al., 1999;Nishimura et al., 2005) and research applications (e.g., Bellaire et al., 2011;Hirashima et al., 2010). Previous results with SNOWPACK (Bair et al., 2018b) show high model sensitivity to precipitation but only a 1 % error in modeled SWE when using snow depth only (not total precipitation) as a forcing. Thus, given reliable snow depth measurements at each AKAH station (see Sect. 5.5), modeled SWE during the accumulation season is treated as having negligible uncertainty. During the ablation season (after peak SWE), uncertainty is higher. Unlike during snow accumulation events, SNOWPACK does not force its modeled snow ablation to match the measured snow depth decreases. Uncertainty in SWE during the ablation season is then largely dependent on radiative forcings (Marks and Dozier, 1992) and the broadband snow albedo (Bair et al., 2019b). Here, 5 % uncertainty is used, based on the MAE from SWE reconstructions using the same remotely sensed forcings at a continental subalpine site (Bair et al., 2019b). In the same study, a small (3 %) bias in SWE was also found, but this is likely due to shortcomings with the reconstruction method and not applicable to SWE modeled with SNOW-PACK. Thus, the small bias was ignored. We acknowledge that these uncertainty estimates are themselves uncertain; e.g., the reanalysis forcings could be especially poor for this region compared to those available in the western US.
Alpine3D (Lehning et al., 2006) is essentially a spatially distributed version of SNOWPACK with a number of additional modules, including terrain-based radiation modeling, blowing snow, and hydrologic modeling. Integral to Alpine3D is SNOWPACK, which is run for each pixel, as well as the MeteoIO library (Bavay and Egger, 2014), which provides a large number of temporal and spatial interpola-tion functions that can be used on forcings for Alpine3D and SNOWPACK.

The Parallel Energy Balance Model
ParBal was created at UC Santa Barbara and designed for reconstruction of SWE. It is also publicly available (https:// github.com/edwardbair/ParBal/releases/tag/v1.0). Currently, ParBal is designed to use downscaled temperature, pressure, and humidity from version 2 of the Global or National Land Data Assimilation System (GLDAS-2 and NLDAS-2; Xia et al., 2012;Rodell et al., 2004), shortwave and longwave radiation from Edition 4A of the Clouds and the Earth's Radiant Energy System (CERES; Rutan et al., 2015) SYN product, and time-space-smoothed (Dozier et al., 2008;Rittger et al., 2020) snow surface properties from MODIS Snow Covered Area and Grain Size (MODSCAG; Painter et al., 2009) and MODIS Dust and Radiative Forcing in Snow (MODDRFS; Painter et al., 2012). ParBal is run hourly at 500 m spatial resolution, and forcings are adjusted for terrain and elevation. The main output is the residual energy balance term, which is assumed to go into melt when positive during the ablation phase after cold content is overcome (Jepsen et al., 2012). This residual melt term is then summed in reverse during periods of contiguous snow cover and multiplied by the fSCA to spread the snow spatially. The errors in SWE from ParBal are mostly from fSCA and the radiative forcings. Errors and details on ParBal are covered extensively in  and Rittger et al. (2016). In the supplement for Bair et al. (2018a), the errors arising from using GLDAS-2 and CERES 4A (available worldwide but at coarser spatial resolution) vs. NLDAS-2 are specifically evaluated. Using 3 years of basin-wide SWE estimated by the Airborne Snow Observatory in the upper Tuolumne basin, California, USA, the MAE for ParBal was 25 mm, or 26 % (Bair et al., 2018a).

Global Data Assimilation System 2 (GLDAS-2)
For comparison, we also include the SWE estimates from GLDAS-2 (Noah). SWE from GLDAS-2 has been shown to be comparable to estimates from other reanalysis datasets but negatively biased by about 60 % in comparison to higher spatial datasets with assimilation from snow station measurements (Broxton et al., 2016).

Noah-Multiparameterization (MP)
The Noah-MP v3.6 (Niu et al., 2011;Ek et al., 2003) land surface model, forced using MERRA-2 (Gelaro et al., 2017), was used to simulate the hydrologic cycle over the study area and provide SWE estimates for comparison with Par-Bal and the Alpine3D output. Noah-MP was selected due to its detailed representation of the snowpack relative to other land surface models. The model subdivides the snowpack into up to three layers with associated liquid water storage and melt and refreeze capability (Yang and Niu, 2003;   . It incorporates the exchange of heat and moisture through the snowpack between the land surface and the atmosphere. In a model intercomparison study using a 2 km spatial resolution regional climate model for forcings, Chen et al. (2014) show that Noah-MP modeled peak SWE at SNOTEL sites in Colorado, USA, with a −7 % bias.

Use of AKAH station measurements
We modeled daily SWE at the AKAH stations during the 2017 and 2018 water years (WYs) primarily using the manually measured height of snow (HS), also called snow depth, combined with our downscaled energy balance parameters (for downscaling methodology, see Rittger et al., 2016;Bair et al., 2016Bair et al., , 2018a. To our knowledge, no quality control was performed on the AKAH station measurements before we received them. We choose the manual HS and new (24 h) snow (HN) as the only variables to use from the AKAH stations. The HS appeared to be the most reliably measured, as that only requires reading a value from a master snow depth stake. Apart from spurious drops or missing values (see below), the HS measurement appeared consistent and believable at most of the stations, implying an accurate snow depth record. The HN was used to correct a data entry problem in 2017 that we discuss below. The reliability of the other measurements (instantaneous wind speed and direction, maximum and minimum temperature, and rainfall) was questionable. For example, we were not provided with sensor or measurement metadata, e.g., sensor make and model, mea-surement height, and whether or not the temperature sensor was shielded from shortwave radiation. These other measurements taken daily were also of limited value for interpolation to hourly values (see item 3 below). Thus, these other measurements were not used.
The AKAH dataset had a number of shortcomings that we list here along with how we addressed them.
1. Some of the stations recorded no snow at all, especially in the dry 2018 year, or had obvious problems, such as weeks of missing measurements, so they were excluded. For 2017, 52 (54 %) of stations were used. For 2018, 41 (46 %) stations were used.
2. There were spurious drops in the HS measurements. The drops were clearly cases of missing values being filled with zeros. These measurements were manually flagged and converted to null values for interpolation (see below).
terpolation from daily to hourly measurements such that we assumed that all the new snow fell in a 24 h period. The other case was for days where the linear interpolation would yield a value below the minimum threshold hard-coded into SNOWPACK (0.5 cm h −1 ) for the first accumulating snowfall on bare ground. In this case, a previous neighbor interpolation was used in such a way that the entire snowfall occurred in the last hr prior to the next day's measurement.
4. We found the AKAH stations suitable for snow on the ground measurements but not for rainfall or total (solid plus liquid) precipitation. This was only an issue for the Alpine3D snow modeling, as snow measurements were being extrapolated to higher elevations than the AKAH stations (Sect. 6.2); thus at these higher elevations, snow accumulated earlier and melted later than at the lower AKAH stations.
Given the near-total lack of canopy cover in the region, we suspected substantial undercatch from rain gauges. Using the wind speed, an undercatch correction would have been possible given more information on the gauges (e.g., orifice opening diameter and whether or not a shield was present); however these instrument metadata were not available to us. Likewise, we did not know if the gauges were heated or not.
Further, the time period for recording measurements from the stations was not consistent. In WY 2017, measurements began being reported on 10 November 2016 and were reported until 24 November 2017. However, in WY 2018, measurements were not reported until 1 December 2017, and no station measurements were reported past 1 April 2017. The reporting period likely covered all the snowfall events but not all the precipitation events.
To address the rainfall measurement and reporting issues, we used GLDAS Noah v2.1 (Rodell et al., 2004) rainfall and snowfall from the nearest grid cell (1/4 • spatial by 3 h temporal resolution) to fill in precipitation prior to the first measurements in each water year and after 1 April for both water years. We did not account for rain from 10 November 2016 to 1 April 2017 and from 1 December 2017 to 1 April 2018; instead we relied on the modeled precipitation from SNOW-PACK using snow depth. The AKAH station observations show that rain during this time period was rare. 5. A database problem prevented snow heights >100 cm from being entered into the database for a few days in 2017. This problem became apparent during February 2017, when the Nuristan avalanches took place (United Nations, 2017), as that is the first time that most stations recorded values >100 cm. Values were shown as 100 cm on multiple days, followed by values >100 cm.
To address this issue, we flagged all the values equal to 100 cm prior to peak snow depth in 2017 and then marked those as null values. We then filled those null values using the cumulative sum of new snow during that time.

Analysis of modeled snow profiles
For holistic measures of the snow profiles modeled in Alpine3D, we used two metrics from Bellaire et al. (2018): (1) the fraction of facets and (2) Table 1). In each profile, six different variables (grain size, difference in grain size, hardness, difference in hardness, grain type, and depth) in the top meter of the snowpack (from the surface) were checked against threshold values. Layers exceeding five or more thresholds were classified as critical.
The fraction-of-facets metric does not have a validation study, but faceted layers are a weak crystal form and are responsible for 43 %  to 67 % (Schweizer and Jamieson, 2001) of investigated avalanches. Layers classified as critical using the threshold sum approach above corresponded to failure layers about half of the time to failure layers found with compression tests (Monti et al., 2014), an in situ snowpack stability test (van Herwijnen and Jamieson, 1999).

Spatial scale for comparisons
Because ParBal is the only model run at 500 m spatial resolution and all the other models were run at ∼ 25 km, it is the only model appropriate for point comparisons, although point-to-area problems are still an issue. To address the geolocational uncertainty for the gridded MODIS products, which can be up to one ∼ 500 m pixel (Tan et al., 2006;Xiaoxiong et al., 2005) and spatial variability of the snow, we used a 9-pixel neighborhood centered on each AKAH station and chose the best fit to the SNOWPACK-modeled SWE. This approach has been used in previous work (Bair et al., 2018a;Rittger et al., 2016). We also include the high and low SWE values in that surrounding 9-pixel neighborhood to bound the uncertainty.
For all of the other model comparisons, we resampled all of the model output to a UTM (Zone 43S) grid with 25 km pixels, close to the native resolution of the Noah-MP and GLDAS2 grid used (0.25 • ). This yielded a study area of 105 625 km 2 (13 pixels ×13 pixels, each 25 km 2 in area). The ParBal output had to be significantly upscaled from 500 m to 25 km using Gaussian pyramid reduction (Burt and Adelson, 1983) in steps, with bilinear interpolation for the final step.

Results and discussion
The relationships between the components are summarized in Fig. 2. The results discussed below are comparisons of (1) SWE and (2) snow stratigraphy across (a) all of the AKAH stations (points) and (b) the entire study region.

Point comparisons between SNOWPACK and reconstructed SWE
A first step for any SWE reconstruction comparison is to determine when the ablation season starts. This varies for different years and at different sites (e.g., Margulis et al., 2016). Using the SNOWPACK-modeled SWE, we can examine the peak SWE dates for both years for all of the AKAH stations (Fig. 3a, b). Peak SWE dates vary across the stations and years, but the median values between years are a week apart, 19 February 2017 and 26 February 2018. Thus, we use those dates for our comparisons.
To create a holistic comparison for all the stations across the ablation period, mean SWE values were computed and modeled at all of the AKAH stations using SNOWPACK (blue lines) compared to reconstructed SWE from ParBal using a best-of-9-pixel approach (red lines). Also plotted is the median peak SWE date. The highlow (hi/lo) bounds (filled areas) represent uncertainty. For ParBal, uncertainty is expressed as the range of values in the 9-pixel neighborhood. For SNOWPACK, uncertainty is 5 % of the modeled SWE during the ablation season. See Sect. 5.1 and 5.2 for details. The modeled SWE values end abruptly on 1 April 2018 because the AKAH stations stopped reporting due to drought conditions. The number of stations used is the same as in Fig. 3. plotted for each day during the ablation season (Fig. 4). For the reconstructed SWE on 19 February 2017, the bias is −77 mm (−28 %). For the reconstructed SWE on 26 February 2018, the bias is −6 mm (−9 %). Thus, together these biases average to −42 mm (−19 %). The high-low values in the 9-pixel neighborhood show the wide spatial variation in SWE estimates and are to be expected in these deep valley sites (Sect. 6.2). The increases in reconstructed SWE during the ablation season are caused by differences in how melt is summed for any given pixel. In ParBal, melt is only summed during periods of contiguous snow cover. This means that if a pixel containing an AKAH station has no snow on it at some point during the ablation season, but then snow is detected, it causes an increase in the mean SWE. This is called an ephemeral snow event, i.e., snow that disappears and reappears. For a more in-depth examination of the error at individual stations, a box plot is shown for the median peak SWE dates for both years (Fig. 5). The median bias of the reconstructed SWE is −11 mm (−14 %).

Four model spatial comparisons
The AKAH stations are lower than the average elevation for the region. The average elevation of the AKAH stations is 2619 m (1735 to 3410 m). But when the 500 m DEM (digital elevation model) is upscaled to 25 km, the average elevation www.the-cryosphere.net/14/331/2020/ The Cryosphere, 14, 331-347, 2020 The range between models is striking. Noah-MP has the highest peaks (562 mm in 2017 and 331 mm in 2018) but is among the first to melt out. The reconstructed SWE from ParBal only shows minor variation between the 2017 peak (240 mm) and the 2018 peak (206 mm). ParBal and GLDAS-2 melt snow out latest in both years. This is especially true for ParBal in 2017, where the Supplement video shows that ParBal has snow cover over more pixels that persists for longer into the melt season but is lower in SWE than the other models. The Alpine3D model shows the second highest peak SWE in 2017 (469 mm) but the lowest peak (165 mm) in 2018. The comparatively higher values from Noah-MP could result from relatively high precipitation estimates from its MERRA2 precipitation forcings. Similarly, Viste and Sorteberg (2015) report that MERRA (version 1) showed higher snowfall in the Indus Basin than any other reanalysis or observation-based forcing dataset.
Since Alpine3D relies heavily on extrapolation of SWE, we suggest that its mean SWE values plotted in Fig. 6 could have higher uncertainty than some of the other models. For example, the Alpine3D pixels seem to melt out early compared to the other models, especially ParBal, which is the only model that relies on satellite-based estimates of fSCA (see Supplement video). Thus, Alpine3D may computing too little SWE in cold, high-elevation areas that melt slowly. These problems are all indicative of stations that are located in valley bottoms and that only cover the lowest elevations across these 25 km pixels. Figure 6. Time series of mean SWE for four snow models across the study area (13 pixels ×13 pixels, each 25 km 2 ) shown in Fig. 1  for 2017 (a) and 2018 (b). The reconstructed SWE from ParBal (yellow) goes back to 4 May, the median peak SWE date for both years, since reconstruction is only valid during the ablation season. The ParBal results are confounding given that the agreement between the modeled SWE from ParBal and SNOW-PACK at individual AKAH stations (Fig. 4a, b) is much better for both 2017 and 2018.
For insight into potential biases in the modeled spatial SWE from ParBal, we carefully studied the snow-covered area (SCA; not just for 2017 and 2018 but since 2001), the potential melt (i.e., the melt if a pixel were 100 % snow covered), and the melt from glacierized areas (light blue in Fig. 1). We did not find any errors in the model, its parameters, or its forcings. Thus, it is possible that the ParBal SWE is biased low in 2017 for reasons that we could not discern or that the other models are biased high. To note is that the 2017 and 2018 SCA ( Fig. 8; purple and orange) is very similar for both years during the ablation period, especially at the end of the ablation season.
Since pixels do not contribute uniformly to melt, SCA alone cannot be used to predict SWE, but in general years with less snow they have lower SCA values towards the end of the ablation season. Figure 8 shows that 2017 and 2018 were similar in terms of SCA from April to melt out. Thus, the large difference between 2017 and 2018 for the AKAH station SWE, but small differences in SCA and spatially averaged reconstructed SWE suggest that 2017 may have been a larger snow year at the lower elevations where the AKAH stations are but similar to 2018 at the higher elevations.

Stratigraphy and stability
The simulated snow profiles from the AKAH stations (Fig. 9a, b) and the 25 km pixels containing the AKAH stations (Fig. 10a, b) show very different snowpacks. Because of the induced increase in elevation from scaling (e.g., from an average of 2619 to 3858 m; Sect. 6.2), the 25 km pixels show a deeper but more faceted snowpack with critical layers that persist for a month or longer. In 2017, for the median AKAH station values, the snowpack reaches a maximum of 76 % facets on 21 January (Fig. 9a). In 2018, the snowpack reaches a maximum of 71 % facets (Fig. 9b). There were no critical layers simulated. In contrast, for the median values in the 25 km pixels for both years, the height of snow (HS) is approximately 2 times that for the stations (Fig. 10a, b). The snowpack reaches a maximum of 94 % facets in 2017, with one critical layer persisting for 35 d (Fig. 10a). The snowpack in 2018 reaches 95 % facets, with one or two critical layers persisting for 80 d (Fig. 10b). During the Nuris- tan avalanches on 4 to 7 February 2017 that killed over 100 people (United Nations, 2017), the AKAH stations show the largest 3 d snowfall of the study period (Fig. 9a), and the results for the 25 km pixels show large snowfall occurring on top of the only critical layer of the season (Fig. 9b). That is a classic avalanche scenario, i.e., a large snowfall on a weak snowpack.
In lieu of any type of snow profile from this region, these profiles paint the best picture of the snow conditions available. A relatively stable snowpack seems to be present in the valleys, where the AKAH stations are located. But at the higher elevations, the simulated profiles show a more critical snowpack. This is especially serious, considering that these villages are in the run-out zones of these potentially unstable snowpacks. In some cases, several thousand meters of vertical relief loom above the villages. For example, Yarkhun Lasht (36.795 • N 73.022 • E; elevation of 3249 m) in Pakistan is flanked by 6500 m peaks on both sides of its valley.

Conclusions
Knowledge of the snowpack in northwestern High Mountain Asia is poor. This area is subject to droughts and threatened by snow avalanches. Both problems can be aided by improved knowledge of the snowpack. Thanks to a novel operational avalanche observation network, there are now daily snow measurements at a number of operational weather stations in this austere region. In this study, 2 years of daily snow depth measurements from these stations were combined with downscaled reanalysis and remotely sensed measurements to force a point and spatially distributed snow model. Compared to a previous effort (Bair et al., 2018a), this study represents a substantial improvement in SWE modeling for the region and a first attempt to characterize regionwide snow stratigraphy. At the point scale, SWE estimates from a reconstruction technique that does not use precipitation or in situ measurements compared favorably. At the regional scale, four models showed a wide spread in both peak SWE and melt timing. For the models that rely on in situ precipitation measurements, a major challenge is spatial extrapolation, as many of the stations are located in deep valleys. Adding measurements from the mountains above would facilitate more realistic lapse rates, but these measurements do not currently exist, although they would be beneficial both for operational avalanche safety and for scientific studies. In the regional comparison, SWE estimates from ParBal were on the low end, but given the model spread it is difficult to form a consensus estimate. We plan additional in situ validation at other sites in High Mountain Asia to continue to assess the performance of ParBal there.
The simulated profiles showed very different snowpacks. At the point scale at lower elevations in the valleys, profiles showed fewer facets and almost no critical layers, while at the regional scale for higher elevations, the profiles showed heavily faceted snowpacks with critical layers that persisted throughout the winter and spring. ParBal was configured and forced as described in Bair et al. (2018a) and Bair et al. (2016). The model time step was 1 h. The DEM used was the ASTER GDEM version 2 at 1 arcsec (NASA JPL, 2011), while the canopy type and fraction were taken from the Global Land Survey at 30 m (USGS, 2009). The shortwave and longwave forcings were downscaled from the CERES SYN Edition 4A 1 • by 1 h product (Rutan et al., 2015), while the air temperature, specific humidity, air pressure, and wind speeds were downscaled from the GLDAS Noah version 2.1 0.25 • by 3 h product (Cosgrove et al., 2003). Time-space-smoothed (Dozier et al., 2008;Rittger et al., 2020) fSCA and grain size from MOD-SCAG (Painter et al., 2009) was combined with the visible albedo degradation from dust in MODDRFS (Painter et al., 2012) to produce snow hourly snow albedo.

A2 Noah-MP
Noah-MP v3.6 was run in retrospective mode within the NASA Land Information System (LIS) framework. A state vector ensemble (total 30 replicates) was generated by perturbing the forcings to account for the state uncertainty during forward propagation of the model. MERRA-2 (Gelaro et al., 2017) forcings were utilized with bilinear spatial and linear temporal interpolation. Hourly forcings were computed by combining temporally interpolated snow depth from the AKAH manual measurements with air temperature, incoming shortwave, reflected shortwave, incoming longwave, wind speed, and relative humidity from the downscaled ParBal outputs, as described in Sect. 5.2. SNOWPACK was only run for periods when measurements from the AKAH stations were available in November-December to April-May, depending on the year. Plots were assumed to be level, so forcings without terrain correction were applied, except for shading, when the sun was below the local horizon, e.g., a mountain was blocking the sun (Dozier and Frew, 1990). The wind direction, which is not available in GLDAS-2, was fixed at the mean value from the daily AKAH instantaneous values. The ground temperature was set as the minimum of the air temperature or −1.5 • C when snow cover was present.
Aside from setting required parameters and values for inputs and outputs, changes to default parameters that affected model output are provided in Table A2.

A4 Alpine3D
Alpine3D version 3.10 was run using with the outputs produced by SNOWPACK as forcings for each of the AKAH stations at 25 km resolution. The DEM and land cover (incorrectly labeled land use in the Alpine3D documentation) data were upscaled from the ParBal data. Alpine3D was run at an hourly time step using hourly forcings, with daily outputs using the "enable-eb" switch. Other switches were set to off, the defaults. The "enable-eb" switch computes the terrain radiation with shading and terrain reflections (see Alpine3D documentation at https://models.slf.ch (last access: 15 May 20019) for a description).
To extend the length of the model runs, for each AKAH station, GLDAS-2 precipitation was appended to periods prior to the first AKAH observation for the year and after the last, as described in Sect. 5.5.
The forcings were hourly for the following measurements: incoming shortwave, incoming longwave, air temperature, relative humidity, wind speed, wind direction, reflected shortwave, accumulated precipitation, and ground temperature.
Critical to Alpine3D are the interpolation methods from MeteoIO to spatially distribute precipitation and other forcings. We found the modeled SWE to be highly dependent on the spatial interpolation of precipitation. Our initial approach was to explore local (i.e., with a given radius from a station) and regional (i.e., all AKAH stations) lapse rates in the measured snow depth and modeled precipitation from SNOWPACK. We found almost no correlation in many of the measurements, which was not surprising given the complexity of the terrain and likely existence of microclimates with substantial influence on precipitation. Without having a good validation source for spatial precipitation (as is the case for all of High Mountain Asia), we selected an interpolation method that yielded relatively smooth results but showed increases in precipitation with elevation.
Ultimately, we decided to use an inverse distance weighting scheme with elevation detrending (IDW_LAPSE) and a multilinear option. For this method, the input data are detrended, and then the residuals are spatially interpolated according to an inverse distance weighting scheme. The detrending uses a multiple linear regression with northing, easting, and altitude. The linear regression has an iterative method for removing outliers. Finally, values at each cell are retrended using the multiple linear regression and added to the interpolated residuals.  A summary of the interpolation methods, all of which are defined in the MeteoIO documentation (Bavay and Egger, 2014), is given in Table A3.
Unfortunately, the AKAH measurements are not publicly available due to security concerns. Requests for the dataset should be made through the Aga Khan Agency for Habitat (https://www.akdn. org, last access: May 2019).
Author contributions. DC provided the AKAH dataset. JA ran the Noah-MP simulations. KR prepared the snow surface properties dataset. EB processed the data and prepared the paper.