Snow-cover Reconstruction Methodology for Mountainous Regions Based on Historic in Situ Observations and Recent Remote Sensing Data

Spatially distributed snow-cover extent can be derived from remote sensing data with good accuracy. However , such data are available for recent decades only, after satellite missions with proper snow detection capabilities were launched. Yet, longer time series of snow-cover area are usually required, e.g., for hydrological model calibration or water availability assessment in the past. We present a methodology to reconstruct historical snow coverage using recently available remote sensing data and long-term point observations of snow depth from existing meteorological stations. The methodology is mainly based on correlations between station records and spatial snow-cover patterns. Additionally , topography and temporal persistence of snow patterns are taken into account. The methodology was applied to the Zerafshan River basin in Central Asia – a very data-sparse region. Reconstructed snow cover was cross validated against independent remote sensing data and shows an accuracy of about 85 %. The methodology can be used in mountainous regions to overcome the data gap for earlier decades when the availability of remote sensing snow-cover data was strongly limited.

1 Introduction 24 Water resources from remote mountain catchments play a crucial role for the development of 25 regions in or in the vicinity of mountain ranges (Pellicciotti et al., 2011). Seasonal snow is an 26 important water resource in many of Earth's semiarid regions (Durand et al., 2008). Spectroradiometer) are available for recent decades only. The derivation of snow cover from 10 Landsat and Advanced Very High Resolution Radiometer (AVHRR) data which are available 11 for longer periods (Landsat launch 1972, AVHRR launch 1978) is strongly limited by the 12 presence of clouds and atmospheric pollutants. Although recently the reconstruction of snow 13 cover time series from AVHRR data for Central Asia has been reported by Zhou et al. (2013).
14 However, they are also limited in time starting in 1986 at earliest. 15 Several studies used remotely sensed snow cover either as input to hydrological models (Tekeli  21 In Central Asia, continuous hydro-meteorological records are widely available from 1960s and 22 earlier until the collapse of the Soviet Union in 1991. In contrast, continuous remote sensing 23 snow cover data from MODIS are readily available after 2000, when station data are very 24 scarce. We present a methodology which enables reconstructing historical snow cover pattern 25 using long-term, point-based observations from existing meteorological stations and recent 26 remotely sensed snow cover data. By merging high-resolution spatial satellite data with long-27 term station data, snow cover patterns can be reconstructed for several decades into the past.

28
Only a limited number of studies on snow cover reconstruction have been conducted in the past 29 that use long term station observations and recent remote sensing data (Robinson, 1991, Brown,   regression analysis between snow characteristics and snow cover area derived from AVHRR 2 satellite observations. As snow characteristics both studies used snow cover duration derived 3 from interpolated station records. Another study by Brown (1999) conducted reconstruction of 4 snow cover for "pre-satellite era" interpolating snow depth data from station network. For grid 5 cells of nearly 200 km, the interpolation of snow cover was done using different thresholds for 6 snow depth and compared against NOAA SCE (Snow Cover Extent) during "satellite era". The 7 calibration showed 2 cm to be most appropriate snow depth for accurate snow cover 8 reconstruction based on station data. These studies can be helpful in assessing climate related 9 variations of snow cover but are hardly transferable to smaller catchment scale with moderate 10 resolution and limited station data availability.

11
Different to those studies mentioned above, we present a methodology for snow cover 12 reconstruction 1) with moderate spatial resolution (500 m), 2) suitable for catchment scale 13 hydrological studies, 3) accounting topography and 4) delivering spatially distributed snow 14 cover maps. The methodology takes advantage of the strong control of topography on the spatial 15 snow cover distribution. Hence, measurements from snow observation stations at different 16 elevations can be interpreted as representative sites to predict snow cover patterns. The 17 methodology consists of five successive steps which make use of topographic information and 18 correlations between station records and spatial snow cover patterns. In order to test the 19 presented methodology, snow cover reconstruction was conducted for four days (Table 1) for 20 which independent Landsat data were available. The methodology for snow cover reconstruction was developed and tested for the area 24 containing the Upper Zerafshan River basin, Central Asia (Fig. 1). We used (1) daily in-situ snow depth data, (2) daily MODIS snow cover data, (3) a digital 8 elevation model (DEM), and (4) Landsat data. The first three datasets were used for snow cover 9 reconstruction whereas Landsat data was used as an independent dataset to validate the results.

3.1
In-situ snow depth data 12 Daily snow depth data in the period from 1964 to 2012 were available for seven climate stations 13 located at different elevations ( Fig. 1    The methodology presented hereafter is based on similarity of different locations in terms of 2 presence or absence of snow at a given time. The idea is to use the information about the 3 presence of snow from one location in order to estimate the presence of snow at another 4 location. The similarity between different locations was assessed using both observed snow 5 cover at meteorological stations (i.e. records of snow depth > 0) and MODIS snow cover data. 6 This similarity was quantified using the concept of conditional probability. The conditional 7 probability gives the probability of one event (e.g. snow cover at a pixel) to occur given that 8 another event (e.g. snow depth>0 at station) has already occurred. Additionally to the similarity 9 between two locations, temporally persistent monthly snow cover patterns and elevation based 1. Pixel to station CP fields 17 2. Temporally persistent monthly probability fields 18 3. Pixel to pixel CP fields 19 4. Usage of elevation information 20 5. Pixel to station CP for CP <1 21 22 4.1 Pixel to station conditional probability 23 In the first step, we consider the CP of each pixel, given the observed data from a set of snow 24 stations. We compute the CP of each pixel as follows:

12
CPs were computed for each MODIS pixel in the study area (total of 169 776 pixels) using over 13 12 years of available MODIS data and observed snow depth measurements. Hence, the daily 14 snow cover maps from MODIS are treated as snow observation for each 500 m grid cell, giving 15 rise to a very dense "observation network". An example for a CP map for snow and land 16 conditions for Chimgan station is given in Fig. 4. In total, 14 maps were derived (two maps for 17 every of the 7 stations: one for ( , | ) and one for ( , | )). 18 The number of pixels with ( , | ) = 1 ( ( , | ) = 1) varies from station to station. 19 The higher the number of pixels with ( , | ) = 1, the higher is the predictive power of 20 the station for snow classification. Similarly, the higher the number of pixels with 21 ( , | ) = 1, the higher is the predictive power of the station to predict snow-free 22 conditions. In order to quantify the predictive power of each station, we introduce two terms:  shows snow-free conditions, 9.0 % of the study area can be assigned as snow free.
7 Figure 4 shows that ( , | ) = 1 occurs mainly at high elevations (cf. Fig. 1), whilst 8 ( , | ) = 1 occurs mainly at low elevations. This is not surprising since pixels with an 9 elevation far higher than the elevation of Chimgan station tend to be snow covered if Chimgan 10 station records positive snow depth, whilst pixels with an elevation far below the elevation of 11 Chimgan station tend to be snow free if Chimgan station records snow depth of zero. can be explained with the high elevation of the station. When the station indicates snow free 17 conditions, pixels with significantly lower elevation are likely to be snow-free as well. 18 Assuming that the dependencies remain stable in time, the computed CPs of each pixel can be 19 used to classify individual pixels for any arbitrary day prior to the availability of MODIS data 20 (before 2000) for which station records are available: This step leads to a partially reconstructed snow cover map which is further enhanced in the 24 next steps. or snow free with high confidence. The spatial distributions of such temporally persistent 1 patterns can be identified using the available MODIS snow cover data in the period 2000-2012. 2 A "monthly probability" (MP) of each pixel to be covered by snow or land in a certain month 3 is computed according to: 6 where , , and , , are the probabilities of pixel , to be covered by snow or land in 7 month , respectively. ( , , ) indicates the coverage (snow or land) of pixel , on day .  Computation of MPs for every pixel in the study area leads to MP maps for all 12 months as 12 illustrated exemplarily in Fig. 5 for April. 13 Pixels with , , = 1 in Fig. 5, i.e. pixels that were always snow covered in April, add up to 14 12.7 % of the whole area. This means that 12.7 % of the study area can be classified as snow 15 covered in April. Pixels with , , = 1 add up to 14.9 %, i.e. 14.9 % of the domain can be 16 classified as snow free in April. To remain consistent with the terminology used in the first step, 17 we call the sum of pixels with , , = 1 ( , , = 1 ) the monthly SPI (LPI) value for 18 snow (land). Monthly SPI and LPI are defined in a similar way as in step 1 (Eq. 3 and 4). 19 The main idea in this step is to transfer these temporally persistent monthly spatial snow / land 20 patterns ( / ) to the past to reconstruct historical snow cover. However, the validity 21 of these temporally persistent spatial snow / land patterns over longer time in the past is not 22 assured, due to e.g. potential warmer / cooler or wetter / dryer climate conditions. In order to 23 account for possible climatic variability, we introduce a buffer as vertical elevation shift from where , is the elevation of the pixel , .
is thus the maximum elevation of all pixels 1 with , , = 1 in month . Note that below the altitude not all pixels necessarily have 2 , , = 1. Similarly, is the minimum elevation of all pixels with , , = 1 in month 3 . Again, does not necessarily represent the elevation above which all pixels have 4 , , = 1. Table 3 lists monthly and values, as well as , and , values 5 for the selected area. 6 These , , , , and monthly / maps were used to further reconstruct the snow 7 cover resulting from step 1:

4.3
Pixel to pixel conditional probability 19 In step 1, CPs of each pixel in accordance to station records were computed, and any pixel that The computation of ( , | , ) and ( , | , ) according to Eq. (13) and (14)

12
The pixel with coordinates =100, =100 has ( , | 100,100 ) = 1 for 30 684 (18 %) other 13 pixels in the study area, which means that when the particular pixel was snow covered during  (4), and illustrated in Fig. 7 for all pixels in the study area. 19 The maximum value of Fig. 7  easier to predict than snow covered ones.

30
The SPI and LPI maps were used for classifying pixels that are still undefined after the 31 previous steps: Since in this step SPI and LPI maps were generated for every pixel in the basin, this step 3 tends to classify a significantly larger area than the first step where only 7 stations were used 4 for constructing MRs. snow free in any of the previous steps. If any of the adjacent eight pixels is covered by snow 10 and the elevation of that snow-covered pixel is lower than the pixel that is still undefined, then 11 the undefined pixel is classified as snow covered. The same idea is applied for snow-free pixels. 12 Hence, this step can be formalized as follows: This step takes only elevation of neighbouring pixel into account. However, in areas where 18 factors others than elevation have an influence on neighbouring pixel condition (e.g. pixels 19 located near to water surfaces or forests), additional information such as land cover map could 20 be introduced into this step.

23
In the last step, the ( , | ) and ( , | ) values calculated in step 1 are used again. 24 Whereas in step 1 only CP= 1 conditions were considered, this step considers < 1 relations 25 (cf. Fig 4) to classify still undefined pixels. Still undefined pixels are classified according to the 26 highest CP value found amongst all 14 CP values available (7 CPs for snow and 7 CPs for land). confidence level for � , � � < 1 case among 14 stations. Thus, we use highest lower bound 10 CI among all CPs of this particular pixel to be decisive for reconstruction. Taking maximum lower bound CI values for still undefined pixels in the last step allows to 15 complete the classification for all pixels. However, since in this step ( , | )<1 was 16 considered, the reconstruction is subject to uncertainty that stems from non-perfect agreement 17 between station records and a pixel in the period 2000-2012. 18 19 5 Results and discussion 20 Applying the five steps described above, snow cover maps for the area containing Zerafshan assessed in a contingency table (Table 4). In Table 4, the sum of percentages of "SS" and "LL" 1 columns represent the degree of accuracy after each reconstruction step, related to the total 2 share of reconstructed pixels. Accordingly, the sum of "SL" and "LS" indicate the error in 3 relation to the total percentage. 4 As an example, Fig. 8 shows the reconstructed and Landsat-derived snow cover maps for April 5 10, 1998. The comparison of these maps results in 85.7 % of correct reconstruction (cases 6 SS+LL in Table 4 to previous steps. Note that ER may also be enhanced by erroneous snow cover estimation from 10 raw Landsat data and due to the spatial aggregation of Landsat 30 m original resolution to 500 11 m. Another potential bias may come from similar approaches (NDSI) used to map snow cover 12 both for MODIS snow cover maps which are used to assess CPs between station and pixels and 13 Landsat snow cover maps which are used to validate reconstructed snow cover maps. However, 14 different threshold values than MODIS were used to map snow cover from Landsat assuring 15 best visual validation of snow and snow free surface cover.

16
In order to better illustrate snow reconstruction in step 5, Fig. 9 shows the areal fraction for (reddish and light blue colours in Fig. 9). Figure 10 illustrates the trade-off between RF and ER 21 as a function of lower bound CI in step 5. For example, for the validation day of April 10, ER 22 from steps 1-4 adds up to 1.5 % (Table 4) and RF to 51.4 % (snow and land classes in Fig. 9).

23
With decreasing CI, RF increases, but at the cost of an increased ER. However, Fig. 10 also 24 shows that RF is relatively high until about CI=0.9 with increasing ER. In all four days used 25 for validation, an almost complete reconstruction is achieved with CI>0.9. snow cover maps (Fig. 9) deliver a partial snow cover reconstruction with high accuracy 1 resulting from steps 1-4, and, as result from step 5, a probability statement for snow cover for 2 the remaining pixels. 3 The validation days for this study were chosen deliberately from snow melt and snow 4 accumulation season (transition period) where snow cover estimation is particularly 5 challenging. For the time outside the snow melt or snow accumulation period, higher accuracies 6 can be expected since a higher fraction can be reconstructed in the first 4 steps already. During 7 snow transition period, snow cover conditions such as ephemeral snow cover can occur which 8 exacerbates snow cover estimation. However, in the reconstruction process using steps 1, 2 and  Under such conditions, station or MODIS data may see different snow cover than the reality.

11
In such cases (e.g. MODIS sees "land" although there is ephemeral snow whilst the station sees 12 "snow" since it is a manual point recording with a certain threshold), CP and MP of pixel gets 13 the value of < 1 as they do not show the same event and is not used in the first three steps for 14 reconstruction. Only distinct snow cover records from both station and MODIS are used to 15 identify snow covered areas in these step with CP=1 (MP=1). Reduced CP values that may 16 partly be due to ephemeral snow cover are, however, used in step 5 in order to classify areas 17 still undefined in the steps 1-4 and may thus contribute to the accuracy loss in step 5. 18 The validation of reconstructed snow cover maps were done using independent Landsat data in 19 this study. Alternatively, the AVHRR snow cover data, which is also available beyond the 20 MODIS data availability in the past, can be used for validation purposes. However, AVHRR 21 snow cover data has a coarser spatial resolution (~ 1.1 km) than the resolution (500 m) used in 22 this study. Unfortunately, processed AVHRR snow cover data was not available at the time of 23 manuscript writing and remains alternative data to be used for validation.  stations could be obtained. This is important to estimate initial snow cover in the first step which 28 is a base input for next steps (except step 2). Thus, we can conclude that the methodology is 29 well applicable for mountainous areas where high SPI and LPI values can be obtained.

30
However, it might be difficult to exploit statistical relationship between point measurements 31 and aereal pattern in lowland areas and is a subject to be tested. In this study, a methodology for reconstructing past snow cover using historical in-situ snow 3 depth data, recent remote sensing snow cover data and topographic data was presented. The 4 methodology is based on (1) constructing relationships between station observations and remote 5 sensing data, (2) estimating the monthly variation of snow cover from remote sensing data, (3) 6 deriving pixel-to-pixel relationships using remote sensing data, and (4) using neighbourhood 7 relations. Once the dependence between individual pixels and station records is derived, this 8 dependence is used to reconstruct past snow cover based solely on station records. 9 The methodology was applied to a study area containing the Zerafshan River basin -a basin 10 with high topographic gradients -in Central Asia and showed correct classification in the range 11 of 83.3 to 85.7 % when compared to four Landsat snow cover scenes. This high agreement is 12 noteworthy, given that only 7 stations and 12 years of remote sensing data were available. 13 Moreover, snow cover reconstruction was done for snowmelt and onset season when snow 14 classification is challenging compared to outside snowmelt and onset seasons where large areas 15 are easy to classify as snow or land. The agreement is only slightly less than that of original whether a station was snow covered (0) or snow free (1) during a day for which snow cover 4 reconstruction was conducted and Landsat scenes were available for validation.     LS. The first (second) letter indicates the classification according to the presented algorithm 3 (Landsat). "S" stands for "snow", "L" for "land". "Total" indicates the percentage of pixels