Evaluation of snow extent time series derived from Advanced Very High Resolution Radiometer global area coverage data (1982–2018) in the Hindu Kush Himalayas

Abstract. Long-term monitoring of snow cover is crucial for climatic and hydrological studies. The utility of long-term snow-cover products lies in their ability to record the real state of the Earth's surface. Although a long-term, consistent snow product derived from the ESA CCI+ (Climate Change Initiative) AVHRR GAC (Advanced Very High Resolution Radiometer global area coverage) dataset dating back to the 1980s has been generated and released, its accuracy and consistency have not yet been thoroughly evaluated. Here, we extensively validate the AVHRR GAC snow-cover extent dataset for the mountainous Hindu Kush Himalayan (HKH) region, which is of high importance for climate change impact and adaptation studies. First, the sensor-to-sensor consistency was investigated using a snow dataset based on long-term in situ stations (1982–2013). This analysis also examined how AVHRR snow-cover accuracy depends on snow depth. Furthermore, to increase the spatial coverage of the validation and to explore the influence of land-cover type, elevation, slope, aspect, and topographical variability on the accuracy of AVHRR snow extent, a comparison with Landsat Thematic Mapper (TM) data was included. Finally, the performance of the AVHRR GAC snow-cover dataset was compared to that of the MODIS (MOD10A1 V006) product. Our analysis shows an overall accuracy of 94 % in comparison with in situ station data, the same as that of MOD10A1 V006. Using a ±3 d temporal filter caused a slight decrease in accuracy (from 94 % to 92 %). Validation against Landsat TM data over an area covering a wide range of conditions (i.e., elevation, topography, and land cover) indicated overall root mean square errors (RMSEs) of about 13.27 % and 16 % and overall biases of about −5.83 % and −7.13 % for the AVHRR GAC raw and gap-filled snow datasets, respectively. We conclude that the AVHRR GAC snow-cover climatology validated here is a highly valuable dataset for assessing environmental changes in the HKH region, owing to its good quality, unique temporal coverage (1982–2019), and inter-sensor/satellite consistency.


…ment decisions, and investigating climate change impacts on environmental variables (Arsenault et al., 2014; Sun et al., 2020).
The Hindu Kush Himalayan (HKH) region, often called the freshwater tower of Asia, contains the highest concentration of snow outside the polar regions. The snow cover of this area plays a crucial role in the water supply of several major Asian rivers (Immerzeel et al., 2009). Moreover, the HKH region is of special interest because of its large area and its rich diversity of climates, hydrology, ecology, and biology (Wester et al., 2019). Variations in snow cover affect precipitation, near-surface air temperature, and the summer monsoon in Eurasia and across the Northern Hemisphere. Because the HKH region is particularly sensitive to climate change and thus shows strong interannual variability, reliable long-term daily snow-cover data across this area are in great demand.
Optical satellite data are an important source for snow-cover retrieval owing to the contrasting spectral behavior of snow relative to other natural surfaces in the visible and middle-infrared regions (Tedesco, 2014; Zhou et al., 2013). The global spatial coverage of satellite data makes them an efficient source for improving our knowledge of snow-cover dynamics (Siljamo and Hyvärinen, 2011; Solberg et al., 2010). Many satellites have been used to generate snow-cover products at various spatial and temporal resolutions, such as AMSR-E (Tedesco and Jeyaratnam, 2016), MODIS (Riggs et al., 2016a), AVHRR (Advanced Very High Resolution Radiometer; Shan et al., 2016), VIIRS (Riggs et al., 2016b), and Landsat (Rosenthal and Dozier, 1996). New-generation satellite sensors (e.g., MODIS, VIIRS) generally have an advantage over older sensors such as AVHRR and TM/ETM (Thematic Mapper and Enhanced Thematic Mapper), which suffer from significant saturation over snow in the visible channels (WMO, 2012). Nevertheless, AVHRR offers a unique opportunity to generate a consistent snow product over a 30-year climate normal period (IPCC, 2013) and thus remains vitally important. In response to the systematic observation requirements of the Global Climate Observing System (GCOS), the ESA Climate Change Initiative (CCI) has emphasized the need to generate consistent, high-quality long-term datasets covering the last 30 years as a timely contribution to the ECV (Essential Climate Variable) databases. To meet this demand, a global time series of daily fractional snow-cover products has been generated from AVHRR GAC (global area coverage) data (Naegeli et al., 2021). This snow dataset is unique as it spans nearly 4 decades and thus provides information about an ECV at climate-relevant timescales.
Nevertheless, many factors influence the accuracy of the AVHRR GAC snow-cover extent, such as data processing (e.g., calibration, geocoding), the accuracy of cloud masking, atmospheric constituents, topographic effects, the bidirectional reflectance distribution function (BRDF), and the limitations of snow-cover retrieval algorithms. Hence, the performance of the AVHRR GAC snow product needs to be extensively evaluated, especially over the HKH region, which is highly sensitive to climate change. This paper presents the validation of the AVHRR GAC snow product over the HKH area during snow seasons. Of particular importance is validating the temporal performance of the product, since different platforms operated over the period covered by the dataset. To this end, the first validation was carried out using measurements from 118 in situ stations. The agreement between spatial products and "point" measurements depends strongly on the selected snow depth threshold; therefore, the influence of snow depth on the accuracy of the product was also investigated. Because snow cover in the HKH region is characteristically shallow, patchy, and frequently ephemeral (Qin et al., 2006), in situ site measurements alone are not enough to characterize the product's accuracy. A multi-scale validation and comparison strategy is therefore needed to assess its accuracy over a greater spatial extent and range of elevations. Within this validation framework, the influences of land-cover type, elevation, aspect, slope, and topography on the accuracy of AVHRR GAC snow were also explored. Finally, MODIS snow maps were used to compare the well-validated MODIS product with the new AVHRR GAC snow product. Section 2 describes the study area and data. The validation methodology is explained in Sect. 3. The performance of the AVHRR GAC snow dataset is presented and discussed in Sect. 4. A brief conclusion is given at the end.
2 Study area and data

Study area
The HKH region covers a mountainous region of more than 4 million km² within the geographic area between about 16 to 40° N latitude and 60 to 105° E longitude. It extends across all or parts of eight countries, namely Afghanistan, Bangladesh, Bhutan, China, India, Myanmar, Nepal, and Pakistan (You et al., 2017). Moreover, it contains the highest concentration of snow and ice outside the polar regions and is thus referred to as the "Third Pole" (Wester et al., 2019). This region is one of the most dynamic, fragile, and complex mountain systems in the world due to the rich diversity of climatic, hydrological, and ecological characteristics. The climate conditions range from tropical (< 500 m a.s.l.) to high alpine and nival zones (> 6000 m a.s.l.), with a principal vertical vegetation regime composed of tropical and subtropical rainforests, temperate broadleaf, deciduous, or mixed forests, temperate coniferous forests, alpine moist and dry scrub, meadows, and desert steppe (Guangwei, 2002). The main land cover of this region is rangeland, which covers approximately 54 % of the total area. Agriculture and forest are also present, accounting for 26 % and 14 % of this region, respectively. A total of 5 % of this region is permanent snow and glaciers, and 1 % is water bodies (Ning et al., 2014; Wester et al., 2019). Snowmelt is considered to be a key source of water supply in the HKH range, and the ability of snow products to quantify snow storage and melt is thus critical for the management of water resources (Foster et al., 2011).
The validation based on in situ stations mainly covers the eastern part of the HKH region (Fig. 1a). To assess the accuracy of the AVHRR snow product over the whole area, Landsat data covering the entire region were used for a multi-scale validation (Fig. 1b). Furthermore, to explore its performance in detail for a wide range of conditions (e.g., elevation, topography, and land cover), a detailed validation against Landsat TM data was performed using two Landsat tiles (path 140, rows 40 and 41, denoted as "P140-R40/41") (Fig. 1c), covering a diverse region on the Nepal/Tibet border centered on Mount Everest. This region was chosen because it contains the greatest elevation range in the Himalayas. Its northernmost part consists of areas on the Tibetan Plateau exceeding 6000 m a.s.l., where vegetation change is occurring rapidly (Qiu, 2016). It also covers a broad range of climatic conditions (Bookhagen and Burbank, 2006). This region is therefore a microcosm of the range of conditions experienced across the HKH region and provides a good test bed for investigating snow extent accuracy under different conditions (Anderson et al., 2020).

AVHRR GAC snow extent retrieval
The AVHRR GAC snow-cover extent time series version 1, derived within the ESA CCI+ Snow project, is the most recent long-term global snow-cover product available (Naegeli et al., 2021). It covers the period 1982–2019 at a daily temporal and 0.05° spatial resolution. The product is based on the Fundamental Climate Data Record (FCDR) consisting of daily composites of AVHRR GAC data (https://doi.org/10.5676/DWD/ESA_Cloud_cci/AVHRR-PM/V003) produced in the ESA CCI Cloud project (Stengel et al., 2020). The data were preprocessed with improved geocoding and an inter-channel and inter-sensor calibration using PyGAC (Devasthale et al., 2017). The snow-cover extent retrieval method was developed from the ESA GlobSnow approach described by Metsämäki et al. (2015) and complemented with a pre-classification module. Alongside the daily reflectance and brightness temperature information, a high-quality cloud mask including pixel-based uncertainty information is provided (Stengel et al., …, 2020). All cloud-free pixels are then used for snow extent mapping, using spectral bands centered at about 630 nm and 1.61 µm (channel 3a; or, where channel 3a is unavailable, the reflective part of channel 3b at about 3.7 µm) and an emissive band centered at about 10.8 µm. Water bodies, permanent ice bodies, and missing values are flagged. The SCAmod retrieval underlying the GlobSnow approach estimates both the snow cover on top of the canopy and the snow cover on the ground below the canopy by taking the canopy density into account. Here, we focus on the latter variable, as it is most suitable for comparison with in situ stations.
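The text names SCAmod without giving its formula. As a hedged illustration, the sketch below inverts the linear reflectance-mixing form in which SCAmod is usually published (Metsämäki et al.): the observed reflectance is a mix of canopy, snow, and snow-free ground, weighted by an effective two-way canopy transmissivity t². The function name `scamod_fsc` and all reference reflectances are hypothetical placeholders, not the coefficients used in the ESA CCI+ product.

```python
import numpy as np

def scamod_fsc(rho_obs, t2, rho_snow=0.65, rho_ground=0.10, rho_forest=0.08):
    """Invert a SCAmod-type linear mixing model for snow fraction on the ground.

    rho_obs : observed reflectance (e.g., the ~630 nm band)
    t2      : effective two-way canopy transmissivity (1 = open land, 0 = opaque canopy)
    Reference reflectances are HYPOTHETICAL placeholders, not product values.
    """
    rho_obs = np.asarray(rho_obs, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    # Forward model: rho_obs = (1 - t2) * rho_forest + t2 * (fsc * rho_snow + (1 - fsc) * rho_ground)
    numerator = rho_obs - (1.0 - t2) * rho_forest - t2 * rho_ground
    denominator = t2 * (rho_snow - rho_ground)
    with np.errstate(divide="ignore", invalid="ignore"):
        fsc = numerator / denominator
    # FSC must lie in [0, 1]; it is undefined where the canopy is fully opaque (t2 == 0).
    return np.where(t2 > 0, np.clip(fsc, 0.0, 1.0), np.nan)
```

For t² = 1 (open terrain) the model reduces to linear unmixing between snow and snow-free ground; as t² approaches 0 the canopy hides the ground, and the sketch returns NaN for a fully opaque canopy.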
To reduce the effect of cloud cover, a temporal filter of ±3 d was applied to each individual snow-cover observation, following Foppa and Seiz (2012). The AVHRR GAC FCDR snow-cover product contains only one long data gap of 92 d, between November 1994 and January 1995, resulting in 99 % data coverage over the entire 38-year study period. In this study, we focus on the evaluation of the raw daily retrieval of AVHRR GAC snow extent (denoted "AVHRR_Raw"), since the gap-filling process introduces additional uncertainty.
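A minimal sketch of a ±3 d temporal filter for one pixel's daily series, assuming cloudy or missing days are stored as NaN and each gap takes the nearest valid observation within the window (the earlier day wins on ties). The exact rule of Foppa and Seiz (2012) may differ, and `temporal_gap_fill` is a hypothetical name.

```python
import numpy as np

def temporal_gap_fill(fsc, window=3):
    """Fill cloud/no-data gaps (NaN) in a daily per-pixel FSC series.

    Each missing day takes the value of the nearest valid ORIGINAL observation
    within +/- `window` days; on ties the earlier day is preferred. Gaps with no
    valid neighbor inside the window stay NaN. Illustrative assumption only.
    """
    fsc = np.asarray(fsc, dtype=float)
    filled = fsc.copy()
    n = fsc.shape[0]
    for day in np.flatnonzero(np.isnan(fsc)):
        for offset in range(1, window + 1):
            candidates = (day - offset, day + offset)  # earlier day checked first
            hit = next((d for d in candidates if 0 <= d < n and not np.isnan(fsc[d])), None)
            if hit is not None:
                filled[day] = fsc[hit]
                break
    return filled
```

Filling only from original observations, never from already-filled values, keeps the effective window from creeping beyond ±3 d.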

In situ snow depth measurements
In situ data were provided by the China Meteorological Administration (https://data.cma.cn/en, last access: 30 October 2019). Daily snow depth (SD) measurements (118 · 365) were obtained from 118 stations located at elevations ranging from 776 to 8530 m above sea level. SD was usually measured with rulers over a large flat area at 08:00 LT (UTC+8) every day. Three measurements were taken at least 10 m apart, and their arithmetic mean was used as the daily snow depth. If snowfall occurred after 08:00, a second measurement at 14:00 or a third measurement at 20:00 was taken, depending on the time of snowfall. The data were rounded to the nearest centimeter; thus, SD of less than 0.5 cm is recorded as 0 cm. Detailed quality control was performed to flag suspicious values. The period from 1982 to 2013 was used to assess the temporal consistency of the AVHRR GAC snow-cover extent product.

Landsat TM/ETM data and processing
Landsat data were used for two purposes: (i) to check the spatial consistency between AVHRR GAC snow and Landsat-based snow using 197 scenes covering the whole HKH region and (ii) to explore the factors (e.g., elevation, topography, and land cover) influencing the accuracy of AVHRR GAC snow over P140-R40/41. To mitigate the effect of clouds, the validation over P140-R40/41 was restricted to clear-sky scenes (cloud cover ≤ 10 %) of Landsat 5 TM acquired during snow seasons (46 × 2 scenes from 1984 until 2013; downloaded from https://glovis.usgs.gov/, last access: 30 April 2020). The validation over the whole HKH region was restricted to Landsat clear-sky scenes from 1999 to 2018 (197 scenes) (Fig. 1b). Level-1 Precision and Terrain Correction (L1TP) data were selected since they have been radiometrically and geometrically corrected. Following the recommendation of Metsämäki et al. (2015), the fractional snow method of Salomonson and Appel (2006) was employed to generate reference FSC (fractional snow cover) from Landsat TM/ETM imagery. This method was originally designed for MODIS FSC products, with a mean absolute error of less than 10 % (Salomonson and Appel, 2004). In this paper, we assume that comparable accuracy can be achieved with higher-resolution data. Bands 2 (0.53–0.61 µm) and 5 (1.55–1.75 µm) were used to compute NDSI (normalized difference snow index) estimates (Eq. 1), and the Salomonson and Appel scaling (Eq. 2) was then applied. These high-resolution data were then projected to a geographic projection and aggregated to the AVHRR GAC pixel scale using the area-weighted average of contributing pixels to "simulate" reference FSC estimates at the AVHRR GAC pixel scale.
$$\mathrm{NDSI} = \frac{B_2 - B_5}{B_2 + B_5} \qquad (1)$$
where B2 and B5 denote the spectral bands 2 and 5, respectively.
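The Landsat reference-FSC chain (NDSI, linear scaling, aggregation to the AVHRR GAC grid) can be sketched as follows. The NDSI follows Eq. (1). The linear scaling coefficients are an assumption: the defaults (intercept 0.06, slope 1.21) are taken to be the Terra relation of Salomonson and Appel (2006), since the text does not state them, and should be checked against that paper before use. The block averaging assumes equal-area subpixels aligned with the coarse grid. All function names are hypothetical.

```python
import numpy as np

def ndsi(b2, b5):
    """NDSI from Landsat TM band 2 (green) and band 5 (SWIR) reflectance, Eq. (1)."""
    b2 = np.asarray(b2, dtype=float)
    b5 = np.asarray(b5, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (b2 - b5) / (b2 + b5)

def fsc_from_ndsi(ndsi_values, slope=1.21, intercept=0.06):
    """Linear NDSI-to-FSC scaling clipped to [0, 1] (coefficients are ASSUMED defaults)."""
    return np.clip(intercept + slope * np.asarray(ndsi_values, dtype=float), 0.0, 1.0)

def aggregate_block_mean(fine, factor):
    """Average non-overlapping factor x factor blocks of a fine grid.

    With equal-area subpixels the area-weighted mean reduces to a plain mean;
    NaN (masked) subpixels are ignored. Incomplete edge blocks are dropped.
    """
    rows, cols = fine.shape
    trimmed = fine[: rows - rows % factor, : cols - cols % factor]
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return np.nanmean(blocks, axis=(1, 3))
```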

MODIS snow-cover product
The Terra MODIS Level 3, Collection 6, 500 m daily snow-cover products (MOD10A1) (Hall and Riggs, 2016) over the HKH region from 2000 to 2013 were obtained through Google Earth Engine (GEE). The MODIS snow detection algorithm also uses NDSI and other test criteria (Riggs et al., 2016a). Instead of directly providing binary snow-covered area (SCA) and FSC, version V006 provides NDSI_Snow_Cover and NDSI. The former is reported in the range 0–100, with other features identified by mask values, while the latter represents the real NDSI value multiplied by 10 000 and is calculated for all pixels (Riggs et al., 2016a). This provides more information and greater flexibility, which can enhance accuracy, because the NDSI is reported for all pixels rather than only for those above a fixed threshold. Compared to the previous version, V006 substantially improved atmospheric correction, cloud masking, and quality indices. Furthermore, the algorithm takes a pixel's elevation into account, which is especially important for elevated snow-covered surfaces in spring. To avoid a spatial-scale mismatch between AVHRR and MODIS pixels, MOD10A1 was reprojected to a geographic projection and aggregated to the AVHRR GAC pixel scale using the area-weighted average of contributing pixels.
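A hedged sketch of decoding the two MOD10A1 V006 layers described above: NDSI_Snow_Cover values 0–100 are kept and any larger mask value is set to NaN (the individual mask codes are listed in the MOD10A1 user guide and are not reproduced here), while the NDSI layer is divided by 10 000. The function `decode_mod10a1` is hypothetical and assumes both layers have already been read into NumPy arrays.

```python
import numpy as np

def decode_mod10a1(ndsi_snow_cover, ndsi_raw):
    """Separate valid MOD10A1 values from mask codes and rescale NDSI.

    ndsi_snow_cover : integer layer; 0-100 are valid, larger values are masks
                      (cloud, night, water, fill, ...) and become NaN here.
    ndsi_raw        : NDSI multiplied by 10 000; returned as real NDSI values.
    """
    snow = np.asarray(ndsi_snow_cover)
    valid = (snow >= 0) & (snow <= 100)
    snow_cover = np.where(valid, snow.astype(float), np.nan)
    ndsi_real = np.asarray(ndsi_raw, dtype=float) / 10000.0
    return snow_cover, ndsi_real
```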

Auxiliary data
The digital elevation model (DEM) information was obtained from the SRTM (Shuttle Radar Topography Mission) dataset, which provides near-global coverage at a spatial resolution of 90 m. In this study, elevation, slope, aspect, and topographical variability were derived from this dataset to investigate their influence on the accuracy of the AVHRR GAC snow extent product. The topographical variability within an AVHRR GAC pixel was determined as the standard deviation of the elevations of all subpixels within its spatial extent, while elevation, slope, and aspect were resampled to match the resolution of the AVHRR GAC snow dataset. The MODIS Terra/Aqua Combined Annual Level 3 Global 500 m Collection 6 land-cover dataset (MCD12Q1) was generated using a supervised classification methodology (Friedl et al., 2010). In this study, the International Geosphere-Biosphere Programme (IGBP) classification of the MCD12Q1 mosaic was used to investigate differences in accuracy between land-cover types. It includes 11 types of natural vegetation, 3 types of developed and mosaic lands, and 3 types of non-vegetated lands, which were reclassified into nine major classes: forest, grassland, savannas, croplands, built-up lands, barren, permanent snow and ice, water bodies, and wetlands. To match the pixel size of the AVHRR GAC snow product, MCD12Q1 was resampled to 0.05° spatial resolution using nearest-neighbor interpolation.
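The topographical variability described above can be sketched as a block standard deviation. Assuming the DEM is aligned with the AVHRR GAC grid, a 0.05° pixel (180 arcsec) contains 60 × 60 subpixels of 3-arcsec (about 90 m) SRTM data, so `factor=60` would apply. The function name and the use of the population standard deviation (NumPy's default `ddof=0`) are our assumptions.

```python
import numpy as np

def block_stats(dem, factor):
    """Mean elevation and topographic variability (std. dev.) per coarse pixel.

    `dem` is a fine-resolution elevation grid aligned so that each
    factor x factor block falls inside one coarse pixel; incomplete edge
    rows/columns are dropped and NaN subpixels (voids) are ignored.
    """
    rows, cols = dem.shape
    trimmed = dem[: rows - rows % factor, : cols - cols % factor].astype(float)
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return np.nanmean(blocks, axis=(1, 3)), np.nanstd(blocks, axis=(1, 3))
```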

Methods
AVHRR GAC snow extent was evaluated in several ways. The validation based on in situ sites aims to assess long-term consistency, since in situ stations provide valuable long time series of measurements, while the comparison with Landsat and MODIS snow focuses on spatial consistency and an in-depth analysis of influencing factors (elevation, topography, and land cover). The validation strategy is briefly summarized in Table 1.

Binary validation based on in situ data
Although validation based on in situ sites leaves scale issues unresolved and is therefore subject to uncertainty, in situ observations are the only source available to validate the AVHRR GAC snow extent time series over this long period. Since there is no reliable way to convert SD to FSC, both FSC and SD were converted to binary snow information by applying appropriate thresholds. Different SD thresholds, ranging from 0 to 5 cm, have been suggested to determine whether the pixel associated with an in situ measurement is covered by snow (Parajka et al., 2012; Hori et al., 2017; Hao et al., 2019; Huang et al., 2018; Zhang et al., 2019; Gascoin et al., 2019). Therefore, the sensitivity to this threshold was tested by computing accuracy metrics with the SD threshold increasing from 1 to 5 cm. The FSC maps were converted from fractional to binary snow information by applying a threshold of FSC ≥ 50 %, a value widely used and accepted in snow-cover detection (Wunderle et al., 2016; Mir et al., 2015; Crawford, 2015; Marchane et al., 2015; Hall and Riggs, 2007). Satellite data were compared with in situ measurements point-wise. To relate in situ "point" measurements to AVHRR GAC "area" snow information, both the center pixel containing the in situ station and the 3 × 3 pixels centered on it were tested. This accounts for the influence of data noise, geometric mismatch, and spatial heterogeneity. Furthermore, the absence or presence of snow indicated by in situ observations is assumed to be representative of at least a 3 × 3 pixel area, although this depends on topography. Consequently, there are 10 combinations in total for the accuracy assessment (five SD thresholds × two pixel windows; Table 2).
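The binarization and the 10 threshold cases can be sketched as follows. How the 3 × 3 window is reduced to a single snow flag is not specified in the text; the sketch assumes the mean FSC of the window is thresholded at 50 %. Function and variable names are hypothetical, and inputs are assumed to be cloud-free, valid observations (FSC in percent, SD in centimeters).

```python
import numpy as np

def station_snow(sd_cm, sd_threshold):
    """In situ snow flag: True where snow depth >= threshold (cm)."""
    return np.asarray(sd_cm, dtype=float) >= sd_threshold

def satellite_snow(fsc_grid, row, col, use_3x3, fsc_threshold=50.0):
    """Satellite snow flag at a station pixel from an FSC grid in percent.

    Center-pixel mode tests FSC >= 50 % at (row, col). The 3 x 3 mode averages
    the window around the station first (HYPOTHETICAL aggregation rule).
    """
    if use_3x3:
        window = fsc_grid[max(row - 1, 0): row + 2, max(col - 1, 0): col + 2]
        value = np.nanmean(window)
    else:
        value = fsc_grid[row, col]
    return bool(value >= fsc_threshold)

# Table 2: SD thresholds 1-5 cm with the center pixel (Case1-5) or 3 x 3 pixels (Case6-10)
CASES = {f"Case{i + 1 + 5 * k}": (i + 1, k == 1) for k in range(2) for i in range(5)}
```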
The 2 × 2 contingency table statistics (Table 3) were used to indicate the quality of the snow product. If both the reference data and the snow product identify the pixel as snow, it is labeled as a hit (a); if neither identifies the pixel as snow, it is labeled as zero (d); if the snow product indicates snow but the reference data do not, it is marked as false (b); and if the opposite occurs, it is marked as a miss (c) (Hüsler et al., 2012; Siljamo et al., 2011).
Based on these counts, accuracy (ACC), the Heidke skill score (HSS), and bias (Bias) were determined (Eqs. 3–5) (Hüsler et al., 2012):
$$\mathrm{ACC} = \frac{a + d}{a + b + c + d} \qquad (3)$$
$$\mathrm{HSS} = \frac{2\,(ad - bc)}{(a + c)(c + d) + (a + b)(b + d)} \qquad (4)$$
$$\mathrm{Bias} = \frac{a + b}{a + c} \qquad (5)$$
ACC is the number of correctly classified pixels divided by the total number of pixels. ACC values close to 1 denote perfect agreement between the snow product and the reference data, while a value of 0 corresponds to complete disagreement. However, ACC is strongly influenced by the most frequent category (e.g., snow-free conditions in summer) (Hüsler et al., 2012) and thus ideally requires an equal distribution of categories. Hence, we confine our accuracy assessment to the snow season (October to March), a restriction also applied in other studies (Yang et al., 2015; Gafurov et al., 2012; Hüsler et al., 2012; Huang et al., 2011). HSS and Bias provide refined measures when the frequency distribution within the validation subsets is unequal. HSS measures the proportion of correct classifications after removing those expected by chance in the absence of skill. Negative values indicate that chance performance is better, 0 represents no skill, and a perfect performance obtains an HSS of 1 (Hüsler et al., 2012). A value above 0.3 is generally considered a relatively good score for a reasonably sized sample in binary forecasting (Singh, 2015). Bias, the ratio of the number of pixels classified as snow by the product to the number of snow pixels in the reference data, is a relative measure that detects overestimation (values greater than 1) or underestimation of snow (values less than 1). Unbiased results yield a Bias of 1.

Table 2. A short summary of all the combinations of thresholds.

Case     Combination                  Case      Combination
Case1    SD ≥ 1 cm, center pixel      Case6     SD ≥ 1 cm, 3 × 3 pixels
Case2    SD ≥ 2 cm, center pixel      Case7     SD ≥ 2 cm, 3 × 3 pixels
Case3    SD ≥ 3 cm, center pixel      Case8     SD ≥ 3 cm, 3 × 3 pixels
Case4    SD ≥ 4 cm, center pixel      Case9     SD ≥ 4 cm, 3 × 3 pixels
Case5    SD ≥ 5 cm, center pixel      Case10    SD ≥ 5 cm, 3 × 3 pixels

The validation follows two strategies. First, the satellite and in situ snow-cover time series were compared at each station, yielding accuracy indicators per station. This allows us to check the spatial divergence of accuracy between sites, as well as the effect of land cover on the accuracy of satellite-derived snow information. Second, the snow data of all in situ sites were pooled to validate AVHRR GAC snow on a daily basis, so that the long-term temporal consistency of the accuracy can be evaluated. Additionally, to assess product performance with respect to the temporal variability in snow cover, the binary metrics were summarized and analyzed for each month. An analysis of the increase/decrease in accuracy with respect to FSC and SD was also included to explore the influence of smaller snow patches on accuracy. Finally, to check the performance of AVHRR GAC snow relative to the widely used MODIS product, MOD10A1 V006 was also evaluated against in situ station data using the same method. We expect major differences in their performance to stem either from the quality of the processing and snow-cover retrieval algorithms or from general satellite data characteristics. For the comparison of absolute values, the root mean square error (RMSE), mean bias (mBias), and correlation coefficient (R) were derived from a scene-by-scene comparison.
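A minimal sketch of the contingency-table scores of Eqs. (3)–(5), together with pooling of station-days by a group label (station, year, or month) as in the validation strategies above. Names are hypothetical; HSS and Bias are returned as NaN when their denominators vanish, a convention chosen here for illustration.

```python
import numpy as np

def contingency_scores(product_snow, reference_snow):
    """ACC, HSS, and Bias from paired binary snow flags (Eqs. 3-5).

    a = hits, b = false (product snow, reference no snow),
    c = misses, d = zero (both no snow).
    """
    p = np.asarray(product_snow, dtype=bool)
    r = np.asarray(reference_snow, dtype=bool)
    a = np.sum(p & r)
    b = np.sum(p & ~r)
    c = np.sum(~p & r)
    d = np.sum(~p & ~r)
    acc = (a + d) / (a + b + c + d)
    hss_den = (a + c) * (c + d) + (a + b) * (b + d)
    hss = 2.0 * (a * d - b * c) / hss_den if hss_den else np.nan
    bias = (a + b) / (a + c) if (a + c) else np.nan
    return float(acc), float(hss), float(bias)

def scores_by_group(product_snow, reference_snow, groups):
    """Pool all station-days sharing a label (e.g., year, month, or station id)."""
    p = np.asarray(product_snow)
    r = np.asarray(reference_snow)
    groups = np.asarray(groups)
    return {g: contingency_scores(p[groups == g], r[groups == g]) for g in np.unique(groups)}
```

Pooling by year gives interannual time series of the metrics; pooling by month gives the monthly summaries, and pooling by station gives the per-site distributions.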

Multi-scale validation based on high-resolution snow-cover maps
To evaluate AVHRR GAC snow at a broader spatial scale, aggregated Landsat TM/ETM FSC was used as the reference. Snow-free pixels are treated as 0 % snow, and a fully snow-covered pixel is assigned 100 % snow. The validation was conducted at two levels: (i) 197 scenes covering the whole HKH region, to increase the spatial coverage of the validation, and (ii) 46 × 2 scenes over P140-R40/41, for a detailed analysis of the factors (e.g., elevation, topography, and land cover) influencing the accuracy of the AVHRR snow dataset.
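The scene-by-scene continuous statistics (RMSE, mBias, and R) used for the comparison with aggregated Landsat FSC can be sketched as follows. We assume the bias is computed as product minus reference, so negative values indicate underestimation by AVHRR GAC, and that cloudy or masked pairs are stored as NaN; the function name is hypothetical.

```python
import numpy as np

def continuous_scores(product_fsc, reference_fsc):
    """RMSE, mean bias (product minus reference), and Pearson R for paired FSC.

    Both inputs are FSC in percent on the same AVHRR GAC grid; pairs in which
    either value is NaN (cloud, mask) are dropped before computing statistics.
    """
    p = np.asarray(product_fsc, dtype=float).ravel()
    r = np.asarray(reference_fsc, dtype=float).ravel()
    ok = ~(np.isnan(p) | np.isnan(r))
    diff = p[ok] - r[ok]
    rmse = np.sqrt(np.mean(diff ** 2))
    mbias = np.mean(diff)
    corr = np.corrcoef(p[ok], r[ok])[0, 1]
    return float(rmse), float(mbias), float(corr)
```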
4 Results and discussion
4.1 The validation based on in situ data

Snow depths and pixel threshold sensitivity analysis
To test the sensitivity of snow-cover detection to the in situ SD threshold, the overall accuracy metrics were computed by pooling the data of all in situ sites over the study period (1982 to 2013 for the AVHRR-GAC-derived snow and 2000 to 2013 for MOD10A1). The variations in Bias, ACC, and HSS across all threshold combinations (Table 2) are shown in Fig. 3. As shown in Fig. 3a, an SD threshold of 2 cm (case2) maximizes the overall accuracy of the AVHRR GAC snow-cover dataset. As the SD threshold increases further, AVHRR GAC increasingly overestimates snow. This indicates that snow detected by the AVHRR GAC dataset agrees best with in situ observations when snow presence is defined as SD ≥ 2 cm. Furthermore, ACC increases and HSS decreases most steeply between the 1 and 2 cm SD thresholds, and both curves flatten for greater thresholds. Geometric mismatch and spatial heterogeneity (center pixel versus 3 × 3 pixels; Table 2) substantially affect both the magnitude and the trend of these accuracy indicators. However, these effects are not fixed and vary between satellite datasets and accuracy indicators (Fig. 3). For this reason, we chose case2 (SD ≥ 2 cm, center pixel) as the optimum threshold combination for the evaluation of the AVHRR GAC snow dataset. For consistency, the same combination was used to compare MODIS with in situ data.
As seen in Fig. 3a, the AVHRR snow datasets show distinct advantages over MODIS snow in terms of Bias. The former show biases of 0.94 and 1.03 for AVHRR raw snow and gap-filled snow, respectively, while the latter strongly overestimates snow, with a bias of 1.74. Nevertheless, the three datasets show comparable ACC, with values of 0.94, 0.92, and 0.94 for AVHRR raw snow, AVHRR gap-filled snow, and MODIS snow, respectively. The HSSs of all three datasets are reasonable, exceeding 0.3. MODIS snow shows the largest HSS (0.35), followed by AVHRR raw snow (0.34), while the AVHRR gap-filled snow-cover dataset ranks last with the smallest HSS (0.31). These results show that the AVHRR raw dataset performs slightly better than the AVHRR gap-filled dataset with respect to both agreement with in situ sites and algorithm skill. This is reasonable since the gap-filling process introduces additional uncertainty. For this reason, we focus only on AVHRR raw snow in the further analysis. Overall, AVHRR raw snow is comparable to MODIS snow in terms of ACC and HSS.

The temporal consistency of quality indicators
Figure 4 shows evident interannual variability in these accuracy metrics, especially in ACC and HSS. In the AVHRR GAC snow time series, ACC lies mostly between about 88 % and 92 % (Fig. 4a). An obvious increase in ACC can be observed from 1982 to 1985, followed by a decrease from 1985 to 1992. ACC then increases from 1992 to 2000 and remains relatively stable from 2000 to 2010, after which an increasing trend reappears. In contrast to the previous assessments, the ACC of the AVHRR snow datasets at the beginning of the time series (1982–2000) is slightly worse than at the end (2000–2013), in terms of both magnitude and temporal consistency. HSS behaves differently from ACC (Fig. 4b): it increases slightly and monotonically from 0.45 at the beginning to about 0.48 at the end of the time series. This further indicates that the performance of AVHRR snow improves over time. However, no such improvement is seen in Bias (Fig. 4c). Bias performs best from 1990 to 2000, with relatively stable values around 1. During other periods, relatively large fluctuations appear, and snow is generally overestimated.
Figure 4 also shows that MODIS snow is inferior to AVHRR GAC snow in terms of the magnitude of ACC and its temporal consistency. Furthermore, its HSS is consistently smaller than that of AVHRR GAC snow. Nevertheless, its temporal stability is slightly better, since the HSS of MODIS stays almost constant over time. Regarding Bias, MODIS snow shows a more serious overestimation than AVHRR GAC snow but comparable temporal stability.
To highlight the performance of AVHRR GAC snow in different months, the temporal variations in monthly ACC, HSS, and Bias are presented in Fig. 5. Figure 5a shows that the ACC of AVHRR GAC snow exceeds 0.85 in all months and even 0.90 in October. Nevertheless, both the temporal trend and the magnitude of ACC differ from month to month. AVHRR GAC snow shows the highest ACC in October and the lowest in January, while the temporal stability of ACC is best in November and worst in January and December. The results fall into two groups: January through March and October through December. Generally, ACC is smaller in the former group than in the latter. Notably, on the monthly scale, ACC after 2000 is generally larger and more stable than in earlier years (Fig. 5a), indicating the better accuracy and consistency of the newer satellite platforms operating after 2000. Compared to AVHRR GAC snow, the ACC of MODIS snow consistently shows large temporal variations in all months, and no month stands out in terms of the magnitude and temporal stability of ACC (Fig. 5b).
The HSS is larger than 0.4 in every month throughout the time series, but its magnitude and temporal stability differ considerably between months (Fig. 5c). Similar to ACC, AVHRR GAC snow shows the largest HSS in October for most of the time series. Furthermore, the HSS in October follows a temporal trend similar to the overall HSS trend in Fig. 4b. Among all months, the HSS in December shows the largest temporal variations, with the highest HSS from 1990 to 2000 and the lowest HSS from 2005 onward. The HSS in January through March varies less over time than that in October through December. The changing ranking of the months by HSS magnitude across different periods may be associated with shifts in snow-cover phenology due to interannual variability intensified by global warming. Unlike AVHRR GAC snow, MODIS snow shows larger HSS in January and February (Fig. 5d), and its temporal variations in HSS are more pronounced than those of AVHRR GAC snow over the same period.
Although AVHRR GAC snow performs best in October in terms of the magnitude of ACC and HSS, it seriously overestimates snow in this month (Fig. 5e). More generally, AVHRR GAC snow overestimates snow in February, March, October, and November. By contrast, it either slightly overestimates or slightly underestimates snow in December and January, with the Bias distributed around 1. This result is understandable: in December and January, snow cover tends to be extensive and spatially continuous, so pixel-scale and point-scale observations largely agree, resulting in nearly unbiased estimates. In February, March, October, and November, by contrast, snow cover tends to be patchy, and the large AVHRR GAC pixels are more likely to detect snow than point-based in situ observations. MODIS snow consistently overestimates snow in all months and shows larger temporal variations than AVHRR GAC snow (Fig. 5f).
From the results above, it can be concluded that the performance of the AVHRR GAC snow dataset varies over the course of the year, which may be related to the different amounts of snow in the HKH region. Generally, ACC and HSS are largest in October and smallest in January. However, the temporal stability of ACC is best in November and worst in January and December, while that of HSS is worst in December. The Bias results offer a different perspective on the performance of AVHRR GAC snow: it generally overestimates snow in February, March, October, and November, whereas nearly unbiased estimates are likely in December and January. Compared to the AVHRR snow datasets, the interannual variability in ACC, HSS, and Bias of the MODIS snow product is generally stronger in all months (Fig. 5).

Figure 6 shows boxplots of the validation metrics derived for each in situ station, with the aim of revealing their spatial variability. The wide spread of these distributions indicates substantial spatial variability in all validation metrics. For the AVHRR snow datasets, the maximum ACC reaches 0.99, while the minimum is close to 0.76 (Fig. 6a). HSS is similarly dispersed: for the AVHRR raw dataset, the bulk of the values lies between 0.2 and 0.39, with minimum and maximum values of 0.01 and about 0.68 (Fig. 6b). Likewise, Bias mostly lies between 0.51 and 1.6, with minimum and maximum values of 0.05 and 2.89 for the AVHRR dataset (Fig. 6c). These results are expected because the performance of satellite snow datasets is affected by many factors. Although spatial variability is present in all validation metrics, its degree depends on the satellite dataset and the metric: the HSS and Bias of the MODIS snow dataset are more dispersed than those of the AVHRR raw snow dataset (Fig. 6).

The potential factors influencing accuracy
Following an earlier study (Klein and Barnett, 2003), the effect of SD on the accuracy of the satellite snow datasets was evaluated (Fig. 7a). Observed SD was divided into six categories: SD = 0, 1, 2, 3, 4, and 5 cm. The ACC of the AVHRR- and MODIS-based snow datasets responds similarly to SD. The highest ACC occurs at SD = 0 cm, followed by SD = 1 cm. When SD ≥ 2 cm, ACC decreases markedly. This is partly due to the 2 cm threshold used to convert in situ SD measurements to snow-cover or snow-free information. Another cause is the limited representativeness of point-scale in situ observations relative to satellite observations at the larger pixel scale. When SD is below 2 cm, snowfall is more likely to have covered only a limited part of the satellite pixel. In this case, the satellite snow datasets are more likely to classify the pixel as snow-free, which matches the in situ classification (SD < 2 cm is treated as snow-free) and therefore increases the agreement between satellite and in situ observations. Despite the drop in ACC at SD ≥ 2 cm (compared to SD < 2 cm), the ACC of the various snow datasets clearly increases with increasing SD. This is understandable because, with increasing SD, the satellite pixel is more likely to be entirely snow-covered, so the agreement between satellite and in situ observations increases. In general, SD affects the overall agreement of the satellite snow datasets, and their accuracy increases with increasing SD when the in situ site reports snow cover, which is in line with previous studies (Zhou et al., 2013; Wang et al., 2009).
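The grouping behind Fig. 7a can be sketched as follows. This assumes that in situ SD is reported in whole centimetres and that a station is labeled snow-covered when SD ≥ 2 cm, as stated above; the record layout (pairs of SD and a boolean satellite snow flag) and the function name are hypothetical.

```python
from collections import defaultdict

SD_THRESHOLD_CM = 2  # in situ SD >= 2 cm is treated as snow cover


def acc_by_depth(records, max_class=5):
    """ACC per snow-depth class.

    records: iterable of (sd_cm, sat_snow) pairs, where sat_snow is the
    satellite snow flag for the pixel containing the station.
    Returns {sd_class: ACC} for classes 0..max_class; deeper snow is skipped.
    """
    agree = defaultdict(int)
    total = defaultdict(int)
    for sd_cm, sat_snow in records:
        sd_class = int(sd_cm)
        if sd_class > max_class:
            continue
        insitu_snow = sd_cm >= SD_THRESHOLD_CM  # binarize the in situ observation
        total[sd_class] += 1
        agree[sd_class] += insitu_snow == sat_snow
    return {k: agree[k] / total[k] for k in sorted(total)}
```

Because every record with SD of 0 or 1 cm is labeled snow-free in situ, any satellite "snow" flag in those classes counts as a disagreement, which is why these classes reward a satellite product that misses thin or partial snow.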
The effect of FSC on the accuracy of the satellite snow datasets is shown in Fig. 7b. FSC was grouped into five categories: 0 %-20 %, 20 %-40 %, 40 %-60 %, 60 %-80 %, and 80 %-100 %. Again, the ACC of the AVHRR and MODIS snow datasets responds similarly to FSC. The highest ACC occurs when FSC ≤ 20 %, followed by 20 % < FSC ≤ 40 %. This is partly caused by the threshold (FSC ≥ 50 %) applied to FSC maps to convert fractional to binary snow information and partly by the spatial representativeness of the in situ sites. When only a small part of the pixel is snow-covered, the in situ site is likely to have very thin snow or none at all, so it is more likely to report snow-free conditions, in agreement with the satellite classification. For 40 % < FSC ≤ 60 %, ACC decreases markedly because part of the satellite data in this group indicate snow cover (FSC ≥ 50 %), while the in situ site may well be snow-free or covered by only very thin snow. For 60 % < FSC ≤ 80 %, all satellite data in this group indicate snow cover, but there remains a real risk that the in situ site is snow-free or covered by only very thin snow; as a result, the agreement decreases further. As FSC increases further, the probability that the in situ site reports snow cover also increases, so ACC increases for 80 % < FSC ≤ 100 %. These results show that FSC affects the overall agreement between satellite snow datasets and in situ observations: when the satellite data indicate snow-free conditions, ACC decreases with increasing FSC, whereas when they indicate snow cover, ACC increases with increasing FSC.
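The FSC analysis in Fig. 7b combines two steps: binarizing satellite FSC with the FSC ≥ 50 % threshold and binning FSC into five 20 % classes. A minimal sketch, assuming the class boundaries are closed on the upper side (FSC ≤ 20 %, 20 % < FSC ≤ 40 %, and so on) as written above; the record layout and function names are hypothetical.

```python
import math

FSC_SNOW_THRESHOLD = 50.0  # satellite pixel flagged as snow when FSC >= 50 %
LABELS = ["0-20", "20-40", "40-60", "60-80", "80-100"]


def fsc_class(fsc):
    """Map FSC (%) to 0..4 for <=20, (20,40], (40,60], (60,80], (80,100]."""
    return max(0, math.ceil(fsc / 20.0) - 1)


def acc_by_fsc(records):
    """records: iterable of (fsc_percent, insitu_snow) pairs -> {class label: ACC}."""
    agree = [0] * 5
    total = [0] * 5
    for fsc, insitu_snow in records:
        k = fsc_class(fsc)
        sat_snow = fsc >= FSC_SNOW_THRESHOLD  # fractional -> binary snow
        total[k] += 1
        agree[k] += sat_snow == insitu_snow
    # Report only classes that contain at least one record.
    return {LABELS[k]: agree[k] / total[k] for k in range(5) if total[k]}
```

Note that the 40-60 class is the only one that mixes satellite "snow" and "snow-free" labels, since the 50 % threshold falls inside it.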
Nevertheless, it is important to note that the variations in ACC with SD and FSC depend on the thresholds adopted for converting SD and FSC to snow-cover or snow-free information.
Figure 7a shows that the ACC of the AVHRR snow datasets is higher than that of the MODIS snow product when SD ≤ 1 cm but consistently lower than that of MODIS snow at each SD when SD ≥ 2 cm. Thus, in snow-free or very thin snow conditions, the AVHRR snow datasets are misclassified less often than the MODIS snow product; in snow-covered conditions, by contrast, although all three datasets show an increase in ACC with increasing SD, the MODIS snow product classifies snow more reliably. The discrepancies between them mainly result from the different pixel sizes. Figure 7b shows that the ACC of the AVHRR snow datasets is slightly lower than that of the MODIS product at each FSC level when FSC ≤ 60 % but higher when FSC > 60 %. This is related to the different degrees of spatial representativeness of in situ sites relative to the different pixel scales.

Figure 8a and b present the distribution of ACC for the two satellite snow datasets against in situ observations for different elevation ranges (five classes) and land-cover types (four types), respectively. Coarse-pixel satellite snow products are generally thought to perform better at higher elevations because snow cover there is continuous and thick (Yang et al., 2011). Over the HKH region, however, ACC shows a different pattern: both satellite snow products consistently show higher ACC at somewhat lower elevations than at higher elevations. An exception is the 3500-4500 m elevation range, where the ACC of both datasets is the lowest over the whole HKH region. Furthermore, ACC in this elevation range is the most dispersed, suggesting that snow-product accuracy within this range is more strongly affected by other factors. Notably, the MODIS snow product slightly outperforms the AVHRR snow dataset across the different elevation ranges. This is reasonable because the spatial-scale mismatch between in situ and satellite-based observations is greater for the AVHRR snow datasets than for the MODIS snow dataset.
Although elevation affects ACC, it was not controlled for when we explored the effect of land-cover type on ACC (Fig. 8b), because the number of in situ sites in each combination of land-cover type and elevation range is very limited. For AVHRR GAC snow, the highest agreement with in situ measurements is found in the barren class, followed by grasslands and savannas. Although nearly half of the in situ sites in forest show ACC above 0.91, a substantial number of forest stations show relatively low ACC. This indicates that the well-known difficulties of identifying snow in forested areas with optical satellite data are not fully resolved in AVHRR GAC snow. Interestingly, the MODIS snow product remains superior across land-cover types, and its advantage is more pronounced over forest and savannas. The difference in performance between AVHRR snow and MODIS snow is partly due to their intrinsic accuracy and partly due to the different effects of the spatial-scale mismatch between in situ and satellite-based observations.

Quantitative comparison to MOD10A1
To investigate the absolute differences between AVHRR GAC and MODIS snow, we compared them on a pixel basis following the cross-validation framework. RMSE, mean bias (mBias), and the correlation coefficient (R) are used to quantify their differences and consistency. The scene-by-scene comparison was made over the region P140-R40/41 throughout the snow seasons of 2012 and 2013. As shown in Fig. 9, the highest density lies between 0 % and −5 % for mBias and between 0 % and 10 % for RMSE. Only a small fraction of the scenes show relatively large values of mBias (15 %) and RMSE (30 %). The overall mBias values are very small: 0.06 % and 0.94 % for the AVHRR raw and gap-filled snow datasets, respectively. The overall RMSEs are 12.8 % and 17.0 % for the AVHRR raw and gap-filled snow datasets, respectively. Furthermore, the spatial distribution of FSC from AVHRR GAC snow basically agrees with that of MODIS snow, with overall R values of 0.63 and 0.53 for the AVHRR raw and gap-filled snow datasets, respectively.
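The three comparison statistics can be written as follows for paired FSC values (in %) from AVHRR GAC and a reference product. This sketch assumes that mBias is the mean of AVHRR minus reference, which is consistent with negative mBias denoting underestimation elsewhere in this section, and that R is the Pearson correlation coefficient; the function name is hypothetical.

```python
import math


def fsc_agreement(est, ref):
    """RMSE, mean bias and Pearson R between two paired FSC series (in %).

    mBias is mean(est - ref), so a negative value means the evaluated
    product (e.g. AVHRR GAC) reports less snow than the reference.
    """
    n = len(est)
    if n == 0 or n != len(ref):
        raise ValueError("need two non-empty series of equal length")
    diffs = [e - r for e, r in zip(est, ref)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mbias = sum(diffs) / n
    # Pearson correlation from centred sums (undefined if either series is constant).
    me, mr = sum(est) / n, sum(ref) / n
    cov = sum((e - me) * (r - mr) for e, r in zip(est, ref))
    se = math.sqrt(sum((e - me) ** 2 for e in est))
    sr = math.sqrt(sum((r - mr) ** 2 for r in ref))
    return rmse, mbias, cov / (se * sr)
```

Applied per scene, the function yields the scene-level distributions of the kind shown in Fig. 9; applied to all pixels pooled, it yields overall values.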

Spatial consistency of snow-cover extent
To overcome the spatial limitations of the in situ stations, the AVHRR raw snow dataset was also compared with Landsat data over the whole HKH region. The RMSE, mBias, and R under different conditions are summarized in Table 4. RMSE is generally below 23 % under all conditions, with an overall RMSE of 22.31 %. The mBias indicates an underestimation by the AVHRR snow datasets, with an overall mBias of −2.96 %. The consistency between Landsat and AVHRR snow is good, with an overall R of 0.82. AVHRR GAC snow performs best in the plain class, with the smallest RMSE of 18.2 % and mBias of −1.65 %, as well as the largest R of 0.90. By contrast, the largest RMSE (22.9 %) and mBias (−3.18 %) occur in mountain areas. In terms of consistency, the worst performance occurs in forests, with the lowest R of 0.57. To explore the performance of AVHRR GAC snow in greater detail under a wide range of conditions, the spatial accuracy was assessed on a pixel basis using Landsat 5 TM time series over the areas covered by P140-R40/41. The AVHRR snow datasets systematically underestimate snow-covered areas relative to the Landsat 5 TM data (Table 5). This can be explained by the fact that FSC retrieved directly at coarse resolution tends to be lower than FSC aggregated from high-resolution data: high-resolution data can detect small snow patches that are too small to produce a sufficient snow signal in a coarse-resolution pixel but that still contribute to the aggregated FSC (Singh et al., 2014; Jain et al., 2008). The accuracy of AVHRR GAC snow differs between the two areas, with better performance over P140-R40 than over P140-R41 (Table 5). Over P140-R40, AVHRR raw snow shows a smaller RMSE of 11.39 % (vs. 15.08 % over P140-R41) and an mBias of −4.19 % (vs. −7.64 %). AVHRR gap-filled snow shows a similar pattern, with a smaller RMSE of 13.40 % (vs. 18.37 %) and mBias of −4.94 % (vs. −9.46 %) over P140-R40 than over P140-R41. When the two areas are combined, AVHRR GAC snow presents overall RMSEs of 13.27 % and 16 % and mBias values of −5.83 % and −7.13 % for the raw and gap-filled datasets, respectively, over this highly variable region (in terms of elevation, topography, and land cover). Table 5 also makes clear that AVHRR raw snow shows a higher accuracy than the AVHRR gap-filled snow dataset.
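The underestimation argument can be illustrated by how a reference FSC is typically derived from a binary high-resolution snow map: the map is averaged over the footprint of each coarse pixel, so every snow-covered fine pixel contributes to the coarse FSC, however small the patch. The sketch below assumes a regular grid in which each coarse pixel covers exactly block × block fine pixels; real Landsat-to-AVHRR aggregation also requires reprojection and footprint matching, which are omitted here, and the function name is hypothetical.

```python
def aggregate_fsc(fine_snow, block):
    """Aggregate a binary fine-resolution snow map to coarse FSC (%).

    fine_snow: list of rows of 0/1 values (1 = snow).
    block: fine pixels per coarse pixel along each axis (hypothetical value);
    both map dimensions must be multiples of block.
    """
    rows, cols = len(fine_snow), len(fine_snow[0])
    if rows % block or cols % block:
        raise ValueError("map size must be a multiple of the block size")
    coarse = []
    for i in range(0, rows, block):
        coarse_row = []
        for j in range(0, cols, block):
            cells = [fine_snow[y][x]
                     for y in range(i, i + block)
                     for x in range(j, j + block)]
            # Every snow pixel counts, however isolated the patch.
            coarse_row.append(100.0 * sum(cells) / len(cells))
        coarse.append(coarse_row)
    return coarse
```

For a 4 × 4 map aggregated with block = 2, a coarse cell containing three isolated snow pixels out of four receives 75 % FSC, even if a coarse-resolution sensor would see too weak a signal to register that much snow.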

Pixel-based comparison and potential factors influencing accuracy
Both land-cover types and topography are highly heterogeneous over the HKH region. Here, the sub-region P140-R40/41 was chosen to investigate the factors (i.e., elevation, land-cover type, slope, aspect, and topographical variability) influencing the accuracy of the AVHRR GAC snow dataset (Fig. 10). Figure 10a shows that RMSE increases strongly with elevation. An exception is the 3500-4500 m range, where RMSE clearly decreases but also shows the greatest spread, because the accuracy of AVHRR GAC snow is not influenced by elevation alone. Over flat areas (0-200 m) and hills (200-500 m), the highest density of RMSEs lies between 0 % and 5 %. Over low- and medium-height mountains (500-2500 m), the highest density of RMSEs lies between 0 % and 10 %. At higher elevations (2500-3500 m), more than half of the pixels show RMSEs larger than 10 %. Over the 3500-4500 m range, however, more than half of the pixels show small RMSEs of less than 5 %, although the maximum RMSE reaches 45 %. Over the elevation region of 3500-4500 m, the highest density of RMSEs is lower than 15 %, and RMSE increases markedly over the extremely high areas (> 5500 m). This finding contrasts with Yang et al. (2011), who considered that coarse-pixel satellite snow products generally perform better at higher elevations because of the continuous and thick snow cover. The larger RMSEs at the highest elevations are partly caused by the large FSC values themselves, partly by surface roughness, topographic effects, and shadows, and partly by clouds, since cloud occurrence tends to increase with altitude in mountain areas.
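A per-pixel RMSE grouped by elevation class, as summarized in Fig. 10a, could be computed along the following lines. The class edges are taken from the elevation ranges quoted in this section (eight classes, with the top class > 5500 m), which is an assumption about the exact binning; the median stands in for the centre of the distributions shown in the figure, and all names are hypothetical.

```python
import bisect
import math
import statistics

# Upper edges (m) of the elevation classes, assumed from the ranges quoted in the text.
ELEV_EDGES = [200, 500, 1500, 2500, 3500, 4500, 5500]
ELEV_LABELS = ["0-200", "200-500", "500-1500", "1500-2500",
               "2500-3500", "3500-4500", "4500-5500", ">5500"]


def pixel_rmse(est_series, ref_series):
    """Temporal RMSE (%) of one pixel over its paired AVHRR/Landsat FSC values."""
    diffs = [e - r for e, r in zip(est_series, ref_series)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def elevation_class(elev_m):
    # bisect_left puts a value equal to an edge into the lower class (200 m -> "0-200").
    return ELEV_LABELS[bisect.bisect_left(ELEV_EDGES, elev_m)]


def median_rmse_by_elevation(pixels):
    """pixels: iterable of (elevation_m, est_series, ref_series) -> {class: median RMSE}."""
    groups = {}
    for elev, est, ref in pixels:
        groups.setdefault(elevation_class(elev), []).append(pixel_rmse(est, ref))
    return {label: statistics.median(values) for label, values in groups.items()}
```

Keeping the full per-class lists in `groups`, rather than only the median, would also allow the spread within each class (e.g. the wide range at 3500-4500 m) to be examined.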
Given the considerable effect of elevation on the accuracy of AVHRR GAC snow, the regions P140-R40/41 were divided into eight groups according to elevation (Fig. 10b). Figure 10b shows that RMSE increases with elevation within each individual land-cover type. An exception is grasslands, which show the largest RMSE at 2500-3500 m and a marked decrease in RMSE at 3500-4500 m. Over flat areas (0-200 m), AVHRR snow mapping accuracy is best in croplands and worst in the barren class, and it is slightly better in forest than in savannas. Moreover, accuracy is most spatially stable in grasslands, given the narrow distribution of RMSE. Over hills (200-500 m), croplands still show the best accuracy, followed by forest and grasslands, with savannas ranking last. At 500-1500 m, croplands still show the best accuracy, whereas grasslands show the worst, and savannas show a smaller RMSE than forests. At higher elevations (2500-3500 m), only grasslands, savannas, and forests occur; forests perform best, followed by savannas, with grasslands ranking last. Over high mountain areas (3500-4500 m), savannas present the largest RMSE, followed by forest, while grasslands show the largest spatial variation within this range. Above 4500 m, only grasslands and the barren class occur, and grasslands show better accuracy than the barren class in terms of both the magnitude of RMSE and its spatial variation. We therefore conclude that the performance of AVHRR GAC snow over different land-cover types depends mainly on elevation. Accuracy is generally good in croplands because this class occurs only below 1500 m, and generally poor in the barren class because it occurs only above 3500 m. Forests and savannas show broadly comparable overall accuracy. Grasslands respond differently to elevation: their accuracy is worst at 2500-3500 m, comparable to that of the other land-cover types at relatively low elevations (< 1500 m), and better than that of the barren class at high elevations (> 3500 m).
Figure 10c clearly shows the effect of slope on the accuracy of the AVHRR GAC snow datasets: better results tend to occur over areas with gentler slopes, and the RMSE in each elevation range generally increases with slope. There are, however, two exceptions, at 1500-2500 m and in the extremely high areas (> 5000 m). In the former, the RMSEs for slopes of 25-35° are slightly larger than those for slopes of 35-45°. In the latter, RMSE does not increase when slope increases from 15-25° to 25-35°. This occurs because the accuracy of AVHRR GAC snow is affected by many factors. The overall effect of slope on snow mapping accuracy is understandable, since topographic effects tend to be strong in steep mountain terrain.
Regarding aspect (Fig. 10d), there is no clear trend in RMSE with aspect in the regions below 5000 m. Over areas higher than 5500 m, however, RMSE first clearly decreases and then clearly increases as aspect changes from north-facing to south-facing slopes, and vice versa. Moreover, the maximum RMSE reaches 70 % on south-facing slopes, larger than on north-facing slopes (∼ 40 %). This is because, during the winter months, south-facing slopes receive considerably more radiant energy, which is unfavorable for snow accumulation. Snow cover on south-facing slopes is therefore more likely to be shallow and patchy, which reduces the accuracy of the AVHRR GAC snow datasets.
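Slope and aspect are usually derived from a DEM. The sketch below uses simple central differences on a 3 × 3 elevation window; this is a generic technique, not necessarily the one used in this study (GIS tools commonly apply Horn's weighted 3 × 3 method instead), and the four-way north/east/south/west grouping is a hypothetical example of aspect classes.

```python
import math


def slope_aspect(window, cell_size):
    """Slope (degrees) and aspect (compass degrees, clockwise from north)
    at the centre of a 3 x 3 elevation window, via central differences.

    window: 3 rows of 3 elevations (m), row 0 = north, column 0 = west.
    cell_size: grid spacing (m), assumed equal in x and y.
    Returns (slope_deg, aspect_deg); aspect is None on flat ground.
    """
    dz_dx = (window[1][2] - window[1][0]) / (2 * cell_size)  # rise toward east
    dz_dy = (window[0][1] - window[2][1]) / (2 * cell_size)  # rise toward north
    slope_deg = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope_deg, None
    # A slope faces downhill, i.e. along (-dz_dx, -dz_dy) in (east, north) components.
    aspect_deg = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope_deg, aspect_deg


def facing(aspect_deg):
    """Hypothetical four-way aspect grouping (each class spans 90 degrees)."""
    if aspect_deg is None:
        return "flat"
    return ["north", "east", "south", "west"][int(((aspect_deg + 45.0) % 360.0) // 90.0)]
```

For example, terrain that rises toward the north (elevations 110, 105, and 100 m in the north, middle, and south rows, 10 m spacing) has a slope of about 26.6° and an aspect of 180°, i.e. it is a south-facing slope.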
Figure 10e shows that topographical variability is small over the low-elevation regions (< 500 m). In each elevation range, RMSE generally increases with topographical variability, indicating that this factor strongly affects the accuracy of the AVHRR GAC snow datasets. Rugged relief can cause shadowing, resulting in different degrees of surface information loss between high-resolution and coarse-resolution satellite data. Furthermore, the increase is more pronounced at higher elevations. There are also several exceptions that do not show a clear increasing trend. For instance, in the 2500-3500 m range, RMSE even decreases when topographical variability increases from 100-250 m to 250-350 m. This is because topographical variability is only one of the factors influencing the accuracy of AVHRR GAC snow.
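This excerpt does not define how topographical variability was computed. One common proxy, assumed here purely for illustration, is the standard deviation of fine-resolution DEM elevations within each coarse pixel; the sketch below computes it on a regular grid, and the function name and block size are hypothetical.

```python
import statistics


def block_relief(dem, block):
    """Hypothetical topographic-variability proxy: population standard
    deviation (m) of fine-resolution elevations inside each coarse pixel.

    dem: list of rows of elevations (m); both dimensions must be multiples of block.
    block: fine pixels per coarse pixel along each axis.
    """
    rows, cols = len(dem), len(dem[0])
    if rows % block or cols % block:
        raise ValueError("DEM size must be a multiple of the block size")
    relief = []
    for i in range(0, rows, block):
        relief.append([
            statistics.pstdev([dem[y][x]
                               for y in range(i, i + block)
                               for x in range(j, j + block)])
            for j in range(0, cols, block)
        ])
    return relief
```

A uniform plateau yields 0 m, whereas a coarse pixel that straddles a 400 m step in terrain yields 200 m, a value within the range of the variability classes discussed above.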
From the results above, we conclude that the accuracy of AVHRR GAC snow is closely related to elevation, slope, and topographical variability, and that the negative influence of these factors on snow mapping accuracy is stronger at high elevations. The effect of aspect is negligible below 5500 m, but above 5500 m accuracy first increases and then gradually decreases from north-facing to south-facing slopes, and vice versa. The effect of land-cover type on snow mapping accuracy is related to elevation.

Conclusions
In this study, the ESA CCI+ Snow project AVHRR GAC snow-cover extent product was evaluated using different reference datasets. In contrast to other AVHRR snow extent products, this dataset is designed to provide global snow extent with consistent performance across the whole suite of AVHRR sensors, which is considered a major step toward a detailed global snow climatology. The validation had two parts. First, more than 30 years of in situ measurements at 118 stations were used to assess sensor-to-sensor consistency. Second, medium- to high-resolution data (i.e., MODIS and Landsat snow) were used to provide broader spatial coverage and investigate the general performance of AVHRR GAC snow. Furthermore, an in-depth analysis was made over an area with a wide range of conditions (e.g., elevation, topography, and land cover) to explore the factors influencing the performance of AVHRR GAC snow.
Validated against in situ station observations, the AVHRR raw snow dataset achieved an overall ACC of about 94 %, the same as MOD10A1. The temporal filter slightly reduced the ACC of the AVHRR gap-filled snow dataset, to around 92 % overall. AVHRR GAC raw snow slightly underestimates snow cover, with a bias of 0.94. Based on the observations at all in situ sites, we obtain HSS = 0.34 for AVHRR GAC snow, which is comparable to that of MOD10A1 (HSS = 0.35). When validated against Landsat 5 TM images over the whole HKH region, the RMSE and R are 22.31 % and 0.82, respectively; for the highly variable sub-region P140-R40/41, AVHRR GAC snow presents RMSEs of 13.27 % and 16 % and mBias values of −5.83 % and −7.13 % for the raw and gap-filled datasets, respectively. Their consistency with Landsat snow is reduced there, with relatively low R values of 0.46 and 0.47 for the raw and gap-filled snow datasets, respectively.
Regarding the temporal consistency of the AVHRR GAC snow datasets, sensor-to-sensor consistency was found to differ slightly and unsystematically in ACC and Bias throughout the time series, while HSS shows a noteworthy slight but consistent increase. The different performance of the AVHRR GAC snow datasets in different months is mainly caused by the variable amount of snow. In particular, AVHRR GAC snow performs worst in January and best in October in terms of the magnitude of ACC and HSS; in terms of temporal stability, however, ACC is best in November and worst in January and December, while HSS is worst in December. The Bias results offer a different perspective on the performance of AVHRR GAC snow. It generally overestimates snow in February, March, October, and November, which is strongly linked to the patchiness of the snow cover that is not captured by the in situ data. By contrast, nearly unbiased estimates are likely in December and January, when snow cover is most continuous over large areas.
The validation against two independent reference datasets (i.e., in situ and Landsat) shows considerable spatial variability, indicating the effect of other factors (e.g., SD, FSC, land-cover type, elevation, slope, aspect, and topographical variability). Generally, in snow-covered situations, the accuracy of satellite snow datasets increases with increasing SD and FSC, whereas in snow-free conditions, accuracy decreases with increasing SD and FSC. Furthermore, the accuracy of AVHRR GAC snow is closely related to elevation, slope, and topographical variability, and the negative influence of these factors on snow mapping accuracy is stronger at high elevations. The RMSE in each elevation range generally increases with slope. The effect of aspect is negligible below 5500 m, but above 5500 m accuracy first increases and then gradually decreases from north-facing to south-facing slopes, and vice versa. The effect of land-cover type on snow mapping accuracy is related to elevation. Accuracy is generally good in croplands because this class occurs only below 1500 m, and generally poor in the barren class because it occurs only above 3500 m. Forests and savannas show broadly comparable overall accuracy. Grasslands respond differently to elevation: their accuracy is worst at 2500-3500 m, comparable to that of the other land-cover types at relatively low elevations (< 1500 m), and better than that of the barren class at high elevations (> 3500 m).
Relative to the MODIS snow products, AVHRR raw snow is comparable in terms of ACC and HSS but shows distinct advantages in terms of Bias. Its temporal and spatial behaviors differ. In the temporal dimension, the AVHRR snow datasets are more stable than the MODIS snow products in ACC but less stable in HSS; in the spatial dimension, the AVHRR snow datasets show spatial variability in ACC comparable to that of MODIS but smaller spatial variability in HSS and Bias. The absolute differences between the AVHRR GAC and MODIS snow datasets remain reasonable, with overall RMSEs of 12.8 % and 17.0 %, mBias values of 0.06 % and 0.94 %, and R values of 0.63 and 0.53 for the AVHRR raw and gap-filled snow datasets, respectively.
This study represents the first validation of the unique daily AVHRR GAC snow extent time series spanning nearly 4 decades over the HKH region.
Although the reference datasets (i.e., in situ sites and high-resolution satellite data) have their own limitations and flaws, our results still encourage the compilation of a consistent, complete, long time series snow extent dataset from historical AVHRR GAC data. This study characterizes the product performance with distinct accuracy parameters from different perspectives and thus contributes to the ongoing efforts to improve existing snow products by enhancing our knowledge of the thematic and absolute accuracy of current products.
Author contributions. XW was responsible for the main research ideas and writing the manuscript. KN, VP, CM, DM, and JW contributed to the data collection. SW contributed to the manuscript organization. All the authors thoroughly reviewed and edited this paper.
Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.