Surface composition of debris-covered glaciers across the Himalaya using spectral unmixing and multi-sensor imagery

The Hindu-Kush Himalaya mountain range is characterized by highly glacierized, complex, dynamic topography. The ablation area of these glaciers is often covered a highly heterogeneous debris cover mantle comprising ponds, steep and shallow slopes of various aspects, variable debris thickness and exposed ice cliffs. These surface elements are associated with differing ice 15 ablation rates, and understanding the composition of the glacier surface is essential for a proper understanding of glacier hydrology and glacier-related hazards. Here we use high-resolution Pleiades (2 m) and RapidEye imagery (5 m) combined with Landsat imagery (30 m) to estimate the composition of debris-covered glacier tongues across the Himalaya around the year 2015. We use linear spectral unmixing to map various types of debris, clean ice, supraglacial ponds and vegetation on debris-covered glaciers across the mountain range. We develop the spectral unmixing methods in the Khumbu region of eastern 20 Nepal, and then apply them over the entire Himalaya (a glacier area of 2,254 km2). This allowed us to convert 30 m fractional maps into finer classification maps and to estimate the composition of debris-covered glaciers at various spatial scales. Debriscovered glaciers across the mountain range comprised 2.1 % supraglacial ponds, 12.8 % dark debris, 60.9 % light debris and 4.5 % supra glacial vegetation, with negligible amounts of clean ice and clouds and unclassified areas. Supraglacial ponds were more prevalent in the monsoon-influenced central-eastern Himalaya (up to 4 % of the debris cover area) compared to the 25 monsoon-dry transition zone (only 0.3 %). The automated fractional supraglacial pond maps developed here serve to complement and improve the accuracy of existing regional lake datasets. They also provide a basis for exploring the turbidity of lakes and ponds as indicators of glacier change processes, and to monitor the evolution of ponds in the context of glacial hazards.


Introduction
High altitude glacierized environments of the Hindu-Kush Himalaya are characterized by complex, dynamic topography, notably the presence of a mantle of debris cover on the ablation areas of glaciers. This is caused by mass-wasting processes 35 such as rock fall and rockslides from the steep valley sides (Kirkbride, 2011;Shroder et al., 2000). This results in a highly heterogeneous surface, consisting of debris material of various lithologies and grain sizes (sand and silt to boulders), forming debris cones on variable but mostly shallow slopes. Some of the most notable features of such surfaces are the supraglacial ponds and exposed ice cliffs, which have gained interest in recent years for several reasons. First, they influence the surface energy receipts of the supraglacial debris surface and the efficiency with which atmospheric energy can be transferred to the 40 underlying ice and cause glacier ice ablation. Ice cliffs and supraglacial ponds, in particular, are local 'hot spots' for glacier downwasting due to enhanced heat absorption at the surface of these features (Ragettli et al., 2016;Miles et al., 2016;Sakai et al., 2000). Understanding their distribution and dynamics on the glacier surface is essential for a proper understanding of glacier hydrology, notably to simulate ablation rates and meltwater production using a variety of ice melt models (Reid and Brock, 2010;Foster et al., 2012;Miles et al., 2020). Second, the current distribution and fluctuation of pro-glacial lakes as 45 well as and supraglacial pond extents is also of interest for assessing glacier-related hazards. Recent studies have reported an increase in pro-and supraglacial lake area and number in the Himalaya and worldwide as a response to climatic changes (Shugar et al., 2020a;Nie et al., 2017;Shukla et al., 2018). Some of the supraglacial ponds have been shown to coalesce and form larger supraglacial lakes, or to evolve into fully-formed pro-glacial ice or moraine-dammed lakes. Moraine breaching "super-resolution mapping" include sub-pixel classification, neural network approaches and pixel swapping. Initially introduced by Atkinson (1997;2004) and Foody (2004), such techniques allow analysis at sub-pixel scales, and are of interest for investigating the small-scale variability of surface properties of debris-covered glaciers. Spectral unmixing routines have 100 been used in glaciology to some extent, i.e. for retrieval of snow grain size or to derive fractional snow covered areas from MODIS or Landsat (Painter, 2003;Painter et al., 2009;Sirguey et al., 2009;Veganzones et al., 2014;Rosenthal and Dozier, 1996), to map clean glacier areas or snow (Painter et al., 2012;Cortés et al., 2014), lakes (Zhang et al., 2004) and vegetation (Ettritch et al., 2018;Song, 2005;Xie et al., 2008). However, their use for mapping the surface of debris-covered glacier tongues has not been thoroughly exploited. A couple of studies used spectral analysis to characterize the mineral composition 105 of debris covered glaciers in the Khumbu region of Nepal Himalaya (Casey and Kääb, 2012;Casey et al., 2011) or to characterize lake colour, turbidity and suspended solids in the same area (Matta et al., 2017;Giardino et al., 2010). Scherler et al. (2018b) tested various methods, i.e. a fractional debris cover algorithm and standardized band ratios to produce three versions of a global debris cover map (Scherler et al., 2018a) on the basis of Randolph Glacier Inventory (RGI) version 6.0 (RGI_Consortium 2017). Recent studies (i.e. Xie et al., 2020a;Xie et al., 2020b;Wangchuk and Bolch, 2020) demonstrated 110 the potential of innovative machine learning techniques combined with freely available data such as Landsat to quantify the surface features of debris-covered glaciers. Applying such methods at large scales allow gaining a better understanding of the response of debris-covered glaciers to climate variability in terms of water resources and hazards.
In this study, we evaluate and explore the potential of spectral unmixing techniques applied to 30 m Landsat imagery to quantify the surface characteristics and the composition of debris-covered glaciers, with a particular emphasis on supraglacial 115 pond coverage. We use high-resolution multi-spectral Pleiades and RapidEye imagery (2 m and 5 m, respectively) to define the materials of interest and to extract their spectral signatures. We teste and validate our methods on Khumbu region in the eastern Himalaya and upscale them to the entire mountain range. Specifically, the objectives of this study are to: (1) Test spectral unmixing techniques for mapping of debris-covered glaciers and their surface features; (2) Estimate the composition of debris-covered tongues in terms of exposed ice, debris material, supraglacial ponds 120 and vegetation at regional and mountain-range scale; (3) Automatically map supraglacial ponds and assess their regional distribution with respect to topographic characteristics and geographic location.
The application of sub-pixel approaches to mapping supraglacial features are rather novel, and may serve to fill the gap in the supraglacial pond coverage in the current global debris cover glacier inventories or to provide data for glacier-scale studies. 125 We aim at developing a robust, standardized method that can be transferred from commercial software to open-source software to upscale these characteristics to mountain-range scale and high-resolution temporal steps.
Our study area comprises two spatial domains (Fig. 1). The smaller domain, used as test area, comprises the glacierized Khumbu region of Nepal and glaciers north of the divide in China, both of which are part of the larger trans-boundary Sapta Kosi river basin. For simplicity we refer to this region hereinafter as the "Khumbu domain". The Khumbu region has been well studied in the last decades in terms of glacier mass balance using the traditional glaciologic method (Wagnon et al., 2013), the geodetic method (Bolch et al., 2008;Nuimura et al., 2012;Brun et al., 2017;Bolch et al., 2011;Rieg et al., 2018), energy 135 balance models (Rounce and McKinney, 2014;Rounce et al., 2015;Kayastha et al., 2000), debris cover characteristics (Iwata et al., 1980;Watanabe et al., 1986;Nakawo et al., 1999;Iwata et al., 2000;Casey et al., 2011;Yukari et al., 2000), surface velocity (Quincey et al., 2009) and more recently mapping of supraglacial lakes and ice cliffs at glacier scale (Watson et al., 2016(Watson et al., , 2017a. Rates of change of the debris-covered glacier tongues in this area vary from −0.12 ± 0.05 % per year from 1962 to 2005 (Bolch et al., 2008) to −0.27 ± 0.06 % per year from 1962 to 2011 (Thakuri et al., 2014). Watson et al. (2017a) reported 140 a pond coverage area of 1 -7 % of the glacierized area for some glaciers in the Khumbu based on high resolution Pleiades data. Salerno et al (2012) reported an area of 1.49 km 2 in the Khumbu in 2008 based on analysis of Landsat imagery, which represents 0.42 % of the mapped glacier area in their study. Ice cliffs are estimated to cover about 7 -8 % of the glacier area, for example for Changri Nup glacier in the Khumbu (Brun et al., 2018). The larger Himalaya domain is defined here as the region spanning from 76.3 to 92.6° W and 26.3 to 34.2° N, covering most area from Himachal/Jammu and Kashmir border to Bhutan Himalaya (Fig. 1). Himalayan glaciers have been in a state of negative mass balance in the last decades, and the negative trends have accelerated in the 2000 to 2010 decade (Bolch et al., 2019;Brun et al., 2017;Kääb et al., 2012). We upscaled our methods over this entire region and then selected three areas for a closer comparison of the composition of debris-covered glacier tongues in three regions from west to east, following climatic 150 patterns. The Lahaul Spiti region in the western Himalaya region (labelled "A" on Figure 1) is situated in the "monsoon-arid" transition zone, which receives monsoon precipitation during the summer and precipitation from the Westerlies in the winter (Thayyen and Gergan, 2010). The central and eastern Himalaya regions ("B" and "C" on Figure 1) are situated under the influence of the Indian Summer Monsoon, which brings large amounts of precipitation during the summer months (June to September) especially in the eastern part (Barros and Lang, 2003;Bookhagen and Burbank, 2006). In the central-eastern 155 regions, ablation and accumulation both occur during the summer season (Thayyen and Gergan, 2010).

Remote sensing data
We used multi-sensor data at various spatial and spectral resolutions, as listed in Table 1: Landsat-8 Operational Land Imager (OLI) (30 m), Pleiades (2 m) and RapidEye (5 m). The Landsat OLI sensor has been acquiring data at 30 m spatial resolution in the visible to short-wave infrared since 2013 (USGS, 2016). Sentinel-2 data were not available in this area for 160 the date/year of the acquisition of the Pleiades data (2015), so we developed our method using Landsat data. The estimated global geolocation accuracy of Landsat OLI data is estimated at 50 m; its spectral band width, calibration, geometry and radiometric resolution are superior to previous Landsat missions (Irons et al., 2012). We obtained level L1TP -T1 data, which consists of top of atmosphere registered, radiometrically calibrated and orthorectified imagery (Wulder et al., 2019;USGS, 2015). A total of 12 Landsat images were need to cover the entire Himalaya domain for the year 2015. Cloud-free 165 images were selected to match the acquisition date of Pleiades data as closely as possible. For two regions in the eastern part where cloud cover was present in 2015, we used images around the same time of the year (September to October) in 2014 and 2016 (Table 1). The Landsat scene from Khumbu (September 30, 2015) ( Fig. 1) was used as reference to develop and test our methods. Table 1 here 170 The Pleiades 1A satellite sensor, launched in 2011, has been acquiring tri-stereo high-resolution data at 0.5 m spatial resolution in the panchromatic band and 2 m in the multispectral bands (blue to near-infrared), with 20 km image swath at nadir. Multiple images per acquisition were needed to cover the extent of the Khumbu area (Rieg et al., 2018) (Fig. 1). Three Pleiades scenes from the 2015 fall acquisition (Oct 7, 19 and 20) covered the north, northeast and southeast part of Khumbu and were the closest in date to the Landsat (Sept 30). These scenes were cloud-free and snow-free. They were provided as 175 three sets of triplets of primary data (1A), and were orthorectified in ERDAS Imagine Photogrammetry Suite using the Pleiades Rational Polynomial Coefficient model and the Pleiades DEM (1 m) generated using semi global matching (Rieg et al., 2018). The individual image parts were mosaicked to a single image using ERDAS Imagine using a nearest neighbour resampling method at 2 m spatial resolution. False colour composites of Pleiades bands 4,3 and 2 were used to extract spectral signatures for the surfaces of interest from Khumbu in the spectral unmixing procedure (section 2.5.2). For areas in 180 Khumbu located outside the Pleiades extent, we used a RapidEye scene from the Planet satellite from Oct 9 th , 2015, namely a Level 3A analytic ortho tile. This product consists in orthorectified, surface reflectance data at 5 m spatial resolution and five multispectral bands. The RapidEye tiles were mosaicked using nearest neighbour resampling method and were used along with Pleiades for endmember selection (section 2.5.2), as well as for validating the supraglacial pond maps in the Khumbu region. 185

Elevation data
We obtained and evaluated two DEMs : the Shuttle Radar Topography Mission (SRTM) version 3.0 (1 arc second, ~30 m) (NASA-JPL, 2013) and the ALOS Global Digital Surface Model (AW3D30) version 2.2 (30 m) (JAXA, 2019). Their vertical accuracies in the Himalaya were reported as 23.5 m (Mukul et al., 2017) and 2 m (Tadono et al., 2014), respectively. The ALOS tiles were mosaicked using bilinear interpolation to create a seamless DEM covering the entire Himalaya domain. The 190 ALOS DEM provided better vertical accuracy, better shadow rendering and less artefacts due to data voids compared to SRTM and was therefore chosen for this study. We extracted surface slope, minimum elevation and elevation range over the debris https://doi.org/10.5194/tc-2020-372 Preprint. Discussion started: 8 January 2021 c Author(s) 2021. CC BY 4.0 License. covered areas using zonal analysis and investigated the possible dependency of supraglacial features on these topographic variables.

2.4
Image pre-processing 195 Atmospheric effects due to scattering or absorption from atmospheric gases, aerosols and clouds result in attenuation, brightness modifications, or altered spatial distribution of the reflected solar radiation at visible wavelengths of interest (0.4 -2.5 μm) (Kaufman, 1989). Sophisticated atmospheric correction models require climatic inputs such as ground measurements of aerosol or temperature, which are not easily available in high mountains. Furthermore, topographic effects caused by differential solar illumination of the earth's surface in complex terrain result in widely varying spectral responses which affect 200 image processing operations, including sub-pixel routines. Various topographic normalization algorithms exist, including the cosine function and the Minaert correction reviewed previously by others (Richter, 1997), and most recently for the Himalaya (Mishra et al., 2010).
We performed atmospheric and topographic corrections of the Landsat scenes using the open-source Atmospheric and Radiometric Correction of Satellite Imagery (ARCSI) routine (Wilson, 2013). This is based on the 6S algorithm (Vermote et 205 al., 1997) and corrects for the spectral variation of surfaces and illumination effects using physically-based illumination and reflectance corrections (Shepherd and Dymond, 2003), a standard atmospheric profile, an Aerosol Optical Thickness (AOT) value and sensor geometry. The AOT is commonly derived by a numerical inversion the surface reflectance on an image-basis using the simple dark object subtraction technique from the blue band. For our Sept 30, 2015 scene from the Khumbu, this yielded an AOT of 0.05. To validate the performance of this technique for atmospheric profile representation in our area, we 210 compared the estimated AOT with data from AERONET (https://aeronet.gsfc.nasa.gov/) and Copernicus Atmospheric Monitoring Service (CAMS) (https://atmosphere.copernicus.eu/catalogue#/) for the same date and time as the reference scene.
AERONET provides three quality levels (Levels 1.0, 1.5, 2.0) for aerosol size (Giles et al., 2019); we used 1.5 level data for 550 m for the three sites in eastern Nepal located within our study area. The CAMS database provides daily forecast global reanalysis of total optical depth at multiple wavelengths. The AOT values calculated from AERONET versus CAMS (based 215 on the average of the two cells where the Khumbu area is situated) were 0.07 and 0.05, respectively, which was consistent with the value obtained using the dark subtraction method (0.05). Based on this, we performed the atmospheric corrections in automatically for each scene using the method implemented in ARCSI.

Background and set-up 220
Spectral unmixing techniques rely on the basic idea that any pixel in an image represents an average of the various materials on the ground. These techniques aim at decomposing a given pixel, generally at medium-low spatial resolution (> 30 m) into the constituting materials in order to improve mapping accuracy (Keshava and Mustard, 2002). In the case of debris-covered glaciers tongues, a "mixed pixel" is typically a combination of debris, ice cliffs, and/or supraglacial ponds and vegetation in various proportions (Rounce et al., 2018). Spectral unmixing yields the percent occurrence (or so-called "fractional 225 abundance") of each distinct type of material present in a pixel (Keshava and Mustard, 2002). Materials are selected based on their characteristic, distinct spectral signatures, and are referred to as "pure" endmembers (Painter et al., 2009;Keshava and Mustard, 2002). The relationship between the fractional abundance of each material and its spectra can be defined either as linear or non-linear. Linear mixing models assume that the spectra of a given pixel is a linear combination of the spectral reflectance of the distinct constituent materials, and is the most widely used to distinguish vegetation, rock, or different snow 230 grain sizes (Painter et al., 2009). Nonlinear mixing models take into account multiple scattering between surfaces, and are useful when the different materials are in close association, for example vegetation canopy height, or particulate mineral mixtures (Keshava and Mustard, 2002). While non-linear models are more realistic, they are also are less well developed in the literature and were not considered here.
To yield physically meaningful results, fractions obtained from spectral unmixing routines should comply with two major 235 constraints: (a) the non-negativity (or positivity) constraint (i.e. fractions should not be negative) and (b) the sum-to-unity (i.e. for each pixel, fractions should add up to 1) (Keshava and Mustard, 2002). The non-negativity condition is recommended because negative reflectance values have no physical meaning, or for variable illumination conditions. The sum-to-unity constraint is recommended when very dark endmembers such as shadows are targeted or for unmixing radiance or thermal IR emissivity. Models which comply with both conditions are called "fully-constrained models" but are difficult to implement 240 as they require perfect knowledge of the system. Moreover, the constrained models have been shown to produce unrealistic fractions in poorly defined areas or areas of low illumination (Cortés et al., 2014). In this study, we focused on two implementations of linear mixing models (denoted hereinafter as 'LMM') available in the ENVI software, namely a partial constraint (sum-to-one) and an unconstrained model, applied over the supraglacial debris cover only. For the Khumbu domain, we also tested the ENVI implementation of the Spectral Angle Mapper (SAM), which is an automated, physically-based 245 spectral classification that matches each pixel to a reference spectra based on an the spectral angle between them (Kruse et al., 1993). This method was used in the Alps with good results (Naegeli et al., 2015). The SAM method was relevant here because it is relatively insensitive to illumination and albedo effects when used on reflectance data (Kruse et al., 1993), as is our case.

Endmember selection and spectral signatures 250
The selection of endmembers is crucial for the accuracy and reliability of the spectral unmixing (Song, 2005). Prior to selecting endmembers, we performed a forward Minimum Noise Fraction Transform on the Khumbu Landsat image to determine the dimensionality of the data. We extracted six endmembers using the pixel purity index routine in ENVI. These were checked and refined using the n-D visualizer until they were statistically separable based on the Jeffries-Matusita and Transformed Divergence separability measures (values > 1.9-2.0). We tested these against the automated Sequential Maximum Angle 255 Convex Cone routine in ENVI, which is an unsupervised, iterative algorithm for defining endmembers from spectral data https://doi.org/10.5194/tc-2020-372 Preprint. Discussion started: 8 January 2021 c Author(s) 2021. CC BY 4.0 License. (Gruninger et al., 2004). We combined this iterative process with a-priori knowledge about types of surfaces on the supraglacial debris. We chose six endmembers, namely clean ice, dry vegetation, clouds, light and dark debris and turbid water, which we considered to be representative of the physical components of the glacier surface (Fig. 2). While previous studies (Painter, 2003;Painter et al., 2009;Rittger et al., 2013) only used three or four endmembers for unmixing Landsat data, 260 we considered it necessary to include the cloud and clean ice endmembers ice in order to minimize model errors. We based this on that fact that images were mostly cloud free, and we would not expect clean ice to occur in large quantities on the debris surface. However, we performed model runs with 3, 4, respectively 5, endmembers to test this assumption. The endmember areas were carefully defined as single Landsat pixels to minimize the occurrence of "mixed pixels", and their location was checked against colour composites of the Pleiades image (bands 4,3,2). 265

Fig. 2 here
Spectral signatures of endmembers were extracted from Landsat bands 1 to 7 (Fig. 3a). For the two debris classes, we visually compared the spectral signatures of dark and light debris endmembers with spectra previously acquired in the field on Mer de Glace in the French Alps using an SVC HR-1024 spectrometer (350 nm to 2500 nm) (Racoviteanu and Arnaud, 2013), as well as with supraglacial debris spectra from the Swiss Alps (Naegeli et al., 2015;Naegeli et al., 2017) (Fig. 3b). 270 For the turbid water endmember, the spectra defined here corresponds well with field-based spectra for other turbid lakes in the Khumbu, such as Chola Lake, reported in other studies (Matta et al., 2017).

Supraglacial debris cover masking
We used the database of global distribution of supraglacial debris cover (SDC v.1) (Scherler et al., 2018a), referred hereinafter 275 as the "SDC dataset". These debris cover outlines were derived from either Landsat or Sentinel-2 using three different algorithms, on the basis of the Randolph Glacier Inventory (RGI v.6) (RGI_Consortium, 2017), as described in Scherler et al. (2018b). The Landsat-based extents issued from single band ratios agreed best with our previous debris-cover estimates based on manual digitization in terms of debris cover representation (Racoviteanu et al., 2013). We constrained our spectral unmixing analysis over these SDC outlines in order to reduce model complexity. Glaciers in this dataset span the period 1998-2001 for 280 the central-eastern Himalaya, 2002 for the western (monsoon-dry transition zone) and mostly 2010 for China. We assumed that any changes in the debris-covered glacier areas from 1998 to 2015 would have occurred within these older outlines, since surge-type glaciers, and hence apparent glacier "advance" are rare or non-existent in the Himalaya region, contrary to the Karakoram (Sevestre and Benn, 2015). SDC outlines required pre-processing because supraglacial ponds were represented as "holes" in this dataset, and some contained geometry errors due to occurrence of nunataks. Since water was one of our target 285 classes, we "filled" the holes and automatically checked and repaired the geometry of the SDC polygons. For the test Khumbu area, we removed supraglacial debris polygons with area less than 0.01 km 2 , which proved to be erroneous areas upon visual https://doi.org/10.5194/tc-2020-372 Preprint. Discussion started: 8 January 2021 c Author(s) 2021. CC BY 4.0 License. examination, i.e., sliver polygons or isolated bare land pixels. These are unwanted small polygons that typically result from polygon overlays; they do not represent a physical entity on the ground and can be removed in a semi-automated manner (Delafontaine et al., 2009). The repaired SDC outlines were rasterized and used as a mask in the spectral unmixing process. 290

Surface classification and composition from fractional maps
SAM routines yield a set of greyscale rule images (one image per class), in which each pixel value represents the spectral angle (in radians) between the reference spectrum and the spectrum of each class. For the SAM routine, we combined the light and dark debris into one class. We tested various thresholds for the maximum spectral angles (0.1 to 0.2), and evaluated the 295 resulting classified image based on the spectral similarity of each class to reference spectra.
The LMM routines result in a multi-band raster containing pixel-by-pixel fractional cover values for each class, which ideally range from 0 to 1. However, negative fractions can occur when endmembers are missing or are poorly defined, although they do not necessarily affect the ability to discriminate between surfaces (Klein and Isacks, 1999). In case of negative values, we inferred the absence of the material in that pixel upon examining the rasters, and we forced these negative values to zero. We 300 then normalized values greater than 1 on a by dividing each endmember fraction by the sum of the endmembers, so that the sum of the fractions of the various materials in each pixel added up to 1. This is a common procedure suggested by previous studies (Rosenthal and Dozier, 1996;Quintano et al., 2012;Cortés et al., 2014). To classify the fractional maps, we applied a threshold α (defined based on visual interpretation of the fractional maps) to produce binary maps for each class. The userdefined minimum threshold used in other studies such as snow mapping was α = 0.4 or 0.5, i.e. a pixel was assigned to the 305 snow class if it contained a fraction of 40 -50 % to 100 % snow (Hall, 2002;Rittger et al., 2013). Since any pixel contains a mixture of materials in various proportions, the thresholds varied by class and were adjusted carefully based on visual interpretation and validation with the Pleiades image. We also developed and tested an automated binary method in which each pixel was assigned to the class for which it had the highest fractional cover.
We converted these lower -resolution (30 m) fractional maps into a finer classification map to estimate the composition of 310 debris-covered glaciers at various spatial scales. We first estimated the composition for seven debris-covered glaciers from the Khumbu domain, which have been well studied in the past (Casey and Kääb, 2012;Watson et al., 2017). Using the spectra and spectral unmixing parameters from the Khumbu, we upscaled the composition to the entire supraglacial debris cover from the SDC v.1 dataset for the Himalaya spatial domain. This allowed us to compare and contrast the composition of debris-covered tongues in three climatically distinct regions, ranging from west to east: Lahaul Spiti, Khumbu and Bhutan 315 domains (Fig. 1 inset). All the post-spectral unmixing geospatial analysis steps were implemented in ArcPython scripting to allow for loop processing of multiple images and various spatial extents.

Accuracy assessment and validation
We first assessed the accuracy of the spectral unmixing routines qualitatively, based on visual interpretation and comparison with the high-resolution Pleiades. A more thorough accuracy analysis was performed using established quantitative measures 320 (root mean square error -RMSE, fraction value abnormalities, residual values and confusion matrix) (Gillespie et al., 1990).
Residuals were output from unmixing runs on a pixel-by-pixel basis as a separate band. We used this residual map to identify poorly defined endmembers and optimized the LMM by refining these iteratively until we obtained the lowest model RMSE.
To assess the ground accuracy of the final LMM, we digitized 151 test pixels (10 -38 pixels per class) on false colour composites of Pleiades and RapidEye images. These were well distributed on several debris-covered tongues in the Khumbu, 325 and were taken to represent "ground truth". These locations were compared to the predicted classes in the confusion matrix for the abundance maps. To further validate the LMM performance for water detection, we mapped the supraglacial ponds on seven debris-covered glacier tongues in the Khumbu from the October 2015 Pleiades scene using OBIA methods implemented in the ENVI Feature Extraction module (namely a combination of spectral mean and edge detection). The resulting ponds were manually inspected and corrected in shaded areas beneath ice cliffs following similar procedures as those described in Watson 330 et al. (2017a). We also visually and quantitively compared the "pond" pixels resulting from the LMM against Pleiades colour composites (bands 4,3,2). The date of the Pleiades pond reference dataset for the Khumbu glacier is one week apart from to the date of the Landsat scene but we assume minimal lateral expansion might have occurred between the two dates, as discussed by Watson et al (2018).

SAM classification results in the Khumbu domain for the entire scene
Here we present the classification map for the five surfaces obtained with a maximum spectral angle of 0.15 (i.e. pixels with spectral angle > 0.15 were not classified) (Fig. 4). On Figure 4, we note that SAM classifier performed well for mapping snow and ice and identifying clouds. The debris-covered tongues were also identified roughly, although they contained some extra bare terrain and some water pixels were confused with debris. Supraglacial vegetation was detected at the edges and 340 termini of debris-covered glaciers especially in the eastern part of the domain. This represented mostly dry vegetation (with higher reflectance in the visible and NIR bands compared to green vegetation). Healthy green vegetation present outside the glacier outlines at lower altitudes was not detected by the SAM, due to its distinct spectral signature (i.e., low reflectance in the blue and red bands due to absorption by chlorophyll and high reflectance in the green bands). Some turbid pro-glacial ice-contact lakes (e.g., Imja Lake) were well classified, but others such as the darker blue, less turbid, Gokyo Lakes and most 345 of the supraglacial lakes were not detected. Some lakes are mostly composed of turbid water, with a higher reflectance in the red bands due to the presence of suspended sediments, and hence a higher reflectance overall than that of clear water. The SAM results were useful here to test the endmember choices and fine tune them for the LMM routines. For example, some of the missing parts in the centre of debris-covered tongues in the northern part of the area (China) as well as Ngozumpa Glacier indicate the need for two debris endmembers to capture the variability in the surface reflectance of debris with 350 varying lithology. Another advantage of the SAM is that is also provides unclassified pixels separately, as well as masked (shadow) pixels as separate classes. The SAM method is presented here only as an additional verification on the endmembers chosen for the subsequent LMM runs.

Linear mixing model results for the Khumbu domain 355
In this study, we attempted both an unconstrained and partially constrained (sum-to-one) LMM implementations in ENVI.
Despite the non-negativity constraint in the latter, both models yielded negative abundances for all classes, with larger positive values (> 3), especially for the vegetation class as discussed below. While a model is considered "valid" when class abundance < 1.01 (strict constraint rules) or < 2.01 (looser rules) and RMS < 2.5% (Painter et al., 2009), larger positive values for some classes are to be expected when using limited spectral resolution data (Landsat, 7 bands for unmixing) compared to 360 hyperspectral data (for example AVIRIS, 224 bands) used more commonly (Painter et al., 1998). In terms of model performance, the unconstrained model run had a lower average root mean square error (RMS of 0.6 %) compared to the partially constrained model run (RMS of 1.5 %), so the unconstrained model results were presented in this paper. Here we present results for the unconstrained model, which had a lower average RMSE (0.6 %) than the partially constrained model (1.5 %) (section 4.4). 365 Examples of the normalized fractional maps for each of the six material types in the test Khumbu domain are presented in patches at the terminus of glaciers in China (Fig. 5). Pixels with abnormally high positive fractional vegetation values were 370 found in areas of healthy green vegetation and/or bare terrain, which should not be part of the debris-covered tongues, as will be discussed later (section 3.4). Supraglacial ponds were mapped by thresholding the fractional water raster (fractional water > 0.5), which automatically mapped most ponds. These are well visible for example at the termini of Ngozumpa and Khumbu Glaciers (Fig. 5). Clouds were mapped using a threshold of 0.45, but these were often mixed with light or dark debris. For clean ice, fractional values mostly ranged from 0 (areas which might have some degree of dirty, dark ice with a lower albedo) 375 to 1 (clean ice). A fractional ice threshold of 0.20 was useful to classify these ice surfaces.

Accuracy of the fractional cover maps
The overall accuracy of the LMM-based classification was 0.75, with a Kappa coefficient of 0.69 (Table 2). The confusion 380 matrix presented in Table 2 shows that dark debris was 100 % classified with respect to the reference pixels. Dry vegetation and water were also classified with a high accuracy (> 83 %), and light debris was classified at 70 %, as some light debris pixels were confounded with vegetation. Clouds were only classified with 50 % overall accuracy, and these were only present over small areas. Clean ice was the most poorly classified with an accuracy of 7.1 %. Considering its distinct spectral signature from other classes (Fig. 3), we attribute this to the poor representation of the clean ice class on the debris-covered 385 tongues, because, in addition to there being only a very small horizontal surface area of exposed ice, it tends to also be heavily dusted with fine debris.
In Table 2 we also present accuracy measures for each class as omission errors / producer's accuracy (i.e., percentage of pixels that were omitted from each class) and commission errors/ user's accuracy (i.e., percentage pixels that belong to a category but are mistakenly assigned to another category). The dark debris class had the highest (100 %) producer's accuracy (all 390 reference pixels on the ground were correctly classified) but lower (56 %) user's accuracy (Table 2). Water and vegetation classes had high producer's and user's accuracy (> 83 % and > 93 % respectively), i.e., most reference pixels in these classes were correctly classified, with a high probability that pixels assigned to these two classes contain some amount of the respective surfaces on the ground. Similarly, light debris had a producer's and user's accuracy > 70%. Clean ice pixels, on the contrary, were not well classified, as 93 % of the ice pixels were omitted (producer's accuracy = 7 %). Clouds were classified at 50 % 395 producer's accuracy, as 38 % pixels from other classes were mistakenly assigned to this class. Based on the confusion matrix, we note that overall, the LMM most accurately classified the water, vegetation, and debris classes.

Composition of the debris-cover glacier tongues at multi -scale
For the seven debris-covered glacier tongues in the Khumbu (Table 3) the most prevalent material in this subset region, as 400 expected, is debris (dark and light), with an average of 53.7 % and 43.6 % of the debris-covered tongues, respectively. At regional scales, debris-cover glacier tongues in the Lahaul Spiti region are composed of 13.1 % dark debris and 74.4 % light debris, and in the Bhutan domain they are composed of 15.0 % and 64.1 % light debris (Table 4). Overall, light debris is prevalent over the entire Himalayan domain, comprising almost three times the amount of dark debris (60.9 % vs. 23.8 %, respectively) (Table 4). 405 Table 3 here Table 4 here The dark and light debris areas exhibit variable distribution patterns by glacier (Fig. 6). For example, the debris-covered tongue of Nuptse Glacier in Khumku is mostly covered by light debris (> 95 % of its area), while the opposite is true for Lhotse Glacier, which is mostly composed of dark debris (> 91 %) (Table 3). Other glaciers in the eastern part of Khumbu 410 exhibit alternating bands of light and dark debris, with darker bands correctly representing medial moraines (Fig. 6).
Glaciers in the western part of the Himalaya are mostly composed of supraglacial light debris.

Fig. 6 here
Exposed ice was detected in small quantities on the seven debris-covered glacier tongues in the Khumbu, ranging from 0.2 % (Lhotse) to a 1.4 % (Changri Nup) with an average of 0.6 % of the debris-covered part. A total of 126.5 km 2 of ice (5.6 % of 415 the debris area) was detected over the entire Himalaya (Table 4). Some ice pixels were detected at ice cliffs (Fig. 7), but since these are covered with a fine layer of darker particles, they have a lower albedo and lower spectra than that of clean ice (Naegeli et al., 2015), they cannot be properly identified using this endmember.

Fig. 7 here
Clouds were not present over the seven tongues in the Khumbu. While some isolated pixels were classified as clouds, this part 420 of the image is cloud free, and some cloud pixels were confounded with other types of surfaces, notably debris (Table 2). Over the entire range, we mapped 45 km 2 of cloud coverage, corresponding to 2.0 % of the supraglacial debris cover, with less coverage in Lahaul Spiti and Khumbu (1.6 % and 0.6 % of the supraglacial debris cover, respectively) compared to the Bhutan domain (6%). While these patterns may be explained by climatic patterns, i.e. more frequent cover in the eastern Himalaya in September where glaciers in the eastern Himalaya continue to experience precipitation and more frequent cloud cover until 425 late fall months (Thayyen and Gergan, 2010), we are less confident in the actual estimations of the cloud cover areas.
Patches of supraglacial vegetation were detected over debris-covered tongues, ranging from ~0 % (Lhotse Nup Glacier) to 1.6 % (Gaunara Glacier), with an average of 0.5 % over the seven tongues. The higher percentage for the latter is most likely due to some debris and ice pixels mistakenly assigned to the vegetation class (Table 2). At the scale of the Khumbu domain, a closer look at these highly "vegetated" pixels revealed some interesting findings. Some glaciers, for example Khazenpu Glacier 430 in China, exhibit supraglacial vegetation which was correctly mapped using the spectral unmixing (Fig. 8a). On other glaciers, for example Labeilong Glacier in the same area, besides correctly mapped supraglacial vegetation, we also detected large patches of vegetation which actually represent non-glacierized bare land and/or healthy vegetation outside the debris-covered glacier outlines, but these were included in the SDC v.1 dataset (Fig. 8b). Such spurious vegetated areas present withing the debris cover outlines may have affected to some extent the quality of the spectral unmixing, particularly the non-negativity 435 condition, because it produced large negative and positive fractional vegetation values. The most obvious areas were corrected manually for Khumbu, but no corrections were applied when applying the unmixing over the entire Himalaya domain. Overall, supraglacial vegetation covered 4.5 % of the debris-covered glacier surface in the Himalaya, with less supraglacial vegetation in the western part (Lahaul Spiti, 1.6 % of the debris cover) compared to the central and eastern Himalaya (Khumbu and Bhutan domains, ~3 %). 440

Fig. 8 here
The fractional supraglacial pond area in the Khumbu in 2015 ranged from 0.9 % (Lhotse and Nuptse Glaciers) to ~ 3 % of the debris-covered glacier area (Ngozumpa and Khumbu Glaciers), with an average of 1.6 % over the seven debris-covered glacier tongues (Table 3). The larger fractional water coverage for the latter two may be explained by the presence of large supraglacial ponds at the terminus of these two glaciers (Fig. 6). We obtained a supraglacial pond area of 47 km 2 over the 445 entire Himalaya (2.1 % of the total supraglacial debris cover area), with regional variability from west to east (Table 4).

Supraglacial pond validation
To validate the subpixel supraglacial pond classification, we mapped pond extents from the high-resolution Pleiades image using OBIA image segmentation for the seven reference glacier tongues in the Khumbu region. We compared outputs of various manual thresholds, and a binary automated method for classifying supraglacial ponds from the fractional maps (as 450 described in section 2.5.4) against the Pleiades-derived pond areas. A threshold of 0.5 applied to the water class (fractional water > 0.5 = supraglacial ponds) yielded the best agreement with the lake areas obtained from the high-resolution Pleiades image (1.0 km 2 compared to 1.1 km 2 for the total coverage, respectively) ( Table 5). This is in agreement with other spectral unmixing studies (Hall, 2002;Rittger et al., 2013), which suggested using a threshold of 0.5 for the fractional maps to map seasonal snow. 455 Table 5 here In Table 6 we compare the pond coverage obtained using a threshold of 0.5 and an automated binary classification method (i.e., based on the maximum fractional values) against the Pleiades pond areas. When using the automated approach, the total supraglacial pond coverage was over-estimated by 64 % compared to the Pleiades pond area. For the Khumbu glacier, for example, Landsat spectral unmixing yielded a total binary pond area of 0.20 km 2 , in agreement with the area estimated from 460 OBIA analysis using Pleiades (0.23 km 2 ). This is also consistent with the area estimated by Watson  based on similar methods using the same Pleiades image (Oct 7, 2015) ( Table 6). The automated maximum fraction method yielded 0.27 km 2 for the pond area. For the pond area, the area difference between the thresholded and the manual method is small (0.07 km 2 ), but this has to be however considered when upscaling such results to larger areas. Table 6 here 465 Figure 9 shows the supraglacial pond maps for three glacier termini in the Khumbu domain, compared to the OBIA Pleiades ponds. There is good agreement for the pond coverage for the Ngozumpa and Khumbu Glaciers. For Lhotse Glacier, the supraglacial pond area is slightly under-estimated compared to Pleiades, as noted in Table 6. This may perhaps be due to the predominant darker debris of this glacier, some of which was confused with water, as shown in the confusion matrix ( Table   2). 470

Debris type
The composition of debris-covered tongues presented in this study shows that the light debris is prevalent over the entire 475 Himalayan domain, comprising almost three times the amount of dark debris (60.9 % vs. 23.8 % of the debris cover area, respectively). Overall, light debris material tends to dominate particularly in the western parts of the range, which presumably reflects the nature of the underlying bedrock geology (Searle et al., 1987). Sub pixel routines tested here capture the variability in the geological material present on the surface of the debris-covered glaciers in the Himalaya. With regards to the on-the ground spectral characteristics of the debris material in the Khumbu region, Casey et al. (2011) showed that these vary due to 480 the presence of various of minerals, notably distinct granitic (lighter) vs. schistic (darker) debris types. These findings supported our choice of separating the debris material into two different classes: light and dark debris, corresponding to the principal types of material composition found across the mountain range.

Exposed ice 485
The percentage coverage of clean ice within the debris covered area for the entire range (5.6 %) is considerably higher than that for the reference glaciers in the Khumbu region (0.6 % on average) (Table 3). At the date of analysis, chosen based on the Pleiades image (Sept 30 th , 2015), we suspect that debris-covered glaciers in some parts of the range may have supraglacial snow because of the differences in timing of the end of ablation season across the mountain range. In terms of spectral unmixing, the algorithm was able to detect snow and ice on the surface of the debris-covered glaciers, corresponding to clean 490 ice patches at the upper limit between supraglacial debris and clean ice due to either inaccuracies in the input data (inclusion of clean ice) or the presence of snow from avalanches. Given the spectral limitations of the Landsat imagery, we used a "clean ice" endmember and thus we did not expect the algorithm to fully detect dirty ice at the ice cliffs, which have a lower albedo.
In terms of ice cliff coverage, we could not extract the accurate percent of ice coverage unless the images are completely free of clean ice and snow. However, we did detect the presence of ice pixels around some supraglacial ponds, indicating the 495 presence of ice cliffs, but this was not the main focus of this study. Targeting exposed but dusted ice features within the debris cover in addition to clean ice would need some refinement of the algorithms we used to optimally detect supraglacial ponds, and better spectral resolution of the imagery.

Supraglacial pond occurrence 500
The coverage of supraglacial ponds over the entire range (2.1 %) is within the range of glacier-by-glacier variability noted in the Khumbu reference area (Table 3). However, this varies across the Himalaya, from a low coverage of in the drier areas in the monsoon transition zone of Lahaul Spiti (0.3 % of the debris-covered area) to 1.6 % in the central Himalaya (Khumbu region) and it is almost seven-fold (4.9 % of the debris-covered area) in the eastern, monsoon influenced parts of Himalaya (Bhutan). We speculate that the lower abundance of supraglacial ponds in the western part of the range is due to a combination 505 of debris cover characteristics and climatic conditions. Climatic conditions and glacier characteristics in the western part of the Himalaya are more similar to the ones in the Karakoram, which have undergone less glacier shrinkage (Brun et al., 2017;Kääb et al., 2012;Gardelle et al., 2013) while the eastern, monsoon-influenced Himalaya experience higher temperatures, and lower glacier termini elevations. Supraglacial ponds have been shown to form in on stagnating areas on the ablation areas of glaciers, with surface angles lower than 2° (Sakai and Fujita, 2010;. Pond formation has been shown to relate 510 to the amount of downwasting a glacier surface has undergone and to the related reduction of the downglacier slope angle. In this sense, the degree of pond coverage could be considered an indicator of the degree of glacier decay.

Supraglacial vegetation
Although we estimate that some of the classified vegetation also includes healthy vegetation and bright non-glacierized bare 515 terrain (section 3.4), we found that supraglacial vegetation is less prevalent in the west (Lahaul Spiti) than in central and eastern Himalaya (Khumbu and Bhutan domains). Glacier tongues here descend to lower elevations (~3,700 m to 4,400 m compared to ~ 4,700 to 4.900 m in the western part) which, in conjunction with the more humid, monsoon-influenced climate might favour vegetation growth. Development of vegetation (mostly shrubs) has been noted on stagnant, thick debris-covered tongues in other areas of the world (Xie et al., 2020a;Tampucci et al., 2016), and was found to be increasing in certain glacierized 520 areas such as the Alps as a consequence to climatic change and declining glaciation (Vezzola et al., 2016). As vegetation typically will only develop on stagnant surfaces that are no longer undergoing substantial gravitational reworking, its presence is also an indication of glacier inactivity and later stages of decay.

Controls on supraglacial pond and vegetation coverage
Supraglacial ponds and vegetation may reveal information in terms of understanding the evolution of debris-covered 525 glaciers. For example, the expansion of supraglacial lakes at the terminus of a glacier is a precursor to the transition to a proglacial lake which may present an enhanced potential for glacial hazard such as glacier lake outburst floods, and supraglacial vegetation growth may indicate the transition of a debris-covered glacier to an inactive glacier tongue or a rock glacier. To investigate this further, we explored the possible dependency of supraglacial pond and vegetation cover on topographic variables (elevation, slope, extent of supraglacial debris cover and geographic location). We only performed this 530 analysis on debris cover glacier tongues > 1 km 2 in order to remove spurious small bare land patches or isolated debris pixels. This threshold was also chosen considering that these tend to have higher turbidity (i.e. grey colours) (Matta et al., 2017) and larger water volumes, which is of concern for outburst flood potential. After the area filtering, the resulting dataset comprised 408 debris-covered glacier tongues, most of which (90 % of the total number) had a surface area less than 5 km 2 and had a mean slope of 12.7º, with a minimum of 7º. The average slope is consistent with values used in previous studies to 535 delineate debris covered glaciers (10 to 15º) (Bolch et al., 2007;Racoviteanu and Williams, 2012;Lippl et al., 2018;Robson et al., 2015). While slope thresholds are region-dependent, the minimum slope obtained when filtering the debris covered glaciers by area is consistent with other areas, for ex. 6.7º in the Hunza (Xie et al., 2020a). Both the area and surface slope have a skewed distribution (Fig. 10). Slope angles > 30º were associated with bare steep terrain outside the glaciers, which was mistakenly classified as supraglacial debris in the SDC v.1 dataset (cf. Fig. 8). 540

Fig. 10 here
We did not find a significant correlation between supraglacial pond coverage and average debris cover slope angles or elevation. Supraglacial ponds are expected to occur predominantly on parts of the debris cover with slopes less than 2º (Sakai et al., 2000; and negative glacier mass balance. From the analysis of supraglacial pond area coverage with respect to average debris cover slope (Fig. 11a), we can only say that the debris covered tongues with slopes < 10º 545 comprise on average 1.3 % supraglacial ponds, and for average slopes > 20º, this drops to an average of 0.07 %, which is to be expected because at steeper slopes, meltwater can drain away .

Fig. 11 here
We note a decreasing pattern in supraglacial pond coverage with debris cover area, with larger glaciers displaying a maximum of ~5 % ponds on their surface, but the trend is not statistically significant at 95% confidence interval (Fig. 11b). 550 Supraglacial ponds covered in general 0 to 2 % of the debris cover area (Fig. 11b), with more extreme values (up to 20 % of the debris area) for higher debris cover termini, some of which we attribute to errors in the input SDC dataset. We did not find a clear trend of pond coverage with respect to minimum elevation (Fig. 11c).
Supraglacial vegetation coverage did not display strong dependencies on slope nor minimum elevation of the debris-covered tongues ( Fig. 11a-b). While we could hypothesize that vegetation tends to develop on stagnant, low slope debris-covered 555 tongues situated at lower elevations, the trends were not statistically significant. In general, supraglacial vegetation cover was concentrated on glaciers with an average slope less than 25 % and minimum debris elevation situated from 3,000 m to 5,000 m (Fig. 11c), with some extreme fractional vegetation values > 40 % at higher elevations, which we attribute again to errors such as those noted in Figure 8. The lack of significant dependencies of supraglacial pond coverage and vegetation may imply that their distribution is governed by more complex factors which may include geomorphologic and climatic 560 patterns, which cannot be examined here in detail.
The distribution of supraglacial pond and vegetation coverage across the entire mountain range (Fig. 12) shows that both supraglacial ponds and vegetation cover a relatively small percent of debris-cover areas in Western Himalaya (0 to 2.5 %), with clusters of higher percentage in the central-eastern Himalaya (7.5 to 10% and 20 -40 %, respectively). The high percentage of supraglacial vegetation in some of the eastern parts is partly explained by the errors in the input supraglacial 565 dataset, as explained earlier, and we are hesitant to over interpret the vegetation analysis. Further applications and comparison with regional lake databases The fractional water maps obtained from the spectral unmixing techniques presented here can be further used to characterize the state of lakes and ponds in terms of their turbidity, i.e., by quantifying the fraction of a pixel covered by water, light and/or 570 dark debris (Fig. 13). This aspect is not fully explored in this study, but can be further investigated using field measurements spectra of ponds and lakes with various degrees of turbidity (Matta et al., 2017;Giardino et al., 2010) to characterize the lake turbidity across the mountain range. In this regard, repeated monitoring of pond turbidity using these combined tools allows tracking changes in suspended sediment load over time, which are considered direct indicators of glacier wasting processes and glacier-lake interaction (Giardino et al., 2010). 575

Fig. 13 here
Another important application of this study is the mapping of lakes outside the glacier areas across the mountain range. This can be achieved by replacing the turbid water with a clear water endmember, and applying the unmixing over the full Landsat extent rather than just the debris cover. Mapping all the lakes was beyond the purpose of this study, but initial tests provide an illustration of such an output for the terminus of Ngozumpa Glacier (Fig. 13). We compared our lakes and pond 580 extents with two existing glacial lakes databases constructed from the same year (2015 Landsat): the HMA v.1 lake dataset, derived using a normalized difference water index (Shugar, 2020b) and HI-MAG constructed using a modified NDWI and manual corrections (Chen et al., 2020). A comparison with other global databases such as the Global Surface Water (GSW) (Pekel et al., 2016) was not undertaken here, as this has already been shown to underestimate the water occurrence over most of the Himalaya by Chen et al. (2020). With regards to HMA v1 and HI-MAG datasets, Figure 13 shows that the lake 585 outlines obtained from spectral unmixing in this study are outperforming both of the existing databases. Our lake extents are consistent with the HMA v.1 lakes extents outside debris cover (Fig. 13), and the surface area estimates agree quite well, for example we calculated a difference of 5 % in the summed pond area over the three Gokyo Lakes (1.15 km 2 in our estimates vs. 1.09 km 2 in HMA v.1). The slight underestimate in the latter is due to simplification of the raster edges in the vector conversion process, visible in the lake extents. With regards to supraglacial ponds, for example Spillway Lakes at the 590 terminus of Ngozumpa Glacier, our spectral unmixing technique maps most of these lakes, while both HMA v.1 and the HI-MAG datasets fail to detect all the supraglacial ponds (Fig. 13). The HI-MAG detects more of the surface of Spillway Lake compared to HMA v.1, but the outlines are very much simplified and lack precision with respect to Landsat pixels (Fig. 13).
We did not simplify the lake and pond polygons, as this can introduce significant area errors. The examples above show that spectral unmixing can help refine and complement these larger databases at regional and global scales. 595

Uncertainties and limitations
While spectral unmixing is a relatively straightforward routine to implement, limitations and main difficulties in correctly applying this routine arise due to several factors: 1) quality and limitations of the input data; 2) illumination differences and 3) uncertainties in the endmember selection and in the unmixing routine itself.
(1) With respect to the quality of the input data, our study relied on Landsat-8 OLI. Using the Landsat series has the advantage 600 of spanning several decades (1972 to present), with the possibility of replicating this study for past decades. While previous missions such as Landsat 7 missions suffer from saturation at the sensor and data gaps due to Scan Line Corrector (SLC)-off, these were still useful for snow mapping as noted earlier. Using Landsat data has the advantage that the same methodology can be applied to previous Landsat missions to investigate changes in supraglacial features. For dates post-2015, Sentinel-2 or the future Landsat-9 may be used to improve the performance of the spectral unmixing. 605 (2) Atmospheric and topographic corrections were found to be necessary when applying the spectral unmixing over large areas. In this study, such corrections helped minimize illumination differences, remove deep shadows etc. so we were confident that spectra developed for the Khumbu could be used over the entire Himalaya. However, in some areas, some spectral differences may have remained, leading to confusion between the water and light debris classes and hence some over-estimation of the pond coverage particularly in some areas of the western Himalaya. Such areas require some post-610 classification corrections prior to their inclusion in regional datasets, but these will not be very computationally-intensive.
(3) The endmember selection was a crucial step in the spectral unmixing process, as it is prone to uncertainties and limitations of the satellite imagery used. The impact of the different parameters of the spectral unmixing model on the performance of the subpixel mapping needs to be thoroughly evaluated (Xu et al., 2017). One of the difficulties of applying spectral unmixing lies in capturing the variability of the system while maintaining a low model complexity to ensure a "valid" 615 model. In this study, Pleiades imagery combined with field expertise was essential in fine tuning our choice of materials and extracting "pure" pixels. We used six endmembers, which can be considered large compared to the limited number of Landsat bands (n = 7). However, our tests with fewer classes (three or four) resulted in large model errors (RMS) due to missing classes. To minimize the numbers of classes, we made several choices: (a) We combined snow and clean ice into one class (even though their spectral signatures were statistically different), and this potentially reduced the accuracy 620 of the clean ice classification. (b) For the same reasons, we could not define a dirty ice endmember as done in studies using hyperspectral imagery (Naegeli et al., 2015). (c) On the debris surface, we assumed the supraglacial ponds to be mostly of turbid type, with larger quantities of suspended sediments, though some small clear water supraglacial ponds may exist as well (Takeuchi et al., 2012;Giardino et al., 2010). The limited Landsat spectral resolution implied that we could not concomitantly use both a clear and turbid water endmember in the spectral unmixing. However, we used these 625 two endmembers in different model runs, and then combined the results. (d) We defined both a light and dark debris endmember on the basis of spectral differences noted in other studies (Casey et al 2011), but we are aware that these may not cover the wide spectrum of lithology present across the Himalaya. Using a single debris endmember would have resulted in large RMS errors, indicating a missing or poorly defined class. (e) The vegetation endmember was defined as graminoid shrubs or overgrown vegetation with a grass-like appearance, which occur at high altitudes (Wehn et al., 2014). 630 However, we note that some small amounts of healthy vegetation do occur on debris-covered glaciers, and the spectral unmixing was able to detect these healthy vegetation areas as well. (f) A cloud endmember was needed because when omitted, cloudy areas were confused with light debris. However, clouds were accurately mapped using the SAM unmixing method and perhaps removed prior to running the LMM, so these two methods can be combined in order to get a better accuracy. Cloud masking algorithms such as Fmask (Zhu et al., 2015) do not perform well over glacierized areas because 635 of the confusion of snow and ice with clouds.
(4) Our study revealed some important issues with the input SDC v.1 dataset used to constrain the spectral unmixing, particularly with respect to the inclusion of healthy vegetation and bare, bright steep terrain in the debris cover extents.
This affected the accuracy of the LMM to some extent, notably by resulting in large negative and positive values. On the other hand, these abnormal values can be used to identify errors in the existing global supraglacial debris cover maps, and 640 thus a valuable tool to correct and refine these databases.

Summary and further work
In this study, we estimated the spatial distribution of surface characteristics on the debris-covered glaciers (various types of 645 debris, clean ice, supraglacial ponds and vegetation) at a subpixel scale using 30 m fractional maps obtained from a spectral linear mixing model. We developed the spectral unmixing in the wider Khumbu region of eastern Nepal and China using Landsat combined with Pleiades and RapidEye imagery, and then applied them over the entire Himalaya to test its applicability to a wider domain. Our key findings can be summarised as follows: • We have demonstrated that spectral signatures derived from the Landsat image and cross-checked using high-650 resolution Pleiades from the same date can be successfully applied at the mountain range scale provided that all images are atmospherically and topographically corrected to reduce differences in illumination patterns; • We successfully used spectral unmixing to map supraglacial ponds at the scale of the entire Himalaya from coarse resolution and freely available Landsat imagery, and this approach can provide more detail, and thus outperform other analyses of automatic lake identification and classification performed on similar Landsat data for the same period but 655 based on normalized difference water indices (Shugar, 2020b;Chen et al., 2020); • Spectral unmixing thus allows us to produce more accurate supraglacial pond maps from historical coarser satellite imagers that are more comparable to mapping quality from higher resolution, and more recently available imagery, allowing improved analysis of multitemporal change in pond incidence and size; • Extreme fractional vegetation pixels identified by the spectral unmixing can be used as a tool to indicate non-660 glacierized terrain that has been incorrectly included in global debris-covered glacier databases such as the SDC v.1 (Scherler et al., 2018a); • We are less confident of the utility of spectral unmixing in identifying exposed ice within the debris cover from coarse scale imagery. Because of its limited spectral resolution, we could not effectively optimize our end member selection to identify the steep, dusted ice typically found within the debris cover. This is a pity as such exposed ice is of interest 665 is known to contribute disproportionally to total glacier ablation (Steiner et al., 2015;Brun et al., 2018); • While the limited Landsat spectral resolution did not allow for a very fine definition of the wide spectrum of all the different debris lithologies and ice types present on debris-covered tongues across the study area, LMM successfully distinguished among broad categories. Overall, we consider the spectral unmixing method presented here a promising approach to add to the suite of tools that are valuable in analyzing the dynamic surfaces of debris covered glaciers. 670 Future developments include exploring the use of more sophisticated non-linear mixing models, including the ERDAS Subpixel Classifier, which is useful for specific materials that may be spectrally either similar or different to other materials in the scene and overcomes the limitations of LMM in discriminating materials of interest in detail. Furthermore, while we developed this routine in commercial remote sensing software (ENVI), the spectral signatures and the linear unmixing steps can be integrated in other python-based routines. Other improvements envisioned include applying the spectral unmixing 675 developed here on recent Sentinel-2 imagery, which has a better spectral, spatial and temporal resolution (13 bands in the visible to shortwave infrared, 10 -20 m, 5-day revisit time) compared to Landsat (7 bands in the visible to shortwave, 30 m, 16-day revisit time), allowing for better definition of endmembers, more accurate and repeated mapping in the future.
With continued advances in satellite data in the near future, the methodology developed here provides avenues towards achieving large-scale, repeated mapping of supraglacial features. Ongoing work is undertaken to make the unmixing step 680 approach fully automated by integrating it in python-based routine such as RSGISLIB software (Bunting et al., 2014), so it can be applied to derive supraglacial pond outlines at multi-temporal scales to monitor pond development over time.. Given that these surface ponds are ephemeral and change rapidly, automated multi-temporal scale mapping is highly desirable to track their evolution over time in various regions. Mountain-range mapping of the pro-glacial lakes using the clear water class as an endmember is currently in progress and can help complement existing databases. Thus, this analysis presented here 685 complements and expands the existing lake databases for the year 2015 such as the HI-MAG or the HMA, with a particular emphasis on providing supraglacial debris cover pond extents. Such a mapping effort has not been undertaken at the scale of the Himalaya to our knowledge.

6
Author contributions AR led this work, conceiving the idea, designed the spectral unmixing experiments, obtained and processed the Landsat images 690 and wrote the paper. LN provided Pleiades imagery, discussed the research strategy and helped select endmembers based on field expertise. NG supervised the study and provided geomorphology expertise. All authors contributed to writing the paper.

Code availability
Atmospheric and topographic corrections have been performed using the ARCSI routine, which is embedded in the freely available, python-based RSGISLIB software (Bunting et al., 2014). The code for batch processing of the Landsat OLI images 695 for the entire Himalaya can be provided upon request. Post-processing of the spectrally unmixed Landsat OLI maps was done in ArcPython. The steps for loop processing (normalizing the fractional rasters, classifying the surfaces, and extracting the composition of the debris-covered glaciers from the fractional maps) can be provided upon request.

Data availability
Landsat OLI data used in this study can be obtained at no cost from the USGS Earth Explorer (https://earthexplorer.usgs.gov/). 700 All versions of the NASA SRTM Global 1-arc second DEMs are available from the EarthData platform (https://earthdata.nasa.gov/). All versions of the ALOS Global Digital Surface Model, including the one used in this paper, are available from https://www.eorc.jaxa.jp/ALOS/en/aw3d30/index.htm. Preliminary datasets of supraglacial ponds and vegetation, along with the fractional maps are available via Zenodo data repository (DOI:10.5281/zenodo.4421857) and will be updated regularly after post-classification corrections. 705 9

Competing interests
The authors declare that they have no conflict of interest.

Acknowledgements
This research received funding from the European Union's Horizon 2020 research and innovation programme under the Marie 710 Skłodowska-Curie grant agreement No 663830. LN was supported by the Austrian Science Fund (FWF) Grant P28521. Access to Pleiades imagery was provided through the Österreichische Forschungsförderungsgesellschaft (FFG) project "High resolution spaceborne studies of mass balance processes on glaciers of the Khumbu Himal, Nepal" (GlHima-Sat). We acknowledge the BritInn Fellowship programme which funded AR's work visit to Univ. of Innsbruck to develop this research with LN in 2018. We are grateful to United States Geological Survey and to Planet API program for providing free access to 715 Landsat and RapidEye imagery. Thanks for Lorenzo Rieg and Christoph Klug of the University of Innsbruck for processing of the Pleiades DEMs.

List of References
Alifu, H., Johnson, B., and Tateishi Table 2 Confusion matrix of the spectral unmixing expressed as percentage accuracy (computed as numbers of pixels classified correctly/total number of reference pixels). Diagonal values represent number of pixels that are correctly classified, and off -1050 diagonal are number of pixels that are misclassified. Class accuracy is reported as omission and commission errors and corresponding Producer's and User's accuracy (as percent).   . 2 Types of surfaces present in the study area: a) light debris cover (quartz, feldspar); b) darker schistic debris with ice cliff; c) clean ice with crevasses in the glacier ablation area; d) graminoid shrub type vegetation (dry); e) supraglacial lakes of 1085 different turbidity; f) valley clouds. All photos were taken in the Khumbu region. Photo credit: A. Racoviteanu debris were combined into one class and the algorithm was applied over the entire scene.