Improving Semi-Automated Glacier Mapping with a Multi-Method Approach: Applications in Central Asia

Studies of glaciers generally require precise glacier outlines. Where these are not available, extensive manual digitization in a Geographic Information System (GIS) must be performed, as current algorithms struggle to delineate glacier areas with debris cover or other irregular spectral profiles. Although several approaches have improved upon spectral band ratio delineation of glacier areas, none have entered wide use due to complexity or computational intensity. In this study, we present and apply a glacier mapping algorithm in Central Asia which delineates both clean glacier ice and debris-covered glacier tongues. The algorithm is built around the unique velocity and topographic characteristics of glaciers and further leverages spectral and spatial relationship data. We found that the algorithm misclassifies between 2 and 10 % of glacier areas, as compared to a ∼ 750 glacier control data set, and can reliably classify a given Landsat scene in 3-5 min. The algorithm does not completely solve the difficulties inherent in classifying glacier areas from remotely sensed imagery, but it does represent a significant improvement over purely spectral-based classification schemes, such as the band ratio of Landsat 7 bands 3 and 5 or the Normalized Difference Snow Index. The main caveats of the algorithm are (1) classification errors at the individual glacier level, (2) reliance on manual intervention to separate connected glacier areas, and (3) dependence on the fidelity of the input Landsat data.


Introduction
This study focuses on mapping glaciers over a large spatial scale using publicly available remotely sensed data. Several high-resolution glacier outline databases have been produced, most notably the Global Land Ice Measurements from Space (GLIMS) project (Armstrong et al., 2005; Raup et al., 2007, 2014) and the recently produced supplemental GLIMS data set known as the Randolph Glacier Inventory (RGI) v4.0 (Arendt et al., 2012; Pfeffer et al., 2014). Smaller-scale glacier databases are also available, such as the Chinese Glacier Inventory (CGI) v2 (Guo et al., 2015). A coherent, complete, and accurate global glacier database is important for several reasons, including monitoring global glacier changes driven by climate change, natural hazard detection and assessment, and analysis of the role of glaciers in natural and built environments, including glacier contributions to regional water budgets and hydrologic cycles (Racoviteanu et al., 2009; Vaughan et al., 2013). Precision in glacier outlines is of utmost importance for monitoring changes in glaciers, which may change by less than 15-30 m yr⁻¹ (∼ 1-2 pixels of the Landsat Enhanced Thematic Mapper (ETM+) panchromatic band per year).
Several methods have been developed to delineate clean glacier ice (i.e., Hall et al., 1987; Paul, 2002; Paul et al., 2002; Racoviteanu et al., 2008a, b; Hanshaw and Bookhagen, 2014), relying primarily on multi-spectral data from satellites such as Landsat and the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER). Although significant progress has been made towards automated glacier outline retrieval from satellite imagery, these methods struggle to accurately map debris-covered glaciers or other glaciers with irregular spectral profiles (Paul et al., 2004; Bolch et al., 2007; Racoviteanu et al., 2008b; Scherler et al., 2011a; Bhambri et al., 2011). Much of this difficulty stems from the similarity between the spectral profiles of debris located on top of a glacier tongue and of the surrounding landscape. The majority of studies examining debris-covered glaciers employ extensive manual digitization in a geographic information system (GIS), which is very time consuming and can introduce significant user-generated errors (Paul et al., 2013; Pfeffer et al., 2014; Raup et al., 2014). Building on the multispectral, topographic, and spatially weighted methods developed by Paul et al. (2004), we present a refined rules-based classification algorithm based on spectral, topographic, velocity, and spatial relationships between glacier areas and the surrounding environment. The algorithm has been designed to be user-friendly, globally applicable, and built upon open-source tools.
Study area and data sources
The study area contains a wide range of glacier types and elevations, with both small, clean-ice-dominated glaciers and large, low-slope, debris-covered glaciers. The diversity of glacier types in the region provides an ideal test area, particularly for mapping glaciers with long and irregular debris tongues, such as the Inylchek and Tomur glaciers in the central Tien Shan (Shangguan et al., 2015).
The wintertime climate of the study area is controlled by both winter westerly disturbances and the Siberian High, which dominate regional circulation and create strong precipitation gradients throughout the range, which extends from Uzbekistan in the west through China in the east (Fig. 1) (Lioubimtseva and Henebry, 2009; Narama et al., 2010; Bolch et al., 2011; Sorg et al., 2012; Cannon et al., 2014). The western edges of the region tend to receive more winter precipitation in the form of snow, while precipitation is concentrated in the spring and summer in the central and eastern reaches of the range (Narama et al., 2010).

Data sources
Our glacier mapping algorithm is based on several data sets. The Landsat 5 (TM), 7 (ETM+), and 8 (OLI) platforms were chosen as the primary spectral data sources, as they provide spatially and temporally extensive coverage of the study area (Table 1). ASTER can also be used as a source of spectral information, but here we chose to focus on the larger footprint and longer time series available through the Landsat archive. In addition to spectral data, the 2000 Shuttle Radar Topography Mission v4.1 (SRTM) digital elevation model (DEM) (∼ 90 m, void-filled) was leveraged to provide elevation and slope information (Jarvis et al., 2008). The SRTM data and its derivatives were resampled to 30 m to match the resolution of the Landsat images using bilinear resampling. The USGS HydroSHEDS (Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales) river network (15 arcsec resolution, ∼ 500 m) was also used as an input data set (Lehner et al., 2008).

Methods
Our glacier classification algorithm uses several sequential thresholding steps to delineate glacier outlines. The scripts used in this study are available in the data repository, with updates posted to http://github.com/ttsmith89/GlacierExtraction/. For each step, we note whether it requires manual processing or is part of a script.

Step 1, data preparation:
(a) Velocity fields are calculated with normalized image cross-correlation (manual, can be automated).
(b) The HydroSHEDS river network is buffered and rasterized (manual, can be automated).
(c) Optional manual seed points are selected (manual).
(d) A slope image is generated from the DEM (Python script).
(e) All input data sets are matched to a single extent and spatial resolution (30 m) (Python script).

Step 2, classification and filtering:
(a) Clean glacier ice is delineated spectrally (Matlab script).
(b) Low-slope areas are isolated as potential debris areas (Matlab script).
(c) Low-elevation areas are masked out (Matlab script).
(d) The velocity surface is used to remove stable ground (Matlab script).
(e) Distance-weighting metrics are used to remove areas distant from river networks (Matlab script).
(f) Distance-weighting metrics are used to remove areas very distant from clean glacier ice and manual seed points (Matlab script).
(g) The resulting glacier outlines are cleaned with statistical filtering (Matlab script).
(h) Glacier outlines are exported to ESRI shapefile format for use in a GIS (Python script).

Data preparation
For accurate glacier delineation, we primarily used Landsat images that were free of new snow and had less than 10 % cloud cover. However, we also included scenes with limited snow and cloud cover in our analysis to understand their impacts on our classification algorithm. We find that the presence of fresh snow in an image tends to lead to overclassification of glacier areas, with non-permanent snow classified as glacier. Additionally, cloud-covered glaciers cannot be correctly mapped by the algorithm (Paul et al., 2004; Hanshaw and Bookhagen, 2014). We use the USGS Level 1T orthorectified Landsat scenes to ensure that the derived glacier outlines are consistent in space (Hansen and Loveland, 2012; Nuimura et al., 2015).
The algorithm uses Landsat imagery, a void-filled DEM, a velocity surface derived from image cross-correlation, and the HydroSHEDS 15 arcsec river network (buffered by 200 m and converted to a raster) as the primary inputs (steps 1a, 1b). The algorithm generates a slope image from the DEM and rectifies the additional input data sets described below for processing by resampling and reprojecting each data set to the same spatial extent and resolution (30 m, to match the Landsat data) (steps 1d, 1e). Although the current algorithm leverages a few proprietary Matlab commands, we will continue to update the code with the goal of using only open-source tools and libraries in the future.
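As a sketch of the preparation steps, the resampling and slope derivation (steps 1d, 1e) can be illustrated with numpy and scipy standing in for the GDAL/Matlab tooling actually used; the helper names and the toy DEM tile below are illustrative only:

```python
import numpy as np
from scipy import ndimage

def resample_bilinear(grid, factor):
    """Resample a raster grid by `factor` using bilinear interpolation (order=1)."""
    return ndimage.zoom(grid, factor, order=1)

def slope_degrees(dem, cellsize):
    """Slope in degrees from a DEM via central-difference gradients."""
    dzdy, dzdx = np.gradient(dem, cellsize)
    return np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

# Toy 90 m DEM tile, resampled 3x to match 30 m Landsat pixels
dem_90m = np.array([[3000., 3010., 3020.],
                    [3030., 3040., 3050.],
                    [3060., 3070., 3080.]])
dem_30m = resample_bilinear(dem_90m, 3)
slope = slope_degrees(dem_30m, 30.0)
```

In practice the reprojection and extent matching would be handled by a GIS library, but the resampling order (bilinear) matches the text above.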

Clean-ice delineation
Calculations are performed on rasterized versions of each input data set, which have been standardized to the same matrix size. The first step in the classification process leverages Landsat 7 bands 1, 3, and 5 (Step 2a). For Landsat 8 OLI images, a slightly different set of bands is used to conform to OLI's modified spectral range. For simplicity, bands referenced in this publication refer to Landsat 7 ETM+ spectral ranges. The ratio of TM3 / TM5 (value ≥ 2), with additional spectral information from TM1 (value > 25), has been used in previous research as an effective means of delineating glacier areas (e.g., Hall et al., 1987; Paul and Kääb, 2005; Hanshaw and Bookhagen, 2014) but is not effective in delineating debris-covered glacier areas (Fig. 2a). In our algorithm, we use thresholds of TM3 / TM5 ≥ 2 and TM1 > 60 to map clean glacier ice. While these thresholds perform well over many scenes, thresholds that are not tuned to specific scene conditions can generate large errors, particularly in shadowed areas (Paul et al., 2015). Here we choose fairly conservative threshold values to ensure that we do not remove clean glacier ice. We find that increasing the TM1 threshold results in tighter classification of debris-covered glacier tongues but also removes some areas properly classified as glacier, particularly in steep areas of the accumulation zone. Thus, we err on the side of overclassification in our delineation of clean glacier ice. The end result of this step is the spectrally derived (clean-ice) glacier outlines, which are later integrated back into the workflow before statistical filtering (Fig. 2a).
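The clean-ice thresholding described above can be sketched as follows (a minimal numpy illustration using the stated TM3 / TM5 ≥ 2 and TM1 > 60 thresholds; the toy digital-number arrays are invented for illustration, not taken from a real scene):

```python
import numpy as np

def clean_ice_mask(tm1, tm3, tm5, ratio_thresh=2.0, tm1_thresh=60):
    """Clean-ice mask: TM3 / TM5 >= 2 and TM1 > 60 (digital-number values)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(tm5 > 0, tm3 / tm5, 0.0)  # guard against zero-valued SWIR pixels
    return (ratio >= ratio_thresh) & (tm1 > tm1_thresh)

# Toy 2x2 scene: the top-left pixel is bright ice, the rest are debris/rock
tm1 = np.array([[120, 40], [70, 30]], dtype=float)
tm3 = np.array([[200, 80], [90, 50]], dtype=float)
tm5 = np.array([[40, 70], [60, 45]], dtype=float)
mask = clean_ice_mask(tm1, tm3, tm5)
```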

Topographic filtering
Building on the work of Paul et al. (2004), low-slope areas (between 1 and 24°) are isolated as areas where debris-covered glaciers are likely to exist (Step 2b). We choose the relatively high threshold of 24°, as opposed to the 12° suggested for the Himalaya (Bolch et al., 2007), the 15° suggested for the Himachal Himalaya (Shukla et al., 2010), or the 18° suggested for the Garhwal Himalaya (Bhambri et al., 2011), to ensure that we do not prematurely remove debris-covered areas. As we use the low-slope areas as our initial maximum likely debris extent, a conservative slope threshold helps reduce errors of underclassification. Low-elevation areas (automatically defined on a scene-by-scene basis as the average elevation of clean-ice areas minus 1750 m, generally below 2500-3000 m in the study area) are then masked out to decrease processing time (Step 2c). These thresholding steps identify areas where there is the potential for a debris-covered glacier to exist and are performed independently of the previous, spectrally delineated glacier outlines. Additional thresholding is then performed on this "potential debris area" subset to identify debris-covered glacier areas (Fig. 2b).
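A minimal sketch of the slope and elevation thresholding (Steps 2b, 2c), assuming numpy arrays for the slope, DEM, and clean-ice rasters; the function name and toy values are illustrative:

```python
import numpy as np

def potential_debris_mask(slope_deg, dem, clean_ice):
    """Low-slope (1-24 deg) pixels not far below the mean clean-ice elevation."""
    low_slope = (slope_deg >= 1.0) & (slope_deg <= 24.0)
    # Scene-adaptive elevation cutoff: mean clean-ice elevation minus 1750 m
    elev_floor = dem[clean_ice].mean() - 1750.0
    return low_slope & (dem >= elev_floor)

slope = np.array([[5., 30.], [10., 2.]])      # degrees
dem = np.array([[3800., 4200.], [2100., 4000.]])  # meters
ice = np.array([[False, True], [False, True]])    # clean-ice mask from Step 2a
debris = potential_debris_mask(slope, dem, ice)
```

Here the mean clean-ice elevation is 4100 m, so the elevation floor is 2350 m; the steep pixel and the low valley pixel are both excluded.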
As can be seen in Fig. 2b, extensive areas that are not glacier or debris-covered glacier tongue are identified in this step.However, this step greatly reduces the processing time of subsequent steps by removing pixels outside of the main glacierized areas of any scene and allowing the algorithm to work on a subset of the image from this point forward.

Velocity filtering
The Correlation Image Analysis Software (CIAS) (Kääb, 2002) tool, which uses statistical image cross-correlation, is used to derive glacier velocities from Landsat band 8 panchromatic images. This method functions by tracking individual pixels across space and time and provides a velocity surface at the same resolution as the input data sets (15 m) (Step 1a). The velocity surface is then upsampled using bilinear resampling to provide a consistent velocity estimate across the entire Landsat scene. We then standardized the velocity measurements to meters per year using the capture dates of the two Landsat images. As glacier velocity can change significantly throughout the year, and clean images were not available at exactly the same intervals for each Path/Row combination, there is some error in our velocity fields. However, as the velocity surface is used to remove stable ground, which is generally well defined despite changes in glacier velocities, errors in the velocity surface do not contribute significantly to glacier classification errors, except on slower-moving parts of debris-covered glacier tongues. It is important to note that images must be cloud-free over glaciers and snow-free off glaciers for this step, as the presence of snow or cloud cover can disrupt the correlation process, resulting in anomalous velocity measurements. An example velocity surface is shown in Fig. 2c (Step 2d). Red areas are removed from the "potential debris areas", as they fall outside of the expected range of debris-tongue velocities.
We used only one multi-year velocity measurement for each Path/Row combination to derive general areas of movement/stability for glacier classification, as using stepped velocity measurements over smaller time increments did not show a noticeable improvement in glacier classification. This also improved our classification of slow-moving glaciers, which may not move significantly over a single year. These velocities generally ranged from 4.5 to 30 m yr⁻¹ across the different scenes. A single velocity threshold of 5 m yr⁻¹ was used across all scenes to remove stable ground. A method of frequential cross-correlation using the co-registration of optically sensed images and correlation (COSI-Corr) tool (Leprince et al., 2007; Scherler et al., 2011b) was tested and did not show any appreciable improvement in velocity measurements (Heid and Kääb, 2012). The velocity step is most important for removing hard-to-classify pixels along the edges of glaciers and wet sands in riverbeds. These regions are often spectrally indistinguishable from debris tongues but have very different velocity profiles. It is important to note, however, that this step also removes some glacier area, as not all parts of a glacier move at the same speed. This can result in small holes in the delineated glaciers, which the algorithm attempts to rectify using statistical filtering. Generating a velocity field is the most computationally expensive step of the algorithm.
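The velocity standardization and stable-ground threshold can be sketched as follows (illustrative numpy code, not the CIAS/Matlab implementation; the displacement values and time span are invented):

```python
import numpy as np

def annual_velocity(disp_m, days_between):
    """Convert cross-correlation displacements (m) to m/yr
    using the capture dates of the two Landsat images."""
    return disp_m * (365.25 / days_between)

def remove_stable_ground(mask, vel_m_yr, min_vel=5.0):
    """Drop 'potential debris' pixels moving slower than the 5 m/yr threshold."""
    return mask & (vel_m_yr >= min_vel)

disp = np.array([[12.0, 1.0], [30.0, 3.0]])   # displacement (m) over ~2 years
vel = annual_velocity(disp, days_between=730.5)
moving = remove_stable_ground(np.ones((2, 2), bool), vel)
```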

Spatial weighting
After topographic and velocity filtering, a set of spatially weighted filters is constructed. The first filtering step uses the HydroSHEDS river network to remove "potential debris areas" which are distant from the center of a given glacier valley (Fig. 2d; Step 2e). As glaciers occur along the flow lines of rivers, and the HydroSHEDS river network generally delineates flow lines nearly to the peaks of mountains, the river network provides an ideal set of seed points with which to remove misclassified pixels outside of river valleys. A second distance weighting is then performed using the clean-ice outlines generated in Step 2a, as well as any manual seed points provided (Step 2f). As debris tongues must occur in proximity to either glacier areas or the centerlines of valleys, these two steps are effective in removing overclassified areas (Fig. 2d). The spatial weighting performed here differs from that proposed by Paul et al. (2004) in that it uses a measure of geodesic distance from given seed points, as opposed to maintaining entire polygons which are connected to clean-ice areas. This difference helps remove non-glacier areas that are distant from clean ice but still connected by at least a single pixel to clean-ice areas. At this step, it is possible to add manual seed points, which may be necessary for some longer debris tongues. We note that these are optional, and the majority of glaciers do not need the addition of manual seed points. However, for certain irregular or cirque glaciers, the addition of manual seed points has been observed to increase the efficacy of the algorithm. In processing the Landsat imagery presented here, we have not used additional manual seed points.
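A simplified sketch of the distance weighting, using a Euclidean distance transform as a stand-in for the geodesic distance measure described above (illustrative numpy/scipy code; the pixel threshold is invented):

```python
import numpy as np
from scipy import ndimage

def distance_filter(candidate, seeds, max_dist_px):
    """Keep candidate pixels within max_dist_px of any seed pixel.
    The distance transform measures, for each pixel, the distance
    to the nearest True pixel in the seed mask."""
    dist = ndimage.distance_transform_edt(~seeds)
    return candidate & (dist <= max_dist_px)

candidate = np.ones((5, 5), bool)   # "potential debris area" pixels
seeds = np.zeros((5, 5), bool)
seeds[2, 2] = True                  # e.g., a rasterized river/clean-ice pixel
kept = distance_filter(candidate, seeds, max_dist_px=1.5)
```

A geodesic distance (as used in the study) would additionally constrain the path to run through candidate pixels, which matters for winding valleys; the Euclidean version here only illustrates the thresholding idea.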
The spatial-weighting step is essential for removing pixels spatially distant from any clean-ice area. In many cases, large numbers of river pixels and, in some cases, dry sand pixels have similar spectral and topographic profiles to debris-covered glaciers. This step effectively removes the majority of pixels outside the general glacierized area(s) of a Landsat scene, as can be seen in Fig. 2e.

Statistical filtering
Once the spatial-weighting steps are completed, a set of three filters is applied in order to remove isolated pixels, bridge gaps between isolated glacier areas, and fill holes in large contiguous areas (Step 2g). First, a 3 × 3 median filter is applied, followed by an "area opening" filter, which fills holes in contiguous glacier areas. Finally, an "image bridging" filter is applied to connect disjointed areas and fill holes missed by the area opening filter.
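The filtering sequence can be approximated as follows (a simplified scipy stand-in: a 3 × 3 median filter followed by hole filling; the area-opening and image-bridging filters of the Matlab implementation are not reproduced here):

```python
import numpy as np
from scipy import ndimage

def statistical_filter(mask):
    """Simplified cleanup: 3x3 median filter to remove isolated pixels and
    small holes, then fill any remaining holes in contiguous areas."""
    smoothed = ndimage.median_filter(mask.astype(np.uint8), size=3).astype(bool)
    return ndimage.binary_fill_holes(smoothed)

glacier = np.ones((5, 5), bool)
glacier[2, 2] = False   # hole left mid-tongue by velocity filtering
glacier[0, 4] = False   # isolated hole at the glacier edge
cleaned = statistical_filter(glacier)
```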
This step is necessary for filling holes and reconnecting separated glacier areas that result from the initial threshold-based filtering steps. For example, slow-moving pixels in the middle of a debris-covered glacier tongue that were removed by velocity filtering are often restored by the statistical filtering (Fig. 3). The improved classification of debris areas between the clean-ice and final algorithm outputs can clearly be seen in Fig. 3.

Creation of manual control data sets
Manual control data sets encompassing ∼ 750 glaciers (∼ 11 000 km²) were created to test the efficacy of the glacier mapping algorithm. These data sets were digitized from Landsat imagery in a GIS and then corrected with higher-resolution imagery in Google Earth. The data sets are coherent in space but cover two different times (∼ 2000 and ∼ 2011, depending on the dates of the available Landsat scenes). The bulk of the manually digitized glaciers fall within the boundary of Landsat Path/Row combination 147/031, as this is the most heavily glacierized subregion of our study area. However, we have digitized glaciers throughout the eight Path/Row combinations to avoid biasing our statistics and algorithm to one specific scene extent. We have also considered a wide range of size classes in our manual data set (< 0.5 km² to 500+ km²), as well as both clean-ice and debris-covered glaciers. We note that although the manual data sets here are considered "perfect", there is inherent error in any manual digitization in a GIS (e.g., Paul et al., 2013). Due to the lack of ground truth information, we have estimated the overall uncertainty of the manual data set to be 2 % based on previous experiments (Paul et al., 2002, 2013). Figure 4 shows the size-class distribution of the manual control data set, with logarithmic area scaling.
Before any comparisons between glaciers can be performed, glacier complexes must be split into their component parts. A set of manually edited watershed boundaries, derived from the SRTM DEM, was used to split both the manual and algorithm data sets into individual glacier areas for analysis. In this way, the diverse data sets and classified glacier areas can be split into the same subset areas for statistical comparison.
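The watershed-based splitting can be sketched as follows (illustrative numpy code, assuming the watershed boundaries have been rasterized to a grid of integer basin labels; names and toy arrays are invented):

```python
import numpy as np

def split_by_watersheds(glacier_mask, basin_ids):
    """Intersect a classified glacier mask with watershed polygons
    (rasterized to integer basin labels) to get per-glacier masks."""
    parts = {}
    for basin in np.unique(basin_ids[glacier_mask]):
        parts[int(basin)] = glacier_mask & (basin_ids == basin)
    return parts

mask = np.array([[True, True], [True, False]])  # a two-basin glacier complex
basins = np.array([[1, 2], [1, 2]])             # watershed labels
parts = split_by_watersheds(mask, basins)
```

Applying the same basin grid to the manual, algorithm, and CGI data sets yields directly comparable per-glacier areas.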

Results
Over the eight Landsat footprints used in this study, we map ∼ 44 000 km² of glaciers over two distinct time slices. Several additional time periods were mapped but not included in the statistical analysis presented in this paper.

Statistical analysis of algorithm errors
A subset of 215 glaciers of varying size and topographic setting was chosen from the manual control data sets for more detailed analysis. The unedited, algorithm-generated glacier outlines were compared against spectral outlines, which classify glacier areas via commonly used spectral subsetting alone (using TM1, TM3, and TM5, produced in Step 2a), the manual control data sets, and the CGI v2. Figure 5a shows the bulk elevation distributions across the 215 glaciers for each data set in 10 m elevation bins. There is some apparent bias in our algorithm towards low-elevation areas, which represent the debris-covered portions of glaciers and are the most difficult areas to classify. This bias also stems from misclassified areas in shadows, particularly on north-facing glaciers. There is also a bias in our control data set towards underclassifying the high-elevation areas, which we attribute to user bias in removing isolated rock outcrops within glaciers, as opposed to simply defining accumulation areas as a single polygon. In general, the algorithm and the control data set are well matched below 4000 m; above this, the spectral data set and the algorithm data set begin to align closely and generally follow the manually digitized data. This threshold represents the general transition from debris-covered glaciers to clean glacier ice in the study area. Our algorithm output is also well matched with the CGI v2, except at very high elevations, where it overclassifies some areas as compared to the CGI v2.
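The binned elevation distributions of Fig. 5a can be computed along these lines (illustrative numpy code; the toy DEM values are invented):

```python
import numpy as np

def elevation_distribution(dem, mask, bin_m=10):
    """Glacier pixel counts per 10 m elevation bin for one data set's mask."""
    elevs = dem[mask]
    lo = np.floor(elevs.min() / bin_m) * bin_m
    hi = np.ceil(elevs.max() / bin_m) * bin_m
    edges = np.arange(lo, hi + bin_m, bin_m)
    counts, _ = np.histogram(elevs, bins=edges)
    return edges, counts

dem = np.array([[3995., 4001.], [4004., 4012.]])
mask = np.ones((2, 2), bool)   # the glacier mask being profiled
edges, counts = elevation_distribution(dem, mask)
```

Running this for the algorithm, spectral, manual, and CGI masks over a common DEM gives directly comparable hypsometric curves.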
In order to examine inherent bias throughout the algorithm classification, under- and overclassified areas were examined for a subset of the control data set. To determine areas of overclassification (underclassification), the manually (algorithm-) generated data set was subtracted from the algorithm (manual) data set, leaving only pixels that were overclassified (underclassified). Figure 5b shows the elevation distributions of under- and overclassified areas. The algorithm tends to consistently overclassify areas across the range of glacier elevations, which we attribute to differences in manual and algorithm treatment of steep and de-glacierized areas within glacier accumulation zones. Importantly, the algorithm underclassifies a much smaller number of pixels, generally corresponding to areas below 4000 m, where debris tongues are dominant. The majority of these pixels are along the edges of debris-covered glacier tongues, which are removed by the algorithm due to their low relative velocity. It is also possible that some of these pixels are "dead ice", which is difficult to differentiate from debris tongues by visual inspection. The total misclassification of algorithm-derived outlines against the two independent manual control data sets is 2 and 10 %, respectively, which represents a significant improvement over a pure spectral delineation approach.
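The over/underclassification differencing described above can be sketched as follows (illustrative numpy code on toy masks; normalizing by the manual glacier area is one possible choice, not necessarily the study's):

```python
import numpy as np

def misclassification(algo, manual):
    """Over- and underclassified pixel masks, plus total misclassification
    expressed as a percentage of the manual glacier area."""
    over = algo & ~manual    # algorithm minus manual: overclassified
    under = manual & ~algo   # manual minus algorithm: underclassified
    pct = 100.0 * (over.sum() + under.sum()) / manual.sum()
    return over, under, pct

algo = np.array([[True, True], [False, False]])
manual = np.array([[True, False], [True, False]])
over, under, pct = misclassification(algo, manual)
```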
To investigate sampling bias in our analysis, we used 465 GLIMS glacier identification numbers (centroids, point features) that overlapped with the manual control data sets. A random subset of 100 of these points was chosen for this analysis. As can be seen in Fig. 5c, similar patterns emerge between the randomly sampled glaciers and the sampling used in other sections of this paper. There is evidence of more noise in the random sample, as some glaciers which we avoided due to proximity to wet sand or other hard-to-classify areas were chosen during the random sampling. However, the relationship between the algorithm and the manual data sets remains significant (Kolmogorov-Smirnov test passes at the 99 % confidence level).
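A two-sample Kolmogorov-Smirnov comparison of this kind can be run with scipy (illustrative code on synthetic per-glacier samples, not the study's data; the lognormal areas and 5 % noise are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical per-glacier area samples (km^2) for the algorithm and the
# manual control data set; small multiplicative noise mimics per-glacier
# digitization differences
algo_areas = rng.lognormal(mean=0.5, sigma=1.0, size=100)
manual_areas = algo_areas * rng.normal(1.0, 0.05, size=100)

stat, p = stats.ks_2samp(algo_areas, manual_areas)
# Fail to reject "same distribution" when p exceeds the chosen alpha (0.01)
same_distribution = p > 0.01
```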

Vertex distance matching
To capture changes in the shape of the glacier outlines between the initial spectral classification and the final algorithm output, we computed the distance between pairs of glacier vertices. We first reduced our manual control data set to a set of X/Y pairs for each component vertex, which were then matched to the closest vertex in the resulting spectral and final algorithm polygons, respectively (Fig. 6).
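The nearest-vertex matching can be sketched with a k-d tree (illustrative scipy code; the toy vertex coordinates are invented):

```python
import numpy as np
from scipy.spatial import cKDTree

def vertex_distances(control_xy, test_xy):
    """Distance from each manual-control vertex to its nearest vertex
    in a test (spectral or final algorithm) outline."""
    dists, _ = cKDTree(test_xy).query(control_xy)
    return dists

control = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0]])  # manual vertices
algo = np.array([[0.0, 1.0], [10.0, 0.0], [9.0, 10.0]])      # algorithm vertices
d = vertex_distances(control, algo)
```

The resulting distance distributions, normalized by their maxima, correspond to the curves compared in Fig. 6.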
The distance distribution for the algorithm data set shows generally close agreement between the algorithm and manual control data sets. The spectral data set also contains a large percentage of vertices close to a 1:1 agreement with the manual control data set, which are primarily vertices at the upper edges of glaciers or from small, debris-free glaciers. The difference between these two distributions is attributed to the increased precision with which the algorithm maps debris-covered glacier outlines. Both data sets were normalized by their whole-data-set maximum distances.

Comparison with previous glacier mapping algorithms
Several authors have presented alternative debris-covered glacier classification methods and schemes using thermal and spectral data (Taschner and Ranzi, 2002), topographic and neighborhood analysis (Paul et al., 2004), clustering of optical and thermal data (Bolch et al., 2007), maximum likelihood classification (Shukla et al., 2010), slope and curvature clustering combined with thermal data (Bhambri et al., 2011), decision tree classification and texture analysis (Racoviteanu and Williams, 2012), and object-based classification (Rastner et al., 2014). While all of these methods present improvements over the basic clean-ice delineation proposed by Hall et al. (1987), they each have shortcomings that limit their range of use. Table 2 shows a comparison of these different methods alongside the algorithm presented in this study.
Our study improves on previous work in three main ways: (1) reduced computational intensity, (2) greater diversity of study area, and (3) increased temporal range of our data set. The methods proposed in this study, excepting the generation of a velocity field, require very little processing power. Once the initial input data sets (velocity surface, rasterized river network) have been created, a Landsat scene can be processed in 3-5 min (Ubuntu 14.04, 8 cores (3.6 GHz), 16 GB RAM). This compares favorably with the training data set creation and computationally expensive classification schemes required by many previous methods. This has allowed us to further generalize our algorithm to be effective beyond a single scene or small set of scenes, and to remain effective across a wide spatial and temporal range. The time-dynamic aspect of our algorithm can also complement time-static wide-area data sets, such as the RGI v4.0, the CGI v2, and the forthcoming GAMDAM data sets (Arendt et al., 2012; Guo et al., 2015; Nuimura et al., 2015). While these data sets may provide higher-quality manually digitized outlines for specific glaciers, they only provide a single snapshot in time and are limited to a specific area of coverage.

Additional tested filtering steps
Two additional topographic indices, spatial fast Fourier transforms (FFTs, also known as 2-D FFTs) and ASTER surface roughness measurements, were tested during the development of the algorithm, although neither provided significant improvement. We attempted to derive frequential information from several Landsat and ASTER bands, with limited success. Some glaciers exhibit a unique frequency signature when analyzed using spatial FFTs, although these signatures were not consistent across multiple debris-covered glaciers with differing surface characteristics. Additionally, the FFT approach was tested against a principal component analysis (PCA) image derived from all Landsat bands, without significant improvement to the algorithm. We also attempted to integrate surface roughness measurements from the ASTER satellite, which captures both nadir (3N) and backward-looking (3B) images, primarily intended for the generation of stereoscopic DEMs. The difference in imaging angle provides the opportunity to examine surface roughness through changes in shadowed areas (Mushkin et al., 2006; Mushkin and Gillespie, 2011). We found that there are slight surface roughness differences between terrain on and off glaciers; however, these differences are not significant enough to use as a thresholding metric. Furthermore, the steep topography limits the efficacy of this method, as valleys which lie parallel to the satellite flight path and those which lie perpendicular to it show different results. Thus, the algorithm relies on the velocity and slope thresholds to characterize the topography of glacier areas.
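The spatial-FFT test can be illustrated with a radially averaged power spectrum (illustrative numpy code on a synthetic periodic surface; as noted above, real debris surfaces did not show consistent signatures of this kind):

```python
import numpy as np

def radial_power_spectrum(patch):
    """Radially averaged 2-D FFT power spectrum of an image patch."""
    f = np.fft.fftshift(np.fft.fft2(patch - patch.mean()))
    power = np.abs(f) ** 2
    cy, cx = patch.shape[0] // 2, patch.shape[1] // 2
    y, x = np.indices(patch.shape)
    r = np.hypot(y - cy, x - cx).astype(int)  # integer radius per pixel
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)       # mean power per radius bin

# Synthetic debris surface: a ridge pattern with an 8-pixel period plus noise
y, x = np.mgrid[0:64, 0:64]
patch = np.sin(2 * np.pi * x / 8) + 0.1 * np.random.default_rng(1).normal(size=(64, 64))
spectrum = radial_power_spectrum(patch)
```

The 8-pixel ridge period produces a clear peak at radial frequency 8; in practice such peaks varied glacier to glacier, which is why the FFT index was not adopted.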

Algorithm use cases and caveats
The glacier outlines provided by the algorithm are a useful first-pass analysis of glacier area. It is often more efficient to digitize only misclassified areas, as opposed to digitizing entire glacier areas by hand (Paul et al., 2013). Paul et al. (2013) also note that for clean ice, automatically derived glacier outlines tend to be more accurate, and it is only in the more difficult debris-covered and shadowed areas that manual digitization becomes preferable. In the algorithm presented here, clean-ice thresholding was implemented using TM1, TM3, and TM5. However, because the algorithm operates primarily on "potential debris areas", any clean-ice classification scheme could be used. For example, in other study regions or for different satellite sensors, other schemes, such as the Normalized Difference Snow Index (Dozier, 1989), may outperform the clean-ice classification as implemented in this study.
The algorithm moves a step beyond spectral-only classification and attempts to classify glacier areas as accurately as possible, including debris-covered areas. As can be seen in Fig. 7, the algorithm compares well with both the control data set and the CGI v2, a high-fidelity, manually edited data set, across a range of glacier types (Guo et al., 2015). However, the algorithm outlines do not perfectly align with either data set. In Fig. 7, a tendency to remove pixels along the edges of debris-covered glacier tongues can be observed, which we attribute to the fact that the center of a debris tongue often moves faster than its edges. Furthermore, both the algorithm results and the manual control data set underestimate glacier area as compared to the CGI v2, due to the removal of non-clean-ice pixels at high altitudes or on high slopes, which are generally within the accumulation area of a glacier but are not always covered by permanent ice. These two types of classification bias are easily rectified with minimal manual intervention. Some biases between the manual or algorithm data sets and the CGI v2 can also be attributed to differences in time; while the manual and algorithm data sets share an image date, the CGI v2 was digitized on top of multiple images that may not match up perfectly in time with our data sets. Despite these misclassified areas, the raw algorithm output effectively identifies the furthest reaches of the glacier tongues in most cases, as can be seen in the three long debris tongues shown in Fig. 7.
Without post-processing, these raw glacier outlines can be used to analyze regional glacier characteristics, such as slope, aspect, and hypsometry. Even if glacier outlines are not perfectly rectified in space, at the scale of watersheds, satellite image footprints, or mountain ranges, errors of under- and overclassification even out, yielding valuable regional statistics (Fig. 5a). As the method can be easily modified to fit the topographic and glacier setting of any region, it is a powerful tool for analyzing glacier changes over large scales for the period of Landsat TM, ETM+, and OLI coverage. While the algorithm has yet to be applied to large and slow-moving debris-covered glaciers in the Himalaya, a wide range of glacier size classes, speeds, and topographic settings are well classified by the algorithm. For example, even small glacier changes are captured by the algorithm, as can be seen in Fig. 8.
Figure 8 also illustrates some potential errors in the algorithm, where river sand is sometimes delineated as glacier area. In many cases, the same areas are captured across different time periods, as the topographic and velocity data used to define "potential debris areas" are mostly static in time, excepting the distance-weighting steps. However, these areas are easily removed during manual inspection of results.
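The "potential debris area" mask described above (cf. Fig. 2b) is a simple function of topography. The following is a minimal sketch, assuming co-registered NumPy arrays for the DEM and slope raster; the slope (1-24°) and elevation (∼ 2500 m) thresholds follow the figure caption, and the false-positive mask is a hypothetical illustration of the manual cleanup step:

```python
import numpy as np

def potential_debris_mask(dem, slope_deg,
                          slope_min=1.0, slope_max=24.0,
                          elev_min=2500.0):
    """Sketch of the 'potential debris area' mask (cf. Fig. 2b):
    gentle slopes (1-24 degrees) with elevations below ~2500 m
    removed. Inputs are assumed to be co-registered NumPy arrays."""
    return ((slope_deg >= slope_min) & (slope_deg <= slope_max)
            & (dem >= elev_min))

# Because the mask depends only on (mostly static) topography, a
# recurring false positive such as a river sandbank can be digitized
# once and the correction reused across all acquisition dates, e.g.:
#   mask &= ~manual_false_positive_mask   # hypothetical cleanup raster
```

Since the topographic inputs barely change between scenes, this mask only needs to be computed once per Path/Row, which is part of why bulk processing is efficient.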
The second use case for the algorithm is as a substitute for simple spectral ratios. Manual digitization of glacier tongues is time consuming, particularly in regions with numerous debris-covered glaciers. Our algorithm provides a robust baseline set of glacier outlines that can be corrected manually with minimal extra processing time. As generating the input velocity surfaces can take longer than processing glacier outlines from dozens of Landsat scenes, efficiencies are gained when Landsat scenes are processed in bulk. The algorithm as presented in this paper takes ∼ 3-5 min of actual processing time once the base data sets have been created. For a single Path/Row combination, the time to set up the input data sets (velocity surface, manual debris points) is ∼ 4 h. Once this initial setup has been completed for a given Path/Row combination, any number of Landsat scenes can be processed very quickly.
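The velocity surfaces that dominate this setup time are generated with normalized image cross-correlation (NICC) between repeat acquisitions (cf. Fig. 2c). The following is a toy sketch of the patch-matching idea only, not the implementation used in this study; the patch and search-window sizes are arbitrary choices for demonstration:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equal-size patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def patch_displacement(img1, img2, row, col, half=8, search=10):
    """Toy NICC matcher: slide the patch centered at (row, col) in
    img1 over a search window in img2 and return the (row, col)
    pixel offset with the highest normalized correlation.
    Window sizes are illustrative, not the paper's parameters."""
    ref = img1[row - half:row + half + 1, col - half:col + half + 1]
    best, best_off = -2.0, (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = row + dr, col + dc
            cand = img2[r - half:r + half + 1, c - half:c + half + 1]
            if cand.shape != ref.shape:  # skip windows off the image edge
                continue
            score = ncc(ref, cand)
            if score > best:
                best, best_off = score, (dr, dc)
    return best_off
```

Repeating this over a grid of patch centers yields a displacement (velocity) surface; dividing offsets by the time between acquisitions converts them to m yr⁻¹, which is the one-time cost per Path/Row noted above.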
Although the algorithm represents a step forward in semi-automated glacier classification, there are several important caveats to keep in mind. (1) Lack of data density and temporal range limits the efficacy of individual glacier analysis; the algorithm presented in this paper was not designed with individual glacier studies in mind, and in many cases, such as in mass balance studies, more accurate manual glacier outlines are necessary. Furthermore, (2) the algorithm relies on manual intervention to separate individual glaciers which are connected through overlapping classified areas or which are part of glacier complexes. Finally, (3) the algorithm relies heavily on the fidelity of the Landsat images provided, in that glacier outlines on images with cloud or snow cover are less likely to be well defined. This creates a data limitation, as many glacierized areas are subject to frequent cloud and snow cover and thus have a limited number of potentially useful Landsat images for the purpose of this algorithm.

Conclusions
This study presents an enhanced glacier classification methodology based on the spectral, topographic, and spatial characteristics of glaciers. We present a new method of (semi-)automated glacier classification, which builds upon, but is distinct from, the work of previous authors. Although it does not completely solve the difficulties associated with debris-covered glaciers, it can effectively and rapidly characterize glaciers over a wide area. Following an initial delineation of clean glacier ice, a set of velocity, spatial, and statistical filters is applied to accurately delineate glacier outlines, including their debris-covered areas.
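The overall sequence, clean-ice delineation followed by a velocity-based filter on topographic debris candidates, can be sketched as follows. This is a minimal, hedged illustration: the band-ratio and speed thresholds here are placeholders (the paper's abstract names the Landsat 7 band-3/band-5 ratio, but the exact threshold values are not reproduced here), and the statistical and distance-weighting filters are omitted for brevity:

```python
import numpy as np

def classify_glacier(b3, b5, slope_deg, dem, speed,
                     ratio_thresh=2.0, speed_thresh=5.0):
    """Illustrative sketch of the classification sequence:
    (1) clean ice from a band-3/band-5 ratio threshold;
    (2) candidate debris pixels from gentle slopes above ~2500 m;
    (3) a velocity filter keeping only candidates that move faster
    than stable ground. Inputs are co-registered arrays; thresholds
    are placeholders, not the values used in the study."""
    clean_ice = (b3 / np.maximum(b5, 1e-6)) > ratio_thresh             # step 1
    candidates = (slope_deg >= 1) & (slope_deg <= 24) & (dem >= 2500)  # step 2
    moving_debris = candidates & (speed > speed_thresh)                # step 3
    return clean_ice | moving_debris
```

The distance-weighting and statistical filters described in Fig. 2d-f would then prune this boolean raster before vectorizing the outlines.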
When compared visually and statistically against a manually digitized control data set and the high-fidelity CGI v2, our algorithm remains robust across the diverse glacier sizes and types found in central Asia. The algorithm developed here is applicable to a wide range of glacierized regions, particularly those where debris-covered glaciers are dominant and extensive manual digitization of glacier areas has previously been required. The raw algorithm output is usable for rough statistical queries on glacier area, hypsometry, slope, and aspect; however, manual inspection of algorithm output is necessary before using the generated glacier outlines for more in-depth area change or mass balance studies.
The Supplement related to this article is available online at doi:10.5194/tc-9-1747-2015-supplement.

Figure 1 .
Figure 1. Greater study area of the Tien Shan, showing SRTM v4.1 topography (Jarvis et al., 2008) and the locations of eight Landsat image footprints (grayscale) used in the study, along with their Path/Row combinations. Blue box delineates Figs. 2, 3 and 7; yellow box delineates Fig. 8. Winter westerly disturbances and the Siberian High are highlighted in orange.

Figure 2 .
Figure 2. (a) Characteristic example of a debris-covered glacier tongue (Inylchek Glacier). Spectrally delineated glacier outlines (black), over Landsat bands B7/B5/B3 (R/G/B), from image LC81470312013268LGN00, with poorly mapped debris-covered tongues (red arrows). (b) Blue areas show "potential debris areas", as delineated by slopes of 1-24°, with elevations below ∼ 2500 m removed; SRTM hillshade underneath, clean-ice outlines overlain in black. (c) Example of a glacier velocity surface, generated using normalized image cross-correlation (NICC). Areas in red are slow-moving and represent stable ground; clean-ice outlines overlain in black. (d) Example of distance-weighting seed areas used to remove pixels from the "potential debris areas" which are distant from either a river valley or classified glacier ice. Rivers in blue, clean-ice outlines overlain in black. (e) Areas removed by the second distance-weighting step (yellow). (f) Impacts of statistical filtering on glacier outlines, with areas in black removed during the filtering process. East and west Qong Terang glaciers.

Figure 3 .
Figure 3. Final algorithm outlines (black), with areas classified in addition to the clean-ice delineation in red. Landsat OLI image captured on 25 September 2013 as background.

Figure 4 .
Figure 4. Glacier size class distribution (n = 750) for the manual control data set.Note the logarithmic x axis to account for a wide range of glacier sizes.

Figure 5 .
Figure 5. (a) Bulk elevation distributions of sampled glaciers, with manual delineation (reference data set, n = 215, 4500 km²) in blue, algorithm-derived delineation in red, spectral delineation in green, and CGI v2 in black. Values have been normalized to maximum probability. (b) Elevation distributions of over- and underclassified glacier areas, as compared to a manual control data set (n = 75, 330 km²); 5.5 % is overclassified, and 0.8 % is underclassified. (c) Averaged elevation differences for a random sample of glaciers overlapping the manual control data set (n = 100, 100 km²).

Figure 6 .
Figure 6. Vertex distance distributions for algorithm (blue) and spectral (red) vertices, as compared to a manual control data set, normalized to the maximum distance.

Figure 7 .
Figure 7. Algorithm outlines (yellow) compared to the control data set (black) and the CGI v2 (red) illustrate high fidelity in overall debris-tongue length between the three data sets, although the algorithm outlines exhibit noise along the edges of debris tongues.

Figure 8 .
Figure 8. Algorithm outlines for July 2013 (black) and for August 2002 (yellow), showing small retreats in glacier areas, particularly at the debris tongues. Vicinity of the Akshiirak glacierized massif, central Tien Shan.

Table 1 .
Data table listing Landsat acquisition dates used in this study.Organized by WRS2 Path/Row combinations.Bold dates indicate images used for velocity profiles.

Compared to the neighborhood analyses employed by other studies, there is a clear improvement in efficiency. Secondly, we analyze a significantly larger glacier area than any of the previous studies, which has helped us generalize our algorithm and methods to a wide range of topographic and land cover settings. Finally, we process a multi-year data set, encompassing 40 Landsat scenes with varying land cover and meteorological settings.