A wealth of research has focused on elucidating the key
controls on mass loss from the Greenland and Antarctic ice sheets in
response to climate forcing, specifically in relation to the drivers of
marine-terminating outlet glacier change. The manual methods traditionally
used to monitor change in satellite imagery of marine-terminating outlet
glaciers are time-consuming and can be subjective, especially where
mélange exists at the terminus. Recent advances in deep learning applied
to image processing have created a new frontier in the field of automated
delineation of glacier calving fronts. However, there remains a paucity of
research on the use of deep learning for pixel-level semantic image
classification of outlet glacier environments. Here, we apply and test a
two-phase deep learning approach based on a well-established convolutional
neural network (CNN) for automated classification of Sentinel-2 satellite
imagery. The novel workflow, termed CNN-Supervised Classification (CSC), is
adapted to produce multi-class outputs for unseen test imagery of glacial
environments containing marine-terminating outlet glaciers in Greenland.
Different CNN input parameters and training techniques are tested, with
overall
Quantifying glacier change from remote sensing data is essential to improve our understanding of the impacts that climate change has on glaciers (Vaughan et al., 2013; Hill et al., 2017). In many glaciated areas, well-established semi-automated techniques such as image band ratio methods are used to extract glacier outlines for this purpose and to create glacier inventories (Paul et al., 2016). These methods are widely used in studies of mountain glaciers and ice caps (e.g. Bolch et al., 2010; Frey et al., 2012; Rastner et al., 2012; Guo et al., 2015; Stokes et al., 2018). However, they are less effective for mapping more complex glaciated landscapes such as marine-terminating outlet glaciers, which often contain spectrally similar surfaces like mélange (a mixture of sea ice and icebergs) near their calving fronts (Amundson et al., 2020).
As a result, manual digitisation remains the most common technique used to delineate marine-terminating glaciers (e.g. Miles et al., 2016, 2018; Carr et al., 2017; Wood et al., 2018; Brough et al., 2019; Cook et al., 2019; King et al., 2020). Nonetheless, the labour-intensive nature of manual digitisation can result in datasets with spatial or temporal limitations (Seale et al., 2011). With this in mind, the importance of processes occurring at marine-terminating outlet glaciers on a range of spatio-temporal scales (Amundson et al., 2010; Juan et al., 2010; Chauché et al., 2014; Carroll et al., 2016; Bunce et al., 2018; Catania et al., 2018, 2020; King et al., 2018; Bevan et al., 2019; Sutherland et al., 2019; Tuckett et al., 2019) highlights the growing need for a more efficient method to quantify outlet glacier change, especially in an era of increasingly available satellite data.
To confront this challenge, several specialised automated techniques reliant on traditional image processing and computer vision tools (i.e. semantic segmentation and edge detection) have been developed to extract ice fronts in Greenland and Antarctica (Sohn and Jezek, 1999; Liu and Jezek, 2004; Seale et al., 2011; Krieger and Floricioiu, 2017; Yu et al., 2019). Semantic segmentation, a term interchangeable with pixel-level semantic classification, divides an image into its constituent parts based on groups of pixels of a given class and assigns each pixel a semantic label (Liu et al., 2019). It remains a core concept underlying more recent advancements which use deep learning approaches to classify imagery for more efficient automated calving front detection (Baumhoer et al., 2019; Mohajerani et al., 2019; Zhang et al., 2019; Cheng et al., 2021).
Deep learning is a type of machine learning in which a computer learns
complex patterns from raw data by building a hierarchy of simpler patterns (Goodfellow et al., 2016). Convolutional neural networks (CNNs)
are deep learning models specifically designed to process multiple 2D arrays
of data such as multiple image bands (LeCun et al., 2015).
They differ from conventional classification algorithms, which rely solely on the
spectral properties of individual pixels, by detecting contextual
information in images, such as shape and texture, in the same way a human
operator would. This is beneficial for classification of complex
environments with little contrast between spectrally similar surfaces (e.g.
glacier ice/ice shelves, snow, mélange, and water containing icebergs)
where traditional statistical classification techniques (e.g. maximum
likelihood) produce noisier classifications (Li et al., 2014). Previous studies which apply deep learning to detect the calving fronts of marine-terminating glaciers used a type of CNN called a
fully convolutional neural network (FCN) (Ronneberger et al., 2015) and various post-processing techniques to extract the boundaries between (1) ice and ocean in Antarctica (Baumhoer et al., 2019) and (2) marine-terminating
outlet glaciers and mélange/water in Greenland (Mohajerani et al., 2019;
Zhang et al., 2019; Cheng et al., 2021). Calving fronts detected using these
methods deviate by 38 to 108 m (
These approaches have so far relied on a binary classification of input images. For example, Baumhoer et al. (2019) used only two classes (land ice and ocean). Similarly, Zhang et al. (2019) classified images into ice mélange regions and non-ice-mélange regions (the latter including both glacier ice and bedrock). While these methods are valuable for extracting glacier and ice shelf fronts to quantify fluctuations over time, they perhaps overlook the ability of deep learning methods to create highly accurate image classification outputs which contain more than two classes (i.e. not just ice and no-ice areas). Aside from calving front delineation, a method which produces multi-class image classifications could provide an efficient way to further elucidate processes and interactions controlling outlet glacier behaviour at high temporal resolution (e.g. calving events, the buttressing effects of mélange, subglacial plumes, and supra-glacial lakes). Moreover, deep learning has been used successfully in other disciplines to classify entire landscapes or image scenes to a high level of accuracy (Sharma et al., 2017; Carbonneau et al., 2020a). In glaciology, CNNs have been used to map debris-covered land-terminating glaciers (Xie et al., 2020), rock glaciers (Robson et al., 2020), supraglacial lakes (Yuan et al., 2020), and snow cover (Nijhawan et al., 2019). Despite this, multi-class image classification of entire marine-terminating outlet glacier environments has not yet been tested using deep learning.
Thus, the aim of this paper is to adapt a two-phase deep learning method which was originally developed to classify airborne imagery in fluvial settings (Carbonneau et al., 2020a) and test it on satellite imagery of marine-terminating outlet glaciers in Greenland. We first modify and train a well-established CNN using labelled image tiles from 13 seasonally variable images of Helheim Glacier, southeast Greenland. The two-phase deep learning approach is then applied to produce pixel-level classifications, from which calving front outlines are detected and error is estimated from manually delineated validation labels. We assess the sensitivity of the classification workflow to different image band combinations, training techniques, and model parameters for fine-tuning and transferability. Our objective is to establish and evaluate a workflow for multi-class image classification for glacial landscapes in Greenland which can be accessed and used rapidly without having specialised knowledge of deep learning or the need for time-consuming generation of substantial new training data. Furthermore, we aspire to exceed the current state of the art for pixel-level image classification of marine-terminating outlet glacier landscapes. The methods developed here are trained and tested on glaciers in Greenland with a pre-defined set of seven image classes.
The classification workflow used here is termed CNN-Supervised
Classification (CSC) and was originally developed and tested on airborne
imagery (Carbonneau et al., 2020a).
The pre-trained CNN applied in phase one of CSC falls into the category of
supervised learning (Goodfellow et al., 2016) and is trained with a sample
of image tiles which have been manually labelled according to class
(training dataset). Each tile used to train the phase one CNN represents a
sample of pure class (i.e. one class covers over 95 % of the tile area),
allowing the CNN to learn predictive features and subsequently make class
predictions for a tiled input image not previously seen in training (test
dataset). During phase one of CSC, unseen test images are tiled and encoded
in the form of 4D tensors which contain several separate tiles (dimensions:
tiles, tile height, tile width, image bands). The phase one CNN then predicts a class for each tile, producing a tiled class raster.
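As an illustration of this tiling step, the short sketch below shows one way an RGBNIR image could be cut into a 4D tensor of tiles and classified tile by tile. It is a minimal example rather than the released CSC code; the tile size, the function and variable names, and the use of the 8192 normalisation factor mentioned later in the text are illustrative assumptions.

```python
import numpy as np

def tile_image(image, tile_size):
    """Cut an RGBNIR image (H, W, 4) into non-overlapping tiles and stack them
    into a 4D tensor of shape (tiles, tile_size, tile_size, bands).
    Minimal sketch; edge pixels that do not fill a whole tile are dropped."""
    h, w, bands = image.shape
    rows, cols = h // tile_size, w // tile_size
    tiles = np.zeros((rows * cols, tile_size, tile_size, bands), dtype=np.float32)
    for i in range(rows):
        for j in range(cols):
            tiles[i * cols + j] = image[i * tile_size:(i + 1) * tile_size,
                                        j * tile_size:(j + 1) * tile_size, :]
    return tiles, rows, cols

# Hypothetical usage: normalise raw Sentinel-2 digital numbers by a constant
# factor of 8192 (as in the pre-processing described later), tile the image,
# and let the phase one CNN assign one class per tile.
# raw = ...  # (H, W, 4) array of Sentinel-2 bands 4, 3, 2, 8
# tiles, rows, cols = tile_image(raw.astype(np.float32) / 8192.0, tile_size=50)
# tile_probs = phase_one_cnn.predict(tiles)              # shape (tiles, 7)
# tiled_class_raster = tile_probs.argmax(axis=1).reshape(rows, cols)
```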
Conceptual diagram of the CNN-Supervised Classification workflow showing the production of a tiled class raster in phase one. Phase one predictions are then used as image-specific training labels for the phase two model which produces a final pixel-level classification.
Since the phase one CNN predictions take the form of a tiled class raster, individual tiles may straddle more than one class, producing inaccurate class boundaries and therefore some error in the phase one predictions and, in turn, the phase two training labels. Nonetheless, deep learning approaches have been found to tolerate noise in training labels (Rolnick et al., 2018): because the training process minimises overall error rather than memorising noise, models can still learn a trend even if some labels are wrong. Likewise, the phase two models used in CSC are robust to noise and have been shown to overcome these errors, with the resulting pixel-level classifications following class boundaries much more accurately (Carbonneau et al., 2020a).
An area spanning
The glacier, fjord, and surrounding landscape provide an ideal training area for the deep learning workflow because they contain a number of diverse elements that vary over short spatial and temporal scales and are typical of other complex outlet glacier settings in Greenland. These characteristics include (1) seasonal variations in glacier calving front position; (2) weekly to monthly changes in the extent and composition of mélange; (3) sea ice in varying stages of formation; (4) varying volumes and sizes of icebergs in fjord waters; (5) seasonal variations in the degree of surface meltwater on the glacier and ice mélange; (6) short-lived, meltwater-fed glacial plumes which result in polynyas adjacent to the terminus; and (7) seasonal variations in snow cover on both bedrock and ice. The resulting spectral variations over multiple satellite images, in addition to potential differences resulting from changes in illumination and weather, pose a considerable challenge to image classification. However, capturing these characteristics at the scale of an entire outlet glacier image scene is important for a more efficient and integrated understanding of how numerous glacial processes interact. Examination of imagery showing the seasonal change of the glacial landscape throughout 2019 resulted in the establishment of seven semantic classes, including (1) open water, (2) iceberg water, (3) mélange, (4) glacier ice, (5) snow on ice, (6) snow on rock, and (7) bare bedrock (see class examples in Fig. 2b and detailed criteria for each in Table S1). Training and validation data for the phase one CNN applied in CSC was collected from the Helheim study area shown in Fig. 2 and labelled according to these seven classes.
The ability of a model to accurately predict the class of pixels in an
unseen test image is called generalisation (Goodfellow et al., 2016) and determines the transferability of the model. To test the transferability of the CSC workflow adapted for marine-terminating glacial landscapes in Greenland, we applied CSC to a test dataset composed of seasonally variable imagery from in-sample and out-of-sample study sites (Fig. 3). CSC was never tested on any image that was used in training. Rather, the in-sample test dataset comprises images from the same glacier used in training but acquired on different dates from the training data.
The in-sample test site includes Helheim Glacier (Helheim) and has a
slightly smaller area (
Test areas used to quantify the transferability of the CSC
workflow.
The out-of-sample test areas contain Jakobshavn Isbrae (Jakobshavn) and
Store Glacier (Store) in central west (CW) Greenland, and they represent
outlet glacier landscapes never seen during training (Fig. 3b and c). The
Jakobshavn site spans
To train and test the CSC workflow, Sentinel-2 image bands 4, 3, 2, and 8 (red, green, and blue (RGB) and near infrared (NIR)) were used at 10 m spatial resolution. RGB bands are commonly selected for image classification with deep learning architectures, making existing CNNs easily transferable for the purpose of this study. Additionally, snow and ice have high reflectance in the NIR band, which is often used in remote sensing of glacial environments, for example to identify glacier outlines using band ratios (e.g. Alifu et al., 2015). Initial testing revealed that the combination of RGB and NIR bands (collectively referred to as RGBNIR) improved classification results compared to using RGB bands alone (see Sect. 2.6). Thus, four-band RGBNIR images of the study sites were used as CSC inputs.
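For illustration, the following sketch shows how a four-band RGBNIR stack could be assembled from the individual Sentinel-2 10 m band files; it is a hedged example using the rasterio library, and the file names are hypothetical.

```python
import numpy as np
import rasterio

# Minimal sketch of assembling a four-band RGBNIR input from Sentinel-2 L2A
# 10 m bands; the file names below are hypothetical examples.
band_files = ["B04_10m.jp2", "B03_10m.jp2", "B02_10m.jp2", "B08_10m.jp2"]  # R, G, B, NIR
bands = []
for path in band_files:
    with rasterio.open(path) as src:
        bands.append(src.read(1).astype(np.float32))
rgbnir = np.stack(bands, axis=-1)  # shape (H, W, 4), band order R, G, B, NIR
```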
Cloud cover and insufficient solar illumination present challenges when
using optical satellite imagery such as Sentinel-2 data, meaning data
availability for the study sites was limited to cloud-free imagery from
February to October. Despite these limitations, sufficient data were
available to train and test CSC on seasonal timescales. Therefore, to best
encompass the seasonally variable landscape characteristics and collect
sufficient training data to represent intra-class variation, 13 cloud-free
Sentinel-2 images of the Helheim training area, taken between February and
October 2019, were acquired for phase one CNN training (Table S2 in the
Supplement). Similarly, a seasonally variable test dataset composed of nine
in-sample images from 2019 with different dates to training data and 18
out-of-sample images from February to October 2020 were acquired (Table S2
in Supplement). Level-2A products were downloaded from Copernicus Open
Access Hub (available at
For the base architecture of the pre-trained CNN used in phase one of CSC, we
adapted a well-established CNN called VGG16 (Simonyan and Zisserman, 2015), which achieved state-of-the-art performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014. The architecture used consists of 13 2D convolutional layers arranged in five stacks, each convolution using a 3 × 3 kernel.
Architecture of phase one CNN, adapted from the original VGG16
model architecture (Simonyan and Zisserman, 2015). Diagram shows an example
with a
The input image tile size for the first convolutional layer in the original
VGG16 model architecture was fixed as 224 × 224 px with three image bands (RGB).
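A minimal Keras sketch of how a VGG16-style phase one classifier with a four-band input and a seven-class softmax head could be constructed is given below. The tile size, the decision to initialise the convolutional base randomly rather than with ImageNet weights, and the dense head are assumptions made for illustration, not a description of the exact model used here.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

TILE = 50      # hypothetical tile size; several sizes were tested in this study
BANDS = 4      # RGBNIR
N_CLASSES = 7  # the seven semantic classes

# Build the VGG16 convolutional base for a 4-band input. ImageNet weights
# expect 3 channels, so the base is initialised randomly here (an assumption).
base = VGG16(include_top=False, weights=None, input_shape=(TILE, TILE, BANDS))

# Replace the original 1000-class head with a small dense classifier.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```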
To train the phase one CNN, we employed early stopping to control
hyperparameters and inhibit overfitting, which occurs when a model is unable
to generalise between training and validation data (Goodfellow et al.,
2016). To do this, we designed a custom callback that trains the network
until the validation data (20 % set aside with a train–validate–split)
reach a desired target accuracy threshold. These targets ranged from 92.5 % to 99 % and determined the number of epochs the CNN was trained for. We used categorical cross entropy as the loss function and Adam gradient-based optimisation (Kingma and Ba, 2017) with a learning
rate of
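The custom callback described above could be written along the following lines in Keras; the target value, the monitored metric name, and the learning rate shown in the commented usage are illustrative assumptions rather than the settings used in this study.

```python
import tensorflow as tf

class TargetAccuracyStop(tf.keras.callbacks.Callback):
    """Stop training once validation accuracy reaches a target threshold.
    A sketch of the custom callback described in the text; the exact
    implementation used in the study may differ."""
    def __init__(self, target=0.95):
        super().__init__()
        self.target = target

    def on_epoch_end(self, epoch, logs=None):
        val_acc = (logs or {}).get("val_accuracy")
        if val_acc is not None and val_acc >= self.target:
            print(f"Reached {self.target:.1%} validation accuracy; stopping.")
            self.model.stop_training = True

# Hypothetical usage with a 20 % validation split, Adam optimisation, and
# categorical cross entropy as described above (learning rate is illustrative):
# model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
#               loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, validation_split=0.2, epochs=100,
#           callbacks=[TargetAccuracyStop(target=0.95)])
```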
When applying CSC to multiple sites, we came to a similar conclusion to
Carbonneau et al. (2020a), who found that model transferability was
improved when the phase one CNN was trained with data from more than one
site. We therefore deployed a joint fine-tuning training procedure where a
CNN initially trained only on data from Helheim was trained further with a
small set of extra tiles (5000 samples per class) using only two images
(one from winter and one from summer) for all three glaciers. This fine
tuning was done at a low learning rate of
A dataset of 210 000 training samples with 30 000 image tiles per class was
used to train and validate the phase one CNN. To create the training tiles,
the RGBNIR images extracted from 13 Sentinel-2 acquisitions were manually
labelled according to the seven semantic classes using QGIS 3.4 digitising
tools. Vector polygons labelled by class number were rasterised to produce a
per-pixel class raster the same size as the training area. Both the input
image and class raster were then tiled using a script which extracted tiles
with high overlap using a stride of 20 px (Fig. 5). Each tile was
extracted and assigned a class label based on the manually delineated class
raster, and any tiles occupied by less than 95 % pure class were rejected,
removing tiles containing mixed classes. Once extracted, each image tile was
augmented by three successive rotations of 90°.
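The tiling, purity filtering, and rotation augmentation described above could be implemented roughly as in the sketch below; the function name and the handling of image edges are assumptions, and the example is not the script used to produce the training dataset.

```python
import numpy as np

def extract_training_tiles(image, class_raster, tile_size, stride=20, purity=0.95):
    """Slide a window over the image and per-pixel class raster, keep only
    tiles dominated by a single class, and augment each kept tile with three
    successive 90-degree rotations. A sketch of the procedure described above."""
    tiles, labels = [], []
    h, w = class_raster.shape
    for top in range(0, h - tile_size + 1, stride):
        for left in range(0, w - tile_size + 1, stride):
            patch_labels = class_raster[top:top + tile_size, left:left + tile_size]
            values, counts = np.unique(patch_labels, return_counts=True)
            # Reject mixed tiles: the dominant class must cover >= 95 % of pixels.
            if counts.max() / patch_labels.size < purity:
                continue
            dominant = values[counts.argmax()]
            tile = image[top:top + tile_size, left:left + tile_size, :]
            for k in range(4):  # original tile plus three 90-degree rotations
                tiles.append(np.rot90(tile, k))
                labels.append(dominant)
    return np.asarray(tiles), np.asarray(labels)
```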
Conceptual diagram of the tiling process used to create training
and validation data. A specified tile size and stride were used to extract
tiles from the class raster and training image. Image tiles were filtered,
augmented, and saved to individual class folders using an
The tiles were randomly allocated to training and validation folders with an
80 % to 20 % split.
To classify airborne imagery of fluvial scenes at pixel level using the CSC
workflow, Carbonneau et al. (2020a) applied a pixel-based approach using an
MLP in the second phase of the workflow, achieving high levels of accuracy
(90 %–99 %). We propose that pixel-based techniques may be less effective on coarser-resolution imagery such as Sentinel-2 data than on the high-resolution imagery for which the workflow was originally developed. Furthermore, in landscapes containing marine-terminating glaciers in particular, many distinct classes may be covered in snow or ice and therefore be very spectrally similar (i.e. all classes are white), in which case a pixel-based MLP would predictably struggle to differentiate between classes. So, in addition to testing a pixel-based MLP, we adopted a patch-based approach which uses a small window of pixels to determine the class of a central pixel, as in Sharma et al. (2017). This approach is based on the idea that a pixel in remotely sensed imagery is spatially dependent and likely to be similar to those around it (Berberoglu et al.,
2000). The use of a region instead of a single pixel allows for the
construction of a small CNN (dubbed “compact CNN” or cCNN: Samarth et al., 2019) with fewer convolutional layers that assigns a class to the central pixel according to the properties of the region (Carbonneau
et al., 2020b). It therefore combines spatial and spectral information.
Sharma et al. (2017) use a patch size of
For the pixel-based classification in phase two, we used an MLP (Fig. 6a). An
MLP is a typical deep learning model (also commonly known as an artificial
neural network) which consists of three (or more) interconnected layers (Rumelhart et al., 1986; Berberoglu et al., 2000). The MLP has five layers consisting of four fully connected (dense) layers and one batch normalisation layer (Fig. 6a). The first dense layer has 64 output units and is followed by a batch normalisation layer, which helps to reduce overfitting by normalising activations across each batch and thereby injecting a small amount of noise. This is followed by two more dense layers with 32 and 16 units, respectively. Each dense layer uses
The MLP was trained using conventional early stopping with a patience parameter and a minimum improvement threshold. The minimum improvement was set as 0.5 %. Training did not stabilise for at least 20 epochs, so the patience was set to 20. This means that if training does not improve the validation accuracy by 0.5 % after a period of 20 epochs, the training will stop. Since the MLP is pixel-based, the number of parameters was smaller compared to the patch-based model, with 3192 trainable parameters for RGBNIR imagery.
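A plausible Keras realisation of the phase two MLP and its early stopping configuration is sketched below. The ReLU activations and the final seven-class softmax layer are assumptions, since the text above does not state them explicitly, so the exact trainable parameter count may differ slightly from the 3192 reported.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Sketch of a phase two pixel-based MLP consistent with the description above
# (dense layers of 64, 32, and 16 units plus a batch normalisation layer).
# The ReLU activations and seven-class softmax output are assumptions.
mlp = models.Sequential([
    layers.Input(shape=(4,)),                 # one RGBNIR pixel
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(7, activation="softmax"),    # seven semantic classes
])
mlp.compile(optimizer="adam", loss="categorical_crossentropy",
            metrics=["accuracy"])

# Early stopping as described: stop if validation accuracy fails to improve
# by at least 0.5 % over 20 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy", min_delta=0.005, patience=20)
```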
For the patch-based classification in phase two, we used a cCNN architecture
(Fig. 6b). This model architecture is referred to as a compact CNN (see
Samarth et al., 2019) because it contains fewer convolutional layers in
comparison to conventional CNNs. The cCNN learns the class of a central
pixel in a patch as a function of its neighbourhood. So, for each pixel in
the input image, a small image tile is extracted with square dimensions of
the patch size (e.g.
The architecture of the cCNN is composed of a deepening series of
convolution layers which change depending on the patch size. In effect, we
use as many
As with the MLP, conventional early stopping was used to train the cCNN with
a patience parameter and a minimum improvement threshold. The minimum
improvement was set as 0.5 %. For patch sizes of
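The following sketch shows one plausible construction of such a compact CNN, interpreting the architecture description above as a stack of unpadded 3 × 3 convolutions that progressively shrink the patch to its central pixel; the filter counts and this interpretation are assumptions rather than the exact model used here.

```python
from tensorflow.keras import layers, models

def build_ccnn(patch_size=5, bands=4, n_classes=7):
    """Sketch of a compact CNN (cCNN) that predicts the class of the central
    pixel of a small patch. The filter counts and the number of 3x3
    convolutions (which the text says depends on patch size) are assumptions."""
    model = models.Sequential([layers.Input(shape=(patch_size, patch_size, bands))])
    filters, size = 16, patch_size
    # Stack unpadded 3x3 convolutions until the spatial dimension collapses.
    while size >= 3:
        model.add(layers.Conv2D(filters, 3, activation="relu"))
        size -= 2
        filters *= 2
    model.add(layers.Flatten())
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Hypothetical usage: classify the central pixel of each extracted patch.
# ccnn = build_ccnn(patch_size=5)
```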
The performance of CSC was tested in two ways to allow comparison to previous deep learning methods. Firstly, classification accuracy was measured using manually collected validation labels. Secondly, a calving front detection method was implemented, and error was quantified using manually digitised calving front data for all test images.
Model performance is often measured by classification accuracy (the number
of correct predictions divided by the total number of predictions). However,
some models require more robust measures of accuracy which also account for
confusion between predicted classes (Goodfellow et al., 2016; Carbonneau et
al., 2020a). We therefore used an
The validation labels used to calculate
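Assuming the confusion-aware metric referred to above is a weighted F1 score computed against the manually labelled validation pixels, it could be obtained as in the short scikit-learn sketch below; the integer class coding (1–7, with 0 marking unlabelled pixels) is an assumption made for illustration.

```python
from sklearn.metrics import confusion_matrix, f1_score

def evaluate_classification(y_true, y_pred):
    """Score a pixel-level classification against manual validation labels.
    Assumes class codes 1-7 with 0 for unlabelled pixels, and that the
    confusion-aware metric referred to in the text is a weighted F1 score."""
    mask = y_true > 0                       # ignore unlabelled pixels
    score = f1_score(y_true[mask], y_pred[mask], average="weighted")
    cm = confusion_matrix(y_true[mask], y_pred[mask], labels=list(range(1, 8)))
    return score, cm

# Hypothetical usage with flattened rasters:
# score, cm = evaluate_classification(validation_raster.ravel(),
#                                     csc_classification.ravel())
```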
In addition to classification performance, we implemented a calving front detection method based on morphological geodesic active contours (see Fig. S1). The method is based on the definition of a calving front as the contact between “ocean” pixels (open water, iceberg water, or mélange) and glacier ice pixels. Since the final classification output from CSC is at pixel level, this allowed for calving front detection at the native spatial resolution of Sentinel-2 imagery (10 m). Error was quantified for each predicted calving front by measuring the Euclidean distance between each predicted calving front pixel and the closest pixel in manually digitised calving fronts. From this, the mean, median, and mode errors were quantified for each predicted calving front. Calculating the median and mode values allows the elimination of outliers in calving front predictions (Baumhoer et al., 2019). Calving fronts were digitised in QGIS 3.4 and rasterised to form a single pixel-wide line.
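As a sketch of this error measure, the Euclidean distance from each predicted front pixel to the nearest manually digitised front pixel can be obtained with a distance transform, as below; the function name and the rounding used for the mode are illustrative choices, and the active-contour extraction of the front itself is not shown.

```python
import numpy as np
from scipy import ndimage

def calving_front_error(predicted_front, manual_front, pixel_size=10.0):
    """Distance (in metres) from each predicted calving-front pixel to the
    nearest pixel of the manually digitised front. A sketch of the error
    measure described above; both inputs are boolean rasters."""
    # Distance from every pixel to the nearest manual front pixel.
    dist_to_manual = ndimage.distance_transform_edt(~manual_front) * pixel_size
    errors = dist_to_manual[predicted_front]
    values, counts = np.unique(np.round(errors), return_counts=True)
    mode = values[counts.argmax()]          # modal error after rounding
    return errors.mean(), np.median(errors), mode
```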
Table S3 shows that the highest classification performance in phase one was
achieved using
Similarly, an evaluation of calving front error for CSC results revealed
that a patch size of
A kernel density estimate (KDE) plot of the full error
distribution for all calving front predictions derived from all test sites
using classifications produced with optimal parameters. Error values above
1000 m are grouped into a single bin to reduce tail length and show a second
peak which represents catastrophic errors in calving front prediction. Note
that low calving front errors occur most with
In comparison, manually digitised calving fronts usually have an error of
around 2 to 4 px. For example, Carr et al. (2017) calculated a mean calving front error of 27.1 m using repeat digitisations. In this work, small classification errors of a few pixels (often caused by shadows at the front) can lead to errors in the range of 5 to 10 px. The smaller-scale information provided in a
Figure 8 shows examples of CSC applied to images of the Helheim test site.
High
Examples of pixel-level classification outputs for seasonally
variable imagery from the in-sample test site showing input images of
Helheim in the first column, which were acquired on
Examples of pixel-level classification outputs for seasonally
variable imagery from the out-of-sample test site showing input images of
Store in the first column, which were acquired on
The joint training method improved classification performance (Table 2).
Results were only marginally improved for the in-sample study site, which was
to be expected since phase one models were already trained on data from
Helheim. A comparison of classification outputs from single and joint
training for an image of Store Glacier can be found in Fig. S3, which shows
that the addition of joint fine-tuning rectified areas of misclassification
seen in results which used single training, with the overall
Optimum
Confusion matrices which show the relationship between CSC class predictions
and validation data for each test glacier are shown in Fig. S6. In summary,
Fig. S6 shows good agreement between predicted and actual classes for all
glaciers, with the exception of the open water class for Helheim and
Jakobshavn where confusion occurs between the iceberg water and bedrock
classes. Open water is the smallest class for both sites,
often covering only small areas in each individual image. There is still
class confusion in joint results (Fig. S7); however better overall
Moreover, the size of input imagery to the CSC workflow is not limited to a
specified set of dimensions. Since collection of validation labels for each
test image required manual digitisation, the test sites were restricted to
A time series produced using CSC results showing calving front position and changes in mélange area at Helheim throughout 2019 can be seen in Fig. 10. Figure 10a and c show fluctuation in calving front position between March and October 2019 with an overall pattern of retreat. Two predicted calving fronts which had an error of over 4.2 px were removed from the time series, and frontal position change was quantified using the rectilinear box method to account for cross-glacier variation (Lea et al., 2014). Figure 10b and c illustrate the variation in mélange area for all nine in-sample test images. Taken together, these results show the robustness of CSC and usefulness of multi-class outputs for holistic analysis of marine-terminating glacial environments.
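For example, a class-area time series such as the mélange record in Fig. 10 could be derived from the classified rasters with a few lines of code; the sketch below assumes the integer class coding follows the class listing given earlier (mélange as class 3) and is illustrative only.

```python
import numpy as np

def class_area_km2(classified, class_code, pixel_size=10.0):
    """Area covered by one semantic class in a CSC output raster, in km^2.
    A sketch of how the melange-area time series could be derived; the
    integer class codes are assumptions."""
    n_pixels = np.count_nonzero(classified == class_code)
    return n_pixels * pixel_size ** 2 / 1e6

# Hypothetical usage over the nine in-sample classifications:
# melange_areas = [class_area_km2(c, class_code=3) for c in classified_rasters]
```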
Our results build on the work of deep-learning-based classification methods
for ice front delineation (Baumhoer et al., 2019; Mohajerani et al., 2019; Zhang et al., 2019; Cheng et al., 2021), with several key innovations and variations of note. Firstly, the CSC workflow produces multi-class outputs using seven semantic classes rather than the binary outputs of previous methods. This fulfils the aim to provide meaningful information which could be used for a variety of applications at the scale of entire outlet glacier landscapes. In terms of classification accuracy, CSC produces marginally better
Additionally, since previous deep learning studies which produce binary
classifications for Greenlandic outlet glaciers do not provide
Mean calving front errors from previous deep learning methods designed specifically to detect ice fronts in comparison to the mean calving front errors produced by CSC in this study.
The second major difference between CSC and previous methods is the deep learning architecture. All previous deep learning classification methods for delineating ice fronts (Baumhoer et al., 2019; Mohajerani et al., 2019; Zhang et al., 2019; Cheng et al., 2021) use FCN/U-Net architectures (Ronneberger et al., 2015). Hoeser et al. (2020) reviewed image segmentation and object detection in remote sensing, and whilst they do conclude that FCN/U-Net architectures are dominant, they still find about 30 % of published work uses patch-based approaches which are akin to the second phase of the CSC method presented here. This suggests that FCN architectures need not be considered the de facto algorithm for glacial landscape classification. Moreover, the advantage of CSC over one-stage patch-based methods using FCNs is that the initial phase one CNN provides transferability and delivers bespoke training labels for the pixel-level patch-based operator (as described in Sect. 2.1). We discuss the other major implications of the architectural differences between our work and FCNs in the following sections.
CSC has certain practical advantages over FCNs in terms of data processing
and computational loads. Firstly, the CSC method has low pre-processing
requirements. In effect, Sentinel-2 images were cropped to produce large
images containing whole marine-terminating glacier landscapes, yet still
within a workable size for detailed digitisation of validation labels. The
only other pre-processing step required is normalisation by a constant
factor of 8192 to convert raw Sentinel-2 data to 16-bit floating point data.
Once this is done, CSC has a low computational load. Training the initial
VGG16 model can be done in under 1 h using an Intel i7 processor at 5.1 GHz
and an Nvidia RTX 2060 GPU. When CSC is subsequently applied to a sample
image of
In contrast, for several of the previous studies which implement FCN
architectures, a larger number of pre-processing steps are required,
including but not limited to rotation for consistent glacier flow direction,
edge enhancement, and pseudo-HDR toning (Mohajerani et al., 2019; Zhang et
al., 2019; Cheng et al., 2021). Similarly, FCN architectures can be very
demanding in terms of computer RAM and GPU RAM, especially when large images
are used as inputs. When we tested this by implementing the popular FCN8
based on VGG16 which has
In terms of the number of training samples used for deep learning models,
Goodfellow et al. (2016) note that, as a general rule, each class should
contain at least 5000 samples to reach satisfactory performance, but models
can reach and exceed human-level performance when trained on at least 10 million samples. Considering this, the number of labelled samples produced
by manually labelled training images and data augmentation in the datasets
used here (210 000 tiles) makes them relatively small. However, in
comparison to pre-trained models such as VGG16 which were trained on the
ImageNet database using over 1000 classes, our adapted VGG16 architecture
only uses seven classes and therefore can be trained sufficiently with
“only” a few hundred thousand samples. This suggests that relatively few images
are needed to produce highly accurate image classifications using our
workflow, reducing the time required for initial creation of manually
labelled training data. Furthermore, the number of satellite acquisitions
used to produce the training data for the phase one CNN in CSC is smaller
than that used to train models in previous FCN-based studies. Given that our
optimal phase one CNN training sample is
The size of input imagery also represents an area where CSC has advantages
over FCNs. In FCN architectures, the instance to be classified must
be well framed in the input image. Often in the case of higher-resolution
images where such framing would lead to image sizes in excess of
Finally, from a theoretical perspective, FCN architectures can be strongly
dependent on object shapes and less dependent on inner textures. In the
final stages of the encoder part of an FCN architecture, the simplified
shape of the object will contribute to the weights learned in training (as
will inter-class relations). This means that an FCN must be trained to
recognise specific shapes. As a result, an FCN trained only on data from
Helheim could not be expected to perform well at the task of classifying
Jakobshavn. There are no published examples where an FCN has been trained on
a single glacier and displays transferability to very different glaciers.
For example, Mohajerani et al. (2019) train their FCN on three glaciers
(Jakobshavn, Sverdrup, and Kangerlussuaq) and only test it on Helheim
Glacier. Similarly, the FCN used by Zhang et al. (2019) is only trained and
tested on Jakobshavn, providing no test of spatial transferability. Instead,
multiple sites must be included in FCN training in order to reach good
transferability (e.g. Cheng et al., 2021). Contrastingly, in this study,
even before the application of joint fine-tuning, the phase one VGG16 CNN
solely trained on data from Helheim successfully classified large areas of
Jakobshavn, leading to very high performance with final phase two results
with
Overall, the empirical results presented here show that CSC has delivered a
state-of-the-art performance for novel multi-class pixel-level
classification of marine-terminating glacial landscapes in Greenland. In
summary, when compared to FCN architectures, CSC has lower training data
volume requirements and simpler pre-processing steps. Moreover, the workflow
produces marginally better
The results reported here demonstrate that the CSC workflow adapted for
landscapes containing marine-terminating outlet glaciers in Greenland
produces state-of-the-art pixel-level classifications for seasonally
variable imagery. After testing the performance of different band
combinations, tile sizes, and patch sizes on seasonally variable test
imagery, we find that classifications reach
Given that CSC can identify multiple semantic classes, this also provides scope for analysis in other research areas, beyond calving front monitoring. Changes in other class boundaries could be monitored, for instance to detect changes in snowline/equilibrium line position and quantify ablation area change (Noël et al., 2019). Similarly, the multi-class outputs could be used to quantify seasonal changes in the area of a specific class, for example to monitor changes in the area of mélange (Foga et al., 2014; Cassotto et al., 2015) as shown in Fig. 10. Moreover, while CSC operates at the scale of overall land cover classes, outputs could potentially be used to isolate a specific target class for detection of smaller-scale features, for example to detect change in the evolution of supraglacial lakes (Hochreuther et al., 2021) and subglacial meltwater plumes (How et al., 2017; Everett et al., 2018), as well as iceberg tracking (Barbat et al., 2021). Finally, the outputs of the CSC script retain the geospatial information of the input data, meaning classification and calving front outputs can be easily manipulated in GIS software.
The joint fine-tuning method significantly improved classification
CSC performance was optimal when using RGBNIR bands rather than RGB bands alone. Testing the use of additional image bands to increase spectral data may be advantageous in future work. For example, Xie et al. (2020) used a CNN trained with 17 input bands derived from Landsat 8 imagery and DEM data and found that using more bands produced higher accuracy for mapping debris-covered mountain glaciers. However, this may not necessarily be the case with marine-terminating outlet glaciers, and using additional input channels is likely to increase processing time, which should also be taken into account when considering that accurate results can be achieved using only RGBNIR bands.
We proposed that adopting a patch-based technique which includes contextual
information surrounding a pixel would aid classification of complex and
seasonally variable outlet glacier landscapes, as it has in other
applications (Sharma et al., 2017), and we found that the phase two patch-based
method significantly outperformed the pixel-based method. This also
validates similar findings that patch-based CNNs outperform standard
pixel-based neural networks and CNNs (Sharma et al., 2017). For calving
front detection, a patch size of
We develop and evaluate a workflow for novel multi-class image classification of seasonally variable marine-terminating outlet glacier scenes using deep learning. The development of deep learning methods for automated classification of outlet glaciers is an important step towards monitoring processes at high temporal and spatial resolution (e.g. changes in frontal position, mélange extent, and calving events) over several years. While still in its infancy in glacial settings, image classification using deep learning provides clear potential to reduce the labour-intensive nature of manual methods and facilitate automated analysis in an era of the burgeoning availability of satellite imagery. Our two-phase workflow, termed CNN-Supervised Classification, is adapted for classification of medium-resolution Sentinel-2 imagery of outlet glaciers in Greenland. In phase one, the application of a well-established, pre-trained CNN called VGG16 replicates the way a human operator would interpret an image, rapidly producing training labels for a second image-specific model in phase two. Application of the phase two model produces pixel-level classifications according to seven semantic classes characteristic of complex outlet glacier settings in Greenland.
Alongside an evaluation of input parameters and training methods on model
performance, we apply and test the workflow on 27 seasonally variable unseen
images. The test dataset is composed of nine images from the training area
of Helheim Glacier (in-sample) and 18 images from Jakobshavn and Store
glaciers which represent landscapes not previously seen during training
(out-of-sample). Resulting pixel-level classifications produce high
Sentinel-2 imagery is available from
the Copernicus Open Access Hub (2020, available at
The supplement includes descriptions for each of the
seven semantic classes (Table S1), the Sentinel-2 acquisitions used for
training and testing the classification workflow (Table S2), a flow chart of
the methodology used to produce calving fronts (Fig. S1), phase one
PC developed the code with contributions and editing by MM. MM created training and test data, implemented the code to perform image classifications, and wrote the manuscript. CRS and PC supervised, discussed results, and edited the manuscript.
The authors declare that they have no conflict of interest.
Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We acknowledge the European Union Copernicus programme and the European Space Agency (ESA) for providing Sentinel-2 data. We are also grateful for the constructive comments from the three reviewers and the editor (Bert Wouters), which improved both the content and clarity of the paper.
This paper was edited by Bert Wouters and reviewed by three anonymous referees.