Calving Front Machine (CALFIN): Glacial Termini Dataset and Automated Deep Learning Extraction Method for Greenland, 1972-2019

We present Calving Front Machine (CALFIN), an automated method for extracting calving fronts from satellite images of marine-terminating glaciers. The results use Landsat imagery from 1972 to 2019 to generate 22,678 calving front lines across 66 Greenlandic glaciers. The method uses deep learning, and builds on existing work by Mohajerani et al., Zhang et al., and Baumhoer et al. Additional post-processing techniques allow for accurate segmentation of imagery into Shapefile outputs. This method is uniquely robust to the impact of clouds, illumination differences, ice mélange, and Landsat 7 Scan Line Corrector errors. CALFIN provides improvements on the current state of the art. A model inter-comparison is performed to evaluate performance against existing methodologies. CALFIN's ability to generalize to SAR imagery is also evaluated. CALFIN's fronts are often indistinguishable from manually curated fronts, deviating by 2.25 pixels (86.76 meters) from the true front on a diverse set of 162 testing images. The current implementation offers a new opportunity to explore sub-seasonal trends in the extent of Greenland's margins, and supplies new constraints for simulations of the evolution of the mass balance of the Greenland Ice Sheet and its contributions to future sea level rise.


Introduction
The evolution of Greenland's tidewater glaciers is an important constraint on the evolution of the Greenland Ice Sheet (Nick et al., 2013). Likewise, changes in Greenland are important for tracking and predicting future sea level rise over the next century (et al., 2016). Modern machine learning techniques and deep neural networks provide a robust, scalable, and accurate solution to these processing challenges.
In this study, Sect. 2 covers the data source along with its spatial and temporal coverage. Sect. 3 examines the CALFIN algorithm and the method for processing the data. Sect. 4.1 validates the algorithm through error analysis. Sects. 5 and 6 present and discuss the results: the calving front dataset and the algorithm. Several potential data sources are first evaluated for use, including Terra/MODIS, TerraSAR-X, Landsat, and Sentinel (see Table 1). Landsat is selected for its long time-series availability and reasonable spatial distribution/resolution.

Table 1. Potential Data Sources: a comparison of selected data sources available for use. We initially focus on a single source for the study.

[Table 1 content not recoverable from the extraction; its columns include Name and Resolution.]

The last contribution adds a 2-channel output to the decoder, allowing for both a calving front mask and an ice/ocean mask. Together, these changes reduce the size of the network from 40M parameters to 29M parameters while also increasing the overall accuracy.

Post-Processing
At this stage, the 2-channel pixel mask output of CALFIN-NN is post-processed to extract useful Shapefile data products, as described in the following subsections.

Calving Front Reprocessing
Individual fronts are first isolated from the processed image and reprocessed as zoomed-in subsets of the input image wherever they are detected. The front detection method is described in Sect. 3.3.3. The nature of CALFIN-NN's output as a confidence measure is also exploited, so that generated fronts can be filtered out based on how confidently each one is detected.
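The confidence-based filtering described above can be sketched as follows. This is an illustration, not the CALFIN implementation: the function name, the use of a mean per-pixel confidence, and the threshold value are all assumptions.

```python
# Hedged sketch of confidence-based front filtering: treat the network's
# per-pixel outputs along a detected front as confidence values, and discard
# fronts whose mean confidence falls below a threshold (value illustrative).
def filter_fronts(fronts, confidence, threshold=0.8):
    """fronts: list of lists of (row, col); confidence: 2-D array of probabilities."""
    keep = []
    for front in fronts:
        mean_conf = sum(confidence[r][c] for r, c in front) / len(front)
        if mean_conf >= threshold:
            keep.append(front)
    return keep
```

A per-front (rather than per-image) filter like this lets a single scene contribute some valid fronts even when other fronts in it are rejected.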

Next, a polyline is fit to the pixel mask to retrieve the correct coastline boundary. This is performed by converting each pixel in the mask to a node in a graph, connecting the nearest neighboring nodes, then finding the single longest path in the graph's minimum spanning tree (MST) (Kruskal, 1956). This polyline not only corresponds with the coastline edge, but also outperforms other contour-finding algorithms by eliminating noise, errors, and gaps inherited from previous steps. A visual example is given in Fig. 6a-d.

Figure 6 (caption, partially recovered): (b) Connect each pixel (blue) to 15% of its nearest neighbors with an edge (black). (c) Next, create an MST from the graph. (d) Now, extract the longest path from the MST. (e) Finally, mask the static coastline using the fjord boundaries (blue) to extract the calving front.
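The pixels-to-graph-to-MST-to-longest-path pipeline above can be sketched in plain Python. This is a minimal illustration of the stated steps, not the CALFIN code: the function name, the choice of k nearest neighbors, and the brute-force neighbor search are assumptions made for clarity.

```python
# Sketch of the polyline-fitting step: mask pixels -> k-NN graph -> MST
# (Kruskal) -> longest path in the tree (found via two breadth-first searches).
from collections import defaultdict
import math

def mst_longest_path(pixels, k=3):
    """pixels: list of (row, col) mask coordinates. Returns an ordered polyline."""
    n = len(pixels)
    # 1. Connect each pixel to its k nearest neighbors (brute force for clarity).
    edges = []
    for i in range(n):
        nearest = sorted(
            (math.dist(pixels[i], pixels[j]), j) for j in range(n) if j != i
        )[:k]
        for d, j in nearest:
            edges.append((d, i, j))
    # 2. Kruskal's algorithm: add edges in increasing weight, skipping cycles.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = defaultdict(list)
    for d, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree[i].append(j)
            tree[j].append(i)
    # 3. Longest path in a tree: BFS from any node to find one endpoint of the
    # diameter, then BFS again from that endpoint and backtrack.
    def farthest(start):
        prev, queue = {start: None}, [start]
        last = start
        for u in queue:
            last = u
            for v in tree[u]:
                if v not in prev:
                    prev[v] = u
                    queue.append(v)
        return last, prev
    a, _ = farthest(0)
    b, prev = farthest(a)
    path = []
    while b is not None:
        path.append(pixels[b])
        b = prev[b]
    return path
```

Because the longest path in the MST skips dead-end branches, isolated noise pixels attached to the tree are dropped rather than threaded into the coastline.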

Coastline to Calving Front
Next, the calving front is isolated from the coastline polyline. Fjord boundary masks are first created for each basin. By calculating the distance from each point in the coastline to the nearest fjord boundary pixel, then selecting the contiguous pixels that are farthest from the fjord boundaries, the calving front can be isolated. The result is shown in Fig. 6e.
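The farthest-from-boundary rule above can be sketched as follows. This is a simplified illustration of the stated idea, not the CALFIN implementation: the function name, the brute-force distance computation, and the relative cutoff used to define "contiguous farthest pixels" are assumptions.

```python
# Hedged sketch of front isolation: keep the contiguous run of coastline points
# farthest from the fjord boundary, around the overall maximum distance.
import math

def isolate_calving_front(coastline, fjord_boundary, threshold_ratio=0.5):
    """coastline: ordered (row, col) polyline; fjord_boundary: set of (row, col)."""
    # Distance from each coastline point to the nearest fjord-boundary pixel.
    dists = [min(math.dist(p, q) for q in fjord_boundary) for p in coastline]
    cutoff = max(dists) * threshold_ratio
    # Expand outward from the farthest point while distances stay above the cutoff.
    peak = dists.index(max(dists))
    lo = peak
    while lo > 0 and dists[lo - 1] > cutoff:
        lo -= 1
    hi = peak
    while hi < len(coastline) - 1 and dists[hi + 1] > cutoff:
        hi += 1
    return coastline[lo:hi + 1]
```

In practice a distance transform over the fjord mask would replace the brute-force minimum, but the selection logic is the same.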

The last step is to export the polyline and corresponding polygon as geo-referenced Shapefiles. First, the polyline is smoothed to eliminate noise artifacts inherited from previous steps. Next, the smoothed polylines, fjord boundary mask, and land-ice/ocean masks are combined to create a polygonal ocean mask. Optionally, manual verification of each output against the original GeoTIFF subset can be performed. This was done for all cases in this study to ensure the validity of the automated pipeline, and constrains the mean distance error to be <100 m, as covered in the following section.

Outputs are also compared against existing studies, as discussed in Sect. 6.2. The validation sets contain data that is excluded during model training. This prevents the models from memorizing data and skewing the accuracy.
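The polyline-smoothing step mentioned above can be illustrated with a simple moving average along the polyline. The actual smoothing method used by CALFIN is not specified here, so both the technique and the window size are assumptions.

```python
# Illustrative polyline smoothing: a moving average over each point's
# neighborhood along the polyline (window size is an assumption).
def smooth_polyline(points, window=3):
    """points: ordered list of (row, col). Returns smoothed float coordinates."""
    half = window // 2
    out = []
    for i in range(len(points)):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        rows = [p[0] for p in points[lo:hi]]
        cols = [p[1] for p in points[lo:hi]]
        out.append((sum(rows) / len(rows), sum(cols) / len(cols)))
    return out
```

A small window suppresses single-pixel jitter inherited from the mask while leaving the large-scale front geometry intact.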

Error Estimation
The primary quality assessment method is the Mean Distance Error (Mohajerani et al., 2019; Zhang et al., 2019; Baumhoer et al., 2019). Conceptually, this method resembles the numerical integration of the area between two curves, normalized by the average length of the curves (see Fig. 7a). Also referred to as the Area over Front (A/F) in the literature, this method can also be seen as a generalization of the method of transects along arbitrarily oriented fronts (Mohajerani et al., 2019; Baumhoer et al., 2019). This metric is implemented by taking the mean/median of the distances between closest pixels in the predicted and manually delineated fronts. Note that pixel distance is biased to be inversely proportional to a network's input size, so the error in meters is also provided in the following analysis.
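The closest-pixel formulation above can be written down directly. This is a sketch of the stated definition (symmetrized over both fronts), not the exact CALFIN code; the function name and brute-force search are illustrative.

```python
# Sketch of the mean distance error: for each point on one front, take the
# distance to the closest point on the other front; average both directions.
import math

def mean_distance_error(front_a, front_b):
    """front_a, front_b: lists of (row, col) pixel coordinates."""
    def one_way(src, dst):
        return [min(math.dist(p, q) for q in dst) for p in src]
    d = one_way(front_a, front_b) + one_way(front_b, front_a)
    return sum(d) / len(d)
```

Multiplying the pixel result by the image's ground sampling distance gives the error in meters, which is the input-size-independent figure reported alongside it.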

Classification Accuracy
The secondary quality assessment method calculates the Intersection over Union (IoU) (Baumhoer et al., 2019). This metric evaluates the degree of overlap between the predicted and ground truth masks of the calving front. It is calculated by dividing the number of pixels in the intersection of two masks over the number of pixels in the union of the two masks (see Fig. 7b).
When calculating the IoU of 3-pixel-wide edges, this measure is very strict: 1 pixel of difference results in a score of 0.5000, and scores in that range or above are indicative of human levels of accuracy. When calculating the IoU of land-ice/ocean-mélange masks, this measure is less strict, and scores of 0.9000 and above are indicative of human levels of accuracy.
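The IoU computation described above is short enough to state exactly; this is a minimal pure-Python sketch for boolean/binary masks, with an illustrative function name.

```python
# Minimal Intersection over Union for two same-shaped binary masks:
# |A ∩ B| / |A ∪ B|, counted pixel by pixel.
def intersection_over_union(mask_a, mask_b):
    """mask_a, mask_b: same-shaped 2-D lists (or arrays) of 0/1 values."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks overlap perfectly by convention.
    return inter / union if union else 1.0
```

This makes the strictness claim above concrete: for two parallel 1-pixel-wide edges offset by one pixel, the intersection is half the union, giving exactly 0.5.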

Validation Results
The following subsections list tables reporting the above metrics for the associated validation sets, the values from the original studies, and a subset of the outputs of CALFIN-NN on each. The primary validation set, the CALFIN validation set (CALFIN-VS), consists of 162 images with clouds, illumination differences, ice mélange, and Landsat 7 Scan Line Corrector errors (L7SCEs). The CALFIN-VS contains data from 62 Greenlandic basins, including Helheim, which was specifically excluded from the training data. Note that the drop in mean pixel distance despite the increase in mean meter distance (and vice versa) comes from L7SCE images being reprocessed at smaller sizes due to detection failures (see Fig. 5c), and from pixel error bias being inversely related to input size (see Sect. 4.1). While this ensures that accurate fronts are output rather than incorrect fronts, this filtering behavior removes potentially large errors, and must be accounted for when comparing errors across other sets.

https://doi.org/10.5194/tc-2020-231. Preprint. Discussion started: 14 October 2020. © Author(s) 2020. CC BY 4.0 License.

Data Product Results
The code implementation of the CALFIN method is released, along with its associated calving front data products as described in the following subsections, for use within the scientific community. Basin definitions (see Table S2) are derived from the MEaSUREs Glacial Termini Dataset (Moon and Joughin, 2008; Joughin et al., 2015), and names are derived from Bjørk et al. (2015). These products can be found at datadryad.org/stash/share/Q9guqsrdoB7v2a9JSLsgoV6HY_RS8RkCDvStx2eWsBg.

An implementation of CALFIN-NN is available at github.com/daniel-cheng/CALFIN. The innovations described in Sect. 3.2 can also be applied to other networks and investigations. The implementation and its associated processing scripts are written in Python 3, using the Keras and TensorFlow libraries. Note that the network parameters are hosted separately as part of the associated DataDryad dataset linked above (Cheng et al., 2020). For additional insight into the network training and processing requirements, see the following discussion in Sect. 6.1.

Training Insights
Throughout the course of the study, several innovations are developed to improve the performance of CALFIN-NN. To increase accuracy, a custom Intersection-over-Union based loss function is used to heavily penalize incorrect calving front predictions. To prevent over-fitting the neural network, a large set of training data was manually delineated (see Fig. S3), totaling 1541 Landsat and 232 Antarctic SAR image/mask pairs, with the SAR data taken from the same training scenes used by Baumhoer et al. (2019). Another measure to prevent over-fitting involves data augmentation, which entails performing random flips/transpositions, random Gaussian noise, random sharpen filters, random rotations of up to 12°, random crops, and random scaling on the pre-processed images during CALFIN-NN training. Through empirical testing, it is determined that excessive image padding, rotation, warping, and cropping calving fronts too close to the image bounds result in sub-optimal performance.
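A loss of the kind described above can be sketched as 1 minus a "soft" IoU over per-pixel probabilities. This is an illustration of the general technique, not the exact CALFIN loss function; the function name and the epsilon smoothing term are assumptions.

```python
# Hedged sketch of an IoU-based loss: 1 - soft IoU, where the intersection and
# union are computed over continuous per-pixel probabilities. Minimizing it
# heavily penalizes missed or spurious calving-front pixels.
import numpy as np

def soft_iou_loss(y_true, y_pred, eps=1e-7):
    """y_true, y_pred: same-shaped arrays of per-pixel probabilities in [0, 1]."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    intersection = np.sum(y_true * y_pred)
    union = np.sum(y_true) + np.sum(y_pred) - intersection
    return 1.0 - (intersection + eps) / (union + eps)
```

Unlike per-pixel cross-entropy, an IoU-style loss is insensitive to the large class imbalance between the thin front and the background, which is why it suits edge masks.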

Yet another helpful technique is the use of test-time augmentations. More specifically, each image subset is cut into 9 overlapping 224x224 image windows and processed individually, before being reassembled into the final 256x256 output mask. This allows for multiple independent classifications of the central pixels, ensuring agreement and confidence in detected calving fronts.
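The windowing-and-reassembly scheme above can be sketched as follows. The three-offsets-per-axis layout and averaging of overlaps are assumptions about the details; the text specifies only nine overlapping 224x224 windows reassembled into a 256x256 mask.

```python
# Sketch of the test-time augmentation: predict on 9 overlapping 224x224
# windows of a 256x256 image, then average the overlapping predictions.
import numpy as np

def windowed_predict(image, predict, full=256, win=224):
    """image: (full, full) array; predict: fn mapping (win, win) -> (win, win)."""
    offsets = [0, (full - win) // 2, full - win]   # 3 positions per axis -> 9 windows
    acc = np.zeros((full, full))
    count = np.zeros((full, full))
    for r in offsets:
        for c in offsets:
            acc[r:r + win, c:c + win] += predict(image[r:r + win, c:c + win])
            count[r:r + win, c:c + win] += 1
    return acc / count                             # mean over overlapping predictions
```

Central pixels fall inside several windows, so their final values reflect multiple independent classifications, which is what supplies the agreement and confidence noted above.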
After integrating these improvements, CALFIN-NN is trained for a total of 80 epochs, with 4000 batches per epoch.

Data Analysis and Usage Example
With the new data available in the CALFIN dataset, it is possible to explore a subset and validate the evolution of Helheim Glacier against existing ESA-CCI, MEaSUREs, and PROMICE data products (ENVEO, 2017; Joughin et al., 2015; Andersen et al., 2019). Similar to Zhang et al. (2019), the relative change in position of the calving front along the fjord centerline from 1972 to June 2019 is graphed. For Joughin et al. (2015), if a date range is given, the same relative change at both start and end dates (Moon and Joughin, 2008) is plotted. For Andersen et al. (2019), August 15th is used as the "end-of-melt-season" date of delineation, as the date is otherwise not specified in the provided data. Fig. 13 shows the length change of the calving front along the basin centerline, relative to its Sept. 6, 1972 position. Overall, there is high agreement between CALFIN and existing data products on the evolution of Helheim over the available time series. Note that while Helheim is relatively easy to accurately and automatically delineate, all of the above data is still produced without manual input outside of visual verification. With this context in mind, the comparison with existing data products helps validate the applicability of this study's outputs.
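The centerline-relative position used above can be sketched as follows. This is an illustrative approximation under stated assumptions: the crossing point is taken as the centerline vertex closest to the front, and the function name is hypothetical.

```python
# Sketch of measuring a front's position along the fjord centerline: find the
# centerline vertex nearest the front polyline, and return the cumulative arc
# length from the centerline's start to that vertex. Subtracting the reference
# (e.g. 1972) position gives the relative length change that is graphed.
import math

def front_position_along_centerline(centerline, front):
    """centerline, front: lists of (row, col) coordinates."""
    # Cumulative arc length along the centerline.
    arc = [0.0]
    for p, q in zip(centerline, centerline[1:]):
        arc.append(arc[-1] + math.dist(p, q))
    # The closest centerline vertex to any front point approximates the crossing.
    idx = min(
        range(len(centerline)),
        key=lambda i: min(math.dist(centerline[i], f) for f in front),
    )
    return arc[idx]
```

Measuring along the centerline rather than straight-line distance keeps the retreat signal comparable across fronts with different shapes and orientations.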

Inter-model Comparison
To similarly reinforce the validity of the study, and to address the shortcomings of comparing error metrics across different studies, an inter-model comparison is performed against the network of Mohajerani et al. (2019) (M-NN). Only imagery within the known capabilities of the M-NN is evaluated. Furthermore, the same pre- and post-processing is applied to both models. The results are shown in Fig. 9, and the error analysis is continued below.
To reemphasize the differences in mean distance error calculation between different studies, Mohajerani et al.