TermPicks: A century of Greenland glacier terminus data for use in machine learning applications
- 1Department of Geological Sciences, University of Texas at Austin
- 2Institute for Geophysics, University of Texas at Austin
- 3Department of Earth and Space Sciences, University of Washington, Seattle, WA, USA
- 4Polar Science Center, Applied Physics Laboratory, University of Washington, Seattle, WA, USA
- 5Department of Geography and Planning, University of Liverpool
- 6University of California at Irvine, Irvine, CA, USA
- 7Geography Department, College of Science, Swansea University, Swansea, UK
- 8Department of Geosciences and Natural Resource Management, University of Copenhagen, Copenhagen, Denmark
- 9School of Geography, Politics and Sociology, Newcastle University, Newcastle-Upon-Tyne, UK
- 10School of Geosciences, University of Edinburgh, Edinburgh, UK
- 11School of Geography and Sustainable Development, University of St Andrews, UK
- 12Jet Propulsion Laboratory, California Institute of Technology
- 13Institute for Risk and Uncertainty, University of Liverpool
- 14Department of Geography and Environmental Sciences, University of Northumbria, Newcastle upon Tyne, United Kingdom
- 15The Geological Survey of Denmark and Greenland, Østervoldgade 10, 1350 København K, Danmark
- 16National Snow and Ice Data Center, Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder
- 17Department of Geography, University of Sheffield, Sheffield, UK
- 18Earth System Science Programme, The Chinese University of Hong Kong
Abstract. Marine-terminating outlet glacier terminus traces, mapped from satellite and aerial imagery, have been used extensively in understanding how outlet glaciers adjust to climate change variability over a range of time scales. Numerous studies have digitized termini manually, but this process is labor-intensive, and no consistent approach exists. A lack of coordination leads to duplication of efforts, particularly for Greenland, which is a major scientific research focus. At the same time, machine learning techniques are rapidly making progress in their ability to automate accurate extraction of glacier termini, with promising developments across a number of optical and SAR satellite sensors. These techniques rely on high quality, manually digitized terminus traces to be used as training data for robust automatic traces. Here we present a database of manually digitized terminus traces for machine learning and scientific applications. These data have been collected, cleaned, assigned with appropriate metadata including image scenes, and compiled so they can be easily accessed by scientists. The TermPicks data set includes 39,060 individual terminus traces for 278 glaciers with a mean and median number of traces per glacier of 136 ± 190 and 93, respectively. Across all glaciers, 32,567 dates have been picked, of which 4,467 have traces from more than one author (duplication of 14 %). We find a median error of ∼100 m among manually-traced termini. Most traces are obtained after 1999, when Landsat 7 was launched. We also provide an overview of an updated version of The Google Earth Engine Digitization Tool (GEEDiT), which has been developed specifically for future manual picking of the Greenland Ice Sheet.
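The duplication figure quoted in the abstract can be reproduced from the two counts it gives. The short sketch below is only an arithmetic check; the variable names are illustrative, and the per-glacier counts in the second half are made-up numbers used to show why a skewed distribution is reported with both a mean (± standard deviation) and a median.

```python
import statistics

# Counts quoted in the abstract.
total_dates = 32_567          # dates with at least one terminus trace
multi_author_dates = 4_467    # dates traced by more than one author

duplication_pct = 100 * multi_author_dates / total_dates
print(f"duplication: {duplication_pct:.1f} %")  # rounds to the 14 % in the abstract

# Hypothetical per-glacier trace counts (NOT taken from TermPicks):
# a few heavily studied glaciers pull the mean above the median.
example_counts = [12, 40, 85, 93, 110, 150, 900]
mean_count = statistics.mean(example_counts)
std_count = statistics.stdev(example_counts)
median_count = statistics.median(example_counts)
print(f"mean {mean_count:.0f} ± {std_count:.0f}, median {median_count}")
```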
Sophie Goliber et al.
Status: final response (author comments only)
-
RC1: 'Comment on tc-2021-311', Anonymous Referee #1, 15 Nov 2021
In this manuscript, the authors have described a dataset of manually digitized terminus positions for outlet glaciers of the Greenland ice sheet compiled from previously-published datasets, in order to provide a consistently-formatted training dataset for future machine learning applications. This is an excellent and timely undertaking that highlights the power of collaborative efforts.
On the whole, the manuscript does a good job describing the issues involved in combining "input" datasets from multiple authors, as well as describing the "output" dataset, and even manages to show an example application of combining data sources. Accordingly, I only have a few minor comments/suggestions to make on the manuscript. The bulk of my comments/suggestions have to do with the description of the metadata - I think a Table with a few different example entries would help clarify this for a reader.
- l. 10: is this the mean (± standard deviation)?
- l. 52: check that months are removed from the reference dates
- l. 104: is the Howat reference here for the MODIS image?
- l. 130 (Date): I found this description slightly confusing - are there 5 columns (one column for the date string, four columns for the year, month, day, and decimal date)? From the dataset, I see that it is indeed five individual columns, but the header makes it seem like there's only one column here (Date).
- l. 135 (Satellite): How is this formatted/written?
- l. 144 (Scene ID): here again, it would be helpful to have more information about this. The Landsat Product ID/other satellite IDs are relatively straightforward, but what about the aerial images?
- l. 155 (Quality Flag): What does this entry look like for a given image? From the dataset, I see that it's comma-separated 2-digit strings (00, 01, 02, 03, 04, 05) - I'm not sure I would have gotten that from the description here.
- l. 170: where do the glacier centerlines come from?
- l. 226: how many of these picks needed manual checking?
- l. 228: wouldn't it make more sense to compare the image (assuming it exists) against the different picks, rather than using the completeness of the metadata?
- Figure 5: I really like this figure.
- The GEEDiT walkthrough is great - have you thought about putting it on github pages (https://pages.github.com/) so that it's more widely visible/available?
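To make the metadata layout raised in the comments on l. 130 and l. 155 concrete, here is a minimal sketch of the five date fields and the comma-separated quality flags. It is not the TermPicks code: the ISO `YYYY-MM-DD` input format, the dictionary keys, and the decimal-date convention (fraction of the calendar year elapsed) are all assumptions made for illustration.

```python
from datetime import date


def date_columns(date_string: str) -> dict:
    """Derive year, month, day and decimal date from one date string.

    Assumes an ISO 'YYYY-MM-DD' string; the key names are hypothetical.
    """
    d = date.fromisoformat(date_string)
    year_start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - year_start).days
    decimal = d.year + (d - year_start).days / days_in_year
    return {
        "date": date_string,
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "decimal_date": round(decimal, 4),
    }


def parse_quality_flags(field: str) -> list[str]:
    """Split a comma-separated flag field into 2-digit strings, e.g. '00, 03'."""
    return [flag.strip() for flag in field.split(",") if flag.strip()]


row = date_columns("2000-07-02")
flags = parse_quality_flags("00, 03,05")
print(row, flags)
```

Keeping the flags as strings rather than integers preserves the leading zero that the referee observed in the dataset.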
-
AC1: 'Reply on RC1', Sophie Goliber, 24 Feb 2022
The comment was uploaded in the form of a supplement: https://tc.copernicus.org/preprints/tc-2021-311/tc-2021-311-AC1-supplement.pdf
-
RC2: 'Comment on tc-2021-311', Anonymous Referee #2, 16 Dec 2021
Review: TermPicks: A century of Greenland glacier terminus data for use in machine learning applications
The manuscript from Goliber et al. collates terminus shapefiles from a variety of different published studies into one dataset, complete with metadata, with the ultimate aim that the dataset could be used as training data for machine learning.
I think this is both an excellent manuscript and dataset and I enjoyed having a look through the dataset and the associated Google Earth file. I certainly recommend the publication of this manuscript in The Cryosphere. I do have a few very minor comments which the authors may wish to consider.
Line 91: Why exclude glaciers with fewer than two authors digitizing them? What is the rationale for this?
Section 3.2: Is there a bias here, in that most of the repeated terminus picks are, I presume, from the later period, i.e. 2000-2020? The imagery for this period is of much superior quality, which would result in a lower error. In particular, most of the Landsat-1 scenes have fairly poor geolocation accuracy and often require manual correction; could this result in a much larger error?
Figure 9: There seems to be a large difference between the authors in this figure in the calculated retreat, but I cannot distinguish any difference on the figure due to the thickness of the shapefile lines. Could the line thickness be reduced to help with this?
-
AC2: 'Reply on RC2', Sophie Goliber, 24 Feb 2022
The comment was uploaded in the form of a supplement: https://tc.copernicus.org/preprints/tc-2021-311/tc-2021-311-AC2-supplement.pdf
-
RC3: 'Comment on tc-2021-311', Anonymous Referee #3, 27 Jan 2022
Summary: The authors compiled all publicly-available Greenland marine-terminating outlet glacier positions from a wide variety of authors and performed a rigorous standardization procedure with the aim of creating a terminus trace database that could train machine learning algorithms. A description of qualitative and quantitative differences between the sources is provided, as well as a cursory review of the terminus position data coverage and estimated retreat rates relative to single datasets. The discussion focuses on recommendations for use of these data in machine learning algorithms as well as generation of additional manual terminus trace data using the updated GEEDiT tool (called GEEDiT-TermPicks).
The manuscript is easy to read and documents much-needed work. Although I hope the standardized datasets and the “ideal” approach and output format for the terminus data will advance our field, I am a bit disappointed that this manuscript did not describe any novel insights gained from the combined dataset. I assume that is the topic of another manuscript, but it would have been nice to have this manuscript go a bit beyond a dataset description.
Major Points:
- I’m not a huge fan of the title. I think there are lots of other applications for this dataset, and it does the dataset a disservice for the title to suggest it can only be beneficial to machine learning applications. Also, there is no demonstration of how the dataset improves machine learning applications (although the authors cite machine learning manuscripts focused on glacier change). Instead, I recommend something broader, like “A standardized dataset and workflow for Greenland glacier terminus positions”.
- I appreciate that the results focus on errors and biases for individual traces, but I would also like more information on what the dataset can tell us about changes over time. This does not have to be a Greenland-wide description, but it is important to demonstrate how the combined dataset is much improved over individual datasets. There is one example figure (Figure 8) that is briefly mentioned in the discussion section as an example of the more “complete view of the change” for a glacier. It would be helpful if more examples were given, say as a series of subplots, and that some patterns in retreat rate, magnitude, or timing of changes in those metrics were presented for the broader dataset. Figure 6 gets close to doing this sort of broad overview to demonstrate merit, but doesn’t adequately emphasize the value added by combining the datasets. If these sorts of metrics were presented for some of the contributing datasets as well, I think that information would really emphasize the need for coordination of efforts so that records are detailed in time but also extensive in both space and time. Right now there isn’t anything that demonstrates the broad importance of the dataset you worked hard to create.
- I’m not sure if this should be swapped in as a main figure or added as a supplemental figure, but I’d like to see heat maps or actual maps of the average temporal resolution and coverage for each glacier. You could potentially use different symbol sizes and colors on an actual map to display those data. Right now the focus is on the number of traces for each glacier, which is important for machine learning, but the temporal resolution and coverage is much more important for someone who would want to analyze these data.
- In my opinion, the data formatting section should be below the metadata creation section. You mention scene IDs in the metadata creation but that comes after you already describe how you assigned IDs for datasets that did not contain that bit of metadata.
Minor Comments:
- Why is the ID flag 005 but all the other flags begin with X?
- Section 3.3: There needs to be more quantitative substance here. You briefly state that you observe changes in retreat rates. What are the retreat rates? See my major comment about including more of a comparison with the contributing datasets to demonstrate the differences.
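The per-glacier temporal coverage and resolution requested in the major points could be summarised along the following lines. This is a sketch under stated assumptions, not a method from the manuscript: the function name, the returned keys, and the choice of "mean interval between unique trace dates" as the resolution metric are all hypothetical.

```python
from datetime import date
from typing import Optional


def temporal_summary(trace_dates: list[date]) -> dict:
    """Summarise record length and mean sampling interval for one glacier.

    Duplicate dates (e.g. the same day traced by several authors) are
    collapsed first so they do not inflate the apparent resolution.
    """
    unique = sorted(set(trace_dates))
    mean_interval: Optional[float] = None
    span_days = 0
    if len(unique) >= 2:
        span_days = (unique[-1] - unique[0]).days
        mean_interval = span_days / (len(unique) - 1)
    return {
        "n_dates": len(unique),
        "coverage_years": span_days / 365.25,
        "mean_interval_days": mean_interval,
    }


# Made-up dates for illustration only; one date is traced twice.
example = [date(2000, 1, 1), date(2000, 1, 11), date(2000, 1, 11), date(2000, 1, 31)]
summary = temporal_summary(example)
print(summary)
```

Values of this kind, computed per glacier, could then drive the symbol size and color on the map the referee suggests.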
-
AC3: 'Reply on RC3', Sophie Goliber, 24 Feb 2022
The comment was uploaded in the form of a supplement: https://tc.copernicus.org/preprints/tc-2021-311/tc-2021-311-AC3-supplement.pdf
Data sets
TermPicks: A century of Greenland glacier terminus data for use in machine learning applications Sophie Goliber https://doi.org/10.5281/zenodo.5512724
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
909 | 369 | 25 | 1,303 | 19 | 14 |