Articles | Volume 20, issue 2
https://doi.org/10.5194/tc-20-931-2026
© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.
Antarctic ice sheet model comparison with uncurated geological constraints shows that higher spatial resolution improves deglacial reconstructions
Download
- Final revised paper (published on 04 Feb 2026)
- Supplement to the final revised paper
- Preprint (discussion started on 15 May 2025)
- Supplement to the preprint
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on egusphere-2025-2008', Anonymous Referee #1, 03 Jul 2025
- AC1: 'Reply on RC1', Anna Ruth Halberstadt, 24 Oct 2025
- RC2: 'Comment on egusphere-2025-2008', Anonymous Referee #2, 05 Sep 2025
- AC2: 'Reply on RC2', Anna Ruth Halberstadt, 24 Oct 2025
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
ED: Publish subject to revisions (further review by editor and referees) (15 Nov 2025) by Florence Colleoni
AR by Anna Ruth Halberstadt on behalf of the Authors (15 Nov 2025)
Author's response
Author's tracked changes
Manuscript
ED: Referee Nomination & Report Request started (01 Dec 2025) by Florence Colleoni
RR by Anonymous Referee #2 (11 Dec 2025)
RR by Anonymous Referee #1 (20 Dec 2025)
ED: Publish subject to minor revisions (review by editor) (19 Jan 2026) by Florence Colleoni
AR by Anna Ruth Halberstadt on behalf of the Authors (19 Jan 2026)
Author's response
Author's tracked changes
Manuscript
ED: Publish subject to technical corrections (20 Jan 2026) by Florence Colleoni
AR by Anna Ruth Halberstadt on behalf of the Authors (21 Jan 2026)
Manuscript
The authors present a new automated workflow for model-data comparison using the Penn State University ice sheet model and the ICE-D surface exposure age database. The paper examines in depth the different approaches and metrics for comparing model simulations to spatially and temporally sparse data, with a thoughtful treatment of uncertainties (model, analytical, and geological). This work is impressive and well suited for publication in The Cryosphere after some revision.
My largest concern with this manuscript is also the easiest to fix: the original sources of the data shown in figures need to be cited, not just the ICE-D database. The way this database is currently used means that Balco (2020) gets cited instead of the original data sources. For instance, the six sites shown in Fig 16 a–f span four publications, but only two of those publications are cited, and neither in relation to the data. Surely there is a way to easily compile the necessary citations when pulling data from ICE-D.
Because the paper is very long and dense, it would benefit from significant revision for clarity. For instance, as you’ll see from my specific comments below, I was confused for a long time about the different metrics being described, how they were used, and how they did or did not interact with each other. This became clearer in the Results and Discussion sections, but it made the Methods section very difficult to get through. I suggest adding a clearer and more thorough roadmap of that section before getting into the details.
I have also pointed out some places in the detailed comments below where aspects of the methodology seem arbitrary or not well supported by the data, or where such complete automation could be undesirable. Obviously it would be absurd to rely on careful manual curation of an ever-growing dataset forever, but the pitfalls of automation need to be clearly addressed. The manuscript does a fairly good job at this in various places, but I think it would benefit from a dedicated sub-section of the Discussion that goes further into these considerations beyond the comparison in Fig 11.
Specific comments:
In general, locations of sample sites shown in figures need to be marked on a map somewhere. For instance, site KRING in Fig 8 is not shown on any location map.
While automation and the use of uncurated data certainly have obvious benefits, the cosmogenic nuclide record is often ambiguous and frequently relies on expert assessment of individual samples and their geologic context. I worry that automating the model-data comparison process will come with a loss of expert judgment. For instance, the exposure-age record at Diamond Hill is quite complex and uses multiple nuclides (Be-10 and C-14) and sample types (erratics and bedrock), but Fig 16a is missing a key in-situ C-14-saturated sample and shows a greatly simplified version of the history that does not agree with the preferred ice history suggested by flow-band modeling (Hillebrand et al., 2021). I’m not insisting that the preferred history must be correct but instead pointing out how much nuance is lost in the automated approach.
L88–100. You could use the modern ice surface elevation at each site as a constraint, rather than a complete spatial map. This would be worth adding to your analysis, especially given that many of the later figures in the paper show a poor fit to present-day ice thickness at the end of the model runs, and this is essentially a free data point. And if one reason to compare model results with paleo-constraints is to calibrate a model for future projections, getting the modern state right is very important.
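A minimal sketch of what I have in mind, with all array names hypothetical (the final-timestep modeled surface at each site versus the observed modern surface):

```python
import numpy as np

def modern_surface_misfit(z_model_final, z_modern_obs, sigma=10.0):
    """RMS misfit (in units of sigma) between modeled surface elevation
    at the final model timestep and the observed modern surface at each
    sample site. Inputs are hypothetical placeholders for the workflow's
    arrays; sigma is an assumed observational uncertainty in meters."""
    z_model_final = np.asarray(z_model_final, dtype=float)
    z_modern_obs = np.asarray(z_modern_obs, dtype=float)
    resid = (z_model_final - z_modern_obs) / sigma
    return float(np.sqrt(np.mean(resid ** 2)))
```

This could be folded into the overall scoring as one additional site-level term at essentially no cost.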
L102: Clarify that this ~10–20km resolution only holds for paleo simulations. Plenty of models use much higher resolution than this for shorter-term simulations.
L 163: Is “marine ice shelf instabilities” intentional phrasing?
L 167–170: All of these data sets (with the possible exception of Liu et al., 2009) are pretty seriously outdated at this point. Has any attempt been made to update these? Is there evidence that the model is insensitive to the choice of these datasets?
Section 2.2 needs more information:
What data sets are used to define present day ice thickness and bed topography?
How does the basal friction inversion work?
Is there also an optimized ice stiffness field?
How is the present-day temperature state achieved and what geothermal flux product is used?
What spatial resolution? Has any mesh convergence testing been done?
What sub-shelf melt and calving schemes are used?
L 182: “The model basal inversion slipperiness input is downscaled at a correspondingly high resolution for each nested domain.” Does this just mean it is interpolated from the coarse resolution, or is the inversion done on the nested domain or some other high-resolution continental domain? For most of the outlet glaciers, the coarse-resolution inversion is probably meaningless, since even Byrd Glacier is at best going to be a few grid cells wide.
“Nested simulations have been shown to be resolution-independent.” Numerically speaking, this cannot be true in general. The figure referenced from DeConto et al (2021) shows that the model is more or less converged with respect to resolution at 10km grid spacing for that particular domain and scenario. This will not necessarily hold for other applications.
Is there some relaxation period used to let the model adjust to the nesting resolution? Going to higher resolution usually means the model needs time to adjust, which can take thousands of years.
Fig 16 caption references Fig 0, which does not exist.
L261: Maybe this is made clear later on, but from this text it sounds like only ice thickness change (and not the actual value of ice thickness or surface elevation) is evaluated here. I understand that it’s often very tricky to get models to match absolute surface elevation, but you could have an excellent model agreement to observed ice thickness change and still be extremely far off in terms of the ice surface elevation. For many cases, this would make it seem like the model is doing a good job of explaining the data, when in fact it is biased extremely high or low. Update: Okay, I see that you also have this exceedance metric later on. It would be good to mention it here.
L330: What do you define as LGM age here? Depending on the production rate scaling scheme, even using 30ka would prevent C-14-saturated samples from being included. This also brings up another symptom of automation, which is that pre-exposed erratics can still provide a constraint on LGM surface elevation even if they do not provide a good constraint on the timing, but those are discounted here.
L335: What data product do you use to define the modern ice surface? Also, do the sites have to be strictly within a model cell? Due to the low model resolution, I can imagine cases where the sample sites are in a cell that is always ice free, but could still provide constraints on an adjacent cell that contains ice. This could happen even at the 2km resolution for some of the smaller TAM outlets.
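If you go this route, a simple fallback to neighboring cells could look like the following (purely illustrative; `ice_mask` and the indexing scheme are hypothetical stand-ins for the workflow's grid):

```python
import numpy as np

def nearest_icy_cell(ice_mask, i, j):
    """Return (i, j) of the sample's grid cell if it contains ice,
    otherwise the first ice-covered cell among its 8 neighbors, or
    None if all are ice-free. ice_mask is a boolean 2-D array."""
    if ice_mask[i, j]:
        return (i, j)
    ni, nj = ice_mask.shape
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ii, jj = i + di, j + dj
            if (di or dj) and 0 <= ii < ni and 0 <= jj < nj and ice_mask[ii, jj]:
                return (ii, jj)
    return None
```

Whether an adjacent-cell comparison is geologically defensible would of course need justification, but discarding such sites entirely also introduces a bias.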
What are the numbers of lumped sites for the different resolutions?
L375: Needs some example citations
L 400: It is starting to occur to me that there should be a summary table or list at the beginning of section 3 that briefly covers all the different metrics used.
S 3.1.6 Surely there is a more rigorous way to estimate the minimum geologic error? It seems strange to use this relatively sophisticated Monte Carlo analysis and then eyeball the dividing line for the tail of the distribution. Perhaps using the 95th percentile value as the cutoff? I know it’s unlikely to change the number much, but this would be preferable in terms of reducing interpretation bias and leaving the workflow flexible as new samples are added to the database.
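For example (with a synthetic gamma distribution standing in for your Monte Carlo results; the real workflow would use its simulated residuals):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the Monte Carlo distribution of geologic
# scatter (in years); replace with the workflow's simulated values.
mc_scatter = rng.gamma(shape=2.0, scale=500.0, size=100_000)

# Objective cutoff for the minimum geologic error: the 95th percentile
# of the distribution, rather than an eyeballed line through the tail.
min_geologic_error = np.percentile(mc_scatter, 95)
```

This keeps the cutoff reproducible and lets it update automatically as new samples enter the database.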
Also, is there a strong justification for making this a uniform value, rather than varying it spatially by sector, for instance?
L423: needs references for “ Previous work has relied on continental-scale ice sheet models at 20-40km”. In general, all of section 3 would benefit from more references.
Fig 4 legend is missing the grey curve for the first 2km entry. The box on the continental map showing the location of b–e is hard to see. I suggest making it a bright color that contrasts better with the ice thickness map (or simply removing the ice thickness from this plot and just showing the grounding line or continental outline).
L472–474: Possibly as (or maybe more) important than the ice-flow perturbation is the formation of blue ice areas and wind scoops on the down-wind side. See figure 4 in Bintanja (1999), for instance.
L476: I’m not quite sure what is meant by “due to parameter variation” here. There are many factors aside from parameter uncertainty that play into this: forcing uncertainty, initial condition uncertainty, structural uncertainty, etc. You could probably just delete that phrase, as it’s fairly obvious why it is hard to match the modern state at the end of a simulation lasting tens of thousands of years. I would also change “may not” to “is very unlikely to” or even “will not”.
L487: Does the float-scoring metric really account for time? If so, it seems that time is double counted when also applying the time-offset metric.
L500: I don’t understand this sentence: “The timing of model thinning should be generally insensitive to model-data alignment issues.” Can you reword for clarity? By “alignment”, are you referring to the elevation misfit? It could also be interpreted as general alignment along space and/or time dimensions, which makes the sentence rather ambiguous.
L501: The parenthetical “horizontal” here is a bit strange, since horizontal by definition refers to space, not time. Specify that this is along the horizontal axis in the age-elevation space shown in Fig 6, for instance.
L506: There could also be random errors in the forcing datasets accounting for this. Not all forcing uncertainty is going to be systematic. Conversely, model resolution issues could certainly impart a systematic bias. For instance, low resolution model simulations tend to exhibit marine-ice-sheet-instability-style collapse more readily than high resolution simulations. If the model is not converged with respect to resolution (as I would bet is the case for 40km simulations), there would be a systematic bias across the entire West Antarctic Ice Sheet during a deglaciation event.
Based on Figure 6, it looks like you apply a dozen or so equally spaced offset values and then score them to find the best of those user-defined values. Presumably there is a straightforward way to solve this as a minimization problem, which would be more accurate and robust and make the workflow more automated. Also, panel b would make more sense if the axes were flipped, since the offset is the independent variable here.
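Something like a bounded one-dimensional minimization (e.g., `scipy.optimize.minimize_scalar`, or the hand-rolled golden-section search below) would do the job; sketch with a toy unimodal score standing in for the workflow's offset-scoring function:

```python
def best_time_offset(score_fn, lo=-5000.0, hi=5000.0, tol=1e-3):
    """Golden-section search for the time offset (years) minimizing a
    unimodal scoring function, instead of scanning a coarse grid of
    user-defined offsets. score_fn is a hypothetical stand-in."""
    phi = (5 ** 0.5 - 1) / 2  # inverse golden ratio
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    while b - a > tol:
        if score_fn(c) < score_fn(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return 0.5 * (a + b)

# Toy quadratic score with a minimum at +1200 yr:
offset = best_time_offset(lambda dt: (dt - 1200.0) ** 2)
```

If the score is not guaranteed unimodal, a coarse scan followed by local refinement would be nearly as simple and still more accurate than the fixed grid alone.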
S3.2.3: Are these metrics weighted equally? Can you do some kind of L-curve analysis to determine the optimal weights?
L529: If I understand this correctly, using “the closest time of exposure of the data point” means that the interpretation of the exposure age is dependent on the model prediction, which doesn’t seem appropriate here. It also suggests that any age within the uncertainty bounds is equally likely, which is not the case.
L~565: By this point, I’m fairly confused. What was the point of the metrics in 3.2.3 if you instead use this misfit metric in equation 1? I think this section needs to start with a more thorough roadmap that explains which metrics are needed and what they are used for.
3.3.2 title: Presumably you want to score the model, not the samples?
L583: Why not use a similar 10kyr value for this case? The way it is done here makes the two youngest samples in Fig 7 have significantly less weight than the oldest sample, even though the model fits them all very poorly. Also, it seems like the choice of the 10% ice-thickness-change vertical window could have a very large impact on scoring. If that window was just a very tiny bit wider, both the oldest and the second-youngest samples in Fig 7 would be interpreted by the algorithm as having relatively small ∆t values. How was this 10% window determined? Instead of using a window of uniform height for each site, what if the algorithm could solve for the necessary window height and then account for that in the misfit calculation?
Also, should this reference Fig 7 instead of Fig 5?
Fig 8: The top panel shows a misfit of 0 even though it underestimates modern ice thickness by 50m (out of only 350m total thickness change) and is still thinning at a significant rate at the end of the simulation. See my comment above about including the modern ice thickness as a constraint.
This misfit cap of 50 seems quite arbitrary as well. How was this determined? Can you show at least two other sites to show that this value is representative?
This figure doesn’t necessarily support the cap of 50. For instance, the difference between the misfits of 46 and 55 is not all that significant: they both thin too little, too late, and the difference is just a matter of degree. And more importantly, the run with misfit 55 thins at about the right time and is in fact significantly better than the run with misfit 120, which bears almost no resemblance to the data whatsoever. So I think the limit of 50 is actually a bit too restrictive here. One could also argue that the runs with misfits of 46 and 55 both explain the data better in some senses than the run with misfit of 34, since that thins thousands of years too early (assuming that the assertion in L 487 that the float misfit score accounts for timing of thinning is true).
L607: How sensitive are the conclusions to this spatial weighting scheme? It seems like you could defensibly define this in multiple ways — for instance by using ice-flow catchment boundaries (either time-dependent or modern) — and those could possibly give different answers without a clear way to choose between them.
Eq 5: I’m still a bit lost at this point. Reading from the top of section 3 from top to here, I don’t understand how all the different metrics come together. You first define a float-scoring (i.e., thickness change) metric and a best-time-offset metric in S3.2.3. Then in 3.3.1 you define this sample misfit metric that appears to be more or less independent of either of those. That metric is used to calculate site misfit, which is then used to calculate model misfit. So where do the float-scoring and best-time-offset metrics come in? As I’ve mentioned above, this section would greatly benefit from a very clear roadmap at the beginning that gives a very clear summary of the procedure before you dive into the details. The text at the beginning of Section 3 seems to attempt this, but doesn’t really help clarify the approach for me.
L 628–630: need example references
L705: Technically, saturated C-14 concentrations provide robust evidence for lack of LGM ice cover. The scaling factor of 10 seems arbitrary as well. How sensitive are results to this assumption?
L731: Here the float misfit score has come back. Is this in fact the same as the misfit defined in Eq 5? Why is the time-misfit score not mentioned here?
L788: “wrong more consistently” is rather ambiguous phrasing. Reword for clarity. Something like “exhibit a more consistent time-offset bias”. Refer to Fig 13 here to help illustrate.
Fig 13: Median and interquartile range might be more meaningful than standard deviation for distributions like these.
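That is, something like (illustrative only):

```python
import numpy as np

def robust_summary(x):
    """Median and interquartile range: summary statistics far less
    sensitive to heavy tails and outliers than mean and standard
    deviation, which suits skewed misfit distributions."""
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return med, q3 - q1
```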
L835: Are the 20- and 10-km results shown anywhere? If not, a figure should be added.
L898: delayed and more rapid compared with data, or with other model runs?
L896–904: My guess is that this is due to the power-law (Weertman-type) basal friction parameterization used in the PSU model, which uses a relatively large exponent and thus assumes a relatively hard ice-sheet bed. A more-plastic bed rheology would likely be more appropriate for much of the Antarctic Ice Sheet, though the appropriate rheology and thus the appropriate friction parameterization will vary widely in space. The power-law exponent has not been varied as a parameter in any study with this model that I am aware of, but it has a large impact on time-evolving behavior in other models (Parizek et al., 2013; Gillet-Chaulet et al., 2016; Nias et al., 2018; Joughin et al., 2019; Brondex et al., 2019; Hillebrand et al., 2022; Schwans et al., 2023). It cannot be determined with a snapshot inversion approach, but instead requires calibration in time-evolving simulations, or a transient inversion. I think using a more-plastic bed rheology might tend to lower the maximum thickness (leading to better exceedance scores) and also lead to steadier, less abrupt thinning (potentially improving float misfit scores, if I understood the previous text correctly). Obviously you don’t need to attempt this here, but it’s worth mentioning that this key assumption has never been tested with this model.
L980: One explanation that occurs to me (although there are many possible and complementary explanations) is that deposits in currently ice-free valleys alongside TAM outlet glaciers can be hundreds of meters lower in elevation than deposits of the same age on the glacier valley walls, because the glacier margins extended several km into these valleys at the LGM (see, e.g., the wide range of elevations of the mapped Britannia I limit at Lake Wellman in King et al., 2020). Those two populations would both be compared against very similar (or identical) modern-day surface elevations, so they record very different ice thickness changes despite recording the same event at almost the same location. Without some calculation that accounts for the very different elevation of these valley-floor deposits (e.g., using an estimated surface slope to project them back to the glacier centerline or nearest model grid cell), it’s unlikely that they will be used accurately in this analysis. Recorded rates of thinning also vary widely (up to maybe a factor of 2) between valley-floor and nearby valley-wall samples. An automated approach will likely always miss the difference between these types of samples. To be fair, however, most expert curation would also have missed that distinction.
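Even a crude correction along these lines would help; here the 0.02 marginal surface slope and the distance-to-centerline input are purely illustrative:

```python
def project_to_centerline(z_sample, distance_m, surface_slope=0.02):
    """Project a valley-floor deposit elevation back to the glacier
    centerline using an assumed marginal surface slope (hypothetical
    0.02 here, i.e. 20 m of rise per km toward the trunk).
    distance_m is the horizontal distance from sample to centerline."""
    return z_sample + surface_slope * distance_m
```

A sample at 900 m elevation located 3 km into a marginal valley would then be compared at roughly 960 m, which can easily exceed the analytical uncertainty on the exposure age.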
There’s also the issue that the small-scale meteorology of the glacier margins is a strong control on location of deposits, and is not going to be represented at all in the model. For instance, the presence of algae-hosting melt ponds at the LGM and even the deposition of erratics in valleys requires marginal ablation areas.
L991: Resolution is just one of many important model choices that have a bearing on this, so it’s not necessarily resolution that is limiting accuracy at this point. The remaining discrepancies are more likely due to all the other sources of uncertainty (model structure, unrepresented physics, parameters that likely need to vary spatially, bed topography, forcing, etc). I would guess that you’re not going to see much further improvement at higher resolution at this point, although it would be interesting to test that for a few of these ensemble members. The nested domains are also still driven by the low-resolution simulations at the boundaries, so they cannot completely decouple from the inaccurate low-resolution ice dynamics occurring outside the high-resolution region. This might not be a big issue at the interior sites, but in the TAM it is probably important, since the grounding-line position in the Ross Embayment is likely to be resolution dependent.
The sign convention on time in the figures in the appendix is reversed relative to the main text. Please point this out in captions.
References:
Balco, G.: A prototype transparent-middle-layer data management and analysis infrastructure for cosmogenic-nuclide exposure dating, Geochronology, 2, 169–175, 2020.
Bintanja, R.: On the glaciological, meteorological, and climatological significance of Antarctic blue ice areas, Rev. Geophys., 37, 337–359, 1999.
Brondex, J., Gillet-Chaulet, F., and Gagliardini, O.: Sensitivity of centennial mass loss projections of the Amundsen basin to the friction law, The Cryosphere, 13, 177–195, https://doi.org/10.5194/tc-13-177-2019, 2019.
DeConto, R. M., et al.: The Paris Climate Agreement and future sea-level rise from Antarctica, Nature, 593, 83–89, 2021.
Gillet-Chaulet, F., Durand, G., Gagliardini, O., Mosbeux, C., Mouginot, J., Rémy, F., and Ritz, C.: Assimilation of surface velocities acquired between 1996 and 2010 to constrain the form of the basal friction law under Pine Island Glacier, Geophys. Res. Lett., 43, 10311–10321, https://doi.org/10.1002/2016GL069937, 2016.
Hillebrand, T. R., et al.: Holocene thinning of Darwin and Hatherton glaciers, Antarctica, and implications for grounding-line retreat in the Ross Sea, The Cryosphere, 15, 3329–3354, 2021.
Hillebrand, T. R., et al.: The contribution of Humboldt Glacier, northern Greenland, to sea-level rise through 2100 constrained by recent observations of speedup and retreat, The Cryosphere, 16, 4679–4700, 2022.
Joughin, I., Smith, B. E., and Schoof, C. G.: Regularized Coulomb Friction Laws for Ice Sheet Sliding: Application to Pine Island Glacier, Antarctica, Geophys. Res. Lett., 46, 4764–4771, https://doi.org/10.1029/2019GL082526, 2019.
King, C., et al.: Delayed maximum and recession of an East Antarctic outlet glacier, Geology, 48, 630–634, 2020.
Nias, I. J., Cornford, S. L., and Payne, A. J.: New Mass-Conserving Bedrock Topography for Pine Island Glacier Impacts Simulated Decadal Rates of Mass Loss, Geophys. Res. Lett., 45, 3173–3181, https://doi.org/10.1002/2017GL076493, 2018.
Parizek, B. R., et al.: Dynamic (in)stability of Thwaites Glacier, West Antarctica, J. Geophys. Res.-Earth Surf., 118, 638–655, 2013.
Schwans, E., et al.: Model insights into bed control on retreat of Thwaites Glacier, West Antarctica, J. Glaciol., 69, 1241–1259, 2023.