Comment on tc-2021-175

General comments

1. The commentary in this manuscript is focussed on a single multi-model comparison (ISMIP6 Greenland). Many of the suggestions would apply equally to other initiatives in the ice sheet modelling community (e.g., ISMIP6 Antarctica, ABUMIP, and LARMIP2), to initiatives outside it (e.g., GlacierMIP or even CMIP), and to many individual projections. While ISMIP6-Greenland can still be used as an example, we would appreciate clarification on why the authors did not take a broader approach (for example, using both ice sheets).

selecting data, processing output, and combining results have not been motivated and described in sufficient detail. In particular:
* It is unclear how the uncertainty envelope has been derived. The original figure for the historical period (Fig. 4, Goelzer et al., 2020) does not attempt to show uncertainty in the model results, but simply reports the ensemble. Please clarify how the 90% credibility interval was derived.
* The uncertainty envelope of, e.g., IMBIE depends on whether errors are assumed to be fully correlated or uncorrelated when uncertainties are accumulated (compare again Fig. 4). Can you motivate your choice of the narrower envelope and discuss the implications?
* Please discuss the conceptual difference between the historical experiments (until 2014) and the projections (2015+). While modellers were free in their simulation of the historical period, the projections were tightly constrained by CMIP model output. Combined analysis across those experiments (e.g., 2008-2020) is therefore difficult to interpret. See also the comment on L. 45 below.
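To illustrate the second point above, here is a minimal sketch of how the two accumulation choices diverge. It assumes standard error propagation for a cumulative sum: fully correlated errors add linearly, and uncorrelated errors add in quadrature. The function name and the numbers are hypothetical and are not taken from IMBIE or from the manuscript.

```python
import math

def accumulate_uncertainty(sigmas, correlated):
    """Running 1-sigma uncertainty of a cumulative sum of annual values.

    sigmas: per-year 1-sigma uncertainties (hypothetical, same units).
    correlated: True  -> errors fully correlated, sigmas add linearly;
                False -> errors uncorrelated, sigmas add in quadrature.
    """
    running = []
    linear_sum = 0.0
    squared_sum = 0.0
    for s in sigmas:
        linear_sum += s
        squared_sum += s * s
        running.append(linear_sum if correlated else math.sqrt(squared_sum))
    return running

# Hypothetical input: ten annual values, each with a 1-sigma of 30 (arbitrary units).
annual_sigma = [30.0] * 10
wide = accumulate_uncertainty(annual_sigma, correlated=True)
narrow = accumulate_uncertainty(annual_sigma, correlated=False)

# With equal annual uncertainties, the envelopes differ by sqrt(N) after N years.
print(wide[-1] / narrow[-1])
```

With equal annual uncertainties, the correlated envelope is wider than the uncorrelated one by a factor of sqrt(N) after N years, so the choice has a large effect on whether a given simulation falls inside the observational envelope.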
L. 27-28: "ISMIP6 produced probabilistic distributions of projected sea level contribution"
ISMIP6 did not produce probabilistic results; it presented ensembles with no probabilities attached. Others (e.g., Edwards et al., 2021) have used these ensembles to make probabilistic assessments, but their analysis includes additional information and not simply the ISMIP6 results.

L. 31: "This ISMIP6 distribution has since been adopted as the foundation for the IPCC AR6 consensus estimate."
What is the basis for this statement? See the general comment above.

L. 35: "Our skepticism regarding the ISMIP6 projections is based on the premise that accurate predictions of the cryosphere's contribution to sea level require that models:
1. Fully characterize uncertainties in model structure, parameters, initial conditions, and boundary conditions.
2. Yield simulations that fit observations within observational uncertainty."
Although the requirements are laudable, it is nearly impossible for any study to achieve both, let alone a large multi-model project such as ISMIP6. To "fully characterize uncertainties" is demanding indeed, but an incomplete characterization within a single study is not a problem, as long as other research addresses the remaining uncertainties. It is also impossible for a model to fit all available observations within observational uncertainty, unless the model is overtuned. One can argue that particular observations are supremely important, but it is not obvious that recent mass loss is more important than, say, an accurate simulation of observed ice extent, thickness, and velocity.
Although this paragraph is set up to elaborate both requirements, the subsequent analysis of Fig. 1 is based only on the second requirement. See the comments above on the augmentation of ISMIP6 results.

L. 43: "Most simulations underestimate recent (2008-2020) mass loss."
The period 2008-2020 straddles the ISMIP6 historical period (ending in 2014), during which modellers used the forcing of their choice, and the future period (2015+), when forcing was provided by climate models. Mass loss from 2015 reflects, in part, natural variability that would not be reproduced by the climate models.
Why focus on 2008-2020? Figure 1 (and Fig. 4 in Goelzer et al., 2020) start before 2008, so the statement applies also to a longer time period.

L. 45: "Underestimating recent mass loss likely translates into underestimating mass loss at 2100 as well."
This is not necessarily true. As pointed out above, it is important to distinguish the historical period (before 2015) from the projections. Generally speaking, for ice sheet mass loss to be accurately simulated for the recent past (2008-2020), two things are required: the climate forcing should be accurate, and the ice sheet model should accurately represent the processes translating this forcing into mass loss.
Modellers were free to choose their own forcing for the historical period; in most cases, they used SMB output from regional atmosphere models such as RACMO and MAR. Some ice sheet models may have applied SMB forcing that was biased positive for the period 2008-2014. Also, most models did not apply forcing to outlet glaciers before 2015.
ISMIP6 climate forcing from 2015 onward was derived from the CMIP5 and CMIP6 Earth System Model (ESM) ensembles. A known complication of this forcing is that interannual variability (known to be important in determining Greenland's mass budget) is seldom in phase with the observed climate. This significantly complicates a model-observation comparison over a short period of 12 years.
For these reasons, models that underestimated mass loss during 2008-2020 might have been responding realistically to biased forcing. To demonstrate that they underestimate mass loss at 2100, one would need to argue that (1) the ESM-derived SMB forcing through 2100 is biased positive, and/or (2) the models underestimate recent mass loss when forced with an accurate SMB and outlet glacier forcing.

projections, but rather is connected to uncertainties in the forcing. Thus, the uncertainty framework described here does not apply in the same way to ISMIP6. Goelzer et al. (2020) state explicitly that we did not sample RCM uncertainty (which could loosely map to some of the uncertainties in PDD factors), but these complexities are not discussed here.
Since uncertainty in the forcing (SMB and outlet glaciers) could account for the issues highlighted by Fig. 1, it seems appropriate to address that uncertainty.
L. 102: "This lack of knowledge induces parametric uncertainty, for example, different values of thermal conductivity within firn might lead to different predictions of sea level contribution."