Reply on RC1

The authors use the HIRHAM5 regional climate model to downscale two EC-Earth models (one for CMIP5 and one for CMIP6) in order to assess the projected future surface mass balance (SMB) of Greenland and Antarctica. Whilst this is an important area of research given the uncertainty in future SMB estimates, especially with the difference in CMIP5 and CMIP6 climate sensitivity, very big assumptions are still being made, and there is little justification for their research design. After reviewing their previous submission and the changes between versions, large questions still remain in my mind about the robustness of the results, given that only one model from each CMIP generation is used to draw conclusions on the future of Greenland and Antarctic SMB. I am pleased to see that the evaluation of HIRHAM5 is now included, and I have no issue with the choice of RCM (as the authors note, most similar studies use MAR/RACMO, but there is no need for them to be used exclusively). However, I still see no justification for the choice of GCM. A number of other questions also remain on their methodological design choices and how these influence the results. Therefore, I cannot recommend the manuscript for publication in its current state and recommend major revisions.
Reply: We agree that it is a challenge to draw firm conclusions on differences between CMIP generations using only one GCM. This study is primarily about the difference between two EC-Earth versions, but to some extent also about how these EC-Earth versions compare with other CMIP models. Regarding the choice of GCM, EC-Earth v2 has been used in a number of studies focusing on Greenland and the Arctic (both as a GCM and in regional downscalings using HIRHAM5), which show that it has an Arctic cold bias (see figure 2). In EC-Earth v3, this Arctic cold bias has more or less disappeared (see figure 2), and the current study aimed to investigate how this would affect the SMB of Greenland. Unpublished data (Cecile Agosta, pers. comm.) indicate that EC-Earth3 is one of the highest-performing GCMs in the CMIP6 ensemble at replicating the observed Arctic climate during the historical period. Given the cold Arctic bias in EC-Earth2, we therefore seek in this publication to understand how the apparently improved model version for the Arctic will affect the mass budget of the Greenland ice sheet. We also feel that this comparison is worthwhile for the Antarctic ice sheet, as many models display this hemispheric asymmetry in performance. It is also not clear that models that perform well against the observed climate are more reliable for future projections, as the projected change is more likely to be related to the equilibrium climate sensitivity (ECS) of a given model. Furthermore, we would argue that future climate projections should explore the wide distribution of outcomes from global climate models, rather than focusing on the ensemble mean. As EC-Earth3 has a high ECS and EC-Earth2 a rather low one, projections with both model versions likely represent these two extremes.
Future work will focus on downscaling other GCMs that represent other parts of the CMIP6 model ensemble, but here we focus on the improvements and changes to a single model as it has evolved from CMIP5 to CMIP6. We will add text discussing the implications of these questions more fully in the introduction and discussion sections.

Specific major comments:
Methods: Overall, the manuscript would benefit from justification of certain choices you have made in your methods and model setup. I list a number of questions below, which need some justification and also discussion on how your choices may influence your results.
Model selection: Why EC-Earth, and why the specific realisations that you chose? Whilst I appreciate the high time and computing costs of downscaling GCMs/ESMs with RCMs, there still needs to be some justification of your selections. Efforts are being made to ensure that the 'best' GCM/ESM realisations are chosen for specific regions using selection criteria (see Pickler and Mölg 2021 and earlier references therein). Even citing earlier literature that highlights the success of EC-Earth against observations compared to other models would help. What if EC-Earth performs relatively poorly against reanalysis and observational data compared with other models? Or what if these specific realisation members (r3i1p1 and r5i1p1f1) are not representative of the EC-Earth ensemble average? In the discussion you mention the Southern Annular Mode (SAM), but there is no discussion of whether EC-Earth is able to represent the characteristics of the SAM. In your discussion, you also mention the Bracegirdle study, which found a large spread in conditions between CMIP5 models (line 315); this further suggests that you need to justify why you have chosen only one model. Whilst you do compare EC-Earth to ERA-I for the ice sheets, you do not compare any other GCMs to ensure that EC-Earth is an appropriate tool. Figure 4 is a step in the right direction, but again it does not provide any information on whether these two models give the best representation of the historical/future climate for the ice sheets. Which models are included in Figure 4? Line 154 is quite broad: there are over 600 realisations of CMIP6 models in total, so which ones are you using in Figure 4?
Reply: The choice of GCM is addressed in our reply above. We do not attempt to select the best GCM for this study; rather, we want to study EC-Earth and how downscalings of two versions of EC-Earth differ with respect to ice sheet SMB. As can be seen in figure 4, the differences in projected precipitation and temperature changes between realization members are relatively small, so we are convinced that we would get similar results had we chosen a different member. The models included in figure 4 comprise one realization of each model that, at the time, had both historical and RCP8.5/SSP5-8.5 simulations available, giving 39 (+ 2 EC-Earth) members for CMIP5 and 21 (+ 7 EC-Earth) members for CMIP6. We will add this to the text and to the figure 4 caption.
Selected time periods: Why are you using different time periods (of different durations) for Greenland and Antarctica? What is the justification for looking at 20 vs 30 years, and why use different time periods in both the historical and future runs? If you are trying to compare the SMB and discuss the uncertainties between CMIP5 and CMIP6, why add to the complexity by choosing different time periods? Given that ERA-I is only available from 1979, so that this simulation is 8 years shorter than the Antarctic GCM runs (Line 199), why still choose the 1971-2000 period? This also leads to some inconsistencies throughout the manuscript. For instance, Figure 3 shows 2081-2100, but Antarctica uses the period 2070-2100 in other results. Why are there different spin-up times for the historical and scenario forcings for Antarctica?
Reply: The CMIP5 downscalings for Greenland and Antarctica were part of two separate projects focusing on two different time periods. When we planned the CMIP6 downscalings, we decided to use the same time periods as for CMIP5 to save time and computing power. We understand the confusion that this decision can cause. When plotting global data (figures 2 and 3), we used the 2081-2100 scenario period since it is common to both regional domains. We did not want to use different time periods in the different panels of figure 3, so we used 2081-2100 for all four panels. Figures 2 and 3 were also produced for 30-year periods, and the results are very similar to those for the shorter periods. Regarding the spin-up times for Antarctica, the scenario spin-up is shorter because we use the state from the historical spin-up as the starting point for the scenario spin-up. We will add this information to the text.
Why downscale to such high resolutions when you spend very little time discussing regional differences? It would be good to include more about the regional differences between the CMIP runs, rather than only presenting continent-averaged values.
Reply: Figures 5, 6 and 7 do present regional differences, and these are mentioned in the text, but we agree that the differences and similarities can be discussed further. We can also mention that other ongoing studies are looking at these differences in more detail.

Discussion
Ln 310: this paragraph should be more descriptive. To what extent do your results agree with RACMO2 and other models? They all agree on an increase for CMIP6, but are the magnitudes of the increase in your results similar to those of the others? Similarly, from line 298 onwards, you mention the opposite results of other studies but again do not go into detail about why they disagree. This needs more discussion so that the reader can interpret why your results differ or agree.
Reply: We agree. We will expand these two paragraphs.
Other specific comments:
Throughout: Why v2 and v3? Considering that CMIP5 and CMIP6 GCMs are used, I would recommend labelling the model runs more intuitively, perhaps v5/v6 or EC-Earth5/EC-Earth6, so that they are instantly understandable to the reader. It only becomes clear in the third paragraph of the methods that EC-Earth2 and EC-Earth3 are the actual names of the models.
Reply: We mention on line 100 in the introduction that the names EC-Earth2 and EC-Earth3 come from the model versions EC-Earth v2.3 and v3.3. To be consistent with the model names used in CMIP5 and CMIP6, and to avoid confusion with future versions, we would like to keep 2 and 3.
Ln 93: "regional climate models" is written out in full here rather than abbreviated. As there are many abbreviations throughout, perhaps you could avoid using the abbreviation in line 75.