Spatio-temporal reconstruction of winter glacier mass balance in the Alps, Scandinavia, Central Asia and western Canada (1981–2019) using climate reanalyses and machine learning

Guidicelli, Matteo; Huss, Matthias; Gabella, Marco; Salzmann, Nadine

doi:https://doi.org/10.5194/tc-17-977-2023

Articles | Volume 17, issue 2

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article

01 Mar 2023

Matteo Guidicelli, Matthias Huss, Marco Gabella, and Nadine Salzmann

Download

Final revised paper (published on 01 Mar 2023)
Supplement to the final revised paper
Preprint (discussion started on 27 Apr 2022)
Supplement to the preprint

Interactive discussion

Status: closed

RC1: 'Comment on tc-2022-69', Anonymous Referee #1, 01 Jun 2022

In "Snow accumulation over the world’s glaciers (1981-2021) inferred from climate reanalyses and machine learning", a machine learning model is applied to 95 glaciers on 3 continents to downscale precipitation and other variables from commonly used reanalysis products.

The problems begin with the title, which overstates its importance. Only a tiny fraction, in fewer than half of the continents, of the world's glaciers are examined. The manuscript has too many figures and tables. The manuscript is supposed to be within 12 journal pages for TCD. The tables and figures alone, most of which occupy a full page, would take up this much space. The figures are bloated. For example, there is no need to illustrate "Tree 1" nor "Tree N", both of which are identical in Figure 3. The PCA section (4.1) doesn't tell the reader much more than the fact that elevation is the most important downscaling predictor. The leave one out validation is problematic as there is no independent validation dataset used, meaning that biases in precipitation are unlikely to be identified.

ERA-5 and MERRA-2 reanalyses are used without any mention of their potential large biases in the mountains. For example, Liu and Margulis (2019) report that MERRA-2 underestimates snowfall (which is based on the "PRECTTOLAND" variable used here) by 54% in High Mountain Asia. It's not clear to me that the downscaling techniques presented here will correct that bias, as no independent evaluation of precipitation is presented. Melt and sublimation are ignored in the "winter mass balance," which is then the wrong term.

After carefully searching through the text, I still cannot understand how precipitation phase was treated. It seems to have been ignored as SWE is used interchangeably with the downscaled precipitation on glaciers. But then, in Table B1 and B2 ERA-5/MERRA-2 snowfall variables are listed as predictors?

Because of its excessive length, lack of clarity, and questionable assumptions, I recommend this manuscript be rejected. For a resubmission, I suggest the authors consider an independent evaluation of snow accumulation and at least an explanation of how precipitation phase was treated. The size of the figures and tables needs to be cut approximately in half.

Works cited

Liu, Y., and Margulis, S. A.: Deriving Bias and Uncertainty in MERRA-2 Snowfall Precipitation Over High Mountain Asia, Frontiers in Earth Science, 7, 10.3389/feart.2019.00280, 2019.

Citation: https://doi.org/10.5194/tc-2022-69-RC1
- AC1: 'Reply on RC1', Matteo Guidicelli, 25 Jul 2022
  We would like to acknowledge the reviewer for this thorough and critical review that has helped us to sharpen the focus of our study.
  In the following, we report our responses (bold) to the reviewer's concerns (within quotation marks).
  “The problems begin with the title, which overstates its importance. Only a tiny fraction, in fewer than half of the continents, of the world's glaciers are examined. The manuscript has too many figures and tables. The manuscript is supposed to be within 12 journal pages for TCD. The tables and figures alone, most of which occupy a full page, would take up this much space. The figures are bloated. For example, there is no need to illustrate "Tree 1" nor "Tree N", both of which are identical in Figure 3. The PCA section (4.1) doesn't tell the reader much more than the fact that elevation is the most important downscaling predictor.”:
  Title: We agree that the term “world’s glaciers” can be misleading. In response to this comment we will change the title to: “Snow accumulation over glaciers in the Alps, Scandinavia, Central Asia and Western Canada (1981-2020) inferred from climate reanalysis and machine learning”
  Number of figures and tables: We agree that some simplification would benefit the paper. We will accordingly make major changes, including reducing the number of figures and tables and, wherever possible, their content, as briefly described in the following:
  Tables 2 and 3 will be moved to the Supplementary material. Fig. 2 will be moved to the Supplementary material as well. Fig. 5 could also be moved to the Supplementary material; even though it shows that other predictors than elevation are important to explain different biases between reanalysis’ precipitation and snow accumulation on glaciers.
  We also agree that Sec. 4.1 needs to be modified in order to better quantify the added value of each group of predictors on the model’s performance. In the revised version of the paper we will show the changes in terms of overall model performance when suppressing the downscaled predictors (and/or other predictors (e.g. topographical)). In fact, this might be a better evaluation of the predictors’ importance than only showing the frequency of use of the main predictors (Fig. 4a and b) and their correlations (Fig. 4c and d).
  Fig. 3 will be simplified and replaced by a smaller figure without the illustration of the “Trees”.
  
  “The leave one out validation is problematic as there is no independent validation dataset used, meaning that biases in precipitation are unlikely to be identified.”:
  Many thanks for this thought. However, we do not fully agree with this statement. For the “site-independent GBR”, the model is always validated on a glacier that is independent from the model’s training. Thus, as stated in the manuscript, the leave-one-glacier-out cross-validation allows evaluating the generalization of the machine learning models for glaciers located in the same regions as the training data. Fig. 9 shows a more robust validation, where the performance of the machine learning models is also evaluated for completely independent regions (removing neighboring glaciers from the training data). Biases of the reanalysis precipitation against snow accumulation data (based on ground measurements and extrapolation techniques (see Sec. 2.2)) on the glaciers of the study are therefore identified (see Figs. 6 and 7).
  Despite the glaciers used for validation being independent from the GBR model’s training, it is true that they have an influence on the choice of the optimal hyperparameters of the GBR model, i.e.: the GBR model was optimized to perform well on the validation data. However, each single glacier (1 out of 95 glaciers) used for the validation has a very limited weight on the overall performance (mean squared error) and on the choice of the GBR’s hyperparameters.
  In order to make the proposed method even more robust, we will also define the hyperparameters independently from the test sites, i.e.: in turn, each glacier will be used to test the GBR model trained and validated (k-fold cross-validation for the selection of the hyperparameters) with the other glaciers.
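
The nested scheme described in the reply above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the round-robin assignment of glaciers to inner folds and the sample glacier labels are all assumptions made for the example.

```python
def nested_leave_one_glacier_out(glacier_ids, n_inner_folds=3):
    """Yield (test_glacier, inner_folds) for each glacier in turn.

    inner_folds is a list of (train_glaciers, validation_glaciers) pairs built
    only from the glaciers that are not the test glacier.
    """
    unique = sorted(set(glacier_ids))
    for test in unique:
        remaining = [g for g in unique if g != test]
        # Deal the remaining glaciers round-robin into the inner folds.
        folds = [remaining[i::n_inner_folds] for i in range(n_inner_folds)]
        inner = []
        for k, validation in enumerate(folds):
            if not validation:
                continue  # fewer glaciers than folds: skip empty folds
            train = [g for j, fold in enumerate(folds) if j != k for g in fold]
            inner.append((train, validation))
        yield test, inner


def rows_for(glacier_ids, selected):
    """Indices of the samples (seasons) that belong to the selected glaciers."""
    wanted = set(selected)
    return [i for i, g in enumerate(glacier_ids) if g in wanted]


# Hypothetical sample table: one entry per glacier and season.
sample_glaciers = ["A", "A", "B", "C", "C", "D"]
for test, inner in nested_leave_one_glacier_out(sample_glaciers):
    test_rows = rows_for(sample_glaciers, [test])
    # Hyperparameters would be selected on the inner folds only, then the
    # model refitted on all remaining glaciers and scored on test_rows.
    print(test, test_rows, inner)
```

Because the inner folds are built after the test glacier has been set aside, the hyperparameter search never sees the test site, which is the property the reply aims for.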
  
  “ERA-5 and MERRA-2 reanalyses are used without any mention of their potential large biases in the mountains. For example, Liu and Margulis (2019) report that MERRA-2 underestimates snowfall (which is based on the "PRECTTOLAND" variable used here) by 54% in High Mountain Asia.”:
  We are fully aware of the limitations of reanalyses in high-mountain regions, specifically for precipitation (because of missing and/or highly inaccurate in-situ observations). In fact, our whole study is in principle motivated by this major challenge of improving the quantification of high-altitude (solid) precipitation and SWE. In the current manuscript, reanalysis biases in high-mountain regions are thus clearly mentioned, with references, in the introduction (lines 60-67). However, we agree that the biases observed in previous studies have not been described and quantified in sufficient detail. In the revised paper we will cover them more thoroughly in the introduction, thus enhancing the comprehensiveness of the manuscript. We will also add the respective references in the revised manuscript (e.g. Nitu et al. (2018), Zandler et al. (2019)).
  References:
  Nitu, R., Roulet, Y. A., Wolff, M., Earle, M., Reverdin, A., Smith, C., ... & Yamashita, K.: WMO Solid Precipitation Intercomparison Experiment (SPICE), Tech. Rep., World Meteorological Organization, http://hdl.handle.net/20.500.11765/10839, 2018.
  
  Zandler, H., Haag, I., and Samimi, C.: Evaluation needs and temporal performance differences of gridded precipitation products in peripheral mountain regions, Scientific Reports, 9, 15118, https://doi.org/10.1038/s41598-019-51666-z, 2019.
  
  “It's not clear to me that the downscaling techniques presented here will correct that bias, as no independent evaluation of precipitation is presented.”:
  The reanalysis precipitation is compared against snow accumulation data on glaciers. These data are clearly independent, and they are to our knowledge the only and thus best possible source of (cumulative) precipitation at very high elevation. The machine learning model is trained and validated against these snow accumulation data on glaciers. In general, from the results presented in the manuscript (e.g. Figs. 6 and 7) it is clear that, on average, the machine learning models can adjust the reanalysis bias against snow accumulation on glaciers, which is among the main purposes of the study.
  
  “Melt and sublimation are ignored in the "winter mass balance," which is then the wrong term.”:
  We do not fully agree with the reviewer here. The term “winter mass balance” refers to the snow water equivalent found on the glacier close to the maximum of snow depth, or the end of winter. Therefore, the winter mass balance, by definition, includes loss terms such as melt and sublimation, although they are not individually quantified. Furthermore, our periods of analysis are adjusted to optimally match the period where the components of melt and sublimation are small in comparison to accumulation by solid precipitation.
  
  “After carefully searching through the text, I still cannot understand how precipitation phase was treated. It seems to have been ignored as SWE is used interchangeably with the downscaled precipitation on glaciers. But then, in Table B1 and B2 ERA-5/MERRA-2 snowfall variables are listed as predictors?”:
  Indeed, the precipitation phase was ignored. In the revised paper we will more clearly describe this choice, and also why we think that this simplification is justified, as briefly summarized in the following.
  We adjusted the total precipitation variable of the reanalysis (“tp” for ERA-5 and “PRECTOTLND” for MERRA-2 (see Sec. 2.1.1 and Sec. 2.1.2)). We are aware that a different adjustment factor of precipitation might be needed depending on the precipitation phase. However, as we only adjust the total precipitation that occurred during the accumulation season, the adjustment factors represent the “average” adjustment factor of all precipitation events. The snowfall variable was used as a predictor in order to give the GBR model the chance to learn that a different “average” adjustment factor should be applied depending on the proportion of snowfall in total precipitation (i.e. depending on the main precipitation phase during the accumulation season).
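
To make the two seasonal quantities in this reply concrete, here is a small sketch in plain Python. It assumes that the adjustment factor is the ratio of the observed winter balance to the season's accumulated reanalysis total precipitation; the reply does not spell out the definition, so that ratio, the function names and the daily values are assumptions for illustration only.

```python
def seasonal_adjustment_factor(winter_balance_mm_we, daily_total_precip_mm):
    """Observed winter balance divided by the season's reanalysis precipitation.

    Assumed definition: a single "average" factor per glacier and season.
    """
    total = sum(daily_total_precip_mm)
    if total <= 0:
        raise ValueError("the season has no reanalysis precipitation")
    return winter_balance_mm_we / total


def snowfall_fraction(daily_snowfall_mm, daily_total_precip_mm):
    """Share of the season's total precipitation that fell as snow."""
    total = sum(daily_total_precip_mm)
    if total <= 0:
        raise ValueError("the season has no reanalysis precipitation")
    return sum(daily_snowfall_mm) / total


# Hypothetical four-day "season" (mm water equivalent).
total_precip = [10.0, 0.0, 25.0, 15.0]
snowfall = [10.0, 0.0, 20.0, 10.0]
observed_winter_balance = 75.0

factor = seasonal_adjustment_factor(observed_winter_balance, total_precip)
fraction = snowfall_fraction(snowfall, total_precip)
print(factor, fraction)
```

In this reading, the factor is what the model is trained to predict, while the snowfall fraction is one of its inputs, so the model can learn a different average factor for seasons dominated by rain or by snow.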
  
  Citation: https://doi.org/10.5194/tc-2022-69-AC1
RC2: 'Comment on tc-2022-69', Anonymous Referee #2, 18 Jun 2022

Guidicelli et al. propose an interesting method to downscale and bias-correct reanalysis precipitation data to the elevation and sites of glaciers in 4 regions of the world. Two reanalyses are used: ERA5 and MERRA2. The method is based on gradient boosting regressions, a technique from the field of artificial intelligence. The performance of this method is evaluated through cross-validation and discussed in terms of both temporal and spatial extrapolation. Finally, precipitation trends on glaciers are derived for each of the 4 regions based on the bias-corrected and downscaled reanalysis data.

The study tackles the very interesting and yet unsolved issue of high-altitude precipitation amounts, with tools from machine learning. It adds to the existing literature by focusing on glacier winter mass balances, used as a proxy for winter precipitation at high altitudes. In my opinion, this makes the topic of this study very relevant. While the analyses displayed are in general sound, I advise a revision of the paper with respect to concerns regarding the spatial generalization capability of the models and the derivation of trends, see below.

MAIN COMMENTS

1 - Comparison/justification with respect to other AI techniques for bias correction and downscaling in literature : Even though the introduction describes well the existing literature on AI-based downscaling/bias correction methods, the choice of GBR is barely justified with respect to other techniques. I would have expected elements in that direction in the manuscript, especially since a section of the Discussion is entitled : '5.1 Advantages and disadvantages of gradient boosting regressors'.

2 - Limits inherent to the number of available learning data :

Some of the regions of interest, e.g. Canada and Central Asia, have in total less than 20 glaciers used in this study, which is an extremely low percentage of the number of glaciers that they truly host.

This in my opinion strongly impedes the (spatial) generalization capability of the GBR models learned on these data, to the region of interest as a whole. Although this is not what the authors do in the paper, this is what the title suggests while mentioning the world's glaciers. I would strongly recommend modifying this misleading title, as the developed technique is in practice not applied to derive precipitation data over any glacier of the world, but is limited to (i) the regions of interest and (ii) the few glaciers with data in these regions.

On top of the low sampling level for application of machine learning techniques in general, there may be furthermore a strong sampling bias in the glaciers data from WGMS, for instance towards large glaciers in the European Alps, so that the representativity of the glaciers with data w/r to the regions of interest is questionable. It follows that it is hard to know whether models or conclusions inferred solely based on these very few glaciers, are representative of the region as a whole.

I very much would like the authors to comment on this.

"The good performance of the GBRs in terms of bias suggests that they can be used for SWE estimates over glaciers where no ground observations are available (site-independent GBRs)". Despite being better than the benchmark, the performance of site-independent GBR models is limited (Fig 9) and decreases when data of neighbouring glaciers are excluded from the training. Considering that, and the likely sampling biases of WGMS data, I think the authors could revise this sentence.

3 - Trends :

In my opinion the derivation of trends based on the GBR modelled precipitation should be accompanied by sensitivity tests to ascertain the robustness and uncertainties of this method. Typically, data-withdrawal techniques could be used on the longest time-series to evaluate the robustness/uncertainty of the trends derived when missing data are encountered. The distribution of the data gaps within the time-series (for instance one missing season every two years, vs 20 years with data and nothing for the following 20 years) may also play a role, and it would be good to have an insight into this and possibly only derive trends for glaciers with a sufficient number of data (seasons). The strong limitation of temporal extrapolation for some glaciers is highlighted l 350-l355, hence making a derivation of trends on these glaciers meaningless.
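
A data-withdrawal test of the kind suggested here could be sketched as below, in plain Python. This is a deliberate simplification: it fits a least-squares trend directly to a series and compares two gap patterns, whereas in the study the withdrawn seasons would be removed from the model's training data before the trend is recomputed. The synthetic series, the function names and the two patterns are assumptions for illustration.

```python
def ols_slope(years, values):
    """Ordinary least-squares slope of values against years."""
    n = len(years)
    if n < 2:
        raise ValueError("need at least two seasons to fit a trend")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


def slope_after_withdrawal(years, values, keep):
    """Slope fitted only to the seasons for which keep(year) is true."""
    kept = [(x, y) for x, y in zip(years, values) if keep(x)]
    return ols_slope([x for x, _ in kept], [y for _, y in kept])


# Synthetic 41-season series: linear trend plus an alternating anomaly.
years = list(range(1981, 2022))
values = [1000.0 + 5.0 * (y - 1981) + (50.0 if y % 2 else -50.0) for y in years]

full = ols_slope(years, values)
every_other = slope_after_withdrawal(years, values, lambda y: y % 2 == 1)
first_half = slope_after_withdrawal(years, values, lambda y: y <= 2000)
print(full, every_other, first_half)
```

Comparing the three slopes shows how much the fitted trend depends on where the gaps fall, which is the question raised about regularly spaced gaps versus one long block of missing seasons.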

MINOR COMMENTS

- the GBR consider as predictors both elevation differences between reanalysis pixel and glacier site, and downscaled variables like temperature, whereby the downscaling of temperature itself mostly relies on this altitude difference. Hence there is a high redundancy in the chosen predictors. Did you test suppressing the downscaled predictors ?

- the predictors in the PCA figures (4 and 5) are often barely legible. Fig 5 could maybe join the supplemental material.

- l 264-274 : could the different magnitude in factors relate to known biases / weaknesses of the reanalyses in representing different types of precipitation events ?

- l 311 : "their performance is worse than the site-independent models". It is not so clear for me why : could you please explain ?

- l 448 : why were more topographic predictors used in the ERA-5 GBRs than in the MERRA-2 ones ?

- Fig 2 could join the Supplemental material

- Fig 6 : could the absolute biases also be mentioned ?

- Fig 7: a ranking of the glaciers with respect to altitude, or to the number of seasons with Bw_data, would enable to more efficiently support the analysis related to this figure, please consider this. The same applies to Fig 11.

- Tables 1 and 2 could join the supplemental material

- Section 5.2 : this recent literature could also be of interest : https://doi.org/10.5194/hess-24-5355-2020; https://doi.org/10.5194/essd-14-1707-2022 (update of Durand et al., 2009).

Citation: https://doi.org/10.5194/tc-2022-69-RC2
- AC2: 'Reply on RC2', Matteo Guidicelli, 25 Jul 2022
  
  We would like to thank the reviewer for the positive appreciation of our work and the constructive comments that will help us to improve the paper considerably.
  In the following, we report our responses (bold) to the reviewer's concerns (within quotation marks).
  
  MAIN COMMENTS
  “1 - Comparison/justification with respect to other AI techniques for bias correction and downscaling in literature : Even though the introduction describes well the existing literature on AI-based downscaling/bias correction methods, the choice of GBR is barely justified with respect to other techniques. I would have expected elements in that direction in the manuscript, especially since a section of the Discussion is entitled : '5.1 Advantages and disadvantages of gradient boosting regressors'.”
  We decided to use a tree-based algorithm because of its higher readability in terms of the predictors’ importance compared to other methods (e.g. neural networks). Furthermore, gradient boosting is a gradient descent algorithm, where each additional tree tries to get the model closer to the target and reduce the bias rather than the variance (which is what a random forest algorithm does). We agree, however, that more background on our choice is needed. A dedicated discussion will be added to Sec. 5.1.
  
  “2 - Limits inherent to the number of available learning data : Some of the regions of interest, e.g. Canada and Central Asia, have in total less than 20 glaciers used in this study, which is an extremely low percentage of the number of glaciers that they truly host. This in my opinion strongly impedes the (spatial) generalization capability of the GBR models learned on these data, to the region of interest as a whole. Although this is not what the authors do in the paper, this is what the title suggests while mentioning the world's glaciers. I would strongly recommend modifying this misleading title, as the developed technique is in practice not applied to derive precipitation data over any glacier of the world, but is limited to (i) the regions of interest and (ii) the few glaciers with data in these regions. On top of the low sampling level for application of machine learning techniques in general, there may be furthermore a strong sampling bias in the glaciers data from WGMS, for instance towards large glaciers in the European Alps, so that the representativity of the glaciers with data w/r to the regions of interest is questionable. It follows that it is hard to know whether models or conclusions inferred solely based on these very few glaciers, are representative of the region as a whole. I very much would like the authors to comment on this. "The good performance of the GBRs in terms of bias suggests that they can be used for SWE estimates over glaciers where no ground observations are available (site-independent GBRs)". Despite being better than the benchmark, the performance of site-independent GBR models is limited (Fig 9) and decreases when data of neighbouring glaciers are excluded from the training. Considering that, and the likely sampling biases of WGMS data, I think the authors could revise this sentence.”
  We agree with the reviewer regarding most aspects mentioned here. In the revised paper we will more critically discuss our approaches and also demonstrate the limitations of our approach, for example in the case of a limited number of observations.
  Title: We agree that the term “world’s glaciers” can be misleading. We will change the title to: “Snow accumulation over glaciers in the Alps, Scandinavia, Central Asia and Western Canada (1981-2020) inferred from climate reanalysis and machine learning”
  Regarding the sentence mentioned ("The good performance of the GBRs in terms of bias suggests that they can be used for SWE estimates over glaciers where no ground observations are available (site-independent GBRs)") we fully agree that our statement was too optimistic / too general and we will better specify that the model can be applied on other glaciers, only if the glacier is in proximity to the glaciers used in the training. Moreover, we will specify that the resulting performance strongly depends on the characteristics of the glaciers with respect to the glaciers used in the training.
  
  “3 - Trends : In my opinion the derivation of trends based on the GBR modelled precipitation should be accompanied by sensitivity tests to ascertain the robustness and uncertainties of this method. Typically, data-withdrawal techniques could be used on the longest time-series to evaluate the robustness/uncertainty of the trends derived when missing data are encountered. The distribution of the data gaps within the time-series (for instance one missing season every two years, vs 20 years with data and nothing for the following 20 years) may also play a role, and it would be good to have an insight into this and possibly only derive trends for glaciers with a sufficient number of data (seasons). The strong limitation of temporal extrapolation for some glaciers is highlighted l 350-l355, hence making a derivation of trends on these glaciers meaningless.”
  Thanks a lot, this is a very valid comment and a good suggestion.
  In the trend analysis, the GBR models are applied over 41 years for all the glaciers of the study. The Bw data was only used to train the GBR models and not to derive the trends. Thus, we propose the following sensitivity test to be included in the revised manuscript: similarly to Fig. 9a, c, e and g and only for glaciers with long timeseries of Bw data, we will show the trends depending on (a) the used number of training seasons for the validated glacier and (b) the distribution of the available Bw data. The sensitivity test would thus allow us to further evaluate the general expected robustness and uncertainties of the trends depending on the number of years with available Bw data used for training. However, this is only feasible for the season-independent GBR.
  In fact, trends are also derived with the site-independent GBRs, which are not affected by the number of years with available Bw data (because no Bw data of the validated glacier is used for training). The fact that the site-independent GBRs often perform better than the season-independent GBRs in terms of temporal correlation with the Bw data, is an indicator that the number of available years with Bw data does not necessarily need to be high in order to accurately represent the temporal variability of the snow accumulation over the years and thus, in order to derive trends. In the revised manuscript, we will determine the trends only for glaciers with long time-series of Bw data, i.e.: only for glaciers where the temporal correlation between the GBR models and the Bw data can be evaluated. For these glaciers, the comparison between the trends obtained from the season-independent and site-independent GBRs will allow a better discussion of the potential use of the site-independent GBRs for the derivation of trends on glaciers with no Bw data.
  
  MINOR COMMENTS
  - "the GBR consider as predictors both elevation differences between reanalysis pixel and glacier site, and downscaled variables like temperature, whereby the downscaling of temperature itself mostly relies on this altitude difference. Hence there is a high redundancy in the chosen predictors. Did you test suppressing the downscaled predictors ?"
  Thanks for this interesting comment. The high correlation between predictors is only a problem for the interpretability of the predictors’ importance. However, this does not affect the performance of the GBR because decision trees are by nature not affected by multi-collinearity. If two predictors are highly correlated, the tree will choose only one of the two predictors when deciding upon a split.
  As suggested, in the revised paper, we will show the changes in terms of overall model’s performance when suppressing the downscaled predictors (and/or other variables, e.g. topographical) in Sec. 4.1. This will be helpful to quantify the added value of each group of predictors. Correspondingly, Fig. 4 will be modified. In fact, we are quite confident that this will be a better evaluation of the predictors’ importance than only showing the frequency of use of the main predictors (Fig. 4a and b).
  
  - "the predictors in the PCA figures (4 and 5) are often barely legible. Fig 5 could maybe join the supplemental material."
  Fig. 5 will be moved to the Supplementary material. We will also increase the font size and avoid overlapping predictor names.
  
  -" l 264-274 : could the different magnitude in factors relate to known biases / weaknesses of the reanalyses in representing different types of precipitation events ?"
  Yes, this is a good suggestion and we will invest more time in this, trying to link our results with the literature.
  
  - "l 311 : "their performance is worse than the site-independent models". It is not so clear for me why : could you please explain ?"
  The season-independent GBR model has a higher number of trees, and fewer samples are needed to create a new leaf of the tree (i.e. to predict a different adjustment factor), than the site-independent GBR. Because it is more complex than the site-independent model, the season-independent model can learn the specific characteristics of the validated glacier if Bw data of that glacier is used for training, and it then performs better than the site-independent model.
  On the other hand, if no Bw data of the validated glacier is used to train the season-independent GBR, its performance is worse than that of the site-independent GBR, because it will overfit the training data.
  This discussion will be added in shorter form to the revised paper.
  
  - "l 448 : why were more topographic predictors used in the ERA-5 GBRs than in the MERRA-2 ones ?"
  We used all the topographical predictors describing the reanalysis’s subgrid complexity of both reanalysis products and ERA-5 is providing more descriptors than MERRA-2.
  
  - "Fig 2 could join the Supplemental material"
  Yes, we agree.
  
  - "Fig 6 : could the absolute biases also be mentioned ?"
  Yes, we will also evaluate and report the mean bias error in addition to the root mean squared error. However, the figure is already too busy to allow more numbers and we will discuss the results in the text.
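
The difference between the two metrics named in this reply can be illustrated with a short sketch in plain Python. The function names and the sample values are assumptions for illustration and do not come from the manuscript.

```python
import math


def mean_bias_error(predicted, observed):
    """Average signed error: positive means overestimation on average."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p - o for p, o in zip(predicted, observed)) / len(predicted)


def root_mean_squared_error(predicted, observed):
    """Square root of the average squared error: always non-negative."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


# Hypothetical winter balances (mm w.e.) for two seasons.
modelled = [110.0, 90.0]
measured = [100.0, 100.0]
print(mean_bias_error(modelled, measured))          # signed errors cancel
print(root_mean_squared_error(modelled, measured))  # magnitudes do not
```

Errors of opposite sign cancel in the mean bias error but not in the root mean squared error, which is why reporting both gives a fuller picture of model performance.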
  
  - "Fig 7: a ranking of the glaciers with respect to altitude, or to the number of seasons with Bw_data, would enable to more efficiently support the analysis related to this figure, please consider this. The same applies to Fig 11."
  Thanks for the suggestion. We will modify Fig. 7 and Fig. 11 as proposed.
  
  - "Tables 1 and 2 could join the supplemental material"
  Yes, we agree.
  
  - "Section 5.2 : this recent literature could also be of interest : https://doi.org/10.5194/hess-24-5355-2020; https://doi.org/10.5194/essd-14-1707-2022 (update of Durand et al., 2009)."
  Thanks. We will include this literature in the discussion.
  
  Citation: https://doi.org/10.5194/tc-2022-69-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Reconsider after major revisions (further review by editor and referees) (02 Aug 2022) by Thomas Mölg

AR by Matteo Guidicelli on behalf of the Authors (11 Oct 2022)

ED: Referee Nomination & Report Request started (19 Oct 2022) by Thomas Mölg

RR by Anonymous Referee #2 (04 Nov 2022)

RR by Fabien Maussion (09 Dec 2022)

Suggestions for revision or reasons for rejection

Dear authors,

I would like to start my review with an apology for the time it took me to write it. I am aware that endless review processes are a real strain (especially for PhD students), and some unplanned personal matters prevented me from writing sooner.

Another (less important) reason for the late review, however, is that the paper is a difficult read. I really enjoyed the study and I find that it is relevant and interesting, but in its current version the manuscript feels like a puzzle that the readers have to solve by themselves, because a lot of relevant information is scattered across the manuscript. I also believe that the first round of reviews shaped the manuscript in a way which makes it less readable now. I think however that the manuscript can be brought into reasonable shape with some restructuring.

I will start with the most important point: what is the purpose of the paper? The title says “Snow accumulation over glaciers … inferred from climate reanalyses and machine learning”. But, in reality, snow accumulation is never analyzed (or even plotted! I only see correction factors everywhere). What is analyzed is the capacity of a statistical model to reconstruct winter mass-balance (not snow) in space and time.

Note that the study is well introduced: the problems stated in the introduction are real, and it IS a good idea to use winter mass-balance observations to look at biases in reanalysis data. At the end of the introduction, however, the authors state: “we thus aim at providing improved observation-independent SWE estimates at highest elevations of different mountain ranges across the Earth”. But the manuscript does not provide anything like that, does it? I saw no SWE data, and I also didn’t see a code & data availability section (against TC’s policies, by the way: https://www.the-cryosphere.net/policies/data_policy.html).

I think that this is the main problem of the manuscript, as the reader is left wondering what the paper is about. Some themes which (I feel) are developed throughout the manuscript:
- Training a statistical model to reconstruct winter mass balance (WMB) from partial information
- What information is needed to do so successfully, and what problems occur when data become scarce (this is, in my view, the most interesting aspect of the study)
- What the probable biases in winter precipitation in reanalyses are
- What are the differences between MERRA-2 and ERA5 (although, to be honest, I don’t recall much discussion of this point despite the fact that having two datasets significantly clutters many figures in the manuscript).
- The WMB elevation profiles and how your model can sometimes reproduce those (to my surprise)

What is not developed in the manuscript:
- The difference between different statistical model choices (this is a bit of a weakness as it makes the paper very descriptive, but is not a big issue in my opinion)
- Whether or not the method developed in this study will be used to develop regional products. This is very important because if yes, the paper needs to be a bit more careful in its wording as suggested by Reviewer #2. The paper is already much better at discussing limitations, but I think that if the plan is to derive actual products from the method, the abstract needs to state that this is the goal and to be more precise about what’s needed to reach this future goal.
- If the goal is not to make some sort of product in a future paper, then I would like to suggest going back to my point above about clear study motivation statements.

These may sound like harsh comments, but I do not intend them to be that way: I think that the study has potential! It would be very beneficial to the paper to be more clearly written, to better explain what is done and why. I’ll do my best to provide a more timely review at the next iteration.

### Specific comments

- I still don’t think that the title reflects the content of the paper well (see general comments)
- Introduction: clearly state the objectives of the study, and what will be shown in the paper. Why are these regions / glaciers chosen, etc.
- Line 144: the motivation and implications of using total seasonal averages need to be discussed in depth. Intuitively, a model using temporal information (even at the monthly scale) would perform better, but I understand that this is not feasible in this context.
- The methods section feels incomplete. I truly don’t understand how your model is actually able to simulate WMB profiles, because my understanding at the end of the methods section is that you use seasonal totals of climate predictors to simulate total seasonal WMB of glaciers. It’s not clear to what purpose “downscaling” is used, and to what elevation the variables are downscaled (I assumed the average glacier elevation). Are you using elevation bands to reconstruct WMB as a function of elevation and then average per area somehow to get the glacier specific WMB? Where is this procedure described? Or, do you actually use elevation band data from WGMS for training? You can see that I’m confused.
- L175: to be honest I don’t think the benchmark is very fair, because the parameter K seems to be a parameter to tune for each reanalysis / situation. It is also not data informed at all. I am not requesting to change this at this stage, but I personally don't put much value in this benchmark.
- L204: “For these cases, groups of data in the 10-fold cross-validation contain data of different years but different groups can contain data of different years of the same glacier.” -> this is really unclear. I assume only one glacier is used each time? Are you therefore building 95 models (one for each glacier) here? After reading the rest of the manuscript I see that this is not the case, but I really wonder what value there is in interpolating in time with a model that is trained on highly inhomogeneous data, and it seems that the data with the most explanatory power are obviously the data from this very glacier.
- L206-210: this paragraph is very unclear. It’s also not clear what the non-GBR specialist can learn from table 1? Either discuss to explain the value of this information or delete.
- Section 4.1: intuitively, I would put this section later in the paper. But I leave this open.
- L250: I don’t think that the calendar year should be part of the predictor pool. If a constant line has predictive power, it's because the training data have trends that are not in the reanalysis data, and I think it is highly problematic to rely on such information when trying to extrapolate in space and time. Happy to be convinced otherwise though.
- L253-256: this is very unclear, I’m sorry but I don’t understand what this means.
- L278-281: Isn't this information already on the figure and does it need repeating here?
- Fig. 5 is very difficult to read.
- Fig. 6 illustrates well what is confusing me: why do the correction factors have trends? I think that Fig. 6 would also be a good opportunity to show actual data instead of correction factors, which is a very abstract notion for glaciologists…
- Fig. 6: when averaging factors, you should also plot the range (std dev) to show the robustness of the differences
- L307: “confirming the importance of a specific optimization scheme depending on the goal of the model.” -> I have to reiterate: what is the goal of the model?
- L320: genuine question: is there any skill in ingesting data from very far away glaciers to interpolate in time?
- L321: “In conclusion, filling data gaps is much simpler than estimating SWE on glaciers with no observations.” Yes, and this raises the question of whether GBR is really needed for that or not (rhetorical question, requiring no change to the manuscript)
- L325: see comment above: it is really unclear to me how the profiles are predicted…
- L363: “This suggests that complex models such as our GBRs are needed to adjust reanalysis to different glacier sites” -> this statement is made based on the benchmark model, which is not data informed. The paper does not say which model complexity is needed to achieve WMB reconstruction.
- L380: “A disadvantage of tree-based algorithms, however, could be that this approach does not predict continuous values.” -> I feel that this information should be shared much much earlier in the paper.
- Section 5.1.2 is highly speculative, short and not convincing.
- In general, the discussion is by far the most interesting part of the paper. Many points related to how the method works or doesn’t work are described here, and this is what makes the paper interesting.
- Section 5.2.4: I might be wrong, but I think that this is the only time the difference between the two reanalysis datasets is discussed? Does this justify the additional complexity of many of the figures? (I’m not suggesting changing the study design at this stage, but it is still a valid question).
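
To make the cross-validation concern raised for L204 more concrete, here is a minimal sketch of the two ways folds could be built. It is not the authors' implementation: the glacier IDs, the number of folds and the helper functions `sample_wise_folds`, `glacier_wise_folds` and `leaky_folds` are all hypothetical and only serve to show how winters of the same glacier can end up in both the training and the held-out data.

```python
from collections import defaultdict


def sample_wise_folds(glacier_ids, n_folds):
    """Assign each sample to a fold in round-robin order, ignoring its glacier."""
    return [i % n_folds for i in range(len(glacier_ids))]


def glacier_wise_folds(glacier_ids, n_folds):
    """Assign all samples of one glacier to the same fold."""
    fold_of_glacier = {}
    for gid in glacier_ids:
        if gid not in fold_of_glacier:
            fold_of_glacier[gid] = len(fold_of_glacier) % n_folds
    return [fold_of_glacier[gid] for gid in glacier_ids]


def leaky_folds(glacier_ids, folds):
    """Return the folds whose held-out glaciers also appear in the training data."""
    glaciers_in_fold = defaultdict(set)
    for gid, fold in zip(glacier_ids, folds):
        glaciers_in_fold[fold].add(gid)
    leaky = []
    for fold, held_out in glaciers_in_fold.items():
        training = set()
        for other, gids in glaciers_in_fold.items():
            if other != fold:
                training |= gids
        if held_out & training:
            leaky.append(fold)
    return sorted(leaky)


# Hypothetical data set: 4 glaciers with 3 observed winters each.
glacier_ids = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3

by_sample = sample_wise_folds(glacier_ids, n_folds=2)
by_glacier = glacier_wise_folds(glacier_ids, n_folds=2)

print(leaky_folds(glacier_ids, by_sample))   # folds sharing glaciers with training
print(leaky_folds(glacier_ids, by_glacier))  # empty when glaciers are kept together
```

In this toy setup, sample-wise folds always hold out winters of glaciers that the model has already seen, so the resulting score would mostly reflect gap filling in time. Glacier-wise folds would instead measure transfer to glaciers without observations, which I assume is the harder case discussed around L321.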


ED: Reconsider after major revisions (further review by editor and referees) (21 Dec 2022) by Thomas Mölg

AR by Matteo Guidicelli on behalf of the Authors (29 Jan 2023) Author's response Author's tracked changes Manuscript

ED: Publish as is (16 Feb 2023) by Thomas Mölg

AR by Matteo Guidicelli on behalf of the Authors (17 Feb 2023) Author's response Manuscript

Short summary

Spatio-temporal reconstruction of winter glacier mass balance is important for assessing long-term impacts of climate change. However, high-altitude regions significantly lack reliable observations, which is limiting the calibration of glaciological and hydrological models. We aim at improving knowledge on the spatio-temporal variations in winter glacier mass balance by exploring the combination of data from reanalyses and direct snow accumulation observations on glaciers with machine learning.