Dear authors,
I would like to start my review with an apology for the time it took me to write it. I am aware that endless review processes are a real strain (especially for PhD students), and some unplanned personal matters prevented me from writing sooner.
Another (less important) reason for the late review, however, is that the paper is a difficult read. I really enjoyed the study and I find it relevant and interesting, but in its current version the manuscript feels like a puzzle that readers have to solve by themselves, because a lot of relevant information is scattered across the manuscript. I also believe that the first round of reviews shaped the manuscript in a way that makes it less readable now. I think, however, that the manuscript can be brought into reasonable shape with some restructuring.
I will start with the most important point: what is the purpose of the paper? The title says “Snow accumulation over glaciers … inferred from climate reanalyses and machine learning”. But, in reality, snow accumulation is never analyzed (or even plotted! I only see correction factors everywhere). What is analyzed is the capacity of a statistical model to reconstruct winter mass-balance (not snow) in space and time.
Note that the study is well introduced: the problems stated in the introduction are real, and it IS a good idea to use winter mass-balance observations to look at biases in reanalysis data. At the end of the introduction, however, the authors state: “we thus aim at providing improved observation-independent SWE estimates at highest elevations of different mountain ranges across the Earth”. But the manuscript does not provide anything like that, does it? I saw no SWE data, and I also didn’t see a code & data availability section (against TC’s policies, by the way: https://www.the-cryosphere.net/policies/data_policy.html).
I think that this is the main problem of the manuscript, as the reader is left wondering what the paper is about. Some themes which (I feel) are developed throughout the manuscript:
- Training a statistical model to reconstruct winter mass balance (WMB) from partial information
- What information is needed to do so successfully, and what problems are occurring when data becomes scarce (this is, in my view, the most interesting aspect of the study)
- What the probable biases in winter precipitation in reanalyses are
- What are the differences between MERRA-2 and ERA5 (although, to be honest, I don’t recall much discussion of this point despite the fact that having two datasets significantly clutters many figures in the manuscript).
- The WMB elevation profiles and how your model can sometimes reproduce those (to my surprise)
What is not developed in the manuscript:
- The differences between statistical model choices (this is a bit of a weakness as it makes the paper very descriptive, but it is not a big issue in my opinion)
- Whether or not the method developed in this study will be used to develop regional products. This is very important because if yes, the paper needs to be a bit more careful in its wording as suggested by Reviewer #2. The paper is already much better at discussing limitations, but I think that if the plan is to derive actual products from the method, the abstract needs to state that this is the goal and to be more precise about what’s needed to reach this future goal.
- If the goal is not to make some sort of product in a future paper, then I would like to suggest going back to my point above about clear study motivation statements.
These comments may sound harsh, but I do not intend them that way: I think that the study has potential! The paper would benefit greatly from being written more clearly and from better explaining what is done and why. I’ll do my best to provide a more timely review at the next iteration.
### Specific comments
- I still don’t think that the title reflects the content of the paper well (see general comments)
- Introduction: clearly state the objectives of the study, and what will be shown in the paper. Why are these regions / glaciers chosen, etc.
- Line 144: the motivation for and implications of using total seasonal averages need to be discussed in depth. Intuitively, a model using temporal information (even at the monthly scale) would perform better, but I understand that this is not feasible in this context.
- The methods section feels incomplete. I truly don’t understand how your model is actually able to simulate WMB profiles, because my understanding at the end of the methods section is that you use seasonal totals of climate predictors to simulate total seasonal WMB of glaciers. It’s not clear to what purpose “downscaling” is used, and to what elevation the variables are downscaled (I assumed the average glacier elevation). Are you using elevation bands to reconstruct WMB as a function of elevation and then average per area somehow to get the glacier specific WMB? Where is this procedure described? Or, do you actually use elevation band data from WGMS for training? You can see that I’m confused.
- L175: to be honest I don’t think the benchmark is very fair, because the parameter K seems to be a parameter to tune for each reanalysis / situation. It is also not data informed at all. I am not requesting to change this at this stage, but I personally don't put much value in this benchmark.
- L204: “For these cases, groups of data in the 10-fold cross-validation contain data of different years but different groups can contain data of different years of the same glacier.” -> this is really unclear. I assume only one glacier is used each time? Are you therefore building 95 models (one for each glacier) here? After reading the rest of the manuscript I see that this is not the case, but I really wonder what value there is in interpolating in time with a model that is trained on highly inhomogeneous data, and it seems that the data with the most explanatory power is obviously the data from this very glacier.
- L206-210: this paragraph is very unclear. It’s also not clear what a non-GBR specialist can learn from Table 1. Either explain the value of this information or delete it.
- Section 4.1: intuitively, I would put this section later in the paper. But I leave this open.
- L250: I don’t think that the calendar year should be part of the predictor pool. If a constant line has predictive power, it's because the training data have trends that are not in the reanalysis data, and I think it is highly problematic to rely on such information when trying to extrapolate in space and time. Happy to be convinced otherwise though.
- L253-256: this is very unclear, I’m sorry but I don’t understand what this means.
- L278-281: Isn't this information already in the figure? Does it need repeating here?
- Fig. 5 is very difficult to read.
- Fig. 6 illustrates well what is confusing me: why do the correction factors have trends? I think that Fig. 6 would also be a good opportunity to show actual data instead of correction factors, which is a very abstract notion for glaciologists…
- Fig. 6: when averaging factors, you should also plot their spread (e.g. the standard deviation) to show the robustness of the differences.
- L307: “confirming the importance of a specific optimization scheme depending on the goal of the model.” -> I have to reiterate: what is the goal of the model?
- L320: genuine question: is there any skill in ingesting data from very far away glaciers to interpolate in time?
- L321: “In conclusion, filling data gaps is much simpler than estimating SWE on glaciers with no observations.” Yes, and this raises the question of whether GBR is really needed for that (rhetorical question, requiring no change to the manuscript).
- L325: see comment above: it is really unclear to me how the profiles are predicted…
- L363: “This suggests that complex models such as our GBRs are needed to adjust reanalysis to different glacier sites” -> this statement is made based on the benchmark model, which is not data informed. The paper does not say which model complexity is needed to achieve WMB reconstruction.
- L380: “A disadvantage of tree-based algorithms, however, could be that this approach does not predict continuous values.” -> I feel that this information should be shared much earlier in the paper.
- Section 5.1.2 is highly speculative, short and not convincing.
- In general, the discussion is by far the most interesting part of the paper. Many points related to how the method works or doesn’t work are described here, and this is what makes the paper interesting.
- Section 5.2.4: I might be wrong, but I think that this is the only time the difference between the two reanalysis datasets is discussed? Does this justify the additional complexity of many of the figures? (I’m not suggesting changing the study design at this stage, but it is still a valid question).
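
To make the L204 comment concrete, here is a minimal sketch of the concern. It is a toy illustration only, not the authors' code: the glacier names, the sample counts, and the helper functions `random_folds`, `grouped_folds` and `leaked_glaciers` are all hypothetical. It assumes that each sample is one (glacier, winter) pair and compares folds drawn at random with folds that keep each glacier in a single group.

```python
import random
from collections import defaultdict


def random_folds(samples, k, seed=0):
    """Assign each (glacier, year) sample to one of k folds at random."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    folds = [[] for _ in range(k)]
    for i, sample in enumerate(shuffled):
        folds[i % k].append(sample)
    return folds


def grouped_folds(samples, k):
    """Keep all samples of a glacier in the same fold (round-robin over glaciers)."""
    by_glacier = defaultdict(list)
    for glacier, year in samples:
        by_glacier[glacier].append((glacier, year))
    folds = [[] for _ in range(k)]
    for i, glacier in enumerate(sorted(by_glacier)):
        folds[i % k].extend(by_glacier[glacier])
    return folds


def leaked_glaciers(folds):
    """Return glaciers present in more than one fold.

    Such a glacier ends up in both the training and the test data
    of at least one cross-validation split.
    """
    seen = defaultdict(set)
    for idx, fold in enumerate(folds):
        for glacier, _ in fold:
            seen[glacier].add(idx)
    return {g for g, idxs in seen.items() if len(idxs) > 1}


# Hypothetical toy data: 20 glaciers with 10 winters each.
samples = [(f"glacier_{g:02d}", 2000 + y) for g in range(20) for y in range(10)]

random_leaks = leaked_glaciers(random_folds(samples, k=10))
grouped_leaks = leaked_glaciers(grouped_folds(samples, k=10))

print("glaciers shared between folds (random): ", len(random_leaks))
print("glaciers shared between folds (grouped):", len(grouped_leaks))
```

With random folds, a held-out winter is typically predicted by a model that has already seen other winters of the same glacier, which is the gap-filling situation. Grouped folds are closer to the situation of a glacier without observations. Stating explicitly which of the two each experiment in the manuscript corresponds to would already help the reader.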
In "Snow accumulation over the world’s glaciers (1981-2021) inferred from climate reanalyses and machine learning", a machine learning model is applied to 95 glaciers on 3 continents to downscale precipitation and other variables from commonly used reanalysis products.
The problems begin with the title, which overstates the importance of the study. Only a tiny fraction of the world's glaciers, on fewer than half of the continents, is examined. The manuscript has too many figures and tables. The manuscript is supposed to be within 12 journal pages for TCD. The tables and figures alone, most of which occupy a full page, would take up this much space. The figures are bloated. For example, there is no need to illustrate both "Tree 1" and "Tree N", which are identical in Figure 3. The PCA section (4.1) doesn't tell the reader much more than the fact that elevation is the most important downscaling predictor. The leave-one-out validation is problematic because no independent validation dataset is used, meaning that biases in precipitation are unlikely to be identified.
ERA5 and MERRA-2 reanalyses are used without any mention of their potentially large biases in the mountains. For example, Liu and Margulis (2019) report that MERRA-2 underestimates snowfall (which is based on the "PRECTTOLAND" variable used here) by 54% in High Mountain Asia. It's not clear to me that the downscaling techniques presented here will correct that bias, as no independent evaluation of precipitation is presented. Melt and sublimation are ignored in the "winter mass balance," which makes it the wrong term.
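
For a sense of scale regarding the MERRA-2 bias cited above: assuming that "underestimates by 54%" means that MERRA-2 snowfall equals (1 - 0.54) times the reference value, a multiplicative correction factor c (my notation, not the manuscript's) that removes this bias would be roughly:

$$
c = \frac{P_{\text{ref}}}{P_{\text{MERRA-2}}} = \frac{1}{1 - 0.54} \approx 2.17
$$

Whether the learned correction factors reach this order of magnitude in High Mountain Asia cannot be judged without an independent evaluation.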
After carefully searching through the text, I still cannot understand how precipitation phase was treated. It seems to have been ignored, as SWE is used interchangeably with the downscaled precipitation on glaciers. But then, why are ERA5/MERRA-2 snowfall variables listed as predictors in Tables B1 and B2?
Because of its excessive length, lack of clarity, and questionable assumptions, I recommend this manuscript be rejected. For a resubmission, I suggest the authors consider an independent evaluation of snow accumulation and at least an explanation of how precipitation phase was treated. The size of the figures and tables needs to be cut approximately in half.
Liu, Y., and Margulis, S. A.: Deriving Bias and Uncertainty in MERRA-2 Snowfall Precipitation Over High Mountain Asia, Frontiers in Earth Science, 7, 10.3389/feart.2019.00280, 2019.