Articles | Volume 20, issue 5
https://doi.org/10.5194/tc-20-3187-2026
© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.
Machine learning for snow depth estimation over the European Alps, using Sentinel-1 observations, meteorological forcing data and process-based model simulations
- Final revised paper (published on 29 May 2026)
- Preprint (discussion started on 30 Jul 2025)
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on egusphere-2025-3327', Anonymous Referee #1, 08 Oct 2025
  - AC1: 'Reply on RC1', Lucas Boeykens, 27 Jan 2026
- EC1: 'Comment on egusphere-2025-3327', Francesco Avanzi, 18 Nov 2025
  - AC2: 'Reply on EC1', Lucas Boeykens, 27 Jan 2026
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
ED: Reconsider after major revisions (further review by editor and referees) (29 Jan 2026) by Francesco Avanzi
AR by Lucas Boeykens on behalf of the Authors (29 Jan 2026)
Author's response
Author's tracked changes
Manuscript
ED: Referee Nomination & Report Request started (30 Jan 2026) by Francesco Avanzi
RR by Anonymous Referee #1 (06 Mar 2026)
RR by Anonymous Referee #2 (03 Apr 2026)
ED: Publish subject to minor revisions (review by editor) (05 Apr 2026) by Francesco Avanzi
AR by Lucas Boeykens on behalf of the Authors (16 Apr 2026)
Author's response
Author's tracked changes
Manuscript
ED: Publish as is (16 Apr 2026) by Francesco Avanzi
AR by Lucas Boeykens on behalf of the Authors (23 Apr 2026)
Manuscript
Summary:
This manuscript presents an extensive analysis of machine learning capabilities for snow depth estimation. The authors compare a variety of machine learning (XGBoost) model configurations and apply a threefold nested cross validation to evaluate their approach. The inputs to the machine learning model are remote sensing data from Sentinel-1 (of which PolSAR had not been used before to estimate snow depth), downscaled meteorological forcing data and physically-based model simulations. The authors then evaluate the importance of features in the machine learning model, and the spatial predictions of snow depth at unseen locations by the model.
The aims and findings of this study are interesting, with the main novelty being the inclusion of PolSAR variables as well as meteorological forcings, or a physically-based model forced with those, to predict snow depth at high resolution (100 m) over the Alps. I am impressed with the amount of data processing and the careful methodological procedures that the authors went through, which seem very robust.
However, I think the manuscript should more clearly state the novelty of this study in comparison with Dunmire et al. (2024). While the authors claim that the snow depth estimations are improved with the inclusion of PolSAR and meteorological forcings, it is hard to see any significant improvement when comparing similar figures between the manuscripts. Even within this manuscript, it is often claimed that a method improves snow depth estimates without this being clearly visible in the figures. Furthermore, I have several concerns regarding the presentation of results: some of them are not very clear, and there are many instances of “results not shown”. I believe the authors need to improve the manuscript before it can be published, and I hope my comments below will help.
Main comments:
The title claims snow depth estimation over the European Alps, but there is no map of estimated snow depth over the European Alps, and no map of the predicted snow depth validation over the entire mountain range (as there is in Figure 2, and Figure 7, in Dunmire et al 2024).
About XGBoost: Besides referring the reader to Chen and Guestrin (2016), I think there should be at least a few lines of description of what this ML model is and its characteristics, and why the authors (or previous authors) chose this model.
The inclusion in the ML model of physically-based model simulations with meteorological forcing yields an accuracy comparable to that obtained when using the meteorological forcings directly as input to the ML model. As I understand it, there is therefore no advantage in using physically-based model simulations, as this adds unnecessary complexity. It seems the ML model already learns the physics from the meteorological forcing. I think this is an interesting finding and should be better discussed.
Section 4.1 states a couple of times that differences are significant because p << 0.05 (e.g. differences due to Table C1). However, the improvements are quite marginal (R2 0.88 vs R2 0.89; MAE 0.3 m vs MAE 0.29 m). I think the authors should discuss the significance based on the absolute improvement, which is very small, and not on the statistical significance, which in this case is clearly just due to the large sample size. See https://www.nature.com/articles/s41598-021-00199-5 and https://linkinghub.elsevier.com/retrieve/pii/S026151771730078X . With this in mind, the authors should revise this section carefully.
There are several instances of “results not shown” in the paper. I think they should all be included as they seem relevant (lines 396, 411, 465, 474, 484, 494, 501, 526, 550). There are also instances where a result is discussed but not seen in any figure (lines 397-399, 431-433, 497-498, 511-512, 534-537, 586).
About the novelty with respect to Dunmire et al (2024). Line 344 even states that the errors are slightly higher in this study than in Dunmire, and another example in line 434 shows very similar results. There should be a more open discussion about the little improvement, despite the novelty of this paper.
Sometimes in the manuscript, it seems that meteorological forcing data AND physically-based model simulations are used simultaneously in the ML model, but that is not the case. I suggest changing AND to OR in lines 8 and 568 (not in the title, as that is a list of all the inputs).
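The point made above about Section 4.1 (statistical versus practical significance) can be illustrated with a minimal sketch. It takes the two MAE values quoted in the comment (0.30 m and 0.29 m) and treats them as the means of two independent error samples with a shared standard deviation of 0.30 m. The standard deviation, the sample sizes, the independence of the samples and the helper name `two_sample_summary` are all assumptions made for illustration; they are not taken from the manuscript, whose errors are most likely paired.

```python
import math

def two_sample_summary(mean_a, mean_b, sd, n):
    """Return (z, p, d) for two independent groups of size n with a shared SD."""
    standard_error = sd * math.sqrt(2.0 / n)
    z = (mean_a - mean_b) / standard_error
    # Two-sided p-value under a normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cohen's d: standardised effect size, which does not depend on n.
    d = (mean_a - mean_b) / sd
    return z, p, d

# Assumed sample sizes; only n changes between rows.
for n in (100, 10_000, 1_000_000):
    z, p, d = two_sample_summary(0.30, 0.29, 0.30, n)
    print(f"n={n:>9,d}  z={z:6.2f}  p={p:.3g}  Cohen's d={d:.3f}")
```

Under these assumptions the p-value shrinks as n grows while the effect size stays the same, which is why reporting the absolute improvement alongside p is more informative for large samples.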
Comments by line number:
L34: a reference for “essential climate variables” is missing.
L40: I suggest adding example datasets: “measurements offer frequent data at many locations globally (e.g. Matiu et al., 2021, https://doi.org/10.5194/tc-15-1343-2021; Fontrodona-Bach et al., 2023, https://doi.org/10.5194/essd-15-2577-2023; Mortimer and Vionnet, 2025, https://doi.org/10.5194/essd-17-3619-2025)”.
L53: Does “this work” refer to the one in this manuscript? Not clear if it refers to the previous references.
L54: “an increasing snowpack DEPTH”?
L55: I recommend against the use of etc. Either complete the list or simply state the examples.
Lines 62-64 and 75-77 seem to be a repetition of each other regarding the current gap in knowledge.
L74: perhaps: “snow depth retrieval”?
L83: “compared to in-situ measurements.” This needs references.
L85: such as instead of e.g.
L88: This needs a reference at the end of the sentence.
L91: perhaps: “contribute to improving SD predictions”?
L103: coarse instead of course.
L116: The GHC needs a reference (and isn’t it GHCN?). Does the end of the sentence mean that only Germany and Slovenia are taken from this dataset?
L145: Why does rescaling matter for interannual start of season differences? It is unclear what this means.
L158: How many are these remaining gaps? How many were filled?
L166: a quick definition of majority resampling would be useful.
L178: What other downscaling techniques?
Equation 1: Where does this downscaling equation come from? A reference or explanation is needed.
L223: With a rather long paper and a lot of specific nomenclature, it is sometimes easy to forget what LIA or TPI mean, especially for unspecialised readers. I suggest including a table or list in the appendix with all abbreviations used (or expanding Table B1).
L239: Perhaps it is useful to remind the reader that here the input of meteorological data or physical model simulations is still not assessed. I thought there should be 5 configurations otherwise.
L241: Why “next” and not together with the previous?
L243: I suggest “The second configuration, focusing…” instead of “Conversely, the configuration…”
Table 2 caption: I suggest “within this study.”
Table 2: some of the features are presented in the text after the table is presented, therefore the reader does not know what all these variables are. I suggest to spell them out in the caption, or in the acronym table I suggested above.
3.2.3 Snow depth prediction: I do not understand how this title links to the paragraph. It seems that the paragraph is about standardization of features.
L276: If all folds contain at least one station from each of all the boxes of stations within 5 km of each other, how is this a blind validation? I am possibly understanding this wrong, please clarify this.
L278: The procedure for the temporal fold is also not entirely clear to me. What does it mean that sites were kept separate, but grouped and divided into 5 folds?
Figure 2: Please add a legend for the colours and textures.
L308: Does this mean that for these sites, the snow season is less than 10 days?
L314: The bias, although discussed, is not always shown in figures or tables. Please include it (e.g. Table 3, Table C1).
L332: Why is Table C1 not together with Table 3? As it seems quite important and thoroughly discussed.
L336: “the temporal framework overestimates model performance”: what does this mean? Can model performance be overestimated, or is snow depth overestimated?
L339: This says that the spatio-temporal framework provides a more realistic evaluation of model performance for this study, but lines 331-332 say that performance is highest for the temporal framework and progressively deteriorates in the spatial and spatio-temporal frameworks. These two statements contradict each other.
L349: The caption of Figure C1b says observed-predicted, so a negative bias would mean an overestimation of snow depth. Please standardise this.
L353-355: How is a deterioration of model performance seen as an accurate representation of model performance? This sentence is unclear.
L355: why “also”? Which other improvements were there?
L355-365: As stated in a general comment, I don’t see this little improvement as a significant improvement, I think it is the effect of the sample size.
L368: FSC instead of fSC.
Figure 3: It is difficult to see differences between configurations, perhaps a table in the supplement or Appendix would be useful.
L397-399: How are these results seen in Fig. 4a?
Figure 5: The predicted snow depth time series show a clear, long flat period in the middle of the accumulation period (especially in 5a and 5b), which does not match observations very well. Does this suggest that snowfalls and increasing snow depth in mid-winter are not well captured? This should be discussed. Also, please state which sites these are (name, location, source of the data).
L505-507: Linking to one of my main comments, I think the results underscore the potential of using meteorological forcing data alone, as input to ML models (as the improvement of Snowclim is minimal). I think this should be included here.
Figure 6. I suggest adding the title of each configuration on each row, to make the figure more easily readable.
L531: again, the improvement seems quite minimal.
L542-543: The potential inability to correctly predict snow density is a key limitation for further refining this method to predict daily time series in the future. This could be discussed.
L539: The authors say weatherML and snowclimML overestimate snow depth, based on the biases. However, when comparing Figure 7b with the measured Figure 7d, the opposite seems to be true: 7b (weatherML) shows much lower snow depths than measured. In fact, the scatter plot suggests that weatherML outperforms snowclimML, but the snowclimML snow depth map resembles the observations more. This discrepancy should be clarified.
Figure 7. Why do maps have different MAE than their respective scatter plots? Why do the scatter plots have a low density of points when approaching 0 m snow depth?
Figure 8. It would be better to show the maps of snow depth with survey data, without survey data, the difference, and the measured maps, to enable a better comparison.
L593-595: Compare these results to estimates from other studies, such as the results from Dunmire et al 2024.
Equation A1: Can Wsat not become infinite if any weight is zero? Revise or clarify.
L633-635: what downscaling techniques and what parameters?
Figure B1: It would be interesting to see different scatter plots for the snow surveys and the point measurements.
Figure C1. Why not just a map with the bias per station, and compare it with the one from Dunmire et al. (2024) in their Figure 2?
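The comments above on the bias sign convention (L349) and on a per-station bias map (Figure C1) can be made concrete with a minimal sketch. It computes the mean bias per station as predicted minus observed, so that a positive value means overestimation. This convention is only one possible choice and is the opposite of the observed-predicted convention noted in the Figure C1b caption; the function name `station_bias` and the data values are hypothetical.

```python
from collections import defaultdict

def station_bias(records):
    """Mean bias per station, defined here as predicted minus observed (m)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for station, observed, predicted in records:
        sums[station] += predicted - observed
        counts[station] += 1
    return {station: sums[station] / counts[station] for station in sums}

# Hypothetical (station, observed snow depth, predicted snow depth) in metres.
records = [
    ("A", 1.0, 1.2),
    ("A", 0.5, 0.6),
    ("B", 2.0, 1.5),
]
bias = station_bias(records)
for station, value in sorted(bias.items()):
    label = "overestimation" if value > 0 else "underestimation"
    print(f"station {station}: bias = {value:+.2f} m ({label})")
```

Whichever convention is chosen, stating it once and applying it to every figure, table and caption removes the ambiguity about what a negative bias means.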