Past and future changes in avalanche problems in northern Norway estimated with machine-learning models

Eiselt, Kai-Uwe; Graversen, Rune Grand

doi:10.5194/tc-20-1867-2026

Articles | Volume 20, issue 3

https://doi.org/10.5194/tc-20-1867-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article | 01 Apr 2026

Download

Final revised paper (published on 01 Apr 2026)
Supplement to the final revised paper
Preprint (discussion started on 06 Oct 2025)
Supplement to the preprint

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-4685', Francis Meloche, 03 Nov 2025
Review of Past and future changes in avalanche problems in northern Norway estimated with machine-learning models
By Kai-Uwe Eiselt and Rune Grand Graversen

Summary
This paper uses a model chain to predict the past and future avalanche hazard in northern Norway. This work builds on previous work by the same authors, who developed Random Forest models to predict avalanche danger. The model chain they developed primarily consists of a dynamic downscaling of climate models in Norway for the past and future, which serves as input to the snow cover model SNOWPACK. Then, they build Random Forest models to predict avalanche days for several avalanche problems using meteorological variables (from the downscaled climate models) and snow instability variables from SNOWPACK. They show different historical trends in the frequency of avalanche days for different avalanche problems (e.g., wet, storm, wind, or persistent), as well as correlations with the Arctic Oscillation (AO). They conclude with projections of avalanche problems using climate projections (RCP4.5–8.5) for Norway, demonstrating similar results to those found in the Alps (Switzerland and France).
The paper is generally well written, well thought out, and is worthy of publication in The Cryosphere. The only major concerns I have regarding the methodology relate to the spatial aggregation of the downscaled climate simulations. In addition, more details should be provided concerning the SNOWPACK modeling for reproducibility purposes. It may also be beneficial to add a dedicated section in the discussion about the limitations and biases of the study and how these affect the results (at present there is only a short one in the conclusion). There are a few punctuation issues across the text, and addressing them would enhance the flow of the manuscript.

Major Comments:
Climate simulations tend to “smooth” extreme events due to their coarse resolution. In addition, the spatial aggregation of the climate simulations further enhances this smoothing effect, which is critical for avalanche problem types such as storm and wind slab. I believe this important bias needs to be addressed in the discussion, as it could affect the interpretation of the projected trends for these two avalanche problems. While the projected climate captures “thermal” avalanche problems such as persistent weak layers (PWL) and wet snow reasonably well, the projections for storm and wind slabs should be interpreted with caution. I also think that more information on the spatial aggregation would help the reader better understand this effect.

More details are needed concerning important parameters, parameterizations, and the simulation setup of the SNOWPACK model, in order to improve the reproducibility of this study.

The figure sizes should be adjusted, as they are currently too small, and the font style does not match that of the manuscript.

Several punctuation marks are missing throughout the text, which limits the flow and the comprehension of some sentences. I’ve highlighted a few examples below, but please check this consistently throughout the manuscript.

Specific comments (line number)
Section 1 - Introduction
15: There has already been an impact on occurrence in the Arctic, especially for mass movements. There are several references in the literature.

Section 2 - Data
115: Change “apply” to the past tense “applied”.
158: What is slab snow?
160-161: A reference to Figure 3 would be helpful here, as I struggle to understand what the numbers mean unless I look at Figure 3.
Figure 3: Is ADL on the x-axis the “general” category? Please define.
183: punctuation is needed to enhance the flow between danger and we.
201: punctuation is needed to enhance the flow between conditions and Lind.
205: “Too-strong” is a bit vague for an amount of precipitation. Or is it about precipitation rate? Please clarify.
205-209: I am not sure this information is relevant for describing the dataset. It reads more like introduction material, or it could be part of the discussion for comparison with the results.
213: punctuation is needed to enhance the flow between cover and we.
216: I am not sure this is the right reference for a key summary of SNOWPACK. This paper is a status update on snow cover modeling in avalanche forecasting, including CROCUS and SNOWPACK.
219: punctuation is needed to enhance the flow between temperature and we.
220-221: punctuation is needed to enhance the flow between (TSS) and we.
226: Do you end up with 4 SNOWPACK simulations per warning region? Does each simulation use the averaged grid cells for one of the 4 elevation bands? Is 20 the total number per warning region or for the entire study area? A sentence that summarizes how many simulations are run per warning region is needed.
230-235: Maybe reduce these lines to one or two sentences, as they limit the comprehension of your methods. Readers will assume that this is included, and it complicates this section unnecessarily.
257: Why explain this? Either remove it or move it to the results.
258: “based”: use past tense.

Section 3 - Methods
264: You need to state at least the main analysis and parameters; we should not need to read another paper.
265: Do you have values, or maybe a figure, to show the imbalance and the effect of the algorithm?
283: Should the F1 score give that?
Figure 4: Please adjust the font to match the manuscript, and define what “general” is. Maybe remove “true danger”, as “danger” creates confusion between danger level and avalanche problem.
Section 4 – Model performance and feature importances
This section also presents results, like Section 5.
310: The false alarm rate is also very high.
313: Please stick to one term, either problem or danger level.

Section 5 - Results
Section 5.1.1: Please use past tense.
339-340: Please rephrase this sentence.
343: Maybe refer to Figure 8.
344: Be consistent with “Fig.” or “Figure”.
361: Was this defined in the methods section?
Section 5.2: There are far more references to supplemental figures than to Figure 9; please move these into the main text. Figure S9 has more references than Figure 9. Alternatively, use the appendix, which is more accessible.

Section 6 - Discussion
Section 6.1: How does the precision of the model affect your results, especially for the PWL?
419-428: I think it might be worth discussing these factors in terms of the development versus the triggering of the PWL.
491: Would “yrs” be better than “y”?
497: It might also be that warmer conditions and thaw events stabilize the snowpack.

Section 7 – Summary and conclusions
593: Why not write “meteorological input”, as both are spatially aggregated for input to the RF models?
597-598: I think this is rather concerning. It has also been pointed out that SNOWPACK struggles to model Arctic snowpacks because of the high thermal gradient (Domine et al., 2019).

References
Domine, F., Picard, G., Morin, S., Barrere, M., Madore, J. B., & Langlois, A. (2019). Major issues in simulating some Arctic snowpack properties using current detailed snow physics models: Consequences for the thermal regime and water budget of permafrost. Journal of Advances in Modeling Earth Systems, 11(1), 34-44.
Citation: https://doi.org/10.5194/egusphere-2025-4685-RC1
- AC1: 'Reply on RC1', Kai-Uwe Eiselt, 15 Jan 2026
  
  We thank the reviewer for considering our manuscript.
  Please find our response in the pdf supplement. Our responses are written in green.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4685-AC1
RC2:
'Comment on egusphere-2025-4685', Anonymous Referee #2, 18 Nov 2025

General comments:
This paper addressed past and future avalanche frequencies in northern Norway using the SNOWPACK model and the random forest (RF) model. The target avalanches were not only general problems but also wind slab, persistent weak layer slab, and wet snow avalanche problems. The past avalanches were investigated mainly with consideration of their linkage to the Arctic Oscillation (AO) index, and the avalanche frequencies were well correlated with the AO index. The future dry-snow avalanches would be estimated to decrease, while the wet-snow avalanches would increase until mid-century. The topic and results are valuable for the scientific community. The introduction provided a nice review of the global warming impact on avalanches.
However, I have a concern about the originality of this study. I agree with the authors that this work presents an original case to show future avalanche problems in Norway; however, the other aspects of originality seem limited. The random forest model used had mainly been developed in the authors’ previous work. The linkage between avalanches and the AO index had also been found in the authors’ previous work. The future estimations, including their procedure, are similar to those in previous works, such as Mayer et al. (2024). I feel that the originality of this work would be insignificant for “The Cryosphere”, even though the differences in locations themselves are valuable to the scientific community.
The utilization of the RF model also seems problematic. From my understanding, the authors estimated the avalanche-day frequency (ADF) by cumulating the daily 1/0 output from the RF model. However, this procedure might lead to a biased ADF because the RF model was not optimized by minimizing the error of the ADF. Actually, the sum of predicted AvD for wind slab avalanches is 440, while that of true AvD is 245 (Fig. 4), indicating a mean bias towards overestimation in the ADF. I suppose the RF model should be a regression type, rather than a binary type. I recommend confirming the RF model’s reproducibility regarding ADF by comparing it to the observation.
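The bias described above can be illustrated with a minimal sketch. It assumes that the ADF is simply the sum of daily 0/1 classifier outputs; the function name `frequency_bias` is hypothetical, and only the two wind-slab counts are taken from the comment (as read from Fig. 4).

```python
def frequency_bias(n_predicted: int, n_observed: int) -> float:
    """Ratio of predicted to observed event counts (1.0 means no count bias)."""
    if n_observed <= 0:
        raise ValueError("n_observed must be positive")
    return n_predicted / n_observed


# Wind-slab avalanche-day counts quoted in the comment (Fig. 4).
predicted_avd = 440
true_avd = 245

bias = frequency_bias(predicted_avd, true_avd)
print(f"frequency bias = {bias:.2f}")  # prints "frequency bias = 1.80"
```

A ratio well above 1 means that summing the binary predictions overestimates the number of avalanche days, even if each daily classification is individually tuned for skill scores.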
This may be related to the above problem, but I am also concerned that the authors did not consider uncertainties arising from the RF model. Judging from Fig. 4, the RF model may produce a very large uncertainty in its projection. For example, the RF model incorrectly predicts general avalanches with probabilities of 36% in AvD predictions and 17% for non-AvD predictions (Fig. 4). I am not certain, but the uncertainty range appears comparable to or larger than that of the climate models. Furthermore, the authors converted AvD/non-AvD from avalanche danger level simply by a threshold (Section 2.1), which also causes uncertainty. However, the authors show no data to discuss this kind of uncertainty arising from the conversion. These problems could change the results of statistical tests for linear trends in past and future avalanche frequencies (Figs. 6, 7, 8, 9), and if so, the authors’ conclusions may change. The authors should quantitatively demonstrate the uncertainties associated with past and future projections arising from the RF model, and these uncertainties should be considered in the statistical analysis. This point is crucial for ensuring the reliability of the RF models’ estimation.

Specific comments:
L41: “RPCs” seems to be a typo instead of “RCPs”.
L46: You need to define the abbreviation NorCP here.
L55: From my understanding, Lazar and Williams (2008) assessed a potential avalanche period very simply based on air temperature exceeding 0 °C or not. Although I do not want to treat authors' opinions carelessly, I disagree with this.
L105–121: These contents are better moved to Section 2.
L133: A dual abbreviation definition of ADL.
L136: What are the active avalanche problems?
L140: What are distribution and sensitivity?
Figure 3: Is the left axis showing the number of avalanche days? What is the avalanche problem frequency?
Section 2.4: Please describe the model settings for soil.
Section 2.4: How did you calculate liquid water content (LWC)? LWC is very important for wet avalanches (Fig. 5). Furthermore, local LWC exceeding 5% is very important for wet-avalanche predictions (Wever et al. 2016). This point should be taken into account.
Section 2.4: Please describe how you obtain daily snowpack variables. The original output of the SNOWPACK model is generally hourly data, but you use daily avalanche data.
L218: How do you prepare long-wave radiation data?
L218: You used the net short-wave radiation. So, you mean that the albedo depends on a land surface model implemented in a meteorological model? If so, does this affect the SNOWPACK simulation? The snowpack calculation is very sensitive to the short-wave radiation.
L220: The linear model should be described in the Appendix or Supplement.
L226: How did you calculate precipitation, wind, and relative humidity? A simple arithmetic mean is generally inappropriate for these variables.
L228–235: These lines should be described in the Appendix or Supplement.
Section 2.5: This content is too hard for readers without a background in the RF model. Can you merge this content into Section 3?
L273: What are min_samples_leaf, min_samples_split, max_depth, n_estimators, and max_features?
L285: You mean leave-one-out cross-validation? However, your procedure is actually not leave-one-out cross-validation but k-fold cross-validation. Leave-one-out cross-validation is a method in which a single independent data point is excluded from the training data. In this study, a single independent data point is the 1/0 value for a day, not a year.
L286: I do not understand why five years of training data are available even though you have Norwegian avalanche bulletin’s data from 2017/18 to 2024/25.
L509–512: This is also problematic from the viewpoint of the applicability of RF models to future climate. Does the RF model linearly increase the wet-snow ADF by increasing air temperature (or liquid water content) if only there were enough snowpack? However, one of the necessary conditions for wet avalanches is a high liquid water content, locally exceeding 5% (Wever et al., 2016). Satisfying this condition, wetting of an initially below-freezing snowpack is important (Mitterer et al. 2011). Capillary barriers or melt–freeze crusts are also key phenomena. Therefore, the authors need to confirm whether the models’ behavior in linearly increasing wet-snow ADF by increasing air temperature is really appropriate in Norway.
L590–637: These lines should be described in Section 6.
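The distinction raised in the comment on L285 can be sketched as follows. This is a hedged illustration in plain Python, assuming each daily sample carries a season label; the helper `leave_one_group_out` and the toy season labels are hypothetical and not taken from the manuscript.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), holding out one group per fold."""
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = sorted(
            i for g, idxs in by_group.items() if g != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical season label for each daily sample (six days, three seasons).
seasons = ["2017/18", "2017/18", "2018/19", "2018/19", "2019/20", "2019/20"]
folds = list(leave_one_group_out(seasons))

# Holding out whole seasons gives one fold per season (3 here), whereas
# classic leave-one-out over daily samples would give one fold per day (6 here).
print(len(folds), len(seasons))
```

Holding out whole seasons is a grouped k-fold scheme: the number of folds equals the number of seasons, not the number of daily samples.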

References:
Mayer, S., Hendrick, M., Michel, A., Richter, B., Schweizer, J., Wernli, H., and van Herwijnen, A.: Impact of climate change on snow avalanche activity in the Swiss Alps, The Cryosphere, 18, 5495–5517, https://doi.org/10.5194/tc-18-5495-2024, 2024.
Wever, N., C. Vera Valero, and C. Fierz (2016), Assessing wet snow avalanche activity using detailed physics based snowpack simulations, Geophys. Res. Lett., 43, 5732–5740, doi:10.1002/2016GL068428.
Mitterer C, Hirashima H, Schweizer J. Wet-snow instabilities: comparison of measured and modelled liquid water content and snow stratigraphy. Annals of Glaciology. 2011;52(58):201-208. doi:10.3189/172756411797252077

Citation: https://doi.org/10.5194/egusphere-2025-4685-RC2
- AC2: 'Reply on RC2', Kai-Uwe Eiselt, 15 Jan 2026
  
  We thank the reviewer for considering our manuscript.
  Please find our response in the pdf supplement. Our responses are written in green.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4685-AC2
RC3:
'Comment on egusphere-2025-4685', Anonymous Referee #3, 09 Jan 2026
This manuscript investigates past (1970–2024) and future (21st century) changes in avalanche danger in Troms county (northern Norway) using machine-learning (random forest) models driven by 3-km dynamically downscaled climate data (NORA3 reanalysis and NorCP projections) and simulated SNOWPACK data. A key contribution is that the authors differentiate between specific avalanche problems—wind slab, persistent weak layer (PWL) slab, and wet snow—in addition to a general avalanche danger metric. Applying this framework to northern Norway, a region experiencing rapid Arctic warming, is valuable and underrepresented in the literature.
However, several key aspects need strengthening before the conclusions—especially regarding projected changes—can be considered robust. The primary issues are the definition of avalanche days, the snowpack-model settings, and the influence of spatial aggregation. Addressing these would substantially improve transparency and confidence in the findings.
Major comments
Definition of avalanche days: The study’s central metric is derived from the avalanche bulletin danger level rather than direct avalanche observations. This is a reasonable choice given the known inhomogeneities and incompleteness in the Norwegian avalanche activity databases. However, the manuscript converts the binary target variable DL≥3 vs. DL<3 into “avalanche days” and “non-avalanche days,” which risks implying a direct correspondence with observed avalanche occurrence. While DL≥3 is a defensible and widely used threshold for elevated hazard, danger levels are forecast-based assessments that integrate expected instability, likely triggers (natural or human triggering), the spatial distribution of the problem, and anticipated avalanche size. As a result, DL3 (“Considerable”) does not necessarily correspond to a consistent probability of avalanche occurrence across time and space. The authors could retain the same classification target while framing it as high-danger days (DL≥3) vs. low-danger days (DL<3), or as a DL3 exceedance frequency. This would avoid over-interpreting the danger threshold as a direct proxy for avalanche occurrence and would strengthen the conceptual consistency of the study while preserving the modeling framework and results.
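The suggested framing can be made concrete with a small sketch. It assumes integer danger levels on the usual 1 to 5 scale and the DL≥3 threshold discussed above; the function name and the sample values are hypothetical.

```python
def high_danger_flags(danger_levels, threshold=3):
    """Return 1 for days with danger level >= threshold, else 0."""
    return [1 if dl >= threshold else 0 for dl in danger_levels]


# Hypothetical bulletin danger levels for six consecutive days.
danger_levels = [1, 2, 3, 4, 2, 3]

flags = high_danger_flags(danger_levels)
exceedance_frequency = sum(flags) / len(flags)

print(flags)                 # [0, 0, 1, 1, 0, 1]
print(exceedance_frequency)  # 0.5
```

The classification target is unchanged; only its name changes, from "avalanche day" to "high-danger day" or "DL3 exceedance".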

SNOWPACK settings: The model chain relies on SNOWPACK-derived predictors (stability indices, liquid water proxies), yet the SNOWPACK configuration and parameter choices for the NORA3 chain raise concerns. For the NORA3 chain, the snowpack energy balance is not computed in the same way as for the NorCP data, likely because NORA3 does not provide the full set of required radiative forcing variables (e.g., incoming short-wave radiation). Instead, snow surface and ground temperatures are prescribed as boundary conditions. Setting ground temperature to the temporal mean of air temperature (l.220) removes variability in the basal boundary condition and may distort the modeled temperature gradient. This can directly affect early-season faceting and weak-layer formation in shallow snowpacks. Given the strong sensitivity of persistent weak layer development to temperature gradients, this assumption may contribute to the comparatively weak PWL model performance. Additionally, the snow surface temperature (TSS) is estimated using a linear model trained on ERA5 data; however, the manuscript does not provide sufficient information on how this model was trained and validated, nor whether it performs reliably under the maritime–Arctic conditions of Troms and near-melting regimes where surface temperature strongly influences wet-snow processes. To increase confidence in the derived SNOWPACK predictors and the projected changes in avalanche problems, I recommend that the authors revise and validate the SNOWPACK settings used with the NORA3 input data. Alternatively, given that including SNOWPACK variables did not enhance model performance compared to an earlier study (Eiselt and Graversen, 2025), it may be worth considering whether a meteorology-only random forest model provides a more robust framework (l.595).

Spatial aggregation: A key strength of this study is the use of dynamically downscaled climate data at relatively high spatial resolution (3 km), which is suited to capturing meteorological drivers of avalanche conditions. However, to reduce computational cost, the study averages the 3-km forcing over large warning regions and four elevation bands and uses these regional means as SNOWPACK input. While this is understandable for efficiency, it risks smoothing or misrepresenting key physical drivers of avalanche problems. Many processes relevant for avalanche hazard are dominated by spatial extremes rather than means, including precipitation peaks and wind redistribution. I appreciate that the authors explicitly acknowledge the limitation of spatial aggregation in the Discussion (l.403) and that they tested alternatives, including running SNOWPACK for selected wind- and snow-exposed grid cells. To increase confidence that the aggregation strategy is not suppressing avalanche-relevant signals, I recommend providing a clearer evaluation of how aggregation affects predictive skill, beyond the wind-exposed-cell experiments already reported. For instance, it would be informative to compare models using regional means with those using upper percentiles or maxima of key predictors within each warning region/elevation band. If computational cost is a constraint, an alternative to region-wide averaging could be to run SNOWPACK on a small stratified sample of grid cells (e.g., across elevation bands and representative subregions/exposure classes) and use percentiles/extremes across these simulations as model inputs, rather than relying solely on mean forcing.
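The smoothing effect of regional averaging can be illustrated with a short sketch that compares the mean, an upper percentile, and the maximum of one predictor across grid cells. The wind-speed values are invented for illustration, and the nearest-rank percentile is one simple choice among several definitions.

```python
import math


def nearest_rank_percentile(values, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical wind speeds (m/s) in the grid cells of one region and elevation band.
wind = [2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 9.0, 15.0]

regional_mean = sum(wind) / len(wind)
regional_p90 = nearest_rank_percentile(wind, 90)
regional_max = max(wind)

print(regional_mean, regional_p90, regional_max)
```

In this toy case the mean sits far below the 90th percentile and the maximum, which is the kind of signal loss the comment is concerned about for wind- and precipitation-driven problems.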

Climate-model uncertainty / wind: The manuscript appropriately acknowledges limitations due to the small NorCP ensemble and severe spatial aggregation. However, given that wind slab is a dominant avalanche problem in the study region, it would be valuable to discuss uncertainty in modeled wind speed characteristics in both NORA3 and NorCP. A brief discussion of how wind uncertainty may propagate into projected wind-slab changes would strengthen the interpretation of the future results.

Overall, the paper is timely and has strong potential, but addressing the above issues would substantially improve the robustness and interpretability of the projected changes in avalanche problems.
Citation: https://doi.org/10.5194/egusphere-2025-4685-RC3
- AC3: 'Reply on RC3', Kai-Uwe Eiselt, 15 Jan 2026
  
  We thank the reviewer for considering our manuscript.
  Please find our response in the pdf supplement. Our responses are written in green.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4685-AC3
- AC4: 'Reply on RC3', Kai-Uwe Eiselt, 15 Jan 2026
  
  We thank the reviewer for considering our manuscript.
  Please find our response in the pdf supplement. Our responses are written in green.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4685-AC4

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (19 Jan 2026) by Nora Helbig

AR by Kai-Uwe Eiselt on behalf of the Authors (20 Jan 2026) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (20 Jan 2026) by Nora Helbig

RR by Anonymous Referee #1 (10 Feb 2026)

Suggestions for revision or reasons for rejection

Review of Past and future changes in avalanche problems in northern Norway estimated with machine-learning models
By Kai-Uwe Eiselt and Rune Grand Graversen

Summary
This paper presents a model chain to reconstruct and project avalanche hazard in northern Norway. The approach builds on the authors’ earlier work using Random Forest models for avalanche danger prediction. The model chain combines dynamically downscaled climate model output with the snow cover model SNOWPACK, which provides physically based snow stratigraphy and stability variables. These outputs, together with meteorological variables from the downscaled climate models, are used as inputs to Random Forest classifiers that predict avalanche days for multiple avalanche problem types (e.g., wet snow, storm snow, wind slab, and persistent weak layers). The results show distinct historical trends in the frequency of avalanche days depending on avalanche problem type, as well as statistically significant relationships with large-scale climate drivers such as the Arctic Oscillation (AO). Finally, future projections based on climate scenarios (RCP4.5 and RCP8.5) indicate systematic shifts in avalanche problem regimes, consistent with trends previously reported for Alpine regions in Switzerland and France.
The paper is generally well written, well thought out, and is worthy of publication in The Cryosphere. I would like to thank the authors for a comprehensive review response and changes. I have only technical corrections before publication.

Technical corrections:
Line 586: Remove “with” for “Several issues and limitations”
Line 602-607: Is this somewhere in the supplement information? If available, you can maybe point to the figure/section.
Line 610-611: Same comment as above, add the figure/supplement section if available.
Line 613-614: Not really a correction, but rather a comment on why performance is not improved with SNOWPACK. The PWL slab problem would be described by SNOWPACK (development of weak layers), but the trigger itself, especially for natural release, would mostly be weather-related, such as a snowfall. This might explain why it does not improve your model accuracy as anticipated, in addition to the biases you mention in Section 6.4.

RR by Anonymous Referee #4 (09 Mar 2026)

Suggestions for revision or reasons for rejection

This paper combines different tools to study past and future avalanche activity in northern Norway. It uses a Random Forest (RF) model to distinguish between avalanche days and non-avalanche days based on meteorological input data from meteorological and climate models and snow information from the SNOWPACK snow cover model. Both the past and future trends in the partition between avalanche and non-avalanche days are presented.
The main novelty of the paper is that it presents trends for northern Norway, a region that has received less study than, for instance, the European Alps, to which the results are compared. The overall methodology is similar to that presented in previous work on the European Alps, with some adaptations to the data available.
The paper is quite long, with many appendices and supplements. This shows the amount of adaptation work needed for Arctic regions but may also make reading more difficult. Proofreading to remove typos and highlight the main conclusions would be valuable.
The scientific challenge is interesting and the overall question is relevant for publication in TC. However, my main concern is about the methodology for running the SNOWPACK simulations and the validation of the results over the historical period.

Major comments

1. If I understand correctly, the SNOWPACK model is forced by precipitation and snow surface temperature. This configuration is generally used when snow surface temperature measurements are available. Here, you use an emulated surface temperature derived from a relation between ERA5 data and NORA3 data. However, the two datasets do not have the same representation of snow, and in your Figure S1 it is clear that you mix soil with and without snow. You apply this relation to a third model (SNOWPACK) that is independent in terms of snow coverage. I cannot imagine that there are no discrepancies in the result when a high surface temperature (coming from a snow-free situation in the atmospheric model) is applied to snow-covered soil in SNOWPACK. The reported RMSE of about 4 K seems quite high to me, especially in northern Norway, where you state that the air temperature is generally not far from 0 °C. No sensitivity analysis is provided for this quite important parameterization of the SNOWPACK input. Additionally, the paper contains nothing that allows the reader to judge the relevance of the snowpack represented by the SNOWPACK model. Hence, with the presented data, I cannot conclude on the relevance of the variables derived from the SNOWPACK model or on the conclusions drawn from the correlation, or absence of correlation, with SNOWPACK variables.

2. There are some points that would benefit from clarification in the input data and optimisation procedure of the RF model. In Tables C1 and C2, it was not clear to me to which variables the suffixes are applied. Is it to all features or only some, and how are they combined when several suffixes are possible? In the current state of the manuscript, it was very difficult for me to figure out what the exact inputs of the model are.
For the optimisation, you “consider” the average of the listed metrics, i.e. F1 score, TSS, FAR, and accuracy. This is a quite surprising approach. Usually, coherent indicators mixing the different parts of the confusion matrix are used, but I do not know of examples of optimisation on such a mix of scores that, for some of them, already combine the different parts of the confusion matrix (e.g. TSS, F1). This needs at least to be discussed and justified. I fully agree that using only FAR or accuracy may lead to incorrect results, but tools already exist to combine indicators coherently (e.g. ROC for combining recall and FAR).

2.1 There is an inconsistency in the presentation of features when there are several on the same line, e.g. “w1, w3, w7” but “wdrift_2, 3”. This does not ease the reading of the table.
2.2 For the sum of LWC by volume, a clear definition would be valuable, as summing percentages over layers of different thicknesses is not meaningful.
2.3 For the variation of the SSI, Sk38, and Sn38 indices over 1 to 3 days, I would like to be sure that it is the variation for the same layer, even though three days before another weak layer may have been identified as the weakest layer. Can you explain this variation in more detail?
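Point 2.2 can be illustrated with a sketch contrasting a naive sum of per-layer percentages with a thickness-weighted bulk value. This is only one possible definition, not necessarily the one used in the manuscript; the layer values and function names are hypothetical.

```python
def bulk_lwc_percent(thickness_m, lwc_percent):
    """Thickness-weighted mean volumetric liquid water content (%)."""
    total = sum(thickness_m)
    if total <= 0:
        raise ValueError("total snow depth must be positive")
    return sum(h * w for h, w in zip(thickness_m, lwc_percent)) / total


def liquid_water_column_m(thickness_m, lwc_percent):
    """Total liquid water expressed as a column height (m)."""
    return sum(h * w / 100.0 for h, w in zip(thickness_m, lwc_percent))


# Hypothetical three-layer snowpack: a thin wet layer over drier snow.
thickness = [0.10, 0.40, 0.50]  # layer thickness in m
lwc = [6.0, 1.0, 0.0]           # volumetric LWC per layer in %

naive_sum = sum(lwc)                             # 7.0, ignores layer thickness
bulk = bulk_lwc_percent(thickness, lwc)          # about 1.0 %
column = liquid_water_column_m(thickness, lwc)   # about 0.01 m
```

The naive sum depends on how the snowpack is discretised into layers, whereas the weighted mean and the water column do not; a thin layer above 5% still stands out only in the per-layer values.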

3. The presentation of the evaluation of the RF model is quite confusing and surprising.
The percentages in Figure 4 do not relate to the whole sample but to subsamples consisting of the different classes. This is not quite clear in the legend (the “instances” are not clearly defined) and is difficult to interpret. Moreover, the legend and the text at line 318 state that two of the four percentages presented are recall scores, while only one corresponds to the common definition of recall (also presented in Appendix D). The recall is the share of positive avalanche conditions that are correctly predicted (lower right cell only), not the share of negative conditions that are correctly predicted (which is usually called the true negative rate or specificity). I suggest presenting the scores in a table independent of the confusion matrices shown in Fig. 4, and presenting all the scores used for optimisation.
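The distinction between recall and specificity, and the scores listed in point 2, can be written out from the four cells of a binary confusion matrix. The sketch below uses the common textbook definitions and invented counts; in particular, the false alarm ratio is taken here as FP/(TP+FP), which is an assumption and may differ from the definition in Appendix D of the manuscript.

```python
def skill_scores(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Common binary verification scores from confusion-matrix counts."""
    recall = tp / (tp + fn)        # share of positive days correctly predicted
    specificity = tn / (tn + fp)   # share of negative days correctly predicted
    precision = tp / (tp + fp)     # share of positive predictions that are correct
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "false_alarm_ratio": fp / (tp + fp),  # equals 1 - precision
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "f1": 2 * precision * recall / (precision + recall),
        "tss": recall + specificity - 1.0,    # true skill score
    }


# Hypothetical counts, not taken from Fig. 4.
scores = skill_scores(tp=40, fp=30, fn=10, tn=120)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With this definition the false alarm ratio carries no information beyond precision, and TSS already combines recall and specificity, so averaging all of these weights some cells of the matrix more than once.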

4. Past results are presented with NORA3 data, while future (climate) projections are presented with NorCP. However, I have not seen a comparison of the NorCP data over the historical period with observational data and/or the NORA3 data that were validated in Section 4. Usually, climate models are compared over a historical period to observations or other data, which gives confidence in their representation of the studied object in the past and validates their application to the far future. Figure 9 could be enhanced by showing, for the historical period, both the data from NorCP and the AvD/non-AvD partition and/or NORA3 data.

Minor comments

1. The last sentence of the abstract is not easy to understand. A rewriting may allow to deliver the same message more easily.

2. Line 41-46, the sentence is very long and some explained acronyms are not (or nearly not) used in the manuscript and may be removed (e.g. SSP, SRES).

3. Fig 1 typo in the legend.

4. Figure 3 : Does the line between points have any significance? If not, please remove it.

5. Line 271: Balancing of classes for RF training. You choose to oversample the minority class. It would also be possible to downsample the majority class or to use a combination of both. RF models can also adjust the probability of drawing an observation based on the imbalance between classes, to ensure that the probability of using an observation of the minority or majority class is equal. Can you briefly explain and/or justify this choice?

6. The abbreviation TSS is used both for snow surface temperature and true skill score, this does not help with readability…

7. Table D1 and Figure 4 do not have prediction/true value in the same place.

8. On Table D1, it is strange to have a, c and d expressed in terms of AvD/non-AvD but not b. Otherwise, a can be called a true positive or a hit, c a miss and d a true negative.

9. Line 302, point 2 may be rephrased to be easier to read.

10. I do not clearly see the interest of the “General” model (e.g. Fig. 5). I understand that all conditions are gathered into this model. However, the climatology in Fig. 3 shows that the wind-related problem is largely predominant. Hence, it seems quite logical that the “General” model is quite close to the one trained only on the wind slab problem.

11. Figure 7: There is no uncertainty on this graph, while there is on Figure 6. Would it be possible to transfer the uncertainty computed for Fig. 6 into Fig. 7? The uncertainty in Fig. 6 is interesting for visualisation but not for reading a quantitative value, while Fig. 7 is designed for that purpose.

12. Figure 7, typo “significance”

13. Line 480-481: What are the histograms of s3_emin and s3_emax? I wonder whether the better correlation with s3_emax is linked to the fact that at the maximum altitude you mainly have snow, while at lower altitude you have frequent rain (so s3_emin is zero and therefore cannot be well correlated with anything that is not constant).

14. Line 569-570 : I think there is a misunderstanding of Castebrunet et al., 2014 as northern French Alps are generally considered of higher elevation than southern French Alps… They mainly state that “results on small scales may be more uncertain, for instance those concerning the southern French Alps”.

15. Appendix D : Some metrics are redundant (e.g. FAR and PR).
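The balancing options mentioned in minor comment 5 can be illustrated with a plain random-oversampling sketch. This is an assumption for illustration only: the manuscript's actual balancing algorithm is not described here, and the function name, seed, and toy labels are hypothetical.

```python
import random


def oversample_minority(labels, seed=0):
    """Return sample indices in which the minority class is randomly
    repeated until both classes of a 0/1 label list are equally frequent."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return sorted(majority + minority + extra)


# Hypothetical imbalanced target: 8 low-danger days, 2 high-danger days.
labels = [0] * 8 + [1] * 2

idx = oversample_minority(labels)
balanced = [labels[i] for i in idx]
print(len(balanced), sum(balanced))  # 16 samples, 8 of them positive
```

Downsampling would instead draw a subset of the majority indices, and class weighting leaves the sample untouched while changing how often or how strongly each class counts during training.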

ED: Publish subject to revisions (further review by editor and referees) (09 Mar 2026) by Nora Helbig

AR by Kai-Uwe Eiselt on behalf of the Authors (17 Mar 2026) Author's response Author's tracked changes Manuscript

ED: Publish as is (22 Mar 2026) by Nora Helbig

AR by Kai-Uwe Eiselt on behalf of the Authors (23 Mar 2026) Manuscript

Short summary

We train machine-learning models to predict avalanche problems from meteorological and snow-cover data in northern Norway. A major part of the work is the estimation of avalanche-problem changes throughout the 21st century based on future climate projections. We find that while the avalanche danger generally declines towards 2100, the avalanche characteristics will likely change, meaning fewer dry but more wet avalanches, having potential implications for the avalanche-danger forecast quality.