This paper combines different tools to study past and future avalanche activity in northern Norway. It uses a Random Forest (RF) model to distinguish between avalanche days and non-avalanche days, based on meteorological input data from meteorological and climate models and on snow information from the SNOWPACK snow cover model. Both past and future trends in the partition between avalanche and non-avalanche days are presented.
The main novelty of the paper is that it presents trends for northern Norway, a region that has received fewer studies than, for instance, the European Alps, to which the results are compared. The overall methodology is similar to that presented in previous work on the European Alps, with some adaptations to the data available.
The paper is quite long, with a lot of appendices and supplements. This shows the amount of adaptation work needed for Arctic regions but may also make reading more difficult. Proofreading to remove typos and highlight the main conclusions would be valuable.
The scientific challenge is interesting and the overall question is relevant for publication in TC. However, my main concern is about the methodology for running the SNOWPACK simulations and the validation of the results over the historical period.
Major comments
1. If I understand correctly, the snowpack model is forced by precipitation and snow surface temperature. This configuration is generally used when snow surface temperature measurements are available. Here, you use an emulated surface temperature derived from a relation between ERA5 data and NORA3 data. However, the two datasets do not have the same representation of snow, and in your Figure S1 it is clear that you mix ground with and without snow. You then apply this relation to a third model (SNOWPACK) that is independent in terms of snow coverage. I cannot imagine that there are no discrepancies in the results when a high surface temperature (coming from a snow-free situation in the atmospheric model) is applied to snow-covered ground in SNOWPACK. The reported RMSE of about 4 K seems quite high to me, especially in northern Norway, where you state that the air temperature is generally not far from 0°C. There is no sensitivity analysis associated with this quite important parameterization of the SNOWPACK input. Additionally, the paper provides nothing with which to judge the relevance of the snowpack represented by the SNOWPACK model. Hence, with the presented data, I cannot conclude on the relevance of the variables derived from the SNOWPACK model, nor on the conclusions drawn from the correlation or absence of correlation with variables from SNOWPACK.
2. Some points regarding the input data and the optimisation procedure of the RF model would benefit from clarification. In Tables C1 and C2, it was not clear to me which variables the suffixes are applied to. Is it all features or only some, and how are the suffixes combined when several are possible? In the current state of the manuscript, it was very difficult for me to figure out what the exact inputs of the model are.
For the optimisation, you “consider” the average of the listed metrics, i.e. F1 score, TSS, FAR and accuracy. This is quite a surprising approach. Usually, coherent indicators mixing the different parts of the confusion matrix are used, but I do not know of examples of optimisation on such a mix of scores, some of which already combine the different parts of the confusion matrix (e.g. TSS, F1). This needs at least to be discussed and justified. I fully agree that using only FAR or accuracy may lead to incorrect results, but tools already exist to combine indicators coherently (e.g. ROC for combining recall and FAR).
2.1 There is an inconsistency in the presentation of features when there are several on the same line, e.g. “w1, w3, w7” but “wdrift_2, 3”. This does not make the table easier to read.
2.2 For the sum of LWC by volume, a clear definition would be valuable, as summing percentages over layers of different thicknesses is not meaningful.
2.3 For the variation of the SSI, Sk38 and Sn38 indices over 1 to 3 days, I would like to be sure that it is the variation on the same layer, even if three days earlier another weak layer had been identified as the weakest layer. Can you explain this variation in more detail?
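To illustrate the concern in comment 2.2, here is a minimal sketch of how a thickness-weighted definition differs from a plain sum of percentages. The two layers and their values are hypothetical and are not taken from the manuscript or from SNOWPACK output; the variable names are my own.

```python
# Hypothetical snow layers: (thickness in m, volumetric liquid water content in %).
layers = [(0.10, 2.0), (0.50, 1.0)]

# Naive sum of percentages: ignores how thick each layer is.
naive_sum_percent = sum(lwc for _, lwc in layers)

# Thickness-weighted sum: liquid water column, in metres of water.
liquid_water_m = sum(thickness * lwc / 100.0 for thickness, lwc in layers)

# Bulk volumetric LWC of the whole profile, in %.
total_thickness_m = sum(thickness for thickness, _ in layers)
bulk_lwc_percent = 100.0 * liquid_water_m / total_thickness_m

print(naive_sum_percent, liquid_water_m, bulk_lwc_percent)
```

In this example the naive sum is 3 %, although the profile as a whole holds only about 1.2 % liquid water by volume (7 mm of water in 0.6 m of snow). Stating which of these quantities the feature represents would remove the ambiguity.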
3. The presentation of the evaluation of the RF model is quite confusing and surprising.
The percentages in Figure 4 do not relate to the whole sample but to subsamples consisting of the different classes. This is not quite clear in the legend (the “instances” are not clearly defined) and is difficult to interpret. Moreover, the legend and the text at line 318 state that two of the four percentages presented are recall scores, while only one corresponds to the common definition of recall (also given in Appendix D). Recall is the fraction of positive avalanche conditions that are correctly predicted (lower right cell only), not the fraction of negative conditions that are correctly predicted (which is usually called the true negative rate or specificity). I suggest presenting the scores in a table independent of the confusion matrices shown in Fig. 4, and presenting all the scores used for optimisation.
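The distinction made in comment 3 can be written down in a few lines. This is only a sketch with hypothetical counts (not the values of Fig. 4); the function names are mine.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of true avalanche days that were predicted as avalanche days."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true non-avalanche days predicted as non-avalanche days."""
    return tn / (tn + fp)


# Hypothetical confusion-matrix counts.
tp, fn = 30, 10   # true AvD: hits and misses
tn, fp = 160, 40  # true non-AvD: correct negatives and false alarms

avd_recall = recall(tp, fn)
non_avd_specificity = specificity(tn, fp)
print(avd_recall, non_avd_specificity)
```

Both numbers are percentages normalised by the true class, but only the first is recall in the usual sense; the second is the true negative rate.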
4. Past results are presented with NORA3 data, while future (climate) projections are presented with NorCP. However, I have not seen a comparison of the NorCP data over the historical period with observational data and/or with the NORA3 data that have been validated in Section 4. Usually, climate models are compared over a historical period with observations or other data, which gives confidence in their representation of the studied object in the past and supports their application to the far future. Figure 9 could be enhanced by showing, for the historical period, both the data from NorCP and the AvD/non-AvD partition and/or NORA3 data.
Minor comments
1. The last sentence of the abstract is not easy to understand. Rewriting it may deliver the same message more clearly.
2. Lines 41-46: the sentence is very long, and some of the acronyms it defines are not (or hardly) used in the manuscript and could be removed (e.g. SSP, SRES).
3. Fig 1 typo in the legend.
4. Figure 3 : Does the line between points have any significance? If not, please remove it.
5. Line 271: Balancing of classes for RF training. You choose to oversample the minority class. It would also be possible to downsample the majority class, or to combine both. RF models can also adjust the probability of drawing an observation according to the class imbalance, so that observations of the minority and majority classes are equally likely to be used. Can you briefly explain and/or justify this choice?
6. The abbreviation TSS is used both for snow surface temperature and for true skill score, which does not help readability.
7. Table D1 and Figure 4 do not have the prediction and true value in the same place.
8. In Table D1, it is strange to have a, c and d expressed in terms of AvD/non-AvD but not b. Otherwise, a can be called a true positive or a hit, c a miss and d a true negative.
9. Line 302, point 2 may be rephrased to be easier to read.
10. I do not clearly see the interest of the “General” model (e.g. Fig. 5). I understand that all conditions are gathered into this model. However, the climatology of Fig. 3 shows that the wind-related problem is largely predominant. Hence, it seems quite logical that the “General” model is quite close to the one trained only on the wind slab problem.
11. Figure 7: There is no uncertainty on this graph, while there is on Figure 6. Would it be possible to transfer the uncertainty computed for Fig. 6 to Fig. 7? The uncertainty in Fig. 6 is useful for visualisation but not for reading off a quantitative value, whereas Fig. 7 is designed for that purpose.
12. Figure 7, typo “significance”
13. Line 480-481: what are the histograms of s3_emin and s3_emax? I wonder whether the better correlation with s3_emax is linked to the fact that at the maximum altitude you mainly have snow, while at lower altitude you have frequent rain (so s3_emin is zero and could therefore not be well correlated with anything that is not constant).
14. Line 569-570 : I think there is a misunderstanding of Castebrunet et al., 2014 as northern French Alps are generally considered of higher elevation than southern French Alps… They mainly state that “results on small scales may be more uncertain, for instance those concerning the southern French Alps”.
15. Appendix D: Some metrics are redundant (e.g. FAR and PR).
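The redundancy noted in minor comment 15 can be checked numerically. The sketch below assumes that FAR denotes the false alarm ratio FP / (TP + FP) and that PR denotes precision TP / (TP + FP); if Appendix D defines them differently, the identity does not apply. The counts are hypothetical.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted avalanche days that were true avalanche days."""
    return tp / (tp + fp)


def false_alarm_ratio(tp: int, fp: int) -> float:
    """Fraction of predicted avalanche days that were false alarms (assumed FAR definition)."""
    return fp / (tp + fp)


tp, fp = 30, 40  # hypothetical hits and false alarms
pr = precision(tp, fp)
far = false_alarm_ratio(tp, fp)
print(pr, far, pr + far)
```

Under these definitions the two scores always sum to one, so reporting both adds no information.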
Review of Past and future changes in avalanche problems in northern Norway estimated with machine-learning models
By Kai-Uwe Eiselt and Rune Grand Graversen
Summary
This paper uses a model chain to predict the past and future avalanche hazard in northern Norway. This work builds on previous work by the same authors, who developed Random Forest models to predict avalanche danger. The model chain they developed primarily consists of a dynamic downscaling of climate models in Norway for the past and future, which serves as input to the snow cover model SNOWPACK. Then, they build Random Forest models to predict avalanche days for several avalanche problems using meteorological variables (from the downscaled climate models) and snow instability variables from SNOWPACK. They show different historical trends in the frequency of avalanche days for different avalanche problems (e.g., wet, storm, wind, or persistent), as well as correlations with the Arctic Oscillation (AO). They conclude with projections of avalanche problems using climate projections (RCP4.5–8.5) for Norway, demonstrating similar results to those found in the Alps (Switzerland and France).
The paper is generally well written, well thought out, and is worthy of publication in The Cryosphere. The only major concerns I have regarding the methodology relate to the spatial aggregation of the downscaled climate simulations. In addition, more details should be provided concerning the SNOWPACK modeling for reproducibility purposes. It may also be beneficial to add a dedicated section in the discussion about the limitations and biases of their study, and how these affect their results (small one in the conclusion). There are a few punctuation issues across the text, and addressing them would enhance the flow of the manuscript.
Major Comments:
Specific comments (line number)
Section 1 - Introduction
15: Impacts on occurrence have already been observed in the Arctic, especially for mass movements. There are several references in the literature.
Section 2 - Data
115: Change apply to past tense “applied”
158: What is slab snow??
160-161: I think a reference to Figure 3 would be great here, as I struggle to understand what the numbers mean unless I look at Figure 3.
Figure 3: is ADL on the x axis the general? Please define.
183: punctuation is needed to enhance the flow between danger and we.
201: punctuation is needed to enhance the flow between conditions and Lind.
205: too-strong is a bit vague for an amount of precipitation, or maybe it is about precipitation rate? Please clarify.
205-209: I am not sure this information is relevant for describing the dataset; it feels more like part of the introduction, or maybe part of the discussion, to compare with the results.
213: punctuation is needed to enhance the flow between cover and we.
216: I am not sure this is the right reference for a key summary of SNOWPACK. This paper is a status update on snow cover modeling in avalanche forecasting, including CROCUS and SNOWPACK.
219: punctuation is needed to enhance the flow between temperature and we.
220-221: punctuation is needed to enhance the flow between (TSS) and we.
226: Do you end up with 4 SNOWPACK simulations per warning region? Does each simulation use the grid-cell average for one of the 4 elevation bands? Is 20 the total number per warning region or for the entire study area? A sentence that summarizes how many simulations are run per warning region is needed.
230-235: Maybe reduce these lines to one or two sentences, as they limit the comprehension of your methods. We can assume that it is included, and it complicates this section unnecessarily.
257: why explain this? Either remove it or put it into the result.
258: “based”, use past tense.
Section 3 - Methods
264: You need to state at least the main analysis and parameters; we should not need to read another paper.
265: Do you have values, or maybe a figure, to show the imbalance and the effect of the algorithm?
283: Should the F1 score give that?
Figure 4: Please adjust the font to match the manuscript, and define what “general” is. Maybe remove “true danger”, as “danger” creates confusion between danger level and avalanche problem.
Section 4 – Model performance and feature importances
This section also presents results, like Section 5.
310: the false alarm is also very high.
313: please stick to one definition either problem or danger level.
Section 5 - Results
Section 5.1.1 : please use past tense.
339 - 340 : please rephrase this sentence.
343 : maybe refer to the figure 8.
344 : be consistent with fig. Or figure.
361: Was this defined in the methods section?
Section 5.2: There are far more references to supplemental figures than to Figure 9; please put these figures into the main text. Figure S9 has more references than Figure 9. Or maybe move them to the appendix, which is more accessible.
Section 6 - Discussion
Section 6.1: How does the precision of the model affect your results, especially for the PWL?
419 - 428: I think it might be worth it to discuss these factors between the development and the trigger of the PWL.
491: Would “yrs” be better than “y”?
497: It might also be warmer conditions and thaw events stabilizing the snowpack.
Section 7 – Summary and conclusions
593: Why not write “meteorological input”, as both are spatially aggregated for input to the RF models?
597-598: I think this is rather concerning. It has also been pointed out that SNOWPACK struggles to model Arctic snowpacks because of the high thermal gradient (Domine et al., 2019).
References
Domine, F., Picard, G., Morin, S., Barrere, M., Madore, J. B., & Langlois, A. (2019). Major issues in simulating some Arctic snowpack properties using current detailed snow physics models: Consequences for the thermal regime and water budget of permafrost. Journal of Advances in Modeling Earth Systems, 11(1), 34-44.