the Creative Commons Attribution 4.0 License.
Combining snow physics and machine learning to predict avalanche activity: does it help?
Léo Viallon-Galinier
Pascal Hagenmuller
Nicolas Eckert
Abstract. Predicting avalanche activity from meteorological and snow cover simulations is critical in mountainous areas to support operational forecasting. Several numerical and statistical methods have tried to address this issue. However, it remains unclear how the combination of snow physics, mechanical analysis of snow profiles and observed avalanche data improves avalanche activity prediction. This study combines extensive snow cover and snow stability simulations with observed avalanche occurrences within a Random Forest approach to predict avalanche days at a spatial resolution corresponding to elevations and aspects of avalanche paths in a given mountain range. We develop a rigorous leave-one-out evaluation procedure including an independent test set, confusion matrices, and receiver operating characteristic curves. In a region of the French Alps (Haute-Maurienne) and over the period 1960–2018, we show the added value within the statistical model of considering advanced snow cover modelling and mechanical stability indices instead of using only simple meteorological and bulk information. Specifically, using mechanically-based stability indices and their time-derivatives in addition to simple snow and meteorological variables increases the recall from around 65 % to 76 %. However, due to the scarcity of avalanche events and the possible misclassification of non-avalanche days in the training data set, the precision remains low, around 3.5 %. These scores illustrate the difficulty of predicting avalanche occurrence with a high spatio-temporal resolution, even with the current cutting-edge data and modelling tools. Yet, our study opens perspectives to improve modelling tools supporting operational avalanche forecasting.
Status: final response (author comments only)
RC1: 'Comment on tc-2022-108', Frank Techel, 28 Jun 2022
Review "Combining snow physics and machine learning to predict avalanche activity: does it help?" by Viallon-Galinier et al.
The authors present a random-forest algorithm, which predicts the occurrence of natural avalanches running to the valley bottom in the Haute-Maurienne part of the French Alps. The algorithm is trained using a long-term record of avalanche observations, a highly unbalanced data set with 100 times more non-avalanche days compared to avalanche days. From my perspective, the novel, and certainly very challenging, aspect of this study is the prediction of (often single) avalanche events for aspect-elevation segments. The algorithm's predictive performance is characterized by recognizing many of the observed avalanche days, but having a very high false-alarm rate (only 3% of the predicted avalanche days coincided with observed avalanche days). The manuscript is well written, and most sections are easy to follow. Questions, however, arise with regard to the definition of the target variable (Sections 2.1-2.3, 2.5.1, Discussion), the stability indices for dry snow (Sect. 2.4.1), and the way the variable importance is presented and interpreted (Sect. 3.2 and Fig. 4).
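For readers less familiar with contingency-table statistics, the relation between the false-alarm rate mentioned here and the recall and precision reported in the abstract can be illustrated with a short sketch. The counts below are invented purely for illustration and are not the study's actual confusion matrix; they merely mimic a data set with roughly 100 non-avalanche days per avalanche day:

```python
# Hypothetical confusion-matrix counts for a highly imbalanced data set
# (invented numbers, chosen only to reproduce scores of similar magnitude
# to those reported in the abstract).
tp = 76      # avalanche days correctly predicted (hits)
fn = 24      # avalanche days missed
fp = 2100    # non-avalanche days predicted as avalanche days (false alarms)
tn = 7800    # non-avalanche days correctly predicted

recall = tp / (tp + fn)     # fraction of observed AvD that were predicted
precision = tp / (tp + fp)  # fraction of predicted AvD that were observed

print(f"recall    = {recall:.1%}")     # 76.0%
print(f"precision = {precision:.1%}")  # 3.5%
```

With such imbalance, even a classifier that catches most avalanche days can have a precision of only a few percent, which is the reviewer's "very high false-alarm rate".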
Please find below some comments regarding these three points. I hope these comments will be helpful in improving the manuscript.
General comments
(1) Definition of the target variable and subset used for training and testing
- You defined avalanche days (AvD) and non-avalanche days (nAvD) by aspect-elevation-segment (AE segment). For a specific AE segment, an AvD is fulfilled if at least one avalanche running to the valley bottom (below the blue line in Figure 1) was observed, while nAvD are all other days (l 148-149). If possible, please provide an indication regarding the minimal avalanche size that would be typically required to reach this run-out zone in the study area.
- Overall, I think that the description of AvD and nAvD could be improved. Particularly, what is considered a nAvD is not fully clear. Furthermore, as nAvD were 100 times more frequent than AvD, it could be valuable to use a stricter definition of nAvD, excluding for instance days when avalanche activity was uncertain (l 96-101). Not doing so will inevitably reduce the performance statistics, not because the model performs poorly, but because the target variable is uncertain.
- Some avalanche events had uncertain dating (l 96). Please indicate the number of these events.
- You removed avalanche events with an uncertainty on the release date of more than three days from the data set (l 97-98). Were these days and AE segments then treated as nAvD, or removed from the data set?
- In case the uncertainty of the release date was two or three days, you assigned the last day as the date of release (l 98-99). Did you treat the two previous days as nAvD, or were these removed from the data set? On l 146-148 you explain why the time derivatives are required and that avalanches may release when the stability is lowest. This is somewhat different to how you assigned the avalanche release date when this was uncertain.
- You state that the data set provides a "nearly exhaustive screenshot of natural avalanche activity" (l 93). To me, less than 3000 avalanches in 110 paths in 58 years do not seem exhaustive at all. Consider rephrasing this sentence, for instance to "a representative screenshot of avalanche activity of avalanches running to valley floor" or similar.
- There are 110 avalanche paths and 24 AE segments. If you consider the topographical distribution of potential start zones, are all AE segments equally often represented? For instance, the distribution in Figure 2 shows that there were 100 times more avalanches in the South aspects compared to the North-East aspects. Is this due to more start zones in South aspects or because activity was indeed higher? Providing more information on the distribution of start zones per AE segment would help the reader to understand this relationship. Consider showing the AE distribution of potential start zones in the study area, maybe in a plot similar to Figure 2. If they were distributed rather unequally, please discuss how you considered this in the analysis, and what impact this may have on the results.
- You attempt to predict both dry-snow and wet-snow avalanches with the same algorithm. I suspect that this probably contributes to the poor performance of the algorithm as a dry-snow avalanche can't be correctly predicted by a tree, which learned conditions favorable for a wet-snow avalanche, and vice versa. This should be discussed.
- Does the EPA provide information on the wetness of the avalanche? Please briefly indicate whether it did or not and if it did, why you preferred to develop one rather than two algorithms. It could also be discussed that splitting the data into wet and dry snow conditions using the simulated stratigraphy and learning two separate algorithms may have helped to address the different release mechanisms in a more appropriate manner, which would potentially also cause fewer false alarms.
- Why did you pick 15 Oct until 15 Mar as the winter season? 15 Oct seems rather early, and 15 Mar rather late. Please explain.
- Why did you use a 1 cm threshold as minimal snow depth (l 186)? Or did you use 10 cm, as stated later in the manuscript (l 299)? Both values seem rather low snow depth values considering that avalanches must be rather large to reach the run-out zones. Also along this line: how did you treat cases when there was no snow in a lower elevation band, but some snow in the highest elevation band? I suspect that avalanches running almost to the valley bottom are probably rather unlikely in these situations (-> nAvD), even if conditions in the start zone would favor avalanche release.
(2) Presentation and interpretation of variable importance (Sect. 3.2 and Fig. 4)
- Fig. 4 shows the variable importance, aggregated (summed) by groups of variables. This is a rather unusual way of presenting variable importance and makes the interpretation of the plot rather difficult. For instance, snow depth and variations (SDV) and dry snow stability indices (DSSI) have the same cumulative Gini importance (about 0.18), but the first contains 7 variables, the latter 30. This means that on average each SDV variable has a higher importance (0.18/7 = 0.025) compared to a single DSSI variable (0.18/30 = 0.006). This only becomes clear from the plot when making these calculations. This is also somewhat indicated in the text (l 259-260).
- To me, it was not intuitive, which of the 7 variables belong to snow depth and variations (SDV). I was able to figure this out after going back to Table 1. Maybe you could somewhere describe this more clearly in Table 1 and/or Figure 4? For the other variable groups, this was clear.
- Did the depth of the weak layers, described in Table 1, not play a role in the RF models? It seems to be missing in Figure 4.
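The arithmetic behind the reviewer's point on cumulative versus average importance (0.18/7 vs. 0.18/30) can be made explicit with a minimal sketch. The importances and group sizes below are invented to mirror the reviewer's example, not taken from the manuscript:

```python
# Sketch: cumulative vs. average Gini importance per variable group.
# Both groups sum to ~0.18, but SDV spreads it over 7 variables and
# DSSI over 30, so the average per-variable importance differs greatly.
groups = {
    "SDV": [0.18 / 7] * 7,     # snow depth and variations (7 variables)
    "DSSI": [0.18 / 30] * 30,  # dry snow stability indices (30 variables)
}

for name, importances in groups.items():
    total = sum(importances)
    mean = total / len(importances)
    print(f"{name}: cumulative = {total:.2f}, average = {mean:.3f}")
# SDV: cumulative = 0.18, average = 0.026
# DSSI: cumulative = 0.18, average = 0.006
```

With a fitted scikit-learn random forest, the per-variable values would come from the estimator's `feature_importances_` attribute; summing versus averaging them per group leads to the two different readings of Fig. 4 discussed above.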
(3) Variable definition (Sect. 2.4.1)
You selected the five weakest layers in each profile (l 133-136). Please explain why you used five layers and not just the weakest one. Furthermore, I wonder whether the stability of the five weakest layers isn't highly correlated. What would happen if you train the RF only with the weakest layer? Please elaborate more on how you selected the five weak layers if the local minima for Sn, Sa, Sr, and the two crack propagation indices were in five different layers, and how if they all indicated the same weak layer.
Technical comments
- l 60: consider rephrasing this sentence, as "machine learning approaches evaluation" is somewhat awkward to read
- l 63: consider replacing "of interest" with "suitable", or similar
- l 72: "in this study" could probably be deleted
- l 77: consider removing "largely"
- l 87: consider adding "was" before "extensively"
- Figure 1: please show the runout area more clearly, for instance by shading it
- l 97-98: consider rephrasing the second part of this sentence (from "the data set" to the end of the sentence)
- l 144: typo: "Considering" --> "considering"
- l 146-148: somewhat awkward to read; consider splitting or rephrasing this sentence
- l 180: consider rephrasing the beginning of this sentence to "We use two classes" or similar
- l 186: You mention that the first selection criterion causes undersampling. What impact did the second selection criterion have?
- l 207: typo: "probabilityy" --> "probability"
- l 215: consider changing "truly" to "correctly", or similar
- l 243: typo: "closed" --> "close"
- l 250: add "day" after "avalanche"
- l 298: what does "leading to strong results" mean? A precision of 3 % is not really strong. Consider rephrasing.
- Discussion: It would be rather nice to see an exemplary time series of the model predictions for one winter season for all 24 AE segments, together with the corresponding observed avalanche activity. This may help the reader to get a better impression of the correlation between avalanche activity and model predictions.
- l 351-353: this statement is correct, but maybe more importantly, this lowers the observed performance of the classifier, as AvD predictions may be counted as a false alarm when in fact there was a (smaller) avalanche
Citation: https://doi.org/10.5194/tc-2022-108-RC1
AC1: 'Reply on RC1', Léo Viallon-Galinier, 14 Oct 2022
The comment was uploaded in the form of a supplement: https://tc.copernicus.org/preprints/tc-2022-108/tc-2022-108-AC1-supplement.pdf
RC2: 'Comment on tc-2022-108', Karl W. Birkeland, 15 Aug 2022
In this paper the authors present a method using random forests to predict natural avalanches running to the valley bottom in the French Alps. Their methods appear to be solid, and the question they are trying to answer is important. In comparison to previous research, the novelty of their approach is that they make their predictions at the spatial scale of specific elevations and aspects. The paper is generally well-written and clear. I believe this research makes a valuable contribution, but I also feel there are issues that should be addressed prior to publication.
Here are a few of the major issues that I believe should be addressed:
- It would be helpful for the reader to better understand the spatial characteristics of the starting zones of the approximately 110 avalanche paths in the study area. Looking at Figure 1, it appears that most of the starting zones will have either a NW or a SE aspect. I am not sure about the distribution of the starting zone elevations. A Figure like Figure 2 (which shows the distribution of avalanche events by aspect and elevation) should be created for the avalanche path characteristics. In fact, it would be useful to pair this new Figure with Figure 2 so the reader could assess the effect of the avalanche path characteristics on the number of avalanches in each elevation/aspect zone.
- Along these same lines and again looking at Figure 1, I assume that the elevations and aspects of the avalanche starting zones are not evenly distributed in the 24 classes (three elevation and eight aspect categories). How does this affect the analyses? I understand that the authors would like to use the 24 elevation/aspect categories used in avalanche forecasts, but I wonder if it is appropriate to use all 24 categories for a dataset that appears to be unbalanced in the distribution of avalanche starting zone characteristics? How is this affecting their results?
- Another issue is the inclusion of both dry and wet snow avalanches in the same analysis. This was also pointed out by the other reviewer. Since we know that the avalanche release mechanisms for these two primary categories of avalanches are quite different, as are the meteorological factors that lead to instability, why are these included in the same analysis? Perhaps this is because both wet snow stability indices and dry snow stability indices are included? Wouldn’t it be better to split all the avalanches into “dry” and “wet” categories, and then proceed with the analysis on each of these two subsets of the data?
- The other reviewer also mentioned another issue I believe needs to be addressed. The dataset does not include all avalanches that occurred, but rather it consists predominantly of avalanches running to the valley floor. I assume these are almost all quite large avalanches. Can you provide a range of the size of the avalanches? Are they all Size 3 (on the Canadian or the U.S. destructive scale) or larger? Or perhaps size 4 or larger? What effect do the authors believe that this bias toward large avalanches has on their results?
- While the authors reference some of the more recent work on predicting avalanches with random forests, I feel like they might want to also reference some early work that attempted to better predict avalanche activity using the statistical techniques available at the time. These older papers had the more modest goal of trying to predict avalanche days (without the elevation/aspect of the starting zones), but they were a first step in this direction. This does not have to be a comprehensive review at all, but just a sentence or two with some references would be nice to see. Some older examples exist of researchers using discriminant analysis (examples: Bovis, 1977; Foehn and others, 1977), nearest neighbor techniques (example: Buser, 1983), and binary regression trees (example: Davis and others, 1992). Also, who was the first to use random forests for this type of work? Perhaps one of the authors who you already reference?
- Finally, one thing that perplexes me about this research is why new snowfall is rated so low in importance (Figure 4). This is completely different from prior research, which typically rated snowfall as the most important factor for dry avalanche release. Why do the authors believe this is the case? Is it because the "snow depth and variations" class is capturing this essential information? Or is it because this information is captured (fully or partly) in some of the stability indices? Or is it the mixing of the dry and wet snow avalanches into one dataset? It might also be related to the fact that the dataset consists of only large avalanches. What do the authors think?
Despite the above comments, I believe this is valuable research and is deserving of publication once the authors address or respond to these issues.
I have also attached an annotated PDF, which includes corrections to some typographical errors, as well as further suggestions and suggested wording changes.
I hope the authors find my comments and suggestions useful.
Karl Birkeland
Some possible older references (the authors may have other/different older references they wish to cite):
Bovis, M.J. 1977. Statistical forecasting of snow avalanches, San Juan Mountains, Southern Colorado, U.S.A. Journal of Glaciology 18(78), 87-99.
Buser, O. 1983. Avalanche forecast with the method of nearest neighbors: An interactive approach. Cold Regions Science and Technology 8, 155-163.
Davis, R.E., K. Elder, and E. Bouzaglou. 1992. Applications of classification tree methodology to avalanche data management and forecasting. Proceedings of the 1992 International Snow Science Workshop, Breckenridge, Colorado, 123-133 (available at: https://arc.lib.montana.edu/snow-science/item.php?id=1245).
Foehn, P.M.B. and others. 1977. Evaluation and comparison of statistical and conventional methods of forecasting avalanche hazard. Journal of Glaciology 18(78), 375-387.
AC2: 'Reply on RC2', Léo Viallon-Galinier, 14 Oct 2022
The comment was uploaded in the form of a supplement: https://tc.copernicus.org/preprints/tc-2022-108/tc-2022-108-AC2-supplement.pdf
RC3: 'Comment on tc-2022-108', Simon Horton, 17 Aug 2022
General comments
This study presents a statistical model to predict avalanche and non-avalanche days using a combination of weather data, modelled snowpack properties, and modelled stability indices. The model is developed with 58 years of avalanche observations from a region in France. The study is designed to examine the added value of stability indices in statistical models for avalanche activity. While statistical models have been widely developed and tested in the scientific literature, investigating how recent advances in snowpack modelling and snow mechanics could improve these models is an interesting and worthwhile objective that is well suited for The Cryosphere. My main concern is how some of the methodological choices likely impacted the results and conclusions. I also think the study missed an opportunity to present their spatially distributed results (i.e., by aspect and elevation) which could be of value to avalanche forecasters. Please see my specific comments for suggested revisions to this paper.
Specific comments
- Manuscript structure: The paper was well structured with complete and logical flow of information. The graphics were also clean and easy to interpret.
- Sampling of days to include in the study: I question some of the choices made about filtering the data set and how that impacted the results. A few things stand out as dramatically impacting the set of avalanche days and non-avalanche days that were analyzed:
- Why was the period restricted to Oct 15 to Mar 15? Doesn’t this remove a large portion of large wet avalanches from the study? What is the purpose of including wet snow stability indices when many of the wet snow avalanche days have been removed? Do you have any information about wet versus dry avalanche activity in the EPA data set? Similarly, I question how meaningful including days in October and November are for predicting full path avalanches.
- Second, the threshold of 1 cm (or 10 cm in other parts of the manuscript?) seems very low considering the avalanche observation data only considered avalanches reaching the bottom of avalanche paths. I think a larger threshold would be much more appropriate. Choosing a threshold depth for avalanches grounded in the literature or deriving one from your data set would be more appropriate (e.g., calculate the distribution of snow depths on avalanche days and choose a low percentile as a cut-off). I assume this would be on the order of 100 cm and would remove many of the non-avalanche days from the study.
- I suspect plotting the avalanche activity by day of year and snow depth would reveal informative patterns about when discriminating avalanche and non-avalanche days is actually important to avalanche forecasters. A model informing the likelihood of large natural avalanches in mid-winter and late-winter is likely much more helpful than a model informing whether the snowpack depth has reached the threshold for avalanches.
- By removing more of the uninteresting non-avalanche days, the dataset would be more balanced. This would likely diminish the obvious impacts of snow depth on the resulting models and put more weight on the stability indices, which would better suit the objective of the study.
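The reviewer's suggestion of deriving a snow-depth cut-off from the data itself could be implemented along these lines. This is only a sketch with synthetic depths; `avd_depths` is a hypothetical stand-in for the simulated snow depths on observed avalanche days, not the study's data:

```python
import numpy as np

# Synthetic stand-in for simulated snow depths (cm) on observed avalanche
# days; the study's real distribution would be used instead.
rng = np.random.default_rng(0)
avd_depths = rng.normal(loc=180, scale=50, size=500).clip(min=0)

# Take a low percentile of avalanche-day depths as the cut-off below
# which days are excluded as obvious non-avalanche days.
threshold = np.percentile(avd_depths, 5)
print(f"5th-percentile snow depth threshold: {threshold:.0f} cm")
```

Filtering out all days below such a data-driven threshold would remove many trivially stable days and rebalance the data set, as argued in the comment above.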
- Weak layer selection: The choice of always selecting 5 weak layers seems unusual and was not adequately justified. What is the benefit to this method over choosing a threshold value to identify weak layers? Could there be adverse effects to having many extra layers in the analysis that are potentially stable and uninteresting? For example, wouldn’t this diminish the importance of the stability indices compared to a dataset that only included layers that met some type of threshold stability criteria?
- Classification scores and model performance: I wonder how my previous comments impact the resulting classification scores. The precision seems very low, despite the explanation provided. I was also surprised to see the low performance of the meteo subset, as I would expect weather factors to be significantly better at predicting natural avalanche activity than a random model. Especially when considering large natural avalanches, common forecasting experience and past studies have found simple weather indices like 72 hour accumulated precipitation and air temperature to be strong influences. This makes me question the representativeness of the dataset/variables and the overall soundness of the results. Can you justify the low performance of the meteo subset in this model?
- No presentation of results by aspect and elevation: While I understand the decision to aggregate the results from different aspect and elevations to see the overall importance of input variables, I think presenting some of the aspect and elevation patterns would be of great interest as well. First, the question of how well the model can predict the location of avalanche activity would be valuable to forecasters. Second, it’s not clear whether the imbalance in the amount of avalanche days by terrain class shown in Fig. 2 impacted the results (e.g., how does the model performance compare on south aspects where there were many avalanche days versus NE aspects where there were few avalanche days).
- Writing style: I found parts of the manuscript difficult to read, with poor flow between sentences and phrases interrupted by citations. I had to read some paragraphs twice to fully understand the meaning and would appreciate additional editing to improve the readability.
Technical comments
- Title: Is “snow physics” the best way to describe the dataset in this study? It has a broad range of interpretations and when first reading the manuscript I wouldn’t have automatically assumed the main data was model-generated stability indices.
- Lines 11-12: The terms “recall” and “precision” are rather technical for the abstract and would probably have more impact if replaced with plain language descriptions (e.g., predicted X% of days when avalanches were observed), especially considering there are many synonyms for contingency table statistics and some readers may not be familiar with these specific ones.
- Line 20: “Human infrastructure” is an unusual term and could probably be described better.
- Line 19-23: These first few sentences are examples where the position of citations interrupts the readability.
- Line 42: The phrase “delimitation lines around avalanche-prone conditions” is verbose and could be more concise and clear.
- Lines 50-52: Nice context and motivation for this study!
- Line 52: I question whether adding mechanical stability indices would “reduce the complexity of statistical tools”. These tend to be relatively complex variables dependent upon many other parametrized variables, and in my view are more complex than a simple model based on variables like snow depth and air temperature. I suggest removing “reduced complexity” and directly stating what is meant by complexity (i.e., models with fewer variables and interactions).
- Lines 62-63: This important sentence stating the objective of the study should be written to be more clear and specific. I had to read this multiple times and was still unclear on the big picture aim of the study.
- Line 72: remove “an” from “study an area”
- Line 75: What is meant by a “series of events” being reliable? Is this referring to reliable observations of the events?
- Line 80: Please justify this date range. As mentioned above, the early part of this range likely contains many uninteresting non-avalanche days and the late part of this range omits large spring avalanches. This date range criteria could be dramatically influencing the results and their interpretation.
- Line 88: Can you comment on the typical size of the avalanches that reach the run-out threshold (e.g., using the EAWS scale https://www.avalanches.org/standards/avalanche-size/)? This would help readers better understand the type of avalanches this model predicts. Also, are all these avalanches natural, or are any of the paths modified or controlled with explosives (because that would impact the representativeness of the snowpack model)?
- Line 98: Please describe how avalanche date uncertainty is defined. Do observers estimate a range of dates?
- Line 116: Was the entire study area treated as a single massif in SAFRAN or was SAFRAN run for each municipality? If a single massif, why is it meaningful to show the three municipalities in Fig. 1?
- Line 131: I think a bit more detail about these indices could be included in this section rather than referring to another paper. Providing equations and/or describing some of the key snowpack outputs used to calculate strength and stress would be valuable. Also, the only reference for Viallon-Galinier (2021) in the reference list is https://doi.org/10.1016/j.coldregions.2020.103163, but I think these citations are intended to refer to https://doi.org/10.1016/j.coldregions.2022.103596 which is not listed.
- Line 133: The choice to select five weak layers from every profile is not adequately justified. Also see my specific comment about how this may impact the results. Also, when defining the local minimum is one layer identified for each separate indices or is there some type of weighted average? If the former case, are there situations where a layer may be duplicated because it is the minimum for multiple indices?
- Sect 2.4.3: I really like the addition of these time derivatives and think it is an interesting part of the study!
- Sect 2.5.1: With such a rich observation dataset I wonder why the simplest binary metric for avalanche activity was chosen. I would expect between the large set of avalanche observations and the types of stability indices included in the models you could try to predict more advanced indicators such as weighted avalanche activity indices, percentage of paths in an aspect-elevation sector that released, etc. The chosen indicator is fine, but perhaps the choice could be justified a bit more.
- Line 160: Be careful with using the term “the model” throughout the paper when both the physical snowpack model and statistical model are part of the study.
- Table 1: I appreciate this concise summary of model inputs. Minor corrections: the depth of dry-snow weak layers is listed in consecutive rows, units are provided in different columns, and column 2 is missing a title.
- Lines 180-190: Are there also concerns about the imbalance in the aspect-elevation data? For example, based on Fig. 1 and 2 I assume the number of start zones per sector are variable, so is it reasonable to have an equal number of data points for NE and S aspects in the analysis?
- Line 185: A 1 cm threshold seems very small for full path avalanches.
- Sect. 2.6: I like the LOYO validation approach used in this study and it is well described here. One minor comment is why was the 20 to 80th percentiles chosen when 25-75, 10-90 or 5-95 percentile ranges are more common?
- Fig. 3: Please specify the range of uncertainty in the caption (i.e., 20th to 80th percentile).
- Line 260: Here and in Fig. 4 a new way of grouping the variables is introduced which differs from Table 2. I can track how these counts arise, but it could be clearer.
- Line 257: What is meant by new snow variations? This sounds like change in snow depth, which is not a variable listed in Table 1. Also, I would consider separating the snow depth from variations in Fig. 4 to see how much of the predictive power was simply due to snow depth reaching the threshold for avalanches versus how much was due to detecting snow depth changes over shorter time intervals.
- Fig 4: I suggest sorting the rows by WSSI and DSSI rather than time step to more clearly show the impact of different step sizes.
- Line 294: Please describe the context referred to in Rubin et al. (2012); I am curious how such low precision has been justified in other studies rather than highlighting some type of issue with how the study was designed.
- Line 300: I disagree that the obvious non-avalanche days have been removed (see Specific comments).
- Lines 310-318: While I understand how the model is built with aspect-elevation specific inputs, I think presenting some of the terrain-specific results would be a highly interesting part of the study.
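Several of the comments above concern the leave-one-year-out (LOYO) evaluation and the 20th-80th percentile bands. A minimal sketch of such a procedure using scikit-learn is given below; the data, group labels, and model settings are invented placeholders, not the study's actual setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score

# Invented stand-in data: imbalanced binary target (~5% positives),
# five placeholder predictors, and a year label per sample.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 5))
y = rng.random(n) < 0.05
years = rng.integers(2000, 2010, n)

# Leave-one-year-out: train on all other years, score on the held-out year.
scores = []
for year in np.unique(years):
    test = years == year
    clf = RandomForestClassifier(n_estimators=50, class_weight="balanced",
                                 random_state=0)
    clf.fit(X[~test], y[~test])
    scores.append(recall_score(y[test], clf.predict(X[test]), zero_division=0))

# Percentile band across held-out years (20th-80th, as in the manuscript).
lo, hi = np.percentile(scores, [20, 80])
print(f"recall, 20th-80th percentile across years: {lo:.2f}-{hi:.2f}")
```

On the random noise used here the recall is of course near zero; the point is only the structure of the evaluation: one score per held-out year, summarized by a percentile band rather than a single mean.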
Citation: https://doi.org/10.5194/tc-2022-108-RC3
AC3: 'Reply on RC3', Léo Viallon-Galinier, 14 Oct 2022