Consistency in assigning an avalanche danger level when forecasting or locally assessing avalanche hazard is essential but challenging to achieve, as relevant information is often scarce and must be interpreted in light of uncertainties.
Furthermore, the definitions of the danger levels, an ordinal variable, are vague and leave room for interpretation.
Decision tools developed to assist in assigning a danger level are primarily experience-based due to a lack of data.
Here, we address this lack of quantitative evidence by exploring a large data set of stability tests (

Consistent communication of regional avalanche hazard in publicly available avalanche forecast products is paramount for avoiding misinterpretations by the users

the probability of avalanche release,

the frequency and location of the triggering spots, and

the expected avalanche size.

The probability of avalanche release, or “sensitivity to triggers” as termed in the Conceptual Model of Avalanche Hazard (CMAH;

The frequency and location of the triggering spots is typically unknown.
So far, it can only be assessed with laborious, extensive sampling (e.g.,

Finally, avalanche size is defined on a scale ranging from 1 to 5, relating to the destructive potential of an avalanche (e.g.,

The EADS descriptions of the key factors for each of the five categories of danger level leave ample room for interpretation and are even partly ambiguous.
This may be a major reason for inconsistencies noted in the use of the danger levels among individual forecasters or field observers and, even more prominently, among different forecast centers and avalanche warning services

The same danger level can be described with different combinations of the three factors.
To improve consistency in the use of the danger levels, a first decision aid, the Bavarian Matrix, was adopted by the European Avalanche Warning Services (EAWS) in 2005.
The Bavarian Matrix, a look-up table, combined the frequency of triggering locations with the release probability.
In 2017, an update of the Bavarian Matrix, now called the EAWS Matrix, was presented that additionally incorporates avalanche size

Challenges in the improvement of these decision support tools include the fact that the three key factors characterizing avalanche danger are not clearly defined and hence are poorly quantified

All the data described below were recorded for the purpose of operational avalanche forecasting in Norway (NOR; Norwegian Water Resources and Energy Directorate NVE) or Switzerland (SWI; WSL Institute for Snow and Avalanche Research SLF). In the vast majority of cases, these observations were provided by specifically trained observers, belonging to the observer network of either the Norwegian or the Swiss avalanche warning service.

For the analysis, we rely primarily on the Swiss data, using the Norwegian data for comparison and validation. Nevertheless, we will occasionally present results for Swiss and Norwegian data side by side.

The avalanche danger level is an estimate at best, as there is no straightforward operational verification.
Whether assessing the danger level in the field or in hindsight, it remains an expert assessment

We rely on the local danger level estimate provided by specifically trained observers.
In both countries, this estimate is based on the observations made on the day and on other information considered relevant

In this study, we make use of local estimates for dry-snow conditions only.
Each stability test or avalanche observation was linked to a danger rating as described below (Sect.

Data overview.

Operationally available information directly related to snow instability includes simple field observations as well as snowpack stability tests

The rutschblock (RB) test is a stability test, ideally performed on slopes steeper than 30

The extended column test (ECT) is a stability test that provides an indication of crack propagation propensity

Each stability test was linked to a danger rating relating to dry-snow conditions.
We considered the danger rating that was transmitted together with the snow profile or stability test (in text form, SWI) to be the most relevant.
In the Swiss data set, this danger rating was replaced for stability tests observed on days and in warning regions for which a verified regional danger rating existed (Sect.

The Swiss RB data set comprised 4439 RBs, observed mainly on NW-, N-, and NE-facing slopes (67 %) at a median elevation of 2380 m a.s.l. (interquartile range IQR 2160–2565 m) and a median slope angle of 35

As part of the daily observations, observers (and occasionally the public) reported avalanches observed in their region.
Avalanches can be reported not only individually but also by summarizing several avalanches into one observation.
While individual avalanches were reported in a similar way in SWI and NOR, the reporting of several avalanches differed.
In SWI, observers reported the number of avalanches of a given size. In all reporting forms, information about the wetness and trigger type could be provided.
In NOR, observers reported the avalanche size, trigger type, and wetness that were typical for the situation and described the observed number of avalanches using categorical terms (single 1, some 2–4, many 5–10, numerous

The analysis was restricted to dry-snow avalanches, where the trigger type was either natural release or human-triggered. These avalanches were linked to a dry-snow local danger rating for the release date of the avalanche(s) in the same warning region.

To enhance the quality of the data, we filtered out observations that we believe may indicate errors in the local estimate of the danger level or of avalanche size.
To this end, we calculated the avalanche activity index (AAI;
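An avalanche activity index is typically computed as a size-weighted sum of the observed avalanches. A minimal sketch, assuming the commonly used powers-of-ten size weighting (the exact weights used in this study may differ):

```python
# Sketch of an avalanche activity index (AAI) computation.
# The size weights follow a common convention (an assumption here,
# not necessarily the weights used in this study).
SIZE_WEIGHTS = {1: 0.01, 2: 0.1, 3: 1.0, 4: 10.0, 5: 100.0}

def avalanche_activity_index(avalanches):
    """Weighted sum over observed avalanches.

    `avalanches` is a list of (size, count) pairs for one day
    and warning region.
    """
    return sum(SIZE_WEIGHTS[size] * count for size, count in avalanches)

# Example: ten size-1, three size-2, and one size-3 avalanche
print(round(avalanche_activity_index([(1, 10), (2, 3), (3, 1)]), 2))  # → 1.4
```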

The total number of avalanches that remained was 33 262 in Switzerland, representing 6610 cases (different days and/or different warning regions), and 5755 in Norway (1618 cases; Table

Snowpack stability is one of the three contributing factors of avalanche hazard and relates to the probability of avalanche release.
In the following, we describe how we classified the results of the snow instability tests into the four stability classes (

Rutschblock (RB) test results were classified into the four stability classes according to Fig.

Extended column test (ECT) results were classified relying on the classification recently suggested by

If failures in several weak layers were induced in a single stability test, the test results were classified for each failure layer.
For this, we considered the failure as not relevant (rating the test result as good) if a failure layer was less than 10 cm below the snow surface
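The relevance rule above can be sketched as a simple filter. The function name and the single depth criterion are illustrative; the passage may state further criteria that are not reproduced here:

```python
# Sketch of the relevance filter for failure layers (illustrative;
# implements only the depth criterion stated in the text).
def classify_failure(depth_cm, test_class):
    """Return the stability class for one failure layer.

    Failures less than 10 cm below the snow surface are considered
    not relevant, and the test result is rated 'good'.
    """
    if depth_cm < 10:
        return "good"
    return test_class

print(classify_failure(5, "very poor"))   # → good
print(classify_failure(40, "very poor"))  # → very poor
```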

Stability classification of

The second factor contributing to avalanche hazard is the frequency of potential triggering locations or of snowpack stability.

To determine the frequency distribution of point snow instability within a defined region at a given danger level, many stability test results from a single day are generally needed

We randomly selected

For each of the

The second important parameter when bootstrap sampling is the number

These simulations are compared to a small number of days when more than six RB tests (
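The bootstrap-sampling workflow described above can be sketched as follows. The rating pool and the parameter values (`n_tests`, `n_samples`) are hypothetical placeholders, not the study's data or settings:

```python
import random

# Sketch of the bootstrap-sampling workflow: draw n_tests stability
# ratings with replacement from all ratings observed at one danger
# level, and record the proportion rated "very poor" in each sample.
def bootstrap_proportions(ratings, n_tests=25, n_samples=1000, seed=42):
    rng = random.Random(seed)
    proportions = []
    for _ in range(n_samples):
        sample = [rng.choice(ratings) for _ in range(n_tests)]
        proportions.append(sample.count("very poor") / n_tests)
    return proportions

# Hypothetical pool of ratings at one danger level
pool = (["very poor"] * 10 + ["poor"] * 20
        + ["fair"] * 40 + ["good"] * 30)
props = bootstrap_proportions(pool)
print(min(props), max(props))
```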

Schematic representation of the workflow for bootstrap sampling and frequency class definition.

Currently, neither well-defined terms to describe frequency classes (such as a few or many) nor thresholds to differentiate between the classes exist.
In the following, we therefore introduce a data-driven approach to define class intervals that we will use to describe the frequency of a certain snowpack stability class.
We considered the following points:

Classes should be defined based on the snowpack stability class most relevant with regard to avalanche release, hence the frequency of the class very poor. Even though the focus is on the proportion of very poor snowpack stability, classes need to capture the entire possible parameter space, i.e., from very rare to virtually all (1 % to 99 %).

The number of classes should reflect the human capacity to distinguish between them. We explored three, four, and five classes only, as these are the numbers of classes currently used to describe and communicate avalanche hazard and its components (e.g., three spatial distribution categories in the CMAH, four frequency terms in the EAWS Matrix, five danger levels, five avalanche size classes).

Classes must be sufficiently different to ease classification by the forecaster as well as communication to the user. And, if quantifier terms were assigned to these classes, these terms would need to unambiguously describe such increasing frequencies. An example of such a succession of five terms is

When assigning a danger level, the information relating to snowpack stability and the frequency distribution of snowpack stability needs to be combined with avalanche size. As we do not have data describing the three factors relating to the same day and region, we used a simulation approach by assuming that the distribution of the observed data represents the typical values and ranges at a specific danger level. Randomly sampling and combining a sufficient number of data points results in typical combinations of the three factors according to their presence in the data but may also produce a small number of less likely combinations.
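This combination step can be sketched as follows: for each danger level, a frequency class and a largest avalanche size are drawn independently from their respective distributions. The distributions below are hypothetical placeholders, not the observed ones:

```python
import random

# Sketch of the simulation approach: repeatedly sample a frequency
# class and a largest avalanche size according to their (here
# hypothetical) distributions, and tally the resulting combinations.
def sample_combinations(freq_classes, freq_weights, sizes, size_weights,
                        n=10000, seed=1):
    rng = random.Random(seed)
    combos = {}
    for _ in range(n):
        combo = (rng.choices(freq_classes, freq_weights)[0],
                 rng.choices(sizes, size_weights)[0])
        combos[combo] = combos.get(combo, 0) + 1
    return combos

# Hypothetical distributions for 1-Low
combos = sample_combinations(
    ["none or nearly none", "a few"], [0.7, 0.3],  # frequency classes
    [1, 2], [0.8, 0.2],                            # largest avalanche size
)
most_frequent = max(combos, key=combos.get)
print(most_frequent)
```

With these weights, frequent combinations dominate the tally, while rarer but plausible combinations still occur a small number of times, mirroring the behavior described above.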

We made use of the simulated frequency distributions of snowpack stability and their respective frequency class (Sect.

We first present the findings relating to the three contributing factors and their combination making use of Swiss rutschblock and avalanche data (Sect.

We analyzed the stability distributions obtained with the RB test at danger levels 1-Low to 4-High (Fig.

Distribution of stability ratings for the stability tests

Here, we describe the four frequency classes based on the frequency of very poor stability as sampled from the stability distributions shown in Fig.

Using four frequency classes, and labeling them

Large proportions of very poor stability (e.g.,

The correlation between the frequency class describing the frequency of very poor stability and the danger level was strong (
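An association between two ordinal variables such as these can be quantified, for example, with Kendall's rank correlation. The text does not state which statistic was used, so the sketch below (with hypothetical paired data) is illustrative only:

```python
# Sketch of Kendall's tau-a between the frequency class (coded 0-3)
# and the danger level (coded 1-4); the paired data are hypothetical.
def kendall_tau(x, y):
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

freq_class = [0, 0, 1, 1, 2, 2, 3, 3]  # hypothetical, coded ordinally
danger = [1, 1, 1, 2, 2, 3, 3, 4]      # hypothetical danger levels
print(round(kendall_tau(freq_class, danger), 2))  # → 0.71
```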

Distribution of the danger levels for the four frequency classes describing the proportion of very poor snowpack stability, derived from sampling 25 rutschblock tests (as described in Sect.

Frequency classification derived from the proportion of very poor stability ratings, using four frequency classes. The intervals for the frequency of very poor stability are shown.

Most avalanches in the Swiss data set were size 1 (Fig.

Considering the size of the largest reported avalanche per day and warning region showed that this largest avalanche was most frequently size 2 for 1-Low and 2-Moderate, a mix of size 2 and size 3 at 3-Considerable, and size 3 at 4-High (Fig.

Note that we did not explore days with no avalanches as we were interested in the size of avalanches and not their frequency. The frequency component is addressed using the frequency of locations with very poor stability as a proxy.

Size distribution of dry-snow avalanches, which released naturally or were human-triggered for danger levels 1-Low to 4-High, showing all avalanches

Assuming that the stability class very poor corresponds to the actual trigger locations, we combined the snowpack stability class, the frequency of this stability class, and avalanche size.
Hence, this combination considers all three key factors characterizing the avalanche danger level.
The resulting simulated data set contained the following information: danger level, frequency class describing occurrence of very poor stability, and largest avalanche size.
These data looked like the following, here for 1-Low:
Sample 1 – 1-Low, a few, largest avalanche size 1
Sample 2 – 1-Low, none or nearly none, largest avalanche size 2
Sample 3 – 1-Low, a few, largest avalanche size 1

Table showing the combination of the frequency class of very poor snowpack stability and the largest avalanche size for the four danger levels. Frequencies are rounded to the full percent value. Bold values highlight the most frequent combination; “–” indicates that these combinations did not exist.

Finally, we present a data-driven look-up table to assess avalanche danger (Fig.

The first matrix (Fig.

The second matrix (Fig.

To derive the danger level, these two matrices can be used as follows:

In the stability matrix (Fig.

The resulting letter is transferred to the danger matrix (Fig.

The most frequent danger levels that were typical for this combination are shown.

Data-driven look-up table for avalanche danger assessment (similar to the structure proposed by

Data behind the matrices shown in Fig.

For the main results, presented in Sect.

In addition to the RB test, we explored stability distributions derived from ECT results, performed not only in Switzerland but also in Norway, at 1-Low to 4-High (Fig.

The proportion of poor-rated ECTs increased from 10 % at 1-Low to 28 % at 3-Considerable, while the proportion of the two most unfavorable stability classes combined rose from 16 % to 42 %.
At 4-High, where very few ECTs were observed, only the combined proportion of the two most unfavorable classes showed this increasing trend (61 %; Fig.

In comparison to the RB test (Fig.

The avalanche size distributions in Sect.

In Norway, size 1 was the most frequently reported size at 1-Low, while size 2 avalanches were the most frequent size at 3-Considerable and 4-High (Fig.

Considering the largest avalanche per day and warning region, Norway (Fig.

To obtain a variety of frequency distributions of point snow instability, we sampled stability ratings as described in Sect.

The results shown in Sect.

Simulated proportions of very poor and good snowpack stability derived from RB tests for different number of samples

When introducing the bootstrap-sampling approach to create a range of plausible stability distributions (Sect.

Comparing the bootstrap-sampled distributions with actually observed distributions of stability ratings on the same day and in the same region (

In all examples shown in Fig.

Comparison of observed (points,

Relevant parameters for the definition of class intervals, as introduced in Sect.

The correlation between the frequency class and the danger level increased with increasing

In the following, we discuss our findings in the light of potential uncertainties linked to the data (Sect.

Stability tests conducted by specifically trained observers are often performed at locations where the snowpack stability is expected to be low, though in an environment where spatial variability in the snowpack can be high

At 4-High, stability test data were limited: not only are these situations rare and often short-lived, but backcountry travel in avalanche terrain is also dangerous and therefore not recommended. As a consequence, considerably fewer field observations were made, and the snow profiles were dug on less steep slopes at lower elevations, which may lead to an underestimation of snow instability.

We relied on observational data recorded in the context of operational avalanche forecasting.
This means that differences in the quality of single observations are possible.
For instance, variations in both the estimation of avalanche size

Completeness of observations is another issue.
Avalanche recordings are generally incomplete, in the sense that not all avalanches within an area are recorded and that single observations may lack information, e.g., on size.
However, the size distributions (Fig.

To address potential bias in observations linked to Swiss observational standards

Finally, stability test results, avalanche observations, and local danger level estimates are generally not independent from each other, as often the same observer provided all this information.
However, as shown by

We relied on existing RB and ECT classifications (RB –

We could not rely on a large number of stability tests observed on the same day in the same region, which is a general problem in avalanche forecasting.
We therefore generated stability distributions using resampling methods (Sect.

Repeated sampling from small data sets may underestimate the uncertainty associated with a metric, but more importantly, the question must be raised whether the sample reflects the population well.
While at 1-Low to 3-Considerable, we sampled from between 700 and 2000 RB stability ratings per danger level, at 4-High the number of observations was very small (

Comparing the distributions of our snowpack stability classes with the characteristic stability distributions obtained during the verification campaign in Switzerland in 2002 and 2003, some differences can be noted (Swiss RB data).
For instance, the combined proportion of very poor and poor was about 15 % at 2-Moderate and about 40 % at 3-Considerable, which is lower than the findings (20 %–25 % and about 50 %, respectively) by

In addition to simulating snowpack stability distributions using a resampling approach, we developed a data-driven classification of the proportion of very poor stability tests.
Our approach shows that the number

Assigning a class to the proportion of very poor stability, however, was affected by

The preferred number of classes

We showed an increasing frequency (or number of locations) of very poor snowpack stability with increasing danger level, in line with previous studies exploring point snowpack stability within a region or small basin

We explored primarily the frequency of the stability class very poor, which is most closely related to actual triggering points.
However, as several studies have shown, even when stability tests suggested instability, often only some of the slopes were in fact unstable and released as an avalanche

The most frequent avalanche size had little discriminating power, with the typical size being size 1 or size 2, regardless of danger level.
This can be explained by the fact that larger events normally occur less frequently than smaller events.
This frequency–magnitude relation has also been observed for other natural hazards

We showed that considering the largest avalanche per day resulted in a slightly better discrimination between danger levels.
This finding is also supported by

For danger level 5-Very High, for which we had no data, other studies have shown a further shift towards size 4 avalanches.

In Sect.

Our approach can only provide general distributions observed under dry-snow conditions.
The look-up table presented in Fig.

We explored observational data from two different countries relating to the three key factors describing avalanche hazard: snowpack stability, the frequency distribution of snowpack stability, and avalanche size.
We simulated stability distributions and defined four classes describing the frequency of potential avalanche-triggering locations, which we termed none or nearly none, a few, several, and many. The observed and simulated distributions of stability ratings derived from RB tests showed that locations with very poor stability are generally rare (Figs.

Our findings suggest that the three key factors did not distinguish equally prominently between the danger levels:

The proportion of very poor or poor stability test results increased from one danger level to the next highest one (Figs.

Considering the largest observed avalanche size per day and warning region was most relevant to distinguishing between 3-Considerable and 4-High (Fig.

To combine the three factors and to derive avalanche danger, we introduced two data-driven look-up tables (Fig.

We hope that our data-driven perspective on avalanche hazard will allow a review of key definitions in avalanche forecasting such as the avalanche danger scale.

The data are available at

FT designed the study, conducted the analysis, and wrote the manuscript. KM extracted the Norwegian data. KM and JS repeatedly provided in-depth feedback on the study design and analysis and critically reviewed the entire manuscript several times.

The authors declare that they have no conflict of interest.

We thank the two reviewers, Simon Horton and Karl Birkeland, for their detailed and very helpful feedback, which greatly improved this paper.

This paper was edited by Guillaume Chambon and reviewed by Karl W. Birkeland and Simon Horton.