Discussion of “Brief communication: A numerical tool for averaging large data sets of snow stratigraphy proﬁles useful for avalanche forecasting” Author response 1

Abstract. Snowpack models can provide detailed insight about the evolution of the snow stratigraphy in ways that is not possible with direct observations. However, the lack of suitable data aggregation methods currently prevents the effective use of the available information, which is commonly reduced to bulk properties and summary statistics of the entire snow column or individual grid cells. This is only of limited value for operational avalanche forecasting. To address this challenge, we present an averaging algorithm for snow profiles that can effectively synthesize large numbers of snow profiles into a meaningful overall perspective of the existing conditions. Notably, the algorithm enables compiling of informative summary statistics and distributions of snowpack layers, which creates new opportunities for presenting and analyzing distributed and ensemble snowpack simulations.


1. the medoid profile, which identifies the one profile from the profile set that represents the group the best (Herla et al. 2021). While this medoid approach is computationally very expensive and inefficient, it has been shown to perform equally well or better than other more sophisticated and efficient sequence aggregation methods (Paparrizos and Gravano 2015). Besides the algorithm presented in this manuscript, the medoid approach is the only aggregation method that has been applied to snow stratigraphy.
2. the average profile, computed as in Sect. 2 and 3.1 of this manuscript (default algorithm).
3. the average profile, computed as in Sect. 3.2 of this manuscript (timeseries algorithm).
We performed this quantitative comparison of methods for every 7th day of the season. While the averaging calculations took less than 30 minutes, the medoid calculations for the 32 days took 28 hours. Despite this immense difference in computational demand, both averaging approaches yielded very similar root mean squared errors (RMSE) compared to the medoid approach ( Fig. 1). This suggests that-in the presented experiment-the performance of the aggregation is more influenced by the specifics of the profile set than the peculiarities of the aggregation algorithm. Given the other disadvantages of the medoid approach (ll. 31-37) and the considerable advantage of the averaging algorithm in making underlying distributions of the profile set accessible (ll. 116-118), the averaging algorithm becomes the obvious winner of the direct comparison.
Besides that quantitative comparison experiment with another profile aggregation method, we qualitatively assessed the value of the algorithm both in an academic context as well in an operational context. Two subsequent and currently ongoing research projects use the algorithm as data exploration tool. First, a large-scale validation study uses the algorithm (timeseries implementation) to provide context to detailed comparisons between human assessment data and snowpack simulations. In that context, the algorithm yielded meaningful results for multiple seasons and different elevation bands (i.e., captured most prevalent weak layers, median quantities like snow height, new snow amounts, etc, and did not yield any obvious erroneous artifacts). While these observations are qualitative for now, that project will generate more quantitative estimates of the capability of the algorithm to capture the predominant weak layers of the underlying profiles. Second, an ongoing clustering project aims to combine clustering methods with the averaging algorithm. In that context the algorithm (default implementation) is used extensively to summarize the conditions of profile clusters and produces aggregations that satisfy our demands for presenting the general snowpack structure on a high level as well as capturing the presence/absence of slab and weak layers. Finally, the algorithm (default implementation) has been used in an operational product at Avalanche Canada during the 2021/22 season to summarize profiles from predefined zones. In that application, the average profile helped forecasters to get a familiar overview of the general snowpack structure in that zone before consulting the We will elaborate on the testing of the algorithm in our revised manuscript and include the comparison of the averaging algorithm and the medoid approach.

Capabilities of the algorithm during wet snow conditions and melting
Referee Comment: Section 3.2 and Figure 3 show an example of an average snow profile over the course of a season. This example is helpful as it nicely illustrates the potential of the presented algorithm for the analysis of snow-cover simulations at a regional scale. However, from the perspective of a potential user of this algorithm, it would be useful if you could address the following two points: (1) In this example, the early part of the season is presented, but the melting season is missing. This makes me wonder whether the algorithm works equally well in spring when the snowpack height and the number of simulated layers decrease with increasing wetting. As the first wetting of the snowpack is highly relevant for forecasting wet-snow avalanches, snowpack characteristics like the advance of the wetting front are very important pieces of information (e.g. Wever et al., 2018). -I suggest expanding the average profile shown in Figure 3 well into spring; or, in case the algorithm is less reliable during the melting period, to mention this limitation.
Author Response: Thank you very much for bringing this up! The algorithm can indeed be used for melt season conditions equivalently as shown for midseason conditions. Since our operational data archives for the meteorological forcing end in April each year, it is impossible for us to simulate the snowpack until it is fully melted out. However, we extracted a set of grid points from lower elevations to demonstrate the melt season capabilities of the algorithm. All of the newly selected 46 grid points (from below treeline) become isothermal before the end of April (Fig. 2d-f show the individual grid points at March 23, March 25, and April 20, respectively). Similarly to the performance of the algorithm with mid-season profiles, the average time series precisely follows the median snow height also during melting of the snowpack (Fig. 2c). Furthermore, it also allows for monitoring the wetting of the snowpack, as it continues to follow the median depths of layers (such as the wetting front) (Fig. 2c). In our example, all grid points were entirely sub-freezing before March 23, when warmer air masses (Fig. 2a), cloud cover, and small amounts of liquid precipitation (Fig. 2b) led to the first wetting of the snow surface (Fig. 2c, e). In the consecutive month, the median depth of the wetting front remained constant at roughly 30 cm. A slightly more pronounced rain event on April 19 led to most grid points being entirely isothermal (with a frozen surface crust) (Fig. 2c, f).
We emphasize that our algorithm purely aggregates the information from the underlying grid points. We do not draw any conclusions about the performance of the individual grid points on accurately capturing snow conditions during wetting of the snowpack.
We will demonstrate the algorithm's capabilities of supporting forecasting for wet snow conditions in our revised manuscript.

Include summary of observed weak layers
Referee Comment: (2) I personally would have greatly appreciated if this example would have been supported with the (observed) weak layer summary in the region. From what I remember, the main author presented such a comparison at a conference last year (Herla et al., 2021), showing that the average profile captures most of the weak layers tracked by the field observers. While not a full validation, this would help the reader to understand that the average snow profile, synthesizing snowpack simulations driven with an NWP model, captures the most important snowpack features in the region.
Author Response: We agree with the referee that a brief summary of the observed weak layers in the region will make the presented example more tangible for readers. We will include it in our revised manuscript by highlighting observed weak layers in Fig. 3 of the original manuscript. Responses to Referee #2 (Christoph Mitterer)

General Comment
Referee Comment: Dear editor, dear authors, the presented brief communication by Florian Herla and collegues describes the use of a specific averaging technique, the Dynamic Time Warping Barycenter Averaging (DBA) in the field of Dynamic Time Warping (DTW) with pure focus on analysing modelled snow stratigraphy. While the appraoch and methods are not novel, it is the first time that these set of methods was applied to modelled snow stratigraphy and to the field of avalanche forecasting. Large parts of the devleopped DTW method for modelled snow stratigraphy were presented in an earlier manuscript by Herla et al (2021). The focus and added value on the presented manuscript compared to the already published content by Herla et al (2021) is on (1) the newly added averaging technique DBA (section 2), (2) some added features on the layer matching appraoch (lines 73-79) and (3)  Even though parts of the content were already described in Herla et al (2021), I like the idea of this brief communication since the authors focus more on the quality of the results while within the other publication the architecture of the algorithm covered most parts of the reading. Nevertheless, I would expect a little more quantitative presentation on some of the descriptions, which leads me to my four general comments that may improve the quality of the manuscript:

[continued below]
Author Comment: Thank you for your encouraging review and your thoughts on improving the manuscript, we much appreciate it. Upon changing of the manuscript type, we will be able to elaborate on some explanations in slightly more detail and context. Thank you for bringing this up! 2.2 Improve Figure 1 Referee Comment: As stated, Figure 1 is a bit confusing and hard to understand. Could you maybe use less profiles in between and describe the workflow a bit more in detail within the graph. In addition, add some more description within the caption.
Author Response: Yes, we will remove some profiles to make the figure less overwhelming, and we will describe the workflow within the caption.

Elaborate on influence of initial conditions
Referee Comment: You state that for the DBA it is essential to start the interation by choosing initial condition profiles strategically (line 89). How influential is that condition of the initial profile? Or with other words, if I miss to chose my starting position carefully, does the algorithm support me and is able to find weak layers that I just missed when picking the starting conditions. Can you quantify that by adding some noise to your initial profile?
Author Response: The initial condition profile can indeed have substantial influence in the resulting average profile. It is well possible that a weak layer that exists in the majority of profiles but not in the initial condition profile is not captured in the final result. It is therefore crucial to select the initial condition profile with care, and to re-run the algorithm for several different initial conditions (l. 104). If a prevalent weak layer is not included in the initial condition profile, the odds that this layer will be present in the final average profile depend on (with increasing influence) • the distribution of this layer in the profile set: The more profiles contain this layer, the more likely it is that it will be present in the final result, because more opportunities exist that this layer will be aligned onto the same reference layer.
• the thickness of this layer: The thicker the layer, the more likely it is that it will be present in the final result, because of the same reason as above. Note that this factor is often invariant, because many weak layers are thin.
• the location of this layer in the profiles: The more particular or specific the adjacent layers of the weak layer, the more likely it is that it will be in the final result. The underlying snow profile alignment algorithm (Herla et al. 2021) that the averaging algorithm builds upon optimizes not only the matching of individual layers, but the matching of entire layer sequences. Particular layer sequences adjacent to individual weak layers can therefore be thought of as anchor points during layer matching that tremendously increase the odds that the entire neighborhood of layers will be matched correctly and thus included in the average profile.
So, the main factors that influence whether a thin weak layer is included in the average profile are (i) distribution of the layer, (ii) the specificity of neighboring layer sequences, and (iii) the initial condition profile. While the first two factors are attributes of the data set, the initial condition is the only factor that can be tuned.
For example, assume a weak layer exists in 40 % of grid points within a thick sequence of unspecific bulk layers. If the occurrence frequency threshold to include that weak layer in the average profile (ll. 83-84) is set to 30 % and the initial condition profile does not contain that layer, then the odds that this layer will be present in the average profile tend to zero, because all four important factors are adverse: the layer only exists in a few more profiles than the minimum threshold, it is very thin, the bulk layer sequences around the weak layer are not specific but can be found in many other locations of the profiles as well, and the initial conditions do not contain the layer. However, either if • that particular layer is present in the initial condition profile, or • that particular layer is located next to a particular snowpack feature, for example a crust that is present in most grid points, then the odds that this layer is present in the final result are a lot higher. Figure 3 visualizes the example and illustrates these statements: A scarcely distributed surface hoar layer can be found roughly 20 cm below the new snow within the bulk layers ( Fig. 3a-layer emphasized by slightly more salient color). Five out of six initial condition profiles that do contain that layer (Fig. 3b) lead to average profiles that also contain that layer (Fig. 3c). By contrast, using an initial condition profile that misses that layer (Fig. 3d) results in an average profile that also does not contain that layer (Fig. 3e). If, however, the grain type sequence around the surface hoar layer were more particular-we model that by artificially ingesting a crust layer below the surface hoar layer, or where non-existent somewhere within the bulk layers (Fig. 3f)-the resulting average profile does contain both crust and surface hoar layer (Fig. 3g), although both were not present in the initial condition profile (Fig. 3d).
We emphasize that our algorithm per default picks (multiple) suitable initial conditions by itself, so accidentally picking only unsuitable starting conditions is highly unlikely. As stated in the manuscript, it searches for appropriate profiles within the interquartile range of snow height (Fig. 3a), and selects profiles with an above number of weak layers in as many depth ranges as possible (l. 90). That means, it organizes the profiles into several tiers by number of occupied depth ranges 1 and by total number of weak layers. Tier 1 contains all profiles with the maximum number of occupied depth ranges and the maximum total number of weak layers. Tier 2 contains the remaining profiles with the maximum number of occupied depth ranges and an above-average number of weak layers. Tier 3 contains all remaining profiles with the sub-maximum number of occupied depth ranges and an above average number of weak layers. Depending on how many initial conditions are requested by the user the algorithm picks profiles from the three tiers. While the algorithm can pick the appropriate starting conditions by itself, the user can modify how the algorithm should search for weak layers by labeling layers of interest (ll. 75-79). In other words, structural considerations (such as grain type, grain size, etc) can be combined with a number of popular stability indices given that they are available in the profiles. Currently implemented are threshold sums TSA or RTA (Schweizer and Jamieson 2007;Monti and Schweizer 2013), SK38 (Monti et al. 2016), critical crack length (Richter et al. 2019), and p unstable (Mayer et al. 2022).
We will include in the revised manuscript • a more detailed description of how initial conditions are picked by the algorithm and explain users' options for customization, • a more detailed description of the factors influencing the resulting average profile and emphasize the role of the initial condition.

Elaborate on testing of algorithm
Referee Comment: Related to that are your statements on the testing. I would like to see some more quantitative results and more in-depth description on how you did that. Reading phrases like (...)consistently produce reasonable average snow profiles suitable for avalanche forecasting.
(Line 105), are with low support and not helpful for the interested reader or an avalanche forecaster that wants to apply your findings. In addition, I would be curious what you think is suitable for avalanche forecasting and what is not ;-).
Author Response: This has been answered in detail in Author Comment 1.2.

Include better stability index
Referee Comment: I like Fig. 2 very much. It will be very helpful in daily routines of avalanche forecasting centers. However, I have some issues with how the content of Fig. 2b was produced. You basically applied the approach by Schweizer and Jamieson (2007) which turned out to be inpropriate or at least less helpful when applied to simulated snow cover data (Monti and Schweizer, 2013). Main reason for that is the fact that the thresholds by Schweizer and Jamieson (2007) were obtained with statistics based on observed snow stratigraphy parameters which may differ compared to simulated ones (especially grain size). That's why Monti and Schweizer (2013) introduced the relative threshold sum appraoch and I would love to see if there are particular differences for the presented example. In fact, I would expect, e.g. the facets below the thick layer of RGs (I assume this to be the slab) to give more indication towards instability. This in turn would give you the option to included FCs as weak layers as well. At the moment the representation of Fig. 2b is heavily driven by grain size only, since the used underlaying snow cover model classifies the weak layer DH and SH mainly based on their size.
Author Response: It is very encouraging to hear that you think our approach of making underlying distributions in the data set accessible will be helpful for operational avalanche forecasting. With regards to the threshold sum approach (TSA) by Schweizer and Jamieson (2007), we agree with you in that it is not a state-of-the-art stability index for simulated profiles. It is, however, a conceptually straightforward approach that is very tangible to many practitioners due to its application in the field. Since our figure aims at presenting the general capabilities of our algorithm, we believe it is most valuable to keep the complexity of the presented stability assessment low at this point in the paper. To address your suggestion of comparing the presented TSA approach to more sophisticated stability indices, we will follow up our general overview figure with Fig. 4. This new figure will highlight very strongly that our approach of making underlying distributions of the data set accessible is not confined to a single and particular property, but can be applied to any available variable in the user's data set. Furthermore, it will visualize that our averaging approach creates an opportunity for systematically comparing stability indices on a large-scale (i.e., for many profiles instead of few select ones) and for designing sensitivity studies that assess impacts on the layer level.
Since the recent development by Mayer et al. (2022) tremendously improves upon the temporal resolution of previously existing stability indices for simulated profiles, we will also visualize the proportion of grid points that promote poor layer stability in the time series of the average profile (Fig. 5). This visualization takes the concept from Fig. 4f to a temporal context and makes it effortless for users to understand temporal trends in the layerwise stability predictions of many profiles.   Fig. 3 of the manuscript, but overplotted with the distribution of grid points that contain layers with poor stability as diagnosed by p unstable (Mayer et al. 2022); it is therefore the analogon to Fig. 4e in a time series view.