The RHOSSA campaign: Multi-resolution monitoring of the seasonal evolution of the structure and mechanical stability of an alpine snowpack

The necessity of characterizing snow through objective, physically motivated parameters has led to new model formulations and new measurement techniques. Consequently, essential structural parameters such as density and specific surface area (for basic characterization) or mechanical parameters such as the critical crack length (for avalanche stability characterization) are gradually replacing the semi-empirical indices acquired from traditional stratigraphy. These advances bring new demands and opportunities for validation. To this end, we conducted the RHOSSA field campaign, named after density (ρ) and specific surface area (SSA), at the Weissfluhjoch research site in the Swiss Alps to provide a multi-instrument, multi-resolution dataset of density, SSA, and critical crack length over the complete winter season 2015-2016. In this paper, we present the design of the campaign and a basic analysis of the measurements alongside predictions from the model SNOWPACK. To bridge traditional and new methods, the campaign comprises traditional profiles, density cutter, IceCube, SnowMicroPen (SMP), micro-computed tomography, propagation saw tests, and compression tests. To bridge different temporal resolutions, the traditional weekly to bi-weekly snow pits were complemented by daily SMP measurements. From the latter, we derived a re-calibration of the statistical retrieval of density and SSA for SMP version 4, which yields an unprecedented spatio-temporal picture of the seasonal evolution of density and SSA in a snowpack. Finally, we provide an inter-comparison of measured and modeled estimates of density and SSA for four characteristic layers over the entire season to demonstrate the potential of high temporal resolution monitoring for snowpack model validation.

The measurement field at the WFJ site is a flat area of about 20 m × 8 m (Fig. 1). To ensure efficient use of the snow field, measurements were performed within defined areas. The snow field was divided into three corridors, each 20 m long and 1.5 m wide, as illustrated in Figure 1. Throughout the season, sets of measurements were performed moving continuously along the corridors in daily steps, starting at one end of corridor 1 and ending at the end of corridor 3, with two consecutive sets of measurements at least 30 cm apart to avoid disturbances. A schematic of the location of three consecutive sets of measurements ("day 1", "day 2", and "day 3") performed in corridor 2 at mid-season is shown in Figure 1. Each corridor was divided lengthwise into two 75 cm wide parts. One side was reserved for stability tests (red area in Fig. 1); the other side was used for all other measurements. First, the five daily SMP measurements, spaced 15 cm apart, were performed perpendicular to the corridor direction (black dots in Fig. 1). Then, on a snow pit day, as illustrated by "day 2" in Figure 1, the pit was dug such that the pit wall was parallel to, and a few centimeters behind, the line formed by the SMP measurements. Density cutter and IceCube measurements were performed next to each other (blue and orange areas in Fig. 1) and complemented by a traditional snow profile when needed (green area in Fig. 1). Finally, for the occasional X-ray tomography, undisturbed snow blocks were extracted from the pit wall near the location of the other measurements.
Method; frequency; vertical resolution; measured parameters
Traditional profile; every 1 to 2 weeks (11 profiles in total); variable; traditional layer parameters: grain shape, grain size (mm), hand hardness, temperature (°C), ram resistance (N)
Stability tests; 8 times over the season; -; critical crack length (m), number of taps until failure
Tomography; 6 times over the season; 0.1 mm; density (kg m⁻³), SSA (m² kg⁻¹)

3 Measurements

Traditional profile and stability tests
Traditional snow profiles were observed to characterize snow stratigraphy by hand hardness, grain size, and grain type. In addition, ram resistance, snow temperatures, and the water equivalent of the snow cover were measured (Fierz et al., 2009).
Snow stability tests were performed to identify potential weak layers and evaluate the load required for failure. Specifically, we performed the compression test (CT; van Herwijnen and Jamieson, 2007), the extended column test (ECT; Simenhois and Birkeland, 2009), and the propagation saw test (PST; Gauthier and Jamieson, 2008). In a CT or an ECT, the snowpack is progressively loaded by tapping with increasing force on a snow shovel placed on the snow surface (10 taps from the wrist, 10 taps from the elbow, and 10 taps from the shoulder). If a failure occurs within the snow cover, the loading step, i.e. the number of taps at which the failure occurred, is recorded. In a CT, which consists of an isolated column of 30 by 30 cm, information describing the type of failure is also recorded (for more details see van Herwijnen and Jamieson, 2007). In an ECT, which consists of an isolated column of 30 by 90 cm, the propagation distance across the column is recorded as either no propagation, partial propagation, or full propagation (for more details see Simenhois and Birkeland, 2009). CT and ECT are thus used to identify potential weak layers and qualify the loading required for failure. The PST, on the other hand, is used to measure the critical crack length required for crack propagation in an a priori known weak layer. It consists of an isolated 30 cm wide column with a length of at least 120 cm, which has been excavated to below the weak layer of interest. An artificial crack is then created by drawing a snow saw through the weak layer until the critical crack length is reached and rapid crack propagation occurs. The critical crack length is recorded, as well as the propagation distance, where END refers to cracks that propagated to the end of the column (for more details see Gauthier and Jamieson, 2008).
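The tap sequence above fixes how a recorded tap count maps onto a loading step. A minimal Python sketch, assuming failures are recorded as a tap count from 1 to 30; the function name `loading_step` is hypothetical and not part of any field protocol or software:

```python
def loading_step(taps: int) -> str:
    """Return the CT/ECT loading step for the tap count at which failure occurred.

    Taps 1-10 are from the wrist, 11-20 from the elbow, 21-30 from the shoulder.
    """
    if not 1 <= taps <= 30:
        raise ValueError("tap count must be between 1 and 30")
    if taps <= 10:
        return "wrist"
    if taps <= 20:
        return "elbow"
    return "shoulder"


# Hypothetical tap counts at failure for three tests
for taps in (7, 14, 23):
    print(taps, loading_step(taps))
```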

Density cutter
A density cutter was used to manually record the density profile of the snowpack by performing successive measurements from the surface to the bottom of the snowpack with a vertical resolution of 3 cm. A box-type density cutter of 100 cm³ (3 × 5.5 × 6 cm; Carroll, 1977; Conger and McClung, 2009; Proksch et al., 2016) was used to measure density by weighing the snow sample extracted with the cutter. A measurement error of about 10% can be expected (Carroll, 1977; Conger and McClung, extracting light snow, and of incomplete snow volumes (underestimation) when extracting fragile snow (e.g. faceted crystals or depth hoar).
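Converting a weighed cutter sample to density is a single unit conversion. The sketch below assumes the nominal 100 cm³ cutter volume, assigns each sample to the centre of its 3 cm slot (an assumption about how heights are reported), and attaches the ~10% error quoted above; the helper names and sample masses are hypothetical.

```python
# Nominal volume of the box cutter described in the text (3 x 5.5 x 6 cm ~ 100 cm^3)
CUTTER_VOLUME_CM3 = 100.0
# Relative measurement error of about 10% quoted in the text
RELATIVE_ERROR = 0.10
# Vertical step between successive samples (cm)
STEP_CM = 3.0


def cutter_density(mass_g: float, volume_cm3: float = CUTTER_VOLUME_CM3) -> float:
    """Convert the weighed mass (g) of one cutter sample to density (kg m^-3)."""
    if volume_cm3 <= 0:
        raise ValueError("cutter volume must be positive")
    # 1 g cm^-3 = 1000 kg m^-3
    return mass_g / volume_cm3 * 1000.0


def density_profile(masses_g, top_height_cm, step_cm=STEP_CM):
    """Build (height, density, error) tuples from samples taken surface-downward.

    Each sample is assigned to the centre of its 3 cm slot (hypothetical convention).
    """
    profile = []
    for i, mass in enumerate(masses_g):
        centre_cm = top_height_cm - (i + 0.5) * step_cm
        rho = cutter_density(mass)
        profile.append((centre_cm, rho, rho * RELATIVE_ERROR))
    return profile


# Hypothetical sample masses (g) from the top of a 100 cm snowpack downward
for z, rho, err in density_profile([10.0, 25.0, 31.5], top_height_cm=100.0):
    print(f"{z:5.1f} cm: {rho:6.1f} +/- {err:4.1f} kg m^-3")
```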

IceCube
The IceCube was used to measure an SSA profile of the snowpack by performing successive IceCube measurements from the surface to the bottom with a vertical resolution of 3 cm. The IceCube is an optical system commercialized by A2 Photonic Sensors (Zuanon, 2013) that retrieves SSA from measurements of the infrared hemispherical reflectance of snow (Gallet et al., 2009).

Simulations with SNOWPACK
To put the measurement campaign in context, we conducted standard simulations with the detailed snow cover model SNOWPACK.
The time step for the simulation was set to 15 min and output was written every 60 min. For this campaign, we were particularly interested in evaluating the model in terms of density and SSA. The density of new snow was obtained from an empirical relation involving air temperature and wind speed (Schmucki et al., 2014). The snowpack itself is considered to be a linear viscoelastic material, the settlement of which was calculated as described in Section 2.2.2 of Lehning et al. (2002b), using an altered viscosity parametrization. In addition, the effect of load rate was taken into account, but any elastic effects were neglected. SSA was simply retrieved from the optical diameter of snow, which is empirically derived from dendricity, sphericity, and grain size according to Vionnet et al. (2012).

5 Data analysis methods
(2015), a regression (Eq. 2) was applied to estimate SSA by least-squares optimization, with the IceCube SSA (SSA_ic) as the target. The following regression parameters were obtained: b1 = 0.57 ± 0.05, b2 = 18.56 ± 0.04, and b3 = 3.66 ± 0.01, where SSA_smp is in m² kg⁻¹. This regression has an R² coefficient of 0.67, a residual standard error of 8.4 m² kg⁻¹, and p-values less than 10⁻³.
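The regression statistics quoted above (coefficients with ± uncertainties, R², residual standard error) follow from an ordinary least-squares fit. Since the exact form of the SSA regression is not reproduced here, the sketch below uses a generic three-parameter linear model with placeholder predictors `x1` and `x2`; it illustrates how such statistics are typically computed, reading the ± values as coefficient standard errors (an assumption), and is not the authors' code.

```python
import numpy as np


def fit_three_parameter_model(x1, x2, y):
    """Ordinary least-squares fit of y = b1 + b2*x1 + b3*x2.

    The linear form and the predictors x1, x2 are placeholders, not Eq. (2).
    Returns (coefficients, standard errors, R^2, residual standard error).
    """
    x1, x2, y = (np.asarray(a, dtype=float) for a in (x1, x2, y))
    A = np.column_stack([np.ones_like(x1), x1, x2])  # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    n, p = A.shape
    ss_res = float(residuals @ residuals)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    rse = np.sqrt(ss_res / (n - p))  # residual standard error, n - p degrees of freedom
    # Coefficient standard errors from the covariance rse^2 * (A^T A)^-1
    se = rse * np.sqrt(np.diag(np.linalg.inv(A.T @ A)))
    return coef, se, r2, rse


# Usage with synthetic, noisy data (all values hypothetical)
rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 2.0, 200)
x2 = rng.uniform(-1.0, 1.0, 200)
y = 10.0 + 20.0 * x1 + 3.0 * x2 + rng.normal(0.0, 2.0, 200)
coef, se, r2, rse = fit_three_parameter_model(x1, x2, y)
print("b =", np.round(coef, 2), "+/-", np.round(se, 2), "R^2 =", round(r2, 2), "RSE =", round(float(rse), 2))
```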
Four distinct layers were tracked in this study; they are directly adjacent to each other and located in the bottom part of the snowpack. We chose these layers because they are among the main stratigraphic features of the snowpack observed during the winter, showed a wide range of snow types and properties, could be tracked over the entire winter, and were relatively easy to identify (rather sharp property transitions). From bottom to top, these tracked layers are called the DH-layer (depth hoar), the MF-layer (melt forms), the FC-layer (faceted crystals), and the RG-layer (rounded grains), referring to the predominant grain shape observed in each layer. They are described in detail in the next section. The four layers were identified based on the ground and four boundaries called, from the lowest to the uppermost, the 151201-boundary, 151202-boundary, 160102-boundary, and 160117-boundary. The DH-layer was located between the ground and the 151201-boundary, the MF-layer between the 151201-boundary and the 151202-boundary, the FC-layer between the 151202-boundary and the 160102-boundary, and the RG-layer between the 160102-boundary and the 160117-boundary.
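Once the four boundary heights have been picked for a given day, layer-averaged properties reduce to masking the profile between consecutive boundaries. A minimal sketch, assuming heights in cm above ground and half-open intervals [lower, upper); the function names, boundary heights, and density profile are hypothetical:

```python
import numpy as np


def layers_from_boundaries(b151201, b151202, b160102, b160117, ground=0.0):
    """Map the four tracked layers to (lower, upper) heights in cm, bottom to top."""
    return {
        "DH-layer": (ground, b151201),
        "MF-layer": (b151201, b151202),
        "FC-layer": (b151202, b160102),
        "RG-layer": (b160102, b160117),
    }


def layer_means(z_cm, values, layers):
    """Average a property profile within each layer; NaN if no sample falls inside."""
    z = np.asarray(z_cm, dtype=float)
    v = np.asarray(values, dtype=float)
    means = {}
    for name, (lower, upper) in layers.items():
        inside = (z >= lower) & (z < upper)  # half-open interval [lower, upper)
        means[name] = float(v[inside].mean()) if inside.any() else float("nan")
    return means


# Hypothetical boundary heights (cm above ground) picked on one day
layers = layers_from_boundaries(20.0, 23.0, 33.0, 48.0)
z = np.arange(0.0, 60.0, 0.5)             # 5 mm sampling
density = 200.0 + 2.0 * np.abs(z - 21.5)  # made-up density profile (kg m^-3)
print(layer_means(z, density, layers))
```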
Thereafter, a dry period followed during which the snow surface temperature remained between −20 °C and −10 °C, allowing large temperature gradients to build up across the shallow snowpack. Traditional profiles show that this basal layer recrystallized predominantly into depth hoar (dark blue colored layers below 0 cm in Fig by a period of rather clear weather leading to low snow surface temperatures (Fig. 3). Again, this layer of faceted crystals was observed throughout the season (light blue colored layers between about 0 cm and 10 cm in Fig. 4, upper panel) and corresponds to the tracked FC-layer. January was generally characterized by cloudier weather with consistent precipitation events (Fig. 3). With the first snowfalls in early January, snow accumulated on top of the FC-layer and was quickly buried by subsequent heavy precipitation events, lying under around 75 cm of snow by mid-January. This layer was protected from significant temperature gradients and evolved into small faceted crystals and rounded grains (light blue and light red colored layers between about 10 and 25 cm in Fig. 4). As this layer systematically showed a higher hand hardness (4 fingers against 1 finger) and a smaller grain size (not shown) than the FC-layer and DH-layer, it was named the RG-layer for the sake of differentiation. Finally, after further precipitation events, mostly in early February and early March, the snowpack height reached about 200 cm by mid-March and consisted mostly of layers of rounded grains on a weaker base of facets and depth hoar.

The snowpack stratigraphy simulated by SNOWPACK is shown in the lower panel of Figure 4. Qualitatively, the modeled stratigraphy compared well with the observed stratigraphy. Although many subtle differences in grain shape and hand hardness exist throughout the season, the major stratigraphic features are well reproduced, notably the weak base layers (DH-layer and FC-layer) as well as the overlying slab, which mostly consisted of small rounded or faceted grains whose hardness increases from top to bottom. Note also the lower density of the base layer compared to the overlying slab. One major discrepancy is that the melt-freeze/rain crust that formed on 1 December (MF-layer) was not simulated by SNOWPACK (see dedicated comment in Sec. 7.3). Instead, SNOWPACK simulated around 3 cm of new snow, which later recrystallized into faceted crystals.
Snow stability tests showed that the weak base, namely the DH-layer and FC-layer, contained the most critical weak layers during most of the season. As shown in Figure 5, both layers consistently failed in CTs and ECTs until the beginning of February. Thereafter, these layers were no longer reactive, as tapping on the snow surface did not affect the weak base buried below the hard and thick slab (black symbols in Fig. 5). From the PSTs, it was possible to follow the evolution of the critical crack length throughout the season (crosses in Fig. 5). Overall, the critical crack length increased steadily from about 20 cm in mid-January to around 60 cm at the beginning of March for both the FC-layer and the DH-layer, indicating that the weak layers became less and less prone to crack propagation with time. Note that the critical crack length was consistently lower for the DH-layer than for the FC-layer. Fig. 6b). The evolution of this layer is not, or only diffusely, captured by the cutter measurements. Note that this layer was reported in the traditional profiles from 24 February on as a layer of melt forms with a hand hardness of one fist (Fig. 4).
Simulations of the density profiles over the season agree well overall with the observations (Fig. 6c). The mis-modeling of the MF-layer, as mentioned earlier, however leads to large local deviations. Moreover, SNOWPACK seems to overestimate the densification rate of the DH-layer and FC-layer, leading to significantly higher modeled values by mid-March (Fig. 8a and 8c).
This overestimation can also be observed in the vertical profiles for both weak layers, for example in Fig. 7b. Conversely, the densification rate seems to be underestimated for layers evolving from fresh snow to rounded grains in the upper part of the snowpack, leading to simulated densities lower than the measured ones by mid-March, as shown in Figures 6 and 7b-f (layers from about 20 to 100 cm height). Finally, other inconsistencies can be observed locally in the simulated stratigraphy, such as the two relatively denser layers observed near the surface on March 2 at around 125 cm and 135 cm (Fig. 7b).
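Model-observation comparisons like these can be summarized either by comparing bulk (profile-averaged) values or by pairing values at the same height. The following sketch assumes equally spaced profiles and linear interpolation of the model onto the observed heights; it shows why a bulk comparison can hide compensating errors that a paired comparison reveals. All function names and numbers are hypothetical.

```python
import numpy as np


def bulk_difference(obs, mod):
    """Bulk comparison: difference of profile-averaged values, model minus observation.

    Assumes both profiles are sampled on regular, equally spaced height grids.
    """
    return float(np.mean(mod) - np.mean(obs))


def paired_stats(z_obs, obs, z_mod, mod):
    """Paired comparison: interpolate the model to the observed heights, return (bias, RMSE)."""
    z_mod = np.asarray(z_mod, dtype=float)
    mod = np.asarray(mod, dtype=float)
    order = np.argsort(z_mod)  # np.interp needs increasing x values
    mod_at_obs = np.interp(z_obs, z_mod[order], mod[order])
    diff = mod_at_obs - np.asarray(obs, dtype=float)
    return float(diff.mean()), float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical two-point profiles: the model inverts the density contrast
z = [10.0, 30.0]
obs = [300.0, 200.0]
mod = [200.0, 300.0]
print(bulk_difference(obs, mod))     # compensating errors cancel in the bulk mean
print(paired_stats(z, obs, z, mod))  # but show up in the paired RMSE
```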
6.3 Evolution of SSA

Figure 9 shows the evolution of the SSA profiles over the course of the winter from IceCube measurements, SMP measurements, and SNOWPACK simulations. Note that IceCube measurements could not be performed on 19 January 2016 and 10 February 2016. SSA values range from about 70 m² kg⁻¹ for fresh snow layers at the surface to about 5 m² kg⁻¹ in the bottom part of the snowpack. The MF-layer, well identifiable in terms of density (Fig. 6a and b), is here difficult to distinguish from the DH-layer and the FC-layer due to their similar SSA values. The general trend of the SSA evolution is Finally, SNOWPACK overall underestimates SSA compared to measurements (Figs. 9, 10, and 11). Deviations are larger with respect to the IceCube data than to the tomographic data, with which some good local agreement can be found, for instance in the SSA evolution of the tracked layers from mid-January on (excluding the MF-layer).
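To interpret the SSA range above in terms of grain size, SSA can be converted to an equivalent-sphere optical diameter with d_opt = 6 / (ρ_ice · SSA), taking ρ_ice ≈ 917 kg m⁻³. This is the standard sphere relation; whether SNOWPACK applies exactly this conversion is an assumption here, and the empirical dependence of the optical diameter on dendricity, sphericity, and grain size (Vionnet et al., 2012) is not reproduced. With these values, 70 m² kg⁻¹ corresponds to about 0.09 mm and 5 m² kg⁻¹ to about 1.3 mm.

```python
RHO_ICE = 917.0  # density of ice (kg m^-3)


def optical_diameter_mm(ssa_m2_per_kg: float) -> float:
    """Equivalent-sphere optical diameter (mm) for a given SSA (m^2 kg^-1)."""
    if ssa_m2_per_kg <= 0:
        raise ValueError("SSA must be positive")
    return 6.0 / (RHO_ICE * ssa_m2_per_kg) * 1000.0  # metres -> millimetres


def ssa_from_optical_diameter(d_mm: float) -> float:
    """Inverse relation: SSA (m^2 kg^-1) from optical diameter (mm)."""
    if d_mm <= 0:
        raise ValueError("diameter must be positive")
    return 6.0 / (RHO_ICE * d_mm / 1000.0)


# The SSA range reported for this snowpack
for ssa in (70.0, 5.0):
    print(f"SSA {ssa:4.1f} m^2/kg -> d_opt {optical_diameter_mm(ssa):.3f} mm")
```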
Comparisons of SSA profiles from tomography, IceCube, and SMP measurements.

The specificity of the RHOSSA dataset is that it provides time series of density and SSA at a daily frequency and with a vertical resolution of 0.5 mm, in contrast with previous validation datasets (weekly to bi-weekly, vertical resolution of 3 cm or coarser) (e.g. Morin et al., 2013; Leppänen et al., 2015). Both temporal and spatial resolution are critical to account for in snow models because thin layers, as well as processes occurring over short time scales, can have a significant impact on snowpack behavior, e.g. on its mechanical stability (e.g. Jamieson and Johnston, 1992). We highlight the need for high-resolution datasets, such as the one provided here, to evaluate the simulation of such features and processes. In addition to validation datasets, comparison methods are also crucial when assessing models. Different methods have been used in the past to compare measurements and simulations: i) the comparison of averaged (bulk) values over the entire snowpack height (e.g. Landry et al., 2014; Leppänen et al., 2015; Essery et al., 2016), which is easy to implement but provides rather limited information; ii) the comparison of paired values at the same height in the snowpack, which allows assessing the snowpack stratigraphy (e.g. Lehning et al., 2001; Morin et al., 2013) (as in Figs. 7 and 10); and iii) the comparison of values averaged within the boundaries of specific layers of the snowpack, as used in Wever et al. (2015) and in this study (Figs. 8 and 11).
This latter method seems particularly suitable to assess the skill of parameterizations of internal snow processes, e.g. temporal  re-align the profiles thanks to the presence of the dominant MF-layer in all measurement methods and throughout the season.
Slight vertical mismatches can, however, be found. For example, the density profile of March 2, 2016 (Fig. 7) shows two distinct denser layers at around 125 cm and 135 cm height, which are well identified in both SMP and density cutter measurements but with a height mismatch of about 5 cm. This re-alignment method, based on the identification of a persistent and well-defined snowpack feature, might not always be applicable. A more systematic approach could be the algorithm presented by Hagenmuller and Pilloix (2016) to automatically match snow profiles by adjusting their layer thicknesses. This method has strong potential for quantitative comparison studies (Hagenmuller et al., 2018). When comparing properties of specific layers, the definition of the layer boundaries is critical. The second-order fluctuations observed in the evolution of density and SSA of the MF-layer (Figs. 8 and 11), especially visible in the SMP data, might result from the definition of this layer's boundaries, in addition to the natural spatial variability of snow. Moreover, the manual definition of boundaries is rather time-consuming if numerous layers are tracked; a more automatic method could be developed. In this respect, the RHOSSA dataset constitutes a valuable resource owing to the continuity of its spatio-temporal picture of the seasonal evolution of stratigraphy.
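Automatic depth matching of this kind is often done with dynamic time warping (DTW), which locally stretches or compresses one profile along depth to best fit another. The sketch below is a generic textbook DTW on two 1-D property profiles, not the Hagenmuller and Pilloix (2016) algorithm; the absolute-difference cost, the function name `dtw_match`, and the example profiles are assumptions.

```python
import numpy as np


def dtw_match(a, b):
    """Textbook dynamic time warping between two 1-D profiles.

    Uses the absolute difference as local cost and the three classic moves
    (diagonal, vertical, horizontal). Returns (total cost, index path).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            acc[i, j] = local + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the end to recover which samples were matched
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        move = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if move == 0:
            i, j = i - 1, j - 1
        elif move == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(acc[n, m]), path


# Hypothetical 3 cm density profiles: the second shows a thicker middle layer
pit = [120.0, 250.0, 250.0, 180.0]
smp = [120.0, 250.0, 250.0, 250.0, 250.0, 180.0]
cost, path = dtw_match(pit, smp)
print(cost, path)
```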

The potential of daily SMP measurements
With daily SMP measurements, the RHOSSA campaign allows following the evolution of the internal structure of a snowpack at a sub-centimeter vertical resolution almost continuously over 4 months, which was previously inaccessible. An unparalleled, smooth picture of the spatio-temporal evolution of density and SSA is revealed, contrasting with data from the classical snow pit measurements (Figs. 6 and 9). Many thin stratigraphic features are clearly visible in the SMP data but only diffusely shown by the manual measurements. This highly detailed picture of the snowpack evolution opens new opportunities for field studies on snowpack processes occurring over short time scales (e.g. densification of fresh snow) or very locally (e.g. rain crust or surface hoar formation), as well as for refined evaluation of snow models, as already mentioned.
One advantage of SMP measurements compared to snow pit measurements is that they are faster (of the order of 30 minutes for five measurements) and thus more suitable for daily snowpack monitoring. It is, however, important to keep in mind that density and SSA are not directly measured by the SMP but derived from the force signal based on parameterizations (Fig. 2), which introduces additional uncertainties compared to more direct measurements. Several parameterizations were previously put forward to derive density and/or SSA from SMP signals (e.g. Pielmeier and Schneebeli, 2003; Dadic et al., 2008; Proksch et al., 2015; Kaur and Satyawali, 2017). Differences between the parameterizations of density and SSA of Proksch et al. (2015) and the ones presented in this study are likely due to the version of the SMP device: its electronics were updated in version 4, which affected the inversion of the model of Löwe and van Herwijnen (2012) through the force correlation function. We would hope that the parameterizations of Eqs. (1) and (2) are generally applicable to SMP version 4. However, without independent validation by measurements under different snowpack conditions, it is not possible to state the range of validity of the parameterizations presented here. In the long term, it would be desirable to improve the underlying stochastic-mechanical approach (Löwe and van Herwijnen, 2012) with an invertible model that contains density and SSA, in order to retrieve these parameters from a more physical picture of the penetration process.

Comparing density and SSA estimates
As possible starting points for future dedicated studies, we summarize here the main deviations reported in this paper when comparing density and SSA estimates. First, we recall that density and SSA derived from SMP data were obtained to best match the results of the cutter and IceCube measurements, so they necessarily inherit their uncertainties.

Code and data availability
The dataset presented in the paper will be available on the EnviDat database (DOI will be provided upon acceptance).

References

The international classification for seasonal snow on the ground, IHP-VII Technical Documents in Hydrology n° 83, IACS Contribution n°
Gallet, J.-C., Domine, F., Zender, C. S., and Picard, G.: Measurement of the specific surface area of snow using infrared reflectance in an integrating sphere at 1310 and 1550 nm, The Cryosphere, 3, 167-182, https://doi.org/10.5194/tc-3-167-2009, 2009.
Gaume, J., van Herwijnen, A., Chambon, G., Wever, N., and: Snow fracture in relation to slab avalanche release: critical state for the onset of crack propagation, The Cryosphere, 11, 217-228, 2017.
Gauthier, D. and Jamieson, B.: Understanding the propagation of fractures and failures leading to large and destructive snow avalanches:
Lehning, M., Bartelt, P., Brown, B., and Fierz, C.
van Herwijnen, A. and Jamieson, B.: Snowpack properties associated with fracture initiation and propagation resulting in skier-triggered dry snow slab avalanches, 50, 13-22, https://doi.org/10.1016/j.coldregions.2007.02.004, 2007.
van Herwijnen, A. and Jamieson, B.: Fracture character in compression tests, Cold Reg. Sci. Technol., 47, 60-68, https://doi.org/10.1016/j.coldregions.2006.08.016, 2007.
van Herwijnen, A., Bair, E., Birkeland, K., Reuter, B., Simenhois, R., Jamieson, B., and Schweizer, J.

The authors present a local-scale study aimed at characterizing seasonal snowpack evolution with traditional sampling (snow pits), advanced techniques (SnowMicroPen, IceCube, and tomography), and model application (SNOWPACK). Applying a multi-scale approach, methods are intermixed to construct a daily time series of vertical variation in snow density and specific surface area. The methods are cross-compared to contribute a recalibration of the Proksch et al. (2015) SMP empirical model and to evaluate SNOWPACK simulations.
Analysis of the dataset clearly demonstrates how recent advances in field methodology can support model evaluation at very high vertical resolutions. In particular, the details found in Figures 6 and 9, where SMP-derived snow properties are introduced at daily time steps, show the ability to track snow events and metamorphism captured in SNOWPACK simulations. Overall, the paper provides a great summary of the campaign results and demonstrates how future model evaluations can benefit from applying a similar seasonal framework.
Prior to publication, the paper would benefit from some restructuring to clarify the properties of the generated dataset and promote repeatability. These would be meaningful additions to allow application of this work to other environments:

- Recalibration of the Proksch et al. (2015) model uses collocated SMP profiles and density cutter measurements. No distinction is made between the training and testing data when evaluating Eqns 1 or 2. If the authors felt cross-validation was unnecessary, please include this information so that the reader can determine whether the skill estimates may be biased (i.e. test and train are identical datasets).
→ The entire cutter and IceCube data have been used to "train" the SMP data and to obtain Eq. (1) and (2). The scatter plots shown in Figure 2 show the quality of these parameterizations for the same dataset, i.e. SMP-derived data from Eq. (1) and (2) versus cutter and IceCube data. The "train" and "test" datasets are thus the same. We aimed here at getting as close as possible to this particular cutter and IceCube dataset with our SMP data, and we did not evaluate the obtained parameterizations with another independent dataset. This is why we wrote on page 23, line 10: "We would hope that the parameterization Eq. (1) and (2) are generally applicable to an SMP version 4. However, without an independent validation by measurements under different snowpack conditions, it is not possible to state the range of validity of the parametrizations presented here." We improved Section 5.1 so that it appears more clearly that the test and train data are the same, p10, L6: "This plot [Figure 2] shows the observed density from cutter measurements against the SMP-derived density obtained from Eq. (1) and from Proksch2015 for the 15 days for which both data are available (same dataset as used for the statistical modeling). Similarly, the observed SSA from IceCube measurements is presented against the SMP-derived SSA from Eq. (2) and from Proksch2015 for the 13 days for which both data were available (same dataset as used for the statistical modeling). To do so, and as done for the statistical modeling, SMP-derived properties were averaged over 3 cm resolution, and SMP and snow pit profiles of the same day were re-aligned with the snow surface and cropped to the length of the shortest profile."

- I'd like to better understand why realignment resulted in improved correlation between the cutter/IceCube measurements and SMP-derived properties in Figure 2, as indicated in the text (P9 L23). If alignment with the persistent layer defined in Section 6 resulted in better vertical matching, why were the better alignments not used for the initial recalibration? Throughout the paper, descriptions of alignment could be improved and are noted in the extended comments below.
→ We thank the reviewer for pointing out this issue in the paper. We agree that the description of the alignment was confusing. For explanation, we would like to point out the difference between 1/ matching of profiles of the same day for statistical analysis, and 2/ matching for visualisation of the data, such as the evolution of profiles with time. 1/ Alignment of co-located, co-temporal profiles can be done by using the snow surface. This is convenient and always applicable (unlike using a specific layer), so it is a suitable method for a local re-calibration of the SMP parameterizations as in our study. 2/ Alignment of profiles when plotting their evolution with time requires another matching method, since the profiles are then not co-temporal and do not share a common height/snow surface. One way is to re-align profiles with the ground. For sites with uneven or bumpy ground, however, this method can lead to a mediocre alignment. This was the case at WFJ (the ground is uneven), and we found that a re-alignment based on the MF-layer crust offers a qualitatively better match, when looking at the plots of Figs. 6 and 9, for example.
Hence, we chose this alignment method to present the data in Figures 6, 7, 9, and 10.
→ As pointed out by the reviewer, the first version of the paper showed an inconsistency related to the choice of the alignment method in Section 5.1. Indeed, method 1 (snow surface alignment) was used to develop the statistical model, but method 2 (MF-layer alignment) was used to test the performance of the model (Figure 2). We fully agree with the reviewer that this is confusing. Thus, we modified the analysis so that method 1 (snow surface alignment) is now used for both the statistical model and the analysis of the model performance. Method 2 (layer alignment) is only used later in the paper, in the Results section, for time-series plotting purposes.
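The snow-surface alignment (method 1) used for the recalibration, with SMP-derived properties averaged to 3 cm and both same-day profiles cropped to the shorter one, can be sketched as follows. It assumes depths measured downward from the snow surface and that bin k of the SMP profile pairs with the k-th 3 cm pit sample; the function names and example values are hypothetical.

```python
import numpy as np


def bin_from_surface(depth_mm, values, bin_mm=30.0):
    """Average a high-resolution profile into bins counted from the snow surface.

    depth_mm: depth below the snow surface (>= 0); bin k covers [k*bin_mm, (k+1)*bin_mm).
    Empty bins are returned as NaN.
    """
    depth = np.asarray(depth_mm, dtype=float)
    v = np.asarray(values, dtype=float)
    if np.any(depth < 0):
        raise ValueError("depths must be measured downward from the surface (>= 0)")
    idx = np.floor(depth / bin_mm).astype(int)
    sums = np.bincount(idx, weights=v)
    counts = np.bincount(idx)
    out = np.full(sums.shape, np.nan)
    filled = counts > 0
    out[filled] = sums[filled] / counts[filled]
    return out


def align_on_surface_and_crop(smp_binned, pit_values):
    """Pair surface-aligned profiles bin by bin and crop both to the shorter one."""
    n = min(len(smp_binned), len(pit_values))
    return np.asarray(smp_binned[:n], dtype=float), np.asarray(pit_values[:n], dtype=float)


# Hypothetical same-day profiles: SMP density every 0.5 mm over 1 m,
# cutter density every 3 cm over 0.9 m (values made up)
depth = np.arange(0.0, 1000.0, 0.5)
smp_density = 150.0 + 0.2 * depth
pit_density = 150.0 + 0.2 * (np.arange(30) * 30.0 + 15.0)
smp_3cm, pit = align_on_surface_and_crop(bin_from_surface(depth, smp_density), pit_density)
print(len(smp_3cm), len(pit))
```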
Modifications throughout the paper have been made accordingly, especially:
- Figure 2 has been redone, based on data re-aligned using the snow surface.
- The R² coefficients associated with Figure 2 have been modified. They are slightly better than in the previous version (from layer alignment to snow surface alignment, R² changes from 0.73 to 0.75 for density and from 0.81 to 0.82 for SSA, using Eq. 1 and Eq. 2, respectively). This makes sense, as Eq. 1 and 2 were developed from data aligned with the snow surface.
- Section 5.1 now reads, p10, L6: "The performance of the new parametrizations compared to the original parametrizations of Proksch2015 is presented in Figure 2. This plot shows the observed density from cutter measurements against the SMP-derived density obtained from Eq. (1) and from Proksch2015 for the 15 days for which both data are available. Similarly, the observed SSA from IceCube measurements is presented against the SMP-derived SSA from Eq. (2) and from Proksch2015 for the 13 days for which both data were available. To do so, and as done for the statistical modeling, SMP-derived properties were averaged over 3 cm resolution, and SMP and snow pit profiles of the same day were re-aligned with the snow surface and cropped to the length of the shortest profile."
- In the introduction to the Results section, p12, L3, we now included: "To present the evolution of profile properties with time, vertical profiles presented in the following were re-aligned such that z = 0 cm corresponds to the height of the upper boundary of the MF-layer (i.e. the 20151202-boundary). Choosing this layer as a height reference leads to a qualitatively better match than simply taking the ground as reference (the field site ground at WFJ is uneven)."

- While the layer tracking analysis is meaningful (Figs. 8 and 11), the description of the SMP tracking method is difficult (if not impossible) to reproduce.
An enhanced description of how transitions in the SMP signal were used to define layers would be a helpful addition.
→ Section 5.2 "Layer tracking" has been restructured and partly reformulated to improve the description of the method. Layers in SMP data were tracked in the same way as in the cutter and IceCube data, i.e. by a manual identification of boundaries in the snow property profiles. The paragraph now reads: "In the measurement data, the layers of interest were defined by the heights of their upper and lower boundaries. Boundaries were manually identified by inspecting the property profiles, looking for sharp and relevant transitions, and recording their heights. This step was performed on all weekly density profiles from the cutter and SSA profiles from the IceCube, as well as on all daily representative profiles of penetration force obtained from the five daily SMP measurements. The identification of layer boundaries was sometimes challenging for weak stratigraphic transitions, e.g. the transition between a layer of fresh snow and the soft snow layer it fell onto. In such cases, boundaries could be backtracked in time, starting from a profile where the layer of interest is older and its boundaries more clearly detectable. Additional information, such as the observed height of new snow, was also sometimes used to help delineate boundaries." Besides, we would like to point out that this method only works when tracking well-pronounced layers, so it might be hard to use systematically over entire snowpack profiles. To stress this point, we added on p10, L23: "The first step is to define the layers of interest, knowing that this method is only possible with layers that contrast well enough with their surroundings, so that their boundaries can be identified by a significant and rather sharp transition in the vertical profile of snow properties."
-I can confirm that the revised coefficients presented for SMP density improve over those of Proksch et al. (2015) for Arctic snow and snow on sea ice. However, local calibration with our SMP4 unit resulted in quite different coefficients and a better RMSE than the global parameters (P23 L11). It may therefore be important to describe the calibration methods clearly so that they can easily be repeated for different environments or units.
→ We improved the description of the calibration method in Section 5.1, making sure that each step is clearly described.
General comments
P2 L5 -Suggest removing the 'e.g.' and revising as 'data back to 1936 in the case of WFJ'.
→ Modified accordingly.
P2 L8 -Please be explicit about which properties are characterized rather than using 'hand hardness . . ..'.
P2 L9 -Remove the period between the citation and sentence.
→ Modified.
P2 L14 -Can you clarify what 'non-empirical snow properties' means? This statement is unclear.
→ With "non-empirical properties" we refer to properties that are physically or mathematically defined, such as density and SSA, in contrast to grain shape, for instance, which has no mathematical definition. We modified the term and now use "objectively-defined snow properties" (P2, L15).
P2 L15 -Ideally, traditional measurements would be supported with metrics such as SSA, but the use of the word 'tends' seems to imply this IS a frequent practice. Could it be rephrased with the word 'can' or similar?
→ Modified accordingly. The sentence now reads: "Concerning the characterization of snow microstructure, the observer-biased estimate of traditional grain size can be replaced by measurements of specific surface area" (P2, L15).
P3 L16 -Should the word 'such' be in this sentence?
→ We modified the sentence as follows: "These examples exploit key advantages of the SMP, namely fast profiling for frequent measurements and high vertical resolution, so that profiles are obtained at a considerably finer scale (mm) than possible with traditional means." (P3, L17).
P3 L21 -It feels a bit discouraging to say that the stated goals are dependent on the availability of a large dataset with many tools. As a suggestion, removing the word 'only' might lessen the tone. The wording 'cross-validation' could also be problematic as it refers to a specific statistical method. Later, the wording 'cross-comparison' (P4 L8) is used, which seems to be a better fit.
→ We agree with the reviewer and modified the sentence accordingly: "In the context raised above, the value of emergent, objective snow properties, their potential to replace traditional means in operational snow monitoring programs, and their requirements on temporal and vertical resolutions for model evaluations can be investigated within a multi-resolution and multi-instrument dataset to facilitate comprehensive cross-comparison analyses."
P4 L12 -Degree symbols should accompany the coordinate units.
P7 L7 -If the Zuanon (2013) methods were adopted, were any samples compressed to avoid over-penetration of the laser? A sentence on how samples were extracted and prepared would be useful for future comparisons where this has become common practice.
→ The sample extraction followed the protocol described in Zuanon et al. (2013). In addition, we indeed systematically compressed the extracted samples slightly. We included this information in the paper (p7, L10): "Snow samples were very slightly compressed when inserted into the sample holder and attention was paid to have a flat snow sample surface."
P7 L8 -What about uncertainty with low SSA (e.g. DH or FC)? The standard deviation of the measurements in Figure 10a appears to increase with depth and is quite large relative to tomography.
→ As pointed out in Section 7.3 of the paper, we report a significant and systematic inter-measurement deviation in the SSA estimates. Although we did not study the uncertainty of SSA measurements in weak layers in detail, our results do not show that biases are more pronounced for DH or FC layers. We did not observe an evolution of the bias with depth. The paper, however, stresses that these inter-measurement deviations should be investigated further.
P7 L17 -I would like to see an enhanced description of what goes into the profile quality check. Previous studies have described linear trends while measuring in air, while others have provided quantitative methods to apply a noise threshold. Which approach was used to determine drift or accept/reject a profile?
→ We improved the description of the SMP data processing. The paragraph now reads (P7, L21): "The quality control of SMP force profiles was done manually by rejecting signals with 1) visible trends either in the air portion of the signal or over the entire depth, 2) high noise levels and unrealistic spikes, and 3) frozen tip problems revealed by a force response that appears to be activated only deeper in the snowpack. Most of these problems are caused by wet conditions. The air-snow and snow-ground interfaces were detected manually to remove air and ground regions from the signal."
P7 L20 -What were the qualities of the data, snow, or study site that determined the profiles could be matched without an offset correction? In Section 6 the opposite seems to be stated, namely that spatial variability required compensation to avoid height mismatches (P11 L13).
→ This seems to be a misunderstanding. We improved the description of the SMP data processing in Section 3.4. By offset correction we mean that the force signal itself was not shifted by a given value, as is sometimes done (see previous comment). The force signal in the air was very close to zero (manual check), so we did not correct the force signal. This is unrelated to the height alignment performed in Section 6 for data visualisation.
P7 L29 -Suggest removing 'Reconstruction followed standard procedure' as it's described in the next sentence.
→ Modified accordingly.
P8 L10 -It may be helpful to indicate the rate of replacement.
→ During the period shown in this study (no melt-out), only missing values of either incoming or outgoing SW radiation, or albedo values above 0.95, require a replacement. There are no missing values, and albedo values above 0.95 amount to at most 0.8%, predominantly at sunrise and sunset.
P9 L7 to 11 -I found this a bit confusing. Is the single 'median' profile being used to train (1)? Perhaps the alignment sentence could be moved upwards in the paragraph to clarify. As it reads now, I was not able to determine if one profile per pit is being used or if multiple A-S aligned and cropped profiles are being used.
→ We agree with the reviewer and modified the paragraph to describe each step of the process more clearly. It now reads (P9, L16): "The statistical modeling was applied to a sub-dataset of the days for which both SMP and snow pit measurements were available (15 days for density, 13 days for SSA). Parameters F and L were computed from each raw penetration force profile over a sliding window of 1 mm with 50% overlap, yielding profiles of F and L with a vertical resolution of 0.5 mm. Note that Proksch et al. (2015) used a sliding window of 2.5 mm, but tests with different window heights (1, 2.5 and 5 mm) did not show a significant impact. Next, for each day, the five profiles of F and L of that day were aligned using the snow surface as a common reference, and a median operation was applied to obtain one representative profile of F and L per day, called the median profiles in the following. Next, each median profile was averaged vertically using a 3 cm window to match the vertical resolution of the snow pit measurements. Finally, the 3 cm-averaged median profiles of F and L and the profiles of rho_cutter and SSA_ic of the same day were aligned, again using the snow surface as a common reference, and cropped to the length of the shortest profile. This way, all profiles of a given day are described on the same vertical scale and values of F, L, rho_cutter and SSA_ic can be paired for the statistical modeling, relying on a total of 590 paired values for density and 497 for SSA."
P9 L15 -Please provide the number of compared measurements in support of the significance test.
→ The number of compared measurements was 590 for density and 497 for SSA. We included this in the manuscript (see comment above).
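The processing chain described in the reply to P9 L7 to 11 can be summarized as a short sketch. This is a minimal illustration in Python/NumPy, not the code used for the paper, and all function names are hypothetical. It assumes that depth is measured downward from the detected air-snow interface (depth 0 = snow surface), that F is the median force in each window, and that the five daily profiles are cropped to the shortest one before the median is taken; the computation of L (which requires the shot-noise model) is omitted, and the 3 cm averaging is done here over non-overlapping blocks.

```python
import numpy as np


def windowed_median(depth_mm, force_n, window_mm=1.0, overlap=0.5):
    """Median force in sliding windows; depth_mm starts at 0 (snow surface).

    Assumes dense sampling, so that every window contains data.
    """
    depth_mm = np.asarray(depth_mm, dtype=float)
    force_n = np.asarray(force_n, dtype=float)
    step = window_mm * (1.0 - overlap)  # 0.5 mm step for a 1 mm window, 50 % overlap
    n_win = int(np.floor((depth_mm[-1] - window_mm) / step)) + 1
    centres = np.empty(n_win)
    medians = np.empty(n_win)
    for i in range(n_win):
        top = i * step
        in_win = (depth_mm >= top) & (depth_mm < top + window_mm)
        centres[i] = top + window_mm / 2.0
        medians[i] = np.median(force_n[in_win])
    return centres, medians


def daily_median(profiles):
    """Median of surface-aligned profiles sharing one grid (index 0 = surface)."""
    n = min(len(p) for p in profiles)  # hypothetical choice: crop to the shortest
    return np.median(np.vstack([np.asarray(p, float)[:n] for p in profiles]), axis=0)


def block_average(values, res_mm=0.5, block_mm=30.0):
    """Average a 0.5 mm profile over consecutive 3 cm blocks from the surface down."""
    values = np.asarray(values, dtype=float)
    k = int(round(block_mm / res_mm))  # 60 samples per 3 cm block
    n_blocks = len(values) // k        # the incomplete bottom block is dropped
    return values[: n_blocks * k].reshape(n_blocks, k).mean(axis=1)


def pair_with_pit(smp_3cm, pit_3cm):
    """Surface-aligned pairing, cropped to the shorter of the two profiles."""
    n = min(len(smp_3cm), len(pit_3cm))
    return np.column_stack([np.asarray(smp_3cm, float)[:n], np.asarray(pit_3cm, float)[:n]])
```

Cropping at each step keeps every paired value on the same surface-referenced height, which is what allows F, L, rho_cutter and SSA_ic to be matched row by row.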
P9 L16 -This differs substantially from Proksch et al. (2015), where coefficients for SSA were not provided. This new equation requires no estimate of density from the SMP, which arguably is better if SSA is the target (minimizes bias from density coefficients and conversion from d0?). No action to take unless the authors wish to highlight the benefit of avoiding the conversion of L_ex to SSA.
→ We agree with the reviewer that directly estimating SSA, rather than estimating the correlation length and converting it via the density as in Proksch et al. (2015), should lead to better estimates (smaller errors). In the paper we simply point out this difference in the method by writing: "Differing slightly from the one suggested by Proksch et al. 2015, a regression of the form [Eq 2] was applied to estimate SSA …".
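For readers who want to repeat the calibration (see also the comment at P23 L11), the least-squares step can be sketched as below. This is an illustration under assumptions, not the authors' code: it uses the density form of Proksch et al. (2015), rho = a1 + a2 ln(F) + a3 ln(F) L + a4 L, solved by ordinary least squares with `numpy.linalg.lstsq`; the function names and the coefficients in the usage part are hypothetical, and the SSA form of Eq. (2) is not reproduced here.

```python
import numpy as np


def design_matrix(F, L):
    """Columns [1, ln F, L ln F, L] of the assumed Proksch et al. (2015) density form."""
    F = np.asarray(F, dtype=float)
    L = np.asarray(L, dtype=float)
    lnF = np.log(F)
    return np.column_stack([np.ones_like(F), lnF, lnF * L, L])


def fit_density(F, L, rho):
    """Ordinary least-squares estimate of (a1, a2, a3, a4) from paired values."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(F, L), np.asarray(rho, float), rcond=None)
    return coeffs


def predict_density(coeffs, F, L):
    """Evaluate the assumed regression form for given coefficients."""
    return design_matrix(F, L) @ coeffs


# Usage with synthetic, noise-free data (hypothetical coefficients, not Eq. 1);
# 590 is the number of paired density values reported above.
rng = np.random.default_rng(0)
F = rng.uniform(0.05, 5.0, 590)   # median force (N), synthetic
L = rng.uniform(0.1, 1.0, 590)    # element length (mm), synthetic
a_true = np.array([300.0, 50.0, -20.0, -100.0])
rho = predict_density(a_true, F, L)
a_fit = fit_density(F, L, rho)
```

Because the model is linear in its coefficients, a single least-squares solve recovers them; with real, noisy pairs the same call returns the best fit in the least-squares sense.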
P9 L23 -An enhanced explanation of why the values in Figure 2 do not reflect the error/skill assessment in this section is needed. Related questions: Why does the correlation improve when Eqn. (1) was trained on a different set of comparisons? Why was Eqn. (1) not simply trained on this better alignment to begin with?
→ As written in our reply to an earlier comment on the same issue, we agree with the reviewer and modified Section 5.1. In the revised version, the values in Figure 2 (and the associated correlation analysis) are based on the same dataset and the same snow-surface re-alignment as the values used for the statistical modelling (Eq. 1 and 2). Moreover, our statement that the MF-layer alignment leads to a better correlation of the values in Fig. 2 was wrong. Slightly better R² coefficients are in fact found when using the snow surface rather than the MF-layer for re-alignment (from layer alignment to snow-surface alignment, R² changes from 0.73 to 0.75 for density and from 0.81 to 0.82 for SSA, using Eq. 1 and Eq. 2, respectively). This makes sense, as Eq. 1 and 2 were developed based on a snow-surface re-alignment. This has been corrected in the revision and Section 5.1 is now consistent.
P8L29 -Remove one set of brackets around the Eqn.
→ Done.
P10 L2 -What was the statistical test that showed the boundary transition to be significant? If untested, consider removing the word 'significant'. See also the comments in the initial statement about repeatability.
→ Boundaries were detected manually just by looking at the data, so no statistical test was used either to identify them or to confirm that they are "significant". We deleted the word "significant" and the sentence now reads (P10, L23): "The first step is to define which are the layers of interest, knowing that this method is only possible with layers that contrast well enough with their surrounding, so their boundaries can be easily identified by a rather sharp transition in the vertical profile of snow properties." We substantially modified Section 5.2 "Layer tracking", as described in a related comment above, so that the method is now better described and can be repeated.
P10 L3 -Given that the boundaries were identified subjectively, will their heights be provided in the published dataset?
→ The heights of the tracked layers will be provided in the database of this study.
P15 L3 -I agree that the information is really useful to show the formation and evolution of these fine features. However, given that Figure 6b has no minor or major ticks for the initial date (Feb 22), it is fairly difficult to identify the feature. Could a label be provided for easy reference?
→ We prefer to leave the figures as they are to avoid emphasizing a single, annotated feature. Since the location is given exactly in the text and the x-axes of the subfigures are identical, the birth of this layer can easily be read from the SMP image above.
P23 L11 -I can confirm that the recalibrated density coefficients do not produce the best-possible estimates of snow density with our SMP for Arctic snow. It would be very interesting to combine datasets from multiple units to evaluate this uncertainty.
→ We agree that it would be very interesting to compare different sites to test the recalibrations presented here.
P25L20 -Citation style should be a paraphrase.
-Please provide a colour legend for the grain type classifications even though they are standardized. Additionally, is it possible to provide sub-hatching for the hand hardness levels? It's challenging to determine the level past the first data.
→ Figure 4 has been modified accordingly.
Figure 6/9 -Has the SMP data been smoothed or aggregated? This does not appear to be mentioned in text but Figure 10 shows variability in SSA absent in Figure 9 at the 1 mm scale.
→ We used the same data with a resolution of 0.5 mm for the seasonal evolution plots as well as for the vertical profile plots (Figures 7 and 10).

Response to Review 2
We thank Reviewer 2 very much for their comments, which helped us improve the manuscript. Please find below our point-by-point replies in blue.
The paper presents the RHOSSA campaign focusing on snow density, SSA and stability measurements over one winter at Weissfluhjoch, Switzerland. Modern methods such as SMP and IceCube are compared with traditional snow pit measurements and SNOWPACK modeling. Measurement results demonstrate how modern methods can increase temporal and vertical resolution in snow profiling compared with traditional measurements. Such datasets allow a proper evaluation of modeling results, which is not possible using traditional measurements due to their poor temporal and vertical resolution. The main result is the recalibration of the Proksch et al. (2015) model for deriving SSA and snow density from SMP data.
The snow stability part is a bit disconnected from the main text, which focuses on SSA and density. The authors could consider dropping the stability measurements.
→ We understand the concern raised here. However, although the mechanical properties are not analysed in as much detail as the structural properties, we think that providing the complete dataset, including mechanical properties, can be very useful for other studies, such as studies related to avalanches. It is important to keep in mind that traditional snow observations have a long history in avalanche research, which supports daily snow observations in alpine regions (such as Switzerland or France). And since stability predictions are nowadays becoming feasible from high-resolution density profiles, we definitely want to keep this part. The full dataset is made available through a DOI given in the paper.
p4r4 Section 6-> Section 5
→ Corrected.
p4r12 Degrees missing from coordinates.
→ Corrected.
p6r15 The snowpack was sampled with 3 cm resolution. What did you do with layers thinner than 3 cm? This explains why the 22 Feb layer is "only diffusely reported in the IceCube data" (p16 r17), if it is mixed with grains from other layers. How did you sample the MF layers? They are very difficult to get into the sample holder without breaking them.
Were the low-density layers compacted to avoid measuring the sample holder?
→ Density and SSA profiles were recorded at regular height intervals of 3 cm, without considering the layering (we did the same for the SMP data with a vertical resolution of 1 mm and for the tomography data with a resolution of 18 µm). Using regular vertical grids rather than following defined layers allows data from different measurements and simulations to be compared solely on the basis of height (objective), without the need to identify layers (which can be subjective). We agree that a vertical resolution of 3 cm can lead to sampling across layer transitions and thus to a more diffuse picture of the density or SSA profile. The MF layers were not too difficult to sample in our case (not overly dense), and the procedure was the same as for the other layers. Unfortunately, we are not aware of a method of compacting low-density layers to avoid measuring the sample holder.
p8r8 Why exactly 1.2 °C?
→ The question of the impact on the simulations of not considering the precipitation phase cannot be answered straight away, as we currently do not have observations permitting a proper attribution of the precipitation phase at Weissfluhjoch. However, in preparation of the first SnowMIP around 2000, a dataset including the phase (liquid/solid, no mixed precipitation) and based on visual observations of the current weather could be constructed. These observations led us at the time to use a threshold of 1 °C. The threshold of 1.2 °C for Automatic Weather Stations located above ~1000 m a.s.l. was introduced for operational use and proved to be well suited for Switzerland and Weissfluhjoch in particular (see Schmucki et al., 2014). Over the period considered in this paper, however, there was no major precipitation associated with air temperatures above 0 °C. In summary, this threshold plays no role in the context of this study, and discussing it further in the text would be out of scope. Nevertheless, we slightly reformulated that sentence in Section 4 of the paper.
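As an illustration of the rule discussed above, a binary phase partition by an air-temperature threshold can be written as below. This is a sketch, not SNOWPACK code: the function name is hypothetical, and counting a temperature exactly at the threshold as solid is an assumption; as stated in the reply, no mixed phase is considered.

```python
import numpy as np


def precipitation_phase(t_air_c, threshold_c=1.2):
    """Return True where precipitation is treated as solid (snow), False for rain.

    Binary partition with no mixed phase; t <= threshold is counted as solid
    (assumption about the boundary case).
    """
    return np.asarray(t_air_c, dtype=float) <= threshold_c
```

For example, `precipitation_phase([-3.0, 1.2, 2.5])` flags the first two values as solid and the last as liquid; with the older 1 °C threshold (`threshold_c=1.0`), the middle value would switch to liquid.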
p9r10 What is the justification for selecting a different method for matching the profiles here than later in the paper (p9r24)? If re-aligning profiles using the MF layer resulted in "better correlation between estimates from SMP and snow pit measurements", why didn't you use the same method here to derive the parameters?
→ We thank the reviewer for pointing out this issue. We agree that the alignment methods used in Section 5.1 were confusing. The paper was revised so that the alignment used for the statistical modelling (Eq. 1 and 2) and, later, for comparing cutter/IceCube data with SMP data (Fig. 2) is the same and based on the snow surface. Using the snow surface to re-align co-located and co-temporal profiles is the more convenient and systematically applicable method, which others can apply in the same way (unlike using a specific layer of the snowpack). Moreover, our statement that the MF-layer alignment leads to a better correlation of the values in Fig. 2 was erroneous. Slightly better R² coefficients are found when using the snow surface rather than the MF-layer for re-alignment (from layer alignment to snow-surface alignment, R² changes from 0.73 to 0.75 for density and from 0.81 to 0.82 for SSA, using Eq. 1 and Eq. 2, respectively). This makes sense, as Eq. 1 and 2 were developed based on a snow-surface re-alignment. Finally, the re-alignment based on the MF-layer is now only used in the Results part, for time-series plotting, for which snow-surface alignment is not relevant because the profiles are no longer co-temporal (the snowpack height evolves over the season).
Modifications concerning the alignment method were made throughout the paper, especially:
- Figure 2 has been redone, based on data re-aligned using the snow surface.
- The R² coefficients associated with Figure 2 have been modified. They are slightly better than in the previous version (from layer alignment to snow-surface alignment, R² changes from 0.73 to 0.75 for density and from 0.81 to 0.82 for SSA, using Eq. 1 and Eq. 2, respectively). This makes sense, as Eq. 1 and 2 were developed from data aligned with the snow surface.
- Section 5.1 now reads (p10, L6): "The performance of the new parametrizations compared to the original parametrizations of Proksch et al. (2015) is presented in Figure 2. This plot shows the observed density from cutter measurements against the SMP-derived density obtained from Eq. (1) and from Proksch et al. (2015) for the 15 days for which both datasets are available (same data as used for the statistical modelling). Similarly, the observed SSA from IceCube measurements is presented against the SMP-derived SSA from Eq. (2) and from Proksch et al. (2015) for the 13 days for which both datasets are available (again, same data as used for the statistical modelling). To do so, and as done for the statistical modeling, SMP-derived properties were averaged to a 3 cm resolution, and SMP and snow pit profiles of the same day were re-aligned with the snow surface and cropped to the length of the shortest profile."
- In the introduction to the Results part (p12, L3), we further explained the choice of the MF-layer alignment for the temporal plots: "To present the evolution of profile properties with time, the vertical profiles presented in the following were re-aligned such that z = 0 cm corresponds to the height of the upper boundary of the MF-layer (i.e. the 20151202-boundary). Choosing this layer as a height reference leads to a qualitatively better match than simply taking the ground as reference (the field site ground at WFJ is uneven)."
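The MF-layer re-alignment used for the time-series plots amounts to shifting the height axis so that the tracked boundary sits at z = 0, then putting each daily profile on a common grid. A minimal sketch, assuming heights in cm above ground, one tracked boundary height per profile, and linear interpolation onto the grid; the interpolation step and the function name are our illustration, not necessarily what was used for the figures.

```python
import numpy as np


def to_reference_grid(height_cm, values, z_ref_cm, z_grid_cm):
    """Shift heights so z = 0 at the reference boundary, then interpolate.

    Grid points outside the measured profile are returned as NaN.
    """
    z = np.asarray(height_cm, dtype=float) - z_ref_cm
    v = np.asarray(values, dtype=float)
    order = np.argsort(z)  # np.interp needs increasing sample positions
    return np.interp(z_grid_cm, z[order], v[order], left=np.nan, right=np.nan)
```

Stacking the outputs for successive days column by column gives a height-versus-time image in which the MF-layer boundary stays on one horizontal line, so the uneven ground at WFJ no longer shifts the profiles against each other.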
p9 The model parameters are derived from IceCube measurements. Later (e.g. Fig 11) you show that there are big differences between IceCube and tomography measurements. Please comment on the accuracy of the SMP-derived SSA values.
→ Comparisons between SSA measurements are described in Section 7.3. The SMP-derived SSA values inherit 1/ the accuracy of the IceCube measurements (since they come from a fit (Eq. 2) to the IceCube data) and 2/ the quality of the statistical model (how good the fit is). Concerning 2/, the quality of the model is described in Section 5.1 and in Figure 2 (scatter plot). To further describe the correlation of the values in Fig. 2, we included the RMSD values. P10, L14 now reads: "Applying a simple linear correlation between rho_cutter and rho_smp, an R² coefficient of 0.87 and a root-mean-square deviation (RMSD) of 34 kg m⁻³ are found when using Eq. (1), against an R² of 0.75 and an RMSD of 69 kg m⁻³ when using the parametrization of Proksch et al. (2015). Between SSA_ic and SSA_smp, an R² coefficient of 0.82 and an RMSD of 7 m² kg⁻¹ are found when using Eq. (2), against an R² of 0.65 and an RMSD of 14 m² kg⁻¹ when using the parametrization of Proksch et al. (2015)." Also, in Section 7.3 (page 24, line 14), we raise the point that the present statistical model used to derive SSA from SMP measurements fails to reproduce the high SSA values of newly deposited snow, and that this could be due to its under-representation (only one day) in the IceCube dataset used for calibration. Point 1/ is mentioned in Section 7.3 as follows: "First, we recall that density and SSA derived from SMP data were obtained to best match results from the cutter and IceCube measurements, so they necessarily inherit their performances" (p 23, L. 27). This implies that any discrepancies between IceCube and tomography data will necessarily also be found between SMP-derived data and tomography data.
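The two scores quoted in the revised text can be computed as below. A sketch, assuming R² is the squared Pearson correlation of the simple linear correlation mentioned in the text and RMSD is taken directly between the paired observed and SMP-derived values (no bias removal); the function name is hypothetical.

```python
import numpy as np


def r2_and_rmsd(observed, estimated):
    """Squared Pearson correlation and root-mean-square deviation of paired values."""
    obs = np.asarray(observed, dtype=float)
    est = np.asarray(estimated, dtype=float)
    r = np.corrcoef(obs, est)[0, 1]
    rmsd = np.sqrt(np.mean((est - obs) ** 2))
    return r ** 2, rmsd
```

Reporting both is useful because they measure different things: R² is insensitive to a constant offset, whereas RMSD is not. For observed values [1, 2, 3, 4] and estimates [2, 3, 4, 5], R² is 1 while the RMSD is 1, which is why a systematic bias can hide behind a high R².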
p11r2 choose->chose
→ Modified accordingly.
p11r20 caption->panel
→ Modified accordingly.
p22 Fig 11. The difference between SSA derived from SMP and tomography varies between different layers. Do you think the snow structure (grain type) has something to do with that? Should the SSA model be calibrated separately for different grain types?
And why are there big differences between IceCube measurements and SMP? If IceCube data were used in the fitting, shouldn't they agree better?
→ From Figure 11, we think that the variations in the differences between SSA derived from SMP and tomography depend more on the range of SSA values considered than on the layers (and hence on the snow structure). Indeed, Eq. 2 performs better for some SSA ranges than for others. In particular, looking at Figure 2b for SSA values below 20 m² kg⁻¹, we see that most SMP values are slightly overestimated compared to the IceCube values (cloud of values slightly below the 1:1 line). Back to Figure 11, this bias clearly appears for most layers, whose SSA values are mostly below 20 m² kg⁻¹. We added a sentence about this in the paper, which reads (P24, L7): "Note that one major discrepancy between IceCube and SMP-derived SSA stems from the fact that the calibration used (parameterization) largely leads to an overestimation of SSA values below about 20 m² kg⁻¹ by the SMP compared to IceCube (see Figure 2b, where the data cloud is mostly located below the 1:1 line). This can be clearly seen in our results (Figures 9, 10, and 11), since a large part of the snowpack shows SSA values below 20 m² kg⁻¹."
→ Regarding the differences observed between SMP estimates of SSA and IceCube, they are directly linked to the quality of the prediction by Eq. 2. As to why a better regression could not be obtained, we think one reason is that some snow types might not be well captured because they are under-represented in the IceCube measurements, such as fresh snow in our case (fresh snow was measured in the top centimeters of the snowpack on only one measurement day). To improve this, the calibration dataset should be extended so that all snow types are rigorously covered.
→ Regarding the grain type: a large part of the motivation of this work is to make a step away from (subjective) indices.
Thus, re-introducing grain-type-dependent calibration coefficients is, from our perspective, the wrong way to go. It is true, however, that the microstructure has an impact on the performance of the calibration model. This is why merely introducing the SMP parameter L into the model already yielded a significant improvement of the calibration (in particular in depth hoar) over the older approaches that used only the median SMP force. Similar improvements are expected for other snow types. The fact that the SSA point cloud in Fig. 2 is not straight but slightly curved further suggests that the present calibration model still misses essential physics.
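The range-dependent bias described above (SMP-derived SSA tending to exceed the IceCube values below about 20 m² kg⁻¹) can be quantified with a simple split by SSA range. A sketch under assumptions: samples are split by the IceCube value, the 20 m² kg⁻¹ threshold is taken from the text, and the function name and output format are hypothetical.

```python
import numpy as np


def bias_by_range(ssa_ic, ssa_smp, threshold=20.0):
    """Mean bias (SMP - IceCube) and share of SMP overestimates below/above a threshold.

    Samples are split by the IceCube value (assumption); empty groups give NaN.
    """
    ic = np.asarray(ssa_ic, dtype=float)
    smp = np.asarray(ssa_smp, dtype=float)
    below = ic < threshold
    result = {}
    for name, mask in (("below", below), ("above", ~below)):
        if not mask.any():
            result[name] = (np.nan, np.nan)
            continue
        diff = smp[mask] - ic[mask]
        result[name] = (float(diff.mean()), float((diff > 0).mean()))
    return result
```

Splitting by range rather than by grain type keeps the diagnostic in terms of objective properties, consistent with the reply's argument against grain-type-dependent coefficients.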
p23 Fig 12. Please add SNOWPACK profiles as well.
→ SNOWPACK simulations were added to Fig 12.