These authors contributed equally to this work.

Our current knowledge of spatial and temporal snow depth trends is based almost exclusively on time series of non-homogenised observational data. However, like other long-term series from observations, they are prone to inhomogeneities that can influence and even change trends if not taken into account. In order to assess the relevance of homogenisation for time-series analysis of daily snow depths, we investigated the effects of adjusting inhomogeneities in the extensive network of Swiss snow depth observations for trends and changes in extreme values of commonly used snow indices, such as snow days, seasonal averages or maximum snow depths in the period 1961–2021. Three homogenisation methods were compared for this task: Climatol and HOMER, which apply median-based adjustments, and the quantile-based interpQM. All three were run using the same input data with identical break points. We found that they agree well on trends of seasonal average snow depth, while differences are detectable for seasonal maxima and the corresponding extreme values. Differences between homogenised and non-homogenised series result mainly from the approach for generating reference series. The comparison of homogenised and original values for the 50-year return level of seasonal maximum snow depth showed that the quantile-based method had the smallest number of stations outside the 95 % confidence interval.
Using a multiple-criteria approach, e.g. thresholds for series correlation (

During winter in the Northern Hemisphere, more than 50 % of the earth's surface can be covered with snow

All climate time series comprise a climate signal, a station signal and white noise

Today, this is a standard procedure for climate data like temperature and precipitation

Widely used metrics to describe the snow cover include average and maximum snow depths and days with a snow depth above a certain threshold, referred to here as snow days. This commonly used index is defined as the number of days within a certain time period (e.g. season) with a certain snow depth, usually between 1 and 50 cm

We use the break points recently identified by

Our research questions are the following.

How do the homogenised series compare across the three methods used?

What influence does homogenisation have on the decadal trends in average and maximum snow depth?

How do the three homogenisation methods affect widely used snow indices?

To what extent are the maximum snow depths with a 50-year return period (as an example of snow metrics used by practitioners) affected by the different homogenisation methods?

Daily manual snow depth measurements (HS) from 184 Swiss stations serve as the basis for quantifying the benefit of data homogenisation for snow depth series. Seasonal (November to April) and monthly averages (HSavg), maximum snow depths (HSmax), and the number of snow days

Figure

Summary of autocorrelations for lag (year) 1 for all stations (

We use the set of 45 break points (found in 40 snow depth series) identified by

Left panel: map of Switzerland with all 184 Swiss stations used in this study. The 40 identified inhomogeneous stations with valid break points are shown as pink triangles. The green circles are series that are considered homogeneous. Right panel: elevation distribution of the homogeneous stations and those with detected breaks.

Each break point of a candidate series is adjusted by a multiplicative approach to the most recent status of the snow station. This holds for all three adjustment methods applied. Adjustment factors are based on statistical measures of the candidate and reference series and are applied to the monthly (Climatol, HOMER) or daily (interpQM) values. The statistical measures used for adjustment (e.g. median, quantiles) differ between the three methods and are described below. Note that the reference series used for adjustment by the three methods are not identical and are selected on different criteria. For interpQM and HOMER, they are known to the user.
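The difference between a single median-based factor and quantile-dependent factors can be illustrated with a minimal sketch. It is not the implementation of Climatol, HOMER or interpQM: the function names, the use of candidate-to-reference ratios before and after the break, the choice of quartiles and the toy values are all assumptions made for illustration.

```python
from bisect import bisect_left
from statistics import median, quantiles


def median_factor(cand_before, ref_before, cand_after, ref_after):
    """Single multiplicative factor for the whole period before the break.

    Assumption: the factor is the candidate/reference median ratio after the
    break (most recent status) divided by the same ratio before the break.
    """
    ratio_before = median(cand_before) / median(ref_before)
    ratio_after = median(cand_after) / median(ref_after)
    return ratio_after / ratio_before


def quantile_factors(cand_before, ref_before, cand_after, ref_after, n=4):
    """One factor per quantile cut point (n=4 gives the three quartiles)."""
    cb = quantiles(cand_before, n=n)
    rb = quantiles(ref_before, n=n)
    ca = quantiles(cand_after, n=n)
    ra = quantiles(ref_after, n=n)
    factors = [(ca[i] / ra[i]) / (cb[i] / rb[i]) for i in range(n - 1)]
    return factors, cb


def adjust_by_quantile(values, factors, cuts):
    """Multiply each pre-break value by the factor of the first cut point that
    is not below it (the last factor for values above all cut points).
    Zero snow depth stays zero because the adjustment is multiplicative."""
    adjusted = []
    for v in values:
        i = min(bisect_left(cuts, v), len(factors) - 1)
        adjusted.append(v * factors[i])
    return adjusted


# Toy data (hypothetical, in cm): before the break the candidate reads half
# of the reference, after the break it matches the reference.
ref_before = [20, 40, 60, 80, 100]
cand_before = [10, 20, 30, 40, 50]
ref_after = [20, 40, 60, 80, 100]
cand_after = [20, 40, 60, 80, 100]

single = median_factor(cand_before, ref_before, cand_after, ref_after)
factors, cuts = quantile_factors(cand_before, ref_before, cand_after, ref_after)
adjusted = adjust_by_quantile([0, 10, 30, 50], factors, cuts)
```

In this toy case both approaches give the same factor because the offset is the same at all snow depths. They diverge as soon as the inhomogeneity affects small and large snow depths differently.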

All analysed methods use the same data set to select suitable reference stations for the calculation of the adjustment factors based on the pre-determined break points, which in our case are provided by

Climatol

The adjustment factor of a time series

HOMER

interpQM

The use of homogenisation techniques that adjust daily values allows the analysis of the impact on derived indicators that require daily data for their calculation, e.g. snow days. Only the original data and interpQM are compared here: HOMER only provides monthly or seasonal data, and Climatol repeatedly crashed when run on the full daily data set. Because pre-selecting stations would have influenced the results, we decided not to use these two methods for this purpose.
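How such indices follow from daily values can be sketched in a few lines. This is an illustration only: the function names, the convention of labelling a winter by the year in which it ends, the use of "greater than or equal to" for the threshold and the absence of any missing-data handling are assumptions, not details taken from the study.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

SEASON_MONTHS = {11, 12, 1, 2, 3, 4}  # November to April


def season_year(day):
    """Label a winter by the year in which it ends (assumed convention)."""
    return day.year + 1 if day.month >= 11 else day.year


def seasonal_indices(daily_hs, thresholds=(5, 30, 50)):
    """Derive HSavg, HSmax and snow days (dHS) per season.

    daily_hs maps datetime.date to snow depth in cm.
    """
    by_season = defaultdict(list)
    for day, hs in daily_hs.items():
        if day.month in SEASON_MONTHS:
            by_season[season_year(day)].append(hs)

    result = {}
    for season, values in sorted(by_season.items()):
        row = {"HSavg": mean(values), "HSmax": max(values)}
        for t in thresholds:
            # Snow day: snow depth at or above the threshold (assumption).
            row[f"dHS{t}"] = sum(1 for v in values if v >= t)
        result[season] = row
    return result


# Toy winter 2019/20 (hypothetical): 10 cm on every day, 40 cm on ten days.
start = date(2019, 11, 1)
toy = {start + timedelta(days=i): 10 for i in range(182)}
for i in range(60, 70):
    toy[start + timedelta(days=i)] = 40

indices = seasonal_indices(toy)
```

Applying the same function to the original and to the adjusted daily series gives directly comparable seasonal values.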

interpQM does not add new days with snow (

Theil–Sen slopes
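For readers unfamiliar with the estimator, a Theil–Sen slope is the median of the slopes between all pairs of points, which makes it robust against single outliers. The sketch below expresses it per decade. The function name and the toy series are assumptions, and no significance test is included.

```python
from itertools import combinations
from statistics import median


def theil_sen_per_decade(years, values):
    """Median of all pairwise slopes, scaled from per year to per decade."""
    points = list(zip(years, values))
    slopes = [
        (v2 - v1) / (y2 - y1)
        for (y1, v1), (y2, v2) in combinations(points, 2)
        if y2 != y1
    ]
    return 10 * median(slopes)


# Toy series (hypothetical): a decline of 0.5 cm per year with one outlier.
years = list(range(1961, 2022))
values = [100 - 0.5 * (y - 1961) for y in years]
values[30] = 300  # a single extreme season does not change the estimate

trend = theil_sen_per_decade(years, values)
```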

To investigate the effects of homogenisation on extreme snow depths, return levels for the seasonal maximum snow depth (HSmax)

We use the sub-set of 40 stations with identified breaks as input and adjust them with the three methods. While Climatol and HOMER use monthly values as input and thus only provide monthly HSavg and HSmax values, interpQM works with daily snow depth values. From these the analysed seasonal HSavg and HSmax are then derived after the successful adjustment. Decadal trends are calculated for seasonal dHS (snow days) of several thresholds and HSavg aggregated from either monthly HSavg (HOMER and Climatol) or daily HS (interpQM). The largest HSmax value per station, calculated over the entire period (absolute maximum snow depth, maxHSmax), is compared for homogenised and original values. The return levels for seasonal HSmax with a 50-year return period (R50HSmax) are determined either from daily homogenised HS aggregated to seasonal HSmax (interpQM) or from monthly homogenised HSmax (HOMER and Climatol). All calculated trends and the R50HSmax of the different methods are then compared.
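The comparison of return levels against a confidence interval can be sketched as follows. The extreme value distribution and fitting procedure used in the study are not given in this section, so the sketch substitutes a Gumbel distribution fitted by the method of moments and a percentile bootstrap for the interval. Both choices, as well as all function names and the toy values, are assumptions for illustration.

```python
import math
import random
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329


def gumbel_return_level(maxima, return_period=50):
    """Return level from a Gumbel fit by the method of moments (assumed)."""
    beta = stdev(maxima) * math.sqrt(6) / math.pi
    mu = mean(maxima) - EULER_GAMMA * beta
    p = 1.0 - 1.0 / return_period  # non-exceedance probability
    return mu - beta * math.log(-math.log(p))


def bootstrap_interval(maxima, return_period=50, n_boot=1000, seed=1):
    """Approximate 95 % percentile-bootstrap interval for the return level."""
    rng = random.Random(seed)
    levels = sorted(
        gumbel_return_level(rng.choices(maxima, k=len(maxima)), return_period)
        for _ in range(n_boot)
    )
    return levels[int(0.025 * n_boot)], levels[int(0.975 * n_boot) - 1]


def outside_interval(value, interval):
    """True if a homogenised return level lies outside the original interval."""
    lower, upper = interval
    return value < lower or value > upper


# Toy seasonal maxima in cm (hypothetical values, far too few for real use).
hsmax_orig = [100, 120, 140, 160, 180]
r50 = gumbel_return_level(hsmax_orig, 50)
r10 = gumbel_return_level(hsmax_orig, 10)
interval = bootstrap_interval(hsmax_orig)
```

Counting the stations for which `outside_interval` is true, separately for each homogenisation method, gives the kind of summary used to compare the methods.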

Climatol automatically fills in any existing missing dates and interpolates their corresponding values, resulting in an artificially increased length of these series. It also automatically adjusts outliers in the homogeneous period in the default settings.

In the following section, we compare the results of the different adjustment methods with each other and the homogenised data with the non-homogenised data. In this way we can show both the effect of homogenisation and the dependence of the results on the method used. Sect. 4.1 illustrates this for the number of snow days, in particular for the effects on the trend (for interpQM only). Sect. 4.2 does the same for the maximum snow depth. Finally, Sect. 4.3 gives a particular example for the magnitude and frequency of extreme snow depth.

Statistics for snow days (dHS) for the period 1961 to 2021 on a seasonal basis with thresholds of 5 (dHS5), 30 (dHS30) and 50 (dHS50) cm for both original (Orig) and interpQM-homogenised data (iQM).

Percentages for significant negative and significant positive, indicated with an asterisk, are calculated based on the total number of negative/positive values, respectively. Positive/negative trends were

Difference in snow day trends between original and interpQM adjusted series for thresholds 5, 30 and 50 cm (dHS5, dHS30 and dHS50). Purple squares indicate stations with a result of

The number of snow days per season was examined for two sub-groups of stations, below (

The adjustments made had the strongest effect on dHS30 and dHS50 at stations above 1000 m, as can be seen in Fig.

The number of snow days per season is declining for the vast majority of stations for all analysed thresholds and elevation levels, as shown in Table

Overall, the homogenisation removed all positive trends and, depending on the threshold for snow depth and elevation sub-set, either did not change or reduced the number of stations without trends: for example, 86 % of the high-elevation stations had a negative trend for dHS30 before and 100 % after the homogenisation. The percentage of low-elevation stations with no trend for dHS50 changed from 42 % to 35 % after the homogenisation, while the percentage of stations with a negative trend was raised from 54 % to 65 %.

In general, the adjustments changed the median and mean trends of both sub-sets for dHS5 and the higher-elevation sub-sets for dHS30 and dHS50 to more negative and the lower-elevation sub-sets of dHS30 and dHS50 to less negative. The mean trends of the lower elevations changed from

The percentage of stations with no trend behaves differently for dHS5 than for the larger thresholds: from the lower to the higher elevations it increases from 0 % to 7 % for dHS5 but decreases for both dHS30 (from 19 % to 0 %) and dHS50 (from 42 % to 0 %). Homogenisation changed these figures only for dHS50, where instead of 42 % only 35 % of the lower-elevation stations do not show a trend. The percentage of stations with a negative trend changed from 100 % to 93 % for dHS5 and from 77 % to 81 % for the lower stations for dHS30. It increased at the higher elevations for dHS30 (from 86 % to 100 %) and at all elevations for dHS50 (from 54 % to 65 % for the lower elevations and from 93 % to 100 % for the higher elevations). A similar pattern is seen in the significant negative trends: an increase at all higher-elevation stations (between 8 % and 19 %) but a decrease at lower elevations for dHS30 (3 %) and dHS50 (33 %). Overall, interpQM weakened the dHS5 trends for 35 % of all the stations, strengthened them for 30 % and did not change them for 35 %. For dHS30, 38 % of all the stations had weaker trends after the adjustments, 40 % had stronger trends and for 22 % they did not change. For dHS50, the trend weakened for 30 %, strengthened for 38 % and remained unchanged for 32 % of all the stations. For dHS5, the adjustments changed the trend of 1 station to non-significant and of 12 to significant. For dHS30, the trends of 6 stations were changed to non-significant and those of 10 to significant. For dHS50, the trends of 10 stations were changed to non-significant and those of 12 to significant.

The KS test did not reveal significant differences between the original and interpQM-adjusted time series in the distribution of the dHS5, dHS30 or dHS50 time series for any of the stations analysed. A comparison with the W test also showed no significant differences for dHS5 and dHS30, only at one station (Adelboden) for dHS50.

Comparison of original and homogenised seasonal mean (HSavg) and maximum (HSmax) snow depths for the SLF station in Davos. The thick lines show a Gaussian-filtered time series with a 30-year window and the vertical dashed line the identified break in 1972.

The effect of homogenisation on the mean (HSavg) and maximum (HSmax) snow depths is illustrated using the example of Davos in Fig.

To assess the impact of homogenisation on trends of HSavg and HSmax, decadal trends are calculated for each homogenisation method and the original data, respectively. Figure

Figure

The vast majority of the trends for HSmax, 37 of the original series and 39 for all the homogenisation methods, show a negative trend, as shown on the right-hand side of Table

The KS test for differences between the original and adjusted HSavg time series showed significant differences for four stations for HOMER (Meien, Klosters, Sils-Maria, Stans) and two each for Climatol (Meien, Sils-Maria) and interpQM (Klosters, Stans). The W test showed similar results with six stations for HOMER (Meien, Klosters, St. Moritz, Glarus, Sils-Maria, Stans), five for Climatol (Meien, Klosters, St. Moritz, Glarus, Sils-Maria, Stans) and one for interpQM (Klosters). To compare the adjustment methods with each other, the homogenised time series were also tested against each other with the KS and W tests. With the KS test, significant differences were found for all the methods for two stations (Glarus, Stans). The W-test results were also significant between HOMER and Climatol for two stations (Luzern, Stans). For HSmax, the KS test showed significant differences between the original and adjusted time series for three stations for HOMER and interpQM (Klosters, St. Moritz, Elm) and two stations for Climatol (Klosters, St. Moritz). The W test was significant for four stations with HOMER and interpQM (Klosters, St. Moritz, Elm, Sils-Maria) and three with Climatol (Klosters, St. Moritz, Sils-Maria). The adjustment methods differed significantly from each other only with the W test, for three stations (La Comballaz, Saanenmöser, Samedan) between HOMER and Climatol.

Comparison of trends calculated with original and homogenised data (Climatol, HOMER, and interpQM) for the period 1961–2021 for HSavg (left-hand side) and HSmax (right-hand side). Stations are ordered according to elevation. Black dots indicate statistical significance with

Statistics for trends of HSavg and HSmax for the period 1961 to 2021.

Percentages for significant negative and significant positive, indicated with an asterisk, are calculated based on the total number of negative/positive values, respectively.

To investigate a possible influence of the homogenisation on the magnitude and frequency of extreme snow depths, the absolute maximum snow depths (maxHSmax) recorded at each station over the entire period, the year with the absolute maximum snow depth and the difference between the original and homogenised maxHSmax are plotted for each station and homogenisation method. Figure

The return levels for 50-year return periods of maximum snow depth (R50HSmax) are calculated from homogenised data and compared with the values obtained from the original data including the 95 % confidence intervals. Figure

Maximum values of HSmax recorded for each station and method over the entire period (1961–2021). Panel

HSmax with 50-year return periods and 95 % confidence intervals for both original (grey) and homogenised data using Climatol (orange), HOMER (blue), and interpQM (yellow). The whiskers represent the 95 % confidence interval for the original values. Stations are ordered according to elevation.

Statistics for R50HSmax: number and percentage of stations that are outside the original's 95 % confidence intervals for each homogenisation method.

The three methods agreed in decreasing the snow depth in the time prior to the breaks for 19 (48 %) of the 40 stations while increasing it for 17 (43 %). For four (10 %) stations, the methods had different signs for the adjustments. The differences between the homogenisation methods were more pronounced for R50HSmax and HSmax than for HSavg.

In contrast to the larger thresholds of the snow day analysis, dHS5 shows almost no differences between the original and homogenised series, confirming the stability of this metric as described by

All but two of the trends for HSavg (in both the original and homogenised data) are negative, which is consistent with the findings from previous snow studies

For most stations, the R50HSmax of the homogenised data is still within the 95 % confidence intervals of the original values. However, depending on the homogenisation method, between 3 and 7 of the investigated 40 stations (see Table

The observed differences between the three methods compared can be explained by the respective methods used to construct the reference series sub-networks and the adjustments. HOMER adjusts the entire period before an identified break point using a single factor, while Climatol uses multiple factors dependent on the reference series constructed using homogenised sub-periods. interpQM, on the other hand, uses multiple adjustment factors based on quantile matching for the entire inhomogeneous period, similarly to HOMER. The range of the applied adjustments for interpQM is shown in Appendix

The selection of suitable reference series is the crucial part of the homogenisation procedure, both for the detection of breaks and for the adjustment step. HOMER can be run in either correlation or distance mode: i.e. the sub-networks are compiled based on thresholds for either correlation or horizontal distances. In Climatol, the sub-networks are formed based on the Euclidean distance between series with a scaling parameter for the vertical component. In interpQM, the user can choose correlation and horizontal as well as vertical distance thresholds. For a height-dependent variable such as snow depth, the ability to select the sub-networks by setting thresholds for vertical and horizontal distances separately proves invaluable. It is possible, albeit cumbersome, to define the sub-networks manually and use them as input for HOMER. The ability in HOMER to visually inspect the set of reference series used for each candidate station can provide a useful indication of how accurately the reference series reflect local climatic or topographic characteristics: for example, does the majority of the reference series come from a completely different micro-climate? This is particularly important for a study area with complex Alpine topography, where neighbouring valleys may have completely different climates: northern/southern, inner-Alpine, or pre-Alpine. Furthermore, these lists of reference series can also be used to identify stations with suspicious reference series that are probably not suitable for homogenisation.
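Selecting a sub-network with separate thresholds for correlation, horizontal distance and vertical distance can be sketched as below. This is not the selection routine of any of the three packages: the function names, the station records, the planar coordinates in km and all threshold values are hypothetical.

```python
from math import hypot, sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equally long series."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def select_references(candidate, stations, min_corr=0.8,
                      max_horiz_km=50.0, max_vert_m=300.0):
    """Keep stations that pass all three thresholds (values are assumptions)."""
    refs = []
    for s in stations:
        if s["name"] == candidate["name"]:
            continue
        horiz = hypot(s["x_km"] - candidate["x_km"], s["y_km"] - candidate["y_km"])
        vert = abs(s["elev_m"] - candidate["elev_m"])
        corr = pearson(candidate["series"], s["series"])
        if corr >= min_corr and horiz <= max_horiz_km and vert <= max_vert_m:
            refs.append((s["name"], round(corr, 3)))
    # Best-correlated reference first
    return sorted(refs, key=lambda r: -r[1])


# Hypothetical stations with seasonal mean snow depths in cm.
cand = {"name": "A", "x_km": 0, "y_km": 0, "elev_m": 1500,
        "series": [50, 80, 30, 60, 90]}
network = [
    {"name": "B", "x_km": 10, "y_km": 5, "elev_m": 1600,
     "series": [55, 88, 33, 66, 99]},      # close, similar height, correlated
    {"name": "C", "x_km": 120, "y_km": 0, "elev_m": 1500,
     "series": [52, 79, 33, 61, 88]},      # correlated but too far away
    {"name": "D", "x_km": 5, "y_km": 5, "elev_m": 2400,
     "series": [100, 160, 60, 120, 180]},  # correlated but 900 m higher
    {"name": "E", "x_km": 8, "y_km": 2, "elev_m": 1450,
     "series": [70, 40, 75, 65, 45]},      # close but poorly correlated
]

refs = select_references(cand, network)
```

Station C shows why a correlation threshold alone can admit distant stations, and station E why a distance threshold alone can admit unsuitable neighbours.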

The analysis of the sub-networks for HOMER and interpQM shows that, due to the distance restriction in interpQM, reference series are drawn from a more similar region, whereas in HOMER distant stations with high correlations are frequently included. To avoid selecting close-by but unsuitable reference series due to local climatic variations, the correlation criterion in interpQM works well.

Both

This study is the first in-depth comparison of different homogenisation methods applied to a large network of snow depth series between 500 and 2500 m. The focus is on their influence on the decadal trends of the number of snow days, i.e. days with a snow depth above a certain threshold (5, 30 and 50 cm), the seasonal mean and maximum snow depths (HSavg, HSmax) and extreme snow depths. The results underpin the relevance of homogenising long-term snow depth series for trend and extreme value analysis. Because homogenisation affects the derived trends, this is especially true for conclusions drawn from individual series. In our analyses, for the long-term trends of HSavg and dHS5, the overall picture does not change through homogenisation of original data by median-/mean-based adjustment methods. However, the picture changes when a quantile-based homogenisation approach (interpQM) is applied, which in the case of Swiss snow depth series shows the strongest effect, with only negative trends for HSavg and a slight increase in the number of significant trends. The differences between the methods increase for metrics based on seasonal maxima: the trends for HSmax (where trends of low-elevation stations were significant only with interpQM), the absolute maximum snow depths and the extreme values. The homogenisation performed with interpQM increases the confidence in the derived extreme values based on the 95 % confidence interval, which is particularly relevant for engineering applications. As far as snow days are concerned, the quantile-based adjustments had the strongest impact on the larger snow depth thresholds.

Our results support a homogenisation approach that separates the break point detection from the adjustment procedure, e.g. to use the robust combined detection approach described in

So far, the homogenised snow depth time series have shown no evidence of a bias in the methods towards increasing or decreasing snow depths due to the adjustments made, either in Austria or in Switzerland. In this study, depending on the homogenisation method, the mean snow depth before a break was increased at about 52 %–57 % of the stations and decreased at 42 %–45 % of them. In the original data, 95 % of the 40 inhomogeneous stations show a negative trend for seasonal mean snow depth, which is significant for 58 %. These figures are lower for the 144 homogeneous stations in the data set, where 78 % show a negative trend that is significant for 50 %.

As pointed out, break detection for snow depth is preferably done using the described two-out-of-three method. From our experience, there is no incentive or advantage to using automatic homogenisation methods such as HOMER and Climatol. On the contrary, automatic methods open the door to unintended automatic outlier corrections or adjustments based on the selection of reference series that are sufficiently correlated but that cannot be assigned in a climatologically meaningful way. To achieve reasonable results, these methods require a certain degree of user intervention, e.g. the use of a pre-defined selection of reference stations, thresholds for correlation, and horizontal and vertical distances. Therefore, it seems promising to separate the detection and adjustment of breaks using the described two-out-of-three method for detection and interpQM for the adjustment, as it provides reliable results, especially for larger snow depths, and yields daily data.

Input data for the various homogenisation methods are available on EnviDat at

The study was devised by MBu and CM with input from GR, WS and MBe. Snow day analysis was performed by GR, HSavg and HSmax analysis by MBu. The figures were produced by GR and MBu. GR and MBu discussed the results with input from CM, SB and WS. MBu and GR wrote the initial draft. The article was finalised by GR with contributions from all the co-authors.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors want to thank the two anonymous reviewers for investing their time in improving and polishing our manuscript with ideas and constructive comments. We would also like to thank MeteoSchweiz, SLF, ZAMG and the Austrian Water Budget Department for access to their data sets. For data juggling, homogenisation and evaluation, R 4.2

This research has been supported by the FWF (Fonds zur Förderung der wissenschaftlichen Forschung, project Hom4Snow, grant no. I 3692) and the Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (grant no. SNF:200021L 175920).

This paper was edited by Masashi Niwano and reviewed by two anonymous referees.