Identification and correction of snow depth bias in ERA5 datasets over Central Europe using machine learning

Stachura, Gabiel; Ustrnul, Zbigniew

doi:10.5194/tc-20-2295-2026

Articles | Volume 20, issue 4

https://doi.org/10.5194/tc-20-2295-2026

Articles | Volume 20, issue 4

Research article

21 Apr 2026

Research article |

| 21 Apr 2026

Identification and correction of snow depth bias in ERA5 datasets over Central Europe using machine learning

Gabiel Stachura and Zbigniew Ustrnul

Abstract

Accurate estimation of snow depth is a crucial problem from both meteorological and hydrological points of view. Global and regional reanalyses still struggle to address it, mostly because the scale of snow spatial heterogeneity is widely beyond current resolutions of the datasets. In the study, snow depth estimations from Copernicus reanalyses ERA5 and ERA5-Land are compared and evaluated against point measurements in Poland, the Czech Republic and Slovakia in winter seasons 2001/2002–2020/2021. Additionally, a Random Forests (RF) model is developed to reduce identified errors based on various environmental variables and parameters derived from the reanalyses and a digital elevation model. As mountains are main snow water reservoirs for Central Europe, the model is then used to spatially downscale snow depth over a fine-scaled subdomain in mountainous terrain.

For both reanalyses, the deviations are relatively small in flat or gently rolling terrain (below 500 m a.s.l.). ERA5 (0.25°) outperforms ERA5-Land (0.1°) due to the presence of data assimilation. Since only synop measurements are assimilated, errors are the lowest for these stations, however, climate and precipitation stations are also affected. In more complex terrain, both reanalyses exhibit an underestimation of snow that increases with elevation. In this area, ERA5-Land is slightly less biased due to its higher resolution and the fact that observations from mountainous sites are often masked out from the data assimilation in ERA5. The proposed RF model improves accuracy of estimation by around 48 % with respect to the best reanalysis. The results of spatial downscaling certainly provide added value to the problem of snow estimation in complex terrain. Although they cannot be considered entirely valid and reliable since not all factors determining spatial variability of snow at such resolution are taken into account, they might be useful for future studies concerning, e.g., climatological variability of snow with respect to altitudinal zonation.

Download & links

How to cite.

Received: 14 Oct 2025 – Discussion started: 24 Nov 2025 – Revised: 03 Feb 2026 – Accepted: 24 Feb 2026 – Published: 21 Apr 2026

1 Introduction

Accurate estimation of snow cover, being a phenomenon at the interface between the atmosphere and the land surface, is of particular interest to meteorologists and hydrologists. From a meteorological point of view, as soon as snow cover occurs, it alters, e.g., surface radiation budget and roughness, which in turn affect other meteorological elements such as near-surface air temperature or wind speed (Pomeroy and Brun, 2001). From a climatological perspective, seasonal variations of snow cover duration and snow depth are used as one of the climatological characteristics of winter (Brown and Mote, 2009; Falarz and Bednorz, 2021). Although in numerical modelling it is usually snow water equivalent (SWE) which is a principal variable storing information about the amount of snow at every model timestep (ECMWF, 2016b; Le Moigne et al., 2018), there are issues in favour of considering snow depth instead. First of all, snow depth in situ measurements are taken more easily than SWE – besides a snow stick, no special equipment is needed. Secondly, they are more temporally continuous since there is no minimum amount of snow, below which taking a measurement is unfeasible. Consequently, the in situ data coverage for snow depth is higher than in case of SWE. Last but not least, it is snow depth which is actually assimilated in most of the models, as the model background field is transformed from SWE to snow depth prior to performing snow analysis (ECMWF, 2016a; Helmert et al., 2018).

In recent years, there has been a tremendous growth of studies that use reanalyses as a proxy for observations (Nouri and Homaee, 2021; Palarz et al., 2020; Vernay et al., 2022). Hence, reanalyses validation becomes a fundamental task. There are, however, several factors that make comparing point snow depth measurements with gridded datasets more complicated than in the case of other meteorological elements, such as 2 m air temperature or precipitation. First of all, spatial heterogeneity of snow depth is much larger than the size of a grid cell of most existing datasets. Hence, what is actually compared is a point measurement against a grid-averaged value (Copernicus Climate Change Service, 2019). Secondly, snow depth is a cumulative parameter. As a result, bias between modelled and measured snow depth, assuming no data assimilation, has cumulative nature. It makes verification a non-trivial task. Naturally, the problem occurs at most in areas with snow cover persisting throughout a winter season and is less severe where snow cover is intermittent.

Despite the above-mentioned hurdles, accuracy assessment of snow in ERA5 and ERA5-Land reanalysis using point measurements from meteorological stations has already been carried out in literature. A study conducted by the provider of the reanalysis showed that both reanalyses underestimate snow at mountain sites, however, at altitudes up to 1500 m a.s.l. ERA5 generally outperforms ERA5-Land (Muñoz-Sabater et al., 2021). For more elevated sites, ERA5-Land is more accurate due to the added value of higher resolution. It is also shown that the superiority of one reanalysis against the other is region-dependent. Overestimation of snow in ERA5 has been identified in several studies over the High Mountain Asia (HMA) region (Lei et al., 2023; Orsolini et al., 2019; Wang et al., 2021). Lei et al. (2023) stated that the bias is especially distinct during the ablation period. They also spotted it in ERA5-Land, however, to a much lesser degree than in ERA5. Smaller overestimation of snow in ERA5-Land was also recognized over the mountainous part of Iran by Majidi et al. (2025). On the other hand, they detected a systematic underestimation in the atmospheric ERA5 across all elevations. Opposite conclusions were drawn by Monteiro and Morin (2023) who examined snow bias in the Alps. They found that it is ERA5-Land that has greater positive bias. The bias increases with elevation, while in ERA5 it becomes negative at altitudes below 600 m a.s.l. Overestimation of snow in ERA5-Land was reported in the Alps also by Pflug et al. (2025) and in the Atlas Mountains in Africa by Baba et al. (2021). However, it should be noted that those two works concern only SWE. Particularly interesting is research done by Varga and Breuer (2023) as it focused on the Pannonian Basin, which is just to the south of the area of this study. They claimed that ERA5-Land overestimates snow while ERA5 matches observations well. However, their analysis was limited only to flat terrain. In addition, they concluded that the quality of ERA5 away from assimilated sites is yet to be investigated.

Monumental intercomparisons concerning ERA5 as well as other regional and global reanalyses have shown that systematic biases in snow estimation are a common issue (Monteiro and Morin, 2023; Mudryk et al., 2025). Hence, there have been a couple of methods developed to reduce them. Some studies address it by dynamical downscaling, using Weather Research and Forecasting (WRF) or some other regional numerical model (Baba et al., 2021; Varga and Breuer, 2023; Yang et al., 2021). Wang et al. (2021) used the Japanese 55-year Reanalysis (JRA-55) to correct biased snow depth in ERA5 over the HMA region using a linear scaling factor. Recently, machine learning (ML) algorithms have been recognised as a robust tool to achieve this goal, particularly Long Short-Term Memory (LSTM) and Random Forests (RF) models. King et al. (2020) used RF to reduce overestimation of SWE in a regional 1 km resolution reanalysis in Ontario, Canada. Two studies concerning the application of LSTM to forecast SWE based on meteorological forcing over the western United States reached the conclusion that without nudging the results with observations, LSTM predictions suffer from bias that grows over time (Cui et al., 2023; Song et al., 2024). It seems, however, that application of ML is the most advantageous for prediction based on combining data from multiple datasets and sources. Qiao et al. (2021) merged data from ERA-Interim, Modern-Era Retrospective Analysis for Research and Applications (MERRA-2), Global Land Data Assimilation System (GLDAS) and microwave remote sensing to obtain snow depth over China. A similar job was done in the Western Himalayas by Tanniru et al. (2023) as they ran an extremely randomised trees model combining data from ERA5, MERRA-2, Canadian Meteorological Centre (CMC) and JRA-55 reanalysis. A broader study area occurs in the work of Hu et al. (2024) where they combined several reanalyses (e.g., Era-Interim, MERRA-2) and remote sensing snow products to create a gridded dataset for the Northern Hemisphere using RF. The results were superior to input reanalyses, however, the resolution was quite coarse (0.25°). Work on the integration of satellite products and data from ERA5 using LSTM to get continuous spatial distribution of snow depth in the Atlas Mountains is ongoing in Morocco (Elyoussfi et al., 2023).

In summary, studies investigating snow depth and SWE in ERA5 and ERA5-Land revealed it is mostly overestimated, with ERA5-Land being generally biased more. However, as pointed out by Muñoz-Sabater et al. (2021), the exact outcome may be different as it is region- and elevation-dependent. Moreover, the work of Varga and Breuer (2023), which is to our knowledge the only one concerning Central Europe, is limited only to flat terrain and synoptic stations. Hence, there remains a need to address the problem more thoroughly (using more station data) and more specifically – for the part of Central Europe which involves also complex orography. Therefore, the first objective of the study is to assess spatial and temporal variability of bias in ERA5 and ERA5-Land snow depth estimation in Central Europe. The assessment is followed by an investigation of the potential of ML to combine information from these two reanalyses in order to reduce the identified errors and create a consistent snow distribution over a subdomain with complex orography, which constitutes the second aim of the study. Although several robust ML tools have been proposed in literature, they usually concern large domains and therefore their spatial resolution often does not correspond to the scale of topographic complexity. Hence, beyond reducing systematic errors, the study uniquely considers the problem of ML-based spatial downscaling to the hectometric scale.

This paper has been divided into four parts. The first part describes input data and how it is processed to become predictors for ML training. Then, a RF algorithm as well as verification metrics are briefly introduced. Additionally, a setup for a spatial downscaling experiment is defined. The Results section deals with assessing the accuracy of both reanalyses and the RF model. It is followed by the discussion which handles, e.g., circumstances favouring large deviations and the influence of data assimilation. Eventually, limitations of emulating spatial variability of snow in high resolution in complex terrain are carefully analysed.

2 Data and methods

2.1 Study area

The investigated area extends from 48.5 to 55° N and 14 to 24.5° E, representing the north-eastern part of Central Europe. It covers the whole administrative area of Poland and most of the Czech Republic and Slovakia. Its northern and central parts belong to the North European Plain and are usually flat or gently rolling with elevations not exceeding 200 m a.s.l. (Fig. 1). South of that, uplands and mountain ranges occur, belonging to the Central European Uplands. Two major mountain ranges can be distinguished: the Sudetes and the Western Carpathians. Their elevation ranges from 800 to 1600 m a.s.l. in the Sudetes and from 800 to 2650 m a.s.l. in the Western Carpathians. The south-western edge of the study area also reaches the Bohemian Forest (800–1400 m a.s.l.). Average snow conditions in the study area are characterised in the Appendix A.

https://tc.copernicus.org/articles/20/2295/2026/tc-20-2295-2026-f01

Figure 1Orography map of the study area with country borders and stations included in the study (S: synop, C: climate, P: precipitation).

2.2 Study period and ground-based data

The study spans 20 winter seasons defined as a period between 1 November and 30 April in years 2001–2021. While the assumed duration of a winter period is certainly too short for high-elevated sites (particularly regarding a snowmelt period), it was set as a reasonable compromise between losing some data and a large amount of empty data (i.e., without snow) from the majority of the stations. The initial and final years of the study period were specified so as to capture considerable interannual winter variability while having limited access to historical snow depth measurements in national weather centres. A more detailed description of snow conditions in the study period may be found in the Appendix A.

Ground-based data includes snow depth measurements taken daily at 06:00 UTC at meteorological stations of national weather services of Poland, the Czech Republic and Slovakia. Only data from manned stations were collected. The stations are grouped according to their importance into three ranks: synoptic (reporting every hour according to an international SYNOP standard), climate (reporting several basic meteorological elements internally at 06:00, 12:00 and 18:00 UTC) and precipitation (reporting only daily precipitation sum and snow phenomena internally at 06:00 UTC). The methodology of snow depth measurements is the same across the station ranks, however, due to diverse quality control mechanisms, credibility of data is somewhat lower for non-synops. Additionally, measurements at a few top-mountain stations are done in an approximate way due to site's high susceptibility on snow deflation. A total of 340 stations (91 synoptic, 140 climate and 109 precipitation) were selected according to the following criteria (ranked from most to least important):

average snow cover duration
data completeness throughout the study period
relatively even spatial distribution of stations in the study area
station rank

Consequently, stations at mountainous sites were included even if they worked only for a few winter seasons. For this reason, a denser station network could be seen in the southern part of the study area (Fig. 1). If two stations with complete data records but different ranks were close to each other, the higher-ranked one was picked. Although station elevation ranges in total from 1 to 2635 m a.s.l., 50 % of the stations lie below 300 m a.s.l. and only 5 % above 1000 m a.s.l.

2.3 ERA5 and ERA5-Land reanalyses

A number of meteorological and topographic fields from ERA5 and ERA5-Land reanalyses have been retrieved using the Climate Data Store API. Both datasets belong to the ERA5 family, which is the latest (as of the time of submitting the manuscript) generation of global reanalysis of past weather conditions produced by ECMWF (Hersbach et al., 2020). A full list of parameters from both reanalyses is to be found in Table B1 (Appendix B).

ERA5 is an atmospheric reanalysis combining short-ranged numerical forecasts with meteorological observations during a process of data assimilation (DA). It has 0.25° horizontal resolution (around 30 km) and 1 h temporal resolution. It assimilates snow depth observations from synop stations and some additional national snow measurements that are available on the Global Telecommunication System (GTS) (ECMWF, 2016a; de Rosnay et al., 2015). As claimed by the ECMWF (2024), no other measurements than synops are assimilated from the area of Poland, the Czech Republic and Slovakia. The observations are then combined with a snow-depth background field (estimated from forecasted SWE and snow density) and a satellite-based snow extent product (ECMWF, 2016a). The Optimal Interpolation algorithm is used for that purpose. The analysis is performed every 6 h starting at 00:00 UTC. It is worth noting that the snow extent product was not included in the DA system for data before 2004, which is a potential source of discontinuity. Furthermore, for data after 2010 its horizontal resolution was upgraded from 24 to 4 km (Mortimer et al., 2020; de Rosnay et al., 2015). Additionally, before the exact DA, observations undergo various quality checks. The most important ones regarding snow are that the maximum threshold of 140 cm for in situ snow depth data is given (it is lowered to 70 cm if background 2 m air temperature exceeds 8 °C) and the satellite-based snow extent is not assimilated from areas above 1500 m a.s.l. (ECMWF, 2016a). In the output dataset, snow depth occurs in fact as SWE – it is defined as the depth of water after the whole snow melted and was uniformly spread over a grid box. To obtain snow depth sensu stricto, it is necessary to divide the field by snow density. Hence, bias in snow density may be transmitted to snow depth.

ERA5-Land is a land surface reanalysis run in an offline mode with an atmospheric forcing coming from the ERA5 dataset (Muñoz-Sabater et al., 2021). Output fields are produced with a 1 h frequency at the enhanced 0.1° horizontal resolution, which is roughly three times higher than in the case of ERA5. No DA is deployed in ERA5-Land. Observations are indirectly passed to the model through an atmospheric forcing. The forcing includes surface pressure, air temperature, specific humidity, wind components, liquid and solid precipitation as well as solar radiation fluxes. It is important to highlight that no snow fields are passed to ERA5-Land. Just like in ERA5, actual snow depth is retrieved from two prognostic variables: SWE and snow density.

2.4 Auxiliary data

Data from the Digital Elevation Model (DEM) – Shuttle Radar Topography Mission Version 4.1 is used to provide information about altitude and topography (Jarvis et al., 2008). The data is in 3^′′ horizontal resolution (around 75 m in the study area).

In addition to station-based measurements of snow depth, data from the public GlobalSnowPack (GSP) database is used as a supplementary and independent source of information on snow cover extent. The data provide daily snow cover extent with 500 m horizontal resolution derived from MODIS satellite measurements (Dietz et al., 2015). The product is already gap-filled and quality-tested. Data gaps caused, e.g., by cloudiness are filled with temporally interpolated values. The data were not used for ML training but merely for verification.

2.5 Data preprocessing

2.5.1 Ground-based data

If a trace of snow (i.e., snow depth lower than 0.5 cm) or snow in patches (i.e., less than one-half of the ground covered by snow) was reported, snow depth was assumed to be 0 cm. If snow cover was discontinuous but it covered more than one-half of the ground, snow depth was kept as measured.

2.5.2 ERA5 and ERA5-Land reanalyses

Reanalysis data was extracted for every station using nearest neighbour interpolation as implemented in the Climate Data Operators (CDO) function remapnn (Schulzweida, 2023). Next, selected fields were temporally aggregated in order to capture their variability in between snow depth measurements, which are taken once a day at 06:00 UTC. The exact metric of aggregation and its timeframe were subjectively defined by the authors depending on the nature of a meteorological element and its potential impact on snow depth. See Table B1 (Appendix B) to find out how exactly every reanalysis variable was aggregated.

Since the nearest grid point for a few coastline stations is covered predominantly by a sea, a number of land variables are not available there. Therefore, an automatic search of the nearest land grid point was performed and assigned to a station. Most often it was one of the other surrounding grid points. One station was removed from the database as it lies at the tip of a 35 km long sandspit that expanded into the Baltic Sea and therefore, a shift by 0.4° would be required.

2.5.3 Auxiliary data

The following topographic parameters were calculated based on the DEM: topographic position index (TPI), sky view factor (SVF) and a fraction of the surrounding area with higher elevation. Dedicated R packages terra and horizon were used for that purpose (Hijmans, 2020; Van Doninck, 2018). Computations were performed for every grid point for a set of user-defined radii ranging from 100 to 30 000 m in order to capture land relief variability at different scales. However, due to high computational cost, the calculation of SVF and a fraction of the surrounding area with higher elevation was limited to 5000 m. Data from DEM were also used to calculate the daily integrated amount and duration of direct incoming solar radiation. This was performed using a Points/Area Solar Radiation algorithm available in the ArcGIS Spatial Analyst Toolbox (ArcGIS 10.2.1). Apart from DEM-based predictors, a few temporal and station-embedded variables were also included in the dataset in order to emulate the passing of time and spatial distribution of data, respectively. The exact list of parameters and their aggregations is to be found in Table B2 (Appendix B).

https://tc.copernicus.org/articles/20/2295/2026/tc-20-2295-2026-f02

Figure 2A schematic diagram of data used for training of a RF model. A group of input predictors (left side of the diagram) is compared against station data which is used as the target variable (on the right).

Download

2.6 Machine learning

The RF algorithm designed by Breiman (2001) and implemented in the R environment (package ranger) by Wright and Ziegler (2017) is used in the study. It generates an ensemble of decision trees by bootstrapping the training data, which are then aggregated into a final prediction. Generalisation skill and performance accuracy are estimated on a training set using an out-of-bag error (Breiman, 2001). Along with neural networks, RF has been commonly deployed at various stages of weather and climate modelling (Bochenek and Ustrnul, 2022; de Burgh-Day and Leeuwenburg, 2023). In this study, we used it to predict an absolute value of snow depth, which is a regression problem (Fig. 2). The initial set of around 120 predictors was prepared based on meteorological knowledge of physical processes affecting snow depth. It was eventually shrunk to 70 predictors in a process of feature selection to account for redundancy and multicollinearity. The selection was made based on the Boruta algorithm implemented in the R package Boruta, permutation variable importance from the ranger package and Variance Inflation Factor implemented in the collinear package (Benito, 2023; Kursa and Rudnicki, 2010). The predictors are listed in Tables B1 and B2 (Appendix B). Despite feature selection, the total number of predictors is still larger than typical in literature, mainly due to remaining correlations and the necessity of temporal aggregation of some fields (which is not the case for LSTM). Next, an automated tuning of training hyperparameters from the tuneRanger package was performed to find the best model settings (Probst et al., 2019). The tunable parameters are: minimum node size (min.node.size), the number of predictors at each split (mtry) and the fraction of observations at each split (sample.fraction). The number of trees in the forest was increased to 1000 in order to adapt it to the final number of input features and thus ensure more stable estimation of the out-of-bag error and variable importance (Boehmke and Greenwell, 2020). A set of parameters that gives the lowest out-of-bag error was picked as the most optimal configuration, which is mtry = 31, min.node.size = 2 and sample.fraction = 0.895.

Predictions of the RF model for the whole dataset (20 winter seasons) were produced using a leave-one-year-out cross-validation. This is a particular example of k-folds cross-validation method that is often applied when handling time-series meteorological data (Hu et al., 2024; Sebbar et al., 2023). A set of 20 RF models was generated where each winter season was treated as a test set, while the remaining 19 seasons were used for training. In a similar manner, spatial cross-validation was also carried out to ensure the model generalisation skills are satisfactory when some stations are excluded. To this end, the study area was divided into 5 longitudinal blocks (every 2° of longitude). A relative number of stations in each block varied from 13 % to 26 %. The average RMSE training error was 2.51 cm (σ=0.03) for the temporal and 2.45 cm (σ=0.09) for the spatial split. A detailed list of cross-validation metrics is to be found in Table C1 (Appendix C).

https://tc.copernicus.org/articles/20/2295/2026/tc-20-2295-2026-f03

Figure 3A subdomain selected for a spatial experiment (Tatra Mountains). Stations are coloured according to their rank (S: synop, C: climate, P: precipitation).

2.7 Spatial downscaling experiment setup

Apart from point verification on station locations, the set of 20 RF models trained via leave-one-season-out strategy was also tested over a small subdomain in complex terrain to assess its skill in downscaling of ERA5 snow depth estimations. The domain covers the Tatra Mountains – the highest range in the Carpathian Mountains located on the border between Poland and Slovakia (Fig. 3). Several factors determined this selection. First of all, due to its elevation, it is the largest snow reservoir in the study area. Hence, accurate estimation of snow amount is crucial there. Secondly, the horizontal resolution of both reanalyses is much lower than the scale of orographic complexity of most of the area, which results in large biases that need to be reduced. Hence, it was expected that the added value of the DEM would benefit here the most. Furthermore, there is a considerable spatial coverage of observations in this area which makes accuracy evaluation more reliable. The output horizontal resolution was the DEM's resolution (3^′′).

As the RF model was trained on point data, its output is also a point prediction. Consequently, the final outcome of the experiment is in fact just an ensemble of point predictions produced separately for every grid point of the domain. Before predicting, a test set was prepared by bicubicly regridding all predictors coarser than the output grid (i.e., all except the DEM-based ones). The regridding was done using CDO software (Schulzweida, 2023). The predictions are then analysed only at one timestep, primarily for demonstrating potential spatial skills rather than thoroughly assessing them, which could be a subject of separate research. Additionally, in light of obtained results, sensitivity analysis of wind-related predictors (daily mean wind speed, wind direction and daily maximum wind gusts at 10 m) was conducted.

2.8 Evaluation metrics

To assess the accuracy of the predictions and the reanalyses, commonly known verification metrics such as Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Error (bias), standard deviation of bias (SD) and MAE Skill Score (SS_MAE) are used in the study (Table 1). RMSE is by far the most commonly used benchmark metric employed to evaluate reanalysis or forecast accuracy, however, as it penalises greater deviations more, errors occurring at mountainous stations would affect the global metric more than in the case of MAE. Hence, it was decided to use MAE as a basic verification metric in the study. Another snow-specific issue caused by the cumulative nature of snow is an increase of error with an increasing amount of snow (provided no DA is applied). To account for that, normalisation is carried out (Monteiro and Morin, 2023; Muñoz-Sabater et al., 2021). In the study, mean snow depth is used to normalise MAE (denoted as nMAE). To avoid the influence of snowless days, all metrics were calculated only for data when either measured or reanalysed snow depth was above 0 cm.

Table 1Verification scores used in the study (adapted from Warner, 2011). Denotations: N: number of data samples, F_i: ith forecasted value, O_i: ith observed value, MAE_RF: MAE of an evaluated RF forecast, MAE_ref: MAE of the best reanalysis.

Download Print Version | Download XLSX

Table 2Average accuracy of snow depth estimation in ERA5 and ERA5-Land.

Download Print Version | Download XLSX

https://tc.copernicus.org/articles/20/2295/2026/tc-20-2295-2026-f04

Figure 4Bias (top row), MAE (middle row) and nMAE (bottom row) of snow depth at every station in relation to elevation for ERA5 (left column) and ERA5-Land (right column). Stations at which a given reanalysis performs better than the other are shown in green, while those where it performs worse – in grey. In zoomed areas in the middle row, stations are coloured according to their rank (S: synop, C: climate, P: precipitation).

Download

3 Results

3.1 Reanalyses accuracy in relation to elevation

Considering overall verification scores from Table 2 (bottom row), estimation of snow depth is more accurate in ERA5-Land than in ERA5. While the difference in MAE is relatively small, it is much higher for bias. To address this issue, metrics in relation to elevation need to be examined. Aggregated metrics for elevation intervals (Table 2) as well as plots for bias and MAE (Fig. 4a–d) show a clear increase of error with elevation. Particularly for elevations over 1000 m a.s.l., the trend is roughly linear. Both reanalyses severely underestimate snow depth there, however, the deviations are slightly lower in ERA5-Land. What differentiates the datasets the most is bias tendency below 1000 m a.s.l. While in ERA5-Land it is predominantly positive (only 15 % of stations experiencing underestimation of snow), in ERA5 it is diverse, with substantial negative bias at some sites at 500–1000 m a.s.l. Regarding MAE, it is interesting to notice that although the global score from Table 2 gives favour to ERA5-Land, at 65 % of stations it is actually ERA5 that performs better. The ratio increases even to 75 % if stations below 500 m a.s.l. are considered.

In zoomed areas in Fig. 4c–d, stations are coloured according to their rank. A couple of issues are striking here. Errors in ERA5 can go as low as around 0.5 cm, whereas in ERA5-Land it is always higher than 2 cm. Additionally, the majority of the lowest errors in ERA5 occur at synop stations (red dots), while in ERA5-Land no clear separation is visible. This could be directly attributed to data assimilation. Thirdly, for ERA5 there is an error spike near the sea level, which is barely distinguishable for the other reanalysis. The rise of error affects mostly synops, which could be explained by the fact that these stations were dismissed from data assimilation due to their location at the interface between land and sea.

To compare MAE between stations in a more objective way, the metric was normalised with the mean snow depth for each station (Fig. 4e–f). As a result, the relationship with elevation is more complex, and no clear trend is visible anymore, especially for ERA5-Land. In both datasets, the highest errors occur at an altitude range of 400–700 m a.s.l. These are predominantly stations lying in the foothills of the Carpathian Mountains and the Sudetes rather than high mountainous ones. If a threshold value of nMAE = 1 is considered, there are only three sites exceeding it in ERA5: two lower-rank ones in the Sudetes (Karpacz, Przesieka) and one Slovakian synop Poprad-Gánovce (Fig. 5). For ERA5-Land the threshold is violated by a number of stations in mountain valleys or basins, including synops, e.g., Zakopane, Liesek or Kłodzko. The error for these sites in ERA5 is usually 3–4 times lower, likely owing to data assimilation.

https://tc.copernicus.org/articles/20/2295/2026/tc-20-2295-2026-f05

Figure 5Location of stations with nMAE of snow depth estimation exceeding 1 cm. Stations are coloured according to their rank (S: synop, C: climate, P: precipitation). The colour of the label denotes reanalysis, for which the error occured (blue: ERA5, light green: ERA5-Land, purple: both).

https://tc.copernicus.org/articles/20/2295/2026/tc-20-2295-2026-f06

Figure 6Annual variability of MAE (a) and nMAE (b) for ERA5, ERA5-Land and RF predictions (lines), SS_MAE of RF over the best reanalysis (barplot) and the mean snow depth for all stations (boxplot).

Download

3.2 Random Forests interannual performance

Having analysed the elevation-dependent differences in both reanalyses, now the interannual variability will be investigated (Fig. 6). Seasonal MAE averaged across all the considered stations changes substantially throughout the seasons. Both reanalyses correlate well with each other with respect to this metric. The highest values occur in the seasons 2004/2005, 2005/2006, 2011/2012 and 2018/2019, which were all snowy ones. The lowest MAE refers to the 2013/2014 season, which was in fact one of the least snowy ones, however, also to the 2009/2010 season, which was actually quite abundant in snow. Additionally, the season 2019/2020, which is by far the least snowy, has relatively large errors, particularly in ERA5. The error for RF predictions is lower by around 50 % than for the reanalyses and exhibits similar changes in relation to the mean snow depth. To remove this impact, nMAE was also analysed (Fig. 6b). After normalisation, the intraseasonal fluctuations are less severe and the errors are relatively steady in time. The lowest error for both reanalyses is achieved in the 2009/2010 season (for ERA5-Land also the 2004/2005 season). What is striking is a sharp drop of the error for the reanalyses from 2008/2009 to 2009/2010. As regards RF, two seasons are easily distinguishable by an increased value of the error: 2010/2011 and 2013/2014. Since the reanalyses performed quite well then, the relative improvement (SS_MAE) is much smaller than the average (11 % and 31 %, respectively). This could be attributed to the unusual winter conditions during that time. However, this is barely visible on boxplots as the conditions occurred mainly at mountainous stations, which are scarcely represented yet provide a lot of data with snow. The two seasons turned out to be the driest ones in the study period and involved frequent melting episodes (Fig. 7). Consequently, the evolution and accumulation of snow cover were then significantly deviated, particularly regarding the second part of the seasons.

https://tc.copernicus.org/articles/20/2295/2026/tc-20-2295-2026-f07

Figure 7Temporal evolution of quantiles of snow depth distribution in seasons 2001/2002–2020/2021 for mountain synop stations: Štrbské Pleso, Kasprowy Wierch and Śnieżka. Light blue indicates total range, while darker blue – interquartile range. The median is shown as black line. The olive and red lines represent 2010/2011 and 2013/2014 seasons, respectively.

Download

In total, RF estimated snow depth with the mean bias of 0.26 cm and MAE – 5.5 cm, which provides a large improvement upon both reanalysis (compare with the scores in Table 2). Metrics calculated for each station show that RF is superior at 70 % of them, while at the majority of the remaining 30 %, RF is outperformed by ERA5 only slightly (less than 0.5 cm of MAE difference). These are mostly synops or lower-rank stations strongly affected by DA and thus having very small errors and therefore little space for improvement.

https://tc.copernicus.org/articles/20/2295/2026/tc-20-2295-2026-f08

Figure 8Spatial variability of snow depth on 12 December 2016, 06:00 UTC in ERA5-Land (a) and the result of downscaling by RF (b). Measured and predicted values for stations are shown in black and red, respectively. Snow-free areas from the GSP database are indicated by hatching. The bottom map depicts land relief (for comparison purposes only). Stations are coloured according to their rank (S: synop, C: climate, P: precipitation).

3.3 Spatial downscaling experiment

As far as the downscaling experiment is concerned, the fine-scaled (3^′′) spatial distribution of snow depth on 12 December 2016 at 06:00 UTC produced by RF based on coarse-resolved reanalyses ERA5 (0.25°) and ERA5-Land (0.1°) is depicted in Fig. 8b. Additionally, snow-free areas as seen by the GlobSnowPack are indicated in the figure using hatching. For comparative purposes, snow depth from ERA5-Land with native grid mesh and land relief from DEM (3^′′) is also shown (Fig. 8a and c, respectively). The date was chosen based on observations availability and climatological characteristics of the season. Additionally, in order to benefit from information provided by the GSP database, snow cover should not occur in the entire experiment domain. Moreover, an early stage of snow accumulation was deliberately selected not to violate a 140 cm threshold of maximum snow depth allowed in analysis during the DA process in ERA5 (ECMWF, 2016a). The predictions were initially supposed to be verified with spatially interpolated measurements. However, probably due to high terrain complexity, none of the interpolation methods yielded meaningful results. Consequently, verification concerning snow depth is limited only to station points. Spatial distribution of snow could be evaluated only qualitatively, using a snow mask product from the GSP database.

The forecasted and measured values of snow depth are shown on the map in red and black, respectively. What is easily noticeable is the elevation dependence of the field. The greatest spatial variability occurs in highly complex terrain. In the northern part of the domain, where elevation does not vary that much, the predictions are quite close (2–6 cm). To begin with, some of the northern parts of the domain are snow-free (as indicated by both site observations and the GSP snow mask), while RF still sees thin snow cover there. The overall agreement between GSP and site observations is good – out of four stations with snow depth equal to or less than 1 cm, snow-free areas are present in the vicinity of three of them. Only one station (Białka Tatrzańska) shows a mismatch, however, what was reported there was actually a trace of snow. Taking into account also reports from Liesek synop station which include sleet that day, the mismatch might be a matter of a difference of a few hours between a satellite overpass and 06:00 UTC. Opposite bias is noticeable in the south-western part of the domain – RF underpredicts actual snow depth at two stations at the foothills of the Tatra Mountains (Tatranská Lomnica and Tatranská Polianka). Hence, a conclusion could be drawn that RF is underdispersive in areas with low elevation variability.

As far as stations in the Tatra Mountains are considered, deviations slightly increase. The direction of bias is diverse, however, underestimation prevails. The greatest one occurs at Lomnický Štít. This is a mountain-top station at the highest altitude in the dataset (2635 m a.s.l.). Apparently, the RF model predicts the Eastern Tatras to have slightly less snow than the western part, which stands in contrast to observations as well as typical snow conditions there. This might imply a systematic error that may be derived from the reanalysis (as the SE corner of the domain had the least snow in ERA5-Land).

https://tc.copernicus.org/articles/20/2295/2026/tc-20-2295-2026-f09

Figure 9Spatial variability of snow depth along the main ridge of the Western Tatras (a) and its orography (b). The main valleys are marked with numbers: 1: Chochołowska Valley, 2: Kościeliska Valley, 3: Tichá Valley.

Closer inspection of Fig. 8b shows another interesting issue – the RF model accumulated more snow on northern slopes. Hence, it might be concluded that it was able to emulate the shadowing effect of the orography. This is to be seen at best along east-west-orientated ridges, particularly along the main ridge of the Western Tatras, which is shown in magnification in Fig. 9. This is likely to be an effect of including DEM-based parameters like potential solar radiation and SVF into the training dataset.

https://tc.copernicus.org/articles/20/2295/2026/tc-20-2295-2026-f10

Figure 10Importance score for the 20 most important predictors of the reference RF model. Predictors were manually grouped into categories denoted with colors. Variable abbreviations are explained in Tables B1 and B2. Following suffixes are used: cum: cumulative parameter, age: snow age, avg24: daily average value. The x axis is logarithmic.

Download

3.4 Sensitivity test of wind predictors

Apart from the shadowing effect, it is commonly acknowledged that above the tree line, it is wind redistribution that affects snow variability the most. However, wind-related predictors were excluded from the RF training dataset due to its poor importance score. The 25 most informative variables are ranked in Fig. 10. Except for snow depth itself, topographic variables that carry information about elevation are most precious. What is worth noticing is the high values of parameters cumulated from the season beginning (snowfall, snowmelt and total precipitation). Information from some past time is also included in other high-ranked predictors such as age of the snow depth, mean air temperature in the second soil layer and snow density. Wind-related predictors were scored way below 1. Therefore, a sensitivity analysis for these predictors has been additionally conducted. The predictors were added to the training dataset listed in the Appendix B and a new RF model was trained. Then it was compared in selected periods when weather conditions favored wind redistribution. The conditions were: average daily wind speed over 1.5 m s⁻¹, maximum wind gusts over 10 m s⁻¹, daily maximum temperature below 0 °C, daily precipitation below 0.5 mm and observed daily snow depth reduction by at least 5 cm. In total, 40 such cases were identified. The verification was performed for one station Hala Gąsienicowa, since this station lies above tree line and its snow depth report is representative (averaged from five snow poles). The place is also known for snow being blown away to lower parts of the valley, which results in snow depth reduction. The mean snow deflation as seen by RF is 2.1 cm while in fact it was over 11.5 cm. This is only slightly better than the reference RF model (with no wind-derived predictors), which reduced snow depth by 1.6 cm on average. ERA5-Land does not exhibit any snow depth reduction (0 cm on average). Global metrics calculated for the stations within the subdomain show minor improvement in most cases (differences in MAE predominantly below 0.1 cm).

4 Discussion

4.1 Bias in mountainous terrain

Underestimation of snow depth at high elevations (above 1000 m a.s.l.) in both ERA5 and ERA5-Land has been reported, e.g., by Muñoz-Sabater et al. (2021) and was attributed to orography smoothing. As it is an effect of orography discretisation, it occurs in most numerical models (depending chiefly on their horizontal resolution). In this light, less deviation observed in this study at high elevations in ERA5-Land could be clearly explained by better resolution of this reanalysis, which is in line with conclusions of Muñoz-Sabater et al. (2021) or Lei et al. (2023). However, their evaluation was done either globally or for a specific mountain area located remotely from the study area. Surprisingly, conclusions from studies regarding the Alps, which are the closest large-scale mountain range, stay in contradiction to our findings (Dalla Torre et al., 2024; Monteiro and Morin, 2023; Shrestha et al., 2023). This clearly demonstrates that the impact of orography smoothing could be completely offset by some other factors, which can locally become a dominant source of deviation. Bias in snow depth may be derived from various factors, out of which the most important are generally errors of air temperature, precipitation or shortcomings in parametrisation of snowpack properties, particularly snow density. The last one is usually systematic (spatially invariant), which does not explain the current discrepancy. Hence, the reason might be either temperature or precipitation bias that differs between the Alps and the Carpathian Mountains. As a matter of fact, in comprehensive pieces of work by Dalla Torre et al. (2024) and Monteiro and Morin (2023) these two elements were also evaluated. The authors identified substantial wet bias that contributes to overestimation of snow depth the most. Lack of accuracy assessment of air temperature and precipitation in this study hinders a clear explanation of the observed bias, which is a major limitation of the work.

4.2 Role of data assimilation

Outside complex terrain, the differences in errors between ERA5 and ERA5-Land are determined chiefly by the presence of DA in the coarser-resolved reanalysis. It is important to notice that despite the fact that only synop stations are assimilated, the improvement is visible also for lower-rank stations. First of all, it should be reminded that apart from SYNOP reports, additional information about snow extent is provided by satellites. However, this is just binary information with no plain conversion to snow depth (de Rosnay et al., 2015). From the IFS documentation (2016a), one can learn that horizontal and vertical structure functions applied in the Optimal Interpolation algorithm play an essential role in interpolating snow depth values in space between assimilated points. Because of them, the influence of a point measurement assimilated in the model spreads to neighbouring areas. It was examined more deeply by Stachura (2024), who claims that it yields satisfactory results unless there is a considerable elevation difference between a low-ranked station and an assimilated synop station.

However, some synops are excluded from the assimilation process. Our findings suggest that this is the case for sites at the sea coast. More importantly, there are several quality control conditions which result in rejection of measurements from, e.g., mountainous stations (ECMWF, 2016a). The IFS documentation does not specify a certain elevation threshold, below which measurements are accepted. Based on MAE scores presented in the study, one can state that the impact of DA is to be seen up to around 900 m a.s.l., however, our analysis of the normalised MAE revealed that there exist synops lying below this altitude with errors suggesting they were most likely not included in the DA system (e.g., Poprad-Gánovce). This indicates that quality control of snow depth DA in ERA5 involves thresholds other than elevation-based ones.

4.3 Random Forests performance

Analysis of interannual variability of nMAE and SS_MAE indicated two seasons with relatively weaker performance of RF with respect to the reanalysis. This deviation was attributed to very dry conditions in the southern part of the study area during these two winters. Consequently, the process of snow cover accumulation in the mountains was abnormal then. As a matter of fact, some sites recorded the seasonal maximum just at the beginning of December. Since RF predictions were generated through a leave-one-year-out cross-validation, for each of the two abnormal seasons, one of them was always included in the training set. Hence, it was not the case that such circumstances were completely unknown for the RF model. Apparently, however, they were too rare or too local for the model to generalise the information and represent it correctly. The importance of an equally balanced training set is commonly acknowledged in machine learning (Meehan et al., 2024; O'Gorman and Dwyer, 2018). One possible solution could be to use oversampling techniques that make training sets well-balanced. Otherwise, some other ML method could be deployed as some studies suggest that ANN or DL may be more robust in similar circumstances (Pflug et al., 2025; Stachura et al., 2024).

Nevertheless, considerably lower SS_MAE for the two specific seasons is not only an effect of RF mispredictions but also exceptionally low bias of the reanalyses. As concluded from Fig. 4, snow cover in mountainous terrain is generally heavily underestimated. From the other site, dry winters result in poor amounts of accumulated snow during the season. Bearing in mind the cumulative nature of snow, this eventually leads to lower bias in the reanalyses.

4.4 Spatial downscaling experiment

Station verification shown in Fig. 8b is in such complex terrain definitely insufficient to draw a conclusion that spatial distribution of the downscaled snow depth is legitimate and corresponds to the ground truth in the whole subdomain. It was shown that except for elevation dependence, the shadowing effect is also to be noticed due to including predictors which take into account the topographic influence of the surrounding terrain. This corresponds well with variable importance presented in Fig. 10, where information about elevation and solar radiation belongs to the most informative predictors. However, both variable importance and the sensitivity test proved little or negligible impact of the wind, while it is commonly acknowledged that this is a predominant determinant of spatial variability of snow, especially above the tree line (Mott et al., 2018). First of all, the observation network above the tree line is really scarce and hardly reflects wind redistribution in an objective way. As mentioned previously, some of the mountain-top stations do not measure snow depth directly (i.e., by reading the value from a snow pole) but rather estimate it based on fresh snowfall and other meteorological quantities so that the final value is representative for a greater area than a mountain peak itself. Such methodology is used, e.g., at Kasprowy Wierch, Śnieżka and Lomnický Štít. This, however, is misleading for a ML algorithm during training since the target values for these stations have slightly different meanings than for stations which measure snow depth explicitly. One way to get around it would be simply to remove the problematic sites, however, it would result in significant shrinkage of the data, especially above 1500 m a.s.l. and with large values of snow depth. In view of the known limited capability of RF to predict extremes (Muckley et al., 2022; Sun et al., 2024; Yang et al., 2020), this may lead to the deterioration of the final results. Another solution would be to include measurement data from ultrasound sensors, which have been recently set up at several stations in Poland. Lack of long observation series could be partially offset by higher data frequency. Secondly, the majority of training data come from stations located in lowland areas, where strong wind events are much less frequent than in mountainous regions and thus, wind redeposition occurs only rarely. A potential solution for this would be to train a ML model only on data from mountainous sites. This would result, however, in reduction of generalization ability of the model. Thirdly, as certain meteorological variables have proved to be more informative when expressed in their cumulative form, we speculate that a similar approach would be beneficial for wind-related predictors. However, aggregating them to this form would be more challenging than, e.g., precipitation, since effects of wind redistribution (deposition/erosion) are often wind-direction-dependent. A parameter accounting for these processes was developed and successfully used by Liu et al. (2025). However, it should be admitted that studies on snow wind redistribution in complex terrain using ML remain challenging and, therefore, relatively rare. More frequently, the topic is addressed through numerical modelling (Liston et al., 2007; Marsh et al., 2024).

Apart from wind, there are a few other snow depth determinants which are completely unrepresented in observations due to adhering to the WMO standards for weather station siting. These are, e.g., the influence of slope steepness, snow avalanches or canopy. Consequently, the RF model has no information about how snow accumulates on steep slopes or in the forest. As a result, RF predicts considerable amounts of snow even on extremely steep slopes, which is obviously erroneous. Owing to a lack of target observations, the potential of a ML approach seems to be limited. However, some of these processes (e.g., snow interception over canopy or snow settlement over steep slopes) are parameterised in surface numerical models, e.g., SURFEX (Hedstrom and Pomeroy, 1998). Hence, it might be worth considering developing a hybrid approach which would combine the strengths of both ML and classic Numerical Weather Prediction (NWP) models, e.g., using physics-informed ML (Viallon-Galinier et al., 2023).

5 Conclusions

As most studies focus either on accuracy assessment of reanalyses or bias correction, this study presents an uncommon approach where both problems are linked and handled collectively. In both reanalyses (i.e., ERA5 and ERA5-Land), heavy underestimation of snow depth is observed in complex terrain, where accurate estimation of snow is particularly crucial, as this is a key winter water reservoir in the north-eastern part of Central Europe. On average, ERA5-Land is less biased than the atmospheric ERA5 due to its higher resolution and the fact that mountainous stations are excluded from the DA process in the atmospheric reanalysis. These findings are particularly interesting as they do not correspond to outcomes of similar analysis from the Alps. In flatter terrain, the accuracy of snow depth estimation is predominantly affected by DA, which makes the coarse-resolved reanalysis better at 75 % of sites below 500 m a.s.l. This highlights the importance of DA and indicates major deficiencies in parametrisation of snow processes. MAE for both reanalyses generally increases with elevation, however, after normalisation, the greatest errors occur at sites located in the foothill zone rather than high in the mountains.

Continued efforts are needed at the side of reanalysis providers to enhance the DA system in several aspects. First of all, at the level of international data collection policies, an effort is required to adapt them so that more national databases of snow depth (which also involve measurements from lower-ranked stations) would be available to be assimilated. Secondly, at the scientific level, more robust methods should be deployed to handle assimilation in areas which were so far masked out from the procedure (due to strict quality control requirements or other assumptions). It is important to note here that DA would not be that crucial to reanalysis accuracy if physical parameterisations of snow processes were not flawed. Moreover, it is commonly known in the NWP community that correcting for bad physics is not the primary goal of DA. Therefore, improvement in the description of snow-related physical processes is a challenge of the utmost importance.

On average, the RF model was able to reduce systematic errors occurring at both reanalyses by as much as 48 %. The greatest improvement occurs in elevated terrain, predominantly due to improved resolution of orography. In lowlands, the differences between model and real elevation are little and the observed improvement occurs mostly due to simplifications in parametrizations of snow processes, particularly in non-assimilated areas. Stations where RF predictions were not superior are the stations with already small errors, most likely due to the impact of DA. Importantly, RF diminishes interannual variations of the error which were apparent for the reanalyses. The study identified two winter seasons with distinctly smaller relative improvement and argued that abnormal climatological conditions contributed to it. This points out the importance of a well-balanced training dataset and a potential limitation in application of this method of ML. Spatial downscaling performed by RF in areas with complex orography produced a fine-scaled spatial distribution of snow depth. Despite a decent overall performance verified pointly at 20 sites and the ability to capture more snow accumulated on northern slopes, predictions turned out to be spatially underdispersive in areas where factors other than elevation affect snow depth. Additionally, there are some factors which are crucial in determining the spatial extent of snow depth in mountainous terrain at such resolution, but they are barely reflected in observations (e.g., snow avalanches, drifting snow). Consequently, information about these determinants in training data is incomplete and deficient.

Overall, the main limitation of the RF model presented in this paper is that it requires ground station measurements and therefore, its skill and generalisation capabilities depend chiefly on the measurements quality and representativeness. This allows the conclusion that other sources of measurements and observations should be deployed in order to address the problem more properly in the future, preferably ones which are spatially continuous. Such data are provided by satellite measurements, e.g., passive microwave remote sensing or laser altimetry, and was used in snow depth estimation by, e.g. Liu et al. (2025) or Takala et al. (2011). However, other problems arise when processing this type of data such as scarce temporal coverage or too coarse resolution.

Finally, it should be added the above-mentioned limitations apply to virtually every modelling approach, especially when considering such a diverse meteorological element as snow cover and its thickness. In light of the literature and our own experience with snow cover measurements, it should be noted that the value of measurement data is crucial not only for simple spatial analyses using GIS spatialisation methods but also for the application of ML methods, including RF. Nevertheless, we believe the experiment provides valuable insights. Particularly, it might be further used to analyse climatological variability of snow over the past decades in the Carpathian Mountains with respect to, e.g., altitudinal zonation. To date, such attempts relied either on coarse-resolved reanalyses or sparse and temporally limited observations. ML-based downscaling opens new research directions in this context. As input variables could be easily generated for any domain, the applicability of the proposed approach is not only limited to the study area presented in this paper. However, one of the major constraints could be the diverse quality of snow fields, especially regarding the DA system. To fully address this issue, sensitivity experiments are required to be conducted.

Appendix A

The study area belongs to the temperate climate zone. A specific feature of climate in this place is a gradual transition from the oceanic type in the west to the continental one in the east. This is reflected in spatial variability of main climate characteristics such as mean air temperature but also mean snow cover duration (Fig. A1). It ranges from less than 30 d in the northwestern part of the study area to around 100 d in the northeast. The latitudinal pattern is strongly affected by mountainous areas in the south, where snow cover duration is elevation-dependent and can reach 180 days (6 months) in the highest parts of the Carpathians.

https://tc.copernicus.org/articles/20/2295/2026/tc-20-2295-2026-f11

Figure A1Average snow cover duration (in days) in the period 1991–2020 according to ERA5-Land (Sałaja, 2025).

https://tc.copernicus.org/articles/20/2295/2026/tc-20-2295-2026-f12

Figure A2Seasonal distribution of snow cover duration (a) and the mean snow depth (b) in the study period based on measurements taken at all stations included in the analyses. For the sake of better legibility, outliers are not shown.

Download

The study period (20 winter seasons from 2001/2002 to 2020/2021) is marked by fairly gentle snow conditions with snow duration shorter by around 10 d on average with respect to a long period 1951–2021. It is important to highlight that, in contrast to what is commonly believed, snow conditions then were not yet considerably poorer. By analysing boxplots showing seasonal distribution of snow cover duration and mean snow depth (Fig. A2), it is evident that snow conditions were very variable. The most snowy season regarding snow cover duration was 2005/2006 when 50 % of stations had snow for more than 110 d. However, in terms of mean snow depth, this season is slightly inferior to 2004/2005, when the median of this parameter reached almost 20 cm. The least snowy season was 2019/2020 with fewer than 10 d with snow at half of the stations and no snow at 25 % of them. Mean snow depth was also extremely low then. In general, considering the whole study period, four consecutive seasons, 2002/2003–2005/2006 as well as 2012/2013, could be regarded as snowy ones, while 2013/2014, 2015/2016 and 2019/2020 are seasons with the poorest snow conditions.

Summing it up, average snow conditions in the study period were a little gentler than in the over-70-year period 1951–2022. However, due to considerable interannual variability, extreme winter seasons in the study period are also one of the most severe and mildest seasons in the respected long period. This was concluded based on data from ERA5 as well as a number of meteorological stations with long observational series.

Appendix B

Table B1List of reanalysis fields and their aggregations used in training a Random Forests model.

Download Print Version | Download XLSX

Table B2List of DEM-based parameters used in training a Random Forests model.

Download Print Version | Download XLSX

Appendix C

Table C1Variability of RMSE training error during temporal and spatial cross-validation.

Download Print Version | Download XLSX

Data availability

Snow depth measurements from Polish stations used in the study are publicly available at https://danepubliczne.imgw.pl/ (last access: 23 February 2025). This does not include data from precipitation stations from years 2001–2010, which are available only internally. For downloading the data, an R package climate was used (Czernecki et al., 2019). As far as Czech stations are considered, data are publicly available at https://opendata.chmi.cz/meteorology/climate/historical_csv/data/daily (last access: 30 March 2026). Data from Slovakia are available only on request at the national weather service (Slovenský hydrometeorologický ústav).

Copernicus reanalyses ERA5 and ERA5-Land are available to download at the Climate Data Store of Copernicus Climate Change Service at https://cds.climate.copernicus.eu/datasets (last access: 23 June 2024).

Author contributions

GS and ZU conceptualized the goals of the research; GS downloaded and preprocessed the data. The development of methodology and training of ML models was performed by GS. Investigations and validation were done by GS under the supervision of ZU. All figures were made by GS. ZU prepared Appendix A, GS prepared the rest of the manuscript. ZU reviewed and proposed minor corrections to the final version of the manuscript. GS revised the manuscript according to suggestions of the Reviewers.

Competing interests

The contact author has declared that neither of the authors has any competing interests.

Disclaimer

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Acknowledgements

The measurement data from Slovakian stations come from the Slovakian National Weather Service (Slovenský hydrometeorologický ústav) and were provided thanks to the kind support of Jozef Pecho. Special thanks to J. Sałaja for creating Fig. A1.

Financial support

The research has been supported by a grant from the Faculty Geography and Geology under the Strategic Programme Excellence Initiative at Jagiellonian University (PSP number: U1U/W23/NS/06). Computational environment was provided by high-performance Infrastructure PLGrid (HPC Centers: ACK Cyfronet AGH, PCSS, CI TASK, WCSS) within computational grant no. PLG/2025/018763.

Review statement

This paper was edited by Chris Derksen and reviewed by three anonymous referees.

References

Baba, M. W., Boudhar, A., Gascoin, S., Hanich, L., Marchane, A., and Chehbouni, A.: Assessment of MERRA-2 and ERA5 to Model the Snow Water Equivalent in the High Atlas (1981–2019), Water, 13, 890, https://doi.org/10.3390/w13070890, 2021.

Benito, B.: BlasBenito/collinear: CRAN release v1.0.1, Zenodo [code], https://doi.org/10.5281/ZENODO.10039489, 2023.

Bochenek, B. and Ustrnul, Z.: Machine Learning in Weather Prediction and Climate Analyses – Applications and Perspectives, Atmosphere, 13, 180, https://doi.org/10.3390/atmos13020180, 2022.

Boehmke, B. C. and Greenwell, B.: Hands-on machine learning with R, CRC Press, Taylor & Francis Group, Boca Raton London New York, p. 1, https://doi.org/10.1201/9780367816377, 2020.

Breiman, L.: Random Forests, Mach. Learn., 45, 5–32, https://doi.org/10.1023/A:1010933404324, 2001.

Brown, R. D. and Mote, P. W.: The Response of Northern Hemisphere Snow Cover to a Changing Climate, J. Climate, 22, 2124–2145, https://doi.org/10.1175/2008JCLI2665.1, 2009.

Copernicus Climate Change Service (C3S): ERA5-Land hourly data from 1950 to present, Copernicus Climate Change Service (C3S) Climate Data Store (CDS) [data set], https://doi.org/10.24381/cds.e2161bac, 2019.

Cui, G., Anderson, M., and Bales, R.: Mapping of snow water equivalent by a deep-learning model assimilating snow observations, J. Hydrol., 616, 128835, https://doi.org/10.1016/j.jhydrol.2022.128835, 2023.

Czernecki, B., Taszarek, M., Marosz, M., Półrolniczak, M., Kolendowicz, L., Wyszogrodzki, A., and Szturc, J.: Application of machine learning to large hail prediction – The importance of radar reflectivity, lightning occurrence and convective parameters derived from ERA5, Atmos. Res., 227, 249–262, https://doi.org/10.1016/j.atmosres.2019.05.010, 2019.

Dalla Torre, D., Di Marco, N., Menapace, A., Avesani, D., Righetti, M., and Majone, B.: Suitability of ERA5-Land reanalysis dataset for hydrological modelling in the Alpine region, J. Hydrol.: Regional Studies, 52, 101718, https://doi.org/10.1016/j.ejrh.2024.101718, 2024.

de Burgh-Day, C. O. and Leeuwenburg, T.: Machine learning for numerical weather and climate modelling: a review, Geosci. Model Dev., 16, 6433–6477, https://doi.org/10.5194/gmd-16-6433-2023, 2023.

de Rosnay, P., Isaksen, L., and Dahoui, M.: Snow data assimilation at ECMWF, ECMWF Newsletter, 143, 26–31, https://doi.org/10.21957/LKPXQ6X5, 2015.

Dietz, A. J., Kuenzer, C., and Dech, S.: Global SnowPack: a new set of snow cover parameters for studying status and dynamics of the planetary snow cover extent, Remote Sens. Lett., 6, 844–853, https://doi.org/10.1080/2150704X.2015.1084551, 2015.

ECMWF: IFS Documentation CY43R1 – Part II: Data Assimilation, ECMWF [data set], https://doi.org/10.21957/AM5DTG9PB, 2016a.

ECMWF: IFS Documentation CY43R1 – Part IV: Physical Processes, ECMWF [data set], https://doi.org/10.21957/SQVO5YXJA, 2016b.

ECMWF: Response to query on assimilation of snow depth in ERA5, ECMWF, Zenodo, https://doi.org/10.5281/zenodo.19350555, 2024.

Elyoussfi, H., Boudhar, A., Belaqziz, S., Bousbaa, M., Nifa, K., and Chehbouni, A.: Towards a Deep-Learning Approach for Snow Depth Prediction Over Mountainous Area in Morocco, 44th Canadian Symposium on Remote Sensing, June 2023, Yellowknife, NWT, Canada, https://doi.org/10.13140/RG.2.2.26384.42241, 2023.

Falarz, M. and Bednorz, E.: Snow Cover Change, in: Climate Change in Poland, edited by: Falarz, M., Springer International Publishing, Cham, 375–390, https://doi.org/10.1007/978-3-030-70328-8_14, 2021.

Hedstrom, N. R. and Pomeroy, J. W.: Measurements and modelling of snow interception in the boreal forest, Hydrol. Process., 12, 1611–1625, https://doi.org/10.1002/(SICI)1099-1085(199808/09)12:10/11<1611::AID-HYP684>3.0.CO;2-4, 1998.

Helmert, J., Şensoy Şorman, A., Alvarado Montero, R., De Michele, C., de Rosnay, P., Dumont, M., Finger, D. C., Lange, M., Picard, G., Potopová, V., Pullen, S., Vikhamar-Schuler, D., and Arslan, A. N.: Review of Snow Data Assimilation Methods for Hydrological, Land Surface, Meteorological and Climate Models: Results from a COST HarmoSnow Survey, Geosciences, 8, 489, https://doi.org/10.3390/geosciences8120489, 2018.

Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz-Sabater, J., Nicolas, J., Peubey, C., Radu, R., Schepers, D., Simmons, A., Soci, C., Abdalla, S., Abellan, X., Balsamo, G., Bechtold, P., Biavati, G., Bidlot, J., Bonavita, M., De Chiara, G., Dahlgren, P., Dee, D., Diamantakis, M., Dragani, R., Flemming, J., Forbes, R., Fuentes, M., Geer, A., Haimberger, L., Healy, S., Hogan, R. J., Hólm, E., Janisková, M., Keeley, S., Laloyaux, P., Lopez, P., Lupu, C., Radnoti, G., de Rosnay, P., Rozum, I., Vamborg, F., Villaume, S., and Thépaut, J.-N.: The ERA5 global reanalysis, Q. J. Roy. Meteor. Soc., 146, 1999–2049, https://doi.org/10.1002/qj.3803, 2020.

Hijmans, R. J.: terra: Spatial Data Analysis, CRAN [code], https://doi.org/10.32614/CRAN.package.terra, 2020.

Hu, Y., Che, T., Dai, L., Zhu, Y., Xiao, L., Deng, J., and Li, X.: A long-term daily gridded snow depth dataset for the Northern Hemisphere from 1980 to 2019 based on machine learning, Big Earth Data, 8, 274–301, https://doi.org/10.1080/20964471.2023.2177435, 2024.

Jarvis, A., Reuter, H. I., Nelson, A., and Guevara, E.: Hole-filled seamless SRTM data V4 (4.1), International Centre for Tropical Agriculture [data set], http://srtm.csi.cgiar.org (last access: 15 May 2023), 2008.

King, F., Erler, A. R., Frey, S. K., and Fletcher, C. G.: Application of machine learning techniques for regional bias correction of snow water equivalent estimates in Ontario, Canada, Hydrol. Earth Syst. Sci., 24, 4887–4902, https://doi.org/10.5194/hess-24-4887-2020, 2020.

Kursa, M. B. and Rudnicki, W. R.: Feature Selection with the Boruta Package, J. Stat. Soft., 36, https://doi.org/10.18637/jss.v036.i11, 2010.

Le Moigne, P., Boone, A., Calvet, J., Decharme, B., Faroux, S., Gibelin, A., Lebeaupin, C., Mahfouf, J., Martin, E., and Masson, V.: SURFEX v8. 1 Scientific Documentation, Note de centre (CNRM/GMME), Météo-France, Toulouse, France, 2018.

Lei, Y., Pan, J., Xiong, C., Jiang, L., and Shi, J.: Snow depth and snow cover over the Tibetan Plateau observed from space in against ERA5: matters of scale, Clim. Dynam., 60, 1523–1541, https://doi.org/10.1007/s00382-022-06376-0, 2023.

Liston, G. E., Haehnel, R. B., Sturm, M., Hiemstra, C. A., Berezovskaya, S., and Tabler, R. D.: Simulating complex snow distributions in windy environments using SnowTran-3D, J. Glaciol., 53, 241–256, https://doi.org/10.3189/172756507782202865, 2007.

Liu, Z., Filhol, S., and Treichler, D.: Retrieving snow depth distribution by downscaling ERA5 Reanalysis with ICESat-2 laser altimetry, Cold Reg. Sci. Technol., 239, 104580, https://doi.org/10.1016/j.coldregions.2025.104580, 2025.

Majidi, F., Sabetghadam, S., Gharaylou, M., and Rezaian, R.: Evaluation of the performance of ERA5, ERA5-Land and MERRA-2 reanalysis to estimate snow depth over a mountainous semi-arid region in Iran, J. Hydrol.: Regional Studies, 58, 102246, https://doi.org/10.1016/j.ejrh.2025.102246, 2025.

Marsh, C. B., Lv, Z., Vionnet, V., Harder, P., Spiteri, R. J., and Pomeroy, J. W.: Snowdrift-Permitting Simulations of Seasonal Snowpack Processes Over Large Mountain Extents, Water Resour. Res., 60, e2023WR036948, https://doi.org/10.1029/2023WR036948, 2024.

Meehan, T. G., Hojatimalekshah, A., Marshall, H.-P., Deeb, E. J., O'Neel, S., McGrath, D., Webb, R. W., Bonnell, R., Raleigh, M. S., Hiemstra, C., and Elder, K.: Spatially distributed snow depth, bulk density, and snow water equivalent from ground-based and airborne sensor integration at Grand Mesa, Colorado, USA, The Cryosphere, 18, 3253–3276, https://doi.org/10.5194/tc-18-3253-2024, 2024.

Monteiro, D. and Morin, S.: Multi-decadal analysis of past winter temperature, precipitation and snow cover data in the European Alps from reanalyses, climate models and observational datasets, The Cryosphere, 17, 3617–3660, https://doi.org/10.5194/tc-17-3617-2023, 2023.

Mortimer, C., Mudryk, L., Derksen, C., Luojus, K., Brown, R., Kelly, R., and Tedesco, M.: Evaluation of long-term Northern Hemisphere snow water equivalent products, The Cryosphere, 14, 1579–1594, https://doi.org/10.5194/tc-14-1579-2020, 2020.

Mott, R., Vionnet, V., and Grünewald, T.: The Seasonal Snow Cover Dynamics: Review on Wind-Driven Coupling Processes, Front. Earth Sci., 6, https://doi.org/10.3389/feart.2018.00197, 2018.

Muckley, E. S., Saal, J. E., Meredig, B., Roper, C. S., and Martin, J. H.: Interpretable models for extrapolation in scientific machine learning, http://arxiv.org/abs/2212.10283 (last access: 24 January 2024), 16 December 2022.

Mudryk, L., Mortimer, C., Derksen, C., Elias Chereque, A., and Kushner, P.: Benchmarking of snow water equivalent (SWE) products based on outcomes of the SnowPEx+ Intercomparison Project, The Cryosphere, 19, 201–218, https://doi.org/10.5194/tc-19-201-2025, 2025.

Muñoz-Sabater, J., Dutra, E., Agustí-Panareda, A., Albergel, C., Arduini, G., Balsamo, G., Boussetta, S., Choulga, M., Harrigan, S., Hersbach, H., Martens, B., Miralles, D. G., Piles, M., Rodríguez-Fernández, N. J., Zsoter, E., Buontempo, C., and Thépaut, J.-N.: ERA5-Land: a state-of-the-art global reanalysis dataset for land applications, Earth Syst. Sci. Data, 13, 4349–4383, https://doi.org/10.5194/essd-13-4349-2021, 2021.

Nouri, M. and Homaee, M.: Spatiotemporal changes of snow metrics in mountainous data-scarce areas using reanalyses, J. Hydrol., 603, 126858, https://doi.org/10.1016/j.jhydrol.2021.126858, 2021.

O'Gorman, P. A. and Dwyer, J. G.: Using machine learning to parameterize moist convection: potential for modeling of climate, climate change and extreme events, J. Adv. Model Earth Sy., 10, 2548–2563, https://doi.org/10.1029/2018MS001351, 2018.

Orsolini, Y., Wegmann, M., Dutra, E., Liu, B., Balsamo, G., Yang, K., de Rosnay, P., Zhu, C., Wang, W., Senan, R., and Arduini, G.: Evaluation of snow depth and snow cover over the Tibetan Plateau in global reanalyses using in situ and satellite remote sensing observations, The Cryosphere, 13, 2221–2239, https://doi.org/10.5194/tc-13-2221-2019, 2019.

Palarz, A., Luterbacher, J., Ustrnul, Z., Xoplaki, E., and Celiński-Mysław, D.: Representation of low-tropospheric temperature inversions in ECMWF reanalyses over Europe, Environ. Res. Lett., 15, 074043, https://doi.org/10.1088/1748-9326/ab7d5d, 2020.

Pflug, J. M., Kumar, S. V., Hall, D. K., Riggs, G. A., Konapala, G., Whitney, K. M., Wrzesien, M. L., Nie, W., Sun, Z., and Arsenault, K. R.: Efficient and Regionally Transferable Snow Water Equivalent Estimation Using a Long Short-Term Memory Network, J. Geophys. Res.-Machine Learning and Computation, 2, e2025JH000593, https://doi.org/10.1029/2025JH000593, 2025.

Pomeroy, J. W. and Brun, E.: Physical properties of snow, in: Snow ecology: An interdisciplinary examination of snow-covered ecosystems, vol. 45, Cambridge University Press, Cambridge, 118, 2001.

Probst, P., Wright, M., and Boulesteix, A.-L.: Hyperparameters and Tuning Strategies for Random Forest, WIREs Data Min & Knowl, 9, e1301, https://doi.org/10.1002/widm.1301, 2019.

Qiao, D., Li, Z., Zhang, P., Zhou, J., and Liang, S.: Prediction of Snow Depth Based on Multi-Source Data and Machine Learning Algorithms, in: 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, 5578–5581, https://doi.org/10.1109/IGARSS47720.2021.9554675, 2021.

Sałaja, J.: Wieloletnia zmienność pokrywy śnieżnej w Europie Środkowej, M.Sc. thesis, Institute of Geography and Spatial Management, Jagiellonian University, 2025.

Schulzweida, U.: CDO User Guide, Zenodo [software documentation], https://doi.org/10.5281/ZENODO.10020800, 2023.

Sebbar, B., Khabba, S., Merlin, O., Simonneaux, V., Hachimi, C. E., Kharrou, M. H., and Chehbouni, A.: Machine-Learning-Based Downscaling of Hourly ERA5-Land Air Temperature over Mountainous Regions, Atmosphere, 14, 610, https://doi.org/10.3390/atmos14040610, 2023.

Shrestha, S., Zaramella, M., Callegari, M., Greifeneder, F., and Borga, M.: Scale Dependence of Errors in Snow Water Equivalent Simulations Using ERA5 Reanalysis over Alpine Basins, Climate, 11, 154, https://doi.org/10.3390/cli11070154, 2023.

Song, Y., Tsai, W.-P., Gluck, J., Rhoades, A., Zarzycki, C., McCrary, R., Lawson, K., and Shen, C.: LSTM-Based Data Integration to Improve Snow Water Equivalent Prediction and Diagnose Error Sources, J. Hydrometeorol., 25, 223–237, https://doi.org/10.1175/JHM-D-22-0220.1, 2024.

Stachura, G.: Evaluation and machine-learning-based downscaling of ERA5 snow depth in Central Europe, EMS Annual Meeting 2024, Barcelona, Spain, https://doi.org/10.5194/ems2024-168, 2024.

Stachura, G., Ustrnul, Z., Sekuła, P., Bochenek, B., Kolonko, M., and Szczęch-Gajewska, M.: Machine learning based post-processing of model-derived near-surface air temperature – A multimodel approach, Q. J. Roy. Meteor. Soc., 150, 618–631, https://doi.org/10.1002/qj.4613, 2024.

Sun, L., Zhang, X., Wang, H., Xiao, P., and Wang, Y.: Estimating Daily Snow Density Through a Spatiotemporal Random Forest Model, Water Resour. Res., 60, e2023WR036942, https://doi.org/10.1029/2023WR036942, 2024.

Takala, M., Luojus, K., Pulliainen, J., Derksen, C., Lemmetyinen, J., Kärnä, J.-P., Koskinen, J., and Bojkov, B.: Estimating northern hemisphere snow water equivalent for climate research through assimilation of space-borne radiometer data and ground-based measurements, Remote Sens. Environ., 115, 3517–3529, https://doi.org/10.1016/j.rse.2011.08.014, 2011.

Tanniru, S., R, C. P., Singh, K. K., and Ramsankaran, R.: Development of Machine Learning based Historical Snow Depth Dataset for Western Himalaya using Multiple Reanalysis Datasets, AGU Fall Meeting, event-title: AGU Fall Meeting AbstractsADS Bibcode: 2023AGUFM.C32C..05T, C32C-05, 2023.

Van Doninck, J.: Software code: Horizon: Horizon Search Algorithm. R package version 1.2, https://cran.r-project.org/src/contrib/Archive/horizon (last access: 1 April 2026), 2018.

Varga, A. and Breuer, H.: Evaluation of snow depth from multiple observation-based, reanalysis, and regional climate model datasets over a low-altitude Central European region, Theor. Appl. Climatol., 153, 1–17, https://doi.org/10.1007/s00704-023-04539-5, 2023.

Vernay, M., Lafaysse, M., Monteiro, D., Hagenmuller, P., Nheili, R., Samacoïts, R., Verfaillie, D., and Morin, S.: The S2M meteorological and snow cover reanalysis over the French mountainous areas: description and evaluation (1958–2021), Earth Syst. Sci. Data, 14, 1707–1733, https://doi.org/10.5194/essd-14-1707-2022, 2022.

Viallon-Galinier, L., Hagenmuller, P., and Eckert, N.: Combining modelled snowpack stability with machine learning to predict avalanche activity, The Cryosphere, 17, 2245–2260, https://doi.org/10.5194/tc-17-2245-2023, 2023.

Wang, X., Tolksdorf, V., Otto, M., and Scherer, D.: WRF-based dynamical downscaling of ERA5 reanalysis data for High Mountain Asia: Towards a new version of the High Asia Refined analysis, Int. J. Climatol., 41, 743–762, https://doi.org/10.1002/joc.6686, 2021.

Warner, T. T.: Numerical weather and climate prediction, Cambridge University Press, Cambridge , New York, 526 pp., https://doi.org/10.1017/CBO9780511763243, 2011.

Wright, M. N. and Ziegler, A.: Ranger: A Fast Implementation of Random Forests for High Dimensional Data in C $+ +$ and R, J. Stat. Softw., 77, 1–17, https://doi.org/10.18637/jss.v077.i01, 2017.

Yang, J., Jiang, L., Luojus, K., Pan, J., Lemmetyinen, J., Takala, M., and Wu, S.: Snow depth estimation and historical data reconstruction over China based on a random forest machine learning approach, The Cryosphere, 14, 1763–1778, https://doi.org/10.5194/tc-14-1763-2020, 2020.

Yang, T., Li, Q., Chen, X., Hamdi, R., De Maeyer, P., and Li, L.: Variation of Snow Mass in a Regional Climate Model Downscaling Simulation Covering the Tianshan Mountains, Central Asia, J. Geophys. Res.-Atmos., 126, e2020JD034183, https://doi.org/10.1029/2020JD034183, 2021.

Articles

Short summary

Reanalyses still struggle to accurately estimate snow depth, mostly because their horizontal resolution is beyond the spatial scale of snow variability. A comparison of two Copernicus reanalyses ERA5 and ERA5-Land reveals systematic errors and highlights the importance of data assimilation. A Random Forests model is able to reduce the systematic error by around a half. Spatial downscaling in complex terrain reflects mainly elevation dependence but also shadowing effect of surrounding topography.