Spatio-temporal reconstruction of winter glacier mass balance in the Alps, Scandinavia, Central Asia and western Canada (1981–2019) using climate reanalyses and machine learning

. Spatio-temporal reconstruction of winter glacier mass balance is important for assessing long-term impacts of climate change. However, high-altitude regions signiﬁcantly lack reliable observations, which is limiting the calibration of glaciological and hydrological models. Reanalysis products provide estimates of snow precipitation also for remote high-mountain regions

eties (e.g. Adger et al., 2003;Hock et al., 2019). The rate and magnitude of cryospheric changes significantly depend on the evolution of the high-altitude precipitation regime. The elevation dependency of precipitation trends is however unclear; while precipitation trends from station observations often show an inconsistent picture with no systematic changes with elevation, gridded datasets show reduced precipitation increases at higher elevations (e.g. Pepin et al., 2022). It is thus crucial to improve our understanding of the local high-altitude climate-cryosphere interaction (e.g. Stone et al., 2013;Salzmann et al., 2014;Huss et al., 2017;Barandun et al., 2020). However, at the local scale and particularly at very high altitudes, snow and precipitation in situ observations are typically very scarce, spatially not optimally distributed, with low temporal resolution, too short in time, or with important gaps caused by technical challenges and difficult accessibility and thus complicated and lavish maintenance (e.g. Beniston et al., 2012;Tapiador et al., 2012). This is an important limitation for studies focusing on the long-term effects of climate change at the high-mountain cryosphere, which require snow accumulation data covering decadal periods (e.g. Seiz et al., 2010). At very high altitudes, mostly only measured snow water equivalent (SWE) as the cumulative snow accumulation on glaciers is available. Measurements of SWE on glaciers are typically used for the determination of winter mass balance (Cogley et al., 2011), an important variable in international glacier monitoring (e.g. Zemp et al., 2013). The main process of snow accumulation is the total precipitation received by the glacier during the accumulation season. Since melting is often negligible during this time period, SWE on glaciers represents a reliable measure of local winter precipitation and was thus used for a comparison with precipitation products in different studies (e.g. Gugerli et al., 2020;Guidicelli et al., 2021). However, other processes such as deposition of hoar, freezing rain or snow drift caused by winds and avalanching can also influence the accumulation (Dadic et al., 2010;Gascoin et al., 2013).
Worldwide spatio-temporally continuous information on precipitation, snow depth and SWE is also provided by climate reanalyses that merge physical laws with the assimilated satellite and ground observations (e.g. Hersbach et al., 2020;Gelaro et al., 2017). However, the performance of reanalysis results can vary greatly depending on the region and the elevation range of interest (Sun et al., 2018). Large biases in reanalysis precipitation are particularly found in high-mountain regions (e.g. Liu and Margulis, 2019;Zandler et al., 2019). The scarcity of observations available for assimilation and the coarse resolution of such models limit their accuracy in areas of complex topography and their suitability for studies at a local scale (e.g. Salzmann and Mearns, 2012, (for snow)).
Thus, the further development of techniques to spatially and/or temporally transfer the available observational series between sites and/or filling data gaps is critical and ur-gently needed (e.g. Salzmann et al., 2014). Downscaling of precipitation estimates of reanalyses is thereby a necessary step to represent the local conditions in high-mountain regions. Different statistical and dynamical downscaling methods exist (see Maraun et al., 2010), which have also been employed and evaluated over glacierized regions (e.g. Mölg and Kaser, 2011). For instance, Liston and Elder (2006) developed a quasi-physically based meteorological model to produce high-resolution (30 m to 1 km horizontal grid) atmospheric forcings for several variables, where the precipitation adjustment is a non-linear function of the elevation difference between the grid and the point of interest. The same equation was used by Gupta and Tarboton (2016), who proposed an approach to downscale the MERRA (Rienecker et al., 2011) variables. They obtained a Nash-Sutcliffe efficiency greater than 0.70 for downscaled monthly precipitation at 173 Snow Telemetry (SNOTEL) sites. Fiddes and Gruber (2014) adapted this method for the Swiss Alps by including a climatological parameter based on the Alpine precipitation dataset provided by the Climatic Research Unit (gridded monthly precipitation totals at 10 arcmin resolution over the Alps, for the period . Their product allowed for improving the purely lapse-rate-based approach of Liston and Elder (2006), obtaining a correlation coefficient of 0.6 (versus 0.5) against the annual precipitation observed at 40 MeteoSwiss automatic meteorological network (ANETZ) stations. Recently, machine learning methods have demonstrated their high performance to statistically downscale reanalyses (and global climate models) estimates of precipitation and other meteorological variables, from subdaily and daily (e.g. Serifi et al., 2021;Wang et al., 2021) to monthly and seasonal (e.g. Sachindra et al., 2018;Najafi et al., 2011;Sun and Tang, 2020) resolution. However, downscaling methods for snow (and precipitation) are rarely assessed at very high elevations, mainly due to the scarcity of ground observations. Consequently, long-term effects of climate change on the snowpack at very high elevations are not well understood yet (e.g. Seiz et al., 2010).
In this study, we thus aim at analysing total precipitation biases of reanalysis datasets (ERA5 and MERRA-2) over the snow accumulation season on glaciers, i.e. at the highest elevations of different mountain ranges. The precipitation estimates are compared with the winter glacier mass balance data covering a period of up to 39 years from 95 glaciers in the European Alps, Scandinavia, Central Asia and western Canada. The selection of these regions and glaciers depends on the data consistency and availability (see Sect. 2.2). Ultimately, we aim at reconstructing the winter glacier mass balance from partial information. In order to achieve this goal, we develop and evaluate a machine learning approach based on gradient boosting regressor (GBR) models (see Friedman, 2001) to adjust the total precipitation of reanalysis (main driver of snow accumulation) along the elevation profiles of the glaciers. More specifically, the GBR models aim at allowing the spatio-temporal transferability of the learned infor-

Study sites and data
The study was conducted on 95 glaciers located in the Alps, Scandinavia, Central Asia and western Canada (Fig. 1), where the longest time series and the highest density of winter glacier mass balance data are available. In the following, we describe the different data sources used in the study.

Reanalysis data
We used data from ERA5 and MERRA-2 reanalyses, since these are currently among the most widely used reanalysis products, providing the highest spatial resolution and covering the longest time period in all the regions of our study.

ERA5
ERA5 is the fifth generation of the European Centre for Medium-Range Weather Forecasts atmospheric reanalyses of the global climate (see Hersbach et al., 2020 for more information). In this study, we used several variables from the ERA5 hourly data on single levels from 1979 to present (Hersbach et al., 2018b) and the ERA5 hourly data on pressure levels from 1979 to present (Hersbach et al., 2018a), all with a spatial resolution of 0.25 • ,× 0.25 • (∼ 30 km). All variables were resampled on a daily timescale before usage. The list of variables selected for the analysis is reported in Table B1. The ERA5 precipitation variable used in the study is total precipitation (tp) from the ERA5 single levels.

MERRA-2
MERRA-2 is the second version of the Modern-Era Retrospective Analysis for Research and Applications (see Gelaro et al., 2017, for more information). In this study, we used several variables from the MERRA-2 Land Surface Diagnostics (Global Modeling and Assimilation Office (GMAO), 2015b), the MERRA-2 Single-Level Diagnostics (Global Modeling and Assimilation Office (GMAO), 2015c) and the MERRA-2 Analysed Meteorological Fields (Global Modeling and Assimilation Office (GMAO), 2015a). All variables have a spatial resolution of 0.5 • × 0.625 • (∼ 50 km), and we resampled them on a daily timescale before usage. The list of the selected variables is reported in Table B2. The MERRA-2 precipitation variable used in the study is total precipitation (PRECTOTLAND) from the MERRA-2 Land Surface Diagnostics.

Winter mass balance data
The World Glacier Monitoring Service (WGMS) compiles and publishes standardized observations on changes in mass, volume, length and area of glaciers collected by national monitoring programmes and local observers around the world (glacier fluctuations; see Zemp et al., 2021 for more details). The data compilation provided by the WGMS based on dozens of detailed national monitoring programmes is unique in terms of providing consistent in situ data in different regions of the world at the highest elevation of mountain ranges, i.e. on glaciers that can be used for comparison with precipitation datasets.
Thus, we used the winter mass balance data separated per elevation intervals (EE-MASS-BALANCE data sheet in WGMS, 2021), and we refer to them as B w in this study. Point observations are also available (EEE-MASS-BALANCE POINT data sheet in WGMS, 2021) but are not used in this study because of the smaller number of glaciers with complete information reported (observation dates, elevation, coordinates). We only considered the B w data where the elevation interval is indicated in the WGMS database. The glacier area related to each elevation interval was also used to weight the B w data. In addition, we considered the average slope and aspect of the glaciers by using the information provided in the Randolph Glacier Inventory version 6 (RGI Consortium, 2017).
The winter mass balance is the result of the balance between the gain of snow which accumulates over the glacier, as well as refreezing of liquid water within the snowpack, and the loss caused by melting and sublimation over the accumulation season. Other processes such as snow drift caused by winds can also influence the accumulation. The amount of snow melt is typically minor compared to the snow accumulation and thus negligible in the comparison with the precipitation totals performed in this study. Snow accumulation is expressed in SWE (e.g. Østrem and Brugman, 1991), which is calculated by multiplying the measured snow depth with the respective bulk density of the snowpack. The snow depth is typically measured with a snow probe or groundpenetrating radar, while the snow density is usually measured in snow pits or by coring and is subsequently extrapolated to all observations on a glacier. The WGMS database only provides information on B w but does not generally allow tracing whether density was directly measured or not.
The B w data used in this study correspond to the mean winter balance for the glacier area within the respective elevation interval. Various spatial extrapolation techniques were applied by the national observers to infer elevation band average snow accumulation from the (sparse) point observations, which can be challenging due to important local-scale variability in snow depth (e.g. Dadic et al., 2010;Helfricht et al., 2014;Sold et al., 2016). Unfortunately, the WGMS database does not allow tracing the methods used, hence, resulting in an uncertainty that is difficult to be estimated. Often, no direct snow depth and density observations are available at the most extreme elevations of the glaciers because of high surface slopes and difficult accessibility. The employed techniques in the framework of the Swiss national programme Glacier Monitoring Switzerland (GLAMOS), which provides the data of the Swiss sites to WGMS) are described in Huss et al. (2021). The impact of the inter-and extrapolation of direct SWE measurements acquired on glaciers to obtain B w data used in this study on our results is discussed in Sect. 5.2.5.
The starting date of the accumulation season is not precisely known but often determined with a stratigraphic system (i.e. since the date of the minimum surface in the previous summer) (e.g. Mayo et al., 1972;Cogley et al., 2011). The date of the minimum surface varies between the years and also across the glacier. In fact, snow accumulation starts typically later at lower elevations than at higher elevations (Huss et al., 2009). However, in this study we used a unique starting date for the entire glacier according to the information provided in the WGMS database. The end of the season is determined by the day of the snow survey that is indicated in the WGMS database. In this study, we cumulated precipitation amounts over the accumulation season. The impact of the date considered the beginning of the accumulation season on our results is discussed in Sect. 5.2.5.

Methods
First, we derived the total or average of all variables provided by the reanalyses for the entire accumulation season. Subsequently, a machine learning model to adjust the total precipitation (see Sect. 2.1.1 and 2.1.2) of the reanalyses over glaciers for the accumulation season was developed to reconstruct the B w along the elevation profile of the glaciers.
We use a gradient boosting regressor (GBR), which makes use of several meteorological variables (original and downscaled) and topographical parameters as input variables (predictors). The list of predictors required by the GBR is summarized in Table B3, while the variable names are described in Tables B1 (ERA5) and B2 (MERRA-2). In principle, a different adjustment factor of precipitation might be needed depending on the precipitation phase. However, as we only adjust the total precipitation occurring during the accumulation season, the adjustment factors used here represent the "average" adjustment factor of all precipitation events. Moreover, the snowfall variable was used as a predictor in order to enable the GBR model to learn that a different "average" adjustment factor must be applied depending on the fraction of snowfall and total precipitation (i.e. depending on the main precipitation phase during the accumulation season).
The applied methods to downscale the meteorological variables used by the GBR model as predictors are described below.

Downscaling temperature and relative humidity
In addition to the original variables (all "constants", all "single level" and all "land surface" variables in Tables B1 and B2), the GBR requires some downscaled variables of the reanalyses as predictors at the glacier elevation intervals (see Table B3), including air temperature, dew point temperature and relative humidity (for MERRA-2 and ERA5), vertical velocity of air motion (for ERA5 only), and specific humidity (for MERRA-2 only). The downscaling procedure was applied at a daily resolution using a linear interpolation between the values of the two closest pressure levels to the centre of the elevation intervals of the B w data of the glaciers. This downscaling approach is illustrated in the Supplement (Fig. S1).
If information regarding the relative humidity was not directly provided by the reanalyses, we applied approaches presented by Liston and Elder (2006) and Gupta and Tarboton (2016) to derive it. The applied equations are described in Appendix A.

Downscaling precipitation
The total precipitation over the accumulation season was considered the main driver of B w . Thus, in order to derive B w estimates over the glaciers, we built a machine learning model that adjusts the total precipitation of reanalysis along the elevation profile of the glaciers. In this study, we also applied a pre-existing lapse-rate-based approach (not datainformed) for precipitation downscaling that we considered a benchmark. The approaches are described below, and the results obtained with the two methods are compared afterward. Liston and Elder (2006) proposed a lapse-rate-based approach to downscale reanalysis' precipitation by accounting for the elevation difference between the point of interest and the grid of the reanalysis. Whereas they applied the approach to the MERRA reanalysis, we applied the same approach to MERRA-2 and ERA5 reanalysis data and used the resulting adjusted precipitation as a benchmark:

Benchmark
where P reanalysis is the precipitation of the reanalysis, H reanalysis is the elevation of the reanalysis' grid, P adj is the adjusted precipitation at the altitude of the point of interest (H point ) and κ is a monthly adjustment factor (cf. Table 1 of Liston and Elder, 2006). In our study, we used an average factor κ = 0.3214, corresponding to the average between October and April. The precipitation adjusted with this approach on Findelgletscher is illustrated in Fig. S1.

Gradient boosting regressor (GBR) model
In order to represent a potential non-monotonic increase of snow accumulation (and precipitation) with elevation and to provide different adjustments of the original reanalysis' precipitation depending on the region and the site, we built more complex models based on machine learning. All models are built with the open source scikit-learn library for machine learning in Python (see Pedregosa et al., 2011). Specifically, we built a series of GBR models, each consisting of an ensemble of weak learning models (estimators) represented by regression trees. Similarly to all tree-based models, GBRs do not provide continuous estimates. In our case, the goal of the GBR models is to predict the logarithmic adjustment factors of the reanalysis' precipitation with respect to the B w along the elevation profile of the glaciers (Eq. 2). In a GBR, the trees are built sequentially, and the subsequent trees learn from the errors of the previous trees, minimizing the residuals between their predictions and the reference values.
where P reanalysis,tot is the total precipitation of the reanalysis over the accumulation season. The GBR models aim at minimizing the cost function defined in Eq.
(3), corresponding to the mean square error between the predicted and reference logarithmic adjustment factors.
Different hyperparameters characterize a GBR. In this study, we applied a grid search to optimize the number of estimators (number of additive trees, i.e. number of iterations), the maximum depth that each tree can reach, the minimum number of samples required to be at a leaf node of a tree and the learning rate, which can vary between 0 and 1 and determines how quickly the GBR learns by shrinking the contribution of the individual trees on the GBR predictions. The higher the number of estimators or the maximum depth is, the more complex and less generalized the GBR model is. In contrast, the larger the minimum number of samples is, the less complex and more generalized the GBR model is. A smaller learning rate makes the GBR more robust to the specific characteristics of each individual tree, thus allowing a better generalization. However, the smaller the learning rate is, the more subsequent trees (iterations) are generally required to reach the minimum of the cost function. A 10-fold cross-validation was applied with different combinations of hyperparameters. The hyperparameters that were able to minimize the mean square error of the validation data were chosen. Finally, the GBR model with the chosen hyperparameters was tested on independent data. The validation and the test data were defined differently depending on the goal of the GBR model. For both reanalysis products (ERA5 and MERRA-2), we built two different GBR models with two different goals and two different cross-validation and test schemes. The first GBR model is site-independent and aims at "extrapolating" the B w data in time and space (over glaciers with no B w data). For the site-independent GBR, we built 95 models, one for each glacier, trained and validated with the data of the other 94 glaciers. Each glacier is used, in turn, as an independent test for the model based on the data of the other 94 glaciers. Thus, the site-independent GBR is independent from any data of the glacier where the model is tested (see Fig. 2). The second GBR model is season-independent and aims at extrapolating the B w data in time only (filling data gaps over Figure 2. Training, cross-validation with hyperparameter selection and test scheme for the site-independent GBR model. F dB, ref is the reference adjustment factor (Eq. 2), and F dB, pred is the adjustment factor predicted by the GBR model. glaciers with discontinuous records of B w ). For the seasonindependent GBR, we built a model for each year (when the B w data are available) and each glacier. In this case, each year of each glacier is used, in turn, as an independent test for the GBR, which is trained using the data of the other years (of the tested glacier) and of the other 94 glaciers. Thus, unlike the site-independent GBR, the season-independent GBR is informed with data of the glacier where the model is tested, excluding only data of the year when the model is tested. For the season-independent GBR, data of the other 94 glaciers are still used in the training, because many glaciers only have a small number of years with available B w data. Thus, in the case of limited B w data, this may help the GBR to learn from the data of the other 94 glaciers.
The average optimal hyperparameters for all the studied glaciers are reported in Table 1. The resulting siteindependent model is more generalized (smaller number of estimators than the season-independent GBR and higher minimum number of samples per leaf), while the seasonindependent model is more detailed and can be split into individual sub-models adapted to a small number of samples. The different architecture between the site-independent and the season-independent GBRs is discussed in Sect. 5.2.1.
All variables presented in Sect. 2.1 and listed in Sect. B of the Appendix were considered by the GBR as predictors (separately for ERA5 and MERRA-2). In addition, we derived and used the differences between the downscaled variables (cf. Sect. 3.1) and the estimates at the grid of the re- Table 1. Average hyperparameters of the optimized GBR models of all the studied glaciers: "n estimators" is the number of additive trees (i.e. iterations) in the GBR, "max depth" is the maximum depth that each tree can reach, "min sample leaf" is the minimum number of samples required to be at a leaf node of a tree, "learning rate" determines how quickly the GBR learns (it shrinks the contribution of the individual trees on the GBR prediction) and "subsample" indicates the fraction of samples used for fitting the individual trees.

Site-independent
Season-independent GBR GBR ERA5 analysis. Some variables were not only averaged considering all days in the accumulation season but also considering only the days with a relevant amount of precipitation, here arbitrarily set to 5 mm. The GBRs also use the latitude and longitude of the glacier (same value for the entire glacier), as well as aspect and slope of the glacier (same value for the entire glacier). A summary of the predictors used by the GBRs is provided in Table B3.
During the training phase of our models, the B w data were weighted by considering the area of the glacier related to the respective elevation interval. Larger glaciers (and elevation intervals related to larger areas) thus receive more weight than smaller glaciers (and elevation intervals related to smaller areas).

Adjustment factors
In order to evaluate the bias between the B w data and the estimates of the models (original reanalyses, benchmark or GBR), we computed the adjustment factor f (dimensionless) as where E model is the estimate of a model. The adjustment factor F dB is expressed in decibels and is used to derive supplementary evaluation metrics:

Glacier-wide means
When deriving a glacier-wide factor (or glacier-wide B w ) for a single accumulation season, we computed a weighted mean of the area contained in the individual elevation intervals. These seasonal glacier-wide values were used to derive Pearson's correlations (r), the root mean square error (RMSE) and the fraction of standard error (FSE) between the glacierwide B w and the model estimate. The FSE corresponds to the RMSE divided by the glacier-wide B w .

Regional metrics
In order to further validate the performance of the GBR models, we derived the glacier-wide F dB described by Eq. (5) for every accumulation season and every glacier with B w data (F dB, mean ). We thus analysed the four investigated regions separately by deriving a mean factor per region as where n is the number of glaciers, and m g is the number of accumulation seasons with B w data for the glacier g.

Results
In the following, we first present the main results of our study, i.e. the performance of the GBR models over glaciers in the Alps, Scandinavia, Central Asia and western Canada (Sect. 4.1), followed by the analysis of the predictors' importance in the GBR models (Sect. 4.2).

Performance of the GBR models
Overall, the GBR models indicate better agreement in terms of bias and spatial and temporal correlation with the B w data than the original reanalyses and the benchmark for the majority of the studied glaciers. In the following we report in detail on the comparison between the B w data, the precipitation estimates of the reanalyses and the GBR models' estimates. Figure 3 shows the comparison between all glacier-wide B w values and the models' estimates. MERRA-2 precipitation underestimates B w more importantly than ERA5 precipitation in all regions ( Fig. 3a and b), with an overall RMSE of 946 mm (mean absolute error (MAE) of 749 mm) against 793 mm (611 mm) of ERA5. Excluding the Alps, the correlation between the B w data and the ERA5 precipitation is always higher than the correlation with the MERRA-2 precipitation. The adjusted estimates obtained with the siteindependent and the season-independent GBRs allowed us to consistently reduce (increase) the bias (Pearson correlation (r)) between the precipitation of the original reanalyses and B w (overall RMSE of 433 mm, MAE of 326 mm, r of 0.86 for the MERRA-2 site-independent GBR; RMSE of 410 mm, MAE of 307 mm, r of 0.87 for the ERA5 site-independent GBR; RMSE of 293 mm, MAE of 211 mm, r of 0.94 for the MERRA-2 season-independent GBR; RMSE of 275 mm, MAE of 200 mm, r of 0.94 for the ERA5 season-independent GBR). These results demonstrate the need of an adjustment of reanalyses data to reproduce B w data on glaciers, which are, otherwise, largely underestimated in all four regions involved in this study.

Glacier-wide reanalysis' bias adjustment
In order to make an in-depth analysis of the model performance, we also derived a glacier-wide factor between the B w data and reanalysis-based models' estimates (Eq. 4) for each accumulation season and for each site separately (Fig. 4). By comparing Fig. 4b and c, it is clear that, in Central Asia, the factors for adjusting the MERRA-2 reanalysis' precipitation are much larger than the factors for the ERA5 precipitation. The benchmark method overestimates the B w for many glaciers in the Alps (both ERA5 and MERRA-2) and several glaciers in Central Asia (ERA5). The site-independent and, especially, the season-independent GBRs are better scaled with respect to the B w data than the original reanalyses and the benchmarks. In general, the variability of the factors for each glacier is strongly affected by the number of available accumulation seasons with B w data (Fig. 4a). A lower variability is usually observed for glaciers with a small number of seasons with B w data. Figure 5 shows the mean regional factor between the B w data and the models' estimates as a function of the accumulation seasons from 1981 to 2019. It indicates that the original reanalyses clearly underestimate B w on glaciers, except for ERA5 in Central Asia, where, as a consequence, the benchmark overestimates B w . However, temporal variations in the mean regional bias are also affected by the considered set of glaciers that fluctuates over the analysed years. In the Alps, we observe increasing biases of the original reanalyses in recent years, where a much larger number of glaciers is available. In Scandinavia, the bias of MERRA-2 and ERA5 is similar and all the models are generally not able to remove it completely. In Central Asia, there is a tendency for all models to yield lower adjustment factors before the 2000s than afterwards. However, this has to be interpreted with care, because only one glacier was considered between 2002 and 2014. The continuity of the available B w data in western Canada is too limited to analyse temporal changes in the adjustment factors.
In order to evaluate the robustness of the GBR models to reduce the glacier-wide bias of the reanalysis, we performed a temporal and spatial validation of their predictions (Fig. 6). The performance of the season-independent models improves when using more accumulation seasons in the training data (Fig. 6a, c, e and g). Training the models with more than 20 seasons, however, does not seem to further improve performance consistently. The performance of the site-independent models is constant, because they are never trained with B w data of the tested glacier. When no B w data of the tested glacier are used to train the season-independent models (as for the site-independent models), their perfor- The Pearson correlation (CORR), the root mean square error (RMSE), the fraction of standard error (FSE, corresponding to the RMSE divided by the regional mean B w ) and the number of all seasons of all glaciers (N points) for each region are also reported. The boxplots indicate the distribution of the model's estimates (right) and of the B w data (top) for each region. mance is worse than the site-independent models, confirming the importance of a specific optimization scheme depending on the goal of the model.
As also expected, the performance of the site-independent models decreases when data of neighbouring glaciers are excluded from the training (Fig. 6b, d, f and h). The highest impact is on the performance of the MERRA-2 site-independent GBR in Central Asia. Overall, the bias of the site-independent GBR models remains comparable to the bias obtained with the benchmark method even when excluding all other glaciers located within 1000 km from the training. For the season-independent models, we always kept the B w data of the tested glacier in the training data and only excluded the other glaciers. This explains why the season- independent models perform better and are less sensitive to the removal of neighbouring glaciers from the training process.
The good performance of the GBRs in terms of bias suggests that they can be used for B w reconstruction over glaciers where no ground observations are available (siteindependent GBRs) and for filling data gaps of the recorded observations (season-independent GBRs). However, the performance generally decreases when the glacier is not in proximity to the glaciers used to train the GBR models. Furthermore, we assume that the resulting performance strongly depends on the characteristics of the glacier with respect to the glaciers used in the training. The results indicate more-over that the season-independent GBRs outperform the siteindependent GBRs to reduce the bias against B w data, especially in regions with a limited number of glaciers with B w data. In conclusion, filling data gaps is much simpler than estimating B w on glaciers with no observations.

Spatial winter mass balance variability on individual glaciers
In order to evaluate the ability of the GBR models to reproduce the spatial variability of the B w over individual glaciers, we compared the vertical profiles of B w to the estimates of the models. For Rhonegletscher for instance (Alps, Fig. 7a), both site-independent and season- For each season, all glaciers with available B w data were considered (the number of glaciers used to derive the regional factor is indicated by the grey bars).
independent GBRs are able to represent the shape of the vertical profile of the B w , which is characterized by an increasing B w until 3350 m a.s.l. and a more stable/decreasing B w in the upper part of the glacier. This vertical profile cannot be reproduced by using the benchmark approach, where, by definition, the precipitation is monotonically increasing with the elevation. Decreasing B w is also clearly indicated in the upper part of the Abramov glacier (Central Asia, Fig. 7b) in 1992. As suggested by the point observations reported, this is certainly the result of extrapolating to elevation ranges not or only poorly covered with data. However, this has a limited influence on the GBR models than the lower part of the glacier, as it received higher weights because of the larger areas (see Sect. 2.2). The site-independent GBRs are not able to adjust the precipitation by consistently reducing the bias with B w . On the other hand, the season-independent GBRs are able to better fit the altitudinal distribution of B w . In this case, we observe that the maximum B w coincides with the maximum downscaled MERRA-2 relative humidity. In the case of Storglaciären (Scandinavia, Fig. 7c), B w is underestimated by the benchmark, while the GBR models (the season-independent especially) are able to better represent the steep increase of B w over the glacier. In the case of the Sykora glacier (western Canada, Fig. 7d), all GBR models show a good agreement with B w data. By comparing the coefficients of variation, it is clear that the season-independent GBRs are able to better reproduce the amplitude of the spatial variability of the B w than the site-independent GBRs (see Table S1 in the Supplement). Furthermore, the correlations demonstrate that the GBRs outperform the benchmark method to reproduce the B w of almost all glaciers of this study (Table S1).

Temporal winter mass balance variability on individual glaciers
In general, the GBR models show a better performance in reproducing the relative changes of B w among individual years for the same glacier than the original reanalysis (see Table 2). The correlation between the GBR models' estimates and the B w over the years is often much higher than for the original reanalysis. The level of significance of the correlation between the original ERA5 and/or MERRA-2 improves when the GBR models are applied, especially for Goldbergkees, Graasubreen and the Ts. Tuyuksuyskiy glacier. However, in some cases we still have low correlations, indicating that the models are not suitable to represent the temporal variability. Furthermore, although the season-independent GBRs are the best models to reduce the bias, the relative changes among years are sometimes better explained by the site-independent Figure 6. Evaluation of the mean regional factor between the B w data and the reanalysis-based models' estimates (F dB, region defined in Eq. 6) depending on different data used in the training of the GBR models. (a, c, e, g) Evaluation of the performance of the season-independent GBR models as a function of the number of seasons of the tested glacier used in the training (from no data to a maximum of 40 years of data of the tested glacier). GBRs. In fact, the number of years of data of the tested glacier used to train the season-independent GBRs does not seem to impact its performance in terms of temporal correlation with the B w data (see Fig. S3). The number of years with available B w data is typically much smaller in western Canada and Central Asia than in the Alps and Scandinavia (see Fig. 4a); therefore, we could not robustly evaluate the ability of the models to represent the temporal variability of the B w data for these regions.

Importance of predictors in the GBR models
In order to understand the importance of the predictors used by the GBR models (i.e. those not related to the elevation of the glaciers and their elevation difference with the reanalysis' grid), we evaluated the changes in terms of overall GBR model performance when suppressing groups of predictors. For both ERA5 and MERRA-2 site-independent GBR models, the smallest RMSE results when using all predictors ( Fig. 8a and b). The RMSE particularly increases when suppressing the MERRA-2 single-level and pressurelevel variables from the predictors. In turn, for both ERA5 and MERRA-2 season-independent GBR models, the smallest RMSE results when suppressing the single-level and pressure-level variables from the predictors (Fig. 8c and d).
The RMSE increases most when suppressing the year, the topographical parameters and the glacier coordinates simultaneously as predictors. However, skipping reanalysis variables from the set of predictors leads to higher errors for some individual glaciers, especially in the representation of the temporal variability of the B w data. In fact, by excluding the reanalysis variables, the year is the only predictor able to vehiculate the climatological information; in other words, the year is the only predictor that could be used by the GBR to predict a different adjustment factor depending on the accumulation season (all the other predictors are constant in time). Therefore, and to allow a fairer comparison between site-independent and season-independent GBRs, in all our following analyses we always included all predictors.
In order to infer the importance of the predictors for the individual study regions, we built an individual GBR for each region. We furthermore performed a principal component analysis (PCA) considering the 10 predictors most frequently used by the GBRs for each region. In the Alps, lower factors between B w and ERA5 precipitation result at lower latitudes, and the glaciers affected by 100 m westerly winds (negative u component of the wind speed) have generally higher factors than those affected by easterly winds (Fig. S2a). In Scandinavia, we notice a cluster of five glaciers with smaller Table 2. Pearson correlation (r [-]) between the reanalysis-based models and the glacier-wide B w over the accumulation seasons (temporal correlation) separated by regions. Only the glaciers with a minimum of 15 seasons with B w data are shown (i.e. no glacier in western Canada). The significance of the correlation is based on Student's t distribution. The level of the correlation significance is indicated as follows: normal font, not significant (p value > 0.10); underlined, significant (p value ≤ 0.10); and underlined and bold, highly significant (p value ≤ 0.05).  (Fig. S2c). In Central Asia, the glaciers' aspect is the predictor that most clearly discriminates between high and low factors between B w and both ERA5 and MERRA-2 precipitation. Glaciers with ∼ north-facing slopes show smaller ERA5 factors and ∼ east-facing slopes higher MERRA-2 factors ( Fig. S2e and f). In western Canada, lower ERA5 factors correlate with larger precipitation amounts and lower elevation of the glaciers, while MERRA-2 factors are clearly lower at higher latitudes, which are characterized by stronger southerly winds at 850 hPa ( Fig. S2g and f).

Discussion
The GBR models developed, evaluated and presented in this study showed a better overall agreement in terms of bias and spatial and temporal correlation with the B w data than the original reanalyses and the benchmark (lapse-rate-based approach described in Sect. 3.2.1) for the majority of the studied glaciers in the Alps, Scandinavia, Central Asia and western Canada. In the following, we provide a comprehensive discussion of the approach and the results.

Differences with lapse-rate-based approaches
With the exception of some specific sites, our GBR models outperformed the benchmark method (lapse-rate-based approach (Sect. 3.2.1)) in the Alps, Scandinavia, Central Asia and western Canada regarding the reduction of the bias against glacier-wide B w data (Figs. 4 and 5). This suggests that data-informed models such as our GBRs are needed to adjust reanalysis to different glacier sites, which can be characterized by different topographical and climatic conditions, and where the performance of reanalysis' estimates can vary greatly depending on the region (e.g. Sun et al., 2018). In fact, (independent) B w data were used to train our GBR models, allowing the GBRs to learn specific characteristics of actual B w on glaciers and to transfer them to unknown sites (site-independent GBRs) and unknown seasons (seasonindependent GBRs). The GBR models also outperform the benchmark to reproduce the spatial variability of B w on individual glaciers. We observed lower B w in the uppermost sections of many glaciers, which may be attributed to preferential snow deposition redistribution processes, caused by the interplay be- Figure 8. Overall root mean square error (RMSE) between B w and GBR models using different groups of predictors, for all analysed glaciers and years: (a) ERA5 site-independent GBR, (b) MERRA-2 site-independent GBR, (c) ERA5 season-independent GBR and (d) MERRA-2 season-independent GBR. The "pressure-level" group refers to all variables derived from the pressure-level data, i.e. "downscaled" variables and "delta_variable" (except for the elevation difference) in Table B3. The "single-level" group refers to all single-level variables listed in Table B3. The "topography" group refers to the topographical parameters describing the reanalysis's subgrid complexity (all constants in Tables B1 and B2) and the average slope and aspect of the glaciers by using the information provided in the Randolph Glacier Inventory version 6 (RGI Consortium, 2017). tween snow, wind and the generally steep topography (e.g. Sold et al., 2016;Gerber et al., 2019). The ability of GBRs to model non-linear relationships allows for a better representation of the vertical profiles of B w than the benchmark method (Fig. 7, Table S1). In fact, the observed spatial variability of B w could not be reproduced with the benchmark method, which by definition cannot represent decreasing values with the elevation (cf. Eq. 1).
Both the GBR models and the benchmark do not require direct in situ observations to be applied. However, the benchmark method is independent from any B w data used in this study (not data-informed). In turn, the performance of the GBR models is influenced by the number of data used to train the models and strongly depends on the characteristics of the glacier with respect to the glaciers used to train the models.

Differences with other machine learning algorithms
Intuitively, we preferred a tree-based algorithm given the high inhomogeneity in terms of spatial distribution of the considered glaciers. As further discussed in Sect. 5.2.1, a tree-based algorithm can exploit the coordinates (if provided as predictors) to easily split into individual sub-models adapted to different regions of the world (from the continental scale to the glacier-specific scale). Such operations would not be possible by using a simpler model such as a multiple linear regression. Also, it is less clear to us how artificial neural networks would behave given the considerable inhomogeneity of the spatial distribution. In fact, we have chosen a tree-based algorithm because of its higher readability in terms of the predictors' usage compared to other machine learning methods (e.g. Huysmans et al., 2011;Freitas, 2014). A disadvantage of tree-based algorithms, however, could be that this approach does not predict continuous values. Yet, here, we aim at predicting an adjustment factor depending on a classification based on the used predictors, which is exactly the purpose of a tree-based algorithm. The choice of a gradient boosting instead of other tree-based algorithms (e.g. random forest; Breiman, 2001) is motivated by the fact that gradient boosting is a gradient descent algorithm, where each additional tree tries to reduce the bias (which is the main goal of our study) rather than the variance of the predictions.
5.2 Impact of the data used to train the GBR models

Site-independent and season-independent GBRs
The lower generalization of the season-independent GBRs compared with the site-independent GBRs allows for the splitting into individual sub-models adapted to a small number of samples (see Table 1). This enables us to exploit the B w data of the tested glacier by creating a specific sub-model but can result in an overfit of the training data. On the contrary, the higher generalization of the site-independent GBRs allows for learning on overall relationships between the used predictors and the reference adjustment factors (Eq. 2). The used training data and the selected hyperparameters also have a direct influence on the predictors needed by the GBR models to reduce the cost function (Eq. 3). In fact, the use of reanalysis variables (from single level and pressure levels) as predictors caused an increase of the overall RMSE of both ERA5 and MERRA-2 season-independent GBRs against the B w data of all glaciers of the study (Fig. 8c  and d). However, despite the high correlation of the downscaled reanalysis variables (cf. Sect. 3.1) with the elevation of the glaciers, their inclusion in the set of predictors for the training of the site-independent GBRs reduced the overall RMSE (Fig. 8a and b). This difference can be explained by the combined effect of using data of the tested glacier in the training of the season-independent GBRs and defin-ing a small minimum number of samples required to create a leaf node of the GBR. In fact, the season-independent GBR can theoretically exploit the coordinates to split into individual sub-models adapted to individual glaciers. Therefore, the season-independent GBRs can learn the adjustment factors observed in the other accumulation seasons of the tested glacier and predict a similar adjustment factor for the tested accumulation season, with no need for learning overall relationships between the reanalysis predictors and the reference adjustment factors (Eq. 2).
In general, the year was included in the set of predictors for the GBRs, because it may allow for learning the climatological information and potential trends in terms of reanalysis biases against the B w data. This can be relevant in case reanalysis variables included in the predictors are not able to represent the hypothetical trends of these biases, which can exist due to the increasing availability of observational data that could have been assimilated by reanalysis models over the years. From Fig. 8, we notice that the year has a different impact on the site-independent and the season-independent GBRs. For the site-independent GBR, the withdrawal of the year from the predictors leads to a smaller increase of the overall RMSE than the withdrawal of the reanalysis variables (pressure levels and single level). In contrast, for the seasonindependent GBR, the best performance is reached by withdrawing the reanalysis variables. Thus, given the mentioned ability of the season-independent GBR to potentially create sub-models for specific glaciers using their coordinates, we think that the year may be the one predictor to actually vehiculate the climatological information in this case (the GBR model learns which periods correspond to larger/smaller biases for the tested glacier).

Spatial and temporal transferability of the GBR models
The GBR models were trained with almost 100 glaciers distributed over the four regions on three continents. Between the regions we observed different robustness and performances. The performance of the GBR models tends to decrease when removing B w data of neighbouring glaciers from the training process (Fig. 6b, d, f and h). Neighbouring glaciers were removed from the training as a function of the distance (range) from the tested glacier. Our results suggest that more available glaciers with B w data would probably greatly improve the performance in Central Asia and western Canada, where our dataset is limited in terms of the number of monitored glaciers, and the horizontal spacing between different sites is considerable. In the Alps, the network of monitored glaciers is much denser. Thus, more glaciers are excluded from the training for shorter distances than in other regions, impacting the performance of the site-independent GBR models. When the range of excluded neighbouring glaciers is extended to 1000 km, a strongly reduced number of glaciers of the same region is still used in the training, meaning that the models are almost exclusively trained with the glaciers of the other regions (the site-independent GBR almost becomes a region-independent GBR). The climate conditions and the complexity of the weather processes can be very different among the four investigated regions (and even within the individual regions). A region-independent model is thus not expected to provide accurate results. In Scandinavia, a linear precipitation gradient with elevation is more appropriate than in the more complex topography of the Alps and Central Asia (e.g. Rasmussen and Andreassen, 2005). Thus, the site-independent GBR models are only performing slightly better than the benchmark when the full set of the other glaciers is used in the training, indicating that a simpler lapse-rate-based approach might be preferable. However, considering the four regions, the bias of the regionindependent GBR models remains comparable to the bias obtained with the benchmark method, which is independent from any ground observation. The performance of the season-independent GBR models improved consistently when including in the training only a few other seasons of B w data related to the tested glacier (Fig. 6a, c, e and g). This thus demonstrates the uniqueness of the B w distribution over each glacier that cannot be easily reproduced by using the relations learned at other glaciers. However, this also indicates that the B w distribution, and its relation with precipitation, is similar in different years (see also e.g. Grünewald et al., 2013;Sold et al., 2016). For our application, there would be the added benefit from B w data on additional glaciers rather than on additional seasons.
The inclusion of the year in the set of predictors is not problematic for temporal reconstruction of B w with limited gaps but could be a limitation for extrapolation to future conditions (using for instance global or regional climate model data in lieu of reanalysis). In fact, even though the use of the year as predictor could allow for identifying and modelling trends of reanalysis biases against the B w data, these trends would be limited to the training period of the GBRs. In contrast, the exclusive use of reanalysis variables as predictors could allow for identifying and modelling trends of biases as a function of specific climatological conditions represented by reanalysis variables.
Furthermore, the use of total seasonal averages as inputs for the GBRs has limitations for the application of the presented approach. The GBRs provide an estimate of the overall adjustment factor for the reanalysis precipitation according to the average climatic conditions of the accumulation season. Thus, a unique adjustment factor is estimated for the whole accumulation season. The application of this adjustment factor to daily precipitation data would allow for obtaining an average estimate of the SWE evolution during the accumulation season. However, the obtained SWE evolution would neglect potential melt of snow during the accumulation season (see Sect. 2.2), as well as potentially different adjustment factors for precipitation at higher temporal scales than seasonal. This limitation is related to the lack of refer-ence in situ observations at higher temporal resolution than seasonal available for our study, focusing on different regions of the world and at high elevations.

Representation of the temporal variability of the winter mass balance
All GBRs aimed at minimizing the MSE between the predicted and reference logarithmic adjustment factors (Eq. 3). The improvement of the temporal correlation between the original reanalysis and the B w data is thus a consequence of bias-adjusted estimates over accumulation seasons rather than a primary goal of the GBRs. A sensitivity test reported in the Supplement (Fig. S3) suggests that the seasonindependent GBRs are not very sensitive to the number of years of data of the tested glacier used for training. Their performance is comparable to the site-independent GBRs (Table 2). Furthermore, only in a few cases did the siteindependent GBRs show a performance inferior to the original reanalysis or the benchmark method (e.g. Ts. Tuyuksuyskiy glacier). These promising results suggest that our new estimates could also be used to derive B w trends with generally higher accuracy than the original reanalyses, thus potentially providing insights into the relation between climate change and both snow accumulation and precipitation at the highest elevations of mountain ranges, where virtually no direct precipitation records are available. Still, the limited number of glaciers with abundant B w data coverage available over a sufficient number of years does not allow us to perform a complete application of this approach.

Impact of the chosen reanalyses on the GBR models
At a regional scale, the total precipitation estimated in the accumulation season by the original MERRA-2 has shown larger biases than the original ERA5 when compared to B w on glaciers. The coarser spatial resolution of MERRA-2 is certainly a factor causing larger biases in complex highmountain areas (e.g. Zandler et al., 2019;Chen et al., 2021). In fact, a coarse resolution directly implies that mountains are more strongly smoothed. The absolute elevation of a grid cell is thus lower for a coarse resolution, and the estimated precipitation also refers to the lower elevation of the grid cell. The performance of the original ERA5 and MERRA-2 has a direct impact on the GBR models. However, the GBR models were able to compensate for such differences in the bias. In fact, the biases of the ERA5 and MERRA-2 GBR models are much closer to each other than the biases of the original reanalyses (see Fig. 3a-d). The differences between the performance of our GBR models are also caused by the different predictors that have been used. For instance, we considered all the topographical predictors describing the reanalysis's subgrid complexity of both reanalysis products, and ERA5 is providing more descriptors than MERRA-2 (see Tables B1  and B2).

Influence of the winter mass balance data accuracy on the GBR models
Our study strongly relies on reference B w data on glaciers.
However, various problems are related to the direct measurements of snow accumulation on glaciers thus leading to uncertainties in the observations (e.g. Zemp et al., 2013;Sold et al., 2016;O'Neel et al., 2019;Huss et al., 2021). Most importantly, snow accumulation measured at individual points needs to be extrapolated in space to obtain B w data used in our analysis. At the highest elevations of glaciers with a typically difficult accessibility for manual observations, results are often purely based on extrapolation techniques (e.g. Østrem and Brugman, 1991;Cogley et al., 2011;Huss et al., 2021). Given that the WGMS database does not generally report how this was achieved and how many actual observations were available in a given elevation interval, it is difficult to assess the integrative uncertainty in the B w data used. In order to illustrate the importance of the extrapolated B w data used in this study, we more closely inspected point winter snow observations for 12 Swiss sites and three years (2016-2018) based on a dataset with higher resolution and full documentation (GLAMOS, 2021). Figure 9a indicates that a lower number of manual observations was typically performed at the lowest and the highest elevations of the glaciers. In some elevation bands, even no manual observations are available and B w data refer to an extrapolation. However, a much larger number of manual observations is typically performed in the elevation intervals corresponding to the largest areas of the glaciers (Fig. 9a and c). As indicated by Fig. 9e, considerable uncertainties might exist in the analysed vertical profiles of B w . However, the weighting function dependent on the area of the intervals used in the training of the GBR models assigned more importance to the B w data in such observation-rich areas. Furthermore, the main results of the study relate to glacier-wide B w data, which for most glaciers is very close to the glacier-wide mean of the manual observations as also indicated in Fig. 9e (only in 3 out of 34 cases is the ratio of glacier-wide mass balance to the average of all individual observations larger than 1.10, and in no case is the ratio lower than 0.90).
Another source of uncertainty that is difficult to assess is the starting date of the accumulation season. We considered the same starting date for all elevation intervals, even though it varies over the glacier's elevation range. The accumulation of snow starts later at low elevations and earlier at high elevations. Therefore, the different elevations also collect different precipitation totals (as the periods differ). The impact on the study of the date considered the beginning of the accumulation season has been evaluated with a sensitivity test (Fig. 9). For the same 12 Swiss glaciers and 3 years as above, we rely on the more detailed dataset of point winter mass balance data that documents start dates of measured cumulative snow precipitation of the winter season for each location individually. Start dates have been inferred based on a distributed glaciological modelling approach driven by daily local weather data . The total precipitation of ERA5 and MERRA-2 was derived over these varying starting dates and was compared with the total precipitation obtained with non-varying, average starting dates (Fig. 9b, d and f). Figure 9b indicates that at high (low) elevations, the accumulation season can start up to 20 d before (after) the unique date that we considered for all elevation intervals. These differences may translate into different amounts of total precipitation. In extreme cases, the total MERRA-2 ( Fig. 9d) or ERA5 (Fig. 9f) precipitation that would be ob-tained with varying dates would be almost twice (or half) of the total precipitation that we considered. However, the impact on the main results presented above is limited, because these large differences are typically observed at the highest/lowest elevations of the glaciers, where the glacier area is minor, and thus, a lower weight is assigned to the B w data in the training of the GBR models. Moreover, the main results of the study are based on glacier-wide values, and for both MERRA-2 and ERA5 we observe very small differences in terms of glacier-wide precipitation totals for the majority of the Swiss glaciers. Only in 6 out of 34 cases is the glacierwide ratio smaller (larger) than 0.97 (1.03), and only in two cases it exceeds 1.10.
This analysis suggests that the used B w data and the considered start dates can lead to relevant uncertainties in the analysis of vertical profiles. However, this does not generally have a relevant impact on our conclusions, which are mainly based on glacier-wide values.

Conclusions
In this study, we developed and evaluated a machine learning approach based on gradient boosting regressor models to adjust the total precipitation of reanalysis datasets (ERA5 and MERRA-2) over the accumulation season on glaciers and to ultimately reconstruct spatio-temporal winter mass balance (B w ). The high performance achieved with our approach allowed us to use it to derive observation-independent B w estimates over glaciers in the Alps, Scandinavia, Central Asia and western Canada. Data on B w covering a period of up to 39 years from 95 glaciers (WGMS, 2021) were used to train our approach.
The most important variables that were automatically selected by our GBR models were those related to the elevation difference between the glacier surface and the terrain model underlying the reanalyses. The latitude and longitude of the studied sites were also frequently used in order to discriminate between regions that are characterized by different climate conditions and weather systems, allowing the GBR models to be split into individual sub-models adapted to specific sub-regions.
In general, the total precipitation of the reanalyses largely underestimates observed B w on glaciers. The largest (relative) regional underestimation is observed in Central Asia for MERRA-2 and in Scandinavia for ERA5 (Fig. 3). The GBR models allowed for reducing these biases. In Central Asia and western Canada, the correlation between the original reanalyses' estimates and the B w on the analysed glaciers has considerably increased with the season-independent GBRs only. With the exception of some specific glaciers, our GBR models outperformed the benchmark method (lapse-ratebased approach) in the Alps, Scandinavia, Central Asia and western Canada by reducing the bias of the original reanalysis against the B w data (Fig. 4). This suggests that more com-plex and data-informed models such as our GBRs are needed to adjust reanalysis data to different glaciers located in different topographical settings and climatic conditions and to overcome the varying performance of reanalysis data for different region of the world.
Our results furthermore indicate that the seasonindependent GBRs outperform the site-independent GBRs to reduce the bias, which consequently makes filling temporal data gaps much simpler than estimating B w of glaciers where no in situ observations are available. Thus, a denser network of ground-based snow accumulation measurements and/or improved remote sensing observations are of great importance to further develop methods that allow spatio-temporal transferability of the observed snow and/or precipitation in high-mountain areas.
The GBR models, compared to the original reanalyses, have moreover shown improved performance in reproducing temporal changes (over years) of B w for the majority of the analysed glaciers. Generally, our GBR models would allow for deriving more accurate B w trends than the original reanalyses, thus potentially providing insights on the relation between climate change and snow accumulation over glaciers.
We finally demonstrated that machine learning models (with robust cross-validation schemes) can be powerful instruments to adjust precipitation estimates over glaciers. The new information that our approach is able to deliver can significantly improve the calibration of glaciological and hydrological models in different regions of the world, in particular for regions where the quantity and quality of observations are very limited, and the spatial resolution and performance of reanalysis products are (too) low.
Appendix A: Equations used to derive the relative humidity The relative humidity is not directly provided by all the reanalysis products; therefore we derived it by applying a similar approach to Liston and Elder (2006) and Gupta and Tarboton (2016), which is presented hereafter.

A1 ERA5
The relative humidity is not directly provided at the grid level; therefore, we combined the 2 m temperature (t2m) and dew point temperature (d2m) as follows: where r2m * is the computed 2 m relative humidity, and for ice/snow, a = 611.21 Pa, b = 22.452 and c = 272.55 • C.

A2 MERRA-2
MERRA-2 is not providing the relative humidity at the grid and at the pressure levels; furthermore, the dew point temper-ature is not provided at the pressure levels either. Therefore, we combined the specific humidity and the pressure in order to derive them (at the grid and at the pressure levels). For ice/snow, a = 611.21 Pa, b = 22.452 and c = 272.55 • C. Vapour pressure is where the specific humidity is QV = QV10M for the grid and QV = QV levels for the pressure levels, and the pressure is P = P S for the grid and P = P levels for the pressure levels.
The vapour pressure e * was named e10M * for the grid and e * levels for the pressure levels. Dew point temperature is .
The dew point temperature Td * was named Td10M * for the grid and Td * levels for the pressure levels. Relative humidity is where the temperature is T = T 10M for the grid and T = T levels for the pressure levels. The relative humidity RH * was named RH10M * for the grid and RH * levels for the pressure levels.  In Tables B1 and B2, we report the complete list of variables selected from the reanalyses products. In Table B3 we provide a summary of all the variables used by the GBR models.   Table B3. Summary of the variables used by the GBR models: all constants and single-level variables from the original reanalyses (Tables B1 and B2), variables downscaled at the elevation of the B w data ("variable_down"), mean of the variables during the accumulation season considering only days with a minimum precipitation of 5 mm ("variable_Ppos"), difference between "variable_down" and the variable at the original grid of the reanalysis ("delta_variable"), year, glacier coordinates, slope, and aspect. P grid is the precipitation of the original reanalysis. The variable names have the same roots as those reported in Tables B1 and B2  Data availability. Winter mass balance data separated per elevation interval are freely available at https://doi.org/10.5904/wgmsfog-2021-05 (EE-MASS-BALANCE data sheet in WGMS, 2021, last access: 1 June 2021). The average slope and aspect of the glaciers were obtained from the Randolph Glacier Inventory version 6 at https://doi.org/10.7265/4m1f-gd79 (last access: 5 May 2021, RGI Consortium, 2017). ERA5 hourly data on single levels and on pressure levels were downloaded from the Copernicus Climate Change Service (C3S) Climate Data Store at https://doi.org/10.24381/cds.adbb2d47 (Hersbach et al., 2018b; last access: 1 June 2021) and https://doi.org/10.24381/cds.bd0915c6 (Hersbach et al., 2018a;last  Author contributions. MGu conducted the analysis and wrote the manuscript with inputs from all co-authors. MH derived the snow accumulation data for the Swiss glaciers analysed in Sects. 2.2 and 5.2.5 and contributed to the related discussion. MGa and NS contributed to the design of the research and helped with continuous discussions concerning the results. All co-authors contributed to the final form of the paper.