Propagating information from snow observations with the CrocO ensemble data assimilation system: a 10-year case study over a snow depth observation network

Abstract. The mountainous snow cover is highly variable at all temporal and spatial scales. Snowpack models only imperfectly represent this variability, because of uncertain meteorological inputs, physical parameterisations, and unresolved terrain features. In-situ observations of the height of snow (HS), despite their limited representativeness, could help constrain intermediate and large-scale modelling errors by means of data assimilation. In this work, we assimilate HS observations from an in-situ network of 295 stations covering the French Alps, Pyrenees and Andorra, over the period 2009-2019. In view of assimilating such observations into a spatialised snow cover modelling framework, we investigate whether they can be used to correct neighbouring snowpack simulations; this study thus investigates the potential of localised versions of the Particle Filter to spatially propagate information from in-situ HS observations within an ensemble of snowpack simulations. We use CrocO, an ensemble data assimilation framework for snow cover modelling, based on a Particle Filter suited to the propagation of information from observed to unobserved areas. This ensemble system already benefits from meteorological observations, assimilated within the SAFRAN analysis scheme. CrocO also proposes various localisation strategies to assimilate snow observations. These approaches are evaluated in a leave-one-out setup against the operational deterministic model and its ensemble open-loop counterpart, both running without HS assimilation. Results show that intermediate localisation radii of 35-50 km yield a slightly lower root mean square error (RMSE) and a better Spread-Skill than the strategy of assimilating all the observations from a whole mountain range. Compared with these state-of-the-art deterministic and ensemble open-loop approaches over ten years, significant continuous ranked probability score (CRPS) improvements are only obtained in locations and elevation ranges where the reference errors are the highest. These areas correspond to locations where the density of meteorological observations, which are crucial for the correction of the meteorological forcings within the SAFRAN analysis scheme, is the lowest. This demonstrates a good complementarity between HS assimilation and the meteorological observations analysed by SAFRAN to reduce the current errors of the operational chain.


Introduction
Better monitoring the spatio-temporal variability of the mountainous snow cover is paramount to improve the forecasting of snow-related hazards (Morin et al., 2020) and to anticipate downstream river flow (Lettenmaier et al., 2015). In mountainous terrain, the snow cover inherits a high spatial variability from several factors. The topography controls the precipitation phase, air temperature, wind exposure and radiation fluxes (Durand et al., 1993; Oliphant et al., 2003). Wind drift redistributes snow at every scale (Mott et al., 2018). Finally, vegetation traps the snow (Sturm et al., 2001) and also affects its net shortwave and longwave radiation (Qu and Hall, 2014; Malle et al., 2019). Snowpack models are commonly used to derive snowpack properties in the mountains. Yet, their ability to represent snow cover variability over large areas is inherently limited by large errors in their meteorological forcings (Raleigh et al., 2015) and uncertain physical parameterisations (Essery et al., 2013; Krinner et al., 2018). In addition, explicitly accounting for processes such as wind drift and snow-vegetation interaction is not yet affordable at large scales.
In that context, additional sources of information are needed to mitigate snowpack modelling uncertainty in the mountains. Observations from weather stations located in the mountains can be used to correct Numerical Weather Prediction (NWP) model outputs. Dedicated downscaling and analysis schemes such as SAFRAN (Durand et al., 1993) or the RhiresD interpolation in Switzerland (Frei and Schär, 1998) can efficiently reduce the large errors of NWP models in the mountains, in particular through the assimilation of local precipitation observations. Such approaches significantly improve snow cover simulations (Durand et al., 1999; Magnusson et al., 2014). These weather stations, however, are generally located below 1200 m (Frei and Schär, 1998; Vernay et al., in review), and large errors, in precipitation for example, remain at higher elevations (Magnusson et al., 2014).
Data assimilation of snowpack observations may help address this issue, in complement to these meteorological observations. Remotely-sensed retrievals of snow bulk properties (e.g. the height of snow (HS, m) and the snow water equivalent (SWE, kg m-2)) are a promising source of snowpack observations for data assimilation (e.g. Margulis et al., 2019), but they are inherently limited by spatio-temporal gaps (De Lannoy et al., 2012) or only available at coarse resolutions (Andreadis and Lettenmaier, 2006). In-situ observations of HS and SWE cover large mountainous areas and are operationally available on a daily basis in numerous countries (e.g. Serreze et al., 1999; Jonas et al., 2009; Durand et al., 2009b; Cantet et al., 2019). Their potential to improve local simulations is unambiguous, as demonstrated by many studies (e.g. Magnusson et al., 2017; Piazzi et al., 2018; Smyth et al., 2019; Cantet et al., 2019).
However, the representativeness of such observations is limited by the snow cover spatial variability (Grünewald and Lehning), and spatialised applications have to cope with a much coarser in-situ observational coverage (Largeron et al., 2020).
Here, we investigate whether the assimilation of in-situ HS observations can improve simulations of the Météo-France operational modelling chain for snow cover monitoring and avalanche hazard forecasting in the vicinity of the measurement stations, and what the most appropriate assimilation strategy is for that purpose. We assess this in a network of in-situ HS observations over the French Alps, French Pyrenees and Andorra, with contrasting observation densities. We use CrocO, an ensemble data assimilation system for snow cover modelling (Cluzet et al., 2021). CrocO is built around an ensemble version of the operational modelling system of Météo-France (Vionnet et al., 2012; Vernay et al., in review), accounting for modelling uncertainties from the meteorological forcings (Charrois et al., 2016; Deschamps-Berger et al., in review) and from the snowpack model itself (Dumont et al., 2020). CrocO includes several versions of the Particle Filter (PF) tailored for the propagation of information from observed into unobserved areas (Cluzet et al., 2021). These variants are used in a localised framework, in which only observations coming from within a certain radius around the considered location are assimilated. Domain localisation is commonly used in the Ensemble Kalman Filter (EnKF, Evensen, 1994) and PF communities (Van Leeuwen, 2009; Penny and Miyoshi, 2016; Poterjoy, 2016; Farchi and Bocquet, 2018). It is used to remove far-range unrealistic correlations in the EnKF (Houtekamer and Mitchell, 2001) and to circumvent the curse of dimensionality, which causes the PF to diverge when too many observations are assimilated simultaneously (so-called PF degeneracy) (Bengtsson et al., 2008). PF localisation proved efficient in several studies (e.g. Poterjoy and Anderson, 2016; Potthast et al., 2019).
To assess the potential transfer of information, we opt for a leave-one-out approach (e.g. Slater and Clark, 2006), whereby the assimilation is performed considering neighbouring observations but discarding any local observation. The assimilation performance can then be evaluated against these independent local observations. If such a transfer could be demonstrated, it would mean that the assimilation method is able to improve simulations at a sufficient distance from the available observations to be efficient over the whole simulation domain. In other words, this network of observations could be used to constrain spatialised snowpack simulations over the French Alps, Pyrenees and Andorra. Furthermore, the methodology could be applied to other areas with similar densities of observations.
To summarize, the following questions will be addressed in this paper:
-What is the performance of data assimilation compared with the operational and ensemble models?
-Can data assimilation manage to propagate information in space?
-What is the best localisation strategy for assimilation?

-Could an increased observation density yield better results for assimilation?
The study area, observations, modelling chain and data assimilation scheme are described in Sec. 2. In Sec. 3, the evaluation strategy and scores are presented. The results are presented and discussed in Secs. 4 and 5. We finally conclude and open research perspectives in Sec. 6.

Study area and observations
The study area spans the French sides of the Alps and Pyrenees, and Andorra. The French Alps culminate at Mont-Blanc (4810 m) and are higher and about twice as large as the French Pyrenees (culminating at Vignemale, 3298 m). Andorra is a principality located in the central-eastern Pyrenees. In the following, for the sake of simplicity, we refer to the French Pyrenees and Andorra as "Pyrenees", and to the French Alps as "Alps".

The winter climate of the Alps contrasts between the North and the South: the Southern Alps are on average drier than the Northern Alps (Isotta et al., 2014). The Pyrenees are very elongated, with a strong longitudinal gradient from the humid oceanic western side to the drier Mediterranean eastern side. The elevation of the winter snow line is around 1500 m in the Pyrenees (Durand et al., 2012) and about 1200 m in the Northern Alps (Durand et al., 2009a). Finally, the inter-annual variability of the snow cover is marked in both ranges (Durand et al., 2009a; Gascoin et al., 2015).

In this work, we perform snowpack simulations at a network of 295 stations providing daily HS observations. 217 stations are located in the Alps and 78 in the Pyrenees (of which 7 are in Andorra). This network is an aggregate of several data sources. Most of the observations (144 stations) come from ski resorts, where HS is manually observed every morning during the commercial season (mid-December to April in general). The second source is a network of climatological observations (77 stations) in which several meteorological parameters and HS are observed on a daily basis for the whole year. These stations are generally located around populated areas or in ski resorts. A few sites (19 stations) come from various automated measurements in ski resorts. Two networks of automated HS sensors were also used: Météo-France's Nivôses (27 stations) and Électricité de France (EDF) EDFNIVO stations (28 stations), the latter only from the winter season 2016-2017 onwards. These networks are located in remote areas, generally at higher altitudes than the rest of the observations.

The density of HS observations within each SAFRAN massif (Fig. 1, see Sec. 2.2.2 for more details on SAFRAN) is highly variable, from less than 0.5 daily observations per hundred km2 in the extreme Southern Alps and Western Pyrenees to more than ten times higher densities in the Mont-Blanc massif. This is mainly explained by the variable density of ski resorts. Although the density of observations is generally lower than in the Alps, the Pyrenees exhibit two clusters of dense observations, in the central-western part around Bigorre and in the central-eastern part close to Andorra. In the Alps, the density of observations is especially high from the Northern to the South-Central area. The Southern massifs, as well as the lower-altitude Western massifs, generally have fewer observations.

Ensemble data assimilation setup
The ensemble system consists of an ensemble of meteorological forcings generated by stochastic perturbations, forcing a multiphysics ensemble of snow models as described in Cluzet et al. (2020) and Cluzet et al. (2021). The total number of ensemble members (also named particles in the PF context) was set to 160. An open-loop run (i.e. without assimilation) was performed to serve as a reference. Only a few changes were made to the ensemble setup; they are described in the following sections.

Ensemble of snowpack models
The simulation setup is based on a multiphysics framework representing the uncertainties of the main physical parameterisations of Crocus (Cluzet et al., 2020). However, in this paper, the advanced radiative transfer scheme TARTES (Libois et al., 2013, 2015) was not used, contrary to previous studies (Cluzet et al., 2020, 2021), because it requires Light Absorbing Particles (LAP) fluxes from chemistry transport models such as MOCAGE, ALADIN or GFDL_AR4 (Josse et al., 2004; Nabat et al., 2015; Horowitz et al., 2020). To date, such products are not interpolated within the SAFRAN geometry and would require a specific treatment and validation, going well beyond the scope of this study. Instead, we opted for a single parameterisation of the snowpack radiative transfer, the 'B60' option from Brun et al. (1992) presented in Lafaysse et al. (2017), whereby the snow albedo of a layer is a function of its age.

Ensemble of meteorological forcings
Meteorological forcings are taken from the SAFRAN (Système d'Analyse Fournissant des Renseignements Adaptés à la Neige, Durand et al. (1993)) reanalysis over the Alps and Pyrenees. SAFRAN is a surface meteorological analysis system adjusting backgrounds from the NWP model ARPEGE (Courtier et al., 1991) with local meteorological observations (air temperature, pressure, precipitation, humidity) within so-called massifs of about 1000 km2 (see Fig. 1), further downscaled to the stations of our study. Over the considered period, 438 observation sites provided precipitation observations to SAFRAN between November and April. These stations are mostly located at lower elevations (below 1500 m), as presented in Fig. 4 of Vernay et al. (in review). Among them, 164 sites correspond to locations with snow depth observations included in the present study. The SAFRAN analysis is issued separately for each massif in a semi-distributed geometry, i.e. within 300 m elevation bands, aspects and slopes, the main topographic parameters controlling the snow cover evolution. This analysis is subsequently downscaled to the specific topographic conditions (i.e. elevation, slope, aspect and local topographic mask) of the simulated station. This means that the same analysis is applied to all the points within a same massif, interpolated consistently with their topographic parameters, while analyses for neighbouring stations located in distinct massifs will differ.
An ensemble of forcings was generated by applying stochastic perturbations in the same spirit as Charrois et al. (2016), but with slight corrections in the implementation of the perturbations compared with Cluzet et al. (2020, 2021), as described in Deschamps-Berger et al. (in review). For each member, perturbations are auto-correlated in time following an auto-regressive process and are spatially homogeneous. The perturbation parameters were taken from Charrois et al. (2016). Precipitation parameters were adjusted (i.e. multiplicative noise with auto-correlation time τ = 1500 h and dispersion σ = 0.5) in order to obtain a spread-skill close to 1 for the open-loop run (see Sec. 4.1). We used these perturbed analyses as input for the snowpack simulations at the stations.
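As an illustration, a time-autocorrelated multiplicative perturbation of this kind can be sketched as follows. This is a minimal sketch, not the operational implementation: the AR(1) discretisation, the exponential mapping to keep factors positive, and the variance normalisation are assumptions made here; only the parameter values τ = 1500 h and σ = 0.5 come from the text.

```python
import numpy as np

def perturb_precip(precip, tau=1500.0, sigma=0.5, dt=1.0, rng=None):
    """Multiplicative, time-autocorrelated perturbation of a precipitation
    series (illustrative sketch). tau: auto-correlation time [h],
    sigma: dispersion, dt: time step [h]."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(precip)
    phi = np.exp(-dt / tau)                  # AR(1) coefficient for step dt
    eps = np.empty(n)
    eps[0] = rng.normal(0.0, sigma)
    # innovations scaled so the stationary variance of eps stays sigma**2
    innov_std = sigma * np.sqrt(1.0 - phi**2)
    for t in range(1, n):
        eps[t] = phi * eps[t - 1] + rng.normal(0.0, innov_std)
    # exponential mapping keeps the multiplicative factor positive
    return precip * np.exp(eps - 0.5 * sigma**2)
```

With τ much larger than the time step, consecutive perturbation factors are nearly identical, so a whole precipitation event is scaled coherently for a given member.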

The Particle Filter in CrocO
The Particle Filter used in this work is based on the version described in Cluzet et al. (2021); only a brief description of the procedure is given here. The ensemble is updated sequentially with the PF on each assimilation date and propagated forward until the following assimilation date. The PF is localised: each point receives a different analysis. Based on the comparison of neighbouring simulations of HS with their corresponding HS observations, the PF selects a sample of the best ensemble members. The idea is that if a particle performs well against nearby observations, it should also be efficient locally (Farchi and Bocquet, 2018). Different localisation radii are tested in this study, ranging from 17 km to 300 km. Note that when a particle is selected by the PF, the full local state vector is copied: the local physical consistency of the variables is preserved.
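A minimal sketch of such a localised PF analysis for a single point might look as follows. The Gaussian likelihood with independent observations and the observation error value `sigma_o` are assumptions made for illustration; the actual CrocO implementation differs in its details.

```python
import numpy as np

def local_pf_analysis(hs_model, hs_obs, obs_idx, sigma_o=0.1, rng=None):
    """Localised Particle Filter analysis for one point (sketch).
    hs_model: (N_e, N_pts) simulated HS of each particle at the points
    inside the localisation radius; hs_obs: observed HS at obs_idx.
    Returns indices of resampled particles, weights and effective sample
    size. The full local state of each selected particle would then be
    copied, preserving local physical consistency."""
    rng = np.random.default_rng() if rng is None else rng
    innov = hs_model[:, obs_idx] - hs_obs            # (N_e, N_obs)
    # Gaussian likelihood, observations assumed independent
    loglik = -0.5 * np.sum((innov / sigma_o) ** 2, axis=1)
    w = np.exp(loglik - loglik.max())                # stabilised weights
    w /= w.sum()
    n_eff = 1.0 / np.sum(w ** 2)                     # effective sample size
    idx = rng.choice(len(w), size=len(w), p=w)       # importance resampling
    return idx, w, n_eff
```

Particles close to the nearby observations receive large weights and are duplicated by the resampling, which is how information propagates to the unobserved point.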
Particle Filter degeneracy (see Sec. 1) may arise even with a reduced local domain size, and approaches to increase the PF tolerance may be required to overcome it. The localisation is complemented here by two different strategies described in Cluzet et al. (2021), inflation and k-localisation, leading to the 'rlocal' and 'klocal' algorithms, respectively. If the initial analysis is degenerate (i.e. the effective sample size N_eff is lower than a target N*_eff), the rlocal and klocal algorithms iteratively modify the assimilation settings to make it more tolerant, so that the PF analysis reaches a sample size of N*_eff. The rlocal algorithm performs an inflation of observation errors inspired by Larue et al. (2018). The klocal algorithm discards observations coming from locations exhibiting the lowest ensemble correlations with the considered location. It is important to note that, inside a localisation radius, the rlocal method assimilates all available observation stations, whereas the klocal method only selects a subset of observations from locations where the ensemble members are sufficiently correlated with the simulation members of the considered point.
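The inflation mechanism underlying the rlocal idea can be illustrated with the following sketch, where the observation error is iteratively inflated until the weights reach the target effective sample size. The inflation factor and iteration cap are illustrative choices, not values from Cluzet et al. (2021).

```python
import numpy as np

def effective_sample_size(w):
    """N_eff = 1 / sum(w_i^2) for normalised weights w."""
    return 1.0 / np.sum(w ** 2)

def inflate_until_tolerant(innov, sigma_o, n_eff_target,
                           factor=1.5, max_iter=50):
    """Sketch of observation-error inflation: if the PF analysis is
    degenerate (N_eff < target), inflate the observation error standard
    deviation until the target effective sample size is reached.
    innov: (N_e, N_obs) innovations of each particle."""
    s = sigma_o
    for _ in range(max_iter):
        loglik = -0.5 * np.sum((innov / s) ** 2, axis=1)
        w = np.exp(loglik - loglik.max())
        w /= w.sum()
        if effective_sample_size(w) >= n_eff_target:
            return w, s
        s *= factor   # more tolerant likelihood at the next iteration
    return w, s
```

In the limit of a very large inflated error, the weights become uniform and N_eff tends to the ensemble size, so the loop always terminates for a reachable target.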

Example
This section presents an illustrative example of the propagation of information with the localised PF. On 3 December 2009, we perform an analysis at an unobserved point p_loc (2135 m a.s.l.) using an observation from a nearby point p_obs (2293 m a.s.l., 7 km away). The top panel of Fig. 3 shows the HS simulated by the 160 ensemble members at the two locations until the considered assimilation date. The observed HS at p_obs is 0.87 m, above the ensemble median at this location (about 0.5 m).
The PF will likely select the particles that have above-average HS at p_obs. The bottom panel of Fig. 3 shows the particles' HS values at p_obs as a function of their value at p_loc. A correlation can be noted: the particles predicting the highest HS at p_loc usually also predict higher-than-average HS at p_obs. It means that the ensemble we constructed (see Sec. 2.2) considers that the modelling errors are linked: if there is an underestimated snowfall in early December at p_obs, it is likely that this is also the case at p_loc.
The localised PF performs an analysis for p_loc by comparing the values modelled at p_obs with the available observation, thereby selecting the 'best' particles at p_obs (bottom panel, in green). The marginal distribution of the ensemble at p_obs (right of the bottom panel, in green) is significantly sharpened compared to the background, and is much closer to the observation. At p_loc, the distribution of the HS values of these particles is also sharper, and exhibits higher HS than before the analysis.
This example shows how the localised PF used the non-local observation at p_obs to infer information about the local unobserved point p_loc. It can be generalized to the situation where multiple observations are assimilated simultaneously, as done in this study. It also highlights the implicit importance of ensemble correlations with distant locations: in the absence of correlation, no information can be transferred. In such a situation, the klocal algorithm would discard the observations from the least correlated areas, while the rlocal would keep them. Finally, note that if the ensemble correlation is dramatically wrong (i.e. positive correlation instead of negative correlation), the analysis will degrade the ensemble performance.

Evaluation strategy
This work aims at assessing the potential transfer of information between points in an HS observation network by means of localised data assimilation, and more specifically at addressing the questions presented at the end of Sec. 1. To demonstrate that, the data assimilation system must outperform its ensemble counterpart with the assimilation switched off (open-loop) and the state-of-the-art operational deterministic snow cover modelling system from Météo-France (oper), which consists of a default Crocus version forced by the unperturbed SAFRAN meteorological forcings (Vernay et al., in review).

Assessing the ability of data assimilation to propagate information requires independent data for validation. We opted for a leave-one-out setup in which local observations are removed from the set of observations used in the local PF analysis. Only weekly observations were assimilated, while all available observations between October 1st and June 30th were kept for evaluation.
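Schematically, the leave-one-out evaluation loop can be written as follows; `assimilate` and `evaluate` are hypothetical callables standing in for the localised PF analysis and the scoring step, introduced here only for illustration.

```python
def leave_one_out(stations, assimilate, evaluate):
    """Leave-one-out evaluation loop (sketch): for each station, run the
    localised analysis using every other station's observations, then
    score the result against the withheld local observations."""
    scores = {}
    for s in stations:
        neighbours = [q for q in stations if q != s]   # local obs withheld
        analysis = assimilate(target=s, obs_stations=neighbours)
        scores[s] = evaluate(analysis, station=s)
    return scores
```

Because the local observation never enters the analysis at its own station, the resulting scores measure the spatial propagation of information rather than a local fit.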
There are two key design parameters for the data assimilation system: the value of the localisation radius (large or small) and the choice of the PF algorithm (rlocal or klocal). Both exert a direct or indirect control on the number of observations simultaneously assimilated by the PF, and therefore on its potential degeneracy and its ability to transfer information between locations.
Experiments combining the rlocal and klocal algorithms with 4 different localisation radii, ranging from 17 to 300 km, were conducted. Because the klocal approach does not use inflation (except in the case of degeneracy with only one observation), it is quite sensitive to the initial value of the observation error. In case of degeneracy, the smaller the observation error, the fewer observations will be selected by the klocal algorithm. For this reason, the klocal algorithm was run with a multiplication factor of 5 on the observation error variance (hence a fixed error standard deviation of 0.22 m), allowing more observations to be assimilated simultaneously.

Evaluation Scores
Several metrics are used in this work to assess the performance of the oper, open-loop and assimilation runs with respect to HS observations. From the ensemble E_{m,p,t} of N_e members m at station p and time t, the ensemble mean is computed using Eq. 1:

\overline{E}_{p,t} = \frac{1}{N_e} \sum_{m=1}^{N_e} E_{m,p,t} \quad (1)

The mean is a convenient way of synthesizing ensemble properties for evaluation; however, some artifacts can be observed with bounded variables such as HS. On a decaying snow cover, for example, the mean will not reach zero until every member has melted. For this reason, the ensemble median \tilde{E}_{p,t} will be preferred in the following. From \tilde{E}_{p,t}, we can compute the Absolute Error of the ensemble median compared with the observations o_{p,t} (AE):

AE_{p,t} = \left| \tilde{E}_{p,t} - o_{p,t} \right| \quad (2)

The ensemble bias is defined as the average difference between the ensemble median and the observations (Eq. 3):

bias_p = \frac{1}{N_t} \sum_{t=1}^{N_t} \left( \tilde{E}_{p,t} - o_{p,t} \right) \quad (3)

where N_t is the number of evaluation time steps. The Root Mean Squared Error of the median (RMSE) is computed from the AE, following Eq. 4:

RMSE_p = \sqrt{\frac{1}{N_t} \sum_{t=1}^{N_t} AE_{p,t}^2} \quad (4)

Bias and RMSE can be computed for the oper run (treating it as a single-member ensemble) in order to evaluate the median performance, and can be taken over time and/or space by dropping the time/spatial mean in Eqs. 3 and 4. These scores are not sufficient because they reduce an ensemble to its median. The ensemble spread (or dispersion) σ (Eq. 5), defined as the square root of the average ensemble variance, is a first metric to assess an ensemble's reliability:

\sigma_p = \sqrt{\frac{1}{N_t} \sum_{t=1}^{N_t} \mathrm{Var}_m\left(E_{m,p,t}\right)} \quad (5)

Reliability is a desirable property for an ensemble: it means that all events are forecast with the right probability, regardless of the probability value. The pdf of a reliable ensemble matches the actual pdf of the observations over a large enough sample. We introduce the Spread-Skill (SS) as:

SS = \frac{\sigma}{RMSE} \quad (6)

where σ must be computed only at the dates and locations where the RMSE is computed. For a reliable ensemble, σ ∼ RMSE (Fortin et al., 2015), i.e. a Spread-Skill close to unity (a necessary but not sufficient condition). This means that the spread is on average a good estimate of the modelling error, which is useful to make decisions. Rank diagrams (Hamill, 2001), the histograms of the position of the observation within the ensemble, enable a closer verification of the reliability of an ensemble (e.g. Bellier et al., 2017). Their flatness is a stronger condition for an ensemble's reliability than SS = 1.
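For illustration, the median-based scores and the rank diagram described above can be computed as in the following sketch; the use of the sample (unbiased) variance for the ensemble spread is an assumption made here.

```python
import numpy as np

def ensemble_scores(ens, obs):
    """Median-based scores of an ensemble ens (N_e, N_t) against
    observations obs (N_t,), following the definitions in the text."""
    med = np.median(ens, axis=0)
    bias = np.mean(med - obs)
    rmse = np.sqrt(np.mean((med - obs) ** 2))
    # spread: square root of the time-averaged ensemble variance
    spread = np.sqrt(np.mean(np.var(ens, axis=0, ddof=1)))
    return {"bias": bias, "rmse": rmse, "spread": spread,
            "spread_skill": spread / rmse}

def rank_histogram(ens, obs):
    """Histogram of the rank of each observation within the ensemble
    (Hamill, 2001); a flat histogram indicates a reliable ensemble."""
    ranks = np.sum(ens < obs[None, :], axis=0)
    return np.bincount(ranks, minlength=ens.shape[0] + 1)
```

A Spread-Skill near 1 and a flat rank histogram are then quick diagnostics of ensemble reliability.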

The Continuous Ranked Probability Score (CRPS, Eq. 7; Matheson and Winkler, 1976) is an aggregated ensemble score evaluating the reliability and resolution of an ensemble based on a verification dataset. An ensemble has a good resolution when it is able to issue different forecasts for different events (contrary to the climatology) (Atger, 1999).
If we denote F_{p,t} the Cumulative Distribution Function (CDF) of the ensemble and O_{p,t} the corresponding observation CDF (Heaviside function centered on the truth value), the CRPS is computed at (p, t) following:

CRPS_{p,t} = \int \left( F_{p,t}(x) - O_{p,t}(x) \right)^2 \, dx \quad (7)

The CRPS skill score (CRPSS) is commonly used to compare the performance of an ensemble E to a reference R. Although the CRPS can be computed from a deterministic run, R should preferably be an ensemble, because comparing the CRPS of deterministic and ensemble runs mainly illustrates the obvious fact that an imperfect deterministic run is a poor representation of a probability distribution. The following equation is frequently used:

CRPSS^*(E, R) = 1 - \frac{CRPS(E)}{CRPS(R)} \quad (8)

In this formulation, if E is more skillful than R, CRPSS*(E, R) will be positive, with a perfect score of 1, while less skillful scores range between −∞ and 0, resulting in an asymmetry between positive and negative scores (i.e. CRPSS*(E, R) = CRPSS*(R, E) / (CRPSS*(R, E) − 1)). We introduce the new formulation:

CRPSS(E, R) = 1 - \frac{CRPS(E)}{CRPS(R)} \ \text{if } CRPS(E) \le CRPS(R), \qquad CRPSS(E, R) = \frac{CRPS(R)}{CRPS(E)} - 1 \ \text{otherwise} \quad (9)

which is antisymmetric (CRPSS(E, R) = −CRPSS(R, E)) and bounded within [−1, 1].
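The CRPS of an empirical ensemble against a scalar observation can be computed via the well-known kernel identity CRPS = E|X − y| − ½ E|X − X′|, which is equivalent to the integral form of Eq. 7 for an empirical CDF. The sketch below also includes one possible symmetric variant of the CRPSS bounded within [−1, 1]; this symmetrisation is illustrative and not necessarily the exact formulation adopted in the paper.

```python
import numpy as np

def crps_empirical(members, obs):
    """CRPS of an empirical ensemble CDF against a scalar observation,
    via the kernel identity CRPS = E|X - y| - 0.5 * E|X - X'|."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2

def crpss_symmetric(crps_e, crps_r):
    """Symmetric skill score in [-1, 1]: positive when ensemble E beats
    reference R (one possible symmetrisation of the usual CRPSS)."""
    if crps_e <= crps_r:
        return 1.0 - crps_e / crps_r
    return crps_r / crps_e - 1.0
```

By construction, swapping E and R only flips the sign of this score, removing the asymmetry of the usual formulation.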

Overall results of the assimilation experiments
In this work, we compare the performance of the rlocal and klocal algorithms with different localisation radii, ranging from 17 to 300 km. The largest localisation radius yields the most stable performance (lower median RMSE variability, lower inter-annual variability) but yields the lowest Spread-Skills and the highest RMSE of all the assimilation runs, suggesting that this approach is not the most desirable. The most selective localisation strategies (radii of 17 km) achieve the highest SS, but their inter-annual performance variability is higher than for the other localisation radii.

Factors of variability of the assimilation skill
In the following, we investigate the different factors influencing the skill variability of the assimilation runs. As described in Sec. 4.2, there are only small skill differences between the localisation radii of 17-50 km, and between the rlocal and klocal algorithms. For the sake of illustration, we decided to focus on the assimilation configuration yielding the lowest median RMSE. This configuration, the klocal with a 35 km localisation radius, is further referred to as the 'klocal' configuration.

Temporal variability
Timeseries of ensemble bias can also provide information on the nature and origin of the errors. Fig. 11 shows that observations frequently fall in the outer ranks of the diagram (twice as much as for a reliable ensemble), and preferentially above the ensemble, which is consistent with the residual negative bias of the klocal simulation.

Discussion
In the following, we analyse the strengths and weaknesses of the operational and open-loop simulations and comment on the performance of the data assimilation algorithms in comparison to them.

On the performance of the reference simulations
The performance of the operational simulation has been regularly assessed until recently (Durand et al., 2009a; Vernay et al., in review). Overall, it is an accurate modelling system whose potential has been demonstrated in several recent climate studies and projections (e.g. López-Moreno et al., 2020; Verfaillie et al., 2018). However, it exhibits a contrasted regional performance (Fig. 13), and its errors are poorly known at high altitude, due to the lack of observations (Fig. 12 of Vernay et al. (in review)). This is a common issue in mountainous areas (Frei and Schär, 1998) and is detrimental to the use of the operational chain for all applications (e.g. avalanche hazard forecasting, hydrology, etc.).
Results from Tab. 1 show that the operational version of the system and its ensemble version, the open-loop, have comparable RMSE. The open-loop run reliably accounts for its modelling uncertainties and errors, since its SS is slightly below unity over the ten years. This means that, on average, the ensemble spread is almost a reliable estimate of the modelling error. This feature could be valuable for forecasters (Buizza, 2008). Tab. 1 and Figs. 6 and 11 show that the open-loop is negatively biased compared to the oper. This could be due to the centered stochastic perturbations (Charrois et al., 2016; Deschamps-Berger et al., in review), or to a bias in the ESCROC multiphysics model configurations. However, the oper model configuration is not expected to be perfectly centered in the open-loop, as several configurations, such as the parameterisation of surface heat fluxes, ground heat capacity or fresh snow density, strongly influence the resulting modelled snow depth. Strong increases in the oper and open-loop biases coincide with precipitation events, and they are only partly compensated by the following snow settling period (see Sec. 4.3.2). This suggests that error compensations are likely taking place in the oper chain, between solid precipitation amounts, fresh snow density, snow compaction, and ablation processes, as suggested by results from Quéno et al. (2016). Evaluation with co-located SWE and HS data would help disentangle this situation (e.g. Smyth et al., 2019).
Biases of the oper and open-loop strongly depend on the altitude (Fig. 7), in a pattern that matches the evaluation from Vernay et al. (in review), though on a smaller number of stations and considered years. They are unambiguously negative in the range 1500-2500 m, and more variable above, probably due to a higher snow cover variability, and depending on the considered region. In the range 1500-2500 m, this bias may be explained by higher wind speeds than at lower elevations, causing an underestimation of solid precipitation amounts in gauges (Kochendorfer et al., 2017), and consequently in SAFRAN, as evidenced during strong precipitation events.

On the PF strategies
In general, one of the primary motivations of domain localisation is to prevent the PF from degenerating (Farchi and Bocquet, 2018). In our case, as evidenced by the reasonable performance of the rlocal with a 300 km localisation radius (simultaneously assimilating up to 217 observations in the Alps), domain localisation is not required to prevent PF degeneracy, thanks to the mitigations (i.e. inflation or k-localisation) developed in Cluzet et al. (2021). Here, localisation is rather used to adapt to the error structures of the reference run. From Fig. 5, it seems that the open-loop bias is systematic and widespread.

A large localisation radius, averaging over a significant number of observations, therefore seems a good option. However, we also see regional structures in this bias, probably inherited from the oper (Vernay et al., in review). They are likely due to the fact that SAFRAN analyses are performed at the scale of the massif. To address this type of error, reducing the localisation radius is probably a better option. Finally, error structures can depend on other parameters such as elevation, and vary in time. In this situation, the klocal approach might be more suited, since it adjusts the observation selection to the model background correlation patterns. However, these background correlation patterns could sometimes be unrealistic, and therefore misleading for the algorithm.
The klocal algorithm, by construction, selects observations from locations that are correlated from the model's point of view. However, because we apply spatially homogeneous perturbations to the meteorological forcings, strong large-scale background correlation patterns are present in the open-loop, even between the Alps and the Pyrenees (not shown). These strong, potentially artificial, large-scale correlation patterns could hamper the performance of the klocal PF, leading it to assimilate very distant observations with no actual link to the considered location. Conversely, a completely random field of perturbations would prevent the algorithm from propagating any information between locations (Magnusson et al., 2014; Cantet et al., 2019). Using physically-based meteorological ensembles, such as PEARP (Descamps et al., 2015), used in Vernay et al. (2015), or AROME-EPS (Bouttier et al., 2016), or spatially correlated perturbation fields (Magnusson et al., 2014), could lead to more realistic correlation fields, but this goes beyond the scope of this study, especially as domain localisation already prevents the klocal from assimilating too distant observations.
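The correlation-based selection of the klocal can be sketched in the same spirit. This is an illustrative sketch, not the CrocO code; the ranking by absolute background correlation and the fixed number of retained sites are our own simplifying assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def klocal_selection(target_members, site_members, k):
    """Select observation sites by ensemble background correlation.

    target_members: ensemble HS values at the (unobserved) target point.
    site_members: one list of ensemble HS values per observation site.
    Returns the indices of the k sites whose background HS correlates
    most strongly (in absolute value) with the target point.
    """
    corrs = [abs(pearson(target_members, s)) for s in site_members]
    order = sorted(range(len(site_members)), key=lambda j: -corrs[j])
    return order[:k]
```

Under spatially homogeneous forcing perturbations, such a criterion can rank a very distant but artificially correlated site above a nearby one, which is the failure mode discussed above.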

5.3 Overall performance of the assimilation compared with the references
Here, we discuss the ability of the proposed assimilation approaches (with several localisation radii) to reduce the modelling errors of the oper and open-loop shown in Sec. 5.1. Aggregated results from Fig. 6 show that none of the proposed assimilation configurations significantly reduces overall modelling errors compared with the operational run. However, they overcome the significant negative bias of the open-loop they originate from, at the expense of a strongly under-dispersive spread-skill. The bias reduction seems more efficient and stable (i.e. less variable from year to year) with the rlocal than with the klocal, and with larger localisation radii, which makes sense as the open-loop bias is widespread (e.g. Fig. 11) and both settings tend towards assimilating more observations at the same time. However, the RMSE is slightly larger for the largest localisation radii, and the spread-skill is strongly reduced too.
There are two reasons why the assimilation could not outperform the operational run in terms of RMSE. First, its error may be of the same magnitude as the natural variability of point-scale observations, in which case no added value can be extracted even from nearby observations; similarly, there may be too few observations to efficiently constrain modelling errors. Increasing the observation density could be an option to overcome this issue. However, our results do not show a strong relationship between assimilation skill and density (Fig. 10, see Sec. 5.5 later on). Another explanation could be that systematic errors remain to be corrected, namely biases (as suggested by Fig. 7), but that it is difficult to propagate information between locations. In an idealised case, Cluzet et al. (2021) showed that the potential to propagate information from HS observations across

5.4 On the difficulties faced by assimilation algorithms
In this section, we discuss the performance of the klocal assimilation configuration with a localisation radius of 35 km against the open-loop. Although it does not significantly outperform the other configurations, the klocal seems best suited to address the bias-elevation relation of the references, and an intermediate localisation radius enables it to adapt to local error structures (see Sec. 5.2).
The CRPS improvement is highest for intermediate elevations, coinciding with the highest open-loop negative bias (Fig. 9), the latter being consistent with Cluzet et al. (2021), who showed that the largest improvements were obtained in the presence of systematic biases.
However, the klocal is strongly under-dispersive, contrary to the open-loop which achieves a SS around 1, and it is therefore significantly less reliable, as evidenced by the U-shaped rank diagrams in Fig. 12. As the CRPS is a measure of both accuracy and reliability, it may seem surprising that the klocal is more skilful than the open-loop in terms of CRPS, with an average positive CRPSS around 0.06 (Fig. 9).
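For reference, the empirical CRPS of an ensemble forecast against a scalar observation can be computed as E|X - y| - 0.5 E|X - X'|, and the CRPSS compares it with a reference forecast. A minimal sketch (our own function names; the exact estimator and aggregation used in the paper may differ):

```python
def crps_ensemble(members, y):
    """Empirical CRPS of an ensemble against a scalar observation.

    CRPS = E|X - y| - 0.5 * E|X - X'|, lower is better; it reduces to
    the absolute error for a deterministic (single-member) forecast.
    """
    m = len(members)
    term1 = sum(abs(x - y) for x in members) / m
    term2 = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return term1 - term2

def crpss(crps_run, crps_ref):
    """CRPS skill score relative to a reference (positive = better)."""
    return 1.0 - crps_run / crps_ref
```

Because the CRPS blends accuracy and reliability, a sharp but biased-corrected run can score better than a reliable but biased one, which is the situation described above.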
This under-dispersion is not satisfactory because it implies that the assimilation run is too confident about its simulated distributions. This is a general issue for all the presented assimilation strategies (Fig. 6). In additional experiments (not shown), the assimilation frequency was reduced to 14 days, in order to let the ensemble spread increase between assimilation dates. This seems a reasonable value according to e.g. Smyth et al. (2020) and Viallon-Galinier et al. (2020), and resulted in an increased spread, but was detrimental to the RMSE. We did not consider increasing the target effective sample size, N*_eff, which is set to 100. This value is much higher than in previous studies (Larue et al., 2018; Cluzet et al., 2021) and was chosen because preliminary experiments (not shown) with values of 25 and 50 gave an even lower SS. Finally, the spread of the stochastic perturbations of the forcings could be increased, or statistically calibrated distributions of the main forcing variables (e.g. Taillardat and Mestre, 2020) could be used.
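The effective sample size of a particle filter is commonly defined as N_eff = 1 / sum(w_i^2) over the normalised particle weights; a minimal sketch (the exact definition of N*_eff used in CrocO may differ):

```python
def effective_sample_size(weights):
    """Effective sample size of a set of particle weights.

    N_eff = 1 / sum(w_i^2) for normalised weights: it equals the ensemble
    size for uniform weights and tends to 1 when a single particle carries
    all the weight (degeneracy).
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    return 1.0 / sum(w * w for w in norm)
```

Targeting a high N*_eff (here 100) forces the filter to keep the posterior weights close to uniform, which limits degeneracy at the cost of weaker corrections.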

Nevertheless, obtaining a perfect spread-skill may be a challenging goal for our assimilation system. Under-dispersion is a common issue in the NWP (e.g. Bellier et al., 2017) and snow cover modelling communities (Nousu et al., 2019). The spatial scale of our ensemble modelling framework cannot account for two important processes affecting the observations at the stations: the variability of the meteorological conditions inside SAFRAN massifs, and the snow redistribution by wind (Mott et al., 2018). On the one hand, the variability of the meteorological conditions inside SAFRAN massifs is limited to topographic parameters (including local masks), so that two distant stations with the same topography receive the exact same forcing (especially precipitation), and the snow redistribution by wind is not represented (Vionnet et al., 2018).
On the other hand, the spatial representativeness of observations is limited by plot-scale variability.
Data assimilation is known to partly compensate for such scale mismatches via error compensation. Error compensations are also possible between physical processes (Klinker and Sardeshmukh, 1992; Rodwell and Palmer, 2007; Wong et al., 2020). For example, an observed ablation event can be compensated by the Particle Filter selecting members with a lower precipitation factor or a compaction scheme with stronger settling (Deschamps-Berger et al., in review). This compensation immediately results in lower errors, but it means the model implicitly makes a wrong assumption and becomes over-confident, hence with a lower spread. The only way to mitigate this over-confidence is to account for every relevant physical phenomenon, which is a desirable goal, but a real challenge when it comes to snowdrift by wind, local meteorology and plot-scale variability.
This goal is to date out of reach at the temporal and spatial scale of this study.
Despite these limitations, the assimilation shows some ability to correct weaknesses of the reference runs. The first one is the significant bias above 1500 m in the reference run (Fig. 7). This bias probably originates from a lack of meteorological observations in the SAFRAN analysis at those altitudes (see Sec. 5.1 and Fig. 4 of Vernay et al., in review). In the range 1500-2000 m, the klocal has a significantly lower bias than the open-loop. The benefit is lower at higher elevations, above 2000 m (Fig. 9), maybe owing to the fact that snow cover variability is higher there, in particular due to stronger winds. There are also fewer observations available, and a less clear bias at these altitudes (there seems to be a transition from a negative to a positive bias), reducing the odds of a successful assimilation. Unfortunately, such elevations are key for avalanche activity (Eckert et al., 2013; Lavigne et al., 2015). Another good feature of the assimilation is to improve the accuracy in areas where the references are less accurate due to a lack of meteorological observations, namely Andorra and Haute-Ariège in the Pyrenees, and Ubaye, Haut Verdon and Mercantour in the southern Alps (Fig. 8). Both features underline the complementarity between HS observations and the meteorological observations already assimilated in SAFRAN.

5.5 Performance in relation to the density of observations
The density of in-situ observations has been pointed out as a critical parameter for the success of data assimilation (Largeron et al., 2020). Winstral et al. (2019) managed to strongly reduce modelling errors with a high observation density (about 1 observation site every 100 km²). They considered that, because of natural variability, the detection of systematic errors may be more difficult with a lower density. Our study case explores a wide range of observation densities (Fig. 10), from about 0.1 to 0.8 observations every 100 km² (accounting for the availability of observations). Yet, as mentioned in Secs. 4.2 and 5.1, the assimilation performance relative to the open-loop does not decrease with a lower observation density. This may be due to the fact that the assimilation is efficient only for strong open-loop negative biases (Fig. 9b), which seem highest where the station density is lowest (Fig. 10b), because where the density is high, the open-loop performance is already high. This behaviour is explained by the fact that the HS observation density is correlated with the density of precipitation observations used by SAFRAN to analyse the meteorological forcings (see Fig. 13 and Sec. 2.2.2). Both (with the exception of the Nivôse and EDF nivo stations for the HS observations) are actually related to human settlement in the valleys and the presence of ski resorts. A higher weather station density for SAFRAN is likely to result in more accurate meteorological forcings, thus reducing the bias of the reference runs. This assumption may guide the design of snow cover observation networks, not only in terms of observation density but also in terms of location. Our study suggests that snowpack observations do not yield significant improvements in areas where a sufficient amount of meteorological observations is already assimilated in the snowpack modelling chain (here, in SAFRAN).
The assimilation of snow depth observations rather gives significant improvements at higher altitudes and in areas where model errors are larger, generally corresponding to areas where fewer meteorological observations are assimilated. This result could be verified in future work, in either semi-distributed or distributed frameworks, validated by e.g. satellite retrievals of the snow cover fraction (Magnusson et al., 2014).

5.6 Towards the assimilation in a semi-distributed geometry?
The aim of this study was to assess the potential of the assimilation of in-situ HS observations to correct nearby simulations, in view of applying it in a semi-distributed or distributed framework (Cluzet et al., 2021), with a strategy similar to that of Magnusson et al. (2014).

The results are mixed: an added value is observed only when initial modelling errors are large (Fig. 9b), similarly to the results obtained by Winstral et al. (2019). In the Northern Alps, the Western Pyrenees and below 1500 m, the added value is null on average, and seems too small to be of real use. Over these areas, it seems that there is no room for improvement with the assimilation of point-scale HS only. There, simulation accuracy may be limited more by snow-related processes, such as wind drift and uncertain physical processes resulting in snow cover variability, than by meteorological errors. The use of spatialised satellite retrievals (Margulis et al., 2019; Cluzet et al., 2020) to better constrain snow cover variability, or a finer correction of meteorological forcings using radar precipitation data (e.g. Birman et al., 2017; Le Bastard et al., 2019) in combination with higher-resolution NWP models and their ensemble counterparts, might be a solution. Previous studies already demonstrated the added value of in-situ HS observations in a similar setting with a dense observation coverage (Magnusson et al., 2014; Winstral et al., 2019). It was suspected that lower observation densities would reduce the potential for assimilation. Here, we exploit data with a wide range of densities, generally lower than in these studies, and find no sensitivity of the assimilation performance to the observation density. This finding may be specific to the error structures of the reference simulations, which are correlated with the observation density.
Results also show that intermediate localisation strategies, with radii of 35-50 km, yielded slightly lower errors than a strategy addressing large-scale errors only (300 km radius), while smaller radii (17 km) may be too small to capture the snow cover variability where the density of observations is too low.

Our results finally show a good complementarity between the HS observations and the meteorological observations already assimilated in the modelling chain, in particular in the most remote areas. This result is encouraging towards reducing the weaknesses of the current operational modelling chain, and shows that even scarce in-situ snowpack observations could be beneficial for snow cover modelling over large areas.
Code availability. The Crocus snowpack model (including all physical options of the ESCROC system) and the Particle Filter algorithm are developed within the open-source SURFEX project. The simulation outputs represent a considerable amount of data (300+ Go); they are archived in the Météo-France archiving system and can be accessed upon reasonable request to Matthieu Lafaysse (matthieu.lafaysse@meteo.fr).
Author contributions. BC, MD and ML conceived the study; BC performed the simulations, treated the observations and wrote the manuscript; ML downloaded the observations; CDB updated the routines to generate the forcings and helped with the interpretation of the CRPS; MV performed the oper runs. All authors contributed to the analysis of the results.

Deschamps-Berger, C., Cluzet, B., Dumont, M., Lafaysse, M., Berthier, E., Fanise, P., and Gascoin, S.: Assimilation of snow depth maps from spaceborne photogrammetry in a detailed snowpack model, in review.
Mott, R., Vionnet, V., and Grünewald, T.: The seasonal snow cover dynamics: review on wind-driven coupling processes, Frontiers in Earth Science, 2018.