Simulating snow maps for Norway: description and statistical evaluation of the seNorge snow model

Daily maps of snow conditions have been produced in Norway with the seNorge snow model since 2004. The seNorge snow model operates with 1 × km resolution, uses gridded observations of daily temperature and precipitation as its input forcing, and simulates, among others, snow water equivalent (SWE), snow depth (SD), and the snow bulk density (ρ). In this paper the set of equations contained in the seNorge model code is described and a thorough spatiotemporal statistical evaluation of the model performance from 1957–2011 is made using the two major sets of extensive in situ snow measurements that exist for Norway. The evaluation results show that the seNorge model generally overestimates both SWE and ρ, and that the overestimation of SWE increases with elevation throughout the snow season. However, theR2-values for model fit are 0.60 for (logtransformed) SWE and 0.45 for ρ, indicating that after removal of the detected systematic model biases (e.g. by recalibrating the model or expressing snow conditions in relative units) the model performs rather well. The seNorge model provides a relatively simple, not very data-demanding, yet nonetheless process-based method to construct snow maps of high spatiotemporal resolution. It is an especially well suited alternative for operational snow mapping in regions with rugged topography and large spatiotemporal variability in snow conditions, as is the case in the mountainous Norway.


Introduction
Seasonal snow cover is an important element, and of great social and economic significance in many countries, including Norway, where approximately 30 % of the annual precipitation falls as snow.Good knowledge of snow conditions is important in hydropower production planning, forecasting of floods and the risk of avalanches, and in water resources management.Snow conditions also influence many aspects of society, such as traffic flow, construction safety, winter sport activities, and play an important role in population dynamics and survival of many animal and plant species (e.g.Stenseth et al., 2004;Borgstrøm and Museth, 2005;Tahkokorpi et al., 2007).Consequently, national snow maps providing updated information on snow conditions are produced in many countries.For example, in Finland (www.fmi.fi,www.environment.fi),Sweden (www.smhi.se),Switzerland (www.slf.ch) and Canada (www.socc.ca)these maps are normally constructed on the basis of (interpolated) in situ snow observations or interpreted satellite images, or a combination of both.In the US (www.nohrsc.noaa.gov),snow maps are compiled by a rather detailed process-based energy balance model and assimilation of observations into the model simulations.
In Norway, daily maps of snow conditions have been produced at the Norwegian Water Resources and Energy Directorate (NVE) with the seNorge snow model since 2004, in cooperation with the Norwegian Meteorological Institute (met.no) and the Norwegian Mapping Authority (Tveito et al., 2002;Engeset et al., 2004a,b).A model approach for snow mapping was chosen, among others, due to the lack of spatiotemporally well-distributed snow observation network and due to difficulties in using satellite-based snow observations in the cloudy and mountainous Norway.The seNorge snow model operates with the same 1 × 1 km resolution as the model used for operational snow mapping in the Published by Copernicus Publications on behalf of the European Geosciences Union.US (see above), but is simpler and requires less forcing data.It simulates different snow-related variables, such as snow water equivalent (SWE), snow depth (SD), bulk snow density (ρ = SWE/SD), and the amount of liquid water in snowpack (W L ), using gridded values of daily mean temperature and daily sum of precipitation as input forcing.Similar input (i.e.only precipitation and temperature) has been previously used in, among others, models of Anderson (1973), Lindström et al. (1997) and Schreider et al. (1997), according to Armstrong and Brun (2008), who reviewed the history of numerical modelling of snow cover.
The model forcing data is produced by the met.no, and is based on spatial interpolation of available temperature and precipitation observations to a grid with 1 km resolution (Tveito et al., 2005;Mohr, 2008Mohr, , 2009)).Consequently, a snow map consists of simulated snow conditions in approximately 324 000 1 × 1 km grid cells covering Norway.
According to the open data policy of NVE, these maps are published at www.seNorge.noand the simulation results are freely available.The time series of daily snow maps go back to 1957, and these maps are used actively in many operational tasks, including weekly reports on snow conditions, hydropower production and power system planning, forecasting of runoff, floods and the risk of avalanches, in addition to the public (e.g. for checking skiing conditions).The historical time series of snow conditions are also applied in scientific studies.
The seNorge evaluation studies by Engeset et al. (2004b), Alfnes (unpublished NVE research note, 2008) and Stranden (2010), using snow data from point measurements, snow courses, snow pillows and catchment model simulations indicated a general overestimation of SWE (especially in the southern Norway) and ρ, except for a few glacier sites where SWE was generally underestimated and ρ still overestimated.In these studies the overestimation of SWE was associated with the overestimation of the input precipitation, likely due to too strong an elevation gradient of precipitation.For the ten studied stations of the hydropower companies, the annual maximum SWE was on average overestimated by +86 %, and SD somewhat less by +50 % due to overestimation of ρ (Stranden, 2010).However, in terms of year-to-year variation, and deviation from a long-term mean value, the evaluation of Engeset et al. (2004b) showed a relatively good coherence between simulated and observed SWE.Dyrrdal (2010) used observations of SD from 11 meteorological stations in three different regions and found a general underestimation of SD on all stations for the study period 1971-2003, except for the station with highest elevation (830 m a.s.l.-above sea level) where SD was overestimated.She associated the model discrepancy mainly to overestimation of ρ by the snow compaction model, since the interpolated precipitation was larger than the observed at most of the selected stations, while the interpolated temperature values showed good agreement with observations.Similarly, the study of Endrizzi (unpublished NVE research note, 2010) indicated a general model underestimation of SD at seven meteorological stations where the model was run with observed temperature and precipitation forcing.Ragulina et al. (2011) analysed high-resolution SD measurements (< 1 m sampling interval) obtained by towing a ground-penetrating radar along approximately 80 km-long transects over the Hardangervidda mountain plateau in southern Norway in April 2008April , 2010April and 2011, approximately at the time of the maximum annual SWE.They compared the observed mean SD averaged over the seNorge grid cells to the simulated values and found a general SD overestimation by the model by roughly +50 %.
The previous seNorge snow model evaluation studies, briefly reviewed above, indicate a general overestimation of SWE and ρ by the model.However, the evaluations in these studies have often been rather qualitative, spatially or temporally restricted to a limited number of stations or years.Since snow conditions vary strongly with region and elevation, as well as with the time and the weather characteristics of the snow season (e.g.cold and dry vs. wet and warm winters), a comprehensive set of data is needed to thoroughly evaluate the seNorge snow model and the approximately 20 000 daily maps of snow conditions covering the approximately 324 000 different grid cells of Norway.
Due to the general overestimation of snow amounts in the seNorge-simulated snow maps, one has often been bound to use relative values of snow conditions instead of absolute values (i.e.expressing SWE as a percentage of a long-term median value instead of mm units) in practical applications, such as flood forecasting and hydropower production planning.The lack of accurate absolute values of snow conditions is a real limitation for many existing and potential new applications of the snow maps.
The main aim of this paper is to increase the usefulness and quality of the seNorge snow model results, and thus the snow maps for Norway, by: -describing the set of equations contained in the seNorge model code; -making a thorough statistical evaluation of the model performance from 1957-2011 using the two major sets of extensive in situ snow measurements that exist for Norway, namely (1) the over 5 million measurements of SD taken at meteorological stations, and (2) the over 40 000 measurements of SD and ρ (and thus SWE) collected by various hydropower companies; -quantifying confidence limits and systematic biases for model simulations, in addition to their variation regionally, temporally and by elevation;  a Currently parameterized as 0.08 b Grid cell specific default values are applied for C Mmax depending on the latitude and forest cover (higher values above treeline) of the grid cell (Engeset et al., 2004b).c The original equation in inch-units is converted to mm-units here (original value of b 1 is multiplied by 25.4).

Description of the seNorge snow model
In this section the equations in the seNorge snow model code (version 1.1) are presented.The model parameters and their default values are summarized in Table 1.
The seNorge snow model takes as input forcing the daily mean air temperature T [ • C] and the daily sum of precipitation P [mm].The model consists of two main modules: (1) the SWE model for snow pack water balance, based on the snow routine in the HBV model (Saelthun, 1996), and (2) the snow compaction and density module (Alfnes, unpublished NVE research note, 2008) used to convert SWE to SD.The SWE model for snow pack water balance is run before the snow compaction and density module.

SWE model for snowpack water balance
The precipitation is categorized as liquid or solid (P L , P S ) depending on whether T is above or below the snowfall threshold temperature parameter T S : If T ≤ T S (conditions for snowfall) If T > T S (conditions for rain) where the parameters f S and f R are correction factors for the input precipitation (as snow and rain, respectively).The degree-day factor C M for potential daily melting rate [mm d −1 • C −1 ] varies in the model seasonally as a sinuswave between C Mmin and C Mmax reaching its maximum at the summer solstice (around 21 June): where N d is the number of the current day in the current year.Depending on whether T is above or below the melting threshold temperature parameter T M , the potential daily melting (positive values) or refreezing (negative values) M * [mm d −1 ] in the snow pack is calculated.The actual daily melting or refreezing M [mm d −1 ] is restricted by the availability of ice and liquid water in the snow pack, respectively: If T ≤ T M (conditions for refreezing) where the superscripts t and t − 1 denote values from the current (today's) and previous (yesterday's) time steps, and where C rf [mm d −1 • C −1 ] is a degree-day factor for refreezing in the snowpack.The ice content in snow (in water equivalents) W I [mm], as well as the potential and actual liquid water contents in snow (W Lpot and W L , [mm]) are updated correspondingly: where r max is the maximum allowed liquid water to ice weight ratio (W L /W I ) in the snowpack, used to restrict the actual liquid water content in snow (W L ).Finally, SWE [mm] is the sum of liquid water and ice (in water equivalents) contained in the snow pack, and the difference between potential and actual liquid water contents in the snow pack is lost to runoff Q [mm d −1 ]:

Snowpack compaction and density model
The snow compaction and density model algorithms in the seNorge model are adopted from the VIC model (Cherkauer and Lettenmaier, 1999) and from the SNTHERM model (Jordan, 1991).This part of the model calculates the changes in SD [mm] in three steps.In the first step, any net decrease in SWE since the previous time step (due to melting; not taking into account the new snow fallen at the current time step) is taken into account in the snow depth SD 0 by reducing SD from the previous time step with the same relative amount: In the second step, the lowering of the snow depth SD 1 due to instant compaction of the old snowpack (i.e.compaction only below the new snow fall) owing to weight of new snow fallen at the current time step (if any) is calculated as in Bras (1990): where b 1 and b exp are empirical coefficients (see Table 1).
After the initial compaction and added new snow, snow depth SD 1 becomes thus: where ρ ns is the density of new snow fallen at the current time step (if any).The value of ρ ns is estimated as a function of air temperature as in Bras (1990): where ρ nsmin is the minimum density of new snow, a ns is an empirical coefficient (see Table 1), and T fahr is the air temperature in Fahrenheit units, i.e.T fahr = T • 9/5 + 32.
In the third step, a gradual compaction SD 2 of the whole snow pack is calculated by making the assumption commonly used in snow models that snow behaves as a viscous medium (e.g.Yen, 1981;Armstrong and Brun, 2008): where the numerator describes the force [N m −2 ] due to the weight of the snowpack over the layer that compaction is calculated for, and the denominator the viscosity of the snow [Ns m −2 ] as a function of snow temperature T snow (estimated by min (T , 0)) and density ρ 1 = SWE/SD 1 .The k c , g and ρ W are the weight scaling constant (0.5), gravitation constant (9.81 m s −2 ) and water density (1000 kg m −3 ), respectively.The η 0 , C 5 and C 6 are empirical coefficients (see Table 1) and t the time step (= 86 400 s).The final snow depth and density (SD t , ρ t ) for the present time step are then calculated as:

Model evaluation
As pointed out in Sect. 1, previous studies have indicated that the seNorge snow model tends in general to overestimate both SWE and ρ.In order to make a statistical and more detailed evaluation of the model performance along different covariates, such as elevation and the date in the snow season (t s ), two extensive sets of in situ snow measurements for Norway are applied.The first of the snow data sets consists of daily point measurements of SD recorded at stations operated by the met.no since 1882 (hereinafter referred to as the "met.no-data").The second data set consists of measurements of SD and ρ (enabling calculation of SWE) recorded at stations operated by various hydropower companies since 1914 (hereinafter referred to as the "HPC-data").Since the seNorge snow map simulations start in 1 September 1957, data before that is not considered in the following sections.
The spatiotemporal distribution and other features and differences of these two snow data sets are shown in Fig. 1 and listed in Table 2.
While the met.no-data has one hundred times more observations and a better spatiotemporal coverage than the HPCdata, ρ has not commonly been observed at the met.nostations.This complicates model evaluation somewhat, since a good model fit in SD can result by compensating model biases, e.g.due to overestimation in both SWE and ρ.The HPC-data, however, includes measurements of both SD and ρ, which is an advantage in seNorge model evaluation.The met.no-data also represents point observations, while the HPC-data is based mostly on averages of several observations taken over a snow course, which leads to smaller uncertainty in observing the grid cell mean snow conditions.Moreover, the potential model uncertainty due to input forcing should be less at the met.no-stations than generally in model grid cells, since data from the same met.no-stationsare often used in the interpolation of T and/or P to the model grid cells.Consequently, the model biases and uncertainties detected at the HPC-stations are probably more representative for the majority of the seNorge model grid cells than those detected at the met.no stations.The melting season is, however, better resolved in the met.no-data.These and other sources of uncertainty are further discussed in Sect. 4. All the evaluation statistics were calculated using the "R" statistical software (www.r-project.org).

Model performance against snow data from met.no-stations
The met.no-data of daily SD measurements were downloaded from the met.no climate data web service (eklima.met.no/wsKlima).Measurements at 10-day intervals were extracted from the original daily time series from 1957-2011 for model evaluation purposes.Since it was difficult to separate between true SD zero-values (i.e.bare ground) and missing observations, all zero or negative values in the data set were replaced by missing values, and thus only positive SD values were considered in the model evaluation.One clearly erroneous value was removed.
The simulated SD values in grid cells corresponding to the met.no-stations were extracted from the seNorge model results, and only those grid cells where the difference between station and grid cell elevation was within ± 100 m were used.After this matching, there were 392 547 observationsimulation pairs of SD at 1105 met.no-stations, which could be used in model evaluation (see Fig. 1 and Table 2).Of these, 107 267 (27 %) pairs had either the simulated or the observed SD ≤ 10 cm.The number of observations varies between 4000 and 10 000 per snow season for 1957-2011 and decreases strongly with elevation, as most of the met.nostations are located in lowland regions (Fig. 1).
The overall model fit with observations was evaluated by calculating the squared correlation coefficient (R 2 -value)    between simulated and observed values.The R 2 was 0.61 in the case of the original SD values and 0.56 in the case of the log 10 -transformed SD values.A spline-based general additive model (GAM) fitted to the cloud of points of observed vs. simulated SD (not shown) revealed, that the seNorge model fit switches from an overestimation for the lowest observed SDs below ca.30 cm to a slight underestimation above that.However, since SD varies strongly with elevation, region and time of the snow season, it is important to analyse the differences between simulated and observed SD ( SD = SD sim − SD obs ) further along these three main covariates.
The different percentiles of the SD distribution along t s are shown in Fig. 2.This figure shows almost no systematic bias in SD, until the main melting season from the middle of March onwards, when the model starts to increasingly overestimate the observed SD.However, it is worth bearing in mind that only non-zero, positive SD values are used, and thus the number of observations varies strongly along t s , having a maximum around February, and being much lower both at the start and end of the snow season, as Fig. 2 shows.Moreover, the median station elevation is negatively correlated with the number of available observations, reaching a minimum elevation around February.
Since the variance of SD increases along t s , an alternative model fit measure was calculated, namely the relative difference (i.e.ratio) between simulated and observed SD ( SD * = SD sim /SD obs ).As Fig. 2 shows, the variance of SD * is more constant along t s , except in the main melting season.Consequently, the relative difference model fit measure is preferred for SD and SWE in the following model evaluations.Assuming that the station-wise log 10 ( SD * ) is normally distributed (visually confirmed), then Mean (log 10 ( SD * )) ≈ Median ( SD * ), and an 80 % confidence factor CF 80% can be applied to describe the station-wise variability around the median value.For example, a CF 80% ( SD * ) = 2 means that 80 % of the SD * values are between 1/2 and 2 times the median value.
Figure 3 shows the spatial distribution (station position and elevation) of the station-wise Mean (log 10 ( SD * )) for two selected dates along t s , for stations with measurements from at least 15 snow seasons.Those stations where the systematic bias is not detected to be statistically significantly larger than −23 to +30 % (corresponding to a factor larger than 1.3 deviation from a "perfect match") were categorized as "good match" stations.In addition, stations at each investigated t s (every 30 days from 30 December to 29 May) were binned by 200 m elevation intervals, and median, 10 and 90 % percentiles of the station-wise means and variability were calculated for the bins containing at least 15 stations.These results show that from January through March the average station-wise Median ( SD * ) in all bins are within −14 to +22 % of the exact match, as the example for 30 March in Fig. 3 illustrates.At the end of April, however, the station-wise Median ( SD * ) values are generally positively biased, increasingly so at higher elevations (median bias of +25 % in the 0-200 m a.s.l.bin, and +108 % in the 400-600 m a.s.l.bin, Fig. 3).Similarly, from January through March the average station-wise CF 80% ( SD * ) is 1.3-2.3(the lower the bin elevation, the higher the value), and at the end of April, somewhat higher (1.8-2.6).The percentage of "good match" stations is 72-83 % before April, but decreases to 42 % at the end of April.
Figure 3 also shows the geographical distribution of the met.no-stationswhere SD is significantly systematically under-or overestimated (shown for 30 March and 29 April).No very distinct clustering patterns appear, except that no significant bias is detected at a majority of the stations in the eastern, more continental half of the southern Norway on 30 March (or before that; not shown), until the general overestimation pattern starts to prevail from April onwards.

Model performance against snow data from the HPC-stations
The HPC-data consists of measurements of SD and ρ (enabling estimation of SWE) taken by various hydropower companies since the 1910s.This data is managed by NVE, where the whole data set has recently been quality controlled by removing or correcting bad or duplicate values and outliers.Only data where both SD and ρ were reported in the period 1957-2011 were used in model evaluation (SWE = ρ • SD).Most of the measurements (60 %) are taken once per snow season around the time of the maximum annual SWE (March-April), but at many stations several (up to 13) measurements per snow season have been taken.A snow measurement from an HPC-station is normally based on a snow course, but may also be based on a sample taken at one or more points at/around the station.
As in the case of the met.no-data (see previous section), zero SWE and SD values were removed from the data set (< 1 % of the data), and the seNorge model-simulated SD and SWE values in grid cells corresponding to the HPCstations were extracted, resulting in 32 256 observationsimulation pairs of SD, SWE and ρ at 1207 HPC-stations (see Fig. 1 and Table 2).The station elevation ranges from 139 to 1700 m a.s.l., with most of the observations located on the central mountain area in the southern Norway, and most of them (70 %) taken between 600-1300 m a.s.l.(Fig. 1).The number of observations varies between 200 and 900 per snow season from 1957-2011 (Fig. 1).Mostly the same methods as described in the previous section are also applied to model evaluation with the HPC-data.
The R 2 -value for SWE and SD (simulated vs. observed) were 0.64 and 0.58, respectively (0.60 and 0.53, respectively, in case of the log 10 -transformed values).For ρ the R 2 -value was 0.45.The fitted GAM-curves (not shown) revealed a general overestimation for SWE and SD (more so for SWE than SD, due to the parallel overestimation in ρ).The general positive bias for ρ, however, decreases towards higher www.the-cryosphere.net/6/1323/2012/The Cryosphere, 6, 1323-1337, 2012  , where the station-wise median ΔSD* is not (green; "good match"), or is detected to be statistically significantly smaller (red; "underestimation") or larger (blue; "overestimation") than the threshold window of -23 to +30 % (corresponding to larger than a factor 1.3 deviation from a "perfect match").A two-sided p-value <0.05 is applied as the significance level.In the upper panel, the solid and dashed lines denote the median and the 10/90 % percentiles of the station-wise means in each 200m elevation bin, respectively.The total number of stations is 539 and 252 in March 30 and April 29, respectively.
Fig. 3. Spatial distribution (with elevation and station position) of the station-wise mean log 10 -transformed ratio ( SD * ) between simulated and observed snow depth at two selected dates at met.no-stations from 1957-2011.The green, red and blue circles indicate stations with at least 15 years of observations, where the station-wise median SD * is not (green; "good match"), or is detected to be statistically significantly smaller (red; "underestimation") or larger (blue; "overestimation") than the threshold window of −23 to +30 % (corresponding to a factor larger than 1.3 deviation from a "perfect match").A two-sided p-value < 0.05 is applied as the significance level.In the upper panel, the solid and dashed lines denote the median and the 10/90 % percentiles of the station-wise means in each 200-m elevation bin, respectively.The total number of stations is 539 and 252 on 30 March and 29 April, respectively.
values of observed ρ, indicating that the current density algorithm overestimates compaction the most at the lower density range.
As in the previous section, Fig. 2 shows the different percentiles of the distribution of SD * and SWE * (= SWE sim /SWE obs ) along t s .The same every-10day time resolution is used, as with the met.no-data.However, since the HPC-data is not sampled daily, date-windows of 10 days around the dates in t s are applied to bin together observations taken at almost the same dates.The number of stations varies strongly along t s , as Fig. 2 shows, peaking around late March and early April, i.e. at the time of the annual maximum SWE.The number of stations is much lower both at the start and end of the snow season, and thus the melting season is not well represented in this data (see Fig. 2).
Due to the lower number of stations, as well as variation in median elevation of the observations (Fig. 2), both SD * and SWE * have a more ragged pattern along t s than the met.no data.The overall variance of SD * and SWE * seem to be rather invariant along t s , except during the main melting season in April, when the 5 to 95 % percentile intervals increase markedly.The median of the difference between simulated and observed ρ ( ρ = ρ sim − ρ obs ) reaches a maximum of +0.12 kg L −1 in early March, being somewhat smaller both at the beginning and end of the snow season (Fig. 2).
Since most of the HPC-data (especially those stations with longer time series) are located in southern Norway, this data is not as well suited as the met.no-data for assessment of regional variations in the station-wise bias in model fit.Therefore, the spatiotemporal analysis (i.e.calculating median, 10 and 90 % percentiles of the data in 200 m elevation-bins at different dates along t s ) is conducted both for all data and for station-wise means.The results are shown in Fig. 4, and reveal a rather similar pattern for SWE * from the end of January to the end of March, namely a general positive bias www.the-cryosphere.net/6/1323/2012/The Cryosphere, 6, 1323-1337, 2012 increasing with elevation between roughly 500-1000 m a.s.l.
For example, at 30 March the median bias is +34 and +100 % at the bins centered at 500 and 1100 m a.s.l., respectively.Due to overestimation in ρ (median ρ ∼ +0.1 kg L −1 ), the corresponding bias for SD * is somewhat less (−5 and +54 %, respectively).On 29 April, the significant bias in SWE * still remains above ∼ 500 m a.s.l.(+50 to +100 %) (see definition in previous section) for SWE was 57 and 28 % in 28 February and 30 March, respectively.The large-scale regional differences in SWE * were roughly investigated by removing all data in southern Norway (south of Trondheim; see Fig. 1).Although the data coverage in northern Norway is rather limited (5 % of all data), the results (not shown) did not indicate any less bias in SWE than seen in the whole dataset.A plot similar to Fig. 4 showed a median bias of +76 and +57 % at 30 March in elevation bins centered at 500 and 700 m a.s.l., respectively.

Discussion
There are several possible sources of uncertainty which can explain the differences seen between the simulated and observed snow conditions in the two previous sections.Firstly, there is the uncertainty connected to the seNorge model structure (process representation, subgrid variability), model parameters, and model input forcing (i.e.observations of P and T and their interpolation to grid cells).Secondly, there is the uncertainty connected to the rather substantial natural spatial variability in snow conditions within the model 1 × 1 km grid-cells (see e.g.Clark et al., 2011), and to how well the point-and snow course-based measurements can capture this variability.Finally, there is also the uncertainty connected to the snow observations, although this uncertainty is often relatively small when compared to the spatial variability, for example (except for density, which can sometimes be demanding to measure).
As summarized in Table 2, the two data sets differ in many ways from each other.The HPC-data seems to give a more representative picture of the seNorge model fit in Norway in general than the met.no-data, since the interpolation uncertainty in the forcing data at the met.no-stations is likely smaller than in most of the other model grid cells (see Sect. 3).Moreover, since the met.no-data consist of point observations, they have larger uncertainty in representing the grid cell mean snow conditions than the HPC-data, which is based mostly on averages over a snow course.On the other hand, due to less uncertainty in the forcing data, the met.nodata could provide a better basis for isolating the uncertainty connected to the model structure and parameters, had it not been for the fact that the lack of measurements of ρ makes the model evaluation more difficult, since a good fit for SD can arise from compensating biases in simulating SWE and ρ.
The results presented in the previous sections confirm the general and sometimes rather substantial overestimation of SWE and ρ, which has been pointed out by previous evaluation studies (Sect.1).However, this study has contributed www.the-cryosphere.net/6/1323/2012/The Cryosphere, 6, 1323-1337, 2012 to refining and quantifying the knowledge on the seNorge model performance by revealing the dependencies of the model fit by region, by the date of the snow season, and especially by elevation.Although 60 % of the area of Norway is located below 600 m a.s.l., the higher elevation regions are of special interest (e.g.hydropower production).Since the positive bias for SWE increases with elevation, it could be, as indicated by previous studies, that the elevation-dependency in the interpolation of the forcing data is too strong for P and/or T , leading to increasingly excessive snow production along increased grid cell elevation.This explanation seems plausible, since most of the meteorological stations used in the interpolation are situated in the lowland areas, so that the interpolated values at the higher elevation sites become more uncertain.Other elevation-dependent processes (or lack thereof) responsible for the increasing bias along elevation could be, for example, the missing simulation of wind effects in the seNorge model, such as sublimation in blowing snow.
Wind certainly also redistributes snow within a grid cell, but the transport of snow between the grid cells is probably negligible.Armstrong and Brun (2008) reviewed some blowing snow model studies for Arctic regions, which showed that loss by sublimation could be 9 to 47 % of annual snowfall, depending on the topography and vegetation type.Thus, although sublimation could account for a fraction of the overestimation in SWE it seemingly cannot alone explain losses of the order of what is shown in Fig. 4. The evaluation results with the HPC-data indicate that ρ is also overestimated in the lower elevations, thus indicating that the rather good model fit seen for the SD in the met.nodata might be coupled with an overestimation of SWE.However, it is difficult to confirm this due to low number of lowland HPC-data, and since measurements of ρ are generally lacking below 200 m a.s.l., where the seasonal evolution of the ρ and SWE may be characterized by many accumulation and melting periods, and thus be different from the higher, more alpine and stable snow climate.The overestimation of ρ does not seem to be caused by the overestimation of SWE though (i.e.due to more overburden weight in the snow pack and thus more compaction), as the lacking coherence between these variables in Fig. 4 indicates.Thus, adjustments to the snow compaction algorithm may be necessary in future model versions in order to correct these biases.In fact, preliminary model testing indicates that the second step in the compaction algorithm ( SD 1 in Eq. 14a) might be unnecessary, causing some of the model density overestimation.
For comparison, we also tested whether an empirical statistical snow density model, e.g. that of Sturm et al. (2010), would perform better than the more process-based density algorithm in the seNorge model.The model by Sturm et al. (2010), which estimates ρ as a function of snow climate type (alpine assumed in our case), SD and the day of the year, showed less bias than the seNorge density model (+0.04 vs. +0.1 kg L −1 ), but the variability in ρ was still better simulated in the seNorge model (R 2 = 0.45) than in the model by Sturm et al. (2010) (R 2 = 0.32).
In the melting season (as illustrated by the difference from 30 March to 29 April in Figs. 3 and 4) the spread in SWE * increases in the HPC-data, while the bias approximately remains the same.For the met.no-data, the bias in SD * increases during the melt season, and on 29 April the positive bias increases also with elevation, which is not seen on 30 March (Fig. 3).This indicates that the positive bias here is due to model underestimation of the melting rates, since the bias in bulk density seems to be decreasing along the melting season (Fig. 4).There is also an indication for increased bias during the melting season in the HPC-data at the lower elevations below ca.1000 m a.s.l.However, it is worth bearing in mind that the melting season is not as well represented in the HPC-data as in the met.no-data, and that the number of stations decreases strongly from the end of March to the end of April in the HPC-data.
Despite the rather substantial biases, the seNorge model describes rather well the variability in snow conditions, as the R 2 values indicate (53-60 % of the observed variance in log-transformed SWE and SD are explained by the model).One way of utilizing this strength of the model is to use the snow simulations in a relative sense, e.g. by calculating the ratio between current SWE in a grid cell and a corresponding long-term median SWE.In these relative units most of the systematic bias is removed.For example on 30 March, in the 1000-1200 m a.s.l.bin, the 80 % confidence range for SWE * is +26 to +203 % (Fig. 4), while when using stationwise relative units (stations with at least 15 years of observations included) the bias is removed and 80 % confidence range is now −28 to +38 % (similarly, −25 to +45 % in the lower 400-600 m a.s.l.bin).
As already pointed out above, the observational uncertainty is largest for the point observations, but neither a mean over a snow course can give a "true" mean of the snow conditions within, for example, a 1 × 1 km grid cell.Thus, even with a hypothetically perfect model, no perfect match with observations can be expected.Considerable spatial subgrid variability in SWE and SD is caused, among others, by local precipitation patterns and by wind redistribution of snow, which again is dependent on the type of vegetation and topography (e.g.Armstrong and Brun, 2008;Clark et al., 2011).As an example of the observational uncertainty, the high-resolution SD measurements across the Hardangervidda mountain plateau (Ragulina et al., 2011;see Sect. 1) showed that a single point measurement of SD has typically an 80 % confidence range of −60 to +70 % in estimating the mean SD over 1 km subsections of the approximately 80 km long transects.By replacing the single point measurement by a mean over a snow course, (by taking a mean SD of 30 samples) the typical 80 % confidence range is reduced to ± 10 %.This example illustrates the generally better accuracy in the HPC-data, which is mostly based on snow courses, than in the point measurements of SD taken at The Cryosphere, 6, 1323-1337, 2012 www.the-cryosphere.net/6/1323/2012/ the met.no-stations.The confidence ranges in the above example apply for a treeless and hilly terrain over 1000 m a.s.l., and likely represent an upper range of variability in Norway, though.Subgrid variability in SD is likely somewhat smaller at the lower-lying met.no-stations,where the wind redistribution effect may be less pronounced and where the location of the snow stakes is fixed and can (in principle) be selected to be as representative as possible for the surrounding terrain.
The high-resolution SD data from the transects across Hardangervidda (Ragulina et al., 2011) were also compared to the corresponding seNorge model-simulated values.The SD * values from this additional comparison were similar to those calculated for the HPC-data in Sect.3.2.This high-resolution SD data can also be used to illustrate the effect of bias correction: if the seNorge simulation results were corrected for the +54 % bias indicated by the HPCdata (on 30 March in the 1000-1200 m a.s.l.elevation bin), then the observed mean SD for the Hardangervidda transects from 2008, 2010 and 2011 were over-/underestimated by the seNorge model only by +5, −13 and −10 %, respectively.This also indicates that the averaging over several model grid cells reduces the variability in the model fit, as compared to the grid cell-wise evaluation results presented in this paper.
It is clear that the seNorge snow model needs at least a recalibration, or maybe also revision of some of the algorithms, in order to increase the model performance and to properly remove the biases revealed in this study.In addition to the already indicated revisions of the input forcing data and of the compaction algorithm, also better representation of the effects of vegetation (especially forest canopy) on snow accumulation, sublimation and melting, as well as the use of input data with higher temporal resolution in the model should be tested and evaluated.
A suitable model calibration method should be able to utilize the strengths of the two datasets (Table 2).Since it is possible that several different parameter combinations can give equally good model fit (the so-called equifinality problem, see e.g.Beven, 2006), the calibration method should be able to search for all plausible combinations of the model parameters that can give statistically optimal model fit with the observations, taking into account also the uncertainty connected to the observations.One method that has shown to be able to fullfill these calibration requirements is the Markov chain Monte Carlo simulation (e.g.Gamerman, 1999;Saloranta et al., 2009).
The data presented in this paper can serve as a benchmark data set, against which new seNorge model calibrations and developments, or even new alternative model codes, can be tested and evaluated.In the future evaluation work, also other snow data sources, such as satellite images of snow-covered areas, high-resolution spatial SD distribution data from airborne laser-scanning, and high-resolution timeseries of SWE and SD from snow pillows should be used.Moreover, the results presented in this paper can be used in intercomparison of the accuracies in different snow map production methods.The seNorge snow mapping method is seemingly only applied in Norway at present, but it could relatively easily be applied and tested in other countries too, especially where rugged topography and lack of observations hampers the regular use of interpolation and satellitebased snow mapping methods.The results in this paper can also provide an indirect evaluation of the gridded T and P data used for model forcing, especially at the HPC-stations which are often located at higher elevations and far outside the met.no station network.A better coverage of stations measuring T and P , especially at the higher elevation sites, would likely contribute to more accurate input forcing data that would likely also increase the accuracy and precision of the simulated snow maps further.
While waiting for the revised/recalibrated version of the seNorge model (a natural next step from this study), the bias-corrected or relative seNorge simulation results are recommended to be used, in order to increase the quality of the information on snow conditions from the snow maps of Norway.

Conclusions
Owing to the large spatiotemporal variability of snow conditions in the mountainous Norway, the snow maps simulated by the seNorge model provide probably the best overview of the current and past snow conditions for Norway in general.This snow map production method is relatively simple, not very data-demanding, yet still process-based and able to provide snow maps of high spatiotemporal resolution (daily, 1 × 1 km grid cell).
The statistical evaluation of the seNorge snow model against the two large and mostly non-overlapping datasets from the HPC-and met.no-stations has revealed and quantified the model performance, i.e. the variability and biases between the model simulations and the observations in the corresponding model grid cells.The HPC-data shows that the seNorge model generally overestimates SWE, and that the distribution of model fit for SWE shows a clear dependency on elevation throughout the season.Especially between roughly 500-1000 m a.s.l. the model bias clearly increases (or becomes less negative) with elevation.Thus, for example, around 30 March, the model overestimates SWE on average by +34 % in the 400-600 m a.s.l.bin, while by 100 % in the higher elevation 1000-1200 m a.s.l.bin.There is no clear seasonal variation in the model fit with the HPCdata.Moreover, the seNorge model overestimates ρ on average by approximately +0.1 kg L −1 , and there is only a moderate variation in the model fit along the snow season and elevation for ρ.
The the uncertainty in model results can be significantly reduced by using relative units where current SWE is expressed as a percentage or a fraction of a long-term median value.In these relative units the bias in the simulation results is mostly removed and in 80 % of the cases the model results match the observed SWE within a factor of roughly 1.5 in both directions.By taking areal averages over several grid cells, these confidence intervals are further reduced.
Based on the model evaluation results, there are three recommendations for further model development with which the quality of the seNorge model results (and thus of the snow maps for Norway) could be increased, namely: (1) revision and/or calibration of the density algorithms (Eqs.13-15), in order to diminish the positive model bias in ρ, (2) calibration of the degree-day factors C M , in order diminish the positive model bias in SD (and SWE) in the melting season, as seen especially in evaluation against the met.no-data, and (3) closer investigation of which elevation dependencies (or lack of them) in the model or forcing data could best explain and diminish the increasing positive model bias in SWE with elevation.

Fig. 1 .
Fig. 1.Spatiotemporal distribution of the snow measurements in the (a) met.no-data (392 547 samples in total, varying from 10 to 1170 per station) and (b) HPC-data (32 256 samples in total, varying from 1 to 239 per station) from 1957-2011.The colours in the map denote the number of observations per station, as indicated in the legend.The contours in (b) show the number of samples per 10 × 10 km grid cells (dashed contours denote 15 samples, solid contours 50, 100 and 150 samples), and the city of Trondheim marks the division line between southern and northern Norway.

Figure 3 .
Fig. 2. (a) Seasonal changes in the number of observations, in their median and 25/75 % percentile elevation, as well as in the observed median snow depth (SD), snow water equivalent (SWE) and bulk snow density (ρ) in the met.no-andHPC-data from 1957-2011.The dashed line denotes the "low SD" subset of the met.no-datawhere the simulated and/or observed snow depth (SD) is < 10 cm.(b) Seasonal changes in the median and 5, 25, 75 and 95 % percentiles of the distribution of the difference ( SD) and log 10 -transformed ratio ( SD * ) between the simulated and observed SD from 1957-2011 in the met.no-data.(c) Same as (b), but now for the HPC-data, for log 10 -transformed ratio ( SWE * , SD * ) between the simulated and observed SWE and SD, as well as for the difference ( ρ) between the simulated and observed ρ.Only values based on at least 250 measurements (50 in HPC-data) per date are shown.Note that some of the 5 % percentile values in (b) are zero, and thus cannot be properly plotted on log-scale.The total number of observations in the figure is (b) ∼ 392 000 and (c) ∼ 32 000.The Cryosphere, 6, 1323-1337, 2012 www.the-cryosphere.net/6/1323/2012/
R 2 -values of 0.60 and 0.45 for log-transformed SWE and ρ, respectively, indicate that the model performs rather well in simulating the observed variability in SWE and ρ, despite of the overestimation of absolute values.Consequently,

Table 1 .
Engeset et al., 2004b)del parameters with their default values and units.The values have been set on the basis of literature, expert judgement as well as model evaluation against observations (seeEngeset et al., 2004b).

Table 2 .
Features of the two snow data sets used in the seNorge model evaluation (data since 1 September 1957 is considered).SD, SWE and ρ denote snow depth, snow water equivalent and bulk density, respectively, and "m a.s.l." elevation (meters above sea level).