A statistical approach to represent small-scale variability of permafrost temperatures due to snow cover


3. Section 3.2: Due to the great importance of nF/nT for your results, it would be nice to include a short section critically appraising the various pros/cons of such a statistical approach in the context of permafrost modelling. A very first thought is: how spatially and temporally consistent are these relationships likely to be? Where were they developed? Over what period of time? You of course mention the variability of snow depth as being a large driver of the variability you see in nF/nT (motivation for this paper), but what else is significant?
We have included the following section at page 6668, line 26: The relationships between n-factors and snow cover in open areas are shown to be consistent within the two sites in southern Norway (Gisnås et al. 2013 and Gisnås et al. 2014). Due to a lack of field observations including all required variables at one site in northern Norway, the relation is not tested for this area. However, it fits very well with a detailed study with 107 loggers recording the variation in ground surface temperature at a lowland site in Svalbard (Gisnås et al. 2014). Other factors, such as solar radiation and soil moisture, have minor effects on the small-scale variation in ground surface temperatures in these areas. Gisnås et al. (2014) demonstrated that most of the sub-grid variation in ground temperatures within 1x1 km areas in Norway and Svalbard was reproduced by including only the sub-grid variation of snow depths. In other areas, parameters other than snow depth might have a larger effect on the ground surface temperatures, and should be accounted for in the derivation of n-factors.
4. Section 3.2: Following on from the point above, you state that the relationship between n-factors and snow depth is based on 13 stations in S. Norway and 80 loggers in Finse and Juvvasshoe. This seems to be quite geographically limited. Can you briefly state if/how you might expect these relationships to vary in space, i.e. what might they look like in Lyngen or Finnmark?
Compared to the total model domain, we agree that these observations are limited. However, compared to the number of available datasets that include systematic measurements of ground surface and air temperatures together with snow depths at the same point location, these datasets are quite unique on a global basis. The relationships for n-factors in vegetated areas will vary between different species, and this is not discussed here; because permafrost is not present in vegetated areas in Norway, we have not focused on the variation within these surface classes. The variation in the relation between n-factors and snow depth is not examined in northern Norway because we lack detailed field observations in this area. However, the dataset from Ny-Ålesund, which includes 107 loggers in a 1x1 km area, shows very similar dependencies to the data from southern Norway, even though this is a lowland site (20-40 m a.s.l.) with higher soil moisture and finer sediments.
We have included some comments on this in the section at page 6668, line 26, described in the previous point: The relationships between n-factors and snow cover in open areas are shown to be consistent within the two sites in southern Norway (Gisnås et al. 2013 and Gisnås et al. 2014). Due to a lack of field observations including all required variables at one site in northern Norway, the relation is not tested for this area. However, it fits very well with a detailed study with 107 loggers recording the variation in ground surface temperature at a lowland site in Svalbard (Gisnås et al. 2014). Other factors, such as solar radiation and soil moisture, have minor effects on the small-scale variation in ground surface temperatures in these areas. Gisnås et al. (2014) demonstrated that most of the sub-grid variation in ground temperatures within 1x1 km areas in Norway and Svalbard was reproduced by including only the sub-grid variation of snow depths. In other areas, parameters other than snow depth might have a larger effect on the ground surface temperatures, and should be accounted for in the derivation of n-factors.
5. P.6678, l.6-10: You mention the question of equilibrium with surface forcing on climatic scales, but how about seasonal lags, i.e. it's quite typical to see max. temperatures at 10 m or so at around the beginning of winter, when summer forcing has been conducted to depth. Therefore, to compare model and obs (even assuming you describe conductivities perfectly) you need to drive your model with at least the previous 6 months of atmosphere to get the warming/cooling signal of that time slice. This could have an impact on your model performance, especially if there is an extreme season (dry, warm etc.) missed in the simulation. Maybe I miss something here, but that brings me to the following point....
The reviewer makes a valid point.
However, since we used field data distributed over larger areas and over longer time periods including all kinds of situations, the effect would mainly show as a larger statistical spread, and not as a systematic error. Using data from six months before is not a good solution either, since the lag will vary quite a bit depending on the ground thermal properties of each individual site. This is already partly commented on in the current manuscript, p. 6678, lines 15-20: "For the model evaluation with measured ground temperatures in boreholes (Sect. 5.4), the modelled temperatures are forced with data for the hydrological year corresponding to the observations. Because of the assumption of an equilibrium situation in the model approach, such a comparison can be problematic as many of the boreholes have undergone warming during the past decades. However, with the majority of the boreholes located in bedrock or coarse moraine material with relatively high conductivity, the lag in the climate signal is relatively small at the depth of the top of permafrost." We include the following sentence after this section (page 6678, line 21) to make this point clearer: The lag will also vary from borehole to borehole, depending on the ground thermal properties. Since we use data distributed over larger areas and longer time periods, including a large range of situations, the effect mainly shows in terms of a larger statistical spread and not a systematic error.
2) This is partly answered in point 2 above. We have included the following for clarification: Page 6667, lines 15-18: MAGT is defined as "Mean Annual Ground Temperature at the top of the permafrost or at the bottom of the seasonal freezing layer". Page 6670, l-8: For the evaluation runs the model is forced with climatic data for the hydrological year corresponding to the observations.
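
The forcing described above, thawing and freezing degree days accumulated over one hydrological year, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the 1 September to 31 August definition of the hydrological year, and the synthetic temperature series are assumptions made for the example.

```python
import math
from datetime import date, timedelta

def degree_days(daily_mean_temps):
    """Return (TDD, FDD) in degC day; FDD is returned as a positive number."""
    tdd = sum(t for t in daily_mean_temps if t > 0.0)
    fdd = sum(-t for t in daily_mean_temps if t < 0.0)
    return tdd, fdd

def hydrological_year_days(start_year):
    """Dates from 1 Sep of start_year to 31 Aug of the next year (assumed definition)."""
    day = date(start_year, 9, 1)
    end = date(start_year + 1, 8, 31)
    days = []
    while day <= end:
        days.append(day)
        day += timedelta(days=1)
    return days

# Synthetic daily air temperatures: annual mean -2 degC, amplitude 10 degC (illustrative only).
days = hydrological_year_days(2009)
temps = [-2.0 + 10.0 * math.cos(2 * math.pi * (d.timetuple().tm_yday - 200) / 365.0)
         for d in days]

tdd_air, fdd_air = degree_days(temps)
# The mean annual air temperature follows directly from the two sums.
mean_air = (tdd_air - fdd_air) / len(days)
```

For an evaluation run, the same two sums would be computed for the hydrological year in which the borehole or logger observation was made.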

In general you use a large amount of data and have a reasonably complex model setup with multiple simulations and evaluations against various datasets. At times I felt a little lost on what was being computed, when and how. I think the paper would benefit tremendously from 3 additions: (1) a schematic of the model chain to give a very quick overview of the setup (forcing, permafrost model, wind model, subgrid distribution routine, calibrations and evaluations). (2) A table giving all data used together with details such as time period, depths of boreholes etc. (3) A table describing all your simulations with important information such as simulation period(s) - which I am really
We have also included a full table of boreholes and ground surface temperature loggers in the supplementary material, giving the location, depth of boreholes, measurement periods, and vegetation type. We refer to this in the text as follows: Page 6670, line 21: Tables of ground surface temperature loggers (Table S1) and boreholes used for validation (Table S2) are included in the supplementary material.
3) The model is forced with annual thawing and freezing degree days calculated over hydrological years. The main permafrost distribution results are given as an average over the 30-year period 1981-2010. For validation with ground surface temperature loggers and borehole temperatures, the degree days forcing the model are calculated over the same hydrological year as the observation. This will therefore vary, but is now defined in the supplementary material (S1 and S2). There are no other periods used. The date range 1961-2013 covers only the years with available climate forcing. We understand from the comments that this was confusing, but we think that some clarification in the text is better than another table. Instead we have included the tables of ground surface loggers and boreholes in the supplementary material (see point above), and made the following changes in the text:
Yes, a detailed description was published in Saloranta (2012), and it is also partly described in Engeset et al. (2004). The following references are moved down from the previous sentence for clarification: (Engeset et al. 2004; Saloranta, 2012)
8. I think it is important to mention in the discussion that, due to the statistical nature inherent in the core methods, there may be difficulties in inferring conclusions about future development of permafrost. That's not to say this contribution isn't valuable - just to include some discussion of possible limitations.
We have already discussed this on page 6678, l. 6-22: "CryoGRID1 is a simple modelling scheme delivering a mean annual ground temperature at the top of the permanently frozen ground based on near-surface meteorological variables, under the assumption that the ground thermal regime is in equilibrium with the applied surface forcing. This is a simplification, and the model cannot reproduce the transient evolution of ground temperatures. However, it has proven to capture the regional patterns of permafrost reasonably well (Gisnås et al., 2013; Westermann et al., 2013). Because of the simplicity it is computationally efficient, and suitable for doing test-studies like the one presented in this paper and in similar studies (Westermann et al., 2015).
For the model evaluation with measured ground temperatures in boreholes (Sect. 5.4), the modelled temperatures are forced with data for the hydrological year corresponding to the observations. Because of the assumption of an equilibrium situation in the model approach, such a comparison can be problematic as many of the boreholes have undergone warming during the past decades. However, with the majority of the boreholes located in bedrock or coarse moraine material with relatively high conductivity, the lag in the climate signal is relatively small at the depth of the top of permafrost." To state this more explicitly, we have now added the following on p. 6678, l. 10: ", and is therefore not suitable for future climate predictions."
9. Topography isn't mentioned anywhere in the methods - can air temperature and exposure to solar radiation be important predictors for subgrid variability of permafrost within 1 km grids? Particularly in the south? Both variables are reasonably easy to distribute based on terrain parameters. Is there a reason not to do this? If so, can you provide some references justifying the omission? I did find this reference (also cited by you in another context) which discusses some of these points (and possibly in the end favours ignoring topography) - but I think this deserves a short discussion:
Topography is absolutely discussed as the main driver for the snow distribution. But, correctly, this paper only accounts for the variation in snow depths as the driver for the variation of ground temperatures within 1x1 km. The relation between snow cover and surface offset in this study shows that more than 60 % of the variation in nF and almost 50 % of the variation in nT is explained by snow depths. The same logger sites were also analyzed with respect to aspect, slope, solar radiation, vegetation and sediment type.
With four years of data now available, we find that maximum snow depth is the main explaining variable for the spatial variation in both nF and nT at all three field sites. The timing of melt out, or length of summer season, has a significantly higher correlation to maximum snow depth than to solar radiation. Gisnås et al. (2014) show that the observed small-scale distribution in MAGST could to a large degree be explained by including only the sub-grid variation in maximum snow depths. It was concluded in Gisnås et al. (2014) that maximum snow depth is the main explaining variable for the spatial variation of ground surface temperatures within 1 x 1 km areas at the three field sites in southern Norway and Svalbard. Based on the study by Gisnås et al. (2014), this paper aims to implement sub-grid snow distribution over larger areas. This is a fundamental point for this study, and as we realize that this was not entirely clear, we include the following sentence in the introduction at page 6663, line 14: Gisnås et al. (2014) show that the observed variability in ground surface temperatures within 1 x 1 km areas is to a large degree reproduced by only accounting for the variation in maximum snow depths.
We also found that the reference (Gisnås et al. 2013) in the previous sentence is wrong, and it is now corrected to (Gisnås et al. 2014).
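
To make the role of the n-factors concrete, the sketch below evaluates an equilibrium temperature at the top of permafrost with the standard TTOP formulation (Smith and Riseborough), which is commonly used in equilibrium permafrost models. It is a hedged illustration only: this response does not give the equation, so the formula, the function name `ttop`, and all numerical values (degree days, n-factors, conductivity ratio) are assumptions and not values from the study.

```python
def ttop(tdd_air, fdd_air, n_t, n_f, r_k, period_days=365.0):
    """Equilibrium mean annual temperature at the top of permafrost (degC).

    tdd_air, fdd_air: thawing/freezing degree days of air (degC day, both positive)
    n_t, n_f: thawing and freezing n-factors (surface/air degree-day ratios)
    r_k: ratio of thawed to frozen thermal conductivity of the ground
    """
    numerator = r_k * n_t * tdd_air - n_f * fdd_air
    if numerator <= 0.0:
        # Permafrost case: temperature at the top of the permafrost.
        return numerator / period_days
    # No permafrost: temperature at the bottom of the seasonal freezing layer.
    return (n_t * tdd_air - n_f * fdd_air / r_k) / period_days

# Hypothetical forcing for one grid cell, with two made-up snow situations.
tdd, fdd = 700.0, 1800.0
thin_snow = ttop(tdd, fdd, n_t=1.0, n_f=0.9, r_k=0.9)  # wind-exposed, little insulation
deep_snow = ttop(tdd, fdd, n_t=0.8, n_f=0.2, r_k=0.9)  # snow-rich, strong insulation
```

With identical air forcing, the lower freezing n-factor under deep snow moves this made-up cell from permafrost to seasonal frost, which is the kind of sub-grid contrast the response attributes to snow depth.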

P.6666, l.25: "ALS" is mentioned for the first time without explanation of acronym.
"the ALS" is changed into "an Airborne Laser Scanning (ALS) of snow depths (see Sect. 4.1)"
2. P.6669, l.21: accent on "a" is not needed in English.
"a" is deleted.
3. P.6669, l.21: ">4000 grid cells in 70% of the areas" -I didn't understand this sentence, can you make it more clear what you mean? Why do the coarse grids of fixed area (0.5x1km) have varying numbers of 10x10m subgrids?
The sentences have been changed into: Each 0.5 km x 1 km area includes 500 to 5000 grid cells a 10m x 10m, depending on the area masked out due to lakes or measurement errors. There were > 4000 grid cells in 70% of the areas.

P.6670, l.21: I think "Figure 2" is the wrong reference here.
That's correct. It should be Figure 1, and is now changed.
5. P.6670, l.25: Can you specify "10 m above surface" for the wind variables you use - I think that is what's meant.
Airborne Laser Scanning (ALS) is changed into ALS.
7. P.6671, l.9: ALS data instead of ALS scan? As 'S' already stands for 'scanning'.
This is true. However, we find that "survey" is more precise than "data" in this sentence. "Scan" is therefore changed into "survey".
The wind speeds are from a dataset dynamically downscaled from ERA-40 (see page 6670-6671). The bias-correction is simple: all wind speeds (regardless of altitude) are increased by 60 % (p. 6671, line 5), a value derived from validation with weather stations in mountainous areas. We are aware that this is a rough approximation, and because of the poor quality, the wind speed data is only used to select the wind events accounted for when calculating the fraction of wind directions. For clarification we made the following change: Page 6671, l. 5-6: For these areas the forcing dataset has been linearly increased by 60 %.
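
The bias correction and event selection described above can be sketched as below. This is an assumed, simplified illustration: the 60 % increase comes from the response, but the threshold value, the eight direction sectors, the sample data, and the function names are hypothetical and not taken from the manuscript.

```python
from collections import Counter

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def sector(direction_deg):
    """Map a wind direction in degrees to one of eight 45-degree sectors."""
    return SECTORS[int(((direction_deg % 360.0) + 22.5) // 45.0) % 8]

def wind_direction_fractions(speeds, directions, threshold=7.0, scale=1.6):
    """Fraction of selected wind events per sector.

    Speeds are multiplied by `scale` (a 60 % increase) and only events whose
    corrected speed exceeds `threshold` (m/s, hypothetical) are counted.
    """
    selected = [sector(d) for s, d in zip(speeds, directions) if s * scale > threshold]
    if not selected:
        return {}
    counts = Counter(selected)
    return {name: counts[name] / len(selected) for name in SECTORS if counts[name]}

speeds = [3.0, 5.0, 6.0, 2.0, 8.0]               # m/s, raw forcing (made up)
directions = [270.0, 280.0, 10.0, 90.0, 350.0]   # degrees (made up)
fractions = wind_direction_fractions(speeds, directions)
```

Because the speeds only decide which events are kept, an error in the scaling factor changes the resulting direction fractions only through events that sit close to the threshold.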

P.6671, 7-10: What is the resolution of the raw ALS data?
The survey was done with nominal 1.5 m x 1.5 m ground point spacing. The following is included in line 10, p. 6671: The ALS survey is made along six transects, each covering a 0.5 km x 80 km area, with nominal 1.5 m x 1.5 m ground point spacing.
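
As a quick consistency check on these numbers, the arithmetic below relates the nominal point spacing to the 10 m x 10 m grid cells and to the cell counts per 0.5 km x 1 km area quoted earlier in this response. The variable names are illustrative, and the result assumes evenly spaced points, which real ALS returns are not.

```python
point_spacing_m = 1.5        # nominal ALS ground point spacing
cell_size_m = 10.0           # snow depth grid cell
area_m2 = 500.0 * 1000.0     # one 0.5 km x 1 km area

# Nominal number of ALS points falling in one grid cell.
points_per_cell = (cell_size_m / point_spacing_m) ** 2
# Number of grid cells in a fully unmasked area.
cells_per_area = int(area_m2 / cell_size_m ** 2)
# Share of an area that is unmasked when it holds 4000 cells.
unmasked_fraction_at_4000 = 4000 / cells_per_area
```

A fully unmasked area therefore holds 5000 cells, roughly 44 nominal points fall in each cell, and the "> 4000 grid cells" condition corresponds to more than 80 % of an area being unmasked.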
12. P.6671, l.22: These elevations seem very similar to me, 1300/1450m -is it really significant as a difference between sites?

P.6672, l.15: How was this interpolation done?
We have also revised the following sentence for clarification: The dataset, available for the period 1961-2013, is based on air temperature and precipitation data collected at the official meteorological stations in Norway, interpolated to 1 km x 1 km resolution applying Optimal Interpolation, following the methods of Frei (2014).

P.6679, l.8 What was the conclusion of Luce and Tarboton?
They conclude in the paper that "Dimensionless depletion curves depend primarily on the CV and to a lesser extent on the shape of the snow distribution function, and are a generalization of previously presented methods for depletion curve estimation." We refer to this by saying: "This result contradicts the conclusions by Luce and Tarboton (2004), suggesting that the parameterization of the distribution function is more important than the choice of distribution model." For clarification we change "suggesting" into "which suggest".
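
The statement quoted from Luce and Tarboton (2004) can be explored numerically. The sketch below is an assumption-laden illustration and not their method: it samples snow depths from a gamma and a lognormal distribution with the same mean and CV, applies a spatially uniform melt, and reports the remaining snow-covered fraction. Function names, sample size, and parameter values are hypothetical.

```python
import math
import random

def sample_depths(model, mean, cv, n, rng):
    """Draw n snow depths from a gamma or lognormal model with given mean and CV."""
    if model == "gamma":
        shape = 1.0 / cv ** 2
        return [rng.gammavariate(shape, mean / shape) for _ in range(n)]
    if model == "lognormal":
        sigma2 = math.log(1.0 + cv ** 2)
        mu = math.log(mean) - 0.5 * sigma2
        return [rng.lognormvariate(mu, math.sqrt(sigma2)) for _ in range(n)]
    raise ValueError(f"unknown model: {model}")

def depletion_curve(depths, melt_steps):
    """Snow-covered fraction after each uniform melt depth in melt_steps."""
    n = len(depths)
    return [sum(d > melt for d in depths) / n for melt in melt_steps]

rng = random.Random(0)
melt_steps = [0.0, 0.5, 1.0, 1.5, 2.0]  # melt in units of the mean snow depth
curves = {
    (model, cv): depletion_curve(sample_depths(model, 1.0, cv, 20000, rng), melt_steps)
    for model in ("gamma", "lognormal")
    for cv in (0.3, 0.9)
}
```

Comparing `curves[("gamma", 0.3)]` with `curves[("lognormal", 0.3)]`, and then with the two CV = 0.9 curves, gives a rough feel for how strongly the curve responds to the choice of distribution model versus the CV.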

Figure 8: over what time period is the data in this correlation from?
For the validation the model is run for the same periods as the years of observations in the boreholes and ground surface temperature loggers, respectively. See page 6678 l. 15-18. To clarify this point we have now provided an overview of the validation data, including the years of observation at each point as a supplementary table (see previous points).