The authors run a series of simulations of permafrost dynamics using the MERRA-2 reanalysis and the MERRA land model. They compare the results to remotely sensed active layer thickness (ALT) and in situ measurements of ALT. The paper has the potential to be a solid model-data comparison of simulated permafrost dynamics. I recommend acceptance after major revisions.
I have three major comments:
1) The authors need to account for measurement uncertainty when comparing to observations and refine their statistical comparison techniques. I found a number of errors in the statistical comparisons that I identify below.
2) The authors need to clarify the role of the remote sensing data in this analysis. They spend as much space comparing the remotely sensed ALT with in situ data as with the model. Is the paper a means to validate the model or the remote sensing data?
3) The authors need to change their spinup procedure or drop the trend analysis. Repeating the full time period for spinup introduces a dynamic response that produces false trends aliased on top of real trends. This pretty much invalidates the trend analysis.
I have the following specific comments:
P2L17-20: Reword. This is a run-on sentence with two doubly nested parenthetical clauses, making it very difficult to understand.
P2L25: State how models are useful. This paragraph emphasizes resolution as a weakness of models.
P3L2-3: This is not difficult. One must account for representation error when comparing a point measurement to the area average of a model pixel.
P3L7: The resolution of these simulations is essentially the same as for many published simulations, so I am not sure this is the best claim to make.
P4L3: State or describe the improved performance.
P4L8: State or describe exactly what is inferior.
P4L10-12: Delete. Each reanalysis has strengths and weaknesses and I find it very difficult to believe that one version of MERRA is truly superior to another, especially considering the scarcity of measurements in the Arctic. I have no objection to using MERRA-2, of course, but claiming superiority is not warranted and best deleted from the manuscript.
P4L18-20: Here the authors state they will use the remotely sensed ALT to validate the model, but later they actually validate the remotely sensed ALT against ground observations. This makes the actual purpose of including remotely sensed data unclear in this paper.
P5L10: The vertical resolution seems too coarse to simulate ALT. The total depth is fine, but other models typically use much higher resolution to simulate ALT. The authors need to explain why this resolution will work.
P5L23: This is a good formulation. Models often use it, but rarely document it.
P6L17: A 180 year spinup is adequate for stabilizing soil temperatures, but not for soil carbon. Does this model include dynamic soil carbon pools? If yes, then a spinup of 1000-5000 years is more appropriate.
P6L20: The chosen spinup technique pretty much invalidates the trend analysis. The typical response time for soil temperature in a model such as this is 20-30 years, exactly matching the length of the MERRA forcing data. If they had spun up using only 1980-85 MERRA data, then the trend analysis would make sense. I suggest either changing the spinup or dropping the trend analysis.
P7L29-30: This means one can use the radar data only where one expects the ALT to be less than 60 cm. If the radar cannot penetrate below 60 cm, I question the utility of using it for validation. The authors need to supply a rationale for including it in the study.
P8L11: Please identify which site got covered with lava. This is so unusual that you have to tell the reader.
P8L28: The section on comparison with the radar ALT must include uncertainty. The best that a model can do is match the observations within uncertainty.
P8L28: The authors should include a description of the statistical comparison itself. There are many ways to do this, ranging from a cost function to a regression.
P8L28: The authors need to change the section title. The title covers only comparison with the radar data, but the text covers comparison with CALM data.
P9L7-16: The comparison of a point, in situ measurement to a model or remote sensing pixel must account for representation error. Representation error is the uncertainty when a point measurement represents an average. The standard deviation of the CALM grid measurements is a good estimate of representation error.
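To illustrate the kind of check I have in mind, here is a minimal sketch, assuming the individual probe measurements for each CALM grid are available. The function names, the one-sigma threshold, the optional model uncertainty term, and the sample values are my own hypothetical choices, not anything taken from the manuscript.

```python
import math
import statistics

def representation_error(grid_alt_cm):
    """Sample standard deviation of the probe measurements across one CALM grid."""
    return statistics.stdev(grid_alt_cm)

def within_uncertainty(model_alt_cm, grid_alt_cm, model_sigma_cm=0.0):
    """Return (residual, total sigma, True if |residual| <= total sigma)."""
    obs_mean = statistics.fmean(grid_alt_cm)
    sigma_rep = representation_error(grid_alt_cm)
    # Combine representation error and any model uncertainty in quadrature.
    sigma_total = math.hypot(sigma_rep, model_sigma_cm)
    residual = model_alt_cm - obs_mean
    return residual, sigma_total, abs(residual) <= sigma_total

# Hypothetical probe depths (cm) for one grid; not real CALM data.
probes = [42.0, 55.0, 48.0, 61.0, 50.0, 44.0]
residual, sigma, ok = within_uncertainty(58.0, probes)
print(residual, round(sigma, 2), ok)
```

With these made-up numbers the grid mean is 50 cm and the spread is about 7 cm, so a modeled ALT of 58 cm falls just outside one standard deviation. The authors may prefer a different threshold or a different estimate of representation error.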
P10L25-30: The authors should state this is a standard degree-day model and find some references.
P11L5-8: This is a standard degree-day model for ALT with a snow adjustment. There are hundreds of variants of this model in the literature, derived from the original thermodynamics equation, from models, or empirically from in situ observations. The authors need some references here and text explaining that this is a degree-day model.
P11L8: The authors should explain why they included the a0 term. The a0 term is not often seen in a degree-day model because one typically assumes the soil starts frozen (a0=0).
P11L8: The authors should explain why they chose Tcum rather than the square root of Tcum. One can derive the sqrt(Tcum) relationship directly from the original thermodynamics equation and the relationship appears many times in analyses of in situ measurements. Because of the strong theoretical basis of sqrt(Tcum), using plain Tcum is rare, so the authors need to justify its use.
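A simple way to test the two functional forms against each other is sketched below. This is only an illustration under my own assumptions: I write the two candidates as ALT = E*sqrt(Tcum) and ALT = a0 + a1*Tcum, ignore the snow adjustment, and use synthetic values generated from the square-root form. The coefficient name E and all numbers are hypothetical.

```python
import math

def fit_sqrt_model(tcum, alt):
    """Least-squares fit of ALT = E * sqrt(Tcum), forced through the origin."""
    num = sum(a * math.sqrt(t) for t, a in zip(tcum, alt))
    den = sum(tcum)  # equals the sum of sqrt(t)**2
    return num / den

def fit_linear_model(tcum, alt):
    """Ordinary least-squares fit of ALT = a0 + a1 * Tcum."""
    n = len(tcum)
    t_mean = sum(tcum) / n
    a_mean = sum(alt) / n
    sxy = sum((t - t_mean) * (a - a_mean) for t, a in zip(tcum, alt))
    sxx = sum((t - t_mean) ** 2 for t in tcum)
    a1 = sxy / sxx
    a0 = a_mean - a1 * t_mean
    return a0, a1

def rmse(pred, obs):
    """Root mean square difference between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

# Synthetic data generated exactly from ALT = 1.5 * sqrt(Tcum); not real data.
tcum = [400.0, 625.0, 900.0, 1225.0]
alt = [1.5 * math.sqrt(t) for t in tcum]

e_fit = fit_sqrt_model(tcum, alt)
a0, a1 = fit_linear_model(tcum, alt)
rmse_sqrt = rmse([e_fit * math.sqrt(t) for t in tcum], alt)
rmse_lin = rmse([a0 + a1 * t for t in tcum], alt)
print(e_fit, a0, a1, rmse_sqrt, rmse_lin)
```

In this synthetic case the linear fit returns a positive a0 even though the underlying relation passes through the origin. Repeating the comparison with the actual model output would show whether the same happens in the manuscript.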
P11L12-16: This description of comparing to CALM is out of place and should be moved to section 3.1.
P11L12-16: The authors need to account for uncertainty in the CALM measurements when comparing to the model output.
P11L18-24: The spinup technique invalidates the trend analysis. Either drop the trend analysis or modify the spinup.
P12L6-8: Delete. Unneeded.
P12L10-12: Move to methods.
P12L14: Figure 3 shows the difference between the model and observations, but is this difference within the uncertainty? If yes, then the two are statistically identical and thus a match. If no, then there is a statistically significant mismatch. The magnitude of the difference is unimportant if the difference is less than uncertainty. The authors need to account for uncertainty in this comparison.
P12L21: Explain here why soil type influences the result. The reader should not have to flip forward in the paper to get this answer.
P12L25: Agreement ‘to first order’ is too vague and carries no meaning. The authors need to quantify the agreement accounting for uncertainty.
P13L1-8: The authors need to expand their statistical analysis of the model-data comparison. All they have is correlation, which says nothing about magnitude. They should expand the residual analysis to include bias (mean residual), root mean square error (residual standard deviation), and chi-squared (standard deviation of residuals normalized by uncertainty).
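The residual statistics I am asking for can be computed in a few lines. The sketch below is one possible form, assuming each observation comes with its own uncertainty. I use the mean of the squared normalized residuals as the chi-squared measure, which is a common convention but my own choice here, and the input values are hypothetical.

```python
import math

def residual_stats(model, obs, sigma):
    """Return (bias, RMSE, mean squared normalized residual) for paired data."""
    res = [m - o for m, o in zip(model, obs)]
    n = len(res)
    bias = sum(res) / n
    rmse = math.sqrt(sum(r * r for r in res) / n)
    # Values near 1 mean the misfit is comparable to the stated uncertainty.
    chi2 = sum((r / s) ** 2 for r, s in zip(res, sigma)) / n
    return bias, rmse, chi2

# Hypothetical modeled ALT, observed ALT, and observation uncertainty (cm).
model = [60.0, 45.0, 80.0, 52.0]
obs = [50.0, 47.0, 70.0, 50.0]
sigma = [5.0, 4.0, 10.0, 2.0]

bias, rmse, chi2 = residual_stats(model, obs, sigma)
print(bias, round(rmse, 2), chi2)
```

A correlation coefficient alone would not reveal the 5 cm positive bias in this example, which is why I ask for the additional statistics.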
P13L9: ‘Broadly consistent’ is too vague and carries no meaning. The authors need to quantify the agreement.
P13L20: ‘In general’ is too vague and has no meaning. Comparing modeled and observed trend with latitude is perfectly valid here. The limited number of in situ measurements will simply result in higher uncertainty.
P13L23: Again, ‘generally’ is too vague.
P13L23-33: Figure 5 is not the correct format to show the relationships described here. The reader cannot visualize the relationships and correlations from the simple time series plots in Figure 5. The authors should replace the time series plots with three plots to illustrate the relationships: ALT vs. latitude, ALT vs. organic matter content, and ALT vs. air temperature.
P14L20-23: Perhaps, but the authors’ argument is not convincing. Shading associated with higher LAI represents an equally valid explanation. Higher water content associated with higher organic matter content could also explain the difference. The authors have the full suite of model output on hand. They should do a statistical analysis of available output to track down exactly what explains the difference.
P14L31: The authors need to identify exactly what soil parameters changed. Porosity? Thermal conductivity? Volumetric water content? Also, the authors need to explain how they specify soil properties in the model. A sharp change as seen here is common when specifying properties by soil type, such as sandy loam defined in the USGS soil triangle. A sharp change would be unusual when specifying soil properties by maps of soil texture (sand, silt, and clay fraction).
P15L33: The reason simulated ALT is deeper is the same reason identified later in the manuscript: the model either has permafrost or it does not because it cannot represent sub-grid scale processes. When the model does simulate permafrost in sporadic regions, the simulated ALT is always greater than observed because it represents an area average of permafrost and non-permafrost areas.
P16L9-10: The authors need to either perform the analysis with air temperature or at least summarize and reference the results of other studies that did perform the analysis.
P16L12: The authors need to remove the trends in ALT, Tcum, and SWEmax before calculating the correlation coefficients. We see nice strong correlations because all three variables show strong trends over the time period of the simulation. Removing the trends will significantly change Figure 9 and its interpretation. If the authors want to isolate the effects of trends on the ALT, then they should include an analysis using the congruent trend fraction.
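The effect of a shared trend on correlation is easy to demonstrate. The sketch below is a minimal illustration, assuming a simple linear detrend with the time index as the predictor. The two series are synthetic and unitless, and the variable names only mimic the manuscript's quantities.

```python
import math

def detrend(series):
    """Remove the least-squares linear trend, using the index as time."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Synthetic series: a shared upward trend plus opposite year-to-year anomalies.
wiggle = [1, -1, 1, -1, 1, -1]
alt = [t + 0.5 * w for t, w in enumerate(wiggle)]
tcum = [t - 0.5 * w for t, w in enumerate(wiggle)]

r_raw = pearson(alt, tcum)
r_detrended = pearson(detrend(alt), detrend(tcum))
print(round(r_raw, 2), round(r_detrended, 2))
```

Here the raw correlation is strongly positive only because both series share a trend, while the detrended anomalies are anti-correlated. The real data will be less extreme, but the same reasoning applies to Figure 9.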
P17L1: The regions identified on the maps do not correspond to high mountains. Please clarify.
P17L22: ‘Geographically thin’ is too vague. Please reword.
P18L20: The authors cannot make this claim without an actual comparison with other models. Drop the statement.
P18L14 and P18L34: This is the first mention of representation error. The authors need to estimate the representation error of the in situ measurements and include this in the comparison with the modeled ALT. There are several ways to do this and I leave it to the authors to determine the most appropriate method for this paper.
P19L1-2: This statement is not true. A point measurement can represent an area average if one includes representation error in the point measurement.
P19L4: Either change the spinup or drop the trend analysis.
P20L15-16: Again I am confused about the motivation of including the remotely sensed ALT in this paper. This paper compares the model to the RS data, but also compares the RS data to the in situ measurements. Do the authors want to validate the model or the RS data?
P20-P22: Please reduce the summary to one page or less. The current summary is far too long and simply repeats material from the results section. What are the primary take-away points the authors want to convey? What are their most important or most interesting results? What are the broader implications of their results?