Though I firmly stated that the authors had not presented convincing evidence to support their claims, they chose not to implement any of my major suggested revisions, which were intended to make the work more convincing. Instead, they have largely chosen to reason away my suggestions. I don’t wish to repeat my review nor get into a lengthy debate on the relevance of my comments. Instead I’ll just go with what the presented data say to me.

The authors have derived a method for estimating snow densities that includes climatological variables, which provide a means of capturing spatial heterogeneity. They developed and tested this method using data from snow pillows and snow courses. By the authors’ own admission (lines 46-50), these data all come from “relatively simple topography”. These sites are generally located in flat, wind-sheltered locations. On the other hand, the Sturm model, a well-regarded and oft-cited research piece, is based on similar data as well as data collected on manual traverses representing a range of topographic positions and snow deposition zones. The presented comparison with the Sturm model is conducted only at SNOTEL sites (i.e. flat, wind-sheltered). These exact same sites were used both in the calibration of the new model and for the comparisons with the Sturm model. Results show that overall the new model performs better. However, if one were to eliminate taiga sites, which were not well represented in the Sturm data, the overall results are very similar (see table below). The authors acknowledged that I raised an important point regarding their splitting of the data, in which all stations were included in both the training and validation sets. Yet they state only that a station-dependent splitting method produced “extremely close” results to the original, without actually presenting those results.

Given how similar the provided results are for non-taiga performance of the two models, the only conclusion I can draw from the presented data is that at sheltered, taiga sites the new model performs better. At other sheltered locations, the new model might be better or worse. The new method is trained under the same conditions and at the same sites where the validation was conducted, whereas the Sturm method is based on a greater diversity of data from independent sites. Given this unbalanced methodology and the closeness of the results, I find the comparative assessments of model performance at sheltered, non-taiga sites to be uncertain.
Results averaged over all snow classes except taiga (taken from snow class percentages provided in Section 3.1 and results in Table 4)

              Sturm et al.    Multi-variable two-equation
R²            0.97            0.97
RMSE (mm)     72.1            67.8
Bias (mm)     1.76            -2.26
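On the splitting issue raised above, a station-dependent split is simple to set up. The following is a minimal sketch, assuming each observation carries a station identifier; the function name, the 30% test fraction, and the seed are illustrative choices of mine and are not taken from the manuscript.

```python
import random

def split_by_station(station_ids, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) so that no station appears in both sets.

    station_ids: one station identifier per observation (hypothetical input).
    """
    stations = sorted(set(station_ids))
    rng = random.Random(seed)
    rng.shuffle(stations)
    # Hold out whole stations, at least one, for validation.
    n_test = max(1, round(len(stations) * test_fraction))
    test_stations = set(stations[:n_test])
    train_idx = [i for i, s in enumerate(station_ids) if s not in test_stations]
    test_idx = [i for i, s in enumerate(station_ids) if s in test_stations]
    return train_idx, test_idx
```

Fitting on the training indices and reporting R², RMSE, and bias on the held-out stations only would show directly whether the results are as close to the original split as stated.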
Nothing has been shown regarding the usefulness and accuracy of the new methodology for estimating densities at sites other than typical SNOTEL stations (e.g. using the manuscript-referenced crowd-sourced and lidar data, which are gathered in a variety of topographic settings). No data are presented for anything other than flat, wind-sheltered locations; therefore, conclusions on model performance in any other conditions are not possible. Yes, the maps look pretty and capture greater heterogeneity, but how accurate are they? A theoretical argument can easily be made that the Sturm method, being based on a more diverse set of data, would actually be better.
I would be remiss if I didn’t further comment on one of my major suggested revisions that wasn’t followed up on: direct comparisons to the Jonas et al. method. Rather than including direct comparisons to the Jonas method (based on over 11,000 observations and cited over 140 times) as suggested, the authors have chosen to make comparisons to the simple Pistocchi method, which is solely a function of day-of-year (based on 206 observations and cited 5 times). According to the authors, this was necessary because the Jonas method is dependent on month of year, elevation, and a geographic “offset” term. Certainly the former two variables are available to the authors. The presence of the offset term, however, leads the authors to conclude that the Jonas model cannot be applied to other regions, while implying that Jonas et al. did not “construct” their model for such applications. Yet in referring to the importance of the offset term, Jonas et al. state, “However, the minor importance of regional effects suggests that the model may also be applicable in other regions with similar snow climatologic conditions.” In fact, if one averaged the regional offset term over all the data records in the Jonas application, it comes to a mere 3 kg/m3. One could, in fact, easily optimize the Jonas “offset” term to the calibration data set presented here. (It is my personal belief that any comprehensive analysis should include this.) Yet even a straight-out-of-the-box Jonas application, using the average offset or none at all, would be insightful. If the presented model in fact performs better than the Jonas model at the tested sites, then this would show that there are indeed regional constraints present in the Jonas model that must be accounted for. On their own, these insights on regionality for one of the most widely cited density parameterizations would be relevant. Of course, this would also lend greater credence to the model presented in this work.
This entire analysis including optimization could probably be done in less than a day. Without optimizing the offset term, this is about 5 lines of code that could even be handled in a spreadsheet. I truly don’t understand why this revision wasn’t undertaken.
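To illustrate how little is involved, here is a minimal sketch. It assumes the Jonas model takes the form density = a * HS + b + offset, with a and b looked up by month and elevation band; that form, the band labels, the function names, and especially the coefficient values below are placeholders of mine and must be replaced with the published Jonas et al. table. The offset that minimizes squared density error is then simply the mean residual over the calibration set.

```python
import math
import statistics

# PLACEHOLDER coefficients keyed by (month, elevation band): (a, b).
# These are NOT the published Jonas et al. values.
PLACEHOLDER_COEFFS = {(1, "high"): (50.0, 250.0)}

def jonas_density(depth_m, month, band, offset=0.0, coeffs=PLACEHOLDER_COEFFS):
    """Bulk density (kg/m3) under the assumed form a * HS + b + offset."""
    a, b = coeffs[(month, band)]
    return a * depth_m + b + offset

def fit_offset(records, coeffs=PLACEHOLDER_COEFFS):
    """Least-squares constant offset: mean of (observed - predicted) density.

    records: iterable of (depth_m, month, band, observed_density).
    """
    residuals = [obs - jonas_density(d, m, band, 0.0, coeffs)
                 for d, m, band, obs in records]
    return statistics.fmean(residuals)

def swe_mm(depth_m, density):
    """SWE in mm: depth (m) times bulk density (kg/m3)."""
    return depth_m * density

def rmse_and_bias(predicted, observed):
    """RMSE and mean bias (predicted minus observed)."""
    errors = [p - o for p, o in zip(predicted, observed)]
    rmse = math.sqrt(statistics.fmean(e * e for e in errors))
    return rmse, statistics.fmean(errors)
```

Running `rmse_and_bias` on SWE from `jonas_density` with the offset set to zero, to the published average, and to the value from `fit_offset` would give the three comparisons suggested above.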
Unfortunately, save for the inclusion of the Pistocchi model comparison, the authors have presented no new data. It is my opinion that the presented work still does not compellingly support the stated conclusions, particularly this one (lines 432-33): “The results presented in this study show that the regression equation described by equations (5, 7-8) is an improvement (lower bias and RMSE) over other widely used bulk density equations.” The authors have the data available to make this a much more extensive and scientifically supportable work (e.g. significance tests would be a nice touch). I think I have provided several constructive ideas on how to go about this. At the very least, if they choose not to follow up on these suggestions, they need to objectively evaluate their results and, in my opinion, substantially scale back their claims.
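As one example of the kind of significance test meant here, a paired sign-flip permutation test on the difference in squared errors between two models at the same observations could be sketched as follows. This is only one reasonable choice, not the required one; the function name and defaults are mine, and it treats observations as independent, so errors should probably be aggregated per station before it is applied.

```python
import random

def paired_sign_flip_test(err_a, err_b, n_perm=10000, seed=0):
    """Two-sided permutation test on mean(err_a**2 - err_b**2).

    err_a, err_b: errors of two models at the same observations.
    Returns (observed mean difference, permutation p-value).
    """
    diffs = [a * a - b * b for a, b in zip(err_a, err_b)]
    n = len(diffs)
    observed = sum(diffs) / n
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis, each paired difference is equally
        # likely to carry either sign.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(flipped) >= abs(observed):
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value would indicate that the difference in squared error between the two models is unlikely to arise from chance alone at the tested sites.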