Physics-based modeling of Antarctic snow and firn density

Estimates of snow and firn density are required for satellite altimetry based retrievals of ice sheet mass balance that rely on volume to mass conversions. Therefore, biases and errors in presently used density models confound assessments of ice sheet mass balance, and by extension, ice sheet contribution to sea level rise. Despite this importance, most contemporary firn densification models rely on simplified semi-empirical methods, which are partially reflected by significant modeled density errors when compared to observations. In this study, we present a new, wind-driven, drifting snow compaction scheme that we have implemented into SNOWPACK, a physics-based land surface snow model. We demonstrate high-quality simulation of near-surface Antarctic snow and firn density at 122 observed density profiles across the Antarctic ice sheet, as indicated by reduced model biases throughout most of the near-surface firn column when compared to two semi-empirical firn densification models. Because SNOWPACK is physics-based, its performance does not degrade when applied to sites without observations used in the calibration of semi-empirical models, and could therefore better represent firn properties in locations without extensive observations and under future climate scenarios, in which firn properties are expected to diverge from their present state.

(1) Given that the uncertainty ranges of the density simulated by the different models overlap to some degree, it is not completely clear whether there is a statistically significant difference between them, or between the models and the observations at different levels. The authors should test whether this is the case.
(2) The authors should be careful to note some of the limitations of the current implementation (e.g. the validation is over the top 10 m, not the entire firn column; and the SNOWPACK bias is larger below 6 m depth), particularly in the abstract and conclusions sections.
(3) The available evidence doesn't seem to necessarily support the argument that biases are substantially larger in the semi-empirical models at locations that were not used to calibrate those models. The authors should clarify whether this is indeed the case and revise the text accordingly. It would be interesting to include both the GSFC-FDM and IMAU-FDM in this comparison if possible.

Further specific comments are provided below.
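
As a rough illustration of the kind of test requested in comment (1), the sketch below uses a paired bootstrap across profiles to put a confidence interval on the difference in mean absolute density error between two models. It is only one possible approach, not a prescription. The function name `paired_bootstrap_ci`, the array names, and the synthetic error values are all hypothetical; the authors would substitute their own per-profile errors (model minus observation) at a given depth level.

```python
import numpy as np


def paired_bootstrap_ci(err_a, err_b, n_boot=10000, alpha=0.05, seed=0):
    """Bootstrap CI for mean(|err_a|) - mean(|err_b|) over paired profiles.

    err_a, err_b: per-profile density errors (model minus observation) for
    two models evaluated at the same profiles. Hypothetical inputs.
    """
    err_a = np.asarray(err_a, dtype=float)
    err_b = np.asarray(err_b, dtype=float)
    if err_a.shape != err_b.shape:
        raise ValueError("paired inputs must have the same shape")

    # Pairing: the difference is taken profile by profile before resampling.
    diff = np.abs(err_a) - np.abs(err_b)

    rng = np.random.default_rng(seed)
    # Resample profiles with replacement, n_boot times.
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)

    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(diff.mean()), float(lower), float(upper)


# Synthetic, purely illustrative errors (kg m^-3) at 122 profiles.
rng = np.random.default_rng(42)
err_model_a = rng.normal(5.0, 20.0, size=122)
err_model_b = rng.normal(25.0, 20.0, size=122)

mean_diff, lower, upper = paired_bootstrap_ci(err_model_a, err_model_b)
# The difference is taken as significant if the interval excludes zero.
significant = not (lower <= 0.0 <= upper)
print(f"mean difference in |error|: {mean_diff:.1f} kg m^-3, "
      f"95% CI [{lower:.1f}, {upper:.1f}], significant: {significant}")
```

Resampling whole profiles rather than individual depth samples keeps the pairing between models intact and avoids treating strongly correlated samples within one profile as independent. A Wilcoxon signed-rank test on the same paired differences would be a reasonable alternative.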

Specific Comments
1. Title: I would suggest adding SNOWPACK to the title, and mentioning the near-surface, e.g. "Physics-based modeling of near-surface Antarctic snow and firn density with the SNOWPACK model". I would argue that the other models utilized here are also physically based; they just employ simpler parameterizations for the process of firn densification.
2. Lines 1-11: In general, some quantitative evidence should be provided here. Some of the limitations of SNOWPACK applied over Antarctica should be discussed, for example the larger bias for higher accumulation areas and the larger biases deeper in the snowpack, as well as the fact that this approach focuses on the near-surface, not the full firn column.
3. Lines 7-8: It would be best to quantify the magnitude of the biases here.
4. Line 9: It isn't entirely clear from this sentence that this is one of the findings of the study; it would be best to provide some quantitative results here. Also, I believe the performance does degrade somewhat at these sites, just not as much as for the semi-empirical models?
5.
6. Line 38: What is meant by "all local and temporal density variability"? No model can capture "all" variability. Please clarify.
7. Line 41: These models do employ "physical principles"; they are not entirely empirical. Suggest simply removing the phrase "rather than physical principles".
8. Lines 50-53: Describe how the model is forced, briefly.
9. Line 50: Instead of "we apply", do you mean "we compare results from"?
10. Line 61: SNOWPACK also seems to include parameterizations that are empirically based. Perhaps mention explicitly how SNOWPACK is different from the other models mentioned in earlier sections.
11. Line 75: Perhaps change "new drifting snow compaction routine" to "new snow compaction routine", as drifting snow is just a component of the routine.
12. Line 80: Can the authors briefly note how this parameterization is derived?
13. Lines 87-88: How much do these parameters change the comparison with observed profiles? Provide some additional details either in the main manuscript or a supplemental section.
14. Line 90: Briefly explain the physical meaning of the "threshold friction velocity".
15. Line 124: Is this a bias over the entire Antarctic ice sheet? Are there spatial variations in the bias?
16. Lines 130-131: Is there a reference for these statements?
17. Line 133: Why use 19.4% and not 15.1 W m⁻²?
18. Line 134: Why is there still a bias after the bias is removed?
19. Lines 154-170: It would be helpful here to describe these two models in a bit more detail, in particular to highlight how they differ from SNOWPACK in terms of key physical processes (e.g. compaction), as the model differences are important to the conclusions of the study.
20. Line 166: Explain the meaning of "replay".
21. Line 194: Suggest changing "reduction in both RMSE…" to "statistically significant reduction in both RMSE…"
22. Line 196: This section could potentially be moved to later in the manuscript. It might logically follow the section on comparison with observations.
23. Line 200: Clarify why these two stations were chosen.
24. Line 221: It is a bit unclear what is meant by "we tested for explanatory variables". Please clarify.
25. Lines 245-247: It might be useful to have a table here for the bias and RMSE for different models above and below 400 kg m⁻³.
26. Line 253: This sentence is confusing. Suggest revising to read something like: "Additionally, we cannot rule out the possibility of larger errors in the observational data for densities above 400 kg m⁻³."
27. Line 256: The SUMup dataset does include information on measurement methods. It might be interesting to see if dividing by measurement method changes these biases in any way.
28. Lines 259-271, Fig. 6: Can the authors note whether the differences are statistically significant? It might also be useful to provide an uncertainty range on the biases. Also, at first glance it appears that all the model simulation uncertainty ranges overlap in Fig. 6, but this is not the case. Perhaps the figure can be modified slightly to make this clearer, e.g. changing the transparency for different models or changing the colors. (Not sure how easy this would be.)
29. Lines 293-294: It seems this would not be difficult to find out? It would also be interesting to see the IMAU-FDM results.
30. Lines 296-297: From Fig. 8, it actually looks like there is a larger change in the SNOWPACK density bias (at least at different levels). The numbers here do not seem to match with the figure. Please clarify.
31. Lines 335-336: This portion is interesting but seems disconnected from the rest of the manuscript. Perhaps these temporal variations could be placed in the context of temporal variations from in situ data. Are there any locations where a timeseries of measurements is available that could be compared with the SNOWPACK runs?
32. Line 342: Without validation of the temporal variability of the in situ measurements, I'm not sure the model results would qualify as "evidence". Please revise.
33. Lines 360-364: I'm not sure this statement is completely supported by the results. For example, SNOWPACK seems to show a larger bias at higher accumulation locations, and SNOWPACK and the GSFC-FDM both seem to show a positive bias between 0 and 6 m in depth at locations that were not used to constrain GSFC-FDM. In general, however, I would agree that including a more physically realistic simulation of snowpack processes should produce a better projection of future conditions. Perhaps revise this statement to note that this is likely the case, but not entirely certain.
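
For the table suggested in comment 25, a minimal sketch of how the entries could be computed is given below. It assumes that samples are split by the observed (rather than modeled) density at the 400 kg m⁻³ threshold, which is my assumption and may differ from the authors' choice. The function name `bias_rmse_by_density_class` and the four sample values are hypothetical and for illustration only.

```python
import numpy as np


def bias_rmse_by_density_class(observed, modeled, threshold=400.0):
    """Bias and RMSE of modeled density, split at an observed-density threshold.

    observed, modeled: matched density samples in kg m^-3 (hypothetical inputs).
    Returns a dict with entries for samples below and at-or-above the threshold.
    """
    observed = np.asarray(observed, dtype=float)
    modeled = np.asarray(modeled, dtype=float)
    if observed.shape != modeled.shape:
        raise ValueError("observed and modeled must have the same shape")

    error = modeled - observed  # positive values mean the model is too dense
    classes = (
        ("below", observed < threshold),
        ("at_or_above", observed >= threshold),
    )

    table = {}
    for label, mask in classes:
        if mask.any():
            e = error[mask]
            table[label] = {
                "n": int(mask.sum()),
                "bias": float(e.mean()),
                "rmse": float(np.sqrt(np.mean(e ** 2))),
            }
        else:
            # No samples in this class: report NaN rather than raise.
            table[label] = {"n": 0, "bias": float("nan"), "rmse": float("nan")}
    return table


# Illustrative values only (kg m^-3), not taken from the manuscript.
obs = [350.0, 380.0, 420.0, 450.0]
mod = [360.0, 370.0, 440.0, 470.0]
table = bias_rmse_by_density_class(obs, mod)
for label, stats in table.items():
    print(label, stats)
```

Running the same function once per model (SNOWPACK and each semi-empirical model) against the same observations would give the rows of such a table.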