Second Review of The Cryosphere paper by Xiao et al. – TC-2019-280.
Overview and General Comments:
The revised manuscript is much improved in terms of its organization, explanation of the results, and English writing, grammar, and syntax. The authors have sufficiently addressed many of the reviewers’ original comments and suggestions; however, a few areas may require additional attention, and a couple of concerns were not fully addressed.
The authors have conducted a comprehensive study of the inputs and characteristics associated with passive microwave retrievals and algorithms to estimate snow cover fraction. However, the study still focuses on only two winter months and is limited by the number of years used for training, testing, and validation, drawn from the period 2008-2017 (corresponding to the SSMI/S F-16 sensor). Selecting just the mid-winter months of January and February to demonstrate the random forest regression approach for determining snow cover fraction limits the study to mostly dry snow events and ignores the ablation and accumulation seasons, which are understandably harder to train for and estimate and may require more time to address. If it is not too costly (based on the computational timing estimates shown in the paper), could the authors include a couple of other months, at least for the random forest regression and Grody SCA algorithm methods, to see how they perform? For example, including December (reflecting more of the accumulation phase) and March (more ablation-phase processes) would show how well the algorithms perform under different conditions and intermediate snow levels, especially in the March case.
Though the authors point out (page 18-L11, page 21-L12, and page 23-L7) that the number of intermediate fractional snow cover values is much lower, since they focus on peak winter months, the question then arises: does the random forest regression snow cover retrieval approach, as trained and tested for this study, adequately predict intermediate snow cover fraction values?
When looking at Fig. 6, which compares the reference (MODIS) with the random forest (RF) passive-microwave snow cover estimates for the forest land type, the reader sees some of the intermediate snow cover fraction values captured across the RF-based estimates. However, in Supplemental Figs. 4-6, for the other three land types (prairie, shrub, and bare soil), most of the RF-estimated values congregate either between 0.8 and 1.0 or near 0.2, with only a few points near 0. Without knowing the distribution of land cover types, it is hard to know how many “intermediate” values are captured mainly due to vegetation canopy presence. For the other land cover types, intermediate values are not well captured, possibly reflecting mostly dry snow winter conditions, which prevents the tested approach from being a truly “fractional” snow cover product.
The authors stated in their introduction section that “there is an urgent need to acquire snow cover area within a sub-pixel to provide accurate snow cover information” as a main motivation for performing this study. I feel the authors do try to capture the fractional nature of snow cover with their algorithm and passive microwave estimates. However, not many spatial map examples are provided to demonstrate how well the passive microwave fractional snow cover estimates perform overall for the mid-winter months (Jan-Feb) against the MODIS-based fractional estimates. It would be helpful to show at least one other date (besides Feb 27, 2017, in Fig. 8), preferably one on which the MODIS product had less cloud coverage and more intermediate snow present.
Thank you for addressing and further explaining the reason for the “binary” nature of the aggregated (15x15) MODIS snow cover gridcells (Figure 8B panel), due to the “rigorous” screening that includes only snow or non-snow pixels within the 15x15 aggregate. The side analysis (not included in the paper) is also helpful in showing that relaxing that constraint, by allowing up to 5% of the pixels to be cloud or inland water, translates into more “intermediate”-valued gridcells, ranging from 0.3 to 0.8. With the original rigorous screening still applied in the paper, at least as shown in Fig. 8, the MODIS snow cover reference acts more as a “binary” snow cover product, which does not seem to reflect fractional snow cover here. Selecting such thresholds (e.g., the <= 0.3 cutoff as no-snow) and rigorous constraints may eliminate many pixels that could be used in the training process, even though the design is to utilize the most accurate input data values to establish the regression parameters.
In addition, the Supplemental Figure S-7B shows the authors’ passive-microwave-based fractional snow cover estimates on a “continuous” scale, from 0 to 100% coverage (per pixel), related to Fig. 8. Based on this image, areas showing low snow cover values (e.g., < 0.2) appear in places like southern Florida (color gradation near values of 0.2) and desert regions of the Southwestern U.S., which typically see little or no snow at all. In this case, the MODIS-based “non-snow” areas (in dark blue) look correct, and the lower estimated snow cover fraction values in the passive microwave product seem to overestimate the “low or no snow” fraction values, a tendency also evident in their Figs. 6 and 7.
In relation to Figs. 8 and S-7, it would be helpful to see an example with the same panels as Fig. 8, but for a case where MODIS has less cloud presence, to show spatially the similarities and differences between the passive microwave product and the optical-based sensor.
In addition to these issues, the motivation and methods described are at times disconnected in the paper, and the authors should present aspects of their study more coherently. For example, the Section 3.1 overview is very helpful in providing readers a “roadmap” of what is to come in the study, but the purpose of exploring the four different methods, with the fourth introduced as a hypothesized better approach, is simply stated, and those connections are lacking in the Introduction and Section 3.1. I recommend stating more directly in the Introduction why random forest regression was selected and providing a short description of its advantages relative to other methods previously applied. The first paragraph of Section 3.4.4 (on page 12) should be given earlier in the paper, such as in the Introduction, to explain why this application is being pursued and how it contributes to the overall hypothesis.
How were the specific years available from the SSMI/S PM dataset originally selected for 1) training, 2) testing, and 3) evaluation? For example, was 2010 simply selected randomly as the testing year for the four different snow cover retrieval algorithms? If you performed all your algorithm training and testing on different sets of years, do you think you would obtain the same level of results and validation, even though samples are drawn randomly? For example, if you had chosen 2008-2014 for training, 2015 for testing, and 2016-2017 for validation, would you expect your table values (e.g., Table 5) and plots (e.g., Fig. 7) to look similar in nature to those currently shown? Why or why not?
If you used more years (e.g., 20+ years) of overlapping satellite passive microwave sensor datasets, e.g., different SSM/I versions, could you apply the random forest regression approach separately for each month (and more months) and by region (e.g., U.S. Rocky Mountains, Upper Great Plains), since each of these categories would have distinct snow characteristics, like differences in snow densities (e.g., coastal mountains vs. continental plains regions, or accumulation vs. ablation periods), which can affect brightness temperature signals?
Abstract: The authors state around line 14 that the technique applies “under all-weather conditions,” which I feel does not accurately represent the study’s scope in the abstract. The authors could instead say that they applied the random forest regression technique, along with three other retrieval algorithms, for “dry snow conditions in peak Northern Hemisphere winter months (January and February).”
Abstract (around lines 19-20): The authors mention that they obtain estimates “with higher accuracy and no out-of-range estimated values” using the random forest regression technique. They may also want to note that this algorithm tends to underestimate upper fractional snow cover values and especially to overestimate when little or no snow fraction is present, in comparison to the reference snow cover dataset.
Page 4, Lines 26-27: The authors state here that they used “bilinear interpolation” to “aggregate the 3.125 km spatial resolution data to 6.25 km”. Typically, bilinear interpolation is used for downscaling coarser grid datasets to finer grids. How was such an interpolation used here for “aggregation”? Please provide additional background on why this was selected, versus an averaging or mode-based scheme.
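An averaging scheme of the kind suggested above is straightforward; as a minimal sketch (hypothetical code, not the authors’ processing chain, assuming an even grid-cell ratio between the 3.125 km and 6.25 km grids), aggregation can be done by taking the mean of each 2x2 block:

```python
import numpy as np

def block_average(tb_fine, factor=2):
    """Aggregate a 2-D grid by averaging non-overlapping factor x factor blocks."""
    rows, cols = tb_fine.shape
    assert rows % factor == 0 and cols % factor == 0
    return tb_fine.reshape(rows // factor, factor,
                           cols // factor, factor).mean(axis=(1, 3))

# Example: a 4x4 grid of brightness temperatures (K) -> a 2x2 aggregated grid
tb = np.array([[250., 252., 260., 262.],
               [254., 256., 264., 266.],
               [240., 242., 230., 232.],
               [244., 246., 234., 236.]])
print(block_average(tb))  # each output cell is the mean of one 2x2 block
```

Bilinear interpolation, by contrast, weights the four nearest input points around each output location, which resamples rather than aggregates; hence the question about the terminology used.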
Page 5, lines 7-8: The authors state that GHCN-D data from 50,000 sites across Canada and the US were collected and included in this study. Actually, this number is higher than the number of sites with actual snow depth measurements (likely closer to 10,000 stations for the year of interest, e.g., 2017). The authors may want to mention exactly how many stations are included in their ground station analysis.
Pages 6-7, Section 3.1: I would recommend mentioning, early in this section, that you compare your application of random forest regression as a new retrieval method against the other three known methods (linear regression, MARS, and ANN), and naming them specifically in the first paragraph. The reason is that these other methods have been applied before and you are demonstrating the application of this fourth method. If random forest regression has never before been applied to passive microwave snow cover estimation, you will want to highlight that as both novel and part of the hypothesis you are testing in deriving snow cover estimates from passive microwave brightness temperatures.
Page 13, lines 6-7: The authors state here that “As several attempts to optimize the parameters of random forest structure had failed, all parameters used were the default values.” Can you describe here, then, what default values you began with and one of the approaches you applied in attempting to optimize the parameters?
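For concreteness, and purely as a hypothetical sketch (the paper does not state which library the authors used; scikit-learn is assumed here, and the predictors/targets below are random stand-ins), the defaults and one common tuning attempt (cross-validated grid search) could be reported along these lines:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# The default parameters that "all parameters used were the default values" refers to
rf = RandomForestRegressor(random_state=0)
print(rf.get_params())  # n_estimators, max_depth, min_samples_split, ...

# One possible optimization attempt: a small grid over tree count and depth
rng = np.random.default_rng(0)
X = rng.random((200, 4))   # stand-in for brightness-temperature predictors
y = rng.random(200)        # stand-in for fractional snow cover targets
search = GridSearchCV(rf,
                      {"n_estimators": [50, 100, 200],
                       "max_depth": [None, 10, 20]},
                      cv=3)
search.fit(X, y)
print(search.best_params_)
```

Stating the defaults (e.g., number of trees, maximum depth) and the grid that was searched would let readers judge what “failed” means here.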
Pages 17-18: Please expound here on why the shrub and bare land types may be experiencing more “binary” type extremes (“more distributed at two polar ends”) for snow cover presence (shown in Figs. 7B,b, and D,d) than the other two land types.
Page 18, near lines 10-15: “This means that the pixel was identified as snow cover when fractional snow cover value was less than 0.3.” In this last sentence, did you mean that “the pixel was identified as snow cover when the fractional snow cover value was greater than 0.3”? The authors may want to check what they reported here.
Page 19, lines 1-2: What is meant by “overestimated (~0) and underestimated (~0)”? Did you mean to make the “0” with “overestimated” a “1”? Please clarify what is meant here for the two cases.
Section 4.4: For the snow depth evaluation of the random forest approach (in Section 4.4) vs. the Grody SCA algorithm (also using passive microwave retrievals), I believe only two months total of data were used for the validation (Jan-Feb, 2017). Since the GHCN station snow-depth measurements are not used in the training stage, would it be more representative of this part of the validation to use more years, e.g., 2008-2010 and 2017? Only two months of evaluation does not seem sufficient for this validation.
Page 22, line 20: The authors mention that passive microwave satellites work “around the clock,” which is only partly true. Most microwave measurements (unless from geosynchronous satellites) revisit the same location only every 2 to 3 days, producing gaps in the available data rather than continuous (e.g., daily) coverage in time (hence the non-overlapping swaths of “missing” or “filled” areas in panel C of Fig. 8). It might be better simply to remove that phrase.
Figure 8: Why are the legend categories for Figures 8B and C distributed unevenly, with alternating range widths (e.g., 0-0.3, 0.3-0.5, 0.5-0.8, 0.8-1.0)? It seems more appropriate to apply “quarters” and use four equal category ranges (e.g., 0-0.25, 0.25-0.50, 0.5-0.75, 0.75-1.0). Please explain in the main text how these ranges were selected.
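The equal-quarter binning suggested above is a one-liner; as a minimal sketch (hypothetical values, not the authors' data):

```python
import numpy as np

# Bin fractional snow cover into four equal "quarter" categories
# (0-0.25, 0.25-0.5, 0.5-0.75, 0.75-1.0) instead of the uneven
# 0-0.3 / 0.3-0.5 / 0.5-0.8 / 0.8-1.0 legend ranges.
edges = np.array([0.25, 0.50, 0.75])
fsc = np.array([0.10, 0.30, 0.60, 0.90])   # example per-pixel fractions
print(np.digitize(fsc, edges))             # category index 0-3 per pixel
```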
Page 8, line 3: Recommend changing “no-forest” to “non-forest”.
Page 8, lines 20-21: The superscripts used to define snow cover for the Terra and Aqua MODIS sensors (e.g., “S^Aqua”) seem to be reversed for the given satellite. I believe “Terra” was meant for the Terra satellite snow cover ground status, “S”: “Whether a pixel in Terra (S^Aqua)…” should read “Whether a pixel in Terra (S^Terra)…”, and likewise for the Aqua snow cover term that follows.
Page 17, lines 27-28: You may want to clarify within the sentence that starts as “Fig. 7A and 7a show …”, that this corresponds to the forest land type.
Page 17, line 30: Within the phrase, “best performance on the evaluation data”, change “on” to “for”.
Page 18, line 3: Remove “can” from “we can found that …”
Page 18, lines 27-28: You may want to update this sentence to include the additional years you drew randomly from for snow cover validation – 2008 and 2009.
The description of Fig. 8 on page 18 should mention the date of the images (February 27, 2017) sooner in the paragraph.
Page 19, line 10: Add “be” to “may easily be neglected”.
Page 19, line 30: Change “In additional” to “In addition”.
Page 20, line 6: Change: “goo agreement” to “good agreement”.
Page 20, lines 12-14: The authors accidentally repeated information from Section 4.3 describing the results of Fig. 8, which is not related to what this paragraph describes for Fig. 11. I recommend removing the last two sentences of this paragraph.
Page 21, line 10: The authors may simply want to list the datasets used, such as: “The estimation results of the random forest model [for the training, testing and evaluation] datasets …”
Page 21, line 14: Change “this” to “these” in the phrase, “Even in [this] cases …”
Page 22, line 10: The terms used here, “under-forested and over-forested,” are, in the context of snow cover, more often expressed as “under or above the forest canopy.” I would recommend updating this paragraph to use these terms.
Table 5 caption: Change “brackets” to “parentheses”. Brackets are different from what is used in the table.
Table 7: This table and its caption are hard to follow, especially the percentages reported on the right. Please clarify in the caption what these percentages indicate.
Figure 2: The diagram still shows just 2017 for the fractional snow cover “evaluation” dataset. I believe the authors added 2008-2009 to the validation dataset; is this correct? If so, the authors may want to update the diagram to reflect the total years used in the validation period.
Figure 9 B: X-axis label is misspelled – change: “Snow detph” to “Snow depth”.