Comment on tc-2021-258

This paper presents a comprehensive comparison of ten passive microwave sea ice concentration products with Landsat visible imagery. Comparisons are made in both hemispheres and over a wide range of spring through autumn conditions. The results indicate varying performance of the algorithms, with SICCI providing the best linear agreement with Landsat across different concentrations. The CBT, NOAA-CDR, and NT2 AMSR-E products have the smallest overall biases, though this may relate to the truncation of values at 0% and 100%; NT2 substantially overestimates concentration in the Antarctic.


General Comment
This is a very in-depth and comprehensive paper that adds to the excellent passive microwave sea ice comparison studies of the past couple of years (Kern et al., 2019; Kern et al., 2020). The study is thorough and well laid out. My only main criticism is that it is a very bulky manuscript and difficult to take in all at once. I note in the comments below that only three of the six case studies are presented in the main manuscript, with the other three relegated to the supplement. This is fine, but it makes a somewhat arbitrary (maybe there is a rationale?) split between what is in the main paper and what is in the supplement. And the supplement itself is quite extensive, at 18 pages, albeit much of that is figures and tables. I wonder if there might be value in splitting the paper into two parts: one paper with the main hemispheric comparisons and a second examining the case studies. I leave this to the authors and the editor, but it may be something to consider. I have some fairly minor comments below that should be addressed by the authors. I recommend acceptance after minor revisions.
Specific Comments (by line number):

127-128: In Table 1, the CBT and NT2 AMSR-E/2 products used are at 25 km resolution, but there is also a 12.5 km resolution product. Is there a reason to use the lower resolution product over the 12.5 km one? Higher resolution will pick up more detail and generally be more accurate. I can see, for simplicity, picking only one or the other, as the differences shouldn't be too large, but the 12.5 km product would make more sense to me and would have been a second 12.5 km product alongside SICCI-12km and ASI. Perhaps you wanted to be consistent with the SSMIS 25 km products?
138: In Table 1, the references appear to be ATBDs or journal papers about the products. However, one should also reference the products themselves where available. I do see such references in the Reference list (e.g., Meier et al., 2017 for the NOAA CDR), but they are not listed in Table 1 or, as far as I can tell, within the manuscript text; in Table 1, for instance, the references provided for the NOAA CDR are an ATBD and a journal article. I would suggest adding the actual product citation in the far right column, again where available.
Also, I will note that the NOAA CDR used here is apparently Version 3. This is fine and there is no need to change it. But there is now a published Version 4 that has some notable differences from Version 3, though nothing that I think would substantially affect your results. Nonetheless, this highlights the need to cite the specific dataset, including the version, where possible, so that there is clarity about what data are being used.
160: Though referenced, it seems simple to actually provide the a and b values used in Equation 1, which would be convenient for the reader. Could these perhaps be added to Table 2?

198: I think most readers would be very familiar with the projections/grids, so this is a very minor point, but you could provide the EPSG codes (or Proj4 strings) to specify them exactly. For example, the NSIDC PS grid (EPSG 3411) is slightly different from another similar WGS84 NSIDC PS grid (EPSG 3413).

227: What does "arbitrarily" mean? Were scenes selected randomly, or did you just pick scenes that looked "good" to use?

384-393: You mention snow metamorphism due to melt and melt-refreeze cycles. Another aspect could perhaps be flooded snow and snow-ice formation due to the weight of the snow on the ice causing negative freeboard.

398: Is "ice tongue" the correct term here? I think of an ice tongue as relating to marine-terminating glaciers or ice shelves. You could perhaps use "patch" instead of "tongue", or "floe" or "collection of floes".

399: The oversampling issue seems quite important here. I assume it is discussed in the earlier papers referenced, so it isn't necessary to go into any great detail, but I do think it is worthwhile to mention. One place would be in the discussion of Table 1, where gridded resolutions are noted; in that context you could note that the sensor footprint resolutions are often coarser and thus the effective resolution is lower (coarser) than the gridded resolution. Then here at line 399 you could refer back to that to indicate the resolution issue. Otherwise, as it stands now, this sentence seems to lack context. EDIT: I was writing comments as I read through and had not yet seen Section 5.1, which addresses the above comment in nice detail. Perhaps just add a note about the footprint resolution around Table 1 and at line 399, indicating that it will be discussed in detail in Section 5.1.
You could consider taking the first paragraph of 5.1 and moving it to Section 2.1, but I can see where that might make that section overly long.

425: Curious as to the rationale for examining the latter three cases in the main manuscript rather than the first three. I recognize the desire/need to limit the length of the main manuscript, which is quite long. Choosing three is fine, but why those three? Was it arbitrary, or was there a reason to look at those three over the others? One thought: as noted, the main manuscript is quite long and these case studies add substantially to it. I wonder if selecting just one as a representative example and relegating the other five to the supplement might be better? Or perhaps two contrasting cases, e.g., freeze-up vs. high concentration?

441-453: I wonder if another reason could, in part, be the sampling/sensor resolution issue. At the low resolution of PMW, characteristics can be "smeared" over a larger area, so high-concentration ice could be smeared into lower-concentration ice regions, causing a high bias. The fact that this happens as much in the 12.5 km product as in the 25 km product perhaps argues against this, but it could contribute. I also note in Figure 5 that ASI is particularly low. That suggests there could be some surface and/or atmospheric effect to which the 85-90 GHz channels are particularly sensitive but which may also affect the lower frequency channels?
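As an aside on the line 198 comment above: the distinction between the two similar-looking NSIDC polar stereographic grids is easy to verify programmatically, which is part of why citing the EPSG code is useful. A minimal sketch (assuming the third-party pyproj package is available; this snippet is illustrative and not from the manuscript):

```python
# Illustrative check that EPSG 3411 and EPSG 3413 are not interchangeable:
# the original NSIDC polar stereographic grid (3411) is defined on the
# Hughes 1980 ellipsoid, while 3413 is defined on WGS 84.
from pyproj import CRS  # assumes pyproj is installed

ell_3411 = CRS.from_epsg(3411).ellipsoid.name
ell_3413 = CRS.from_epsg(3413).ellipsoid.name
print(ell_3411, "vs", ell_3413)
```

Stating the EPSG code (and dataset version, per the comment at line 138) removes any ambiguity about which grid and datum were actually used.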

Technical Corrections (by line number):
58: Suggest omitting "for sure" or substituting something like "clearly" or "definitely".