I appreciate the effort made by the authors to address my original concerns with the first draft, and I think the revised version is improved. However, in my opinion there are several issues that should be addressed before I would consider the manuscript publishable. Importantly, the manuscript requires a careful read-through for typos, as there are several throughout.
L1; The title is not very specific. Suggest changing to something like "Benchmark seasonal prediction skill estimates based on regional indices".
L80-82; This statement is a bit trivial, and it's not clear what is being implied in relation to the previous sentence.
L87; Should state what the "perfect model" assumption is. Also, perfect model simulations don't just reveal information about persistence in the model (even though persistence can of course be computed from such simulations); skill in perfect model experiments comes from a number of sources.
L96; "Sea ice" → "sea ice extent".
L103; Should note that the first two references are based on statistical models, whereas the third is a conclusion from a dynamical model hindcast study and is only true for winter forecasts. It seems more appropriate to reference that sea ice thickness is an important source of predictive skill for the summer (references below), as this season is the main focus of this paper.
Day, J. J., E. Hawkins, and S. Tietsche, 2014b: Will Arctic sea ice thickness initialization improve seasonal forecast skill? Geophys. Res. Lett., 41, 7566–7575
Collow, T. W., W. Wang, A. Kumar, and J. Zhang, 2015: Improving Arctic sea ice prediction using PIOMAS initial sea ice thickness in a coupled ocean–atmosphere model. Mon. Wea. Rev., 143, 4618–4630
Dirkson, A., W. J. Merryfield, and A. Monahan, 2017: Impacts of sea ice thickness initialization on seasonal Arctic sea ice predictions. J. Climate, 30, 1001–1017
Zhang, Y., C.M. Bitz, J.L. Anderson, N. Collins, J. Hendricks, T. Hoar, K. Raeder, and F. Massonnet, 2018: Insights on Sea Ice Data Assimilation from Perfect Model Observing System Simulation Experiments. J. Climate, 31, 5911–5926
L118; While true, this study is only looking at how the trend impacts the anomaly correlation metric (a strictly mathematical consequence of that metric), whereas the Drobot 2003 study speculated that the statistical relationships between the predictors and the sea ice might change due to evolving physical relationships between them.
L118-121; Not sure how this statement fits in with the rest of the paragraph, and the same point was already made in the preceding paragraph when referencing Bushuk et al 2017.
L149-152; Except that most of these studies used a long model control run to assess predictability, and were not restricted to the post-1979 satellite record.
L170; "Sea ice extent" → "Pan-Arctic sea ice extent".
L176-180; Could reference:
Lindsay, R., and A. Schweiger, 2015: Arctic sea ice thickness loss determined using subsurface, aircraft, and satellite observations. The Cryosphere, 9, 269–283.
L179; CryoSat-2? CryoSat-1 wasn’t successful. Also, reference needed.
L261-266; Dirkson et al. (2017) raised this very issue and suggested the use of a quadratic fit to detrend pan-Arctic sea ice area, in contrast to the linear fit used by the methods cited here (a minimal illustration follows the reference below).
Dirkson, A., W. J. Merryfield, and A. Monahan, 2017: Impacts of sea ice thickness initialization on seasonal Arctic sea ice predictions. J. Climate, 30, 1001–1017
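To illustrate the difference I have in mind (a minimal sketch with a placeholder series; only the order of the polynomial fit differs):

    import numpy as np

    years = np.arange(1979, 2018)
    sia = np.random.rand(years.size)   # placeholder for the pan-Arctic sea ice area series

    # Linear detrending, as in the methods cited in the manuscript
    anom_linear = sia - np.polyval(np.polyfit(years, sia, 1), years)

    # Quadratic detrending, as suggested by Dirkson et al. (2017)
    anom_quadratic = sia - np.polyval(np.polyfit(years, sia, 2), years)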
L292-295; This was already stated in the previous paragraph.
L307-309; If `linregress' is the method used for fitting after the breakpoint year is found using `curve_fit', this should be described in the methodology before Fig. 4 is discussed. Don't you use this to detrend pan-Arctic SIE as well?
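If my reading of the method is correct, the procedure to be documented is something along the following lines (a sketch of my interpretation, not necessarily the authors' implementation; the piecewise form, placeholder data, and initial guess are assumptions on my part):

    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.stats import linregress

    def piecewise_linear(t, t_break, a, b1, b2):
        # Continuous two-segment linear function with a break at t_break
        return np.where(t < t_break, a + b1 * (t - t_break), a + b2 * (t - t_break))

    years = np.arange(1979, 2018).astype(float)
    sie = np.random.rand(years.size)   # placeholder regional SIE series

    # Step 1: estimate the breakpoint year with curve_fit
    params, _ = curve_fit(piecewise_linear, years, sie, p0=[2000.0, sie.mean(), 0.0, 0.0])
    t_break = params[0]

    # Step 2: fit the post-breakpoint period with linregress and remove that trend
    post = years >= t_break
    slope, intercept, r, p, stderr = linregress(years[post], sie[post])
    detrended = sie[post] - (intercept + slope * years[post])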
L319ff; But Fig 5 is based on non-detrended data, so not really comparable to Bushuk et al 2018.
L326-327; Much of the correlation here is due to the trend, is it not? A more informative comparison of the two periods would be based on detrended data.
L331; What statistical test is being used here?
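If, for example, a two-sided test on the Pearson correlation of the detrended series is what is intended, it would help to state this explicitly, e.g. (placeholder series):

    import numpy as np
    from scipy.stats import pearsonr

    x = np.random.rand(30)   # placeholder detrended anomaly series
    y = np.random.rand(30)   # placeholder detrended anomaly series

    r, p = pearsonr(x, y)    # two-sided p-value under the usual assumptions
    print(f"r = {r:.2f}, p = {p:.3f}")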
L337; Avoid using "significance" here if you're referring to the magnitude of the correlation and not statistical significance as measured according to a p-value.
L334; It would be more revealing to state these values in relation to the detrended values for the May and July auto-correlations. Again, this is why it would be useful to plot them on Fig 5.
L362ff; Which regions/months yield a breakpoint year in the 1960s? This feature appears in all months and should be mentioned and expanded on.
L369-370; This is redundant with the previous sentence.
L373; Again, what test is being used?
L378; "correlations" → "explained variance".
L385; Why "surprisingly"?
L386-388; "...September extents <of> East Siberian...": change <of> to <explained by>.
L397; "variations." to "variations in each region."
L411; Five subregions are referred to here; the Laptev Sea was not mentioned.
L428ff; The study referenced here is not a perfect-model study. Also, the comparison is indeed not apples to apples, so I fail to see the point in making any comparison at all.
L447; What region is being referred to here in terms of the explained variance in the BSI?
L454; Only in January-April is the trend in the Bering Sea positive.
L466; It would be more accurate to replace "predictive skill" throughout with "explained variance". And whenever "significant" is used, please clarify whether you are referring to statistical significance.
L471ff; Low skill for pan-Arctic SIE based on what? Please be specific.
I find this statement conjectural. How do your results compare quantitatively with the SIO? And how can a comparison be drawn when that study is based on only 7 years of data (a very short record) and correlation estimates are not reported there because of the small sample?
L474; shows that <the> spring.
L479; I find this sentence misleading and inaccurate. Several studies show forecast skill outperforming persistence using a non-perfect model approach (i.e., initialized hindcasts), including Bushuk et al. 2018, which presents results using both approaches. There is a long list of such studies, and even if they do not make direct comparisons against persistence themselves, it is possible to compare their reported detrended correlations for SIE against the persistence values found here, at least in summary.
L480-482; This isn’t very clear. The perfect model approach doesn't only reveal predictive skill from persistence; several sources of skill are included in these evaluations. Perfect model skill can be compared against persistence in the model, and initialized forecast skill can be compared against persistence in observations. Persistence timescales can also be compared between the model and observations, but I don't understand which comparison is being made here.
L484ff; Wouldn't it be more appropriate to compute these values based on anomaly persistence from June, July, and August SIE rather than year-to-year persistence, i.e., the approach presented in your "results" section, but using MAE and RMSE instead of explained variance or correlation? Are the values reported here based on all contributions to the SIO (heuristic, statistical, non-coupled models, fully coupled models)? I'm not sure all of these would fall under "state of the art". Also, I think this analysis belongs in the results section rather than in the summary.
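To be concrete about what I mean by anomaly persistence (a sketch with placeholder arrays; the forecast is simply the initialization-month anomaly carried onto the September trend line):

    import numpy as np

    june_anom  = np.random.rand(30)   # placeholder detrended June SIE anomalies
    sept_anom  = np.random.rand(30)   # placeholder detrended September SIE anomalies
    sept_trend = np.random.rand(30)   # placeholder fitted September trend values

    # Anomaly-persistence forecast of September SIE initialized in June
    forecast = sept_trend + june_anom
    observed = sept_trend + sept_anom

    mae  = np.mean(np.abs(forecast - observed))
    rmse = np.sqrt(np.mean((forecast - observed) ** 2))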
L499-501; For which metrics? Based on explained variance from which metrics?
Figs. 5, 7-9; statistical significance should be incorporated into each of these figures.
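One way to do this, for example, would be to overlay markers on the cells whose correlations are statistically significant (a sketch assuming matplotlib and a p-value array matching the plotted correlation matrix; all names are placeholders):

    import numpy as np
    import matplotlib.pyplot as plt

    corr = np.random.uniform(-1, 1, (12, 12))   # placeholder correlation matrix
    pval = np.random.uniform(0, 1, (12, 12))    # placeholder p-values

    fig, ax = plt.subplots()
    im = ax.pcolormesh(corr, vmin=-1, vmax=1, cmap="RdBu_r")
    fig.colorbar(im, ax=ax)

    # Dot the cells that are significant at the 5% level
    sig_rows, sig_cols = np.where(pval < 0.05)
    ax.plot(sig_cols + 0.5, sig_rows + 0.5, "k.", markersize=3)
    plt.show()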