Convolutional Neural Network and Long Short-Term Memory Models for Ice-Jam Prediction
For the most part, the authors addressed the various comments I had on the previous version.
In response to my earlier comments, the authors:
• Included benchmark results for several machine learning models, demonstrating that the deep learning models had higher performance than these benchmarks.
• Re-worked the manuscript structure to better separate model development details from the experiment results.
• Provided discussion on the interpretability and spatial generality (model transferability) of the deep learning models.
Although the manuscript has improved, several issues remain to be resolved (an improved literature review, a clearer description of model development details, etc.), as outlined in my comments below. Finally, the paper would greatly benefit from thorough editing: a number of grammatical issues remain in the text, and others were introduced in the revised manuscript.
I think the paper needs a minor revision before it can be considered for publication.
Line (L) numbers mentioned in my comments refer to the tracked-changes version of the manuscript. Additional comments can be found in the marked PDF of the authors’ revised (tracked-changes) manuscript (see attachments).
1. The authors did not provide any literature review in the Introduction on the use of machine learning models for ice jam prediction, despite being requested to do so by the second reviewer. The authors cite their review article that covers this topic (Madaeni et al., 2020) but do not provide any details on the machine learning methods that have been used for ice jam prediction, which seems essential to highlight in the present study.
2. Section 2.2: it would be good to mention the software packages used for developing the machine learning models. Without this information, for example, it is difficult to know what is being referred to as ‘default values’ for the decision tree method.
3. L344-345: I think this is backwards: a loss function is not needed to evaluate model error; only a prediction and a target are. In many cases, however, the model error is needed to evaluate the loss function (e.g., if the loss function is the mean squared error or some regularized version of it). Perhaps the authors meant that the loss function is used to guide the optimization?
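To make the distinction concrete, a minimal sketch (the variable values are purely illustrative and not taken from the manuscript):

```python
import numpy as np

# Hypothetical prediction and target vectors (illustrative values only)
y_pred = np.array([0.2, 0.8, 0.6])
y_true = np.array([0.0, 1.0, 1.0])

# Evaluating model error requires only a prediction and a target ...
errors = y_pred - y_true

# ... whereas the loss function (here, mean squared error) is computed
# from that error, and it is the quantity the optimizer minimizes
mse_loss = np.mean(errors ** 2)
print(round(float(mse_loss), 4))  # 0.08
```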
4. Authors’ reply to my former comment 8: If grid search has ‘poor coverage in dimension’, it is not clear how trial and error overcomes this. How did the authors know which hyper-parameter values to try in trial and error (i.e., those reported in Table 3)? Were the values in Table 3 decided upon based on recommendations from the literature? If so, any sources that guided these decisions would be good to cite.
The authors mention that various combinations of the hyper-parameters were applied (L378) but do not state which combinations were explored. The authors should provide more information here to enable their experiments to be reproduced; that is, assuming someone had access to the same dataset, sufficient information should be provided to enable them to arrive at the same (or at least similar) results.
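For example, simply enumerating the candidate values would make the explored search space explicit and reproducible (a sketch only; the hyper-parameter names and values below are hypothetical, not those of the manuscript’s Table 3):

```python
from itertools import product

# Hypothetical hyper-parameter candidates (illustrative values only)
grid = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [16, 32],
    "num_filters": [32, 64],
}

# Listing every combination documents exactly which configurations were
# explored, so another researcher can repeat the same search
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(combinations))  # 12
```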
5. Supplemental information file:
a. It would be good for all acronyms (and abbreviations, if any) to be spelled out in full at first use.
b. It’s not clear what is meant by ‘channel’. Do the authors mean ‘input’?
c. I think the authors mean ‘estimating gradients’ rather than ‘applying gradients’?
d. It appears the word ‘term’ is missing after ‘momentum’ in the first paragraph of the last section.
e. The authors should appropriately revise ‘high momentums’. Perhaps ‘when using high values for the momentum term’ would be more appropriate.
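To clarify the terminology suggested in (c)-(e), a minimal sketch of one stochastic-gradient-descent update with a momentum term (function and variable names are my own, purely for illustration):

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update: the momentum term accumulates an
    exponentially decaying average of past estimated gradients."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# High values of the momentum term (e.g. 0.9 or above) give past gradient
# estimates more influence on the update direction
w = np.zeros(3)
v = np.zeros(3)
grad = np.array([1.0, -2.0, 0.5])
w, v = sgd_momentum_step(w, grad, v)
```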
6. The referencing format is inconsistent (see, e.g., L 593-594).
7. Authors’ reply to my former comment 10: it’s not clear what is meant by ‘model implementations’ in this context. I suggest removing these words or using terms that better describe the technical matter.
8. Authors’ reply to my former comment 16:
a. Why not combine Tables 11 and 12? This will make it easier for the reader to compare the performance of the deep learning and machine learning models.
b. It would be good for the authors to mention these benchmark machine learning methods in the abstract and include a sentence stating the relative improvement in performance achieved by the deep learning models (in comparison to the benchmarks).
9. Authors’ reply to my former comment 17:
a. In the authors’ response A, perhaps ‘time consuming to train’ would be more appropriate than ‘time consuming’?
b. In the authors’ response B:
i. What characteristics of the model and/or data make the models transferable to New Brunswick and Eastern Ontario? Did the authors run the model on data from these provinces to verify this assertion? If so, this should be mentioned. If not, then the authors should be careful to use appropriate language. For example, the authors may instead mention that they anticipate the deep learning models developed in this research to perform well in these geographical zones for reasons X, Y, and Z.
ii. Please remove ‘pretty’ and ‘really’.
iii. In ‘correct predictions with the wrong’, replace ‘with’ with ‘for’.
Madaeni, F., Lhissou, R., Chokmani, K., Raymond, S., Gauthier, Y., 2020. Ice jam formation, breakup and prediction methods based on hydroclimatic data using artificial intelligence: A review. Cold Reg. Sci. Technol. 174, 103032. https://doi.org/10.1016/j.coldregions.2020.103032