the Creative Commons Attribution 4.0 License.
Deep learning subgrid-scale parametrisations for short-term forecasting of sea-ice dynamics with a Maxwell elasto-brittle rheology
Tobias Sebastian Finn
Charlotte Durand
Alban Farchi
Marc Bocquet
Yumeng Chen
Alberto Carrassi
Véronique Dansereau
Download
- Final revised paper (published on 21 Jul 2023)
- Preprint (discussion started on 02 Jan 2023)
Interactive discussion
Status: closed
- RC1: 'Comment on egusphere-2022-1342', Anonymous Referee #1, 14 Mar 2023
Referee comment on "Deep learning of subgrid-scale parametrisations for short-term forecasting of sea-ice dynamics with a Maxwell-Elasto-Brittle rheology" by T. S. Finn et al.
The authors present an idealised study of sea-ice fracture in a channel due to wind forcing, demonstrating that a neural network (NN) is able to significantly reduce errors of a lower-resolution version of the physical model with respect to a higher-resolution version of the same model for 10-minute forecasts. They conclude that the NN has learned the tendencies from the unresolved scales in the lower-resolution model, and can therefore be used to parameterise these unresolved scales.
I appreciate the originality of the work and the level of detailed analysis it provides. It fits well with current efforts in the community to use machine learning for parameterisation of unresolved scales in geophysical models. However, given the very idealised setup, I have some concerns about the wider applicability of the results. Below, I spell that out in comments which I would like the authors to address before publication:
General Comments
1) There are a number of very strong idealisations and restrictions in the setup of this study: a) it is a so-called "perfect model" study, i.e. the performance of the lower-resolution model with/without NN corrections is assessed against a "truth" which is a simulation of that same model at higher resolution, without involving any observations or simulations from a different model; b) the forecast lead times considered are extremely short for most real-life weather and climate applications (only up to 1h); c) the spatial domain is a simple rectangular channel; d) there is no treatment of sea-ice thermodynamics. Given these very strong idealisations and restrictions, one would hope for results that are a bit more convincing than the ones presented. I have concerns about whether the methods presented will be useful in a more realistic context, where each of the above assumptions will need to be relaxed. Can the authors please add some in-depth discussion (or even preliminary analysis) about what they think will happen if their methods are applied in a more realistic context?
2) Figure 7 and the corresponding text make me wonder how much of the error reduction achieved by the NN is actually due to correcting the bias (i.e. mean error) of the low-resolution simulation w.r.t. the high-resolution simulation. Can the authors please provide some analysis to quantify the contribution of bias to the overall errors, with and without the NN corrections? For instance, one could just decompose the mean squared errors shown in the manuscript into squared bias and variance of the errors. I am asking this because there is a range of other methods to treat biases (e.g. a-priori by tuning the model, and a-posteriori by subtracting them from the forecast before further analysis). These methods are often simpler than the machine-learning approach and are in wide use in the weather and climate community. Utilising a complex and costly machine-learning approach only pays off if it is clearly superior to other available methods.
3) Following up on the previous comment, I would like the authors to comment on potential overfitting of the NN in their methods. If I did the maths correctly, there are about 4500 degrees of freedom in the lower-resolution physical model (9 variables times 500 grid points). As stated on line 197, the NN has 1.2 million trainable parameters. So one could argue the NN has orders of magnitude more degrees of freedom than features it is learning from or results it is predicting. I am not an expert on machine learning, but that strikes me as odd - could the authors please comment on that? I would also like to see some quantitative analysis on the risk of overfitting.
4) Please revise the presentation of the methods, which is insufficient in some places and difficult to follow in others. See technical comments.
5) I am afraid I do not quite understand the motivation why a projection to a Cartesian grid is needed (Section 3.2). It seems to complicate the methods unnecessarily. Can the authors please clarify the motivation for doing this, and what the feasibility/implications would be of doing the analysis on the original triangular grid? Is this just a reflection of the fact that the standard machine-learning libraries for spatial analysis cannot deal with non-Cartesian grids?
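The decomposition suggested in general comment 2 can be illustrated with a minimal sketch. It uses synthetic data only: the function name `decompose_mse`, the constant offset of 0.5 and the noise level are assumptions made for illustration and are not taken from the manuscript. The mean is taken over all samples at once; in practice one would have to decide whether to average over time, space, or ensemble members.

```python
import numpy as np

def decompose_mse(forecast, truth):
    """Split the mean squared error into squared bias and error variance."""
    error = np.asarray(forecast, dtype=float) - np.asarray(truth, dtype=float)
    mse = np.mean(error ** 2)
    bias = np.mean(error)
    # Population variance (ddof=0), so that mse == bias**2 + variance holds exactly.
    variance = np.var(error)
    return mse, bias ** 2, variance

# Synthetic, purely illustrative data: a "forecast" with a constant offset plus noise.
rng = np.random.default_rng(0)
truth = rng.normal(size=1000)
forecast = truth + 0.5 + 0.1 * rng.normal(size=1000)

mse, bias_sq, variance = decompose_mse(forecast, truth)
print(f"MSE={mse:.4f}, bias^2={bias_sq:.4f}, variance={variance:.4f}")
print(f"bias share of MSE: {bias_sq / mse:.1%}")
```

Applying the same function to the forecasts with and without the NN correction would show whether the correction mainly removes the bias term or also reduces the variance term.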
Technical comments:
- Figure 1: Please specify which physical variable is displayed (damage?). Could the authors find a more convincing "showcase" example? By visual inspection, it looks to me like there are still substantial errors in the "hybrid" field, which seems at odds with the claim of a 75% error reduction. Please quantify the error reduction for the case shown.
- l. 35f. (and elsewhere): I am not sure what "wave-like" and "channel-like" mean - please be more precise.
- Line 76 & 89: a 10 minute (or even 1 hour) forecast is extremely short both for main-stream earth system models and real-world applications. Can you please comment on that and justify looking at these very short time scales?
- The introduction in ll. 80-92 already gives too much technical detail about the methods. This belongs elsewhere.
- In Figure 2 and the corresponding text, the authors need to help the reader to get a physical understanding of the situation that causes the ice to fracture. Please add arrows indicating the wind field, and refer to Equation (1). Please specify which direction is x and which is y.
- Also Figure 2: Please use other colours than black and red to indicate the two grids, otherwise it is difficult to see for color-blind people.
- Line 134: I do not know what "wave-like" means. Please be more precise, and provide the equation with the wind forcing at the earliest possible place in the text.
- Figure 6: I much appreciate the sensitivity testing in Section 5.2, very good! However, I am puzzled by the very weak cross-variable coupling in the permutation feature importance. It seems contradictory to your claim that the NN has "learned the dynamics" of the physical model. For instance, for damage as an output variable, it seems that the NN only extracts information from the damage itself, all other input variables are unimportant! Could you please provide some more explanation/clarification/analysis on this?
- Figure 7: It is striking that the low-resolution model is much worse than simple persistence. This makes me wonder whether the NN is just correcting biases (see general comment #2). Please provide some discussion on this.
- Lines 516 - 519: This is a good start, but a much more in-depth discussion is needed here of the implications and wider applicability of the work presented (see general comment #1).
Citation: https://doi.org/10.5194/egusphere-2022-1342-RC1
- AC1: 'Reply on RC1', Tobias Finn, 19 Apr 2023
- RC2: 'Comment on egusphere-2022-1342', Nils Hutter, 23 Mar 2023
In this manuscript, the authors present a novel machine-learning method to correct unresolved sea-ice dynamics in simulations with low resolution. From the comparison of high- and low-resolution simulations in an idealized domain, neural networks are trained to predict the residual between both simulations at a certain lead time, and they demonstrate promising performance. This approach aligns with recent developments in climate research, where machine learning is used to parameterize unresolved processes in low-resolution simulations. The study presents several innovative approaches to sea-ice science and provides a thorough evaluation of the performance of the trained ML algorithms. To the best of my knowledge, this paper is one of the most advanced works employing machine learning in the field of sea-ice dynamics. The presented analysis is sound and requires only a few modifications that I list below. The authors need, however, to improve the presentation of the paper, as it can be difficult to follow in major parts. The manuscript is overly packed with information and details that can be challenging to grasp, even with a background in sea-ice dynamics and machine learning. I strongly recommend the manuscript for publication in The Cryosphere after the authors have addressed the issues mentioned and detailed below.
General comments:
- Target audience: I think the authors should keep two audiences in mind that will be interested in this work: sea-ice scientists and ML experts. The manuscript in its current form describes the ML part, network design, and thorough evaluation of the performance of the NN in great detail. I appreciate this for reproducible science, but am also afraid that the amount of detail makes the manuscript hard to follow for readers with a sea-ice background and limited knowledge of ML. This could be addressed by briefly introducing the many ML concepts before discussing them at length and/or reducing or reorganizing the information content of the paper (which I will explain in the next point). I highly recommend reading and editing the paper through the lenses of both audiences.
- Readability: I had a hard time following the first half of the paper on my first read. After reading the entire manuscript and knowing the subject, I could follow it better on a second read. Therefore, I would suggest editing and maybe restructuring this part of the manuscript thoroughly. In general, the manuscript holds a lot of information, in part to describe the set-up, analysis, and results in detail, but also information that is only loosely linked and not strictly necessary for the understanding or interpretation of the paper. Especially the latter makes it hard to stay focused on the storyline. I recommend going through the paper and reconsidering which information is really required. This would also give the authors more space to explain important concepts in more detail. Section 3.1 helped me a lot to understand what you are after and I definitely recommend moving it further up in the manuscript, maybe even into the introduction. I would also consider moving the description of the data generation (Section 4) before the description of the ML, which would help to understand the network design etc. Section 2 is rather long and I would consider shortening it and eventually merging it with Section 4 as both discuss the sea ice model and the simulations. Up to Section 5, I had a hard time finding a storyline to follow. Please try to emphasize your storyline more strongly there and try to guide readers better.
- Lead time for update: The authors use a lead time of 10 min 8 s to update the coarse resolution model. While all other design choices have been explained in detail, this is not the case for the lead time. Why did you choose this lead time? Wouldn’t you expect a shorter lead time to improve the results? With the existing twin simulations, it is straightforward to extract the residual between the truth and forecast model also at other lead times. Therefore, I strongly suggest studying also the effect of different lead times here. I would be especially interested to see if shorter lead times improve the seesaw patterns of the trajectories of the hybrid model in Figure 7.
- Generalization: The neural networks presented in the paper are trained on a specific (idealized) model configuration, which is also a good choice for this proof of concept. There is, however, only limited discussion of what steps are needed to use the same approach in other model configurations, especially realistic ones: do users need to train different NNs for each new model configuration, which will get very expensive as high-resolution truth simulations are required? Or can the trained weights of the kernel also be applied to different grid geometries in different configurations, or be used as starting weights to reduce the amount of training data? A discussion of these considerations would be helpful to get an impression of how feasibly and flexibly this approach can be applied in other model set-ups.
Specific comments
L1: "of"
Remove “of” as “subgrid-scale” is an adjective.
Abstract.
I would consider rearranging the abstract and maybe shortening sentences. It might be a matter of taste, but I had a hard time following it on first reading.
L5: includes important inductive biases needed for sea-ice dynamics.
Unclear what is meant by these biases.
L7: we cast the subgrid-scale parametrisation as model error correction
Unclear, please rephrase.
L11: cycling
What do you mean by cycling?
L11: physically-explainable input-to-output relation
It is not clear what is meant by “physically-explainable”, please clarify.
L16: dynamics of sea ice at an unprecedented resolution and accuracy
Please clarify what unprecedented means with respect to the resolution. All three papers use simulations with a resolution of 10km or lower, while much higher resolution sea-ice simulations have been presented. Do you mean unprecedented accuracy at the given resolution?
L16: Elasto-Brittle
Why capitalized? Here and elsewhere
L17: represent
Reproduce?
L19: single grid cell at the mesoscale
What is meant by mesoscale here? Please clarify
L31: the mesoscale
See comment above: please define the length scale mesoscale refers to here.
Figure 1
Please clarify that (a) shows the high-resolution initial conditions, but (b) and (c) the low-resolution forecasts one hour later. Why not show for all the damage after 1h forecast, so that the reader actually gets an impression if the hybrid model in (c) is closer to the high-resolution “truth” or not?
L37: possibly projected
What is meant by this? Please clarify
L39-40: Here, the low-resolution simulation (b) misses the rapidly developed opening of sea ice in the high-resolution simulation (a).
Does this refer to the upper or the lower opening in the figure? Please clarify in the text.
L54-59: paragraph about marginal ice zone:
Does your regional model include the MIZ? To me, it looks more like pack ice with cracks. Also along leads there are sharp transitions that the NN needs to handle, so I think it is justified to present this issue here. However, please frame it in a way that fits your problem at hand.
L56: jump
Step function?
L98: as well
Remove?
L123-124: As the nodes are shared in the first-order elements, there are more grid points for all variables that are defined as zeroth-order elements than for the velocity and forcing components.
What is the relevance of this? Could you elaborate on whether this is an important point that needs to be considered when interpreting the presented results?
Section 3: A deep learning based subgrid-scale parametrisation
The presentation of the machine learning tools is done very thoroughly, which I appreciate and see as valuable for reproducibility. Given that ML applications in this field of science are just emerging and there are many geophysical and climate scientists interested in advancing in this field, I am afraid that the description is pitched at too high a level for an audience with limited knowledge of ML. To also target this part of the scientific community and broaden the audience for this paper, I recommend summarising the main parts and ideas behind it more comprehensively for readers with limited ML background at the beginning of this section. While I see this recommendation as optional, as all necessary information is given in the current draft, I want to emphasize the large benefit I see in adding a summary like this.
L147-149: There, linear functions combine pixel-wise (i.e., processing each element defining grid point independently) the extracted features. Each linear function is shared across all grid points for each predicted residual variable.
The linear transformation from features to residuals is not clear to me. Does this involve combining different features for each grid point, where the weights of these combinations are learned in the training? Or is it a fixed combination? Please clarify the text accordingly.
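One possible reading of the quoted sentence, sketched here only to make the question concrete, is a learned linear map that is applied independently at every grid point, which is equivalent to a 1x1 convolution. Whether this matches the manuscript is exactly what the comment asks the authors to clarify. All sizes, array names and the random "weights" below are hypothetical; in a trained network the weights and biases would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_outputs, ny, nx = 4, 2, 3, 5  # hypothetical sizes

features = rng.normal(size=(n_features, ny, nx))    # extracted feature maps
weights = rng.normal(size=(n_outputs, n_features))  # stand-in for learned weights
bias = rng.normal(size=n_outputs)                   # stand-in for learned biases

# The same linear function (one row of `weights` per output variable) is
# applied at every grid point; no information is mixed between grid points.
residual = np.einsum("of,fyx->oyx", weights, features) + bias[:, None, None]

# Equivalent computation at a single grid point (j, i).
j, i = 1, 2
point = weights @ features[:, j, i] + bias
print(np.allclose(residual[:, j, i], point))
```

Under this reading, the combination of features differs per output variable but not per grid point.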
Figure 3.
Does the red, blue, and grey color code for arrows, boxes, and labels have a specific meaning (trainable vs fixed or similar)? If so please give some explanation.
L154: 3.1 Problem formulation
This section helps to understand the goal of your approach, and I strongly suggest moving it further up (maybe even into the introduction) to give the reader a better understanding of what you are trying to achieve before going into the details of the model or the pipeline.
L180-182: Note, for coarse Cartesian spaces, the mapping from Cartesian space to triangular space can be non-surjective, meaning that not all triangular elements are covered by at least one Cartesian element: the pseudo-inverse is in this case rank deficient.
This is unclear to me: why should the bigger triangles of the coarse resolution simulations not be covered by the higher resolution Cartesian elements? If at all, I imagine that should be an issue of the high-resolution grid with smaller triangles. Please clarify.
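The linear-algebra point in the quoted sentence can be illustrated with a toy one-dimensional analogue. This is not the manuscript's projection operator: the nearest-centre assignment, the interval "elements" standing in for triangles, and the function name `projection_matrix` are all assumptions made for illustration. Each Cartesian cell takes its value from the element that contains its centre; if the Cartesian grid is too coarse, some elements receive no cell, the matrix has empty columns, and its pseudo-inverse is rank deficient.

```python
import numpy as np

def projection_matrix(element_edges, n_cartesian):
    """Map element values to Cartesian cells on [0, 1] via the cell centres."""
    centres = (np.arange(n_cartesian) + 0.5) / n_cartesian
    # Index of the element whose interval contains each Cartesian cell centre.
    idx = np.searchsorted(element_edges, centres, side="right") - 1
    n_elements = len(element_edges) - 1
    P = np.zeros((n_cartesian, n_elements))
    P[np.arange(n_cartesian), idx] = 1.0
    return P

edges = np.array([0.0, 0.1, 0.4, 0.5, 1.0])  # four irregular 1D "elements"

ranks = {}
for n_cart in (3, 20):
    P = projection_matrix(edges, n_cart)
    covered = P.sum(axis=0) > 0          # which elements are hit by a cell
    ranks[n_cart] = np.linalg.matrix_rank(P)
    print(n_cart, covered, ranks[n_cart])
```

In this toy setting the issue appears only when the Cartesian cells are coarse relative to the smallest elements, and it disappears once the Cartesian grid is fine enough.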
L196: complete U-net architecture
Do I understand the architecture correctly that you downscale only once in your U-Net? If that is the case, the illustration in Figure 3 is misleading, as four downscaling steps are shown. Please clarify this and adapt the figure if necessary.
Section 4: Experimental setup
This section describes how data to train the NN is created. Consider renaming this section to e.g. “training data generation” or similar. I also would consider moving this section before the details on the ML algorithms as I feel it helps to know the data before getting introduced to the detailed methods.
L305: their expectation
-> their expected value?
Table 3 - Caption: MAE
Is the MAE computed at high or low resolution?
Table 3 - Caption: A score of one would correspond to the performance of the geophysical model forecast in the training dataset.
Do you mean the coarse resolution geophysical model forecasting the high resolution run? Please clarify.
Table 3 - Caption: the afterwards used architecture and the best scores
If this refers to the hybrid model or online use of the correction, please write this. Also, consider writing “that shows” instead of “and”.
L320: persistence forecast performs
Could you for clarity once define what you use as a persistence forecast? It might be obvious to you, but for readers outside the field, it will be helpful.
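For readers outside the field, the usual meaning of a persistence forecast is that the initial state is held fixed for all lead times. The sketch below shows that common definition on a toy trajectory; the manuscript's exact definition (for example, which initial conditions are used and at which resolution) is what the comment asks the authors to state, and the function names and numbers here are hypothetical.

```python
import numpy as np

def persistence_forecast(initial_state, n_lead_times):
    """Repeat the initial state unchanged for every lead time."""
    return np.repeat(np.asarray(initial_state)[None, ...], n_lead_times, axis=0)

def mean_absolute_error(forecast, truth):
    """MAE per lead time, averaged over all remaining (spatial) axes."""
    return np.mean(np.abs(forecast - truth), axis=tuple(range(1, truth.ndim)))

# Toy "truth" trajectory with shape (time, grid point).
truth = np.array([[0.0, 1.0],
                  [0.5, 1.0],
                  [1.0, 2.0]])

forecast = persistence_forecast(truth[0], n_lead_times=3)
mae = mean_absolute_error(forecast, truth)
print(mae)
```

A model forecast that does worse than this baseline at a given lead time adds no skill over simply keeping the initial state.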
L335: Such localised features
Consider adding the length scale in km if you think this finding is generalizable or holds valuable information for other processes related to sea-ice deformation.
L340: a generally smoother background pattern
What is meant by this? Please clarify and rephrase.
Figure 6
I) Which colormap does (a) use?
II) I am wondering if also the concentration maps in the initial and forecast step would be helpful here to interpret the gradients?
L378: either
Isn’t it “both” rather than “either/or”?
L380: Table ??
-> correct reference
L382-383: Additionally, the sensitivity is directional dependent, Fig. 6g, and exhibits localised features, Fig. 6c and i
Could you discuss these results also in the light of physical understanding that we can gain from the gradients? From both the gradients along initial and difference, we can learn about the shortcomings of the coarse resolution simulations that the NN tries to compensate for.
L391: −1×10⁻³ and 1×10⁻³
Units?
L392: Related to optimal control theory in dynamical systems,
This is not very helpful for readers with limited background knowledge. Please elaborate more or rephrase.
L405-408: Additionally, for the velocity, stress, and damage, the drift towards … the "Initial + Forecast" experiment in these variables and averaged over all nine model variables.
Unclear, please rephrase.
L412: As the initial condition error increases with each update, the network corrects less and less forecast errors.
Could this effect be dampened by updating at higher frequencies?
L413-419:"To show the effect of this error distribution shift, … An averaged correlation of 1 would indicate a perfect pattern correlation."
This paragraph is hardly understandable with no background knowledge of the method “centred spatial pattern correlation”. I suggest describing the principle of the method and its interpretation at the beginning of the paragraph in 1-2 sentences, before describing its specifics.
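As an example of the kind of short explanation requested, a centred pattern correlation is commonly computed by removing the spatial mean from each field and then taking the Pearson correlation over all grid points. The sketch below follows that common definition; which fields the manuscript correlates and how the result is averaged are not specified here, and the function name and test fields are hypothetical.

```python
import numpy as np

def centred_pattern_correlation(field_a, field_b):
    """Pearson correlation over grid points after removing each spatial mean."""
    a = np.asarray(field_a, dtype=float).ravel()
    b = np.asarray(field_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

rng = np.random.default_rng(0)
field = rng.normal(size=(4, 6))  # synthetic 2D field, purely illustrative

same_pattern = centred_pattern_correlation(field, 2.0 * field + 3.0)
opposite_pattern = centred_pattern_correlation(field, -field)
print(same_pattern, opposite_pattern)
```

Because the spatial mean is removed and the result is normalised, the score measures only whether the spatial patterns agree, not their offset or amplitude.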
L422: especially for the divergent stress
From the previous paragraph, it sounds as if a value close to 1 is favorable, but this statement reads as if a high value close to 1 for divergent stress shows a weakness of the NN. Please clarify.
L434-435: However, the parametrisation misses the development of new strains and positions the main strain at the wrong place.
This suggests that the corrections of the NN violate the brittle model physics, as highly damaged areas are usually linked to high deformation rates. Is this correct? If so, please comment on this in the text as well, and on whether there is a way to design a network that computes corrections in accordance with the physical laws of the model.
L456-457: Therefore, using such a mapping into Cartesian space, we can apply CNNs, which can efficiently scale to larger, Arctic-wide, models.
Are you talking about Arctic-wide models on unstructured grids? Or why is the mapping needed? Please clarify the text.
L462-463: As processes have no discretized resolution in realworld, we would have difficulties to find the right resolution for the projection in such cases.
Isn’t that only an issue if you aim to train a correction with observations? If it is a model, then you would always know the resolution of the resolved scales. Please clarify.
L462: truth
Please clarify what is meant by truth: the high-resolution simulation or something different?
L464: this argument
What argument?
L503: The only way is therefore to improve the forecast model, thereby changing its attractor.
What about updating the forecasting model at higher frequencies? Please comment.
L530-532: Mapping the input data into a Cartesian space that has a higher resolution than the original space, such scalable convolutional neural networks can be applied for feature extraction in sea-ice models defined on a triangular or unstructured grid.
Something is wrong with this sentence, please correct it.
L543: total deformation
On which figure or result is the statement that the total deformation is improved based? From Fig. 8, I would agree that the damage in the hybrid model looks closer to the high-resolution run than the uncorrected low-resolution simulation, but for total deformation, it is the other way around in my eyes.
L552: Appendix
Table?
L562-564: Caused by their limited capacity, the NNs have to focus on some variables, creating an imbalance between variables, which harms the performance for other variables, like the stress in the case of "Conv (×5)".
Have you tried to train individual networks for each variable, which could balance this effect?
L565: fast NN
Is the speed of the trained U-NeXt NN an issue compared to the computational costs of the geophysical model? Or does this refer to training speeds?
Citation: https://doi.org/10.5194/egusphere-2022-1342-RC2
- AC2: 'Reply on RC2', Tobias Finn, 19 Apr 2023