We introduce a proof of concept to parametrise the unresolved subgrid scale of sea-ice dynamics with deep learning techniques.
Instead of parametrising single processes, a single neural network is trained to correct all model variables at the same time.
This data-driven approach is applied to a regional sea-ice model that accounts exclusively for dynamical processes with a Maxwell elasto-brittle rheology.
Driven by an external wind forcing in a

Sea-ice models with elasto-brittle rheologies

Snapshot of sea-ice damage for a 1 h forecast with the regional sea-ice model used here.
Shown are the high-resolution simulations (

To exemplify the impact of these unresolved subgrid-scale processes on the sea-ice dynamics, and to see how deep learning can remedy these issues, we perform twin experiments with a regional sea-ice model that depicts exclusively the dynamics in a Maxwell elasto-brittle rheology

Subgrid-scale parametrisations with machine learning have already proven useful for other Earth system components

To predict the sea-ice concentration, purely data-driven surrogate models can replace geophysical models at daily

The dynamics of sea ice hereby impose new challenges for neural networks (NNs) that should parametrise the subgrid scale.

Current sea-ice models represent leads in a band of a few pixels, and sharp transition zones can appear as a non-continuous step function within the data.
For such discrete–continuous mixture data distributions, NNs that simply learn to regress into the future tend to diffuse and blur the target

In elasto-brittle models, the handling of the internal stress depends on the fragmentation of sea ice. This dependency also leads to different forecast error distributions for different fragmentation levels, even for variables only indirectly related to the stress, like the sea-ice thickness. Consequently, for model error correction, an NN has to be trained across a range of fragmentation levels and should be able to output multimodal predictions in the best case.

As sea ice is scale-invariant up to the kilometre scale, fragmentation of sea ice propagates from small, unresolved, scales to the larger, resolved, scales.
Because the small scales are unresolved, the appearance of linear kinematic features seems to be stochastic from the resolved macro-scale point of view.
Furthermore, such features are inherently multifractal and propagate in an anisotropic medium

As a first step towards solving these challenges for NNs and giving a proof of concept, we present the aforementioned twin experiments with a regional model.
Our goal is to train NNs to correct the output of simulations with a

We introduce the problem that we try to solve, the regional sea-ice model, and our strategy to train the NNs in Sect.

Our goal is to provide a proof of concept that subgrid-scale processes can be parametrised by neural networks (NNs). We hereby parametrise subgrid-scale processes with an NN that corrects model errors. As a test bed, we use a regional sea-ice model that depicts sea-ice dynamics in a Maxwell elasto-brittle rheology. To train the neural networks, we use twin experiments, where we compare a low-resolution forecast to a known high-resolution truth, simulated with the same sea-ice model.

Our goal is to parametrise unresolved processes of the forecast model

The correction is represented by the output of an NN,

To apply the model error correction for continuous forecasting, the predicted residual is added to the forecast, resulting in the corrected forecast
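As a minimal sketch of this correction step (all names here are hypothetical; the real NN takes the full set of model fields as input), the corrected forecast is simply the forecast plus the predicted residual:

```python
import numpy as np

def correct_forecast(forecast, initial, nn_residual):
    """Apply a resolvent model-error correction (hypothetical helper).

    `nn_residual` stands in for the trained network: it maps the
    initial conditions and the forecast to a predicted residual,
    which is added to the forecast."""
    return forecast + nn_residual(initial, forecast)

# Toy stand-in for the NN: predict a constant fraction of the drift.
toy_nn = lambda x0, xf: 0.5 * (x0 - xf)

x0 = np.array([1.0, 1.0])   # initial conditions
xf = np.array([0.0, 2.0])   # uncorrected forecast
xc = correct_forecast(xf, x0, toy_nn)
```

The corrected state `xc` can then reinitialise the forecast model for the next cycle.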

The model depicts the dynamical processes of sea ice with a Maxwell elasto-brittle rheology

The model's equations are spatially discretised by a first-order continuous Galerkin scheme for the sea-ice velocity components, and a zeroth-order discontinuous Galerkin scheme for all other model variables.
The model is integrated in time with a first-order Eulerian implicit scheme, and a semi-implicit fixed point scheme iteratively solves the equations for the velocities, the stress, and the damage.
The model area spans

As external wind forcing, depending on the spatial

We use von Neumann boundary conditions and an inflow of undamaged sea ice. With this model setup, the simulations can generally be seen as a zoomed-in region within an undamaged sea-ice field.

In our twin experiments we have two kinds of simulations, as depicted in Fig.

In our twin experiments, the high-resolution state with a

To initialise the forecast that should be corrected towards the truth, we project the true initial conditions from the high resolution to the low resolution. As projection operator, we make use of the interpolation defined by first-order continuous and zeroth-order discontinuous Galerkin elements, corresponding to Lagrange interpolation with (linear) barycentric and nearest neighbour interpolation, respectively.
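The two interpolation modes can be illustrated in one dimension (a toy sketch with assumed grids, not the model's actual 2-D triangular meshes):

```python
import numpy as np

x_hi = np.linspace(0.0, 1.0, 9)    # high-resolution nodes
x_lo = np.linspace(0.0, 1.0, 5)    # low-resolution nodes
v_hi = np.sin(np.pi * x_hi)        # nodal field (e.g. velocity)

# First-order continuous elements -> linear (barycentric) interpolation
v_lo = np.interp(x_lo, x_hi, v_hi)

# Zeroth-order discontinuous elements -> nearest-neighbour interpolation
c_hi = (x_hi[:-1] + x_hi[1:]) / 2   # high-resolution element centres
d_hi = np.cos(np.pi * c_hi)         # element field (e.g. damage)
c_lo = (x_lo[:-1] + x_lo[1:]) / 2   # low-resolution element centres
idx = np.abs(c_lo[:, None] - c_hi[None, :]).argmin(axis=1)
d_lo = d_hi[idx]
```

Nodal fields are thus interpolated linearly, while element-constant fields simply copy the nearest element value.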

To generate the forecast, the initial conditions at the low resolution are integrated to the target lead time with the forecast model. As we want to reinitialise the forecast model with the corrected model fields later, the model error correction has to be estimated at the low resolution. To consequently match the resolution of the forecast with the truth, we project the truth at the target lead time to the low resolution with our previously defined projection operator.

The neural network targets the difference between truth and forecast at the low resolution (see also Sect.

The neural network (NN) should learn to relate the input predictors to the output targets.
The inputs and targets are spatially discretised as finite elements, and the NN should directly act on this triangular model grid.
Moreover, the NN architecture should be scalable from regional models, as used in this study, to Arctic-wide models, like neXtSIM.
As we expect that the model errors from the sea-ice dynamics have an anisotropic behaviour, we additionally want to directly encode the extraction of localised features with a directional-dependent weighting into the NN.
Therefore, as depicted in Fig.

In our deep learning approach

Convolutional NNs are optimised for their use on Cartesian spaces, where they can easily exploit spatial autocorrelations.
The model variables are additionally defined at different positions on the triangles: the velocities are defined on the nodes of the triangles, whereas all other variables are constant across a triangle.
Consequently, we project from triangular space into Cartesian space, where the convolutional NN is applied to extract features.
As in the projection step from the high-resolution to the low-resolution model grid in our twin experiments, we again use Lagrange interpolation with barycentric and nearest neighbour interpolation (see also Sect.

The U-Net uses convolutional filters and shares its weights across all grid points.
This way, the U-Net extracts shift-invariant and localised features which represent common motifs.
To learn features at different scales, the features are coarse grained once in the encoding part of the U-Net (left part of the U-Net in Fig.

Instead of commonly used convolutional blocks with standard convolutional filters, followed by a normalisation and non-linear activation function, we make use of blocks inspired by the ConvNeXt architecture

The extracted features are projected back from the Cartesian space into the triangular space. Because the projection operator is purely linear, the back-projection operator can be analytically estimated by the pseudo-inverse of the projection matrix. As the Cartesian space is higher resolved, the back-projection averages the features of several Cartesian elements into features of single triangular elements.
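Assuming the projection is stored as a matrix acting on element values (the shapes below are toy choices), the back-projection can be sketched as:

```python
import numpy as np

# Toy projection from 2 triangular elements to 4 Cartesian cells:
# each Cartesian cell copies the value of one triangular element
# (nearest-neighbour assignment; shapes are illustrative only).
P = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])       # (n_cartesian, n_triangular)

# The back-projection is the Moore-Penrose pseudo-inverse of P.
B = np.linalg.pinv(P)            # (n_triangular, n_cartesian)

features_cart = np.array([1.0, 3.0, 2.0, 4.0])
features_tri = B @ features_cart
# Each triangle owns two Cartesian cells, so the pseudo-inverse
# averages their features.
```

Because P has full column rank, `B @ P` is the identity: projecting and back-projecting a triangular field recovers it exactly.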

Back in the triangular space, the extracted features are combined by learnable linear functions. These linear functions process each element-defining grid point independently but using the same weights across all grid points. To estimate their own model error correction out of the features, each of the nine model variables has its own linear function.
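A sketch of these shared linear output functions (dimensions are illustrative; this construction is equivalent to a pointwise, 1x1, linear layer):

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_features, n_vars = 100, 16, 9

features = rng.normal(size=(n_points, n_features))
# One weight vector (and bias) per model variable, shared across all
# grid points -- each variable gets its own linear function.
W = rng.normal(size=(n_vars, n_features))
b = np.zeros(n_vars)

corrections = features @ W.T + b   # (n_points, n_vars)
```

Each grid point is processed independently, but with the same weights everywhere, which keeps the parameter count independent of the grid size.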

In total, by projecting the input into a Cartesian space, the convolutional U-Net extracts features which are then the basis for the estimation of the output in the original triangular space. The use of the U-Net allows us to extract localised features and an efficient implementation, even for Arctic-wide models. The extraction of features at a higher resolution bundled with their combination in triangular space makes the NN directly applicable for finite-element models.

We train and test different NNs with twin experiments using the regional sea-ice model, as described in Sect.

We train the NNs on an ensemble of

All high-resolution trajectories are initialised with a randomly chosen cohesion field and randomly drawn forcing parameters, as specified in Table

The random ensemble parameters and their distribution;

The high-resolution simulations, which define the truth trajectories, are run for 3 d of simulation time.
The forcing is linearly increased to its full strength, as in

These datasets contain input–target pairs. The inputs for the NNs consist of 20 fields: nine forecast model fields and one forcing field for the initial conditions and the forecast lead time. The targets are the difference between the projected truth and the forecasted state at the forecast lead time and consist of nine fields. The inputs and targets are normalised by a global per-variable mean and standard deviation, both estimated from the training dataset.
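The normalisation step can be sketched as follows (toy array shapes; the real statistics are computed per variable over the whole training dataset):

```python
import numpy as np

rng = np.random.default_rng(42)
# Toy training set: (n_samples, n_variables, n_points)
train = rng.normal(loc=5.0, scale=2.0, size=(200, 9, 64))

# Global per-variable statistics, averaged over samples and grid points
mean = train.mean(axis=(0, 2), keepdims=True)
std = train.std(axis=(0, 2), keepdims=True)

normalised = (train - mean) / std
```

The same training-set statistics would also be applied to the validation and test inputs, so that all splits share one normalisation.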

The hourly slicing gives us

We train the NNs by minimising a loss function proportional to a weighted mean absolute error (MAE); a more rigorous treatment of the loss function can be found in Sect.

If not otherwise specified, all NNs are trained for

All experiments are performed on the CNRS/IDRIS (French National Centre for Scientific Research) Jean Zay supercomputer, using a single NVIDIA Tesla V100 GPU or NVIDIA Tesla A100 GPU per experiment.
The NNs are implemented in PyTorch

We propose a baseline architecture based on the U-Net, as described in Sect.

We evaluate our trained NNs on the test dataset with the mean absolute error (MAE) at the low resolution.
To get comparable performances across the nine model variables, we normalise their errors by their expected MAE in the training dataset.
Note that this normalisation results in a constant weighting, differing from the adaptive weighting used during the training process, which depends on the training trajectory.
Furthermore, this normalisation allows us to estimate the performance of the NNs with a single metric, averaged over all model variables.
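A minimal sketch of this normalised MAE metric (hypothetical helper and shapes):

```python
import numpy as np

def normalised_mae(pred, truth, ref_mae):
    """MAE per variable, normalised by the expected MAE from the
    training dataset, then averaged into one score.
    pred/truth: (n_vars, n_points); ref_mae: (n_vars,)."""
    mae = np.abs(pred - truth).mean(axis=1)
    return (mae / ref_mae).mean()

pred = np.zeros((9, 64))
truth = np.ones((9, 64))
score = normalised_mae(pred, truth, np.ones(9))
```

A score of 1 means the error matches the training-set expectation; lower is better.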
The NNs are trained 10 times with different random seeds (

As a baseline method, we use a persistence forecast with the initial conditions as a constant prediction. We additionally compare the forecasts corrected by the NN to the uncorrected forecasts from our sea-ice model.

In the following, we discuss the results on the test dataset in Sect.

In the first step, we evaluate the performance of our model error correction on the test dataset, without applying the correction together with the geophysical model, Table

Normalised MAE on the test dataset, estimated in low resolution and averaged over 10 NNs trained with different seeds.
Reported are the errors for the velocity component in

The NN corrects the model forecasts across all variables.
This results in an averaged gain of the hybrid model over

To apply convolutional NNs (CNNs) to the raw data of our finite-element sea-ice model, we project from triangular to Cartesian space, where the features are extracted.
The number of elements in the Cartesian space determines its effective resolution and, thus, the finest scale on which the NN can extract features.
To demonstrate the effect of different resolutions on the result, we perform three different experiments, where we change the grid size while keeping the NN architecture the same (Table

Normalised MAE on the test dataset for different Cartesian grid sizes,

The training loss, here the negative Laplace log-likelihood, measures how well an NN can be fitted towards the training dataset.
Although its resolution is higher than the original resolution of

Normalised feature map in Cartesian space for grid sizes of

In Fig.

The inputs of the NN have a crucial impact on the performance of the model error correction.
In the following, we evaluate the sensitivity of the NN with respect to its input variables.
In a first step, we alter the input and measure the resulting performance of the NN with the normalised MAE, Table

Normalised MAE on the test dataset for different input sets.
The error components are estimated as in Table

Usually, only the initial conditions are used for a neural-network-based model error correction

In the second step, we analyse how the input variables influence the output of the NN.
As we want to quantify the impact of the dynamics on the output, we base the analysis on the previous “Initial

The permutation feature importance of the RMSE for the given output variable with respect to the input variable for “Initial

All model variables are highly sensitive to their own dynamics.
Furthermore, the feature importance reflects the relations inherited by the model equations
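Permutation feature importance itself is a generic technique; a minimal numpy sketch (with a toy linear model standing in for the trained NN) reads:

```python
import numpy as np

def permutation_importance(model, X, y, rng):
    """RMSE-based permutation feature importance (generic sketch):
    shuffle one input variable at a time and record the resulting
    increase in error. X: (n_samples, n_features), y: (n_samples,)."""
    base = np.sqrt(np.mean((model(X) - y) ** 2))
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        scores.append(np.sqrt(np.mean((model(Xp) - y) ** 2)) - base)
    return np.array(scores)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0]                    # only the first feature matters
model = lambda X: 2.0 * X[:, 0]
imp = permutation_importance(model, X, y, rng)
```

Only the first feature should receive a clearly positive importance; shuffling an unused input leaves the error unchanged.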

As a local measure, we move to the sensitivity

Snapshots at an arbitrary time of

For the selected grid point, the prediction is especially sensitive to the area itself and the thickness, in absolute values, Fig.
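Such a sensitivity can be obtained generically by differentiating a scalar output with respect to every input entry; the sketch below uses central finite differences on a toy function (a study like this would typically use automatic differentiation through the NN instead):

```python
import numpy as np

def sensitivity_map(f, x, eps=1e-6):
    """Finite-difference sensitivity of a scalar output f(x) with
    respect to every input entry (generic sketch)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp.flat[i] += eps
        xm.flat[i] -= eps
        grad.flat[i] = (f(xp) - f(xm)) / (2 * eps)
    return grad

f = lambda x: (x ** 2).sum()        # toy scalar "output grid point"
x = np.array([1.0, -2.0, 3.0])
s = sensitivity_map(f, x)
# For f = sum(x^2), the sensitivity is 2x.
```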

Based on these sensitivities, we can interpret what features the NN has learned, guiding us towards a physical meaning of the model errors.
The opposing impacts of the thickness and area, in their absolute values and in their dynamics, indicate that the sea-ice model tends to overestimate the effect of the dynamics, whereas the initial conditions have a stronger persisting influence than predicted by the model.
However, the connectivity between grid points is underestimated by the model, as seen in Fig.

In general, this analysis has shown that the NN relies not only on a single time step as a predictor but also on how the fields develop over time, indicating that the dynamics themselves are the biggest source of model error between different resolutions. Additionally, the network extracts localised and anisotropic features, which are physically interpretable and point towards general shortcomings in the dynamics of the sea-ice model.

After establishing the importance of the dynamics for the error correction, we use the error correction together with the low-resolution forecast model for short-term forecasting.
As trained for a forecast horizon of

Normalised RMSE for

Overall, the hybrid models surpass the performance of the original geophysical model (Fig.

Normalised RMSE on the test dataset for a lead time of

The forecast error generally increases with lead time, but the error reduction gets smaller with each update, especially for the sea-ice area.
Since the NN correction is imperfect, the error during the next forecast cycle is an interplay between the errors from the initial conditions and from the model.
The NN is trained with perfect initial conditions to correct the model error only.
As the influence of the initial condition error increases with each update, the error distribution shifts, and the statistical relationship between input and residual changes with lead time; the network can correct fewer and fewer forecast errors.
This effect has a larger impact on the forecast if the lead time between two corrections with the NN is further reduced (Sect.

To show the effect of this error distribution shift, we analyse the differences between the first and fifth update step with the centred spatial pattern correlation

The correlations are estimated over space for each test sample and variable independently and averaged via a Fisher z-transformation
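Both ingredients, the centred pattern correlation and the Fisher z-averaging, can be sketched as follows (toy fields, hypothetical helper names):

```python
import numpy as np

def pattern_correlation(a, b):
    """Centred spatial pattern correlation between two fields."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())

def fisher_average(correlations):
    """Average correlation coefficients via the Fisher
    z-transformation: to z-space, mean, back-transform."""
    z = np.arctanh(np.asarray(correlations))
    return np.tanh(z.mean())

x = np.arange(10.0)
r = pattern_correlation(x, 2.0 * x + 3.0)   # perfectly correlated
avg = fisher_average([0.5, 0.5])
```

Averaging in z-space avoids the bias that a plain arithmetic mean introduces for bounded correlation coefficients.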

Centred pattern correlation on the test dataset between the updates and the residuals for the first update and fifth update.
The symbols of the variables are the same as in Table

Since they are trained for this, the NNs can almost perfectly predict the residual patterns for the first update. At the fifth update, larger parts of the residual patterns are unpredictable for our trained NN. The sea-ice area, in particular, has a longer memory for error corrections, such that the predicted patterns are almost unrelated to the residual patterns for the fifth update. Caused by the drift towards the attractor, the sea-ice model forgets parts of the previous error correction for the velocity and divergent stress component, and these forgotten parts get corrected again in the fifth update. However, the pattern correlation is also decreased for these dynamical variables at the fifth update. Based on these results, the error distribution shift is one of the main challenges for the application of such model error corrections for forecasting.

Our proposed parametrisation is deterministic and is designed to target the median value.
On the resolved scale, sea-ice dynamics can look like stochastic noise, with suddenly appearing strains and linear kinematic features, as discussed in the introduction.
We show the effect of the seemingly stochastic behaviour in Fig.

Snapshots of damage (left) and total deformation (right), showing their temporal evolution, in the high-resolution simulation (top), in the low-resolution forecast model (middle), and in the low-resolution hybrid model (bottom).

The initial state exhibits damaged sea ice in the centre, corresponding to a diagonal main strain in the total deformation. In the high-resolution simulation, the damaging process continues, leading to more widespread damaging of sea ice. Related to new strains, the damage is extended towards the south. The low-resolution forecast model only diffuses the deformation, without retaining the main strain in the already damaged sea ice. As a result, the model misses the southward-extending strain and damaging process. The hybrid model, by contrast, extends the damage and deformation southwards, although the newly developed strain is weaker than in the high resolution. The parametrisation can represent widespread damaging of sea ice. However, the parametrisation misses the development of new strains and positions the main strain at the wrong place. This problem can especially occur on longer forecasting timescales, where the damage field is further developed compared to its initial state. Therefore, we see the need for parametrisations that can also reproduce the stochastic effects of subgrid scales onto the resolved scales.

We have introduced an approach to parametrise subgrid-scale dynamical processes in sea-ice models based on deep learning techniques.
Using twin experiments with a model of sea-ice dynamics that implements a Maxwell elasto-brittle rheology, the NN learns to correct low-resolution forecasts towards high-resolution simulations for a forecast lead time of

Our results show that NNs are able to correct model errors related to the sea-ice dynamics and can thus parametrise the unresolved subgrid-scale processes as for other Earth system components.
In addition, we are able to directly transfer recent improvements in deep learning, like ConvNeXt blocks

For feature extraction, we map from the triangular model space into a Cartesian space with a higher resolution to preserve correlations of the input data. Our results hereby show that higher-resolved Cartesian spaces improve the parametrisation; the network can then extract more information about the subgrid scale. In the Cartesian space, a convolutional U-Net architecture extracts localised and anisotropic features on two scales. Mapped back into the original triangular space, the extracted features are linearly combined to predict the residuals, which parametrises the effect of the subgrid scale upon the resolved scales. Therefore, using a mapping into Cartesian space, we can apply CNNs to Arctic-wide models with unstructured grids, like neXtSIM.

Our results suggest that the finer the Cartesian space resolution, the better the performance of the NN.
This improvement could emerge from our type of twin experiments, where the main difference in the resolved processes is a result of different model resolutions.
Consequently, extracting features at a higher resolution than the forecast model might be needed to represent the processes of the higher-resolution simulations; the NN would act as an emulator for these processes.
In this case, the resolution needed for the projection would be linked to the resolution of the targeted simulations.
However, in light of our results, this link seems to be unlikely: the performance of the finer

The gain likely results from an inductive bias in the NN for Cartesian spaces with higher resolutions. We keep the NN architecture and its hyperparameters, like the size of the convolutional kernels, the same, independent of the resolution in the Cartesian space. Consequently, viewed from the original triangular space, the receptive field of the NN is reduced by increasing the resolution. The function space representable by such an NN is more restricted, and, as the fitting power is reduced, the training loss increases again. The NN is biased towards more localised features. These localised features help the network to represent previously unresolved processes better. This better representation improves the generalisation of the NN, resulting in lower test errors. However, as this study is performed with twin experiments in very specific settings, it remains unknown to us if the projection into a space that has a higher resolution is advantageous for subgrid-scale parametrisations in general.

The permutation feature importance as a global feature importance and sensitivity maps as a local importance help us to explain the learned NN by physical reasoning. The sensitivity map has additionally shown that the convolutional U-Net can extract anisotropic and localised features, depending on the relation between input and output. We see such an analysis as especially relevant for subgrid-scale parametrisations learned from observations, as the feature importance can be utilised to find the sources of model errors and guide model developments.

Applying the NN correction together with the forecast model improves the forecasts up to 1 h.
Since the error correction is imperfect, the initial condition errors accumulate for longer forecast horizons.
The longer the forecast horizon, the less the targeted residuals in the training data are representative of the true residuals.
Such issues would be solved in online training of the NNs

Although the NNs learned here can make continuous corrections, they represent only deterministic processes.
As the evolution of sea ice propagates from the subgrid scale to larger scales, unresolved processes can appear like stochastic noise from the resolved point of view.
Consequently, the deterministic model error correction is unable to parametrise such stochastic-like processes, which can lead, for example, to wrongly positioned strains and linear kinematic features.
Generative deep learning

Because of missing subgrid-scale processes in the low-resolution forecast model, the high-resolution simulations, projected into the low resolution, are far off the low-resolution attractor.
Consequently, when the forecast is run freely, it drifts toward its own attractor, resulting in large deviations from the projected high-resolution states.
This difficult forecast setting is indeed quite realistic, as models in reality also miss subgrid-scale processes

A subgrid-scale parametrisation can generally be seen as a kind of forcing.
Here, we use a resolvent correction, where we correct the forecast model with NNs after integrated time steps; the parametrisation is thus integrated in time like an Euler scheme.
Our results show that the NN needs access to the dynamics of the model to correct tendencies related to the drift towards the wrong attractor, at least at correction time.
One strategy can thus be to increase the update frequency or to distribute the correction over an update window, similarly to an incremental analysis update in data assimilation
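A toy sketch of distributing a correction increment over an update window, in the spirit of an incremental analysis update (the model step and increment below are illustrative stand-ins):

```python
import numpy as np

def distributed_correction(state, increment, n_steps, model_step):
    """Distribute a correction increment over an update window:
    apply increment/n_steps after each of n_steps model steps,
    instead of the whole increment at once (generic sketch)."""
    for _ in range(n_steps):
        state = model_step(state) + increment / n_steps
    return state

# Toy linear model: pure decay towards zero.
step = lambda x: 0.9 * x
x = distributed_correction(np.array([1.0]), np.array([0.5]), 5, step)
```

Spreading the increment keeps each corrected state closer to the model's own evolution than a single large jump would.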

This study and its experiments are designed to be a proof of concept. The NN is able to correct model errors; our results nevertheless indicate shortcomings and challenges towards an operational application of such subgrid-scale parametrisations. Our sea-ice model exhibits a strong drift towards its own attractor, which leads to large differences between simulations at different resolutions. It is yet unknown to us if this strong drift is only evident in our model or if it also prevails for other sea-ice models. Nevertheless, the NN should ideally take the model's attractor into account such that the corrected states stay on this attractor.

Additionally, the NN is trained to correct forecasts for a specific model setup and a specific model resolution.
Normally, the NN has to be retrained for other setups and especially other resolutions.
However, we might be lucky in correcting model errors from sea-ice dynamics: as sea-ice dynamics are temporally and spatially scale-invariant for resolutions up to at least

In our case, we apply twin experiments, where we train the NN to correct forecasts with perfectly known initial conditions towards a high-resolution simulation. Although such training is simple and in our case sufficient, the NN suffers from an error distribution shift. Applying twin experiments for the training of subgrid-scale parametrisations, the NN learns to emulate processes of the high-resolution simulations. Such an emulation could allow us to achieve a similar performance with low-resolution simulations as with high-resolution simulations, which would speed up the simulations. However, in this case, the NN learns instantiations of already known processes.

Instead, subgrid-scale parametrisations should ideally be learned by incorporating observations into the forecast.
This way, the parametrisation could learn to incorporate processes which might yet be unknown.
Such learning from sparsely distributed observations can be enabled by combining machine learning with data assimilation

Based on our results for twin experiments with a sea-ice-dynamics-only model in a channel setup, we conclude the following.

Deep learning can correct forecast errors and can thus parametrise unresolved subgrid-scale processes related to the sea-ice dynamics.
For its trained forecast horizon, the neural network can reduce the forecast errors by more than

A single big neural network can parametrise processes related to all model variables at the same time. The needed weighting parameters can hereby be spatially shared and learned with a maximum likelihood approach. A Laplace likelihood improves the extracted features compared to a Gaussian likelihood and is better suited to parametrise the sea-ice dynamics.

Convolutional neural networks with a U-Net architecture can represent important processes for sea-ice dynamics by extracting localised and anisotropic features from multiple scales. For sea-ice models defined on a triangular or unstructured grid, such scalable convolutional neural networks can be applied for feature extraction by mapping the input data into a Cartesian space that has a higher resolution than the original space. The finer Cartesian space hereby keeps correlations from the input data intact and enables the network to extract better features related to subgrid-scale processes.

Because forecast errors in the sea-ice dynamics are likely linked to errors of the forecast model attractor, we have to apply the model error correction as a post-processing step and input into the neural network the initial and forecasted state. This way, the neural network has access to the model dynamics and can correct them. Consequently, the dynamics of the forecast model variables are the most important predictors in a model error correction for sea-ice dynamics.

Although only trained for correction at the first update step, applying the error correction together with the forecast model improves the forecast, tested up to 1 h. The accumulation of uncorrected errors results in a distribution shift in the forecast errors, making the error correction less efficient for longer forecast horizons. Online training or techniques borrowed from offline reinforcement learning would be needed to remedy this distribution shift.

The deterministic model error correction leads to an improved representation of the fracturing processes. Nevertheless, the unresolved subgrid scale in the sea-ice dynamics can have seemingly stochastic effects on the resolved scales. These stochastic effects can result in wrongly positioned strains and fracturing processes for a deterministic error correction. To properly parametrise such effects, we would need generative neural networks.

In the following paragraphs, we will describe the most important properties of the regional sea-ice model used in this study.
For a more technical presentation of the model, we refer the reader to

The parameters for the regional sea-ice model that depicts the sea-ice dynamics

The characteristic time in the damaging process is chosen such that it is not a source of forecast error.

Compared to Arctic and pan-Arctic sea-ice models, like neXtSIM

Atmospheric wind stress is the sole external mechanical forcing, whereas the ocean beneath the sea ice is assumed to be at rest.
Given the small horizontal extent of our simulation domain (see Fig.

The Maxwell elasto-brittle rheology from

The model equations are discretised in time using a first-order Eulerian implicit scheme. Due to the coupling of the mechanical parameters to the level of damage, the constitutive law is non-linear, and a semi-implicit fixed point scheme is used to iteratively solve the momentum, the constitutive, and the damage equations. Within a model integration time step, these three fields are updated first. Cohesion, thickness, and area are updated secondly, using the already updated fields of sea-ice velocity and damage.
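The fixed-point part of the scheme can be sketched generically (the toy update below is a simple contraction, not the coupled momentum-constitutive-damage system):

```python
import numpy as np

def fixed_point_solve(update, state, n_iter=50, tol=1e-10):
    """Generic semi-implicit fixed-point iteration: repeatedly apply
    the coupled update until the state stops changing. In the model,
    `update` would solve the momentum, constitutive, and damage
    equations with the mechanical parameters from the current damage."""
    for _ in range(n_iter):
        new = update(state)
        if np.max(np.abs(new - state)) < tol:
            return new
        state = new
    return state

# Toy contraction with fixed point at 2.0.
sol = fixed_point_solve(lambda x: 0.5 * x + 1.0, np.array([0.0]))
```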

The equations are discretised in space by a finite-element Galerkin scheme.
The velocity and forcing components are defined by linear, first-order, continuous finite elements.
All other variables and derived quantities like deformation and advection are characterised by constant, zeroth-order, discontinuous elements.
The model is implemented in C++ and uses the

Our virtual area spans

If not otherwise stated, we initialise the simulations with undamaged sea ice: the velocity and stress components are set to zero, and the area and thickness to one.
The cohesion is initialised with a random field, drawn from a uniform distribution between

For the atmospheric wind forcing, we impose a sinusoidal velocity in the

To represent spatial correlations and anisotropic features, we use CNNs.
We train a model error correction as a subgrid-scale parametrisation (see also Sect.

For the Cartesian space, we choose a discretisation of

As projection operator

In the case of zeroth-order discontinuous Galerkin elements, the projection operator assigns to each Cartesian element one triangular element. The back-projection operator then corresponds to an averaging of the Cartesian elements into their assigned triangular element. This averaging can be seen as a type of ensembling, aggregating information from the smaller, normally unresolved, scales into the larger, resolved, scales. We have implemented this projection operator as an NN layer with fixed weights in PyTorch.

We use CNNs in Cartesian space.
The feature extractor should be able to extract multiscale features and to represent rapid spatial transitions, which might occur only on finer scales.
Consequently, we have selected a deep NN architecture with a U-like representation, a so-called U-Net

Proposed baseline U-NeXt architecture based on ConvNeXt-like blocks.
“Down” and “Up” correspond to downsampling and upsampling blocks, respectively.
Including the weights of the linear functions in triangular space, the architecture has in total around

Our typical U-Net architecture consists of three different blocks:
residual blocks, mainly inspired by ConvNeXt blocks

In our standard configuration, the processing blocks are mainly inspired by ConvNeXt blocks

The residual connection is an identity function

In the branch connection, a single convolutional layer with a

Afterwards, a convolution layer with a

For the downsampling operation, in the encoding part of the U-Net, we use a layer normalisation, followed by zero padding of one pixel on all four sides, and a convolution with a kernel size of

For the upsampling operation, in the decoding part of the U-Net, we use a sequence of bilinear interpolation, which doubles the spatial resolution, layer normalisation, zero padding of one pixel on all four sides, and a convolution with a

In our NN architecture, we want to predict a model error correction for all nine model variables at the same time, which results in nine different loss terms, e.g. nine different mean-squared errors or mean absolute errors (MAEs).
As each of these variables has its own error magnitude, variability, and issues to correct, we have to weight the loss functions against each other with parameters
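A fixed weighting combines the per-variable losses into a single scalar; a minimal sketch (with hypothetical weights) could look as follows:

```python
import numpy as np

def weighted_mae(pred, target, weights):
    """Weighted sum of per-variable mean absolute errors.

    pred, target : (n_samples, n_vars) arrays, one column per model variable
    weights      : (n_vars,) loss weights balancing the variables
    """
    per_var = np.mean(np.abs(pred - target), axis=0)  # MAE per variable
    return float(np.sum(weights * per_var))

pred = np.array([[0.0, 1.0], [2.0, 3.0]])
target = np.array([[1.0, 1.0], [1.0, 1.0]])
w = np.array([1.0, 0.5])                    # hypothetical weights
loss = weighted_mae(pred, target, w)        # 1.0 * 1.0 + 0.5 * 1.0 = 1.5
```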

In the maximum likelihood approach, a conditional probability distribution

We treat the output of our NN

The factor in front of the absolute error,
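As a sketch, the Laplace negative log-likelihood with a predicted location and scale reads as follows; with a fixed scale it reduces to the MAE up to a constant, and the factor 1/b in front of the absolute error acts as a learned, per-point weighting:

```python
import numpy as np

def laplace_nll(mu, log_b, target):
    """Negative log-likelihood of a Laplace distribution.

    mu    : predicted location (the correction itself)
    log_b : predicted log-scale; predicting the log keeps b positive
    target: observed model error

    NLL = log(2 b) + |target - mu| / b; the 1/b factor down-weights the
    absolute error where the network predicts a large uncertainty.
    """
    b = np.exp(log_b)
    return np.mean(np.log(2.0 * b) + np.abs(target - mu) / b)

# with a fixed scale b = 1, the NLL equals the MAE plus the constant log(2)
mu = np.array([0.0, 1.0])
target = np.array([1.0, 1.0])
nll = laplace_nll(mu, np.zeros(2), target)
mae = np.mean(np.abs(target - mu))
```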

As our NN architecture involves multiple design decisions, we explore here how they influence the results on the test dataset.
We show what would happen if we used other CNN architectures (Table

Normalised MAE on the test dataset for different NN architectures; shown are the average and standard deviation across 10 training seeds.
Reported are the errors for the velocity component in

The simplest approach to correcting the model forecast is to estimate a global bias for each variable in the training dataset and to correct the forecast by this constant. As we measure the MAE, we take the median error in the training dataset as the bias instead of the mean error. Correcting the bias has almost no impact on the scores; consequently, the model error is dominated by dynamical errors.
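A minimal sketch of this baseline, with toy numbers:

```python
import numpy as np

def median_bias(forecasts, truths):
    """Per-variable median forecast error on the training set.

    Under an MAE metric, the constant correction that minimises the error
    is the median of the errors, not their mean.
    """
    return np.median(forecasts - truths, axis=0)

# toy training set with a single variable
train_fc = np.array([[1.2], [1.1], [1.3]])
train_truth = np.array([[1.0], [1.0], [1.0]])
bias = median_bias(train_fc, train_truth)    # median of [0.2, 0.1, 0.3]
corrected = np.array([[2.2]]) - bias         # subtract the bias at test time
```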

As a next level of complexity, we introduce a shallow CNN architecture with one layer as feature extractor.
Using dilation in the convolutional kernel, this layer can extract shallow multiscale spatial information for each grid point.
This shallow architecture with only around

An approach to extract multiscale information is to use a U-Net architecture that extracts and combines information from different levels of coarsened resolution.
To make the approaches comparable, we use almost the same configuration as specified in Sect.

Replacing the classical convolutional layers with ConvNeXt blocks as described in Sect.

Training “Conv (

By mapping from triangular space into high-resolution Cartesian space, several Cartesian elements are caught in one triangular element.
Consequently, a simple convolutional layer would have problems extracting information across multiple scales.
To circumvent such problems, in the case of the naively stacked convolutional layers, we apply two convolutional layers in parallel – one local filter with a

“Conv (

“Conv (

Our baseline “U-Net”

“U-Net” with normal convolutional blocks, where down and up correspond to downsampling and upsampling operations, respectively. Each convolutional block is a sequence of a convolutional layer, batch normalisation, and a Gelu activation function, which is skipped in the last “Output Conv” block.

In this section, we provide additional results, showing the influence of different choices in the training on the performance in the testing dataset.

Large NNs have many parameters; in the case of the U-NeXt

In the following, we analyse the training behaviour of the NN and examine what happens if we artificially train on only a portion of the data (Fig.

The negative log-likelihood for a Laplace distribution, proportional to the mean absolute error (MAE), with a fixed weighting in the validation dataset as a function of epochs for different fractions of training data; the brighter the colour, the less training data are used.
The inset shows the average MAE in the test dataset as a function of the fraction of training data.
A fraction of

For all fractions of training data, the validation NLL smoothly decreases over the course of training.
Consequently, we see no overfitting, even with only

One may now wonder why the NN trained on only

One of our fixed parameters is the lead time for which the NN is trained and applied for forecasting. In the following, we briefly discuss the impact of the lead time between two correction steps (hereafter, the correction time) on the forecasting performance, again measured by the normalised RMSE.
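The forecast-correction cycle can be sketched as follows; `model_step` and `nn_correction` are hypothetical stand-ins for one integration step of the sea-ice model and the learned correction (here replaced by a toy drift and a toy correction, for illustration only):

```python
def forecast_with_correction(state, n_steps, correction_time,
                             model_step, nn_correction):
    """Free model run interleaved with a correction every
    `correction_time` steps (stand-in for the NN correction)."""
    for step in range(1, n_steps + 1):
        state = model_step(state)
        if step % correction_time == 0:
            state = nn_correction(state)
    return state

# toy example: the model drifts by +1 per step, and the correction
# removes half of the accumulated error
drifting_model = lambda x: x + 1.0
half_correction = lambda x: 0.5 * x
final = forecast_with_correction(0.0, 4, 2, drifting_model, half_correction)
# trace: 0 → 1 → 2 → (corr) 1 → 2 → 3 → (corr) 1.5
```

Shortening the correction time lets the toy correction act before the drift accumulates, which mirrors the expectation discussed in the text.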

Normalised RMSE for

Decreasing the correction time decreases how long the trajectory freely drifts towards the attractor of the sea-ice model.
Model errors can, additionally, be corrected earlier, before they have too large an impact on the forecast.
Consequently, we would expect that the shorter the correction time, the better the forecasting performance.
However, in our case, the forecasting performance is worse for a lead time of

For an increased correction time of

Optimising the Laplace log-likelihood corresponds to minimising the mean absolute error (MAE), whereas an optimisation of a Gaussian log-likelihood minimises the mean-squared error.
Thus, we report the averaged root-mean-squared error (RMSE) and MAE over all variables to measure the influence of the loss function on the performance of the NNs (Table
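The correspondence between the training loss and the optimal point estimate can be checked numerically: the mean is the constant that minimises the squared error, while the median minimises the absolute error. A minimal sketch with a skewed toy error sample:

```python
import numpy as np

errors = np.array([0.0, 0.0, 10.0])      # skewed toy error sample
mean, median = errors.mean(), np.median(errors)

mse = lambda c: np.mean((errors - c) ** 2)
mae = lambda c: np.mean(np.abs(errors - c))

# the mean is the MSE-optimal constant, the median the MAE-optimal one
assert mse(mean) < mse(median)
assert mae(median) < mae(mean)
```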

The average RMSE and MAE, normalised by their expected climatology, on the test dataset for different training loss functions. The loss function in bold is the selected one, and bold scores are the best in each column.

Compared to a Gaussian log-likelihood with trainable variance parameters, the Laplace log-likelihood as the loss function improves not only the MAE by around

Two typical feature maps for an NN trained with either

The loss function influences the output of the NN and the learned features before they are linearly combined to the output (Fig.

Another decision that we took in our architecture is to use the Gaussian error linear unit (Gelu) in the blocks and the rectified linear unit (relu) as the activation function of the features, before they are projected back into triangular space and linearly combined.
The Gelu activation function is recommended for use in a ConvNeXt block
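For reference, the two activation functions can be written as follows, with the Gelu in its common tanh approximation:

```python
import math

def relu(x):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def gelu(x):
    """Gaussian error linear unit, x * Phi(x), in its tanh approximation."""
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# unlike relu, gelu lets small negative values pass through attenuated
relu(-1.0)   # → 0.0
gelu(-1.0)   # ≈ -0.159
```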

Normalised MAE on the test dataset for different activation functions in the ConvNeXt blocks and as feature activation (w/o: no activation function).
The error components are estimated as in Table

As similarly found in

Snapshot of typical feature maps for

In the following, we show feature maps for different activation functions at the feature output of the U-Net as a qualitative measure (Fig.

Using the permutation feature importance, we have found that all model variables are very sensitive to their own dynamics as predictors. Nevertheless, by permuting single predictors independently, we only destroy the information contained in this predictor. As other variables might hold similar information, e.g. for the sea-ice area and thickness, the inter-variable importance is likely to be underestimated, and the permutations can lead to unphysical instances. To see the effect of the correlations on the importance, we permute different variable groups and estimate their importance on the nine output variables.
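Grouped permutation importance can be sketched as follows; the joint shuffle applies the same row permutation to all columns of a group, preserving the correlations within the group. The toy model and data are purely illustrative:

```python
import numpy as np

def grouped_permutation_importance(model, X, y, group, metric, rng):
    """Increase of the error when a whole group of input columns is
    shuffled jointly, preserving the correlations within the group.

    model : callable mapping X -> predictions
    group : list of column indices shuffled with the same permutation
    """
    base = metric(y, model(X))
    X_perm = X.copy()
    perm = rng.permutation(X.shape[0])
    X_perm[:, group] = X_perm[perm][:, group]  # joint shuffle of the group
    return metric(y, model(X_perm)) - base

# toy model that relies on the sum of columns 0 and 1
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
y = X[:, 0] + X[:, 1]
model = lambda X: X[:, 0] + X[:, 1]
mae = lambda y, p: np.mean(np.abs(y - p))
imp = grouped_permutation_importance(model, X, y, [0, 1], mae, rng)
# imp > 0: shuffling the informative group degrades the prediction
```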

Permutation feature importance of different variable groups.
The colouring is the same as in Table 6 of the original paper.
SIU stands for velocity in the

For the sea-ice velocities, their dynamics are clearly the predictors with the biggest impact.
However, the absolute values of sea-ice area and thickness have combined a small but considerable impact on the velocity in

The stress components and damage are highly sensitive to their own dynamics if only a single variable is shuffled, as shown for the reference feature importance; however, they are insensitive if the stress components and damage are shuffled as a group. For their correction, the NN seems to rely on features that extract relative combinations of these variables. Shuffling a single variable then creates unphysical instances, which destroys such features, whereas they are kept intact when the stress components and the damage are shuffled together. The same feature importance as for the reference is reached if the velocities and the stress variables are shuffled together. Here, the dynamics are as important as the absolute values. Because the area and thickness have no influence, the errors of the stress components and damage are also driven by the dynamical variables, as in our sea-ice model.

For the area and thickness, their importance is higher if their dynamics are shuffled separately than if they are shuffled at the same time. Additionally, similar differences can be observed if the stress components are shuffled together with or without the damage. Again, we attribute this to the naturally high correlation of some variables, which leads to unphysical instances and skews the permutation feature importance. The importance of physically consistent sample instances highlights one of the downsides of the permutation feature importance for correlated input variables. Nevertheless, it also shows that the NN takes groups of input variables and their correlations into account, which could explain the efficiency of the NN.

For forecasting with the model error correction, we only show results for the NNs that take the initial conditions and the forecasts as input, although the NNs that take the initial conditions and the difference between forecast and initial conditions perform better on the testing dataset.
Here, we briefly discuss the forecasting results of these latter NNs with the initial conditions and the differences as input (Table

Normalised RMSE on the test dataset for a lead time of

The dynamics are explicitly represented as the difference between the forecast and initial conditions.
On the one hand, this helps the NN to extract more information from the dynamics than for the “Initial

The authors will provide access to the data and weights of the neural networks upon request.
The source code for the experiments and the neural networks is publicly available under

MB and AC initialised the scientific questions. TSF, CD, AF, and MB refined the scientific questions and prepared an analysis strategy. TSF, YC, and VD advanced the codebase of the regional sea-ice model. TSF performed the experiments. TSF, CD, AF, and MB analysed and discussed the results. TSF wrote and revised the paper, with CD, AF, MB, YC, AC, and VD reviewing.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors would like to thank Nils Hutter and Steffen Tietsche, whose reviews helped to significantly improve the paper. The authors would additionally like to thank the other members of the SASIP project for the delightful discussions along the way. Furthermore, the authors received support from the SASIP project (grant no. 353), funded by Schmidt Futures – a philanthropic initiative that seeks to improve societal outcomes through the development of emerging science and technologies. This work was additionally granted access to the HPC resources of IDRIS under the allocations 2021-AD011013069 and 2022-AD011013069R1 made by GENCI.

This research has been supported by the Schmidt Family Foundation (grant no. 353) and the Grand Équipement National de Calcul Intensif (grant nos. 2021-AD011013069 and 2022-AD011013069R1).

This paper was edited by Yevgeny Aksenov and reviewed by Steffen Tietsche and Nils Hutter.