Predicting ocean-induced ice-shelf melt rates using a machine learning image segmentation approach

Through their role in buttressing upstream ice flow, Antarctic ice shelves play an important part in regulating future sea level change. Reduction in ice-shelf buttressing caused by increased ocean-induced melt along their undersides is now understood to be one of the key drivers of ice loss from the Antarctic Ice Sheet. However, despite the importance of this forcing mechanism, most ice-sheet simulations currently rely on simple melt parameterisations of this ocean-driven process, since a fully coupled ice-ocean modelling framework is prohibitively computationally expensive. Here, we provide an alternative approach that is able to capture the greatly improved physical description of this process provided by large-scale ocean-circulation models over currently employed melt parameterisations, but with trivial computational expense. We introduce a new approach that brings together deep learning and physical modelling to develop a deep neural network framework, MELTNET, that can emulate ocean model predictions of sub-ice shelf melt rates. We train MELTNET on synthetic geometries, using the NEMO ocean model as a ground-truth in lieu of observations to provide melt rates both for training and to evaluate the performance of the trained network. We show that MELTNET can accurately predict melt rates for a wide range of complex synthetic geometries and outperforms more traditional parameterisations for > 95% of geometries tested. Furthermore, we find MELTNET's melt rate estimates show sensitivity to established physical relationships such as changes in thermal forcing and ice shelf slope. This study demonstrates the potential for a deep learning framework to calculate melt rates with almost no computational expense, that could in the future be used in conjunction with an ice sheet model to provide predictions for large-scale ice sheet models.

Given the current gulf between ocean models and lower complexity parameterisations, and the aforementioned problems with making long-term forecasts for the Antarctic ice sheet using a fully-coupled approach, there is a clear need for an alternative middle ground. This should retain the ability to predict complex spatial patterns of melting but be computationally efficient in order to be able to run it synchronously with an ice sheet model without inhibiting the size of the domain or the duration of the simulation. Here, we propose using deep learning to emulate ocean model behaviour for the prediction of sub ice-shelf melt rates. Since the computational cost of a machine learning algorithm is insignificant once it has been trained, this could provide an alternative modelling approach. By treating the ocean model as a ground-truth and running ocean simulations on a wide variety of ice shelf configurations and ocean conditions, a network can be trained to approximate the behaviour of an ocean model. As a first step towards this goal, we demonstrate a deep learning framework that can accurately reproduce melt rate patterns as predicted by the NEMO ocean model and shows significantly better performance than existing intermediate complexity parameterisations without any increase in computational cost.

In the absence of sufficiently large observational melt rate training data sets for effective deep learning, we generate entirely random and synthetic geometries, together with temperature and salinity forcing, for several thousand ice shelves. These inputs are used as forcings for NEMO, a general circulation ocean model, which gives a simulated ice shelf melt rate. The inputs and NEMO melt rates are then applied within our deep learning framework, MELTNET, to train a model that can predict melt rates that closely resemble those predicted by the NEMO ocean model. We hold back 5% of the generated inputs, which are not shown to the network during training and so can be used both to evaluate MELTNET and to compare its performance with other melt rate parameterisations. We begin by describing our deep learning methodology, followed by the NEMO ocean model. We then explain how the synthetic input fields, consisting of ice shelf geometry, bathymetry, temperature and salinity fields, are generated. Finally, we introduce the two intermediate complexity melt rate parameterisations that we compare MELTNET performance against.

Deep learning methodology
Our deep learning approach consists of two separate neural networks, trained to perform the two steps required to go from input fields to a melt rate field. All network design and training was done using MATLAB's deep learning toolbox (The MathWorks, 2021). In the first step, input geometries and ocean conditions (Sec. 2.3), together with NEMO melt rates (Sec. 2.2), are used to train a segmentation network that learns to classify regions of an ice shelf with labels representing the magnitude of melting or refreezing. Secondly, an autoencoder network is trained to convert from these discrete classified melt rates to a continuous melt rate field. Hereafter, we refer to the combination of these two networks working in tandem, which together form our proposed melt rate parameterisation, as MELTNET. Figure 1 shows the workflow for training each network and predicting melt rates, and each of these steps is described in more detail below.

Figure 1. Workflow diagram for the proposed deep learning framework, split into training and prediction. Synthetic ice shelf geometries (Section 2.3.1) and synthetic temperature and salinity profiles generated from WOA data (Section 2.3.2) are used (1) as inputs for the NEMO model, which predicts a melt rate field, and (2) to create a four channel input image for training of the segmentation network. NEMO melt rates are converted into a labelled image and the segmentation network trains to segment input images that match labelled NEMO output. Separately, the autoencoder network takes the melt rate map and labelled melt rates from NEMO and learns to map between the two. In both of these networks, 5% of inputs and NEMO melt rates are withheld to form the validation set. Once the networks are trained, melt rate prediction proceeds by passing input images to the segmentation net and the resulting labelled images to the autoencoder, leading to a melt rate field.

The primary network, designed to classify melt rates from an input image, is a modified version of the SegNet architecture proposed by Badrinarayanan et al. (2017), with the modified architecture shown in Fig. B1. A segmentation network takes images as input and assigns a label to each pixel of that image. The input image may have different numbers of bands; for example, a black and white image has one input band and a standard colour image has three. In our case, we use input images with 64x64 pixels and four bands representing bathymetry, ice shelf draft, temperature and salinity. Note that the methodology is completely flexible with regard to the size of the image and the number of bands, so more information could be encoded in additional bands, as discussed later. In MELTNET, each pixel may have a value from 0 to 255, and geometrical input fields are re-scaled and mapped directly into the first two bands. Mapping of the temperature and salinity forcing, which are defined as depth-dependent but spatially independent boundary conditions for the ocean model, is less intuitive. We explored several different options, all of which involved taking the depth-varying temperature and salinity and mapping those directly to the ice shelf base at equivalent depth. Two noteworthy options we tested are: using the temperature and salinity profiles that NEMO is forced (prescribed) with at the boundary, or using the temperature and salinity simulated by NEMO at the ice front. In practice, we found that this choice made little difference to the segmentation network accuracy (92.6% classification accuracy vs. 93.4%, respectively). We therefore decided that taking average temperature and salinity conditions at the ice shelf front after model spin-up was most consistent with existing melt rate parameterisations, since this is akin to forcing our model with observations near the relevant ice shelf. These mapped temperatures and salinities are then re-scaled and form the remaining bands of our input image.
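As an illustration of the band construction described above, the following Python sketch samples a depth-dependent profile at the ice shelf draft of each pixel and rescales every field to the 0 to 255 range. It is a minimal sketch under stated assumptions rather than the MATLAB implementation used in this study: the function names, the use of NaN to mark pixels outside the ice shelf, the linear interpolation in depth and the rescaling ranges passed in `ranges` are all hypothetical choices.

```python
import numpy as np


def rescale_to_uint8(field, vmin, vmax):
    """Linearly map [vmin, vmax] onto integers 0..255; NaN (no data) becomes 0."""
    scaled = (np.asarray(field, dtype=float) - vmin) / (vmax - vmin)
    scaled = np.nan_to_num(np.clip(scaled, 0.0, 1.0), nan=0.0)
    return np.round(255.0 * scaled).astype(np.uint8)


def profile_at_draft(draft, depths, profile):
    """Sample a depth-dependent profile at the ice-shelf base of every pixel.

    draft   : 2D float array of ice-shelf base depth (m, positive down),
              NaN outside the ice shelf (assumed convention)
    depths  : 1D increasing array of profile depths (m)
    profile : 1D array of temperature or salinity at those depths
    """
    mapped = np.full(draft.shape, np.nan)
    shelf = np.isfinite(draft)
    mapped[shelf] = np.interp(draft[shelf], depths, profile)
    return mapped


def build_input_image(bathymetry, draft, depths, temperature, salinity, ranges):
    """Stack bathymetry, draft, temperature and salinity into a four-band image."""
    fields = {
        "bathymetry": bathymetry,
        "draft": draft,
        "temperature": profile_at_draft(draft, depths, temperature),
        "salinity": profile_at_draft(draft, depths, salinity),
    }
    # `ranges` holds a hypothetical (min, max) rescaling range for each band.
    bands = [rescale_to_uint8(fields[name], *ranges[name]) for name in fields]
    return np.stack(bands, axis=-1)
```

Called with 64x64 bathymetry and draft grids, this returns a 64x64x4 array of unsigned 8-bit integers; additional bands could be appended to `fields` in the same way.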
The target melt rate field output by NEMO (Sect. 2.2) must be converted to a labelled image with a finite number of classes in order to be used to train the segmentation network. A tradeoff exists when selecting the number of classes for the segmentation network and the final performance of MELTNET in terms of predicting melt rates. With fewer classes, the segmentation net accuracy goes up but the inverse classification net struggles to infer complex melt rate patterns, whereas with more classes the segmentation net accuracy drops, also resulting in a drop in overall MELTNET performance. We tested networks using N = 5 to 11 classes and the resulting NRMSE (Normalised Root Mean Squared Error, described later) varied from 0.16 to 0.12. Based on this testing, an optimal number of classes for our training set was found to be N = 8. Melting (or freezing) rates were converted to N discrete melt labels by calculating N − 2 quantiles of melt rates for every pixel in the training set and assigning labels to melt rates that fall between each quantile, with the last label reserved for regions of the image with no melt/refreezing (i.e. outside of the ice shelf).
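The labelling step can be sketched in a few lines of Python. This is an illustrative sketch, not the code used in this study: it assumes that the N − 2 quantiles are evenly spaced in probability and that a boolean mask (`shelf_mask`, a hypothetical name) identifies ice shelf pixels. With N − 2 quantile edges there are N − 1 melt classes, and the final label marks pixels outside the ice shelf.

```python
import numpy as np


def melt_rate_labels(melt_rates, shelf_mask, n_classes=8, edges=None):
    """Convert continuous melt rates into integer class labels.

    Labels 0..n_classes-2 are quantile bins of the shelf melt rates;
    label n_classes-1 is reserved for pixels outside the ice shelf.
    """
    if edges is None:
        # N - 2 interior quantile levels, assumed evenly spaced in probability.
        probs = np.arange(1, n_classes - 1) / (n_classes - 1)
        edges = np.quantile(melt_rates[shelf_mask], probs)
    labels = np.full(melt_rates.shape, n_classes - 1, dtype=np.int64)
    labels[shelf_mask] = np.digitize(melt_rates[shelf_mask], edges)
    return labels, edges
```

The `edges` returned from the training set can be passed back in when labelling the validation set, so that both sets share the same class boundaries.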
The segmentation net takes these input images and the corresponding set of labelled melt rates from NEMO and learns to reproduce the labelled melt rate distribution. At its core are convolutions, consisting of sets of filters that operate on the image. These filters are learned during the training, using stochastic gradient descent to minimise the loss function, which is calculated by comparing the output with the NEMO training set. Layers of filters learn to extract useful features at different scales within the image, for example the outline of the coast or the local slope of the ice shelf base. The final training set, after a small subset of anomalous NEMO simulations with extreme temperatures was removed, consisted of 2575 images, with a further 136 (∼5%) retained for validation. MELTNET accuracy increases with the number of training images, but testing with incrementally larger training sets showed this number to be sufficient (Fig. B4).

The majority of each input image consists of pixels that lie outside of the ice shelf extent, resulting in a large bias in the training and scoring of the network towards pixels that are of no interest to our application. To alleviate this issue, classes were weighted according to the total frequency of pixels in each class for the entire training set. Furthermore, a random rotation [...] which have the form f(x) = x · sigmoid(x), as these have been found to consistently outperform the more common ReLU function (Ramachandran et al., 2017).
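The class weighting described above could, for example, take the form of inverse-frequency weights. The Python sketch below is one plausible reading and not necessarily the exact weighting used here: the inverse-frequency form, the normalisation to a mean of one over the classes present, and the function name are assumptions.

```python
import numpy as np


def inverse_frequency_weights(label_images, n_classes):
    """Per-class loss weights inversely proportional to pixel frequency.

    label_images : integer array of class labels for the whole training set
    n_classes    : total number of classes (including the background class)
    Classes that never occur receive a weight of zero.
    """
    counts = np.bincount(np.asarray(label_images).ravel(), minlength=n_classes)
    freq = counts / counts.sum()
    weights = np.zeros(n_classes)
    present = freq > 0
    weights[present] = 1.0 / freq[present]
    # Normalise so that the weights of the classes present average to one.
    return weights / weights[present].mean()
```

With this choice, the abundant background class (pixels outside the ice shelf) contributes far less per pixel to the loss than the rarer melt classes.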

We use another deep learning approach to perform the task of converting from discrete melt labels, output by the SegNet, to a continuous melt rate field. We found that a modified denoising autoencoder (DAE) architecture, based on the network proposed by Zhang et al. (2017), was able to perform this task effectively. DAEs take partially corrupted input and are trained to extract features that capture useful structure in order to recover the uncorrupted original. In this case, the corruption is the process of categorising melt rates into N discrete labels, which results in images that retain much of the original melt rate pattern but lose fine-scale detail and magnitude information. The segmentation net is trained on these labelled melt rate images rather than the NEMO output directly, and itself outputs the same labels, which need to be converted back to a continuous melt rate field in order to provide useful output for an ice sheet model.

The training set consists of labelled NEMO melt rates as inputs and true NEMO rates as outputs, i.e., the DAE learns to map from discrete labels to a continuous melt rate field. The specific experiments that comprised training and validation sets were the same as those used to train the segmentation network. This ensured that the DAE did not get any unfair advantage from having already seen similar melt rate patterns during its training as those output by the segmentation net. The DAE architecture consists of several layers of 2D convolutions, batch normalisations and Swish layers (Swish layers were found to considerably improve performance compared to ReLU layers), as shown in Fig. B2. A comparison between NEMO melt rates, those same NEMO melt rates converted to a labelled image and the result of mapping from labelled images to melt rates using the trained DAE network is shown in Fig. B3.

NEMO Ocean modelling
The ocean general circulation model used in this study is version v4.0.4 of the Nucleus for European Modelling of Ocean model (NEMO; Madec and Team). NEMO solves the incompressible, Boussinesq, hydrostatic primitive equations with a split-explicit free-surface formulation. NEMO here uses a z*-coordinate (varying cell thickness) C-grid with partial cells at the bottom-most and top-most ocean layers in order to provide a more realistic representation of bathymetry (Bernard et al., 2006) and the ice-shelf geometry, respectively. Our model settings include: a 55-term polynomial approximation of the reference Thermodynamic Equation Of Seawater (TEOS-10; IOC and IAPSO (2010)), nonlinear bottom friction, a free-slip condition at the lateral boundaries (at both land and ice shelf interfaces), an energy- and enstrophy-conserving momentum advection scheme and a prognostic turbulent kinetic energy scheme for vertical mixing. Laterally, we have spatially varying eddy coefficients for the lateral diffusion of momentum. Our model setup utilises the ice-shelf module that was developed by Mathiot et al. (2017).
Calculation of the ice shelf melt rate follows the standard three-equation parameterisation as described in Asay-Davis et al. (2016), with heat exchange and salt exchange coefficients of $\Gamma_T = 6 \times 10^{-2}$ and $\Gamma_s = 1.7 \times 10^{-3}$, respectively. Additionally, the top drag coefficient is $C_d = 2.5 \times 10^{-3}$. The conservative temperature, absolute salinity, and velocity are averaged over the [...] where the interest here was to have a simple system in which to test the capabilities of a neural network to predict melt rates within drastically idealised ice shelf cavities. Future work will look at extending the neural network to more complex systems.

Following Holland et al. (2008), all simulations in this paper have a common spin-up of ten years where the time-mean values of the final year are used for all analysis. Sensitivity tests (not shown) suggest that a 10 year spin up is sufficient to capture the equilibrated response of the ice shelf to the forcing.

Synthetic input generation
A major hurdle to overcome with this deep learning approach is to generate synthetic inputs that are realistic but also show sufficient variability to be useful analogs to real ice shelves. This problem can be broken down into two main steps: generation of the ice shelf geometry, and generation of the temperature and salinity fields, which set both the ocean initial conditions and far-field restoring.

Ice shelf and coastline geometry
Four steps are involved in creating the geometry of the bathymetry (including the coastline) and ice-shelf draft, the algorithm [...]. Each of these steps generates one or more random numbers that determine some geometrical property of the final domain, leading to a very large variety of final ice shelf configurations. These steps will now be outlined in order, with each step in its own paragraph.
The starting point for generating ice shelf geometries is the observation that most Antarctic ice shelves, particularly large ones, occur within embayments along the coast, while some smaller unconfined ice shelves also exist along flatter sections of the coastline. From a square domain, we start by creating a polygon that will define the overall geometry of the coastline from four random point seeds that can each lie anywhere within their four predefined boxes, as shown in Fig 2a. Two further points are added on either side of, and in line with, the previous end points to create a polygon with six points and five edges. Due to the extents of the four boxes within which the four initial points are seeded, most geometries will consist of a central embayment, but the concavity of the resulting bay can vary from almost flat to deep and strongly confined. This simple polygon is then transformed into a complex polygon more closely resembling the fractal nature of a real coastline by repeatedly adding points midway between two existing points and offset some random distance from that edge, resulting in a final coastline as shown in [...].

With a coastline defined as described above, the next step is to define a plausible extent for the ice-shelf front and from this the continental shelf break. Points on the coastline nearest to the two corner points (blue and yellow points in Fig 2a) are used as trial start and end points for the ice-shelf front, which has a random curvature (this determines the ice-shelf front shape, from concave to convex). If the ice-shelf front polygon does not intersect any coastline points and the area is less than a randomly selected minimum area then the ice-shelf front is accepted. If the ice-shelf front is rejected, the two starting points along the coast for the calving front polygon are moved closer together and the procedure is repeated until a geometry is accepted. This results in a variety of different sized ice shelves that tend to be confined by any existing embayment in the coast. As a next step, the distance is calculated between each open ocean point and the combined ice front and coastline ocean boundary. A contour of constant distance from the coastline is then drawn and converted to a tensioned spline to generate a smooth polygon that defines the continental shelf break. The result of these two steps is shown in Fig 2c.

The geometry is now fully defined in 2D, but requires information on ice and water column thickness to be used as input for the ocean model. Ice thickness is first defined everywhere along the ice shelf grounding line as a product of distance to the ice front, a measure of the coastline curvature and a random factor (resulting in a maximum ice thickness at the grounding line of 2000 m). This leads to ice thicknesses that are generally greater further from the coast and particularly where the coast consists of smaller inlets, to mimic plausible ice streams flowing into the ice shelf. Ice thickness is then extrapolated at regular points along the grounding line to the ice front using a simple analytical expression for a buttressed ice shelf thickness profile from Nilsson et al. (2017) under the simplifying assumption of no net accumulation. These ice thickness profiles are combined and mapped onto a grid to generate a 2D ice thickness field everywhere within the ice shelf (Fig 2d).

[...] and random Brownian noise is added to generate the final bathymetric grid (Fig 2e). The resulting fields of ice thickness and ocean depth are generated at finer resolution and then linearly interpolated onto 64x64 grids, which serve directly as inputs to the ocean model (with each cell representing ∼8x8 km), while their discretised form serves as two bands of the input images for the segmentation net (Fig 2f).
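The coastline refinement step described above, in which points are repeatedly added midway between existing points and offset by a random distance, resembles a midpoint displacement scheme. The Python sketch below illustrates the idea only; it is not the algorithm used to produce the geometries in this study. The offset distribution (uniform, scaled by edge length), the `roughness` parameter and the number of iterations are assumptions.

```python
import numpy as np


def refine_coastline(points, n_iterations=5, roughness=0.25, seed=None):
    """Roughen an open polyline by repeated random midpoint displacement.

    points       : (n, 2) array of vertices of the simple seed polygon
    n_iterations : number of refinement passes (each doubles the edge count)
    roughness    : maximum offset as a fraction of edge length (assumed form)
    """
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    for _ in range(n_iterations):
        start, end = pts[:-1], pts[1:]
        edge = end - start
        length = np.linalg.norm(edge, axis=1, keepdims=True)
        # Unit normal to each edge (guarding against zero-length edges).
        normal = np.column_stack([-edge[:, 1], edge[:, 0]])
        normal = normal / np.where(length > 0, length, 1.0)
        # Random offset perpendicular to the edge, proportional to its length.
        offset = rng.uniform(-roughness, roughness, size=(len(edge), 1)) * length
        midpoints = 0.5 * (start + end) + offset * normal
        # Interleave existing vertices with the new displaced midpoints.
        refined = np.empty((2 * len(pts) - 1, 2))
        refined[0::2] = pts
        refined[1::2] = midpoints
        pts = refined
    return pts
```

Because each offset scales with the length of the edge it displaces, successive passes add progressively finer detail, and the original seed vertices are retained in the final polyline. Starting from a six-point polygon, five passes give 161 vertices.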
A sample of 36 synthetic domain geometries is shown in Fig 3. The algorithm that generates synthetic ice shelf geometries must be capable of creating a wide variety of configurations. Validating these geometries is not possible; however, the resulting configurations are visually similar to ice shelves typically found around Antarctica, and the generation of ice thicknesses for each geometry, to which melt rates are highly sensitive, is based on analytical solutions for ice shelf flow. The final 64x64 grids result in each domain having an area of ∼252,000 km$^2$. Given that much of the domain is taken up by grounded ice/ocean, this results in maximum ice shelf areas which are less than the two largest ice shelves in Antarctica (the Ross and Filchner-Ronne Ice Shelves) but comparable to the next largest, such as the Amery and Larsen Ice Shelves.

Temperature and salinity forcing
The ocean model configuration used here (Section 2.2) requires a temperature/salinity restoring condition designed to imitate the far-field ocean forcing of an ice shelf. In this case, the restoring condition is applied only at the northern boundary, similar to the ISOMIP+ experiments (Asay-Davis et al., 2016). We thus need temperature and salinity fields that represent the variety of conditions that might be found in this location around Antarctica. To this end, we use the World Ocean Atlas 2018 (hereafter WOA; Boyer et al.) to extract temperature and salinity profiles around all of Antarctica. For each meridian in the gridded data, we take data from the first ocean grid cell north of the Antarctic coast that has a depth of more than 2000 m. While this results in several hundred vertical profiles that could be used directly to force our synthetic geometries, the WOA dataset is still inherently a finite source of temperature and salinity data. As an alternative, we use these data as a starting point to generate synthetic temperature and salinity profiles that share the same characteristics but can be unlimited in number and variety. This is accomplished with a Generative Adversarial Network (GAN, Goodfellow et al. (2014)). Details on this GAN network, together with a comparison between observed and generated temperature and salinity profiles, can be found in Appendix A.
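The profile selection rule described above (for each meridian, the first ocean cell north of the coast deeper than 2000 m) can be sketched with plain arrays. This Python sketch does not read the WOA files and makes several assumptions: the gridded ocean depth is supplied as a 2D array with rows ordered from south to north, land cells have a depth of zero or NaN, and the function names are hypothetical.

```python
import numpy as np


def first_deep_cell_per_meridian(depth, min_depth=2000.0):
    """Row index of the first cell (scanning northward) deeper than min_depth.

    depth : (n_lat, n_lon) ocean depth in metres, rows ordered south to north,
            zero or NaN on land (assumed convention)
    Returns an integer array of length n_lon, with -1 where no cell qualifies.
    """
    deep = np.nan_to_num(depth, nan=0.0) > min_depth
    first = np.argmax(deep, axis=0)  # first True along each column
    first[~deep.any(axis=0)] = -1
    return first


def extract_profiles(field, rows):
    """Collect vertical profiles at the selected cells.

    field : (n_depth, n_lat, n_lon) temperature or salinity
    rows  : output of first_deep_cell_per_meridian
    Returns an array of shape (n_selected, n_depth).
    """
    cols = np.flatnonzero(rows >= 0)
    return field[:, rows[cols], cols].T
```

The resulting set of profiles is the kind of finite sample that the GAN is then trained on to generate further synthetic profiles.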

Alternative melt rate parameterisations: PICO and PLUME
The performance of MELTNET is compared to two intermediate complexity melt rate parameterisations, with all models judged on their ability to match NEMO's melt rate fields. The parameterisations are the Potsdam Ice-shelf Cavity mOdel (PICO, Reese et al. (2018a)) and a 2D implementation of the plume model (Jenkins, 1991) based on Lazeroms et al. (2018) (referred to hereafter as PLUME). The PICO model includes a representation of the vertical overturning circulation within an ice shelf cavity with a series of boxes that transfer heat and salt from the grounding line to the ice front. The PLUME model adapts 1-D plume theory by selecting a melt plume origin at any given ice shelf point and determining melt rate as a function of properties at this plume origin and local ice shelf conditions. The plume origin for every ice shelf point is selected as the closest grounding line point, scaled by grounding line depth so that deeper origin points are favoured. Many other melt rate parameterisations exist, but these were selected since they are generally regarded as the more advanced parameterisations, including physics related to cavity circulation while still remaining computationally inexpensive (Favier et al., 2019; Jourdain et al., 2020). For both parameterisations, a high resolution version of the synthetic geometries was converted to a finite element mesh and the Ua ice-flow model implementation of each model was used to calculate melt rates.
In order to make our comparison to the PLUME and PICO models as fair as possible, two uncertain parameters in each model were optimised using the full set of NEMO outputs. For the PICO model, these two parameters were the overturning strength ($C$) and the heat exchange coefficient ($\gamma^*_T$), which are also treated as tunable parameters in the original PICO paper (Reese et al., 2018a). For the PLUME model, the heat exchange parameter $\Gamma_{TS}$ (similar to, but not the same as, the $\gamma^*_T$ parameter for PICO) was selected, together with the plume entrainment coefficient ($E_0$). The PLUME and PICO models were run using the input geometry, temperature and salinity fields and then a total Normalised Root Mean Squared Error (NRMSE) was calculated compared to NEMO melt rates. This procedure was repeated, updating the model parameters to minimise total NRMSE, to derive an optimal set of parameters for each model that most closely replicated the NEMO melt rates. The optimised parameters are shown in Table B1.

Results
Figure 4 presents the main result of this study: a grid of different geometries (row 1) and corresponding melt rates (m yr$^{-1}$) as calculated by NEMO (row 2), MELTNET (row 3), PICO (row 4) and PLUME (row 5). Melt rates calculated by MELTNET clearly stand out amongst the lower three panels as the best qualitative match to the NEMO ocean model. In order to represent the range of performance, rather than just the best results for any particular parameterisation, we assign a score to each MELTNET result and sample evenly from the distribution of results by calculating the quantiles of the scores. The score used here (row 6) is based on a combination of the NRMSE and correlation coefficient; the two quantities are treated as vectors whose length is scaled such that a vector of zero length is a perfect score (NRMSE of 0 or correlation coefficient of 1) and the score is the L2 norm of these two vectors. Row 6 of Fig. 4 [...]

Figure 4. [...] Melt rate results all use the same colourmap, with red and blue indicating melting and refreezing, respectively. Note the colour map gradient is not linear, but is greatest around zero, to make it easier to distinguish the magnitude of melting/refreezing over the bulk of the ice shelves. Numbers in red and blue at the top of each melt rate column show the area averaged sub-ice-shelf temperature and salinity, respectively. Numbers in the bottom left corner of each melt rate panel show the averaged melt rate as calculated by each model. Both the PICO and PLUME models show results using optimised parameters (Table B1).
[...] validation set (panel a). The Pearson correlation coefficient between MELTNET and the NEMO model (mean 0.65) was higher than that of PICO (mean 0.14) and PLUME (0.10) for 99% of the members of the validation set (panel b). These results show that MELTNET not only has a lower misfit than the other models but is also far better at reproducing spatial patterns, i.e. getting melting and refreezing in the right areas of each ice shelf.
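The two metrics and the combined score described in the text can be sketched as follows. This Python sketch is illustrative only: the normalisation of the RMSE (here by the range of the reference field) and the unit scaling of the two score components are assumptions, since neither is specified in this section, and the inputs are assumed to contain ice shelf pixels only.

```python
import numpy as np


def nrmse(pred, ref):
    """Root mean squared error normalised by the range of the reference field.

    The choice of normalisation (range of `ref`) is an assumption.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    rmse = np.sqrt(np.mean((pred - ref) ** 2))
    return rmse / (ref.max() - ref.min())


def pearson_r(pred, ref):
    """Pearson correlation coefficient between two melt rate fields."""
    return np.corrcoef(np.ravel(pred), np.ravel(ref))[0, 1]


def combined_score(pred, ref):
    """L2 norm of (NRMSE, 1 - r); zero is a perfect score.

    Both components are given equal, unscaled weight here (assumed).
    """
    return np.hypot(nrmse(pred, ref), 1.0 - pearson_r(pred, ref))
```

A prediction identical to the reference scores zero; a prediction with the right pattern but a constant offset is penalised only through the NRMSE term, since its correlation remains one.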

A commonly raised and sensible concern with all forms of deep learning is that poorly trained networks can give the right answer for the wrong reasons. We take a number of steps to avoid this issue; for example, we randomly rotate the ice shelf orientation so that the network does not learn to associate high melting with grid cells on one side of the domain. We can also explore how MELTNET's predictions compare to our understanding of the physics underlying ocean-ice shelf interactions. This is done for a simplified ice shelf geometry, i.e., uniform bathymetry at a depth of 1100 m and an ice shelf 160 km long and 352 km wide. Ice shelf draft varies linearly from 600 m at the grounding line to 200 m at the ice front and there is no across-shelf variation in the geometry. Furthermore, both salinity and temperature are constant throughout the entire domain, and as before, this forms the northern boundary restoring condition. In this context, two simple relationships are expected to emerge: (1) a linear dependency of melt rates on changes in ice-shelf slope and (2) a quadratic dependency of melt rates on changes in ocean temperature (Holland et al., 2008). To investigate these two relationships, we: (1) vary ice shelf basal slope by keeping grounding line depth constant and moving the ice front and (2) vary temperature by a uniform amount in the entire domain. The change in the ice-shelf cavity melt rate for these two sensitivity tests is shown in Fig 6. In both tests, the dependence matched that expected by theory, as shown by the linear and quadratic trend lines through the sample points. This goes some way to demonstrating that MELTNET has learnt an accurate representation of actual melt physics. This is in spite of the fact that these simplified domains are very different from the more complex geometries that the network has been trained on.

[...] The upper section of each panel presents the same information as probability density plots. Note that lower NRMSE and higher correlation coefficient mean a better fit to the ground-truth NEMO melt rates.

Figure 6. Area averaged basal melt rate from MELTNET for an idealised ice shelf geometry as a function of changes in ocean temperature (red) and ice shelf basal slope (blue). Dashed lines show quadratic and linear fits to results, with corresponding $r^2$ values in matching colours (plot labels: $r^2 = 0.986$ and $r^2 = 0.986$).
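The trend lines and $r^2$ values of the kind shown in Fig 6 can be computed with an ordinary least-squares polynomial fit. The Python sketch below is a generic illustration and makes no claim about how the fits in this study were produced; the function name is hypothetical.

```python
import numpy as np


def fit_with_r2(x, y, degree):
    """Least-squares polynomial fit and its coefficient of determination.

    degree = 1 gives the linear fit (melt rate against basal slope),
    degree = 2 the quadratic fit (melt rate against ocean temperature).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    fitted = np.polyval(coeffs, x)
    ss_res = np.sum((y - fitted) ** 2)     # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return coeffs, 1.0 - ss_res / ss_tot
```

Applied to the area averaged melt rates from each sensitivity test, a value of $r^2$ close to one for the expected functional form indicates that the emulator reproduces that dependence over the sampled range.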

Discussion and Conclusions
The MELTNET deep neural network can produce melt rates that closely resemble those calculated by the NEMO ocean model for synthetic geometries that were not part of the training set. When compared to two intermediate complexity melt rate parameterisations, MELTNET outperforms them in terms of both overall NRMSE and correlation, even when parameters in those models are tuned to minimise the misfit for the geometries we test. In terms of area averaged melt rates (Fig 4), MELTNET also performs favorably compared to PICO and PLUME, which both tend to underestimate this value. Since these two models are tuned to minimise the overall NRMSE rather than average melt, this is not particularly surprising, but it nevertheless highlights the problem with tuning these models based on one metric, leading them to perform poorly in other regards.
Correctly predicting spatial patterns of ice shelf melting (as shown by the high correlation of our results to NEMO), rather than just the magnitude, is crucial because the sensitivity of an ice shelf to thinning will vary across that ice shelf (Reese et al., 2018b). Some regions within an ice shelf can be considered entirely 'passive', in that reductions in ice thickness in these areas have no impact on ice flux across the grounding line. On the other hand, perturbations to ice shelf melting within certain highly buttressed regions, for example in the shear margin of an ice shelf, can have a much greater impact on ice-sheet discharge than other regions such as downstream of an ice stream, despite those being otherwise dynamically important (Feldmann et al., 2021). In general, a small proportion of ice shelves contribute a disproportionately large fraction of the total buttressing force (Reese et al., 2018b).
One important caveat to this work is that MELTNET can only be, at best, as good as the ocean model that it has been trained on. Here, we necessarily treat the ocean model as our ground-truth, since the geometries are entirely synthetic. Training MELTNET on real world observations would be preferable but there are not enough distinct ice shelves, or indeed sufficient observations of melting, for this to be feasible. Thus we consider NEMO melt rates akin to observations and matching these as accurately as possible is our goal. The NEMO ocean model setup has a number of simplifications; for example, there is no representation of sea ice, surface forcing or ocean tides. These processes would all impact the melt rate calculation. Some missing processes would be possible to add in with our synthetic geometry approach, but others present more significant challenges.
On the other hand, this methodology also provides interesting advantages, since adding complexity to the representation of the ocean model physics can simply be achieved by including more processes in the ocean model. This is in contrast to a typical model where adding new physics is a significant undertaking that can require replacing large sections of code and considerable testing. Furthermore, since the method is not limited in terms of input fields, any missing information required to properly train the network with new physics could easily be added into a new band in the input image. Conversely, processes could be removed by reducing the number of inputs used to train MELTNET. Doing this would provide insights into which processes are important for producing realistic melt rates, possibly aiding the development of alternative parameterisations.
Further improvements to MELTNET are no doubt possible by altering the network architecture. We developed and tested an architecture that combined both DAE and segmentation components into one network; however, this proved harder to optimise than the approach we have presented here. That being said, this possibility could be explored further to simplify the training and application of the MELTNET parameterisation and potentially also increase its accuracy.
The results presented here show a promising first step to a parameterisation for ocean-induced melting that shows high fidelity to advanced ocean models with very low computational cost. That being said, more work is required before applying this to transient ice sheet models. Future work must demonstrate that the network, trained on synthetic geometries, is also capable of reproducing melt rate patterns on real ice shelves based on the limited observations that exist or in comparison to state-of-the-art ocean models. One limiting factor currently is the size of the domain, which at ∼502 km square is not large enough to cover the

Figure A1. Schematic showing the GAN architecture used to generate synthetic temperature and salinity profiles from the WOA observations. The generator network (top) takes a 25x1 random vector and, through a series of convolution layers, outputs a synthetic combined profile. The discriminator network (bottom) is given both real and synthetic profiles and labels these as real or fake. Both networks learn from one another, resulting in a generator network that can create an infinite number of realistic temperature and salinity profiles from random noise.
This work has demonstrated that a deep learning network can be trained to emulate an ocean model in terms of predicted melt rates beneath an Antarctic ice shelf. When applied to a wide range of synthetic geometries, MELTNET agrees closely with the NEMO model that it was trained on, and outperforms other commonly used parameterisations if we assume that the ocean model represents the best estimate of melt rates for a given geometry. These results show that a deep learning emulator may provide useful melt rate estimates for ice sheet models, but more work is needed to refine the methodology and test this approach on real ice shelf geometries with observations of melt rates. An accurate and efficient parameterisation of melt rates beneath Antarctic ice shelves is urgently needed to improve the representation of this crucial component of mass loss.

generator network and a discriminator network. The generator learns to generate synthetic temperature and salinity profiles while the discriminator network attempts to distinguish between real profiles (from the WOA dataset) and the profiles created by the generator. Initially, neither network knows what to do, but the two are in direct competition and learn from each other to improve. At the end of the training process, the generator network has learnt to take a random vector input as a seed and output temperature and salinity profiles that closely resemble the real data. Since the GAN takes a random seed as input, any number of these random seeds can be used to generate the desired number of synthetic profiles.
The specific architecture used, shown in Fig. A1, is a modification of the Deep Convolutional GAN (DCGAN) as proposed by Radford et al. (2016). Temperature and salinity profiles from WOA are concatenated into one vector, which the generator aims to reproduce and the discriminator learns to differentiate from synthetic profiles. The discriminator network includes dropout layers with a dropout of 50%, which was necessary to avoid mode collapse. The two networks are trained simultaneously for 500 epochs and reach an equilibrium in which the loss for each stabilises around 0.5. Every available temperature and salinity profile from the WOA dataset, used to train the GAN, is shown in Fig. A2a, together with a sample of synthetic profiles generated by the GAN in Fig. A2b.

Figure B1. Schematic for the segmentation network used in this study. An input 64x64x4 image goes through layers of convolution, batch normalisation and swish layers in sequences of pooling and unpooling layers. Once trained on NEMO results, the final output is a segmented image consisting of melt rate labels that can then be converted into a melt rate field using the AE network.

Figure B2. Schematic for the AE net architecture used in this study. A melt rate field, as calculated by NEMO for a given input geometry, is converted to a matrix of discrete melt labels. These labelled melt rates serve as the input to the network and go through a series of convolution, batch normalisation and swish layers, leading to an output. The loss of this output is calculated against the original melt rate field to train the network to recover a continuous melt rate field from discrete melt rate labels.

ΓTS PLUME 2.19e-4 6.00e-4
E0 PLUME 1.98e-2 3.60e-2
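The conversion between a continuous melt rate field and discrete melt labels described in the Fig. B2 caption can be sketched as below. This is a minimal illustration under stated assumptions: the class boundaries in `BIN_EDGES` are hypothetical, not the values used for MELTNET, and the midpoint lookup in `labels_to_midpoints` is only a crude stand-in for the trained AE network, which learns the mapping from labels back to a continuous field.

```python
from bisect import bisect_right

# Hypothetical class boundaries in m/yr; not the values used in the study.
BIN_EDGES = [0.0, 1.0, 5.0, 20.0]


def to_labels(melt, edges=BIN_EDGES):
    """Convert a continuous melt rate field to integer class labels.

    Label 0 is melt below the first edge (refreezing here); label
    len(edges) is melt at or above the last edge.
    """
    return [[bisect_right(edges, v) for v in row] for row in melt]


def labels_to_midpoints(labels, edges=BIN_EDGES):
    """Crude stand-in for the AE network: one fixed value per label.

    Interior classes map to their bin midpoint; the two open-ended
    classes map to the nearest edge.
    """
    reps = ([edges[0]]
            + [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
            + [edges[-1]])
    return [[reps[k] for k in row] for row in labels]


# Hypothetical 2x3 melt rate field (m/yr).
melt = [[-0.3, 0.4, 2.0],
        [6.0, 25.0, 1.0]]

labels = to_labels(melt)
recovered = labels_to_midpoints(labels)
print(labels)     # discrete labels, the form predicted by segmentation
print(recovered)  # piecewise-constant approximation of the field
```

The fixed lookup loses all variation within a class, which is the information a learned mapping from labels to a continuous field is intended to recover.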