DeepBedMap: a deep neural network for resolving the bed topography of Antarctica

To resolve the bed elevation of Antarctica, we present DeepBedMap – a novel machine learning method that can produce Antarctic bed topography with adequate surface roughness from multiple remote sensing data inputs. The super-resolution deep convolutional neural network model is trained on scattered regions in Antarctica where high-resolution (250 m) ground-truth bed elevation grids are available. This model is then used to generate high-resolution bed topography in less surveyed areas. DeepBedMap improves on previous interpolation methods by not restricting itself to a low-spatial-resolution (1000 m) BEDMAP2 raster image as its prior image. It takes in additional high-spatial-resolution datasets, such as ice surface elevation, velocity and snow accumulation, to better inform the bed topography even in the absence of ice thickness data from direct ice-penetrating-radar surveys. The DeepBedMap model is based on an adapted architecture of the Enhanced Super-Resolution Generative Adversarial Network, chosen to minimize per-pixel elevation errors while producing realistic topography. The final product is a four-times-upsampled (250 m) bed elevation model of Antarctica that can be used by glaciologists interested in the subglacial terrain and by ice sheet modellers wanting to run catchment- or continent-scale ice sheet model simulations. We show that DeepBedMap offers a rougher topographic profile compared to the standard bicubically interpolated BEDMAP2 and BedMachine Antarctica and envision it being used where a high-resolution bed elevation model is required.


1 Introduction
The bed of the Antarctic ice sheet is one of the most challenging surfaces on Earth to map due to the thick layer of ice cover. Knowledge of bed elevation is however essential for estimating the volume of ice currently stored in the ice sheets and for input to the numerical models that are used to estimate the contribution ice sheets are likely to make to sea level in the coming century. The Antarctic ice sheet is estimated to hold a sea level equivalent (SLE) of 57.9 ± 0.9 m. Between 2012 and 2017, the Antarctic ice sheet was losing mass at an average rate of 219 ± 43 Gt yr −1 (0.61 ± 0.12 mm yr −1 SLE), with most of the ice loss attributed to the acceleration, retreat and rapid thinning of major West Antarctic Ice Sheet outlet glaciers (IMBIE, 2018). Bed elevation exerts additional controls on ice flow by routing subglacial water and providing frictional resistance to flow (Siegert et al., 2004). Bed roughness, especially at short wavelengths, exerts a frictional force against the flow of ice, making it an important influence on ice velocity (Bingham et al., 2017; Falcini et al., 2018). The importance of bed elevation has led to major efforts to compile bed elevation models of Antarctica, notably with the BEDMAP1 (Lythe and Vaughan, 2001) and BEDMAP2 (Fretwell et al., 2013) products. A need for a higher-spatial-resolution digital elevation model (DEM) is also apparent, as ice sheet models move to using sub-kilometre grids in order to quantify glacier ice flow dynamics more accurately (Le Brocq et al., 2010; Graham et al., 2017). Finer grids are especially important at the ice sheet's grounding zone on which adaptive mesh refinement schemes have focused (e.g. Cornford et al., 2016), and attention to the bed roughness component is imperative for proper modelling of fast-flowing outlet glaciers (Durand et al., 2011; Nias et al., 2016).
Here we address the challenge of producing a high-resolution DEM while preserving a realistic representation of the bed terrain's roughness.
Estimating bed elevation directly from geophysical observations primarily uses ice-penetrating-radar methods (e.g. Robin et al., 1970). Airborne radar methods enable reliable along-track estimates with low uncertainty (around the 1 % level) introduced by imperfect knowledge of the firn and ice velocity structure, with some potential uncertainty introduced by picking the bed return. Radar-derived bed estimates remain limited in their geographic coverage (Fretwell et al., 2013) and are typically anisotropic in their coverage, with higher spatial sampling in the along-track direction than between tracks.
To overcome these limitations, indirect methods of estimating bed elevation have been developed; these include inverse methods and spatial statistical methods. Inverse methods use surface observations combined with glaciological-process knowledge to determine ice thickness (e.g. van Pelt et al., 2013). A non-linear relationship exists between the thickness of glaciers, ice streams and ice sheets and how they flow (Raymond and Gudmundsson, 2005), meaning one can theoretically use a well-resolved surface to infer bed properties (e.g. Farinotti et al., 2009). Using surface observation inputs, such as the glacier outline, surface digital elevation models, surface mass balance, surface rate of elevation change, and surface ice flow velocity, various models have been tested in the Ice Thickness Models Intercomparison eXperiment (ITMIX; Farinotti et al., 2017) to determine ice thickness (surface elevation minus bed elevation). While significant inter-model uncertainties do exist, they can be mitigated by combining several models in an ensemble to provide a better consensus estimate (Farinotti et al., 2019). On a larger scale, the inverse technique has also been applied to the Greenland and Antarctic ice sheets, specifically using the mass conservation approach (Morlighem et al., 2011). Spatial statistical methods seek to derive a higher-spatial-resolution bed by transferring the topographical character of bed features known in great detail in one area to other regions. For example, the conditional simulation method applied by Goff et al. (2014) is able to resolve both fine-scale roughness and channelized morphology over the complex topography of Thwaites Glacier and makes use of the fact that roughness statistics differ between highland and lowland areas. Graham et al. (2017) use a two-step approach to generate their synthetic high-resolution grid, with the high-frequency roughness component coming from the ICECAP and BEDMAP1 compilation radar point data and the low-frequency component coming from BEDMAP2. Neither method is perfect, and we see all of the above methods as complementary.
We present a deep-neural-network method that is trained on direct ice-penetrating-radar observations over Antarctica and one which has features from both the indirect inverse modelling and spatial statistical methodologies. An artificial neural network, loosely based on biological neural networks, is a system made up of neurons. Each neuron comprises a simple mathematical function that takes an input to produce an output value, and neural networks work by combining many of these neurons together. The term deep neural network is used when there is not a direct function mapping between the input data and the final output but instead two or more layers that are connected to one another (see LeCun et al., 2015, for a review). They are trained using backpropagation, a procedure whereby the weights or parameters of the neurons' connections are adjusted so as to minimize the error between the ground truth and the output of the neural network (Rumelhart et al., 1986). Similar work has been done before using artificial neural networks for estimating bed topography (e.g. Clarke et al., 2009; Monnier and Zhu, 2018), but to our knowledge, no one so far in the glaciological community has attempted to use convolutional neural networks, which work in a more spatially aware, 2-D setting. Convolutional neural networks differ from standard artificial neural networks in that they use kernels or filters in place of regular neurons (again, see LeCun et al., 2015, for a review). The techniques we employ are prevalent in the computer vision community, having existed since the 1980s (Fukushima and Miyake, 1982; LeCun et al., 1989), and are commonly used in visual pattern recognition tasks (e.g. LeCun et al., 1998; Krizhevsky et al., 2012).
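The backpropagation idea above can be illustrated with a minimal sketch: a single neuron (w·x + b) fitted by gradient descent on a mean-squared error. The data, learning rate and target function here are made up for illustration and are not from the paper.

```python
import numpy as np

# Minimal illustration of backpropagation: one neuron, w*x + b, trained by
# gradient descent to minimize squared error against a known linear target.
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 1.0                     # ground truth the neuron should learn

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    y_hat = w * x + b                 # forward pass
    err = y_hat - y                   # prediction error
    w -= lr * 2 * np.mean(err * x)    # gradient of MSE w.r.t. w
    b -= lr * 2 * np.mean(err)        # gradient of MSE w.r.t. b

print(round(w, 2), round(b, 2))       # 3.0 1.0
```

The same adjust-weights-to-reduce-error loop, scaled up to millions of parameters and convolutional filters, is what trains a deep network such as DeepBedMap.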
Our main contributions are twofold: we (1) present a high-resolution (250 m) bed elevation map of Antarctica that goes beyond the 1 km resolution of BEDMAP2 (Fretwell et al., 2013) and (2) design a deep convolutional neural network to integrate as many remote sensing datasets as possible that are relevant to estimating Antarctica's bed topography. We name the neural network "DeepBedMap", and the resulting digital elevation model (DEM) product "DeepBedMap_DEM".
2 Related work

2.1 Super resolution
Super resolution involves the processing of a low-resolution raster image into a higher-resolution one (Tsai and Huang, 1984). The idea is similar to the work on enhancing regular photographs to look crisper. The problem is especially ill-posed because a specific low-resolution input can correspond to many possible high-resolution outputs, resulting in the development of several different algorithms aimed at solving this challenge (see Nasrollahi and Moeslund, 2014, for a review). One promising approach is to use deep neural networks (LeCun et al., 2015) to learn an end-to-end mapping between the low- and high-resolution images, a method coined the Super-Resolution Convolutional Neural Network (SRCNN; Dong et al., 2014). Since the development of SRCNN, multiple advances have been made to improve the perceptual quality of super-resolution neural networks (see Yang et al., 2019, for a review). One way is to use a better loss function, also known as a cost function. A loss function is a mathematical function that represents the error between the output of the neural network and the ground truth (see also Appendix A). By having an adversarial component in its loss function, the Super-Resolution Generative Adversarial Network (SRGAN; Ledig et al., 2017) manages to produce super-resolution images with finer perceptual details. A generative adversarial network (Goodfellow et al., 2014) consists of two neural networks, a generator and a discriminator. A common analogy used is to treat the generator as an artist that produces imitation paintings and the discriminator as an art critic that determines the authenticity of the paintings. The artist wants to fool the critic into believing its paintings are real, while the critic tries to identify problems with the paintings. Over time, the artist or generator model learns to improve itself based on the critic's judgement, producing authentic-looking paintings with high perceptual quality.
Perceptual quality is the extent to which an image looks like a valid natural image, usually as judged by a human. In this case, perceptual quality is quantified mathematically by the discriminator or critic taking into account high-level features of an image like contrast, texture, etc. Another way to improve performance is by reconfiguring the neural network's architecture, wherein the layout or building blocks of the neural network are changed. By removing unnecessary model components and adding residual connections (He et al., 2015), the enhanced deep super-resolution network (EDSR; Lim et al., 2017) features a deeper neural network model that has better performance than older models. For the DeepBedMap model, we choose to adapt the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN; Wang et al., 2019), which brings together the ideas mentioned above. This approach produces state-of-the-art perceptual quality and won the 2018 Perceptual Image Restoration and Manipulation Challenge on Super Resolution (Region 3; Blau et al., 2018).

2.2 Network conditioning
Network conditioning means having a neural network process one source of information in the context of other sources (Dumoulin et al., 2018). In a geographic context, conditioning is akin to using not just one layer but also other relevant layers with meaningful links to provide additional information for the task at hand. Many ways exist to insert extra conditional information into a neural network, such as concatenation-based conditioning, conditional biasing, conditional scaling and conditional affine transformations (Dumoulin et al., 2018). We choose to use the concatenation-based conditioning approach, whereby all of the individual raster images are concatenated together channel-wise, much like the individual bands of a multispectral satellite image. This was deemed the most appropriate conditioning method as all the contextual remote sensing datasets are raster grid images and also because this approach aligns with related work in the remote sensing field.
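A minimal sketch of concatenation-based conditioning, assuming an NCHW (batch, channel, height, width) layout; the channel counts and spatial sizes here are illustrative, not the model's actual values:

```python
import numpy as np

# Channel-wise concatenation of conditioning inputs, like stacking the bands
# of a multispectral image. Each raster has already been brought to a common
# spatial shape (here 9 x 9) by its own (hypothetical) sub-network.
batch = 2
bed_feat  = np.random.rand(batch, 32, 9, 9)  # from the BEDMAP2 input
surf_feat = np.random.rand(batch, 32, 9, 9)  # from surface elevation
vel_feat  = np.random.rand(batch, 32, 9, 9)  # from ice velocity (x, y)
snow_feat = np.random.rand(batch, 32, 9, 9)  # from snow accumulation

conditioned = np.concatenate([bed_feat, surf_feat, vel_feat, snow_feat], axis=1)
print(conditioned.shape)  # (2, 128, 9, 9): channels stacked, spatial dims kept
```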
An example similar to this DEM super-resolution problem is the classic problem of pan-sharpening, whereby a blurry low-resolution multispectral image conditioned with a high-resolution panchromatic image can be turned into a high-resolution multispectral image. There is ongoing research into the use of deep convolutional neural networks for pan-sharpening (Masi et al., 2016; Scarpa et al., 2018), sometimes with the incorporation of specific domain knowledge, all of which show promising improvements over classical image processing methods. More recently, generative adversarial networks (Goodfellow et al., 2014) have been used in the conditional sense for general image-to-image translation tasks (e.g. Isola et al., 2016; Park et al., 2019), and also for producing more realistic pan-sharpened satellite images (Liu et al., 2018). Our DeepBedMap model builds upon these ideas and other related DEM super-resolution work (Xu et al., 2015; Chen et al., 2016), while incorporating extra conditional information specific to the cryospheric domain for resolving the bed elevation of Antarctica.

3.1 Data preparation
Our convolutional neural network model works on 2-D images, so we ensure all the datasets are in a suitable raster grid format. Ground-truth bed elevation points picked from radar surveys (see Table 1) are first compiled together onto a common Antarctic stereographic projection (EPSG:3031) using the WGS84 datum, reprojecting where necessary. These points are then gridded onto a 250 m spatial resolution (pixel-node-registered) grid. We preprocess the points first using Generic Mapping Tools v6.0 (GMT6; Wessel et al., 2019), computing the median elevation for each pixel block in a regular grid. The preprocessed points are then run through an adjustable-tension continuous-curvature spline function with a tension factor set to 0.35 to produce a digital elevation model grid. This grid is further post-processed to mask out pixels that are more than 3 pixels (750 m) from the nearest ground-truth point.
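The final distance-masking step can be sketched as follows; the grid size and ground-truth point locations are invented for illustration, and the GMT preprocessing and spline steps are not reproduced here:

```python
import numpy as np

# Mask out interpolated pixels lying more than 3 pixels (750 m at 250 m
# resolution) from the nearest ground-truth radar point, via brute-force
# nearest-neighbour distances (fine for a small illustrative grid).
res = 250.0
max_dist = 3 * res                                  # 750 m threshold
yy, xx = np.meshgrid(np.arange(8) * res, np.arange(8) * res, indexing="ij")
dem = np.full((8, 8), 100.0)                        # toy interpolated elevations
points = np.array([[0.0, 0.0], [1750.0, 1750.0]])   # (y, x) of radar picks

# Distance from every pixel centre to its nearest ground-truth point
dist = np.min(np.hypot(yy[..., None] - points[:, 0],
                       xx[..., None] - points[:, 1]), axis=-1)
masked = np.where(dist <= max_dist, dem, np.nan)
print(np.count_nonzero(~np.isnan(masked)))          # number of pixels kept
```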
[Figure 1 caption fragment: the input module concatenates the model inputs (Table 2) into a consistent tensor; the core module processes the rich information contained within the concatenated inputs; the upsampling module scales the tensor up by 4 times and does some extra processing to produce the output DeepBedMap_DEM.]

[Table 1 caption: High-resolution ground-truth datasets from ice-penetrating-radar surveys (collectively labelled as y) used to train the DeepBedMap model. Training site locations can be seen in Fig. 2.]

To create the training dataset, we use a sliding window to obtain square tiles cropped from the high-resolution (250 m) ground-truth bed elevation grids, with each tile required to be completely filled with data (i.e. no Not a Number (NaN) values). Besides these ground-truth bed elevation tiles, we also obtain other tiled inputs (see Table 2) corresponding to the same spatial bounding box area. To reduce border edge artefacts in the prediction, the neural network model's input convolutional layers (see Fig. 1) use no padding (also known as "valid" padding) when performing the initial convolution operation. This means that the model input grids (x, w_1, w_2, w_3) have to cover a larger spatial area than the ground-truth grids (y). More specifically, the model inputs cover an area of 11 km × 11 km (e.g. 11 pixels × 11 pixels for BEDMAP2), while the ground-truth grids cover an area of 9 km × 9 km (36 pixels × 36 pixels). As the pixels of the ground-truth grids may not align perfectly with those of the model's input grids, we use bilinear interpolation to ensure that all the input grids cover the same spatial bounds as those of the reference ground-truth tiles. The general locations of these training tiles are shown in orange in Fig. 2.
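The no-NaN tiling criterion can be sketched as follows, with an illustrative tile size and stride rather than the actual training configuration:

```python
import numpy as np

# Sliding-window tiling sketch: crop fixed-size square tiles from a gridded
# DEM and keep only tiles completely free of NaN (no-data) pixels.
def extract_tiles(grid, size=4, stride=2):
    tiles = []
    for i in range(0, grid.shape[0] - size + 1, stride):
        for j in range(0, grid.shape[1] - size + 1, stride):
            tile = grid[i:i + size, j:j + size]
            if not np.isnan(tile).any():      # reject partially masked tiles
                tiles.append(tile)
    return tiles

dem = np.zeros((8, 8))
dem[0, 0] = np.nan                            # simulate one masked-out pixel
tiles = extract_tiles(dem)
print(len(tiles))  # 8: one of the nine window positions touches the NaN
```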

3.2 Model design
Our DeepBedMap model is a generative adversarial network (Goodfellow et al., 2014) composed of two convolutional neural network models, a generator G_θ that produces the bed elevation prediction and a discriminator (critic) D_η that will judge the quality of this output. The two models are trained to compete against each other, with the generator trying to produce images that are misclassified as real by the discriminator and the discriminator learning to spot problems with the generator's prediction in relation to the ground truth. Following this is a mathematical definition of the neural network models and their architecture.
The objective of the main super-resolution generator model G_θ is to produce a high-resolution (250 m) grid of Antarctica's bed elevation ŷ given a low-resolution (1000 m) BEDMAP2 (Fretwell et al., 2013) image x. However, the information contained in BEDMAP2 is insufficient for this regular super-resolution task, so we provide the neural network with more context through network conditioning (see Sect. 2.2). Specifically, the model is conditioned at the input block stage with three raster grids (see Table 2): (1) ice surface elevation w_1, (2) ice surface velocity w_2 and (3) snow accumulation w_3. This can be formulated as

ŷ = G_θ(x, w_1, w_2, w_3), (1)

where G_θ is the generator (see Fig. 1) that produces high-resolution image candidates ŷ. For brevity in the following equations, we simplify Eq. (1) to hide the conditional inputs w_1, w_2 and w_3, so that all input images are represented using x. To train the generative adversarial network, we update the parameters of the generator θ and discriminator η as follows:

θ̂ = arg min_θ (1/N) Σ_{n=1..N} L_G(ŷ_n, y_n), (2)

η̂ = arg min_η (1/N) Σ_{n=1..N} L_D(ŷ_n, y_n), (3)

where new estimates of the neural network parameters θ̂ and η̂ are produced by minimizing the total loss functions L_G and L_D, respectively, for the generator G and discriminator D, and ŷ_n and y_n are the sets of predicted and ground-truth high-resolution images over N training samples. The generator network's loss L_G is a custom perceptual loss function with four weighted components: content, adversarial, topographic and structural loss. The discriminator network's loss L_D is designed to maximize the likelihood that predicted images are classified as fake (0) and ground-truth images are classified as real (1). Details of these loss functions are described in Appendix A.

[Table 2 fragments: Antarctic snow accumulation | snow accumulation rate (kg m−2 yr−1) | 1000 m | Arthern et al. (2006). Footnotes: (a) the x and y components of velocity are used here instead of the norm; (b) gaps in the 100 m mosaic are filled in with a bilinear-resampled 200 m resolution REMA image; (c) originally 450 m, bilinear-resampled to 500 m.]
Noting that the objective of the generator G is opposite to that of the discriminator D, we formulate the adversarial min-max problem following Goodfellow et al. (2014) as

min_G max_D V(D, G) = E_{y∼P_data(y)}[log D(y)] + E_{x∼P_G(x)}[log(1 − D(G(x)))], (4)

where for the discriminator D, we maximize the expectation E, or the likelihood that the probability distribution of the discriminator's output fits D(y) = 1 when y ∼ P_data(y); i.e. we want the discriminator to classify the high-resolution image as real (1) when the image y is in the distribution of the ground-truth images P_data(y). For the generator G, we minimize the likelihood that the discriminator classifies the generator output as fake, D(G(x)) = 0, when x ∼ P_G(x); i.e. we do not want the discriminator to classify the super-resolution image as fake (0) when the inputs x are in the distribution of generated images P_G(x). The overall goal of the entire network is to make the distribution of generated images G(x) as similar as possible to the ground truth y through optimizing the value function V. DeepBedMap's model architecture is adapted from the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN; Wang et al., 2019). The generator model G (see Fig. 1) consists of an input, core and upsampling module. The input module is made up of four sub-networks, each one a convolutional neural network that processes its input image into a consistent 9 × 9 shaped tensor. Note that the MEaSUREs Ice Velocity (Mouginot et al., 2019b) input has two channels, one each for the x and y velocity components. All the processed inputs are then concatenated together channel-wise before being fed into the core module. The core module is based on the ESRGAN architecture with 12 residual-in-residual dense blocks (see Wang et al., 2019, for details), sandwiched between a pre-residual and a post-residual convolutional layer. A skip connection runs from the pre-residual layer's output to the post-residual layer's output before being fed into the upsampling module.
This skip connection (He et al., 2016) helps with the neural network training process by allowing the model to also consider minimally processed information from the input module, instead of relying solely on derived information from the residual-block layers when performing the upsampling. The upsampling module is composed of two upsampling blocks, each a nearest-neighbour upsampling followed by a convolutional layer and a leaky rectified linear unit (LeakyReLU; Maas et al., 2013) activation, which together progressively scale the tensor by 2 times at each block. Following these are two deformable convolutional layers (Dai et al., 2017) which produce the final-output super-resolution DeepBedMap_DEM. This generator model is trained to gradually improve its prediction by comparing the predicted output with ground-truth images in the training regions (see Fig. 2), using the total loss function defined in Eq. (A9).
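The resolution change performed by the upsampling module can be sketched as follows; the learned convolutions are omitted, leaving only the nearest-neighbour ×2 scaling and a LeakyReLU activation, and the toy tensor values are illustrative:

```python
import numpy as np

def nearest_neighbour_x2(t):
    # Nearest-neighbour upsampling: duplicate each pixel 2x along both axes
    return np.repeat(np.repeat(t, 2, axis=0), 2, axis=1)

def leaky_relu(t, alpha=0.2):
    # LeakyReLU: small negative slope instead of a hard zero
    return np.where(t > 0, t, alpha * t)

low_res = np.arange(9.0).reshape(3, 3) - 4.0   # toy 3 x 3 tensor with negatives
up = leaky_relu(nearest_neighbour_x2(leaky_relu(nearest_neighbour_x2(low_res))))
print(up.shape)  # (12, 12): two x2 blocks give the overall x4 upsampling
```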
The main differences between the DeepBedMap generator model and ESRGAN are the custom input block at the beginning and the deformable convolutional layers at the end. The custom input block is designed to handle the prior low-resolution BEDMAP2 image and the conditional inputs (see Table 2). Deformable convolution was chosen in place of standard convolution so as to enhance the model's predictive capability by having it learn dense spatial transformations.
Besides the generator model, there is a separate adversarial discriminator model D (not shown in the paper). Again, we follow ESRGAN's lead by implementing the adversarial discriminator network in the style of the Visual Geometry Group convolutional neural network model (VGG; Simonyan and Zisserman, 2014). The discriminator model consists of 10 blocks, each made up of a convolutional, batch normalization (Ioffe and Szegedy, 2015) and LeakyReLU (Maas et al., 2013) layer, followed by two fully connected layers of 100 neurons and 1 neuron, respectively. For numerical stability, we omit the final fully connected layer's sigmoid activation function from the discriminator model's construction, integrating it instead into the binary cross-entropy loss functions at Eqs. (A2) and (A3) using the log-sum-exp function. The output of this discriminator model is a value ranging from 0 (fake) to 1 (real) that scores the generator model's output image. This score is used by both the discriminator and generator in the training process and helps to push the predictions towards more realistic bed elevations. More details of the neural network training setup can be found in Appendix B.
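The sigmoid-folding trick can be sketched as follows: rewriting the binary cross-entropy directly in terms of logits via the log-sum-exp identity keeps the loss finite even for extreme discriminator scores (the logits and labels here are illustrative):

```python
import numpy as np

# Binary cross-entropy on raw logits z with target t in {0, 1}:
# -[t*log(sigmoid(z)) + (1 - t)*log(1 - sigmoid(z))]
# rearranges exactly to (1 - t)*z + log(1 + exp(-z)),
# where the last term is computed stably via log-sum-exp.
def bce_with_logits(z, t):
    return (1 - t) * z + np.logaddexp(0.0, -z)

z = np.array([-500.0, 0.0, 500.0])   # extreme logits overflow naive sigmoid+log
t = np.array([0.0, 1.0, 1.0])        # 0 = fake, 1 = real
losses = bce_with_logits(z, t)
print(losses)                        # finite everywhere
```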

4.1 DeepBedMap_DEM topography
Here we present the output digital elevation model (DEM) of the super-resolution DeepBedMap neural network model and compare it with bed topography produced by other methods. The resulting DEM has a 250 m spatial resolution and is therefore a four-times-upsampled bed elevation grid product relative to BEDMAP2 (Fretwell et al., 2013). In Fig. 2, we show that the full Antarctic-wide DeepBedMap_DEM manages to capture general topographical features across the whole continent. The model is only valid for grounded-ice regions, but we have produced predictions extending outside of the grounding-zone area (including ice shelf cavities) using the same bed elevation, surface elevation, ice velocity and snow accumulation inputs where such data are available up to the ice shelf front. We emphasize that the bed elevation under the ice shelves has not been super resolved properly and is not intended for ice sheet modelling use. Users are encouraged to cut the DeepBedMap_DEM using their preferred grounding line (e.g. Bindschadler et al., 2011; Rignot et al., 2011; Mouginot et al., 2017) and replace the under-ice-shelf areas with another bathymetry grid product (e.g. GEBCO Bathymetric Compilation Group, 2020). The transition from the DeepBedMap_DEM to the bathymetry product across the grounding zone can then be smoothed using inverse distance weighting or an alternative interpolation method.
We now highlight some qualitative observations of DeepBedMap_DEM's bed topography beneath Pine Island Glacier (Fig. 3) and other parts of Antarctica (Fig. 4). DeepBedMap_DEM shows a terrain with realistic topographical features, having fine-scale bumps and troughs that make it rougher than that of BEDMAP2 (Fretwell et al., 2013) and BedMachine Antarctica (Morlighem, 2019) while still preserving the general topography of the area (Fig. 3). Over steep topographical areas such as the Transantarctic Mountains (Fig. 4a and h), DeepBedMap produced speckle (S) texture patterns. Along fast-flowing ice streams and glaciers (Fig. 4b-h), we can see ridges (R) aligned parallel to the sides of the valley, i.e. along the flow. In some cases, the ridges are also oriented perpendicularly to the flow direction, such as at Whillans Ice Stream (Fig. 4b), Bindschadler Ice Stream (Fig. 4c) and Totten Glacier (Fig. 4g), resulting in intersecting ridges that create a box-like, honeycomb structure. Over relatively flat regions in both West Antarctica and East Antarctica (e.g. Fig. 4g), there are some hummocky, wave-like (W) patterns occasionally represented in the terrain. Terrace (T) features can occasionally be found winding along the side of hills such as at the Gamburtsev Subglacial Mountains (Fig. 4i).

4.2 Surface roughness
We compare the roughness of DeepBedMap_DEM and BedMachine Antarctica with ground-truth grids from processed Operation IceBridge data (Shi et al., 2010), using the standard deviation (SD) of elevation as a simple measure of roughness (Rippin et al., 2014). We calculate the surface roughness for a single 250 m pixel from the SD of elevation values over a square 1250 m × 1250 m area (i.e. 5 pixels × 5 pixels) surrounding the central pixel. Focusing on Thwaites Glacier, the spatial 2-D view of the DeepBedMap_DEM (Fig. 5a) shows a range of typical topographic features such as hills and canyons. The calculated 2-D roughnesses for both DeepBedMap_DEM (Fig. 5b) and the ground truth (Fig. 5c) lie in a similar range from 0 to 400 m, whereas the roughness of BedMachine Antarctica (Fig. 5d) is mostly in the 0-to-200 m range (hence the different colour scale). Also, the roughness pattern for both DeepBedMap_DEM and the ground truth has a more distributed cluster pattern made up of little pockets (especially towards the coastal region on the left; see Fig. 5b and c), whereas the BedMachine Antarctica roughness pattern shows larger cluster pockets in isolated regions (see Fig. 5d).
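The SD roughness measure can be sketched as follows, using a brute-force loop over interior pixels and a toy DEM; border handling in the actual analysis may differ:

```python
import numpy as np

# Roughness as the standard deviation of elevation in a 5 x 5 pixel
# (1250 m x 1250 m at 250 m resolution) window around each interior pixel.
def roughness_sd(dem, window=5):
    half = window // 2
    out = np.full(dem.shape, np.nan)          # borders left as NaN
    for i in range(half, dem.shape[0] - half):
        for j in range(half, dem.shape[1] - half):
            out[i, j] = np.std(dem[i - half:i + half + 1,
                                   j - half:j + half + 1])
    return out

flat = np.zeros((7, 7))                       # perfectly smooth toy bed
rough = roughness_sd(flat)
print(np.nanmax(rough))  # 0.0: zero roughness everywhere on a flat bed
```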
Taking a 1-D transect over the 250 m resolution DeepBedMap_DEM, BedMachine Antarctica and ground-truth grids, we illustrate the differences in bed topography and roughness from the coast towards the inland area of Thwaites Glacier with a flight trace from Operation IceBridge (see Fig. 6). For better comparison, we have calculated the Operation IceBridge ground-truth bed elevation and roughness values from a resampled 250 m grid instead of using its native along-track resolution. All three elevation profiles are shown to follow the same general trend from the relatively rough coastal region (Fig. 6a from −1550 to −1500 km on the x scale), along the retrograde slope (Fig. 6a from −1500 to −1450 km on the x scale) and into the interior region. DeepBedMap_DEM features a relatively noisy elevation profile with multiple fine-scale (< 10 km) bumps and troughs similar to the ground truth, while BedMachine Antarctica shows a smoother profile that is almost a moving average of the ground-truth elevation (Fig. 6a). Looking at the roughness statistic (Fig. 6b), both the DeepBedMap_DEM and Operation IceBridge ground-truth grids have a mean SD of about 40 m, whereas BedMachine Antarctica has a mean of about 10 m and rarely exceeds an SD value of 20 m along the transect.

5.1 Bed features
In Sect. 4.1, we show that the DeepBedMap model has produced a high-resolution (250 m) result (see Fig. 3) that can capture a detailed picture of the underlying bed topography. The fine-scale bumps and troughs are the result of the DeepBedMap generator model learning to produce features that are similar to those found in the high-resolution ground-truth datasets it was trained on. However, there are also artefacts produced by the model. For example, the winding terrace (T, Fig. 4) features are hard to explain, and though they resemble eskers (Drews et al., 2017), their placement along the sides of hills does not support this view. Similarly, we are not sure why speckle (S, Fig. 4) texture patterns are found over steep mountains, but the lack of high-resolution training datasets likely leads the model to perform worse over these high-gradient areas.
Another issue is that DeepBedMap will often pick up details from the high-resolution ice surface elevation model input dataset, which may not be representative of the true bed topography. For example, the ridges (R, Fig. 4) found along fast-flowing ice streams and glaciers are likely to be the imprints of crevasses or flow stripes (Glasser and Gudmundsson, 2012) observable from the surface. An alternative explanation is that the ridges, especially the honeycomb-shaped ones, are rhombohedral moraine deposits formed by soft sediment squeezed up into basal crevasses, features sometimes found at stagnant surging glaciers (Dowdeswell et al., 2016a, b; Solheim and Pfirman, 1985). We favour the first interpretation, as the positions of these bed features coincide with the surface features and also because such ridges are more likely to be eroded away in these fast-flowing ice stream areas. The hummocky wave-like (W) patterns we observe over the relatively flat and slower-flowing areas are likely to result from surface megadune structures (Scambos, 2014). Alternatively, they may be ribbed or Rogen moraine features that are formed in an orientation transverse to the ice flow direction (Hättestrand, 1997; Hättestrand and Kleman, 1999). While either of these explanations may be valid in different regions of Antarctica, we lean towards the conservative interpretation that these features are the result of the DeepBedMap model overfitting to the ice surface elevation data.

5.2 Roughness
In Sect. 4.2, we quantitatively show that a well-trained DeepBedMap neural network model can produce high roughness values more comparable to the ground truth than those of BedMachine Antarctica. While the mass conservation technique used by BedMachine Antarctica improves upon ordinary interpolation techniques such as bicubic interpolation and kriging, its results are still inherently smooth. The ground-truth grids show that rough areas do exist on a fine scale, and so the high-resolution models we produce should reflect that.
DeepBedMap_DEM manages to capture much of the rough topography found in the Operation IceBridge ground-truth data, especially near the coast (see Fig. 6a, from −1550 to −1500 km on the x scale) where the terrain tends to be rougher. Along the retrograde slope (see Fig. 6a, from −1500 to −1450 km on the x scale), several of the fine-scale (< 10 km) bumps and troughs in DeepBedMap_DEM can be seen to correlate well in position with the ground truth. In contrast, the cubically interpolated BedMachine Antarctica product lacks such fine-scale (< 10 km) bumps and troughs, appearing as a relatively smooth terrain over much of the transect. Previous studies that estimated basal shear stress over Thwaites Glacier have found a band of strong bed extending about 80-100 km from the grounding line, with pockets of weak bed interspersed between bands of strong bed further upstream (Joughin et al., 2009; Sergienko and Hindmarsh, 2013), a pattern that is broadly consistent with the DeepBedMap_DEM roughness results (see Fig. 5b).
In general, DeepBedMap_DEM produces a topography that is rougher, with SD values more in line with those observed in the ground truth (see Fig. 6b). The roughness values for BedMachine Antarctica are consistently lower throughout the transect, a consequence of the mass conservation technique using regularization parameters that yield smooth results. We note that the DeepBedMap_DEM does appear rougher than the ground truth in certain areas. It is possible to tweak the training regime to incorporate roughness (or any statistical measure) into the loss function (see Appendix A) to yield the desired surface, and this will be explored in future work (see Sect. 5.4). Recent studies have stressed the importance of form drag (basal drag due to bed topography) over skin drag (or basal friction) on the basal traction of Pine Island Glacier (Bingham et al., 2017;Kyrke-Smith et al., 2018), and the DeepBedMap super-resolution work here shows strong potential in meeting that demand as a high-resolution bed topography dataset for ice sheet modelling studies.
In terms of bed roughness anisotropy, DeepBedMap is able to capture aspects of it from the ground-truth grids by combining (1) ice flow direction via the ice velocity grid's x and y components (Mouginot et al., 2019b), (2) ice surface aspect via the ice surface elevation grid, and (3) the low-resolution bed elevation input (Fretwell et al., 2013). There are therefore inherent assumptions that the topography of the current bed is associated with the current ice flow direction, surface aspect and existing low-resolution BEDMAP2 anisotropy. Provided that the direction of this surface velocity and aspect is the same as the bed roughness anisotropy, as demonstrated in Holschuh et al. (2020), the neural network will be able to recognize it and perform accordingly. However, if the ice flow direction and surface aspect are not associated with bed anisotropy, then this assumption will be violated and the model will not perform well.

Limitations
The DeepBedMap model is trained only on a small fraction of the area of Antarctica, at less than 0.1 % of the grounded-ice regions (excluding ice shelves and islands). This is because the pixel-based convolutional neural network cannot be trained on sparse survey point measurements, nor is it able to constrain itself with track-based radar data. As the along-track resolution of radar bed picks is much smaller than the 250 m pixels, it is also not easy to preserve roughness from radar unless smaller pixels are used. The topography generated by the model is sensitive to the accuracy of its data inputs (see Tables 1 and 2), and though this is a problem faced by other inverse methods, neural network models like the one presented can be particularly biased towards the training dataset. Specifically, the DeepBedMap model focuses on resolving short-wavelength features important for sub-kilometre roughness, compared to BedMachine Antarctica, which recovers large-scale features like ridges and valleys well.
An inherent assumption in this methodology is that the training datasets have sampled the variable bed lithology of Antarctica (Cox et al., 2018) sufficiently. This is unlikely to be true, introducing uncertainty into the result as different lithologies may cause the same macroscale bed landscapes to result in a range of surface features. In particular, the experimental model's topography is likely skewed towards the distribution of the training regions that tend to reside in coastal regions, especially over ice streams in West Antarctica (see Fig. 2). While bed lithology could be used as an input to inform the DeepBedMap model's prediction, it is challenging to find a suitable geological map (or geopotential proxy; see e.g. Aitken et al., 2014;Cox et al., 2018) for the entire Antarctic continent that has a sufficiently high spatial resolution. Ideally, the lithological map (categorical or qualitative) would first be converted to a hardness map with an appropriate erosion law and history incorporated (quantitative). This is because it is easier to train generative adversarial networks on quantitative data (e.g. hardness as a scale from 0 to 10) than on categorical data variables (e.g. sedimentary, igneous or metamorphic rocks); the latter would require a more elaborate model architecture and loss function design.

Future directions
The way forward for DeepBedMap is to combine quality datasets gathered by radioglaciology and remote sensing specialists with new advancements made by the ice sheet modelling and machine learning community. While care has been taken to source the best possible datasets (see Tables 1 and 2), we note that there are still areas where more data are needed. Radio-echo sounding is the best tool available to fill in the data gap, as it provides not only the high-resolution datasets needed for training but also the background coarse-resolution BEDMAP dataset. Besides targeting radio-echo-sounding acquisitions over a diverse range of bed and flow types, swath reprocessing of old datasets that have that capability (Holschuh et al., 2020) may be another useful addition to the training set. The super-resolution DeepBedMap technique can also be applied on bed elevation inputs newer than BEDMAP2 (Fretwell et al., 2013), such as the 1000 m resolution DEM over the Weddell Sea (Jeofry et al., 2017), the 500 m resolution BedMachine Antarctica dataset (Morlighem, 2019) or the upcoming BEDMAP3.
A way to increase the number of high-resolution ground-truth training data further is to look at formerly glaciated beds. There is a wealth of data around the margins of Antarctica in the form of swath bathymetry data, and also on land in areas like the former Laurentide ice sheet. The current model architecture does not support using solely "elevation" as an input, because it also requires ice surface elevation, ice surface velocity and snow accumulation data. In order to support using these paleobeds as training data, one could do one of the following:

1. Have a paleo-ice-sheet model that provides these ice surface observation parameters. However, continent-scale ice sheet models quite often produce only kilometre-scale outputs, and there are inherent uncertainties with past ice sheet reconstructions that may bias the resulting trained neural network model.

2. Modularize the neural network model to support different sets of training data. One main branch would be trained like a single-image super-resolution problem (Yang et al., 2019), where we try to map a low-resolution BEDMAP2 tile to a high-resolution ground-truth image (be it from a contemporary bed, a paleobed or offshore bathymetry). The optional conditional branches would then act to support and improve on the result of this naive super-resolution method. This design is more complicated to set up and train, but it can increase the available training data by at least an order of magnitude and lead to better results.
From a satellite remote sensing perspective, it is important to continue the work on increasing spatial coverage and measurement precision. Some of the conditional datasets used, such as REMA and MEaSUREs Ice Velocity (Mouginot et al., 2019b), contain data gaps which introduce artefacts in the DeepBedMap_DEM, and those holes need to be patched up for proper continent-wide prediction. A surface mass balance dataset with sub-kilometre spatial resolution will also prove useful in replacing the snow accumulation dataset (Arthern et al., 2006) used in this work. As the DeepBedMap model relies on data from multiple sources collected over different epochs, it has no proper sense of time. Ice elevation change captured using satellite altimeters such as CryoSat-2 (Helm et al., 2014), ICESat-2 (Markus et al., 2017) or the upcoming CRISTAL (Kern et al., 2020) could be added as an additional input to better account for temporal factors. The DeepBedMap model's modular design (see Sect. 3.2) means the different modules (see Fig. 1) can be improved on and adapted for future use cases. The generator model architecture's input module can be modified to handle new datasets such as the ones suggested above or redesigned to extract a greater amount of information for better performance. Similarly, the core and upsampling modules, which are based on ESRGAN, can be replaced with newer, better architectures as the need arises. The discriminator model, which is currently one designed for standard computer vision tasks, can also be modified to incorporate glaciology-specific criteria. For example, the generated bed elevation image could be scrutinized by the discriminator model for valid properties such as topographic features that are aligned with the ice flow direction. The redesigned neural network model can be retrained from scratch or fine-tuned using the trained weights from DeepBedMap to further improve the predictive performance.
Taken together, these advances will lead to an even more accurate and higher-resolution bed elevation model.

Conclusions
The DeepBedMap convolutional neural network method presents a data-driven approach to resolving the bed topography of Antarctica using existing data. It is an improvement beyond simple interpolation techniques, generating high-spatial-resolution (250 m) topography that preserves detail in bed roughness and is adaptable for catchment- to continent-scale studies on ice sheets. Unlike other inverse methods that rely on some explicit parameterization of ice flow physics, the model uses deep learning to find suitable neural network parameters via an iterative error minimization approach. This makes the resulting model particularly sensitive to the training dataset, emphasizing the value of densely spaced bed elevation datasets and the need for such sampling over a more diverse range of Antarctic substrate types. The use of graphical processing units (GPUs) for training and inference allows the neural network method to scale easily, and the addition of more training datasets will allow it to perform better.
The work here is intended not to discourage the usage of other inverse modelling or spatial statistical techniques but to introduce an alternative methodology, with an outlook towards combining each methodology's strengths. Once properly trained, the DeepBedMap model runs quickly (about 3 min for the whole Antarctic continent) and produces realistic rough topography. Combining the DeepBedMap model with more physically based mass conservation inverse approaches (e.g. Morlighem et al., 2019) will likely result in more efficient ways of generating accurate bed elevation maps of Antarctica. One side product resulting from this work is a test-driven development framework that can be used to measure and compare the performance of upcoming bed terrain models. The radioglaciology community has already begun to compile a new comprehensive bed elevation and ice thickness dataset for Antarctica, and there have been discussions on combining various terrain interpolation techniques in an ensemble to collaboratively create the new BEDMAP3.

Appendix A: Details of loss function components
The loss function, or cost function, is a mathematical function that maps a set of input variables to an output loss value. The loss value can be thought of as a weighted sum of several error metrics between the neural network's prediction and the expected output or ground truth. It is this loss value which we want to minimize so as to train the neural network model to perform better, and we do this by iteratively optimizing the neural network's parameters. Following this are the details of the various loss functions that make up the total loss function of the DeepBedMap generative adversarial network model.

A1 Content loss
To bring the pixel values of the generated images closer to those of the ground truth, we first define the content loss function L_1. Following ESRGAN, we have

L_1 = (1/N) Σ_i |ŷ_i − y_i|, (A1)

where we take the mean absolute error between the generator network's predicted value ŷ_i and the ground-truth value y_i, respectively, over every pixel i (N pixels in total).
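As a minimal illustration (a NumPy sketch under our own naming, not the Chainer implementation used in this work), the content loss reduces to a mean absolute error over all pixels:

```python
import numpy as np

def content_loss(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """L1 content loss: mean absolute error over every pixel."""
    return float(np.mean(np.abs(y_pred - y_true)))

# Toy 2 x 2 elevation tiles (metres); per-pixel errors are 1, 2, 1 and 1
y_pred = np.array([[100.0, 102.0], [98.0, 101.0]])
y_true = np.array([[101.0, 100.0], [99.0, 100.0]])
print(content_loss(y_pred, y_true))  # 1.25
```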

A2 Adversarial loss
Next, we define an adversarial loss to encourage the production of high-resolution images ŷ closer to the manifold of natural-looking digital-elevation-model images. To do so, we introduce the standard discriminator in the form of D(y) = σ(C(y)), where σ is the sigmoid activation function and C(y) is the raw, non-transformed output from a discriminator neural network acting on a high-resolution image y. The ESRGAN model, however, employs an improved relativistic average discriminator (Jolicoeur-Martineau, 2018) denoted by D_Ra. It is defined as D_Ra(y, ŷ) = σ(C(y) − E_ŷ[C(ŷ)]), where E_ŷ[·] is the arithmetic mean operation carried out over every generated image ŷ in a mini batch. We use a binary cross-entropy loss as the discriminator's loss function, defined as follows:

L_D = −E_y[log(D_Ra(y, ŷ))] − E_ŷ[log(1 − D_Ra(ŷ, y))]. (A2)

The generator network's adversarial loss is in a symmetrical form:

L_G^Ra = −E_y[log(1 − D_Ra(y, ŷ))] − E_ŷ[log(D_Ra(ŷ, y))]. (A3)
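The relativistic-average formulation can be sketched numerically as follows (a NumPy illustration with hypothetical discriminator outputs and our own function names; the actual model computes these losses on mini batches inside Chainer):

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation, i.e. the σ in D(y) = σ(C(y))."""
    return 1.0 / (1.0 + np.exp(-x))

def relativistic_losses(c_real: np.ndarray, c_fake: np.ndarray):
    """Relativistic-average discriminator and generator losses, given the
    discriminator's raw outputs C(y) on real images and C(y_hat) on
    generated images for one mini batch."""
    d_real = sigmoid(c_real - c_fake.mean())  # D_Ra(y, y_hat)
    d_fake = sigmoid(c_fake - c_real.mean())  # D_Ra(y_hat, y)
    loss_d = -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))
    loss_g = -np.mean(np.log(1.0 - d_real)) - np.mean(np.log(d_fake))
    return float(loss_d), float(loss_g)

# When the discriminator cannot tell real from fake (equal raw outputs),
# D_Ra = 0.5 everywhere and both losses equal 2 ln 2
loss_d, loss_g = relativistic_losses(np.zeros(4), np.zeros(4))
print(round(loss_d, 4), round(loss_g, 4))  # 1.3863 1.3863
```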

A3 Topographic loss
We further define a topographic loss so that the elevation values in the super-resolved image make topographic sense with respect to the original low-resolution image. Specifically, we want the mean value of each 4 × 4 grid on the predicted super-resolution (DeepBedMap) image to closely match its spatially corresponding 1 pixel × 1 pixel area on the low-resolution (BEDMAP2) image. First, we apply a 4 × 4 mean pooling operation on the generator network's predicted super-resolution image:

ȳ_j = (1/16) Σ_i ŷ_i, (A4)

where ȳ_j is the mean of all predicted values ŷ_i across the 16 super-resolved pixels i within a 4 × 4 grid corresponding to the spatial location of 1 low-resolution pixel at position j. Following this, we can compute the topographic loss as follows:

L_T = (1/M) Σ_j |ȳ_j − x_j|, (A5)

where we take the mean absolute error between the mean of the 4 × 4 super-resolved pixels ȳ_j calculated in Eq. (A4) and the spatially corresponding low-resolution pixel x_j, respectively, over every low-resolution pixel j (M pixels in total).
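The 4 × 4 mean pooling and subsequent mean absolute error can be sketched as follows (a NumPy illustration; function and variable names are our own):

```python
import numpy as np

def topographic_loss(y_pred_hr: np.ndarray, x_lr: np.ndarray, scale: int = 4) -> float:
    """Mean absolute error between the scale x scale block means of the
    super-resolved grid and the corresponding low-resolution pixels."""
    h, w = x_lr.shape
    # Group each low-res pixel's scale x scale block of high-res pixels,
    # then average within each block to get one value per low-res pixel
    pooled = y_pred_hr.reshape(h, scale, w, scale).mean(axis=(1, 3))
    return float(np.mean(np.abs(pooled - x_lr)))

# A 4 x 4 high-res tile whose mean (7.5) matches the single low-res pixel
y_pred_hr = np.arange(16.0).reshape(4, 4)
x_lr = np.array([[7.5]])
print(topographic_loss(y_pred_hr, x_lr))  # 0.0
```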

A4 Structural loss
Lastly, we define a structural loss that takes into account luminance, contrast and structural information between the predicted and ground-truth images. This is based on the structural similarity index (SSIM; Wang et al., 2004) and is calculated over a single window patch as

SSIM(ŷ, y) = [(2 µ_ŷ µ_y + c_1)(2 σ_ŷy + c_2)] / [(µ_ŷ² + µ_y² + c_1)(σ_ŷ² + σ_y² + c_2)], (A6)

where µ_ŷ and µ_y are the arithmetic means of the predicted image ŷ and ground-truth image y, respectively, over a single window that we set to 9 pixels × 9 pixels; σ_ŷy is the covariance of ŷ and y; σ_ŷ² and σ_y² are the variances of ŷ and y, respectively; and c_1 and c_2 are two variables set to 0.01² and 0.03², respectively, to stabilize division with a weak denominator. Thus, we can formulate the structural loss as follows:

L_S = 1 − (1/P) Σ_p SSIM(ŷ_p, y_p), (A7)

where we take 1 minus the mean of all structural similarity values SSIM(ŷ_p, y_p) calculated over every patch p (P patches in total) obtained via a sliding window over the predicted image ŷ and ground-truth image y.
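A sketch of the per-window SSIM computation (a NumPy illustration with our own names, assuming pixel values rescaled to the [0, 1] range so that the c_1 and c_2 constants above apply):

```python
import numpy as np

def ssim_window(y_pred: np.ndarray, y_true: np.ndarray,
                c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> float:
    """Structural similarity index over a single window patch."""
    mu_p, mu_t = y_pred.mean(), y_true.mean()
    var_p, var_t = y_pred.var(), y_true.var()
    cov = ((y_pred - mu_p) * (y_true - mu_t)).mean()  # covariance of the patches
    return float((2 * mu_p * mu_t + c1) * (2 * cov + c2)
                 / ((mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)))

# Identical 9 x 9 windows have SSIM = 1, so their structural loss 1 - SSIM is 0
patch = np.random.default_rng(seed=42).random((9, 9))
print(abs(ssim_window(patch, patch) - 1.0) < 1e-9)  # True
```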

A5 Total loss function
Finally, we compile the loss functions for the discriminator and generator networks. The discriminator network minimizes the binary cross-entropy loss L_D defined in Sect. A2, while the generator network minimizes the weighted sum

L_G = η L_1 + λ L_G^Ra + θ L_T + ζ L_S, (A8)

where η, λ, θ and ζ are the scaled weights for the content loss L_1, adversarial loss L_G^Ra, topographic loss L_T and structural loss L_S, respectively (see Table B1 for values used). The loss functions L_D and L_G are minimized in an alternating 1:1 manner so as to solve the entire generative adversarial network's objective function defined in Eq. (4).
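The weighted combination can be sketched as below; the weight values shown are placeholders for illustration, not the tuned values in Table B1:

```python
def total_generator_loss(l1: float, l_adv: float, l_topo: float, l_ssim: float,
                         eta: float = 1.0, lam: float = 0.001,
                         theta: float = 1.0, zeta: float = 1.0) -> float:
    """Weighted sum of the four generator loss components."""
    return eta * l1 + lam * l_adv + theta * l_topo + zeta * l_ssim

# With the placeholder weights above: 1.0*0.5 + 0.001*1.4 + 1.0*0.1 + 1.0*0.2
print(round(total_generator_loss(0.5, 1.4, 0.1, 0.2), 4))  # 0.8014
```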

Appendix B: Neural network training details
The neural networks were developed using Chainer v7.0.0 (Tokui et al., 2019) and trained using full-precision (floating point 32) arithmetic. Experiments were carried out on four graphical processing units (GPUs), specifically two Tesla P100 GPUs and two Tesla V100 GPUs. On the Tesla V100 GPU setup, one training run with about 150 epochs takes about 30 min. This is using a batch size of 128 on a total of 3826 training image tiles, with 202 tiles reserved for validation, i.e. a 95/5 training/validation split. We next describe the method used to evaluate each DeepBedMap candidate model, as well as the high-level way in which we arrived at a good model via semi-automatic hyperparameter tuning.
To check for overfitting, we evaluate the generative adversarial network model using the validation dataset after each epoch using two performance metrics – a peak signal-to-noise ratio (PSNR) metric for the generator and an accuracy metric for the discriminator. Training stops when these validation performance metrics show little improvement, roughly at 140 epochs. Next, we conduct a full evaluation on an independent test dataset, comparing the model's predicted grid output with actual ground-truth xyz points. Using the "grdtrack" function in Generic Mapping Tools v6.0 (Wessel et al., 2019), we obtain the grid elevation at each ground-truth point and use it to calculate the elevation error on a point-to-point basis. All of these elevation errors are then used to compute a root mean square error (RMSE) statistic over this independent test site. This RMSE value is used to judge the model's performance in relation to baseline bicubic interpolation and is also the metric minimized by a hyperparameter optimization algorithm which we will describe next.
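The point-to-point RMSE computation can be sketched as follows (a NumPy illustration with hypothetical elevation values; in practice the grid elevations are sampled at the ground-truth point locations using GMT's grdtrack):

```python
import numpy as np

def rmse(predicted: np.ndarray, ground_truth: np.ndarray) -> float:
    """Root mean square error over point-to-point elevation differences."""
    return float(np.sqrt(np.mean((predicted - ground_truth) ** 2)))

# Hypothetical grid elevations sampled at three ground-truth points (metres)
grid_elev = np.array([-420.0, -415.0, -430.0])
point_elev = np.array([-425.0, -410.0, -428.0])
print(round(rmse(grid_elev, point_elev), 3))  # 4.243
```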
Neural networks contain many hyperparameter settings that need to be decided upon, and generative adversarial networks are particularly sensitive to different hyperparameter settings. To stabilize model training and obtain better performance, we tune the hyperparameters (see Table B1) using a Bayesian approach. Specifically, we employ the Tree-structured Parzen Estimator from the Optuna v2.0.0 library with default settings as per the Hyperopt library (Bergstra et al., 2015). Given that we have four GPUs, we choose to parallelize the hyperparameter tuning experiments asynchronously between all four devices. The estimator first conducts 20 random experimental trials to scan the hyperparameter space, gradually narrowing down its range to a few candidate hyperparameters in subsequent experiments. We set each GPU to run a target of 60 experimental trials (i.e. a total of 240), though unpromising trials that have exploding or vanishing gradients are pruned prematurely using the Hyperband algorithm (Li et al., 2018) to save on time and computational resources. The top models from these experiments undergo further visual evaluation, and we continue to conduct further experiments until a suitable candidate model is found.
Author contributions. WJL was responsible for data curation, formal analysis, methodology, software, visualization and writing the original draft. HJH was responsible for funding acquisition and supervision. Both authors conceptualized the work and contributed to the reviewing and editing stages of the writing.