Articles | Volume 18, issue 3
Research article
19 Mar 2024
Research article |  | 19 Mar 2024

Deep clustering in subglacial radar reflectance reveals subglacial lakes

Sheng Dong, Lei Fu, Xueyuan Tang, Zefeng Li, and Xiaofei Chen

Ice-penetrating radar (IPR) imaging is a valuable tool for observing the internal structure and bottom of ice sheets. Subglacial water bodies, also known as subglacial lakes, generally appear as distinct, bright, flat, and continuous reflections in IPR images. In this study, we use available IPR images from the Gamburtsev Subglacial Mountains to extract one-dimensional reflector waveform features of the ice–bedrock interface. We apply a deep-learning method to reduce the dimension of the reflector features. An unsupervised clustering method is then used to separate different types of reflector features, including a reflector type corresponding to subglacial lakes. The derived clustering labels are then used to detect features of subglacial lakes in IPR images. Using this method, we compare the new detections with a known-lakes inventory. The results indicate that this new method identified additional subglacial lakes that were not previously detected, and some previously known lakes are found to correspond to other reflector clusters. This method can offer automatic detections of subglacial lakes and provide new insight for subglacial studies.

1 Introduction

Subglacial water, i.e., water between the bedrock and ice sheet, is formed through a complex interplay of factors such as subglacial pressure, friction heat, geothermal flux, and surface water injection (Robin1955; Siegert2000; Cuffey and Paterson2010; Pattyn2010). Subglacial lakes play an important role in subglacial water networks, which can also impact ice flow and dynamics (Kamb1987; Stearns et al.2008; Siegfried et al.2016; Kazmierczak et al.2022). Investigation of water storage in subglacial lakes can provide insights into estimating the contribution of ice sheet meltwater to sea level rise (King et al.2020; Fettweis et al.2013) and the history of former climate change and ice sheet evolution (Dowdeswell and Siegert1999). In addition, subglacial lake sediments may also contain information that records the historical evolution of ice sheets (Smith et al.2018). The extreme conditions of low temperature and absent sunlight create unique subglacial lacustrine ecosystems (Christner et al.2014; Mikucki et al.2016).

Ice-penetrating radar (IPR) can be used to detect the subsurface features of ice sheets (Bailey1964; Robin et al.1969, 1970; Carter et al.2007; Paden et al.2010; Arnold et al.2020). The thickness of the subglacial water layer and sediment characteristics at the bottom of lakes are also investigated with active seismic surveys (Peters et al.2008; Horgan et al.2012) and gravimetry and electromagnetic methods (Studinger et al.2004; Key and Siegfried2017). These observations have been used to construct the first Global Subglacial Lake Inventory (Livingstone et al.2022).

Subglacial lakes can be identified in radar images due to their distinct, bright, flat, and specular reflection characteristics (Oswald and Robin1973). Because of the specific reflection characteristics of subglacial lakes in IPR images (Schroeder et al.2013), the manual extraction of the visual features was initially applied (Siegert and Ridley1998; Gades et al.2000; Dowdeswell and Evans2004). With the increase in IPR data, semi-automatic methods based on ice bottom roughness features and reflected signal power have been developed to search for lake candidates (Carter et al.2007; Bowling et al.2019). Automatic methods based on experts' experience and physical modeling (Lang et al.2022; Hao et al.2023) as well as machine-learning methods (Gifford and Agah2012; Ilisei et al.2018) have also been proposed in subglacial lake detection. These methods have shown that improved selection rules and thresholds can enhance detection accuracy and efficiency. However, these methods were based on assumptions of physical modeling or learning from previous detection experience, which may lead to potentially inaccurate detections. In the past decades, IPR surveys have collected large numbers of radar images, which enable the analysis of basal radar reflectance features even if the interpretation of basal radar reflectance features is absent.

In recent years, deep learning has been applied as a powerful tool to detect different features in IPR images, including bedrock interfaces (Xu et al.2017; Rahnemoonfar et al.2017; Dong et al.2021; Liu-Schiaffini et al.2022), internal ice layers (Yari et al.2020; Varshney et al.2020; Dong et al.2021), snow accumulation layers (Varshney et al.2021), and subglacial waters (Gifford and Agah2012; Ilisei and Bruzzone2015; Ilisei et al.2018). The Center for Remote Sensing and Integrated Systems (CReSIS,, last access: 29 February 2024) released an extensive collection of historical radar images recorded in the Antarctic and Greenland ice sheets (Arnold et al.2020). These datasets have driven various investigations of recent subglacial studies (e.g., Varshney et al.2021; Zeising et al.2022). The ice bottom reflectors extracted from the CReSIS dataset can construct a comprehensive catalog of basal reflector characteristics, which can facilitate further data-driven deep-learning analysis of reflector features.

In this study, we follow known-lakes inventories (Wolovick et al.2013; Livingstone et al.2022) to investigate subglacial lakes in the Gamburtsev Subglacial Mountains region. We select IPR images in the region of the Gamburtsev Subglacial Mountains from the CReSIS database. We crop these images around the ice bottom to obtain a set of one-dimensional waveforms that capture the ice bottom reflectance characteristics. Using these data, we train the Variational Auto-Encoder (VAE, Kingma and Welling2013) to reconstruct the one-dimensional waveform features of basal reflectors. We then apply K-means clustering methods (MacQueen1967) in the VAE's latent space to analyze similar reflection features and separate them into different clusters. We identify a cluster of reconstructed reflectors with sharp, steep, and symmetric waveform characteristics corresponding to the subglacial lakes observed in field radar images. Furthermore, we apply a conventional method based on the linear relationship between depth and peak reflected power to filter the candidate subglacial lakes from latent space clustering. By using this workflow, we can obtain an automatic approach in subglacial lake detection. To validate the results, we compare the distributions of subglacial lakes by this method with the existing inventories. This automated method can improve the efficiency of the detection of subglacial lakes. By collecting and verifying the waveform characteristics of subglacial reflectors, the accuracy of subglacial lakes can also be improved. Additionally, this approach can be extended to detect and label other clusters of subglacial features, providing valuable reference data for further studies of subglacial environments.

2 Data and methods

In this section, we introduce the workflow of the ice bottom reflection feature clustering method, as shown in Fig. 1, which includes the extraction and sampling of ice bottom reflector features (Fig. 1a), the feature reconstruction and latent vector encoding by the variational auto-encoder (Fig. 1b), the unsupervised clustering of ice bottom reflector features (Fig. 1c), and the implementation of subglacial lake detection (Fig. 1d).

Figure 1Workflow for subglacial lake detection: (a) extract and sample the ice bottom reflector trace by trace in IPR images. (b) The VAE encodes and reconstructs the sampled ice bottom reflector. (c) Unsupervised clustering on the encoded latent vectors. (d) Trace the ice bottom reflector corresponding to the subglacial lake cluster. Subglacial lakes are labeled after depth-dependent echo power filtering.


2.1 Ice bottom reflectors

The utilized airborne radar images were collected during the December 2008–January 2009 Antarctica's GAmburtsev Province Project (AGAP) (, last access: 29 February 2024) from the CReSIS database. According to the lake inventories (Wolovick et al.2013; Livingstone et al.2022), multiple known subglacial lakes are located in this study area. We focus on the dataset from the southern camp of Dome A (AGAP-S), which comprises 2715 IPR images with a central frequency of 150 Hz, a bandwidth of 10 MHz, and a transmitting power of 800 W (Wolovick et al.2013). We use the L1B data product (CSAPR_standard), which employs focused synthetic aperture radar processing on each channel and motion compensation during data pre-processing (Arnold et al.2020). The radar images have an average spatial along-track trace spacing of 14 m and a time sample step of 10−7 s, equivalent to a sample range of 8.4 m in ice. The radar images also contain the positions of ice bottom reflectors, which were extracted by a hybrid manual–automatic method (CReSIS2024).

In this study, we perform a series of data processing steps to extract the ice bottom reflector signals from the radar images. First, we transfer the echo power to the decibel scale for each radar image by [X]db=10log10(X), where X is the pixel value from the images. Second, we use the bed picks in the dataset to truncate the one-dimensional data within ±200 sampling points near the bed reflector position for every single vertical trace. Third, we align the one-dimensional trace data by centering the traces according to their maximum value (peak echo) to correct minor position misfits of semi-automatic reflector picking. This step ensures that the maximum value of bottom reflector signal features always resided at the center of the one-dimensional trace data. To reduce the interference from other englacial radar features such as the internal ice layers and the potential multiple diffractions from bedrock, we apply constant time windows for each trace near the peak signal values. The width of the time window should contain the main part of the signal waveform. We truncate −64 to +63 data sampling points around the peak signal centers, which maintains a fixed length of 128 samples for each ice bottom reflection signal. As the sampling rates of the radar images in this region are identical, the sample ranges of the ice bottom reflector features are also consistent.

To enhance the reflector features and minimize the impact of sampling noise in radar images, a constant Gaussian filter with a kernel sigma value of 4 is applied to all the extracted trace reflector data. Last, all dynamic ranges of reflector features from different radar images are normalized into 0–1 to reduce the influence of the background echo power in the radar image and accentuate the reflector features. The normalization in each reflector waveform reduces the complexity of the data samples, which also accelerates the following training process and enables the waveform downsampling to a small 2×1 vector. By following the steps above, we generated and collected an ice bottom reflector waveform feature dataset with 1 488 600 one-dimensional z-axis radar echo traces.

2.2 Variational Auto-Encoder

The VAE was first proposed by Kingma and Welling (2013) and designed for image and signal processing. As an auto-encoder, the VAE consists of an encoder and a decoder: the encoder reduces the data sizes and downsamples the input data to vectors in latent space; the following decoder reconstructs the latent vectors to approach and match the raw input data. Between the encoder and decoder, the latent space is characterized as the “bottleneck” of the VAE, in which the input feature is depressed into the smallest size. After training, the encoded latent vectors can be considered dimension-reduced representations of the input data. The VAE now has various applications in Earth science studies, such as geophysical inversion (Cheng and Jiang2020; Liu et al.2022; Lopez-Alvis et al.2021), shale petroleum prediction (Li and Misra2017), engineering seismic analysis (Esfahani et al.2021), and seismic mechanism analysis (Li2022; Ma et al.2022).

In this study, we employ the VAE to reduce the dimension of the reflector waveform features from the ice bottom. The VAE architecture is shown in Fig. 2a, which consists of fully connected layers, including an input layer of 128 neurons, an encoder, and a decoder consisting of two hidden layers with 128 neurons and an output layer with 128 neurons. To perform a two-dimensional clustering in the latent vectors, we design the latent space with a small size of 2×1 following Li (2022). The two-dimensional latent space also facilitates the visualization of spatial distributions among the latent vectors in two-dimensional plots. The loss function used in VAE training follows Kingma and Welling (2013) and Li (2022):

(1)  Loss  = MSE + KL ,

where the MSE represents the mean squared error, which measures the average difference between the predicted and actual values, while KL represents Kullback–Leibler divergence, which measures the dissimilarity between the probability distribution of the latent space and a Gaussian distribution:


where X and X are raw input reflectors and VAE-reconstructed reflectors, respectively. The MSE in the loss function is applied to calculate the reconstruction misfit and KL divergence for estimating and reducing the difference between the distribution and the normal distribution in latent space. n represents the dimension of the latent space Z, which was preset to 2 in this study, and σ and μ are the variance and mean of the latent space, respectively. The Adam optimizer (Kingma and Ba2014) is employed to accelerate the training process.

Figure 2Variational Auto-Encoder (VAE) and demonstrations of ice bottom reflector reconstruction. (a) VAE architecture, with both encoder and decoder consisting of two fully connected layers with 128 neurons and bottleneck 1×2 latent space. (b–d) Illustration of data reconstruction using the VAE: input raw reflectors (blue waveforms) and VAE-reconstructed reflectors (orange waveforms), where the horizontal axis corresponds to time and the vertical axis corresponds to the normalized reflection power (ranging from 0 to 1). Reconstructed MSEs are labeled above the waveforms. (b) Symmetrical reflector features. (c) Asymmetrical reflector features. (d) Complex reflector features, which result in higher reconstruction errors.


We use the randomly shuffled reflector datasets to train and validate the VAE; 90 % of the data are used for training the VAE, while the remaining 10 % served as a validation set. The VAE is updated by a full training dataset during different epochs in training. Due to the similar reflector features after single-trace normalizations and the large data amount applied in training, the training loss rapidly descends and no longer changes after the first epoch. Thus, we stopped the training at epoch 4 (Fig. S1 in the Supplement).

To illustrate the VAE's reconstruction performance, we randomly select different reflectors from the validation set to demonstrate the reconstruction of ice bottom reflector features (Fig. 2b, c). Subfigures in Fig. 2b show a group of symmetrical reflectors and their corresponding reconstruction. The reconstructed reflector features (orange waveforms) remain the width and trend of raw input reflector features (blue waveforms). Due to the low-dimensional latent space with a 2×1 size applied in the latent space, the high-frequency detailed features in the reflector feature are unattainable and thus discarded by the VAE. Figure 2c demonstrated a group of asymmetric reflector features and the corresponding VAE reconstruction. The comparison between inputs and reconstructions suggests that the asymmetric trends of the reflector feature are also successfully reconstructed, together with the width waveform feature. In general, the VAE can reconstruct the features of both symmetric and asymmetric ice bottom reflectors. Furthermore, we select typical reflectors with large reconstruction errors to demonstrate the large misfit conditions (Fig. 2d). Notably, reflectors contained with high-frequency signals, multiple peaks, and severe oscillations are challenging to reconstruct, thus resulting in higher errors. These peculiar signal features deviate significantly from the majority of the reflector features in the training set, rendering the features difficult to encode and decode through latent vectors. The reconstructions of multiple peak features are usually simplified to broader reflection shapes, whose trends are approximate to the smooth shape and average of the input features.

As shown in Fig. 2a, the original 128-length reflector waveform features are transformed into a 2×1 latent vector between the encoder and decoder of the VAE. The features of ice bottom reflectors are derived by the encoder part of the VAE to latent vectors consisting of two dimensionless scalars, Z1 and Z2, which can be regarded as vectors containing the original signal features. Therefore, the distance between vectors from two reflector samples in the latent space can be considered an indicator of statistical feature similarity.

2.3 Clustering analysis in latent space

After VAE training, we randomly select a subset with 2000 reflector samples from the entire dataset for clustering. Because the selection is uniformly random, the reflector samples are from different radar images captured in different regions, and thus the samples reveal different ice bottom conditions. The reflector samples are first encoded by the VAE's encoder to two-dimensional vectors in latent space. Figure 3a shows the vector distributions of the samples in the latent space, in which each scattered point corresponds to an encoded reflector sample. Due to the application of KL divergence in the VAE's loss function, the vector distribution of these samples in the latent space composed of Z1 and Z2 is approximate to a two-dimensional Gaussian distribution. According to the character of the VAE (Kingma and Welling2013), the distance between encoded vectors in the latent space is equivalent to the difference between the input reflector samples. By measuring the distances between the reflectors' latent vectors, we can estimate the difference in waveform features. Furthermore, the distance-based clustering in latent vectors can classify the ice bottom reflector feature with similar features.

Figure 3(a) Latent space distribution of 2000 randomly selected encoded reflector features, with each point representing an encoded reflector sample. The color of each point represents different clustering results (classes), and the black cross denotes the clustering center of each class. The gray-dashed rectangle indicates the range of 2 standard deviations (±σZ1 and ±σZ2) of the latent vectors. (b) Synthetic ice bottom reflectors reconstructed by virtual vectors, where the virtual vectors' range corresponds to the ranges of standard deviations (σZ1 and σZ2). Different colors divide the latent spaces corresponding to different clusters, in which the waveforms demonstrate the synthetic reflectors in different clusters. The candidate cluster corresponding to subglacial lakes is shown in black color near the upper-right corner.


In clustering, a redundant dataset can slow down the clustering calculation. Conversely, over-reduced datasets may lack essential features and lower the accuracy. As illustrated in Fig. 3a, latent vectors from 2000 reflectors are randomly selected from the entire dataset for clustering analysis to balance the clustering efficiency and accuracy. We employ the K-means clustering algorithm (MacQueen1967), which is based on the Euclidean distance estimation of the differences between data samples as well as the characteristics in the VAE's latent space. Initially, K-clustering centers are randomly assigned in two-dimensional space. The distance of each sample vector to the cluster center is computed, and the sample is assigned to the nearest cluster with the smallest distance. Then, all the cluster centers are updated to the spatial center of all the samples belonging to the corresponding cluster. This assign–update process is repeated until the cluster center becomes constant or the clustering result remains unchanged.

The number of clusters (K) is a preset parameter in the K-means algorithm, which must balance the tradeoff between implied feature classes and feature density in the data. On the one hand, K should be sufficiently large to distinguish between different ice bottom conditions. On the other hand, K should not be so large as to create unnecessary subclasses. To obtain optimal clustering results, we first applied the elbow method to determine the appropriate value of K (Fig. 4). However, the elbow curve does not show a clear cutoff point, possibly due to the distribution of vectors in the latent space (Fig. 3a) not displaying a distinct trend of multiple classes.

Figure 4Elbow curve using different cluster numbers (K) in the K-means algorithm (a) and on log–log axes (b).


Since there is no distinct inflection point in the elbow curves (Fig. 4), it is necessary to specify the value of K for clustering the reflector samples. The selection of the K value in K means directly impacts the area of each cluster in the latent space. A smaller K corresponds to larger clusters in the latent space, while a larger K allows for more precise isolation of different reflection types. Consequently, different K values directly impact the detection of subglacial lakes. To identify an appropriate K value, we conducted multiple experiments with different K values (Figs. S2 and S3) and selected K=15 in the final detection.

To visualize and trace the representative waveform features in different clusters as well as the different regions in latent space, we apply a set of virtual vectors to generate synthetic waveforms by the VAE's decoder. The virtual vector set is generated by a grid with the same step length in the latent space, and then the decoder generates the waveforms corresponding to the inputted vectors. The two-dimensional range of the virtual vector grid is assigned based on the standard deviation (σ) of Z1 and Z2, as shown in the gray-dashed rectangle in Fig. 3a. The ranges in Z1 and Z2 are both divided into 10 intervals each. The synthetic waveforms are shown in Fig. 3b as well as the corresponding area of clusters. The VAE's learning target involves waveform reconstruction. Consequently, we can equate the synthetic waveforms with the input reflector waveforms that are encoded as identical vectors in the latent space. These synthetic waveforms can serve as a direct reference for the initial cluster selection of input subglacial water reflections using conventional waveform methods, such as Hao et al. (2023).

2.4 Subglacial lake detection

We further investigate the geometry features of synthetic waveforms in different clusters. We initially identify one of the clusters corresponding to subglacial lakes (indicated by black clusters and the region in the upper-right corner in Fig. 3b). The waveforms within this cluster display symmetrical shapes and rapid signal attenuation near the waveform peak, similar to subglacial lake reflections previously identified in studies such as Schroeder et al. (2013), Hills et al. (2020), and Hao et al. (2023). Subsequently, we map the distribution of these reflectors in radar images. The results show that these reflectors are spatially continuous in radar images, and the reflectors generally display flat, bright characteristics (Fig. 5c). These continuous features are similar to the visual criteria used by glaciologists to identify subglacial lakes (Wolovick et al.2013; Schroeder et al.2013). Therefore, we further apply the results of the encoder clustering as a candidate distribution of subglacial lakes.

Figure 5The first example of subglacial lake detection includes two larger and two smaller subglacial lakes. (a) The IPR image is shown with the blue-dashed line, indicating the positions of the ice bottom reflectors. (b) Separated, realigned, smoothed, denoised, and normalized ice bottom reflector features, which are applied as inputs to the encoder. (c) Results of the unsupervised clustering of the latent vectors obtained through encoding, where different colors correspond to the classes in Fig. 3. The black cluster corresponds to the candidate subglacial lakes. (d) The subglacial lake detection based on continuous reflector features, where black blocks represent the raw detected subglacial lakes, yellow blocks are occasional interruptions that are filled by interpolation, and white blocks correspond to the non-lake clusters.


2.4.1 Spatial threshold and discontinuity interpolation

It has been observed that the signal-to-noise ratio of radar images from deep ice sheets is low due to the attenuation of radar signals (Hills et al.2020). The interference of noise can occasionally cause odd clusters in the detection of candidate subglacial lakes (e.g., Fig. 5c). Occasionally, subglacial lakes may be mistakenly identified as appearing in non-lake areas. Additionally, the complex conditions of the ice bottom can cause interruptions in subglacial lake detection. To eliminate noise interference and extract continuous subglacial lakes, we limit the minimum width of subglacial lakes in observational detection. Detected subglacial lakes should contain a continuous ice bottom segmentation in subglacial water type with a width greater than eight traces (corresponding to an average spatial distance of 112 m). Meanwhile, interruptions in continuous subglacial lakes, which are narrower than eight traces, are considered noise interference and will be interpolated and filled into nearby subglacial lakes. During interpolation, it is ensured that the interpolated non-subglacial lakes in the continuous subglacial lakes are less than 25 % to avoid mistaken detection caused by abundant interpolation.

2.4.2 Depth-dependent echo power filtering

By implementing a threshold on the minimum width of subglacial lakes, we obtain a list of candidate lakes with larger widths, effectively minimizing noise interference. However, some of these candidates still exhibit weak and indistinct bottom reflector features that could not be conclusively identified as subglacial lakes. Therefore, we follow the conventional subglacial lake detection method based on englacial signal attenuation of bed reflectors (Wolovick et al.2013; Hills et al.2020). In this study, we apply a simplified process, using a linear threshold based on the average peak reflector echo power at different depths to reduce the reflector power anomalies. Values of peak echo power and depth are directly extracted from radar images for each reflector without ice surface correction, simplifying the approach. We calculate the best linear fit and standard deviation in the two-dimensional distribution of ice thickness (depth) and peak echo power of bottom reflectors from the radar images (Fig. 6). Considering that the VAE clustering has analyzed reflector features and the simplified ice thickness is applied, a lower linear threshold on the average echo power (best fit +1σ, compared to the previous study: Wolovick et al.2013) is applied to preserve potential subglacial lakes. The confirmed reflectors are represented as black points in Fig. 6. The average echo power of the detected subglacial lakes in the filtered list should surpass the threshold at the corresponding average depth. Consequently, this final refinement excluded candidate lakes exhibiting weak and blurred reflector features.

Figure 6Distribution of ice bottom reflection peak power and ice thickness. The background color map represents the probability density estimation (PDE) of the data. The orange dashed line represents the best linear fit. The black dashed line denotes the +1σ cutoff threshold. Gray dots represent reflector samples, while black dots represent the detected samples for subglacial lakes.


3 Results

We apply the encode–cluster method to the IPR images in the AGAP-S dataset and trace the spatial distribution of subglacial lakes in the images. In this section, we first demonstrate subglacial lakes detected at different scales. Next, we compare the distribution of the detected subglacial lakes with known lake inventories and discuss the newly detected subglacial lakes as well as the known subglacial lakes missed by our method.

3.1 Subglacial lakes on different scales

Figure 5 shows two large subglacial lake distributions and two smaller subglacial lakes located at the bottom of subglacial valleys. The two larger lakes on the right display high echo power as well as continuous and flat reflection features, which are relatively easy to detect visually. In contrast, the two smaller subglacial lakes on the left are easily overlooked due to their relatively narrow widths containing insufficient continuous and flat reflection features. This example demonstrates the detection of two different types of subglacial lakes of varying widths from within a radar image. The geothermal and subglacial environments should be similar at the length scales covered by the radar image, which was continuously recorded in adjacent areas. Therefore, the detection of the two smaller subglacial lakes can be considered reliable based on the reflector encodes and the following clustering results. In addition to the examples shown in Fig. 5, we provide further examples of subglacial lake detection in Figs. 7 and 8, where the workflows and sequences applied are identical to those shown in Fig. 5.

Figure 7Second example of subglacial lake detection, which contains a relatively narrow subglacial lake. (a) Input radar image, with the blue-dashed line indicating the positions of the ice bottom reflectors. (b) Separated, realigned, smoothed, denoised, and normalized ice bottom reflector features, which are used as inputs to the encoder. (c) Results of the unsupervised clustering of the latent vectors obtained through encoding, where different colors correspond to the classes in Fig. 3. The black cluster corresponds to the candidate subglacial lakes. (d) The subglacial lake detection based on continuous reflector features, where black blocks represent the raw detected subglacial lakes, yellow blocks are occasional interruptions that are filled by interpolation, and white blocks correspond to the non-lake clusters.


Figure 8Third example of subglacial lake detection, which contains a subglacial lake. (a) Input radar image, with the blue-dashed line indicating the positions of the ice bottom reflectors. (b) Separated, realigned, smoothed, denoised, and normalized ice bottom reflector features, which are used as inputs to the encoder. (c) Results of the unsupervised clustering of the latent vectors obtained through encoding, where different colors correspond to the classes in Fig. 3. The black cluster corresponds to the candidate subglacial lakes. (d) The subglacial lake detection based on continuous reflector features, where black blocks represent the raw detected subglacial lakes, yellow blocks are occasional interruptions that are filled by interpolation, and white blocks correspond to the non-lake clusters. The yellow arrow indicates another continuous subglacial reflector class distribution, which may correspond to other symbolic subglacial conditions. The white arrow indicates a possible subglacial frozen-on ice condition.


Figure 7 shows the detection of a relatively small subglacial lake, which is located at the concave bottom of a subglacial valley. Despite its short length, this lake displays a flat and continuous reflection interface, with strong echo power and rapid attenuation characteristics, making it visually similar to the larger lakes in Figs. 5 and 8. The continuous reflection features of this type of smaller subglacial lake are narrower and less prominent, which makes them easily overlooked in visual detection in previous studies.

Figure 8 presents a special example of a large continuous subglacial lake (at about 40 km along the transect) shown in a radar image. This subglacial lake has high returned power and flat reflection features that are visually easy to detect. However, only part of the lake is detected by the encode–cluster method based on reflector features, and discontinuities are found within the lake (Fig. 8d). Upon inspecting the radar image (Fig. 8a), we observe that the left part of this subglacial lake (indicated by the white arrow in Fig. 8c) displays different reflector features from the detected part of the lake. These inconsistent features visually have relatively thick and uniform reflection layer-like features near the ice bottom interface, resembling frozen-on ice as described by Bell et al. (2011). Additionally, another discontinuity interrupts the detected subglacial lake distribution in the center, which also implies thick reflection layer features. Moreover, in other areas of the radar image, there are also other continuous clusters of subglacial reflector features (as the yellow arrow indicates in Fig. 8c). By tracing these clusters, we note that these different classes of reflectors correspond to distinct uniform reflection layers with varying thicknesses. Due to the similar features of ice bottom reflections, we suggest that these continuous spatial distributions may relate to the ice flow dynamics and different stages of frozen-on ice (Bell et al.2011).

3.2 Spatial distribution of detected subglacial lakes

We compile and integrate the identified subglacial lakes and lakes from the AGAP-S IPR images and locate each detection within the spatial sampling range of each radar image provided by the database. Figure 9 presents the spatial distribution of subglacial lakes detected in the Gamburtsev Subglacial Mountains region, where the blue points represent subglacial lakes that have been confirmed by applying the peak reflection power filter to subglacial lake candidates detected by the encode–cluster method (light cyan points in Fig. 9). Overall, the subglacial lakes are distributed in clusters with spatial continuity (e.g., the regional cluster near the L1 and L3 area), but some isolated lakes are also detected, such as the L2 survey line. Densely distributed subglacial lakes in specific regions are usually identified in radar images by their obvious reflections, as illustrated in Figs. 5 and 7. In these cases, most detections are validated through peak power filtering. However, certain candidates in densely distributed regions exhibit lower peak reflector echo power than the thresholds used here. This discrepancy is primarily attributed to the ambiguous and weak reflections often associated with spatially extensive flat ice bottom shapes. An example of such candidates can be observed in the densely distributed light cyan points near the lower-left corner in Fig. 9.

Figure 9Distribution of airborne radar observation lines and detected subglacial lakes in the Gamburtsev Subglacial Mountains region. Blue points indicate the distribution of subglacial lakes detected in this study. Light cyan points mark all the lake candidates from the VAE cluster method before echo power filtering. Red and yellow points mark the subglacial lake distribution from the subglacial lake inventory in Wolovick et al. (2013) and Livingstone et al. (2022), respectively. Text labels with gray arrows indicate the positions and directions of selected survey lines (shown with black arrows), where L1, L2, and L3 correspond to the detection example survey lines (Figs. 5, 7, and 8), N1–4 correspond to the newly detected lakes in Fig. 10, and E1–5 correspond to the mismatched lakes with the known inventory in Fig. 11. The inset map shows the location of the study area.

In addition to the subglacial lakes detected in this study, we compare the subglacial lakes detected in this study with the previously identified subglacial lake distributions, as shown by the red and yellow points in Fig. 9, which correspond to the inventories of Wolovick et al. (2013) and Livingstone et al. (2022), respectively. The two larger subglacial lakes (shown in Figs. 5 and 8) correspond to the known subglacial lakes listed in the inventories (labeled L1 and L3 in Fig. 9). In contrast, the narrow subglacial lake shown in Fig. 7 was not previously included in the inventories (labeled L2 in Fig. 9). Overall, the subglacial lake distribution detected in this study roughly overlaps with the known inventory, but there are also some mismatches, such as the lines labeled E1–E5 and N1–N4 in Fig. 9. To further investigate the reasons for these discrepancies, we select the corresponding radar image segments from the labeled regions and plot the segments in Figs. 10 and 11, respectively.

Figure 10Newly detected subglacial lakes and lakes in the Gamburtsev Subglacial Mountains region, with text labels corresponding to the location distribution in Fig. 9. Scatter points in different colors mark the encoded classes of the ice bottom reflectors, and the blue and orange lines indicate the detected range of subglacial lakes. (a) IPR image of the N1 region, containing a known subglacial lake (orange line on the left) and a newly detected subglacial lake (right). (b) Radar image segment of the N2 region, containing a newly labeled subglacial lake area; the two relatively flat ice bottom reflection segments indicated by red arrows may record the frozen-on ice. (c–d) Newly detected subglacial lakes with smaller sizes from radar image segments from the N3 and N4 regions, respectively.


Figure 11IPR image segments from the Gamburtsev Subglacial Mountains area, which mismatch the identified lake inventory (Wolovick et al.2013; Livingstone et al.2022). The text labels correspond to the locations marked in Fig. 8, and the orange arrows mark the locations of the identified subglacial lakes from the inventories.


Figure 10 displays segments of radar images from the N1–N4 subregions, revealing multiple new subglacial lakes detected by the new method (indicated as blue lines in Fig. 10). Figure 10a illustrates two detected subglacial lakes, one on the left (marked as the orange line below) that was already included in previous subglacial lake inventories and one on the right that is newly detected by the new method. The ice bottom reflectors of both subglacial lakes have a similar visual appearance, with sharp and narrow reflectors on the z axis. However, the lake on the right has a narrower width, which could make it easier to overlook visually, potentially causing it to be neglected in previous studies.

The radar segment displayed in Fig. 10b is from the N2 subregion near 83° S, 70° E in Fig. 9, where a group of continuous subglacial lakes has been detected and recorded in the known inventory (Fig. 9, N2). However, there is no previous detection in this radar image from the known inventory. The new method detects subglacial lakes in about 7 km in Fig. 10b. It is worth noting that multiple reflectors with thick layer features (marked by red arrows) are displayed in 16 and 29 km simultaneously. Considering the dense distribution of subglacial lakes nearby, these thicker reflection features are possibly formed by frozen-on ice that complicates the shape of the near-basal reflection trace. Figure 10c and d show several smaller subglacial lakes, which are similar to the narrow subglacial lake shown in example 2 in Fig. 7. These small lakes may have originated from local melting or subglacial rivers corresponding to the regionally dense distribution of subglacial lakes near the L2, N3, and N4 regions in Fig. 9.

Figure 11 presents subglacial lakes previously identified in the inventories but which are not accurately detected by the encode–cluster method and echo power filtering. The orange arrows in Fig. 11 indicate the locations of previously identified subglacial lakes. In Fig. 11a, although there are multiple candidate lakes in the E1 subregion from the encode clustering in the radar segment, the average peak power is insufficient to confirm the subglacial lake in each lake candidate. The ice bottom reflectors of this radar segment are visually different from other known subglacial lake features (e.g., Figs. 5, 8, and 10). In Fig. 11b, the ice bottom reflectors near known subglacial lakes are classified as corresponding to other reflector classes instead of lake reflectors. By inspecting the radar image, these reflectors display a thick layer near the ice bottom reflections, which are similar to the reflections in Fig. 8c. We hypothesize that the subglacial lake in this segment may be associated with a frozen-on ice condition, distinguishing it from the thick bottom reflectors observed in Fig. 10b. In the latent space (Fig. 3b), the clusters in Fig. 10b (purple) and Fig. 8c (yellow) are adjacent. This adjacency suggests the presence of multiple clusters potentially corresponding to different phases of frozen-on ice.

Similarly, the reflectors from previously identified subglacial lakes in the E4 and E5 subregions in Fig. 11d and e are also classified as other ice bottom reflection classes by the encode–cluster method. By observing the radar image segments corresponding to these two subregions, the ice bottom reflectors that corresponded to previously identified subglacial lakes also display thicker layer reflections, which do not match other subglacial lake features from other regions. The undetected subglacial lake in Fig. 11c, near 25 km, is similar to Fig. 11a, where the average echo power is insufficient to confirm the subglacial lakes. Also, the encoded classes of ice bottom reflectors near the white arrow change to other classes, indicating that the potential lakes here may consist of more complex bottom conditions.

Overall, the new method in this study can capture more candidate subglacial lakes with similar reflector features to the previous lake inventories. Compared to the previously identified lake inventories, most of the newly detected subglacial lakes in this study are smaller subglacial lakes, which are more possibly overlooked in manual visual inspections and easily obscured by multi-trace averaging in detection windows. This automated method can promote updates of the known lake inventory with further investigation. The reflector waveform analysis can also provide additional candidate clusters of similar subglacial conditions. Further investigations, including drilling or modeling, will help to elucidate the connection between reflection waveforms and distinct basal conditions. This exploration may potentially interpret the false positives of subglacial lakes (e.g., Fig. 11) in known inventories.

3.3 Detection sensitivity with the number of clusters

Since different numbers of clusters (K) influence the area in the latent space, it becomes essential to assess the detection sensitivity by testing different values of K. Figure 12 illustrates the detected subglacial lake distributions under different K values. Figure 12a exhibits the boundaries between lake and non-lake clusters in the latent space. The black points in Fig. 12a represent vectors from the detected lakes in this study. Notably, some points fall outside the boundary of K=15 (white dashed curve) due to complementation through interpolations. When considering K values between 14 and 16 (as indicated by the red, blue, and white dashed curves), the clustered areas corresponding to lakes exhibit a relatively stable and lower misfit to the vectors from the final lake list. Figure 12b showcases the subglacial lake detections in a regional area from AGAP-S (from the black box in Fig. 9). Compared with the known lake inventory (gray points), smaller K values lead to more erroneous detections (e.g., sparse yellow points when K=8), while larger K values might miss more lakes. When K=15, the detected lake distribution aligns well with the known lake inventory. Figure 12c and e demonstrate subglacial lake detections with different K values in same radar images. Similarly to the spatial distribution in Fig. 12b, smaller K values could result in false detections, such as the mistaken detection indicated by the yellow line near 30 km in Fig. 12e (when K=8). Conversely, larger K values limit the detection range of subglacial lakes and introduce unexpected discontinuities. Overall, the subglacial lake range detected with K=15 correlates with visual observations. We further expanded the dataset by incorporating more data (5 % of the waveforms from the dataset) into clustering and then traced the subglacial lake detections. The results (black lines in Fig. 12c, e) showed negligible differences compared to the detections from the smaller dataset clustering (white lines).

Figure 12Differences in lake detections under different K values. (a) Sectional latent space. Curves in different colors indicate the boundaries between the lake and non-lake clusters when different K values are applied in the clustering, with yellow for K=8, orange for K=11, red for K=14, white dashed line for K=15, blue for K=16 (partial overlapping with the white dashed line), cyan for K=20, and pink for K=30. Black points denote the reflector vectors from detected lakes in this study, and the white cross denotes the centroid of all the lake vectors. (b) Regional spatial distributions of detected subglacial lakes in the map. The map area is truncated from the black box in Fig. 9, where gray points represent the known-lakes inventory (Livingstone et al.2022) and black lines indicate the detected ranges of subglacial lakes when K=15. Stacked points in various colors represent the detected lake distributions under different K values corresponding to panel (a). (c, e) Difference in subglacial lake detection under different K values in radar image samples from Figs. 5 and 10a, where differently colored lines represent the detected lake ranges under different K values. Black lines denote the detection ranges when K=15, with 5 % of the dataset applied in clustering. (d, f) Latent space distances (LSDs) between bottom reflector vectors to the centroid of detected lake vectors, derived from panels (c) and (e), respectively.


In the latent space, the difference in reflector features can be measured based on the distance of the corresponding vectors from the reflectors. Hence, the latent space distance serves as a statistical similarity indicator for reflector features. Using the newly compiled subglacial lake list, we can trace vectors corresponding to lake reflectors within the newly detected lake ranges (depicted as black points in Fig. 12a). These vectors contribute to a robust dataset, establishing a capable centroid of lake vectors in the latent space denoted by the white cross in Fig. 12a. The disparity between the lake centroid and each reflector can be quantified using the latent space distance, serving as an index for the reflector's similarity to the lake echo feature. Figure 12d and f demonstrate latent space distances (LSDs) from the lake centroid, where the y axes are reversed. Reflectors, which are similar to lake features, exhibit continuous and flat peaks (close to 0), while other reflectors display larger differences. Some regional reflectors also show brief high similarity (e.g., ∼4.5 km in Fig. 12f), possibly corresponding to smaller water bodies or water tunnels and overlooked by the minimum lake width threshold.

4 Discussion

The subglacial analysis method proposed in this study is based on the shape of the ice bottom reflector features, which enables the full exploitation of ice bottom echo waveform information contained in the IPR observation data, providing a novel observational perspective for the study of the ice bottom beyond reflection power intensity and roughness. By contrast with conventional supervised learning methods, this study acquires no manual labels, thereby minimizing the potential for artificial misfit from training labels and allowing for the application of this approach in surveying other potential subglacial conditions.

The unsupervised clustering in the latent vectors relies on the feature difference in reflection waveforms, allowing analysis of reflectors without precise interpretations of basal radar reflectances and reducing dependence on model assumptions. However, subjective elements persist, such as the experiential selection of the K value and the lattice-like boundaries observed in Fig. 12. For future studies, with the detection of more lakes from different regions, a more precise centroid of lake vectors can be established. Moreover, an ample sample size will yield a more credible lake boundary in latent space and a reliable threshold for the similarity index based on latent space distance.

Given the potential flattening effect on the vector distribution in the latent space by the variational module in the VAE (Fig. 3a), we conduct a comparative analysis using an auto-encoder lacking the variational module (Fig. S4). We compute the two-dimensional probability density for both distributions. In contrast to the VAE distribution (Fig. S4a), the distribution derived from the auto-encoder without the variational module (Fig. S4b) displays a more uniform trend and lacks discernible cluster patterns. Compared to the auto-encoder without the variational module, the VAE provides a continuous latent space (Doersch2016), facilitating the direct tracing of waveforms from different clusters through synthetic waveforms generated from the latent space (Fig. 3b).

In this study, the final subglacial lakes are obtained using radar echo power filtering, which is based on the linear relationship between reflection power and ice thickness (depth of the ice bottom). However, this simple linear threshold filtering potentially excludes subglacial lakes with weaker echo power. To improve the detection of weaker subglacial lake signals, more precise filtering strategies that take into account the roughness and slope of the ice bottom may be beneficial.

Although the encode–cluster method provides an abstract classification for ice bottom reflections, the physical properties of the ice bottom reflection and the corresponding cluster still require further interpretation. The VAE encoder maps high-dimensional reflections to a vector that corresponds to the reflector waveform feature. In the future, physical modeling and in situ drilling may provide more direct relationships between the latent vectors and subglacial conditions, thereby enhancing the understanding of this subglacial lake detection method.

In addition to subglacial water bodies, other clusters of ice bottom reflections also exhibit some consistent patterns, as illustrated in Figs. 8, 10b, and 11b. These resemblances may originate from similar subglacial conditions, particularly the thick layer-like reflections that could correspond to different stages of frozen-on ice. The encode–cluster method is capable of isolating these reflection clusters, offering a potential reference for studying glacier dynamics. Geostatistical modeling based on subglacial topography (MacKie et al.2020) may provide additional references for the reflector clusters under corresponding subglacial conditions. Furthermore, the novel interpretation of latent encoding and clustering could enhance conventional geostatistical analysis by directly utilizing the encoded or clustered results as input or reducing input data dimensions.

The detection method used in this study is based on deep learning, allowing for a mostly automated analysis of data. Deep-learning extractors, such as EisNet (Dong et al.2021), developed in recent years can efficiently pick up the bed interface in radar images. By combining these two types of deep-learning methods, an automatic method can be implemented to first extract the positions of the ice bottom and then analyze the features of the bottom reflector, which can further update the subglacial lake inventory by applying this combined deep-learning method in the available IPR database. The data used in this study are focused on the Gamburtsev Subglacial Mountains and can be extended to another database's radar image analyses covering, e.g., the Arctic, Antarctic, and Qinghai–Tibet Plateau. As such, this has potential applications for analyzing and tracing spatiotemporal changes in global subglacial lakes and other ice bottom reflection features. Furthermore, this method based on a vertical radar waveform also enables the single-trace waveform analysis from A-scope radar data, especially for early observations (Schroeder et al.2022).

5 Conclusions

We established a workflow utilizing deep clustering for the detection of subglacial lakes in the Gamburtsev Subglacial Mountains region. Based on the IPR data from the CReSIS database, we constructed a comprehensive dataset of ice bottom reflection signals. With this dataset, we trained the VAE to reconstruct the reflection features and encode them into the latent space. With the subsequent K-means clustering within the encoded features in latent space, we separated the reflector features potentially corresponding to subglacial lakes. By considering the relationship between the peak reflection power and ice thickness, we filtered subglacial lake candidates in this region. Compared with existing inventories, our method can effectively detect features of subglacial lakes and extract more smaller subglacial lakes. This method has potential applications in expanding the subglacial lake inventory and interpreting other subglacial conditions.

Code availability

The model code is available from GitHub (, last access: 29 February 2024) and Zenodo (, Dong2024).

Data availability

The IPR images and bed echo markers used in this study were obtained from (last access: 29 February 2024, CReSIS2024).


The supplement related to this article is available online at:

Author contributions

SD, LF, and XT conceptualized the study. SD, LF, and ZL implemented the methods and analysis. SD, LF, XT, and XC interpreted the results. SD, LF, and XT wrote the manuscript with corrections by ZL and XC.

Competing interests

The contact author has declared that none of the authors has any competing interests.


Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.


The authors would like to express their gratitude to Dustin Schroeder from Stanford University and Tong Hao from Tongji University for their valuable suggestions regarding this work. Additionally, the authors would like to extend their appreciation to the reviewers, Michael Wolovick and Veronica Tollenaar, for their helpful comments and contributions that greatly improved this study.

Financial support

This research has been supported by the National Natural Science Foundation of China (grant nos. 42276257, 41941006, and 41974044), the National Key Research and Development Program of China (grant no. 2021YFC2801404), and the CUG Scholar Scientific Research Funds at the China University of Geosciences (Wuhan) (grant no. 2022132).

Review statement

This paper was edited by Huw Horgan and reviewed by Michael Wolovick and Veronica Tollenaar.


Arnold, E., Leuschen, C., Rodriguez-Morales, F., Li, J., Paden, J., Hale, R., and Keshmiri, S.: CReSIS airborne radars and platforms for ice and snow sounding, Ann. Glaciol., 61, 58–67, 2020. a, b, c

Bailey, D.: Polar-cap absorption, Planet. Space Sci., 12, 495–541, 1964. a

Bell, R. E., Ferraccioli, F., Creyts, T. T., Braaten, D., Corr, H., Das, I., Damaske, D., Frearson, N., Jordan, T., Rose, K., Studinger, M., and Wolovick, M.: Widespread persistent thickening of the East Antarctic Ice Sheet by freezing from the base, Science, 331, 1592–1595, 2011. a, b

Bowling, J., Livingstone, S., Sole, A., and Chu, W.: Distribution and dynamics of Greenland subglacial lakes, Nat. Commun., 10, 1–11, 2019. a

Carter, S. P., Blankenship, D. D., Peters, M. E., Young, D. A., Holt, J. W., and Morse, D. L.: Radar-based subglacial lake classification in Antarctica, Geochem. Geophy. Geosy., 8, Q03016,, 2007. a, b

Cheng, X. and Jiang, K.: Crustal model in eastern Qinghai-Tibet plateau and western Yangtze craton based on conditional variational autoencoder, Phys. Earth Planet. Int., 309, 106584,, 2020. a

Christner, B. C., Priscu, J. C., Achberger, A. M., Barbante, C., Carter, S. P., Christianson, K., Michaud, A. B., Mikucki, J. A., Mitchell, A. C., Skidmore, M. L., Vick-Majors, T. J., and the WISSARD Science Team: A microbial ecosystem beneath the West Antarctic ice sheet, Nature, 512, 310–313, 2014. a

CReSIS: Radar Depth Sounder data, digital media, CReSIS [data set], (last access 29 February 2024), 2024. a, b

Cuffey, K. M. and Paterson, W. S. B.: The physics of glaciers, Academic Press, ISBN 9780123694614, 2010. a

Doersch, C.: Tutorial on variational autoencoders, arXiv [preprint],, 2016. a

Dong, S.: Dongsh/EisVAE: v0.01 (v0.01), Zenodo [code],, 2024. a

Dong, S., Tang, X., Guo, J., Fu, L., Chen, X., and Sun, B.: EisNet: Extracting Bedrock and Internal Layers from Radiostratigraphy of Ice Sheets with Machine Learning, IEEE T. Geosci. Remote, 60, 1–12, 2021. a, b, c

Dowdeswell, J. A. and Evans, S.: Investigations of the form and flow of ice sheets and glaciers using radio-echo sounding, Rep. Prog. Phys., 67, 1821,, 2004. a

Dowdeswell, J. A. and Siegert, M. J.: The dimensions and topographic setting of Antarctic subglacial lakes and implications for large-scale water storage beneath continental ice sheets, Geol. Soc. Am. Bull., 111, 254–263, 1999. a

Esfahani, R. D. D., Vogel, K., Cotton, F., Ohrnberger, M., Scherbaum, F., and Kriegerowski, M.: Exploring the Dimensionality of Ground-Motion Data by Applying Autoencoder Techniques, B. Seismol. Soc. Am., 111, 1563–1576, 2021. a

Fettweis, X., Franco, B., Tedesco, M., van Angelen, J. H., Lenaerts, J. T. M., van den Broeke, M. R., and Gallée, H.: Estimating the Greenland ice sheet surface mass balance contribution to future sea level rise using the regional atmospheric climate model MAR, The Cryosphere, 7, 469–489,, 2013. a

Gades, A. M., Raymond, C. F., Conway, H., and Jagobel, R.: Bed properties of Siple Dome and adjacent ice streams, West Antarctica, inferred from radio-echo sounding measurements, J. Glaciol., 46, 88–94, 2000. a

Gifford, C. M. and Agah, A.: Subglacial water presence classification from polar radar data, Eng. Appl. Artif. Intel., 25, 853–868,, 2012. a, b

Hao, T., Jing, L., Liu, J., Wang, D., Feng, T., Zhao, A., and Li, R.: Automatic Detection of Subglacial Water Bodies in the AGAP Region, East Antarctica, Based on Short-Time Fourier Transform, Remote Sens., 15, 363,, 2023. a, b, c

Hills, B. H., Christianson, K., and Holschuh, N.: A framework for attenuation method selection evaluated with ice-penetrating radar data at south pole lake, Ann. Glaciol., 61, 176–187, 2020. a, b, c

Horgan, H. J., Anandakrishnan, S., Jacobel, R. W., Christianson, K., Alley, R. B., Heeszel, D. S., Picotti, S., and Walter, J. I.: Subglacial Lake Whillans – Seismic observations of a shallow active reservoir beneath a West Antarctic ice stream, Earth Planet. Sc. Lett., 331, 201–209, 2012. a

Ilisei, A.-M. and Bruzzone, L.: A system for the automatic classification of ice sheet subsurface targets in radar sounder data, IEEE T. Geosci. Remote, 53, 3260–3277, 2015. a

Ilisei, A.-M., Khodadadzadeh, M., Ferro, A., and Bruzzone, L.: An automatic method for subglacial lake detection in ice sheet radar sounder data, IEEE T. Geosci. Remote, 57, 3252–3270, 2018. a, b

Kamb, B.: Glacier surge mechanism based on linked cavity configuration of the basal water conduit system, J. Geophys. Res.-Sol. Ea., 92, 9083–9100, 1987. a

Kazmierczak, E., Sun, S., Coulon, V., and Pattyn, F.: Subglacial hydrology modulates basal sliding response of the Antarctic ice sheet to climate forcing, The Cryosphere, 16, 4537–4552,, 2022. a

Key, K. and Siegfried, M. R.: The feasibility of imaging subglacial hydrology beneath ice streams with ground-based electromagnetics, J. Glaciol., 63, 755–771, 2017. a

King, M. D., Howat, I. M., Candela, S. G., Noh, M. J., Jeong, S., Noël, B. P., van den Broeke, M. R., Wouters, B., and Negrete, A.: Dynamic ice loss from the Greenland Ice Sheet driven by sustained glacier retreat, Commun. Earth Environ., 1, 1–7, 2020. a

Kingma, D. P. and Ba, J.: Adam: A method for stochastic optimization, arXiv [preprint],, 2014. a

Kingma, D. P. and Welling, M.: Auto-encoding variational bayes, arXiv [preprint],, 2013. a, b, c, d

Lang, S., Yang, M., Cui, X., Li, L., Cai, Y., Liu, X., Guo, J., Sun, B., and Siegert, M.: A Semiautomatic Method for Predicting Subglacial Dry and Wet Zones Through Identifying Dry–Wet Transitions, IEEE T. Geosci. Remote, 60, 1–15, 2022. a

Li, H. and Misra, S.: Prediction of subsurface NMR T2 distributions in a shale petroleum system using variational autoencoder-based neural networks, IEEE Geosci. Remote Sens. Lett., 14, 2395–2397, 2017. a

Li, Z.: A generic model of global earthquake rupture characteristics revealed by machine learning, Geophys. Res. Lett., 49, e2021GL096464,, 2022. a, b, c

Liu, M., Grana, D., and de Figueiredo, L. P.: Uncertainty quantification in stochastic inversion with dimensionality reduction using variational autoencoder, Geophysics, 87, M43–M58, 2022. a

Liu-Schiaffini, M., Ng, G., Grima, C., and Young, D.: Ice Thickness From Deep Learning and Conditional Random Fields: Application to Ice-Penetrating Radar Data With Radiometric Validation, IEEE T. Geosci. Remote, 60, 1–14, 2022. a

Livingstone, S. J., Li, Y., Rutishauser, A., Sanderson, R. J., Winter, K., Mikucki, J. A., Björnsson, H., Bowling, J. S., Chu, W., Dow, C. F., Fricker, H. A., McMillan, M., Ng, F. S. L., Ross, N., Siegert, M. J., Siegfried, M., and Sole, A. J.: Subglacial lakes and their changing role in a warming climate, Nature Rev. Earth Environ., 3, 106–124, 2022. a, b, c, d, e, f, g

Lopez-Alvis, J., Laloy, E., Nguyen, F., and Hermans, T.: Deep generative models in inversion: The impact of the generator's nonlinearity and development of a new approach based on a variational autoencoder, Comput. Geosci., 152, 104762,, 2021. a

Ma, S., Li, Z., and Wang, W.: Machine learning of source spectra for large earthquakes, Geophys. J. Int., 231, 692–702, 2022. a

MacKie, E., Schroeder, D., Caers, J., Siegfried, M., and Scheidt, C.: Antarctic topographic realizations and geostatistical modeling used to map subglacial lakes, J. Geophys. Res.-Earth, 125, e2019JF005420,, 2020. a

MacQueen, J.: Classification and analysis of multivariate observations, in: 5th Berkeley Symp. Math. Statist. Probability, University of California Los Angeles LA USA, 281–297, 1967. a, b

Mikucki, J. A., Lee, P. A., Ghosh, D., Purcell, A. M., Mitchell, A. C., Mankoff, K. D., Fisher, A. T., Tulaczyk, S., Carter, S., Siegfried, M. R., Fricker, H. A., Hodson, T., Coenen, J., Powell, R., Scherer, R., Vick-Majors, T., Achberger, A. A., Christner, B. C., Tranter, M., and the WISSARD Science Team: Subglacial Lake Whillans microbial biogeochemistry: a synthesis of current knowledge, Philos. T. Roy. Soc. A, 374, 20140290,, 2016. a

Oswald, G. and Robin, G. D.: Lakes beneath the Antarctic ice sheet, Nature, 245, 251–254, 1973. a

Paden, J., Akins, T., Dunson, D., Allen, C., and Gogineni, P.: Ice-sheet bed 3-D tomography, J. Glaciol., 56, 3–11, 2010. a

Pattyn, F.: Antarctic subglacial conditions inferred from a hybrid ice sheet/ice stream model, Earth Planet. Sc. Lett., 295, 451–461, 2010. a

Peters, L., Anandakrishnan, S., Holland, C., Horgan, H., Blankenship, D., and Voigt, D.: Seismic detection of a subglacial lake near the South Pole, Antarctica, Geophys. Res. Lett., 35, L23501,, 2008. a

Rahnemoonfar, M., Fox, G. C., Yari, M., and Paden, J.: Automatic ice surface and bottom boundaries estimation in radar imagery based on level-set approach, IEEE T. Geosci. Remote, 55, 5115–5122, 2017. a

Robin, G. D. Q.: Ice movement and temperature distribution in glaciers and ice sheets, J. Glaciol., 2, 523–532, 1955. a

Robin, G. d. Q., Evans, S., and Bailey, J. T.: Interpretation of radio echo sounding in polar ice sheets, Philos. T. Roy. Soc. Lond.-A, 265, 437–505, 1969. a

Robin, G. D. Q., Swithinbank, C., and Smith, B.: Radio echo exploration of the Antarctic ice sheet, Int. Assoc. Sci. Hydrol. Publ., 86, 97–115, 1970. a

Schroeder, D. M., Blankenship, D. D., and Young, D. A.: Evidence for a water system transition beneath Thwaites Glacier, West Antarctica, P. Natl. Acad. Sci. USA, 110, 12225–12228, 2013. a, b, c

Schroeder, D. M., Broome, A. L., Conger, A., Lynch, A., Mackie, E. J., and Tarzona, A.: Radiometric analysis of digitized Z-scope records in archival radar sounding film, J. Glaciol., 68, 733–740, 2022. a

Siegert, M. J.: Antarctic subglacial lakes, Earth-Sci. Rev., 50, 29–50, 2000.  a

Siegert, M. J. and Ridley, J. K.: Determining basal ice-sheet conditions in the Dome C region of East Antarctica using satellite radar altimetry and airborne radio-echo sounding, J. Glaciol., 44, 1–8, 1998. a

Siegfried, M. R., Fricker, H. A., Carter, S. P., and Tulaczyk, S.: Episodic ice velocity fluctuations triggered by a subglacial flood in West Antarctica, Geophys. Res. Lett., 43, 2640–2648, 2016. a

Smith, A. M., Woodward, J., Ross, N., Bentley, J., Hodgson, D. A., Siegert, M. J., and King, E. C.: Evidence for the long-term sedimentary environment in an Antarctic subglacial lake, Earth Planet. Sc. Lett., 504, 139–151, 2018. a

Stearns, L. A., Smith, B. E., and Hamilton, G. S.: Increased flow speed on a large East Antarctic outlet glacier caused by subglacial floods, Nat. Geosci., 1, 827–831, 2008. a

Studinger, M., Bell, R. E., and Tikku, A. A.: Estimating the depth and shape of subglacial Lake Vostok's water cavity from aerogravity data, Geophys. Res. Lett., 31, L12401,, 2004. a

Varshney, D., Rahnemoonfar, M., Yari, M., and Paden, J.: Deep ice layer tracking and thickness estimation using fully convolutional networks, in: 2020 IEEE International Conference on Big Data (Big Data), 3943–3952,, 2020. a

Varshney, D., Rahnemoonfar, M., Yari, M., Paden, J., Ibikunle, O., and Li, J.: Deep learning on airborne radar echograms for tracing snow accumulation layers of the Greenland ice sheet, Remote Sens., 13, 2707,, 2021. a, b

Wolovick, M. J., Bell, R. E., Creyts, T. T., and Frearson, N.: Identification and control of subglacial water networks under Dome A, Antarctica, J. Geophys. Res.-Earth, 118, 140–154,, 2013. a, b, c, d, e, f, g, h, i

Xu, M., Crandall, D. J., Fox, G. C., and Paden, J. D.: Automatic estimation of ice bottom surfaces from radar imagery, in: 2017 IEEE International Conference on Image Processing (ICIP), 340–344,, 2017. a

Yari, M., Rahnemoonfar, M., and Paden, J.: Multi-scale and temporal transfer learning for automatic tracking of internal ice layers, in: IGARSS 2020–2020 IEEE International Geoscience and Remote Sensing Symposium, 6934–6937,, 2020. a

Zeising, O., Steinhage, D., Nicholls, K. W., Corr, H. F. J., Stewart, C. L., and Humbert, A.: Basal melt of the southern Filchner Ice Shelf, Antarctica, The Cryosphere, 16, 1469–1482,, 2022. a

Short summary
Subglacial lakes are a unique environment at the bottom of ice sheets, and they have distinct features in radar echo images that allow for visual detection. In this study, we use machine learning to analyze radar reflection waveforms and identify candidate subglacial lakes. Our approach detects more lakes than known inventories and can be used to expand the subglacial lake inventory. Additionally, this analysis may also provide insights into interpreting other subglacial conditions.