Improved machine-learning based open-water/sea-ice/cloud discrimination over wintertime Antarctic sea ice using MODIS thermal-infrared imagery

The frequent presence of cloud cover in polar regions limits the use of the Moderate-Resolution Imageing Spectroradiometer (MODIS) and similar instruments for the investigation and monitoring of sea-ice polynyas compared to passivemicrowave-based sensors. The very low thermal contrast between present clouds and the sea-ice surface in combination with the lack of available visible and near-infrared channels during polar nighttime results in deficiencies in the MODIS cloud mask and dependent MODIS data products. This leads to frequent misclassifications of i) present clouds as sea ice and ii) open5 water/thin-ice areas as clouds, which results in an underestimation of polynya area and subsequently derived information. Here, we present a novel machine-learning based approach using a deep neural network that is able to reliably discriminate between clouds, sea-ice, and open-water/thin-ice areas in a given swath solely from thermal-infrared MODIS channels and additionally derived information. Compared to the reference MODIS sea-ice product, our data results in an overall increase of 31 % in annual swath-based coverage, attributed to an improved cloud-cover discrimination. Overall, higher spatial cover10 age results in a better sub-daily representation of thin-ice conditions that cannot be reconstructed with current state-of-the-art cloud-cover compensation methods.


Introduction
Information on cloud presence is of crucial importance when using thermal-infrared imagery. This is especially true for the polar regions, where the thermal contrast between clouds and the underlying snow and sea-ice surface can be low through persistent surface temperature inversion and low clouds (Welch et al., 1992). Furthermore, occurrences of warm clouds over cold sea ice and cold clouds over relatively warm and thin sea ice are both possible. Despite improvements (Liu et al., 2004;Frey et al., 2008;Holz et al., 2008;Liu and Key, 2014), the performance of the frequently used Moderate Resolution Imaging Spectroradiometer (MODIS) cloud mask product (MOD35/MYD35; Ackerman et al., 2015) is substantially reduced during polar nighttime compared to its performance during daytime conditions. Nonetheless, several studies use MODIS thermal-infrared (TIR) data to monitor polynya area and associated sea-ice production in polynyas both in the Arctic as well as the Antarctic and compare well to or even outperform studies using passive-microwave satellite data in certain regions (e.g., Paul et al., 2015;Aulicino et al., 2018;Preußer et al., 2019). These studies generally utilize ice-surface temperature from the National Snow and Ice Data Center (NSIDC) sea-ice product (MOD/MYD29; Hall et al., 2004;Hall and Riggs, Figure 1. Location of the general (orange) and focus (purple) study area of the Antarctic Brunt Ice Shelf in the southeastern Weddell Sea (green). Data of land ice (dark gray) and floating ice shelves (light gray) are retrieved from RTopo-2 (Refined Topography; Schaffer et al., 2016). 2015a, b). The MOD/MYD29 product is derived from both MODIS sensors on board the NASA polar-orbiting Aqua and Terra satellites with the MOD/MYD35 cloud mask product already applied (Riggs and Hall, 2015). However, especially positive temperature-anomaly features such as large warm open-water areas through sea-ice polynyas pose a problem for the MODIS cloud mask and result in frequent misclassification of these areas as cloud cover (Fraser et al., 2009). Additionally, other MODIS applications would potentially benefit from an improved wintertime cloud masking. These applications comprise composite generation (e.g., Fraser et al., 2010Fraser et al., , 2020, merged optical and passive microwave sensor applications (e.g., Ludwig et al., 2019), basin-wide lead detection from thermal-infrared data (e.g., Reiser et al., 2020), and sea-ice motion tracking through image cross-correlation.
In this study, we propose a novel machine-learning-based approach to discriminate between open-water and/or thinice, sea-ice, and cloud-covered areas in MODIS TIR swaths. We evaluate and analyze the use of a deep neural network (e.g., Kohonen, 1988;Goodfellow et al., 2016), building upon a comprehensive set of newly generated labeled training data. The data set is derived using a combined approach of unsupervised deep learning, subsequent clustering, and manual screening from co-located 1 km resolution MOD/MYD02 product data (MODIS Characterization Support Team (MCST), 2017a, b) accessed through the Level-1 and Atmosphere Archive & Distribution System (LAADS) Distributed Active Archive Center (DAAC) and the Sentinel-1 A/B (S1-A/B) synthetic aperture radar (SAR) calibrated backscatter data accessed through the Alaska Satellite Facility (ASF) DAAC as a cloud-independent reference.
The resulting classifier performance is then analyzed and evaluated based on wintertime estimates of the resulting polynya area in comparison to the MOD/MYD29 reference product for the Brunt Ice Shelf (BIS) region in the Antarctic Weddell Sea in the year 2017 ( Fig. 1). This region was chosen for its combination of high inter-annual polynya activity and high spatiotemporal coverage with Sentinel-1 data. Results are expected to be transferable to other polynya regions in the Antarctic.
In the following sections, we will first describe our methodology and input data starting with the employed basic methods and algorithms (Sect. 2.1) followed by the used input data (Sect. 2.2), a detailed explanation of the initialtraining-data generation scheme (Sect. 2.3), and the subsequent processing steps that lead to our final classifier (Sect. 2.4 and 2.5). Finally, we describe and discuss our results (Sect. 3) in comparison to standard MOD/MYD29derived estimates as well as using co-located S1-A/B SAR reference data. In the end we provide a summary and an outlook for future applications (Sect. 4).

Data and methods
In the following subsections we describe our methods and input data that lead to our deep neural network for the openwater-sea-ice-cloud discrimination (Fig. 2).

Basic methods and algorithms
This section intends to provide a basic introduction to the methods used in this study. However, it would be beyond the scope of this article to provide an exhaustive review of these methods. For more details, additional references are provided.
All computations for this study were carried out using the R software (R Core Team, 2018) running on a commercially available laptop.

Gray-level co-occurrence matrices (GLCMs)
Gray-level co-occurrence matrices (GLCMs) are a tool to quantify spatial texture based on brightness values of a pixel neighborhood (Haralick et al., 1973;Haralick, 1979;Hall-Beyer, 2017;R: Zvoleff, 2019). The directional-dependent occurrence frequencies of brightness-value combinations are counted and normalized to probabilities. Subsequently, several statistical measures can be calculated from the GLCM as an additional descriptive statistic of the data. Haralick et al. (1973) proposed 14 different metrics; however, not all were commonly adopted and implemented into modern software. For R, eight different measures are implemented (Zvoleff, 2019), of which we utilized four: GLCM mean, GLCM variance, contrast, and entropy (Table 1). Hall-Beyer (2017) showed that GLCM variance can be associated with edges of different class patches, while GLCM Figure 2. Flow chart summarizing all processing steps from the generation of the initial training data through manual classification of the swath-based split of the data into a calibration and a validation data set (Cal/Val) to the training of the final classifier and its application for open-water-sea-ice-cloud discrimination. mean and contrast and entropy correspond well to patchinterior texture.
In general, the use of GLCM texture metrics is suitable for cloud detection and classification in polar regions using visual, near-infrared, and thermal-infrared satellite data (Welch et al., 1992). However, as the size of each GLCM per pixel in a sliding-neighborhood window corresponds and increases proportionally to the image bit depth, computational cost increases rapidly for (i) large sliding windows and (ii) a large number of gray levels in the input data. For our study, all MOD/MYD02 channel-based input parameters for the GLCM computations were rescaled to 32 gray levels, using a 7 × 7 sliding-neighborhood window with horizontal, vertical, and diagonal directional pixel relationships.
The FCM is comparable to a classic k-means clustering approach (MacQueen, 1967;Hartigan and Wong, 1979), with the addition of providing cluster membership probabilities for each pixel. This type of clustering is also referred to as "soft" clustering. In contrast to "hard"-clustering approaches such as k-means, FCM allows for a pixel to belong to several clusters with a certain probability.
For this type of unsupervised clustering, it is necessary to preselect the number of clusters which the input data should be separated into. Without a priori knowledge about potential relationships and correlations between predictors, it is common practice to choose a large number of initial clusters and manually merge similar clusters afterwards to the desired number of classes.
In this study, we always use a setup of 35 clusters and stop the clustering process after 30 iterations.

Artificial neural networks (NNs)
An artificial neural network (NN) generally consists of several neurons organized in sequential layers in which each neuron of a layer is fully interconnected to all neurons in the adjacent two layers through weighted paths. These neurons respond to the weighted input of the preceding neurons and pass on their output to the adjacent neurons, modulated based on a type of activation function (Kohonen, 1988;Lee et al., 1990;Welch et al., 1992;Atkinson and Tatnall, 1997;LeCun et al., 2015;Schmidhuber, 2015;Goodfellow et al., 2016;R: Allaire and Chollet, 2020).
Once trained, NNs are powerful tools for fast and efficient processing of large amounts of remote sensing data and have been shown to be more accurate, e.g., in classification tasks, than other techniques (Kohonen, 1988;Lee et al., 1990;Atkinson and Tatnall, 1997).
Furthermore, NNs can represent complex and non-linear functions without formal description through learning from labeled training data. In contrast to statistical methods, NNs allow for incorporating data from different sources and require no knowledge or assumptions about their parametric distributions. Hence, NNs solely depend on their provided input data (Lee et al., 1990;Atkinson and Tatnall, 1997;Le-Cun et al., 2015).
In their simplest form, a so-called "shallow" NN consists of an input layer, a hidden layer, and an output layer. Inputlayer neurons correspond to the number of input features or predictors, whereas output layer neurons correspond in the case of classification tasks to the number of classes the input data should be categorized into. With an increasing number 1554 S. Paul and M. Huntemann: Machine-learning-based open-water-sea-ice-cloud discrimination of hidden layers, so-called "deep" NNs can handle even more complex problems (Atkinson and Tatnall, 1997;Schmidhuber, 2015).
While some general suggestions for the NN architecture exist, solutions are often found empirically by minimizing or maximizing the loss function or accuracy for both calibration and validation data classification without overfitting the model. This process is described in the following subsections.
In addition to these general NNs, we work with a second type called an autoencoder (AE). An AE is a specialized variant of an NN used for anomaly detection and dimension reduction (Cao et al., 2018;Dong et al., 2018;R: Allaire and Chollet, 2020).
In a typical AE, the output or target data are equal to the input data. However, all information is forced through a bottleneck hidden layer. The result relies on the capability of the bottleneck hidden-layer neurons to extract relevant information from the training data to enable the AE to reconstruct the input image with minimized error (Cao et al., 2018). This is achieved by constructing two branches of symmetric hidden layers of neurons (called the encoder and the decoder, respectively) around a bottleneck neuron layer generally consisting of very few neurons (Cao et al., 2018). The resulting encoder part of the AE can then be used for dimension reduction.

Input data
In total, we use four different types of data sets for the year 2017: 1. MODIS Level 1B calibrated radiances obtained from the MODIS sensors on board the polar-orbiting NASA satellites Terra and Aqua (MOD/MYD02; MODIS Characterization Support Team (MCST), 2017a, b; retrieved from the LAADS DAAC at https://ladsweb. modaps.eosdis.nasa.gov/, last access: 7 August 2019) with a spatial resolution of 1 km × 1 km at nadir and swath dimensions of 1354 km (across track) × 2030 km (along track), 2. Sentinel-1 A/B Level 1 calibrated backscatter data (S1-A/B; retrieved from the ASF DAAC at https://asf. alaska.edu/, last access: 25 June 2019, and processed by ESA) with a spatial resolution of 20 m × 20 m, 3. NSIDC MODIS sea-ice product (MOD/MYD29; Hall et al., 2004;Riggs and Hall, 2015) in the same resolution as the MOD/MYD02 data but comprising a precomputed and MODIS cloud-mask-applied ice-surface temperature (IST) data set, and 4. ECMWF ERA-Interim atmospheric reanalysis data (Dee et al., 2011) featuring a spatial resolution of 0.75 • and a temporal resolution of 6 h.
An overview of all used input parameters with their respective source as well as their application is provided in Table 1.
All MODIS and ERA-Interim data are resampled to a common equirectangular grid of the Brunt Ice Shelf (BIS) area with an average spatial resolution of 1 km × 1 km and an extent from 34 to 18 • W and 77 to 73 • S using a nearestneighbor approach. For visual reference, the S1-A/B data are also resampled to an equirectangular grid with the same extent but a spatial resolution of 25 m. Through the decreasing distance between meridians towards the pole, the per-pixel spatial area also decreases. This results from the constant latitudinal distance between grid points in this type of projection. Ice-shelf areas are excluded from our analysis based on RTopo-2 data (Schaffer et al., 2016).

MOD/MYD02 L1b calibrated radiances
Our goal for the later discrimination algorithm was for it to solely rely on MODIS-channel data, without the need for any auxiliary data.
Brightness temperatures (BTs) were calculated from calibrated radiances comprising MODIS channels 20, 25, 31, and 33 following Toller et al. (2009). This channel subset allows for distinguishing between open-water and/or thin-ice, sea-ice, and cloud pixels through a high inter-channel variability while reducing the impact of stripes in the MODIS data. Additionally, channel 32 data are used for the calculation of the ice-surface temperature (IST; following Riggs and Hall, 2015). Furthermore, we computed image-texture parameters using GLCM (Table 1). For this we use MODIS Collection 6.1 data.
We generally limited our study to swaths featuring sensor incidence angles of ≤ 50 • in 65 % of the study area (to minimize spatial distortion towards the swath edges) and a total coverage of our study area of > 90 %. In order to aid the manual categorization by providing favorable geometries, the MODIS colocation swath to the S1-A/B reference data needs to feature sensor incidence angles of ≤ 35 • in 65 % of the study area.

MOD/MYD29 sea-ice product
For a later comparison based on cloud coverage and polynya area, we extracted and use IST from the reference NSIDC MOD/MYD29 sea-ice product produced from MODIS Collection 6 data, which offers an overall accuracy of 1-3 K under ideal (i.e., clear-sky) conditions (Hall et al., 2004;Riggs and Hall, 2015).
Both IST values (MOD/MYD02 and MOD/MYD29) are derived based on a constant emissivity for snow or ice  but with the MODIS cloud mask already applied to the MOD/MYD29 product. Table 1. Summary of all used parameters, their source product or sensor, and their application in this study. These parameters comprise the brightness temperature (BT) from the selected MOD/MYD02 channel subset a as well as normalized BT (BT norm b ). Furthermore, ice-surface temperatures (IST) from MOD/MYD02 are presented together with the IST from neighboring swaths (IST Neighbors ) and the time-normalized IST change (IST t ) between them as well as the IST from the MOD/MYD29 product. The texture metrics calculated from GLCM (mean, variance, contrast, and entropy) as well as the calibrated backscatter (σ 0 ) from Sentinel-1 A/B are given as a reference (R). Finally, the atmospheric parameters taken from the ERA-Interim reanalysis necessary for the calculation of thin-ice thickness (TIT) are presented. The applications comprise primarily their use in the training of the neural network (NN) and autoencoder (AE).

S1-A/B L1 calibrated backscatter
In order to reliably identify polynyas independent of cloud cover or other atmospheric disturbances, we selected a total of 22 S1-A/B swaths featuring an active polynya in front of the BIS. These S1-A/B swaths together with co-located and at least partially cloud-free MOD/MYD02 data are used for calibration and validation of the algorithm. S1-A/B swath acquisition times are temporarily distributed over the 2017 Antarctic winter, with all additional information summarized in Table 2.

ERA-Interim data and thin-ice retrieval
For a quantitative comparison between the resulting polynya area (i.e., the total area of pixels covered with a maximum ice thickness of 0.2 m), we calculate the thin-ice thickness (TIT) from MODIS IST for MOD/MYD02 and MOD/MYD29 data using a surface-energy-balance model together with the ERA-Interim 2 m air temperature, the 10 m wind-speed components, the mean sea-level pressure, and the 2 m dew-point temperature (Dee et al., 2011).
The surface-energy-balance model utilizes the inversely proportional relation between IST and the thickness of thin sea ice (Yu and Rothrock, 1996;Drucker et al., 2003). The net positive flux towards the atmosphere between the warm ocean and the cold atmosphere is equalized from the conductive heat flux through the ice. From the conductive heat flux TIT is derived. A detailed description of the retrieval procedure and all equations and necessary assumptions are thoroughly described in Paul et al. (2015) as well as Adams et al. (2013). For ice thicknesses between 0.0 and 0.2 m, Adams et al. (2013) state an average uncertainty of ±4.7 cm.

Initial-training-data generation
The availability and quality of labeled training data are of utmost importance for the training of any supervised machinelearning algorithm. However, available spatiotemporal highresolution cloud information over nighttime sea ice is practically non-existent. Therefore, we had to derive our own labeled training data using co-located MODIS and S1-A/B data to manually identify open-water and/or thin-ice, sea-ice, and cloud pixels, respectively (Fig. 2a).
To reduce manual effort and uncertainty to a minimum, we employ a mix of dimension reduction and unsupervised clustering before the final manual categorization.
First, we selected MODIS swaths in close temporal proximity for each of the 22 S1-A/B reference swaths (Table 2), Land-ice (dark gray) and ice-shelf (light gray) overlays originate from RTopo-2 (Schaffer et al., 2016). In (c), clusters 20 and 24 were categorized as cloud; clusters 1, 3, 25, 28, and 30 were categorized as open water and/or thin ice; and clusters 19 and 33 were categorized as sea ice. Please note that the date format in this figure is year month day (yyyy-mm-dd).
i.e., in a temporal range of ±36 h around the S1-A/B swath. We chose the best-temporal-match-based sensor zenith angle (65 % of the study area feature an angle of ≤ 35 • ) and swath coverage of our study area (≥ 90 %). In this way, the data represent rather easy-to-distinguish configurations of openwater and/or thin-ice, sea-ice, and cloud pixels with favorable geometries for manual categorization.
Secondly, in addition to the textural parameters from the GLCM (Table 1), we wanted to add a temporal component to the parameter mix. We added the IST of two swaths acquired before and after the current swath, respectively. These four swaths were taken from the pool of selected MODIS swaths and arranged in temporal patterns before and after the best match. Additionally, we added the time-normalized IST difference between all these neighboring swaths.
From here, we take advantage of the AE dimension reduction capabilities (Fig. 2a). Instead of using the total number of 33 input parameters for the FCM with probably only mediocre results (Table 1), we cluster the encoded information from the bottleneck layer neurons swath-wise for all MODIS co-locations. Subsequently, the FCM soft-clusters similar pixels per swath into 35 clusters before we manually categorize these clusters into one of three classes, "open water and/or thin ice", "sea ice", or "cloud". An exemplary sequence of this procedure is shown in Fig. 3a-d.
For this task of dimension reduction, we trained and subsequently used the encoder part of our autoencoder based on a setting featuring a decreasing number of neurons per hidden layer of 32, 16, and 8 down to the bottleneck layer containing 3 neurons. The decoder part is built symmetrically to the encoder but in reversed order. We used a mean squared error loss function and trained for 50 epochs using a batch size of 2048 with the ADAM optimizer (Kingma and Ba, 2017) for all available co-located MODIS-S1-A/B combinations. For a detailed explanation of these technical terms, please see Goodfellow et al. (2016).
In order to reduce uncertainty in the training data, we constrained the manual classification to "obvious" cases (e.g., cold continuous patches over otherwise warm polynyas and adjacent sea ice categorized as clouds), which results in not every MOD/MYD02 swath being fully classified at this stage (Fig. 3d).
Finally, from our manual categorization, we only use pixels with an FCM probability (i.e., the membership score) above 0.6 for open-water and/or thin-ice pixels, 0.65 for seaice pixels, and 0.65 for cloud pixels (Fig. 3d). As sea-ice or cloud pixels are harder to identify, we chose a stricter probability threshold for those two classes. Due to the large temperature range present in Antarctic clouds, we arbitrarily separated our cloud class internally into "cold" (< 235 K), "intermediate", and "warm" (> 250 K) clouds. This separation lead to an improved general classification result through the neural network later on. All ice-shelf areas are excluded from our analysis to avoid any additional misclassifications due to the substantially different temperature regime. Table 2. List of used S1-A/B swaths for calibration or training, validation, and a detailed analysis (Fig. 6). Through this procedure, we created an initial labeled training data set consisting of about 3.5 × 10 6 data points for the 33 predictors (Table 1). For the purpose of training the NN, we divided the data into a training or calibration and a validation data set (Fig. 2b). As a random split would potentially lead to highly autocorrelated neighboring pixels, we decided for a swath-wise split with 16 swaths used for training or calibration and 6 swaths used for validation plus an additional 2 swaths for an additional analysis (Table 2).

Final-training-data generation
As mentioned, the initial training data set is based solely on obvious cases that were manually categorized. This procedure lead to only few data points per swath (Fig. 3d). In order to (at least almost) fully classify all co-located MODIS swaths and thereby extend our training data set, two simple intermediate classifiers were trained to represent their respective initial training data set (i.e., calibration or validation) as best as possible (Fig. 2c).
With this, we are able to extend our training data set by identifying and classifying additional similar data points in the complete set of co-located MODIS swaths that were previously not categorized. However, based on the class probabilities provided by the two NNs and through visual screening, we excluded ambiguous pixels from the final training data set (Fig. 3e). In this way, we get a statistically substantiated classification of almost the complete swaths -in contrast to the partially categorized swaths through manual classification used before (Fig. 2c).
Through this procedure, we created our final labeled training data set of about 10.0 × 10 6 and 3.1 × 10 6 data points comprising the 33 different predictors or parameters for calibration and validation, respectively (Table 1).

Training of the final classifier
We used this final training data set to train our final classifier (Fig. 2d). This NN consists of 6 hidden layers containing 20 neurons each with activation functions for a leaky rectified linear unit (leaky ReLU) while using a fixed batch size of 2048, a learning rate of 1 × 10 −4 , and a dropout rate of 20 % as well as weight decay in the form of an L 2 parameter regularization (Goodfellow et al., 2016). Furthermore, we used categorical cross-entropy loss and again the ADAM optimizer (Kingma and Ba, 2017).
Our final open-water-sea-ice-cloud discrimination (OSCD) classifier features an accuracy (the ratio of correctly classified pixels to the total number of samples) of 90.8 % and 84.3 % on the calibration and validation data set, respectively. For our comparisons and the results, we always merged all cloud subclasses to a single cloud class (Figs. 2e and 3f).

Results and discussion
In the following, we describe and discuss the results from using our open-water-sea-ice-cloud discrimination (OSCD) product in comparison to the reference MOD/MYD29 seaice product on the basis of a thin-ice thickness (TIT) estimates (i) on a swath-to-swath basis, (ii) on the basis of daily composites of all available swaths per day, and (iii) as a comparison of overall achieved coverage over a year (Fig. 2f).

Swath-based comparison
Representative comparisons between resulting the TIT from OSCD and MOD/MYD29 swaths reveal substantial differences, especially in the high-temperature polynya and thinice areas (PA; Figs. 4 and 5).
The S1-A/B reference data always feature a polynya signal in all our examples (Figs. 4a, e, and i and 5a, e, and i) and these are (at least partially) represented by a warm IST anomaly in the MODIS data (Figs. 4b, f, and j and 5b, f, and j). While for some examples the difference in resulting TIT between OSCD and MOD/MYD29 is comparably small or negligible (Figs. 4g and h and 5c and d), substantial dif- Figure 4. Compilation of exemplary co-located S1-A/B calibrated backscatter (in dB) and MODIS swaths of ice-surface temperature (IST; in K) and derived thin-ice thickness (TIT; in m) data (Table 3). Gray and green overlays highlight the ice-shelf extent. Manually picked S1-A/B reference polynya extent is outlined by a dashed red line in all panels. Please note that the date format in this figure is year month day (yyyy-mm-dd).
ferences appear for other examples (Figs. 4k and l, 5g and k, and 5h and l).
For a better comparison, the polynyas were hand-picked for the respective S1-A/B data in Figs. 4 and 5. The corresponding absolute polynya areas are summarized in Table 3. In addition to the respective numbers for each polynya, the corresponding area covered in the S1-A/B extent is given in parentheses. While there is some uncertainty due to the different grid resolutions (25 m vs. 1 km) as well as acquisitiontime difference and subsequent changes due to sea-ice drift, this allows for a good quantification of the impact of erroneously classified cloud cover on the estimated TIT.
While there are correct and also corresponding cloud classifications in both MODIS products, the applied MODIS cloud mask in the MOD/MYD29 product tends towards additionally masking out strong positive temperature anomalies (Figs. 4l and 5h and l). This happens frequently in the center of the primary polynya around 27.4 • W and 76 • S and leads to substantial differences in PA estimates (Table 3).
Due to the strong temperature gradient between the warm ocean and the cold atmosphere, turbulent exchange of sensible and latent heat is large and can potentially lead to the formation of sea fog and thin, low cloud cover (Gultepe et al., 2003;Fraser et al., 2009). However, the temperature texture in the open-water and/or thin-ice areas appears to be homo- Figure 5. Additional compilation of exemplary co-located S1-A/B and MODIS swaths in the same setup as Fig. 4. Gray and green overlays highlight the ice-shelf extent. Manually picked S1-A/B reference polynya extent is outlined by a dashed red line in all panels. Please note that the date format in this figure is year month day (yyyy-mm-dd). Table 3. Summary of polynya area (PA; in km 2 ) estimates between S1-A/B (PA S1 ), OSCD (PA OSCD ), and MOD/MYD29 (PA M29 ) data. PA estimates in parentheses correspond to the PA retrieved from MODIS for the S1-A/B polygon in Figs. 4 and 5.

Example
PA S1 PA OSCD PA M29 geneous and is likely not to be affected by either sea fog or clouds to the extent suggested by the MOD/MYD29 product through the MODIS cloud mask.

Daily-composite-based comparison
Based on the median TIT of all available MODIS swaths per day, daily polynya area (PA) was computed (Paul et al., 2015), and the difference between OSCD and MOD/MYD29 was calculated (i.e., OSCD minus MOD/MYD29; Fig. 6). Scattering of OSCD and MOD/MYD29 daily PA estimates against each other reveals a general tendency towards larger PA estimates in MOD/MYD29 data ( Fig. 6; top-left scatterplot inlet). However, there is also a strong seasonality in Figure 6. Daily polynya area difference in 10 3 km 2 using swath-wise pixel averages featuring a thin-ice thickness (TIT) ≤ 0.2 m between OSCD and MOD/MYD29. Difference is calculated by subtracting MOD/MYD29 from OSCD; results with OSCD ≥ MOD/MYD29 are shown in blue; results with OSCD < MOD/MYD29 are shown in red. Orange vertical bars highlight days with S1-A/B swath coverage used for calibration or training of the OSCD algorithm. Green vertical bars show additional S1-A/B swaths used for validation between products (Figs. 4 and 5). The top-left corner features a scatterplot of the daily polynya with MOD/MYD29 to OSCD. Additional information about the S1-A/B swaths is provided in Table 2. this MOD/MYD29 bias, which dominates from 1 April 2017 to mid May 2017, while OSCD estimates are predominately larger than or equal to MOD/MYD29 between mid May and 30 September 2017 (Fig. 6). For the year 2017, about 64 %, 50.0 %, and 27 % of the differences of the absolute daily median PA are below 1000 km 2 , 500 km 2 , and 100 km 2 , respectively.
On average, OSCD estimates the daily polynya area (PA) between 1 April and 30 September 2017 to be 1.88 ×10 3 km 2 in contrast to 2.69 ×10 3 km 2 using MOD/MYD29 data (not shown). This corresponds to an average daily mean PA which is about 44 % smaller for OSCD compared to MOD/MYD29. However, especially during freeze-up (i.e., between 1 April 2017 and mid May 2017), the differences are oftentimes very large (14.9 ×10 3 km 2 on 17 May 2017) and towards MOD/MYD29. To analyze this, we conduct a more detailed analysis of the OSCD and MOD/MYD29 daily median TIT (Figs. 7 and 8).
Unfortunately, no S1-A/B swath was acquired over the BIS area for 17 May 2017. However, S1-A/B swaths were acquired the day before and after (Table 2).
From the S1-A/B data ( Fig. 7a and b), the existence of open water and/or thin ice very close to the ice-shelf edge around 27.4 • W and 76 • S for 18 May 2017 is evident.
The lack of any clearly distinguishable positive temperature-anomaly features in the MODIS daily median IST composite (Fig. 7c) and the general texture of rather smooth temperature patches are both signs for a persistently present cloud cover during 17 May 2017.
However, the relatively high temperatures of some of these potential clouds lead to an erroneous calculation of TIT and the subsequent daily median TIT composite with an erroneously much larger polynya area (PA) for MOD/MYD29 compared to OSCD (Fig. 7d and e). Nonetheless, also OSCD features TIT estimates from cloud artifacts in the northwest around 29.5 • W and 74-74.5 • S as well as in the area of the primary BIS polynya.
The individual swaths used for the computation of both composites underline the absence of any pronounced positive temperature anomalies corresponding to open-water and/or thin-ice features ( Fig. 8a-g).
While cold clouds are reliably identified, the inability of the MODIS cloud mask to also reliably identify warm cloud patterns results in the computation of TIT in large patches west of BIS (Fig. 8q-u). Conversely, these false computations are not present or are at least much reduced in the OSCD data ( Fig. 8j-n). However, while a small area west of the tip of the BIS around 28 • W and 75.5 • S corresponds well to the polynya signal in the S1-A data (Fig. 7b), the majority of the TIT estimates appear to be cloud artifacts (Fig. 8n).
From our analysis of the swath-based and daily-composite comparisons, three major take home messages can be summarized:   1. Erroneous TIT estimates due to (especially) warm cloud-cover artifacts resulting from false-negative classifications in the MOD/MYD29 data increase the overall estimated PA substantially. These false-negative classifications are reduced in the OSCD data.
2. False-positive cloud classifications over positive icesurface temperature anomalies in the MOD/MYD29 data reduce the products' capability to estimate PA spatially and temporally correctly. These false-positive classifications are also reduced in the OSCD data.
3. Eliminating the thinnest sea-ice fraction of the thinice spectrum due to false-positive classifications potentially leads to a "thick" thin-ice bias during the dailycomposite procedure.
The combined effect leads to spatially misplaced TIT estimates, likely not resolving the correct shape and (sub)daily thickness distribution of the open-water and/or thin-ice areas. Studies such as Paul et al. (2015) and Preußer et al. (2019), therefore, try to mitigate the effect of points 1 and 2 by introducing predefined masks.

Coverage comparison
In order to pick up on the last point, we would like to analyze the per-swath coverage in more detail, as this also influences the subdaily TIT distribution and, therefore, the thickness distribution of the resulting daily composite. It appears that the per-swath thin-ice occurrence frequency is much higher in the OSCD data compared to the MOD/MYD29 data (Fig. 9). Figure 10. Comparison of binned thin-ice thickness classes with a bin size of 2 cm based on all available swaths from 1 April to 30 September 2017 between the use of MOD/MYD29 (red) and OSCD (blue) data for the primary BIS polynya (green dashed outline in Fig. 9).
Quantifying the differences in the outlined subregions ( Fig. 9; white and green dashed outlines), results in a 10 % and 20 % (BIS area and primary BIS polynya) higher detection rate of thin-ice pixels over all MODIS swaths between 1 April and 30 September 2017 in the OSCD data (Fig. 9b). This improved coverage likely leads to a higher-quality daily composite, as the impact from outliers is reduced. It admittedly sounds counterintuitive at first to have improved coverage ( Fig. 9) with at the same time substantially less average PA (Fig. 6). This effect can be explained from the difference between swath and daily-composite data. Here, the increase in coverage mainly focuses around the primary polynya at BIS (green outline in Fig. 9). However, the substantial decrease in daily PA results from reducing the false-negative classifications of warm clouds as sea ice, primarily off the BIS edge to the west. These misclassification-related TIT estimates push the resulting average PA for the MOD/MYD29 data.
Based on our analysis in Sect. 3.1 and 3.2, we assumed that these additional thin-ice occurrences likely feature very thin ice, therefore, reducing the potential bias for thick thin sea ice in the MOD/MYD29 data. This is evident from Fig. 10. Here, the TIT occurrence frequency based on 2 cm bins for all available swaths between 1 April and 30 September 2017 is shown for the MOD/MYD29 and OSCD data. Thickness classes between 0 and 20 cm are much more frequent in the OSCD data (402 724; Fig. 10) compared to the MOD/MYD29 standard product (211 021; Fig. 10). The largest difference between both products, however, is the overall higher occurrence frequency of the thinnest ice fractions (between 0 and 10 cm) in the OSCD data compared to MOD/MYD29. As assumed before, there is a bias for thick thin ice present in the MOD/MYD29 data, which potentially plays an important role especially in the estimation of sea-ice production based on daily composites.
Despite great care during the manual categorization, uncertainty remains due to the lack of measured ground-truth data for the training data generation. However, the underlying statistical basis from the unsupervised FCM clustering in combination with a second stage of fully classifying all co-located MODIS swaths using NNs before generating the calibration or validation swath-split final training data for the OSCD algorithm appears to provide a realistic representation of the present sea-ice conditions in the BIS area.

Summary and outlook
In this study, we present a novel approach to improve the detection of wintertime cloud cover over Antarctic sea ice and its discrimination from sea-ice cover and open-water and/or thin-ice areas in MODIS thermal-infrared data using a deep neural network.
We established a labeled training data set using the techniques of dimension reduction, unsupervised clustering, and supervised learning in combination with manual visual screening and categorization. Through this effort, we generated a total of 13.1 ×10 6 data points for 33 different predictors.
With this data set, we trained a deep neural network and used it to discriminate between open-water and/or thin-ice, sea-ice, and cloud-covered areas in the Brunt Ice Shelf region for the freezing period of 2017 (1 April to 30 September). Here, we computed the thin-ice thickness up to 0.2 m of open-water and/or thin-ice areas and evaluated the difference in daily polynya area and daily swath coverage to results using the standard NSIDC MOD/MYD29 sea-ice product.
Based on our approach, we obtain a 44 % lower average polynya area but 20 % higher swath coverage rate compared to the standard MOD/MYD29 product. On the one hand, the polynya area in MOD/MYD29 is likely dominated through frequent false-negative classifications of warm clouds as thin ice, leading to unrealistically large open-water and/or thinice areas, especially during freeze-up. On the other hand, the much lower coverage rate likely decreases the quality and accuracy of TIT estimates in the daily median TIT composites when using MOD/MYD29 data. Both factors are reduced in our OSCD data. This also reduces the impact of single outliers on the daily median TIT composites and, therefore, also increases the quality of derived information such as sea-ice production.
In the future, we plan to create an open-access comprehensive OSCD-based IST and TIT products covering all major Antarctic coastal polynyas, as well as providing higher-level parameters such as polynya area, sea-ice production, and associated ocean salt flux. We expect this data set to be of great use to the ocean-sea-ice-ice-shelf model community as well as for potential biological applications. Data availability. The generated initial training data set can be downloaded through Zenodo at https://doi.org/10.5281/zenodo.4596407 (Paul, 2021). Sources of all used data sets are referenced in the text.
Author contributions. SP designed the study and methodology, conducted the analysis, and wrote the original draft of the paper. MH assisted in the study design and the adaptation of the machinelearning algorithms as well as with writing the paper.
Competing interests. The authors declare no conflict of interests.
Acknowledgements. The authors want to thank the LAADS DAAC and the ASF DAAC for the provision of the MOD/MYD02 and S1-A/B data used here. The corresponding author appreciates the help of his family which enabled him to finally write this paper while working from home due to the SARS-CoV-2 pandemic. The comments by two anonymous reviewers as well as the editor, Claude Duguay, helped to substantially improve the quality of this paper.
Financial support. The article processing charges for this openaccess publication were covered by a Research Centre of the Helmholtz Association.
Review statement. This paper was edited by Claude Duguay and reviewed by two anonymous referees.