A new Sentinel-1 image-based sea ice classification algorithm using a machine-learning-based model trained in a semi-automated manner is proposed to support daily ice charting. Previous studies mostly rely on manual work in selecting training and validation data. We show that the readily available ice charts from the operational ice services can reduce the amount of manual work in preparation of large amounts of training/testing data. Furthermore, they can feed highly reliable data to the trainer by indirectly exploiting the best ability of the sea ice experts working at the operational ice services. The proposed scheme has two phases: training and operational. Both phases start from the removal of thermal, scalloping, and textural noise from Sentinel-1 data and calculation of grey level co-occurrence matrix and Haralick texture features in a sliding window. In the training phase, the weekly ice charts are reprojected into the SAR image geometry. A random forest classifier is trained with the texture features on input and labels from the rasterized ice charts on output. Then, the trained classifier is directly applied to the texture features from Sentinel-1 images operationally. Test results from the two datasets spanning winter (January–March) and summer (June–August) seasons acquired over the Fram Strait and the Barents Sea showed that the classifier is capable of retrieving three generalized cover types (open water, mixed first-year ice, old ice) with overall accuracies of 87 % and 67 % in winter and summer seasons, respectively. For the summer season, the classifier failed in distinguishing mixed first-year ice from old ice with accuracy of only 12 %; however, it performed rather like an ice–water discriminator with high accuracy of 98 % as the misclassification between the mixed first-year ice and old ice was between them. The accuracy for five cover types (open water, new ice, young ice, first-year ice, old ice) in the winter season was 60 %. The errors are attributed both to incorrect manual classification on the ice charts and to the semi-automated algorithm. Finally, we demonstrate the potential for near-real-time service of the ice map using daily mosaicked Sentinel-1 images.
Wide swath SAR observation from several spaceborne SAR missions (RADARSAT-1, 1995–2013; Envisat ASAR, 2002–2012; ALOS-1 PALSAR, 2006–2011; RADARSAT-2, 2007-present; Sentinel-1, 2014–present) played an important role in studying global ocean and ice-covered polar regions. The Sentinel-1 constellation (1A and 1B) is producing dual-polarization observation data with the largest Arctic coverage and the highest temporal resolution ever. The cross-polarization is known to be more sensitive to the difference in scattering from sea ice and open water than the co-polarization (Scheuchl et al., 2004), and the combination of HH and HV polarizations has been widely used for ice edge detection and ice type classification (a nice overview is given in the paper by Zakhvatkina et al., 2019). However, most of the recent ice classification algorithms were developed using RADARSAT-2 ScanSAR (Leigh et al., 2014; Liu et al., 2015; Zakhvatkina et al., 2017), which has different sensor characteristics from Sentinel-1 TOPSAR, and the use of Sentinel-1 for the same purpose is very limited in literature. The main drawback of applying existing algorithms to Sentinel-1 TOPSAR data is the relatively high level of thermal noise contamination and its propagation to image textures.
For proper use of dense time series of Earth observations using SAR sensors, radiometric properties must be well-calibrated. Thermal noise is often neglected in many cases but can seriously impact the utility of dual-polarization SAR data. Sentinel-1 TOPSAR image intensity is particularly disturbed by the thermal noise in the cross-polarization channel. Although the European Space Agency (ESA) provides calibrated noise vectors for noise power subtraction, residual noise contribution is still significant considering the relatively narrow backscattering distribution of the cross-polarization channel. In our previous study (Park et al. 2018), a new denoising method with azimuth de-scalloping, noise scaling, and inter-swath power balancing was developed and showed improved performance in various SAR intensity-based applications. Furthermore, when it came to texture-based image classification, we suggested a correction method for textural noise (Park et al., 2019) which distorts local statistics, and thus degrades texture information in the Sentinel-1 TOPSAR images.
Study area.
Processing flow chart of the proposed algorithm. The grey colour shows the training phase
In many of the previous studies on ice–water and/or sea ice classification (Soh and Tsatsoulis, 1999; Zakhvatkina et al., 2013; Leigh et al., 2014; Liu et al., 2015; Ressel et al., 2015; Zakhvatkina et al., 2017; Aldenhoff et al., 2018), the training and validation were done using manually produced ice maps. Although the authors claimed that the manual ice maps were drawn by ice experts, the selection of SAR scenes and interpretation could be inconsistent, and the number of samples might not be enough to generalize the results because of the laborious manual work. Furthermore, the results are hardly reproducible by others because the reference sources are not open to the public. Therefore, increasing objectivity is crucial, and automating the classification process is encouraged. The idea of training using SAR images and accompanying image analysis charts, which is a direct manual interpretation of SAR images by trained ice analysts working at operational ice services, was tested for sea ice concentration estimation by Wang et al. (2017); however, such image analysis charts are not accessible to the public.
An example of the ice chart preprocessing. From the ice chart,
stage of development (SoD;
The use of a public ice chart as training and validation reference data may help in solving the validation problem. The preparation of a public ice chart is also through manual inspection of various sources of satellite imagery and other sources of data (Partington et al., 2003; Johannessen et al., 2006); however, training using a large volume of these charts would reduce operator-to-operator bias, such as inconsistent decisions against similar ice conditions. The overall bias may exist since the public ice charts are produced in the interest of marine safety. Nevertheless, as the human interpretation available in the ice chart is currently considered the best available information on sea ice (Karvonen et al., 2015), the best practice to make a sea ice type classifier is to train with the public ice chart so that the best knowledge of ice analysts is mimicked.
In this work, we present a semi-automated Sentinel-1 image-based sea ice classification algorithm which takes advantage of our denoising method. The noise-corrected dual-polarization images are processed into image textures that capture sea ice features in various spatial scales, and they are used for supervised classification with a random forest classifier by relating with ice charts published by operational ice services. The use of ice charts has a dual purpose: semi-automatization of classifier training and minimization of human error.
The region of study for developing and testing the proposed algorithm is the
Fram Strait and the Barents Sea including a part of the Arctic Ocean
(75–85
Sentinel-1 TOPSAR data in extended wide-swath (EW) mode acquired in summer
(June–August in 2016–2018) and winter (January–March in 2017–2019) seasons
were collected from the Copernicus Open Access Hub
(
The ice charts covering the same periods were collected. There are two ice
services that publish weekly ice charts with pan-Arctic coverage: the U.S.
National Ice Center (NIC) of the United States of America and the Arctic and
Antarctic Research Institute (AARI) of Russia. Although the accuracies are
known to be comparable (Pastusiak, 2016) to each other, there is no partial
ice concentration information in the AARI ice chart. In this study, we use
the ice charts downloaded from the NIC website
(
Figure 2 shows the flow of the semi-automated ice classification scheme that we propose. It is divided into two phases: training and operational. Both phases start from the removal of thermal noise from Sentinel-1 data (Sect. 2.2.2), incidence angle calibration (Sect. 2.2.3), and calculation of texture features (Sect. 2.2.4). The training phase (shown in grey in Fig. 2) continues with preprocessing and collocation of the ice charts with the Sentinel-1 data (Sect. 2.2.1) and machine learning step (Sect. 2.2.5 and 2.2.6). The operational phase uses the classifier which developed during the training phase for processing texture features that were computed from the input SAR data and for generating ice charts. Detailed explanations for each step are given in the following subsections.
Incidence angle dependences of sigma naught in HH (closed squares) and HV (open squares) polarization channels. Pixels covering various types of sea ice were merged so that the averaged property can be estimated. The blue and red zones indicate winter and summer seasons, respectively.
Hyperparameter optimization using grid search results (cross).
Dashed lines represent the best-fit Richards curve.
Hyperparameters used for grid search.
To take advantage of the objective identification of the ice type from expert sources open to public and to develop a semi-automated processing scheme, the proposed algorithm uses electronic ice charts published by international ice chart services. The electronic ice chart follows SIGRID-3 format (JCOMM, 2014a), which is based on a vector format called shapefile (ESRI, 1998). The first step is to reproject the ice chart into the geometry of each SAR image. Although an accurate reprojection needs several pieces of information such as orbit, look angle, topographic height, etc., our interest is in the sea ice where the topographic difference does not exceed more than a few metres; hence the reprojection of coordinates of ice chart polygons is done with Geospatial Data Abstraction Library (GDAL; GDAL/OGR contributors, 2019) using a simple third-order polynomial fitted using the ground control point information from the Sentinel-1-product-included auxiliary data.
Distribution of the image acquisition dates prior to the publication of the reference ice chart.
After the reprojection, the following three layers are extracted: total ice concentration (CT), partial ice concentration of each ice type (CP), and stage of development (SoD). CT is important because areas with low CT can be misinterpreted as open ocean in a SAR image. Heinrichs et al. (2006) reported that the ice edge determined from AMSR-E passive microwave radiometer data using the isoline of 15 % concentration best matches the ice edge determined from RADARSAT-1 SAR data using visual inspection. After the visual comparison of many SAR images and the corresponding reprojected ice charts, we set a threshold of 20 % for CT to discard water-like pixels. Note that the ice concentration label in the SIGRID-3 format is assigned in an increment of 10 %. CP is also important in finding the dominant ice type in the given polygons. SoD is a so-called ice type. It is challenging to differentiate ice types using SAR data only; thus we merged the SoDs into five simple classes: open water, new ice, young ice, first-year ice, and old ice. For the summer season (June–August), there is almost no new ice and young ice annotated in the NIC ice chart, and the SoDs are further merged into three classes: open water, mixed first-year ice, and old ice. Figure 3 demonstrates an example of the ice chart preprocessing explained above with the colours following the WMO nomenclature (JCOMM, 2014b). Comparing the original SoD in panel (a) with the processed SoD in panel (d), it is clear that the ice edge of the processed SoD matches better with the SAR backscattering images.
Feature importance of the binary sub-classifiers. ASM: angular second moment; Cont: contrast; Corr: correlation; Var: variance; IDM: inverse difference moment; Sum Avg: sum average; Sum Var: sum variance; Sum Ent: sum entropy; Ent: entropy; Diff Var: difference variance; Diff Ent: difference entropy; IMC: information measures of correlation; CV: coefficient of variation. For definitions of each parameter, please refer to Haralick et al. (1973).
Sentinel-1 cross-polarization images suffer from strong noise, some which originates from combined effects of the relatively low signal-to-noise ratio of the sensor system and insufficient noise vector information in the extra-wide-swath mode level 1 product (Park et al., 2018). For surfaces with low backscattering such as calm open water and level sea ice without the presence of frost flowers on top, the effects from thermal noise contamination are visible not only in the backscattering image but also in some of the texture images (Park et al., 2019). The authors have developed an efficient method for textural denoising which is essential for the preprocessing of Sentinel-1 TOPSAR dual-polarization products. Denoising ensures beam-normalized texture properties for all subswaths, which helps seamlessly mosaic of multi-pass images regardless of the satellite orbit and image acquisition geometry. By following the methods developed in Park et al. (2018, 2019), each of the Sentinel-1 images was denoised before further processes are applied. As the noise power subtraction yields negative intensity values where the backscattering power is close to the noise floor, more often in HV polarization, which has lower backscatter than in HH polarization, we added mean of the noise power back to the denoised result so that those pixels do not turn into NaN (not a number) by the sigma naught conversion of linear scale to log scale (decibel).
It is well-known that there is a strong incidence angle dependence in the
SAR backscattering intensity for open water and sea ice surface
(Mäkynen et al., 2002; Mäkynen and
Karvonen, 2017). For a wide-swath SAR system like Sentinel-1 TOPSAR, varying
backscatter intensity confuses image interpretation. The quasi-linear slopes
in the plane of incidence angle versus sigma nought in decibel scale are
reported as
Figure 4 shows incidence angle dependence in the SAR backscattering
intensity for mixed sea ice types. From the Sentinel-1 dataset described in
Sect. 2.1, sea ice pixels were extracted by using daily global sea ice
edge products available from the EUMETSAT Ocean and Sea Ice Satellite
Application Facilities (OSI SAF). For the midwinter season (January–March displayed
as a blue background), the estimated mean slope in HH polarization was
We compensate for the incidence angle dependence using the estimated slopes
with respect to the nominal scene centre angle of 34.5
Like many of the previously developed sea ice type classification methods
(Shokr, 1991; Barber and LeDrew, 1991; Soh and Tsatsoulis, 1999; Deng and
Clausi, 2005; Zakhvatkina et al., 2013; Leigh et al., 2014; Liu et al.,
2015), the proposed approach starts from a grey level co-occurrence matrix
(GLCM) calculation. The GLCM is a four-dimensional matrix
An important factor that influences the computed texture features is the
number of grey levels,
Since
In addition to 13 Haralick features, the coefficient of variation (CV) which
is reported as a useful feature for ice–water discrimination (Keller et al.,
2017) is included. The CV is defined as follows:
Confusion matrix of the five-class RF classifier which was trained with and applied to the DS1 winter dataset. The bold font indicates where the classification was done correctly.
The 1 d mosaics of Sentinel-1A/1B images (
Since there are hundreds of algorithms in the field of machine learning (ML)
and each of the different algorithms has its own pros and cons, it is not
easy to compare their performances and decide what to use.
Fernández-Delgado et al. (2014) found that the random forest (RF;
Ho, 1998) was the best classifier for various types of datasets with a slight
difference from a support vector machine (SVM; Cortes and Vapnik, 1995). In
the previous studies about sea ice classification (e.g. Leigh et al., 2014;
Liu et al., 2015; Zakhvatkina et al., 2017), the SVM was used often because
by nature it works relatively well even when the number of datasets is
small. When the training dataset is prepared by manual work (i.e. manual
classification by human expert), the number of images is not large, usually
fewer than 20 (e.g. 12 scenes in Zakhvatkina et al., 2013; 20 scenes in
Leigh et al., 2014; 1 scene in Liu et al., 2015; 4 scenes in Ressel et al.,
2015). However, the number can increase with less effort when the readily
available ice charts are used as training references. Besides, there is no
need to rely on additional manual work prone to contamination by biased
decisions. The RF has two practical advantages when processing a large
number of datasets. First, the RF is scale-invariant and does not require
preprocessing of the datasets, whereas the SVM requires scaling and
normalization. Second, the computational complexity of the RF is lower than
that of the SVM. For the SVM, the number of operations is
The 1 d mosaics of Sentinel-1A/1B images (
We split the RF classifier into several binary classifiers using a one-vs.-all scheme (Anand et al., 1995). Although the standard RF algorithm can inherently deal with a multiclass problem, the one-vs.-all binarization to the RF results in better accuracy with smaller forest sizes than the standard RF (Adnan and Islam, 2015).
Confusion matrix of the five-class RF classifier which was trained with the DS1 winter dataset and applied to the DS2 winter dataset. The bold font indicates where the classification was done correctly.
Classification accuracies before and after applying textural denoising.
Confusion matrix of the three-class RF classifier which was trained with and applied to the DS1 summer dataset. The bold font indicates where the classification was done correctly.
Three hyperparameters of the RF classifier were tuned: number of trees
(
The 1 d mosaics of Sentinel-1A/1B images (
To train an ice type classifier, a set of collocated SAR images and ice charts is required. After the preprocessing of the ice chart including reprojection into the SAR image geometry, only the samples with spatially and temporally good matches should be fed to the training phase. The goodness of matching should be examined as the weekly ice chart is produced by merging information from many image sources acquired in different time instances; hence the ice locations and conditions are unlikely to match to those in every SAR image. As no explicit scene identifier or time information of the images used in ice charting is provided with the ice chart itself, the basic strategy in image selection is to find a pair of a SAR image and an ice chart which match well visually. Such an image selection is trivial, but not easy to automate. Since the weekly ice chart is made partly based on the SAR images acquired in the past 3 d from the date of publication, the ice edges in some images match well with those in the ice chart.
In order to automate image selection, the ice edges in SAR images need to be identified first. Since even an ice–water classifier has not been well-developed yet for Sentinel-1, the image selection procedure has to be done manually in the beginning. However, once a classifier is generated with high accuracy, it can be used to automate the procedure; then the whole process in the proposed scheme will be fully automated. This is why the proposed algorithm is named “semi-” automated for now. Nevertheless, the manual selection to guarantee a “good match” is done by visual inspection of ice–water boundaries overlaid on SAR images. The ice–water boundary can be extracted easily from the reprojected ice chart. Then the SAR backscattering image contrasts across the ice–water boundaries are examined in both HH and HV polarization because the image contrast between ice–water is larger in HV while smooth level ice is more easily identified in HH.
Confusion matrix of the three-class RF classifier which was trained with the DS1 summer dataset and applied to the DS2 summer dataset. The bold font indicates where the classification was done correctly.
Confusion matrix of the three-class RF classifier which was trained with the DS1 summer dataset and applied to the DS1 summer dataset of each month. The bold font indicates where the classification was done correctly.
After the image selection, the samples in the selected images are split
randomly into training and test datasets with a ratio of
Confusion matrix of the three-class RF classifier which was trained with the DS1 summer dataset and applied to the DS2 summer dataset of each month. The bold font indicates where the classification was done correctly.
It might not be enough to assess the quality of the classifier output when it is trained with, and evaluated against, only NIC ice charts. The accuracy could be indirectly investigated by comparing the output from our classifier against another data source, such as the OSI SAF sea ice type product (OSI-403-c). The ice classes of OSI-403-c are assigned from atmospherically corrected brightness temperatures of passive microwave radiometers (SSMIS and AMSR2) and backscatter values of radar scatterometer (ASCAT), using a Bayesian approach (Aaboe et al., 2018).
The 1 d mosaics of Sentinel-1A/1B images (
An example of the inconsistency of the ice charts. Note that the
SoD labels and colours are of an original NIC ice chart, while those in Figs. 7
and 8 are of a simplified version as described in Sect. 2.2.1. The SoDs from
the NIC ice charts on different dates (26 December 2018 and 2 January 2019)
are superimposed on the Sentinel-1 backscattering image of the corresponding
dates. The same ice floe (red outline) is classified differently in each ice
chart (old ice in
We trained three RF classifiers with different feature configurations: (i) FC1: Haralick texture features and CV; (ii) FC2: Haralick texture features, CV, and incidence angle; and (iii) FC3: Haralick texture features, CV, incidence angle, and day of the year.
As expected, the classification score increases with the number of trees
(crosses in Fig. 5a), and Richards curve (dashed line) fits
well to the observations (RMSE
The ice types in NIC ice chart
Confusion matrix of the three-class RF classifier which was trained and applied to the DS1 winter dataset. The bold font indicates where the classification was done correctly.
The trained five-class classifier consists of five binary sub-classifiers; each of them is used for discriminating one specific class from the others. For each sub-classifier, each texture feature has a different weight in decision making. The fraction of the samples that each texture feature contributes can be used to compute the relative importance of the features, and the averaged estimates of them over several randomized trees serve as an indicator of feature importance (Louppe, 2014). The feature importance of the sub-classifiers is presented in Fig. 6. The overall pattern shows that the features of HV polarization play a more important role than those of HH polarization. For HH polarization, the sum average, which is equal to the mean backscattering intensity in each subwindow, was the prominent feature. For HV polarization, however, contrast and variance- and entropy-related features were more important. The classifiers for open water and old ice have stronger dependencies on HV polarization than others. This is understandable because the main radar scattering mechanisms for those two types are strongly characterized by the portion of volume scattering: low for calm water and high for dry ice with low salinity (old ice). The classifier for new ice has a distinctive pattern that the sum averages in both polarizations are much more important than other features. This might be because the new ice has different types of recently formed ice including nilas, which is smooth but rafting can make rough features, and frost flowers, which introduce high surface roughness and volume scattering (Isleifson et al., 2014). Thus the new ice can appear either featureless and dark or complex and bright in a SAR image (Dierking, 2010). The large range in backscatter values makes it hard to define characteristic texture in the new ice patch.
The confusion matrix for testing the trained classifier for winter season
with the test dataset (DS1) is shown in Table 3. Three cases with
different feature configurations (FC1–FC3) were tested. The accuracies for
open water and old ice were higher than 90 %; however, those for young ice
and first-year ice were around 60 %. The mean difference between the
results of FC1 and FC2 was only 1.2 %, indicating that residual angular
dependency was negligible after the incidence angle correction. However, the
accuracy significantly improved from FC2 to FC3, especially with new ice
(21.2 %). The Cohen kappa coefficients
Confusion matrix of the three-class RF classifier which was trained with the DS1 winter dataset and applied to the DS2 winter dataset. The bold font indicates where the classification was done correctly.
To see how the denoising step in Sect. 2.2.2 led to improvements in the
classification accuracies, the same training and evaluation were conducted
for the same dataset without applying the textural noise correction (Table 5). In all configurations (FC1–FC3), the accuracies improved for young ice
(
The confusion matrix for testing the trained classifier for the summer season with the test dataset (DS1) is shown in Table 6. As described in Sect. 2.2.1, the further simplified three-class classification is applied. The accuracies for open water and old ice were higher than 92 %; however, the accuracies for mixed first-year ice were only around 15 % in both FC1 and FC2 and 42 % in FC3. The large difference between the results of FC1-FC2 and FC3 indicates that the mixed first-year ice likely changes at short timescales. The misclassifications for mixed first-year ice were mostly into old ice. This might be because of the surface melting and the corresponding image textures which make the discrimination between the mixed first-year ice and old ice difficult. The same patterns were observed from the confusion matrix (Table 7) for validation results from the DS2, except that the accuracy decrease from Tables 6 to 7 was particularly large for FC3, meaning overfitting for DS1. However, it is unclear if the low accuracy for mixed first-year ice is due to the classifier itself or the data. To unravel this, the data were divided into three groups of 1 month each (June, July, August), and then separate classifiers were trained and tested. Tables 8 and 9 show the results with FC1 configuration for DS1 and DS2, respectively. There was a rapid accuracy decrease for mixed first-year ice from June to August in both results (from 60.6 % to 26.5 % for DS1 and from 55.6 % to 11.0 % for DS2). As there is no particularly large difference between the numbers in Tables 8 and 9, meaning the classifiers were not overfitting, the very low accuracies for mixed first-year ice in Table 6 (14.9 %) and Table 7 (12.0 %) seem to be due to a too large temporal extent for a single classifier. In other words, the classifier for the summer season needs to be split into multiple groups with shorter time spans, and/or a feature that can effectively account for surface melting needs to be introduced into the algorithm for further development. Based on these results so far, the trained classifiers in the summer season for 3 months in bulk failed in distinguishing mixed first-year ice from old ice; thus they are close to ice–water discriminators rather than ice type classifiers.
Confusion matrix of the three-class RF classifier which was trained with the DS1 winter dataset and applied to the DS2 winter dataset with reference to the OSI SAF sea ice type product (OSI-403-c). Bold indicates where the classification was done correctly.
Averaged percent agreement of NIC weekly ice chart and OSI SAF daily sea ice type product (OSI-403-c) for the same publication dates (12 different days) in the studied domain during January–March 2019. Bold indicates where the classification was done correctly.
Figure 7 shows a daily mosaic of Sentinel-1 SAR images over the study area and the classified ice map in the winter season. For comparison, the NIC weekly ice chart is also displayed. Despite the SAR images being acquired 3 d before the ice chart was published, the ice edges of the ice chart match well with the SAR mosaic in most parts because the same SAR data were used. Overall, the discriminations between ice and non-ice, old ice and other ice types, and detection of new ice patches look reasonable. However, some young ice patches, for example the ice patches between the Svalbard archipelago, are misclassified as first-year ice. Figure 8 shows another daily mosaic made by the images acquired on the same day of the ice chart publication. Considering notable ice drift in the backscattering images in Figs. 7 and 8, the SAR-based ice classification results in both figures looking consistent, well in line with the ice drift. Although the weekly ice chart is supposed to represent the averaged ice status in the past few days, the actual ice distribution on the actual date of the publication can be largely different. This example shows a clear potential of near-real-time service of ice type classification.
Figures 9 and 10 show the same mosaics for the case in the summer season. As shown in Tables 6 and 7, the misclassifications for the mixed first-year ice into old ice are pronounced in the large ice patches north to Svalbard, while the ice edge positions of the ice chart and the classification result are in good agreement with each other.
To cope with the ambiguous classification for the winter season ice types
with low accuracy, we conducted a test with the three-class classification,
and Tables 10 and 11 show the resulting confusion matrices. The
Figure 11 shows an example of the inconsistent labelling in the reference ice chart. The SoDs from the NIC ice charts are superimposed on the Sentinel-1 backscattering images. The same ice floe (red outline) is classified differently in two different ice charts (old ice on the left panel and first-year ice on the right panel), although it looks almost the same in the corresponding SAR backscattering images. It should be noted that training with the ice chart might have included mislabelled small features even if the image selection based on ice edge matching was successful. Furthermore, the boundaries between different ice types in the ice chart are normally not as precise as those in the SAR image-based classification results. Therefore, the lower classification accuracies compared to those in previous studies (80 % in Zakhvatkina et al., 2013; 91.7 % in Liu et al., 2015; 87.2 % in Aldenhoff et al., 2018), which used manually classified ice maps as training and validation reference, are expected.
Unfortunately, we could not find an official report regarding the accuracy of the NIC ice chart information.
Table 12 shows the confusion matrices for our three-class classifiers when their prediction results are compared with the OSI-403-c product as reference. For one-to-one comparison, it was assumed that the ideal characteristics of the mixed first-year ice and the old ice in our three-class classification are equivalent to those of the first-year ice and the multi-year ice in OSI-403-c. Comparing with the results in Table 11, the accuracies for open water decreased by 6 %; however, this is mainly because the ice concentration threshold for ice–water discrimination in OSI-403-c is 35 %, which is higher than the 20 % that we set in our preprocessing of the NIC ice chart (Sect. 2.2.1). Thus areas with low ice concentration in marginal ice zone are most likely annotated as open water in OSI-403-c. The accuracies for open water at points in the NIC charts with ice concentration between 20 % and 40 % only were considerably lower, with 67.4 %, 67.8 %, and 70.1 % for FC1, FC2, and FC3, respectively (not presented in Table 12). For first-year ice, large portions (72 %) are misclassified as old ice. This might be partly explained in Fig. 12, which shows the ice classes in the NIC ice chart and OSI-403-c for the same publication date. A large extent of old ice in the NIC ice chart is annotated as multi-year ice in OSI-403-c. As our classifiers were trained with the NIC ice chart, it is natural to result in more multi-year ice for the area where the ice type is classified as first-year ice in OSI-403-c. For multi-year ice, the accuracy was the highest, 98 %.
The inconsistency in ice types between the NIC ice chart and OSI-403-c seems persistent at least for the time coverage of DS2 (January–March in 2019). Table 13 shows averaged percent agreement of the two sea ice type products for the same publication dates over 12 weeks (12 one-to-one comparisons as the NIC ice chart is a weekly product). To make a fair comparison, the ice-covered areas with ice concentrations lower than 35 %, which is the threshold for ice–water discrimination in OSI-403-c, were excluded. The percent agreement for first-year ice (58.8 %) was much lower than that of open water (90.0 %) and multi-year ice (99.0 %), which is in line with the results in Table 12. Finding the reason for the clear discrepancy of the extent of first-year ice between the NIC ice chart and OSI-403-c is beyond the scope of this study; however, it should be noted that elaborate future work for cross-calibrating ice types in different ice charts is necessary.
The proposed algorithm has several limitations. First of all, the variations
in radar backscattering and its corresponding image textures due to seasonal
changes were not properly captured. Although day of the year was tested as a
seasonality variable in the FC3 feature configuration, the result did not
show any improvement. This is because SAR image features, which partially
reflect temperature fluxes and weather regimes, might not correspond to
day of the year. Second, the proposed method struggles when the same type of
sea ice is located on different edges of the range swath of SAR images
because the incidence angle dependence could not be normalized perfectly. An
example of such a failure can be seen along the image boundaries at 79.5
A new semi-automated SAR-based sea ice type classification scheme was proposed in this study. For the first time several ice types were successfully identified on Sentinel-1 SAR imagery in the winter season, while only an ice–water discrimination was feasible in the summer season. The main technological innovation is two-fold: (i) reduced manual work in the preparation of a large amount of training and validation reference data using readily available public ice charts and (ii) more objective evaluation of the SAR-based sea ice type classifier compared to the previous studies conducted with a small number of images and customized ice type references from sources not open to the public. A conventional approach for selecting training and testing data by anonymous human ice experts is undesirable not only because it is laborious, but also due to subjectivity and lack of standardization in the assessment of the automated classifier. Therefore, the performance from different literature sources cannot be intercompared directly.
Test results from the datasets of the winter season acquired over the Fram Strait and the Barents Sea area showed overall accuracies of 87 % and 60 % and the Cohen kappa coefficients of 0.75 and 0.67 for the three-class and five-class ice type classifiers, respectively. These are slightly lower than the numbers in previous studies, and the errors are attributed not only to the automated algorithm but also to the inconsistency of the ice charts and the high level of their generalization. Test results from the datasets of summer seasons showed overall accuracy of 67 % and the Cohen kappa coefficient of 0.78 for the three-class classifiers. Considering the misclassifications in different ice types were among themselves, the three-class classifiers are not really a sea ice type classifier, but they performed well at least as an ice–water discriminator with accuracy of 98 %.
Based on the results, we envisage that three-class ice type classification from SAR imagery would be useful for making a global sea ice type product like OSI SAF OSI-403-c with higher spatial resolution. The proposed approach importantly showed that a daily ice type mapping from the Sentinel-1 data is feasible and can help capture details of short-term changes in the stage of sea ice development. Based on the achieved results, we believe that the proposed approach may be efficiently used for operational ice charting services for supporting navigation in the Arctic.
Relevant data can be made available upon request to the authors. The Sentinel-1 data are produced by European Space Agency and are freely available at
JWP and AAK formulated the research plan. JWP and AAK developed the algorithm. JWP implemented the algorithm and performed the data processing. JWP, AAK, MB, JSW, MWH, and HCK carried out the analyses, and JWP wrote the paper.
The authors declare that they have no conflict of interest.
We would like thank the three anonymous reviewers for their invaluable comments and suggestions that helped improve the manuscript. We would also like to thank the editor, John Yackel.
This research has been supported the French Service Hydrographique et Océanographique de la Marine (SHOM) (SHOM-ImpSIM Project 111222), the Research Council of Norway and the Russian Foundation for Basic Research (NORRUSS Project 243608, SONARC), and the Korea Polar Research Institute (grant no. PE20080).
This paper was edited by John Yackel and reviewed by three anonymous referees.