Sea ice thickness is a fundamental climate state variable that provides an
integrated measure of changes in the high-latitude energy balance. However,
observations of mean ice thickness have been sparse in time and space, making
the construction of observation-based time series difficult. Moreover,
different groups use a variety of methods and processing procedures to
measure ice thickness, and each observational source likely has different and
poorly characterized measurement and sampling errors. Observational sources
used in this study include upward-looking sonars mounted on submarines or
moorings, electromagnetic sensors on helicopters or aircraft, and lidar or
radar altimeters on airplanes or satellites. Here we use a curve-fitting
approach to determine the large-scale spatial and temporal variability of
the ice thickness as well as the mean differences between the observation
systems, using over 3000 estimates of the ice thickness. The thickness
estimates are measured over spatial scales of approximately 50 km or time
scales of 1 month, and the primary time period analyzed is 2000–2012 when
the modern mix of observations is available. Good agreement is found between
five of the systems, within 0.15 m, while systematic differences of up to
0.5 m are found for three others compared to the five. The trend in annual
mean ice thickness over the Arctic Basin is

In recent years great interest has developed in the changes seen in Arctic sea ice as ice extent and volume have markedly decreased. While ice extent is reasonably well observed by satellites, observations of ice thickness have been, until recently, sparse. Sea ice model reanalyses (e.g., Schweiger et al., 2011) provide useful estimates of thickness and volume loss but so far do not directly incorporate observations of ice thickness. An observational record that does not depend on a sea ice model therefore remains of substantial interest. Historically, a great number of ice thickness measurements have been made at specific locations using drill holes or ground-based electromagnetic methods; however, these point measurements are difficult to translate into area-averaged mean ice thickness because of the highly heterogeneous nature of the ice pack. Estimates of mean ice thickness require a large number of independent samples. In the last 10 years or so a number of observations of mean sea ice thickness have been made available by different groups using a variety of methods. The longest historical record is from sporadic observations made by submarines using upward-looking sonar (ULS) to measure ice draft (Rothrock et al., 1999, 2008). These measurements are currently available starting in 1975 and ending in 2005 and include data from 34 cruises. They have broad but incomplete spatial coverage and limited sampling of the seasonal variations. ULS measurements from anchored moorings have been made by a number of different groups (e.g., Vinje et al., 1998; Melling et al., 2005; Krishfield et al., 2014; Hansen et al., 2013). Each has excellent temporal sampling with record lengths of up to 10 years although only for single locations. More recently, airborne and satellite-based observations have become available.
Operation IceBridge uses lidar and radar technology on a fixed-wing aircraft beginning in 2009 (Kurtz et al., 2012) and electromagnetic methods from helicopters have been used to measure the snow plus ice thickness since 2001 (Pfaffling et al., 2007; Haas et al., 2009). Satellite-based lidar techniques began with ICESat during the years 2003–2008 (Kwok et al., 2009; Yi and Zwally, 2009). Radar altimeter techniques are used with data from Envisat (2002–2012; Peacock and Laxon, 2004) and from CryoSat-2 beginning in 2010 (Laxon et al., 2013; Kurtz et al., 2014). However, Envisat and CryoSat-2 estimates are not included in the current study because there are currently few publicly available ice thickness data from these instruments that are not preliminary products.

Observations from submarine ULS instruments have previously been used to establish the time and space variation of sea ice draft using a curve-fitting approach for a limited area of the Arctic Basin (Rothrock et al., 2008). Here we extend this approach by including more recent observations of ice thickness from multiple sources, including satellites, and expand the area to the entire Arctic Basin. In addition, we examine if there are systematic differences between individual data sources. This is important because the data sources differ markedly in their methodologies and sampling characteristics, which may result in systematic errors that can affect the spatial and temporal characteristics of the ice thickness time series.

Differences in mean ice thickness from the various measuring systems vary on a wide range of temporal and spatial scales and even measurements obtained from samples nearly identical in time and space may show differences depending on sampling error, how the measurement is made, and how the systems record small-scale variability. The differences in the results from different measurement systems may also depend on ice type (first-year or multiyear), degree of deformation, ice thickness, snow depth, or season. This study is a first attempt to characterize these differences for a broad range of observing systems with a single number that characterizes the difference between any two observing systems.

All available ice thickness observations are fit with a multiple regression least-squares solution of an expression for the mean ice thickness that is a function of time and space. The expression includes non-linear terms that characterize the spatial and temporal variability as well as terms that indicate which observation system is associated with each observation. The observations can be restricted to particular observation systems, geographic regions, or time periods to refine the analysis, with the trade-off of the results being less general. We begin the analysis with a basin-wide selection of all available observations for the time period 2000–2012, then focus on specific observation systems or regions. The trend in the mean ice thickness determined by the regression expression is compared to model-based estimates and other observational studies. We then expand this analysis to include data back to 1975 to compare with and update the results of Rothrock et al. (2008) and provide an assessment of the 39-year change in ice thickness for the central Arctic Basin from the observational record. An assessment of errors, including sensitivity analyses that examine the role of individual observing systems and focus on subregions of the Arctic, follows.
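The structure of such a fit can be illustrated with a minimal sketch. The code below is not the regression expression used in this study: the polynomial terms, the system names "A", "B", and "C", and all data values are invented for illustration. It only shows how indicator (0/1) columns placed alongside smooth space and time terms let an ordinary least-squares solution return one mean offset per observation system relative to a chosen reference.

```python
import numpy as np

def design_matrix(x, y, t, system, systems):
    """Columns: smooth space/time terms, then one 0/1 indicator per system."""
    # Illustrative low-order terms only; the study selects its own terms.
    smooth = [np.ones_like(x), x, y, t, x * y, x**2, y**2, t**2]
    indicators = [(system == s).astype(float) for s in systems]
    return np.column_stack(smooth + indicators)

def fit_offsets(x, y, t, h, system, reference):
    """Return {system: fitted mean offset relative to the reference system}."""
    # The reference system gets no indicator column, so its offset is zero.
    systems = [s for s in np.unique(system) if s != reference]
    A = design_matrix(x, y, t, system, systems)
    coef, *_ = np.linalg.lstsq(A, h, rcond=None)
    return dict(zip(systems, coef[-len(systems):]))

# Synthetic demonstration (all numbers invented).
rng = np.random.default_rng(0)
n = 600
x, y = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
t = rng.uniform(0, 12, n)
system = rng.choice(np.array(["A", "B", "C"]), n)
true_offset = {"A": 0.0, "B": 0.4, "C": -0.2}
h = (2.0 - 0.06 * t + 0.5 * x - 0.3 * y**2
     + np.array([true_offset[s] for s in system])
     + rng.normal(0, 0.1, n))

offsets = fit_offsets(x, y, t, h, system, reference="A")
```

Because the observations need not be coincident, the offsets are estimated only to the extent that the smooth terms capture the real space and time variability; in this synthetic case they do by construction.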

Locations of the observations from different data sources.

Times and ice thickness of the observations from different data sources. The primary focus is on the years after 2000 (dashed line).

Observational data sets.

The Unified Sea Ice Thickness Climate Data Record (Sea Ice CDR) is a collection of Arctic sea ice draft, freeboard, and
thickness observations from many different sources. It includes data from
moored and submarine-based upward-looking sonar instruments, airborne
electromagnetic (EM) induction instruments, satellite laser altimeters
(ICESat), and airborne laser altimeters (IceBridge). The point observations
have been averaged spatially over roughly 50 km and temporally over 1 month.
The mooring data are averaged only in time, the submarine data only in space,
and the airborne and satellite data are averaged both temporally (1 month)
and spatially (50 km); e.g., airborne data from one campaign that are taken a
few days apart are averaged together. In all data sets except ICESat-J, open
water is included in the mean ice thickness estimates. The mean measurements
and the probability distributions for all of the sources are collected in a
single data set with uniform formatting, allowing the scientific community
to better utilize what is now a considerable body of observations. The Sea
Ice CDR data are available at the National Snow and Ice Data Center
(Lindsay, 2010, 2013; also at

We have little information on the absolute accuracy of the averaged samples because we do not know the degree to which the reported measurement errors are uncorrelated. Clearly, if the errors are uncorrelated, the many thousands of point observations that typically comprise a sample would result in very small sample errors (Kwok et al., 2008). However, this assumption is unrealistic (Kwok et al., 2009) since the sea ice characteristics that affect these errors (e.g., thickness variability, snow cover, ridging) likely have spatial autocorrelations substantially larger than the distance between samples (Zygmuntowska et al., 2014).
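The effect of correlated errors on the error of a sample mean can be made concrete with a short sketch. For N point errors with a common standard deviation sigma and correlation matrix C, the variance of their mean is sigma^2 times the sum of all elements of C divided by N^2, which reduces to sigma^2/N only when C is the identity. The per-point error of 0.5 m and the exponential correlation with a 50-point decay scale used below are purely hypothetical and are not taken from any of the data sets.

```python
import numpy as np

def standard_error_of_mean(sigma, corr):
    """Standard error of the mean of N errors with std sigma and correlation matrix corr."""
    n = corr.shape[0]
    return sigma * np.sqrt(corr.sum()) / n

n = 1000
sigma = 0.5  # hypothetical per-point error (m)

# Case 1: fully uncorrelated errors.
uncorrelated = np.eye(n)

# Case 2: hypothetical exponential decay of correlation with point separation.
lag = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
correlated = np.exp(-lag / 50.0)

se_uncorr = standard_error_of_mean(sigma, uncorrelated)
se_corr = standard_error_of_mean(sigma, correlated)
```

Under these assumed values the correlated case gives a standard error several times larger than the uncorrelated one, which is why the reported point errors cannot simply be divided by the square root of the number of points.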

Following RPW08, who developed a regression model to fit ice draft
observations from US submarine data for a sub-area of the Arctic Basin, a
smooth function of space and time,

The choice of terms in the regression follows the methods of RPW08. The
spatial coordinate system

The multiple regression procedure provides an estimate of the standard error
of each of the coefficients:

For the entire Arctic Basin, 2000–2012, the ITRP outlined above selected
21 terms: 7 for indicator variables and 14 for time and space variability of
the ice thickness. Table 2 shows all of the terms and coefficients for this fit. The
multiple regression coefficient is

Fit to ice thickness observation data from the Arctic Basin for 2000–2012.

ITRP coefficients for the Arctic Basin for all observational sources,
2000–2013. Sigma is the standard error of the coefficient and the

As a step towards generating a time series of sea ice thickness from
observations alone, we need to determine what, if any, the mean
differences are between the ice thickness estimates from the different
measurement systems. The ITRP provides a method to do this even when the
observations are not coincident. In this analysis the observation sources
with indicator coefficients not significantly different from zero are
Air-EM, BGEP, IOS-CHK, ICESat-G, and the submarines, indicating that these
sources are all consistent in the mean with each other over the region and
period analyzed. There is just a 0.11 m spread in the mean between the five systems. Ice thickness
data from the three submarine cruises agree in the mean with the ICESat-G
data very closely, with a bias coefficient of

The ICESat-J coefficient, 0.42, indicates that on average the JPL thickness
product is 0.42 m thicker than the Goddard product. A small portion of this
difference is due to the lack of inclusion of open water in the ice
thickness estimates but the bulk of the difference between the ICESat-G and
ICESat-J values may be related to the different techniques of determining
the sea level in order to obtain the freeboard and the different methods for
estimating snow depth. The ITRP shows the ICESat-J estimates are on average
0.47 m thicker than the submarine-based estimates. In contrast, Kwok et al. (2009)
found that the ICESat track estimates of ice draft were 0.1 m

The estimation of the submarine coefficient is sensitive to the inclusion of
a particular cruise. The large difference between the submarines and the
ICESat-J estimates for the entire basin stems from the inclusion of the
2000 submarine cruise when there is no overlap with the ICESat data. If the
analysis period is chosen as 2001–2012 with all sources included, the
ICESat-J product is found to be just 0.05 m

The IceBridge data are also significantly thicker than the reference data, in
this case by 0.59

The ITRP expression for the whole basin can be used to evaluate the spatial and temporal patterns of ice thickness change. To do this, the expression was evaluated at every location within the basin on a 40 km grid with all of the indicator variables set to zero. Here it is important to reconsider the choice of the reference system, ICESat-G. Table 2 shows that the ICESat-G coefficient, zero by its selection as the reference, is very close to the median value of the coefficients of the cluster of five observation systems that have quite similar coefficients: submarines, BGEP, IOS-CHK, ICESat-G, and Air-EM. These systems have a range of coefficients of 0.11 m, indicating that when spatial and temporal variability is accounted for there is little mean difference in the observations. The coefficients for these five are not significantly different from each other since the sigma values are between 0.06 and 0.13 m (Table 2). This suggests that using ICESat-G as a reference predicts an ice thickness that is consistent with observations from these five systems but not with the unadjusted observations from IOS-EBS, ICESat-J, or IceBridge.
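Evaluating a fitted expression with the indicator variables set to zero amounts to using only the smooth space and time terms, as in the following sketch. The four coefficients, the square 2000 km domain, and the evaluation time are invented for illustration; a real basin average would also require a basin mask and the actual terms of the fit.

```python
import numpy as np

def predict_reference(coef_smooth, x, y, t):
    """Evaluate the smooth terms only, i.e. with every indicator variable set to zero."""
    terms = np.stack([np.ones_like(x), x, y, t])
    return np.tensordot(coef_smooth, terms, axes=1)

# 40 km grid over a hypothetical square domain (km).
grid = np.arange(-1000.0, 1000.0 + 1.0, 40.0)
gx, gy = np.meshgrid(grid, grid)

# Invented coefficients: constant, x, y, and time terms.
coef_smooth = np.array([2.0, 2e-4, -1e-4, -0.05])

field = predict_reference(coef_smooth, gx, gy, np.full_like(gx, 6.0))
domain_mean = field.mean()
```

The resulting field is the thickness the reference system would be expected to report at each grid point, so any offset in the reference carries directly into the map and its mean.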

The mean ice thickness for the 2000–2012 period is shown in
Fig. 4. The map shows a maximum along the Canadian coast and a minimum in the vicinity of the New Siberian
Islands. The ITRP annual mean basin-average ice thickness has declined from
2.12 to 1.41 m (34 %) with a linear trend of

The regression analysis of RPW08 concentrated on submarine ice draft data
from 1975 to 2000 within the SCICEX box. They determined that the best fit
included terms up to fifth order in space and up to third order in time. The
fit showed a maximum in 1980 followed by a steep decline and then a leveling
off at the end of the period. Kwok and Rothrock (2009) used 5 years of
ICESat data to analyze the fall and winter changes in the ice draft for an
additional 5 years, to 2008; however, their regression procedure did not take
advantage of the spatial information in the ICESat data but simply
concatenated submarine and satellite records. They found the ICESat data
showed an additional modest thinning. In order to estimate the temporal
variation of ice thickness from 1975 to 2013 and to compare our results to
those of RPW08, the ITRP is extended back to 1975 in this region. The fit
procedure was performed using all of the data available from all sources
that fall within the box, 3017 observations in all. Figure 5 shows the third-order fit from
this study and the third-order curve from RPW08 that is computed for the
years 1975–2001. The ITRP fit includes indicator variables as before and
12 additional terms:

The difference in the trends between the observations and the model for the 1979–2012 period may possibly be due in part to a time-varying bias of the submarine observations. The early part of the record has much thicker ice in this region than the later part. The thicker ice has much larger variability in the ice draft and hence the bias related to the first-return correction (see also below) may be much larger for the earlier thicker ice. If this is the case, the early ice thickness is overestimated by the draft measurements and the magnitude of the ice thickness trend is smaller than estimated here.

Percival et al. (2008) find that the spatial autocorrelation of 1 km ice
draft measurements from submarines exhibits what is known as a long-memory
process, in which the spatial autocorrelation does not drop off as quickly as
for an autoregressive process at length scales up to 80 km. This means that
the sampling error drops off with the track length

As mentioned above, the submarine ice draft data have all been corrected
with a constant

The snow depth or snow water equivalent needs to be taken into account in
determining the ice thickness in all of the measurement systems. The error
in the estimated snow depth then contributes to the error of the thickness
estimate. However, the error in the snow depth is much less important for the
ULS observations of ice draft from submarines and moorings than for the
systems that measure the freeboard of the snow surface such as ICESat and
IceBridge. For the ULS, the snow correction for ice draft,

As we have alluded to above, sampling error can be a significant and serious source of uncertainty in comparing different ice thickness observations. All of the samples are from different times and/or places, so there are real differences in the nature of the ice sampled by the different measurements. The method used here depends on obtaining a large number of observations from a broad range of ice conditions so that comparisons in the mean can be made while accounting for large-scale variations in the mean ice thickness. The error in the fit includes random measurement errors, systematic measurement errors, sampling errors, and errors related to the inadequacy of the ITRP expression to fully represent the thickness variability.

One way to address the robustness of the results is to randomly withhold
some of the data and repeat the fits to see if the coefficients change
significantly. A set of 100 fits was computed for the entire Arctic Basin,
2000–2012, for each of which only half of the data, randomly selected for
each system, was used. The mean of the resulting indicator coefficients is
very similar to that found using all of the data and the variability of the
coefficients from this ensemble is comparable to the standard error,

Coefficients of the ITRP indicator variables for fits that leave one
data source out at a time for the Arctic Basin, 2000–2012. The coefficients
for each source are grouped together. Grey bars show the coefficients for a fit
that includes all of the observations, and bars in other colors indicate which
source has been left out as shown by the colors of the diagonal labels
(same order as the bars). The black lines give the 1

The importance of the individual data sources for computing the bias coefficients can be explored by repeating the analysis while leaving out each of the sources in turn. Do the bias coefficients change significantly? Figure 6 shows a bar chart of the indicator coefficients when just one data source is left out. The coefficients for most of the sources are quite similar for all of the ITRP fits. The largest variability is seen for the coefficients for IOS-EBS, which is not surprising given the isolated location of these measurements. The IOS-EBS coefficient is particularly sensitive to the exclusion of the BGEP or submarine data. There is also a fair amount of variability for the IceBridge coefficients, but in all cases the coefficients are still large. However, if both ICESat data sets are excluded and the submarines are used as a reference, we find very large changes in the relative magnitudes of all of the remaining coefficients (not shown). This indicates the great importance of the satellite data in establishing the spatial structure of the ice thickness fields when performing broad analyses of observing system differences.
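A leave-one-source-out experiment of this kind can be sketched as follows. The fit here is deliberately reduced to a constant, a linear time term, and the indicator columns, and the four systems "A" to "D" with their offsets are synthetic; the sketch also never drops the reference system, which is a simplification made here rather than a description of the procedure used in the study.

```python
import numpy as np

def fit_offsets(t, h, system, reference):
    """Least-squares offsets of each system relative to the reference."""
    systems = [s for s in np.unique(system) if s != reference]
    cols = [np.ones_like(t), t] + [(system == s).astype(float) for s in systems]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), h, rcond=None)
    return dict(zip(systems, coef[2:]))

def leave_one_out(t, h, system, reference):
    """Refit once per non-reference system, each time withholding that system."""
    results = {}
    for left_out in np.unique(system):
        if left_out == reference:
            continue  # simplification: the reference is always retained
        keep = system != left_out
        results[left_out] = fit_offsets(t[keep], h[keep], system[keep], reference)
    return results

# Synthetic demonstration (all numbers invented).
rng = np.random.default_rng(1)
n = 900
t = rng.uniform(0, 12, n)
system = rng.choice(np.array(["A", "B", "C", "D"]), n)
true_offset = {"A": 0.0, "B": 0.4, "C": -0.2, "D": 0.1}
h = (2.0 - 0.06 * t
     + np.array([true_offset[s] for s in system])
     + rng.normal(0, 0.1, n))

results = leave_one_out(t, h, system, reference="A")
```

Each entry of `results` holds the offsets obtained without one source, so a coefficient that moves substantially between entries points to a dependence on the withheld source.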

The comparisons between data sets depend very much on the nature of the samples available for each. If they are far removed from each other in space or time, the true variability of the ice thickness may contaminate the difference estimates. For example, a bias between the observations could be partially resolved by the regression procedure with a spatial term if there is no spatial overlap. In addition, the differences between measurement systems may not be constant because the source of the bias, for example snow thickness or small-scale sea ice variability, is not constant. One way of addressing these uncertainties is to examine subsets of the data to see if differences observed between the systems are more or less robust. We look at five different regions, all for the period 2000–2012: (1) the entire Arctic Basin using all measurement systems (the fit mentioned above), (2) the so-called SCICEX box in a broad region of the central basin that includes all submarine observations, (3) a 500 km radius circle centered on the BGEP moorings in the Beaufort Sea, (4) a 500 km circle centered on the North Pole, where a variety of observations are concentrated, and (5) a 300 km circle in the Lincoln Sea to evaluate Air-EM and IceBridge observations. Table 3 lists the summary information for each fit and Fig. 7 shows their locations. The coefficients of the indicator variables provide an estimate of the mean difference between each set of observations and the reference set in the sense that the RMS error of the fit is minimized if this difference is accounted for. Table 4 lists the values of the indicator coefficients for each fit and the RMS error of the fit for each observation source. Figure 7 shows the relative magnitudes of the coefficients for easy intercomparison of the bias terms determined for the different regions.
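Restricting the observations to a circular subregion can be sketched with a great-circle distance test. The haversine formula, the spherical Earth radius of 6371 km, and the four sample positions below are assumptions made for illustration; the study does not state how the circles were implemented.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean spherical radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km; inputs in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def within_circle(lat, lon, center_lat, center_lon, radius_km):
    """Boolean mask of observations inside a circle of the given radius."""
    return great_circle_km(lat, lon, center_lat, center_lon) <= radius_km

# Hypothetical observation positions (degrees).
lat = np.array([89.0, 86.0, 85.0, 75.0])
lon = np.array([0.0, 120.0, -45.0, -150.0])

# 500 km circle centered on the North Pole.
mask = within_circle(lat, lon, 90.0, 0.0, 500.0)
```

The mask can then be applied to the observation arrays before the fit is repeated for the subregion.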

The region, time period, number of observations used, number of terms, multiple regression coefficient, and RMS error (m) for each ITRP fit.

Number of observations, the indicator coefficients and their

Data from US submarines are available mostly from a data release area
defined by the US Navy (RPW08), the so-called “SCICEX box” (taken from the
project name Scientific Ice Expeditions). Of the 34 submarine cruises
available since 1975, there are only three cruises after 2000. However, the
box is a convenient way to restrict the geographic extent of the data
considered to a broad region in the central basin and to also compare our
results to those of RPW08. For the 2000–2012 period the submarine data are
still in good agreement with the reference, 0.14

In the Beaufort Sea the four BGEP moorings provide abundant data for the
entire annual cycle, and this is a good location to further assess the mean
differences between the data sets while restricting the amount of spatial
variability that is encountered. Within a 500 km circle of the center of the
mooring array there are Air-EM and IceBridge observations as well as the
satellite-based estimates. Compared to the reference, ICESat-J estimates are
0.54

Abundant observations from submarines, IceBridge, Air-EM, and ICESat are available in the vicinity of the North Pole. ICESat-G has no observations closer than 400 km because of the nadir viewing of the satellite lidar while the ICESat-J data set has estimates within this circle based on interpolation from adjacent data points. A 500 km circle centered on the pole includes observations from both data sets. Note that data from a mooring at the pole, part of the North Pole Environmental Observatory, are still being reprocessed (Moritz, personal communication) and are not included. Within this circle 508 observations are used for the fit. In this region the IceBridge estimates are 1.13 m thicker than the submarine estimates and 0.59 m thicker than the Air-EM estimates. ICESat-J estimates are 0.28 m thicker than the ICESat-G estimates. The coefficients from this fit are in general consistent with those for the entire basin (Fig. 7b and Table 4).

Is the large thickness bias in the IceBridge observations seen in the
previous analyses robust? IceBridge observations have a coefficient larger
than that of any of the other measurement systems in each of the fits except
for the Beaufort Sea, where it is smaller than the Air-EM coefficient.
Perhaps the IceBridge data are not well represented in the regression
equation because they are concentrated in thick ice near the Canadian coast. We
can partially address the IceBridge bias by examining only IceBridge and
Air-EM measurements in a limited region in the Lincoln Sea, where there are
50 Air-EM and 76 IceBridge measurements within 100 km and one month of each
other during the springs of 2009, 2011, and 2012. The ITRP shows that for
this sample the IceBridge data are 0.75

There is no gold standard for the estimation of the mean thickness of sea ice. All of the existing measurement techniques have one or more large sources of uncertainty. In situ measurements from the surface cannot sample the full thickness distribution. The submarine ULS measurements depend on the first-return echo to determine the ice draft, which is a potential source of unknown bias that may be a function of the bottom roughness. The mooring ULS measurements may also be subject to this same source of error. Both have potential errors in determining the open water level and accounting for the correct snow water equivalent. The satellite and airborne lidar observations depend on reliable detection of the surface height of nearby leads to accurately determine the height of the ocean surface and hence the total freeboard. The Air-EM measurements require an independent estimate of the snow depth, as do the satellite lidar measurements. All of the measurements struggle with obtaining an accurate mean value when the thickness is highly variable within the sensor footprint due to ridging. Finally, none of the measurements have been verified against other observations over regions that encompass the full ice thickness distribution of the area.

This study has determined some broad measures of the relative bias of the different systems. The ITRP method is dependent on having a large number of independent observations from each system so that a function can be fit to the thickness observations to account for the large-scale variability of the ice thickness. In addition to the nonlinear space and time variables, a bias term is included for each system that can contribute to the minimization of the error of the fit by adding or subtracting a constant value to all observations from a given system. This bias term can only be interpreted in a relative sense: how much thicker or thinner, in the mean, is one system compared to another? While we have typically used the ICESat-G system as a reference here, that does not mean it is a priori considered to be more accurate than the others. Indeed, nothing in the study speaks to the absolute accuracy of the measurements.
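The relative nature of the bias terms can be illustrated by re-expressing a set of coefficients against a different reference, which simply subtracts the new reference's coefficient from all of them. The sketch below uses the basin-wide ICESat-J and IceBridge coefficients quoted in the text (0.42 and 0.59 m relative to ICESat-G); the function name is hypothetical. The shift is exact only when the refit uses the same data and the same terms, including a constant term, and should otherwise be read as an approximation.

```python
def rereference(coefficients, new_reference):
    """Express indicator coefficients relative to a different reference system."""
    shift = coefficients[new_reference]
    return {name: value - shift for name, value in coefficients.items()}

# Coefficients relative to ICESat-G (m), as quoted in the text.
coeffs = {"ICESat-G": 0.0, "ICESat-J": 0.42, "IceBridge": 0.59}

relative_to_j = rereference(coeffs, "ICESat-J")
```

The differences between any two systems are unchanged by the choice of reference, which is why the coefficients say nothing about absolute accuracy.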

When ordered by relative magnitude of the coefficient of each system (Table 2), we see that the coefficient for IOS-EBS has the largest negative value relative to ICESat-G. However, because these measurements are in a small corner of the southeastern Beaufort Sea, we have little confidence that this result is a good indicator of the bias of the ULS measurements in this location compared to the other measurements. Of the others, ICESat-G, submarines, IOS-CHK, BGEP, and Air-EM are all in broad agreement and in the mean are within 0.11 m of each other. However, we saw that the submarine bias coefficient is sensitive to the inclusion of the 2000 cruise. ICESat-J is 0.42 m thicker than ICESat-G but in good agreement with the submarine measurements in 2005. Finally, the IceBridge measurements average 0.59 m thicker than ICESat-G measurements.

It is beyond the scope of this study to determine why some of the observation systems appear to have biases, sometimes very significant, compared to the others. Possible sources of these discrepancies are the interpretation of ULS echo data, assumptions about snow depth or snow water equivalent, and methods of determination of the ocean water level for the lidars. While it is possible that there are systematic errors in determining the measurement differences introduced by the different times and locations of the observations, so-called sampling errors, all of the systems, with the possible exception of IOS-EBS and the submarines, have sufficient observations spread over large spatial or temporal ranges to make this unlikely. Figure 7 shows the range of the coefficients determined with various spatial subsets of the data. For the entire basin, the experiment in which only a random half of the data from each system was used in a large set of fits gives very similar results to that when using the full data set. The leave-one-out experiment showed that the satellite measurements had a greater impact on the bias coefficients than the other systems. While our results provide an estimate of the relative biases of the measurement systems, they also point to the fact that more research to understand, characterize, and correct these errors is clearly required before we can homogenize the observational ice thickness record.

The ITRP annual mean basin-average ice thickness over the period 2000–2012
has declined 34 %, a trend of

This study was supported by the NASA Cryospheric Sciences Program under grant number NNX11AF45G and by the National Science Foundation Division of Polar Programs under grant number 1023283. We thank all of the data providers for sharing ice thickness observations, often obtained under difficult conditions. We thank Harry Stern and two reviewers for a careful review of the manuscript. Edited by: C. Haas