Using Records from Submarine, Aircraft and Satellites to Evaluate Climate Model Simulations of Arctic Sea Ice Thickness

Arctic sea ice thickness distributions from models participating in the World Climate Research Programme Coupled Model Intercomparison Project Phase 5 (CMIP5) are evaluated against observations from submarines, aircraft and satellites. While it is encouraging that the mean thickness distributions from the models are in general agreement with observations, the spatial patterns of sea ice thickness are poorly represented in most models. The poor spatial representation of thickness patterns is associated with a failure of models to represent details of the mean atmospheric circulation pattern that governs the transport and spatial distribution of sea ice. The climate models as a whole also tend to underestimate the rate of ice volume loss from 1979 to 2013, though the multimodel ensemble mean trend remains within the uncertainty of that from the Pan-Arctic Ice Ocean Model-ing and Assimilation System. Although large uncertainties in observational products complicate model evaluations, these results raise concerns regarding the ability of CMIP5 models to realistically represent the processes driving the decline of Arctic sea ice and to project the timing of when a seasonally ice-free Arctic may become a reality.


Introduction
The last four decades have seen a remarkable decline in the spatial extent of Arctic sea ice at the end of the melt season.Based on sea ice concentrations from the National Snow and Ice Data Center (NSIDC) Sea Ice Index (Fetterer et al., 2002), the linear trend for September, as calculated over the 1979-2013 period, stands at −14.0 % dec −1 , or −895 300 km 2 dec −1 .The downward trend has been linked to a combination of natural climate variability and warming that is a response to increasing concentrations of atmospheric greenhouse gases (e.g., Notz and Marotzke, 2012;Stroeve et al., 2012a).The extent recorded for September 2012 (the record low in the satellite era) was only 50 % of values recorded in the late 1970s to early 1980s.Volume losses are even greater, showing an 80 % decline in between September 1979 and 2012 according to the Pan-Arctic Ice Ocean Assimilation System (PIOMAS).While September ice extent rebounded in 2013, partly a result of anomalously cool summer conditions (e.g., Stroeve et al., 2014), it was still the 6th lowest in the satellite record.
Coupled global climate models (GCMs) consistently project that, if greenhouse gas concentrations continue to rise, the eventual outcome will be a complete loss of the multiyear ice cover, that is, sea ice will become a seasonal feature of the Arctic Ocean (e.g., Stroeve et al., 2007Stroeve et al., , 2012b)), presenting both challenges and opportunities to Arctic residents, government agencies and industry.While GCMs can provide useful projections of when a seasonally ice-free Arctic Ocean may become a reality, confidence in these projections depends on their ability to reproduce features of the present-day climate.Stroeve et al. (2012b) found that models participating in the World Climate Research Programme Coupled Model Intercomparison Project Phase 5 (CMIP5) are more consistent with observations than those from the previous CMIP3 effort, with 67 % of the models (or 16 out of 24) having a 1953-1995 mean September ice extent falling within the minimum and maximum bounds of observed values.However, historical trends from 85 % of the model ensemble members examined remain smaller than observed, and the spread in simulated extent between different models remains large.
Realistically simulating the past and future evolution of the Arctic's floating sea ice cover is one of the most challenging facets of climate modeling.Simulating the sea ice thickness spatial distribution has emerged as a key issue.While it follows that climate models with an overly thick initial (early 21st-century) ice cover will tend to lose their summer ice later than models with initially thinner ice given the same climate forcing (e.g., Holland et al., 2010), the ice thickness distribution strongly determines surface heat fluxes, impacting on both the ice mass budget and ice loss rate, which is in turn a major driver of Arctic amplification -the outsized rise in lower-tropospheric air temperatures over the Arctic Ocean compared to lower latitudes (Serreze et al., 2009).
A major difficulty in evaluating thickness distributions in GCMs is the lack of consistent observations spanning a sufficiently long time period.It was not until 2003 that temporally limited (autumn and spring) near-Arctic-wide estimates of thickness became available from NASA's Ice, Cloud, and land Elevation Satellite (ICESat) Geoscience Laser Altimeter System (GLAS).Prior to ICESat, information was largely limited to data from upward-looking sonars on board British and US submarines collected during the 1980s and 1990s and mainly covering the region near the pole as well as data from several moorings providing time series in fixed locations (Lindsay, 2010).The first European remote-sensing satellite (ERS-1) included a radar altimeter that provided fields of estimated sea ice thickness up to latitude 81.5 • N, but only for the 1993 to 2001 period (Laxon et al., 2003).Since the failure of ICESat in 2009, additional sea ice thickness measurements have become available from airborne flights as part of NASA's Operation IceBridge program.Arctic-wide coverage has since resumed, starting in 2010 from the radar altimeter on board the European Space Agency's CryoSat-2.Together, these data provide a valuable source of information for the validation of spatial patterns of sea ice thickness.In addition, satellite and in situ observations have been used to provide validation of sea ice reanalysis systems such as PIOMAS, which in turn may provide a consistent record of thickness and volume for comparison with climate model long-term trends (Schweiger et al., 2011).
This paper examines biases in contemporary Arctic sea ice thickness and ice volume from the CMIP5 models making use of all of these data sets.Model thicknesses are evaluated for the whole of the Arctic Ocean and on a regional basis depending on data coverage.Since radar measurements are influenced by snowmelt and IceBridge data are only available in March, we focus on spring (e.g., March) estimates of ice thickness.Modeled ice volume spanning the 1979-2013 period is further evaluated against volume estimates simulated from PIOMAS (Zhang and Rothrock, 2003) for the months of March and September.

Evaluation framework
We evaluate models using three criteria: (1) how well they replicate the statistical distribution of observed mean sea ice thickness fields based on aggregating all available data across the Arctic for each observational data set; (2) how well they replicate the observed spatial pattern of sea ice thickness; and (3) how well they replicate the best estimate of trends in sea ice volume.The first two evaluations make use of the thickness records from in situ moorings and submarine, aircraftand satellite-borne instruments introduced in the previous section.This record is not sufficiently homogenous to evaluate thickness or volume trends, which is why we also make use of the PIOMAS record.PIOMAS assimilates sea ice concentration, sea surface temperature and ice velocity.While PIOMAS is a model and sensitive to the atmospheric reanalysis used, estimates of thickness compare well with in situ observations and submarine, airborne and satellites measurements (Zhang and Rothrock, 2003;Schweiger et al., 2011;Lindsay et al., 2012;Laxon et al., 2013).
A further difficulty in our model evaluation, amplified by the piecemeal nature of the ice thickness record, is that individual years in CMIP5 model time do not correspond with the same years in the observational record.Imprints of intrinsic natural climate variability in the observational record (such as that associated with the phase of the North Atlantic Oscillation) will likely be out of phase with natural variability in the model simulations.Thus, discrepancies in modeled ice thickness can either be due to model biases or natural climate variability.Ideally, climatologies of modeled sea ice thickness need to be compared with observed climatologies that are of similar length and long enough (e.g., 30 years) to average out most of the natural variability.
Monthly mean fields of sea ice thickness for 92 ensemble members of 33 climate models from the CMIP5 archive were downloaded from the Earth System Grid of the Program for Climate Model Diagnosis and Intercomparison (PCMDI) data portal (http://cmip-pcmdi.llnl.gov/cmip5/).The archive consists of both atmosphere-ocean global climate models (AOGCMs) and earth system models (ESMs), the latter incorporating interactive biogeochemical cycles into AOGCMs.Both the historical (1850-2005) and future Representative Concentration Pathway (RCP) 4.5 (2006-2100) emission scenarios were processed, and the same number of ensembles for both emission scenarios were used.RCP4.5 is a medium-mitigation scenario that stabilizes CO 2 at ∼ 650 ppm at the end of the century (e.g., Thompson et al., 2011), corresponding to a radiative forcing of 4.5 Wm −2 by 2100.It is perhaps a conservative scenario given current emission rates.A listing of the models used can be found in Table 1.
Monthly mean thickness fields for the 1981 to 2010 period were calculated for every ensemble member.For models having more than one ensemble member, mean thickness fields from each ensemble for a given model were averaged to form a single ensemble average.Spatial resolutions vary considerably from high-resolution ocean modeling grids to coarse grids with a roughly 1 • x 1 • spacing.To enable com-parisons between models and the observations, mean thickness fields were regridded to the 100 km Equal Area Scalable Earth (EASE) grid (Brodzik and Knowles, 2002) using a drop-in-the-bucket approach.The 100 km resolution corresponds to the resolution of the coarser model grids.
To compare aggregate mean thickness (evaluation criterion 1), frequency distributions were derived for each model using the regridded mean fields.Separate distributions were produced for each observed thickness field so that model thicknesses could be extracted corresponding to the coverage of each of the observed thickness data sets.For example, only grid cells with thicknesses from both IceBridge and the model were used when evaluating how well the models represent the aggregate thickness distribution during the IceBridge time period.Regridded model fields were also used to evaluate spatial thickness patterns (criterion 2).To ensure that model ensemble members can be used for validation of spatial patterns, it is important to first assess the natural variability of the sea ice thickness spatial patterns within the models.For models with five or more ensemble members, we evaluated the variability in spatial patterns and Arctic-wide mean thickness from 1981 to 2010 (Fig. 1).As expected, higher variability is the rule over the North Atlantic near the sea ice margin.Three of the models (CCSM4, EC-EARTH and HadCM3) stand out because of high local variability, such as in the Beaufort Sea sector in CCSM4.Two of these models (CCSM4 and EC-EARTH) incorporate an ice thickness distribution (ITD) framework (Table 1).It could be that models that resolve the statistical sub-grid-scale distribution of ice thickness produce grid cell thicknesses more strongly influenced by natural variability than models without ITD.However, for the models evaluated, variability is less than 8 % of the mean over the Arctic Ocean as a whole.In addition, spatial pattern correlations between individual ensembles within a model are above 0.9 (and mostly above 0.98) (not shown).This suggests that the fragmented observational record offers an opportunity to compare characteristics of the thickness patterns, which are less impacted by natural variability.
To evaluate criterion 3 (trends in ice volume using PI-OMAS records), March ice volume was calculated for each model ensemble member corresponding to the domain of the PIOMAS estimates.Unlike thickness, ice volume was calculated on the native model grid.Ice thickness in the CMIP5 archive is given as the grid cell mean including ice-free portions of the grid cell.Grid cell ice volume is simply the product of the mean grid cell thickness and grid cell area.Grid cell volumes were summed for the PIOMAS domain to give a time series of monthly mean ice volume.

Data: observations
As previously introduced, the observed record of sea ice thickness is based on a combination of in situ, submarine, aircraft and satellite data.Although records are available from 1975 through the present, no one data source is spatially or temporally continuous over the whole of this period, making the construction of a homogenous time series from observations alone impossible.To provide a long-term picture, estimates of ice thickness from different sources must be combined.We provide gridded fields at two resolutions on the EASE grid (25 and 100 km) that facilitate comparisons with both PIOMAS (distributed at 25 km spatial resolution) and the CMIP5 mean thickness fields (100 km resolution).
Unclassified sonar data from US Navy and UK Royal Navy submarine missions provide the earliest estimates, starting in 1975 and ending in 1993.Ice thickness estimates from submarines and other platforms have been collated and processed into a consistent format by R. Lindsay at the University of Washington Polar Science Center to produce the Unified Sea Ice Thickness Climate Data Record (CDR) (Lindsay, 2010).The most recent version of the submarine data was obtained from the University of Washington, Polar Science Center.An archive version of the CDR, which is updated annually, is also hosted by NSIDC (Lindsay, 2013).Submarine sonars provide measurements of ice draft (the depth of ice below sea level).Rothrock and Wenshahan (2007) document the conversion of ice draft into thickness.Briefly, ice thickness is derived from draft estimates using Archimedes' principle with assumed ice, snow and water densities, and the depth of snow on the ice.In most cases, snow depth is unknown and the Warren snow climatology (Warren et al., 1998) is used.Rothrock and Wenshahan (2007) estimate an average thickness bias from the sonar data compared to direct observations of 0.29 m.We subtracted this bias from the submarine data set prior to comparison with the CMIP5 model output.Following Schweiger et al. ( 2011), we only use data from US cruises because the processing history for UK cruise data is uncertain.Submarine cruises are designated as spring or summer.We use spring cruises, defined as occurring between March and June.Most cruises provide data for the central Arctic Ocean, away from the shallow continental shelves.
Upward-looking sonar (ULS) instruments on bottomanchored moorings in the eastern Beaufort Sea, Beaufort Gyre and Chukchi Sea provide further estimates of ice thickness.Moorings in the eastern Beaufort Sea and Chukchi Sea are maintained by the Institute of Ocean Sciences (Melling and Riedel, 2008).Data records start in 1990 and end in 2005.Moorings in the Beaufort Gyre region are maintained and data are made available by the Beaufort Gyre Exploration Project based at the Woods Hole Oceanographic Institution (http://www.whoi.edu/beaufortgyre).ULS on moorings also measure ice draft.The most recent versions of these in situ ice draft estimates were also obtained from the Polar Science Center.Thickness was calculated from in situ ice drafts using the same method as applied to the submarine data.
Unlike submarine sonar, satellite and aircraft radar and laser altimeters measure the height of bare ice, snow-covered ice and snow surfaces above the ocean surface, depending on instrument characteristics and surface conditions.By identifying leads between the ice floes, the freeboard (the height of the snow or ice surfaces above sea level) can be derived.Ice freeboard is converted to ice thickness using Archimedes' principle in a similar way as for the conversion of submarine ice draft to ice thickness, using estimates or assumptions of snow and ice density and snow depth.Laxon et al. (2003) retrieved ice thickness from the 13.8 GHz radar altimeter on board the ERS-1 satellite and assessed changes in Arctic sea ice thickness from 1993 to 2001 up to latitude 81.5 • N. The winter sea ice area covered by ERS-1 is about 3.08 ×10 6 km 2 and includes the Beaufort, Chukchi, East Siberian, Kara, Laptev, Barents and Greenland seas.ERS-1-derived ice thickness is provided as a sin-gle mean field averaged from 1993 to 2001 for the month of March on a 0.1 • latitude by 0.5 • longitude grid.
ICESat, with its laser altimeter, provided the first thickness data set to cover almost the entire Arctic Ocean.Thicknesses are derived based on the methodology described by Kwok et al. (2009).The ICESat archive provides 5 years (2004)(2005)(2006)(2007)(2008)(2009) of gridded fields at a 25 km resolution.Estimates of thickness extend up to 86 • N. Kwok et al. (2009) estimate an uncertainty of 0.5 m for each 25 km grid cell.Operation IceBridge is an ongoing airborne laser altimeter mission aimed at bridging the gap between ICESat and the follow-on ICESat-2 scheduled to launch in 2017.IceBridge provides individual tracks of ice thickness, generally confined to the western Arctic Ocean during March and April from 2009 to present (Kurtz et al., 2012).Coverage is sparse in the early years of the program but subsequently improves.Each IceBridge track gives ice thickness estimates at 40 m spacing.Thickness retrievals are detailed by Kurtz et al. (2013).Finally, CryoSat-2 thickness estimates are derived using a satellite radar altimeter with coverage extending up to 88 • N. We use the preliminary thickness product produced by the Alfred Wegner Institute (www.meereisportal.de/cryosat).Data are available for 2011 through 2013 on the EASE-2 25 km grid (Brodzik et al., 2012).
Ice thickness is also measured using a combination of airborne electromagnetic (EM) induction instruments and laser altimeter (Haas et al., 2009).The instrument package is flown above the sea ice surface by helicopter.The EM instrument is used to detect the distance between the instrument and the ice-water interface.The laser altimeter provides the height of the snow or ice surface.The difference between the two measurements provides the combined snow and ice thickness.Ice thickness can be obtained using information about snow thickness and density.EM-derived ice thicknesses are available for the central and western Arctic Ocean between 2002 and 2012.These data are also included in the Unified Sea Ice Thickness CDR and were obtained from the Polar Science Center.
All satellite-derived ice thickness fields were regridded as needed from their original gridded format to 25 km and 100 km EASE grids using a drop-in-the-bucket averaging.This provides a mean 1993-2001 thickness field from ERS-1 as well as a yearly field for each of the 5 ICESat years (spring 2004 to 2009) and each of the 3 CryoSat years (2011 to 2013).Period-of-record mean fields from ICESat and CryoSat were calculated additionally, by first averaging on their native grids and then regridding to 25 and 100 km resolution.
The in situ mooring data, airborne EM, IceBridge and submarine sonar track data needed to be handled differently.For comparison with CMIP5, all observed thickness estimates within 70 km of a 100 km EASE grid box center were averaged to give a grid cell mean thickness.To provide the best coverage for a comparison with modeled thickness distributions, all thickness estimates for all years were used to calculate a single average field for the period of record.In addition, grids of IceBridge and submarine data at 25 km spatial resolution were produced for individual years by combining multiple flight lines and cruise tracks in a single year.Since the time periods of coverage vary, composites of ice thickness from IceBridge and submarine data are based on a range of times during the observational intervals and do not exactly correspond to monthly averages.This will introduce a temporal sampling error when making comparisons between the observations from these data sets and the monthly CMIP5 model and PIOMAS output.
Along with temporal sampling problems, the various thickness records have a range of biases due to differences in sensor types and retrieval approaches.Radar and laser technologies use different wavelengths and footprints, and different techniques have been used to estimate snow depth and snow and ice density, which in turn impacts ice thickness retrievals.This creates additional challenges as differences in snow and ice density and snow depth values used can lead to large biases in ice thickness (e.g., Zygmuntowska et al., 2014).For example, for multiyear ice, Kwok et al. (2009) use a density of 925 kg m −3 while Laxon et al. (2013) use 882 kg m −3 .According to Kurtz et al. (2014), this could lead to a thickness difference of 1.1 m for a typical multiyear ice floe with a thickness of 60 cm snow-ice freeboard with a 35 cm deep snow cover.Similarly, given an ICESat freeboard of 0.325 m with an estimated 0.25 m of snow (density 300 kg m −3 ) atop the ice (density of 900 kg m −3 ), we would compute a sea ice thickness of 1.5 m.Yet, if there had been only 0.15 m of snow, the ice would be 2.2 m thick, a change of 0.70 m or 46 % of the original estimate.
At present, there is no long-term sea ice thickness data set that applies these parameters in a consistent manner regardless of which instrument is used.It is nevertheless encouraging that all of the records show similar spatial patterns of ice thickness (Fig. 2: left column), which, while boosting confidence in the data, also demonstrates persistence of the general spatial pattern of Arctic sea ice thickness from 1979 to the present.Mean thicknesses are greater along the northern coasts of the Canadian Arctic Archipelago and Greenland, where there is an onshore component of ice motion resulting in strong ridging.Mean thicknesses are lower on the Eurasian side of the Arctic Ocean, where there is a persistent offshore ice motion and ice divergence, leading to new ice growth in open-water areas.When viewed for the Arctic as a whole, the combined records show a decline through time in ice thickness, although this must be tempered by differences in physical assumptions used to retrieve thickness (Zygmuntowska et al., 2014).

PIOMAS ice thickness patterns and volume
Since there is no long-term consistent ice thickness data set with which to evaluate ice volume trends, we assess CMIP5 volume trends from 1979 to 2013 against estimates from  (Kwok et al., 2009) and with in situ and airborne EM observations from the sea ice thickness CDR.They established uncertainty estimates for PIOMAS ice volume and trends and concluded that PIOMAS provides useful estimates of changes in ice volume.Comparisons were made for all months of the year.Laxon et al. (2013) compared concatenated time series of ICESat and CryoSat data and found that derived trends agree within the established uncertainty limits from PIOMAS, further arguing that PIOMAS is useful for climate model evaluation.
In this paper, our focus is on the representation of March ice thickness and volume.It is, therefore, useful to assess PI-OMAS for this period in particular.We include data from ERS-1 and IceBridge, which have not been used in previous comparison studies.To this end, the middle column of Fig. 2 (center column) shows the PIOMAS thickness estimates corresponding to the five observational thickness data sets used in this study.The right-hand column of Fig. 2 shows corresponding scatterplots between PIOMAS and the observations for each individual year of the observations (plotted as different colors for each year of data, except for the in situ CDR, which includes 29 years of data, and ERS-1, which was provided as mean field over the entire time period).The CDR data in the top scatterplot includes thicknesses from in situ moorings, United States submarines and airborne EM.Statistics are summarized in Table 2.
The observed thickness patterns and magnitudes generally compare well with those simulated by PIOMAS, providing further confidence that PIOMAS can be used to assess the CMIP5 volume trends during winter.However, the scatterplots reveal a general negative (too thin) thickness bias in PIOMAS for higher thickness values (found near the Canadian Archipelago and north of Greenland).The reverse tends to be true for areas of thin ice.In addition, PIOMAS tends to have a tongue of thicker ice (∼ 2.5 m) that stretches out   2011) estimated an upper bound for the uncertainty of decadal PIOMAS trends of 1 × 10 3 km 3 dec −1 .Given the large observed volume trend of 2.8 × 10 3 km 3 dec −1 in March, PIOMAS is a suitable tool for assessing long-term trends in CMIP5 models.Daily ice volume estimates at 25 km spatial resolution from PIOMAS were averaged to create monthly means of ice volume over the 1979-2013 record to compare with the CMIP5 output.

Ice thickness
We first compare observed and CMIP5 mean sea ice thickness fields averaged over the areas of coverage corresponding to each of the different remotely sensed data sets (Fig. 3).The median spring thickness from each data set is shown as a solid red line, together with the 10th and 90th percentiles (green lines) and the interquartile range (gray shading).
Ice thicknesses from the 33 individual CMIP5 models are presented as box and whisker plots based on data for model years 1981 to 2010, where the boxes represent the interquartile range in thickness (25th to 75th percentiles), the whiskers the 10th and 90th percentiles, and the horizontal bars and asterisks within each box define the median and mean, respectively.As mentioned earlier, the 1981-2010 averaging time period for CMIP5 is somewhat arbitrary as we cannot expect the natural variability in the models to be in phase with observed natural variability.This comparison therefore only reflects how well the long-term mean thickness fields in the models compare to the different observational data sets, such that if the spread of the observations for a given platform/instrument falls within the spread for a given model, we conclude that the model captures the thickness.If the spread does not overlap, then there is a bias.We may additionally expect that the trend in thickness should be captured in the distributions of model thickness if one exists in those models.
In general, the thickness distributions from the models overlap with those from each remotely sensed data set.There are exceptions.Several models have negative biases in comparison to the in situ, ERS-1 and IceBridge data sets, with means below the 10th percentile of the observations.A negative bias with respect to the in situ and ERS-1 data is not surprising as these observations are based on a thicker ice regime than the more recent two decades.However, some models that show a negative bias compared to the in situ and ERS-1 data also show a negative bias with respect to the Ice-Bridge data (e.g., bcc-csm1-1, CanCM4, CanESM2, CNRM-CM5, the GFDL models, MIROC ESM, MIROC-ESM-CHEM, MIROC4h, the MPI models and MRI-CGCM3), suggesting that the models are underestimating in regions of thick ice north of Greenland and the Canadian Archipelago sampled by the IceBridge flights.
The CMIP5 models show the best agreement with the ICESat and CryoSat observations.The ICESat and CryoSat statistics integrate more regions of thin ice along with the thick-ice regions north of Greenland and the Canadian Archipelago, resulting in overall smaller mean thickness values compared to the other data sets.The coverage is also from a time period of significant ice thinning throughout most of the Arctic Ocean (e.g., Kwok and Rothrock, 2009;Kwok et al., 2009;Laxon et al., 2013).In comparison with ICESat, all but two models (CESM1-WACCM and FGOALS-g2) have a mean thickness within the 10th and 90th percentiles of the observed value.Mean thicknesses during the CryoSat period are slightly smaller than for ICESat, resulting in eight models (CESM-CAM5, CESM1-WACCM, CSIRO-MK3-6-0, EC-EARTH, This good agreement with PIOMAS must be tempered by recognition of the pronounced inter-model spread in ice thickness aggregated across the Arctic Ocean and large differences in the spatial patterns of thickness (Fig. 4).Few models capture the pattern of thin ice close to the Eurasian coast and several additionally fail to place the thickest ice along the Canadian Arctic Archipelago and northern coast of Greenland (i.e., both ACCESS models, bcc-csm1-1, CanCM4, CanESM2, CSIRO-Mk3-6-0, FIO-ESM, both GISS models, HadCM3, inmcm4, MIROC-ESM-CHEM).Instead, many models show a ridge of thick ice north of Greenland and across the Lomonosov Ridge towards the East Siberian Shelf, with thinner ice in the Beaufort-Chukchi and the Kara-Barents seas.As a whole, the models tend to overestimate ice thickness over the central Arctic Ocean and along the Eurasian coast and underestimate ice thickness along the North American coast and north of Greenland and the Canadian Archipelago.
An analysis of spatial pattern correlations and root-meansquare error (RMSE) of ice thickness between CMIP5 models and ICESat observations documents serious model shortcomings.Spatial pattern correlations are less than 0.4 for all but three models (CCSM4, MIROC5 and CGCM3) (Fig. 5 (left-hand side)) and RMSE values generally exceed 0.7 m (Fig. 5 (right-hand side)).These spatial pattern correlations are significantly smaller than those between ensembles from the same model, suggesting that the poor correlations cannot be explained by natural variability but rather a bias within the models.Interestingly, the spatial correlations in thickness between the CMIP5 models and PIOMAS are generally higher than those between the CMIP5 models and the ICESat data (not shown).The reason for this is that both PIOMAS and many of the CMIP5 models have a spurious tongue of fairly thick ice extending across the Arctic Ocean towards the Chukchi and East Siberian seas.Kwok (2011) previously attributed deficiencies in ice thickness fields in the CMIP3 models to their inability to simulate the observed pattern of sea level pressure and hence surface winds.For example, if a model fails to produce a wellstructured Beaufort Sea High (BSH) in the correct location north of Alaska, this will adversely affect the Beaufort Gyre ice drift and hence the thickness pattern.Models with overly thick ice offshore of Siberia suggest the presence of a strong anticyclonic drift that extends close to the coast, allowing ice to pile up on the upwind side.However, the presence of thick ice on the Siberian side could also be the result of a higher frequency of occurrence of a specific atmospheric circulation anomaly pattern.
We evaluated the annual mean sea level pressure fields and the associated surface geostrophic wind fields in the CMIP5 models (Fig. 6) against fields from four different atmospheric reanalyses.Note that correlations between the reanalyses themselves range between 0.91 and 0.99 (Table 3).In general, most models feature a closed BSH, though in some it is not well-defined (e.g., MPI-ESM-LR) or is shifted towards the pole (e.g., CanCM4, CSIRO-Mk3-6-0, MIROC-ESM) or towards the eastern Arctic (e.g., IPSL-CM5A-LR).Models that do not feature a closed BSH (e.g., bcc-csm1-1, CCSM4, CESM1-WACCM, FGOALS-g2, FIO-ESM, IPSL-CM5A-MR, MIROC-ESM-CHEM and NorESM1-M) generally also have poor spatial thickness pattern correlations and large RMSEs (Fig. 4).The exception is CCSM4.While CCSM4 shows good spatial pattern correlation in ice thickness and the lowest RMSE of all the models (computed with respect to ICESat), the mean sea level pressure pattern does not feature a closed BSH, and the mean flow fails to capture the Beaufort Gyre and the Transpolar Drift Stream.Thus, while part of the failure of models to capture the observed thickness distribution can be explained in terms of biases in the surface wind fields, this is not always the case.This points to additional issues such as near-surface vertical stability that affects the surface wind stress as well as sea ice rheology, ocean heat fluxes and the ice thickness itself as this affects ice mobility.

Ice volume
Recent studies suggest that because of thinning, sea ice volume is declining faster than ice extent (e.g., Schweiger et al., 2011).Ice volume is also a more important climate indicator than extent through its direct connection with the sea ice energy budget.The rates of ice volume loss for March and September calculated over the 1979-2013 period from PI-OMAS are −9.9 and −27.9 % dec −1 , respectively.
The CMIP5 multimodel ensemble mean March ice volume averaged over this period agrees well with PIOMAS, and remains within 1 standard deviation (1σ ) throughout the 1979-2013 time period (Fig. 7).When viewed as a group, this indicates that the models realistically capture the last 3 decades of changes in Arctic ice volume, assuming that PIOMAS provides a good representation of these changes.However, while we find good agreement between PIOMAS ice volume and the CMIP5 multimodel ensemble mean, ice volume varies substantially between different models.Average March ice volume ranges from around 18 000 km 3 (CanESM2) to 48 000 km 3 (CESM1-WACCM) (Fig. 7 dashed lines).Additionally, as noted earlier, few models correctly capture the observed spatial pattern of thickness.Given the wide range of CMIP5 model results, the close match of the ensemble average with the PIOMAS average is somewhat puzzling.We speculate that modeling groups participating in   the CMIP5 collection may each individually be working to construct and tune their models to match observed historical ice extent and thicknesses.If the effort or success by these groups is randomly distributed, then a close match of the ensemble mean volume and PIOMAS volume, which assimilates observed sea ice concentrations and is tuned to thickness observations, would be expected.
To evaluate CMIP5 ice volume further, volume trends were computed using linear least squares with a test statistic that combines the standard error of both the model and the observation and accounts for the effects of temporal autocorrelation.This approach, which follows Santer et al. (2008), was previously used by Stroeve et al. (2012a) to examine ice extent trends in both the CMIP3 and CMIP5 models and how those trends compared to the observed trend.As in Stroeve et al. (2012a), the null hypothesis is that the CMIP5 volume trends are consistent with those from PIOMAS.Ice volume trends during March from individual ensemble members range between −0.49 × 10 3 km 3 dec −1 (INMCM3) to −4.28 × 10 3 km 3 dec −1 (MIROC5) as assessed over the period of 1979 to 2013 (Table 4 and Fig. 8).The corresponding PIOMAS trend is shown in gray shading for one (dark gray) and two standard deviations (light gray).Note that the gray shading does not represent the uncertainty in the PIOMAS volume estimates, which Schweiger et al. (2011) estimate to  be 1 × 10 3 km 3 .Therefore, the uncertainty in PIOMAS could be larger than we show.
While all model trends are negative, 10 ensemble members have trends that are insignificantly different from 0 (i.e., 2σ of the trend overlaps with 0).Ignoring ensemble members with trends indistinguishable from 0, 36 of the remaining ensemble members have mean March volume trends that are slower and two that are faster (IPSL-CM5A-LR and MIROC5) than the 2σ uncertainty of the PIOMAS trend.Nevertheless, the majority of the ensemble member trends cannot be considered incompatible with PIOMAS.
Finally, several ensembles show pronounced interannual variability in ice volume, with periods of increasing volume not captured by PIOMAS (not shown).Interannual variability in the ensembles likely reflects variability in atmospheric forcing.Averaging together the individual ensemble means from each model yields a multimodel ensemble mean trend in March ice volume of −1.95 × 10 3 km 3 dec −1 (or −6.8 % dec −1 relative to the 1979-2013 mean).This is smaller than the PIOMAS rate of decline of −2.79 × 10 3 km 3 dec −1 (or −10.3 % dec −1 ) but remains within 2σ uncertainty of that value.
It is important to recognize that the difference in trends between PIOMAS and CMIP5 ensemble members can arise from systematic errors in the PIOMAS or CMIP5 models, from uncertainties in the atmospheric reanalysis or from the trend in the PIOMAS time series including significant contributions from natural climate variability.For example, Day et al. (2012) attribute about 0.5 to 3.1 % of the 1979 to 2010 September sea ice extent trend to changes in the Atlantic Meridional Overturning Circulation.The range of trends for individual models summarized in Table 4 indicates that natural variability may be a strong contributor to ice volume trends over the last 35 years.However, the models themselves seem to vary strongly in the amount of natural variability in their integrations.The CSIRO-Mk3-6-0 trends range from −3.19 to −0.67 × 10 3 km 3 dec −1 between its 10 ensemble members while HadCM3 features a substantially smaller range (−2.34 and −1.01 × 10 3 km 3 dec −1 ) for its 10 ensemble members.This makes the identification of model biases or the filtering of models based on how well they represent observed trends difficult.

Conclusions
Evaluating model skill is important given the large role that the model projections play in framing the debate on how to address global environmental change.While the CMIP5 models more accurately hindcast sea ice extent than the CMIP3 models (e.g., Stroeve et al., 2012a), trends from most models remain smaller than observed, leading to the concern that a seasonally ice-free Arctic state may become a reality sooner than suggested by such models.Here, we have evaluated sea ice thickness and volume from 33 CMIP5 models through comparisons with observed records of sea ice thickness and ice volume simulated by PIOMAS.While uncertainties regarding sea ice thickness are not as well-quantified as those regarding ice extent or ice area, we find that the CMIP5 models show a general thinning and reduction in ice volume, in agreement with observations.The CMIP5 ensemble mean ice volume trend over the 1979-2013 period is smaller but within the uncertainties of the PIOMAS values.Although the Arctic-wide ensemble mean ice volume and trend is strikingly similar to the PIOMAS sea ice volume and trend, there are large variations among models.Furthermore, while mean thickness and volume for the Arctic Ocean as a whole appear well represented by many of the models, spatial patterns of sea ice thickness are poorly represented.Many models fail to locate the thickest ice off the coast of northern Greenland and the Canadian Arctic Archipelago and thinner ice over the East Siberian Shelf.Part of the explanation lies in deficiencies in the representation of the details of the prevailing atmospheric circulation over the Arctic Ocean.This is a critical failure as projections of ice extent are strongly related to the initial ice thickness pattern distribution (e.g., Holland et al., 2010;e.g., Holland and Stroeve, 2011).Moreover, Holland and Stroeve (2011) suggested that the variance of September sea ice extent anomalies could be explained by the winter-spring ice thickness increases as the ice cover thins and transitions towards a sea-sonal ice cover.Thus as ice thins, the ability of models to represent the spatial thickness distribution may become more relevant.
Several techniques have been advanced in the literature to subselect models based on different metrics of model performance during the historical time period, with the aim of reducing uncertainty as to when an ice-free Arctic may become a reality (e.g., Wang andOverland, 2009, 2012;Boe et al., 2009;Massonnet et al., 2012).It is clear from our study that even if a model captures the extend of the seasonal cycle, or trends in extent and/or volume, the model may still represent the prevalent atmospheric circulation patterns and thickness distributions poorly.Indeed, we show that a model may show the trend in ice volume or ice extent reasonably correctly, yet fail to locate the thickest ice north of Greenland and the Canadian Archipelago.Only two models capture both the spatial pattern of sea ice thickness and the general pattern of atmospheric circulation (MIROC5 and MRI-CGCM3), further reducing confidence in the veracity of future projections based on CMIP5 climate models.The fact that both models display rather different trends in ice volume (−3.6 × 10 3 km 3 dec −1 and −1.15 × 10 3 km 3 dec −1 ) does not bode well for constraining climate models based on sea ice thickness patterns alone.

Figure 1 .
Figure 1.Variability of March ice thickness in seven models from 1981 to 2010.The values are the coefficient of variability (SD/average).This is a normalized measure of variability so that variability can be compared spatially and between models.

Figure 2 .
Figure 2. Comparison of submarine, ERS-1, ICESat, IceBridge and CryoSat-2 sea ice thickness (zi) fields (left column) for each campaign's period of record, with ice thickness fields simulated by PIOMAS (middle column) and corresponding scatterplots (right column).PIOMAS fields are the average March thicknesses for the periods that correspond to observational records.In the scatterplots, individual years are shown in different colors, except for ERS-1, which was provided as a mean field for the entire time period.

Figure 3 .
Figure 3.Comparison of thickness distributions between five observational data sets, PIOMAS and 33 individual CMIP5 models.Model results are presented as box and whisker plots from 1981 to 2010, where the boxes represent the interquartile range (25th to 75th percentiles) and the horizontal bars and asterisks within each box define the median and mean, respectively.The median spring thicknesses from each observational data set and PIOMAS are shown as a solid red line, together with the 10th and 90th percentiles (green lines) and the interquartile range (gray shading).

Figure 4 .
Figure 4. Spatial patterns of sea ice thickness from 1981 to 2010 from 33 CMIP5 models and PIOMAS.

Figure 5 .
Figure 5. Spatial pattern correlations (top) and root-mean-square error (RMSE) (bottom) of ice thickness in 27 CMIP5 models and ICESat.Filled and empty circles indicate correlations that are significant at the 99 and 95 % level.

Figure 6 .
Figure 6.Mean annual sea level pressure and geostrophic wind from 27 CMIP5 models and from ERA-Interim spanning 1981-2010.Contour interval is 1 hPa.Near-surface geostrophic wind is used as a proxy for sea ice motion and is shown by red vectors.Vector length is proportional to wind speed.Vectors are curved tangents to the instantaneous flow.

Figure 7 .
Figure 7. Change in Arctic sea ice volume as shown in the CMIP5 ensemble and in PIOMAS for the period of 1979 to 2012, for March.Gray shading shows the ±1 standard deviation of CMIP5 ensemble.Upper and lower dashed lines show maximum and minimum ice volume of the model ensemble.Multimodel ensemble mean ice volume is shown as the black line.

Figure 8 .
Figure 8. March ice volume trends from 1979 to 2013 for all 92 individual CMIP5 model ensembles as well as the multimodel ensemble mean (shown in black) with confidence intervals (vertical lines).The 1 and 2σ confidence intervals of PIOMAS trends are shown in dark gray shading (1) and light gray shading (2).

Table 1 .
Listing of models used in the analysis together with information on the sea ice model components and physics.For some models this information is not available in publications or on web sites.

Table 2 .
Mean ice thickness bias, root-mean-square error estimate and correlation between PIOMAS modeled ice thickness and thicknesses from different remotely sensed data sets.
Schweiger et al. (2011) to the Chukchi and East Siberian seas.The observations typically do not depict this feature, especially the ICESat record.PIOMAS also underestimates the ice thickness in the East Greenland Sea.The underestimation of thick ice and overestimation of thin ice by PIOMAS was previously noted inSchweiger et al. (2011).

Table 3 .
Spatial correlations between observed mean annual sea level pressure from four different reanalysis data sets and from the CMIP5 models.Ranks of correlations are given in parentheses, running lowest to highest.Because of difficulties in reducing surface pressures to sea level, pressures over Greenland have been screened out.Correlations between the different reanalyses are also included as well as information on whether or not the models represent a closed Beaufort Sea High (BSH).

Table 4 .
Linear trends in Arctic sea ice volume for March based on the period 1979 to 2013 from 33 CMIP5 models and PIOMAS.For models with more than one ensemble member, the mean trend is given along with the range of the trend (in parenthesis).Trends are listed as km 3 per decade.Trends statistically different from 0 at 95 and 99 % significance are denoted by + and ++, respectively.