Sea-ice extent provides a limited metric of model performance



Introduction
Individual satellite retrievals of sea-ice concentration are often taken as the truth for model-evaluation purposes. However, significant differences exist between estimates of sea-ice concentration based on different satellite algorithms. These differences imply some uncertainty in our knowledge of the "true" sea-ice coverage. In this contribution, we examine how strongly this uncertainty limits the reliable assessment of the quality of simulated sea-ice coverage. We focus primarily on the two integrative measures that are most widely used for the quantitative assessment of model quality: first, sea-ice area, which is simply the total area of sea ice; and second, sea-ice extent, which is the total area of the ocean surface in which significant amounts of sea ice exist. To calculate sea-ice extent in gridded data, one usually adds the area of all grid cells with an ice concentration of more than 15 %. While sea-ice extent was initially only used to assess the observed long-term evolution of the sea-ice cover (e.g. Zwally et al., 1983; Parkinson et al., 1987), it has now become common practice to use sea-ice extent also as the primary (and often sole) variable to assess the quality of modeled sea-ice coverage (e.g. Stroeve et al., 2007, 2012; Massonnet et al., 2012).
For all aspects of air-ice-sea interaction, sea-ice extent is less important than sea-ice area. Such interaction includes, for example, the surface albedo, the response of the ice cover to wind forcing, and the heat exchange between the ocean and the atmosphere. This was already acknowledged by early works on satellite remote sensing (cf. Zwally et al., 1983). The focus on sea-ice extent is, nevertheless, understandable, since this parameter can be more reliably observed from ships, airplanes and satellites than sea-ice area. This both allows for a better assessment of the long-term (including pre-satellite) evolution of the ice cover and reduces the uncertainty of the observational data against which model simulations are compared. This reduction in uncertainty in the observational data comes, however, at a price, in that sea-ice extent can give misleading results regarding model quality. Consider the trivial, fictitious observed sea-ice cover in three grid cells shown in Fig. 1a. Compared to these observations, a model could simulate a smaller sea-ice area that nevertheless results in a larger sea-ice extent because of a slight shift in the location of the sea-ice cover (Fig. 1b). A model could also simulate a larger sea-ice area with a smaller sea-ice extent (Fig. 1c). Hence, small shifts in the location of the modeled sea-ice pack, in particular in the marginal ice zone with its strong gradients in sea-ice concentration, can give a misleading impression of the actual bias in modeled sea-ice cover.
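The three-cell thought experiment can be made concrete with a few lines of Python. The concentration values below are hypothetical and are not those of Fig. 1; each cell is assumed to have unit area, and the helper names `area` and `extent` are ours. This is a minimal sketch of the two definitions, not the processing used for the figures.

```python
# Toy illustration: three grid cells of unit area, concentrations in percent.
# All numbers are hypothetical and chosen only to reproduce the qualitative
# behaviour described in the text.

THRESHOLD = 15  # percent; cells above this value count towards extent


def area(conc):
    """Sea-ice area: sum of ice fractions for cells of unit area."""
    return sum(conc) / 100.0


def extent(conc):
    """Sea-ice extent: number of unit-area cells above the threshold."""
    return sum(1 for c in conc if c > THRESHOLD)


observed = [50, 20, 10]
model_b = [20, 20, 20]   # less ice overall, but spread over more cells
model_c = [100, 10, 10]  # more ice overall, but concentrated in one cell

for name, conc in [("observed", observed), ("model b", model_b), ("model c", model_c)]:
    print(f"{name}: area = {area(conc):.1f}, extent = {extent(conc)}")
```

With these values, model b has the smaller area (0.6 vs. 0.8 cell units) but the larger extent (3 vs. 2 cells), while model c has the larger area (1.2) but the smaller extent (1 cell).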
In addition to these grid-independent issues, there is also a grid-dependent issue related to the usage of sea-ice extent vs. sea-ice area. Generally, the higher the grid resolution, the smaller the sea-ice extent. At very high resolution, it converges to the same value as sea-ice area, since then almost all grid cells will be either fully ice covered or fully ice free.
We became aware of these issues when we analyzed results from the Max-Planck-Institute for Meteorology Earth System Model MPI-ESM: compared to observations, this model has about 6 % too small a September Arctic sea-ice extent, but 20 % too small a sea-ice area (Notz et al., 2013). In contrast, this model's predecessor ECHAM5/MPIOM had about 20 % too large a September Arctic sea-ice extent, but only about 7 % too large a sea-ice area. This raised the question of whether too strong a focus on sea-ice extent, with its smaller observational uncertainty, can give misleading results regarding the quality of modeled sea-ice coverage, and what implications this has for quantitative model evaluation.
In this contribution, we examine these questions by analyzing output from models that have contributed to the Coupled Model Intercomparison Project, phase 5 (CMIP5). Our aim is to give the reader a quantitative assessment of, and an explanation for, the different outcomes of model-data comparison studies based on sea-ice extent vs. sea-ice area. Because positive and negative regional biases cancel in the calculation of either sea-ice extent or sea-ice area, we additionally analyse these measures' relationship to the mean absolute bias in sea-ice concentration, which avoids such cancellation of errors. We also touch upon the issue of local biases in sea-ice concentration, which are relevant for a more detailed analysis of model quality. Our aim is to allow the reader an informed assessment of which parameter to use for a specific purpose and how to handle the related observational uncertainty. In particular, we put our findings into the [...]
The satellite products and the model data that we use are introduced in Sect. 2. In Sect. 3.1, we analyse the compactness of the modelled and satellite-retrieved sea-ice cover, and in Sect. 3.2 we explain why about half of the CMIP5 models simulate a compact ice cover in summer, while the other half do not. Based on these insights, in Sect. 3.3 we analyse the different biases in sea-ice extent and area, and in their trends. In Sect. 3.4 we examine the impact of grid resolution, followed by an analysis of cancelling negative and positive biases in Sect. 3.5. Section 3.6 then contains an analysis of the impact of internal variability on the assessment of model quality. In Sect. 3.7 we briefly touch upon some issues related to the non-linearity of sea-ice extent. We discuss the implications of these findings for model-evaluation purposes in Sect. 4. Our main findings are finally summarized in Sect. 5.

Models and data
For our analysis, we focus on the period 1979-2005, which is the overlapping period of the most widely used satellite records of sea-ice coverage and the "historical" simulations of the CMIP5 protocol (Taylor et al., 2012). These "historical" simulations are forced by the observed evolution of greenhouse gases, solar radiation, etc. For all 117 historical simulations that we consider here, time series of monthly sea-ice extent and area are calculated from their monthly sea-ice concentration fields. The sea-ice extent is calculated as the total area of all grid cells with at least 15 % sea-ice concentration. For sea-ice area, the area of each grid cell is multiplied by its sea-ice concentration and the products are then summed. For sea-ice area and extent, linear trends are calculated as a least-squares fit to the time series. Ensemble-mean and multi-model mean time series of sea-ice extent and sea-ice area are calculated as the ensemble mean and the multi-model mean of the individual simulations' time series of these two parameters, and not from the ensemble-mean or multi-model mean concentration fields (compare Sect. 3.7).
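The calculation steps described above can be sketched as follows. This is an illustrative sketch only: the function names, the flat-list data layout and the example numbers are assumptions made here, and the code is not the processing chain used for the study.

```python
# Minimal sketch of the diagnostics described in the text.
# Concentrations are fractions (0..1); cell areas are in arbitrary area units.


def sea_ice_extent(conc, cell_area, threshold=0.15):
    """Total area of all cells with at least `threshold` ice concentration."""
    return sum(a for c, a in zip(conc, cell_area) if c >= threshold)


def sea_ice_area(conc, cell_area):
    """Sum over all cells of cell area times ice concentration."""
    return sum(c * a for c, a in zip(conc, cell_area))


def linear_trend(years, values):
    """Slope of the ordinary least-squares line through (years, values)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


def ensemble_mean_series(series_per_member):
    """Average the members' time series, not their concentration fields."""
    n = len(series_per_member)
    return [sum(vals) / n for vals in zip(*series_per_member)]


# Hypothetical example: two cells of 100 area units each, three years.
cell_area = [100.0, 100.0]
fields = {2003: [0.9, 0.4], 2004: [0.8, 0.2], 2005: [0.7, 0.1]}
years = sorted(fields)
extent_series = [sea_ice_extent(fields[y], cell_area) for y in years]
area_series = [sea_ice_area(fields[y], cell_area) for y in years]
print("extent trend per year:", linear_trend(years, extent_series))
print("area trend per year:", linear_trend(years, area_series))
```

Averaging the members' time series rather than their concentration fields matters for extent, because the threshold makes extent a non-linear function of concentration.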

The model results are compared against satellite retrievals of sea-ice concentration. We here discuss primarily comparisons against the two satellite algorithms for sea-ice concentration that are most widely used for model-data intercomparison studies: the Bootstrap algorithm (Comiso, 1986) and the NASA Team algorithm (Cavalieri et al., 1984), which forms the basis for the NSIDC Sea-Ice Index (Fetterer et al., 2002, updated 2012). Additionally, we consider retrievals based on the ASI algorithm (Kaleschke et al., 2001; Spreen et al., 2008). The Bootstrap algorithm is probably more reliable than the NASA Team algorithm, because the latter has been found to be biased low compared to independent observations (e.g. Agnew and Howell, 2003; Partington et al., 2003). The Bootstrap algorithm, in contrast, results in estimates of sea-ice concentration that are very close to the "Climate Data Record of Passive Microwave Sea Ice Concentration" (CDR, Meier et al., 2011), a merged product of different algorithms that aims to provide a consistent time series of sea-ice concentration. In summer, estimates of sea-ice area from the Bootstrap algorithm also agree well with estimates based on the ASI algorithm from SSMI satellite data and the higher-resolution AMSR-E satellite data, while estimates of sea-ice area based on the NASA Team algorithm are significantly lower (Fig. 2). For these reasons, we tentatively trust that estimates of sea-ice coverage based on the Bootstrap algorithm are closer to the real sea-ice cover than those of the NASA Team algorithm. Unless noted otherwise, we will therefore use the term "observations" to refer to retrievals based on the Bootstrap algorithm. For our discussion of the impact of sea-ice area vs. sea-ice extent in model-data intercomparison studies, we will take both the estimates of sea-ice area and sea-ice extent from the Bootstrap algorithm as the "truth" and will only return to the issue of the larger uncertainty in estimates of sea-ice area in our discussion in Sect. 4. There we will also discuss in more detail the differences between the various algorithms shown in Fig. 2.


The frequency distribution of sea-ice concentration
The Bootstrap and the NASA Team algorithms result in similar estimates of mean September Arctic sea-ice extent for the period 1979-2005, namely 7.3 million km² for the Bootstrap algorithm and 6.9 million km² for the NASA Team algorithm. It is usually assumed that this difference is small enough to allow for a meaningful, quantitative comparison of modeled sea-ice extent against the estimated extent from an individual satellite retrieval. The difference in September mean sea-ice area for the same period is much larger, with 6.3 million km² for the Bootstrap algorithm compared to only 5.2 million km² for the NASA Team algorithm. This much larger difference is the main reason why the sea-ice area estimate of an individual satellite retrieval is usually not used for model-evaluation purposes. Such a large relative difference arises, however, only in summer: in March, the estimates of both sea-ice area and sea-ice extent are similar between the two algorithms: mean 1979-2005 sea-ice extent is 15.9 million km² [...]
[...] much of the melt-water covered sea ice as open water, the Bootstrap algorithm more strongly compensates for this well-known bias compared to the NASA Team algorithm. The two versions of the ASI algorithm that were analysed for the present study show a similarly compact ice cover as the Bootstrap algorithm.
The large difference between the NASA Team and the Bootstrap algorithms in the estimated frequency of high sea-ice concentration causes their large difference in estimated sea-ice area. In winter, the estimated frequency of high sea-ice concentration is much more similar for the two algorithms (Fig. 3c, d), which explains the smaller difference in estimated sea-ice area for that season. Differences in estimated sea-ice extent arise from different estimates of the frequency of low sea-ice concentration. Since differences between the two algorithms are small at this end of the spectrum in both summer and winter, both algorithms result in similar estimates of sea-ice extent.
Examining the frequency distribution of summer sea-ice concentration in the CMIP5 model simulations, we find that these simulations can be divided into two groups. One group simulates a compact ice cover in summer (red panels in Fig. 4), while the other group simulates a loose ice cover (blue panels in Fig. 4). In winter, all models simulate a compact ice cover (not shown). Somewhat arbitrarily, we chose a normalised frequency of 0.4 for the 90-100 % concentration band as the dividing line between simulations with a compact ice cover and simulations with a loose ice cover.
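The classification into compact and loose ice covers can be illustrated with a short sketch. The text does not state how the frequency distribution is normalised, so the code below assumes an area-weighted histogram over all cells that contain ice, normalised to sum to one; the function names and example fields are hypothetical.

```python
# Sketch of the compact/loose classification, under the assumption that the
# histogram is area weighted, ignores ice-free cells, and sums to one.


def concentration_histogram(conc, cell_area, n_bins=10):
    """Area-weighted frequency of ice concentration in equal-width bands."""
    weights = [0.0] * n_bins
    for c, a in zip(conc, cell_area):
        if c <= 0.0:
            continue  # ice-free cells do not enter the distribution
        index = min(int(c * n_bins), n_bins - 1)  # c == 1.0 falls in the top band
        weights[index] += a
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


def is_compact(conc, cell_area, dividing_line=0.4):
    """True if the 90-100 % band holds at least `dividing_line` of the ice cover."""
    return concentration_histogram(conc, cell_area)[-1] >= dividing_line


# Hypothetical fields on five cells of equal area.
areas = [1.0] * 5
compact_field = [0.95, 0.92, 1.0, 0.5, 0.3]
loose_field = [0.95, 0.6, 0.5, 0.4, 0.3]
print(is_compact(compact_field, areas), is_compact(loose_field, areas))
```

A different normalisation (for example, including ice-free cells or counting cells rather than area) would shift the frequencies, so the 0.4 dividing line is only meaningful relative to the convention actually used for Fig. 4.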
The emergence of these two classes of models has important implications for the assessment of model quality based on sea-ice extent. Before we turn to these implications, we should, however, briefly examine the physical processes that are most likely responsible for the division of the models into these two classes.

Processes that control modelled sea-ice concentration
To understand the emergence of the two classes of simulations, we consider the transition from the compact winter and spring ice cover into the sometimes compact and sometimes loose summer ice cover. We do so by analysing the percentage change in sea-ice volume, thickness and concentration in the central Arctic (north of 85° N) between May and December. In this region, the two satellite algorithms that we examine here show only small seasonal changes in sea-ice concentration: excluding the satellite pole hole, the NASA Team algorithm shows in the area 85° N-87.5° N a mean reduction in sea-ice concentration of about 7 % from May to August, and an increase in sea-ice concentration of about 10 % from September to December. The Bootstrap algorithm shows changes of less than 1 % for both periods.
In the models with a generally loose ice cover, the concentration in this region decreases during the spring-summer transition much more than in the satellite record (Fig. 5a). In these models, most of the volume loss in this region is therefore governed by changes in ice concentration. In simulations with a compact ice cover, the decrease in sea-ice concentration is more similar to that of the satellite record. In these simulations, the volume loss is primarily governed by changes in ice thickness. The distribution is similar for the transition from summer to winter (Fig. 5b): from September to December, thickness increases comparatively little (or even slightly decreases) in models with a loose summer ice pack, while it increases more in models with a compact summer ice pack.
These differences point to differences in the parameterisation of thinning and thickening versus lateral melting and growth as the main cause of the emergence of two classes of models. During periods of melting, these parameterisations must capture the two main processes that can contribute to a reduction in mean sea-ice concentration: first, the complete melting of thin ice within a certain grid cell, and second, the lateral melt of the ice pack. In models with a sub-grid scale ice-thickness distribution, the first process is explicitly resolved and only the second one needs to be parameterised. In models with only a single ice class in a specific grid cell, both processes must be parameterised (e.g. Hibler, 1979; Notz et al., 2013). In more complex parameterisations, and in all models with an ice-thickness distribution, the two processes are treated independently. The description of lateral melt is then most realistically based on a floe-size distribution within a certain grid cell (Hunke and Lipscomb, 2010; Notz, 2012).

It is likely that differences in these parameterisations lie at the heart of the emergence of the two model classes, since the more advanced models with an ice-thickness distribution usually show a compact ice cover throughout summer with comparatively small seasonal changes in sea-ice concentration, while models that only simulate a single ice class primarily show a loosened ice cover in summer and large changes in ice concentration.
However, there are exceptions to this rule: CCSM4, HadGEM2 and IPSL-CM5A do have a sub-grid scale ice-thickness distribution but produce a loose sea-ice cover. In contrast, the MIROC models and CSIRO-Mk3-6-0 do not have a sub-grid scale ice-thickness distribution but produce a compact sea-ice cover. This indicates that not only thermodynamic factors influence the seasonal cycle of sea-ice concentration, but also dynamic factors: a divergent wind field, for example, will cause a loosening of the ice pack with no direct change in sea-ice volume. Hence, even a model with a very advanced parameterisation of thermodynamic growth and melt can produce unrealistic seasonal cycles in sea-ice concentration if the atmospheric circulation is too divergent.
Analysing the contribution of such dynamical effects is beyond the scope of this study.

Extent versus area
In the context of the present study, one difference between a loose ice cover and a compact one is most relevant: differences between sea-ice extent and sea-ice area are comparatively small for compact sea ice, because of the large number of grid cells with a very high ice concentration. In contrast, the difference between extent and area is much larger for a loose ice cover (see Fig. 6a-c).
This has direct consequences for the analysis of model biases based on these two measures. For example, we find that for simulations with a compact sea-ice cover, biases relative to the Bootstrap retrieval are similar for sea-ice area and for sea-ice extent (red dots are close to red line in Fig. 7a). In particular, all simulations with a compact sea-ice cover that are within ±10 % of the retrieved sea-ice extent are also within ±10 % of the retrieved sea-ice area. For the simulations with a loose ice cover, we find that those models that underestimate sea-ice extent relative to the Bootstrap retrieval have a stronger percentage bias in sea-ice area than they have in sea-ice extent, while those simulations that overestimate sea-ice extent have a smaller percentage bias in sea-ice area than in extent (Fig. 7a). A number of simulations with a loose ice cover that fall within ±10 % of the retrieved sea-ice extent are clearly outside the ±10 % range of the retrieved sea-ice area, and vice versa. Hence, a focus on sea-ice extent can give misleading results regarding model quality compared to a focus on sea-ice area.
Relative to the satellite-retrieved estimates based on the NASA Team algorithm, we find that biases for sea-ice extent are similar to biases for sea-ice area for simulations with a loose ice cover (blue dots close to green line in Fig. 7a). Simulations with a compact ice cover that overestimate the mean sea-ice extent compared to the NASA Team algorithm, in contrast, have a stronger percentage bias in sea-ice area, and vice versa.
For March, all simulations and both satellite retrievals have a compact ice cover. Hence, percentage biases in sea-ice area are almost identical to the biases in sea-ice extent for all simulations (Fig. 7b).
To understand this behaviour of simulations with a compact ice cover versus those with a loose ice cover, we need to consider that the former have a small difference between sea-ice extent and sea-ice area, while the latter have a larger difference. Figure 8 illustrates how this explains the different behaviour of the two model families: if any of the loose-ice simulations, with their comparatively large difference between sea-ice extent and sea-ice area, results in too small a mean sea-ice extent, this simulation's bias in sea-ice area will be comparatively large. If, however, the simulation resulted in too large a sea-ice extent, its bias in sea-ice area would be comparatively small, simply because the difference between extent and area is larger in the simulations than in the observations. For simulations with a compact ice cover, biases in extent and area relative to the Bootstrap algorithm are very similar, because these simulations' difference between sea-ice extent and sea-ice area is similar to that of the Bootstrap observations. Compared to observations based on the NASA Team algorithm, the simulations with a compact ice cover generally have a smaller difference between extent and area, which explains their contrasting behaviour relative to the NASA Team algorithm.
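The argument above can be checked with simple arithmetic. The sketch below uses the Bootstrap September means quoted in the text (extent 7.3 and area 6.3 million km²) as the reference; the model extents and the extent-minus-area gap of 2.0 million km² for a loose ice cover are hypothetical values chosen for illustration.

```python
# Worked example: how a larger extent-minus-area gap changes the area bias.
# Reference values are the Bootstrap September means quoted in the text;
# the model values are hypothetical.

OBS_EXTENT = 7.3  # million km^2
OBS_AREA = 6.3    # million km^2


def percentage_biases(model_extent, extent_minus_area):
    """Return (extent bias, area bias) in percent of the reference values."""
    model_area = model_extent - extent_minus_area
    extent_bias = 100.0 * (model_extent - OBS_EXTENT) / OBS_EXTENT
    area_bias = 100.0 * (model_area - OBS_AREA) / OBS_AREA
    return extent_bias, area_bias


LOOSE_GAP = 2.0    # hypothetical gap for a loose ice cover
COMPACT_GAP = 1.0  # same gap as in the reference (7.3 - 6.3)

for label, gap in [("loose", LOOSE_GAP), ("compact", COMPACT_GAP)]:
    for model_extent in (6.6, 8.0):
        e_bias, a_bias = percentage_biases(model_extent, gap)
        print(f"{label}, extent {model_extent}: "
              f"extent bias {e_bias:+.1f} %, area bias {a_bias:+.1f} %")
```

For the hypothetical loose ice cover, an extent that is about 10 % too small comes with an area that is about 27 % too small, whereas an extent that is about 10 % too large comes with an area bias of only about 5 %. With the compact gap, the two biases stay close to each other in both cases.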
In winter, all simulations result in a compact sea-ice cover. Therefore, in winter they have a similar difference between sea-ice extent and sea-ice area as the Bootstrap observations, which explains their consistent behaviour compared to winter-time observations.
Examining trends in sea-ice area and sea-ice extent, we find that the Bootstrap retrieval gives almost the same number for both these measures, namely an average loss of 0.56 million km² per decade in sea-ice extent and a loss of 0.58 million km² per decade in sea-ice area during the period 1979-2005. The models, in contrast, show inconsistent behaviour, with both smaller and larger trends in sea-ice area than in extent (Fig. 6d-f). The consistent trends in the satellite retrieval can be understood by analysing the individual trends for different ice-concentration ranges (Fig. 9). Almost all the ice loss in the Bootstrap retrievals happens within the ice-concentration range 90 to 100 %, with no compensating increase in lower ice-concentration ranges (second to last panel in Fig. 9). An ice loss at these high concentrations will have roughly the same impact on sea-ice area and on sea-ice extent. For most models, in contrast, the ice loss is spread over a wider range of sea-ice concentrations. In addition, the grid cells with high ice concentration often only lose some of their ice, which then causes an increase in the number of grid cells with intermediate ice concentration. This compensation then causes a smaller loss of sea-ice extent than of sea-ice area. Some models, however, also show a faster loss in sea-ice extent than in sea-ice area. This behaviour can be understood if a significant number of grid cells with intermediate sea-ice concentration become ice free in a simulation. The entire area of these grid cells is then lost in terms of sea-ice extent, while only the fraction of these grid cells that was ice-covered is lost from sea-ice area.
The different biases in trends of area and extent in models versus the satellite retrievals again have consequences for the assessment of model quality (Fig. 10a). A number of simulations result in trends that lie within ±20 % of the Bootstrap-retrieved trends in sea-ice extent, while their simulated trends in sea-ice area lie outside the ±20 % range. In particular, models that have too fast a loss in sea-ice extent compared to Bootstrap retrievals sometimes have too slow a loss in sea-ice area compared to the Bootstrap retrievals. The same holds for the trends in winter sea-ice coverage (Fig. 10b). Hence, again, an assessment of model quality based on an analysis of trends in sea-ice extent can give misleading results.

Grid resolution
While the different histograms of sea-ice concentration explain most of the findings discussed so far, differences in model grids might also be relevant for different biases in sea-ice extent and in sea-ice area. As discussed in the introduction, one would generally expect a smaller difference between sea-ice extent and sea-ice area for higher grid resolution. The comparatively high resolution of the satellite data set might therefore have contributed to the comparatively small difference between sea-ice area and sea-ice extent for the Bootstrap algorithm (Fig. 6c).
To examine this possibility, we interpolated the gridded Bootstrap-derived sea-ice concentration field of September 2007 from the original 25 km EASE grid to each individual model grid. For each resulting sea-ice coverage on the original model grids, we then counted the number of ice-covered grid cells and calculated sea-ice extent and sea-ice area. We find that the calculated extent on all model grids is slightly larger than that obtained on the original EASE grid (Fig. 11a). For sea-ice area, in contrast, round-off errors during the interpolation result in a slightly larger area than on the original EASE grid on some model grids, while other grids show a slightly smaller ice-covered area (Fig. 11b). For both extent and area, the error that is introduced by the regridding is, however, so small that the difference between observational and model grids is largely irrelevant.
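The grid dependence of extent can be illustrated with a one-dimensional toy field. The sketch below uses simple conservative block averaging as a stand-in for the interpolation between grids, which the text does not specify; the field values and function names are hypothetical.

```python
# Toy illustration of the grid dependence of sea-ice extent.
# Block averaging replaces the (unspecified) interpolation used in the study.


def coarsen(conc, factor):
    """Block-average a 1-D concentration field of equal-area cells."""
    if len(conc) % factor:
        raise ValueError("field length must be a multiple of factor")
    return [sum(conc[i:i + factor]) / factor for i in range(0, len(conc), factor)]


def extent(conc, cell_area, threshold=0.15):
    """Total area of cells with at least `threshold` concentration."""
    return sum(cell_area for c in conc if c >= threshold)


def area(conc, cell_area):
    """Sum of cell area times concentration."""
    return sum(c * cell_area for c in conc)


# Sharp ice edge on a fine grid of unit-area cells.
fine = [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
for factor in (1, 2, 4):
    coarse = coarsen(fine, factor)
    print(f"cell size {factor}: extent = {extent(coarse, factor)}, "
          f"area = {area(coarse, factor)}")
```

In this toy case the extent grows from 5 to 6 to 8 area units as the grid is coarsened, while the area stays at 5 because block averaging conserves it. The small area differences reported in the text arise from the actual interpolation and are not reproduced by this sketch.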


Canceling biases
So far, we have examined possible misinterpretations that can arise when using sea-ice extent instead of sea-ice area for model-evaluation purposes. However, a number of issues arise from the use of either of these two integrated measures for model evaluation. One of these issues is related to the possibility of canceling biases: a model that has a large positive bias in sea-ice concentration in one region and a large negative bias in another region might simulate a better overall sea-ice area than a model that has weak negative biases in both regions. Therefore, an analysis of the mean absolute bias in sea-ice concentration gives a better indication of model performance than either sea-ice extent or sea-ice area.
For the period 1979 to 2005, we calculated the area-weighted, monthly mean bias and the area-weighted, monthly mean absolute bias in sea-ice concentration in the CMIP5 simulations relative to the Bootstrap retrievals. As expected, we find a very good correlation between the mean percentage bias in the integrative measures extent or area and the mean bias in sea-ice concentration (compare Fig. 12a/b versus c): for the mean bias in concentration, regional errors cancel in a similar way as they do for extent and area. Therefore, a linear regression of the biases in area versus biases in mean concentration results in a high value of R² = 0.93. Because of the non-linearity of sea-ice extent, the linear regression of sea-ice extent on sea-ice concentration gives a slightly lower value of R² = 0.85. The fact that R² is not 1 for the linear regression of area versus mean concentration is primarily related to interpolation issues during the calculation of mean biases.
For the absolute biases in sea-ice concentration, which prevent the cancellation of regional biases, however, correlation with the absolute percentage bias in the integrative measures sea-ice extent and sea-ice area is low, giving R² ≈ 0.5 for both measures: some models with almost no bias in sea-ice extent or area still have comparatively large mean absolute concentration biases, and vice versa. This casts some doubt on the relevance of the integrative measures sea-ice extent and sea-ice area for model-evaluation purposes and points towards the need for a more regional estimate of model biases.
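The cancellation of regional biases can be illustrated with two grid cells. The sketch below computes an area-weighted mean bias and mean absolute bias; the concentrations are hypothetical, and the function names are ours rather than taken from the study.

```python
# Two-cell illustration of canceling biases (hypothetical concentrations).


def mean_bias(model, obs, cell_area):
    """Area-weighted mean of (model - obs); regional errors can cancel."""
    total = sum(cell_area)
    return sum((m - o) * a for m, o, a in zip(model, obs, cell_area)) / total


def mean_absolute_bias(model, obs, cell_area):
    """Area-weighted mean of |model - obs|; regional errors cannot cancel."""
    total = sum(cell_area)
    return sum(abs(m - o) * a for m, o, a in zip(model, obs, cell_area)) / total


cell_area = [1.0, 1.0]
obs = [0.5, 0.5]
model_1 = [0.9, 0.1]  # large biases of opposite sign
model_2 = [0.4, 0.4]  # weak negative bias in both cells

for name, model in [("model 1", model_1), ("model 2", model_2)]:
    print(name,
          "mean bias:", round(mean_bias(model, obs, cell_area), 3),
          "mean absolute bias:", round(mean_absolute_bias(model, obs, cell_area), 3))
```

Model 1 reproduces the total sea-ice area of the reference exactly, yet its mean absolute bias is four times that of model 2.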

Internal variability
In the previous sections, we have shown that the more reliably measurable sea-ice extent can give misleading results regarding model quality compared to the geophysically more meaningful sea-ice area. We will now examine how important these differences are in the light of internal variability. The internal variability of the Arctic climate system is so large that it is often impossible to judge whether a difference in sea-ice coverage for a specific time period between a model simulation and observations is simply random or caused by a model deficiency (cf. Winton, 2011). To estimate internal variability, we here take the spread of multiple simulations from an individual model as an estimator. If the observed value of a certain variable lies within this ensemble spread, the ensemble spread of this particular variable is assumed to represent a "reasonable" range within which differences between models and simulations could simply be caused by internal variability.
Using this approach to examine sea-ice area (yellow shading in Fig. 6a), we find that models that have in one simulation a 1979-2005 mean area similar to that of the Bootstrap algorithm (6.3 million km²) can have in another simulation a mean area that is as low as that retrieved by the NASA Team algorithm (5.2 million km²). Hence, for sea-ice area, internal variability implies a similar uncertainty for model-evaluation purposes as does the spread in satellite retrievals. Taking this spread as an estimate of the truth, about 50 % of the 117 CMIP5 simulations that we analyze have too small a sea-ice area for the period 1979-2005, while 30 % have too large a sea-ice area. The mean of all simulations, 5.6 million km², lies within the uncertainty range of the truth.
For mean 1979-2005 sea-ice extent (Fig. 6b), models that have at least one simulation close to the Bootstrap estimate of 6.9 million km² have in other simulations a sea-ice extent that ranges from 5.0 million km² to 8.2 million km². Hence, for sea-ice extent the uncertainty that stems from internal variability is much larger than that caused by the uncertainty in the satellite estimate. Only 10 % of CMIP5 simulations lie below the estimated reasonable range, while about 25 % lie above that range. Mean September sea-ice extent of all simulations is 7.1 million km², a value that is between the 7.3 million km² of the Bootstrap algorithm and the 6.9 million km² of the NASA Team algorithm.
For trends of sea-ice area and sea-ice extent, finally, internal variability by far dominates any uncertainty that arises from uncertainty in satellite retrievals (Fig. 6d, e): for the period 1979-2005, many models that generated one simulation with a sea-ice trend very close to the observed one simulate, for identical forcing and slightly different initial conditions, trends that are twice as strongly negative, or trends that are even positive. Hence, any trend that falls within this range might be the consequence of internal variability of the downward trend rather than a model deficiency. Using this criterion, all simulations that we consider here show a "reasonable" trend for the period 1979-2005. Since 2005, Arctic sea-ice coverage in summer has decreased rapidly. The trend in September sea-ice coverage for the extended period 1979-2012, however, remains below 1 million km² ice loss per decade both for extent and area. As such, the trend remains comfortably within our estimated range of internal variability of modeled trends. Hence, also for the extended period until 2012, we cannot positively identify any of the modeled trends as unreasonable.
Despite the fact that the range of crudely estimated modeled internal variability is comparable for sea-ice area and sea-ice extent, many more models fall outside that range for sea-ice area than for sea-ice extent. This, again, is of relevance for any model-evaluation study that focuses primarily on sea-ice extent.

Non-linearity
For completing our discussion of the usage of sea-ice extent for model evaluation, we should finally note that for any such comparison of modeled mean sea-ice extent with observations, the non-linearity of sea-ice extent must carefully be taken into account.
Mean sea-ice extent should normally be calculated as the mean of the sea-ice extents

TCD Introduction Conclusions References
Tables Figures

Back Close
Full of the individual simulations, and not as the sea-ice extent of the mean concentration of the simulations.Consider, for example, two simulations, one with 0 % ice concentration in a certain region and the other with 35 % ice concentration in that same region.The mean ice concentration of these simulations is larger than 15 %, and the sea-ice extent of the mean of the two simulations will be identical to the sea-ice extent of the simulation with the higher sea-ice concentration.The same issue arises when directly comparing sea-ice extent from daily observations with monthly mean fields of model output: the monthly-mean sea-ice extent as derived from a monthly-mean sea-ice concentration field will usually be larger than the monthly mean of daily estimates of sea-ice extent.
Since sea-ice area scales linearly with ice coverage, these issues do not apply to any study that uses sea-ice area as a metric for model quality.
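The two-simulation example above can be reproduced in a few lines. The following sketch assumes a hypothetical grid of two equal-area cells in arbitrary units; the function names are illustrative only.

```python
THRESHOLD = 0.15  # 15 % ice-concentration cutoff used for sea-ice extent


def sea_ice_extent(concentration, cell_area):
    """Total area of all grid cells with more than 15 % ice concentration."""
    return sum(a for c, a in zip(concentration, cell_area) if c > THRESHOLD)


def sea_ice_area(concentration, cell_area):
    """Total ice-covered area: concentration times cell area, summed."""
    return sum(c * a for c, a in zip(concentration, cell_area))


cell_area = [1.0, 1.0]   # assumed equal-area cells, arbitrary units
sim_a = [0.90, 0.00]     # 0 % ice concentration in the second cell
sim_b = [0.90, 0.35]     # 35 % ice concentration in the second cell
mean_conc = [(a + b) / 2 for a, b in zip(sim_a, sim_b)]  # second cell: 17.5 %

# Extent is non-linear: the order of averaging and thresholding matters.
mean_of_extents = (sea_ice_extent(sim_a, cell_area)
                   + sea_ice_extent(sim_b, cell_area)) / 2
extent_of_mean = sea_ice_extent(mean_conc, cell_area)

# Area is linear: both orders give the same result.
mean_of_areas = (sea_ice_area(sim_a, cell_area)
                 + sea_ice_area(sim_b, cell_area)) / 2
area_of_mean = sea_ice_area(mean_conc, cell_area)
```

Here the extent of the mean field equals the extent of the simulation with more ice and exceeds the mean of the two extents, whereas the two ways of averaging sea-ice area agree.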

Discussion
In the previous section, we have shown that for a number of reasons the sole consideration of sea-ice extent for the evaluation of model quality can give misleading results. We therefore recommend that future studies that aim at evaluating the performance of sea-ice models move away from the sole consideration of sea-ice extent and also consider the model performance for the more meaningful integrative quantity sea-ice area.
In doing so, differences between different satellite algorithms will play a more prominent role than for sea-ice extent (see Fig. 2). Hence, such a comparison will need to take the form of a comparison between observational data with a specific uncertainty and model simulations with a specific internal variability. To quantify the uncertainty of the satellite data, we compared in more detail the four satellite algorithms shown in Fig. 2. We find that despite their large difference in retrieved sea-ice area, these algorithms have a similar year-to-year variability, which becomes apparent if anomalies of all satellite algorithms relative to the retrieved area in 2010 are plotted together (see Fig. 13a, b). Hence, the difference between the satellite products is largely caused by a constant offset, and there is greater certainty in anomalies in sea-ice area than there is in its absolute value. This is important for any model simulation with assimilated sea-ice concentration fields: one should expect such a model to at least reproduce the anomaly structure of the satellite time series, which can be very reliably estimated.
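The effect of a constant offset on anomalies can be made explicit with a short sketch. The two products and their values are invented for illustration; only the choice of 2010 as reference year follows the text.

```python
def anomalies(area_by_year, reference_year):
    """Sea-ice area of each year minus the area in the reference year."""
    reference = area_by_year[reference_year]
    return {year: area - reference for year, area in area_by_year.items()}


# Hypothetical retrievals (million km^2): product_b is product_a plus a
# constant offset of 1.0, mimicking a systematic algorithm difference.
product_a = {2008: 4.5, 2009: 5.0, 2010: 4.75}
product_b = {year: area + 1.0 for year, area in product_a.items()}

anomalies_a = anomalies(product_a, 2010)
anomalies_b = anomalies(product_b, 2010)
```

The absolute values of the two products differ by 1.0 in every year, while their anomalies relative to 2010 are identical. Real products differ by more than a constant offset, so their anomalies agree only approximately.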
To quantify the uncertainty of sea-ice area retrievals and of the retrieved trends, we calculated the mean seasonal cycle of sea-ice area and of trends in sea-ice area for the period 2003-2010, for which all four satellite products contain data (see Fig. 13c, d). We then for each month simply subtracted the minimum value from the maximum value to obtain a time series of uncertainties. Doing so, we find that apart from July, differences in estimated sea-ice area are less than 1 million km² (blue curve in Fig. 13e). The same is found for an estimate of twice the standard deviation (purple curve in Fig. 13e).
Hence, a value of 1 million km² can be taken as a conservative approximation of the uncertainty of retrieved sea-ice area throughout the year.
Repeating this analysis for sea-ice trends, we find that uncertainties are less than 0.4 million km² decade⁻¹ throughout the year, with smaller values in winter (Fig. 13f). Hence, this value can be taken as a conservative approximation of the uncertainty of retrieved trends in sea-ice area. The bulk uncertainty of regional estimates of sea-ice concentration ranges from less than 5 % throughout winter and spring to around 10 % in summer and autumn.
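The two uncertainty estimates described above, the range between products and twice their standard deviation, can be sketched as follows. The product names and numbers are invented and are not the values behind Fig. 13; the use of the sample standard deviation is an assumption.

```python
from statistics import stdev


def monthly_spread(area_by_algorithm):
    """Per month: (maximum minus minimum, twice the standard deviation)
    of sea-ice area across all algorithms."""
    n_months = len(next(iter(area_by_algorithm.values())))
    ranges, two_sigma = [], []
    for month in range(n_months):
        values = [series[month] for series in area_by_algorithm.values()]
        ranges.append(max(values) - min(values))
        two_sigma.append(2 * stdev(values))  # sample standard deviation
    return ranges, two_sigma


# Hypothetical mean sea-ice area (million km^2) for three months.
area_by_algorithm = {
    "algorithm_a": [14.0, 7.5, 4.5],
    "algorithm_b": [14.5, 8.5, 5.0],
    "algorithm_c": [14.25, 8.0, 4.75],
    "algorithm_d": [14.75, 9.0, 5.25],
}
ranges, two_sigma = monthly_spread(area_by_algorithm)

# Months in which the spread exceeds a 1 million km^2 uncertainty estimate.
months_above_limit = [m for m, r in enumerate(ranges) if r > 1.0]
```

The same function can be applied to monthly trends instead of monthly areas to obtain the corresponding spread in retrieved trends.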
A number of models have biases in sea-ice area smaller than 1 million km² relative to satellite retrievals. For these models, biases in this integrative measure could therefore simply be explained by the uncertainty range of the satellite retrievals. For the absolute biases in mean concentration, however, all models show larger biases relative to satellite retrievals than the retrievals do among each other. For this measure, the biases in the models are hence not explicable by measurement uncertainty. In the light of uncertainties in satellite retrievals, this measure is therefore currently the most meaningful one for a quantitative assessment of overall, integrative model performance. For a more detailed analysis of modeled sea-ice coverage, regional biases must be analysed. Therefore, the mapping of differences in modeled mean sea-ice concentration is a standard tool in examining model quality. However, the interpretation of such an analysis again hinges on the reliability of the underlying concentration field as obtained from satellite retrievals: in particular in summer, large differences arise between different algorithms (Fig. 14a). To allow for a rough quantification of the uncertainty of sea-ice concentration retrieved from satellite, we have calculated for each month the median of the gridded difference between sea-ice concentration obtained from the NASA-Team algorithm and that obtained from the Bootstrap algorithm (Fig. 14b). This then allows one to estimate whether a certain regional difference between model and satellite retrieval in a specific month still lies within the observational uncertainty. The figure confirms our analysis of the integrative measures discussed in the previous subsections: during winter, estimates of sea-ice concentration are very similar for different satellite products, while a median uncertainty of around 10 % is typical for summer and early autumn. Note that this assessment only gives a somewhat crude estimate of the reliability of retrieved sea-ice concentration from satellites: locally, differences between the two products considered here can exceed 50 % throughout the year.
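A minimal sketch of such a median-based uncertainty estimate is given below. It assumes that both products are available on the same grid as flattened lists of concentrations, that the absolute difference is used, and that only cells with ice in at least one product are counted; these choices, the function names and all values are illustrative assumptions.

```python
from statistics import median


def median_abs_difference(field_a, field_b):
    """Median absolute concentration difference between two products,
    over all grid cells that contain ice in at least one of them."""
    diffs = [abs(a - b) for a, b in zip(field_a, field_b) if a > 0 or b > 0]
    return median(diffs)


def within_uncertainty(model_conc, retrieved_conc, uncertainty):
    """True if a model-minus-retrieval difference does not exceed
    the estimated observational uncertainty."""
    return abs(model_conc - retrieved_conc) <= uncertainty


# Invented concentration fields (fractions) for one month on five cells.
product_1 = [0.0, 0.80, 0.95, 0.60, 0.00]
product_2 = [0.0, 0.90, 1.00, 0.75, 0.10]
uncertainty = median_abs_difference(product_1, product_2)

# A regional model bias can then be compared against this estimate.
small_bias = within_uncertainty(0.70, 0.78, uncertainty)
large_bias = within_uncertainty(0.40, 0.78, uncertainty)
```

Because the median hides large local differences, a bias that exceeds this estimate in a single cell is not necessarily outside the local observational uncertainty.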
Our analysis has also shown that internal variability gives rise to much larger uncertainty in the estimate of model quality than do the differences between individual satellite retrievals. This is particularly true for the assessment of modeled trends in sea-ice coverage, which usually vary rapidly in time (see also Notz et al., 2013). In the light of this finding, for model-evaluation purposes an integrative assessment of the quality of modeled processes and statistical distributions is more insightful than a simple comparison of modeled time series. This includes, for example, an assessment of seasonal changes in the ice-thickness distribution, the response of the ice cover to divergent wind fields, and an assessment of the statistical distribution of sea-ice concentration as carried out as part of the present study. Through such more focused


Conclusions
By analyzing the differences between sea-ice extent and the geophysically more relevant sea-ice area for model-evaluation purposes, we have found the following:
1. The summer sea-ice cover in the Arctic is seen as a compact ice cover with a high percentage of high-concentration sea ice by satellite retrievals based on the Bootstrap and the ASI algorithm. In contrast, satellite retrievals based on the NASA-Team algorithm see a loose ice cover with a low percentage of high-concentration sea ice. In winter, all satellite algorithms see a compact ice cover.
2. About half of the CMIP5 models simulate a compact ice cover in summer, while the other half simulates a loose ice cover in summer. This difference can be understood by the different distribution of excess heat for lateral melting of the ice or for thinning of the ice. In winter, all models simulate a compact ice cover.
3. Simulations with a compact ice cover have a similar bias in sea-ice extent and in sea-ice area relative to satellite retrievals based on the Bootstrap algorithm or the ASI algorithm. Simulations with a loose ice cover with a negative bias in sea-ice extent have a smaller bias in sea-ice area, while simulations with a positive bias in sea-ice extent have a larger bias in sea-ice area. This is caused by the fact that the difference between sea-ice extent and sea-ice area is larger in simulations with a loose ice cover than it is for a compact ice cover.
4. Models that simulate too fast a retreat of sea-ice extent generally have a smaller bias in simulated sea-ice-area trends relative to Bootstrap retrievals. Models that simulate too slow a retreat of sea-ice extent generally have a larger bias in sea-ice-area trends. This is independent of the compactness of the ice cover.
5. The error that is introduced in the calculation of sea-ice extent and sea-ice area by different grid geometries is negligible.
6. Internal variability of sea-ice trends in fully coupled models is so large that all differences in trends between observations and simulations of CMIP5 models for the period 1979-2005 (and, indeed, until 2012, see Sect. 3.6) could be caused by internal variability. Many models show in one simulation a much stronger trend than has been observed, while a different simulation with the same model and the same forcing shows for slightly different initial conditions a much weaker trend than has been observed.
7. Internal variability can also explain much of the differences between modeled and retrieved sea-ice extent.For sea-ice area, however, about 80 % of all simulations fall outside the range of estimated internal variability for the period 1979-2005.
8. Because biases in sea-ice extent can give misleading results regarding model quality, we recommend that biases in sea-ice area are also taken into account in the assessment of model quality. We estimate the current uncertainty in satellite-retrieved sea-ice area to be 1 million km² throughout the year. The uncertainty in retrieved trends is less than 0.4 million km² decade⁻¹ throughout the year. The median uncertainties in retrieved sea-ice concentration range from below 5 % throughout winter and spring to about 10 % in summer.
9. There is little correlation between biases in the integrative measures sea-ice extent and sea-ice area and the mean absolute bias in sea-ice concentration. This is caused by the fact that for the integrative measures, regional positive and negative biases can cancel. The average absolute bias in sea-ice concentration relative to observations is therefore a better estimator of model quality than are either sea-ice area or extent.
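The cancellation of regional biases named in the last finding can be illustrated with a minimal sketch. It assumes a hypothetical grid of four equal-area cells; the fields and function names are invented for illustration.

```python
THRESHOLD = 0.15  # 15 % cutoff used for sea-ice extent


def extent_bias(model, obs, cell_area):
    """Model minus observed sea-ice extent."""
    def extent(conc):
        return sum(a for c, a in zip(conc, cell_area) if c > THRESHOLD)
    return extent(model) - extent(obs)


def area_bias(model, obs, cell_area):
    """Model minus observed sea-ice area."""
    return sum((m - o) * a for m, o, a in zip(model, obs, cell_area))


def mean_absolute_bias(model, obs):
    """Average absolute concentration difference over all grid cells."""
    return sum(abs(m - o) for m, o in zip(model, obs)) / len(model)


cell_area = [1.0, 1.0, 1.0, 1.0]   # assumed equal-area cells
observed = [0.8, 0.8, 0.2, 0.2]    # compact ice in two cells, loose in two
modeled = [0.5, 0.5, 0.5, 0.5]     # too little ice in two cells, too much in two

bias_extent = extent_bias(modeled, observed, cell_area)
bias_area = area_bias(modeled, observed, cell_area)
bias_concentration = mean_absolute_bias(modeled, observed)
```

In this constructed case, both integrative measures indicate no bias at all, while the mean absolute bias in concentration is 30 %.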

context of uncertainty that arises because of the internal variability of the Arctic climate system.
German Climate Computing Centre (DKRZ) whose data portal facilitated data access tremendously. NASA Team and Bootstrap algorithm sea-ice concentration data were obtained from the National Snow and Ice Data Center (NSIDC), Boulder, Colorado, US. ASI algorithm sea-ice concentration data were obtained from the Internet on-line information page http://icdc.zmaw.de/ maintained by the Center of Excellence for Climate System Analysis and Prediction (CliSAP), University of Hamburg, Germany. This work has been funded through a Max-Planck Research-Group Fellowship. The service charges for this open access publication have been covered by the Max Planck Society.

Fig. 1. A fictitious example to illustrate the possible non-intuitive relationship between sea-ice area and sea-ice extent. (a) In the observations, the ice pack is distributed such that two grid cells are covered by more than 15 % ice. (b) In a fictitious model simulation, less sea ice than in the observations is distributed such that three grid cells are covered by more than 15 % ice. (c) In a fictitious model simulation, more sea ice than in the observations is distributed such that only one grid cell is covered by more than 15 % ice.
Fig. 7. (a) September and (b) March sea-ice area versus sea-ice extent in models and satellite retrievals. The red line connects all value pairs that have the same percentage bias in sea-ice extent and in sea-ice area relative to the Bootstrap retrievals. The gray shading indicates a ±10 % range around the values obtained from the Bootstrap retrievals.

Fig. 8. Schematic to explain the findings in Fig. 7: because of the smaller difference between sea-ice extent and sea-ice area in observations with a compact ice cover than in simulations with a loose ice cover, models with a loose ice cover and slightly too large a simulated sea-ice extent result in a comparably small bias in simulated sea-ice area.

Fig. 11. (a) Sea-ice extent and (b) sea-ice area versus number of ice-covered grid cells for observational data of September 2007 interpolated onto all model grids examined for this study.

Fig. 13. (a) March and (b) September anomaly in sea-ice area as retrieved from satellites for the period 1979-2010. Different colors denote different algorithms or satellites. (c) Seasonal cycle in sea-ice area and (d) in sea-ice-area trend as retrieved from satellites for the period 2003-2010. (e) Uncertainty in retrieved sea-ice area and (f) in retrieved trend of sea-ice area.