How does internal variability influence the ability of CMIP5 models to reproduce the recent trend in Southern Ocean sea ice extent

Observations over the last 30 yr have shown that the sea ice extent in the Southern Ocean has slightly increased since 1979. The mechanisms responsible for this positive trend have not been well established yet. In this study we tackle two related issues: is the observed positive trend compatible with the internal variability of the system, and do the models agree with what we know about the observed internal variability? For that purpose, we analyse the evolution of sea ice around the Antarctic simulated by 24 different general circulation models involved in the 5th Coupled Model Intercomparison Project (CMIP5), using both historical and hindcast experiments. Our analyses show that CMIP5 models respond to the forcing, including the one induced by stratospheric ozone depletion, by reducing the sea ice cover in the Southern Ocean. Some simulations display an increase in sea ice extent similar to the observed one. According to the models, the observed positive trend is compatible with internal variability. However, the models strongly overestimate the variance of sea ice extent, and the initialization methods currently used in models do not systematically improve the simulated trends in sea ice extent. On the basis of those results, a critical role of internal variability in the observed increase of sea ice extent in the Southern Ocean cannot be ruled out, but current model results appear inadequate to test this hypothesis more precisely.


Introduction
The way climate models reproduce the observed characteristics of sea ice has received a lot of attention (e.g. Flato, 2004; Arzel et al., 2006; Parkinson et al., 2006; Lefebvre and Goosse, 2008a; Sen Gupta et al., 2009). One conclusion of those studies is that the models' skill is higher in the Northern Hemisphere than in the Southern Hemisphere. In particular, simulations performed for the 3rd Coupled Model Intercomparison Project (CMIP3) are generally able to reproduce relatively well the timing of the seasonal cycle of Southern Ocean sea ice extent, but fail in simulating the observed amplitude (Parkinson et al., 2006). Furthermore, the models are usually unable to simulate the observed increase in Southern Ocean sea ice extent (e.g. Arzel et al., 2006; Parkinson et al., 2006), which is estimated at 11 200 ± 2680 km² yr⁻¹ between 1979 and 2006 (Comiso and Nishio, 2008). At the regional scale, the 1979-2006 trend in observed sea ice extent is positive in all the sectors of the Southern Ocean, except in the Bellingshausen-Amundsen seas sector, and the Ross Sea sector exhibits the largest positive trend (e.g. Cavalieri and Parkinson, 2008; Comiso and Nishio, 2008). Lefebvre and Goosse (2008a) have studied the trend simulated by several CMIP3 models in the different sectors of the Southern Ocean, and they have shown that these models were not able to reproduce this observed spatial structure.
The observed increase in sea ice extent during the past decades is statistically significant at the 95 % significance level (e.g. Cavalieri and Parkinson, 2008). However, its potential causes are still debated. We do not know the part of this trend that can be attributed to external forcing and the one that is due to natural variability. This issue has already been addressed for the Arctic sea ice extent (e.g. Kay et al., 2011), but remains poorly investigated for the Southern Ocean sea ice. Several studies dealing with the potential role of the forced response have pointed out the relationship between stratospheric ozone depletion over the past few decades (Solomon, 1999) and changes in the atmospheric circulation at high latitudes (e.g. Turner et al., 2009; Thompson et al., 2011). Indeed, variations of sea ice extent in the Southern Ocean are strongly influenced by changes in the atmospheric circulation (e.g. Holland and Raphael, 2006; Goosse et al., 2009b). However, the link between atmospheric circulation and the sea ice extent integrated over the Southern Ocean is not straightforward (e.g. Lefebvre and Goosse, 2008b; Stammerjohn et al., 2008; Landrum et al., 2012) and several recent studies came to the conclusion that the stratospheric ozone depletion does not lead to an increase in the sea ice extent (e.g. Sigmond and Fyfe, 2010; Smith et al., 2012; Bitz and Polvani, 2012). A second potential cause of the observed expansion of sea ice cover relies on an enhanced stratification of the ocean, which would inhibit the heat transfer to the surface. This strengthened stratification is mainly due to a freshening of the surface water, triggered by an increase in the precipitation over the Southern Ocean, the melting of ice shelves, and changes in the production and transport of sea ice (e.g. Bitz et al., 2006; Zhang, 2007; Goosse et al., 2009b; Kirkman and Bitz, 2010).
Liu and Curry (2010) pointed out that an enhanced hydrological cycle may also increase the snowfalls at high latitudes in the Southern Ocean. In that case, the snow cover on thicker sea ice would raise the surface albedo, strengthen the insulation between the atmosphere and the ocean, and thus would protect the sea ice from melting. Nevertheless, this mechanism mainly impacts thick ice because for thin ice, the higher snow load leads to seawater flooding and to the formation of snow ice. This decreases the effect of the initial increase in snow thickness.

Published by Copernicus
Another hypothesis suggests that the positive trend in the Southern Ocean sea ice extent could arise from the internal variability of the system that masks the warming signal in the Southern Ocean that should characterize the response to an increase in greenhouse gas concentrations, according to climate models. In this framework some recent studies have drawn attention to the importance of distinguishing the lack of agreement between models from the lack of significant signal (e.g. Tebaldi et al., 2011; Deser et al., 2012). A trend can be significant from a statistical point of view, i.e. if it is above a threshold of significance computed through a statistical test. This does not imply that its value is outside of the range that can be reached by the internal variability. For instance, Landrum et al. (2012) have pointed out that large interannual variability in simulated sea ice concentration leads to late 20th century trends in sea ice concentration that are not always statistically significant for individual members of an ensemble simulation. The observed positive trend of Southern Ocean sea ice extent is statistically significant at the 95 % level for the last 30 yr (e.g. Cavalieri and Parkinson, 2008). However, this time period is too short to properly assess the multidecadal variability of the system. Consequently, we cannot estimate if this trend is exceptional or if similar conditions have already occurred many times in the recent past. The period spanning the last 30 yr during which sea ice cover slightly expanded in the Southern Ocean might follow a large melting that may have happened before 1979 (e.g. de la Mare, 1997, 2009; Cavalieri et al., 2003; Curran et al., 2003; Cotté and Guinet, 2007; Goosse et al., 2009b). This suggests that multidecadal variability in the Southern Ocean is large, but the available data do not allow a quantitative estimation of its value. Sparse data from the 1960s are currently being processed (e.g. Meier et al., 2013), making observations of the sea ice extent available over a longer time period. Further analyses based on these prolonged time series might therefore improve our knowledge of the internal variability of the sea ice extent. Nevertheless, until longer continuous time series are available, the results from model simulations appear to be crucial to balance the lack of observations. Provided that models are compatible with the available observations, they can help address the question of whether the observed positive trend in the Antarctic sea ice extent is due to external forcing, to internal variability, or to both.
The decreasing trend in many model simulations may be due to a misrepresentation of the response of the circulation and/or of the hydrological cycle to the forcing. Alternatively, the observed changes may belong to the range of the trends that can be attributed to the internal variability of the system. In this hypothesis the positive trend observed over the last decades is just one particular realization among all the possible ones. A negative trend in one model's simulations does not imply necessarily a disagreement between model and data as another simulation with the same model (another member of an ensemble, for instance) would likely display a positive one. Furthermore, if this is valid and if the internal variability is to some extent predictable, an adequate initialization of the system could lead to a better simulation of the evolution of the sea ice cover around the Antarctic.
The Cryosphere, 7, 451-468, 2013 www.the-cryosphere.net/7/451/2013/
In this paper we examine outputs from general circulation models (GCMs) following the 5th Coupled Model Intercomparison Project (CMIP5) protocol. To further study the role of the internal variability in the increasing trend in sea ice extent in the Southern Ocean and in the apparent disagreements between models and observations, we deal with two kinds of simulations: historical and hindcast (or decadal) simulations. The first ones are driven by external forcing and are initialized without observational constraints. They are used to assess how well each model simulates the observed mean state, variability and trends in sea ice concentration and extent. The objective is to study the possible links between the internal variability of the system and the simulated trend in sea ice extent. Our purpose is, on the one hand, to test if the internal variability of the models agrees with the one of the observations. On the other hand, we check if the observed positive trend stands in the range of trends provided by the models' internal variability. Analysing the mean state also appears to be important here because of its impact on the simulated variability (e.g. Goosse et al., 2009a). In addition to those points related to the variability of the system, the way stratospheric ozone is taken into account in models is also discussed to estimate if this has a significant impact on the simulated trends. However, it is out of the scope of this study to discuss specific mechanisms that link the sea ice extent and the stratospheric ozone variations. The second kind of simulations (the hindcasts) are also driven by external forcing, but, in contrast to the historical simulations, are initialized through data assimilation of observations. 
Consequently, these simulations allow us to assess how the state of the system in the early 80s impacts the variability of the models and their representation of the trend over the last 30 yr. Idealized model studies have shown high potential predictability at decadal time scales in the Southern Ocean (e.g. Latif et al., 2010), i.e. models have deterministic decadal variability, in particular for surface temperatures (Pohlmann et al., 2004). The predictive skill of the models at decadal time scales is also discussed here to see if this potential predictability is confirmed in real applications.
An initial investigation of the results of CMIP5 models has shown that, in agreement with previous studies related to CMIP3 models (e.g. Lefebvre and Goosse, 2008a), current GCMs do not simulate a spatial structure of the trend in sea ice extent similar to the observed one. This spatial structure might as well arise from the internal variability. In such a case, models would not have to fit the observed pattern as discussed above. However, this remains a hypothesis and we have chosen to focus on the sea ice extent in the whole Southern Ocean rather than in the individual sectors to avoid the additional complexity associated with the spatial structure of the changes. Models and observation data are briefly presented in Sect. 2. The time period we analyse is limited by the available observations. For the Southern Ocean, validation data are quite sparse before 1979. We therefore examine outputs between 1979 and 2005. Results provided by models' historical simulations are presented and discussed in Sect. 3. The analyses of hindcast simulations are described in Sect. 4. Finally, Sect. 5 summarizes our results and proposes conclusions.

Models and observation data
The models' data were obtained from the CMIP5 (Taylor et al., 2011) multi-model ensemble: http://pcmdi3.llnl.gov/esgcet/home.htm. We have analysed results of historical simulations from 24 models which have the required data available. Among these models, 10 provide results for hindcast simulations. Both historical and hindcast simulations consist of ensemble simulations of various sizes. Historical runs finish in 2005 and we have decided not to prolong them with the RCP (Representative Concentration Pathways) simulations. Given that the latter contain fewer members, it would have made the analysis of the internal variability less reliable. Models and their respective modelling groups are listed in Table 1, along with the number of members in each model's historical and hindcast simulations. The models have different spatial resolutions and representations of physical processes. The spatial resolution of the models' components is summarized in Table S1 of the Online Supplement Tables of this paper. A reference is also given for more complete documentation.
We give specific information on the treatment of ozone in Table 2 as a basis for the discussion presented in Sect. 3.3. The AC&C/SPARC ozone database (Cionni et al., 2011) is used to prescribe ozone in most of the models without interactive chemistry. In this database, stratospheric ozone for the period 1979-2009 is zonally and monthly averaged. It depends on the altitude and it takes solar variability into account. Whether they have interactive chemistry or prescribed stratospheric ozone, the 24 models analysed in this study thus take into account the stratospheric ozone depletion in their historical simulations. This is an improvement since the CMIP3 simulations. Indeed, nearly half of the CMIP3 models prescribed a constant ozone climatology (Son et al., 2008). Nevertheless, some of the models have a coarse atmosphere resolution which sometimes does not encompass the whole stratosphere. In that case, processes related to the interaction between radiation and ozone as well as the exchange between the stratosphere and the troposphere may be represented rather crudely.
The hindcast simulations were initialized from a state that has been obtained through a data assimilation procedure, i.e. constrained to be close to some observed fields. There is a wide range of data assimilation methods, but most of the models involved in CMIP5 assimilate observations through nudging. This method consists of adding to the model equations a term that slightly pulls the solution towards the observations (Kalnay, 2007). MIROC4h and MIROC5 incorporate observations in their data assimilation experiments by an incremental analysis update (IAU). Details about this method can be found in Bloom et al. (1996). Table 3 summarizes the data assimilation method corresponding to each model as well as the variable it assimilates. The relevant documentation was not available to us for CCSM4, FGOALS-g2 and MRI-CGCM3. All the models for which we have the adequate information, except BCC-CSM1.1 and CNRM-CM5, assimilate anomalies. Those anomalies are calculated for both model and observations by subtracting their respective climatology, computed over the same reference period. Working with anomalies does not prevent model biases, but it avoids the initialization of the model with a state which is too far from its own climatology and thus limits model drift (e.g. Pierce et al., 2004; Smith et al., 2007; Troccoli and Palmer, 2007; Keenlyside et al., 2008; Pohlmann et al., 2009). The model skill is measured through its representation of the sea ice concentration (the fraction of a grid cell covered by sea ice) and sea ice extent (the sum of the areas of all grid cells having an ice concentration of at least 15 %). We consider the sea ice extent over the whole Southern Ocean, and for models it has been calculated on the original models' grids. For each model providing an ensemble of simulations, the model mean is the average over the members belonging to the ensemble. 
The multi-model mean is then derived by computing the mean of the individual model means without applying any weighting to the models. Sea ice concentration comes from the satellite observations of the National Snow and Ice Data Center (NSIDC) (Comiso, 1999, updated 2008). The sea ice extent is then derived from this dataset following the method described in Cavalieri et al. (1999) and applied by Cavalieri and Parkinson (2008) for the period 1979-2006.
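The two definitions above (extent as the summed area of cells with at least 15 % ice concentration, and an unweighted multi-model mean built from per-model ensemble means) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the flat-list grid layout and the toy numbers are hypothetical and are not taken from the paper or from any CMIP5 tool.

```python
from statistics import fmean


def sea_ice_extent(concentration, cell_area, threshold=0.15):
    """Sum the areas of all grid cells whose ice concentration is >= threshold.

    concentration: flat list of fractions in [0, 1], one value per grid cell
    cell_area:     flat list of cell areas (e.g. in km^2), same length
    """
    if len(concentration) != len(cell_area):
        raise ValueError("concentration and cell_area must have the same length")
    return sum(a for c, a in zip(concentration, cell_area) if c >= threshold)


def multi_model_mean(member_values_by_model):
    """Average the members of each model first, then average the model means
    with equal weight, so that large ensembles do not dominate."""
    model_means = [fmean(values) for values in member_values_by_model.values()]
    return fmean(model_means)


# Toy numbers for illustration only (not data from the paper).
toy_concentration = [0.90, 0.14, 0.15, 0.00]
toy_area = [100.0, 100.0, 50.0, 200.0]
toy_extent = sea_ice_extent(toy_concentration, toy_area)

toy_models = {"model_A": [10.0, 12.0, 14.0], "model_B": [20.0]}
toy_mmm = multi_model_mean(toy_models)
```

In the toy case only the first and third cells reach the threshold, and the single-member model counts as much as the three-member one in the final average.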

Historical simulations
The historical simulations are driven by external forcing and are initialized without observational constraints. These simulations are here used to assess the mean state and the variability of the models using recent observations.

Mean state and variability
In a first step, we analyse the mean sea ice concentration over the period 1979-2005. Figure 1 shows the multi-model mean of sea ice concentration in the Southern Ocean and compares the simulated sea ice edge to the observed one. Results are given for February (September), the month during which the observed sea ice extent reaches its minimum (maximum). In February the multi-model mean underestimates the sea ice cover in the Bellingshausen and Amundsen Seas as well as in the eastern part of the Ross Sea. In the western Ross Sea and in small parts of the Weddell Sea and of the Indian Ocean sector, the multi-model mean overestimates the sea ice extent. In September the shape of the sea ice edge computed from the multi-model mean roughly fits the observations. However, the multi-model mean overestimates the sea ice cover everywhere except in the Indian Ocean sector and in the eastern part of the Ross Sea sector. This reasonable multi-model mean extent is the result of the average of a wide range of individual behaviours. To account for this variety of mean model states, we have plotted, for individual models, the mean of sea ice extent of each month of the year during the period 1979-2005. Figure 2a confirms that the multi-model mean fits the observations quite well, especially during winter months. However, the seasonal cycle of sea ice extent of the various models is largely spread around the observations and the timing of the minimum/maximum sea ice extent varies from one model to the other. In summer, 16 of the models underestimate the sea ice extent. In particular, CNRM-CM5 and MIROC5 are nearly sea ice free during summer. The latter strongly underestimates the ice extent all over the year, and its winter sea ice extent is smaller than some models' summer sea ice extent. On the contrary, CCSM4 and CSIRO-Mk3.6.0 overestimate the sea ice extent during the whole year, especially during summer. In winter, when the simulated sea ice cover reaches its maximum, the sea ice extent ranges from approximately 5 × 10⁶ to 24 × 10⁶ km², while the observations display a sea ice extent of about 17 × 10⁶ km². Ten models underestimate the sea ice extent in September.
[Table 3 fragment] Information not available to us. CNRM-CM5: Nudging of 3-D ocean temperature and salinity (raw data) as a function of depth and space, sea surface temperature and salinity nudging (raw data). Swingedouw et al. (2012). MIROC4h: Incremental analysis update (IAU) of 3-D ocean temperature and salinity (anomalies).
Since the internal variability of the climate system may also have played a role in the observed expansion of sea ice cover, we assess its representation in models by computing the standard deviation of the sea ice extent for each month of the year, over the period 1979-2005 (Fig. 2b). Here, to obtain both the ensemble mean of each model and the multi-model mean of standard deviations, an average of the individual standard deviations has been performed. We have chosen to detrend data before computing the standard deviation in order to suppress the direct impact of a trend on the standard deviation that could obscure our analysis of the potential links between those two variables discussed in Sect. 3.2. The monthly standard deviation indicates that the variability strongly differs between models. In February, 15 models have a standard deviation higher than the observed one, and all of the 24 models overestimate the standard deviation during September. Consequently, the multi-model mean of standard deviations does not fit very well the observations. It overestimates the standard deviation all over the year, particularly during winter. The interannual variability in some models is significantly larger during winter months than during summer months. As a result these models have a pronounced seasonal cycle of their standard deviation, in contrast to the observations, which display a relatively constant value throughout the year. The causes of the overestimated winter variability of modelled sea ice have not been identified yet. We have performed some preliminary analyses that indicate that, for some models, changes in the oceanic convection could be associated to the higher winter variability (not shown). The oceanic or the atmospheric circulations may also play a role in the high winter sea ice variability simulated by the models. However, this aspect is out of the scope of the present study and it will be addressed in future work.
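The detrending step described above can be sketched in a few lines. This is a minimal illustration, assuming a least-squares linear trend is removed and the sample standard deviation (n − 1 denominator) is taken of the residuals; the paper does not state which estimator was used, and all function names here are hypothetical.

```python
from statistics import fmean, stdev


def linear_fit(y):
    """Ordinary least-squares slope and intercept of y against 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = fmean(y)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def detrended_std(y):
    """Standard deviation of the residuals left after removing the linear trend."""
    slope, intercept = linear_fit(y)
    residuals = [v - (intercept + slope * t) for t, v in enumerate(y)]
    return stdev(residuals)


def mean_of_member_stds(members):
    """Average of the detrended standard deviations of individual members
    (the averaging of individual standard deviations described in the text)."""
    return fmean([detrended_std(m) for m in members])


# Toy series (made-up values): the same fluctuations with and without a trend.
flat = [3.0, 1.0, 4.0, 1.0, 5.0]
trending = [v + 10.0 * t for t, v in enumerate(flat)]
```

Adding a linear trend to a series leaves its detrended standard deviation unchanged, which is the property that keeps the trend from leaking into the variability estimate.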
The analysis of Fig. 2b tells us two important things. On the one hand, it points out the inability of the majority of models to reproduce the observed interannual variability. In particular, they all overestimate the winter interannual variability. On the other hand, it highlights the fact that some models are characterized by a very different magnitude of the interannual variability from one season to the other. In order to avoid a loss of information, we have thus chosen in the following analysis to work with seasonal means rather than with annual means and to treat summer and winter separately.

Trend over the period 1979-2005
For the historical simulations, we have computed for each member of the ensemble the trend from 1979 to 2005 of summer (average of January, February and March) and winter (average of July, August and September) sea ice extent. Each trend has been computed through a linear regression of the yearly values (between 1979 and 2005) of the summer or winter sea ice extent. We have checked if the trends were significant at the 95 % level (see Tables S2 and S3 of the Online Supplement Tables of this paper). The autocorrelation of the residuals has been taken into account in the computation of the standard deviation of each trend as well as in the number of degrees of freedom used to determine the threshold of significance, as proposed by Santer et al. (2000) and applied, for instance, by Stroeve et al. (2012). In addition to a direct evaluation of model skill, one of our goals is to analyse if a relationship can be established between the mean state, the interannual variability simulated by the model and the ability to reproduce the observed trend.
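The trend computation with an autocorrelation-adjusted uncertainty can be sketched as below. It follows the commonly used effective-sample-size form of the adjustment, n_eff = n (1 − r1) / (1 + r1), with r1 the lag-1 autocorrelation of the regression residuals. This is an illustration rather than the authors' code: the function name is hypothetical, and the final significance decision (comparing |slope| / std_err with a Student t threshold for n_eff − 2 degrees of freedom) is left out because it needs a t-distribution table or a statistics library.

```python
from math import sqrt
from statistics import fmean


def trend_with_adjusted_error(y):
    """Least-squares trend of yearly values, with a standard error adjusted
    for lag-1 autocorrelation of the residuals (effective sample size).

    Returns (slope per time step, adjusted standard error, effective sample size).
    """
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = fmean(y)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sxx
    intercept = y_mean - slope * t_mean
    residuals = [v - (intercept + slope * t) for t, v in enumerate(y)]

    # Lag-1 autocorrelation of the residuals (their mean is zero by construction).
    sse = sum(r * r for r in residuals)
    if sse > 0:
        r1 = sum(a * b for a, b in zip(residuals[:-1], residuals[1:])) / sse
    else:
        r1 = 0.0

    # Effective sample size: fewer independent values when r1 > 0.
    n_eff = n * (1 - r1) / (1 + r1)
    dof = n_eff - 2
    if dof <= 0:
        raise ValueError("series too short or too autocorrelated for this estimate")

    std_err = sqrt(sse / dof / sxx)
    return slope, std_err, n_eff


# Toy yearly series (made-up values, not observations).
toy_series = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0]
toy_slope, toy_err, toy_neff = trend_with_adjusted_error(toy_series)
```

Note that a negative r1 makes n_eff larger than n in this form; whether to cap n_eff at n is a design choice that this sketch leaves open.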
Observations show that the summer sea ice extent ex[...] in contrast to the trend of the annual mean (not shown). In Fig. 3a it appears that almost all of the simulations performed with the 24 models fail in simulating the sign of this observed trend. Only three models (FGOALS-g2, GFDL-CM3 and GISS-E2-R) have an ensemble mean with a positive trend, while most of them simulate a relatively large negative trend. For four additional models (CCSM4, CSIRO-Mk3.6.0, HadCM3 and MRI-CGCM3), some ensemble members display a positive trend. Nevertheless, CCSM4, CSIRO-Mk3.6.0 and FGOALS-g2 have a mean summer sea ice extent much larger than what is observed, while GFDL-CM3 and GISS-E2-R are well below the observations. Moreover, CCSM4 and CSIRO-Mk3.6.0 have an interannual variability which is on average twice the one of the observations.
[Fig. 3 caption, beginning missing] [...] deviation (b, d). The first row corresponds to summer (JFM), the second to winter (JAS). The different colours correspond to the historical simulations from 24 different models. For each colour, the small dots refer to individual model members and the symbol specified in the legend is for the model ensemble mean. The number of members in each model is indicated in brackets in the legend. Orange refers to multi-model means, for which the diamond sign is for the average over all the models, the circle sign is for the mean of models with interactive chemistry (in bold in Table 2) and the triangle sign is for the mean of models with 35 atmospheric levels or more on the vertical. The black square is for the observations (Cavalieri and Parkinson, 2008), surrounded by 2 standard deviations (dark-grey rectangle). The horizontal (vertical) solid black line with the light-grey shade refers to the trend (mean/standard deviation) of the observations along with 2 standard deviations. The computed standard deviation of the observed trend takes into account the autocorrelation of the residuals (see for instance Santer et al., 2000; Stroeve et al., 2012).
For summer sea ice extent, some models display a standard deviation that can be quite different between members (Fig. 3b). In contrast, the individual means of ensemble members performed with the same model are relatively similar (Fig. 3a). The range of values reached by the trends of the different members belonging to one model's simulation also differs strongly from one model to the other (Fig. 4a). We quantify the various ranges provided by the different models using the ensemble standard deviation of the trends, for models that have at least 3 members in their historical simulations. This ensemble standard deviation of the trends stands between 26 000 km² per decade for MIROC-ESM and 470 000 km² per decade for BCC-CSM1.1 (see Ta[...] 166 000 km² per decade. If we consider this average as an estimate of the range of the trend that can be associated with internal variability, the observed positive trend of 149 000 km² per decade is well among the values that could be due to natural processes alone and compatible with the available ensemble of model results. Nevertheless, given that many models have an interannual variability that is much larger than the one of the observations, it is uncertain whether the range of the trends they provide is representative of reality. The comparison between the trend, the mean extent, and the standard deviation does not display any clear link in summer between those variables: some of the models that simulate an increase in the ice extent in at least one of their members overestimate the observed mean and variability, some underestimate it. Figure 3b also underlines the fact that models with little ice during summer often have a small interannual variability of summer sea ice extent, in agreement with the results of Goosse et al. (2009a). Moreover, the spread of the sea ice extent trends and standard deviations of members belonging to one model ensemble grows with the mean summer sea ice extent.
Winter sea ice extent also increased between 1979 and 2005 by approximately 86 000 km² per decade. Two models have an ensemble mean whose trend is positive: GFDL-CM3 and IPSL-CM5A-MR (Fig. 3c). [...] to the observed one, but it strongly underestimates the mean winter sea ice extent. It is also an ensemble whose members are highly scattered along the trend axis, three having a positive trend (from approximately 470 × 10³ to 1300 × 10³ km² decade⁻¹) and two having a negative one (from approximately −290 × 10³ to −1120 × 10³ km² decade⁻¹). The IPSL-CM5A-MR ensemble is made up of one member only. Its trend and its mean are both close to observations.
[Figure caption fragment] [...] (Cavalieri and Parkinson, 2008). The vertical and the horizontal black bars are for the standard deviation of the observed trend, which are barely distinguishable due to their small values. The dashed line represents the line y(x) = x. The computed standard deviations of the trends take into account the autocorrelation of the residuals (see for instance Santer et al., 2000; Stroeve et al., 2012).
The 22 remaining models all have an ensemble mean showing a decrease in winter sea ice extent. However, as noticed for summer, a few of them have ensemble members displaying positive trends (BCC-CSM1.1, CSIRO-Mk3.6.0, IPSL-CM5A-LR and MRI-CGCM3). Two of the three BCC-CSM1.1 historical simulation members present a positive trend. The last one has a very negative trend, reaching −2520 × 10³ km² decade⁻¹. In contrast, the mean sea ice extent does not vary much between members of BCC-CSM1.1, all of them being larger than the observations. The CSIRO-Mk3.6.0 ensemble contains 10 members. They all simulate a mean sea ice extent in winter relatively close to the observations. Only one member shows an increase in sea ice extent. Figure 3d confirms that all 24 models overestimate the interannual variability in winter. It also underlines the fact that simulations that have an ensemble mean of the trends close to the observed one generally have a standard deviation which is much larger than the one of the observations. The single IPSL-CM5A-MR member, which has a trend and a mean state relatively close to the observations, has a standard deviation equal to 0.85 × 10⁶ km², while the observed standard deviation stands around 0.25 × 10⁶ km². GFDL-CM3 is a model that has a very high standard deviation (around 4 times the standard deviation of the observations). It is also a model with a large range of trends reached by its members (Fig. 4b).
For winter sea ice extent, considering again models that have at least 3 members in their historical simulations, the ensemble standard deviation of the trends varies between 100 × 10³ km² decade⁻¹ for FGOALS-s2 and 1704 × 10³ km² decade⁻¹ for BCC-CSM1.1 (see Table S3 of the Online Supplement Tables of this paper). On average, this ensemble standard deviation of the trends equals 428 000 km² decade⁻¹. As for summer, if this value is representative of the range of trends due to internal variability, the observed trend of 86 000 km² per decade appears compatible with natural processes and the model ensemble. However, the model biases in the representation of the variance in winter during the last 30 yr are even larger than in summer, making this estimate of the uncertainty based on model results very questionable.
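The spread-based reasoning used for both seasons can be sketched as follows. This is a rough illustration only: the function names and the toy member trends are hypothetical, and treating "observed trend no larger than the mean ensemble spread" as the compatibility criterion is a simplification of the qualitative argument in the text. The observed trends (149 and 86) and the mean spreads (166 and 428), all in 10³ km² per decade, are the values quoted above.

```python
from statistics import fmean, stdev


def ensemble_trend_spread(trends_by_model, min_members=3):
    """Standard deviation of member trends for each model with enough members,
    plus the unweighted average of those standard deviations."""
    spreads = {
        name: stdev(trends)
        for name, trends in trends_by_model.items()
        if len(trends) >= min_members
    }
    if not spreads:
        raise ValueError("no model has enough members")
    return spreads, fmean(list(spreads.values()))


def within_spread(observed_trend, spread):
    """Crude check: is the magnitude of the observed trend no larger than the spread?"""
    return abs(observed_trend) <= spread


# Toy member trends (made-up values, units of 10^3 km^2 per decade).
toy_trends = {
    "model_A": [-100.0, 0.0, 100.0],
    "model_B": [-300.0, -100.0, 100.0, 300.0],
    "model_C": [50.0, 60.0],  # skipped: fewer than 3 members
}
toy_spreads, toy_mean_spread = ensemble_trend_spread(toy_trends)

# Values quoted in the text (10^3 km^2 per decade).
summer_compatible = within_spread(149.0, 166.0)
winter_compatible = within_spread(86.0, 428.0)
```

As the text stresses, such a check is only as good as the simulated variability: if the models overestimate the variance, the spread is inflated and the test becomes too permissive.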
From this analysis of historical simulations, it appears that among all the simulations analysed, only a few present a positive trend of the sea ice extent, for both summer and winter. [...] events, but are within the range of internal variability according to model results. The important point here is that these positive trends are generally found in models that overestimate the interannual variability. Because of their high interannual variability, such models can provide a large range of possible trends, some of them agreeing with the observations.

Stratospheric ozone
CMIP5 models all take into account the stratospheric ozone depletion that occurred during the last 30 yr (see Table 2 for details). However, this improvement in the treatment of stratospheric ozone compared to CMIP3 does not lead to major changes in their representation of the trend in sea ice extent in the Southern Ocean. To go a step further, we discuss if the way stratospheric ozone is treated has an influence on the results. The models with interactive chemistry (activated during the simulation or used in an offline simulation to compute the ozone dataset) and the ones with higher atmospheric vertical resolution (≥ 35 layers) have on average a slightly smaller extent of sea ice in summer (Fig. 3a, respectively circle and triangle orange symbols). In winter the models with high atmospheric resolution underestimate the sea ice extent, while the ones with interactive chemistry overestimate it (Fig. 3c). The influence on the trend is hardly detected. This shows that, on average, the inclusion of an interactive chemistry or an increased vertical resolution does not make major differences compared to other models. Looking now at individual models, we have seen in Sect. 3.2 that CSIRO-Mk3.6.0, GFDL-CM3 and IPSL-CM5A-MR provide results for sea ice extent trends in winter in relatively good agreement with observations, but with much too high a standard deviation for GFDL-CM3 and IPSL-CM5A-MR. CSIRO-Mk3.6.0 has a quite coarse resolution in its atmosphere component (18 vertical layers) and prescribes the ozone from the AC&C/SPARC database. GFDL-CM3 and IPSL-CM5A-MR have a finer resolution (48 and 39 layers, respectively). They both have interactive chemistry, but IPSL-CM5A-MR treats the interaction between ozone and climate through a semi-offline approach. 
Again, from the available ensemble, the representation of ozone in models does not seem to be the dominant factor influencing the simulation of the trend in sea ice extent.

Hindcast simulations
We have shown in Sect. 3 that the lack of agreement between simulated and observed variance over the last 30 yr does not allow us to confidently establish the link between the internal variability and the positive trend found in observations of the sea ice extent. Nevertheless, if this link exists and if the internal variability in the Southern Ocean is in some way predictable, an adequate initialization of the system should improve the results of the simulated evolution of the sea ice extent. This hypothesis is tested in this section using the hindcast simulations performed in the framework of CMIP5. In contrast to the historical simulations, the hindcasts are initialized through data assimilation of observations. The data assimilation method and the variables assimilated vary from one model to the other, as summarized in Table 3.

Impact of the initialization on the simulated trends
The models used for the hindcast analysis have been chosen on the basis of the availability of their results. Fortunately, we see in Fig. 2 that these 10 models (dotted lines) constitute a subset which represents reasonably well the variety of general circulation models. In order to outline the effect of the initialization on the simulated trend in sea ice extent for each model, we have computed the ensemble mean of the trends in hindcast simulations spanning the period 1981-2005, for winter and summer extent, and compared them to the ones from historical simulations (i.e. uninitialized) over the same time period. This period has been chosen as no hindcast was started in 1979. Here the hindcasts were initialized in January 1981 for all the models except HadCM3, whose hindcast members were started in November 1980. In Fig. 5, which shows the trend in sea ice extent computed from hindcast simulations against the one computed from historical simulations, a dot located on the line y(x) = x means that the trend in the hindcast simulation equals that of the historical simulation. If the trend simulated by the hindcast is greater (smaller) than the one computed from the historical simulation, then the dot lies above (below) the line y(x) = x.
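The comparison described above can be sketched as follows. This is only an illustration under stated assumptions: trends are taken as ordinary least-squares slopes, the ensemble mean is the mean of the member trends, and all function names and input values are hypothetical rather than taken from the CMIP5 archive.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope of values against years."""
    n = len(years)
    t_mean = sum(years) / n
    v_mean = sum(values) / n
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(years, values))
    sxx = sum((t - t_mean) ** 2 for t in years)
    return sxy / sxx


def ensemble_mean_trend(years, members):
    """Mean of the trends computed separately for each ensemble member."""
    trends = [linear_trend(years, m) for m in members]
    return sum(trends) / len(trends)


def position_relative_to_diagonal(hist_trend, hind_trend, tol=1e-12):
    """Location of the point (hist_trend, hind_trend) relative to y = x."""
    if hind_trend > hist_trend + tol:
        return "above"
    if hind_trend < hist_trend - tol:
        return "below"
    return "on"


# Hypothetical series over 1981-2005 (arbitrary units).
years = list(range(1981, 2006))
historical = [[10.0 - 0.02 * (y - 1981) for y in years]]
hindcast = [[10.0 + 0.01 * (y - 1981) for y in years]]
hist = ensemble_mean_trend(years, historical)
hind = ensemble_mean_trend(years, hindcast)
print(position_relative_to_diagonal(hist, hind))
```

A point "above" the diagonal corresponds to a hindcast trend larger than the historical one, which does not by itself mean that it is closer to the observed trend.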
Regarding summer sea ice extent (Fig. 5a), the initialization through a data assimilation procedure does not systematically improve the simulated trend. The hindcast trends of HadCM3, MIROC4h and MRI-CGCM3 are closer to the observations than are their historical trends, but they remain negative. BCC-CSM1.1, CNRM-CM5, IPSL-CM5A-LR and MPI-ESM-LR simulate a more negative trend in their hindcasts than in their historical runs. FGOALS-g2 has a largely positive trend in its hindcast, while the trend in its historical simulation is slightly negative. The CCSM4 hindcast displays a slightly positive trend, while that of its historical simulation is negative.
When initialized through data assimilation of observations, CCSM4, FGOALS-g2, CNRM-CM5 and BCC-CSM1.1 present a systematic drift (not shown). This drift is likely responsible for the high positive or negative trends found in the hindcasts of these models. Such a drift has its origin in the initialization of a model with a state that forces it to produce much more (or less) sea ice than its climatological mean. After the initialization, the model does not have any constraint from observations anymore, and the simulation tends to go back towards the model's climatology. We do not have information about the method used to initialize the models FGOALS-g2 and CCSM4. The use of raw data in the initialization procedures applied to BCC-CSM1.1 and to CNRM-CM5 may partly account for the drift occurring in their hindcast simulations.
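The way such a drift can produce a spurious trend can be illustrated with a toy example. It assumes, as a simplification that is not part of the analysis above, that the simulated extent relaxes exponentially from its initial state towards the model climatology. The e-folding time and all numerical values are hypothetical.

```python
import math


def relaxing_extent(initial, climatology, tau_years, n_years):
    """Toy series: extent relaxing exponentially (assumed form) from the
    initial state towards the model climatology, one value per year."""
    return [climatology + (initial - climatology) * math.exp(-t / tau_years)
            for t in range(n_years)]


def linear_trend(values):
    """Ordinary least-squares slope of values against the index 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2.0
    v_mean = sum(values) / n
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    return sxy / sxx


# Hypothetical case: initial state with more ice than the model climatology.
too_much_ice = relaxing_extent(initial=14.0, climatology=12.0,
                               tau_years=3.0, n_years=10)
# Hypothetical case: initial state with less ice than the model climatology.
too_little_ice = relaxing_extent(initial=10.0, climatology=12.0,
                                 tau_years=3.0, n_years=10)
print(linear_trend(too_much_ice), linear_trend(too_little_ice))
```

In this toy setting, a model initialized with more ice than its climatology shows a negative trend and a model initialized with less ice shows a positive one, without any change in forcing.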
Similarly, for winter sea ice extent, the initialization with observations does not systematically lead to a simulated trend in better agreement with observations. Figure 5b shows that hindcast simulations of MIROC4h, MIROC5 and MRI-CGCM3 have trends that are slightly closer to the observations than are the historical trends. The 7 other models perform worse or do not offer any improvement when they are initialized with observations. As in the case of summer sea ice extent (Fig. 5a), FGOALS-g2 simulates a large positive trend in its winter sea ice extent when it is initialized with observations, and CNRM-CM5 has a more negative trend in its hindcast for the same reasons as those proposed above. For BCC-CSM1.1, the hindcast trend in winter sea ice extent does not differ significantly from the historical trend.
Results presented in Fig. 5 show that the initialization of models through data assimilation of observations does not bring significant improvement in the simulated trend. When raw data are used instead of anomalies, the initialization apparently deteriorates the trend in sea ice extent simulated by models. Corrections can be introduced to take into account that kind of bias (e.g. Troccoli and Palmer, 2007;Vannitsem and Nicolis, 2008). Nevertheless, such a procedure requires a larger number of initialized simulations spanning several decades. Proposing such a method for sea ice and analysing how it would impact the analysis of the trend is beyond the scope of our study.

Correlation between models and observations
The forecast skill of the models can also be assessed by analysing the predictions a few years ahead. To do so, for each model, we have computed the anomaly correlation coefficient used in Pohlmann et al. (2009) as a function of the lead time t (in years). In this coefficient, x_ij are the hindcast simulations, i is the ensemble index (different indices correspond to different times when the hindcast simulations are started) and j is the index of the member belonging to the ensemble i. N is the number of ensembles and M is the number of members within each ensemble. o_i is the observation covering the time period spanned by the ensemble i. The overbar stands for the climatological mean of the uninitialized (historical) simulation and of the observations, over the analysed period (here 1981-2005).
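Written out with the symbols defined above, and assuming the standard form of an anomaly correlation summed over all ensembles and members, the coefficient would read as follows. The name COR is used here for illustration only.

```latex
% Assumed standard form of the anomaly correlation coefficient;
% the name COR is illustrative.
\mathrm{COR}(t) =
\frac{\sum_{i=1}^{N}\sum_{j=1}^{M}
      \left[x_{ij}(t)-\bar{x}\right]\left[o_{i}(t)-\bar{o}\right]}
     {\sqrt{\sum_{i=1}^{N}\sum_{j=1}^{M}\left[x_{ij}(t)-\bar{x}\right]^{2}
      \;\sum_{i=1}^{N}\sum_{j=1}^{M}\left[o_{i}(t)-\bar{o}\right]^{2}}}
```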
The correlation between hindcast simulations and observations is shown for summer (Fig. 6) and winter (Fig. 7) sea ice extent. This correlation has been computed from a series of 4 hindcast ensembles, initialized every 5 yr between January 1981 and January 1996 (every 5 yr between November 1980 and November 1995 for HadCM3). The 95 % significance level is computed using a t-test. This significance level varies from one model to another because of the different number of members in each model ensemble (see Table 1).
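The computation of the anomaly correlation at a given lead time can be sketched as below. This is a minimal illustration assuming the standard anomaly correlation form, with anomalies taken relative to the climatological means of the uninitialized simulation and of the observations. The function name, argument names and numerical values are hypothetical.

```python
import math


def anomaly_correlation(hindcasts, observations, model_clim, obs_clim):
    """Anomaly correlation at one lead time (assumed standard form).

    hindcasts[i][j]  value of member j of ensemble (start date) i
    observations[i]  observed value matching ensemble i
    model_clim       climatological mean of the uninitialized simulation
    obs_clim         climatological mean of the observations
    """
    num = 0.0
    var_x = 0.0
    var_o = 0.0
    for members, obs in zip(hindcasts, observations):
        o_anom = obs - obs_clim
        for x in members:
            x_anom = x - model_clim
            num += x_anom * o_anom
            var_x += x_anom ** 2
            # The observed anomaly is counted once per member.
            var_o += o_anom ** 2
    return num / math.sqrt(var_x * var_o)


# Hypothetical example: 4 start dates, 3 members each, arbitrary units.
hindcasts = [[11.8, 12.1, 12.0], [12.4, 12.2, 12.6],
             [11.9, 12.3, 12.0], [12.5, 12.7, 12.4]]
observations = [11.9, 12.3, 12.2, 12.6]
print(anomaly_correlation(hindcasts, observations,
                          model_clim=12.2, obs_clim=12.25))
```

The function would be called once per lead time, with the hindcast and observed values valid at that lead time for each start date.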
In summer, none of the 10 models analysed here has a significant correlation for the first year after initialization (Fig. 6). HadCM3, IPSL-CM5A-LR and MIROC4h never exceed the 95 % significance level. The 7 remaining models present one or two peaks of significant correlation several years after the initialization, and almost all the models have a negative correlation during most of the 10 yr. The emergence of correlation later on in the simulation can occur randomly, or it might still be a consequence of the initialization. Indeed, models might undergo an initial shock due to the initialization procedure before stabilizing and benefiting from the initialization. For winter sea ice extent (Fig. 7), the correlation is significantly positive during the first year for the CCSM4, MIROC5 and MPI-ESM-LR models, indicating some predictive skill. Then the correlation decreases and reaches negative values. A negative correlation is also found in the other models. The significant correlation after one year in three models in winter likely arises from the initialization, but the memory of the system is apparently not sufficient to keep a significant correlation during the following years. Unlike in the Arctic, sea ice around the Antarctic is relatively young. It disappears almost entirely during the melting season and recovers during winter months, which prevents the sea ice from retaining information from the initialization. The ocean can keep the information over longer periods, but in the available experiments its role appears weak during the first year after initialization. Still, it may be responsible for the emergence of correlation several years after initialization, for both summer and winter sea ice extent, through local interactions or teleconnections with remote areas.
In any case, the skill of model predictions for Southern Ocean sea ice extent is quite poor compared to that obtained for other variables. For instance, Kim et al. (2012) have analysed hindcast results from seven CMIP5 models and have shown that these models have a high skill in forecasting surface temperature anomalies over the Indian, North Atlantic and Western Pacific oceans up to 6-9 yr ahead. Matei et al. (2012a) have pointed out a significant correlation between hindcasts and observations for the Atlantic Meridional Overturning Circulation (AMOC) strength at 26.5° N up to 4 yr ahead.

Summary and conclusions
We have analysed results of historical and hindcast simulations from the 24 CMIP5 models available to date. This is still a small ensemble, but we consider that it is diverse enough to constitute a reasonable sample to draw conclusions about the behaviour of current models in the Southern Ocean.
The multi-model mean reproduces well the observed summer and winter sea ice edge as well as the annual cycle of sea ice extent. The skill of individual models is much lower. The majority of the biases in the simulated Southern Ocean sea ice highlighted for CMIP3 models persist for the CMIP5 ones. Furthermore, all the models analysed here overestimate the variability of the sea ice extent in winter. In addition, we saw that, in contrast to observations, the variability in some models can vary significantly from one season to the other. We have thus chosen to analyse seasonal means rather than annual means, but the conclusions are similar whether we consider summer or winter sea ice extent.
The analyses performed in this paper aimed at better understanding the role played by the internal variability in the observed increase of sea ice extent in the Southern Ocean. Our approach can be summarized in three questions that we can now partly answer.
Firstly, is the trend of winter and summer observed sea ice extent compatible with a combination of the forced response and the internal variability according to model results? The models generally respond to the external forcing by a decrease in their sea ice extent. Our analysis of its representation in the different models has shown that the inclusion of stratospheric ozone depletion does not strongly modify the sign of the simulated trend in sea ice extent in the Southern Ocean compared to CMIP3, in which only half of the models took into account this forcing. Moreover, models with interactive chemistry or with higher atmospheric vertical resolution do not provide better results than the other ones. Nevertheless, natural variability can overwhelm the influence of the forced response, leading to a positive trend in some ensemble members. This case appears relatively rare among the available simulations. However, if we consider the wide range of trends each model provides because of its own dynamics only, the positive observed trend in sea ice extent can be accounted for by internal variability.
Secondly, does the models' internal variability agree with that of the observations? From our model analysis, positive trends in sea ice extent, such as the observed one, can arise from internal variability. Nevertheless, to have confidence in this conclusion, the models' internal variability must fit that of the observations. Unfortunately, we have shown that the models often have a climatological mean which is far from the observations, or too high an interannual variability, or even both. Thus, none of the CMIP5 models provides a reasonable estimate of all the main characteristics of the sea ice cover over the last decades in the Southern Ocean, in contrast to the Arctic (e.g. Stroeve et al., 2012;Massonnet et al., 2012).
Moreover, the few models that display an increase in sea ice extent have such a large variability that the sign of the trend is not robust. One may argue that the higher internal variability found in the models, compared to that of the observations, is due to some transient, specific characteristics of the last decades. However, this hypothesis has not been confirmed since the mean state and the internal variability of the models are roughly constant over the past 150 yr. Because of these model biases, we cannot reasonably consider the results of these models as a good representation of the behaviour of the Southern Ocean sea ice. As a consequence, even if the positive observed trend in sea ice extent is compatible with the models' internal variability, the biases of these models prevent us from firmly assessing the link between the internal variability in the Southern Ocean and the observed increase in sea ice extent.
Thirdly, how does the initialization method impact the simulated evolution of sea ice extent in the Southern Ocean? If the internal variability is important, a correct initialization of the model state may lead to a better agreement with data. Under this hypothesis, constraining the model with observations would put the system in a state that favours an increase in ice extent, for instance because of a more stratified or colder ocean. However, results from hindcast simulations have shown that there is no systematic improvement in the simulation of the observed trend in sea ice extent. Previous studies have demonstrated that models have a high potential predictability in the Southern Ocean region at decadal time scales (e.g. Latif et al., 2010), i.e. in models there exists deterministic decadal variability. The test in real conditions has not shown such predictability for sea ice extent. This may be due to some inadequate representation of physics and/or feedbacks in models, but also to the initialization procedure. Indeed, observations required to properly initialize the system are quite sparse in that area and the time period they cover is relatively short. Furthermore, data assimilation methods used in general circulation models are essentially based on nudging, and improvement may be expected if more sophisticated methods are applied and systematically tested in the Southern Ocean.
To sum up, from a purely model-based approach, a positive trend in the Southern Ocean sea ice extent spanning the last 30 yr, though being a rare event, can be accounted for by the internal variability of the system. Nevertheless, we have shown that the models display a mean state or an interannual variability, or even both, that disagree with what is observed. As a consequence, this raises the question of whether we can consider these model results as reliable estimates of what happens in reality, and it affects the level of confidence one has in decadal predictions or projections of the evolution of the sea ice around the Antarctic performed with those models.