Improving Met Office seasonal forecasts of Arctic sea ice using assimilation of CryoSat-2 thickness

Interest in seasonal predictions of Arctic sea ice has been increasing in recent years owing, primarily, to the sharp reduction in Arctic sea ice cover observed over the last few decades, which is projected to continue. The prospect of increased human industrial activity in the region, as well as scientific interest in the predictability of sea ice, provides important motivation for understanding, and improving, the skill of Arctic predictions. Several operational forecasting centres now routinely produce seasonal predictions of sea ice cover using coupled atmosphere-ocean-sea ice models. Although assimilation of sea ice concentration into these systems is commonplace, sea ice thickness observations, being much less mature, are typically not assimilated. However, many studies suggest that initialisation of winter sea ice thickness could lead to improved prediction of Arctic summer sea ice. Here, for the first time, we directly assess the impact of winter sea ice thickness initialisation on the skill of seasonal summer forecasts by assimilating CryoSat-2 thickness data into the Met Office’s coupled seasonal forecasting system (GloSea). We show a significant improvement in predictive skill of Arctic sea ice extent and ice-edge location for forecasts of September Arctic sea ice made from the beginning of the melt season. The improvements in sea ice cover lead to further improvement of near-surface air temperature and pressure fields across the region. A clear relationship between modelled winter thickness biases and summer extent errors is identified which supports the theory that Arctic winter thickness provides some predictive capability for summer ice extent, and further highlights the importance that modelled winter thickness biases can have on the evolution of forecast errors through the melt season.

In response to declining sea ice cover, human activity in the Arctic is increasing, with access to the Arctic Ocean becoming more important for socioeconomic reasons (Meier et al., 2014). Such activities include commercial activities like tourism, fishing, mineral and oil extraction, and shipping (Smith and Stephenson, 2013), along with activities of importance to local communities such as subsistence hunting and fishing, search and rescue, and community re-supply. Accurate forecasts of Arctic sea ice are therefore becoming increasingly important for the safety of human activities in the Arctic (Eicken, 2013). Better knowledge of sea ice on seasonal timescales allows for better planning, which should lead to a reduced level of risk and a reduction in operational costs for human activities in the Arctic Ocean. Regional changes in Arctic sea ice cover can also have implications for lower-latitude weather and climate (Balmaseda et al., 2010; Screen, 2013). For example, Koenigk et al. (2016) show that late summer sea ice cover can be linked to winter NAO-like patterns and blocking in Western Europe. More accurate Arctic sea ice predictions can therefore also contribute to improved forecasts, and hence longer-term planning, in mid-latitude regions.
Following the drastic reduction in Arctic sea ice extent in the summer of 2007, which led to a (then) record-low summer minimum extent being set, interest in seasonal predictions has increased. In response to this, in 2008, the Sea Ice Outlook (SIO) was instigated by the Study of Environmental ARctic CHange (SEARCH) to synthesise seasonal predictions of September Arctic sea ice extent, made from late spring and early summer, using a variety of modelling, statistical, and heuristic approaches (see Stroeve et al., 2014). For seasonal forecasts to be of use to stakeholders, a thorough understanding of their predictive skill is needed. The SIO fosters collaboration of such activities across various prediction centres through the inter-comparison and common evaluation of forecasts (see https://www.arcus.org/sipn/sea-ice-outlook). There is also an interesting scientific problem here to test our ability to predict sea ice on seasonal timescales. As the sea ice thins, variability in ice extent increases (Holland et al., 2011; Goosse et al., 2009) and so the problem of making seasonal Arctic sea ice predictions, particularly for the September minimum, is one that is getting more challenging and interesting as the ice cover declines.
Although global coupled forecasting systems have been used successfully for seasonal prediction of mid-latitude weather and climate for some time now (see for example Scaife et al., 2014), their application to Arctic sea ice prediction is much less mature. In particular, forecasts in the Arctic are hampered by the fact that observations are much less abundant and data assimilation techniques less advanced in the polar regions than at lower latitudes (Jung et al., 2016; Bauer et al., 2016).
Despite this fact, several operational forecasting centres regularly contribute to the SIO with sea ice predictions from fully coupled atmosphere-sea ice-ocean modelling systems. One such system is the Met Office's Global Seasonal (GloSea) coupled ensemble prediction system, which has contributed to the SIO since 2010. The ocean and sea ice components of GloSea are initialised each day using the FOAM operational ocean-sea ice analysis of Blockley et al. (2014). FOAM routinely assimilates sea ice concentration along with various ocean quantities (satellite and in-situ SST, satellite SLA, in-situ profiles of temperature and salinity) but, in common with most operational ocean analysis systems (Tonani et al., 2015; Martin et al., 2015; Balmaseda et al., 2015), does not assimilate sea ice thickness.
The Cryosphere Discuss., https://doi.org/10.5194/tc-2018-62. Manuscript under review for journal The Cryosphere. Discussion started: 18 April 2018. © Author(s) 2018. CC BY 4.0 License.
Despite the use of dynamic models for seasonal sea ice prediction being in its relative infancy, there have been several studies that have demonstrated skill in retrospective forecasts (or hindcasts) of September-mean Arctic sea ice extent from spring (e.g., Chevallier et al., 2013; Msadek et al., 2014). There has also been much progress made towards understanding potential predictability of Arctic sea ice extent, often using so-called 'perfect model' approaches (Guemas et al., 2016; Tietsche et al., 2014; Day et al., 2014; Blanchard-Wrigglesworth et al., 2011). However, other studies have suggested there is still much improvement that can be made to these systems before they can be considered as useful operational tools. Blanchard-Wrigglesworth et al. (2015) showed that skill of the actual forecasts submitted to the SIO is lower than that suggested by these hindcast and perfect model studies, and Stroeve et al. (2014) found that skill was only marginally better than a linear trend forecast.
Several studies have suggested that seasonal predictions could be improved by the addition of sea ice thickness. Blanchard-Wrigglesworth and Bitz (2014) found sea ice thickness anomalies in GCMs to have a timescale of between 6 and 20 months, making their correct representation in model initial conditions of importance for seasonal predictions. Other modelling studies by Holland et al. (2011) and Kauker et al. (2009) have also shown that knowledge of winter ice thickness can provide some predictive capability for summer ice extent. Perfect model studies (e.g. Day et al., 2014) have also suggested that correct initialisation can lead to improved seasonal forecasts. Day et al. (2014) used the HadGEM1.2 climate model to show that memory of winter thickness conditions can persist well beyond seasonal timescales and provide predictive capability for up to 2 years. Collow et al. (2015) found considerable changes in ice concentration forecasts when changing the initial thickness in the CFSv2 seasonal prediction system. They show an improvement to seasonal forecasts when using thickness fields from the Pan-Arctic Ice-Ocean Model Assimilation System (PIOMAS) model of Zhang and Rothrock (2003).
However, Blanchard-Wrigglesworth et al. (2017), who also used PIOMAS to initialise many different SIO seasonal prediction models, find that there is also a strong role for model uncertainty in the evolution of seasonal forecasting errors. Although the assimilation of sea ice concentration has been included in ocean reanalysis, operational ocean prediction and seasonal forecasting systems for several years (Stark et al., 2008), sea ice thickness is not yet routinely used to initialise these systems (Martin et al., 2015; Balmaseda et al., 2015; Tonani et al., 2015). There have, however, been several recent studies that have sought to improve the representation of Arctic sea ice thickness in analyses and short-range forecasts using satellite thickness products derived from Soil Moisture and Ocean Salinity (SMOS) brightness temperatures and/or from CryoSat-2 (hereafter CS2) radar freeboard measurements. Such studies have generally focused on assimilation of thickness using ensemble techniques into short-range forced ocean-sea ice models in the Topaz system (Xie et al., 2016) or using MITgcm (Yang et al., 2014; Mu et al., 2018). Although these studies showed considerable improvement to the simulation of sea ice thickness, the impact on short-range forecasts of sea ice concentration or extent was minimal. More recently, Allard et al. (2018) used direct initialisation of CS2-derived thickness within a series of reanalyses performed with NRL's ocean-sea ice Arctic Cap Nowcast/Forecast System (ACNFS). They show that the analysed sea ice thickness is significantly improved when assimilating CS2 thickness compared against in-situ and airborne measurements. They also show a good agreement between the CS2-derived ice thickness and observations from in-situ and airborne sources.
In this study, we use a nudging technique to assimilate CS2 sea ice thickness within the FOAM/GloSea reanalysis system and use these initial conditions to determine the impact of sea ice thickness initialisation on the skill of GloSea seasonal predictions. Although several studies have looked at the impact of thickness initialisation on short-term forecasts and in forced ocean-sea ice models, there have been no studies exploring the impact of initialising coupled seasonal forecasts using sea ice thickness. Here we do so for the first time using the Met Office GloSea seasonal prediction system and show that sea ice thickness initialisation leads to a considerable improvement in the skill of seasonal predictions of Arctic sea ice extent and ice edge location. This paper is structured as follows: Section 2 introduces the modelling systems and observations used in this study; Section 3 describes the assimilation of CS2 thickness and the impact on the ocean-sea ice reanalysis; Section 4 provides details of GloSea coupled seasonal forecast experiments performed using CS2 initialised thickness and shows improved skill for seasonal forecasts of Arctic ice cover; Section 5 provides summary discussion and an overview of proposed future work.

Modelling systems
The model systems used in this study are taken from the Met Office suite of seamless, traceable prediction systems introduced by Brown et al. (2012) using components of the HadGEM3 coupled model architecture described by Hewitt et al. (2011). All of these HadGEM3-based modelling systems simulate the ocean and sea ice conditions using the Nucleus for European Modelling of the Ocean (NEMO) ocean model (Madec, 2008) coupled to the Los Alamos sea ice model (CICE) (Hunke et al., 2015). Within the Met Office's unified, seamless framework, seasonal forecasts are performed using the GloSea coupled prediction system (Scaife et al., 2014). GloSea produces two 210-day seasonal forecasts every day, which, together with those from previous days, are combined to form a lagged ensemble prediction system. Meanwhile, hindcasts performed for previous years are used to establish errors in the model climatology for the purposes of bias correction. More details on GloSea can be found in MacLachlan et al. (2014). The ocean and sea ice components of the GloSea system are initialised each day using analyses from the Forecast Ocean Assimilation Model (FOAM) system described in Blockley et al. (2014; 2015). FOAM is an operational ocean-sea ice analysis and forecast system run daily at the Met Office. Satellite and in-situ observations of temperature, salinity, sea level anomaly and sea ice concentration are assimilated by FOAM each day using the NEMOVAR 3D-Var FGAT scheme. Sea ice thickness is not currently assimilated in FOAM; new ice is added by the concentration assimilation at a default thickness of 0.5 metres. More details of the FOAM system can be found in Blockley et al. (2014) and more about the NEMOVAR 3D-Var FGAT scheme used therein can be found in Waters et al. (2014).
As well as operational analyses and forecasts, longer reanalyses are performed with the FOAM system using surface forcing derived from the ERA-Interim atmospheric reanalysis (Dee et al., 2011). Within the GloSea seasonal prediction system, hindcast experiments initialised from these reanalyses are used to bias correct the GloSea seasonal forecasts (see MacLachlan et al. (2014) for more information). As well as being used for bias correction within GloSea, these ocean reanalyses are used more widely within the ocean community and have been used to help answer a number of other scientific questions (e.g. by Roberts et al., 2013; Jackson et al., 2015).
Throughout this study we shall use prototype FOAM and GloSea systems based on the latest configuration of the Met Office coupled modelling system (GC3: Williams et al., 2017), which will be used as part of Met Office Hadley Centre's contribution to phase 6 of the Coupled Model Intercomparison Project (CMIP6). This GC3 coupled model version uses the GO6 ocean and GSI8 sea ice component configurations described in Storkey et al. (2018, in review) and Ridley et al. (2018) respectively and uses the extended ORCA025 grid described therein (nominally 1/4 degree horizontal resolution with 75 vertical levels).

Observations of sea ice thickness
Whilst observations of sea ice concentration providing large-scale coverage for both poles have been available since 1979 (Fetterer et al., 2016; Rayner et al., 2003), measurements of sea ice thickness are, relatively, much less abundant. However, satellite estimates of winter thickness have been available for a number of years using radar altimetry (Laxon et al., 2003), laser altimetry (Kwok et al., 2009), and, more recently for thin ice, microwave brightness temperatures (Kaleschke et al., 2016). Although radar altimeter estimates of sea ice thickness have been around for some years, their uptake into operational ocean-sea ice assimilation systems has been minimal. The main reasons for this are three-fold: owing to the orbit inclination, these datasets often have a large 'pole-hole' giving poor coverage in the central Arctic Ocean; there is considerable uncertainty associated with these estimates of ice thickness (Ricker et al., 2014); and the data were not made available in near-real-time for use in operational analysis systems. The problems outlined above have been ameliorated somewhat during the last few years by the launch of ESA's CryoSat-2 satellite (CS2), whose primary objective is to acquire accurate measurements of sea ice thickness. CS2 is fitted with a SIRAL altimeter and has an unusually high inclination orbit that provides observational coverage up to 88° North (Laxon et al., 2013). The processed data from CS2 are also provided in near-real-time (Tilling et al., 2016), making their use within operational analysis systems a realistic proposition.

CryoSat-2 thickness observations
In this study, we use monthly CS2 winter (Oct-Apr) thickness estimates produced by CPOM (Tilling et al., 2016), which run from October 2010 to the present (at time of writing). Sea ice freeboard is inferred from radar altimetry aboard the CS2 satellite and is converted to thickness by assuming that the sea ice floats in hydrostatic equilibrium and by making various assumptions about the snow loading and the relative densities of the sea ice, the ocean and the overlying snow. Details of the methods used to generate the CPOM thickness fields can be found in Laxon et al. (2013) and Tilling et al. (2015). Some more general discussion of the uncertainties involved in the calculation of sea ice freeboard and thickness using radar altimetry can be found in Ricker et al. (2014). The CPOM thickness data are provided on a 5 km polar stereographic grid having been smoothed with an averaging window of radius 25 km. We apply a further quality control (QC) to the data before use. To prevent any smearing of thickness values near the ice edge during the filtering step we impose a maximum displacement of 15 km between the average location of the raw track observations and the centre of the final grid point (Andy Ridout, pers. comm., 2017). We further remove any spuriously high observations by imposing a maximum thickness threshold of 7 m. The CS2 thickness retrieval methodology is particularly sensitive for thin ice where the ice freeboard is not much higher than sea level (Ricker et al., 2014; 2017). To avoid high observational error associated with these thin measurements we impose a minimum thickness threshold of 1 m, a choice that was motivated by Figure 2b of Ricker et al. (2017). Further, to ensure that the observations are as representative of the month as possible we apply the constraint that at least 10 different altimeter tracks are used to determine the monthly-mean observation.
We also impose a constraint on the spread of the track observations by keeping monthly observations only when the standard deviation of the contributing individual track observations was less than 2 m. In total, application of the above-mentioned QC rejected roughly 21.5% of the original observations; about 9.4% of the observations were removed by the 1 m cut-off and just over 12% were rejected by the remaining constraints. An example of the thickness observations used in this study can be seen in Figure 1, which shows a map of average December, January and February Arctic thickness for 2011-2015 inferred from CS2 estimates after application of the QC process described above.
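The QC constraints listed above can be collected into a single acceptance test. The following is a minimal sketch in Python and not the code used to produce the results: the `MonthlyObs` record, its field names and the `passes_qc` function are hypothetical, and whether each bound is inclusive is an assumption wherever the text does not say.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
MIN_THICKNESS_M = 1.0        # minimum thickness threshold
MAX_THICKNESS_M = 7.0        # maximum thickness threshold
MIN_TRACKS = 10              # "at least 10 different altimeter tracks"
MAX_TRACK_STD_M = 2.0        # track standard deviation must be less than 2 m
MAX_DISPLACEMENT_KM = 15.0   # maximum displacement from grid-point centre


@dataclass
class MonthlyObs:
    """Hypothetical record for one gridded monthly CS2 observation."""
    thickness_m: float       # monthly-mean thickness at the grid point
    n_tracks: int            # number of altimeter tracks contributing
    track_std_m: float       # std. dev. of the contributing track values
    displacement_km: float   # mean track location to grid-point centre


def passes_qc(obs: MonthlyObs) -> bool:
    """Return True if the observation satisfies every QC constraint."""
    return (
        MIN_THICKNESS_M <= obs.thickness_m <= MAX_THICKNESS_M  # assumed inclusive
        and obs.n_tracks >= MIN_TRACKS
        and obs.track_std_m < MAX_TRACK_STD_M
        and obs.displacement_km <= MAX_DISPLACEMENT_KM          # assumed inclusive
    )
```

Applying `passes_qc` to every gridded value and counting the failures per constraint would give a rejection breakdown of the kind quoted above.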
Although SMOS thickness data is also available for much of the study period used here, we choose to use only CS2 data for this study. The motivation for this is that SMOS only provides information about thin ice up to about 0.5 m whilst the CS2 data is good for thicker ice above 1 m (Ricker et al., 2017). Here we are concerned with seasonal predictions of Arctic summer sea ice, for which ice less than 2 m thick tends to melt away completely (Keen et al., 2013). We therefore expect the assimilation of thicker ice to be more important for our needs and so use only the CS2 thickness observations for this feasibility study.

Observations of sea ice extent and concentration
Uncertainty associated with sea ice concentration and extent estimates from satellites is high (Ivanova et al., 2015) and the commonly used sea ice extent metric is nonlinear and dependent on resolution (Notz, 2014). To account for this uncertainty we include observational estimates from three different sources: extents calculated from the 1 degree gridded HadISST1.2 dataset of Rayner et al. (2003); the NSIDC sea ice index of Fetterer et al. (2016); and gridded sea ice concentration fields from the most recent FOAM-GloSea ocean-sea ice reanalysis. This reanalysis, performed using version 13 of the FOAM system (Blockley et al., 2015), is used within the Copernicus Marine Environment Monitoring Service (CMEMS; http://marine.copernicus.eu/) global ocean reanalyses ensemble product (ID: GLOBAL-REANALYSIS-PHY-001-026; described in http://cmems-resources.cls.fr/documents/QUID/CMEMS-GLO-QUID-001-02). Using this CMEMS reanalysis has the benefit that it is performed on the same ORCA025 grid as the ocean-sea ice components of the GloSea seasonal forecasting system, which makes spatial comparisons easier. This reanalysis has also been evaluated thoroughly through the Ocean Reanalyses Inter-comparison Project (ORA-IP) (see Balmaseda et al., 2015; Chevallier et al., 2017; Uotilla et al., in review). To avoid confusion with the FOAM reanalyses performed as part of this study, and described later, we refer to this product as "CMEMS" hereafter.

The CMEMS reanalysis was performed using SSMI/S sea ice concentration data provided by EUMETSAT's Ocean and Sea Ice Satellite Application Facility (OSI-SAF). Sea ice concentration is assimilated along with ocean data sources using the NEMOVAR 3D-Var scheme (see Blockley et al., 2014; Waters et al., 2014). Prior to October 2009, OSI-SAF's Global Sea Ice Concentration Climate Data Records (OSI-409, version 1.1) product was assimilated. When the reanalysis was run, in 2015, these data were only available up to the end of 2009 and so the OSI-SAF near-real-time (NRT) product OSI-401a was used from 25th October 2009 onwards. These two datasets have differences in the processing of low concentration ice and near coastlines (see Section 4.2 of OSI-SAF, 2017). However, this does not cause us any concern here because our study is focussed on the CS2 era from October 2010 onwards.

Initialisation of thickness in the ocean-sea ice reanalysis system
Here we use the latest development version of the FOAM-GloSea reanalysis system that has been undertaken as part of the upgrade of GloSea and FOAM to use the latest GC3 version of the Met Office coupled model architecture (Williams et al., 2017). Specifically, the ocean reanalysis system here uses the GO6 ocean configuration described in Storkey et al. (2017, in review) and the GSI8 sea ice configuration described in Ridley et al. (2017, in review). We take the latest GO6+GSI8 reanalysis as our control (hereafter CTRL-RA) and modify it to include initialisation of sea ice thickness using CS2 observations (hereafter ThkDA-RA). The CTRL-RA reanalysis was run from 1992 to 2015 but here we only re-run the last 5 years, from October 2010 to the end of 2015, to tie in with availability of CS2 thickness estimates. Within the ThkDA-RA reanalysis, CS2 thickness data are assimilated using a basic nudging technique in which thickness fields are nudged towards the monthly gridded CS2 observations in a fashion akin to that employed by climatological relaxation schemes. All other data used within the control run (i.e., SST, SLA, T/S profiles, and SSMI/S concentration) are assimilated here too in the same manner as in the standard FOAM system (Blockley et al., 2014). An overview of reanalysis experiments used in this study can be found in Table 1. We use the monthly CPOM measurements introduced in Section 2.2 and map them onto the model grid using a standard binning technique. A linear interpolation is performed each day to get daily thickness observations from the nearest two months. Assimilation increments are created by taking a simple difference between these daily CS2 thickness observations and the daily-mean model thickness. Where no observations are present, the increments are set to zero to ensure no thickness nudging is performed. This approach avoids problems arising from the sparse data and allows the model to be nudged continuously towards the CS2 thickness.
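The construction of the daily increments described above can be illustrated with a short sketch. This is an illustration under stated assumptions and not the FOAM implementation: the function names are hypothetical, the interpolation weight is passed in directly because the text does not say how monthly values are positioned in time, and treating a grid point as unobserved when either neighbouring month is missing is an assumption.

```python
from typing import Optional


def interpolate_daily(obs_prev: Optional[float],
                      obs_next: Optional[float],
                      weight_next: float) -> Optional[float]:
    """Linearly interpolate two monthly thickness values (m) to one day.

    weight_next is the fractional position of the day between the two
    monthly values (0 at the earlier month, 1 at the later month).
    None marks a grid point with no observation (an assumption here).
    """
    if obs_prev is None or obs_next is None:
        return None
    return (1.0 - weight_next) * obs_prev + weight_next * obs_next


def thickness_increment(obs_daily: Optional[float],
                        model_daily_mean: float) -> float:
    """Observation minus daily-mean model thickness; zero where unobserved."""
    if obs_daily is None:
        return 0.0  # no observation, so no nudging is performed
    return obs_daily - model_daily_mean
```

For example, with monthly values of 2.0 m and 3.0 m, a day one quarter of the way between them has an interpolated observation of 2.25 m; if the daily-mean model thickness there is 2.0 m, the increment is 0.25 m.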
The increments are applied within the CICE model code in a similar fashion to the sea ice concentration assimilation described in Peterson et al. (2014) and Blockley et al. (2014). Thickness changes are made at each time step using the Incremental Analysis Update (IAU) method. A 5-day relaxation timescale is used and increments are only applied where the grid-cell ice concentration is above 40%. The CICE sea ice model uses multiple thickness categories to represent the sub-grid thickness distribution. To apply the thickness increments into the multi-category model we chose to nudge the grid-box-mean thickness towards observations by making changes across each of the 5 sub-grid categories, so long as there is ice present there with concentration above 1%, maintaining the initial distribution of volume between the categories. We note here that this approach is similar to that employed by Allard et al. (2018), who multiply the ice volume in each category by the grid-box-mean model-observation thickness difference. However, whilst they use direct initialisation, we use the IAU approach to incorporate changes into the model in a gradual manner and limit the potential for sudden shock in the system (Bloom et al., 1996).
Figure 2 shows the impact of the CS2 thickness initialisation on the reanalysis thickness fields; the 5-year mean differences for 1st May, at the end of winter when CS2 observations cease, and for 30th September, after 5 months of running without thickness assimilation, are shown. At the end of winter (Figure 2a) it is apparent that inclusion of CS2 thickness nudging has increased sea ice thickness across much of the Atlantic sector of the Arctic (Barents, Kara and Greenland Seas).
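The way a grid-box-mean thickness increment might be spread across the thickness categories while preserving their relative volumes can be sketched as follows. This is a simplified illustration and not the CICE code: the function name and arguments are hypothetical, the mean thickness is assumed to be total volume divided by total concentration, and the 5-day timescale is interpreted (as an assumption) as applying a fraction dt/τ of the increment per time step. A real model update would also need to handle snow, enthalpy and other category state.

```python
RELAX_TIMESCALE_S = 5 * 86400.0  # 5-day relaxation timescale from the text
MIN_CELL_CONC = 0.40             # nudge only where cell concentration > 40%
MIN_CAT_CONC = 0.01              # nudge only categories with concentration > 1%


def nudge_categories(cat_conc, cat_vol, increment_m, dt_s):
    """Return category volumes after one nudging step.

    cat_conc    : area fraction of each thickness category
    cat_vol     : ice volume per unit grid-cell area in each category (m)
    increment_m : observed minus model grid-box-mean thickness (m)
    dt_s        : model time step (s)
    """
    total_conc = sum(cat_conc)
    if total_conc <= MIN_CELL_CONC:
        return list(cat_vol)  # too little ice in the cell: leave unchanged

    eligible = [c > MIN_CAT_CONC for c in cat_conc]
    eligible_vol = sum(v for v, ok in zip(cat_vol, eligible) if ok)
    if eligible_vol <= 0.0:
        return list(cat_vol)

    # Portion of the mean-thickness increment applied during this step.
    dh = increment_m * dt_s / RELAX_TIMESCALE_S
    # Equivalent change in volume per unit grid-cell area.
    dvol = dh * total_conc
    # A common scale factor keeps the split of volume between the
    # eligible categories unchanged; never allow negative volume.
    scale = max(0.0, 1.0 + dvol / eligible_vol)
    return [v * scale if ok else v for v, ok in zip(cat_vol, eligible)]
```

Because every eligible category is multiplied by the same factor, the ratio of volumes between any two of them is unchanged, which is one way to maintain the initial distribution of volume between the categories.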

Impact of CryoSat-2 initialisation on reanalysis thickness
Conversely, ice thickness has been decreased in the Canadian Arctic Archipelago (CAA) and, to a lesser degree, across

Although there is a dipole in the thickness changes, with some areas being consistently thicker and others consistently thinner, the net effect of the CS2 thickness nudging is an increase in sea ice thickness. This can be seen on the volume time series plot in Figure 3, noting that an increase in volume here directly implies an increase in average ice thickness because, as sea ice concentration is tightly constrained by the assimilation of sea ice concentration and sea surface temperature, the ice area between the two reanalysis simulations is virtually identical (not shown). In Figure 3, sea ice volume for the CTRL reanalysis is compared with the equivalent for the ThkDA reanalysis along with, as a reference, volume estimates from the PIOMAS model of Zhang and Rothrock (2003). The volume in the CTRL run is much closer to PIOMAS than the ThkDA run. This is expected as PIOMAS has been shown to underestimate thickness/volume in the winter when compared with thickness derived from radar altimeter (Tilling et al., 2015; Laxon et al., 2013), although it has been shown to compare better with laser altimeter estimates such as ICESat (Schweiger et al., 2011). Figure 3 shows that winter volume is increased the most by the assimilation of CS2 thickness. This is perhaps not surprising given that winter is the time when the data is available. However, there is some evidence that these winter changes also affect the summer volume, which is most pronounced in 2014 and, to a lesser extent, 2013 and 2015. In all years the volume time series shows a clear kink on 1st October when the CS2 data comes back online and begins to be assimilated in the reanalysis, although this is much less pronounced in 2014 when the summer thickness was also increased.
In summary, we have shown that nudging Arctic sea ice thickness to CS2 observations within the ThkDA-RA reanalysis has the net effect of increasing sea ice volume.
The differences between the two reanalyses reveal a persistent bias in the thickness distribution in the model when compared with CS2 whereby sea ice is too thick on the Pacific side and not thick enough on the Atlantic side of the Arctic. There is evidence to suggest that the winter Arctic sea ice thickness/volume is an important precondition for evolution of ice through the melt season (in agreement with the current literature) because the effects of winter thickness changes imposed by the nudging are still evident at the end of the summer. Another important result to note here is the fact that the assimilation of thickness worked well and the increments were successfully retained by the model, which bodes well for inclusion of sea ice thickness within the NEMOVAR system in the future.

Initialisation of thickness in the GloSea coupled seasonal prediction system
Seasonal forecasts of sea ice extent, amongst other things, are made operationally by the GloSea system each day. Hindcasts, performed from a discrete predefined set of start dates each year, are also run within the operational suite each day and used as part of the bias correction process. These hindcasts are initialised using the long GloSea ocean-sea ice reanalysis (as described in Section 3), which is coupled to atmosphere initial conditions interpolated from the ERA-I reanalysis (Dee et al., 2011). In addition to being used operationally for bias correcting forecasts, seasonal hindcasts such as this are performed for testing of model configuration upgrades prior to implementation within the GloSea operational suite. A recent trial of the new GC3 coupled model configuration of Williams et al. (2017) has been performed using the GloSea system and we shall use it as our control (denoted CTRL-HC). The ocean and sea ice for these hindcasts are initialised using the control reanalysis (CTRL-RA) described in Section 3 and the atmosphere is initialised from the ERA-I reanalysis. As the GC3 developments include the implementation of a new multi-layer model for terrestrial snow (see Walters et al., 2017; Williams et al., 2017), the snow fields were initialised separately from the atmosphere using a standalone version of the GC3 land surface component (JULES) with ERA-I snow precipitation and data assimilation.
Here we wish to test the impact of initialising with CS2 sea ice thickness on the seasonal forecasts of September sea ice extent. For this purpose, an ensemble of seasonal forecasts was configured that was identical to the CTRL-HC hindcasts except that the ocean and sea ice components were initialised from the ThkDA-RA reanalysis instead of CTRL-RA.
Seasonal forecasts were performed from 3 different spring start dates (25th April, 1st May and 9th May). For each of these start dates, an ensemble of 8 hindcasts was initialised from the same analysis fields with spread between the members achieved by using stochastic physics (see MacLachlan et al. (2013) for more details). This methodology is identical to that used for CTRL-HC and, through a mixture of lagged and perturbed methods, provides an ensemble of 24 forecasts of September sea ice each year. These hindcasts were performed for 2011-2015, each year that spring analyses are available from the ThkDA-RA ocean reanalysis. We denote this system of hindcasts as ThkDA-HC. Details of the GloSea coupled hindcast experiments used in this study can be found in Table 2.

Improvements to seasonal prediction of Arctic extent and ice edge location
Results from the ThkDA-HC seasonal hindcasts show that the CS2 thickness initialisation has considerably improved the skill of GloSea seasonal predictions of Arctic sea ice cover. Figure 4 shows September-mean Arctic sea ice extent (upper panel) from the GloSea control ensemble (CTRL-HC; blue) and the ensemble run with initialised thickness (ThkDA-HC; pink). Hindcasts from each of the 24 ensemble members, initialised from the 3 April/May start-dates, are depicted by the crosses; the ensemble mean is plotted with bold symbols and inter-connecting lines. Although the ThkDA-HC hindcasts only start from 2011 we plot the CTRL-HC throughout the whole period of the run from 1992-2015 to help put the relatively short 5-year time series into context. To assess the accuracy of the GloSea seasonal predictions, observational estimates of Arctic extent, from the CMEMS, HadISST and NSIDC sources (see Section 2.3), are plotted alongside the model hindcasts (black/grey). We note here that the difference in extent prior to 2010 between the CMEMS and the HadISST and NSIDC data sources apparent in Figure 4a is caused by the switch in OSI-SAF data products in October 2009, from OSI-409 version 1.1 to OSI-401a, described in Section 2.3 above. Being prior to the launch of CS2, this change does not have any impact on the results of our study but we include all years available from CTRL-HC in Figure 4 to build a picture of the skill in the CTRL-HC simulations.

The total extent comparisons in Figure 4a show that the run with initialised winter thickness gives improved predictions of September sea ice extent. This is particularly true for 2011 and 2012, for which the ThkDA-HC predictions of total extent are very close to the observed values. The underestimation of basin-wide extent seen throughout the CTRL-HC hindcasts has been reduced; the 2011-2015 5-year-mean extent of 3.78 × 10⁶ km² for ThkDA-HC is much closer to the observational average of 4.62 × 10⁶ km² than is the CTRL-HC value of 2.79 × 10⁶ km² (Figure 4a).
Basin-wide extent is not a very useful metric for assessing sea ice because, although it provides information about the amount of ice present, it does not take into account the location of the ice or the position of the ice edge, which are more useful for operational users (Notz, 2014). To assess the skill of GloSea predictions in relation to the spatial distribution of ice and ice edge location, we use the Integrated Ice Edge Error (IIEE) metric introduced by Goessling et al. (2016). This metric is essentially the area integral of all model grid cells where the forecast and observations disagree about whether sea ice is present or not (see Goessling et al., 2016 for more details). Here we use a sea ice concentration threshold of 15% to define whether ice is present or not in any particular grid cell and compare the GloSea hindcast predictions to the CMEMS reanalysis, which assimilated the OSI-SAF data. The GloSea and CMEMS products are on the same ORCA025 grid, so comparisons between the two are straightforward and are not degraded by having to remap the data between different grids. Results from the IIEE analysis can be found in Figure 4b, which shows IIEE for each ensemble member of the CTRL-HC and ThkDA-HC GloSea forecasts (as in Figure 4a but for IIEE rather than extent).
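As a minimal sketch of the two metrics discussed above, the following computes extent and IIEE on a flattened list of grid cells. It assumes forecast and observations share one grid (as GloSea and CMEMS do) and treats a cell as ice-covered when concentration is at least 15%; the function names and the toy four-cell grid are illustrative only and are not taken from the GloSea or CMEMS code.

```python
ICE_THRESHOLD = 0.15  # 15% concentration threshold, as stated in the text


def ice_present(concentration, threshold=ICE_THRESHOLD):
    """Binary ice mask for one cell (assumes 'at least' the threshold)."""
    return concentration >= threshold


def extent(concentration, cell_area):
    """Total area of cells classified as ice-covered."""
    return sum(a for c, a in zip(concentration, cell_area) if ice_present(c))


def iiee(forecast, observed, cell_area):
    """Integrated Ice Edge Error: area where forecast and observations disagree.

    Returns (total, overestimate, underestimate), where the overestimate is
    the area with forecast ice but no observed ice, and vice versa.
    """
    over = under = 0.0
    for f, o, a in zip(forecast, observed, cell_area):
        f_ice, o_ice = ice_present(f), ice_present(o)
        if f_ice and not o_ice:
            over += a
        elif o_ice and not f_ice:
            under += a
    return over + under, over, under


# Toy example: four cells with made-up concentrations and areas (km^2).
forecast = [0.90, 0.50, 0.10, 0.00]
observed = [0.80, 0.05, 0.60, 0.00]
area = [100.0, 100.0, 50.0, 50.0]

extent_error = extent(forecast, area) - extent(observed, area)
total, over, under = iiee(forecast, observed, area)
```

In this toy case the extent error is only 50 km² because misplaced ice partly cancels, whereas the IIEE is 150 km², which is the sense in which IIEE captures ice-edge location errors that basin-wide extent hides.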
The Cryosphere Discuss., https://doi.org/10.5194/tc-2018-62. Manuscript under review for journal The Cryosphere. Discussion started: 18 April 2018. © Author(s) 2018. CC BY 4.0 License.
Figure 4b shows that ice-edge error is considerably improved by the CS2 thickness initialisation, with IIEE reduced from 3.21 × 10⁶ km² for CTRL-HC down to 2.02 × 10⁶ km² for ThkDA-HC, a reduction of 37%. The differences in both extent and IIEE shown in Figure 4 are significant at the 1% level over the whole 5-year period and for each of the individual years except for 2013. In general, the improvement in the ice edge location and IIEE is more pronounced than the improvement to the basin-wide extent. This is to be expected given that the CS2 thickness initialisation changed the distribution of sea ice thickness in the Arctic as well as increasing average thickness.
Figure 5 further illustrates the spatial improvement in sea ice predictions, showing the probability of ice across the CTRL-HC and ThkDA-HC ensembles for each year (2011-2015) with ensemble-mean and observed ice extent (represented by 15% concentration contours) overlain. Here we calculate the probability of ice, at each grid-cell, as the proportion of ensemble members for which the ice concentration is at least 15%. Consistent with the IIEE results in Figure 4b, the ice edge location in Figure 5 for the ThkDA-HC system is much better than for CTRL-HC. In particular, the ThkDA-HC ensemble-mean ice edges for 2011 and 2012 are very close to those produced by the CMEMS reanalysis. A consistent feature of Figure 5 is that the ice edge along the Atlantic sector of the Arctic is very well defined for the ThkDA predictions and is very close to the CMEMS reanalysis for all years.
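The per-cell ice probability used for Figure 5, and the quoted 37% reduction in IIEE, can be checked with a short sketch. The function names and the toy four-member ensemble are illustrative assumptions; only the 15% threshold and the two IIEE values come from the text.

```python
def ice_probability(ensemble, threshold=0.15):
    """Fraction of members with concentration >= threshold, per grid cell.

    `ensemble` is a list of members, each a list of per-cell concentrations
    on a common grid.
    """
    n_members = len(ensemble)
    n_cells = len(ensemble[0])
    return [
        sum(1 for member in ensemble if member[i] >= threshold) / n_members
        for i in range(n_cells)
    ]


def relative_reduction(control, experiment):
    """Fractional reduction of `experiment` relative to `control`."""
    return (control - experiment) / control


# Toy ensemble: four members, three grid cells (made-up concentrations).
toy_ensemble = [
    [0.90, 0.20, 0.00],
    [0.80, 0.10, 0.00],
    [0.70, 0.30, 0.05],
    [0.95, 0.00, 0.00],
]
probabilities = ice_probability(toy_ensemble)

# IIEE values quoted in the text, in units of 10^6 km^2.
iiee_reduction_percent = 100 * relative_reduction(3.21, 2.02)
```

With the quoted IIEE values the reduction evaluates to roughly 37%, in agreement with the figure given above.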
The spatial changes in the September-mean sea ice concentration forecasts depicted in Figure 5

Wider impact of Arctic sea ice changes
The changes in winter ice thickness, and associated changes in the evolution of Arctic ice coverage through the melt season (described above), lead to some significant changes, and improvements, for prediction of the wider climate system. Owing to the overall increase in Arctic sea ice thickness and extent, the ThkDA predictions show a general cooling of September 2 m air temperatures in the Arctic (Figure 7a). This cooling reduces the model error (calculated here as the difference from the ERA-I atmospheric reanalysis), both for each individual ensemble member (Figure 8a) and for the ensemble mean (not shown). The exception to this is south of the Fram Strait in ice export regions, where the 2 m temperature has become too cool. We note, however, that for individual ensemble members, the disagreement with observations should not completely disappear, as the observed value represents only one possible realisation of the range of possible realities that the ensemble attempts to model. Interestingly, this improvement is also seen over perennially ice covered regions north of Greenland and the Canadian Arctic Archipelago, where significant improvements in air-sea fluxes would not necessarily be expected.
Significant differences are also present in the pressure fields. A general reduction in 500 hPa thickness is seen in the ThkDA experiment over the Arctic Ocean (Figure 7b), which in turn is also manifest in a decreased mean sea level pressure field over the region (not shown). This reduction leads to an improved comparison with observations (ERA-I reanalysis) over the Canadian Basin and Greenland, but a slightly worse comparison over the Barents Sea and Western Europe (Figure 8b). These pressure differences, however, are generally not significant, save for a small patch over the Canadian Arctic Archipelago (Figure 8b). The decrease in 500 hPa thickness and mean sea level pressure over the Arctic is suggestive of an increase in both the Arctic Oscillation (AO) and North Atlantic Oscillation (NAO) indices. This is consistent with other studies that have linked lower Arctic sea ice coverage with a tendency for a more meridional atmospheric jet (Francis and Vavrus, 2012), along with a tendency toward the negative phase of the NAO (Petoukhov and Semenov, 2010). However, owing to the small sample of years looked at here, it is doubtful we could establish a link with increased predictive skill of the inter-annual variability of the atmospheric mid-latitude circulation.

Impact of an improved model thickness climatology
The reanalysis comparison performed in Section 3 revealed persistent thickness distribution biases in the model relative to the CS2 derived data, whereby the ice was too thin in the Atlantic sector and too thick in the Pacific sector. As shown previously (Figure 6), these biases align very well with the ice edge errors, suggesting a clear relationship between model thickness bias and forecast error. We would therefore like to understand whether the improvements we see in the GloSea seasonal forecasts are caused primarily by an improvement to the model's thickness climatology, or whether the inter-annual thickness distribution changes present in the observations are having an impact.
To try to answer this question another ensemble of seasonal hindcasts was performed, for years 2011-2014 only, using the 2015 sea ice initial conditions each year. This ensemble of hindcasts is denoted CLIM-2015. We note here that CLIM-2015 hindcasts are not performed for 2015 because they would simply be a duplication of the ThkDA-HC 2015 hindcasts. The motivation for doing things this way is to ensure that we have a dynamically self-consistent initial condition for the sea ice model. Simply averaging the initial conditions for the 5 years would potentially introduce some coupled initialisation shock that could make the results harder to analyse.
The total extent and IIEE relative to the CMEMS reanalysis for CLIM-2015 can be found in Figure 4  ThkDA-HC runs statistically.

Interestingly, the CLIM-2015 hindcasts show much reduced inter-annual variability when compared to those from the CTRL and ThkDA experiments, and the ensemble-mean extents for each year are close to one another. This is interesting, given that Arctic summer sea ice melt is strongly influenced by atmospheric variability (Deser et al., 2000), and suggests that the ensemble size of 24 used here is sufficient to remove atmospheric variability from the ensemble mean. It also suggests that the initial Arctic thickness distribution and/or volume at the start of the melt season exerts a controlling influence on the evolution of the ice through the melt season and the eventual September mean extent. This latter point is further supported by the fact that an additional ensemble of GloSea seasonal forecasts, performed using constant 2015 initial conditions for both the ocean and sea ice components, gave very similar results to those seen in the CLIM-2015 experiment (not shown).
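One simple way to quantify the inter-annual variability discussed above is the standard deviation, across years, of the ensemble-mean September extent. The sketch below assumes this particular measure (the text does not specify one), and the three-member toy ensembles are made-up numbers, not GloSea output.

```python
from statistics import mean, pstdev


def ensemble_mean_by_year(extents_by_year):
    """Average the member extents within each year."""
    return {year: mean(members) for year, members in extents_by_year.items()}


def interannual_spread(extents_by_year):
    """Population standard deviation of the ensemble means across years."""
    return pstdev(list(ensemble_mean_by_year(extents_by_year).values()))


# Toy extents (10^6 km^2), three members per year; values are illustrative.
varying_initial_state = {
    2011: [4.0, 4.2, 4.4],
    2012: [3.0, 3.2, 3.4],
    2013: [5.0, 5.2, 5.4],
}
fixed_initial_state = {
    2011: [4.0, 4.2, 4.4],
    2012: [4.1, 4.2, 4.3],
    2013: [3.9, 4.2, 4.5],
}

spread_varying = interannual_spread(varying_initial_state)
spread_fixed = interannual_spread(fixed_initial_state)
```

In the toy data the members within each year still differ (mimicking atmospheric variability), but averaging over members leaves almost no year-to-year spread when the initial state is held fixed, which is the behaviour described for CLIM-2015.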

Summary and conclusions
In this study, we have used nudging techniques to test the impact that initialising forecasts using CryoSat-2 (CS2) thickness measurements could have on Met Office seasonal forecasts of September sea ice extent. We have shown that initialisation of sea ice thickness significantly improves the accuracy of GloSea seasonal forecasts of summer sea ice cover. Biases in total Arctic extent are reduced as a whole and there are considerable improvements to the spatial distribution of sea ice and ice-edge location, particularly in the Atlantic sector. These improvements to the sea ice cover also lead to improvements in near-surface temperature and pressure fields over the Arctic domain.
Technically, the application of thickness increments within the CICE sea ice model has been shown to work well. The model is able to retain the information supplied by the thickness nudging all the way through the summer, when thickness observations are absent. This is true during the GloSea coupled seasonal forecasts but also for the FOAM reanalysis, in which the sea ice model is also being modified by the assimilation of concentration. This result, which is also supported by the findings of Allard et al. (2018), increases our confidence that assimilating sea ice thickness using a more sophisticated and consistent approach will lead to improvements in the FOAM analyses as well as the short-range (FOAM) and seasonal (GloSea) forecasts initiated from them. Work is now underway at the Met Office, under the EU-SEDNA project, to include thickness within the fully balanced NEMOVAR 3D-Var FGAT variational scheme used in FOAM.
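For readers unfamiliar with nudging, the generic idea can be sketched as a Newtonian relaxation of the model state toward an observation. This is a textbook illustration only, not the increment scheme applied in CICE for this study: the function name, the time step `dt` and the relaxation timescale `tau` are assumptions introduced here.

```python
from typing import Optional


def nudge_thickness(model_h: float, observed_h: Optional[float],
                    dt: float, tau: float) -> float:
    """Relax a model thickness toward an observed value.

    model_h    -- current model ice thickness (m)
    observed_h -- observed thickness (m), or None when no observation exists
    dt         -- model time step (same units as tau); hypothetical parameter
    tau        -- relaxation timescale; hypothetical parameter
    """
    if observed_h is None:
        # No observation (e.g. summer): leave the model state untouched.
        return model_h
    weight = min(dt / tau, 1.0)  # cap so the update never overshoots
    return model_h + weight * (observed_h - model_h)


# Toy usage: relax a 1.0 m model value toward a 2.0 m observation.
h = 1.0
for _ in range(10):
    h = nudge_thickness(h, 2.0, dt=1.0, tau=5.0)
```

Each step closes a fixed fraction of the remaining gap, so the model state approaches the observation smoothly rather than being overwritten; when observations stop, the model simply evolves freely from the nudged state.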

Confronting the sea ice thickness from the FOAM reanalysis with the CS2 satellite data has revealed a persistent bias in the modelled thickness distribution, whereby the simulated Arctic sea ice is too thin on the Atlantic side and too thick in the Beaufort Sea. This bias is most likely caused by deficiencies in the formulation of the sea ice dynamics: either the rheology (the so-called ice-ice force) or the momentum exchange between components of the atmosphere-ice-ocean system (primarily wind drag). To ameliorate this situation we plan to experiment with the form-drag scheme and the anisotropic rheology developed for CICE by the CPOM group at University of Reading (Tsamados et al., 2013; 2014). In particular, the form-drag scheme has been shown to improve the Arctic thickness distribution in standalone sea ice model experiments (D. Schroeder, pers. comm., 2017).
The clear relationship between modelled winter thickness biases and summer extent errors shown in Figure 6, along with the improved ice cover obtained using thickness initialisation (Figure 4 and Figure
Although the addition of sea ice thickness nudging to the FOAM analysis system clearly improves the seasonal forecasts of summer sea ice, it is not clear how much of this improvement comes from initialising each year with the CS2 thickness and how much is down to the assimilation improving the model's thickness distribution climatology. The IIEE and extent analysis suggests that, for 3 out of the 4 years, using the correct thickness initialisation (ThkDA-HC) provides a better forecast of September ice edge location when compared with the run using the 2015 thickness (CLIM-2015). This is in agreement with the findings of Day et al. (2014), who showed, in the perfect model framework, that correct initialisation of Arctic thickness in the HadGEM1 climate model led to an improved model evolution when compared with initialising the model with its own thickness climatology. In this case, however, we are not able to say this conclusively because the time series is too short to allow us to reject the null hypothesis that all the ensemble members from these two runs are taken from the same distribution.
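The null hypothesis mentioned above (that members of two runs are drawn from the same distribution) can be examined in several ways; the text does not say which test was applied. As one possible illustration, the sketch below uses a two-sided permutation test on the difference in means. The function name, the number of permutations and the toy samples are all assumptions made for this example.

```python
import random


def permutation_p_value(sample_a, sample_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in sample means.

    Under the null hypothesis the labels 'a' and 'b' are exchangeable, so we
    repeatedly shuffle the pooled values and count how often the shuffled
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n_a = len(sample_a)
    pooled = list(sample_a) + list(sample_b)

    def mean_difference(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = mean_difference(pooled)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean_difference(pooled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Toy samples (made-up metric values for two hypothetical runs).
run_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
run_b = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]
p_separated = permutation_p_value(run_a, run_b)
p_identical = permutation_p_value(run_a, run_a)
```

A small p-value argues against the two runs sharing a distribution; with only a handful of years and overlapping ensembles, as in the ThkDA-HC versus CLIM-2015 comparison, such a test may well fail to reject the null hypothesis.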

Furthermore, the improvement shown in Figure 4 between ThkDA and CLIM-2015 is small relative to the improvement between ThkDA and CTRL. Therefore, we conclude that, certainly for the FOAM-GloSea system, improving the model thickness climatology is at least as important as initialisation of sea ice thickness for improving predictive skill of seasonal forecasts.
programme. We are thankful to Philip Davis (Met Office) for running and providing access to the CTRL-HC hindcast ensemble. Provision of observational data sources used within this study is also acknowledged: CryoSat2-derived thickness fields from CPOM; sea ice concentration products from OSI-SAF and NSIDC; ERA-Interim atmospheric reanalysis from ECMWF. EB would further like to thank Andy Ridout (CPOM, UCL) for providing monthly CS2 data and for his useful advice regarding aspects of the quality control.