Variability in Antarctic Surface Climatology Across Regional Climate Models and Reanalysis Datasets

. Regional climate models (RCMs) and reanalysis datasets provide valuable information for assessing the vulnerability of ice shelves to collapse over Antarctica, which is important for 2100 (cid:58)(cid:58)(cid:58)(cid:58) future (cid:58) global sea level rise estimates. Within this context, this paper examines variability in snowfall, near-surface air temperature and melt across products from the MetUM, RACMO and MAR RCMs, as well as the ERA-Interim and ERA5 reanalysis datasets. Seasonal and trend decomposition using Loess 5


Domain
Driving Data :::: Time ::::: Period : of ::::::  Table 1. The two reanalysis datasets and six RCM simulation outputs compared in the paper. The label with which each simulation is referred to in the paper is given.

Reanalysis Datasets and RCMs Specifications
The ensemble of Antarctic-wide RCM simulations examined in this paper are part of the Coordinated Regional Climate Downscaling Experiment (CORDEX: https://cordex.org/), which is a global project that provides coordinated sets of RCM sim-90 ulations worldwide. The model specifications for each of the RCM simulations in the chosen ensemble, as well as for the ERA-Interim and ERA5 reanalysis products, are detailed here. There are significant differences, with some of the key aspects being: different atmospheric dynamics components between all models; different surface schemes-of particular note is the 'zero-layer' scheme used in the MetUM simulations, which has been identified as a major deficiency in simulations, compared with the multi-layer schemes included in MAR and RACMO, due to such impacts as that on heat transfer and not representing 95 the insulating properties of the column of snow (Slater et al., 2017;Walters et al., 2019); differences in the vertical and horizontal resolutions between all models, with particular interest on the performance of the high-resolution 12 km MetUM against the low-resolution 49 km simulation; differences in the driving data between the two RACMO and two MAR simulations, that are otherwise identical; differences in the Digital Elevation Models (DEMs) and masks used by each model, with MAR and RACMO using comparatively similar DEMs, while the MetUM uses a DEM similar to that of ERA5 but shown to have large-100 scale differences relative the that used by MAR and RACMO of over 100 m, particularly over areas close to the perimeter of the ice sheet (Fig. C1).

ERA-Interim and ERA5
ERA-Interim, produced by ECMWF, is a global reanalysis dataset, spanning 1979-2019 with 6-hourly temporal resolution, approximately uniform horizontal resolution of 79 km spacing and 60 vertical levels up to 10 Pa (Dee et al., 2011). The

105
ECMWF uses a four-dimensional variational data assimilation technique (4D-Var). Era-Interim was world leading and is included as the specified driving data in the base criteria for the CORDEX simulations but has since been superseded by ERA5, also produced by ECMWF (Hersbach et al., 2020), with a number of ERA5 driven simulations also included in the Antarctic-CORDEX ensemble of RCM outputs. The ERA5 reanalysis dataset uses the updated Cycle 41r2 version of the Integrated Forecast System (IFS) numerical weather prediction (NWP) model, with significant developments to model physics 110 and assimilation methods (Hersbach et al., 2020). It spans 1950-Present with an enhanced single hourly temporal resolution, horizontal resolution of 31 km and 139 vertical levels up to 1 Pa. In addition, ERA5 has uncertainty estimates derived from an ensemble of 10 data assimilations performed at a 3 hourly temporal resolution and horizontal resolution of 63 km. The elevation used by ERA-Interim comes from interpolating the GTOPO30 elevation product (ECMWF, 2009), whereas for ERA5 surface elevation is derived from interpolation of a combination of the SRTM30 elevation product along with other surface elevation 115 datasets (ECMWF, 2016). The coupled surface schemes used for ERA-Interim and ERA5 are the Tiled ECMWF Scheme for Surface Exchanges over Land (TESSEL) and updated HTESSEL schemes respectively, both use a single tile to represent snow, while one of the major differences is that HTESSEL allows surface runoff (Balsamo et al., 2009).

MAR
MAR is a hydrostatic RCM, specifically developed for the polar areas (Fettweis et al., 2013). The Antarctic-wide simulations 120 analysed in this paper have a spatial horizontal resolution of 35 km with a vertical resolution of 24 atmospheric levels. Specific details of the atmospheric component of MAR can be found in Gallée and Schayes (1994); Gallée (1995). The atmospheric model is fully coupled to the 1-D SISVAT (Soil Ice Snow Vegetation Atmosphere Transfer) surface scheme (Fettweis et al., 2013(Fettweis et al., , 2017, which uses the Crocus multi-layer surface snow model (Brun et al., 1992) that contains subroutines for processes such as: snow metamorphism and meltwater runoff, retention, refreezing and percolation. SISVAT does not include a full 125 radiative transfer scheme in snow/ice and surface albedo is parameterised as a function of snow grain properties (Tedesco et al., 2016). The relaxation technique is used to apply LBCs from the driving data every 6 hours and spectral nudging is used to constrain the large-scale behaviour in the upper atmosphere. The two Antarctic-wide MAR simulations studied in this paper are identical apart from differing driving data from ERA-Interim and ERA5 respectively. The orography used in the simulations is from BEDMAP2 (Fretwell et al., 2013). For further detail on MAR and the specific version used to generate the 130 output examined in this paper (MARv3.10) the reader is referred to Agosta et al. (2019) and Mottram et al. (2021).

RACMO
RACMO is a hydrostatic RCM with a polar version developed to represent the climate specifically over ice sheets (Van Meijgaard et al., 2008). The RCM uses the dynamical core from HIRHAM (High Resolution Limited Area Model) (Undén et al., 2002) and the physics package CY33r1 version of the Integrated Forecast System (IFS) NWP model from ECMWF. The

135
Antarctic-wide simulations analysed in this paper have a spatial horizontal resolution of 27 km with a vertical resolution of 40 atmospheric levels. The simulations include a multi-layer snow scheme that simulates hydrological processes such as melt, percolation, refreezing and runoff as well as firn densification . In addition, a drifting snow scheme simulates movement of snow from surface winds across the ice sheet (Lenaerts et al., 2010(Lenaerts et al., , 2012a. A snow albedo scheme is implemented, which uses snow grain size as a prognostic variable as well as cloud optical thickness and solar zenith angle to 140 estimate albedo (Munneke et al., 2011). The relaxation technique is used to apply LBCs from the driving data every 6 hours for the RACMO simulation driven by ERA-Interim and every 3 hours for the simulation driven by ERA5 and spectral nudging is used to constrain the large-scale behaviour in the upper atmosphere. The two simulations studied are identical apart from differing driving data from ERA-Interim and ERA5 respectively. The orography used in the simulations is the same as from Bamber et al. (2009). For further detail on RACMO and the specific version used to generate the output examined in this paper 145 (RACMOv2.3p2) the reader is referred to van Wessem et al. (2018) and Mottram et al. (2021).

MetUM
The MetUM is a non-hydrostatic climate model, not specifically developed or optimised for use over the polar regions but adapted in these simulations for use over Antarctica ). The Regional Atmosphere physics configuration for midlatitudes (RA1M) is used (Bush et al., 2020), which is identified as the most suitable configuration available for simulating near-150 surface climatology over Antarctica (Gilbert et al., 2020. The Joint UK Land Environment Simulator (JULES) (Walters et al., 2019) is used with the option of a comparatively simple zero-layer snow/soil composite scheme that does not capture processes such as refreezing of melt water (Best et al., 2011). The two Antarctic-wide MetUM simulations analysed in this paper are identical apart from their spatial horizontal resolutions of 12 km and 49 km respectively, both have a common vertical resolution of 70 atmospheric levels. These limited-area, regional simulations are nested inside the global model configuration 155 of the MetUM, which is itself forced using ERA-Interim reanalysis data and follows a 12 hour re-initialisation procedure that constrains the large-scale circulation in the interior of the domain and prevents it from drifting too far from the driving data . The global MetUM model runs for 24 hour periods, with a re-initialisation happening throughout the domain every 12 hours and boundary conditions for the nested run saved each hour. The first 12 hours of each 24 hour run is discarded as spin-up, while the second 12 hours of each run is kept as output and stitched together with following runs. The 160 orography used in the simulations is the MetUM standard GLOBE 1 km dataset (Elvidge et al., 2019).

Comparison Method
The RCM simulations examined in this paper all use an equatorial rotated coordinate system, where a quasi-uniform horizontalresolution grid is defined over the region by first specifying the grid over the equator with constant latitude and longitude spacing between each grid-cell and then applying a rotation that takes the domain over the region of interest, for example 165 Antarctica. Direct comparisons between the model output are made by regridding onto a common grid, with a common domain and spatiotemporal coordinates. Cubic precision Clough-Tocher interpolation (Mann, 1999) is performed on the unrotated 'grid latitude' and 'grid longitude' coordinates, which are assumed approximately euclidean, to regrid all model outputs onto the MetUM(011) resolution grid. This grid is chosen as it is the highest resolution grid of the simulations examined, meaning no information is lost as part of the regridding. The domain is filtered to only include the regions common across the model outputs, 170 see Fig. 1. The time series examined is filtered to the common 1981-2018 period and 3/6 hourly outputs are aggregated to monthly averages, which captures the dominant annual and seasonal dependency in the variability. For surface air temperature, filtering to only the common timestamps across the models is first applied and then the average temperature over each month Figure 1. Map of Antarctica with some of the main regions and ice shelves labelled, made using the Quantarctica mapping environment (Matsuoka et al., 2021). The RCM simulation domains for the MetUM (green), RACMO (blue) and MAR (purple) are shown. A 1 km resolution hill-shade has been applied from BEDMAP2 (Fretwell et al., 2013).
computed. The common timestamps are limited by ERA-Interim to 00 h, 06 h, 12 h and 18 h. This is not required for snowfall or melt, which are defined as fluxes in the model output.
This results in individual trend (T), seasonal (S) and residual (R) components. The decomposition is additive, meaning for each data point ν=1 to N, the components are summed to give the original time series (Y) (eq.[ 1]). The trend component represents the low-frequency/long-timescale pattern of the time series, after filtering out medium and high-frequency signals including Basic time series decomposition involves first approximating the trend component by applying a polynomial fit through the data. Subtracting this component gives the de-trended data that is then split into seasonal sub-series (e.g. January, February, ...) 185 and an average of each sub-series gives the seasonal component of the data. Subtracting both the trend and seasonal components then gives the residual component of the series. STL is a more sophisticated procedure that allows options such as robust fitting (where the influence of outliers is limited) and also a time-varying seasonal component. The algorithm is iterative and involves two loops: the outer loop reduces the influence of outliers by assigning weights based on the magnitude of the remainder term; the inner loop involves estimation of the trend and seasonal components through iterative feedback (Cleveland et al., 1990).

190
The seasonal component is allowed to vary smoothly over the time series, which is done by applying a LOESS (LOcal Re-grESSion) smoothing to the monthly sub-series with window length n s . As n s → ∞ the LOESS smoothing becomes equivalent to simply taking the average over the sub-series. The value of n s is recommended to be greater than 7 (Cleveland et al., 1990).
As the value increases, the seasonal component approaches a constant periodic state. In this work 13 is used as this allows potential decadal oscillations in the climate to be captured in the seasonal component, such as the Pacific Decadal Oscillation  In this paper temporal variability between the ensemble of Antarctic-wide datasets is assessed in several ways, including: calculating the Pearson linear correlation coefficient between the outputs for each component of the time series and each variable of interest; quantifying differences in the mean of the time series as well as in the standard deviation of the seasonal and residual components; and calculating the RMSD ::: root ::::: mean :::::: square ::::::: deviation :::::::: (RMSD) between the outputs for each variable of interest. Each metric is calculated for every grid-cell in the domain, with Antarctic-wide plots showing spatial patterns. For snowfall and melt, differences at each grid-cell are expressed as a proportion of the respective inter-annual deviations, providing some measure of the relative significance of differences at each location. The impact of systematic differences :: in ::::::: snowfall ::: and :::: melt : on estimates of ice shelf stability depend not only on absolute magnitudes but also on the relative magnitude against a baseline variance, as well as how this influences the ratio in magnitudes between snowfall and melt. The inter-annual, 215 baseline deviation at each grid-cell is approximated as the ensemble average standard deviation in the trend component of the time series. :::::: Results :::::::: presented :: in :::::: spatial :::: maps :::: then ::::: show ::: the ::::::: relative :::::::::: significance :: of ::::::::: systematic ::::::::: differences :::: and ::: are ::: not :::::: simply :::::::: dominated ::: by ::: the :::: sites :::: with ::: the :::::: highest ::::::::: magnitude ::::::::::: snowfall/melt. : For near-surface air temperature differences :: for ::::: each ::::::: grid-cell are not expressed as a proportion and instead simply in degrees Kelvin.

220
Variability in the ensemble of Antarctic-wide outputs (Table 1)  A spatial map of the median correlation in the residual component across the 28 unique model output pairs is plot in Fig. 3.
An ice sheet-only mask is applied for melt using the high resolution shapefile from Depoorter et al. (2013), which is found to 240 remove the most prominent edge effects caused by comparing high-and low-resolution models for a variable that is dependent on the sea/ice categorisation of the grid-cell. In addition, grid-cells where the ensemble 40-year average melt is less than 1 millimeter water equivalent per month (mmW Eqm −1 :::::::::::: mm w.e. m −1 ) are again masked. In Fig. 3 the median correlation for near-surface air temperature is shown to be high (>0.8) across the ice sheet, while for snowfall the correlation remains high again across the majority of the ice sheet but is moderate to low over regions such as the trans-antarctic :::::::::::: Transantarctic 245 mountains, where the topography varies sharply. For melt, the correlation is moderate over the majority of ice shelves, although is noticeably low over: the Ronne ice shelf; the ice shelves bounding Victoria Land; and the interior of the Amery ice shelf.
To understand how systematic differences vary spatially the 1981-2018 mean and seasonal/residual standard deviations for 260 the monthly time series of each variable are also computed at a 12 km grid-cell level. Since it is found that systematic differences in the mean and standard deviations are most pronounced between different models in the ensemble, results presented in Fig.   4, 5 and 6 are filtered to only include: ERA5; MetUM(011); MAR(ERA5); and RACMO(ERA5). ::::::::: Differences ::: for :::: each :::::: model :: are :::: then :::::: plotted ::::::: relative :: to ::: this ::::::: reduced :::::::: ensemble :::::: average ::::::::::: as it is shown in Table 2 that the relative magnitude against standard deviations in the seasonal and residual components is low. For snowfall and melt, differences at each grid-cell are expressed as a proportion of the respective inter-annual deviations, approximated by the ensemble average standard deviation in the trend component. This is done so results presented in 270 spatial maps show the relative significance of systematic differences and are not simply dominated by the sites with the highest magnitude snowfall/melt. Given that for near-surface air temperature, it is not clear whether higher magnitude differences in the mean and standard deviation are expected for sites with greater average temperature, then instead of expressing the results as a proportion they are simply expressed in degrees kelvin.
the Antarctic-CORDEX project, as well as significant differences in the driving data for the two MAR and RACMO runs.

385
The same-model RCM simulations in the ensemble, as well as having identical model physics, parametrisation and tuning, also share factors such as the domain specification, ice mask applied, digital elevation model and boundary conditions. The relative contribution of these additional factors is explored in section 5.1 and from this it is argued that the joint influence of choices in model physics, parametrisation and tuning is the primary factor influencing large-scale variability across the ensemble. The impact of parameter tuning is discussed in (Gallée and Gorodetskaya, 2008), ::::::::::::::::::::::::::: Gallée and Gorodetskaya (2008) where 390 it is shown that adjustment of the parameter for ::::::: adjusting : the relative contribution of snow particles compared to ice particles in the radiative scheme used by MAR ::::: MAR's :::::::: radiative ::::::: scheme has a significant impact on near-surface air temperatureas a : . :: A higher relative contribution of snow particles leads to greater flux in long-wavelength downwards radiation. In addition to exploring the relative contribution of different factors to the large spatial scale variability in the ensemble, in section 5.2 specific features of the variability that are mentioned in results section 4 are discussed and the nature of variability for different 395 variables, regions and time-scales is examined.

Contribution to Variability from the choice of: Domain; Ice Mask; DEM and Boundary Conditions
The exact spatial domains differ between the RCM simulations as shown in Fig. 1. However, the spatial domain for all RCM simulations examined is Antarctic-wide and domain boundaries all exist over the ocean, implying there should be no strong local forcing at any of the boundaries. The effect of increasing domain size over the ocean on the output of climatology 400 over the Greenland ice sheet for MAR has previously been studied and found to not significantly impact results over the ice sheet (Franco et al., 2012). In general, the domain size should be great enough such that the buffer zone, in which boundary conditions are applied, does not intersect the region of interest, which in this case is the Antarctic ice sheet. It is found that for the MetUM(044) run : , the buffer zone intersects some areas of the periphery of the ice sheet, shown clearly in Fig. B1d.
Despite this, it can be seen that effects are localised to the buffer zone boundary : , and that even for the regions of the ice sheet 405 that intersect this the relative impact on systematic differences appears less significant than other factors explored. Overall it is assumed that, for the ensemble of RCM simulations studied, differences in the domains does not have a significant effect on the model output for surface climatology over the ice sheet.
As well as having differences in the outer domain boundaries, the different models also have slight differences in the specified boundaries of the ice sheet due to different coordinates and ice masks used. This creates edge effects at the periphery of the 410 ice sheet, particularly noticeable for melt in for example Fig. 5d at the edges of the Ronne-Filchner and Ross ice shelves. It is shown in Mottram et al. (2021) and Hansen et al. (2022) that these edge effects, due to inconsistent ice masks, can have a significant impact on the total estimated SMB over the ice sheet. In this paper, although ice sheet wide totals are computed (Table 2), the focus is primarily on evaluating variability in the time series at each 12 km grid cell after regridding products to a common high-resolution grid. Results for spatial maps of differences for melt are masked using an ice-sheet only mask from 415 Depoorter et al. (2013), which is found to exclude the most significant edge effects from areas where low-resolution models overestimate the extent of the ice sheet after regridding. The same mask is applied before calculating average correlations and RMSDs also reducing the impact of edge effects. Results presented and discussed here, particularly regarding large-scale spatial patterns, are therefore assumed to not be significantly impacted by the different ice masks used in the ensemble of simulations.

420
Another important consideration when comparing RCM simulations is how the method of applying boundary conditions varies across the ensemble. In particular, although all RCMs examined are nudged at the boundaries within buffer zones, MAR and RACMO also use spectral nudging that constrains the large scale circulation in the interior of domain, while the MetUM instead uses a re-initialisation procedure. Spectral nudging involves applying the relaxation technique throughout the interior of the domain to the long wavelength components of the climate model fields (von Storch et al., 2000). This constrains the 425 large-scale climatology of the RCM output to that of the driving data, while allowing value-added by the RCM in the smallscale features. The same is aimed to be achieved with the re-initialisation of the MetUM throughout the domain every 12 hours, as discussed in section 2.4. The fact that systematic differences between the MetUM, MAR and RACMO are all of significant and comparable magnitude ( Fig.4(g,j),5(g,j) and 6(g,j)), despite MAR and RACMO sharing the technique of spectral nudging, suggests that differences between the specific approach of applying large-scale constraints within the ensemble of RCMs 430 studied is not one of the main features contributing to variability in the mean and seasonal/residual standard deviations of snowfall, near-surface air temperature and melt. It is noted however, that in general the MetUM simulations, rather than the MAR and RACMO simulations, show slightly higher correlation to the reanalysis driving data across the ice sheet for the monthly time series of snowfall and surface temperature (Fig. 2), indicating that the re-initialisation procedure potentially constrains the output across the ice sheet more than spectral nudging.

455
Specific features in the variability, identified and mentioned in section 4, are discussed here. In section 4.1 it is mentioned that for melt there is a clear divide in the average correlation in the residual component of the time series between reanalysis datasets compared with RCMs (Fig. 2). That is the correlations between different RCMs are greater than between reanalysis datasets and RCM outputs. This is not the case for snowfall and near-surface air temperature, suggesting the divide in correlation for melt is primarily due to differences in the sophistication and polar specific tuning of the surface schemes used for the RCM 460 simulations and the global reanalysis products. It is shown in Hansen et al. (2021) that the subsurface scheme and the handling of layers within the scheme can have a significant impact on melt.
In section 4.1 it is also shown that, particularly for snowfall and melt, the median correlation between the outputs is strongly dependent on the specific region and topography. For melt three regions are highlighted that show low correlation: the Ronne ice shelf; the ice shelves bounding Victoria Land; and the interior of the Amery ice shelf. In the case of the Ronne ice shelf, 465 the low correlation in melt is due to relatively low average melt occurring over the region, so fluctuations away from no melt are small and erratic. Low correlation over ice shelves bounding Victoria Land is expected to be caused by a combination of their fine scale and the sharply varying topography in the region, making the climatology around them difficult to resolve with the resolution available in the climate models. Finally, the pattern of low correlation around regions such as the interior of the Amery ice shelf is likely the result of atmospheric processes difficult to represent fully in the models, for example: katabatic 470 winds, driven by gravity, flowing from the interior of the ice sheet to the exterior down elevation channels have a significant impact on the climate on the Amery ice shelf, particularly near the grounding line (Lenaerts et al., 2017).
As with for correlation, the systematic differences shown between the outputs in the ensemble vary depending on the region and topography, see section 4.2. This is true at large and small spatial scales and for all variables. An example of a dependency at large scale is in Fig. 4g MAR shows a significant positive difference in the 40-year mean monthly snowfall relative to the 475 other outputs over the majority of the ice sheet and a significant negative difference over the majority of the surrounding ocean.
In the case of MAR this is hypothesized to be due to a couple of reasons: MAR is forced at the boundaries by humidity and needs time to transform this into precipitation; MAR allows precipitation to be advected through the atmospheric layers until reaching the surface. The advection of precipitation in MAR through each atmospheric layer, in comparison to the instantaneous depositing of precipitation by RACMO, leads to increased snowfall towards the interior of the ice sheet, previously identified 480 in (Agosta et al., 2019).

6 Conclusions
The spatial nature and magnitude of variability present in an ensemble of current, state-of-the-art Antarctic-wide RCM outputs and global reanalysis datasets is examined for snowfall, near-surface air temperature and melt. This is done at a 12 km grid level, rather than across elevation bins, which reveals significant spatial patterns in correlation and systematic differences in the mean and seasonal/residual standard deviation. Time series decomposition is used to split comparisons across an ∼inter-annual 500 trend component, a periodic seasonal component and a monthly residual component, which is useful for impact assessments where knowledge of variability in the climate data across different time-scales and climate drivers is needed.
It is found that the RCM outputs and reanalysis datasets show high correlation for the monthly time series of snowfall and surface temperature across the majority of Antarctica and the bounding Southern Ocean. Despite this, : there exists significant differences, with respect to both magnitude and spatial scale, in the mean and seasonal/residual standard deviations of the time 505 series. In addition, high RMSD between the outputs is found for all variables and is particularly significant for melt with respect to the proportional values relative to annual fluctuations. The primary sources of large-scale, systematic differences between the simulations, for all variables and components, are identified as deriving from differences in: the model dynamical core; the surface scheme; parametrisation and tuning. Differences in driving data, resolution, domains, ice masks, DEMs and boundary conditions are identified as having a secondary contribution. On local, fine spatial scales the relative contribution from different 510 factors is more complex and differences in for example resolution are shown to have a more significant impact.
Code and data availability. The monthly output from all RCM simulations examined in this paper, as well as the processed data used for figures and tables, is available at: https://doi.org/10.5281/zenodo.6367850 . The code used to import, process and generate 525 the figures/tables is available at: https://doi.org/10.5281/zenodo.6375205 (Carter, 2022). The reanalysis data is available to download through the C3S climate data store (CDS) (ECMWF, 2011;Hersbach et al., 2018). Further outputs from Antarctic-wide RCM simulations are available from the the Antarctic-CORDEX project: https://climate-cryosphere.org/antarctic/.
Appendix A: STL Decomposition Figure A1 shows an example of applying STL decomposition to the time series of snowfall, surface temperature and melt for 530 a grid-cell on the Larsen C ice shelf. The decomposition has been applied to each of the 8 model outputs examined in this paper. The trend, seasonal and residual components are shown next to the original time series. Decomposing the time series into these components allows some features to be extracted. For example, in the case of snowfall and surface temperature the models all show high correlation in the inter-annual trend, although there exists a significant systematic difference in the mean between the models. Similarly, for snowfall and surface temperature there is high correlation in the residual term between the 535 models but there is a systematic difference between the models in the standard deviation of that component. In the case of melt, the correlation is more moderate for the trend and residual components, meaning systematic differences are less obvious. The seasonal and residual components in STL decomposition are defined to have approximately zero mean.