Near-surface air temperature (SAT) over Greenland has important effects on
the mass balance of the ice sheet, but it is unclear which SAT datasets are
reliable in the region. Here extensive in situ SAT measurements (
Near-surface air temperature (SAT) over the Greenland ice sheet (GrIS) is important both for its place in wider climate change and for its effects on the mass balance of the ice sheet. Due to its remoteness and extreme climate, however, continuous and widespread climate monitoring over the GrIS has been carried out for only about the last two decades, and even then with rather sparse coverage in some geographic areas and glaciological regimes. Studies of past climate and surface mass balance (SMB) of the GrIS have used a variety of techniques to achieve complete spatial coverage of SAT, including statistical interpolation, atmospheric reanalysis, dynamical downscaling through regional climate modeling, and satellite remote sensing. Projections of future change in Greenland climate and ice sheet evolution have used global Earth system models, either directly (e.g., Ridley et al., 2005; Vizcaíno et al., 2013) or through dynamical downscaling (e.g., Fettweis et al., 2013; Rae et al., 2012). Many such studies have included some form of assessment using weather station data (e.g., Box, 2013; Noël et al., 2015; Rae et al., 2012) and inter-comparison of several SAT data sources (e.g., Box, 2013). Here we build on such work to assess and compare a greater number of widely available products, using a more comprehensive set of in situ observations than has customarily been used in previous work. In doing so, we hope to guide future dataset and model development over this region and to address a number of outstanding questions.
Our main focus here is on global datasets – reanalyses, gridded SAT analyses
and earth system models from the CMIP5 archive – though several regional
datasets are also included. Regional climate models (RCMs) have been used
widely to downscale reanalysis (e.g., Box, 2013; Box et al., 2009; Burgess et
al., 2010; Ettema et al., 2010a; Fettweis et al., 2017; Noël et
al., 2015) and global climate model output (e.g., Fettweis et al., 2013; Rae
et al., 2012). While Noël et al. (2016) demonstrated the benefit of high
(
Inter-comparison of SMB components has been carried out among different RCMs and between RCMs and global reanalyses (Cullather et al., 2016; Rae et al., 2012; Vernon et al., 2013). These studies point to a wide inter-model spread, which is related to differences in model parameterizations (e.g., snow and ice physics), model ice masks and forcing at the domain lateral boundaries. One goal of this work is to investigate how closely the forcing dataset constrains SAT in an RCM. We do this by comparing differently forced runs of the same RCM (building on the work of Fettweis et al., 2017) and by comparing these runs with results taken directly from the forcing dataset.
Satellite remote sensing data have been key in spatially complete
reconstruction of GrIS SAT, whether through direct use (e.g., Hall et
al., 2013) or through assimilation into reanalyses. One consequence of this,
though, is that only a small proportion of studies extend GrIS SAT back
before the satellite era. SMB studies that incorporate centennial-scale SAT
reconstructions include: Hanna et al. (2011), who combined Twentieth Century
Reanalysis (Compo et al., 2011) and ERA–40 reanalysis (Uppala et al., 2005);
and Box (2013) who adjusted regional climate model output using in situ
observations to reconstruct SAT from 1840 to 2010. The Box (2013) SAT
reconstruction was compared to that of Hanna et al. (2011) and found to be
cooler over most of the common period, but especially so before about 1930.
More recently, Fettweis et al. (2017) investigated the effect on RCM-derived
SMB of using different forcing reanalyses and showed that SAT estimates are
sensitive to model forcing, with large differences in the first half of the
20th century. By looking at multiple datasets that include the first half of
the 20th century (and earlier), we hope to shed light on the climate of the
GrIS in this very poorly observed period. In particular, such datasets allow
comparison with previous assessments of Greenland SAT climate based on
(mainly coastal) station data (e.g., Box, 2002; Chylek et al., 2006; Hanna et
al., 2012; Mernild et al., 2014). Long, spatially complete time series also
offer the best means of assessing CMIP5 models, without differences
introduced by incomplete spatial coverage and short period (
This paper is structured as follows: in Sect. 2 data sources are described and examples of their past use given; results are broken down into Sect. 3.1, dataset assessment using in situ observations, Sect. 3.2, comparison of long term SAT changes among datasets and Sect. 3.3, further discussion; conclusions are presented in Sect. 4.
Map of study area and weather stations used in this work. Symbol types represent the different monitoring networks summarized in the inset table.
To assess the different SAT products, we use SAT observations made at manned
and automatic weather stations (AWSs) from several sources, totalling 17 000
station-months or 1400 station-years. These are briefly described here, and
further details are shown in Fig. 1. Coastal station records of monthly mean
temperature for 11 stations (stretching as far back as 1784) are compiled by
the Danish Meteorological Institute (DMI; Cappelen, 2014). Thanks to their
long records, SAT from these stations has been studied extensively: Box
(2002) found a pattern of warming from
In contrast to coastal regions, no long-term (e.g., 30 years or more) climate monitoring has occurred on the GrIS. Monthly mean temperatures from mid-20th century expeditions and field camps, concentrated in the 1930s and 1950s, are taken from the Appendix of Ohmura (1987). Since the mid-1990s, the number of SAT observations from the ice sheet has greatly increased. We use records from three sources. The first is AWSs operated as part of the Greenland Climate Network (GC–Net), predominantly in the accumulation region of the ice sheet (Steffen and Box, 2001). The second is the K–transect in western Greenland, operated by the Institute for Marine and Atmospheric Research at Utrecht University (van de Wal et al., 2005; van den Broeke et al., 2011). The third is AWSs, mostly in the ablation region, operated by the Geological Survey of Denmark and Greenland (GEUS) under the Program for Monitoring the Greenland Ice Sheet (PROMICE) and the Greenland Analogue Project (GAP) (Van As et al., 2011). Locations and types of all stations are shown in Fig. 1, and further details are available in Table S1 in the Supplement.
Temperature products assessed in this work. Latitude–longitude spacing refers to the grids downloaded for this work (not necessarily the native model grid). Maximum output frequency refers to the maximum available frequency; monthly averages are used in the analysis.
The providers of several of these observational datasets apply quality control tests and/or quality inspection as part of their routine data management. In addition, we remove unrealistic values (e.g., spikes and step changes) wherever our own inspection of the time series reveals them. Where data were provided as hourly values, we calculate daily averages (the mean of hourly values) for all days with 20 or more hourly values. We then calculate monthly averages (the mean of daily values) for all months with 24 or more daily values.
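The two completeness thresholds above can be sketched in Python as follows. This is a minimal illustration, not the authors' processing code. It assumes hourly records arrive as (timestamp, value) pairs with `None` marking a missing hour, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

MIN_HOURS_PER_DAY = 20   # days with fewer valid hourly values are dropped
MIN_DAYS_PER_MONTH = 24  # months with fewer valid daily means are dropped


def daily_means(hourly):
    """Mean of hourly values for each day with enough data.

    hourly: iterable of (datetime, value) pairs; value is None when missing.
    Returns a dict mapping date -> daily mean.
    """
    by_day = defaultdict(list)
    for timestamp, value in hourly:
        if value is not None:
            by_day[timestamp.date()].append(value)
    return {day: mean(vals) for day, vals in by_day.items()
            if len(vals) >= MIN_HOURS_PER_DAY}


def monthly_means(daily):
    """Mean of daily means for each (year, month) with enough valid days."""
    by_month = defaultdict(list)
    for day, value in daily.items():
        by_month[(day.year, day.month)].append(value)
    return {ym: mean(vals) for ym, vals in by_month.items()
            if len(vals) >= MIN_DAYS_PER_MONTH}


# Synthetic January: days 1-7 keep only 19 hourly values, so they are dropped,
# leaving 24 valid days (8-31), which just meets the monthly threshold.
example = [(datetime(2000, 1, day, hour),
            None if (day <= 7 and hour >= 19) else float(day))
           for day in range(1, 32) for hour in range(24)]
example_daily = daily_means(example)
example_monthly = monthly_means(example_daily)
print(len(example_daily), example_monthly)  # 24 {(2000, 1): 19.5}
```

Dropping one more day from this example would leave 23 valid days, and the month would then be discarded.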
Most of the datasets assessed here fall into two categories: global reanalyses and interpolated global SAT analyses. The spatial and temporal resolution and the length of record (Table 1) vary greatly across these products. Reanalyses are constrained by remote sensing (in some cases) and by some local observations to represent observed synoptic- to planetary-scale weather. Even so, the lack of assimilated SAT observations over Greenland means that the SAT data assessed here are largely the result of modelled atmospheric and surface processes.
Several of the latest generation of global reanalyses are used in this study (Table 1). Most of these rely on radiosonde and satellite data, and thus cover only the period when these are available (1979 onwards; 1958 onwards in one case). In addition, we analyze the Twentieth Century Reanalysis version 2c (20CRv2c; Compo et al., 2011) and ERA–20C (Poli et al., 2016). These do not assimilate satellite or radiosonde data. Instead, they use a subset of observation types that are available over the 20th century (and earlier), and therefore cover much longer periods. GrIS SAT from reanalyses has been used in SMB modeling: Hanna et al. (2005) used ERA–40, while Hanna et al. (2011) combined ERA–40 with 20CR. However, SAT data from a number of other reanalyses remain untested for such applications. With the exception of ERA-Interim, SAT from land stations is not assimilated into reanalyses, so the SAT observations described in Sect. 2.1 provide independent verification. In ERA-Interim, SAT from land stations is assimilated by the surface analysis scheme to update surface fields (such as soil moisture) that affect SAT. To the best of our knowledge, for the period analysed here the only Greenland SAT observations assimilated by ERA-Interim are from DMI stations, so the ice sheet stations still provide independent data.
Reanalysis represents a combination of observations and model. In contrast,
several research groups have created gridded SAT datasets based almost
entirely on statistical analyses of weather station SAT (we refer to these as
Recognizing that reanalysis SAT over Greenland is dominated by the model
formulation and has relatively coarse horizontal resolution, a number of
researchers have sought to improve results over the GrIS by using reanalysis
to force higher resolution regional climate models (RCMs) coupled to
comparatively sophisticated snow–ice models. Such models are typically run
with grid spacing of 10–20
Satellite remote sensing data, in addition to being assimilated by reanalyses, have been used directly to study the GrIS. Several studies have focused on the relationship between SAT and ice sheet surface temperature (IST). These have used data from both microwave sensors (e.g., Shuman et al., 1995, 2001) and infrared sensors (e.g., Comiso et al., 2003; Hall et al., 2008, 2013; Koenig and Hall, 2010). Sounding instruments offer a more direct way to retrieve air temperature, but have received little attention over the GrIS. Here we assess SAT from the Atmospheric Infrared Sounder (AIRS; Chahine et al., 2006) on board NASA's Aqua satellite platform. AIRS has been operational since September 2002, providing temperature and humidity retrievals at many vertical levels through the atmosphere. We use the level 3 monthly near-surface air temperature from ascending and descending overpasses, taking a weighted average to give a single monthly value at each grid point (further details are given in Table 1). This product is a clear-sky-only retrieval. A key part of assessing it is therefore to understand how this sampling interacts with, for example, seasonally varying cloud amounts and increased wind-driven mixing during winter storms, as discussed in Koenig and Hall (2010).
Earth System Models (ESMs) from the CMIP5 multi-model ensemble archive (Taylor et al., 2011) are included in comparisons of long-term areal average SAT. However, CMIP5 ESMs are not compared against in situ observations. The ESMs are free-running coupled (atmosphere–ocean–land–ice) models, so we do not expect them to reproduce the correct phasing of synoptic weather, or of inter-annual or even decadal climate variability. Apparent biases at station locations would therefore combine bias in the long-term average with differences in variability over the relatively short station records. Comparing ice sheet areal averages with those from the longer reanalyses and gridded SAT analyses should adequately reveal the first-order biases in the ESMs' long-term average SAT and its trends. Thirty-one different model configurations from 11 modeling centers are used. We use the first ensemble member (r1i1p1) of the historical runs from all model configurations that had the necessary data (SAT and glacial ice fraction). Further details of individual models are given in Table S2. In contrast to the other datasets above, CMIP5 ESM SAT data are used on their native model grids rather than interpolated to a common grid (discussed below).
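A minimal sketch of an ice sheet areal average weighted by glacial ice fraction on a native ESM grid follows. It assumes a regular latitude–longitude grid on which cell area scales with the cosine of latitude. The cosine-latitude area factor and the function name are assumptions of this sketch, not details given in the text.

```python
import math


def ice_sheet_average(sat, ice_fraction, lats):
    """Areal-average SAT over glaciated cells of a regular lat-lon grid.

    sat, ice_fraction: 2-D lists indexed [row][col]; ice_fraction in 0-1
    (divide by 100 first if the model stores it in percent).
    lats: latitude in degrees of each row.
    Weight of each cell = cos(latitude) * ice fraction (assumed area factor).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for row_sat, row_ice, lat in zip(sat, ice_fraction, lats):
        area_factor = math.cos(math.radians(lat))
        for t, frac in zip(row_sat, row_ice):
            w = area_factor * frac
            weighted_sum += w * t
            weight_total += w
    if weight_total == 0.0:
        raise ValueError("no glaciated cells in the domain")
    return weighted_sum / weight_total
```

Cells with zero ice fraction drop out entirely, so ocean and tundra temperatures do not contaminate the ice sheet average.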
Our analysis is based on monthly mean near-surface air temperature. Except for the CMIP5 ESMs and the MAR RCM variants, datasets were spatially interpolated from their native grids to a 5 km equal-area grid using bilinear interpolation. This grid is the Equal-Area Scalable Earth (EASE) grid of the National Snow and Ice Data Center (NSIDC). This resolution was chosen in an attempt to resolve the large SAT gradients that occur over the steep topography at the ice sheet margin. Interpolating in this way presents some potential problems related to model topography. The surface elevation fields used in many of the datasets here are smoother than the actual topography of Greenland, which leads to elevation biases, as seen in Fig. 2. The relatively low resolution 20CRv2c (Fig. 2b) has mostly positive elevation bias around the edge of the ice sheet and negative bias in the interior; however, there are also regions of positive bias close to the center of Greenland. The higher resolution MAR (Fig. 2c) does not have interior biases of the same magnitude, but still misses much of the small-scale detail, as seen in the speckled pattern of biases of alternating sign. All datasets have a negative mean elevation bias on the ice sheet (Table 2), with MAR the smallest and 20CRv2c the largest. Note that elevation errors are not a monotonic function of resolution. The Climate Forecast System Reanalysis (CFSR) has a smaller grid spacing than MERRA2 and ERA–Interim, yet still has a larger bias and mean absolute error.
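The sketch below shows what bilinear interpolation does. It interpolates a field on a regular, strictly increasing grid to one target point, for example the centre of a 5 km EASE grid cell expressed in the source grid's coordinates. It is an illustrative stand-in, not the regridding tool used for this work. It assumes both coordinate arrays increase monotonically and does not handle longitude wrap-around or polar rows. The function name is hypothetical.

```python
from bisect import bisect_right


def bilinear(xs, ys, field, x, y):
    """Bilinearly interpolate field[j][i], defined at (xs[i], ys[j]), to (x, y).

    xs and ys must be strictly increasing and (x, y) must lie inside the grid.
    """
    if not (xs[0] <= x <= xs[-1] and ys[0] <= y <= ys[-1]):
        raise ValueError("target point outside source grid")
    # Lower-left corner of the enclosing cell (clamped so points on the
    # upper edge still use the last cell).
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    j = min(bisect_right(ys, y) - 1, len(ys) - 2)
    # Fractional position of the target within that cell.
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    f00, f10 = field[j][i], field[j][i + 1]
    f01, f11 = field[j + 1][i], field[j + 1][i + 1]
    return ((1 - tx) * (1 - ty) * f00 + tx * (1 - ty) * f10
            + (1 - tx) * ty * f01 + tx * ty * f11)
```

Because each target value is a weighted mean of only the four surrounding source values, the method cannot recover terrain detail finer than the source grid. This is why the elevation biases discussed here carry over into the interpolated SAT.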
The elevation biases make the SAT fields smoother than in reality, and interpolation of these smooth SAT fields is unlikely to reproduce the true SAT gradients, which are strongly influenced by elevation. To account for this, a correction is applied to the reanalysis and AIRS datasets after interpolation to the EASE grid. For each product, the elevation field is also bilinearly interpolated to the EASE grid and then compared with the digital elevation model (DEM) of Bamber et al. (2013), which is provided at 1 km grid spacing and is here bilinearly interpolated to the EASE grid. The elevation bias (product minus DEM) is multiplied by the relevant month's lapse rate from Fausto et al. (2009), and the result is added to the interpolated SAT field. The importance of this step can be seen by comparing the results below with the corresponding figures for uncorrected datasets (Figs. S2 and S3 in the Supplement). For some datasets in some seasons, the correction leads to a deterioration, but in most cases there is a clear improvement: in many cases, bias and MAE (averaged over all months) are reduced by 50 % or more.
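Written out, the correction is: corrected SAT = interpolated SAT + Γ_m × (product elevation − DEM elevation), where Γ_m is the month's lapse rate expressed as a positive temperature decrease per unit height. A minimal Python sketch follows. The function names are hypothetical, and the Fausto et al. (2009) monthly lapse rates are not reproduced here; any values passed in are placeholders.

```python
def elevation_correct(sat_interp, z_product, z_dem, lapse_rate):
    """Lapse-rate correction of interpolated SAT at one grid point.

    sat_interp: SAT interpolated from the product
    z_product: product surface elevation interpolated to the same point (m)
    z_dem: reference DEM elevation at the point (m)
    lapse_rate: positive temperature decrease per metre for this month
    A product surface that is too high (positive bias) is too cold, so the
    correction warms it; a surface that is too low is cooled.
    """
    return sat_interp + lapse_rate * (z_product - z_dem)


def correct_field(sat, z_product, z_dem, month, lapse_rates):
    """Apply the correction to 2-D fields for one calendar month.

    lapse_rates: mapping month (1-12) -> lapse rate (placeholder values).
    """
    gamma = lapse_rates[month]
    return [[elevation_correct(t, zp, zd, gamma)
             for t, zp, zd in zip(row_t, row_p, row_d)]
            for row_t, row_p, row_d in zip(sat, z_product, z_dem)]
```

For example, with a placeholder rate of 0.006 K per metre, a grid point whose product surface lies 500 m below the DEM is cooled by 3 K.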
Error statistics of model elevation fields (interpolated to EASE grid, except for MAR) relative to the digital elevation model (DEM) of Bamber et al. (2013). Bias and deciles are calculated as (model minus DEM). Averages are taken over all ice sheet grid points, classified using the mask of Bamber et al. (2013).
Mean over station-months of bias
As in Fig. 3, but for elevation-corrected long reanalyses, MAR–ERA–20C and MAR–20CRv2C, Box2013 data and three gridded SAT analyses (not elevation corrected).
Monthly mean SAT bias for winter (DJF) and summer months (JJA)
before and after 1979, for all datasets that extend back before 1979
(elevation-corrected where applicable) at: ice sheet stations above
1500
Comparisons between gridded datasets and in situ observations are made by
choosing the nearest EASE grid point (for CRU and Berkeley Earth, which are
land-only datasets, the nearest grid point may contain missing data, in which
case the nearest non-missing grid point is chosen). Note that an alternative,
using bilinear interpolation directly from the native grids to the station
locations, gives very similar results. The primary statistics used in the
assessment of datasets are mean bias and mean absolute error (MAE). When
aggregating results over multiple stations, the average of station-months is
taken, rather than averaging over time then over stations. Stations are
grouped into coastal (DMI), ice sheet below 1500
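The matching and aggregation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. Grid points are given in projected (equal-area) coordinates, so Euclidean distance is used. `None` marks a missing cell in a land-only product. Bias is taken as product minus observation, which is an assumed sign convention. The function names are hypothetical.

```python
import math


def nearest_valid(points, values, x, y):
    """Index of the grid point nearest (x, y) whose value is not missing.

    points: (x, y) cell centres in projected coordinates (e.g. km on the
    equal-area grid), so Euclidean distance is used.
    values: gridded values at those points; None marks missing (e.g. ocean
    cells in a land-only product).
    """
    best_index, best_d2 = None, math.inf
    for k, ((px, py), value) in enumerate(zip(points, values)):
        if value is None:
            continue
        d2 = (px - x) ** 2 + (py - y) ** 2
        if d2 < best_d2:
            best_index, best_d2 = k, d2
    if best_index is None:
        raise ValueError("no non-missing grid points")
    return best_index


def pooled_bias_mae(pairs):
    """Bias and MAE pooled over station-months.

    pairs: (product_value, observed_value) for every station-month in a group,
    so each station-month carries equal weight. Bias is product minus
    observation (sign convention assumed here).
    """
    errors = [p - o for p, o in pairs]
    if not errors:
        raise ValueError("no station-months")
    bias = sum(errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return bias, mae
```

Averaging over time first and then over stations would instead give every station equal weight regardless of record length. Pooling station-months weights long records more heavily.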
The seasonal cycle of bias and MAE averaged over all station months from 1979
onwards in Figs. 3 and 4 suggests that many datasets, though not all, show
similar seasonal cycles: above 1500
Mean bias and mean absolute error (MAE) for all datasets ranked from
smallest (top) to largest (bottom) MAE. These numbers represent an average of
results from Figs. 3 and 4, with unweighted average over months and an
area-weighted average over glaciological regimes (64.9 % ice sheet above
1500
Trends (
The analysis above aggregates all station months from 1979 onwards. To
investigate time variations in biases, Fig. 5 compares mean bias before and
after 1979 for those datasets which begin before 1979. Note that the datasets
beginning in 1979 show only small changes in bias by decade (not shown).
GISTEMP is included here with the MERRA2 elevation-corrected climatology: the
absolute values of the biases are highly dependent on the climatology, but
here can be ignored as, for the purpose of assessing the stationarity in
GISTEMP bias (and thereby the credibility of its long term variability and
trends), we are interested in the
Clear differences are apparent for some seasons and datasets. Statistical
significance of these differences (using Student's
Areal average (weighted by glacial ice fraction) annual mean temperatures for
all datasets show close correlation in recent decades: considering only the
period 1979 onwards, the correlation (
Among the datasets covering the entire 20th century, most have similar
inter-decadal variations, with a general pattern of early 20th century
warming, up to 1930, followed by cooling to around 1990, then strong warming
in recent years (Fig. 6). Nonetheless, differences do exist (Table 4). For
instance, NansenSAT shows relatively large early 20th century jumps thought
to be caused by changing data sources over this period, indicating this
dataset is not suitable for long term monitoring over Greenland. In 20CRv2c,
the
Of the datasets that extend back before 1900, Box2013, Berkeley Earth and GISTEMP agree quite closely but show notable differences from 20CRv2c. Box2013, Berkeley Earth and GISTEMP cannot be considered truly independent data sources. They all rely on similar input data for this period, as suggested by their close correspondence with observations in Fig. S4, so their consensus is not especially meaningful. However, the fact that their biases are more constant in time (Fig. 5) than those of 20CRv2c suggests that they are more reliable for this period. As with the disparities noted above for the first half of the 20th century, users of these SAT datasets should be aware that significant uncertainties exist before 1900, with notable differences in trends and variability (both inter-annual and inter-decadal). We recommend using gridded SAT analyses alongside reanalyses and downscaled reanalyses to assess sensitivity to these differences.
The range of SAT among CMIP5 ESMs is wider than that among the other datasets
(Fig. 6), but much of this range comes from a group of four relatively warm
models and two relatively cold models: eliminating these gives a range
comparable to the gridded analysis and reanalysis datasets. This highlights
the fact that choice of verification dataset can have a significant effect on
assessments of ESM mean climate. Based on results above, we use GISTEMP with
MERRA2 climatology to assess the long term mean temperatures of the CMIP5
ESMs. Using the 1901–2000 mean of ice sheet annual average temperatures, 10
ESMs lie within 1
The median of the CMIP5 ESM trends (Table 4) is positive for all periods
considered – in marked contrast to the other datasets. However, further
investigation shows the picture is not so clear: the number of individual
ESMs that have positive trends in each period suggests that, with the possible
exception of 1990–2005, the models do not give a clear consensus on the signs of
trends. This may be because inter-decadal climate variability dominates, and
the phasing of this variability differs between models. For the 1990–2005
period, 27 out of 31 ESMs have a positive trend and the median is an order of
magnitude larger than for the earlier periods (although still smaller than
the 1990–2005 trends from the other datasets). Thus the ESMs seem to agree
on accelerated warming since 1990. Significance of the trends is tested using
the method described in Santer et al. (2000), which is based on a two-tailed
Student's
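The Santer et al. (2000) approach adjusts an ordinary least squares trend test for lag-1 autocorrelation (r1) of the regression residuals. It does so through an effective sample size n_eff = n(1 − r1)/(1 + r1), which replaces n in the residual variance and the degrees of freedom (n_eff − 2). The sketch below is our illustrative reading of that method, not the code used for the trend table. It returns the slope, the t statistic and n_eff. The final lookup of the two-tailed critical t value for n_eff − 2 degrees of freedom (for example with `scipy.stats.t.ppf`) is omitted to keep it dependency-free. The function name is hypothetical.

```python
import math


def santer_trend_test(y):
    """OLS trend with a lag-1 autocorrelation adjustment (after Santer et al., 2000).

    y: equally spaced series (e.g. annual means). Returns
    (slope per time step, t statistic, effective sample size n_eff).
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least three values")
    t = range(n)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sxx
    intercept = y_mean - slope * t_mean
    resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    ss_resid = sum(e * e for e in resid)
    if ss_resid == 0.0:
        raise ValueError("series is exactly linear; test undefined")
    # Lag-1 autocorrelation of the residuals (their mean is zero for OLS).
    r1 = sum(resid[k] * resid[k + 1] for k in range(n - 1)) / ss_resid
    n_eff = n * (1 - r1) / (1 + r1)
    if n_eff <= 2:
        raise ValueError("effective sample size too small")
    # Residual variance and slope standard error use n_eff - 2 degrees of freedom.
    s_b = math.sqrt(ss_resid / (n_eff - 2) / sxx)
    return slope, slope / s_b, n_eff
```

Positively autocorrelated residuals, which are typical when inter-decadal variability dominates, give n_eff < n. This widens the confidence interval on the trend.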
Due to its importance in SMB calculations, we briefly consider summer mean (June–August) ice sheet average SAT (Fig. 6c). Many features are shared with the annual time series, e.g., periods of warming in the years leading up to 1930 and beginning in the 1990s. In addition, the variability in MAR–20CRv2c and MAR–ERA–20C closely follows that in 20CRv2c and ERA–20C, respectively. In contrast with the annual mean time series, though, the CMIP5 ensemble mean more closely follows the evolution of the observation-based datasets.
The majority of in situ SAT observations from the ice sheet have been made
since 1995. We have used the relatively small number of observations from the
mid-20th century to assess the stationarity of biases, and find that several
datasets show significant temporal variations in their bias. At ice sheet
stations above 1500
Trends among the datasets assessed here (excluding CMIP5 ESMs) generally agree with patterns found in previous studies (e.g., Box, 2002). In addition, interannual variability since 1979 matches closely between datasets. However, differences between longer-term trends, along with temporal changes in bias (discussed above), suggest that some datasets have limitations in their representation of early to mid-20th century GrIS SAT. In particular, 20CRv2c shows stronger cooling between 1930 and 1990 than most other datasets, and has a 1930s warm period that is warmer than the 21st century warm period. Such discrepancies between 20CRv2c and anomaly-based SAT datasets have been noted at the global scale by Compo et al. (2013), although the differences here are much greater than those for global SAT. The gridded SAT analyses and ERA–20C have similar anomalies and more temporally constant biases, which leads us to put greater faith in their representation of long-term trends and inter-decadal variability.
While, as noted above, interannual variability in the last 30 years matches closely between datasets, there is variation in the magnitude of ice sheet average trends (Table 4) and spatial variation in trends (Fig. S5) over this period. Box2013 has the largest recent trends, with largest trends in the west. MERRA2 has its largest trends in the south-west, whilst all three MAR versions have their largest trends in the north-east.
One of our central questions in this study is whether global SAT datasets are as good as RCM-downscaled datasets, which are, at least for SMB modeling, the current state of the art. For MAR–ERA and MAR–ERA–20C, results are generally better than for SAT taken directly from the forcing dataset (even with elevation corrections applied). However, at coastal stations, MAR–ERA performs worse than ERA–Interim. For MAR–20CRv2c, the difference is minimal at ice sheet stations and downscaling is detrimental at coastal stations in winter (though without elevation corrections, MAR–20CRv2c has smaller biases and MAE than 20CRv2c; see Fig. S3). Comparing MAR against all global datasets, we find MERRA2 has biases and MAE comparable to or less than MAR (all three forcings) in all seasons and regions. This is likely due to the comprehensive (relative to other reanalyses) snow/ice model in MERRA2 (Cullather et al., 2014) and reinforces the importance of atmosphere–ice sheet coupling in modeling SAT. In summer, and particularly in the ablation region, MAR and Box2013 are among the best datasets, confirming their suitability for SMB modelling. However, for SAT more generally, the benefits of RCM downscaling seem to be limited.
Another question related to RCM downscaling is: how closely does the forcing dataset constrain climate variability in the downscaled RCM? We correlated ice sheet annual mean SAT before 1979 between 20CRv2c and MAR–20CRv2c, and between ERA–20C and MAR–ERA–20C. These correlations suggest that the constraint is close. For example, MAR–20CRv2c has correlation coefficients with 20CRv2c greater than 0.9 for both 1900–1940 and 1940–1980, while its correlation with other datasets is lower (0.54–0.62 for 1900–1940; 0.29–0.82 for 1940–1980). The variability of summer SAT is even more closely constrained (Fig. 6c). Downscaling is able to remedy some large biases shown by reanalysis (e.g., for ERA–20C in summer, Fig. 6c). Consideration of anomalies (Fig. 6b) also suggests that downscaling improves the representation of climate variability by bringing MAR–20CRv2c more into line with other datasets. Nonetheless, differences remain, particularly before 1920 and between 1950 and 1980, and we consider that MAR–20CRv2c still inherits some shortcomings in 20CRv2c's representation of variability before 1980.
Although the comparison is for a shorter period than for other datasets, we have found that AIRS gives very good results over the ice sheet in summer – with biases and MAE values among the smallest of any dataset in the ablation region for June, July and August. However, its performance is poor in winter over the ablation region and in summer at coastal stations. The wintertime biases in the accumulation region do not agree (although those in the ablation region do) with the findings of Koenig and Hall (2010) at Summit, that satellite-derived clear-sky only temperatures were lower than all-cloud in situ measurements. They attributed this finding to the fact that clear-sky only retrievals miss winter storms – during which strong winds mix warm air from above an inversion down to the surface – which should lead to negative wintertime biases. The fact that AIRS has positive bias in the accumulation region during winter suggests compensating errors from other sources, for example from retrieval of temperature profiles or from times of day of satellite overpass. Attributing the overall bias to different causes is beyond the scope of this study. In summary, the summertime results suggest AIRS may be a useful dataset for studies of recent SMB, but further investigation is needed into the consequences of clear-sky retrievals, particularly the wintertime discrepancy with previous work and the possibility of compensating errors.
Note that there is a discrepancy between various products in calculating
monthly mean SAT. As discussed in Wang and Zeng (2013) the daily mean
calculated using 24 hourly values per day is different from that calculated
using just maximum and minimum SAT. Comparisons for AWSs on the GrIS suggest
the difference for monthly mean temperatures is
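To illustrate why the two definitions of daily mean differ, the sketch below computes both for days of 24 hourly values and averages the difference over a month. It assumes Tmax and Tmin are taken from the hourly values themselves; extremes recorded between hourly samples can be more extreme. The synthetic day with a one-hour warm spike is hypothetical, as are the function names.

```python
def daily_mean_two_ways(hourly):
    """Daily mean SAT from 24 hourly values and from (Tmax + Tmin) / 2.

    Tmax and Tmin are taken from the hourly values (an assumption; sensor
    extremes recorded between hourly samples can be more extreme).
    """
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    hourly_mean = sum(hourly) / len(hourly)
    max_min_mean = (max(hourly) + min(hourly)) / 2
    return hourly_mean, max_min_mean


def monthly_difference(days):
    """Mean over days of (hourly-based mean minus max/min-based mean)."""
    diffs = [h - m for h, m in map(daily_mean_two_ways, days)]
    return sum(diffs) / len(diffs)


# A hypothetical day at -20 degrees with a single one-hour spike to -10:
# the hourly mean is about -19.6, but (Tmax + Tmin) / 2 is -15.
spike_day = [-20.0] * 23 + [-10.0]
```

A brief event such as a storm-driven warming barely moves the hourly mean but shifts the max/min estimate by half its amplitude.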
Our evaluation of 5 km grid box values using point measurements may also be
affected by the sampling errors due to the SAT variation within a grid box
(e.g., in grid boxes containing a large range of elevations and different
surface types). Quantifying such an error could in principle be done using
several stations within the same grid box; we do not have any 5 km grid boxes
containing more than one station, however. Instead we look to the variation
of elevation, assuming that this is the dominant source of SAT variation at
small spatial scales and implicitly neglecting effects of varying surface
type and other factors. Elevation variation at any particular location is
quantified by taking the standard deviation of elevation values at the
nearest and 24 surrounding grid boxes from the 1 km version of the Bamber et
al. (2013) DEM. This is then multiplied by a (slightly conservative) lapse
rate of 9.0
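The sub-grid spread estimate described above can be sketched as follows. It takes the standard deviation of the 1 km DEM elevations in the 5 × 5 block centred on the cell nearest the station (the nearest cell plus its 24 neighbours) and multiplies it by a lapse rate. The lapse rate is a parameter here because its unit is not preserved in this text. The population standard deviation is an assumption of the sketch, as is the function name.

```python
from statistics import pstdev


def subgrid_sat_spread(dem, row, col, lapse_rate):
    """Estimate SAT sampling spread within a grid box from elevation variability.

    dem: 2-D list of 1 km DEM elevations (m); (row, col) is the DEM cell
    nearest the station. The 5 x 5 block centred on it (the nearest cell plus
    its 24 neighbours) is used. lapse_rate: temperature change per metre.
    """
    if not (2 <= row < len(dem) - 2 and 2 <= col < len(dem[0]) - 2):
        raise ValueError("5 x 5 window extends beyond the DEM")
    window = [dem[r][c] for r in range(row - 2, row + 3)
              for c in range(col - 2, col + 3)]
    return lapse_rate * pstdev(window)
```

A flat interior site gives a spread near zero, while a margin site on steep slopes gives a spread that can be comparable to the dataset biases themselves.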
In our assessment of biases and their changes through time we have assumed
that all observations are un-biased. Observation biases are likely to exist
(e.g., the positive bias of un-aspirated thermometer shields in low wind,
high solar radiation conditions; Genthon et al., 2011) and are likely to vary
in space and time due to differences in station siting, instrumentation and
observing practices (e.g., number per day and timing of manual thermometer
readings). By breaking down the bias assessment into two altitude bands
(below and above 1500
We have assessed a number of global SAT datasets using in situ
observations over Greenland, and found large differences in their
performance. Reanalyses generally perform better than gridded SAT analyses –
particularly at high elevations on the ice sheet. Simple elevation-based
corrections applied to reanalyses lead, in most cases, to improved
performance: changes in mean monthly MAE (weighted as in Table 3) vary from a
3 % increase to a 42 % decrease. Considering all regions and seasons,
the smallest biases are seen in (elevation-corrected) MERRA2 reanalysis.
Biases vary by season and by region of the ice sheet: in the ablation region
(demarcated here by the 1500 m elevation contour) during summer, most
reanalyses have a
Among global datasets that cover the entire 20th century, 20CRv2c generally has the smallest biases and MAEs when compared against observations made since 1979. However, combining GISTEMP anomalies with the MERRA2 climatology gives slightly better results. Given concerns about spurious long-term trends in 20CRv2c (in particular, a warm bias before 1950), we recommend this type of approach (i.e., combining GISTEMP with MERRA2) to represent monthly SAT over the early and mid-20th century. The similarity of anomalies among the gridded SAT analyses (except NansenSAT) suggests that their biases result from their climatology fields, and that their anomalies are suitable alternatives to GISTEMP.
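Combining an anomaly product with a separate climatology requires both to refer to the same base period. GISTEMP, for instance, reports anomalies relative to 1951–1980, whereas a MERRA2 climatology can only cover 1980 onwards. The sketch below re-references the anomalies to the climatology's base period, month by month, before adding the climatology. The base period, data layout and function name are hypothetical; the text does not state which climatology period was used.

```python
def anomalies_plus_climatology(anom, clim, base_years):
    """Rebuild absolute monthly SAT from anomalies plus a separate climatology.

    anom: dict (year, month) -> anomaly from the anomaly product
    clim: dict month -> climatological mean SAT from the reference product,
          computed over base_years
    base_years: years over which clim was computed; anomalies are shifted so
    that their mean over these years is zero before clim is added.
    """
    absolute = {}
    for month in sorted({m for (_, m) in anom}):
        base = [anom[(y, month)] for y in base_years if (y, month) in anom]
        if not base:
            raise ValueError(f"no anomalies for month {month} in base period")
        offset = sum(base) / len(base)
        for (year, m), value in anom.items():
            if m == month:
                absolute[(year, m)] = clim[month] + (value - offset)
    return absolute
```

Without the re-referencing step, any mean difference between the two base periods would appear as a spurious bias in the combined series.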
Alongside multi-decadal global SAT datasets, we have analyzed SAT from recent (2002 to present) AIRS satellite retrievals and from RCM-downscaled reanalysis. AIRS has among the smallest biases and MAE in summer months over the ice sheet, but larger errors in winter and when comparing to coastal stations. RCMs are found to reduce biases in comparison to their respective forcing datasets and provide among the best representations of SAT on the ice sheet. However, MERRA2 reanalysis performs comparably on the ice sheet, and better in comparison to coastal stations. The long term variability of RCM SAT closely follows that from the forcing dataset; the shortcomings that we highlight for 20CRv2c thus also persist, to some degree, in the version of MAR forced by 20CRv2c. MAR–ERA–20C has long term variability closer to gridded SAT analyses and long-running DMI stations, but differences remain. The Box2013 dataset, by using spatial information from a similar RCM, has similar patterns of bias to the MAR datasets. However, Box2013 inherits its long term variation from the same SAT observations as used in global SAT analyses, rather than from (as in MAR) reanalysis forcing; thus its anomalies closely follow those from CRU, GISTEMP and especially Berkeley Earth.
We have assessed CMIP5 ESMs by comparing their ice sheet average SAT with
that from other datasets. A key finding is that such an assessment depends
crucially on the choice of verification dataset. Using GISTEMP combined with
MERRA2 climatology (due to its overall good performance in comparison with in
situ observations), we find that a large number of the CMIP5 ESMs have
similar ice sheet long term annual average SATs (10 within 1
Our analysis highlights several avenues for future work. Comparison of different instrument types and measurement practices would allow a quantitative assessment of the effects of instrument bias on the results shown here. Such work is also crucial to investigations of GrIS diurnal temperature variation, for example in model assessment and SMB studies using positive degree day methods (Fausto et al., 2011; Rogozhina and Rau, 2014). Results for AIRS retrievals suggest it may provide useful SAT information over the GrIS in summer, but further work is needed on the effects of only sampling clear-sky SAT. Investigation is required to establish the cause of disparities in trends and variability between 20CRv2c and ERA–20C – which are ostensibly formulated in similar ways. Possible causes include different representation of atmospheric circulation and different sea ice and sea surface temperature datasets. While RCM downscaling is currently an important tool in assessing past and future GrIS mass balance changes, our results provide new evidence that results from RCMs are highly dependent on the forcing. For SAT, RCM downscaling can reduce biases and give realistic spatial patterns compared to the forcing dataset, but does not seem to greatly alter the long term evolution of the areal average. It remains to be seen whether the same is true for SMB. The greatest SAT differences between the versions of MAR used here occur before 1980, but there are differences since 2000 too, highlighting that uncertainties in GrIS SMB exist even in the better-observed recent past.
Most of the data used in this work are freely and publicly available. Full dataset references are given in the Supplement. Derived data fields (e.g., elevation-corrected SAT) and code used to analyze data and plot figures are available from the corresponding author on request.
DMI AWS data were downloaded from
The authors declare that they have no conflict of interest.
This research was supported by NASA (NNX14AM02G), DOE (DE-SC0016533), and the Agnese Nelms Haury Program in Environment and Social Justice. We thank Jason Box, Xavier Fettweis and an anonymous reviewer for their constructive comments and suggestions. Chris Castro and Guo-Yue Niu are thanked for useful discussions during the preparation of this manuscript. We also thank the various groups and centers for making their datasets and model results available. We thank C. J. P. P. (Paul) Smeets and the Institute for Marine and Atmospheric Research at Utrecht University for providing the K-transect data.
Edited by: Marco Tedesco
Reviewed by: Xavier Fettweis and one anonymous referee