Articles | Volume 18, issue 4
Research article
18 Apr 2024
Research article |  | 18 Apr 2024

Validation of pan-Arctic soil temperatures in modern reanalysis and data assimilation systems

Tyler C. Herrington, Christopher G. Fletcher, and Heather Kropp

Reanalysis products provide spatially homogeneous coverage for a variety of climate variables in regions such as the Arctic where observational data are limited. Soil temperatures are an important control of many land–atmosphere exchanges and hydrological processes, and permafrost soils are thawing as the climate warms. However, very little validation of reanalysis soil temperatures in the Arctic has been performed to date, because widespread in situ reference observations have historically been limited there. Here we validate pan-Arctic soil temperatures from eight reanalysis and land data assimilation system products, using a newly assembled database of in situ observations from diverse measurement networks across Eurasia and North America. We examine product performance across the extratropical Northern Hemisphere between 1982 and 2018, and find that most products have soil temperatures that are biased cold by 1–5 K, with an RMSE of 2–9 K, and that biases and RMSE are generally largest in the cold season. Monthly mean values from most products correlate well with in situ data (r>0.9) in the warm season but show lower correlations (r=0.55–0.85) in the cold season. Similarly, the magnitude of monthly variability in soil temperatures is well captured in summer but overestimated by 20 %–50 % for several products in winter. The suggestion is that soil temperatures in reanalysis products are subject to much higher uncertainty when the soil is frozen and/or when the ground is snow covered, suggesting that the representation of processes controlling snow cover in reanalysis systems should be urgently studied. We also validate the ensemble mean of all available products and find that, when all seasons and metrics are considered, the ensemble mean generally outperforms any individual product, in terms of its correlation and variability, while maintaining relatively low biases. As such, we recommend the ensemble mean soil temperature product for a wide range of applications, such as the validation of soil temperatures in climate models, and to inform models that require soil temperature inputs, such as hydrological models.

1 Introduction

Soil temperatures, both near the surface and at depth, are an important control of many physical, hydrological, and land surface processes, as soils act as a reservoir for energy and moisture underground. They provide an important initial condition for numerical weather prediction, as energy and water fluxes from the land are important for convective processes (Dirmeyer et al.2006; Kim and Wang2007; Siqueira et al.2009). As soils react relatively slowly to variations in weather, soil temperature is also an important predictor of seasonal and mid-term weather forecasts (Xue et al.2011). Soils over large portions of the Arctic are perennially frozen (permafrost soil). Roughly 1400–1600 Gt of carbon are estimated to be stored in soils in permafrost-affected regions of the Northern Hemisphere (Hugelius et al.2014). Continued warming and thawing of permafrost soils as well as related decomposition of carbon could act as a potential positive feedback on warming – the permafrost carbon feedback – by releasing more methane (CH4) and carbon dioxide (CO2) into the atmosphere (Koven et al.2011). In situ based soil temperature monitoring networks using thermistor probes, particularly at high latitudes, are limited in terms of their spatial and temporal coverage (Yi et al.2019), making it difficult to assess hemispheric scale changes in permafrost.

Reanalysis products have been used in a variety of weather and climate applications to provide information on a regular spatial grid, particularly in regions where limited or no observational data are available (Koster et al.2004; Zhang et al.2008). Previous studies validating reanalysis soil temperature have primarily focused on the middle latitudes, such as across China (Yang and Zhang2018; Zhan et al.2020; Zhao et al.2022), the Qinghai–Tibetan Plateau (Hu et al.2019; Jiao et al.2023; Wu et al.2018), Europe (Albergel et al.2015; Johannsen et al.2019), and the continental United States (Albergel et al.2015; Xia et al.2013), with a couple of recent studies validating soil temperatures globally (Li et al.2020; Cao et al.2020; Ma et al.2021). Relative to in situ ground temperature probe networks, most reanalysis products are biased cold by about 2–5 °C, on average (Hu et al.2019; Qin et al.2020; Yang and Zhang2018). Ma et al. (2021) found that most reanalysis products show larger cold biases over polar regions than they do over tropical and temperate regions, while a recent study by Cao et al. (2020) found that ERA5-Land soil temperatures were biased warm over the Arctic, particularly in winter.

Several explanations have been suggested for the biases in reanalysis soil temperatures, including model parameterizations (Albergel et al.2015; Cao et al.2020; Chen et al.2015; Wu et al.2018; Xiao et al.2013), air temperature biases (Cao et al.2020; Hu et al.2017), errors in topography and elevation, arising from the coarse resolution of reanalysis products (Yang and Zhang2018; Zhao et al.2008; Ma et al.2021), and errors in simulated snow cover and snow thermal insulation (Cao et al.2020; Royer et al.2021; Cao et al.2022).

While soil temperature biases in individual reanalysis products may limit their utility, a consensus is emerging that multi-product ensembles, based on the same principle as ensemble weather prediction (World Meteorological Organization2012), are an effective way to increase the signal-to-noise ratio for many important geophysical variables. Ensemble mean datasets based on combinations of in situ, model, satellite, and reanalysis data have been used to reduce biases in estimates of snow water equivalent (Mudryk et al.2015), soil moisture (Dorigo et al.2017; Gruber et al.2019), and precipitation (Beck et al.2019), as well as for local-scale permafrost simulations (Cao et al.2019). Li et al. (2020) suggest that a similar method could be used to reduce biases in reanalysis soil temperatures.

Reanalysis soil temperatures have been relatively well characterized over the middle latitudes. Studies validating Arctic soil temperatures in reanalysis products, however, have either focused on a single product (Cao et al.2020) or have only considered a limited spatial extent (Li et al.2020; Ma et al.2021). Here we perform a validation of pan-Arctic (and Boreal) soil temperatures from eight reanalysis and land data assimilation system (LDAS) products. The main objectives are to (1) validate the eight reanalysis and LDAS soil temperature products in terms of their bias, RMSE, correlation, and standard deviation, and (2) investigate whether an ensemble mean soil temperature product outperforms the individual reanalysis products.

2 Data

2.1 Reanalysis and LDAS data

Tables 1 and 2 outline the six reanalysis and two LDAS soil temperature products used in this study. For simplicity, the term “reanalysis” will hereafter be used to describe both reanalysis and LDAS products. A summary of each product follows below. Products were remapped onto the European Reanalysis – Interim (ERA-Interim) grid for comparison using three different methods: nearest-neighbour, bilinear interpolation, and first-order conservative remapping. The choice of remapping method did not affect the overall conclusions of the study, and the analysis is based on data remapped using the conservative remapping method, as it facilitated the use of the largest number of validation sites and grid cells.

The reanalysis products investigated span a wide range of horizontal resolutions, ranging between 0.1°, in the case of ERA5-Land and FLDAS, to 0.75° for ERA-Interim (Table 1). Most products – CFSR, ERA-Interim, ERA5, ERA5-Land, and the Famine Early Warning Systems Network Land Data Assimilation System (FLDAS) – simulate soil temperature across four vertical layers, while MERRA2 includes six vertical layers and JRA55 calculates soil temperature across a single layer. The topmost soil layer has the highest resolution (7–10 cm in most cases), while the bottom soil layer often averages soil properties over 1 m or more (Table 2).

The Noah Land Surface Model (Noah-LSM) (Chen et al.1996; Betts et al.1997; Koren et al.1999; Ek2003) is used by CFSR and FLDAS. CFSR uses the Noah-LSM in a fully coupled mode to obtain a first-guess land–atmosphere simulation before operating in a semi-coupled mode with GLDAS to obtain information about the state of the land surface (Saha et al.2010a). FLDAS, however, is run in an offline mode utilizing meteorological forcing from MERRA2 (McNally et al.2017) and rainfall information from NOAA's Global Data Assimilation (GDAS) (Derber et al.1991), the Climate Hazards Group Infrared Precipitation with Stations (CHIRPS) (Funk et al.2015), and the African Rainfall Estimation version 2.0 (RFE2) (Xie and Arkin1997).

ERA-Interim, ERA5, and ERA5-Land use versions of the Tiled ECMWF Scheme for Surface Exchanges over Land (TESSEL) land model (Viterbo and Beljaars1995; Viterbo and Betts1999). In the case of ERA-Interim, TESSEL is informed by empirical corrections from 2 m (surface) air temperature and humidity (Dee et al.2011). Meanwhile, ERA5 and ERA5-Land use an updated version of TESSEL, known as the Hydrology-Tiled ECMWF Scheme for Surface Exchanges over Land (HTESSEL) (Balsamo et al.2009). In ERA5, a weak coupling exists between the land surface and atmosphere. It includes an advanced LDAS that incorporates information regarding the near-surface air temperature, relative humidity, as well as snow cover (de Rosnay et al.2014), along with satellite estimates of soil moisture and soil temperature from the top 1 m of soil (de Rosnay et al.2013). ERA5-Land, unlike ERA5, does not directly assimilate observational data. Instead, the ERA5 meteorology (such as air temperature, humidity, and atmospheric pressure) is used as forcing information for HTESSEL, allowing it to be run at higher resolutions (Muñoz-Sabater et al.2021). It includes an improved parameterization of soil thermal conductivity allowing for it to account for ice content in frozen soil, improvements to soil water balance conservation, and the ability to capture rain-on-snow events (Muñoz-Sabater et al.2021).

MERRA2 utilizes the Catchment Land Surface Model (CLSM) (Ducharne et al.2000; Koster et al.2000). Though MERRA2 does not include a land surface analysis (Gelaro et al.2017), CLSM is informed using an updated version of the Climate Prediction Center's unified gauge-based analysis of global daily precipitation (CPCU) precipitation correction algorithm that originated in MERRA-Land (Reichle et al.2017b). No corrections are available, however, for high latitude regions north of 62.5° N (Reichle et al.2017a). Finally, JRA55 uses the Simple Biosphere Model (SiB) (Onogi et al.2007; Sato et al.1988; Sellers et al.1986) in offline mode, forced by atmospheric data and data from land surface analyses that incorporate microwave satellite retrievals of snow cover (Kobayashi et al.2015).

Saha et al. (2010a)Saha et al. (2014)Hersbach et al. (2020)Muñoz-Sabater et al. (2021)Dee et al. (2011)McNally et al. (2017)Harada et al. (2016)Kobayashi et al. (2015)Gelaro et al. (2017)

Table 1Summary of the eight reanalysis and LDAS products, their equatorial resolution, land model, and relevant references.

Download Print Version | Download XLSX

Table 2Summary of the eight reanalysis and LDAS products, as well as the number and depths of the soil layers included.

* The JRA55 Simple Biosphere Model contains up to three soil layers (whose depths vary depending on vegetation type), but the soil temperature is averaged over all layers to produce a singular value at each grid cell.

Download Print Version | Download XLSX

2.2 Observational data

Owing to the lack of dense soil temperature monitoring networks in the Arctic, most of the observed soil temperature record is characterized by a soil temperature record that is temporally and spatially sparse (Luo et al.2020). While Russia has a more complete record of permafrost temperatures extending back to the 1980s (Sherstiukov2012), longer-term permafrost records over North America are generally limited to the western Arctic (Smith et al.2010). Portions of coastal Nunavik, in northern Quebec, have records of permafrost temperatures from the 1990s onwards (CEN2020a, b, c, d, e, f, g), while soil temperature measurements in the central Arctic are rather sparse (Smith et al.2010). Rather than limit our validation to a small geographic region in the permafrost zone, as several prior studies have done (Hu et al.2019; Qin et al.2020; Wu et al.2018; Ma et al.2021; Li et al.2020), we choose to combine data from a variety of sparse and dense networks. Such an approach has been used to validate soil temperature and permafrost performance in ERA5-Land (Cao et al.2020), and allows for the examination of larger geographic regions as well as for the inclusion of a more diverse set of vegetation types across the continent (Ma et al.2021).

The study compiles a comprehensive set of in situ soil temperature measurements, approximately 1700 stations in total, from across extratropical Eurasia and North America (Table 3; also see the Supplement). Incorporating data from multiple diverse sparse networks, the dataset includes data from the Yukon Geological Survey (Yukon Geological Survey2021), the Northwest Territories (Cameron et al.2019; Ensom et al.2020; Gruber et al.2019; Spence and Hedstrom2018a, b; Street et al.2018), Roshydromet network in Russia (Sherstiukov2012), Nordicana series D (Nordicana) (Allard et al.2020; CEN2020a, b, c, d, e, f, g), Global Terrestrial Network for Permafrost (GTN-P) (GTN-P2018), and Kropp et al. (2020) – in an attempt to provide a representative estimate of soil temperature across the circumpolar Arctic. Our validation data also include sites from outside regions typically underlain by permafrost in order to facilitate a comparison of the performance of reanalysis soil temperatures at high latitudes with their performance in regions outside the permafrost zone. These include stations from Kropp et al. (2020), Sherstiukov (2012), and GTN-P (2018), as well as locations from the Manitoba Mesonet network (RoTimi Ojo and Manaigre2021), the Michigan Enviroweather network (MAWN) (Enviroweather2024), the North Dakota Agricultural Weather Network (NDAWN) (North Dakota Agricultural Weather Network2022), and the Alberta Climate Information Service (ACIS) network (Alberta Agriculture, Forestry and Rural Economic Development2022). Data are also sourced from a peatland ecosystem in Metro Vancouver (British Columbia) (Lee et al.2017), as well as several locations in central and northern British Columbia. (Déry2017; Hernández-Henríquez et al.2018; Morris et al.2021). This provides a unique baseline upon which to perform a hemispheric wide assessment of soil temperature in reanalysis and LDAS systems, and to the authors' knowledge, it presents the most comprehensive analysis to date of soil temperatures across Canada and the Great Lakes basin.

GTN-P (2018)Kropp et al. (2020)Sherstiukov (2012)Cameron et al. (2019)Gruber et al. (2019)Ensom et al. (2020)Rudy et al. (2020)Street and Wookey (2016)Yukon Geological Survey (2021)Spence and Hedstrom (2018b)Hernández-Henríquez et al. (2018)Déry (2017)Morris et al. (2021)Lee et al. (2017)RoTimi Ojo and Manaigre (2021)Enviroweather (2024)Alberta Agriculture, Forestry and Rural Economic Development (2022)North Dakota Agricultural Weather Network (2022)

Table 3Summary of the observational data networks included in this study, with the dataset name, number of stations, and their references. Note that Nordicana D references are listed by site in the supplemental metadata.

Download Print Version | Download XLSX

2.3 Collocation of station and reanalysis data

In order to compare with data from reanalysis and LDAS products, temperatures were averaged across two depth bins: a near-surface layer (0–30 cm) and soil temperatures at depths ranging from 30 to 300 cm. For each site, temperatures from all depths residing within a layer were averaged, producing an estimated layer-averaged temperature for every time step. In order to maximize the amount of observational data available, layer-averaged soil temperatures were calculated at each time step with all available data. This tradeoff meant that layer averages often included a different number of depths at different time steps, and as such, we needed to limit our analysis of soil temperature trends and variability to locations where layer averages had a consistent number of depths.

Figure 1Panel (a) shows the location of the validation grid cells collocated with in situ stations in the near-surface layer. Grid cells excluded from the soil temperature trend analysis are shown as an “×”. Type 1 refers to grid cells where the ensemble mean simulates a winter minimum soil temperature that is too cold. Type 2 refers to grid cells where the ensemble mean simulates a summer maximum soil temperature that is too cold. Type 3 refers to grid cells where the ensemble mean underestimates the seasonal cycle of soil temperatures. The number in brackets beside each legend entry displays the number of grid cells in each category. The contour lines encircle regions where the Obu et al. (2018) permafrost cover is at least 50 %. Panel (b) shows the impact of spatial variation and depth variation on the spread of soil temperatures in a grid cell. The mean is shown by a green triangle and outliers are shown as grey diamonds.

Many of the in situ (station) sites reported measurements at hourly or daily frequency; however, we chose to perform the analysis at monthly time scales in order to focus on processes controlling the seasonal cycle of soil temperatures. As such, we use monthly averages of soil temperatures for validation purposes. Outlier observations with anomalies greater than ±3.5σ were removed before monthly averaging.

Since the station data often included days with missing observations, the sensitivity of the monthly averages to missing data was tested by computing monthly averages in five ways: using all months with at least 1 valid day in a month, using all months with at least 25 %, 50 %, and 75 % valid data, and finally, using all months with no missing data in a month. It was found that Tsoil was not substantially impacted by the inclusion or exclusion of months containing missing data. In order to increase sample size, we therefore included all months with at least 50 % valid data.

In order to be considered as a validation location, the grid cell was required to include soil temperature data for all eight reanalysis/LDAS products and be collocated with at least one in situ station. Duplicate stations across datasets were excluded. In situ locations were only included if there was at least 2 years of in situ data in order to properly assess the station's seasonal cycle. For grid cells containing multiple in situ stations, the value used in the comparison is a simple spatial average of the in situ stations in that grid cell on each calendar day.

Over Eurasia, grid cells contained a single in situ measurement location. In North America, however, a number of the grid cells contain two or more in situ stations. The near-surface layer includes 430 validation grid cells (Fig. 1a), while at depth there are 377 grid cells (not shown). A subset of stations with longer time series and a more complete data record are used to calculate soil temperature trends (Sect. 4.2). Stations included in the soil temperature trend and variability analysis are shown as circles of varying size and colour, while those excluded from the soil temperature trend and variability analysis are shown as an “×” (Fig. 1a). The details of Fig. 1a will be described further in Sect. 5.3.

To calculate spatial averages, a simple average of (layer-averaged) soil temperatures from all stations within the bounds of a particular grid cell was calculated at each time step using all available stations. This meant that the number of stations included at each time step was not always consistent, and the analysis of soil temperature trends was limited to a subset of 52 grid cells in Eurasia where the following conditions were met:

  1. The time series included data between January 1985 and December 2010, with no missing data.

  2. The number of stations included in the spatially averaged grid cell temperature was consistent over all time steps.

  3. The number of depths included in the layer-averaged soil temperature of each contributing station remained consistent over all time steps.

As a result, North American grid cells were excluded from the soil temperature trend analysis, and the analysis is based on grid cells from Eurasia (where grid cells often only contained a single station) (Fig. 1a). Using a subset of grid cells that incorporate multiple stations in the spatial average, and include a consistent number of stations and depths in the time series, we quantified the variability in soil temperatures between stations within a grid cell and across depths within a layer average. It was found that the median temperature range between stations within a grid cell was approximately 1.5 °C, roughly 1.75 times larger than the median temperature range across depths within the near-surface layer of a station (Fig. 1b), suggesting that temperature variability within a grid cell is substantially larger than variations in temperatures within the near-surface layer of a particular station.

3 Methods

3.1 Validation metrics

Reanalysis and observational (station) soil temperature data were collocated with one another spatially and temporally. Grid-cell-level soil temperatures from each product were compared against in situ soil temperatures using the following statistical metrics: bias (Eq. 1), root-mean-squared error (RMSE) (Eq. 2), normalized standard deviation (σnorm) (Eqs. 3 and 4), and the Pearson correlation coefficient (R) (Eq. 5). We also include an overall skill score for each model, i.e., a Thackeray et al. (2015) type formulation of the Taylor (2001) skill score (SS) (Eq. 6). Statistical metrics were calculated as follows:


where Tp is the Tsoil from the reanalysis product and Ti is the Tsoil of the in situ data. Tp and Ti refer to the mean Tsoil of the reanalysis product and in situ data, respectively, and N is the number of monthly soil temperature values. σTp and σTi are the standard deviations of the reanalysis product soil temperatures and in situ soil temperatures, respectively. Finally, x refers to the Tsoil (from a particular time step in a dataset) and x is the mean Tsoil of the dataset.

Metrics were calculated separately for each grid cell and then averaged to obtain regional values. Estimates for the permafrost zone and the zone with little to no permafrost were also calculated by averaging together metrics from grid cells falling within a particular zone. Skill scores were calculated separately for the near surface and depth, while the “overall” skill score represents an average of the near-surface and depth skill scores.

3.2 Binning of datasets by season and permafrost

Datasets were binned into a cold season and warm season using the University of East Anglia's Climatic Research Unit (CRU) TS version 4.07 2 m air temperature (Tair) (Harris et al.2020) for each grid cell. Cold season months are those where Tair-2 °C, while the warm season refers to months with Tair>-2 °C, where Tair is the monthly mean air temperature. Sensitivity testing on the cold/warm season revealed no substantive impact on our conclusions using a threshold of 0, 5, and 10 °C. We also tested the impact of using a different temperature dataset to perform the binning: the ERA5 2 m air temperature, which resulted in similar findings.

Permafrost zonation was estimated using the Obu et al. (2018) permafrost map, which employs a temperature at the top of the permafrost (TTOP) model based on a 2000–2016 climatology, driven by a combination of remotely sensed land surface temperatures, downscaled atmospheric data from ERA-Interim, and land-cover information from The European Space Agency (ESA) Climate Change Initiative (CCI) (Obu et al.2019). To maximize the sample size in each group, we merge the “continuous” and “discontinuous” permafrost zones into a single category called the “permafrost zone” and compare it against the zone with “little to no permafrost”, which includes all regions with <50 % permafrost cover.

3.3 Elevation impacts

The authors examined the potential impacts of elevation differences between in situ datasets and reanalysis products by estimating the station elevation using the 90 m Copernicus Global Digital Elevation Model (GLO-90) (European Space Agency2021) and obtaining reanalysis elevations at their native resolution for the nearest grid cell to the station. For grid cells with more than one station, station elevations were averaged together to obtain a grid cell estimate of the “station” elevation.

Grid cells were grouped into three elevation bins based on the station elevation, and it was found that over 70 % of grid cells are located in regions where the in situ station(s) are below 500 m (Table 4). Only 15 grid cells had station elevations above 1000 m, so the authors grouped all grid cells at or above 500 m together for the purposes of validation. While reanalysis products generally underestimated the elevation of higher elevation stations, with an average RMSE of between 144 and 589 m (not shown), this did not appear to have a major impact on soil temperature performance. (Readers are referred to Sect. 4.3 (“Spatial variability”) for more details.)

3.4 Regridding of reanalysis products and calculation of ensemble mean soil temperature

All products were first regridded to the ERA-Interim grid using a first-order conservative remapping technique (Jones1999). The near-surface soil layers, which are representative of the top 30 cm of soil, were calculated as a simple average of the top two soil layers in each reanalysis product. For JRA55, the single soil layer was used as a near-surface estimate. Soil temperatures at depth, for each product, were calculated as a simple average of all layers which fell between 30 and 300 cm, beginning with the third soil layer. For JRA55, the single averaged soil layer was used as an estimate of soil temperatures at depth. (Readers are referred to Table 2 for further information about product soil layers.) While the vertical discretization is coarser than that of the individual products, this approach allows the ensemble mean product to incorporate soil temperatures from products with different land models, whose vertical resolution is not constant.

Table 4Number of grid cells in each elevation bin for the near surface and at depth.

Download Print Version | Download XLSX

After the near-surface and deep soil layer average temperatures were calculated for each product, the ensemble mean soil temperature, for each layer, was calculated as the unweighted arithmetic mean from six individual soil temperature products (CFSR, ERA-Interim, ERA5, ERA5-Land, FLDAS, and MERRA2). Owing to JRA55's simplified land model, which is unable to capture near-surface soil temperatures, we decided to exclude it from the ensemble mean product, as its inclusion dramatically increased the bias and RMSE of the ensemble mean.

Figure 2Bias (solid colour), RMSE (hatching), and skill scores (circles) of each product for the cold season (blue) (-2 °C) and the warm season (red) (>-2 °C) performance of reanalysis products. Bias, RMSE, and skill score over the annual mean are displayed in purple. Panel (a) displays the bias, RMSE, and skill score for the near-surface (0–30 cm) layer, while (b) displays the bias, RMSE, and skill score at depth (30–300 cm). The ensemble mean is shown beside for comparison.


4 Validation of reanalysis products

4.1 Annual mean

The Taylor skill score ranges between a minimum of zero and a theoretical maximum of one. A product with a skill score of 1.0 would display a perfect correlation of 1.0 relative to in situ soil temperatures and a soil temperature variance identical to that of the in situ data. Near the surface, most products display relatively high annual mean skill scores of > 0.9, suggesting that they generally capture the overall seasonal cycle of soil temperatures. At depth, skill scores are typically about 0.1 lower (Fig. 2; Tables S1 and S2 in the Supplement), arising from the lag between air temperatures and soil temperatures at depth. JRA55, however, displays an annual mean skill score of 0.54 near the surface and a skill score of 0.79 at depth (Fig. 2). This arises because JRA55 uses a simplified land model that uses just a single vertical layer, i.e., the soil temperatures used are computed as averages over the soil column and are, therefore, more similar to deeper soil layers than to the surface. Consequently, JRA55 underestimates the seasonal cycle of observed soil temperatures in the near surface, and the timing of its annual maximum and minimum soil temperatures is offset by roughly 1 month relative to other products (not shown).

Most products are biased cold over the annual mean and display soil temperature biases of between +0.3 and 3.1 °C, with RMSE values ranging between 2.5 and 7.7 °C over the extratropical Northern Hemisphere (Fig. 2; Tables S1 and S2). ERA5-Land is a notable exception as it shows a warm bias over the annual mean, driven by warm biases in winter, a factor that will be discussed further in Sects 4.2 and 6.1.

4.2 Seasonal cycle

As alluded to above, strong seasonal differences exist in reanalysis performance, with noticeably lower skill scores in the cold season. This is particularly true near the surface, where cold season skill scores are 0.08–0.35 lower than their warm season counterparts. Skill score declines at depth are reduced but still show a decline of 0.02–0.08 relative to the warm season (Fig. 2; Tables S1 and S2).

The decline in skill score is mirrored by increases in near-surface bias and RMSE for several products. This is particularly true for ERA-Interim, whose bias and RMSE are 4.1 and 3.7 °C larger, respectively. Interestingly, biases for all products are somewhat larger in the warm season at depth, though seasonal differences are also generally smaller in the deeper soil layers (Fig. 2). JRA55 and ERA5-Land are noticeable outliers to this trend, as they both exhibit positive (warm) biases in the cold season. In the case of JRA55, this is due to an underestimation of the seasonal cycle of temperatures, while for ERA5-Land, it is likely due to issues with snow cover properties (which will be discussed further in Sect. 6.1).

Near the surface, most products show a maximum cold (negative) bias when soil temperatures are between 2 and 10 °C, and there is a tendency for the biases of most products to decrease or flip sign over the coldest temperatures (Fig. 3), suggesting that they may underestimate the coldest in situ temperatures. At depth, however, most products display larger cold biases over warmer temperatures (Fig. 3b). JRA55 displays a maximum cold (negative) bias over the warmest temperatures and the bias flips sign as temperatures approach freezing, linked to its reduced seasonal cycle (Fig. 3). In the case of ERA-Interim, the largest cold (negative) biases are found over the coldest temperatures, likely linked to issues with its snow cover representation (discussed in Sect. 6.1).

Most products generally capture the observed soil temperature variance during the warm season. This is evident in Fig. 4b and d, as the normalized standard deviation is within 25 % of the observed variance for all products. This is contrasted by the cold season, where several products overestimate soil temperature variability (particularly at depth), contributing to a decline in product skill (Fig. 4a and c). For example, ERA5-Land's (blue diamond) cold season skill is impacted by its underestimation of cold season soil temperature variability (which is roughly half of the near-surface observed variance) and arises in part because of its warm (positive) bias in winter (Fig. 2).

The overestimation in soil temperature variance is also apparent in Fig. 5 as products display larger soil temperature variances (as measured by their standard deviation) for a given in situ soil temperature over colder temperatures. The spread in standard deviation between products (similar to their biases) is also generally largest over colder temperatures. The reduced standard deviation near the surface in the 32 °C bin is likely a function of the small sample size (11).

Similar seasonal variation is seen in reanalysis soil temperature correlations (against station data), as most products show warm season correlations of greater than 0.93 near the surface, with reductions of between 0.20 and 0.39 for most products over the cold season. At depth, warm season correlations are generally 0.09–0.18 lower than near the surface, and seasonal differences are much smaller (Fig. 4b and d). The poor JRA55 correlation near the surface arises from its mismatched seasonal cycle.

Figure 3Reanalysis soil temperature bias as a function of station soil temperature for (a) the near-surface (0–30 cm) layer and (b) at depth (30–300 cm). Station temperatures are binned into 4 °C intervals, beginning with the 32 to 28 °C bin and ending with the 26–30 °C bin. The midpoint of each temperature bin is plotted along the x axis. The secondary y axis displays the number of data points in each bin (in conjunction with the histogram).


Figure 4Taylor diagram of the cold season and the warm season performance of reanalysis products. Panels (a) and (c) refer to the cold season, while panels (b) and (d) refer to the warm season. The top panels, (a) and (b), are for the near surface, while the bottom panels, (c) and (d), refer to soil temperatures at depth. The concentric rings (solid grey lines) refer to the centralized root-mean-squared error (CRMSE), and a product would have a CRMSE of zero, with a normalized standard deviation of one and a correlation of one, if the time series of the reanalysis matched the station data perfectly.


Figure 5Reanalysis soil temperature standard deviation as a function of station soil temperature for (a) the near-surface (0–30 cm) layer and (b) at depth (30–300 cm). Station temperatures are binned into 4 °C intervals, beginning with the 32 to 28 °C bin and ending with the 26–30 °C bin. The midpoint of each temperature bin is plotted along the x axis. The secondary y axis displays the number of data points in each bin (in conjunction with the histogram).


4.3 Spatial variability

Similar to the strong seasonal differences in product performance, we also see substantial spatial variability in performance that is strongly linked to latitude. Over the permafrost regions, where snow cover lasts longer, soil temperature performance is typically worse than over regions further south. Near the surface, skill scores over permafrost regions are typically 0.05–0.1 lower over the annual mean than in regions with little to no permafrost, and at depth, they are generally between 0.07 and 0.26 lower (Fig. 6). The one exception is JRA55, which actually sees a slightly higher skill score over permafrost regions, driven by slight declines in RMSE over the cold season relative to regions further south. It remains for future studies to determine whether these differences are due to permafrost regions being colder or due to structural issues with the land models, as this is beyond the scope of this study.

Mirroring the lower skill scores in most products are larger RMSEs over the permafrost regions. This is particularly true during the cold season, when the RMSE for most products is typically 1.3–4.5 °C larger (Fig. 6) than over the zone with little to no permafrost. Warm season RMSE is also larger for most products in the permafrost zone, though by a lesser margin (0.1–2.1 °C).

Bias is also typically larger over permafrost regions by 0.63–3.9 °C. The ERA5-Land warm (positive) bias in the cold season is largest over permafrost regions (Fig. 6). In the case of JRA55, however, the warm (positive) biases over the cold season are largest further south. In fact, over many grid cells in the permafrost zone, JRA55 exhibits a cold (negative) bias during the cold season (not shown).

Reduced agreement between products is also apparent over permafrost regions, particularly at depth, where the spread in standard deviation between products is roughly 1.6–3.7 times larger over the permafrost zone (Fig. S1 in the Supplement), because of substantial differences in the variance of ERA5-Land, JRA55, and ERA-Interim. Interestingly, the differences in correlation and standard deviation between the permafrost zone and the zone with little to no permafrost, in the near-surface soil layers, are less dramatic (Fig. S2).

Generally, the skill is higher over Eurasia than over North America (Fig. 7). The lower skill in North America arises in part due to the underestimation of seasonal cycle over many grid cells in the Yukon and an overestimation of variability in cold season temperatures over much of the Great Lakes region (Fig. S3). CFSR and JRA55 are exceptions, however, as they greatly overestimate the cold season variability over much of western Eurasia (not shown) and consequently exhibit lower Eurasian skill scores. Soil temperature correlations (with in situ soil temperatures) are also lower by about 0.02–0.08 for most products in the warm season over North America, relative to Eurasia (not shown), which further contributes to reduced skill over North America.

As there are few stations above 1000 m, elevation does not have a substantial impact on product performance (Fig. 8). The slight improvement in near-surface cold season performance in CFSR at higher elevations can be linked to a slightly higher correlation, while in ERA-Interim, the slightly higher skill score is due to a slight improvement in cold season temperature variance (not shown). Skill scores and correlations at depth (not shown in Fig. 8) are lower by about 0.05–0.1 over higher elevation stations; however, the overall conclusions are not altered.

Figure 6Bias (solid colours), RMSE (hatching), and skill scores (circles) of each product for the cold season (blue) and the warm season (red) performance of reanalysis products over the zone with little to no permafrost (a, c) and the permafrost zone (b, d). The skill score is also shown for the annual cycle (purple circles). Panels (a) and (b) display the bias, RMSE, and skill score for the near-surface layer, while (c) and (d) display the bias, RMSE, and skill score at depth. The ensemble mean is shown beside for comparison.


Figure 7Bias (solid colours), RMSE (hatching), and skill scores (circles) of each product for the cold season (blue) and the warm season (red) performance of reanalysis products over North America (a, c) and Eurasia (b, d). The skill score is also shown for the annual cycle (purple circles). Panels (a) and (b) display the bias, RMSE, and skill score for the near-surface (0–30 cm) layer, while (c) and (d) display the bias, RMSE, and skill score at depth (30–300 cm). The ensemble mean is shown beside for comparison.


Figure 8Bias (solid colour), RMSE (hatching) and skill scores (circles) of each product the cold season (blue) and the warm season (red) performance of reanalysis products over low elevation grid cells (below 500 m) (a, c), and grid cells at or above 500 m elevation (b, d). The skill score is also shown for the annual cycle (purple circles). (a) and (b) displays the bias, RMSE and skill score for the near-surface (0–30 cm) layer, while (c) and (d) display the bias, RMSE and skill score at depth (30–300 cm). The ensemble mean is shown beside for comparison.


4.4 Soil temperature trends

We calculate product trends over the 1985–2010 period in order to be able to calculate a station estimate from a subset of 52 Eurasian grid cells that have a continuous time series, as well as a consistent number of sites and depths included over all dates and times (denoted as the “Eurasian subset” hereafter). Trends at depth are very similar in magnitude and spatial pattern to the near surface, so we focus on the near-surface results here. Trends in the Eurasian subset (hatched bars in Fig. 9c) are generally representative of the Eurasian average (solid blue bars), though they are overestimated slightly in the case of CFSR.

Regionally averaged annual mean soil temperature trends show a small positive trend of < 0.5 °C  per decade in most products, over both Eurasia and North America, and trends in most products are generally consistent with the station estimate over the Eurasian subset (Fig. 9c). In CFSR (purple circles), however, the trend is near zero over North America and tends towards negative in Eurasia, arising because of anomalously cold years in 2009 and 2010 (see inset in Fig. 9a), and anomalously warm periods in the 1980s and early 1990s at the beginning of the time series (Fig. 9a and b). It is likely that the cold anomalies in 2009 and 2010 can be linked to issues with CFSR snow cover (Fig. S4, purple line) between January 2009 and January 2011.

Similar to skill score and RMSE, products show greater disagreement over higher latitudes (Fig. S5) and during winter (Fig. S6). ERA5 in particular, ERA5-Land, and to a lesser extent, ERA-Interim show several pockets of cooling over Siberia and the western Arctic over North America (Fig. S5), driven by strong cooling trends during December, January, and February (DJF) (Fig. S6). Meanwhile, the cooling trends in DJF are not as apparent in FLDAS, JRA55, and MERRA2. Trends over June, July, and August show good agreement between products, with most regions showing small warming trends of < 1 °C per decade and pockets of slight cooling over portions of Eurasia and western North America (Fig. S7).

Figure 9Near-surface soil temperature anomalies and trends for each of the reanalysis products. Panel (a) displays the regionally averaged 1982–2018 annual mean soil temperature anomalies for each reanalysis product north of 40° N over Eurasia, while (b) displays the same but over North America. The CFSR time series is also shown as an inset to display the full range of values. Panel (c) exhibits an estimate of the regionally averaged 1985–2010 annual mean decadal soil temperature trend for each of the products and the ensemble mean for comparison. (The error bars represent the 95 % CI for the mean trend.)


4.5 Variability in seasonal extremes

The winter minimum and summer maximum show a cold bias over all latitude bands for most products (Fig. 10c and d), similar to the mean biases in the warm and cold seasons (Fig. 2). Similarly, the warm biases in ERA5-Land (blue diamond) and JRA55 (red circle) also extend to their winter maximum temperatures (Fig. 10c). The conclusions regarding variability in soil temperature extremes, at depth, are generally similar to those near the surface (Fig. S8c and d).

Table 5Standard deviation (as a measure of spread between products) of the mean biases in winter minimum and summer maximum soil temperature, as a function of latitude and depth (from Figs. 10 and S8c and d). Latitude bands are 10° in width, such that the 40° N latitude band is an average between 40 and 50° N, while the 60° N latitude band is an average between 60 and 70° N, for example.

Download Print Version | Download XLSX

The spread between products in the bias of the summer maximum sees less latitudinal variation than the spread in the winter maximum over both depths, though the spread is largest near the surface (Table 5). Using the standard deviation as a measure of spread between product biases, the near-surface standard deviation in winter minimum bias increases from 2.80 °C over the 40° N latitude band to 3.81 °C north of 60° N (Table 5), in large part because of the large winter biases in ERA-Interim (Fig. 10c, green squares). Meanwhile, the spread in the summer maximum bias sees smaller increases (from 2.32 °C at 40° N to 2.95 °C at 60° N) (Table 5).

Figure 10Performance of the near-surface soil temperature variability in the ensemble mean. Panel (a) is a scatterplot of the station and ensemble mean winter minimum soil temperature. Panel (b) is a scatterplot of the station and ensemble mean summer maximum soil temperature. Panel (c) shows latitudinal averages (from Eurasian grid cells) of the near-surface soil temperature winter minimum for the ensemble mean and contributing products. Panel (d) shows latitudinal averages (from Eurasian grid cells) of the near-surface soil temperature summer maximum for the ensemble mean and contributing products.


5 Ensemble mean product

5.1 Ensemble mean validation

The ensemble mean soil temperature product shows closer agreement with in situ soil temperatures than any of the individual products, when all depths, seasons, and regions are considered as a whole. First, its annual mean and cold season skill scores are higher than in any individual product. Second, its bias and RMSE are generally close in magnitude to the product with the lowest bias and RMSE over all depths and seasons (Fig. 2). Third, the ensemble mean product (Fig. 4, pink triangles) displays a temporal variance within 20 % of the observed variability over all depths and seasons, including in the cold season when many products fail to capture observed variance (Fig. 4).

The value of using the ensemble mean soil temperature is particularly noticeable in the cold season when individual products see a decline in skill, and a larger spread in performance. This is particularly noticeable in that its near-surface skill score in the cold season is nearly 10 % higher than the next best product (Fig. 2). Next, its correlation is roughly 0.05 larger than best individual product in cold season over both depths (Fig. 4). Moreover, its bias and RMSE only see relatively small increases over permafrost regions (Fig. 11), while products such as ERA5-Land, which have a small RMSE over more southern regions, see more substantial increases in bias and RMSE over the permafrost zone (Fig. S1). Thus, the ensemble mean soil temperature dataset provides the best estimate of in situ temperatures for the broadest range of conditions.

5.2 Ensemble mean soil temperature trends

We focus our analysis of trends on the near-surface data, as the spatial pattern of soil temperature trends near the surface and at depth show a pattern correlation of greater than 0.95 (not shown), and the conclusions regarding performance are generally similar. The ensemble mean shows a small positive soil temperature trend of 0.23 °C ± 0.09 °C per decade over Eurasia and 0.20 °C ± 0.109 °C per decade over North America between 1985 and 2010 (Fig. 9c). Most regions show small positive mean annual trends of < 0.5 °C per decade, though portions of North America and Siberia exhibit slight cooling trends of < 0.5 °C per decade (Fig. 12a).

Annual mean soil temperature trends in the ensemble mean over Eurasia show a strong correlation of 0.82 with observations (Fig. 12b). The ensemble mean also generally captures the correct sign of the observed trends, though it has a slight tendency to underestimate the magnitude of the trend (Fig. 12b). Trends at depth show a pattern correlation of 0.98 with the near surface (not shown in Fig. 12) and the conclusions are generally the same.

Figure 11Spatial map of bias for the ensemble mean product. Values for the cold season are shown in the left-hand panels and those for the warm season are shown in the right-hand panels. Panels (a) and (b) show the near-surface bias, while biases at depth are shown in (c) and (d).

Figure 12Panel (a) shows the 1985–2010 ensemble mean decadal soil temperature trends, near the surface, with the locations of validation grid cells included in the trend analysis shown as green dots. The grey contour line represents the boundary of the permafrost zone (regions with at least 50 % permafrost cover). Panel (b) shows the relationship between the near-surface ensemble mean and station soil temperature trends (per decade).

5.3 Ensemble mean variability in seasonal extremes

Similar to most products, the ensemble mean is biased cold in both the winter minimum (Fig. 10a) and the summer maximum (Fig. 10b) soil temperature. As Fig. 10a and b show, however, there is a fair degree of variability in the behaviour of the ensemble mean seasonal extremes – making an assessment of the mean behaviour in seasonal extremes somewhat tricky. Therefore, in the following paragraphs, we will focus on the most robust findings.

Near the surface, biases in the winter minimum soil temperature (Fig. 10a) are generally larger than in the summer maximum (Fig. 10b). This can also be seen in the latitudinally averaged soil temperatures, where the ensemble mean (pink line) is further from the station (black line) in the winter minimum (Fig. 10c) than in the summer maximum (Fig. 10d), which agrees with the findings that the near-surface cold season bias is generally larger than the bias in warm season (Fig. 2).

At depth, however, the summer maximum shows a larger bias (Fig. S8c) than the winter maximum (Fig. S8d), consistent with the finding that the extratropical mean bias is largest in the warm season at depth (Fig. 2). From Fig. S8b, we also see that the greatest disagreement in summer maximum occurs over the coldest regions.

Referring to Fig. 1a, several different types of grid cells are denoted. The first group, type-1 (16 occurrences) grid cells, is characterized by a strong cold bias in (underestimating) the winter minimum soil temperature (Fig. 13a). A second group of grid cells, which we refer to as type-2 (6 occurrences) grid cells, is defined as those which have a strong warm bias in (overestimating) the summer maximum temperature (Fig. 13b).

A common feature of the third group, type-3 grid cells, is that they underestimate the observed seasonal cycle of soil temperatures (Fig. 13c). While only one occurrence was found in the 52 Eurasian grid cells used for trend analysis, many grid cells in the Yukon would also show a similar underestimation of the seasonal cycle. This is evident as the ensemble mean normalized standard deviation (a measure of soil temperature variability) is substantially smaller than 1.0 in both seasons (Fig. S3). Often the in situ stations located within type-3 grid cells are located in areas devoid of vegetation, and it is likely that disagreements in the simulated vegetation cover in the contributing reanalysis products may partially account for the reduced seasonal cycle.

Figure 13Time series from selected grid cells showing the ensemble mean (pink) and station (black) soil temperatures. Panel (a) displays a time series where the ensemble mean simulates a winter minimum that is too cold, (b) a time series where the ensemble mean simulates a summer maximum that is too cold, and (c) a time series where the ensemble mean underestimates the seasonal cycle of soil temperatures.


6 Discussion and conclusions

This study conducted a validation of pan-Arctic soil temperatures for eight reanalysis products and validated a new ensemble mean pan-Arctic soil temperature dataset. The results are qualitatively similar to the findings of previous studies exploring reanalysis soil temperature performance in the extratropical Northern Hemisphere, which generally highlighted a cold bias in most products (Hu et al.2019; Qin et al.2020; Wu et al.2018; Xu et al.2019; Yang and Zhang2018; Zhan et al.2020). Similar to Li et al. (2021), we note greater biases in cold season soil temperatures, and our results qualitatively reflect the findings of Cao et al. (2020), who found that ERA5-Land exhibited warm soil temperature biases – particularly over higher latitudes.

The soil temperature trends reported here are of a similar magnitude to those reported by Biskaborn et al. (2019), who found that permafrost soil temperatures generally warmed at a rate of 0.29 °C ± 0.12 °C per decade, though ours differ in that they represent the mean extratropical Northern Hemisphere north of 40° N, whereas Biskaborn et al. (2019) predominantly focus on permafrost regions.

Other major differences here are that we developed an ensemble mean soil temperature product and had a greater focus on higher latitude regions than most other studies. We also note a strong difference in seasonal performance. Relative to the warm season, the cold season is generally characterized by lower skill, larger near-surface temperature biases, a larger spread in the reanalysis products' soil temperature variability, and lower correlations with in situ soil temperatures. When all depths and seasons are considered, the ensemble mean product performs better than any individual product, exhibiting a consistently high skill, realistic soil temperature variability, and relatively small biases over all seasons.

Here we show an approximate estimate of the magnitude of soil temperature uncertainty associated with instrumental uncertainties, and those associated with structural differences and parameterizations in land models, using the standard deviation in soil temperature across time and as a function of station temperature. A complete quantitative assessment of the contributions of various sources of uncertainty is not possible using this dataset, as time series did not have a consistent start or end date, and consequently, the metrics are calculated using different climatologies across different grid cells. A more complete uncertainty analysis is beyond the scope of this study but in the future could be achieved by limiting analysis to a subset of grid cells with consistent time series, for example, by focusing on soil temperature networks such as the Michigan Enviroweather network or the North Dakota Agricultural Weather Network, or limiting the uncertainty analysis to a smaller portion of the permafrost region.

We find that the median spread in the spatially averaged soil temperature of stations in a grid cell is approximately 1.49 °C (Fig. 1b) – several degrees smaller than the standard deviation of reanalysis soil temperatures for a given station soil temperature – particularly over frozen soils (Fig. 5). For example, when soil temperatures are below 20 °C, soil temperature standard deviations increase to near 10 °C in several products. Reanalysis 2 m air temperatures maintain a relatively consistent standard deviation between 1.25 and 1.75 °C for most products, and they only increase slightly to between 2.25 and 3.5 °C over the coldest station air temperatures (not shown). Unlike with soil temperatures, the spread in reanalysis 2 m air temperatures only increases modestly over the cold season (not shown). This would suggest that the largest degree of uncertainty in reanalysis soil temperatures over the Arctic is most likely caused by differences in the land models between products, rather than by uncertainties in observed soil temperatures, or from differences in product air temperatures.

6.1 Uncertainties associated with land model parameterizations and structural differences

Uncertainties in soil temperatures associated with structural differences and parameterizations in land models can be grouped into several categories. The first category surrounds the simplified parameterizations controlling frozen soil processes. For example, in the Noah LSM, utilized by CFSR and FLDAS, freeze–thaw processes are highly simplified and unsuited for permafrost simulations (Hu et al.2019) – and may have contributed to the relatively large soil temperature biases simulated in these products. Even the best-performing products – ERA5 and ERA5-Land – are unsuited for simulation of permafrost soil temperatures, as they fail to simulate phase-dependent changes in soil thermal conductivity (Cao et al.2020).

Yang et al. (2020) noted that larger soil temperature biases over the Qinghai–Tibetan Plateau in deeper soil layers were likely related to the fact that soil temperatures are less constrained by air temperature observations (and soil properties). This could also explain why soil temperature biases in the warm season are larger at depth than near the surface in this study. Moreover, the near-surface soil layers tend to fall within the active layer (which undergoes seasonal freeze–thaw cycles), while deeper soil layers are more likely to contain permafrost. Permafrost has a high degree of impermeability, which prevents soil water from infiltrating below the bottom of the active layer, and owing to latent heat considerations, leads to soil water freezing at the base of the active later (Zhao et al.2000); however, these processes are not well represented in reanalysis LSMs (Yang et al.2020; Hu et al.2019).

LSMs, such as the Simple Biosphere Model (used in JRA55), that use the force restore method to estimate soil temperature are prone to overestimating diurnal soil temperature ranges (Gao et al.2004; Kahan et al.2006) as well as the seasonal cycle of soil temperatures (Luo et al.2003). This is because they underestimate heat capacity and overestimate temporal variation in ground heat flux compared with more complex land models (Hong and Kim2010). Moreover, the force restore method assumes a strong diurnal forcing from above, an assumption that is likely violated when snow cover is present (Tilley and Lynch1998; Slater et al.2001), because snow cover leads to a decoupling of the surface forcing from the soil below. These factors may help explain why JRA55 is unable to simulate near-surface soil temperatures as accurately as the other products explored in this study which explicitly incorporate representations of soil heat flux between soil layers (Niu et al.2011; Koster et al.2000; van den Hurk et al.2000; Balsamo et al.2009), and hence they are able to simulate a dampening of seasonal variability in soil temperatures at depth (and greater variability near the surface).

Burke et al. (2020) note that differences in snow cover properties were important in explaining soil temperature biases of several Coupled Model Intercomparison Project 6 (CMIP6) models, and it is likely that differences in snow cover properties between the LSMs studied here could account for some of the observed spread, particularly in the cases of ERA-Interim, ERA5, and ERA5-Land, because during the warm season, these products have similar soil temperature biases, but their performance varies widely during the cold season (Figs. 2 and 10), in large part because of snow density biases (Cao et al.2020; Gao et al.2022). In ERA-Interim, the large cold (negative) bias during the cold season is strongly related to the fact that it overestimates the observed snow density (Gao et al.2022) and, consequently, also overestimates the thermal conductivity of the snow pack. Conversely, snow density (and thermal conductivity) in ERA5-Land (and ERA5) is too low, and hence biases in snow density are a large contributing factor to the warm (positive) bias during the cold season (Cao et al.2020).

Snow was also cited as a major controlling factor in soil temperature biases in ECMWF's Integrated Forecast System, which also uses the HTESSEL land surface model (Albergel et al.2015). In the case of the Noah LSM, which is included in CFSR/CFSv2 and FLDAS, Li et al. (2021) found that an overestimation of snow cover was a major contributor to larger soil temperature biases in winter over the Qinghai–Tibetan Plateau. Shukla et al. (2019) and Shukla and Huang (2020) found that overestimation of snow cover in CFSR during autumn and early winter leads to an overactive snow-albedo feedback and excessive cooling of the near-surface soil layers. This translates into a strong cold bias at depth over the spring and summer, and likely explains why CFSR's warm season bias and RMSE at depth are the largest of all seven products examined in this study (Fig. 2).

6.2 Uncertainties associated with scale effects

Here we evaluated soil temperatures at a relatively coarse resolution of 0.75°. As such, it is difficult for reanalysis products to capture local-scale variability in soil temperature. The subgrid-scale variability in soil temperatures calculated in Fig. 1b is of a similar magnitude to those calculated by previous studies exploring subgrid-scale variability in cryospheric soil temperatures (Gubler et al.2011; Morse et al.2012; Gisnås et al.2014), though they are generally smaller than those reported by Cao et al. (2019). We found that the spatial variability in soil temperatures in one high latitude grid cell is larger than 10 °C at times (Fig. 1) – of a similar magnitude to those reported by Cao et al. (2019).

Moreover, as many grid cells in Eurasia only included a single in situ station, there is a significant chance that this single in situ station may not necessarily be reflective of conditions elsewhere in the grid cell (Gubler et al.2011). When multiple in situ stations were available, we took the spatial mean of all stations in an attempt to estimate the mean soil temperature over the grid cell.

6.3 Uncertainties arising from sampling variability

As was described in Sect. 5.2, the presence of missing data created a challenge for calculating in situ soil temperature averages. While most grid cells in Eurasia had relatively consistent time series, and fewer issues with missing data, this was not the case over North America. Rather than limit our analysis to a small number of grid cells with little to no missing data (as we did for the calculation of soil temperature trends), we chose to make use of all available data at each time step when calculating our validation metrics (bias, RMSE, standard deviation, correlation, and skill score). Thus, the spatially averaged in situ soil temperature did not always contain a constant number of depths or grid cells at each time step in many grid cells over North America.

From Fig. 1b, it is apparent that the median variability in soil temperatures between stations within a grid cell (spatial variation), 1.49 °C, is roughly 1.75 times larger than the median variability in soil temperatures at different depths, 0.84 °C, for a particular station (depth variation). Thus, it would appear that fluctuations in the number of stations comprising the spatially averaged soil temperature are responsible for a greater proportion of the uncertainty than fluctuations in the number of depths included. However, it is also apparent that the uncertainties arising from variations in the number of grid cells included in a station average are substantially smaller than the spread between reanalysis products. During the cold season, the uncertainty in soil temperatures associated with the spread between reanalysis products is often two to three times larger than the uncertainty arising from fluctuations in station availability.

6.4 Applications and future work

The ensemble mean data product provides gridded, monthly-averaged soil temperature estimates of near-surface and deeper soil temperatures at a 0.75° resolution. Therefore, it is most suitable for regional or hemispheric-scale analyses of soil temperature climatologies, or their seasonal cycle, or to explore recent trends in soil temperatures. The product could also be used to provide boundary conditions for models that require soil temperature inputs, such as hydrological models, and for the validation of model soil temperatures. While the ensemble mean product still exhibits substantial cold biases over permafrost regions, and therefore is likely unsuitable for permafrost modelling, the RMSE over the permafrost zone of the ensemble mean product outperforms the RMSE of the best-performing product by 0.5 °C, on average (Fig. S1), in the cold season, and hence it may still provide some added value for estimation of high latitude soil temperatures relative to the individual products.

A robust ensemble mean can be computed with four products (not shown), which means a higher resolution ensemble mean data product could be created using a subset of the higher resolution reanalysis products. For example, ERA5, ERA5-Land, MERRA2, and CFSR have resolutions lower than 0.5°. Using a similar blending methodology, we have been investigating the performance of a 0.31° product (using a smaller subset of products that provide data at higher spatial resolution). We have also performed similar analyses with a 0.05° soil temperature product, using interpolated soil temperatures from the Arctic System Reanalysis version 2 (ASR), ERA5-Land, and FLDAS. The goal has been to assess the impact of spatial resolution on the performance of the ensemble mean product. We hope to include these results in a follow-up paper. Future work should aim to investigate how differences in snow cover and snow density between the reanalysis products may influence biases in the individual products. On a related note, future studies should also emphasize how differences in the land model structure and parameterization may account for the spread in soil temperatures.

Data availability

The CRU TS v 4.07 2 m air temperature dataset can be found on the CRU TS dataset website (, Harris et al.2020). The Obu et al. (2018) permafrost map is available from GTN-P data (; GTN-P2018) are available from The Global Terrestrial Network for Permafrost ( and the Kropp et al. (2020) dataset is available from Heather Kropp's Arctic Data Center page ( Roshydromet data are available from RIHMI-WDC (, Sherstiukov2012; Veselov et al.2021) and Nordicana data can be obtained from Nordicana D (, CEN2022). CFSR (, Saha et al.2010b), CFSv2 (, Saha et al.2012), ERA-Interim (, European Centre for Medium-Range Weather Forecasts2012), ERA5 (, European Centre for Medium-Range Weather Forecasts2019), and JRA55 (, Japan Meteorological Agency2014) data were obtained from the National Center for Atmospheric Research's (NCAR) Research Data Archive (RDA) (, University Corporation for Atmospheric Research2024). FLDAS (; McNally and NASA/GSFC/HSL2018) and MERRA2 (; Global Modeling and Assimilation Office2015) were obtained from the Goddard Earth Sciences Data and Information Services Center (GES DISC) (, Goddard Earth Sciences Data and Information Services Center2024). ERA5-Land data (, Muñoz-Sabater2019) were downloaded from the Copernicus Climate Change Service (C3S) Climate Data Store (CDS) (, Copernicus and ECMWF2024). The ensemble mean soil temperature dataset has been made available on the Arctic Data Center (; Herrington and Fletcher2023).


The supplement related to this article is available online at:

Author contributions

TCH and CGF conceived the study. TCH gathered and analyzed the data, and interpreted the data together with CGF. HK provided in situ data for the study. TCH and CGF wrote the paper with contributions from HK.

Competing interests

The contact author has declared that none of the authors has any competing interests.


Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.


The authors thank the editor and anonymous referees, as well as Hugh Brendan O'Neill and Andre Erler, for their helpful comments towards improving the final version of the paper.

Financial support

This research has been supported by the Natural Sciences and Engineering Research Council of Canada (grant no. PGSD3-504032-2017).

Review statement

This paper was edited by Ylva Sjöberg and reviewed by four anonymous referees.


Albergel, C., Dutra, E., Muñoz-Sabater, J., Haiden, T., Balsamo, G., Beljaars, A., Isaksen, L., de Rosnay, P., Sandu, I., and Wedi, N.: Soil temperature at ECMWF: An assessment using ground-based observations: Soil temperature at ECMWF, J. Geophys. Res.-Atmos., 120, 1361–1373,, 2015. a, b, c, d

Alberta Agriculture, Forestry and Rural Economic Development: Current and Historical Alberta Weather Station Data, (last access: 15 July 2022), 2022. a, b

Allard, M., Sarrazin, D., and Hérault, L.: Borehole and near-surface ground temperatures in northeastern Canada, v. 1.5 (1988–2019), Nordicana D8 [data set],, 2020. a

Balsamo, G., Beljaars, A., Scipal, K., Viterbo, P., van den Hurk, B., Hirschi, M., and Betts, A. K.: A Revised Hydrology for the ECMWF Model: Verification from Field Site to Terrestrial Water Storage and Impact in the Integrated Forecast System, J. Hydrometeorol., 10, 623–643,, 2009. a, b

Beck, H. E., Pan, M., Roy, T., Weedon, G. P., Pappenberger, F., van Dijk, A. I. J. M., Huffman, G. J., Adler, R. F., and Wood, E. F.: Daily evaluation of 26 precipitation datasets using Stage-IV gauge-radar data for the CONUS, Hydrol. Earth Syst. Sci., 23, 207–224,, 2019. a

Betts, A., Chen, F., Mitchell, K. E., and Janjić, Z. I.: Assessment of the Land Surface and Boundary Layer Models in Two Operational Versions of the NCEP Eta Model Using FIFE Data, Mon. Weather Rev., 125, 2896–2916, 1997. a

Biskaborn, B. K., Smith, S. L., Noetzli, J., Matthes, H., Vieira, G., Streletskiy, D. A., Schoeneich, P., Romanovsky, V. E., Lewkowicz, A. G., Abramov, A., Allard, M., Boike, J., Cable, W. L., Christiansen, H. H., Delaloye, R., Diekmann, B., Drozdov, D., Etzelmüller, B., Grosse, G., Guglielmin, M., Ingeman-Nielsen, T., Isaksen, K., Ishikawa, M., Johansson, M., Johannsson, H., Joo, A., Kaverin, D., Kholodov, A., Konstantinov, P., Kröger, T., Lambiel, C., Lanckman, J.-P., Luo, D., Malkova, G., Meiklejohn, I., Moskalenko, N., Oliva, M., Phillips, M., Ramos, M., Sannel, A. B. K., Sergeev, D., Seybold, C., Skryabin, P., Vasiliev, A., Wu, Q., Yoshikawa, K., Zheleznyak, M., and Lantuit, H.: Permafrost is warming at a global scale, Nat. Commun., 10, 264,, 2019. a, b

Burke, E. J., Zhang, Y., and Krinner, G.: Evaluating permafrost physics in the Coupled Model Intercomparison Project 6 (CMIP6) models and their sensitivity to climate change, The Cryosphere, 14, 3155–3174,, 2020. a

Cameron, E., Lantz, T., O’Neill, H., Gill, H., Kokelj, S., and Burn, C.: Permafrost Ground Temperature Report: Ground temperature variability among terrain types in the Peel Plateau region of the Northwest Territories (2011–2015), Tech. Rep. NWT 2017-009, Northwest Territories Geological Survey, Northwest Territories, Canada,, 2019. a, b

Cao, B., Quan, X., Brown, N., Stewart-Jones, E., and Gruber, S.: GlobSim (v1.0): deriving meteorological time series for point locations from multiple global reanalyses, Geosci. Model Dev., 12, 4661–4679,, 2019. a, b, c

Cao, B., Gruber, S., Zheng, D., and Li, X.: The ERA5-Land soil temperature bias in permafrost regions, The Cryosphere, 14, 2581–2595,, 2020. a, b, c, d, e, f, g, h, i, j, k

Cao, B., Arduini, G., and Zsoter, E.: Brief communication: Improving ERA5-Land soil temperature in permafrost regions using an optimized multi-layer snow scheme, The Cryosphere, 16, 2701–2708,, 2022. a

CEN: Climate station data from Whapmagoostui-Kuujjuarapik Region in Nunavik, Quebec, Canada, v. 1.5 (1987–2019), Nordicana D4 [data set],, 2020a. a, b

CEN: Climate station data from the Sheldrake river region in Nunavik, Quebec, Canada, v. 1.1 (1986–2019), Nordicana D61 [data set],, 2020b. a, b

CEN: Climate station data from the Clearwater lake region in Nunavik, Quebec, Canada, v. 1.1 (1986–2019), Nordicana D57 [data set],, 2020c. a, b

CEN: Climate station data from the Little Whale River region in Nunavik, Quebec, Canada, v. 1.1 (1993–2019), Nordicana D58 [data set],, 2020d. a, b

CEN: Climate station data from the Biscarat river region in Nunavik, Quebec, Canada, v. 1.0 (2005–2019), Nordicana D62 [data set],, 2020e. a, b

CEN: Climate station data from Northern Ellesmere Island in Nunavut, Canada, v. 1.7 (2002–2019), Nordicana D8 [data set],, 2020f. a, b

CEN: Environmental data from Boniface river region in Nunavik, Quebec, Canada, v. 1.3 (1988–2019), Nordicana D7 [data set],, 2020g. a, b

CEN: Nordicana D, Centre for Northern Studies [data set], (last access: 19 January 2022), 2024. a

Chen, F., Mitchell, K., Schaake, Y., Xue, Y., Pan, H.-L., Koren, V., Duan, Q., Ek, M., and Betts, A.: Modeling of land surface evaporation by four schemes and comparison with FIFE observations, J. Geophys. Res., 101, 7251–7268, 1996. a

Chen, H., Nan, Z., Zhao, L., Ding, Y., Chen, J., and Pang, Q.: Noah Modelling of the Permafrost Distribution and Characteristics in the West Kunlun Area, Qinghai-Tibet Plateau, China: Noah Modelling of Permafrost, Permafrost Periglac., 26, 160–174,, 2015. a

Copernicus and ECMWF: Copernicus Climate Change Service,!/home (last access: 10 April 2024), 2024. a

Dee, D. P., Uppala, S. M., Simmons, A. J., Berrisford, P., Poli, P., Kobayashi, S., Andrae, U., Balmaseda, M. A., Balsamo, G., Bauer, P., Bechtold, P., Beljaars, A. C. M., van de Berg, L., Bidlot, J., Bormann, N., Delsol, C., Dragani, R., Fuentes, M., Geer, A. J., Haimberger, L., Healy, S. B., Hersbach, H., Hólm, E. V., Isaksen, L., Kållberg, P., Köhler, M., Matricardi, M., McNally, A. P., Monge-Sanz, B. M., Morcrette, J.-J., Park, B.-K., Peubey, C., de Rosnay, P., Tavolato, C., Thépaut, J.-N., and Vitart, F.: The ERA-Interim reanalysis: configuration and performance of the data assimilation system, Q. J. Roy. Meteor. Soc., 137, 553–597,, 2011. a, b

Derber, J., Parrish, D. F., and Lord, S. J.: The New Global Operational Analysis System at the National Meteorological Center, Weather Forecast., 6, 538–547, 1991. a

de Rosnay, P., Drusch, M., Vasiljevic, D., Balsamo, G., Albergel, C., and Isaksen, L.: A simplified Extended Kalman Filter for the global operational soil moisture analysis at ECMWF, Q. J. Roy. Meteor. Soc., 139, 1199–1213,, 2013. a

de Rosnay, P., Balsamo, G., Albergel, C., Muñoz-Sabater, J., and Isaksen, L.: Initialisation of Land Surface Variables for Numerical Weather Prediction, Surv. Geophys., 35, 607–621, 2014. a

Dirmeyer, P. A., Koster, R. D., and Guo, Z.: Do Global Models Properly Represent the Feedback between Land and Atmosphere?, J. Hydrometeorol., 7, 1177–1198, 2006. a

Dorigo, W., Wagner, W., Albergel, C., Albrecht, F., Balsamo, G., Brocca, L., Chung, D., Ertl, M., Forkel, M., Gruber, A., Haas, E., Hamer, P. D., Hirschi, M., Ikonen, J., de Jeu, R., Kidd, R., Lahoz, W., Liu, Y. Y., Miralles, D., Mistelbauer, T., Nicolai-Shaw, N., Parinussa, R., Pratola, C., Reimer, C., van der Schalie, R., Seneviratne, S. I., Smolander, T., and Lecomte, P.: ESA CCI Soil Moisture for improved Earth system understanding: State-of-the art and future directions, Remote Sens. Environ., 203, 185–215,, 2017. a

Ducharne, A., Koster, R. D., Suarez, M. J., Stieglitz, M., and Kumar, P.: A catchment-based approach to modeling land surface processes in a general circulation model: 2. Parameter estimation and model demonstration, J. Geophys. Res.-Atmos., 105, 24823–24838,, 2000. a

Déry, S.: Cariboo Alpine Mesonet (CAMnet) Database, Zenodo [data set],, 2017. a, b

Ek, M.: Implementation of Noah land surface model advances in the national centers for environmental prediction operational mesoscale Eta model, J. Geophys. Res., 108, 8851,, 2003. a

Ensom, T., Kokelj, S., and McHugh, K.: Permafrost Ground Temperature Report: Inuvik to Tuktoyaktuk Highway stream crossing and alignment sites, Northwest Territories, Tech. Rep. NWT Open Report 2019-004, Northwest Territories Geological Survey, Northwest Territories, Canada,, 2020. a, b

Enviroweather: Enviroweather Network, Enviroweather [data set], (last access: 12 April 2024), 2024. a, b

European Centre for Medium-Range Weather Forecasts: ERA-Interim Project, Monthly Means,, 2012. a

European Centre for Medium-Range Weather Forecasts: ERA5 Reanalysis (Monthly Mean 0.25 Degree Latitude-Longitude Grid),, 2019. a

European Space Agency: Copernicus Global Digital Elevation Model GLO-90 [data set],, 2021. a

Funk, C., Peterson, P., Landsfeld, M., Pedreros, D., Verdin, J., Shukla, S., Husak, G., Rowland, J., Harrison, L., Hoell, A., and Michaelsen, J.: The climate hazards infrared precipitation with stations – a new environmental record for monitoring extremes, Sci. Data, 2, 150066,, 2015. a

Gao, S., Li, Z., Zhang, P., Zeng, J., Chen, Q., Zhao, C., Liu, C., Wu, Z., and Qiao, H.: An Assessment of the Applicability of Three Reanalysis Snow Density Datasets Over China Using Ground Observations, IEEE Geosci. Remote Sens. Lett., 19, 1–5,, 2022. a, b

Gao, Z., Chae, N., Kim, J., Hong, J., Choi, T., and Lee, H.: Modeling of surface energy partitioning, surface temperature, and soil wetness in the Tibetan prairie using the Simple Biosphere Model 2 (SiB2): MODELING OF THE SURFACE PROCESSES, J. Geophys. Res.-Atmos., 109, D06102,, 2004. a

Gelaro, R., McCarty, W., Suárez, M. J., Todling, R., Molod, A., Takacs, L., Randles, C. A., Darmenov, A., Bosilovich, M. G., Reichle, R., Wargan, K., Coy, L., Cullather, R., Draper, C., Akella, S., Buchard, V., Conaty, A., da Silva, A. M., Gu, W., Kim, G.-K., Koster, R., Lucchesi, R., Merkova, D., Nielsen, J. E., Partyka, G., Pawson, S., Putman, W., Rienecker, M., Schubert, S. D., Sienkiewicz, M., and Zhao, B.: The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2), J. Climate, 30, 5419–5454,, 2017. a, b

Gisnås, K., Westermann, S., Schuler, T. V., Litherland, T., Isaksen, K., Boike, J., and Etzelmüller, B.: A statistical approach to represent small-scale variability of permafrost temperatures due to snow cover, The Cryosphere, 8, 2063–2074,, 2014. a

Global Modeling and Assimilation Office: MERRA-2 tavg1_2d_lnd_Nx: 2d,1-Hourly,Time-Averaged,Single-Level,Assimilation,Land Surface Diagnostics V5.12.4, Greenbelt, MD, USA, Goddard Earth Sciences Data and Information Services Center (GES DISC) [data set],, 2015. a

Goddard Earth Sciences Data and Information Services Center (GES DISC):, last access: 10 April 2024. a

Gruber, S., Brown, N., Stewart-Jones, E., Karunaratne, K., Riddick, J., Peart, C., Subedi, R., and Kokelj, S. V.: Permafrost Ground Temperature Report: Ground temperature and site characterisation data from the Canadian Shield tundra near Lac de Gras, Northwest Territories, Canada, Tech. Rep. NWT Open Report 2018-009, Northwest Territories Geological Survey, Northwest Territories, Canada,, 2019. a, b, c

GTN-P: GTN-P global mean annual ground temperature data for permafrost near the depth of zero annual amplitude (2007–2016), PANGAEA [data set],, 2018. a, b, c, d

Gubler, S., Fiddes, J., Keller, M., and Gruber, S.: Scale-dependent measurement and analysis of ground surface temperature variability in alpine terrain, The Cryosphere, 5, 431–443,, 2011. a, b

Harada, Y., Kamahori, H., Kobayashi, C., Endo, H., Kobayashi, S., Ota, Y., Onoda, H., Onogi, K., Miyaoka, K., and Takahashi, K.: The JRA-55 Reanalysis: Representation of Atmospheric Circulation and Climate Variability, J. Meteorol. Soc. Jpn. Ser. II, 94, 269–302,, 2016. a

Harris, I., Osborn, T. J., Jones, P., and Lister, D.: Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset, Sci. Data, 7, 109,, 2020. a, b

Hernández-Henríquez, M. A., Sharma, A. R., Taylor, M., Thompson, H. D., and Déry, S. J.: The Cariboo Alpine Mesonet: sub-hourly hydrometeorological observations of British Columbia's Cariboo Mountains and surrounding area since 2006, Earth Syst. Sci. Data, 10, 1655–1672,, 2018. a, b

Herrington, T. and Fletcher, C. G.: Ensemble Mean Reanalysis Soil Temperature Dataset at 0.75 degree latitude/longitude resolution (60 South to 90 North), 1982–2018, Arctic Data Center [data set],, 2023. a

Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz‐Sabater, J., Nicolas, J., Peubey, C., Radu, R., Schepers, D., Simmons, A., Soci, C., Abdalla, S., Abellan, X., Balsamo, G., Bechtold, P., Biavati, G., Bidlot, J., Bonavita, M., Chiara, G., Dahlgren, P., Dee, D., Diamantakis, M., Dragani, R., Flemming, J., Forbes, R., Fuentes, M., Geer, A., Haimberger, L., Healy, S., Hogan, R. J., Hólm, E., Janisková, M., Keeley, S., Laloyaux, P., Lopez, P., Lupu, C., Radnoti, G., Rosnay, P., Rozum, I., Vamborg, F., Villaume, S., and Thépaut, J.: The ERA5 global reanalysis, Q. J. Roy. Meteor. Soc., 146, 1999–2049,, 2020. a

Hong, J. and Kim, J.: Numerical study of surface energy partitioning on the Tibetan plateau: comparative analysis of two biosphere models, Biogeosciences, 7, 557–568,, 2010. a

Hu, G., Zhao, L., Wu, X., Li, R., Wu, T., Xie, C., Pang, Q., and Zou, D.: Comparison of the thermal conductivity parameterizations for a freeze-thaw algorithm with a multi-layered soil in permafrost regions, Catena, 156, 244–251,, 2017. a

Hu, G., Zhao, L., Li, R., Wu, X., Wu, T., Xie, C., Zhu, X., and Su, Y.: Variations in soil temperature from 1980 to 2015 in permafrost regions on the Qinghai-Tibetan Plateau based on observed and reanalysis products, Geoderma, 337, 893–905,, 2019. a, b, c, d, e, f

Hugelius, G., Strauss, J., Zubrzycki, S., Harden, J. W., Schuur, E. A. G., Ping, C.-L., Schirrmeister, L., Grosse, G., Michaelson, G. J., Koven, C. D., O'Donnell, J. A., Elberling, B., Mishra, U., Camill, P., Yu, Z., Palmtag, J., and Kuhry, P.: Estimated stocks of circumpolar permafrost carbon with quantified uncertainty ranges and identified data gaps, Biogeosciences, 11, 6573–6593,, 2014. a

Japan Meteorological Agency: JRA-55 Product Users’ Handbook: Model Grid Data, Japan Meteorological Agency, Japan, (last access: 28 September 2023), 2014. a

Jiao, M., Zhao, L., Wang, C., Hu, G., Li, Y., Zhao, J., Zou, D., Xing, Z., Qiao, Y., Liu, G., Du, E., Xiao, M., and Hou, Y.: Spatiotemporal Variations of Soil Temperature at 10 and 50 cm Depths in Permafrost Regions along the Qinghai-Tibet Engineering Corridor, Remote Sens., 15, 455,, 2023. a

Johannsen, F., Ermida, S., Martins, J. P. A., Trigo, I. F., Nogueira, M., and Dutra, E.: Cold Bias of ERA5 Summertime Daily Maximum Land Surface Temperature over Iberian Peninsula, Remote Sens., 11, 2570,, 2019. a

Jones, P. W.: First- and Second-Order Conservative Remapping Schemes for Grids in Spherical Coordinates, Mon. Weather Rev., 127, 2204–2210,<2204:FASOCR>2.0.CO;2, 1999. a

Kahan, D. S., Xue, Y., and Allen, S. J.: The impact of vegetation and soil parameters in simulations of surface energy and water balance in the semi-arid sahel: A case study using SEBEX and HAPEX-Sahel data, J. Hydrol., 320, 238–259,, 2006. a

Kim, Y. and Wang, G.: Impact of vegetation feedback on the response of precipitation to antecedent soil moisture anomalies over north america, J. Hydrometeorol., 8, 534–550,, 2007. a

Kobayashi, S., Ota, Y., Harada, Y., Ebita, A., Moriya, M., Onoda, H., Onogi, K., Kamahori, H., Kobayashi, C., Endo, H., Miyaoka, K., and Takahashi, K.: The JRA-55 Reanalysis: General Specifications and Basic Characteristics, J. Meteorol. Soc. Jpn. Ser. II, 93, 5–48,, 2015. a, b

Koren, V., Schaake, J., Mitchell, K., and Chen, F.: A parameterization of snowpack and frozen ground intended for NCEP weather and climate models, J. Geophys. Res.-Atmos., 104, 19569–19585, 1999. a

Koster, R. D., Suarez, M. J., Ducharne, A., Stieglitz, M., and Kumar, P.: A catchment-based approach to modeling land surface processes in a general circulation model: 1. Model structure, J. Geophys. Res.-Atmos., 105, 24809–24822,, 2000. a, b

Koster, R. D., Suarez, M. J., Liu, P., Jambor, U., Berg, A., Kistler, M., Reichle, R., Rodell, M., and Famiglietti, J.: Realistic initialization of land surface states: Impacts on subseasonal forecast skill, J. Hydrometeorol., 5, 1049–1063,, 2004. a

Koven, C. D., Ringeval, B., Friedlingstein, P., Ciais, P., Cadule, P., Khvorostyanov, D., Krinner, G., and Tarnocai, C.: Permafrost carbon-climate feedbacks accelerate global warming, P. Natl. Acad. Sci. USA, 108, 14769–14774,, 2011. a

Kropp, H., Loranty, M. M., Sannel, B., O'Donnell, J., and Blanc-Bates, E.: Synthesis of soil-air temperature and vegetation measurements in the pan-Arctic 1990–2016. Arctic Data Center [data set]., 2020. a, b, c, d

Lee, S.-C., Christen, A., Black, A. T., Johnson, M. S., Jassal, R. S., Ketler, R., Nesic, Z., and Merkens, M.: Annual greenhouse gas budget for a bog ecosystem undergoing restoration by rewetting, Biogeosciences, 14, 2799–2814,, 2017. a, b

Li, M., Wu, P., and Ma, Z.: Comprehensive evaluation of soil moisture and soil temperature from third‐generation atmospheric and land reanalysis datasets, Int. J. Climatol., 40, joc.6549,, 2020. a, b, c, d

Li, X., Wu, T., Wu, X., Chen, J., Zhu, X., Hu, G., Li, R., Qiao, Y., Yang, C., Hao, J., Ni, J., and Ma, W.: Assessing the simulated soil hydrothermal regime of the active layer from the Noah-MP land surface model (v1.1) in the permafrost regions of the Qinghai–Tibet Plateau, Geosci. Model Dev., 14, 1753–1771,, 2021. a, b

Luo, D., Liu, L., Jin, H., Wang, X., and Chen, F.: Characteristics of Ground Surface Temperature at Chalaping in the Source Area of the Yellow River, Northeastern Tibetan Plateau, Agr. Forest Meteorol., 281, 107819,, 2020. a

Luo, L., Robock, A., Vinnikov, K. Y., Schlosser, C. A., Slater, A. G., Boone, A., Etchevers, P., Habets, F., Noilhan, J., Braden, H., Cox, P., de Rosnay, P., Dickinson, R. E., Dai, Y., Zeng, Q.-C., Duan, Q., Schaake, J., Henderson-Sellers, A., Gedney, N., Gusev, Y. M., Nasonova, O. N., Kim, J., Kowalczyk, E., Mitchell, K., Pitman, A. J., Shmakin, A. B., Smirnova, T. G., Wetzel, P., Xue, Y., and Yang, Z.-L.: Effects of Frozen Soil on Soil Temperature, Spring Infiltration, and Runoff: Results from the PILPS 2(d) Experiment at Valdai, Russia, J. Hydrometeorol., 4, 334–351,<334:EOFSOS>2.0.CO;2, 2003. a

Ma, H., Zeng, J., Zhang, X., Fu, P., Zheng, D., Wigneron, J.-P., Chen, N., and Niyogi, D.: Evaluation of six satellite- and model-based surface soil temperature datasets using global ground-based observations, Remote Sens. Environ.t, 264, 112605,, 2021. a, b, c, d, e, f

McNally, A. and NASA/GSFC/HSL: FLDAS Noah Land Surface Model L4 Global Monthly 0.1 x 0.1 degree (MERRA-2 and CHIRPS), Tech. rep., Goddard Earth Sciences Data and Information Services Center (GES DISC), Greenbelt, Maryland, USA, Goddard Earth Sciences Data and Information Services Center (GES DISC) [data set],, 2018. a

McNally, A., Arsenault, K., Kumar, S., Shukla, S., Peterson, P., Wang, S., Funk, C., Peters-Lidard, C. D., and Verdin, J. P.: A land data assimilation system for sub-Saharan Africa food and water security applications, Sci. Data, 4, 170012,, 2017. a, b

Morris, J., Hernández-Henríquez, M., and Déry, S.: Cariboo Alpine Mesonet meteorological data, 2017–2021, Zenodo [data set],, 2021. a, b

Morse, P., Burn, C., and Kokelj, S.: Influence of snow on near-surface ground temperatures in upland and alluvial environments of the outer Mackenzie Delta, Northwest Territories, Can. J. Earth Sci., 49, 895–913,, 2012. a

Mudryk, L. R., Derksen, C., Kushner, P. J., and Brown, R.: Characterization of Northern Hemisphere Snow Water Equivalent Datasets, 1981–2010, J. Climate, 28, 8037–8051,, 2015. a

Muñoz-Sabater, J.: ERA5-Land monthly averaged data from 1981 to present, Copernicus Climate Change Service (C3S) Climate Data Store (CDS) [data set],, 2019. a

Muñoz-Sabater, J., Dutra, E., Agustí-Panareda, A., Albergel, C., Arduini, G., Balsamo, G., Boussetta, S., Choulga, M., Harrigan, S., Hersbach, H., Martens, B., Miralles, D. G., Piles, M., Rodríguez-Fernández, N. J., Zsoter, E., Buontempo, C., and Thépaut, J.-N.: ERA5-Land: a state-of-the-art global reanalysis dataset for land applications, Earth Syst. Sci. Data, 13, 4349–4383,, 2021. a, b, c

Niu, G.-Y., Yang, Z.-L., Mitchell, K. E., Chen, F., Ek, M. B., Barlage, M., Kumar, A., Manning, K., Niyogi, D., Rosero, E., Tewari, M., and Xia, Y.: The community Noah land surface model with multiparameterization options (Noah-MP): 1. Model description and evaluation with local-scale measurements, J. Geophys. Res., 116, D12109,, 2011. a

North Dakota Agricultural Weather Network: The North Dakota Agricultural Weather Network [data set], (last access: 3 August 2022), 2022. a, b

Obu, J., Westermann, S., Kääb, A., and Bartsch, A.: Ground Temperature Map, 2000-2016, Northern Hemisphere Permafrost, PANGAEA [data set],, 2018. a, b, c

Obu, J., Westermann, S., Bartsch, A., Berdnikov, N., Christiansen, H. H., Dashtseren, A., Delaloye, R., Elberling, B., Etzelmüller, B., Kholodov, A., Khomutov, A., Kääb, A., Leibman, M. O., Lewkowicz, A. G., Panda, S. K., Romanovsky, V., Way, R. G., Westergaard-Nielsen, A., Wu, T., Yamkhin, J., and Zou, D.: Northern Hemisphere permafrost map based on TTOP modelling for 2000–2016 at 1 km2 scale, Earth-Sci. Rev., 193, 299–316,, 2019. a

Onogi, K., Tsutsui, J., Koide, H., Sakamoto, M., Kobayashi, S., Hatsushika, H., Matsumoto, T., Yamazaki, N., Kamahori, H., Takahashi, K., Kadokura, S., Wada, K., Kato, K., Oyama, R., Ose, T., Mannoji, N., and Taira, R.: The JRA-25 Reanalysis, J. Meteorol. Soc. Jpn. Ser. II, 85, 369–432,, 2007. a

Qin, Y., Liu, W., Guo, Z., and Xue, S.: Spatial and temporal variations in soil temperatures over the Qinghai–Tibet Plateau from 1980 to 2017 based on reanalysis products, Theor. Appl. Climatol., 140, 1055–1069,, 2020. a, b, c

Reichle, R. H., Draper, C. S., Liu, Q., Girotto, M., Mahanama, S. P. P., Koster, R. D., and De Lannoy, G. J. M.: Assessment of MERRA-2 Land Surface Hydrology Estimates, J. Climate, 30, 2937–2960,, 2017a. a

Reichle, R. H., Liu, Q., Koster, R. D., Draper, C. S., Mahanama, S. P. P., and Partyka, G. S.: Land Surface Precipitation in MERRA-2, J. Climate, 30, 1643–1664,, 2017b. a

RoTimi Ojo, E. and Manaigre, L.: The Manitoba Agriculture Mesonet: Technical Overview, B. Am. Meteorol. Soc., 102, E1786–E1804,, 2021. a, b

Royer, A., Picard, G., Vargel, C., Langlois, A., Gouttevin, I., and Dumont, M.: Improved Simulation of Arctic Circumpolar Land Area Snow Properties and Soil Temperatures, Front. Earth Sci., 9, 685140,, 2021. a

Rudy, A., Kokelj, S., Morse, P., and Ensom, T.: Permafrost Ground Temperature Report: Inuvik to Tukyoyaktuk Highway Sentinel Sites, Northwest Territories, Northwest Territories Geological Survey, Yellowknife, NWT, Canada, Technical Report, NWT Open Report 2019-017,, 2020. a

Saha, S., Moorthi, S., Wu, X., Wang, J., Nagida, S., Tripp, P., Behringer, D., Hou, Y., Chuang, H., Iredell, M., Ek, M., Meng, J., Yang, R., Mendez, M. P., van den Dool, H., Zhang, Q., Wang, W., Chen, M., and Becker, E.: NCEP Climate Forecast System Version 2 (CFSv2) Monthly Products, Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory [data set],, 2012. a

Saha, S., Moorthi, S., Pan, H.-L., Wu, X., Wang, J., Nadiga, S., Tripp, P., Kistler, R., Woollen, J., Behringer, D., Liu, H., Stokes, D., Grumbine, R., Gayno, G., Wang, J., Hou, Y.-T., Chuang, H.-y., Juang, H.-M. H., Sela, J., Iredell, M., Treadon, R., Kleist, D., Van Delst, P., Keyser, D., Derber, J., Ek, M., Meng, J., Wei, H., Yang, R., Lord, S., van den Dool, H., Kumar, A., Wang, W., Long, C., Chelliah, M., Xue, Y., Huang, B., Schemm, J.-K., Ebisuzaki, W., Lin, R., Xie, P., Chen, M., Zhou, S., Higgins, W., Zou, C.-Z., Liu, Q., Chen, Y., Han, Y., Cucurull, L., Reynolds, R. W., Rutledge, G., and Goldberg, M.: The NCEP Climate Forecast System Reanalysis, B. Am. Meteorol. Soc., 91, 1015–1058,, 2010a. a, b

Saha, S., Moorthi, N., Pan, H.-L., Wu, X., Wang, J., Nadiga, S., Tripp, P., Kistler, R., Woollen, J., Behringer, D., Liu, H., Stokes, D., Grumbine, R., Gayno, G., Wang, J., Hou, Y.-T., Chuang, H.-Y., Juang, H.-M. H., Sela, J., Iredell, M., Treadon, R., Kleist, D., Van Delst, P., Keyser, D., Derber, J., Ek, M., Meng, J., Wei, H., Yang, H., Lord, S., van den Dool, H., Kumar, A., Wang, W., Long, C., Chelliah, M., Xue, Y., Huang, B., Schemm, J.-K., Ebisuzaki, W., Lin, R., Xie, P., Chen, M., Zhou, S., Higgins, W., Zou, C.-Z., Liu, Q., Chen, Y., Han, Y., Cucurull, L., Reynolds, R. W., Rutledge, G., and Goldberg, M.: NCEP Climate Forecast System Reanalysis (CFSR) Monthly Products, January 1979 to December 2010, Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory [data set],, 2010b. a

Saha, S., Moorthi, S., Wu, X., Wang, J., Nadiga, S., Tripp, P., Behringer, D., Hou, Y.-T., Chuang, H.-y., Iredell, M., Ek, M., Meng, J., Yang, R., Mendez, M. P., van den Dool, H., Zhang, Q., Wang, W., Chen, M., and Becker, E.: The NCEP Climate Forecast System Version 2, J. Climate, 27, 2185–2208,, 2014. a

Sato, N., Sellers, P., Randall, D., Schneider, E., Shukla, J., Kinter, III, J., Hou, Y.-T., and Albertazzi, E.: Effects of Implementing the Simple Biosphere Model in a General Circulation Model, J. Atmos. Sci., 46, 2757–2782,<2757:EOITSB>2.0.CO;2, 1988. a

Sellers, P. J., Mintz, Y., Sud, Y. C., and Dalcher, A.: A Simple Biosphere Model (SIB) for Use within General Circulation Models, J. Atmos. Sci., 43, 505–531,<0505:ASBMFU>2.0.CO;2, 1986. a

Sherstiukov, A.: Dataset of daily soil temperature up to 320 cm depth based on meteorological stations of Russian Federation, Trudy VNIIGMI-MTsD, 176, 224–232, 2012 (in Russian). a, b, c, d, e

Shukla, R. P. and Huang, B.: Cumulative Influence of Summer Subsurface Soil Temperature on North America Surface Temperature in the CFSv2, J. Geophys. Res.-Atmos., 125, e2019JD031899,, 2020. a

Shukla, R. P., Huang, B., Dirmeyer, P. A., and Kinter, J. L.: The Influence of Summer Deep Soil Temperature on Early Winter Snow Conditions in Eurasia in the NCEP CFSv2 Simulation, J. Geophys. Res.-Atmos., 124, 9062–9077,, 2019. a

Siqueira, M., Katul, G., and Porporato, A.: Soil Moisture Feedbacks on Convection Triggers: The Role of Soil–Plant Hydrodynamics, J. Hydrometeorol., 10, 96–112,, 2009. a

Slater, A. G., Schlosser, C. A., Desborough, C. E., Pitman, A. J., Henderson-Sellers, A., Robock, A., Vinnikov, K. Y., Entin, J., Mitchell, K., Chen, F., Boone, A., Etchevers, P., Habets, F., Noilhan, J., Braden, H., Cox, P. M., Rosnay, P. d., Dickinson, R. E., Yang, Z.-L., Dai, Y.-J., Zeng, Q., Duan, Q., Koren, V., Schaake, S., Gedney, N., Gusev, Y. M., Nasonova, O. N., Kim, J., Kowalczyk, E. A., Shmakin, A. B., Smirnova, T. G., Verseghy, D., Wetzel, P., and Xue, Y.: The Representation of Snow in Land Surface Schemes: Results from PILPS 2(d), J. Hydrometeorol., 2, 7–25,<0007:TROSIL>2.0.CO;2, 2001. a

Smith, S., Romanovsky, V., Lewkowicz, A., Burn, C., Allard, M., Clow, G., Yoshikawa, K., and Throop, J.: Thermal State of Permafrost in North America: A Contribution to the International Polar Year, Permafrost Periglac., 21, 117–135,, 2010. a, b

Spence, C. and Hedstrom, N.: Baker Creek Research Catchment Hydrometeorological and Hydrological Data, Federated Research Data Repository [data set],, 2018a. a

Spence, C. and Hedstrom, N.: Hydrometeorological data from Baker Creek Research Watershed, Northwest Territories, Canada, Earth Syst. Sci. Data, 10, 1753–1767,, 2018b. a, b

Street, L. and Wookey, P.: Soil temperature, soil moisture, air temperature and relative humidity for vegetation at Siksik Creek, North West Territories, Canada [data set],, 2016. a

Street, L. E., Mielke, N., and Woodin, S. J.: Phosphorus Availability Determines the Response of Tundra Ecosystem Carbon Stocks to Nitrogen Enrichment, Ecosystems, 21, 1155–1167,, 2018. a

Taylor, K. E.: Summarizing multiple aspects of model performance in a single diagram, J. Geophys. Res.-Atmos., 106, 7183–7192,, 2001. a

Thackeray, C. W., Fletcher, C. G., and Derksen, C.: Quantifying the skill of CMIP5 models in simulating seasonal albedo and snow cover evolution: CMIP5-SIMulated albedo and SCF skill, J. Geophys. Res.-Atmos., 120, 5831–5849,, 2015. a

Tilley, J. S. and Lynch, A. H.: On the applicability of current land surface schemes for Arctic tundra: An intercomparison study, J. Geophys. Res., 103, 29051–29063,, 1998. a

University Corporation for Atmospheric Research: NSF NCAR Research Data Archive (RDA), (last access: 10 April 2024), 2024. a

van den Hurk, B. J. J. M., Viterbo, P., Beljaars, A. C. M., and Betts, A. K.: Offline Validation of the ERA-40 Surface Scheme, Tech. Rep. Technical Memorandum 295, European Centre for Medium-Range Weather Forecasts, Shinfield Park, Reading, United Kingdom, (last access: 23 July 2020), 2000. a

Veselov, V. M., Pribylskaya, I. R., Mirzeabasov, O. A., and VNIIGMI-WCD: All-Russian Research Institute of Hydrometeorological Information – World Data Center – Specialized arrays for climate research [data set], (last access: 3 May 2021), 2022. a

Viterbo, P. and Beljaars, A. C. M.: An improved land surface parametrization scheme in the ECMWF model and its validation, J. Climate, 8, 2716–2748,<2716:AILSPS>2.0.CO;2, 1995. a

Viterbo, P. and Betts, A.: Impact on ECMWF forecasts of changes to the albedo of the boreal forests in the presence of snow, J. Geophys. Res., 104, 27803–27810, 1999. a

World Meteorological Organization: Guidelines on Ensemble Prediction Systems and Forecasting, Tech. Rep. WMO-No. 1091, World Meteorological Organization, Geneva, Switzerland, (last access: 1 November 2021), 2012. a

Wu, X., Nan, Z., Zhao, S., Zhao, L., and Cheng, G.: Spatial modeling of permafrost distribution and properties on the Qinghai-Tibet Plateau, Permafrost Periglac., 29, 86–99,, 2018. a, b, c, d

Xia, Y., Ek, M., Sheffield, J., Livneh, B., Huang, M., Wei, H., Feng, S., Luo, L., Meng, J., and Wood, E.: Validation of Noah-Simulated Soil Temperature in the North American Land Data Assimilation System Phase 2, J. Appl. Meteorol. Climatol., 52, 455–471,, 2013. a

Xiao, Y., Zhao, L., Dai, Y., Li, R., Pang, Q., and Yao, J.: Representing Permafrost Properties in CoLM for the Qinghai-Xizang (Tibetan) Plateau, Cold Reg. Sci. Technol., 87, 68–77, 2013. a

Xie, P. and Arkin, P. A.: Global Precipitation: A 17-Year Monthly Analysis Based on Gauge Observations, Satellite Estimates, and Numerical Model Outputs, B. Am. Meteorol. Soc., 78, 2539–2558,<2539:GPAYMA>2.0.CO;2, 1997. a

Xu, W., Sun, C., Zuo, J., Ma, Z., Li, W., and Yang, S.: Homogenization of Monthly Ground Surface Temperature in China during 1961–2016 and Performances of GLDAS Reanalysis Products, J. Climate, 32, 1121–1135,, 2019. a

Xue, Y., Huang, B., Hu, Z.-Z., Kumar, A., Wen, C., Behringer, D., and Nadiga, S.: An assessment of oceanic variability in the NCEP climate forecast system reanalysis, Clim. Dynam., 37, 2511–2539,, 2011. a

Yang, K. and Zhang, J.: Evaluation of reanalysis datasets against observational soil temperature data over China, Clim. Dynam., 50, 317–337,, 2018. a, b, c, d

Yang, S., Li, R., Wu, T., Hu, G., Xiao, Y., Du, Y., Zhu, X., Ni, J., Ma, J., Zhang, Y., Shi, J., and Qiao, Y.: Evaluation of reanalysis soil temperature and soil moisture products in permafrost regions on the Qinghai-Tibetan Plateau, Geoderma, 377, 114583,, 2020. a, b

Yi, Y., Kimball, J. S., Chen, R. H., Moghaddam, M., and Miller, C. E.: Sensitivity of active-layer freezing process to snow cover in Arctic Alaska, The Cryosphere, 13, 197–218,, 2019.  a

Yukon Geological Survey: Yukon permafrost reports data, in: Yukon Permafrost Database, Government of Yukon, (last access: 26 July 2022), 2021. a, b

Zhan, M.-J., Xia, L., Zhan, L., and Wang, Y.: Evaluation and Analysis of Soil Temperature Data over Poyang Lake Basin, China, Adv. Meteorol., 2020, 1–11,, 2020. a, b

Zhang, J., Wang, W.-C., and Wei, J.: Assessing land-atmosphere coupling using soil moisture from the Global Land Data Assimilation System and observational precipitation, J. Geophys. Res.-Atmos., 113, D17119,, 2008. a

Zhao, C., Gong, C., Duan, H., Yan, P., Liu, Y., and Zhou, G.: Evaluation of Three Reanalysis Soil Temperature Datasets with Observation Data over China, Earth, 3, 1042–1058,, 2022. a

Zhao, L., Cheng, G., Li, S., Zhao, X., and Wang, S.: Thawing and freezing processes of active layer in Wudaoliang region of Tibetan Plateau, Chin. Sci. Bull., 45, 2181–2187,, 2000. a

Zhao, T., Guo, W., and Fu, C.: Calibrating and evaluating reanalysis surface temperature error by topographic correction, J. Climate, 21, 1440–1446, 2008. a

Short summary
Here we validate soil temperatures from eight reanalysis products across the pan-Arctic and compare their performance to a newly calculated ensemble mean soil temperature product. We find that most product soil temperatures have a relatively large RMSE of 2–9 K. It is found that the ensemble mean product outperforms individual reanalysis products. Therefore, we recommend the ensemble mean soil temperature product for the validation of climate models and for input to hydrological models.