Diagnostic and model dependent uncertainty of simulated Tibetan permafrost area

Introduction Conclusions References


Introduction
The Tibetan Plateau (TP) has the highest and largest lowlatitude frozen ground in the world, with more than 50 % of its area occupied by permafrost (Zhou et al., 2000).The unique geography and plateau climate make the permafrost on the TP very different from the Arctic.The TP permafrost is warmer (with only discontinuous and sporadic permafrost (Zhou et al., 2000)), has less underground ice (Ran et al., 2012) and has no large forests (Wu, 1980).The active layer thickness ranges from 1 to 3 m, with some intensely degraded area reaching 4.5 m (Wu and Liu, 2004;Wu and Zhang, 2010;Zhang and Wu, 2012).Freeze/thaw cycles, and the extent of permafrost play an important role in the thermal state of the TP.The underlying surface temperature contrast between the TP and the Indian Ocean is an important controlling factor for both the Asian monsoon and the wider general atmospheric circulation (Xin et al., 2012).As the TP gets intensely warmer (IPCC, 2013;Wu et al., 2013), the impact of degraded permafrost on desertification (Li et al., 2005(Li et al., , 2014;;Yang et al., 2010), water cycling (Cheng and Jin, 2013;Yao et al., 2013), carbon budget (Dörfer et al., 2013;Wang et al., 2008;Schuur et al., 2008), and infrastructure (Wu and Niu, 2013;Yu et al., 2013) have also become active research topics.
Hence, the simulation of TP permafrost is motivated both by its global importance and by its unique properties.A number of land-surface models (LSMs) (e.g.CLM4.0,CoLM, SHAW, Couple Model and FSM) have been applied at individual station locations on the TP to reproduce soil thermohydro dynamics (Li et al., 2009;Wang and Shi, 2007;Xiong et al., 2014;Zhang et al., 2012).Simulations of ground temperature and moisture variations are relatively realistic when using observed atmospheric forcing (Guo and Yang, 2010;Luo et al., 2008).The results were improved by setting appropriate permafrost parameters for soil organic matter contents and soil texture properties (Luo et al., 2008;Wang et al., 2007;Xiong et al., 2014).CLM4.0 has also been used to provide future projections of permafrost extent for the whole TP (Guo and Wang, 2013;Guo et al., 2012), and simulates 81 % loss of permafrost area by the end of 21st century under the A1B greenhouse gas emissions scenario.This raises the question of how reliable the estimate is in comparison with results from other models.
Simulations of Northern Hemisphere (NH) permafrost area showed large differences amongst Coupled Model Intercomparison Project (CMIP5) models (Koven et al., 2013;Slater and Lawrence, 2013).Moreover, different diagnostic methods, using either a direct method which relies on model simulated ground temperatures, or indirect methods inferred from air temperatures and snow characteristics, also lead to quite different permafrost areas.Slater and Lawrence (2013) applied two direct methods to 19 CMIP5 models and found differences of up to 12.6 × 10 6 km 2 in diagnosed NH permafrost area.Saito (2013) showed that differences in pre-industrial NH continuous permafrost area between direct and indirect methods were around 3 × 10 6 km 2 .This raises the question of why different methods arrive at different estimates and which method is better suited.
A reliable simulation of permafrost extent is important, since permafrost is a comprehensive reflection of soil thermo-hydro dynamics that is hard to measure directly except at sparse observational sites.Further, reliable presentday simulations can contribute to an increased confidence in simulations of future permafrost degradation by these models.Our approach provides information on the ability of models on the warmer and physically unique TP permafrost in a NH simulation, hence providing some test of reliability for simulations of present and future global permafrost over the TP.
To date, an examination of the uncertainties in modelderived TP permafrost area has not been attempted.One way of estimating this uncertainty is to explore a single model and to perform a set of sensitivity experiments in which the model parameters are modified (e.g.Dankers et al., 2011;Essery et al., 2013;Gubler et al., 2013).An alternative approach is to explore an ensemble of multiple models where the uncertainty is discussed in terms of the spread among the models (e.g.Koven et al., 2013;Slater and Lawrence, 2013).Here we follow the second approach and examine the uncertainty of TP permafrost simulations by an ensemble of six state-of-the-art stand-alone land-surface schemes.The models are from the Permafrost Carbon Network (PCN; http://www.permafrostcarbon.org/) and include a broad variety of snow and ground parameters and descriptions, along with a clear experimental design under prescribed observation-based atmospheric forcing.The first focus of our paper is therefore the quantification of the uncertainty in the simulated TP permafrost area due to the models' structural and parametric differences.Further, using time series of soil temperature from the few available TP stations, we discuss the biases in relation to the land-surface model description (e.g.soil texture, vegetation and snow cover).We also discuss in the paper the uncertainty due to the different methods of diagnosing the TP permafrost area, with 5 different (direct and indirect) methods.
In Sect. 2 we introduce the different methods used to derive permafrost extent for the TP from LSMs.Section 3 describes the applied model data, the observation-based estimate of TP permafrost map, the method to assess the agreement of simulated vs. observation-based estimates of permafrost maps, and ground temperature data to evaluate soil thermal profiles simulated by the models.Results and discussion are presented in Sect. 4. Section 5 further discusses the specific land-surface processes causing ground temperature discrepancies.The robustness of the results is presented in Sect.6, and conclusions are summarized in Sect.7.

Permafrost diagnosis
We make use of all five major permafrost diagnostic methods promoted in the literature (Slater and Lawrence, 2013;Guo et al., 2012;Guo and Wang, 2013;Wang et al., 2006;Wang, 2010;Nan et al., 2002Nan et al., , 2012;;Saito et al., 2013;Ran et al., 2012;Wang et al., 2006;Jin et al., 2007;Xu et al., 2001;Nelson and Outcalt, 1987).Since the model intercomparison relies on LSMs that are all driven at monthly resolution, the methods we use are tailored, as usual, to reflect the forcing data resolution.The model-derived TP permafrost maps are shown in Fig. 1.The modeling spatial domain is not consistent among the models.CLM4.5, CoLM, JULES and UVic cover the whole TP while others (ISBA, LPJ-GUESS) do not (Table 1).We mainly focus on the common modeling region (Fig. 1) to discuss differences between models and methods, but also give the results for whole TP for the four models that produce them.
In detail, the five methods are the following: 1. Temperature in soil layers (TSL) The TSL method allows a direct diagnosis of permafrost from modeled soil temperature (Slater and Lawrence, 2013).The standard definition of permafrost is that ground remains at or below 0 • C for at least 2 consecutive years.Many recent modeling studies (e.g.Guo et al. (2012), Guo and Wang (2013), Slater and Lawrence (2013) and references therein), have consistently adapted this for land surface and earth system models by defining a model grid cell as permafrost if the simulated ground temperature (of at least one level in the upper soil) remains at or below 0 • C for at least 24 consecutive months.Furthermore, these modelbased studies are limited by the maximum soil depth of the models (Table 1).Hence, we analyze the ground temperatures down to a depth of 3 m, which should be satisfactory as this range spans the observed active layer thickness on the TP.Data at higher than monthly temporal resolution are not stored by the models in the PCN  archive.Therefore, TSL diagnosis is calculated from monthly mean soil temperatures, which has been previously demonstrated to be a viable substitute for modelbased estimates of permafrost both on the TP (Guo et al., 2012;Guo and Wang, 2013) and for the Arctic (Slater and Lawrence, 2013).

Mean annual ground temperature (MAGT)
Permafrost is detected if the mean annual ground temperature at the depth of zero annual amplitude is at or below 0 • C (Slater and Lawrence, 2013).Some papers use a slightly higher critical temperature, e.g.0.5 • C (Wang et al., 2006;Wang, 2010;Nan et al., 2002), which has been found to fit TP observations well.Slater and Lawrence (2013) suggested MAGT as an indicator of deeper permafrost.The problem with this definition is that many models have quite shallow soil depth (Table 1), and of course, zero amplitude would require great (actually infinite in steady state) soil depth.For practical purposes, we use MAGT at 3 m depth (the approximate base of the active layer) and the common critical temperature of 0 • C.Although annual ground temperature amplitudes at 3 m depth are still several degrees, they are much smaller than the amplitudes in up-per layers (Sect. 4.3).We investigated one model with a larger depth range (CLM4.5;Table 1) in more detail, but found that using MAGT at 38 m depth does not significantly change the derived permafrost area (Sect.6.2).

Surface frost index (SFI)
Originally, Nelson and Outcalt (1987) introduced the surface frost index SFI * , also used in Slater and Lawrence (2013): where DDF * a and DDT a are the annual freezing and thawing degree-day sums, both calculated using air temperature (indicated by "a" subscripts), and with DDF * a further modified to correct for the insulating effect of snow cover (indicated by the "*" superscript).In this way, SFI * is designed to reflect the ground surface thermal conditions by combining snow insulation effect with air temperature.However, the snow insulation effect alone can not account for the soil structure complexity.So here we calculate surface frost index directly from the ground-surface temperature (indicated by "s" subscripts) (Nan et al., 2012), using an asymmetric sinusoidal annual temperature cycle fitted to the warmest and coldest monthly temperatures (T h , T c ) and a frost angle (β) (Nan et al., 2012): (2)  2013) is a proxy for F.

Mean annual air temperature (MAAT)
A critical value of MAAT is often used to derive the southern boundary of permafrost (Ran et al., 2012;Wang et al., 2006;Jin et al., 2007).The −2 • C isotherm of MAAT has been found to fit well with TP observation-based permafrost maps (Xu et al., 2001).MAAT has been used to compare the air-temperaturebased permafrost area with permafrost areas derived by other methods (Koven et al., 2013;Saito et al., 2013).Note that the calculation method of MAAT in Saito et al. (2013) is slightly different from that used in other works.Here we calculated MAAT traditionally, as the average of 12 monthly 2 m air temperatures.
All five diagnostic methods are summarized in Table 2.The three direct methods (TSL, MAGT, SFI) are based on simulated ground temperatures, while the two indirect methods (F and MAAT) use the prescribed air temperature.SFI is mainly controlled by air temperature and snow cover, but it also depends on how the soil is parameterized, so SFI is somewhat closer to the indirect methods than are TSL and MAGT.
The three methods introduced in the 1980s (SFI, F, MAAT), were designed to map permafrost based on the assumption that the permafrost distribution is related to cli-matic parameters.Although permafrost processes are directly represented in present-day climate models, the simulated soil temperatures have considerable errors, and the directly diagnosed permafrost area has model-dependent biases (Koven et al., 2013;Slater and Lawrence, 2013).Therefore the older indirect diagnostic methods are also still very commonly used (e.g.Wang et al., 2006;Jin et al., 2007;Ran et al., 2012;Nan et al., 2012;Slater and Lawrence, 2013;Saito et al., 2013;Koven et al., 2013).The TP permafrost area directly diagnosed from the simulated monthly soil temperatures (TSL) is not superior to the other methods in comparison with the observation-derived permafrost map (Figs. 1  and 2).Hence, we consider all five diagnostic methods to quantify the full range of uncertainty in the model-derived permafrost maps.
Since the forcing air temperatures of LSMs were not the same due to discrepancies in the historical temperature data sets (and precipitation and other forcing fields) used by the individual models (Table 1), we use the indirect methods to quantify forcing differences.If these differences are not too large, we can attribute the differences in the direct method-derived permafrost areas primarily to differences of modeled land surface processes.Across-model and acrossmethod variability are listed in Table 3.As we use fairly small numbers of methods and models, rather than defining uncertainty in terms of standard deviation, we choose to use the full range of values from the simulations and define uncertainty as maximum-minimum values among the models.

Data from stand-alone LSMs
Output from six stand-alone LSMs participating in the intermodel comparison project "Vulnerability of Permafrost Carbon to Climate Change" (http://www.permafrostcarbon.org/) is analyzed in this study (Table 1).The simulations have been generally conducted for recent decades from 1960 to 2009 using monthly resolution climate forcing input data.Each modeling team was free to choose appropriate driving data sets for climate, atmospheric CO 2 , N deposition, disturbance, soil texture, etc., as used in their standard modeling system.Model spin-ups are also different, but they are long enough (around 1000 years) to ensure that the deep carbon is in equilibrium.The LSMs use different horizontal model resolutions and different soil layer divisions (Table 1).
Our analysis is based on monthly averages of the driving air temperature and simulated ground temperature.As three models (CoLM, JULES and LPJ-GUESS; Table 1) have shallow soil layers, we restrict our analysis to the common depth range spanning near surface to 3 m.Ground temperatures were linearly interpolated onto the common depths: 0.05, 0.1, 0.2, 0.5, 1, 2, 3 m.Since there is no ground surface temperature output, we linearly extrapolate the top two layers' soil www.the-cryosphere.net/10/287/2016/The Cryosphere, 10, 287-306, 2016 temperatures onto the ground surface.For CLM4.5, CoLM, ISBA and LPJ-GUESS, the first layer soil depth is not deeper than 0.01 m and the second layer soil depth is not deeper than 0.05 m.For JULES and UVic, the first layer soil depth is 0.05 m and the second layer soil depth is not deeper than 0.18 m.Most TP permafrost work has been post-1980 (Guo and Wang, 2013;Nan et al., 2012), so we choose 1980 as the start of the analysis period.The end is limited to the year 2000 by results from the JULES model (Table 1).
The LSMs in this study considered the following processes: dynamic vegetation, carbon cycling (Rawlins et al., 2015), snow, near-surface hydrological budget, soil thermal dynamics (Peng et al., 2015) and the treatment of freezing soil.Sophistication in the treatment of these processes varies amongst the models with each having specific parameterizations.In this study we investigate some key schemes and parameters that are important for permafrost simulation: (1) unfrozen water/phase change.All models calculate soil thermal properties as a function of soil moisture and consider the phase change of water/ice, but CoLM and LPJ-GUESS do not consider transformation to ice of water solute mixtures below 0 • C, which is a key feature in soil freezing and thawing.(2) Surface organic layer insulation.Only CLM4.5 and ISBA consider the insulating effect of moss.(3) Soil texture parameterization.The specified fraction of clay and sand in soil differs.LPJ-GUESS specifies the same soil texture for the TP as for the Arctic.(4) Organic soil fraction treatment.The organic content of soil differs among the models.LPJ-GUESS sets the same value for TP as for the more organically rich permafrost of the Arctic.(5) Snow processes.ISBA, LPJ-GUESS and UVic set static snow layers.UVic uses an implicit snow scheme while LPJ-GUESS uses the Bulk-layer scheme, which are both simpler than the dynamic multi-layer snow scheme of the other land models.

TP permafrost observation-based map
Mapping permafrost on TP is challenging due to the absence of field observations, especially in the central and western parts where permafrost is widespread.In practice, permafrost maps on the TP have been statistical models based on a compilation of earlier maps, aerial photographs, Landsat images and terrain analysis (Ran et al., 2012;Shi et al., 1988;Li and Cheng, 1996;Nan et al., 2002) as well as on some MAGT and MAAT data from the few long-term monitoring sites (Ran et al., 2012;Wang et al., 2006).The classification and therefore the mapping of TP permafrost is not consistent across the different studies (Ran et al., 2012).Thus there is a large spread of observation-based TP permafrost area estimates from 110 × 10 4 km 2 (Wang et al., 2006) to 150 × 10 4 km 2 (Shi et al., 1988;Li and Cheng, 1996).
The mostly widely used map by Li and Cheng (1996) has large differences from other maps, and shows excess permafrost in the southeast where permafrost can only exist on extremely cold mountains (Gruber, 2012).The International Permafrost Association (IPA) map (Brown et al., 1997;Heginbottom, 2002) is the most widely used in NH permafrost analysis.However, the IPA map is not well suited for the TP because the data and information in this map are based on the map made by Shi et al. (1988), which has not been updated since.
We use the 1 : 4 000 000 Map of the Glaciers, Frozen Ground and Deserts in China (Wang et al. (2006), hereafter referred to as the "Wang06 map") as the primary reference.The map is based on MAGT (Nan et al., 2002) with 0.5 • C as the boundary between permafrost and seasonally frozen ground.Nan et al. (2002) fitted a multiple linear regression between latitude, altitude and MAGT from all 76 TP stations having borehole data, and extrapolated this regression to the whole TP with a 1 km resolution digital elevation model (DEM).to get the MAGT distribution.The Wang06 map was re-gridded to match the different model resolutions and spatial domain (see "Wang06 map" column in Fig. 1), and the different permafrost areas derived from the methods and models are compared with the Wang06 map in Fig. 2.
We emphasize that the Wang06 map is subject to uncertainty as it is based on a relatively sparse set of observations and then statistical extrapolation.Nan et al. (2013) pointed out that permafrost was overestimated in the western TP in both the maps by Li and Cheng (1996) and Wang et al. (2006).However, a better permafrost map covering the whole TP is not available.

Measure of agreement between simulated and Wang06 permafrost maps
To evaluate the agreement of a simulated permafrost map with the Wang06 map, we calculate the Kappa coefficient (Cohen, 1960;Monserud and Leemans, 1992;Wang, 2010), K, which measures the degree of agreement between two maps.
where the total number of the map points is n, and s is the number of points where simulation and observational estimate agree.The numbers of Wang06 map cells with permafrost is a 1 , and those without are a 0 , and the corresponding simulated map cell numbers are b 1 and b 0 .
The calculated K matrix of simulated and Wang06 permafrost maps is presented in Fig. 3. Empirically and statistically arbitrary quality values for K have been proposed, e.g.Cohen (1960) suggested that K ≥ 0.8 signifies excellent agreement, 0.6 ≤ K < 0.8 represents substantial agreement, 0.4 ≤ K < 0.6 represents moderate agreement, 0.2 ≤ K < 0.4 represents fair agreement, while lack of agreement corresponds to K < 0.2.There is a sample size issue in estimating the confidence of K and this can be a factor when very small numbers of grid points are available (here this applies to UVic).

Data used to examine model thermal structures
The derived permafrost maps depend on the modeled ground thermal structures.However, field studies on the TP are quite limited, and we have only short-duration (1996)(1997)(1998)(1999)(2000) ground temperature profiles obtained from the GEWEX Asian Monsoon Experiment (GAME)-Tibet (Yang et al., 2003)  polated onto the station locations) in Fig. 4 and Table 4.We also give a short description of the sites vegetation and soil texture information, both from observation and models.
We also analyze monthly air and ground temperatures in a selected area in the western TP (33-36 • N, 82.5-85.5 • E, Fig. 1) to examine across-model differences (Fig. 5).The air temperature is also different among the models, especially in the winter season, though the differences are much smaller than soil temperatures differences.As this region is the coldest part of the TP (according to the annual mean air temperature), permafrost is widely distributed and the active layer thickness is less than 3 m.However, TSL-method-derived permafrost areas vary significantly among the models in this area (Fig. 1).Despite the lack of any ground temperature observations in this area, the definite presence of permafrost makes it useful to look at the simulated ground thermal structures as well as their differences as a way of interpreting the modeled permafrost areas.

Uncertainties in air-temperature-derived permafrost area
Air-temperature-derived permafrost maps are investigated with the two indirect methods, F and MAAT.Figures 1 and 2 compare both Wang06 and model-derived permafrost maps, and show that F produces consistently excessive permafrost area compared with MAAT.That is because the empirical threshold of −2 • C for MAAT fits well with TP observations (Xu et al., 2001), while F ≥ 0.5 is a theoretical assumption, which has been reported to overestimate permafrost area (Nelson and Outcalt, 1987;Slater and Lawrence, 2013).Accordingly, Fig. 3 shows that the F-derived permafrost is less consistent with the Wang06 map (model average K = 0.3 for  the common region) than the MAAT-derived permafrost area (K = 0.5).Across-model variability (Table 3) for the MAAT-based method is 14 × 10 4 km 2 and for the F-based method is 17 × 10 4 km 2 , equivalent to about 14-17 % of the Wang06 permafrost area inside the common modeling region (101 × 10 4 km 2 ).This variability is much smaller than the 56 % calculated by Slater and Lawrence (2013) for the CMIP5 models with SFI * over NH permafrost area.The relatively smaller difference among the models here is because, although the temperature forcing was not identical among models, the mean annual air temperature and its spatial variability in the permafrost region are quite similar (between −6 and −8 • C).Since the across-model variations in permafrost extent using the air-temperature-based indirect methods are relatively small, the variations in the direct method derived extents can primarily be attributed to the LSMs structural and parametric differences.

Uncertainties in model-derived permafrost area
There is a large across-model variability of permafrost area derived from direct methods (TSL, MAGT and SFI) (Figs. 1 and 2; 111-120 × 10 4 km 2 ; Table 3), and it is similar for all three diagnostic methods.This across-model variability is much larger than the variability using the indirect methods discussed in Sect.4.1, and is equivalent to 110-112 % of Wang06 permafrost area for the common modeling region.CMIP5 across-model variability derived from TSL in NH permafrost area was similarly large (Slater and Lawrence, 2013;Koven et al., 2013).Clearly this points to large acrossmodel differences in ground thermal structures.
The across-method (TSL, MAGT and SFI) variability in permafrost area (Figs. 1 and 2; Table 3) is very variable between models: UVic and LPJ-GUESS have smallest ranges (up to 9 × 10 4 km 2 ), while CoLM has the largest (87 × 10 4 km 2 ) (Table 3), near to the total permafrost area of the common region.Thus the across-direct method range is similar to the across-model range.Slater and Lawrence (2013) also emphasized the variable acrossmethod variability for NH permafrost area between models.However, Saito et al. (2013) showed insignificant variability across both direct and indirect methods for derived preindustrial NH continuous permafrost area.

Model evaluation based on K and ground temperature profile
A good land-surface model should adequately simulate the seasonal and annual ground temperature profiles.Hence one quality test for a model is that it should be able to produce "good" permafrost maps, which we define as agreement with the observation-based map, based on all three direct diagnostic methods.The applied criterion is the Kappa coefficient K (Sect.3.3), and we limit the discussion to the K associated with TSL, MAGT and SFI, which are calculated with simulated soil temperatures.If we take the (arbitrary) threshold K ≥ 0.4 (indicating "moderate agreement"), then no model passes this test for the common simulation region, while reducing the threshold to K ≥ 0.2 ("fair agreement") allows most models and methods to pass, while UVic stands out as a clear failure (Fig. 3).
If the criterion for an acceptable model is ground temperature bias ≤ ±2.0 • C, then simulations of mean annual ground temperatures from most models (CLM4.5,CoLM, ISBA and JULES) agree with the observations, but the simulation of seasonal cycle amplitude of only one model (ISBA) is consistent with the limited observations.However, if the criterion is biased ≤ ±1.0 • C, then no model agrees with observations for neither mean annual ground temperature nor the seasonal cycle amplitude (Fig. 4, Table 4).
We now look at the performance of the two models with larger biases in mean annual ground temperature: LPJ-GUESS and UVic.LPJ-GUESS simulated too cold (by more than 3 • C) mean annual ground temperatures for both the surface and deeper layers (Fig. 4, Table 4).The summer temperatures simulated by the model in the surface layers are especially cold, with maximum temperatures lower than observation by 8 • C (Fig. 4a and c) and its ground temperature amplitude is substantially underestimated (Table 4), which must greatly limit the summer thaw depth.This cold soil results in substantial overestimation of permafrost area (119-131 × 10 4 km 2 ; Table 3, Fig. 2) with small across-method variability.
UVic simulates a soil thermal state that is the warmest among the models, with the simulated mean annual ground temperature at D66 surpassing observation by more than 7 • C (Fig. 4, Table 4).If the observational sites are representative then the generally too warm ground temperature in UVic is the reason for the extremely small simulated permafrost area (8 × 10 4 km 2 ; Table 3, Fig. 2) with all direct methods, and hence to no across-method variability, and poor agreement with the Wang06 permafrost map (K < 0.1; Fig. 3).

Method comparison based on K and ground temperature profile
Permafrost maps derived using MAGT and SFI often show larger area than TSL (Fig. 2), with generally better agreement with the Wang06 map (Fig. 3).The MAGT method simply defines a grid as permafrost as long as its 3 m mean annual ground temperature is colder than 0 • C, and a permafrost threshold value of SFI ≥ 0.5 also only requires the mean annual ground surface temperature is lower than 0 • C (Nan et al., 2012).Figures 4 and 5 show most models meet these criteria.However, assuming that the site observations are representative, the simulated mean annual ground temperatures of both surface and deeper soil layers often have obvious biases (≥ ±1 • C) in all the models (Fig. 4 and Table 4).
www.the-cryosphere.net/10/287/2016/The Cryosphere, 10, 287-306, 2016 In general, model-derived permafrost distribution using the TSL method shows little agreement with the Wang06 map (Figs.1-3).In contrast with MAGT and SFI methods, the TSL method requires adequate simulation of both mean annual ground temperature and the seasonal cycle at monthly resolution (Fig. 4, Table 4).This means that the TSL method is more susceptible to model errors, but it offers a more comprehensive insight into land model processes.CoLM is an extreme example of how a simulated permafrost map can be totally incorrect due to small errors in seasonal ground temperature.CoLM simulates nearly no TSL-derived permafrost (Figs. 1 and 2), accounting for much of the large across-model and across-method variability (Table 3).We investigate both the air and ground temperatures (Fig. 5) of the selected region (the region shown in Fig. 1), which is the coldest part of the TP and should be permafrost.CoLM simulates no permafrost in the selected region despite CoLM having lower mean annual ground temperatures for the 3 m layer than many other models (ISBA, CLM4.5 and JULES) (Fig. 5).However, CoLM simulates a larger seasonal amplitude than CLM4.5 and ISBA (Fig. 5), so that, in the western TP, the monthly maximum 3 m ground temperatures in CoLM always surpasses 0 • C by around 0.2 • C (Fig. 5c) precluding it being classified as permafrost with the TSL method.

Main processes causing ground temperature discrepancies
As discussed in Sect.4, the most noticeable ground temperature discrepancies among the six models are the underestimation of soil temperature by LPJ-GUESS and the overestimation of soil temperature by UVic, which lead to the largest biases in simulated permafrost area.There are many other, rather subtle, potential model discrepancies that we do not investigate in detail here.One example is the overestimation of the amplitude of the seasonal temperature cycle at deep depths in several models (Fig. 4b and d; Table 4).Table 4 also shows that the observed vegetation and soil texture are mismatched by all the models at each of the stations.Although it is a common problem to compare grid cell results against site data, model description of vegetation and soil texture is too simplified.
To help elucidate the causes of ground temperature discrepancies associated with soil processes we also inspect snow depth and vertical ground temperature gradients.We use the "Long Time-Series Snow Data Set of China" (Che et al., 2008) (http://westdc.westgis.ac.cn) to examine the modeled snow depth.The complete data set is composed of SMMR (1978SMMR ( -1987)), SSM/I (1987SSM/I ( -2008) ) and AMSR-E (2002AMSR-E ( -2010)).According to Wang et al. (2013), the snow depth pattern and the significant seasonal snow characteristics of the satellite data are consistent with those of station data in most of our common TP region.The satellite data are different from station data on the southeast of the TP (Wang et al., 2013); however, our analyzed common region does not include this part of the TP.Thus this satellite data are reliable in this study.Here we use the data of SMMR and SSM/I to produce the winter (DJF) climatological distribution of 1980-2000 (Fig. 6).Furthermore, we follow Koven et al. (2013) and calculated two vertical gradients to isolate processes: from the atmosphere to ground surface (Fig. 7) and from ground surface to deeper soil (at 1 m depth) (Fig. 8).While the first one is mainly controlled by the snow insulation, the latter is mainly determined by soil hydrology, latent heat and thermal properties.Important factors that influence the ground thermal structure are compared in Table 5.Since several models produce incomplete or not directly comparable output, we restrict ourselves to a qualitative assessment here.
The LPJ-GUESS simulated underestimation of soil temperature is not caused by a bias in the surface air temperature forcing (Fig. 5, Table 4).Instead, this bias may be due to many factors such as inappropriate prescriptions of soil thermal properties, poor representation of soil hydrology, mismatch of vegetation types, and weak coupling of soil water and vegetation cover.Figure 8 shows that the soil tem- 1 Low snow cover is confined to high elevations; medium tends to be on the western TP. 2 LPJ-GUESS has constant albedo everywhere and UVic albedo varies slightly due to vegetation; year-round albedo variability for other models depends mainly on snow cover in winter and soil moisture, vegetation, etc. in summer. 3Soil water content includes both liquid and ice fractions. 4All models calculate soil thermal properties depending on soil moisture and also phase change of water, but CoLM and LPJ-GUESS ignore solute dependent freezing processes. 5Dynamic or static snow layering; ML: multi-layer, BL: bulk-layer, I: implicit; according to Slater et al. (2001).peratures increase with depth, but LPJ-GUESS has a much smaller temperature gradient between the surface and the 1 m-deep soil (0-2 K) than the other models.This suggests a different (larger) winter soil thermal conductivity probably associated with a high soil porosity and water content.LPJ-GUESS specifies the same soil texture for the TP as for the Arctic, which is mostly clay-like (Table 4).Clay has high water retention capacity.Many studies have reported that the soil on the TP is immature, with coarser particles than typical for Arctic permafrost and with much less organic matter.Inappropriate soil texture classification will affect the simulated ground thermal structure.LPJ-GUESS underestimates the surface and top soil temperatures particularly in summer (Figs.4a, c and 5).Precipitation and hydrological processes determine the vertical profile of soil water content which can change the fraction of water and ice retained in different soil layers and influence soil thermal conduction.The energy required to melt the high water (ice) content in the surface soil layers in summer appears to lead to underestimated low summer temperatures compared with other models, and a phase lag in summer warming (Fig. 4a and c).
In addition, LPJ-GUESS shows a similarly thick snow depth in the western part of Tibetan Plateau as CLM4.5 and CoLM (Fig. 6), but does not show as large surface temperature offset as those two models (Fig. 7).That is because LPJ-GUESS has a fixed snow density (362 kg m −3 ) which is higher than used in other models, and uses a relatively simple Bulk-layer snow scheme, with one static snow layer, unlike the dynamic multi-layer snow schemes of CLM4.5 and CoLM (Table 5).
UVic uses the same climate forcing as CLM4.5 (Table 1), but simulates much warmer ground temperatures than other models.In contrast with the other models, UVic has no snow cover in winter (Fig. 6), which is consistent with grid cell surface albedo staying at values between 0.15 and 0.35 yearround.The simulated snow depth is derived from the pre- scribed winter precipitation, and the model's snow, energy and water balances.The lack of snow over the TP in UVic likely indicates removal by sublimation.An overly low snow albedo makes the snow gain energy that is lost through sublimation.Since it takes more energy to sublimate snow than it does to melt it, the latent heat flux should be, and is higher in UVic than in other models (not shown).However, despite the apparent snow sublimation-which should cool the soil, the ground surface temperatures in UVic are warmer than in all the models.The large absorption of short wave radiation allowed by the year-round low albedo provides this heat and is sufficient for very little permafrost simulated by UVic over the TP.
ISBA and especially JULES stand out from other models in their calculated winter temperature offsets: ground surface temperatures are colder than the driving air temperatures over much of the simulated region (Fig. 7).Snow (Fig. 6) and vegetation cover would normally be expected to provide insulation, making soil warmer than air temperatures in winter.However, we observe that the snow depths from ISBA and JULES are not very thick (< 10 cm) in most places on the TP (Fig. 6). Figure 9 shows the temperature offset between ground surface and air temperatures as a function of snow depth.By inspection we note that there is different behavior for snow depths thinner and thicker than 4 cm.For snow depth > 4 cm, most negative offsets disappear in ISBA and JULES, which means that the ground surface temperature is warmer than air temperature for snow depth larger than 4 cm.For snow depth < 4 cm, the ground surface temperature of much of the region is colder than air temperature in ISBA and JULES, which indicates the cooling effect of thin snow.The very small or slightly negative temperature offset for thin snow is also seen in the other models.Of course, the strength of this effect depends on the individual model's simulation/parameterization of the snow processes (such as sublimation, evaporation, melting).The thin snow mechanism is also confirmed by the weak insulation effect in Fig. 10.

Choice of thresholds in the methodologies
In Sect. 4 we used the most commonly applied threshold of each method, based on the empirical findings from previous studies, to compare models and methods.However, the thresholds themselves have the potential to affect the results.To reduce the potential uncertainties in terms of the methodologies, we also examine the sensitivity of permafrost area to different thresholds (Table 2), calculating changes in the permafrost area (Table 3) for a range of thresholds for each method (i.e.MAAT ≤ −3/−2/−1/0 • C; F and SFI ≥ 0.4/0.5/0.6 • C; MAGT ≤ 0/0.5 • C).
Generally, when the permafrost definition requires colder climate, the derived permafrost area becomes smaller.The across-threshold uncertainty (Table 3) is similar for different models.But the across-threshold uncertainty with SFI varies greatly among models, 23-105 × 10 4 km 2 , which is due to the seasonal amplitude of ground surface temperatures it requires.This is illustrated in Fig. 5 where UVic and LPJ-GUESS have a relatively small seasonal amplitude of ground surface temperature, which corresponds to their small acrossthreshold variability for SFI-derived area in Table 3.
The across-model uncertainty is highly consistent even with different thresholds for each method (Table 3, last column).Thus it seems changing the thresholds does not affect one key point in our paper: across-model uncertainties using direct methods are much larger than using indirect ones.Large across-model uncertainties using direct methods imply that differences among these land surface processes are worthy of investigation.

Model settings
The lowest soil boundary is a critical uncertainty affecting the simulation of permafrost (e.g.Nicolsky et al., 2007).The common boundary of 3 m soil depth may produce uncertainties in the derived permafrost area.Three (CLM4.5,ISBA, UVic) of the six models extended the soil to deeper depths (Table 1); this provides insight on this issue.As UVic does not do a reasonable simulation of snow cover and ground temperature, we feel it is not necessary to include this model in the discussion here.Based on results from CLM4.5 and ISBA, the permafrost area calculated from MAGT at 3 m and at 10 m only changes by 1 × 10 4 km 2 .For results from CLM4.5, the areas calculated from MAGT at 20 and 30 m do not change from the one calculated at 10 m.This is due to MAGT only considering annual mean soil temperature, not the seasonal cycle.This is consistent with the finding that the across-threshold uncertainty for MAGT-derived permafrost area is quite small (Table 3).However, the derived permafrost area with the TSL method improves when soil depth used for calculation is increased from 3 to 5 m (Table 6).This sensitivity is because TSL requires information on the seasonal cycle of soil temperature.In other words, results of the TSL method are sensitive to the active layer dynamics.The permafrost on the TP is usually much warmer and has a deeper active layer than found in continuous permafrost of the arctic and boreal region.Hence deeper soil layers would be well suited for the TP permafrost simulation.A shallow column in a permafrost model can cause problems in the simulation of the degradation of warm permafrost (near 0 • C), which is expected for projections of future climate warming (Lawrence et al., 2008).In addition, Alexeev et al. (2007) pointed out that deep soil configuration can improve the simulation of seasonal and even annual cycle of shallow layers.Nicolsky et al. ( 2007) recommend a soil column of at least 80 m for models applied to permafrost regions.Soil layer discretization and spatial resolutions are different among the six models (Table 1).In this study we linearly interpolated and extrapolated the soil temperatures onto the standard layers (Sect.3.1).The impact of ground surface temperature extrapolation was found to be small by comparing Figs.7 and 8 with those made using temperatures at 5 cm depth (not shown), with both geographical patterns and widespread negative surface temperature offsets in ISBA and JULES.We re-gridded the Wang06 map onto each model's spatial resolution to evaluate the models objectively.This leads to an error bar estimate of half a grid cell area, up to 20 × 10 4 km 2 , which is half of the spread of observational area estimates (Sect.3.2).Daily and hourly temperature data may make some differences to the permafrost extent map, but the diurnal cycle wave decays at shallower soil depths than the deepest model layer.

Summary and conclusions
Results of this model intercomparison quantify, for the first time, the uncertainties of model-derived permafrost area on the Tibetan Plateau (TP).The uncertainties stem from across-model and across-diagnostic method variability as well as historic climate data uncertainties.According to the agreement of the air-temperature-based diagnostic methods (MAAT and F), we found lower uncertainty in permafrost area associated with air temperature forcing (99 to 135 × 10 4 km 2 ) in comparison with the uncertainty (1 to 128 × 10 4 km 2 ) associated with the simulation of soil temperature used in the other three diagnostic methods (TSL, MAGT, and SFI).The observation-based Wang06 permafrost area is 101 × 10 4 km 2 .
Most models in this study produced permafrost maps in better agreement with the Wang06 map using the MAGT and SFI methods rather than with the TSL method.But this does not mean that the models simulate permafrost dynamics correctly.Although most models can capture the threshold value of MAGT and SFI, their ground temperatures still show various biases, both in the mean annual value and the seasonal variation.Therefore, most models produce worse permafrost maps with the TSL method.The TSL method is a more demanding, and to date, elusive target.
Modeled snow depth and surface and soil temperature offsets vary widely amongst the models.If the observation sites for soil temperature are representative, then LPJ-GUESS and UVic have substantial biases in their soil temperature simulations, mainly attributable to inappropriate description of the surface (vegetation, snow cover) and soil properties (soil texture, hydrology).Other models (ISBA, JULES) show biases in the simulation of winter soil temperature.
Further evaluation of model results from the permafrost-PCN is underway for the TP that examines permafrost temperature, active layer thickness and carbon balance under present and future climate forcing.We also plan to complement this model intercomparison study by an uncertainty quantification analysis of key model parameters (e.g.improved vegetation and snow albedo, soil colors, etc.) with the CoLM model.However, a crucial requirement for this is much better data availability allowing for better spatial coverage across the TP in the evaluation of simulated ground temperature profiles.Under the Chinese Scientific Foundation Project "Permafrost Background Investigation on the Tibetan Plateau" (no.2010CB951402), a series of new stations have been established, especially in the depopulated zone.More ground truth data will be published in the near future, which will also be assimilated in a new observation-based permafrost map.

Figure 1 .
Figure 1.Permafrost maps derived from different diagnostic methods and models compared with the Wang06 map.Permafrost inside the common modeling region is used for all-models inter-comparison, while permafrost outside allows further evaluation over the whole TP for CLM4.5, CoLM, JULES and UVic.The observation-based map of permafrost (Wang et al., 2006) is re-gridded to match model resolution.The selected area in the western TP (33-36 • N, 82.5-85.5 • E) is used to examine across-model differences in Fig. 5. Insets show location map of TP and how the common region is related to the TP.

Figure 2 .
Figure 2. Permafrost areas derived from different diagnostic methods compared with the Wang06 map.(a) Permafrost area, with TP permafrost outside the common region denoted by grey extensions to the bars for CLM4.5, CoLM, JULES and UVic.(b) Bias in permafrost area "Model minus Wang06 estimate", only for the common modeling region.The error bar is calculated as half of the averaged grid cell area of the model, so is model resolution dependent.

Figure 3 .
Figure3.Kappa coefficient, K, quantifying the agreement between model-derived and Wang06 maps (see Sect. 3.3).K ≥ 0.2 indicates at least fair agreement with the Wang06 map.The lower triangle is K for the whole TP and is only available for CLM4.5, CoLM, JULES and UVic, while the upper triangle is K for the common modeling region.

Figure 4 .
Figure 4. Monthly soil temperature variations at three stations from models and observations.(a, c) Soil temperature of top layer.(b, d) Soil temperature of deeper layer, 1996-2000."Mean" denotes annual average temperature.We use the topmost available soil temperatures (at 0.045 m for D66 and D110, no good data for D105) and lowest available ones (at 2.65 m for D66, at 3 m for D105), while D110 has only temperatures at 2 m depth.

Figure 6 .
Figure 6.Winter snow depth for the common region, averaged over 1980-2000.Note the nonlinear color scale.We use the Long Time-Series Snow Data Set of China (Che et al., 2008; http://westdc.westgis.ac.cn) as the observed snow depth.The observed snow depth plot is further interpolated onto the models' resolutions as "OBS_".The OBS_05 is in 0.5 • resolution for CoLM, ISBA, JULES and LPJ-GUESS.The OBS_CLM4.5 and OBS_UVic are in the resolutions of CLM4.5 and UVic separately.

Figure 7 .
Figure 7. Mean surface temperature offset: difference in mean winter temperatures between surface soil and air, averaged over 1980-2000.Warm colors indicate soil is warmer than air temperature.

Figure 8 .
Figure 8. Mean soil temperature offset: difference in mean winter temperatures between soil at 1 m depth and surface soil, averaged over 1980-2000.Warm colors indicate deep soil is warmer than shallow soil.

Figure 9 .
Figure 9. Mean surface temperature offset (difference in mean winter temperatures between surface soil and air, averaged over 1980-2000).Left column is for snow depth > 4 cm, right column shows regions with snow depth < 4 cm.Warm colors indicate soil is warmer than air temperature.

Figure 10 .
Figure 10.Mean surface temperature offset (difference in mean winter temperatures between surface soil and air, averaged over 1980-2000) as a function of snow depth for grid points where average snow depth < 4 cm.

Table 1 .
The six land-surface models, analyzed over the Tibetan plateau (TP).

Table 2 .
The five diagnostic methods and threshold values used to derive permafrost.The thresholds commonly used in the literature and in this paper are marked in bold.

Table 3 .
Derived permafrost area inside the common modeling region on Tibetan plateau (10 4 km 2 ) from six LSMs and five diagnostic methods, using different thresholds.The results of thresholds commonly used in the literature and in this paper are marked in bold.

Table 4 .
Model-observed temperature differences in mean annual and seasonal cycle amplitude of air and soil temperature, based on data from 1996 to 2000 (Sect.3.4;Fig.4),and the corresponding vegetation and soil properties of both observation and models.Air temperature data are only available for D66 station and limited from September 1997 to August 1998.Thus the statistics of ground temperature of D66 is also confined to this period.

Table 5 .
Description of model characteristics relevant to soil temperatures on the TP.

Table 6 .
Derived permafrost area (10 4 km 2 ) with deeper soil layers using the TSL method.The results for thresholds commonly used in the literature and in this paper are marked in bold.