Systematic bias of Tibetan Plateau snow cover in subseasonal-to-seasonal models

Accurate subseasonal-to-seasonal (S2S) atmospheric forecasts and hydrological forecasts have considerable socioeconomic value. This study conducts a multimodel comparison of the Tibetan Plateau snow cover (TPSC) prediction skill using three models (ECMWF, NCEP and CMA) selected from the S2S project database to understand their performance in capturing TPSC variability during wintertime. S2S models can skillfully forecast TPSC within a lead time of 2 weeks but show limited skill beyond 3 weeks. Compared with the observational snow cover analysis, all three models tend to overestimate the area of TPSC. Another remarkable issue regarding the TPSC forecast is the increasing TPSC with forecast lead time, which further increases the systematic positive biases of TPSC in the S2S models at longer forecast lead times. All three S2S models consistently exaggerate the precipitation over the Tibetan Plateau. The exaggeration of precipitation is prominent and always exists throughout the model integration. Systematic bias of TPSC therefore occurs and accumulates with the model integration time. Such systematic biases of TPSC influence the forecasted surface air temperature in the S2S models. The surface air temperature over the Tibetan Plateau becomes colder with increasing forecast lead time in the S2S models. Numerical experiments further confirm the causality.

Abstract. Accurate subseasonal-to-seasonal (S2S) atmospheric forecasts and hydrological forecasts have considerable socioeconomic value. This study conducts a multimodel comparison of the Tibetan Plateau snow cover (TPSC) prediction skill using three models (ECMWF, NCEP and CMA) selected from the S2S project database to understand their performance in capturing TPSC variability during wintertime. S2S models can skillfully forecast TPSC within a lead time of 2 weeks but show limited skill beyond 3 weeks. Compared with the observational snow cover analysis, all three models tend to overestimate the area of TPSC. Another remarkable issue regarding the TPSC forecast is the increasing TPSC with forecast lead time, which further increases the systematic positive biases of TPSC in the S2S models at longer forecast lead times. All three S2S models consistently exaggerate the precipitation over the Tibetan Plateau. The exaggeration of precipitation is prominent and always exists throughout the model integration. Systematic bias of TPSC therefore occurs and accumulates with the model integration time. Such systematic biases of TPSC influence the forecasted surface air temperature in the S2S models. The surface air temperature over the Tibetan Plateau becomes colder with increasing forecast lead time in the S2S models. Numerical experiments further confirm the causality.

Introduction
Anomalous weather-and climate-related natural disasters are among the most common disasters and are associated with severe socioeconomic consequences. Reliable forecasts of such weather and climate anomalies with sufficient lead time have significant benefits for decision-makers (White et al., 2017). Traditionally, weather forecasts cover a time range of up to 2 weeks, while climate forecasts begin at the seasonal timescale and extend outward. Demands are growing rapidly in operational forecasts in the subseasonal-to-seasonal (S2S) range (from 2 weeks to a season). The primary basis for longer lead forecasts beyond 2 weeks is the interaction of the atmosphere with other, more slowly varying earth system components, such as the ocean or land, that evolve over timescales of weeks and months rather than days as in the atmosphere (Mariotti et al., 2018). Land-atmosphere coupling is one of the key physical processes for S2S prediction but is not well simulated and may reduce S2S prediction skill (Robertson et al., 2014;Dirmeyer et al., 2019).
Snow cover is a crucial component in both the climate system and the cryosphere. The radiative and thermal properties of snow cover significantly influence the ground thermal regime (Zhang, 2005). As the lower boundary condition of the atmosphere, snow cover forces the regional and global atmosphere and can serve as an indicator of the atmosphere (Barnett et al., 1989;Bamzai and Shukla, 1999;Wu and Kirtman, 2007;Henderson et al., 2018). Snow cover can vary rapidly within a season over discontinuous or sporadic permafrost zones (Wang et al., 2015;Suriano and Leathers, 2018;Li et al., 2020a) and rapidly influence the atmosphere (Clark and Serreze, 2000;. Snow cover may provide a potential source of S2S predictability via its variability and atmospheric effects at the subseasonal timescale (F. Diro and Lin, 2020).
The Tibetan Plateau is the highest plateau in the world and is known as the "third pole". Due to its high elevation and cold climate, the Tibetan Plateau has much more snow cover than the other regions at the same latitude. Tibetan Plateau snow cover (TPSC) is a key component of the climate system. TPSC influences land surface thermal conditions Li et al., 2018) and thus influences atmospheric circulations and monsoons over Asia and beyond (Wu and Qian, 2003;Lin and Wu, 2011;Xiao and Duan, 2016;Wang et al., 2017;You et al., 2020). TPSC shows variations at multiple timescales, including the subseasonal scale Li et al., 2020a). The subseasonal variations in TPSC influence the atmosphere over East Asia (Li et al., 2018;Li et al., 2020b). A better TPSC simulation and forecast may favor a better forecast for weather and climate at the S2S timescale.
Snow cover also affects the hydrologic cycle. The accumulation of precipitation in the form of snow and its release through snowmelt runoff is an important component of the hydrologic cycle (Jeelani et al., 2012;Fayad et al., 2017). TPSC plays an important role in hydrological systems, providing a reservoir of water and acting as a buffer that controls river discharge. Rivers including the Yangtze River, Yellow River, Yarlung Zangbo River and Mekong River have headwaters over the Tibetan Plateau. Studies on the variability in TPSC are critical for water management in downstream regions (Immerzeel et al., 2009;Zhang et al., 2012Zhang et al., , 2013. Skillful predictions of TPSC with sufficient lead time are thus of great societal importance for hydrologic prediction. Since the implementation of the S2S prediction project database , many studies have evaluated the skill of S2S models for atmospheric elements and variables, such as the Madden-Julian Oscillation (Vitart, 2017), surface air temperature (Yang et al., 2018;Wulff and Domeisen, 2019) and precipitation (de Andrade et al., 2019). Some works also focus on the skill of S2S models for hydrological elements (W. Schmitt Quedi and Mainardi Fan, 2020). However, we still know little about the skill of S2S models for TPSC. Understanding the forecasting skills of the S2S model on the TPSC is the first step to applying the S2S model to hydrological forecasts over the Tibetan Plateau. Moreover, considering the influence of TPSC on the atmosphere, clarifying the issue of the S2S model for TPSC helps improve the ability of the S2S model for atmospheric forecasting.
This study conducts a multimodel comparison of the TPSC prediction skill using selected models from the S2S project database to learn about their performance in capturing TPSC variability. Our main goal is to use the state-ofthe-art S2S prediction systems of these operational centers to demonstrate why models exhibit systematic biases of TPSC and whether such systematic biases influence the regional air temperature forecasted in S2S models. The remainder of this paper is organized as follows. Details on the dataset and method used in this study are described in Sect. 2. The systematic bias of TPSC in S2S models and its effect on local temperature during wintertime are presented in Sects. 3 and 4, respectively. The conclusions and a discussion are presented in Sect. 5.
2 Data and method 2.1 S2S forecast models The reforecasts considered for this study are taken from three operational forecast systems that are part of the S2S project database: the European Centre for Medium Range Weather Forecasts (ECMWF), the US National Centers for Environmental Prediction (NCEP) and the China Meteorological Administration (CMA). These models share a common reforecast period of 1999-2010 with a reforecast initialized frequency that is equal to or greater than once a week. This study only used reforecasts produced by the control forecast (using a single unperturbed initial condition). Details of the S2S database can be found in Vitart et al. (2016). Daily reforecast data were averaged for each 7 d period starting every 1 January to create a total of 52 weeks per year (31 December was excluded). The reforecasts that initialized on the first day of these weeks were selected. Forecast lead times were defined here as 1 week (1-7 d), 2 weeks (8-14 d), 3 weeks (15-21 d), 4 weeks (22-28 d) and 5 weeks (29-35 d).
For the ECMWF model, the reforecasts initialization is based on ERA-Interim and ERA-Interim/Land datasets. The daily Interactive Multisensor Snow and Ice Mapping System (IMS) snow cover product has been used to constrain the ERA-Interim snow analysis (Dee et al., 2011). The NCEP model also initialized realistic snow in the forecasts. The snow initialization comes from the Climate Forecast System Reanalysis snow analysis using IMS and the Air Force Weather Agency snow depth analysis. Snow in the CMA model was not directly initialized in the forecasts. The initial conditions of the snow in the CMA model are from a balanced state produced by long-term air-sea initialization integration. See the details on snow initialization in the S2S models at https://confluence.ecmwf.int/display/S2S/Models (last access: October 2020).
The land surface models used for ECMWF, NCEP and CMA are the Hydrology Tiled ECMWF Scheme for Surface Exchanges over Land (HTESSEL; Balsamo et al., 2009), Noah (Ek et al., 2003) and BCC_AVIM2 (Wu et al., 2014), respectively. All these land surface models contain snow schemes. According to the snow scheme in each land surface model, we obtain the snow cover fraction, which is a diagnostic variable in this study.
The snow cover fraction (f snow ) in the ECMWF model is parameterized as follows: where min indicates the minimum function, S is the snow water equivalent (unit is kg m −2 ) and ρ is the snow density (unit is kg m −3 ) (Dutra et al., 2010). The f snow in the NCEP model is parameterized as follows: where e is the natural logarithm, and SNUP is the vegetation parameter, which indicates the threshold snow depth (in water equivalent meters) that implies 100 % snow cover (Koren et al., 1999;Ek et al., 2003). The SNUP ranges from 0.01 to 0.08 for different vegetation types. Details on the Noah code and vegetation parameters can be accessed at https://ral.ucar.edu/solutions/products/unified-noah-lsm (last access: October 2020). The f snow in the CMA model is parameterized as follows: where d is the snow depth (unit is cm), which is calculated from the snow water equivalent and snow density (Wu and Wu, 2004). The surface air temperature (SAT) in these S2S models is also used. All variables are at a 1 • × 1 • horizontal spatial resolution.

Validation data and method
The Tibetan Plateau area of focus in this study is the region within 26-41 • N and 70-105 • E at an altitude of greater than 3000 m (Fig. 1). Although the Tibetan Plateau is located over middle latitudes, the area is cold due to high altitude, especially in boreal winter. This study focuses on TPSC during wintertime. Here, each winter contains 17 weeks, covering from the 45th week (5)(6)(7)(8)(9)(10)(11) in one year to the 9th week (26 February-4 March) in the following year. This study spans 11 winters (from 1999/2000 to 2009/2010).
The reforecasts in the S2S models are verified against observational daily snow cover and SAT in the reanalysis. Observational daily snow cover data are obtained at a 24 km resolution from the Interactive Multisensor Snow and Ice Mapping System (IMS) snow cover analysis (Helfrich et al., 2007) provided by the National Oceanic and Atmospheric Administration. The IMS examines satellite images and other sources of data on snow cover and generates maps of snow cover distribution. The IMS analysis over the Tibetan Plateau corresponds well with ground-based measurements and can capture the general subseasonal variability in TPSC (Yang et al., 2015;Li et al., 2018). The original 24 km resolution IMS analysis is interpolated into the 1 • × 1 • grid of the S2S models. IMS provides binary snow cover information: it has the value of 1 if more than 50 % of the 24 km pixel is covered by snow; otherwise, it is 0 (snow free). Orsolini et al. (2019) aggregated the original IMS product to a lower resolution rectilinear grid. They counted the number of pixels with a value of 1 in a grid box; assuming that they have 100 % cover gave the high estimate, and assuming that they represent 50 % cover gave the low estimate. These two estimates provide a range of values, which reflects the uncertainty inherent to aggregating the 24 km binary data; e.g., a value of 1 in a pixel means a 50 % to 100 % snow coverage. Here, we used a method similar to Orsolini et al. (2019) to interpolate the original IMS product into the 1 • × 1 • grid of S2S products, but we further averaged these two estimates. Daily SATs at a 1 • × 1 • resolution are obtained from the ERA-Interim reanalysis (Dee et al., 2011). These data range from 1 January 1999 to 30 December 2010. S2S reforecasts are compared with the observations and reanalysis for the same calendar date.
Two precipitation datasets, the Global Precipitation Analysis Products of the Global Precipitation Climatology Centre (GPCC; Schneider et al., 2011) and the Tropical Rainfall Measuring Mission (TRMM; Huffman et al., 2007), are used to evaluate the wintertime mean precipitation. The GPCC precipitation dataset is from built rain gauges that were GTS based. The TRMM precipitation dataset is based on satellite observations. The precipitation used in this study spans 11 winters (from 1999/2000 to 2009/2010).
To quantify the forecast ability of S2S models, three common statistical measures, i.e., the temporal correlation coefficient (TCC), the root-mean-square error (RMSE) and the mean bias, are calculated in this study. A composite analy-sis is performed to investigate the different performances on predicting the snow cover for increasing cases and decreasing cases (details are described in Sect. 3.2).

Numerical model and experimental design
To reveal the causality of the systematic bias of the TPSCinduced regional SAT bias, numerical experiments are performed. Numerical experiments are performed using the Advanced Weather Research and Forecasting Model (WRF-ARW, version 4.1.3), which was developed by the National Center for Atmospheric Research (NCAR). WRF-ARW has been applied to climate research, including studies of landatmosphere interactions. The land surface parameterization scheme used in this study is the Noah land surface model (Ek et al., 2003). Important physics options include the WRF single-moment 6-class microphysics scheme (Hong and Lim, 2006), the NCAR Community Atmosphere Model (CAM 3.0) spectral-band shortwave and longwave radiation schemes (Collins et al., 2006), the Yonsei University planetary boundary layer scheme  and the Kain-Fritsch convective parameterization scheme (Kain, 2004). The WRF is driven by atmospheric and surface forcing data extracted from the National Centers for Environmental Prediction (NCEP) FNL (Final) Operational Model Global Tropospheric Analyses. The simulation domain is in a cylindrical equidistant projection with a horizontal resolution of 1 • × 1 • and located within 5-65 • N and 40-170 • E (as shown in Fig. 1) and without nesting. There are 41 levels in the vertical direction.
Two ensemble experiments are performed: control (CTL) runs and sensitive experimental (EXP) runs. All these runs have the same initial times as the forecasts in the S2S models that we used in this study for each winter. But the experiments were run for 20 winters (from 2000/2001 to 2019/2020), and both runs contain 340 cases. Each member ran continuously for 22 d. The first day in each run is for spin-up, and the results are discarded. The CTL runs are integrated freely without any modification. Because both the NCEP S2S model and our numerical experiment use Noah as the land surface model, the TPSCs in CTL runs are expected to show unreal increases with integration time, which is similar to that in the NCEP S2S model (will be revealed in Sect. 3). The EXP run is designed to eliminate such bias in TPSC. The FNL analyses are from the Global Data Assimilation System (GDAS), which continuously collects observational data from the GTS and other sources for many analyses. GDAS incorporates daily snow data from IMS analyses and the Air Force Weather Agency Snow Depth Analysis Model. We replace the forecasted TPSC in the WRF model with TPSC in the FNL analyses every 6 h. Because FNL analyses assimilate the observed TPSC, the TPSC in the EXP run is expected to show a small bias that increases with integration time. We averaged all 340 cases in CTL runs and EXP runs respectively. Ensemble mean results between the CTL and EXP runs are compared with each other.
3 Tibetan Plateau snow cover in the S2S forecast models 3.1 Increasing Tibetan Plateau snow cover with forecast lead time Before we present the systematic bias of TPSC in the S2S models, the overall forecast skill of TPSC is evaluated. Here, we focus on the variation in snow-covered area over the entire Tibetan Plateau, which can be measured by a Tibetan Plateau snow cover index (TPSCI). The TPSC index represents the percentage of grid points covered by snow in the analysis or models over the entire Tibetan Plateau. The unit of the TPSC index is percent (%). The prediction skill of the TPSC index has been investigated through the TCC and RMSE between the TPSC index in the predictions and that in the observations during wintertime (Fig. 2). A skillful prediction is generally defined as a TCC greater than 0.5. All three models show good prediction skills at lead times of 1-2 weeks with a TCC greater than 0.5 (Fig. 2a). At lead times of 1-2 weeks, the TCC for the ECMWF model is largest among the three models. The NCEP model has the lowest TCC among the three models at a lead time of 1 week. However, the TCC for NCEP falls the most slowly at lead times of 2 weeks or more. The NCEP model has a larger TCC than the CMA model at lead times of 2 weeks or more. The TCC values decrease with the increase in the forecast lead time and decline below 0.5 at and after lead times of 3 weeks for all three models. RMSEs increase with the forecast lead time (Fig. 2b). The RMSE for ECMWF is the smallest among the three models. Additionally, CMA has the largest RMSEs.
These results indicate that the S2S models can skillfully forecast TPSC variations within a lead time of 2 weeks during wintertime but show limited skill at a lead time of 3 weeks or more.
The above results also indicate that the ECMWF model is shown to have a better TPSC forecasting skill than the other two models. Even so, the ECMWF model shows nonnegligible RMSEs with a TPSC index of more than 15 % (Fig. 2b). The other two models, especially the CMA model, show even more significant RMSEs up to more than 25 %. These large errors in the forecasting of the TPSC are induced by systematic bias of the TPSC, as shown by the following. The multiyear wintertime mean biases of the TPSC index in forecasts against that in the IMS snow cover analysis for all three models show positive values, which indicates that all of the models tend to overestimate the TPSC during winter (Fig. 3a). The TPSC index in the ECMWF is higher than the observed TPSC index by approximately 20 %-30 %. NCEP has a larger TPSC index than that in the observation by ap- Figure 2. Prediction skill of the Tibetan Plateau snow cover (TPSC) index in the S2S models during wintertime. (a) The temporal correlation coefficients (TCCs; y axis) between the observed TPSC index and the predicted TPSC index in the ECMWF (orange line), NCEP (green line) and CMA (blue line) models during winter. The x axis represents the forecast lead time (unit: week). A good prediction skill has a TCC that is greater than 0.5 (marked by black line). Panel (b) is similar to panel (a) but is for the root-mean-square errors (RMSEs; y axis, unit: %).
Another remarkable issue regarding the forecast of TPSC is the increasing TPSC with forecast lead time, which further increases the overestimation of TPSC in models at longer forecast lead times. These increasing biases can be detected from the multiyear winter mean biases (Fig. 3a). To highlight such increasing biases, we further present differences in the multiyear winter biases for the TPSC index between forecasts for leads of 2-5 weeks and forecasts for leads of 1 week in three modes (Fig. 3b). Such differences are obtained by subtracting the multiyear winter mean of the TPSC index at a lead time of 1 week from that at forecast lead times of 2-5 weeks. The differences in the three models show common features: the differences in all three models are all positive and increase with increasing forecast lead time. The positive biases of TPSC with the longest forecast lead time (5 weeks) are largest among all forecasts. The increases in the differences in the ECMWF model are the smallest, while the CMA model has the largest increases in the differences. Taking the differences between the forecasts with a lead of 4 weeks and the forecasts with a lead of 1 week as an example, the spatial patterns of these increases in the biases in the three models show some similarities (Fig. 4). Although the spatial patterns of the differences in the three models show some small discrepancies, the differences are mainly positive in the three models, especially over parts of central and eastern Tibetan Plateau. These indicate that the increasing TPSC with the forecasting lead time occurs at a regional scale.

Snow cover accumulation versus dissipation
The intraseasonal variability in TPSC leads to obvious rapid variations in TPSC with a period shorter than a season, making TPSC exhibit a distinct lack of persistence within one season (Li et al., 2020a). Both accumulation and dissipation of snow cover occur within a season over the Tibetan Plateau. The increase in TPSC with forecast lead time in the models may be induced by overestimation of snow cover accumulation or underestimation of snow cover dissipation. To support this hypothesis, we analyzed the frequency of weekly TPSC accumulation and dissipation in the observation and forecast models in winter (Table 1). Here, the increasing (decreasing) weeks means that the TPSC index is greater (less) than that in the preceding week. The TPSC indexes in the S2S models are compared with the TPSC indexes in the preceding week, which are initialized at the same time, but with different forecast lead times.
The proportions of increasing and decreasing weeks in the observations are 50.3 % and 49.7 %, respectively, which is fairly even (Table 1). However, this kind of balance does not exist in the models. In the models, the proportion of increasing weeks is mostly more than 2 times as large as the proportion of decreasing weeks. The proportion of decreasing weeks is low compared with that in the observations. Specifically, decreasing weeks occupy only 23.0 %-31.0 % of the total forecasts by ECMWF. NCEP shows similar results, except for forecasting at a lead time of 5 weeks. This underestimation of the proportion of decreasing weeks is more severe in CMA. Moreover, the most severe underestimations of the proportion of decreasing weeks are the forecasts with a lead time of 2 or 3 weeks for all models.
The above results indicate that the models underestimate the frequency of TPSC dissipation, whereas they overestimate the frequency of TPSC accumulation, which leads to a systematic TPSC bias. To highlight increases in the overall TPSC biases, as well as changes in biases in successive weeks, a composite analysis is performed for all TPSC reforecasts during winter (Fig. 5a), increasing TPSC cases (Fig. 5b) and decreasing TPSC cases (Fig. 5c). All reforecasts initialized in winter are taken into account for the composite of all cases shown in Fig. 5a. The sample numbers  To focus on the increase in biases, values with a lead time of 1 week are removed for forecasting at all lead times. On a seasonal average, the growth of the TPSC index in winter is only 1.3 % over 2 weeks in the observation (black line in Fig. 5a). However, the models tend to exaggerate the growth of the TPSC index (color lines in Fig. 5a). The growth of the TPSC index over the 2 weeks in the models ranged from 4.9 % (ECMWF) to 9.8 % (CMA). The TPSC index in the forecast shows distinct differences between the increasing TPSC cases and decreasing TPSC cases (Fig. 5b and  c). The growth of the TPSC index in the increasing TPSC cases is 14.1 % over 2 weeks in the observation (black line in Fig. 5b). The growth of the TPSC index over 2 weeks in NCEP and CMA is close to that in the observation, while there is some underestimation of such growth in the ECMWF (color lines in Fig. 5b). Although there are some differences between the TPSC index in the models and that in the observation, all models can forecast the increasing trend in the TPSC index. However, the situation for the decreasing TPSC cases is quite different. The reduction of the TPSC index in the decreasing TPSC cases is −10.0 % over 2 weeks in the observation (black line in Fig. 5c). However, all the changes in the TPSC index in the models are positive values (color lines in Fig. 5c), indicating that there are some difficulties for the models in forecasting the dissipation of TPSC.
Studies have shown that current state-of-the-art atmospheric general circulation models (GCMs) tend to strongly overestimate the precipitation over the Tibetan Plateau (e.g., Su et al., 2013;Chen and Frauenfeld, 2014;Zhang and Li, 2016;. For example, Su et al. (2013) evaluated 24 GCMs that were available in the fifth phase of the Coupled Model Intercomparison Project (CMIP5) over the eastern Tibetan Plateau by comparing the model outputs with ground observations, and they found that all of the models consistently overestimated the observed precipitation for all seasons.  found similar results, in that all climate models they evaluated exaggerated the daily precipitation in the Tibetan Plateau during winter compared with the observed values. Here, we also found that the S2S models tended to overestimate the precipitation over the Tibetan Plateau. We compared the precipitation in the S2S models with both the gauge-based GPCC precipitation dataset and the satellite-based TRMM precipitation dataset (Fig. 6). The regional averaging wintertime mean precipitation values over the Tibetan Plateau in the GPCC and TRMM models are 0.27 and 0.32 mm d −1 , respectively. Compared with the observed precipitation, all three S2S models exaggerate the regional precipitation obviously. Notably, such an overestimation persists throughout the model integration. The ECMWF model reproduces the precipitation that is closest to the observations among the three models, but it still shows a large overestimation. The precipitation in the ECMWF model is 0.78 to 0.88 mm d −1 . The precipitation values in the NCEP  model (1.07 to 1.37 mm d −1 ) and in the CMA model (1.50 to 2.13 mm d −1 ) have larger precipitation biases and even increase with the forecasting lead time. These overestimations of the precipitation induce underestimations of the TPSC dissipation, and they lead to positive biases in the TPSC from the models. Because the overestimation of the precipitation exists throughout the model integration, the positive biases of the TPSC accumulate and increase with the model integration.
In this section, it was found that S2S models underestimate the frequency of TPSC dissipation and have some difficulties forecasting TPSC dissipation with an observed rate. Exaggerations of the precipitation were found in all three models, which directly lead to accumulated overestimation of TPSC. As a result, systematic bias of TPSC occurs and increases with the model integration time.

Colder temperature with increasing forecast lead time
The local SAT over the Tibetan Plateau is highly correlated with simultaneous TPSC at a subseasonal timescale (Li et al., 2020a). Local snow-temperature relationships in S2S models were examined. We took a similar approach as in F.  and Diro and Lin (2020). The temporal correlation between the snow cover fraction and SAT with a lead of 1 week and 4 weeks for each grid point in the three models was computed to identify the extent and nature of the relationship (Fig. 7). Almost all of the regions exhibit a significant negative correlation in all of these three models. Additionally, such a relationship in all three models did not weaken with the forecasting lead time (compare Fig. 7a-c and Fig. 7d-f), even if the forecasting skill on the TPSC declined over time. The reason is that the relationship between the snow cover fraction and the SAT is embedded in the land surface model. The skill of predicting the TPSC will further influence the skill of predicting the SAT. As shown in Sect. 3, the TPSC in the S2S models during the cold season increases with increasing forecast lead time. Such systematic biases of TPSC may influence the forecasted SAT in the S2S models. To test this hypothesis, we performed an analysis on SAT over the Tibetan Plateau similar to our analysis on TPSC. The SAT over the Tibetan Plateau is derived by averaging the SAT over the Tibetan Plateau region as defined in Sect. 2.2. Differences in the multiyear winter mean SAT over the Tibetan Plateau between forecasts with leads of 2-5 weeks and forecasts with leads of 1 week in the three models, which were obtained by subtracting the multiyear winter mean with a lead time of 1 week from that for forecast lead times of 2-5 weeks, are examined (Fig. 8). The differences in the three models show some common features. The differences in all three models are all negative. By comparing values at different lead times, we also find that such negative differences increase with increasing lead time, except for the value at a lead of 3 week in the CMA model. The negative differences of SAT with the longest forecast lead time (5 weeks) are largest among all forecasts. The differences in SAT between the forecast for a lead of 5 weeks and the forecast for a lead of 1 week can be up to 1.9 • C. The increases in the SAT with the forecasting lead time are on a regional spatial scale (Fig. 9). Almost all of the grid points show negative values. Such increases in the CMA are less than those in ECMWF and NCEP.
The above results indicate that the SAT over the Tibetan Plateau becomes colder with increasing forecast lead time in the S2S models. Considering the results we obtained in Sect. 3, it can be concluded that the increasing TPSC is accompanied by decreasing SAT with forecast lead time.

Sensitivity of SAT to snow cover accumulation and dissipation
Section 3.2 reveals that models show different performances on snow cover accumulation and dissipation. We also found that there are some difficulties for the models in forecasting the dissipation of TPSC. To learn whether such different performances influence the SAT forecast and to examine the sensitivity of SAT to TPSC in the S2S models, we investigated the changes in SAT in the S2S models over the Tibetan Plateau during winter (Fig. 10a), as well as the increasing TPSC cases (Fig. 10b) and decreasing TPSC cases (Fig. 10c). To provide a SAT reference in the models, a composite was performed on SAT in the ERA-Interim reanalysis. We performed the same composite method as that is used in Sect. 3.2 on TPSC but for SAT over the Tibetan Plateau. On a seasonal average, the change in SAT over the Tibetan Plateau in the reanalysis during winter is less than 0.1 • C (black line in Fig. 10a). However, the SAT in the models tends to decrease as the forecast lead time increases, especially in the ECMWF and NCEP models (color lines Fig. 10a). The decline of the SAT over 2 weeks is 1.2 • C for the ECMWF and NCEP models. Considering the exaggerated growth of TPSC shown in Fig. 5a, a decrease in SAT is expected. In the ECMWF and NCEP models, more TPSC leads to lower SAT. SAT tends to be sensitive to TPSC in the ECMWF and NCEP models. However, SAT in the CMA model lacks sensitivity to TPSC. Although the exaggerated growth of the TPSC index in the CMA model is the most intense in these three models, the decrease in SAT in the CMA model is the least obvious.
The change in SAT should be closely connected to the variations in TPSC. The change in SAT in the increasing TPSC cases is −1.9 • C in 2 weeks in the ERA-Interim reanalysis (black line in Fig. 10b), which is associated with the accumulation of TPSC (black line in Fig. 5b). SAT shows considerable decreases during the increasing TPSC cases (Fig. 10b). Cold biases of SAT between the forecasted SAT with lead time and that at the initial week tend to appear in all models (Fig. 10b), which is associated with accumulation of TPSC (in Fig. 5b). Here, the change in SAT in CMA over 2 weeks is smaller than that in the ECMWF and NCEP models. SATs in the ECMWF and NCEP models are more sensitive to TPSC than that in the CMA model.
Here, we further find that such biases lead to biases in SAT. SAT increases by 1.4 • C over 2 weeks in the reanalysis (black line in Fig. 10c), which is associated with the dissipation of TPSC (black line in Fig. 5c). However, the SATs in all these models show small changes (color lines in Fig. 10c) compared with that in the reanalysis. Such small changes in the SATs in the ECMWF and NCEP models are consistent with the changes in the TPSC indexes in these models, which show little changes (Fig. 5c). However, the large change in TPSC in the CMA model (Fig. 5c) does not induce large bi-

Numerical experiment
Through the results in Sect. 4.1 and 4.2, we find that the local SAT over the Tibetan Plateau becomes colder with increasing forecast lead time. We assumed that the cold SAT biases are induced by the overestimation of TPSC. However, the relationship between snow cover and the atmosphere is a two-way coupling connection (Henderson et al., 2018). The assumption should be tested by numerical experiments (see Sect. 2.2 for details about the numerical model and experimental design). Otherwise, one may suspect that the cold SAT induces an increasing TPSC other than the TPSC influence on SAT. Therefore, we used the predicted TPSC as a boundary condition in CTL runs (with overestimated TPSC), while observational TPSC in GDAS was used as a boundary condition in the EXP runs (without overestimated TPSC). The difference between the CTL and EXP runs is considered to represent the response or the sensitivity of the SAT to the overestimated TPSC.
We averaged snow cover and SAT over the Tibetan Plateau in all simulations for the CTL and EXP runs to obtain a composite for all reforecasts of TPSC during winter in the numerical experiment (Fig. 11a-b). As we discussed in Sect. 3.2, the growth of the TPSC index in winter is only 1.3 % for 2 weeks in the observations, while the S2S models tend to exaggerate the growth of the TPSC index (Fig. 5a). In the numerical experiment, CTL also exaggerates the growth of the TPSC index (blue line in Fig. 11a). Because both the NCEP S2S model and our numerical experiment use Noah as the land surface model, such biases may be attributed to the land surface model. Compared with the CTL run, the EXP run shows smaller cumulative biases (red line in Fig. 11a), which is because TPSC in the EXP run is replaced by TPSC in the FNL analyses every 6 h. The SAT becomes colder with increasing forecast lead time in CTL (blue line in Fig. 11b).  However, such a decrease in SAT is much smaller in the EXP run (red line in Fig. 11b). By checking the land surface energy fluxes over the Tibetan Plateau between the CTL run and the EXP run (Fig. 11c), we found that the overestimated TPSC strongly increases the upward-reflected shortwave radiation (negative value indicates enhanced upward radiation) due to the snow-albedo effect. This difference in the solar surface energy leads to a decrease in the absorbed solar radiation. Thus, the net shortwave radiation is decreased (−10.2 W m −2 ), while the response of the net longwave radiation is much smaller than that of the net shortwave radiation. The decreased absorbed solar radiation is mainly balanced by the sensible heat flux (8.1 W m −2 ; positive values indicates reduced upward heat flux). In contrast, the differences in the latent heat flux and ground heat flux are low. The overall responses of the surface energy to the overestimated TPSC lead to an incorrect cooling shift. Hence, the numerical experiment indicates that the cold SAT biases are induced by the overestimation of TPSC.

Conclusions and discussion
Accurate subseasonal-to-seasonal (S2S) atmospheric forecasts and hydrological forecasts have considerable socioeconomic value. This study evaluates the Tibetan Plateau snow cover (TPSC) prediction capabilities of three S2S forecast models (ECMWF, NCEP and CMA) during wintertime. These three S2S models can skillfully forecast TPSC variations within a lead time of 2 weeks during wintertime with temporal correlation coefficients greater than 0.5. ECMWF better captures TPSC variations compared with NCEP and CMA at a lead time of 1-2 weeks. All models show limited skill in forecasting TPSC at a lead time of 3 weeks or more. Compared with the IMS snow cover analysis, all three models tend to overestimate the area of TPSC. Another remarkable issue regarding the TPSC forecast is the increasing TPSC with forecast lead time, which makes the systematic positive biases of TPSC in models further increase at longer forecast lead times.
S2S models underestimate the frequency of TPSC dissipation, whereas they overestimate the frequency of TPSC accumulation. The accumulation and dissipation of wintertime TPSC occurs evenly in the observations. However, this kind of balance does not exist in the S2S models. In the models, the proportion of TPSC accumulation is mostly more than 2 times as large as the dissipation proportion. The most severe underestimations of the dissipation proportions are the forecasts at a lead time of 2 or 3 weeks for all models. The models also have some difficulties forecasting the TPSC dissipation at an observed rate. The growth of TPSC in the decreasing TPSC cases is −10.0 % over 2 weeks in the observations, but all the changes in TPSC in the models are increasing.
All of the three S2S models consistently exaggerate the precipitation over the Tibetan Plateau compared to the observations. The exaggeration of the precipitation is prominent and always exists throughout the model integration. Systematic bias in the TPSC therefore occurs and accumulates with the model integration time due to exaggeration of the precipitation in the models.
The increasing TPSC is accompanied by decreasing surface air temperature (SAT) with forecast lead time. The SAT over the Tibetan Plateau becomes colder with increasing forecast lead time in the S2S models. The differences in SATs between the forecast for a lead of 5 weeks and the forecast for a lead of 1 week can be up to 1.9 • C. SATs tends to be sensitive to the TPSCs in both ECMWF and NCEP. However, SAT in CMA lacks sensitivity to TPSC. Numerical experiments were performed to test whether the cold SAT biases are induced by the TPSC overestimation. The control run exaggerates the growth of TPSC, which is similar to that in S2S models. The SAT in the control run becomes colder with integration time. When the increasing TPSC with forecast lead time in the models along with the integration of the model is removed in the sensitivity run, the decreasing SAT with integration time also disappears. The overall responses of the surface energy to the overestimated TPSC lead to incorrect cooling shifts. This finding indicates that cold SAT biases are induced by the TPSC overestimation.
Land-atmosphere coupling is one of the key physical processes for S2S prediction but is not well simulated and may reduce S2S prediction skill (Robertson et al., 2014;Dirmeyer et al., 2019). Studies have shown that better snow cover initialization improves subseasonal and seasonal forecasts/simulations (Jeong et al., 2013;Orsolini et al., 2013;Senan et al., 2016;Lin et al., 2016;Kolstad, 2017;F. Li et al., 2019). This study indicates that in addition to snow cover initialization, a better model skill for snow cover prediction may also improves S2S prediction skill. More work is necessary and valuable to improve the prediction ability of models for snow cover.
Previous studies have shown that current state-of-the-art GCMs tend to strongly overestimate the precipitation over the Tibetan Plateau (e.g., Su et al., 2013;Chen and Frauenfeld, 2014;Zhang and Li, 2016;. It is worthwhile to note that the S2S models also significantly overestimate the precipitation over the Tibetan Plateau and further cause other biases (e.g., TPSC biases and SAT biases). It is of great significance to reduce the biases of the precipitation over the Tibetan Plateau in the GCMs. Surface winds and snow sublimation could also play a role in causing the snow ablation. Identifying the relative contributions of these factors to the biased snow prediction needs more detailed and careful diagnoses. Note that the current study analyzed the data during common reforecast period of 1999-2010 for ECMWF, NCEP and CMA models. All these three operational models provide real-time forecasts since 2015 based on the improved prediction systems. It could be valuable to carry out evaluation works based on the up-to-date forecast results. Future studies on these issues are potentially valuable.
Code and data availability. The data and model used in this study are free to the public. The S2S datasets and ERA-Interim data are available at https://apps.ecmwf.int/datasets/ (European Centre for Medium-Range Weather Forecasts, 2020a, b). The IMS snow cover data are available at https://nsidc.org/data/G02156 (National Snow and Ice Data Center, 2020). The GPCC data are available at https://www.dwd.de/EN/ourservices/gpcc/gpcc.html (Deutscher Wetterdienst, 2020). The TRMM data are available at https: //disc.gsfc.nasa.gov (NASA Goddard Earth Sciences Data and Information Services Center, 2020). The NCEP FNL data are available at https://rda.ucar.edu/datasets/ds083.2/ (National Center for Atmospheric Research, 2020). The WRF source codes can be obtained at https://www2.mmm.ucar.edu/wrf/users/download/ get_source.html (University Corporation for Atmospheric Research, 2020). All figures were produced using NCAR Command Language (NCL) version 6.6.2, an open-source software free to the public, by UCAR/NCAR/CISL/TDD (2020), https://doi.org/10.5065/d6wd3xh5. The NCL scripts used in this study are available from the corresponding author upon reasonable request Author contributions. WL led the overall scientific questions and designed the research. SH and WL analyzed the data and drafted the manuscript for initial submission. WL analyzed the data for the revised manuscript. WL, PH, WG and JW made substantial contributions to revise the manuscript and prepare the responses to the referees.
Competing interests. The authors declare that they have no conflict of interest. Review statement. This paper was edited by Mark Flanner and reviewed by two anonymous referees.