Understanding Drivers of Glacier Length Variability Over the Last Millennium

Changes in glacier length reflect the integrated response to local fluctuations in temperature and precipitation resulting from both external forcing (e.g., volcanic eruptions or anthropogenic CO2) and internal climate variability. In order to interpret the climate history reflected in the glacier moraine record, therefore, the influence of both sources of climate variability must be considered. Here we study the last millennium of glacier length variability across the globe using a simple dynamic glacier model, which we force with temperature and precipitation time series from a 13-member ensemble of sim5 ulations from a global climate model. The ensemble allows us to quantify the contributions to glacier length variability from external forcing (given by the ensemble mean) and internal variability (given by the ensemble spread). Within this framework, we find that internal variability drives most length changes in mountain glaciers that have a response timescale of less than a few decades. However, for glaciers with longer response timescales (more than a few decades) external forcing has a greater influence than internal variability. We further find that external forcing also dominates when the response of glaciers from widely 10 separated regions is averaged. Single-forcing simulations indicate that most of the forced response over the last millennium, pre-anthropogenic warming, has been driven by global-scale temperature change associated with volcanic aerosols.

At each of these grid points, we compute annual time series of average winter precipitation (P ) and summer temperature (T ).
For grid points in the Northern Hemisphere, P is taken from October-March and T is taken from April-September. The seasons are flipped for grid points in the Southern Hemisphere. We then estimate mass-balance variability using the following linear approximations, where b w and b s represent winter and summer mass balance, α and λ are positive constants, and primes indicate anomalies relative to the millennial average . Note that the negative sign in Eq. 2 reflects an anticorrelation between mass 65 balance and temperature in the summer.
Like most GCMs, the grid resolution of the LME is too coarse to capture the full influence of orography on P and T at most glacier locations. To compensate for this, we choose values of α and λ for each glacier that produce variances in b w and b s that match the observed values reported by Medwedeff and Roe (2017). This is achieved by setting where the numerators represent the observed standard deviations in winter and summer mass balance, and the denominators represent the standard deviations in winter precipitation and summer temperature. Annual mass-balance anomalies are then given by For this portion of our analysis, we focus on three geographically diverse glaciers: i) the Silvretta glacier in the Alps, ii) the South Cascade glacier in the American Pacific Northwest, and iii) the Martial Este glacier in the Patagonian Andes. Figure 1 shows the time series of winter mass-balance anomalies (b w ; Fig. 1a), summer mass-balance anomalies (b s ; Fig. 1b), and annual mass-balance anomalies (b ; Fig. 1c) at these three glaciers, with their average shown in green at the bottom of each panel. The

80
heavy colors indicate the ensemble-mean anomalies, while the lighter colors represent the range of ensemble members. The vertical grey line at the year 1880 marks the approximate transition from the pre-industrial era to the modern era, after which the time axis on each panel is dilated by a factor of 4 to enhance visibility. We evaluate the accuracy of LME temperatures over the modern era by plotting values of b s calculated from observed temperature anomalies (NASA GISS dataset; Lenssen et al., 2019) using Eq. 2 (Fig. 1b, black lines). 85 The ensemble mean mostly represents the response to external forcing. The importance of this forcing relative to the internal variability can be measured by a signal-to-noise ratio (SNR), which we define as where Var[b (t)] is the variance of the ensemble mean time series (i.e., an approximation of the forced response, or "signal"), and Var[b (t)] is the total variance across all ensemble members. The difference between these two quantities in the denomi-90 nator represents the variance due to internal climate variability (i.e., the "noise"). In general, SNR values less than one indicate that most of the variance in a time series can be attributed to internal variability, while SNR values greater than one indicate that most variance is driven by external forcing. For a 13-member ensemble, SNR values greater than 0.125 in the modern era or 0.098 in the pre-industrial era indicate that a forced signal can be detected and is statistically significant at the 99% confidence level (see Appendix). SNR values are listed in Fig. 1 for each time series and time period, with bold font indicating statistical 95 significance.
In the winter (Fig. 1a), there is little evidence of a forced signal in b w at any glacier during either time period, as indicated by SNR values that are uniformly low and statistically insignificant. This implies that interannual variability in winter mass balance (and thus winter precipitation) at these glaciers is dominated by internal climate noise. This finding is expected based on past studies of the contribution from winter precipitation trends in glacier change (e.g., Medwedeff and Roe, 2017).

100
In contrast, b s shows clear evidence of a forced signal (Fig. 1b). During the pre-industrial period, all locations exhibit several abrupt spikes associated with volcanic eruptions, with the most notable event being the Samalas eruption in 1257 (e.g., Guillet et al., 2017). The amplitude of these spikes is strongest in Europe and North America, where they occur against the backdrop of a longer-term increase in b s associated with a cooling trend that persists through the end of the 19th century. Although the forced signal is statistically significant at all locations during the pre-industrial period, SNR values are less than 0.2, implying 105 that internal noise still accounts for the majority of local year-to-year temperature variability.
Around 1900, b s begins to decrease at all three glaciers in response to global warming (Fig. 1b). SNR values are greater in this period than in the pre-industrial period, reflecting the unprecedented strength and persistence of anthropogenic forcing.
Comparing the simulated trends in b s over the 20th century with those derived from observed temperatures ( Fig. 1b; colored vs. black lines), we find good agreement at South Cascade and Martial Este but not at Silvretta, where the decrease in b s in 110 the LME ensemble mean is too low by 0.07 m/decade, reflecting a warming trend that is too low by 0.063 K/decade (Eq. 2, with λ = 1.11 m K −1 ). One possible interpretation of this difference is that part of the observed warming trend at Silvretta was the result of internal variability. This conclusion is supported by the fact that the observed trend at Silvretta lies within the ensemble range of simulated trends. On the other hand, 20th-century warming trends in the LME are weaker than observed trends over most of the globe (Fig. A1), suggesting that model bias likely also plays a role. We consider possible causes of this 115 bias in Section 3.4, but note that it does not affect the bulk of our analysis.
Finally, variability in b (Fig. 1c) shows the influence of both b w and b s , but in different proportions for each glacier. For example, at South Cascade glacier (red), the variance in b w is a factor of 2 greater than the variance in b s , and thus exerts a much stronger influence on b . Because b w is essentially noise, the SNR of b at South Cascade glacier is statistically insignificant variance in b w despite the presence of forced variance in b s is typical of maritime glaciers and detailed, for example, in Young et al. (2020). At Silvretta and Martial Este, by contrast, the variance comes mostly from b s . Thus, while b w remains a source of noise, its amplitude is small enough that the forced signal from b s can still be detected in b .
A final point to emphasize from Fig. 1 relates to the averaged time series among the three glaciers (green lines). These exhibit higher SNRs in b s and b than any individual glacier, suggesting that spatial averaging may help suppress noise and bring out 125 a common forced signal. This echoes results from studies of modern-day warming, which have found that the anthropogenic signal is clearest at global scales, even when it is obscured by internal variability at local and regional scales (e.g., Deser et al., 2012). We elaborate on the reasons for this result in Section 3.2 below.

Glacier simulations
To simulate variability in glacier length, we use the three-stage linear model of Roe and Baker (2014), which has been shown 130 to better capture the high-frequency response of glacier dynamics than earlier low-order models (Harrison and Post, 2003;Oerlemans, 2000Oerlemans, , 2005. The three stages, which can be diagnosed from ice dynamics in numerical models, are: (1) changes in interior thickness, which drive (2) changes in terminus ice flux, which in turn drive (3) changes in glacier length. Collectively, the three stages can be represented as a linear, third-order differential equation,

135
where L is the length anomaly as a function of time, t; b is the annual mass balance anomaly derived from LME output; is the ratio of the variances in the three-stage and one-stage models, which is set to 1/ √ 3; τ is the glacier response timescale, and β is a non-dimensional shape parameter that only affects the amplitude of L , and not its temporal characteristics. The glacier response time is given by τ = −H/b t , where H is a characteristic ice thickness in the terminus zone and b t is the net (negative) mass balance in the terminus zone (Jóhannesson et al., 1989). Note there is not necessarily a simple relationship between τ and 140 glacier size: cirque glaciers can be thick and have termini that extend only a little past the equilibrium line altitude, giving them τ s of many decades (e.g., Barth et al., 2017). Conversely, glaciers sourced from large accumulation areas may have termini well below the equilibrium line altitude, and if terminating on steep slopes, also be relatively thin, giving them short τ s of a decade or less (e.g., Franz Joseph N.Z., Purdie et al., 2014).
The three-stage model accurately emulates the autocorrelation function and power spectrum of numerical flowline models of 145 ice dynamics (Christian et al., 2018;Roe and Baker, 2014) and has been shown to produce realistic length responses to climate trends and variability given appropriate choices of τ and β (e.g., Herla et al., 2017).
In the simulations we present here, we did not attempt to assign realistic values of τ and β to each glacier. Rather, for each glacier, we performed three separate simulations with τ = 10, 30, and 100 years, and β = 150. This approach has two main advantages. First, the subset of glaciers analyzed in Fig. 1 are located in regions with a wide range of glacier sizes.  Comparing the columns of Fig. 2, we find that the time series of L become increasingly smooth as τ increases, indicating 165 a progressive "reddening" of the variance spectra. This reflects the fact that glaciers essentially act as low-pass filters of annual mass balance. The spectral power drops sharply at frequencies higher than 1/(2πτ ) (Roe and Baker, 2014). During the pre-industrial period, positive trends in L across the Northern Hemisphere reflect a cooling trend within the LME, as noted previously (Fig. 1b). Conversely, during the past century of anthropogenic warming, negative trends in L are found at all locations when τ = 10 years, but only in the Southern Hemisphere when τ = 100 years. The absence of 20th-century retreat in 170 the Northern Hemisphere partially reflects the muted warming trend in the LME relative to observations (Fig. A1). However, the difference in recent trends between τ = 10 years and τ = 30 years also illustrates a more general point: namely, that higher τ values imply a more delayed response to a given climate perturbation. We discuss the implications of this result for future changes in glacier length in Section 4.
In addition to the ensemble-mean differences discussed above, the panels in Fig. 2 also exhibit significant differences in 175 ensemble spread. These differences are most clearly illustrated by SNR values, which reveal two key dependencies. First, at all locations, SNR increases monotonically with increasing timescale (i.e., from left to right in Fig. 2). Notably, we find a similar increase in SNR with τ at all other glacier locations as well (Fig. A2), suggesting that it reflects a general property of the climate system. Large-τ glaciers are usually more reliable indicators of forced climate change than small-τ glaciers, which are more prone to natural fluctuations. There is a trade-off: the larger-τ glaciers smooth over longer periods and so the temporal 180 resolution of the forced change is degraded.
Second, SNR is substantially greater in the spatially-averaged time series of glacier length (Fig. 2, bottom row) than it is for any individual glacier, similar to what we found with annual mass balance ( Fig. 1). This suggests that glacier variability is more likely to reflect external forcing if the glacier variability is coherent across a large spatial domain. We address each of these behaviors in greater detail below. We also extend our analysis to a larger set of 76 glaciers, for which variability in 185 summer and winter mass balance were also provided by Medwedeff and Roe (2017 , Table A1).

Dependence of SNR on timescale
The relationship between SNR and τ evident in Fig. 2 stems from differences in how natural and forced variance is distributed across the frequency spectrum. These differences are illustrated in Fig. 3, which shows the spectra of unforced and forced annual mass balance (b) variability in the top row, with subsequent rows showing equivalent results for winter precipitation (P )  195 Comparing the spectra of unforced and forced mass-balance variability in the top row of Fig. 3, we find that the unforced spectrum is essentially white (i.e., it has a flat spectral slope), while the forced spectrum is red, exhibiting greater variance at low frequencies. Therefore, when low-pass filtered by the glacier response, more of the forced variability is retained by the glacier, resulting in higher SNR values in glacier length relative to annual mass balance. This effect becomes larger as τ increases and the filter cutoff shifts to lower frequencies, thus explaining the positive correlation between SNR and τ found in The source of the differences in slope between the forced and unforced mass-balance spectra is evident in Fig. 3c-f, which show the same spectral decompositions for P and T . The spectra of P are essentially flat at all frequencies, indicative of white noise generated by internal atmospheric variability ( Fig. 3c-d). This is equally true of the forced and unforced components, providing further evidence that most of the "signal" in ensemble-mean precipitation is not physical, but rather residual noise 205 retained during the averaging (Fig. 1). Similarly, the unforced T spectra (Fig. 3e) are also mostly white, except at the highest frequencies where variance is likely damped by interannual persistence in sea surface temperatures. In contrast, the forced T spectra are significantly red at all frequencies (Fig. 3f), and are thus clearly responsible for the similar slopes of the forced mass balance spectra (Fig. 3b).
The redness of the forced mass-balance spectra in Fig. 3b is not evident in the spectrum of volcanic forcing, which is white 210 at sub-decadal frequencies (Fig. A3). It must therefore be caused by reddening mechanisms inherent to the climate system, such as ocean heat storage and sea-ice feedbacks (Stenchikov et al., 2009;Miller et al., 2012). GCM simulations have further shown that strong, abrupt forcing from volcanic aerosols can also be reddened by enhanced heat exchange with the deep ocean, due to a strengthening of both vertical mixing and the Atlantic Meridional Overturning Circulation (AMOC) (Stenchikov et al., 2009). This unique response to volcanic forcing may help explain why the forced spectra of T and b are so much redder than 215 the unforced spectra (Fig. 3). However, further research is needed to fully understand the difference in spectral slopes between forced and internal temperature variability.

Dependence of SNR on spatial scale
We now compare the spatial coherence of the forced and unforced responses among the set of 76 glaciers. The top row of Fig.   4 shows the simultaneous cross-correlation matrices for the time series of unforced and forced glacier length. The correlations 220 were computed assuming τ = 10 years for each glacier, but the results are similar for τ = 30 and 100 years (not shown). For natural variability (Fig. 4a), we find strong correlations at the scale of individual mountain ranges, but not at larger scales. For example, while all glaciers in the Alps are strongly correlated with each other, they are not significantly correlated with glaciers in Scandinavia. This implies that internal variability has a relatively short decorrelation scale, consistent with synoptic-scale atmospheric variability (e.g., Wallace and Hobbs, 2006). By contrast, cross-correlations among the forced time series of glacier 225 length are significantly positive among a large majority of glacier pairs (92%; Fig. 4b), indicating that climate changes resulting from external forcing are globally coherent (Fig. 4b). When multiple time series of glacier length are averaged across different mountain ranges, the incoherent internal noise is damped relative to the coherent forced signal. This explains why the SNR of the averaged time series of glacier length is greater than the SNR at any individual glacier (Fig. 2).
The bottom two rows of Fig. 4 show the same cross-correlation matrices as in the top row, but with glacier length time series 230 separated into contributions from T and P . In the unforced case (Figs. 3c,e), the correlation patterns resulting from both Tand P -driven length variability resemble the total unforced correlations in Fig. 3a. One exception is in the P -driven matrix ( Fig. 3c), where negative correlations indicate the existence of dipole patterns between adjacent regions, such as the Alps and Scandinavia, or the Pacific Northwest and Alaska. However, these dipole patterns have a relatively small impact on the total unforced correlations across these regions. Meanwhile, the decomposition of the forced correlations (Figs. 4d,f) shows that the 235 global coherence of forced glacier variability comes entirely from T rather than P . Precipitation-driven variability is a source of noise that mostly weakens the correlations, but does not change their global coherence.

The relative importance of precipitation versus temperature
The preceding analysis has shown that forced changes in glacier length are driven primarily by globally-coherent changes in summer temperature. However, this result provides little insight into the relative importance of temperature versus precipitation 240 in driving glacier variability more generally. These contributions are quantified in Fig. 5, which shows the ratio of T -driven variance to total variance, both in annual mass balance (Fig. 5a), and in glacier length ( Fig. 5b; assuming τ = 10 years).
Because T and P are not significantly correlated at any glacier location, this ratio approximately represents the fraction of total variance that can be attributed to T . The difference between the two panels is shown in Fig. 5c.
In the case of annual mass balance, T accounts for more than half the variance at 58 out of 76 glaciers, with an average value 245 of 67% of the total variance among all glaciers. The few glaciers where precipitation variability plays a larger role are mostly located in maritime environments with large storm-track variability, such as Alaska, the Pacific Northwest, and the Andes.
In contrast, variability in glacier length is dominated by temperature everywhere (Fig. 5b), including at glaciers (like South Cascade) where precipitation accounts for most of the variability in annual mass balance. spectrum of T is redder than the spectrum of P . This difference is especially pronounced in the forced time series, but it is also evident in the unforced time series at frequencies greater than ∼ 1/decade, where T variance is suppressed by SST persistence.
As low-pass filters, glaciers eliminate variance at the highest frequencies, where precipitation's contribution to annual mass balance variance is greatest. Thus, for the same reasons that SNR is enhanced by a glacier's filtering properties, the contribution of temperature to glacier-length variability is enhanced relative to that of precipitation. This means that fluctuations in glacier-255 length variability primarily reflect low-frequency variability in summer temperature, even where mass-balance variability is more strongly influenced by winter precipitation.

Roles of individual forcings
Finally, we evaluate the relative importance of the different climate-forcing factors in these simulations. In addition to the full 13-member ensemble, the LME archive also contains smaller ensembles of simulations representing the climate response to  (5), and changes in solar and orbital patterns (3). As in the full-forcing ensemble, we use the ensemble mean to approximate the climate response to each individual factor. However, it is important to note that these approximations contain substantially more noise than the full ensemble mean because there are fewer ensemble members. During the pre-industrial era, most of the forced variability in glacier length can be attributed to volcanic aerosols (Fig.   6, red and green lines). Over the last century, however, anthropogenic factors have played the largest role. In the Southern Hemisphere, the full-forcing time series closely follows greenhouse-gas-driven trends beginning around the year 1850. In the 270 Northern Hemisphere, by contrast, retreat due to greenhouse gases is largely offset by industrial aerosol emissions during the modern era. While it is well known that industrial aerosols provided radiative cooling over the 20th century, warming trends over this period are generally lower in the LME than in observations (Fig. A1), suggesting that the model's aerosol forcing may be too strong, or that its transient climate sensitivity may be too low. Whatever the cause, the suppressed 20th-century warming trend in many regions within the LME explains why our simulations appear to show less glacier retreat than has been 275 observed in some locations (e.g., Herla et al., 2017).

Summary and Discussion
In this study, we have combined an ensemble of numerical climate model simulations and a glacier length model to evaluate the relative importance of climate forcing and internal climate variability in driving glacier-length fluctuations. While the potential importance of internal variability has been noted before, this is the first study to evaluate the relative importance of 280 natural forcing vs. internal variability in the pre-industrial era, and the accompanying spatial patterns of glacier response. We estimated annual mass balance anomalies over the last millennium at 76 glaciers around the world using simulated time series of summer temperature and winter precipitation from the 13-member CESM Last Millennium Ensemble. Using the three-stage linear model of Roe and Baker (2014), we then converted these mass-balance anomalies into glacier-length anomalies for a range of glacier response timescales, thus capturing the diversity of behavior exhibited by glaciers of different sizes and 285 geometries. Because the ensemble simulations differ only in their initial conditions, the responses of mass balance and glacier length to external radiative forcing are mostly captured by the ensemble mean, while internal unforced variability is represented by departures from the ensemble mean (i.e., the ensemble spread). The ratio of ensemble-mean variance to ensemble-spread variance is defined as the signal-to-noise ratio (SNR), and represents the fraction of total variance driven by external forcing.
While SNR varies by location, we found that two factors influence SNR more generally. First, SNR is greater for glacier 290 length than for annual mass balance, and increases further with increasing glacier response timescale, τ . This timescaledependence reflects differences in how forced and unforced mass-balance variance is distributed across the frequency spectrum.
Specifically, because forced variance has a redder spectrum than unforced variance, the forced signal is amplified relative to unforced noise when low-pass-filtered by a glacier. This explains why a glacier like South Cascade, which does not exhibit a significant forced response in annual mass balance due to high interannual variability (Fig. 1, red), still shows a significant 295 forced response in glacier length in our simulations (Fig. 2, red).
For both mass balance and glacier length, SNR is enhanced by averaging multiple time series from different locations. This is because forced changes in mass balance tend to be globally coherent, reflecting a global-scale temperature response, while unforced variability has little coherence beyond the scale of an individual mountain range.
Our results have implications for how past glacier variability should be interpreted. For example, because SNR increases 300 with spatial averaging, a change in glacier length is more likely to be forced when coherent changes are also observed in other mountain ranges.
However, it is important to recognize that the amplified signal in large-τ glaciers comes at the expense of decreased temporal resolution. Thus, while large-τ glaciers may be more reliable indicators of climate change on centennial timescales, they may entirely miss climate changes that occur on multi-decadal timescales.

305
Differences in response time also have implications for how glaciers have responded and will respond to global warming.
Because glaciers are lagging indicators of climate change, recent glacier retreat has been more modest than the decrease in annual mass balance (Figs. 1 and 2), implying that further retreat is locked in, even in the absence of additional warming (Christian et al., 2018). This disequilibrium is especially pronounced for large-τ glaciers, whose slow response means that they have only begun to feel the effects of the past century of warming. In future decades, therefore, we expect that large-τ glaciers 310 will experience the greatest retreat, as they integrate the effects of both past and future warming.
Our analyses have been made possible by ensemble climate modeling. As done here, such ensembles can be used to decompose the contributions due to internal variability and natural and anthropogenic forcing. However, we have used only one climate model. As other ensembles become more widely available it will be important to evaluate different climate models, and also different estimates of the natural and anthropogenic forcing, for which there are still significant uncertainties (Schmidt 315 et al., 2011). Lastly, we have analyzed continuous time series of glacier length. Actual records of past glacier extent are more fragmentary, with moraines representing periods of maximum extent that were not subsequently over-ridden by a later advance.
An interesting extension of this study would be to make estimates of the moraine record that might be left on the landscape. We consider a signal-to-noise ratio (SNR) to be statistically significant if we can reject the null hypothesis that SNR = 0 with at least 99% confidence (p < 0.01). To determine the threshold for statistical significance, we performed a Monte Carlo simulation consisting of 100,000 ensembles of 13 randomly generated Gaussian time series (i.e., noise). We set the length of each synthetic time series based on the estimated degrees of freedom in the given climate variable, as described below. For 325 each 13-member ensemble, we compute the SNR as in Eq. 6. This yields a set of 100,000 synthetic SNR values which have a mean of 0.083 and a distribution that depends on the number of degrees of freedom in the time series. Because the true SNR of each ensemble is equal to 0, we set the threshold for statistical significance to be the 99th percentile of SNR values within the distribution.
At all glacier locations within the LME, we find that time series of winter precipitation and summer temperature exhibit an 330 e-folding decorrelation scale of no more than 1 year. Thus, for time series of summer and winter mass balance, we assume the number of degrees of freedom is equal to half the number of years in the stated time period Leith (1973). This yields significance thresholds of 0.098 for the pre-industrial and 0.125 for the modern.
For the glacier length time series, we find that the e-folding decorrelation scale is on average equal to about 1.5τ . Over the pre-industrial period, this implies about 30, 10, and 3 degrees of freedom for τ = 10, 30, and 100 years, respectively. The 335 corresponding SNR significance thresholds are found to be 0.15 for τ = 10 years, 0.22 for τ = 30 years, and 0.49 for τ = 100 years. All of our simulated glacier length time series have SNR values that exceed these thresholds.
Author contributions. Huston and Siler performed the analysis and wrote the paper. Roe and Siler designed the study. All authors helped interpret the results and edit the paper.    Table A1. Glacier data from Medwedeff and Roe (2017). Glaciers are excluded if observations of summer or winter mass balance span less than 10 years. σ b,w : the standard deviation of observed winter mass balance (in m). σ b,s : the standard deviation of observed summer mass balance (in m). σP : the standard deviation of simulated winter precipitation within the LME over the last millennium (1000-1999) (in m).
σT : the standard deviation of simulated summer temperature within the LME over the last millennium (1000-1999) (in K). α: the ratio of σ b,w /σP . λ: the ratio of σ b,s /σT (in m K −1 ).