Controls of outbursts of moraine-dammed lakes in the greater Himalayan region

Glacial lakes in the Hindu Kush–Karakoram– Himalayas–Nyainqentanglha (HKKHN) region have grown rapidly in number and area in past decades, and some dozens have drained in catastrophic glacial lake outburst floods (GLOFs). Estimating regional susceptibility of glacial lakes has largely relied on qualitative assessments by experts, thus motivating a more systematic and quantitative appraisal. Before the backdrop of current climate-change projections and the potential of elevation-dependent warming, an objective and regionally consistent assessment is urgently needed. We use an inventory of 3390 moraine-dammed lakes and their documented outburst history in the past four decades to test whether elevation, lake area and its rate of change, glaciermass balance, and monsoonality are useful inputs to a probabilistic classification model. We implement these candidate predictors in four Bayesian multi-level logistic regression models to estimate the posterior susceptibility to GLOFs. We find that mostly larger lakes have been more prone to GLOFs in the past four decades regardless of the elevation band in which they occurred. We also find that including the regional average glacier-mass balance improves the model classification. In contrast, changes in lake area and monsoonality play ambiguous roles. Our study provides first quantitative evidence that GLOF susceptibility in the HKKHN scales with lake area, though less so with its dynamics. Our probabilistic prognoses offer improvement compared to a random classification based on average GLOF frequency. Yet they also reveal some major uncertainties that have remained largely unquantified previously and that challenge the applicability of single models. Ensembles of multiple models could be a viable alternative for more accurately classifying the susceptibility of moraine-dammed lakes to GLOFs.

hazard categories. This and many similar qualitative ranking schemes are accessible to a broader audience and policy makers, 70 but are difficult to compare and potentially oversimplify uncertainties.
One way to deal with these uncertainties in a more objective way involves a Bayesian approach. Here, we used this probabilistic reasoning with data-driven models. Specifically, we tested how well some of the more widely adopted predictors of GLOF susceptibility and hazard fare in a multi-level logistic regression that is informed more by data rather than by expert opinion.
We used lake elevation z (m a.s.l.) as a proxy for the standard lapse rate of tropospheric air temperature (Rolland, 2003;Yang and Smith, 1985). This elevation-dependent thermal gradient is also a major control on the distribution of alpine permafrost 115 (Etzelmüller and Frauenfelder, 2009) and precipitation. Mean annual rainfall along the Himalayan front can exceed 4,000 mm at elevations some 4,000 m high, where ~25% of all glacial lakes occur (Fig. 1;Bookhagen and Burbank, 2010). Lake elevation should also represent to first order topographic effects of EDW. For example, the stability of low-lying moraine dams may be compromised by the loss of permafrost and commensurate increases in permeability in the moraine barrier and adjacent valley slopes (Haeberli et al., 2017). 120 Glacial lake area A (m²) and its rate of change ΔA (net change) and A * (relative change, %) are other common predictors of susceptibility and hazard in GLOF studies (Allen et al., 2019;Bolch et al., 2011;Prakash and Nagarajan, 2017; see Table 1 for full list of references) that we considered here. Due to a general lack in available bathymetric data on a regional scale, a number of studies used the frequently observed phenomenon that lake area scales with lake volume and depth (Huggel et al., 2002;Iribarren Anacona et al., 2014). Growing lake depths increase the hydrostatic pressure acting on moraine dams, thus 125 raising the potential of failure (Iribarren Anacona et al., 2014;Rounce et al., 2016). In the past decades, lake areas have grown largest in the Central Himalayas (+23% in 1990Nie et al., 2017) and Nyainqentanglha Mountains but lowest in the northwestern Himalayas Nie et al., 2017), and many studies have emphasised the role of growing lakes on GLOF susceptibility (e.g. GAPHAZ, 2017;Prakash and Nagarajan, 2017;Rounce et al., 2016)Since 1990, lake areas have grown largest in the Central Himalayas (+23%), and lowest in the northwestern Himalayas (+5.0%) (Nie et al., 2017), and 130 many studies have emphasised the role of growing lakes on GLOF susceptibility (e.g. GAPHAZ, 2017;Prakash and Nagarajan, 2017;Rounce et al., 2016). Many previous GLOF assessment schemes included lake area or lake area growth as a proxy for the volume of water that could be potentially released by an outburst and, thus, the resulting downstream hazard (e.g. Allen et al., 2019;Bolch et al., 2011). However, a number of studies also stress that lake area and its growth define the exposure to external and internal triggers of moraine dam breach: larger and growing lakes offer more area for impacts from mass flows 135 such as avalanches, rockfalls, and landslides originating from adjacent valley slopes (GAPHAZ, 2017;Haeberli et al., 2017;Prakash and Nagarajan, 2017;Rounce et al., 2016). Some authors also link growing lake areas to an increase in hydrostatic pressure acting on its moraine dam, thus, making the letter more susceptible to sudden failure (Iribarren Anacona et al., 2014;Mergili and Schneider, 2011).
We also tested the impact of upstream catchment area C (m²) on GLOF susceptibility. A larger upstream catchment area has 140 been associated with an increased susceptibility to GLOFs as runoff from intense precipitation as well as glacier and snow melt can lead to sudden increases in lake volume (Allen et al., 2019;GAPHAZ, 2017). We find that catchment area C correlates with lake area A (Pearson's ρ = 0.45) and we, thus, preferred C over A in two of our models, as C is invariant at the timescale of our study and we use these two models to explicitly test whether runoff by glacier melt or monsoonal precipitation had an effect on GLOFs in our study area. 145 Similarly to changes in lake area, glacier dynamics are frequently mentioned, though rarely incorporated quantitatively in susceptibility appraisals (Bolch et al., 2011;Ives et al., 2010). This motivated us to consider the average changes in regional glacier-mass balances between 2000 and 2016 Δm (m water equivalent yr -1 ) from Brun et al. (2017). These readily available data on regional glacier-mass balances are proxies for other, less accessible, physical controls on GLOF susceptibility such as glacial meltwater input, either directly from the parent glacier or from glaciers upstream, as well as permafrost decay in slopes 150 fringing the lake (see Table 2 for full list).
Meteorological drivers entered previous qualitative GLOF hazard appraisals mostly as (the probability of) extreme monsoonal precipitation events: the Kedarnath GLOF disaster, for example, was triggered by intense surface runoff (Huggel et al., 2004;Prakash and Nagarajan, 2017). Heavy rainfall may also trigger landslides or debris flows from adjacent hillslopes followed by displacement waves that overtop moraine dams (Huggel et al., 2004;Prakash and Nagarajan, 2017). Elevated lake levels during the monsoon season also raise the hydrostatic pressure acting onto moraine dams (Richardson and Reynolds, 2000).
Furthermore, different precipitation regimes and climatic preconditions may also influence moraine dam-failure mechanics (Wang et al., 2012). Intense precipitation occurs in our study region largely during the summer monsoon, so that we derived a synoptic measure of monsoonality M (%). We define monsoonality M in terms of the annual proportion of summer, i.e. the warmest quarter's, precipitation, which is highest in the southeast HKKHN, where it is linked to monsoonal low-pressure 160 systems (Krishnan et al., 2019). -increasing lake area commonly reported as scaling with increasing lake depth  potentially increased hydrostatic pressure acting on the moraine dam -increased proximity to steep valley slopes  increased potential of mass movements entering the lake  Glacier-mass balance r glacier-mass balance region -Brun et al., 2017 -proxy for direct or surface runoff glacier meltwater input, calving potential of parent glacier front, and permafrost distribution in lake surroundings -link between regional glacier-mass balance and synoptic regime (winter westerlies versus monsoon dominated) Δmr average glaciermass balance m w.e. (water equivalent) yr -1 Monsoonality (annual proportion of summer precipitation) M % (mm) CHELSA (Karger et al., 2017) -high intensity precipitation events during monsoon season might lead to increased surface runoff into glacial lakes (cloudburst event) -seasonal increases in lake levels and, hence, lake depths increase hydrostatic pressure acting onto moraine dam -link between regional glacier-mass balance and synoptic regime (winter westerlies versus monsoon dominated) 165 We extracted information on these characteristics for glacial lakes recorded in two inventories. First, we used the ICIMOD database of 25,614 lakes manually mapped from Landsat imagery acquired in 2005 (± two years) (Maharjan et al., 2018), from which we extracted 7,284 lakes dammed by moraines (classes m(l), m(e), and m(o) in Maharjan et al., 2018).from which we 170 extracted 7,284 lakes dammed mostly by lateral and end moraines . Second, we identified from an independent regional GLOF inventory (Veh et al. 2019) 31 lakes that had at least one outburst between 1981 and 2017 and that are listed in the ICIMOD inventory. The triggering mechanism of these studied GLOFs is reported in only seven cases, four of which are attributed to ice avalanches entering the lake (e.g. Tam Pokhari, Nepal or Kongyangmi La Tsho, India; Ives et al., 2010;Nie et al., 2018).
Other triggers of the GLOFs studied here include piping (Yindapu Co, China; Nie et al., 2018) and the collapse of an ice-cored 175 moraine (Luggye Tsho, Bhutan; Fujita et al., 2008). We focused on lakes >10,000 m² to ensure comparability between the two inventories, thus acquiring a final sample size of 3,390 lakes. Given the sparse network of weather stations in the HKKHN, we computed the monsoonality averaged for each lake from the 1-km resolution CHELSA bioclimatic variables (Karger et al., 2017). These variables are correlated with elevation because of the same underlying interpolation technique so that we limited our models to those with poorly correlated predictors. This meant omitting other predictors such as mean annual temperature, 180 annual precipitation totals and annual temperature and precipitation variability. We extracted topographic data from the voidfree 30-m resolution SRTM (Shuttle Radar Topographic Mission of 2000) DEM, and use approximate lake-area changes for two intervals (1990 to 2005 and 2005 to 2018) by Wang et al. (2020). We discarded newer, higher resolved DEMs to minimise data gaps and artefacts. Overall, we considered six topographic, synoptic, and glaciological predictors (Fig. 2, Table 2).

Bayesian multi-level logistic regression
We used logistic regression to learn the probability of whether a given lake in the HKKHN had a reported GLOF in the past 190 four decades. This method was pioneered for moraine-dammed lakes in British Columbia (McKillop and Clague, 2007).
Logistic regression estimates a binary outcome y from the optimal linear combination of p weighted predictors x = {x1, …, xp}. The probability y = PGLOF that lake i had released a GLOF is expressed as: where Here α0 is the intercept and = { 1 , … , } T are the p predictor weights (Gelman and Hill, 2007). The logit function S -1 (x) 200 describes the odds on a logarithmic scale (the log-odds ratio) such that a unit increase in predictor xm raises the log-odds ratio by an amount of , with all other predictors fixed. We used standardised data to ensure that the weights measure the relative contributions of their predictors to the classification, whereas the intercept expresses the base case for average predictor values.
Our strategy was to explore commonly reported predictors of GLOF susceptibility and dam stability as candidate predictors Table 1, Table 2). We further acknowledged that data on moraine-dammed lakes in the HKKHN are structured, 205 reflecting, for example, the variance in topography and synoptic regime such as the summer monsoon in the eastern HKKHN and westerlies in the western HKKHN. Different data sources, collection methods, and resolutions also add structure. This structure is routinely acknowledged, often raised as a caveat, but rarely treated, in GLOF studies. Ignoring such structure can lead to incorrect inference by bloating the statistical significance of irrelevant or inappropriate model parameter estimates (Austin et al., 2003). To explicitly address this issue, we chose a multi-level logistic regression as a compromise between a 210 single pooled model and individual models for each group in the data ( Fig. 3;Gelman and Hill, 2007;Shor et al., 2007). We recast Eq. (2) using a group index j:

220
where µα is the mean and σα is the standard deviation of the group-level intercepts αj that are learned from all data and inform each other via the model hierarchy. We used a Bayesian framework (Kruschke and Liddell, 2018) by combining the likelihood of observing the data with prior knowledge from previous GLOF studies (Fischer et al., 2020). The small number of reported GLOFs introduces strong imbalance to our data, given that some regions, and hence levels, had few or no reported GLOFs.
Although this would be problematic in most other modelling approaches, Bayesian multi-level models are well suited for this 225 kind of imbalanced training data (Gelman and Hill, 2007;Shor et al., 2007;Stegmueller, 2013).
We used the statistical programming language R with the package brms, which estimates joint posterior distributions using a Hamiltonian Monte Carlo algorithm and a No-U-Turn Sampler (NUTS) (Bürkner, 2017). We ran four chains of 1500 samples after 500 warm-up runs each, and checked for numerical divergences or other pathological issues. We only considered models with all values of Ȓ <1.01, a measure of numerical convergence of sampling chains, to avoid unbiased posterior distributions 230 (Nalborczyk et al., 2019).
Unless stated otherwise, we used a weakly informative half Student-t distribution with three degrees of freedom and a scale parameter of 10 for the standard deviations of group-level effects (Table 3; Bürkner, 2017;Gelman, 2006). At the population level, we chose weakly informative priors for the intercept and coefficients for which we had no other prior knowledge. We encoded this lack of knowledge with a prior Cauchy distribution centred at zero and with scale 2.5, following the 235 recommendation by Gelman et al. (2008). Rapidly growing moraine-dammed lakes are a widely used predictor of high GLOF susceptibility (Aggarwal et al., 2016;Allen et al., 2019;Bolch et al., 2011;Ives et al., 2010;Mergili and Schneider, 2011;Prakash and Nagarajan, 2017). We encoded this notion in a prior Gaussian distribution with one unit mean and standard deviation, hence shifting more probability mass towards positive regression weights without excluding the possibility of negative weight estimates (Table 3). 240 We estimated the predictive performance of all models with leave-one-out (LOO) cross-validation as part of the brms package (Bürkner, 2017). LOO values like the expected log predictive density (ELPD) summarise the predictive error of Bayesian 245 models, similar to the Akaike Information Criterion or related metrics of model selection (Vehtari et al., 2017). They are based on the log-likelihood of the posterior simulations of parameter values (Vehtari et al., 2017).

Elevation-dependent warming model
Our first model addresses the notion of elevation-dependent warming (EDW) by considering lake elevation as a grouping 250 structure in the data. The model further assumes that the GLOF history of a given lake is a function of its area A and net change ΔA. This dependence differs up to a constant, i.e. the varying model intercept, across elevation bands z that we define here in five quantile grouping levels (Fig. 1). The model intercept may vary across these elevation bands, whereas lake area (in 2005) and its net change remain fixed predictors. In essence, this varying-intercept model acknowledges that glacial lakes in the same elevation band may have had a common baseline susceptibility to GLOFs in the past four decades. The indicator variable ΔA 255 records whether a given lake had a net growth or shrinkage between 1990 and 2018: 260 where index z identifies the elevation band.
We obtain posterior estimates of βA = 0.79 +0.27 /-0.27 and βΔA = 0.48 +0.73 /-0.72 (95% highest density interval, HDI) that indicate that larger lakes are more likely classified as having had a GLOF, whereas net growth or shrinkage has ambivalent weight as its HDI includes zero (Fig. 4, Fig. 5, Table 4). On the population level, the low spread of intercepts (σz = 0.29 +0.68 /-0.28) estimated for each of the five elevation bands shows that elevation effects modulate the pooled model only minutely. These posterior 265 effects are positive for the lower elevation bands, but negative for the higher elevation bands. Thus, the mean posterior probability of a GLOF history, PGLOF, under this model increases slightly for lakes in lower elevations and with larger surface area in 2005. We also observe that PGLOF <0.5 regardless of reported lake elevation, and that the associated uncertainties are higher for larger lakes.

280
Black dots are lake data with (no) reported GLOF records. Thick coloured lines are mean fits, and colour shades encompass the associated 95% HDIs.

Forecasting model
Our second model refines our approach by including only relative changes in lake area before the reported GLOFs happened. 285 We can use this model to fore-or hindcast historic GLOFs in our inventory. Here we use lake area A (in 2005) and its relative change A *a from 1990 to 2005 as predictors of eleven GLOFs that occurred between 2005 and 2018 across the five elevation bands. We assume that larger and deeper lakes are more robust to relative size changes and thus also include a multiplicative interaction term between lake area and its change: 290 = S( + + * * + × * × * ) We find that lake area has a credible positive posterior weight of βA = 0.86 +0.44 /-0.43, hence greater lakes are more likely to having had a GLOF between 2005 and 2018. The weight of relative lake-area change in the 15 years before is ambiguous ( * = -0.04 +0.76 /-0.67) and so is the interaction ( × * = -0.16 +0.41 /-0.51). On average, however, relative increases in lake area 295 between 1990 and 2005 slightly decrease PGLOF. Unlike in the elevation-dependent warming model, the effects of elevation bands are less clear, while the uncertainties are more pronounced and highest for larger and shrinking lakes (Fig. 4, Fig. 6).

Glacier-mass balance model
Besides elevation, our third model considers the average historic glacier-mass balances across the HKKHN. The model assumes that mean ice losses ∆ add a distinctly regional structure to the susceptibility to GLOFs in the past four decades, given that accelerated glacier melt may raise GLOF potential (Emmer, 2017;Richardson and Reynolds, 2000). We use our seven study area regions as group-levels r and their average glacier-mass balance, derived from Brun et al. (2017), as a group-310 level predictor Δmr.We use the seven RGI regions as defined by Brun et al. (2017) as group-levels r and their average glaciermass balance as a group-level predictor Δmr. Our pooled predictors are the relative change of lake area A *b from 2005 to 2018 (to ensure a comparable time interval) and the catchment area C upstream of each lake. We replace lake area by its upstream catchment area, which is less prone to change, but well correlated to lake area.
This model returns a positive weight for catchment area (βC = 0.85 +0.50 /-0.50) and a negative weight for relative lake-area changes ( * = -0.69 +0.64 /-0.61), whereas the effect of the mean glacier-mass balance remains inconclusive (γr = -2.98 +4.87 /-6.70). On the 320 basis of higher standard deviations, we learn that effects of glaciological regions vary more than those of elevation bands (σr = 0.81 +1.60 /-0.78 and σz = 0.48 +1.19 /-0.47). When training this model on a subset of glacial lakes with documented GLOFs that happened after 2000 (i.e. including only those in the interval covered by glacier-mass balance data), posterior estimates of σr increase to 1.11 +1.77 /-1.03, further underlining our result that glacier-mass balance credibly affects PGLOF. This is also reflected in the posterior distributions across the glacier-mass balance regions (Fig. 4) as well as the calculated group-level effects. This 325 model has the highest values of PGLOF for average lakes (i.e. all average predictor values combined) in the Nyainqentanglha Mountains and the Eastern Himalaya (Fig. 4). In contrast to the forecasting model, we observe that increases in lake area now credibly depress PGLOF (Fig. 7). Fig. 1). Black dots are lake data with (no) reported GLOF records for the interval 2005 to 2018. Thick coloured lines are mean fits, and colour shades encompass the associated 95% HDIs.

Monsoonality model
Our last model explores a synoptic influence on GLOF susceptibility by grouping the data by the summer proportion of mean annual precipitation and thus by approximate monsoonal contribution. We defined five monsoonality levels based on quantiles of the annual proportions of summer precipitation (Fig. 1). We use relative lake-area change A *c between 1990 and 2018, and catchment area C as population-level predictors, as well as the additional grouping by regional glacier-mass balance: 340 where index M identifies the monsoonality group. We find that larger catchment areas (βC = 0.82 +0.46 /-0.48) and lakes with 345 relative shrinkage ( * = -0.63 +0.59 /-0.59) credibly raise PGLOF (Fig. 4, Fig. 8). Higher standard deviations show that regional effects vary more for the mean glacial-mass balance than for monsoonality (σr = 0.79 +1.59 /-0.76 and σM = 0.40 +1.04 /-0.39), although both hardly change the pooled model trend.  Fig. 1). Black dots are lake data with (no) reported GLOF records for the interval 1990 to 2018. Thick coloured lines are mean fits, and colour shades encompass the associated 95% HDIs.

Model performance and validation
We estimate the performance of our models in terms of the posterior improvement of our prior chance of finding a lake with known outburst in the past four decades in our inventory by pure chance. We compare the posterior predictive mean PGLOF 360 with a mean prior probability that we estimate from the ~1% proportion of lakes with known GLOFs in our training data. We measure what we have learned from each model in terms of the log-odds ratio that readily translates into probabilities using Eq. (3). A positive log-odds ratio means that we obtain a higher posterior probability of attributing a historic GLOF to a given lake compared to a random draw. Negative log-odd ratios indicate lakes for which the posterior probability of a reported GLOF is lower than the prior probability. Based on this metric, all models have higher true positive than true negative rates. For a 365 prior probability informed by the historic frequency of GLOFs, the models have at least about 80% true positives, and at least 70% true negatives on average (Fig. 9, Table 5).
The values of the LOO cross-validation of the predictive capabilities show that the EDW model formally has the least favourable, i.e. higher, values for both LOO metrics (Table 5). This is potentially due to the different true positives counts in the training data sets. However, the range of estimated ELPD values between the remaining three models is small (ΔELPD = 370 1.9).

Figure 9: Average posterior log-odds ratios for true positives TP (negatives, TN), i.e. lakes with (without) a GLOF in the period 1981 -2018 (a and e) and 2005 -2018 (b-d and f-h) on the x axis for the four different models. The log-odds ratios describe here the ratio
375 of the mean posterior over the mean prior probability of classifying a given lake as having had a GLOF. We estimate the mean prior probability from the relative frequency of GLOFs in the datasets; EDW = elevation-dependent warming model.

Topographic and climatic predictors of GLOFs
We used Bayesian multi-level logistic regression to test whether several widely advocated predictors of GLOF susceptibility and glacial lake stability are credible predictors of at least one outburst in the past four decades. All four models that we considered identify lake area and catchment area as predictors with weights that credibly differ from zero with 95% 385 probability. Our model results quantitatively support qualitative notions of several basin-wide studies in the HKKHN (Bolch et al., 2011;Ives et al., 2010;Mergili andSchneider, 2011) andelsewhere (McKillop andClague, 2007), which proposed that larger moraine-dammed lakes have a higher potential for releasing GLOFs.
We also found that changes in lake area have partly inconclusive influences in the models. Two exceptions are the negative weight of lake-area changes * and * in the glacier-mass balance model and in the monsoonality model, regardless of the 390 differing intervals that these changes were determined for (Table 4). While this result formally indicates that shrinking lakes are more likely to be classified as having had a historic GLOF, the period over which these lake-area changes are valid (2005 to 2018) overlaps with the timing of eleven recorded GLOFs (Eq. 9). In other words, the lake shrinkage could be a direct consequence of these GLOFs instead of vice versa. Nonetheless, our results indicate that lake-area changes, either absolute or directional, are somewhat inconclusive in informing us whether a given lake has a recent GLOF history. One advantage of our 395 Bayesian approach is that we can express the role of lake-area changes in GLOF susceptibility by choosing different highest density intervals. For example, if we adopted a narrower, say 80% HDI for ΔA, we could be 80% certain that net lake-area growth increased PGLOF under the elevation-dependent warming model (Eq. 6). However, in the forecasting model, in which we tested whether differing data observation periods have any credible effects, the influence of lake-area change remains negligible even for <50% HDIs. We thus conclude that relative lake-area change before outburst is an inconclusive predictor. 400 This result contradicts the assumptions made in many previous studies that argued that rapidly growing lakes are the most prone to sudden outburst (Aggarwal et al., 2016;Bolch et al., 2011;Ives et al., 2010;Mergili and Schneider, 2011;Prakash and Nagarajan, 2017;Rounce et al., 2016;Wang et al., 2012).
The role of elevation in GLOF predictions is also less pronounced than that of lake or catchment area, at least as a group level.
The weights of the elevation-dependent warming model indicate that lower (higher) lakes are slightly more (less) likely to 405 have had a historic GLOF (Fig. 4), but hardly warrant any better model performance compared to the pooled (or elevationindependent) model. In the forecasting model, however, the contributions of lake elevation to PGLOF are devoid of any systematic pattern and likely reflect several, potentially combined, drivers (Fig. 4). This model was trained on fewer GLOFs and the imbalance in the data introduces more uncertainties in terms of broad 95% HDIs. Clearly, the role of elevation may need more future investigation. In terms of elevation bands, it hardly seems to aid GLOF detection with the models used here. 410 Similarly, Emmer et al. (2016) reported that lake elevation was hardly affecting GLOF hazard in the Cordillera Blanca, Peru.
Judging from the regionally averaged glacier-mass balances, our models predict the highest GLOF probabilities in the Nyainqentanglha Mountains and the Eastern Himalaya, which have had the highest historic GLOF counts (Fig. 1). The timing and seasonality of snowfall affects how glaciers respond to rising air temperatures. Observed frequencies and predicted probabilities of historic GLOFs are lowest for several glaciers with positive mass balance in the Karakoram and Western 415 Himalayas (Fig. 1, Fig. 10). Most moraine-dammed lakes in the HKKHN, however, are fed by glaciers with negative mass balances that likely help to elevate GLOF potential through increased meltwater input and glacier-tongue calving rates (Emmer, 2017;Richardson and Reynolds, 2000). This is also supported by the findings of King et al. (2019), which imply that higher rates of mass loss of lake-terminating glaciers since the 1970s might have also led to increased meltwater input into lakes adjacent to their termini. More than 70% of all lakes that burst out in the past four decades were in contact to their parent 420 glaciers (Veh et al., 2019). However, systematically recorded time series of glacier fronts are even harder to come by when compared to systematic measurements of changes in glacial-lake areas. Given that the regional glacier-mass balance is linked to synoptic precipitation patterns (Kapnick et al., 2014;King et al., 2019;Krishnan et al., 2019), our glacier-mass balance model highlights that the regional ice loss outweighs the role of monsoonality in terms of higher changes to the group-level intercepts for comparable mean PGLOF and associated uncertainties (Fig. 4, Fig. 7, Fig. 8). 425 Our results offer insights into the links between historic GLOFs and the synoptic precipitation patterns. Richardson and Reynolds (2000) presumed that seasonal floods and GLOFs are both caused by high monsoonal precipitation and summer ablation. In contrast, our results indicate that the fraction of summer precipitation changes the predictive probabilities of historic GLOFs only marginally, at least at the group level, so that deviations from a pooled model for the HKKHN are minute when compared to the spread of posterior group-level intercepts in the other models (Fig. 4). In essence, our results underline 430 the need for exploring more the interactions of both precipitation and temperature as potential GLOF triggers. It may well be that seasonal timing of heavy precipitation events and type (rain or snow) at a given lake may be more meaningful to GLOF susceptibility than annual totals or averages. Whether our finding that glacier-mass balances driven by superimposed synoptic regimes credibly influence regional GLOF susceptibility in the HKKHN is applicable to other regions, for example the Cordillera Blanca in the South American Andes (Emmer et al., 2016;Emmer and Vilímek, 2014;Iturrizaga, 2011), also needs 435 further investigation.

Model Assessment
We consider our quantitative and data-driven approach as complementary to existing qualitative and basin-wide GLOF hazard appraisals. Our models cannot replace field observations that deliver local details on GLOF-disposing factors such as moraine or adjacent rock-slope stability, presence of ice cores, glacier calving rates, or surges. Our selection of predictors is a 440 compromise between widely used predictors of GLOF susceptibility and hazard and their availability as data covering the entire HKKHN. To this end, we used lake (or catchment) area and lake-area changes as predictors, and elevation, regional glacier-mass balance, and monsoonality as group levels of past GLOF activity of several thousand moraine-dammed lakes in the HKKHN. Among the many possible combinations of predictors and group levels we focused on those few combinations with minimal correlation among the input variables. We minimised the potential for misclassification by using a purely remote-445 sensing-based inventory of GLOFs, which reduces reporting bias for GLOFs too small to be noticed or happening in unpopulated areas: more destructive GLOFs are recorded more often than smaller GLOFs in remote areas (Veh et al., 2018(Veh et al., , 2019. We are thus confident that we trained our models on lakes with a confirmed GLOF history at the expense of discarding known outbursts predating the onset of Landsat satellite coverage in 1981. We acknowledge that climate products such as precipitation can have large biases because of orographic effects or climate circulation patterns and interpolation using 450 topography (Karger et al., 2017;Mukul et al., 2017). Cross-validation of CHELSA precipitation estimates with station data has a global mean coefficient of determination R 2 of 0.77, with regional variations between 0.53 and 0.90 (Karger et al., 2017).
By accounting for orographic wind effects, CHELSA products outperform previous global datasets such as the WorldClim (Hijmans et al., 2005), especially in the rugged HKKHN topography. We stress that we therefore used all climatic data as aggregated group-level variables to avoid spurious model results. At the level of individual lakes, we thus resorted only to size, 455 elevation, and upstream catchment area as more robust predictors.
Due to strong imbalance in our training data, we opted for prior vs. posterior log-odd comparison instead of commonly applied Receiver Operating Characteristics (ROC) in estimating the predictive capabilities of our models (Saito and Rehmsmeier, 2015). In our models, only few posterior estimates of PGLOF are >0.5 and they, thus, offer very conservative estimates of a GLOF history (Fig. 10). All models have wide 95% HDIs that attest a high level of uncertainty. This observation may be 460 sobering, but nevertheless documents objectively the minimum amount of accuracy that these simple models afford for objectively detecting historic outbursts.
The low fraction of lakes with a GLOF history (~1%) curtails a traditional logistic regression model and favours instead a Bayesian multi-level approach that can handle imbalanced training data and collinear predictors (Gelman and Hill, 2007;Hille Ris Lambers et al., 2006;Shor et al., 2007). We prefer the straight-forward interpretation of posterior regression weights to 465 random forest classifiers, neural networks or support vector machines (Caniani et al., 2008;Falah et al., 2019;Kalantar et al., 2018;Taalab et al., 2018). While these methods may perform better, they disclose little about the relationship between model inputs and outputs (Blöthe et al., 2019;Dinov, 2018); much of their higher accuracy is also linked to the overwhelming number of true negatives. Yet so far, multi-criteria decision analysis or decision-making trees have been the method of choice in GLOF hazard assessments, both in High Mountain Asia (Bolch et al., 2011;Prakash and Nagarajan, 2017;Rounce et al., 2016;Wang 470 et al., 2012) and elsewhere (Emmer et al., 2016;Emmer and Vilímek, 2014;Huggel et al., 2002;Kougkoulos et al., 2018).
While these methods strongly rely on expert judgement (Allen et al., 2019), a Bayesian logistic regression encodes any prior knowledge or constraints explicitly and reproducibly as probability distributions. Still, inconsiderate or inappropriate prior choices can introduce bias (Van Dongen, 2006;Kruschke and Liddell, 2018). Therefore, we carefully considered our choice of weakly informative priors for predictors with limited prior knowledge, following the guidelines concerning regression 475 models by Gelman (2006) and Gelman et al. (2008). We also cross-checked our results when applying varying prior choices and found negligible differences in the resulting posterior distributions.
To summarise, our simple classification models hardly support the notion that elevation or changes in lake area are straightforward predictors of a GLOF history, at least for the moraine-dammed lakes that we studied in the HKKHN. Lake size and regional differences in glacier-mass balance are items that future studies of GLOF susceptibility may wish to consider 480 further. The performance of these models is moderate to good if compared to a random classification, yet associated with high uncertainties in terms of wide highest density intervals. We underline that these uncertainties have rarely been addressed, let alone quantified, in previous work. One way forward may be to create ensembles of such models to improve their predictive capability instead of relying on any single model. We quantitatively investigated the susceptibility of moraine-dammed lakes to GLOFs in major mountain regions of High Asia. 490 We used a systematically compiled and comprehensive inventory of moraine-dammed lakes with documented GLOFs in the past four decades to test how elevation, lake area and its rate of change, glacier-mass balance, and monsoonality perform as predictors and group levels in a Bayesian multi-level logistic regression. Our results show that larger lakes in larger catchments have been more prone to sudden outburst floods, as have those lakes in regions with pronounced negative glacier-mass balance.
While elevation-dependent warming (EDW) may control a number of processes conducive to GLOFs, grouping our 495 classification by elevation bands adds little to a pooled model for the entire HKKHN. Historic changes in lake area, both in absolute and relative values, have an ambiguous role in these models. We observed that shrinking lakes favour the classification as GLOF-prone, although this may arise from overlapping measurement intervals such that the reduction in lake size arises from outburst rather than vice versa. In any case, the widely adapted notion that (rapid) lake growth may be a predictor of impending outburst remains poorly supported by our model results. Our Bayesian approach allows explicit probabilistic 500 prognoses of the role of these widely cited controls on GLOF susceptibility, but also attests to previously hardly quantified uncertainties, especially for the larger lakes in our study area. While individual models offer some improvement with respect to a random classification based on average GLOF frequency, we recommend considering ensemble models for obtaining more accurate and flexible predictions of outbursts from moraine-dammed lakes.

Author contributions
This study was conceptualised by all authors. While formal analysis and methodology were conducted by MF and OK, data curation was mainly carried out by GV. Visualisations of data and results, including maps, were prepared by GV, OK and MF. 515