Glacial lakes in the Hindu Kush–Karakoram–Himalayas–Nyainqentanglha (HKKHN) region have grown rapidly in number and area in past decades, and several dozen have drained in catastrophic glacial lake outburst floods (GLOFs). Estimating the regional susceptibility of glacial lakes has largely relied on qualitative assessments by experts, which motivates a more systematic and quantitative appraisal. Against the backdrop of current climate-change projections and the potential for elevation-dependent warming, an objective and regionally consistent assessment is urgently needed. We use an inventory of 3390 moraine-dammed lakes and their documented outburst history in the past four decades to test whether elevation, lake area and its rate of change, glacier-mass balance, and monsoonality are useful inputs to a probabilistic classification model. We implement these candidate predictors in four Bayesian multi-level logistic regression models to estimate the posterior susceptibility to GLOFs. We find that larger lakes have generally been more prone to GLOFs in the past four decades, regardless of the elevation band in which they occurred. We also find that including the regional average glacier-mass balance improves the model classification. In contrast, changes in lake area and monsoonality play ambiguous roles. Our study provides the first quantitative evidence that GLOF susceptibility in the HKKHN scales with lake area, though less so with its dynamics. Our probabilistic prognoses offer an improvement over a random classification based on average GLOF frequency. Yet they also reveal some major uncertainties that have remained largely unquantified and that challenge the applicability of single models. Ensembles of multiple models could be a viable alternative for more accurately classifying the susceptibility of moraine-dammed lakes to GLOFs.

Glacial lake outburst floods (GLOFs) involve the sudden release and downstream propagation of water and sediment from naturally impounded meltwater
lakes (Costa and Schuster, 1987; Emmer, 2017). About one third of the 25 000 glacial lakes in the Hindu Kush–Karakoram–Himalayas–Nyainqentanglha
(HKKHN) region are dammed by moraines, and some of these are potentially unstable (Maharjan et al., 2018). Such impounded meltwater can overtop or incise dams
rapidly with catastrophic consequences downstream (Costa and Schuster, 1987; Evans and Clague, 1994). High Mountain Asian countries are among the
most affected by these abrupt floods when both damage and fatalities are considered (Carrivick and Tweed, 2016). For example, in June 2013, a GLOF from
Chorabari Lake in the Indian state of Uttarakhand caused

Current scenarios suggest that atmospheric warming may change the susceptibility of HKKHN glacial lakes to sudden outburst floods: the most recent
projections of the Intergovernmental Panel on Climate Change (IPCC) link rising air temperatures to the decay of low-lying glaciers and permafrost, to
increases in lake number and area, to more frequent rain-on-snow events at higher elevations, and to changes in precipitation seasonality (Hock et al., 2019). Surface air temperature in the HKKHN
rose by about 0.1

Frequently used predictors of GLOF susceptibility and hazard in the HKKHN.

Previous work on GLOF susceptibility and hazard in the region focused on identifying or classifying potentially unstable glacial lakes, including local case studies largely informed by fieldwork, dam-breach models (Koike and Takenaka, 2012; Somos-Valenzuela et al., 2012, 2014), and basin-wide assessments (Bolch et al., 2011; Mool et al., 2011; Rounce et al., 2016; Wang et al., 2011). GLOF hazard appraisals for the entire HKKHN, however, remain rare (Veh et al., 2020). Most basin-wide studies proposed qualitative to semi-quantitative decision schemes using selective lists of presumed GLOF predictors (Table 1; Rounce et al., 2016). Yet researchers have used subjective rules to choose these variables and their associated thresholds, leading to diverging hazard estimates (Rounce et al., 2016). Expert knowledge has thus been essential in GLOF hazard appraisals despite an increasing amount of freely available climatic, topographic, and glaciological data. Statistical models can help to estimate the occurrence probability of GLOFs and thus reduce this subjective bias (Emmer and Vilímek, 2013). For example, Wang et al. (2011) classified the outburst potential of moraine-dammed lakes on the southeastern Tibetan Plateau by applying a fuzzy consistent matrix method. As inputs, they used the size of the parent glacier, the distance and slope between lake and glacier snout, and the mean steepness of the moraine dam and the glacier snout to derive different nominal hazard categories. Such qualitative ranking schemes are accessible to a broad audience, including policy makers, but they are difficult to compare with one another and may oversimplify uncertainties.

One way to deal with these uncertainties more objectively is a Bayesian approach. Here we combined such probabilistic reasoning with data-driven models. Specifically, we tested how well some of the more widely adopted predictors of GLOF susceptibility and hazard fare in a multi-level logistic regression that is informed more by data than by expert opinion. We checked how well this approach identifies glacial lakes in the HKKHN that had released GLOFs in the past four decades. Our method estimates the probability of correctly detecting historic GLOFs from a set of predictors that act as proxies subsuming various physical processes described as relevant to GLOFs. The triggering mechanisms of these GLOFs, however, are rarely reported. Thus, we discuss what more we can learn about how these historic GLOFs were linked to readily available measures of topography, monsoonality, and glaciological change. Our model results provide a posterior probability of outburst conditioned on detection, which may be used as a relative metric of GLOF release from a given lake. Our approach is therefore an alternative to a formal assessment of moraine-dam stability, which is (geo-)technically feasible only at selected sites and at scales much finer than our regional and decadal focus.

Overview map of the HKKHN showing the distribution of moraine-dammed lakes in 1

We studied glacial lakes of the Hindu Kush–Karakoram–Himalayas–Nyainqentanglha (HKKHN) region that we defined here as the Asian mountain ranges between
16 to 39

Details on tested predictors and our reasoning for selection based on their commonly reported physical links to GLOF susceptibility.

Guided by these projections, we selected several widely used glacial lake susceptibility predictors (Table 2).

We used

Glacial

We also tested the impact of upstream

Similar to changes in lake area, glacier dynamics are frequently mentioned though rarely incorporated quantitatively in susceptibility appraisals
(Bolch et al., 2011; Ives et al., 2010). This motivated us to consider the average changes in

Meteorological drivers entered previous qualitative GLOF hazard appraisals mostly as (the probability of) extreme monsoonal precipitation events: the
Kedarnath GLOF disaster, for example, was triggered by intense surface runoff (Huggel et al., 2004; Prakash and Nagarajan, 2017). Heavy rainfall may
also trigger landslides or debris flows from adjacent hillslopes followed by displacement waves that overtop moraine dams (Huggel et al., 2004;
Prakash and Nagarajan, 2017). Elevated lake levels during the monsoon season also raise the hydrostatic pressure acting on moraine dams (Richardson
and Reynolds, 2000). Furthermore, different precipitation regimes and climatic preconditions may also influence moraine-dam failure mechanics (Wang
et al., 2012). Intense precipitation occurs in our study region largely during the summer monsoon, so we derived a synoptic measure of

Data sources and workflow; EDW

We extracted information on these characteristics for glacial lakes recorded in two inventories. First, we used the ICIMOD database of 25 614 lakes
manually mapped from Landsat imagery acquired in 2005

We used logistic regression to estimate the probability that a given lake in the HKKHN had a reported GLOF in the past four decades. This method was first applied to moraine-dammed lakes in British Columbia, Canada (McKillop and Clague, 2007). Logistic regression estimates a
binary outcome

Schematic comparison of global vs. multi-level logistic regression models.

Our strategy was to explore commonly reported predictors of GLOF susceptibility and dam stability as candidate predictors (Fig. 2, Tables 1 and 2). We further acknowledged that data on moraine-dammed lakes in the HKKHN are structured, reflecting, for example, variations in topography and in synoptic regime, such as the summer monsoon in the eastern HKKHN and the westerlies in the western HKKHN. Different data sources, collection methods, and resolutions also add structure. GLOF studies routinely acknowledge this structure, often as a caveat, but rarely account for it. Ignoring such structure can lead to incorrect inference by inflating the statistical significance of irrelevant or inappropriate model parameter estimates (Austin et al., 2003). To explicitly address this issue, we chose a multi-level logistic regression as a compromise between a single pooled model and individual models for each group in the data (Fig. 3; Gelman and Hill, 2007; Shor et al., 2007).
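The compromise between one pooled model and separate models per group can be made concrete with a small sketch. The Python below is illustrative only and is not the brms model used here: it assumes a varying-intercept logistic model in which a lake's log-odds of a GLOF history is a population-level intercept plus a group-level offset (for example, for an elevation band) plus a shared slope times a standardised predictor such as lake area. It then borrows the normal-approximation partial-pooling formula of Gelman and Hill (2007), which strictly applies to linear models, purely to show how a data-poor group estimate is shrunk toward the population mean. All function names, group labels, and parameter values are hypothetical.

```python
import math


def inv_logit(eta: float) -> float:
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def glof_probability(x_std, group, alpha_pop, alpha_group, beta):
    """Hypothetical varying-intercept logistic model for one lake.

    x_std:       standardised predictor, e.g. (log) lake area
    group:       key of the lake's group, e.g. an elevation band
    alpha_pop:   population-level (pooled) intercept on the log-odds scale
    alpha_group: dict of group-level offsets from alpha_pop
    beta:        population-level slope shared by all groups
    """
    eta = alpha_pop + alpha_group[group] + beta * x_std
    return inv_logit(eta)


def partial_pool(group_mean, n_group, sigma_within, grand_mean, sigma_between):
    """Precision-weighted compromise between no pooling and complete pooling.

    Groups with few observations are pulled toward grand_mean;
    data-rich groups stay close to their own mean.
    """
    w_group = n_group / sigma_within ** 2   # precision of the group's own data
    w_pop = 1.0 / sigma_between ** 2        # precision of the population prior
    return (w_group * group_mean + w_pop * grand_mean) / (w_group + w_pop)


if __name__ == "__main__":
    offsets = {"low": 0.3, "mid": 0.0, "high": -0.4}  # hypothetical offsets
    p = glof_probability(1.0, "high", alpha_pop=-4.0, alpha_group=offsets, beta=0.8)
    print(f"P(GLOF history) = {p:.4f}")
    # Equal precisions here, so the group estimate lands halfway to the grand mean.
    print(partial_pool(group_mean=-2.0, n_group=4, sigma_within=2.0,
                       grand_mean=-4.0, sigma_between=1.0))
```

With no observations in a group, `partial_pool` returns the grand mean (complete pooling); with very many, it approaches the group's own mean (no pooling). A fitted multi-level model learns the between-group spread from the data instead of fixing it by hand as done here.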

We recast Eq. (

We used the statistical programming language R with the package brms, which estimates joint posterior distributions using Hamiltonian Monte Carlo
with the No-U-Turn Sampler (NUTS; Bürkner, 2017). We ran four chains, each with 500 warm-up iterations followed by 1500 sampling iterations, and checked for numerical
divergences or other pathological issues. We only considered models with all values of

Prior distributions for group- and population-level effects.

Unless stated otherwise, we used a weakly informative half Student's

We estimated the predictive performance of all models with leave-one-out (LOO) cross-validation as part of the brms package (Bürkner, 2017). LOO metrics such as the expected log pointwise predictive density (ELPD) summarise the predictive error of Bayesian models, similar to the Akaike information criterion or related metrics of model selection (Vehtari et al., 2017). They are computed from the pointwise log-likelihood evaluated at the posterior simulations of parameter values (Vehtari et al., 2017).
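To make the link between pointwise log-likelihoods and the LOO metrics concrete, the Python sketch below computes a plain importance-sampling estimate of the ELPD from a matrix of log-likelihood values (posterior draws × observations), together with the LOO information criterion, LOOIC = −2 × ELPD. This is a simplification under our own assumptions: the LOO implementation behind brms additionally applies Pareto smoothing to the importance weights (PSIS-LOO; Vehtari et al., 2017), which this sketch omits, so the raw estimate can be unstable when a few draws dominate. The function names and toy numbers are hypothetical.

```python
import math


def _log_mean_exp(values):
    """Numerically stable log(mean(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def elpd_loo_is(log_lik):
    """Raw importance-sampling LOO estimate (no Pareto smoothing).

    log_lik: list of S posterior draws, each a list of N pointwise
             log-likelihoods log p(y_i | theta_s).
    Returns (elpd_loo, list of pointwise contributions).
    """
    n_obs = len(log_lik[0])
    pointwise = []
    for i in range(n_obs):
        # Importance ratios 1 / p(y_i | theta_s) equal exp(-log_lik) ...
        neg = [-draw[i] for draw in log_lik]
        # ... and p(y_i | y_-i) is approximated by 1 / mean(ratios),
        # i.e. the harmonic mean of the likelihood over the draws.
        pointwise.append(-_log_mean_exp(neg))
    return sum(pointwise), pointwise


def looic(elpd):
    """LOO information criterion on the deviance scale (lower is better)."""
    return -2.0 * elpd


if __name__ == "__main__":
    # Three hypothetical posterior draws for two lakes.
    draws = [[math.log(0.9), math.log(0.2)],
             [math.log(0.8), math.log(0.3)],
             [math.log(0.85), math.log(0.25)]]
    elpd, _ = elpd_loo_is(draws)
    print(elpd, looic(elpd))
```

Because LOOIC is simply −2 × ELPD, a model with a higher LOOIC necessarily has a lower ELPD on the same data; both rank models identically.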

Our first model addresses the notion of elevation-dependent warming (EDW) by considering lake elevation as a grouping structure in the data. The model
further assumes that the GLOF history of a given lake is a function of its area

Posterior pooled and group-level intercepts for the four models considered. EDW

Elevation-dependent warming model: posterior probabilities

Summary of the results of our four models. CI

We obtain posterior estimates of

Our second model refines our approach by including only relative changes in lake area that occurred before the reported GLOFs. We can use this model to
fore- or hindcast historic GLOFs in our inventory. Here we use lake area

Forecasting model: posterior probabilities

We find that lake area has a credible positive posterior weight of

Besides elevation, our third model considers the average historic glacier-mass balances across the HKKHN. The model assumes that mean ice
losses

Glacier-mass balance model: posterior probabilities

This model returns a positive weight for catchment area (

Monsoonality model: posterior probabilities

Our last model explores a synoptic influence on GLOF susceptibility by grouping the data by the summer proportion of mean annual precipitation and
thus by approximate monsoonal contribution. We defined five monsoonality levels based on quantiles of the annual proportions of summer precipitation
(Fig. 1). We use relative lake-area change

Average posterior log-odds ratios for true positives (TP) (true negatives, TN), i.e. lakes with (without) a GLOF in the period 1981–2018 (

Overview of model validation measures for the predictive capabilities of our models. LOOIC

We estimate the performance of our models as the posterior improvement over our prior chance of picking, purely at random, a lake in our inventory
with a known outburst in the past four decades. We compare the posterior predictive mean

The LOO cross-validation of the predictive capabilities shows that the EDW model formally has the least favourable values for both LOO metrics
(Table 5), i.e. the highest LOOIC and, correspondingly, the lowest ELPD. This is potentially due to the different true positive counts in the training datasets. However, the range of
estimated ELPD values between the remaining three models is small (

We used Bayesian multi-level logistic regression to test whether several widely advocated predictors of GLOF susceptibility and glacial lake stability
are credible predictors of at least one outburst in the past four decades. All four models that we considered identify

We also found that

The role of

Mean posterior probabilities of HKKHN glacial lakes for having had a GLOF history (

Judging from the regionally averaged

Our results offer insights into the links between historic GLOFs and the

We consider our quantitative and data-driven approach as complementary to existing qualitative and basin-wide GLOF hazard appraisals. Our models
cannot replace field observations that deliver local details on GLOF-predisposing factors such as moraine or adjacent rock-slope stability, the presence of
ice cores, glacier calving rates, or surges. Our selection of predictors is a compromise between widely used predictors of GLOF susceptibility and
hazard and their availability as data covering the entire HKKHN. To this end, we used lake (or catchment) area and lake-area changes as predictors, and elevation, regional glacier-mass balance, and monsoonality as group levels, to model the past GLOF activity of several thousand moraine-dammed lakes in the
HKKHN. Among the many possible combinations of predictors and group levels, we focused on the few combinations with minimal correlation among the
input variables. We minimised the potential for misclassification by using a purely remote-sensing-based inventory of GLOFs, which reduces the reporting
bias against GLOFs that are too small to be noticed or that happen in unpopulated areas: more destructive GLOFs are recorded more often than smaller GLOFs in remote
areas (Veh et al., 2018, 2019). We are thus confident that we trained our models on lakes with a confirmed GLOF history, at the expense of discarding
known outbursts predating the onset of Landsat satellite coverage in 1981. We acknowledge that gridded climate products, such as precipitation estimates, can have large
biases arising from orographic effects, climate circulation patterns, and topography-based interpolation (Karger et al., 2017; Mukul et al.,
2017). Cross-validation of CHELSA precipitation estimates with station data has a global mean coefficient of determination

Because of the strong class imbalance in our training data, we opted for a comparison of prior and posterior log-odds instead of the commonly applied receiver operating
characteristic (ROC) curves in estimating the predictive capabilities of our models (Saito and Rehmsmeier, 2015). In our models, only a few posterior
estimates of

The low fraction of lakes with a GLOF history (

To summarise, our simple classification models hardly support the notion that elevation or changes in lake area are straightforward predictors of a GLOF history, at least for the moraine-dammed lakes that we studied in the HKKHN. Lake size and regional differences in glacier-mass balance are factors that future studies of GLOF susceptibility may wish to consider further. The performance of these models is moderate to good when compared with a random classification, yet it comes with high uncertainties in the form of wide highest density intervals. We underline that these uncertainties have rarely been addressed, let alone quantified, in previous work. One way forward may be to create ensembles of such models to improve their predictive capability instead of relying on any single model.
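The comparison with a random classification and the width of the highest density intervals can be illustrated with two small helpers. The Python sketch below reflects one plausible reading of the prior-versus-posterior log-odds comparison, not the authors' code: it assumes that the prior chance of picking a lake with a GLOF history at random equals the base rate (lakes with a GLOF divided by all lakes), expresses a lake's posterior mean probability as a log-odds ratio relative to that base rate (positive values mean an improvement over chance), and computes an empirical highest density interval as the narrowest interval containing a given fraction of posterior draws. The counts and draws are made up.

```python
import math


def log_odds(p):
    """Log-odds of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def log_odds_ratio_vs_chance(p_posterior, n_glof, n_lakes):
    """Posterior log-odds minus the log-odds of picking a GLOF lake at random."""
    base_rate = n_glof / n_lakes
    return log_odds(p_posterior) - log_odds(base_rate)


def hdi(draws, mass=0.95):
    """Narrowest interval that contains at least `mass` of the sorted draws."""
    xs = sorted(draws)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of draws inside the interval
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


if __name__ == "__main__":
    # Hypothetical: 20 lakes with a GLOF among 2000, posterior mean p = 0.05.
    print(log_odds_ratio_vs_chance(0.05, 20, 2000))
    print(hdi([0.01, 0.02, 0.02, 0.03, 0.05, 0.08, 0.2, 0.4], mass=0.75))
```

Unlike an equal-tailed credible interval, the highest density interval follows the bulk of a skewed posterior, which matters for probabilities bunched near zero.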

We quantitatively investigated the susceptibility of moraine-dammed lakes to GLOFs in major mountain regions of High Asia. We used a systematically compiled and comprehensive inventory of moraine-dammed lakes with documented GLOFs in the past four decades to test how elevation, lake area and its rate of change, glacier-mass balance, and monsoonality perform as predictors and group levels in a Bayesian multi-level logistic regression. Our results show that larger lakes in larger catchments have been more prone to sudden outburst floods, as have lakes in regions with pronounced negative glacier-mass balance. While elevation-dependent warming (EDW) may control a number of processes conducive to GLOFs, grouping our classification by elevation bands adds little to a pooled model for the entire HKKHN. Historic changes in lake area, in both absolute and relative terms, play an ambiguous role in these models. We observed that shrinking lakes favour the classification as GLOF-prone, although this may be an artefact of overlapping measurement intervals, such that the reduction in lake size results from the outburst rather than vice versa. In any case, the widely adopted notion that (rapid) lake growth may be a predictor of impending outburst remains poorly supported by our model results. Our Bayesian approach allows explicit probabilistic prognoses of the role of these widely cited controls on GLOF susceptibility but also reveals uncertainties that have hardly been quantified before, especially for the larger lakes in our study area. While individual models offer some improvement over a random classification based on average GLOF frequency, we recommend considering ensemble models for obtaining more accurate and flexible predictions of outbursts from moraine-dammed lakes.
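One simple way to build such an ensemble is sketched below under our own assumptions; it is not a procedure used in this study. The posterior GLOF probabilities of several models are averaged with weights proportional to exp(ELPD), as in pseudo-Bayesian model averaging (pseudo-BMA), so that models with better LOO predictive performance contribute more. Stacking of predictive distributions is a more robust alternative. The model keys and all numbers are hypothetical.

```python
import math


def pseudo_bma_weights(elpds):
    """Weights proportional to exp(ELPD), computed stably as a softmax.

    elpds: dict mapping a model name to its LOO ELPD value.
    """
    m = max(elpds.values())  # subtract the maximum to avoid overflow/underflow
    raw = {name: math.exp(e - m) for name, e in elpds.items()}
    total = sum(raw.values())
    return {name: r / total for name, r in raw.items()}


def ensemble_probability(probs, weights):
    """Weighted average of per-model posterior GLOF probabilities for one lake."""
    return sum(weights[name] * probs[name] for name in weights)


if __name__ == "__main__":
    # Hypothetical ELPD values for four models of the kind discussed above.
    elpds = {"edw": -160.0, "forecast": -152.0,
             "mass_balance": -151.0, "monsoon": -153.0}
    w = pseudo_bma_weights(elpds)
    lake = {"edw": 0.02, "forecast": 0.05, "mass_balance": 0.06, "monsoon": 0.04}
    print(w)
    print(ensemble_probability(lake, w))
```

Because the weights are exponential in ELPD, differences of only a few ELPD units already concentrate most of the weight on the best-scoring models; this is one reason why stacking is often preferred when several models predict similarly well.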

This study is based on freely available data. Shuttle Radar Topography Mission (SRTM) data are available from the US Geological Survey (

This study was conceptualised by all authors. While formal analysis and methodology were conducted by MF and OK, data curation was mainly carried out by GV. Visualisations of data and results, including maps, were prepared by GV, OK, and MF. MF prepared the original manuscript; OK, GV, and AW reviewed and edited the writing.

The authors declare that they have no conflict of interest.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research was funded by the Deutsche Forschungsgemeinschaft (DFG) via the graduate research training group NatRiskChange at the University of Potsdam (

This research has been supported by the Deutsche Forschungsgemeinschaft (grant nos. GRK 2043/1 and GRK 2043/2).

This paper was edited by Tobias Bolch and reviewed by Adam Emmer and Holger Frey.