the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
A Bayesian hierarchical model for glacial dynamics based on the shallow ice approximation and its evaluation using analytical solutions
Giri Gopalan
Birgir Hrafnkelsson
Guðfinna Aðalgeirsdóttir
Alexander H. Jarosch
Finnur Pálsson
Bayesian hierarchical modeling can assist the study of glacial dynamics and ice flow properties. This approach will allow glaciologists to make fully probabilistic predictions for the thickness of a glacier at unobserved spatiotemporal coordinates, and it will also allow for the derivation of posterior probability distributions for key physical parameters such as ice viscosity and basal sliding. The goal of this paper is to develop a proof of concept for a Bayesian hierarchical model constructed, which uses exact analytical solutions for the shallow ice approximation (SIA) introduced by Bueler et al. (2005). A suite of test simulations utilizing these exact solutions suggests that this approach is able to adequately model numerical errors and produce useful physical parameter posterior distributions and predictions. A byproduct of the development of the Bayesian hierarchical model is the derivation of a novel finite difference method for solving the SIA partial differential equation (PDE). An additional novelty of this work is the correction of numerical errors induced through a numerical solution using a statistical model. This errorcorrecting process models numerical errors that accumulate forward in time and spatial variation of numerical errors between the dome, interior, and margin of a glacier.
 Article
(2521 KB) 
Supplement
(369 KB)  BibTeX
 EndNote
The shallow ice approximation (SIA) is a nonlinear partial differential equation (PDE) that describes ice flow when glacier thickness is relatively small compared to the horizontal dimensions. Derived from the principle of mass conservation, the SIA PDE depends on two key physical parameters: ice viscosity and basal sliding (sometimes described as basal friction or drag). The primary objective of this paper is to develop a Bayesian hierarchical model (BHM) for glacier flow utilizing the framework espoused by Wikle (2016) and Cressie and Wikle (2015), which allows one to (1) infer ice viscosity and basal sliding parameters and (2) make probabilistic predictions for glacial thickness at unobserved spatiotemporal coordinates. This BHM relies upon a finite difference scheme for solving the SIA that is inspired by the Lax–Wendroff method (Hudson, 1998). To validate this BHM, we utilize exact analytical solutions from Bueler et al. (2005). Hence, in addition to the development of a BHM for shallow glaciers, this paper serves as a case study for the strategy of using exact analytical solutions to validate or tune BHMs governed by physical dynamics. Moreover, the BHM developed can be applied to the general “physical–statistical” problem (Berliner, 2003). This BHM is verified and diagnosed through a combination of assessments of posterior probability intervals, checks of predictive accuracy for glacial thickness prediction, and a comparison between observed and expected errors due to the numerical solution of the SIA.
1.1 An overview of Bayesian modeling and BHMs
Before describing how BHMs are used in physical–statistical models, particularly for geophysical problems, a very brief overview of Bayesian modeling and Bayesian hierarchical modeling is given for the uninitiated reader. A main component of Bayesian statistics is the use of probability distributions to model parameters thought to be fixed quantities (i.e., scientific constants); this assumption allows one to use rules of conditional probability (i.e., Bayes' theorem) to derive probability distributions for scientific quantities of interest, such as physical constants or predictions of future quantities of a system being studied. Typically, the major assumptions required as input to the analysis are prior distributions for parameters as well as a probabilistic model for the data. The output is a probability distribution for parameters or predictions conditional on data that have been collected or observed; canonically, this is referred to as the posterior distribution.
A BHM is a Bayesian model in which the probabilistic model for data is specified in a hierarchy. Working with such a hierarchy has a number of advantages – it is usually easier to conceptualize the probabilistic model for the data, and it is also easier to model various parts of a system of interest modularly instead of all at once. Such an approach is conducive to the construction of a probabilistic model that tightly corresponds to a scientific system of interest, which is naturally thought of in separate components or modules. In a BHM, the rules of conditional probability can be used to specify the relevant distributions. For example, let us consider a mock system that has parameter vector θ, an intermediate unobserved vector S, and observations Y. θ might be statistical or physical parameters, S could be a quantity of scientific interest, and Y could be noisy observations of S. A schematic for such a model is given in Fig. 1, and the joint probability distribution is
The distribution p(θ) represents prior beliefs about parameters before data are collected, while p(Sθ) represents prior knowledge or assumptions for how S is generated given parameters. For instance, this prior knowledge could entail clustering or some dependence between the elements of S. The process that models Y conditional on S and θ is $p\left(\mathit{Y}\right\mathit{S},\mathit{\theta})$. The posterior distribution of scientific quantities of interest, $p(\mathit{\theta},\mathit{S}\mathit{Y})$, is proportional to $p(\mathit{Y},\mathit{S},\mathit{\theta})$ by Bayes' theorem. Estimates and assessments of uncertainty of scientific parameters and quantities can be extracted from the posterior distribution.
1.2 An overview of physical–statistical modeling with BHMs
The case for applying Bayesian hierarchical modeling and methodology in geophysics is strongly made by Berliner (2003), which he describes as “physical–statistical modeling”. Particularly, employing the Bayesian hierarchical approach has the primary advantage of incorporating all relevant sources of uncertainty and randomness into one coherent probabilistic framework. The sources typically modeled together are (1) measurement errors in the data collection process; (2) lack of full knowledge of the precise functional form of the underlying physical equations describing the physical phenomenon being modeled, or else simplification of the physical system description; (3) numerical errors induced while approximating the solution to a system of PDEs; and (4) lack of precise knowledge of fundamental parameters (constants) in the underlying PDEs describing said phenomenon. In the Bayesian hierarchical framework (Berliner, 1996; Cressie and Wikle, 2015; Wikle, 2016) each of these sources of uncertainty is modeled by conditioning on the appropriate quantities, and inference is performed by sampling from or approximating the posterior distribution (the distribution of the unknown quantities of interest conditional on the observed data).
At the highest level of a BHM, prior probability distributions are laid out for the physical parameters of interest. At the intermediary level, a probability distribution for the physical process of interest is laid out conditional on the parameters, which is typically motivated by a numerical scheme for solving PDEs. In particular, this level may be modeled as the sum of the output from a numerical solver and an errorcorrecting process. Finally, at the observed level, a probability distribution is set forth for the observed data conditional on the latent physical process and other relevant measurement parameters, which include variances of measuring procedures or devices. The product of these probability distributions specifies the joint distribution of all relevant quantities, which is proportional to the posterior distribution by the definition of conditional probability. While a traditional analysis may handle each of these disparate sources of uncertainty in an ad hoc and disjointed fashion, the Bayesian hierarchical approach leverages probability measures to cohesively model major sources of uncertainty and undertake inference in a principled manner. Figure 2 diagrams what a prototypical physical–statistical Bayesian hierarchical model might look like.
While the BHM approach to physical–statistical problems offers many advantages, it is not an infallible approach. In particular, while constructing a BHM may be straightforward, actually fitting a BHM to data can be computationally difficult. In the analysis that follows, there are only one to two physical parameters and the likelihood function is tractable, so posterior computation is not difficult. In more complex scenarios with many physical parameters (e.g., a basal sliding field with a parameter for each grid point), it becomes more difficult to compute the posterior or draw samples from it. There are now many new tools, however, for Bayesian inference of complicated and high dimensional posterior distributions, such as Stan (Stan Development Team, 2018) and INLA (Rue et al., 2017). Another potential difficulty in using BHMs for physical–statistical problems is that solving for a set of dynamical equations with a numerical method can be computationally onerous, generally speaking; while this is not a detriment in the work that follows, this can be a problem for posterior computation. One way to circumvent this issue is to emulate a numerical solver, using techniques as in Hooten et al. (2011). Another methodology that can be used to efficiently solve PDEs using Bayesian numerical analysis comes from Owhadi and Scovel (2017). Finally, Calderhead et al. (2008) suggest methodology to avoid explicitly solving ordinary differential equations by using Gaussian processes.
To put the contributions of this work into context, we briefly review glaciology papers that have employed Bayesian modeling. In Berliner et al. (2008), a Bayesian hierarchical approach is used to model ice streams in one spatial dimension, and an errorcorrecting process is utilized to account for a simplification in the physical model. A combination of Markov chain Monte Carlo and empirical Bayes methodology is used to fit the model, and basal shear stress and resistive stresses are included. Furthermore, wavelets are used for dimensionality reduction purposes in order to make the computations more feasible. In Pralong and Gudmundsson (2011), a Bayesian model is constructed for an ice stream where the likelihood and prior are Gaussian. The observed data are surface topography, horizontal and vertical surface velocities, and the latent system state is basal topography and slipperiness. The goal is to infer the system state given the observed data, and ultimately a maximum a posteriori point estimate is used for inference in conjunction with an iterative method for posterior maximization. Physics is incorporated by solving for the steady state solution with a finite element method (FEM) solver, given the system state. In Brinkerhoff et al. (2016) a flowline model of the SIA is considered with vertically integrated velocities. Gaussian process priors are used for all unknowns, and the Metropolis–Hastings algorithm is used to fit the model. The approach yields convincing results in simulations and a real data set. In Isaac et al. (2015), numerical methods are presented for solving a nonlinear Stokes equation boundary value problem for an ice sheet in Antarctica. The method ultimately uses a low rank approximation to a covariance matrix for the posterior distribution of a basal parameter field. Finally, and perhaps most directly related to this research, in Minchew et al. (2015) interferometric synthetic aperture radar (InSAR) is used to determine velocity fields at Langjökull and Hofsjökull in early June 2012. The velocity directions match the surface gradient, but magnitudes do not appear to coincide with the theoretical predictions of other authors (likely due to the inappropriate modeling of basal sliding).
The same approach within this work can be used for nonSIA problems in cryosphere science, and the Bayesian hierarchical model does not necessitate analytical solutions; the analytical solutions are used for the evaluation of the particular BHM in the paper based upon the SIA. However, in general, the biggest difficulty will be in developing a statistical errorcorrecting process that appropriately models numerical errors for an arbitrary scenario, where a numerical solver for a different set of dynamical equations is used. In the SIA context, we can rely on prior studies of Bueler et al. (2005) to tell us something about how the numerical errors will look in the SIA case – i.e., spatial variation in the scale of numerical errors between the dome, interior, and margin. This error pattern will not hold in general for other geometries and systems, and so different prior studies must be utilized or, if these do not exist, the hierarchical model must be extended to include a more general model for the errorcorrecting process (e.g., a spatially varying field for the log of the scale of numerical errors with a Gaussian process prior).
The main differentiating contribution of this paper is to utilize the exact analytical solutions from Bueler et al. (2005) to evaluate the BHM employed. An additional novelty is the derivation and utilization of a novel finite difference method for solving the SIA PDE that operates in two spatial dimensions; consequently, the Bayesian model employed also operates in two spatial dimensions, in addition to time. Finally, we explicitly model the errors due to a numerical solver with a spatiotemporal statistical process, which accounts for different scales of spatial variability within the dome, within the interior, and within the margin of the glacier, as well as accumulation of numerical errors forward in time.
2.1 Shallow ice approximation
The physics of glaciers is an extensive topic; hence, only the portions which are most relevant to this paper are described. The reader is pointed to the comprehensive works by Cuffey and Paterson (2010) and van der Veen (2017) for further reading on the subject. PDEs for glaciers are derived from the following considerations. First, glaciers are modeled as very slowly moving and viscous fluids. By applying the principle of mass conservation, the net ice flux moving in or out of an infinitesimal column of the glacier located at some spatial coordinate, plus the net mass change due to precipitation or melting, yields the change in the height of the column over an infinitesimal time interval. Such a heuristic argument provides a PDE in two dimensions for a glacier, with averaged velocities in two spatial dimensions. The PDE relates the time derivative of the thickness of the glacier to the flux and net mass change (i.e., mass balance). The main assumptions are that ice is isotropic and homogeneous and also that longitudinal and transverse stress terms can be ignored, which is reasonable when the overall thickness of the glacier is small in comparison to its width. Under these assumptions, the velocity of the ice is made up of two additive components. The first component of the velocity is based upon deformation due to gravity, which acts in the direction of steepest descent of the surface and is a function of the ice viscosity parameter. The second component of velocity also acts along the gradient of the glacier surface and is a function of the basal sliding parameter field. The formulations stem from Glen's flow law (Glen, 1958, 1955) and Weertman's sliding relation (Weertman, 1964).
Written in terms of glacial thickness, $H(x,y,t)$, the SIA PDE is
Here $H(x,y,t)$ is the thickness of the glacier at spatial coordinate (x,y) and time t, $\stackrel{\mathrm{\u203e}}{u}$ is the average velocity in the x direction, and $\stackrel{\mathrm{\u203e}}{v}$ is the average velocity in the y direction. This model is vertically integrated, and hence only two spatial dimensions are modeled. $R(x,y,t)$ is the bedrock elevation, which is assumed to be constant in time and thus can be written as R(x,y); $\dot{b}(x,y,t)$ is the mass balance field, B and C_{0}γ are physical parameters governing the viscosity and basal sliding; ρ governs the mass density of the ice; and finally n is Glen's flow law constant, typically set to 3. Initial conditions (i.e., $H(x,y,\mathrm{0})$) are assumed to be given, and the boundary condition H≥0 is assumed, just as in Table 2 of Bueler et al. (2005). Additional derivations and details on the SIA are covered in a variety of sources, including Fowler and Larson (1978), Hutter (1982), Hutter (1983), and Flowers et al. (2005).
Bueler et al. (2005)It is important to make explicit that there are some limitations of this PDE. Besides ignoring longitudinal and transverse stress terms, the PDE does not model subglacial hydrology, tunneling systems, jökulhlaups, or surges, the dynamics of which are believed to contribute to dynamics of glaciers as a whole. Nonetheless, one hopes these equations may serve as a first approximation for shallow glacier dynamics. In addition to dynamics, another important physical consideration of glaciers is the relationship between temperature and viscosity, which follows an Arrhenius relationship (Cuffey and Paterson, 2010). However, in the context of Icelandic glaciers like Langjökull, this is not consequential since they are temperate (i.e., their temperature is at melting point).
2.2 Bayesian hierarchical model
In this section, we provide an overview and setup of the BHM employed in addition to notation for the key parameters, both statistical and physical. The reader is referred, however, to Table 1 for a summary of the model parameters utilized and a schematic illustrating the BHM in Fig. 3. We use index i to refer to spatial coordinates (for this model space is assumed to be discretized into squares) and index j to refer to time coordinates. Furthermore, the notation ${\mathit{S}}_{.,j}$ refers to the surface elevation at all spatial coordinates for a particular time index j. Keeping in line with the Bayesian hierarchical modeling framework from Wikle (2016) and Cressie and Wikle (2015), we delineate the models used for the data level, process level, and parameter level. The primary inferential goals are to infer physical process parameters (i.e., ice viscosity and basal sliding) and to predict the height of the glacier at various time points and spatial locations besides those that have been observed (aligned to a grid for which we have bedrock and initial surface height conditions). Within the Bayesian framework, all inferential goals may be achieved by determining the posterior distribution of these quantities (i.e., their probability distributions conditioned on observed data).
At the data level, the observed height for each grid point is modeled with a normal distribution (denoted with the notation N(μ,τ^{2}), where μ is the mean and τ^{2} is the variance), where the mean is the physical process value, and the variance is assumed to be known. In particular it is assumed that ${\mathit{Y}}_{ij}\sim N({\mathit{S}}_{ij},{\mathit{\sigma}}^{\mathrm{2}})$, where Y_{ij} is the observed surface elevation of the glacier at location i and time index j, S_{ij} is the latent (i.e., unobserved) surface elevation at location i and time index j (equivalent to sum of the glacier thickness and bedrock level), and σ^{2} is the variance of the measurement errors for the surface height observations, a fixed and known quantity. The number of observed spatial indices is assumed to be much smaller than the number of total spatial indices modeled at the latent level.
At the process level, ${\mathit{S}}_{.,j}=f({\mathit{S}}_{\mathrm{0}},B,\dot{b},{C}_{\mathrm{0}}\mathit{\gamma},j)+{\mathit{X}}_{j}$, where f is a numerical solution to the SIA at time index j, and X_{j} is an errorcorrecting process at time index j. A finite difference version of the SIA PDE is described in full detail in Appendix A. In principle, however, the function f may be derived from other numerical solvers. Additionally, it should be made clear that f is the output of a numerical solver for the underlying dynamics. Also, S_{0} denotes the glacier surface elevation values at the initial time point, which are assumed to be known; e.g., with high precision light detection and ranging (lidar) initial conditions provided by the Institute of Earth Sciences at the University of Iceland (IESUI). ${\dot{b}}_{.,j}$ is the mass balance field for time index j at all the grid points, which is assumed to be fixed and known for the purpose of this analysis. B is the ice viscosity parameter and C_{0}γ is the basal sliding field, which itself is parameterized with μ_{max} as in Eq. (16) of Bueler et al. (2005) and, furthermore, is static in time. For compact notation, θ is used to refer to B in test cases B–D and (B,μ_{max}) jointly in test case E.
Since we believe numerical errors will accumulate over time (Bueler et al., 2005), we define the errorcorrecting process as follows: ${\mathit{X}}_{j+\mathrm{1}}={\mathit{X}}_{j}+{\mathit{\u03f5}}_{j+\mathrm{1}}$, where ϵ_{j+1} is MVN(0,Σ). (MVN stands for multivariate normal, and the first argument is the mean and the second is the covariance.) Σ is block diagonal, with three blocks for indices corresponding to the margin, interior, and dome of the glacier (the margin is defined as the last grid squares before the glacier drops to 0 thickness, and the dome is the origin grid square), respectively. Each block is defined from a squareexponential kernel with the same length scale, denoted by ϕ, but distinct marginal variances, ${\mathit{\sigma}}_{\mathrm{interior}}^{\mathrm{2}},$ ${\mathit{\sigma}}_{\mathrm{margin}}^{\mathrm{2}}$, and ${\mathit{\sigma}}_{\mathrm{dome}}^{\mathrm{2}}$. The motivation for using different marginal variance parameters is to account for the widely different errors exhibited at the dome, interior, and margin, as is demonstrated by Bueler et al. (2005) and Jarosch et al. (2013). This errorcorrecting process leads to a tractable likelihood function, as is shown in Appendix B.
Finally, at the parameter level, B and μ_{max} are endowed with truncated normal distributions as priors. B has a normal prior with mean $\mathrm{3.5}\times {\mathrm{10}}^{\mathrm{24}}$ and standard deviation $\mathrm{3}\times {\mathrm{10}}^{\mathrm{24}}$, truncated to have support $[\mathrm{1},\mathrm{70}]\times {\mathrm{10}}^{\mathrm{24}}$. μ_{max} has a normal prior with mean $\mathrm{3}\times {\mathrm{10}}^{\mathrm{11}}$ and standard deviation $\mathrm{1}\times {\mathrm{10}}^{\mathrm{11}}$, truncated to have support $[\mathrm{1},\mathrm{70}]\times {\mathrm{10}}^{\mathrm{12}}$. (Units are s^{−1} Pa^{−3} for ice viscosity and Pa^{−1} m s^{−1} for basal sliding.) The prior supports for B and μ_{max} provide plausible values for temperate ice caps.
It is prudent to discuss the motivations and justifications of the various modeling choices employed in the model previously delineated. The data level is assumed to have independent normal errors with fixed variance; this is justified because of the uniformity of the measuring technology used from site to site (e.g., digital GPS) and symmetry of errors. In contrast, the precise functional form of the data level is chosen somewhat arbitrarily as a Gaussian, which affords one analytical convenience. Similarly, the errorcorrecting process at the process level uses a zero mean Gaussian process with a parameterized covariance kernel (e.g., square exponential), mostly as an analytically manageable way to induce spatial correlation in the errorcorrecting process. Spatial correlation in numerical errors has been demonstrated, for example, in Bueler et al. (2005).
Moreover, it is appropriate to consider potential variations of this model for slightly different scenarios; naturally, these could fall into alternate choices of covariance kernel at the process level (e.g., Matérn, to allow for a less smooth errorcorrecting process) and varying errors at the data level, for example to account for compaction or densification that occurs between seasons. For the latter, a suggestion is to use conjugate inversegamma distributions for the variances, so that sampling can be accomplished with a Gibbs sampler. Additionally, as aforementioned, one can conceivably use any numerical solver for a PDE at the process level. Future variations may consider utilizing nonzero mean Gaussian processes for the errorcorrecting process, which may be more computationally costly yet perhaps more realistic. Generally, this model can be adapted to any science or engineering system that is driven by physically meaningful parameters, whose dynamics are solved by noisy numerical methods, and for which noisy and continuous data are collected with wellprobed errors.
The mathematical details for how to do posterior computation within this model are given in Appendix B, which includes a derivation of an approximation to the loglikelihood that allows for computational efficiency. In summary, we compute the posterior of physical parameters directly on a grid since there are at most two physical parameters, and we use samples from the posterior distribution of physical parameters to generate predictions for glacier thickness in the future.
3.1 Analytical solutions
In Bueler et al. (2005), analytical solutions to the SIA are presented as benchmarks for numerical solvers of the SIA. As opposed to using other benchmarks such as the EISMINT experiment (Payne et al., 2000), which itself is based on numerical modeling and hence subject to numerical errors, the benchmark solutions provided in this work can be treated as ground truth to compare to. (This is in the sense that these are exact solutions of the SIA, but it must be stressed that the SIA is an approximation of the true physical dynamics governing a glacier.) These analytical solutions serve as a basis for simulating data sets to validate the Bayesian hierarchical approaches developed in this paper. In other words, the exact analytical solutions provide the latent process in the BHM, conditioning on given initial conditions and mass balance functions. Hence to simulate data from the BHM, one can bypass the need to numerically solve the PDE and introduce errors.
We make use of four analytical solutions from Bueler et al. (2005) that are summarized here, but the reader is referred to the original paper for the exact mathematical formulation and derivation of these analytical solutions. All of the analytical solutions assume a flat bedrock. Test case B includes no mass balance or basal sliding, and, consequently, the motion of the glacier is only attributable to deformation due to gravity. Test case C makes use of a mass balance field that is inversely proportional to time and directly proportional to thickness, but there is no basal sliding field modeled. Similarly, test case D utilizes a mass balance field with no basal sliding field modeled. In distinction from test case C, however, the mass balance field of test case D is such that the overall solution for glacial thickness is periodic in time. Finally, in contrast to the other tests, test case E has a spatially varying basal sliding field, yet the overall solution is static in time. Note that test A was not utilized in this study because it is a steady state solution without a varying mass balance or basal sliding field.
3.2 Simulation study test details
Conditions of the simulation study have been chosen to closely emulate the data collected at Langjökull ice cap by the IESUI. In particular, 20 years of data are assumed, which is comparable to data provided by the IES. 25 fixed measurement sites are used for biannual surface elevation measurements, which are geographically distributed on the glacier in a manner that is comparable to the real data provided by the IESUI. Figure 4 illustrates the locations of these measurement sites on the glacier. Surface elevation measurements for these sites are taken twice a year (i.e., for summer and winter mass balance measurements). The surface elevation measurements are generated by adding Gaussian noise (zero mean, unit variance) to the analytical solutions at the spatiotemporal coordinates of the fixed measurement sites. The choice of unit variance is larger than the errors produced by digital GPS measurements. Remaining physical parameters were chosen using the values from Table 2 in Bueler et al. (2005) to allow for comparisons to this work and the EISMINT I experiment (Payne et al., 2000).
Validation and diagnostics of the BHM were achieved through a combination of an assessment of posterior probability intervals, a test of the predictive error of thickness values 100 years from the initial time point t_{0}, and a comparison between observed and expected values for numerical errors based on the errorcorrecting process utilized. As is discussed in more detail below, these assessments suggest that the BHM is useful for inference of posterior probability distributions for physical parameters, prediction of future glacial thickness values on the order of 100 years, and the modeling of numerical errors at the margin, interior, and dome of the glacier.
Table 2 contains posterior credibility intervals for ice viscosity in test cases B–D. A 3 SD credibility interval was computed with mean ±3 standard deviations of the posterior samples. In all of these test cases, the 3 SD credibility interval covers the actual ice viscosity. Furthermore, as is apparent in Table 3, the predictive error, relative to thickness values on the order of a kilometer, appears be small overall, particularly at the interior; predictive error is the root mean squared difference between predictions and the exact analytical values for each of the test cases. Note that test E was not included with the predictive checks since it is static in time. Consistent with Bueler et al. (2005) and Jarosch et al. (2013), however, errors are greatest at the margin and dome of the glacier (evident in Fig. 6). Nonetheless, the predictive distributions cover the actual thicknesses even at these extremes. This illustrates the utility of the BHM for accounting for errors induced by the numerical solution of the SIA. Additionally, an illustration comparing the posterior and prior distributions for test case D is shown in Fig. 7.
To investigate the frequentist properties of the posterior probability distribution for ice viscosity (i.e., its performance under repeated sampling of data), 500 simulations were completed under repeated sampling of the surface elevation data at the 25 fixed measurement sites for test cases B–D. The coverage of ice viscosity for a 3 SD interval was computed for each of the simulations, where coverage for a given interval is binary; either the actual parameter value is in the interval or it is not. For test case B, in 499 of 500 simulations the 3 SD credibility interval covered the actual value of ice viscosity. In test cases C and D, the 3 SD credibility interval covered the actual value of ice viscosity in all of the simulations. This suggests that the frequentist coverage probability of the credibility interval is at least 99 %.
For test case E, one assumes that the field is described by parameterized Eq. (16) of Bueler et al. (2005). That is, in polar coordinates with radius r and angle Θ,
for ${\mathit{\theta}}_{\mathrm{1}}<\mathrm{\Theta}<{\mathit{\theta}}_{\mathrm{2}}$ and ${r}_{\mathrm{1}}<r<{r}_{\mathrm{2}}$, and C_{0}γ=0 otherwise. In addition to ice viscosity, the inferential object of interest is the scale parameter μ_{max}. The 3 SD posterior credibility interval for B is [1,43] in units of 10^{−25} s^{−1} Pa^{−3}, and for μ_{max} it is [1,50] in units of 10^{−12} Pa^{−1} ms^{−1}. The actual values for B and μ_{max} are $\mathrm{32}\times {\mathrm{10}}^{\mathrm{25}}$ s^{−1} Pa^{−3} and $\mathrm{25}\times {\mathrm{10}}^{\mathrm{12}}$ s^{−1} Pa^{−1} ms^{−1}, respectively. Hence, the credibility intervals cover both parameters. A figure illustrating the posterior distribution of μ_{max} is given in the Supplement.
While the credibility intervals achieved coverage of the actual values of the parameters, it was noticed that the posterior distribution for physical parameters and predictions are biased. Brynjarsdóttir and O'Hagan (2014) exhibit the same phenomenon in a simple physical system with a single physical parameter, and they demonstrate that the bias of a physical parameter posterior distribution reduces as better prior information is encoded to model the difference between the output of a computer simulator of a physical system and the actual physical process values (i.e., what we have termed as an errorcorrecting process). To demonstrate that this also holds in the BHM presented in this paper, we consider the following comparison. To assign prior information to the errorcorrecting process, we consider a discrete parameter set for ${\mathit{\sigma}}_{\mathrm{interior}}^{\mathrm{2}},$ ${\mathit{\sigma}}_{\mathrm{margin}}^{\mathrm{2}}$, and ${\mathit{\sigma}}_{\mathrm{dome}}^{\mathrm{2}}:\mathit{\{}\mathrm{0.1},\mathrm{1},\mathrm{10},\mathrm{100}\mathit{\}}$ in units of square meters, which corresponds to different orders of magnitude for variability. In one case, we ignore prior information from Bueler et al. (2005) and put equal probability mass on the parameter space for these parameters. In the second case, we encode more realistic prior information into the scales of errors at the three regions: equal mass on 10 and 100 at the margin, equal mass on 0.1 and 1 at the interior, and equal mass at 1 and 10 at the dome (all units are square meters). In both cases, the parameter ϕ is fixed at 70 km to place emphasis on the scales of error. The results of inferring the posterior distribution for ice viscosity B are shown in Fig. 8. Consistent with Brynjarsdóttir and O'Hagan (2014), the posterior distribution of the physical parameter B is much less biased when prior information is encoded into the errorcorrecting process.
To assess how the posterior distribution for ice viscosity evolves under different sampling plans of the data, we conducted a series of simulations in test case D under varying sampling periods. In particular, we considered data samples once every 10 years, once every 5 years, once a year, and twice a year; the resulting posteriors for ice viscosity are in Fig. 9. The general pattern is that the bias of the posterior distributions reduces as the period gets shorter, although the posterior becomes more diffuse. The result that some posterior uncertainty does not go away with more collected data is also consistent with the results in Brynjarsdóttir and O'Hagan (2014). The particular period we chose in this analysis (data collected twice a year) was meant to model how the UIIES Glaciology Team collects data, that is, twice a year due to summer and winter mass balance measurements.
To assess the accumulating errorcorrecting process model, we estimated the marginal variances of the errorcorrecting process for each of the time points with observed data in test case B, by examining the residuals formed by the difference between the numerical solver and the observed data. According to the model, the standard deviation of these residuals at the interior of the glacier should grow as $\sqrt{{\mathit{\sigma}}^{\mathrm{2}}+t{\mathit{\sigma}}_{\mathrm{interior}}^{\mathrm{2}}}$, where t is the number of time steps (and likewise at the dome and margin). Figure 10 shows a match between observed and expected in this regard, and, in particular, the 99 % confidence bands appear to cover the expected variability as time progresses. Also apparent from this figure is that, as time progresses, the errors at the margin, dome, and interior contribute more error than measurement error, which is on the order of 1 m. Moreover, this is also evident in Table 4, since after 200 time steps from t_{0} (i.e., 20 years), the marginal variances will be $\mathrm{200}{\mathit{\sigma}}_{\mathrm{interior}}^{\mathrm{2}}$, $\mathrm{200}{\mathit{\sigma}}_{\mathrm{margin}}^{\mathrm{2}}$, and $\mathrm{200}{\mathit{\sigma}}_{\mathrm{dome}}^{\mathrm{2}}$ based on the accumulating errors model; all of these values exceed 1, the measurement variance.
The primary contribution of this work has been to construct a BHM for glacier flow based on the SIA that operates in two spatial dimensions and time, which successfully models numerical errors induced by a numerical solver that accumulate with time and vary spatially. This BHM leads to full posterior probability distributions for physical parameters as well as a principled method for making predictions that takes into account both numerical errors and uncertainty in key physical parameters. Furthermore, the BHM operates in two spatial dimensions and time, which, to our knowledge, is new to the field of glaciology. An additional contribution is the derivation of a novel finite difference method for solving the SIA. When tested using simulated data sets based on analytical solutions to the SIA from Bueler et al. (2005), the results herein indicate that our approach is able to infer meaningful probability distributions for glacial parameters, and, furthermore, this approach makes probabilistic predictions for glacial thickness that adequately account for the error induced by using a numerical solver of the SIA. A future goal is to create an R package for fitting a generalized version of the model used within this work, where the function f(.) is provided by the user. This will allow glaciologists to extend the modeling approach we have developed to other similar scenarios in which the physical dynamics are more complex than the SIA. An additional scenario for which this package can be useful is when the numerical method is not a finite difference method, e.g., a FEM. To this end, we will attempt to utilize emulator inference (Hooten et al., 2011); this will be crucial to ensure that the methodology scales well computationally, since each posterior sample requires a forward PDE solve. Finally, and perhaps most importantly, future work will involve the application of the modeling and methodologies developed within this paper to real data collected by the IESUI, which includes bedrock elevation and mass balance measurements.
R scripts and R data files have been included in the Supplement to rerun the simulations utilized within this paper.
Here a finite difference scheme is derived for the SIA PDE. The overarching strategy in developing this finite discretization scheme is to take a secondorder Taylor expansion for $H(x,y,t)$ with x,y fixed, and then equate the resultant time derivatives, H_{t} and H_{tt}, to functions of spatial derivatives by using the original SIA PDE. That is, one starts with the approximation $H(x,y,t+\mathrm{\Delta}t)\approx H(x,y,t)+{H}_{t}(x,y,t)\mathrm{\Delta}t+{H}_{tt}(x,y,t)\mathrm{\Delta}{t}^{\mathrm{2}}/\mathrm{2}$ and uses the first equation of Sect. 2 to write H_{t} and H_{tt} in terms of spatial derivatives. Finally, central differences in space are substituted for the spatial derivatives. This finite difference scheme is motivated by the Lax–Wendroff (Hudson, 1998) method, which is generally better than finite difference methods that use only a singleorder Taylor expansion (indeed, in the advection–diffusion equation such methods may be unconditionally unstable).
In the following derivations, note that the subscripts mean “derivative with respect to” (e.g., H_{t} means derivative of H with respect to t):
Now we solve for these derivatives in terms of spatial derivatives in $H(x,y,t)$, the glacier thickness, and R(x,y), the bedrock level. The derivation makes repeated use of the differentiation rule for products, the chain rule for differentiation, and equality of partial derivatives. (e.g., H_{xt}=H_{tx}).
By symmetry in x and y, ${\left[\stackrel{\mathrm{\u203e}}{v}H\right]}_{y}$ can be analogously derived:
Derivatives [α^{n−1}]_{x} and [α^{n−1}]_{y}:
Now we derive $[\stackrel{\mathrm{\u203e}}{u}H{]}_{xt}$:
Note that terms with a time derivative of bedrock such as R_{xt} can be set to 0 since R is assumed to be static in time. However, we keep the time derivatives for R in the above equation for full generality in case a scenario is revisited where this does not hold. Next we derive ${\left[{\mathit{\alpha}}^{n\mathrm{1}}\right]}_{t}$:
Next we derive ${\left[{\mathit{\alpha}}^{n\mathrm{1}}\right]}_{tx}$:
Next we derive ${\left[{\mathit{\alpha}}^{n\mathrm{1}}\right]}_{ty}$:
Note that ${\mathit{S}}_{tx}={R}_{tx}+{H}_{tx}={H}_{tx}$ since R is assumed to be fixed as a function of t. Note that the same argument holds for other derivatives of S with respect to t. Next we derive ${H}_{tx},{H}_{txx},{H}_{ty}$, and H_{tyy},H_{tyx}:
Hence, these partial derivatives allow us to substitute purely spatial derivatives into the forward in time approximation for H. Without loss of generality, we use a central difference approximation for all spatial derivatives. Furthermore, we used Δ_{t}=0.1 years and ${\mathrm{\Delta}}_{x}={\mathrm{\Delta}}_{y}={\mathrm{10}}^{\mathrm{5}}$ m for the analysis in this paper. In total, 441 grid squares were modeled (i.e., 21 by 21) with the dome grid square at the origin. While a coarse grid was chosen for computational convenience, it is expected that numerical errors will go to zero as the grid width goes to zero, as is demonstrated both by Bueler et al. (2005) and Jarosch et al. (2013).
In the following subsections, we go through the key details regarding Bayesian computation for the model used in this work. Assume n total grid points are modeled, of which m≪n are observed. Let X_{j}∈ℝ^{n} be the errorcorrecting process at time j, S_{j}∈ℝ^{n} be the latent glacier surface values at time j, $f(\mathit{\theta},j)\in {\mathbb{R}}^{n}$ be shorthand for the output of the numerical solver at time point j, and ϵ_{j} be an independent and identically distributed (i.i.d) multivariate normal noise term at time j with mean 0 and covariance matrix Σ. MVN stands for multivariate normal, and the first argument is the mean and the second is the covariance. Furthermore, assume that data are collected regularly at every kth time point, such that one observes ${\mathit{Y}}_{k},{\mathit{Y}}_{\mathrm{2}k},\mathrm{\dots},{\mathit{Y}}_{Nk}\in {\mathbb{R}}^{m}$, and the corresponding observation error ${\mathit{Z}}_{k},{\mathit{Z}}_{\mathrm{2}k},\mathrm{\dots}{\mathit{Z}}_{Nk}$ is i.i.d MVN(0,σ^{2}I). For convenience, we denote Nk as T. Finally, let $\mathbf{A}\in {\mathbb{R}}^{m\times n}$ be a matrix which selects the grid squares of the latent process S that are observed; that is, its rows are unit basis vectors corresponding to those indices that are observed.
Calculating the likelihood $p({\mathit{Y}}_{k},\mathrm{\dots},{\mathit{Y}}_{T}\mathit{\theta})$
In this subsection, we derive both the likelihood of the observed data: $p({\mathit{Y}}_{k},\mathrm{\dots},{\mathit{Y}}_{T}\mathit{\theta})$ and an approximation to the likelihood.
Though Sect. 2.2 specifies the BHM in greater detail, the process and data levels of the BHM (i.e., conditioning on θ) are concisely written as follows.
Assume $j\in \mathrm{1},\mathrm{2},\mathrm{\dots}T$ and $c\in \mathrm{1},\mathrm{2},\mathrm{\dots}N$; hence there are N total spatial vectors observed with a period of length k. Furthermore, X_{1} is marginally MVN(0,Σ). That is, the process level vectors, S_{j}, are modeled conditional on the parameter level and the errorcorrecting process. The data level vectors, Y_{ck}, are generated conditional on the process level S_{ck}. Throughout the following, we condition on θ being fixed.
B1.1 The exact likelihood
Conditional on θ, the distribution of $({\mathit{Y}}_{k},\mathrm{\dots},{\mathit{Y}}_{T})$, viewed as one long random vector, is multivariate normal. Also, conditional on θ, the mean of $({\mathit{Y}}_{k},\mathrm{\dots},{\mathit{Y}}_{T})$ is $\left(Af\right(\mathit{\theta},k),\mathrm{\dots}Af(\mathit{\theta},T\left)\right)$ because both $({\mathit{X}}_{k},\mathrm{\dots},{\mathit{X}}_{T})$ and $({\mathit{Z}}_{k},\mathrm{\dots},{\mathit{Z}}_{T})$ have mean 0. It suffices to thus derive the covariance matrix for $({\mathit{Y}}_{k},\mathrm{\dots},{\mathit{Y}}_{T})$ conditional on θ. To do this, we note that Var$\left({\mathit{Y}}_{ck}\right)=\text{Var}(\mathbf{A}{\mathit{S}}_{ck}+{\mathit{Z}}_{ck})=\text{Var}\left(\mathbf{A}{\mathit{S}}_{ck}\right)+\text{Var}\left({\mathit{Z}}_{ck}\right)=\left[\mathbf{A}\right(ck\mathbf{\Sigma}\left){\mathbf{A}}^{\mathit{\u22ba}}\right]+{\mathit{\sigma}}^{\mathrm{2}}I$. Additionally, for a<b:
Therefore, the covariance matrix for the observed data can be written as $\mathbf{M}\otimes \left(\mathbf{A}\mathbf{\Sigma}{\mathbf{A}}^{\mathit{\u22ba}}\right)+{\mathit{\sigma}}^{\mathrm{2}}I$, where ${\mathbf{M}}_{ij}={k}_{\text{min}}(i,j)$ and $\mathbf{M}\in {\mathbb{R}}^{N\times N}$. This is a useful matrix representation because the inverse of M is bandlimited and sparse, for which there exist efficient computationally efficient linear algebraic routines (Rue, 2001).
B1.2 An approximation to the likelihood
The joint distribution $p({\mathit{Y}}_{k},\mathrm{\dots},{\mathit{Y}}_{T}\mathit{\theta})$ can be written as $p\left({\mathit{Y}}_{k}\right\mathit{\theta}\left)p\right({\mathit{Y}}_{\mathrm{2}k}{\mathit{Y}}_{k},\mathit{\theta})\mathrm{\dots}p\left({\mathit{Y}}_{T}\right{\mathit{Y}}_{k},\mathrm{\dots},{\mathit{Y}}_{(N\mathrm{1})k},\mathit{\theta})$. Since we expect that the data level errors are quite small (on the order of 1 m) in comparison to the overall surface elevation measurements (on the order of 1 km), we can approximate $p\left({\mathit{S}}_{(c\mathrm{1})k}\right{\mathit{Y}}_{k},\mathrm{\dots},{\mathit{Y}}_{(c\mathrm{1})k},\mathit{\theta})$ with $p\left({\mathit{S}}_{(c\mathrm{1})k}\right{\mathit{Y}}_{(c\mathrm{1})k},\mathit{\theta})$. Consequently, $p\left({\mathit{Y}}_{ck}\right{\mathit{Y}}_{k},\mathrm{\dots},{\mathit{Y}}_{(c\mathrm{1})k},\mathit{\theta})$ will be close to $p\left({\mathit{Y}}_{ck}\right{\mathit{Y}}_{(c\mathrm{1})k},\mathit{\theta})$. From the above recursive relationship, we can write
This expression motivates approximating $p\left({\mathit{Y}}_{ck}\right{\mathit{Y}}_{k},\mathrm{\dots},{\mathit{Y}}_{(c\mathrm{1})k},\mathit{\theta})$ as MVN distribution with mean ${\mathit{Y}}_{(c\mathrm{1})k}+\mathbf{A}\left[f\right(\mathit{\theta},ck)f(\mathit{\theta},(c\mathrm{1})k\left)\right]$ and covariance matrix A(kΣ)A^{⊺}+2σ^{2}I. A similar expression shows that p(Y_{k}) is multivariate normal with mean Af(θ,k) and covariance matrix A(kΣ)A^{⊺}+σ^{2}I. Nonetheless, we must be clear: $p\left({\mathit{Y}}_{ck}\right{\mathit{Y}}_{(c\mathrm{1})k},\mathit{\theta})$ does not exactly follow a MVN with mean ${\mathit{Y}}_{(c\mathrm{1})k}+\mathbf{A}\left[f\right(\mathit{\theta},ck)f(\mathit{\theta},(c\mathrm{1})k\left)\right]$ and covariance matrix A(kΣ)A^{⊺}+2σ^{2}I; this is because Z_{(c−1)k} and Y_{(c−1)k} are dependent. A simple example illustrating this approximation is presented in the Supplement.
B2 Posterior computation
Posterior inference is accomplished with grid sampling (Gelman et al., 2013); this approach directly computes the posterior distribution, $p\left(\mathit{\theta}\right{\mathit{Y}}_{k},\mathrm{\dots},{\mathit{Y}}_{T})$ of the parameter, proportional to $p({\mathit{Y}}_{k},\mathrm{\dots},{\mathit{Y}}_{T}\mathit{\theta}\left)p\right(\mathit{\theta})$, on a grid of plausible values. The likelihood is derived in the previous subsection. Parameters for the errorcorrecting process are selected using knowledge elicited from the studies of Bueler et al. (2005). To verify the sensitivity of grid sampling to the grid width, three grid widths for B are considered: 0.25, 0.50, and 1, and the grid's range is from [1,70] (all in units of 10^{−25} s^{−1} Pa^{−3}). The summary statistics for generating 10^{6} posterior samples from more to less fine (0.25, 0.50, 1) are given below.

The minimum is (5.25,5.00,6.00).

The first quartile is (23.8,23.5,24.0).

The median is (27.0,26.5, 27.0).

The mean is (27.1,26.7,27.1).

The third quartile is (30.5,30.0,30.0).

The max is (51.50,49.0,51.0).
The similarity of summary statistics across grid widths indicates that the posterior samples are not very sensitive to grid width; a grid width of 0.50 was used for the analyses within. Moreover, the posterior samples in this check were generated for test case D (i.e., mass balance field producing a periodic solution to the SIA).
B3 Making spatiotemporal predictions of glacial surface elevation
In this section, we give details for how to make predictions under the proposed Bayesian model. Denote ${\mathit{S}}_{{T}_{\text{end}}}\in {\mathbb{R}}^{n}$ for future glacier elevation values we want to make a prediction for at time point T_{end}. Our goal is to approximate the posterior predictive distribution $p\left({\mathit{S}}_{{T}_{\mathrm{end}}}\right{\mathit{Y}}_{k},\mathrm{\dots}{\mathit{Y}}_{T})$. To make this computationally simple, our first assumption (as in the computation of the likelihood) is to suggest that $p\left({\mathit{S}}_{T}\right{\mathit{Y}}_{k},\mathrm{\dots}{\mathit{Y}}_{T},\mathit{\theta})$ is approximately equivalent to $p\left({\mathit{S}}_{T}\right{\mathit{Y}}_{T},\mathit{\theta})$. This is because relative to the overall glacier surface elevation values (an average of about 2000 m), the measurement errors are small, on the order of 1 m. Moreover, based on the model specified above, we know that ${\mathit{S}}_{{T}_{\mathrm{end}}}={\mathit{X}}_{T}+{\sum}_{j=T+\mathrm{1}}^{{T}_{\mathrm{end}}}{\mathit{\u03f5}}_{j}+f(\mathit{\theta},{T}_{\mathrm{end}})$. This suggests the following iterative procedure to generate a posterior sample for the prediction of ${\mathit{S}}_{{T}_{\mathrm{end}}}$: for each independent sample θ_{l} from $p\left(\mathit{\theta}\right{\mathit{Y}}_{k},\mathrm{\dots},{\mathit{Y}}_{T})$, generate a sample from a multivariate normal whose mean is 0 and covariance given by (T_{end}−T)Σ, add the sample to f(θ_{l},T_{end}), and then add this sum to a sample from $p\left({\mathit{X}}_{T}\right\mathit{\theta}={\mathit{\theta}}_{l},{\mathit{Y}}_{T})$.
We must then determine how to sample from the distribution of $p\left({\mathit{X}}_{T}\right\mathit{\theta}={\mathit{\theta}}_{l},{\mathit{Y}}_{T})$. Let X_{Tobs}∈ℝ^{m} be a subvector of X_{T} corresponding to the indices that are observed at the data level, and ${\mathit{X}}_{{T}_{\mathrm{pred}}}\in {\mathbb{R}}^{nm}$ be a subvector of X_{T} corresponding to unobserved indices. The distribution for $p\left({\mathit{X}}_{{T}_{\mathrm{obs}}}\right\mathit{\theta},{\mathit{Y}}_{T})$ is multivariate normal due to conjugacy. The precision, denoted by Q_{obs}, is ${\mathit{\sigma}}^{\mathrm{2}}I+\left[\mathbf{A}\right(T\mathbf{\Sigma}){\mathbf{A}}^{\mathit{\u22ba}}{]}^{\mathrm{1}}$. The mean, denoted by μ_{obs}, is ${Q}_{\mathrm{obs}}^{\mathrm{1}}({\mathit{\sigma}}^{\mathrm{2}}I{\mathit{Y}}_{T}+[\mathbf{A}\left(T\mathbf{\Sigma}\right){\mathbf{A}}^{\mathit{\u22ba}}{]}^{\mathrm{1}}\mathbf{A}f(\mathit{\theta},T))\mathbf{A}f(\mathit{\theta},T)$. $p\left({\mathit{X}}_{{T}_{\text{pred}}}\right{\mathit{X}}_{{T}_{\text{obs}}},\mathit{\theta},{\mathit{Y}}_{T})$ is multivariate normal, whose mean and variance can be derived with the wellknown conditional multivariate normal formula, as in Theorem 2.44 of Wasserman (2013). That is, the mean is TΣ_{pred,obs}Q_{obs} and the variance is $T{\mathbf{\Sigma}}_{\mathrm{pred},\mathrm{pred}}T{\mathbf{\Sigma}}_{\mathrm{pred},\mathrm{obs}}{Q}_{\text{obs}}T{\mathbf{\Sigma}}_{\mathrm{obs},\mathrm{pred}}$. Here, Σ_{pred,obs} is the submatrix of Σ that contains the rows of Σ that correspond to the indices that are to be predicted, and the columns correspond to the indices which are observed. Σ_{obs,pred} is analogously defined.
The supplement related to this article is available online at: https://doi.org/10.5194/tc1222292018supplement.
All of the glaciologists contributed equally to this work.
The authors declare that they have no conflict of interest.
The Icelandic Research Fund (RANNIS) is thanked for funding this
research.
Edited by: Eric
Larour
Reviewed by: Lambert Caron and one anonymous
referee
Berliner, L. M.: Hierarchical Bayesian Time Series Models, in: Maximum Entropy and Bayesian Methods, edited by: Hanson, K. M. and Silver, R. N., Springer Netherlands, Dordrecht, 15–22, 1996. a
Berliner, L. M.: Physicalstatistical modeling in geophysics, J. Geophys. Res.Atmos., 108, 8776, https://doi.org/10.1029/2002JD002865, 2003. a, b
Berliner, L. M., Jezek, K., Cressie, N., Kim, Y., Lam, C. Q., and van der Veen, C. J.: Modeling dynamic controls on ice streams: a Bayesian statistical approach, J. Glaciol., 54, 705–714, https://doi.org/10.3189/002214308786570917, 2008. a
Brinkerhoff, D. J., Aschwanden, A., and Truffer, M.: Bayesian Inference of Subglacial Topography Using Mass Conservation, Front. Earth Sci., 4, 8, https://doi.org/10.3389/feart.2016.00008, 2016. a
Brynjarsdóttir, J. and O'Hagan, A.: Learning about physical parameters: the importance of model discrepancy, Inverse Probl., 30, 114007, https://doi.org/10.1088/02665611/30/11/114007, 2014. a, b, c
Bueler, E., Lingle, C. S., KallenBrown, J. A., Covey, D. N., and Bowman, L. N.: Exact solutions and verification of numerical models for isothermal ice sheets, J. Glaciol., 51, 291–306, https://doi.org/10.3189/172756505781829449, 2005. a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u
Calderhead, B., Girolami, M., and Lawrence, N. D.: Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes, in: Proceedings of the 21st International Conference on Neural Information Processing Systems, NIPS'08, Curran Associates Inc., 217–224, USA, ISBN: 9781605609492 , 2008. a
Cressie, N. and Wikle, C. K.: Statistics for spatiotemporal data, John Wiley & Sons, 2015. a, b, c
Cuffey, K. M. and Paterson, W.: The Physics of Glaciers, 4 edn., Academic Press, 2010. a, b
Flowers, G. E., Marshall, S. J., Björnsson, H., and Clarke, G. K.: Sensitivity of Vatnajökull ice cap hydrology and dynamics to climate warming over the next 2 centuries, J. Geophys. Res.Earth, 110, F02011, https://doi.org/10.1029/2004JF000200, 2005. a
Fowler, A. C. and Larson, D. A.: On the Flow of Polythermal Glaciers. I. Model and Preliminary Analysis, Proc. R. Soc. Lon. Ser.A, 363, 217–242, 1978. a
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B.: Bayesian data analysis, 3rd edn., Chapman & Hall/CRC, 2013. a
Glen, J.: The flow law of ice: A discussion of the assumptions made in glacier theory, their experimental foundations and consequences, IASH Publ., 47, 171–183, 1958. a
Glen, J. W.: The Creep of Polycrystalline Ice, Proc. R. Soc. Lon. Ser.A, 228, 519–538, 1955. a
Hooten, M. B., Leeds, W. B., Fiechter, J., and Wikle, C. K.: Assessing FirstOrder Emulator Inference for Physical Parameters in Nonlinear Mechanistic Models, J. Agr. Biol. Envir. St., 16, 475–494, https://doi.org/10.1007/s1325301100737, 2011. a, b
Hudson, J.: Numerical Techniques for Conservation Laws with Source Terms, Tech. rep., Engineering and Physical Science Research Council, 1998. a, b
Hutter, K.: A mathematical model of polythermal glaciers and ice sheets, Geophys. Astro. Fluid, 21, 201–224, https://doi.org/10.1080/03091928208209013, 1982. a
Hutter, K.: Theoretical Glaciology: Material Science of Ice and the Mechanics of Glaciers and Ice Sheets, Mathematical Approaches to Geophysics, Springer, 548 pp., 1983. a
Isaac, T., Petra, N., Stadler, G., and Ghattas, O.: Scalable and efficient algorithms for the propagation of uncertainty from data through inference to prediction for largescale problems, with application to flow of the Antarctic ice sheet, J. Comput. Phys., 296, 348–368, https://doi.org/10.1016/j.jcp.2015.04.047, 2015. a
Jarosch, A. H., Schoof, C. G., and Anslow, F. S.: Restoring mass conservation to shallow ice flow models over complex terrain, The Cryosphere, 7, 229–240, https://doi.org/10.5194/tc72292013, 2013. a, b, c
Minchew, B., Simons, M., Hensley, S., Björnsson, H., and Pálsson, F.: Early melt season velocity fields of Langjökull and Hofsjökull, central Iceland, J. Glaciol., 61, 253–266, 2015. a
Owhadi, H. and Scovel, C.: Universal Scalable Robust Solvers from Computational Information Games and fast eigenspace adapted Multiresolution Analysis, ArXiv eprints, 142 pp., arXiv:1703.10761, 2017. a
Payne, A. J., Huybrechts, P., AbeOuchi, A., Calov, R., Fastook, J. L., Greve, R., Marshall, S. J., Marsiat, I., Ritz, C., Tarasov, L., and Thomassen, M. P. A.: Results from the EISMINT model intercomparison: the effects of thermomechanical coupling, J. Glaciol., 46, 227–238, 2000. a, b
Pralong, M. R. and Gudmundsson, G. H.: Bayesian estimation of basal conditions on Rutford Ice Stream, West Antarctica, from surface data, J. Glaciol., 57, 315–324, https://doi.org/10.3189/002214311796406004, 2011. a
Rue, H.: Fast sampling of Gaussian Markov random fields, J. Roy. Stat. Soc. B Met., 63, 325–338, 2001. a
Rue, H., Riebler, A., Sørbye, S. H., Illian, J. B., Simpson, D. P., and Lindgren, F. K.: Bayesian Computing with INLA: A Review, Annu. Rev. Stat. Appl., 4, 395–421, https://doi.org/10.1146/annurevstatistics060116054045, 2017. a
Stan Development Team: RStan: the R interface to Stan, r package version 2.17.3, available at: http://mcstan.org/ (last access: 1 April 2018), 2018. a
van der Veen, C.: Fundamentals of Glaicer Dynamics, 2nd edn., CRC Press, 2017. a
Wasserman, L.: All of statistics: a concise course in statistical inference, Springer Science & Business Media, 2013. a
Weertman, J.: The theory of glacier sliding, J. Glaciol., 5, 287–303, https://doi.org/10.1017/S0022143000029038, 1964. a
Wikle, C. K.: Hierarchical Models for Uncertainty Quantification: An Overview, Springer International Publishing, Cham, 1–26, https://doi.org/10.1007/9783319112596_41, 2016. a, b, c
 Abstract
 Introduction
 Description of models
 Experiments to assess the Bayesian hierarchical model
 Results
 Summary, discussion, and future work
 Data availability
 Appendix A: Finite difference method for the shallow ice approximation
 Appendix B: Model fitting
 Author contributions
 Competing interests
 Acknowledgements
 References
 Supplement
 Abstract
 Introduction
 Description of models
 Experiments to assess the Bayesian hierarchical model
 Results
 Summary, discussion, and future work
 Data availability
 Appendix A: Finite difference method for the shallow ice approximation
 Appendix B: Model fitting
 Author contributions
 Competing interests
 Acknowledgements
 References
 Supplement