Research article 09 Apr 2021
Research article  09 Apr 2021
Inferring the basal sliding coefficient field for the Stokes ice sheet model under rheological uncertainty
 ^{1}School of Mathematical Sciences, Rochester Institute of Technology, Rochester, NY 14623, USA
 ^{2}Department of Engineering Science, University of Auckland, Auckland 1010, New Zealand
 ^{3}Electrical & Systems Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
 ^{4}Department of Applied Mathematics, University of California, Merced, Merced, CA 95343, USA
 ^{1}School of Mathematical Sciences, Rochester Institute of Technology, Rochester, NY 14623, USA
 ^{2}Department of Engineering Science, University of Auckland, Auckland 1010, New Zealand
 ^{3}Electrical & Systems Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
 ^{4}Department of Applied Mathematics, University of California, Merced, Merced, CA 95343, USA
Correspondence: Noémi Petra (npetra@ucmerced.edu)
Hide author detailsCorrespondence: Noémi Petra (npetra@ucmerced.edu)
We consider the problem of inferring the basal sliding coefficient field for an uncertain Stokes ice sheet forward model from synthetic surface velocity measurements. The uncertainty in the forward model stems from unknown (or uncertain) auxiliary parameters (e.g., rheology parameters). This inverse problem is posed within the Bayesian framework, which provides a systematic means of quantifying uncertainty in the solution. To account for the associated model uncertainty (error), we employ the Bayesian approximation error (BAE) approach to approximately premarginalize simultaneously over both the noise in measurements and uncertainty in the forward model. We also carry out approximative posterior uncertainty quantification based on a linearization of the parametertoobservable map centered at the maximum a posteriori (MAP) basal sliding coefficient estimate, i.e., by taking the Laplace approximation. The MAP estimate is found by minimizing the negative log posterior using an inexact Newton conjugate gradient method. The gradient and Hessian actions to vectors are efficiently computed using adjoints. Sampling from the approximate covariance is made tractable by invoking a lowrank approximation of the data misfit component of the Hessian. We study the performance of the BAE approach in the context of three numerical examples in two and three dimensions. For each example, the basal sliding coefficient field is the parameter of primary interest which we seek to infer, and the rheology parameters (e.g., the flow rate factor or the Glen's flow law exponent coefficient field) represent socalled nuisance (secondary uncertain) parameters. Our results indicate that accounting for model uncertainty stemming from the presence of nuisance parameters is crucial. Namely our findings suggest that using nominal values for these parameters, as is often done in practice, without taking into account the resulting modeling error, can lead to overconfident and heavily biased results. We also show that the BAE approach can be used to account for the additional model uncertainty at no additional cost at the online stage.
 Article
(8917 KB) 
Supplement
(4583 KB)  BibTeX
 EndNote
Inferring the basal sliding coefficient field using both the linear and nonlinear Stokes ice sheet model from noisy surface velocity measurements has received considerable attention in recent years (see, for example, Truffer, 2004, Raymond and Gudmundsson, 2009, Pollard and DeConto, 2012, Isaac et al., 2015b, Morlighem et al., 2013, Zhao et al., 2018a, b, Giudici et al., 2014, Petra et al., 2012, 2014, and Isaac et al., 2015a). The standard approach to this problem invariably assumes that the other parameters of the ice, such as those controlling the rheology, are known precisely. This is particularly common, for example, in the case of the socalled flow rate factor and the Glen's flow law exponent, in which nominal values such as $A={\mathrm{10}}^{\mathrm{16}}$ Pa^{−n} a^{−1} and n=3, respectively, are prescribed; we refer, e.g., to Isaac et al. (2015a), Petra et al. (2014), Raymond and Gudmundsson (2009), Truffer (2004), Morlighem et al. (2013), Zhu et al. (2016), Zhao et al. (2018b), Giudici et al. (2014), and Pollard and DeConto (2012). The inference problem is made significantly more challenging (both theoretically and numerically) by allowing the rheology parameters to be uncertain and spatially varying. One possible approach to solve the problem is to infer both the basal sliding coefficient and the rheology parameters. However, this considerably increases both the illposedness of the inverse problem and the associated computational costs. For most ice sheet inverse problems considered in the literature, the field of interest is the basal sliding parameter, which arguably presents the largest uncertainty in determining the ice flow rate.
It is well documented that, in practice, the rheology parameters of ice sheets are not known exactly (e.g., Bons et al., 2018; Marshall, 2005; GilletChaulet et al., 2011, 2012; Cuffey and Paterson, 2010; Brondex et al., 2019; Raymond and Gudmundsson, 2011). Compounding this issue is the fact that measured ice velocities can be heavily influenced by rheology parameters (Schlegel et al., 2015; Bulthuis et al., 2019). This fact was demonstrated in Petra et al. (2012), in which the authors used the same Stokes ice sheet model as in the current paper to reconstruct reasonable estimates of the Glen's flow law exponent from noisy surface velocity measurements, suggesting that the surface measurements are indeed sensitive to changes in the Glen's flow law exponent field. Despite these findings, it is standard in the literature to assume that rheology – among other – parameters of the ice are known a priori (see, for instance, Bons et al., 2018, Marshall, 2005, GilletChaulet et al., 2011, 2012, Cuffey and Paterson, 2010, Brondex et al., 2019, and Van der Veen, 2013).
In this paper, we treat the rheology parameters (specifically the Glen's flow law exponent and the flow rate factor fields) as auxiliary (nuisance) parameters, i.e., parameters which are not of primary interest. However, fixing these auxiliary parameters at incorrect, though possibly welljustified, values often induces socalled modeling errors. It is well understood, though, that the solutions to inverse problems are generally sensitive to modeling errors which – if not properly accounted for – can lead to inaccurate, nonphysical, and in some cases meaningless solutions of the inverse problem (Brynjarsdóttir and O'Hagan, 2014; Giudici et al., 2014; Kaipio and Somersalo, 2007, 2005). From a statistical viewpoint, fixing auxiliary parameters to nominal values suggests that these parameters are known exactly and hence neglects all associated uncertainties. This in turn often results in biased and overconfident estimates for the parameters of interest (see, for example, Kaipio and Somersalo, 2007, Kaipio and Kolehmainen, 2013, and Nicholson et al., 2018 and the references therein).
We carry out estimation of the basal sliding coefficient within the Bayesian framework (Kaipio and Somersalo, 2005; Stuart, 2010), which is particularly well suited to incorporating various sources and types of uncertainties, including those resulting from model errors (Tarantola, 2005; Kaipio and Somersalo, 2005, 2007). Moreover, to ensure the work here is readily transferable to inference problems in largescale ice flow problems, such as those discussed in Isaac et al. (2015a), we make use of the computational framework proposed in BuiThanh et al. (2013) and Petra et al. (2014) for handling infinitedimensional Bayesian inverse problems (Stuart, 2010). This approach, combined with adjointbased means to compute the derivative information needed by the optimization solver, ensures mesh independence and computational efficiency.
To account for the uncertainty in the rheology parameters, we utilize the Bayesian approximation error (BAE) approach (Kaipio and Somersalo, 2005, 2007; Kaipio and Kolehmainen, 2013), which, broadly speaking, lumps all modeling and measurement uncertainties into a single additive total error term. The total error can then be approximately marginalized over in a similar manner to how standard additive errors are dealt with (Kaipio and Kolehmainen, 2013). The BAE approach is particularly attractive computationally as
 (a)
the approximate marginalization can be carried out prior to data acquisition, i.e., premarginalization, and
 (b)
the equations to be solved in the adjointstate approach maintain the same general form (Nicholson et al., 2018).
The BAE approach has been used in a variety of settings (see, for example, Kaipio and Kolehmainen, 2013, Arridge et al., 2006, Castello and Kaipio, 2019, and Lamien et al., 2019, among others, and the references therein). A particularly relevant, and recent, example is the application of the approach to the socalled Robin inverse problem encountered, for instance, in corrosion detection (Nicholson et al., 2018). There the parameter of interest is also a Robintype boundary condition on an inaccessible part of the domain, while the nuisance parameter is the (electrical or thermal) conductivity of the domain.
To study the performance of the BAE approach, we formulate and solve three ice sheet flow model problems involving synthetic data. Our results suggest that simply setting rheology parameters to nominal values can result in severely misleading estimates of the basal sliding coefficient field and associated posterior uncertainty if the additional uncertainty in the rheology parameters is not accounted for. In comparison, we show that incorporating the additional modeling uncertainties using the BAE approach leads to sensible estimates of the basal sliding coefficient and reasonable posterior uncertainty at no additional online cost. We place particular emphasis on the feasibility of the posterior uncertainty estimates, in particular, on how well the true parameter is contained within the posterior distribution.
1.1 Contributions
In previous work, we addressed the problem of inferring the basal sliding coefficient field from surface velocity measurements in the context of ice sheet flow in a deterministic, moderate scale, synthetic observational data setting in Petra et al. (2012), in a Bayesian inference and infinitedimensional setting in Petra et al. (2014), and more recently in a largescale, real data setting in Isaac et al. (2015a). Here the goal is to extend this inversion framework to account for additional uncertainties in the ice sheet model. The main contributions of this paper are as follows. Firstly, we show that setting rheology parameters to values commonly found for ice sheet models in the literature can lead to erroneous posterior estimates of the basal sliding coefficient if the underlying uncertainty in the rheology parameters is not accounted for. Secondly, we show that this situation can be remedied by employing the BAE approach to premarginalize over rheology uncertainties. Thirdly, we show that this approach requires no additional computational resources or time at the online stage as all computations required for premarginalization are carried out prior to the acquisition of data.
1.2 Organization of paper
The paper is organized as follows. In Sect. 2, we outline the forward nonlinear Stokes flow equations for ice sheet problems, while in Sect. 3 we briefly review the Bayesian framework for inverse problems, the computation of the maximum a posteriori estimate, and the approximate posterior covariance. In Sect. 4, we show how to apply the BAE approach to premarginalize over auxiliary parameters. In Sects. 5 and 6, we formulate and solve three ice sheet inverse problems and study the performance of the proposed method. Finally, Sect. 7 provides concluding remarks.
In this section, we describe the forward ice sheet flow problem that is used for the inference of the basal sliding coefficient field under uncertain rheology. As in Petra et al. (2012, 2014) and Isaac et al. (2015a), we model the flow of ice as an isothermal, viscous, shearthinning, incompressible fluid via the balance of mass and linear momentum (Hutter, 1983; Marshall, 2005; Paterson, 1994), namely
where u denotes the velocity field, σ_{u} the stress tensor, ρ the density of the ice, and g gravity. The stress, σ_{u}, can be decomposed as ${\mathit{\sigma}}_{u}={\mathit{\tau}}_{u}\mathbf{I}p$, where τ_{u} is the deviatoric stress tensor, p the pressure, and I the identity tensor. The domain considered in this paper is $\mathrm{\Omega}=[\mathrm{0},L{]}^{d\mathrm{1}}\times [\mathrm{0},H]$, for d=2 or d=3. We employ the Glen's flow law (Glen, 1955) which relates the stress and strain rate tensors by
where η is the effective viscosity, A is the flow rate factor, and ${\dot{\mathit{\epsilon}}}_{u}=\frac{\mathrm{1}}{\mathrm{2}}(\mathrm{\nabla}\mathit{u}+\mathrm{\nabla}{\mathit{u}}^{T})$ and ${\dot{\mathit{\epsilon}}}_{\mathrm{II}}=\frac{\mathrm{1}}{\mathrm{2}}\text{tr}\left({\dot{\mathit{\epsilon}}}_{u}^{\mathrm{2}}\right)$ are the strain rate tensor and its second invariant. Above, n=n(x) is the spatially varying Glen's flow law exponent, which satisfies n(x)≥1 for all x∈Ω to ensure the ice is a shearthinning fluid (Glen, 1955).
In line with Petra et al. (2012), the top boundary Γ_{t} is equipped with a tractionfree boundary condition, all lateral boundaries Γ_{p} are equipped with periodic boundary conditions, and on the basal surface Γ_{b} we apply a no flow condition for the normal component of u along with a linear sliding law for the tangential components. That is, the boundary conditions are given by
where β(x) is the log basal sliding coefficient field^{1}, n is the outward normal unit vector, and $\mathbf{T}:=\mathbf{I}\mathit{n}{\mathit{n}}^{T}$ is the projection onto the tangential plane. Above, we generically used Γ_{l} and Γ_{r} to denote pairs of opposing boundaries on Γ_{p} on which periodic conditions are imposed. We note that β generally represents a combination of complex phenomena (see, for example, Schoof, 2005, 2010, and Perego et al., 2014). Furthermore, the methods and results discussed in the current paper do not rely on the particular top and lateral boundary conditions specified. As such, alternative boundary conditions could also be imposed on Γ_{t} and Γ_{p}. For a simple illustration of the problem setup (shown in two dimensions), see Fig. 1.

The weak form of the Stokes equation. In what follows, let us introduce the weak form of Eq. (1), as it is the starting point for both the finite element discretization of the forward model and the computation of the gradient and action of the Hessian required for the solution of the inverse problem using the adjointstate method (see, for example, Isaac et al., 2015a). Multiplying the nonlinear Stokes system (1) with arbitrary test functions $\stackrel{\mathrm{\u0303}}{\mathit{u}}$ and $\stackrel{\mathrm{\u0303}}{p}$ and using integration by parts over Ω (Gockenbach, 2006; Elman et al., 2005), the weak form of Eq. (1) is given by the following. Find $(\mathit{u},p)\in \mathcal{W}=\mathcal{V}\times \mathcal{Q}$ such that
$$\begin{array}{}\text{(2)}& \begin{array}{rl}{\displaystyle}\underset{\mathrm{\Omega}}{\int}\mathrm{2}\mathit{\eta}\left(\mathit{u}\right){\dot{\mathit{\epsilon}}}_{u}& {\displaystyle}:{\dot{\mathit{\epsilon}}}_{\stackrel{\mathrm{\u0303}}{\mathit{u}}}\phantom{\rule{0.25em}{0ex}}\mathrm{d}\mathit{x}\underset{\mathrm{\Omega}}{\int}(p\mathrm{\nabla}\cdot \stackrel{\mathrm{\u0303}}{\mathit{u}}+\stackrel{\mathrm{\u0303}}{p}\mathrm{\nabla}\cdot \mathit{u})\phantom{\rule{0.25em}{0ex}}\mathrm{d}\mathit{x}\\ {\displaystyle}& {\displaystyle}+\underset{{\mathrm{\Gamma}}_{\mathrm{b}}}{\int}\mathrm{exp}\left(\mathit{\beta}\right)\mathbf{T}\mathit{u}\cdot \mathbf{T}\stackrel{\mathrm{\u0303}}{\mathit{u}}\phantom{\rule{0.25em}{0ex}}\mathrm{d}\mathit{s}=\underset{\mathrm{\Omega}}{\int}\mathit{\rho}\mathit{g}\cdot \stackrel{\mathrm{\u0303}}{\mathit{u}}\phantom{\rule{0.25em}{0ex}}\mathrm{d}\mathit{x}\end{array}\end{array}$$for all $(\stackrel{\mathrm{\u0303}}{\mathit{u}},\stackrel{\mathrm{\u0303}}{p})\in \mathcal{W}$. In line with Elman et al. (2005) and Isaac et al. (2015b), we set $\mathcal{V}:=\mathit{\{}\mathit{u}\in ({H}^{\mathrm{1}}\left(\mathrm{\Omega}\right){)}^{d}:{\left.\mathit{u}\right}_{{\mathrm{\Gamma}}_{\mathrm{l}}}={\left.\mathit{u}\right}_{{\mathrm{\Gamma}}_{\mathrm{r}}},{\left.\mathit{u}\cdot \mathit{n}\right}_{{\mathrm{\Gamma}}_{\mathrm{b}}}=\mathrm{0}\mathit{\}}$ and 𝒬:=(L^{2}(Ω))^{d}, for d=2 or d=3.

Discretization. To guarantee the infsup stability (wellposedness) of the discretized forward problem, we discretize the velocity and pressure using Taylor–Hood finite elements, i.e., quadratic elements for each velocity component and linear elements for pressure (see, for example, Elman et al., 2005). The basal sliding coefficient field is discretized using continuous linear Lagrange basis functions ${\left\{{\mathit{\varphi}}_{j}\left(\mathit{s}\right)\right\}}_{j=\mathrm{1}}^{m}$, i.e., ${\mathit{\beta}}_{h}\left(\mathit{s}\right)={\sum}_{j=\mathrm{1}}^{m}{\mathit{\beta}}_{j}{\mathit{\varphi}}_{j}\left(\mathit{s}\right)$, where s∈Γ_{b}. In what follows, we denote by $\mathit{\beta}=({\mathit{\beta}}_{\mathrm{1}},{\mathit{\beta}}_{\mathrm{2}},\mathrm{\dots},{\mathit{\beta}}_{m})\in {\mathbb{R}}^{m}$ the discrete basal sliding coefficient field.
In this section, we summarize the Bayesian inference framework, which will be used in combination with the Bayesian approximation error approach, to account for uncertainties in rheology parameters. To allow for the systematic incorporation of uncertainties, we consider the inverse problem in the Bayesian framework (Tarantola, 2005; Kaipio and Somersalo, 2005). In this framework, the solution of the underlying statistical inverse problem is given by the posterior probability density. For nonlinear inverse problems with expensive forward models and highdimensional parameters (as is the case for ice sheet inverse problems), fully characterizing the posterior is typically not tractable. Consequently, we compute the Laplace approximation of the posterior, which requires only the maximum a posteriori (MAP) estimate, i.e., the basal sliding coefficient which maximizes the posterior density and the approximate posterior covariance.
We use Bayes' theorem to write the solution of the Bayesian inverse problem as the posterior measure, which describes the probability law of the parameter conditioned on measurements (Tarantola, 2005; Stuart, 2010). The Formulation of the posterior relies on both the prior density and the likelihood function, which we outline below. We note that initially we pose the problem in an infinitedimensional setting, which is particularly well suited to largescale problems (e.g., BuiThanh et al., 2012; Isaac et al., 2015a) as it ensures the discretization invariance and wellposedness of the Bayesian inverse problem (Stuart, 2010).
3.1 The prior
We postulate a Gaussian prior density on the (spatially varying) basal sliding coefficient, i.e., $\mathit{\beta}\sim \mathcal{N}({\mathit{\beta}}_{*},{\mathcal{C}}_{\mathit{\beta}})$, with covariance operator 𝒞_{β}, and mean value ${\mathit{\beta}}_{*}\in \mathcal{E}$, where ℰ is defined as the range of ${\mathcal{C}}_{\mathit{\beta}}^{\frac{\mathrm{1}}{\mathrm{2}}}$ (see, for example, Stuart, 2010 and BuiThanh et al., 2013 for more details). To ensure the inverse problem is well posed in infinite dimensions, we use a squared inverse elliptic operator to define the prior covariance operator^{2} (e.g., Flath et al., 2011; BuiThanh et al., 2013; Petra et al., 2014).
More specifically, we take ${\mathcal{C}}_{\mathit{\beta}}={\mathcal{A}}^{\mathrm{2}}$, where 𝒜 is the second order elliptic differential operator defined by
where the strictly positive parameters γ_{β} (m^{2}) and δ_{β} (adimensional) control the correlation length and the marginal variance. Specifically, the correlation length (defined as the distance for which the two points have a correlation coefficient of 0.1) is proportional to $\sqrt{{\mathit{\gamma}}_{\mathit{\beta}}/{\mathit{\delta}}_{\mathit{\beta}}}$ (m), while the variance is proportional to ${\mathit{\delta}}_{\mathit{\beta}}^{\mathrm{2}}{\left({\mathit{\gamma}}_{\mathit{\beta}}/{\mathit{\delta}}_{\mathit{\beta}}\right)}^{\frac{d\mathrm{1}}{\mathrm{2}}}$ (see, for example, Khristenko et al., 2019 and the references therein). This choice of prior covariance operator is particularly well suited to largescale problems, as discretization of 𝒜 (using a finite element discretization) is sparse (see, for example, Lindgren et al., 2011 and Osborn et al., 2017). As discussed in Khristenko et al. (2019), Daon and Stadler (2018), and Roininen et al. (2014), suitable boundary conditions need to be stipulated to reduce boundary artifacts. In this work, we choose to equip 𝒜 with periodic boundary conditions on ∂Γ_{b}, which parallels the periodic boundary condition (1e) of the forward model. We note that the discrete representation of the prior covariance operator, denoted Γ_{pr}, is defined as follows (e.g., BuiThanh et al., 2013; Petra et al., 2014; Villa et al., 2021):
Therefore, the discrete parameter β follows a Gaussian distribution $\mathcal{N}({\mathit{\beta}}_{*},{\mathbf{\Gamma}}_{\mathrm{pr}})$, with prior mean ${\mathit{\beta}}_{*}\in {\mathbb{R}}^{m}$ and covariance Γ_{pr}. That is, the prior density of β is given by
where ${\u2225\cdot \u2225}_{{\mathbf{\Gamma}}_{\mathrm{pr}}^{\mathrm{1}}}$ denotes the ${\mathbf{\Gamma}}_{\mathrm{pr}}^{\mathrm{1}}$ weighted l_{2} inner product.
3.2 The data likelihood
We assume the velocity measurements, denoted d, are corrupted by additive noise and are related to the basal sliding coefficient through
where $\mathcal{F}:{L}^{\mathrm{2}}\left(\mathrm{\Omega}\right)\to {\mathbb{R}}^{q}$ is called the parametertoobservable map, and e∈ℝ^{q} denotes the noise in the measurements.
As is somewhat common in the literature (e.g., Raymond and Gudmundsson, 2009; Petra et al., 2012, 2014), we take the data to consist of (noisy) pointwise observations of each component of the velocity field on the top surface^{3}. In discrete settings, we compute ℱ(β) by first solving the Stokes equation (Eq. 1) and then applying a linear observation operator that extracts the velocity at the measurement locations. We assume the noise, e, is independent of the basal sliding coefficient, has zero mean, and is Gaussian, i.e., $\mathit{e}\sim \mathcal{N}(\mathbf{0},{\mathbf{\Gamma}}_{e})$. The likelihood is then of the following form (e.g., Tarantola, 2005; Kaipio and Somersalo, 2005):
3.3 The posterior
By applying Bayes' theorem, the posterior density of β is proportional to the product of the prior density (Eq. 5) and the data likelihood (Eq. 7). This is given by
The corresponding MAP estimate is then defined as
We note that the problem of finding the MAP estimate, defined in Eq. (9), reduces to a deterministic inverse problem. To solve this problem we use an inexact Newton conjugate gradient (CG) method, as in Petra et al. (2012). To derive the required first (i.e., gradient) and second (i.e., Hessian) derivative information needed by Newton's method, we use an adjointbased method and refer the reader to Petra et al. (2012) for the derivation and expressions of the required derivatives.
3.4 Quantifying posterior uncertainty
To (approximately) quantify the resulting uncertainty in the inferred basal sliding parameter, we invoke a local Gaussian approximation of the posterior (i.e., the Laplace approximation). That is, the solution to the Bayesian inverse problem is now given by a Gaussian distribution with mean β_{MAP} and covariance Γ_{po} given by the inverse of the (Gauss–Newton) Hessian of the negative logposterior, evaluated at the MAP estimate. More specifically, we make the approximation, $\mathit{\beta}\mathit{d}\sim \mathcal{N}({\mathit{\beta}}_{\mathrm{MAP}},{\mathbf{\Gamma}}_{\mathrm{po}})$, with β_{MAP} given by Eq. (9), and
where $\stackrel{\mathrm{\u203e}}{\mathbf{H}}\left(\mathit{\beta}\right)$ is the Gauss–Newton Hessian of the data misfit term (i.e., the negative loglikelihood), and F is the Jacobian matrix of the parametertoobservable map, ℱ (e.g., BuiThanh et al., 2013).
The construction of the posterior covariance matrix (i.e., the inverse of the Hessian) is prohibitive for largescale problems since its dimension is equal to the dimension of the parameter. To make operations with the posterior covariance matrix tractable, we exploit the fact that the eigenvalues of $\stackrel{\mathrm{\u203e}}{\mathbf{H}}\left({\mathit{\beta}}_{\mathrm{MAP}}\right)$ collapse to zero rapidly since the data contain limited information about the (infinitedimensional) parameter field. Thus a lowrank approximation of the data misfit component of the Hessian $\stackrel{\mathrm{\u203e}}{\mathbf{H}}$ can be constructed as in Isaac et al. (2015a) by solving the generalized eigenvalue problem
where ${\mathbf{\Lambda}}_{r}=\mathrm{diag}({\mathit{\lambda}}_{\mathrm{1}},{\mathit{\lambda}}_{\mathrm{2}},\mathrm{\dots},{\mathit{\lambda}}_{r})\in {\mathbb{R}}^{r\times r}$ is a diagonal matrix collecting the r largest generalized eigenvalues, λ_{i}, and ${\mathbf{V}}_{r}=\left[{\mathit{v}}_{\mathrm{1}},{\mathit{v}}_{\mathrm{2}},\mathrm{\dots},{\mathit{v}}_{r}\right]\in {\mathbb{R}}^{m\times r}$ is the matrix collecting the corresponding ${\mathbf{\Gamma}}_{\mathrm{pr}}^{\mathrm{1}}$orthonormal eigenvectors, v_{i}. Above, the truncation index r is chosen such that the remaining eigenvalues, λ_{i}, for $i=r+\mathrm{1},\mathrm{\dots},m$, are sufficiently smaller than 1 – often chosen such that λ_{i}≤c, for some $\mathrm{0.01}\le c\le \mathrm{1}$ (e.g., Isaac et al., 2015a; Flath et al., 2011).
Substituting $\stackrel{\mathrm{\u203e}}{\mathbf{H}}\approx {\mathbf{\Gamma}}_{\mathrm{pr}}^{\mathrm{1}}{\mathbf{V}}_{r}{\mathbf{\Lambda}}_{r}{\mathbf{V}}_{r}^{T}{\mathbf{\Gamma}}_{\mathrm{pr}}^{\mathrm{1}}$ into Eq. (10) and using the Sherman–Morrison–Woodbury identity (Golub and Van Loan, 1996) and after a few algebraic manipulations (e.g., Isaac et al., 2015a), we obtain the following lowrankbased approximation of the posterior covariance (under the Laplace approximation):
where ${\mathit{d}}_{r}=\mathrm{diag}({\mathit{\lambda}}_{\mathrm{1}}/({\mathit{\lambda}}_{\mathrm{1}}+\mathrm{1}),{\mathit{\lambda}}_{\mathrm{2}}/({\mathit{\lambda}}_{\mathrm{2}}+\mathrm{1}),\mathrm{\dots},{\mathit{\lambda}}_{r}/({\mathit{\lambda}}_{r}+\mathrm{1}\left)\right)\in {\mathbb{R}}^{r\times r}$.
The Bayesian approximation error (BAE) approach (Kaipio and Somersalo, 2007, 2005; Kaipio and Kolehmainen, 2013) can be used to approximately premarginalize over auxiliary parameters. The BAE approach essentially combines all uncertainties, including those generated by fixing uncertain parameters, into a single additive total error term. The total error term can then be premarginalized over, i.e., marginalized over before the acquisition of data. We next outline the process.
We denote by a the auxiliary parameters which in the current study are defined over the entire computational domain, Ω, and are assumed to be Gaussian distributed with covariance operator ${\mathcal{C}}_{a}={\mathcal{L}}^{\mathrm{2}}$, where ℒ is defined by
and mean value a_{∗}. In line with the forward problem, ℒ is equipped with periodic boundary conditions on the lateral boundaries of Ω, while on the top and bottom boundaries we enforce Robin boundary conditions. Note that explicit knowledge of the distribution of a is not needed; we only require the ability to sample realizations of a. In what follows, we denote by a any (possibly more than one) discretized auxiliary (uncertain) parameter, such as the rheology parameters.
Next, we let $\stackrel{\mathrm{\u0303}}{\mathcal{F}}(\mathit{\beta},\mathit{a})$ denote an accurate parametertoobservable mapping so that the relationship between the parameters and the measured data is given by
Then, with the aim of avoiding socalled joint inversion, i.e., estimating β and a simultaneously, we introduce the approximate parametertoobservable mapping: $\mathcal{F}\left(\mathit{\beta}\right)=\stackrel{\mathrm{\u0303}}{\mathcal{F}}(\mathit{\beta},{\mathit{a}}_{\ast})$. That is, the approximate parametertoobservable map is the accurate parametertoobservable map but with the auxiliary parameters fixed to the associated mean value, i.e., $\mathit{a}={\mathit{a}}_{\ast}$. Fixing a to some other nominal value is also possible.
The goal is then to carry out the estimation of β using only the approximate parametertoobservable map, ℱ(β), while taking into account the (statistics of) the discrepancy between the models. To this end, Eq. (14) is reformulated as
where $\mathit{\epsilon}=\stackrel{\mathrm{\u0303}}{\mathcal{F}}(\mathit{\beta},\mathit{a})\mathcal{F}\left(\mathit{\beta}\right)$ is known as the approximation error and $\mathit{\nu}=\mathit{e}+\mathit{\epsilon}$ as the total error (e.g., Nicholson et al., 2018; Tarvainen et al., 2010). Next, the approximation error is approximated as a Gaussian with mean ε_{*} and covariance Γ_{ε}, i.e., $\mathit{\epsilon}\sim \mathcal{N}({\mathit{\epsilon}}_{*},{\mathbf{\Gamma}}_{\mathit{\epsilon}})$. Though, formally, the approximation error depends on the parameters, i.e., $\mathit{\epsilon}=\mathit{\epsilon}(\mathit{\beta},\mathit{a})$, a further approximation, termed the enhanced error model or the composite error model approximation, is often made, which approximates ε as independent of all parameters (Kaipio and Kolehmainen, 2013). This leads to the total errors being distributed as $\mathit{\nu}\sim \mathcal{N}({\mathit{\nu}}_{*},{\mathbf{\Gamma}}_{\mathit{\nu}})=\mathcal{N}({\mathit{\epsilon}}_{*},{\mathbf{\Gamma}}_{e}+{\mathbf{\Gamma}}_{\mathit{\epsilon}})$.
Use of the BAE approach results in an updated posterior density for β:
which is obtained by explicit marginalization over ν (Kaipio and Kolehmainen, 2013). The updated MAP estimate is then
This updated expression for the MAP estimate is only a slight modification of the original MAP estimate given in Eq. (9); thus reformulating the corresponding adjoint, incremental forward, and incremental adjoint equations is essentially trivial. Lastly, the updated posterior covariance matrix (under the Laplace approximation) is now given by
4.1 Computing the approximation error statistics
In the current paper, all parameters are taken to have Gaussian (prior) distributions, i.e., $z\sim \mathcal{N}({z}_{\ast},{\mathcal{C}}_{z})$, with $z=(\mathit{\beta},a)$. We also assume β and a are independent; thus specifying β_{∗}, a_{∗}, 𝒞_{β}, and 𝒞_{a} fully describes the prior density.
Unlike the statistics of the parameters and the measurement noise, the mean (ε_{*}) and covariance (Γ_{ε}) of the approximation errors must in general be estimated based on (Monte Carlo) samples. That is,
with N∈ℕ the number of samples, ${\mathit{\epsilon}}^{\left(\mathrm{\ell}\right)}=\stackrel{\mathrm{\u0303}}{\mathcal{F}}({\mathit{\beta}}^{\left(\mathrm{\ell}\right)},{\mathit{a}}^{\left(\mathrm{\ell}\right)})\mathcal{F}\left({\mathit{\beta}}^{\left(\mathrm{\ell}\right)}\right)$ for $\mathrm{\ell}=\mathrm{1},\mathrm{2},\mathrm{\dots},N$, where β^{(ℓ)} and a^{(ℓ)} are samples drawn from the joint prior density, and $\mathit{e}=[{\mathit{\epsilon}}^{\left(\mathrm{1}\right)}{\mathit{\epsilon}}_{*},{\mathit{\epsilon}}^{\left(\mathrm{2}\right)}{\mathit{\epsilon}}_{*},\mathrm{\dots},{\mathit{\epsilon}}^{\left(N\right)}{\mathit{\epsilon}}_{*}]$. The samples, β^{(ℓ)} and a^{(ℓ)}, are generated efficiently as in Villa et al. (2021, Eq. 30).
It is worth noting that all sampling and computations of the approximation errors and the associated statistics can be carried out prior to the acquisition of any data and is thus often termed offline computations (Kaipio and Kolehmainen, 2013). Furthermore, though the computational cost per sample of ε in the current paper is two forward (nonlinear) Stokes solves, the sampling procedure is embarrassingly parallel, i.e., each sample can be carried out independently, and in practice, only a fairly small number of samples is required.
We conclude this section by giving several rules of thumb relating to the use of the BAE approach (for more details see Kaipio and Kolehmainen, 2013). Firstly, the total number of samples required to accurately construct the statistics of the approximation errors is generally (often substantially) less than N=1000. Secondly, two measures have been developed to identify when neglecting the approximation errors can result in misleading, and potentially infeasible, results (Kaipio and Kolehmainen, 2013). Specifically, if either
holds or if for any w∈ℝ^{q}
holds, then the approximation errors are said to dominate the noise, and ignoring them often gives erroneous results. Intuitively, if the approximation errors dominate the noise, then ignoring them often results in overconfidence in the approximate forward model, in turn leading to overly confident and biased posterior estimates.
In this section, we outline three numerical examples to assess the applicability, performance, and robustness of the BAE approach to account for uncertain rheology parameters. Several additional examples are provided in the accompanying Supplement, as detailed in Sect. 6.4. In all cases, the parameter of interest is the basal sliding coefficient, β. Any other unknown/uncertain parameters are (approximately) premarginalized over using the BAE approach, as outlined in Sect. 4.
The forward problems considered here are based on the models used in the Ice Sheet Model Intercomparison Project for HigherOrder ice sheet Models (ISMIPHOM) benchmark study carried out in Pattyn et al. (2008). Accordingly, all problems are considered in boxlike geometries, i.e., $\mathrm{\Omega}=[\mathrm{0},L{]}^{d\mathrm{1}}\times [\mathrm{0},H]$, for d=2 (in examples 1 and 2) or d=3 (in Example 3). Furthermore, in all model problems we take the ice slab to be set on an incline plane with slope θ=0.1^{∘}, the density of the ice to be ρ=910 kg m^{−d}, and the gravitational acceleration constant to be g=9.81 m s^{−2}. For all examples, we set the length at L=10^{4} m, while for examples 1 and 2 we set H=250 m, and for Example 3, we set H=10^{3} m. In Fig. 1, we show a twodimensional schematic of the model problem setup.
The true basal sliding coefficient fields used for each example are based on those in Petra et al. (2012). Specifically, letting $\mathit{\omega}=\mathrm{2}\mathit{\pi}/L$, for examples 1 and 2 (posed in two dimensions) we set
as shown in Fig. 7, while in Example 3 (posed in three dimensions) we set
as shown in Fig. 13.
For all numerical experiments, we use synthetic measurements; these are randomly placed noisy pointwise measurements of each component of the velocity on the top surface of the domain, i.e., at points on Γ_{t}. Examples 1 and 2 are carried out based on q=80 measurement locations, while for Example 3 we use q=100 measurement locations. These measurements are obtained by adding zero mean white noise to the solution of the forward problem. Thus the additive noise is of the form $\mathit{e}\sim \mathcal{N}(\mathbf{0},{\mathbf{\Gamma}}_{e})$ with covariance matrix ${\mathbf{\Gamma}}_{e}={\mathit{\delta}}_{e}^{\mathrm{2}}\mathbf{I}$. We take δ_{e} to satisfy ${\mathit{\delta}}_{e}=\left(\frac{\mathrm{1}}{\mathrm{100}}\right)\times (max(\mathcal{B}\mathit{u}\left({\mathit{\beta}}_{\mathrm{true}}\right))min(\mathcal{B}\mathit{u}\left({\mathit{\beta}}_{\mathrm{true}}\right)\left)\right)$, i.e., the noise level is 1 % of the range of the noiseless synthetic measurements. The precise noise level is problem specific; however, when using GPS techniques and InSAR velocity measurements, a 1 % noise level is realistic (see, for example, Martin and Monnier, 2014 and the references therein).
For all examples considered here, the prior mean for the basal sliding coefficient, β, is set at ${\mathit{\beta}}_{*}=\mathrm{7}$. On the other hand, the prior covariance operator, 𝒞_{β}, is identical for examples 1 and 2, while for Example 3 different controlling parameters are used (details are provided in Table 1). Along with the true basal sliding coefficient used in examples 1 and 2, we also show the prior distribution and three samples drawn from the prior in Fig. 7. In Fig. 12, we show four samples from the prior used in Example 3.
5.1 Example problems
We now give the specific details of each model problem and make apparent which parameters we treat as auxiliary parameters and subsequently premarginalize over. Key details about each model problem are summarized in Table 1.
Example 1: uncertain flow rate factor in the twodimensional linear Stokes ice sheet model
The first example is carried out assuming a linearized (Stokes) ice sheet model in twodimensions. Specifically, we set n=1 in Eq. (1), resulting in the effective viscosity being given by $\mathit{\eta}\left(\mathit{x}\right)=\frac{\mathrm{1}}{\mathrm{2}}A(\mathit{x}{)}^{\mathrm{1}}$. The flow rate factor, A, is taken to be unknown and spatially varying, as is often the case in reality. We represent the flow rate factor as $A={A}_{\mathrm{0}}\mathrm{exp}(na(\mathit{x}\left)\right)$, with ${A}_{\mathrm{0}}=\mathrm{2.140373}\times {\mathrm{10}}^{\mathrm{7}}$ Pa^{−1} a^{−1}, n=1 is the Glen's flow law exponent, and the prefactor, exp (−na(x)), takes the role of the auxiliary parameter, which will subsequently be premarginalized over using the BAE approach. The prefactor accounts for several physical and computational phenomena, such as the Arrhenius relationship between A(x) and the ice temperature (e.g., Cuffey and Paterson, 2010; Zhu et al., 2016) and the use of the socalled enhancement factors (Cuffey and Paterson, 2010; Ma et al., 2010). The “exp” function is used to ensure the prefactor remains positive.
The prior distribution of the flow rate prefactor is set by taking the prior mean to be ${a}_{*}=\mathrm{0}$, while the controlling parameters of the prior covariance operator are given in Table 1. The true prefactor and three other samples drawn from this prior distribution are shown in Fig. 2 for examples 1a and 1b. As outlined below, the computational meshes used for examples 1a and 1b are different. This leads to differences in the true prefactor used for both examples. This in turn results in different synthetic data being used for the inversions; however, in both cases, the standard deviation of the noise is δ_{e}≈0.07. In both cases, the flow rate prefactor is discretized using continuous quadratic Lagrange basis functions.
We use this example to also demonstrate that the proposed approach is independent of the discretization, a critical property to have when aiming to solve largescale problems. This is done by considering identical problems on two different levels of discretization. Specifically, we consider the problem on two structured meshes having substantially different levels of discretization.
 (a)
In the first case, the computational mesh consists of 2000 triangular elements, which results in the discretized velocity and pressure having 8400 degrees of freedom (DOFs), and 1100 DOFs, while the basal sliding coefficient has 100 unknowns, and the flow rate prefactor has 4200 DOFs.
 (b)
In the second case, the mesh is refined, and it consists of 8000 triangular elements, leading to 32 800, and 4200 DOFs for the discretized velocity and pressure, respectively, while the dimensions of the basal sliding coefficient and the flow rate prefactor are 200 and 16 400, respectively.
Example 2: uncertain Glen's flow law exponent in the twodimensional nonlinear Stokes ice sheet model
For the second example, we use the nonlinear Stokes problem (Eq. 1) as the governing equation. We take the Glen's flow law exponent, n(x), as an uncertain (and unknown) spatially varying auxiliary parameter, i.e., we set a(x)=n(x), and proceed to approximately premarginalize over it. The prior mean of the Glen's flow law exponent is set to ${a}_{*}=\mathrm{3}$, while the parameters controlling the covariance operator, 𝒞_{a}, are given in Table 1 and are chosen to ensure that the Glen's flow law exponents are in line with the literature. As noted, a shearthinning rheology is generally used when modeling ice sheets, and we thus enforce 1≤n(x). In the current paper, this is done by rejection sampling, which corresponds to constraining the function space in which n lies (see Dashti and Stuart, 2016, Eq. 10.10 for details), though other methods could also be used, such as reparameterizing the Glen's flow law exponent.
We also use this example to study the effect of larger modeling errors (i.e., excessive errors). That is, we consider the case when the variance of the approximation errors is so large that essentially all information in the data is washed out. As we shall see, however, the resulting uncertainty estimates are still feasible. To induce larger uncertainties (and resulting approximation errors), we alter the prior covariance operator for the Glen's flow law exponent, n, to favor more highly oscillatory realizations. We can thus further divide Example 2 into two cases:
 (a)
modest approximation errors and
 (b)
excessive approximation errors.
The parameters used to control the covariance of the distributions on n are shown in Table 1. The true Glen's flow law exponents used to generate the data for examples 2a and 2b are drawn from the respective distributions, which, along with several other samples of the Glen's flow law exponent from each of the distributions, are shown in Fig. 3. In both cases, the Glen's flow law exponent is discretized using continuous linear Lagrange basis functions, while the computational mesh used is the same as that used in Example 1a. Finally, in Example 2a we have δ_{e}≈0.04, while in Example 2b we have δ_{e}≈0.05.
Example 3: uncertain flow rate factor in the threedimensional nonlinear Stokes ice sheet model
In this example, we consider a threedimensional (d=3), nonlinear analogue of Example 1. Specifically, we consider Eq. (1) in three dimensions, with the Glen's flow law exponent set to n=3. Similar to Example 1, we suppose the flow rate factor is spatially heterogeneous, unknown, and parameterized as $A={A}_{\mathrm{0}}\mathrm{exp}(na(\mathit{x}\left)\right)$. Here the nominal value for the flow rate factor is set to ${A}_{\mathrm{0}}={\mathrm{10}}^{\mathrm{16}}$ Pa^{−3} a^{−1}, the Glen's flow law exponent is set to n=3, and the prefactor, exp (−na(x)), takes into account several physical and computational phenomena as described previously.
The mean value of the auxiliary parameter is set at ${a}_{\ast}=\mathrm{0}$, while the parameters controlling the distribution of the prefactor are given in Table 1. These values for the prior covariance operator of a ensure the flow rate values are in line with those presented in the literature (see, for example, Table 3.4 in Cuffey and Paterson, 2010). In Fig. 4, we show the true flow rate prefactor (a sample from the prior) along with three other samples from the associated prior density. Unlike Example 1, the flow rate prefactor in this example is discretized using continuous linear Lagrange basis functions. The computational mesh used consists of 19 200 tetrahedral elements, leading to 81 600 DOFs for the velocity, 3600 DOFs for the pressure, 27 200 DOFs for the flow rate prefactor, and 400 DOFs for the basal sliding coefficient.
5.2 Estimates and approximate posterior covariances
For each of the examples listed above, we compare the estimation results (MAP points and approximate posterior covariances) for three different approaches. Within each example, for each of the approaches, the same prior distribution is used for the basal sliding coefficient; thus it is only the associated likelihoods that differ. In our analysis, we place particular emphasis on the feasibility of the posterior estimates, that is, whether or not the computed posterior distributions support the true basal sliding coefficient. The three different approaches considered are as follows.
 (a)
The accurate case (REF). In this case, any auxiliary parameters are set to their true values; i.e., we use $\stackrel{\mathrm{\u0303}}{\mathcal{F}}(\mathit{\beta},{a}_{\mathrm{true}})$ as the parametertoobservable map. REF is computed as a benchmark/reference. The resulting likelihood for REF is
$$\begin{array}{}\text{(24)}& {\mathit{\pi}}^{\mathrm{REF}}\left(\mathit{d}\mathrm{}\mathit{\beta}\right)\propto \mathrm{exp}\left\{{\displaystyle \frac{\mathrm{1}}{\mathrm{2}}}{\u2225\stackrel{\mathrm{\u0303}}{\mathcal{F}}(\mathit{\beta},{\mathit{a}}_{\mathrm{true}})\mathit{d}\u2225}_{{\mathbf{\Gamma}}_{e}^{\mathrm{1}}}^{\mathrm{2}}\right\},\end{array}$$while the accurate MAP estimate and the corresponding posterior covariance matrix are denoted by ${\mathit{\beta}}_{\mathrm{MAP}}^{\mathrm{REF}}$ and ${\mathbf{\Gamma}}_{\mathrm{po}}^{\mathrm{REF}}$, respectively.
 (b)
The conventional error model approach (CEM). This approach uses the standard error model (induced by the additive error, e) while using the approximate model, ℱ(β), where the auxiliary parameters are set to some nominal value (such as $a={a}_{\ast}$). The likelihood is then of the form
$$\begin{array}{}\text{(25)}& {\mathit{\pi}}^{\mathrm{CEM}}\left(\mathit{d}\mathrm{}\mathit{\beta}\right)\propto \mathrm{exp}\left\{{\displaystyle \frac{\mathrm{1}}{\mathrm{2}}}{\u2225\mathcal{F}\left(\mathit{\beta}\right)\mathit{d}\u2225}_{{\mathbf{\Gamma}}_{e}^{\mathrm{1}}}^{\mathrm{2}}\right\}.\end{array}$$We denote the corresponding MAP estimate and the posterior covariance matrix by ${\mathit{\beta}}_{\mathrm{MAP}}^{\mathrm{CEM}}$ and ${\mathbf{\Gamma}}_{\mathrm{po}}^{\mathrm{CEM}}$, respectively.
 (c)
The Bayesian approximation error approach (BAE). This approach also uses the approximate model, ℱ(β), but accounts for the approximation errors using the BAE approach outlined in Sect. 4. As given in Eq. (17), the updated likelihood found using the BAE approach is
$$\begin{array}{}\text{(26)}& {\mathit{\pi}}^{\mathrm{BAE}}\left(\mathit{d}\mathrm{}\mathit{\beta}\right)\propto \mathrm{exp}\left\{{\displaystyle \frac{\mathrm{1}}{\mathrm{2}}}{\u2225\mathcal{F}\left(\mathit{\beta}\right)\mathit{d}+{\mathit{\nu}}_{*}\u2225}_{{\mathbf{\Gamma}}_{\mathit{\nu}}^{\mathrm{1}}}^{\mathrm{2}}\right\},\end{array}$$with the MAP estimate and the posterior covariance matrix denoted by ${\mathit{\beta}}_{\mathrm{MAP}}^{\mathrm{BAE}}$ and ${\mathbf{\Gamma}}_{\mathrm{po}}^{\mathrm{BAE}}$, respectively.
Here we discuss and compare MAP estimates for the basal sliding coefficient and the respective approximate posterior covariance for each example. As alluded to previously, we pay particular attention to the feasibility of the posterior uncertainty estimates when comparing the results. We also examine the spectrum of the prior preconditioned misfit Hessians, which gives further insight into the uncertainty, and the sensitivity of each approach. To conclude the section, we give a brief comparison of the online computational costs (in terms of linearized Stokes partial differential equation, PDE, solves) for computing the MAP estimates.
To solve the optimization problems, we use an inexact NewtonCG method (see, for example, Petra et al., 2012). In all cases, we start the optimization procedure using the prior mean for the initial estimate of the basal sliding coefficient, while the prior covariance operator is used as the preconditioner. The optimization is carried out using a Gauss–Newton Hessian approximation for the first five iterations and then full Newton combined with an Armijo line search (Nocedal and Wright, 2006). Convergence is established when the gradient has decreased by a factor of 10^{6} relative to the norm of the initial gradient.
The numerical results presented in this paper are obtained using hIPPYlib (an inverse problem Python library; Villa et al., 2018, 2021). The hIPPYlib library implements stateoftheart scalable adjointbased algorithms for PDEbased deterministic and Bayesian inverse problems. It builds on FEniCS (Dupont et al., 2003; Logg et al., 2012) for the discretization of the PDEs and on PETSc (Balay et al., 2009) for scalable and efficient linear algebra operations and solvers needed for the solution of the PDEs. In line with the finite element discretization used for the weak form of the forward problem (Eq. 2), in what follows, we use Taylor–Hood finite elements for the adjoint, incremental forward, and incremental adjoint equations, as in Petra et al. (2012).
6.1 Example 1
In this example, we consider the case of an uncertain flow rate factor in the twodimensional linear Stokes ice sheet model and demonstrate the mesh independence of the approach. We begin by discussing the statistics of the approximation errors which are induced by treating the unknown flow rate factor as a known constant, specifically, $A=\mathrm{2.140373}\times {\mathrm{10}}^{\mathrm{7}}\phantom{\rule{0.125em}{0ex}}{\text{Pa}}^{\mathrm{1}}\phantom{\rule{0.125em}{0ex}}{\text{a}}^{\mathrm{1}}$. In Fig. 5, we show the marginal distribution of the approximation errors in the x component (top) and y component (bottom) for Example 1a (left) and Example 1b (right). The approximation errors are similar for the coarser mesh (Example 1a) and the finer mesh (Example 1b), both having fairly constant mean and variance in each component. For both examples, the mean of the approximation errors in the x component of the velocity measurements is nonzero, ${\mathit{\epsilon}}_{\ast}\approx \mathrm{0.2}$, while the standard deviation of the approximation errors is substantially larger than the additive noise (δ_{e}≈0.07). That is, the approximation errors dominate the additive noise, as explained in Sect. 4.1, and it is likely (and is indeed the fact) that ignoring the approximation errors may lead to infeasible results.
To illustrate the convergence of the approximation errors, in the top row of Fig. 6, we show the spectrum of the approximation errors covariance matrices, Γ_{ε}, for examples 1a and 1b for increasing sample sizes (N=62, 250, 500, and 1000). From the figure, it is evident that for N≥250 samples, the spectra essentially coincide and have both converged, thus demonstrating the discretization independence of the approach and that approximately N≥250 samples are likely sufficient to characterize the approximation error statistics. Note, however, that the results displayed here use N=1000 samples.
To give further insight into the distribution of the approximation errors, we show the (Pearson's) correlation matrix of the approximation errors for Example 1a (left) and 1b (right) in the bottom row of Fig. 6. Firstly, we notice that the correlation matrices are highly structured (unlike the noise covariance matrix which is diagonal). It is also apparent that the correlation matrices are (visually) identical, further illustrating the discretization independence. The 2×2 block structure of the correlation matrices is to be expected since the measurement number indexing used corresponds to measuring the q=80 velocity measurements in the x direction first, followed by the 80 velocity measurements in the y direction. The behavior within the diagonal blocks is also fairly intuitive as periodic boundary conditions are used, while the structure also illustrates that measurements (relatively) far away from each other are fairly uncorrelated. Comparing the diagonal blocks we see that the approximation errors in the x component of the velocity measurements are more highly correlated at greater distances than those of the y component. Finally, the offdiagonal blocks show a nontrivial correlation between the approximation errors in the x and y components. In particular, this figure shows that the approximation errors have a similar structure to the main diagonal but reveals smaller (Pearson's) correlation coefficients (these range from about −0.5 to 0.5).
In the top row of Fig. 7, we show the marginal prior distributions and the resulting marginal posterior distributions. Also shown are the corresponding MAP estimates, the true basal sliding coefficient, and three draws from each of the distributions. Firstly, the accurate MAP estimate, ${\mathit{\beta}}_{\mathrm{MAP}}^{\mathrm{REF}}$, is in good agreement with the true basal sliding coefficient, while the accurate posterior is clearly feasible in the sense that the true basal sliding coefficient is well supported by the Laplaceapproximated posterior. On the other hand, the MAP estimate found using the conventional error model, ${\mathit{\beta}}_{\mathrm{MAP}}^{\mathrm{CEM}}$, differs substantially from the true basal sliding coefficient over most of the domain. Furthermore, the posterior is essentially infeasible, with the actual coefficient having virtually no posterior density. Conversely, the MAP estimate found using the BAE approach, ${\mathit{\beta}}_{\mathrm{MAP}}^{\mathrm{BAE}}$, is in fairly good agreement with the true coefficient, and the Laplaceapproximated posterior supports the truth well. We do see that the marginal posterior standard deviations found using the BAE approach are somewhat larger than those found using the accurate and conventional error approaches. This is typical, and to be expected, as the additional uncertainty in the flow rate prefactor manifests itself as extra posterior uncertainty.
In the bottom row of Fig. 7, we show the corresponding results for Example 1b. The results are fairly similar to Example 1a when using the accurate approach and when using the BAE approach^{4} despite the substantial difference in the discretizations used. Lastly, the MAP estimate found using the conventional error model has changed drastically from Example 1a, though the posterior is equally as bad.
6.2 Example 2
In this example, we consider the case of an uncertain Glen's flow law exponent in the twodimensional nonlinear Stokes ice sheet model and also demonstrate what happens when the approximation errors are, in some sense, too large. The approximation errors here are the result of treating the unknown, and spatially varying, Glen's flow law exponent as a fixed constant, i.e., setting $n={n}_{\mathrm{0}}=\mathrm{3}$. To induce the larger approximation errors for Example 2b compared to Example 2a, we increase the uncertainty in the Glen's flow law exponent by altering the associated prior distribution (see Table 1). The difference in magnitude of the approximation errors is apparent in Fig. 8, where we show the marginal distribution of the approximation errors at the observation locations in the x direction (top) and y direction (bottom) velocities for examples 2a (left) and 2b (right). Note that the variance of the approximation errors for Example 2b is substantially larger than that of Example 2a. Considering that the standard deviation of the added noise for the small approximation error case (Example 2a) is ${\mathit{\delta}}_{{e}_{\mathrm{a}}}\approx \mathrm{0.04}$ and for the large approximation error case (Example 2b) is ${\mathit{\delta}}_{{e}_{\mathrm{b}}}\approx \mathrm{0.05}$, the approximation errors in both examples dominate the noise (see Sect. 4.1).
The top row of Fig. 9 shows the spectrum of the covariance matrices of the approximation errors, Γ_{ε}, for N=125, 500, 1000, and 2000 samples for Example 2a and for N=250, 1000, 2000, and 4000 samples for Example 2b. For Example 2a, it appears N≈500 is enough samples, though for the results here we used N=1000, while for Example 2b we require N≈2000 samples. The fact that more samples are required to ensure convergence of the approximation errors in Example 2b follows naturally from the increased uncertainty. It is worth pointing out that Example 2b is used mainly to demonstrate how the BAE approach performs in the presence of too much modeling uncertainty; thus for the purposes of the current study we deem taking N=2000 as tolerable.
In the bottom row of Fig. 9, we show the (Pearson's) correlation matrices of the approximation errors. The correlation matrices for this example share several of the characteristics seen in the corresponding correlation matrices in Example 1, specifically, the block structure and general behavior within the blocks. Comparing the correlation matrices for examples 2a and 2b, it appears the approximation errors in the x component for Example 2a are more highly correlated at greater distances towards the edges of the computational domain compared to the approximation errors in the x component of the velocity measurements for Example 2b.
In the top row of Fig. 10, we show the marginal prior and Laplaceapproximated posterior distributions, as well as three draws from each of the distributions, the corresponding MAP estimates, and the true basal sliding coefficient for Example 2a. A couple of conclusions can be drawn from this figure. First, the accurate MAP estimate, ${\mathit{\beta}}_{\mathrm{MAP}}^{\mathrm{REF}}$, closely resembles the true basal sliding coefficient, and the truth is well supported by the accurate posterior distribution. Second, the Laplaceapproximated posterior found using the conventional error approach is infeasible for most of the right half of the domain, with the MAP estimate, ${\mathit{\beta}}_{\mathrm{MAP}}^{\mathrm{CEM}}$, (severely) underestimating the true basal sliding coefficient. Third, the true basal sliding coefficient lies well within the bulk of the (Laplaceapproximated) posterior for the BAE approach, with the MAP estimate, ${\mathit{\beta}}_{\mathrm{MAP}}^{\mathrm{BAE}}$, in fairly good agreement with the true basal sliding coefficient.
In the bottom row of Fig. 10, we show the corresponding results for Example 2b, in which the approximation errors are excessive. Under the Laplace approximation, the accurate posterior, found by using the true Glen's flow law exponent, remains an accurate representation of the truth as in Example 2a. The posterior found using the conventional error model approach has significantly deteriorated, however, with the true basal sliding coefficient even more markedly underestimated and the truth lying well outside the bulk of the Laplaceapproximated posterior over almost all of the domain. Conversely, by taking into account the excessive modeling errors in Example 2b, the posterior found using the BAE approach is comparable to the prior, with the corresponding MAP estimate, ${\mathit{\beta}}_{\mathrm{MAP}}^{\mathrm{BAE}}$, being fairly similar to the prior mean. This demonstrates that when using the BAE approach, as the modeling errors become larger, the corresponding posterior density tends towards the prior, as should be hoped, to avoid overconfidence in biased results.
6.3 Example 3
In this example, we consider an uncertain flow rate factor in a larger scale, threedimensional nonlinear Stokes ice sheet model. The approximation errors are the result of setting the unknown flow rate factor to $A={\mathrm{10}}^{\mathrm{16}}\phantom{\rule{0.125em}{0ex}}{\text{Pa}}^{\mathrm{3}}\phantom{\rule{0.125em}{0ex}}{\text{a}}^{\mathrm{1}}$. The spectrum for the approximation errors are shown in Fig. 11. The plot indicates that taking $\mathrm{500}<N\le \mathrm{1000}$ samples would likely suffice to accurately characterize the approximation errors. For the results discussed here, we used N=1000 samples. The average standard deviation of the approximation errors in the x component of the approximation errors is approximately 3.1, while for the y and z components, the average standard deviation of the approximation errors are 0.5 and 0.4, respectively. The standard deviation of the noise, on the other hand, is δ_{e}≈0.25. We thus can expect the resulting estimates found by disregarding the approximation errors to be unreasonable (see Sect. 4.1).
In Fig. 12, we show four draws from the prior density on the basal sliding coefficient, while in Fig. 13 we show the true basal sliding coefficient (top left) and each of the MAP estimates: ${\mathit{\beta}}_{\mathrm{MAP}}^{\mathrm{REF}}$ (top right), ${\mathit{\beta}}_{\mathrm{MAP}}^{\mathrm{CEM}}$ (bottom left), and ${\mathit{\beta}}_{\mathrm{MAP}}^{\mathrm{BAE}}$ (bottom right). We also show the locations (y=2.5 km and y=7.5 km) of two lines, labeled l_{1} and l_{2}, for which crosssectional plots are shown in Fig. 14. It is clear from Figs. 13 and 14 that the reference posterior is completely feasible, and the corresponding MAP estimate is in good agreement with the true basal sliding coefficient. On the other hand, although the MAP estimate found using the conventional error model (CEM) shows similar qualitatively behavior, as seen in Fig. 13, when taking the corresponding posterior density into account, it is clear that the approach is essentially infeasible, with the truth lying well outside the bulk of the posterior across most of the domain (see Fig. 14). Finally, from Figs. 13 and 14 we see that, though not as good as the accurate case, the MAP estimate found using the BAE approach qualitatively remains similar to the truth. Furthermore, the truth is generally very well supported by the BAE posterior under the Laplace approximation.
6.4 Additional examples in the Supplement
To further demonstrate the flexibility and robustness of the proposed approach, we provide several additional numerical examples in the Supplement accompanying this paper. Specifically, we consider three additional cases that are variations of Example 1. Section S.1 of the Supplement demonstrates the robustness of the BAE approach in the case in which the true distribution of the auxiliary parameter is not known. The results show that as long as the true auxiliary parameter is well supported by the assumed distribution, the BAE approach provides posteriorconsistent estimates of the basal sliding coefficient β. Section S.2 compares the CEM and BAE approaches for different assumptions of the mean and marginal variance of the prior distribution. The results show that the qualitative behavior of the two approaches is consistent to that observed for Example 1. Finally, Sect. S.3 compares the BAE approach and a socalled tempering approach, in which a heuristic criterion (such as the Lcurve) is used to select an appropriate scaling of the prior or likelihood density. This tempering approach can be understood as varying the regularization parameter in a deterministic setup. This example demonstrates that the BAE approach provides a robust solution to the inverse problem without requiring multiple solutions of the inverse problem for different scaling parameters. In addition, in contrast to the tempering approach, the BAE approach does not rely on (possibly) unreliable heuristic methods to select the scaling parameter.
6.5 Spectra of the data misfit Hessians and computational costs
In this section, we compare the spectra of the data misfit Hessian and the computational cost of the three approaches (accurate, conventional error, and Bayesian approximation error) for each of the three examples. The dominant eigenvalues of the data misfit Hessian, $\stackrel{\mathrm{\u203e}}{\mathbf{H}}$ (see Eq. 10), evaluated at the corresponding MAP estimate, are shown in Fig. 15 for each example. Firstly, we observe that for all three approaches in all three examples, we only need to retain a relatively low number of eigenvalues and eigenvectors to compute a reasonable lowrank approximation of the Laplace posterior covariance matrix. Secondly, we see that the dominant spectrum resulting from using the BAE approach is often lower than that of the reference and conventional error approach cases. This is to be expected since we are accommodating the approximation errors, which naturally lead to an increase in uncertainty. Finally, the dominant spectrum of the misfit Hessian for examples 1 and 2, found using the conventional error approach, further illustrates the fact that ignoring the uncertainty in the auxiliary parameters can lead to overconfidence in erroneous estimates.
The spectrum of the misfit Hessian for Example 3, found using the conventional error approach, seems to be somewhat anomalous in that the spectrum decays faster than that of the misfit Hessian found using the BAE approach. However, this is possibly explained by the fact that the respective misfit Hessians are evaluated at quite different MAP estimates.
Figure 15 shows that the number of eigenvalues required to compute a reasonable lowrank approximation, in the sense of Eq. (12), is considerably lower for the BAE approach in most of the examples. This result suggests that computing the lowrank approximation is cheaper for the BAE approach compared to the other two approaches.
With regard to the computational cost, we consider the number of (linearized) Stokes problem solves required for the optimization algorithm to converge as the unit of cost. As stated in Sect. 3, we use the inexact NewtonCG algorithm with Armijo line search to find the MAP point. At each iteration, inexact NewtonCG requires the following:
 (a)
one (or more if required to satisfy the sufficient descent condition) evaluation of the loglikelihood, which involves solving the nonlinear Stokes equations;
 (b)
one gradient evaluation, which involves solving an additional linearized Stokes problem, i.e., the adjoint equation; and
 (c)
one Newton system solve using the conjugate gradient (CG) method, which at each CG iteration requires solving two linearized Stokes problems, i.e., the incremental forward and adjoint problems.
The total number of linearized Stokes solves required to compute the MAP estimate can then be calculated – per Gauss–Newton iteration – as the sum of the number of iterations required to solve the nonlinear forward problem plus one adjoint solve (to calculate the gradient), along with one incremental forward solve and one incremental adjoint solve (to calculate the action of the Hessian) per CG iteration. To ensure a sufficient decrease in the objective function at each (Gauss–)Newton iteration, the forward problem may be solved multiple times until the Armijo condition is satisfied, thus further increasing the number of linearized Stokes solves.
The results shown in Table 2 indicate that in each of the examples considered in this paper, the BAE approach generally requires less than half the number of the linearized Stokes solves that are required for the REF case to converge to the MAP point. Furthermore, the conventional error approach requires (in some cases, significantly) more iterations, and thus linearized Stokes solves, than the accurate model. This is to be expected as the optimization is hampered by model mismatch. It is also worth noting that the CEM approach requires substantially more backtracking iterations compared to the REF and BAE approaches, which is in line with Nicholson et al. (2018). Furthermore, the number of CG iterations is significantly reduced for the BAE approach when compared to the CEM and REF case.
In this paper, we have considered the inference for the basal sliding coefficient field for ice sheet flow problems with uncertain rheology from surface velocity measurements. The rheology parameters of the ice, in particular the flow rate factor and the Glen's flow law exponent, are often uncertain and can, at best, only be estimated in practice. We considered examples in both two and three dimensions and used both the linear and nonlinear Stokes ice sheet model. In each of the cases considered, our goal was to infer the basal sliding coefficient only; as such the unknown rheology parameters were a priori fixed to nominal values and treated as auxiliary parameters. To account for the resulting modeling uncertainties (or errors), we employed the Bayesian approximation error (BAE) approach. This approach shifts all uncertainty into a single additive total error term, which is approximated as Gaussian and can be premarginalized over.
The quantification of the resulting uncertainty in the estimated basal sliding coefficient was carried out based on the Laplace approximation to the posterior. In all of the examples considered here, the results suggest that fixing rheology parameters to standard values found in the literature can lead to overly confident and (heavily) biased estimates, with the true basal sliding coefficient generally lying outside the bulk of the posterior density if the uncertainty in the rheology parameters is not accounted for. Conversely, carrying out approximate premarginalization over the unknown rheology parameters, via the BAE approach, leads to feasible estimates for the basal sliding coefficient in all cases considered. To illustrate a limitation of the BAE approach, we included an example in which the modeling errors introduced were, in some sense, too large. This case led to a posterior density (found using the BAE approach) which showed very little reduction in variance compared to the prior, though it still contained the truth.
By avoiding the simultaneous estimation of the basal sliding coefficient and rheology parameters (which are spatially varying over the entire domain), the online computational overheads of the estimation problem are substantially reduced. To ensure the work carried out here is applicable to largescale problems, i.e., scalable, we initially posed the problem in infinite dimensions and then employed the adjointstate methodology to compute the MAP estimate.
In assessing the applicability and performance of the BAE approach, the current study only considers fairly limited domains, i.e., boxlike geometries and idealized boundary conditions. A natural next step for future work is to apply the same framework to more realistic setups and to continentalscale ice flow problems.
The numerical results presented in this paper were obtained using hIPPYlib version 3.0 (https://doi.org/10.5281/zenodo.3634136, Villa et al., 2018, 2021). The hIPPYlib software is an opensource project released under GPL v2. It is available for download at https://hippylib.github.io (last access: 1 April 2021).
No external data or geometries/meshes were used in this paper. The observations for the inversion are synthetic measurements obtained via simulation. All the parameters used to obtain the data are reported in the paper.
The supplement related to this article is available online at: https://doi.org/10.5194/tc1517312021supplement.
All authors contributed equally to the design of numerical experiments, the discussion and interpretation of the numerical results, and the writing of the original and revised version of this paper. OB implemented the numerical methods and ran the numerical experiments with help from all coauthors.
The authors declare that they have no conflict of interest.
The authors gratefully acknowledge computing time on the MultiEnvironment Computer for Exploration and Discovery (MERCED) cluster at UC Merced, which was funded by National Science Foundation grant no. ACI1429783. We thank the editor, Olivier Gagliardini, as well as the anonymous reviewer and Douglas Brinkerhoff for their valuable comments that contributed to improving the final version of this paper.
This research has been supported by the National Science Foundation (grant nos. ACI1550593, ACI1550547, and ACI1429783 and CAREER1654311).
This paper was edited by Olivier Gagliardini and reviewed by Douglas Brinkerhoff and one anonymous referee.
Arridge, S., Kaipio, J., Kolehmainen, V., Schweiger, M., Somersalo, E., Tarvainen, T., and Vauhkonen, M.: Approximation errors and model reduction with an application in optical diffusion tomography, Inverse Probl., 22, 175–195, https://doi.org/10.1088/02665611/22/1/010, 2006. a
Balay, S., Buschelman, K., Gropp, W. D., Kaushik, D., Knepley, M. G., McInnes, L. C., Smith, B. F., and Zhang, H.: PETSc Web page, available at: http://www.mcs.anl.gov/petsc (last access: 25 March 2021), 2009. a
Bons, P. D., Kleiner, T., Llorens, M.G., Prior, D. J., Sachau, T., Weikusat, I., and Jansen, D.: Greenland Ice Sheet: Higher nonlinearity of ice flow significantly reduces estimated basal motion, Geophys. Res. Lett., 45, 6542–6548, 2018. a, b
Brondex, J., GilletChaulet, F., and Gagliardini, O.: Sensitivity of centennial mass loss projections of the Amundsen basin to the friction law, The Cryosphere, 13, 177–195, https://doi.org/10.5194/tc131772019, 2019. a, b
Brynjarsdóttir, J. and O'Hagan, A.: Learning about physical parameters: The importance of model discrepancy, Inverse Probl., 30, 114007, https://doi.org/10.1088/02665611/30/11/114007, 2014. a
BuiThanh, T., Burstedde, C., Ghattas, O., Martin, J., Stadler, G., and Wilcox, L. C.: Extremescale UQ for Bayesian inverse problems governed by PDEs, in: SC12: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 10–16 November 2012, Salt Lake City, UT, USA, 1–11, https://doi.org/10.1109/SC.2012.56, 2012. a
BuiThanh, T., Ghattas, O., Martin, J., and Stadler, G.: A Computational Framework for InfiniteDimensional Bayesian Inverse Problems Part I: The Linearized Case, with Application to Global Seismic Inversion, SIAM J. Sci. Comput., 35, A2494–A2523, 2013. a, b, c, d, e
Bulthuis, K., Arnst, M., Sun, S., and Pattyn, F.: Uncertainty quantification of the multicentennial response of the Antarctic ice sheet to climate change, The Cryosphere, 13, 1349–1380, https://doi.org/10.5194/tc1313492019, 2019. a
Castello, D. A. and Kaipio, J. P.: Modeling errors due to Timoshenko approximation in damage identification, Int. J. Numer. Meth. Eng., 120, 1148–1162, 2019. a
Cuffey, K. M. and Paterson, W. S. B.: The physics of glaciers, Academic Press, Cambridge, MA, 2010. a, b, c, d, e
Daon, Y. and Stadler, G.: Mitigating the Influence of Boundary Conditions on Covariance Operators Derived from Elliptic PDEs, Inverse Probl. Imag., 12, 1083–1102, 2018. a
Dashti, M. and Stuart, A. M.: The Bayesian approach to inverse problems, in: Handbook of Uncertainty Quantification, edited by: Ghanem, R., Higdon, D., and Owhadi, H., Springer International Publishing, Cham, Switzerland, 311–428, https://doi.org/10.1007/9783319123851_7, 2016. a
Dupont, T., Hoffman, J., Johnson, C., Kirby, R. C., Larson, M. G., Logg, A., and Scott, L. R.: The FEniCS project, Chalmers Finite Element Centre, Chalmers University of Technology, 2003. a
Eisenstat, S. C. and Walker, H. F.: Choosing the forcing terms in an inexact Newton method, SIAM J. Sci. Comput., 17, 16–32, 1996. a
Elman, H. C., Silvester, D. J., and Wathen, A. J.: Finite Elements and Fast Iterative Solvers with applications in incompressible fluid dynamics, Oxford University Press, Oxford, 2005. a, b, c
Flath, P. H., Wilcox, L. C., Akçelik, V., Hill, J., van Bloemen Waanders, B., and Ghattas, O.: Fast Algorithms for Bayesian Uncertainty Quantification in LargeScale Linear Inverse Problems Based on LowRank Partial Hessian Approximations, SIAM J. Sci. Comput., 33, 407–432, 2011. a, b
GilletChaulet, F., Hindmarsh, R. C., Corr, H. F., King, E. C., and Jenkins, A.: Insitu quantification of ice rheology and direct measurement of the Raymond Effect at Summit, Greenland using a phasesensitive radar, Geophys. Res. Lett., 38, 2011. a, b
GilletChaulet, F., Gagliardini, O., Seddik, H., Nodet, M., Durand, G., Ritz, C., Zwinger, T., Greve, R., and Vaughan, D. G.: Greenland ice sheet contribution to sealevel rise from a newgeneration icesheet model, The Cryosphere, 6, 1561–1576, https://doi.org/10.5194/tc615612012, 2012. a, b
Giudici, M., Baratelli, F., Comunian, A., Vassena, C., and Cattaneo, L.: Model calibration for ice sheets and glaciers dynamics: a general theory of inverse problems in glaciology, The Cryosphere Discuss., 8, 5511–5537, https://doi.org/10.5194/tcd855112014, 2014. a, b, c
Glen, J. W.: The creep of polycrystalline ice, P. Roy. Soc. Lond. A. Mat., 228, 519–538, 1955. a, b
Gockenbach, M. S.: Understanding and implementing the finite element method, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2006. a
Golub, G. H. and Van Loan, C. F.: Matrix Computations, Johns Hopkins University Press, Baltimore, Maryland, USA, 1996. a
Hutter, K.: Theoretical Glaciology, Mathematical Approaches to Geophysics, D. Reidel Publishing Company, Springer Netherlands, Heidelberg, Germany, 1983. a
Isaac, T., Petra, N., Stadler, G., and Ghattas, O.: Scalable and efficient algorithms for the propagation of uncertainty from data through inference to prediction for largescale problems with application to flow of the Antarctic ice sheet, J. Comput. Phys., 296, 348–368, 2015a. a, b, c, d, e, f, g, h, i, j
Isaac, T., Stadler, G., and Ghattas, O.: Solution of nonlinear Stokes equations discretized by highorder finite elements on nonconforming and anisotropic meshes, with application to ice sheet dynamics, SIAM J. Sci. Comput., 37, B804–B833, 2015b. a, b
Kaipio, J. and Kolehmainen, V.: Bayesian Theory and Applications, chap. Approximate Marginalization Over Modeling Errors and Uncertainties in Inverse Problems, pp. 644–672, Oxford University Press, Oxford, UK, 2013. a, b, c, d, e, f, g, h, i, j
Kaipio, J. and Somersalo, E.: Statistical and Computational Inverse Problems, vol. 160 of Applied Mathematical Sciences, SpringerVerlag New York, 2005. a, b, c, d, e, f, g
Kaipio, J. and Somersalo, E.: Statistical Inverse Problems: Discretization, Model Reduction and Inverse Crimes, J. Comput. Appl. Math., 198, 493–504, 2007. a, b, c, d, e
Khristenko, U., Scarabosio, L., Swierczynski, P., Ullmann, E., and Wohlmuth, B.: Analysis of Boundary Effects on PDEBased Sampling of Whittle–Matérn Random Fields, SIAM/ASA Journal on Uncertainty Quantification, 7, 948–974, 2019. a, b
Lamien, B., Le Maux, D., Courtois, M., Pierre, T., Carin, M., Le Masson, P., Orlande, H. R. B., and Paillard, P.: A Bayesian approach for the estimation of the thermal diffusivity of aerodynamically levitated solid metals at high temperatures, Int. J. Heat Mass Transf., 141, 265–281, 2019. a
Lindgren, F., Rue, H., and Lindström, J.: An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach, J. R. Stat. Soc. B, 73, 423–498, 2011. a
Logg, A., Mardal, K.A., and Wells, G. N.: Automated Solution of Differential Equations by the Finite Element Method, vol. 84, in: Lecture Notes in Computational Science and Engineering, Springer, Berlin, Germany, 2012. a
Ma, Y., Gagliardini, O., Ritz, C., GilletChaulet, F., Durand, G., and Montagnat, M.: Enhancement factors for grounded ice and ice shelves inferred from an anisotropic iceflow model, J. Glaciol., 56, 805–812, 2010. a
Marshall, S. J.: Recent advances in understanding ice sheet dynamics, Earth Planet. Science Lett., 240, 191–204, 2005. a, b, c
Martin, N. and Monnier, J.: Adjoint accuracy for the full Stokes ice flow model: limits to the transmission of basal friction variability to the surface, The Cryosphere, 8, 721–741, https://doi.org/10.5194/tc87212014, 2014. a
Morlighem, M., Seroussi, H., Larour, E., and Rignot, E.: Inversion of basal friction in Antarctica using exact and incomplete adjoints of a higherorder model, J. Geophys. Res.Earth, 118, 1746–1753, 2013. a, b
Nicholson, R., Petra, N., and Kaipio, J. P.: Estimation of the Robin coefficient field in a Poisson problem with uncertain conductivity field, Inverse Probl., 34, 115005, https://doi.org/10.1088/13616420/aad91e, 2018. a, b, c, d, e
Nocedal, J. and Wright, S.: Numerical optimization, Springer Science & Business Media, New York, NY, USA, 2006. a
Osborn, S., Vassilevski, P. S., and Villa, U.: A multilevel, hierarchical sampling technique for spatially correlated random fields, SIAM J. Sci. Comput., 39, S543–S562, 2017. a
Paterson, W. S. B.: The Physics of Glaciers, 3rd edn., Butterworth Heinemann, Pergamon Oxford, UK, 1994. a
Pattyn, F., Perichon, L., Aschwanden, A., Breuer, B., de Smedt, B., Gagliardini, O., Gudmundsson, G. H., Hindmarsh, R. C. A., Hubbard, A., Johnson, J. V., Kleiner, T., Konovalov, Y., Martin, C., Payne, A. J., Pollard, D., Price, S., Rückamp, M., Saito, F., Souček, O., Sugiyama, S., and Zwinger, T.: Benchmark experiments for higherorder and fullStokes ice sheet models (ISMIP–HOM), The Cryosphere, 2, 95–108, https://doi.org/10.5194/tc2952008, 2008. a
Perego, M., Price, S., and Stadler, G.: Optimal initial conditions for coupling ice sheet models to Earth system models, J. Geophys. Res.Earth, 119, 1894–1917, 2014. a
Petra, N., Zhu, H., Stadler, G., Hughes, T. J. R., and Ghattas, O.: An inexact GaussNewton method for inversion of basal sliding and rheology parameters in a nonlinear Stokes ice sheet model, J. Glaciol., 58, 889–903, 2012. a, b, c, d, e, f, g, h, i, j, k
Petra, N., Martin, J., Stadler, G., and Ghattas, O.: A computational framework for infinitedimensional Bayesian inverse problems: Part II. Stochastic Newton MCMC with application to ice sheet inverse problems, SIAM J. Sci. Comput., 36, A1525–A1555, 2014. a, b, c, d, e, f, g, h, i
Pollard, D. and DeConto, R. M.: A simple inverse method for the distribution of basal sliding coefficients under ice sheets, applied to Antarctica, The Cryosphere, 6, 953–971, https://doi.org/10.5194/tc69532012, 2012. a, b
Raymond, M. J. and Gudmundsson, G. H.: Bayesian estimation of basal conditions on Rutford Ice Stream, West Antarctica, from surface data, J. Glaciol., 57, 315–324, 2011. a
Raymond, M. J. and Gudmundsson, G. H.: Estimating basal properties of ice streams from surface measurements: a nonlinear Bayesian inverse approach applied to synthetic data, The Cryosphere, 3, 265–278, https://doi.org/10.5194/tc32652009, 2009. a, b, c, d
Roininen, L., Huttunen, J. M. J., and Lasanen, S.: WhittleMatérn priors for Bayesian statistical inversion with applications in electrical impedance tomography, Inverse Probl. Imag., 8, 561, 2014. a
Schlegel, N.J., Larour, E., Seroussi, H., Morlighem, M., and Box, J.: Ice discharge uncertainties in Northeast Greenland from boundary conditions and climate forcing of an ice flow model, J. Geophys. Res. Earth, 120, 29–54, 2015. a
Schoof, C.: The effect of cavitation on glacier sliding, P. R. Soc. AMath. Phy., 461, 609–627, 2005. a
Schoof, C.: Coulomb friction and other sliding laws in a higherorder glacier flow model, Math. Mod. Meth. Appl. S., 20, 157–189, 2010. a
Steihaug, T.: Local and superlinear convergence for truncated iterated projections methods, Math. Program., 27, 176–190, https://doi.org/10.1007/BF02591944, 1983. a
Stuart, A. M.: Inverse problems: A Bayesian perspective, Acta Numer., 19, 451–559, 2010. a, b, c, d, e
Tarantola, A.: Inverse Problem Theory and Methods for Model Parameter Estimation, SIAM, Philadelphia, PA, 2005. a, b, c, d
Tarvainen, T., Kolehmainen, V., Pulkkinen, A., Vauhkonen, M., Schweiger, M., Arridge, S. R., and Kaipio, J. P.: Approximation error approach for compensating modelling errors in optical tomography, in: Biomedical Optics, BSuD48, 11–14 April 2010, Miami, Florida, USA, Optical Society of America, BSuD48, 2010. a
Truffer, M.: The basal speed of valley glaciers: an inverse approach, J. Glaciol., 50, 236–242, 2004. a, b
Van der Veen, C. J.: Fundamentals of glacier dynamics, CRC press, Boca Raton, Florida, 2013. a
Villa, U., Petra, N., and Ghattas, O.: hIPPYlib: An Extensible Software Framework for LargeScale Inverse Problems, J. Open Source Soft., 3, 115005, https://doi.org/10.21105/joss.00940, 2018 (data available at https://doi.org/10.5281/zenodo.3634136). a, b
Villa, U., Petra, N., and Ghattas, O.: hIPPYlib: An Extensible Software Framework for LargeScale Inverse Problems Governed by PDEs; Part I: Deterministic Inversion and Linearized Bayesian Inference, ACM Transactions of Mathematical Software, 47, 16, https://doi.org/10.1145/3428447, 2021. a, b, c, d
Zhao, C., Gladstone, R. M., Warner, R. C., King, M. A., Zwinger, T., and Morlighem, M.: Basal friction of Fleming Glacier, Antarctica – Part 1: Sensitivity of inversion to temperature and bedrock uncertainty, The Cryosphere, 12, 2637–2652, https://doi.org/10.5194/tc1226372018, 2018a. a
Zhao, C., Gladstone, R. M., Warner, R. C., King, M. A., Zwinger, T., and Morlighem, M.: Basal friction of Fleming Glacier, Antarctica – Part 2: Evolution from 2008 to 2015, The Cryosphere, 12, 2653–2666, https://doi.org/10.5194/tc1226532018, 2018b. a, b
Zhu, H., Petra, N., Stadler, G., Isaac, T., Hughes, T. J. R., and Ghattas, O.: Inversion of geothermal heat flux in a thermomechanically coupled nonlinear Stokes ice sheet model, The Cryosphere, 10, 1477–1494, https://doi.org/10.5194/tc1014772016, 2016. a, b
The “exp” function is used to ensure the basal sliding coefficient remains positive. For simplicity, in what follows, we will refer to β as the basal sliding coefficient.
For cases in which β has only one spatial dimension, an inverse elliptic operator, i.e., without the squaring, also results in a valid covariance operator (see Petra et al., 2014). However, in the current paper we consider cases in which β is onedimensional and twodimensional, and thus for ease of exposition, and in the interest of space, we limit the choice of the prior covariance operator to the squared inverse elliptic operator.
Vertical velocity measurements may not always be available; however, as shown in Raymond and Gudmundsson (2009), these measurements are fairly insignificant. Furthermore, the assumed noise level in the current paper is larger than the vertical velocities.
We attribute the differences in the BAE approach to the differences in the true flow rate prefactor, the noise realization, and the specific samples of ε.
 Abstract
 Introduction
 Forward ice sheet flow model
 Inferring the basal sliding coefficient field
 Premarginalization over auxiliary parameters and the Bayesian approximation error approach
 Numerical examples
 Results
 Conclusions
 Code availability
 Data availability
 Author contributions
 Competing interests
 Acknowledgements
 Financial support
 Review statement
 References
 Supplement
 Abstract
 Introduction
 Forward ice sheet flow model
 Inferring the basal sliding coefficient field
 Premarginalization over auxiliary parameters and the Bayesian approximation error approach
 Numerical examples
 Results
 Conclusions
 Code availability
 Data availability
 Author contributions
 Competing interests
 Acknowledgements
 Financial support
 Review statement
 References
 Supplement