Estimating basal properties of ice streams from surface measurements: a non-linear Bayesian inverse approach applied to synthetic data

. We propose a new approach to indirectly estimate basal properties of ice streams, i.e. bedrock topography and basal slipperiness, from observations of surface topography and surface velocities. We demonstrate how a maximum a posteriori estimate of basal conditions can be determined using a Bayesian inference approach in a combination with an analytical linearisation of the forward model. Using synthetic data we show that for non-linear media and non-linear sliding law only a few forward-step model evaluations are needed for convergence. The forward step is solved with a numerical ﬁnite-element model using the full Stokes equations. The Fr ´ echet derivative of the forward function is approximated through analytical small-perturbation solutions. This approximation is a key feature of the method and the effects of this approximation on model performance are an-alyzed. The number of iterations needed for convergence increases with the amplitude of the basal perturbations, but generally less than ten iterations are needed.


Introduction
The goal of geophysical inverse methods is to make quantitative inferences about Earth characteristics from indirect observations (e.g., Gouveia and Scales, 1998). Estimating basal properties of glaciers from surface measurements is an example of such an inverse problem. In this paper, we introduce and test the suitability of a non-linear probabilistic Bayesian inference approach (e.g., Rodgers, 2000;Tarantola, Correspondence to: G. H. Gudmundsson (ghg@bas.ac.uk) 2005) to estimate bedrock topography and basal slipperiness under ice streams from surface velocities and surface geometry. An identical Bayesian inference approach, but for linear media and small amplitudes of the basal perturbations, has been presented in Gudmundsson and Raymond (2008). In Bayesian inference, a priori information about the basal properties is expressed as a probability density function and combined with the surface measurements to give a posteriori probability distribution describing the final uncertainty of the estimate. The solution of the inverse problem, i.e. the a posteriori probability distribution, provides an ensemble of solutions from which we single out the most likely one corresponding to the maximum of the a posteriori probability (MAP estimate).
The forward function describing the relationship between basal conditions (bedrock topography and basal slipperiness), and the observations (surface topography, surface velocities, rates of elevation change), is solved numerically with a two-dimensional non-linear plane-strain finiteelement model. We restrict the observations to noisedegraded synthetic surface data generated with the forward model. A posteriori probability distribution for the system state is optimized via a non-linear Gauss-Newton procedure to find the maximum a posteriori probability.
A key issue in the derivation of the solution involves determining the sensitivity of surface fields to perturbation in basal quantities. The Fréchet derivative of the forward model can, in principle, be evaluated numerically. However, as the computational times involved in doing so are typically long in comparison to one forward model computation, it is, whenever possible, preferable to evaluate the derivatives of the forward function algebraically (Rodgers, 2000). Here we approximate the forward model derivatives using analytical Published by Copernicus Publications on behalf of the European Geosciences Union. transfer functions (Gudmundsson, 2003). These transfer functions describe the effects of small-amplitude perturbations in basal properties (bedrock profile and slipperiness) on surface fields in the case of Newtonian rheology and linear sliding law. The numerical forward model solves for nonlinear rheology, non-linear sliding law, and finite-amplitude basal perturbations. The transfer functions are, thus, only approximations to the actual Fréchet derivative of the forward model. It is far from clear that using the analytical transfer functions in this context will result in a usable inverse model. What is clear, however, is that if this does work, the resulting improvements in computational efficiency are large. The main focus of this paper is to determine whether this approximation is adequate in situations commonly encountered in glaciology. We do so by systematically constructing synthetic data sets where, to a varying degree, assumptions of the analytical theory are not fullfilled. Thus, we start by using linear rheology and small basal perturbations, and then in a step-wise fashion introduce finite-amplitude effects of basal perturbations and non-linear rheology. The influence of nonuniform englacial temperature and non-linear sliding law on retrieval are also examined. The case studies presented allow us to explore the performance of the proposed inverse procedure and in particular to assess the practicality of approximating the Fréchet derivative of the forward function using analytical small-amplitude solutions. We determine to what extent the inverse procedure converges to the true solution and how many iterations are needed.
The method proposed here differs in a number of ways from previous inverse methods developed and applied to ice streams. Thorsteinsson et al. (2003) used the analytical small-amplitude solutions by Gudmundsson (2003) as a forward model for least-squares inversion of data from Ice Stream E, West Antarctica. In doing so the assumption was made that non-linear effects were not strong. A novel aspect of the method used by Thorsteinsson et al. (2003) was the simultaneous inversion for both basal topography and basal slipperiness. Gudmundsson (2006) suggested using a formal Bayesian inverse method instead of the least-square approach used by Thorsteinsson et al. (2003). MacAyeal (1992) and MacAyeal et al. (1995) applied control theory to determine the basal shear stress under ice streams using surface velocity data, ice thickness and surface elevation. Joughin et al. (2004) used a similar method to arrive at estimates of basal stress for ice streams flowing over a perfectly plastic bed. These inverse procedures use forward models that solve a reduced set of the Stokes equations. Another interesting approach to surface-to-bed inversion can be found in Truffer (2004) who inverted a linearized onedimensional forward model to calculate the basal velocity of a valley glacier. Further examples of inversion of surface observations to determining basal conditions under glaciers can be found in e.g. Van der Veen and Whillans (1989) and Vieli and Payne (2003). Recently, Maxwell et al. (2008) proposed an itertive scheme for determining basal conditions by solv-ing a sequence of well-posed forward problems to arrive at solution to the (ill-posed) inverse problem.
The structure of the paper is as follows. We start by describing in Sect. 3 the numerical forward model. We then introduce the non-linear Bayesian inference method in Sect. 4. The results of the proposed inverse method are presented in Sect. 5, where the suitability of the method is discussed.

Notation
Vectors will be denoted by boldface italic letters (e.g. d) and matrices by bold uppercase letters (e.g. C). Surface measurements are available at discrete points, and we denote the set of all available surface quantities as the measurement vector d. The measurement vector d = (s,u,w) T consists of surface topography s, horizontal velocity u and vertical velocity w. The basal properties to be estimated are assembled into one system state vector m. The vector m = (b,c) T contains the basal topography b and basal slipperiness c. The superscript T means transposition, here to column vectors. The subscript prior denotes a prior estimate, while a hat (e.g.m) indicates a maximum a posteriori estimate.

Forward model
The relation between basal properties and surface data can be written as (1) We refer to the function g as the forward model but we also use the term "forward function" when referring to g. The forward model gives the surface quantities (surface velocity and surface topography) as a function of basal properties (bedrock topography and basal slipperiness). The model is non-linear because the ice rheology is non-linear, the sliding law is non-linear, and because the surface reacts in a nonlinear fashion to finite amplitude basal perturbations. We use as a forward function a numerical flow model that allows us to deal with all the types of nonlinearities mentioned above. The numerical model is a two-dimensional finite element model that uses four-node isoparametric and quadrilateral Hermann elements. A mixed Lagrangian-Eulerian approach is employed in determining the position of the steady-state surface (Leysinger Vieli and Gudmundsson, 2004). The numerical model solves the full equilibrium equations, together with the mass-conservation equation for incompressible ice. These equations read σ ij,j = −ρf i and v i,i = 0, respectively, where σ ij are the components of the Cauchy-stress tensor, ρ is the ice density, f the acceleration due to gravity and v i are the components of the velocity vector v = (u,w). The glacier geometry corresponds to a uniformly inclined plane slab of constant thickness on which perturbations in bed and surface topography are superimposed. Figure 1 illustrates the problem geometry. M. J. Raymond and G. H. Gudmundsson: Non-linear inversion of synthetic data 267 The coordinates are (x,z), where x is parallel and z perpendicular to the mean slope. The equation z = s(x,t) defines the surface and z = b(x) the base of the glacier. The constitutive law is Glen's flow law, extended, following Hutter (1983), with a linear term to avoid the singularity in viscosity as the deviatoric stress goes to zerȯ In this equation, A is the rate factor, n the stress exponent, ij and τ are the strain rate, the deviatoric stress tensors and the effective shear stress, respectively. The parameter τ 0 is the crossover stress at which the linear and power terms contribute equally to the total strain rate. This parameter has been introduced only in model runs where the model geometry did not give rise to a sufficiently big mean longitudinal strain rate to avoid a large effective viscosity at the surface. Values for the rate factor for temperate ice are taken from Paterson (1994), and the dependency on temperature follows Smith and Morland (1981).
Boundary conditions along the bed are specified by a sliding relation of the form where u b is the sliding velocity tangential to the bed, c(x) the sliding coefficient, τ b the bed parallel shear stress and m the basal sliding exponent. Basal sliding is introduced in the finite-element model by adding a uniform thin layer of different viscosity to the base of the glacier such that relation (3) is fulfilled. The ice surface (z = s(x,t)) is stress-free and evolves with time according to the kinematic boundary condition until steady-state is reached. The kinematic boundary condition reads ∂s ∂t where s(x,t) describes the surface elevation, t is the time, and u and w are the horizontal and vertical velocity components, respectively. In model runs accumulation and ablation are not taken into account, but doing so represents a straightforward modification. The kinematic boundary condition (Eq. 4) is integrated forward in time with an semiimplicit Crank-Nicholson scheme. To speed up the evolution of the free surface towards steady-state, we initialize the computations with the analytical steady-state surface profile from (Gudmundsson, 2003). Periodic displacement boundary conditions are imposed along the upstream and downstream glacier model boundaries.
The size of the mesh in the x-direction follows a Gaussian repartition centered around the middle of the prescribed perturbations where we expect the largest deformations. This allows to reduce the total amount of elements and computational time. The results of the numerical model have been validated by comparison with relevant perturbation theories .

Perturbed fields
We define perturbations in boundary data and all field variables as the difference between the value of the variable in question at some given point and its spatial averaged mean value. For the basal topography, for example, we write where b(x) is the bedrock profile,b the mean bedrock elevation, and b(x) = b(x) −b the bedrock perturbation. Similarly, the function describing the basal slipperiness c(x) is written as wherec is the mean basal slipperiness, andc c(x) is the basal slipperiness perturbation. Perturbations in bedrock topography are referred to in the following as b perturbations, and perturbations in basal slipperiness as c perturbations.
Similarly to the basal perturbations, the steady-state surface topography (s), surface horizontal velocity (u), and surface vertical velocity (w) are partitioned as s(x) =s + s(x),u(x) =ū + u(x),w(x) =w + w(x), respectively. Collectively u, w and s are the surface perturbations and b and c the basal perturbations. For reasons of notational compactness we will sometimes refer to s, u and w as the surface perturbations and to b and c as the basal perturbations.

Non-linear Bayesian inverse approach
We perform a Bayesian inverse calculation to determine both bedrock topography and basal slipperiness from surface topography and surface velocities data. In Bayesian inference, the notion of knowledge and uncertainty about data and system state is expressed in terms of probability density functions (pdfs). The solution of the inverse problem is a posterior probability distribution P (m|d,m prior ) for the sys- probability can be written as the combination of prior information m prior and the data d P (m|d,m prior ) = P (m|m prior )P (d|m) P (d) .
The denominator of Eq. (7) is independent of the system state and does not affect the position of the maximum of the conditional probability on the left hand side of Eq. (7). Hence, the a posteriori distribution reduces to the product of two terms, i.e. the a priori distribution P (m|m prior ) for the system state m and the likelihood function P (d|m). The likelihood function measures the probability of observing the data d if the system state was m, while the prior distribution incorporates prior information that is known independently of the measurements. As an example, the bed topography could be known independently from radio-echo sounding measurements. The a priori information may also arise from theoretical considerations (e.g., bedrock perturbation must be smaller than ice thickness), or some expectations (e.g., basal slipperiness not negative). Equation (7) is general. In this study, we assume that both data and modeling uncertainties can be described by Gaussian distributions. The a priori probability density function is therefore of the form where C M is the a priori covariance matrix describing the uncertainties in the prior system state. The likelihood function is given by Here, C D is the covariance matrix of the noise in the data and g(m) is the forward modeling operator encapsulating the relevant physics in the relation between surface data d and system state m as described in more detail in Sect. 3. Defining the cost function by and substituting Eqs. (8) and (9) into Eq. (7) we obtain . We solve Eq. (10) for the minimum of J (m) corresponding to the maximum of the posterior probability distribution P (m|d,m prior ), that is, we single out the most likely system statem from the ensemble described by the pdf. This solution is referred to as the maximum a posteriori solution (Rodgers, 2000, p. 66,84). Because the forward modeling operator g(m) is non-linear, there is no explicit solution to Eq. (10). We therefore perform a non-linear optimization to find the maximum a posteriori solutionm that maximizes P (m|d,m prior ). To find the minimum of J (m) we equate the derivative in respect to m of Eq. (10) to zero. Defining is the Jacobian matrix, the solution of the optimization problem is given by φ(m) = 0. The value ofm is found using Newton's method for systems of equations via the iteration where the subscript i denotes the i-th iteration, and The term ∇ m K(m) is the second derivative of the cost function, also called its Hessian. Equation (13) involves both the first derivative K(m) and the second derivative ∇ m K(m) of the forward model. As the product of ∇ m K(m) with the vec- ] is small in the moderately non-linear case and becomes smaller as the solution proceeds, this term can be ignored. Substituting Eqs. (11) and (13) into the Newtonian iteration Eq. (12) gives the m i+1 iteration according to the Gauss-Newton method, namely where K i = K(m i ). The Fréchet derivative of the forward model K i , is approximated using linear transfer functions, i.e The transfer functions T are analytical solutions for linear rheology describing the effects of small-amplitude variations in bed topography and basal slipperiness on surface fields (Gudmundsson, 2003(Gudmundsson, , 2008. The transfer functions T have a two letter suffix. The first suffix denotes the effect and the second one the cause. T sb describes a change in surface topography caused by a perturbation in bedrock topography, whereas T uc describes a change in surface-parallel velocity caused by a spatial variation in basal slipperiness. Figure 2 shows examples of analytical transfer functions as functions of the wavelength for both b and c perturbations. When using the transfer function formulation for K i , Eq. (14) is most easily solved in Fourier space. In Eq. (14) all vector components, i.e., surface fields, a priori and basal perturbations are therefore transformed to frequency space by  (14) is substituted with the Hermitian transpose H . The covariance matrix of the maximum a posteriori solutionm is given bŷ

Data uncertainties
The covariance matrix for the noise in the data C D is defined as The matrix C D is a block diagonal matrix consisting of the matrices describing the uncertainties in the surface topography C s , horizontal velocity C u and vertical velocity C w along the main diagonal. The off-diagonal blocks are zero matrices, since no cross-correlation between errors in surface topography, horizontal and vertical velocity is considered. The covariance matrix for the noise in surface topography C s takes the form where N is the number of discretization points. The matrix element σ s i s j is the covariance of surface data s i and s j with 1 ≤ i,j ≤ N. If the noise in the data is uncorrelated, the corresponding covariance matrix is of diagonal form. C u and C w are of the same form but with the indices s replaced by u and w, respectively. The covariance matrix for m prior is defined as where C b and C c have the same structure as C s . No a priori cross-correlation between the a priori estimates of basal topography and basal slipperiness is considered. The correlation between off-diagonal elements is assumed to follow a Gaussian distribution and the elements of C b are therefore given by Steady-state amplitude ratios and phase shifts of surface-parallel velocities for a bedrock (T ub ) and basal slipperiness perturbation (T uc ). The medium is Newtonian. The mean surface slope is 0.5 • and the mean non-dimensional basal slipperiness isC = 200. σ b describes the variance of the fluctuation about the mean m prior of the Gaussian probability density that characterizes the uncertainty of the a priori estimate, l b is the correlation length and x the position. A corresponding expression is used for basal slipperiness a priori covariance matrix (C c ) with l c denoting the correlation length.

First guess
To start the iteration we need an initial guess for m i=0 in Eq. (14). The initial guess generally corresponds to the a priori values for the model parameters m prior . Here, to define m i=0 , we assume that the relationship between basal and surface properties is linear (i.e. the ice rheology is linear and the amplitude of the basal perturbation is small) and can be completely described using the perturbation theory of Gudmundsson (2003). Thus, the forward relation Eq. (1) simplifies to d = Km+ , where K is a matrix of transfer functions. As all pdfs are still Gaussian, the cost function is of same form as Eq. (10) but with g(m) replaced by K. Taking the negative logarithm of this new expression and maximizing with respect to d, the maximum a posteriori solution is given bŷ Hence, we start the inverse procedure by setting m i=0 = m linear .

Convergence
The iteration is continued until either the convergence test is fulfilled, or a maximum number of iterations has been exceeded. The above criterion is based on the fact that 3 × N corresponds theoretically to the expected value of J (m) (Tarantola, 2005).

Inverse procedure
In summary, the different steps involved in the iterative optimization by which the objective function J (m) is minimized, are: i Initialization step: define a first guess for m i=0 assuming everything can be described perfectly by the analytical transfer functions valid for small-amplitudes and Newtonian rheology.
ii Note that the analytical transfer functions are used not only in the initialization step (i) but also in step (iv).

Model experiments
The main focus of this paper is to evaluate the performance of the inverse method described above in situations typically encountered on active ice streams. The key novel aspect of the method is the use of the analytical transfer functions in approximating the Fréchet derivative of the forward model. This approximation renders the method tractable on current generation of computers. On the other hand, because the transfer functions are based on small-amplitude perturbations theory, this aspect of the method is also the one which might most severely restrict its applicability. We start by inverting for basal properties for cases where we expect the method to work, i.e. for linear rheology, linear sliding law, and small perturbation amplitudes. Subsequently we introduce large perturbation amplitudes, non-linear ice rheology, and non-linear sliding. Only a subset of the experiments performed are presented here. A more detailed description can be found in Raymond (2007).

Synthetic data
All synthetic surface data sets (i.e. surface topography, horizontal and vertical velocity) were generated with the finiteelement forward model described in Sect. 3. The perturbations in bedrock and basal slipperiness correspond either to Gaussian peak distributions or to a step-shaped sigmoidal function with a corresponding expressions for the c perturbations.
Here a b is the basal amplitude, x b the center of the perturbation and the standard deviation σ b defines the width of the basal perturbation. The synthetic surface data set was corrupted by uncorrelated Gaussian noise. The forward relation Eq. (1) hence modifies to where are uncorrelated measurement errors. The surface data was assumed to be available at equally spaced location along the profile. The surface data was interpolated to the nodes of the non-equidistant finite-element mesh. This interpolation introduces errors that, in general, are spatially correlated, and the covariance matrix describing the surface data errors is therefore no longer of diagonal form. Data were interpolated using the Best Linear Unbiased Estimate (BLUE) (e.g. Kitanidis, 1997). The method makes use of the experimental variogram to describe the spatial variability of the measurements. The variogram was approximated with an isotropic Gaussian model. Key aspect of this interpolation method is that it delivers full covariance matrices C s , C u , and C w , describing the combined effects of noise in the original surface data and the errors and correlations introduced by the interpolation.

Non-dimensionalisation
All results are presented in non-dimensional form. The dimensional variables entering the problem are substituted through scalings by non-dimensional variables . The spatial variables (x and z) are scaled with the mean ice thicknessh and the velocity components (u and w) with the mean surface-parallel deformational velocityū d . The stresses and the pressure are scaled by the mean basal shear stressτ b . For a given mean surface slope, mean ice thickness, and stress-law exponent, the rate factor is scaled so that the mean forward deformational velocity is equal to unity. It follows that in non-dimensional form, the rate factor A equals to A = (n + 1)/2. When inserted into the sliding law these scalings give a non-dimensional basal slipperiness C(x) = c(x)τ (0)m /ū d where as before c(x) is the dimensional slipperiness and C(x) is its non-dimensional counterpart. It further follows thatC =ū b /ū d , whereC is the mean non-dimensional slipperiness (C(x) =C(1+ C)) and u b andū d are the mean basal and deformational velocities, respectively.C is therefore equal to the slip ratio which is defined as the ratioū b /ū d . Figure 3 illustrates a synthetic surface data set generated with the numerical forward model. The data represents surface response to a combined perturbation in both bedrock profile and basal slipperiness. The amplitudes of both basal disturbances are small (5% of their respective mean values) and the perturbations are Gaussian shaped with a standard deviation of 12 mean ice thicknesses. Mean surface slope was 0.5 • and mean slip ratio equal to 200. Further information on basal parameters is given in Table 1 and in the figure caption. The non-dimensionalized surface data were corrupted by uncorrelated Gaussian noise with zero mean and standard deviation equal to 5 × 10 −4 for the surface topography, 10 −3 and 3 × 10 −3 for the horizontal and vertical velocity components, respectively. This level of noise is realistic for situations where surface measurements are done using GPS technology. Synthetic measurements were equally spaced with an interval equal to two ice thicknesses. The interpolation of surface data on the surface nodal points of the finite-element mesh was done with the method of best linear unbiased estimator (BLUE). The surface data covariance matrix used in the inversion is the corresponding BLUE estimate of interpolation errors. The a priori estimate was no perturbation in either basal topography or basal slipperiness with large unknowns. The a priori covariance matrix was a Gaussian with diagonal elements corresponding to a 25% error in the respective mean values of basal slipperiness and ice thickness. The a priori correlations lengths for both the c and the b perturbation were equal to 10 mean ice thicknesses. Figure 4 shows the inferred bed topography and basal slipperiness distribution using the data shown in Fig. 3. The first guess and the maximum a posteriori solution obtained after two iterations are shown. For comparison the true basal disturbances are depicted using solid lines with circles. As can be seen in Fig. 4, the first guess (blue lines), obtained by assuming a linear relationship between basal and surface properties, already resolves both basal perturbations quite accurately. This is to be expected since the ice rheology is linear  Fig. 4. Inferred, (a) bed topography and, (b) basal slipperiness distributions estimated from the surface data set shown in Fig. 3. The true basal perturbations are the black lines with circle, and the iterations are labeled with iteration number. The maximum a posteriori solution is the red line. Both a priori estimates of bedrock topography and basal slipperiness were set to zero. The medium is Newtonian, mean surface slope is 0.5 • and mean basal slipperiness C = 200. Note that the 0th iteration and the MAP virtually coincide with the true bedrock perturbation in panel (a). Lines in cyan show the retrieval error estimate defined as the square root of the diagonal elements of the a posteriori error covariance matrixĈ (Eq. 16). and perturbation amplitudes fairly small (5%). Nevertheless, the subsequent iterative optimization significantly improves on the initial estimate, and the final maximum a posteriori solution (m) obtained after only two iterations is almost identical to the true value of the basal disturbances (m). In Fig. 4 lines in cyan show the retrieval error estimate as given by Eq. (16). Figure 5 shows the residuals between observations and FE-model predictions for, (a) the surface topography s, (b) the horizontal velocity u and, (c) the vertical velocity w for iteration number 0 and 2. The residuals are defined as

Linear rheology, linear sliding law, and smallamplitude perturbations
where d i is the vector of residuals and g(m i ) the forward finite-element model prediction for the system state m i at iteration numbers i = 0,2. As can be seen, only the horizontal velocity is poorly predicted by the FEmodel in the first guess. This is almost definitely because the slipperiness perturbation, which has a much larger effect on horizontal velocity component than the vertical one and negligible effect on surface topography, is not resolved in full detail in this initial step. Note that even in the final MAP estimate the b perturbation is better resolved than the c perturbation. As we will see below, this is generally the case. Figure 6 shows the inferred bedrock and basal slipperiness perturbations for the Gaussian peak distributions as used in the example above, but now for perturbation amplitudes of 20% instead of 5%. All other parameters are the same as in previous 5% amplitude example. Whereas for 5% amplitude perturbation it would have been surprising had the method not converged in a few iterations, we now have basal amplitudes that can be expected to be too large to be well retrieved in the first initialization step (step (i)). In addition, the Fréchet derivative of the forward model used in step (iv) is now considerably less accurate. As expected on the basis of this, the first guess retrieval is poorer than it was in the 5% amplitude case. The initialization step creates a peak in the basal slipperiness at the right position (x = −10). However, it also produces a negative peak at x = 10 with a considerable amplitude of more than −0.1.

Linear rheology, linear sliding law, and large amplitude Gaussian perturbations
Encouragingly, a positive perturbation of smaller amplitude appears at the same location in the next iteration step (Fig. 7). In the following iterations the retrieval converges quickly to the true solution. Only two more iterations are needed as compared to the 5% amplitude case, and only a total of four iterations is needed to compute the MAP estimate.

Non-linear ice rheology, linear sliding law, and small-amplitude Gaussian perturbations
In the following experiment, we introduce a non-linear ice rheology, i.e. n = 3 rather than n = 1 in Eq.
(2). The 5% amplitude experiment with the two Gaussian shaped bedrock and slipperiness perturbations described above was repeated.
Despite the introduction of non-linear ice rheology the inver- sion algorithm not only converged but did so in only 7 iterations. Further details of this experiment and other smallamplitude experiments using bump-shaped sigmoidal functions can be found in Raymond (2007).

Non-linear ice rheology, linear sliding law, and large-amplitude Gaussian perturbations
We now consider non-linear rheology and relative large amplitudes of the basal perturbations of 20%. Both rheological and finite-amplitude nonlinearities are thus present and affect the surface response. The cross-over stress was τ 0 = 0.3τ b . The mean surface slope is now α = 0.2 • instead of α = 0.5 • . Note that the results are not dependent on the exact value of the surface slope. The maximum a posteriori solution, computed using 11 iterations, is shown in Fig. 8  disturbances (black lines with circles). As can be seen, the inverse procedure converged to the correct solution and the MAP reproduces quite accurately the prescribed basal disturbances. The initialization step is somewhat inaccurate (see Fig. 8) and the amplitudes of the initial guess for the b and the c perturbations had to be forced to remain smaller than 30% of the respective mean values. In comparison to the linear-rheology case (experiment 5.1.2) twice as many iterations are needed for convergence.
Looking at the residual ( Fig. 9) shows that the iteration is in the form of a damped oscillation towards the final solution. This is seen in both the b and the c perturbation fields. After iteration 3, the oscillating behavior disappears.

Non-linear rheology, non-linear sliding law, and small-amplitude Gaussian perturbations
In all previous experiments the sliding law was linear (i.e. m = 1). In the following, we will consider the inversion of synthetic data generated for non-linear ice rheology and non-linear sliding law, i.e. m = 3 in Eq.
(3). The amplitude of the basal perturbations is 5%. In this experiment, the mean ice thicknessh = 1000 m, τ 0 = 10 kPa, and the deformational surface velocity u d = 4.03 m a −1 . The optimization procedure converged to the correct solution in 10 iterations. Further details of this experiment can be found in Raymond (2007).

Non-linear rheology, non-linear sliding law, and large-amplitude Gaussian perturbations
The following experiment considers amplitudes of the basal perturbations of 20%. All other parameters are the same as in the previous experiment. Figure 10 shows the inferred bedrock and basal slipperiness distributions (red lines). As for non-linear ice rheology, linear sliding law and largeamplitude perturbations (Sect. 5.1.3, see also Fig. 8), the amplitudes of the initial guess for the basal slipperiness had to be forced to remain smaller than 50% of the respective mean values (blue lines in Fig. 10) by imposing a corresponding constrain on the first guess. Interestingly, the form of the initial basal slipperiness distribution is similar to the corresponding linear-sliding case (Sect. 5.1.3), but with a larger amplitude. The MAP solution is obtained in 14 iterations and reproduces quite accurately the prescribed perturbations. As compared to the corresponding linear-sliding case, only 3 more iterations are needed for convergence. Figure 11 shows the residuals between observations and FE-model predictions.

Temperature-dependent non-linear ice rheology
We now introduce two modifications to the experimental setup, 1) the rate factor is no more constant across the thickness, and 2) we use basal perturbations corresponding to bump-shaped sigmoidal functions instead of the Gaussian used above. The vertical variation in rate factor corresponds to a linear variation in ice temperature from −25 • at the surface to 0 • at the base. The rate factor A is then determined using the double exponential fit between A and temperature derived by Smith and Morland (1981). The surface-parallel deformational velocity amounts in this case to u d = 1.27 m a −1 .
The basal disturbances are shown in Fig. 12. The parameters are: mean ice thicknessh = 300 m, surface slope α = 1 • , slip ratioC = 40, cross over stress τ 0 = 0 kPa, and 5% bumpshaped perturbations in b and c (see also Table 2). Surface data errors and a priori estimates are the same as in all previous experiments.
As can be seen in Fig. 12 the (linear) initialization step (blue lines) already gives a good approximation to the true bump-shaped perturbations in b and c. In comparison to the retrieval of the b perturbation, the c initial estimate is, however, less accurate and the rate of convergence slower (see   Fig. 12b) and a total of 14 iterations is needed for convergence. In contrast to Fig. 8, the estimates converge towards the solution without oscillations and the residuals (Fig. 13) diminish continuously towards the noise-level. In Raymond (2007) a corresponding experiment using isothermal conditions is described. Comparisons reveals that about four times as many iterations are needed than for the isothermal case to fit the observed data down to the noise level. The final MAP model estimates are, however, equally good showing that only the rate of convergence but not the quality of the solution suffers when the rate factor varies across the depth.

Forward model parameter error
The output of the forward model depends not only on the basal boundary conditions, that is on the form of the c and the b perturbations, but also on a number of forward model parameters. There are a total of six such forward model parameters: the stress exponents n and m, mean ice thicknessh, and mean surface slope α, the mean slipperinessC, and the rate factor A. In all experiments described above it has been assumed that the values of these model parameters are known with perfect accuracy, whereas in a real situation these parameters are always only known within some error bounds. The general approach to this problem would be to include the forward model parameters as elements of the system state and formally invert for these parameters in a manner identical to the inversion for b and c. Here we limit the discussion to the case where the forward model uses an incorrect estimate of englacial temperature. Let us consider the synthetic surface data generated for bump-shaped perturbations with linear temperature profile increasing from −25 • at the surface to 0 • at the bottom (see the case for temperature-dependent ice rheology in Sect. 5.2). The true deformational velocity amounts to 1.27 m a −1 and the corresponding slip ratio isC = 40. As the temperature profile is not known, we start by estimating a linear temperature profile with surface temperature −20 • and bottom temperature 0 • . The mean surface deformational velocity u d is estimated to be 1.45 m a −1 using the standard temperaturedependent flow law for ice. The slip ratio is then estimated from the mean longitudinal surface speed toC = 35. These estimated values of the forward model characteristics are subsequently introduced in the inverse calculations as well as in the forward finite-element model.  Figure 14 shows a comparison of the maximum a posteriori model as obtained with the true model characteristics and with the modeling errors. The true basal properties are also shown for comparison. In both cases, the basal topography perturbation is completely recovered. The basal slipperiness perturbation is, on the other hand, resolved less accurately for the case with the wrong estimated model characteristics than with the true ones. The optimization procedure has been stopped after 7 iterations for the wrong model and after 14 iterations for the true one. The value of the maximum a posteriori solution J (m) for the case with the wrong model characteristics is 8 times larger than for the true model and also about 8 times bigger than the theoretically expected value of J (m) = 3 × N. Hence, the MAP solution for the model with a wrong estimate of englacial temperatures would obviously be rejected as valid solution of the inverse problem.

Summary
We have shown how a non-linear Bayesian inference approach can be used to simultaneously determine both basal topography and basal slipperiness from surface measurements of velocity and topography. We emphasize the fact that we do not produce one single solution to the inverse problem considered. Rather, we determine the maximum a posteriori solution together with the a posteriori error covariance matrix. The availability of such an error estimate greatly facilities any quantitative analysis of the results. Furthermore, the method does not require any prior smoothing of input data.
We have shown that the Fréchet derivative of the forward model can be adequately approximated by small-amplitude analytical solutions (Gudmundsson, 2003) for the method to converge. Our key result is that this remains the case even when the problem is strongly non-linear. This result is of considerable practical value as this approximation greatly enhances the numerical efficiency of the method by sparing the time-consuming numerical evaluation of model derivatives.
The proposed method is capable of dealing with both nonlinear finite-amplitude effects and rheological nonlinearities. In all case studied, the inverse procedure converged quickly and to the correct solution and in only a few iterations, with the exact number of iterations needed being dependent on the type and magnitude of non-linear effects.