Parameter and state estimation with a time-dependent adjoint marine ice sheet model

To date, assimilation of observations into large-scale ice models has consisted predominantly of time-independent inversions of surface velocities for basal traction, bed elevation, or ice stiffness, and has relied primarily on analytically-derived adjoints of glaciological stress balance models. To overcome limitations of such “snapshot” in5 versions, i.e. their inability to assimilate time-dependent data for the purpose of constraining transient flow states, or to produce initial states with minimum artificial drift and suitable for time-dependent simulations, we have developed an adjoint of a timedependent parallel glaciological flow model. The model implements a hybrid shallow shelf-shallow ice stress balance, solves the continuity equation for ice thickness evo10 lution, and can represent the floating, fast-sliding, and frozen bed regimes of a marine ice sheet. The adjoint is generated by a combination of analytic methods and the use of algorithmic differentiation (AD) software. Several experiments are carried out with idealized geometries and synthetic observations, including inversion of time-dependent surface elevations for past thicknesses, and simultaneous retrieval of basal traction and 15 topography from surface data. Flexible generation of the adjoint for a range of independent uncertain variables is exemplified through sensitivity calculations of grounded ice volume to changes in basal melting of floating and basal sliding of grounded ice. The results are encouraging and suggest the feasibility, using real observations, of improved ice sheet state estimation and comprehensive transient sensitivity assess20 ments.


Introduction
Simulation of land ice evolution is hampered by a great number of sources of uncertainties regarding poorly or unknown quantities which exert control over ice dynamics.These uncertainties must be dealt with to address questions regarding the estimation of past ice flow and future behavior of ice sheets and glaciers.Unknown quantities often take the form of spatial fields rather than scalars, requiring computational techniques that can handle sets of unknowns which scale with model dimension, but with computational costs largely independent of that dimension.The adjoint or Lagrange multiplier method is an ideal candidate.
The adjoint of a (generally nonlinear) model is essentially the transpose of its Jacobian, the Jacobian being the linear map of perturbations to model input (e.g., its initial and boundary conditions) to perturbations in output, for a given model realization.Assuming the model is differentiable, the Jacobian can, in principle, be estimated with finite differences.However, when the size of the parameter set numbers in the thousands or millions, as is the case in land ice models, such an approach quickly becomes intractable.With the adjoint method, on the other hand, gradient or sensitivity information of the model output with respect to its input is obtained efficiently.Its computational cost is independent of the size of the input set, making gradient-based optimization/estimation for large-scale problems with highdimensional input spaces tractable.
Model adjoints have been valuable tools in climate research for a number of years (see Errico (1997) and Wunsch and Heimbach (2013) for partial reviews in the context of meteorology and oceanography).In the field of land ice modeling, adjoint methods have, until recently, been used D. N. Goldberg: Parameter and state estimation with a time-dependent adjoint marine ice sheet model almost exclusively to estimate basal or interior properties of ice sheets and ice shelves (e.g., basal traction parameters, stiffness parameters) based on observations of surface velocities.The first such application was by MacAyeal (1992), who used a depth-integrated stress balance (Morland, 1987;MacAyeal, 1989).In recent years, similar inversions have been carried out with so-called "higher-order" models, i.e., models that incorporate vertical inhomogeneity and even nonhydrostatic effects (e.g., Maxwell et al., 2008;Raymond and Gudmundsson, 2009;Goldberg and Sergienko, 2011;Morlighem et al., 2010;Arthern and Gudmundsson, 2010;Gillet-Chaulet et al., 2012;Petra et al., 2012).However, all these inversions consider only the nonlinear stress balance; i.e., they do not take the time-dependent mass balance into account.Furthermore, they only consider uncertainties in surface velocities (not, for instance, in surface elevation or thickness).Brinkerhoff et al. (2011) applied an adjoint method to a flowline Stokes model coupled to a steadystate thermal balance, but there was no time dependence in the model.An example of the use of a time-dependent ice model is that of Heimbach and Bugnion (2009), in which the adjoint of SICOPOLIS (Greve, 1997), a thermo-mechanical ice model which makes the shallow ice approximation (SIA; Hutter, 1983), was generated.
A natural extension to the work of Heimbach and Bugnion, then, is the application of adjoint methods to time-dependent ice models that include horizontal stress coupling in the nonlinear momentum balance.Such an approach has the advantage that the prognostic components of the ice model, such as thickness and temperature evolution, are accounted for in the model adjoint, thus enabling assimilation of time-dependent data to produce a dynamically consistent state estimate with associated optimized parameters.In the field of ocean modeling, state estimation efforts based on the adjoint method were first introduced by Thacker and Long (1988) and Tziperman and Thacker (1989).Since then, estimation systems, in which the adjoint was derived by means of automated differentiation of full-fledged ocean general circulation models, have provided solutions that are consistent with observational data, suitable for model initializations and in-depth data analysis, as well as a framework for estimating the information content of new observations (Stammer et al., 2002;Wunsch et al., 2009;Wunsch and Heimbach, 2013).While ice sheet state estimation is still in its infancy, we view ocean state estimation as a model paradigm, and time-dependent ice model adjoints as a step toward this goal.
In this paper, we present the adjoint generation of a timedependent ice sheet-stream-shelf model.The ice model implements a "hybrid" stress balance (e.g., Bueler and Brown, 2009;Pollard and DeConto, 2009;Schoof and Hindmarsh, 2010;Goldberg, 2011), which is the simplest form of higherorder stress balance, yet still it accounts for horizontal stress coupling, which makes our approach novel.Prior to this study, the adjoint method has not been applied to a timedependent ice model with a nonlocal stress balance.Ray-mond and Gudmundsson (2009) developed a method for a maximum a posteriori (MAP) estimate of basal properties in terms of observed surface properties, taking the steady state of the continuity equation into account.Their forward model contains the full Stokes stress balance without approximation.However, it is in a single horizontal dimension, and is based on analytical transfer functions that assume Newtonian rheology as well as small perturbations about a mean state (Gudmundsson, 2003).In addition to the conceptual novelty of using a time-evolving adjoint model for inversion, a technical novelty of our study is the use of algorithmic (or automatic) differentiation (AD) (Griewank and Walther, 2008) for an ice model that involves longitudinal (nonlocal) stress balance terms.AD tools are a powerful array of software that are capable of generating adjoint model source code via lineby-line differentiation of the numerical model.The pliability of the AD tools means that sensitivities to diverse sets of inputs can be examined, and changes to the input set, the cost function, and the forward model can be reflected in the adjoint much more easily than through separate error-prone byhand adjoint code extensions.Finally, the approach provides the exact adjoint of the discrete model.In some of the glaciological studies mentioned above, the nonlinear dependence of viscosity on strain rates has been ignored in the adjoint calculation.which is generally a safe approximation, but in some cases has been shown to cause difficulties with model inversion (Goldberg and Sergienko, 2011).
The ice sheet stress balance equations (depth-integrated or otherwise) have a differentiable structure which considerably simplifies the derivation of their adjoint.This, and the fact that they do not involve time stepping, enables straightforward discretization of the adjoint differential equations.This option is, in practice, much simpler than applying AD, which is not well-suited to iterative methods, such as those used to solve the nonlinear elliptic partial differential equations of the ice models.On the other hand, other conservation equations involved in solving the ice flow, such as the thickness (or continuity) equation or conservation of heat, only solve for local interdependencies and are well suited for AD.The adjoint of Heimbach and Bugnion, as well, was generated with AD tools -the only instance of AD application to a land ice model.Here, we shall adopt a mix-and-match strategy, using both AD-generated code and "hand-written" solvers where this appears more efficient to produce the exact model adjoint.
In the following, we briefly describe the forward model used in our study, and then discuss the application of the AD tool to the model, notably the steps taken to deal with the nonlocal stress balance.We then proceed to demonstrate the utility of the adjoint through several idealized sensitivity calculations as well as state and parameter estimation experiments.

Forward model
The ice model used in this study is an extension of the stress balance solution presented in Goldberg (2011).It is a hybrid model, also referred to as L1L2 under the Hindmarsh (2004) classification, meaning it accounts for vertical shear in its stress balance, although not as completely as the Blatter-Pattyn balance (Blatter, 1995;Pattyn, 2003;Greve and Blatter, 2009).On the other hand, the balance requires the solution of a two-rather than three-dimensional nonlinear elliptic differential equation, greatly reducing computational expense.The balance is derived by making an approximation to the variational principle corresponding to the Blatter-Pattyn equations rather than to the equations themselves.It has been demonstrated to be a good approximation to Blatter-Pattyn and to Stokes flow (Sergienko, 2012), especially when some level of basal sliding is present.In addition, the model solves the depth-integrated continuity equation for ice thickness and accounts for grounded and floating ice through a hydrostatic floatation condition.
Table 1 is a pseudocode version of the ice model.We present this diagram for clarity, but also in order to aid the description of adjoint generation in the following section.At the beginning of a given time step, h (n) , the thickness at time t n is known.The cells that are floating are determined from the hydrostatic floatation condition: where ρ w and ρ are ocean and ice densities, respectively, and R is bed elevation (negative when below zero).This also determines surface elevation s (n) , because when the ice is floating a fraction ( ρ ρ w ) of total column thickness is below sea level.These operations are represented in the pseudocode by UPDATE_FLOATATION, which also sets the contribution of basal sliding coefficients to zero in floating cells.
Following this call the nonlinear hybrid stress balance is solved for velocity u (n) , using the scheme from Goldberg (2011).This involves first evaluating the discretized form of the glaciological driving stress τ d = ρgh∇s (CALC_DRIVING_STRESS), which depends on s (n) and h (n) .This is then followed by Picard iteration on viscosity and basal coefficients.In each iteration m of the loop, the matrix A m (corresponding to the two-dimensional elliptic PDE mentioned above) is constructed, using current iterates of nonlinear ice viscosity ν (m) and basal coefficient β eff is a variable unique to the stress balance derived in Goldberg (2011), and thus its form is given here for the benefit of the reader: Here u b is the magnitude of basal sliding velocity (u| z=b ), and f (u b ) determines the functional relation between basal • CALL ADSTRESS_CG_SOLVE Equations ( 7) and ( 8) (non-AD) stress (τ b ) and sliding velocity: u b is determined from basal stress and depth-averaged velocity under the approximation of depth-independent longitudinal stresses, and ω is a factor that depends on the current iterate of viscosity, as defined in Goldberg (2011) (Eq. 35).
Next the resulting linear system A m u (m+1) = τ d is solved for the new iterate of u (STRESS_CG_SOLVE).The nonlinear ice viscosity and sliding coefficients are then updated with this new guess for velocity (UPDATE_VISC_BETA).
Following the velocity solve, thickness is updated via the depth-integrated continuity equation, using a simple secondorder flux-limited finite volume scheme.If calving front advance is allowed, this is carried out using an algorithm based on that of Albrecht et al. (2010), though no calving parameterization has been implemented.This all takes place in ADVECT_THICKNESS, and concludes the time step.
In the hybrid balance, the viscosity ν and sliding coefficient β eff (which depends on velocity even for a linear sliding law) have slightly more complicated expressions than in the SSA balance.However, under the Picard iterative scheme employed, the updates of these fields are straightforward.Additionally, UPDATE_VISC_BETA can seamlessly be re-placed by an SSA viscosity update, effectively making the model a shallow-shelf model.
The matrix A m referred to above is constructed based on a finite element method with bilinear Q1 elements on a rectangular mesh.ν (m) and β (m) eff are considered constant within an element.Finite elements were chosen, not to use a mesh that conforms to irregular boundaries, but rather because of the ease with which complicated nonlinear boundary conditions can be implemented.A conjugate gradient method is used to solve the linear system, with a simple Jacobi preconditioner.
The model was implemented as an extension (a "package") within the Massachusetts Institute of Technology general circulation model (MITgcm; Marshall et al., 1997).As such the code takes advantage of various grid definitions and metrics, file I/O subroutines, and the MITgcm parallel domain decomposition and message-passing facilities.The ice flow model run time has been observed to scale favorably with domains of ∼40,000 degrees of freedom and tens of processors.Developing the ice model within the MITgcm framework has other potential advantages: while the model is currently isothermal, there is the potential to rapidly implement temperature transport (and transport of other scalars) using the generic tracer code developed for MITgcm.An ice model that shares grid and modeling components with an ocean model also lends itself to ice-ocean coupling, which is a future development direction.However, the most important benefit for present purposes is that we are able to take advantage of the adjoint generation framework within the MITgcm, as discussed in the following section.

Model adjoint
The notion of "adjoint" is best understood in terms of its construction.Assume that, in a time-dependent ice model, one were concerned with how the initial thickness at a single location affected the thickness field at the end of the run.Assume the initial thickness field is represented by a vector h (0) , and the thickness at the final time step t n by h (n) .If the model is differentiable, one can then consider a perturbation to a single element of h (0) (i.e., a directional derivative) and propagate the perturbation forward to the final time to the resulting perturbation in the model output.This forward-propagation of perturbations is known as a tangent linear model (TLM).The TLM provides information about sensitivities of model output to a single variable (i.e., h (0) (i), the value of h (0) in cell i).If one wanted information about sensitivities to all such h (0) k , k = 1, ..., M, where M is the size of the grid, the TLM would need to be run M times.(Note that in this context, the "single location" referred to above is not a point in space, but rather a localized area that depends on the model's discrete representation of a continuous field.If the representation were spectral in nature, a localized value could not be well represented by a single degree of freedom.) On the other hand, the adjoint can provide this information in a single run, provided the output of interest is scalarvalued.This scalar is often referred to as a cost, objective, or target function, for instance, the total volume of ice in the domain at the end of the run.The adjoint model is referred to as such because it is the mathematical adjoint of the TLM.This seemingly trivial distinction has important implications for how the models are constructed.For the TLM, the forward perturbation is found by successive compositions of the TLMs of sequential time steps.The adjoint model operates backwards in time, by composing the adjoints of individual time steps (or operations within time steps) in reverse order.The eventual result of the TLM is the sensitivity of the output to a single input variable, whereas the result of the adjoint model is the sensitivity of a single scalar output to a set of input variables.In the remainder of this section we give an outline of the generation of our ice model adjoint, highlighting the varied approaches employed.For a more comprehensive discussion of the mathematical meaning and implications of adjoint models, see Heimbach and Bugnion (2009).
First we introduce notation that will aid our discussion.Each state variable of the model has a corresponding variable in the adjoint model state (or more formally, a co-vector in the dual to the tangent space of the model at the given point in state space).For a state variable x, we denote its tangent space counterpart by δx, and its adjoint by δ * x.In gener-ating the adjoint we relate the adjoint state variables to one another.Assume the state variable y derives from x through the atomic operation y = g(x).To generate the tangent linear model, we find δy, the perturbation to y, by applying the linear operator ∂g ∂x to δx.Conversely, we track sensitivities of a cost function J from the final time backward.This means, if the sensitivities of J to y are stored in δ * y, we find δ * x by applying the adjoint of ∂g ∂x to δ * y.In Table 3 we give a pseudocode version of the adjoint ice model, corresponding to the version of the forward model presented in Table 1.The time-stepping loop is now from the final time t f to 0. Note that the intermediate steps of a single time step occur in reverse order, and the adjoint of the Picard iteration loop begins from m term , the termination step of the forward Picard loop.From this it can be seen how initial condition sensitivities, denoted δ * h (0)  ≡ ∇ [h (0) ] J , might be found, as well as parameter sensitivities.For instance, β eff derives in part from the sliding parameter field β in the pseudo-subroutine UPDATE_VISC_BETA, and so contributions to δ * β are calculated in ADUPDATE_VISC_BETA.Note that the divergence operator in ADCALC_DRIVING_STRESS is actually the discrete divergence, corresponding exactly to the discretized gradient operator CALC_DRIVING_STRESS.Note also that the updates of δ * h (n) and δ * R from δ * s (n)  in ADUPDATE_FLOATATION involve the same conditional statement as in UPDATE_FLOATATION.Also in this pseudo-subroutine, the backward propagation of βsensitivities are terminated where the floatation condition is satisfied.
The majority of the adjoint generation is carried out with AD software.Source-to-source transformation AD tools generate adjoint code by treating each line of source code as an atomic step and finding its adjoint -similar to the transformation implied by Tables 1 and 3 but at a higher level of granularity.Several products are available, including open-source software such as OpenAD (Utke et al., 2008).We use the tool TAF (Transformation of Algorithms in Fortran; Giering et al. (2005)), which has long been used for adjoint generation with the MITgcm (Heimbach et al., 2005), and which was used in Heimbach and Bugnion (2009).This highlights an additional advantage of developing the ice model with MITgcm: the readiness with which the AD tools can be applied.The code needs to satisfy certain requirements in order to be easily parsed by TAF (e.g., there should not be do-loops of indeterminate length, and the use of pointers should be minimal), and conforming to the standards of MITgcm helped to ensure this.Additionally, MITgcm defines rules for steps involved with parallel message-passing, such that they can be treated as adjointable operations, and thus the adjoint model is parallel as well.The same is true for our ice model adjoint.
In Table 3, the steps to which AD tools are applied are indicated by (AD).These are straightforward operations and we had minimal difficulties applying the tools.Only one step, the D. N. Goldberg: Parameter and state estimation with a time-dependent adjoint marine ice sheet model matrix solve, is not handled by AD.The conjugate gradient algorithm employed involves a large number of intermediate variables relevant only to the solver, and can require many iterations.A direct application of AD tools to the solver would involve large memory requirements, as well as a great deal of either code modification or manual "directing" of the tools, to prevent costly recomputation loops.On the other hand, TAF (as well as other AD tools) offers a facility to replace the adjoint code that is automatically generated by manually written code at the subroutine level, if the adjoint of a subroutine is known, as is the case with the linear solver.As in Table 1, the linear solve can be written where u (m+1) is the iterate of velocity found at step m of the Picard loop, A m is the matrix constructed with the previous iterates of viscosity and basal coefficient, and τ d is the driving stress ρgh∇s at the current time step.The solve can be viewed as an operator g with arguments A m and τ d , i.e., (5) thus the adjoint operator must have the form The adjoint of Eq. ( 4) is given by Equation ( 7) is written as an accumulation of adjoint sensitivities of τ d ; the adjoint of each Picard iteration has an effect on δ * τ d .In our adjoint model, a subroutine carries out these operations; since A m is self-adjoint, the same conjugate gradient solver (with the same matrix coefficients) is used in Eq. ( 7).Note that in the MITgcm ocean model, which solves a linear system for either rigid-lid surface pressure or free surface elevation, an equation similar to Eq. ( 7) is solved by the adjoint model, and the symmetry of the matrix is similarly exploited.However, the coefficients of the matrix are fixed, and so Eq. ( 8) has no counterpart.Following the solution of Eq. ( 7) and Eq. ( 8) by hand-written code, evaluation of the adjoint via AD-generated code is resumed: δ * τ d and δ * A i are passed to the adjoint of the matrix construction, ADBUILD_STRESS_MATRIX.
An important by-product of this approach ("hiding" the matrix inversion from the AD software) is that it allows us to potentially replace our linear solver with faster, optimized "black-box" solvers (such as those available in external packages such as PETSc) without affecting the accuracy of the adjoint.We point out that the advantages mentioned here are not limited to our model, i.e., to one that implements a shallow-shelf or hybrid stress balance.Matrix inversion is the most time-intensive component of any ice model with a nonlocal stress balance, and the part of the code that is most likely to lead to difficulties in application of AD software.As long as it can be handled in a similar way to our model (i.e., as long as the matrix is self-adjoint), efforts can be focused on ensuring that the remainder of the code is suitable for algorithmic differentiation.
It was found that, in order that the adjoint model produce accurate results, the CG tolerance for the linear solve in the adjoint needed to be several orders of magnitude smaller than that used in the linear solve of the forward model.(The accuracy is assessed by comparing the derivatives calculated by the adjoint to finite-difference approximations.Relative agreement to within 10 −6 was considered accurate.)This suggests that without special treatment of the convergence criteria, a fully AD-generated adjoint might have low accuracy, and further supports our decision to let the AD software "bypass" our linear solver.

Nonlinear optimization
For optimization problems, our model uses the M1QN3 library, publicly available Fortran code which is based on the algorithm described in Gilbert and Lemaréchal (1989).The M1QN3 algorithm solves large-scale unconstrained minimization problems.It provides a search direction and step size based on a limited-memory quasi-Newton approximation to the Hessian of the objective function J , using gradients of J that are provided by the user throughout the minimization process.The gradient of J is calculated by the adjoint.The M1QN3 software has been adapted for use with the control package of MITgcm, and so we are able to make use of it as well.

Experiments
We present a suite of experiments that showcases the adjoint and optimization capabilities of our model.The optimizations consist mainly of "identical-twin" experiments, in which the field being inverted for is known a priori, and the "observed" data is a perfect solution of the model.Such inverse problems have been termed "inverse crimes" (Colton, 1998), because they are considered too optimistic to provide a reliable test of performance.However, an inverse model must at a minimum be able to perform well on these idealized experiments, which demonstrate the strengths and limitations of the model.

Experiment 1: sensitivity to ice shelf stiffness and melt rates
The first experiment does not involve optimization, but simply demonstrates the interpretive powers of the adjoint model.We consider the sensitivity of the volume of grounded ice in a marine ice stream to thermodynamic effects on its adjacent ice shelf.Such effects are of considerable importance, given observations of the Antarctic coastline made over the last several decades.Confined ice shelves are known to act as logjams to ice stream flow (a phenomenon referred to as ice shelf buttressing; Dupont and Alley (2005)) and therefore exert a large control on grounded ice mass balance.The heat contained in Southern Ocean subsurface waters is able to cause melting at the underside of Antarctic ice shelves, most notably those in the Amundsen Sea embayment (Jacobs et al., 1996;Jacobs, 2006;Jenkins et al., 2010;Jacobs et al., 2011).Meanwhile, widespread speedup, thinning, and mass loss have been observed in these ice shelves and the ice streams that feed them (Rignot, 1998;Rignot et al., 2002;Shepherd et al., 2002Shepherd et al., , 2004)).
A number of modeling studies have been carried out exploring the magnitude and distribution of sub-ice shelf melting that results from intrusions of warm water into an ice shelf cavity, as well as how these quantities depend on the geometry of the cavity and the strength of the forc-ing (e.g., MacAyeal, 1984;Jenkins, 1991;Hellmer et al., 1998;Holland et al., 2008;Little et al., 2009;Heimbach and Losch, 2012).Less frequently asked is how the response of grounded ice depends on the magnitude and spatial distribution of melting, though Walker et al. (2008) and Gagliardini et al. (2013) have investigated this question.To understand how large-scale changes in ocean heat content and circulation can affect ice sheets, both questions are important.
Here we address the second question with an idealized ice sheet-stream-shelf system.Ice is allowed to flow in a rectangular domain of 150 km × 150 km, where ice flux input is constant along the x = 0 boundary (simulating flow from the ice sheet interior).Along the x = L = 150 km boundary a calving front boundary condition is imposed, whether ice is grounded or floating (Weertman (1957); Schoof (2007), Appendix B. A channel runs the length of the domain, deepening away from the front.The bedrock elevation is expressed (in meters) by where x and y are in km.Sliding is governed by a linear sliding law, i.e., Within the channel, i.e., where 50 km ≤ y ≤ 100 km, β 2 is set to 30 Pa (m/a) −1 .Outside of the channel it is 9 times larger.The lateral boundaries at y =0 km, 150 km are no-slip boundaries, but the resistance to ice stream flow comes from basal stress in the outer "sheet" region, not the sidewalls.The Glen's law parameter A is set uniformly to 9.5 × 10 −18 Pa −3 a −1 , which corresponds roughly to a temperature of −15 • C. The model is run to equilibrium, shown in Fig. 1.
An ice shelf about 50 km long forms over the channel.
To examine changes in grounded ice, we consider volume above floatation (VAF), defined as volume of ice that would contribute to sea level rise if all of the ice in the domain were to melt, and the loss thereof (Dupont and Alley, 2005).Note that the floating columns of ice do not contribute to VAF.We calculate the adjoint sensitivities of VAF loss during the ten-year run to two different input fields: basal melting under the ice shelf ( ṁ), and the Glen's law flow parameter A. A realistically depends on the temperature and fabric of the ice, but here we consider dependence on A directly.We define a scalar function where "HAF" is height above floatation, R(i) and h(i) are bed elevation and thickness at final time in cell i, and x and y are spacings on the (here uniform) grid.J is the cost function, or objective function, for this experiment (equal to final-time VAF), and it is the scalar function of which the gradient is found, with respect to ṁ and A, by the adjoint model.Melt rate sensitivities are shown in Fig. 2a.The value at a location (i.e., in cell i) can be interpreted as the loss in VAF after 10 yr with a constant melt rate of 1 m a −1 in cell i.Sensitivities are only nonzero in locations where ice is floating.This is due to a rule in the model that melt rates cannot be applied under grounded ice: so even though adjoint sensitivities are propagated backward in time, they terminate at the point in the code where melt rates are applied.The pattern of sensitivities is interesting: they are largest not in the deepest part of the shelf near the grounding line, where melt rates are generally highest, but rather in the margins of the shelf.This implication (which should be taken with many qualifications, as explained below) that shifts in melt rates near relatively shallow ice shelf margins could have stronger impacts on grounded ice than shifts of equal magnitude near the grounding line.Also curious is the fact that sensitivities are actually negative (though very small) near the ice shelf front.This effect is in fact realized in forward runs: thinning of the ice shelf front leads to flux across the ice shelf margin (into the shelf) and drawdown of the grounded ice cliffs, lessening their contribution to VAF loss.
Note that these results differs from those of Walker et al. (2008) and Gagliardini et al. (2013), who found that melt rate patterns which concentrate toward the grounding line have the greatest impact on grounded ice.However, these studies used flowline models and thus could not resolve the effects of thinning near shear margins.Therefore, this experiment highlights the need to resolve both horizontal directions when assessing the impact of melting on ice shelf buttressing.
Figure 2b shows sensitivities with respect to A. (Values are large because the nominal value of A is on the order of 10 −17 Pa −3 a −1 .)Here values are nonzero in grounded and floating ice, and are largest in the margins of the ice shelf (and negative, but small toward the ice shelf front).A is sometimes referred to as the "fluidity" of ice; i.e., the larger it is the more easily the ice flows.A positive change corresponds to weakening of the ice, and weakening in the margins leads to the most grounded ice loss.
That the thinning (through melting) and weakening of an ice shelf can lead to grounded ice loss is well established on theoretical (Thomas, 1979) and modeling (Dupont and Alley, 2005;Goldberg et al., 2009;Little et al., 2012) bases.But less-oft discussed is which parts of the shelf are most sensitive to this mechanism, that is, the structural integrity of the ice shelf.The result of our idealized case, with a small, confined ice shelf, suggests that the margins are the weak points of the shelf.It is not clear to what extent this applies to ice shelves in general, although intuition suggests that margins play a similar role in all confined, relatively small shelves.This demonstration shows that a similar analysis can be done on any ice shelf-ice stream system (although a baseline solution free of artificial model drift would be required for meaningful results).
The computational advantage of the adjoint in producing data sets such as those shown in Fig. 2 is considerable.Using a domain decomposition over 9 processors (with a 50 × 50 grid belonging to each process) took approximately 12 minutes, or 5 seconds per time step.The adjoint run that produced both data sets in Fig. 2 (and could have produced additional adjoint sensitivity fields as well) took approximately 4 times longer, giving a total run time of about an hour.(The additional run time is because parts of the forward model must be run multiple times to provide state information for the reverse-time adjoint run.)If, on the other hand, one were to approximate sensitivities by perturbing single parameter values and using one-sided finite differencing, the melt rate sensitivities would take approximately 50 × 50 × 0.2 h ≈ 25 days (since only a portion of the domain is ice shelf) and the A-sensitivities about 9 times as long.
While the efficiency of the adjoint in finding sensitivities is obvious, it should be kept in mind that the analysis is inherently linear: a forward trajectory is needed around which to perturb.In this case the trajectory was a quiescent one, as the run began in steady state with no melting.As discussed above, melting increases flux across the grounding line through loss of ice shelf buttressing; sufficiently high melt rates would lead to grounding line retreat.This, in turn, would result in increased grounding line thickness (due to the shape of the bedrock), leading to further mass flux increase (Schoof, 2007).The latter effect is nonlinear, however, and is not detected by our adjoint results.When calculating adjoint sensitivities, the results should not be taken at face value, but rather serve as a starting point for further investigation.It is worth noting that Goldberg et al. (2012b), using a coupled ice-ocean model that allowed grounding line migration, found that thinning of an ice shelf at the margin due to melting was a key factor in unstable retreat of grounded ice -giving credence to the high sensitivities of VAF loss to values in the ice shelf margin.

Experiment 2: inversion of basal sliding coefficients from surface velocities
In our first inversion experiment we infer basal sliding coefficients from surface velocities, considering only the momentum balance of the model (i.e., no time dependence).This is an identical-twin experiment, in that the surface velocities come from model output with prescribed parameter values (or "true" values), and the inverted parameters are then compared with the truth.The forward model is one-dimensional (only one horizontal direction is considered, and the SSA balance is implemented) with periodic boundary conditions, and surface slope and thickness are constant.Many authors have carried out similar inversions over the past two decades, both with synthetic observations and real ones (e.g., MacAyeal, 1992MacAyeal, , 1993;;Vieli and Payne, 2003;Khazendar et al., 2007;Sergienko et al., 2008;Morlighem et al., 2010;Joughin et al., 2009).The purpose of this experiment is not to introduce a new type of glaciological inversion, or to improve upon an existing one, but rather to enable direct comparison with existing methods.As one of the simplest glaciological inverse problems, our adjoint optimization framework must, at a minimum, perform as well as other inversion methods.
The experiment is based on Experiment B from the ISMIP-HOM intercomparison (Pattyn et al., 2008) with L = 40 km, and 1 km resolution.The intercomparison specifies a constant thickness of 1000 m, a constant surface slope of 0.1 • , and a linear sliding law with a mode-one sinusoidally varying sliding coefficient, or lubrication, β 2 (Fig. 3a).This profile of β 2 represents our true parameter values.The Glen's law parameter A is uniformly set to 10 −16 Pa −3 m a −1 .We define as u * 1 the x velocities from the model with these parameters.
For the inverse problem to determine β 2 , we define a cost function on the model output: where the summation is over all cells i, and u * 1 (i) and u(i) are the nodal values of the observed velocities u * the current guess for β through the stress balance.(In this inversion, we attempt to recover β, not β 2 , which is the easiest way to impose the constraint that sliding coefficients are nonnegative.)The numbers σ i are scaling factors for the cost function.Generally these scaling factors represent a priori knowledge or first guesses regarding observations or parameters; for instance, σ i might be the uncertainty in the velocity observation.In practice, this prevents poorly constrained observations from leading to overspecification.In this example, the scaling factors are set uniformly to 1: this presents no loss of generality, as long as values of J are compared to the initial value.
For a given β, the adjoint finds sensitivities of J with respect to β.The cost function is then minimized using the optimization algorithm described in Sect. 4. Notice that no regularization terms have been added to the cost function to ensure a priori properties of the lubrication field, e.g., smoothness and boundedness, although we include such terms in later experiments.
Figure 3b shows how J evolves, eventually reaching cost reduction, defined as J J 0 , on the order of 10 −6 .J 0 is the value of J using the initial guess for β 2 (described below).For comparison, an inversion is also carried out where the "approximate" adjoint of MacAyeal (1993) is used, rather than the adjoint sensitivities from the AD-generated adjoint (the "full" adjoint).The approach is termed approximate because the dependence of viscosity on strain rates is ignored.
In both cases, the same optimization algorithm is used, so calculated adjoint sensitivities are the only difference between the two inversions.
The inverted β 2 is not shown, but it is very close to the true profile.The same is true of the experiment with higher surface slope, described below.This is because of the lack of high-frequency variability in the initial guess.If the initial guess had a broad spectrum with power at high frequencies, such as a Gaussian shape (not shown), the inversion would still decrease J as in Fig. 3b, and the inverted β 2 would have the same broad pattern as the true profile, but there would be high-frequency variability as well.This is because the velocities in the model are insensitive to high-frequency variability in β 2 ; they are muted by longitudinal stresses.
Both optimizations begin with the same initial guess for β 2 , a half-mode sinusoid of the same amplitude.A uniform β would be the simplest guess, but with a constant thickness and surface slope there is zero strain rate, and this leads to very high values for viscosity.The performance of the inverse model then depends on the viscosity regularization parameter.With the hybrid balance, this is not an issue; but in this experiment the SSA balance is used so that different inversion approaches can be compared.
The full and approximate adjoints perform equally well in the experiment with a surface slope of 0.1 • , with the full adjoint actually leading to greater values of J early on.However, when surface slope is increased to 0.5 • (as in the ISMIP-HOM experiments with flow over a wavy bed), their performances differ.Now the target surface velocities u * 5 (i) are an order of magnitude larger (Fig. 3a), and it is important to maintain the nonlinear dependence of viscosity on strain rates in the adjoint, as evidenced by the poor performance of the approximate adjoint (Fig. 3c).This can also be seen by examining the adjoint sensitivities from the two models.After the first iteration of the inversion (Fig. 3d) the sensitivity profiles are similar, yielding similar search directions in the optimization algorithm.On the next iteration, however, the profiles look very different, and upon being provided with the approximate adjoint sensitivities, the optimization algorithm is unable to lower the cost function.This is not to say that the AD-generated adjoint model is in all cases an improvement over the approximate adjoint in this type of inversion.The optimization algorithm has a number of associated parameters; it is possible that, for the experiment considered, different parameters might yield better performance with the approximate adjoint, or worse with the full adjoint.It is clear, nevertheless, that instances exist where the full adjoint is advantageous.

Experiment 3: estimation of past conditions
We test the ability of our inversion framework to recover two different parameter fields simultaneously based on timedependent data.Previous studies have considered timedependent observations, for example Jay-Allemand et al. (2011).However, the assimilation in this study consisted of a series of "snapshot" inversions of surface velocities; no dynamic consistency between the states at different times was enforced.We re-emphasize that the propagation of sensitivity/misfit information back in time by the adjoint model implies that the inversion takes advantage of available observations not only forward in time (as do filter or sequential assimilation methods) but also backward in time (smoother or variational methods), mediated through the model dynamics.
We consider a mountain glacier undergoing adjustment in response to a perturbation in basal conditions.We assume we have knowledge of surface velocities at the end of this adjustment (the "present") and of thickness and surface elevations at certain discrete times during the adjustment, but not of the initial thickness, nor of the basal conditions during this adjustment.Perfect knowledge of bed elevation is also assumed (this is not, in general, the case, and in the next experiment we deal with bed topography uncertainty).We ask whether we can recover this initial thickness and basal traction with adjoint-based optimization.Reconstructions of past glacier configuration, coupled with inversions of basal properties, could be very useful in certain glaciological settings.For instance, if an ice stream is known to be out of balance due to relatively recent changes in its basal environment, such an inversion could give us thickness of the stream prior to its observational history or, conversely, could pinpoint the time of onset of the changes.
The forward model is again a periodic domain with a constant bed slope of −0.5 • in the x direction.Both horizon-  4c, which is used as the initial guess.Maximum misfit is reduced from 9 to 1.8 m; rms misfit is reduced from 2.6 to 0.25 m.(c) Inverted initial thickness when velocity is "frozen" in its observed state (Fig. 4d).(d) Inverted initial thickness when only the observed thickness at t = 10 yr is used to constrain the problem.tal dimensions are resolved, and the domain is y periodic as well, with zero bed slope in the y direction.The domain is 40 × 40 km with 1 km resolution.The initial thickness is uniform with a value of H 0 = 1000 m.The Glen's law parameter A is as in the previous experiment.A hybrid stress balance is used.The sliding law is again linear, with sliding coefficient β 2 .The true β 2 is defined as with units of Pa m a −1 , where r is distance from the center of the domain in km.The model is integrated forward with 30-day time steps for 10 yr, at the end of which the model is close to a new equilibrium.Figure 4a shows the initial surface and thickness, as well as annual surface elevations during the adjustment (magnified a hundred-fold), along the center line y = 20 km.
For data in our identical-twin inversion to recover β 2 and the initial thickness, we take annual surface elevation over the last 5 yr (the red curves in Fig. 4a) and surface velocity averaged over the last year (Fig. 4d).These are referred to below as the "target data".We define a cost function where the outer summation is over all cells i. u s * (i) is the target surface velocity in cell i, and s (n), * (i) is the target surface elevation in cell i in year n.σ u and σ s are meant to signify uncertainties in velocity and surface measurements, but and σ s affect the results of the inversion, however.We use 1 m/a for σ u and 2 cm for σ s .Thus when J ∼ O(1) then the model misfit is on the order of observational uncertainties on average.This value is motivated by the spread of surface elevations over the 5 yr sampling period; field measurements of surface elevation generally have O(∼ 1 m) errors, at least in relatively flat regions (Griggs and Bamber, 2008).GPS measurements are capable of much higher accuracy than remote sensing, provided the presence of a nearby exposed promontory, or ice for which the surface elevation is changing on much slower timescales (N.Gourmelen, personal communication, 2013), but we concede that GPS observations are unlikely to be collected with such high spatial resolution.
For initial guesses, a uniform value of β 2 guess ≡ 400 Pa m a −1 is used for β 2 ; as for h (0) , the observed thickness at t = 10 yr is used.That is, where R is the bedrock, the rationale being that, in this artificial experiment, the glacier is in this configuration (or close to it) during the period of observation.
Figure 5a shows the cost function trajectory.About 60 to 80 gradient evaluations are required for J to be O(1), but the inversion is carried farther than that.Figure 5b shows the final estimate of h (0) .Remnants of the initial guess can be seen, but the associated misfit (when compared with the true h (0) ) has a maximum amplitude of about 1.8 m, whereas that of the initial guess has a maximum of 9 m.In terms of root mean square error, this value is reduced from 2.6 m for the initial guess to 0.25 m.The estimated β 2 is not shown; it differs from the true β 2 by at most 2 %.
To appreciate the complexity of this inverse problem, one should contrast with that of reconstructing the history of an advected field in a flow that is fixed, or independent of the field.Such problems have been dealt with frequently in generic flow fields.Our problem is more complex, in that the velocities depend nonlinearly and nonlocally on the advected quantity (thickness).To illustrate this, another experiment is carried out, in which depth-averaged velocity is fixed to the target surface velocity, independent of time.The cost function consists only of the second term in Eq. ( 17), and the only control field is initial thickness, with initial guess the same as before.The forward model is essentially just the mass balance equation.The estimated h (0) is shown in Fig. 5c.In this experiment the cost function only decreases by a factor of 20 (as opposed to the O(10 6 ) decrease shown in Fig. 5a).More interesting is that the final estimate of h ( 0) is actually worse than the initial guess.This is because the divergence pattern in the target velocity field (Fig. 4d) differs from that throughout much of the "truth" simulation.
It is also interesting to consider how the time dependence of the constraints affects the inversion.The forward model is close to steady state by the end of the 10 yr integration, and the observed surface fields s (n), * differ on the order of centimeters.Since they are so close, one might guess that constraining the surface at years 6 through 9 of the simulation adds little information beyond that contained in the surface elevation at year 10.However, this is not the case.We carry out another inversion in which the cost function given by Eq. ( 17) is modified so that the summation over n only contains one term, n = 10.That is, only the final surface elevation is constrained.In this experiment, the stress and mass balances are again used, rather than solely the mass balance.Figure 5d shows the result of this inversion.In this case the estimated initial thickness is much closer to the initial guess than the true h (0) .Valuable information about the thickness trajectory is contained in the intermediate surface observations, even though the temporal variability is small.This particular result hinges on a level of measurement accuracy that is not generally attained; still, it demonstrates the importance of time-dependent information in estimating past ice sheet behavior and other unknown parameters.
To our knowledge, a similar inversion experiment has not been carried out with a higher-order model; although numerical inversions for historic data, including prior ice thicknesses, have been carried out using a model that makes the shallow ice approximation (e.g, Waddington et al., 2007).For this reason, Experiment 3 is intended as a preliminary investigation in inferring unknown parameters based on timedependent data, in which different types of data (surface elevation and velocities) are not co-temporal.There are many other questions that can be asked of such inversions: for instance, how is the inversion affected if data is not spatially complete, and if the "holes" in data at different times are not coincident?Does the quality of the estimation decay with the hindcast horizon?The choice of duration of our experiment was based on the adjustment time of the forward model, but it is unclear at which point the signal of observations gets lost in the noise of the model.Also, we did not allow for timedependent lubrication, as in Jay-Allemand et al. ( 2011); how does this affect the ease of solution?
Further, the assumed accuracy of surface elevation measurement (centimeter scale) is far below that expected by satellite or airborne sensing (meter scale).Nevertheless, the exploratory nature of our experiment is warranted and provides considerable substance, with more realistic accuracy and spatial resolution of measurements being an avenue of further investigation.

Experiment 4: simultaneous inversion of basal topography and sliding coefficients
Two prevalent challenges in modeling dynamic behavior of real glaciers and ice sheets are those of basal topography and of model initialization.Basal topography is often collected by sparse flight lines of airborne ice-penetrating radar, leading to very low-resolution representations of ice thickness much lower than that required by models used to study highly dynamic features such as ice streams.Recent published errors of gridded Antarctic ice thickness uncertainty are tens of meters or more (e.g., Fretwell et al., 2013).Given the sensitivity of models to representations of bed topography (Durand et al., 2011;Seroussi et al., 2011), this introduces large uncertainties into model response.Model initialization is important for studies that aim to assess the time-dependent response of ice sheets and glaciers in their current configurations to external forcing.In these studies it is important to start from a quasi-balanced state, in which there is no unrealistic model adjustment or drift that will contaminate the results.Long-timescale spinups can be computationally expensive, and due to parameter uncertainty there is no guarantee that the steady state produced is close to the observed state.
A number of studies have made use of the adjoint method introduced by MacAyeal (1993) to estimate these parameters.However, these inversions are problematic in that the various data sets used are gathered using different methods and resolutions at different times.Very often the solutions can lead to large anomalous mass balances when used to initialize a time-dependent model.Contributing greatly to the error inherent in the inverted solutions are the uncertainties in basal topography (Morlighem et al., 2011).
Two related issues emerge: uncertainties in ice thickness and basal topography, and their impact on the usefulness of assimilated products in model initialization.To deal with the latter, some studies run spinups (albeit shorter ones due to improved parameter guesses) or add "flux corrections" in the term of surface or basal mass balances (e.g., Joughin et al., 2009;Larour et al., 2012a).Such approaches are not ideal, because the model response may be influenced by the adjusted configuration or artificial mass balance.Morlighem et al. (2011) carries out an adjoint-based method that is constrained by the continuity equation with observed velocities, thus guaranteeing a stable mass balance.Basal lubrication and ice stiffness can then be estimated using other methods.However, the method does not consider the stress and mass balance equations together, and thus it does not truly provide a balanced state.In particular, the resulting model velocities may agree well with observed velocities in the L 2 -norm, but this does not guarantee that the divergence patterns will be identical.
With a time-dependent ice model adjoint it is possible to constrain both the continuity equation and the momentum equations in simultaneous inversions for basal lubrication and topography.In this section we explore the potential of using this approach both to minimize drift in model initializations and to provide improved estimates of basal topography.We note that methods for such inversions have been previously developed (Thorsteinsson et al., 2003;Gudmundsson and Raymond, 2008).However, these methods assume Newtonian rheology and rely on linear transfer functions of small perturbations, so it is not clear that their results carry over to large deviations and nonlinear rheologies.
We also point out that, in the general case, the retrieval of basal lubrication and topography is ill-posed.Consider a sliding glacier of infinite extent, with constant thickness, surface slope, and basal lubrication.With negligible horizontal velocity divergence, the glacier would be in steady state.There are an infinity of lubrication/bed elevation parameter pairs that would give the same surface velocity with the given surface elevation.This is an extreme case, but we will keep this potential limitation in mind.On a related note, the degree to which existing studies of inversions for basal lubrication have compensated for errors in bed topography has remained largely unquantified.

Uncertainties in bed elevation
We perform an identical-twin experiment, in which the surface elevation and velocity are known for an idealized glacier in steady state.This steady state is found by time-integrating the model in a doubly periodic, 40 × 40 km domain.A linear sliding law is used, with sliding coefficient β 2 .Bed topography and lubrication are given very simple analytical prescriptions, with single-wavelength variation in both x and y directions: given in units of m and Pa (m/a) −1 , respectively, where x and y are in km and R 0 (x) gives a constant downward slope of 0.5 • in the x direction.The Glen's law coefficient is as in the previous two experiments.Horizontal resolution is 1 km.Note that R and β 2 are uncorrelated.This is not motivated by any realistic model relating basal lubrication to basal elevation; the physics of basal sliding are far too complicated to be addressed in this simple experiment.Rather, it is a simple experimental setup intended to serve as proof of a concept.
To achieve steady state, the surface is allowed to adjust, and the state is presumed steady when the surface elevation rate of change is O(10 −4 m a −1 ). Figure 6 shows R and β 2 , along with the steady-state surface elevation and speed.A hybrid stress balance is used, so the surface speeds do not reflect their depth-averaged values; vertical shear accounts for up to ∼ 30 % of the surface velocity.We set up our identical-twin experiment by assuming perfect knowledge of surface elevation and surface velocity, and little or no information about basal topography and basal lubrication, aside from their general scales.We define a cost function where the summation is over all cells i.The forward model runs for a single time step, and u s and h (1) are the model surface velocity and thickness after that time step.u s * is the observed surface velocity shown in Fig. 6d.The first term in Eq. ( 22) penalizes misfit with observed velocities.σ u is as in Sect.5.3.The second term penalizes model drift, i.e., the amount by which thickness changes over the single time step.If we were constraining the rate of thickness change to be close to a nonzero observed rate, the term would be different, but here we are constraining the model to be close to steady state.γ d is chosen so that the term is order unity when thickness rate of change is on the order of 10 −4 .
The third and fourth terms in (Eq.22) pertain to the "model" norm, rather than the misfit norm (Waddington et al., 2007), meaning they deal with a priori knowledge of the unknown parameters.The third term is a simple Tikhonov smoothing term that penalizes large oscillation in thickness (and bed topography; see below).The norm |•| is that induced by summing the square of the centered finite difference approximation to the gradient over all cells.γ o is chosen so that the term only makes a contribution if thickness gradients are larger than ∼0.1 on average.The fourth term penalizes basal lubrication if it becomes too large during the minimization.It was found that without this term, the minimization algorithm can sometimes favor extremely large values of β 2 in favor of making adjustments to initial thickness, and this term helps to prevent that.For the experiment shown here this term was found not to be necessary.
The basal lubrication β and the bed elevation R are the control variables in the minimization of Eq. ( 22).The surface elevation, s, is constrained to be that of the computed steady state, s ss (the same as that shown in Fig. 6c).Rather than penalizing the misfit of s in the cost function, the surface elevation is controlled exactly through the initial guess for ice thickness.That is, the initial guess for thickness, h (0) guess , and topography, R guess , are defined such that h (0)  guess + R guess = s ss .
When the guess for R is updated by a function δR, the topography h (0) is updated by −δR.
The initial guesses for basal lubrication and topography are β 2 guess (x, y) ≡ 400 Pa (m/a) −1 , (24) The results of this inversion are shown in Fig. 7.While there is significant reduction in J , indicating not only good agreement of model and observed velocities but also small model drift (i.e., steady state is achieved), J does not include a measure of the misfit of R and β 2 .This can be assessed directly, however, since the true values of these fields are known.The inverted R and β 2 fields compare well with their true values; the root-mean-square (rms) error in the inverted basal topography is ∼ 6 m (reduced from 100 m in the initial guess).
Such accuracy cannot always be expected.To demonstrate this an equally simple identical-twin experiment is carried out.It is similar to the experiment discussed above, only the basal traction is in phase with the basal topography, such that the "sticky spots" that can be seen in Fig. 6b are shifted and coincide with the topographic lows in Fig. 6a.Again, there is no specific physical motivation behind the spatial relationship between basal lubrication and topography.Rather, it is to examine the degree of compensation between the two parameters in their inverted fields: since both lubrication and topography control velocity, it is possible that the inversion is nonunique.This type of compensation is similar to the "mixing" referred to by Gudmundsson and Raymond (2008).
The results of this experiment are shown in Fig. 6a.While the reduction in the cost function J is similar (not shown), the inversion is not as accurate, with an rms error in inverted topography of ∼ 15 m.The regions of difficulty can be seen to be those of the topographic lows, where the inverted basal topography perturbation is about 50 m less than the true value.At the same time, inverted basal traction is lower than its true value here.The smaller thickness is compensated by the weaker bed in order to match the observed speeds.Over the topographic bumps, inverted values are more accurate.We draw the tentative conclusion that, when bed strength and topography are uncorrelated, their inversions are more accurate with the procedure used here.

Uncertainties in bed elevation and surface elevation
In the above experiments, the exact surface elevation s ss (x, y) corresponding to the steady state of the model was used as a hard constraint, i.e., the initial surface elevation was constrained to be exactly equal to s ss , no matter the guess for bed topography or lubrication.(The surface elevation after a single time step was obviously not directly constrained.)The rationale behind this choice was the relative magnitudes of error in measurements of elevation and ice thickness in general.However, one might ask how sensitive the above results are to uncertainties in surface elevation.We carry out additional experiments of two distinct types: in the first, we still impose a hard constraint on surface elevation, but we constrain to an inexact estimate of s ss .In the second we impose a soft constraint on surface, i.e., we add an additional penalty term to the cost function, and allow for an additional control.
In these experiments, only runs where the true topography and lubrication are out of phase (as in Fig. 6) are considered.
In the hard-constrained experiments, we perturb s ss to produce our error-prone surface s 0 with a mode 2 pattern: where η s is a specified amplitude.s 0 now takes the place of s ss in Eq. ( 23).The results are not shown, but when η s = 0.1 m (implying accuracy higher than that achieved by satellite or airborne altimetry), the inverted patterns of lubrication and topography are similar to those shown in Fig. 7c and b, and the rms topography error is 5.6 m, similar to that achieved when constraining to s ss .When η s = 0.5 m, a slightly more realistic value, the patterns are again similar, but with a rms error of 11.4 m still a significant decrease in error relative to the initial guess.However, one can assume the accuracy of inversion will worsen when errors are larger, or when true topography and lubrication are correlated.
In our final experiment we weaken the constraint on surface elevation: we do away with Eq. ( 23) and instead allow for independent controls of β, R, and h.To the cost function given in Eq. ( 22) we add the term J s , defined by where s 0 is as in Eq. ( 26).Both η and σ s are set to 0.1 m, implying an accuracy of 0.1 m in surface elevation measurements.Note that the constraint is on the surface after a time step is taken, in contrast to the other experiments, but this should not matter if the system is in steady state.An experiment was carried out with an initial guess of a flat bed, which did not lead to the recovery of an accurate bed topography (not shown).However, the level of accuracy in the prior bed topography estimation in the above experiments was quite low; in practice bed elevation is known at least somewhat accurately from ice-penetrating radar, or seismic inference, along with interpolation.Therefore we choose an initial guess with an error pattern similar to that of s 0 , and with magnitudes of 30 m rather than 200 m: R guess (x, y) = R(x, y) + 30 cos 4πy 40 cos 4πx 40 .
The initial guess for β was as in the previous experiments.The results of this experiment are shown in Fig. 9. Bed topography is depicted in terms of error with respect to the true bed R(x, y), not in absolute (i.e., with respect to sea level).Overall, the reduction in rms bed error was not large (from 15 to 8.3 m.) Nevertheless, the pattern of error is instructive: by comparison between Figs. 6a, b and 9b, it seems that error reduction was significant in areas of high surface velocity or relatively flat bed.Inverted bed lubrication is shown as well, but in an absolute sense, as the initial guess is uniform.The pattern is qualitatively correct, but there are obvious errors, to be expected given the related errors in bed topography.
Here we have examined two approaches that allow for uncertainty in surface elevation measurements; but for a similar order of accuracy, the first (hard-constrained) produces a far more accurate estimate of bed topography and lubrication than the second (soft-constrained).It seems that when the thickness and bed are independent controls, the minimization scheme is free to find agreement with the imposed surface elevation at the expense of the constraint on elevation change.Since both methods incorporate uncertainties in surface elevation, we therefore suggest the former (hard constraint on surface elevation) when recovering bed topography and lubrication based on surface information.However, one would not be able to use such hard constraints for a time-dependent inversion, such as that of Sect.5.3.

Conclusions and further work
Using a synthesis of AD tools and analytical considerations, we have successfully generated the adjoint of an ice model which accounts for both a time-dependent mass balance and a nonlocal, higher-order stress balance.The adjoint model is capable of providing transient sensitivities, allowing for exploration of very large parameter spaces.Coupled with a large-scale optimization algorithm, the adjoint is able to successfully perform the inversions that are typically carried out in glaciological settings (retrieval of basal lubrication parameters) and some that are not (simultaneous retrieval of basal topography and past thickness).The fact that the adjoint is time-dependent means that an evolving model state can be found consistent with observations, enabling ice model initialization without costly spinups or artificial flux adjustments.Perhaps most importantly, this opens up a fundamentally different approach to the use of ice observations for constraining transient ice flow, in particular with respect to satellite remote sensing data that are resolved in time.
The process of adjoint generation is streamlined: a change to the ice model requires simply another application step of AD tools, rather than the derivation of a new set of adjoint equations.The forward and adjoint models are fully parallel, meaning the costly solves of large linear systems can be spread across a number of processors.Currently the linear solver is a very simple conjugate gradient algorithm with a trivial preconditioner.However, the decoupling of the matrix solve from the AD tools means that the efficacy of different solvers can be investigated without affecting the adjoint model.A possible future development is to replace our solver with an external package for higher efficiency, similarly to other ice models intended for large-scale studies (Bueler and Brown, 2009;Larour et al., 2012b).
A number of developments remain to be made in our model in order to apply it to many relevant glaciological issues.As mentioned in Sect.5.1, the floatation condition at the grounding line introduces nondifferentiability in the expressions for basal and driving stress.In its present form sensitivity studies cannot account for grounding line migration, and neither can observation-based inversions involving ice thickness and basal topography as controls.In other adjoint studies these types of issues have been approached by "smoothing out" these nondifferentiabilities (Losch and Heimbach, 2007); we will investigate whether grounding line issues can be dealt with by such smoothing.
An important physical process currently absent in our model is temperature transport.Ice internal temperature, while difficult to observe on large scales, can play an important role in ice flow and in the controls on basal melting and freezing of grounded ice.Many parameter estimation studies assume temperature is steady (Joughin et al., 2009;Morlighem et al., 2010), but there is evidence in certain locations that there is ongoing thermal adjustment due to unsteady transport (Engelhardt, 2004), and complex thermomechanical coupling in shear margins (Schoof, 2012).Using the existing tracer transport framework of MITgcm, we plan to implement thermal advection/diffusion in our ice model.This will enable investigations of the adjoint sensitivities of ice evolution to poorly constrained ice internal temperatures, and to even less well-known geothermal fluxes.Such sensitivities are important to constrain, especially for areas undergoing large change, such as Pine Island (Larour et al., 2012a).
Our forward and adjoint model lay the groundwork for a fully coupled land ice-ice shelf-ocean model that are based on the same model infrastructure.Heimbach and Losch (2012) used an ocean model adjoint to investigate the sensitivities of melt rates under Pine Island ice shelf to ocean forcings.Our ice model is not currently coupled to an ocean model, but such coupling studies have been carried out (without the capability of adjoint generation; Goldberg et al., 2012a).Similarly coupling our model to the MITgcm ocean model (for which an adjoint exists) would allow adjoint sensitivities to then be propagated across the ice-ocean interface, and the sensitivities of ice evolution to ocean forcings could be investigated, among other questions.As demonstrated in Sect.5.1, the structural "weak points" of an ice shelf do not coincide with the places of strongest melting, or necessarily with those areas where melt rates are most sensitive to a given ocean forcing.A coupled ice-ocean adjoint model could reveal which ocean trends are relevant to land ice evolution, which is not currently known.
Lastly, we point out the ubiquity of our approach, as we believe many higher-order ice model codes could potentially be amenable to automated differentiation techniques, as long as the costly linear solvers are hidden from the AD tools (although there may be complicating factors not considered here, such as discrete grid remeshing).Given the utility of being able to generate an adjoint code, we make the suggestion that next-generation ice model code be written with this in mind.

Fig. 1 .
Fig. 1. Background steady state of the ice stream-shelf system in Experiment 1. Coloring on the upper surface is velocity magnitude.The thick black contour denotes the grounding line.

Fig. 2 .
Fig. 2. (a) Sensitivity of grounded ice volume after 10 years to sub-shelf melt rates, in km 3 per (m/a) of melting.The non-colored section of the figure is within the grounded part of the domain (b) Similarly for Glen's Law parameter A, but with units of 10 −15 km 3 Pa −3 a −1 .A thick black contour denotes the grounding line.The largest values in both figures are in the ice shelf margins.Note the differing x-and y-bounds on the figures.

Fig. 3 .
Fig. 3. (a) Data for inverse problem."True" basal sliding coefficient β 2 true (left axis) and corresponding velocity profiles (right axis) for low and high surface slopes.(b) Cost function J versus number of model evaluations for full and approximate adjoint, low surface slope.(c) Same for high surface slope.(d) Gradient found by full and approximate adjoint in first iteration (solid) and second iteration (dashed), high surface slope.48 Fig. 3. (a) Data for inverse problem."True" basal sliding coefficient β 2 true (left axis) and corresponding velocity profiles (right axis) for low and high surface slopes.(b) Cost function J versus number of model evaluations for full and approximate adjoint, low surface slope.(c) Same for high surface slope.(d) Gradient found by full and approximate adjoint in first iteration (solid) and second iteration (dashed), high surface slope.

Fig. 4 . 49 Fig. 4 .
Fig. 4. (a) "True" profile of glacier at time t = 0 and transient surface profiles spaced one year, exaggerated by a factor of 100.The last 5 profiles (in red) correspond to data used to constrain β 2 and h(t = 0).(b) "True" β 2 .(c) Thickness at final time (t = 10a).(d) Surface speed at final time.Region where flow is strongly divergent (convergent) is indicated by solid (dotted) black contours.

Fig. 5 .
Fig. 5. (a) Value of cost function versus iteration.(b) Final inverted initial thickness (h(t = 0))."True" field is uniformly 1000 m.Compare with Fig. 4(c), which is used as the initial guess.Maximum misfit is reduced from 9 m to 1.8 m; r.m.s.misfit is reduced from 2.6 m to 0.25 m.(c) Inverted initial thickness when velocity is "frozen" in its observed state (Fig.4(d)).(d) Inverted initial thickness when only the observed thickness at t = 10 years is used to constrain the problem.

Fig. 6 .
Fig. 6.(a) True basal topography R in Experiment 4, with constant trend in x-direction removed.(b) True β 2 in Experiment 4. (c) Steady-state surface elevation, linear trend removed.(d) Ice surface speed corresponding to steady state.

Fig. 8 .Fig. 8 .
Fig. 8.An inversion similar to Figs. 6-7, but with β 2 true "in phase" with R true , such that the sticky spots coincide with the topographic minima.Initial guesses for R and β 2 are the same as before.(a) Steady state surface elevation with trend removed.(b) Steady state surface velocity.(c) Inverted basal topography.rms error is 16 m.Inverted topographic minima are smaller in amplitude than the "true" topography by ∼ 50.This is compensated by lower β 2 in those regions (d).53 Fig. 8.An inversion similar to Figs. 6-7, but with β 2 true "in phase" with R true , such that the sticky spots coincide with the topographic minima.Initial guesses for R and β 2 are the same as before.(a) Steady-state surface elevation with trend removed.(b) Steady-state surface velocity.(c) Inverted basal topography.rms error is 16 m.Inverted topographic minima are smaller in amplitude than the "true" topography by ∼ 50.This is compensated by lower β 2 in those regions (d).

Fig. 9 .
Fig. 9. (a) The error inherent in the initial guess for bed topography, weakly-constrained surface elevation.(b) The error in bed topography at the end of the inversion.Note that error is reduced where surface velocities are high or true bed topography is relatively flat.(c) Corresponding inversion for β 2 .

Table 1 .
Pseudocode version of forward model time-stepping procedure.

Table 2 .
Pseudocode version of adjoint model reverse-time-stepping algorithm corresponding to Table1.(AD) indicates that this step is processed by the algorithmic differentiation tools.

Table 3 .
Pseudocode version of adjoint model reverse-time-stepping algorithm corresponding to Table1.(AD) indicates that this step is processed by the algorithmic differentiation tools.

www.the-cryosphere.net/7/1659/2013/ The Cryosphere, 7, 1659-1678, 2013 D. N. Goldberg: Parameter and state estimation with a time-dependent adjoint marine ice sheet model since
the observations are synthetic there is no rationale to use spatially varying uncertainties.The relative values of σ u