Over the past three decades, inversions for ice sheet basal drag have become commonplace in glaciological modeling. Such inversions require regularization to prevent over-fitting and ensure that the structure they recover is a robust inference from the observations, confidence that is required if they are to be used to draw conclusions about processes and properties of the ice base. While L-curve analysis can be used to select the optimal regularization level, the treatment of L-curve analysis in glaciological inverse modeling has been highly variable. Building on the history of glaciological inverse modeling, we demonstrate general best practices for regularizing glaciological inverse problems, using a domain in the Filchner–Ronne catchment of Antarctica as our test bed. We show a step-by-step approach to cost function normalization and L-curve analysis. We explore the spatial and spectral characteristics of the solution as a function of regularization, and we test the sensitivity of L-curve analysis and regularization to model resolution, effective pressure, sliding nonlinearity, and the flow equation. We find that the optimal regularization level converges towards a finite non-zero limit in the continuous problem, associated with a best knowable basal drag field. Nonlinear sliding laws outperform linear sliding in our analysis, with both a lower total variance and a more sharply cornered L-curve. By contrast, geometry-based approximations for effective pressure degrade inversion performance when added to a sliding law, but an actual hydrology model may marginally improve performance in some cases. Our results with 3D inversions suggest that the additional model complexity may not be justified by the 2D nature of the surface velocity data. We conclude with recommendations for best practices in future glaciological inversions.

With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.

– John von Neumann

Ever since the pioneering “tutorial” of

However, the question remains of just how much information can truly be gleaned from ice model inversions. Fundamentally, the trouble is that a spatially resolved inversion has an infinite number of free parameters, and in a numerical implementation of the problem, the number of free parameters is determined by the discretization used, not by anything fundamental in the problem. Furthermore, the transfer function relating variations in basal shear stress to surface velocity or slope is essentially a low-pass filter

Yet this introduces a new free parameter into the problem: if

In this paper, we first give a brief review of the ways in which regularization and L-curve analysis have been treated in glaciological inverse models, and then we give a tutorial on how to regularize and perform L-curve analyses for glaciological inverse problems. Using an inversion of the Filchner–Ronne catchment in Antarctica as our test bed, we demonstrate how to scale misfit components, compute L-curve curvature, and select both the optimal

The original work of

Later,

Indeed, it is rare to find inversions of real data and real glacier geometries performed without either explicit regularization or series truncation. However, that regularization is unfortunately not always discussed in detail, and often the effect of regularization is not explored and the choice of regularization is not justified. For instance,

When an L-curve analysis is performed in glaciological inversions, the presentation of the L-curve can be highly variable.

We do not claim that this review of regularization and L-curve analysis in the glaciological literature is exhaustive. Our intention here was merely to illustrate the spectrum of ways in which these issues have been treated within glaciological inversions. Inversions are a vital source of boundary conditions for projections of ice sheet dynamics in a changing climate

In addition to regularization, we also test the sensitivity of our results to element size, flow equation, sliding nonlinearity, and the use of effective pressure. We test the influence of regularization through L-curve analysis, and we created separate independent L-curves whenever we tested anything else in our experimental setup: for instance, when we varied element size, we created a separate L-curve for each mesh that we tested. We organized our experimental setup around a reference case using the shelfy-stream approximation (SSA) in the highest-resolution mesh with a linear Weertman sliding law. We explored the influence of regularization itself by analyzing the spatial and spectral characteristics of the reference case in detail (Sect.

We model a domain covering the catchments of West and East Antarctica that feed into the Filchner–Ronne Ice Shelf (Fig.

Geographic setting of our study area.

We use the Ice-Sheet and Sea-level System Model (ISSM;

For our basic model, we use a Weertman-type sliding law

We show a comparison of the three

Comparative maps of effective pressure

We use a single observational misfit and a single regularization term in our cost function. Our observational misfit is the L2 norm of the absolute velocity misfit:

Before combining the data and regularization cost terms, it is useful to scale them according to estimates of their characteristic magnitude. Scaling the cost function terms is valuable for two reasons: (1) it allows us to easily identify the region of parameter space that we need to search in our L-curve analysis, and (2) it ensures that the regularization parameter is both unitless and readily interpretable in terms of relative weight placed on the two component terms. Without scaling, the regularization parameter would have obscure units, it would be difficult to identify the region of parameter space near the corner of the L-curve to search for the optimal regularization, and there would be no basis for judging whether a given level of regularization was “large” or “small”.
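As a minimal sketch of this scaling (with hypothetical variable names and a simple nodal mean standing in for the model's actual integral norms, which are defined in the following subsections), the combined cost function might be assembled as:

```python
import numpy as np

def normalized_cost(u_model, u_obs, grad_C, lam, H_mean, C_std):
    """Sketch of a scaled two-term cost function. Hypothetical discretization:
    nodal values with uniform weights; the real model integrates over elements."""
    # Characteristic scale of the data term: variance of the observed speeds.
    S_data = np.var(u_obs)
    # Characteristic scale of the regularization term: mean-square gradient of
    # a reference sinusoid with wavelength H_mean (mean ice thickness) and
    # amplitude C_std (estimated std. dev. of the drag coefficient),
    # i.e. 0.5 * (2*pi*C_std / H_mean)**2.
    S_reg = 0.5 * (2.0 * np.pi * C_std / H_mean) ** 2

    J_data = np.mean((u_model - u_obs) ** 2) / S_data  # unitless misfit
    J_reg = np.mean(grad_C ** 2) / S_reg               # unitless roughness
    return J_data + lam * J_reg                        # lam is now unitless
```

With both terms normalized this way, `lam = 1` places equal characteristic weight on fitting the data and on smoothness, which is what makes the corner region of the L-curve easy to locate a priori.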

The characteristic scale of the data term is readily given by the variance of the velocity observations themselves:

Computing the characteristic scale of the regularization term is more complex than computing the characteristic scale of the data term, because the magnitude of the regularization term depends on the gradient of the unknown drag coefficient. We computed the characteristic scale of the regularization term based on the mean-square gradient of a reference sinusoid with a wavelength given by the mean ice thickness and an amplitude given by an estimate of the standard deviation of the drag coefficient. We estimated the standard deviation of the drag coefficient by computing a first guess based on the ratio of driving stress and observed velocity:

We then use the characteristic scales to normalize both components of the cost function:

The L-curve for our reference case of mesh no. 1 and linear Weertman sliding is shown in Fig.

Example L-curve for mesh no. 1. Large plot

Far from the corner region, both limbs of our L-curve tend towards straight lines, but for the large-

The best-

While, for simplicity's sake, we are only showing a single representative L-curve analysis in Fig.

In Fig.

Comparison of inversion results for the highest-resolution mesh as a function of

We also show details of the inverted structure within several fast-flowing ice streams and outlet glaciers. In response to the

For each region, we first interpolated

Spectral analysis for all four detail regions highlighted in Fig.

We show the results of this spectral analysis of basal drag for the four detail regions in Fig.

Overall, these results provide qualified support to the hypothesis advanced in

In Fig.

Convergence of L-curve analysis with mesh resolution.

The dependence of all three

In addition to examining variation in

Comparison of inversion results as a function of mesh resolution and

In addition to comparing inversion results at the (variable)

These results confirm the utility of L-curve analysis, by demonstrating that the fine structure obtained by the inversion at

Before analyzing our results for Budd sliding in detail, it is useful to consider Eq. (

Comparative L-curves for different

All three of our candidate

Figure

At face value, Fig.

This table summarizes inversion performance for the three sources of effective pressure

In Fig.

Comparative L-curves for different values of the sliding exponent

Comparative maps of inverted drag coefficient

As in our experiments with

This table summarizes inversion performance for the three values of the sliding exponent

We produced L-curves for HO inversions with linear Weertman sliding in mesh nos. 4, 6, and 8. We compare these with the corresponding SSA inversions in Fig.

Comparative L-curves for HO and SSA inversions.

A detailed examination of the misfit maps for the HO inversions (not shown) revealed that their increased observational misfit was largely due to increased positive misfit (i.e., model velocity too high) in slow-flowing regions. Thus, we hypothesized that they may have been unable to fit the data because they contained too much deformation flow: if the temperature model used as input contained too much warm ice near the base, then a vertically resolved HO model would have too much shear deformation in the lower part of the ice column, resulting in model surface velocities that were faster than observed, with no way to bring the model velocities down by adjusting basal slip. We tested this hypothesis by running one additional L-curve experiment in mesh no. 6 with HO and a constant ice rheology corresponding to a temperature of

We summarize the performance of the two HO inversions in mesh no. 6 as compared to the SSA inversion for that mesh in Table

This table summarizes inversion performance for HO and SSA models in mesh no. 6. Each model is evaluated at its own

Throughout this paper we have produced many different inversions and competing descriptions of the ice sheet basal drag. In this subsection, we combine them to produce a single consensus picture representing our best combined estimate. We feel that it is more appropriate to produce a consensus view of drag,

Our best combined estimate of the ice sheet basal drag

Figure

The uncertainty in our combined drag estimate is mostly low, but with large excursions co-located with areas of high drag (Fig.

Fundamentally, the purpose of regularization is to determine the information content of our observations by managing the tradeoff between increased complexity in the inversion target field and reduced observational misfit. While one can usually reduce the observational misfit by adding ever more complexity to the inversion target field, this is equivalent to increasing the number of free parameters one is allowed to tune, and the meaningfulness of a good observational fit achieved with an excess of free parameters is questionable. Occam's razor suggests that we should prefer the model that explains the maximum amount of observational variance with the minimum amount of spatial structure. The ideal

This focus on information content also provides a useful framework for thinking about the question of convergence in inverse models. In forward models, “convergence” can be simply defined to mean that, in the limit that mesh resolution approaches zero, the numerical solution approaches the continuous solution. In inverse models, by contrast, there are two fields that the numerical solution could potentially approach: the continuous solution and the true field. If the true field is sufficiently rough in comparison to the attenuating effect of the forward problem, then the limit of

If it is not the true basal drag, what then is the continuous solution? It is useful to think of the continuous solution of the regularized inverse problem as the best knowable drag field. When

It is common in the inverse modeling literature to read some version of the statement that, because it is possible to fit the data and achieve force balance with any sliding law, we therefore cannot use inverse models to distinguish between different sliding laws. For instance,

However, we argue that inverse models are not neutral between different sliding laws, even if they are limited to a single time slice. Occam's razor dictates that, when choosing between two parameterizations that both obtain a similar fit to the data, we should prefer the parameterization that requires less complexity to do so. In the case of a sliding law, we are creating a parameterization that relates two quantities that each have about

Thus, we would argue that, rather than being agnostic about the value of the sliding exponent, inverse models actually provide evidence in favor of nonlinear sliding. Our total variance measure, which combines both observational misfit and coefficient variance, was better for nonlinear sliding as compared to linear sliding by about a factor of 3 over the whole domain or a factor of 2 when the analysis was limited to the more challenging fast-flow region. In addition, our L-curves for nonlinear sliding were more sharply curved, producing a narrower and more well-defined range of acceptable

While our inverse model results are strongly in favor of nonlinear basal sliding, they are more ambiguous on the utility of including effective pressure in a sliding rule. Other lines of evidence, including laboratory tests

When it comes to simple geometry-based calculations for

Nonetheless, there remains hope that further improvements in hydrological models may change the situation.

In almost all of our experiments in this paper, we used the SSA stress balance equations, which are the simplest approximation that can be used in an inversion. It is tempting to assume that inversions using more advanced stress balance equations, such as HO or full Stokes (FS), are inherently more powerful than inversions performed with the SSA equations. However, that is forward model thinking, not inverse model thinking. From an inverse modeling perspective, the extra stress and strain rate components found in HO and FS models are a liability, not an asset. If a simpler SSA model is able to obtain a similar fit to the data as a more complex HO or FS model, then it is the simpler model that should be preferred. But “similar fit” may actually be an optimistic case for the more complex models: in our experiments comparing HO and SSA using the same mesh and sliding law, we found that the SSA inversion was able to obtain a better fit to the data than the HO model. There is thus very little justification, from our results at least, for using the more complex stress balance equations in an inversion.

The proximal cause of the poor performance of our HO inversions was that the basal ice rheology was too warm in large portions of the domain. Remedying this by using a constant cold rheology at all depths throughout the domain brought the inversion performance closer to the performance of the SSA inversion. However, a spatially invariant constant is a very poor approximation of the ice sheet thermal structure; it ignores very real variations in thermal boundary conditions such as surface temperature, accumulation rate, geothermal flux, ice thickness, and strain heating. While we probably could have produced a realistic colder temperature field that enabled our HO inversions to match the performance of our SSA inversions with a bit of work, the mere fact that this tuning is required is a mark against the HO inversions. In addition, using a stiffer ice rheology near the bed has the effect of greatly reducing shear deformation, forcing the entire domain to shift towards a regime of plug flow. In other words, in order to improve the performance of our HO inversion, we had to force it into a regime where it mimicked SSA! This explanation may also help to explain why some previous studies, such as

Fundamentally, the problem is that, by allowing for vertical variations in rheology, strain rate, and therefore velocity, HO and FS introduce an enormous amount of variability into the problem that cannot be constrained by the available datasets. Sliding inversions typically take the rheological structure of the ice sheet to be fixed, but properly speaking, an FS or HO inversion should be simultaneously adjusting the sliding coefficient and the rheology of the lower ice column. While glaciological inversions for vertically averaged rheology have been performed

It is interesting that we found that our SSA inversions even outperformed our HO inversions in slow-flowing areas of the ice sheet, where the SSA equations should not be a good approximation to the ice sheet stress balance. However, this is less surprising when considering the framework of the “hybrid” approximation to the stress balance equations

On the basis of the results and discussion above, we suggest the following principles for guiding the use and analysis of L-curves for regularization in glaciological inversions. Note that, while we have made a number of particular choices in our model setup here (e.g., the choice to use the L2 norm of absolute velocity misfit for

Whenever possible, modelers should use an a priori estimate of characteristic scale to normalize their individual cost function terms before those terms are combined. Doing so ensures that

L-curves should always be presented on log–log axes with equal axis scaling (i.e., 1 order of magnitude on the

In addition to an honest visual presentation of L-curve shape, it is vital that the optimal

Inversions with noisy, uncurved, nonmonotonic, or otherwise ill-formed L-curves should be regarded as suspect. Rather than simply selecting one

In an inverse modeling context a modeler cannot simply assume that HO or FS models are superior to an SSA model simply because the former contain more complete representations of the stress tensor. In an inverse modeling context, the increased internal complexity and degrees of freedom in the 3D models are a liability that must be justified, for instance through comparative L-curve analysis (e.g., Fig.

The utility of sliding laws that incorporate effective pressure

In contrast to the small marginal utility added by including

Fundamentally, inverse models are a mathematical expression of principles of knowledge and inference. We have some aspects of the ice sheet that we can observe, such as the surface velocity, and we would like to use those observations to infer aspects that we cannot observe, such as the basal drag. However, we cannot infer all variables everywhere from a finite set of observations on a single surface of the ice sheet. It is therefore vital that we structure our inference engine to be skeptical of the inferred structure that it generates. Regularization is the mathematical expression of that skepticism, ensuring that our inversions only produce structure that is actually required to fit the data.

In this paper, we have given a tutorial on how to regularize glaciological inverse models, including normalization of cost function components and L-curve analysis. We have shown that, with non-zero observational error, the glaciological L-curve converges towards finite non-zero regularization values and a best knowable basal drag field in the continuous problem. It remains to be seen whether regularization approaches zero and the best knowable field approaches the true field for real glacier settings in the limit that observational error also approaches zero. We have shown how an L-curve analysis can enable a modeler to draw more rigorous conclusions about the short-wavelength structure of basal drag, and we have shown how the optimal regularization level on coarse meshes is intimately connected with numerical convergence of spatial structure with respect to element size. We have also advocated a change in philosophy for glaciological inverse models that centers the role of regularization in the process, with the goal of inverse modeling being explicitly understood as not merely fitting the data, but fitting the data using the least amount of structure. We have shown how this shift in philosophy allows inverse models to break their agnosticism on the question of sliding nonlinearity, coming down strongly on the side of nonlinear sliding laws while providing more ambiguous conclusions on the utility of incorporating effective pressure into the sliding law. This philosophy also provides a framework for thinking about the relative performance of different types of inversions, with more complex models being required to justify their increased degrees of freedom with an improved observational fit.

But while this shift in philosophy may favor simpler 2D models for inversions of 2D datasets, a fascinating future of inverse modeling may lie in using 3D models to assimilate a greater variety of information than that which can be included in 2D models. Surface horizontal velocity is just one of many observations glaciologists have that give us information about the ice sheet state. While inverse modelers have occasionally tried to fit time-variable surface elevations

Our SSA model needs an estimate of the column-average ice rheology

Thermal forcing and ice rheology. Top row

The resulting thermal structure is shown in Fig.

For our numerical mesh, we use an unstructured triangular mesh in the map-view plane, constructed using the bidimensional anisotropic mesh generator

Element size analysis for our series of meshes.

We construct a control field on a 1 km regular grid using the logarithm of potential energy dissipation (the dot product of driving stress and ice velocity) in the grounded domain, the logarithm of strain-rate magnitude in the floating domain, and adding a constant multiple of the mask in order to produce discontinuities at the grounding line and calving front (Fig.

The resulting meshes are summarized in Fig.

Once the meshes were generated, we interpolated all relevant fields (e.g., geometry, velocity) onto each mesh using a multi-wavelength technique, described below (Sect.

Element anisotropy for select meshes. This figure shows the distribution of element anisotropy for the same select three meshes shown in Fig.

Our model setup requires that various data products be interpolated from their native regular square grids onto our unstructured triangular meshes. For instance, our ice geometry data are taken from BedMachine Antarctica Version 2

Thus, we developed a multi-wavelength smoothing procedure for interpolating gridded data products onto our model mesh. The objective of this procedure is to ensure that each mesh node receives an interpolated value that is representative of the average of the original data grid over an area comparable to the local element size. Where mesh resolution is fine, mesh nodes receive values interpolated from a high-resolution grid, and where mesh resolution is coarse, the nodes receive values interpolated from a highly smoothed grid. The procedure is as follows:

First, we generate multiple copies of the original grid smoothed at a range of wavelengths. Starting from the original full-resolution grid, each subsequent grid is generated by twice smoothing the previous grid with a

Next, we need to calculate a characteristic local element size for each mesh vertex. This is done by first computing the hydraulic radius for each element (half the hydraulic diameter, or

Finally, we can interpolate the data from our grids to the mesh vertices. Each vertex is assigned to the first grid with a resolution coarser than its hydraulic radius. We loop through the grids, and for each grid, we perform a simple bilinear interpolation from that grid onto the vertices that were assigned to it.
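The three steps above can be sketched as follows. This is an illustrative reimplementation, not our production code: Gaussian smoothing stands in for the actual smoothing kernel, the resolution doubling per level is assumed, and all names are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.interpolate import RegularGridInterpolator

def multiwavelength_interp(x, y, data, verts, local_size, dx, n_levels=6):
    """Interpolate a gridded field onto mesh vertices, matching the smoothing
    length scale to the local element size (here, e.g., the hydraulic radius).
    x, y: 1-D grid coordinates (spacing dx); data: 2-D field, shape (len(y), len(x));
    verts: (n, 2) vertex coordinates in (x, y); local_size: (n,) element sizes."""
    # Step 1: build a stack of progressively smoothed copies of the grid; each
    # level is representative of roughly twice the scale of the previous one.
    levels, res = [], []
    grid = np.asarray(data, dtype=float)
    for k in range(n_levels):
        levels.append(grid)
        res.append(dx * 2.0 ** k)            # nominal resolution of this level
        grid = gaussian_filter(grid, sigma=2.0 ** k)

    # Step 2: assign each vertex to the first level whose resolution is coarser
    # than its local element size, falling back to the coarsest level.
    idx = np.clip(np.searchsorted(res, local_size), 0, n_levels - 1)

    # Step 3: bilinear interpolation from each level onto its assigned vertices.
    out = np.empty(len(verts))
    for k in range(n_levels):
        sel = idx == k
        if np.any(sel):
            interp = RegularGridInterpolator((y, x), levels[k])
            out[sel] = interp(verts[sel][:, ::-1])  # grid axes are (y, x)
    return out
```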

This procedure produces an interpolated product that both avoids aliasing in coarse-resolution regions of the mesh and preserves high-resolution information where the mesh is fine. Additionally, while this method is not exactly conservative, it is approximately so: integrated ice volume within our domain computed on the mesh was within a factor of

The smoothing and interpolation of mask values can be especially tricky, as there is not always a straightforward interpretation of intermediate values between the original integer classes. For instance, BedMachine has a mask variable with the following five values:

If the mask reordering method described here is applied to a domain that includes Lake Vostok, then the lake should be given a mask value for either a floating ice shelf or the grounded ice sheet. If Lake Vostok is treated like a grounded ice sheet, then the bed elevation should be raised to the ice bottom and the friction coefficient set to zero, and if Lake Vostok is treated like a floating ice shelf, then the local “sea level” needs to be raised to a value reflecting the actual level of hydrostatic equilibrium in the lake.

Often, modelers will interpolate mask values using nearest neighbor interpolation, thus ensuring that they only get integer results and they do not need to do the hard work of interpreting intermediate values. However, nearest neighbor interpolation will not necessarily produce values that are spatially representative. For instance, if a single mesh vertex happens to find the lone nunatak in an otherwise ice-covered region, then the apparent area of that nunatak in the model mesh would be much greater than in reality. Our multi-wavelength interpolation procedure, described above, can produce values that are spatially representative, but these values will not necessarily be integer-valued, and we thus need a way to interpret intermediate values of the interpolated mask.

Flowchart showing all combinations of intermediate values for both the original

Unfortunately, the original BedMachine mask order does not always behave in an ice dynamically reasonable manner under smoothing and interpolation operations. We consider behavior to be most “reasonable” when intermediate fractional mask values can be easily interpreted as either area fractions of the respective regions or as an intermediate boundary position, for instance if an interpolated mask value in between grounded ice sheet and floating ice shelf can be simply interpreted as grounded area fraction. We consider behavior to be moderately unreasonable when intermediate values change the configuration but in a way that does not produce big disturbances in the force balance or overall dynamics, for instance if interpolation between a grounded ice sheet and the open ocean produces a narrow floating shelf at the terminus that provides no appreciable buttressing. And we consider behavior to be egregiously unreasonable if intermediate values change the ice sheet configuration in a way that produces a major change in force balance or dynamics, for instance if interpolation between floating ice shelf and open ocean produces a ring of grounded ice rises at the terminus.
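The "reasonable" case above can be made concrete with a small sketch: reading a fractional mask value that lies between the grounded-ice and floating-ice classes as a grounded area fraction. The class numbers used here are illustrative, not the actual (or reordered) BedMachine values.

```python
def grounded_fraction(mask_value, grounded=2.0, floating=3.0):
    """Read a fractional interpolated mask value lying between the grounded-ice
    and floating-ice classes as a grounded area fraction. Class numbers are
    illustrative placeholders for the reordered mask described in the text."""
    if not grounded <= mask_value <= floating:
        raise ValueError("mask value does not lie between the two classes")
    # Linear reading: with grounded=2 and floating=3, a value of 2.25 means
    # 75 % grounded and 25 % floating area within the node's footprint.
    return (floating - mask_value) / (floating - grounded)
```

This interpretation is only safe when the two classes are adjacent in the mask ordering, which is precisely why the ordering of the mask values matters under smoothing and interpolation.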

The original mask order produces two egregiously unreasonable intermediate states (Fig.

However, the two egregiously unreasonable intermediate states in the original mask order can be eliminated by a simple reordering of the mask values. The only change required is that the numbers representing exposed rock and floating ice shelf are swapped, such that the mask now goes as follows:

This point is admittedly tangential to the main topic of our paper, but we find it striking that a simple reordering of the mask can produce behavior under smoothing and interpolation operations that is far more reasonable from an ice dynamic perspective than the original mask order. After reordering of the mask as described here, we used the same multi-wavelength interpolation procedure described above to interpolate the reordered mask onto the model mesh. Finally, we removed areas with an ice thickness less than

For our L-curve analysis, we sample the range

Before selecting the optimal

The selection of the optimal smoothing wavelength in

Once we have a smoothed tradeoff curve, we compute the total logarithmic curvature from

Despite being a poor representation of reality (Sect.

Effectiveness of converting the slip coefficient from a linear sliding law to a nonlinear one.

We converted our linear Weertman drag coefficient to a nonlinear sliding law using

As can be seen in Fig.

Thus, these results demonstrate that L-curve analysis can be a valuable tool even for modelers who intend to convert their coefficient from linear Weertman to a new sliding law. Conversion of the coefficient without performing an L-curve analysis first may produce unreliable results, especially for

Inversion scripts, inverse model results, L-curve summaries, and our best combined drag estimate are available at

MJW wrote inversion scripts, devised experimental design, made figures, and prepared the manuscript text. TK ran the CUAS hydrology model. AH and MR acquired funding and supervised the work. MR installed and maintained the ISSM installation on AWI's HPC system. All authors contributed to the revision and improvements of the manuscript as well as discussions about ideas and conclusions.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

We thank the two anonymous reviewers for their thorough and rigorous comments on our manuscript. Additionally, we thank Christian Schoof for correcting a mistake in Eq. (1).

This research has been supported by the Alfred-Wegener-Institut Helmholtz-Zentrum für Polar- und Meeresforschung under the INSPIRESII project FRISio. The article processing charges for this open-access publication were covered by the Alfred-Wegener-Institut Helmholtz-Zentrum für Polar- und Meeresforschung.

This paper was edited by Alexander Robinson and reviewed by two anonymous referees.