Firn densification modelling is key to understanding ice sheet mass balance, ice sheet surface elevation change, and the age difference between ice and the air in enclosed air bubbles. This has resulted in the development of many firn models, all relying to a certain degree on parameter calibration against observed data. We present a novel Bayesian calibration method for these parameters and apply it to three existing firn models. Using an extensive dataset of firn cores from Greenland and Antarctica, we reach optimal parameter estimates applicable to both ice sheets. We then use these to simulate firn density and evaluate against independent observations. Our simulations show a significant decrease (24 % and 56 %) in observation–model discrepancy for two models and a smaller increase (15 %) for the third. As opposed to current methods, the Bayesian framework allows for robust uncertainty analysis related to parameter values. Based on our results, we review some inherent model assumptions and demonstrate how firn model choice and uncertainties in parameter values cause spread in key model outputs.

On the Antarctic and Greenland ice sheets (AIS and GrIS), snow falling at the surface progressively compacts into ice, passing through an intermediary stage called firn. The process of firn densification depends on local conditions, primarily the temperature, the melt rate and the snow accumulation rate. Accurate modelling of densification is key to several applications in glaciology. Firstly, variability in firn densification affects altimetry measurements of ice sheet surface elevation changes. Consequently, uncertainties in modelled densification rates have a direct impact on mass balance estimates, which rely on a correct conversion from measured volume changes to mass changes (Li and Zwally, 2011; McMillan et al., 2016; Shepherd et al., 2019). Errors in the firn-related correction can lead to over- or underestimation of mass changes related to surface processes and to misinterpretation of elevation change signals as changes in mass balance and in ice flow dynamics. Secondly, firn models are used to estimate the partitioning of surface meltwater into runoff from the ice sheet and refreezing within the firn column, which strongly influences mass loss rates (van den Broeke et al., 2016). Model estimates of current and future surface mass balance of the AIS and GrIS thus depend on accurate models of firn evolution. Finally, the densification rate determines the firn age at which air bubbles are trapped in the ice matrix. Knowing this age is crucial for precisely linking samples of past atmospheric composition, which are preserved in these bubbles, to paleo-temperature indicators, which come from the water isotopes in the ice (Buizert et al., 2014).

Firn densification has been the subject of numerous modelling studies over the last decades (e.g. Herron and Langway, 1980; Goujon et al., 2003; Helsen et al., 2008; Arthern et al., 2010; Ligtenberg et al., 2011; Simonsen et al., 2013; Morris and Wingham, 2014; Kuipers Munneke et al., 2015). However, there is no consensus on the precise formulation that such models should use. Most models adopt a two-stage densification process with the first stage characterizing faster densification for firn with density less than a critical value and then slower densification in the second stage. The firn model intercomparison of Lundin et al. (2017) demonstrated that, even for idealized simulations, inter-model disagreements are large in both stages. Firn compaction is driven by the pressure exerted by the overlying firn layers. Dry firn densification depends on numerous microphysical mechanisms acting at the scale of individual grains, such as grain-boundary sliding, vapour transport, dislocation creep and lattice diffusion (Maeno and Ebinuma, 1983; Alley, 1987; Wilkinson, 1988). Deriving formulations closely describing the densification of firn at the macroscale as a function of these mechanisms is challenging. Consequently, most models rely on simplified governing formulations that are calibrated to agree with observations. The final model formulations have usually been tuned to data either from the AIS (Helsen et al., 2008; Arthern et al., 2010; Ligtenberg et al., 2011) or from the GrIS (Simonsen et al., 2013; Morris and Wingham, 2014; Kuipers Munneke et al., 2015), consisting of drilled firn cores from which depth–density profiles are measured. However, the calibration of firn densification rates to firn depth–density profiles requires the assumption of a firn layer in steady state. 
To overcome this limitation, some models have been calibrated against other types of data such as strain rate measurements (Arthern et al., 2010; Morris and Wingham, 2014) or annual layering detected by radar reflection (Simonsen et al., 2013), but such measurements remain scarce and do not extend to firn at great depths below the surface. Ultimately, firn model calibration is an inverse problem that relies on using observational data to infer parameter values.
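As an illustration of the two-stage structure described above, the following minimal Python sketch evaluates a Herron and Langway type densification rate. The rate constants, activation energies, accumulation exponents and the critical density of 550 kg m^-3 are the values commonly quoted for Herron and Langway (1980); they are not given in this text and should be checked against the original publication. The function name and the unit conventions (temperature in K, accumulation in m w.e. yr^-1) are assumptions made for this example.

```python
import math

R_GAS = 8.314      # gas constant, J mol^-1 K^-1
RHO_ICE = 917.0    # ice density, kg m^-3 (assumed value)
RHO_CRIT = 550.0   # critical density between stage 1 and stage 2, kg m^-3


def hl_densification_rate(rho, temperature, accumulation):
    """Densification rate (kg m^-3 yr^-1) for a two-stage HL-type model.

    rho          : firn density, kg m^-3
    temperature  : firn temperature, K
    accumulation : accumulation rate, m w.e. yr^-1
    Constants are the commonly quoted HL values (assumption, see lead-in).
    """
    if rho < RHO_CRIT:
        # Stage 1: faster densification below the critical density
        k = 11.0 * math.exp(-10160.0 / (R_GAS * temperature))
        exponent = 1.0
    else:
        # Stage 2: slower densification above the critical density
        k = 575.0 * math.exp(-21400.0 / (R_GAS * temperature))
        exponent = 0.5
    return k * accumulation ** exponent * (RHO_ICE - rho)
```

In a calibration setting, the pre-factors, activation energies and exponents hard-coded here are the kind of empirical quantities that would be treated as free parameters.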

In this study, we adopt a Bayesian approach to address firn model calibration. This provides a rigorous mathematical framework for estimating distributions of the model parameters (Aster et al., 2005; Berliner et al., 2008). Bayesian inversion has been applied in several glaciological studies, and it has been demonstrated that this methodology improves our ability to constrain poorly known factors such as basal topography (Gudmundsson, 2006; Raymond and Gudmundsson, 2009; Brinkerhoff et al., 2016a), basal friction coefficients (Gudmundsson, 2006; Berliner et al., 2008; Raymond and Gudmundsson, 2009), ice viscosity (Berliner et al., 2008) and the role of subglacial hydrology systems in ice dynamics (Brinkerhoff et al., 2016b). In the Bayesian framework, model parameters are considered as random variables for which we seek an a posteriori probability distribution that captures the probability density over the entire parameter space. This distribution allows us not only to identify the most likely parameter combination, but also to set credible limits on the range of statistically reasonable values for each parameter. This enables us to quantify uncertainty in model results, to challenge the assumptions inherent to the model itself and to assess correlation between different parameters. Calculations rely on Bayes' theorem (see Sect. 2.4 and Eq. 7), but because of the high-dimensional parameter space and the non-linearity of firn models, solutions cannot be computed in closed form. As such, we apply rigorously designed Monte Carlo methods to approximate the target probability distributions efficiently. By exploiting the complementarity between the Bayesian framework and Monte Carlo techniques, we recalibrate three benchmark firn models and improve our understanding of their associated uncertainty.

In order to calibrate three firn densification models, we use observations
of firn depth–density profiles from 91 firn cores (see Data Availability and
Supplement) located in different climatic conditions on both
the GrIS (27 cores) and the AIS (64 cores) (Fig. 1). Using cores from both
ice sheets is important since we seek parameter sets that are
generally applicable and not location-specific. We only consider dry
densification since meltwater refreezing is poorly represented in firn
models and wet-firn compaction is absent
(Verjans et al., 2019). As such, we select
cores from areas with low mean annual melt (

Maps of Antarctic

We use depth-integrated porosity (DIP) as the evaluation metric for the models because of the crucial role of this variable in both surface mass balance modelling and altimetry-based ice sheet mass balance assessments (Ligtenberg et al., 2014). It is also commonly used in firn model intercomparison exercises (Lundin et al., 2017; Stevens et al., 2020) and is a quantity of interest for field measurements (Vandecrux et al., 2019). Due to its formulation (Eqs. 1 and 2), DIP represents the mean depth–density profile and is thus robust to individual errors and outliers in density measurements.
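As a minimal sketch of how such a metric can be computed from a measured or modelled profile, the Python function below integrates porosity over depth with the trapezoidal rule. It assumes the common definition of porosity as (rho_ice - rho) / rho_ice and an ice density of 917 kg m^-3; Eqs. 1 and 2 are not reproduced in this text, so the exact formulation used in the study may differ. The function name is hypothetical.

```python
import numpy as np

RHO_ICE = 917.0  # ice density, kg m^-3 (assumed value)


def depth_integrated_porosity(depth, density, max_depth):
    """Depth-integrated porosity (m) from the first sample down to max_depth.

    depth     : increasing sample depths, m
    density   : firn density at each depth, kg m^-3
    max_depth : lower integration bound, m (e.g. 15 m or pore close-off depth)
    Only samples at or above max_depth are used; no interpolation is done.
    """
    depth = np.asarray(depth, dtype=float)
    density = np.asarray(density, dtype=float)
    mask = depth <= max_depth
    z = depth[mask]
    porosity = (RHO_ICE - density[mask]) / RHO_ICE
    # Trapezoidal rule written out explicitly
    return float(np.sum(0.5 * (porosity[1:] + porosity[:-1]) * np.diff(z)))
```

Because the integral averages over many density samples, a single erroneous measurement changes the result only in proportion to the thickness of the layer it represents.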

Observed firn density is subject to measurement uncertainty, which previous studies estimate at about 10 %, although it varies with depth and with the measurement technique employed (Hawley et al., 2008; Conger and McClung, 2009; Proksch et al., 2016). We outline our procedure to account for measurement uncertainty in Sect. 2.4.

We separate the dataset into calibration data (69 cores) and independent
evaluation data (22 cores). The latter are selected semi-randomly; we ensure
that they include a representative ratio of GrIS–AIS cores and that they cover
all climatic conditions, including an outlier of the dataset with high
accumulation and temperature (see Supplement). The resulting
evaluation data have 8 GrIS and 14 AIS cores; 11 of the 22 cores extend to

At the location of each core, we simulate firn densification under climatic
forcing provided by the RACMO2.3p2 regional climate model (RACMO2 hereafter)
at 5.5 km horizontal resolution for the GrIS
(Noël et al., 2019) and 27 km for the AIS
(van Wessem et al., 2018).
Each firn model simulation consists of a spin-up, in which a reference
climate is repeated until the firn column reaches equilibrium, followed by a
transient period until the core-specific date of drilling. The reference
climate is taken as the first 20-year period of RACMO2 forcing data
(1960–1979 and 1979–1998 for the GrIS and AIS respectively). The number of
iterations over the reference period depends on the site-specific
accumulation rate and mass of the firn column (mass from surface down to

Results of the calibration would depend on the particular climate model used for forcing. We thus propagate uncertainty in modelled climatic conditions into our calibration of firn model parameters by perturbing the temperature and accumulation rates of RACMO2 with normally distributed random noise. Standard deviations of the random perturbations are based on reported errors of RACMO2 (Noël et al., 2019; van Wessem et al., 2018 – see more details in the Supplement). By introducing these perturbations, uncertainty intervals on our parameter values encompass the range of values that would result from using other model-based or observational climatic input.
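The perturbation step can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the study: the text specifies only that the noise is normally distributed, so the choice of one additive offset for temperature and one relative offset for accumulation per simulation, the clipping of accumulation at zero, and the function and argument names are all hypothetical. The actual standard deviations are given in the Supplement and are not reproduced here.

```python
import numpy as np


def perturb_forcing(temperature, accumulation, sd_temperature,
                    sd_accumulation_rel, rng):
    """Return perturbed copies of temperature and accumulation time series.

    temperature         : array of temperatures, K
    accumulation        : array of accumulation rates (any consistent unit)
    sd_temperature      : standard deviation of the additive offset, K
    sd_accumulation_rel : standard deviation of the relative offset (fraction)
    rng                 : numpy.random.Generator
    Assumption: one random offset per simulation, applied to the whole series.
    """
    temperature = np.asarray(temperature, dtype=float)
    accumulation = np.asarray(accumulation, dtype=float)
    t_offset = rng.normal(0.0, sd_temperature)
    a_factor = 1.0 + rng.normal(0.0, sd_accumulation_rel)
    perturbed_t = temperature + t_offset
    # Accumulation cannot be negative, so clip after scaling
    perturbed_a = np.maximum(accumulation * a_factor, 0.0)
    return perturbed_t, perturbed_a
```

Drawing a new perturbation at each iteration of the calibration is one way to let the spread of the resulting parameter distribution reflect the forcing uncertainty.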

In addition to the climatic forcing, another surface boundary condition is
the fresh snow density,

We use the Community Firn Model (Stevens et al., 2020) as the
framework of our study because it incorporates the formulations of all three
densification models investigated: HL (Herron and Langway, 1980),
Ar (Arthern et al., 2010) and LZ
(Li and Zwally, 2011). The Robin hypothesis
(Robin, 1958) constitutes the fundamental assumption of HL, Ar
and LZ. It states that any fractional decrease in the firn porosity,

HL

Information for the free parameters of HL

In our approach, the free parameters of the firn models are identified as
the quantities of interest and we define this parameter set as

Implementation of the random walk Metropolis algorithm.
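For readers unfamiliar with the algorithm, a generic random walk Metropolis sampler can be sketched in a few lines of Python. This is a textbook version with a symmetric Gaussian proposal, not the implementation used in this study; the toy standard normal target, the function names and the proposal scale are assumptions chosen for illustration. In a firn calibration, the log-posterior would instead combine the prior with a likelihood that requires running the firn model.

```python
import numpy as np


def log_standard_normal(theta):
    """Toy log-posterior (unnormalized standard normal), for illustration."""
    return -0.5 * float(np.sum(theta ** 2))


def random_walk_metropolis(log_posterior, theta0, proposal_sd, n_iter, rng):
    """Sample from a distribution known up to a constant via its log-density.

    Returns the chain (n_iter x n_parameters) and the acceptance rate.
    """
    theta = np.asarray(theta0, dtype=float)
    logp = log_posterior(theta)
    chain = np.empty((n_iter, theta.size))
    n_accept = 0
    for i in range(n_iter):
        # Symmetric Gaussian step around the current state
        proposal = theta + rng.normal(0.0, proposal_sd, size=theta.size)
        logp_prop = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
            n_accept += 1
        chain[i] = theta
    return chain, n_accept / n_iter
```

A typical call would be `random_walk_metropolis(log_standard_normal, [3.0, -3.0], 1.0, 20000, np.random.default_rng(42))`, after which early iterations are usually discarded as burn-in.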

There is no analytical form of

From the posterior probability distributions, we can infer the maximum a
posteriori (MAP) estimates of each model (MAP

Because our posterior distributions have no analytical form, and to facilitate future firn model uncertainty assessments, we approximate them with multivariate normal (MVN) distributions whose means and covariances are set to the posterior means and posterior covariance matrices of the calibration. This allows straightforward sampling of random parameter sets without relying on the posterior samples of the Markov chain Monte Carlo (MCMC) calibration. We provide information about the normal approximations and assess their validity in the Supplement. Such normal approximations are asymptotically exact and are commonly applied to analytically intractable Bayesian posterior distributions (Gelman et al., 2013).
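The normal approximation amounts to two steps, sketched below in Python with NumPy: estimating the mean vector and covariance matrix from the posterior samples, then drawing new parameter sets from the fitted distribution. The function names are hypothetical, and the sketch assumes the posterior samples are stored as an array with one row per sample and one column per parameter.

```python
import numpy as np


def fit_mvn(posterior_samples):
    """Posterior mean vector and covariance matrix from an (n, d) sample array."""
    samples = np.asarray(posterior_samples, dtype=float)
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)  # columns are parameters
    return mean, cov


def sample_mvn(mean, cov, n_sets, rng):
    """Draw n_sets random parameter sets from the fitted normal approximation."""
    return rng.multivariate_normal(mean, cov, size=n_sets)
```

Using the full covariance matrix rather than independent marginal variances preserves the correlations between parameters found by the calibration.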

We present the results of the calibration process after 15 000 algorithm iterations and compare the MAP and original models' performances against the 22 evaluation cores. We also evaluate the uncertainty of the posterior distributions and compare performances between the different MAP models. All the evaluation simulations are performed without climatic and surface density noise in order to make the evaluation fully deterministic.

Posterior probability distributions, shown for pairs of
parameters, for

For HL and even more so for Ar, the posterior distributions for the
parameters demonstrate some strong disagreements with the original values
(Fig. 3a, b). The 95 % credible intervals for each parameter (Table 1)
incorporate 95 % of the marginal probability density in the posterior. Two
original parameter values of HL (

Comparison of evaluation data DIP with model results. The 95 % credible intervals are computed from results of 500 randomly selected parameter combinations from the posterior ensembles of each model (HL, Ar, LZ). Similar scatter plots for the LZ dual and IMAU results are shown in the Supplement (Fig. S6).
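One common way to obtain a 95 % credible interval from an ensemble of samples is the equal-tailed percentile interval, sketched below. This construction is an assumption made for illustration and is not necessarily the one used for the intervals reported in this study; the function name is hypothetical. The input can hold marginal posterior samples of a parameter or model outputs computed from an ensemble of parameter sets, with one row per ensemble member.

```python
import numpy as np


def credible_interval(ensemble, level=0.95):
    """Equal-tailed credible interval along the first axis of an ensemble.

    ensemble : array with one row per ensemble member
    level    : probability mass contained in the interval
    Returns (lower, upper), each with the shape of one ensemble member.
    """
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(np.asarray(ensemble, dtype=float),
                                 [tail, 100.0 - tail], axis=0)
    return lower, upper
```

For strongly skewed distributions, a highest-density interval can differ noticeably from the equal-tailed interval shown here.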

We use the original models and the MAP estimates to simulate firn profiles
at the evaluation sites and we compare DIP results with the observed
values. This is an effective way to assess possible improvements in
parameter estimates reached through our method since the evaluation sites
were not used in the calibration process. The match between observations and
the model is improved for MAP

Model results on the evaluation data. The root-mean-squared errors (RMSEs) are calculated with respect to the observations of depth-integrated porosity until 15 m depth and until pore close-off.

Depth–density profiles at three evaluation sites. DML is a climatic outlier of our dataset with particularly high temperatures and accumulation rates. The 95 % credible intervals are computed from results of 500 randomly selected parameter combinations from the posterior ensembles of each model (HL, Ar, LZ).

For LZ, the relative performance of the MAP

Improvements of the MAP models with respect to the
original models for the evaluation data. The ratios indicate the proportion of
cores for which an improvement is achieved by the corresponding MAP. Panels

Compared to the original HL, MAP

As explained in Sect. 2.3, the original LZ model was developed for GrIS firn
only (Li and Zwally, 2011) and later complemented by an
AIS-specific model (Li
and Zwally, 2015). We compute results at the AIS and GrIS evaluation sites
using the Li and Zwally (2015) model for the AIS and the Li and Zwally (2011) model for the GrIS, so that both models are applied to the ice sheet
for which they were originally developed. We call this pairing of models LZ
dual and evaluate its general performance. The RMSE for DIP15 of LZ dual
is slightly larger (

We also compare MAP results with the IMAU firn densification model
(IMAU-FDM), which has been used frequently in recent mass balance
assessments from altimetry (Pritchard
et al., 2012; Babonis et al., 2016; McMillan et al., 2016; Shepherd et al.,
2019). IMAU-FDM was developed by adding two tuning parameters to both
densification stages of Ar. All four extra parameters are different for the
AIS (Ligtenberg et al., 2011) and for the
GrIS (Kuipers Munneke
et al., 2015), thus also resulting in two separate models. For the evaluation
data, the performance of IMAU-FDM for DIP15 is slightly better than MAP

To assess the uncertainty captured by the Bayesian posterior distributions,
we compute results on the evaluation data with the 500 parameter sets
randomly selected from each of the three posterior ensembles. For all three
models, the average performance of their random sample is similar to the
corresponding MAP performance, with a maximum RMSE change of 6 % (Table 2). This demonstrates a low uncertainty in the optimal parameter
combinations identified by calibration. Furthermore, the best-performing
95th percentile of the random selection allows the construction of the
uncertainty intervals shown in Figs. 4 and 5. Of the original models, LZ reaches
the lowest RMSE values. Of all models, MAP

This calibration method is potentially applicable to models of similar
complexity in a broad range of research fields. We exploit it here to
investigate the parameter space of HL, Ar and LZ and to re-estimate optimal
parameter values conditioned on observed calibration data; no further
complexity is introduced since the number of empirical parameters remains
the same. We treat the accumulation exponents of Ar (

In the IMAU model introduced in Sect. 3, tuning parameters have been added
to Ar in order to reduce its sensitivity to accumulation rates (Ligtenberg
et al., 2011; Kuipers Munneke et al., 2015). The calibration method
presented in this study detects and adjusts for this over-sensitivity in Ar
without the need for more tuning parameters in the governing densification
equations. The sensitivity of stage-1 densification to

HL, Ar and LZ only use temperature and accumulation rates as input
variables. Other models use additional variables hypothesized to affect
densification rates. These include the temperature history mentioned above
(Morris and Wingham, 2014), firn grain size
(Arthern et al., 2010), impurity content
(Freitag et al., 2013), and a transition region
between stage-1 and stage-2 densification (Morris,
2018). Other models are explicitly based on micro-scale deformation
mechanisms (Alley, 1987; Arthern
and Wingham, 1998; Arnaud et al., 2000). These efforts undoubtedly
contribute to progressing towards physically based models. A potential
problem with such approaches is overfitting calibration data by adding
parameters to model formulations while detailed firn data remain scarce. Until
more firn data become available to appropriately constrain the role
of each variable in model formulations, we favour the use of parsimonious
models relying on few input variables. It is noteworthy that MAP

To quantify the consequences of our calibration, we investigate two
applications for which firn models are commonly used: calculating firn compaction
rates and predicting the age of firn at

Coefficients of variation for the 2000–2017 cumulative
compaction anomaly (

Monthly time series of compaction anomalies at two sites on the GrIS. Insets show details for particular intervals of the time series. Mean climatic anomalies are calculated as a difference between mean climatic values over the period 2000–2017 with respect to the reference period 1960–1979, and based on RACMO2 values.

We further investigate how using different models and different
parameterizations leads to discrepancies in the modelled compaction. We
compute monthly values of compaction anomalies over the 2000–2017 period
with the original and MAP models of HL, Ar and LZ (Fig. 7). Ar shows the
strongest sensitivity to climatic conditions diverging from those of the
reference period; compaction responds strongly to the general increases in
GrIS temperature and accumulation rate, especially in late summer. Due to
its lower values for

We have implemented a Bayesian calibration method to estimate optimal
parameter combinations applicable to GrIS and AIS firn for three benchmark
firn densification models (HL, Ar, LZ). An extensive dataset of 91 firn
cores was separated into calibration and independent evaluation data. Two
optimized models (MAP

In total, 41 of the 91 firn cores are from the SUMup dataset (2019 release), which is
publicly available from the Arctic Data Center (

The supplement related to this article is available online at:

VV, AAL and CN conceived this study. VV performed the development of the calibration method, performed the model experiments and led writing of the manuscript. AAL and CN supervised the work. CMS developed the Community Firn Model. PKM provided firn core data. BN and JMvW provided the RACMO2 forcing data. All authors provided comments and suggested edits to the manuscript.

The authors declare that they have no conflict of interest.

We thank Lora Koenig and Lynn Montgomery for making the SUMup dataset of firn cores available and easily accessible (Koenig and Montgomery, 2019). Matt Spencer is also acknowledged for publishing a separate dataset of firn cores (Spencer et al., 2001). We thank Joe McConnell and Ellen Mosley-Thompson, supported by the NSF–NASA PARCA project, for providing additional firn core data (Bales et al., 2001; Banta and McConnell, 2007; McConnell et al., 2000; McConnell, 2002; Mosley-Thompson et al., 2001). We thank Malcolm McMillan for his interest in the study and for providing insight into the subject of ice sheet mass balance assessments. Vincent Verjans thanks Elizabeth Morris for pointing out errors in geographical coordinates of some of the firn cores and for her endless interest in firn densification. We thank all contributors to the development of the Community Firn Model (CFM) who are not authors of this study. All authors thank the two anonymous referees for their time and effort in reviewing the manuscript.

This research has been supported by the Centre for Polar Observation and Modelling, EPSRC (A Data Science for the Natural Environment, grant no. EP/R01860X/1), NESSC (Netherlands Earth System Science Centre), and NWO (Netherlands Organisation for Scientific Research, grant no. VI.Veni.192.019).

This paper was edited by Pippa Whitehouse and reviewed by two anonymous referees.