Forty-year Simulations of Firn Processes over the Greenland and Antarctic Ice Sheets

Conversion of altimetry-derived ice-sheet volume change to mass requires an understanding of the evolution of the combined ice and air content within the firn column. In the absence of suitable techniques to observe the changes to the firn column across the entirety of an ice sheet, the firn column processes are typically modelled. Here, we present new 40-year 10 simulations of firn processes over the Greenland and Antarctic Ice Sheets using the Community Firn Model and atmospheric reanalysis variables. A dataset of more than 250 measured depth-density profiles from both ice sheets provides the basis of the calibration of the dry-snow densification scheme. The resulting scheme results in a reduction in the rate of densification, relative to a commonly used semi-empirical model, through a decreased dependence on the accumulation rate, a proxy for overburden stress. The modelled firn column runoff, when combined with atmospheric variables from MERRA-2, generates 15 realistic mean integrated surface mass balance values for the Greenland (+361 Gt yr) and Antarctic (+2623 Gt yr) ice sheets when compared to published model-ensemble means. We find that seasonal volume changes associated with firn air content are approximately 3 times larger than those associated with surface mass balance; however, when averaged over multiple years, ice and air-volume fluctuations within the firn column are of comparable magnitudes. Between 1996 and 2019, the Greenland Ice Sheet lost more than 5% of its firn air content indicating a reduction in the total meltwater retention capability. 20 Nearly all (>98%) of the meltwater produced over the Antarctic Ice Sheet is retained within the firn column through infiltration


Introduction
One of the most robust methods for measuring ice-sheet mass balance uses satellite altimetry (Shepherd et al., 2012(Shepherd et al., , 2018 to measure changes in surface height through time and ultimately provide ice-sheet-wide volume change estimates (Helm et al., 25 2014;Paolo et al., 2015;Pritchard et al., 2009;Zwally et al., 2005Zwally et al., , 2015. Interpretation of volume changes, however, requires ancillary information because there are several processes that generate height changes observable by satellite altimeters (Ligtenberg et al., 2011;Smith et al., 2020). The measured surface height change is a combination of signals, which reflect processes that involve ice or solid earth mass change, or even no mass change at all. Even if we remove the solid-earth processes, partitioning the remaining ice-sheet-volume change to the appropriate material densities remains a challenge.
Height changes that manifest from solid-ice processes ( ℎ ⁄ ) result from ice dynamical change over grounded ice, but over floating ice, there is an additional component due to sub-ice-shelf melt. These processes are difficult to observe or quantify; 65 thus, we can approximate the solid-ice changes by further reworking Eq. (2) to remove the firn-column height change signal ( ℎ ⁄ ) from the total ice-sheet column change ( ℎ ⁄ ): which provides the groundwork for determining ice-sheet mass balance. If the role of firn processes in ice-sheet height change is adequately modeled, we can isolate the contribution due to ice dynamical changes, which are easily converted to mass 70 because the material is assumed to be solid ice (917 kg m -3 ).
Firn column changes, however, have a complicated relationship with mass change. Variable rates of the height change due to compaction of the firn column ( ℎ ⁄ ) do not reflect a change in mass but impact the observed ice-sheet height variations through changes in volume and density. Meltwater production ( ℎ ⁄ ) is more ambiguous: when it is able to infiltrate the firn and refreeze, there is no resulting mass change, but when infiltration is impeded and meltwater runs off, there is mass 75 change. The effect of net snow accumulation ( ℎ ⁄ ) always reflects a change in mass and can be positive or negative. As a result, the conversion between height, volume, and ultimately mass change requires understanding the material density of each component, which is neither constant in time nor space.
Rather than partition firn column changes by its individual components (see above), we divide total firn-column height change into changes in the air thickness and the thickness of ice: surface mass balance (SMB) and firn air content (FAC), respectively. 80 Specifically, we define ℎ ⁄ as: where ℎ ⁄ and ℎ ⁄ represent height change fluctuations due to SMB and FAC. These components are defined below.

Surface Mass Balance 85
The SMB is the summation of mass fluxes at the surface including precipitation (solid and liquid), evaporation/sublimation, and runoff (Lenaerts et al., 2019). Here, we do not account for blowing snow processes that likely impact local-scale SMB; however, these processes comprise an overall small percentage of total SMB (Van Wessem et al., 2018). Specifically, where Sn is snowfall, Ra is rainfall, Ev is evaporation/sublimation, and Ru is runoff. All are in units of m ice-equivalent (i.e.) 90 per year. https://doi.org/10.5194/tc-2020-266 Preprint. Discussion started: 14 October 2020 c Author(s) 2020. CC BY 4.0 License.

Firn Air Content
The FAC or depth-integrated porosity represents the integrated volume of air within the entire firn column and is defined as: where is the density of ice, is the depth in meters at which the density of ice is reached, and ( ) is the density at a given 95 depth. The FAC is in units of meters of air.

Materials and Methods
We simulated firn column processes over both the Greenland and Antarctic Ice Sheets using the Community Firn Model (CFM) framework (Stevens et al., 2020), forced by reanalysis climate variables. These simulations are referred to as GSFC-FDMv1.
First, we provide specifics relating to the CFM as well as our methodology for calibration, spin-up, and implementation. We 100 then describe our selected climate forcing from NASA's Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) used in our simulations. Finally, we discuss the differences between GSFC-FDMv1 and v0, the latter of which was used in Smith et al. (2020) and Adusumilli et al. (2020).

The Community Firn Model 105
The Community Firn Model was built as a resource to the glaciology community, consisting of a modular, open-source framework for Lagrangian modeling of several firn and firn-air related processes (Stevens et al., 2020). The CFM allows the user to select the processes and/or physics of each simulation. The core CFM modules contain physics for firn density and temperature evolution; however, there are several modules for additional processes that the user can implement. For the GSFC-FDMv1 simulations, we use modules for grain-size evolution, meltwater percolation and refreezing, and sublimation. The 110 user also has several options of firn densification physics from which to choose. Several of the models are calibrated using climate forcing from an RCM, atmospheric reanalysis, or even satellite-derived products, which means that any biases in these climate variables will bias the calibration coefficients in the firn densification model. Thus, it is necessary to have consistent climate forcing between the calibration and actual model runs, so we perform our own densification model calibration (Sect. 2.1.3). Finally, we use a simple bucket scheme for simulating meltwater percolation and refreezing; while the CFM contains 115 a choice of physics of varying complexity, recent work by Verjans et al. (2019) suggests there is currently no evidence that the higher-order models perform better. Here, we use CFM v1.0.5, which is described in detail in Stevens et al. (2020).

Model Spin-up
To ensure that we do not impose any unwanted transients in our simulations, we must have a sufficiently long spin-up interval during which the majority of the firn column is refreshed. Due to variable snow accumulation rates across the ice sheets, the 120 https://doi.org/10.5194/tc-2020-266 Preprint. Discussion started: 14 October 2020 c Author(s) 2020. CC BY 4.0 License. time required to fully refresh the firn column can vary significantly. Thus, we impose a variable spin-up time that is dependent on the long-term mean climate. Specifically, we use the Herron & Langway (1980) densification model to approximate the depth to the bottom of the firn column (delineated at a density of 910 kg m -3 ) using the long-term reference snow accumulation, temperature, and surface density. This depth is divided by a burial rate (snowfallsublimationmelt) to estimate the time needed to refresh the firn column for a given site. Spin-up intervals typically span 200 to 6,000 years in the Antarctic and 250 125 to 3,000 years for Greenland. In regions with no net accumulation (snowfall < sublimation + melt), no spin-up is implemented.
Rather, the simulations begin with a solid-ice column allowing the model to simulate seasonal snowfall, snowmelt, and runoff.
The CFM has the option to impose a dry-snow spin-up; however, this solution would build a firn column that is in dynamic equilibrium under dry conditions only. If melt were then imposed, meltwater processes would create large negative ℎ ⁄ and ⁄ that are not realistic. Instead, we only apply a 30-year spin-up to build a dry firn column. We next repeat a 130 baseline reference climate interval (RCI) time series the number of times required to match the estimated spin-up time described above. For example, if we need an 800-year spin-up and the RCI is 40 years, the latter is repeated twenty times. If the spin-up time required is not divisible by the RCI interval, we round up to the next integer to exceed the required spin-up time. The CFM is then run using this synthetic time series to generate a firn column that is in dynamic equilibrium with the climate under both dry and wet conditions. For Greenland, we use a baseline RCI of January 1, 1980-December 31, 1995 which we assume is representative of a longer-term mean climate state. The GrIS underwent a significant increase in temperatures and meltwater production after 1995 (see Sect. 3.2.1). For Antarctica, the baseline RCI is the entire MERRA-2 interval (January 1, 1980 -December 31, 2019) because there were no appreciable shifts in climate during that time. We discuss the selection of RCI for both ice sheets in the Sect. 3.2.

Densification Model and Calibration 140
We use a subset of 256 published firn depth-density profiles from both the Greenland and Antarctic Ice Sheets as the basis of our calibration procedure and perform a single calibration that is representative of both ice sheets. The Arthern et al. (2010) dry-snow densification model provides the physical basis for our GSFC-FDMv1 simulations. Specifically, modelled dry-snow densification rates are separated into two stages: where the rate coefficients for the two stages ( 0 and 1 ) are defined as a function of the total mass above a given firn parcel (̇: defined as the mean accumulation rate in ice equivalence, m.i.e. yr -1 , experienced since that parcel was deposited), the temperature of the parcel in Kelvin, , and the mean annual temperature, ̅ : 150 https://doi.org/10.5194/tc-2020-266 Preprint. Discussion started: 14 October 2020 c Author(s) 2020. CC BY 4.0 License.
where is the gravitational acceleration constant (9.8 m s -2 ), the activation energy for lattice diffusion commonly used for ice is 0 = 1 = 60 kJ mol -1 (Cuffey and Paterson, 2010), is the activation energy for grain growth (42.4 kJ mol -1 ), is the gas constant (8.314 J K -1 mol -1 ), and the exponential dependence of overburden is 0 = 1 = 1 (Arthern et al., 2010). Thus, the dry densification rate experienced by a given firn parcel varies in time and based on , ̇, and within a single stage of 155 densification. A semi-empirical version of the same model (Arthern et al., 2010) used the simplified assumption that grain growth is a function of mean annual temperature, ̅ (Gow et al., 2004).
To begin the calibration procedure, we first run the model in its original form at 226 calibration sites across Greenland and Antarctica ( Figure 1). The number of sites is less than the actual number of observations (256) as some fall within the same grid cell (e.g., several observations form the vicinity of Summit, Greenland). Unlike other calibration efforts (Kuipers 160 Munneke et al., 2015;Zwally, 2004, 2011;Ligtenberg et al., 2011) the calibration procedure presented here treats dryfirn densification from both ice sheets together, forming a single calibration parameterization, which benefits from a much wider range of climate conditions than if each ice sheet was treated individually.
The logarithm of the firn density profile with depth is approximately linear (Herron and Langway, 1980). For each calibration site, we compare the slopes of the logarithmic density versus depth for the two stages of densification between observations 165 ( 0 , 1 ) and the equivalent model output ( 0 , 1 ) using the original Arthern et al. (2010) model configuration (Eqns. 7-10) forced by the RCI. Both the firn density measurements and model output are binned into half-meter depth increments to obtain similar sampling intervals before slopes are estimated. Because density measurements are noisy, we determine the slopes in an iterative fashion, removing individual density measurements with residuals to the linear model larger than 3-sigma, recalculating the linear model, and repeating until all residuals are less than 3-sigma (i.e., an iterative 3-sigma edit). Calibration 170 sites were not used in a given stage if they either (1) did not contain more than 7 data points for that stage, (2) did not span more than 5 meters in depth, (3) the final linear model produced a slope that was not significant ( > 0.01), or (4) encountered significant melt (mean annual surface melt exceeds 1% of the mean annual snow accumulation). The latter ensures that we are only calibrating to dry-snow densification. Our final calibration dataset contains 141 depth-density profiles spanning stage 1 and 77 spanning stage 2. Note, there are fewer profiles for stage 2 because not every density profile extends to stage-2 densities. 175 There are a limited number of sites used in the calibration from Greenland because most of them cannot fully reflect dry snow conditions (i.e., they do not meet the aforementioned melt criterion) (Figure 1). The ratios ( 0 , 1 ) of the observed slopes ( 0 , 1 ) to the modeled slopes ( 0 , 1 ) provide the necessary correction (or calibration coefficient) for each site as described below.
Rather than develop a new physical form for calibration, we optimize two parameters within the Arthern et al. (2010) model: 180 the exponential dependence on the mean annual accumulation rate since the parcel was deposited and the activation energy for creep. Arthern et al. (2010) found evidence that the activation energy is not well constrained for the sites investigated, suggesting that the physical processes at play under various conditions are not fully understood. Similarly, Ligtenberg et al. https://doi.org/10.5194/tc-2020-266 Preprint. Discussion started: 14 October 2020 c Author(s) 2020. CC BY 4.0 License.
(2011) and Kuipers Munneke et al. (2015), found that the Arthern et al. (2010) model required additional dependence on snow accumulation to best fit observations. Thus, we elect to calibrate the parameters relating to variations in snow accumulation 185 ( 0 , 1 ) and temperature ( 0 , 1 ) for each stage of densification. This choice of calibration parameters is also important because the climate forcing contains unknown biases, which can be partially overcome through calibration. Thus, the calibrated model presented here (along with all others) is only relevant when used with the same climate forcing (see Sect. 2.2).
We define our calibration coefficients for the two stages of densification ( 0 , 1 ) as a function of the mean accumulation rate ( ̅ ) and temperature ( ̅ ): 190 In order to solve for β and E using a least-squares fit regression model (intercept = 0) with the climate forcing, ̅ and ̅ , as predictor variables and our calibration coefficient, R, as the response variable, we must first linearize Eq. (11) and (12): 195 To generate ̅ , we calculate the mean accumulation rate for each parcel since its deposition and take the average for each stage of densification. We also define different mean temperatures for each stage of densification. For stage 2, the firn column is effectively isothermal, so we substitute in Eq. (14) the mean temperature of all firn parcels with a density greater than 550 kg m -3 , 1 ̅ . Parcels undergoing stage 1 densification incur much larger fluctuations in temperature, especially near the surface. 200 Thus, to better capture the sensitivity of densification to temperature, we use an effective mean of the temperature forcing over the entire 40-year time period, ̅̅̅ , in Eq. (13) to calibrate stage 1 densification. Because of the non-linear relationship between densification and temperature, the mean of the temperature forcing in Eq. (13) would not represent the densification rate well.
Instead, we define ̅̅̅ by first determining a temperature-dependent densification rate, k, from the temperature forcing, T: where ̅ is the mean rate constant generated from the temperature forcing. Hourly skin temperatures from MERRA-2 (see Sect. 2.2) are compiled into temporally coarsened climate forcings using the same technique described in Eq. (15) and (16).
Because TE is dependent on the activation energy, we iteratively solve for E0 and E1; however, only a single iteration was sufficient for both stages. We find the optimal parameters for Eq. (11) and (12) producing newly calibrated parameters for use with equations 9-10: 215 The dry compaction model used in the GSFC-FDMv1 simulations presented here is summarized by equations 7-10 and 20.
Model performance is discussed in Sect. 2.4.

Spatial Domain
For Greenland, we define the ice boundary using the Greenland Mapping Project (GIMP) ice mask posted at 90-meter spatial 220 resolution (Howat et al., 2014). We identified approximately 13,200 of the 12.5 km GSFC-FDMv1 pixels as ice if any of the GIMP pixels within were flagged as ice. For SMB determination, we scale each pixel by the area of ice within based on the GIMP ice mask; the total ice sheet area along with the peripheral ice is 1.78 × 10 6 km 2 . We note that this count does not correspond with the actual number of simulations within the Greenland GSFC-FDMv1 that have perennial firn layers. The grid cells with positive net accumulation (i.e., snowfallsublimationmeltwater > 0), a condition required to build a firn 225 column, amount to just over 9,000. About 4,100 grid cells did not meet the requirements to sustain a firn column, and their seasonal snowfall, melt, and runoff are simulated as described in Sect. 2.1.2.
For Antarctica, we use the drainage basins at 1-kilometer resolution defined by Zwally et al. (2012). We identified any of its 12.5 km pixels that contain an ice-flagged pixel from Zwally et al. (2012) as ice, resulting in 88,300 ice-covered pixels. We assume all the pixels are 100% ice-covered, which is equivalent in area to 13.8 × 10 6 km 2 . Most meet the positive net 230 accumulation condition to sustain a firn column (87,800). To improve efficiency, we do not simulate firn column processes for each grid cell. Rather, we investigate the similarities in atmospheric forcing between neighboring pixels to eliminate redundant simulations. We do not simulate neighboring cells individually, but rather interpolate between these neighbors if: (1) mean annual temperature is consistent within 0.75 K, (2) the root mean square difference (RMSD) in snowfall-minussublimation is less than 10% of the mean annual snowfall-minus-sublimation, (3) the RMSD in skin temperature is less than 235 0.25 K, and (4) the RMSD in meltwater production is less than 5% of the mean annual meltwater production. These selection criteria reduce the number of simulations to 38,000. With these criteria, the fine spatial resolution is preserved in coastal regions where climate gradients are strong and is coarsened in the interior where correlation length scales are quite large ( Figure 2a).

Temporal Resolution 240
All the MERRA-2 variables used are provided at hourly resolution (Sect. 2.2), yet to maintain computational efficiency, we opt to coarsen the temporal resolution to 5, 10, or 20 days, depending on the climate. Because of its relatively small size and https://doi.org/10.5194/tc-2020-266 Preprint. Discussion started: 14 October 2020 c Author(s) 2020. CC BY 4.0 License.
high accumulation rates, we run the entire Greenland Ice Sheet at resolution of 5 days. In the dry interior of Antarctica, snowfall events are infrequent; thus, we coarsen the temporal resolution based on comparison of ⁄ at several calibration sites (see Sect. 2.1.3) when simulated at five, ten, and twenty days ( Figure 2b). Specifically, we found the residuals 245 in ⁄ between the five-day simulations and the ten-and twenty-day simulations and then built a regression model for these residuals with mean snow accumulation and skin temperature as predictors. Maps were next generated for the expected residuals over the entire Antarctic Ice Sheet. Because the GSFC-FDM model was built in support of NASA's next generation Ice, Cloud and land Elevation Satellite (ICESat-2), we used the mission's science requirement for height change precision (4 mm yr -1 ) as the threshold for assessing the expected residuals (Markus et al., 2017). Grid cells were assigned their temporal 250 resolution as the coarsest time sampling where the expected residuals met the 4 mm yr -1 criterion. If it was exceeded in both the ten-and twenty-day residuals, that cell was run at five days ( Figure 2b). Expectedly, changes in FAC are quite small over the interior and are sufficiently captured at a resolution of 20 days whereas the opposite is true in coastal regions.

Initial (Surface) Density
Because of the low accumulation rates over the ice sheets and the coarse (5-to 20-days) time resolution of our simulations, we 255 anticipate significant reworking of the initial, low-density surface snowpack. Ideally, the imposed initial density would vary in time based on the ambient climate conditions; however, there are few studies that focus on the temporal evolution of freshly fallen snow over the ice sheets (e.g., Groot Zwaaftink et al., 2013). Thus, we focus rather on improving the bulk (or timeinvariant) initial density for each grid cell based on the mean annual climate conditions as done by Helsen et al. (2008) and Kuipers Munneke et al. (2015). This approach means that on average we will approximate the surface density well, but we 260 accept that there might be significant deviation from this bulk density over shorter timescales. To build a model of initial density ( 0 ), we estimate initial densities from the 151 depth-density profiles (stage 1) by finding the surface-intercept of a linear fit to the logarithm of density versus depth. These represent the best-fit of the initial density to the observed density profile. We then solved a multivariate linear regression model to relate the observed initial densities to mean annual MERRA-2 surface climate (snow accumulation, air temperature, eastward and northward winds, total wind speed, maximum wind speed, 265 and specific humidity). We next removed any points that have residuals outside of the 99 th percentile of the model in an iterative fashion starting with the largest residuals. This process removed 10 points as outliers. For the final model, we kept only the predictors that were significantly related ( < 0.05) to the initial density. The final initial density model is: 0 = −369.6 + 1.985 ̅ 0 + 3.009 ̅ 0 − 1.302 * 10 5 ̅ 0 + 27.57 ̅ + 3.192 ̅ 0 where ̅ 0 and ̅ 0 are the surface mean northward and the maximum wind speeds (m s -1 ), ̅ 0 is the surface mean specific 270 humidity (kg kg -1 ), ̅ is the mean accumulation rate (m.i.e. yr -1 ), and ̅ 0 is the surface mean temperature (K). The model results are shown in Figure 3: while we capture more than 50% of the variability, evaluating the mismatch with discarded values suggests that we are not accurately modelling the lowest densities. The 0 used in GSFC-FDMv1 are displayed in Figure 1. variables. In such a manner, the magnitude of the gradients in precipitation and temperature from the high resolution M2R12K 290 are transferred to the coarse MERRA-2 output. Figure 5 and Figure 6 show the mean annual net accumulation (snowfallminus-sublimation) and skin temperature, respectively, for the Greenland and Antarctic ice sheets. For simplicity, we hereinafter refer to the hybridized MERRA-2 as MERRA-2.
We generated rainfall grids by differencing total precipitation and snowfall. We first define our ice sheet domains in Sect. 2.4.
While the variables are provided at hourly resolution, to maximize computational efficiency, we perform the firn simulations 295 at a resolution of either five, ten, or twenty days, which is described in Sect. 2.1.5. Although MERRA-2 includes meltwater processes, only net runoff is retained. Thus, we use a degree-day approach to build gridded meltwater time series, which is described in Sect. 2.2.1.
Melt was activated when the 2-meter air temperature ( 2 ) exceeded a calibrated temperature threshold ( 0 ); the exceedance is then scaled by the calibrated degree-day factor ( : 2 ℎ ) to generate the magnitude of melt. Here, we used hourly temperatures (∆ = 1 ℎ ) to estimate five-day ( = 5 ) meltwater production. While degree-day models traditionally 305 use ∆ = 1 , we used a finer temporal resolution to ensure more realistic meltwater production, but ultimately, melt was accumulated over a five-day window.
We calibrated our melt model for Antarctica using a calibration data set of surface meltwater fluxes (Trusel et al., 2013) that span the 1999 to 2009 melt seasons, which are linearly interpolated to our 12.5 km grid. Rather than calibrate our model to 5day meltwater fluxes, we optimized correspondence of annual meltwater production between the model and calibration data 310 set, and set = 1 . For each grid cell, we quantified the that best relates the annual accumulated exceedance of 2 over a predetermined threshold, 0 , which does not vary in space. Van den Broeke et al. (2010) achieved improved results if 0 was set a few degrees below freezing as it yielded a more realistic spatial distribution of in Greenland. To evaluate which temperature threshold yielded the best model, we calculated under a wide range of 0 (265-273 K) at quarterdegree intervals. To eliminate unrealistic , we set all in the upper 1% to the 99 th percentile factor. We evaluated 315 the performance of these models in reproducing the annual time series of Antarctic-wide meltwater production as compared to our calibration data set (Trusel et al., 2013). Specifically, we compared on a grid-cell basis their ability to reproduce interannual variability ( 2 ) and to minimize mismatch ( ) with the observations. Giving equal weight to the aforementioned, we selected the temperature threshold that maximizes the normalized distance between the two evaluators ( Figure 7a). This approach selected a threshold that lies in between the threshold if determined by one evaluator alone. For 320 Antarctica, we used 0 = 270.5 , and the calibrated to that threshold, to generate 5-day meltwater production using Eq.
We estimated 5-day meltwater production over the Greenland Ice Sheet using a similar approach. While we used observationbased calibration data set over Antarctica, a similar data set does not exist for Greenland, so we instead used independent model output as the basis of our calibration. Specifically, we used the 1980-2014 annual meltwater rates from the MARv3.5.2 325 regional climate model (RCM) (Fettweis et al., 2017). Although this product provides sub-annual resolution, we opted to once more calibrate to annual meltwater production. In such a manner, the short timescale meltwater fluxes were driven by MERRA-2, but the calibration to annual RCM output ensured that the simple model remains aligned with realistic magnitudes from MAR. For Greenland, we used 0 = 269.0 , and the calibrated to that threshold (Figure 7b), to generate 5-day meltwater production using Eq. (21) (Figure 9a). 330

Improvement from GSFC-FDMv0
The results presented here build off a prior simulation, GSFC-FDMv0, detailed in a previous publication (Smith et al., 2020).
2. calibration of the dry snow/firn compaction model that limits the inclusion of observations based on the ratio of 335 mean annual meltwater production to snowfall (see Sect. 2.1.3). The calibration approach for v0 did not discard observations based on their exposure to liquid water processes. This change in v1 should lead to a small improvement in the representation of dry compaction; 3. a more robust approach to handling mass fluxes at the surface. The CFM underwent a significant update between v0 and v1, including allowing the explicit removal of mass via sublimation and also inclusion of rainfall. For 340 v0, sublimation was handled by aggregating the accumulation from neighboring time steps until positive thereby still accounting for sublimation but at the cost of smoothing out the accumulation signal. Rainfall was not included in v0. For v1, mass via rainfall can now be added to the total liquid volume present and become subject to liquid water processes; 4. an improved meltwater model. The degree-day approach for both v0 and v1 are the same; however, the v0 model 345 was built using skin temperature, which cannot exceed 273.15 K and will not capture the large temperature deviations above freezing, especially in Greenland. For v1, we use 2-meter air temperature (see Sect. 2.2.1), which is a more robust approach (e.g., van den Broeke et al., 2010); 5. runoff as an output. The older CFM version used for v0 did allow for melt, percolation, and refreezing, but did not provide runoff as an output. Thus, we are now able to calculate surface mass balance using v1. 350

Model Performance
To evaluate the model improvement through our calibration procedure (Sect. 2.1.3), we evaluate the uncalibrated and calibrated model abilities to capture the slopes of the logarithmic density versus depth for both stages against observations. We found that the mean absolute error in modeled slopes for both stages was reduced by 2/3rds after calibration, and the shared variance between observed and modeled was significantly increased in stage 2 ( Figure 4). The mean observed slope is 0.066 m -1 for 355 stage 1 and 0.028 m -1 for stage 2, indicating that the calibration reduced dry-snow densification model error from 45% to 17% for stage 1 and from 39% to 11% for stage 2. Interestingly, the calibration relies heavily on modification to the accumulation rate (i.e., overburden) component of densification for both stages. Modification to the temperature dependence is only necessary for stage 2. For both stages, densification rates are reduced under increasingly high accumulations, although the changes are more dramatic for stage 2. Densification due to temperature fluctuations during stage 2 is increased especially at 360 colder temperatures. Ligtenberg et al. (2011) andKuipers Munneke et al. (2015) similarly found that the semi-empirical Arthern et al. (2010) model mostly overestimated the rate of densification and found a empirical link with the accumulation rate.
We would ideally prefer to perform an evaluation of modeled firn densification rates, but a substantial number of published observations is lacking. Here, we further evaluate the ability of GSFC-FDMv1 to reproduce the observed densities in our full 365 calibration data set. The majority of these observations were used in the calibration; however, those with significant melt were excluded (see Sect. 2.1.3). Thus, we break out our evaluation into sites exhibiting zero, moderate, and high melt rates, https://doi.org/10.5194/tc-2020-266 Preprint. Discussion started: 14 October 2020 c Author(s) 2020. CC BY 4.0 License. quantified by their ratio to net snowfall. Specifically, these are respectively defined as 0%, less than 10%, and more than 10% of the mean annual snowfall, and we evaluate the modelled mean absolute error in reproducing depth-density observations (Figure 8.). Not surprisingly, the error increases with larger melt fractions, especially for stage 1 where the impact of melt is 370 stronger. Because most of these observations are included in the calibration, we report them as interquartile ranges and assume the upper bounds are more representative of a realistic error for each group. For Stage 1, we expect density errors of 13.6 to 26.4 kg m -3 for dry snow/firn, 23.4 to 52.2 kg m -3 for moderate melt fractions, and 34.7 to 83.7 kg m -3 for high melt fractions.
For Stage 2, we expect density errors of 10.0 to 24.1 kg m -3 for dry snow/firn, 14.4 to 41.1 kg m -3 for moderate melt fractions, and 16.3 to 34.3 kg m -3 for high melt fractions. We note here that we assume that each observation was taken on January 1, 375 1980 for comparison with the model, which likely introduces additional error.

Firn Air Content
During the RCI, the average firn air content over the GrIS was 17.1 meters, but it varied quite substantially in space ( Figure   10a) from 0.2 to 26.4 meters (lower and upper 5% bounds). The peripheral ice contained much less FAC with an average of 380 2.0 meters, yet, like the GrIS, there was a substantial range (0.1-11.3 m). Between September 1, 1996 and September 1, 2019, the mean loss of FAC over the GrIS was 5.5%, however, locally losses spanned up to 100% while the majority ranged from 0.3% to 19.8%. These change estimates were based on locations where the mean annual RCI SMB was greater than zero (i.e., a firn column exists).
Because of the much colder conditions, the AIS firn column contained substantially more air than the GrIS. The average FAC 385 during the RCI for the AIS was 24.9 meters, which typically ranges in space between 15.8 and 37.7 meters (Figure 10b).
Floating ice has a lower average FAC (19.4 m) than the grounded ice (25.6 m) because of higher temperatures and increased meltwater production.

Surface Mass Balance
The net mass flux at the surface of an ice sheet is referred to as the surface mass balance (SMB), and is typically presented in 390 units of mass per unit time. Here, we use gigatons per year (Gt yr -1 ) to refer to area-integrated values and meters of ice equivalence per year (m i.e. yr -1 ) for local values (i.e., grid cell). Specifically, where is snowfall, is rainfall, is evaporation/sublimation, and is runoff. We also present total meltwater production, . The excess + over is retained within the firn column in either a solid or liquid state. 395 https://doi.org/10.5194/tc-2020-266 Preprint. Discussion started: 14 October 2020 c Author(s) 2020. CC BY 4.0 License.

Greenland Ice Sheet and Peripheral Ice
Over the RCI, the mean annual SMB of the Greenland Ice Sheet was 374 ± 100 Gt yr -1 (± 1 standard deviation), which was comprised of 617 ± 62 Gt yr -1 in net accumulation ( + ), 25 ± 5 Gt yr -1 in rainfall, and 268 ± 55 Gt yr -1 in runoff ( Figure   11b). Total meltwater production averaged 411 ± 71 Gt yr -1 , suggesting that the firn column accommodated ~39% of all liquid water at the surface ( + ). The average local SMB was 0.24 m i.e. yr -1 ; however, it typically ranged from -0.69 to +0.90 400 m i.e. yr -1 (lower and upper 5% bounds) where approximately 13% of the ice sheet by area experienced < 0. The largest positive SMB (+3.9 m i.e. yr -1 ) was found in the snowfall-rich Southeastern GrIS, while the largest negative SMB (-4.2 m i.e. yr -1 ) was found along the most coastal portion of the Southwestern GrIS. Such large magnitudes, however, are extremely atypical. The RCI is ideally representative of long-term steady-state conditions; we find that for the GrIS neither SMB nor any of its components nor skin temperatures experienced a significant trend over our chosen RCI (p-values > 0.3; Figure 11a). 405 We also used a two-sample t-test to evaluate whether the variables from the RCI are sampled from a population with different means than after the RCI (1996-2019). We found no statistical difference in annual means for and + between the intervals during and after the RCI; however, rainfall, meltwater production, and skin temperatures are most likely elevated post-RCI (p-values < 0.05). Because our spin-up involves repeating the RCI until the entire column is refreshed, our choice of RCI (1980RCI ( -1995 should not generate non-physical transients in our firn simulations. 410 After 2003, the mean annual SMB for the GrIS was 307 ± 119 Gt yr -1 , a reduction of 68 Gt yr -1 as compared to the RCI. Insignificant increases in solid and liquid precipitation (18 Gt yr -1 ) were outweighed by a strong increase in meltwater production (115 Gt yr -1 ) and ultimately runoff (86 Gt yr -1 ). The firn column only accommodated 36% of liquid water present at the surface, suggesting decreased firn-air storage. After the major melt event of 2012 when GrIS experienced its lowest SMB (+92 Gt yr -1 ) over the 40-year interval, sharp reductions in runoff followed by years of above normal net precipitation 415 allowed the SMB to recover between 2013 and 2018. In 2019, however, the GrIS incurred its second lowest annual SMB (+117 Gt yr -1 ) due to a combination of well-below average precipitation and well-above average melt.
The SMB of the entirety of Greenland peripheral ice was never positive over the entire 1980-2019 period with a mean of -60 ± 20 Gt yr -1 and -76 ± 22 Gt yr -1 during and after the RCI, respectively. As with the GrIS, the peripheral ice bodies experienced minimal precipitation gains (5 Gt yr -1 ) in conjunction with moderate increases in melt and runoff (both 21 Gt yr -1 ). Over the 420 entire 40-year record, the firn only accommodated ~17% of all liquid water, indicating that the majority of Greenland's peripheral ice is bare ice. Local SMB over the RCI ranges from -2.7 to +0.86 m i.e. yr -1 with a mean of -0.69 m i.e. yr -1 . As expected, 80% of the peripheral ice experienced < 0.

Antarctic Ice Sheet
The SMB of the Antarctic Ice Sheet nearly entirely controlled by snowfall (Figure 12b). Of the 2609 ± 147 Gt yr -1 annual 425 mass gain over the RCI (1980RCI ( -2019, net accumulation ( + ) accounts for 2605 ± 146 Gt yr -1 whereas rainfall contributed a mere 6 ± 3 Gt yr -1 and runoff removed only 2 ± 2 Gt yr -1 . Meltwater production does exist (98 ± 32 Gt yr -1 ), however, https://doi.org/10.5194/tc-2020-266 Preprint. Discussion started: 14 October 2020 c Author(s) 2020. CC BY 4.0 License. majority (98%) is retained within the firn column. Local SMB is predominantly positive with a mean of +0.21 m i.e. yr -1 , yet values commonly span +0.04 to +0.74 m i.e. yr -1 (lower and upper 5% bounds). Less than 0.5% of the ice sheet by area exhibited mean annual < 0. The maximum SMB of +6.1 m i.e. yr -1 was found along the spine of the Antarctic Peninsula, 430 whereas the minimum of -0.6 m i.e. yr-1 was found at the Northwestern corner of the Ross Ice Shelf. Net snow accumulation, rainfall, and skin temperatures did not experience significant trends (p-values > 0.5) over the RCI. Runoff exhibited a significant negative trend (-0.08 Gt yr -2 ; p-value = 0.01); however, because of the extremely small contribution to SMB ( Figure   12b) and highly localized spatial distribution, the choice of RCI is justified.
Most mass gains over the AIS occur in the form of net accumulation over the grounded ice sheet (2148 ± 127 Gt yr -1 ), whereas 435 floating ice accumulates 457 ± 27 Gt yr -1 . Although not substantial, meltwater production over floating ice (66 ± 20 Gt yr -1 ) was on average double that over grounded ice (33 ± 13 Gt yr -1 ). Nearly all this meltwater is retained within the firn column as runoff averages <1 Gt yr -1 and 2 Gt yr -1 for grounded and floating ice, respectively.

Height and Volume Change
The combined fluctuations in SMB and FAC drive the total ice-sheet volume changes due to surface processes, yet only the 440 former constitutes an actual mass change. We evaluate the relative contributions of mass (SMB) and air (FAC) at seasonal and multi-annual timescales.

Greenland Ice Sheet
The seasonal cycles of the SMB and FAC components of ice-sheet-wide volume change averaged over the RFI are 163 km 3 and 408 km 3 , respectively (Figure 13a), indicating that changes in the FAC are 2.5x larger than SMB at sub-annual timescales. 445 When combined, this volume change translates into ice-sheet-wide average height change of 34 cm due to seasonal variability of surface processes. The volume increases until May when it typically reaches its maximum and rapidly decreases to its minimum in September, bringing the ice sheet effectively back in balance (i.e., net zero change) as by design. After the RFI, the GrIS on average experienced an annual net volume loss of 142 km 3 by September due to both FAC (78 km 3 ) and SMB (64 km 3 ); however, that average is skewed largely by two extreme years in 2012 and 2019 when the GrIS lost 806 km 3 and 712 450 km 3 , respectively (Figure 13c). Although FAC exhibits a larger seasonal cycle, the contribution to ice-sheet-wide volume change over longer timescales (i.e., several years) are somewhat smaller for the FAC than SMB (Figure 13b). Between 2003 and 2019, SMB anomalies and FAC changes contributed to a decrease in GrIS volume of 2411 km 3 (142 km 3 yr -1 ): 1034 km 3 due to FAC and 1377 km 3 due to SMB.

Antarctic Ice Sheet 455
The seasonal component of height change due to surface processes alone averages to just under 7 cm averaged over the entire AIS (Figure 14c), which is one-fifth that of the GrIS (34 cm). Due to its large area, however, the seasonal volume change amounts to 921 km 3 . Like the GrIS, the change in FAC is 3 times larger than SMB and dominates seasonal signal, amounting https://doi.org/10.5194/tc-2020-266 Preprint. Discussion started: 14 October 2020 c Author(s) 2020. CC BY 4.0 License.
for 690 km 3 in seasonal change (Figure 14a). While the maximum and minimum volume changes due to SMB occurs in October and February, respectively, those due to FAC variability occur one month later (November and March). Since 2003, 460 the AIS has grown in volume by 841 km 3 from surface processes alone of which 536 km 3 resulted from FAC changes and 305 km 3 from SMB. In sum, surface processes contributed +50 km 3 yr -1 to the volume of the AIS since 2003, a number that is vastly overshadowed by the seasonal cycle. Because the RFI encompassed the entire 1980-2019 interval, the height and volume changes begin and end with zero (i.e., no height change over the entire RFI).

Discussion and conclusion 465
We present 40-year simulations of GrIS and AIS firn processes using the CFM forced by MERRA-2 atmospheric reanalysis data. Specifically, we calibrate the Arthern et al. (2010) firn densification model through modification of its dependence on overburden and temperature. The resulting model reduces the rates of densification, largely in response to the overburden, which is approximated by the mean accumulation rate. Only minor modification to the temperature dependence was necessary for the second stage of densification, which is in line with other studies that found the accumulation rate as a key parameter in 470 model calibration (Kuipers Munneke et al., 2015;Ligtenberg et al., 2011). Our calibration differs, however, as we derive the form of our calibration using the original form of the Arthern et al. (2010) densification equation, which provides adjusted model parameters that best-fit observed depth-density profiles and the MERRA-2 climate conditions. Additionally, we calibrate the model using observations from both ice sheets, resulting in one set of adjusted parameters. It is important to note that the adjustments to the densification model parameters reflect missing physical processes as well as persistent biases within 475 the climate forcing (e.g., if the forcing exhibited a cold bias). Thus, application of these adjustments when using a different climate forcing is not recommended.
The surface density parameterization is also dependent on the mean annual climate conditions derived from MERRA-2, so any biases will manifest in the derived coefficients. We note that while the model does a satisfactory job of reproducing moderate to high surface densities (325-425 kg m -3 ), it appears insufficient at capturing the lowest observed densities. Thus, our model 480 potentially overestimates the initial density, predominantly over GrIS. More exploration into the density of new snow accumulations and their subsequent evolution over short time scales (hours to days) and across several locations is necessary to improve this simple density model. While new snow accumulation is often very low density, these values cannot be directly applied to the firn densification model which models density evolution over coarse time steps (5-to-20 days) during which the snow can undergo rapid densification. Thus, the GSFC-FDMv1 does not account for sub-time-step surface density evolution 485 and requires a bulk density representative of snowfall that has been exposed at the surface for several days to weeks. Future improvements in the time resolution of the simulations as well as observations of the rapid evolution of new snow accumulation should provide important future improvements to the model presented. Furthermore, the modeled surface density does not evolve in time, which is likely an oversimplification. https://doi.org/10.5194/tc-2020-266 Preprint. Discussion started: 14 October 2020 c Author(s) 2020. CC BY 4.0 License.
While not the focus of the work, one important output from the GSFC-FDMv1 simulation is surface runoff, which allows us 490 to estimate ice sheet SMB. The mean annual GrIS SMB is comparable to SMB estimates from an ensemble of models of varying complexity (Fettweis et al., 2020). Our estimates of the 1980-2012 GrIS mean annual SMB (361 ± 106 Gt yr -1 ) and runoff (309 ± 80 Gt yr -1 ) are very similar to the GrSMBMIP ensemble averages (338 ± 111 Gt yr -1 and 331 ± 102 Gt yr -1 , respectively). Comparison with an accompanying Antarctic model ensemble results suggests that our AIS SMB estimate for grounded and floating ice is larger than most models: Mottram et al. (2020) found the ensemble mean of AIS SMB of 2486 Gt 495 yr -1 (range: 2031-2757 Gt yr -1 ), which is less than our estimate of 2623 Gt yr -1 . We note, however, that the evaluation in Mottram et al. (2020) of each model against observations suggests they each contain a negative bias (i.e., the modeled SMB is typically less than the observed). Future evaluation of the MERRA-2 and GSFC-FDM SMB against observations will help assess whether its performance is degraded as compared to the ensemble presented in Mottram et al. (2020).
Deviations in SMB from its mean over the RCI result in ice-sheet height and volume fluctuations; however, these SMB 500 deviations along with changes in temperature also modulate the total air content within the firn column, amplifying the massrelated height and volume fluctuations. While the SMB and FAC contributions to total firn volume change over multiannual time scales are somewhat comparable, the seasonal signal is dominated by FAC for both ice sheets. This difference suggests that 70-75% of sub annual volume fluctuations are in response to a change in the air content rather than actual mass change.
Thus, determination of seasonal mass change using satellite altimetry requires a substantial FAC correction, highlighting the 505 importance of firn densification models, especially when investigating shorter intervals of change as not being mindful of the seasonal cycles of SMB and FAC can generate large biases.
The time series of firn height and volume change, split into its respective SMB (ice) and FAC (air) components, provide the data necessary to isolate the ice-dynamical change from the changes observed using airborne and satellite altimeters. Future work improving the representation of the near surface climate, initial density, and especially liquid water processes within firn 510 column should improve future iterations of GSFC-FDM modeled firn volume changes. Because of the challenges in measuring firn processes, future evaluations of firn densification model representation will likely rely on direct comparisons with altimetry-derived volume changes.
Code and data availability. The NASA GSFC MERRA-2 data are available at https://disc.gsfc.nasa.gov/, and the M2R12K data are available from B.M. upon request.
The Community Firn Model code is available at 515 https://github.com/UWGlaciology/CommunityFirnModel. The GSFC-FDMv1 simulations (including firn air content and surface mass balance) for both ice sheets are available on the ICESat-2 website (<insert link when live>). Competing interests. We declare no competing interests.
Acknowledgements. The GSFC-FDM effort was supported by the NASA ICESat-2 Project Science Office. The authors would like to acknowledge all who have contributed to the Community Firn Model effort and making it a useful resource for the https://doi.org/10.5194/tc-2020-266 Preprint. Discussion started: 14 October 2020 c Author(s) 2020. CC BY 4.0 License. community. We would also like to acknowledge Richard Cullather who provided significant insight into MERRA-2 and provided the M2R12K data. Finally, we would like to thank Tyler Sutterley and Susheel Adusumilli for providing feedback 525 on the GSFC-FDMv0 and v1 output.