A Bayesian approach towards daily pan-Arctic sea ice freeboard estimates from combined CryoSat-2 and Sentinel-3 satellite observations

Observations of sea ice freeboard from satellite radar altimeters are crucial in the derivation of sea ice thickness estimates, which in turn provide information on sea ice forecasts, volume budgets, and productivity rates. Current spatio-temporal resolution of radar freeboard is limited as 30 d are required in order to generate pan-Arctic coverage from CryoSat-2 and 27 d are required from Sentinel-3 satellites. This therefore hinders our ability to understand physical processes that drive sea ice thickness variability on submonthly timescales. In this study we exploit the consistency between CryoSat-2, Sentinel-3A, and Sentinel-3B radar freeboards in order to produce daily gridded pan-Arctic freeboard estimates between December 2018 and April 2019. We use the Bayesian inference approach of Gaussian process regression to learn functional mappings between radar freeboard observations in space and time and to subsequently retrieve pan-Arctic freeboard as well as uncertainty estimates. We also employ an empirical Bayesian approach towards learning the free (hyper)parameters of the model, which allows us to derive daily estimates related to radar freeboard spatial and temporal correlation length scales. The estimated daily radar freeboard predictions are, on average across the 2018–2019 season, equivalent to CryoSat-2 and Sentinel3 freeboards to within 1 mm (standard deviations< 6 cm), and cross-validation experiments show that errors in predictions are, on average, ≤ 4 mm across the same period. We also demonstrate the improved temporal variability of a panArctic daily product by comparing time series of the predicted freeboards, with 31 d running means from CryoSat2 and Sentinel-3 freeboards, across nine sectors of the Arctic, as well as making comparisons with daily ERA5 snowfall data. Pearson correlations between daily radar freeboard anomalies and snowfall are as high as +0.52 over first-year ice and +0.41 over multi-year ice, suggesting that the estimated daily fields are able to capture real physical radar freeboard variability at sub-weekly timescales.


Introduction
Over the past 4 decades remote sensing has provided key insights into the changing state of the polar climate system. From passive microwave sensors we have seen a significant decline in sea ice area since 1979, for all months of the year (Stroeve and Notz, 2018), as well as an increasing length of the summer melt season (Stroeve et al., 2014a). Estimates of sea ice thickness are crucial for monitoring the volumetric response of sea ice to long-term climatic change (Slater et al., 2021), as well as understanding the consequences for Arctic ecosystems, whose productivity rates are driven by the sea ice thickness distribution (Sakshaug et al., 1994;Stirling, 1997;Stroeve et al., 2021). Furthermore, with continued sea ice loss we are beginning to see an upward trend in Arctic maritime activity (Wagner et al., 2020), for which timely forecasts of ice conditions are required for safe passage and cost-effective planning. Our ability to provide reliable forecasts is inherently dependent on the availability of observations which allow us to exploit sources of sea ice predictability (Guemas et al., 2016). Initialising climate models with observations of sea ice thickness for example has been shown to considerably improve seasonal sea ice forecasts compared Published by Copernicus Publications on behalf of the European Geosciences Union.
In recent decades, advancements in satellite altimetry have enabled sea ice thickness to be estimated from space. This is achieved by measuring the sea ice freeboard; that is, the height of the sea ice surface relative to the adjacent ocean, and converting it to thickness by assuming hydrostatic equilibrium and bulk values of the ice, ocean, and overlying snow densities -and also snow depth (Laxon et al., 2003;Quartly et al., 2019). CryoSat-2 was the first radar altimeter launched with a specific focus on polar monitoring, and while it has been pivotal in improving our understanding of polar climate, its long repeat sub-cycle means that 30 d are required in order to generate pan-Arctic coverage (up to 88 • N). The same is true for the ICESat-2 laser altimeter, launched in 2018, whose capability to estimate total (snow) freeboard, at monthly timescales, has been recently demonstrated (Kwok et al., 2019). Other radar altimeters in operation such as Sentinel-3A and Sentinel-3B have slightly shorter repeat cycles (27 d) but only extend to 81.5 • latitude. With this in mind, the maximum temporal resolution we can achieve for pan-Arctic freeboard from one satellite alone is 1 month. Increasing the resolution leads to a stepwise drop in spatial coverage until we arrive at 1 d of observations, for which we achieve less than 20 % coverage for latitudes below 83 • N, with tracks averaged to a 25 km 2 grid spacing ; see also Tilling et al., 2016, for a similar analysis using 2 d of observations). This poses a significant limitation on our ability to both understand physical processes that occur on sub-monthly timescales and on capturing the temporal variability of sea ice thickness, which could provide information on sea ice forecasts.
Recent studies have found ways to improve temporal coverage through the merging of different satellite products. Ricker et al. (2017) for example, merged thickness observations from CryoSat-2 and Soil Moisture and Ocean Salinity (SMOS) satellites to produce pan-Arctic thickness fields at weekly timescales. Furthermore, Lawrence et al. (2019) showed that radar freeboard observations from CryoSat-2, Sentinel-3A, and Sentinel-3B can be merged to produce pan-Arctic coverage below 81.5 • N every 10 d. While both of these studies are significant improvements in temporal resolution, we do not yet have a daily pan-Arctic freeboard and/or thickness product. The Pan-Arctic Ice Ocean Modeling and Assimilation System (PIOMAS) sea ice thickness model (Zhang and Rothrock, 2003) is a commonly used substitute for observations when evaluating sea ice thickness in climate models and is available at daily resolution. PIOMAS assimilates daily sea ice concentration and sea surface temperature fields, and despite assimilating no information on sea ice thickness, it has been shown to be generally consistent with in situ and submarine observations (Schweiger et al., 2011) as well as exhibiting similar mean trends and annual cycles to both ICESat and CryoSat-2 satellites (Schweiger et al., 2011;Laxon et al., 2013;Schröder et al., 2019). However it generally over(under)estimates thin(thick) ice regions (Schweiger et al., 2011;Stroeve et al., 2014b), giving a misrepresentation of the true ice thickness distribution.
In this study we exploit the consistency between CryoSat-2 (CS2), Sentinel-3A (S3A), and Sentinel-3B (S3B) radar freeboards  in order to produce a gridded pan-Arctic freeboard product, hereafter referred to as CS2S3, at daily resolution between December 2018 and April 2019. Our method, Gaussian process regression, is a Bayesian inference technique which learns functional mappings between pairs of observation points in space and time by updating prior probabilities in the presence of new information -Bayes' theorem. Through this novel supervised learning approach, we are able to move beyond the typical assessment of satellite-based radar freeboard variability with simple running means and instead move closer towards understanding drivers of radar freeboard variability on subweekly timescales with a daily product (see Sect. 5). While many previous studies have reported statistical interpolation methods under a variety of names, e.g. Gaussian process regression (Paciorek and Schervish, 2005;Rasmussen and Williams, 2006), kriging (Cressie and Johannesson, 2008;Kang et al., 2010;Kostopoulou, 2020), objective analysis (Le Traon et al., 1997), and optimal interpolation (Ricker et al., 2017), each of these methods contains the same approach to learning functional mappings and the same key set of predictive equations. Despite this, the method is almost infinitely flexible in terms of its application and model setup.
We will see how our model differs from, e.g., Ricker et al. (2017) through our choice of input-output pairs as well as our choice of prior over functions and approach to learning model hyperparameters (see Sect. 3).
The paper is structured as follows: Sect. 2 introduces the data sets which are used within this study, Sect. 3 outlines the method of Gaussian process regression and presents an example of how we achieve pan-Arctic radar freeboard estimates on any given day. In Sect. 4 we evaluate the interpolation performance through a comparison with the training inputs and cross-validation experiments. In Sect. 5 we provide an assessment of the improved temporal variability achieved by the use of a daily product, before finally ending with conclusions in Sect. 6.

Data
In the following section we outline the processing steps applied to CS2, S3A, and S3B along-track data for generating the radar freeboard observations used as inputs to the Gaussian process regression model as well as listing auxiliary data sets used. Note that radar freeboard (the height of the radar scattering horizon above the local sea surface) is distinct from sea ice freeboard (the height of the snow-ice interface above the local sea surface). To convert radar freeboard to sea ice freeboard, a priori information on snow depth, density, and radar penetration depth are required. In this study we focus solely on radar freeboard so as not to impose new sources of uncertainty. Radar freeboard can be regarded as the "base product", from which pan-Arctic daily estimates of sea ice freeboard and thickness can later be derived. For this study, along-track radar freeboard was derived for CS2 (synthetic aperture radar (SAR) and synthetic aperture radar interferometric (SARIN) modes), S3A, and S3B in a two-stage process that is detailed in full in Lawrence et al. (2019). First, raw Level-0 (L0) data were processed to Level-1B (L1B) waveform data using ESA's Grid Processing On Demand (GPOD) SARvatore service (Dinardo et al., 2014). At the L0 to L1B processing stage, Hamming weighting and zero padding were applied, both of which have been shown to be essential for sea ice retrieval and which are not included in the ESA standard processing of S3 data at this time  -they are however applied during ESA's CS2 processing chain. Next, L1B waveforms were processed into radar freeboard following the methodology outlined in Lawrence et al. (2019) (based on that of Tilling et al., 2018), and accounting for the Sentinel-3A and -3B (S3) retracking bias, as suggested in their conclusion. After subtracting the extra 1 cm retracking bias, 2018-19 winter-average freeboards from CS2, S3A, and S3B fall within 3 mm of one another (see Lawrence, 2019). Notably, the standard deviation on the S3A(B)−CS2 difference is comparable to the standard deviation on S3A−S3B (σ = 6.0(6.0)(5.9) cm for S3A−CS2(S3B−CS2)(S3A−S3B)). Since S3A and S3B are identical in instrumentation and configuration, differing only in orbit, this suggests that any CS2−S3A(B) radar freeboard differences are the result of noise and the fact that the satellites sample different sea ice floes along their different orbits rather than due to biases relating to processing. Such consistency between data from individual satellites permits us to combine data from all three; thus in the following methodology we propagate data from the three satellites as a single data set (CS2S3). It is also worth noting that while we have not compared GPOD-derived CS2 radar freeboard with the ESA L2 baseline D product, Lawrence et al. (2019) applied the same L1B → L2 processing to GPOD L1B and ESA L1B (baseline C) data and found a radar freeboard difference of ∼ 6 mm, which can be attributed to the fact that the GPOD L1B data do not contain the stack standard deviation parameter which is used for filtering lead and floe waveforms in the ESA L1B → L2 processing chain. This means that the product presented in this article can effectively be regarded as version 1.0, as we await the availability of ESA Sentinel-3 Level-2 data which are processed to be consistent with CS2 (i.e. with Hamming weighting and zero padding applied).

Auxiliary data
In order to produce pan-Arctic estimates of radar freeboard for a given day, we need to know the maximum sea ice extent on that day. For this we extract daily sea ice concentration fields between December 2018 and April 2019 from the National Snow and Ice Data Center (NSIDC). We choose the NASA Team sea ice algorithm applied to passive microwave brightness temperatures from the Nimbus-7/SMMR (Scanning Multichannel Microwave Radiometer), DMSP (Defense Meteorological Satellite Program)/SSM/I (Special Sensor Microwave/Imager), and DMSP/SSMIS (Special Sensor Microwave Imager/Sounder), which are provided on a 25 × 25 km polar stereographic grid (Cavalieri et al., 1996). We then select grid cells containing sea ice as those with a concentration value ≥ 15 %. In Sect. 3 we distinguish between first-year ice (FYI) and multi-year ice (MYI) zones in order to perform computations related to the model setup. For this we use the Ocean and Sea Ice Satellite Application Facility (OSI-SAF) daily ice type product (OSI-403-c; Aaboe et al., 2016), derived from the DMSP/SSMIS, Metop/ASCAT, and GCOM-W/AMSR-2 satellites. These data are provided on a 10 × 10 km polar stereographic grid.
Finally, in Sect. 5 we utilise the NSIDC-defined sea ice regions (Fetterer et al., 2010) in order to perform a regional analysis of the temporal variability of the daily CS2S3 product, as well as making comparisons with daily ERA5 snowfall reanalysis data (ERA5, 2017). Daily snowfall data were generated for each day between December 2018 and April 2019 by computing the sum of 24-hourly reanalysis fields.

Method
In this section we outline the method of Gaussian process regression, and how it can be used to produce gridded pan-Arctic radar freeboard observations on any given day. In the example presented here, we average along-track freeboard observations from each satellite on a 25 × 25 km polar stereographic grid. We also re-grid NSIDC sea ice concentration, and OSI-SAF ice type data to the same grid.
For a given day t, we have gridded freeboard observations from the CS2, S3A, and S3B satellites (z 1 (t), z 2 (t), and z 3 (t) respectively). The number of observations from each satellite varies such that z , where some, but not all, of the observations between z 1 (t), z 2 (t), and z 3 (t) are colocated in space. Let us then define z(t) as an n(t) × 1 vector n(t) = n 1 (t) + n 2 (t) + n 3 (t) which is generated by concatenating the freeboard observations from each of the three satellites. Repeating this step for ± τ consecutive days in a window around day t, we combine T = 2τ + 1 number of z(t) vectors in order to produce a single n × 1 vector n = n(t − τ ) + . . . + n(t) + . . . + n(t + τ ) of all observations z. In this case we choose τ = 4; hence 9 d of observations are used in the model training. This results in the majority of grid cells having been sampled at least once over this period, thus reducing the prediction uncertainty at any given grid cell. We can see in Fig. 1 how, in this example, the spatial coverage is improved from ∼ 23 % to ∼ 72 % by using 9 d of observations instead of 1. It is also worth noting that freeboard measurements from different satellites which are co-located in space and time are treated as separate observation points in our proposed workflow, where the average percentage of co-located points on any given day between CS2, S3A, and S3B; CS2 and S3A; CS2 and S3B; and S3A and S3B is < 1 %, ∼ 4 %, ∼ 4 %, and ∼ 8 % respectively. Our aim is then to understand the function which maps the freeboard observations to their respective space-time positions in order to make predictions at unobserved locations on day t. This functional mapping is commonly expressed as where ε represents independent identically distributed Gaussian noise with mean 0 and variance σ 2 . The zonal and meridional grid positions of the freeboard observations are then given by the n × 1 vectors x and y respectively, and finally t is a n × 1 vector which contains the time index of each observation point t . For convenience we also define the collective training inputs = (x, y, t) as an n × 3 matrix, such that f (x, y, t) ≡ f ( ).
Gaussian process regression (GPR) is a Bayesian inference technique which enables us to learn the function f from the training set of inputs and outputs z and to subsequently make predictions for a new set of test inputs * = (x * , y * , t).
Here the test inputs correspond to the zonal and meridional grid positions of where we would like to evaluate the predictions. For now, let us assume that this corresponds to one grid cell which contains sea ice on day t (i.e. a grid cell within the grey mask shown in Fig. 1). In GPR we assume that f is a Gaussian process (GP), which means that we automatically place a prior probability over all possible functions, not just the function we wish to learn (see, e.g., Rasmussen and Williams, 2006). We can then assign higher probabilities to functions which we think might closely resemble f , which we do through a choice of mean and covariance function: Here represents arbitrary function inputs (either the training or test inputs * in our case). For the prior mean m( ) we assign a constant value which is given as the mean of CS2 FYI freeboards from the 9 d prior to the first day of training data, i.e. from (t − τ − 9) to (t − τ − 1). The reason that we only use FYI freeboards is that with fewer observation points, GPR has less evidence to support significantly different freeboards from m( ). From Fig. 1e we can see that we have fewer observation points in the (FYI) coastal margins; hence predictions here will likely remain close to m( ). In the MYI zone, however, we have a large number of observations (more evidence) to support freeboards which may be different from m( ), so the choice of m( ) will likely have little effect on the prediction values here. For the prior covariance k( , ), we choose the anisotropic Matérn covariance function: with Euclidean distance d( , ) in space and time given by In the above definition, σ 2 f , x , y , and t (and also σ 2 from Eq. 1) are known as hyperparameters, each taking a value > 0. σ 2 f controls the overall variance of the function values, while ( x , y , t ) are the space and time correlation length scales; i.e. how far does one have to move in the input space (metres or days) for the observations to become uncorrelated. Let us now refer to θ = (σ 2 f , x , y , t , σ 2 ) T as the collection of all hyperparameters. In the Bayesian approach we can select values of θ which maximise the log marginal likelihood function: wherẽ z = z − m( ) and K = k( , ) + σ 2 I .
This procedure chooses hyperparameters in such a way as to make p(z| , θ ) (the probability of the observations under the choice of model prior) as large as possible, across all possible sets of hyperparameters θ . It is also worth noting that Eq. (4) is a useful tool in terms of model selection.
In classical approaches one might compare different models (e.g. different choices of prior covariance function) through cross-validation; however in the Bayesian approach we can select the preferred model by evaluating Eq. (4) directly on the training set alone, without the need for a validation set. In this case we tested a variety of prior covariance functions and found the Matérn to be favourable, with the highest log marginal likelihood. Once the hyperparameters have been found, the model is then fully determined and we can generate predictions of radar freeboard at the test locations * . This amounts to updating our prior probabilities of the function values from Eq. (2), by conditioning on the freeboard observations z in order to generate a posterior distribution of function values. , and S3B (c) on a 25 × 25 km polar stereographic grid, respectively covering approximately 11 %, 7 %, and 7 % of the total NSIDC NASA Team sea ice extent (grey mask) on day t = 1 December 2018. By combining the three satellites (d), this coverage is increased to approximately 23 %. Combining t ± 4 d of gridded tracks (e), the coverage is increased further to approximately 72 % of the total sea ice extent. The black ice type contour (from OSI-SAF) shows the boundary between thick MYI and thin FYI, on day t.
The posterior function values f * come in the form of a pre- where both the mean f * and variance σ 2 f * can be computed in closed form: Now that we have the framework in place to generate a predictive distribution of radar freeboard values for a given set of training and test locations, let us explore how we implement this in practice. Due to the need to invert a matrix of size n×n (K), GPR has run time complexity O(n 3 ), which means that if we double the number of training points then we increase the run time by a factor of 8. As we need to invert K at every iteration step when optimising the model hyperparameters, GPR becomes increasingly computationally expensive with increasing n. For this reason, we take an iterative approach to the predictions: for a particular day t, we generate predictions of radar freeboard at each grid cell by using only the available training data that exist within a 300 km radius (Fig. 2). We essentially repeat the process shown in Fig. 2 for every grid cell which contains sea ice, until we produce a pan-Arctic field on day t. While this does effectively mean that we consider observations beyond 300 km distance to be uncorrelated, it has the advantage of both computational efficiency, and it allows us to freely incorporate spatial variation into the length scales x , y , t (and hence spatial variation into the smoothness of the function values). This seems sensible given the different scales of surface roughness that exist between FYI and MYI (Nolin et al., 2002). Furthermore, we can consider 300 km to be a reasonable distance in estimating the spatial covariance between inputs given that freeboard observations are correlated up to a distance of at least 200 km due to along-track interpolation of sea level anomalies Lawrence et al., 2018). Figure 3 shows the result of repeating the steps in Fig. 2 for every grid cell which contains sea ice on day t. The freeboard values here correspond to the mean of the posterior distributionf * , and the uncertainty as the square root of the variance term σ f * . Notice that the CS2S3 freeboard appears smoother than if we were to simply average the training observation points (e.g. Fig. 1e). This is because GPR estimates the function values f , while we have stated in Eq. (1) that the observations z are the function values corrupted by random Gaussian noise. We also notice how the uncertainty in our estimation of the freeboard values increases in locations where we have fewer training data and is typically highest in areas where we have no data at all (e.g. the polar hole above 88 • N). In Sect. 5 we provide further discussion relating to the predictive uncertainty of the CS2S3 field, looking particularly at the Canadian Archipelago and the Greenland, Iceland and Norwegian seas, where uncertainties are consistently higher than in other regions of the Arctic. See also Supplement Figs. S1-S2 for sensitivity tests, where we show how the predictions and uncertainty estimates in Fig. 3 are affected by varying the number of days of observations used to train the model.
As a final point related to the methodology, given that we generate the predictions iteratively, we are therefore able to construct daily spatial maps of each of the model hyperparameters that maximise the log marginal likelihood function in Eq. (4). Of particular importance are perhaps the correlation length scale parameters, as these could be of use in other applications such as localisation techniques in sea ice data assimilation systems (Sakov and Bertino, 2011;Massonnet et al., 2015;Zhang et al., 2018Zhang et al., , 2021. Previous studies have estimated the average spatial length scales of sea ice thickness anomalies across a range of coupled climate models (Blanchard-Wrigglesworth and Bitz, 2014) and reanalysis products (Ponsoni et al., 2019) to be typically in the range of 500-1000 km. In our approach, the maximum spatial length scales of radar freeboard are capped at 600 km as this is the maximum distance between any pair of grid cells used during the model training (similarly the temporal length scales are capped at 9 d). Nevertheless, from Fig. 4 we notice some interesting features. Specifically, that larger spatial length scales typically coincide with the thicker MYI zone, as well as some localised features around the East Siberian-Chukchi seas and the Kara and Laptev seas. We also see some presence of anisotropy (i.e. at a given pixel, x = y ), particularly in the area north of Greenland, and some regions of the Beaufort Sea. This could be explained by the presence of ridges and other deformation features which form in a preferential orientation relative to the apparent stress regime (Kwok, 2015;Petty et al., 2016). Some of the lowest correlation length scales occur in the peripheral seas, which is perhaps related to more dynamic ice conditions and uncertainty related to spatial sampling in these regions (see Sect. 5 for further discussion). With regards to temporal length scales, we see that radar freeboard observations are correlated over much of the Arctic over the 9 d training period, except for the Canadian Archipelago and some areas of the Greenland, Iceland and Norwegian seas (also discussed further in Sect. 5).

Validation
In this section we use various metrics to assess the GPR model in terms of training and prediction. We first compare CS2S3 daily freeboards against the training inputs in order to derive average errors across the 2018-2019 winter season, before evaluating the predictive performance of the model through cross-validation experiments. Note that hereafter, all CS2S3 fields are generated at 50×50 km resolution, and CS2 and S3 along-track data are averaged to the same 50 × 50 km polar stereographic grid, for computational efficiency.

Comparison with training inputs
Here we assess the performance of the GPR model during training; that is, an assessment of the errors between the gridded CS2S3 field and CS2 and S3 gridded tracks, for all days between the 1 December 2018 and the 24 April 2019. In general, caution should be applied when evaluating training performance, as a model which fits the training data well may be fitting to the noise in the observations (over-fitting), leading to poor predictive performance. Meanwhile, a model which under-fits the training data is equally undesired (note however that over-fitting is inherently mitigated in the GPR methods; see, e.g., Bishop, 2006). In Eq. (1) we expressed that in our model the freeboard observations from CS2 and S3 correspond to the function values plus random Gaussian noise. Therefore by comparing our CS2S3 freeboard values from Eq. (5) to the training data, we should expect the difference to correspond to zero-mean Gaussian noise. In Fig. 5 we compute differences (CS2-CS2S3, S3A-CS2S3, S3B-CS2S3) for all days, where we can see that the difference (error) follows a normal distribution, centred approximately on 0. The average daily difference µ between CS2S3 and CS2 (or S3) is ≤ 1 mm, with standard deviation on the difference σ < 6 cm (see also Supplement Fig. S3 where we compare training errors for interpolations run at different spatial grid resolutions for 1 d). Furthermore, we can see that the root mean square error (RMSE) between CS2S3 and the training inputs is equivalent to the standard deviation of the difference, which can only occur when the average bias is approximately 0. Notably, each of the standard deviations is approximately  equal to the ∼ 6 cm uncertainty in 50 km grid-averaged freeboard measurements, determined from a comparison of S3A and S3B data during the tandem phase of operation (see Supplement Fig. S4). A breakdown of average errors for each month is given in Table 1; however it should be noted that rounding is likely to play a role in many of the presented statistics. For example, we can see that the CS2-CS2S3 mean difference for the "all months" case is reported as 0.001 m, and similarly for S3A-CS2S3 we have 0.000 m. If we expand these, we arrive at 0.00078 and 0.00024 m respectively; hence the apparent difference between means CS2-CS2S3 and S3A-CS2S3 is almost halved.

Cross-validation
We have seen in the previous section that CS2S3 freeboards closely resemble CS2 and S3 observations (the training data); however this does not give an indication as to how well the model performs in predicting at unobserved locations. An independent set of radar freeboard observations would allow Table 1. Mean (µ), standard deviation (σ ), and RMSE of the daily difference between gridded CS2, S3A, and S3B tracks and co-located CS2S3 points, for all days between the 1 December 2018 and the 24 April 2019. Also computed for all days in each respective month. CS2S3 freeboards to be validated at locations unobserved by CS2 or S3 on any given day; however in the absence of these we rely on alternative metrics to evaluate the predictions. Commonly, data from the airborne mission Operation Ice-Bridge are used to validate sea ice freeboard and/or thickness observations; however as only 6 d were collected in a small area around north of Greenland in 2019, there are insufficient data points here to realistically draw any conclusions about the CS2S3 freeboards on larger temporal and spatial scales. K-fold cross-validation is a useful tool when validation data are limited; however models must be run K number of times (ideally where K = n), incurring significant computational expense when n is large (as it is in our case). We opt for a pragmatic solution here: we test the predictive performance of the model by removing different combinations of each of the S3 satellites from the training data, re-generating the predictions with the remaining subset, and subsequently evaluating predictions on the withheld set. For example, in the first instance we generate daily pan-Arctic predictions across the 2018-2019 period, except that we withhold both S3A and S3B from the training set. We then evaluate predictions from each day against observations from S3A and S3B. In the next instance, we repeat the same process except that we withhold only S3A from the training set (whilst retaining S3B) and use S3A as the validation set. In the final scenario, we withhold S3B from the training set and use S3B as the validation set. It should be noted, however, that this approach only allows us to validate predictions below 81.5 • N and at locations where sea ice concentration values are ≥ 75 %. This is because 81.5 • N is the limit of the spatial coverage of S3 satellites, and during the processing of both CS2 and S3 along-track data, diffuse waveforms which sample grid cells with lower than 75 % concentration are discarded . Figure 6 compares predictions for each of the previously mentioned scenarios, across all days in the 2018-2019 winter period. As we would expect, the mean and spread of error in the predictions is largest in the case where only CS2 is used to train the model, with µ = −0.002 m, σ = 0.074 m and µ = −0.004 m, σ = 0.073 m relative to S3A and S3B observations respectively. The mean and spread of error then decreases slightly with the incorporation of either S3A or S3B, with (µ = −0.001 m, σ = 0.072 m) and (µ = −0.003 m, σ = 0.072 m) respectively. The mean error across all validation tests is consistently ≤ 4 mm, showing that the model is able to make reliable predictions at unobserved locations. We can therefore be confident that with the inclusion of all three satellites, the predictions at unobserved locations are at least as good as ≤ 4 mm error if not better. A breakdown of equivalent statistics is presented for each month in Table 2, although we do not include RMSE here as it is again consistent with the standard deviation in each case. See also Supplement Figs. S5-S6 which show the uplift in the actual predictions and uncertainty estimates brought by the inclusion of all three satellites, as opposed to any one of the cross-validation experiments outlined above.

Assessment of temporal variability
Finally, to showcase the capabilities of the CS2S3 product, we turn to an assessment of temporal variability. Within nine sectors of the Arctic we calculate the mean CS2S3 freeboard for each day of the 2018-2019 winter season to produce a time series of radar freeboard evolution for each sector. We also plot the 31 d running mean freeboard from CS2, S3A, and S3B to demonstrate the increase in variability when moving from a monthly to a daily product. The nine Arctic sectors are taken from NSIDC (Fetterer et al., 2010) and include Baffin and Hudson bays, the Greenland, Iceland and Norwegian (GIN), Kara, Laptev, East Siberian, Chukchi and Beaufort seas, the Canadian Archipelago (CAA), and the central Arctic.
In Fig. 7 we can see how the day-to-day variability is increased with the CS2S3 product, compared to the CS2 and S3 31 d running means. Generally, the mean of the CS2S3 time series lies within 3 mm of CS2 and S3 (approximately in line with the results of the cross-validation presented in Sect. 4.2); however large discrepancies exist in the GIN seas (up to 1.1 cm), the CAA (up to 1.3 cm), and Baffin and Hudson bays (up to 1.2 cm in December 2018). This is perhaps not surprising given the larger uncertainty in radar freeboard in shallow-shelf seas and at coastal margins, relating to higher uncertainty in interpolated sea level anomalies . Indeed, we can see that the difference between the 31 d running means (CS2-S3A(B)) in the CAA is also  Table 2. Mean (µ) and standard deviation (σ ) of the daily difference between gridded S3A and S3B tracks and co-located CS2S3 points, with different combinations of S3 observations removed from the training data. large, at 1.1 (1.9) cm. The GIN region includes the area east of Greenland, one of the most dynamic regions which carries MYI exported out of the Fram Strait southward. Being such a dynamic region, we expect that the difference in spatial sampling of the three satellites may drive discrepancies between the 31 d running means. Note that the GIN seas, the CAA, and Baffin and Hudson bays also coincide with where we see some of the largest uncertainty in the CS2S3 daily field (e.g. Fig. 3b), as well as the lowest spatial and temporal freeboard correlation length scales (Fig. 4), which may be, in part, a reflection of the discrepancies between CS2 and S3 freeboards in these locations but also due to the limited amount of data available in these regions across the 9 d training period. We can see from A natural question is then whether the variability we see in the time series in Fig. 7 represents a real physical signal or whether it is just noise related to observational uncertainty.
To address this question, we first consider the potential issue of spatial sampling by creating a "benchmark" time series (see Fig. 7). For this, we initially compute a "static" background field by averaging all CS2 and S3 gridded tracks between the 1 December 2018 and the 24 April 2019 and then re-generate predictions for each day, as per our GPR model, except that the training data now sample from the background field along the CS2 and S3 track locations of the respective days used in the model training. We should therefore expect the benchmark predictions for each day to be approximately equivalent in areas where there are a significant amount of training data (see Fig. 8) and to see variability in areas where there are less data, which can be explained by tracks sampling different locations of the background field on different days. This is typically what we see in Fig. 7, where there is almost zero day-to-day variability of the benchmark time series in regions such as the Beaufort, Chukchi, East Siberian and Laptev seas but slight variability in the coastal margins. As a second test, we also compare the evolution of daily CS2S3 radar freeboard to daily ERA5 snowfall data, building on the work of Lawrence (2019). Assuming that the radar pulse fully penetrates the sea ice snow cover, radar freeboard is expected to change with snowfall by two distinct mechanisms: (i) snow loading of sea ice will depress the sea ice floe into the ocean, reducing the sea ice freeboard and therefore also the radar freeboard; (ii) because of the slower speed of radar propagation through snow compared to air, additional snowfall will further slow down the pulse, resulting in delayed receipt of the echo at the satellite and equating to a reduced radar freeboard. Thus, under the assumption of full snow penetration, radar freeboard is expected to decrease with increasing snowfall according to these two mechanisms. In Fig. 9 we compute the average of each field over FYI and MYI zones for each day, since surface roughness and snow depth typically show different characteristics over the two ice types , and therefore the variability of radar freeboard with snowfall may also show different signals. Freeboard anomalies are then generated by subtracting a second-order polynomial fit to each of the time series. In contrast to expectations, we find a strong positive correlation of 0.52 over FYI and a medium positive correlation of 0.41 over MYI. While it goes beyond the scope of this study to investigate the specific drivers of this correlation, these results appear to challenge conventional assumptions of full snow penetration at Ku-band frequency. Our derived correlation of 0.52 and 0.41, together with our benchmark test, suggest that the CS2S3 data are able to capture physical radar freeboard variability at sub-weekly timescales.

Conclusions
In this study we presented a methodology for deriving daily gridded pan-Arctic radar freeboard estimates through Gaus- sian process regression, a Bayesian inference technique. We showed an example of how our method uses 9 d of gridded freeboard observations from CryoSat-2 (CS2), Sentinel-3A (S3A), and Sentinel-3B (S3B) satellites in order to model spatio-temporal covariances between observation points, and make pan-Arctic predictions of radar freeboard, with uncertainty estimates, on any given day at 25 × 25 km resolution. We refer to this product as CS2S3. We also introduced an empirical Bayesian approach for estimating the (hyper)parameters which define the covariance function, one which maximises the probability of the observations under the choice of model prior and allows us to derive estimates of the spatial and temporal correlation length scales of radar freeboard. An evaluation of the model performance was then carried out for both training and predictions at 50×50 km resolution for computational efficiency. For the training points, we compared our CS2S3 freeboards to gridded CS2 and S3 freeboards at co-located points for all days across the 2018-2019 winter season, where we found the differences to follow a normal distribution, with mean errors ≤ 1 mm and standard deviations < 6 cm. We then evaluated the predictive performance of the model for latitudes below 81.5 • N and at locations where sea ice concentration is ≥ 75 %, based on a cross-validation approach where we withheld different combinations of S3 freeboards from the training set of observations, re-generated the daily predictions, and used the withheld data to validate the predictions. We found the general prediction error to also be normally distributed, with mean errors ≤ 4 mm and standard deviations < 7.5 cm. Finally, we presented the improved temporal variability of a daily pan-Arctic freeboard product by comparing time series of CS2S3 freeboards, with 31 d running means from CS2 and S3 observations, in nine different Arctic sectors. We showed that the mean of the CS2S3 time series was generally within 3 mm of the 31 d running mean time series from CS2 and S3, except for the Canadian Archipelago and the Greenland, Iceland and Norwegian seas, where prediction uncertainty is large due to significant discrepancies between CS2 and S3 freeboards. We then presented two pieces of analysis to conclude that the variability we see in our radar freeboard time series is related to a real physical signal rather than noise related to observational uncertainty. One is based on a benchmark where we generated predictions from a static field for all days across the 2018-2019 winter season, and for the other we compared time series of daily radar freeboard anomalies to daily ERA5 snowfall data over first-year-ice (FYI) and multi-yearice (MYI) zones. Pearson correlations between freeboard and snowfall in these regions were given as 0.52 over FYI and 0.42 over MYI, suggesting that the daily fields produced by Gaussian process regression are able to capture real radar freeboard variability. Interestingly, the positive correlation between radar freeboard seemingly contradicts conventional assumptions of full snow penetration at the Ku-band radar frequency and remains the subject of future investigation. The improved temporal variability from a daily product is a hopeful prospect for improving the understanding of physical processes that drive radar freeboard and/or thickness variability on sub-monthly timescales. Of course, an investigation into the drivers of the temporal variability in the CS2S3 field would be an additional way to validate the product, although this goes beyond the scope of our study. We conclude here that the Gaussian process regression method is an extremely robust tool for modelling a wide range of statistical problems, from interpolation of geospatial data sets, as presented here and in other works (Le Traon et al., 1997;Ricker et al., 2017), to time series forecasting (Rasmussen and Williams, 2006;Sun et al., 2014;Gregory et al., 2020). The Gaussian assumption holds well in many environmental applications, and the fact that the Gaussian process prior can take any number of forms, so long as the covariance matrix over the training points is symmetric and positive semidefinite, means that the model can be tailored very specifically to the problem at hand.
Code and data availability. Generating the daily CS2S3 radar freeboards at 25×25 km resolution with optimised hyperparameters (as per the methodology outlined in Sect. 3) is computationally demanding and is currently only achieved through high-performance computing systems. We plan to make these open access as and when they become available; however the timings of the release are likely to be governed by the availability of computer resources. It is considerably less burdensome to generate the daily fields without op- Figure 9. Time series of CS2S3 daily radar freeboard anomalies with daily ERA5 snowfall (cm snow water equivalent (SWE)). The corresponding contour for FYI and MYI zones are given in the accompanying spatial plot, where each pixel represents a location which has remained either FYI or MYI for the entire 2018-2019 winter season. The Pearson correlation (r) between radar freeboard anomalies and snowfall are given for each zone.
timisation, by prescribing pre-computed hyperparameter estimates (e.g. a climatology) -we refer to these fields as a "quick-look" product. We find that the quick-look product produces relatively consistent results compared to those presented in Sects. 4 and 5 for the optimised fields, and therefore we will also make these data publicly available. The code and data files for the optimised and quick-look daily radar freeboards (with uncertainty estimates) can be found in the GitHub repository https://github.com/William-gregory/ OptimalInterpolation (last access: 5 May 2021) and Zenodo (https://doi.org/10.5281/zenodo.5005980, Gregory, 2021).
Author contributions. WG developed the GPR algorithm and ran the processing for generating the daily CS2S3 fields. IRL conducted all of the pre-processing of CryoSat-2 and Sentinel-3 data files, as well as performing the analysis of the Sentinel-3 tandem phase comparisons. IRL also wrote Sect. 2.1 of this paper. WG wrote the remaining sections. MT suggested the application of the GPR algorithm for this framework of combining satellite altimetry products and also provided technical support and input for writing this paper. Both co-authors contributed to all sections of the paper.