Relating regional and point measurements of accumulation in southwest Greenland

In recent decades, the Greenland ice sheet (GrIS) has frequently experienced record melt events, which significantly affected surface mass balance (SMB) and estimates thereof. SMB data are derived from remote sensing, regional climate models (RCMs), firn cores and automatic weather stations (AWSs). While remote sensing and RCMs cover regional scales with extents ranging from 1–10 km, AWS data and firn cores are point observations. To link regional scales with point measurements, we investigate the spatial variability of snow accumulation (bs) within areas of approximately 1–4 km² and its temporal changes within two years of measurements. At three different sites of the southwestern GrIS (Swiss Camp, KAN-U, Dye-2), we performed extensive ground-penetrating radar (GPR) transects and recorded multiple snow pits. If the density is known and the snowpack dry, radar-measured two-way travel time can be converted to snow depth and bs. We spatially filtered GPR transect data to remove small scale noise related to surface characteristics. The combined uncertainty of bs from density variations and spatial filtering of radar transects is at 7–8% per regional scale of 1–4 km². Snow accumulation from a randomly selected snow pit is very likely representative of the regional scale (with probability p=0.8 for a value within 10% of the regional mean for KAN-U, and p>0.95 for Swiss Camp and Dye-2). However, to achieve such high representativeness of snow pits, it is required to determine the average snow depth within the vicinity of the pits. At Dye-2, the spatial pattern of snow accumulation was very similar for two consecutive years. Using target reflectors placed at respective end-of-summer-melt horizons, we additionally investigated the occurrences of lateral redistribution within one melt season. We found no evidence of lateral flow of meltwater in the current climate at Dye-2. Such studies of spatial representativeness and temporal changes in accumulation are necessary to assess uncertainties of the linkages of point measurements and regional scale data, which are used for validation and calibration of remote sensing data and RCM outputs.

The aim of this work is to relate point scales to regional scales of one to several square kilometers in area to improve our understanding of the representativeness of point measurements. For this purpose, we examine snow-pit and ground-penetrating radar (GPR) data from two sites within the percolation zone of the GrIS and one site at the equilibrium line gathered over several field seasons. For each site, we investigate density variability between measurements from up to six snow pits within an area of 4 km² made in a single season, process radar transects of up to 25 km recorded in close proximity to those snow pits, and spatially extrapolate the radar-derived accumulation to estimate area-wide accumulation variability. For temporal comparisons, we use continuous observations of accumulation and melt recorded by upGPR (Heilig et al., 2018). Our results show that spatial representativeness of snow accumulation for a point measurement (snow pit) is high but values can be affected by local wind-induced surface roughness. We recommend to apply multiple snow depth measurements at the vicinity of the pits to better assess accumulation on regional scales.

L32 Snowfall can be measured by remote sensing through satellites (e.g. CloudSat); see Bennartz et al. (2019), https://doi.org/10.5194/acp-19-8101-2019, etc.
Here, we respectfully disagree. Bennartz et al. (2019) describe that "…CloudSAT provide ESTIMATES of snowfall in remote regions…". They present several sources of uncertainty and "…approaches to mitigate these adverse effects…". We therefore keep the statement that snowfall cannot be measured, but changed the phrase to: This is because surface mass fluxes, such as snowfall and melt, cannot be measured by remote-sensing technology and derived estimates on snowfall can still have significant errors \citep{Bennartz2019}. Hence, predictions of SMB are usually obtained using scarce in situ measurements together with regional climate models (RCMs), which can introduce significant uncertainties \citep{Vernon2013} as well.

L33 "in concert" use another phrase here, take out "dedicated"
Changed to: …together…; "dedicated" has been removed.

L47 Remove "worked to" and change "link" to "linked"
Changed accordingly.

quantification.." This is repeating the same point as earlier in the paragraph (L47). Probably only need to state this once even though it is an important point.
Changed to: "Since quantification of spatial representativeness of single point measurements for the surrounding square kilometers has only been conducted for one point in western Greenland so far \citep{Dunse2008}, there is a need to explore uncertainties at local and regional scales.". We consider this sentence valuable for highlighting the motivation of this work.

L61-69 This paragraph is a bit disjointed. It begins with surface melt affecting SMB to annual accumulation estimates and observations to validating RCMs to melt impacting firn layers and then stating that there is a gap in how melt impacts temporal changes in accumulation distribution. Needs better flow.
Changed to: Meltwater percolation can move mass from snow to the underlying firn (e.g., \citealp{Charalampidis2016,Humphrey2012,Heilig2018}) or even laterally along the surface slope \citep{Humphrey2012}. Hence, surface melt affects SMB (e.g., \citealp{Sasgen2012}) and accumulation \citep{Heilig2018}. However, it is unlikely that water percolation and mass redistribution are homogeneous over regional scales. Consequently, it is necessary to assess the impact of melt on temporal changes in accumulation distribution for the percolation zone of the GrIS.

L70-78 See comments in Major Questions.
Answered as stated above.

L76 Clarify Question (iv) if it is kept here. It should be clear enough on its own that there should not be a "In other words" after.
Question has been removed.

L89, L83, L94 Remove coordinates and elevations from text and include this in table 1. It is very distracting.
Coordinates are now included in Table 1.

L99-102 Can this small section on radar units be combined with the paragraph above? Or can it be taken out and part of the table with a radar unit column?
Since the information in brackets on the respective coordinates of the measurement locations was considered distracting, we decided to keep the paragraph as is.

L104 Include a sentence or small clause about what and why dewow and bandpass filters for those who are not spun up about radar terminology.
Changed to: All recorded radar traces were processed in a very similar way. In case first arrivals were delayed by more than approximately 2 ns, we started with a correction for the DC shift. Offsets in the zero line of each radar trace (wow) were corrected utilizing a dewow function and low (approximately below 0.5 times the center frequency) and high frequency noise (approx. above 1.5 times the center frequency) were cut by bandpass filters. We further applied background removals to minimize direct wave influences.
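
For readers unfamiliar with the term, the dewow step mentioned in this reply can be illustrated with a minimal sketch that subtracts a centered running mean from a single radar trace. The function name `dewow` and the window length are hypothetical choices for illustration only; the reply does not state which implementation or window was used in the actual processing.

```python
def dewow(trace, window):
    """Remove low-frequency drift ("wow") from one radar trace.

    trace  : list of amplitude samples
    window : running-mean length in samples (hypothetical choice)
    """
    half = window // 2
    n = len(trace)
    corrected = []
    for i in range(n):
        # Centered window, truncated at the start and end of the trace
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        local_mean = sum(trace[lo:hi]) / (hi - lo)
        corrected.append(trace[i] - local_mean)
    return corrected
```

A trace that consists only of a constant offset is mapped to zero everywhere, which is the intended effect on the zero line; band-pass filtering would be a separate, subsequent step.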

(L116?) Equation 2 -Define beta.
We included: …the exponent β=0.5 (related to a medium with random orientation at the micro scale), … We apologize for this.
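
To make the role of β concrete, the conversion from two-way travel time (TWT) to snow depth and bs can be sketched as follows. This is a sketch under stated assumptions, not the manuscript's Equation 2: it assumes a two-phase power-law mixing model for dry snow (air and ice), and the ice density, ice permittivity and function names are illustrative values and names that do not come from this reply.

```python
C_VACUUM = 0.299792458  # speed of light in m/ns
RHO_ICE = 917.0         # density of ice in kg/m^3 (assumed value)
EPS_ICE = 3.18          # relative permittivity of ice (assumed value)
EPS_AIR = 1.0           # relative permittivity of air


def snow_permittivity(rho_s, beta=0.5):
    """Effective permittivity of dry snow from a two-phase mixing model (assumed form)."""
    theta_i = rho_s / RHO_ICE   # ice volume fraction
    theta_a = 1.0 - theta_i     # air volume fraction, theta_a + theta_i = 1
    mixed = theta_i * EPS_ICE ** beta + theta_a * EPS_AIR ** beta
    return mixed ** (1.0 / beta)


def twt_to_accumulation(twt_ns, rho_s, beta=0.5):
    """Return (snow depth Ls in m, accumulation bs in kg/m^2) for a dry snowpack."""
    velocity = C_VACUUM / snow_permittivity(rho_s, beta) ** 0.5  # m/ns
    depth = velocity * twt_ns / 2.0  # two-way travel time -> one-way distance
    return depth, rho_s * depth
```

With these assumed constants, a bulk density of 350 kg/m^3 and a TWT of 10 ns give a snow depth slightly above 1.1 m.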

L127 -Could you include the depth of the bulk density that you took from the snowpits?
The bulk density is calculated over the entire snow column. We did not define samples of a specific depth as being representative of the bulk. As requested, we included snow depth values in Table 2 and included: (see Table 2 for details).

L129 -Why do you include NASA-SE and EKT? They do provide you with two more range values but they are not relevant for SW Greenland. These sites are not brought up again later for any other analysis so could they be removed?
We included a description of the sites within the methodology and used the presented data to extend the conclusions on regional spatial variability in snow density within the discussion section. As these two sites are located within a distance of 45-60 km of the GrIS ice divide (EKT west of the divide and NASA SE east of the divide, see Figure 1), they extend our data analysis of spatial variability of ρs to the dry-snow zone. The recorded pits at NASA SE provide data for a high accumulation site as well.
L133 -"For all three sites", similar to comment above, you are talking about five sites in this section but now only reference three in SW Greenland.
Changed to: For all three transect sites…

L138 -Is vertical sampling related to the frequency of the radar? If so, state this. Also, what is an example of small scale surface roughness? Are these not wind features?
No, vertical sampling rates are related to the depth ranges selected (time window length of the radar acquisition) and the sampling frequency (how many samples are measured within the selected range). Since we intended to use the recorded data also for other purposes such as analyzing deeper firn stratigraphy, the selected range and sampling rates were a trade-off between vertical resolution and depth. Concerning the second question, you are right: small scale surface roughness is mostly related to wind features, as introduced in the subsequent sentence.

L172 -Need an explanation of variograms prior to using it consistently throughout the next section.
L187-193 -Using variograms consistently now, the term or concept needs to be explained prior for readers unfamiliar.
We extended the following sentence to introduce the term variogram: \citet{Webster2007} state that sample size is directly related to the precision of variogram estimates, while variograms are used to estimate the variance of a parameter (here snow accumulation) at increasing intervals of distance in between measurements and in multiple directions.
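
As a hedged illustration of what a variogram estimates, the sketch below computes an empirical semivariogram along a single, equally spaced transect. It is a simplified one-dimensional example with a hypothetical function name; the directional (anisotropic) variogram modeling and the kriging used in the manuscript are not reproduced here.

```python
def empirical_semivariogram(values, spacing, max_lag_steps):
    """Empirical semivariance of equally spaced samples along one transect.

    values        : accumulation samples along the transect
    spacing       : distance between neighboring samples in m
    max_lag_steps : largest lag to evaluate, in units of `spacing`
    Returns a list of (lag distance, semivariance) pairs.
    """
    result = []
    for k in range(1, max_lag_steps + 1):
        # Squared differences of all sample pairs separated by k steps
        sq_diffs = [(values[i + k] - values[i]) ** 2
                    for i in range(len(values) - k)]
        if not sq_diffs:
            break
        result.append((k * spacing, sum(sq_diffs) / (2 * len(sq_diffs))))
    return result
```

For a periodic series such as 0, 1, 0, 1, … the semivariance is 0.5 at a lag of one step and 0 at a lag of two steps, which shows how regularly spaced surface features appear as oscillations in the variogram.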
L174 -Add a comma after "First"
Changed accordingly.

L175 -Clarify "there are no gaps in accumulation in between", are there no gaps in the radar transmission of the accumulation?
We modified the phrase within the brackets to: …(accumulation occurred everywhere within the area of interest, governed by local weather conditions). However, the entire subsection changed significantly. …

L186 -"Despite the trend removal, anisotropy of the covariance...", unclear on what this means?
See above, the section was rephrased. The respective sentence now reads: In addition, we found directional anisotropy of the covariance in all of the longer transects, which means that accumulation variation varies with direction.

L197-198 -Define bs and bn in the sentence before the equation. They are stated but adding in the variables adds another layer of clarification.
Changed accordingly.

L199 -Re-arrange this sentence. "Using the recorded radar traces, it is determined whether any randomly located..."
Changed accordingly.

L208 -Back to the "Major Questions" point brought up above, the step to assess errors associated with TWT is necessary for your main question (ii) of the paper. This is stated as the first sentence. Is it necessary for this to be a major question in the opening since this is already a part of answering your other question? Clearly, this is a major result and should be discussed (as it is in the paper) but it is not necessarily the focus as the other question(s) are.
See above: the respective paragraph has been changed significantly and all listed questions are removed. As suggested, we focus on spatial representativeness, whereas liquid water percolation as well as multiple radar acquisitions are supportive to assess representativeness and reach a broader impact than just singular point observations in time.

L210 -Is "accumulation pattern persistence" the same thing as inter-annual variability? The analysis is how accumulation is changing over space and time.
We have modified the language and removed the term "accumulation pattern persistence". We now describe changes within the two consecutive accumulation season observations at Dye-2.

L211 -The wording of "whether seasonal changes in accumulation due to melt and liquid water percolation have major effects on accumulation pattern" is confusing. How would there be seasonal changes in accumulation due to melt? What is meant by accumulation pattern? How accumulation would change spatially due to melt? Is the question about how meltwater influences thickness of the layer? Please clarify this.
We clarified to: Finally, we investigate how accumulation changes due to melt and liquid-water percolation.

We included a column in Table 2 with mean snow depths and included the following phrase: …distances between ranged from a few meters up to 1 km, while snow depths ranged from 0.83 m to 1.70 m.

L221 -Is it five locations in SW Greenland?
The NASA-SE site is in SE Greenland, though the EKT site could be considered to be in SW Greenland. Changed to …southern GrIS…. In addition, we included: The inclusion of two more sites close to the southern Greenland ice divide extends the data set to a low accumulation site west of the ice divide (EKT: mean bs ~300 kg/m²) and a high accumulation site east of the divide (NASA SE: mean bs ~600 kg/m²).

L247 -Is "8-10m" the scale of the wind generated surface features causing minimums?
Changed to: However, the observed minimums in bs along the south-north transect lines are at regular distances between 8-10 m and are likely the result of wind-generated surface features.

L252 -Earlier the "SWE" is referenced as "scaled accumulation", bs.N. (Section 2.3) Can you reference back to this for clarity as the variable?
Modified to: Figure 4b displays the scaled accumulation distribution (bs,N) through box plots.
L293-295 -"which represents not averaging snow depth around the snow pit". This is unclear, why would the area around the snow pit be averaged in?
Changed to: The unfiltered data, however, show a decreased representativeness with p=0.89 in 2015/16 and p=0.77 in 2016/17 for the same uncertainty range of ±10\%. Here snow depth is solely derived from the snow pit. Such values demonstrate that bs data derived simply from a snow pit without averaging snow depth for an area around the pit location will decrease the area-wide representativeness at Dye-2.

L324 -"However..." does this refer to KAN-U? Can you combine this with the previous sentence for clarity?
We included: However, we consider a probability of p≥0.8 with uncertainty of ±10\% for both study sites as a resilient estimate.

Section 3.3 -Frankly, could take out the KAN-U comparison with such a small area overlapped and not having consecutive years of data, there is no real major conclusions to be drawn here and it is not brought up again in the paper.
Changed as suggested to: At KAN-U only 0.16 km² were covered during both radar acquisitions and, consequently, we do not investigate changes in accumulation for spring 2013 and 2017. For Dye-2, we recorded radar transects for two consecutive winter accumulation seasons. However, multi-year intersecting radar transects and, hence, spatially-consistent area-wide bs estimates are reduced. The intersecting area at Dye-2 comprises roughly 1.7 km². Here, we observe a slight trend in the north-south direction for both accumulation seasons (Figure 6a and b). While the most southerly parts of the transect show above area-wide average bs values, the northern fringes are below the arithmetic mean of the area in bs. However, for both years the trends (in north to south direction) are statistically non-significant and very low at 5 kg/m² per 1 km for 2015/16 and 8 kg/m² per 1 km for 2016/17. The respective coefficients of determination of accumulation with latitude are very low as well (R² = 0.15 for 2015/16 and R² = 0.25 for 2016/17). The parallel stripes, mainly visible in Figure 6b for the southern parts, are certainly artifacts provoked by the grid design and the applied kriging. Local maximums in regular distances (150-220 m) occur along the transect line, however, the spatial extrapolation of these features is impossible due to the applied radar grid.
To quantitatively assess agreement in accumulation patterns, we used the respective normalized accumulation data and calculated the quotient. The cumulative data distribution of the quotients is presented in Figure 8. A constant area-wide quotient of 1 would imply that the normalized accumulation patterns are exactly equal. For Dye-2, the probability of data being equally distributed in May 2016 and 2017 with a given uncertainty of ±10\% is p≥0.95, meaning all intersecting locations of the accumulation pattern in two consecutive years at Dye-2 are similar.
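
The quotient-based comparison described above can be sketched in a few lines. This is an illustrative sketch only: it assumes that "normalized" means division by the area-wide mean of each season, and the function names are hypothetical rather than taken from the manuscript.

```python
def normalize(values):
    """Scale accumulation values by their area-wide arithmetic mean (assumed normalization)."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]


def pattern_agreement(acc_season1, acc_season2, tolerance=0.10):
    """Quotients of normalized accumulation at co-located pixels and the
    fraction of pixels whose quotient lies within 1 +/- tolerance."""
    n1 = normalize(acc_season1)
    n2 = normalize(acc_season2)
    quotients = [a / b for a, b in zip(n1, n2)]
    within = sum(1 for q in quotients if abs(q - 1.0) <= tolerance)
    return quotients, within / len(quotients)
```

Two seasons that differ only by a constant factor (for example, 20\% more accumulation everywhere) yield quotients of 1 at every pixel and a fraction of 1.0, because the normalization removes the difference in the mean and leaves only the spatial pattern.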

L349 -example here to explain the figure is great for clarity. Thanks

L363 -Will using a firn core instead of a density pit induce any further uncertainty?
No, but it was impossible to dig down to the end-of-summer horizon of 2015 using only snow shovels. We did not have a chain saw with us for this field campaign and, hence, collected density data from a firn core. We do not consider that the firn core introduces more uncertainty, but the method is different, which should be mentioned here.

L368 -Can you use a Delta symbol?
Corrected, we had a missing \ in the previous version. We apologize for this.

L370 -Artifacts in the sense that the GPR data from the winter accumulation was greater than the net accumulation?
Artifacts in the sense that the accumulation in September 2016 was higher than in May 2016. Due to the fact that summer melt 2016 was significantly above average in terms of area extent in surficial melt (see Heilig et al. 2018 for details), it is unlikely that for specific locations accumulation increased while the average decrease in bs is at 51 kg/m². Those artifacts most likely arise from singular outliers in kriged accumulation and are restricted to only six pixels. We included: ….are likely artifacts due to kriging outliers and errors… to clarify the sentence.

L375 -If the ice movement is known from the upGPR site, can it be corrected for?
We only have a rough location estimate from handheld GPS data. We do not consider such accuracies as adequate to correct all radar locations even though location uncertainties (5-10 m) are likely smaller than the annual ice movement (~25 m). However, accumulation values are extrapolated for 20 m by 20 m pixel sizes. It is debatable whether co-locating GPR transects would decrease discrepancies of accumulation values from May 2016 to September 2016.

Section 4 -Conclusions -This very nicely ties up the study concisely and answers the questions put forth in the opening. If some questions are taken out, needs to be revised.
Although we removed the questions from the introduction, we do not think that the conclusion section has to be changed significantly. The term "interannual persistence" was removed throughout the manuscript. So the respective paragraph in the conclusions changed to: Our results suggest that there is only little change of accumulation patterns at Dye-2 for spring 2016 and 2017. However, the data only span two consecutive accumulation seasons that were very similar in average density and accumulation. As such, we cannot confirm whether such persistence might be observed in seasons with significantly more or less accumulation or at different sites; this is a topic for future work.

Table 2. We tried to include radar units but after including coordinates as suggested there is not enough space left for radar details other than antenna frequency.

We changed the dash for the density ranges to "to". The column headline has been modified as well to clarify that density ranges are given. We kept EKT and NASA SE since they extend the presented density variation to a factor of 2 in accumulation. The respective color for KAN-U 2012/13 has been changed from green to purple to account for colorblind purposes. In case you are referring to Fig. 2, here, the red line has been changed to yellow to facilitate reading for colorblind persons.

You are certainly correct and we attempted to combine those plots into one single figure. However, since TC will be printed as a two-column paper, a smaller 1 panel plot will use less space than a 3 panel plot with a blank part underneath panel b. In Section 2.3, we included: After trend removal, we found directional anisotropy of the covariance in all of the longer transects, which means that accumulation variation varies with direction. Hence, we modeled variograms with different ranges per direction. In Table 3, we present major and minor axis of the range ellipsoid used for the variogram modeling. We included the colorbar for the elevation bands and tried to facilitate visibility of the upGPR locations.

Figure 4,6,7,9 -A personal preference is to have coordinates in lat/lon instead of UTM. If the scales do not allow though, especially for an area like Swiss camp, that is fine because it is on a few km scale.
As you mention, the respective areas are rather small and, hence, we prefer the UTM grid to remain consistent for all figures.

Other comments: The word "very" is used quite a bit throughout the manuscript as a qualifier and those instances can be removed the majority of the time. Using it does not add to the meaning of the sentences.
Thank you for this suggestion. We checked whether the usage of the word "very" was necessary in the context of each sentence and removed/changed expressions wherever useful.

We thank both referees and the associated editor for very constructive and helpful comments. There were several points raised by both referees that addressed similar or equivalent points. We listed the common points of criticism first before individual comments of each referee are considered separately. Minor changes such as typos have been incorporated in the MS without listing them here. In order to improve readability, comments by the respective referee are listed in italic, while responses and modifications in the MS are written in regular typesetting. Sentences and paragraphs being incorporated in the manuscript are listed in bold letters here and in the manuscript. To keep the manuscript up to date, we checked for recent publications and included some wherever appropriate. Within the introduction, we included Mottram et al. (2019) as another source for changes in mass loss processes and added Lewis et al. (2019) as another example of extensive ground-based radar campaigns. In addition, we exchanged the previously referenced Lewis et al. (2019)
-Both referees suggest to change the title of the manuscript. We decided to use the suggestion by Lynn Montgomery and changed the title to: Relating regional and point measurements of accumulation in southwest Greenland.
-Another point both referees criticize is the inconsistent/interchangeable usage of SWE and snow accumulation within the manuscript. Surface mass balance (SMB) is solely used (and properly introduced, L27) within the introduction. Here, SMB is defined as …sum of snow accumulation and lateral redistribution by sublimation, wind and runoff…. This specifies the usage of the term "accumulation" and the importance of determining its spatial representativeness. In the revised manuscript, we consistently have changed the terminology to snow accumulation with symbol bs and units [kg/m²].
-In addition, it has been suggested to simplify especially the section 2.3 dealing with spatial extrapolation. We now introduce terms such as variogram, nugget and anisotropy to facilitate readability of Section 2.3. Some radar terms are additionally explained as well.
-We modified the respective paragraphs in the introduction, which deal with objectives and scientific questions this work tries to answer. We fully agree that the main purpose of this manuscript is the relation of point measurements to regional accumulation. As stated by referee #1, the raised question (i) is a prerequisite to assess spatial representativeness and, hence, is removed from this listing. Since commonly applied in situ measurements of snow accumulation represent only a snapshot in time, it remains open whether accumulation patterns change with summer melt processes and are similar for two different winter accumulation seasons. We agree that the assessment of seasonal persistency cannot be properly determined with the available field data. However, since temporally continuous determinations of changes in accumulation are available and feasible in Greenland nowadays (upGPR, neutron probes), a relation of two consecutive years of data with point measurements is valuable and consequently is addressed in the results and discussion section. In addition, liquid water percolation has an effect on accumulation resulting in seasonal mass fluxes from the surface into deeper firn especially for the investigated sites within the deep percolation zone of the Greenland Ice Sheet. We changed the respective paragraph to the following statement: The aim of this work is to relate point scales to regional scales of one to several square kilometers in area to improve our understanding of the representativeness of point measurements. For this purpose, we examine snow-pit and GPR data from two sites within the percolation zone of the GrIS and one site at the equilibrium line gathered over several field seasons. For each site, we investigate density variability between measurements from up to six snow pits within an area of 4 km² made in a single season, process radar transects of up to 25 km recorded in close proximity to those snow pits, and spatially extrapolate the radar-derived accumulation to estimate area-wide accumulation variability. For temporal comparisons, we use continuous observations of accumulation and melt recorded by upGPR \citep{Heilig2018}. Our results show that spatial representativeness of snow accumulation for a point measurement (snow pit) is high but values can be affected by local wind-induced surface roughness. We recommend to apply multiple snow depth measurements at the vicinity of the pits to better assess accumulation on regional scales.
Reply to referee #2: We highly appreciate the comments raised by the referee and present a point-by-point reply for all issues raised. For improved readability and to facilitate direct responses, we sometimes subdivided comments into several paragraphs referring to similar issues. Please also note our general response at the top of this document.

This paper tries to answer the question of how representative point measurements of snow accumulation are for the larger regional scale. This subject is important, urgently needs attention, and this paper fills a void in our knowledge on the connection between the observation scale and the (regional) climate modelling scale. Scientifically, the paper is solid, and I have few methodological remarks. In terms of presentation however, I have quite a few remarks. Changing a word or sentence here or there won't fix the fact that the paper is quite tough to read.
We thank the referee for the evaluation and the overall positive assessment.

General
-Various terms are used interchangeably, without a proper definition. Snow accumulation, SMB, SWE, snowfall, snow depth. Please have a critical look at the terminology, simplify, and make uniform.
Please see the common comments above. We agree that snow accumulation and SWE were used inconsistently. Snowfall and snow depth are standing terms, all described in Fierz et al. (2009). We do not consider it necessary to introduce these terms. SMB is only used within the introduction, where it is properly introduced.

-I had to dig quite deep in my memory to connect the dots between variograms, nuggets, space invariance, isotropy and stationarity. Would it be feasible to ease the text in section 2.3?
A criticism raised by referee #1 as well. We now introduce each geostatistical term within this section. Range in the sense of correlation range is consistently referred to as correlation range from now on.

-The abstract is particularly awkward in grammar and style, as if it was the last part that was written and not checked before submission. I'll give three example sentences and how to make this readable:.
We sincerely apologize for the sloppiness of the abstract and carefully revised the entire abstract. We included all recommendations and now hope it is significantly simplified. In recent decades, the Greenland ice sheet (GrIS) has frequently experienced record melt events, which significantly affected surface mass balance (SMB) and estimates thereof. SMB data are derived from remote sensing, regional climate models (RCMs), firn cores and automatic weather stations (AWSs). While remote sensing and RCMs cover regional scales with extents ranging from 1--10~km, AWS data and firn cores are point observations. To link regional scales with point measurements, we investigate the spatial variability of snow accumulation (bs) within areas of approximately 1-4 km² and its temporal changes within two years of measurements. At three different sites of the southwestern GrIS (Swiss Camp, KAN-U, Dye-2), we performed extensive ground-penetrating radar (GPR) transects and recorded multiple snow pits. If the density is known and the snowpack dry, radar-measured two-way travel time can be converted to snow depth and bs. We spatially filtered GPR transect data to remove small scale noise related to surface characteristics. The combined uncertainty of bs from density variations and spatial filtering of radar transects is at 7--8\% per regional scale of 1-4 km². Snow accumulation from a randomly selected snow pit is very likely representative of the regional scale (with probability p=0.8 for a value within 10\% of the regional mean for KAN-U, and p>0.95 for Swiss Camp and Dye-2). However, to achieve such high representativeness of snow pits, it is required to determine the average snow depth within the vicinity of the pits. At Dye-2, the spatial pattern of snow accumulation was very similar for two consecutive years. Using target reflectors placed at respective end-of-summer-melt horizons, we additionally investigated the occurrences of lateral redistribution within one melt season. We found no evidence of lateral flow of meltwater in the current climate at Dye-2. Such studies of spatial representativeness and temporal changes in accumulation are necessary to assess uncertainties of the linkages of point measurements and regional scale data, which are used for validation and calibration of remote sensing data and RCM outputs.

-In several parts, you claim that snow accumulation should be established for an area of at least 20 x 20 m. This seems a very important implication for future field work. However, I miss the quantitative underpinning of these numbers. Why not 10 x 10, or 25 x 25 m? And how should this be done if no GPR is available? This is such a crucial part of the manuscript that I expect some more discussion of the implications on field practice.
We included an analysis of the benefit of multiple snow probings for the assessment of the mean snow depth per area. This changed a large fraction of the respective section: The above results imply that a point measurement of bs (snow pit, upGPR value, neutron probe, etc.) is representative for an area of roughly 4x4 km² at Dye-2 with a probability of p ≥ 0.9 and an uncertainty of ±10\% in case snow depth is averaged. For KAN-U, the spatial variability is slightly higher and, consequently, there is less certainty about how well a single measurement represents the surrounding area. However, we consider a probability of p ≥ 0.8 with uncertainty of ±10\% for both study sites as a resilient estimate.
To quantitatively assess the benefit of snow depth measurements in addition to a snow pit, we numerically assume a sinusoidal snow depth variation with wavelengths of 56 m (arithmetic mean of the previously presented range in wavelength for the GPR transects) and average amplitude of ±6.8 cm (the fluctuations in snow depth from arithmetic mean). Averaging multiple snow depths (with a sampling distance of 1 m) from a 20 m long probing transect results in a maximum possibly measured offset in snow depth of -20\% (amplitude decreases to 5.4 cm). A 10 m long probing line reduces the maximum offset by -6\% compared to single point measurements (6.4 cm amplitude). A 30 m long snow probing line, however, results in a decrease of maximum possible offsets by -44\% (3.8 cm amplitude). An additional cross line of probings will further decrease offsets. Only if the surface features are aligned symmetrically in both probing directions, the maximum offset derived from both lines will theoretically remain stable. For a measured snow pit with ρs=350 kg/m³ and Ls=1 m, the combined regional uncertainty (±5\% density uncertainty, ±6.8 cm snow depth variation) reduces from a single point measurement with bs = 350±42 kg/m² to a maximum possible uncertainty of bs = 350±35 kg/m² for just a single 20 m probing line. These numerical results confirm values for representativeness derived from geostatistical extrapolation. Hence, we recommend to combine a larger number of snow-depth probings within an area of at least 20 m by 20 m in the vicinity of the pits to increase the regional representativeness. Regional snow density variations of ±5\% can be accepted if snow depth uncertainty is minimized. Snow probing lines can easily be performed with relatively low time consumption compared to multiple snow pits. In particular, the wind-induced surface roughness has to be accounted for to provide spatially-representative bs values.
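
The probing-line experiment described above can be reproduced approximately with the following sketch. It is an illustration under stated assumptions rather than the authors' code: the probing line is taken to include both end points (so a 20 m line with 1 m sampling has 21 probings), the worst case is found by scanning the phase of the sinusoid in 1 degree steps, and the function name is hypothetical.

```python
import math


def max_mean_offset(amplitude, wavelength, line_length, step=1.0):
    """Largest possible deviation of the line-averaged snow depth from the
    true mean for a sinusoidal surface probed every `step` meters."""
    n = int(round(line_length / step)) + 1  # probings incl. both end points (assumption)
    positions = [i * step for i in range(n)]
    worst = 0.0
    # Scan the position of the probing line relative to the surface undulation
    for deg in range(360):
        phase = math.radians(deg)
        mean = sum(amplitude * math.sin(2 * math.pi * x / wavelength + phase)
                   for x in positions) / n
        worst = max(worst, abs(mean))
    return worst


# Amplitude 6.8 cm and wavelength 56 m as in the text; line lengths in meters
offsets = {length: max_mean_offset(6.8, 56.0, length) for length in (10, 20, 30)}
```

Under these assumptions the remaining maximum offsets come out at roughly 6.4 cm, 5.3 cm and 3.9 cm for the 10 m, 20 m and 30 m lines, which is close to the values quoted in the text; small differences depend on how the end points and the sampling are treated.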
-The title is inappropriate. My suggestion would be: "Representation of point measurements for regional-scale snow accumulation in/on the southwestern Greenland Ice Sheet."
See above, the title has been modified in accordance with Lynn Montgomery's suggestion and we believe it addresses your concerns as well.
-Throughout the paper, you seem to use rho_s mostly as a bulk parameter: a mean over a certain depth.
Can you more clearly distinguish between the actual snow density and this vertically integrated bulk density, and define the bulk density clearly?
We now specify "bulk snow density" where it first appears in the methodology section: In dry snow and firn (with two contributing volume fractions θa+θi=1), the wave propagation depends solely on the relation of air (θa) to ice volume fraction (θi) (e.g., Kovacs et al., 1995; Mätzler, 1996). Hence, with the bulk snow density (ρs, the average density of the entire snow column) measured in snow pits, we can convert from TWT to snow depth (Ls) and the amounts of bulk accumulation (bs with unit kg/m²) using the equation
Specific remarks
-Title: "South-Western". I looked it up but this should be either "southwestern Greenland" or "Southwest Greenland".
Title has been changed; see above.
-L1: significant changes. In what? See above; the abstract has been changed significantly.
-L11: "per regional scale"? Changed to: The combined uncertainty of bs from density variations and spatial filtering of radar transects is at 7-8% per regional scale of 1-4 km².
-L11: "to analyze for". To analyze is a transitive verb. Suggestion "we investigate". This recurs frequently in the text (e.g. P1L17) We thank the referee for highlighting this. We were not aware that analyze is a transitive verb. We consistently substituted "to analyze" with "to investigate" or verbs with similar meaning, where appropriate.
-L70: suggest "To improve our understanding of the representativeness of ..." Has been changed and rephrased to: Point observations, such as snow pits and ice cores are usually performed once a year at most. Such temporal snapshots limit the evaluation of spatial representativeness as they can be influenced by recent weather conditions. Hence, it is necessary to clarify whether regional accumulation patterns are consistent over more than one accumulation season to investigate if temporally continuous point measurements such as AWS data, upGPR and neutron probes remain representative.
-L71: "in area for two sites" -> " in areas around two sites" We have removed the respective sentence.
-L72: I have once been taught that one paper answers one major question. Your one major question is about representativeness of point measurements. All other questions are hurdles that you come across while answering that question. My suggestions would be to rephrase, and to formulate L70-80 such that you introduce the different steps needed to answer your "major question" with associated sections.
We rephrased the respective paragraph as listed above.
-L197: Do not start a sentence with a mathematical symbol.
Modified to: The term bs,N is simply…
-L196-206: past and present tense are used here interchangeably. Please unify.
We apologize for this. It is now unified.
-L217: why call this ice volume fraction? Suggestion to simplify these sentences: "We investigate the error that we introduce by assuming a single bulk density in the conversion from TWT to snow depth for an entire GPR transect. For that, we use a collection of snow pits, several from each of five locations, that were collected in a period of three years (table 2)." Modified to: We investigate the error that we introduce by assuming a single bulk density in the conversion from TWT to snow depth for an entire GPR transect. Hence, we determine the spatial variability in density within the respective area. Table 2 presents snow-pit data from our three study sites and two additional sites.
-L243: above average -> above-average (idem below-average) -L327: larger scale -> larger-scale -L341: north to south direction -> north-to-south direction We have not yet inserted dashes for all such phrases. Such details are very specific and treated differently depending on the style and language of each journal. If applicable such phrases will be corrected within the final editing phase with the journal directly.
- Figure 4: consider inverting the color scale. Blue = low accumulation, yellow is high accumulation Here, we respectfully disagree. We consider it being more intuitive to have high accumulation associated with blue color and low accumulation with yellow.
-L293: awkward construction. Rephrase
Modified to: The unfiltered data, however, show a decreased representativeness with p=0.89 in 2015/16 and p=0.77 in 2016/17 for the same uncertainty range of ±10%. Here snow depth is solely derived from the snow pit. Such values demonstrate that bs data derived simply from a snow pit without averaging snow depth around the pit location will decrease the area-wide representativeness at Dye-2.
-L316: Not all of the collected radar transect patterns (grids?) ... Sentence has been changed to: Not all of the recorded radar transect grids are ideal for the applied geostatistical analyses.
-L347: this sentence is not complete The whole paragraph has been modified as suggested by referee#1. KAN-U is no longer used for this analysis.
To quantitatively assess agreement in accumulation patterns, we used the respective normalized accumulation data and calculated the quotient. The cumulative data distribution of the quotients is presented in Figure 8. A constant area-wide quotient of 1 would imply that the scaled accumulation patterns are exactly equal. For Dye-2, the probability of data being equally distributed in May 2016 and 2017 with a given uncertainty of ±10% is p≥0.95, meaning all intersecting locations of the accumulation pattern in two consecutive years at Dye-2 are similar.
-L408: The conclusion about persistence is unsatisfactory, and you seem to be shifting goal posts in the manuscript. In the abstract you write that interannual accumulation patterns "are very persistent". In section 3.3, the 2016 and 2017 data are "very similar". Then in L408 you say that "results suggest persistence". I think you should refrain at all from inferring persistence based on two data points. It's ok to mention that the patterns were similar in both 2016 and 2017, but I don't think there is enough argument here to start discussing persistence.
We fully agree and weakened consistently throughout the manuscript pattern persistence to changes in accumulation pattern for 2016 and 2017 or agreement in accumulation patterns. In the conclusion, we state: Our results suggest that there is only little change of accumulation patterns at Dye-2 for spring 2016 and 2017.
1-4 km² and its temporal changes within two years of measurements. At three different sites of the southwestern GrIS (Swiss Camp, KAN-U, Dye-2), we performed extensive ground-penetrating radar (GPR) transects and recorded multiple snow pits. If the density is known and the snowpack dry, radar-measured two-way travel time can be converted to snow depth and b_s. We spatially filtered GPR transect data to remove small-scale noise related to surface characteristics.
The combined uncertainty of b_s from density variations and spatial filtering of radar transects is 7-8% for regional scales of 1-4 km². Snow accumulation from a randomly selected snow pit is very likely representative of the regional scale of 1-4 km² (with probability p = 0.8 for a value within 10% of the regional mean for KAN-U, and p > 0.95 for Swiss Camp and Dye-2). However, to achieve such high representativeness of snow pits, it is required to determine the average snow depth within the vicinity of the pits. At Dye-2, the spatial pattern of snow accumulation was very similar for two consecutive years. Using target reflectors placed at respective end-of-summer-melt horizons, we additionally investigated the occurrences of lateral redistribution within one melt season. We found no evidence of lateral flow of meltwater in the current climate at Dye-2. Such studies of spatial representativeness and temporal changes in accumulation are necessary to assess uncertainties of the linkages of point measurements and regional-scale data, which are used for validation and calibration of remote sensing data and RCM outputs.

Introduction
Numerous recent studies have documented a continuous mass loss from the Greenland ice sheet (GrIS) using remote sensing data and/or estimates from model simulations (e.g., Shepherd et al., 2012; Velicogna et al., 2014; Khan et al., 2015; van den Broeke et al., 2016; Sørensen et al., 2018; Mouginot et al., 2019). From 1980 to 2018, mass loss from the GrIS increased by a factor of six (Mouginot et al., 2019), and over the last two decades the major mass loss process has changed from solid ice discharge to surface mass balance (SMB) (Enderlin et al., 2014; van den Broeke et al., 2016; Mottram et al., 2019). SMB can be regarded as the sum of snow accumulation (b_s) and lateral redistribution by sublimation, wind and runoff. Depending on the location, lateral redistribution can increase SMB as well as decrease it. Over most of the GrIS, net accumulation is the dominating factor for SMB (Koenig et al., 2016), while recent negative trends in SMB are related to surface melt and runoff (Vaughan et al., 2013). Despite all advances, SMB estimates remain a major source of uncertainty in ice-sheet mass-balance calculations (van den Broeke et al., 2009). This is because surface mass fluxes, such as snowfall and melt, cannot be measured by remote-sensing technology and derived estimates on snowfall can still have significant errors (Bennartz et al., 2019). Hence, predictions of SMB are usually obtained using scarce in situ measurements together with regional climate models (RCMs), which can introduce significant uncertainties (Vernon et al., 2013) as well. Different scales between in situ observations and simulations may also contribute to these uncertainties. The spatial resolution of RCMs and remote sensing data is limited to regional scales (on the order of one to tens of square kilometers), while in situ observations cover point data (on the order of a few square meters or less).
Effects of wind redistribution, for instance, are leveled out for regional scales but can have significant influences at point scales. As a consequence, evaluation and validation of regional-scale data products using in situ data is difficult without knowledge of the spatial extent and representativeness of the point measurements. To date, only a few studies have investigated how representative point observations (e.g., snow pits, firn cores, mass-balance-stake readings, automatic weather station [AWS] measurements) are of the surrounding several square kilometers.
Within the last decade, several studies have used radar systems to quantify accumulation variability in Greenland by tracking internal reflection horizons (IRHs) (e.g., Dunse et al., 2008; Miège et al., 2013; Hawley et al., 2014; Karlsson et al., 2016; Koenig et al., 2016; Lewis et al., 2017, 2019). While those studies aimed to track IRH variability using data from long ground transects of roughly 100 km (Miège et al., 2013) to more than 1000 km (Hawley et al., 2014) length or using airborne radar to calculate accumulation rates with a stated uncertainty of 14%, and they compared their results to outputs from an RCM.
They compare radar-derived accumulation to two sites with core data, but the locations of those sites are up to 8 km away from the radar track. Hence, it is not possible to identify whether mismatch between the core- and radar-derived accumulations is due to spatial variability or to assumptions in radar-data processing. Systematic offsets in b_s between radar data and RCM outputs, however, occur in northern Greenland with discrepancies between RCMs and radar up to 30% (Karlsson, personal communication). Other recent studies attempt to relate point observations of melt events within the percolation zone of the GrIS with annual atmospheric patterns (Graeter et al., 2018) or determine the mass of percolating liquid water and compare percolation depths observed by upward-looking radar (upGPR) with temperature records in snow and firn (Heilig et al., 2018).

In addition, several studies have quantified temporal accumulation variability using ice core records (e.g., Mosley-Thompson et al., 2001; Vandecrux et al., 2019). Since quantification of spatial representativeness of single point measurements for the surrounding square kilometers has only been conducted for one location in western Greenland so far (Dunse et al., 2008), there is a need to explore uncertainties at local and regional scales. The best means of resolving these uncertainties are to increase the spatial coverage of direct measurements (Farinotti et al., 2014) and to improve our understanding of how well point measurements represent a larger area.
Point observations, such as snow pits and ice cores, are usually performed once a year at most. Such temporal snapshots limit the evaluation of spatial representativeness as they can be influenced by recent weather conditions. Hence, it is necessary to clarify whether regional accumulation patterns are consistent over more than one accumulation season to investigate if temporally continuous point measurements such as AWS data, upGPR and neutron probes remain representative.
Meltwater percolation can move mass from snow to the underlying firn (e.g., Charalampidis et al., 2016; Humphrey et al., 2012; Heilig et al., 2018) or even laterally along the surface slope (Humphrey et al., 2012). Hence, surface melt affects SMB (e.g., Sasgen et al., 2012) and accumulation (Heilig et al., 2018). However, it is unlikely that water percolation and mass redistribution are homogeneous over regional scales. Consequently, it is necessary to assess the impact of melt on temporal changes in accumulation distribution for the percolation zone of the GrIS.
The aim of this work is to relate point scales to regional scales of one to several square kilometers in area to improve our understanding of the representativeness of point measurements. For this purpose, we examine snow-pit and ground-penetrating radar (GPR) data from two sites within the percolation zone of the GrIS and one site at the equilibrium line gathered over several field seasons. For each site, we investigate density variability between measurements from up to six snow pits within an area of 4 km² made in a single season, process radar transects of up to 25 km recorded in close proximity to those snow pits, and spatially extrapolate the radar-derived accumulation to estimate area-wide accumulation variability. For temporal comparisons, we use continuous observations of accumulation and melt recorded by upGPR (Heilig et al., 2018). Our results show that spatial representativeness of snow accumulation for a point measurement (snow pit) is large but values can be affected by local wind-induced surface roughness. We recommend taking multiple snow depth measurements in the vicinity of the pits to better assess accumulation on regional scales.

Test site, instrumentation and data processing
We collected radar data along transects at three different locations on the southwestern GrIS over several years (Figure 1, Table 1). The sites were visited in spring of each year (see Table 1). At Swiss Camp a small transect was measured in May 2015
All recorded radar traces were processed in a very similar way. In case first arrivals were delayed by more than approximately 2 ns, we started with a correction for the DC shift. We corrected offsets in the zero line of each radar trace (wow) utilizing a dewow function and filtered low (approximately below 0.5 times the center frequency) and high frequency noise (approximately above 1.5 times the center frequency) applying bandpass filters. We further applied background removals to minimize disturbing effects from the direct wave and antenna ringing. For all radar transects, we corrected for divergence losses by gain functions and interpolated to equidistant traces. The zero-crossings of the snow surface reflections were corrected to be at time zero.
The measured quantity of radar transects is the two-way travel time (TWT, with mathematical symbol τ) from the transmitter to the reflector and back to the antennas (e.g., Heilig et al., 2018). In dry snow and firn (with two contributing volume fractions θ_a + θ_i = 1), the wave propagation depends solely on the relation of air (θ_a) to ice volume fraction (θ_i) (e.g., Kovacs et al., 1995; Mätzler, 1996). Hence, with the bulk snow density (ρ_s, the average density of the entire snow column) measured in snow pits, we can convert from TWT to snow depth (L_s) and the amounts of bulk accumulation (b_s with unit kg/m²) using the equation
The ice density (ρ_i = 917 kg/m³), the exponent β = 0.5 (related to a medium with random orientation at the micro scale), the speed of light in vacuum (c) and the relative dielectric permittivity of ice (ε_i = 3.18) are constants taken from previous literature (e.g., Heilig et al., 2018). The reflections of the previous end-of-melt-season (EMS) horizons are clearly detectable in all radargrams. We relate internal reflecting horizons (IRHs) to depths at pit locations using the measured bulk snow density.
However, before applying a constant ρ_s over the entire length of the radar transects, one has to investigate the spatial heterogeneity in ρ_s over an area of comparable size. To accomplish this, we dug several snow pits at Dye-2 in May 2015 and Table 2 for details). The snow pits were dug at various distances from each other, at maximum up to 1 km apart. In addition to locations where we collected radar data, we also investigated spatial variability in ρ_s at two more sites, EKT and NASA SE (Figure 1). As these two sites are located within a distance of 45-60 km of the GrIS ice divide (EKT west of the divide and NASA SE east of the divide, see Figure 1), they extend our data analysis of spatial variability of ρ_s to the dry-snow zone. The recorded pits at NASA SE provide data for a high accumulation site as well.
Table 2

Transect data analysis
The measured TWT of the GPR data are influenced by small-scale surface roughness and vertical time sampling. Wind-induced surface features, such as sastrugi, appear in 2-D radar transects as discontinuous, erratic noise. Ideally, we would have performed radar surveys on high-resolution grids (i.e. with spacing smaller than the characteristic length of the features) to spatially extrapolate such features to the non-surveyed areas. However, it was not possible to conduct such high-resolution surveys in the one to two days available at our sites. Instead, we apply spatial smoothing to minimize artifacts from vertical sampling and to remove wind-induced surface-feature noise.
The time sampling of the recorded GPR transects ranges from 0.05 ns per sample (Swiss Camp 2015) to 0.24 ns per sample (Dye-2 2017), corresponding to approximately 0.006 m and 0.028 m per sample respectively. For the longer transects at KAN-U and Dye-2 (Table 1), the vertical sampling is always coarser than 0.1 ns/sample. As displayed in Figure 2, the raw radar data for these transects are continuously fluctuating by ±1 sample (corresponding to roughly ±3 cm). Such effects are caused by amplitude clipping of the signal response and uncertainties of the zero-crossing as a consequence of the vertical sampling.
For each radar trace, we consistently picked the first strong positive half cycle and shifted the first break upwards to match the zero-crossing. However, due to a vertical sample interval of 0.25 ns, it is likely that the strongest amplitudes shift by 1-2 samples for consecutive radar traces. To reduce effects caused by the amplitude shifts, in our (lower resolution) Dye-2 data, we applied a Savitzky-Golay filter (Savitzky and Golay, 1964) with large scale GPR transects as well (e.g., Lewis et al., 2019). We use the smoothed data for spatial extrapolation.

Spatial extrapolation
In order to analyze accumulation patterns over a larger area, it is necessary to extrapolate the data gathered along the radar transects. One radar trace provides a single depth estimate to a specific reflector. Combining GPR-derived snow accumulation transects with geostatistical techniques is a powerful method to model spatial occurrences of continuous subsurface features. Similar combinations of geophysical and stochastic techniques have been used in previous research (e.g., Rea and Knight, 1998; Tercier et al., 2000). The benefit of radar data is that numerous data pairs for a wide range of measurement distances are recorded, enabling more constrained experimental variograms. Webster and Oliver (2007) state that sample size is directly related to the precision of variogram estimates, while variograms are used to estimate the variance of a parameter (here snow accumulation) at increasing intervals of distance in between measurements and in multiple directions. Before spatial extrapolation of a data parameter, the data must fulfill several prerequisites: data have to be spatially continuous and spatially correlated within a specific distance, and the expected mean and variance of the data should be invariant in space (e.g., Rea and Knight, 1998). We used experimental variograms to investigate spatial correlation. Snow accumulation at the surveyed sites is spatially continuous (accumulation occurred everywhere within the area of interest, governed by local weather conditions). To ensure that mean and variance are invariant, we investigated trends in X- and Y-direction separately and subtracted these trends before further analysis. At Dye-2 and KAN-U, we discovered accumulation trends in both X- and Y-directions over the distances surveyed, while at Swiss Camp, we found a simple one-dimensional trend.
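The detrending and experimental-variogram steps described above can be illustrated with a short Python sketch. It is a simplified one-dimensional (along-track) illustration and not the ArcGIS workflow used for the analysis: the function names, the lag binning and the treatment of the trend as a single least-squares line are assumptions made for this example. The semivariance uses the classical estimator, half the mean squared difference of all point pairs falling into a lag bin.

```python
from collections import defaultdict


def remove_linear_trend(x, z):
    """Subtract a least-squares straight line z ~ a + b*x and return residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z)) / sxx
    return [zi - (mean_z + slope * (xi - mean_x)) for xi, zi in zip(x, z)]


def experimental_variogram(x, z, lag_width, max_lag):
    """Classical semivariance per lag bin.

    Bin k collects pairs with separation h in [k*lag_width, (k+1)*lag_width).
    Returns {k: (semivariance, number_of_pairs)} for separations below max_lag.
    """
    squared_sums = defaultdict(float)
    pair_counts = defaultdict(int)
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            h = abs(x[j] - x[i])
            if h >= max_lag:
                continue
            k = int(h // lag_width)
            squared_sums[k] += (z[j] - z[i]) ** 2
            pair_counts[k] += 1
    return {
        k: (squared_sums[k] / (2.0 * pair_counts[k]), pair_counts[k])
        for k in sorted(pair_counts)
    }
```

In such a sketch, the lag at which the semivariance levels off would correspond to the correlation range. Handling directional anisotropy, as reported for the longer transects, would require binning pairs by direction as well as by distance.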
For spatial extrapolation of the univariate parameter snow accumulation, we use ordinary kriging, which is the most robust and most commonly used method (Webster and Oliver, 2007). Ordinary kriging requires normal distribution of the data. Figure 3 displays the probability distributions of all five radar transects. If the distribution (plotted crosses) follows the straight line, the data are normally distributed. At least 10-80% of data match normality for all five GPR transects, and, consequently, no data transformation is applied. We used the Geostatistical Analyst toolbox in ArcGIS 10.4.1 to perform the kriging.
After trend removal, the next step in ordinary kriging is to simulate variograms, which adequately mimic the cal-
We present such accuracy assessments in Table 3. In addition, we found directional anisotropy of the covariance in all of the longer transects, which means that correlation ranges of accumulation vary with direction. Hence, we modeled variograms with different correlation ranges per direction. The correlation range marks the limit in distance of point pairs for being spatially dependent. Major and minor axes of the correlation range ellipsoid used for the variogram modeling are given in Table 3 as well. Swiss Camp is an exception and can be modeled simply by an isotropic variogram. At Dye-2, a spherical variogram model provided highest prediction accuracies while at KAN-U and Swiss Camp, the usage of stable variogram modeling resulted in lowest mean prediction errors and best RMS standardized prediction offsets.
The presented correlation ranges in Table 3 represent the direction-wise major extrapolation range. Nugget effects (de-
To assess the distribution and spatial representativeness of the data, we calculate normalized accumulation values (b_s,N) and normalized cumulative probability distributions. Normalized accumulation is computed such that the individual kriged accumulation value (b_s) is divided by the mean kriged accumulation per site and campaign (b̄_s): b_s,N = b_s / b̄_s.

In Figures 4b, 6c and 7c, data distributions of b_s,N are displayed as box plots with the whiskers set to the 5% and 95% percentiles respectively. Using the recorded radar traces, we determine whether any randomly located point measurement such as a snow pit is representative of the entire extrapolated area. We average all radar traces within a radius of 1 m around each radar trace (which represents a standard pit size) and scale this data point by the mean of the kriged output for the same campaign. Data distribution for each campaign including filtered and sampling-corrected data (see Section 2.2) are presented to describe offset dependencies. At KAN-U for the 2012/13 data, we increase the assumed pit size to an area with 2 m radius because of more sparse horizontal data resolution (1.5 m in between traces). Corner locations of radar transects with less than four (three for KAN-U 2012/13) neighboring traces within the respective search radius are excluded.
Table 3. Kriging results with description of correlation ranges for the major and minor axis used in the variogram modeling, the resulting mean prediction error (pred. err.) and the root mean square (RMS) standardized prediction error.
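The normalization and the pit-scale representativeness check described above can be sketched as follows. This is a minimal Python illustration under simplifying assumptions: traces are located by along-track distance only (the actual data are two-dimensional), and all function names and the numbers in the usage note are hypothetical.

```python
def normalize(values):
    """Divide each accumulation value by the mean of all values."""
    mean_value = sum(values) / len(values)
    return [v / mean_value for v in values]


def pit_scale_values(distance_m, b_s, radius_m=1.0, min_neighbors=4):
    """Average all traces within radius_m of each trace (a virtual snow pit).

    Traces with fewer than min_neighbors other traces inside the search
    radius are skipped, mimicking the exclusion of transect corners.
    """
    averaged = []
    for d0 in distance_m:
        window = [b for d, b in zip(distance_m, b_s) if abs(d - d0) <= radius_m]
        if len(window) - 1 >= min_neighbors:  # do not count the trace itself
            averaged.append(sum(window) / len(window))
    return averaged


def fraction_within(normalized, tolerance=0.10):
    """Share of normalized values within +/- tolerance of 1."""
    hits = sum(1 for v in normalized if abs(v - 1.0) <= tolerance)
    return hits / len(normalized)
```

For example, for made-up accumulation values of 285, 300, 310, 295, 350 and 260 kg/m², `fraction_within(normalize(values))` gives the share of points within ±10% of their mean, which is the kind of probability p reported per site and campaign.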

Results and Discussion
We first discuss errors associated with converting measured TWT to accumulation because understanding these errors is essential for assessing how representative a single point observation, such as a snow pit, is of a larger area; we present that assessment in Section 3.2. We then evaluate whether accumulation patterns over two consecutive years at Dye-2 are different. Finally, we investigate how accumulation changes due to melt and liquid-water percolation. Such effects could be caused by strong lateral differences in melt or lateral flow of meltwater. In the following, to distinguish between offsets, deviations from mean and data distribution, we will describe offsets, deviations and uncertainties of b_s values in percentage (%) and data distribution as probability values of 0-1.

Error in travel time to accumulation conversion
We investigate the error that we introduce by assuming a single bulk density in the conversion from TWT to snow depth for an entire GPR transect. Hence, we determine the spatial variability in density within the respective area. Table 2 presents snow-pit data from our three study sites and two additional sites. The data were collected over three years, and the distances between pits ranged from a few meters up to 1 km, while snow depths ranged from 0.83 m to 1.70 m. The inclusion of two more sites close to the southern Greenland ice divide extends the data set to a low accumulation site west of the ice divide (EKT: b_s ∼ 300 kg/m²) and a high accumulation site east of the divide (NASA SE: b_s ∼ 600 kg/m²). The range in density variation from the mean ρ_s in Table 2, independent of distances in between pits, does not exceed −6 to +5% for nine snow pit campaigns in total, at five different locations for the southern GrIS. Calculated range averages for the last column in Table 2 are −3.7 to +3.1%. We thus consider ±5% variation in average density to be a robust and conservative estimator of uncertainty within areas of several square kilometers for these regions. This corresponds well with observations by Proksch et al. (2016), who derived a mean measurement uncertainty for density of 2-5%.
Uncertainty in ρ_s results in only a small uncertainty in the derived L_s: ρ_s factors into the conversion of τ to L_s as a fraction within the denominator (Equation 2). For our measured TWTs, a ±5% variation in ρ_s leads to a 0.7-1.4% uncertainty in L_s for bulk ρ_s values of 200-450 kg/m³. Additional uncertainty in L_s is introduced by the smoothing applied to the larger transects.
The average RMS deviation in snow depth of the smoothed transects from the sample-corrected transects at Dye-2 and KAN-U is 4.5 cm (5-6%). Combining the errors due to smoothing of radar traces and using a mean density for processing radar transects with observed ρ_s variations using Equation 1 leads to an average uncertainty in b_s of 7.0-7.9%. This uncertainty is significantly smaller than discrepancies between RCM simulations and Operation IceBridge airborne radar determinations (16%) (Koenig et al., 2016) and smaller than measured relative standard deviations in density observed within the same study (12%). However, to increase the robustness of accumulation estimates and to decrease effects of spatial extrapolation, we consider an estimated maximum uncertainty of 10% in b_s determined from radar data as a conservative estimate for regional catchments of 1-5 km².
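The density sensitivity quoted above can be checked with a short Python sketch of the TWT-to-depth conversion. The sketch assumes the standard two-phase mixing relation ε_s^β = 1 + (ρ_s/ρ_i)(ε_i^β − 1) for dry snow together with a wave speed of c/√ε_s; this form is an assumption consistent with the constants given in the methodology (ρ_i = 917 kg/m³, ε_i = 3.18, β = 0.5) and is not copied from the original equations. Function names are hypothetical.

```python
C_VACUUM = 0.299792458  # speed of light in vacuum, m/ns
RHO_ICE = 917.0         # ice density, kg/m^3
EPS_ICE = 3.18          # relative dielectric permittivity of ice
BETA = 0.5              # mixing exponent


def wave_speed(rho_s):
    """Radar wave speed (m/ns) in dry snow of bulk density rho_s (kg/m^3)."""
    theta_i = rho_s / RHO_ICE  # ice volume fraction
    eps_s_beta = 1.0 + theta_i * (EPS_ICE ** BETA - 1.0)
    sqrt_eps_s = eps_s_beta ** (1.0 / (2.0 * BETA))
    return C_VACUUM / sqrt_eps_s


def snow_depth(twt_ns, rho_s):
    """Snow depth L_s (m) from two-way travel time (ns)."""
    return 0.5 * twt_ns * wave_speed(rho_s)


def accumulation(twt_ns, rho_s):
    """Bulk accumulation b_s (kg/m^2) = rho_s * L_s."""
    return rho_s * snow_depth(twt_ns, rho_s)


def depth_sensitivity(rho_s, rel_change=0.05):
    """Largest relative change in L_s for a +/- rel_change change in rho_s."""
    base = wave_speed(rho_s)
    low = wave_speed(rho_s * (1.0 - rel_change))
    high = wave_speed(rho_s * (1.0 + rel_change))
    return max(abs(low / base - 1.0), abs(high / base - 1.0))
```

Under these assumptions, `depth_sensitivity(200.0)` and `depth_sensitivity(450.0)` fall close to the 0.7% and 1.4% stated in the text, and `snow_depth(0.24, 350.0)` is close to the 0.028 m per sample given for the coarsest time sampling (the density of 350 kg/m³ is an assumed example value).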

Dye-2 and KAN-U
For the much longer radar transects at Dye-2 and KAN-U, we filtered out wind-induced surface variabilities of the radar traces to increase spatial extrapolation with enlarged variogram ranges from 10-30 m to 50-270 m (Table 3). Such filtering implies spatial smoothing of surface roughness, which could be performed in the field by extensive snow-depth probings. Later in this section, we present comparisons for spatial representativeness of filtered and non-filtered GPR data. In 2016 and 2017, the radar transects were designed to follow the prevailing wind direction to better assess systematic inhomogeneities for Dye-2 and KAN-U in 2017 (see Figures 6, 7 and A1b and c). The box plots in Figure 6c represent the same quantiles as in Figure 4b. Data distribution for Dye-2 in 2015/16 is very homogeneous with an IQR of only ±2.5%. The whiskers for the same year reach ±6%. Hence, b_s in May 2015/16 varies only little with p > 0.9 of data within the uncertainty margins of ±10%. Since already more than 95% of radar-derived b_s follow a normal distribution (Figure 3), values of extrapolated b_s have a high distribution symmetry as well. We observe slightly less homogeneity in the subsequent year at Dye-2. Here, the IQR increases to ±3%, with 5% and 95% percentiles being slightly of p = 0.9 that all extrapolated 20 m by 20 m pixels range from 275-311 kg/m². In May 2017, extrapolated b_s values for an area of 4 km² are at 266-326 kg/m² with a likelihood of p > 0.9.
The normalized cumulative probability distributions in Figure 6d demonstrate how representative a randomly located snow pit would be for the entire surveyed area. We analyzed both the sample-resolution-corrected radar data (dotted lines) and the filtered data (solid lines, see Section 2.2). The filtered data in Figure 6d indicate that b_s measured in a snow pit anywhere
Not all of the recorded radar transect grids are ideal for the applied geostatistical analyses. The distances between radar lines at Dye-2 and KAN-U in May 2017 are too large to allow interpolation between the lines. We had limited time available for radar surveys, and we chose to focus on surveying larger areas (up to 20 km²) instead of only surveying dense grids. The results because no such inhomogeneities exist within the areas of good spatial coverage.
The above results imply that a point measurement of b_s (snow pit, upGPR value, neutron probe, etc.) is representative for an area of roughly 4x4 km² at Dye-2 with a probability of p ≥ 0.9 and an uncertainty of ±10% in case snow depth is averaged. For KAN-U, the spatial variability is slightly higher and, consequently, there is less certainty about how well a single measurement represents the surrounding area. However, we consider a probability of p ≥ 0.8 with uncertainty of ±10% for both study sites as a resilient estimate.
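To quantitatively assess the benefit of snow depth measurements in addition to a snow pit, we numerically assume a sinusoidal snow depth variation with a wavelength of 56 m (arithmetic mean of the previously presented range in wavelength for the GPR transects) and an average amplitude of ±6.8 cm (the fluctuations in snow depth from the arithmetic mean). Averaging multiple snow depths (with a sampling distance of 1 m) from a 20 m long probing transect reduces the maximum possible offset in snow depth by 20% (amplitude decreases to 5.4 cm). A 10 m long probing line reduces the maximum offset by 6% compared to single point measurements (6.4 cm amplitude). A 30 m long snow probing line, however, decreases the maximum possible offset by 44% (3.8 cm amplitude). An additional cross line of probings will further decrease offsets. Only if the surface features are aligned symmetrically in both probing directions will the maximum offset derived from both lines theoretically remain unchanged. For a measured snow pit with ρ_s = 350 kg/m³ and L_s = 1 m, the combined regional uncertainty (±5% density uncertainty, ±6.8 cm snow depth variation) reduces from a single point measurement with b_s = 350±42 kg/m² to a maximum possible uncertainty of b_s = 350±35 kg/m² for just a single 20 m probing line. These numerical results confirm values for representativeness derived from geostatistical extrapolation. Hence, we recommend combining a larger number of snow-depth probings within an area of at least 20 m by 20 m in the vicinity of the pits to increase the regional representativeness. Regional snow density variations of ±5% can be accepted if snow depth uncertainty is minimized. Snow probing lines can easily be performed and take little time compared to multiple snow pits. In particular, the wind-induced surface roughness has to be accounted for to provide spatially representative b_s values.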
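The probing-line argument above (a sinusoidal snow depth variation with 56 m wavelength and ±6.8 cm amplitude, probed every 1 m) can be reproduced with a few lines of Python. The sketch below scans the position of the probing line relative to the sinusoid and reports the largest possible offset of the line average from the true mean. The exact sampling convention of the original calculation is not stated; the sketch assumes probings at both end points of the line (21 probings for a 20 m line), and the function name is hypothetical.

```python
import math


def max_mean_offset(line_length_m, spacing_m=1.0, wavelength_m=56.0,
                    amplitude_cm=6.8, phase_steps=3600):
    """Largest absolute offset (cm) of the averaged probing line from the mean.

    The snow surface is modeled as amplitude * sin(2*pi*x/wavelength); the
    start of the probing line is shifted through one full wavelength.
    """
    n_points = int(round(line_length_m / spacing_m)) + 1  # both ends probed
    k = 2.0 * math.pi / wavelength_m
    worst = 0.0
    for step in range(phase_steps):
        phase = 2.0 * math.pi * step / phase_steps
        mean_depth = sum(
            amplitude_cm * math.sin(k * i * spacing_m + phase)
            for i in range(n_points)
        ) / n_points
        worst = max(worst, abs(mean_depth))
    return worst
```

With these assumptions, `max_mean_offset(10.0)`, `max_mean_offset(20.0)` and `max_mean_offset(30.0)` agree with the quoted amplitudes of 6.4 cm, 5.4 cm and 3.8 cm to within about 0.1 cm, and a line length of 0 m returns the full single-point amplitude of 6.8 cm.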
Averaging radar traces within a 1 m radius results in a pit size of roughly 3 m². This is slightly larger than a conventional pit dug in snow with an average depth of 1 m. However, the search radius is related to the horizontal data resolution of the radar traces and had to be further increased for the KAN-U site in 2012/13.

Interannual changes in accumulation patterns
At KAN-U, only 0.16 km² were covered during both radar acquisitions and, consequently, we do not investigate changes in accumulation between spring 2013 and 2017. For Dye-2, we recorded radar transects for two consecutive winter accumulation seasons. However, multi-year intersecting radar transects and, hence, spatially consistent area-wide b s estimates are limited.

The intersecting area at Dye-2 comprises roughly 1.7 km². Here, we observe a slight trend in the north-south direction for both accumulation seasons (Figure 6a and b). While the southernmost parts of the transect show b s values above the area-wide average, the northern fringes are below the arithmetic mean of the area. However, for both years the trends (in north to south by the grid design and the applied kriging. Local maxima occur at regular distances (150-220 m) along the transect line; however, spatial extrapolation of these features is impossible because of the applied radar grid.
To quantitatively assess agreement in accumulation patterns, we used the respective normalized accumulation data and calculated the quotient. The cumulative data distribution of the quotients is presented in Figure 8. A constant area-wide quotient  the upGPR site (Heilig et al., 2018). It is likely that the seasonal mass flux is not homogeneous over the investigated area. In addition, the increased variability is in part due to mismatches in co-locating transects due to the ice movement. However, the mean change in b s during summer 2016 corresponds almost exactly with observations derived from the upGPR (Heilig et al., 2018), which is 50.9 kg/m² from 01 May 2016 until the end of the melting period. This may be a coincidence or a confirmation of the benefits of upGPR, which averages a surface area of up to 10 m² compared to the 1-3 m² area of a snow pit.
We cannot identify trends in b s over the summer melt in 2016 associated with elevation; there are large differences within the same elevation band (Figure 9a). This implies that (i) no lateral redistribution of mass can be observed at Dye-2 during snow and firn melt and (ii) melt and seasonal mass fluxes are much more inhomogeneous than the accumulation distribution.
These conclusions support the assumption made by Heilig et al. (2018) that in the current climate there is no systematic lateral mass redistribution during the melt season at Dye-2. We also measured b s in snow pits near the upGPR at Dye-2 in May 2018 and 2019. Although accumulation measured in May 2016 and May 2017 was very similar, the 2018 and 2019 data deviate strongly (Table 4). In 2018, b s was more than 20% higher than in the previous two accumulation seasons. The accumulation measured in May 2019 was the lowest of the four years by a significant margin: 40% lower than the previous season and 23% lower than the next-lowest season (2017). This interannual accumulation variability is larger than the ±10% uncertainty with which a b s point measurement can be derived from radar data and usually represents the surrounding area. In agreement with Koenig et al. (2016), we conclude that annual or more frequent density and b s observations are necessary to estimate mean accumulation rates per region correctly. When snow depth is measured and averaged over an area of roughly 20 × 20 m, the value provides a reliable estimate of accumulation on regional scales of 1-20 km². Such data can be used for airborne radar campaigns and for validation of RCM simulations.

Conclusions
This study investigated how representative single point observations of b s , such as snow pits, are for surrounding areas of 400 m² to 4 km². We used GPR to track IRHs created by summer melt surfaces along transects at three sites on the southwestern GrIS over the course of several field seasons. We derived maps of snow accumulation variability and compared them to snow pit and upGPR measurements. We found an uncertainty in radar-derived accumulation of 7-8%, which results from neglecting density variations along the radar transect and from applying a smoothing algorithm to minimize surface variability and layer-picking errors. In addition, we investigated the persistence of spatial patterns in accumulation over consecutive years and the influence of melt on an annual firn layer.

At all three sites, we found that point measurements such as snow pits represent the average b s well over the study areas.
A randomly selected snow pit location at any of the three sites would provide b s values for the surrounding area (i.e. within May 2016). These likelihoods are independent of the size of the investigated areas. However, not measuring and averaging snow depth over an area of at least 20 × 20 m decreases the probability of matching the arithmetic mean by at least 10%. Snow-density variability is usually below ±5% on regional scales (1-4 km²), while snow depth can vary significantly because of surface features such as dunes and sastrugi, with wavelengths ranging from less than a meter up to 60 m and more.
Our results suggest that accumulation patterns at Dye-2 changed little between spring 2016 and 2017. However, the data only span two consecutive accumulation seasons that were almost identical in average density and accumulation. As such, we cannot confirm whether such persistence might be observed in seasons with significantly more or less accumulation or at different sites; this is a topic for future work.
We also investigated the mass change that an accumulation layer (end of melt season to May) undergoes during the summer melt season using the GPR-transect data and continuous melt and accumulation observations from upGPR. We conclude that temporal changes in firn layer mass detected by the upGPR are representative of larger (∼1 km²) areas at Dye-2. We did not detect any patterns in summer melt along flowlines, suggesting that lateral meltwater flow at Dye-2 is not significantly

Figure A1. Prevailing wind distribution at Swiss Camp (a), Dye-2 (b) and KAN-U (c).