Comment on tc-2021-94


In this study, the authors investigate the potential of surface elevation change and surface velocity data obtained from repeated UAV surveys in combination with ice thickness data to produce high-resolution ice flux divergence and surface mass balance maps of the ablation zone of alpine valley glaciers. This contribution is very welcome as it introduces an approach to determine spatial variations of the glacier mass balance, which can hardly be assessed by stake measurements alone. In view of the ever-increasing number of high-resolution topographic datasets, UAV-based surface mass balance investigations seem to be a viable complement to glaciological mass balance observations. Moreover, the described method might also be useful to investigate the surface mass balance of glaciers that are difficult to access. Whether the UAV-based approach (especially when relying on GCPs rather than on RTK) is less time-consuming than classical ablation stake measurements is, however, questionable.
The manuscript is well-structured and the methods are clearly described. However, I have some remarks and questions, mainly concerning the presented UAV data. I have tried not to repeat the reviewers' comments, but there may still be some overlaps.

General comments:
To my knowledge, the high-resolution UAV-based topographic datasets of the ablation area of the Morteratsch-Pers glacier complex have not been presented elsewhere before. Therefore, I would have expected a more rigorous accuracy assessment and a more detailed description of the datasets, although I don't doubt that the datasets are generally of high quality.
-Did you experience any difficulties or problems during the aerial surveys that should be considered by other groups when applying this method in the future?
-On which days were the aerial surveys performed? Did the illumination conditions (e.g. cloud cover) change during the 4-6 day fieldwork period, and did this affect the image processing?
-Can you estimate the melt rate and surface lowering during these days? If 4 to 6 days passed between the aerial surveys and the melt rate was in the order of 3-4 cm day⁻¹, this would translate into a surface elevation change in the order of 12 to 24 cm (if ice flow is ignored). Did this affect the image processing and the generated digital surface models in any way?
-How were the GCPs distributed across the ablation area? Could you include the position of the GCPs in one of the overview maps (e.g. Fig. 2), at least exemplarily for one year?
-How many of the GCPs were used for "calibration" and how many for "validation" (GVPs)? It would be better to distinguish between GCPs and GVPs from the beginning.
-I think the major drawback is that the "stable terrain" outside the glacier area was not considered to assess the accuracy of the DSMs. You mentioned that the vertical accuracy of the Trimble 7 GeoXH RTK GPS used to measure the GCPs is in the order of 20-30 cm. As stated in Table 2, the mean absolute error (MAE) of each DSM is less than 10 cm. This means that the DSMs are self-consistent and very accurate (at least relative to the considered GCPs), but the MAE does not tell you anything about the xyz-offset between the DSMs of the different years. Therefore, I suggest comparing the DSMs over stable terrain (in case you covered such an area during your surveys).
-In 2020, you used a UAV with RTK. I assume that in this case you considered the distributed GCPs only for validation. Is this correct?
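To illustrate the stable-terrain comparison suggested above, here is a minimal sketch of how the vertical offset between two DSMs could be summarised. It assumes that both DSMs are already resampled to the same grid and that a boolean mask of off-glacier stable terrain is available. The function name, the variable names and the synthetic data are purely hypothetical and not taken from the manuscript.

```python
import numpy as np

def stable_terrain_offset(dsm_a, dsm_b, stable_mask):
    """Summarise the vertical offset (dsm_b - dsm_a) over stable terrain.

    dsm_a, dsm_b : 2-D arrays on the same grid (elevations in metres)
    stable_mask  : boolean 2-D array, True where terrain is assumed stable
    """
    diff = (dsm_b - dsm_a)[stable_mask]
    diff = diff[np.isfinite(diff)]            # drop no-data cells
    median = float(np.median(diff))
    # Normalised median absolute deviation: a robust spread estimate
    nmad = float(1.4826 * np.median(np.abs(diff - median)))
    return {"n": int(diff.size), "median": median,
            "mean": float(diff.mean()), "nmad": nmad}

# Synthetic example (hypothetical numbers): the second DSM is shifted by
# 0.25 m and carries 0.05 m of random noise.
rng = np.random.default_rng(0)
dsm_2019 = rng.normal(2000.0, 50.0, size=(100, 100))
dsm_2020 = dsm_2019 + 0.25 + rng.normal(0.0, 0.05, size=(100, 100))

stable = np.zeros((100, 100), dtype=bool)
stable[:, :20] = True                         # pretend the left strip is off-glacier

stats = stable_terrain_offset(dsm_2019, dsm_2020, stable)
print(stats)
```

A median offset clearly different from zero would point to a systematic shift between the two acquisitions, which the GCP-based MAE of each individual DSM cannot reveal.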
Minor comments:
Abstract: The acronym SMB is defined in the abstract, but the acronym UAV is not. Define both in the abstract or introduction.
L68: Maybe use a gender-neutral term such as "Unoccupied Aerial Vehicle (UAV)", which has been introduced recently (Joyce et al. 2021) and is becoming more and more popular in the community.
L69-72: Maybe mention here that repeated UAV surveys have been conducted before by other groups in different regions to derive surface velocities (e.g. Kraaijenbrink et al., 2016; Benoit et al., 2019) or to compare surface elevation change with ablation stake measurements (e.g. Groos et al., 2019*), emphasising the need for or potential of a transferable method to use such topographic data to determine the SMB distribution.
Figure 1: It's a personal preference, but I think for international readers geographic coordinates (LatLon) would be more informative.
L117-119: Were no surveys performed in 2017, or was the photogrammetric processing not successful? Other studies have shown that the SfM technique in principle also works for snow-covered areas (e.g. Bühler et al., 2016). Or was it impossible to distribute GCPs under these circumstances?
L126: Can you estimate the uncertainty? See general comment.
Table 1: Could you provide some more information here, e.g. flight dates, range of height above ground level, number of images acquired, size of surveyed area?
L134: Can you include the position of the GCPs (at least exemplarily for one year) in one of the overview maps, e.g. in Fig. 2?
L141: In case of the 2020 surveys, were the GCPs only used for validation? Is it realistic that the P4RTK system is more accurate than the Trimble 7 GeoXH RTK GPS? Were any comparative tests performed on stable terrain?
L148: Can you provide the total number of ablation stakes rather than the number of measurements?
Figure 2: See comments on Fig. 1 and L134.
Figure 3: The two reviewers already commented on that. Why is the average of both datasets the best choice? Would there be any arguments for using one over the other? Anyway, the sensitivity analysis is appreciated. Would it be possible to include the tracks of the radar measurements in panel a and panel b? It would be helpful to use the empty lower right panel to include a difference map of THIZ and THIL to highlight areas of good agreement and areas with larger uncertainties.
L232: because => because
L248: Why did you choose the old Swiss Grid (CH1903 LV03) rather than the new one (CH1903+ LV95)?
L250: How many of the GCPs were used as GVPs? It would be fair to provide some more details and, if possible, indicate them in one of the maps (e.g. in Fig. 2).
L252: Sometimes you use 5-10 cm and sometimes 0.05-0.10 m. Try to be consistent.
L431-432: The stated MAE defines the accuracy of a DSM relative to the GCPs used, but it does not tell you anything about the "absolute" accuracy. This can only be assessed by considering data from "stable terrain" outside the glacierised area. The vertical accuracy of the Trimble 7 GeoXH RTK GPS was stated to be in the order of 20-30 cm, so it is likely that the difference between DSMs from different acquisition dates is larger than the stated MAE in Table 2.
Table 2: Does the GCP density also include the points used as GVPs?
L453-457: Did you place GCPs in the relatively steep area? If not, do you think the observed positive surface elevation changes between 2019 and 2020 could be the result of inaccuracies of the DSMs (especially at the margin of your study area) rather than a mass gain related to increased avalanche activity? It's not necessarily the case here, but DSMs are prone to large-scale distortions (e.g. warping) if no GCPs are distributed at the margin of the study area (e.g. James and Robson, 2014; Groos et al., 2019*).
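Regarding the difference map of THIZ and THIL suggested for Figure 3, a minimal sketch of the computation could look as follows. It assumes that both ice thickness fields are available as arrays on an identical grid with NaN outside the glacier. The function name and the tiny example grids are hypothetical and only serve to illustrate the idea.

```python
import numpy as np

def thickness_difference(thick_a, thick_b):
    """Return the cell-wise difference (a - b) and a few summary statistics."""
    diff = np.asarray(thick_a, dtype=float) - np.asarray(thick_b, dtype=float)
    valid = diff[np.isfinite(diff)]           # ignore cells without data
    summary = {
        "mean_diff": float(valid.mean()),             # systematic bias
        "mean_abs_diff": float(np.abs(valid).mean()),  # typical disagreement
        "max_abs_diff": float(np.abs(valid).max()),    # worst-case disagreement
    }
    return diff, summary

# Hypothetical 2 x 2 thickness grids in metres (NaN = no data)
thiz = np.array([[100.0, 120.0], [np.nan, 80.0]])
thil = np.array([[90.0, 130.0], [50.0, 80.0]])

diff, summary = thickness_difference(thiz, thil)
print(summary)
```

Plotting `diff` with a diverging colour scale would show at a glance where the two datasets agree and where the averaged thickness is less well constrained.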
I suggest including a discussion section to elaborate on the implications of your study and recommendations for future work. Regarding the transferability of the presented approach, it would be interesting to discuss the uncertainties related to the use of modelled ice thickness data (e.g. Farinotti et al., 2021) when applying your method to determine spatial SMB variations of glaciers in data-scarce regions. Are there any limitations or challenges that should be considered when applying this method to mountain glaciers with a different setting (e.g. varying geometry, varying surface velocities, varying debris cover extent, presence of ponds and ice cliffs)? Moreover, glaciers for which multiannual high-resolution topographic data from repeated UAV surveys already exist (e.g. Kraaijenbrink et al., 2016; Benoit et al., 2019; Groos et al., 2019*) could be briefly mentioned as potential sites for the further testing of the presented method. Recommendations regarding best practices for the implementation of UAV surveys in mountainous terrain could also be included here.