Hourly surface meltwater routing for a Greenlandic supraglacial catchment across hillslopes and through a dense topological channel network

Recent work has identified complex perennial supraglacial stream/river networks in areas of the Greenland Ice Sheet (GrIS) ablation zone. Current surface mass balance (SMB) models appear to overestimate meltwater runoff in these networks 20 compared to in-channel measurements of supraglacial discharge. Here, we constrain SMB models using the Hillslope River Routing Model (HRR), a spatially explicit flow routing model used in terrestrial hydrology, in a 63 km 2 supraglacial river catchment in southwest Greenland. HRR conserves water mass and momentum and explicitly accounts for hillslope routing, and we produce hourly flows for nearly 10,000 channels given inputs of an ice surface DEM, a remotely sensed supraglacial channel network, SMB-modelled runoff, and an in situ discharge dataset used for calibration. Model calibration yields a Nash 25 Sutcliffe Efficiency as high as 0.92 and physically realistic parameters. We confirm earlier assertions that SMB runoff exceeds the conserved mass of water routed to match measured flows in this catchment (by 12-59%) and that large channels do not dewater overnight despite a diurnal shutdown of SMB runoff production. We further test hillslope routing and network density controls on channel discharge and conclude that explicitly including hillslope flow and routing runoff through a realistically fine channel network produces the most accurate results. Modelling complex surface water processes is thus both possible and 30 necessary to accurately simulate the timing and magnitude of supraglacial channel flows, and we highlight a need for additional in situ discharge datasets to better calibrate and apply this method elsewhere on the ice sheet.


Introduction
The study of supraglacial streams and rivers atop the Greenland Ice Sheet (GrIS) is an emerging subfield with implications for physical understanding of ice sheet subglacial hydrologic systems, ice motion, and sea level rise (Irvine-Fynn 35 et al., 2011;Rennermalm et al., 2013;Chu, 2014;Flowers, 2018;Pitcher and Smith, 2019). When the GrIS surface melts, meltwater that is not evaporated, stored, or refrozen moves through what is now understood to be a complex perennial hydrologic system with unique and complex hydrologic process distinct from terrestrial hydrology Pitcher and Smith, 2019). Recent advances in mapping (Lampkin and VanDerberg, 2014;Rippin et al., 2015;Smith et al., 2015;Smith et al., 2017;, modelling (Banwell et al., 2012;Banwell et al., 2016;Clason et al., 2015;Karlstrom 40 and Yang, 2016;Yang et al., 2018), and measuring (McGrath et al., 2011;Legleiter et al., 2014;Gleason et al., 2016;Smith et al., 2017) supraglacial channel networks have revealed numerous similarities to terrestrial watersheds, but their scale and remoteness have limited the number of field studies.
This new appreciation for supraglacial hydrologic processes has emerged at a time of increasing accuracy and sophistication of Surface Mass Balance (SMB) modelling of the GrIS. SMB models use regional atmospheric forcing to 45 simulate GrIS surface mass balance components, including the amounts of meltwater production and of liquid water in excess of evaporation and retention/refreezing (termed "runoff") available for hydrologic functions (Fettweis et al., 2020;Vernon et al., 2013). SMB models here refer to any global/regional circulation model (G/RCM) or reanalysis that explicitly simulates ice surface runoff. These models are grid-based and operate at pan-GrIS scales, producing a single runoff value for a given model grid and timestep. Note that the terrestrial hydrology community commonly uses the term 'water excess' to represent the 50 volume of water available for routing after hydrologic processes, while the glaciology community uses the term 'runoff' to represent this same quantity specific to ice sheets. Most existing SMB models do not route this runoff, and instead assume that all runoff not refrozen in snow or firn leaves the ice sheet as soon as it is produced (Fettweis et al., 2020). In reality, observations of the GrIS surface indicate that lake impoundment, flow through weathering crust (e.g. Cooper et al., 2018), and transport through supraglacial stream/river networks modify the timing and magnitude of excess water reaching moulins or the ice sheet 55 edge . Modelling these processes would be precisely analogous to the use of land surface models in terrestrial hydrology, whereby a land surface model (SMB model here) produces gridded water excess (runoff here) and then routes this water with a coupled routing model. Coupling surface water processes to SMB models, loosely or tightly, is thus needed for a fuller representation of GrIS supraglacial hydrology to align this field with practice in terrestrial hydrology (e.g. Bates et al., 1997;Beighley et al., 2009;Wood et al., 2011;Lin et al., 2019) 60 Previous studies have begun to stich these two research avenues together. For example, Banwell et al. (2012) used Darcy's law to describe meltwater flow routing through snow and Manning's equation to describe lateral runoff transport across bare ice, then later used this meltwater to fill supraglacial lakes or supply surface meltwater to moulins (Banwell et al., 2013). Liston and Mernild (2012) applied mass conservation at the grid cell level to route runoff between grid cells, but did not account for the presence of channels that convey this runoff with distinct hydraulics. Smith et al. (2017) attempted to 65 https://doi.org/10.5194/tc-2020-273 Preprint. Discussion started: 16 October 2020 c Author(s) 2020. CC BY 4.0 License. address this channel routing via the classic empirical Snyder Synthetic Unit Hydrograph (SUH) model (Snyder, 1938) to calculate discharge hydrographs for the terminal moulins of 799 internally drained surface catchments in the southwest GrIS. Yang et al. (2018) used a similar classic empirical model, the Rescaled Width Function (RWF, Rinaldo et al., 1995), to partition the ice surface into slow-flowing interfluvial (i.e., hillslope) and fast-flowing (open-channel) zones, and calculated moulin discharge while improving physical realism of the supraglacial routing process. Importantly, Yang et al demonstrated the 70 likelihood of subsurface unsaturated zone flow even through bare glacial ice, a phenomenon confirmed by field , Irvine-Fynn et al., 2011Munro, 2011 and theoretical (Karlstrom and Yang, 2016) studies. Yang et al. (2020) recently compared several of these empirical models and found they introduce significant variability in diurnal moulin discharges and corresponding subglacial effective pressures.
These previous efforts demonstrated successful meltwater transport modelling on the GrIS ablation zone and its 75 necessity, but their relative simplicity allows space for the application of sophisticated routing models from terrestrial hydrology routing models to ice sheet surfaces more generally. For instance, Lin et al (2019) used gridded estimates of water excess (analogous to runoff) at the global scale to simulate daily flows in nearly three million river reaches 1979-2013 with fully conserved mass and momentum in realistic river networks. This undertaking was the first demonstration of this capability at global scale, following years of well-established theoretical work and advances in hydrologic representation for big data. 80 This routing paradigm is suitable for representing GrIS surface water transport processes, as gridded runoff on ice sheets must be routed through supraglacial rivers, lakes, and hillslopes, as on land. Building and calibrating models to route water through landscapes and channel networks while obeying fundamental principles of mass and momentum conservation is an established practice in terrestrial hydrology that may readily be applied to ice sheet surfaces as well.
There are several barriers to explicit routing for the GrIS at the catchment scale. First, routing models require a well-85 defined channel network with explicit and continuous topology. There have been demonstrations of network mapping  and topology generation (King et al., 2016), but to our knowledge no automated, large scale network extraction and topological connection work exists for the GrIS. Existing terrestrial routing models like the Hillslope River Routing model (HRR, Beighley et al., 2009) stand ready to route runoff 'off-the-shelf,' yet these cannot be applied until these issues are overcome. Applying such a model as HRR could also further the science of GrIS river networks, which is currently 90 underdeveloped (Pitcher and Smith, 2019). For instance, the relative importance of hillslope flows and channel density on runoff transport have not been explored on a first-principles basis at network scales, and model parameters controlling hillslope friction, channel friction, and runoff reduction/augmentation could reveal how these physical processes are interacting to produce channel discharges.
In this paper we use HRR to advance physical understanding of GrIS supraglacial meltwater transport processes as 95 follows. 1) We automatically generate spatially explicit topological networks of varying drainage density for a supraglacial catchment for which a brief (72 hr) in situ record of outlet channel discharge is available; 2) We route water runoff generated by four different SMB models through these networks at an hourly timescale; 3) We constrain and calibrate the routing via hourly in situ discharge measurements and previously published field measurements of supraglacial channel frictions and https://doi.org/10.5194/tc-2020-273 Preprint. Discussion started: 16 October 2020 c Author(s) 2020. CC BY 4.0 License.
velocities. Our initial routing results immediately revealed a mismatch between modelled and routed runoff and measured 100 channel flows, so our philosophy for this study is to assume that measured discharge at the outlet is correct and calibrate SMB runoff volumes and channel properties to match discharge observations as mediated through the physics of the routing model. 4) To advance understanding of hillslope processes and channel density on meltwater transport, we design an experiment to test how network density (as derived by our automated network generation process) and representation of hillslope processes affect the routing model. We ultimately route meltwater through thousands of supraglacial channels every hour, and we solve 105 (via conservation of mass and momentum inherent to routing) for the roles of channel friction, hillslope delay, and network density in controlling the magnitude and timing of water fluxes through supraglacial channels and ultimately moulin injection in our test watershed. These procedures and results form a blueprint for general coupling of runoff modelling, water transport, and channel processes atop the GrIS.

Study area and data 110
We develop our routing model for Rio Behar, a previously studied, internally drained supraglacial river catchment in southwest Greenland. First introduced by Smith et al. (2017), the Rio Behar catchment is approximately 63 km 2 and centered at 48.55W and 67.04N with a highly developed perennial and well drained supraglacial stream/river during peak flow periods of late summer. Smith et al (2017) report that the basin elevation spanned approximately 1,200-1,400m in 2015, with air temperatures in the summer measurement period (Section 2.2) ranging from -3 to 2C and net radiation ranging from 115 approximately -100 to 300 W/m 2 . Previous work in the basin includes a comparison of SMB runoff and field measured discharge using a simpler routing method , studies of subsurface water storage in bare-ice weathering crust , albedo mapping , and satellite and un-crewed aerial vehicle (UAV) remote sensing work to map the catchment's supraglacial channel network Yang et al., 2018). Readers are referred to these published works for more information on the physical setting of the basin: we here use the Rio Behar specifically because it is 120 the only known large GrIS supraglacial river catchment with an hourly in situ record of channel discharge at the outlet at the peak of summer melt-season (see Section 2.2). Other discharge records exist, as for instance McGrath et al. (2011) provide hourly discharge records for a small (1.1 km 2 ) catchment while Chandler et al (2013) give hourly moulin (fed by a channel) discharge for another small catchment. However, the size of Rio Behar and the wealth of previous work therein makes it an ideal setting for this study. Using high-resolution remote sensing, the watershed is delineated to an in situ streamflow 125 measurement point (Section 2.2) that defines the outlet and is located less than 1 km upstream of the catchment's terminal moulin. Because all meltwater runoff passing out of our watershed penetrates the ice sheet via a moulin, accurate modelling of this water flux is important for studies of GrIS subglacial hydrology and ice dynamics (Chu, 2014;de Fleurian et al., 2016;Flowers, 2018;Davison et al., 2019) https://doi.org/10.5194/tc-2020-273 Preprint. Discussion started: 16 October 2020 c Author(s) 2020. CC BY 4.0 License.

Remotely sensed and SMB model data 130
A high-resolution remotely sensed supraglacial stream network for the Rio Behar catchment, mapped from a 0.5 m resolution panchromatic WorldView-2 satellite image acquired on 18 July 2015, was obtained from Smith et al. (2017), and this scale is sufficient for capturing the smallest streams in this region . A seasonally simultaneous 2 m resolution ArcticDEM DEM was obtained from Polar Geospatial Centre (Porter et al., 2018). ArcticDEM has been widely used in GrIS hydrology studies and performed reasonably well in representing drainage patterns in previous work (Yang et al., 135 2019). This DEM was used to generate two distinct supraglacial stream networks from the Smith et al. (2017) product as described in Section 3.2.
GrIS runoff was simulated by four models (HIRHAM5, MAR3.6, RACMO2.3, and MERRA-2). Data and detailed descriptions of these SMB models are provided in Smith et al. (2017), but in brief each of these models solves a local surface energy balance from meteorological forcing to produce some amount of runoff produced after physical processes of melting, 140 condensation, retention, and refreezing. This excess water is spatially gridded, and for a given grid cell the models each produce hourly runoff. We take the average runoff in all grid cells intersecting Rio Behar to arrive at a single hourly runoff value for each SMB model following Smith et al (2017). We therefore have four different runoff forcings available for routing that cover from one month before the in situ measurement period through the end of the measurements (Section 2.2). Our goal for this paper is not to interrogate these models. Rather, we hope to highlight the nuances of supraglacial meltwater routing across a 145 range of forcings.

In situ data
Two sources of field data are available for this study. The first source is an hourly Acoustic Doppler Current Profiler (ADCP) discharge record published by Smith et al. (2017). An ADCP is an instrument that measures river flow depth via sonar ranging and vertical velocity profiles using Doppler shifts in the water column. The instrument is transited orthogonal to flow 150 and makes its measurements in discrete bins which are then summed to arrive at the mass flux of water in the channel. ADCP outputs are thus correctly labelled as 'estimates' of discharge rather than 'measurements,' as the measured quantities are depth and velocity and discharge is derived. However, the ADCP provides the most trusted and accurate method for estimating discharge used in hydrology and frequently labelled as measurements (Gleason and Durand, 2020), and further reading on ADCP estimates of discharge and measurement protocols can be found in Turnipseed and Sauer (2010).  (2017) give a detailed description of measurement protocol for collecting and processing these ADCP discharges, and readers are referred to that publication for more information. ADCP estimated discharges ranged from 4 to 26 m 3 /s, revealing that large supraglacial rivers do not de-water at night and can sustain peak flows comparable to streams of moderate catchment size in terrestrial hydrology. These ADCP discharges form the core 160 https://doi.org/10.5194/tc-2020-273 Preprint. Discussion started: 16 October 2020 c Author(s) 2020. CC BY 4.0 License.
HRR model calibration dataset for our study, allowing us to calibrate the free parameters of the routing model (Section 3.2) and to adjust water excess of the SMB models to best match these observations.
The second source of in situ data used here is a broad set of observations of supraglacial channel hydraulics collected in summer 2012 across 64 supraglacial streams and rivers of the southwest GrIS (Gleason et al., 2016). These in situ measurements consist of instantaneous supraglacial channel flow widths, depths, water surface slopes, and velocities collected 165 using traditional surveying, radar velocimetry, and an ADCP. These measurements in turn yielded derivative estimates of discharge, stream power, Froude number, and roughness coefficient (Manning's n) at 64 sites, representing the largest known empirical dataset of supraglacial channel hydraulic properties currently available in the literature. Site locations ranged from 502 to 1485 m elevation and up to 74 km inland from the ice margin, and instantaneous discharges ranged from 0.006 to 23.12 m 3 /s in actively flowing channels 0.20 to 20.62 m wide. These observations are used to constrain our modelled roughness 170 coefficients to produce realistic parameters and velocities. Section 3.2 describes this process fully.

Experiment design
Our overall goal for this study is to improve current understanding of supraglacial hydrological transport processes through classical hillslope and channel routing. We test two experimental settings (inclusion/exclusion of hillslope flow, 175 coarse/fine channel network densities) in factorial on four different SMB models to produce 16 experimental runs ( Figure 1).
For each run, we calibrate eleven parameters: a global runoff correction coefficient (1 parameter), a spatially explicit channel roughness coefficient binned by channel slope (9 parameters), and a global hillslope roughness coefficient (1 parameter) to optimize modelled and measured discharge at the basin outlet (section 3.3.2 gives full details). Model calibration statistics were used as indicators of the physical realism of each experiment, and we seek to identify robust, cross-SMB model parameter 180 trends in our factorial experimental setting. Thus, we calibrate HRR 16 separate times to produce a set of results that vary by runoff forcing, channel density, and inclusion/exclusion of hillslope process.
Note that in all configurations (Figure 1), we calibrate a runoff correction coefficient (Rcoef). Previous work comparing SMB runoff to ADCP discharge at our field site reveals that the total runoff is frequently overpredicted by SMB models . We therefore created a multiplicative runoff correction coefficient to either reduce or augment SMB runoff that 185 is calibrated within HRR without changing the timing of production. Previous routing studies have forced model runoff to equal the cumulative measured river discharge before further routing , yet this restrictive assumption amounts to an empirical ad-hoc mass conservation rather than explicitly relying on hillslope and channel mass and momentum conservation across thousands of channels. Thus, we calibrate Rcoef together with the traditional HRR parameters (i.e., channel and hillslope roughness coefficients, Table 1, Section 3.3.2) for each model run to learn the total volume of excess needed in 190 each case to simultaneously match both hydrograph timing and mass conservation. This allows our results and routing https://doi.org/10.5194/tc-2020-273 Preprint. Discussion started: 16 October 2020 c Author(s) 2020. CC BY 4.0 License. framework guide our conclusions on the total volume of water needed to generate the outlet hydrograph as this volume might differ between network and hillslope configurations. Figure 1 shows an overall schematic of our approach.

River network extraction
Although Smith et al. (2017) provide a topologically connected channel network for our study area, we are interested 195 in generalizing the process of water routing from satellite image collection to water routing, which is also necessary to test the effects of network density on the routing model. Therefore, we performed a three-step process to generate two alternate river networks for eventual routing from ArcticDEM. 1) We first 'burned' (i.e. lowered the pixel elevations) the remotely sensed stream map of Smith et al. (2017) into ArcticDEM, a standard hydrologic practice (e.g. Lindsay, 2016). Two large topographic depressions in the catchment, one located in the upper part of the catchment and the other located near the catchment outlet, 200 confound conventional network generation. Standard DEM preparation for hydrological analysis (in which an upstream depression is filled while an outlet depression is preserved) generated unrealistic parallel drainage channels upstream and no channels in the outlet depression for our data. To address this problem, 2) a priority-flood algorithm (Lindsay, 2016) was applied to breach the two depressions and to create a continuously-flowing, realistic drainage network for the Rio Behar https://doi.org/10.5194/tc-2020-273 Preprint. Discussion started: 16 October 2020 c Author(s) 2020. CC BY 4.0 License. catchment ( Figure 2). Finally, 3) the parameter that drives network generation and ultimate channel density is the channel 205 initiation threshold: the minimum area needed to form a free-flowing channel. To estimate the impact of drainage pattern on meltwater routing, we tested both a large (10 4 m 2 ) and a small (10 3 m 2 ) channel initiation threshold to create a 'coarse' and a 'fine' supraglacial drainage network, respectively ( Figure 2). Two modelled stream networks were thus generated for subsequent analysis, enabling us to test the effects of including or excluding very small tributary streams on surface water routing. Our 'fine' channel network produces streams with a minimum width of 0.5m, matching Gleason et al.'s (2016) 210 reporting of channels as narrow as 0.2m. GrIS supraglacial channels incise and meander over time, yet HRR cannot represent this behaviour and instead assumes that channels remain fixed in space and time. It is possible to derive expected erosion and incision (and additional meltwater) due to frictional heating of the channels, but without including a radiation budget and ice property data we cannot model how the stream network changes in time nor satisfactorily model this additional meltwater with commensurate sophistication to the SMB runoff forcing (i.e., tight coupling with SMB models). Instead, we model this network 215 snapshot with HRR as loosely coupled to SMB runoff, which is reasonable for our one-month experiment (Section 3.3.1).
Our river network extraction ultimately produced two topologically connected networks of 1,044 and 8,095 channels (coarse and fine, respectively, Figure 2). The coarse network has six stream orders (e.g. the smallest streams on the landscape are order 1, and every junction of stream produces a new stream of higher order), and the fine network seven orders. The networks are topologically complete (i.e., all channels are explicitly connected to one another and preserve their hydrologic 220 hierarchy) allowing for successful routing without the need for further correction of network connections. The main trunk streams only are visible in the coarse network, and lakes are represented by wide, shallow 'throughflow' river segments as all are non-terminal with outflow channels.  Cunge, 1969). HRR uses an explicit kinematic wave for hillslope transport as non-channelized overland flow (Li et al 1975). HRR requires inputs of channel widths and lengths, which are assumed invariant and derived from remote sensing (Section 2.1), channel slope, and each channel's subcatchment area and total upstream area, as derived here from the DEM 230 where bed slope is assumed to equal the free surface flow consistent with Manning's equation. In addition, the network topology derived in Section 3.2.2 is required so that HRR can conserve mass and momentum in a downstream direction and across channel junctions. HRR is one of several routing models that classically conserve mass and momentum designed for large-network applications. Our choice of HRR is based on familiarity, model speed (written in FORTRAN, unlike other terrestrial hydrology models), and its rigorous representation of network routing and classic open channel flow hydraulics. 235 HRR routes time-varying runoff onto existing flows, commonly onto a baseflow in terrestrial hydrology. We 'spin up' the model by routing a constant forcing of median observed ADCP flow through the model rather than attempt to define a minimum baseflow. This steady forcing allows all channels to fill with water and accurately transfer runoff from the SMB models through the system. We used a three month spin up period then temporally varied flows beginning July 1 from SMB forcing. Our experiment begins on July 20, and thus the model has time to adjust to runoff forcing and mitigate the impact of 240 this spin up flow before we begin to validate the model.

Model calibration
Nearly all hydrologic models require calibration to function well. To calibrate terrestrial routing models, hydrologists typically iterate parameters until hydrographs at one or more reaches match a stream gauge in that reach. Here, we have calibration data available only at the basin outlet, so we calibrate our routing model to outlet discharges, despite producing 245 discharges in thousands of reaches. We do this using an evolutionary algorithm (EA; NSGA II, Deb et al., 2002) as EAs are efficient estimators in large parameter spaces that can achieve near-optimal results (Gleason and Smith, 2014). This calibration ensures a heuristically optimized outlet hydrograph but does not explicitly calibrate upstream reaches. However, since outlet flows are the sum effect of the routing delays and volumes of all upstream reaches, and since we explicitly conserve mass and momentum, a well-calibrated outlet should satisfactorily model upstream flows, but we cannot validate these upstream reaches. 250 Therefore, we constrain allowable parameters in upstream reaches (and therefore their discharges and velocities) using the in situ observations of Gleason et al. (2016).
We calibrate 11 constrained parameters (Table 1) which represent three physical concepts: channel friction (here binned by upstream area into 9 separate parameters), hillslope friction, and a water excess adjustment coefficient. Channel friction is represented by Manning's n and the EA solves for a single n per bin and assigns that n to all streams falling within 255 https://doi.org/10.5194/tc-2020-273 Preprint. Discussion started: 16 October 2020 c Author(s) 2020. CC BY 4.0 License. that drainage area threshold. This procedure follows general hydraulic correlations between channel size, slope, total discharge, and n (Brinkerhoff et al., 2019). Hillslope flow is modelled as an explicit kinematic wave for non-channelized flow (Li et al 1975), which requires a surface roughness coefficient (i.e., hillslope friction), and we limit hillslope friction to between 0.05 (non-dimensional; a hillslope with friction equivalent to a rough channel) and 25 (a hillslope with extreme friction to approximate slow interflow through weathering crust). For context from the terrestrial hydrology literature, McCuen (1998) Hergarten and Neugebauer (1997) suggest friction up to a value of 1. Thus, we allow GrIS ice surface hillslope frictions to vary up to two orders of magnitude greater than typical terrestrial reference values to allow for potentially unique supraglacial processes ranging from fast flow over smooth bare ice to slow porous-media flow through weathering 265 crust. Finally, we bound Rcoef to range between 0.3 and 2.0 to allow for both over and underproduction of water excess without imposing mass (e.g. runoff) production. For each of our 16 experimental trials, the EA thus solves for the optimal combination of hillslope and channel friction in tandem with runoff production to best match the ADCP record measured at the outlet.
Recall we do not run the SMB models directly. We parameterized our EA as follows. Crossover probability and distance were set to 0.7 and 5, respectively, and mutation probability and distance were set to 0.2 and 10, respectively. These parameters control the degree of change in one parameter set to the next. The objective function for the EA was the Nash Sutcliffe Efficiency (NSE) at the outlet, calculated 275 between the in situ ADCP record and the model discharge. NSE is a standard hydrology metric for hydrograph analysis optimal https://doi.org/10.5194/tc-2020-273 Preprint. Discussion started: 16 October 2020 c Author(s) 2020. CC BY 4.0 License. at a value of 1. An NSE of 0 is equivalent to modelling a hydrograph as the true mean flow, and negative NSE values indicate that the mean outperforms a given model. Finally, we set the population size and number of generations (parameters that control how many different solutions the EA tests the size of the search space in tandem with crossover and mutation) based on the model configuration due to runtime. Even though we ran our tests using parallel computing on a powerful modelling 280 machine (Intel Xeon Gold 6126 3GHZ CPU with 96 GB of RAM and 24 logical processors), a single fine-network hillslope HRR run took approximately 2 minutes to complete. Thus, we used 40 population members for the non-hillslope tests, 16 members for the coarse hillslope test, and 12 members for the fine hillslope test. EA length was set to 2,500 generations for the non-hillslope tests, and 1,000 and 500 generations for the coarse and fine hillslope tests, respectively. The total number of tested parameterizations is equivalent to the number of generations multiplied by the population size, so we tested between 285 6,000 and 100,000 parameter sets across our calibration runs equivalent to approximately 6 days of computing time for the longest calibration. We saved globally optimal results as they occurred within the EA as a single objective problem, and these results were obtained well before the end of the EA in each run, so we are confident that the length of the EA was sufficient in each case.

Basin outlet hydrograph
We first analyse our model results at the basin outlet (Figure 3). In aggregate, two major results are immediately apparent across our 16 model configurations. First, the fine river network generally outperformed the coarse network across models and hillslope choices (as 7/8 of fine networks appear in the of the top 10 performing models, Table 2). Second, the top three performing models all include explicit hillslope kinematic wave routing, with the best outcome (a RACMO2 forced fine 295 network hillslope configuration) having an excellent calibration RMSE of 1.85 m 3 /s. Model calibration statistics show high skill (defined here as NSE > 0.8) in 5/16 cases and moderate skill (NSE > 0.5) in all 16 cases, with RMSE ranging from 1.85 to 4.55 m 3 /s (observed flows ranged from 4.6 to 26.7 m 3 /s, for context). Note that RMSE and NSE do not track perfectly given the differing nature of their assessments. RMSE is a total mass error that is influenced by the scale of variation in the hydrograph, where NSE compares to the mean. There is no universally acknowledged threshold for model calibration goodness 300 of fit, but the models presented here meet a traditional gauging station expectation of 5-10% error in matching ADCP flows (Turnipseed and Sauer, 2010).
All 16 calibrated HRR model configurations match daily peak flow magnitude and timing, regardless of input runoff or hillslope/density controls. This occurs despite runoff forcings from each model that are out of phase with the peak recession observed in the ADCP outlet hydrograph. While all calibrated models match peak magnitude well, only RACMO2-forced 305 models capture the peak recession seen by the ADCP. All instantaneously routed SMB runoff incorrectly shows zero flow in the overnight period, and many of our calibrated models also approach near-zero flow overnight, but the fine-network models do retain some water regardless of forcing. RACMO2-forced experiments are successful at matching both peak and low flows for all experiments except the coarse non-hillslope case, and indeed achieve NSE scores of up to 0.92 and corresponding RMSE of only 1.85 m 3 /s. Post-routing total cumulative discharge is relatively consistent across all models (see Figure 4 where 310 total discharge is shown for hillslope models). Rcoef varied from model to model but little within each model and ranged from 41% to 88% retention (Table 2). Despite indicating that reduced input runoff is required to route flows accurately across all models, overall routed cumulative discharge was lower than in situ measurements for this time period due to underprediction of low flows (Figure 4).  ('instantaneous', brown lines) to match field observations. Even after coupling SMB models with HRR routing models, most simulations underpredict low flows. Peak flows are relatively well modelled, although ADCP peak recession is only modelled correctly by RACMO2 -forced routing.
Finally, we calculate routing delays for each of our 16 calibrated routing models by noting the difference in ADCP 315 peak and the unrouted SMB runoff peak. This delay is the shortest for MERRA2 (1-3 hours) and longest for MAR and RACMO2 (5-6 hours). This routing delay is a function of both time of day and discharge, so these values represent an approximate estimate for daily peak flow delay between runoff forcing and calibrated HRR model. These calculations represent the total travel time for water to pass through the system, from runoff production to the outlet. Our routed flows are non-zero in many cases despite a zero water excess forcing at night (Figure 3), signifying that the network architecture and HRR-320 modelled routing delays are sufficient to introduce physically realistic (i.e. non-zero) night-time water discharges atop the GrIS, consistent with in situ ADCP measurements in Rio Behar.

Lower-order hydrographs 345
While we cannot verify flows at any network channel besides the outlet, we have simulated hourly flows for all 1,044 and 8,095 channel segments in the coarse and fine networks, respectively. If we assume that accurate model performance at the main basin outlet indicates physically realistic upstream flows, it is profitable to report results for upstream flows during the calibration period. To analyse these large data, we summarize flows in the 72-hour validation period by stream order, with Figures 5 presenting results for 1-3 rd order streams and Figure 6 presenting results for 4 th and 5 th order streams. In each figure, 350 we plot the mean hydrograph for the order with one-standard deviation shaded area to represent variability around the mean. Total water export is relatively consistent across all four SMB models, but substantially different than input runoff (i.e., instantaneous routing) for all models but MERRA2. The ADCP represents a measured cumulative export, while instantaneous routing assumes that SMB runoff immediately leaves the watershed as soon as it is produced. Calibrated models underpredict water export due to underestimation of night-time low flows. https://doi.org/10.5194/tc-2020-273 Preprint. Discussion started: 16 October 2020 c Author(s) 2020. CC BY 4.0 License.
fewer streams are thus more homogenous by definition in these plots.
There is a large difference in flow magnitude across fine and coarse models, regardless of SMB forcing or inclusion/exclusion of hillslopes ( Figure 5, 6). For 4 th and 5 th order streams these flow differences span roughly a factor of two, 355 while in the lower orders flow differences span almost an order of magnitude. This signifies that smaller streams are more Figure 5. Mean and one-sigma shaded variability for channel segment hydrographs by order for stream orders 1-3 for the validation period. Non-Hillslope process flows are dashed. Note the increase by a factor of ~10 in flows between fine and coarse networks, and the difference in peak timing between hillslope and non-hillslope models.
sensitive to their hillslopes, as expected. We also note that the networks have different total orders (6 for the coarse network, 7 for the fine network). Therefore, the 2 nd order fine streams loosely correspond to 1 st order coarse streams, but this correlation is not a 1:1 match. Peak timing also differs between hillslope and non-hillslope models in the lower orders for coarse networks.
This effect is more pronounced in the lowest 1-3 rd orders, where e.g. RACMO2-forced models show a peak delay of almost 5 360 hours between hillslope and non-hillslope models. This delay in peak timing when explicitly modelling a hillslope process at smaller streams is intuitive and stronger in coarse models, which have larger individual hillslopes via their larger channel inception area threshold. Turning to the calibrated model parameters, mean n values (across either 1,044 or 8,095 channels) ranged from 0.006 to 0.055 across all 16 calibrated models ( Table 2). The standard deviation of n varied considerably and was often the same 365 order of magnitude as its mean (Table 2). Figure 7 summarizes channel friction across inclusion/exclusion of hillslope process and across coarse/fine networks. Channel friction is given by calibrated n, and recall that we calibrated channel friction in nine discreet bins based on upstream area, such that all channels within the area bin receive the same n. Upstream area loosely tracks stream order, and thus the larger the area, the higher the order.
Non-hillslope large channels in the three highest orders require a substantially larger Manning's n value than these 370 same channels with a hillslope process included, indicating that the non-hillslope models necessitate higher friction in large channels to match outlet flows. For the lower order streams with upstream areas less than 1.260 km 2 , channel friction decreases with increasing upstream area. This pattern repeats when analysing across coarse/fine networks, but there are less clear patterns in n when analysing the coarse vs fine network for the three largest bins. This suggests that the dominant control on modelled  Table 1. Bins are bounded by the maximum value indicated in the x axes and a minimum value equal to the maximum area of the next smallest bin. There are 8 values per each boxplot. Non-hillslope trials require substantially more friction than hillslope trials in the largest channels, suggesting compensation for lack of hillslope process representation.
https://doi.org/10.5194/tc-2020-273 Preprint. Discussion started: 16 October 2020 c Author(s) 2020. CC BY 4.0 License. channel friction is whether or not water first enters a channel via a hillslope. Finally, channel friction values in Figure 7 fall 375 well within our physically realistic constraints until the three largest bins. These largest channels for non-hillslope models in particular require friction near the upper limit of plausibility (particularly the 2 nd largest bin) to satisfactorily conserve mass, and the worse validation metrics for these configurations might be traced to this effect.

Discussion
We have successfully calibrated a hillslope river routing model capable of simulating hourly flows through thousands 380 of supraglacial channels atop the GrIS while conserving runoff mass and momentum. The most accurate models to emerge from our experiments were those that employed a fine channel network and/or inclusion of hillslope flow routing. We assert that our results support the inclusion of realistically fine river/stream networks and hillslope-enabled routing models for supraglacial runoff modelling applications that require realistic representation of runoff timing and magnitude. While we cannot validate in-channel flows upstream of the outlet, this level of hydrological simulation could, for instance, be coupled 385 with SMB models to calculate hourly moulin discharge rates, lake fill-and-spill volumes, channel incision rates (e.g. following Karlstrom and Yang, 2016), and supraglacial contributions to subglacial water pressures (e.g. following Yang et al., 2020).
These processes have important implications for GrIS surface hydrology, surface mass balance, and subglacial hydrological systems. We believe this work represents a promising step toward coupled SMB-routing modelling that can be used to generate more realistic predictions of these processes and their sensitivity to changing surface meltwater forcings or surface topography. 390 The goal of this study was not to interrogate individual SMB models or suggest one is better than another, but rather to demonstrate the importance of coupling SMB model output with a surface flow routing model. This enables rigorous estimation of supraglacial flow accumulation and routing delays to moulins atop the GrIS that route meltwater into a dynamically varying subglacial hydraulic system that influences ice sheet acceleration in response to the timing and magnitude of input discharges, which is imperative to accurately estimate diurnally varying moulin discharges using climate models. 395 Second, this work advances physical understanding of ice sheet surface hydraulic properties, for example our finding hillslope friction values (Table 2) well outside typical terrestrial values of 0.01 to 1 (Hergarten and Neugebauer, 1997;McCuen, 1998;Kalyanapu et al. 2009). Yang et al. (2018) similarly estimated slow transport of meltwater on ice interfluves (similar to the hillslopes studied here) some 2-3 orders of magnitude slower than open-channel flow (∼10 −1 m s −1 ). Observations of ice density and saturation in shallow ice cores within the Rio Behar catchment indicate that substantial subsurface meltwater is stored 400 within the upper decimetres of bare-ice weathering crust, and was anecdotally observed to percolate through the crust . If so, this unsaturated flow would move orders of magnitude slower than bare-ice overland flow. These convergent findings are consistent with conceptual models of unsaturated subsurface porous media flow, and support the very slow lateral transport we observe here (on the order of 10 −5 to 10 −1 m s −1 ) to the channel from the ice surface, but we cannot make any further conclusions on physical process or mechanism given our experiment design and model setup. This result highlights the 405 need for further basic research on supraglacial hydrological process to further understand the importance of these velocities. https://doi.org/10.5194/tc-2020-273 Preprint. Discussion started: 16 October 2020 c Author(s) 2020. CC BY 4.0 License.
The importance of including hillslope process is also clearly manifested through calibrated channel frictions generated in model experiments that exclude it. There are discernible changes in channel friction when hillslopes are/are not modelled, and the results are intuitive: channels lacking hillslopes have much higher friction, especially in large channels (Figure 7).
Further, for the largest channels (i.e., upstream areas greater than 1.260 km 2 ), models without hillslopes take channel friction 410 values almost uniformly at the maximum of the realistic constraints we set (Figure 7) while at the same time having a poor match to observed flows (Table 2, Figure 3). Thus, these models would likely improve only by including physically unrealistic frictional values. This is in line with mass conservation, as without hillslopes to slow water upstream, HRR needs to slow water using extreme friction near the outlet in order to match the hydrograph. This pattern is observed across both coarse and fine networks. 415 Ideally, we would have enough data to calibrate and validate the model over separate time periods and at more locations than the outlet. HRR produces an individual hourly discharge at each of our thousands of channels, but we can only verify these at the outlet. However, we believe that model calibration statistics at the outlet indicate the physical realism of the process we're attempting to model: if HRR cannot produce an outlet hydrograph that matches the ADCP, we attribute this failure to physics, rather than calibration or model error. Our results show that HRR is capable of matching outlet flows 420 extremely well (calibration Kling-Gupta Efficiency (KGE) as high as 0.96 and NSE as high as 0.92), and thus we believe this assumption well-founded. Recall also that the ADCP data were collected July 20-23, but we model hourly flows for the entire month. We focus our evaluation only on this 72 h calibration period to discuss our experimental results, without extrapolation into unverified time. This extrapolation is of course an ultimate end goal of future GrIS water routing as we look toward future coupled SMB-routing models that can be used to study interactions between surface hydrologic routing processes and 425 subglacial processes. While we have here only reported flows during a verifiable 72-hour period, in theory our model parameters should be able to accurately appliable to flow route watering in similar areas of the GrIS with similar network drainage patterns.
Our results also support earlier assertions of mismatched timing and magnitude of SMB runoff and observed discharges entering the Rio Behar terminal moulin . Our routing model indicates that between 41 and 88% 430 of SMB-modelled runoff exited the catchment in this three-day period. The routing model is unable to apportion where this extra mass must go, so we can only suggest plausible mechanisms for closing that mass balance gap. Mass gaps could perhaps result from to subsurface retention and/or refreezing in bare-ice weathering crust , a process not currently well-represented in SMB models. Our model can store water in lakes and slower rivers, but peak timing delays indicate total travel times on the order of less than 5 hours, so our results do not plausibly support in-network storage. Further, errors in our 435 outlet hydrographs are dominated by night-time low flow periods, as peak flows are modelled well across nearly all 16 trials. These night-time low flows are particularly important for mass balance in the Rio Behar watershed, as a large driver of mismatches in total mass balance ( Figure 4) comes from these low flow periods. The ADCP itself is generally less certain at lower flows, but the fact that HRR reproduces night-time low-flows after calibration is encouraging, particularly when forced with RACMO2. However, we affirm that all SMB models examined here produce too much excess water relative to ADCP 440 https://doi.org/10.5194/tc-2020-273 Preprint. Discussion started: 16 October 2020 c Author(s) 2020. CC BY 4.0 License.
observations and do not model night-time flows without routing, consistent with Smith et al. (2017). Our results suggest that hydrologic process modelling can correctly reproduce these night-time low flows.
The workflow presented here is repeatable for any supraglacial stream/river network on the GrIS, but the in situ discharge datasets needed for calibration are not readily available. Future studies attempting to repeat this model setup elsewhere need an in situ discharge record (ideally longer than our 3-day record and ideally collected at multiple locations 445 across stream orders), a high quality DEM, and a fine-scale remotely sensed image. Modelling is efficient with these data in- hand, yet the collection in situ discharge in particular present a major hurdle for widespread application to the GrIS. It is possible to use assumed discharges for calibration, but as our results clearly support a difference between predicted and measured fluxes, we believe measured calibration data are best. We suggest that collection and publication of a repository of supraglacial channel discharges and hydraulic properties atop the GrIS would be an invaluable resource, and that future studies 450 should explore transferability of key parameters (e.g. channel and hillslope frictions) to other locations on the ice sheet.

Conclusion
We confirm earlier assertions of the importance of terrestrial hydrological processes, specifically hillslope water transport and open-channel flow, on GrIS surface meltwater routing. Unlike previous studies routing meltwater, our results are generated using the Hillslope River Routing model (HRR), which uses an explicit kinematic wave to conserve water mass and 455 momentum in hillslopes and channels and represents hourly flow in nearly 10,000 individual channels in a fully topological network. This first-principles investigation shows that observed supraglacial river discharges (and thus moulin hydrographs) cannot be accurately simulated without both reducing the volume of surface runoff generated by SMB models and accounting for hydrologic transport processes. We investigated two process-level controls on this modelling: modelling coarse vs. finescale channel networks and inclusion/exclusion of hillslope process and found that incorporating fine-scale channel networks 460 and hillslopes yields superior results. Calibrated model parameters are intuitive and align with field observations and theory.
The automated methods developed here could readily be deployed elsewhere atop the GrIS bare-ice ablation zone but require in situ supraglacial discharge data for calibration. More of these data should be collected if GrIS surface hydrology processes are to be fully understood.

Author contributions
CJG and KY conceived of the idea and designed the study. KY and KL extracted river networks. CJG and DF set up and calibrated HRR. CJG designed and created the figures and drafted the text. All other authors participated in fieldwork to collect the ADCP record, and all authors wrote the text. https://doi.org/10.5194/tc-2020-273 Preprint. Discussion started: 16 October 2020 c Author(s) 2020. CC BY 4.0 License.