A comparison between generalized least squares regression and top-kriging for homogeneous cross-correlated flood regions

ABSTRACT Spatial cross-correlation among flood sequences impacts the accuracy of regional predictors. Our study investigates this impact for two regionalization procedures, generalized least squares (GLS) regression and top-kriging (TK), which deal with cross-correlation in two fundamentally different ways and therefore might be associated with different accuracy and uncertainty of predicted flood quantiles. We perform a Monte Carlo experiment based on a dataset of annual maximum flood series for 20 catchments in a hydrologically homogeneous region. Based on a log-Pearson type III parent distribution, we generate 3000 realizations of the region with different degrees of cross-correlation. For each realization, GLS and TK are applied in leave-one-out cross-validation to predict at-site flood quantiles. Our study shows that (a) TK outperforms GLS when catchment area is the only catchment descriptor used for predicting “true” population (theoretical) flood quantiles, regardless of the level of cross-correlation, and (b) GLS and TK perform similarly when multiple catchment descriptors are used.

1 Introduction: state of the art and aims of the study

Prediction of flood quantiles in ungauged basins
The accurate estimation of flood quantiles (i.e. flood discharge associated with a given non-exceedance probability, usually expressed in terms of return period T-although the latter concept is often prone to misuses, especially in the presence of nonstationarity, see e.g. Serinaldi 2015) is of paramount importance in many practical engineering applications. Because gauging stations are heterogeneously and sparsely distributed in space, one of the most common tasks for hydrological engineers is to produce an accurate estimate of the design flood at ungauged or scarcely gauged river cross-sections (see the Predictions in Ungauged Basins (PUB) initiative of the International Association of Hydrological Sciences (IAHS) for the decade 2003-2012; (Sivapalan et al. 2003, Blöschl et al. 2013). The term "ungauged" generally refers to a river cross-section where no streamflow data are available. A special case is when the ungauged river cross-section has no information available upstream or downstream within the given catchment (that is absent of nested gauged catchments, i.e. no gauges are within the watershed of other gauges); in this article, we refer to this latter case as "fully ungauged" conditions.
The estimation of flood quantiles at ungauged sites is often addressed by means of regional flood frequency analysis, or statistical regionalization, which consists of transferring the hydrological information collected at gauged sites that are supposed to be hydrologically similar to the ungauged (or scarcely gauged) target site Wallis 1993, 1997). In this process, it is important to consider that, being based on gathering data from a finite number of observations (i.e. events) collected at a finite number of sites (i.e. limited record size), the obtained empirical estimates of flood quantiles can differ considerably from the theoretical ones, which could be theoretically known based upon the perfect knowledge of the process (i.e. infinite observations). In regional flood frequency analysis, this difference is usually amplified by the presence of the intersite dependence of streamflow series (i.e. spatial crosscorrelation), due to the temporal overlap between time series of observed streamflow at streamgauges that are close in space (i.e. concurrent flows are recorded at different streamgauges). For these reasons, the distinction between empirical and theoretical flood quantiles is crucial in regional flood frequency analysis: while empirical quantiles can be seen as estimates of what the T-year flood would have been if the target site was gauged, theoretical quantiles are the "true" population flood quantiles. The implications related to this difference are important: as observed records are realizations of a random process that will never be repeated, the knowledge of the "true" distribution (i.e. theoretical flood quantiles) from which observations arise would be fundamental for making general evaluations capable of accounting for the main properties of the (true) natural process.

Statistical regionalization
The literature reports several regional statistical methods, which assume the hydrological variable of interest to be a random variable: they can be classified as regression-based methods, index-flood methods and geostatistical methods (see e.g. Blöschl et al. 2013). These methods can differ significantly in the way they identify groups of hydrologically similar catchments and account for the limited size of record and the presence of cross-correlation. Regression-based methods relate the hydrological variable of interest to observable catchment and climate characteristics; they can require the preliminary identification of a homogeneous region and may or may not account for the presence of spatial cross-correlation and of unequal record lengths from site to site (see e.g. Thomas and Benson 1970, Tasker 1980, Stedinger and Tasker 1985, Tasker and Stedinger 1989) (see also Durocher et al. 2019, for an application of nonparametric regression methods with spatial components). Index-flood methods (see e.g. Dalrymple 1960, Hosking andWallis 1997) estimate the hydrological design variable of interest as the product between an index-flood (i.e. scale factor), which depends only on the specific target site, and a dimensionless quantile (i.e. growth factor), which is unique within the given homogeneous region. Note that the concept of homogeneous region has evolved significantly over the last few decades: the traditional idea of contiguous and geographically identifiable regions (see e. g. NERC 1975) has been gradually replaced with the more general idea of homogeneous pooling-groups of sites with similar hydrological behaviour, which may or may not be geographically close to each other (see e.g. Acreman and Wiltshire 1989, Burn 1990, Ouarda et al. 2001. Finally, geostatistical methods assume the hydrological signature of interest in the ungauged catchment to be a weighted mean of the hydrological signatures in the neighbouring catchments, where the weights account for the spatial correlation of the signatures and the mutual locations of the catchments (see e.g. De Marsily 1986, Chokmani and Ouarda 2004, Skøien et al. 2006, Skøien and Blöschl 2007.

Impact of intersite correlation
In regional flood frequency analysis, the presence of spatial cross-correlation among flood sequences motivates information transfer from gauged to ungauged neighbouring sites (for interpolating in space) or from long-record sites to shortrecord sites (see e.g. Vogel and Stedinger 1985). For instance, spatial interpolation techniques (see e.g. Skøien et al. 2006, Archfield andVogel 2010) consider spatial correlation an opportunity for predicting streamflow indices in ungauged sites. At the same time, cross-correlation tends to decrease the effective record length of series for computing regional statistics (see e.g. Matalas and Langbein 1962), thereby reducing the overall information content of regional datasets. This limits the accuracy with which moments of the regional parent distribution can be estimated (see e.g. Stedinger 1983), thus hampering the identification of the theoretical regional flood statistics. Classical studies (Matalas andLangbein 1962, Stedinger 1983) theoretically derived the reduction of the hydrological information content for a region having crosscorrelated flood sequences, and quantified the corresponding increase of uncertainty in regional estimators of streamflow statistics. Hosking and Wallis (1988) showed that intersite correlation increases the variance of regional flood statistics by impacting the prediction uncertainty of regional flood frequency models (not their bias). Rosbjerg (2007) demonstrated the importance of including the cross-correlation of flood peaks for properly quantifying the uncertainty of regional estimates of flood quantiles. Further studies (Hosking and Wallis 1997, Madsen and Rosbjerg 1997, Madsen et al. 2002 showed that cross-correlation may also hamper the identification of the actual degree of heterogeneity of the region by reducing the power of the statistical tests used for this aim (see e.g. Castellarin et al. 2008, for a quantification of the loss of performance of the homogeneity tests proposed by Wallis 1993, 1997). This impacts the assessment of the homogeneity of the region (e.g. heterogeneous pooling-groups of cross-correlated sites may be considered homogeneous), which is the fundamental hypothesis of the index-flood procedures (Dalrymple 1960) and a fundamental requirement for performing an effective regional estimation of flood quantiles (see e.g. Lettenmaier et al. 1987, Stedinger andLu 1995).

Regional models accounting for, or exploiting, intersite correlation
The presence of spatial correlation is an important consideration when predicting flood quantiles in ungauged basins. Several studies in the literature tackled the problem of accounting for, or exploiting, the presence of spatial correlation among concurrent streamflows when predicting flood quantiles in ungauged basins (i.e. by means of regression methods, or spatial interpolation methods). Concerning regression-based methods, one example is the generalized least squares (GLS) regression introduced by Stedinger and Tasker (1985). Specifically, Tasker and Stedinger (1989) developed a particular version of GLS explicitly for the estimation of streamflow characteristics in fully ungauged basins (i.e. no gauged sites upstream or downstream within the same river basin). The Stedinger-Tasker GLS regression, which is illustrated in detail in the Appendix, accounts for sampling variability and cross-correlation among concurrent streamflows in developing a regional (multi-)regression model, with the aim of estimating the theoretical regional flood statistics (i.e. reducing the hampering effect due to the cross-correlation structure of the study region). The Stedinger-Tasker GLS regression (hereinafter referred to as GLS, for the sake of brevity), which is adapted for a log-Pearson type III (LP3) frequency analysis (see Bulletin 17B of the Interagency Advisory Committee on Water Data 1982, and the most recent Bulletin 17C by England et al. 2018), is the reference procedure for estimating streamflow characteristics in ungauged catchments in the US (Eng et al. 2009) and has been widely used for the regionalization of flood quantiles (see e.g. , Grehys 1996, Feaster and Tasker 2002. It is important to note that GLS regression has been formulated to perform estimates at fully ungauged sites and that the presence of nested gauged sites with considerably overlapping drainage areas can result in high cross-correlations (i.e. they share an analogous flooding experience and can be considered redundant); for this reason, Gruber and Stedinger (2008) and Veilleux et al. (2011) introduced some metrics to identify these redundant sites and then delete one gauged site from each pair of redundant sites. Also, a Bayesian-GLS (B-GLS) regression model has been introduced for estimating model error variance and regression parameters (see e.g. Reis et al. 2005, Gruber et al. 2007, Gruber and Stedinger 2008, Veilleux et al. 2011Bulletin 17C by England et al. 2018); more recently, Reis et al. (2020) developed an operational B-GLS regression capable of providing a comprehensive framework for regional hydrological analyses, estimation of flood quantiles included.
Other significant examples are the spatial interpolation methods (or geostatistical procedures), which were developed in the last few decades as an evolution of ordinary kriging (OK) (Matheron 1971, Farmer 2016) and explicitly exploit spatial correlation (e.g. canonical kriging (CK), see Ouarda et al. 2001, Chokmani andOuarda 2004;top-kriging (TK), see Skøien et al. 2006). Also, maintenance of variance extension (MOVE) techniques (see e.g. Hirsch et al. 1982, Vogel and Stedinger 1985, Grygier et al. 1989 can be used to fill in missing observations at scarcely gauged locations by explicitly leveraging cross-correlation. Concerning spatial interpolation methods, they have been shown to be effective for predicting several streamflow indices and hydrological signatures in nested ungauged catchments (see e.g. Castiglioni et al. 2009, Archfield et al. 2013, Pugliese et al. 2014. For instance, for estimating daily streamflows at an ungauged catchment, the map-correlation method introduced by Archfield and Vogel (2010) suggests that if the correlations between a set of streamgauges and the ungauged catchment could be reliably estimated (by means of spatial methods, such as kriging), then the selection of the most correlated gauge leads to better performances than the selection of the nearest one (see also Patil and Stieglitz 2012, which show that spatial proximity alone cannot fully explain the prediction performance at a given location).
Another recently developed geostatistical procedure is topkriging (TK) (Skøien et al. 2006), which interpolates the runoff signature of interest along the stream network by taking the area and the nested structure of catchments into account. The method, originally tested for the prediction of specific 100-year flood for two Austrian regions (Skøien et al. 2006), was shown to provide more plausible and accurate estimates than OK. TK, which is thoroughly described in the Appendix, has been shown to be particularly successful in predicting a wide spectrum of point streamflow indices and variables in various geographical and climatic contexts: low flows (Castiglioni et al. 2011, high flows and floods (Merz et al. 2008, Archfield et al. 2013, flow-duration curves (Pugliese et al. 2014, Castellarin et al. 2018, stream temperature (Laaha et al. 2013), habitat suitability indices (Ceola et al. 2018), and daily streamflow series (Skøien and Blöschl 2007, Vormoor et al. 2011, Parajka et al. 2015, de Lavenne et al. 2016, Farmer 2016).

Open problems and aim of the study
In this context, a recent study by Archfield et al. (2013) compared the performances of GLS regression, CK and TK in predicting empirical estimators of flood quantiles across a set of 61 nested gauged basins (i.e. many of the gauges were within the watershed of other gauges) located in the Flint River basin in the south-eastern USA. Their study demonstrated that when the goal is the prediction of estimated empirical flood quantiles in nested ungauged sites, TK is likely to provide better predictions (i.e. smaller absolute errors and higher Nash-Sutcliffe efficiency; see Nash and Sutcliffe 1970) than CK and GLS regression models. Nevertheless, Archfield et al. (2013) also pointed out that, being entirely based on empirical data, their analysis cannot address the fundamental problem of understanding which technique is better suited (i.e. provides less uncertain estimates) for predicting the theoretical ("true" population) unknown flood quantiles in ungauged sites when the observed flood sequences are affected by cross-correlation.
In fact, referring to a set of correlated streamflow observations, in which the regional information content is reduced by intersite dependence, could hamper the identification of the theoretical flooding potential at a given ungauged site. On the one hand, one might expect TK to have better efficiencies in predicting the empirical estimator of the flood quantiles, weighting empirical information on the basis of the observed cross-correlation. On the other hand, one might assume that GLS, which explicitly models spatial correlation, would be more accurate in predicting the theoretical (and unknown) flood quantiles (i.e. "true" population flood quantiles). Despite its fundamental importance, this aspect has not yet been formally addressed in the literature, which motivates our study. Moreover, as recently suggested by Reis et al. (2020), it is important to point out that the application performed by Archfield et al. (2013) considered ungauged catchments that are nested with gauged ones (i.e. not fully ungauged): in such conditions, which are common in hydrological applications, TK takes advantage of flood data collected at nested gauged catchments, while GLS does not, being formulated to estimate hydrological statistics at fully ungauged sites.
The main objective of our study is to address the fundamental theoretical issue raised by Archfield et al. (2013), i.e. to understand which of the two ways of incorporating information on the cross-correlation structure of the data featured by GLS and TK is the most effective for estimating (a) the local empirical quantile estimate and (b) the theoretical flood quantile. To this aim, differently from Archfield et al. (2013), we consider a spatially limited, hydrologically homogeneous region and perform a controlled Monte Carlo experiment, which enables us to also assess the performance of the procedures in predicting the theoretical flood quantiles. In particular, we mimic the regional flood frequency regime and the spatial structure of the correlation among concurrent flows of a real-world example dataset for a homogeneous poolinggroup of catchments (see the Supplementary material for a detailed description of the dataset). Based on the structure of the real-world study region, we design a Monte Carlo simulation framework to generate 1000 realizations of the homogeneous regional set of floods (hereinafter referred to as the homogeneous region, for the sake of brevity) for three levels of regional cross-correlation (i.e. 3000 realizations in total).
In this preliminary study, we disregard the representation of the possible impact of nestedness in defining the cross-correlation structure based on the evidence. This impact is shown to be negligible for the study area (see the Supplementary material) as well as for other areas (see e.g. Fig. 4 in Castellarin 2007). Future analyses will specifically address this aspect, which has been shown to be significant in other geographical areas and hydroclimatic contexts (see e.g. Stedinger 2008, Veilleux et al. 2011). For each realization of the region and level of crosscorrelation, we apply GLS and TK to obtain predictions of at-site flood quantiles for return periods T equal to 10, 30, 50 and 100 years in a leave-one-out cross-validation (LOOCV) scheme, consistently with several studies on regionalization, including Archfield et al. (2013). Differently from Archfield et al. (2013), which considered a simple-regression application (i.e. based on Gotvald et al. 2009, drainage area was the only significant physiographic explanatory variable considered for their study area) of the two methods, we apply GLS and TK in both simple (i.e. univariate) and multiple (i.e. multivariate: more than one significant physiographic explanatory variable) versions. Finally, we compare the cross-validated GLS and TK regional flood quantiles against (a) the empirical estimators of flood quantiles and (b) their known theoretical values at each site in the realizations of the region.
As the basis for our simulations experiments, we consider a dataset of annual maximum series (AMS) of peak flow discharges collected at 20 nested catchments in Triveneto (northeastern Italy; see Persiano et al. 2016, and see the Supplementary material of the present article for a more detailed illustration), which may be regarded as possibly homogeneous in terms of flood frequency regime according to the test proposed by Hosking and Wallis (1997) (see also Castellarin et al. 2008). In the Supplementary material we also show that performing the exercise described in Archfield et al. (2013) for the Triveneto case study (i.e. application of GLS and TK in an LOOCV scheme for predicting at-site empirical flood quantiles in a nested region, see S.3 in the Supplementary material) leads to exactly the same results and conclusions as those obtained by Archfield et al. (2013) for the study area located in the south-east United States.

Monte Carlo simulation framework
To assess the behaviour of GLS and TK under different crosscorrelation scenarios and their accuracy in predicting the theoretical unknown flood quantiles at ungauged sites, we implement a Monte Carlo simulation experiment for generating realizations of a given cross-correlated homogeneous region, which can be summarized as follows (see also Fig. 1): Step 1. We focus on a real-world regional dataset (see the Supplementary material) to (a) mimic the regional flood frequency regime and controls of relevant catchment descriptors, as well as evaluate spatial correlation structure of flood flows, and (b) define theoretical flood quantiles at each and every site in the region referring to a unique regional parent distribution. Step 2. We generate 1000 realizations of the homogeneous region, with three different degrees of regional cross-correlation; each realization consists of N = 20 concurrent sequences (i.e. sites) of 35 annual floods (see Section 2.2 for the Monte Carlo simulation algorithm).
Step 3. We apply the L-moments algorithm Wallis 1993, 1997) for predicting at-site flood quantiles associated with return periods T = 10, 30, 50, 100 years at each and every site in each realization (see Section 2.3).
Step 4. We then refer to at-site flood quantiles for predicting flood quantiles in ungauged locations by means of simple and multiple (i.e. univariate and multivariate) applications of GLS (i.e. s-GLS and m-GLS) and TK (i.e. s-TK and m-TK) within an LOOCV procedure for each realization (see Section 2.4); Step 5. Finally, we compare GLS and TK cross-validated flood quantile predictions with their empirical and theoretical counterparts in terms of relative bias (RBIAS) and root mean square normalized error (RMSNE).
The main steps of the analysis and performance metrics are synthetically reported in Fig. 1. A more detailed description is reported in the following subsections.

Step 2. Realizations of cross-correlated regions
To investigate the impact of spatial correlation in flood data on the prediction accuracy of both GLS and TK, we refer to realizations of cross-correlated homogeneous regions generated in a Monte Carlo framework for three different degrees of intersite (or spatial, or cross-) correlation. We generate numerous realizations of cross-correlated hypothetical regions (each one composed of cross-correlated annual sequences of flood flows) that mimic the main characteristics of the real-world homogeneous reference region in terms of its flood frequency regime, these being: (1) the regional parent distribution of annual flood flows (we refer to the LP3 distribution, for which the Stedinger-Tasker GLS regression is adapted; see S.1); (2) the geomorphological and climatic controls on annual flood; and (3) the cross-correlation structure (see S.1 and S.2 in the Supplementary material).
We adopt as parameters of the theoretical parent distribution for each realization the LP3 regional L-moments, μ R , σ R and χ R , presented in S.1, hence we consider as theoretical dimensionless quantiles the regional LP3 quantiles q R T with return periods T = 10, 30, 50 and 100 years. It is worth noting here that we adopt as theoretical dimensional flood quantiles at each and every site i, Q R Ti , the product between q R T and local empirical mean annual flood, MAF i (i.e. the mean value of the observed AMS at site i).
The cross-correlation structure of the study region is modelled in the standard-normal space through the nonlinear model of Tasker and Stedinger (1989) (see Equation (A6) in the Appendix) and without distinguishing between cases where basins are nested and where they are not, for the reasons described above and according to the empirical evidence illustrated in the Supplementary material (see Fig. S4). We consider a scenario with � ρ equal to 0.6 (values of the parameters of the model by Tasker and Stedinger 1989, equal to θ,0.95 and α,0.07;see Equation (A6) in the Appendix) and two alternative models that adopt similar laws of decay between correlation and distance, describing a lower and higher degree of cross-correlation (or cross-correlation scenario) in the standard-normal space, and resulting in � ρ equal to 0.2 (θ,0.91, α,0.03) and 0.8 (θ,0.96, α,0.18), respectively.
For each cross-correlation scenario (i.e. � ρ = 0.2, 0.6, 0.8), 1000 realizations of the hypothetical homogeneous region are compiled by generating cross-correlated random annual flood sequences. Each realization of the region consists of 20 overlapping annual sequences with record length equal to 35 years (average record length in the real-world dataset). Flood sequences are simulated by first generating cross-correlated annual series from a multivariate standard-normal distribution with the selected record length (i.e. 35) and cross-correlation structure (i.e. a 20 × 20 correlation matrix resulting from the nonlinear model for the specific cross-correlation scenario). The generated standard-normal variates are then back-transformed to dimensionless LP3 flows (with parameters μ R , σ R and χ R ) through a quantile-quantile transformation. Note that the quantile-quantile back-transformation from the standard normal variate to the dimensionless LP3 flows introduces a slight distortion in terms of average crosscorrelation: a slight negative bias is observed (� ρ = 0.2, 0.6, 0.8 in the standard-normal space correspond to � ρ = 0.17, 0.55, 0.76 in the real space, after back-transformation), yet the variation range does not change noticeably. For this reason, the distortion is found to be not relevant for the aims of our study, and in this manuscript we always refer to � ρ computed in the standard-normal space.
Finally, to obtain the synthetic dimensional annual sequence of flood flows at site i, the i-th synthetic dimensionless sequence is multiplied by the local empirical mean annual flood, MAF i . Note that, due to sampling variability (i.e. limited record length), the empirical mean value of each generated dimensionless series is not exactly equal to 1; for this reason, the resulting dimensional series for site MAF i shows an arithmetic mean value that can differ from the local theoretical mean annual flood (MAF i , that is the empirical estimate from the real-world annual sequence for site i), and varies from realization to realization across the region according to the regional cross-correlation structure used in the random generation.
Moreover, it is worth noting here that, consistent with the hypothesis of homogeneity, the Monte Carlo generation of the synthetic dimensionless series considers no spatial pattern in the flood frequency distribution of the regional dimensionless samples, only different values of cross-correlation. The regional spatial pattern in the theoretical dimensional series is introduced by the local empirical mean annual flood, MAF i , which is always the same for the different realizations of the region, i.e. we mimic the same regional spatial pattern in terms of flood frequency distribution in all the Monte Carlo realizations. Figure 2 illustrates the cross-correlation structure in the standard-normal space as results from an empirical analysis of 1000 realizations for each cross-correlation scenario: (a) � ρ = 0.2, (b) � ρ = 0.6, (c) � ρ = 0.8. Figure 2(d) shows the overall distributions of empirical cross-correlation coefficients for the 1000 realizations of the three correlation scenarios, which are centred around the corresponding average cross-correlation (i.e. the median values of each box plot are consistent with the imposed � ρ values).

Step 3. Empirical flood quantiles
For each realization and each site in the region, empirical flood quantiles are then estimated for arbitrarily selected return periods T (i.e. T = 10, 30, 50, 100 years). The quantile estimator combines at-site and regional information. In particular, we refer to the LP3 distribution (i.e. no uncertainty on the model selection) and estimate the LP3 parameters of location and scale, μ and σ, respectively, locally by using the L-moments method (Hosking and Wallis 1997) (i.e. uncertainty from sampling error only on the moments of order 1 and 2); while the shape parameter χ is set equal to χ R (theoretical shape parameter, i.e. no uncertainty on the third-order moment).

Step 4. Prediction of flood quantiles in ungauged sites: application of GLS and TK in cross-validation
The Appendix recalls the theoretical bases of GLS and TK. Both methods are applied to the locally estimated (dimensional) empirical flood quantiles recalled in Section 2.3 for any realization that we generated as described above in order to predict flood quantiles at ungauged locations. Ungauged conditions at each and every site are simulated through an LOOCV scheme. For each realization in each cross-correlation scenario, we apply a simple and a multiple implementation of GLS and TK, as described in the following sections.

Application of GLS
The GLS analysis is carried out using the R-package WREG (i.e. Farmer 2017) (see also Eng et al. 2009) in the statistical environment R (R Core Team 2016). Applying the GLS procedure to a given site i requires estimates of standard deviation s i , regional skew G R;i , weighted skew G w;i and LP3 distribution standard deviate K i (see also Griffis and Stedinger 2007b). These variables are computed for the log-transformed samples of each synthetic region, as follows: • standard deviation s i is evaluated as the at-site empirical standard deviation; • we use the same value of regional skew G R for each and every catchment in the synthetic region (i.e. assumption of homogeneity); this value is computed as the empirical skew of the log-transformed regional dimensionless sample; • weighted skew G w;i for each site is computed as indicated in Equation (A4) in the Appendix, by combining the atsite empirical skew g i with the regional skew G R ; weights are computed as indicated in Equation (  The application of the GLS procedure requires an estimate of the cross-correlation structure, in terms of parameters θ and α in Equation (A6) (see Tasker and Stedinger 1989). We estimate the cross-correlation model relative to the distances between catchment centroids by optimizing model parameters θ and α through the ordinary least squares (OLS) procedure.
GLS regressions may use several catchment descriptors, e.g. drainage area, precipitation, elevation, etc. In the present study, we implement a GLS regional model in two ways: • simple-GLS (hereinafter referred to as s-GLS): the GLS quantile regression analysis is performed by considering drainage area A alone (as did in Archfield et al. 2013): where Q T i is the T-year flood for site i, A i is the drainage area for site i, and a T and b T are the GLS parameters; • multiple-GLS (hereinafter referred to as m-GLS): the GLS quantile regression analysis considers more catchment descriptors. In particular, we refer to drainage area A, mean annual precipitation MAP, latitude of catchment centroid Y g , and mean elevation H mean , which preliminary OLS stepwise log-linear regression analyses (see Draper and Smith 1981, Weisberg 1985, Chambers 1992 indicated are the most significant subset of descriptors (i.e. adjusted R 2 adj � 0.87 and normally distributed standardized residuals) for predicting the mean annual flood (MAF) for the reference real-world dataset described in the Supplementary material. The resulting multiple-GLS model can be described as: where a T , b T , c T , d T and e T are the GLS parameters. Equations (1 and 2) are then reduced to linear additive forms by means of log-transformation (see e.g. Thomas and Benson 1970, Pandey and Nguyen 1999, Griffis and Stedinger 2007a, Laio et al. 2011).
Both the s-and m-GLS applications are explored in an LOOCV scheme, by removing in turn one site from the dataset and by referring to the remaining N-1=19 sites while estimating (1) the cross-correlation structure (i.e. fitting of the model of Tasker and Stedinger 1989), (2) the GLS parameters, and finally (3) the flood quantiles Q T at the discarded site. Consistently with the Monte Carlo generation, where nestedness was not modelled, GLS is applied without removing eventual redundant nested sites.

Application of TK
The first step of the ungauged application of TK is the use of OLS to identify a regional power-law model between flood quantile Q T and basin area (see e.g. Pugliese et al. 2014Pugliese et al. , 2016. The OLS estimates are then used to standardize LP3 quantiles (i.e. Q T with T = 10, 30, 50, 100 years) at all sites: this fundamental step is necessary as TK directly handles drainage area as a key variable of the model itself.
We opt for two types of OLS regional power-law to keep the applications of GLS and TK consistent (see Section 2.4.1): • simple-OLS (s-OLS; i.e. considering only drainage area), resulting in a standard application of TK (hereinafter referred to as s-TK); • multiple-OLS (m-OLS; i.e. considering the five significant descriptors identified via stepwise regression analysis; see e.g. Equation (2)), resulting in a TK with external drift (hereinafter referred to as m-TK; see e.g. Laaha et al. 2013, for the use of m-TK for predicting stream temperatures).
Both s-OLS and m-OLS are applied in linear additive forms by means of log-transformation, and their natural estimates (i.e. back-transformed from the logarithmic to the natural space) are then used to standardize LP3 quantiles. TK interpolation is then applied using the R-package rtop (i.e. Skøien et al., 2014;Skøien, 2014) by fitting the sample variogram of the standardized quantiles with the five-parameter fractal-exponential model suggested in Skøien et al. (2006) through a modified version of weighted least squares (WLS) regression (see Cressie 1993). The fitted variogram model is then used to compute the kriging weights, referring to the six closest neighbouring stations (in line with recent sensitivity analyses, see e.g. Pugliese et al. 2014Pugliese et al. , 2016. The standardized quantiles are then predicted site by site using Equation (A9) in the Appendix. Finally, the TK estimates of the standardized quantiles are combined with the regression estimates resulting from s-OLS and m-OLS to obtain the s-TK and m-TK estimated T-year value at each site.
Both the s-TK and m-TK analyses are performed in an LOOCV scheme, by removing in turn one site from the dataset and referring to the remaining N-1=19 sites while estimating (1) the OLS regional power-law useful for standardizing flood quantiles, (2) the variogram (i.e. five-parameter fractal-exponential model suggested in Skøien et al. 2006), and (3) the flood quantiles Q T at the discarded site.

Performance metrics
We use two metrics for assessing the prediction performance for the considered versions of GLS (i.e. s-GLS and m-GLS) and TK (i.e. s-TK and m-TK). In particular, to assess the overall performance of GLS and TK in the entire region, we consider two error measures, RBIAS and RMSNE: where N is the number of sites, x i the estimated variable at site i and x i the observed value of the variable at site i. The choice of relative (i.e. RBIAS) or normalized (i.e. RMSNE) measures is made to quantify errors and performance, regardless of the size of the drainage area of each catchment.

Results
Box plots in Figs 3 and 4 depict the efficiency of GLS and TK in estimating empirical and theoretical flood  quantiles, respectively, in ungauged sites (note that "ungauged" refers here to the application of the LOOCV scheme). Each box plot depicts the distribution of 1000 values (i.e. one value for each realization of the region) of the selected performance measure for the selected method, return period and cross-correlation scenario. To enable one to visually compare the results across different return periods, methods and degree of cross-correlation, the y-axes in Figs 3 and 4 are fixed for each row.
Concerning the efficiencies in estimating the empirical estimates of flood quantiles, TK results in generally better predictions of empirical quantiles (i.e. lower RBIAS and lower RMSNE) than the corresponding version of GLS (see Fig. 3): s-TK outperforms s-GLS, and m-TK shows similar median RMNSE, but smaller RBIAS than m-GLS. Moreover, the comparison of s-GLS with m-GLS, and of s-TK with m-TK, shows that the inclusion of more catchment descriptors in the regression analysis (i.e. m-GLS; m-OLS for m-TK) leads to substantially improved performances. In particular, the weak performances of s-GLS can be explained by the fact that drainage area alone is not enough for fully describing MAF (and therefore dimensional flood quantiles) in the study region. Moreover, the similar behaviour of m-GLS and m-TK can be attributed to the fact that they both use the important physiographic information needed to explain variations in the mean flood. In this regard, as observed for the real-world study area (see S.3 in the Supplementary material), all the considered procedures are slightly positively biased, with median RBIAS values that are similar to the corresponding values obtained for the real-world application (see Table S1 for the 100-year flood quantiles), and RBIAS substantially decreasing from s-GLS to m-TK.
Another aspect shown in Fig. 3 is the dependence on T: for each � ρ and considered method, we observe that the higher the T value, the lower the performance (i.e. the higher the bias or uncertainty). This behaviour is expected and confirms that estimates of flood quantiles associated with lower probability of occurrence (i.e. higher return period T) are affected by higher uncertainties.
Finally, we observe a dependence of the performance in estimating empirical flood quantiles on the average regional cross-correlation � ρ: for a given method and return period T, the higher the � ρ the better the median of the performance and the lower the uncertainty, with much less dispersed performance. This behaviour is expected, especially for TK, which explicitly exploits cross-correlation in predicting flood quantiles.
With regards to the efficiencies of GLS and TK in estimating the theoretical flood quantiles, Fig. 4 confirms the trend already observed for empirical flood quantiles: s-TK, m-GLS and m-TK outperform s-GLS in predicting theoretical flood quantiles (i.e. lower RBIAS and lower RMSNE), and, in particular, m-GLS and m-TK perform similarly, with slightly less biased predictions for m-TK. This confirms that incorporating multiple regression in m-GLS and m-TK improves substantially the prediction performances. Moreover, the weak dependence on T is confirmed: the higher the T value, the higher the uncertainty.

Discussion
Notwithstanding the above-mentioned similarities in the relative behaviour of the methods, a significant difference is present in the extent of the performances: all the considered methods show generally higher uncertainty in estimating theoretical flood quantiles than empirical quantiles. This can be explained by the fact that the presence of cross-correlation hampers the identification of the theoretical flooding potential in the region, and none of the considered methods is able to effectively overcome this effect, unless the regional average cross-correlation is very limited (i.e. see results for � ρ = 0.2). An exception is observed for � ρ = 0.2, for which RBIAS values show a larger dispersion in predicting theoretical flood quantiles (compare panels (a) in Figs 3 and 4), yet RMNSE indicates higher accuracies for all the considered methods in predicting theoretical flood quantiles than their empirical estimates. In this regard, other important indications come from the dependence on � ρ: contrary to what is observed for empirical flood quantiles, for estimating theoretical flood quantiles, the higher the � ρ the lower the performance for both GLS and TK. As the spatial correlation has the effect of hampering the identification of the flood magnitude for a limited set of flood-flow observations, and TK exploits cross-correlation to perform its estimates, the decreasing performances and increasing uncertainty of s-TK and m-TK with increasing � ρ are expected. On the other hand, the increasing uncertainty of GLS with increasing � ρ shows that, even if GLS is able to reduce the hampering effect due to cross-correlation thanks to its efficient estimation of the covariance matrix (Kroll and Stedinger 1998), a remaining uncertainty, increasing with increasing levels of spatial correlation, is still present.
In summary, the Monte Carlo experiment performed in this study enables us to address the unsolved issue raised by Archfield et al. (2013) regarding the ability of GLS and TK in predicting the theoretical unknown flood quantiles in ungauged sites when the flood sequences observed at nested gauged catchments are affected by cross-correlation. The main outcome of our study is that, despite the different nature of GLS and TK and their different ways to treat spatial correlation, cross-correlation controls GLS and TK accuracy of prediction of empirical and theoretical flood quantiles in a very similar way. In particular, TK outperforms GLS when catchment area is the only catchment descriptor used for predicting theoretical flood quantiles, regardless of the level of crosscorrelation. These results are valid under the simplifying assumptions adopted in the analysis; a fair interpretation of the outcomes of our study, which is the first to address this very interesting issue, needs to acknowledge the simplifying hypotheses we adopted, which are recalled and discussed below.
We refer to synthetic regions that mimic the real-world reference dataset (i.e. an acceptably homogeneous region, with partially nested catchments, and no noticeable influence of nestedness on the cross-correlation structure; see the Supplementary material). Indeed, given the partial nestedness of catchments in the study area, the LOOCV scheme performed in our analyses refers to ungauged sites that are likely to have other gauges upstream or downstream within the same catchment, and therefore are not fully ungauged catchments in the strict sense. Although the presence of nested catchments represents a common situation in hydrological applications, it is important to recall that Stedinger and Tasker (1985) considered GLS very explicitly for fully ungauged catchments (no nested sites), and Veilleux et al. (2011) suggested to remove (redundant) nested gauged sites when they share a large portion of drainage area. For this reason, future studies could explore the more general case where cross-correlation is modelled by considering nestedness and GLS is applied by removing redundant sites (as suggested by e.g. Veilleux et al. 2011).
Another important aspect is related to the homogeneity of our study region: we refer to the straightforward case of a homogeneous region, yet the homogeneity hypothesis (i.e. the same growth factor but index flood varying in space) is not a fundamental prerequisite for the application of GLS and TK, which can indeed be applied to heterogeneous areas. For this reason, future studies could consider the application of a Monte Carlo experiment similar to the one considered here, but referring to a heterogeneous region (i.e. theoretical flood quantiles for each site referring to different values for the parameters of the chosen distribution). Also, our experiment (see Section 2.2) mimics the same regional spatial pattern (in terms of regional flood frequency distribution) in all the Monte Carlo realizations of the region: while no spatial pattern is introduced in the regional dimensionless samples (i.e. hypothesis of homogeneity), the generated dimensional samples depend on the local empirical mean annual flood, MAF i , which is always the same for the different realizations of the region. Therefore, our Monte Carlo generation does not model the dependence of the generated series from the considered catchment descriptors (i.e. covariates). In this context, future studies could consider synthetic series generated as a function of the covariates (as done e.g. by Stedinger and Tasker 1986).
Moreover, we considered overlapping annual sequences with record length equal to 35 years for every site, without investigating the sensitivity of the methods to different record lengths. Finally, we referred to the L-moments approach Wallis 1993, 1997) to fit the LP3 distribution, but we should consider that the GLS procedure implements another approach for fitting LP3 (see Section 2.4.1), and this difference could in part affect the results.
Relaxing the above-mentioned simplifying assumptions opens up interesting future research avenues that can deepen our current understanding on how best to handle spatial correlation of flood sequences for predicting flood quantiles in partially gauged or fully ungauged catchments.

Conclusions
The present study addresses an important research question raised by Archfield et al. (2013), namely understanding which of two techniques, Tasker-Stedinger generalized least squares (GLS) regression or top-kriging (TK), is better suited for predicting the theoretical unknown flood quantiles in ungauged sites when the observed flood sequences are affected by crosscorrelation and nestedness does not play a significant role on the cross-correlation structure of the study region. The performance of GLS and TK is evaluated relative to the prediction in ungauged conditions of the local empirical quantile estimates (i.e. those that would have been estimated if data were available at that location) and the theoretical flood quantiles.
The preliminary LOOCV analysis performed over a realworld study area (i.e. a spatially limited, hydrologically homogeneous region in Triveneto consisting of 20 partially nested catchments; see the Supplementary material) highlights that the behaviour of the simple (i.e. function of the drainage area alone) versions of GLS (i.e. s-GLS) and TK (i.e. s-TK) applied for predicting the 10-year flood is consistent with the results reported by Archfield et al. (2013). The better performances of s-TK compared to s-GLS are expected: referring to n neighbouring sites, TK is implicitly able to take some climate and geomorphological similarities between catchments into account. However, the inclusion of more catchment descriptors in the analysis (i.e. m-GLS, m-TK) can lead to substantially improved performances, especially for GLS. Although informative, our preliminary analysis cannot address the unsolved problem raised by Archfield et al. (2013), as the theoretical flood quantiles are unknown in real-world study areas.
To shed some light on the above-mentioned research question, we resort to the Monte Carlo simulation experiment described in Section 2, generating a total of 3000 realizations of the homogeneous region with different values of average regional cross-correlation (i.e. equal to 0.2, 0.6, 0.8), where the cross-correlation structure does not distinguish between nested and non-nested catchments. For each realization, GLS and TK are applied to obtain predictions of at-site flood quantiles (with return periods equal to 10, 30, 50, 100 years) in an LOOCV scheme. This application provides us with some significant information about the ability of GLS and TK to predict empirical estimates of flood quantiles and theoretical flood quantiles at nested ungauged catchments, when GLS is applied in its traditional formulation (i.e. non-Bayesian and without removing redundant nested gauged sites), consistent with Archfield et al. (2013). First, in agreement with what is observed for the real-world application, the multiple versions of both GLS and TK are better suited than the corresponding simple versions for predicting both empirical and theoretical flood quantiles. Moreover, the analyses highlight an analogous dependence of GLS and TK performance on the degree of cross-correlation: the larger the regional average cross-correlation, the better the empirical estimators of flood quantiles, and the poorer the estimators of the theoretical flood quantiles. These findings prove that, under the adopted simplifying assumptions, the presence of cross-correlation in the annual flood sequences hampers the identification of the theoretical flood magnitude for both GLS and TK. This behaviour is totally expected for TK (which explicitly exploits spatial correlation in performing its estimates), whereas what was observed for GLS may be explained by acknowledging the very specific and simplified situation considered in our study (i.e. homogeneous region with nested gauged sites). The investigation of a more general case where nested catchments may impact the cross-correlation structure, which should therefore be modelled in GLS by removing redundant sites, is left for future studies.
In particular, the multiple versions of GLS and TK show very similar performances: the application of m-GLS or m-TK is almost equivalent when descriptive geomorphological and climatic catchment attributes are available for representing mean annual flood; for practical estimation in cases like this, one could consider the application of a model-averaging approach between the two candidate models. On the contrary, when only a simple-regression analysis with drainage area is performed, the application of TK is recommended, especially in the presence of high degrees of spatial correlation. In this context, an interesting issue to be investigated in future studies could be the use of GLS (instead of OLS) for identifying the regional power-law model between flood quantiles and catchment descriptors when applying TK.
The findings outlined are valid for the simplified situation that is investigated here and suggest that, for a homogeneous region, the benefits of applying traditional GLS over TK are rather limited even in estimating theoretical flood quantiles. Therefore, in such a situation, the application of TK is suggested over traditional GLS. Further analyses regarding heterogeneous regions (i.e. the presence of a regional spatial signal), flood series generated as a function of catchment descriptors (see e.g. Stedinger and Tasker 1986), fully ungauged conditions (no information is available upstream or downstream within a given catchment, i.e. the absence of nested catchments) and/or the application of more recent versions of GLS (e.g. with removal of redundant nested gauged catchments, B-GLS), as well as sensitivity analyses of the results to the presence of different record lengths in the AMS series are suggested for future studies.
where X T indicates the transposition of matrix X, and Λ À 1 is the inverse of the weighting matrix Λ. Once estimated, β can be used to compute the regression estimates Ŷ (i.e. ŷ i at the ith gauge). Ordinary least squares (OLS) regression is traditionally used for estimating the regression parameters β , by setting Λ ¼ I in Equation (A2), where I indicates the identity matrix (Thomas and Benson 1970, Hardison 1971, Riggs 1973. To estimate β , OLS considers the total errors of the model to be homoscedastic and independently distributed (Riggs 1973); however, these assumptions are often violated in hydrological applications, where different streamgauges usually have different record lengths, and concurrent streamflows observed at different gauges in a region are often cross-correlated (Tasker and Stedinger 1989). If not properly represented in a regional analysis, cross-correlation affects the precision of regression parameters, and the estimators of flood characteristics are inefficient in that they are not as accurate as they could be (i.e. the higher the cross-correlation, the higher the model errors; see Stedinger and Tasker 1985).
To deal with situations where regression model residuals are heteroscedastic and cross-correlated, the regression parameters can be estimated by means of a generalized least squares (GLS) technique, which is the best linear unbiased estimator (BLUE) when the true residual error covariance matrix Λ is known (Johnston 1972). Unfortunately, in general Λ is unknown and an estimator must be employed. Stedinger andTasker (1985, 1986) and Tasker and Stedinger (1989) addressed the issue of the loss of efficiency from using an approximation of the covariance matrix when estimating streamflow characteristics in non-nested ungauged (i.e. fully ungauged) basins: they developed a procedure for estimating Λ accounting for both correlated streamflows and time-sampling errors. The Stedinger-Tasker GLS improves the representation of the overall regression error ε, by assuming that it equals the sum of the sampling error η for the estimates of the streamflow statistics (e.g. flood statistics), and the modelling error δ in modelling the theoretical streamflows (e.g. flood quantiles) across catchments. Kroll and Stedinger (1998) demonstrated that when estimating flood quantiles with the Stedinger-Tasker GLS Λ-estimator, estimation of Λ GLS results in little loss of efficiency. More recently, Reis et al. (2005) considered Bayesian estimators of model error and regression parameters. Note that, unlike feasible GLS (FGLS) estimation (which can be applied to any frequency distribution; see e.g. Greene 2002), the Stedinger-Tasker GLS regression is applied for a log-Pearson type III (LP3) frequency analysis (see Bulletin 17C by England et al. 2018), which is the reference procedure for estimating streamflow characteristics in ungauged catchments in the US (Eng et al. 2009).
In particular, the Stedinger-Tasker GLS (hereinafter referred to as GLS, for the sake of brevity) computes the regression parameters by setting Λ ¼ Λ GLS in Equation (A2), where Λ GLS contains the estimates of the covariances of ε i among gauged sites. The main diagonal elements of Λ GLS include a part associated with the model error δ i and all elements include the effect of the time-sampling errors η i . For streamflow characteristics that are computed from a log-Pearson type III (LP3) frequency analysis (see Bulletin 17B of the Interagency Advisory Committee on Water Data 1982, and the most recent Bulletin 17C by, England et al. 2018), Tasker and Stedinger (1989) propose to estimate Λ GLS as follows: