Relation between design floods based on daily maxima and daily means: use of the Peak Over Threshold approach in the Upper Nysa Kłodzka Basin (SW Poland)

ABSTRACT The estimation of flood quantiles is crucial in the assessment of the magnitude and frequency of floods. We carried out a comparative analysis of design discharges estimated from both daily maximum flows and daily mean flows for four mountainous catchments located in the Upper Nysa Kłodzka river basin (SW Poland). After separation of baseflow, split of the riverflow time series in independent events, and selection of the Peak Over Threshold sample, the parameters of the Generalized Pareto Distribution were estimated using the Hill statistic, after bias correction, and considering asymptotic properties. The comparison was performed for various return periods, where the long return periods were of main concern. The jack-knife approach was used to assess the uncertainty of the predicted flood quantiles, and comparison was made with an alternative approach based on annual maxima. We found a meaningful level of differences between daily maximum and mean design discharges and between the rate of change of flood magnitude for which the level (i) stabilized with increasing return period, (ii) decreased downstream, and (iii) was large for catchments susceptible to flooding and with high elevation change. Results are useful in practice when daily maximum discharge is not routinely recorded.


Introduction
The main objective of Flood Frequency Analysis (FFA) is to estimate flood quantile magnitudes for given return periods T. Such estimates are applied in flood hazard and risk assessment, and in designing water protection devices, including levees, dams, barrages, canals, floodgates and polders or other structures such as bridges and basements for viaducts.
In support of these applications, FFA delivers Flood Frequency Curve (FFC) that depicts relationships between flood magnitude and T. In an FFC, the response behaviour of a basin to rainfall is represented because its shape reflects the dynamics of floods. Different methods, however, exist to construct such FFC and analyse its shape (Kellagher 2013).
In the Peak Over Threshold (POT) method a discharge X over a certain threshold x t (X > x t ) is assumed to follow the Generalized Pareto Distribution (GPD). The approach is based on the Pickands-Balkema-de Haan theorem (Balkema & de Haan 1974, Pickands III 2000 stating that the GPD is the limiting distribution function of X if x t ! 1 . The cumulative distribution function (CDF) of the GPD is where g 2 R, s > 0 are parameters of shape and scale, respectively. The tail 1 ¡ G is heavy if g > 0, normal if g D 0 (exponential CDF) and light (truncated) if g < 0. The shape g is known as the extreme value index. The scale parameter depends linearly on the threshold x t , with proportionality factor equal to the shape parameter for sufficiently high x t , s D gx t . High quantiles of GPD are especially responsive to g and very sensitive to deviations of g from 0 (Davidson & Smith 1990, Brilhante 2004. Therefore, the issue of how to distinguish g D 0 from g > 0 is of great importance in FFA and was addressed in the context of rainfalls (Gomes & Van Montfort 1987, Groisman et al. 2005, Kozubowski et al. 2008) and discharges (Van Montfort & Witter 1985, Smith 1989).
The main advantage of the POT approach in comparison to the use of block maxima (e.g. annual maxima AM) is an increase of the sample size by considering all independent peak flows in the time series instead of only the AM. Next, the GPD can cover a wide range of tail weights. The disadvantage is the inclusion of a subjective judgment into the POT selection and GPD calibration process, and the strong sensitivity of high quantile estimates to the threshold choice. The main difficulties in selecting the POT series are to ensure independence between successive floods and to select the threshold that produces the lowest bias of quantile estimates.
The problem of the choice of independent events and the truncation level are related to each other as a threshold should be high enough to ensure independence. However, it should not be too high to reach much profit from a large sample size. A two-step approach consisting of the identification of the sample of independent events (physical declustering) and setting a threshold high enough to yield convergence to GPD (statistical optimization) usually leads to a reliable estimation of the parameters of the GPD (Bernardara et al. 2014). To provide independent events, Cunnane (1979) proposed to select consecutive peaks the time between which is at least three times longer than the average time to peak and with the smallest peak lower than two-thirds of the highest peak, the Water Resources Council (USWRC 1982) advised that the interexceedance time is not shorter than 5 C ln (A) (where A is the catchment area in square miles) and the drop of the interevent discharge below 75% of the lowest peak (Madsen et al. 1997, Lang et al. 1999, Ba cov a-Mitkov a & Onderka 2010, Bezak et al. 2014). Yet another method was introduced by Willems (2009) where the extraction of approximately independent events was based on split of the series into quick and slow flow hydrograph periods. Since then, this method has been increasingly adopted both for river discharges and rainfall intensities.
In relation to the problem of the threshold selection, Langbein (1949) proposed the smallest AM discharge as the initial threshold which is then increased to meet independence criteria, and Cunnane (1979) included a test of the Poissonian number of exceedances based on dispersion index. The issue of the mean number of exceedances has been widely discussed (Dalrymple 1960, Madsen & Rosbjerg 1997b, Lang et al. 1999, Robson & Reed 1999. Those procedures may suffer from some subjectivity and from lack of unique guidelines (Serinaldi & Kilsby 2014). In the method used by Willems et al. (2007), the condition of optimal goodness-of-fit between the estimated and observed quantiles was introduced, as an objective judgment, on the threshold selection.
Calibration of the GPD was conducted using various methods. Madsen and Rosbjerg (1997a) compared the precision of the 100-year quantile assessment for the method of moments, probability weighted moments and maximum likelihood (ML). Martins and Stedinger (2001) proposed the Generalized Maximum Likelihood (GML) method which combines ML and Bayesian approaches. Another approach to estimating of the GPD parameters was adopted by Willems et al. (2007) who used the Hill estimator of g for POT models of river discharges. This method has been successfully applied for various world regions both to river discharges and rainfall intensities (Boniphace & Willems 2011, Onyutha & Willems 2015, Taye & Willems 2011. The distribution of excess values over a threshold converges to the GPD if the threshold increases to infinity. Thus, the calibration of the GPD may lead to a bias of the estimator of g which also leads to a bias of quantile estimates. Therefore, a bias correction method should be included to reveal a more reliable shape parameter estimate and lessen the bias of high flood quantile estimates. In this paper, a bias correction method based on slowly varying function was used (Beirlant et al. 1999). This method was implemented by Willems et al. (2007) to river discharges.
The POT models were applied to various regions of Poland by Zieli nska (1965), Strupczewski (1967), Banasik and Byczkowski (2007), Byczkowski et al. (2008), Rutkowska and Banasik (2014). In this paper, the method of Willems et al. (2007), Willems (2009) was used based on the generalization of the Chapman filter for extraction of the slow and quick runoff fractions and the split of the time series in independent quick runoff event based on independency criteria. For calibration of the GPD, the Hill estimates and the bias correction were used.
The main objective of this paper is to find the relationship between X b , the means-based design discharge, and X a , the maxima-based design discharge based on discharge data from four river gauging stations located within the upper part of the Nysa K»odzka river basin. The differences between shape parameter estimates, design discharges per unit area and rate of change of FFC per unit area are studied. Additionally, catchment characteristics that have the strongest impact on the difference between X a and X b are also identified.

Rationale for the study
Not uncommonly, even if daily discharge maxima are needed for certain applications (e.g. dam management, hydrologic modelling), they are/were not measured routinely at hydrologic gauges. For instance, historical data, acquired before the era of automatic hydrologic stations, may be available as mean daily discharge calculated from very few readings a day, none of which captures a daily maximum. Although recent automatic hydrometric gauges allow us to measure water levels or discharges with unprecedented sampling intervals of minutes, there exist hydrologic observational networks, or their parts, in which measurements are carried out only a few times a day. This may occur either at sites where there is no automatic gauge or at gauges in which sensors are temporarily out of order. In such cases, an experienced observer is responsible for carrying out the measurements in question, and this is routinely done a few times per day (e.g. 6:00 UTC, 6:00 + 18:00 UTC, 6:00 + 12:00 + 18:00 UTC). This kind of sampling may lead to omissions of observations of daily peak flows. When we know the relationship between X a and X b from previous studies and when we accept that calculating a mean value from these small data-sets produces a meaningful estimate of the true mean, we are able to apply the method elaborated in this paper to reconstruct the peak flow. Therefore, knowledge gained through the reconstruction may be important for various types of water engineering applications, such as for the design of hydraulic structures (peak discharges remain the key information for engineers) or for hydrologic forecasting (when mean is forecasted and the prognosis of the peak flow may be established using the approach outlined in this article). In order to confirm that the described application of our method is feasible, we refer to the official hydrometeorological network of Poland which is built and maintained by the Institute of Meteorology and Water Management National Research Institute (Instytut Meteorologii i Gospodarki Wodnej, IMGW-PIB). Even though this network is advanced, with over one thousand fully automatic weather or hydrologic gauges (Dubicki 2007), there are individual sites in which an observer conducts water level or flow measurements. This is currently the case in the town of Zgorzelec in southwestern Poland where water levels are observed every 12 hours.
Thus, having such mean daily discharge data, it is possible to process them in order to estimate the daily maxima time series. The problem is that these two types of data over certain thresholds represent two different random variables. The first variable, the maxima-based discharge (X a ) is stochastically larger than the second variable, the means-based discharge (X b ). Thereby, for a certain return period T, the design discharge estimated from X b is lower than the design discharge estimated from X a , and the use of X b instead of X a can result in an incorrect assessment of flood risk. The difference between X a and X b is substantial for rivers with high flood dynamics. In this case, if only mean daily discharges X b are available, these may be transformed to maxima-based discharges X a . A key issue then is the recognition of the proper transformation between X b and X a . The transformation is catchment-specific.
In this paper, discharges of rivers in the Nysa K»odzka basin are studied. The basin is a part of the Odra basin, the second longest river in Poland. The necessity of proper management of flood risk is undeniable, as the region is very susceptible to flooding. The region has suffered from severe floods for centuries (Dubicki et al. 2005, Kasprzak 2010, Czaja et al. 2014). The Odra basin was investigated by many authors. Among others, Sen and Niedzielski (2010), Niedzielski (2007Niedzielski ( , 2010Niedzielski ( , 2011 as well as Niedzielski and Mizi nski (2016) examined statistical characteristics of riverflow variability in the basin and attempted to forecast the hydrographs. It was found that discharges at gauges located in the upper and middle Odra basin revealed heavy-tailed dynamics and may be controlled by many processes among which El Niño/Southern Oscillation has been identified. However, the predictability of the hydrographs with data-based models is acceptable for short lead times and is rather limited to a rising limb of hydrograph, with daily peak discharge being usually mismodelled. Hence, there is a need for seeking methods that may help resolve a daily maximum discharge in the process of forecasting high flow episodes when input data are daily means rather than daily maxima.

Studies on the relationship between instantaneous peak flows (IPF) and mean daily flows (MDF)
Studies on this relationship have been carried out by Langbein (1944), Sangal (1983), and Fill and Steiner (2003) who estimated IPF using a linear combination of MDF from days prior and next to the flow peak. Next, the ratio of MDF to IPF as a function of catchment area was studied by Fuller (1914), Tucci (1991), and Ellis and Gray (1966). Regionalization of this ratio was also proposed enabling the introduction of a series of flows that are shorter than those required by the local approach and that could be used to estimate design floods in places without instantaneous peak flows measurements. Canuti and Moisello (1982) included basin geomorphic parameters into this relation for groups of rivers in Tuscany (Italy). Taguas et al. (2008) found, using the Principal Component Analysis technique, several regional linear equations for rivers in the South Mediterranean Basin of Spain and used such characteristics as rainfall, catchment area and slope. Dastorani et al. (2013) and Shabani and Shabani (2012) estimated IPF from MDF for rivers in Iran using Artificial Neural Networks. Ding et al. (2016) compared various methods to estimate the distribution function of IPF using HBV hydrologic model while IPF quantiles were estimated from MDF quantiles using multiple regression scheme. A common formula between IPF and MDF for groups of catchments was found by Ellis and Gray (1966) (Canada), Taguas et al. (2008) (Spain), Cotecchia (1965) and Tonini (1969) (Italy). In the studies mentioned above, MDF is the mean daily flow at the day of peak. Also note that in this paper no such assumption was introduced because of our focus on design flows, i.e. flows for given return period. The days of maximum flows X a and the days of mean flows X b selected for POT samples may differ. This problem is discussed in Sections 3.5 and 4.5.

Study area and data
The research focuses on the rivers draining the K»odzko Valley (SW Poland), a mid-mountain abasement situated in the Sudetes Mountains. The specific geographical setting of the K»odzko Valley is associated with its location between several mountain chains, with considerable relative elevations observed in respect to the bottom of the valley. The main river of the K»odzko Valley is Nysa K»odzka which, after passing through the trough in Bardo at the approximate height of 260 m a.s.l., flows into the Odra River (the second largest river in Poland). There are several tributaries of the Nysa K»odzka river, most of which are mountainous rivers that are flood-prone (Kasprzak 2010) and reveal short flood wave propagation periods. Extreme peak flows were recorded along most of them, but the Nysa K»odzka river and its right tributary Bia»a Lądecka are particularly endangered (Latocha & Parz och 2010). Their upper parts drain the entire Snie_ znik Massif, the highest mountain chain in the study area. This Massif experiences rainfall episodes which are usually more intensive than in other mountain chains in the K»odzko Valley. For these reasons, we selected these two rivers for this study, with three gauging stations along the Nysa K»odzka river (at Bystrzyca K»odzka, K»odzko, Bardo) and one along the Bia»a Lądecka river (at _ Zelazno). The studied catchments were shown in Figure 1.
Although the two rivers in their upper parts drain the entire Snie_ znik Massif, they have several differences. The most profound difference is a dissimilar land cover of the basins, with various percentages of forested areas. Three sub-basins upstream of Bystrzyca K»odzka, K»odzko and Bardo reveal similar values of the Terrain Ruggedness Index (TRI), while the remaining sub-basin upstream of _ Zelazno is more rugged than the former three basins. The TRI is a measure of terrain elevation heterogeneity. It is the root mean square elevation change between any point on a grid and its surrounding area (Riley et al. 1999). The catchment characteristics are juxtaposed in Table 1. The difference in river defences is also noticeable. The number of settlements along the Bia»a Lądecka river banks is observed to be much higher than along the Nysa K»odzka, thus the river course of the former has been much more protected than of the latter. Along the Bia»a Lądecka river, many buildings, bank reinforcements and groynes are tied together side by side to form a continuous system of protection measures; mostly along the middle and lower river reaches Witek 2013, Witek 2012). This causes an increase in the steepness of the river banks, and of the river flow response time to rainfall. Along the Nysa K»odzka river, in turn, the length of the regulated banks is much lower because the reinforcements are mostly located along the main cities.
Water levels at 22 automatic river gauging stations in the K»odzko Valley and its vicinity are measured in real time with approximately 15-minute temporal resolution by the Local System for Flood Monitoring (Lokalny System Os»ony Przeciwpowodziowej -LSOP) of K»odzko County. The system was designed to provide a riverflow monitoring platform available both for authorities and citizens.
The LSOP system has recently been coupled with the innovative HydroProg infrastructure (Niedzielski et al. 2014) to calculate and publish predictions of water levels in real time (Niedzielski and Mizi nski 2016). The gauges selected for this studynamely Bystrzyca K»odzka, K»odzko, Bardo and _ Zelaznoare covered by the LSOP infrastructure, and hence the unique 15-minute water level data are available at these sites. The data were pre-processed within the HydroProg system to: (1) produce equally sampled 15-minute water-level time series, (2) express time in UTC (Universal Time Coordinated) and (3) remove artefacts from a raw water-level time series using Rosner's test (Rosner 1983). We limited our analysis to the hydrologic years 2006-2014, namely to the time period 01/11/2005-31/10/2014 according to the Polish definition of hydrologic year.
Each 15-minute, water-level time series was subsequently transformed into discharge. This procedure was feasible since the above-mentioned four gauges are common both for the LSOP gauging network and the state hydrological network which is managed and maintained by the Institute of Meteorology and Water Management -National Research Institute (Instytut Meteorologii i Gospodarki Wodnej -Pa nstwowy Instytut Badawczy, IMGW-PIB, Poland). Tabulated rating curves for these sites have been acquired from IMGW-PIB. After obtaining the 15-minute discharge time series, we produced two daily series, of daily maximum and daily mean discharges.
Additionally, empirical AM discharges from three gauging stations: Bystrzyca K»odzka, K»odzko and _ Zelazno were included in the study. The AM data were obtained from IMGW-PIB. AM were not available for Bardo.
Of note is that measurements from LSOP and from IMGW-PIB are independent of each other. We used the series from LSOP for the estimation of design discharge by means of the POT method and the series from IMGW-PIB for validation. That data differed by origin, which is reflected in different annual maxima peak values in the common data period 2006-2014. The length of AM series was 63 years (Bystrzyca K»odzka, K»odzko) and 53 years ( _ Zelazno).

Calibration of the GPD and estimation of design discharge
The calibration of the GPD to POT values, starting from a daily flow series, involves different steps (Willems 2009): (i) separation of the flow time series in quick and slow subflows, (ii) separation of the quick subflow series in independent events and selection of the POT values and (iii) calibration of the GPD. In (i), a generalization of the recursive digital filter of Chapman (1991) was used to derive the baseflow component in the total flow series, and use it to separate the quick flow from the slow flow (baseflow) runoff subseries.
In (ii), the following criteria were used: the peaks had to exceed a given threshold (higher than baseflow of the quick flow component); the inter-event time between two successive events had to be greater than the recession constant; and the difference between the lower discharge and the baseflow divided by the higher discharge had to be smaller than a fixed fraction. Several interevent times, thresholds and fractions were considered. Finally, as a result of (ii), several discharge samples x 1 , ..., x m were designed to further study for each catchment. It is important to note that some physical flow characteristics, such as the flow recession times and related hydrograph shapes were considered in stages (i) and (ii), while only a purely probabilistic approach was used in (iii). Based on the criteria in (ii), only independent hydrograph events were considered for (iii). This multi-step procedure is in line with the need both for the physical declustering and for statistical optimization (Bernardara et al. 2014).
Subsequently, for every subsample x 1 ... x t (t m), the shape parameter g was estimated in (iii) as the Hill statistic (Beirlant et al. 1996, Willems 1998, Willems et al. 2007, Onyutha and Willems 2015 which is the mean excess of the log-transformed data over the threshold x t , where t ¼ P x j x t 1. In assessing the degree of agreement between the empirical and theoretical GPD cumulative distribution functions, the Pareto quantile-quantile plot (qq plot) was helpful because the shape parameter estimate b g is equal to the slope of this plot. The Pareto qq plot depicts the logarithms of POT values versus minus logarithms of exceedance probabilities, (¡ln (1 ¡ G(x)), ln x). Another helpful plot was obtained after regressing b g against x t . Two disjoint regions of x t were observed in this plot while t varied from 1 to m, namely S 1 where b g was approximately stable and t was not much less than m, and S 2 where b g strongly fluctuated and t was far from m. The region S 2 was of no interest in further considerations because of the high variance of b g for very high thresholds. The final choice of the POT sample was based on the optimal threshold x t , selected from S 1 , by minimizing the asymptotic mean squared error whose estimate is Beirlant et al. (1996) assuming the Hill weights w j ¼ À 1 ln j t ; j ¼ 1; :::; t À 1. The optimal threshold is a trade-off between low and high threshold values where the former cause a high bias of b g and the latter cause a high variance of b g . In case of small POT samples, the very careful choice of x t is especially advisable. The main advantage of the Hill weights is pulling regression towards matching data of low probability of exceedance as such data are especially important for flood frequency analysis. The Hill estimator b g is consistent and its bias is of order n tÀ1 À Á Àsg (Beirlant et al. 1996), hence the higher s the quicker the convergence of b g to g. The asymptotic variance of b g is Varðb g Þ ¼ b g 2 tÀ1 . The number of POT values during a period of 1 year is assumed to follow a Poisson distribution. To verify this hypothesis, a test based on dispersion index C was used. This index equals 1 for the poissonian number of exceedances over threshold. Theoretically, C ¢ (n ¡ 1) asymptotically follows a x 2 distribution with n ¡ 1 degrees of freedom where n is the sample size (Cunnane 1979, Lang et al. 1999. In this study, many divisions of the observation period in 1-year time slots were considered while exceedances were summed up in every time slot. The test was then completed for every division. The return period T of a given flow level x was derived as the mean recurrence interval between exceedances over x, which is the inverse of the probability that a flow is not less than that level, with respect to m, the mean number of exceedances per year (Rosbjerg 1985), The theoretical design discharge x (theoretical flood magnitude) of a certain return period T is a quantile of order 1 À 1 Tm of the GPD distribution, assuming that Tm > 1. Provided that g 6 ¼ 0 and s D gx t , this quantile equals to Observe that if g > 0 then the condition Tm > 1 must hold to ensure discharges to be greater than x t . If the threshold x t is low, bias correction (BC) should be introduced to the GPD parameter estimation. Such a low threshold can be obtained for small sample size when the calibrated GPD not sufficiently follows high quantiles and when the slope b g in qq plot is considerably different than its asymptotic value. The bias correction was based on the slowly varying function and was modelled according to the formula (Willems et al. 2007) z ¼ Àg 1 ln p À ξðpÞ ¼ Àg 1 ln p 1 þ ξðpÞ g 1 ln p where z D ln x, lim p ! 0 ξðpÞ g 1 ln ðpÞ ¼ 0 and ξ(p) is a slowly varying function, lim p ! 0 ξðλpÞ ξðpÞ ¼ 0 for any λ > 0. The simplified assumption on ξ was introduced for p values close to 0, namely ξ(p) D cp ¡r (Willems et al. 2007) where c, r were constant values. Moreover, r is always less than 0 and is more negative for small bias and near to 0 for high bias. Therefore, the estimate of the asymptotic slope g 1 was obtained after shifting of logs of observations according to the equation z þ ξðpÞ ¼ Àg 1 ln p: This means that the shift ξ(p) is high for low observations, small for high observations and zero for the highest observation. If c D 0, no correction is introduced. In calibration of ξ, the minimum of MSE was required. Willems et al. (2007) pointed out that BC usually yields lower threshold value and lower shape estimate than the model without BC. As a final result of (iii), the parameter estimates of the calibrated GPD were obtained, both for daily maxima flows and daily mean flows.
The goodness of fit between empirical and theoretical distribution functions was tested using the Anderson-Darling (AD) test (Anderson & Darling 1954, Laio 2004, where n is the sample size.
Remembering that p ¼ 1 Tm , the design discharge was estimated as xðTÞ ¼ x t ðTmÞ g 1 ¢e ÀcðTmÞ r : Observe that Equation (9) is a generalization of Equation (5) for a model with BC. The FFC was obtained from Equation (9) for every POT sample. Special attention was paid to g 1 , the strongest factor influencing the FFC for long T.
The shape of FFC was assessed using the derivative of x with respect to T, dx dT ¼ x t m g 1 T g 1 À1 ¢ðg 1 À rcðTmÞ r Þ ¢e ÀcðTmÞ r which is the rate of change of the flood magnitude as a function of T, or, in other words, it is the slope (steepness measure) of the FFC. In Equations (9) and (10) the values x t , g 1 were replaced by their estimates, x t,a , g a calculated from daily maxima and x t,b , g b calculated from daily means. The values of m a , m b represented the mean number of exceedances per year in POT maxima and POT means sample.
Equation (10) was used for assessing the flood dynamics. High values of dx dT reflect the fact that even small changes of return period cause large changes in design discharge. If applied to flood risk assessment, a high value of dx dT is a sign of a catchment's high susceptibility to flooding. Note that if 0 < g 1 < 1 then increase of T to C1 causes decrease of dx dT to 0, which means that the FFC stabilizes with increasing T.

Comparisons
Our study focused on comparisons between: (i) g a and g b at every station, (ii) X a (X b ) per unit area for every station, (iii) ln X a and ln X b for every station, (iv) rate of change dX a dT ( dX b dT ) per unit area and rate of change for every station, (v) catchment characteristics that are supposed to mostly influence the magnitude of the difference between X a and X b .
In (ii), (iii) and (iv) the comparisons were made for various return periods. In (iii), the variable U D ln (X a ) ¡ ln (X b ) was estimated from Equation (9) as: where d ¼ ln is a constant value for every catchment. If T tends to C1 then the last two terms tend to zero. If, additionally, g a > g b then U increases linearly with logarithm of T. The slower the increase the nearer to zero the difference g a ¡ g b is. Therefore, U(T) is for large T mainly influenced by d and g a ¡ g b . It is worth pointing out that U can be considered as the term which is to add to the logarithm of the design flood magnitude estimated from daily mean discharge to receive the logarithm of the design flood magnitude of the same return period estimated from daily maximum discharge. In other words, V D exp(U) is the factor by which X b has to be multiplied to obtain X a . The factor V represents the transformation between the variables X b and X a . This factor increases with T while the rate of increase depends on the parameters of the GPD. For FFA applications, it is crucial that this factor be properly estimated. From practical point of view, the consideration of Equation (11) may be limited to the first two terms for large T, especially if r a , r b < ¡1, i.e. when the influence of the bias correcting terms is low.

Uncertainty and accuracy of quantile estimates
We also assessed the uncertainty both for X a and X b . This problem was of high importance because small samples often lead to highly uncertain parameter estimates which, in turn, lead to low confidence of design discharge estimates. This was expressed through the 90% confidence interval around the FFC. For every catchment, the leave-one-out jack-knife bootstrapping method (Efron & Tibshirani 1993, Castellarin et al. 2004, Sahinler & Topuz 2007) was used to estimate the confidence intervals (CIs). Each element, in turn, was dropped from the sample of independent events to obtain the bootstrap jack-knife sample of independent events, and the different GPD calibration steps repeated: (1) Derivation of the parameters of the GPD for threshold values successively increasing from the lowest to the highest value in the bootstrap jack-knife sample, (2) Selection of the final bootstrap threshold by minimizing the MSE (Equation (3)). The threshold was selected from S 1 , the region of stability of the shape parameter estimate. In that manner, the final POT bootstrap jack-knife sample of values greater or equal to the bootstrap threshold was selected, (3) Estimation of flood quantile magnitudes (Equation (9)) using parameters of the GPD which were derived from the POT bootstrap jack-knife sample.
To include BC, the range of variability of c and r was studied in the original POT sample for threshold values varying in S 1 . It was noticed that c and r varied by §30% then. Therefore, such a variability of c and r was imposed when flood quantile magnitudes were estimated from Equation (9).
As a result, a sample of flood magnitudes was obtained for every T. The lower and upper confidence limits were calculated as the 5% and 95% quantiles of that sample.
The relative bias was derived to assess the accuracy of quantile estimates. First, probabilities of exceedance and return periods of empirical, ordered quantiles x j , j D 1, ..., t (j D 1 to the highest) of the original POT sample were calculated as p j ¼ j t , T j ¼ 1 m ¢ t j , respectively. Next, x j were compared to x(T j ), the theoretical quantiles of the same return period (Equation (9)), using The probability plot correlation coefficient (PPCC) was also calculated; this is the Pearson correlation coefficient between theoretical and empirical quantiles.

Validation based on AM data
The model for X a was validated using the AM data for Bystrzyca K»odzka, K»odzko and _ Zelazno. It should be pointed out that the source of AM discharges differed from the source of daily discharges on the basis of which the GPD was calibrated, as stated in Section 2. AM discharges y i , i D 1, ..., n were compared to theoretical design discharges X a of the same return period T i through the relative bias and root mean squared error, namely Subsequently, Generalised Extreme Value (GEV) distribution was fitted to AM data wherein the parameters of the GEV were estimated using the ML method, and compared with the calibrated Results moreover were compared with results from existing calculation methods of IPF from MDF. The equation IPF D MDF ¢ (1 C 2.66 ¢ A ¡0.3 ) was proposed by Fuller (1914), where A (km 2 ) is the catchment area. However, according to Fuller (1914) and Dastorani et al. (2013), the MDF should consist of daily mean flows of the days that contain the peaks. It should be noted that days of peaks of X a and X b are not always identical. In some cases, the mean daily flow X b from the POT sample is observed one to three days before or after the day of peak flow X a . The dates are sometimes inconsistent because no X b is observed in the POT sample around the time of peak flow X a . This inconsistency is discussed in Section 4.5. Summarizing, MDF 6 ¼ X b while IPF D X a . Therefore, to compare results to other studies, the sample MDF had to be extracted directly from the series of daily mean flows while the IPF was assumed as X a .
Other studies were based on IPF, Q 2 (equal to MDF) and Q 1 (Q 3 ), the instantaneous peak flow, the daily mean flow of the day of peak and the mean daily flow in the previous (next) day to peak, respectively. Two formulas were used, formula (S): IPF ¼ ð4 ¢MDFÀQ 1 ÀQ 3 Þ 2 (Sangal 1983) and formula (FS): (Fill & Steiner 2003).
The approach of Taguas et al. (2008) aimed at establishing regions with a similar rate k D IPF/MDF, where k was computed using the Least Squares method. These rates from basins in southeastern Spain and from the four studied catchments were compared. Of note is that the rate k is not equivalent to the factor V D exp(U) (Equation (11)) because k is based on X a and MDF while V is based on X a and X b .

Calibration of the GPD
To separate the baseflow and to extract independent events, various recession constants, lengths of independency period and other filter parameters were considered. The GPD parameters were calculated using the methods discussed in Section 3.1. The AD test showed a good fit of the POT samples to theoretical GPDs. The AD test statistic varied from 0.12 for daily means based sample from Bardo to 0.52 for daily maxima based sample from Bardo, being less than the critical value equal to 0.6 (Arshad et al. 2003). Subsequently, the BC was introduced. For all series, the condition of minimum MSE yielded lower shape parameter estimates and exactly the same threshold values as before correction. In Figure 2, the qq plot and the bias corrected qq plot for the POT sample extracted from the daily maxima flows in K»odzko are presented. The slope of the former is equal to b g (Equation (2)) while the slope of the latter is equal to g 1 obtained after shifting of natural logarithms of observations by ξ(p) (Equation (7)). It is confirmed in this plot that g 1 < b g . The variance of the shifted sample is lower than that of the original sample. The shift ξ(p) vanishes with increasing ¡ln p. Similar graphs were obtained for other series.
The final choice of the maxima-based (means-based) POT samples resulted in the following record lengths for Bystrzyca K»odzka, K»odzko, Bardo and _ Zelazno,respectively: 22 (19), 17 (17), 19 (19) and 18 (13) To complete the test based on the dispersion index, the 9-year observation period was first divided into 1-year time slots for every POT sample. To yield the second division, all these slots were shifted simultaneously by the time between the first two events in the POT sample so that the first time slot started at the time of the second event while the first event was omitted. Then, the next division was obtained by shifting the slots by the time between the second and the third event, and so on. To summarize, eight sets of several 1-year slots were considered for every POT sample, creating 64 sets in total. In as many as 53 cases, the test based on the dispersion index did not reject the hypothesis. Therefore, the distribution of the number of exceedances over threshold was found to follow the Poisson distribution in every POT sample.

Comparisons
4.2.1. Between g a and g b _ Zelazno is featured by the largest difference between these two values, Bystrzyca K»odzka and K»odzko by the second and third largest, and for Bardo the parameters are nearly the same. The high difference between g a and g b suggests a quick increase of the difference between x a and x b with T, according to Equation (11). The shape parameter estimates differed considerably from 0 thus the tail of the GPD distribution is heavy for both maxima-based and means-based design discharges.

Between X a (X b ) per unit area for various stations
In Figure 3 the theoretical discharges X a and X b per unit area were depicted in FFCs while T varied from 20 to 500. The highest discharge value per unit area for X a was recognized at the _ Zelazno outlet. From up-to downstream along the Nysa K»odzka river, the highest value of X a per unit area was obtained for the first gauge (Bystrzyca K»odzka), followed by K»odzko and Bardo. Hence, more downstream locations show lower discharge magnitude for X a per unit area. Similar properties were observed for X b . However, for downstream located stations, the differences between X b per unit area were lower than the differences between X a per unit area.
The T ¡ year flood magnitudes for T D 20, 50, 100 (years) were depicted in Figure 4. The largest values of X a per unit area are observed for _ Zelazno, followed by the three gauges more downstream along the Nysa K»odzka river.
When observing Figure 4 we conclude that the 50-year design discharge X a per unit area on the Bia»a Lądecka at _ Zelazno is comparable to the 100-year design discharge X a per unit area on the Nysa K»odzka at Bystrzyca K»odzka. Further comparisons reveal that the 50-year discharge per unit area at _ Zelazno is similar to the 200-year flood at K»odzko.

Between X a and X b for every station
For every station, the direct comparison between X a and X b per unit area was conducted for various T values. The largest absolute differences for T D 20, 50 and 100 years were noticed at _ Zelazno, and the lowest at Bardo. The largest relative differences for T D 20, 50 and 100 years were noticed at Bystrzyca K»odzka, reaching 76%, 80% and 84% and the lowest at Bardo, namely 42%, 43%, 43%. What clearly distinguishes _ Zelazno from other stations is the strong increase of the relative difference from 59% for T D 20 years to 72% for T D 100 years.
Graphical depiction of the X a and X b for T D 20, …, 500 years is shown in Figure 5. The quickest increase of the frequency curves is apparent for _ Zelazno. When logarithmized values are considered, the increase of the curves is limited for high T and the curves are nearly parallel.
A graph of the difference U D ln X a ¡ ln X b can be seen in Figure 6(a). The largest U is found for Bystrzyca K»odzka, followed by _ Zelazno, K»odzko and Bardo. The relative difference (ln X a ¡ ln X b )/ln X b is pictured in Figure 6(b). Of note is that it is higher at Bystrzyca K»odzka and
A limited increase of U with T from 20 to 200 years is observed, namely from 0.5652 to 0.6282 (Bystrzyca K»odzka), from 0.5189 to 0.5573 (K»odzko), from 0.3556 to 0.3607 (Bardo) and from 0.4625 to 0.5773 ( _ Zelazno). The large U values at Bystrzyca K»odzka are due to the large first term d in Equation (11), and at _ Zelazno by the high difference g a ¡ g b (second term in Equation (11)). In Table 2 the values of factor V D exp(U) are displayed for various return periods. Remembering that V is the factor by which X b has to be multiplied to obtain X a , these values can be used in practice in estimation of X a from V and X b . The use of the factor V is advised in the future when only daily mean discharges are available to the POT sample and daily maxima values are not routinely recorded at the four gauging stations.

Between rates of change
In Figure 7, the rates of change dX a dT , dX b dT (Equation (10)) divided by the catchment areas were depicted. The lowest rate of change of X a is revealed for Bardo and the highest for _ Zelazno. This implies that even small change in T causes a huge change in design flood magnitude for _ Zelazno or, in other words, discharges may go up to very large values in this catchment. The rate of change of X a is lower on the Nysa K»odzka river downstream, hence similar change in T causes much larger change in design flood per unit area at Bystrzyca K»odzka than at Bardo. With increasing T, the rate approaches 0 which shows the tendency of the FFC to stabilize with increasing T.
For every catchment, dX a dT is greater than dX b dT . Therefore, the X a based FFC is featured by a stronger tendency to changes than the X b based FFC. Along the Nysa K»odzka river, the largest discordancy between dX a dT and dX b dT is noticed at Bystrzyca K»odzka and the lowest at Bardo. For Bystrzyca K»odzka, K»odzko, and Bardo, the relative differences between dX a dT and dX b dT are similar for T from 50 to 200 years, reaching values of 98%, 81% and 44% for T D 200 years. At _ Zelazno, an increase of this difference is observed, from 79% to 92%. To summarize, _ Zelazno (Bardo) is featured by the largest (lowest) difference between the X a based rate of change than of the X b based rate of change.
The difference between the rates of change is related to the variability of peak streamflows, which is stronger in small than in large drainage areas (Fill andSteiner 2003, Malamud andTurcotte 2006). However, results from the comparison between X a and X b and between the rates of change indicate that area is not the only factor to influence the relation between X a and X b because the catchments at Bystrzyca K»odzka and at _ Zelazno have a comparable area but a different relation between X a and X b .

Catchment characteristics versus the magnitude of the difference between X a and X b
The largest magnitude of difference between X a and X b , and the large rate of change of the flood magnitudes along the Bia»a Lądecka river reflect the fact that this catchment is the most flash-flood prone one among the four catchments. It is worth reminding that the Bia»a Lądecka catchment at

_
Zelazno is featured by a much larger TRI value than the other three catchments. The issue that high ruggedness can increase susceptibility to flooding was considered, for example, by Patton and Baker (1976) for basins in North Central Utah, USA, and by Bhatt and Ahmed (2014) for catchments located in the Krishna basin, India. The difference in river regulations is supposed to be another factor that affects the difference between the X a and X b based design discharges and the rate of change. The continuous reinforcements along a substantial part of the Bia»a Lądecka river causes an increase of water velocity and an increase of the instantaneous maximum discharge in comparison to rivers with low number of reinforcements. The mean discharge, in turn, is not as sensitive to changes in velocity as the maximum value, which explains why the difference mentioned above is higher for the Bia»a Lądecka river than for the Nysa K»odzka river.
Results of the study of the difference between X a and X b suggest that similar comparison could be completed for larger group of catchments. It should be intended to regionalize this group according to factor V.

Uncertainty and accuracy of quantile estimates
The bootstrapping method based CIs of ln X a and ln X b are depicted in Figure 5. The uncertainty increases with T and is higher for X a than for X b . For 20 T 500 and both for X a and X b , K»odzko is featured by the largest width of CI, _ Zelazno by the second largest, followed by Bardo and Bystrzyca K»odzka. However, the largest width of CI divided by the catchment area is observed at _ Zelazno (1.00, 1.85, 3.35), followed by Bystrzyca K»odzka (0.50, 0.80, 1.27), K»odzko (0.43, 0.72, 1.19) and Bardo (0.25, 0.43, 0.70) for T D 50, 100, 200 years. Substantial differences between catchments suggest that the estimates are the most reliable for Bardo and the least reliable for _ Zelazno. The width of CI partially reflects the variability of the shape parameter estimate (Section 3.1) because the largest variance of b g , equal to b g 2 tÀ1 , is detected at _ Zelazno, followed by K»odzko, Bardo and Bystrzyca K»odzka, namely 0.028, 0.019, 0.016, 0.012, respectively.
Results of the accuracy analysis are displayed in Table 3. The low values of Bias and high values of PPCC reflect a low discrepancy between observed and estimated quantiles. This shows a high quality of the calibrated GPD model.

Validation
For Bystrzyca K»odzka, K»odzko and _ Zelazno, the AM discharges are depicted in Figure 8 together with the design discharges X a and design discharges obtained from the GEV, the AM-based X GEV . The validation was not completed for Bardo due to the unavailability of AM data. The X a -based calibrated GPD fits very well the highest AM value for all three catchments and fits well the two highest AM values for Bystrzyca K»odzka and _ Zelazno. The fit between the lower AM values and the X a is poorer. Bias and RMSE values are shown in Table 4. The best fit is evident for Bystrzyca K»odzka, followed by K»odzko and _ Zelazno. However, for Bystrzyca K»odzka and _ Zelazno, this fit is much better for T > 10 years.
The shape parameter of the GEV distribution is equal to 0.32 § 0.12, 0.41 § 0.13 and 0.77 § 0.17 for Bystrzyca K»odzka, K»odzko and _ Zelazno, respectively. For Bystrzyca K»odzka and K»odzko, it is somewhat lower than the GPD shape parameter. However, for _ Zelazno a greater congruence is observed. The GEV fit yields a strong underestimation of the highest AM values in all three catchments and a very strong underestimation of the second highest AM value in Bystrzyca K»odzka. The difference between the two estimation methods is reflected in Figure 8 which demonstrates how the Hill estimation pulls the FFC leading to a better matching of the highest observations while the ML method leads to underestimation of those highest observations. A possible reason of this underestimation is that the ML method attaches the same importance to every AM observation.

Comparison to other studies
The comparison, reported in Section 3.5, was carried out between various peak flow estimation methods, namely using the Fuller (1914), Sangal (1983), Fill and Steiner (2003) methods, (F), (S), (FS), and the method based on Hill's estimates (H), i.e. the formula X a D V ¢ X b and Equation (11). It is worth remembering that the days of MDF are the days of peaks, which makes MDF not equal to X b . Thus, in the first three methods, the estimation was not based on the POT means sample but on the MDF at the days of peak, i.e. at the days of X a while MDF were extracted directly from the raw data of the mean flows. Only in the fourth (H) method, the POT means sample was used. After transformation of the mean daily flows to the maximum daily flows using the four methods, these flows were compared with the POT maxima. Then, the diagnostic measures were used. As evident  in the results presented in Table 5, the method based on (H) estimates yielded a much lower relative bias and relative root mean square error rmse between estimated and observed values than any of the methods of (F), (S) and (FS). However, it is worth remembering that both daily maxima and daily means are used in this study to estimate V while only daily means are used in the (F), (S) and (FS) methods. For this reason, these three methods yield lower quality results.
Regarding the rate k D IPF/MDF, this value is equal to 1.65, 1.54, 1.44, 2.18 for Bystrzyca K»odzka, K»odzko, Bardo and _ Zelazno, respectively, and is equal to 1.03 and 1.64 (Group 1) and varies from 2.91 to 4.77 (Group 2) for the 12 catchments located in southeastern Spain studied by Taguas et al. (2008). The two catchments in Group 1 have a small area and a moderate slope and elevation. This proves that some similarity can be found in k between Group 1 and the four catchments studied. However, the areas of the studied catchments are much larger than those of Group 1. Because Taguas et al. (2008) studied the factor k only in the context of the relation between sample values of IPF and MDF, it is constant for every catchment. In this study, however, the factor V between X a and X b was considered. It is intended for use in FFA, thus to be applicable for design discharges for various return periods. As shown in Table 2, it is noteworthy that V is different from k in Bystrzyca K»odzka, K»odzko and _ Zelazno, and nearly equal to k in Bardo. A deeper investigation into the POT maxima and POT means samples revealed that the dates of these two series are inconsistent. The ratio of the number of days during which both a maximum and mean value belong to the POT samples to the length of the POT means sample is only 0.74 (Bystrzyca K»odzka), 0.71 (K»odzko), 0.63 (Bardo) and 0.77 ( _ Zelazno). This low degree of agreement precludes the applicability of the (F), (S) and (FS) methods of derivation of peak flows from daily mean flows to the POT means sample in order to estimate design discharges in these catchments. Therefore, these three methods cannot be applied together with the POT approach.

Conclusions
The discrepancies between X a and X b enable us to draw the conclusion that attention should be paid to lacking availability of maximum daily discharges and when design flood discharges are assessed on the basis of daily means instead. This statement is crucial for rivers with high flow dynamics, as the Bia»a Lądecka river at _ Zelazno and the Nysa K»odzka river at Bystrzyca K»odzka. Results of this study allowed us to make the following conclusions and recomendations: (1) for long T, the FFCs of the logarithmized X a and X b design flows vary, approximately, by a constant value. Therefore, the flood magnitudes X a can, for given return periods, be obtained by multiplication of X b by the exponent of this value, (2) the difference between X a and X b based design discharges depends on area and on other catchment characteristics, (3) the highest absolute and relative differences between ln X a and ln X b and the highest difference between rate of change of FFC are obtained for catchment with large TRI, and for river with continuous system of protection measures due to many human settlements along river banks, (4) the magnitude of the differences decreases from up-to downstream, Table 5 Results of the comparison between estimators of X a : (H) using X a D V ¢ X b and Equation (11), and (F), (S), (FS) using Fuller, Sangal,Fill and Steiner methods (Section 3.5 (5) the magnitude of the differences reflects the susceptibility to flooding of the studied catchments, (6) using the GPD calibration and bias correction method presented in this paper, the factor V between the daily maxima based and daily means based discharges can be obtained, (7) the methods of Fuller (1914), Sangal (1983), Fill and Steiner (2003) for calculation of peak flows from daily mean flows cannot be applied to POT means samples for the estimation of design discharges in the four catchments studied, (8) in case of accessibility of data, the estimation of factor V should be completed again for a longer period in the four catchments studied, (9) using the tools used in this paper, the estimation of V can be carried out also for other catchments in order to compare results from various catchments of different hydrological regimes, (10) results of the study suggest the possibility of regionalizing catchments according to the factor V.