Stationary vs non-stationary modelling of flood frequency distribution across northwest England

ABSTRACT Extraordinary flood events occurred recently in northwest England, with several severe floods in Cumbria, Lancashire and the Manchester area in 2004, 2009 and 2015. These clustered extraordinary events have raised the question of whether any changes in the magnitude and frequency of river flows in the region can be detected. For this purpose, the annual maximum series of 39 river gauging stations in the study area are analysed. In particular, non-stationary models that include time, annual rainfall and annual temperature as predictors are investigated. Most records demonstrate a marked non-stationary behaviour and an increase of up to 75% in flood quantile estimates during the study period. Annual rainfall explains the largest proportion of variability in the peak flow series relative to other predictors considered in our study, providing practitioners with a useful framework for updating flood quantile estimates based on the dynamics of this highly accessible and informative climate indicator.


Introduction
There is a perception that the frequency and magnitude of extreme flood events and storms have changed significantly over the last few decades around the world, mainly because of climate change, seasonal rainfall intensities, temperature variations, change in the land cover and deforestation (Coles et al. 2001, Milly et al. 2002, 2008, Vogel et al. 2011, López and Francés 2013, Salas and Obeysekera 2014, Prosdocimi et al. 2015. These changes have in some cases altered the seasonality of flooding processes and the magnitude of flood flows across Europe, increasing remarkably the fluvial flood hazard in large European regions (i.e. North-Central Europe, see Blöschl et al. 2017Blöschl et al. , 2019 and ultimately leading to a change in the characteristics of underlying distribution of river flood flows (non-stationarity). A stationary stochastic process is based on two assumptions, namely the independence and identical distribution of time series (Coles et al. 2001). These assumptions might be violated if the flood characteristics of a catchment have changed over time -that is, the peak discharges are not identically distributed. Numerous studies have addressed the applicability and value of the stationarity hypothesis relative to flood frequency regime (Douglas et al. 2000, Milly et al. 2008, Vogel et al. 2011, Montanari and Koutsoyiannis 2014, Blöschl et al. 2015, Šraj et al. 2016. In fact, in recent years, there has been a lively debate about the advantages and disadvantages of stationary and non-stationary analysis, and many researchers have discussed the preference for each framework (Milly et al. 2008, Montanari and Koutsoyiannis 2014, Serinaldi and Kilsby 2015, Serinaldi et al. 2018. Several studies argued that unless there is a clear deterministic process of change, the stationary setting should still be employed (Montanari andKoutsoyiannis 2014, Serinaldi et al. 2018). This is mainly because of the large uncertainties associated with non-stationary models. While the scientific debate continues, flood managers and practitioners, who might witness numerous inundation events in the communities, need straightforward guidance on whether, and if so, how, to change current design approaches. Among several studies carried out within the realm of non-stationarity, different fitting and goodness-of-fit approaches as well as different covariates and frequency distributions have been utilized.
Although significant increasing trends in time series of river flows were identified on most continents, including Asia, South America, North America (Labat et al. 2004) and northern Europe (Stahl et al. 2010, Blöschl et al. 2017, decreasing trends were found in other regions including Africa (Labat et al. 2004), southern Europe and some parts of eastern Europe (Stahl et al. 2010. Mangini et al. (2018) investigated the existence of trends in the frequency and magnitude of flood events using both annual maximum (AM) and peaks-over-threshold (POT) data recorded in rivers from across Europe during the period 1995-2005. They inferred that utilizing the AM approach results in more trends in the magnitude of flood events, as opposed to POT series, which showed more trends in the frequency of flood events.
Additionally, numerous studies have been carried out focusing on parametric non-stationary flood frequency analysis, most of which tackled the parameters of probability distribution depending on time as a covariate (Strupczewski et al. 2001, El Adlouni et al. 2007, Delgado et al. 2010, Gül et al. 2014, Debele et al. 2017. Nonetheless, the problem of time-varying distribution parameters is that it can sometimes be unrealistic to extrapolate the detected changes into the future, which ultimately does not lead to accurate results (Ahn andPalmer 2016, Agilan andUmamahesh 2017). This reason encouraged researchers to incorporate hydrological and physically based variables as covariates in the non-stationary models.
For example, Villarini et al. (2009b) employed non-stationary flood frequency analysis for generalized additive models (GAMLSS) in which scale, location and shape parameters varied with time, daily maximum rainfall, and population density for different basins in the Little Sugar Creek watershed in North Carolina. They inferred that the recurrence intervals significantly vary over the time series for the specific river discharge. Prosdocimi et al. (2014) investigated non-stationary frequency analysis of the UK AM data using a 2-parameter lognormal distribution, the location parameter of which varied with time and 99 th percentile daily rainfall. Their results demonstrated that various patterns are found for the peak flow series, and that the variability of river flow data could be explained by means of extreme rainfall events for each year. Šraj et al. (2016) carried out flood frequency analysis using a non-stationary framework for two river gauging stations in Slovenia. Assigning the location parameter of a generalized extreme value (GEV) distribution model as a function of time and annual rainfall, they compared the results using maximum likelihood estimation (MLE) and Bayesian-based Markov chain Monte Carlo (MCMC) methods for the estimation of parameters. Their results showed that the stationary model underestimates flood quantiles compared to the non-stationary models in recent years. Furthermore, the inclusion of annual precipitation as a covariate in the model demonstrates the best goodness-of-fit performance. Likewise, Dong et al. (2019) performed bivariate nonstationary GEV flood frequency analysis using covariates such as precipitation and urbanization/deforestation attributes in Dongnai River in Vietnam. They showed that the stationary condition remarkably underestimates the flood quantiles compared to the non-stationary models.
That being said, the majority of studies mentioned above addressed the detection of trends in the frequency and magnitude of extreme meteorological events through non-parametric tests (e.g. Mann-Kendall test). The use of parametric non-stationary frequency analysis, in which a distribution is assumed to be the parent distribution for the data under study, is less common. In particular, parametric studies often assumed a GEV distribution model. As far as covariates are concerned, the majority of studies in the literature consider time in describing the non-stationary behaviour of flood characteristics, as opposed to a systematic implementation of hydro-meteorological data as covariates.
As a result, to the best of our knowledge, there is still a research gap for fully capturing the characteristics of non-stationary settings based on a generalized logistic (GLO) distribution model, by integrating various sequences of hydrological predictors. In this context, although limited studies have been undertaken to investigate the changes underlying the stochastic process of riverflow data in northwest England (Spencer et al. 2018, Faulkner et al. 2020, these have used the GEV model as the fitting distribution, while the GLO distribution is the recommended frequency model on most UK catchments. This discrepancy can have a major impact on the outcome of the analysis, and indeed further assists plan investment in flood alleviation in northwest England, where successive extreme flood events have occurred in recent years.
The main objectives of the present study are then as follows: • identification of significant changes in the annual flood peak series observed in northwest England; • evaluation of the importance of applying various components as covariates in the frequency models; • detection of the mechanism driving the non-stationary behaviour of flood characteristics; • selection of the best model, capable of delivering the best fit over the flood series; and • quantification and comparison of the (design) flood quantiles under stationary and non-stationary settings at all river gauging stations across the study area.

Data
An AM series of river flow data was obtained from the National River Flow Archive for a total of 39 catchments located in the northwest of England (NRFA 2018). The last water year available in the AM series data was 2015. The characteristics of the investigated river stations are shown in Table A1 in the Appendix, as well as in Fig. 1. Additionally, regional climate datasets for northwest England were obtained from the UK's national weather service, the Met Office. Specifically, monthly rainfall (mm) and temperature (°C) for the years from 1910 to 2018 were obtained and matched to the river flow recording periods.

Non-parametric tests
A preliminary analysis of the AM series was performed to detect changes in the frequency regime. In particular, two well-known and widely used non-parametric tests, in which no explicit assumption about the parent distribution for the data is made, were utilized in this study to detect any significant change in the annual maximal flood peak series, namely the non-parametric Mann-Kendall test (MKT) and the Pettitt test (PT) (see Kendall 1975, Pettitt 1979, Douglas et al. 2000. MKT is widely used to identify significant monotonic upward or downward trends in hydro-meteorological data series, while PT is used to detect sudden changes in the mean (and/or the variance) of the time series.

Frequency distribution model
The GLO distribution model is the recommended distribution curve for flood frequency analysis in the UK (Reed and Robson 1999). For this reason, it is employed here instead of the more commonly used GEV distribution. In this regard, the cumulative distribution function of the GLO distribution, F(x), is shown as follows (Hosking and Wallis 2005): The corresponding GLO quantile function, the inverse of F(x), corresponding to a given recurrence interval, x(F), is given by Equation (2), where F is the cumulative function shown in Equation (1) (Hosking and Wallis 2005): The location, scale and shape parameters are denoted μ, σ and ξ, respectively.

Parametric non-stationary framework
Whereas in the classical stationary setting, all parameters are constant (Model 1), in the non-stationary framework the statistical properties of distributions can be specified as a function of different predictors. Six non-stationary models (Models 2-7) are introduced in this study, allowing the location parameter to change linearly as a function of the predictors. The scale and shape parameters are considered constant in all models. The reason the shape parameter is treated as constant is the fact that reliably achieving its correct value is generally challenging (Salas and Obeysekera 2014). In addition, preliminary analyses performed for the study area clearly indicated the hypothesis of a timevarying scale parameter is not statistically significant for the vast majority of stations, therefore we assumed the scale parameter to be constant in our study. The first explanatory variable integrated into the parameter was time -that is, the years over which the flood happened (Model 2). Time can be viewed as a proxy for the identification of time-varying physical drivers, which are causing the change in the annual flood series (e.g. land-use and land-cover dynamics). As a further step, physically based covariates were included to help identify the flood changes and, therefore, to yield a better fit over the data. Hence, annual precipitation was included in the next step (Model 3). Although extreme precipitation is often used as covariate in the literature (Villarini et al. 2009b, Prosdocimi et al. 2014, annual rainfall is considered in this study. This is because Salas and Obeysekera (2014) determined that annual rainfall as a predictor can represent non-stationary behaviour more accurately than short-term extreme precipitation events can. This is because annual precipitation and event extreme precipitation are usually correlated, and annual precipitation has long-term impacts on the development of catchment characteristics (Salas and Obeysekera 2014). The other reason is that annual precipitation influences the antecedent soil moisture of each single event, ultimately influencing flood magnitudes (Gaál et al. 2012). Further, annual temperature was included as a covariate (Model 4) as a proxy for evapotranspiration. To provide additional alternatives, which may help produce a more accurate fit, Models 5, 6 and 7 were constructed as additive models combining the aforementioned covariates. In summary, the following models are employed: → Model 1, stationary model in which all parameters are constant, μ; σ; � → Model 2, non-stationary model in which location parameter varies linearly with time (t), μ t ð Þ ¼ B 2;0 þ B 2;1 � t; σ; � → Model 3, non-stationary model in which location parameter varies linearly with annual rainfall (R), , non-stationary model in which location parameter varies linearly with annual temperature (T), μ T ð Þ ¼ B 4;0 þ B 4;1 � T; σ; � → Model 5, non-stationary model in which location parameter varies linearly with both time (t) and annual rain- , non-stationary model in which location parameter varies linearly with both time (t) and annual temperature (T), , non-stationary model in which location parameter varies linearly with both annual rainfall (R) and annual temperature (T), μ R; where μ is the location parameter, σ is the scale parameter, ξ is the shape parameter and t is the water year associated with each AM event. Note that time (t) is considered from the first observation record for each river gauge station until the end of the flow observations (water year 2015). R and T represent cumulative annual rainfall depth and annual temperature, respectively, for northwest England ending in water year 2015. B m;i with m (model) varying from 2 to 7 and i = 0,1,2 are unknown regression coefficients which need to be estimated based on the available annual maximum series, in order for the location parameter to be calculated.

Parametric estimation
The unknown parameters of seven GLO distribution models defined in section 2.4 are estimated using the MLE method (Coles et al. 2001, Katz 2013. If X ¼ fX 1 ; X 2;:::: X n g are observations from the distributed random variables, coming from a probability distribution model f with parameters P, the MLE measure maximizes the likelihood function, given by: However, it is often easier to work with the log-likelihood, instead of the likelihood function, given by Equation (4): The log-likelihood function for the GLO distribution function is then shown below: Thus, the MLE finds the values of the parameters of the loglikelihood that make the observed data sample most likely and, finally, delivers the estimated parameters. In this study, nonlinear optimization using the maxLik package (Henningsen and Toomet 2011) was carried out in R software (R Development Core Team 2013), utilizing the Newton-Raphson algorithm, for numerically optimizing the loglikelihood function of the GLO models.

Selection of the best model
The Akaike information criterion (AIC, Akaike 1974) is a goodness-of-fit measure that aids in comparing frequency models and selecting the best-fitting one. It represents how well the model fits over the data relative to the other frequency models. The AIC equation is as follows: in which L is the maximum value of the log-likelihood function, and k is the number of parameters. The smaller the AIC value, the better the model performs, in comparison to other models. This implies that the best model is recognized based on the balance between the goodness of fit (i.e. the value of the log-likelihood) and the model complexity (i.e. the number of parameters). Additionally, confidence intervals for the regression parameters for each predictor were derived using the delta method (Salas and Obeysekera 2014): these were used to check whether the inclusion of a predictor into the model is significant at the 5% significance level. The literature presents various approaches to construct confidence intervals, such as the delta method, bootstrap method, and profile likelihood method (Efron andTibshirani 1994, Royall 1997). Although profile likelihood might deliver a more robust and accurate estimation of the uncertainties (due to assuming the asymmetrical characteristics of maximum likelihood estimates), it is oftentimes computationally demanding and burdensome (Obeysekera and Salas 2014). In contrast, using locally computed derivatives (e.g. the delta method) can be relatively accurate but computationally easy and more efficient. This justifies the popularity of the delta method as an approach to assess uncertainties in the parameters and their transformations (e.g. quantiles) in the literature (see e.g. Macdonald et al. 2014, Obeysekera and Salas 2014, Šraj et al. 2016). This method is also used herein to calculate the uncertainties for the quantile estimations at the 95% confidence level.

Study area
A total of 39 AM series of peak flow obtained from river gauging stations located in the northwest of England (Cumbria, Lancashire and the Greater Manchester area) are considered in this study (Fig. 2). These stations were chosen due to their locations on rivers that have recently experienced extraordinary flood events, in 2004, 2009 and 2015. The study area as well as the locations of the river stations are illustrated in Fig. 2. During storm Desmond, 4-6 December 2015, almost all of the gauges investigated in Cumbria observed discharges that exceeded their previous records. This region of England is targeted in this study because of the exceptional nature of the inundation events that have occurred there over the past few years (Miller et al. 2013). Topological ordering of the rivers over the investigated catchments is represented as well, in terms of Strahler stream order (Strahler 1957) as shown in Fig. 2.

Results and discussion
First, AM series for all 39 stations were tested for trends and sudden changes, or change-points, via the MKT and PT, respectively. Figure 3 shows the spatial pattern of MKT and PT results. The MKT detected the presence of statistically significant trends (at the 5% significance level) at 20 stations. Note that all monotonic trends detected across the study region are upward. The majority of series characterized by statistically significant trends are scattered around the Northern region of the study area (e.g. Cumbria). Additionally, PT detected significant sudden changes in the mean of the time series at six stations across the study area. Note that the same six stations for which PT detected statistically significant sudden changes are also flagged by MKT as series characterized by statistically significant upward trends. That is, both tests detect significant changes in these series, which could be interpreted as trends or abrupt changes.
Second, based on the results of the non-parametric tests, the seven stationary and non-stationary distribution models introduced in section 2.4 were estimated using peak flow series from each of the 39 stations in turn. Although the implications are depicted for all of the 39 stations in the form of spatial maps (described and shown in the following sections), only results for stations number 69044 (located in Greater Manchester) and 75005 (located in Cumbria and Lancashire) with at least 40 years of observations are presented here in more detail to showcase the outcome of the study; detailed results for all stations can be found in the Supplementary material.

Station no. 69044: River Irwell at Bury Ground
Results from fitting the model parameters by the MLE method, along with the ranking based on the AIC, are presented in detail in Table 1 for River Irwell at Bury Ground. The results show that Model 2, where time is included as the only covariate, is not statistically different from zero, as B 2,1 encompasses zero. However, when annual rainfall is included as an additional step (Model 3), the corresponding estimated parameter, B 3,1 , is found to be significantly different from zero at the 5% significance level (i.e. the 95% confidence interval does not contain zero). This indicates that the inclusion of annual rainfall is able to explain a large part of the variability of the peak flow series.
When incorporating temperature as a covariate (Model 4), its parameter, B 4,1 , was not found to be significantly different from zero. In particular, the estimated confidence intervals for the parameters of all time-and temperature-dependent regression models (Models 2, 4 and 6) -that is, B 2,1 , B 4,1 , B 6,1 and B 6,2 -do  in fact contain zero. Despite the improvement over the stationary fit when both rainfall and time (Model 5) and rainfall and temperature (Model 7) are included, the AIC measure indicates that including only annual rainfall as a covariate yields the bestfitting model. This implies that allowing the location parameter of the GLO frequency model to vary as a linear function of annual rainfall (as the only covariate) expressed the alterations in flood peaks more accurately than the other covariates and, thus, gave a better fit of the data.

Station no. 75005: River Derwent at Portinscale
The second station explored in detail is station no. 75005. Table 2 shows the estimated parameters (along with the limits of the 95% confidence intervals) and ranking of all models by the MLE and AIC. According to the table, the inclusion of time and temperature (Models 2 and 4) did not produce a statistically significant change in the model fit. On the other hand, the inclusion of annual rainfall in all models (Models 3, 5 and 7) into the location parameter -B 3,1 , B 5,1 and B 7,1 , respectively -significantly improved the stationary model's fit. This is supported by AIC ranking alongside the confidence limit around the B 1 parameter, which indeed does not encompass zero. However, according to the AIC goodness-of-fit measure, Model 3, with only annual rainfall as covariate, was found to be the best model to explain the ongoing changes in the peak river flow regimes, compared to the other models. A detailed examination of the flood peaks and annual precipitation for this streamgauge also demonstrates that the maximum annual rainfall (119 mm) observed in northwest England in 2015 is associated with the second highest river flow (365 m 3 /s), which happened in the same year, the main reason for which could be the intense precipitation that occurred in November and December 2015. This measure is, again, highlighting the relevance of annual rainfall as covariate to help ascertain the changes in flood frequency.

Identification of the best-fit models at all streamgauges
Both stationary and non-stationary frequency analyses using the GLO distribution were repeated at all 39 gauging stations in the study area; the detailed results can be found in the Supplementary material (Tables S1-S39). The selected frequency models at all stations, as decided by the AIC measure, are shown in Fig. 4. This highlights that the stationary Model 1 was preferred at only three streamgauges out of 39, meaning that treating the location parameter as a constant value at these stations revealed a better performance and fit to the data, as opposed to 36 stations at which non-stationary settings were found to give a better fit to the data. This indicates that the driving factors such as meteorological conditions and time largely influenced the flood characteristics in northwest England. Non-stationarity, thus, might be a dominant process at most gauges, urging authorities and designers to take non-stationarity into account as an option for the future construction of flood defence structures alongside conventional methods. Moreover, annual rainfall rather than time and temperature is often included in the best-fit models, indicating that this variable explains a large proportion of the variability in the peak flow samples. At 22 stations out of 39, the regression model with only annual rainfall as the explanatory variable (Model 3) was preferred, as the best fit. These findings are in agreement with the other studies in the literature (Prosdocimi et al. 2014(Prosdocimi et al. , Šraj et al. 2016, which reported improvement over stationary performance when rainfall is included in the frequency analysis in preference to the time-based models. According to Fig. 4, eight and six stations where the regression models were fitted with time (Model 2) and with rainfall and time (Model 5), respectively, were found to maximize the AIC measure, making a significant improvement over the stationary fit. Note also that all the regression models incorporating temperature as a covariate (Models 4, 6 and 7) underperformed relative to other options at all streamgauges. This indicates that temperature appeared to be a poor predictor for change in the peak flow values. In other words, temperature (as a proxy for evapotranspiration) is not a good descriptor of the frequency of the flooding process and, hence, was not able to accurately detect the changes in the flood characteristics in the study area. It is worth highlighting that Model 3, where annual rainfall was integrated as the only covariate, was preferred at all streamgauges in the south of the study area (i.e. Greater Manchester and Lancashire), whereas Cumbria, located in the north, reported more diverse preferred models (i.e. Models 1, 2, 3 and 5).
It is worth noting that MKT did not detect statistically significant changes at three stations for which the stationary framework provided the best fit. Also, at 16 gauges with statistically non-significant trends, non-stationary models (mostly precipitation-included models) outperformed the stationary one (Figs 3 and Figs 4). Bertola et al. (2019) reported similar results; they show superior performance for rainfalldriven non-stationary models even at gauging stations without statistically significant trends.

Resorting to non-stationary models for quantile estimation
Flood return period is a key concept for the design of hydraulic facilities and flood control systems. For "stationary" models with constant parameters, the probability p that a flood peak will be larger than a certain design event Q T in a year is expressed in terms of the return period T, which is the inverse of p (i.e. T = 1/p). On average, a T-year flood is exceeded once in T years. Therefore, a 100-year flood event has a probability of 1% of occurring or being exceeded in 1 year. These simple relationships between design events, exceedance probability and return period rely on the assumptions that the probability distribution of high flows is constant in time: this is not the case for the non-stationary models. There is a large body of literature on the need to revise the concept of the return period and on the possible adaptation of this quantity in the context of non-stationary conditions (Cooley 2013, Obeysekera and Salas 2014, Salas and Obeysekera 2014, Volpi 2019. In particular, there are two main approaches to flood return period approximation under non-stationary conditions: (a) using the concept of expected waiting time for the first occurrence of a flood event exceeding the design flood (Salas and Obeysekera 2014), and (b) defining the return period as the time interval in years for which the expected number of exceeding flood events is equal to 1 (Cooley 2013).
Instead of resorting to either of the revised definitions listed above, we decided to discuss the practical implications of adopting the non-stationary models presented above to estimate the flood quantiles at a given location, by calculating the at-site flood quantiles for any given year by using the distribution parameters as estimated for that year (e.g. referring to the cumulative rainfall depth of that year). Once the parameters of the seven models (one stationary and six non-stationary) are estimated and the best-fit model is identified based on the AIC criterion, the flood quantiles for specific recurrence intervals and different models can be quantified and compared. This will allow for a direct assessment of the impact of nonstationary frequency models, for example on design events.
Sections 3.2.1 and 3.2.2 present and compare the estimates for a rare and median flood (i.e. a one-in-T-years, or T-year flood, with T being conveniently large; and a 2-year flood) at the two aforementioned test site stations (see Figs 5 and Figs 6). According to report no. 629 from the Flood Defence Standards for Designated Sites (Risk & Policy 2006), the 100year flood event is generally considered for constructing flood defences in most parts of the UK, and for this reason we selected T = 100 years in our study. Section 3.2.3 focuses on the 100-year flood and addresses the implications and potential consequences of selecting a stationary or a non-stationary model at all streamgauges when designing flood defences and flood risk mitigation measures.
It should be pointed out that it is not quite straightforward to answer which (design) discharge should be taken into account for flood risk management when they are associated with non-stationary outcomes (Serinaldi and Kilsby 2015). This is due to the changing characteristics of non-stationary frequency analysis over time. As a result, we take the last year of the fitting period (2015), as advised by Luke et al. (2017), to select and compare the (design) quantiles between nonstationary and stationary estimates throughout this study. However, to better showcase and represent the practical implications of our study, results for the second-to-last year (i.e. 2014) are included as well, and compared with those for 2015.

Comparison of the 2-and 100-year flood quantiles at Station no. 69044: River Irwell at Bury Ground
Considering Irwell at Bury Ground, as shown in Fig. 5, where only the stationary and best-fit non-stationary estimates are displayed, incorporating annual rainfall as a covariate in the location parameter led to an abrupt change for both median and design flood estimates moving from one year to the next, as opposed to the stationary model for which the flood quantile is constant. Differences between stationary and the best-fit non-stationary model predictions become more apparent at larger return periods in the flood estimations. This is attributed to the occurrence of larger uncertainties for non-stationary settings at higher frequencies. The stationary model predicted the median to be 105.85 m 3 /s, as opposed to the best-fit nonstationary model (driven by cumulative annual rainfall), which predicted a median of 146.79 m 3 /s in the last year of the fitting period. This value is approximately 40% greater than the value predicted by the classic stationary setting.
This river also shows an abrupt increase in the median flood flows in the late 1990s, alongside three sharp spikes in 2007, 2011 and 2015 (Fig. 5). Given the strong correlation between annual precipitation totals and annual floods for this catchment, these spikes are a consequence of the considerable amount of cumulative annual precipitation that occurred in these years (above 1500 mm in all three cases). For instance, the highest annual precipitation (1724 mm) is associated with the largest observed discharge (284 m 3 /s), which occurred in 2015.
In terms of the 100-year flood (i.e. the design flood quantile; see Fig. 5), similar to the median (i.e. the 2-year flood), the best-fit non-stationary model was able to capture the design floods based on precipitation values occurring in each year. For example, the non-stationary model predicts the largest design flood (i.e. 276.85 m 3 /s) in 2015, reflecting the highest value observed for the cumulative annual precipitation value, that occurred in the same year. Furthermore, the design flood (100-year flood) estimated under the best-fit non-stationary condition (Model 3) is associated with the 160-year quantile under stationary conditions. This implies that there might be a 60-year frequency difference between the stationary (254.2 m 3 /s) and non-stationary (276.85 m 3 /s) setting in the last year of the flood records. Nonetheless, since the 95% confidence limit becomes wider for the non-stationary estimate resulting from Model 3 (see Table A2 in the Appendix) compared to the stationary one, the interpretation of change in the design quantile is not straightforward. As shown in Table A2, inspecting the confidence interval around the stationary and the preferred non-stationary quantile does not allow us to infer whether we have certainly (at least with 95% confidence) underestimated the design quantile using the stationary setting. These findings emphasize the importance of incorporating uncertainty analysis in non-stationary flood risk management schemes.

Comparison of the 2-and 100-year flood quantiles at station no. 75005: River Derwent at Portinscale
Comparing median and design flood quantiles for River Derwent at Portinscale (Fig. 6), the same interpretations as above can be made. As stated in section 3.2.1, the nonstationary model, where cumulative annual rainfall was integrated, generated jumps. The median and design quantile exhibit a similar pattern over the flood period, with larger differences between the flow estimates of the 100-year events. The median quantiles estimated by the regression model are consistently larger than the one obtained by the stationary model in the late 1990s, especially after 2006.
Moreover, the 100-year quantile of the rainfall-dependent model (Model 3), with 338.36 m 3 /s discharge in the last year, can be associated with the 140-year flood quantile of the stationary model. It stands to reason that the rate of increase in the non-stationary design quantile might have been 7%. However, the implication of change between design quantiles based on the stationary and the preferred non-stationary model is complicated, as their 95% confidence intervals overlap (see Table A2).

Practical implications of selecting non-stationary vs stationary design quantiles
To assess the implications of selecting a non-stationary vs a stationary model for estimating the design flood quantile (as a measure of constructing flood defences in most parts of the UK) in northwest England, the approach outlined above was repeated at all 39 stations with a specific focus on the 100year return period.
In this context, to investigate the discrepancy between the design values, we measure the ratio between the design quantiles derived from the preferred non-stationary model (in case a non-stationary model was selected by the AIC criterion) in 2015 (the last year for which data is available), and the stationary one at each river station ( Fig. 7(a)). Furthermore, uncertainties around the design quantiles are shown in Table  A2 of the Appendix for all stations (represented by confidence limits). To further showcase and support the implications of our framework, the same procedure performed and explained above was repeated for the second-to-last year in our sample (i.e. 2014), shown in Fig. 7(b). The importance of evaluating different windows of records was also emphasized by Griffin et al. (2019).
According to the results shown in Fig. 7(a), the stationary distribution produced the best fit at three stations (black squares). At six stations (shown in yellow), non-stationary analysis reduced the stationary estimates in 2015. At all of the remaining stations, in contrast, the non-stationary regression models might have increased over the conventional stationary design estimate. The most significant increase is observed mainly in the north of the study area (Cumbria). This finding, however, alongside the design events' confidence limits (see Table A2) makes it difficult to conclude whether the stationary models do effectively underestimate or overestimate the quantile when compared to the non-stationary ones in 2015. For instance, the flow estimate resulting from the bestfit non-stationary model at gauge 73002 was around 75% higher (the largest change across the area) than the one predicted by stationary model. However, by looking at their associated uncertainties (Table A2), it is not possible to properly judge the estimation of quantiles. This is because the confidence interval associated with the non-stationary model becomes wider compared to the stationary one, and indeed, they overlap with each other.
However, based on the achieved uncertainties shown in Table A2, we can infer with 95% confidence that at only 11 stations out of 39, stationary models underestimated the quantiles compared to non-stationary ones in 2015. These gauges (highlighted in red in Fig. 7(a)), all located in Cumbria, are the ones that were severely hit with floods, especially in that year, producing tremendous discharge rates up to 65% higher than the stationary estimates. This conveys a crucial message, that the non-stationary framework for designing hydraulic facilities in northwest England should be considered as an alternative option along with the traditional stationary setting, with special attention to such stations with severe flood records.
Comparing the results (see Table A2 and the Supplementary material), for example at station no. 72005 on River Lune at Killington, with those of a similar study (Faulkner et al. 2020) demonstrates a notable difference in terms of the design flood event in water year 2015. The stationary and the best-fit nonstationary design event in the earlier study was calculated at around 575 m 3 /s [450; 700] and 600 m 3 /s [550; 750], respectively. However, the same quantities were calculated as 622.08 m 3 /s [583.51; 665.98] and 878.63 m 3 /s [820.24; 937.02] in the present study. This implies a 4% and 46% discrepancy with respect to the stationary and non-stationary design flood event, respectively, at this gauge. This is attributed to the selection of a different frequency distribution (i.e. GEV distribution) by Faulkner et al. (2020). Although the inclusion of time was found to be significant both by Faulkner et al. (2020) and in this study, the simultaneous incorporation of rain and time here gave the preferred fit at the investigated station -which can be viewed as the part missing from the literature.
In addition to the implications for the last year in the record (2015), in which the highest ever total rainfall accumulation was recorded, Fig. 7(b) illustrates the ratios of the best-fit nonstationary design quantiles to the stationary ones for 2014 (the second-to-last year in the dataset). This figure shows patterns of the ratio that are very similar to those depicted in Fig. 7 (a) (particularly in Cumbria). Also, the gauges highlighted in red (i.e. those where stationary distribution underpredicts flood quantiles at the 95% confidence level) show close similarity for both years, even though 2014 was not an extremely wet year (1302 mm cumulative annual rainfall) relative to 2015 (1724 mm cumulative annual rainfall).

Non-stationary design flood quantiles
To provide insight regarding how the design event might have changed between the stationary and the best-fit non-stationary analysis in the last year of observations, Fig. 8a shows the results produced by dividing the non-stationary design quantile by the stationary one in 2015, and accordingly obtaining the design "trend." Similar to the previous section, results obtained by referring to 2015 have been compared with those associated with 2014 ( Fig. 8b) to better draw the practical implications for the design quantiles.
As shown in Fig. 8a, the vast majority of stations (30 stations) recorded larger non-stationary design values with respect to the stationary ones -that is, an upward "trend"supporting the conclusion that it is likely that large flood events might happen again in the future in those areas. In contrast, six stations revealed downward behaviour, which means that the preferred non-stationary models accounted for lower quantiles compared to the stationary ones. However, the implementation of uncertainties (e.g. confidence intervals as in Table A2) for both stationary and nonstationary analysis should be an integral part of any conclusion, all of which helps in detecting the ongoing changes in the flood peak magnitudes.
When it comes to the comparison of quantiles' sensitivity to the selected predictors between 2014 and 2015 ( Fig. 8(a) and (b)), we observe an analogous situation across the region, as discussed. In other words, most river stations -especially in the North -show exactly the same setting in 2014 and 2015. The situation is slightly different in the south (Lancashire), where there is a positive signal (increasing estimates, blue triangle) for 2015 but a negative signal (downward estimates, red triangles) in 2014.
This indicates that flood hazard can be quite sensitive (especially in the south) to the changes in annual rainfall totals in the study region, and previous studies clearly pointed out significant changes in observed mean annual precipitation across the study area between 1960 and 2000 (i.e. mean annual precipitation between 1960-1990 and between 1970-2000), showing increases as high as 15% in Cumbria (Jenkins et al. 2007).
It is worth mentioning that common methods to quantify risk are based on the assumption that the distribution of the phenomenon under study (e.g. flooding) is unchanged. Although alternative methods to quantify risk under changing conditions have been proposed (Parey et al. 2007, 2010, Villarini et al. 2009a, Rootzén and Katz 2013, Salas and Obeysekera 2014, there is still no standard paradigm to assess risk when using models that allow for change. When it comes to time-dependent models, extrapolation of the future design quantiles can be unrealistic, as the change happening in the future might not happen the same way as it did in the past. Utilizing physically based predictors, on the other hand, establishes a relationship between covariate and variable (flood flows), yielding a better fit -however, potential future risk assessment would depend on the unknown future distribution of the physical covariate. Further, the relationship between the variable of interest and the physical covariate would need to remain the same. Given these methodological challenges, we do not attempt to assess future flood risk in northwest England but simply present the implications of the present-day (i.e. year 2015) conditions relative to the second-to-last year in our sample (i.e. 2014).

Summary and conclusions
This study investigated the presence of non-stationarity in annual maxima (AM) of river peak flow data series for numerous catchments across northwest England. The study was motivated by the concerns over the suitability of traditional procedures for the estimation of flood frequencies following the successive extreme floods in 2004, 2009 and 2015 in northwest England. Taking into account the indirect impact of climate/human-induced attributes, a linear regression model for the location parameter of the generalized logistic (GLO) frequency distribution model was constructed, where explanatory variables such as time, annual rainfall, and annual temperature, alongside their linear combinations, were integrated. In this context, maximum likelihood estimation (MLE) and the Akaike information criterion (AIC) were applied to the frequency models at 39 river gauging stations across the northwest of England to infer the estimated parameters and choose the best-fitting model, respectively. Our analysis revealed that 36 river stations demonstrated non-stationary behaviour, implying that the flood characteristics are changing over time. However, three gauges recorded no significant changes in any of their models' parameters (i.e. stationary behaviour was dominant). Among non-stationary-dominated streamgauges, the best model often included annual rainfall as the predictor, signifying that annual rainfall is the climatic driver most responsible for changes in the flood characteristics in northwest England.
Moreover, a general implication of this study for flood quantiles is that most rivers in the area showed a sharp increase in higher quantiles in the late 1990s, with an even sharper increase within the last 10 years of the recording period. This implies that the stochastic process of the distribution underlying peak flows might be changing in most cases, especially in recent years; the impact of this should be incorporated into the design of future hydraulic facilities. Hence, we highly recommend the consideration of a non-stationary framework alongside the traditional stationary analysis in northwest England, especially in the Cumbria region, as these implications can be put into practice (in the light of uncertainty analysis) and, finally, help in predicting the ongoing alterations in the flood frequency. This would prompt local flood managers to enhance the current flood management plans and reduce the flood risk.
Despite the notable improvements over the stationary fit, resulting from the physically based non-stationary distributions (e.g. when rainfall is included as covariate), further research needs to be carried out towards the estimation of the frequency distribution for the covariate itself. This means that when introducing an extra stochastic component such as annual rainfall into the model, there should be an additional study on whether the stochastic component is stationary or not. Future studies could also potentially consider different precipitation indices as covariates in a non-stationary framework, such as design rainfall quantiles or seasonal rainfall characteristics, selecting those that are more aligned with the seasonality of floods and processes driving the flood production mechanism in the study area. Table A2. Estimated (design) flood quantiles for 100-year return periods for 2015 and 2014, along with their 95% confidence intervals.

Gauge number
Design flood quantile associated with stationary model with 95% confidence limits (m 3 /s) Design flood quantile associated with the preferred non-stationary model in 2015 with 95% confidence limits (m 3 /s)