Generalized extreme value shape parameter and its nature for extreme precipitation using long time series and the Bayesian approach

ABSTRACT Assessing the probability of extreme precipitation events is consequential in civil planning. This requires an understanding of how return values change with return periods, which is essentially described by the generalized extreme value (GEV) shape parameter. Some works in the field suggest a constant shape parameter, while our analysis indicates a non-universal value. We re-analysed an older precipitation dataset (169 stations) extended by Norwegian data (71 stations). We showed that while each set seems to have a constant shape parameter, it differs between the two datasets, indicating regional differences. For a more comprehensive analysis of spatial effects, we examined a global dataset (1495 stations). We provided shape parameter maps for two models and found clear evidence that the shape parameter depends on elevation, while the effect of latitude remains uncertain. Our results confirm an explanation in terms of dominating precipitation systems based on a proxy derived from the Köppen-Geiger climate classification. EDITOR D. Koutsoyiannis ASSOCIATE EDITOR not assigned


Introduction
Extreme precipitation estimates are decisive for planning and design of important infrastructure, such as reservoir dams, avalanche mitigation measures, and power and transport lines. The accuracy of extreme precipitation estimates is therefore crucial for both economic and safety aspects (e.g. Coles and Tawn 1996, Blanchet et al. 2009, Eli et al. 2012, Dyrrdal et al. 2014. As mentioned by Serinaldi and Kilsby (2014), the history of extreme value theory (EVT) in its present formalization and its application to hydrological analyses is rooted in an extensive literature dating back to the 1940s. For a detailed historical survey on that subject, the reader is referred to Papalexiou and Koutsoyiannis (2013). The present paper focuses on the shape parameter, ξ, of the generalized extreme value (GEV) distribution. The GEV distribution encompasses three limiting distributions of extreme value depending on the value of the shape parameter (Coles 2001): • ξ > 0 giving the heavy-tailed Fréchet case (EV2); • ξ = 0 giving the light-tailed Gumbel case (EV1); • ξ < 0 giving the short-tailed negative-Weibull case (EV3). Koutsoyiannis (2004aKoutsoyiannis ( , 2004b has analysed the statistics of daily rainfall extremes and argued for the use of the EV2 distribution (with positive shape parameter) instead of the Gumbel distribution (EV1) when analysing rainfall data to avoid an underestimation of risk associated with extreme rainfall. L-moment estimation of the distribution's shape parameter, ξ, led Koutsoyiannis (2004aKoutsoyiannis ( , 2004b to conclude that ffi 0:15 and that it is "constant for all examined geographical zones (Europe and North America)". Recent work on the generalized Pareto (GP) distribution's shape parameter, GEV ffi GP (Serinaldi and Kilsby 2014), supports > 0 (Cavanaugh andGershunov 2015, Cavanaugh et al. 2015). The conclusion made by Koutsoyiannis (2004aKoutsoyiannis ( , 2004b was drawn from a comparison of the spread of ξ estimates in the data with that of simulations where ξ is constant. However, a strict statistical test with the assumption of a constant ξ was not carried out. More recently, Papalexiou and Koutsoyiannis (2013) presented an extension of the work of Koutsoyiannis (2004b) to analyse daily rainfall from 15 137 precipitation stations worldwide with observation lengths of 40-63 years. They concluded, among other things, that: (a) "The record length strongly affects the estimate [L-moment method only was used] of the GEV shape parameter and long records are needed for reliable estimates;" (b)"The GEV shape parameter is expected to belong in a narrow range, approximately from 0 to 0.23 with confidence 99%;" and (c) "The geographical location of the globe may affect the value of the shape parameter." In addition to the work mentioned above, which primarily focuses on the proper choice of the GEV distribution, there is a significant body of work in which the shape parameter was evaluated as a by-product (e.g. Fowler andKilsby 2003, Meddi andToumi 2015). However, there are almost no studies dedicated to the nature of the GEV shape parameter itself. The value of this shape parameter directly influences estimated values of extreme precipitation, which in turn are crucial for dimensioning expensive engineering constructions.
The main aim of the present paper was to investigate the properties of the GEV shape parameter by using a statistical modelling approach to facilitate comparison of different hypotheses in a consistent fashion. We use Bayesian methods because some of our models are hierarchical and some prior knowledge exists. Bayesian analysis has been used previously for, amongst others, extreme precipitation modelling (e.g. Coles and Tawn 1996, Coles 2001, Coles et al. 2003, Smith 2005, Alston 2011, Eli et al. 2012, Sun et al. 2015. Our analysis is conducted in three phases. In the first phase, to extend the analysis from Koutsoyiannis (2004b) consistently, we used the same time series. For a more comprehensive perspective, we extended these time series by additional data from 71 stations in Norway, each consisting of 99-131 years of observations. The objective of this phase of the study was to understand whether a statistical modelling and model comparison approach could broaden our knowledge of the shape parameter beyond the conclusions drawn by Koutsoyiannis (2004b). Three hypotheses about the shape parameter were tested in this part of the study, namely: a shape parameter that is (a) constant, (b) station specific, or (c) stochastic, but drawn from a common distribution. For this purpose, the same analysis was performed on a further extended (worldwide) dataset consisting of daily precipitation data from the Global Historical Climatology Network-Daily database (version 2.60, www.ncdc.noaa.gov/oa/climate/ghcndaily) referred to as the Extended international dataset.
In the second phase of the study, we examined regional differences in the extended Koutsoyiannis dataset (with Norwegian data).
The third phase was a spatial analysis of the shape parameter using the Extended international dataset. Only those stations with a minimum of 99 complete years of measurements were chosen in order to meet the requirement (a) by Papalexiou and Koutsoyiannis (2013). Spatial patterns of the shape parameter's distribution are discussed here.
The paper is organized as follows. Section 2 presents the data used in this study; Section 3 describes the statistical models for precipitation annual maxima; and Section 4 explains the tested hypotheses and the corresponding model structures. In Section 5 we discuss the choice of prior distributions to be used in Bayesian inference, and how Bayesian hypothesis testing is performed. Section 6 presents the results of the analyses, and in Section 7 we summarize our conclusions.

Data
In the first and the second phases of the study, we used the collection of 169 of the longest available rainfall records worldwide from Koutsoyiannis (2004b), each having 100-154 years of data, with annual maximum values (excluding the years with missing data). According to Koutsoyiannis (2004b), these time series were chosen after examination of some thousands of raingauge time series from Europe and the USA, namely data from the United States Historical Climatology Network (USHCN), Land Surface Observation Data of the UK Met Office, and data from the oldest stations of France, Italy and Greece (Fig. 1). Years with more than five missing daily values in two or more months were excluded.
Seventy-one additional time series, each having 99-131 years of data (excluding the years with missing data), were chosen among 3531 time series from precipitation stations in Norway (data from the web-service of the Norwegian Meteorological Institute (MET Norway), eKlima.no) (Fig. 1). Years with 36 or more missing daily values per year (≥10% of full year with data) were excluded.
The extended international dataset, used in both the first and third phases of the study, comprises 1495 daily precipitation time series (see Fig. 2 for geographical locations). These time series were selected from more than 15 000 precipitation records available in the Global Historical Climatology Network-Daily database (version 2.60, www.ncdc.noaa.gov/oa/climate/ghcn-daily) according to our requirements of 99-year minimum record length with less than 5% missing values. Raingauge geographical coordinates (longitude, latitude and elevation) were included in the datasets to perform spatial analysis.

Statistical models
For the analysis of precipitation time series, it is assumed that the distribution of yearly maximum precipitation for a given station follows the GEV distribution: where z is a yearly maximum precipitation, μ is a location parameter, σ is a scale parameter and is the shape parameter. Here, the shape parameter is assumed not to be zero. If it is zero, the Gumbel distribution is used. The location and scale parameters can be assumed to differ from station to station according to how dry or wet the place is and how much variability there is in the yearly extremes. The analysis of Koutsoyiannis (2004aKoutsoyiannis ( , 2004b suggested that was the same for all stations for precipitation data. The objective of the first phase of our study is to perform direct statistical tests of this hypothesis ( is universal for precipitation data) as well as to study whether the hypothesis holds upon inclusion of additional Norwegian data and possible reasons for regional differences of values. Let i 2 1; . . . ; n f gdenote the station number and let z i;1 ; . . . ; z i;n i È É denote the n i annual precipitation maxima for each station. Under the assumption of independent data, the combined likelihood is: While this equation expresses the likelihood for all GEV parameters separately for each station, it can also accommodate models in which the shape parameter is the same for all stations, by setting for all i.

Hypotheses
Three hypotheses were tested during the analysis of the Koutsoyiannis and Norwegian data (Phase I of the study): a. The shape parameter differs from station to station (individual). If so, the expression for the likelihood in Equation (2) is used as it is. b. The shape parameter has a parametric distribution determined by the data, where λ is the parameter set (as a vector) of the distribution of the shape parameter, . That means that the shape parameter is regarded as a random effect. Since this requires a probabilistic treatment of the station-wise shape parameter, , it becomes simpler to treat the entire parameter set in a Bayesian fashion than to split the analysis into a Bayesian part for λ and a classic part for the other parameters (including those that determine the distribution of λ). c. The shape parameter is universal (common constant value). In this case i ¼ for all stations, i (Equation (3)).
After the inference about the nature of was made, a new block of hypotheses was created.
In the case that the results are in favour of hypothesis b or c for a given dataset, it is also possible to go further, since both these cases contain parameters that are "global" to that dataset. For hypothesis b, the parameters describing the distribution of the station-wise are global. For hypothesis c, it is the shape parameter, , itself that is global.
When data from two different regions are involved, it is of interest to test if the global parameters are the same or different in the two sets. This testing constitutes Phase II of the analysis. Thus, instead of a constant value of for all stations (hypothesis c), there could be one shape parameter, 1 , if a station belongs to dataset D 1 , and another, 2 , if that station belongs to dataset D 2 . Hence, it is possible to test if the shape parameter is truly global or rather regional (belonging to a particular selection of stations). The regional model (hypothesis) thus takes the form: d. i ¼ region1 if station i belongs to the first region and i ¼ region2 if it belongs to the second region.
The results from the regional analysis in Phase II encouraged us to try spatial analysis for the Extended international dataset. Considering the as realizations of a random field, their values can be described by a multivariate distribution accounting for mutual (spatial) dependence. This can be seen as an extension of hypothesis b, as each i is again assigned a distribution, but this distribution also describes the correlation between the at different places. We assigned a multivariate normal distribution, so that: where ¼ 1 ; 2 ; . . . ; m ð Þis a vector of all , μ is a vector of expected values for the (assumed the same for all stations in the simplest model) and Σ is a covariance matrix. Assuming the same variance for each station, each element can be described by where ρ ;i;j is the correlation between i and j and σ 2 is the variance of the . Many sophisticated models for the correlation can be made, but without any indication of which would be the right one, we opted for what is arguably the simplest one: where r i;j is the distance between stations i and j, and R is a parameter that encodes the characteristic correlation length, i.e. R is the distance for which the correlation drops by a factor of e and, by analogy to the meanreverting process, can be called the "characteristic distance" of the spatial field. Distances between stations were calculated using a spherical approximation, so that: where φ i ; φ j is the latitude in radians, # i , # j is the longitude in radians of stations i and j, respectively, and E is the radius of the Earth.
Calculating the probability density of a huge array of values can be time consuming, making likelihood calculations computationally intense due to the need of finding the determinant and inverse of the covariance matrix or the corresponding correlation matrix. It was thus determined to discretize the parameter R (in 40 values ranging exponentially from 1 km to the circumference of the Earth) and calculate the determinant and inverse of the correlation matrix for each discretized value before doing the inference. This significantly speeded up the analysis.
Additional structure to a spatial model could be assigned by correcting for the height and latitude of each station in the expected value using linear models. We created four spatial model variants this way, which were the models examined in Phase III of our analysis: e. Global expected value: where μ 0 is the global expected value and 1 is a vector of ones. f. Expected value depends linearly on height: where h is a vector consisting of the heights of each station and β h is a regression parameter that describes how the expected shape parameter changes with height. Hence, if β h ¼ 0:0001; for instance, the expected shape parameter changes from μ 0 at sea level to μ 0 þ 0:1 at a height of 1000 m. g. Expected value depends linearly on height and latitude: , where ϕ is a vector of latitudes and β ϕ1 is a regression parameters describing how much the expected value changes with changing latitude. h. Expected value depends linearly on height and linearly and quadratic on latitude: where ϕ 2 is a vector of squared latitudes and β ϕ2 is a regression parameter describing how much the expected value changes with squared latitude.

Priors
As this study contains Bayesian hierarchical models, Bayesian inference is necessary when it comes to both parameter estimation and model choice. The middle part of the hierarchy could conceivably be handled as random effects in a frequentist setting, but, since there is no ready-made methodology for our models, a Bayesian treatment offered fewer inferential and numerical challenges. In addition, we are interested in the uncertainty of the shape parameter, which can be read directly from the posterior distribution of a Bayesian analysis, but which would require extra steps in a classic analysis. In both cases, a prior distribution for the parameters is required. Prior distributions are used to represent a set of beliefs about the parameter of interest (Eli et al. 2012). In Bayesian statistics, this prior knowledge about the parameters is updated by the data using Bayes' theorem: where D is the data, θ is the parameter set, f θ ð Þ is the prior distribution, f ðDjθÞ is the likelihood and is a normalizing constant known as the marginal likelihood. One does not need to know the marginal likelihood in order to sample from the posterior distribution, but it is crucial for calculating model probabilities in Bayesian model selection (see Appendix for details). The return period for a given outcome, x, in extreme value analysis is defined as: i.e. the expected number of years before that value will be exceeded. The outcome x is then called the T-year return value. According to Coles and Tawn (1996), using the prior knowledge of an expert hydrologist, a Bayesian 95% interval estimate of the 100-year return level for daily rainfall was found to be approximately half of the width of the corresponding likelihood-based confidence interval. For this study, we used a relatively wide (in terms of having a larger variance and credibility bands of predictions than would be expected from the data) and unstructured (in terms of assuming parametric independence rather than examining which parameter combinations yield reasonable predictions) prior, as well as a narrower and structured prior. The wide prior (assuming that there is little information available about the process apart from the data) was taken from Smith (2005), where wide marginal distributions for the GEV parameters were set directly and independently. Smith (2005) used the following prior distribution for each parameter: , N 0; 10 ð Þ The joint prior distribution is an independent combination of these normal single parameter distributions with large variances. The variances are chosen large enough to make the distributions almost flat, corresponding to prior ignorance. This represents a problem in two respects. Firstly, the prior can correspond to a distribution on the quantiles that is entirely unrealistic, in which case it can be described as being too wide. It is not easy to have intuition as to whether a particular combination of GEV parameter values is realistic or not. However, priors yielding a physically unrealistic 10-year return level (i.e. the 0.9 quantile of the GEV distribution) should be discarded. If this is the case, the prior cannot be said to represent our knowledge on the topic and is thus unrealistic. As it turns out, Smith's prior gives a 27% chance of a negative 10-year return value and similarly a 26% chance of it being larger than 10 30 . Secondly, while a too wide prior is seldom seen as a problem for parameter estimation when enough data are available, it can, nonetheless, pose serious problems for model selection. An observation named Bartlett's paradox (Bartlett 1957) states that the Bayes factor for a model compared to a simpler model (a zero hypothesis) can go to zero when the width of the prior distribution goes to infinity. In other words, a too wide prior can give an unfair advantage to a simpler model, and it might be difficult for the data to overcome this bias in the model selection. However, care must be taken when specifying a narrow prior, so that it does not penalize any parameter estimates deemed reasonable.
In addition, a more specific prior considerably reduces the uncertainty connected to value estimates. An example of the effect of a narrow band for the shape parameter, (posterior band with Coles and Tawn (1996) prior described below), compared to one with the wide prior (Smith's prior, Equation (18)) is given in Figure 3. Data from the Norwegian precipitation station no. 1650, Strømsfoss sluse (130 years with data), were used.
There is another prior for GEV models on precipitation data, elicited by Coles and Tawn (1996). This prior assigns a gamma distribution to the 10-year return value, T10, the 100-year return value minus the 10-year return value (T100 − T10), and the 1000year return value minus the 100-year return value (T1000 − T100). Thus, one gets a reasonable range of return values, given that reasonable (well-founded) hyper-parameters for the gamma distributions are specified. As the gamma distribution only has two parameters, this can be done by specifying (for instance) the 95% prior credibility band. For this study, we set a 95% credibility band for T10, T100 − T10 and T1000 − T100 to the interval 3-600 mm. Such a prior then implies a distribution for the parameter set also, which can be calculated using the transformation rule for probability distributions. Even though the distributions for T10, T100 − T10 and T1000 − T100 are set independently, there is a dependency between the GEV parameters in the prior distribution (hereafter, this elicitation approach is referred to as the Coles & Tawn prior). If we define the vector: and the vector: φ ; T10; T100 À T10; T1000 À T100 ð Þ then the Coles & Tawn prior can be expressed as: where f φ is the distribution of φ described in Equations (21)-(23).Thus, the prior distribution is readily available for the station-wise analysis. However, for models with common shape parameter or shape parameter from a common distribution, care must be taken. In the case of an underlying common distribution (hypothesis b) for the shape parameter, it is this distribution that determines the prior for the shape parameter, not the individual return periods. Similarly, for a constant (hypothesis c), this cannot be specified by the return values of a single station, so that a separate prior distribution is used instead. In this case, we used a normal prior distribution: which has a 95% credibility band roughly from −1 to +1. Given the shape parameter, location and scale parameter, a distribution can be determined by the distribution of T10 and T100 − T10, which again were assigned a gamma distribution with 95% prior credibility interval 3-600 mm. Thus, for hypothesis c: If i is considered to belong to one of two regions (hypothesis d), region1 or region2 , then similarly: For the distributional model for , i.e. hypothesis b, we get where μ and σ are the two parameters of the distribution for . For a spatial model, where the distribution of the shape parameters, f , is described in Equations (5)- (12) and (30)-(32), we similarly get: Only normal and lognormal models for were tried; both have just one location parameter and one scale parameter. The results based on the lognormal model are not shown, since the model gave a worse fit compared to the normal model. For the spatial models, we used a uniform prior on the log scale for the discretized characteristic correlation length, R. Thus, this prior was simply constructed by specifying that all log ranges are equally probable and that the range is more than 1 km (most stations were placed more than that distance apart from each other) and not so large that two stations on the opposite side of Earth are strongly correlated. For the regression coefficient, we used a normal distribution so that: with 95% prior probability. This means that we do not expect to change by more than 0.1 for every 1000 m, nor to change by more than 0.2 from the Equator to a pole.

Hypothesis testing
Bayesian hypothesis testing is based on the Bayesian model likelihood (BML, see Appendix). From this, one can calculate model probabilities (where one can compare multiple hypotheses), or the Bayes factor (where one compares two hypotheses). Using the latter approach, Bayesian hypothesis testing can be represented by an analysis of the Bayes factors (Jeffreys 1961), which compares the data prediction strength of one hypothesis with the data prediction strength of another. The Bayes factor interpretation scale from Jeffreys (1961) is given in Table 1, which provides an evaluation of how many times data are more probable under Model 1 compared to Model 2. However, the amount of time series (stations) analysed is of high importance. For example, if for every station the data are just 5% more probable under Model 1 than under Model 2, with data from 169 stations this gives a Bayes factor of 10 3 -10 4 , which represents very significant evidence. In other words, even a small difference per station will provide strong evidence if the number of stations is high enough; and the opposite will occur if the number of stations is low, so that a poor value for the Bayes factor can be expected with almost no regard to the strength of an effect. For instance, a low Bayes factor of 3 for 17 stations (as is shown in Table 2) means that the data under Model 1 are 6% more probable than under Model 2. This difference is 1% bigger than in the previous example. However, because only a few stations were analysed, the resulting Bayes factor (equal to 3) cannot be considered as strong evidence (see Jeffreys 1961 and Table 1). That is why it is essential to be careful when making an inference based on data from just a few stations.
Different priors may also lead to different parameter estimates. As shown in Table 2, the estimates of the shape parameter, (medians), differ depending on the choice of prior. However, since Smith's prior has a rather unrealistic nature, whereas the Coles & Tawn prior incorporates reasonable assumptions (T10, T100 − T10 and T1000 − T100), the estimates based on the Coles & Tawn prior are supposed to be more trustworthy than those based on Smith's prior. Nevertheless, the posterior 95% credibility intervals from both priors overlap a lot.
While the shape parameter analysis itself was performed using Bayesian methodology, we performed  some simple classic tests on the estimated shape parameters from the spatial analysis, in order to see if climate classes or climate groups could explain some of the differences. This was done using p-values from ANOVA testing. Also, the BIC (Bayesian information criterion), where a classic estimate of the likelihood is penalized by the model complexity (using a Bayesian justification for the penalty term), was utilized for selecting among climate classes and climate groups.
6 Results and discussion 6.1 Phase I: global, distributional or constant shape parameters for the International, Norwegian and Extended international datasets As described in Section 4, in the first phase of this study we tested hypotheses a, b and c.
The results of the Bayesian comparison between the models for the International, Norwegian and Extended international datasets, as well as the estimates of the shape parameter's median and posterior 95% credibility interval, are given in Table 2. In all cases, the model selection suggests a constant shape parameter (hypothesis c). According to Table 2, the evidence for the shape parameter, , being global (common constant value; hypothesis c), is decisive for both datasets with both priors (Bayes factor: 4000-10 319 ), except for the Norwegian dataset with the Coles & Tawn prior, where the evidence is still strong (Bayes factor = 20).
6.2 Phase II: examining regional differences The significant difference between the estimates for the International and Norwegian datasets (0.117 and 0.044, respectively) in combination with the decisive evidence for the global constant value of this parameter (Table 2), suggested that further investigation was required to find out whether the shape parameter is globally constant or gradually changing, giving rise to regional differences. To this end, a new set of hypotheses was created to make regional inference on (hypothesis d, described in Section 4).
The results of the Bayesian comparison between the models for the International and Norwegian datasets and their subsets, such as USA, Europe, UK, South USA, North USA, East USA and West USA (described in Section 2), are presented in Table 3. A dataset in each row of Table 3 represents the data from the combined set of stations for the two regions. When evidence is in favour of one constant shape parameter, rather than two regional ones, the Bayes factor for model c vs model d is given, rather than the Bayes factor for model d vs model c: We found evidence of regional differences between the International dataset and Norway, between the UK and Norway, and between West and East USA. We did not find evidence for regional differences between the USA and the UK, between South and North USA, or between West USA and Norway. Estimated medians and posterior 95% credibility intervals of the shape parameter for the same datasets/subsets are shown in Table 4.
According to the results summarized in Table 4, assumes values between 0.028 and 0.156 (95% credibility interval) and varies depending on the region. A positive supports Koutsoyiannis' conclusion that the EV2 distribution ( > 0) should be used for inference on extreme precipitation instead of the Gumbel distribution ( ¼ 0), since the latter underestimates the values (Koutsoyiannis 2004a(Koutsoyiannis , 2004b. We found decisive evidence that is different for Norway than for the UK (Table 3). This indicates that neither the location on the western coast of the European continent nor the proximity to an ocean has a significant influence on the value of . Moreover, we found even stronger evidence for the South and North USA having a common shape parameter. This rejects the hypothesis that depends strongly on latitude.
Nevertheless, is undoubtedly different for the Norwegian stations compared to the International (Bayes factor = 2 × 10 10 ; Table 3); in addition there is Table 3. Bayes factors for a regional model, d, vs constant shape parameter model, c, for various selections of two regions. Smith's prior is described in Equations (16) evidence of a difference between the western and eastern parts of USA. Blanchet et al. (2009) differentiated between extreme rainfall and extreme snowfall distributions, referring to Katz et al. (2002). In an attempt to explain our results, we investigated the hypothesis of the shape parameter being dependent on the form of precipitation (rain/ snow) by analysing the Norwegian dataset from the following subgroups for selecting annual maxima: (1) Norway (summer, 71 stations): only observations from 1 May to 1 October were used; (2) Norway (winter, 71 stations): only observations from 1 October to 1 May of the following year were used; (3) Norway (summer rain, 61 stations): only observations specified as "rain" in periods from 1 May to 1 October were used; (4) Norway (the rest, 71 stations): all observations from the Norwegian dataset excluding Norway (summer rain) were used; (5) Norway (rain, 61 stations): only observations specified as "rain" were used; and (6) Norway (snow, 71 stations): only observations specified as "snow" and/or "sleet" were used.
Our results, presented in Table 5, showed no dependence of the shape parameter on the precipitation form in Norway, and the hypothesis was eventually rejected.
Strong evidence (Bayes factor = 20) was found for a common constant shape parameter for West USA and Norway. Both data regions belong to mountain areas; however, the stations within each dataset are situated at different elevations. Possibly it is the ruggedness of an area that has an influence on precipitation pattern and, hence, on the shape parameter, rather than the height above sea level alone. However, a spatial analysis was needed to investigate this question.

Phase III: spatial analysis of the Extended international dataset
The second part of the data analysis (Section 6.2) has given strong evidence for there being regional differences in the shape parameter. Since the Extended international dataset contained stations distributed globally, we considered spatial models rather than regional or global models to be more appropriate for this analysis. In this manner, we were able to extract the spatial patterns suggested by the data, as well as find some characteristics of these patterns in the form of characteristic correlation length R (described in Section 4) and regression coefficients.
From the estimated BML (see Appendix) values, we found evidence that height and possibly latitude were variables that worked well as linear predictors for the shape parameter, while squared latitude did not improve the results. In other words, hypotheses f and g were preferred over hypotheses e and h (Section 4). The Bayes factor, which discerns between presence or absence of a height dependency, was B fvse % 10 25 , while the Bayes factor for a linear latitude dependency vs a second-order polynomial dependency was B gvsh % 10 50 : However, due to computational limitations, we could not resolve which of hypotheses f and g had evidence on its side (as the BMLs were approximately equal for the two hypotheses).
Despite the numerical uncertainty connected to the estimation, the analysis suggested that R = 250 km, with a 95% credibility interval (150 km, 500 km). Thus, regions corresponding to small countries or administrative regions within larger countries can be assigned roughly the same shape parameter, but larger areas such as continents can be expected to be heterogeneous. This can also be seen in the maps of the estimated values (medians; Figs. 4 and 5). Corresponding maps with standard deviations of the estimates are shown in Figures 6 and 7. Both maps present estimated values for 0 m elevation (sea level). The model g (hypothesis g) considers latitude, as shown in Equation (11).
Some similarities can be seen in the spatial distribution patterns of estimates simulated by the two models f and g. The eastern coast of the USA has lower values than its western coast; Australia has the opposite situation combined with stronger gradients. The whole of Scandinavia and the northwestern parts of Russia show low values. The southern part of Norway has the lowest estimated values of . Eastern Siberia, the coastal area of the Caspian Sea and the northern coast of the Black Sea have relatively high values. The highest values were estimated for Australia by both models. While it may seem obvious from Figures 6 and 7 that the estimated shape parameter generally has a lower value for model f than for model g, two notes of caution should be made: (1) This is a trend per area rather than per station.
It turned out that the shape parameter estimates for model f were not different from the shape parameter estimates for model g when averaged  out over the stations. However, averaged out over an area, the shape parameter estimates can be different in the two models.
(2) These are estimates for the situation at sea level, thus not corrected for the real geographical height of each point. (Correction for height  would entail a much finer resolution of the spatial analysis as well as usage of a global map with similarly fine resolution for elevation.) The estimate for the regression parameter for height, β h , was approximately −0.072 per 1000 m for both models, so the shape parameter drops for high altitudes.

Discussion
Few attempts have been made in the literature to explain the origin of the spatial differences in the parameters for the distribution of extreme precipitation. One example of such an analysis was performed by Blanchet et al. (2009). However, they used only observations of extreme snowfall measured during 10 consequent years in the Swiss Alps (239 stations). As shown in Papalexiou and Koutsoyiannis (2013), such short time series are not sufficient to make a good inference on the shape parameter and, hence, extreme precipitation distribution. Nevertheless, Blanchet et al. (2009) found an indication of nonlinear dependence of the shape parameter on altitude and mentioned a possible correlation with dominating precipitation systems.
Another investigation performed by Dyrrdal et al. (2014) that proposed a statistical analysis directly on areal 24-h precipitation from a gridded dataset in Norway, suggested that the shape parameter varies spatially according to the dominating precipitation systems and, most probably, to the degree of orographic enhancement. Villarini and Smith (2010), in their study of flood peak distributions for the eastern United States, showed that tropical cyclones have a large impact on the GEV shape parameter values. Unfortunately, it was not possible in the present work to test the hypothesis that the shape parameter depends on the dominating precipitation systems, since a worldwide spatial classification of dominating precipitation systems does not exist yet. However, we made an attempt to use the existing Köppen-Geiger climate classification (http://webmap.ornl.gov/wcsdown/ wcsdown.jsp?dg_id=10012_1, Peel et al. 2007) for this purpose. Serinaldi and Kilsby (2014) pointed out some similarities between the Köppen-Geiger classification zones and the spatial behaviour of their estimates of the generalized Pareto distribution shape parameter. We re-grouped the 32 climate zones (Table 6) into 10 zones according to precipitation conditions (Table 7). Shape parameter estimates (medians, at 0 elevation) for the Extended international dataset stations (both f and g models) were then plotted against the climate zones and the "climate-precipitation" groups to which the stations belong (according to the Köppen-Geiger climate classification). The resulting box plots are shown in Figures 8(a) and (b) and 9(a) and (b) (respectively for the full classification and for the re-grouped).
There are visible differences in the shape parameter estimates from different "climate-precipitation" zones. ANOVA tests suggest that there is strong evidence (very low p-values, smaller than about 10 −15 ) for the residuals having different means for different climate zones as well as for different "climate-precipitation" groups, for both model f and model g residuals. Climate zones were compared to "climate-precipitation" groups using BIC, which suggested that the latter were better for predicting the residuals. However, to investigate this dependency more research is needed. Spatial correlation might invalidate these results, as ANOVA analysis does not take this correlation into account. Table 6. Köppen-Geiger climate classification (Peel et al. 2007 As an overall summary of our analysis, we obtained a global average (expected value) of the shape parameter, which equals 0.139 (model f, estimates for sea level), with a 95% credibility interval ranging from 0.127 to 0.150. Thus the estimate of 0.15 by Koutsoyiannis (2004b) obtained from a smaller dataset is still consistent with, but lies at the high end of, the range of values suggested by the present analysis of the extended dataset.
Confirming the statement of Dyrrdal et al. (2014) regarding the shape parameter's dependency on the degree of orographic enhancement, we found a height dependency, suggesting a decrease of the shape parameter of about 0.07 per 1000 m height, with: β h Â 1000 m 2 À0:088; À0:056 ð Þ (34) as a 95% credibility interval. This means that at a height of 2000 m, the expectation for the shape parameter will drop from about 0.14 to zero (the shape parameter of the Gumbel distribution). However, extrapolating the decrease of the shape parameter to negative values can be seen as physically inappropriate (Koutsoyiannis 2004a). This suggests that the dependency on elevation found may be nonlinear in reality.
The stationary standard variation of the spatial process was estimated to be 0.05, with (0.04, 0.06) as a 95% credibility interval for this parameter. Since the global expected value was approximately 0.14, this means   that, at sea level, 95% of all stations are expected to have a shape parameter of between 0.04 and 0.24. At 1000 m height, this changes to i 2 À0:03; 0:17 ð Þfor 95% of all stations.

Conclusions
The main conclusions of our study can be summarized as follows: • The shape parameters of various stations are neither entirely separate nor independent if a distribution is assigned to them (hypotheses a and b were rejected). At the same time, the results presented here show that the shape parameter is not a universal, globally common value, but a regionally (large-scale) common value. Regions corresponding to small countries or administrative regions within larger countries can be assigned roughly the same shape parameter, but larger areas such as continents can be expected to be heterogeneous. • The global average (expected value) for the shape parameter is equal to 0.139, with a 95% credibility interval ranging from 0.127 to 0.150. • Shape parameter decreases with elevation: by 0.07 per 1000 m height, with β h Â 1000 m 2 À0:088; À0:056 ð Þ as a 95% credibility interval: • There is no detectable dependency of the shape parameter on the precipitation form (rain/snow). • It is very likely that the shape parameter varies according to dominating precipitation systems. However, more research is needed to define this dependency. • It remains unclear whether the shape parameter changes with latitude in a systematic fashion. • Maps of the shape parameter's global distribution were created (Figs. 4 and 5) and can be used for estimating extreme precipitation for engineering purposes. However, uncertainty of ξ (local) values cannot be neglected, and this uncertainty (see Figs. 6 and 7) is close to the variability of local (median) estimates across the globe.
returned, meaning that a total of 8000 MCMC iterations were performed. For the hierarchical spatial models, this was all we could afford, as the combination of spatial modelling, thousands of parameters (one μ; σ and parameter for each of 1495 stations), a large dataset and 8000 iterations meant an execution time of nearly a month with our available computing resources at the time. One can make proposal distributions for the whole parameter set. However, it is also possible to traverse the parameters one parameter at a time and when a particular parameter is handled the proposal is performed for the value of that parameter while the values of the other parameters are held constant. This is often easier to implement. In hierarchical models, such as some of those being examined in our study, such a parameter set traversal is much simpler to implement. Thus, we used parameter set traversal rather than seeking a single proposal for the whole parameter set.
The implementation and execution efficiency will depend on the proposal distribution of each parameter. As we have many different models and many parameters per model (at least for the hierarchical models), implementation efficiency was prioritized.
If the proposal distribution is symmetric along the previous values, so that: h θ proposed jθ previous ¼ hðθ previous jθ proposed Þ (A3) one term from the acceptance probability falls away and one is left with the original Metropolis algorithm (Metropolis et al. 1953). This can be achieved, for instance, by letting: where is standard normally distributed and is called the random walk MCMC method (RWMCMC). The RWMCMC method only depends in efficiency on the standard deviation of the proposal distribution, σ prop . An optimal value for σ prop can be obtained manually, but since we have many models and a great many parameters (for the hierarchical models), an automated method for optimizing σ prop was sought. An acceptance rate of approximately 1/3 is deemed near optimal (Roberts et al. 1997). We thus added a step in the burn-in phase in which the σ prop for each parameter was adjusted up if there was more than a 1/3 acceptance rate and down if there was less than a 1/3 acceptance rate. We also included the use of the parallel tempering algorithm (Geyer 1991), in order to deal with the possibility of multimodality in the posterior distribution. Tempering consists in that a number, N, of MCMC chains are run in parallel, one for the target distribution (the posterior distribution), f ðθjDÞ, and N À 1 for a modified distribution proportional to f ðθjDÞ 1=T i , for a set of "temperatures", T i > T iÀ1 f g i¼2;...;N . Thus, each higher distribution will be a smoother version of the distribution preceding it. Sometimes, the algorithm will propose switching the values of neighbouring chains. This makes it possible for the original chain to bridge the gap between two modes in the target distribution. For the spatial models, we needed N ¼ 12 parallel chains, with T i gradually increasing from 1 to 4, in order to get stable results. Even so, some runs failed, but we achieved three runs that gave consistent results.

BML estimation
The BML (Equation (14)) of a model can be used for calculating Bayesian model probabilities or the Bayes factors. From Bayes' theorem, the posterior model probability is: