Modeling spatial processes with unknown extremal dependence class

Many environmental processes exhibit weakening spatial dependence as events become more extreme. Well-known limiting models, such as max-stable or generalized Pareto processes, cannot capture this, which can lead to a preference for models that exhibit a property known as asymptotic independence. However, weakening dependence does not automatically imply asymptotic independence, and whether the process is truly asymptotically (in)dependent is usually far from clear. The distinction is key as it can have a large impact upon extrapolation, i.e., the estimated probabilities of events more extreme than those observed. In this work, we present a single spatial model that is able to capture both dependence classes in a parsimonious manner, and with a smooth transition between the two cases. The model covers a wide range of possibilities from asymptotic independence through to complete dependence, and permits weakening dependence of extremes even under asymptotic dependence. Censored likelihood-based inference for the implied copula is feasible in moderate dimensions due to closed-form margins. The model is applied to oceanographic datasets with ambiguous true limiting dependence structure.


Introduction
The statistical modeling of spatial extremes has received much attention since the article of Padoan, Ribatet, and Sisson (2010) provided a method of inference for max-stable processes. The latter form an important class of models for spatial extremes, as they arise as the only nondegenerate limits of renormalized pointwise maxima of spatial stochastic processes. More precisely, let Y i (s), i = 1, 2, . . . , be independent and identically distributed copies of a stochastic process {Y (s) : s ∈ S} with index set S ⊂ R 2 . If there exist functions a n (s) > 0, b n (s) such that the limiting process has nondegenerate marginals, then M(s) is a max-stable process (de Haan and Ferreira 2006, chap. 9). A practical issue with max-stable processes is that their d-dimensional densities (and hence the likelihood function) are difficult to evaluate, as the number of terms involved equals the dth Bell number, which grows super-exponentially with d. As such, spatial models for high threshold exceedances, which have simpler likelihoods, have become more appealing; see, for example, Ferreira and de Haan (2014), Wadsworth and Tawn (2014), Engelke et al. (2015), Thibaud and Opitz (2015), and de Fondeville and Davison (2018). The threshold exceedance analog of the maxstable process is known as the generalized Pareto process, and has a similar asymptotic dependence structure in its joint tail region.
In order for limiting max-stable or generalized Pareto processes to provide good statistical models, we require that the extremes of Y (s) are well represented by these processes, that is, adequate convergence has occurred. However, there are no guarantees on rates of convergence, and in practice, limit models may not hold well. One way to assess the validity of convergence is to assess whether the stability properties of limit models hold well: Max-stable copulas are invariant to the operation of taking pointwise maxima (max-stability), while generalized Pareto copulas are invariant to conditioning on threshold exceedances of higher levels (threshold-stability). A graphical diagnostic for max-stability is presented in Gabda et al. (2012), while for threshold-stability, one can examine plots of where the argument s j denotes the jth spatial location; if the data follow a generalized Pareto process law, then this function should be constant as the quantile u tends to one (Rootzén, Segers, and Wadsworth 2018). For environmental data in particular, it is much more common to see estimates of (2) decreasing as u → 1, indicating that dependence weakens with level of extremeness. An example of this is given in Figure 1, for a dataset of significant wave heights, to be analyzed in Section 4.1.
If the limit of χ u defined in (2) as u → 1 is positive for all sites s 1 , . . . , s d and all d ≥ 2, the process Y (s) is termed asymptotically dependent, and eventually, possibly at much higher levels, a generalized Pareto process should represent a suitable model for the data. If the limit is zero for all sites s 1 , . . . , s d and all d ≥ 2, we term the process asymptotically independent; in such cases, no generalized Pareto model would ever be suitable. Intermediate scenarios are possible, but owing to the structure of spatial data, it is common over small spatial domains to assume that the process is either asymptotically dependent or asymptotically independent, and we assume this here also. Determining a suitable model for the data usually requires distinguishing between these two scenarios; since most models exhibit only one type of dependence, choosing the incorrect class will lead to unsuitable extrapolation into the joint upper tail (Ledford and Tawn 1997;Davison, Huser, and Thibaud 2013).
In practice, because asymptotic properties are always difficult to infer, it is ideal to fit spatial models encompassing both asymptotic dependence classes, and let the data speak for themselves. To our knowledge, the only instance in the literature of such a hybrid spatial extreme model is the max-mixture model of Wadsworth and Tawn (2012). However, in that model, asymptotic independence only occurs at a boundary point of the parameter space, thus inference methods allowing for this are nonregular. Moreover, the model is highly parameterized and requires pairwise likelihood fitting methods.
In this article, we address such deficiencies by presenting a class of spatial processes described by a small number of parameters and making a smooth transition between the two dependence paradigms. Specifically, we propose a novel class of spatial extremal models that have nontrivial asymptotically dependent and asymptotically independent submodels with the transition taking place in the interior of the parameter space. The latter property allows us to quantify our uncertainty about the dependence class in a simple manner. Our new spatial models can thus be viewed as similar in spirit to the generalized extreme-value (GEV) distribution in the univariate case, which was introduced by von Mises (1936) and Jenkinson (1955) as a three-parameter model combining the three limiting extreme-value types (i.e., reversed Weibull, Gumbel, and Fréchet), hence providing a way to make inference without specifying the asymptotic distribution family prior to fitting the model. Furthermore, subject to model assumptions, standard hypothesis testing methods can be used to assess the evidence for asymptotic dependence over asymptotic independence, if so desired.
In encompassing both extremal dependence classes, our approach has similarities with the bivariate model of Wadsworth et al. (2017). However, our construction here is simpler and substantially more amenable to higher dimensional inference. Other related work that allows for both asymptotic dependence structures in a spatial setting is the Gaussian scale mixture models proposed in the recent work of Huser, Opitz, and Thibaud (2017), but their models either make the transition at a boundary point of the parameter space, or are inflexible in their representation of asymptotic independence structures.
The article is organized as follows. Section 2 describes the new spatial model and its extremal dependence properties. Section 3 details censored likelihood inference, describes a test for the asymptotic dependence class, and presents a simulation study validating the methodology. The new model is then applied to two oceanographic datasets in Section 4, while Section 5 concludes with some discussion. All proofs are deferred to Appendix A.

Copula-Based Approach
The main goal of this work is to provide flexible extremal dependence structures for spatial processes. As such, we take a copula-based approach and seek the construction of flexible families of copulas for spatial extremal dependence. For a process with marginal distribution functions X j ∼ F j , the d-dimensional copula function C, is defined as When the margins F j , j = 1, . . . , d, are continuous, which will be the case throughout this paper, the copula is unique (Sklar 1959), and represents a multivariate distribution function with standard uniform margins. In Section 2.2, we describe construction of a model whose copula displays interesting extremal dependence properties. Details of likelihood calculations of the copula of the model we introduce are presented in Section 3.

Construction
Let {W (s) : s ∈ S ⊂ R 2 } be a stationary spatial process with standard Pareto margins, and displaying asymptotic independence with hidden regular variation; a consequence of this is that for any x ≥ 1, where L W : (0, ∞) → (0, ∞) is slowly varying at infinity, that is, L W (ax)/L W (x) → 1 as x → ∞ for any a > 0, and 0 < η W (h) < 1 for h = s j − s k = 0 (Ledford and Tawn 1996;Resnick 2002). Note that we exclude the further possibility η W (h) = 1 (h = 0), L W (x) → 0 as x → ∞, because this does not arise in models that we might naturally consider for W (s).
The parameter η W (h), called the coefficient of tail dependence, summarizes the joint tail decay of the process W (s) and it is a function of the lag vector h. For simplicity, in what follows, we will restrict ourselves to isotropic processes, and will therefore write η W (h) (or, for notational convenience, η W , when no confusion can arise), where h = h = s j − s k ≥ 0 denotes the Euclidean distance between sites s j , s k ∈ S. Examples of models satisfying (3) include marginally transformed Gaussian processes and inverted max-stable processes, see Section 2.5 for more details.
With W (s) as described, let R be an independent standard Pareto random variable. Our spatial dependence model is defined through the random field constructed as The following simple observation highlights why the parsimonious model defined in (4) is potentially useful: When δ > 1/2, then R δ is heavier tailed than W 1−δ and this induces asymptotic dependence; when δ < 1/2 the converse is true, and this induces asymptotic independence. These facts are formalized in Section 2.3. Construction (4) has superficial similarities with the Gaussian scale mixture models studied by Huser, Opitz, and Thibaud (2017), who multiplied a Gaussian random field by a random effect that determines the extremal dependence properties. However, in (4), the latent process W (s) does not have Gaussian margins, resulting in a very different construction in practice, and need not have a Gaussian copula structure, which yields a much wider class of models. In practice, high-dimensional inference requires tractable densities for W (s) (see Section 3.1), leading to the Gaussian copula as a natural choice in spatial settings. Alternative possibilities for W (s) are discussed further in Section 2.5.
Remark 1. Representation (4) is convenient to study the asymptotic dependence properties of the process X (s) using the theory of regular variation, see Section 2.3. However, as the copula structure is invariant with respect to monotonically increasing marginal transformations, there is an infinite number of ways to characterize the copula stemming from X (s), some of which may be computationally more attractive or have appealing interpretations. For example, taking the logarithm on both sides of (4), we obtain an additive structurẽ whereR := log(R) ∼ Exp(1) is independent ofW (s) := log{W (s)}, also with Exp(1) margins. In Sections 3 and 4, copula and likelihood computations are based on expression (5).
The variable R in (4) or equivalently the variableR in (5), may be interpreted in various ways, shedding light on the extremal behavior of X (s). For example, by writingR := {R(s) : s ∈ S ⊂ R 2 }, it can be seen as a random process indexed by S with perfect dependence, so the representation in (5) implies thatX (s) can be interpreted as a mixture between perfect dependence and asymptotic independence. This contrasts with Coles and Pauli (2002), who constructed hybrid bivariate models using a certain type of mixture between asymptotic dependence and complete independence.
Alternatively, R orR may be interpreted as an unobserved latent random factor impacting simultaneously the whole region S, hence affecting the joint tail characteristics, and making a link with the common factor copula models for spatial data introduced by Krupskii, Huser, and Genton (2018). One major difference with our approach, however, is that hereR andW (s) are both on the unit exponential scale, whereas the location mixture copula models of Krupskii, Huser, and Genton (2018) assume thatW (s) is a Gaussian process and that both components in (5) are weighted equally, corresponding to δ = 1/2. Consequently, their exponential factor model always displays asymptotic dependence. Other distributions for the random factor were investigated by Krupskii, Huser, and Genton (2018), but they all yield copulas with (nontrivial) asymptotic dependence lying on the boundary of, or at a single point in, the parameter space.
We next study the dependence properties of model (4) for δ ∈ (0, 1) noting the simple interpretations at the endpoints of the parameter space: It is clear from (4) or (5) that perfect dependence arises as δ → 1, while the copula of W (s) is recovered as δ → 0.

Dependence Properties
Owing to the simple construction of this process, it is sufficient to study bivariate dependence to make more general conclusions. Comments on higher dimensional dependence will be made throughout the remainder of the section.
To examine the dependence properties of the process (4), we relate the behavior of the bivariate joint survivor function on the diagonal, P(X j > x, X k > x), to the marginal survivor function, P(X j > x), where for simplicity, we write X j = X (s j ) and so forth. We focus on a bivariate version of the dependence measure (2), and its limit χ (h) := lim u→1 χ u (h), with h = s j − s k . A value of χ (h) > 0 indicates asymptotic dependence for this pair of sites, while χ (h) = 0 defines asymptotic independence. Because the process X (s) has common margins with upper endpoint at infinity, the limit may be equivalently expressed as When χ (h) = 0, alternative measures are needed to discriminate between the different levels of dependence exhibited by asymptotically independent distributions. A widely satisfied assumption, already made for the process W (s) in (3) (modulo the restriction made on the coefficient of tail dependence) is where L X is slowly varying at infinity and η X (h) ∈ (0, 1] is the coefficient of tail dependence for the process X (s). When η X (h) = 1 and L X (x) → χ (h) > 0 as x → ∞, the pair of variables (X j , X k ) T are asymptotically dependent, else they are asymptotically independent and the value of η X (h) summarizes the strength of extremal dependence in the joint upper tail. For notational convenience, the dependence on distance h in χ (h) and η X (h) may be omitted when no confusion can arise.

... Marginal Distribution
The marginal distribution of the process (4) may be established for δ = 1/2 as follows: The case δ = 1/2 may either be established independently, or as a limit, from which we get this is the survival function of a log-Gamma random variable with rate and shape parameters both equal to two. Notice that margins are here available in closed form, unlike the Gaussian scale mixture model of Huser, Opitz, and Thibaud (2017), or the bivariate model of Wadsworth et al. (2017). Since the copula is the object of interest in all of the above cases, this makes model (4) computationally more appealing.

... Joint Distribution
We now derive the joint survivor function of a pair of variables (X j , X k ) T from the process X (s) in (4), and then use this result in (7) and (8), combined with (9), to derive the corresponding coefficients χ and η X characterizing tail dependence of X (s) depending on the value of δ.
Proposition 1. With definitions and notation as established above, the joint survivor function of (4) satisfies where L is slowly varying at infinity.
If δ ≤ 1/2, the pair (X j , X k ) T is asymptotically independent, that is, χ = 0. Furthermore, the coefficient of tail dependence for the process (4) is Remark 2. Analogous dependence summaries in d dimensions are simple to establish using the same techniques of proof as for Proposition 1 and Corollary 1. Specifically, letting η 1:d X and η 1:d W denote d-dimensional counterparts of the coefficient of tail dependence, defined using the d-dimensional joint survivor function, then expression (11) still holds with η X and η W replaced by η 1:d X and η 1:d W . The d-dimensional analog of χ generalizes expression (10), and is discussed in Remark 3.
The case δ = 1/2 is of particular interest, since it represents a boundary between asymptotic dependence and asymptotic independence: According to Corollary 1, we have asymptotic independence (χ = 0), but the coefficient of tail dependence η X attains its boundary value of 1. In this case, we therefore have L X (x) → 0 as x → ∞ in (8). Furthermore, the model has the appealing property that χ 0 as δ 1/2 and η X η W as δ η W /(1 + η W ). As noted at the end of Section 2.2, as δ 0, the dependence structure of the W process is recovered. Our model X (s) in (4) hence provides a smooth interpolation from the asymptotically independent submodel W (s) and perfect dependence, as the parameter δ varies in the unit interval, and it transits through nontrivial asymptotically independent and asymptotically dependent submodels.

Further Dependence Properties under Asymptotic Dependence
Here, we outline the connection to other well-known measures of dependence in the case of asymptotic dependence. We focus first on a limiting measure, namely the so-called exponent func- which describes the joint dependence of the associated maxstable or generalized Pareto process, see Davison, Padoan, and Ribatet (2012), Cooley et al. (2012), Segers (2012), or Davison and Huser (2015) for recent reviews on max-stable models. We then examine the subasymptotic behavior under asymptotic dependence, that is, the mode of convergence toward such limiting structures, which is important in practice for modeling extreme events at observable levels.
, that is, the rate at which χ u converges to its limit χ , determines the flexibility of a process for capturing subasymptotic extremal dependence. Proposition 3 demonstrates that the parameterization of model (4) gives flexibility in this rate, meaning that dependence can weaken above the level used for fitting, while still allowing for the possibility of asymptotic dependence.
Proposition 3. For δ > 1/2, For comparison, generalized Pareto processes have χ u ≡ χ for all u above a certain level (Rootzén, Segers, and Wadsworth 2018), while all max-stable processes have χ u − χ (1 − u), as u → 1. However, as χ is a dependence measure on the scale of the observations rather than maxima, it is less useful in the context of max-stable processes, where the summary (12) is typically used instead. From Proposition 3, we observe a wide range of convergence rates, from very rapid for δ near 1, to rates slower than (1 − u) for 1/2 < δ < 2/3. Note that for δ < 1/2, the rate χ u − χ is determined by the coefficient of tail dependence, η X ; recall (8) and (11). Figure 2 illustrates the flexibility in extremal dependence structures, by plotting χ u in (6) as a function of the threshold u and the limit quantity χ = lim u→1 χ u in (10), for a range of values of δ ∈ (0, 1), and (W j , W k ) T following a Gaussian copula with correlation parameter 0.4. Figure 2 also displays the coefficient of tail dependence η X defined in (8) and (11) as a function of δ for η W = 0.1, . . . , 0.9. The smooth transition from asymptotic dependence to asymptotic independence taking place at δ = 1/2 can be clearly seen from these two plots. Moreover, as is intuitive, the right panel of Figure 2 shows that the process X (s) in (4) cannot reach lower levels of dependence than its underlying W (s) process.

Example Models
We conclude this section with some concrete suggestions for the W process that may be useful in certain applications, such as those described in Section 4.
Example 1 (Gaussian process). Let {Z(s) : s ∈ S} be a stationary Gaussian process with correlation function ρ(h)<1, and standard Gaussian marginal distribution, denoted . Then, has a Gaussian copula, Pareto margins, and coefficient of tail dependence η W (h) = {1 + ρ(h)}/2 < 1. In this case, the value of χ (h) in (10) needs to be calculated either by Monte Carlo or numerical integration, both of which are simple and quick.
Example 2 (Inverted max-stable process). Let {M(s) : s ∈ S} be a stationary max-stable process with extremal coefficient function θ (h) ∈ (1, 2], and marginal distribution functions G s , s ∈ S. Then, the process has an inverted max-stable copula (Ledford and Tawn 1996;Wadsworth and Tawn 2012), Pareto margins, and coefficient of tail dependence η W (h) = 1/θ (h). The value of χ (h) in (10) can be calculated as  In what follows, we will principally take W (s) to have a Gaussian copula because the resulting density is much simpler in high dimensions than that of the inverted max-stable process, which suffers the same explosion in the number of terms as a maxstable process density. Pairwise or higher dimensional composite likelihoods (see, e.g., Padoan, Ribatet, and Sisson 2010;Varin, Reid, and Firth 2011;Castruccio, Huser, and Genton 2016) offer an alternative approach, but we do not explore this further here. Outside of a spatial context however, other dependence structures may be preferred.
Example 3 (Nonspatial model). We remark that nonspatial use of the model (4) is also possible, replacing the process W (s) with an asymptotically independent random vector W = (W 1 , . . . ,W d ) T with pairwise coefficients of tail dependence η jk W < 1, j < k ∈ {1, . . . , d}. For multivariate models in dimension d greater than two some care is required, however, as model (4) allows only for d-wise asymptotic dependence (i.e., χ 1:d > 0), or d-wise asymptotic independence (i.e., χ jk = 0, for all j < k ∈ {1, . . . , d}). Such assumptions are natural in the context of spatial processes, but often less so for genuinely multivariate data. For dimension d = 2 however, where χ 1:2 > 0 is the complement of χ 1:2 = 0, model (4) offers an interesting alternative to that of Wadsworth et al. (2017) for bivariate data. The latter show that the copula model defined by X = RW , where the radial variable R follows a unit scale generalized Pareto distribution with shape parameter ξ ∈ R and max(W ) = 1, with R and W independent, displays asymptotic dependence for ξ > 0 and asymptotic independence for ξ ≤ 0. One advantage of model (4) is that a version with an asymmetric dependence structure is simpler to implement, by selecting an asymmetric bivariate distribution for the copula of W . We illustrate the improvement this can offer in Section 4.2.

Censored Likelihood
We wish to fit the dependence structure of model (4) to the extremes of spatial processes. Since the dependence characteristics of the model are tailored toward appropriately capturing extremal dependence, we use a censored likelihood, which prevents low values from affecting the estimation of the extremal dependence structure. Such an approach is now standard in inference for multivariate and spatial extremes, although different censoring schemes have been adopted; see, for example, Smith, Tawn, and Coles (1997), Wadsworth and Tawn (2012), Huser, Davison, and Genton (2016), and Huser, Opitz, and Thibaud (2017). We assume that we are working with a W process that has a density, so that this is also true for the copula.
Assume that n independent replicates of a random process {Y (s) : s ∈ S ⊂ R 2 } are observed at d spatial locations, s 1 , . . . , s d ∈ S. Denote the ith replicate at the jth location by Y i j , i = 1, . . . , n, j = 1, . . . , d. We assume that in its joint tail region, that is, for observations above a high marginal threshold, the process Y (s) has the same copula as our model X (s) defined in (4), but with possibly different marginal distributions F s . To estimate the dependence structure, we first transform the margins to uniform independently at each site s j , j = 1, . . . , d.
In Section 4, we use the semiparametric procedure of Coles and Tawn (1991), whereby the distribution function is estimated using the asymptotically motivated generalized Pareto distribution above a high marginal threshold, and the empirical distribution function below that threshold. The resulting variables are denoted U i j =F s j (Y i j ). An alternative is to use the empirical distribution function throughout as in Huser, Opitz, and Thibaud (2017). This two-step approach is common practice in the copula literature and provides consistent inference for the copula under mild regularity conditions (see, e.g., Joe 2015).
The second step is to estimate the copula parameters using the transformed data based on a censored likelihood. When fitting the copula stemming from model (4), the parameters to be estimated are (5), the resulting copula C and its density c are where FX (x) = F X (e x ) and fX (x) = f X (e x )e x with f X (x) = dF X (x)/dx, easily obtained in closed form through (9). The functions FX and fX are the marginal distribution and density, respectively, stemming from theX (s) process observed at the sites s 1 , . . . , s d , while represent the joint distribution function and density, respectively, of this process. Here, r δ = min(x 1 , . . . , x d )/δ, and FW , fW denote the joint distribution and density, respectively, for theW (s) process. The partial derivatives of the copula C(u 1 , . . . , u d ; ψ) with respect to any set of variables J ⊂ {1, . . . , d} of cardinality d J may be expressed as where When the process W (s) is based on a Gaussian copula, partial derivatives in (17) involve the multivariate Gaussian distribution in dimension d − d J . Although the unidimensional integrals appearing in (15)-(17) cannot be expressed in closed form, they can nevertheless be accurately approximated using standard finite integration or (quasi) Monte Carlo methods. To estimate the parameters ψ = (δ, ψ T W ) T ∈ , while avoiding influence of nonextreme data below high marginal thresholds u 1 , . . . , u d , we maximize the censored log-likelihood function defined as with contributions defined through the sets of indices J i = { j : The set J i determines whether the ith observation vector (U i1 , . . . ,U id ) T has threshold exceedances in no, all, or some but not all components, respectively; therefore, these sets may be different for each likelihood contribution i = 1, . . . , n. The estimator maximizing (18) over is denoted byψ. The performance of this inference approach is assessed in our simulation study Section 3.2 and it is used in the application in Section 4.1.
Another possible censoring scheme is to use either the fully censored contribution C(u 1 , . . . , u d ; ψ) in (18) if J i = ∅ (i.e., the variable U i j is lower than the threshold u j for all j = 1, . . . , d), or the completely uncensored contribution c(U i1 , . . . ,U id ; ψ) otherwise. This was used by Wadsworth and Tawn (2012), Opitz (2016), andWadsworth et al. (2017), and is adopted in the example of Section 4.2, where we compare fits of bivariate models.
Overall, the estimation procedure works as expected, with boxplots for δ approximately centered around the true value, though a small bias appears for δ = 0.8, 0.9, which is due to numerical instabilities and difficulties in identifying all three parameters in such strong dependence scenarios, despite the higher numerical precision, recall Figure 2. As is typical for a bounded parameter, the asymptotic normality ofδ looks to hold well when δ is not too close to 0 and 1, but the distribution displays some asymmetry near to the endpoints 0 and 1. Estimation seems to be easier when δ ≈ 1/2, which leads to small bias and variability. As δ → 0, the copula structure of X (s) converges to that of the latent process W (s), here a Gaussian copula, and therefore low values such as δ = 0.1, 0.2 yield very similar dependence structures, leading to higher variability. Boxplots of λ andν (see supplementary material) suggest that results are better in the asymptotic independence case when δ ≤ 0.5. For larger values of δ, the range λ is more variable and the smoothness parameter ν is slightly more biased, owing to the very strong dependence. However, in practice, one could restrict the parameter δ to 0 ≤ δ ≤ 0.8, say, as δ > 0.8 is very unlikely to occur in  applications. For all parameters δ, λ, and ν, but particularly for ν, the fit improves significantly when more locations are available.

... Testing the Dependence Class
A major advantage of model (4) over currently available models for spatial extremes is that we do not need to explicitly determine whether the data exhibit asymptotic dependence or asymptotic independence in order to select an appropriate class of models. However, since so much effort has previously been placed on determining the appropriate dependence class, we present the details and simulation experiments of a model-based test for this here. Coles, Heffernan, and Tawn (1999) suggested using nonparametric estimators of the measure χ u (defined slightly differently to (6)) and its counterpartχ u , but when the threshold u increases to unity, the associated uncertainty inflates dramatically. This renders any test based on these nonparametric estimators almost useless in practice. To increase the power for discriminating between asymptotic dependence and independence, a parametric model-based approach seems sensible and our copula model (4) provides a very natural way to proceed, because the transition between the two asymptotic paradigms takes place in the interior of the parameter space. We stress, however, that the validity of such a test is reliant on modeling assumptions, and as such is best used in conjunction with other diagnostics. Standard likelihood theory can be invoked to design tests for the null hypotheses H AD 0 : δ > 1/2 (asympt. dependence) versus
Letψ = (δ,λ,ν) T be the maximum likelihood estimator (MLE). We suggest using asymptotic normality ofψ to test for H AD 0 or H AI 0 , an assumption that should hold true if n is large and δ is not too close to its boundaries 0 and 1. In particular, denoting the estimated variance ofδ byv δ , the power of these tests at the level 100 × (1 − α)% can be computed as respectively, where z 1−α is the (1 − α)-quantile of the standard normal distribution.
To compute the power curves (19) and (20), we drew 300 simulations of model (4) at d = 2, 5, 10 locations in [0, 1] 2 with n = 1000, 2000 independent replicates, under the same setting as Section 3.2.1. Range λ = 0.5 and smoothness ν = 1 were fixed, and we considered a sequence δ ∈ [0.3, 0.8] in steps of 0.02, estimating all parameters using the MLE based on (18) with marginal thresholds u 1 = · · · = u d = 0.95. The Hessian matrix at the MLE was used in order to computev δ as the (1, 1)entry of the reciprocal Fisher information. Figure 4 displays the proportion of null hypotheses rejected (i.e., the power curves (19) and (20) when the corresponding null hypotheses are false), estimated using the 300 simulations and plotted as a function of δ ∈ [0.3, 0.8]. As expected, for all dimensions, the power to reject asymptotic dependence (respectively, asymptotic independence) improves as δ → 0 (respectively, δ → 1), and with higher dimensions, although there is little difference between d = 5 and d = 10. Comparing left and right panels, increased sample size also improves power, with a steeper transition around δ = 1/2. The departure from nominal levels for the Type I error, however, suggests that the Hessian may not give a good representation of the asymptotic variance, possibly owing to numerical approximations. In Section 4.1, we suggest using bootstrap methods to calculate uncertainty.

Hindcast Significant Wave Height Data
Wadsworth and Tawn (2012) considered modeling the extremes of the winter observations of a hindcast dataset of significant wave height, a measure of ocean energy, from the North Sea. Calculating the coefficient of tail dependence η(h) for the wave height process, they suggested that there was evidence for asymptotic independence of the process, although strong spatial dependence between sites. Figure 1 suggests a high degree of ambiguity in what the appropriate extremal dependence structure should be, since the summary χ u is decreasing as u increases, but not necessarily to a value of zero. This ambiguous situation is replicated throughout numerous applications, and demonstrates the necessity for a model such as model (4) that can handle both scenarios.
Measurements of the hindcast are recorded at three-hourly intervals, yielding eight observations per day, over a period of 31 years. In total, the dataset of winter (December, January, and February) wave heights consists of 22,376 observations at 50 locations. Margins are transformed to uniform using the semiparametric transformation of Coles and Tawn (1991). The data are strongly temporally dependent and so we subsample to extract one realization per day, giving 2797 observations. The resulting data still exhibit temporal dependence, but this thinning eases the computational burden of model fitting, while the information loss should be small. Finally, we select a subset of 20 sites to fit the model to, while using all data for validation of the fit. Distance is measured in units of latitude (one unit ≈ 111 km); the range of distances between sites is 0.27-2.99 units.
Model (4) was fitted by maximum likelihood based on (18) with thresholds u 1 = · · · = u d = 0.95, assuming a Gaussian copula for the W process (Example 1). Table 1 reports the results. The uncertainty measures are based on 200 bootstrap samples, created using the stationary bootstrap (Politis and Romano 1994). This procedure relies on sampling blocks of geometric length; we sampled using an average length of 14 days, although any blocks that reached the end of February (i.e., the end of one winter) were curtailed, so that observations within a block are always consecutive. Figure 2 in the supplementary material shows that this bootstrap procedure captures the temporal dependence in the extremes adequately.
The MLE of δ indicates asymptotic independence, although the 95% bootstrap confidence interval includes values above 0.5, meaning that firm conclusions about the asymptotic dependence class are difficult to draw; this further highlights the need for models that can incorporate both scenarios. While asymptotic independence is indicated, the value of δ suggests that our model is more suited than a simple Gaussian model. To reinforce this, we also fit a Gaussian model, using the same censored likelihood scheme, with results reported on the right side of Table 1. Although the Gaussian model is nested within the model we fit, testing is nonstandard as it occurs at the boundary of the parameter space, that is, for δ = 0. The maximized log-likelihood for our model was 62 units higher than for the Gaussian model, representing a clear improvement, although interpretation is difficult as there is no explicit accounting for temporal dependence in the likelihood.
To assess the fit of the model, we consider two diagnostics. Figure 5 displays the fitted value of χ u , as defined in (2), for the subset of sites included in the model fit (left panel) and the subset of sites excluded from the fit (right panel). Although the model was fitted using censored likelihood above a 95%quantile threshold, the fit looks good on the plotted range u ∈ (0.9, 1). The Gaussian model clearly underestimates the dependence.
The second diagnostic we consider is the distribution of the number of threshold exceedances, conditioning upon having at least one exceedance. The supplementary material contains histograms of the distribution from our data sample and from the fitted model, and suggests that the fitted model appears to capture this distribution quite well.

Newlyn Oceanographic Data
We fit a bivariate version of model (4), as discussed in Example 3, to the Newlyn oceanographic data analyzed in Wadsworth et al. (2017) to illustrate an asymmetric construction, and to compare with the symmetric models fitted therein. The data, shown in Figure 6, comprise 2894 observations of wave height, surge, and period, and we analyze them pairwise, transforming to uniformity again using the semiparametric transformation of Coles and Tawn (1991). To generate an asymmetric model, we assume that the copula of (W 1 , W 2 ) T is that of an inverted Dirichlet max-stable distribution (recall Example 2). The bivariate Figure . Estimates of χ u for the hindcast wave height data. Central black dots: empirical estimate of χ u from the temporally thinned data; dashed lines: approximate % confidence intervals based on the stationary bootstrap procedure described in the text; dot-dash red line: empirical estimate of χ u from all data; thick solid blue line: fit from our model; thin solid green line: fit from the Gaussian model. Left: data to which the model was fitted (from  sites); right: data to which the model was not fitted (from  sites). Dirichlet max-stable distribution (Coles and Tawn 1991) has exponent function where Be(·, a, b) is the Beta distribution function with shape parameters a and b. The bivariate inverted max-stable distribution with standard Pareto margins has joint survivor function To ensure consistency with the approach of Wadsworth et al.
(2017), we use the censored likelihood described therein and at the end of Section 3.1 for both models. That is, we use the full density contribution when either variable is above a censoring threshold, which is set to the 95%-quantile in each margin. Table 2 gives the Akaike information criterion (AIC) for the model of Wadsworth et al. (2017) and our asymmetric model; improvements are seen for pairs involving wave period, which shows a more asymmetric dependence structure than height and surge. One limitation of this choice for (W 1 , W 2 ) T is that it cannot exhibit negative dependence, and as such, the model is less flexible when it comes to accounting for dependence structures with weak asymptotic dependence (i.e., with small but positive χ ). This does not appear to be an issue for these asymptotically independent pairs, but alternative choices for (W 1 , W 2 ) T such as the skew bivariate normal (Azzalini and Dalla Valle 1996) could be used to overcome this.

Discussion
Motivated by deficiencies in existing frameworks for modeling spatial extremes, we presented a parsimonious model that is able to capture the subasymptotic dependence behavior of spatial processes. Importantly, both extremal dependence classes are captured, with rich structures within each class, and a smooth transition between paradigms at the interior of the parameter space. Inference for model (4) is feasible in moderate dimensions, but computationally intensive when W has a Gaussian copula, owing to the need to integrate expressions involving a multivariate Gaussian distribution function. However, new quasi-Monte Carlo algorithms, such as those used by de Fondeville and Davison (2016), and the associated R package mvPot, have the potential to increase scalability; their code was used to speed up the bootstrap procedure in Section 4.1. With the exception of the specific model used in de Fondeville and Davison (2018), truly high-dimensional inference for spatial extreme-value models has yet to be achieved, and our model is competitive with others in this aspect. There are two notable limitations of the model (4). The first of these is that for δ > 1/3, η X (h) > 1/2 indicating a persistence of positive extremal association even as the lag h → ∞. This is, however, a common problem with many models for spatial extremes. Consequently, the model is more suitable for smaller spatial regions or data for which this is not an issue. The second limitation concerns the link between δ and the limiting value of χ (h) for δ > 1/2. Since W (s) ≥ 1, we have min(W j , W k ) ≥ 1 and consequently from (10), χ (h) ≥ (2δ − 1)/δ. As can be observed from Figure 2 and Equation (13), for values of δ near 1, the process (4) behaves similarly to a generalized Pareto process. However, model (4) would be unable to capture a weakly dependent generalized Pareto process, that is, one for which χ u (h) is constant in u but its limit χ (h) is small and positive. In practice however, this is not likely to be restrictive, since in our experience almost all environmental datasets display a decreasing χ u function.
Code and data. Code for fitting the models described is available as supplementary material and at http://www.lancaster.ac. uk/∼wadswojl/SpatialADAI. The NEXTRA hindcast data analyzed in Section 4.1 are subject to restrictions. Access may be granted for academic purposes by members of the North European Storm Study User Group (NUG), requests can be made using the details at http://www.oceanweather.com/metocean/ next/index.html. The Newlyn wave data analyzed in Section 4.2 are available as supplementary material.

Supplementary Materials
The supplementary materials contain proofs of theoretical results, additional simulation results, as well as further diagnostics for the applications of Section 4.