Partial Tail-Correlation Coefficient Applied to Extremal-Network Learning

We propose a novel extremal dependence measure called the partial tail-correlation coefficient (PTCC), in analogy to the partial correlation coefficient in classical multivariate analysis. The construction of our new coefficient is based on the framework of multivariate regular variation and transformed-linear algebra operations. We show how this coefficient allows identifying pairs of variables that have partially uncorrelated tails given the other variables in a random vector. Unlike other recently introduced conditional independence frameworks for extremes, our approach requires minimal modeling assumptions and can thus be used in exploratory analyses to learn the structure of extremal graphical models. Similarly to traditional Gaussian graphical models where edges correspond to the non-zero entries of the precision matrix, we can exploit classical inference methods for high-dimensional data, such as the graphical LASSO with Laplacian spectral constraints, to efficiently learn the extremal network structure via the PTCC. We apply our new method to study extreme risk networks in two different datasets (extreme river discharges and historical global currency exchange data) and show that we can extract extremal structures with meaningful domain-specific interpretations.


Introduction
Characterizing the extremal dependence of complex stochastic processes (e.g., in spatial, temporal, and spatio-temporal settings) is fundamental for both statistical modeling and applications, such as risk assessment in environmental and financial contexts. Important applications include the modeling of precipitation extremes (Huser and Davison, 2014; Opitz et al., 2018; Bacro et al., 2020; Saunders et al., 2021; Richards et al., 2022), heatwaves (Winter and Tawn, 2016; Zhong et al., 2022), and air pollution (Vettori et al., 2019, 2020), as well as financial risk assessment (Bassamboo et al., 2008; Ferro and Stephenson, 2011; Marcon et al., 2016; Bekiros and Uddin, 2017; Yan et al., 2019; Gong and Huser, 2021). Often, applications illustrate the benefits of methodological innovations, such as the application of the extremal dependence measure in Larsson and Resnick (2012), which is a key ingredient of the present work, in the analysis of financial data.
Models for extremal dependence traditionally rely on asymptotic frameworks, such as max-stable processes for block maxima or r-Pareto processes for threshold exceedances of a summary functional of the process over a high threshold. Recently, more advanced models have been proposed to further improve flexibility, especially towards modeling of asymptotically independent data with dependence vanishing at the most extreme levels, such as inverted max-stable processes (Wadsworth and Tawn, 2012), max-mixture models (Ahmed et al., 2020), random scale mixtures of Gaussian processes (Opitz, 2016; Huser et al., 2017) or of more general processes (Wadsworth et al., 2017; Engelke et al., 2019; Huser and Wadsworth, 2019), max-infinitely divisible processes (Bopp et al., 2021), and conditional spatial extremes models (Wadsworth and Tawn, 2022); for a comprehensive review, see Huser and Wadsworth (2022). Specifics of serial extremal dependence have been studied by Davis et al. (2013), among others.
In the study of stochastic dependence structures, networks and graphs are natural tools to represent dependence relationships in multivariate data. Conditional independence, sparsity, and parsimonious representations are key concepts in graph-based approaches for random vectors. Recently, graph-based tools have also been developed for extremal dependence, where variants of conditional independence apply to variables not directly connected by the edges of the graph. For example, Huang et al. (2019) provide an exploratory tool, called the χ-network, for modeling extremal dependence, and they use it to analyze maximum precipitation during the hurricane season in the United States (US) Gulf Coast and in surrounding areas. In their approach, however, the χ-network does not remove the effect of confounding variables, so it does not naturally lead to sparse extremal dependence representations. More recently, Engelke and Hitz (2020) introduce a notion of conditional independence adapted to multivariate Pareto distributions arising for limiting multivariate threshold exceedances, and they use it to develop parametric graphical models for extremes based on the Hüsler-Reiss distribution. Similarly, Gissibl (2018) and Klüppelberg and Krali (2021) propose max-linear constructions for modeling maxima on tree-like supports, and Tran et al. (2021) propose QTree, a simple and efficient algorithm to solve the "Latent River Problem" for the important case of extremes on trees. In the same vein, Engelke and Volgushev (2020) develop a data-driven methodology for learning the graphical structure in the setting of Engelke and Hitz (2020), whereas Röttger et al. (2021) further propose Hüsler-Reiss graphical models under the assumption of multivariate total positivity of order two (MTP$_2$), which allows estimating sparse graphical structures. Finally, Engelke and Ivanovs (2021) review the recent developments in sparse representations, dimension reduction approaches, and graphical models for extremes. Overall, existing graphical representations for extremes from the literature often rely on rather stringent asymptotically justified models, sometimes leading to issues when dealing with relatively high-dimensional problems or when specific graph structure assumptions (e.g., trees) are required. A recent exception is Engelke et al. (2022), who develop theories and methods for learning extremal graphical structures in high dimensions based on $L_1$-regularized optimization, though their methodology still assumes a parametric extremal dependence structure of Hüsler-Reiss type.
By contrast, rather than restricting ourselves to a strict parametric modeling framework, we adopt a more pragmatic and empirical approach. Specifically, our goal is to extend and enrich existing approaches by defining the new concept of partial tail correlation as an extreme-value analog of the notion of partial correlation widely used in classical multivariate analysis, and by introducing a new coefficient that enables estimation of general extremal networks under minimal modeling assumptions. In the same way that correlation does not imply independence in general, our concept of partial tail-uncorrelatedness is a weaker assumption than conditional tail independence. However, we shall show that it still provides relevant insights into various forms of extremal dependence structures and helps in guiding modeling choices at a data exploratory stage.
As a novel extremal dependence measure, we propose the partial tail correlation coefficient (PTCC) as an equivalent of the partial correlation coefficient in the non-extreme setting. In the classical setting, the Pearson correlation coefficient between two random variables can give misleading interpretations when there are confounding variables that influence both variables, whereas the partial correlation coefficient measures the residual degree of association between two variables after the linear effects of a set of other variables have been removed. To compute the partial correlation between two variables of interest, we regress each of these variables onto the set of covariates given by all the other variables in the multivariate random vector, and then compute the correlation between the residuals from the two fitted linear regressions. In the Gaussian setting, a partial correlation of zero is equivalent to conditional independence between two variables (Lawrance, 1976; Baba et al., 2004), and the elements of the inverse of the covariance matrix (i.e., the precision matrix) of the full vector are known to characterize this conditional (in)dependence structure.
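The classical recipe recalled above (regress, then correlate the residuals) and its link to the precision matrix can be checked numerically. The following is a minimal sketch in the non-extreme setting only; the three-variable Gaussian chain, its coefficients, the sample size, and the function names are illustrative assumptions and are not taken from this paper.

```python
import numpy as np

def partial_corr_by_regression(data, i, k):
    """Correlate the residuals of columns i and k after regressing each on all other columns."""
    rest = [j for j in range(data.shape[1]) if j not in (i, k)]
    centered = data - data.mean(axis=0)  # centering plays the role of the intercept
    design = centered[:, rest]
    residuals = []
    for col in (i, k):
        coef, *_ = np.linalg.lstsq(design, centered[:, col], rcond=None)
        residuals.append(centered[:, col] - design @ coef)
    return float(np.corrcoef(residuals[0], residuals[1])[0, 1])

def partial_corr_by_precision(data, i, k):
    """Same quantity read off the inverse of the sample covariance matrix."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    return float(-precision[i, k] / np.sqrt(precision[i, i] * precision[k, k]))

rng = np.random.default_rng(0)
# Hypothetical chain x0 -> x1 -> x2: x0 and x2 are linked only through x1.
x0 = rng.normal(size=5000)
x1 = 0.8 * x0 + rng.normal(size=5000)
x2 = 0.8 * x1 + rng.normal(size=5000)
data = np.column_stack([x0, x1, x2])

pc_reg = partial_corr_by_regression(data, 0, 2)
pc_prec = partial_corr_by_precision(data, 0, 2)
marginal_corr = float(np.corrcoef(x0, x2)[0, 1])
print(f"marginal: {marginal_corr:.3f}, partial: {pc_reg:.3f} (regression), {pc_prec:.3f} (precision)")
```

In this toy chain the marginal correlation between the two end variables is clearly positive, while both versions of the partial correlation are close to zero and agree with each other up to rounding.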
In this paper, we adopt a similar strategy to define the PTCC, namely by computing a suitable measure of tail dependence between residuals obtained by regressing variables using transformed-linear operations that do not alter tail properties. While classical linear regression only makes sense for Gaussian-like data, such transformed-linear operations can be used for tail regression of multivariate regularly-varying random vectors, which is a fundamental assumption characterizing asymptotically dependent extremes.
To be more precise, we here define the PTCC by building upon the framework of Cooley and Thibaud (2019), who developed a customized transformed-linear algebra on the positive orthant, preserving multivariate regular variation and thus being well adapted to "linear" methods for joint extremes. Cooley and Thibaud (2019) used this framework for principal component analysis of extremes based on decompositions of the so-called tail pairwise dependence matrix (TPDM), which conveniently summarizes information about extremal dependence in random vectors and possesses appealing properties for such decompositions. The TPDM can be thought of as an analog of the classical covariance matrix but tailored for multivariate extremes. In some follow-up work, Mhatre and Cooley (2020) then developed non-negative regularly varying time series models with autoregressive moving average (ARMA) structure using the transformed-linear operations for time series extremes. For spatial extremes, Fix et al. (2021) extended the simultaneous autoregressive (SAR) model under the transformed-linear framework and developed an estimation method to minimize the discrepancy between the TPDM of the fitted model and an empirically estimated TPDM. Furthermore, Lee and Cooley (2021) recently introduced transformed-linear prediction methods for extremes. In the aforementioned papers, the TPDM always plays a central role. Similar to covariances, the entries of the TPDM are tail dependence measures giving insights into the direct extremal dependence structure without removing the influence of other confounding variables. However, just as the covariance matrix does not reflect partial correlations, the TPDM does not directly inform us about partial associations among extremes. In this work, we fill this gap with our new proposed PTCC. Thanks to its definition in terms of transformed-linear operations, we show that the PTCC inherits several appealing features of the classical partial correlation coefficient. In particular, the PTCC between two components $X_i$ and $X_k$ from a random vector $X$ is such that the $(i, k)$th entry of the inverse TPDM of $X$ equals zero if and only if the corresponding PTCC for these two variables is also equal to zero. In other words, partial tail-uncorrelatedness can be conveniently read off from the zero elements of the inverse TPDM, similar to classical Gaussian graphical models. We then exploit this property to define a new class of extremal graphical models based on the PTCC and then use efficient inference methods to learn the extremal network structure from high-dimensional data based on state-of-the-art techniques from graph theory (e.g., the graphical LASSO with or without Laplacian spectral constraints). We note that here, our focus is on studying undirected graph structures, which is different from causal inference, where causal relationships can be encoded using directed graph edges.
The remainder of this article is organized as follows. In Section 2, we first review the necessary background on multivariate regular variation and transformed-linear algebra, as introduced in Cooley and Thibaud (2019). Then, we define the new PTCC and the related notion of partial tail-uncorrelatedness. In Section 3, we present methods for learning general extremal network structures from the PTCC in a high-dimensional data setting, and we discuss two particularly appealing approaches, namely the graphical LASSO and Laplacian spectral constraint-based methods. Section 4 presents a simulation study for general structured undirected graphs using the above two inference methods. In Section 5, we apply these new tools to explore the risk networks formed by river discharges observed at a collection of monitoring stations in the upper Danube basin, and by historical global currency exchange rate data from different historical periods, covering different economic cycles, the COVID-19 pandemic, and the 2022 military conflict in Ukraine.

Transformed-linear algebra for multivariate extremes
Before introducing the partial tail correlation coefficient (PTCC) and the related notion of partial tail-uncorrelatedness, we first briefly review the multivariate regular variation framework, which is our main assumption for defining the PTCC, and we also summarize the foundations of transformed-linear algebra.

Regular variation framework and transformed linear algebra
A random vector is multivariate regularly varying (Resnick, 2007) (i.e., jointly heavy-tailed) if its joint tail decays like a power function. Precisely, we say that a $p$-dimensional random vector $X \geq 0$ is regularly varying if there exists a sequence $b_n \to \infty$ such that

$$n \Pr(b_n^{-1} X \in \cdot) \xrightarrow{v} \nu_X(\cdot), \quad n \to \infty, \tag{1}$$

where $\xrightarrow{v}$ denotes vague convergence to the non-null limit measure $\nu_X$, a Radon measure defined on the space $[0, \infty]^p \setminus \{0\}$. This measure has the scaling property $r^{\alpha} \nu_X(rB) = \nu_X(B)$ for $r > 0$ and Borel sets $B \subset [0, \infty]^p \setminus \{0\}$, where $\alpha > 0$ controls the tail decay (with $1/\alpha$ commonly called the tail index). For this reason, the measure can be further decomposed into a radial measure and an angular measure $H_X$ on the unit sphere $S^+_{p-1} = \{w \in \mathbb{R}^p_+ : \|w\|_2 = 1\}$, such that $\nu_X(\{x : \|x\|_2 > r, \, x/\|x\|_2 \in B_H\}) = r^{-\alpha} H_X(B_H)$ for $r > 0$ and Borel sets $B_H \subset S^+_{p-1}$. The normalizing sequence is of the form $b_n = L(n) n^{1/\alpha}$, where $L$ is a slowly varying function, i.e., $L(n) > 0$ and $L(rn)/L(n) \to 1$ for any $r > 0$, as $n \to \infty$. We use the short-hand notation $X \in RV^p_+(\alpha)$ for a regularly varying vector $X$ with tail index $1/\alpha$.
Cooley and Thibaud (2019) introduced the transformed-linear algebra framework to construct an inner product space on an open set (the so-called target space) via a suitable transformation, where the distribution of the random vector $X$ has support within this set. Our use will mainly concern transformation towards the target space $\mathbb{R}^p_+$ from the space $\mathbb{R}^p$, but we first present the general approach. Let $t$ be a bijective transformation from $\mathbb{R}$ onto some open set $\mathbb{X} \subset \mathbb{R}$, and let $t^{-1}$ be its inverse. For a $p$-dimensional vector $y \in \mathbb{R}^p$, we define $x = t(y) \in \mathbb{X}^p$ componentwise. Then, arithmetic operations among elements of the target space are carried out in the space $\mathbb{R}^p$ before transforming back to the target space. We define vector addition in $\mathbb{X}^p$ as $x_1 \oplus x_2 = t\{t^{-1}(x_1) + t^{-1}(x_2)\}$, and scalar multiplication with a factor $a \in \mathbb{R}$ as $a \circ x = t\{a t^{-1}(x)\}$. The additive identity in $\mathbb{X}^p$ is set to $0_{\mathbb{X}^p} = t(0)$, and the additive inverse of $x \in \mathbb{X}^p$ is given as $\ominus x = t\{-t^{-1}(x)\}$. A valid inner product between two elements of the target space is then obtained by applying the usual scalar product in $\mathbb{R}^p$, i.e., we set

$$\langle x_1, x_2 \rangle = \sum_{j=1}^p t^{-1}(x_{1,j}) \, t^{-1}(x_{2,j}).$$

To obtain an inner product space on the positive orthant for which arithmetic operations preserve multivariate regular variation, thus having a negligible effect on large values, we follow Cooley and Thibaud (2019) and define the specific transformation $t : \mathbb{R} \to (0, \infty)$ given by $t(y) = \log\{1 + \exp(y)\}$, though there are other possibilities. We have $y/t(y) \to 1$ as $y \to \infty$, such that the upper tail behavior of a random vector $Y = t(X)$ is preserved through $t$. For lower tails, we have $\exp(y)/t(y) \to 1$ as $y \to -\infty$. The inverse transformation is $t^{-1}(x) = \log\{\exp(x) - 1\}$, $x > 0$.
Algebraic operations done in the vector space induced by the above transformation $t$ are commonly called transformed-linear operations, and we can exploit this framework to extend classical linear algebra methods (e.g., principal component analysis) to the multivariate extremes setting, where heavy-tailed vectors and models are often conveniently expressed on the positive orthant, $\mathbb{R}^p_+$. We note that our main assumption, multivariate regular variation, implies asymptotic tail dependence (as well as homogeneity of the limit measure in (1)), but it does not impose further parametric structural assumptions such as with Pareto models of Hüsler-Reiss type.
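The transformed-linear operations can be illustrated with a few lines of code. This is a minimal sketch, assuming the softplus transformation t(y) = log{1 + exp(y)} described above; the function names `t`, `t_inv`, `tl_add`, and `tl_scale` and the input values are hypothetical, and the numerically stable rewriting of the inverse is an implementation choice rather than part of the framework.

```python
import numpy as np

def t(y):
    """Softplus map t(y) = log(1 + exp(y)) from R to (0, inf), computed stably."""
    return np.logaddexp(0.0, y)

def t_inv(x):
    """Inverse map log(exp(x) - 1) for x > 0, rewritten as x + log(1 - exp(-x)) to avoid overflow."""
    x = np.asarray(x, dtype=float)
    return x + np.log(-np.expm1(-x))

def tl_add(x1, x2):
    """Transformed-linear addition: add on the real line, then map back."""
    return t(t_inv(x1) + t_inv(x2))

def tl_scale(a, x):
    """Transformed-linear scalar multiplication by a real factor a."""
    return t(a * t_inv(x))

x1 = np.array([500.0, 0.5])
x2 = np.array([300.0, 2.0])
total = tl_add(x1, x2)            # first entry is essentially 500 + 300
zero = t(0.0)                     # additive identity, equal to log(2)
cancelled = tl_add(x1, tl_scale(-1.0, x1))  # x plus its additive inverse

print(total, zero, cancelled)
```

For large arguments the operations behave like ordinary addition and scaling, which is the sense in which they have a negligible effect on large values, whereas small values are handled so that results stay strictly positive.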

Inner product space of regularly varying random variables
With transformed-linear operations, we can use a vector of independent and identically distributed (i.i.d.) random variables to construct new regularly varying random vectors on the positive orthant that possess tail dependence. Suppose that $Z = (Z_1, \ldots, Z_q)^T \geq 0$ is a vector of $q \in \mathbb{N}$ i.i.d. regularly varying random variables with tail index $1/\alpha$, such that there exists a sequence $\{b_n\}$ that yields

$$n \Pr(Z_j > b_n z) \to z^{-\alpha}, \qquad n \Pr\{Z_j \leq \exp(-k b_n)\} \to 0, \quad k > 0, \quad j = 1, \ldots, q,$$

where the first condition is equivalent to regular variation (1) in dimension $p = 1$. The random vector $Z$ with independent components has a limit measure of multivariate regular variation characterized by $\nu_Z([0, z]^C) = \sum_{j=1}^q z_j^{-\alpha}$ for $z = (z_1, \ldots, z_q)^T > 0$. Then, we can construct new regularly varying $p$-dimensional random vectors $X = (X_1, \ldots, X_p)^T$ by exploiting transformed-linear operations, via a matrix product with a deterministic matrix $A = (a_1, \ldots, a_q) \in \mathbb{R}^{p \times q}_+$, with columns $a_j \in \mathbb{R}^p_+$, as follows:

$$X = A \circ Z = \bigoplus_{j=1}^q a_j \circ Z_j. \tag{2}$$

We write $X = A \circ Z \in RV^p_+(\alpha)$. This construction ensures that the multivariate regular variation property is preserved with the same index $\alpha$ (Corollary 1, Cooley and Thibaud, 2019). Furthermore, we require $A$ to have full row rank. Based on the construction (2), it is possible to define a (different) inner product space spanned by the random variables obtained by transformed-linear operations on $Z$, where some but not all of the components of $a_j$ are further allowed to be non-positive. Following Lee and Cooley (2021), an inner product of $X_i$, $X_k$ on the space spanned by all possible transformed-linear combinations of the elements of the random vector $X$ constructed as in (2) may be defined as follows:

$$\langle X_i, X_k \rangle = \sum_{j=1}^q a_{ij} a_{kj},$$

where $a_{ij}$ refers to the entry in row $i$ and column $j$ of the matrix $A$ for $i \in \{1, \ldots, p\}$ and $j \in \{1, \ldots, q\}$, and the corresponding norm becomes $\|X\| = \sqrt{\langle X, X \rangle}$. The metric induced by the inner product is $d(X_i, X_k) = \|X_i \ominus X_k\| = \{\sum_{j=1}^q (a_{ij} - a_{kj})^2\}^{1/2}$.

Generality of the framework
In practice, given a random vector $X$ for which we assume $X \in RV^p_+(\alpha)$, we will further assume that it allows for a stochastic representation as in (2). Since the constructions of type (2) form a dense subclass of the class of multivariate regularly varying vectors (if $q$ is not fixed but allowed to tend to infinity, i.e., $q \to \infty$), this assumption is not restrictive; see Fougères et al. (2013) and Cooley and Thibaud (2019).
Thanks to their flexibility, representations akin to the transformed-linear random vectors in (2) have recently found widespread interest in statistical learning for extremes. The fundamental model structure used in the causal discovery framework for extremes developed by Gnecco et al. (2021) is essentially based on a variant of (2). In the setting of max-linear models, in particular the graphical models of Gissibl (2018), we can use (2) to construct random vectors $X$ possessing the same limit measure $\nu_X$ as the max-linear vectors.
Finally, low-dimensional representations of extremal dependence in random vectors obtained through variants of the k-means algorithm can be shown to be equivalent to the extremal dependence induced by construction (2); see Janßen and Wan (2020).

Tail pairwise dependence matrix
The tail pairwise dependence matrix (TPDM, Cooley and Thibaud, 2019) is defined to summarize the pairwise extremal dependence of a regularly varying random vector using the second-order properties of its angular measure. Let $\alpha = 2$, which ensures desirable properties; in practice, this condition can be ensured through appropriate marginal pre-transformation of data (Cooley and Thibaud, 2019). Then, the TPDM $\Sigma$ of $X \in RV^p_+(2)$ is defined as follows:

$$\Sigma = (\sigma_{ik})_{i,k = 1, \ldots, p}, \qquad \sigma_{ik} = \int_{S^+_{p-1}} w_i w_k \, \mathrm{d}H_X(w),$$

where $H_X$ is the angular measure on $S^+_{p-1} = \{w \in \mathbb{R}^p_+ : \|w\|_2 = 1\}$ as introduced in Section 2.1. The matrix $\Sigma$ is an extreme-value analog of the covariance matrix, and it has similar useful properties. It is positive semi-definite and completely positive, i.e., there exists a finite $p \times q$ matrix $A$ with nonnegative entries such that the TPDM can be factorized as $\Sigma = AA^T$ (Cooley and Thibaud, 2019, Proposition 5). The matrix $A$ is not unique, in particular if we do not impose nonnegative entries. Specifically, for random vectors $X$ obtained by the transformed-linear construction (2), the entries of $\Sigma$ correspond to the values of the inner product $\sigma_{ik} = \langle X_i, X_k \rangle$. In the following, we further assume that $\Sigma$ is positive definite, which guarantees the existence of the inverse matrix of the TPDM.
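We emphasize that the special case where $\sigma_{ik} = 0$ is equivalent to asymptotic tail independence of the components $X_i$ and $X_k$ (see Cooley and Thibaud, 2019), meaning that the conditional exceedance probability $\Pr\{F_i(X_i) > u \mid F_k(X_k) > u\}$, with $F_i$ and $F_k$ the marginal distribution functions of $X_i$ and $X_k$, tends to zero as $u \to 1$ (Sibuya, 1960; Ledford and Tawn, 1996).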
By exploiting the property that the TPDM is completely positive, we can construct new transformed-linear random vectors that have the same TPDM as a given random vector $X$. Since we can always factorize the TPDM as $\Sigma = AA^T$ for some matrix $A$ of dimension $p \times q$, we can then use the construction (2) by multiplying $A$ with a (new) random vector $Z \in RV^q_+(2)$ of independent regularly varying random variables, and the resulting vector will still have TPDM $\Sigma$. It is also worth noting that as $q \to \infty$, the angular measure of the new random vector can be arbitrarily close to that of $X$ thanks to the denseness property of discrete angular measures. While $A$ is not unique, the inner product depends only on the entries of the matrix $\Sigma = AA^T$, such that the specific choice of $A$ does not matter.
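An estimator of the TPDM was proposed by Cooley and Thibaud (2019, Section 7.1). For an i.i.d. sequence of vectors $x_t$, $t = 1, \ldots, n_{\mathrm{samp}}$, i.e., independent realizations from a random vector $X \in RV^p_+(2)$, define

$$\hat{\Sigma} = (\hat{\sigma}_{ik})_{i,k = 1, \ldots, p}, \qquad \hat{\sigma}_{ik} = \hat{m} \int_{S^+_{p-1}} w_i w_k \, \mathrm{d}\hat{N}_X(w) = \hat{m} \, n_{\mathrm{exc}}^{-1} \sum_{t=1}^{n_{\mathrm{samp}}} w_{t,i} w_{t,k} \, \mathbb{1}(r_t > r_0), \tag{3}$$

where $r_t = \|x_t\|_2$, $w_t = x_t / r_t$, $r_0$ is a high threshold for the radial component, $n_{\mathrm{exc}} = \sum_{t=1}^{n_{\mathrm{samp}}} \mathbb{1}(r_t > r_0)$ refers to the number of threshold exceedances, and the probability measure $N_X(\cdot) = m^{-1} H_X(\cdot)$, with $m = H_X(S^+_{p-1})$, is obtained by normalizing $H_X$. Moreover, $\hat{m}$ denotes an estimate of $H_X(S^+_{p-1})$, and $\hat{N}_X$ is the empirical counterpart of $N_X$. We note that when the data are preprocessed to have a common unit scale, we can set $\hat{m} = p$ and there is no need to estimate the normalizing factor. The estimator (3) was discussed by Larsson and Resnick (2012) in the bivariate case.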
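The transformed-linear construction and the threshold-based TPDM estimator can be tried out together on simulated data. The sketch below is illustrative only: the matrix `A` (with unit-norm rows, so that the total angular mass equals p), the unit Fréchet noise with shape 2, the sample size, the 99.8% radial threshold, and the seed are all assumptions chosen for this example, and the function names are hypothetical.

```python
import numpy as np

def t(y):
    """Softplus transformation from R to (0, inf)."""
    return np.logaddexp(0.0, y)

def t_inv(x):
    """Stable inverse of the softplus transformation for x > 0."""
    return x + np.log(-np.expm1(-x))

def simulate_transformed_linear(A, n, rng):
    """Draw n replicates of the transformed-linear product of A with i.i.d. Frechet(2) noise."""
    q = A.shape[1]
    Z = (-np.log(rng.uniform(size=(n, q)))) ** (-0.5)  # Pr(Z <= z) = exp(-z^-2)
    return t(t_inv(Z) @ A.T)

def estimate_tpdm(X, quantile):
    """Empirical TPDM from observations with the largest radii, with total mass fixed to p."""
    r = np.linalg.norm(X, axis=1)
    keep = r > np.quantile(r, quantile)     # radial threshold exceedances
    W = X[keep] / r[keep, None]             # angular components on the unit sphere
    return X.shape[1] * (W.T @ W) / keep.sum()

# Hypothetical coefficient matrix with unit-norm rows.
A = np.array([[1.0, 0.0, 0.0],
              [0.6, 0.8, 0.0],
              [0.0, 0.6, 0.8]])
rng = np.random.default_rng(1)
X = simulate_transformed_linear(A, n=500_000, rng=rng)
sigma_hat = estimate_tpdm(X, quantile=0.998)
sigma_true = A @ A.T

print(np.round(sigma_hat, 2))
print(np.round(sigma_true, 2))
```

At finite thresholds the estimate is somewhat biased, most visibly for entries whose true value is zero, because non-extreme components are small but not exactly zero relative to the radius; raising the threshold reduces this bias at the cost of fewer exceedances.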
In this section, we introduce our new measure, the partial tail correlation coefficient (PTCC), which is analogous to the partial correlation coefficient but tailored to heavy-tailed random vectors.
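Let $X = A \circ Z \in RV^p_+(\alpha)$ be a $p$-dimensional vector constructed as in (2), with TPDM $\Sigma$. We write $X_{ik} = (X_i, X_k)^T$; $X_{\mathrm{rest}}$ for the $(p-2)$-dimensional random vector obtained by removing the two components $X_i$ and $X_k$ from $X$; $A_{ik}$ for the matrix comprising the $i$-th and $k$-th rows of $A$; and $A_{\mathrm{rest}}$ for the matrix $A$ without its $i$-th and $k$-th rows. Moreover, we define the $p$-dimensional random vector $\tilde{X}$ by re-ordering the components of $X$ as $\tilde{X} = (X_{ik}^T, X_{\mathrm{rest}}^T)^T$. It is straightforward to show that the best transformed-linear predictor of $X_{ik}$ given $X_{\mathrm{rest}}$ can be obtained as $\hat{X}_{ik} = \hat{B} \circ X_{\mathrm{rest}}$, where the $2 \times (p-2)$ matrix $\hat{B}$ is such that $d(X_i, \hat{X}_i)$ and $d(X_k, \hat{X}_k)$ attain their minimum, respectively. Suppose that the TPDM of $\tilde{X}$ is of the block-matrix form

$$\Sigma_{\tilde{X}} = \begin{pmatrix} \Sigma_{ik,ik} & \Sigma_{ik,\mathrm{rest}} \\ \Sigma_{\mathrm{rest},ik} & \Sigma_{\mathrm{rest},\mathrm{rest}} \end{pmatrix},$$

where $\Sigma_{\mathrm{rest},\mathrm{rest}}$ is the TPDM restricted to $X_{\mathrm{rest}}$, and $\Sigma_{ik,\mathrm{rest}} = \Sigma_{\mathrm{rest},ik}^T$ is the cross-TPDM between $X_{ik}$ and $X_{\mathrm{rest}}$. Then, based on the projection theorem for vector spaces with inner products, we have that $\hat{B} = \Sigma_{ik,\mathrm{rest}} \Sigma_{\mathrm{rest},\mathrm{rest}}^{-1}$. Straightforward calculations show that the prediction error $e = X_{ik} \ominus \hat{X}_{ik}$ has the following TPDM:

$$\Sigma_e = \Sigma_{ik,ik} - \Sigma_{ik,\mathrm{rest}} \Sigma_{\mathrm{rest},\mathrm{rest}}^{-1} \Sigma_{\mathrm{rest},ik}. \tag{4}$$

Definition 1. The partial tail correlation coefficient (PTCC) of two random variables $X_i$ and $X_k$ is defined as the off-diagonal TPDM coefficient of the bivariate residual vector $e$ in (4), such that transformed-linear dependence with respect to all other random variables is removed.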
Definition 2. Let $X_{ik} = (X_i, X_k)^T$ and $X_{\mathrm{rest}}$ be defined as above. Given $X_{\mathrm{rest}}$, we say that $X_i$ and $X_k$ are partially tail-uncorrelated if the PTCC of $X_i$ and $X_k$ (given $X_{\mathrm{rest}}$) is equal to zero, i.e., if $\Sigma_{ki} - \Sigma_{k,\mathrm{rest}} \Sigma_{\mathrm{rest},\mathrm{rest}}^{-1} \Sigma_{\mathrm{rest},i} = 0$, as defined in (4).
Remark 1: Thanks to the properties of TPDMs, the residuals of two partially tail-uncorrelated random variables are necessarily asymptotically tail independent.
The following proposition links tail-uncorrelatedness to the entries of the inverse of the TPDM of X.
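Proposition 3. Given the representation of the TPDM of $\tilde{X}$ as a $3 \times 3$ block matrix as follows,

$$\Sigma_{\tilde{X}} = \begin{pmatrix} \Sigma_{ii} & \Sigma_{ik} & \Sigma_{i,\mathrm{rest}} \\ \Sigma_{ki} & \Sigma_{kk} & \Sigma_{k,\mathrm{rest}} \\ \Sigma_{\mathrm{rest},i} & \Sigma_{\mathrm{rest},k} & \Sigma_{\mathrm{rest},\mathrm{rest}} \end{pmatrix},$$

where the dimensions of submatrices are as above, then the following two statements are equivalent: (i) $\Sigma_{ki} - \Sigma_{k,\mathrm{rest}} \Sigma_{\mathrm{rest},\mathrm{rest}}^{-1} \Sigma_{\mathrm{rest},i} = 0$; (ii) $(\Sigma^{-1})_{ki} = 0$, where $\Sigma$ is the TPDM of the original vector $X$.
Since $\Sigma_{\tilde{X}}$ is a positive definite and therefore an invertible covariance matrix, this result is a direct consequence of the equivalence of statements (i) and (ii) of Proposition 1 of Speed and Kiiveri (1986). In the notation of Speed and Kiiveri (1986), we have the following index sets: $a = \{k, \mathrm{rest}\}$, $b = \{i, \mathrm{rest}\}$, and $ab = a \cap b = \mathrm{rest}$. The following corollary is a direct consequence of this result.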
Corollary 4. Denote the inverse matrix of the TPDM of a random vector $X$ by $Q = \Sigma^{-1}$. Then,

$$Q_{ik} = 0 \iff \mathrm{PTCC}_{ik} = 0,$$

where $\mathrm{PTCC}_{ik}$ is the PTCC of components $X_i$ and $X_k$. Recall that a PTCC equal to zero corresponds to partial tail-uncorrelatedness.
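The link between the PTCC and the zero pattern of the inverse TPDM can be verified numerically on a small example. The sketch below computes the residual TPDM as a Schur complement; the tridiagonal matrix `Q`, standing in for the inverse TPDM of a chain with four nodes, is a hypothetical choice (its inverse is positive definite with nonnegative entries), and the function names are illustrative.

```python
import numpy as np

def residual_tpdm(sigma, i, k):
    """2x2 TPDM of the prediction error for components i and k given all other components."""
    pair = [i, k]
    rest = [j for j in range(sigma.shape[0]) if j not in pair]
    s_pr = sigma[np.ix_(pair, rest)]
    s_rr = sigma[np.ix_(rest, rest)]
    return sigma[np.ix_(pair, pair)] - s_pr @ np.linalg.solve(s_rr, s_pr.T)

def ptcc(sigma, i, k):
    """Off-diagonal entry of the residual TPDM."""
    return float(residual_tpdm(sigma, i, k)[0, 1])

# Hypothetical inverse TPDM of a chain 0 - 1 - 2 - 3 (tridiagonal, positive definite).
Q = np.array([[ 2.0, -0.8,  0.0,  0.0],
              [-0.8,  2.0, -0.8,  0.0],
              [ 0.0, -0.8,  2.0, -0.8],
              [ 0.0,  0.0, -0.8,  2.0]])
sigma = np.linalg.inv(Q)

pairs = [(i, k) for i in range(4) for k in range(i + 1, 4)]
ptcc_values = {pair: ptcc(sigma, *pair) for pair in pairs}
for (i, k), value in ptcc_values.items():
    print(f"pair ({i}, {k}): Q_ik = {Q[i, k]:+.2f}, PTCC = {value:+.4f}")
```

In this example the PTCC vanishes (up to rounding) exactly for the pairs with a zero entry in `Q`, while every entry of `sigma` itself is strictly positive, so the TPDM alone would not reveal the chain structure.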

Learning extremal networks for high-dimensional extremes
Using the PTCC, we can now explore the partial tail correlation structure of multivariate random variables under the framework of multivariate regular variation. In this section, we define new graphical models to represent extremal dependence. Thanks to the transformed-linear framework presented in the previous section, we can proceed as for classical graphical models by replacing the classical covariance matrix with the TPDM.
Figure 1: Examples of undirected graph structures: a tree (left), a decomposable graph (middle), and a non-decomposable graph (right).

Graphical models for extremes
Let $G = (V, E)$ be a graph, where $V = \{1, \ldots, p\}$ represents the node set and $E \subseteq V \times V$ the edge set. We call $G$ an undirected graph if for two nodes $i, k \in V$, the edge $(i, k)$ is in $E$ if and only if the edge $(k, i)$ is also in $E$. We show how to estimate graphical structures for extremes for any type of undirected graph in which we have no edge $(i, k)$ if and only if the variables $X_i$ and $X_k$ are partially tail-uncorrelated given all the other variables in the graph. Our methods work for general undirected graph structures, including trees, decomposable graphs and non-decomposable graphs; see the example illustrations in Figure 1. Note, however, that our general method cannot restrict the estimated graph to be of a specific type, such as a tree.
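Reading an undirected graph off an inverse TPDM amounts to collecting the index pairs with a non-zero entry. The following minimal sketch does this for two hypothetical four-node examples, a chain (which is a tree) and a four-cycle (a "square" graph, which is not); the matrices, the tolerance `tol`, and the helper names are assumptions made for illustration only.

```python
import numpy as np

def edges_from_precision(Q, tol=1e-8):
    """Undirected edges (i, k) with i < k whose inverse-TPDM entry is non-zero."""
    p = Q.shape[0]
    return {(i, k) for i in range(p) for k in range(i + 1, p) if abs(Q[i, k]) > tol}

def is_tree(p, edges):
    """A graph on p nodes is a tree iff it has p - 1 edges and is connected."""
    if len(edges) != p - 1:
        return False
    neighbours = {v: set() for v in range(p)}
    for i, k in edges:
        neighbours[i].add(k)
        neighbours[k].add(i)
    seen, stack = {0}, [0]
    while stack:                      # depth-first search from node 0
        for nb in neighbours[stack.pop()] - seen:
            seen.add(nb)
            stack.append(nb)
    return len(seen) == p

# Hypothetical inverse TPDMs: a chain 0 - 1 - 2 - 3 and a four-cycle 0 - 1 - 2 - 3 - 0.
Q_chain = 2.0 * np.eye(4)
for i, k in [(0, 1), (1, 2), (2, 3)]:
    Q_chain[i, k] = Q_chain[k, i] = -0.8
Q_square = 2.0 * np.eye(4)
for i, k in [(0, 1), (1, 2), (2, 3), (0, 3)]:
    Q_square[i, k] = Q_square[k, i] = -0.5

edges_chain = edges_from_precision(Q_chain)
edges_square = edges_from_precision(Q_square)
print(sorted(edges_chain), is_tree(4, edges_chain))
print(sorted(edges_square), is_tree(4, edges_square))
```

Because the matrix is symmetric, storing each pair once with i < k is enough to represent the undirected edge set.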

Sparse representation of high-dimensional extreme networks
For high-dimensional extremes with a relatively large number of components $p$ (e.g., up to tens and hundreds of variables), a graphical representation of the extremal dependence structure is desirable for reasons of parsimony and interpretability. We now introduce two efficient inference methods to learn extremal networks from high-dimensional data via the PTCC based on two state-of-the-art graphical methods: the extremal graphical Lasso, and the structured graph learning method via Laplacian spectral constraints. These two methods both work efficiently in high-dimensional settings and return an estimate of the underlying extremal dependence with sparse structure, i.e., with the cardinality of $E$ being of the same order as the one of $V$.

Extremal graphical Lasso
Given an empirical estimator $\hat{\Sigma}$ of the TPDM and a tuning parameter $\lambda \geq 0$, the optimization carried out by the extremal graphical Lasso method is expressed as follows (Friedman et al., 2008):

$$\hat{Q}_\lambda = \arg\max_{Q \succeq 0} \left\{ \log \det Q - \mathrm{tr}(\hat{\Sigma} Q) - \lambda \|Q\|_1 \right\},$$

where $\succeq$ indicates positive-semidefiniteness and $\hat{Q}_\lambda$ is an $L_1$-regularized estimate of the precision matrix $Q = \Sigma^{-1}$. Note that thanks to the $L_1$ regularization, the estimate $\hat{Q}_\lambda$ will tend to be sparse (with exact zero entries) and thus contains information on the extremal graph structure. A larger $\lambda$ enforces a larger proportion of zeros in $\hat{Q}_\lambda$ and hence fewer edges in the graph. Choosing an appropriate value for $\lambda$ is thus critical. On the one hand, we want to enforce sparsity in the graph, where only significant connections are maintained in the network. On the other hand, $\hat{Q}_\lambda$ should be well-defined, with estimation being stable, and with meaningful dependence structures in the estimated model. In our river discharge application in Section 5.1 we use a voting procedure to select the best value for $\lambda$, while in our global currency application in Section 5.2 we set the sparsity level to a pre-defined level for interpretation purposes.
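To make the role of the tuning parameter concrete, the penalized likelihood problem can be solved on a small matrix with a generic solver. The sketch below uses a plain ADMM iteration, which is a swapped-in technique chosen for brevity: it is neither the coordinate-descent algorithm of Friedman et al. (2008) nor necessarily the solver used for the results in this paper. The penalty is applied to all entries of the matrix, and the input matrix `S` (the exact inverse of a hypothetical chain-structured precision matrix), the step size `rho`, the number of iterations, and the λ values are all assumptions.

```python
import numpy as np

def graphical_lasso_admm(S, lam, rho=1.0, n_iter=500):
    """ADMM sketch for maximising log det Q - tr(S Q) - lam * sum_ik |Q_ik|."""
    p = S.shape[0]
    Z = np.eye(p)
    U = np.zeros((p, p))
    for _ in range(n_iter):
        # Q-update: closed form through an eigendecomposition.
        eigval, eigvec = np.linalg.eigh(rho * (Z - U) - S)
        q = (eigval + np.sqrt(eigval ** 2 + 4.0 * rho)) / (2.0 * rho)
        Q = (eigvec * q) @ eigvec.T
        # Z-update: elementwise soft-thresholding, which creates exact zeros.
        V = Q + U
        Z = np.sign(V) * np.maximum(np.abs(V) - lam / rho, 0.0)
        # Scaled dual update.
        U = U + Q - Z
    return Z

def n_edges(Q_hat):
    """Number of non-zero entries above the diagonal."""
    return int(np.count_nonzero(np.triu(Q_hat, k=1)))

# Hypothetical input: exact inverse of a tridiagonal (chain) precision matrix.
Q_true = 2.0 * np.eye(4)
for i in range(3):
    Q_true[i, i + 1] = Q_true[i + 1, i] = -0.8
S = np.linalg.inv(Q_true)

lam_big = 2.0 * np.max(np.abs(S - np.diag(np.diag(S))))
Q_small = graphical_lasso_admm(S, lam=1e-4)
Q_big = graphical_lasso_admm(S, lam=lam_big)
for lam in (1e-4, 0.02, 0.1, lam_big):
    print(f"lambda = {lam:.4f}: {n_edges(graphical_lasso_admm(S, lam))} edges")
```

With a very small penalty the estimate stays close to the plain inverse of `S`, while a penalty exceeding the largest off-diagonal entry of `S` in absolute value removes every edge; intermediate values trace out the path between these two extremes.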

Structured graph learning via Laplacian spectral constraints
As an alternative to the graphical Lasso approach, we can seek to include more structural information into the graph by using the structured graph Laplacian (SGL) method of Kumar et al. (2019), which assumes that the signal residing on the graph changes "smoothly" between connected nodes. This method allows us to better balance the sparsity and connectedness of the estimated precision matrix, thanks to additional constraints on the eigenvalues of the graph Laplacian operator that encodes the graph structure. For instance, if exactly one eigenvalue is zero and all other eigenvalues are positive, then the graph is connected. Laplacian matrix estimation can be formulated as the estimation problem for a precision matrix $Q$, which is therefore linked to our framework that uses the TPDM and its inverse. For any vector of eigenvalues $\lambda \in S_\lambda$ with appropriate a priori constraints for the desired graph structure defined through the set of admissible eigenvalues $S_\lambda$, we set $Q = Lw$ with $L$ the linear operator that maps a non-negative set of edge weights w ∈ R

Simulation study
We present a simulation study with three examples where the corresponding true structures of the extremal dependence graphs are as in Figure 1, i.e., a tree (Case 1), a decomposable graph (Case 2), and a non-decomposable graph (Case 3). The simulated models are constructed as follows. We simulate a dataset of We then construct $n$ replicates of the random vector $X = (X_1, X_2, X_3, X_4)^T$ according to the following three cases, for which we also specify the true TPDM $\Sigma$ and its inverse $Q = \Sigma^{-1}$: Case 1: Case 2: Case 3:
We proceed as follows to infer the extremal graph structure in each case. First, we estimate the TPDM of $X$, $\Sigma$, based on the estimator $\hat{\Sigma}$ specified through (3) using the 99% quantile for the threshold $r_0$ (i.e., there are 1000 threshold exceedances to estimate the TPDM). Then, we apply the extremal graphical Lasso and SGL methods. In each setting, we test $m_1 = 300$ different values for the regularization parameter $\lambda$ when using extremal graphical Lasso and $m_2 = 400$ different settings for the combination of $\alpha$ and $\beta$ for the SGL method. The range of $\lambda$ and $\{\alpha, \beta\}$ values is chosen to span a wide range of graphical structures, from fully connected to fully sparse (no connection). In all experimental results, we have found that when the true number of edges is achieved, both methods can retrieve 100% of the true extremal graph structure, i.e., all connections are correctly identified and the estimated graph has no wrong connections. This is illustrated in Figure 2, which displays the estimated extremal graph structure for Case 3 (general non-decomposable "square graph") when using the extremal graphical Lasso method. The heading of each display shows the range of $\lambda$ values that leads to the estimated graph shown below. The tuning parameter $\lambda$ controls the number of edges in the graph, i.e., the sparsity level: when $\lambda$ decreases, the number of edges increases, and vice versa. Interestingly, the proposed method always retrieves true connections (i.e., it never yields wrong connections) whenever the estimated graph is as sparse as, or sparser than, the true graph. This simple experiment shows that our method is able to retrieve the true extremal dependence graph structure, provided the tuning parameter $\lambda$ (or $\{\alpha, \beta\}$ for the SGL method) is well specified. While our numerical experiments were performed in dimension $p = 4$, we expect similar results to hold in higher dimensions provided enough data replicates are available. Our higher-dimensional data applications in Section 5 demonstrate that the estimated graph structures indeed make sense and yield interpretable results.
We also note that with our distribution-free approach, there is no universally optimal way of setting the tuning parameters. However, we can use problem-specific criteria to achieve the desired outcome (and therefore to set the penalty parameters); see the data applications in Sections 5.1 and 5.2.
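The first step of the procedure above, estimating the TPDM from threshold exceedances, can be sketched as follows. This is only a minimal illustration and not necessarily identical to the estimator in (3): it assumes nonnegative data with unit-scale Fréchet margins (shape 2), the Euclidean norm as radius, and a total angular mass equal to the dimension p. The function name estimate_tpdm is hypothetical.

```python
import math


def estimate_tpdm(data, quantile=0.99):
    """Empirical TPDM sketch (assumed form, see the lead-in).

    data: list of rows, each a list of p nonnegative values that are
    assumed to have unit-scale Frechet margins with shape 2.
    """
    n, p = len(data), len(data[0])
    # Radial component: Euclidean norm of each observation.
    radii = [math.sqrt(sum(v * v for v in row)) for row in data]
    # Threshold r0: empirical quantile of the radii.
    k = max(1, math.ceil(quantile * n))
    r0 = sorted(radii)[k - 1]
    exceed = [(row, r) for row, r in zip(data, radii) if r > r0]
    if not exceed:
        raise ValueError("no threshold exceedances; lower the quantile")
    # Average the outer products of the angular components w = x / r.
    sigma = [[0.0] * p for _ in range(p)]
    for row, r in exceed:
        w = [v / r for v in row]
        for i in range(p):
            for j in range(p):
                sigma[i][j] += w[i] * w[j]
    # Assumption: total mass p, so that the diagonal entries average to 1.
    scale = p / len(exceed)
    return [[scale * s for s in line] for line in sigma]
```

Because each angular vector has unit Euclidean norm, the trace of the returned matrix equals p by construction, which gives a quick sanity check on any implementation of this kind.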

Applications
Risk networks are useful in quantitative risk management to elucidate complex extremal dependence structures in collections of random variables. We present two examples, one from environmental and one from financial risk analysis. First, we study river discharge data of the upper Danube basin (Asadi et al., 2015), which has become a benchmark dataset for learning extremal networks in the recent literature. The true underlying physical river flow network is available, and can be used as a benchmark to compare the performance of our method with other existing approaches. Second, we apply our method to global currency exchange rate data from different historical periods, including different economic cycles, the COVID-19 period, and the period of the 2022 military conflict between Russia and Ukraine (2022.02.24-2022.09.26).

Extremal network estimation for a river network
We apply our method to study the dependence structure of extreme discharges on the river network of the upper Danube basin (see the left panel of Figure 3 for the topographic map). This region has regularly been affected by severe flooding events, which have caused losses of human lives and damage to material goods. The original daily discharge data from 1960 to 2009 were provided by the Bavarian Environmental Agency (http://www.gkd.bayern.de), and Asadi et al. (2015) preprocessed them using declustering methods, yielding n = 428 approximately independent events X_1, ..., X_n ∈ R^d recorded at d = 31 gauging stations located on the river network during the three summer months (June, July, and August). The data were later also studied by Engelke and Hitz (2020) [...] the edges selected based on the extremal graphical Lasso method, obtained from multiple fits with a range of λ values producing different dependence structures, from fully connected to fully sparse graphs.

Graph structure learning using the SGL method
To enhance connectedness, we further explore the SGL method, which learns sparse graph structures under additional spectral constraints. In particular, as described in Section 3.2.2, we can control both the sparsity

Results: estimated extremal river discharge networks
For both the extremal graphical Lasso and the SGL method, it is important to carefully select the tuning parameters, λ and {α, β}, respectively. These tuning parameters impact both the sparsity level and the connectedness of the resulting graph structure. Ideally, we would like to obtain a sparse graph while keeping it connected. However, there is a tradeoff between these two requirements. Our approach is to control the overall sparsity level while imposing a soft connectedness condition: we start from the fully sparse graph (no edges) and sequentially add edges between nodes according to the ranking of the votes shown in Figures 4 and 5, until no node is left alone (i.e., each node has at least one connection with another node). The estimated graph structure thus prioritizes edges that are most often selected and is obtained by blending the results from several model fits, which also makes it less sensitive to specific values of the tuning parameters.
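The vote-based selection rule described above can be sketched in a few lines. The function name, the dictionary input format, and the tie-breaking rule (ties keep their input order) are assumptions made for illustration only.

```python
def select_edges_by_votes(votes, n_nodes):
    """Add edges in decreasing order of votes until no node is isolated.

    votes: dict mapping an edge (i, j) to its vote share, i.e., how often
    the edge was selected across the fitted models (hypothetical format).
    """
    # Rank edges from most to least voted; sorted() is stable, so ties
    # keep their input order.
    ranked = sorted(votes.items(), key=lambda item: -item[1])
    selected = []
    covered = set()  # nodes with at least one selected edge
    for (i, j), _ in ranked:
        if len(covered) == n_nodes:
            break  # soft connectedness condition reached
        selected.append((i, j))
        covered.update((i, j))
    return selected
```

Note that the stopping rule only guarantees that every node has at least one neighbor; it does not guarantee that the resulting graph forms a single connected component, which is why the condition is described as soft.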
Figure 6 displays the estimated extremal river discharge network using our approach for both the extremal graphical Lasso and the SGL method. The edge thickness is proportional to the votes shown in Figures 4 and 5. Recall that when two nodes are not connected, it means that they are partially tail-uncorrelated in our framework. The estimation based on the extremal graphical Lasso method has a few more edges than the true physical river flow network. The extra links could be interpreted as being due to extremal dependence induced by regional weather events, though this is not very clear. By contrast, the estimation result based on the SGL method matches most of the true river flow connections, while the votes (shown in terms of the edge thickness) represent the strength of the extremal dependence connections. Interestingly, this dependence strength seems well-aligned with the actual physical strength of the river flow.
Compared with existing methods (Engelke and Hitz, 2020; Klüppelberg and Krali, 2021), our estimated extremal network based on the SGL method looks more realistic (thus more easily interpretable), as it is closer to the true flow structure of the river network, though the recent results from Engelke et al. (2022) are quite similar to ours (with a few extra connections).

Extremal network estimation for global currency rate network
We now apply our method to explore historical global currency exchange rate data for 20 currencies from different historical periods, including two different global economic cycles (2009-2014 and 2015-2019, where the segmentation is determined from the world GDP cycles illustrated in Figure 9 from Appendix A), the COVID-19 period (2020.01.01-2022.02.23), and the period from the beginning of the 2022 military conflict between Russia and Ukraine until the date on which we downloaded the data (2022.02.24-2022.09.26). Historical data were downloaded from Yahoo Finance. We chose the currencies from all G20 countries, and also added the Ukrainian and Kazakhstani currencies. The list of selected currencies and their corresponding symbols can be found in Table 1 in Appendix A. Since all exchange rates are expressed in US Dollars (USD), USD is not included in our list of currencies under study.
First, we preprocess the historical daily closing prices of the currencies. An ARMA(1,1)-GARCH(1,1) time series model is fitted to the negative log return time series of each currency, and the standardized residuals are then extracted and transformed marginally to Fréchet margins with shape parameter α = 2. The extremal dependence graph structure governing negative log returns therefore represents partial associations among extreme losses, shedding light on the integration and/or vulnerability of major economies in periods of stress. To estimate this risk network, we follow the same procedure as before, first estimating the TPDM using the estimator in (3) with r_0 as the empirical 90% quantile, and then applying the SGL method.
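The marginal transformation step can be illustrated with a rank-based sketch. The text does not specify how the marginal distribution of the residuals is estimated, so the use of the empirical distribution function with ranks divided by n + 1 is an assumption, as is the function name to_frechet. Ties are not treated specially.

```python
import math


def to_frechet(values, alpha=2.0):
    """Rank-transform a sample to Frechet margins with shape alpha.

    Assumption: margins are estimated through ranks, u = rank / (n + 1),
    and mapped through the Frechet quantile function
    F^{-1}(u) = (-log u)^(-1 / alpha), where F(x) = exp(-x^(-alpha)).
    """
    n = len(values)
    order = sorted(range(n), key=lambda idx: values[idx])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return [(-math.log(rank / (n + 1))) ** (-1.0 / alpha) for rank in ranks]
```

The transformation is monotone, so the ordering of the standardized residuals is preserved, and the transformed values are strictly positive, as required for the radial and angular decomposition used in the TPDM estimation.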
In this analysis, we use only the SGL method, because it includes the extremal graphical Lasso method as a special case when β = 0, and it has shown better performance in the Danube river application. Furthermore, for comparability among different historical periods, we here fix the sparsity level to 80% (i.e., with only about 38 edges), rather than using the method based on votes. This approach yields a unique combination of {α, β} tuning parameters, which then yields the final estimated graph structure. To assess the estimation uncertainty of the graph structure, we have further conducted 300 bootstrap simulations, whereby standardized residuals are resampled with replacement and the TPDM is then re-estimated, as well as the graph structure fixing the same sparsity level (i.e., potentially with different selected {α, β} values for each bootstrap simulation). The bootstrap results are shown in Figures 7 and 8 for the first two and last two periods, respectively, where different edge types represent the "significance" of the displayed connections: the thickest edges (in red) have been selected in at least 90% of the 300 bootstrap fitted models; thick edges (in blue) have been selected between 70% and 90% of the time; thin edges (in grey) have been selected between 50% and 70% of the time; absent edges have been selected less than 50% of the time. Moreover, the size of nodes in the displayed graphs is proportional to their degree, i.e., to their number of connections. The bigger a node, the more connected it is, giving an idea of the centrality of a currency in the risk network.
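The bookkeeping behind this bootstrap summary can be sketched as follows. The sketch checks that 80% sparsity among 20 currencies corresponds to 38 of the 190 possible edges, and then turns a collection of bootstrap graphs into selection frequencies, edge styles, and node degrees. All function names and the graph representation (a set of pairs (i, j) with i < j) are hypothetical, and computing degrees from edges selected at least 50% of the time is an assumption.

```python
from collections import Counter


def n_edges_at_sparsity(n_nodes, sparsity):
    """Number of edges kept when a given share of possible edges is absent."""
    total = n_nodes * (n_nodes - 1) // 2
    return round((1.0 - sparsity) * total)


def edge_frequencies(bootstrap_graphs):
    """Share of bootstrap graphs (sets of (i, j) pairs) containing each edge."""
    counts = Counter(edge for graph in bootstrap_graphs for edge in graph)
    return {edge: c / len(bootstrap_graphs) for edge, c in counts.items()}


def edge_style(freq):
    """Map a selection frequency to the edge type used in the figures."""
    if freq >= 0.9:
        return "thickest (red)"
    if freq >= 0.7:
        return "thick (blue)"
    if freq >= 0.5:
        return "thin (grey)"
    return None  # edge not displayed


def node_degrees(freqs, n_nodes, min_freq=0.5):
    """Degree of each node, counting only displayed edges (assumption)."""
    degree = [0] * n_nodes
    for (i, j), freq in freqs.items():
        if freq >= min_freq:
            degree[i] += 1
            degree[j] += 1
    return degree
```

For example, n_edges_at_sparsity(20, 0.8) returns 38, matching the number of edges quoted above.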
We now provide some interpretation of the estimated risk networks, assuming that more strongly connected nodes tend to be more vulnerable/exposed to network risks, or are important currencies that strongly determine the behavior of other currencies in the monetary system. We can back up some of our findings using historical events and economic development regimes for different countries. From the left panel in Figure 7, representing the economic cycle 2009-2014, the strongest-connected currencies within the esti- [...] are isolated and most of the other currencies are also less interconnected. As for ARS, Argentina was struck by COVID-19 during an already fragile economic situation, and its aforementioned strong international monetary ties might have led to strong extremal connectedness to other countries. Finally, the right panel of Figure 8 shows the risk network for currency exchange rates from the beginning of the 2022 military conflict between Russia and Ukraine until 2022.09.26. Since this time period is shorter, there is higher estimation uncertainty, and no edge has been selected more than 90% of the time among the 300 bootstrap model fits. Nevertheless, we can still make the interesting observation that RUB (Russia) is the only currency isolated from the rest of the network, which might be due to the antagonism and strong economic sanctions imposed by Western countries, as well as the dramatic changes in monetary policies.

Conclusions
We have proposed the partial tail correlation as a novel extremal dependence measure that removes the effect of confounding variables through transformed-linear operations. Unlike other approaches from the recent literature, our new partial tail-correlation coefficient (PTCC) assumes multivariate regular variation but does not rely on any further strict parametric assumptions. Furthermore, the PTCC has appealing theoretical properties and can be used to define a new class of extremal graphical models, where the absence of edges indicates partial tail-uncorrelatedness between variables (i.e., when the PTCC of the corresponding edges equals zero). We have shown that the zero PTCC values between variable pairs can be retrieved by identifying the zero entries in the inverse tail pairwise dependence matrix (TPDM). This convenient property, which is akin to classical Gaussian graphical models, allows us to efficiently learn high-dimensional extremal networks defined in terms of the PTCC by exploiting state-of-the-art methods from graph theory, such as the graphical Lasso and structured graph learning via Laplacian spectral constraints.
Our graph-inference approach is flexible, can be applied to general undirected graphs, and easily scales to high dimensions. We demonstrate the effectiveness of our method as an exploratory tool for interpretable extremal network estimation. In our first application to river discharge data from the upper Danube basin, we show that the proposed method outperforms other existing methods by realistically capturing most physical [...] using our proposed method. Finally, another direction to investigate concerns the geometric representation of multivariate extremes (Nolde and Wadsworth, 2022; Simpson et al., 2021). It would be interesting to see if the new PTCC can be defined through this geometric approach and to explore its links with other popular measures of extremal dependence.
for r > 0 and Borel subsets B_H of S^+_{p−1}. The normalizing sequence b_n is not uniquely determined but must satisfy b_n = L(n) n^{1/α}, where L(n) is a slowly varying function (at infinity), [...]
Laplacian constraints. The Laplacian matrix Q can be factorized as Q = U Diag(λ) U^T (with an orthogonal matrix U) to enforce the constraints on λ. Then the optimization problem can be formulated as follows:
$$(\hat{\lambda}, \hat{U}) = \arg\max_{\lambda, U} \max_{w} \left\{ \log \operatorname{gdet}(\operatorname{Diag}(\lambda)) - \operatorname{tr}(\hat{\Sigma} L w) - \alpha \|L w\|_1 - \frac{\beta}{2} \|L w - U \operatorname{Diag}(\lambda) U^T\|_F^2 \right\},$$
subject to w ≥ 0, λ ∈ S_λ, and U^T U = I, where S_λ denotes the set of constrained eigenvalues, ||·||_F is the Frobenius norm, and gdet is the generalized determinant defined as the product of all positive values in λ. The optimization problem can be viewed as penalized likelihood if data have been generated from a Gaussian Markov random field; in more general cases such as ours, it still provides meaningful graphical structures since it can be viewed as a so-called penalized log-determinant Bregman divergence problem. Therefore, this method can be seen as an extension of the graphical Lasso that allows us to set useful spectral constraints with respect to the structure of the graph. A larger value of α increases the sparsity level of the graph. The hyperparameter β ≥ 0 additionally controls the level of connectedness, and a larger value of β enforces a higher level of connectedness of the estimated graph structure.
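To make the ingredients of this objective concrete, the sketch below evaluates its four building blocks for given w, λ, and U: the log generalized determinant, the trace term, the ℓ1 norm of Lw, and the squared Frobenius distance between Lw and U Diag(λ) U^T. It does not solve the optimization problem. The definition of the operator L (mapping edge weights, listed in lexicographic order of the pairs i < j, to a graph Laplacian), the entrywise ℓ1 norm, and all function names are assumptions for illustration.

```python
import math
from itertools import combinations


def laplacian_from_weights(w, p):
    """Assumed Laplacian operator: edge weights w -> p x p graph Laplacian."""
    lap = [[0.0] * p for _ in range(p)]
    for weight, (i, j) in zip(w, combinations(range(p), 2)):
        lap[i][j] -= weight
        lap[j][i] -= weight
        lap[i][i] += weight
        lap[j][j] += weight
    return lap


def sgl_terms(w, lam, U, sigma_hat, tol=1e-12):
    """Evaluate the building blocks of the objective at (w, lam, U)."""
    p = len(sigma_hat)
    idx = range(p)
    lap = laplacian_from_weights(w, p)
    # Generalized determinant: product of the positive eigenvalues only.
    log_gdet = sum(math.log(v) for v in lam if v > tol)
    # tr(sigma_hat * Lw), computed entrywise.
    trace = sum(sigma_hat[i][j] * lap[j][i] for i in idx for j in idx)
    # Entrywise l1 norm of Lw (assumption).
    l1 = sum(abs(lap[i][j]) for i in idx for j in idx)
    # Spectral reconstruction U Diag(lam) U^T.
    recon = [[sum(U[i][k] * lam[k] * U[j][k] for k in idx) for j in idx]
             for i in idx]
    frob_sq = sum((lap[i][j] - recon[i][j]) ** 2 for i in idx for j in idx)
    return {"log_gdet": log_gdet, "trace": trace, "l1": l1, "frob_sq": frob_sq}
```

In the two-node case with a single unit edge weight, the Laplacian has eigenvalues 0 and 2, so the log generalized determinant is log 2 and the Frobenius term vanishes when λ and U are the exact eigenvalues and eigenvectors.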

Figure 2: Estimated extremal graph structures using the extremal graphical Lasso method based on the PTCC for Case 3, as a function of the tuning parameter λ (shown at the top of each display).

Figure 3: Left: Topographic map of the upper Danube basin (from Asadi et al., 2015), showing 31 sites of gauging stations (red circles) and the altitudes of the region. Right: The true physical river flow connections; the arrows show the flow directions.
among others, using graphical models for extremes based on a conditional independence notion adapted to multivariate Pareto distributions. The true physical river flow connections and directions are represented by a directed graph shown in the right panel of Figure 3, where the arrows indicate the flow directions. This can serve as an accurate benchmark of the "true" conditional independence structure, against which we can compare the results from our proposed extremal graphical structure learning methods based on the PTCC.

Figure 4: Left: Estimated TPDM of the river discharge data from the upper Danube basin. Right: Votes (%) of the edges selected based on the extremal graphical Lasso method. Darker red cells indicate that the corresponding edge has been selected more often by the graphical Lasso.

Figure 5: Left: Number of edges selected by the SGL method as a function of the tuning parameters α and β. Lighter blue cells correspond to parameter combinations producing sparser graphs. Right: Votes (%) of the edges selected based on the SGL method. Darker red cells indicate that the corresponding edge has been selected more often by the SGL method.
Figure 6: Extremal river discharge network estimated using the extremal graphical Lasso (left) and the SGL method (right). The edge thickness is proportional to the votes shown in Figures 4 and 5.