The bivariate $K$-finite normal mixture"blanket"copula

There exist many bivariate parametric copulas to model bivariate data with different dependence features. We propose a new bivariate parametric copula family that cannot only handle various dependence patterns that appear in the existing parametric bivariate copula families, but also provides a more enriched dependence structure. The proposed copula construction exploits finite mixtures of bivariate normal distributions. The mixing operation, the distinct correlation and mean parameters at each mixture component introduce quite a flexible dependence. The new parametric copula is theoretically investigated, compared with a set of classical bivariate parametric copulas and illustrated on two empirical examples from astrophysics and agriculture where some of the variables have peculiar and asymmetric dependence, respectively.


Introduction
Multivariate response data abound in many applications including insurance, risk management, finance, health and environmental sciences. Data from these application areas have different dependence structures including features such as tail dependence (Joe, 1993), that is dependence among extreme values. Modelling dependence among multivariate outcomes is an interesting problem in statistical science. The dependence between random variables is completely described by their multivariate distribution. One may create multivariate distributions based on particular assumptions thus, limiting their use. For example, most existing multivariate distributions assume margins of the same form (e.g., Gaussian, Poisson, etc.) or limited dependence (e.g., tail independence, positive dependence, etc.).
The power of copulas for dependence modelling is due to the dependence structure being considered separate from the univariate margins. Copulas are a useful way to model multivariate data as they account for the dependence structure and provide a flexible representation of the multivariate distribution. They allow for flexible dependence modelling, different from assuming simple linear correlation structures and normality, which makes them well suited to the aforementioned application areas. In particular, the theory and application of copulas have become important in finance, insurance and other areas, in order to deal with dependence in the joint tails (Joe et al., 2010).
For the bivariate case, a rich variety of copula families is available and their properties are well-established (Joe, 1997(Joe, , 2014Nelsen, 2006). Nevertheless, a problem in practical applications is to identify the most plausible parametric family of bivariate copulas for dependence modelling (Durrleman et al., 2000;Huard et al., 2006;Nikoloulopoulos and Karlis, 2008). Furthermore, sometimes none of the bivariate parametric copula families provides a good fit. For example, Nagler and Czado (2016) and Czado (2019) analysed the dependence between the Major Atmospheric Gamma Imaging Cherenkov (MAGIC) telescope variables (Bock et al., 2004) and pointed out the uncommon characteristics of the dependence structure between some of the variables, which do not correspond to any bivariate parametric copula. Figure 1 depicts, in particular the relationship between the length of the major axis of the ellipse (Length) and the third root of the third moment along the major axis (M3Long), and indeed reveals that none of the existing parametric families of bivariate copulas (see e.g., Joe 1997Joe , 2014 can model the joint distribution of these variables. Note in passing that in the right panel graph of Figure 1, we transform the data to the uniform scale by applying their empirical distributions, in order to isolate the effect of the marginal distributions and solely focus on the dependence structure. In this paper, we propose a new parametric family of copulas that can represent the dependence structure between the aforementioned MAGIC variables. It can also remedy the copula selection issue as can "nearly" approximate any parametric family of bivariate copulas. A multivariate 2-finite normal mixture (FNM) copula has been proposed by Nikoloulopoulos and Karlis (2009) to model multivariate discrete data. The correlation matrix for each mixture component was restricted to the identity matrix with the mixing operation introducing the dependence among the discrete responses. Therefore, it has a rather simple computational form, but suffers from a restricted range of attainable dependence. We will study the full dependence capacity of the bivariate K-FNM copula, where K is the number of mixture components, by using general correlation matrices for each mixture component and we will show that the the proposed K-FNM is a "blanket" copula, i.e., a copula that can "nearly" approximate any bivariate parametric copula. A similar construction in the literature, which is called Bayesian non-parametric estimation of a copula (Wu et al., 2015;Dalla Valle et al., 2018), takes BVN copulas as the mixture components, hence it allows only for reflection symmetric dependence and is not as general as the proposed K-FNM copula, which can allow reflection asymmetric dependence.
The remainder of the paper proceeds as follows. Section 2 introduces the bivariate K-FNM copula, discusses its properties and provides computational details for maximum likelihood (ML) estimation. Before that it has a brief overview of relevant copula theory. Section 3 shows that the proposed copula is a "blanket" copula.
Section 4 studies the small-sample efficiency of the proposed ML estimation technique. Section 5 illustrates the methods on two empirical examples from astrophysics and agriculture where some of the variables have peculiar and asymmetric dependence, respectively. We conclude with some discussion in Section 6.
2 The bivariate K-finite normal mixture copula In this section we will define the bivariate K-FNM copula and study its properties. Before that, the first subsection has some background on bivariate copulas.

Overview and relevant background for copulas
A copula is a multivariate cumulative distribution function (cdf) with uniform U (0, 1) margins (Joe, 1997;Nelsen, 2006;Joe, 2014). If F 12 is a bivariate cdf with univariate margins F 1 , F 2 , then Sklar's (1959) theorem implies that there is a copula C such that The copula is unique if F 1 , F 2 are continuous, but not if some of the F j have discrete components. If F 12 is continuous and (Y 1 , Y 2 ) ∼ F 12 , then the unique copula is the distribution of (U 1 , where F −1 j are inverse cdfs. In particular, if Φ 12 (·; θ) is the bivariate normal (BVN) cdf with correlation θ and standard normal margins, and Φ is the univariate standard normal cdf, then the BVN copula is If C(·; θ) is a parametric family of copulas and F j (·; η j ) is a parametric model for the jth univariate margin, then C F 1 (y 1 ; η 1 ), F 2 (y 2 ; η 2 ); θ is a bivariate parametric model with univariate margins F 1 , F 2 . For copula models, the variables can be continuous or discrete (Nikoloulopoulos, 2013;Nikoloulopoulos and Joe, 2015).
(4) Subsequently, one can derive the bivariate K-FNM copula density as below where f and f 2 is the univariate and bivariate density, respectively, of the K-FNM distribution.

Dependence properties of the K-FNM distribution
We study the dependence properties of the bivariate K-FNM distribution as these will be inherited to the copula.
The mean vector and covariance matrix of the K-FNM are given respectively by An aspect of mixture models is their lack of identifiability, but this can be overcome by imposing some restrictions in the parameters. In our approach, to overcome the typical identifiability issues we priory assume that µ 1 + . . . + µ K = 0, i.e., the mean vectors become and that the variances of the mixture components are set to one, i.e., σ 2 k1 = σ 2 k2 = 1 for k = 1, . . . , K.
We depict some dependence shapes that can be imposed by the bivariate K-FNM copula with the above parametrization in Figure 2.

Maximum likelihood estimation
In copula models, a copula is combined with a set of univariate margins. This is equivalent to assuming that variables Y 1 , Y 2 have been transformed to uniform random variables U 1 = F 1 (Y 1 ), U 2 = F 2 (Y 2 ). For data y ij , i = 1, . . . , n, j = 1, 2, we use either non-parametric or parametric univariate distributions to transform the data y ij to copula data u ij = F j (y ij ), i.e., data on the uniform scale. These semi-parametric and parametric estimation techniques have been developed by Genest et al. (1995) and Joe (2005), respectively, and can be regarded as two-step approaches on the original data or simply as the standard one-step maximum likelihood (ML) method on the transformed (copula) data.
In this section we will show that the K-FNM copula is quite close to any parametric family of copulas. We will use the Kullback-Leibler methodology (Joe, 2014, pages 234-241) to compare the new parametric copula family with existing parametric families of copulas. Before that, the first subsection provides choices of parametric bivariate copulas.

Existing parametric families of copulas
We will consider copula families that have different tail dependence (Joe, 1993) or tail order (Hua and Joe, 2011).
Otherwise, it is reflection asymmetric often with more probability in the joint upper tail or joint lower tail. Upper tail dependence means that is the survival or reflected copula of C; this "reflection" of each uniform U (0, 1) random variable about 1/2 changes the direction of tail asymmetry. Under some regularity conditions (e.g., existing finite density in the interior of the unit square, ultimately monotone in the tail), if there exists κ L (C) > 0 and some L(u) that is slowly varying at 0 + (i.e., can be defined by the reflection of (U 1 , U 2 ), i.e., After briefly providing definitions of tail dependence and tail order we provide below a list of bivariate parametric copulas with varying tail behaviour: • Reflection symmetric copulas with intermediate tail dependence such as the BVN copula in (2) with where θ is the copula (correlation) parameter.
• Reflection symmetric copulas with tail dependence, such as the t ν copula with where θ is the correlation parameter of the bivariate t distribution with ν degrees of freedom, and T ν is the univariate t cdf with ν degrees of freedom.
The aforementioned bivariate copula families are sufficient for applications because tail dependence and tail order are properties to consider when choosing amongst different families of copulas, and the concepts of upper/lower tail dependence and upper/lower tail order are one way to differentiate families. Nikoloulopoulos and Karlis (2008) and Joe (2014) have shown that it is hard to choose a copula with similar tail dependence properties from real data because copulas with similar tail dependence properties provide similar fit.

Kullback-Leibler distance and sample size
For inferences based on likelihood, the Kullback-Leibler (KL) distance is relevant, especially as the parametric model used in the likelihood could be misspecified (Joe, 2014). Typically, one considers several different models when analysing data, and from a theoretical point of view, the KL distance of pairs of competing models provides information on the sample size needed to discriminate them.
We will define the KL distance for two copula densities and the expected log-likelihood ratio. Because the KL is a non-negative quantity that is not bounded above, we use also the expected value of the square of the log-likelihood ratio in order to get a sample size value that is an indication of how different two copula densities are. Consider two copula densities (competing models) c 1 and c 2 with respect to Lebesgue or counting measure in R 2 . The KL distance between copulas with densities c 1 , c 2 is defined as The KL distance can be interpreted as the average difference of the contribution to the log-likelihood of one observation.
We use the log-likelihood ratio to get a sample size n c 1 ,c 2 which gives an indication of the sample size needed to distinguish c 1 and c 2 with probability at least 0.95. If c 1 , c 2 are similar, then c 1 , c 2 will be larger, and if c 1 , c 2 are far apart, then n c 1 ,c 2 will be smaller. The calculation is based on an approximation from the Central Limit Theorem and assumes that the square of the log-likelihood ratio has finite variance when computed with c 1 being the true density (Joe, 2014). If the variance of the log-density ratio is then the KL sample size is This is larger when KL(c 1 , c 2 ) is small or the variance σ 2 c 1 is large.

Minimizing the KL distance
For a theoretical likelihood comparison between existing bivariate parametric families of copulas and the bivariate K-FNM copula we minimize the KL distance in (6) where the true c 1 is the copula density of each of parametric bivariate copulas in Subsection 3.1 and c 2 is the copula density of the K-FNM copula in (5), and hence (a) obtain the parameters of the K-FNM copula that is quite close to the true copula in KL distance, (b) the KL sample size for these parameters. The minimized KL distances and resultant sample sizes will show the similarity or dissimilarity of the K-FNM copula with the existing parametric families of copulas.

Numerically evaluate
With Gauss-Legendre quadrature, the same nodes and weights are used for different functions; this helps in yielding smooth numerical derivatives for numerical optimization via quasi-Newton (Nash, 1990). Our comparisons show that n q = 15 is adequate with good precision to at least at four decimal places; hence it also provides the advantage of fast implementation.   Table 1 shows the minimized KL distances and the corresponding 2-FNM copula parameters and KL sample sizes for comparing 1-parameter copula families, with symmetric or asymmetric dependence as the Kendall's τ varies from 0.1 to 0.9, versus the bivariate 2-FNM copula. Table 2 shows the minimized KL distances and the corresponding 2-FNM or 3-FNM copula parameters and KL sample sizes for comparing the BB1 copula, with reflection asymmetric tail dependence (λ L = λ U ) as the lower and upper tail dependence varies from 0.1 to 0.9 and from 0.9 to 0.1, respectively, versus the bivariate 2-or 3-FNM copula. Table 3 shows the minimized KL distances and the corresponding 2-FNM or 3-FNM copula parameters and KL sample sizes for comparing the t ν copula for a small ν, with reflection symmetric tail dependence (λ L = λ U ) as the Kendall's τ varies from 0.1 to 0.9, versus the bivariate 2-or 3-FNM copula.  The conclusion from the values in the tables are: • The K-FNM copula is close to any parametric bivariate family of copulas and a large sample size is required to distinguish when the Kendall's τ values range from 0.1 (weak dependence) to 0.5 (moderate dependence).
• To approximate copulas with refection symmetric or asymmetric tail dependence, they are required up to K = 3 mixture components, while for any 1-parameter family K = 2 mixture components are sufficient.
• Since the K-FNM copula and each of the parametric families of copulas have the same strength of dependence as given by Kendall's τ , the magnitude of the KL distance is related to the closeness of the strength of dependence in the tails. This is because copula densities can asymptote to infinity in a joint tail at different rates (tail order less than dimension d) or converge to a constant in the joint tail (if tail order is the dimension d or larger), see e.g, Joe (2014).
• Copula families with stronger dependence have larger KL distance with the K-FNM copula than those with weaker dependence when strength of dependence in the tails are different based on the tail orders.  Tables 1-3, i.e., the ones that the FNM copulas are close in terms of KL distance to the true copulas, and normal margins, along with the contour plots of the true copulas with normal margins. We summarize the case of τ = 0.5 (λ L = 0.4, λ U = 0.6 for BB1).
If two copula models are applied to discrete variables and have the same strength of dependence as given by the Kendall's τ , then the KL distance gets smaller. This is because the different asymptotic rates in the joint  tails of the copula density do not affect rectangle probabilities for the log-likelihood with discrete response (Joe, 2014). This means that a discretized K-FNM copula model will be close to any copula model for discrete data even for strong dependence. To show that we use ordinal response variables, say Y 1 , Y 2 with regressions on a scalar covariate x, which is assumed to take X values equally spaced in [−1, 1]. Let Z be a latent variable with cdf F, such that Y = y if α y + βx ≤ Z ≤ α y+1 + βx, y = 0, . . . , Y − 1, where Y is the number of categories of Y (without loss of generality, we assume α 0 = −∞ and α Y = ∞), and β is the slope of x. From this definition, the ordinal response Y j is assumed to have probability mass function (pmf) Note that G normal leads to the probit model and G logistic leads to the cumulative logit model for ordinal response. With copula families, the bivariate pmf (see e.g., Nikoloulopoulos and Karlis 2010) can be obtained Let f and g denote the bivariate pmfs defined as in (7) for the bivariate Clayton and K-FNM copula, respectively. Then the KL(f, g) is log f (y 1 , y 2 |x) g(y 1 , y 2 |x) f (y 1 , y 2 |x). Table 4 shows the minimized KL distances, the corresponding 2-FNM copula parameters and KL sample sizes for comparing the discretized Clayton copula model, as the Kendall's τ varies from 0.1 to 0.9, versus the discretized 2-FNM copula model. We show the comparison results versus the Clayton copula, as in Table 1 it was revealed that the Clayton copula is the 1-parameter copula family which is the most far apart from the 2-FNM copula for continuous responses. We used univariate ordinal regressions, but note that using ordinal probit regressions led to similar results.
The conclusions from the table and the other computations we have done for other copula families are: • The K-FNM copula is close to any parametric bivariate family of copulas if two copulas models are applied to discrete variables.
• With discrete response variables, it takes larger sample sizes to distinguish the K-FNM copula (because tails of the copula densities would not be "observed").
• The KL distances (sample sizes) get larger (smaller) with less discretization, i.e., as Y increases.

Simulations
To gauge the small sample efficiency of the ML estimation method in Section 2.4 to estimate the K-FNM copula parameters, we performed several simulation studies using K-FNM copula models with various parameter choices for K = 2, 3. We report here typical results from these experiments.
We randomly generated B = 10 4 samples of size n = 100, 300, 500 from the 2-and 3-FNM bivariate copulas with exponential margins and marginal parameters λ 1 = 0.5 and λ 2 = 1. We have transformed the simulated data to uniform random variables using their empirical distributions, i.e., we have approached estimation of the K-FNM copula parameters using the semi-parametric estimation (Genest et al., 1995). We have used as initial values the ones that resemble the independence copula. Table 5 contains the copula parameter values, the bias, standard deviation (SD) and the root mean square errors (RMSE) of the ML estimates, along with the average of their theoretical SDs. The theoretical SD of the ML estimate is obtained via the gradients and the Hessian computed numerically during the maximization process. The conclusions from the table and the other computations we have done are that • ML is highly efficient according to the simulated biases, SDs and RMSEs as the sample size increases.
• The SDs computed from the simulations are close to the asymptotic SDs as the sample size increases.

Empirical examples
In this section we illustrate the proposed methodology by analysing two real data examples with distinct dependence structures, the first in the area of astrophysics and the second in agriculture. In Section 5.1 we analyse the two aforementioned MAGIC variables with peculiar dependence that typical bivariate copulas fail to model, while in Section 5.2 we analyse three variables from the 1985 survey of nutritional habits commissioned by the United States Department of Agriculture that have strong reflection asymmetric dependence.
We estimate each marginal distribution non-parametrically by the empirical distribution function of Y j , viz.
where r ij denotes the rank of y ij . Hence we allow the distribution of the continuous margins to be quite free and not restricted by parametric families. We use simple diagnostics to identify the suitable copula family.
Although copula theory uses transforms to standard uniform margins, for diagnostics, we convert the original data to normal scores using the normal quantiles of their empirical distributions. With a bivariate normal scores plot (Nikoloulopoulos et al., 2012) one can check for deviations from the elliptical shape that would be expected with the BVN copula, and hence assess if tail asymmetry exists on the data.
Having discussed why more flexible dependencies are needed we proceed with the K-FNM copula models and construct a plausible K-FNM copula model, to capture any type of dependence. For a baseline comparison, we initially fit the typical copula families presented in Section 3.1. To make it easier to compare strengths of dependence, we convert the estimated copula parameters to Kendall's τ 's, lower tail dependence λ L and upper tail dependence λ U via the relations in Joe (2014, Chapter 4). The estimated Kendall's τ of the K-FNM copula, viz.
To find a copula model that provides a good fit we don't use goodness-of-fit procedures (see e.g., Genest et al. 2009 and the references therein), but we rather adopt the Akaike's information criterion (AIC). The goodnessof-fit procedures involve a global distance measure between the model-based and empirical distribution, hence they might not be sensitive to tail behaviours and are not diagnostic in the sense of suggesting improved parametric models in the case of small p-values (Joe, 2014). For vine copulas, Dissmann et al. (2013) found that pair-copula selection based on likelihood and AIC seem to be better than using bivariate goodness-of-fit tests.
The AIC is −2 × ℓ + 2 × (# model parameters) and a smaller AIC value indicates a copula model better approximates both the dependence structure of the data, and the strength of dependence in the tails.

MAGIC telescope
Ground-based atmospheric Cherenkov telescopes using the imaging technique are a useful addition to the variety of instruments used by astrophysicists. The MAGIC telescope, located on the Canary islands, observes high-energy gamma rays, detecting the radiation emitted by charged particles produced inside electromagnetic showers. Depending on the energy of the primary gamma, Cherenkov photons get collected, in patterns (called the shower image), allowing to discriminate statistically those caused by primary gammas (signal) from the images of hadronic showers initiated by cosmic rays in the upper atmosphere (background). Typically, the image of a shower is an elongated cluster; its long axis is oriented towards the camera center if the shower axis is parallel to the telescope's optical axis, i.e. if the telescope axis is directed towards a point source. If the depositions were distributed as a BVN, this would be an equidensity ellipse. The characteristic parameters of this ellipse are among the image parameters. The energy depositions are typically asymmetric along the major axis (Bock et al., 2004).
We apply the K-FNM copula to 2 out of 10 MAGIC image parameters in Bock et al. (2004). Our objective is to describe the joint distribution of the Length and M3Long that have a peculiar dependence. The data set with the 10 MAGIC image parameters is available from the UCI Machine Learning Repository web page and comprises n = 19, 020 observations. In Figure 4 we depict the bivariate normal scores plot for the Length and M3Long. From the plot, it is revealed that none of the existing parametric families of copulas can adequately model the dependence structure between the variables. Table 6 gives the AICs, estimated copula parameters and their SE, along with the family-based Kendall's τ and tail dependence parameters λ L , λ U for each fitted parametric family of copulas. The AICs show, that among the existing parametric families of copulas, the t δ copula provides the best fit.
Then we exploit the use of the K-FNM copula to construct a plausible copula family to represent the joint distribution of Length and M3Long. Table 7 gives the AICs, estimated copula parameters and their SE, along with the family-based Kendall's τ for different numbers of components. The AICs show, that the 3-FNM copula provides the best fit and provides much better fit than the t δ , since the AIC has been improved by 22473.8 = −4590.3 − (−27064.1). Note in passing that using K > 3, the estimated mixing probabilities for the extra components were close to zero, and, hence, there was no improvement in fit. In Figure 5 we depict the estimated contour plots of the 2-and 3-FNM copulas with standard normal margins, along with the bivariate  The new-parametric family of copulas does not only allow to make accurate inferences that are based on the joint distribution, but also provides superior statistical inference for the parameters of interest, such as Kendall's τ . From Table 9, it is revealed that the Kendall's τ was underestimated using simple parametric families of copulas and a change from a τ -value of 0.23 (t δ -based) to one slightly larger than 0.30 has been achieved.  a strongly asymmetric dependence structure between the intakes of calcium and iron, and between the intakes of calcium and protein. In this section we apply the K-FNM copula to the pairs identified as asymmetric to illustrate that it can sufficiently allow for reflection asymmetric dependence. In Figure 6 we depict the bivariate normal scores plots for the pairs identified as asymmetric. From the plots, it is revealed that there is more skewness in the lower tail. Table 8 gives the AICs, estimated copula parameters and their SE, along with the family-based Kendall's τ and tail dependence parameters λ L , λ U for each fitted parametric family of copulas. The AICs show, that  among the existing parametric families of copulas, the survival BB7 copula provides the best fit for both pairs identified as asymmetric.
Then we exploit the use of the K-FNM copula to construct a plausible copula family to represent the joint distribution of both pairs of variables. It has been revealed, that for both pairs K = 2 mixture components  are sufficient to describe their dependence. Table 9 gives the AICs and estimated 2-FNM copula parameters, along with their standard errors. The AICs show, that between the intakes of calcium and iron and between the intakes of calcium and protein the 2-FNM copula provides better fit than the survival BB7, since the AIC has been improved by 3.1 = −240.6 − (−243.7) and 7.1 = −284.6 − (−291.7), respectively. In Figure 7 we depict the estimated contour plots of the 2-FNM and survival BB7 copulas with standard normal margins, along with the bivariate normal scores plot for the pairs identified as asymmetric in the nutrient data set. From the plots, it is revealed that the 2-FNM copula provides a nearly identical or even better representation of the joint distribution compared to the survival BB7 (best fit amongst the existing parametric families of copulas).

Discussion
We have proposed the K-FNM parametric family of bivariate copulas and demonstrated that the new family is so flexible, it removes the ad-hoc constraints on the tails of existing parametric copula families, and is able to handle various dependence patterns that appear in the existing parametric bivariate copula families.
There exist many bivariate copula families, and as the new copula family can "nearly" approximate any of these, the selection of the appropriate copula family among many candidates can be subsided by solely using the K-FNM copula. This applies when the data are continuous and have weak to moderate dependence and when the data are discrete for any different strength of dependence.
Given that bivariate copulas are building blocks for many multivariate dependence models such as the vine (e.g., Nikoloulopoulos et al. 2012;Panagiotelis et al. 2012;Dissmann et al. 2013) and factor (e.g., Krupskii and Joe 2013;Nikoloulopoulos and Joe 2015;Kadhem and Nikoloulopoulos 2021) copula models, there is much potential of the proposed copula for building up more complex multivariate dependence models. Future research will focus on exploring this potential in modelling real multivariate datasets that have complex dependence structures.