Optimal designs for copula models

Copula modelling has in the past decade become a standard tool in many areas of applied statistics. A largely neglected aspect, however, concerns the design of related experiments: in particular, whether the estimation of copula parameters can be enhanced by optimizing the experimental conditions, and how robust the parameter estimates are with respect to the type of copula employed. In this paper an equivalence theorem for (bivariate) copula models is provided that allows the formulation of efficient design algorithms and quick checks of whether designs are optimal or at least efficient. Some examples illustrate that in practical situations considerable gains in design efficiency can be achieved. A natural comparison between different copula models with respect to design efficiency is provided as well.


Introduction
Due to their flexibility in describing dependencies and the possibility of separating marginal and joint effects, copula models have become a popular device for coping with multivariate data in many areas of applied statistics, e.g. insurance (Valdez, 1998), econometrics (Trivedi and Zimmer, 2006), medicine (Nikoloulopoulos and Karlis, 2008), marketing (Danaher and Smith, 2011), spatial extreme events (Wadsworth and Tawn, 2012), time series analysis (Patton, 2012), even sports (McHale and Scarf, 2011) and particularly finance (Cherubini et al., 2004).
The concept of copulas, however, has only rarely been employed in experimental design, with the notable exceptions of spatial design in Li et al. (2011) and Pilz et al. (2012), and sequential trials in Schmidt et al. (2014). The design question for copula parameter estimation has, to our knowledge, been raised only in Denman et al. (2011), where a brute-force simulated annealing optimization was employed for the solution of a specific problem. With this paper we provide the necessary theory for fully embedding the situation into optimal design theory. In particular, we provide a Kiefer-Wolfowitz type equivalence theorem (Kiefer and Wolfowitz, 1960) in Section 3 as a basis for a substantial analysis of the arising issues in the example sections.
To be more concrete, let us consider a vector x^T = (x_1, ..., x_r) ∈ X of control variables, where X ⊂ ℝ^r is a compact set. The observations in a regression experiment are the vectors y(x) = (y_1(x), ..., y_m(x)) with expectations E[y_i(x)] = η_i(x, β), where β = (β_1, ..., β_k) is an unknown (trend) parameter vector to be estimated and the η_i are known functions. In the remainder of the paper we will focus on the case m = 2, but generalizations of our results are possible.
Let F_{Y_i}(y_i(x, β)) denote the margin of each Y_i for i = 1, ..., m, and let c_Y(y(x, β), α) denote the joint probability density function of the random vector Y, where α = (α_1, ..., α_l) is a vector of unknown (copula) parameters.
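As a minimal numerical sketch of this construction (not taken from the paper; the FGM copula and standard normal margins are illustrative assumptions), the joint density is obtained, via Sklar's theorem, as the copula density evaluated at the margins times the marginal densities:

```python
# Sketch: joint density f(y1, y2) = c(F1(y1), F2(y2); alpha) * f1(y1) * f2(y2).
# FGM copula and N(0, 1) margins are illustrative choices, not the paper's models.
from scipy.stats import norm

def fgm_copula_density(u, v, alpha):
    """Density of the Farlie-Gumbel-Morgenstern copula, |alpha| <= 1."""
    return 1.0 + alpha * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)

def joint_density(y1, y2, alpha):
    """Joint pdf of (Y1, Y2) with standard normal margins and an FGM copula."""
    u, v = norm.cdf(y1), norm.cdf(y2)
    return fgm_copula_density(u, v, alpha) * norm.pdf(y1) * norm.pdf(y2)
```

For alpha = 0 the copula density is identically 1 and the joint density factorizes into the product of the margins, recovering independence.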

Design issues
We need to quantify the amount of information on both sets of parameters (trend β and copula α, respectively) provided by the regression experiment. This is embodied in the Fisher information matrix, which for a single observation is the (k + l) × (k + l) block matrix

m(x, β, α) = ( m_ββ(x)    m_βα(x)
               m_βα(x)^T  m_αα(x) ),

where the (k × k) submatrix m_ββ(x) has (i, j)th element E[(∂ log c_Y/∂β_i)(∂ log c_Y/∂β_j)], and the submatrices m_βα(x) (k × l) and m_αα(x) (l × l) are defined accordingly.
Here we model the dependence between Y_1 and Y_2 with a copula function C(F_{Y_1}(y_1(x, β)), F_{Y_2}(y_2(x, β)); α) and obtain the density of that copula from Definition 2. A probability distribution ξ on the design space X is called a design measure; the class Ξ of all such distributions is formed on the Borel sets of X.
The information matrix of a general design measure is M(ξ, β, α) = E[m(x, β, α)], where x is a random vector with distribution ξ. So for independent observations at support points x_1, ..., x_r with weights w_1, ..., w_r, the corresponding information matrix is M(ξ, β, α) = Σ_{i=1}^r w_i m(x_i, β, α). Approximate optimal design theory is concerned with finding ξ*(β, α) that maximizes some scalar function φ(M(ξ, β, α)), the so-called design criterion. In the following we will consider only D-optimality, i.e. the criterion φ(M) = log det M if M is nonsingular. There exist several well-written monographs on optimal design theory and its applications, but in this paper we mainly follow the style and notation of Silvey (1980).
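The weighted sum defining M(ξ, β, α) and the D-criterion can be sketched as follows; the single-point information m(x) = f(x) f(x)^T for a linear trend f(x) = (1, x) is an illustrative stand-in for the copula-model information matrix, not the paper's model:

```python
import numpy as np

def info_matrix(design, m_single):
    """M(xi) = sum_i w_i * m(x_i) for a discrete design measure,
    given as a list of (support point, weight) pairs."""
    return sum(w * m_single(x) for x, w in design)

def d_criterion(M):
    """D-optimality criterion phi(M) = log det M (M nonsingular)."""
    return np.log(np.linalg.det(M))

def m_linear(x):
    """Illustrative single-observation information for the trend f(x) = (1, x)."""
    f = np.array([1.0, x])
    return np.outer(f, f)

xi = [(0.0, 0.5), (1.0, 0.5)]   # two support points, equal weights
M = info_matrix(xi, m_linear)
```

For this design M = [[1, 0.5], [0.5, 0.5]], so det M = 0.25 and φ(M) = log 0.25.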

Equivalence theory
The cornerstone of a theoretical investigation into optimal design is usually the formulation of a Kiefer-Wolfowitz type equivalence relation, which is given in the following theorem. It is a generalized version of a theorem given without proof in Heise and Myers (1996) and follows from a multivariate version of the basic theorem given in Silvey (1980); its full proof can be found in the supplementary material.
Theorem 1. For a local parameter vector (β̄, ᾱ), the following properties are equivalent:
(i) the design ξ* maximizes log det M(ξ, β̄, ᾱ);
(ii) the design ξ* minimizes max_{x∈X} tr{M^{-1}(ξ, β̄, ᾱ) m(x, β̄, ᾱ)};
(iii) max_{x∈X} tr{M^{-1}(ξ*, β̄, ᾱ) m(x, β̄, ᾱ)} = k + l.
This theorem allows the use of standard design algorithms such as those of the Fedorov-Wynn type (Fedorov, 1971; Wynn, 1970). It also provides simple checks for D-optimality through the maxima of d(x, ξ) = tr{M^{-1}(ξ, β̄, ᾱ) m(x, β̄, ᾱ)}, which is usually called the sensitivity function.
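The optimality check supplied by the theorem can be sketched numerically: evaluate the sensitivity function over a grid and compare its maximum with the number of parameters. The linear trend f(x) = (1, x) on [-1, 1], with k + l = 2, is an illustrative assumption used only to exercise the check:

```python
import numpy as np

def m_linear(x):
    """Illustrative single-observation information for the trend f(x) = (1, x)."""
    f = np.array([1.0, x])
    return np.outer(f, f)

def sensitivity(x, M_inv, m_single):
    """Sensitivity function d(x, xi) = tr(M^{-1}(xi) m(x));
    D-optimality holds iff its maximum over X equals k + l."""
    return np.trace(M_inv @ m_single(x))

# Candidate D-optimal design: equal weights on the endpoints of [-1, 1].
xi_star = [(-1.0, 0.5), (1.0, 0.5)]
M = sum(w * m_linear(x) for x, w in xi_star)
M_inv = np.linalg.inv(M)

grid = np.linspace(-1.0, 1.0, 201)
d_max = max(sensitivity(x, M_inv, m_linear) for x in grid)
```

Here M is the identity, d(x, ξ*) = 1 + x^2, and the maximum value 2 = k + l is attained exactly at the support points, confirming D-optimality.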
Definition 3. For the comparison of two different designs define the ratio
eff_D(ξ, ξ') = ( det M(ξ, β̄, ᾱ) / det M(ξ', β̄, ᾱ) )^{1/(k+l)},
where (k + l) is the number of model parameters; this ratio is called the D-efficiency of the design ξ with respect to the design ξ'.
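This ratio is straightforward to compute once the two information matrices are available; the following sketch again uses the illustrative linear trend f(x) = (1, x) rather than one of the copula models:

```python
import numpy as np

def d_efficiency(M_xi, M_ref, n_par):
    """D-efficiency (det M(xi) / det M(xi'))^(1/(k+l)), n_par = k + l."""
    return (np.linalg.det(M_xi) / np.linalg.det(M_ref)) ** (1.0 / n_par)

def m_linear(x):
    f = np.array([1.0, x])
    return np.outer(f, f)

# Reference design (D-optimal on [-1, 1]) versus a candidate design.
M_ref = sum(w * m_linear(x) for x, w in [(-1.0, 0.5), (1.0, 0.5)])
M_xi  = sum(w * m_linear(x) for x, w in [(0.0, 0.5), (1.0, 0.5)])
eff = d_efficiency(M_xi, M_ref, n_par=2)
```

With det M_ref = 1 and det M_xi = 0.25 the candidate design achieves a D-efficiency of 0.5, i.e. it would need twice as many observations to match the reference design.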
Note that the resulting optimal designs will now depend not only upon the trend model structure, but also upon the chosen copula and, through the induced nonlinearities, potentially also on the unknown parameter values of α and β, which is why we resort to localized designs around the values (β̄, ᾱ). A main question of course is whether ignorance or wrong guesses of the copula function and/or its parameters may lead to inefficient designs.

Tools
For that purpose let us here give the list of copulas used in our examples (for more details see e.g. Nelsen, 2006, or Durante and Sempi, 2010). We provide each copula function along with the so-called Kendall's τ, a dependence measure that allows us to conveniently relate different copulas (for a definition and a more exhaustive comparison see Michiels and De Schepper, 2008).
Definition 4.
1. Product Copula C(u, v) = uv, which represents the independence case.
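When relating copula families through Kendall's τ, closed-form parameter conversions exist for several of them. The following sketch (illustrative, not from the paper) uses the standard relations τ = α/(α + 2) for the Clayton copula and τ = 1 − 1/α for the Gumbel copula:

```python
def clayton_alpha(tau):
    """Clayton parameter from Kendall's tau, using tau = alpha / (alpha + 2)."""
    return 2.0 * tau / (1.0 - tau)

def gumbel_alpha(tau):
    """Gumbel parameter from Kendall's tau, using tau = 1 - 1/alpha."""
    return 1.0 / (1.0 - tau)
```

Such conversions allow comparisons of designs across copula families at a common strength of dependence, which is the comparison principle used in the examples below.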

The linear case
Let us first consider a simple example reported in Fedorov (1971). For each design point x, we may observe an independent pair of random variables Y_1 and Y_2. This case is covered by Theorem 1 by employing the product copula and Gaussian margins. The optimal design which we have computed is given in Figure 1 and is the same as reported in Fedorov (1971), namely
ξ* = { w_i : 0.16, 0.28, 0.23, 0.33
       x_i : 0, 0.38, 0.76, 1.0 }.
Let us consider a more general case, when the joint distribution is described by a Gaussian copula and we thus allow the random variables Y_1 and Y_2 to be dependent. In this case the joint probability function of the random vector is built from the copula
C(u, v; α) = Φ_2(Φ^{-1}(u), Φ^{-1}(v); α),
where Φ_2(·, ·; α) denotes the bivariate standard normal cumulative distribution function with correlation α ∈ (−1, 1) and Φ denotes the cumulative distribution function of the standard normal distribution N(0, 1) (see Meyer, 2013).
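The Gaussian copula function above can be evaluated directly from standard routines; a minimal sketch (illustrative, not the paper's code):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def gaussian_copula(u, v, alpha):
    """C(u, v; alpha) = Phi2(Phi^-1(u), Phi^-1(v); alpha), -1 < alpha < 1,
    where Phi2 is the bivariate standard normal cdf with correlation alpha."""
    z = np.array([norm.ppf(u), norm.ppf(v)])
    cov = np.array([[1.0, alpha], [alpha, 1.0]])
    return multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(z)
```

For alpha = 0 this reduces to the product copula C(u, v) = uv, and positive correlation pushes C(u, v) above uv, reflecting positive dependence.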
Our computations gave rise to the following
Corollary 2. For different values of α the optimal design is the same as for the independence case, i.e. the Gaussian case with α = 0.
Note that the sensitivity function now has a different scaling (with a maximum at 7), as we have an additional copula parameter. This corollary, however, is hardly surprising, as it coincides with the classic findings for the multivariate Gaussian distribution by Krafft and Schaefer (1992).
But for a contrast consider now the Farlie-Gumbel-Morgenstern (FGM) copula
C(u, v) = uv[1 + α(1 − u)(1 − v)].
Following our approach, we must calculate the density of this copula, which eventually leads to integral expressions for the entries of the information matrix. These integrals are not analytically solvable, but we can evaluate them numerically and use the algorithm in order to find the optimal designs.
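The same numerical-integration approach can be illustrated on a simpler quantity: the Kendall's τ of the FGM copula, computed via the general identity τ = 4 E[C(U, V)] − 1 and checked against the known closed form τ = 2α/9 (so |τ| ≤ 2/9 for this family). A sketch, not taken from the paper:

```python
from scipy import integrate

def fgm_cdf(u, v, alpha):
    """FGM copula C(u, v) = u*v*(1 + alpha*(1-u)*(1-v)), |alpha| <= 1."""
    return u * v * (1.0 + alpha * (1.0 - u) * (1.0 - v))

def fgm_density(u, v, alpha):
    """FGM copula density c(u, v) = 1 + alpha*(1-2u)*(1-2v)."""
    return 1.0 + alpha * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)

def kendall_tau(alpha):
    """Kendall's tau = 4 * int int C(u,v) c(u,v) du dv - 1,
    evaluated by numerical double integration."""
    val, _ = integrate.dblquad(
        lambda v, u: fgm_cdf(u, v, alpha) * fgm_density(u, v, alpha),
        0.0, 1.0, lambda u: 0.0, lambda u: 1.0)
    return 4.0 * val - 1.0
```

The restricted τ range of the FGM family explains why it is examined only for moderate dependence in the comparisons below.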
The results are summarized in Table 1, which displays the loss in D-efficiency incurred by using the optimal design ξ* from (6) compared to the respective optimal designs for various copula models and values of Kendall's τ. It can be seen that these losses are generally quite small, except perhaps for extreme values of τ in the FGM model.

A binary bivariate model
Let us now do the same for a more elaborate case with a potential application in clinical testing. We consider a bivariate binary response (Y_{i1}, Y_{i2}), i = 1, ..., n, with four possible outcomes {(0, 0), (0, 1), (1, 0), (1, 1)}, where 1 usually represents a success and 0 a failure (of e.g. a drug treatment). For a single observation denote the joint probabilities of Y_1 and Y_2 by p_{y_1 y_2} = pr(Y_1 = y_1, Y_2 = y_2) for y_1, y_2 = 0, 1. Now, define
p_{11} = C(π_1, π_2; α), p_{10} = π_1 − p_{11}, p_{01} = π_2 − p_{11}, p_{00} = 1 − π_1 − π_2 + p_{11},   (8)
where π_1 and π_2 denote the marginal probabilities of success. The complete log-likelihood for the bivariate binary model is then given by
ℓ(β, α; ξ) = Σ_i w_i ℓ(β, α; x_i),
where the w_i are the design weights and the log-likelihood for a single observation is given by
ℓ(β, α; x) = Σ_{y_1, y_2 = 0, 1} 1{Y_1 = y_1, Y_2 = y_2} log p_{y_1 y_2}.
As shown in Dragalin and Fedorov (2006), the Fisher information matrix for a single observation can then be written as
m(x, β, α) = D^T ( P^{-1} + e e^T / p_{00} ) D,
where p = (p_{11}, p_{10}, p_{01}), P = diag(p), e = (1, 1, 1)^T and D is the Jacobian of p with respect to the model parameters. Some useful formulae for calculating information matrices in copula models can also be found in Schepsmeier and Stöber (2014).
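A numerical sketch of this single-observation information follows. The independent-Bernoulli margins and the direct parametrization θ = (q_1, q_2) are illustrative assumptions used only to check the multinomial formula against the known Bernoulli information; they are not the models of the example below:

```python
import numpy as np

def single_obs_info(p, D):
    """Single-observation Fisher information for the four-outcome model,
    m = D^T (P^{-1} + e e^T / p00) D, with p = (p11, p10, p01),
    P = diag(p), p00 = 1 - sum(p), and D the Jacobian dp/dtheta."""
    p = np.asarray(p, dtype=float)
    p00 = 1.0 - p.sum()
    P_inv = np.diag(1.0 / p)
    e = np.ones((3, 1))
    return D.T @ (P_inv + (e @ e.T) / p00) @ D

# Check: product copula (independence) with theta = (q1, q2) should give
# the diagonal information of two independent Bernoulli observations.
q1, q2 = 0.3, 0.6
p = [q1 * q2, q1 * (1 - q2), (1 - q1) * q2]
D = np.array([[q2, q1],
              [1 - q2, -q1],
              [-q2, 1 - q1]])   # dp/d(q1, q2)
m = single_obs_info(p, D)
```

The off-diagonal entries vanish under independence, and the diagonal recovers 1/(q_j (1 − q_j)) for each margin, as expected.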
The following example was initially proposed in Denman et al. (2011). They assumed marginal probabilities of success given by models on x ∈ [0, 10] with given initial parameter values. They also investigated three different copulas (Frank, Clayton and Gumbel) in order to compare the resulting designs. Note that in their calculations they employed a brute-force simulated annealing algorithm and had no means of checking definitive optimality, which is now possible through the equivalence theorem (Theorem 1) provided here. Note also that the correlation range is restricted for the three chosen copulae, but we are generally not dependent upon this choice (Demirtas, 2013). By ignoring the dependence, i.e. not estimating the copula parameters and using just a four-parameter model, the same optimal design is found for all copulas, namely
ξ* = { w_i : 0.42, 0.36, 0.22
       x_i : > 0, 2.80, 6.79 }.
Using this design as a benchmark, we note the losses in D-efficiency reported in Table 2. These losses are now stronger than in the linear case and seem to grow with the dependence (at least for the Frank and Gumbel copulas), as is intuitive. In Figure 2 we display the designs and sensitivity functions for a representative case. Denman et al. (2011) also compared designs for various copula choices against each other in their Table 8. However, they used the same parameter values for all copulas, which does not seem sensible. We instead provide a comparison along the same Kendall's τ values in Table 3, which naturally shows much smaller discrepancies.

Figure 2: The optimal designs and the sensitivity functions for the binary example (Clayton left, Gumbel right). The copula parameters chosen correspond to Kendall's τ = 0.816.

Discussion
Although the effects of ignoring the copula parameter seem rather mild judging by our examples, we expect stronger effects for some more nonsymmetric copulae (see e.g. Klement and Mesiar, 2006), which are the subject of our current research.
In general, our theory forms the basis for investigating further showcase examples from the literature, e.g. Oakes and Ritz (2000), or eventually for treating mixed discrete/continuous models as in de Leon and Wu (2011). Particularly for the latter, but also quite generally, the methods provided in this paper can thus be expected to be valuable for real applications in clinical trials, environmental sampling, industrial experiments, etc.
It is certainly of interest to extend the methods to models in which the copula parameters themselves are model-dependent, such as in Noh et al. (2013), which we plan to investigate in future research.

Definition 5 (Gâteaux and Fréchet derivative).
Considering two elements M_1 and M_2 in M, the Gâteaux derivative of φ at M_1 in the direction of M_2 is
G_φ(M_1, M_2) = lim_{ε→0+} ε^{-1} [φ(M_1 + εM_2) − φ(M_1)];
the Fréchet derivative of φ at M_1 in the direction of M_2 is
F_φ(M_1, M_2) = lim_{ε→0+} ε^{-1} [φ{(1 − ε)M_1 + εM_2} − φ(M_1)].
According to these definitions, the two derivatives are related through F_φ(M_1, M_2) = G_φ(M_1, M_2 − M_1). Then, if we assume differentiability of φ, both derivatives are linear in their second argument. So, if M is a random matrix, the following equivalence holds: F_φ(M_1, E[M]) = E[F_φ(M_1, M)].
Theorem 3. Suppose, for a fixed parameter vector (β̄, ᾱ), that φ is a concave function on M(β̄, ᾱ) which is also differentiable at all points of M(β̄, ᾱ) where φ(M) > −∞, so that a φ-optimal measure exists.
Then the following are equivalent:
(i) the measure ξ* is φ-optimal;
(ii) F_φ(M(ξ*), m(x)) ≤ 0 for all x ∈ X;
(iii) max_{x∈X} F_φ(M(ξ*), m(x)) = 0.
Proof. Let us prove the theorem by double implication.
By the properties of the function φ, the following relation holds:

D-optimality
Let us consider now as design criterion the function φ(M) = log det M if M is nonsingular, and φ(M) = −∞ otherwise.

Figure 1: Sensitivity function (left axis) and optimal design (right axis) for the Fedorov example.

Table 1: Losses in D-efficiency (in bold) from ignoring the dependence, in percent.

Table 2: Losses in D-efficiency (in bold) from ignoring the dependence, in percent.

Table 3: Losses in D-efficiency from comparing the true copula model with the assumed one for a fixed Kendall's τ value.