A stochastic recurrence equations approach for score driven correlation models

ABSTRACT We describe stationarity and ergodicity (SE) regions for a recently proposed class of score driven dynamic correlation models. These models have important applications in empirical work. The regions are derived from sufficiency conditions in Bougerol (1993) and take a nonstandard form. We show that the nonstandard shape of the sufficiency regions cannot be avoided by reparameterizing the model or by rescaling the score steps in the transition equation for the correlation parameter. This makes the result markedly different from the volatility case. Observationally equivalent decompositions of the stochastic recurrence equation yield regions with different shapes and sizes. We use these results to establish the consistency and asymptotic normality of the maximum likelihood estimator. We illustrate our results with an analysis of time-varying correlations between U.K. and Greek equity indices. We find that also in empirical applications different decompositions can give rise to different conclusions regarding the stability of the estimated model.


Introduction
Time-variation in correlations is an important feature of economic and nancial data and a crucial ingredient of empirical analyses, such as the assessment of risk and the construction of investment portfolios. Available models for capturing the time-variation in correlations include, amongst many others, the Baba-Engle-Kra -Kroner (BEKK) model of Engle and Kroner (1995), the switching correlation models of Pelletier (2006), the dynamic conditional correlation (DCC) model of Engle (2002) with its adaptation by Aielli (2013), the dynamic equicorrelation (DECO) model of Engle and Kelly (2012), the dynamic copula models of Patton (2009) and Oh and Patton (2012), and the score driven models of Creal et al. (2011Creal et al. ( , 2013 and Harvey (2013); see also the overviews of Bauwens et al. (2006) and Silvennoinen and Teräsvirta (2009).
We focus on the stochastic properties of the recently proposed score driven models of Creal et al. (2011Creal et al. ( , 2013 and Harvey (2013), which we refer to as generalized autoregressive score (GAS) models. These models have proved particularly useful when modeling fat-tailed or skewed data, such as o en encountered in empirical nance; see for example Janus et al. (2014), Harvey and Luati (2014), . The dynamics of correlations and volatilities in these models are driven by the score of the predictive conditional distribution. If the latter is fat-tailed, the score driven dynamics automatically correct for in uential observations, see Creal et al. (2011). In this way, they share some similarities with models from the robust generalized autoregressive conditional heteroskedasticity (GARCH) literature; see for example Boudt et al. (2013). The score driven approach used in the construction of GAS models, however, provides a much more general and uni ed framework for parameter dynamics that is applicable far beyond the volatility and correlation context; see Creal et al. (2013Creal et al. ( , 2014 for a range of other examples. In addition, from a forecasting perspective GAS models o en have a similar performance to correctly speci ed state-space models, see Koopman et al. (2012).
Despite their proven empirical usefulness, the theoretical properties of GAS models are less well developed. The complication lies in the highly nonlinear transition dynamics of the time-varying parameter in typical GAS models. In this article, we contribute to our understanding of the stochastic properties of GAS models for dynamic correlations. The fundamental question is to understand which parameterizations, and parameter values generate stationary and ergodic (abbreviated as SE from now on) time series processes. This o ers an important characterization of the stochastic properties of GAS models. SE properties can be combined with the existence of unconditional moments for the objective function to establish proofs of consistency and asymptotic normality of extremum estimators; see, e.g., Straumann and Mikosch (2006) for maximum likelihood estimation of nonlinear conditional volatility models, Francq and Zakoian (2011) and Boussama et al. (2011) for the case of multivariate GARCH models, and Harvey (2013) for GAS volatility models. For each correlation model we consider, we identify the parameter values that ensure the SE property and call this the "SE region" of the parameter space. To establish SE regions, we follow the classical average contraction argument for stochastic recurrence relations as laid out in the su cient conditions formulated by Bougerol (1993). Given these conditions, we compute numerically the SE regions for a range of empirically relevant models.
We have four contributions. First, we are the rst to derive SE regions for the class of score driven correlation models that have been suggested recently in the literature. We show that the SE su ciency regions take a highly nonstandard form, dissimilar to the familiar triangle and curved triangular shapes for the GARCH model; see Nelson (1990). In an empirical example, we demonstrate that the conditions for nonlinear recurrence equations can be used to ensure stationarity of concrete models, applied on real data. This also extends the results in Blasques et al. (2014b) for volatility and tail index models with univariate observations to the case of time-varying parameters and multivariate observations.
Second, we show that the shape and size of the SE su ciency region as derived from the conditions of Bougerol (1993) depends on the way the stochastic recurrence equation for the correlation is constructed from bivariate uncorrelated noise. In particular, we show that the choice of the square root of the correlation matrix in this construction has a nontrivial e ect on the size of the SE su ciency region.
Third, we show analytically why the correlation case is markedly di erent from the volatility case. For the volatility case, Harvey (2013) shows that modeling the log-volatility renders the information matrix independent of the time-varying volatility. The resulting stochastic recurrence equation becomes linear, and we can use linear process theory to study the SE properties. A similar feature is generally not available for the dynamic correlation model: neither a reparameterization of the correlation nor a scaling of the score steps makes the stochastic recurrence equation a linear process. The reason is that unlike in the volatility case, the GAS steps for the correlation model consist of two separate terms with di erent nonlinearities in the correlation parameter.
Fourth, we use our SE results to establish the consistency and asymptotic normality of the ML estimator.
The remainder of this article is organized as follows. In Section 2, we introduce our model for dynamic bivariate correlations. In Section 3, we state the conditions for the SE su ciency regions. In Section 4, we establish model invertibility as well as the consistency and asymptotic normality of the ML estimator. In Section 5, we determine the SE regions numerically for a number of di erent models and provide an empirical illustration for U.K. and Greek equity indices. We conclude in Section 6. The Appendix gathers the more technical results and derivations.

Score models for correlations
Consider a real-valued bivariate stochastic sequence of observations {y t } t∈N generated by a zero mean elliptical conditional distribution with time-varying correlation matrix R(f t ), where p : R + 0 → R + 0 denotes a real-valued density generator function in the quadratic form y ⊤ t R(f t ) −1 y t , the sequence {f t } t∈N is a real-valued sequence for the time-varying parameter f t , ρ(f t ) ∈ [−δ, δ] with δ ∈ (0, 1] is the dynamic correlation parameter at time t, and R(f t ) is the correlation matrix at time t. For example, if y t is conditionally normal, we have p(x) = (2π ) −1 exp(−x/2). We fully focus the exposition on the correlation case by restricting the variances in (1) to one. Time-varying variances in score driven models have already been dealt with in for example Creal et al. (2011) andHarvey (2013).
The formulation in (1) can also be interpreted as a copula model, see the discussion in Patton (2009). Under the assumptions of stationary marginals and no volatility spillovers, stability conditions for the copula then lead to stability of the whole model. The class of elliptical models is also economically interesting, as it enables an analytic characterization of the resulting portfolio returns and the risk-return trade-o ; see for example Chamberlain (1983), Owen and Rabinovitch (1983), and Hamada and Valdez (2008).
Following Creal et al. (2011Creal et al. ( , 2013, the GAS dynamics for the time-varying parameter f t in (1) take the form with an arbitrary xed initial condition f 1 ∈ F. We de ne the parameter vector θ ∈ as θ = (ω, α, β, λ), where (ω, α, β) ∈ R 3 is a vector of updating parameters, and λ ∈ R n λ allows n λ ≥ 0 density tail shape parameters to be estimated. The time-invariant parameter space is described by ⊆ R 3+n λ . We suppress dependence of the scaling and the score on θ by writing s(f t , y t ) ≡ (f t , y t ; λ), S(f t ) ≡ S(f t ; λ), and q(y t , f t ) ≡ q(y t , f t ; λ). For the case of the bivariate correlation model (1), we obtain Each choice for the scaling function S in (3) gives rise to a new GAS model. An o en used choice of S relates to the local curvature of the score as measured by the information matrix, for example where a is typically taken as 0, 1/2 or 1.
The parameter dynamics in (2) and (3) are intuitive. The time-varying parameter f t is updated in the (scaled) direction of steepest ascent as measured by the scaled conditional log observation density at time t. For example, standard GARCH and BEKK models are special cases of the GAS framework under normality, see Creal et al. (2013). The GAS setup is very general and can also easily be applied outside the correlation context as long as a conditional observation density is available. For other examples, including many new models, we again refer to Creal et al. (2013Creal et al. ( , 2014.

Conditions for stationarity and ergodicity
We follow the approach of Blasques et al. (2014b), who consider a treatment of univariate GAS models. Our SE results build on the stochastic recurrence relations or iterated random functions approach; see Diaconis and Freedman (1999) and Wu and Shao (2004). In particular, we use the su cient conditions of Bougerol (1993) and results in Straumann and Mikosch (2006) to establish, for any xed initial condition f 1 ∈ F, exponentially fast almost sure convergence of the time series {y t , f t (θ , f 1 )} t∈N generated by (1)-(3) to a unique SE solution {y t , f t (θ )} t∈Z .
Let F ⊆ R and Y ⊆ R 2 denote the domains of f t and y t , respectively. We have that ρ : F → (−δ, δ) for 0 < δ ≤ 1 and s : F × Y × → R is almost surely (a.s.) smooth in all its arguments and Lipschitz in f ∈ F. Using the model as speci ed in (1)-(4), we analyze the stochastic properties of y t and f t via the stochastic recurrence equation and {u t } is an independent and identically distributed (i.i.d.) sequence with y t = h(f t )u t . We notice that the dynamics of {f t } in (6) are now written in terms of the innovation sequence {u t } rather than the observed data {y t } by substituting h(f t )u t for y t . As a result, when seen as a function of f , the shape of q(h(f )u t , f ), for every u t , is markedly di erent from that of q(y t , f ), for every y t . This additional dependence on f may either complicate or simplify the nonlinear dependence of f t+1 on f t as embedded in (6). Second, the functional form of (6) is not uniquely de ned. Each square root h(f ) of the correlation matrix R(f ) leads to an observationally equivalent model in y t . The choice of h(f ), however, is not innocuous for determining the size and shape of the SE region, as we see later.
Continuity of φ t in u t for every t can be used to ensure that {φ t } is an i.i.d. sequence of functions. Together with Eq. (6), it then follows directly from Bougerol (1993) and Straumann and Mikosch (2006) that there is a unique SE solution to (1)-(3) if φ t is contracting on average, i.e., if the Lyapunov exponent of the mapping is negative. In particular, we obtain the desired SE result if see Bougerol (1993). In computing the supremum in condition (7), f is treated as a parameter rather than as the random variable f t . For the score driven dynamic correlation model of Section 2, we prove the following result in the Appendix.
We note several features of the result stated in Lemma 1. First, the SE region only depends directly on the parameters α and β, on the functional forms of S(f ) and q(h(f )u t , f ), and on the density of u t . The dependence on the latter enters in two ways, namely through the expectations operator in (8) and through the functional form of k(u t ) in (10). Also note that the expectations operator in (8) does not necessarily require the second moments of u t to exist. Instead, we only require the expectation of |ṗ(u ⊤ t u t )u i,t u j,t | for i, j ∈ {1, 2} to exist. This condition is much weaker, particularly for fat-tailed elliptical densities. For example, it is easily satis ed for the bivariate Cauchy distribution, even though neither the second nor the rst moment exists for this distribution. The continuity and boundedness properties of s can be veri ed immediately for parametric distributional forms, notably for the Student's t density in Section 5.1. 2 Therefore, condition (8) e ectively forms a su cient condition for the SE property of the model. Second, Eq. (8) directly reveals that the correlation case is markedly di erent from the volatility case. For the volatility case, it is shown in Harvey (2013) and Blasques et al. (2014b) that through a clever choice of parameterization h or scaling S the scaled score in recurrence relation (6) can be made independent of f t . The SE condition then reduces to the requirement that |β| < 1. In the volatility case the analogue of the function g(ρ) in (9) is scalar valued. In the correlation case, Eq. (8) shows that through the trivariate nature of the functions g(ρ) andġ(ρ) the contraction condition consists of a number of di erent terms, each with a di erent nonlinear dependence on f . It is impossible to o set all of these simultaneously by a single choice of scaling function or parameterization. This makes the correlation model theoretically more interesting in its own right.
Third, the SE su cient condition in Eq. (8) has an additional degree of exibility provided by the choice of ψ. As follows from the proof of Lemma 1, the function ψ determines which square root h(f ) is used for the correlation matrix R(f ). For the purpose of guaranteeing a proper correlation matrix, de ne ξ(ρ) = arcsin(ρ) − ψ(ρ), and This parametrization gives rise to familiar alternatives for matrix roots. For k ψ = 1/2, we obtain symmetric matrix root of Lemma 2, whereas the choice k ψ = 1 reduces to the Cholesky decomposition with y 1,t = u 1,t and y 2,t = ρ t u 1,t + 1 − ρ 2 t u 2,t . Any choice of ψ and thus of h results in an observationally equivalent model for y t . The dynamic properties of {f t } following from (7), however, depend on the precise ψ that is chosen. We therefore obtain a su cient condition for SE if (7) is satis ed for some choice of ψ ∈ satisfying the conditions formulated in Lemma 1. This yields the additional in mum in condition (8). A similar complication is absent in the volatility case; compare Blasques et al. (2014b) and Harvey (2013).
Fourth, condition (8) simpli es for particular choices of parameterizations and scale functions. For example, if we use the familiar Fisher transformation ρ(f ) = tanh(f t ), we haveρ(f t ) = 1−ρ(f t ) 2 and the entire middle term in (8) vanishes. For this particular parameterization and xing the scaling function to S(f ) ≡ 1, we can even provide further analytical results relating to the optimal choice of the function ψ. Using a Jensen, triangle, and Cauchy-Schwarz inequality, we obtain a stricter su cient condition for SE from (8) as inf where · denotes the standard Euclidean norm. Instead of the Cauchy-Schwarz inequality, we can also use a second triangle inequality to obtain the alternative stricter su cient condition whereġ i and k i are the ith elements ofġ and k, respectively. Using either of the more stringent SE conditions (12) or (13), we obtain the following result.

Lemma 2.
Under the assumptions stated in Lemma 1, setting ψ(ρ) = k ψ arcsin(ρ) with k ψ = 1/2 reaches the functional lower bound for the su cient condition stated in either Eq. (12) or (13). The condition then reduces to |β| + |α| E k(u t ) < 1 for condition (12) and |β| + |α| E|k 1 (u t )| < 1 for condition (13), respectively, where k 1 (u t ) is the rst element of k(u t ). The link function becomes the symmetric matrix root The result in Lemma 2 shows that we uniformly obtain the largest SE region for the stricter conditions (12) or (13) for the symmetric matrix root h in (6). The choice of h in setting up the dynamic Eq. (6) is thus far from innocuous and directly in uences the size and shape of the SE region.

Asymptotic properties of ML estimator
In this section we establish the invertibility of the GAS model, as well as the consistency and asymptotic normality of the maximum likelihood estimator (MLE) for the static parameters θ .
Model invertibility is a critical element in the proof of consistency and asymptotic normality of the MLE since the lter (2) enters the likelihood function and thus must be ensured to have appropriate stochastic properties. Similarly to Straumann and Mikosch (2006), this section uses the contraction condition of Bougerol (1993) in order to ensure model invertibility and bounded moments for the ltering sequence. This is crucial for the asymptotic properties of the MLE since the initialized timevarying parameter and its derivatives enter the likelihood function and its derivatives. The following result builds on the SE nature of the data {y t } t∈Z which follows easily from the SE nature of the true time-varying parameter {f t } t∈Z established in the previous section.
Lemma 3 (Model Invertibility). Let be compact, let {y t } t∈Z generated by (1)-(3) be SE, let the scaled score s be smooth in all arguments and Lipschitz in f ∈ F, and assume that there exists a nonrandom f 1 such that the following statements hold: Here, log + x ≡ max(log x, 0) for x ∈ R + and q(f t , y t ) is the score expression in (4). Then the GAS recursion {f t (θ , f 1 )} t∈N de ned in (2) converges e.a.s. to a unique limit SE process {f t (θ )} t∈Z that admits the representation f t (θ ) = (y t−1 , y t−2 , ...) for every t and some measurable function .
Note that the contraction condition (ii) in Lemma 3 is di erent from the one studied in Lemma 1 since it refers to the ltering equation that takes the data y t as given. By contrast, the contraction property in Lemma 1 looks at the GAS model as a data generating process, and hence de nes the data y t in terms of the true unknown parameter f t and the innovations u t . The invertibility condition above is in particular required in order to make dependence on the xed initial value f 1 vanish in the GAS recursion and therefore in the objective function. To make this transparent, let ℓ t (θ , f 1 ) denote the time t log-likelihood contribution for the vector of static parameters θ , and De ne L T (θ ) and ℓ t (θ ) similar to (14), but with the limiting process f t (θ ) replacing f t (θ , f 1 ). Theorem 1 establishes the strong consistency of the MLE assuming the identi cation of the true parameter vector θ 0 ∈ . The strong consistency result holds for every initialization of the lter satisfying the conditions of Lemma 3.
Theorem 2 (Asymptotic Normality). Let the conditions of Theorem 1 hold, and let θ 0 be a point in the interior of . Furthermore, let the rst and second derivatives of the log likelihood contributions ℓ t (θ ) be asymptotically SE and satisfy E|l t (θ 0 )| 2 < ∞ and E sup θ∈ |l t (θ )| < ∞. Then, for every f 1 ∈ F, the ML estimatorθ T (f 1 ) satis es The theoretical results in the previous two theorems are supported by unreported simulation experiments. We nd that increasing the sample size brings the ML estimates over repeated Monte Carlo replications closer to their true values in a controlled setting. Moreover, we also nd that the distribution of the estimator approaches the normal distribution for increasing sample sizes.
Theorems 1 and 2 rely on high-level conditions and are applicable to the generic setting. For particular models, however, the statements can be further particularized into more transparent low-level conditions. Corollaries 1 and 2 provide such primitive conditions for the consistency and asymptotic normality of the MLE for the parameters of two bivariate models based on the normal and Student's t distribution, respectively. We use unit scaling and a parameterization given by ρ t = δ · tanh(f t ). This automatically bounds the correlation to the interval (−δ, δ) to render the correlation matrix positive de nite. The practitioner can choose a value of δ arbitrarily close to unity to obtain the contraction conditions in Lemmas 1-3. Proofs of these corollaries can be found in Section D of the Supplementary Appendix.

Corollary 1.
There exists a compact parameter space satisfying 0 < δ < 1 and α = 0, where the MLE for the parameters of the bivariate Gaussian model with δ · tanh link function and unit scaling is consistent and asymptotically normal.

Corollary 2.
There exists a compact parameter space satisfying 0 < δ < 1, α = 0, and 2 < ν, where the MLE for the parameters of the bivariate Student's t model with δ · tanh link function and unit scaling is consistent and asymptotically normal.
The implied parameter space , constrained by the above conditions, depends not only on α and β, but also on distributional and parametrization choices. The shapes and sizes of the contraction regions are further analyzed in Section 5. Further analytical and numerical details are developed in Sections C and D in the Supplemental Appendix.

Numerical results
Alternative choices for the error density generator p, the scaling function S, the parameterization ρ, and the matrix square root h give rise to di erent models with di erent SE regions. For a number of these choices, we check for every pair (α, β) whether the su cient condition (8) is satis ed. We plot the SE region in the (α, β)-plane by numerically identifying, for every xed β, the corresponding maximum and minimum values of α that satisfy (8).
To x ideas, consider the class of Student's t densities for u t as in Creal et al. (2011). The Fisher transformation ρ(f t ) = tanh(f t ) ensures proper value for the correlation parameter. As indicated in Section 3, this also simpli es the evaluation of the SE condition in Lemma 1. For the scaling function S, we adopt the three choices based on the information matrix as presented in Eq. (5).
Next, we investigate the sensitivity of the SE region to the choice of matrix root h(·). For this, we consider two prominent alternatives, both described by ψ(ρ) = k ψ arcsin(ρ) for k ψ ∈ R. The rst alternative is the symmetric matrix root of Lemma 2 with k ψ = 1/2. The second is the familiar (lower triangular) Cholesky decomposition, which is obtained by setting k ψ = 1.
To numerically evaluate the su cient SE condition (8), we need to solve an optimization problem within an integration procedure. As the state equation is univariate, the integral can be evaluated via a quadrature rule. We can gain further numerical e ciency for the inner optimization problem by storing maximum and minimum values of S(f ) q(h(f )u t , f ) for each point u t and recycling these for evaluation at di erent points in the (α, β)-plane. Local optima are avoided by evaluating the function over a wide grid and by noting that for the Student's t distribution (∂/∂f ) i s(f , y t ) → 0 as |f | → ∞ for all i > 1. We can further halve the computation time by noting that in our setting ∂φ t (f ; θ )/∂f = ∂φ t (f ; −θ )/∂f .
In panel (a) of Fig. 1, we present the results for the normal distribution and the symmetric root h(f ), i.e., k ψ = 1/2. The gure contains three di erent regions, each one corresponding to a di erent form of scaling in Eq. (5). Points inside each region are combinations of (α, β) for which the su cient condition (8) is met. The shape of the su cient SE region is antisymmetric around the origin due to the absolute signs in (8), such that we only plot its upper half. The region also shows a nonmonotonic curvature, particularly in the second quadrant. This feature is due to the use of absolute values, the change in the location of the supremum in (8) in the second quadrant, and a shi in the relevant region of integration if the derivative of S(f )q(h(f )u t , f ) changes sign.
An interesting feature in Fig. 1 is the behavior of the region for square root inverse information matrix scaling, a = 1/2 in (5). First note that a = 1/2 has the property that the update via s(f t , y t ) is invariant with respect to reparametrizations of f t . Furthermore, under correct speci cation the steps S(f )q(y t , f ) in (4) are by construction martingale di erences with unit variance; see also Creal et al. (2013). This implies that {f t } t∈N converges to a covariance stationary process as long as |β| < 1. The region in Fig. 1 shows that |β| < 1 is necessary, but not su cient for (8) to be satis ed. This relates directly to discussions in the GARCH literature, where in the univariate setting covariance stationarity is a more restrictive condition than strict stationarity, but the relation between the two remains an open question in a multivariate context; see for example Boussama et al. (2011).
Panel (b) in Fig. 1 shows the di erent SE regions for a di erent choice of matrix root h(f ), namely the Cholesky decomposition. It is clear that the su ciency regions in the (α, β)-plane are smaller than the corresponding regions for the symmetric root. As models constructed with a symmetric root and a Cholesky root are observationally equivalent, we can take the larger regions as su cient regions for SE to hold; see also Lemma 2. The di erences make clear that the choice of matrix root is important for determining the size of the region either analytically or numerically.
We provide more SE regions in the Supplemental Appendix, including regions based on the further inequalities used to establish Lemma 2. In particular, we show that the SE regions for the Student's t distribution under square root information matrix scaling (a = 1/2) are smaller for fatter tails if the Cholesky decomposition is used (k ψ = 1), while the converse holds if we use the symmetric root decomposition (k ψ = 1/2). In Section 5.2, we document how this wedge may also become empirically relevant.

Empirical illustration
In this section, we study the time-varying correlation between the London and Athens stock exchange indexes. We take daily returns of the Financial Times Stock Exchange (FTSE) 100 and the Athex Composite over the period January 1, 2002 to March 2, 2013. We are particularly interested in whether there are indications that the correlation between these two markets changed over the course of the European sovereign debt crisis.
To focus on the correlation part of the model, we rst lter both series using AR-GARCH type models; see also Patton (2009). The mean of both series is modeled by a rst order autoregressive progress. We nd a strong leverage e ects in both series and capture these by a GJR(1,2) model of Glosten et al. (1993) for the FTSE index, and an EGARCH(1,2) speci cation of Nelson (1991) for the Athens index, respectively. 3 We test the null hypothesis of constant and zero residual correlation against time-varying alternatives using a Nyblom test of the form where B(z) and B b (z) = B(z) − zB(1) denote a standard Brownian motion and a standard Brownian bridge, respectively. The average correlation is estimated byρ = T −1 T t=1 x 1t x 2t andσ 2 is a heteroskedasticity and autocorrelation consistent estimator of the long-run variance of (x 1t x 2t −ρ), withσ 2 0 de ned similarly whenρ is set to 0. 4 By letting x t denote the univariate volatility-ltered series, i.e., x t := y t in the notation of our article, we nd strong evidence for time-varying correlations. We therefore use the ltered univariate series to estimate the GAS model from Section 2 with a time varying correlation coe cient. The Nyblom test can further be used as a diagnostic tool for remaining timevariation in dynamic correlations when applied to estimated residuals x t :=û t = h −1( f t )y t . The results are shown in Fig. 2 and Table 1.
that correlations exhibit clear signs of time variation. Correlations lie around 0.4 up to about 2006, then increase to about 0.6, and come down substantially to around 0.2 during the sovereign debt crisis. On top of this slow variation, there are also substantial dynamic patters at higher frequencies.
The possibly lower correlations between the U.K. and Greek stock indices are interesting economically, particularly given the stable correlation pattern between the two series during almost the whole of the prelude, height, and a ermath of the preceding nancial crisis (2007)(2008)(2009). The nancial crisis, apparently, did not substantially alter the real economic linkages between the two economies as evidenced by the stable dynamic of the correlations between between the two stock markets. It is only a er the announcement of the record Greek de cit late 2009 and the subsequent actions that gave rise to the European sovereign debt crisis, that the link between the euro-denominated Greek stock market and the sterling-denominated FTSE is broken. The correlations remain at low levels even a er the nonstandard monetary policy actions by the European Central Bank. Table 1 provides the parameter estimates of the GAS models. We provide a benchmark by estimating a simple exponentially weighted moving average (EWMA) scheme for the correlation based on the recursion ρ t = tanh(f t ) and f t+1 = ω + βf t + (1 − β)y 1t y 2t , see also the Gaussian dynamic copula speci cation of Patton (2006).
We see that the GAS model is empirically useful both in terms of in-sample likelihood t and improving the diagnostics for time-varying correlation. All models indicate that the correlation parameter is highly persistent: the estimated values of β all lie very close to 1. The scaling function for the score only has a mild e ect on the model's t: the likelihood values are similar for a = 0, 1/2, 1. The degrees of freedom parameter λ is estimated around 9. This is substantially fatter-tailed than the normal, but also substantially lighter-tailed than the Student's t distribution with the degrees of freedom xed at 5. The di erences in likelihood values, as well as Akaike (AIC) and Bayes (BIC) information criteria indicate that estimating the degrees of freedom improves the t of the model substantially. Furthermore, the estimation of the tail parameter λ also improves the t in terms of model diagnostics: only the GAS model with estimated degrees of freedom passes the Nyblom tests for remaining time-variation in the residual cross-correlations.
We plot the SE region boundaries for the Cholesky and the symmetric root decomposition in panel (b) of Fig. 2. The estimated values of α and β for the t(λ)-GAS speci cation are indicated by the cross mark. The results clearly con rm the importance of the choice of ψ in verifying the SE properties. For the symmetric root-based region, we obtain stationarity and ergodicity at the estimated parameter values. For the Cholesky decomposition, by contrast, we fail to obtain this result. As condition (8) takes the in mum over ψ and thus the widest region in panel (b) over all admissible decompositions h(f t ), the Cholesky decomposition is in this setting less powerful to discriminate SE from non-SE regions of the parameter space. We stress again that all of these regions are only based on su cient conditions, and that the actual regions may be wider.
For all models considered, the bottom lines in Table 1 indicate whether the estimated parameters lie inside the SE region. For the Gaussian models with unit (a = 0) and inverse square root information matrix scaling (a = 1/2) the choice of matrix decomposition does not have an impact. Both the symmetric root (k ψ = 1/2 in (11)) and Cholesky root (k ψ = 1 in (11)) indicate that the estimated parameters are inside the SE region and satisfy the average contraction condition. For inverse information matrix scaling, however, and for the Student's t based models, we nd a similar di erence as in panel (b) of Fig. 2: we cannot ensure SE properties based on the Cholesky decomposition, whereas we can do so for the symmetric root. This again highlights that the use of di erent constructive devices such as di erent matrix decompositions is empirically relevant for the veri cation of su cient SE conditions in a multivariate setting.

Concluding remarks
We have derived su cient regions for stationarity and ergodicity for a new class of score driven dynamic correlation models. The regions exhibit a highly nonstandard shape. Moreover, we have shown that the shape and size of the SE regions depends on the type of matrix root that is chosen in checking the su cient conditions of Bougerol (1993). Furthermore, we have seen how the stationarity conditions can be used in establishing results for consistency and asymptotic normality of the maximum likelihood estimator. The numerical stability conditions were supported by an empirical investigation of the time varying correlation between U.K. and Greek stock markets. We found a substantial drop in the linkages between the sterling-denominated U.K. market and the euro denominated Greek market over the course of the European sovereign debt crisis. Such a break in dependence between markets, however, was absent during the preceding global nancial crisis.
Proof of Lemma 3. The proof follows immediately from Theorem 2.8 in Straumann and Mikosch (2006) by noting that our conditions (i) and (ii) imply conditions S.1 and S.2 in their theorem.
Proof of Theorem 1. We follow Blasques et al. (2014a) and appeal to Theorem 3.4 of White (1994), and obtainθ T (f 1 ) a.s. → θ 0 from the uniform convergence of the criterion function and the identi able uniqueness of the maximizer θ 0 ∈ de ned, e.g., in White (1994).
Existence. Note that L T (θ , f 1 ) is a.s. continuous in θ ∈ if each likelihood contribution is. This is obtained by the smoothness of the scaled score s : F × Y × → R and the resulting continuity of f t in θ as a composition of t continuous maps. Due to the compactness of , by Weierstraß theorem, the arg max set of the likelihood is nonempty a.s., and henceθ T exists.
→ 0 ∀f 1 ∈ F. It thus follows → 0 as t → ∞ by application of the continuous mapping theorem (see also Theorem 2.3[i] in van der Vaart, 2000) for ℓ : The second term in (A7) vanishes by an application of the ergodic theorem for separable Banach spaces (Theorem 2.7 in Straumann and Mikosch, 2006) to the sequence {L T (·)} with elements taking values in C( ), so that sup θ∈ |L T (θ ) − L ∞ (θ )| a.s. → 0 as T → ∞. This is obtained under the moment assumption E sup θ∈ |ℓ t (θ )| < ∞, by the SE nature of the sequence {ℓ t } t∈Z , which is implied by continuity of ℓ on the SE sequence {(f t (y t−1 , ·), y t )} t∈Z , which is SE using Lemmas 1 and 3 and Proposition 4.3 in Krengel (1985).
Proof of Theorem 2. We make use of the asymptotic normality conditions found, e.g., in Theorem 6.2 of White (1994). These conditions are as follows: (i) the strong consistency ofθ T a.s. → θ 0 ∈ int( ); (ii) the a.s. twice continuous di erentiability of L T (θ , f 1 ) in θ ∈ ; (iii) the asymptotic normality of the score √ Tl t θ 0 , f where {f  (A8) and (v) the nonsingularity ofL ∞ (θ ) = EL T (θ ) = I(θ ). Weak Convergence of the Score. The score sequence {l t (θ , f 1 )} depends not only on the data {y t } and the initialized process {f t (θ , f 1 )} but also on the derivative processes {ḟ t (θ , f 1 )} ≡ {∂f t (θ , f 1 )/∂θ }. As such, the limit SE nature of the score and its smoothness properties imply thatl t (θ , f 1 ) = ℓ y t , f t (θ , f 1 ),ḟ t (θ ,ḟ 1 ) is a continuous function of the limit SE process y t , f t (θ , f 1 ),ḟ t (θ ,ḟ 1 ) and thus SE by Theorem 36.4 in Billingsley (1995). Note that the data {y t } is SE under the conditions of Lemma 1, and the process {f t (θ , f 1 )} and its derivative {ḟ t (θ ,ḟ 1 )} both converge e.a.s. to an SE limit under the conditions of Lemma 3 since it is easy to show that the contraction condition in (ii) of Lemma 3 for {f t (θ , f 1 )} is also the relevant contraction condition for any derivative process {f (i) t (θ , f (i) 1 )} of any order; see Blasques et al. (2014a).
The remainder of the proof now follows along similar lines as in Blasques et al. (2014a, Theorem 4). As a continuous function of the SE process {y t , f t (θ ),ḟ t (θ )}, the score sequence {l t (θ )} is also SE, and we can apply the central limit theorem (CLT) for SE martingales in Billingsley (1961) As a result, we can also conclude by Theorem 18.10[iv] in van der Vaart (2000) that √ TL T θ 0 , f since the exponential rate in (A10) implies that √ T L T θ 0 , f (0:1) 1 ) −L T θ 0 ) a.s.
Uniform Convergence of Second Derivatives. We use the triangle inequality to write The rst term vanishes a.s. with T → ∞ by application of a continuous mapping theorem because the maintained smoothness assumptions ensure thatL T (·, f 1 ) is continuous in its arguments {(y t , f Finally, the nonsingularity of the limitL ∞ (θ ) = El t (θ ) = I(θ ) in (v) is implied by the uniqueness of θ 0 as a maximizer ofL ∞ (θ ) in .