Empirical likelihood inference in autoregressive models with time-varying variances

This paper develops an empirical likelihood (EL) inference procedure for the parameters of autoregressive models whose error variances are scaled by an unknown nonparametric time-varying function. Compared with existing methods based on non-parametric and semi-parametric estimation, the proposed test statistic avoids estimating the variance function while retaining an asymptotic chi-square distribution under the null. Simulation studies demonstrate that the proposed procedure (a) is more stable, i.e., depends less on the change points in the error variances, and (b) attains coverage closer to the desired confidence level than the traditional test statistics.


Introduction
In the macroeconomics and finance literature, many time series applications show that ignoring heteroscedasticity often leads to inefficient estimation and unreliable inference. Research on heteroscedasticity has therefore focused mainly on the effects of violating homoscedasticity, usually in two forms: 'conditional heteroscedasticity' and 'unconditional heteroscedasticity'.
The term 'conditional heteroscedasticity' describes non-constant volatility in which future periods of high and low volatility cannot be identified in advance. Engle (1982) and Bollerslev (1986) proposed the ARCH and GARCH models and provided efficient estimation of the mean function by quasi-maximum likelihood and other adaptive procedures. More elaborate GARCH models have since been proposed to allow for conditional heteroscedasticity, for instance varying-coefficient GARCH models (see Polzehl & Spokoiny, 2006) and spline GARCH models (see Engle & Rangel, 2008). Time-varying volatility is often used to describe conditional heteroscedasticity. Drees and Starica (2002) and Starica (2003) analysed S&P 500 return series within a non-stationary framework and found that this approach outperformed GARCH-type models.
'Unconditional heteroscedasticity' applies to variables with identifiable seasonal variability, such as electricity usage. Hansen (1995) considered the linear regression model with deterministically trending regressors only, in which the error is an AR(p) process scaled by a continuous function of time. A nested autoregressive model arises as a special case when the conditional error variance is a function of a covariate that follows a nearly integrated stochastic process with no deterministic drift. For the constant coefficient autoregressive model with time-varying variances (ARTV) discussed in this article, Phillips and Xu (2006) combined ordinary least squares with non-parametric estimation of the variance function to construct three heteroscedasticity-robust test statistics and proved their asymptotic standard normal distributions. Xu and Phillips (2008) proposed heteroscedasticity-robust adaptive estimation for ARTV. The performance of the methods in Phillips and Xu (2006) and Xu and Phillips (2008), however, relies on appropriately selecting the bandwidth used in the non-parametric function estimation.
Motivated by the 'empirical likelihood' (EL) approach, this article aims to develop a test statistic that is more stable, namely, one that depends less on the change points in the error variances and avoids the problem of bandwidth selection. The EL approach was introduced by Owen (1988, 1990, 1991) to construct confidence intervals in a nonparametric setting; see Owen (2001) for an overview. Because the EL approach is nonparametric, the distribution of the data need not be specified, while more efficient parameter estimates can still be obtained. The EL approach lets the data determine the shape of confidence regions without estimating the variance of the test statistic, and it is Bartlett correctable (DiCiccio et al., 1991). It has been applied in many settings: generalised linear models (Kolaczyk, 1994), local linear smoothers (Chen & Qin, 2000), partially linear models (Shi & Lau, 2000), parametric and semi-parametric models in multi-response regression (Chen & Ingrid, 2009), linear regression with censored data (Zhou & Li, 2008), plug-in estimates of nuisance parameters in estimating equations in survival analysis (Li & Wang, 2003; Qin & Jing, 2001), heteroscedastic partially linear models (Lu, 2009), GARCH models (Chan & Ling, 2006), variable selection (Han et al., 2013; Variyath & Chen, 2010), and the analysis of longitudinal data (Qiu & Wu, 2015). Qin and Lawless (1994) linked EL with finitely many estimating equations, which serve as finitely many equality constraints. To the best of our knowledge, there is no published work applying the EL approach to constant coefficient autoregressive models with time-varying variances. This article develops EL inference for constant coefficient autoregressive models with time-varying innovation variance.
The remainder of the paper proceeds as follows. Section 2 describes the autoregressive model with time-varying variances and discusses the main assumptions. Section 3 reviews the existing methods. Section 4 develops the empirical likelihood inference procedure with theoretical guarantees. Section 5 conducts simulation studies to evaluate the finite sample performance of the proposed method compared with alternative methods. Section 6 briefly concludes. Technical details and proofs of the main results are relegated to the Appendix.

Autoregressive model with time-varying variances
The constant coefficient autoregressive model with time-varying variances is described as

Y_t = β_0 + β_1 Y_{t−1} + · · · + β_p Y_{t−p} + u_t = X_{t−1}^⊤ β_o + u_t,  (1)
u_t = σ_t ε_t,  (2)

where ⊤ denotes transpose, X_{t−1} = (1, Y_{t−1}, . . . , Y_{t−p})^⊤ ∈ R^{p+1} is the vector of covariates, and β_o = (β_0, β_1, . . . , β_p)^⊤ ∈ R^{p+1} is the true parameter vector of interest, with β_p ≠ 0 and the lag order p finite and known. We assume that {σ_t} is a deterministic sequence in time t satisfying σ_t = g(t/T), and that {ε_t} is a martingale difference sequence with respect to F_t, where F_t = σ(ε_s : s ≤ t) is the σ-field generated by {ε_s : s ≤ t}, with E(ε_t² | F_{t−1}) = 1, a.s., for all t. Thus the conditional variance of {u_t} is fully characterised by the multiplicative factor σ_t in (2), i.e., E(u_t² | F_{t−1}) = σ_t². Suppose that the data are generated from models (1)-(2) and that we observe a sample of T + p observations, {Y_{1−p}, . . . , Y_0, Y_1, . . . , Y_T}. The main goals are to make inferences about the true parameter vector β_o in models (1)-(2), i.e., testing the null hypothesis

H_0 : β_o = β,  (5)

and constructing a confidence region for β_o. Section 4 presents our proposed empirical likelihood inference, after Section 3 describes the estimation methods of Phillips and Xu (2006).
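For concreteness, a path from models (1)-(2) can be simulated directly. The sketch below is ours, not the authors' code; it takes a user-supplied variance function g (playing the role of σ_t = g(t/T)) and uses standard normal innovations:

```python
import numpy as np

def simulate_artv(T, beta, g, rng=None):
    """Simulate Y_t = beta_0 + beta_1 Y_{t-1} + ... + beta_p Y_{t-p} + g(t/T) eps_t.

    beta = (beta_0, beta_1, ..., beta_p); g maps (0, 1] to positive reals.
    A minimal sketch assuming i.i.d. N(0, 1) innovations eps_t.
    """
    rng = np.random.default_rng(rng)
    beta = np.asarray(beta, dtype=float)
    p = len(beta) - 1
    y = np.zeros(T + p)                       # first p entries are start-up zeros
    for t in range(p, T + p):
        lags = y[t - p:t][::-1]               # (Y_{t-1}, ..., Y_{t-p})
        u = g((t - p + 1) / T) * rng.standard_normal()   # u_t = g(t/T) * eps_t
        y[t] = beta[0] + beta[1:] @ lags + u
    return y
```

Any strictly positive function satisfying condition (A1), e.g. one with a finite number of abrupt breaks, can be passed in as `g`.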
To facilitate the discussion of the main results and the comparison with related existing methods, the following conditions from Phillips and Xu (2006) and Xu and Phillips (2008) are considered.
(A1) σ_t = g(t/T), where g(·) is a measurable and strictly positive function on the interval (0, 1] such that 0 < inf_{r∈(0,1]} g(r) ≤ sup_{r∈(0,1]} g(r) < ∞, and g(r) satisfies a Lipschitz condition except at a finite number of points of discontinuity;
(A2) with L the lag operator, β(L) = 1 − β_1 L − β_2 L² − · · · − β_p L^p has all roots outside the unit circle;
(A3) {ε_t} is a martingale difference sequence with respect to F_t, with E(ε_t² | F_{t−1}) = 1, a.s., for all t;
(A4) sup_t E(|ε_t|^{4ν}) < ∞ for some ν > 1.
Remark 2.1: (i) Under condition (A1), the function g is integrable on the interval (0, 1] to any finite order. For brevity, we write ∫_0^1 g^m(x) dx as g^m for any finite positive integer m ≥ 1. (ii) Condition (A2) is the stability condition which, for a constant g(·) and homoskedastic {ε_t}, would ensure that {Y_t} is stationary or asymptotically covariance-stationary. Under condition (A2), the mean μ of Y_t is given by μ = β_0/(1 − β_1 − · · · − β_p), and Y_t has the Wold representation Y_t = μ + Σ_{j=0}^∞ φ_j u_{t−j}, where the coefficients φ_j are determined by β(L)^{−1} = Σ_{j=0}^∞ φ_j L^j. (iii) Condition (A3) requires {ε_t} to be a martingale difference sequence and, at the same time, stipulates that E(u_t² | F_{t−1}) = g²(t/T) does not depend on past events; in other words, models (1)-(2) are unconditionally heteroscedastic.

Existing methods
Regarding the estimation of β_o in models (1)-(2), Phillips and Xu (2006) studied the ordinary least squares (OLS) estimator β̂ and showed that, under the stated conditions, as T → ∞,

√T (β̂ − β_o) →D N(0, Ω),  (6)

where →D stands for convergence in distribution, Ω = Σ_1^{−1} Σ_2 Σ_1^{−1}, Σ_1 and Σ_2 are (p + 1) × (p + 1) matrices, l_p = (1, . . . , 1)^⊤ ∈ R^p is a vector of ones, and μ and the Wold coefficients φ_j are as defined in Remark 2.1. Since g is typically unknown, the asymptotic covariance matrix Ω in (6) must be estimated, and this can be done in several ways. First, applying a kernel-smoothed weighted sum of squared OLS residuals, in the spirit of Nadaraya (1964) and Watson (1964) for the estimation of regression functions, they proposed a consistent non-parametric estimator of the variance function g²(r) for r ∈ [0, 1],

ĝ²(r) = Σ_{t=1}^T w_{r,t} û_t²,

where û_t = Y_t − X_{t−1}^⊤ β̂ is the OLS residual and the weights are

w_{r,t} = K((r − t/T)/h_T) / Σ_{s=1}^T K((r − s/T)/h_T), t = 1, . . . , T,

with a kernel function K(·) satisfying regularity conditions involving constants C_1 and C_2, and a bandwidth parameter h_T depending on T. The bandwidth h_T is selected by cross-validation, i.e., by minimising the averaged squared prediction error (see Wong, 1983),

CV(b) = T^{−1} Σ_{t=1}^T {û_t² − ĝ²_{−t,b}(t/T)}²,  (10)

with respect to b, where ĝ²_{−t,b}(·) denotes the leave-one-out estimator computed with bandwidth b. Phillips and Xu (2006) suggested the following three consistent estimators of the asymptotic covariance matrix Ω when g is unknown.
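In code, the kernel-weighted variance estimate and a leave-one-out cross-validation search over candidate bandwidths might look as follows. This is our sketch, assuming a Gaussian kernel; the function names are ours, not the paper's:

```python
import numpy as np

def ghat_sq(r, resid, h):
    """Nadaraya-Watson estimate of g^2(r) from squared OLS residuals (a sketch)."""
    T = len(resid)
    t = np.arange(1, T + 1) / T
    K = np.exp(-0.5 * ((r - t) / h) ** 2)    # Gaussian kernel (assumed choice)
    w = K / K.sum()                          # weights w_{r,t}
    return w @ resid**2

def cv_bandwidth(resid, grid):
    """Pick h minimising the leave-one-out averaged squared prediction error."""
    T = len(resid)
    t = np.arange(1, T + 1) / T
    best, best_h = np.inf, grid[0]
    for h in grid:
        err = 0.0
        for i in range(T):
            K = np.exp(-0.5 * ((t[i] - t) / h) ** 2)
            K[i] = 0.0                       # leave observation i out
            pred = K @ resid**2 / K.sum()
            err += (resid[i] ** 2 - pred) ** 2
        if err < best:
            best, best_h = err, h
    return best_h
```

The cross-validation loop mirrors the criterion (10): each squared residual is predicted from the others and the bandwidth with the smallest averaged squared prediction error wins.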
• The first estimator of the asymptotic covariance matrix is Ω̂_1, constructed in sandwich form directly from the OLS residuals.
• The second estimator of the asymptotic covariance matrix is Ω̂_2, built from the matrix Σ̂_1, where μ̂ and φ̂_j correspond to replacing β_o, in the expressions for μ and φ_j in Remark 2.1, with β̂.
• The third estimator of the asymptotic covariance matrix is Ω̂_3, built from the matrix Σ̂_2.
Based on these three estimators Ω̂_j of the true covariance matrix Ω, Phillips and Xu (2006) constructed three test statistics t_j, j = 1, 2, 3, for the true parameter vector β_o, stated as follows.
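For intuition, the residual-based first estimator has the familiar sandwich shape Σ_1^{−1} Σ_2 Σ_1^{−1}. The sketch below is our notation and assumes a White-type construction; it is not guaranteed to match the paper's exact displays:

```python
import numpy as np

def sandwich_cov(X, resid):
    """White-type sandwich estimate of Sigma_1^{-1} Sigma_2 Sigma_1^{-1}, with
    Sigma_1 = T^{-1} X'X and Sigma_2 = T^{-1} sum_t u_t^2 x_t x_t'.
    A sketch in our notation, assuming this plug-in form."""
    T = X.shape[0]
    S1 = X.T @ X / T
    S2 = (X * (resid**2)[:, None]).T @ X / T   # residual-weighted outer products
    S1_inv = np.linalg.inv(S1)
    return S1_inv @ S2 @ S1_inv
```

Under homoscedasticity S2 collapses toward σ²·S1 and the sandwich reduces to the usual OLS covariance, which is why such estimators remain consistent in both regimes.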

Proposed method
In terms of the practical performance of the three tests t_j in (14), however, simulation results reveal two major issues, arising from the estimation of the asymptotic covariance matrix and from the selection of the bandwidth. To address these problems, we apply the proposed empirical likelihood approach to test the parameters in models (1)-(2).
To construct an empirical likelihood function, the estimating equations are defined, for a generic model parameter b ∈ R^{p+1}, by means of

m_t(b) = X_{t−1}(Y_t − X_{t−1}^⊤ b), t = 1, . . . , T.

According to condition (A3),

E{m_t(β_o)} = E{X_{t−1} u_t} = 0  (16)

holds for the true parameter vector β_o. Based on (16), we define the empirical likelihood for the parameter b by

L(b) = sup{ Π_{t=1}^T q_t : q_t ≥ 0, Σ_{t=1}^T q_t = 1, Σ_{t=1}^T q_t m_t(b) = 0 }.

By using the Lagrange multiplier method, we have

q̂_t = 1 / [T{1 + λ^⊤ m_t(b)}], where λ = λ(b) ∈ R^{p+1} solves Σ_{t=1}^T m_t(b)/{1 + λ^⊤ m_t(b)} = 0.  (17)

We also note that Π_{t=1}^T q_t, subject to the constraints q_t ≥ 0 and Σ_{t=1}^T q_t = 1, attains its maximum (1/T)^T at q_t = 1/T. Thus the empirical likelihood ratio at b is defined by R(b) = L(b)/(1/T)^T. Taking the log transformation, we get the corresponding empirical log-likelihood ratio

ℓ(b) = −2 log R(b) = 2 Σ_{t=1}^T log{1 + λ^⊤ m_t(b)}.  (18)

Theorem 4.1 below provides the asymptotic null distribution of ℓ(β_o).
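Computationally, ℓ(b) is obtained by solving the dual problem for the multiplier λ, which maximises the concave function Σ_t log{1 + λ⊤ m_t(b)}. The sketch below is our implementation (not the authors' code), using damped Newton steps with backtracking to keep all weights positive:

```python
import numpy as np

def el_logratio(b, Y, X, iters=50):
    """Empirical log-likelihood ratio ell(b) = 2 * sum_t log(1 + lam' m_t(b)),
    with m_t(b) = x_{t-1} (y_t - x_{t-1}' b). A minimal sketch: lam is found by
    Newton's method on the concave dual, with step halving for feasibility."""
    m = X * (Y - X @ b)[:, None]              # T x (p+1) estimating functions
    T, k = m.shape
    lam = np.zeros(k)
    for _ in range(iters):
        w = 1.0 + m @ lam                     # weights must stay positive
        grad = (m / w[:, None]).sum(axis=0)   # gradient of the concave dual
        hess = -(m / w[:, None] ** 2).T @ m   # negative definite Hessian
        step = np.linalg.solve(hess, -grad)   # Newton direction
        a = 1.0
        while np.any(1.0 + m @ (lam + a * step) <= 1e-8):
            a *= 0.5                          # backtrack to keep weights > 0
        lam = lam + a * step
        if np.linalg.norm(grad) < 1e-10:
            break
    return 2.0 * np.sum(np.log(1.0 + m @ lam))
```

At the OLS estimate the moment sum is exactly zero, so λ = 0 and ℓ vanishes; ℓ(b) then grows as b moves away from the data-supported values.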
Theorem 4.1: Under conditions (A1)-(A4) and the null hypothesis (5), the limiting distribution of ℓ(β_o) is the chi-square distribution with p degrees of freedom, i.e.,

ℓ(β_o) →D χ²_p, as T → ∞.

According to Theorem 4.1, the empirical likelihood ratio confidence region for the true value β_o can be constructed as

{b ∈ R^{p+1} : ℓ(b) ≤ χ²_{p; 1−α}},  (20)

where χ²_{p; 1−α} is defined below (15). Combined with (20), Theorem 4.1 implies Corollary 4.1.
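In practice the region (20) is obtained by evaluating ℓ(b) on a grid and keeping the values below the chi-square quantile. A self-contained sketch for the scalar case of a zero-mean AR(1) follows; the names and the bisection solver are ours, and 3.841 is the 95% quantile of χ²_1:

```python
import numpy as np

def el_logratio_scalar(b, y_lag, y):
    """ell(b) for the scalar moment m_t = y_{t-1} (y_t - b y_{t-1}); the dual
    lambda solves sum m_t/(1 + lam m_t) = 0, found by bisection (our sketch)."""
    m = y_lag * (y - b * y_lag)
    if m.min() >= 0 or m.max() <= 0:
        return np.inf                        # 0 outside convex hull: reject
    lo = (-1 + 1e-10) / m.max()              # feasible range keeps 1 + lam*m > 0
    hi = (-1 + 1e-10) / m.min()
    f = lambda lam: np.sum(m / (1 + lam * m))
    for _ in range(200):                     # f is strictly decreasing in lam
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log(1 + lam * m))

def el_interval(y_lag, y, grid):
    """Grid inversion: keep b with ell(b) <= 3.841 (the 95% chi2_1 quantile)."""
    kept = [b for b in grid if el_logratio_scalar(b, y_lag, y) <= 3.841]
    return (min(kept), max(kept)) if kept else None
```

The returned pair brackets the OLS point estimate, since ℓ is minimised (at zero) where the moment sum vanishes.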

Simulation evaluation
In this section, simulation studies are conducted to compare the finite sample performance of the five methods described in Sections 3-4: ordinary least squares without heteroscedasticity correction (OLS), t_1, t_2, t_3, and the proposed empirical likelihood (EL) procedure.
The zero-mean AR(1) model with time-varying variance is considered:

Y_t = β_1 Y_{t−1} + g(t/T) ε_t, ε_t ∼ i.i.d. N(0, 1).

The kernel function K(·) is the standard normal density, and the bandwidth parameter is selected by the cross-validation criterion (10). We consider H_0 : β_1 = β_{0,1} with known values of β_{0,1}.
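To illustrate the size distortions the tables quantify, a down-scaled Monte Carlo along the same lines can be run. This is our sketch, comparing the uncorrected OLS t-test with a White-corrected t-test in the spirit of t_1; it is not the authors' exact code:

```python
import numpy as np

def mc_size(beta1, g, T=200, reps=600, seed=0):
    """Monte Carlo size of the naive OLS t-test vs. a White-corrected t-test
    for H0: beta_1 = beta1 in y_t = beta1 y_{t-1} + g(t/T) eps_t (our sketch)."""
    rng = np.random.default_rng(seed)
    rej_ols = rej_w = 0
    for _ in range(reps):
        y = np.zeros(T + 1)
        for t in range(1, T + 1):
            y[t] = beta1 * y[t - 1] + g(t / T) * rng.standard_normal()
        x, yy = y[:-1], y[1:]
        bhat = (x @ yy) / (x @ x)
        u = yy - bhat * x
        se_ols = np.sqrt((u @ u) / (T - 1) / (x @ x))   # homoskedastic s.e.
        se_w = np.sqrt((x**2 @ u**2) / (x @ x) ** 2)    # White-robust s.e.
        rej_ols += abs((bhat - beta1) / se_ols) > 1.96  # 5% two-sided test
        rej_w += abs((bhat - beta1) / se_w) > 1.96
    return rej_ols / reps, rej_w / reps
```

With an abrupt variance break in g, the homoskedastic test over-rejects because the regressor variance and the error variance move together over time, while the robust correction restores size much closer to the nominal 5%.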
Three kinds of variance function g²(r) are considered in the simulations: a single-abrupt-change model (Model 1), a model with two abrupt changes (Model 2), and a continuous variance function model (Model 3).
Model 1 and Model 3 are the same as in Cavaliere (2004), Cavaliere and Taylor (2007) and Phillips and Xu (2006). Simulations are run as the parameter of interest β_1 ranges over {0.1, 0.5, 0.9}, with a nominal size of 5%. The sample size T is taken from {60, 200}. The number of Monte Carlo replications is 5000.
Simulation results comprise two parts. The first part, displayed in Tables 1, 2 and 3, assesses the rejection rates of the five methods under the null hypothesis. The second part, shown in Figures 1-3, evaluates the rejection rates of OLS, t_1, t_2, t_3 and EL as the parameter β_1 increases under the alternatives.
From these simulations, we draw the following conclusions.
(a) First, the OLS-based test is inefficient and unreliable under heteroscedastic innovations. From Table 1, the OLS-based test overwhelmingly over-rejects the null hypothesis when the null is true, with the largest size distortion under (κ, δ) ∈ {(0.1, 0.2), (0.9, 5)}. Moreover, except under homoscedastic innovations, the size distortion does not shrink as the sample size increases, as also shown in Figures 1 and 3. From Table 2, the OLS-based test performs better than in Table 1; however, its rejection rate decreases as the sample size increases. The results for the OLS-based test in Table 3 are similar to those in Table 1.
(b) Second, the performance of t_2 and t_3 depends on the numerical value of the true parameter and on the pattern of the variance function. From Figures 1, 2 and 3, an interesting phenomenon emerges: the rejection rates of t_2 and t_3 tend to increase with the parameter and grow large once β_1 > 0.5. The rejection rate of t_2 far exceeds the nominal size of 5% when the parameter is close to unity, namely β_1 = 0.9. In particular, t_2 and t_3 under-reject the null hypothesis, with rejection rates at or below 5%, when β_1 < 0.5; on the contrary, they over-reject when β_1 is large (β_1 = 0.9). Similar conclusions follow from Tables 1-3. Hence neither t_2 nor t_3 is a stable test for the ARTV model.
(c) Third, both EL and t_1 are stable tests for the ARTV model, and EL outperforms t_1. From Tables 1-3, both EL and t_1 mildly over-reject the null hypothesis when the null is true.

Table 3. Comparison of the rejection rates of five methods in Model 3 for β_1 ∈ {0.1, 0.5, 0.9}, m ∈ {1, 2}, δ ∈ {0.2, 5} and sample size T ∈ {60, 200}, based on 5000 replications.
From Figures 1-3, the rejection rate of EL is almost a horizontal line and lies closer to the nominal level of 5% than that of t_1, except in Figure 1(a) when the sample size is 60. When the sample size is 200, EL's rejection rate is close to the nominal 5% and does not depend on the numerical value of the parameter, even when the true value is close to unity (β_1 = 0.9). EL has the smallest size distortion overall and avoids estimating the variance. The simulation results generally support the asymptotic theory: EL is more stable and performs better than OLS, t_1, t_2 and t_3 for testing the parameters of the ARTV model, and thus appears to be the better choice.

Conclusion
This article focuses on the empirical likelihood approach for autoregressive models with error terms scaled by an unknown nonparametric time-varying function. The empirical likelihood ratio test statistic avoids estimating the unknown variance function in the presence of heteroscedastic error terms. Simulations under three different variance models show that the empirical likelihood test is more stable than the other four test statistics. Possible extensions include improving the efficiency of the statistic through different estimating equations and locating the abrupt change points when they exist.
where →P denotes convergence in probability.

Proof of Lemma A.2: By (A1) of Lemma A.1 and conditions (A1) and (A4), we have E(|Y_t|^{4ν}) < ∞ for some ν > 1. From (A2) of Lemma A.1 and an argument similar to that used in Owen (1991), the proof of Lemma A.2 is completed.
Proof: Note that β_o is the true parameter vector. Applying Taylor's expansion to (18) yields an expansion of ℓ(β_o) with remainder r_T, which, in light of Lemma A.1 (A2) and Lemma A.2, satisfies the required inequality in probability for some constant C > 0. By (17), and then by (A5) and (A6), we obtain the leading quadratic term. Again by (17), together with Lemma A.1 and (A3), we obtain an explicit expression for λ̂. Substituting this λ̂ into (A4) and (A7), the proof of Theorem 4.1 is completed by using Lemma A.1.