Improved tests for stock return predictability

Abstract– Predictive regression methods are widely used to examine the predictability of (excess) stock returns by lagged financial variables characterized by unknown degrees of persistence and endogeneity. We develop a new hybrid test for predictability in these circumstances based on simple regression t-statistics. Where the predictor is endogenous, the optimal, but infeasible, test for predictability is based on the t-statistic on the lagged predictor in the basic predictive regression augmented with the current period innovation driving the predictor. We propose a feasible version of this augmented test, designed for the case where the predictor is an endogenous near-unit root process, using a GLS-based estimate of the innovation used in the infeasible test regression. The limiting null distribution of this statistic depends on both the endogeneity correlation parameter and the local-to-unity parameter characterizing the predictor. A method for obtaining asymptotic critical values is discussed and response surfaces are provided. We compare the asymptotic power properties of the feasible augmented test with those of a (non augmented) t-test recently considered in Harvey et al. and show that the augmented test is more powerful in the strongly persistent predictor case. We then propose using a weighted combination of the augmented statistic and the t-statistic of Harvey et al., where the weights are obtained using the p-values from a unit root test on the predictor. We find this can further improve asymptotic power in cases where the predictor has persistence at or close to that of a unit root process. Our final hybrid testing procedure then embeds the weighted statistic within a switching-based procedure which makes use of a standard predictive regression t-test, compared with standard normal critical values, when there is evidence for the predictor being weakly persistent. 
Monte Carlo simulations suggest that overall our new hybrid test displays superior finite sample performance to comparable extant tests.


Introduction
Many studies in the applied economics and finance literature have focused on testing for the predictability of asset returns, employing a range of candidate predictor variables, such as valuation ratios, interest rates, and other financial and macroeconomic variables. By way of examples, Fama (1981) considers various predictors including interest rates, industrial production, GNP, and capital stock and expenditure, while Campbell and Yogo (2006) consider the dividend-price ratio, the earnings-price ratio, the three-month T-bill rate, and the long-short yield spread. Standard approaches to testing predictability are based on a simple linear regression model with a constant and lagged putative predictor (x t−1 say), with a corresponding regression coefficient β.
In empirical studies, it is commonly found that the candidate predictor variables are highly persistent (with either unit root or near unit root autoregressive processes) and also endogenous, with a nonzero (often strongly negative) correlation between the errors in the predictive regression and the innovations driving the predictor process; see, inter alia, Campbell and Yogo (2006), Goyal and Welch (2003), and Welch and Goyal (2008). In the presence of strong persistence and endogeneity, Cavanagh et al. (1995) show that the standard t-test on the estimate of $\beta$ suffers from severe size distortions; see also Campbell and Yogo (2006), Nelson and Kim (1993), and Stambaugh (1999). This finding has motivated the development of tests for predictability that are designed to allow for both endogeneity and strong persistence in the predictor series $x_t$, modeled by a first-order autoregression with a local-to-unity coefficient $\phi = 1 - cT^{-1}$ (where $c$ is an unknown finite constant and $T$ is the sample size).
As a result, a number of likelihood-based predictability tests have been developed in the literature which are designed to be asymptotically valid when the predictor is strongly persistent and endogenous; see, inter alia, Cavanagh et al. (1995), Lewellen (2004), Campbell and Yogo (2006), Elliott et al. (2015) [EMW, hereafter], and Jansson and Moreira (2006), and most recently a hybrid test, based around a number of simple regression t-ratios, developed in Harvey et al. (2021) [HLT hereafter]. Arguably the most widely applied of these tests in the literature is the Q test of Campbell and Yogo (2006), which falls within the general control variable approach outlined in Elliott (2011). Here the simple linear predictive model is augmented by an additional regressor used as a proxy for the current period innovation driving the predictor; an infeasible version of this test using the actual current period innovation is optimal when the predictor is endogenous. In particular, in its simplest form, Q is based around the infeasible t-statistic on β when (x t − φx t−1 ) is added as a regressor to the predictive regression. Campbell and Yogo (2006) develop a feasible version of this test, using the approach of Cavanagh et al. (1995), based on a Bonferroni confidence interval for β obtained using a confidence interval for φ (equivalently c) formed from the well-known quasi-GLS demeaned augmented Dickey-Fuller [ADF] unit root statistic of Elliott et al. (1996).
Among the likelihood-based approaches listed above, only the procedures developed in EMW and HLT are also asymptotically valid for the case of a weakly persistent predictor. 1 Like Lewellen (2004), the testing procedure outlined in EMW rules out the possibility that the predictor x t is locally explosive (by imposing that c is non negative), while HLT and Campbell and Yogo (2006) allow for some local explosivity (−5 ≤ c < 0) in the predictor. Simulation results presented in HLT suggest that where the predictor is locally explosive the Q test of Campbell and Yogo (2006), although valid, displays very poor power and is easily dominated by the hybrid test proposed in HLT, while the EMW test is highly unreliable. Where the possibility of local explosivity in the predictor can be ruled out, based on their simulation results HLT find that the EMW test dominates other tests where the predictor is either a pure unit root process (c = 0) or lies very close to a unit root process (c is small and positive) arguing that "... it appears that exclusion of robustness to the case of explosive predictors affords the EMW test the opportunity of greater power in the unit root setting... " op. cit. p.207. For larger c, HLT argue on the basis of their simulations that their proposed hybrid test offers superior power to all of the leading tests in the literature, including the EMW test.
Our aim in this article is to investigate an alternative to the hybrid testing procedure of HLT designed to exploit available power advantages that exist for strongly persistent predictors when $c$ is either zero or small and positive in cases where locally explosive predictors can be ruled out, a priori. This then allows us to develop a procedure that can be compared on a level playing field with the EMW test. The approach we outline will be focused on easy to implement tests based on regression t-ratios. The hybrid testing procedure we propose can be viewed as an extension of the hybrid test outlined in HLT with the introduction of information from an additional t-ratio motivated by the control variable approach of Elliott (2011). This t-ratio is formed on the lagged predictor in the basic predictive regression augmented with a GLS-based estimate used to proxy the current period innovation driving the predictor. In the strongly persistent case, we show that the limiting null distribution of this statistic depends on both the endogeneity correlation parameter and the local-to-unity parameter characterizing the predictor. We therefore propose a feasible method for obtaining asymptotically conservative critical values and provide response surfaces for practical use.

¹ A different strand of the literature which allows for both weakly and strongly persistent predictors is characterized by contributions from Phillips and Magdalinos (2009), Kostakis et al. (2015) and Breitung and Demetrescu (2015) and focuses on instrumental variable [IV] estimation using an instrument constructed from the predictor variable and designed to be less persistent than a local-to-unity process. While such IV based tests are valid regardless of the degree of persistence in the predictor, they are less powerful than the tests of EMW and HLT, particularly so when the predictor is weakly persistent or where it is strongly persistent with $c$ zero or close to zero.
An analysis of the asymptotic local power function of the resulting conservative test together with a corresponding feasible (conservative) implementation of the t-ratio proposed in HLT, obtained from a variant of the standard predictive regression where the OLS demeaned returns are regressed on the GLS demeaned lagged predictor, shows that, in the empirically most relevant case where a significant negative correlation exists between returns and the predictor's innovations, the new proxy-based test is more powerful than the corresponding test from HLT for positive predictability ($\beta > 0$) for $c = 0$ and small values of $c$; that is, exactly the areas where the original hybrid test of HLT is less powerful than the EMW test. We then show that substantial further power improvements can be obtained in these scenarios by considering a weighted combination of the new t-ratio and the t-ratio from HLT, the weights depending on the persistence of the predictor via a function of the p-values from a standard Dickey-Fuller-type unit root test applied to the predictor, again made operational using asymptotically conservative critical values with a response surface provided for practical implementation. Like HLT we find that when testing for positive predictability with a positive or small negative endogeneity correlation, or when testing for negative predictability ($\beta < 0$) with an endogeneity correlation that is not significantly positive, asymptotic local power is improved by using the standard predictive regression t-statistic with an asymptotically conservative critical value. Consequently, when testing for positive (negative) predictability, our recommended procedure in the near-unit root environment is to use the conservative standard t-ratio when the estimated endogeneity correlation is either positive or "small" and negative (either negative or "small" and positive), but to use the conservative test based on the weighted statistic otherwise.
Further, in common with EMW and HLT, if the data suggest the predictor is weakly persistent, we propose switching into the standard t-ratio test with reference to standard normal critical values. Like HLT we base our switching function not on an (inconsistent) estimate of c, but rather on the familiar augmented Dickey-Fuller normalized bias coefficient unit root test.
In Monte Carlo simulations, we find that the hybrid test proposed in this article performs well in terms of finite sample size and power across a range of correlation parameters and persistence levels for the predictor, and compares very favorably with extant tests, offering a simple yet highly effective method for predictability testing. In particular, our proposed hybrid test almost always outperforms both the EMW and HLT hybrid test procedures in the case of strongly persistent predictors, with all three being largely identical for predictors displaying only very weak levels of persistence (as expected, given all three switch to a conventional t-test in this case). In cases where one is prepared to rule out the possibility of an explosive predictor, we therefore recommend the hybrid test developed in this article. Otherwise the hybrid test in HLT is preferred.
The remainder of the article is organized as follows. Section 2 introduces the predictive regression model which we will consider in this article together with the assumptions which we place on this data generating process [DGP]. In Section 3, we present the new augmented t-statistic that will subsequently feature in our hybrid testing procedure and detail its asymptotic properties. Here we also outline our method for obtaining asymptotic critical values and provide numerical comparisons with existing tests based on asymptotic local power functions. These simulation results provide motivation for the weighted statistics that we propose and evaluate in Section 4. Our final proposed hybrid testing procedure that allows for both weakly and strongly persistent predictors is then outlined in Section 5. Section 6 discusses extensions to deal with higher order serial correlation in the predictor. In Section 7, we investigate the finite sample size and power properties of our proposed hybrid test, comparing with the test procedures of EMW and HLT. Section 8 concludes. We use the notation x := y (x =: y) to denote that x is defined by y (y is defined by x), and ⇒ to denote weak convergence.

The predictive regression model
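Let $y_t$ denote the (excess) stock return in period $t$ and let $x_{t-1}$ denote a variable observed at time $t-1$ which is considered to be a putative predictor for $y_t$. The predictive regression model we consider is

$$y_t = \alpha_y + \beta x_{t-1} + \varepsilon_{yt}, \quad t = 2, \ldots, T, \qquad (1)$$

where $x_t$ is an observed process, specified according to the DGP

$$x_t = \alpha_x + s_t, \quad t = 1, \ldots, T, \qquad s_t = \phi s_{t-1} + \varepsilon_{xt}, \quad t = 2, \ldots, T, \qquad (2)$$

with $s_1$ a mean zero $O_p(1)$ random variable. As discussed in Section 1, it is important for practical purposes to allow for the possibility of high persistence in the predictor variable $x_t$ and to allow the shocks driving the predictor, $\varepsilon_{xt}$ in (2), to be correlated with the unpredictable component of stock returns, $\varepsilon_{yt}$ in (1). As regards the latter, we assume that the innovation vector $\varepsilon_t := (\varepsilon_{xt}, \varepsilon_{yt})'$ is IID with mean zero and finite fourth-order moments, and satisfying

$$E(\varepsilon_t \varepsilon_t') = \begin{bmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{bmatrix}.$$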

Remark 1.
The assumption that $\varepsilon_t$ is a vector IID process is made purely to simplify our presentation. All of the large sample results given in this article continue to hold in the case where $\varepsilon_t$ is a (bivariate) martingale difference process satisfying the conditions given on p.200 of HLT. Indeed, for the case of a strongly persistent predictor (Assumption S), it is also possible to allow for conditional heteroskedasticity of the form considered in Assumption A.1 of Campbell and Yogo (2006) without altering the large sample results which are given in what follows. In the case of a weakly persistent predictor (Assumption W), the same would be true for conditional heteroskedasticity of the form given in, for example, Assumption INNOV(ii) of Kostakis et al. (2015, p.1512), provided the regression t-ratios discussed in what follows are implemented using White standard errors rather than OLS standard errors; notice, however, that for our final hybrid test outlined in Section 5 only the conventional t-test, $T_N$, would actually need to be based on a t-statistic computed with White standard errors. The assumption that $\varepsilon_{xt}$ is serially uncorrelated is also not crucial and we will subsequently discuss in Section 6 how the methods we propose can be modified to allow for weak dependence in $\varepsilon_{xt}$. The methods developed in the literature on predicting returns are, however, based on the assumption that $\varepsilon_{yt}$ is serially uncorrelated; different methods are required in cases where $\varepsilon_{yt}$ may be serially correlated and, as such, will not be considered here.

With respect to the degree of persistence in $x_t$, we assume that the true value of $\phi$ in (2) is unknown to the practitioner and satisfies one of the following two assumptions:

Assumption S. Strongly persistent predictor: The autoregressive parameter $\phi$ in (2) is local-to-unity with $\phi := 1 - cT^{-1}$, where $c$ is a fixed non-negative constant.
Remark 2. Many putative predictors are strongly persistent, with sums of sample autoregressive coefficients close to or only slightly smaller than unity. In such cases, near-integrated asymptotics provide good approximations for the behavior of test statistics. However, not all possible predictors are strongly persistent and many models in the literature treat x t as generated from a stable autoregressive process. We therefore allow for either of these possibilities to hold for x t . As discussed in Section 1, our assumptions exclude the possibility of explosive predictors (φ > 1), in line with the approach of, for example, EMW and Lewellen (2004). In contrast, HLT and Campbell and Yogo (2006) both allow for a small degree of local explosivity (−5 ≤ c < 0) in the predictor in the tests they develop.
In this article, our focus is on developing tests of the null hypothesis that $y_t$ is not predictable by $x_{t-1}$, i.e., $H_0: \beta = 0$ in (1), which do not require the practitioner to know which of Assumption S or Assumption W holds for $\phi$ in (2). The alternative hypothesis is that $y_t$ is predictable by $x_{t-1}$, in which case $\beta > 0$ or $\beta < 0$ (one-sided alternatives are commonly adopted in practice). We will establish the large sample behavior of the predictability tests considered in this article under local alternatives such that the slope parameter $\beta$ in (1) is local-to-zero. This approach permits analysis of the tests' local asymptotic power, and is consistent with the fact that predictive regressions for stock returns typically exhibit a small $R^2$ and low signal-to-noise ratios, with departures from the null being small when predictability is present. The appropriate localization rate (Pitman drift) is dictated by which of Assumption S and Assumption W holds. Under Assumption S, where $x_t$ is strongly persistent, the appropriate local alternative is given by $H_{1,S}: \beta = gT^{-1}$, while for weakly dependent $x_t$ under Assumption W, it is given by $H_{1,W}: \beta = gT^{-1/2}$, where in each case $g$ is a finite constant.
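The familiar Cholesky decomposition allows us to write the two components of $\varepsilon_t$ in the form

$$\varepsilon_{xt} = \sigma_x e_{1t}, \qquad \varepsilon_{yt} = \sigma_y\left(\rho_{xy} e_{1t} + \sqrt{1-\rho_{xy}^2}\, e_{2t}\right),$$

where $e_t := (e_{1t}, e_{2t})' \sim IID(0, I_2)$ and $\rho_{xy} := \sigma_{xy}/(\sigma_x \sigma_y)$ is the contemporaneous correlation between the innovations driving the predictor, $\varepsilon_{xt}$, and the unpredictable component of stock returns, $\varepsilon_{yt}$. Using this representation, we can then re-write the predictive regression in (1) as

$$y_t = \alpha_y + \beta x_{t-1} + \frac{\sigma_{xy}}{\sigma_x^2}\,\varepsilon_{xt} + \sigma_y\sqrt{1-\rho_{xy}^2}\, e_{2t}. \qquad (4)$$

The representation in (4) is instructive, in that it demonstrates how a predictive regression featuring an endogenous predictor $x_{t-1}$, such as (1), can be re-written using $\varepsilon_{xt}$ as an additional covariate in a form in which the predictor regressor, $x_{t-1}$, is strictly exogenous.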
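To make the data generating process concrete, the following is a minimal simulation sketch of the model in (1)-(2) under Assumption S, with the innovations built from the Cholesky-type representation described above. The function name `simulate_dgp`, the Gaussian choice for $e_t$, and the initialization $s_1 = 0$ are illustrative assumptions rather than part of the article's setup.

```python
import numpy as np

def simulate_dgp(T, beta, c, rho_xy, sigma_x=1.0, sigma_y=1.0,
                 alpha_x=0.0, alpha_y=0.0, seed=0):
    """Simulate (y_t, x_t) with phi = 1 - c/T (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    phi = 1.0 - c / T
    e = rng.standard_normal((T, 2))          # e_t ~ IID N(0, I_2): Gaussianity is assumed
    eps_x = sigma_x * e[:, 0]
    eps_y = sigma_y * (rho_xy * e[:, 0] + np.sqrt(1.0 - rho_xy**2) * e[:, 1])
    s = np.zeros(T)                          # s_1 = 0 is one admissible O_p(1) start
    for t in range(1, T):
        s[t] = phi * s[t - 1] + eps_x[t]
    x = alpha_x + s
    y = np.full(T, np.nan)                   # y_1 has no lagged predictor
    y[1:] = alpha_y + beta * x[:-1] + eps_y[1:]
    return y, x, eps_x, eps_y

# Example: a near-unit-root predictor with strongly negative endogeneity.
y, x, eps_x, eps_y = simulate_dgp(T=5000, beta=0.0, c=5.0, rho_xy=-0.9)
sample_corr = np.corrcoef(eps_x[1:], eps_y[1:])[0, 1]
```

The sample correlation of the simulated innovations should lie close to the chosen value of $\rho_{xy}$, which gives a quick check that the decomposition has been coded with the intended sign.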

A new predictability test
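In what follows it is convenient to define a generically notated regression model:

$$y_t = \alpha_y + \beta x_{t-1} + \gamma z_{xt} + u_t, \qquad (5)$$

where $z_{xt}$ denotes a generic additional covariate, and consider the generic t-statistic associated with the OLS estimate of $\beta$ in (5).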

An infeasible test
If $\varepsilon_{xt}$ were observed, which is equivalent to knowing $\phi$ (abstracting from the unknown constant, $\alpha_x$), we could then perform a standard OLS regression in (5) with $z_{xt} = \varepsilon_{xt}$, which is clearly a correct specification with respect to the DGP in (4). Denoting the corresponding infeasible t-statistic as $T_{inf}$, it is straightforward to show that $T_{inf}$ has a standard normal limiting distribution under the null hypothesis $H_0$, irrespective of whether Assumption S or Assumption W holds. Moreover, under Gaussianity this would be an efficient test (among $\alpha_y$, $\alpha_x$ invariant tests) whenever $\rho_{xy} \neq 0$. Note that including $\varepsilon_{xt}$ as a regressor reduces the error variance from $\sigma_y^2$ in (1) to $\sigma_y^2(1 - \rho_{xy}^2)$ in (4); that is, with knowledge of $\phi$ we can essentially subtract off the part of the innovation to returns that is correlated with the innovation to the predictor variable, thereby delivering a more powerful test. When $\rho_{xy} = 0$, $T_{inf}$ remains asymptotically efficient as incorporation of the redundant regressor $\varepsilon_{xt}$ has no effect in large samples.
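The size property of the infeasible test can be checked by simulation. The sketch below computes $T_{inf}$ as the t-statistic on $x_{t-1}$ when the true innovation $\varepsilon_{xt}$ is included as a regressor, and estimates the upper-tail rejection frequency under $H_0$ with $c = 0$. The helper names, the Gaussian innovations, and the settings ($T = 250$, $\rho_{xy} = -0.95$, 2,000 replications) are illustrative assumptions.

```python
import numpy as np

def ols_tstat(y, X, k):
    """OLS t-statistic on column k of X, using conventional standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    return b[k] / np.sqrt(s2 * XtX_inv[k, k])

def t_infeasible(y, x_lag, eps_x):
    """t-statistic on x_{t-1} with the true eps_{xt} added as a regressor."""
    X = np.column_stack([np.ones(len(y)), x_lag, eps_x])
    return ols_tstat(y, X, 1)

rng = np.random.default_rng(0)
T, rho, n_rep = 250, -0.95, 2000
rejections = 0
for _ in range(n_rep):
    e = rng.standard_normal((T, 2))
    eps_x = e[:, 0]
    eps_y = rho * e[:, 0] + np.sqrt(1 - rho**2) * e[:, 1]
    x = np.cumsum(eps_x)                 # c = 0: pure unit root predictor
    y = eps_y[1:]                        # null hypothesis: beta = 0, alpha_y = 0
    rejections += t_infeasible(y, x[:-1], eps_x[1:]) > 1.645
size_inf = rejections / n_rep            # estimated size of a nominal 5% upper-tail test
```

Because the remaining error in (4) is independent of both regressors, the estimated size should sit near the nominal level even though the predictor has a unit root and the endogeneity correlation is close to minus one.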

A feasible test using a proxy measure for xt
Given that $\varepsilon_{xt}$ is unobservable, one might ask whether it is possible to obtain a proxy measure for $\varepsilon_{xt}$. In fact, this testing problem falls within the general control variable approach outlined in Elliott (2011). Here, (1) is augmented by an additional regressor used as a proxy for the current period innovation driving the predictor, $\varepsilon_{xt}$; so we may consider (5) as this augmented regression with $z_{xt}$ the proxy regressor. There are a number of ways in which this approach can be implemented, including the Bonferroni-based method advocated in Campbell and Yogo (2006) which, as discussed in Section 1, is based on a sequence of such augmented regressions. Here we consider an alternative approach based on a single regression including a covariate $z_{xt}$ in (5) acting as a direct proxy for $\varepsilon_{xt}$ in (4). We will also primarily focus our discussion on the case of Assumption S where $x_t$ is strongly persistent, as this is the most problematic case where the standard t-statistic based on OLS estimation of (1), which we denote by $T$, has a non-pivotal limiting distribution.

An obvious approach to obtaining a proxy for $\varepsilon_{xt}$ is to assume a particular value for the local-to-unity parameter $c$, say $\bar{c}$; we would then construct $z_{xt} = x_t - (1 - \bar{c}T^{-1})x_{t-1}$ (assuming $\alpha_x = 0$ for simplicity). If it happened to be the case that $\bar{c} = c$, then $z_{xt} = \varepsilon_{xt}$ and we obtain the asymptotically standard normal and efficient test, $T_{inf}$. However, when $\bar{c} \neq c$ the critical values for this test will depend on both $\rho_{xy}$ and $c$, and it will no longer be an efficient test, with power being a (decreasing) function of the distance $|c - \bar{c}|$. This clearly poses a problem in implementation as $c$ cannot be consistently estimated.
An obvious proxy for $\varepsilon_{xt}$ is the OLS estimate, $\hat\varepsilon_{xt}$ say, obtained from an OLS regression of $x_t$ on a constant and $x_{t-1}$. However, setting $z_{xt} = \hat\varepsilon_{xt}$ in (5) runs into the problem that $z_{xt}$ is exactly orthogonal to the predictive regressor $x_{t-1}$. The estimate of $\beta$ from such a fitted model is then numerically identical to that which would be obtained if $z_{xt}$ was omitted from (5). Moreover, the corresponding statistic is approximately $1/\sqrt{1 - \rho_{xy}^2}$ times the simple t-statistic, $T$, and so the inference drawn from such a test would essentially be identical to that from $T$; hence using the proxy regressor $\hat\varepsilon_{xt}$ delivers no benefit whatsoever.
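The orthogonality argument is easy to verify numerically. The sketch below, on one simulated sample (settings and helper names are illustrative assumptions), fits the predictive regression with and without the OLS residual proxy and compares the two slope estimates and t-statistics.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients, conventional standard errors and residuals."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    return b, np.sqrt(s2 * np.diag(XtX_inv)), resid

rng = np.random.default_rng(1)
T, rho = 400, -0.9
e = rng.standard_normal((T, 2))
eps_x = e[:, 0]
eps_y = rho * e[:, 0] + np.sqrt(1 - rho**2) * e[:, 1]
x = np.cumsum(eps_x)                              # unit root predictor (c = 0)
y, x_lag, const = eps_y[1:], x[:-1], np.ones(T - 1)

X_short = np.column_stack([const, x_lag])
_, _, eps_x_hat = ols(x[1:], X_short)             # OLS residual proxy for eps_{xt}
b_short, se_short, eps_y_hat = ols(y, X_short)    # standard predictive regression
b_aug, se_aug, _ = ols(y, np.column_stack([const, x_lag, eps_x_hat]))

# Sample correlation between the two sets of OLS residuals.
rho_hat = eps_x_hat @ eps_y_hat / np.sqrt(
    (eps_x_hat @ eps_x_hat) * (eps_y_hat @ eps_y_hat))
# Ratio of the augmented t-statistic to the simple t-statistic.
t_ratio = (b_aug[1] / se_aug[1]) / (b_short[1] / se_short[1])
```

The two slope estimates coincide up to rounding error, and `t_ratio` is equal to $1/\sqrt{1-\hat\rho_{xy}^2}$ up to a degrees-of-freedom factor, where $\hat\rho_{xy}$ is the sample correlation of the two residual series.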
An alternative method for obtaining a proxy for $\varepsilon_{xt}$, which takes account of a strongly persistent autoregressive structure in estimating the intercept term $\alpha_x$, is to employ a quasi-GLS estimate of $\alpha_x$ obtained from a quasi-differenced OLS regression; see Elliott et al. (1996) for further details. We denote this estimator $\tilde\alpha_x$. We would then estimate the OLS regression and, denoting the estimate of $\phi$ by $\tilde\phi$, construct the residuals $\tilde\varepsilon_{xt} := x_t - \tilde\phi(x_{t-1} - \tilde\alpha_x)$. Then we consider setting $z_{xt} = \tilde\varepsilon_{xt}$ in (5). In contradistinction to the OLS-based proxy regressor $\hat\varepsilon_{xt}$, the GLS-based proxy regressor $\tilde\varepsilon_{xt}$ is not orthogonal to $x_{t-1}$. This lack of orthogonality raises the potential for $\tilde\varepsilon_{xt}$ to act as a useful proxy for $\varepsilon_{xt}$ in the strongly persistent case. We therefore construct the t-statistic associated with the OLS estimate of $\beta$ in regression (5) with $z_{xt} = \tilde\varepsilon_{xt}$, and denote this t-statistic as $T^*$ in what follows. As we shall establish in Section 3.3, the limiting null distribution of $T^*$ depends on both $\rho_{xy}$ and $c$ in the case where $x_t$ is strongly persistent (Assumption S), although this issue notwithstanding, we might anticipate that this procedure could deliver decent power performance due to the inclusion of a proxy for $\varepsilon_{xt}$. Under Assumption W, the asymptotic distribution of $\tilde\varepsilon_{xt}$, and therefore that of $T^*$, will depend on the distribution of $s_1$, so this statistic is appropriate only under Assumption S.
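A rough sketch of how $T^*$ might be computed is given below. Several ingredients are assumptions made for illustration and are not confirmed by the text: the quasi-differencing parameter is set to $1 - 7/T$ (the usual choice for the constant-only case in Elliott et al., 1996), $\tilde\phi$ is taken from a no-intercept regression of $x_t - \tilde\alpha_x$ on $x_{t-1} - \tilde\alpha_x$, and all function names are hypothetical.

```python
import numpy as np

def ols_tstat(y, X, k):
    """OLS t-statistic on column k of X, using conventional standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    return b[k] / np.sqrt(s2 * XtX_inv[k, k])

def gls_alpha(x, c_bar=7.0):
    """Quasi-GLS intercept estimate; c_bar = 7 is an assumed default."""
    T = len(x)
    r = 1.0 - c_bar / T
    xq = np.concatenate([[x[0]], x[1:] - r * x[:-1]])       # quasi-differenced data
    zq = np.concatenate([[1.0], np.full(T - 1, 1.0 - r)])   # quasi-differenced constant
    return (zq @ xq) / (zq @ zq)

def t_star(y, x, c_bar=7.0):
    """Augmented t-statistic; y[t] is paired with x[t-1] and y[0] is unused."""
    a = gls_alpha(x, c_bar)
    xd_lag, xd = x[:-1] - a, x[1:] - a
    phi_tilde = (xd_lag @ xd) / (xd_lag @ xd_lag)           # assumed form of the AR fit
    # Residual as defined in the text; any constant shift in it is absorbed
    # by the intercept of the augmented regression below.
    eps_tilde = x[1:] - phi_tilde * xd_lag
    X = np.column_stack([np.ones(len(x) - 1), x[:-1], eps_tilde])
    return ols_tstat(y[1:], X, 1), eps_tilde

rng = np.random.default_rng(2)
T, rho = 500, -0.9
e = rng.standard_normal((T, 2))
eps_x = e[:, 0]
eps_y = rho * e[:, 0] + np.sqrt(1 - rho**2) * e[:, 1]
x = 2.0 + np.cumsum(eps_x)           # alpha_x = 2, c = 0
y = eps_y.copy()                     # null hypothesis: beta = 0
stat, eps_tilde = t_star(y, x)
proxy_corr = np.corrcoef(eps_tilde, eps_x[1:])[0, 1]
```

In a strongly persistent sample the GLS-based residual tracks the true innovation closely, which is what allows it to serve as a useful control variable; the resulting statistic must still be compared with the conservative critical values discussed in Section 3.4 rather than with standard normal ones.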

Asymptotic distribution of T *
In Theorem 1 we now report the asymptotic distribution of the T * statistic under both the null and local alternatives under H 1,S . A proof of Theorem 1 is provided in the appendix.
Theorem 1. Let $y_t$ and $x_t$ be generated according to the model in (1)-(2) under the conditions stated in Section 2 and let Assumption S hold. Then, as $T \to \infty$, under $H_{1,S}$: $T^* \Rightarrow S^*(g\sigma_x/\sigma_y, \rho_{xy}, c)$.

Remark 3. The result in Theorem 1 highlights that the offset seen in the limiting distribution under the local alternative, $H_{1,S}$, given by the first term in the expression for $S^*(g\sigma_x/\sigma_y, \rho_{xy}, c)$, is a function of the deterministic offset term $g\sigma_x/\sigma_y$, comprised of the Pitman drift, $g$, and the signal-to-noise ratio associated with the predictor, $\sigma_x/\sigma_y$, weighted by a stochastic offset term. Consequently, the test's asymptotic local power is higher, other things being equal, the larger the Pitman drift, and the larger the amount of variability in the predictor, relative to the error term in the predictive regression in (1). Under the null hypothesis, $H_0$, the asymptotic distribution of the statistic is non-standard and depends on both $\rho_{xy}$ and $c$.
We do not present the limiting distribution for T * under H 1,W because, as noted above, it depends on the distribution of s 1 . The test procedure we will subsequently develop is such that it never selects T * in large samples under Assumption W and, hence, the limit in that case is not relevant.

Asymptotic critical values for T *
Considering the case of strong dependence in Theorem 1 above, under $H_0$, relevant critical values for the test based on $T^*$ will depend on the unknown nuisance parameters $\rho_{xy}$ and $c$. At a practical level, and as we will show below, $\rho_{xy}$ can be consistently estimated and so this dependence is easily dealt with (at least in large samples). The dependence on $c$, however, cannot be dealt with as easily because $c$, unlike $\rho_{xy}$, is not consistently estimable. We therefore adopt a scheme for simulating critical values that will, by design, yield asymptotically conservative tests. HLT propose such a method for the statistics they consider, and here we outline a similar approach for the $T^*$ statistic and its null limit distribution $S^*(0, \rho_{xy}, c)$. For expository purposes we will focus attention here on upper tail critical values relevant for upper-tailed tests, as this is the case of most practical relevance, but the same approach could be used in an obvious way for lower-tailed and two-tailed tests. In outlining our final preferred hybrid procedures in Section 5 we will detail how to perform upper-tailed, lower-tailed, and two-tailed tests.
The steps to obtaining the conservative critical values are as follows:
1. For a chosen value of $\rho_{xy}$, simulate the null distribution $S^*(0, \rho_{xy}, c)$ for different $c$ across an interval $c \in [0, c_{max}]$.
2. At each value of $c$, compute the $\pi$-level upper-tail critical value, $cv^*_\pi(\rho_{xy}, c)$ say.
3. Set the $\pi$-level critical value for $T^*$ equal to $cv^*_\pi(\rho_{xy}) := \max_{c \in [0, c_{max}]} cv^*_\pi(\rho_{xy}, c)$.

Using $cv^*_\pi(\rho_{xy})$ will yield a correctly sized $\pi$-level test when $c = \arg\max_{c \in [0, c_{max}]} cv^*_\pi(\rho_{xy}, c)$, and give a conservatively sized test for other values of $c$. We simulated critical values in this manner, approximating the Brownian motion processes in the limiting functional from Theorem 1 using IIDN(0, 1) random variates, and with the integrals approximated by normalized sums of 1,000 steps, with 20,000 replications (these values are used throughout our asymptotic analyses). This was carried out for the conventional significance levels $\pi \in$ [0.1, 0. … , c) is obtained for $c$ much smaller than $c_{max}$; for example, with $\rho_{xy} = -0.95$, it is obtained at $c = 0$ for each value of $\pi$.
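The three steps above can be sketched in code as follows. Because the expression for $S^*(0, \rho_{xy}, c)$ is not reproduced here, the sketch uses a stand-in: finite-sample draws of the standard predictive regression t-statistic $T$ under the null, whose distribution also depends on $\rho_{xy}$ and $c$. The function names, the grid for $c$, the sample size and the replication count are all illustrative assumptions; any simulator of a null distribution indexed by $c$ could be passed in instead.

```python
import numpy as np

def simulate_null_tstats(c, rho_xy, T=200, n_rep=2000, seed=0):
    """Stand-in simulator: standard t-statistic under H0 for a given c."""
    rng = np.random.default_rng(seed)
    phi = 1.0 - c / T
    e = rng.standard_normal((n_rep, T, 2))
    eps_x = e[:, :, 0]
    eps_y = rho_xy * e[:, :, 0] + np.sqrt(1 - rho_xy**2) * e[:, :, 1]
    x = np.zeros((n_rep, T))
    for t in range(1, T):                      # AR(1) recursion, vectorized over replications
        x[:, t] = phi * x[:, t - 1] + eps_x[:, t]
    x_lag = x[:, :-1] - x[:, :-1].mean(axis=1, keepdims=True)   # demeaning = intercept
    y = eps_y[:, 1:] - eps_y[:, 1:].mean(axis=1, keepdims=True)
    sxx = (x_lag**2).sum(axis=1)
    b = (x_lag * y).sum(axis=1) / sxx
    resid = y - b[:, None] * x_lag
    s2 = (resid**2).sum(axis=1) / (T - 1 - 2)
    return b / np.sqrt(s2 / sxx)

def conservative_cv(simulate, c_grid, pi):
    """Steps 1-3: maximize the upper-tail pi-level critical value over c."""
    cvs = np.array([np.quantile(simulate(c), 1.0 - pi) for c in c_grid])
    return cvs.max(), c_grid[int(cvs.argmax())], cvs

c_grid = [0.0, 2.5, 5.0, 10.0, 25.0, 50.0]
cv, c_at_max, cvs = conservative_cv(
    lambda c: simulate_null_tstats(c, rho_xy=-0.95), c_grid, pi=0.05)
```

Returning the maximizing value of $c$ alongside the critical value makes it easy to see where on the grid the test is exactly sized and where it is conservative.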
To automate selection of an appropriate critical value for a given value of $\rho_{xy}$, we calculated a response surface by regressing $cv^*_\pi(z)$ on $F(z) := [1, z, z^2, \ldots, z^9]$ with $z = \rho_{xy}$ for the 38 data points corresponding to the grid of values for $\rho_{xy}$.² The response surface critical value is the fitted value from this regression, and the coefficient estimates are given in Table 1. In practice, the response surface critical values can be calculated by substituting the unknown correlation parameter $\rho_{xy}$ with a consistent estimate. To that end, as in HLT, we suggest using the estimator

$$\hat\rho_{xy} := \frac{\sum_{t=2}^{T} \hat\varepsilon_{xt}\hat\varepsilon_{yt}}{\sqrt{\sum_{t=2}^{T} \hat\varepsilon_{xt}^2 \sum_{t=2}^{T} \hat\varepsilon_{yt}^2}},$$

where $\hat\varepsilon_{yt}$ are the OLS residuals from regressing $y_t$ on a constant and $x_{t-1}$, and where it is recalled from Section 3.2 that $\hat\varepsilon_{xt}$ denote the OLS residuals from regressing $x_t$ on a constant and $x_{t-1}$. It is straightforward to show that $\hat\rho_{xy}$ is a consistent estimator of $\rho_{xy}$ under either Assumption S or Assumption W. In what follows, we denote tests based on comparison of $T^*$ with an asymptotically conservative critical value by $T^*_{con}$.
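As an illustration of how the response surface might be applied in practice, the sketch below computes the residual-based correlation estimate and evaluates a degree-nine polynomial at that estimate. The function names are hypothetical, and the coefficient vector `coefs` is a placeholder argument: the actual values would be taken from Table 1, which is not reproduced here.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def rho_hat(y, x):
    """Correlation of OLS residuals; y[t] is paired with x[t-1], y[0] is unused."""
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    def resid(v):
        return v - X @ np.linalg.lstsq(X, v, rcond=None)[0]
    ex, ey = resid(x[1:]), resid(y[1:])      # residuals of x_t and y_t on (1, x_{t-1})
    return (ex @ ey) / np.sqrt((ex @ ex) * (ey @ ey))

def response_surface_cv(rho, coefs):
    """Evaluate coefs[0] + coefs[1]*rho + ... + coefs[9]*rho**9."""
    return P.polyval(rho, coefs)

# Simulated example with a unit root predictor and rho_xy = -0.7.
rng = np.random.default_rng(3)
T, rho = 2000, -0.7
e = rng.standard_normal((T, 2))
eps_x = e[:, 0]
eps_y = rho * e[:, 0] + np.sqrt(1 - rho**2) * e[:, 1]
x = np.cumsum(eps_x)
y = eps_y.copy()                             # null hypothesis: beta = 0
r = rho_hat(y, x)
```

Given the ten coefficients for a chosen significance level, the feasible critical value would then be obtained as `response_surface_cv(r, coefs)`.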

Alternative feasible tests
One alternative feasible test is the standard t-statistic $T$. Under Assumption W, it is straightforward to show that $T$ has a standard normal limiting null distribution for any value of $\rho_{xy}$, and thus has the potential for nuisance parameter free inference in this world. With respect to the DGP in (4), $T$ is based on a correctly specified regression when $\rho_{xy} = 0$, but when $\rho_{xy} \neq 0$, the regression omits a relevant regressor; while this does not affect the limiting null distribution, $T$ will be inefficient relative to the infeasible test if $\rho_{xy} \neq 0$. However, among feasible tests, $T$ is asymptotically optimal (under Gaussianity) for all $\rho_{xy}$ (see Jansson and Moreira, 2006, p.704), hence we would wish to apply this test under Assumption W, as is done in HLT's hybrid procedure. Theorem 2 of HLT shows that under Assumption W, as $T \to \infty$, $T \Rightarrow N(g, 1)$ under $H_{1,W}$. Under Assumption S, $T$ has a standard normal limit null distribution provided $\rho_{xy} = 0$, in which case it is also efficient; whenever $\rho_{xy} \neq 0$, however, its limit null distribution depends on $\rho_{xy}$ and $c$.
A second feasible statistic proposed by HLT is a variant of the standard t-statistic, appropriate in the case of strongly persistent $x_t$, taking the form of the t-statistic associated with the OLS estimate of $\beta$ in the regression

$$y_t - \hat\alpha_y = \beta(x_{t-1} - \tilde\alpha_x) + u_t,$$

where $\hat\alpha_y := (T-1)^{-1}\sum_{t=2}^{T} y_t$. We denote this statistic as $T'$. Under Assumption W, the limiting null distribution of $T'$ will depend on the (unknown) distribution of $s_1$ (as with $T^*$), hence the statistic is again only designed for use in the strongly persistent world (in contrast to $T$). Under Assumption S, it follows from Theorem 1 of HLT that $T \Rightarrow S(g, \rho_{xy}, c)$ and $T' \Rightarrow S'(g\sigma_x/\sigma_y, \rho_{xy}, c)$. In what follows, we denote tests based on comparison of $T$ and $T'$ with asymptotically conservative critical values by $T_{con}$ and $T'_{con}$, respectively (response surfaces for the conservative critical values are provided in HLT).

Asymptotic local power comparisons under strong persistence
Under Assumption S, we can use the limiting representations given in Theorem 1 to compare the asymptotic local powers of tests based on the $T$, $T'$ and $T^*$ statistics for a range of values of the relevant nuisance parameters on which these depend, $\rho_{xy}$ and $c$.³ We simulate $S(g, \rho_{xy}, c)$, $S'(g\sigma_x/\sigma_y, \rho_{xy}, c)$ and $S^*(g\sigma_x/\sigma_y, \rho_{xy}, c)$ and compare these to the relevant conservative critical values. In what follows, we set $\pi = 0.05$ and conduct upper tail tests. For a given value of $\rho_{xy}$ and $c$, we compute asymptotic powers across $g \geq 0$ ($g = 0$ representing asymptotic size). We consider $\rho_{xy} \in \{-0.95, -0.7, -0.5, -0.1\}$ and $c \in \{0, 1.25, 2.5, 5, 10, 25, 50, 100\}$. For positive values of $\rho_{xy}$, we find a result similar to HLT in that $T_{con}$ becomes the best performing test as $\rho_{xy}$ increases; in the hybrid procedure that we later propose, we follow HLT and make use of $T_{con}$ for $\hat\rho_{xy} > -0.1$, hence here our focus is on negative values of $\rho_{xy}$. Note that this is also the reason why our response surfaces for $T^*_{con}$ outlined above were based only on non-positive values of $\rho_{xy}$.
The results for $\rho_{xy} = -0.95$ are given in Fig. 1. For $c = 0$ we see that $T^*_{con}$ is more powerful than $T_{con}$ and $T'_{con}$, substantially so with respect to $T_{con}$. This remains the case for $c = 1.25$ and $c = 2.5$; we observe the power advantage over $T_{con}$ increasing for these values of $c$, although the power advantage of $T^*_{con}$ relative to $T'_{con}$ is diminishing as $c$ increases. Once $c = 5$ or greater, $T^*_{con}$ is only marginally more powerful than $T'_{con}$, but both are considerably more powerful than $T_{con}$. In Figs. 2-4, the analysis is repeated with $\rho_{xy} = -0.7$, $\rho_{xy} = -0.5$, and $\rho_{xy} = -0.1$. Again $T^*_{con}$ is more powerful than $T'_{con}$ for the lower values of $c$ and they appear to be very similar for the higher values of $c$. Comparing $T^*_{con}$ with $T_{con}$, we observe similar patterns of relative power behavior in Figs. 2 and 3 as were seen in Fig. 1, with $T^*_{con}$ outperforming $T_{con}$, increasingly so as $c$ increases. However, as $\rho_{xy}$ becomes less negative, the differences in power between $T^*_{con}$ and $T_{con}$ become less marked. Indeed, in Fig. 4 where $\rho_{xy} = -0.1$, the powers of $T^*_{con}$ and $T_{con}$ become almost indistinguishable across almost all $c$.

A weighted test under strong persistence
Given the asymptotic power simulation results reported above, it is interesting to consider whether we might be able to combine T * and T (the two best performing tests) in a way to possibly improve power over and above that displayed by T * . For the purposes of illustration, our arguments will concentrate on the environment of Fig. 1(a), where ρ xy = −0.95 and c = 0 where, as noted above, T * is clearly the more powerful test. Now, under H 0 the correlation between T * and T is 0.90. A consequence of this high level of correlation is that the rejections obtained from T under H 1,S are close to being a subset of those obtained from T * . This implies that any (linear) combination of T * and T of the form w * f T * + w f T , where w * f and w f are some fixed positive weights standardized such that w * f + w f = 1, cannot lead to improved levels of power above that of T * because the correlation between the components w * f T * and w f T is identical to that between T * and T . We can, however, consider a randomized weighting scheme, with the random weights w * r and w r , say, determined from the available data, and with w * r and w r having support on [0, 1] and w * r + w r = 1. The aim is for w * r T * and w r T to have lower correlation than holds between T * and T , although it is crucial that the two components remain positively correlated. In this way, each component can potentially make a greater individual contribution to overall power. In what follows we will use x t to construct the weights because, unlike y t , its behavior does not depend on whether H 0 or H 1,S is true.
The distribution of any data dependent weights based on x t will, of course, depend on the value of c. The weight scheme we consider here is based on the p-value, denoted p NB , associated with the familiar local-GLS demeaned normalized bias unit root test statistic of Elliott et al. (1996), i.e., NB := Tφ in the context of (6). Well-known results show that, under Assumption S, NB ⇒ S NB (c), where S NB (c) denotes the limit distribution of NB for a given value of c. The attractive feature of using p NB in the weight scheme is that when c = 0, p NB ⇒ p NB (S NB (0)) = U(0, 1). Hence in the c = 0 case where most difference is observed between the power profiles of T * and T , the weights can be based on a uniformly distributed variate on (0, 1). As c becomes large, the power gains of T * over T diminish and consequently the role of the weight function becomes less critical; use of a weight scheme based on p NB is again appealing here, because as c → ∞, p NB ⇒ 0.
Following such considerations, the weights we consider are defined as w * r := (p NB ) λ and w r := 1 − (p NB ) λ , where the positive constant λ is introduced to permit an additional degree of calibration in the weight specification. We therefore consider the weighted statistic T w := w * r T * + w r T . With this weighting scheme, the asymptotic correlation between w * r T * and w r T remains positive across all values of ρ xy and c we consider (and for all λ). The weighted statistic T w is thus comprised of a weighted average of T * and T , with the weighted average of the tests having most effect when c = 0 and reducing towards simply T as c becomes large.
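The weighting scheme just described can be illustrated with a minimal sketch. The function name and argument names below are hypothetical; the inputs t_star, t_gls, p_nb and lam stand for T * , T , p NB and λ, all of which are assumed to have been computed elsewhere.

```python
def weighted_statistic(t_star, t_gls, p_nb, lam):
    """Return T_w = w* x T* + w x T with w* = p_nb**lam and w = 1 - p_nb**lam.

    t_star : value of the augmented statistic T*
    t_gls  : value of the (non-augmented) statistic T
    p_nb   : p-value of the unit root statistic NB, in [0, 1]
    lam    : positive calibration constant (lambda)
    """
    if not 0.0 <= p_nb <= 1.0:
        raise ValueError("p_nb must be a p-value in [0, 1]")
    if lam <= 0:
        raise ValueError("lam must be positive")
    w_star = p_nb ** lam          # weight on T*
    w = 1.0 - w_star              # weight on T, so the weights sum to one
    return w_star * t_star + w * t_gls
```

When p_nb is close to zero (as happens for large c) the function returns essentially t_gls, while a p-value close to one places almost all of the weight on t_star.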
In Theorem 2 we next state the limiting distribution of T w under Assumption S. The stated result follows straightforwardly from the result given in Theorem 1 and the limiting distribution for T given in Section 3.5.

Theorem 2. Under the conditions of Theorem 1,
In order to implement T w , we obtain a response surface for p NB based on simulated limit distributions. We simulated the limit of NB under c = 0, i.e., S NB (0), and then calculated the numerical approximation to p NB (x) for x ∈ [−20, −19.95, −19.9, ..., 4]. To automate selection of an appropriate asymptotic p-value for a given value of x, we once again calculated a response surface by regressing p NB (x) on G(z) := [1, z 0.25 , z 0.5 , z, z 2 , z 3 ] with z = 1/(1 + e −x ) (481 data points), the logistic function z being a natural choice given that we are approximating a cumulative distribution function. 4 The response surface p-value is the fitted value from this regression, and the response surface coefficient estimates are provided in Table 2, denoted p NB (x). In practice, the response surface p-value can be calculated using x = NB.
Table 3. Response surface coefficient estimates for T w con . Columns: Regressor; λ max (ρ xy ) for π = 0.1, 0.05, 0.025, 0.01; cv * π (ρ xy ) for π = 0.1, 0.05, 0.025, 0.01.
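The response surface construction for the p-value can be sketched as follows. This is an illustration only: the function names are hypothetical, the coefficient estimates of Table 2 are not reproduced here and must be supplied by the user (or re-estimated from a simulated grid of p-values), and the final clipping of the fitted value to [0, 1] is an added safeguard rather than part of the procedure described above.

```python
import numpy as np

def g_regressors(x):
    """Rows of G(z) = [1, z^0.25, z^0.5, z, z^2, z^3] with z = 1/(1 + exp(-x))."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = 1.0 / (1.0 + np.exp(-x))            # logistic transform of x
    return np.column_stack([np.ones_like(z), z**0.25, z**0.5, z, z**2, z**3])

def fit_response_surface(x_grid, p_grid):
    """Least squares regression of the simulated p-values on G(z)."""
    coef, *_ = np.linalg.lstsq(g_regressors(x_grid), np.asarray(p_grid, dtype=float), rcond=None)
    return coef

def response_surface_pvalue(nb, coef):
    """Fitted p-value at x = NB; clipping to [0, 1] is an assumption of this sketch."""
    fitted = g_regressors(nb) @ np.asarray(coef, dtype=float)
    return np.clip(fitted, 0.0, 1.0)

# The grid used in the text: x = -20, -19.95, ..., 4, which gives 481 points.
x_grid = np.linspace(-20.0, 4.0, 481)
```

Given a vector p_grid of simulated p-values on x_grid, fit_response_surface(x_grid, p_grid) returns the six coefficients, and response_surface_pvalue(NB, coef) then gives the p-value that enters the weights.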

Asymptotic critical values and selection of λ
Calculation of conservative asymptotic critical values for the T w statistic is carried out in exactly the same manner as for the T * statistic in Section 3.4, but based on S w NB (0, ρ xy , c, λ). For a given value of λ, we obtain the conservative critical value cv w π (ρ xy , λ). At the same time, we evaluate the local alternative distribution S w NB (gσ x /σ y , ρ xy , c, λ) with σ x = σ y = 1 for g = 7.5 over c ∈ [0, 1, 2, ..., 25] and compare this with cv w π (ρ xy , λ). We choose λ to maximize the average power across c, where the candidate values of λ we consider are λ ∈ [0.05, 0.1, 0.15, ..., 2.5]. We denote the power-maximizing value of λ as λ max (ρ xy ) and the corresponding conservative critical value as cv w π (ρ xy ). As with cv * π (ρ xy ) in the context of T * con , we found cv w π (ρ xy ) is obtained with c = 0 when ρ xy = −0.95. To select the appropriate value of λ and conservative critical value for a given value of ρ xy , we calculated response surfaces by regressing λ max (ρ xy ) and cv w π (ρ xy ), respectively, once again on F(z) = [1, z, z 2 , ..., z 9 ] with z = ρ xy for the 38 data points in our grid of values for ρ xy . The response surface coefficient estimates for the ρ xy -dependent values of λ and associated critical values for π ∈ [0.10, 0.05, 0.025, 0.01] can be found in Table 3 (the remarks made in footnote 2 apply here also). We will refer to this testing procedure in what follows as T w con .
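The selection of λ and the use of the polynomial response surfaces can be sketched as below. The names select_lambda, evaluate_surface and power_fn are hypothetical. In particular, power_fn(lam, c) is a user-supplied function assumed to return the simulated asymptotic power at g = 7.5 for a given λ and c, using the conservative critical value for that λ; the simulation itself is not shown.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def select_lambda(power_fn, lam_grid=None, c_grid=None):
    """Grid search for the lambda that maximizes average power across c."""
    if lam_grid is None:
        lam_grid = 0.05 * np.arange(1, 51)     # 0.05, 0.10, ..., 2.50
    if c_grid is None:
        c_grid = np.arange(0, 26)              # 0, 1, ..., 25
    avg_power = np.array(
        [np.mean([power_fn(lam, c) for c in c_grid]) for lam in lam_grid]
    )
    best = int(np.argmax(avg_power))
    return float(lam_grid[best]), float(avg_power[best])

def evaluate_surface(rho_xy, coef):
    """Evaluate sum_k coef[k] * rho_xy**k, i.e. a fitted surface on F(z) = [1, z, ..., z^9]."""
    return float(P.polyval(rho_xy, np.asarray(coef, dtype=float)))
```

With the ten coefficient estimates for a given π taken from Table 3, evaluate_surface(rho_xy_hat, coef) would return the fitted λ max (ρ xy ) or cv w π (ρ xy ) at the estimated correlation.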

Asymptotic local power comparisons of T * , T and T w under strong persistence
Figure 1(a) also presents simulations of the asymptotic powers of the T w con procedure for c = 0, again using upper tail tests for π = 0.05, eliciting a direct comparison with T * con and T con . We see that T w con (calculated using λ max (ρ xy ) and compared to its conservative critical value cv w π (ρ xy )) is substantially more powerful than T * con . So we obtain a situation where combining the tests produces useful gains in the sense that the combined procedure has higher power than both of the individual constituent tests. This is made possible because the components of T w , i.e., (p NB ) λ T * and {1 − (p NB ) λ }T , have asymptotic correlation 0.43 under H 0 , which is positive but much lower than that of T * and T (0.90). It is also interesting to note that the critical value of T w con here is 1.96, which is close to the critical value of T con (1.94) and substantially smaller than that of T * con (5.40). For the other values of c in Fig. 1, we see T w con still dominating T * con (and hence T con ) until c = 10. At this point its power essentially coincides with that of T con since p NB is now generally close to zero. For ρ xy = −0.7 and ρ xy = −0.5 in Figs. 2 and 3, respectively, we see that the power levels of T w con are near to those of T * con , even for small values of c where T * con is more powerful than T con . Hence it is not always the case that the weighted combination improves upon the better of T * con and T con , but we do find that T w con is never meaningfully outperformed by the better of the two individual tests. When ρ xy = −0.1 in Fig. 4, T w con has a power profile that essentially coincides with T * con .
We investigated the effects of switching the weights in T w such that w * r = 1 − (p NB ) λ and w r = (p NB ) λ in the case of ρ xy = −0.95 and c = 0 (i.e., the settings of Fig. 1(a)). The components of this variant of T w , {1 − (p NB ) λ }T * and (p NB ) λ T , now have asymptotic correlation −0.38 under H 0 and the critical value of T w con is 5.66, somewhat larger than that of T * con (5.40). The powers of T w con were found to be uniformly below those of T con , let alone T * con , which serves to illustrate the importance of the components of T w being positively correlated.

A hybrid procedure allowing for strong or weak persistence
Although the main focus of our analysis thus far has been on the case of strong persistence, we now outline our proposed hybrid testing procedure which closely mirrors the hybrid testing procedure, denoted T hyb in what follows, outlined in Section 3.3 of HLT. This procedure is designed to capitalize on the optimality property of the conventional t-test (where T is compared to a standard normal critical value) under weak persistence (Assumption W), and exploit the relative local power advantages of T w con and T con observed from the analysis in Section 4.2 for different values of ρ xy under strong persistence (Assumption S). This will entail the use of two switching mechanisms. The first involves a switching approach similar to that used in EMW, whereby the standard test is selected when evidence of a weakly persistent predictor is present. In the absence of such evidence, a secondary switching mechanism is needed to determine whether T w con or T con should be applied, this time on the basis of a consistent estimate of ρ xy ; in particular, for a strongly persistent predictor we would want to make use of T w con for more negative values of ρ xy , and T con for small negative and positive ρ xy .
The hybrid testing procedure we outline below can therefore be seen to parallel the structure of the T hyb procedure of HLT, with the statistic T in HLT's procedure replaced by the weighted statistic T w , developed in Section 4 above, in the light of its superior power performance documented in Section 4.2. Denoting such a procedure by T w hyb , our proposed hybrid testing approach proceeds as follows:
1. If NB OLS < −4T^{1/2} perform T N , where T N denotes the test which compares T with a standard normal critical value, and where NB OLS := Tφ is the standard OLS demeaned Dickey-Fuller normalized bias unit root statistic based on φ, the OLS slope estimate obtained from regressing Δx t on a constant and x t−1 .
Step 1 coincides with Step 1 of the corresponding hybrid testing procedure, T hyb , from HLT. As in HLT, the normalized bias statistic is used to distinguish between the strongly and weakly persistent cases. Under Assumption S, NB OLS = O p (1), while under Assumption W, NB OLS diverges to minus infinity. For the reasons outlined on p.205 of HLT, we implement NB OLS with a sample size dependent critical value of −4T^{1/2}. Under Assumption W, NB OLS diverges to minus infinity at a rate faster than T^{1/2}, hence T N is always selected asymptotically under weak persistence because Pr(NB OLS < −4T^{1/2}) → 1 as the sample size diverges.
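The switching rule in Step 1 can be sketched in a few lines. The function names are hypothetical, and two conventions are assumptions of this sketch rather than statements from the text: the slope φ is taken from a regression of the first difference of x t on a constant and x t−1 , and T is taken to be the length of the observed series.

```python
import numpy as np

def nb_ols(x):
    """OLS demeaned Dickey-Fuller normalized bias, NB_OLS = T * phi_hat.

    phi_hat is the slope on x_{t-1} in a regression of the first
    difference of x_t on a constant and x_{t-1} (an assumption here).
    """
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)                                    # x_t - x_{t-1}
    X = np.column_stack([np.ones(len(dx)), x[:-1]])    # constant and x_{t-1}
    coef, *_ = np.linalg.lstsq(X, dx, rcond=None)
    return len(x) * coef[1]

def use_standard_t_test(x):
    """Step 1: select T_N when NB_OLS < -4 * sqrt(T)."""
    T = len(x)
    return bool(nb_ols(x) < -4.0 * np.sqrt(T))
```

For a white noise predictor the slope estimate is close to −1, so NB OLS is of order −T and the standard test is selected; for a random walk NB OLS stays bounded in probability and the rule passes control to the strong persistence branch.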

Remark 5. Under strong persistence and the values of ρ xy we have considered in our asymptotic power analysis in Section 4.2, the asymptotic behavior of the hybrid test procedure T hyb of HLT coincides with that of T con , while the asymptotic behavior of the new procedure T w hyb coincides with that of T w con . As such, T w hyb will have considerably higher asymptotic power than T hyb in this environment when c is small.
Remark 6. Although we have outlined our hybrid testing procedure in terms of upper- and lower-tail one-sided tests for predictability, in principle these could also be used to perform two-sided tests for predictability. In particular, supposing the upper- and lower-tail versions of the test were both run at the (asymptotic) π/2 significance level, then combining inference from the two individual one-sided tests for predictability would lead to an overall two-sided test for predictability with asymptotic size of no greater than π.

Higher-order predictor serial correlation
We next consider how our procedures should be adapted to take account of possible additional serial correlation in the process for the predictor series x t . To that end, we generalize the AR(1) formulation placed on s t in (2) to the AR(p + 1) formulation (10), where $\psi(L) := 1 + \sum_{j=1}^{p} \psi_j L^j$ is a finite-order stationary AR(p) polynomial such that all of the roots of ψ(z) = 0 lie outside the unit circle, |z| = 1. The assumption of a finite-order autoregression for ν t , and hence s t , appears to be standard in both the control variable and residual-augmented strands of the predictive regression literature; see, for example, Campbell and Yogo (2006), Elliott (2011), and Demetrescu and Rodrigues (2022), all of whom assume finite-order autoregressions. As argued in Demetrescu and Rodrigues (2022, p.431), in practice we might view this as an approximation to a more general linear process for ν t , although formally this would require establishing a suitable rate at which p → ∞ as T → ∞. We conjecture that the conventional rate conditions on p associated with unit root test statistics given in, for example, Chang and Park (2002), should suffice for this purpose.
To accommodate the additional stationary serial correlation introduced through (10), the following modifications to the procedures outlined previously need to be made. First, NB OLS and NB need to be based on the corresponding estimated augmented Dickey-Fuller regressions (11) and (12), respectively, with the statistics re-defined as NB OLS := $T\phi/(1 - \sum_{i=1}^{p} \gamma_i)$ and NB := $T\phi/(1 - \sum_{i=1}^{p} \gamma_i)$, each evaluated at the coefficient estimates from its own regression. Each of these ADF regressions now includes p lagged difference terms. In practice p can be chosen by any consistent lag selection method; the numerical simulations reported in Section 7 use the MBIC rule of Ng and Perron (2001). Next, the residual from (11) is used to calculate the estimate of ρ xy in (8), and finally, the residual from (12) enters the regression (Eq. 7) for calculating T * . With these modifications implemented, the hybrid procedure outlined in Section 5 can continue to be implemented using the same set of conservative critical values.
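A minimal sketch of the lag-augmented normalized bias statistic is given below for the OLS demeaned case only. The function name is hypothetical, the regression layout (first difference on a constant, the lagged level and p lagged differences) is the standard ADF form and is assumed to match the regression used above, and T is taken to be the length of the observed series. The local-GLS demeaned version and the selection of p are not covered.

```python
import numpy as np

def adf_normalized_bias(x, p):
    """Return T * phi_hat / (1 - sum of gamma_hat_i) from an ADF regression with p lags."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    dx = np.diff(x)                          # dx[k] = x[k+1] - x[k]
    y = dx[p:]                               # dependent variable: current differences
    cols = [np.ones(len(y)), x[p:-1]]        # constant and lagged level
    for i in range(1, p + 1):
        cols.append(dx[p - i:len(dx) - i])   # i-th lagged difference
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    phi_hat = coef[1]
    gamma_sum = coef[2:].sum()               # equals 0.0 when p = 0
    return T * phi_hat / (1.0 - gamma_sum)
```

Setting p = 0 recovers the unaugmented statistic, so the same function can be used whether or not additional serial correlation is allowed for.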

Finite sample simulations
In this section, we evaluate the finite sample size and power properties of the T w hyb procedure developed in Section 5. We generate data using a sample size T = 200 from the model (1)-(3) with (e 1t , e 2t ) ∼ IIDN (0, I 2 ), σ x = σ y = 1 and drawing s 1 as a standard normal variate. We set α y = α x = 0 as the tests we calculate are invariant to these constant terms. The values of ρ xy , c and g we consider are the same as in the asymptotic analysis of Figs. 1-4 to facilitate a comparison between finite sample and asymptotic performance. Upper tail 0.05-level tests are again conducted, with the results based on 20,000 replications. Throughout, we estimate p using the MBIC rule of Ng and Perron (2001) with a maximum permitted lag order of p max = ⌊12(T/100)^{1/4}⌋ (⌊·⌋ denoting the integer part) together with the modification suggested by Perron and Qu (2007).
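The simulation design can be sketched as follows. The maximum lag formula is taken directly from the text. The data generating process, however, relies on assumptions, since the model (1)-(3) is not restated in this section: the sketch takes y t = β x t−1 + ε yt with β = g/T, x t = s t with s t = ρ s t−1 + ε xt and ρ = 1 − c/T, and induces the correlation ρ xy by setting ε xt = e 1t and ε yt = ρ xy e 1t + (1 − ρ xy ^2)^{1/2} e 2t . The function names are hypothetical.

```python
import math
import numpy as np

def p_max(T):
    """Maximum lag order: integer part of 12 * (T/100)^(1/4)."""
    return math.floor(12 * (T / 100) ** 0.25)

def simulate_dgp(T, c, g, rho_xy, rng):
    """Draw (y, x) under the ASSUMED form of model (1)-(3) described in the lead-in."""
    e1 = rng.standard_normal(T)
    e2 = rng.standard_normal(T)
    eps_x = e1                                           # sigma_x = 1
    eps_y = rho_xy * e1 + np.sqrt(1.0 - rho_xy**2) * e2  # sigma_y = 1, corr = rho_xy
    rho = 1.0 - c / T                                    # local-to-unity autoregressive root
    beta = g / T                                         # local alternative (assumed scaling)
    s = np.empty(T)
    s[0] = rng.standard_normal()                         # s_1 drawn as a standard normal
    for t in range(1, T):
        s[t] = rho * s[t - 1] + eps_x[t]
    x = s                                                # alpha_x = 0
    y = np.empty(T)
    y[0] = eps_y[0]                                      # no lagged predictor for the first point
    y[1:] = beta * x[:-1] + eps_y[1:]                    # alpha_y = 0
    return y, x
```

For T = 200 the rule gives p_max(200) = 14, since 12 × 2^{1/4} ≈ 14.27.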
We first consider simulations of the finite sample size of T w hyb , i.e., setting g = 0, allowing for additional serial correlation in the process for x t through the specification in (14) for various values of ϕ, θ ∈ {−0.5, 0, 0.5}. The simulation DGP for ν t in (14) allows for MA behavior whenever θ ≠ 0. We recall that this is not formally allowed for in the DGP specified for ν t in (10), but it is still of interest to consider MA errors in the simulations to investigate how well our proposed tests work in such cases, not least given our conjecture that the tests will remain valid for MA errors under a suitable rate condition on p. Table 4 reports the results across the different settings for c and ρ xy . We observe size to be generally well-controlled and close to the nominal level, with the serial correlation parameter settings for ϕ and θ having relatively little bearing on the rejection frequency under the null. Some modest over-size is apparent for the more negative values of ρ xy when c is zero or small, but this diminishes as c increases and as the model innovations become less correlated.
That the over-size is most apparent around c = 0 and ρ xy = −0.95 is a consequence of (i) in this region the T w hyb procedure will almost always be performing T w con and (ii) as noted above cv w π (ρ xy ) is obtained with c = 0 (i.e., T w con is asymptotically correctly sized for c = 0 and conservative elsewhere). Consequently, when finite sample over-size of T w hyb occurs via T w con it would be expected to be most prominent when c = 0. Of course, this does not resolve why finite sample over-size (as opposed to near correct- or under-size) should be manifest in the first place and we have no ready explanation for this. What we can say is that the over-size diminishes reasonably quickly as the sample size increases and the asymptotics start to assert themselves. For example, the leading entry of Table 4 (c = 0, ρ xy = −0.95) shows a size of 0.075 when T = 200. For T = 400 and T = 800 the corresponding sizes are 0.060 and 0.056, respectively.
We next evaluate the finite sample power of T w hyb , and do so through comparison with the T hyb procedure of HLT and the test procedure proposed by EMW, which we denote by EMW. As discussed in Section 1, the EMW test procedure is the most natural extant comparator for the T w hyb test, given that both exclude explosive predictors (recall that HLT's T hyb procedure allows for a small degree of local explosivity). Here we set ϕ = θ = 0, but do not assume knowledge of this and continue to determine p using the method of the previous section. To implement the EMW procedure, we adopt the switching function specified on p.799 of EMW so that the standard t-test, T N , is applied if an estimate of the local offset c is at least 130, while their weighted average power criterion-based test is applied otherwise, using the sample statistics and long run correlation estimator specified on p.697 of Jansson and Moreira (2006). To estimate c we follow HLT and use −Tφ from (11), and when the standard t-test is used in EMW, we follow EMW's approach of setting the critical value to the usual value of 1.645 for non-negative estimates of the long run correlation parameter, but to set it to 1.7 for negative estimates. Long run variances are calculated using a Bartlett kernel with lag truncation T^{1/3}. The T hyb procedure was implemented as in HLT, i.e., the same procedure as T w hyb in Section 5 above, but with T con replacing T w con .
We first consider the comparison between T w hyb and T hyb . Figure 5 gives the results for ρ xy = −0.95. In Fig. 5(a) where c = 0, we see that T w hyb and T hyb have approximately the same size, slightly above the nominal level (cf. Table 4 for T w hyb ). It is clear however that T w hyb is substantially more powerful than T hyb . The relevant asymptotic counterpart here is a comparison between T w con and T con in Fig. 1(a) and the power differences there appear even more significant, suggesting that we would see further power gains of T w hyb over T hyb for finite sample sizes larger than T = 200. Elsewhere in Fig. 5 we see that T w hyb continues to be more powerful than T hyb for values of c up to c = 5; thereafter T w hyb and T hyb have identical power profiles. Notice again that the slight over-size associated with the tests is less apparent for the larger values of c. Figures 6-8 show the results for the other values of ρ xy . For small c we continue to see T w hyb outperform T hyb , while they behave similarly elsewhere, again in line with the asymptotic comparisons of T w con and T con in Figs. 2-4. Clearly then, there are potential benefits to be gained in practice by using the procedure T w hyb instead of T hyb , since it performs either as well as or better than T hyb , offering power gains when the predictor variable is highly persistent, without worry of compromise when less persistent predictors are employed.
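The Bartlett kernel long run variance calculation mentioned above can be sketched as follows. Several conventions are assumptions of this sketch and may differ from those used in the simulations: the lag truncation is taken as the integer part of T^{1/3}, the series is demeaned before autocovariances are computed, autocovariances are scaled by 1/T, and the weights are 1 − j/(m + 1). The function names are hypothetical.

```python
import numpy as np

def default_bandwidth(T):
    """Integer part of T^(1/3); the small offset guards against floating point error."""
    return int(np.floor(T ** (1.0 / 3.0) + 1e-9))

def bartlett_lrv(u, bandwidth=None):
    """Bartlett kernel (Newey-West type) long run variance estimate of the series u."""
    u = np.asarray(u, dtype=float)
    u = u - u.mean()                         # demeaning is an assumption of this sketch
    T = len(u)
    m = default_bandwidth(T) if bandwidth is None else int(bandwidth)
    lrv = u @ u / T                          # lag-0 autocovariance
    for j in range(1, m + 1):
        weight = 1.0 - j / (m + 1.0)         # Bartlett weights decline linearly in j
        gamma_j = u[j:] @ u[:-j] / T         # lag-j autocovariance
        lrv += 2.0 * weight * gamma_j
    return lrv
```

With T = 200 the default truncation is 5 lags, and with a bandwidth of zero the estimate reduces to the sample variance.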
Finally, comparing our T w hyb procedure to EMW, we find that, across the ρ xy values we consider, the tests have very similar levels of power for the smaller values of c, while for larger c, T w hyb clearly emerges as the more powerful procedure, with increasing gains over EMW seen as the magnitude of c increases. This feature is most apparent for the most negative values of ρ xy , where the power gains of T w hyb relative to EMW are apparent even for c = 2.5. Overall, in addition to offering a generally superior power profile to T hyb , the T w hyb procedure can achieve substantial power advantages over the procedure of EMW for a wide range of ρ xy and c combinations, while the reverse is never true.

Conclusions
In this article, we have proposed a new hybrid procedure designed to test for predictability in returns which is valid in cases where the predictor is either weakly or strongly persistent. Our proposed hybrid test is a complement to the closely related hybrid testing procedure of HLT. In particular, the simulation results presented in HLT highlight that their hybrid test outperforms other extant predictability tests in most settings, but is outperformed by the test procedure of EMW in the case of strongly persistent predictors with the persistence parameter c either zero or small. The comparison is, however, not on a level playing field because EMW rule out the possibility of mild explosivity (c < 0) in the predictor, while HLT allow for some mild explosivity. By restricting the predictor to be non explosive, we are able to consider using a control variable based test (in the spirit of Elliott, 2011), whereby the predictive regression is augmented by a GLS-based proxy for the innovation driving the predictor. We have shown that a feasible conservative implementation of this augmented test improves upon the asymptotic local power of the feasible test used in HLT, which is based on using a quasi-GLS demeaned version of the predictor (but no covariate), in precisely the region of the parameter space where the HLT procedure is less powerful than the EMW test. Moreover, we show that a test based on a weighted average of the augmented statistic and the quasi-GLS statistic from HLT delivers notable further improvements in asymptotic local power in this region. Our hybrid test then replaces the quasi-GLS test used in the hybrid procedure in HLT with this weighted test. Like the hybrid tests in both EMW and HLT our proposed hybrid test procedure reverts to a conventional regression t-test on the predictor (comparing to standard normal critical values) if the data suggest that the predictor is weakly persistent. 
Monte Carlo simulations presented in this article demonstrate that our proposed hybrid procedure is overall more powerful than both the EMW and HLT test procedures across a wide spectrum of values of the persistence level in the predictive regressor (including where c is zero or small) and the correlation coefficient between the innovations in the model. Where explosive predictors can be ruled out, we therefore recommend the procedure developed in this paper. Otherwise we recommend using the corresponding procedure we developed in HLT.