Asymptotic behavior of encompassing test for independent processes: Case of linear and nearest neighbor regressions

Abstract The encompassing test has been well developed for fully parametric modeling. In this study, we are interested in the encompassing test for parametric and nonparametric regression methods. We consider linear regression for parametric modeling and nearest neighbor regression for the nonparametric method. We establish the asymptotic normality of the encompassing statistics associated with the encompassing hypotheses for the linear parametric method and the nonparametric nearest neighbor regression estimate. We also obtain a convergence rate depending only on the number of neighbors, while for the kernel method it depends on the number of observations and the bandwidth. We achieve the same convergence rate when the bandwidth satisfies h_n = k/n. Moreover, the asymptotic variance of the encompassing statistic associated with kernel regression depends on the density; this is not the case for the nearest neighbor regression estimate.


Introduction
Encompassing tests belong to the model selection step. They are used to detect redundant models among admissible models. In that case, an encompassing model is intended to account for the results found by the encompassed model. Theoretical developments on the encompassing test can be found in Mizon (1984), Gouriéroux and Monfort (1995) and Florens et al. (1996). For applications, we refer readers to the general-to-specific computer-based model selection procedure of Hendry and Doornik (1994).

ABOUT THE AUTHOR
Patrick Rakotomarolahy is an assistant professor in the department of mathematics and their applications at Fianarantsoa University. He completed his BSc and MSc in applied mathematics and received his doctorate at the Panthéon-Sorbonne Paris 1 University. His current research is in statistical model selection and in modeling macroeconomic and financial variables. He focuses especially on issues of model selection between parametric and nonparametric techniques. This study is in line with this direction, as the findings on the asymptotic behavior of encompassing tests allow us to detect redundant models.

PUBLIC INTEREST STATEMENT
Regression techniques are used as a quantitative analysis method in many fields, such as economic or financial modeling. They are a useful tool for identifying factors which may explain the evolution of a variable of interest. In economic modeling, for example, the evolution of Gross Domestic Product (GDP) might be affected by many variables such as the interest rate, inflation, the exchange rate and sentiment indicators. Researchers or experts may face several admissible models from parametric and/or nonparametric regression methods. The encompassing test can be helpful for detecting redundant models among them. The findings in this study contribute to the encompassing test between linear and nearest neighbor regression estimates.
Recently, Bontemps et al. (2008) developed an encompassing test for linear parametric against kernel nonparametric regression methods. They provide the asymptotic normality of the associated encompassing statistics under the independent and identically distributed (i.i.d.) hypothesis. As stated in Hendry et al. (2008), the work of Bontemps et al. (2008) is the starting point of the treatment of encompassing tests for functional parameters based on nonparametric methods.
We extend this result to the nearest neighbor regression method, which has been claimed to be more flexible than the kernel method. A further motivation is its interest in applications, as in Nowman and Saltoglu (2003), Guégan and Huck (2005), Ferrara et al. (2010), Guégan and Rakotomarolahy (2010), and Puspitasari and Rustam (2018), among others.
In the next section, we provide an overview of the encompassing test. We then establish asymptotic normality for the various encompassing statistics associated with the linear parametric and nearest neighbor regression methods. Finally, we conclude.

Encompassing test for independent processes
This section introduces the encompassing test and then builds the corresponding encompassing hypothesis. Given two regression models M1 and M2, we are interested in knowing whether model M1 can account for the results of model M2; in other words, we want to know whether M1 encompasses M2 or, in short notation, M1 E M2. Testing such a hypothesis is done using the notion of an encompassing test. Generally speaking, model M1 encompasses model M2 if the parameter θ_M2 of the latter model can be expressed as a function of the parameter θ_M1 of the former model. In other words, let Δ(θ_M1) be the pseudo-true value of θ_M2 on M1. In general, the pseudo-true value is defined as the plim of the estimator of θ_M2 on M1. For more discussion on the pseudo-true value associated with the KLIC,1 we refer to Sawa (1978) and Govaerts et al. (1994). The encompassing statistic is given by the difference between the estimate of θ_M2 and that of Δ(θ_M1), scaled by a coefficient a_n.

Let S = (Y, X, Z) be a zero-mean random process with values in R × R^d × R^q, where d, q ∈ N*. For x ∈ R^d and z ∈ R^q, we consider the two models M1 and M2 defined as follows:

M1: m(x) = E[Y | X = x],    M2: g(z) = E[Y | Z = z].    (1)

In addition, the general unrestricted model is given by r(x, z) = E[Y | X = x, Z = z]. Following the encompassing test for functional parameters in Bontemps et al. (2008), we have the null hypothesis:

H: g(z) = E[m(X) | Z = z] for all z ∈ R^q.

This null states that M1 is the owner model, and M2, called the rival model, serves to validate this statement. We test this hypothesis H through the following implicit encompassing hypothesis:

H*: E[Y − m(X) | Z = z] = 0 for all z ∈ R^q.

The following homoskedasticity condition will be assumed throughout this work: Var(Y | X = x, Z = z) = σ². Moreover, a necessary condition for the encompassing test relies on the errors of both models: the intended encompassing model M1 should have a smaller standard error than the encompassed model M2.
Given a sample of size n, s_i = (y_i, x_i, z_i) for i = 1, ..., n, as realizations of the random process S = (Y, X, Z), we suppose that the s_i, i = 1, ..., n, are i.i.d. Then, for given functional estimates m_n and g_n of the functions m and g, respectively, we have the following encompassing statistic:

δ_{m_n,g_n} = g_n − Ĝ(m_n),

where Ĝ(m_n) is an estimate of the pseudo-true value, associated with g_n on H, in the LHS of the hypothesis H*. Bontemps et al. (2008) have provided the asymptotic normality of this encompassing statistic δ by considering the kernel regression estimate for the nonparametric method. This result can be extended to the nearest neighbor regression estimate, but of course with different assumptions.
For the nearest neighbor regression estimate, we consider the representation in Mack (1981), in which the k nearest neighbor (or k-NN) estimate g_n of g is given by

g_n(z) = Σ_{i=1}^n w((z − Z_i)/R_n) Y_i / Σ_{i=1}^n w((z − Z_i)/R_n),

where R_n is the distance, according to the Euclidean norm in R^q, from z to its k(n)th nearest neighbor, and w(u) is a bounded, non-negative weight function satisfying

∫ w(u) du = 1 and w(u) = 0 for |u| ≥ 1.

To establish an asymptotic distribution of δ_{m_n,g_n}, we need some assumptions. The following assumptions, taken from Mack (1981), will be used to ensure asymptotic normality. Without loss of generality, the function f denotes a marginal density, a conditional density or a joint density according to the variables in its arguments.
The first assumption concerns the density function of the couple (Y, Z).
The following assumption concerns conditions on the moments up to order three of the variable of interest.

Assumption 2. E[|Y|^3] < ∞.
The last assumption states conditions on the relationship between the number of neighbors k and the sample size n.
When assumptions 1-3 hold and relation (3) is satisfied, Mack (1981) established the asymptotic normality of the centered k-NN regression estimate g_n. Moreover, under assumption 3, the bias of this k-NN regression estimate vanishes asymptotically.
Without loss of generality, we proceed as previously when model M1 is estimated by the k-NN regression method. In the rest of the paper, N(μ, v) denotes the normal distribution with mean μ and variance v. We now present the asymptotic normality of the encompassing statistic.

Asymptotic normality of the encompassing statistic
In general, M1 and M2 can each be estimated using nonparametric or parametric regression methods. We can encounter the following four situations: M1 and M2 both estimated parametrically; M1 and M2 both estimated nonparametrically; M1 estimated nonparametrically and M2 parametrically; and M1 estimated parametrically and M2 nonparametrically.
For developments on the asymptotic behavior of the encompassing statistic in the fully parametric case, i.e., when the two models M1 and M2 have parametric specifications, we refer readers to Gouriéroux et al. (1983) and Mizon and Richard (1986), among others. For a recent discussion of the encompassing test in the fully parametric case, Bontemps et al. (2008) is a good reference.
Next, we will study the completely nonparametric case.

Nonparametric specification for M1 and M2
We consider the case where the two models M1 and M2 defined in (1) are estimated using the nonparametric nearest neighbor regression method. To test the hypothesis "M1 encompasses M2", we establish the asymptotic normality of the associated encompassing statistic.
Theorem 3.1. Assume that assumptions 1-3 and relations (2) and (3) hold. Then the standardized encompassing statistic √(k − 1) δ_{m_n,g_n}(z) is asymptotically normal, where c_q = π^{q/2}/Γ((q + 2)/2) is the volume of the unit ball in R^q, with Γ(·) the gamma function.

Proof of Theorem 3.1
The proof is based on a decomposition of the encompassing statistic into two parts: an expression involving a nearest neighbor regression, and a bias-type term. First, denote ε_i = Y_i − m(x_i). Writing the encompassing statistic by replacing the estimates g_n and Ĝ(m_n) at a given point z, we obtain a decomposition into two terms, where A is the first expression on the RHS of the equality. The term A involves a k-NN regression of ε_i = Y_i − m(x_i) on Z_i, scaled by the coefficient √(k − 1) viewed as the convergence rate as n goes to infinity. Using Mack (1981), under assumptions 1-3 and when relation (3) holds, A converges in distribution as n → ∞.

Next, the second expression B can be bounded by taking its supremum with respect to x_i, yielding a bound involving two terms B_1 and B_2. Using the expression of the bias in Theorem 1 of Mack (1981), B_2 becomes a term involving a function A(·) depending only on x_i, whose expression can be found in Mack (1981). Then, from Assumption 3, B_2 vanishes as n → ∞. It remains to show that B_1 also goes to zero. This can be achieved using the result of Mukerjee (1993), an extension of Cheng's work (Cheng, 1984). We remark that as the number of neighbors k increases, the weights given to the individual neighbors decrease; rewriting m_n(x_i), we have the following equivalence, where K(·) is a given weight function satisfying condition (3), c_j is a bounded weight equal to zero when j is larger than the number of neighbors, and R_i is the distance between x_i and its kth neighbor.
Denoting m_n(x_i) = Σ_{j=1}^n (c_j/√k) Y_j, Theorem 2.1 in Mukerjee (1993) then gives a bound with r > 1 and θ_n a positive sequence tending to zero as n → ∞. Hence B_1 converges to zero in probability, and so does |B|. This completes the proof of the theorem.
Next, we consider the mixed situation where the owner model has a parametric specification and the rival model is estimated nonparametrically.

Parametric modelling for M1 vs nonparametric specification for M2
In this section, we consider the case where model M1 is a linear parametric model and M2 is estimated by the nearest neighbor regression technique. Therefore, the hypothesis H has a linear parametric specification. The encompassing statistic associated with the null M1 E M2 can be rewritten as follows:

δ_{β,g}(z) = g_n(z) − Ĝ_L(β)(z),

where Ĝ_L(β) is an estimate of the pseudo-true value G_L(β)(z) associated with g_n on H, defined as G_L(β)(z) = β′E[X | Z = z].
We estimate the rival model M2 using the k-NN regression method, while the owner model M1 keeps its linear parametric specification. The following theorem provides the asymptotic normality of the encompassing statistic introduced in relation (5).
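As a sketch of how this statistic can be computed in practice, the following illustration (our own, not the paper's code) estimates β by OLS and uses uniform 1/k weights for both k-NN regressions; the function names are ours:

```python
import numpy as np

def knn_mean(z, Z, V, k):
    """Uniform-weight k-NN regression of V on Z, evaluated at z."""
    idx = np.argsort(np.linalg.norm(Z - z, axis=-1))[:k]
    return V[idx].mean()

def encompassing_stat(z, Y, X, Z, k):
    """delta_{beta,g}(z) = g_n(z) - Ghat_L(beta)(z): the k-NN regression of
    Y on Z minus the k-NN regression of the OLS fitted values beta'X on Z."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)  # OLS estimate of the owner model
    g_n = knn_mean(z, Z, Y, k)                    # rival model estimate g_n(z)
    G_hat = knn_mean(z, Z, X @ beta, k)           # pseudo-true value estimate
    return g_n - G_hat
```

If the linear model fits Y exactly, the fitted values coincide with Y and the statistic is identically zero, as the null suggests.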

Proof of Theorem 3.2.
When the owner model M1 is the linear parametric regression and the rival model M2 is the k-NN regression, we write the encompassing statistic as follows:

√(k − 1) δ_{β,g}(z) = √(k − 1) (g_n(z) − Ĝ_L(β)(z)),

which decomposes into terms N_1 to N_4, where w((z − Z_i)/R_n) is the weight associated with the nearest neighbor regression of β̂′X_i on Z_i and R_n is the distance from z to its kth neighbor.
We remark that if Y_i and the fitted values β̂′X_i share the same Z_i nearest to z, then N_3 − N_4 = 0. Otherwise, this holds asymptotically, that is, Y_i and the fitted values β̂′X_i have the same Z_i nearest to z when k and n tend to infinity. Thus, N_3 is asymptotically equivalent to N_4.
For the first expression N_1, which is √(k − 1) times a k-NN regression term, the argument proceeds as in the proof of Theorem 3.1. Thus, from Slutsky's theorem, N_2 tends to zero in probability.
We now consider the last case, where the owner model M1 is estimated by a nonparametric method and the rival model M2 is a linear parametric model.

Nonparametric specification for M 1 vs parametric modelling for M 2
We now consider the owner model M1 to be estimated using a k-NN nonparametric regression and the rival model M2 to be a linear parametric model. The encompassing statistic associated with the null M1 E M2 is given by:

δ_{m,γ} = γ̂ − γ̂(m_n),

where γ̂(m_n) is an estimate of the pseudo-true value γ(m) associated with γ on H, defined by γ(m) = (E[ZZ′])^{-1} E[Z m(X)]. We estimate the unknown conditional mean m associated with model M1 using the k-NN regression estimate. The following theorem states the asymptotic normality of the encompassing statistic in relation (7). For precision, we use the assumptions introduced in the previous section, applied to the k-NN regression estimate m_n instead of g_n.
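The pseudo-true value above has a direct sample analogue: regress the k-NN fitted values m_n(x_i) on Z_i by least squares. A minimal sketch (our own illustration; m_hat stands for the vector of k-NN fits m_n(x_i)):

```python
import numpy as np

def pseudo_true_gamma(Z, m_hat):
    """Sample analogue of gamma(m) = (E[ZZ'])^{-1} E[Z m(X)]: plug the k-NN
    estimates m_hat[i] = m_n(x_i) into the least squares normal equations."""
    n = len(Z)
    ZZ = Z.T @ Z / n        # estimates E[ZZ']
    Zm = Z.T @ m_hat / n    # estimates E[Z m(X)]
    return np.linalg.solve(ZZ, Zm)
```

When m_hat is itself linear in Z, the sample analogue recovers the corresponding coefficient vector exactly.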

Proof of Theorem 3.3.
When the functional parameter m_n is the k-NN regression estimate, we rewrite the associated encompassing statistic as a sum of two terms, where L_1 corresponds to the first expression on the RHS of equality (8). It coincides with the linear regression of the error ε. Under the i.i.d. assumption in Theorem 3.3, L_1 converges in distribution to Z, where Z is normally distributed with mean zero and variance Ω = σ²(E[ZZ′])^{-1}. For the second expression L_2, we bound it by taking the maximum with respect to x_i:

|L_2| ≤ √n S_n D_n sup_i |m(x_i) − m_n(x_i)|,

which is asymptotically equivalent to the bound of |B| in Equation (4), which converges to zero in probability. Thus, the product vanishes by Slutsky's theorem. This completes the proof.

Illustration
In this section, we illustrate our theoretical results on real data. We focus on socio-economic determinants of life expectancy. As explanatory variables for life expectancy at birth, we consider Gross National Income per capita in US $, Gross Domestic Product per capita in US $ and government health expenditure per capita in US $. The impact of these variables on life expectancy at birth has long been analyzed in the literature; for regression analyses, see Hussain (2002) and Ali and Ahmad (2014). We use cross-sectional data for 169 countries in 2017, collected from the United Nations and World Health Organization websites. To start our empirical study, we compute some basic statistics.
The highest life expectancy reaches a remarkable 84 years and belongs to Japan, while the lowest, around 52 years, belongs to the Central African Republic. The mean life expectancy of 72 years is also noteworthy. Moreover, the median value of 73.69 indicates that around 84 countries have a life expectancy above 73 years, well beyond typical retirement ages.
Among the socio-economic variables, Luxembourg, Switzerland and the USA have the highest GDP, income and health expenditure per capita, respectively. Burundi has the lowest GDP and income per capita, while the Democratic Republic of the Congo registers the lowest government spending on health care. These variables exhibit some common behavior: the median of each variable is around fifteen times its minimum and about one fifteenth of its maximum, and all show high dispersion. We now proceed to the analysis of their relationship with life expectancy.
Let us compute the correlation coefficients between life expectancy and the predictor variables.
From Table 2, life expectancy has a positive and high correlation with each explanatory variable. Such correlations indicate that higher GDP, income or expenditure on health is associated with longer life expectancy. This preliminary analysis motivates further statistical and econometric analysis of the relationship between life expectancy and the three socio-economic variables. We will use the linear and nearest neighbor regression methods. In the sequel, we work on demeaned variables scaled by the factor 1/(Max − Min).
For the linear regression, we explain life expectancy at birth Y by health expenditure per capita X, gross income per capita Z and GDP per capita W. Considering several combinations of these explanatory variables, we summarize below the regression coefficient estimates with their standard errors in parentheses,
where û_j is an estimate of the error term of model M_j, j = 1, ..., 6.
The coefficients of models M1, M2 and M3 are all significant. In contrast, models M4 and M6 reduce to model M2 due to the non-significance of the X and Z coefficient estimates, and M5 reduces to M3 as the coefficient estimate of X is not significant. We therefore focus our analysis on models M1, M2 and M3 and proceed with their diagnostics; results are reported in Table 3. From Table 3, we accept the homoskedasticity of the residuals and their non-correlation with the predictors. In addition, the residuals of the three models have zero mean. Thus, our three models meet the standard assumptions of linear regression. M1, M2 and M3 are non-nested models, so the decision on choosing one model will be based on the encompassing test. A necessary condition is that the encompassing model should fit better than the encompassed model; the encompassing model is therefore expected to have a smaller error variance than its rival. The standard errors of models M_i, i = 1, 2, 3, are σ_1 = 0.192, σ_2 = 0.179 and σ_3 = 0.182, respectively. Thus, among the three models, M1 has the worst fit and M2 the best. We report in Table 4 the various encompassing tests associated with models M1, M2 and M3.
From Table 4, we accept the nulls M2 E M3 and M3 E M1, that is, M2 encompasses M3 and M3 encompasses M1. In contrast, we reject M3 E M2 and M1 E M3, so there is no mutual encompassing. Thus, we retain model M2, which also has the smallest standard error. We now re-examine the link between life expectancy and the explanatory variables using nearest neighbor regression.
For the k-NN regression of life expectancy, we need the specification of the weighting function w(·) and the estimation of the parameter k. Two weighting functions have mostly been used in the literature: the exponential function exp(−||z − Z_(i)||²) / Σ_{j=1}^k exp(−||z − Z_(j)||²), with (Z_(i))_{i=1,...,k} the k nearest neighbors of z, and the uniform function 1/k. We consider both weighting functions.
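The two weighting schemes can be written down directly. These are illustrative helper functions of our own; Z_nb denotes the k nearest neighbors of z, assumed already selected:

```python
import numpy as np

def exponential_weights(z, Z_nb):
    """exp(-||z - Z_(i)||^2) normalized over the k nearest neighbors Z_nb."""
    e = np.exp(-np.sum((Z_nb - z) ** 2, axis=-1))
    return e / e.sum()

def uniform_weights(k):
    """Uniform weighting: each of the k nearest neighbors gets weight 1/k."""
    return np.full(k, 1.0 / k)
```

Both sets of weights sum to one; the exponential scheme down-weights neighbors far from z, while the uniform scheme treats all k neighbors equally.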
Assumption 3 states that the number k should satisfy 1 < k = n^α < n^{4/(4+d)}, for n observations and d explanatory variables. Then, as n = 169, the maximum values for k are 60, 30 and 18 for d = 1, d = 2 and d = 3, respectively. We estimate the parameter k by minimizing the root mean squared error (RMSE). Results are summarized in Table 5, where we keep the notation already used in the linear regression: X for health expenditure per capita, Z for gross income per capita and W for GDP per capita.
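The selection of k under this constraint can be sketched as a grid search. This is our own illustration; knn_fit is a user-supplied function returning fitted values (e.g., leave-one-out k-NN fits):

```python
import numpy as np

def select_k(Y, Z, d, knn_fit):
    """Grid-search k over 2, ..., floor(n^(4/(4+d))) (the Assumption 3 bound)
    and keep the RMSE-minimizing value."""
    n = len(Y)
    k_max = int(n ** (4.0 / (4.0 + d)))           # upper bound from Assumption 3
    best_k, best_rmse = None, np.inf
    for k in range(2, k_max + 1):
        rmse = np.sqrt(np.mean((Y - knn_fit(Y, Z, k)) ** 2))
        if rmse < best_rmse:
            best_k, best_rmse = k, rmse
    return best_k, best_rmse
```

With n = 169 this bound gives k_max = 60, 30 and 18 for d = 1, 2 and 3, matching the values quoted above.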
Among models M7 to M11, model M10 has the lowest standard error. We also remark that the standard errors of models M9 and M10 are very close. We will check whether model M10 can account for the results of the other models and whether there is mutual encompassing between M10 and M9. We compute the standardized encompassing statistics using the result developed in Theorem 3.1; they are reported in Table 6.
The values in Table 6 are all less than 1.96 in absolute value, except for M9 E M10. We accept the null hypotheses M10 E M8, M8 E M7, M10 E M9 and M10 E M11. In other words, M10 can account for the information content of the other models. As M9 does not encompass M10, there is no mutual encompassing. Thus, we retain model M10 among all k-NN regression models.
The next illustration concerns the encompassing test between nonparametric and parametric regression techniques in Theorem 3.3, with the null hypothesis that the nearest neighbor regression M10 encompasses the linear regression M2. Under this null, we have the following standardized statistic from Theorem 3.3:

δ_S = √n δ̂_{m,γ} / √(Ω̂),

where Ω̂ is an estimate of the asymptotic variance Ω, ê_i are the residuals of model M10, and σ̂² is a k-NN regression estimate of the conditional variance σ² = Var(Y | X = x, Z = z).
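A sketch of this standardization follows. It is our own illustration of the Theorem 3.3 recipe, assuming the simple plug-in estimate Ω̂ = σ̂² (n⁻¹ Σ Z_i Z_i′)⁻¹ and componentwise standardization; the names are ours:

```python
import numpy as np

def standardized_stat(delta_hat, Z, sigma2_hat):
    """Standardize the encompassing statistic by the estimated asymptotic
    variance Omega = sigma^2 (E[ZZ'])^{-1} and test |delta_S| against the
    5% two-sided normal critical value 1.96."""
    n = len(Z)
    Omega_hat = sigma2_hat * np.linalg.inv(Z.T @ Z / n)   # plug-in variance
    delta_S = np.sqrt(n) * delta_hat / np.sqrt(np.diag(Omega_hat))
    return delta_S, bool(np.all(np.abs(delta_S) < 1.96))  # True: accept at 5%
```

In the application above, |δ_S| = 0.01 < 1.96, so the null that M10 encompasses M2 is accepted.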
The absolute value of the standardized encompassing statistic, δ_S = −0.01, is less than 1.96. Therefore, we accept the null hypothesis at the 5% risk level, i.e., the nearest neighbor regression M10 encompasses the linear regression M2. We conclude that we may retain the k-NN regression of life expectancy on health expenditure and income.

Conclusion
We know that different approaches to encompassing tests in the literature provide different results. We have considered the encompassing test in an asymptotic way, in line with the encompassing principle announced in the introduction. The work has been conducted for parametric and nonparametric regression techniques.
As stated in Hendry et al. (2008), the work of Bontemps et al. (2008) is the starting point of the treatment of encompassing tests for functional parameters based on nonparametric methods. We have extended that work to the nearest neighbor functional parameter estimate under the i.i.d. assumption. Using linear and nearest neighbor regressions as estimators of conditional expectations, we have established the asymptotic normality of the associated encompassing statistics for independent processes.
Comparing the convergence rate of the asymptotic encompassing statistic of the k-NN regression estimate with that of the kernel regression obtained by Bontemps et al. (2008), it depends only on the number of neighbors k for k-NN, while for the kernel estimate it depends on the number of observations n and the bandwidth h_n. We obtain the same convergence rate when h_n = k/n. Moreover, Bontemps et al. (2008) obtained an asymptotic variance of the encompassing statistic associated with kernel regression that depends on the density, which is not the case for the nearest neighbor regression estimate.
Development of encompassing test to nonparametric methods opens new research direction in theory as well as in practice.