HRLR regression

ABSTRACT Selecting the most important predictor variables and achieving high prediction accuracy are the two main goals in statistical learning. In this work, we propose a new regularization method, HRLR (a Hybrid of Relaxed Lasso and Ridge Regression), which uses properties of both the relaxed lasso and ridge regression. We also demonstrate the effectiveness of our method and the accuracy of the algorithm on simulated as well as real-world data. Simulation results suggest that HRLR regression outperforms well-known existing methods, including the lasso and the elastic net, in many of the scenarios described, especially when the error distribution is non-normal. The proposed algorithm is implemented in R, and links are provided in this paper along with the R functions used for the simulation.


Introduction
Suppose that the response variable $Y_i$ and at least one predictor variable $x_{i,j}$ are quantitative, with $x_{i,1} \equiv 1$. Let $x_i^T = (x_{i,1}, \ldots, x_{i,p}) = (1 \; u_i^T)$ and $\beta = (\beta_1, \ldots, \beta_p)^T$, where $\beta_1$ corresponds to the intercept. Then the multiple linear regression (MLR) model is

$$Y_i = x_i^T \beta + \varepsilon_i \quad \text{for } i = 1, \ldots, n.$$

This model is also called the full model.
Here $n$ is the sample size, and we assume that the zero-mean random errors $\varepsilon_i$ are independent and identically distributed (iid) with variance $V(\varepsilon_i) = \sigma^2$. In matrix notation, these $n$ equations become

$$Y = X\beta + \varepsilon,$$

where $Y$ is an $n \times 1$ vector of responses, $X$ is an $n \times p$ matrix of predictors, $\beta$ is a $p \times 1$ vector of unknown coefficients, and $\varepsilon$ is an $n \times 1$ vector of unknown errors. The $i$th fitted value is $\hat{Y}_i = x_i^T \hat{\beta}$ and the $i$th residual is $r_i = Y_i - \hat{Y}_i$, where $\hat{\beta}$ is an estimator of $\beta$.
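As a concrete illustration of this notation, the sketch below fits the least squares estimator on a tiny made-up data set and recovers the fitted values and residuals. It is written in Python with numpy for illustration only; the paper's own code is in R, and the data are invented.

```python
import numpy as np

# Toy design matrix with an intercept column (x_{i,1} = 1) and one predictor.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # exactly y = 1 + 2x, so the errors are zero

# Least squares estimate of beta
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_fitted = X @ beta_hat       # i-th fitted value: x_i^T beta_hat
residuals = y - y_fitted      # i-th residual: r_i = Y_i - Yhat_i

print(np.round(beta_hat, 6))    # -> [1. 2.]
print(np.round(residuals, 6))   # -> [0. 0. 0. 0.]
```

Because the toy response was generated without noise, the residual vector is identically zero.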
There are many methods for estimating $\beta$, including the lasso by Tibshirani (1996), the elastic net by Zou and Hastie (2005), the relaxed lasso by Meinshausen (2007), and ridge regression by Hoerl and Kennard (1970). The lasso is an attractive regularization method for high dimensional regression. The lasso coefficients $\hat{\beta}_L$ minimize the quantity

$$\sum_{i=1}^{n} \left( Y_i - x_i^T \beta \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|.$$

There are some benefits and limitations of using the lasso.
In terms of benefits, first, it performs variable selection by setting some of the coefficients exactly to zero. Second, it uses an efficient computational algorithm. According to Meinshausen (2007), one limitation of the lasso is that when $p \gg n$ its rate of convergence is very slow. Also, if the data set contains highly correlated independent variables, the lasso tends to produce poor coefficient estimates.
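The way the lasso zeroes out coefficients can be seen most simply in the orthonormal-design special case, where each least squares coefficient is soft-thresholded (with the threshold $\lambda/2$ under the scaling of the criterion above). The following Python sketch, illustrative only, shows moderate coefficients being set exactly to zero:

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

# Least squares coefficients under an orthonormal design, thresholded at t = 1.0:
ols = [3.0, -0.4, 0.8, -2.5]
lasso = [soft_threshold(b, 1.0) for b in ols]
print(lasso)  # -> [2.0, 0.0, 0.0, -1.5]
```

Large coefficients survive (shrunk toward zero by the threshold), while the two small ones are removed from the model entirely.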
Relaxed lasso is a generalization of the lasso shrinkage technique for linear regression. Relaxed lasso does variable selection and parameter estimation by using two penalty parameters in contrast to lasso which uses only one. According to Meinshausen (2007), relaxed lasso produces all standard lasso solutions with the same computational effort as regular lasso and at the same time, generates models that are similar or have even slightly better predictive performance.
Elastic net by Zou and Hastie (2005) is another regularization and variable selection method. It is a hybrid of ridge regression and lasso regularization. Elastic net also generates zero-valued coefficients. Zou and Hastie (2005) has shown that the elastic net outperforms lasso on data with highly correlated predictors.
The ridge regression coefficient estimates, denoted by $\hat{\beta}_R$, are the values that minimize

$$\sum_{i=1}^{n} \left( Y_i - x_i^T \beta \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2,$$

where $\lambda \geq 0$ is a tuning parameter. The quantity $\lambda \sum_{j=1}^{p} \beta_j^2$ is the shrinkage penalty in this technique. The final model with ridge regression includes all $p$ predictors.
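A minimal sketch of ridge behavior, using the closed form $\hat{\beta}_R = (X^T X + \lambda I)^{-1} X^T y$ on made-up data (Python/numpy for illustration): the $\ell_2$ norm of the estimate shrinks as $\lambda$ grows, but no coefficient is ever set exactly to zero, so all $p$ predictors stay in the model.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate: solve (X^T X + lam * I) beta = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
beta = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
y = X @ beta + 0.1 * rng.standard_normal(50)

# Norm of the ridge estimate for increasing penalties
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in (0.0, 1.0, 10.0, 100.0)]
print(all(a > b for a, b in zip(norms, norms[1:])))  # -> True
```

This contrasts with the lasso, whose $\ell_1$ penalty produces exact zeros and hence variable selection.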
Apart from the methods described above, numerous techniques for variable selection have been developed more recently. See the studies by Y. Wang et al. (2019), S. Wang et al. (2011), and Xie and Zeng (2010) for examples.
This work proposes a new regularization and variable selection method, which is a Hybrid of Relaxed Lasso regularization and Ridge regression (HRLR). Theoretical and numerical results demonstrate that the HRLR produces parsimonious models with equal or lower prediction loss than the regular lasso and relaxed lasso estimators for high dimensional data where the number of predictor variables p is very large, possibly much larger than the number of observations n. In Section 2, we define the estimator and describe the theory and method. We present a simulation in Section 3 as well as two real data examples in Section 4.

Theory and method
Definition 1. The HRLR regression estimator is defined for $\lambda_1 \in [0, \infty)$, $\lambda_2 \in [0, \infty)$ and $\phi \in (0, 1]$ as

$$\hat{\beta}_{\lambda_1, \lambda_2, \phi} = \arg\min_{\beta} \; n^{-1} \sum_{i=1}^{n} \left( Y_i - x_i^T \{ \beta \cdot 1_{M_{\lambda_1}} \} \right)^2 + \phi \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2,$$

where $1_{M_{\lambda_1}}$ is the indicator function on the set of variables $M_{\lambda_1} \subseteq \{1, \ldots, p\}$, so that for all $k \in \{1, \ldots, p\}$

$$\{ 1_{M_{\lambda_1}} \}_k = \begin{cases} 0, & k \notin M_{\lambda_1}, \\ 1, & k \in M_{\lambda_1}. \end{cases}$$

The HRLR estimator is a two-stage procedure: for each fixed $\lambda_2$ we first find the ridge regression coefficients, and then compute the relaxed lasso-type shrinkage along the relaxed lasso coefficient solution paths (along $\phi$ and $\lambda_1$). The double shrinkage that occurs in the process is taken care of by a scaling transformation.
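The two-stage structure can be sketched in code. The following is an illustrative Python reimplementation, not the authors' hrlr R function: the coordinate-descent lasso is a bare-bones stand-in used to obtain the active set, and the ridge penalty in the second stage is applied via augmented rows, with normalization constants glossed over.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=300):
    """Bare-bones coordinate descent for (1/n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]       # partial residual excluding j
            z = (X[:, j] @ r) / n
            b[j] = np.sign(z) * max(abs(z) - lam / 2.0, 0.0) / col_sq[j]
    return b

def hrlr_sketch(X, y, lam1, lam2, phi):
    """Two-stage sketch: (i) take the active set from a lasso fit at lam1;
    (ii) refit on the active set with the relaxed penalty phi * lam1, adding
    ridge-style rows scaled by sqrt(lam2) to the design."""
    n, p = X.shape
    active = np.flatnonzero(lasso_cd(X, y, lam1))
    beta = np.zeros(p)
    if active.size:
        m = active.size
        X_aug = np.vstack([X[:, active], np.sqrt(lam2 * n) * np.eye(m)])
        y_aug = np.concatenate([y, np.zeros(m)])
        beta[active] = lasso_cd(X_aug, y_aug, phi * lam1)
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
beta_true = np.array([3.0, 1.5, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(100)

beta_hat = hrlr_sketch(X, y, lam1=0.5, lam2=0.1, phi=0.5)
print(np.flatnonzero(beta_hat))  # the three signal variables are retained
```

Because the relaxed penalty $\phi\lambda_1$ is smaller than the selection penalty $\lambda_1$, the surviving coefficients are shrunk less in the second stage, which is the point of the relaxation.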
For ease of implementation, we introduce Theorem 1, where we consider a data set with $n$ observations and $p$ predictors. Let $y = (y_1, \ldots, y_n)^T$ be the response and $X = (x_1, \ldots, x_p)$ be the model matrix, where $x_j = (x_{1j}, \ldots, x_{nj})^T$, $j = 1, \ldots, p$, are the predictors. Also, assume that the response is centered and the predictors are standardized.

Theorem 1. Given the data set $(y, X)$ and $(\lambda_1, \lambda_2)$, define an artificial data set $(y^*, X^*)$ by

$$X^* = (1 + \lambda_2)^{-1/2} \begin{pmatrix} X \\ \sqrt{\lambda_2}\, I_p \end{pmatrix}, \qquad y^* = \begin{pmatrix} y \\ 0 \end{pmatrix},$$

and let $\beta^* = \sqrt{1 + \lambda_2}\, \beta$. Then the HRLR criterion can be written as a relaxed lasso criterion in $(y^*, X^*, \beta^*)$.

Proof. Substituting the identities for $y^*$, $X^*$ and $\beta^*$ into Equation (5) and collecting terms gives the result. □

Theorem 1 says that we can transform the HRLR problem into an equivalent relaxed lasso problem on augmented data. Therefore the set of solutions of the HRLR problem is given by

$$\hat{\beta}_{\gamma, \phi} = \frac{1}{\sqrt{1 + \lambda_2}}\, \hat{\beta}^*_{\gamma, \phi}.$$
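The augmentation in Theorem 1 can be checked numerically: with $X^*$, $y^*$ and $\beta^*$ as defined above, the augmented residual sum of squares equals the original residual sum of squares plus the ridge penalty, which is what turns the ridge term into part of an ordinary least squares criterion. A quick Python sketch on made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam2 = 20, 5, 0.7
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
beta = rng.standard_normal(p)

# Augmented data set from Theorem 1
X_star = np.vstack([X, np.sqrt(lam2) * np.eye(p)]) / np.sqrt(1.0 + lam2)
y_star = np.concatenate([y, np.zeros(p)])
beta_star = np.sqrt(1.0 + lam2) * beta

# ||y* - X* beta*||^2  ==  ||y - X beta||^2 + lam2 * ||beta||^2
lhs = np.sum((y_star - X_star @ beta_star) ** 2)
rhs = np.sum((y - X @ beta) ** 2) + lam2 * np.sum(beta ** 2)
print(np.isclose(lhs, rhs))  # -> True
```

The $(1+\lambda_2)^{-1/2}$ scaling of $X^*$ cancels the $\sqrt{1+\lambda_2}$ scaling of $\beta^*$, which is why the identity holds for every $\beta$, not just at the minimizer.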

Asymptotic results
As shown in Theorem 1, the HRLR problem can be transformed into an equivalent relaxed lasso problem on augmented data. Therefore, many of the asymptotic results for HRLR regression follow directly from the asymptotic results for the relaxed lasso procedure applied to the augmented data. In the study by Meinshausen (2007), it is conjectured, and later supported by numerical examples, that the relaxed lasso is a consistent variable selection procedure when the penalty parameters are obtained by cross validation. Therefore, by Theorem 1, this conjecture carries over to HRLR estimates on augmented data, as HRLR uses cross validated penalty parameters. Meinshausen (2007) further showed that the conjecture does not hold for the lasso.

Simulations
In this section, we use simulations to demonstrate the proposed HRLR method and compare HRLR to four standard algorithms (lasso, relaxed lasso, ridge regression and elastic net). The simulation closely follows the one done by Tibshirani (1996). We simulate data from the true model $y = X\beta + \sigma\varepsilon$ with four error distributions, the first being $\varepsilon \sim N(0, 1)$. The same examples used by Tibshirani (1996) were used in this paper to compare the prediction performance of HRLR, lasso, ridge regression and elastic net systematically. Cross validation was used to select the final model. The examples are described below. The statistical software R (R Core Team, 2017) was used to conduct the simulation.
(1) Example 1: We simulated 50 data sets consisting of 20 observations in the training set and 200 observations in the testing set, with eight predictors in both the training and testing sets. We let $\beta = (3, 1.5, 2, 0, 0, 0, 0, 0)$ and $\sigma = 3$. The pairwise correlation between $x_i$ and $x_j$ was set to $\text{corr}(i, j) = 0.5^{|i - j|}$.
(2) Example 2: Same as Example 1, except that $\beta_j = 0.85$ for all $j$.
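The Example 1 data-generating process can be sketched as below. The paper's simulation code is the R function hrlrsim; this Python/numpy version is an illustration only, and shows only the normal-error case.

```python
import numpy as np

def sim_example1(n, rng):
    """One data set from Example 1: 8 predictors with pairwise correlation
    corr(x_i, x_j) = 0.5 ** |i - j|, beta = (3, 1.5, 2, 0, 0, 0, 0, 0), sigma = 3."""
    p = 8
    idx = np.arange(p)
    Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # AR(1)-type correlation
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = np.array([3.0, 1.5, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
    y = X @ beta + 3.0 * rng.standard_normal(n)           # normal errors, sigma = 3
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = sim_example1(20, rng)    # 20 training observations
X_test, y_test = sim_example1(200, rng)     # 200 testing observations
print(X_train.shape, X_test.shape)  # -> (20, 8) (200, 8)
```

Repeating this draw 50 times, fitting each method on the training set, and scoring on the testing set reproduces the structure of the simulation study.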

Simulation results
In this section, we use simulations to compare the proposed HRLR method to a collection of commonly used variable selection and/or shrinkage methods. As displayed in Table 1 and Figure 1, from the Example 1 simulations we see that HRLR has the lowest median test root mean square error (denoted TRMSE) of all the methods for each error type. The standard error of the TRMSE (denoted SE) is slightly higher for HRLR under normal errors than for the other methods, but was significantly lower for all the other error distributions. According to Table 2 and Figure 2, HRLR has slightly higher TRMSEs than the other methods for all the error distributions in Example 2, except when compared to the relaxed lasso, but its standard error was lower in most cases, especially when the errors are uniformly distributed. As discussed by S. Wang et al. (2011), the lower TRMSE for the elastic net in Example 2 might be due to the setup of the example. It is clear from Table 3 and Figure 3 that HRLR has consistently smaller TRMSEs than all other methods in Example 3 for all error types except normal errors.

Real data example 1
We considered the diabetes data set originally used by Efron et al. (2004) and shown in Table 1. This data set contains 10 baseline variables: age, sex, body mass index (BMI), average blood pressure (BP) and six blood serum measurements. The response of interest is a quantitative measure of disease progression one year after baseline. We split the data into training and testing sets, with roughly 50% of the observations in each set. Table 5 shows the coefficient estimates for the diabetes data based on the training set. The testing set was used to calculate the test root mean square error (TRMSE) for each model built.
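The TRMSE used throughout to score models on the held-out testing set is simply the root mean square of the test-set prediction errors. A minimal Python sketch (illustrative; the paper computes this in R):

```python
import numpy as np

def trmse(y_test, y_pred):
    """Test root mean square error: sqrt(mean((y - yhat)^2)) over the testing set."""
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_test - y_pred) ** 2)))

# One prediction is off by 2, the other two are exact:
print(round(trmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), 4))  # -> 1.1547
```

Lower TRMSE on the testing set indicates better out-of-sample prediction, which is the basis for the comparisons in Tables 5 and 6.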
HRLR and the lasso dropped the same three variables (age, ldl and tch), while the relaxed lasso dropped four variables (age, sex, ldl and tch). The elastic net dropped three variables (age, sex and tch). All models dropped the age variable. Sex is typically an important predictor of diabetes; consequently, one might question the decision of the relaxed lasso and elastic net models to drop the sex variable.
The last row of Table 5 shows the TRMSE values. Among these TRMSE values, HRLR had the smallest TRMSE while elastic net had the highest.

Real data example 2
In the article "Fitting Percentage of Body Fat to Simple Body Measurements", Johnson (1996) uses the data at http://jse.amstat.org/datasets/fat.dat.txt, provided to him by Dr. A. Garth Fischer in a personal communication on 5 October 1994, as a multiple linear regression activity with his students. A subset of the variables at http://jse.amstat.org/datasets/fat.dat.txt is available in the R package mfp by Ambler and Benner (2015), and the data set is frequently used in the text Statistical Regression and Classification by Matloff (2017). We used a clean version of the data found at https://raw.githubusercontent.com/alanarnholt/MISCD/master/bodyfatClean.csv. The cleaned data set contains 17 potential predictor variables and a response variable named brozek_C. The data were partitioned into a training set and a testing set, where roughly 70% of the observations were assigned to the training set and 30% to the testing set. Table 6 shows the coefficient estimates for the body fat data. In this example, HRLR selected the fewest predictors, followed by relaxed lasso, lasso, and elastic net, respectively. All the TRMSE values were very close; among them, relaxed lasso had the smallest TRMSE while ridge regression had the highest. The predictor variable abdomen_wrist was formed by subtracting the wrist measurement from the abdomen measurement (i.e., abdomen_wrist = abdomen − wrist). Therefore, the variables abdomen_cm, wrist_cm and abdomen_wrist are highly correlated, and one of these three variables must be dropped from the model to reduce multicollinearity. HRLR, relaxed lasso and lasso agreed on dropping abdomen_cm, while elastic net did not.
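The exact linear dependence created by the derived variable can be seen from the rank of the design matrix: any column that is a linear combination of others makes the matrix rank deficient. A Python sketch with made-up measurements (the variable names mirror the body fat data, but the numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
abdomen = 85 + 10 * rng.standard_normal(50)   # made-up abdomen measurements (cm)
wrist = 18 + 1.5 * rng.standard_normal(50)    # made-up wrist measurements (cm)
abdomen_wrist = abdomen - wrist               # derived: exact linear combination

X = np.column_stack([abdomen, wrist, abdomen_wrist])
# The three columns satisfy abdomen - wrist - abdomen_wrist = 0, so the
# 50 x 3 matrix has rank 2: any one of the three columns can be dropped.
print(np.linalg.matrix_rank(X))  # -> 2
```

Dropping one of the three columns restores full column rank, which is why each selection method is expected to discard one of them.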

The collection of R functions hrlrpack, available from https://hasthika.github.io/hrlrpack.txt, was used to create the tables and figures. Tables 1 and 2, as well as Figures 1, 2, and 3, were made with hrlrsim. An implementation of the algorithm can be found in the R function hrlr, which was used to produce Tables 4 and 5 for the real data examples presented in Section 4.

Conclusion
We have proposed a computationally efficient method, a hybrid of relaxed lasso and ridge regression (HRLR), for variable selection in linear models. The idea of HRLR is similar to that of the elastic net, with the relaxed lasso used in place of the lasso. HRLR is likely to eliminate highly correlated variables more efficiently. The simulations and examples suggest that the number of variables selected by HRLR is usually smaller than the number selected by the other methods studied. The simulation results show that HRLR compares favorably with other well-known modeling techniques, especially when the error distribution is non-normal. The analyses of both the diabetes and the body fat data sets demonstrate the usefulness of the proposed method on real-world data. Finally, the HRLR method can be used for model fitting as well as for reliable variable selection under many error distributions.

PUBLIC INTEREST STATEMENT
When working with "Big Data", variable selection has become a very important and necessary step in model building across all disciplines. Among the numerous methods available for variable selection, the lasso, relaxed lasso, and elastic net are widely used. In this article, we propose a novel method, a Hybrid of Relaxed Lasso and Ridge Regression (HRLR), which integrates properties of both the relaxed lasso and ridge regression to yield a computationally efficient variable selection method for linear models. Practitioners can produce linear models with the HRLR method simply by using the R functions provided in this article.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
The authors received no direct funding for this research.