A simulation study: new optimal estimators for population mean by using dual auxiliary information in stratified random sampling

ABSTRACT Recently, Haq et al. [A new estimator of finite population mean based on the dual use of the auxiliary information. Commun Stat Theory Methods. 2017;46(9):4425–4436] used dual auxiliary information under simple random sampling only. Motivated by their idea, we extend the dual use of the auxiliary variable to a stratified random sampling scheme. The dual use of an auxiliary variable consists of (1) the original auxiliary information and (2) the ranked auxiliary information. We propose new optimal exponential-type estimators of the finite population mean and derive their mathematical properties, such as bias and mean squared error. Monte Carlo simulation studies are included to validate the theoretical results. Moreover, the applicability of the proposed estimators is illustrated empirically with a real-life data set. The numerical results show that the proposed estimators are more efficient than their competitors.


Introduction
One of the objectives of sample survey theory is to estimate unknown population parameters of the study variable, such as the population total, mean, proportion, ratio and variance. A desirable procedure provides a precise estimator of the parameter of interest by surveying a suitably chosen sample of individuals. Supplementary information provided by an auxiliary variable that is correlated with the study variable enhances the precision of the estimators, and survey statisticians exploit this information whenever it is available to construct efficient estimators. Ratio, product and regression estimators, together with their modifications, are the best-known examples.
An extensive literature has developed on more efficient estimators under different sampling designs, e.g. simple random sampling, stratified random sampling, cluster sampling and systematic sampling. Simple random sampling offers no administrative convenience and may not yield a representative sample from a heterogeneous population, because it does not capture the diversity that stratified random sampling is designed to exploit. Stratified random sampling is one of the possible ways to increase the precision of the estimates; it is a powerful and flexible method that is widely used in practice. Many researchers, such as Kadilar and Cingi [1,2], Koyuncu and Kadilar [3,4], Singh and Vishwakarma [5], Shabbir and Gupta [6], Haq and Shabbir [7], Singh and Solanki [8], Yadav et al. [9], Solanki and Singh [10,11], Aslam [12], Bhatti et al. [13], Javed et al. [14] and Marin et al. [15][16][17], have contributed to the estimation of the finite population mean under a stratified random sampling scheme. All these contributions, and similar published work under stratified random sampling, use only the original auxiliary information; none of them exploits the dual use of auxiliary information to improve the estimation procedure.
Recently, Haq et al. [18] used additional information from the auxiliary variable, namely the ranked auxiliary variable, to develop efficient estimators of the mean. These estimators were developed only for the simple random sampling scheme.
This raises a new challenge: exploring more efficient estimators based on the dual use of auxiliary information under a stratified random sampling scheme. This article meets that challenge by developing new optimal estimators of the finite population mean under stratified random sampling.
The remainder of the paper is organized as follows. Section 2 introduces the procedures, notation and various existing estimators under stratified random sampling. Section 3 defines the proposed estimators of the finite population mean using the original and ranked auxiliary information. Section 4 presents an empirical study evaluating the performance of the proposed estimators. Section 5 reports Monte Carlo simulation studies that validate the theoretical results. Concluding remarks are given in the last section.

Procedure, notations and review of literature
We define the following relative error terms and their expectations to derive the expressions for the bias, MSE and minimum MSE of the proposed estimators. Let Y, X and Z denote the study variable, the auxiliary variable and the ranks of the auxiliary variable, respectively; the relative error terms are defined in (2.1).
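Under the conventional set-up for stratified random sampling, which is assumed here (L strata; an SRSWOR sample of size $n_h$ from the $N_h$ units of stratum h; $W_h = N_h/N$; $\gamma_h = 1/n_h - 1/N_h$; stratum variances $S_{yh}^2$, $S_{xh}^2$, $S_{zh}^2$ and covariances $S_{yxh}$, $S_{yzh}$, $S_{xzh}$), the relative error terms and their exact expectations take the standard form below. The symbols $e_0$, $e_1$, $e_2$ and $\rho_{xzh}$ are illustrative and may differ from the notation of (2.1).

```latex
\bar{y}_{st}=\sum_{h=1}^{L}W_h\bar{y}_h,\quad
\bar{x}_{st}=\sum_{h=1}^{L}W_h\bar{x}_h,\quad
\bar{z}_{st}=\sum_{h=1}^{L}W_h\bar{z}_h

e_0=\frac{\bar{y}_{st}-\bar{Y}}{\bar{Y}},\quad
e_1=\frac{\bar{x}_{st}-\bar{X}}{\bar{X}},\quad
e_2=\frac{\bar{z}_{st}-\bar{Z}}{\bar{Z}},\qquad
E(e_0)=E(e_1)=E(e_2)=0

E(e_0^2)=\frac{1}{\bar{Y}^2}\sum_{h=1}^{L}W_h^2\gamma_h S_{yh}^2,\quad
E(e_1^2)=\frac{1}{\bar{X}^2}\sum_{h=1}^{L}W_h^2\gamma_h S_{xh}^2,\quad
E(e_2^2)=\frac{1}{\bar{Z}^2}\sum_{h=1}^{L}W_h^2\gamma_h S_{zh}^2

E(e_0e_1)=\frac{1}{\bar{Y}\bar{X}}\sum_{h=1}^{L}W_h^2\gamma_h S_{yxh},\quad
E(e_0e_2)=\frac{1}{\bar{Y}\bar{Z}}\sum_{h=1}^{L}W_h^2\gamma_h S_{yzh},\quad
E(e_1e_2)=\frac{1}{\bar{X}\bar{Z}}\sum_{h=1}^{L}W_h^2\gamma_h S_{xzh}

S_{xzh}=\rho_{xzh}\,S_{xh}S_{zh}
```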
Using (2.1), the expectations of the error terms can be written in terms of the stratum-level population variances and covariances, including the population correlation coefficient between Z and X in the hth stratum.

Some well-known estimators of the population mean under a stratified random sampling scheme are detailed below. All of these estimators are based only on the original auxiliary information.

The usual unbiased, combined ratio and combined regression estimators are detailed below.
(2.6)
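As a concrete companion to these classical estimators, the sketch below evaluates their textbook first-order MSEs from stratum-level population summaries, assuming SRSWOR within strata: $V(\bar y_{st}) = \sum W_h^2\gamma_h S_{yh}^2$, the combined ratio estimator $\bar y_{st}\bar X/\bar x_{st}$ with $R=\bar Y/\bar X$, and the combined regression estimator $\bar y_{st}+b(\bar X-\bar x_{st})$ at its optimal slope $b$. The class and function names are hypothetical, and these standard forms are not claimed to be typeset exactly as in (2.6).

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    """Population summaries for one stratum (hypothetical container)."""
    N: int      # stratum size N_h
    n: int      # sample size n_h drawn by SRSWOR
    S2y: float  # population variance of Y in the stratum
    S2x: float  # population variance of X in the stratum
    Syx: float  # population covariance of Y and X in the stratum

def weighted_terms(strata):
    """Return V_y, V_x, C_yx: sums over h of W_h^2 * gamma_h * (S2y, S2x, Syx)."""
    N = sum(s.N for s in strata)
    Vy = Vx = Cyx = 0.0
    for s in strata:
        w2g = (s.N / N) ** 2 * (1.0 / s.n - 1.0 / s.N)  # W_h^2 * gamma_h
        Vy += w2g * s.S2y
        Vx += w2g * s.S2x
        Cyx += w2g * s.Syx
    return Vy, Vx, Cyx

def classical_mses(strata, Ybar, Xbar):
    """First-order MSEs of the unbiased, combined ratio and combined regression estimators."""
    Vy, Vx, Cyx = weighted_terms(strata)
    R = Ybar / Xbar
    return {
        "unbiased": Vy,                                    # exact variance of ybar_st
        "combined_ratio": Vy + R**2 * Vx - 2.0 * R * Cyx,  # ybar_st * Xbar / xbar_st
        "combined_regression": Vy - Cyx**2 / Vx,           # at the optimal slope b = Cyx / Vx
    }

# Example with two hypothetical strata
strata = [Stratum(100, 10, 4.0, 1.0, 1.5), Stratum(100, 10, 9.0, 2.0, 3.0)]
print(classical_mses(strata, Ybar=10.0, Xbar=5.0))
```

Because the combined regression term subtracts the non-negative quantity $C_{yx}^2/V_x$, its first-order MSE never exceeds the variance of $\bar y_{st}$.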

Haq and Shabbir [7] proposed two exponential ratio-type families of estimators detailed below
where η is a suitable constant, and $a_{st}$ ($\neq 0$) and $b_{st}$ are either real numbers or functions of known parameters of the auxiliary variable.
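The exponential ratio-type families reviewed here share a common building block. The sketch below, assuming the transformed auxiliary variable $a_{st}X+b_{st}$ and no extra constant η, computes $\bar y_{st}\exp\{a_{st}(\bar X-\bar x_{st})/[a_{st}(\bar X+\bar x_{st})+2b_{st}]\}$ together with its first-order MSE $V_y-2\theta R C_{yx}+\theta^2R^2V_x$, where $\theta=a_{st}\bar X/[2(a_{st}\bar X+b_{st})]$, $R=\bar Y/\bar X$ and $V_y$, $V_x$, $C_{yx}$ are the sums of $W_h^2\gamma_h$ times the stratum variances and covariance. It is a generic illustration, not the exact estimator of Haq and Shabbir [7]; with $a_{st}=1$ and $b_{st}=0$ it reduces to a combined stratified version of the Bahl and Tuteja exponential ratio estimator.

```python
import math

def exp_ratio_estimate(ybar_st, xbar_st, Xbar, a=1.0, b=0.0):
    """Generic exponential ratio-type estimate using the transformation a*X + b (a != 0)."""
    num = a * (Xbar - xbar_st)             # (a*Xbar + b) - (a*xbar_st + b)
    den = a * (Xbar + xbar_st) + 2.0 * b   # (a*Xbar + b) + (a*xbar_st + b)
    return ybar_st * math.exp(num / den)

def exp_ratio_mse(Vy, Vx, Cyx, Ybar, Xbar, a=1.0, b=0.0):
    """First-order MSE; Vy, Vx, Cyx are sums of W_h^2 * gamma_h * (S2y, S2x, Syx)."""
    R = Ybar / Xbar
    theta = a * Xbar / (2.0 * (a * Xbar + b))  # exponent is approximately -theta * e1
    return Vy - 2.0 * theta * R * Cyx + theta**2 * R**2 * Vx
```

Changing $b_{st}$ shrinks or enlarges θ, which is how these families trade the strength of the ratio adjustment against the correlation in the data.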

Singh and Solanki [8] proposed a family of estimators as given below
where η, $a_{st}$ ($\neq 0$) and $b_{st}$ are as defined earlier.

Given below is the class of estimators suggested by Solanki and Singh [9]
where η, $a_{st}$ ($\neq 0$) and $b_{st}$ are as defined earlier.

Recently, Solanki and Singh [10] defined an improved estimator given as
where $a_h$ ($\neq 0$) and $b_h$ are either real numbers or parameters related to the auxiliary variate X.

Remark 2.5:
Substituting the optimal weights into the corresponding estimators gives the following minimum MSEs of the above estimators (2.14).

Proposed estimators
In this section, two new exponential-type estimators are proposed for the estimation of the population mean using dual auxiliary information in stratified random sampling. Dual auxiliary information refers to the twofold use of the auxiliary variable: (i) the original measurements of the auxiliary variable and (ii) the ranks of the auxiliary variable. The bias and mean square error (MSE) of the proposed estimators are derived up to the first order of approximation. The bias of an estimator is the difference between its expected value and the true value of the parameter being estimated, i.e. $\mathrm{Bias}(\hat{\bar{Y}}) = E(\hat{\bar{Y}} - \bar{Y})$, and the MSE is the expected squared deviation of the estimator from the true parameter value, i.e. $\mathrm{MSE}(\hat{\bar{Y}}) = E(\hat{\bar{Y}} - \bar{Y})^2$.
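To make the ranked auxiliary information concrete, the sketch below builds the rank variable Z from the auxiliary variable X. Whether ranks are taken within each stratum or over the whole population is an assumption here: the sketch ranks within strata and gives tied values their average rank. Both function names are hypothetical.

```python
from collections import defaultdict

def average_ranks(values):
    """Ranks 1..n of `values`; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def ranks_within_strata(x, stratum):
    """Rank variable Z: each x value ranked among the units of its own stratum (assumption)."""
    members = defaultdict(list)
    for idx, h in enumerate(stratum):
        members[h].append(idx)
    z = [0.0] * len(x)
    for idxs in members.values():
        for idx, r in zip(idxs, average_ranks([x[i] for i in idxs])):
            z[idx] = r
    return z
```

The average-rank treatment of ties matches the default `ties.method = "average"` of R's `rank()`, which keeps the sketch consistent with an R-based workflow.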

First proposed estimator
where $\mu_{11}$, $\mu_{12}$ and $\mu_{13}$ are suitably chosen weights. The bias and MSE of $\hat{\bar{Y}}_{P1}$ are given below.

Second proposed estimator
where $\mu_{14}$, $\mu_{15}$ and $\mu_{16}$ are suitably chosen weights. The bias and MSE of $\hat{\bar{Y}}_{P2}$ are given below. Minimizing the MSE in Equation (3.7) with respect to $\mu_{14}$, $\mu_{15}$ and $\mu_{16}$ yields their optimal values, and inserting these optimal weights in Equation (3.7) gives the minimum MSE of the proposed estimator.
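Minimizing an MSE such as (3.7) with respect to several weights is a small linear-algebra problem. Assuming the first-order MSE has the usual quadratic form $\mathrm{MSE}(\mu)=c-2\mathbf{b}^\top\mu+\mu^\top\mathbf{A}\mu$ with a symmetric positive-definite $\mathbf{A}$, setting the gradient to zero gives $\mathbf{A}\mu^*=\mathbf{b}$ and $\mathrm{MSE}_{\min}=c-\mathbf{b}^\top\mu^*$. The sketch below solves this system; the entries of $c$, $\mathbf{b}$ and $\mathbf{A}$ would come from (3.7) and are hypothetical inputs here.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-15:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def optimal_weights(c, b, A):
    """Minimize c - 2 b'mu + mu' A mu; return (mu_star, minimum MSE)."""
    mu = solve_linear(A, b)
    mse_min = c - sum(bi * mi for bi, mi in zip(b, mu))
    return mu, mse_min
```

For three weights this is exactly the "differentiate, set to zero, solve" step; the minimum MSE then follows without re-evaluating the full quadratic.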

Application on a real data
In this section, we compare the performance of the newly proposed estimators with the traditional unbiased, combined ratio and combined regression estimators and with the existing estimators of Haq and Shabbir [7], Singh and Solanki [8] and Solanki and Singh [10,11]. We consider a real-life data set of Turkey (2007) used by Koyuncu and Kadilar [3]; interested readers may refer to Koyuncu and Kadilar [3] for the remaining characteristics of the data set. The necessary data statistics are given in Table 1.
We calculated the MSEs of the proposed and competing exponential-type estimators; they are presented in Table A1.

Simulation study based on real data
The previous section showed that the proposed estimators are more efficient than the competing estimators. This superiority is further assessed through a Monte Carlo simulation study using R software. The real population presented in Table 1 is used again, with sample sizes n = 180, 230 and 280 allocated to the strata by the proportional allocation method. The steps of the simulation study to find the average MSE of an estimator are as follows:
Step 1: Select a bivariate stratified sample of size n using SRSWOR from the bivariate stratified population.
Step 2: Use the sample data from step 1 to compute each estimator under study and its squared error $(\hat{\bar{Y}} - \bar{Y})^2$.
Step 3: Repeat steps 1 and 2 30,000 times to obtain 30,000 squared errors for each estimator.
Step 4: Calculate the average MSE of each estimator as $\mathrm{MSE}(\hat{\bar{Y}}) = \frac{1}{30000}\sum_{i=1}^{30000}(\hat{\bar{Y}}_i - \bar{Y})^2$, where $\hat{\bar{Y}}_i$ is the estimate from the ith replicate.
Tables A2-A4 present the minimum mean square errors obtained from the simulation study. As in the previous section, the proposed estimators $\hat{\bar{Y}}_{P1}$ and $\hat{\bar{Y}}_{P2}$ have the smallest MSEs among all the competing estimators for every sample size considered, i.e. n = 180, 230 and 280.
In light of these findings, the proposed estimators $\hat{\bar{Y}}_{P1}$ and $\hat{\bar{Y}}_{P2}$ perform best among all the estimators reviewed in this study.
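The four simulation steps can be sketched as follows, in Python rather than the R used in the study and for the usual unbiased and combined ratio estimators only. The population is passed as hypothetical per-stratum lists (the Turkey data are not reproduced), stratum sample sizes use proportional allocation $n_h \approx nN_h/N$ with rounding, and each replicate draws an SRSWOR sample within every stratum.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """n_h = n * N_h / N, rounded and kept within 1..N_h (totals may drift from n by rounding)."""
    N = sum(stratum_sizes)
    return [min(Nh, max(1, round(n * Nh / N))) for Nh in stratum_sizes]

def simulate_mse(pop_y, pop_x, n, reps=30_000, seed=1):
    """Average squared error of two estimators over `reps` stratified SRSWOR samples.

    pop_y, pop_x: per-stratum lists of population values (hypothetical input).
    """
    rng = random.Random(seed)
    sizes = [len(stratum) for stratum in pop_y]
    N = sum(sizes)
    W = [Nh / N for Nh in sizes]
    Ybar = sum(map(sum, pop_y)) / N   # true population means
    Xbar = sum(map(sum, pop_x)) / N
    alloc = proportional_allocation(sizes, n)
    total = {"unbiased": 0.0, "combined_ratio": 0.0}
    for _ in range(reps):                                  # Step 3: repeat
        ybar_st = xbar_st = 0.0
        for h, nh in enumerate(alloc):                     # Step 1: SRSWOR in each stratum
            idx = rng.sample(range(sizes[h]), nh)
            ybar_st += W[h] * sum(pop_y[h][i] for i in idx) / nh
            xbar_st += W[h] * sum(pop_x[h][i] for i in idx) / nh
        # Step 2: squared error of each estimator for this sample
        total["unbiased"] += (ybar_st - Ybar) ** 2
        total["combined_ratio"] += (ybar_st * Xbar / xbar_st - Ybar) ** 2
    return {name: s / reps for name, s in total.items()}  # Step 4: average
```

Fixing the seed makes the average MSEs reproducible, and further estimators can be compared on the same samples by adding entries to `total` inside the loop.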

Concluding remarks
Several estimators of the finite population mean based only on the original auxiliary information under stratified random sampling are available in the literature. Haq et al. [18] developed a family of estimators of the population mean under simple random sampling using additional information from the auxiliary variable, namely the ranked auxiliary variable. In this manuscript, for the first time, new optimal estimators of the population mean are proposed that use both the original and the ranked auxiliary information under a stratified random sampling scheme. The bias, mean square error (MSE) and minimum MSE of the proposed estimators are derived up to the first order of approximation. Both a real-life application and simulation studies are performed to assess the potential of the proposed estimators relative to their competitors. The numerical findings confirm that the proposed estimators have smaller mean square errors than all the existing estimators considered, namely the usual unbiased, combined ratio and combined regression estimators and those of Haq and Shabbir [7], Singh and Solanki [8] and Solanki and Singh [10,11]. The proposed estimators are therefore attractive to survey statisticians working under stratified random sampling.
Possible extensions of this work include estimating: (1) the finite population mean under other sampling designs, such as stratified double sampling and different ranked set sampling schemes; (2) other unknown finite population parameters, including the median, variance, interquartile range and proportions; and (3) the population mean of a sensitive variable in the presence of sensitive and non-sensitive auxiliary information.

Disclosure statement
No potential conflict of interest was reported by the author(s).