Chen-G class of distributions

Abstract The quest to generate distributions with more desirable and flexible properties for the modeling of data has led researchers to focus intensely on developing new families that generalize existing distributions. A new family of distributions called the Chen generated family is developed in this study. Its statistical properties, such as the quantile function, moments, incomplete moments, stochastic ordering and order statistics, are derived. Estimators for the parameters of the new family are developed using the method of maximum likelihood. Three special distributions, Chen Burr III, Chen Kumaraswamy and Chen Weibull, are proposed from the new family, though it can generalize other distributions. The usefulness of the new family is demonstrated using real datasets.


PUBLIC INTEREST STATEMENT
Modeling of natural phenomena such as earthquakes, rainfall and tsunamis mostly involves the use of statistical distributions. Since the accuracy of the results largely depends on how well the distribution fits the dataset, this study develops a new family of distributions intended to improve the flexibility of existing distributions.
In this study a new class of distributions is developed and proposed using the T-X approach. The Chen generated (CG) family of distributions is obtained by compounding the two-parameter Chen distribution (Chen, 2000) and an arbitrary baseline cumulative distribution function (cdf) of a continuous random variable. The main motivation for developing this family is to improve the flexibility of the existing classical distributions, thus enabling them to provide a better fit to real datasets than other candidate distributions with the same number of parameters and to model different kinds of failure rate (monotonic and nonmonotonic).
The remaining sections of the paper are organized as follows. The Chen generated (CG) family of distributions is defined in section 2. The mixture representation of the probability density function (pdf) is presented in section 3. Some statistical properties of the family of distributions are derived in section 4. The estimators for the parameters of the family are developed in section 5. Some special distributions from the CG family of distributions are proposed and discussed in section 6. Simulations to examine the properties of the estimators of the parameters of the special distributions are carried out in section 7. Real-life datasets are used to demonstrate the application of the special distributions in section 8. Concluding remarks of the study are given in section 9.

Chen generated family of distributions
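Let T be a Chen distributed continuous random variable whose cdf, denoted by $F(t)$, is given by $F(t) = 1 - e^{\lambda\left(1 - e^{t^{\beta}}\right)}$, $t > 0$ (Chen, 2000). Also, let $G(x)$ and $g(x)$ be the respective cdf and pdf of an arbitrary continuous random variable X. The cdf of the CG family is defined as
$$F(x) = A\left[1 - e^{\lambda\left(1 - e^{G(x)^{\beta}}\right)}\right], \tag{1}$$
where $A = \dfrac{1}{1 - e^{\lambda(1 - e)}}$ is a normalizing constant, and $\lambda$ and $\beta$ are scale and shape parameters, respectively. The pdf $f(x)$ of the family is given by
$$f(x) = A\lambda\beta\, g(x)\, G(x)^{\beta - 1}\, e^{G(x)^{\beta}}\, e^{\lambda\left(1 - e^{G(x)^{\beta}}\right)}. \tag{2}$$
The survival function $S(x)$ of the CG family is
$$S(x) = 1 - F(x) = 1 - A\left[1 - e^{\lambda\left(1 - e^{G(x)^{\beta}}\right)}\right].$$
The failure rate or hazard function $h(x)$ of the family is obtained as
$$h(x) = \frac{f(x)}{S(x)} = \frac{A\lambda\beta\, g(x)\, G(x)^{\beta - 1}\, e^{G(x)^{\beta}}\, e^{\lambda\left(1 - e^{G(x)^{\beta}}\right)}}{1 - A\left[1 - e^{\lambda\left(1 - e^{G(x)^{\beta}}\right)}\right]}.$$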
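The definitions above can be checked numerically. The following is a minimal Python sketch, assuming the cdf has the form $F(x) = A[1 - e^{\lambda(1 - e^{G(x)^{\beta}})}]$ implied by the normalizing constant $A$. The function names are illustrative and not part of the original study. Each function takes the baseline values $G(x)$ and $g(x)$ directly, so any baseline can be plugged in.

```python
import math


def cg_constant(lam):
    """Normalizing constant A = 1 / (1 - exp(lam * (1 - e)))."""
    return 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))


def cg_cdf(G, lam, beta):
    """CG cdf at a point where the baseline cdf equals G (0 <= G <= 1)."""
    return cg_constant(lam) * (1.0 - math.exp(lam * (1.0 - math.exp(G ** beta))))


def cg_pdf(G, g, lam, beta):
    """CG pdf at a point with baseline cdf G (0 < G <= 1) and baseline pdf g."""
    e_gb = math.exp(G ** beta)
    return (cg_constant(lam) * lam * beta * g * G ** (beta - 1.0)
            * e_gb * math.exp(lam * (1.0 - e_gb)))


def cg_survival(G, lam, beta):
    """Survival function S = 1 - F."""
    return 1.0 - cg_cdf(G, lam, beta)


def cg_hazard(G, g, lam, beta):
    """Hazard function h = f / S (requires G < 1 so that S > 0)."""
    return cg_pdf(G, g, lam, beta) / cg_survival(G, lam, beta)


if __name__ == "__main__":
    # Illustrative assumption: uniform baseline on (0, 1), so G(x) = x, g(x) = 1.
    lam, beta = 1.9, 0.9
    for x in (0.1, 0.5, 0.9):
        print(x, cg_cdf(x, lam, beta), cg_pdf(x, 1.0, lam, beta),
              cg_hazard(x, 1.0, lam, beta))
```

Because $G(x)$ ranges over $[0, 1]$, the constant $A$ is what forces the cdf to reach 1 at the upper end of the support.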

Mixture representation of distribution
Mixture representation plays a useful role in the derivation of the statistical properties of the new family of distributions. Hence, the mixture representation of the pdf of the CG family of distributions is derived in this section.
By applying the Taylor series expansion, the pdf of the CG family in Equation (2) can be rewritten as a power series. The resulting expression is further expanded using the binomial series expansion
$$(1 - z)^{a - 1} = \sum_{k=0}^{\infty} (-1)^{k} \binom{a - 1}{k} z^{k}, \quad |z| < 1,$$
which holds for any real non-integer $a > 0$; when $a$ is an integer, the sum is finite. From Equation (6), the CG family's density is expressed as a weighted sum of terms of the form $g(x)\,[G(x)]^{l}$, that is, as the product of the baseline pdf and a weighted power series of the baseline distribution function $G(x)$, with weights $\omega_{ijkl}$ that depend on the parameters.

Statistical properties
This section discusses some of the statistical properties of the CG family of distributions. These include the quantile function, non-central moments, moment generating function, incomplete moments, inequality measures, mean residual life, entropy, stochastic ordering and order statistics.

Quantile function
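Proposition 1. The quantile function for the CG family of distributions is given by
$$Q_G(u) = G^{-1}\left(\left[\ln\left(1 - \frac{1}{\lambda}\ln\left(1 - \frac{u}{A}\right)\right)\right]^{1/\beta}\right), \quad u \in (0, 1), \tag{8}$$
where $G^{-1}$ is the quantile function of the baseline distribution.
Proof. The quantile function $Q_G(u)$ of a random variable X is defined by $F(x_u) = P(X \le x_u) = u$, $u \in (0, 1)$. Replacing $x$ with $x_u$ in Equation (1), equating $F(x_u)$ to $u$ and making $x_u$ the subject yields the quantile function. The median of the family is obtained by substituting $u = 0.5$ in Equation (8).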
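The inversion in the proof can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the cdf $F(x) = A[1 - e^{\lambda(1 - e^{G(x)^{\beta}})}]$, and the exponential baseline used in the example is a hypothetical choice made only so that the baseline quantile has a closed form.

```python
import math
import random


def cg_baseline_level(u, lam, beta):
    """Value v = G(x_u) at which the CG cdf equals u, for 0 <= u < 1."""
    inv_a = 1.0 - math.exp(lam * (1.0 - math.e))  # this is 1 / A
    return math.log(1.0 - math.log(1.0 - u * inv_a) / lam) ** (1.0 / beta)


def cg_cdf_level(v, lam, beta):
    """CG cdf as a function of the baseline cdf value v (used for checking)."""
    a_const = 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))
    return a_const * (1.0 - math.exp(lam * (1.0 - math.exp(v ** beta))))


def cg_quantile(u, lam, beta, baseline_quantile):
    """Quantile of the CG family: baseline quantile applied to the level v."""
    return baseline_quantile(cg_baseline_level(u, lam, beta))


def exponential_quantile(v):
    """Hypothetical baseline: unit-rate exponential, G^{-1}(v) = -log(1 - v)."""
    return -math.log(1.0 - v)


if __name__ == "__main__":
    lam, beta = 1.9, 0.9
    print("median:", cg_quantile(0.5, lam, beta, exponential_quantile))
    # Inverse-transform sampling: apply the quantile to uniform draws.
    rng = random.Random(0)
    sample = [cg_quantile(rng.random(), lam, beta, exponential_quantile)
              for _ in range(5)]
    print(sample)
```

Splitting the computation into a baseline-free level $v$ and a baseline quantile keeps the same code usable for any baseline whose quantile function is available.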

Moments, moment generating functions and incomplete moments
Moments are essential in statistical analysis, as they can be used to study important features of a distribution such as central tendency, variation, skewness and kurtosis.

Non-central moments
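Proposition 2. The r th non-central moment of the CG family is given by
$$\mu'_r = \sum_{i,j,k,l} \omega_{ijkl}\, \tau_{(r,l)}, \quad r = 1, 2, \ldots,$$
where $\tau_{(r,l)} = \int_{-\infty}^{\infty} x^{r} g(x)\,[G(x)]^{l}\,dx$ is the probability weighted moment of the baseline distribution $G(x)$ and the sum runs over the indices of the mixture weights $\omega_{ijkl}$.
Proof. The r th non-central moment is defined as $E(X^{r}) = \mu'_r = \int_{-\infty}^{\infty} x^{r} f(x)\,dx$. Thus, using the mixture form of the density, the r th non-central moment of the CG family is given by
$$\mu'_r = \sum_{i,j,k,l} \omega_{ijkl} \int_{-\infty}^{\infty} x^{r} g(x)\,[G(x)]^{l}\,dx.$$
Alternatively, letting $G(x) = u$, the r th non-central moment of the CG family can be described in terms of the quantile function.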
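As a numerical illustration of the quantile-based alternative, the sketch below uses the standard identity $E(X^{r}) = \int_0^1 [Q(u)]^{r}\,du$ and compares it with direct integration of $x^{r} f(x)$. It assumes the Kumaraswamy baseline $G(x) = 1 - (1 - x^{a})^{b}$ on $(0, 1)$ and the CG cdf form $F(x) = A[1 - e^{\lambda(1 - e^{G(x)^{\beta}})}]$. The midpoint rule and the parameter values are illustrative choices, not taken from the study.

```python
import math


def ck_quantile(u, lam, beta, a, b):
    """Quantile of the CG family with a Kumaraswamy(a, b) baseline."""
    inv_a_const = 1.0 - math.exp(lam * (1.0 - math.e))  # 1 / A
    v = math.log(1.0 - math.log(1.0 - u * inv_a_const) / lam) ** (1.0 / beta)
    return (1.0 - (1.0 - v) ** (1.0 / b)) ** (1.0 / a)


def ck_pdf(x, lam, beta, a, b):
    """pdf of the CG family with a Kumaraswamy(a, b) baseline, 0 < x < 1."""
    a_const = 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))
    G = 1.0 - (1.0 - x ** a) ** b
    g = a * b * x ** (a - 1.0) * (1.0 - x ** a) ** (b - 1.0)
    e_gb = math.exp(G ** beta)
    return (a_const * lam * beta * g * G ** (beta - 1.0)
            * e_gb * math.exp(lam * (1.0 - e_gb)))


def moment_by_quantile(r, lam, beta, a, b, n=20000):
    """E(X^r) as the integral of Q(u)^r over (0, 1), midpoint rule."""
    h = 1.0 / n
    return h * sum(ck_quantile((i + 0.5) * h, lam, beta, a, b) ** r
                   for i in range(n))


def moment_by_density(r, lam, beta, a, b, n=20000):
    """E(X^r) as the integral of x^r f(x) over (0, 1), midpoint rule."""
    h = 1.0 / n
    return h * sum(((i + 0.5) * h) ** r * ck_pdf((i + 0.5) * h, lam, beta, a, b)
                   for i in range(n))


if __name__ == "__main__":
    params = (1.5, 1.2, 2.0, 3.0)  # illustrative lam, beta, a, b
    for r in (1, 2):
        print(r, moment_by_quantile(r, *params), moment_by_density(r, *params))
```

The two routes should agree up to quadrature error, which gives a simple check on any closed-form moment expression.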

Moment generating functions
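Proposition 3. The moment generating function of the CG family is given by
$$M_X(t) = \sum_{r=0}^{\infty} \frac{t^{r}}{r!} \sum_{i,j,k,l} \omega_{ijkl}\, \tau_{(r,l)}.$$
Proof. By definition, the moment generating function is given by $M_X(t) = E\left(e^{tX}\right) = \sum_{r=0}^{\infty} \frac{t^{r}}{r!} \int_{-\infty}^{\infty} x^{r} f(x)\,dx$. Substituting the r th non-central moment completes the proof.
Alternatively, letting $G(x) = u$, the moment generating function can be expressed in terms of the quantile function.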

Incomplete moments
Proposition 4. The r th incomplete moment of the CG family of distributions is given by
$$M_r(y) = \sum_{i,j,k,l} \omega_{ijkl} \int_{-\infty}^{y} x^{r} g(x)\,[G(x)]^{l}\,dx, \quad r = 1, 2, \ldots$$
Proof. The r th incomplete moment is defined as $M_r(y) = \int_{-\infty}^{y} x^{r} f(x)\,dx$. Substituting the mixture representation of the density function into the definition of the r th incomplete moment completes the proof.
Alternatively, letting $G(x) = u$, the incomplete moments can be expressed in terms of the quantile function.

Inequality measures
Lorenz and Bonferroni curves are applied in many fields, such as econometrics, demography, reliability, medicine and insurance. They are generally used to study inequality in quantities such as income and poverty.

Lorenz curve
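The Lorenz curve $L_F(y)$ is defined in terms of the first incomplete moment as $L_F(y) = \frac{1}{\mu}\int_{-\infty}^{y} x f(x)\,dx$, where $\mu = E(X)$. For the CG family, it is given by
$$L_F(y) = \frac{1}{\mu} \sum_{i,j,k,l} \omega_{ijkl} \int_{-\infty}^{y} x\, g(x)\,[G(x)]^{l}\,dx.$$
Alternatively, letting $G(x) = u$, $L_F(y)$ can be expressed in terms of the quantile function.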

Bonferroni curve
The Bonferroni curve $B_F(y)$ is defined as $B_F(y) = \dfrac{L_F(y)}{F(y)}$; hence, for the CG family it is given by
$$B_F(y) = \frac{1}{\mu F(y)} \sum_{i,j,k,l} \omega_{ijkl} \int_{-\infty}^{y} x\, g(x)\,[G(x)]^{l}\,dx.$$
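The two curves can also be evaluated by direct numerical integration of the density, without the mixture weights. The sketch below is a generic illustration for any pdf on a bounded interval. The helper names, the midpoint rule and the uniform baseline used in the example are assumptions made for illustration only.

```python
import math


def integrate(func, lo, hi, n=2000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


def lorenz(y, pdf, lo, hi):
    """L_F(y) = (1 / mu) * integral of x f(x) from lo to y."""
    mu = integrate(lambda x: x * pdf(x), lo, hi)
    return integrate(lambda x: x * pdf(x), lo, y) / mu


def bonferroni(y, pdf, lo, hi):
    """B_F(y) = L_F(y) / F(y), with F(y) obtained by integrating the pdf."""
    return lorenz(y, pdf, lo, hi) / integrate(pdf, lo, y)


def cg_uniform_pdf(x, lam=1.9, beta=1.2):
    """CG pdf with an assumed uniform baseline on (0, 1): G(x) = x, g(x) = 1."""
    a_const = 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))
    e_gb = math.exp(x ** beta)
    return (a_const * lam * beta * x ** (beta - 1.0)
            * e_gb * math.exp(lam * (1.0 - e_gb)))


if __name__ == "__main__":
    for y in (0.25, 0.5, 0.75):
        print(y, lorenz(y, cg_uniform_pdf, 0.0, 1.0),
              bonferroni(y, cg_uniform_pdf, 0.0, 1.0))
```

Since $F(y) \le 1$, the Bonferroni curve always lies on or above the Lorenz curve, which the printed values reflect.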

Mean residual life
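The mean residual life of a component (which is the average remaining survival time of the component after it has exceeded a specific time $y$) is defined as $E(X - y \mid X > y)$.
Proposition 5. The mean residual life of a CG random variable X is given by
$$M(y) = \frac{1}{S(y)}\left[\mu - \sum_{i,j,k,l} \omega_{ijkl} \int_{-\infty}^{y} x\, g(x)\,[G(x)]^{l}\,dx\right] - y.$$
Proof. The mean residual life is defined as $M(y) = \frac{1}{S(y)}\left[\mu - \int_{-\infty}^{y} x f(x)\,dx\right] - y$. Substituting Equation (6) into $M(y)$ gives the mean residual life.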
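For completeness, the step from the conditional expectation to the form used in the proof can be written out. This is a standard derivation, stated here for a generic density $f(x)$ with mean $\mu$ and survival function $S(y) = \int_{y}^{\infty} f(x)\,dx$, and it does not rely on any property specific to the CG family.

```latex
\begin{align*}
M(y) &= E(X - y \mid X > y)
      = \frac{1}{S(y)} \int_{y}^{\infty} (x - y)\, f(x)\,dx \\
     &= \frac{1}{S(y)} \int_{y}^{\infty} x f(x)\,dx
        - \frac{y}{S(y)} \int_{y}^{\infty} f(x)\,dx \\
     &= \frac{1}{S(y)} \left[ \mu - \int_{-\infty}^{y} x f(x)\,dx \right] - y .
\end{align*}
```

The last line uses $\int_{y}^{\infty} x f(x)\,dx = \mu - \int_{-\infty}^{y} x f(x)\,dx$, so the mean residual life only requires the mean and the first incomplete moment.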

Entropy
Entropy is a measure of variation or uncertainty of a random variable. Its application spans across probability theory, engineering and science in general.

Rényi's entropy
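The Rényi entropy (Rényi, 1961) of a random variable with pdf $f(x)$ is defined as
$$I_R(\delta) = \frac{1}{1 - \delta} \log\left[\int_{-\infty}^{\infty} f^{\delta}(x)\,dx\right], \quad \delta > 0,\ \delta \ne 1.$$
Proposition 6. Rényi's entropy for the CG random variable is obtained by expanding $f^{\delta}(x)$ with the same series expansions used for the density and substituting the resulting expression for $f^{\delta}(x)$ into $I_R(\delta)$.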

Stochastic ordering
Ordering mechanisms in data can be shown using stochastic ordering. Let X and Y be random variables with cdfs $F_X(x)$ and $F_Y(x)$ and pdfs $f_X(x)$ and $f_Y(x)$, respectively. X is said to be smaller than Y in the likelihood ratio order ($X \le_{lr} Y$) if the ratio $f_X(x)/f_Y(x)$ is decreasing in $x$.

Order statistics
The pdf of the p th order statistic $X_{p:n}$ of an independent and identically distributed random sample $X_1, X_2, \ldots, X_n$ of size n, denoted $f_{X_{p:n}}(x)$, is given by
$$f_{X_{p:n}}(x) = \frac{n!}{(p - 1)!\,(n - p)!}\,[F(x)]^{p - 1}\,[1 - F(x)]^{n - p} f(x), \quad p = 1, 2, \ldots, n.$$
Expanding the powers of $F(x)$ using the binomial expansion, substituting into the density of the p th order statistic, and employing a similar concept of expanding the density of the CG family yields a mixture representation of the pdf of the p th order statistic (Equation (21)).
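The order statistic density can be evaluated directly from values of $F(x)$ and $f(x)$, as in the minimal sketch below. The function names are illustrative, and the CG example assumes a uniform baseline and the cdf form $F(x) = A[1 - e^{\lambda(1 - e^{G(x)^{\beta}})}]$.

```python
import math


def order_stat_pdf(F, f, p, n):
    """pdf of the p-th order statistic, given cdf value F and pdf value f."""
    coef = math.factorial(n) / (math.factorial(p - 1) * math.factorial(n - p))
    return coef * F ** (p - 1) * (1.0 - F) ** (n - p) * f


def cg_cdf(G, lam, beta):
    """CG cdf at baseline cdf value G."""
    a_const = 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))
    return a_const * (1.0 - math.exp(lam * (1.0 - math.exp(G ** beta))))


def cg_pdf(G, g, lam, beta):
    """CG pdf at baseline cdf value G and baseline pdf value g."""
    a_const = 1.0 / (1.0 - math.exp(lam * (1.0 - math.e)))
    e_gb = math.exp(G ** beta)
    return (a_const * lam * beta * g * G ** (beta - 1.0)
            * e_gb * math.exp(lam * (1.0 - e_gb)))


if __name__ == "__main__":
    lam, beta, n = 1.9, 0.9, 5
    x = 0.4  # assumed uniform baseline: G(x) = x, g(x) = 1
    for p in range(1, n + 1):
        print(p, order_stat_pdf(cg_cdf(x, lam, beta),
                                cg_pdf(x, 1.0, lam, beta), p, n))
```

Setting $p = 1$ or $p = n$ gives the densities of the sample minimum and maximum, respectively.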

Moments of order statistics
The r th non-central moment of the p th order statistic is given by $E\left(X^{r}_{p:n}\right) = \int_{-\infty}^{\infty} x^{r} f_{X_{p:n}}(x)\,dx$.
Substituting Equation (21) into $E\left(X^{r}_{p:n}\right)$ expresses the r th non-central moment of the p th order statistic of the CG random variable as a weighted sum of the quantities $\tau_{(r,m)} = \int_{-\infty}^{\infty} x^{r} g(x)\,[G(x)]^{m}\,dx$, which are the probability weighted moments of the baseline distribution.

Parameter estimation
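The maximum likelihood estimation method was used to estimate the parameters of the family of distributions, for similar reasons as stated in Nasiru et al. Given a random sample $x_1, x_2, \ldots, x_n$ of size n from the CG family of distributions, the total log-likelihood function is given by
$$\ell = n \log(A\lambda\beta) + \sum_{i=1}^{n} \log g(x_i; \psi) + (\beta - 1) \sum_{i=1}^{n} \log G(x_i; \psi) + \sum_{i=1}^{n} G(x_i; \psi)^{\beta} + \lambda \sum_{i=1}^{n} \left[1 - e^{G(x_i; \psi)^{\beta}}\right],$$
where $\psi$ is a $(p \times 1)$ vector of parameters associated with the baseline distribution.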
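A direct translation of the log-likelihood into code is sketched below. It is a minimal illustration under the assumption that the density has the form $f(x) = A\lambda\beta\, g(x) G(x)^{\beta-1} e^{G(x)^{\beta}} e^{\lambda(1 - e^{G(x)^{\beta}})}$. The Weibull baseline with scale `alpha` and shape `gamma`, the function names and the data values are hypothetical choices for the example only.

```python
import math


def cg_loglik(data, lam, beta, base_cdf, base_pdf):
    """Total CG log-likelihood for a baseline given as cdf and pdf callables."""
    n = len(data)
    log_a = -math.log(1.0 - math.exp(lam * (1.0 - math.e)))  # log A
    total = n * (log_a + math.log(lam) + math.log(beta))
    for x in data:
        G = base_cdf(x)
        gb = G ** beta
        total += (math.log(base_pdf(x)) + (beta - 1.0) * math.log(G)
                  + gb + lam * (1.0 - math.exp(gb)))
    return total


def weibull_cdf(x, alpha, gamma):
    """Assumed baseline cdf: G(x) = 1 - exp(-(x / alpha)^gamma)."""
    return 1.0 - math.exp(-((x / alpha) ** gamma))


def weibull_pdf(x, alpha, gamma):
    """Assumed baseline pdf matching weibull_cdf."""
    z = (x / alpha) ** gamma
    return (gamma / alpha) * (x / alpha) ** (gamma - 1.0) * math.exp(-z)


if __name__ == "__main__":
    data = [0.55, 0.62, 0.70, 0.74, 0.81]  # hypothetical observations
    lam, beta, alpha, gamma = 1.9, 0.9, 0.8, 4.8
    ll = cg_loglik(data, lam, beta,
                   lambda x: weibull_cdf(x, alpha, gamma),
                   lambda x: weibull_pdf(x, alpha, gamma))
    print("log-likelihood:", ll)
```

In practice this function would be passed, with its sign flipped, to a numerical optimizer over $(\lambda, \beta, \psi)$.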
The parameters are then estimated by partially differentiating the total log-likelihood function with respect to each parameter of the CG family to obtain the score functions.
Equating the score functions to zero and numerically solving the resulting system of equations, using techniques such as the quasi-Newton-Raphson method, gives the maximum likelihood estimates of the parameters. The interval estimates of the parameters are obtained by first finding the observed $(p \times p)$ information matrix $J(\vartheta) = -\left\{\dfrac{\partial^{2} \ell}{\partial q\, \partial r}\right\}$ (for $q, r = \lambda, \beta, \psi$ and $\vartheta = (\lambda, \beta, \psi)^{T}$), whose elements can be numerically computed. Under the regularity conditions, as $n \to \infty$, $(\hat{\vartheta} - \vartheta) \sim N_p\left(0, J(\hat{\vartheta})^{-1}\right)$ approximately, where $J(\hat{\vartheta})$ is the observed information matrix evaluated at $\hat{\vartheta}$. The approximate $100(1 - \rho)\%$ confidence intervals (where $\rho$ is the significance level) can be constructed using the asymptotic normal distribution.
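Once an estimate and its standard error are available (the standard error being the square root of the corresponding diagonal element of the inverse observed information matrix), the approximate interval follows from the normal quantile. A minimal sketch is given below. The function name and the numerical values are hypothetical.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, rho=0.05):
    """Approximate 100(1 - rho)% interval: estimate +/- z_{1 - rho/2} * std_error."""
    z = NormalDist().inv_cdf(1.0 - rho / 2.0)
    return estimate - z * std_error, estimate + z * std_error


if __name__ == "__main__":
    # Hypothetical estimate and standard error, for illustration only.
    lower, upper = wald_interval(2.0, 0.5)
    print(lower, upper)
```

For parameters restricted to be positive, such as $\lambda$ and $\beta$, an interval of this form can extend below zero when the standard error is large relative to the estimate.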

Some special distributions
The CG family of distributions can be used to extend many distributions and so create more flexibility in their applications. In this section, some special distributions are developed.

Chen Burr III distribution
The cdf of the Chen Burr III (CB) distribution is obtained by taking the Burr III distribution as the baseline in Equation (1), and its corresponding density and hazard functions follow in the same way from the pdf and hazard function of the CG family. Plots of the density and hazard rate functions of the CB distribution are displayed in Figure 1. The density plots exhibit varying shapes, including unimodal shapes with different degrees of kurtosis, right-skewed shapes and reversed J shapes. The hazard rate function, for some selected parameter values, exhibits upside-down bathtub, decreasing and increasing failure rates.
The CB distribution's quantile function $Q_G(u)$ is obtained by inverting its cdf.
Figure 1. Plots of density and hazard rate functions of CB distribution.
Figure 2. Plots of the density and hazard rate function of CK distribution.

Chen Kumaraswamy distribution
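The Chen Kumaraswamy (CK) distribution uses the Kumaraswamy distribution (Kumaraswamy, 1980), with cdf and pdf respectively given by $G(x) = 1 - (1 - x^{a})^{b}$ and $g(x) = a b x^{a - 1}(1 - x^{a})^{b - 1}$, $0 < x < 1$, $a > 0$, $b > 0$, as the baseline distribution. The cdf of the CK distribution is given by
$$F(x) = A\left[1 - \exp\left\{\lambda\left(1 - e^{\left[1 - (1 - x^{a})^{b}\right]^{\beta}}\right)\right\}\right],$$
with its corresponding density and hazard rate functions, respectively, given by
$$f(x) = A\lambda\beta\, a b\, x^{a - 1}(1 - x^{a})^{b - 1}\left[1 - (1 - x^{a})^{b}\right]^{\beta - 1} e^{\left[1 - (1 - x^{a})^{b}\right]^{\beta}} \exp\left\{\lambda\left(1 - e^{\left[1 - (1 - x^{a})^{b}\right]^{\beta}}\right)\right\}$$
and
$$h(x) = \frac{f(x)}{1 - F(x)}.$$
Plots of the density and hazard rate functions of the CK distribution are displayed in Figure 2. The density plots show shapes such as the reversed J, left-skewed, right-skewed and unimodal shapes, among others. The hazard rate plot for some selected parameter values exhibits increasing and decreasing failure rates, as well as unimodal and bathtub shapes.
The quantile function $Q_G(u)$ is obtained as
$$Q_G(u) = \left\{1 - \left[1 - \left(\ln\left(1 - \frac{1}{\lambda}\ln\left(1 - \frac{u}{A}\right)\right)\right)^{1/\beta}\right]^{1/b}\right\}^{1/a}.$$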

Chen Weibull distribution
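The Chen Weibull (CW) distribution is obtained using the Weibull distribution (Weibull, 1951), with cdf and pdf respectively given by $G(x) = 1 - e^{-(x/\alpha)^{\gamma}}$ and $g(x) = \dfrac{\gamma}{\alpha}\left(\dfrac{x}{\alpha}\right)^{\gamma - 1} e^{-(x/\alpha)^{\gamma}}$, $x > 0$, as the baseline distribution. The cdf and pdf of the CW distribution are, respectively, given by
$$F(x) = A\left[1 - \exp\left\{\lambda\left(1 - e^{\left[1 - e^{-(x/\alpha)^{\gamma}}\right]^{\beta}}\right)\right\}\right]$$
and
$$f(x) = A\lambda\beta\, \frac{\gamma}{\alpha}\left(\frac{x}{\alpha}\right)^{\gamma - 1} e^{-(x/\alpha)^{\gamma}} \left[1 - e^{-(x/\alpha)^{\gamma}}\right]^{\beta - 1} e^{\left[1 - e^{-(x/\alpha)^{\gamma}}\right]^{\beta}} \exp\left\{\lambda\left(1 - e^{\left[1 - e^{-(x/\alpha)^{\gamma}}\right]^{\beta}}\right)\right\}.$$
The hazard rate function is given by $h(x) = \dfrac{f(x)}{1 - F(x)}$. The density plots of the CW distribution exhibit right-skewed, left-skewed, unimodal and reversed J shapes, among others, as shown in Figure 3. The hazard rate plot of the CW distribution for some selected parameter values exhibits varying shapes, such as increasing and decreasing failure rates, right- and left-skewed unimodal shapes, and an upside-down bathtub shape.
The quantile function $Q_G(u)$ of the CW distribution is given by
$$Q_G(u) = \alpha\left\{-\ln\left[1 - \left(\ln\left(1 - \frac{1}{\lambda}\ln\left(1 - \frac{u}{A}\right)\right)\right)^{1/\beta}\right]\right\}^{1/\gamma}.$$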

Simulations
Monte Carlo simulations were performed in this section to investigate the behavior of the maximum likelihood estimators of the parameters. For illustration purposes, the simulation experiments were undertaken using the Chen Weibull distribution. The experiments were replicated $N = 1500$ times using sample sizes $n = 50, 150, 300, 600, 1000$ and parameter values I: $\lambda = 1.9$, $\beta = 0.9$, $\alpha = 0.8$, $\gamma = 4.8$ and II: $\lambda = 0.5$, $\beta = 0.5$, $\alpha = 0.5$, $\gamma = 0.5$. The average bias (AB), root-mean-square error (RMSE) and coverage probability (CP) of the 95% confidence intervals for the estimators of the parameters were estimated. From Table 1, the ABs and RMSEs for the estimators generally decrease towards zero as the sample size increases. This indicates that the maximum likelihood estimators become more accurate and behave consistently as the sample size increases. Also, the CPs for most of the estimators are quite close to the nominal value of 0.95. Thus, the maximum likelihood technique works very well for estimating the parameters of the Chen Weibull distribution.
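The structure of such an experiment can be sketched as follows. This is a simplified illustration and not the study's procedure: it draws Chen Weibull samples by inverse-transform sampling and computes the AB and RMSE summaries, but, to stay within the standard library, it applies them to the sample median as a stand-in estimator rather than to the maximum likelihood estimators. The Weibull parametrization $G(x) = 1 - e^{-(x/\alpha)^{\gamma}}$, the function names and the replication settings are assumptions.

```python
import math
import random


def cw_quantile(u, lam, beta, alpha, gamma):
    """Chen Weibull quantile (assumed Weibull scale alpha, shape gamma)."""
    inv_a = 1.0 - math.exp(lam * (1.0 - math.e))  # 1 / A
    v = math.log(1.0 - math.log(1.0 - u * inv_a) / lam) ** (1.0 / beta)
    return alpha * (-math.log(1.0 - v)) ** (1.0 / gamma)


def average_bias(estimates, true_value):
    """Mean of (estimate - true value) over the replications."""
    return sum(e - true_value for e in estimates) / len(estimates)


def rmse(estimates, true_value):
    """Root-mean-square error of the estimates about the true value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates)
                     / len(estimates))


def sample_median(values):
    """Median of a list of numbers."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else 0.5 * (s[mid - 1] + s[mid])


def simulate_medians(n, replications, params, seed=1):
    """Sample medians from repeated Chen Weibull samples of size n."""
    rng = random.Random(seed)
    return [sample_median([cw_quantile(rng.random(), *params)
                           for _ in range(n)])
            for _ in range(replications)]


if __name__ == "__main__":
    params = (1.9, 0.9, 0.8, 4.8)  # parameter set I: lam, beta, alpha, gamma
    true_median = cw_quantile(0.5, *params)
    for n in (50, 150, 300):
        est = simulate_medians(n, 200, params)
        print(n, average_bias(est, true_median), rmse(est, true_median))
```

Replacing `sample_median` with a numerical maximizer of the log-likelihood would turn this skeleton into the kind of study summarized in Table 1.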

Applications
In this section, the performance of the CW distribution in providing good parametric fits to real-life datasets is demonstrated. Its goodness-of-fit measures are compared with those of competing models, namely the exponentiated Chen (EC) (Chaubey & Zhang, 2015), extended Weibull (EW) (Xie, Tang, & Goh, 2002) and Kumaraswamy exponentiated Chen (KEC) (Khan, King, & Hudson, 2018) distributions. The information criteria and goodness-of-fit measures used are the Akaike information criterion (AIC), Bayesian information criterion (BIC), consistent Akaike information criterion (CAIC), Hannan-Quinn information criterion (HQIC), Kolmogorov-Smirnov statistic (KS), Cramér-von Mises distance (CM) and Anderson-Darling statistic (AD). In obtaining the maximum likelihood estimates of the parameters, the log-likelihood functions of the models were maximized using the bbmle package's subroutine mle2 in R (Bolker, 2014). The maximum likelihood estimates with the largest maxima were chosen after using a wide range of initial values.
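The information criteria are simple functions of the maximized log-likelihood $\hat{\ell}$, the number of parameters $k$ and the sample size $n$. The sketch below uses commonly cited forms. The CAIC line assumes the definition $-2\hat{\ell} + k(\log n + 1)$, which may differ from the variant used in the study, and the input values in the example are hypothetical.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, BIC, CAIC and HQIC from a maximized log-likelihood.

    loglik: maximized log-likelihood, k: number of parameters, n: sample size.
    """
    minus2ll = -2.0 * loglik
    return {
        "AIC": minus2ll + 2.0 * k,
        "BIC": minus2ll + k * math.log(n),
        # Assumed definition of the consistent AIC.
        "CAIC": minus2ll + k * (math.log(n) + 1.0),
        "HQIC": minus2ll + 2.0 * k * math.log(math.log(n)),
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for name, value in information_criteria(-100.0, 4, 100).items():
        print(name, round(value, 4))
```

Smaller values indicate a better trade-off between fit and complexity, which is the sense in which the fitted models are ranked.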
For illustration, the first dataset (data1) consists of the fatigue times of 6061-T6 aluminum coupons cut parallel to the direction of rolling and oscillated at 18 cycles per second, reported by Birnbaum and Saunders (1969), whilst the second dataset (data2) represents survival times of guinea pigs injected with different amounts of tubercle bacilli, studied by Bjerkedal (1960). These datasets are given in Tables 2 and 3. A preliminary exploration of the shapes of the hazard rate functions of the datasets showed that data1 has an increasing hazard rate function whilst data2 has a unimodal hazard rate function, as shown in Figure 4.
The maximum likelihood estimates and the corresponding standard errors of the parameters of the fitted distributions for both datasets, and their goodness-of-fit measures, are displayed in Tables 4 and 5, respectively. The parameters of all the distributions were significant at the 5% significance level, with the exception of the CW and KEC distributions, each of which had one parameter ($\lambda$ and $b$, respectively) significant only at the 15% significance level. Compared to the competing models, the CW distribution with its four parameters provides a better fit for the datasets, as it has the smallest value for all the goodness-of-fit measures used, as shown in Table 5. This is further confirmed by the plots of the densities and cdfs of the empirical and fitted distributions, as shown in Figures 4 and 5. From the fitted plots, it is observed that the CW distribution provides a reasonable fit to the density.
The P-P plots also indicate that the CW distribution provides a better fit for both datasets in comparison with the KEC, EC and EW distributions, as shown in Figures 6 and 7.
The profile likelihoods of the estimated parameters of the CW distribution for the datasets are shown in Figures 8 and 9. From the plots, it is observed that the estimated values for the parameters are the maxima.

Conclusion
The focus of most researchers is geared towards developing new families of distributions that generalize existing distributions to provide a better fit for the modeling of life data. A new family of distributions called the CG family is developed and studied. Its statistical properties, such as the quantile function, moments, incomplete moments, generating function, entropies, stochastic ordering and order statistics, are derived. Estimators for the parameters of the new family were developed using the method of maximum likelihood. A demonstration of the application of the special distribution developed from the family was carried out using two real datasets. A comparison of the results with those of other existing distributions showed that the special distribution developed from the CG family provides a better parametric fit to these datasets.
Figure 9. Profile log-likelihood plot of CW parameters for data1.