Some inferences based on a mixture of power function and continuous logarithmic distribution

In recent decades, many families of distributions and, consequently, many new distributions have been proposed in order to provide greater flexibility and better fits to real data sets. However, several of these distributions have probability density functions (pdf) and cumulative distribution functions (cdf) with complicated expressions. Examples of families that involve such functions are the beta-G (see, for example, Eugene et al. Beta normal distribution and its applications. Commun Stat - Theory Methods. 2002;31:497–512) and gamma-G (for more details, see Nadarajah et al. The Zografos–Balakrishnan-G family of distributions: mathematical properties and application. Commun Stat - Theory Methods. 2015;44:186–215) families. In this sense, we introduce a new bounded distribution obtained as a mixture of the power function and continuous logarithmic distributions, named the power logarithmic (PL) distribution, whose pdf and cdf have simple forms. Various statistical and mathematical properties of the new model are obtained in closed form, which is a desirable feature of a new model. Based on these basic properties, two new characterizations of the model are given. Finally, the applicability of the PL model to real data is illustrated with two real data sets, which show the good fit of the new distribution when compared with others already known in the literature.


Introduction
The power function distribution is a simple model and a special case of the beta model. It is one of the distributions used to study the reliability of electric devices (see Meniconi [1]). Many engineers prefer this simple distribution over others for obtaining failure rates and reliability information. Many authors have studied various aspects of the power function distribution (see Meniconi [1], Rider [2], Lwin [3] and Arnold and Press [4]). Kabir and Ahsanullah [5] obtained estimators of the location and scale parameters of the power function model, and Ahsanullah [6] gave characterizations of the model. Tahmasbi and Rezaei [7] introduced a new two-parameter distribution with decreasing failure rate by mixing the exponential and logarithmic distributions. Athar and Abdel-Aty [8] studied the characterization of a general class of distributions by truncated moments. The statistical literature contains many extended forms of existing distributions, for example the McDonald modified Burr-III distribution (Mukhtar et al. [9]) and the weighted exponential Gompertz distribution (Abd El-Bar and Ragab [10]).
In recent years, many distributions have been proposed in order to fit several types of data. The odd log-logistic-Stacy distribution was proposed by Prataviera et al. [11], who also studied the corresponding regression model with applications in survival analysis. Although the probability density function (pdf) of this distribution can be bimodal, its expression is long and involves complicated functions such as the gamma and incomplete gamma ratio functions.
Handique et al. [12] introduced the beta generated Kumaraswamy-G family of distributions, which generalizes the Kumaraswamy-G family. Again, despite some flexibility in its hazard rate function (hrf), the expressions of its pdf and cdf involve the beta and incomplete beta ratio functions.
The generalized Kumaraswamy-G family of distributions, proposed by Nofal et al. [13] as an extension of the Kumaraswamy-G family, has simple expressions for the pdf and cdf and shows good flexibility in its pdf and hrf. However, its mathematical properties are not obtained in closed form.
Bhatti et al. [14] introduced a new family called the Burr III Marshall Olkin family and presented some special submodels, some of which have a bimodal pdf. However, like the other families mentioned above, the mathematical properties of that family are not expressed in closed form.
In this sense, the goal of this paper is to propose a three-parameter model with support in (0,1) that is more flexible than the power function distribution and performs well in these situations.
Moreover, the proposed distribution extends the power function model without involving complicated functions in its pdf and cumulative distribution function (cdf), as can be seen in the following section.
The article is organized as follows. We describe the new model and some of its important features in Section 2. Various statistical properties of the power logarithmic (PL) model are derived in Section 3, including the moment generating function, moments, skewness, kurtosis and the distributions of sums, products and ratios. We also discuss the residual life random variables for the PL model. In Section 4, we investigate the Rényi and Shannon entropies. The maximum likelihood estimators of the PL parameters and the asymptotic and expected information matrices are discussed in Section 5. The new characterizations of the PL model based on truncated moments are obtained in Section 6. Section 7 deals with order statistics. In Section 8, we provide a simulation study to verify the asymptotic properties of the estimators of the parameter vector, varying the true parameter vector and the sample size n. Two real data applications of the PL model are discussed in Section 9, and concluding remarks are presented in Section 10.

The proposed model
A random variable X is said to have a PL distribution if its cdf is given by Equation (1), where α is a shape parameter and β, δ are scale parameters. The corresponding pdf is given by Equation (2), which can be written as a two-component mixture of the power function and continuous logarithmic distributions. The survival and hazard rate (hr) functions of X are S(x) = 1 − F(x) and h(x) = f(x)/S(x), respectively, where F and f denote the cdf (1) and the pdf (2). If a random variable has the cdf given by Equation (1) with parameters α, β and δ, we denote it by PL(α, β, δ). Note that Equations (1) and (2) do not involve any complicated function, which is an advantage of this model.
Following the idea of Qian (see Qian [15]), from Equation (4), we have Taking the derivative, we obtain So, we can rewrite the last equations as The sign of h′(x) is the same as the sign of s(x) since h(x) > 0. Thus, the hrf has two important shapes, which can be identified by analysing the plot of the function s(.) presented in Figure 1. Since the proposed model has only one shape parameter (α), the shape of the hazard depends on this parameter. It is easy to verify that if α > 0.2 the hazard is increasing (solid line in Figure 1); otherwise, the hazard is bathtub-shaped (dotted line in Figure 1). The pdf and hrf for different parameter values are shown in Figure 2. Figure 2(a) indicates how the three parameters α, β and δ affect the PL density and shows that the density can take various forms: increasing, decreasing, constant, right-skewed, left-skewed and upside-down bathtub-shaped. Figure 2(b) shows increasing, U-shaped and bathtub-shaped hrfs. Additionally, our distribution contains some well-known models as special cases; these submodels are listed in Table 1.
Table 1. Submodels of the PL distribution (parameter restriction: submodel).
Logarithmic distribution (β, δ)
α + 1 = σ and δ = 1: Log-Lindley distribution (σ, β)
α + 1 = σ, β = 0 and δ = 1: Transformed gamma distribution (σ)
δ = 1: Weighted log-Lindley distribution (α, β, c = 1) [New]
We note some motivations for the proposed distribution:
• Our distribution contains several distributions as special cases (as listed in Table 1).
• Figure 2 shows that the density function can be unimodal, increasing or decreasing. This is an advantage because the power function model is a special case (see Table 1) and its density function can only be increasing.
• As seen above, the PL hazard can be increasing, U-shaped or bathtub-shaped.
• The first four moments of X ∼ PL(α, β, δ) can be obtained in closed form. By "closed form", we mean an expression in terms of a finite number of known functions.
• We can obtain a semi-closed MLE for one of the parameters (see Section 5.1).
• Several properties of our proposed model are obtained in closed form.
Throughout the paper, we use the following lemma, some integrals and some special functions.

Lemma 2.1: Let
The integrals used are Finally, the special functions used are (i) the exponential integral function, defined by (ii) the hyperbolic sine integral function, often called the "Shi function", defined by $\mathrm{Shi}(z) = \int_0^z \frac{\sinh t}{t}\,dt$. (iii) the hyperbolic cosine integral function, often called the "Chi function", defined by $\mathrm{Chi}(z) = \gamma + \ln z + \int_0^z \frac{\cosh t - 1}{t}\,dt$, where γ is the Euler constant. It has the series expansion $\mathrm{Chi}(z) = \gamma + \ln z + \sum_{k=1}^{\infty} \frac{z^{2k}}{2k\,(2k)!}$.
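The Shi and Chi functions can be evaluated numerically from their standard power series. The following is a minimal sketch, not the authors' implementation: the function names `shi`, `chi` and `shi_quadrature` and the truncation at 40 terms are assumptions made for illustration. It uses Shi(z) = Σ_{k≥0} z^(2k+1) / ((2k+1)(2k+1)!) and Chi(z) = γ + ln z + Σ_{k≥1} z^(2k) / (2k (2k)!) for z > 0, and checks the first series against a direct midpoint-rule evaluation of the defining integral.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler constant gamma


def shi(z, terms=40):
    """Hyperbolic sine integral via its power series (hypothetical helper)."""
    total = 0.0
    term = z  # holds z^(2k+1) / (2k+1)! starting at k = 0
    for k in range(terms):
        n = 2 * k + 1
        total += term / n
        term *= z * z / ((n + 1) * (n + 2))  # move to the next odd power
    return total


def chi(z, terms=40):
    """Hyperbolic cosine integral via its power series, valid for z > 0."""
    total = EULER_GAMMA + math.log(z)
    term = z * z / 2.0  # holds z^(2k) / (2k)! starting at k = 1
    for k in range(1, terms + 1):
        n = 2 * k
        total += term / n
        term *= z * z / ((n + 1) * (n + 2))  # move to the next even power
    return total


def shi_quadrature(z, steps=20000):
    """Midpoint-rule evaluation of the integral of sinh(t)/t over (0, z)."""
    h = z / steps
    return h * sum(math.sinh((i + 0.5) * h) / ((i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    print(shi(1.0), shi_quadrature(1.0), chi(1.0))
```

The series converge quickly for arguments of moderate size, so a fixed number of terms is adequate here; for large |z| a library routine would be a safer choice.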

Mathematical properties
In this section, we introduce some important mathematical properties for the PL model, including moment generating function, moments, skewness, kurtosis, the distributions of sums, products and ratios and residual lifetime moments.

Moments
The moment-generating function (mgf) and the rth moment for the PL distribution can be defined as respectively. In particular Now, the skewness (γ 1 ) and kurtosis (γ 2 ) can be obtained using the following relations In Table 2, we introduce the mean, variance, γ 1 and γ 2 for different values of the parameters α, β and δ. It is observed that the mean of PL increases as α increases, while the variance decreases as α increases for fixed β and δ. Also, for fixed α and δ, the mean and the variance increase as β increases. Further, the mean and the variance decrease by increasing δ for fixed α and β.
Additionally, Table 2 makes clear that, for fixed β and δ, the skewness decreases as α increases, while the kurtosis first decreases and then increases as α increases. The skewness and kurtosis decrease when β increases for fixed α and δ. Table 2 also reveals that, for fixed α and β, both the skewness and the kurtosis increase as δ increases. Hence, Table 2 indicates that α, β and δ all affect the shape of the PL model.
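To make the step from raw moments to skewness and kurtosis concrete, the sketch below applies the standard moment relations to the first four raw moments of a distribution. It is an illustration under stated assumptions: the helper names are hypothetical, the kurtosis returned is the non-excess version μ4/μ2² (the convention used in Table 2 may differ), and the power function distribution with pdf a·x^(a−1) on (0,1), whose rth raw moment is a/(a+r), is used as a stand-in because the PL moment expressions are not reproduced here.

```python
def skewness_kurtosis(m1, m2, m3, m4):
    """Skewness and (non-excess) kurtosis from the first four raw moments."""
    var = m2 - m1 ** 2                                    # second central moment
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                  # third central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4  # fourth central moment
    return mu3 / var ** 1.5, mu4 / var ** 2


def power_function_raw_moment(r, a):
    """rth raw moment of the stand-in power function law with pdf a*x^(a-1)."""
    return a / (a + r)


if __name__ == "__main__":
    for a in (0.5, 1.0, 3.0):
        moments = [power_function_raw_moment(r, a) for r in (1, 2, 3, 4)]
        print(a, skewness_kurtosis(*moments))
```

With a = 1 the stand-in reduces to the uniform distribution on (0,1), which gives a quick sanity check of the formulas.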

Sums, products and ratios
In this subsection, we derive the exact distributions of sums, products and ratios of PL variables.

Remark 1:
The results above come directly using the known definitions of

Residual life function
Some measures of the residual lifetime of the PL distribution are obtained in this section, including the survival function, density, hrf, mean and variance. The survival function of the residual lifetime ξ_t for the PL model is The corresponding pdf of ξ_t takes the form Based on Equations (9) and (10), the hrf of ξ_t is The mean residual lifetime (MRL) of ξ_t for the PL model is where μ′_1 can be obtained using (8), and ψ(t; 1, α, β, δ) is obtained by Lemma 2.1 for r = 1.
Additionally, the variance of the residual lifetime ξ_t for the PL model is defined by where μ′_2 is given by (8), and ψ(t; 2, α, β, δ) is obtained by Lemma 2.1 for r = 2. The pdf of ξ_t and the MRL for the PL model are shown in Figure 3. These plots present the possible shapes of the pdf of ξ_t. We also note that the MRL decreases as the time t increases.
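The mean residual lifetime of any distribution on (0,1) can be checked numerically through the general identity MRL(t) = ∫_t^1 S(x) dx / S(t), where S is the survival function. The sketch below is only an illustration: the function names are hypothetical, and the power function survival function S(x) = 1 − x^a is used as a stand-in, since the PL survival function is not reproduced here. Any other survival function on (0,1) can be passed in its place.

```python
def mean_residual_life(survival, t, upper=1.0, steps=2000):
    """MRL(t) = integral of S over (t, upper) divided by S(t), midpoint rule."""
    h = (upper - t) / steps
    area = h * sum(survival(t + (i + 0.5) * h) for i in range(steps))
    return area / survival(t)


def power_function_survival(a):
    """Survival function of the stand-in power function law with cdf x^a."""
    return lambda x: 1.0 - x ** a


if __name__ == "__main__":
    surv = power_function_survival(2.0)
    for t in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(t, mean_residual_life(surv, t))
```

For this stand-in the printed values decrease in t, the same qualitative behaviour reported for the PL model in Figure 3.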

Entropy
In this section, we introduce the Rényi and Shannon entropy for the PL distribution.

Statistical inferences
Here, we discuss the maximum likelihood estimates (MLEs) of the parameters of the PL model and construct the expected Fisher information matrix.

Maximum likelihood estimates
The log-likelihood function for the parameters of our distribution, based on a sample of size n, is Differentiating Equation (12) with respect to α, β and δ, we have the following equations: The MLEs α̂, β̂ and δ̂ of α, β and δ, respectively, can be obtained by solving the above nonlinear equations numerically, for example with the Mathematica software.
However, note that from Equation (13) we can obtain a semi-closed form for the MLE of α.
log(x i ). Thus, we can write:

Fisher's information matrix
Here, we construct the expected Fisher's information matrix. First, we introduce the following lemma to help us to provide Fisher's information matrix.

Lemma 5.1: Let X have the pdf (2). Then, the expectations of
are, respectively, given by The expected Fisher's information matrix for sample size n is given by where each element can be found in Appendix 1.

Characterization of PL model
Here, we derive two characterizations of the PL model by truncated moments. For this, we provide the following assumption and lemmas, which are used to prove our characterization theorems.
where c is a constant.
if and only if X has the PL model with pdf defined by Equation (2).

Theorem 6.2:
Suppose that X satisfies the conditions of Assumption 6.1 with α = 0 and β = 1. and

Order statistics
Let X_1, ..., X_n be a random sample from the PL distribution and let X_{1:n} < ... < X_{n:n} be the corresponding order statistics. Then the pdf f_{i:n}(x) of the ith order statistic X_{i:n} is given by Where
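For reference, the density of the ith order statistic of any continuous distribution with pdf f and cdf F follows the general formula f_{i:n}(x) = n! / ((i−1)! (n−i)!) · F(x)^(i−1) · (1 − F(x))^(n−i) · f(x). The sketch below evaluates this general formula; it is not the expanded PL expression of this section. The function names are hypothetical, and the power function distribution with cdf x^a is used as a stand-in for the PL pdf and cdf.

```python
import math


def order_stat_pdf(x, i, n, pdf, cdf):
    """Density of the ith order statistic in a sample of size n."""
    const = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    big_f = cdf(x)
    return const * big_f ** (i - 1) * (1.0 - big_f) ** (n - i) * pdf(x)


def integrate(func, lower=0.0, upper=1.0, steps=2000):
    """Midpoint rule, used only to check that the density integrates to one."""
    h = (upper - lower) / steps
    return h * sum(func(lower + (k + 0.5) * h) for k in range(steps))


if __name__ == "__main__":
    a = 3.0  # shape of the stand-in power function law
    pdf = lambda x: a * x ** (a - 1)
    cdf = lambda x: x ** a
    for i in (1, 3, 5):
        print(i, integrate(lambda x: order_stat_pdf(x, i, 5, pdf, cdf)))
```

Replacing `pdf` and `cdf` with the PL versions from Equations (2) and (1) would give the PL order-statistic density directly.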

Simulation study
In this section, we provide a simulation study in order to verify the asymptotic properties of the estimators of the parameters.
To this end, we ran a Monte Carlo simulation with 1000 replications, considering n = {50, 150, 250} and (α, β, δ) = (1, 2, 3). The random numbers are generated from the quantile function of the PL distribution, say Q(u), obtained by inverting the cdf. Using software such as Mathematica, we obtain where W(.) denotes the Lambert function. Table 3 shows what was expected: as the sample size increases, the average estimates (AEs) tend to the true parameter values and the mean square errors (MSEs) decrease. Thus, the simulation results agree with the asymptotic properties of the estimators.
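The design of such a study (inverse-transform sampling, estimation in each replication, then AEs and MSEs by sample size) can be sketched as follows. This is an illustration under explicit assumptions, not a reproduction of Table 3: because the PL quantile function is not reproduced here, the one-parameter power function distribution with cdf x^a is used as a stand-in, with quantile function Q(u) = u^(1/a) and closed-form MLE â = −n / Σ log x_i. The function names, the seed and the value a = 2 are hypothetical.

```python
import math
import random


def sample_power_function(n, a, rng):
    """Inverse-transform sampling: X = Q(U) = U^(1/a), with U in (0, 1]."""
    return [(1.0 - rng.random()) ** (1.0 / a) for _ in range(n)]


def mle_power_function(data):
    """Closed-form MLE of the shape a for the stand-in law with pdf a*x^(a-1)."""
    return -len(data) / sum(math.log(x) for x in data)


def run_study(a=2.0, sizes=(50, 150, 250), reps=1000, seed=1):
    """Return {n: (average estimate, mean square error)} over the replications."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        estimates = [mle_power_function(sample_power_function(n, a, rng))
                     for _ in range(reps)]
        average = sum(estimates) / reps
        mse = sum((e - a) ** 2 for e in estimates) / reps
        results[n] = (average, mse)
    return results


if __name__ == "__main__":
    for n, (average, mse) in run_study().items():
        print(n, round(average, 4), round(mse, 5))
```

For the PL model, the sampler would use the Lambert-function quantile Q(u) and the estimator would be replaced by a numerical maximization of the log-likelihood over (α, β, δ).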

Applications
Here, we provide two applications to show the performance of the new model. We use some packages from the R software: AdequacyModel, GenSA and MASS. We use the goodness.fit function (from the first package) to obtain the W* (Cramér-von Mises) and A* (Anderson-Darling) statistics. Further, we obtain the MLEs and their respective standard errors using the CG method. The initial values are obtained with the Generalized Simulated Annealing function (from the second and third packages mentioned above). In order to illustrate the performance of the new model, we compare it with other well-known distributions, listed below. This comparison is important since the main point of this section is to show that our proposed model performs better than other models well known in the literature in terms of fitting the data sets in question. To do this, we choose some competitive families: beta-G, gamma-G, extended generalized-G and Kumaraswamy-G (with specific baselines in each application). These generators are widely used in applications and provide good fits in several cases. However, most of them have density functions involving complicated functions, which implies, for example, difficulties in obtaining some properties in closed form. On the other hand, the pdf of the proposed model does not involve any complicated function, and some of its mathematical properties are available in closed form.

First data set
For the first data set, we have "data on proportion of income spent on food for a random sample of 38 households in a large US city" (from the betareg package). We choose the variable "food", which indicates the household expenditures for food. Since these data are not in [0, 1], we modified them by dividing each observation by the number of observations in the data set. Table 4 gives some descriptive statistics. Table 5 provides the W* and A* statistics of the competitive models. We can conclude that the proposed model presents a good fit, even when competing with distributions having more parameters. Table 6 shows the MLEs and the respective standard errors (in parentheses). Note that we have no identifiability problems. Figure 4 shows the TTT plot and the empirical and fitted density functions. The TTT plot indicates an increasing hazard rate function, which supports the adequacy of the PL model for these data. We plot only the PL density and the PF density, since the latter is competitive with our proposed model. As discussed above, the PL model had the best performance.
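The TTT plot used above is based on the scaled total time on test transform of the ordered sample x_(1) ≤ ... ≤ x_(n), namely T(r/n) = (Σ_{i≤r} x_(i) + (n − r) x_(r)) / Σ_i x_(i). A curve that is concave and lies above the diagonal is commonly read as evidence of an increasing hazard rate. The sketch below computes the points of this curve; the function name and the small data vector are hypothetical and are not the food expenditure data.

```python
def scaled_ttt(data):
    """Return the points (r/n, T(r/n)) of the scaled TTT transform."""
    x = sorted(data)
    n = len(x)
    total = sum(x)
    points = [(0.0, 0.0)]
    running = 0.0
    for r, value in enumerate(x, start=1):
        running += value  # sum of the r smallest observations
        points.append((r / n, (running + (n - r) * value) / total))
    return points


if __name__ == "__main__":
    toy_sample = [0.21, 0.35, 0.38, 0.44, 0.52, 0.58, 0.63]  # illustrative values only
    for u, t in scaled_ttt(toy_sample):
        print(round(u, 3), round(t, 3))
```

Plotting these points against the diagonal from (0,0) to (1,1) gives the TTT plot.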

Second data set
Now, we fit the PL model to data on the total milk production in the first birth of 107 cows of the SINDI breed. The original data are not in the interval (0,1), and we must make the transformation These data can be found in [26], for example. Table 7 gives some descriptive statistics for the second data set. We note that these data have negative skewness and kurtosis. Table 8 gives the W* and A* statistics for the competitive distributions. We can notice again that our proposed model has the best fit. Besides that, we obtain the MLEs and the respective standard errors (in parentheses), shown in Table 9. Figure 5 shows the TTT plot and the empirical and fitted density functions for the second data set. In this case, again, the TTT plot indicates an increasing hrf. We plot only the PL density and the GW density, since the latter is competitive with our proposed model.

Conclusion and discussion
The power logarithmic distribution is introduced as a mixture of the power function and continuous logarithmic distributions. The main motivation for introducing this distribution is to propose a model with simple expressions that nevertheless offers the flexibility of more elaborate models. In addition, many of its expressions are available in closed form, which is a positive and important point when comparing it with other distributions that belong to families already known. We also know that proposing new distributions has been a challenge in the area, since much has already been developed. However, we understand that it is necessary to discuss distributions that are as simple as possible and that fit real data sets as well as other distributions do. Our proposal has these positive aspects and is, therefore, a good option for fitting such data. We hope that this proposal can motivate the introduction of other distributions like it. In this sense, statistical properties, estimation, the information matrix and characterizations of the new model are obtained. The new model has various shapes of density and failure rate functions, and it includes several submodels as special cases. A simulation study is performed, and its results agree with the asymptotic properties of the estimators. Finally, two applications are developed in order to illustrate the performance of the new model. In both applications, the PL model had the best performance.

Disclosure statement
No potential conflict of interest was reported by the author(s).