On discriminating between Libby-Novick generalized beta and Kumaraswamy distributions: theory and methods

In fitting continuous bounded data, the generalized beta (and several variants of this distribution) and the two-parameter Kumaraswamy (KW) distribution are the two most prominent univariate continuous distributions that come to mind. These two rival probability models share some common features, and selecting one of them in a practical situation can be of great interest. Consequently, in this paper, we discuss various methods of selection between the generalized beta proposed by Libby and Novick (1982) (LNGB) and the KW distribution, such as criteria based on the probability of correct selection, which are an improvement over the likelihood ratio statistic approach, as well as criteria based on pseudo-distance measures. We obtain an approximation for the probability of correct selection under the hypotheses H_LNGB and H_KW, and select the model that maximizes it. Our proposal is all the more appealing in that we provide the comparison for the LNGB distribution, which subsumes both types of classical beta and exponentiated generators (see, for details, Cordeiro et al. 2014; Libby and Novick 1982) and which can be a natural competitor of the two-parameter KW distribution in an appropriate scenario.


Introduction
Methods of discriminating between two or more probability models (in both the continuous and the discrete domains) are not new in the literature, and they have several useful applications, including, but not limited to, the design of experiments; see Atkinson (1970) and the references cited therein. Atkinson and Fedorov (1975) discussed a strategy in which a combined distribution containing the component models as special cases is investigated by constructing summary statistics based on the combined distribution and then formulating a test of departures from one model to the other. Balakrishnan and Ristić (2016) discussed the role of maximum entropy in selecting one of two candidate parent distributions, as a procedure for discriminating between two probability models. In this paper, we consider the problem of discriminating between two bounded (defined on (0, 1)) absolutely continuous probability models, namely a two-parameter Kumaraswamy distribution (see, for details, Kumaraswamy 1980) and a three-parameter Libby-Novick generalized beta distribution, which subsumes both types of classical beta distribution. Kumaraswamy argued that the beta distribution and its several generalizations do not faithfully fit hydrological random variables such as daily rainfall and daily stream flow. This motivates our current work, in which we discriminate between an LNGB and a KW distribution by computing the probability of correct selection. In addition, we adopt a new strategy for this purpose, based on several pseudo-distance measures, which appears to be more efficient.
It is interesting to note that, under certain parametric conditions, both of these probability models (i.e., KW and LNGB) reduce to a standard uniform distribution, while the LNGB distribution reduces to a beta (type-I) distribution if one of its shape parameters takes the value 1. Therefore, in a practical setting where the data points are, for example, proportions, it will be interesting to see when the estimated parameter values closely resemble the specific conditions under which both models reduce to either a uniform or a beta distribution and, in such a scenario, how efficiently one can distinguish between them. Next, we provide the mathematical expressions for the two probability models under study:
• The probability density function (pdf) of a KW distribution with two shape parameters α > 0 and δ > 0 is defined by

f_KW(x; α, δ) = αδ x^(α−1) (1 − x^α)^(δ−1), 0 < x < 1. (1)

• The pdf of a three-parameter LNGB distribution is given by

f_LNGB(x; a, b, β) = [β^a / B(a, b)] x^(a−1) (1 − x)^(b−1) {1 − (1 − β)x}^−(a+b), 0 < x < 1, (3)

where (a, b, β) ∈ R+ × R+ × R+, B(·, ·) denotes the beta function, β is the shape parameter, and (a, b) are two scale parameters.
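As a quick numerical companion to Eqs. (1) and (3), the two densities can be sketched as below. This is a minimal illustration in Python; the function names kw_pdf and lngb_pdf are our own, and the LNGB form assumes the Libby-Novick parameterization stated above.

```python
import math

def kw_pdf(x, alpha, delta):
    # Kumaraswamy density on (0, 1), Eq. (1):
    # f(x) = alpha * delta * x^(alpha-1) * (1 - x^alpha)^(delta-1)
    return alpha * delta * x ** (alpha - 1) * (1 - x ** alpha) ** (delta - 1)

def lngb_pdf(x, a, b, beta):
    # Libby-Novick generalized beta density, Eq. (3):
    # f(x) = beta^a * x^(a-1) * (1-x)^(b-1) / (B(a,b) * (1 - (1-beta)*x)^(a+b))
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)  # beta function B(a, b)
    return (beta ** a * x ** (a - 1) * (1 - x) ** (b - 1)
            / (B * (1 - (1 - beta) * x) ** (a + b)))
```

Note that both reductions mentioned above are visible numerically: with all parameters equal to 1 each density is the uniform density, and with β = 1 the LNGB density collapses to the classical Beta(a, b) density.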
Our main objective in this paper is to propose and discuss a probability-based selection criterion, together with a pseudo-divergence-based criterion that, to the best of our knowledge, has not been discussed before in the literature; both are utilized to discriminate between the two probability models given in Eqs. (1) and (3) within the framework of two non-nested statistical hypotheses. The proposed criterion is based on the asymptotic distribution of the likelihood ratio statistic proposed by Cox (1961, 1962). With this, we obtain the probability of correct selection under the null hypothesis that the data come from either the two-parameter KW or the three-parameter LNGB distribution, and we select the probability model that maximizes this probability of correct selection. The test statistic is the logarithm of the ratio of the maximized likelihoods under the null and alternative hypotheses. This statistic is compared with its expected value under the null hypothesis: small deviations from the expected mean provide evidence in favor of the null hypothesis, while large deviations provide evidence against it. Regularity conditions and a rigorous proof of the asymptotic normality of Cox's test statistic were provided by White (1982a). Several authors have worked on this topic; a non-exhaustive list includes Bain and Engelhardt (1980), Fearn and Nebenzahl (1991), Gupta and Kundu (2004), Kundu, Gupta, and Manglick (2005), Dey and Kundu (2012), Ristić et al. (2018), and Silva et al. (2014). Here, we additionally provide several pseudo-distance measures and a minimum sample size criterion, which constitute the new contribution to this topic. The remainder of the paper is organized as follows.
In Section 2, we discuss the strategy of discriminating between the two probability models given in Eqs. (1) and (3) based on the log-likelihood ratio statistic and its asymptotic null distribution. In Section 3, we discuss the probability of correct selection criterion built on the material developed in Section 2. Section 4 deals with a new strategy based on several pseudo-distance measures and the associated minimum sample size criterion. In Section 5, we provide a layout for undertaking a simulation study, and a small simulation study is presented to illustrate the feasibility of the proposed methodology. In Section 6, two real-life data sets are re-analyzed to illustrate the efficacy of the proposed methodology of probability of correct selection. Finally, some concluding remarks are made in Section 7.

Discrimination between the LNGB and the Kumaraswamy probability models
Let {X_i}_{i=1}^n be i.i.d. random variables with observed values X_1, ..., X_n from an LNGB(a, b, β) or a KW(α, δ) distribution with the respective densities. The log-likelihood function associated with the LNGB distribution is

l_LNGB(a, b, β) = n a log β − n log B(a, b) + (a − 1) Σ_{i=1}^n log X_i + (b − 1) Σ_{i=1}^n log(1 − X_i) − (a + b) Σ_{i=1}^n log{1 − (1 − β)X_i}. (5)

From Eq. (5), the MLEs a_n, b_n, β_n of (a, b, β) are obtained as the solutions of the associated system of non-linear likelihood equations, Eqs. (6)-(8), which involve the digamma function ψ(·). On the other hand, the log-likelihood function associated with the KW distribution is

l_KW(α, δ) = n log(αδ) + (α − 1) Σ_{i=1}^n log X_i + (δ − 1) Σ_{i=1}^n log(1 − X_i^α),

and the associated MLEs α_n, δ_n of (α, δ) satisfy the likelihood equations, Eqs. (11)-(12); in particular, δ_n = −n / Σ_{i=1}^n log(1 − X_i^(α_n)). Next, we define our test statistic as the difference between the two maximized log-likelihoods,

W_n = l_LNGB(a_n, b_n, β_n) − l_KW(α_n, δ_n),

on using Eqs. (6)-(8) and Eqs. (11)-(12), respectively. We then consider two cases separately: in each case, one of the LNGB and KW distributions is taken as the true population distribution, with the other as the distribution under the alternative hypothesis. In practical situations, one may consider the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) as the selection criterion. In this paper, however, we consider a different selection criterion, based on the asymptotic distribution of a normalized version of the test statistic W_n under the hypotheses H_LNGB and H_KW. The details are discussed in the next two subsections, where we derive the asymptotic null distribution of the test statistic in two specific scenarios: (i) when the true data distribution is the three-parameter LNGB; and (ii) when the true data distribution is the two-parameter KW.
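In practice, both maximized log-likelihoods must be obtained numerically. The sketch below assumes SciPy's Nelder-Mead optimizer and hypothetical starting values of 1 for every parameter; it is an illustration of computing W_n, not the authors' computational procedure (which solves the likelihood equations directly).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln

def kw_loglik(theta, x):
    # KW log-likelihood: sum of log f_KW(x_i; alpha, delta).
    alpha, delta = theta
    if alpha <= 0 or delta <= 0:
        return -np.inf                      # keep the optimizer in the valid region
    return np.sum(np.log(alpha * delta) + (alpha - 1) * np.log(x)
                  + (delta - 1) * np.log1p(-x ** alpha))

def lngb_loglik(theta, x):
    # LNGB log-likelihood: sum of log f_LNGB(x_i; a, b, beta).
    a, b, beta = theta
    if a <= 0 or b <= 0 or beta <= 0:
        return -np.inf
    return np.sum(a * np.log(beta) + (a - 1) * np.log(x) + (b - 1) * np.log1p(-x)
                  - betaln(a, b) - (a + b) * np.log1p(-(1 - beta) * x))

def w_statistic(x):
    # W_n = maximized LNGB log-likelihood minus maximized KW log-likelihood.
    kw_fit = minimize(lambda t: -kw_loglik(t, x), x0=[1.0, 1.0],
                      method="Nelder-Mead")
    lngb_fit = minimize(lambda t: -lngb_loglik(t, x), x0=[1.0, 1.0, 1.0],
                        method="Nelder-Mead")
    return -lngb_fit.fun + kw_fit.fun
```

A derivative-free optimizer is used here only because it keeps the sketch short; any method that solves Eqs. (6)-(8) and (11)-(12) yields the same W_n.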

Situation when the LNGB distribution is the null hypothesis
Here our goal is to find the asymptotic distribution of W_n under the null hypothesis H_LNGB against the alternative H_KW. We first provide some useful mathematical preliminaries. Let us suppose that X_1, ..., X_n ∼ LNGB(a, b, β). For any Borel measurable function h(·), the subscript LNGB in E_LNGB(h(X)) indicates that the expectation is taken under the LNGB law. Observe that under the null hypothesis H_LNGB, as n → ∞, (i) a_n → a, b_n → b, and β_n → β almost surely; and (ii) α_n → α and δ_n → δ almost surely, where the limits (α, δ) here denote the pseudo-true KW parameter values. These pseudo-true values are functions of a, b, and β; we do not simplify them further in order to keep the notation light. The above convergences follow from the results discussed in detail by White (1982b). Next, in order to present the asymptotic distribution of the test statistic W_n under H_LNGB, we need to compute the mean and variance of the per-observation log-likelihood ratio W = log f_LNGB(X; a, b, β) − log f_KW(X; α, δ) under the condition that X ∼ LNGB(a, b, β); these will be denoted by M_LNGB(a, b, β) and Var_LNGB(a, b, β), respectively, and the details of their derivations are given in Appendix A. We then have the following theorem, which gives the asymptotic null distribution of W_n under H_LNGB.
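The quantities M_LNGB and Var_LNGB can be approximated by simulation: draw a large LNGB sample, fit the KW model to approximate the pseudo-true (α, δ), and take the sample mean and variance of the per-observation log-ratio. The sketch below is our own illustration, not the Appendix A derivation; it assumes the representation that if Y ~ Beta(a, b), then X = Y / (β + (1 − β)Y) has the density in Eq. (3), and it uses a deliberately crude profile-likelihood grid fit for the KW parameters.

```python
import math
import random

def lngb_sample(n, a, b, beta, rng):
    # If Y ~ Beta(a, b), the change of variable x = y / (beta + (1 - beta) * y)
    # reproduces the LNGB density of Eq. (3).
    return [y / (beta + (1 - beta) * y)
            for y in (rng.betavariate(a, b) for _ in range(n))]

def kw_profile_fit(x):
    # Crude KW fit: for fixed alpha the MLE of delta is -n / sum(log(1 - x_i^alpha));
    # the profile log-likelihood is then maximized over a coarse alpha grid.
    n = len(x)
    slogx = sum(math.log(t) for t in x)
    best = None
    for k in range(1, 101):                 # alpha grid: 0.1, 0.2, ..., 10.0
        alpha = 0.1 * k
        s = sum(math.log1p(-t ** alpha) for t in x)
        delta = -n / s
        ll = n * math.log(alpha * delta) + (alpha - 1) * slogx + (delta - 1) * s
        if best is None or ll > best[0]:
            best = (ll, alpha, delta)
    return best[1], best[2]

def m_var_lngb(a, b, beta, n=5000, seed=2):
    # Monte Carlo approximation of M_LNGB and Var_LNGB: mean and variance of the
    # per-observation log-ratio W at the (approximate) pseudo-true KW parameters.
    rng = random.Random(seed)
    x = lngb_sample(n, a, b, beta, rng)
    alpha, delta = kw_profile_fit(x)
    logB = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    w = []
    for t in x:
        lf = (a * math.log(beta) + (a - 1) * math.log(t) + (b - 1) * math.log1p(-t)
              - logB - (a + b) * math.log1p(-(1 - beta) * t))
        lg = (math.log(alpha * delta) + (alpha - 1) * math.log(t)
              + (delta - 1) * math.log1p(-t ** alpha))
        w.append(lf - lg)
    m = sum(w) / n
    v = sum((wi - m) ** 2 for wi in w) / n
    return m, v
```

When LNGB(1, 1, 1) (i.e., the uniform law, which the KW family also contains) is used, the approximated M_LNGB is close to zero, as the discussion of overlapping submodels above suggests.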
Theorem 1. Under the null hypothesis H_LNGB, as n → ∞,

[W_n − n M_LNGB(a, b, β)] / √(n Var_LNGB(a, b, β)) → N(0, 1) in distribution.

Proof. From the Central Limit Theorem, it follows that [W_n − E_LNGB(W_n)] / √(Var_LNGB(W_n)) is asymptotically standard normal. Consequently, the remainder of the proof lies in showing the asymptotic equivalence of E_LNGB(W_n) and Var_LNGB(W_n) with n M_LNGB(a, b, β) and n Var_LNGB(a, b, β), respectively. This follows from an adaptation of the results presented in White (1982a). This completes the proof.
In Table 1, some representative values of M_LNGB(a, b, β) and Var_LNGB(a, b, β) are provided for specific choices of a and b and for some choices of β. These values are for illustrative purposes, showing how the mean and variance of the per-observation log-ratio W vary for different (given) choices of the parameters a, b, β and for the estimated values of α and δ obtained by the procedure described earlier.

Scenario when the Kumaraswamy distribution is the null hypothesis
Here, we suppose that X_1, ..., X_n ∼ KW(α, δ), the true probability model for the data. As before, we consider some pertinent preliminaries. Note that under the hypothesis H_KW, as n → ∞, (i) α_n → α and δ_n → δ almost surely; and (ii) a_n → ã, b_n → b̃, and β_n → β̃ almost surely, where (ã, b̃, β̃) are the pseudo-true LNGB parameter values. These quasi-maximum likelihood limits are functions of α and δ; we do not simplify them further in order to keep the notation light. The above convergences follow from the results discussed in detail by White (1982b). Next, in order to present the asymptotic distribution of the test statistic W_n under H_KW, we need to compute the mean and variance of the per-observation log-likelihood ratio W = log f_LNGB(X; ã, b̃, β̃) − log f_KW(X; α, δ) under the condition that X ∼ KW(α, δ); these will be denoted by M_KW(α, δ) and Var_KW(α, δ), respectively. For the details of their derivation, see Appendix B. The following theorem gives the asymptotic null distribution of W_n under H_KW.
Theorem 2. Under the null hypothesis H_KW, as n → ∞,

[W_n − n M_KW(α, δ)] / √(n Var_KW(α, δ)) → N(0, 1) in distribution. (16)

Proof. The proof is very similar to that of Theorem 1 and is therefore omitted for brevity.
In Table 2, some representative values of M_KW(α, δ) and Var_KW(α, δ) are listed for a fixed value of δ and some representative values of α. As before, these values are for illustrative purposes, showing how the mean and variance of the per-observation log-ratio W vary for different (given) choices of the parameters α and δ and for the estimated values of (ã, b̃, β̃) obtained by the procedure described earlier.
In the next two sections, we discuss the two distinct strategies, viz., the probability-based criterion and the pseudo-divergence and minimum sample size based criterion, for choosing between the two distributions considered in this paper.

Probability based selection criterion
Let us first present the asymptotic forms of the probabilities of correct selection (PCS), which for the two probability models are given by PCS_LNGB(a, b, β) ≡ P(W_n > 0) and PCS_KW(α, δ) ≡ P(W_n < 0), under the null hypotheses H_LNGB and H_KW, respectively; recall that W_n is the LNGB maximized log-likelihood minus the KW maximized log-likelihood, so large values favor the LNGB model. Next, consider the scenario in which the null and alternative hypotheses are H_LNGB and H_KW, respectively.
From the previous section, PCS_LNGB(a, b, β) can be approximated by

PCS_LNGB(a, b, β) ≈ Φ( √n M_LNGB(a, b, β) / √Var_LNGB(a, b, β) ), (17)

where Φ(·) is the cumulative distribution function of the standard normal distribution. Next, consider the case in which the null and alternative hypotheses are H_KW and H_LNGB, respectively. Then, based on the convergence in distribution given in Eq. (16), PCS_KW(α, δ) can be approximated by

PCS_KW(α, δ) ≈ Φ( −√n M_KW(α, δ) / √Var_KW(α, δ) ), (18)

where the expressions for M_KW(α, δ) and Var_KW(α, δ) are given in Appendix B.
Since the PCS expressions in Eqs. (17) and (18) depend on the unknown parameters, in practice we replace the parameters with their maximum likelihood estimators. Consequently, we define our selection criterion as follows: (i) if PCS_LNGB(a_n, b_n, β_n) < PCS_KW(α_n, δ_n), choose the KW distribution; otherwise, select the LNGB distribution, where the arguments are the maximum likelihood estimators of the respective parameters. (ii) Equivalently, if

√n M_LNGB(a_n, b_n, β_n) / √Var_LNGB(a_n, b_n, β_n) < −√n M_KW(α_n, δ_n) / √Var_KW(α_n, δ_n),

then select the KW distribution; otherwise, select the LNGB distribution.
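The decision rule can be sketched in a few lines of pure Python. The sketch assumes the sign convention used in this paper, i.e., that W_n is approximately N(nM, nVar) under each null, with M_LNGB > 0 under H_LNGB and M_KW < 0 under H_KW.

```python
import math

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pcs_lngb(n, m, v):
    # Eq. (17): PCS under H_LNGB, i.e. P(W_n > 0) with W_n approx N(n*m, n*v).
    return phi(math.sqrt(n) * m / math.sqrt(v))

def pcs_kw(n, m, v):
    # Eq. (18): PCS under H_KW, i.e. P(W_n < 0) with W_n approx N(n*m, n*v).
    return phi(-math.sqrt(n) * m / math.sqrt(v))

def select_model(n, m_lngb, v_lngb, m_kw, v_kw):
    # Rule (i): choose KW when its estimated PCS is the larger of the two
    # (in practice the m and v arguments are evaluated at the MLEs).
    return "KW" if pcs_lngb(n, m_lngb, v_lngb) < pcs_kw(n, m_kw, v_kw) else "LNGB"
```

Since Φ is strictly increasing, comparing the two PCS values is the same as comparing the two standardized means, which is exactly formulation (ii) above.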

Distances and minimum sample size criterion
In this section, we discuss a procedure to determine the minimum sample size required to discriminate between the LNGB and the KW distributions for a specified value of the PCS and a given tolerance level, where the tolerance level is defined in terms of some pseudo-distance measure of the closeness between the two distributions under study. Several distance (or pseudo-distance) measures are available in the literature for studying the proximity between two probability distributions; among them, the Kolmogorov-Smirnov (KS) and Hellinger distances are worth mentioning. For a detailed study on the use of pseudo-distance measures, see Cressie and Read (1984) and the references cited therein. Next, we briefly provide some useful preliminaries in this context. Let f and g be two absolutely continuous density functions with common support (0, 1) and distribution functions F and G, respectively. Then
• the Hellinger distance is given by

H(f, g) = [ (1/2) Integral over (0, 1) of ( √f(x) − √g(x) )^2 dx ]^(1/2).

Next, we provide the following lemmas, which give the expressions of the Hellinger and the power-divergence distances for the two distributions given in Eqs. (1) and (3).
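When a closed form is inconvenient, the Hellinger distance on (0, 1) can be approximated numerically. The sketch below (our own illustration, not the authors' Mathematica derivation) uses the trapezoidal rule on a fine grid that avoids the endpoints, where the densities may be unbounded.

```python
import math

def hellinger(f, g, eps=1e-6, n=20000):
    # Trapezoidal approximation of H(f, g) on (eps, 1 - eps):
    # H^2 = (1/2) * integral of (sqrt(f(x)) - sqrt(g(x)))^2 dx.
    h = (1.0 - 2.0 * eps) / n
    total = 0.0
    for i in range(n + 1):
        x = eps + i * h
        weight = 0.5 if i == 0 or i == n else 1.0   # trapezoid endpoint weights
        total += weight * (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2
    return math.sqrt(0.5 * total * h)

# Example densities: the uniform law and a KW(2, 2) law, per Eq. (1).
uniform = lambda x: 1.0
kw22 = lambda x: 2 * 2 * x * (1 - x ** 2)
```

For the example pair, H^2 = 1 − Integral of √(4x(1 − x^2)) dx ≈ 0.0415, so the distance is roughly 0.20; identical densities give a distance of zero, as expected.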
Lemma 1. The Hellinger distance between the LNGB and the KW distributions can be obtained in closed form, on using Mathematica.
Lemma 2. The expression for the power-divergence statistic between the LNGB and the KW distributions can likewise be obtained, on using Mathematica and after some algebraic simplification.
Observe that when the distance between two probability distributions is small, the minimum sample size required to discriminate between them is expected to be large; otherwise, a small or moderate sample size is sufficient. It is assumed that the user/practitioner specifies in advance the PCS and the tolerance level in terms of the distance between the KW and the LNGB distributions. When a tolerance level is specified (by means of some distance), the two distribution functions are not considered significantly different if their distance does not exceed the tolerance level. Based on a given value of the PCS and a given tolerance level, one can determine the minimum sample size required to discriminate between the two distributions. We are thus interested in finding the sample size n such that the PCS achieves a given protection level p for a stated tolerance level D_1. We explain the procedure under the null hypothesis that the true distribution is the two-parameter KW distribution; the procedure for the LNGB distribution is analogous and is omitted for brevity. To determine the sample size needed to attain at least the protection level p, we set PCS_KW(α, δ) = p. Hence, using the asymptotic result obtained in Eq. (18), we get

Φ( −√n M_KW(α, δ) / √Var_KW(α, δ) ) = p.

Then, solving for n, we have

n = z_p^2 Var_KW(α, δ) / [M_KW(α, δ)]^2, (24)

where z_p is the 100p-th percentile of the standard normal distribution.
Similarly, under the null hypothesis H_LNGB, using the result in Eq. (17), the minimum sample size requirement leads to

n = z_p^2 Var_LNGB(a, b, β) / [M_LNGB(a, b, β)]^2. (25)

Values of Eq. (24) for some representative values of α, with δ = 2.5 and p = 0.25, 0.55, 0.75, are provided in Table 3. Table 4 lists some values of Eq. (25) for some representative values of a, with b = 1.5 and β = 2.5. We now briefly discuss how to use the PCS and the tolerance level in a practical setting. Suppose that an experimenter is interested in discriminating between the LNGB and KW distributions, where the null hypothesis is H_KW. Further, suppose that the tolerance level is based on the power-divergence statistic and is fixed at 0.0211. Then, from Table 3, one needs a sample size of n ≥ 864 for p = 0.75 in order to discriminate between the two distributions. For a more accurate result under the hypothesis H_KW (or H_LNGB), a greater range of α (as well as of a and b) would be required.
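The sample-size rule of Eqs. (24)-(25) can be sketched as follows, with z_p obtained by bisection on the standard normal CDF; here m and v stand for the per-observation mean and variance of the log-likelihood ratio under the chosen null (M_KW and Var_KW, or M_LNGB and Var_LNGB).

```python
import math

def min_sample_size(p, m, v):
    # n >= z_p^2 * v / m^2, where z_p is the 100p-th percentile of N(0, 1).
    lo, hi = -10.0, 10.0
    for _ in range(200):                       # bisection for z_p on Phi(z) = p
        mid = 0.5 * (lo + hi)
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    return math.ceil(z * z * v / (m * m))
```

As the protection level p grows, z_p grows, and so does the required n; a smaller |m| (the two models closer together in Kullback-Leibler terms) likewise inflates n, matching the qualitative discussion above.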

Simulation study
We begin by providing an outline of the simulation study. We are interested in comparing the asymptotic PCS values under the null hypotheses H_LNGB and H_KW with the simulated probabilities based on Monte Carlo simulations. We start with the case where the null hypothesis is H_KW; a similar procedure holds when the null hypothesis is the three-parameter LNGB, and it is therefore omitted for brevity.
Let M be the number of loops of the Monte Carlo simulation and J = (J_1, ..., J_M)^T be a vector of length M. The steps, for each loop j, are as follows:
1. Generate a random sample of size n from the KW(α, δ) distribution.
2. Find the MLEs of (α, δ) and (a, b, β) based on the KW and LNGB distributions, respectively.
3. Compute the observed value of the test statistic W_n, and set J_j = 1 if the KW distribution is selected and J_j = 0 otherwise.
Then, at the completion of the Monte Carlo simulation, the simulated PCS will be (1/M) Σ_{j=1}^M J_j. We also compute the PCS based on the asymptotic results derived in Section 2; for the computations, the statistical software R is utilized (R Core Team 2019).
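The Monte Carlo loop above can be sketched as follows. This is a Python illustration (the paper's computations use R) assuming SciPy's Nelder-Mead optimizer for both fits and the sign convention that, under H_KW, the KW model is selected when W_n < 0.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln

def kw_ll(t, x):
    a, d = t
    if a <= 0 or d <= 0:
        return -np.inf
    return np.sum(np.log(a * d) + (a - 1) * np.log(x) + (d - 1) * np.log1p(-x ** a))

def lngb_ll(t, x):
    a, b, bet = t
    if a <= 0 or b <= 0 or bet <= 0:
        return -np.inf
    return np.sum(a * np.log(bet) + (a - 1) * np.log(x) + (b - 1) * np.log1p(-x)
                  - betaln(a, b) - (a + b) * np.log1p(-(1 - bet) * x))

def simulated_pcs_kw(alpha, delta, n, M=200, seed=42):
    # Steps 1-3 of the Monte Carlo loop under H_KW.
    rng = np.random.default_rng(seed)
    correct = 0
    for _ in range(M):
        u = rng.uniform(size=n)
        x = (1 - (1 - u) ** (1 / delta)) ** (1 / alpha)   # KW inverse-CDF sampling
        lk = -minimize(lambda t: -kw_ll(t, x), [1.0, 1.0],
                       method="Nelder-Mead").fun
        ll = -minimize(lambda t: -lngb_ll(t, x), [1.0, 1.0, 1.0],
                       method="Nelder-Mead").fun
        if ll - lk < 0:        # W_n < 0: KW selected (the correct choice here)
            correct += 1
    return correct / M
```

The returned proportion plays the role of (1/M) Σ J_j; comparing it with the asymptotic PCS of Eq. (18) reproduces the kind of agreement reported in Tables 5-6.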
From the values reported in Tables 5-6, it is quite clear that there is good agreement between the asymptotic and empirical probabilities, particularly for moderate and large sample sizes.
We also observe that, as α approaches 1, the PCS approaches 0.5. This was expected, since as α goes to 1 both the KW and LNGB models can accommodate the same law. Another expected result is that, as n increases, the PCS approaches one.
In Tables 7-8, we present the asymptotic and simulated PCS values under the null hypothesis that the true distribution of the data is the three-parameter LNGB, for a = 0.2, 0.5, 0.9, 1.5, 2.0, 3.0, 5.0 and n = 25, 40, 70, 85, 100, 150, 400, with fixed b = 1.25 and β = 1.5. In this case, we also observe good agreement between the asymptotic and empirical PCS values. When a is close to one, the PCS values are close to 0.5, and as n increases, the probabilities go to one, as expected and as discussed in the previous case.

Application 1: HIV data set
Here, we consider a Brazilian HIV data set to illustrate the feasibility of the methodology developed in Section 3. This data set was originally used by Louzada et al. (2012) and was re-analyzed by Arnold and Ghosh (2017). Using Eq. (17) with the maximum likelihood estimates, we obtain PCS_LNGB = 0.5520, while the corresponding value under H_KW is 0.0003608. Therefore, the probability of correct selection (based on the asymptotic results) is at least equal to min{0.0003608, 0.5520} = 0.0003608. Since the PCS is maximized under the hypothesis H_LNGB, we select the LNGB distribution. Based on the simulated PCS values, we reach the same conclusion.

Application 2: On modeling arthritic pain relief times data
We consider a data set from the medical field that has been analyzed previously. The data set reports results from a clinical trial performed to assess the efficacy of an analgesic. The data are the relief (relaxation) periods, in hours, of 50 arthritic patients taking a fixed dosage of a certain drug; for details on this data set, see Wingo (1983).

Concluding remarks
Discriminating between two absolutely continuous probability distributions is not new in the literature. However, not much work has been done on discriminating between two probability distributions for modeling continuous bounded data. In this article, we discuss and explore strategies for discriminating between the three-parameter generalized beta distribution proposed by Libby and Novick and the two-parameter Kumaraswamy distribution. We conjecture that a similar strategy can be adopted in higher dimensions, for example between a multivariate Kumaraswamy distribution and a multivariate generalized beta distribution; the main hindrance in pursuing this research is that the practitioner needs to find a real-life motivation.

Disclosure statement
The author does not have any conflict of interest in preparing this manuscript.

Table 3 .
Values of n and of the Hellinger and power-divergence distances between the KW(α, δ) and LNGB(a, b, β) distributions for δ = 2.5 and some values of α.

Table 4 .
Values of n and of the Hellinger and power-divergence distances between the KW(α, δ) and LNGB(a, b, β) distributions for b = 1.5 and β = 2.5 and some values of a.

Table 5 .
Asymptotic probability under the null hypothesis H KW .

Table 6 .
Empirical probability under the null hypothesis H KW .

Table 7 .
Asymptotic probability under the null hypothesis H LNGB .

Table 8 .
Empirical probability under the null hypothesis H LNGB .