Comparison of estimation methods for one-inflated positive Poisson distribution

This paper proposes estimation methods for the one-inflated positive Poisson (OIPP) distribution and compares their properties in terms of unbiasedness, consistency, efficiency and deficiency. All estimators considered in the study are asymptotically unbiased and consistent. The maximum likelihood estimator (MLE) for the OIPP distribution is asymptotically normal. When compared to the MLE, the ordinary least squares estimator (OLSE) is the most efficient, followed by the method of moments estimator (MME) and the ratio of probability estimator (RPE). A novel one-inflation index is also proposed to assess the presence of excess ones in a dataset relative to the positive Poisson distribution, and hence to determine whether a one-inflated distribution is required for model fitting. A real dataset with a large number of ones, as identified by the proposed one-inflation index, is used for model fitting. The OLSE and MLE are found to be the best estimators for the OIPP distribution.


Introduction
In discrete count data modelling, one way to model positive data is to truncate the distribution at zero, resulting in a zero-truncated distribution. Positive count data modelling can be traced back to the mid-twentieth century, when the first zero-truncated distribution, known as the zero-truncated Poisson distribution [1], was proposed. The probability mass function of the zero-truncated Poisson distribution is given as

$$\Pr(X = x \mid \lambda) = \frac{\lambda^{x}}{x!\,[\exp(\lambda) - 1]}; \quad x = 1, 2, 3, \ldots,$$

where $\lambda > 0$, and the maximum likelihood estimator (MLE) of $\lambda$ can be obtained numerically by solving

$$\bar{x} = \frac{\hat{\lambda}_1}{1 - \exp(-\hat{\lambda}_1)}. \quad (1)$$

Subsequently, other estimators for a zero-truncated Poisson distribution were developed and analysed [2][3][4][5]. An asymptotically unbiased estimator of $\lambda$ was proposed in [2] (equation (2)), where $n_x$ is the frequency of x-valued data and $r$ is the maximum value taken by $x$. Similarly, an efficient estimator of $\lambda$ was proposed in [3] (equation (3)). Both (1) and (3) were used to estimate the mortality rate of the number of gall-cells per flower-head [4], and it was found that both estimators provide similar estimates of the mortality rate. In addition to the estimators in (1)-(3), a minimum variance unbiased estimator of $\lambda$ was proposed [5] as follows:

$$\hat{\lambda}_4 = \begin{cases} \dfrac{t}{n}\,C(n, t); & 1 \le n \le t,\ 1 \le t \le 50 \ \text{or}\ n = t \ge 51 \ \text{or}\ n = 1,\ t \ge 51 \\ \dfrac{t}{n}\left[1 - \left(\dfrac{n-1}{n}\right)^{t-1}\right]; & 2 \le n \le 15,\ t \ge 51 \ \text{or}\ n \ge 16,\ t/n \\ \hat{\lambda}_1 \ \text{or}\ \hat{\lambda}_3; & \text{otherwise,} \end{cases} \quad (4)$$

where $t = n\bar{x}$, $C(n, t) = 1 - S_{t-1}^{n-1}/S_{t}^{n}$ and $S_{t}^{n}$ is the Stirling number of the second kind. The function used for this estimator depends on the subdomain, as shown in (4).
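As a small illustration of the zero-truncated Poisson model, the sketch below evaluates its probability mass function and solves the standard likelihood equation, sample mean = λ/(1 − exp(−λ)), by bisection. The function names and the sample are hypothetical and chosen only for this example; they do not come from the cited references.

```python
import math

def ztp_pmf(x, lam):
    """Zero-truncated Poisson pmf: lam**x / (x! * (exp(lam) - 1)), x = 1, 2, ..."""
    return lam ** x / (math.factorial(x) * math.expm1(lam))

def ztp_mle(xbar, iterations=200):
    """Solve xbar = lam / (1 - exp(-lam)) for lam by bisection."""
    if xbar <= 1.0:
        raise ValueError("sample mean must exceed 1 for a positive solution")

    def g(lam):
        # Model mean minus sample mean; increasing in lam.
        return lam / (-math.expm1(-lam)) - xbar

    lo, hi = 1e-12, xbar  # g(lo) < 0 < g(hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Made-up positive counts, for illustration only.
sample = [1, 1, 2, 1, 3, 2, 1, 4, 2, 1]
lam_hat = ztp_mle(sum(sample) / len(sample))
```

Bisection is used here because the model mean is increasing in λ, so the root is unique; any other one-dimensional root finder would serve equally well.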
The negative binomial distribution was modified to the zero-truncated negative binomial distribution as an alternative to the zero-truncated Poisson distribution. As a result, the moment estimator for the zero-truncated negative binomial distribution was proposed [6] and further modified, resulting in a simpler but more efficient estimator [7]. A zero-truncated distribution based on the Poisson-Lindley distribution likelihood was also proposed, leading to the development of estimation methods based on the method of moments and maximum likelihood [8].
Inflated models based on the zero-truncated distribution have been studied to accommodate a large number of ones in the dataset. The proposed distributions include the one-inflated positive Poisson [9], one-inflated zero-truncated negative binomial [10], one-inflated positive Poisson mixture [11] and one-inflated positive Poisson-Lindley [12] models, which are all commonly used to estimate the population size of individuals in capture-recapture experiments. The inflation parameter in the one-inflated model is critical for reflecting the desire and ability of a captured subject to avoid recapture [9]. In line with [9], the goal of our study is to propose multiple estimation methods for an OIPP distribution and to identify the best estimators by considering various estimator properties. An inflation index that can assess the presence of excess ones in the dataset was also developed and analysed.
This paper is structured as follows: Section 2 provides a brief overview of the OIPP distribution and its statistical properties. Section 3 proposes estimation methods for an OIPP distribution. Section 4 discusses two simulation studies that were conducted to investigate the performance of the proposed estimation methods in terms of unbiasedness, consistency, efficiency and deficiency. Section 5 proposes a novel one-inflation index to assess the presence of excess ones in a dataset for the positive Poisson distribution, in order to determine whether a one-inflated distribution is required for model fitting. The performance of the proposed index is also addressed in the same section. In Section 6, a dataset is fitted to the OIPP distribution using the proposed estimation methods. Section 7 concludes the study.

One-inflated positive Poisson (OIPP) distribution and its statistical properties
Let Y ∼ OIPP(ω, λ); then the probability mass function of Y is given in (5), where λ > 0 is the rate parameter and ω is the one-modification parameter of the OIPP distribution.

$$\Pr(Y = y) = \begin{cases} \omega + (1 - \omega)\,\dfrac{\lambda}{\exp(\lambda) - 1}; & y = 1 \\ (1 - \omega)\,\dfrac{\lambda^{y}}{y!\,[\exp(\lambda) - 1]}; & y \ge 2 \end{cases} \quad (5)$$

If ω > 0, then the distribution is known as an OIPP distribution [9]. If ω = 0, the OIPP distribution reduces to a zero-truncated Poisson distribution [1]. It is possible for ω to be negative, namely ω ∈ (−λ/[exp(λ) − 1 − λ], 0), in which case the distribution is known as a one-deflated positive Poisson distribution. In this study, we restrict ω > 0 to ensure the one-inflation property. The formulae for the first two moments about the origin, the dispersion index and the moment generating function are given in (6). Figure 1 illustrates the dispersion index for various pairs of parameters λ and ω. The heatmap in Figure 1 shows that the OIPP distribution can be either underdispersed or overdispersed.
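The moments can be checked numerically. The sketch below assumes the mixture form of the OIPP probability mass function, with mass ω + (1 − ω)λ/[exp(λ) − 1] at y = 1 and (1 − ω) times the zero-truncated Poisson mass for y ≥ 2. The closed forms in `oipp_moments` are derived from that assumption and are not copied from (6); the function names are hypothetical.

```python
import math

def oipp_pmf(y, omega, lam):
    """OIPP pmf: extra mass omega at y = 1 on top of a (1 - omega)-weighted ZTP."""
    ztp = lam ** y / (math.factorial(y) * math.expm1(lam))
    return omega + (1.0 - omega) * ztp if y == 1 else (1.0 - omega) * ztp

def oipp_moments(omega, lam):
    """Return (mean, second raw moment, dispersion index) in closed form."""
    ztp_m1 = lam * math.exp(lam) / math.expm1(lam)  # ZTP mean
    ztp_m2 = (lam + 1.0) * ztp_m1                   # ZTP second raw moment
    m1 = omega + (1.0 - omega) * ztp_m1
    m2 = omega + (1.0 - omega) * ztp_m2
    return m1, m2, (m2 - m1 ** 2) / m1

# Brute-force check of the mean by direct summation of y * Pr(Y = y).
m1, m2, d = oipp_moments(0.3, 2.0)
brute_mean = sum(y * oipp_pmf(y, 0.3, 2.0) for y in range(1, 80))

# One underdispersed and one overdispersed parameter pair.
d_under = oipp_moments(0.0, 1.0)[2]
d_over = oipp_moments(0.5, 5.0)[2]
```

The last two lines illustrate that the dispersion index can fall on either side of 1, depending on the parameter pair.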
For y ≥ 2, the OIPP distribution is unimodal, which can be deduced from the fact that the following ratio is a decreasing function of y:

$$\frac{\Pr(Y = y + 1)}{\Pr(Y = y)} = \frac{\lambda}{y + 1}; \quad y \ge 2.$$

Estimation of parameters for OIPP distribution
Several estimation methods for an OIPP distribution were developed based on the method of moments, maximum likelihood, one-proportion, ratio of probability and ordinary least squares. Fitting a model to a dataset based on its method of moments, maximum likelihood and/or ordinary least squares estimators is a common practice in statistical modelling. On the other hand, the one-proportion and ratio of probability estimators are rarely used.

Method of moments estimator (MME)
The method of moments estimator (MME) is obtained by equating sample moments with theoretical moments. By equating the first two sample moments with the first two equations in (6), the MMEs of λ and ω are obtained by solving the resulting pair of equations, where $m_k = \sum_{i=1}^{n} y_i^{k}/n$ is the kth sample moment of the n observations and the solutions are the respective MMEs of ω and λ.
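One way to solve the two moment equations is sketched below. It assumes the mixture form of the OIPP moments, E[Y] = ω + (1 − ω)μ₁ and E[Y²] = ω + (1 − ω)μ₂, where μ₁ and μ₂ are the zero-truncated Poisson moments. Under that assumption, ω cancels from the ratio (E[Y²] − 1)/(E[Y] − 1), which leaves a single equation in λ. The function names are hypothetical, and the bisection assumes a single root inside the bracket.

```python
import math

def ztp_mean(lam):
    """Mean of the zero-truncated Poisson distribution."""
    return lam * math.exp(lam) / math.expm1(lam)

def moment_ratio(lam):
    """(E[Y^2] - 1) / (E[Y] - 1) under OIPP; omega cancels from this ratio."""
    mu1 = ztp_mean(lam)
    mu2 = (lam + 1.0) * mu1
    return (mu2 - 1.0) / (mu1 - 1.0)

def oipp_mme(m1, m2, iterations=200):
    """Return (omega, lam) matching the first two sample moments m1 and m2."""
    if m1 <= 1.0:
        raise ValueError("sample mean must exceed 1")
    target = (m2 - 1.0) / (m1 - 1.0)
    lo, hi = 1e-4, 1.0
    if moment_ratio(lo) >= target:
        raise ValueError("no positive root for these sample moments")
    while moment_ratio(hi) < target:  # expand the bracket until it holds the root
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if moment_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    mu1 = ztp_mean(lam)
    omega = (mu1 - m1) / (mu1 - 1.0)  # from m1 = omega + (1 - omega) * mu1
    return omega, lam

# Made-up sample, for illustration only.
sample = [1, 1, 1, 2, 1, 3, 2, 1, 5, 1, 2, 4]
m1 = sum(sample) / len(sample)
m2 = sum(y * y for y in sample) / len(sample)
omega_mme, lam_mme = oipp_mme(m1, m2)
```

The estimate of ω is not constrained to be positive here; a negative value would point to one-deflation rather than one-inflation.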

Maximum likelihood estimation (MLE)
The log-likelihood function $l$ for a random variable Y that follows the OIPP distribution is given as

$$l = \sum_{y=1}^{\infty} n_y \log \Pr(Y = y),$$

where $n_y$ refers to the number of y-valued observations and $n = \sum_{y=1}^{\infty} n_y$. By differentiating $l$ with respect to ω and λ and setting the derivatives to zero, the respective MLEs of ω and λ can be obtained by solving

$$\hat{\omega} = \frac{n_1[\exp(\hat{\lambda}) - 1] - n\hat{\lambda}}{n[\exp(\hat{\lambda}) - 1 - \hat{\lambda}]}$$

and

$$A = \frac{\hat{\lambda}[\exp(\hat{\lambda}) - 1]}{\exp(\hat{\lambda}) - 1 - \hat{\lambda}},$$

where $A = (n m_1 - n_1)/(n - n_1)$, and $\hat{\omega}$ and $\hat{\lambda}$ are the respective MLEs of ω and λ.

One-proportion estimator (OPE)
The one-proportion estimator (OPE) extends an estimator for the generalized negative binomial distribution. The OPE for the generalized negative binomial distribution can be obtained by comparing the first two sample moments and the sample proportion of zeros to the population proportion [13]. Similarly, the zero-inflated Poisson distribution parameter estimator is obtained by equating the empirical probability with the theoretical probability of zero-valued observations [14]. For the OIPP distribution, the OPE of ω is obtained by equating the theoretical proportion of ones to the sample proportion of ones, while the OPE of λ is obtained by equating the population mean to the sample mean. Surprisingly, the OPEs of ω and λ are identical to their respective MLEs, because the likelihood equations reduce to matching the proportion of ones and the mean of the observations greater than one.
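The likelihood equations can be solved with a one-dimensional root search. The sketch below assumes the mixture form of the OIPP probability mass function, under which the fitted proportion of ones equals n₁/n and the fitted mean of the observations greater than one equals A. The function name is hypothetical.

```python
import math

def oipp_mle(data, iterations=200):
    """Return (omega_hat, lam_hat) for a list of positive counts."""
    n = len(data)
    n1 = sum(1 for y in data if y == 1)
    if n1 == n:
        raise ValueError("all observations equal 1")
    a = (sum(data) - n1) / (n - n1)  # A = (n*m1 - n1) / (n - n1)
    if a <= 2.0:
        raise ValueError("need at least one observation greater than 2")

    def h(lam):
        # Mean of a Poisson variable truncated to {2, 3, ...}, minus A.
        e = math.expm1(lam)
        return lam * e / (e - lam) - a

    lo, hi = 1e-6, a  # h(lo) is about 2 - A < 0, and h(A) > 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    q = lam / math.expm1(lam)         # ZTP probability of a one
    omega = (n1 / n - q) / (1.0 - q)  # match the sample proportion of ones
    return omega, lam

# Made-up sample, for illustration only.
omega_hat, lam_hat = oipp_mle([1] * 6 + [2, 2, 3])
```

Because the estimate of λ depends only on the observations greater than one, the search is one-dimensional and the estimate of ω then follows in closed form.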

Theorem 1:
The MLE $\hat{\lambda}$ of λ is consistent and asymptotically normal.
Proof: The regularity conditions under which the MLE $\hat{\lambda}$ is consistent and asymptotically normal are satisfied by the OIPP distribution (see [15, Chapter 6]). The asymptotic variance is derived from $f(y) = \Pr(Y = y)$ together with two auxiliary quantities $P$ and $Q$, by solving the summation in $I(y)$.
From Theorem 1, the asymptotic $100(1 - \alpha)\%$ confidence interval of λ is given by $\hat{\lambda}$ plus or minus $z_{\alpha/2}$ times its asymptotic standard error.

Theorem 2:
The MLE $\hat{\omega}$ of ω is consistent and asymptotically normal.
Proof: The regularity conditions under which the MLE $\hat{\omega}$ is consistent and asymptotically normal are satisfied by the OIPP distribution (see [15, Chapter 6]). The asymptotic variance is obtained by substituting the auxiliary quantity $R$ into the derivation.
Theorem 2 implies that the asymptotic $100(1 - \alpha)\%$ confidence interval of ω is given by $\hat{\omega}$ plus or minus $z_{\alpha/2}$ times its asymptotic standard error.

Ratio of probability estimator (RPE)
The ratio of probability estimator (RPE) extends an estimator for the generalized negative binomial distribution that can be obtained either by equating the ratio of one-valued observations to two-valued observations or by equating the first two theoretical moments with the sample moments [13]. The same method has been employed to obtain the parameter estimators for the zero-inflated Poisson-Lindley distribution [16]. Our study builds on [13,16]: the respective estimators of λ and ω are obtained by considering the ratio of the probabilities of 3-valued to 2-valued observations and by equating the population mean with the sample mean. The ratio of probabilities eliminates ω, so that λ is easily obtained from

$$\frac{\Pr(Y = 3)}{\Pr(Y = 2)} = \frac{\lambda}{3}.$$

By further equating λ/3 to the empirical ratio $n_3/n_2$, the resulting RPE of λ is given as

$$\hat{\lambda}_{\mathrm{RPE}} = \frac{3 n_3}{n_2},$$

which can only be obtained when both $n_2$ and $n_3$ are greater than zero. The formula is simple enough to be evaluated by hand.
The calculation of the RPE of ω is similar to the calculation of the MME of ω. Notice that the RPE of λ is a special case of the general form of the Zelterman estimator studied by Böhning [17], given as $(i + 1) n_{i+1}/n_i$, where i = 2. Since the OIPP distribution is an extended zero-truncated Poisson distribution, it is only appropriate to use the first approach given by Böhning [17]. This entails truncating the Poisson model at all counts except 2 and 3, resulting in the following log-likelihood function:

$$l(\lambda) = n_2 \log\frac{3}{3 + \lambda} + n_3 \log\frac{\lambda}{3 + \lambda},$$

with maximum likelihood estimate $3 n_3/n_2$, which coincides with the RPE of λ, denoted here as $\hat{\lambda}_{\mathrm{RPE}}$ (refer to [17] for a detailed explanation). Based on the findings in [17], the variance of the RPE of λ, which is a special case of the Zelterman estimator, can be estimated by substituting λ with $\hat{\lambda}_{\mathrm{RPE}}$, resulting in

$$\widehat{\mathrm{Var}}(\hat{\lambda}_{\mathrm{RPE}}) = \frac{9 n_3 (n_2 + n_3)}{n_2^{3}}.$$

Identical results can be obtained using the second approach given by Böhning [17], which considers the nonparametric multinomial approach. Based on the variance and the estimated variance formulae, it can be concluded that the RPE of λ is consistent, since $n_i$ increases as n increases. Also, the one-inflation and unimodality properties indicate that $n_2 > n_3$, so that both $\mathrm{Var}(\hat{\lambda}_{\mathrm{RPE}})$ and $\widehat{\mathrm{Var}}(\hat{\lambda}_{\mathrm{RPE}})$ tend to zero as n increases.
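The RPE is simple enough to compute directly from the frequencies of twos and threes. The sketch below also evaluates the estimated variance 9n₃(n₂ + n₃)/n₂³ and obtains ω by matching the sample mean, assuming the mixture form E[Y] = ω + (1 − ω)μ₁, where μ₁ is the zero-truncated Poisson mean. The function name and the sample are hypothetical.

```python
import math

def oipp_rpe(data):
    """Return (omega, lam, estimated variance of lam) from the ratio n3 / n2."""
    n = len(data)
    n2 = sum(1 for y in data if y == 2)
    n3 = sum(1 for y in data if y == 3)
    if n2 == 0 or n3 == 0:
        raise ValueError("the RPE needs at least one 2 and one 3 in the data")
    lam = 3.0 * n3 / n2
    var_lam = 9.0 * n3 * (n2 + n3) / n2 ** 3
    m1 = sum(data) / n
    mu1 = lam * math.exp(lam) / math.expm1(lam)  # ZTP mean at the estimated lam
    omega = (mu1 - m1) / (mu1 - 1.0)             # match the sample mean
    return omega, lam, var_lam

# Made-up sample, for illustration only.
omega_rpe, lam_rpe, var_rpe = oipp_rpe([1] * 5 + [2] * 6 + [3] * 4)
```

Only two cells of the frequency table drive the estimate of λ, which is consistent with the comparatively large variance reported for this estimator in the simulation study.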

Ordinary least squares estimator (OLSE)
The ordinary least squares estimator (OLSE) is an estimator that minimizes a function of the squared difference between the theoretical and empirical cumulative distribution functions. Suppose $y_{(1)} \le y_{(2)} \le \ldots \le y_{(n)}$ are the order statistics of data that follow the OIPP distribution.
However, for count data, the expected value of the order statistics is best expressed with respect to the data frequency, such that $E[F(Y_{n_j})] = \sum_{j=1}^{y} n_j/(n + 1)$. The OLSEs of ω and λ can be obtained by minimizing the function $S_y(\omega, \lambda)$, where $I(\cdot)$ is an indicator function and $n_y$ is the frequency of y-valued data. The estimators $\omega^{*}$ and $\lambda^{*}$ that give the smallest $S_y(\omega^{*}, \lambda^{*})$ are the OLSEs of ω and λ, respectively. The resulting estimators have the invariance property, since the OLSE is a special case of minimum-distance estimation [18].
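A least squares fit of this kind can be sketched as follows. The objective used here is an assumption: the sum, over the observed values y, of the squared difference between the OIPP cumulative distribution function F(y) and the cumulative frequency up to y divided by n + 1. The paper's exact objective may differ. The sketch assumes the mixture form F(y) = G(y) + ω[1 − G(y)], with G the zero-truncated Poisson cumulative distribution function, and restricts ω to [0, 0.999]. A plain grid search over λ stands in for a proper optimizer, and all names are hypothetical.

```python
import math
from collections import Counter

def ztp_cdf(lam, ymax):
    """Return [G(1), ..., G(ymax)], the zero-truncated Poisson cdf values."""
    values, total, term = [], 0.0, lam / math.expm1(lam)  # term = Pr(X = 1)
    for y in range(1, ymax + 1):
        total += term
        values.append(total)
        term *= lam / (y + 1)
    return values

def profile(lam, targets):
    """Best omega for a fixed lam, and the resulting sum of squares.

    F(y) is linear in omega, so the inner minimisation has a closed form.
    """
    G = ztp_cdf(lam, max(targets))
    num = sum((e - G[y - 1]) * (1.0 - G[y - 1]) for y, e in targets.items())
    den = sum((1.0 - G[y - 1]) ** 2 for y in targets)
    omega = min(max(num / den, 0.0), 0.999) if den > 0.0 else 0.0
    sse = sum((G[y - 1] + omega * (1.0 - G[y - 1]) - e) ** 2
              for y, e in targets.items())
    return sse, omega

def oipp_olse(data):
    """Return (omega, lam) minimising the assumed least squares objective."""
    n, counts = len(data), Counter(data)
    targets, cum = {}, 0
    for y in sorted(counts):  # only observed values enter the sum
        cum += counts[y]
        targets[y] = cum / (n + 1)
    coarse = [0.05 * k for k in range(1, 20 * max(counts) + 1)]
    lam0 = min(coarse, key=lambda v: profile(v, targets)[0])
    fine = [lam0 - 0.05 + 0.001 * k for k in range(101)]
    lam = min((v for v in fine if v > 0.0), key=lambda v: profile(v, targets)[0])
    return profile(lam, targets)[1], lam

# Made-up sample, for illustration only.
omega_ols, lam_ols = oipp_olse([1] * 12 + [2] * 5 + [3] * 2 + [4])
```

Profiling ω out reduces the problem to a one-dimensional search over λ, which keeps the sketch short; a two-parameter numerical optimizer would be the more usual choice in practice.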

Simulation study
Two simulation studies were conducted to evaluate the performance of the proposed estimators in terms of unbiasedness, consistency and efficiency. Both simulation studies used λ > 1 to ensure that n 2 > 0 and n 3 > 0, so that the RPEs of ω and λ can be obtained.

Unbiasedness and consistency properties of the estimators
The first simulation study was conducted to evaluate the unbiasedness and consistency of the proposed estimators. The setting for the first simulation study is as follows:
Simulation setting:
Step 1. Generate n = 100, 200, . . . , 1000 random observations from a positive Poisson distribution with parameter λ, where λ = 1.2, 2.0, for inflation parameter ω = 0.1, 0.3.
Step 2. Set ωn randomly chosen observations to 1 to ensure the one-inflation property.
Step 4. Repeat Steps 1-3 a total of 2000 times, and calculate the bias $\mathrm{Bias}(\hat{\delta})$ and the mean squared error $\mathrm{MSE}(\hat{\delta})$ for parameter δ using the following respective formulae:

$$\mathrm{Bias}(\hat{\delta}) = \frac{1}{2000}\sum_{i=1}^{2000} (\hat{\delta}_i - \delta), \qquad \mathrm{MSE}(\hat{\delta}) = \frac{1}{2000}\sum_{i=1}^{2000} (\hat{\delta}_i - \delta)^2,$$

where $\hat{\delta}_i$ is the estimate from the ith replication.
(iii) The estimators with the lowest to highest mean squared error for any given sample size are MLE, OLSE, MME and RPE.
In short, the proposed estimators are asymptotically unbiased and consistent for all values of the parameters. The RPE has a significantly larger bias and mean squared error than the other estimators, but it may still provide a good model fit for very large samples (n > 1000). The best to worst estimators in terms of unbiasedness and consistency are MLE, OLSE, MME and RPE.
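The simulation design can be sketched in a few lines. The version below is an illustration under stated assumptions rather than the study's actual code. It draws zero-truncated Poisson variates by inversion, sets round(ωn) randomly chosen observations to 1, estimates the parameters with the MLE only (assuming the mixture form of the OIPP distribution), and uses far fewer replications than the 2000 of the study. All names are hypothetical.

```python
import math
import random

def rztp(lam, rng):
    """Draw one zero-truncated Poisson variate by cdf inversion."""
    u = rng.random()
    y, p = 1, lam / math.expm1(lam)
    cdf = p
    while u > cdf and y < 1000:
        y += 1
        p *= lam / y
        cdf += p
    return y

def simulate_oipp(n, omega, lam, rng):
    """ZTP sample with round(omega * n) randomly chosen values set to 1."""
    data = [rztp(lam, rng) for _ in range(n)]
    for i in rng.sample(range(n), round(omega * n)):
        data[i] = 1
    return data

def mle(data):
    """OIPP MLE: match the proportion of ones and the mean of values above 1."""
    n = len(data)
    n1 = sum(1 for y in data if y == 1)
    if n1 == n:
        raise ValueError("all observations equal 1")
    a = (sum(data) - n1) / (n - n1)
    if a <= 2.0:
        raise ValueError("need an observation greater than 2")
    lo, hi = 1e-6, a
    for _ in range(200):  # bisection on the truncated-mean equation
        mid = 0.5 * (lo + hi)
        e = math.expm1(mid)
        if mid * e / (e - mid) < a:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    q = lam / math.expm1(lam)
    return (n1 / n - q) / (1.0 - q), lam

def bias_and_mse(n, omega, lam, reps=200, seed=1):
    """Monte Carlo bias and MSE of the MLE for omega and lam."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < reps:
        try:
            estimates.append(mle(simulate_oipp(n, omega, lam, rng)))
        except ValueError:
            continue  # skip degenerate samples
    result = {}
    for name, idx, truth in (("omega", 0, omega), ("lam", 1, lam)):
        errors = [est[idx] - truth for est in estimates]
        result[name] = (sum(errors) / reps, sum(e * e for e in errors) / reps)
    return result

if __name__ == "__main__":
    print(bias_and_mse(n=500, omega=0.3, lam=2.0, reps=100))
```

Running the function over a grid of sample sizes and plotting the bias and MSE against n would reproduce the shape of such a study for one estimator; the other estimators can be plugged in by replacing `mle`.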

Efficiency property of the estimators
The second simulation study was conducted to evaluate the efficiency of the proposed estimators compared to the efficiency of the MLE. The setting for the second simulation study is as follows:
Simulation setting:
Step 1. Generate n = 1000 random observations from the positive Poisson distribution with parameter λ, for a given inflation parameter ω.
Step 2. Set ωn randomly chosen observations to 1 to ensure the one-inflation property.
Step 4. Repeat Steps 1-3 a total of 2000 times, and calculate the efficiency of each estimator relative to the MLE from the ratio of the variances of the two estimators, where the estimator being compared is the MME, RPE or OLSE of δ and the reference estimator is the MLE of δ. Since the asymptotically unbiased property has been established and the mean squared error is the sum of the variance and the squared bias, the efficiency of the estimators can be calculated using the corresponding ratio of mean squared errors.
It can be concluded that the OLSE is almost as efficient as the MLE but more efficient than the MME and RPE. Also, the MME is significantly more efficient than the RPE.
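The decomposition behind this calculation, MSE = variance + bias², and the resulting efficiency ratio can be illustrated as follows. The direction of the ratio (MSE of the MLE divided by MSE of the competing estimator) is an assumption made for this sketch, and the replicate estimates are made-up numbers.

```python
def bias_variance_mse(estimates, truth):
    """Return (bias, variance, MSE) of a list of replicate estimates."""
    reps = len(estimates)
    mean = sum(estimates) / reps
    bias = mean - truth
    variance = sum((e - mean) ** 2 for e in estimates) / reps
    mse = sum((e - truth) ** 2 for e in estimates) / reps
    return bias, variance, mse

def relative_efficiency(mse_mle, mse_other):
    """Assumed convention: values below 1 mean the other estimator is less efficient."""
    return mse_mle / mse_other

# Made-up replicate estimates of lam = 2.0, for illustration only.
mle_estimates = [1.95, 2.04, 2.01, 1.98, 2.06]
other_estimates = [1.80, 2.25, 2.10, 1.85, 2.30]
_, _, mse_mle = bias_variance_mse(mle_estimates, 2.0)
_, _, mse_other = bias_variance_mse(other_estimates, 2.0)
efficiency = relative_efficiency(mse_mle, mse_other)
```

When the bias is negligible, the MSE ratio and the variance ratio nearly coincide, which is why one can stand in for the other.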
Another way to evaluate estimator performance is by calculating the joint efficiencies of the estimators for the two parameters using the deficiency criterion [19], which is defined in terms of the estimators of λ and ω. Figure 8 shows that the joint efficiency decreases as the sample size grows for given λ and ω. The MLE, MME and OLSE have similar joint efficiencies, whereas the RPE has a substantially larger joint efficiency, even when the sample size is as large as 1000. According to the deficiency criterion, the MLE is the most efficient estimator. The estimators with the best to worst efficiency are MLE, OLSE, MME and RPE. These findings corroborate the findings in Figure 6 and

One-inflation index under positive Poisson distribution
A novel index called the one-inflation index was developed to determine whether a one-inflated distribution is required to model a dataset. By definition, a k-inflation index is an index that assesses the presence of k-valued data with respect to a certain distribution. For instance, the zero-inflation index, denoted as $zi_p$, assesses the presence of excess zeros in a dataset for a Poisson distribution [20,21]. Another zero-inflation index, denoted as $zi_{nb}$, was introduced to assess the presence of excess zeros in the dataset for a negative binomial distribution [22]. The formulae of $zi_p$ and $zi_{nb}$ are given in terms of $p_0$, the proportion of zeros, $\mu$, the mean, and $\sigma^2$, the variance. A sample zero-inflation index greater than zero indicates that there are excess zeros in the dataset.
Following the works in [20][21][22], a new one-inflation index is proposed to assess the presence of excess ones in the dataset for a positive Poisson distribution. The one-inflation index is denoted as $oi_{PP}$ and is defined in terms of $p_1$, the proportion of ones, and $d$, the dispersion index. If a random variable follows the positive Poisson distribution, then $oi_{PP} = 0$, while $oi_{PP} > 0$ indicates that the dataset contains more ones than the positive Poisson distribution can explain. The presence of excess ones in the sample data can be determined by computing the sample one-inflation index.
As an example, datasets of size 5000 that follow the OIPP distribution were simulated, and the sample one-inflation indices $\widehat{oi}_{PP}$ for a positive Poisson distribution are shown in Table 1. According to Table 1, the index correctly assesses the presence of excess ones in the sample data. When ω = 0, the index is very close to zero, which indicates that there is no excess of ones in the data, since the OIPP distribution then reduces to a positive Poisson distribution. Trivially, a smaller λ yields a higher proportion of ones in the positive Poisson distribution and a lower proportion of ones contributed by ω in the OIPP distribution. Therefore, the resulting index will be comparatively low (refer to Table 1 for λ = 0.5 and all values of ω). However, when λ is large, the proportion of ones in the positive Poisson distribution will be small, while the proportion of ones contributed by ω in the OIPP distribution will be large. This results in a high index value (refer to Table 1 for λ = 2.5 and positive ω). It can be concluded that the one-inflation index is useful to assess the presence of excess ones in a dataset for a positive Poisson distribution.
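The sketch below does not implement the one-inflation index itself. It only checks numerically a property that an index built from $p_1$ and $d$ can exploit: under a positive Poisson distribution the proportion of ones and the dispersion index satisfy $p_1 + d - 1 = 0$, whereas an OIPP distribution with ω > 0 (assuming the mixture form of its probability mass function) breaks this identity in the examples tried. The quantity `gap` is a hypothetical diagnostic introduced for this illustration and should not be read as the paper's $oi_{PP}$.

```python
import math

def p1_and_dispersion(omega, lam):
    """Proportion of ones and dispersion index under OIPP (omega = 0 gives ZTP)."""
    ztp_p1 = lam / math.expm1(lam)
    ztp_m1 = lam * math.exp(lam) / math.expm1(lam)
    m1 = omega + (1.0 - omega) * ztp_m1
    m2 = omega + (1.0 - omega) * (lam + 1.0) * ztp_m1
    p1 = omega + (1.0 - omega) * ztp_p1
    return p1, (m2 - m1 ** 2) / m1

def gap(omega, lam):
    """Hypothetical diagnostic p1 + d - 1; zero under the positive Poisson model."""
    p1, d = p1_and_dispersion(omega, lam)
    return p1 + d - 1.0

if __name__ == "__main__":
    for lam in (0.5, 1.0, 2.5):
        for omega in (0.0, 0.2, 0.4):
            print(lam, omega, round(gap(omega, lam), 4))
```

The identity follows because, for the positive Poisson distribution with mean μ, the proportion of ones equals μ − λ and the dispersion index equals 1 + λ − μ.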
This one-inflation index can be a viable alternative to the score test proposed by Godwin and Böhning [9], in which the null hypothesis states that there is no inflation in the data. The score test statistic is given in [9].
Whether or not a dataset contains excess ones needs to be considered in statistical modelling. If the dataset contains excess ones, then one-inflated positive count data distributions should be considered.

Applications
A dataset on the frequency of a person being arrested for drunk driving [23] was used to demonstrate and analyse the performance of the proposed estimators in fitting real data. The sample one-inflation index of 0.1545 implies that the dataset contains a substantial number of ones that cannot be explained by a positive Poisson distribution. Fitting a positive Poisson distribution to the data yields $\hat{\lambda} = 0.1275$. The estimated value of λ generates a score s = 30.6686, hence rejecting the null hypothesis of no inflation. Therefore, the OIPP distribution is appropriate for model fitting for this dataset. The chi-square goodness-of-fit test and the root mean squared error (RMSE) of the fitted data were used to evaluate the proposed estimators, in which $\mathrm{RMSE} = \sqrt{\sum_x (n_x - \hat{n}_x)^2 / h}$, where h is the number of data groups and $\hat{n}_x$ is the estimated frequency for the respective $n_x$. In general, the best estimator provides adequate goodness of fit and has the lowest RMSE.
The results of the model fitting for the frequency of a person being arrested for drunk driving based on the proposed estimators are shown in Table 2. Based on the goodness-of-fit test and p-values, all estimators are found to be adequate except for the RPE. The RMSE of the OLSE is found to be significantly smaller than that of the other estimators. Therefore, the OLSE provides the best fit for describing the frequency of a person being arrested for drunk driving.
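The two fit measures can be computed as sketched below. The observed frequencies and the parameter values are made-up numbers, not the drunk-driving data or the fitted values from Table 2. The expected frequencies assume the mixture form of the OIPP probability mass function with the last cell pooled as "kmax or more", and the function names are hypothetical. The p-value of the chi-square test is omitted because it needs a chi-square distribution function that the Python standard library does not provide.

```python
import math

def oipp_pmf(y, omega, lam):
    """OIPP pmf under the assumed mixture form."""
    ztp = lam ** y / (math.factorial(y) * math.expm1(lam))
    return omega + (1.0 - omega) * ztp if y == 1 else (1.0 - omega) * ztp

def expected_counts(n, kmax, omega, lam):
    """Expected frequencies for cells 1, ..., kmax - 1 and a pooled cell 'kmax or more'."""
    probs = [oipp_pmf(y, omega, lam) for y in range(1, kmax)]
    probs.append(1.0 - sum(probs))
    return [n * p for p in probs]

def chi_square_and_rmse(observed, expected):
    """Pearson chi-square statistic and RMSE over the h data groups."""
    h = len(observed)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    rmse = math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, expected)) / h)
    return chi2, rmse

# Hypothetical frequencies for y = 1, 2, 3 and 4 or more.
observed = [410, 60, 22, 8]
expected = expected_counts(sum(observed), 4, omega=0.6, lam=1.2)
chi2, rmse = chi_square_and_rmse(observed, expected)
```

Comparing `rmse` across estimators for the same grouping mirrors the comparison in Table 2, since h is then the same for every fitted model.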

Conclusions
Several estimators for the OIPP distribution were proposed in this study, namely the MME, MLE, RPE, OPE and OLSE, of which the MLE and OPE yield identical results. Two comprehensive simulation studies were conducted to evaluate the unbiasedness, consistency and efficiency of the proposed estimators. According to the results, the proposed estimators are asymptotically unbiased and consistent. The best to worst estimators based on the bias, mean squared error, efficiency and deficiency values are MLE, OLSE, MME and RPE. A one-inflation index to assess the presence of excess ones in a dataset for a positive Poisson distribution was also proposed. Based on the chi-square goodness-of-fit test and RMSE, the OLSE provides the best fit to the dataset adopted in this study. Therefore, both the MLE and OLSE are the best estimators for the OIPP distribution.