A discussion of ‘A selective review on calibration information from similar studies’

Being a long-time friend of Dr. Qin and having served as the supervisor of Drs. Li and Liu, I am as proud as the authors of the richness of the content and the broadness of this paper. It helps me play catch-up, and it shames me into working hard rather than hardly working. As a discussant, I wish I could offer some additional insight on this research topic, but that is a very difficult task: I should congratulate the authors for covering such a vast territory that little room is left for it. Instead, I raise two not-so-important technical issues which might be of interest to some fellow researchers.


Combination of the empirical likelihood and estimating function
Likelihood-based approaches have many easy-to-utilize properties for statistical inference. Under regularity conditions on a parametric statistical model and with a set of independent and identically distributed observations, the maximum likelihood estimator of the model parameter is asymptotically normal and has the lowest possible asymptotic covariance matrix, the likelihood ratio test of certain hypotheses generally has superior power, and the test statistic has a convenient chi-squared limiting distribution. These statements are granted, however, by implicitly assuming that users can come up with a defensible parametric model for the data they have collected. In many applications, users are reluctant to put forward a parametric model for various reasons. Can there be a nonparametric likelihood that possesses most of the useful properties of the parametric likelihood? The groundbreaking work of Owen (1988, 1990) on the empirical likelihood (EL) confirms this possibility. His initial developments focus on inference for parameters that can be regarded as population means or regression parameters. Qin and Lawless (1995) greatly empower the EL by introducing estimating equations (EEs) under this umbrella. Together, these works form the basis of a new branch of mathematical statistics.
Presented slightly differently from the paper being discussed, suppose the data-generating distribution F is a member of a nonparametric collection of distributions denoted by F. It is further known that E{g(X; θ)} = 0 for some value θ, with the expectation calculated when X has distribution F, and the function g(X; θ) is known. Such a model setup uses θ to summarize certain aspects of the population distribution and focuses statistical inference on θ. The model F does not possess other parametric structures. Amazingly, Qin and Lawless (1995) construct the profile EL for θ and find that its nonparametric maximum likelihood estimator (MLE) θ̂ is asymptotically normal with the lowest possible asymptotic variance under some conditions, mostly on g(X; θ).
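To make the profile EL concrete, here is a minimal numerical sketch, my own illustration rather than anything from the paper, for the simplest case of a univariate mean with estimating function g(x; θ) = x − θ. The function name `el_log_ratio` and the bisection tolerance are my own choices; it computes the log EL ratio −Σ log{1 + λ g(x_i; θ)} by solving for the Lagrange multiplier λ.

```python
import numpy as np

def el_log_ratio(x, theta, tol=1e-10):
    """Profile empirical log-likelihood ratio for the mean:
    log R(theta) = -sum_i log(1 + lam * g_i), with g_i = x_i - theta and
    lam solving sum_i g_i / (1 + lam * g_i) = 0.
    Returns -inf when theta lies outside the convex hull of the data."""
    g = x - theta
    if g.min() >= 0 or g.max() <= 0:
        return float("-inf")    # constraint sum_i w_i g_i = 0 is infeasible
    # admissible lambda must keep every 1 + lam * g_i > 0
    lo = -1.0 / g.max() + 1e-12
    hi = -1.0 / g.min() - 1e-12

    def h(lam):
        # strictly decreasing in lam; its root is the Lagrange multiplier
        return np.sum(g / (1.0 + lam * g))

    while hi - lo > tol:        # bisection for the root of h
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return -np.sum(np.log1p(lam * g))
```

At θ equal to the sample mean, λ = 0 and the log ratio is 0; at any other θ inside the convex hull it is strictly negative, in line with R(θ) ≤ 1.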
Unlike many asymptotic results in some papers, the conditions here are simple and easy to understand. Two major conditions are that g(X; θ) is smooth in θ and that g(X; θ) and its derivatives with respect to θ have finite moments up to order 3. In other words, the asymptotic conclusions regarding the nonparametric MLE are valid under very general conditions and are therefore widely applicable. Users are practically free from any model misspecification risk.
There is an important case, the population quantile, to which their results do not directly apply. An estimating function for the univariate population median, for instance, is given by

g(x; θ) = 1(x ≤ θ) − 1/2. (1)

Suppose this is the only estimating function regarding θ in the model; then we do not need a general theorem on the large-sample properties of its nonparametric MLE under EL, as we can work them out directly. Such luck does not extend generally. Some researchers have studied the profile EL with non-smooth estimating functions (Molanes Lopez et al., 2009); however, the results generally lose the simplicity and the beauty of those given in Qin and Lawless (1995). Can we come up with a non-scary set of conditions for non-smooth g(X; θ) under which the asymptotic results of Qin and Lawless (1995) hold? This question has been on my mind for some time.
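To see why the median case can be worked out directly, here is a small sketch of my own (not from the paper). With g(x; θ) = 1(x ≤ θ) − 1/2, the EL constraint merely forces the total weight on {x_i ≤ θ} to equal one half, so the constrained optimization has a closed-form solution; the function name `el_log_ratio_median` is my own.

```python
import numpy as np

def el_log_ratio_median(x, theta):
    """Profile empirical log-likelihood ratio for the median, using the
    non-smooth estimating function g(x; theta) = 1{x <= theta} - 1/2.
    Maximizing sum_i log(w_i) subject to sum_i w_i = 1 and
    sum_{x_i <= theta} w_i = 1/2 gives equal weights within each group:
    1/(2k) on each of the k points at or below theta, 1/(2(n-k)) on the rest."""
    n = len(x)
    k = int(np.sum(x <= theta))
    if k == 0 or k == n:
        return float("-inf")    # constraint infeasible outside the data range
    # log R(theta) = sum_i log(n * w_i) at the optimal weights
    return -(k * np.log(2.0 * k / n) + (n - k) * np.log(2.0 * (n - k) / n))
```

The log EL ratio depends on the data only through the count k, so its limiting behaviour follows from elementary binomial asymptotics, with no general smooth-EE theorem needed.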
Another generally overlooked deficiency, however minor, is the locality of some asymptotic results. To make this clear, let us review a key technical step in the proof of consistency and asymptotic normality of the nonparametric MLE θ̂. Denote the true parameter value by θ*, the sample size by n, and the log profile empirical likelihood by ℓ_n(θ). One proves that ℓ_n(θ* + n^{-1/3} a) < ℓ_n(θ*) with probability approaching 1 as n → ∞, uniformly with respect to the unit vector a. Hence, there must be a local maximum within n^{-1/3} distance of the true parameter value θ* under some smoothness conditions on g(X; θ). It is this local nonparametric MLE, to name it so, that has all the claimed nice asymptotic properties. This line of approach is widely attributed to Cramér (2016) and can be found in Shao (2003).
At the same time, the consistency of the global MLE under a parametric model with minimal conditions is given by Wald (1949). By global, I require θ̂ to be the maximum point of ℓ_n(θ) over the whole parameter space of θ. The same result must be true for the nonparametric MLE of Qin and Lawless (1995), if only we can lay out a set of easily verifiable and reasonable conditions on g(X; θ).
Let me deliberate on this further. Following general practice, let x_1, …, x_n be a set of observations on independent and identically distributed random variables with common distribution F ∈ F. They may also be regarded as random variables wherever appropriate. I will also use X for a random variable with distribution F. Suppose F is parametric, containing density functions f(x; θ) indexed by θ ∈ Θ. The Jensen inequality is valid almost universally:

E{log f(X; θ)} ≤ E{log f(X; θ*)}

when the true parameter value is θ*. This leads to

n^{-1} Σ_{i=1}^n log f(x_i; θ) → E{log f(X; θ)} < E{log f(X; θ*)}

almost surely for any fixed θ ≠ θ*. The consistency proof of Wald (1949) is to show that this inequality holds uniformly outside any neighbourhood of θ*.

Do we have a similar inequality for the profile EL? Given any θ, let λ = λ(θ) be the Lagrange multiplier that is a by-product of solving the constrained optimization problem behind the profile EL. It is well known that λ satisfies (exactly, or with probability approaching 1)

Σ_{i=1}^n g(x_i; θ)/{1 + λᵀ g(x_i; θ)} = 0. (2)

The profile log-likelihood function, after omitting an additive constant −n log n, is given by

ℓ_n(θ) = −Σ_{i=1}^n log{1 + λᵀ g(x_i; θ)}. (3)

Let us study this expression from a different angle, following Chen et al. (2008). Given θ, we define a function of γ as follows:

ψ(γ) = −Σ_{i=1}^n log{1 + γᵀ g(x_i; θ)}.

It is seen that ψ(γ) is convex because its second derivative matrix, Σ_{i=1}^n g(x_i; θ) gᵀ(x_i; θ)/{1 + γᵀ g(x_i; θ)}², is positive definite; there is only a negligible chance of it being merely positive semi-definite. Hence, ψ(γ) is convex and attains its minimum at γ = λ(θ) given by (2). Consequently, we have

ℓ_n(θ) = ψ(λ) ≤ ψ(γ) for any admissible γ.

Let θ be such that E{g(X; θ)} = δ ≠ 0. When g(X; θ) has a finite third absolute moment, following Owen (2001) we have max_i ‖g(x_i; θ)‖ = o(n^{1/3}) almost surely. Choose γ = n^{-2/3} ḡ_n, where ḡ_n = n^{-1} Σ_{i=1}^n g(x_i; θ). Note that log(1 + t) = t + O(t²) when t → 0, and keep in mind the uniformity of the above claim in i. Applying this fact with the current γ, we find that, with probability approaching 1,

ψ(γ) = −n^{1/3} ‖ḡ_n‖² {1 + o(1)} → −∞

at rate n^{1/3} as n → ∞, because ḡ_n → δ ≠ 0. Hence, we have shown that ℓ_n(θ) → −∞ in probability at any fixed θ with E{g(X; θ)} ≠ 0. From Qin and Lawless (1995), we already know that −2ℓ_n(θ*) has a chi-squared limiting distribution. Hence, we have ℓ_n(θ*) = O_p(1), and ℓ_n(θ) − ℓ_n(θ*) → −∞ in probability.
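The divergence of ψ(γ) at a wrong θ is easy to check numerically. The sketch below is my own illustration, not from any cited work: it uses the mean estimating function g(x; θ) = x − θ, the specific choice γ = n^{-2/3} ḡ_n, and simulated standard normal data, so the true mean is 0 and θ = 1 is a fixed wrong value.

```python
import numpy as np

rng = np.random.default_rng(7)

def psi_bound(x, theta):
    """psi(gamma) = -sum_i log(1 + gamma * g_i) evaluated at the specific
    choice gamma = n^{-2/3} * gbar_n; by convexity this is an upper bound
    for the profile log EL l_n(theta)."""
    g = x - theta                               # mean estimating function
    gamma = len(x) ** (-2.0 / 3.0) * g.mean()   # gamma = n^{-2/3} * gbar_n
    t = gamma * g
    if t.min() <= -1.0:
        return float("-inf")                    # gamma outside the admissible region
    return -np.sum(np.log1p(t))

# the bound behaves like -n^{1/3} * gbar_n^2, diverging to -infinity
vals = [psi_bound(rng.normal(size=n), theta=1.0) for n in (100, 1000, 10000)]
print(vals)
```

The printed values are negative and grow in magnitude roughly like n^{1/3}, as the argument predicts.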
I leave it as a future project to prove the ultimate consistency of the global nonparametric MLE.

The efficiency of nonparametric MLE
I am enlightened by the optimality of the nonparametric MLE under the model specified by an estimating function. The nonparametric model may be more precisely presented as

F = {F : E{g(X; θ)} = 0 for some θ ∈ Θ} (4)

with some smoothness and moment requirements omitted. Let there be a parametric family P containing distributions whose density functions are given by f(x; θ), θ ∈ Θ. Assume the true distribution F* of X is a member of P.
Let the true parameter value be θ*. Given these, F* is also a member of

H = {h(x; γ, θ) = C(γ, θ) exp{γᵀ g(x; θ)} f(x; θ) : γ ∈ Γ, θ ∈ Θ}

for some normalization constant C(γ, θ). I use γ instead of η in order to use Γ for its parameter space. The true distribution F* under model H has density function h(x; 0, θ*). Because H is a regular parametric family, its parametric MLE of (γ, θ) is asymptotically efficient: it has the lowest asymptotic variance matrix. The paper points out that the asymptotic variances of the parametric and nonparametric MLEs of θ are the same. Consequently, the nonparametric MLE of θ has attained the lowest possible asymptotic variance.
There is a hidden agenda behind the above discussion. One cannot attain a lower asymptotic variance matrix for estimating (γ, θ) jointly. Still, how should we conclude the nonexistence of a more efficient estimator of θ alone from here? I cannot come up with a quick justification. In fact, Qin and Lawless (1995) give a solid proof that the nonparametric MLE attains the lowest possible asymptotic variance among estimators based on the optimal estimating function derived from g(x; θ).
Incidentally, I did some calculations and found the gradient ∇_θ log h(x; 0, θ*) = ∇_θ log f(x; θ*), which is nonzero in general. This is in contradiction with ∇_θ log h(x; 0, θ*) = 0 given in the paper under discussion. Interestingly, may we replace f(x; θ) in the definition of H by the true density f*(x) of F*, with respect to whatever σ-finite dominating measure? If so, we still have a well-defined extended model H. Now we have ∇_θ log h(x; 0, θ*) = 0, which further implies E{∇_θ log h(X; 0, θ*)} = 0, so that the Fisher information matrix is block orthogonal. Consequently, this also addresses my previous doubt about the gap between joint efficiency and marginal efficiency. However, I am not confident in the validity of my suggested choice of f*(x).
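The two gradient calculations can be spelled out. Under my reading of the construction, an assumption on my part, the family H has densities h(x; γ, θ) = C(γ, θ) exp{γᵀ g(x; θ)} times a carrier density, with C(γ, θ) the normalization constant, so that C(0, θ) ≡ 1 for every θ:

```latex
% Case 1: the carrier is f(x; \theta), which depends on \theta.
% Since C(0, \theta) \equiv 1 for all \theta and the tilt vanishes at \gamma = 0,
\nabla_\theta \log h(x; 0, \theta^*)
  = \underbrace{\nabla_\theta \log C(0, \theta^*)}_{=0}
  + \underbrace{\left[\gamma^\top \nabla_\theta g(x; \theta)\right]_{\gamma = 0}}_{=0}
  + \nabla_\theta \log f(x; \theta^*)
  = \nabla_\theta \log f(x; \theta^*) \neq 0 \ \text{in general.}

% Case 2: the carrier is the true density f^*(x), free of \theta.
\nabla_\theta \log h(x; 0, \theta^*)
  = \nabla_\theta \log C(0, \theta^*)
  + \left[\gamma^\top \nabla_\theta g(x; \theta^*)\right]_{\gamma = 0}
  = 0 .
```

In the second case the only θ-dependence at γ = 0 sits in terms that vanish identically, which is what drives the block orthogonality of the Fisher information.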

Wrap up
Many of my recent research problems came from reading the papers of the first author of this discussion paper. Naturally, I have much joint work with the other two authors. If history is a reflection of the future, I should greatly benefit from studying this discussion paper. I urge young statisticians with similar research interests to follow my example.

Disclosure statement
No potential conflict of interest was reported by the author.