Conditionally unbiased estimation in the normal setting with unknown variances

ABSTRACT To efficiently and completely correct for selection bias in adaptive two-stage trials, uniformly minimum variance conditionally unbiased estimators (UMVCUEs) have been derived for trial designs with normally distributed data. However, a common assumption is that the variances are known exactly, which is unlikely to be the case in practice. We extend the work of Cohen and Sackrowitz (Statistics & Probability Letters, 8(3):273-278, 1989), who proposed an UMVCUE for the best performing candidate in the normal setting with a common unknown variance. Our extension allows for multiple selected candidates, as well as unequal stage one and two sample sizes.


Introduction
Two-stage adaptive trial designs offer an efficient way of selecting and validating multiple candidate treatments within a single trial. A common strategy is to select the best performing treatment (according to some ranking criteria) after an interim analysis, and to then validate its properties in an independent sample in stage 2.
However, selecting and ranking candidates in this way can induce bias into the naïve estimates that combine data from both stages. If the selection rules are not properly taken into account by the estimation strategy, then intuitively one might expect overly optimistic estimates of the performance of the selected candidate, given that it had to perform 'well' in stage 1 in order to proceed to stage 2.
In order to efficiently and completely correct for this selection bias, the technique of Rao-Blackwellisation can be used, where the unbiased stage 2 data is conditioned on a complete, sufficient statistic. The resulting estimator is the uniformly minimum variance conditionally unbiased estimator (UMVCUE). An appealing feature of the UMVCUE is that, as the name suggests, it has the smallest variance (or equivalently, mean squared error (MSE)) amongst all possible unbiased estimators.
This two-stage estimation framework was introduced by Cohen and Sackrowitz (1989), who derived the UMVCUE for normally distributed data. Their work has since been extended to a variety of other trial settings with normal (or asymptotically normal) data. Bowden and Glimm (2008) extended the UMVCUE to apply to unequal stage one and two sample sizes, and when the parameter of interest belongs to the j-th best candidate out of k. In related work, Kimani, Todd, and Stallard (2013) derived UMVCUEs for the means of the selected and control treatments in the seamless phase II/III trial setting with early stopping. Meanwhile, Bowden and Dudbridge (2009) derived the UMVCUE for a two-stage genomewide association study with ranking based on p-values. In a recent development, Bowden (2016a, 2016b) generalised all of these approaches to normal data with an arbitrary correlation structure.
However, apart from the original Cohen and Sackrowitz paper, all of the above analyses assume that the variances of the parameter of interest are known exactly. This may not be a reasonable assumption to make. In practice, the variance of a treatment effect (say) will often be estimated directly from the data of the trial itself. Alternatively, the variance will be based on the results from a previous trial or pilot study on the same treatments.
Cohen and Sackrowitz did derive the UMVCUE for the mean of the selected normal population with a common unknown variance. However, the estimator is only valid for the highest ranked population when the stage 1 sample sizes are all equal, and the stage 2 sample size is equal to one. In this paper, we aim to address these limitations.
In Section 2 we present the model framework and the (corrected) form of the UMVCUE given by Cohen and Sackrowitz (1989). We extend their approach in Section 3, and in Section 4 compare the resulting UMVCUE with the one derived by Bowden and Glimm (2008) that assumes a known variance. A case study based on the INHANCE trial is presented in Section 5, and we conclude with a discussion in Section 6.

Equal sample sizes
To start with, we consider the setting of Cohen and Sackrowitz (1989) with a common unknown variance. Suppose there are k experimental treatments, with each tested on n subjects in stage 1. Let the stage 1 data X i j , i = 1, 2, . . . , k; j = 1, 2, . . . , n, be normally distributed with means μ i and common unknown variance σ 2 . Denote the stage 1 sample means byX 1 ,X 2 , . . . ,X k .
At the end of stage 1, the treatment with the largest sample mean is selected for confirmatory analysis in stage 2. Let Y be a single observation taken from the highest ranked treatment group in stage 2. Also let Q be the event {X :X 1 >X 2 > · · · >X k }. Without loss of generality, we condition on Q, as this can be viewed as simply a relabelling of the treatments.
Cohen and Sackrowitz derived the following UMVCUE for the mean μ 1 of the highest ranked treatment: Remark. Note that there are two errors in the formula for the UMVCUE as presented in Cohen and Sackrowitz (1989). Firstly, the summation in the definition ofS 2 should start from i = 2, and not i = 1. In addition, the denominator of the second term of the UMVCUE should have a factor of 2 2c , and not 2 c+1 .

Extending the UMVCUE
A natural extension to the setting of Cohen and Sackrowitz is to consider unequal sample sizes. Suppose now that treatment i is tested on n i subjects in stage 1. The stage 1 data X i j are again normally distributed with mean μ i and common unknown variance σ 2 . The stage 1 sample meansX i are given byX i = 1 n i n i j X i j . We also now allow there to be more than one subject in stage 2.
Sometimes the properties of treatments that were not the highest ranked are of interest. Hence an additional extension is to find the UMVCUE of the mean of the l-th ranked treatment (l = 1, . . . , k). To this end, letȲ l be the mean of m l additional observations (denoted by Y l j ) taken from the l-th ranked treatment group in stage 2.
Again conditioning on Q, we have the following UMVCUE for the mean μ l .
Theorem 3.1. The UMVCUE of μ l given Q is and we define q 1 = −1 and r k = 1.

Proof. The proof is similar to Cohen and Sackrowitz's. We let
The joint density of (X i j ,Ȳ l ) given Q is where ∝ μ,σ 2 indicates that we are ignoring terms proportional to μ or σ 2 , and K(μ, σ 2 ) = E μ,σ 2 I Q (x 1 , . . . ,x k ) with I Q denoting the indicator function of the ordering condition Q. Hence by the factorisation criterion, The statistic is also complete, since it is from the exponential family of distributions (Lehmann and Romano 2005).
The UMVCUE of μ l is the Rao-Blackwellisation of the unbiased estimatorȲ l , conditional on the complete, sufficient statistic (Z l ,X c , S 2 l ). That is, we seek the estimator This in turn is equivalent to showing that the conditional density gives the right hand side of Equation (3).
Consider a new sampling model, whereȲ * l is the mean of m l additional observations from population l, without having observedX 1 >X 2 > · · · >X k . Let S * l , Z * l , U * l and S * * l be equal to S l , Z l , U l andS l withȲ * replacingȲ in all formulae. Rearranging terms gives the following expression for U * l 2 : Note that U * l is an ancillary statistic, and so by Basu's theorem (Basu 1955 The numerator of the conditional density of U * l |(Z * l ,X c , S * l 2 , Q) is then Equation (6) times with I Q * being the indicator functions of the ordering condition Q * . The denominator of the conditional density of U * l |(Z * l ,X c , S * l 2 , Q) is the integral of Equation (6) with respect to u * l from q * l to r * l . This can be calculated using the fact that U * l 2 follows a β( 1 2 , c) distribution. Putting everything together gives the conditional density equivalent to Equation (4).

Remark 1.
As an important special case, setting l = 1 in Equation (2) gives the following UMVCUE for the mean of the highest-ranking treatment: where n 1 (n 1 + m 1 )/m 1 S Z 1 n 1 + m 1 −X 2 , r = min(r, 1) Remark 2. As a simple check of consistency with Section 2, set n i = n (for i = 1, . . . , k) and m 1 = 1 in Equation (7). Then we recover the corrected Equation (1) of Cohen and Sackrowitz.

Comparison with the known variance setting
Assuming a common known variance σ 2 , we can use the results of Bowden and Glimm (2008) to find the UMVCUE. Letting φ and denote the pdf and cdf respectively of a standard normal distribution, the UMVCUE of μ l is as follows.
Theorem 4.1. The UMVCUE of μ l given Q is where and we define X 0 = ∞ and X k+1 = −∞.
Remark. This is structurally the same as the UMVCUE with unknown variances given by Equation (2). Firstly, both estimators are in the form of the MLE Z l n l +m l minus a bias correction term, where the latter has a multiplicative factor of n l m l (n l +m l ) . The role of the known standard deviation σ in Equation (8) is played by the estimateS l in Equation (2). Hence r l and q l in Equation (2) can be seen to be exact analogues of W l,l+1 and W l,l−1 respectively in Equation (8). Finally, the standard normal density and distribution functions in Equation (8) are replaced by (transformed) beta density and distribution functions in Equation (2).

Simulation study
We now conduct simulation studies to assess how the UMVCUE with a known variance performs when the assumed known varianceσ 2 is not exactly equal to the true variance σ 2 . We compare various estimators for the largest mean μ 1 : r Stage 2 estimator, which is independent of σ 2 . r MLE, which is independent of σ 2 . rÛ known , the UMVCUE which assumes that σ 2 is known, and equal to some valueσ 2 . rÛ unknown , the UMVCUE with unknown σ 2 .
To evaluate the performance of a generic estimator for μ 1 , say μ * 1 , we use the following definitions of the bias and MSE, as in Bowden and Glimm (2008) We first compare characteristics of the estimators for trials with different target powers. Consider testing the hypothesis H 0 : μ 1 ≤ 0 against the alternative H 1 : μ 1 > 0 at the end of the trial. To do so, we use the usual t-statistic whereσ 2 is the pooled sample variance of the stage 1 and stage 2 data, i.e.
Under H 0 , T d follows a t-distribution with d = N + m 1 − k − 1 degrees of freedom. Hence, when testing at significance level α, we reject Suppose we are targeting a power of at least (1 − β) to detect a λ-fold standard deviation increase in the mean, i.e. under the alternative hypothesis H 1 : μ 1 = λσ . This implies that For simplicity, suppose we have equal numbers of subjects for each treatment and each stage, denoted by M, so that n 1 = m 1 = M, N = kM and hence d = M(k + 1) − k − 1. For given values of α, β and λ, Equation (10) can be solved for M numerically.
In our simulation study, α = 0.05 and we vary β from 0.05 to 0.5, which corresponds to trials with powers between 50% to 95%. We set k = 3, with μ = (0, 0, 0), σ 2 = 1 and λ = 0.5. For the estimatorÛ known , we calculateσ 2 as the pooled sample variance given in Equation (9) and simply plug it in to Equation (8). Figure 1 shows the mean bias of the MLE andÛ known as the power varies from 50% (which corresponds to M = 4) to 95% (which corresponds to M = 12). We also simulated the bias ofÛ unknown and the stage 2 estimator, but as expected, this resulted in no bias apart from simulation error. Hence these simulation results are not shown here, and we simply plot where the bias equal to zero. The MLE is positively biased, with a mean bias ranging from 0.21 (for a power between 50% and 58%) to 0.12 (for a power of 95%). In contrast,Û known is essentially unbiased, although there seems to be a very small positive bias when the power is less than 58%. Figure 2 shows the MSE of the four estimators. The stage 2 estimator has the highest MSE, while the MLE has the lowest. The estimatorsÛ unknown andÛ known have approximately the same MSE, which is 29%-30% higher than the MSE for the MLE, but 26%-27% lower than the MSE for the stage 2 estimator.
To demonstrate the differences further, Figure 3 shows the boxplots for the estimators when M = 10, which corresponds to a power of 90%. The boxplots demonstrate the severity of the positive bias of the MLE, and the unbiasedness of the other three estimators. The distribution of the estimatorsÛ unknown andÛ known seem to be almost identical. To explore what sort of estimates ofσ lead to substantial differences betweenÛ unknown and U known , we conduct a further simulation study which does not assume that we are using the pooled sample variance as our estimate ofσ . Keeping M = 10 as before, we vary the value ofσ 2 from 0.25 to 4 and simply plug it in to Equation (8). In order to get a handle on the likely values ofσ 2 that could be estimated from the data, note thatσ 2 ∼ σ 2 d χ 2 d , and hence var(σ 2 ) = 2σ 2 d .  For M = 10, σ 2 = 1 and k = 3, this gives var(σ 2 ) ≈ 0.0556. Hence in our simulation setting, a very simple approximate 95% confidence interval forσ 2 is given by 1 ± 2 × √ 0.0556, which we show on the plots below as a shaded gray region. Figure 4 shows the mean bias of the MLE andÛ known . The MLE has a positive bias of 3 4 √ 10π = 0.1338... for all values ofσ 2 . In contrastÛ known is positively biased forσ 2 < 1 and negatively biased forσ 2 > 1. The symmetry of the positive and negative bias ofÛ known around σ 2 = 1 explains why, on average, using a pooled sample variance estimator (which would be expected to under and overestimate σ 2 almost equally) results in an essentially unbiased estimator.   The estimatorÛ unknown has a MSE of approximately 0.074. This is a 26% decrease compared to the stage 2 estimator, but a 43% increase compared to the MLE. Finally,Û known has a lower MSE thanÛ unknown forσ 2 < 1, which can be explained by regardingÛ known as a sort of shrinkage estimator. Again, the symmetry of the difference between the MSEs ofÛ known andÛ unknown aroundσ 2 = 1 explains why, on average, the MSEs of these two estimators are so similar.

Case study
Finally, we illustrate our methodology using data based on the INHANCE study (Lawrence, Bretz, and Pocock 2014), which evaluated the use of inhaled indacaterol for the treatment of patients suffering from chronic obstructive pulmonary disease (COPD). The study followed a two-stage adaptive seamless design, where the first stage was a dose-finding stage with dose selection at the interim analysis, and the second stage was a confirmatory analysis of the efficacy and safety of the selected doses. The primary outcome was the 24-hour post-dose FEV 1 (which is the Forced Expiratory Volume after one second).
In the study, four doses of indacaterol were tested in stage 1 (75 μg, 150 μg, 300 μg and 600 μg), and two doses were selected based on a set of dose selection guidelines (see pages 84-87 in Lawrence, Bretz, and Pocock (2014) for further details). For simplicity, and in order for the selected doses to be the same as those that would be selected when ranking by sample means, we only consider the three doses 75 μg, 150 μg and 300 μg of indacaterol. Table 1 gives the observed least-squares mean treatment difference versus placebo at week 2 for each dose of indacaterol for the two stages, with the stage 1 results as given in Barnes et al. (2010), and the stage 2 results as given in Donohue et al. (2010). On the basis of ranking by the sample means, doses 150 μg and 300 μg would be selected for confirmatory analysis in stage 2. The INHANCE study had over 100 patients randomised to each of the three doses in stage 1, and over 400 patients randomised to each of the two selected doses in stage 2. With these large sample sizes, we would not expect there to be appreciable differences between the estimators. Hence, for the purposes of our illustrative case study, we consider a trial with only n = (10, 9, 7) patients in stage 1, and 9 patients for each dose in stage 2, as shown in Table 1.
We use the observed mean treatment differences from each stage of the INHANCE study to simulate a realisation of the data with the sample sizes given above. For each dose at each stage, the data was simulated from a normal distribution with mean equal to that observed in the INHANCE study, and a standard deviation of 0.3L (as given in the results of Donohue et al. 2010). Table 2 shows the various estimators for the treatment difference from placebo (L) at week 2 for the two selected doses. Note that forÛ known , we use the pooled sample variance as given in Equation (9) as our estimate forσ 2 . For both doses, there is a drop in the mean between the stage 1 and stage 2 data, with the MLE in-between since it is a weighted mean. For the 300 μg dose, there are only very small differences between the MLE,Û known andÛ unknown , while for the 150 μg dose these three estimators are in fact identical. Hence for these data, there would be no practical difference in using any of these three estimators to correct for selection bias.

Discussion
In two-stage adaptive trials with normally distributed data, it is unlikely that the variance will be known exactly. Our modified estimator allows for efficient unbiased estimation of multiple selected treatments with a common unknown variance, where the stage one and two sample sizes are arbitrary.
As the simulation studies demonstrated, when using the pooled sample variance, on average the UMVCUEÛ known that assumes a known variance will be essentially unbiased, with a very similar MSE to our modified UMVCUE. However, if the variance is under or overestimated, thenÛ known is no longer unbiased, with a higher MSE if the variance is overestimated. We note that our proposed estimator is most useful when the sample sizes are small. Indeed, ifσ 2 is estimated using the pooled sample variance, then for d > 200, var(σ 2 ) < 0.01σ 2 and there is unlikely to be any substantial variation ofσ 2 from the true value σ 2 .
In this paper, we only looked at ranking by the treatment means. However, it should be reasonably straightforward to adapt our result to ranking by other criteria. One example would be ranking by one-sided 'p-value' , where the conditioning would change to the event Q = {X : √ n 1X1 > · · · > √ n kXk }. Then the proof follows through identically to give the formula (2), except that theX l+1 in r l is replaced by √ n l+1Xl+1 / √ n l , and theX l−1 in q l is replaced by √ n l−1Xl−1 / √ n l .
The proof of the UMVCUE is quite a delicate one, and we do not see a straightforward way to generalise the result to the setting where we no longer assume a common unknown variance but instead let treatment i have variance σ 2 i say. The problem is that U * l 2 as given in Equation (5) is no longer a ratio of independent chi-squared distributions, and would now depend on the unknown parameters σ 2 1 , . . . , σ 2 k . Finally, the focus of this paper was on point estimation, but it is natural to try to derive confidence intervals for the UMVCUE as well. One approach is to use a bootstrap procedure, similar to that described in e.g., Bowden and Glimm (2008) and Robertson, Prevost, and Bowden (2016a). Alternatively, the approach of Sampson and Sill (2005) can possibly be extended to give exact confidence intervals in the unknown variance setting.