Adaptive Multiple Comparison Sequential Design (AMCSD) for clinical trials

ABSTRACT We propose an adaptive sequential testing procedure for clinical trials that test the efficacy of multiple treatment options, such as doses/regimens, different drugs, sub-populations, endpoints, or a mixture of them, in one trial. At any interim analysis, sample size re-estimation can be conducted, and any option can be dropped for lack of efficacy or an unsatisfactory safety profile. Inferences after the trial, including p-values, conservative point estimates, and confidence intervals, are provided.


Introduction
Clinical trials are expensive. Moore et al. (2018) analyzed the costs of the pivotal trials for novel therapeutic agents approved by the US Food and Drug Administration during the period 2015-2016: "Trials designed with placebo or active drug comparators had an estimated mean cost of $35.1 million (95% CI, $25.4 million-$44.8 million)". The cost of the most expensive trial in this report was $346.8 million. Many efforts have been made to reduce the cost. One category of such efforts is the multi-arm-multi-stage (MAMS) design. Such a design essentially combines several trials into one: multiple treatments are compared to a shared single control arm, eliminating the need for several control arms in separate trials and reducing the total sample size.
The main advantage of MAMS is the cost efficiency (Freidlin et al. 2008; Jaki and Hampson 2016; Jaki and Wason 2018) from this sample size reduction. Other benefits of the reduced sample size include a shorter trial duration and a reduced burden of patient recruitment, especially in trials that require a large number of patients per treatment arm, or in rare diseases where enrollment is extremely difficult due to the small patient population. For example, the European Medicines Agency issued a "Call to pool research resources into large multi-centre, multi-arm clinical trials to generate sound evidence on COVID-19 treatments" (19 March 2020). The MAMS design has been adopted in practice, e.g., i) the multi-arm optimization of stroke thrombolysis (National Institutes of Health n.d.); ii) a 3-arm multi-center, randomized controlled study comparing transforaminal corticosteroid, transforaminal etanercept and transforaminal saline for lumbosacral radiculopathy (clinicaltrials.gov, NCT00733096).
The main statistical challenges in such trials include: i) Family-wise error (FWE) control with simultaneous testing of multiple hypotheses at multiple time points. An optimal control is such that the FWE never exceeds the nominal error level but can equal the nominal level under certain conditions. A conservative control is such that the FWE never reaches the nominal level under any situation. The Bonferroni method is an example of conservative control. Conservative control leads to reduced power; hence, methods that achieve less conservative or optimal control are preferable. ii) Sample size and power calculation (as always, such calculations involve assumptions on the effect size), which are important for trial planning. iii) The assumptions for the sample size calculation could be inaccurate (due to uncertainty about effect sizes), and adaptive measures, such as sample size re-estimation (SSR), may be desirable to achieve adequate power. To obtain a proper new sample size, a method for calculating conditional power is necessary. Further, the FWE must be adequately controlled when adaptations are applied to the trial. iv) Inference must be made after the completion of the trial; hence, p-values, estimates of the effect sizes, and confidence intervals are desirable. v) When adaptations are made to the trial, inferences are more complicated, but their relevance and importance are not reduced. vi) Computations with multiple tests at multiple time points can be complicated and very time-consuming; a MAMS procedure won't be practical without efficient means of computation (e.g., Ghosh et al. 2017). (CONTACT Ping Gao, ping.gao@innovativstat.com, Innovatio Statistics, Inc., 1019 Chambers Ct, Bridgewater, NJ 08807, USA. Supplemental data for this article can be accessed online at https://doi.org/10.1080/10543406.2023.2233590.)
In addition to the need for investigating multiple treatments in one trial, as in MAMS, there are also other types of trials that share the same statistical challenges as MAMS. These include trials that investigate multiple sub-populations (FDA guidance [population enrichment], 2019) and/or multiple endpoints (FDA guidance [adaptive design], 2019) in one trial. Statistical solutions for MAMS can also be applied to such trials. Sequential designs and adaptive sequential designs that include multiple options, such as multiple treatments (e.g., doses/regimens), multiple sub-populations (FDA guidance [population enrichment], 2019), and/or multiple endpoints (FDA guidance [adaptive design], 2019), may be more generally named (adaptive) multiple comparison sequential designs (AMCSD/MCSD). In this article, we propose such a procedure that addresses all of the above-mentioned statistical issues.

Background
Take a trial that tests multiple doses as an example of MAMS; other situations can be dealt with similarly. Suppose that the trial includes M dose groups and a control arm, and a total of K interim/final analyses. Let θ_m (m = 1, ..., M) be the efficacy for each dose being considered; a larger θ indicates better efficacy. Let θ = (θ_1, ..., θ_M). Let the related hypotheses for each dose comparison be: null hypothesis H_{0,m}: θ_m = 0, and one-sided alternative hypothesis H_{a,m}: θ_m > 0. The overall null hypothesis is that all θ_m = 0; the overall alternative is that at least one θ_m > 0, or equivalently, θ_max > 0. At the i-th interim analysis, each of the M parameters θ_1, ..., θ_M is associated with an estimate θ̂_{i,m} and a standard error s.e.(θ̂_{i,m}), m = 1, ..., M. The comparison of each dose with the control is associated with a Wald statistic Z_{i,m} = θ̂_{i,m}/s.e.(θ̂_{i,m}), or equivalently, a score function S_{i,m} = θ̂_{i,m}/[s.e.(θ̂_{i,m})]². At each interim analysis, these Wald statistics or score functions form an M-dimensional random vector with a multivariate normal distribution. Hence, there will be a total of M × K Wald statistics or score functions. Thus, a MAMS trial involves the testing of these M × K test statistics and the conditional probabilities associated with the adaptive decisions during the trial. A fundamental challenge is to control the family-wise error (FWE). A major distinction between the Wald statistics and the score functions is that the score function has independent increments and is approximately a Brownian motion (Jennison and Turnbull 1997). The score function vectors form a multi-dimensional Markov process with independent increments. Research into the MAMS design can be traced back at least to 1989 (Bauer et al. 2015). Several methods employ Dunnett's multiple-testing procedure (on each M-dimensional random vector of the Wald statistics) in some form (Magirr et al. 2012; Wason et al. 2016, 2017). Stallard and Friede (2008) use the score function and Dunnett's procedure (on each M-dimensional random vector of the score functions) to construct a group-sequential design in which a set number of treatment options are dropped at each interim analysis. The null hypothesis is rejected if the test statistic is above a predefined efficacy threshold. It requires the number of doses to be dropped at any given interim analysis to be pre-specified, hence limiting its flexibility. Bretz et al. (2006) used closed testing procedures and combination tests to control the type I error whilst allowing many modifications to be made at the interim. The control of the FWE in these methods is a two-level procedure: one level controls the FWE for each M-dimensional random vector at each interim analysis, and then the combined FWE over all interim/final analyses is controlled. A drop-the-loser approach proposed by Chen et al. (2010) utilizes the multivariate normality of the Wald statistics and constructs a sequential design with interim analyses that is parallel to the usual group sequential design for comparing one experimental treatment and a control (e.g., O'Brien and Fleming 1979; Pocock 1977). It requires the covariance coefficient to be ½, which limits the types of trials the method can be applied to. This method does not use multiple-testing procedures, such as Dunnett's procedure or the closed testing procedure, to control the FWE. None of the above-mentioned methods provides p-values or estimates, and none includes sample size re-estimation. Gao et al.
(2014) considers an adaptive group sequential design using a Markov process (formed by the score functions) approach. The procedure does not use the Dunnett method or the closed testing procedure. Instead, it uses Markov transition probabilities for the calculation of the critical boundaries, type I error control, and sample size re-estimation. It provides exact FWE control in the strong sense, sample size calculation, and sample size re-estimation. It is limited to cases in which the covariance matrix of the score functions is known (it does include cases in which the covariance coefficient is ½, as in Stallard and Friede 2008 or Chen et al. 2010). Gao et al. (2014) only included p-values, not parameter estimates, after the trial. Both Chen et al. (2010) and Gao et al. (2014) use the correlations of the K M-dimensional random vectors to control the FWE, instead of the two-level FWE control of Bauer et al. (2015), Wason et al. (2016), Wason et al. (2017), Magirr et al. (2012), Stallard and Friede (2008), and Bretz et al. (2006). All of the previous literature, including Chen et al. (2010) and Gao et al. (2014), investigates the properties of K separate M-dimensional random vectors (Chen et al. 2010 use the Wald statistics, and Gao et al. 2014 use the score functions). The algorithm in Gao et al. (2014) uses Markov transition probabilities, which involve iterative multi-dimensional integrals and are very time-consuming (see Ghosh et al. 2017). Ghosh et al. (2017) proposed a new computing algorithm that linearized the multi-dimensional integrals and greatly reduced the computation time, but the computing code is very complicated.
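As a quick illustration of the statistics defined above, the sketch below (our own helper with hypothetical numbers, not part of the paper's software) computes the Wald statistics Z_{i,m} and score functions S_{i,m} for several doses at one interim analysis:

```python
import numpy as np

def wald_and_score(theta_hat, se):
    """Wald statistics Z = theta_hat/se and score functions
    S = theta_hat/se**2 for dose-vs-control comparisons."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    return theta_hat / se, theta_hat / se**2

# Hypothetical interim estimates for M = 3 doses (common s.e. of 0.2).
z, s = wald_and_score([0.4, 0.6, 0.1], [0.2, 0.2, 0.2])
# Note that S = Z * sqrt(information), where information = 1/se**2.
```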

Features of proposed method
In the theory of GSD that compares a test treatment and a control, the recursive algorithm of Armitage et al. (1969), which involves iterative integrals, has been essential for the derivation of critical boundaries and sample size/power calculations (e.g., Jennison and Turnbull 2000). In a GSD, a score function S(t) (t is the information time) can be created, which is approximately a Brownian motion (Jennison and Turnbull 1997) and also a one-dimensional Markov process. All the methodologies regarding the GSD can be explained using intuitions about the trajectories of the score function as a Brownian motion/Markov process (e.g., Gao et al. 2008, 2013). With this intuition, a Brownian trajectory corresponds to the score function curve of an infinitely long clinical trial. A GSD involves K information time points 0 < t_1 < ... < t_K; score function values S(t_1), ..., S(t_K) can be obtained, which have a joint multivariate normal distribution with a K × K covariance matrix. The entire theory of GSD can be described using this multivariate normal distribution and some calculus (see online supporting material, section 1), without using the recursive algorithm. We refer to this as the Brownian motion approximation (BMA) approach for GSD. In this article, we use a parallel BMA approach for MCSD/AMCSD and improve upon Gao et al. (2014). For the comparison of the m-th dose (suppose that there are M doses) and the control, a score function S_m(t) can be created. Suppose that there are K interim/final analyses at information time points 0 < t_1 < ... < t_K. There will be M × K score functions, S_m(t_1), ..., S_m(t_K), m = 1, ..., M.
These score functions have a joint multivariate normal distribution with an (M × K)-dimensional covariance matrix. All previous methods on MAMS in the literature use some kind of iterative algorithm (e.g., see the citations in the previous section). By using the multivariate normal distribution of these M × K score functions, our proposed method does not use any iterative algorithm and is thus different from all methods in the literature. The proposed procedure is entirely parallel to the BMA approach for GSD/ASD (see online supporting material, section 1): i) similar sample size and power calculations (i.e., using integration over a multivariate normal distribution); ii) similar type I error control using critical boundaries (i.e., using integration over a multivariate normal distribution); iii) similar SSR using conditional power; iv) similar type I error/FWE control when there is a sample size change (i.e., maintaining the conditional type I error/FWE, as in Gao et al. 2008; the conditional type I error/FWE is obtained by integration over a multivariate normal distribution); v) similar logic for calculating inferences (i.e., using the inverse of a monotone function to obtain the point estimate, p-value, and confidence interval, as in Gao et al. 2013); vi) similar logic (using the backward image, Gao et al. 2013) for obtaining inferences when there was a sample size change.
With (and because of) this algorithm, the proposed procedure possesses the following features: i) FWE control is optimal. The FWE equals the nominal error level in the special case when the off-diagonal elements of the covariance matrix are known (e.g., off-diagonal elements are 0.5, or zero) and the "play the winner" rule (i.e., selecting the best observed treatment option) is applied for adaptation and for hypothesis testing. The FWE is less than the nominal level (i.e., conservative) in all other situations. ii) Sample size and power calculations are exact for the special case and conservative in other situations. iii) Sample size re-estimation is precise in the special case when the observed efficacy equals the true efficacy. The conditional FWE is controlled exactly in the special case and conservatively otherwise; hence, the overall FWE is controlled. iv) Conservative inferences: exact inferences require knowledge of the configuration of the vector θ = (θ_1, ..., θ_M) and of the covariance matrix. Neither is known in actual trials, so exact inferences are not possible. Conservative estimates, such as p-values (larger than the true value), point estimates (smaller than the true value), and confidence intervals (wider than the exact interval), can be obtained with the proposed procedure to make valid conclusions about the efficacy. v) Conservative inferences with adaptation (such as SSR and dropping some comparisons, e.g., ineffective doses) can also be obtained. vi) Computational efficiency: Gao et al. (2014) used iterative multiple integration to calculate probabilities. The iterations are very time-consuming (see Ghosh et al. 2017 for discussion). Because the newly proposed algorithm does not use iterations, it does not require long computation times; sample size and power calculations take about 2-3 seconds. The algorithm is also much simpler than that of Ghosh et al.
(2017). vii) By using Slepian's lemma (Huffer 1986; Slepian 1962), the design can be applied to cases in which the covariance coefficients are not known. Hence, the design can be applied not only to dose selection, but also to trials involving different regimens, drugs, or population enrichment. Further, conservative point estimates and confidence intervals, with or without sample size change, are provided. viii) Lastly, all previous methods only test the overall null hypothesis H_0^(M) = ∩_{m=1}^M H_{0,m}; rejecting it supports only the claim that at least one experimental treatment is superior to the control. Our method includes a closed testing procedure (similar to an example that uses a modified Dunnett test in Marcus et al. 1976) that can test all the level l null hypotheses and can be used to reject multiple individual null hypotheses H_{0,m}. All level l tests use common l-level critical boundaries.
We refer to our method as the Brownian motion approximation (BMA) approach for MCSD, which reduces to GSD/ASD when M = 1. The BMA approach does not require in-depth knowledge of either Brownian motion or Markov processes. It only uses calculus and four basic (widely known) properties of a Brownian motion B(t) with drift θ: i) it is additive: B(t) = B(s) + (B(t) − B(s)); ii) B(t) ~ N(θt, t); iii) it has independent increments, with B(t) − B(s) ~ N(θ(t − s), t − s) for s < t; iv) for any 0 < t_1 < ... < t_K, (B(t_1), ..., B(t_K)) has a joint multivariate normal distribution (e.g., Jennison and Turnbull 2000; Whitehead 1997). An MCSD/AMCSD design is an adaptive design. The FDA guidance on adaptive designs (2019) summarized the major potential advantages of adaptive designs: a) statistical efficiency; b) ethical considerations; c) improved understanding of drug effects; d) acceptability to stakeholders. In the same guidance, the FDA (2019) recommended that three statistical principles be satisfied: a) Controlling the Chance of Erroneous Conclusions (type I error control); b) Estimating Treatment Effects (valid inference, including point estimate, two-sided confidence interval, and p-value); c) Trial Planning (including sample size/power calculation). (The FDA guidance (2019) also included a fourth, operational principle: d) Maintaining Trial Conduct and Integrity.) Our method, the MCSD/AMCSD, satisfies all three statistical principles in the FDA guidance (2019). The MCSD/AMCSD is supported by the DACT (Design and Analysis for Clinical Trials) software at https://www.innovativstat.com/software.html (the software is free for academic researchers). Computation codes are available upon request.

Score function and GSD
To demonstrate the similarities between GSD and MCSD, we first describe the GSD using BMA. In a GSD, one experimental treatment is compared with a control. Let θ be the parameter of interest, with a larger θ indicating better efficacy; the null hypothesis is H_0: θ = 0, and the alternative hypothesis is H_a: θ > 0. Let θ̂_t be the estimate of θ at information time t, with the Fisher information serving as the information time; the score function S(t) = θ̂_t/[s.e.(θ̂_t)]² is approximately a Brownian motion with drift θ (Jennison and Turnbull 1997). The GSD (O'Brien and Fleming 1979; Pocock 1977) was based on the Wald statistics.
Interim analyses are planned to be performed at information time points 0 < t_1 < ... < t_K. Let c_i be the critical boundaries on the Wald-statistic scale, let e_i = c_i√t_i, and name the e_i's the "exit" boundaries. The rules of a GSD can be equivalently stated in terms of the score function: the trial stops and the null hypothesis is rejected at the first t_i with S(t_i) ≥ e_i. The vector (S(t_1), ..., S(t_K)) has a multivariate normal distribution with a K × K covariance matrix. All probabilities, such as sample size and power calculations and the type I error, can be calculated using multivariate integration on S without using the recursive algorithm (Armitage et al. 1969). A brief description is provided in the online supporting material.
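The non-recursive calculation can be sketched in a few lines. We use Python's scipy here in place of the R routines the paper relies on, and the two-look O'Brien-Fleming-type Wald boundaries (2.797, 1.977) are illustrative values, not taken from the paper's tables:

```python
import numpy as np
from scipy.stats import multivariate_normal

def crossing_prob(t, e, theta=0.0):
    """P(S(t_i) >= e_i for some i) for a Brownian motion S with drift
    theta: S(t_i) ~ N(theta*t_i, t_i), Cov(S(t_i), S(t_j)) = min(t_i, t_j).
    A single K-dimensional normal CDF replaces the recursive algorithm."""
    t = np.asarray(t, dtype=float)
    e = np.asarray(e, dtype=float)
    cov = np.minimum.outer(t, t)  # Brownian covariance matrix
    return 1.0 - multivariate_normal(mean=theta * t, cov=cov).cdf(e)

# Two looks at information fractions (0.5, 1); exit boundaries e_i = c_i*sqrt(t_i).
alpha2 = crossing_prob([0.5, 1.0], [2.797 * np.sqrt(0.5), 1.977])
```

Under the null (theta = 0) the returned value is the type I error spent by the boundaries; with a positive drift it is the power.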

The vector of the score functions
Each of the M parameters θ_1, ..., θ_M is associated with an estimate θ̂_m (at any information time t_m) and a score function S_m(t_m) = θ̂_m/[s.e.(θ̂_m)]².

Hypothesis testing and type I error control
The trial will include a total of K analyses, including the interim analyses and the final analysis. Let t_{m,i} be the information time for the m-th component W_m(t) at the i-th interim analysis. For each m, t_{m,1} < ... < t_{m,K}. This forms an information vector at each interim analysis, t̄_i = (t_{1,i}, ..., t_{M,i}), i = 1, ..., K. Type I error control is achieved by selecting critical boundaries c_1, ..., c_K such that, under the overall null hypothesis, the probability that the maximum of the test statistics crosses any boundary does not exceed α/2.

The closed testing procedure
3.2.3.1. The l-level tests and critical boundaries.
The closed testing procedure has been used to control the FWE for testing H_0^(M) (e.g., Bretz et al. 2006). We propose the use of the closed testing procedure for a different purpose. Marcus et al. (1976) included an example of a closed testing procedure, modifying Dunnett (1955), in which multiple individual null hypotheses can be rejected using different levels of rejection criteria. Such a procedure is more powerful than a single criterion that only tests H_0^(M). The proposed closed testing procedure is similar to this example. With this closed testing procedure, our proposed design is more powerful than the methods in the literature (e.g., Bretz et al. 2006; Chen et al. 2010; Gao et al. 2014; Stallard and Friede 2008). The closed testing procedure is a multi-step procedure: Step 1: test the overall null hypothesis H_0^(M) (level M test); ...; Step M − 1: test H_0^(Q) = ∩_{m∈Q} H_{0,m} for all Q that contain 2 indices from {1, ..., M} (level 2 tests); Step M: test H_{0,m}: θ_m = 0 (level 1 tests) for all m. These tests will be referred to as level l tests, 1 ≤ l ≤ M. If H_0^(M) is rejected, the overall alternative hypothesis is claimed, that θ_max > 0. A null hypothesis H_{0,m} can be rejected if and only if all level l (1 ≤ l ≤ M) tests containing H_{0,m} have been rejected. At each level l, the l-level FWE is controlled by the sequential testing critical boundaries lc_1, ..., lc_K, derived with the BMA procedure. All of the published literature tests only H_0^(M). For simplicity of notation, we assume that for each i, t_{1,i} = ... = t_{M,i} = t_i. Such an assumption may not always hold (e.g., if the efficacy endpoint is time-to-event, or binary); those situations result in more complicated notation but do not invalidate the procedure. For a level l test, type I error control is achieved by selecting critical boundaries lc_1, ..., lc_K such that the probability, under the l-level null hypothesis, of crossing any lc_i does not exceed the nominal level.

Using the closed testing procedure to potentially reject multiple null hypotheses.
Similar to the example in Marcus et al. (1976), it is possible to reject more than one individual null hypothesis with the closed testing procedure. For example, assume M = 3, and suppose the trial is terminated at the I-th interim analysis. Let z_{m,I}, m = 1, 2, 3, be the Wald statistics. Assume that z_{3,I} ≥ z_{1,I} ≥ z_{2,I}, with z_{3,I} ≥ 3c_I, 3c_I ≥ z_{1,I} ≥ 2c_I, and 2c_I ≥ z_{2,I} ≥ 1c_I. Without the closed testing procedure, only H_0^(3): θ_max = 0 can be rejected (and, implicitly, perhaps H_{0,3}); the individual null hypotheses H_{0,1} and H_{0,2} could not have been rejected. With the closed testing procedure, all of H_{0,1}, H_{0,2}, and H_{0,3} can be rejected (this is very similar to the example in Marcus et al. 1976). For simplicity, we will only discuss the level M test; other level l tests are similar. To simplify the notation, we denote Mc_i = c_i.
The use of the M × K dimensional vector is motivated by Markov process theory and the independent-increment property of the score functions. It converts the use of Markov process theory into the use of a multivariate normal distribution. It is the foundation of all the calculations used in the proposed method, and it greatly simplifies the algorithms used in Gao et al. (2014) and Ghosh et al. (2017). The integral over the multivariate normal distribution can be easily carried out using the R function pmvnorm. Details of the calculations are provided in the online supporting material. The l-level critical boundaries for the closed testing procedure, lc_1, ..., lc_K, are derived similarly.
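A minimal sketch of this calculation (our own code, not the paper's DACT implementation), assuming the common-control normal case in which the between-arm correlation is 0.5 and the full correlation matrix of the M × K Wald statistics factorizes into a Kronecker product of an arm block and a time block:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fwe(M, t, c, rho=0.5):
    """FWE for M dose-vs-common-control comparisons at analysis times
    t_1 < ... < t_K with Wald-scale boundaries c_i. Assumed correlations:
    sqrt(t_i/t_j) over time, rho between arms (0.5 for a shared control).
    One (M*K)-dimensional normal CDF; no iteration."""
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    T = np.sqrt(np.minimum.outer(t, t) / np.maximum.outer(t, t))
    A = np.full((M, M), rho)
    np.fill_diagonal(A, 1.0)
    corr = np.kron(A, T)  # entry ((m,i),(l,j)) = A[m,l] * T[i,j]
    mvn = multivariate_normal(mean=np.zeros(M * len(t)), cov=corr)
    return 1.0 - mvn.cdf(np.tile(c, M))
```

Note that `fwe` decreases as `rho` increases; setting rho = 0 therefore gives a conservative upper bound when the correlation is unknown but non-negative, which is the role Slepian's lemma plays later in the article.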

α-spending functions. Let A_i be the event that the critical boundaries were not crossed at any interim analysis before t_i and were crossed at t_i. Note that the events A_i are mutually exclusive: A_i ∩ A_j = ∅ for i ≠ j. Under the null hypothesis, observing an A_i means the occurrence of a type I error. Let α_i = P(A_i); then α_i is the type I error "spent" at t_i. Therefore, the α_i's need to be chosen such that Σ_{i=1}^K α_i = α/2. The critical boundaries c_i can be selected successively, as detailed in the online supporting material.
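The successive selection of the c_i's can be sketched as follows for a single comparison (M = 1), using a hypothetical equal-spending schedule; the paper's own boundaries are computed by DACT, and this is only an illustration of the logic:

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.optimize import brentq

def boundaries_from_spending(t, alpha_spend):
    """Successively choose Wald-scale boundaries c_1, ..., c_K so that
    the cumulative crossing probability at t_i equals the cumulative
    spending alpha_1 + ... + alpha_i (single comparison, M = 1)."""
    t = np.asarray(t, dtype=float)
    c = []
    for i in range(len(t)):
        cum = sum(alpha_spend[: i + 1])
        tt = t[: i + 1]
        corr = np.sqrt(np.minimum.outer(tt, tt) / np.maximum.outer(tt, tt))
        def excess(ci):
            # crossing probability with boundaries fixed so far plus candidate ci
            cdf = multivariate_normal(mean=np.zeros(i + 1), cov=corr).cdf(np.array(c + [ci]))
            return (1.0 - cdf) - cum
        c.append(brentq(excess, 0.1, 10.0))
    return np.array(c)
```

For example, spending 0.0125 at each of two looks (t = 0.5, 1) forces c_1 = z_{0.9875} ≈ 2.2414, and c_2 is then solved so the total error is 0.025.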

Power calculation and sample size determination
Suppose that the "information fractions" s_i, i = 1, ..., K, and the critical boundaries c_i, i = 1, ..., K, have been determined. Let the information time at the final analysis be denoted as T, and let t_K = T, t_i = s_i T, e_i = c_i√t_i. Let the assumed efficacy parameters under the alternative hypothesis be θ = (θ_1, ..., θ_M). The power with information time T is denoted P_θ(T); details of the calculation are provided in the online supporting material. P_θ(T) is an increasing function of T. The desired power of the trial can be obtained by choosing T such that P_θ(T) equals the target power. The sample size n_K can then be determined by the relationship n_K ≈ aT for some distribution-specific constant a.
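A sketch of the search for T in the single-comparison case (M = 1; our own illustrative code, exploiting the monotonicity of P_θ(T) noted above):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.optimize import brentq

def power(T, s, c, theta):
    """P_theta(T): probability that S(t_i) ~ N(theta*t_i, t_i), with
    t_i = s_i*T, crosses an exit boundary e_i = c_i*sqrt(t_i) (M = 1)."""
    t = np.asarray(s, dtype=float) * T
    e = np.asarray(c, dtype=float) * np.sqrt(t)
    cov = np.minimum.outer(t, t)
    return 1.0 - multivariate_normal(mean=theta * t, cov=cov).cdf(e)

def required_information(s, c, theta, target=0.9):
    """Smallest information time T with P_theta(T) = target; the sample
    size then follows from n_K ~ a*T."""
    return brentq(lambda T: power(T, s, c, theta) - target, 1e-3, 1e4)
```

With a single analysis (s = [1], c = [1.96]) this reduces to the textbook fixed-sample formula T = (z_{1−α/2} + z_{1−β})²/θ².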

The general situation and Slepian's lemma
The above-discussed calculations of the type I error, exit/critical boundaries, and sample size/power assume that the covariance matrix is known, which is possible only in the special situation of treatment selection with a common control and a normally distributed variable. We refer to all other situations as the general situation, such as treatment selection with a non-normally distributed variable (e.g., time-to-event), population selection, endpoint selection, or mixed selections (FDA guidance, 2019). The FDA guidance noted that "It may be particularly difficult to estimate Type I error probability and other operating characteristics for designs that incorporate multiple adaptive features." The FDA guidance (adaptive designs, 2019) hypothesized that, for the general situation, "it can be argued that assuming independence among multiple endpoints will provide an upper bound on the Type I error probability." A mathematically rigorous proof for this hypothesis is provided by using Slepian's lemma (Huffer 1986; Slepian 1962). Hence, conservative statistics can be obtained that satisfy the three statistical principles in the FDA guidance (2019): an upper bound on the type I error that does not exceed the desired level, a conservative lower confidence limit, and an upper bound on the p-value. These calculations are confirmed through simulations shown in later sections.

Slepian's lemma
Let X = (X_1, ..., X_n) and Y = (Y_1, ..., Y_n) be mean-zero Gaussian random vectors with E[X_i²] = E[Y_i²] for all i and E[X_i X_j] ≥ E[Y_i Y_j] for i ≠ j. Then the following inequality holds for all real numbers u_1, ..., u_n: P(X_1 ≤ u_1, ..., X_n ≤ u_n) ≥ P(Y_1 ≤ u_1, ..., Y_n ≤ u_n).
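Slepian's lemma can be checked numerically: increasing the off-diagonal correlation increases the joint lower-orthant probability, so assuming independence over-states the boundary-crossing (type I error) probability. A small sketch (our own code):

```python
import numpy as np
from scipy.stats import multivariate_normal

def orthant_prob(rho, u, n=2):
    """P(X_1 <= u, ..., X_n <= u) for standard normals with common
    pairwise correlation rho."""
    cov = np.full((n, n), rho)
    np.fill_diagonal(cov, 1.0)
    return multivariate_normal(mean=np.zeros(n), cov=cov).cdf(np.full(n, u))
```

Since the FWE is one minus such an orthant probability, a larger correlation yields a smaller FWE, and rho = 0 (independence) gives the upper bound used in the general situation.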

Applications
For multiple dose comparisons with a normally distributed endpoint and a common control, the covariance matrix is known (see online supporting material). In all other cases (such as multiple treatment comparisons in which the efficacy variable is not normally distributed, trials with population enrichment, and/or multiple primary endpoint comparisons), the covariance matrix is generally unknown. For all potential options/comparisons to be considered in a multiple comparison trial, it is reasonable to assume that E[B_m(t_i)B_l(t_j)] ≥ 0, or that the endpoints are non-negatively correlated. This assumption is in agreement with the opinion (FDA guidance, 2019) that "Most secondary endpoints in clinical trials are correlated with the primary endpoint, often very highly correlated." Under this assumption, the conditions of Slepian's lemma are met: the boundary-crossing probability is bounded above by its value under independence, which does not exceed α/2. Hence, the type I error is controlled for the MCSD.

Sample size modification
To facilitate the discussion of sample size change and dose dropping (or dropping options for lack of efficacy), we use the superscript (1) to indicate statistics and parameters in the original design, and the superscript (2) to indicate statistics and parameters after the sample size change. For example, the information time points for interim analyses in the original design will be t_i^(1).

Information times and critical boundaries after sample size change
Information times and critical boundaries after the sample size change will be t_i^(2) and c_i^(2), respectively. The original number of comparisons will be M^(1), and the new number will be M^(2).

Conditional power and conditional type I error
The conditional power and conditional type I error calculations follow almost exactly as in Gao et al. (2008), except for the notation (a multi-dimensional vector in this article versus a single Brownian motion in Gao et al. 2008). Suppose that the observed score values at the current interim analysis are given. The conditional power under the alternative hypothesis θ ≥ 0 is the conditional probability of crossing a later boundary; this can be calculated as an integration with a multivariate normal distribution. The details of the calculation are provided in the online supporting material. The conditional type I error is obtained by setting θ = 0.
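For a single comparison with one remaining analysis, the independent-increment property gives a closed form for the conditional power; the sketch below covers only this special case, not the general multivariate formula in the supporting material:

```python
import numpy as np
from scipy.stats import norm

def conditional_power(S_L, t_L, T, e_K, theta):
    """P(S(T) >= e_K | S(t_L) = S_L) for a single comparison, using
    independent increments: S(T) - S(t_L) ~ N(theta*(T - t_L), T - t_L).
    Ignores intermediate looks (one remaining analysis only)."""
    drift = theta * (T - t_L)
    sd = np.sqrt(T - t_L)
    return 1.0 - norm.cdf((e_K - S_L - drift) / sd)

def conditional_type1(S_L, t_L, T, e_K):
    """Conditional type I error: conditional power evaluated at theta = 0."""
    return conditional_power(S_L, t_L, T, e_K, 0.0)
```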

New conditional power, new conditional type I error, and new sample size
The new conditional power and conditional type I error calculations follow almost exactly as in Gao et al. (2008), except for the notation. Suppose that after the L^(1)-th interim analysis, the number of options (e.g., doses, regimens, sub-populations, or endpoints) will be reduced from M^(1) to M^(2), and the sample size will be modified. Denote the remaining options by their indices; for simplicity of notation, we assume that the remaining indices are 1, ..., M^(2). Suppose that the remaining interim analyses are re-scheduled at information times t_i^(2). The new conditional power can be calculated as an integration with a multivariate normal distribution. The details of the calculation are provided in the online supporting material.
Then, the conditional power will be 1 − β if a new final information time t_K^(2), with the corresponding new sample size, is chosen appropriately. Hence, the new boundaries (which depend on the choice of t_K^(2)) should provide conditional power 1 − β. Both the new critical boundaries and the new sample size can be obtained through an iteration procedure.
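The iteration described above can be sketched in the same one-remaining-look special case: fix the conditional type I error (so the FWE is preserved, as in Gao et al. 2008) and solve for the new final information time. This is our own simplified illustration, not the paper's general algorithm:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def ssr(S_L, t_L, T, e_K, theta_hat, target=0.9, T_max=50.0):
    """Sample size re-estimation sketch (single comparison, one remaining
    look): pick a new final time T2 and new exit boundary e2 so that
    (i) the conditional type I error is unchanged and
    (ii) the conditional power at theta_hat reaches `target`."""
    cte = 1.0 - norm.cdf((e_K - S_L) / np.sqrt(T - t_L))  # conditional type I error
    def e2(T2):
        # boundary keeping the conditional type I error fixed at cte
        return S_L + norm.ppf(1.0 - cte) * np.sqrt(T2 - t_L)
    def cp(T2):
        # conditional power at theta_hat with the new boundary
        return 1.0 - norm.cdf((e2(T2) - S_L - theta_hat * (T2 - t_L)) / np.sqrt(T2 - t_L))
    T2 = brentq(lambda x: cp(x) - target, t_L + 1e-6, T_max)
    return T2, e2(T2)
```

Since cp(T2) increases monotonically in T2, a root exists whenever the target power lies between the conditional type I error and 1.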

Point estimate and lower confidence limit for the option with the best efficacy
The parameter and confidence limit calculations follow the same logic as in Gao et al. (2013), except for the notation (Gao et al. 2013 used the inverse of the monotone function f(θ); an analogous monotone function is used in this article). Suppose that the trial terminated at the I-th interim analysis (information time t_I), with x_I the maximum of the observed Wald statistics at t_I (the details of the calculation are provided in the online supporting material). Rearrange x_{1,I}, ..., x_{M,I} such that x_{n_1,I} ≤ ... ≤ x_{n_M,I} = x_I. For the purpose of estimation, it is reasonable to assume that θ_{n_M} = max(θ_1, ..., θ_M); thus, it would be desirable to estimate θ_{n_M}. An unbiased estimate of θ_{n_M} requires knowledge of the configuration of θ = (θ_1, ..., θ_M), which is generally unknown. We attempt to obtain a conservative estimate of θ_{n_M}, which would be sufficient for regulatory purposes. Let θ̿ = (θ, ..., θ), i.e., all components equal to θ. Then θ_{M,0.5} would be a conservative estimate for θ_{n_M}, since this estimate assumes θ_1 = ... = θ_M. For the same reason, θ_{M,α/2} is a conservative estimate for the lower confidence limit for θ_{n_M}. For convenience of discussion, we denote θ_{n_M} = θ_best. We note that θ_{M,α/2} is consistent with the hypothesis test, in the sense that θ_{M,α/2} > 0 if and only if p < α/2. A conservative estimate of θ_{n_{M−1}} can be obtained similarly using the level M − 1 test critical boundaries. If desired, this process can continue until an estimate of θ_{n_1} is obtained.
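The inversion logic can be illustrated in the simplest case of a single analysis and a single comparison, where f(θ) = P_θ(S(t) ≥ x_obs) is increasing in θ and is inverted numerically (our own sketch; the paper's procedure inverts the analogous function over the full sequential design):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def estimate_by_inversion(z_obs, t_obs, q):
    """Invert the monotone function f(theta) = P_theta(S(t_obs) >= x_obs),
    with x_obs = z_obs*sqrt(t_obs), at quantile q: q = 0.5 gives a
    median-type point estimate, q = alpha/2 the lower confidence limit,
    and f(0) itself is the p-value (one look, one comparison)."""
    x_obs = z_obs * np.sqrt(t_obs)
    def f(theta):  # increasing in theta
        return 1.0 - norm.cdf((x_obs - theta * t_obs) / np.sqrt(t_obs))
    return brentq(lambda th: f(th) - q, -50.0, 50.0)
```

In this one-look case the inversion recovers the familiar answers: the q = 0.5 solution is z_obs/√t, and the q = α/2 solution is (z_obs − z_{1−α/2})/√t.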

Upper confidence limit
The calculation of the upper limit of the confidence interval uses the inverse of the monotone function g_M(θ). For the upper limit of the confidence interval, conservativeness means an over-estimate (which then results in a wider confidence interval). Unlike the lower confidence limit, which was obtained using an M-dimensional random vector, the upper confidence limit will be obtained using a one-dimensional Brownian motion. For this purpose, let W(t) = B(t) + θt, and let e_i = Me_i, c_i = Mc_i be the exit and critical boundaries for the level M test (the details of the calculation are provided in the online supporting material). Then g_M(θ; Mc_1, ..., Mc_K; x_I) is an increasing function of θ; for M = 1 it reduces to the function used for the unbiased estimate in a usual group sequential procedure between two arms (Gao et al. 2013). Since the 1-level boundaries 1c_1, ..., 1c_K are lower than the M-level boundaries, g⁻¹(1 − α/2; 1c_1, ..., 1c_K; x_I) is a "conservative" estimate for the upper confidence limit.

Lower confidence interval limit and p-value
Suppose that the trial terminated at the I^(2)-th interim analysis (information time t_{I^(2)}^(2)). Then θ_{α/2} is the Hayter (1986)-type lower limit boundary for θ_{n_{M^(2)}}, and θ̂ = θ_{0.5} is a conservative point estimate for θ_{n_{M^(2)}}. For convenience of discussion, we denote θ_{n_{M^(2)}} = θ_best. The p-value is obtained as f(0; x_I).

Upper confidence limit
Let W(t) = B(t) + θt, and let e_i be the planned exit and critical boundaries for the M^(1)-level test; the resulting function g is an increasing function of θ. Similar to the estimation without sample size change, g⁻¹(1 − α/2; 1c_1, ..., 1c_K; x_I) is a "conservative" estimate for the upper confidence limit.

Repeated sample size modification
Repeated sample size modifications can be conducted. Successive backward images will need to be obtained for inference. Details are provided in the online supporting material.

Examples and simulations
We provide examples of the design of multiple dose comparisons, simulation results for type I error control, and point estimate and two-sided confidence interval coverage. We also provide examples of how the AMCSD procedure may be applied in clinical trials. All calculations (critical boundaries, sample size, conditional power and conditional type I error, point estimates, confidence intervals, sample size re-estimation, and power and type I error simulations) are available in the DACT software.

Critical boundary selection and type I error control
We consider a design for a trial that compares two doses of a new drug with a common control (Table 1), with a normally distributed efficacy endpoint (Table 2), and another example with a non-normally distributed endpoint (Table 3). There is one interim analysis at information fraction $s_1 = 1/2$ and a final analysis at $s_2 = 1$. The boundary is an O'Brien-Fleming type critical boundary. Type I error simulations are performed for both a normally distributed endpoint and survival analysis using the O'Brien-Fleming boundaries from Table 2, for a study that enrolls 100 patients randomized to the two dose groups and the control group in a 1:1:1 ratio. For the normally distributed variables, the control has an $N(0, 1)$ distribution, and the dose groups have $N(\theta_1, 1)$ and $N(\theta_2, 1)$ distributions, respectively. The simulation uses 100,000 repetitions. The type I error is the rejection rate when the Z-score associated with a dose with $\theta_i = 0$ crosses the critical boundary. For the survival analysis, the planned number of events is 300. The efficacies of the different dose groups are $(HR_1, HR_2)$. Type I error is the rejection rate when the Z-score associated with a dose with $HR_i = 1$ crosses the critical boundary. This simulation uses 10,000 repetitions.
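The structure of such a simulation can be sketched for a single dose-versus-control comparison under the null. The boundary values below (2.797 at the interim, 1.977 at the final look) are the standard two-look O'Brien-Fleming values for one-sided $\alpha = 0.025$ without multiplicity adjustment; they are illustrative and are not the multiplicity-adjusted boundaries of Table 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim = 100_000
s1 = 0.5                  # information fraction at the interim analysis
# Illustrative two-look O'Brien-Fleming boundaries for a single
# comparison at one-sided alpha = 0.025 (not the Table 2 boundaries).
c1, c2 = 2.797, 1.977

# Score process under H0: S(s1) ~ N(0, s1), independent increments.
S1 = rng.normal(0.0, np.sqrt(s1), n_sim)
S2 = S1 + rng.normal(0.0, np.sqrt(1 - s1), n_sim)
z1, z2 = S1 / np.sqrt(s1), S2 / np.sqrt(1.0)

# Reject at the interim or, failing that, at the final look.
reject = (z1 > c1) | (z2 > c2)
print(reject.mean())      # close to 0.025
```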

Point estimate and confidence interval simulations
Simulations are performed for dose selection studies. The selection rule for the simulations in Table 4 is for a design with two active doses and one control. $\hat{\theta}_{\mathrm{best}}$ is the estimate for the dose with the largest Wald statistic at the termination of the trial. $\theta_{\mathrm{chosen}}$ is the true efficacy corresponding to the selected $\hat{\theta}_{\mathrm{best}}$. The bias is $\hat{\theta}_{\mathrm{best}} - \theta_{\mathrm{chosen}}$. The point estimates and confidence interval coverage for $\theta_{\mathrm{chosen}}$ are presented. In our simulations, sample size re-estimation is performed at the interim analysis. All the simulation results agree with the mathematical derivations: i) the rate of rejecting the null hypothesis is consistent with the lower boundary of the confidence interval (not shown in the table); ii) the point estimate is conservative (the median bias is negative); iii) the coverage of the two-sided CI is conservative; iv) the upper limit of the confidence interval is conservative; v) the lower limit of the confidence interval is conservative.
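The need for a conservative estimate can be illustrated directly: when the dose with the larger estimate is selected, the naive estimate $\hat{\theta}_{\mathrm{best}}$ is biased upward relative to $\theta_{\mathrm{chosen}}$. The true effects (0.1, 0.2) and standard error 0.106 below are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sim = 100_000
theta = np.array([0.1, 0.2])   # hypothetical true effects of the two doses
se = 0.106                     # hypothetical common standard error

est = rng.normal(theta, se, size=(n_sim, 2))
pick = est.argmax(axis=1)                       # dose with larger estimate
theta_best_hat = est[np.arange(n_sim), pick]    # naive selected estimate
theta_chosen = theta[pick]                      # true effect of selected dose

bias = theta_best_hat - theta_chosen
print(bias.mean())   # positive: selection inflates the naive estimate
```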

Two examples
We use one trial design and two possible scenarios as examples, with a hypothetical trial to demonstrate how to perform sample size re-estimation and the final analysis. Suppose that an oncology trial is being conducted in which two regimens are compared to a common control. The covariance matrix is not known, so the trial is designed according to Table 3; note that $K_1 = 2$. Suppose that the efficacy endpoint is time to event (say, progression-free survival, PFS), the efficacy measure is the hazard ratio HR, and $\theta = -\log(HR)$. Suppose that the initial sample size is 300 events for the three arms, and that a sample size re-estimation is conducted at the first look; hence $L^{(1)} = 1$, and the total number of events at the interim analysis is 150. Suppose that larger values of $\theta$ (smaller HR) indicate better efficacy, and an effect size of $\theta < -\log(0.9) \approx 0.1$ is considered clinically insufficient. In the example, we assume that the combined number of events between an active arm and the control at the interim analysis is 100 (in an actual trial, this would be the observed number). All the calculations were performed with the DACT software, and users can repeat them using DACT.
Suppose that the trial continues as planned without sample size change (hence no change in the final critical boundary). Suppose that at the final analysis, $(\hat{\theta}_1, \hat{\theta}_2) = (0.22, 0.24)$ were observed, with common standard error $s.e.(\hat{\theta}_1) = s.e.(\hat{\theta}_2) = 0.106$. Then $z_2$ crosses the level 2 critical boundary ${}^{2}c_2$, and the level 1 test rejects the null hypothesis $H_{0,2}: \theta_2 = 0$ as well. Since $z_1$ does not cross ${}^{2}c_2$, $H_{0,1}$ cannot be rejected with the level 2 test. However, per the closed testing procedure, $H_{0,1}$ only needs to be compared with ${}^{1}c_2$, and since $z_1$ crosses ${}^{1}c_2$, $H_{0,1}$ is rejected. The rejection of $H_{0,1}$ would not be possible without the closed testing procedure. The p-value for the overall test (level 2) is $p = 0.024$. The conservative point estimate for $\theta_{\max}$ is $\hat{\theta} = 0.1817$, and the conservative two-sided 95% confidence interval is (0.0017, 0.4475) (see Figure 2 in the online supporting material). Note that because $\hat{\theta}_2 = \max(\hat{\theta}_1, \hat{\theta}_2)$, it is assumed that $\theta_{\max} = \theta_2$. The estimate and confidence interval for $\theta_1$ need to be calculated separately using the level 1 boundaries. Two imaginary curves of Wald statistics are used in Figure 2 for illustration; they show the trajectories of the Wald statistics.
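The Wald statistics in this example follow directly from the observed estimates and standard error; a few lines suffice to reproduce them and to recover the hazard ratio implied by $\theta = -\log(HR)$.

```python
from math import exp

# Values from the worked example: theta_hat and common standard error.
theta1, theta2, se = 0.22, 0.24, 0.106

# Wald statistics for the two regimen-vs-control comparisons.
z1, z2 = theta1 / se, theta2 / se
print(round(z1, 3), round(z2, 3))   # 2.075 2.264

# Hazard ratio implied by theta = -log(HR) for regimen 2.
print(round(exp(-theta2), 3))       # 0.787
```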

Discussion
The properties of the score function are well known (Jennison and Turnbull 1997): it is approximately a Brownian motion. Because of this property, the multi-dimensional Markov process formed by the vector of the score functions is a precise mathematical model for describing the accumulating data of an MCSD/AMCSD procedure. The type I error is exact and the sample size/power calculation is precise if the covariance matrix is known. In more general cases, the type I error control and sample size/power calculation are conservative. In this sense, the procedure is optimally conservative. The method allows flexible decision rules for adaptive changes. By utilizing Slepian's lemma, the procedure can be applied to any distribution for which the score function is approximately a Brownian motion (e.g., Jennison and Turnbull 1997); such distributions include the normal distribution, binary distribution, survival analysis, Poisson distribution, and the negative binomial distribution. In a dose selection trial with a normal endpoint, the correlation between any two score functions can be derived as 0.5. For all other situations, the correlation is considered unknown, and Slepian's lemma is applied, which conservatively assumes the correlation coefficients to be zero. All these distributions are supported by the DACT software.
The procedure can be applied to a wide range of trials, including dose, treatment, or endpoint selection, or a mixture of these selections. It can also be applied to population enrichment. To apply the procedure, M score functions need to be created. In a dose/treatment selection trial with M active doses/treatments, the comparison of each active dose/treatment arm with the control corresponds to a score function. In a trial that involves one active treatment arm and a control arm with M endpoints, each endpoint corresponds to a score function. In a population enrichment trial that is randomized between an active treatment arm and a control arm, and in which a total of M (sub)populations are considered, the comparison between the active treatment and the control for each subpopulation corresponds to a score function. Such an association between a comparison and a score function applies in the same manner to a trial that involves a mixture of dose/treatment, endpoint, and population enrichment comparisons. The method is complete in the sense that it includes sample size and power calculation and sample size re-estimation, and it provides inference, including point estimates, confidence intervals, and p-values.
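The correlation of 0.5 induced by the shared control arm can be verified by simulation. The sketch below uses a hypothetical design with 50 patients per arm and known unit variance; the arm sizes and seed are illustrative, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_sim = 50, 20_000   # hypothetical patients per arm; simulation runs

ctrl = rng.normal(0, 1, (n_sim, n))
d1 = rng.normal(0, 1, (n_sim, n))
d2 = rng.normal(0, 1, (n_sim, n))

# Z statistics for each dose-vs-control comparison (known variance 1).
z1 = (d1.mean(1) - ctrl.mean(1)) / np.sqrt(2 / n)
z2 = (d2.mean(1) - ctrl.mean(1)) / np.sqrt(2 / n)

# The shared control arm induces correlation 1/2 between the statistics.
corr = np.corrcoef(z1, z2)[0, 1]
print(corr)   # close to 0.5
```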
In the second scenario, $H_{0,m}: \max(\theta_1, \theta_2) = 0$ is rejected. Since regimen 1 has been dropped at the interim analysis, this suggests that $\theta_2 > 0$. The final p-value is $p = 0.005$. The point estimate and the conservative two-sided 95% confidence interval are $\hat{\theta}_2 = 0.1947$ (0.047616, 0.393584).

Table 1 .
Two doses, two looks, common control and normally distributed endpoint.

Table 2 .
Critical boundary with two treatments and two looks, any distribution.

Table 3 .
Type I error control with two doses and two looks.