A causal modelling framework for reference-based imputation and tipping point analysis in clinical trials with quantitative outcome

ABSTRACT We consider estimation in a randomised placebo-controlled or standard-of-care-controlled drug trial with quantitative outcome, where participants who discontinue an investigational treatment are not followed up thereafter, and the estimand follows a treatment policy strategy for handling treatment discontinuation. Our approach is also useful in situations where participants take rescue medication or a subsequent line of therapy and the estimand follows a hypothetical strategy to estimate the effect of initially randomised treatment in the absence of rescue or other active treatment. Carpenter et al proposed reference-based imputation methods which use a reference arm to inform the distribution of post-discontinuation outcomes and hence to inform an imputation model. However, the reference-based imputation methods were not formally justified. We present a causal model which makes an explicit assumption in a potential outcomes framework about the maintained causal effect of treatment after discontinuation. We use mathematical argument and a simulation study to show that the “jump to reference”, “copy reference” and “copy increments in reference” reference-based imputation methods, with the control arm as the reference arm, are special cases of the causal model with specific assumptions about the causal treatment effect. We also show that the causal model provides a flexible and transparent framework for a tipping point sensitivity analysis in which we vary the assumptions made about the causal effect of discontinued treatment. We illustrate the approach with data from two longitudinal clinical trials.


Introduction
Missing outcome data represent a major threat to the validity of randomised controlled trials (RCTs), and appropriate analysis methods have been much discussed. An influential report showed that different analysis methods may target different estimands (different measures of treatment effect) and argued that specification of the estimand is an important part of trial design and should inform trial analysis and reporting (National Research Council 2010). Regulators have joined the call for estimands to be defined clearly, and the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) Steering Committee has endorsed the development of new regulatory guidance on the choice of estimands and sensitivity analysis in clinical trials (European Medicines Agency 2017).
We consider two types of estimand considered by the National Research Council (2010): (E1) difference in outcome improvement at the planned endpoint if all participants had tolerated or adhered to trial protocol; (E2) difference in outcome improvement at the planned endpoint for all randomised participants. The former measures how treatment works in an ideal setting (efficacy), while the latter measures how treatment might work in practice (effectiveness). To encompass outcomes that measure harms of treatment, Carpenter et al. (2013) (henceforth CRK) proposed the broader terms de jure and de facto estimand for (E1) and (E2) respectively. The ICH E9(R1) draft addendum (European Medicines Agency 2017) refers to a "hypothetical strategy" and a "treatment policy strategy" in defining estimands (E1) and (E2) respectively.
Sometimes investigators continue to collect data after treatment discontinuation. The use of such off-treatment data depends on the estimand (Permutt 2016). For the estimation of a de jure estimand, off-treatment data for participants who discontinued randomised treatment could be used in a complier average causal effect analysis (Dunn et al. 2003). In practice, however, off-treatment data are typically either not collected or excluded from the primary analysis, and the missing data are assumed to be missing at random (MAR): that is, it is assumed that participants who discontinued treatment would have benefited from continued treatment in the same way as those who remained on treatment. However, estimation of a de facto estimand ideally makes use of the off-treatment data, which should be collected where possible (National Research Council 2010). When all discontinuers are followed up and complete outcome data are obtained, the de facto estimand can be directly estimated by comparing observed means (Little and Kang 2015).
This paper considers estimation of a de facto estimand for a quantitative outcome when offtreatment data are not collected. For participants who have discontinued treatment, this requires assumptions about whether and to what extent they continue to benefit from their previous treatment. Our approach is also relevant to the situation in which rescue treatment (over and above the per-protocol treatment regime for the control arm) is available for those who discontinue randomised treatment, but interest is in the effect attributable to the initially randomised treatment without the confounding effects of rescue medications (corresponding to estimand 6 in Mallinckrodt et al. (2012)), and data after rescue are either unavailable or are ignored.
In the previous work on this topic, Little and Yau (1996) presented a multiple imputation (MI) approach that could incorporate a variety of alternative assumptions about the treatment effect after treatment discontinuation for the estimation of de facto estimands in RCTs. CRK presented a generic algorithm for MI of post-discontinuation outcome data. They assumed that post-discontinuation outcomes in a given trial arm behave in some way like outcomes in a reference arm (often the control arm), and proposed various specific methods for forming the imputation distribution. These methods have been called "reference-based imputation" (RBI) or "control-based imputation" methods. However, CRK did not theoretically justify these methods. In this paper, we assume that participants who have discontinued their randomised treatment receive treatment that is similar to that allocated to the control arm: thus for reference-based imputation, we take the reference to be the control treatment, where this is typically either placebo or standard of care.
Specification of estimands is clarified by using counterfactual outcomesoutcomes that have not been or could not have been observed. Such counterfactuals are best described using potential outcomes notation (Angrist et al. 1996;Little and Rubin 2000). Our aims in this paper are first to propose and implement a causal model, using explicit assumptions about the causal effect of a previously discontinued treatment, and second to show that three of the RBI methods are special cases of the causal model, and hence to provide their theoretical justification. Implementing the causal model requires untestable assumptions, so we need sensitivity analyses to understand the impact of these assumptions on inferences and conclusions from the primary analysis. Kenward et al. (2001) described a principled approach to sensitivity analyses which varies a sensitivity parameter that quantifies deviations from the missing data assumption. A tipping point sensitivity analysis (e.g. Yan et al. (2009);Liublinska and Rubin (2014)) extends this approach by varying the sensitivity parameter until the conclusion from the primary analysis is overturned. The third aim of this paper is to propose a tipping point sensitivity analysis using the causal model.
The paper is organised as follows: In Section 2, we set out notation and define the RBI methods. Section 3 contains the key new material: here we set out the causal model and discuss equivalence with RBI methods. In Section 4, we discuss implementation. In Section 5 we verify the equivalence of RBI and causal model estimates in a simulation study. In Section 6, we illustrate the proposed approach and demonstrate the tipping point analysis with two example data sets. We conclude with summary remarks in Section 7.

Notation
We consider a two-arm longitudinal RCT with quantitative outcome observations scheduled at baseline and at t max occasions after randomisation. Let Z be the random variable for the participant's randomised treatment arm: Z ¼ a for the active treatment arm and Z ¼ c for the control arm. Let Y t be the random variable for the participant's outcome at visit t ¼ 0; :::; t max . It is convenient to define Y t ¼ ðY 0 ; . . . ; Y t Þ, the vector of all outcomes up to and including visit t; Y > t ¼ ðY tþ1 ; . . . ; Y tmax Þ, the vector of all outcomes after visit t; and Y ¼ ðY 0 ; . . . ; Y tmax Þ, the vector of all outcomes. Let D be the random variable for the participant's last visit prior to discontinuing treatment, so D ¼ 0; :::; t max . Y t is observable for all t but only observed for t D, because we assume no off-treatment data. We aim to impute the unobserved values of Y t for t > D: we stress that these are the outcomes that existed but were unobserved, not the outcomes that would have existed if treatment had been continued.
We define the potential outcome Y t ðsÞ at visit t as the outcome that would have been observable if, possibly contrary to fact, a participant received active treatment for s periods only. In particular, Y t ð0Þ is the potential outcome if never treated, and Y t ðt max Þ is the potential outcome if always treated. We define Y t ðsÞ, Y > t ðsÞ and YðsÞ as before. We let μ t ðsÞ ¼ E Y t ðsÞ ½ , the mean of the potential outcome at visit t if active treatment was received for s periods only. Similarly, we define μ t ðsÞ, μ > t ðsÞ and μðsÞ. The variance-covariance matrix of the potential outcomes is ΣðsÞ ¼ var YðsÞ ð Þ with submatrices Σ t t ðsÞ, Σ > t > t ðsÞ and Σ > t t ðsÞ. We define the matrix of regression coefficients of potential outcomes after visit t on those up to visit t as β t ðsÞ ¼ Σ > t t ðsÞΣ t t ðsÞ À1 , and the residual variance of the potential outcomes after visit t given those up to visit t as Ω t ðsÞ ¼ Σ > t t ðsÞΣ t t ðsÞ À1 Σ > t t ðsÞ T . Figure 1 illustrates this notation in the case of an adverse outcome which deteriorates (increases) in the absence of treatment and improves (decreases) in the presence of treatment. If treatment is discontinued at time s then mean outcomes up to time s are unaffected (μ t ðsÞ ¼ μ t ðt max Þ for t s) but outcomes after time s are worsened (μ t ðsÞ > μ t ðt max Þ for t > s). The notation is summarised in supplementary appendix A.
This potential outcomes notation allows for only one type of treatment. We assume that the observed outcomes are not affected by other treatments. For the outcomes after treatment discontinuation, we assume either that rescue treatment (over and above the per-protocol treatment regime for the control arm) is not available or that interest is in the effect attributable to the initially randomised treatment without the confounding effects of rescue medications.
The de jure estimand (estimand E1) at visit t > 0 is E Y t ðtÞ À Y t ð0Þ ½ . The de facto estimand (estimand E2) is the estimand of interest in this paper and is E Often, primary interest is in the last visit, t ¼ t max .

Reference-based imputation
CRK proposed a generic MI algorithm for this setting: (1) For each treatment arm, fit a multivariate normal model to all observed data, using a Bayesian approach with an improper prior and assuming MAR. The model should have unstructured mean and unstructured variance-covariance matrix.
(2) For each treatment arm, draw a mean vector and variance-covariance matrix from the posterior distribution.
(3) For each treatment arm and each possible treatment discontinuation visit t, use the draws to build the hypothetical joint distribution of the outcomes Y t up to time t and the outcomes Y > t after time t, using one of the methods described below. (4) For each treatment arm and each observed treatment discontinuation visit t, construct the imputation distribution of Y > t given Y t . Sample Y > t from this conditional distribution, to create a "completed" data set. (5) Repeat steps 2-4 m times, resulting in m imputed data sets. (6) Analyse each imputed data set and combine the results using Rubin's rules (Rubin 1987).
To understand the assumptions behind the CRK algorithm, we express it using the potential outcomes notation.
In step 1, the model is fitted to each treatment arm separately. In the control arm, the observed outcomes are Y t ¼ Y t ð0Þ. In the active treatment arm, the observed outcomes are Y t ¼ Y t ðt max Þ, because we assume no off-treatment data. Hence, under MAR assumptions that we make explicit in Section 3.1, the multivariate normal model fitted to the control arm has mean μð0Þ and variance Σð0Þ, and that fitted to the active treatment arm has mean μðt max Þ and variance Σðt max Þ.
In step 2, values of μð0Þ, Σð0Þ, μðt max Þ and Σðt max Þ are drawn from their posterior distributions. In step 3, the drawn values are used to build hypothetical joint distributions of Y. Specifically, for participants in the active treatment arm who discontinue treatment at time t, a joint distribution is built for YjZ ¼ a; D ¼ t. CRK proposed using a multivariate normal distribution. Five methods are mainly distinguished by their choice of mean: • Missing at random (MAR): mean = μðt max Þ. Figure 1. Notation illustrated. Lines indicate mean potential outcomes under three potential treatment scenarios. Circles indicate observable outcomes for a participant who discontinues treatment at visit s.
CRK proposed corresponding variance matrices. We simplify their description by observing that only the regression coefficient matrix and conditional variance matrix of the potential outcomes after visit t given those up to visit t are used in later steps. CRK set these to be β t ðt max Þ and Ω t ðt max Þ for MAR and LMCF, and β t ð0Þ and Ω t ð0Þ for J2R, CIR and CR. An approach that we call RBI alternative instead uses β t ðt max Þ and Ω t ðt max Þ for all RBI methods.
In step 4, the joint distributions above are used to derive conditional distributions for The rest of the CRK algorithm follows standard MI methods, using the conditional distribution as the imputation model.
The hypothetical joint distributions under each of these methods are written in the notation of this paper in supplementary appendix B. The corresponding imputation distributions for the Z ¼ a; D ¼ t subgroup for any t < t max are given under "Reference-based imputation methods" in Table 1. The imputation means are written as a selection term, reflecting how the D ¼ t subgroup differs from other participants, plus a term linearly related to the treatment effect up to time t. This motivates our causal model in section 3, which relates causal treatment effects after D to those up to D.

New causal model
In this section, we first set out the assumptions of the causal model, and then derive the imputation model. Table 1. Imputation distribution of Y > t ðtÞ for t < t max given randomisation Z ¼ a, past Y t and treatment discontinuation visit D ¼ t, under various reference-based imputation methods with control arm as reference (Carpenter et al. 2013) and under the causal model. C t is a 'carry-forward' ðt max À tÞ Â ðt þ 1Þ matrix containing t columns of zeroes and a final column of ones, so that C t μ t ðtÞ is a column vector containing t max À t copies of μ t ðtÞ.

Imputation distribution
n o þ μ > t ð0Þ, but the expression given here facilitates comparison with the other methods.

Assumptions
Assumption A1. Randomisation is independent of potential outcomes: Z ? ? YðsÞ for all s.

Assumption A2. ðYjZ ¼ cÞ is missing at random (MAR).
Assumption A2 states that the observed outcomes in the control arm are MAR. If there are no missing data before treatment discontinuation, then we can also write this for all t. In other words, treatment discontinuation in the control arm does not relate to future untreated outcomes, given the past and present.
Assumption A3 states that the counterfactual fully treated outcomes in the active arm are MAR. If there are no missing data before treatment discontinuation, then we can also write this for all t. In other words, treatment discontinuation in the active arm does not relate to future counterfactual fully treated outcomes, given the past and present.
We do not assume that the actual outcomes in the active arm, ðYjZ ¼ aÞ, are MAR. Indeed, this is unlikely to be true, since (if treatment is effective) treatment discontinuation causally affects actual future outcomes. Thus, treatment discontinuation is allowed to relate to future actual outcomes, given the past and present.
Assumption A4. Y > t ðtÞjY t ðtÞ follows a linear regression for each t.
This assumption implies that the conditional mean of each future potential outcome Y u ðtÞ (u > t) depends linearly on the past observed outcomes Y 1 ðtÞ; . . . ; Y t ðtÞ. We make no assumption of linearity in t or u, so that trajectories over time have no assumed form. A4 is true if YðtÞ follows a multivariate Normal distribution. The linear regression has mean μ > t ðtÞ þ β t ðtÞfY t ðtÞ À μ t ðtÞg and residual variance matrix Ω t ðtÞ.
A5 states that treatment discontinuation at visit t is unaffected by future partly treated potential outcomes. It appears similar to the equation A3, but the latter refers instead to future fully-treated potential outcomes. If there are no missing data before treatment discontinuation then a stronger assumption which implies both A3 and A5 is for all t and all s > t.
A6 is an explicit assumption about how the maintained effect of treatment after discontinuation relates to the effect of treatment before discontinuation. Equivalently, K t is a ðt max À tÞ Â ðt þ 1Þ matrix of sensitivity parameters: it is not identified by the data and must be specified by the user. Some suggestions for K t are made in Section 3.3. Our model makes no assumption about how the effect of active treatment changes over time while active treatment is continued. However, implicit in assumption A6 is that there is no delayed response to the control treatment: thus when a patient discontinues randomised treatment, we assume the effects of any treatments they switch to are similar to the effects they would have experienced had they received the control treatment from the start of the trial.

Modelling outcomes after treatment discontinuation
In this subsection, we consider an individual in the active arm who stops treatment at visit t < t max . We use the above assumptions to derive a model for this individual's outcomes after treatment discontinuation, conditional on their history Y t . In section 3.3 we take this model as an imputation model and compare it with the RBI imputation models.
We write the conditional mean outcome after treatment discontinuation in this model as the sum of three terms: where the first term represents the difference between the subgroup who discontinue at visit t and the whole group ("selection term"), the second term represents the treatment effect in the whole group ("maintained treatment effect"), and the third term is the untreated mean.
We write the selection term as Using assumption A6, and substituting (1) and (3) into (2) gives the mean of the imputation model: For the variance, we approximate var ð Þ which is valid when differences between drop-out patterns are small compared with the variation in the data, and otherwise conservative. By A1, this is var Y > t ðtÞjY t ðtÞ ð Þ¼Ω t ðtÞ. This imputation distribution is given under "Causal model" in Table 1.

Using the causal model
We need to fix three parameters in order to identify the causal model: K t , β t ðtÞ and Ω t ðtÞ. In most cases the post-discontinuation treatment effect may be assumed to depend only on the treatment effect at the discontinuation visit and not on treatment effects at earlier visits, and therefore K t has non-zero elements only in the final column; our software implementation below relies on this assumption. For tipping point sensitivity analyses, we consider two single-parameter causal models for the outcome at visit u after discontinuation at visit t: where v u ; v t are the times (on a suitable scale) of visits u; t. The maintained treatment effect after treatment discontinuation is constant in model (5) but decays exponentially in model (6), being multiplied by k 1 for every unit of time, where 0 k 1 1. A combined model is Next, we need to fix β t ðtÞ, the matrix of regression coefficients of Y > t ðtÞ on Y t ðtÞ. Assumptions A2 and A3 identify β t ð0Þ, the regression of Y > t ð0Þ on Y t ð0Þ, and β t ðt max Þ, the regression of Y > t ðt max Þ on Y t ðtÞ, respectively. We propose assuming either β t ðtÞ ¼ β t ð0Þ or β t ðtÞ ¼ β t ðt max Þ. We call these "regression from reference" and "regression from active", respectively. If all treatment effects are homogeneous (i.e. if YðtÞ À Yð0Þ does not vary between individuals for any t) then β t ðtÞ ¼ β t ðt max Þ ¼ β t ð0Þ and both "regression from reference" and "regression from active" are valid. If we are willing to assume equal variance-covariance matrices across trial arms (Σðt max Þ ¼ Σð0Þ) then β t ðt max Þ ¼ β t ð0Þ and "regression from reference" and "regression from active" give the same results. The same arguments and proposals apply for Ω t ðtÞ.

Comparison with reference-based imputation
From Table 1, RBI methods J2R, CIR and CR correspond to particular choices of the causal model, while the MAR and LMCF methods do not correspond to this causal model. K t is set to 0 for J2R, C t for CIR, and β t ð0Þ for CR. This makes precise the statement of Mallinckrodt et al. (2012) that, under CIR, CR and J2R, the Z ¼ a; D ¼ t subgroup has the treatment effect at visit t maintained, diminished and eliminated, respectively, at visit t max . Further, β t ðtÞ and Ω t ðtÞ are set to β t ð0Þ and Ω t ð0Þ. If the RBI alternative variance structures are used then the same equivalences apply, but with β t ðtÞ ¼ β t ðt max Þ and Ω t ðtÞ ¼ Ω t ðt max Þ.

Estimation
The CRK algorithm described in section 2 is easily adapted to impute under the causal model. Steps 1 and 2 are unchanged, and provide draws of μð0Þ, Σð0Þ, μðt max Þ and Σðt max Þ.
Step 3 is skipped since the imputation distribution is directly derived from the causal model.
Step 4 starts by imputing any missing data in the control arm under assumption A2, and any missing data in the active arm before treatment discontinuation under assumption A3. It then constructs the imputation distribution for active-arm data after treatment discontinuation using specification of K t , β t ðtÞ and Ω t ðtÞ as in Section 3.3. Steps 5 and 6 are unchanged. We describe implementation using the SAS macros developed by James Roger to perform MI under the RBI methods. These are available on the web page (on www.missingdata.org.uk) of the DIA working group for missing data. We modified the Part2A macro to impute under the causal model with where k is a scalar that may vary between participants. This enables causal model (5) to be implemented by setting k ¼ k 0 , the same for all participants. When interest is in the outcome at visit t max , causal model (6) can be implemented by setting k ¼ k t max ÀD 1 , which varies across participants with different values of D. The modified macro is available on the DIA working group web page and sample code is provided in supplementary appendix C. By default, the variance-covariance matrices in the two arms, Σðt max Þ and Σð0Þ, are assumed equal, but the user can specify them to be unequal, which is the case we consider.
Alternative implementations are given in supplementary appendix D.

Simulation
We performed a simulation study to verify equivalence of the RBI methods with the proposed causal model for estimating the treatment effect at the final visit and to assess the impact of misspecification of K t and β t ðtÞ. Mis-specification of Ω t ðtÞ has no impact on bias and little impact on variance, and for brevity is not discussed.

Design
Details of the data generating mechanism for simulating the observed and unobserved data are given in appendix E. Briefly, we consider an RCT with one baseline observation and two post-baseline visits during the treatment period (that is, t max ¼ 2). Some active-arm participants discontinue treatment after visit 1 and are not observed at time 2; all other participants continue randomised treatment and are fully observed. The mechanism for discontinuing treatment is either MCAR or MAR. The data distribution for the observed data has either β 1 ð2Þ ¼ β 1 ð0Þ or β 1 ð2ÞÞβ 1 ð0Þ: the latter is designed to make choices of β 1 ð1Þ important. For each mechanism for simulating the observed data, we analysed the data in three ways, each with several different settings. For complete data, we generated the unobserved data using a maintained treatment effect parameter k ¼ 0; 0:5; 0:74 or 1 and setting β 1 ð1Þ ¼ β 1 ð0Þ or β 1 ð2Þ. For causal model imputation, we imputed the missing data using the causal model assuming a maintained treatment effect parameterk ¼ 0; 0:5; 0:74 or 1, and settingβ 1 ð1Þ, the assumed value of β 1 ð1Þ, equal to β 1 ð0Þ or β 1 ð2Þ. For RBI imputation, we imputed the missing data using the reference-based imputation methods CR, CIR and J2R with the variance-covariance matrix taken from the control arm or from the active arm. In all cases, we estimated the treatment effect from a linear regression of Y 2 on randomised arm and baseline Y 0 . With imputed data, standard errors were computed using Rubin's rules. Table 2 displays the average estimated treatment difference at visit 2 for each data generating mechanism (columns) and each analysis method (rows).

Results
Comparing panels A (analysis of complete data) and B (analysis by causal model imputation) shows that the causal model imputation methods result in unbiased estimates when the assumed values ofβ 1 ð1Þ andk agree with the true values of β 1 ð1Þ and k.
Comparing panels B and C (analysis by RBI imputation) shows that the RBI estimates with variancecovariance matrix drawn from the control arm (as in CRK) agree with specific cases of the causal model estimates with β 1 ð1Þ ¼ β 1 ð0Þ, and the RBI estimates with variance-covariance matrix drawn from the active arm (as in RBI alternative) agree with specific cases of the causal model estimates with β 1 ð1Þ ¼ β 1 ð2Þ. Specifically, J2R corresponds tok ¼ 0, CIR corresponds tok ¼ 1, and CR corresponds tok ¼ the second element of β 1 ð1Þ which is 0.50 or 0.74 depending on the data generating mechanism.
Comparing different choices ofβ 1 ð1Þ when the observed data had either no selection effect (i.e. under MCAR) or β 1 ð0Þ ¼ β 1 ð2Þ (that is, in the first three data generating mechanisms), we see that choice ofβ 1 ð1Þ does not affect estimates, as expected from Section 3.4. Sensitivity to choice ofβ 1 ð1Þ was observed in the fourth data generating mechanism (MAR with β 1 ð0ÞÞβ 1 ð2Þ): mean causal model estimates were reduced by 0.29 by assuming β 1 ¼ β 1 ð0Þ instead of β 1 ¼ β 1 ð2Þ. Sensitivity to choice ofk was the same for all values of β 1 (see Section 3.4): for example, assumingk ¼ 0 instead of k ¼ 1 reduced mean causal model estimates by 0.50 irrespective of the value of β 1 . Table 3 displays the average standard error (SE) (the average of the 1000 SEs) and the empirical SE (the sample standard deviation of the 1000 point estimates) for the treatment difference at the final visit. Empirical and average SEs for J2R and CIR are similar to those for the corresponding causal model estimates. The SEs for CR are slightly larger than those for the causal model withk ¼ the second element of β 1 , because β 1 is estimated in CR whilek is an assumed value in the causal model. With MAR data, the larger average and empirical SEs due to using the variance-covariance matrix from the active arm rather than from the control arm arise mainly because there are no missing data in the control arm and heterogeneity is larger in the active arm than in the control arm. More importantly, as shown in Seaman et al. (2014), the results confirm that both RBI and causal model methods give (1) smaller empirical SEs than the estimator based on the complete data, and (2) larger average SEs (estimated using Rubin's rules) than the empirical SEs of the methods and the empirical SEs based on the complete data. We comment on these observations in the discussion.

Examples
We use two example data sets from randomised, double-blind, parallel-group studies comparing active treatment with placebo. The first is from a trial of 172 participants with major depressive disorders, taken from the DIA page of www.missingdata.org.uk, and used in the DIA working group to demonstrate various missing data related analytical methods. The outcome variable is the 17-item Hamilton Depression Rating Scale, HAMD17. The second, kindly supplied by Devan Mehrotra, is from a pain trial with a pain score as outcome. In the HAMD17 trial, 76% (64/84) and 74% (65/88) of the randomised participants completed the final (fourth) visit in the active and placebo arms, respectively. In the pain score trial, the completion rate at the final (sixth) visit was 70% (47/67) and 67% (36/54) in the active and placebo arms, respectively. In both trials, participants were not followed up after treatment discontinuation. The observed trajectory means and the frequency of dropout patterns in each trial are shown in Figure 2.
We used the SAS 5 macros for implementing the RBI methods and causal models (Section 4). For the RBI methods, we assumed participants in the active arm were treated similarly to the placebo arm after discontinuing the active treatment. To construct the joint distribution of pre-and postdiscontinuation active-arm data under the RBI methods, we first used the variance-covariance matrix from the placebo arm (RBI analyses) and then repeated the methods with the variancecovariance matrix from the active arm (RBI alternative analyses). Table 4 shows the estimated treatment effect on HAMD17 and pain score at the final visit from standard MI, MMRM and RBI methods. The standard MI and MMRM methods estimate the de jure estimand. These differ slightly for HAMD17 because of a small incompatibility between the imputation and analysis models: the imputation model uses all visits to estimate a common effect of the baseline covariate PoolInv, but the analysis model uses only the final visit. The RBI methods estimate the de facto estimand and show, as expected, treatment estimates of smaller magnitude than the de jure estimand, with J2R giving the smallest magnitude of treatment effect followed by CR. Table 3. Simulation study: average standard error (empirical standard error) for the treatment difference at the final visit using complete data, causal model imputation and RBI imputation. β 1 ð0Þ and β 1 ð2Þ as in Table 2 Using the variance-covariance matrix from the active arm rather than from the placebo arm gives slightly more conservative estimates. We next demonstrate tipping point sensitivity analyses using causal models (5) and (6). In model (5), a fraction k 0 of the treatment effect is maintained at all visits after discontinuation. Figure 3 shows the de facto estimates and 95% CI over a range of k 0 from −0.5 to 2.5. As shown in the theory and the simulation results, the J2R and CIR estimates correspond to using the causal Figure 2. HAMD17 and pain score data sets: observed mean profile according to the visit at which treatment was discontinued in the active and placebo arms.
Note: In the pain score data, four subjects in the active arm and two subjects in the placebo arm did not complete any post-baseline visit and were excluded from analysis. model with k 0 ¼ 0 (no maintained treatment effect after discontinuation) and k 0 ¼ 1 (fully maintained treatment effect after discontinuation), respectively. Values k 0 < 0 mean that the effect of treatment after discontinuation is harmful, while values k 0 > 1 mean that the effect of treatment after discontinuation is greater than before discontinuation. The tipping point analysis on HAMD17 shows that statistical significance is lost when k 0 < 0 (with variance-covariance from the placebo arm) or k 0 < 0:05 (with variance-covariance from the active arm). In both cases, this suggests that the de facto estimate of treatment effect on HAMD17 is non-significant only if any benefit of the active treatment is lost immediately following discontinuation. For the pain score trial, statistical significance is lost when k 0 < 1:1 or k 0 < 1:3 (depending on whether the variancecovariance matrix is taken from the placebo or the active arm, respectively). This suggests that, in order for the de facto estimate of treatment effect to be statistically significant, there would need to be a delayed benefit such that the treatment effect was greater after discontinuation than before discontinuation. In both trials, comparing Table 4 with Figure 3 shows that the MAR analyses give estimates of the de jure estimand that are numerically similar to the causal model estimates of the de facto estimand when values around k 0 ¼ 2 are assumed. In model (6), the treatment effect decays exponentially after discontinuation. Here, k 1 ¼ 0 for J2R and 1 for CIR. We took visits as the timescale, so that v t ¼ t in model (6). Figure 4 shows the de facto estimates of treatment effect at the final visit and its 95% CI from the causal model over a range of k 1 . This model does not accommodate the effect of treatment after discontinuation being either harmful or greater than before discontinuation, and because of the more limited range of k 1 , the tipping point is not reached: all results are statistically significant for HAMD17 and not significant for the pain score.

Discussion
We have considered longitudinal RCTs with quantitative outcomes in which participants who discontinue an active treatment are not followed up thereafter, but are assumed to receive a treatment similar to the control treatment. We have focused on estimating the effect of assignment to treatment in the actual treatment circumstances of the trial (de facto or treatment-policy estimand) rather than the treatment effect if all participants had tolerated or adhered to trial protocol (de jure or hypothetical estimand). We have proposed a generalised causal modelling approach to account for treatment discontinuation in the estimation of the de facto estimand. The proposed causal model makes an explicit assumption about the maintained causal effect of treatment after treatment discontinuation and provides flexibility to perform sensitivity analyses to the causal assumption. The causal model agrees with RBI methods in certain cases, and this provides a formal justification of these RBI methods.
The proposed causal model specifies how much of the treatment effect is maintained after treatment discontinuation, which we represent by the matrix K t . We illustrated this with two examples of K t : equation (5) with the maintained treatment effect independent of time since discontinuation, and equation (6) with the maintained treatment effect decaying exponentially with visits since discontinuation. A simple extension would allow K t to depend on the reason for treatment discontinuation. Ideally, sponsors should justify the choice of K t in the trial protocol based on the nature of the trial and the treatments.
The choice of regression slope β t ðtÞ in the imputation model, reflecting within-subject dependence of post-discontinuation outcomes on pre-discontinuation outcomes should similarly be pre-specified. It is hard to recommend a single choice and perhaps both β t ðtÞ ¼ β t ð0Þ and β t ðtÞ ¼ β t ðt max Þ should be implemented. If the analyst is willing to assume equal variance-covariance . HAMD17 and pain score data sets: tipping point analysis for the estimated treatment effect at the final visit using causal model (6). The model has the treatment effect decaying exponentially after treatment discontinuation, by a ratio k 1 for each visit. The horizontal solid and dashed lines represent the treatment estimates and their pointwise 95% CI, respectively. The tipping point is not attained in the range 0 k 1 1. matrices across trial arms then the situation is simpler and β t ðtÞ ¼ β t ð0Þ ¼ β t ðt max Þ is the obvious choice. It is sensible to make the corresponding choices for the residual variance matrix Ω t ðtÞ.
Our model has been presented for the case of active-arm treatment discontinuation, where subjects who discontinue do not then receive rescue medication over and above the per-protocol treatment regime for the control arm or when interest is in the effect attributable to the initially randomised treatment without the confounding effects of rescue medications. An unresolved problem is how to handle initiation of rescue medications when the confounding effects of rescue medications are of interest. The model can be extended to handle the control arm starting active treatment: an assumption like A6 still holds, but the K t matrix must be replaced by assumptions about how the treatment effect builds up over time.
Assumption A6 implies that if treatment has no effect before discontinuation then it has no effect after discontinuation. This seems reasonable in general; if it was unreasonable in a particular trial, then a constant term could be added in assumption A6 and equation (1). Other assumptions are possible, such as a non-linear model.
We have focussed on varying assumption A6, but we should also assess a number of other assumptions. The MAR assumptions A2 and A3, and the related assumption A5, could be made more plausible if the model could be extended to include further time-dependent covariates. Alternatively one could explore sensitivity to these assumptions by methods like those of Ratitch et al. (2013). It is less clear how to assess departures from the linearity assumption A4.
All the methods we have considered -RBI methods, causal model and MMRMmake a multivariate Normal (MVN) assumption. Our key finding that the causal model and RBI methods are equivalent is valid even if the MVN assumption is false. However, failure of the MVN assumption risks causing bias in all the methods. The assumption can be checked in the observed data using standard methods. If the MVN assumption is correct for the observed data and the maintained treatment effect model (6) is correct, then the imputed data have the correct mean, and so the treatment effect in the imputed data is unbiased even if the MVN assumption is false for the unobserved data. If data were skewed, then it would be wise to consider a transformation before analysis.
Our model applies to quantitative outcomes. Extension to other outcomes would be useful. The repeated-sampling variance of the estimated treatment effect tends to be smaller than the Rubin's rules estimate of variance for a given K t ( Table 3). The repeated-sampling variance can be approximated in practice using the delta method (Liu and Pang 2016;Oehlert 1992). Carpenter et al. (2014) argue that the repeated-sampling variance is not appropriate since it is typically smaller than the complete-data variance (to an extent which depends on the value of K t ). They also argue that the Rubin's rules estimate of variance of the treatment effect is larger than the complete-data variance, because of the information lost due to the missing data, and this makes it an appropriate variance (Carpenter et al. 2014;Cro et al. 2019). We point out that the type I error rate is correct for the repeated-sampling variance and too small for the Rubin's rules variance, meaning that the Rubin's rules variance carries a loss of power; therefore, the repeated-sampling variance may be appropriate for a primary analysis.
In summary, whilst MI is an attractive and powerful method for handling missing data in both experimental and observational studies, it is not always clear what estimand is being targeted or what assumptions are being made about how outcomes for subjects who discontinue randomised treatment relate to those who remain on study. The recent estimands debate (European Medicines Agency 2017) has led to a growing recognition that more complex estimation approaches that do not rely on randomisation may be needed to handle post-randomisation events that lead to missing data, and there are calls for causal inference methods to become more widely adopted (e.g. Akacha et al. (2017); Little and Kang (2015)). We join this call to encourage greater understanding and application of ideas from the causal inference literature to help support the definition and estimation of estimands of interest in a randomised clinical trial. We hope that this paper illustrates how a causal inference framework can provide clarity and rigour in stating estimands, stating assumptions, and performing estimation.