Nonparametric bounds for causal effects in imperfect randomized experiments

Nonignorable missingness and noncompliance can occur even in well-designed randomized experiments, making the intervention effect that the experiment was designed to estimate nonidentifiable. Nonparametric causal bounds provide a way to narrow the range of possible values for a nonidentifiable causal effect with minimal assumptions. We derive novel bounds for the causal risk difference for a binary outcome and intervention in randomized experiments with nonignorable missingness caused by a variety of mechanisms, with or without noncompliance. We illustrate the use of the proposed bounds in our motivating data example of the effect of peanut consumption on the development of peanut allergies in infants.


Introduction
The goal of randomized experiments is to estimate the causal effect of an intervention such as a medical treatment, vaccine, or social program. However, when some subjects in the sample available at the end of the study are missing outcome information, the causal effect may be nonidentifiable.
When there is no missing data, randomization allows for the identification of the effect of being assigned to the intervention, sometimes called the intent-to-treat (ITT) effect; this is equivalent to the intervention effect only if subjects comply with their assigned intervention as directed. When this is not the case, the intervention effect can also be nonidentifiable, even with no missing data.
There are few papers that focus on bounding nonidentified causal effects in randomized experiments with missing data. A notable exception is Horowitz and Manski [2000] who derive bounds for the risk difference conditional on a measured baseline covariate, making no assumptions about the missingness mechanism. Marden et al. [2018] derive bounds for population proportions under nonignorable missing outcome data, but not causal contrasts.
Additionally, practitioners almost always calculate an assumption-free bound when outcome data are missing in a trial by imputing missing data in the least favourable way for the intervention. Specifically, if the intervention is expected to reduce the probability of the outcome being equal to 1, missing outcomes in the intervention arm would be imputed as 1, and in the control arm as 0; this is recommended as a sensitivity analysis by the European Medicines Agency: CPMP/EWP/1776/99 [2010]. One can form bounds by additionally imputing in the most favourable way possible, obtaining what we will call the best/worst case bounds.
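As a concrete illustration, the best/worst case imputation can be sketched in a few lines. The function name and trial counts below are hypothetical, and we assume the convention above that the intervention is expected to reduce the probability of the outcome.

```python
# Best/worst case bounds via extreme imputation of missing binary outcomes.
# All counts below are hypothetical.

def best_worst_bounds(e1, n1, m1, e0, n0, m0):
    """e = observed events, n = observed outcomes, m = missing outcomes,
    for the intervention arm (suffix 1) and control arm (suffix 0).
    Returns (lower, upper) bounds on the risk difference."""
    N1, N0 = n1 + m1, n0 + m0
    # Least favourable imputation for an intervention expected to reduce risk:
    # missing outcomes imputed as events in arm 1 and as non-events in arm 0.
    upper = (e1 + m1) / N1 - e0 / N0
    # Most favourable imputation: the reverse.
    lower = e1 / N1 - (e0 + m0) / N0
    return lower, upper

lo, hi = best_worst_bounds(e1=10, n1=90, m1=10, e0=30, n0=85, m0=15)
print(lo, hi)
```

The width of the resulting interval is exactly the total proportion of missing outcomes across the two arms, which is why even modest missingness can be very costly in terms of informativeness.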
Noncompliance is a well-known concept in the causal inference literature. Balke and Pearl [1997] developed nonparametric bounds for the causal risk difference when subjects may not comply with the assigned intervention. When noncompliance is compounded by missing outcome data due to study drop-out, loss to follow-up, or withdrawal of consent, the standard method of best/worst case imputation does not bound the intervention effect.
To our knowledge, bounds for the intervention effect have not yet been derived for settings with both nonignorable missingness and noncompliance.
Much of the nonparametric causal bounds literature uses the method developed in Balke and Pearl [1994] for deriving valid and tight bounds. Valid means that there are no values of the true causal effect outside of the bounds, while tight means that there are no values inside the bounds that the true causal effect cannot take on given the available information and assumptions. In order to use this method, the causal effect of interest and the constraints implied by the causal model must be stated as a linear optimization problem. For this reason, much of the literature on nonparametric bounds for causal effects has focused on simple random sampling in observational studies and completely observed data in randomized experiments, which can easily be stated as linear programming problems provided the causal target is linear. Kuroki et al. [2010] and Gabriel et al. [2020] are exceptions, deriving bounds in settings where the optimization problem is nonlinear. Kuroki et al. [2010] derive bounds for the risk ratio under case-control and cohort sampling, with and without missing exposure data. Gabriel et al. [2020] derive bounds under more general outcome-dependent observational studies. Although nonignorable missingness can be considered a form of outcome-dependent sampling, Gabriel et al. [2020] and Kuroki et al. [2010] do not consider settings with a randomized exposure.
We derive bounds for the causal risk difference of an intervention in randomized experiments under a variety of settings with nonignorable missingness of the outcome, with and without noncompliance, where compliance is also subject to missingness. We consider three settings with perfect compliance, with differing forms of nonignorable missing data, and five settings that also have noncompliance. We only consider settings where missingness makes observation of compliance impossible, such as in our motivating example, where the intervention (peanut exposure) occurs repeatedly over long-term follow-up up to the time of the outcome measurement. While all three settings we consider under perfect compliance are novel, to our knowledge, three of the five scenarios we consider in the noncompliance settings are equivalent to instrumental variable scenarios considered in Gabriel et al. [2020]. In addition, in settings with noncompliance and nonignorable missingness of the outcome, we provide novel bounds under the assumption of no defiers, which in some settings are tighter than the bounds not assuming no defiers.
We map each of the scenarios with noncompliance to a scenario with perfect compliance to consider bounds for the ITT or assignment effect, which is then comparable to the best/worst case bounds in those settings, as the best/worst case bounds are for the assignment effect and not the intervention effect. Because of this difference in estimand, best/worst imputation, which is often considered the most robust or least biased way to report effects in imperfect trials, can give much narrower bounds that do not even contain the causal effect of the intervention when noncompliance is ignored. For this reason, when compliance is assessed, we recommend using our proposed bounds for the intervention effect in addition to best/worst case imputation for the assignment effect.
In our motivating example, which concerns the effect of regular exposure of infants to peanut products prior to 60 months of life on allergic reactions to peanuts at 60 months, there is both observed noncompliance and missingness due to dropout. In the primary publication of this trial, the classic worst case imputation method is used as a sensitivity analysis for nonignorable missingness [Du Toit et al., 2015]. We demonstrate that although this procedure covers the assignment effect, there is much greater uncertainty in the causal risk difference for the intervention. However, as all bounds exclude a null effect, we strongly confirm the finding of the study that regular exposure of infants to peanut products reduces their risk of peanut allergies later in life.
The paper is structured as follows. In Section 2 we define notation, provide basic definitions and assumptions, describe the causal models of interest, and review the relevant previously derived bounds. In Section 3 we describe the methods that we use to derive the novel bounds that we present in Section 4. In Section 5 we qualitatively compare the novel bounds, and in Section 6 we carry out a simulation study to assess their performance. In Section 7 we analyse and discuss our motivating example, before providing a summary and discussion of future work and limitations in Section 8.

Notation
Let X be the binary intervention, Y the binary outcome of interest, and Y (x) the potential (or counterfactual) outcome [Rubin, 1974, Pearl, 2009] for a given subject if the intervention is set to level x. Let O be an indicator of having observable outcome and compliance information; O = 1 for "observable" and O = 0 for "not observable". Let U be a set of unobserved variables that will represent common causes or confounders, with no restrictions on the distribution of U. Thus, the observed data distribution is given by p{X, Y |O = 1}, where p{·} denotes the probability mass function. As this is a randomized trial and we know all subjects' X values, we observe p{O = 1|X = x}, and therefore the probabilities of interest p{Y = y, O = 1|X = x} = p{Y = y|O = 1, X = x}p{O = 1|X = x} are observable or estimable.
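The factorization above can be checked numerically. The sketch below, with an arbitrary simulated missingness mechanism, estimates both sides of p{Y = y, O = 1|X = x} = p{Y = y|O = 1, X = x}p{O = 1|X = x} from simulated data; all data-generating probabilities are illustrative assumptions, not the paper's models.

```python
import random

random.seed(1)

# Simulate a trial where missingness depends on the (latent) outcome, so only
# p{X, Y | O = 1} and p{O | X} are observed. The data-generating probabilities
# here are arbitrary choices for illustration.
data = []
for _ in range(200_000):
    x = random.random() < 0.5                  # randomized intervention
    y = random.random() < (0.2 if x else 0.4)  # binary outcome
    o = random.random() < (0.6 if y else 0.9)  # outcome-dependent observation
    data.append((x, y, o))

def prob(pred, cond):
    """Empirical conditional probability p{pred | cond} in the sample."""
    sub = [d for d in data if cond(d)]
    return sum(pred(d) for d in sub) / len(sub)

# Left-hand side: p{Y = 1, O = 1 | X = 1}.
lhs = prob(lambda d: d[1] and d[2], lambda d: d[0])
# Right-hand side: p{Y = 1 | O = 1, X = 1} * p{O = 1 | X = 1}.
rhs = (prob(lambda d: d[1], lambda d: d[0] and d[2])
       * prob(lambda d: d[2], lambda d: d[0]))
print(abs(lhs - rhs) < 1e-12)  # the identity holds up to floating point
```

Both sides reduce to the same ratio of counts, which is why the left-hand side is estimable even though p{Y |X} itself is not.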
When compliance is imperfect, the randomization and the actual intervention are not the same. Let R be the assignment of a subject to X, which is always randomized, with R = 1 meaning that one was randomized to X = 1, and R = 0 to X = 0. Let Y (r) be the potential (or counterfactual) outcome for a given subject if the randomization is set to level r, and let X(r) be the same for the intervention. In this setting we observe p{X, Y, R|O = 1}, but because we are only considering randomized trials, one will also always know the marginal probabilities p{R = 1} and p{O = 1}, and therefore p{O = 1|R = r}. We can use this to obtain the probabilities of interest in this setting, p{X = x, Y = y, O = 1|R = r} = p{X = x, Y = y|O = 1, R = r}p{O = 1|R = r}. Note that in the noncompliance setting one may, in some settings, be able to observe p{X}; however, we do not consider these situations. We also do not consider settings where X may be missing for more subjects than Y; we consider a single missingness mechanism where X and Y are both observed or both missing.
Our target parameter of interest is the effect of the intervention as measured by the causal risk difference, θ = p{Y (1) = 1} − p{Y (0) = 1}. Though this is likely the causal estimand of interest, in settings with noncompliance or where compliance is unknown one might also consider what might be referred to as the assignment effect, or the ITT effect, τ = p{Y (r = 1) = 1} − p{Y (r = 0) = 1}. For convenience of notation, we define the following probability abbreviations. Let p_{y.x1} = p{Y = y, O = 1|X = x} and p_{o.x} = p{O = o|X = x}, with analogous abbreviations for probabilities that condition on R.

Settings
The causal diagrams [Pearl, 2009] in Figures 1a-2e represent possible scenarios in a randomized experiment. Figures 1a-1c could be described as randomized experiments with perfect compliance but nonignorable missingness in the outcome. The nonignorable missingness mechanisms we consider are of three types: missingness that is only causally related to the outcome of interest (Figure 1a), missingness that is associated with the outcome of interest because of an unmeasured common cause of the missingness and the outcome, in addition to being causally related to the outcome (Figure 1b), and missingness that is additionally causally related to the intervention (Figure 1c).
Real life settings that fit all the perfect compliance scenarios are single time-point intervention trials where the intervention is administered at the time of randomization. Some examples are a one-dose vaccine, a surgical intervention, or a single-dose intravenous treatment, where subjects may have previously been screened for entry into the study but are not randomized, and therefore not actually enrolled, until just before the intervention is performed.
Although this type of randomization procedure reduces or even eliminates compliance issues, unless the endpoint is immediate, such trials can still suffer from nonignorable missingness in the outcome. In contrast, any time an intervention requires active participation from the subjects under study, compliance as well as missingness can be an issue. The actual intervention X may differ from the randomized assignment R, and therefore X and Y are confounded in all settings of Figure 2; noncompliance alone can render the intervention effect nonidentifiable. Figure 2a has noncompliance in addition to nonignorable missingness due to a causal effect of the outcome on the missingness, without confounding. Figure 2b is the same as Figure 2a but with additional confounding. Figures 2c through 2e depict various causal relationships between the missingness and the outcome, the randomization, and the true intervention X, but all have nonignorable missingness due to unmeasured common causes of the missingness and the outcome, as well as potential causal effects of the outcome, intervention, and randomization, all under noncompliance.
Real life trials that fit Figure 2 include any take-at-home medications, diet or physical activity interventions. When such a trial uses an intervention that is available to all participants, and is not blinded to participants, any type of noncompliance is possible. For example, in a randomized trial of diet and exercise it might be the case that being told not to exercise or diet may induce some participants to exercise, while telling those same subjects to exercise might overwhelm them or make them defensive, thus inducing them to not perform the randomized intervention regardless of their randomization. For this reason, bounds not considering any further assumptions about the type of compliance may be needed in many experimental settings.
In any of the settings with noncompliance, it may be of interest to further consider whether it is possible that subjects randomized to a particular intervention would defy it. This assumption can be stated in terms of the counterfactuals as X(r = 1) ≥ X(r = 0) for all subjects. Angrist et al. [1996] and others have referred to this assumption as monotonicity, but we will use the term no defiers for clarity. The no defiers assumption is justified in settings with an experimental intervention only available to those randomized to it: placebo subjects will not have access to the intervention, and therefore X(r = 0) = 0. This setting implies no defiers, but it is not required for no defiers to be a plausible assumption.
Instead, our real data example offers a less restrictive setting where no defiers is plausible, even though some subjects randomized to no intervention are observed to take some form of the intervention. Our real data example is a trial of peanut exposure for infants, where children are randomized in an unblinded manner either to an intervention of consuming peanut products or to avoidance of all exposure to peanuts. Some parents elected to feed their children peanut products in the avoidance arm, and some parents elected to avoid peanuts in the intervention arm. Provided the proportion receiving the intervention of peanut products is higher in the arm randomized to the intervention than in the arm randomized to no intervention, there is no observable way to rule out the no defiers assumption. It is also hard to imagine a rationale that would compel these parents to do the opposite had they been randomized differently, although it is possible that we simply do not observe enough defiers to detect this pattern. We therefore consider bounds in all settings with noncompliance both with and without the no defiers assumption.

Previous bounds

Robins [1989] derived bounds in the setting with noncompliance but without missing data, i.e., Figure 2a without O; however, Balke and Pearl [1997] showed that those bounds are not tight. The best/worst case bounds that are often used in practice can be written in terms of the true probabilities as

p{Y = 1, O = 1|X = 1} − p{Y = 1, O = 1|X = 0} − p{O = 0|X = 0} ≤ θ ≤ p{Y = 1, O = 1|X = 1} + p{O = 0|X = 1} − p{Y = 1, O = 1|X = 0}. (2)

Replacing x in p_{y.x1} and p_{o.x} with r, and ignoring x, gives the theoretical best/worst case bounds for the assignment effect τ in the case of noncompliance. We will compare against this theoretical version of the best/worst case bounds in what follows whenever we use the true rather than the estimated probabilities. Horowitz and Manski [2000], as mentioned in the introduction, derived bounds for the risk difference conditional on a baseline covariate in randomized settings with missing data, making no assumptions about the missingness mechanism. It can easily be shown that, in the special case where there is no baseline covariate, the bounds given in Corollary 1 of their Theorem 1 simplify to the best/worst case bounds given in (2).

Methods
Gabriel et al. [2020] modified the method of Balke and Pearl [1994] to apply to a partially observed setting, providing bounds in the settings of Figures 2b and 2c, which they referred to as confounded outcome-dependent and confounded exposure- and outcome-dependent settings. However, they considered these under the conceptual framework of an instrumental variable and an observational study with unmeasured confounding. We will use a similar approach to derive bounds for Figures 1b-1c and 2d-2e. Gabriel et al. [2020] use a different approach to account for the nonlinear constraint implied by unconfounded sampling, i.e., the lack of an arrow from U to O in Figure 2a, and a setting similar to Figure 1a but with unmeasured confounding between X and Y. We will follow a similar approach to derive bounds for Figure 1a and for Figure 2a assuming no defiers.

Linear programming
In order to use this algorithm to derive bounds that are valid and tight, one must derive linear constraints relating observed probabilities to counterfactual probabilities that are necessary and sufficient for the observed distribution to be in the causal model. We also show that the target quantity θ is linear in counterfactual quantities. Treating θ as the objective function and optimizing it subject to the linear constraints in terms of the observed probabilities is a linear programming problem. Solutions to this problem can be found symbolically by applying Balke's implementation of a vertex enumeration algorithm [Balke and Pearl, 1994, Mattheiss, 1973]. This gives the bounds on the causal effect of interest as the minimum (maximum) of a list of terms involving only observable probabilities, each of which corresponds to a vertex. This demonstrates that for the problems in Figures 1b, 1c, and 2b-2e, valid and tight bounds on θ can be derived symbolically in terms of p_{y1.x} and p_{xy1.r} according to this algorithm.
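The symbolic vertex enumeration used here can be mimicked numerically for a single fixed observed distribution. The sketch below, assuming the classic noncompliance setting of Balke and Pearl [1997] without missing data, encodes the sixteen response-type probabilities as linear programming variables and optimizes the risk difference with scipy.optimize.linprog; the function names and the example distribution are hypothetical, and this is an illustration of the general technique rather than the paper's symbolic derivation.

```python
import itertools

import numpy as np
from scipy.optimize import linprog

# Response types: (x0, x1, y0, y1) = (X(r=0), X(r=1), Y(x=0), Y(x=1)).
TYPES = list(itertools.product((0, 1), repeat=4))

def consistent(r, x, y, t):
    """Does response type t produce observation (X = x, Y = y) under R = r?"""
    x0, x1, y0, y1 = t
    xr = x1 if r == 1 else x0
    yr = y1 if xr == 1 else y0
    return xr == x and yr == y

def ate_bounds(p):
    """p[(x, y, r)] = p{X = x, Y = y | R = r}. Returns (lower, upper) for
    theta = p{Y(1) = 1} - p{Y(0) = 1} by linear programming over the 16
    response-type probabilities."""
    A_eq, b_eq = [], []
    for r in (0, 1):
        for x in (0, 1):
            for y in (0, 1):
                A_eq.append([1.0 if consistent(r, x, y, t) else 0.0
                             for t in TYPES])
                b_eq.append(p[(x, y, r)])
    A_eq.append([1.0] * len(TYPES))  # probabilities sum to one
    b_eq.append(1.0)
    c = np.array([y1 - y0 for (_, _, y0, y1) in TYPES], dtype=float)
    res_lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    res_hi = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    return res_lo.fun, -res_hi.fun

# Hypothetical distribution with perfect compliance (X = R); theta is then
# point identified, so the bounds should collapse to a single value.
p = {(1, 1, 1): 0.3, (1, 0, 1): 0.7, (0, 0, 1): 0.0, (0, 1, 1): 0.0,
     (0, 1, 0): 0.5, (0, 0, 0): 0.5, (1, 1, 0): 0.0, (1, 0, 0): 0.0}
lo, hi = ate_bounds(p)
print(round(lo, 6), round(hi, 6))
```

Solving the program numerically for one distribution gives a single pair of bounds; the vertex enumeration approach instead solves the program symbolically, yielding closed-form bound expressions valid for any observed distribution.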

Expansion
In the settings of Figures 1a and 2a, the lack of an arrow from U to O implies constraints that are nonlinear. We will therefore use a different approach that yields valid but not necessarily tight bounds in these settings.

Result 1:
The bounds given in (3) and (4) are valid for θ in the setting of Figure 1a, provided that A(y, 0) = q_{1y.o}/q_{0y.o} is defined for every value of y, and that p{X = 0|O = 0} > 0 and p{X = 1|O = 0} > 0.
We give detailed derivations of Result 1 in the supplementary material. It is of note that we show that these bounds are valid, but we do not claim that they are tight, in the setting of Figure 1a. In fact, the bounds are not tight and can be made tighter, as discussed in Section 5.
All bounds that follow, other than those in Result 5, use the modification of the linear programming method of Balke [1995] that was first introduced in Gabriel et al. [2020] for partial observation of the joint probabilities of the data.

Result 2:
The bounds for θ given in (5) and (6) are valid and tight in the setting of Figure 1b.

Result 3:
The bounds for θ given in (7) are valid and tight in the setting of Figure 1c.

Result 4:
The bounds for θ given in (8) and (9) are valid and tight in the settings of Figures 2c-2e.

Figure 2 bounds for θ under the no defiers assumption

Result 5:
Under the no defiers assumption, the bounds for θ given in (10) are valid in the setting of Figure 2a. We derive these bounds by starting with the single-term bound given in Balke [1995] for the setting of Figure 2a without missing data under the no defiers assumption, then using the same expansion procedure described above to arrive at the bounds in (10). These are the first terms of the lower/upper bounds for Figure 2a not assuming no defiers, which are given in the supplementary materials.

Result 6:
Under the no defiers assumption, the bounds for θ given in (11) and (12) are valid and tight in the setting of Figure 2b.

Result 7:
Under the no defiers assumption, the bounds for θ given in (13) and (14) are valid and tight in the settings of Figures 2c-2e.

Figure 2 bounds for τ
Under perfect compliance, R = X, and therefore all Figure 1 bounds are for both θ and τ .
This is not the case with noncompliance. As ITT or assignment results are often used in randomized clinical trials regardless of noncompliance or missingness issues, we map the Figure 1 bounds for θ to assignment effect bounds for τ in Figure 2.
Result 8: The bounds for θ given in (3) and (4) for Figure 1a are valid for τ , replacing X with R, in the setting of Figure 2a.

Estimation of bounds
Up to this point, the bounds have been discussed only in terms of true probabilities. However, all proposed bounds are functions of probabilities that can be estimated by their sample proportions to produce estimated bounds. To account for the statistical uncertainty in the estimates due to sampling, we suggest the nonparametric bootstrap [Efron, 1979], whose use we illustrate in both the simulations and the real data example.
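A minimal sketch of the suggested bootstrap, applied here to the best/worst case bounds for simplicity; the simulated trial, the missingness mechanism, and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def bw_bounds(x, y, o):
    """Estimated best/worst case bounds for the risk difference from
    subject-level arrays: x = intervention, o = outcome observed,
    y = outcome (only meaningful where o == 1)."""
    ests = []
    for fill1, fill0 in ((1, 0), (0, 1)):  # the two extreme imputations
        y1 = np.where(o == 1, y, fill1)[x == 1].mean()
        y0 = np.where(o == 1, y, fill0)[x == 0].mean()
        ests.append(y1 - y0)
    return min(ests), max(ests)

# Hypothetical trial with outcome-dependent missingness.
n = 500
x = rng.integers(0, 2, n)
y = (rng.random(n) < np.where(x == 1, 0.2, 0.4)).astype(int)
o = (rng.random(n) < np.where(y == 1, 0.7, 0.9)).astype(int)

# Percentile bootstrap: resample subjects with replacement, recompute the
# estimated bounds, and take quantiles of each bound separately.
boots = np.array([bw_bounds(x[idx], y[idx], o[idx])
                  for idx in (rng.integers(0, n, n) for _ in range(2000))])
ci_lower = np.quantile(boots[:, 0], 0.025)  # lower limit of the lower bound
ci_upper = np.quantile(boots[:, 1], 0.975)  # upper limit of the upper bound
print(ci_lower <= ci_upper)
```

Taking the outer limits (lower confidence limit of the lower bound, upper confidence limit of the upper bound) gives a conservative interval for the causal effect, which is the construction used in the simulations below.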
The bounds derived in Result 1 are valid but not tight. To tighten them, note that any bounds that additionally allow for confounding of either the X−Y or Y−O relationship are also valid under Figure 1a, and may sometimes be narrower. This is shown in Figure S1 of the supplementary materials, motivating a refined set of bounds (15) for Figure 1a that will be used instead of the bounds in Result 1 for the remainder of the paper. This refinement also clearly holds for the bounds for τ in Figure 2a, and we will therefore use the bounds in (15), replacing X with R, for the τ assignment effect bounds in Figure 2a for the remainder of the paper.
In addition to the refinement for Figure 1a, it is easily shown that the bounds in (7) are equivalent to the best/worst case bounds in (2). Therefore, whenever one uses the best/worst case bounds in settings with perfect compliance, one is in effect allowing for confounding between the outcome and the missingness, and for the missingness to be influenced by both the outcome and the intervention, as in Figure 1c. Although we make numerical comparisons in the simulations, the bounds in (7) will never differ from the bounds in (2). Balke and Pearl [1997] and Robins [1989] found that assuming "monotonicity", which they equate to the no defiers assumption as we present it, results in a set of bounds in the noncompliance setting that are a subset of the bounds not making the no defiers assumption. As pointed out in Balke [1995], because of the structure of the bounds (taking the maximum of the lower bound terms and the minimum of the upper), if the bounds derived under no defiers are valid, tight, and a subset of the valid bounds not assuming no defiers, then nothing is gained by using the bounds assuming no defiers: when there are in fact no defiers, the only active terms in the bounds derived without the assumption must be the terms given in the no defiers bounds. Otherwise, the bounds assuming no defiers would either be invalid or not tight.
We also find that the tight and valid bounds derived via the linear programming method in the settings of Figures 2c-2e are a subset of the bounds derived without making the no defiers assumption. As they are both tight and valid, this implies that there is again nothing gained by assuming no defiers: having no defiers automatically makes the terms displayed in (13) and (14) the only active terms in (8) and (9), or, under Figure 2c, the single active terms from the set of four. We also demonstrate this via simulation. This does not hold, however, under Figure 2b: the terms given in (11) and (12) are not a subset of those in (5) and (6), and the no defiers bounds are occasionally tighter, a fact we also demonstrate via simulation.
The bounds given in (8) and (9), which are valid and tight in Figures 2c-2e, become the bounds in (7) when R = X. A similar equivalence was observed in the noncompliance setting with no missing data by Balke and Pearl [1997]. The same is true in the setting of Figure 2b, although we do not reproduce the bounds not assuming no defiers here. Considering the bounds assuming no defiers given in (11) and (12), it is easy to see that if R = X, these bounds become those given for Figure 1b in (5) and (6).

Simulations
We carried out simulation studies in order to compare the width of the true bounds across the different causal diagrams, assess the impact of the amount of missingness on the width of the true bounds, and also to assess the performance of estimated bounds based on samples.
For the settings with noncompliance in Figure 2, we generate probability distributions p{U, R, X, Y, O} by modifying the model in (16), obtaining the model in (17). As above, the constants ε_1, ε_2, ε_3 determine which of the five settings in Figure 2 are satisfied.
We first generate 1000 distributions for each setting from the models in (16) and (17).
Then we compute the bounds under each setting and the best/worst case bounds using the true probabilities generated by the random coefficients. The relative widths of the bounds compared to the best/worst procedure for distributions generated under settings 1a-1c are shown in Figure 3. The bounds computed under 1a and 1b are always equal to or narrower than the best/worst procedure; however, when the distribution does not satisfy setting 1a, the 1a bounds occasionally fail to cover the true θ (indicated by darker dots and boxes), and when the distribution does not satisfy 1b, the 1a and 1b bounds occasionally fail to cover the truth. The bounds computed under setting 1c are numerically identical to the best/worst procedure, as expected.

Figure 3: The y-axis shows the width of the bounds for each setting minus the width of the best/worst case bounds (denoted bw in the figure). Light grey dots and boxes indicate cases where the bounds are valid (i.e., the true value is within the bounds); dark grey indicates bounds that are invalid.
In Figure 4, we show the absolute width of the bounds for the settings of Figure 2. The best/worst case bounds are frequently invalid for the intervention effect under all settings of Figure 2, as indicated by the darker shaded boxes and dots. Although the best/worst case bounds are clearly narrower, they target the assignment effect τ and therefore frequently do not cover the true intervention effect θ. Under settings 2c-2e, the bounds computed under 2b are occasionally invalid. The bounds of 2a appear quite robust; we did not observe any distributions in which the bounds of 2a were invalid, a robustness also seen in Gabriel et al. [2020].
When generating distributions under α 3 > 0, which is implied by the no defiers assumption, we find that the no defiers bounds for setting 2b are narrower than the 2b bounds allowing defiers 28% of the time. The no defiers bounds for the other settings in Figure 2 are never narrower than the bounds allowing defiers for the same setting out of 10,000 generated distributions. These results are illustrated in Figure S3 of supplementary materials.
To investigate the impact of the amount of missingness on the informativeness of the study, we generate distributions with fixed β_3 and γ_2, and varied γ_1. Figure 5 shows the average width of the bounds as a function of the proportion observed. Even with relatively small amounts of missing data (< 5%), the bounds quickly become very wide, particularly in the settings of Figure 2. The width of the bounds also appears to increase approximately linearly in the proportion missing.
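For the best/worst case bounds specifically, the linear relationship is exact: subtracting the lower limit in (2) from the upper cancels the outcome terms, leaving p{O = 0|X = 1} + p{O = 0|X = 0}. A quick numerical check of this, with arbitrary outcome probabilities:

```python
# Width of the best/worst case bounds in (2): the outcome terms cancel,
# leaving exactly the total probability of missingness across the two arms.

def bw_width(p_y1o1_x1, p_y1o1_x0, p_o0_x1, p_o0_x0):
    upper = p_y1o1_x1 + p_o0_x1 - p_y1o1_x0
    lower = p_y1o1_x1 - p_y1o1_x0 - p_o0_x0
    return upper - lower

for miss in (0.01, 0.05, 0.10, 0.25):
    # arbitrary outcome probabilities, equal missingness in both arms
    width = bw_width(0.2 * (1 - miss), 0.4 * (1 - miss), miss, miss)
    print(miss, round(width, 10))
```

With equal missingness in both arms the width is twice the per-arm missingness probability, regardless of the outcome probabilities, which is consistent with the roughly linear trend observed for the other bounds.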
To investigate the performance of the estimated bounds, we fix the values of the parameters, generate trials of size n = 200 or 2000 from those distributions, and calculate the empirical proportions needed to compute the bounds. We then use the nonparametric bootstrap of this procedure to compute quantile-based 95% confidence limits for the lower and upper bounds. Coverage of the 95% bootstrap confidence intervals for the estimated bounds is shown in Table 1 for trial sizes of 200 and 2000 with a missingness probability of 25%. We consider several values of θ over 1000 simulated replicates. A few of the confidence intervals have somewhat too low or too high coverage probability, but most have nearly 95% coverage, as expected. Using the upper confidence limit of the upper bound and the lower confidence limit of the lower bound, we observe 100% coverage of the true risk difference in these scenarios.

Real Data Application
Du Toit et al. [2015] present the findings from a randomized controlled trial designed to estimate the causal effect of peanut consumption on the development of allergy to peanuts in infants. A total of 640 participants between 4 and 11 months of age were randomized either to consume peanuts or to avoid peanuts until the age of 60 months. Compliance with the assigned intervention was assessed weekly using a food frequency questionnaire, and by manual inspection of the infants' cribs for peanut crumbs in a subset of participants. At 60 months, the primary outcome of peanut allergy was assessed using an oral food challenge.
Outcome data were missing for some participants, either due to loss to follow-up or due to failure of the oral food challenge procedures. The publicly available trial data were downloaded from the Immune Tolerance Network TrialShare website on 2020-06-15 (https://www.itntrialshare.org/, study identifier: ITN032AD).
This study clearly falls into one of the settings of Figure 2, as both compliance and missing outcome data were issues in the study. The primary results in the manuscript were reported as the proportion with food allergy at 60 months in the assigned intervention groups. The per-protocol analysis and the worst case imputation analysis were reported as sensitivity analyses. Here we compute and report our bounds.
Our estimated bounds for θ and τ are shown in Table 2, along with bootstrap 95% confidence intervals. We see that noncompliance and missing data lead to a great deal more uncertainty in the causal effect estimate relative to sampling variability. Nevertheless, the bounds still exclude a risk difference of 0, suggesting that there is compelling evidence that consuming peanuts reduces the risk of peanut allergy at 60 months. Compared to the point estimate of −0.14 reported by Du Toit et al. [2015], the range of possible causal effects spans −0.29 to −0.01 without any additional assumptions. The original publication reports the per-protocol estimate of the intervention effect as −0.17, and the worst case imputation estimate as −0.12. Based on inspection of the publicly available data, however, their worst case imputation estimate is more accurately described as a "pessimistic imputation" rather than worst case, since not all subjects with missing outcomes in the intervention arm were imputed as having an allergic event, and not all subjects with missing outcomes in the avoidance arm were imputed as not having an event. Thus, our best/worst case bounds cover, but are not exactly the same as, their published "worst case" imputation results.

Discussion
To ensure validity of causal effect estimates in a randomized experiment, every effort should be made to avoid missing data due to drop-out [Fleming, 2011]. When missing data are unavoidable, our bounds can be used to quantify the uncertainty in the causal effect of an intervention while making minimal assumptions about the nature of the missingness mechanism. Our bounds can often be narrower than the best/worst case bounds in settings with perfect compliance. It is also of note that although the technique of best/worst sensitivity analysis is commonly applied and reported in clinical trials, to our knowledge the nonparametric bounds implied by the procedure based on the true probabilities have not been previously presented in this manner in the literature.
When noncompliance is also an issue our proposed bounds provide direct information on the causal effect of the intervention, in contrast to the best/worst case imputation approach which assesses the effect of assignment to intervention. Additionally, when no defiers is a plausible assumption, our bounds can be tightened in particular settings. Our motivating data example demonstrates how our bounds can be applied to answer important scientific questions regarding the size of causal effects in trials that are subject to noncompliance and nonignorable missing data.
We have assumed throughout that the practitioner, having randomized the experiment and followed its progression, has adequate knowledge to determine the underlying causal diagram. We acknowledge that this may not always be the case. It may, however, be possible in some settings to use the observed data to infer the causal relationships via causal discovery algorithms [Spirtes and Glymour, 1991], or by observing that the computed bounds are not compatible with the assumed setting, i.e. the computed upper bound is less than the computed lower bound. This is of course a limitation of this work, as in settings where the assumed causal diagram does not hold, the bounds are in no way guaranteed to cover the true causal effect. However, unlike in observational settings, there are many characteristics of the experiment that can help narrow the set of plausible causal diagrams without the need for testing. For example, in a triple-blind clinical trial it is implausible that randomization would have a causal effect on missingness, and in a point-of-care, single-time-point intervention it is implausible that there would be noncompliance. These characteristics should clearly be considered when selecting the assumed setting under which to calculate the bounds.
Although we have considered the addition of the no defiers assumption in settings with noncompliance, there are many additional monotonicity assumptions that could be made in the various settings. For example, it may be plausible in some settings that missingness is monotone in the intervention or the outcome, which may lead to tighter bounds. Additionally, the stronger assumption that no control subject, R = 0, can take the intervention, X = 1, may lead to tighter, or simply different, bounds than those derived under the weaker no defiers assumption. Investigation of such additional monotonicity settings is a current area of research for the authors.

Figure 2: The y-axis shows the absolute width of the bounds for each setting and for the best/worst case bounds (denoted bw in the figure). Light grey dots and boxes indicate cases where the bounds are valid (i.e., the true value is within the bounds); dark grey, cases where the bounds are invalid.

The authors contributed equally to this work. Corresponding author: erin.gabriel@ki.se. October 13, 2020.

1 Previous Bounds

For convenience of notation, we define the following probability abbreviations. Let …, and

θ ≤ 1 − (q_{10.1} + q_{01.1}) q^{−} min{1, 1/(1 + A(1, 1))},

We will use these bounds in the refinement of our proposed Figure 1a bounds.
Bounds derived in Gabriel et al. [2020] for the setting of Figure 2a of the main text, without the no defiers assumption, are provided below in terms of the abbreviations that match the main text, where B(y, r, o) = p_{1y.ro}/p_{0y.ro}.
2 Linear programming method

In order to use this algorithm, one must show that the problem in question can be stated as a linear programming problem. First, we observe that the response functions for the causal diagrams in Figure 1 are defined for each possible level of w = (w_x, w_y, w_o). This representation permits us to relate potential outcome probabilities to probabilities of the response function variables.
To determine the constraints implied by the causal diagram, we can relate all observed probabilities to the probabilities of the response function variables by recursively evaluating the response functions; e.g., for a fixed w, the observed variables are determined by

X = f_X(w_x), Y = f_Y(X, w_y), O = f_O(X, Y, w_o),

with arguments as dictated by the particular diagram. Therefore, for x, y ∈ {0, 1},

p_{xy1} = Σ_{w : X(w) = x, Y(w) = y, O(w) = 1} P(W = w).

Since W_X is independent of (W_Y, W_O), we can factorize the terms in the sum as P(W = w) = P(W_X = w_x) P(W_Y = w_y, W_O = w_o). For Figures 2b–2e of the main text, the derivations become slightly more complex due to the additional variable. Nevertheless, the same logic allows us to factorize the response function variable for R out of the constraints, and therefore to derive necessary and sufficient constraints on the probability distribution that are linear and in terms of probabilities of the form p_{xy1.r}.
The linear objective (the risk difference) and the linear constraints define a constrained linear optimization problem. Solutions to this problem can be found symbolically by applying Balke's implementation of a vertex enumeration algorithm. In brief, this algebraically reduces the variables in the optimization problem, then adds slack variables so that all constraints are converted into inequality constraints. The dual of this problem is to maximize (minimize) a linear function of the observed probabilities, subject to a set of constraints. Thus the extremum of the causal query, as stated in terms of the potential outcome probabilities, is equal to the extremum in the observed probability space defined by the dual constraints. Then, by noting that those constraints describe a convex polytope in the observed probability space, the global extrema can be found by enumerating all of the vertices of the polytope. This gives the bounds on the causal effect of interest as the minimum (maximum) of a list of terms involving only observable probabilities, each of which corresponds to a vertex of this polytope. This demonstrates that for the problems in Figures 1b, 1c, and 2b–2e, tight and valid bounds on θ can be derived symbolically according to this algorithm.
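A numerical analogue of this procedure is easy to sketch. The toy below is our own encoding, not the symbolic vertex enumeration: it assumes scipy is available, uses the simple chain X → Y → O with X randomized (so W_X is independent of the other response-function variables), and bounds θ for one hypothetical observed distribution by solving two linear programs over the 16 response-type probabilities:

```python
import numpy as np
from scipy.optimize import linprog

# Response-type encodings: wy indexes Y as a function of x,
# wo indexes O as a function of y (never / follow / oppose / always).
Y_of = lambda x, wy: (0, x, 1 - x, 1)[wy]
O_of = lambda y, wo: (0, y, 1 - y, 1)[wo]

def lp_bounds(p_obs, px1=0.5):
    """Bound theta = P(Y(1)=1) - P(Y(0)=1) by linear programming.

    p_obs[x][y] = observed P(X=x, Y=y, O=1); px1 = P(X=1), known from
    randomization. Decision variables are q(wy, wo) = P(W_Y=wy, W_O=wo).
    """
    idx = lambda wy, wo: 4 * wy + wo  # flatten q(wy, wo) into 16 variables
    # Objective: theta = P(W_Y = 'Y=x') - P(W_Y = 'Y=1-x')
    c = np.zeros(16)
    for wo in range(4):
        c[idx(1, wo)], c[idx(2, wo)] = 1.0, -1.0
    # Equality constraints: observed cells plus sum-to-one
    A_eq, b_eq = [], []
    for x in (0, 1):
        for y in (0, 1):
            row = np.zeros(16)
            for wy in range(4):
                for wo in range(4):
                    if Y_of(x, wy) == y and O_of(y, wo) == 1:
                        row[idx(wy, wo)] = px1 if x else 1 - px1
            A_eq.append(row)
            b_eq.append(p_obs[x][y])
    A_eq.append(np.ones(16))  # q is a probability distribution
    b_eq.append(1.0)
    lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
    hi = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
    return lo, hi
```

The symbolic algorithm returns the same extrema as closed-form expressions in the observed probabilities; the numeric LP above recovers them only for one distribution at a time.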

Derivation of Theorem 1
In the setting of Figure 1a of the main text, the data distribution is defined by the 8 probabilities q_{xy.o} for (x, y, o) ∈ {0, 1}³. The unknown probabilities q_{xy.0}, (x, y) ∈ {0, 1}², are constrained by Σ_{x,y} q_{xy.0} = 1.
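If, as the notation suggests, q_{xy.o} abbreviates p{X = x, Y = y | O = o} (our reading; the original definition is elided above), then the full-data cell probabilities decompose by the law of total probability as

```latex
p\{X = x, Y = y\} = p\{O = 1\}\, q_{xy.1} + p\{O = 0\}\, q_{xy.0},
```

so any contrast written in terms of p{X = x, Y = y} splits into an observed piece weighted by p{O = 1} and an unobserved piece weighted by p{O = 0}, with only the latter free to vary subject to Σ_{x,y} q_{xy.0} = 1.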
We can expand and derive bounds in our setting with partially observed data. To derive bounds for θ in the setting of nonignorable missingness, Figure 1a, that take the constraint in (8) into account, we proceed as follows: we partition (9) into observed and unobserved pieces.
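To make the partition concrete, one can scan the unobserved piece numerically. The sketch below is our own construction with hypothetical numbers: it assumes q_{xy.o} = p{X = x, Y = y | O = o} and imposes only the simplex constraint together with the known randomization probability P(X = 1) = 1/2, i.e., no assumptions on the missingness mechanism, so it will generally be wider than bounds that exploit the structure of Figure 1a:

```python
def simplex_grid(step):
    """All points (a, b, c, d) with a + b + c + d = 1 on a grid of width step."""
    n = round(1 / step)
    for a in range(n + 1):
        for b in range(n + 1 - a):
            for c in range(n + 1 - a - b):
                yield a / n, b / n, c / n, (n - a - b - c) / n

def scan_bounds(q_obs, po1, px1=0.5, step=0.02):
    """Bracket theta = P(Y=1|X=1) - P(Y=1|X=0) by scanning the
    unobserved probabilities q_{xy.0} over the simplex.

    q_obs[(x, y)] = q_{xy.1}, the observed P(X=x, Y=y | O=1);
    po1 = P(O=1). Grid points inconsistent with randomization are discarded.
    """
    po0 = 1.0 - po1
    lo, hi = float("inf"), float("-inf")
    for q00, q01, q10, q11 in simplex_grid(step):
        # The implied P(X=1) must match the randomization probability.
        px = po1 * (q_obs[1, 0] + q_obs[1, 1]) + po0 * (q10 + q11)
        if abs(px - px1) > step:
            continue
        risk1 = (po1 * q_obs[1, 1] + po0 * q11) / px1
        risk0 = (po1 * q_obs[0, 1] + po0 * q01) / (1.0 - px1)
        lo, hi = min(lo, risk1 - risk0), max(hi, risk1 - risk0)
    return lo, hi
```

As step → 0 the scan approaches the exact extrema over the unobserved probabilities; the closed-form bounds trace out those extrema symbolically.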
One can then set one or more of the unobserved probabilities to zero. When p{O = 1|Y = y} = 0 for either value of y, this will be clearly observable, and both A(y, 1) and A(y, 1)^{−1} will be undefined. Although it is possible to bound the causal effect by replacing the max of the unknown probabilities with 1 and the min with 0, this will never be truly informative; instead we suggest evaluating what went wrong in the trial and either abandoning the intervention or re-running the study altogether.
For example, if one observes no cases, implying that p{O = 1|Y = 1} = 0, this may be because the event is extremely rare, the sample size is too small, or the follow-up time is too short.
Alternatively, if one observes that p{O = 1|Y = 0} = 0, either the intervention does not work and the event is extremely common, or the follow-up time is too long and the outcome is inevitable, e.g. death or taxes. Additionally, if p{X = x|O = 0} = 0 for either x, such that the ratios are undefined, this may be due to the intervention having a direct effect on sampling, suggesting that the causal diagram in Figure 1a of the main text may not be the correct setting.

Figure 2: Results for distributions generated under α_3 > 0, which is implied by the no defiers assumption. The bounds ending in "m" are the ones derived under the no defiers assumption. Light grey dots and boxes indicate cases where the true value is within the bounds (valid); dark grey, cases where it is not (invalid).