Marginal Structural Models to Estimate Causal Effects of Right-to-Carry Laws on Crime

Abstract
Right-to-carry (RTC) laws allow the legal carrying of concealed firearms for defense in certain states in the United States. I used modern causal inference methodology from epidemiology to examine the effect of RTC laws on crime over the period from 1959 up to 2016. I fitted marginal structural models (MSMs), using inverse probability weighting (IPW) to correct for criminological, economic, political and demographic confounders. Results indicate that RTC laws significantly increase violent crime by 7.5% and property crime by 6.1%. RTC laws significantly increase murder and manslaughter, robbery, aggravated assault, burglary, larceny theft and motor vehicle theft rates. Applying this method to this topic for the first time addresses methodological shortcomings of previous studies, such as conditioning away the effect, overfitting and the inappropriate use of county-level measurements. Data and analysis code for this article are available online.


Introduction
Lott and Mustard (1997) concluded that the introduction of right-to-carry (RTC) laws in states in the United States decreases violent crime. These laws allow the legal carrying of concealed firearms for self-defense. Since then, many studies (see Section 5) have been conducted on this issue, with conflicting results. For example, Donohue, Aneja, and Weber (2019) concluded that RTC laws substantially increase violent crime. As described in Lott and Mustard (1997), Lott (2010), and Donohue, Aneja, and Weber (2019), RTC laws could simultaneously have both a negative effect on crime (e.g., by deterrence) and a positive effect on crime (e.g., by escalation or displacement).
The National Research Council (U.S.) (2004) concluded that new analytical approaches are needed to reach a robust, scientifically supported conclusion. For example, as described in Section 5, many methods typically used in previous studies can suffer from adjusting away the effect, overfitting and the inappropriate use of county-level measurements, among other problems. In this article, I attempt to address these deficiencies using marginal structural models (MSMs), a causal inference technique popular in epidemiology (Robins, Hernán, and Brumback 2000; Hernán and Robins 2006a).

Background: Marginal Structural Models (MSMs)
MSMs are used in epidemiology to estimate the causal effect of a treatment on a chosen outcome in medical patients from observational data (Hernán, Brumback, and Robins 2000; Hernán and Robins 2006a). Below, I introduce the theory of MSMs and illustrate their advantages over standard models.
MSMs are based on the concept of counterfactuals, also called potential outcomes (Robins 1999; Höfler 2005). Counterfactuals are the outcomes that would have been observed had a certain exposure been applied to an observational unit: for instance, the outcome of patients after exposure to a medical treatment, or the crime rate after the introduction of RTC laws in states in the United States. Causal effects can then be defined as contrasts between these potential outcomes (Hernán 2004; Hernán and Robins 2006a).
For example, consider patients receiving either a medical treatment or a placebo in a clinical trial. The outcome Y could be survival (yes or no). With dichotomous exposure A, the potential outcome for individual i when receiving treatment level 0 (the placebo) is $Y_{i,a=0}$, and the potential outcome for individual i when receiving treatment level 1 (the medical treatment) is $Y_{i,a=1}$. The causal effect of the medical treatment on survival, as compared to the placebo, for individual i could then be expressed as the difference $Y_{i,a=1} - Y_{i,a=0}$.
At the population level, the causal effect of a dichotomous exposure A on an outcome Y can be defined as a contrast between distributions of potential outcomes. The distribution of outcomes if every observational unit had received exposure level 0 is $f(y_{a=0})$, and the distribution of outcomes if every unit had received exposure level 1 is $f(y_{a=1})$. The causal effect comparing these two distributions could be expressed, for instance, as a difference or ratio of the survival rates. Such a contrast is referred to as a marginal causal effect. This is precisely the effect of interest when estimating the causal effect of a specific action, treatment or policy, such as the effect of RTC laws on crime (Hernán 2004).
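The difference between individual and marginal causal effects can be made concrete with a toy example. The Python sketch below (Python is used here only for illustration) assumes six hypothetical individuals whose potential outcomes under both treatment levels are known, which is never the case in practice, and computes the individual effects $Y_{i,a=1} - Y_{i,a=0}$ together with the marginal difference and ratio of survival proportions.

```python
# Hypothetical potential outcomes (1 = survived, 0 = died) for six individuals.
# In a real study only one of the two values is observed for each individual.
y_a0 = [0, 1, 0, 1, 0, 1]  # outcome under placebo (a = 0)
y_a1 = [1, 1, 0, 1, 1, 1]  # outcome under treatment (a = 1)

# Individual causal effects: Y_{i,a=1} - Y_{i,a=0}
individual_effects = [y1 - y0 for y0, y1 in zip(y_a0, y_a1)]

# Marginal quantities: survival if everyone received a = 0 versus a = 1
p0 = sum(y_a0) / len(y_a0)
p1 = sum(y_a1) / len(y_a1)
risk_difference = p1 - p0
risk_ratio = p1 / p0

print(individual_effects)                                # [1, 0, 0, 0, 1, 0]
print(round(risk_difference, 3), round(risk_ratio, 3))   # 0.333 1.667
```

The marginal contrast averages over individual effects that differ between individuals, which is the kind of effect targeted when evaluating a policy such as RTC laws.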

Marginal Structural Models (MSMs)
Using the counterfactual framework, the marginal causal effect of an exposure on a chosen outcome can be described quantitatively using a MSM (Robins, Hernán, and Brumback 2000; Hernán and Robins 2006a). For example, the MSM

$$E[Y_a] = \beta_0 + \beta_1 a \tag{1}$$

could describe the causal effect of a dichotomous exposure A on a continuous outcome Y. The response variable $Y_a$ is the potential outcome that would have been observed had every unit of observation received the same specific treatment level a. Parameter $\beta_1$ then quantifies the causal effect of A on Y: it equals the difference in the mean of Y between the distributions of the potential outcomes corresponding to the two treatment levels of A, $f(y_{a=0})$ and $f(y_{a=1})$. The parameters of such a MSM could be estimated by fitting the standard regression model

$$E[Y \mid A] = \beta_0 + \beta_1 A \tag{2}$$

on observations $\{Y_i, A_i\}$ from a randomized experiment with no selection bias. However, in observational studies bias due to confounding is often present, as described below.

Confounding
Confounding occurs when one or more covariates have a causal effect both on the exposure allocation and on the outcome (Pearl 2009). Such covariates are referred to as "confounders." An example of a confounder in a medical setting is disease progression in HIV-positive individuals when estimating the effect of treatment on mortality: disease progression could affect both the start of treatment and mortality. In the present study, for example, the composition of the state legislature could affect both the introduction of RTC laws and crime rates, either directly or indirectly. Unadjusted effect estimators such as Equation (2) are biased estimators of the causal effect when confounding is present (Greenland and Morgenstern 2001). Confounding can be adjusted for using various methods. I will describe both traditional "conditional" regression models, in which confounders are included as covariates, and inverse probability weighting (IPW) to correct for confounding while fitting a MSM.
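The bias of the unadjusted estimator in Equation (2) under confounding can be illustrated with a small simulation. The Python sketch below is purely hypothetical: it assumes a binary confounder L that raises both the probability of exposure and the outcome, a true causal effect of A on Y of 2, and compares the difference in means (the least-squares estimate of $\beta_1$ in Equation (2)) under randomized and under confounded exposure allocation.

```python
import numpy as np

rng = np.random.default_rng(2021)
n = 200_000
true_effect = 2.0  # hypothetical causal effect of A on Y

# Hypothetical binary confounder L that raises both the chance of exposure and the outcome
l = rng.binomial(1, 0.5, n)

def outcome(a, l):
    """Outcome model with a direct effect of A and of L, plus standard normal noise."""
    return 1.0 + true_effect * a + 3.0 * l + rng.normal(0.0, 1.0, a.size)

def unadjusted_estimate(a, y):
    """Least-squares estimate of beta_1 in E[Y | A] = beta_0 + beta_1 A (a difference in means)."""
    return y[a == 1].mean() - y[a == 0].mean()

# (i) Randomized exposure: A is independent of L
a_rand = rng.binomial(1, 0.5, n)
est_rand = unadjusted_estimate(a_rand, outcome(a_rand, l))

# (ii) Confounded exposure: P(A = 1) is 0.8 when L = 1 and 0.2 when L = 0
a_conf = rng.binomial(1, np.where(l == 1, 0.8, 0.2))
est_conf = unadjusted_estimate(a_conf, outcome(a_conf, l))

print(f"randomized: {est_rand:.2f}, confounded: {est_conf:.2f}")
```

Under randomization the estimate is close to 2, whereas under confounding it is close to 2 + 3 × (0.8 − 0.2) = 3.8, because exposed units are more often units with L = 1.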

Conditional Models and their Drawbacks
The most commonly used statistical method to adjust for confounding is conditioning, as was indeed done in previous studies on the effect of RTC laws on crime (Section 5). Conditioning amounts to pooling associations estimated within strata defined by the confounders, which yields an overall adjusted effect estimate. Such a pooled estimate can be obtained by using stratification methods, or by including confounders as covariates in regression models (Fitzmaurice 2004; Ranstam 2008). A drawback of conditioning is that effects can be adjusted away, especially in a longitudinal setting. I will illustrate this in the context of estimating the effect of RTC laws on crime. I use j to indicate time in years since a chosen baseline. Consider the time-varying exposure $A_j$, the implementation of a RTC law at the state level (0 = no, 1 = yes). Also consider the violent crime rate $Y_j$ at the state level, and the composition of the state legislature $L_j$. The effect of $A_j$ on $Y_j$ could be confounded by the time-varying covariate $L_j$. I illustrate this temporal structure at time points j and j − 1 in Figure 1 using a directed acyclic graph (DAG) (Hernán, Hernández-Díaz, and Robins 2004; Hernán and Robins 2006b). DAGs illustrate the assumed causal structure between variables, with nodes representing variables and causal effects depicted by unidirectional edges (arrows).
Suppose that state legislature composition $L_j$ is a confounder for the effect of $A_j$ on $Y_j$, since it has a causal effect on both $A_j$ and $Y_j$, as illustrated in Figure 1. Assuming that the implementation of a RTC law at a given time point also affects the composition of the state legislature at a subsequent time point (the effect of $A_{j-1}$ on $L_j$), $A_{j-1}$ has an indirect effect on $Y_j$ through $L_j$, as also illustrated in Figure 1. When "adjusting" for the state legislature composition in the previous year by including it as a covariate in a regression model, the indirect effect of RTC laws on crime is adjusted away (Robins 1997; Robins, Greenland, and Hu 1999).
The problem of adjusting away the effect occurs whenever one conditions on a variable that is also an intermediate for the effect of the exposure. This can occur when estimating the effect of RTC laws on crime by including possible longitudinal confounders in the regression model, as was done in previous studies on this topic. The resulting association is then a biased estimate of the true causal effect.
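The following Python simulation sketches this mechanism under a hypothetical two-period structure mirroring Figure 1; all effect sizes are made up. $A_{j-1}$ raises the probability of $L_j$ from 0.3 to 0.7, $L_j$ raises the probability of $A_j$, and $Y_j = A_{j-1} + A_j + 2L_j + \varepsilon$. The total effect of $A_{j-1}$ on $Y_j$ is therefore 1 + 2 × 0.4 = 1.8. A conditional regression that includes $L_j$ recovers only the direct part (1.0), whereas a MSM fitted by weighted least squares with stabilized weights recovers the total effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000

# Hypothetical two-period data mirroring Figure 1 (all effect sizes are made up)
a_prev = rng.binomial(1, 0.5, n)                            # A_{j-1}, randomized here
l_curr = rng.binomial(1, np.where(a_prev == 1, 0.7, 0.3))   # L_j, affected by A_{j-1}
a_curr = rng.binomial(1, np.where(l_curr == 1, 0.8, 0.2))   # A_j, affected by L_j
y = a_prev + a_curr + 2.0 * l_curr + rng.normal(0.0, 1.0, n)  # Y_j

def wls(columns, y, w=None):
    """(Weighted) least squares with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y))] + columns)
    root_w = np.ones(len(y)) if w is None else np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)
    return beta

# Conditional model: including L_j blocks the path A_{j-1} -> L_j -> Y_j
cond = wls([a_prev, a_curr, l_curr], y)

# IPW: stabilized weights P(A_j | A_{j-1}) / P(A_j | A_{j-1}, L_j), from cell proportions
p_num = np.empty(n)
p_den = np.empty(n)
for v0 in (0, 1):
    m0 = a_prev == v0
    p_num[m0] = a_curr[m0].mean()
    for v1 in (0, 1):
        m = m0 & (l_curr == v1)
        p_den[m] = a_curr[m].mean()
sw = np.where(a_curr == 1, p_num, 1 - p_num) / np.where(a_curr == 1, p_den, 1 - p_den)
msm = wls([a_prev, a_curr], y, sw)

print("conditional effect of A_{j-1}:", round(cond[1], 2))
print("MSM (IPW) effect of A_{j-1}:  ", round(msm[1], 2))
```

Because $A_{j-1}$ is randomized in this toy setting, its weight factor is constant and is omitted; in general the stabilized weight is a product of such factors over all time points.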
Other drawbacks of conditional regression models include non-collapsibility and collider stratification, both of which can introduce further bias. Non-collapsibility means that, even in the absence of confounding, effect estimates from conditional models equal marginal effects only for models with an identity (linear) or log (log-linear) link function, and not, for example, for odds ratios from logistic regression (Greenland, Robins, and Pearl 1999). Collider stratification is the introduction of bias by conditioning on a common effect of two variables. This can also occur in the longitudinal situation illustrated in Figure 1. For a more detailed explanation, I refer to the literature (Greenland 2003; Whitcomb et al. 2009).
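Non-collapsibility can be verified with exact arithmetic. The Python sketch below assumes a hypothetical binary covariate Z that is independent of the exposure A, so there is no confounding. With a logistic outcome model, the odds ratio is the same within both strata of Z but differs from the marginal odds ratio; with a log-linear model, the conditional and marginal risk ratios coincide.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def odds(p):
    return p / (1.0 - p)

# Hypothetical covariate Z ~ Bernoulli(0.5), independent of A (no confounding)
pz = {0: 0.5, 1: 0.5}

# Logistic outcome model: conditional odds ratio e^1 in both strata of Z
def p_logit(a, z):
    return expit(-1.0 + 1.0 * a + 2.0 * z)

cond_or = odds(p_logit(1, 0)) / odds(p_logit(0, 0))
marg_p = {a: sum(pz[z] * p_logit(a, z) for z in pz) for a in (0, 1)}
marg_or = odds(marg_p[1]) / odds(marg_p[0])

# Log-linear outcome model: conditional risk ratio e^0.5 in both strata of Z
def p_log(a, z):
    return math.exp(-2.0 + 0.5 * a + 0.5 * z)

cond_rr = p_log(1, 0) / p_log(0, 0)
marg_rr = (sum(pz[z] * p_log(1, z) for z in pz)
           / sum(pz[z] * p_log(0, z) for z in pz))

print(f"conditional OR {cond_or:.3f}, marginal OR {marg_or:.3f}")  # 2.718 vs 2.230
print(f"conditional RR {cond_rr:.3f}, marginal RR {marg_rr:.3f}")  # 1.649 vs 1.649
```

The marginal odds ratio is smaller than the conditional one although Z is not a confounder, which is why conditional odds ratios cannot simply be read as marginal causal effects.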
In this article, I use inverse probability weighting (IPW) to address some of the limitations of conditional modeling, as explained below.

Introduction to Inverse Probability Weighting (IPW)
The parameters of MSMs can be estimated using IPW to correct for confounding (Robins 1998). Fitting a MSM using IPW amounts to weighting each observation by the inverse of the probability of the observed exposure level, given the observed values of the confounders. Subsequently, a MSM regressing the outcome on the exposure is fitted on the weighted dataset. I illustrate this in a point treatment study (i.e., at one specific time point) below, and generalize to a longitudinal setting in Section 3.4.

Figure 1. Illustration of the temporal structure between a (possible) confounder such as state legislature composition L, RTC law implementation A and the crime rate Y at two subsequent time points j − 1 and j. By conditioning on L, the indirect effect of A on Y through L is adjusted away.
Consider a point treatment setting with a dichotomous exposure A, outcome Y and a vector of possible confounders K measured at a single time point. Using IPW, we can adjust for confounders K by weighting each observation i by the inverse probability weight

$$w_i = \frac{1}{P(A = a_i \mid K = k_i)}, \tag{3}$$

where $a_i$ and $k_i$ indicate the observed exposure and confounder status, respectively. The denominator of Equation (3) contains the probability of the observed exposure level given the observed values of confounders K. It can be estimated from a model for $P(A = 1)$ as a function of K (for example, a logistic regression), using the predicted probability for $a_i = 1$ and one minus the predicted probability for $a_i = 0$. Weighting by $w_i$ creates a pseudo-population in which K no longer predicts A, but in which the causal effect of A on Y is still present (Robins 1998). Weighting observations i by $w_i$, one can then fit a model such as Equation (2) to estimate the parameters of MSM Equation (1). To increase statistical efficiency and attain better coverage of confidence intervals, it is recommended to use stabilized weights (Cole and Hernán 2008), for example,

$$sw_i = \frac{P(A = a_i)}{P(A = a_i \mid K = k_i)}. \tag{4}$$

The numerator of Equation (4) contains the probability of the observed exposure level, which can be estimated using the observed proportions. To further stabilize the weights, one can condition both the numerator and the denominator of Equation (4) on a set of covariates V that are not confounders.
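A minimal Python sketch of Equations (3) and (4) in a point treatment setting follows. It assumes hypothetical simulated data with a single binary confounder K, so that $P(A = a_i \mid K = k_i)$ can be estimated by stratum proportions instead of a fitted model, and a true causal effect of A on Y of 2. Weighted least squares of Y on A, as in Equation (2), reduces to a weighted difference in means for a binary A.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# Hypothetical point-treatment data: binary confounder K affects both A and Y
k = rng.binomial(1, 0.5, n)
a = rng.binomial(1, np.where(k == 1, 0.8, 0.2))
y = 1.0 + 2.0 * a + 3.0 * k + rng.normal(0.0, 1.0, n)  # true causal effect of A is 2

# Denominator of Eq. (3)/(4): P(A = a_i | K = k_i). With a single binary K the stratum
# proportions suffice; with several or continuous confounders a fitted model
# (e.g. logistic regression) would be used instead.
p1_given_k = np.array([a[k == 0].mean(), a[k == 1].mean()])[k]
denom = np.where(a == 1, p1_given_k, 1.0 - p1_given_k)

# Numerator of Eq. (4): marginal P(A = a_i), estimated by the observed proportion
p1 = a.mean()
numer = np.where(a == 1, p1, 1.0 - p1)

w = 1.0 / denom      # inverse probability weights, Eq. (3)
sw = numer / denom   # stabilized weights, Eq. (4)

def weighted_effect(weights):
    """Weighted least squares of Y on A; with a binary A this is a weighted difference in means."""
    return (np.average(y[a == 1], weights=weights[a == 1])
            - np.average(y[a == 0], weights=weights[a == 0]))

naive = y[a == 1].mean() - y[a == 0].mean()
print(f"unadjusted {naive:.2f}, IPW {weighted_effect(w):.2f}, "
      f"stabilized IPW {weighted_effect(sw):.2f}, mean sw {sw.mean():.3f}")
```

In this saturated two-group model both weightings give the same point estimate, since the stabilized weights only rescale each exposure group by a constant; stabilization matters for the variance of the estimator and for non-saturated MSMs. The stabilized weights average to one, a common diagnostic.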

Assumptions
The following assumptions are made when fitting a MSM using IPW (Cole and Hernán 2008). The assumption of consistency states that the counterfactual outcome corresponding to the observed exposure level is precisely the observed outcome (Robins, Greenland, and Hu 1999; Cole and Hernán 2008). This means that the exposure whose causal effect is to be estimated needs to be clearly defined. It is also necessary to assume positivity, which means that every level of the exposure of interest has a positive probability of being allocated in every stratum defined by the measured confounders (Cole and Hernán 2008; Petersen et al. 2012). This assumption is also known as the assumption of experimental treatment assignment (ETA). The assumption of conditional exchangeability means that within strata defined by the measured confounders, potential outcomes are independent of the observed exposure level (Hernán and Robins 2006a). In practice, this requires that there are no unmeasured confounders.
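Positivity concerns probabilities that cannot be observed directly, but empty cells in the data are a practical warning sign. The Python sketch below assumes a hypothetical discretized confounder and a binary RTC law indicator, and lists the strata in which one of the exposure levels is never observed; it is a tabular counterpart of a graphical check, not a proof that positivity holds.

```python
from collections import Counter

def empty_exposure_cells(strata, exposure):
    """Return the strata in which at least one exposure level is never observed,
    i.e., candidate (practical) violations of the positivity assumption."""
    counts = Counter(zip(strata, exposure))
    levels = sorted(set(exposure))
    return sorted(s for s in set(strata)
                  if any(counts[(s, level)] == 0 for level in levels))

# Hypothetical toy data: a discretized confounder and a binary RTC law indicator
strata = ["low", "low", "mid", "mid", "high", "high"]
exposure = [0, 1, 0, 1, 1, 1]
print(empty_exposure_cells(strata, exposure))  # ['high']
```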

Methods
I have estimated the causal effect of the adoption of RTC laws in states in the United States on crime rates with a MSM for each crime type, correcting for confounding using IPW. I combined this method with multiple imputation to deal with missing values. Below I describe the details of this method. I have implemented the described method in the R software (R Core Team 2021), version 4.0.5. I have made the data and statistical code available with this article, for full transparency and falsifiability, and to allow researchers to improve upon this analysis as they see fit (see supplementary materials).

Data
Observational units i are all 50 states in the United States, with measurements taken in calendar years $T_j = 1959, 1960, \ldots, 2016$, with $j = 0, 1, \ldots, 57$ corresponding to those 58 calendar years, respectively. I have chosen to start the follow-up in 1959, since that is when the first RTC law was implemented, in New Hampshire (Donohue, Aneja, and Weber 2019). I included the following variables in the dataset: total reported numbers of crimes $Y^{c}_{ij}$ in each state i at the end of each year j, with $c = 1, 2, \ldots, 9$ representing violent crime total, murder/manslaughter, forcible rape, robbery, aggravated assault, property crime total, burglary, larceny theft and motor vehicle theft, respectively. These variables, as well as total state population $P_{ij}$, were obtained from Federal Bureau of Investigation (2019c) for 1960 up to 2014 for most states, and for 1965 up to 2014 for the state of New York. $Y^{c}_{ij}$ and $P_{ij}$ data for all states in 2015 and 2016 were obtained from Federal Bureau of Investigation (2019a) and Federal Bureau of Investigation (2019b), respectively. Corresponding crime rates can be computed as $Y^{c}_{ij}/P_{ij}$. Measurements for these variables were missing for the state of New York from 1959 up to 1964, and in 1959 for the other states.
I have also used RTC law implementation $A_{ij}$, with 0 = restrictive (no-issue or may-issue) and 1 = permissive (shall-issue or unrestricted), according to Donohue, Aneja, and Weber (2019). $A_{ij} = 1$ for the years in which a RTC law was in effect for the majority of that year, and $A_{ij} = 0$ otherwise. I have collected possible longitudinal confounders for the effect of RTC laws on crime in vector $L_{ij}$, including the following. I used the state violent crime rate and property crime rate per 100,000 people, computed as above, at the end of year j − 2. Since these variables are measured at the end of the year, I used a lag of two years to satisfy proper temporal ordering, as illustrated in Figure 1. This necessitated state-level measurements of these variables from 1957 up to 2014. Measurements were missing for the state of New York from 1957 up to 1964, and in 1957, 1958, and 1959 for the other states.
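Computing rates per 100,000 people and the two-year lag within each state can be sketched with pandas, assuming a long-format panel with one row per state and calendar year. The column names and all numbers below are hypothetical and made up for illustration; they are not the article's data.

```python
import pandas as pd

# Hypothetical long-format panel with made-up numbers: one row per state and year
df = pd.DataFrame({
    "state": ["NH"] * 4 + ["VT"] * 4,
    "year": [1957, 1958, 1959, 1960] * 2,
    "violent": [50, 60, 55, 70, 20, 25, 30, 28],           # crime counts Y_ij
    "population": [550_000, 560_000, 570_000, 600_000,
                   380_000, 385_000, 388_000, 390_000],    # population P_ij
})

# Crime rate per 100,000 people
df["violent_rate"] = df["violent"] / df["population"] * 100_000

# Two-year lag within each state: year j uses the value from the end of year j - 2
df = df.sort_values(["state", "year"])
df["violent_rate_lag2"] = df.groupby("state")["violent_rate"].shift(2)
print(df[["state", "year", "violent_rate_lag2"]])
```

Grouping by state before shifting prevents a state's lagged value from being taken from the previous state in the table; the first two years of each state have no lagged value, which is why earlier measurements (here 1957 and 1958) are needed.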
I also included the state share of the total U.S. Gross Domestic Product (GDP), at the end of year j−2 to satisfy proper temporal ordering. This necessitated state level measurements for this variable from 1957 up to 2014. I obtained measurements from 1963 up to 2014 for every state from Bureau of Economic Analysis (2019), therefore, the variable had missing values for 1957 up to 1962. To improve normality, I used a natural logarithm transformation.
As possible confounders I also used various demographic variables, including the proportions of the population that are female, white non-hispanic, and aged 0-18, 19-39, and 40-64 (with 65 and above redundant), respectively. Since these variables are measured at the end of the year, I employed a lag of two years, necessitating measurements from 1957 up to 2014. I computed these proportions from population counts by sex, race and age obtained from different tables, including US Census Bureau (2017a, 2017b, 2019a, 2019b, 2019c, 2019). These population counts also included measurements for 1950, which I used to add further support to the multiple imputation model described in Section 3.2. Values for 1957, 1958, 1959, and 1961 up to 1969 were missing. I also used population density, computed from the population counts $P_{ij}$ and the state land areas in square miles obtained from US Census Bureau (2016). Population density was transformed using a natural logarithm to improve normality, and lagged by two years for the same reasons as the demographic variables.
As possible confounders I also included an indicator variable for a Republican majority in the state legislature, measured at the beginning of the previous year to satisfy temporal ordering according to Figure 1. I used data from Klarner (2013) for the years before 1978, and from National Conference of State Legislatures (NCSL) (2019) from 1978 onwards. Similarly, I included an indicator variable for a Republican state governor, also measured at the beginning of the previous year, using data from Klarner (2013) before 2009 and from National Conference of State Legislatures (NCSL) (2019) from 2009 onwards. The interaction between the state legislature and state governor indicators was also included.
I assumed that all of the criminological, economic, political and demographic possible confounders included above are likely to affect both the implementation of a RTC law and crime rates, just like variable L in Figure 1, with j representing follow-up time in years. The temporal ordering in Figure 1 is always satisfied, since I used a lag of one year for confounders measured at the beginning of the year and a lag of two years for confounders measured at the end of the year.
I have used natural splines (De Boor 2001) with three degrees of freedom, fitted on calendar time and follow-up time, in the imputation model (Section 3.2), the main MSMs (Section 3.3) and the models for the implementation of a RTC law (Section 3.4). Since these models also contain a separate intercept, a natural cubic spline with three degrees of freedom has two boundary knots and two interior knots, so that different cubic trends can be fitted within three distinct periods. I consider this choice sufficiently flexible to model calendar year in the context of crime rates and RTC law implementation, based on the observed increase of crime rates from 1960 up to the "crack era" of the 1980s and 1990s, followed by a decrease. I illustrate this trend in the longitudinal plots in the supplementary materials. I assumed that the relatively limited amount of data does not allow for a more complex model, although I explore varying degrees of freedom in the sensitivity analysis (Section 3.5).
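For readers who want to see what such a spline term contains, the Python sketch below builds a natural cubic spline basis in the truncated-power form given in Hastie, Tibshirani and Friedman (The Elements of Statistical Learning, Eqs. 5.4-5.5). This is an illustration, not the basis used in the analysis: R's `splines::ns` uses a B-spline representation, although together with an intercept both bases span the same space of natural cubic splines for the same knots. A natural cubic spline with K knots (including the two boundary knots) spans K dimensions including the constant, so the basis below, which omits the constant column because the models have their own intercept, has K − 1 columns. The knot placement at quantiles is an assumption mirroring the default of `ns`.

```python
import numpy as np

def natural_cubic_basis(x, knots):
    """Natural cubic spline basis in truncated-power form (Hastie, Tibshirani and
    Friedman, Elements of Statistical Learning, Eqs. 5.4-5.5), without the constant
    column. With K knots (two boundary knots included) it returns K - 1 columns."""
    x = np.asarray(x, dtype=float)
    k = np.sort(np.asarray(knots, dtype=float))
    K = k.size

    def d(j):
        return (np.maximum(x - k[j], 0.0) ** 3 - np.maximum(x - k[-1], 0.0) ** 3) / (k[-1] - k[j])

    columns = [x] + [d(j) - d(K - 2) for j in range(K - 2)]
    return np.column_stack(columns)

years = np.arange(1959, 2017)  # calendar years T_j
# Boundary knots at the range, interior knots at the 1/3 and 2/3 quantiles
knots = np.quantile(years, [0.0, 1 / 3, 2 / 3, 1.0])
B = natural_cubic_basis(years, knots)
print(knots, B.shape)
```

Each basis column is cubic between the knots and linear beyond the boundary knots, which is what makes the spline "natural" and keeps its behavior stable at the edges of follow-up.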
I assessed the positivity assumption (see Section 2.5) graphically, using scatterplots of the implementation of a RTC law (yes/no) against each possible confounder, for each of the multiple imputation datasets that were generated as described below (see supplementary materials).

Multiple Imputation
To deal with the missing values in the data as described above I performed multiple imputation for multivariate, multilevel data using Markov chain Monte Carlo (MCMC) according to Schafer and Yucel (2002). I used a multivariate linear mixed-effects model to impute the missing values. As dependent variables I used the natural logarithm of the crime numbers, natural logarithm of state population numbers, natural logarithm of the state share of the total U.S. Gross Domestic Product (GDP), the proportions of the population that is female, white non-hispanic, and of age between 0-18, 19-39, and 40-64, respectively. As independent predictors in this model I used the intercept, the indicator for RTC law implementation, a natural spline (De Boor 2001) with three degrees of freedom fitted on calendar year, and a random intercept for each state.
Using 200 MCMC iterations after 5,000 burn-in iterations, I generated 25 imputed datasets from this model. The number of iterations and datasets was chosen based on technical feasibility in the sensitivity analysis described below. On each of these 25 imputed datasets, MSMs for the effect of RTC laws on crime rates were fitted using IPW as described below. Estimates and standard errors were combined according to Rubin's rules (Rubin 1987). After this, I performed a sensitivity analysis as described in Section 3.5.
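Rubin's rules combine the per-imputation estimates into one estimate and one standard error. The Python sketch below implements the basic pooling formulas for a single parameter: the pooled estimate is the mean over imputations, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The input numbers are made up and are not results from this article; the degrees-of-freedom adjustment for the reference t distribution, which is part of Rubin's rules, is omitted for brevity.

```python
import math
import statistics

def pool_rubin(estimates, std_errors):
    """Combine one parameter across m imputed datasets with Rubin's rules.
    Returns the pooled estimate and its total standard error."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)                      # pooled point estimate
    within = statistics.fmean(se ** 2 for se in std_errors)  # mean within-imputation variance
    between = statistics.variance(estimates)                 # between-imputation variance
    total = within + (1 + 1 / m) * between                   # total variance
    return q_bar, math.sqrt(total)

# Made-up log rate ratios and standard errors from three imputed datasets
est, se = pool_rubin([0.070, 0.080, 0.075], [0.020, 0.020, 0.020])
print(round(est, 4), round(se, 4))
```

The between-imputation term inflates the standard error to reflect the uncertainty about the imputed values.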

Marginal Structural Models
To model the causal effect of the implementation of RTC laws on crime, for each of the crime types c I have estimated the parameters of a separate generalized linear mixed model (GLMM) (Wolfinger and O'Connell 1993) from the quasi-Poisson family with a log link:

$$\log E\big[Y^{c}_{ij,a} \mid \xi^{c}_{i}\big] = \log P_{ij} + \theta^{c}_{0} + \theta^{c}_{1} a_{ij} + \theta^{c}_{2} f_1(T_j) + \xi^{c}_{i}. \tag{5}$$

These models convert the numbers of crimes of each type per state and year $Y^{c}_{ij}$ to crime rates using the offset $\log P_{ij}$. The models include the effect $\theta^{c}_{1}$ of the RTC law indicator $a_{ij}$ per state and year. The function $f_1(T_j)$ indicates a natural spline (De Boor 2001) with three degrees of freedom, as a flexible model of calendar time, and $\xi^{c}_{i}$ indicates a normally distributed random intercept at the state level. This approach is somewhat similar to the generalized estimating equation (GEE) model described by Equation (1b) in Hernán, Brumback, and Robins (2002), with the addition of the random intercept.
Compared to a standard Poisson model, the quasi-Poisson family uses an additional scale parameter, allowing for under- or overdispersion of the error distribution (Zeileis, Kleiber, and Jackman 2008). I have also incorporated a first-order autoregressive (AR1) correlation structure (Littell, Pendergast, and Natarajan 2000) between the repeated outcomes of each state. I consider this structure an appropriate choice for repeated measures over time, since the correlation between measurements declines as those measurements are spaced further apart in time (Littell, Pendergast, and Natarajan 2000). While using IPW to correct for confounding as described below, I performed two-sided significance tests of the null hypothesis $\theta^{c}_{1} = 0$ for the main effects, at a significance level of 5%. The causal effects estimated by $\theta^{c}_{1}$ can be interpreted as follows. The MSMs Equation (5) model, for each crime rate, the contrast between two distributions of potential outcomes: (a) the repeated measurements of the crime rates in each state, had all states never implemented a RTC law during follow-up, and (b) the repeated measurements of the crime rates in each state, had all states always had a RTC law implemented during the complete follow-up. This effect is quantified by the parameters $\theta^{c}_{1}$, from which the rate ratios $e^{\theta^{c}_{1}}$ can be computed. For example, a rate ratio of 1.1 means that the implementation of a RTC law increases the corresponding crime rate by 10%, while a rate ratio of 0.9 corresponds to a decrease of 10%.
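The conversion from a log rate ratio $\theta^{c}_{1}$ to a percentage change, together with a 95% Wald confidence interval and a two-sided Wald p-value, can be sketched in Python as follows. The standard error used in the usage lines is a made-up value; in the analysis the standard errors would come from the pooled model fits.

```python
import math

def rate_ratio_summary(theta, se, z=1.959963984540054):
    """Convert a log rate ratio theta (with standard error se) into a percentage change,
    a 95% Wald confidence interval for that change, and a two-sided Wald p-value."""
    change = 100 * (math.exp(theta) - 1)
    lower = 100 * (math.exp(theta - z * se) - 1)
    upper = 100 * (math.exp(theta + z * se) - 1)
    p_value = math.erfc(abs(theta / se) / math.sqrt(2))  # equals 2 * (1 - Phi(|theta / se|))
    return change, lower, upper, p_value

# The interpretation example from the text: rate ratios of 1.1 and 0.9
print(round(rate_ratio_summary(math.log(1.1), 0.03)[0], 6))  # 10.0
print(round(rate_ratio_summary(math.log(0.9), 0.03)[0], 6))  # -10.0
```

Because the interval is computed on the log scale and then exponentiated, it is asymmetric around the percentage change.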

Inverse Probability Weighting
I have fitted model Equation (5) on the observed data, for each of the crime types, correcting for confounding by the longitudinal variables in $L_{ij}$ using IPW. I have weighted each observation ij by the stabilized weights $sw_{ij}$, similarly to Hernán, Brumback, and Robins (2002):

$$sw_{ij} = \prod_{k=0}^{j} \frac{P(A_{ik} = a_{ik} \mid \bar{A}_{i,k-1} = \bar{a}_{i,k-1})}{P(A_{ik} = a_{ik} \mid \bar{A}_{i,k-1} = \bar{a}_{i,k-1}, L_{ik} = l_{ik})}, \tag{6}$$

where $\bar{A}_{i,k-1}$ denotes the RTC law history of state i up to year k − 1. I have estimated the factors in Equation (6) as follows. The introduction of a RTC law was never reversed in any state. I therefore assumed that after the first instance of having a RTC law within follow-up in a specific state, the elements $P(A_{ik} = a_{ik} \mid \ldots)$ are equal to one. In other words, after the first instance of having a RTC law within follow-up, the probability of having a RTC law in that state is always one. Using only the data up to and including the first year in which a RTC law was implemented within follow-up, I have estimated the other elements in the denominator of Equation (6) using the regression model

$$\log\{-\log[1 - P(A_{ij} = 1)]\} = \beta_0 + \beta_1 f_2(j) + \beta_2 L_{ij}. \tag{7}$$

Model Equation (7) is similar to a Cox proportional hazards model, but with the outcome observed in discrete time, using the complementary log-log link function (Prentice and Gloeckler 1978). I have included main effects of the longitudinal confounders L, including the interaction between state legislature composition and governor party. The function $f_2(j)$ indicates a natural spline with three degrees of freedom fitted on follow-up time, as a flexible baseline hazard function. The elements in the numerator of Equation (6) were estimated using a similar model including only $f_2(j)$ and the intercept.
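How the stabilized weights of Equation (6) are accumulated for one state can be sketched in Python, assuming that the linear predictors of the numerator and denominator complementary log-log models are already available for each year. The inputs below are made-up values, and the helper names are hypothetical. In each year up to and including the first RTC year, the factor is the ratio of the two predicted probabilities of the observed exposure level; afterwards every factor equals one, as described above.

```python
import math

def inv_cloglog(eta):
    """Inverse of the complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

def stabilized_weights(exposure, eta_num, eta_den):
    """Cumulative stabilized weights for one state, in time order.

    exposure: 0/1 RTC law indicators per year.
    eta_num, eta_den: linear predictors per year of the numerator model (intercept and
    spline only) and the denominator model (also the confounders), both on the cloglog scale.
    """
    weights, sw, switched = [], 1.0, False
    for a, e_num, e_den in zip(exposure, eta_num, eta_den):
        if not switched:
            h_num, h_den = inv_cloglog(e_num), inv_cloglog(e_den)
            # Factor for P(A = observed level) in this year, numerator over denominator
            sw *= (h_num / h_den) if a == 1 else ((1.0 - h_num) / (1.0 - h_den))
            switched = a == 1
        # After the first RTC year every factor equals one, so the weight stays constant
        weights.append(sw)
    return weights

# Made-up linear predictors for a state that adopts an RTC law in its third year
w = stabilized_weights([0, 0, 1, 1], [-3.0] * 4, [-2.5, -2.0, -1.5, -1.0])
print([round(x, 3) for x in w])
```

The resulting weight for year j is then attached to the state-year observation when fitting Equation (5).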
As a robustness check, I computed Pearson correlation coefficients between the measured predictors that were included in the models fitted to estimate the weights. I regarded excessively high correlations (i.e., > 0.8) as an indication of possibly problematic multicollinearity in these models. When computing these correlations, I used only the data up to and including the first year in which a RTC law was implemented within follow up, to which these models were fitted, before imputation.
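The correlation check can be sketched in Python with NumPy: compute all pairwise Pearson correlations between predictor columns, flag pairs whose absolute correlation exceeds 0.8, and summarize the absolute correlations by their mean, interquartile range and maximum. The predictors below are made-up simulated columns, not the article's confounders.

```python
import numpy as np

def correlation_check(X, names, threshold=0.8):
    """Pairwise Pearson correlations between predictor columns of X.
    Returns the flagged pairs (|r| > threshold) and summary statistics of |r|."""
    r = np.corrcoef(X, rowvar=False)
    i, j = np.triu_indices_from(r, k=1)
    abs_r = np.abs(r[i, j])
    flagged = [(names[a], names[b], float(r[a, b]))
               for a, b in zip(i, j) if abs(r[a, b]) > threshold]
    summary = {"mean": abs_r.mean(), "q25": np.percentile(abs_r, 25),
               "q75": np.percentile(abs_r, 75), "max": abs_r.max()}
    return flagged, summary

# Made-up predictors: x2 is nearly collinear with x1, x3 is unrelated
rng = np.random.default_rng(3)
x1 = rng.normal(size=500)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=500), rng.normal(size=500)])
flagged, summary = correlation_check(X, ["x1", "x2", "x3"])
print(flagged)
```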
Note that exposure allocation model Equation (7) uses 16 parameters including the intercept. Given the 42 observed "events" corresponding to the first instance of having a RTC law implemented within follow up (see Section 4), I consider this the maximum acceptable complexity of the model, based on the simulation studies performed by Vittinghoff and McCulloch (2007).

Sensitivity Analysis
I have performed a sensitivity analysis to examine the robustness of the results. I have fitted the following variations of the main MSMs.
Variants (1)-(11) each drop a specific component of model Equation (7): in variant (1) violent crime is dropped from the model, in (2) property crime, in (3) GDP share, in (4) proportion female, in (5) proportion white non-hispanic, in (6) the age variables, in (7) population density, in (8) the interaction between the indicator that the state legislature has a Republican majority and the indicator that the state governor is Republican, in (9) the state legislature indicator, in (10) the state governor indicator, and in (11) both the state legislature and governor indicators.
In variant (12) I have added to the exposure allocation model Equation (7) both the two-way and the three-way interactions between the spline fitted on follow-up time, the indicator that the state legislature has a Republican majority and the indicator that the state governor is Republican. In this manner, possible paradigm shifts over time within the Democratic and Republican parties are captured. Such paradigm shifts could modify the effect of these political variables on the probability that a RTC law is implemented.
As explained in Cole and Hernán (2008), the statistical efficiency of an IPW estimator can be increased by truncating the weights, at the cost of introducing a small amount of bias. When truncating, weights below or above a chosen percentile are set to that percentile, at the lower and upper end of the distribution of the weights, respectively. That is, a one-sided truncation proportion of 0.01 would indicate truncating at the 1st and 99th percentile. In the sensitivity analysis I have included the following variants: (13) weights truncated at the 1st and 99th percentile, (14) weights truncated at the 2nd and 98th percentile and (15) weights truncated at the 5th and 95th percentile.
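Weight truncation as described above can be sketched in Python with NumPy's `percentile` and `clip`. The weights in the usage lines are made-up values 1, 2, ..., 100, used only to make the effect of the three truncation proportions of variants (13)-(15) easy to see.

```python
import numpy as np

def truncate_weights(w, prop):
    """Set weights below the (100*prop)th percentile to that percentile and weights
    above the (100*(1-prop))th percentile to that percentile."""
    w = np.asarray(w, dtype=float)
    lower, upper = np.percentile(w, [100 * prop, 100 * (1 - prop)])
    return np.clip(w, lower, upper)

weights = np.arange(1, 101, dtype=float)  # made-up weights 1, 2, ..., 100
for prop in (0.01, 0.02, 0.05):           # the truncation proportions of variants (13)-(15)
    t = truncate_weights(weights, prop)
    print(prop, t.min(), t.max())
```

A larger truncation proportion moves more weights toward the center of their distribution, which lowers variance at the cost of more bias.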
Both in the exposure allocation model Equation (7) and the MSMs Equation (5)
In variant (21) I fitted GEE models for the causal effect of RTC law implementation on crime, as an alternative to the GLMM Equation (5). Similarly to Equation (5), these models used a Poisson model with an additional scale parameter (quasi-Poisson) and an AR1 correlation structure between the repeated outcomes. In addition, these models used fixed effects for each state and each year, resulting in relatively complex models.
In the next three variants I examined changes to the imputation model, while using the main MSMs. In variant (22) I dropped the indicator for RTC law implementation from the imputation model. In variant (23) I decreased the number of degrees of freedom of the natural spline fitted on calendar time to two. In variant (24) I increased this number of degrees of freedom to four.
In addition to the variants of the main MSMs described above, I have fitted two additional models for comparison. In variant (25) I fitted standard regression models similar to Equation (5), but unweighted, and including the same covariates as Equation (7) to adjust for confounding by conditioning. In this manner, any effect of the implementation of a RTC law on crime that is indirect through any of the included covariates will be adjusted away, as described in Section 2.3. In variant (26) I fitted standard regression models similar to Equation (5), but unweighted and without any other correction for confounding. These models are therefore unadjusted.
Variants (27) were designed as MSMs similar to the main MSMs, but using a selection of states and years similar to the synthetic control approach of Donohue, Aneja, and Weber (2019). As described by Donohue, Aneja, and Weber (2019), the synthetic control approach estimated the effect of implementation of a RTC law for the 33 states that implemented a RTC law during 1981-2007, using follow-up data from 1977 to 2014. Synthetic controls were constructed for each of the 33 "treatment" states using either (a) states with no RTC legislation as of the year 2014, or (b) states that passed RTC laws at least 10 years after the implementation in the specific treatment state (Donohue, Aneja, and Weber 2019). To emulate this selection, I included any state where no RTC law was implemented, or where a RTC law was implemented in 1981 or later. This resulted in 44 states in total. I also used follow-up data from 1977 to 2014. Note that this selection precludes any left-truncation in the exposure allocation model Equation (7), since by definition no switches can occur before the start of follow-up.

Descriptive Statistics
The dataset contains 2900 state-years of total follow-up (50 states × 58 years). In total, 42 states had a RTC law implemented during follow-up. This led to 1889 years of follow-up without and 1011 years with a RTC law implemented. Table 1 presents basic descriptive statistics for the state crime rates, including the % of missing measurements that were imputed. Note that, based on the minima, none of these crime types has a zero crime rate at the state level, which rules out zero-inflation; zero-inflation would be problematic when using Poisson regression (He et al. 2014). Table 2 presents basic descriptive statistics for the possible confounders of the effect of RTC laws on crime that were corrected for using IPW, as described in Section 3.4. This table also includes the % of missing measurements that were imputed, which is at most 20.7%. Absolute Pearson correlations between predictors averaged 0.24, with an interquartile range of 0.11 up to 0.32 and a maximum of 0.69. This does not indicate any relevant amount of multicollinearity.

Main Results
Main results are presented in Table 3. At the chosen 5% significance level, RTC laws have a statistically significant effect on all crime rates except forcible rape. This includes 7.5% more violent crime in total, 5.7% more murder/manslaughter, 10.9% more robbery, 6.5% more aggravated assault, 6.1% more property crime in total, 7.5% more burglary, 5.7% more larceny theft and 6.2% more motor vehicle theft. Table 4 contains effect estimates from the exposure allocation model, Equation (7), for numeric predictors (per shift of one standard deviation) and binary predictors (per one-unit shift). These effects are summarized across the 25 multiple imputation datasets using the minimum, mean and maximum. The effect estimates in Table 4 should not be interpreted as causal effect estimates, since they are obtained from a conditional model that was fitted with prediction as its goal. However, most of the included variables seem to have an effect of relevant magnitude on the implementation of a RTC law. In addition, some of the estimated effects have (post hoc) a quite logical direction. For example, when both the state legislature majority and the governor are Republican, it seems almost three times more likely that a RTC law will be implemented. It must be stressed, however, that when one is interested in the causal effects of these variables, corresponding MSMs should be fitted. Positivity plots for each multiple imputation dataset are included in the supplementary materials. Across the observed range of most predictors, there are measurements both with and without an implemented RTC law, which supports the positivity assumption. In some ranges, often with sparser data, only measurements with or only measurements without an implemented RTC law are observed. However, these ranges are relatively small and lie close to ranges in which both exposure levels are observed.
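Positivity plots of this kind can be complemented by a simple tabulation: bin a continuous predictor and flag bins in which only exposed or only unexposed state-years occur. The sketch below is a hypothetical helper (equal-width bins, plain Python), not the procedure used for the supplementary plots.

```python
def positivity_gaps(x, exposed, n_bins=10):
    """Flag bins of a continuous covariate in which only one exposure level occurs.

    x: covariate values; exposed: matching booleans (RTC law in effect or not).
    Returns (bin_low, bin_high, n_exposed, n_unexposed) for every non-empty bin
    that lacks either exposed or unexposed observations.
    """
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant covariate
    counts = [[0, 0] for _ in range(n_bins)]  # [n_unexposed, n_exposed] per bin
    for value, a in zip(x, exposed):
        k = min(int((value - lo) / width), n_bins - 1)  # top edge -> last bin
        counts[k][1 if a else 0] += 1
    gaps = []
    for k, (n0, n1) in enumerate(counts):
        if n0 + n1 > 0 and (n0 == 0 or n1 == 0):
            gaps.append((lo + k * width, lo + (k + 1) * width, n1, n0))
    return gaps


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]               # toy covariate values
exposed = [False, True, False, True, True, True]  # RTC law in effect?
print(positivity_gaps(x, exposed, n_bins=2))
```

Equal-width bins are the simplest choice; with skewed covariates, quantile-based bins would keep the number of state-years per bin more even.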
Since these are continuous variables, a small amount of extrapolation is performed when computing the weights. The positivity assumption seems appropriate, especially in light of the sensitivity analysis (Section 4.4), in which variables are dropped from the analysis. Descriptive statistics and boxplots for the IPW weights are also presented in the supplementary materials. These show that the mean weight is always close to one, and that the variability of the weights is comparable to that reported by Hernán, Brumback, and Robins (2002). Table 5 contains the results of the sensitivity analysis. Variants 1 through 24 all produce results very similar to the main results, and all support the conclusion that implementing a RTC law at the state level increases both violent crime and property crime rates. The effect estimates for total violent crime and total property crime were always statistically significant at the 0.05 level, and always corresponded to increases of roughly 5%-10% and 3%-7%, respectively. For variants 1 through 24, the effect estimates for the other crime rates were also always similar to the main results.
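Stabilized weights of the kind summarized above (mean close to one) are commonly computed as a running product, over each state's yearly history, of the probability of the observed exposure under a numerator model divided by the same probability under the exposure allocation (denominator) model. The sketch below assumes those fitted probabilities are already available; the argument names are hypothetical, and the article's Equation (7) may differ in detail. If RTC laws are treated as absorbing, both models assign probability one to remaining exposed after implementation, so those years contribute a factor of one.

```python
def stabilized_weights(exposure, p_num, p_den):
    """Running stabilized IPW weight for one state's yearly exposure history.

    exposure: observed 0/1 per year (RTC law in effect or not).
    p_num:    P(A_t = 1) from a numerator model without time-varying confounders.
    p_den:    P(A_t = 1) from the exposure allocation (denominator) model.
    Returns one weight per year: the cumulative product of num/den ratios.
    """
    weights, w = [], 1.0
    for a, pn, pd in zip(exposure, p_num, p_den):
        num = pn if a == 1 else 1.0 - pn  # probability of the observed exposure
        den = pd if a == 1 else 1.0 - pd
        w *= num / den
        weights.append(w)
    return weights


# Hypothetical fitted probabilities for a state adopting a RTC law in year 3;
# afterwards both models predict continued exposure with probability 1.
print(stabilized_weights([0, 0, 1, 1], [0.1, 0.1, 0.1, 1.0], [0.05, 0.2, 0.4, 1.0]))
```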

Sensitivity Analysis Results
The sensitivity analysis indicates that the results are not sensitive to dropping any variable or interaction term from the model. Adding the two-way and three-way interactions between time and the political variables yields results similar to those of the main MSMs. Progressively truncating the weights shifts the results increasingly in the negative direction, indicating less adjustment for confounding. Changing the number of degrees of freedom in the models, or changing the RTC implementation year for Pennsylvania from 1989 to 1996, does not change the results by a relevant amount. The alternative GEE models (variant 21) and variants 22 through 24, which examine changes to the imputation model, also produce results quite similar to the main MSM results. The conditional models (variant 25) always produce estimates that are much smaller than the main results and far less often statistically significant. This confirms the expectation that the conditional models, while adjusting for confounding, at least partially adjust away the effect of RTC laws on crime by conditioning on one or more variables that are intermediate for that effect. Estimates from the unadjusted models (variant 26) are even smaller, indicating that confounding likely biases the estimates in the negative direction.
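Progressive weight truncation, as used in the sensitivity analysis, can be sketched as clipping the weights at the q-th and (1 − q)-th percentiles for increasing q. The percentile rule (linear interpolation), the function names and the toy weights are assumptions made for illustration.

```python
import math


def percentile(sorted_values, q):
    """Linear-interpolation percentile (0 <= q <= 1) of pre-sorted values."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def truncate_weights(weights, q):
    """Clip weights to the [q, 1 - q] percentile range, e.g. q=0.01 for 1st/99th."""
    s = sorted(weights)
    low, high = percentile(s, q), percentile(s, 1.0 - q)
    return [min(max(w, low), high) for w in weights]


weights = [float(i) for i in range(1, 102)]  # toy weights 1.0 ... 101.0
for q in (0.0, 0.01, 0.05):  # progressively stronger truncation
    t = truncate_weights(weights, q)
    print(q, min(t), max(t))
```

Stronger truncation reduces the variance of the weighted estimator but moves it toward the unadjusted estimate, which is consistent with the shift described above.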
The estimated effects from the MSMs fitted on the period 1977-2014, similar to Donohue, Aneja, and Weber (2019) (variant 27), were in the same direction as the main results except for forcible rape (−1.3%), but somewhat smaller in magnitude. The effects from variant 27 attained statistical significance for total violent crime, murder/manslaughter, robbery, total property crime and larceny theft. I make a further comparison with the results of Donohue, Aneja, and Weber (2019) in Section 5.4. None of the model variants in the sensitivity analysis support the conclusion that RTC laws significantly decrease any of the crime rates.

Discussion
The results from this study are very robust to variations in model specification, as investigated in Section 4.4. As argued below, the assumptions made by fitting a MSM using IPW (see Section 2.5) are met.
I have demonstrated the validity of the positivity assumption in Section 4.3. Regarding the assumption of conditional exchangeability, I have minimized unmeasured confounding by including a wide range of measured covariates: crime rates, an economic indicator, and demographic and political variables. The addition of new covariates could be tested in follow-up studies, for example by other researchers who have access to the data from this study through the supplementary material.
Regarding the assumption of consistency, the specified exposure, the implementation of a RTC law at the state level, is well-defined. However, it is of interest to estimate the effect of RTC laws conditional on possible longitudinal effect modifiers such as changes in incarceration rates or law enforcement budgets. This is not possible using a MSM, since it would lead to adjusting away the effect. When sufficient data are available, such an analysis could be done in a follow-up study using a so-called history-adjusted marginal structural model (Petersen et al. 2007).
When drawing causal conclusions from a standard conditional model, the assumptions described above are also implicitly made. In addition, in the specific longitudinal situation described in this study, it would be necessary to assume that none of the included longitudinal covariates is also intermediate for the effect of the exposure, as described in Section 2.3.

Lott and Mustard (1997) and Lott (2010)
While groundbreaking in performing the first in-depth statistical analysis of the effect of RTC laws on crime, both the study by Lott and Mustard (1997) and its update Lott (2010) have some methodological drawbacks. One concern is that they use 36 highly collinear demographic variables, resulting in unstable effect estimates, as noted by Donohue, Aneja, and Weber (2019). Examining Lott and Mustard (1997) and Lott (2010) within the causal modeling framework of MSMs, I conclude that by using standard regression models, effects can be adjusted away, inducing biased estimates. Also, unlike in the present study, crime rates at a previous time point were not used as predictors for the implementation of RTC laws. Other methodological issues are that Lott and Mustard (1997) and Lott (2010) did not properly adjust for zero-inflation, which certainly occurs at the county level, as is apparent from county-level crime data (see e.g., United States Department of Justice/Federal Bureau of Investigation 2014). The period examined was limited in both studies. Finally, they did not correct for clustering at the state level, whereas the present study uses random effects of state to do so.
Other previous studies typically use standard conditional models, in which effects can be adjusted away. Previous studies have also never allowed for the possibility that crime rates at a previous time point could confound the effect of RTC laws on crime at a subsequent time point.
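The point that conditioning on an intermediate variable can adjust away an effect can be illustrated with a small simulation. In this hypothetical data-generating process there is no confounding, the exposure A affects the outcome Y only through an intermediate M, and the true total effect on the mean of Y is 2.0 × 0.6 = 1.2. The crude contrast recovers this effect, whereas the contrast within strata of M is close to zero. This is a toy illustration, not a model of crime data.

```python
import random
from statistics import fmean


def simulate(n=200_000, seed=1):
    """Toy data without confounding: A raises M, and only M raises Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.random() < 0.5
        m = rng.random() < 0.2 + 0.6 * a   # A -> M
        y = 2.0 * m + rng.gauss(0.0, 1.0)  # M -> Y, no direct A -> Y path
        rows.append((a, m, y))
    return rows


def crude_difference(rows):
    """E[Y | A=1] - E[Y | A=0]: the total effect, since nothing confounds A."""
    return (fmean(y for a, _, y in rows if a)
            - fmean(y for a, _, y in rows if not a))


def within_m_difference(rows):
    """A-contrast within strata of M, averaged with stratum-size weights."""
    total = 0.0
    for m in (False, True):
        stratum = [(a, y) for a, mm, y in rows if mm == m]
        diff = (fmean(y for a, y in stratum if a)
                - fmean(y for a, y in stratum if not a))
        total += diff * len(stratum) / len(rows)
    return total


rows = simulate()
print(round(crude_difference(rows), 2))     # near the true total effect 1.2
print(round(within_m_difference(rows), 2))  # near 0: the effect is adjusted away
```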
Many previous studies use a linear regression model with a normal error distribution, which is less appropriate for crime rates than Poisson models (Plassmann and Tideman 2001). Moreover, other studies that do use Poisson regression do not take into account the possibility of under- or overdispersion.
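One common check for under- or overdispersion after fitting a Poisson model is the Pearson chi-square statistic divided by its residual degrees of freedom. The sketch below assumes fitted means are already available from some Poisson fit; the function name and the numbers are hypothetical. A value far from one suggests using, for example, a quasi-Poisson model (for under- or overdispersion) or a negative binomial model (for overdispersion) instead.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square statistic divided by its residual degrees of freedom.

    observed: observed counts; fitted: fitted Poisson means (same order);
    n_params: number of estimated regression coefficients.
    About 1: consistent with Poisson (variance equals mean);
    well above 1: overdispersion; well below 1: underdispersion.
    """
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / (len(observed) - n_params)


print(pearson_dispersion([2, 4, 6], [4.0, 4.0, 4.0], n_params=1))  # 1.0
```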
A common problem in these studies is the assumption that measurements are independent, when in reality measurements taken within the same geographical unit (e.g., state or county) are clustered. Even when dependence within clusters is allowed for, a common shortcoming is the failure to specify an appropriate correlation structure between the longitudinal measurements.
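With yearly measurements per state, a first-order autoregressive (AR1) working correlation lets the correlation between two years of the same state decay with the time between them, unlike an independence structure (all off-diagonal correlations zero) or an exchangeable one (all off-diagonal correlations equal). A minimal sketch of the AR1 matrix follows; the function name and the value of rho are hypothetical.

```python
def ar1_correlation(n_times, rho):
    """AR1 working correlation within one state: corr(Y_s, Y_t) = rho ** |s - t|."""
    return [[rho ** abs(s - t) for t in range(n_times)] for s in range(n_times)]


for row in ar1_correlation(4, 0.5):  # four consecutive years, rho = 0.5
    print(row)
```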
Many studies have corrected for covariates measured at the county level. Since RTC laws are implemented at the state level, not the county level, confounding occurs only at the state level, and it is sufficient to adjust only for state level variables. The use of county level measurements introduces unnecessary complexity and instability, in addition to having to deal with possible zero-inflation.
Many of these studies also suffer from overfitting: given the amount of available data, overly complex models were fitted, yielding unstable results. Since there are only 50 states, measurements within states are highly correlated, and crime rates are very low proportions, I regard the available dataset as relatively small. Therefore, I was conservative in the number of nuisance parameters that I used.
I could not find a previous study that was free of all of the above-mentioned problems; most of the cited studies have several of them. As mentioned in the introduction, the National Research Council (U.S.) (2004) concluded that new analytical approaches are needed to reach a robust, scientifically supported conclusion. The present study is a response to that call for action, as was the earlier work by Donohue, Aneja, and Weber (2019), discussed in the next section.

Comparison to Donohue, Aneja, and Weber (2019)
The method of constructing state-level synthetic controls in Donohue, Aneja, and Weber (2019) is somewhat similar to the G-computation algorithm described in Van der Wal et al. (2009) to fit a MSM, estimating a causal effect with panel data. The present study and Donohue, Aneja, and Weber (2019) both estimate a causal effect while adequately correcting for longitudinal confounders, but using different methods. The specific causal effects that were estimated also differ between the two studies. The present study estimated the ratio of crime rates under two counterfactual scenarios for all 50 states: a RTC law always implemented versus never implemented, during the complete follow-up. Donohue, Aneja, and Weber (2019) compared crime rates with versus without a RTC law, after the moment that a RTC law was actually implemented, in the 33 states that did implement one.
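The contrast estimated by the present study can be written compactly under an assumed log-linear MSM, in which $Y_{it}^{\bar a}$ is the counterfactual crime rate in state $i$ in year $t$ under exposure history $\bar a$, $a_{it}$ indicates whether a RTC law is in effect, and $f(t)$ is an unspecified function of calendar time. This simplified form is an assumption for illustration; the article's own MSMs may contain further terms. Under it, the ratio of "always" to "never" rates is constant over time, and a rate ratio of 1.075 corresponds to the reported 7.5% increase in violent crime.

```latex
\log \mathrm{E}\left[Y_{it}^{\bar{a}}\right] = \beta_0 + \beta_1 a_{it} + f(t)
\quad\Longrightarrow\quad
\frac{\mathrm{E}\left[Y_{it}^{\bar{a} = \bar{1}}\right]}{\mathrm{E}\left[Y_{it}^{\bar{a} = \bar{0}}\right]} = e^{\beta_1},
\qquad \text{percent change} = 100\,\left(e^{\beta_1} - 1\right)
```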
The choice of possible confounders that are adjusted for also differs between Donohue, Aneja, and Weber (2019) and the present study. Both studies use a broad selection of possible criminological, economic, political and demographic confounders, which would suggest that confounding adjustment is adequate in both. Given the limited amount of data, parsimonious models are necessary, and both approaches are comparable in complexity. Donohue, Aneja, and Weber (2019) used follow-up from 1977 to 2014, while the present study has a much longer follow-up, which is likely to yield more statistical power. Another difference is that Donohue, Aneja, and Weber (2019) did not incorporate a first-order autoregressive (AR1) correlation structure, while the present study does.
Using the synthetic control approach, Donohue, Aneja, and Weber (2019) mainly found a significant effect of RTC law implementation on violent crime rates. This effect was estimated conditional on the number of years since implementation of the RTC law, and ranged from −0.117% after 1 year up to 14.344% after 10 years, averaging 8.45%. The pseudo p-values of Donohue, Aneja, and Weber (2019), which take full account of the uncertainty in the estimate, indicate that the effect is significant at the 0.05 level only after 8 years. When fitting a MSM on a similar (but not identical) selection of data in the sensitivity analysis (variant 27 in Section 4.4), I found a statistically significant effect of RTC laws on violent crime of 2.8%. And while Donohue, Aneja, and Weber (2019) found no convincing effect of RTC laws on murder and property crimes, in variant 27 I found significant increases of 5.6% for murder/manslaughter, 7.8% for robbery, 2.7% for total property crime and 2.8% for larceny theft. These differences can be explained by the many methodological differences described above. Furthermore, the main MSMs encompassing all 50 states yielded even larger effects of RTC laws on crime, which were always statistically significant except for forcible rape. My main MSMs take more fully into account the development of crime rates over time in the states that have not implemented a RTC law.
In addition to the main synthetic control approach, Donohue, Aneja, and Weber (2019) also fitted standard conditional regression models (referred to as "panel data estimates"), on all states. They found significant effects of 9.02% more violent crime and 6.49% more property crime. Judging from my own sensitivity analysis, these conditional estimates are likely to underestimate the true effect, although the estimates of Donohue, Aneja, and Weber (2019) are substantially larger than those from variant 25 in Section 4.4. Donohue, Aneja, and Weber (2019) also clearly demonstrated the instability that arises by including the 36 demographic variables of Lott and Mustard (1997) and Lott (2010).
Both the present study and Donohue, Aneja, and Weber (2019) support the conclusion that implementing a RTC law at the state level causes a substantial increase in violent crime.

Conclusion
This study demonstrates that marginal structural models (MSMs), fitted by inverse probability weighting (IPW), are an appropriate and convenient instrument for policy evaluation in a longitudinal setting, comparing separate entities such as states, cities or countries. This method allows for correction for confounding variables, while avoiding the drawbacks of more standard conditional models such as adjusting away the effect. I have applied this method to this topic for the first time, while addressing methodological shortcomings in previous studies.
The results from the present study support the conclusion of Donohue, Aneja, and Weber (2019) that RTC laws increase violent crime. However, while Donohue, Aneja, and Weber (2019) used their novel synthetic control approach to estimate the effect of implementing a RTC law only in the 33 states that did implement such a law, the present study estimates the difference between having and not having a RTC law implemented in all 50 U.S. states.
The results indicate that RTC laws cause a substantial increase in both violent crime (7.5%) and property crime (6.1%) rates. In the 42 states with a RTC law in effect in 2016, this increase corresponds to approximately 66,000 additional violent crimes and 352,000 additional property crimes per year.
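The article does not state how these yearly totals were derived. One plausible calculation, assumed here, treats the observed count X in states with a RTC law as the exposed scenario and X / RR as the counterfactual count without the law, giving an excess of X − X / RR. The function name and the count below are placeholders, not the study's data.

```python
def excess_events(observed, rate_ratio):
    """Events attributable to the exposure among the exposed, given a rate ratio.

    Assumption: the counterfactual count without the law is observed / rate_ratio,
    so the excess is the difference between the two counts.
    """
    return observed - observed / rate_ratio


# Hypothetical count of violent crimes in states with a RTC law in one year.
print(round(excess_events(1_000_000, 1.075)))  # 69767
```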