Analyzing the Impacts of Public Policy on COVID-19 Transmission: A Case Study of the Role of Model and Dataset Selection Using Data from Indiana

ABSTRACT Dynamic estimation of the reproduction number of COVID-19 is important for assessing the impact of public health measures on virus transmission. State and local decisions about whether to relax or strengthen mitigation measures are being made in part based on whether the reproduction number, Rt , falls below the self-sustaining value of 1. Employing branching point process models and COVID-19 data from Indiana as a case study, we show that estimates of the current value of Rt , and whether it is above or below 1, depend critically on choices about data selection and model specification and estimation. In particular, we find a range of Rt values from 0.47 to 1.20 as we vary the type of estimator and input dataset. We present methods for model comparison and evaluation and then discuss the policy implications of our findings.


Introduction
During the first months of 2020, nations responded to the COVID-19 pandemic by adopting a variety of public health interventions, including contact tracing, disease surveillance, and mandated social distancing (Pan et al. 2020;Remuzzi and Remuzzi 2020;Walensky and del Rio 2020). Within the United States, the ongoing transmission of COVID-19 represents a serious public health threat and an ongoing strain on local, state, and federal resources. In the United States, direct public health authority is largely vested in states and localities, with local decision-makers playing a critical role in shaping public health responses and in deploying resources during times of crisis. The federal government, meanwhile, seeks to play a coordinating role through the Centers for Disease Control and Prevention (CDC), funds research through agencies such as the National Institutes of Health (NIH), and helps shape the regulatory environment through the Food and Drug Administration (FDA) and other agencies (Carpenter 2014;Sledge 2017; Hodge 2020; Thomson and Nachlis 2020).
While this system has the potential to be highly responsive and adaptive, it is prone to problems including divergent outcomes across political jurisdictions and difficulty coordinating responses to emergent events. The lack of widespread testing in the early stages of transmission in the United States forced policymakers to make decisions without high-quality data and foreclosed the possibility of effective disease surveillance, which might under different circumstances have proved a powerful public policy tool (Hellewell et al. 2020 anti-viral therapies, social distancing measures (including shelter-in-place orders and mandated closure of nonessential businesses) emerged as the primary tool at the disposal of state and local decision-makers (Gostin and Wiley 2020).
Despite the very real public health benefits of such interventions, they have potentially large economic and social costs, which are distributed unevenly across society. As the pandemic has continued, public and political pressure to relax public health interventions has increased. Policymakers, as a result, confront a complex set of problems and high levels of uncertainty (Head 2008;Baumgartner and Jones 2015;Epp 2018).
Given these circumstances, the swift assessment of how differing public health strategies impact the transmission of COVID-19 is critical to fostering flexible, focused, and datadriven policy-making (Frieden 2017). One means of measuring the impact of public health interventions is through the effective reproduction number of a virus, R t , for example, the average number of individuals an infected person directly infects. When R t > 1 and the majority of the population is susceptible, as during the initial stages of the pandemic, the number of new daily infections exhibits exponential growth. However, when R t < 1, the virus is no longer self-sustaining and will die out before most people in the population are exposed.
COVID-19's initial reproduction number, R 0 (when the entire population is susceptible and policy interventions are not in place) has been estimated across several studies to be around 3.28 (1.4, 6.5) . A study conducted with data from China through mid-February estimated that, as a result of public health interventions, the effective time-varying reproduction number, R(t), was reduced from 2 to 1 (You et al. 2020). In Singapore, the impact of social distancing on R was estimated to be between 78.2% and 99.3% (Lewnard and Lo 2020). Research on interventions in Europe observed that a combination of school closings, bans on mass gatherings, and other social distancing measures reduced R(t) below 1 (Flaxman et al. 2020). In the United States, state and local decisions about implementing, modifying, and relaxing social distancing measures have been informed in part based on whether the reproduction number, R t , falls below the self-sustaining value of 1.

A Need for Model Comparison and Evaluation
While many of the forecasting models guiding policy-makers on COVID-19 capture uncertainty in parameter estimates, a large number of these analyses are presented via stand-alone models: model comparison, evaluation, and goodness-of-fit tests are often not presented. In addition, there are a variety of data sources available to researchers, ranging from data aggregation websites (Dong, Du, and Gardner 2020) (https:// covidtracking.com/) to local government data portals (https: // www.coronavirus.in.gov/).
In this article, we show how estimates of the impact of policy interventions can vary depending on modeling and estimation choices, as well as the dataset that is used as an input. Understanding the role of model and dataset selection, we argue, is critical to high-quality policymaking during the COVID-19 pandemic and may help policymakers to more effectively prepare for and respond during the early stages of future disease outbreaks.
A variety of frameworks have been employed for modeling COVID-19 (Bertozzi et al. 2020), including agent based models, compartmental models, and branching point process models. Given our expertise and their broad use in estimating the reproduction number of a virus (Farrington, Kanaan, and Gay 2003;Wallinga and Teunis 2004;Cauchemez et al. 2006;Meyer, Held, and Höhle 2017;Schoenberg, Hoffmann, and Harrigan 2019), we focus here on the point process type of model. Within this framework, we compare three choices for modeling the impact of interventions on the transmission of COVID-19: (1) a step function modeling an immediate impact on R t at key policy change dates that is employed in the highly cited article (Flaxman et al. 2020), (2) a constant R t up until a key policy change date followed by exponential decay (Lekone and Finkenstädt 2006;Ivorra et al. 2020), and (3) a nonparametric histogram estimator that adapts to changes in the reproduction number over time. These choices are not meant to be exhaustive, but rather illustrative of the variation in estimates that can arise based on differing model and data choices.
We apply these models to both daily case and mortality COVID-19 data in Indiana from three different sources: the widely-used COVID-19 data portal hosted at Johns Hopkins University (Dong, Du, and Gardner 2020) (abbreviated as "jhu" throughout), the Covid Tracking project (https://covidtracking. com/) (abbreviated as "covidtracking" throughout), and the local Indiana data portal hosted at https://www.coronavirus.in.gov/ (abbreviated as "in.gov" throughout). Differences in COVID-19 data collection and reporting standards represent a critical challenge for both researchers and policymakers. One issue we investigate is the reporting lag of new cases and deaths posted to the Indiana state department of health dashboard, often several days after the testing date (with reporting sometimes paused on the weekend). While data on the Indiana state health department website are retrospectively updated and corrected, data on aggregation websites like covidtracking and jhu are based on daily updates to cumulative counts and are not retrospectively corrected. Consequently, artificial peaks and valleys are present in the covidtracking and jhu count data. As a result, Indiana provides an excellent test case to investigate different choices in data processing and their impact on critical parameters in models for the spread of Covid-19.
In Section 2, we present the branching point process modeling framework and discuss how these models can be estimated from data. Then, in Section 3, we present our results when the models are applied to Indiana COVID-19 data. We show that estimates of the value of R t in Indiana, and whether it is above or below 1, depend on the model and dataset used for estimation. We also present several methods that can either be used to compare competing models or used to assess the goodness of fit of a particular model. We find a range of R t values from 0.47 to 1.20 as we vary the type of estimator and input dataset. In Section 4, we discuss the policy implications of our findings.

Methods
We consider a branching point process (Hawkes and Oakes 1974;Meyer, Held, and Höhle 2017) framework to estimate a time-varying reproduction number R(t) (Wallinga and Teunis 2004;Cauchemez et al. 2006;Obadia, Haneef, and Boëlle 2012;Schoenberg, Hoffmann, and Harrigan 2019). The conditional intensity (rate) of infections is modeled as where R(t) and w(t) are the dynamic reproduction number and inter-infection time (also known as serial interval, Wallinga and Teunis 2004) distribution, respectively. We also include an exogenous rate μ modeling imported infections. The conditional intensity models the rate of new infections and is connected to the reproduction parameter R(t) through the serial interval distribution w(t). In particular, the expected number of new secondary infections on day t caused by an infection on day t i is given by R(t i )w(t − t i ). The point process governed by Equation (1) can be viewed as an approximation to the common SIR (susceptible-infected-removed) model of infectious diseases during the initial phase of an epidemic when the total infections is small compared to the overall population size and w(t) is specified to be exponential (Rizoiu et al. 2018). When w(t) is chosen to be gamma distributed, the Hawkes process also can approximate staged compartment models, like SEIR, if the average waiting time in each compartment is equal (Lloyd 2001). One can also allow the reproduction parameter to vary parametrically as a function of the overall infection rate, due to mitigation efforts and herd immunity, as in Schoenberg, Hoffmann, and Harrigan (2019).
We consider three competing models for R(t): 1. A step function that changes at the Indiana stay-at-home order date t sh , effective on March 24, 2020: This is analogous to the step function used in Flaxman et al. (2020) to assess the impact of public health interventions on COVID-19 in Europe. 2. An exponential decay (Lekone and Finkenstädt 2006;Ivorra et al. 2020) after the stay-at-home date of the form: 3. A histogram estimator that adapts to dynamic changes in R t over time Here the I k are intervals discretizing time, B is the number of such intervals, and r k is the estimated reproduction rate in interval k. In the remainder of the article, we use a bin-width of 1 week. We also merge the bins of the first 3 weeks due to the low number of events during that time period.
The branching process can be estimated via an expectationmaximization (EM) algorithm for maximum likelihood inference (Veen and Schoenberg 2008;Lewis and Mohler 2011;Mohler et al. 2011). Given initial guesses for the model parameters and r k , the EM algorithm iteratively updates the parameters and branching probabilities by alternating between the E-step update: and M-step update: where T is the total length of the observation period, N k is the total number of new infections in interval k, and w(t) is estimated via weighted maximum likelihood estimation (MLE) using the inter-event times as observations and branching probabilities as weights. The branching probability p ij corresponds to the probability that secondary infection i was caused by infection j in the dataset. While w(t) can be estimated using a Weibull, Gamma, or log-normal distribution, we use a nonparametric histogram estimator with bin-width of 1 day to prevent model misspecification.
Competing models can be compared using the Akaike information criterion (Akaike 1974), AIC = 2p − 2 log(L). The AIC balances goodness of fit measured by the log-likelihood, log(L), and over-parameterization by penalizing the number of parameters, p (lower AIC is better). Alternatively, the goodness of fit of a branching process model can be assessed using residual analysis of rescaled event times (Ogata 1988), The rescaled times are distributed according to a unit rate Poisson process if the model is correctly specified.

Using Reported Death Data to Estimate the Reproduction Number
We estimate the reproduction number of COVID-19 in Indiana from both new reported case data and mortality data. While it is standard to use reported infections to estimate the reproduction number, in the case of Indiana the daily rate of testing has steadily increased since the start of the pandemic (https:// www.coronavirus.in.gov/). In such a scenario, the reproduction number may be over-estimated as the rate of increase of reported infections is partly explained by an increase in testing. On the other hand, reported death counts also suffer from undercounting (Weinberger et al. 2020). We note that under certain modeling assumptions, the reproduction number can be estimated from either reported infections or deaths, though the latter estimate will be lagged due to the time between a confirmed case and a subsequent fatality. For example, consider a susceptible-infected-recovered-death (SIRD) model governed by dS/dt = −βSI/N, dI/dt = βSI/N − γ I, dR/dt = (1 − c)γ I, and dD/dt = cγ I. Here it is assumed that some fraction, c, of those who are infected will subsequently correspond to a fatality. In Figure 1, we simulate such a model with N = 10 6 , γ = 0.2, β = 0.2 and c = 0.01. We note that during the initial phase of the simulation when S ≈ N, the exponential growth rates of new daily infections and new daily deaths are the same. When we apply the EM algorithm with the histogram estimator in Equations (5)-(9) to the simulated SIRD data, we obtain similar estimates for R t , close to the true value of R 0 = 2, when using new infections (βSI/N) or new deaths (cγ I).
Similarly, in Figure 1, we also display a simulation of a Hawkes process of the form of Equation (1) with μ = 1, R 0 = 2, and inter-infection time distribution w(t) given by a Weibull(6, 2). We assume a fraction c = 0.1 of infections lead to a fatality where the infection-death inter-time distribution is given by a Weibull (5, 3). Again, we observe that the growth rate of new infections and new deaths is the same. When we apply the EM algorithm with the histogram estimator in Equations (5)-(9) to the simulated Hawkes process data, we obtain similar estimates for R t , close to the true value of R 0 = 2, when using new infections or new deaths.
However, there are certainly scenarios where the estimated reproduction number will be different when infections or deaths are analyzed. For example, transmission may not be a homogeneous process across the population, and instead subpopulations with higher (or lower) case-fatality rates could have higher (or lower) contact rates. There also could be temporal trends, for example, the case-fatality rate could go down over time as the quality of medical interventions improves. Including these types of effects in estimates of the dynamic reproduction number are outside of the scope of this article.

Results
We apply the estimation procedure outlined above to Indiana COVID-19 case and mortality daily counts (new cases rather than cumulative) from March 5, 2020 to April 26, 2020. While school and business closings occurred on March 16, 2020, there is limited case data available before this date and we therefore assume R 0 is constant across all models up until the stay-athome order on March 24, 2020.
The first observation of note is that the AIC values are lower for all models using the local in.gov data rather than data from aggregation websites. In the case of Indiana, new cases and deaths on a given date are routinely reported to the state department of health several days after the fact (with reporting often paused on the weekend). While the data on Indiana's state health website are retrospectively updated and corrected, the aggregation website data are based on daily updates to cumulative counts. Importantly, these sites do not go back and correct historical cumulative count data. Consequently, artificial peaks and valleys are present in the covidtracking and jhu estimates (see Figure 3). Our findings are consistent with recent recommendations to use local data whenever possible (Jewell, Lewnard, and Jewell 2020).
Variation in estimates of R t that arise from using either case or death counts is higher earlier in the Indiana epidemic. Estimates of R t are initially as high as 5 when using case data but are between 2 and 4 using death counts. The high value of R t early on for cases may be due to the initial lack of testing followed by a rapid growth in testing over a several week period.
We find a larger variation in R t across models than across data sources. For example, when applied to jhu case data the models provide final R t values (at our study's end date of April 26th) of 1.20 (histogram), 0.83 (step), and 0.66 (exponential). Across the datasets, the estimated value of R t at the final time tends to be higher for the histogram as it adapts to changes in reproduction after Indiana's March 24th stay-at-home order. While R t fell below 1 according to the histogram several weeks after the order, it later rose back above 1 (possibly due to lack of adherence to social distancing or the emergence of new clusters in counties outside of the Indianapolis Metropolitan area). The histogram estimator also consistently has the lowest AIC values of the competing models because of its ability to adapt to local changes in time.
While the AIC is useful for comparing competing models, goodness of fit of a point process model can be evaluated using  . Fitted intensity λ t curves for histogram estimator (blue), step function (red), and exponential decay (yellow) applied to different Indiana COVID-19 data types and data sources. Example realizations of λ t simulated for 14 days past the current date (dashed line) show growth or decay depending on whether R t > 1 or R t < 1. residual analysis. In Figure 3, we plot point process intensities fit to Indiana COVID-19 cases and deaths per day from March 5, 2020 to April 26, 2020. We again use a histogram estimate for the inter-infection time distribution ω(t) which we plot in Figure 4.
In Figure 5, we plot the normalized cumulative distribution of rescaled event times and compare them to confidence bounds of the cumulative distribution for a unit rate Poisson process. We find that the estimated intensity using the histogram and expo-  nential decay for R t provides a good fit to Indiana new deaths per day, whereas the intensity that uses a step function for R t under-estimates the empirical death rate. This is in comparison to Figure 3, where all of the intensities appear to give plausible fits to the data based upon visual inspection.

Discussion
Over the course of the COVID-19 pandemic, state and local policymakers have employed mathematical models and projections to inform decisions about implementing, relaxing, and reimposing a variety of public health interventions. Policymakers and public officials have at times focused on the predictions of individual models without making comparisons to other models that might yield different results. During the first months of the pandemic, notably, top presidential advisors regularly referenced the Institute for Health Metrics and Evaluation (IHME) model (IHME COVID-19 Health Service Utilization Forecasting Team and Murray 2020), which generated more optimistic projections than several high-profile alternatives (Bui et al. 2020). It is critical that policymakers and political leaders consider the role of model and data selection when attempting to respond to an outbreak such as the COVID-19 pandemic. Focusing on one model and data source, decision-makers might easily draw misleading conclusions about the impacts of public health interventions on the transmission of COVID-19. Confronted with the social and economic costs that flow from stringent social distancing measures, political leaders may be drawn toward the most optimistic projections. Here, however, we show that conclusions about when COVID-19 transmission might peak depend on the type of model that is used and on the data that is used.
In the present study, we find that a histogram estimator for dynamic R(t) provides the most accurate fit to Indiana COVID-19 data during the early stages of the COVID-19 pandemic, with the exponential model also providing a plausible fit according to residual analysis. We emphasize, however, that these results may not generalize to other datasets and time periods. For other data and for different time periods, model comparison and goodness of fit analysis should again be applied. We also note that while this analysis was limited to temporal data, in the future, access to detailed spatial-temporal data on individual cases may enable more precise estimation of spatial-temporal triggering in these types of models. During the early stages of the COVID-19 pandemic, it was important that forecasting models be rapidly developed to inform decision-making. Now that these models have been implemented and knowledge of COVID-19 transmission dynamics has matured, research is needed on the trade-offs between competing model and data choices. Here, we showed that, when varying the estimator of dynamic R t between three simple choices, along with 3 different data sources, we get dramatically different answers to the question of whether public health interventions in Indiana during Spring 2020 reduced the reproduction number to below one.
As the COVID-19 pandemic has progressed, decisionmakers have faced a myriad of new challenges and crosspressures. Economic concerns, social costs, and public fatigue with health interventions have interacted with misinformation and partisan positioning to help foster an environment in which key political leaders have publicly expressed comfort with ongoing community spread. As COVID-19's reproduction number declined during the late spring and early summer of 2020, states and localities began relaxing mandated social distancing measures. Often framed in terms of the need for increased economic activity, these policy changes had a substantial impact on individual behavior and on public perceptions of the threat of ongoing transmission. In many areas, relaxation was followed by periods where R t again rose above 1, resulting in new cases and deaths.
In several high-profile cases, state Governors implemented a new round of public health interventions. In California and Texas, for instance, Governors allowed bars to reopen and then moved to close them again as they became associated with increases in COVID-19 transmission. Similar patterns were observed during the 1918 Influenza pandemic (Bootsma and Ferguson 2007). In Indiana, meanwhile, Governor Eric Holcomb announced a new set of restrictions in November 2020 as case counts and hospitalizations began to soar. As of November 15, 2020, the histogram model estimate for the reproduction number was R t = 1.3 in Indiana.
Moving forward, modeling will continue to play a critical role in informing policy decisions. The results presented here emphasize both the complex nature of the pandemic and the critical importance of acknowledging that findings about the impacts of public policy interventions will vary as a result of model and data selection. In making decisions about public health measures and in gauging the impact of various interventions, policy-makers should be careful to consider model specification, goodness of fit, and the sensitivity of models to the choice of input data.