Statisticians Engage in Gun Violence Research

Abstract
Government reports document more than 14,000 homicides and more than 195,000 aggravated assaults with firearms in 2017. In addition, there were 346 mass shootings, with 4 or more victims, including over 2000 people shot. These statistics do not include suicides (two-thirds of gun deaths) or accidents (5% of gun deaths). This article describes statistical issues discussed at a national forum to stimulate collaboration between statisticians and criminologists. Topics include: (i) available data sources and their shortcomings and efforts to improve the quality, and alternative new data registers of shootings; (ii) gun violence patterns and trends, with statistical models and clustering effects in urban areas; (iii) research for understanding effective strategies for gun violence prevention and the role of the police in solving gun homicides; (iv) the role of reliable forensic science in solving cases involving shootings; and (v) the topic of police shootings, where they are more prevalent and the characteristics of the officers involved. The final section calls the statistical community to engage in collaborations with social scientists to provide the most effective methodological tools for understanding and mitigating the societal problem of gun violence.


Introduction
The CDC reported that in 2017 there were 14,542 firearm homicides (Kochanek, Murphy, Xu, and Arias 2019). The FBI also reported in 2017 there were 195,194 aggravated assaults with a firearm (Federal Bureau of Investigation 2018). Among these incidents, there were 346 mass shootings, shootings in which four or more people were shot or killed, not including the shooter, with 2240 people shot, 437 of them fatally (Gun Violence Archive 2018).
These basic statistics indicate the scale of the gun violence problem in the United States. However, these statistics do not give us knowledge or insight into the impact of gun violence on the communities where it occurs, its drag on economic development, or the fear it instills in families and children. Gun violence is a widely varying phenomenon. Domestic violence, gang shootings, mass shootings, and police shootings vary in their environments, causes, perpetrators, and victims. Due to this variation, understanding each of these types of gun violence requires different data collection and analytical methods. Ultimately, avenues for prevention will also vary greatly.
Statisticians have much to offer the study of gun violence. The quality of statistical methods used in the field could use a lift. There are numerous high-quality research projects on gun violence, including epidemiological studies, quasi-experiments, and randomized controlled trials. However, a substantial amount of gun violence research uses weak data sources with poor choices of statistical methods. A greater presence of statisticians working in collaboration with criminologists on gun violence research could raise the bar.
CONTACT James L. Rosenberger jlr@psu.edu Department of Statistics, Penn State - Main Campus, 326 Thomas Building, University Park, PA, 16802-2111.
In this article, we introduce the statistical community to recent gun violence research, with the intention of encouraging greater participation by statisticians in this research. To do so, we summarize results from the 2019 forum, "Gun Violence-The Statistical Issues," sponsored and hosted by the National Institute of Statistical Sciences (NISS). The forum was targeted at developing ongoing collaborations among cross-disciplinary teams to pursue needed research of high quality (NISS 2019). The goal of the forum, as well as this article, is to inspire policymakers to better understand and implement winning strategies to reduce gun violence in the United States of America.
The remainder of this article is organized as follows. Section 2 describes available data sources with information about their limitations and quality. Section 3 introduces some of the research using the available data sources and highlights gaps and challenges that should be addressed with future research efforts. Section 4 reports on policing and gun violence prevention strategies that have been tried, with an emphasis on reliable outcomes and limitations that could be overcome with additional research. Finally, Section 5 describes several studies that investigate shootings by police.
This article focuses primarily on gun homicides and ignores several equally important topics related to gun violence, namely suicides (two-thirds of gun deaths) and accidents (5% of gun deaths).

Data Sources
Data are essential to the study of gun violence. In this section, we review some of the data sources used in gun violence research. They vary in focus, detail, completeness, quality, accessibility, and ultimately utility for answering fundamental gun violence questions.

National Data Sources
The U.S. Department of Justice, through the Bureau of Justice Statistics (BJS) and the Federal Bureau of Investigation's (FBI) Criminal Justice Information Services (CJIS), maintains numerous data collections and sources that would be relevant for statisticians interested in studying gun violence.
The National Crime Victimization Survey (NCVS) is one of the largest, regular national surveys in the United States. The NCVS surveys roughly 50,000 households and 100,000 people every six months. The survey asks respondents questions about recent victimization incidents, including details such as their relationship with the perpetrator, whether they involved the police, whether they were injured, and the presence of a firearm in the incident. The NCVS has been a primary data source for studying the use of firearms in committing crimes and in defensive gun use (National Research Council 2005).
While the NCVS cannot provide complete data on homicides, other sources of data on firearm homicides include the FBI's Supplemental Homicide Reports (SHR), the Centers for Disease Control and Prevention's (CDC) National Vital Statistics System (NVSS) Fatal Injury Reports, and the CDC's National Violent Death Reporting System (NVDRS) (Bureau of Justice Statistics 2014). The SHR collects detailed information on individual homicides, including details on the victim, perpetrator (if known), their relationship, and circumstances surrounding the homicide. However, while police departments have incentives to contribute their crime data to the FBI, they are not compelled to do so. The FBI's Uniform Crime Report (UCR) homicide count is 20% higher than the number of homicides in the SHR. While the SHR data come through law enforcement, the NVSS tracks all deaths through death certificates. Medical examiners and coroners are required to submit data on all deaths to the NVSS, including homicides. Not only are the individual-level data easily accessible to the public, but state-level estimates by demographic category are also available. The NVDRS synthesizes data from several sources, including law enforcement, medical examiners and coroners, and death certificates. It aims to include detailed information about the victim, including mental health problems and treatment, toxicology results, financial stressors, and physical health problems.
Data sources relevant for gun violence research are moving toward more incident-level data, rather than city- and state-aggregated count data. For decades the UCR crime counts were the best available data offering a picture of crimes reported to the police. The National Incident-Based Reporting System (NIBRS) was designed to replace the basic crime count data with incident-level detail, including details relevant for the study of gun violence. Launched in the late 1980s, participation in NIBRS has been modest. By 2019, some states, such as Kentucky, Arkansas, and Vermont, had comprehensive incident-level data in NIBRS. Other states, such as California, Florida, and New York, did not have a NIBRS-certified program. Of the nation's 17,000 law enforcement agencies, in 2019, only 43% participated in NIBRS. However, all crime data collection will transition to NIBRS by 2021. NIBRS offers a very large data source on specific crime incidents and includes the date, time, and place of crime, relationship of victims and perpetrators, whether the incident was solved, and the use of weapons.
The BJS also designed the Survey of Prison Inmates (SPI), last conducted in 2016. SPI uses a two-stage cluster sample. In 2016 SPI conducted 37,000 face-to-face interviews with prisoners at 385 state and federal prisons. The survey includes several questions about the acquisition and use of firearms in the commission of crimes. From this survey, we know that approximately 1 in 8 prisoners brandished or discharged a firearm while committing the crime for which they were currently serving prison time and that less than 2% of them acquired their firearm through a legal retail sale (Alper and Glaze 2019).

Local Data Sources
Most states do not collect or retain information on firearm sales or ownership. California is one exception as it maintains the Automated Firearm System (AFS). All firearms in California should be registered in AFS. All retail sales in California go through the Dealer Record of Sales Process (DROS), providing reasonably comprehensive data on retail sales. Private sales by law should also go through this process, but we do not know how many private sales go unrecorded. The data allow the California Department of Justice to identify the last legal owner of firearms lost, stolen, or recovered in connection with a crime. Such data allow researchers to study the effect of gun ownership. A team of researchers at Stanford created a dataset of 28.7 million adults in California, followed for up to 12 years (Zhang et al. 2020). Using AFS data, they found that one million cohort members purchased handguns during the study period, one million cohort members died, with 15,000 of those deaths from firearm-related injuries (mostly suicide), and importantly the firearm death rate was many times higher for gun owners than non-gun owners.
Police departments, particularly the largest ones, have greatly invested in data collection and analysis in the past decade. The Chicago Police Department (CPD), for example, has a large data collection and reporting system on time, place, and context of shootings and gun homicides. They have a network of acoustic gunshot detectors, 35,000 cameras including videos of shooting incidents, and electronic monitoring of offenders, particularly gun offenders. Using these data, CPD creates risk predictions for places and people. Many of CPD's datasets are available publicly through their open data portal. Many other cities post gun violence data through their open data portals, including data on all shooting victims in Philadelphia, over 10 years of shooting incidents in New York City, and police shootings in Dallas.
There have also been local efforts to interview those involved in gun crimes. The Chicago Inmate Survey (CIS) is akin to the SPI. University of Chicago Crime Lab researchers interviewed 221 male inmates in 2016, asking them about their gun acquisition and possession during the six months prior to the arrest that led to their current prison sentence. From these interviews, researchers have been able to quantify gun carrying and possession behaviors. The median time from acquisition to first known use in crime is two months, 42% of gun-involved respondents did not have any gun six months prior to their arrest for the current crime, and almost all were legally barred from purchasing a gun from a gun store because of their prior criminal record. Their guns were obtained via illegal transactions with friends, relatives, and the underground market (Cook, Pollack, and White 2019).

Specialty Data Sources
Concerns about the adequacy of data on gun violence, particularly on certain categories of gun violence, have prompted nonprofits and data journalists to begin compiling gun violence data. The Gun Violence Archive (gunviolencearchive.org) is a private effort to track all gun-related violence in the United States. They use automated and manual searches of law enforcement data sources, media, and government reports to identify all incidents of gun violence, including police shootings, mass shootings, defensive gun use, armed robberies, and homicides. Mother Jones, a left-leaning magazine, created a comprehensive compilation of data on mass shootings (shootings involving four or more fatalities, three or more fatalities since 2013) going back to 1982 (Follman, Aronsen, and Pan 2019). The Washington Post has been compiling data on all fatal police shootings since 2015 (Tate et al. 2016).

Gun Violence Patterns and Trends
With access to useful data sources, researchers can explore patterns and trends in gun violence data. While the United States witnessed a large decline in homicide in the last 30 years, those declines have not been to the same degree everywhere and for everyone. Dissecting and disaggregating gun violence data show that the decline is not uniform. The trend in homicide in Ohio differs greatly from the trend in New York. These differences lead to questions about the causes of local variations in gun violence and gun homicide. Aggregate trends hide underlying patterns.
Drug market violence historically has been driven almost entirely by gun violence. The emergence of crack markets in the 1990s went hand-in-hand with an increase in gun violence. The opioid epidemic is the latest drug problem for us to face, but little research has explored the impact of opioid markets on gun violence. Some large cities have experienced a recent rise in homicide rates, and the opioid epidemic began a few years prior to that rise. In addition, homicide rates have increased among the white population, the same group the opioid epidemic has afflicted the most. This has led some to believe that the opioid epidemic may be causing increases in homicide (Rosenfeld, Gaston, Spivak, and Irazola 2017). At the 2019 forum, Richard Rosenfeld presented work that showed significant effects of the opioid death rate on the white homicide rate and drug-related homicide rate (Rosenfeld and Fox 2019). This result has implications for expanding treatment to reduce opioid demand and cutting off legitimate sources of supply that could strengthen street drug markets.
In her recent work (Lauritsen and Lentz 2019) presented at the forum, Janet Lauritsen dissected both national and local trends in homicide to show that the rate of gun use has not changed much in places with spikes in gun homicides, but shootings appear more likely to be lethal. At the national level, using data from the Supplementary Homicide Reports (SHR) and the NCVS, Lauritsen showed that increased lethal capabilities of guns over time may have contributed to the rise in lethality. For the city of St. Louis, Lauritsen studied these potential contributors using data from police and showed that the long-term increase in lethality and lethal capacity of firearms may be resulting in greater numbers of deaths when exogenous shocks (e.g., drug markets) occur.
Also at the forum, Charles Loeffler showed how statistical models can be used to differentiate between gun violence being clustered versus contagious based on two of his recent works (Flaxman, Chirico, Pereira, and Loeffler 2019; Loeffler and Flaxman 2018). Most shootings are temporally and spatially clustered within city neighborhoods. Prior research argues that these shootings diffuse in space/time (Institute of Medicine and National Research Council 2013), but analysis of solved homicides indicates most fatal shootings are nonretaliatory (Metropolitan Police Department of D.C. 2006). Loeffler and colleagues used data collected from acoustical gunshot locator systems (AGLS) and a Hawkes process model to separate endemic from epidemic clustering. In DC, there is little contagion (14% of cases) and most clustering is well described by an endemic process, while in Chicago contagion was much greater (74% of cases) and most violence is consistent with diffusion. These figures align with conclusions of police investigations about whether homicides were retaliatory, which has implications for how a city attempts to address gun violence.
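To make the endemic versus epidemic distinction concrete, the following minimal sketch evaluates a purely temporal Hawkes intensity with an exponential kernel and computes the average probability that an event was triggered by an earlier one. It is an illustration only: the cited studies work with spatiotemporal data, and the function names, event times, and parameter values (`mu`, `alpha`, `beta`) here are hypothetical assumptions rather than estimates from that work.

```python
import math


def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity at time t: background rate mu (endemic part)
    plus exponentially decaying excitation from earlier events (epidemic part)."""
    triggered = sum(
        alpha * beta * math.exp(-beta * (t - s)) for s in history if s < t
    )
    return mu + triggered


def triggered_share(event_times, mu, alpha, beta):
    """Average, over events, of the probability that the event was triggered
    by a previous event rather than generated by the background rate."""
    times = sorted(event_times)
    probs = []
    for i, t in enumerate(times):
        total = hawkes_intensity(t, times[:i], mu, alpha, beta)
        probs.append((total - mu) / total)
    return sum(probs) / len(probs)


# Hypothetical gunshot times (in days) and hypothetical parameter values.
shots = [0.0, 0.1, 0.15, 3.0, 7.5, 7.6, 12.0]
share = triggered_share(shots, mu=0.5, alpha=0.3, beta=2.0)
print(f"Estimated share of triggered events: {share:.2f}")
```

In this parameterization `alpha` is the expected number of events directly triggered by a single event, so a small share points toward endemic clustering and a large share toward contagion. In practice the parameters would be estimated from data, for example by maximum likelihood, rather than fixed by hand.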
Data sources are needed to distinguish trends in overall violence incidents, the use of guns in violent incidents, and the lethality of gun violence incidents. We note the importance of reconciling our understanding of available data on trends and patterns, and the gains available by drawing on multiple layers of data.

Gun Violence Prevention
Ultimately, data collection, statistical methodology development, and analysis should translate into new knowledge about how to prevent and reduce gun violence. Data and analysis alone cannot prevent gun violence. Prevention efforts need to engage people, organizations, and governments in new ways to promote effective strategies. These may include new initiatives to improve the analytical capacity of the police so that they are better positioned to prevent shootings. In addition to preventing shootings, increasing the shooting clearance rate would take more shooters off the street and generate a deterrent effect for other would-be shooters. As a nation, we continue to debate whether changes to gun laws, such as right-to-carry and safe storage laws, increase or decrease gun violence and personal safety. With the states as policy laboratories, statistical analysis should help us to discern the effect of gun laws. Lastly, it may be possible that broader public health initiatives not intended to address gun violence directly may lead to cost-effective solutions.
We now briefly describe research on crime analysis, shooting investigations, forensics, gun laws, and community remediation as examples of how rigorous analysis can prevent gun violence. The research cited in this section was presented at the 2019 forum.

Police Role in Solving Gun Homicides
Jens Ludwig from the University of Chicago Crime Lab described how new analytical capacity within the Chicago Police Department districts most afflicted by gun violence has resulted in much more targeted interventions (Kapustin et al. 2019). Ludwig used a synthetic control design to evaluate these place-based interventions because, with only 22 police districts in total, there are too few units to use a regression discontinuity design. The establishment of the analytical capacity (Strategic Decision Support Centers) coincided with the commencement of a sharp drop in gun violence in the 7th police district serving the Englewood neighborhood, which is one of the most violent districts in Chicago.
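As a rough illustration of the synthetic control idea, the sketch below chooses nonnegative weights that sum to one over a small pool of comparison districts so that their weighted average tracks the treated district before the intervention. The data are invented, the function names are hypothetical, and the Frank-Wolfe solver is simply one convenient way to handle the simplex constraint; none of this is taken from the cited evaluation.

```python
def synthetic_series(weights, donors):
    """Weighted average of the donor series, period by period."""
    n_periods = len(donors[0])
    return [sum(w * d[t] for w, d in zip(weights, donors)) for t in range(n_periods)]


def pre_period_sse(weights, treated_pre, donors_pre):
    """Sum of squared pre-period gaps between treated and synthetic series."""
    synth = synthetic_series(weights, donors_pre)
    return sum((s - y) ** 2 for s, y in zip(synth, treated_pre))


def fit_synthetic_control(treated_pre, donors_pre, n_iter=500):
    """Frank-Wolfe with exact line search: weights stay nonnegative and sum to one."""
    n_donors = len(donors_pre)
    weights = [1.0 / n_donors] * n_donors
    for _ in range(n_iter):
        synth = synthetic_series(weights, donors_pre)
        resid = [s - y for s, y in zip(synth, treated_pre)]
        # Gradient of the squared error with respect to each weight (up to a factor of 2).
        grad = [sum(dt * r for dt, r in zip(d, resid)) for d in donors_pre]
        j = min(range(n_donors), key=lambda i: grad[i])
        # Moving weight toward donor j shifts the synthetic series in this direction.
        direction = [dj - s for dj, s in zip(donors_pre[j], synth)]
        denom = sum(v * v for v in direction)
        if denom == 0.0:
            break
        step = -sum(r * v for r, v in zip(resid, direction)) / denom
        step = max(0.0, min(1.0, step))
        if step == 0.0:
            break
        weights = [(1.0 - step) * w for w in weights]
        weights[j] += step
    return weights


# Hypothetical pre-intervention shooting counts for one treated district
# and three comparison districts.
treated_pre = [15.0, 15.0, 15.0, 17.0]
donors_pre = [
    [10.0, 12.0, 11.0, 13.0],
    [20.0, 18.0, 19.0, 21.0],
    [5.0, 30.0, 2.0, 40.0],
]
weights = fit_synthetic_control(treated_pre, donors_pre)
print([round(w, 3) for w in weights])
```

The effect estimate would then be the gap between the treated district's post-intervention outcomes and the same weighted average of the comparison districts' post-intervention outcomes.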
Philip Cook used a quasi-experiment to show that police can solve shootings, based on his recent work (Cook, Braga, Turchan, and Barao 2019). By comparing the investigative resources devoted to gun homicide cases with those devoted to nonfatal gun assault cases, Cook argued that the large gap in clearances (43% for gun murders versus 19% for nonfatal gun assaults) may be a result of the sustained investigative effort of the Boston Police Department in homicide cases made during the first two days. This result has policy implications, suggesting that extra investigative resources matter for more difficult cases and that increasing arrests is less costly for nonfatal cases than for fatal cases. This showed how a natural experiment can be used to learn about the value of follow-up investigations.
At the forum Heike Hofmann showed how new statistical methods improved the process for matching ballistic evidence to firearms. Statistical machine learning algorithms have been used to address questions of the source in firearm identification (Carriquiry, Hofmann, Tai, and VanderPlas 2019), which is based on comparisons of 2D images and 3D scans of cartridge cases and bullets. Hofmann described an automated process using raw scans to determine the matching score and to establish error rates (Hare, Hofmann, and Carriquiry 2017). Statistical measures (based on the cross-correlation function and random forest scores) are better at discriminating matches than existing quantitative measures like counting matched peaks and valleys.
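The cross-correlation idea can be sketched on two one-dimensional signatures, such as height profiles extracted from scans. The example below slides one profile against the other and reports the highest correlation and the lag at which it occurs. The profiles are synthetic, the function names and the `min_overlap` parameter are hypothetical, and this is not the cited pipeline, which combines several features in a random forest.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def max_cross_correlation(sig_a, sig_b, max_lag, min_overlap=10):
    """Highest correlation over lags in [-max_lag, max_lag] and the lag achieving it.
    A lag L >= 0 aligns sig_a[L + i] with sig_b[i]."""
    best_score, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        a = sig_a[lag:] if lag >= 0 else sig_a
        b = sig_b if lag >= 0 else sig_b[-lag:]
        n = min(len(a), len(b))
        if n < min_overlap:
            continue
        score = pearson(a[:n], b[:n])
        if score > best_score:
            best_score, best_lag = score, lag
    return best_score, best_lag


# Synthetic profile; the two "scans" are overlapping windows of the same surface.
base = [math.sin(0.3 * i) + 0.5 * math.sin(1.1 * i) for i in range(60)]
scan_a = base[0:50]
scan_b = base[5:55]
score, lag = max_cross_correlation(scan_a, scan_b, max_lag=10)
print(f"max CCF = {score:.3f} at lag {lag}")
```

Taking the maximum over lags matters because two scans of the same surface are rarely aligned at the same starting point.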

Gun Violence Interventions and Gun Laws
Terry Schell showed that many researchers have tried to study the impact of firearm laws on gun homicide. He showed results of multiple research teams working on the same dataset and producing highly variable conclusions. Common statistical methods for studying the impact of laws on crime are sensitive to specific modeling and assumption choices and, through simulation, he showed that one specific statistical model choice appears to be most likely to produce the correct conclusions (Schell, Griffin, and Morral 2018). Frequently applied methodologies, such as Huber-White standard errors, cluster adjustments, and fixed effects, have poor Type I error rates and low statistical power. A negative binomial model of firearm deaths with time-fixed effects, an autoregressive effect, and change coding, but with no state-fixed effects or standard error adjustments, had the best performance in simulation. Having put the various methodological approaches through a rigorous review, the research showed that the preferred method can then be applied to testing policies such as safe gun storage, right-to-carry, and stand-your-ground (Schell, Cefalu, Griffin, Smart, and Morral 2020).
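For readers who prefer notation, one way to write a model with the ingredients named above is sketched here. Every symbol is an assumption introduced for illustration, and the exact specification in the cited work may differ. Let $Y_{st}$ be firearm deaths in state $s$ and year $t$, $N_{st}$ the population, $\tau_t$ a year effect, and $L_{st}$ an indicator that the law is in force.

```latex
\begin{aligned}
Y_{st} \mid \mu_{st} &\sim \mathrm{NegBin}(\mu_{st}, \phi), \\
\log \mu_{st} &= \log N_{st} + \tau_t
  + \rho \, \log\!\left(\frac{Y_{s,t-1}}{N_{s,t-1}}\right)
  + \beta \, \Delta L_{st},
\qquad \Delta L_{st} = L_{st} - L_{s,t-1}.
\end{aligned}
```

Here $\rho$ is the autoregressive effect (the lagged rate is assumed positive), $\beta$ is the law effect under change coding, and $\phi$ is the dispersion parameter. There is deliberately no state-specific intercept, matching the description of a model without state-fixed effects.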
John MacDonald talked about place-based experiments to remediate vacant lots and abandoned properties, based on two of his recent works (Branas et al. 2018; Moyer, MacDonald, Ridgeway, and Branas 2019). In a randomized controlled trial in Philadelphia, vacant lot remediation resulted in a reduction in gun-related crimes and shootings. Significant reductions in crime overall (−13%, p < 0.01), gun violence (−29%, p < 0.001), burglary (−22%, p < 0.001), and nuisances (−30%, p < 0.05) were also found after the treatment of vacant lots in neighborhoods below the poverty line. MacDonald showed that strategic cleanups of vacant lots and abandoned property can have large-scale population benefits given that gun violence and related problems are highly concentrated in the same places.

Police Shootings
Police shootings represent about 7% of the roughly 15,000 gun homicides per year. Although these represent a small fraction of all homicides, these incidents generate substantial tension between the public and the police and have provoked most of the large-scale civil unrest of the last hundred years. For decades, scholars have tried to make sense of police shootings, but barriers to appropriate data and methods slowed the effort. Police shootings are not a new phenomenon and have declined greatly in the last 50 years. Police shootings in New York City declined by 95% between 1971 and 2018. However, the escalation of scrutiny and the increase in the availability of data make new analyses possible.
Although there are many data sources, not all data sources are created equal. At the forum, David Hemenway explored a variety of police shooting data sources and found that the FBI's SHR and the states' vital records are not good sources for studying fatal police shootings of civilians (Hemenway, Berrigan, Azrael, Barber, and Miller 2020). They either miss or mislabel the deaths. The Washington Post and the National Violent Death Reporting System (NVDRS) are nearly complete, with 98%+ of fatal police shootings. Those data show that shooting risk is actually higher in rural areas than in urban areas. And some states, such as New Mexico, show consistently high levels of police shootings.
Also at the forum, Greg Ridgeway explored whether specific features of officers make them more likely to shoot and, when they shoot, more likely to fire an excessive number of rounds (Ridgeway 2016, 2020; Ridgeway, Cave, Grieco, and Loeffler 2021). Previous research on this topic struggled to address issues of confounding by assignment. Certain officer features, such as age and experience, may make officers more likely to be assigned to the kinds of environments that put them at the greatest risk of being involved in a shooting. By conditioning on the number of shooters or the number of rounds fired, time, place, and environment drop out of the conditional likelihood, making it possible to isolate the effect of the officer features from the confounding effect of assignments. In New York City an officer's race, age at recruitment, and the pace at which they accumulated negative marks in their files influenced their risk of shooting. However, using data from over 50 police departments, no officer feature strongly influenced the number of rounds fired.
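The way incident-level factors drop out of a conditional likelihood can be seen in a small sketch. Assume, hypothetically, a logistic model in which each officer at a scene has a linear predictor made up of a shared incident effect plus an officer-specific part. Conditioning on how many officers fired gives a probability for which officers fired that does not depend on the shared effect. The function name and the numbers below are illustrative and not taken from the cited studies.

```python
import math
from itertools import combinations


def conditional_prob(linear_preds, shooters):
    """Probability that exactly the officers in `shooters` fired, given that
    len(shooters) of the officers on scene fired (conditional logistic form)."""
    k = len(shooters)
    numerator = math.exp(sum(linear_preds[i] for i in shooters))
    denominator = sum(
        math.exp(sum(linear_preds[i] for i in subset))
        for subset in combinations(range(len(linear_preds)), k)
    )
    return numerator / denominator


# Hypothetical officer-specific contributions for three officers at one incident.
officer_part = [0.2, -0.1, 0.5]
p_without = conditional_prob(officer_part, shooters=[2])

# Add a hypothetical incident effect (time, place, environment) shared by all three.
incident_effect = 3.0
p_with = conditional_prob([x + incident_effect for x in officer_part], shooters=[2])

print(p_without, p_with)
```

Because the shared term multiplies the numerator and every term of the denominator by the same factor, the two probabilities agree up to rounding. This is the sense in which time, place, and environment drop out.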

Engaging the Statistical Community
Gun violence research provides possibilities and perils for statisticians. Gun violence is an affliction for every major city in the United States and is a topic of broad national interest. For people under the age of 45, homicide is among the top five causes of death (Curtin and Heron 2019). Like most medical conditions such as cancer and infectious diseases, gun violence is not a single homogeneous problem. Gun violence research needs specialists who can study the phenomenon in all its variations including suicide, spree shootings, domestic violence, police shootings, gang violence, accidental discharges, and numerous other distinct types of gun violence. If statisticians wish to be more involved in the topic, there is room for more collaboration with criminology researchers.
This article describes many data sources on gun violence. Some are drawn from national official statistics programs (NCVS, SHR, and NVDRS), while others come from very local sources (Boston homicide case files, Washington, DC acoustic gunshot locator system). We also describe a variety of questions for which gun violence researchers are applying statistical methods in an attempt to gain a better understanding of the phenomenon. These include studies of local variation in gun violence trends, evaluating the impact of changes in laws or new prevention initiatives, and establishing a firmer scientific foundation for the evaluation of ballistics evidence. These untapped and emerging data sources and insufficiently explored research questions would benefit from collaboration with statisticians.
For the statistician interested in becoming involved in gun violence research, we offer several ideas.

Where Are Sources on Gun Violence Research?
Gun violence research is dispersed among numerous academic journals in social science, public health, medicine, and other disciplines. There are a few reports that offer helpful overviews and useful citations for further reading. The Institute of Medicine and the National Research Council produced a report on gun violence research priorities (Institute of Medicine and National Research Council 2013). The priorities they listed remain relevant: "characteristics of firearm violence, risk and protective factors, interventions and strategies, the impact of gun safety technology, and the influence of video games and other media." An earlier National Academy study on firearm violence explores more of the justice system angle (National Research Council 2005). This report remains a good starting point to explore issues including defensive gun use, injury prevention, and justice system interventions. In particular, the reader will see the rare dissent in a National Academy report when debating the effect of right-to-carry laws, a debate that persists today. More research is needed to answer the question of whether civilians and police officers are safer when carrying a firearm.
Aside from the National Academy reports, several academic journals have produced special issues on gun violence including Preventive Medicine (Hemenway and Webster 2015), American Journal of Public Health (Morabia 2018), and Criminology and Public Policy (Nagin, Koper, and Lum 2020). These journals include citations to other journals, which will provide the interested statistician an introduction to the available research on gun violence.
Other publications include the final report from the NORC expert panel (Roman 2020), and a recent article investigating the concern that mass shootings exhibit contagion effects (Fox et al. 2021).

Where Are Research Teams Studying Gun Violence?
Gun violence research does not represent a singular discipline. As a result, statisticians will find gun violence researchers within a variety of disciplines. A great way to start is to look for gun violence researchers within your home institution. You may find them in public health, nursing, medicine, criminology and criminal justice, social work, engineering, and law.
Consider reaching out to one of the CDC's Injury Control Research Centers (ICRC). Presently, the CDC funds nine ICRCs spread across the country. These centers have a larger mission than violence alone, but many of them have some lines of research around firearm injuries.

Where Are the Funding Opportunities for Gun Violence Research?
A frequent refrain in discussions of gun violence for the last two decades is that the Dickey amendment has prevented federal funding of gun violence research (Rubin 2016; Cook and Donohue 2017). Even with the new federal funding, several alternative funding sources continue to be available. The National Collaborative on Gun Violence Research (ncgvr.org) is one of the newest funders of gun violence research. Backed primarily by Arnold Ventures as well as other major donors, the collaborative expects to fund $20 million in research projects between 2019 and 2023.
Some states, including California and New Jersey, have stepped up to fund gun violence research. California's program is only open to those with full-time academic appointments at a University of California campus and offers grants in the range of $10,000 to $75,000 (University of California Firearm Violence Research Center 2020). New Jersey created the Center on Gun Violence Research at Rutgers University (Rutgers School of Public Health 2020), which indicates that it is interested in engaging with researchers studying gun violence in New Jersey as well as relevant research nationally.
Other foundations have funded gun violence research, including the Joyce Foundation, Robert Wood Johnson Foundation, and the MacArthur Foundation. New funding organizations frequently emerge. PayPal has begun funding research on illicit gun markets, which may signal other developing gun violence research funding opportunities.

Engaging the Statistical Community
This article aims to engage the statistical community in gun violence research and remove a few barriers for those who are interested. We introduced a handful of studies that approached the gun violence problem from a statistical perspective. We invite the reader to explore these studies in more detail and reach out to their authors. We also provide additional readings to get started in gun violence research, allowing the statistician to quickly understand the landscape of research questions and the variety of research teams. Lastly, we provide some avenues for research funding. We list the ones that get substantial attention and therefore also a lot of competition for limited funds. However, many cities and states have local initiatives that may offer data, access, and possible funding for local research efforts. The Web-based Injury Statistics Query and Reporting System (WISQARS) provides data on fatal and non-fatal injury from multiple causes and locations.
We end on a note of caution. Gun violence can be a politically hot-button issue. While the statistical scientist tends to be motivated by the pursuit of truth through data analysis, gun control advocates and gun rights activists will not necessarily see it that way. Prepare for substantial scrutiny of your work and personal scrutiny of you. Every gun violence researcher we know has uncomfortable stories resulting from their gun violence research.
There would be no need to prepare for such close scrutiny if the topic of gun violence did not matter so much. Gun violence urgently needs scientific and statistical attention. A world of important and fascinating statistical challenges awaits.