A dasymetric method to spatially apportion tornado casualty counts

ABSTRACT This paper describes a dasymetric technique to spatially apportion casualty counts from tornado events in the US Storm Prediction Center's database. Apportionment is a calculation of the number of casualties within the area of the tornado damage path and with respect to the underlying population density. The method is illustrated with raster grids on tornadoes occurring between 1955 and 2016 within the most tornado-prone region of the United States. Results suggest a relatively uniform spatial distribution of tornado-induced casualties with slightly higher rates in the mid-south, particularly in northern Mississippi and Alabama, and also in many metropolitan areas. In addition, there is some degree of spatial variation over time, particularly clusters of high injury rates across the northern half of Alabama. Validation of the results at the county and grid levels indicates that casualty numbers correlate strongly with the dasymetric estimates. Future work that includes socioeconomic variables (demographics, ethnicity, poverty and housing stock/value) might allow populations to be profiled with regard to vulnerability.


Introduction
Tornadoes are severe wind storms capable of catastrophic physical damage and human casualties. They account for nearly one-fifth of all natural-hazard fatalities in the United States (National Oceanic and Atmospheric Administration 2015). In 2011 alone, 1691 tornadoes resulted in 553 deaths and 5483 injuries (National Centers for Environmental Information 2012). More specifically, from the 25th through the 28th of April, 2011 there were 324 deaths and 2906 injuries from 349 tornadoes according to the National Oceanic and Atmospheric Administration. That same year, a few weeks later, another tornado killed 158 people in Joplin, Missouri. Even in years with below average tornado reports, the threat to human life still exists. For example, in 2016 the National Centers for Environmental Information (NCEI) reported 17 deaths in the United States that were related directly to 977 tornadoes (National Centers for Environmental Information 2017).
The locational climatology of tornadoes (where they occur) is well established in the literature (Kelly et al. 1978; Brooks et al. 2003; Gagan et al. 2010; Dixon et al. 2011; Cheng et al. 2013), including the risk at various spatial scales (Jagger et al. 2015). Historically, statistics on casualties resulting from tornado activity are also well documented, but only as an aggregate number for each separate tornado (Brooks and Doswell 2002; Fricker et al. 2017). Less available is information on where casualties occurred within the path of the tornado, except for more recent and high-profile events such as the 2011 Joplin (MO) tornado (Paul and Stimers 2012; Paul et al. 2014). Although news reports of a severe tornado may capture the headlines, particularly if total reported casualties are high, specific locations of casualties are generally not available to the public and are unavailable for many of the events in the database. Moreover, there is as yet no means by which to estimate efficiently the most likely location of these casualties with any degree of spatial precision. Ashley (2007) provides a spatial and temporal analysis of tornado deaths using location information provided by Grazulis (1990) and the National Weather Service's (NWS) Storm Data. The locations are based on the nearest town or county seat (when only the county where the death occurred is known). Using these data over the period 1950 through 2004 and grouping by 60 km cells, tornado deaths are mapped across the United States. However, locational accuracy is variable, most notably from data collected before 1985, where judicial decision-making is used to provide a death location. In short, creating a database for mapping from text descriptions is difficult, and requires editing, verification, and cross-tabulations.
In response to this difficulty, we suggest an automated technique that allocates the total number of casualties reported for each tornado in proportion to the underlying population geography within the tornado path. The technique uses dasymetric mapping principles in that volumetric data (total casualties and population density) are spatially reapportioned across areal data (raster grid cells). Dasymetric calculations within an areal unit for an individual tornado are then summed across all tornadoes affecting that unit and repeated for each unit to create composite maps that represent the distribution of casualties at a spatial resolution chosen by the analyst. The method is demonstrated on a region of the world that is notorious for tornadoes: the central and mid-south United States. The work aims to establish proof of concept (for a review of dasymetric principles see Mennis 2015), with the overriding objective of testing a technique that can spatially link individual tornado paths with socioeconomic information. Such a link could be used to provide a model for potential casualties given a tornado warning together with path scenarios. It can also be used to help provide a model for seasonal casualty estimates given climate conditions (e.g. El Niño).
Casualty statistics and population density are used in this paper to visualize the destructive patterns of historical tornadoes at a far more spatially disaggregate scale than is currently available. The resulting map of tornado deaths produced by Ashley (2007) is updated to include tornadoes occurring over the last decade. Since 2004, there have been 15,979 tornadoes including 282 tornadoes that have caused deaths.

Data and study area
The Storm Prediction Center's (SPC) database contains information on tornado reports in the United States. The database is compiled from the NWS Storm Data. Along with the number of deaths and injuries, individual tornado records contain information on the date, location, damage path dimensions, and the worst damage rating found along the path. The term 'casualty' refers to either human death or injury as a direct consequence of tornado activity according to the NWS Storm Data. Flying debris is the major determinant of casualties. As summarized in Greenough et al. (2001), soft tissue wounds are the most commonly reported cause of injury and head injury (or trauma) is the most commonly reported cause of death.
From this database broad characteristics of tornado occurrence have been analysed (Brooks and Doswell 2002; Ashley 2007) but there have been few attempts to map the historical distribution. More recently, Shen and Hwang (2015) use the database over the period 1950-2012 to estimate state-level percentages, ranges, and ranks of tornado-induced human casualties. Additionally, Paul and Stimers (2012) use the database for the year 2011 to compare fatality locations between the aforementioned Joplin (MO) tornado and all other tornadoes. Maps of killer tornadoes (tornadoes killing at least one person) aggregated to areal units provide a general idea of the spatial distribution of deaths (Ashley and Strader 2016), but this type of map does not apportion the tornado-level casualties spatially.
The SPC database contains shapefiles with data on the two-dimensional representation of each tornado path. Population data are obtained from the Gridded Population of the World, version 3 (GPW,v3) from the Socioeconomic Data and Applications Center at Columbia University, USA. The population database contains density estimates from 2000 represented as people per square kilometer. Densities are based on residential population. The native cell resolution is 0.0416° latitude/longitude, which at 36° N latitude means a cell having the dimension of 4.6 km in the north-south direction and 3.7 km in the east-west direction. The resolution is sufficient for our application of the dasymetric model since the precision on tornado genesis location prior to the mid 1990s is on the order of 1 km (0.01° latitude/longitude) and the approximation of the track as a straight line decreases the precision by at least another several kilometers on average, especially away from the end locations. The methodology is constructed with programs coded in the open-source R language. The start year (1955) coincides with the period when more advanced severe weather data (hail/wind damage) were also recorded as part of a more concerted effort to archive tornado reports (Kelly et al. 1985; Doswell et al. 2005; Allen and Tippett 2015). The end year is 2016.
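The cell dimensions quoted above can be checked with a short calculation. The sketch below is in Python rather than the R used for the study, and it assumes a spherical Earth with roughly 111.2 km per degree of latitude; the function name and that constant are illustrative choices, not taken from the paper.

```python
import math

# Assumption: spherical Earth, about 111.2 km per degree of latitude.
KM_PER_DEGREE_LAT = 111.2


def cell_size_km(resolution_deg, latitude_deg):
    """Return (north-south, east-west) cell dimensions in km."""
    north_south = resolution_deg * KM_PER_DEGREE_LAT
    # East-west extent shrinks with the cosine of latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns, ew = cell_size_km(0.0416, 36.0)
print(f"{ns:.1f} km north-south, {ew:.1f} km east-west")
# prints "4.6 km north-south, 3.7 km east-west"
```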
The study area includes the most tornado-prone region of the United States (Figure 1). It contains 30,546 tornadoes (52% of all tornadoes in the United States) and two-thirds of all intense (EF3+) tornadoes over the 61-year period. Of these, 865 were linked to 3553 deaths (72% of all tornado deaths in the USA) and 3995 were recorded as causing 56,424 injuries (66% of all tornado injuries in the USA). Particularly severe tornadoes are responsible for a higher concentration of casualties. Not surprisingly, tornadoes with stronger winds tend to cause more deaths and injuries. Table 1 compares casualty numbers against the magnitude of tornado damage, measured by the EF rating, where EF0 represents minimum damage found within the tornado path and EF5 represents total destruction. No EF0 tornadoes were responsible for more than three deaths, while 81 EF4 tornadoes each caused at least six deaths. Similarly, while all EF5 tornadoes caused at least six injuries, only eight EF0 tornadoes injured at least six people.
There is an upward trend in both the number of deaths linked to tornado activity and the number of killer tornadoes since the early 1990s (Figure 2). This increase is occurring despite the fact that warnings for severe weather have improved (Ashley and Strader 2016). Owing to greater strength and frequency, tornado deaths peak in the spring, particularly during the months of April (1320) and May (867). It is worth noting that two-thirds of all killer tornadoes result in only one or two deaths; the extreme outlier was the tornado in Joplin, Missouri in 2011, which was responsible for 158 deaths. Again, the detrimental effect of tornadoes on humans is highest in spring, with April and May witnessing the majority of all tornadoes that inflict one, two, or three injuries. One outlier dominates again, this time a tornado which hit Wichita Falls, Texas in 1979 and injured 1740 people.

Table 1. Number of tornadoes by EF rating (columns) and by number of deaths or injuries per tornado (rows).

                      EF0   EF1   EF2   EF3   EF4   EF5
Number of deaths
  1                     5    69   147   155    38     3
  2                     1    15    37    67    34     1
  3                     1     3    11    39    29     3
  4                     0     1     4    18    20     1
  5                     0     1     1    11    10     1
  6+                    0     1     3    28    81    26
Number of injuries
  1                    80   468   440   143    12     0
  2                    33   259   279   106    15     0
  3                    14   114   181    75     9     0
  4                     7

Dasymetric calculations
Our dasymetric procedure spatially distributes the number of reported casualties from a tornado within and along the damage path. This involves estimating the probability of a casualty within areal units given the population density. The dasymetric procedure requires two sets of volumetric and areal data. The first set is the total number of casualties per tornado together with its areal representation, the spatial path of the tornado. Tornadoes are represented in the SPC database as line data. The line represents the best estimate of the tornado track. The track is buffered into rectangular polygons in accordance with the reported storm path width, which is the average along-track width for tornadoes occurring before 1994 and the maximum along-track width after 1994. Each tornado will have its own path width, but on average paths will be wider for tornadoes with higher EF categories (Brooks 2004; Elsner et al. 2014). Adding this areal dimension results in a representative damage path. Not all tornadoes track along a straight line, but a more realistic damage path is unavailable for the vast majority of historical tornadoes. As noted in the conclusions, some recent tornadoes have additional damage path information which can be exploited by the dasymetric approach. The second set of volumetric and areal data for the dasymetric analysis is population density from the GPW,v3, and its areal representation, not in original collection units but as regular-sized raster grid cells (Figure 3). Population density values at the native 0.0416° latitude/longitude resolution are averaged to a 0.5° latitude/longitude (≈50 km) raster grid (20 north-south cells and 44 east-west cells for a total of 880 cells). The analysis resolution, which is adjustable by the user, was chosen for demonstration here as a balance between being large enough to contain a sufficient number of tornadoes for results to be meaningful across the study region, yet small enough to capture local gradients in casualties. Additional results are described for variations to this resolution. The majority of cells have a density of less than 64 people per km², with an average of 29.3 people per km² and a standard deviation of 63.8 people per km².
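The averaging of native-resolution densities onto the coarser analysis grid can be sketched as a block mean. This is a minimal illustration in Python, not the authors' R code: the function name is hypothetical, the toy grid uses a factor of 2 (going from 0.0416° to 0.5° would correspond to a factor of about 12), and a plain mean treats all fine cells in a block as having equal area, which is only approximately true on a latitude/longitude grid.

```python
def aggregate_mean(grid, factor):
    """Average a 2-D list of densities over non-overlapping factor x factor blocks."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r in range(0, n_rows, factor):
        row = []
        for c in range(0, n_cols, factor):
            # Collect every fine cell that falls inside this coarse cell.
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


# Hypothetical densities (people per km^2) on a 4 x 4 fine grid.
fine = [
    [10, 20, 0, 0],
    [30, 40, 0, 4],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
coarse = aggregate_mean(fine, 2)
print(coarse)  # [[25.0, 1.0], [1.0, 2.0]]
```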
There are 30,546 tornado paths in the study area; they include partial paths when the tornado either enters the study area or exits it. The average number of tornado paths per 0.5° cell is 42.1, with a minimum of 0 and a maximum of 150. Only two cells contain over 120 tornado paths (both in central Oklahoma), but the more typical numbers range between 35 and 45 paths (211 cells). The standard deviation of the cell count is 20.1 with a variance-to-mean ratio of 9.64. The mean number of cells per casualty-producing tornado is 1.56. The tornado path area is, on average, smaller than the area of the cell, but the chance that the path intersects more than one cell depends mostly on path length (since path width is an order of magnitude or more smaller than path length). With a cell size of 0.5°, 38% of all casualty-producing tornadoes affect more than one cell. With cell sizes of 0.25°, 0.125°, and 0.0625° on a side the percentage increases to 56%, 67%, and 72.9%, respectively.
The central premise is that a casualty is more likely in a grid cell with higher population. We therefore assign to each grid cell within the tornado path a fraction of the total number of casualties for the tornado. This fraction depends on the area of the tornado path that falls within the cell and on the cell's population density. Formally, for a given cell $i$, let $p_i$ be the population density and $A_i$ be the area under the tornado path (see Figure 4). We then assume the number of people in cell $i$ affected by the tornado is $P_i = p_i \cdot A_i$, and the fractional number of casualties from the tornado assigned to the cell, $c_i$, is given by

$$c_i = c \, \frac{P_i}{\sum_{j=1}^{M} P_j},$$

where $c$ is the total casualty count from the tornado and $M$ is the number of grid cells along the damage path. The fractional numbers of casualties are then summed over each tornado-casualty event affecting the cell over the study period to get an estimated number of casualties. The method does not determine where the casualty occurred, only which grid cell contained the population from which the casualty most likely occurred. Our dasymetric procedure is a new application of dasymetric mapping. Traditional dasymetric mapping is performed by disaggregating population using some ancillary variable such as land use. Here, the method is performed by disaggregating casualties using population as the ancillary variable. While using data on land cover would improve the spatial precision of the location of population, it would not change our results, only locate casualties more precisely.
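The apportionment rule described above can be sketched in a few lines. This is an illustrative Python version, not the authors' R implementation: the function names, the dictionary layout (cell id mapped to a density and path-area pair), and the choice to raise an error when no population lies under the path are all assumptions made for the example, and the numbers are invented.

```python
def apportion_casualties(casualties, densities, areas):
    """Split one tornado's casualty count across the M cells its path crosses.

    densities[i] is the population density of cell i (people per km^2) and
    areas[i] is the path area inside cell i (km^2).
    """
    exposed = [p * a for p, a in zip(densities, areas)]  # people under the path
    total = sum(exposed)
    if total == 0:
        # Assumption: the paper does not say how this case is handled.
        raise ValueError("no population under the path")
    return [casualties * people / total for people in exposed]


def accumulate(tornadoes):
    """Sum fractional casualties per cell over many tornadoes."""
    totals = {}
    for casualties, cells in tornadoes:
        ids = list(cells)
        shares = apportion_casualties(
            casualties,
            [cells[k][0] for k in ids],
            [cells[k][1] for k in ids],
        )
        for k, share in zip(ids, shares):
            totals[k] = totals.get(k, 0.0) + share
    return totals


# Hypothetical events: (casualty count, {cell id: (density, path area in cell)}).
tornadoes = [
    (10, {"A": (100.0, 2.0), "B": (25.0, 8.0)}),
    (3, {"B": (25.0, 1.0)}),
]
totals = accumulate(tornadoes)
print(totals)  # {'A': 5.0, 'B': 8.0}
```

In the first event the small dense cell and the large sparse cell hold the same number of people under the path (200 each), so they split the 10 casualties equally. The per-cell totals sum to the 13 casualties supplied, which is the volume-preserving property the method relies on.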

Maps of tornado casualties
The dasymetric method is repeated for each tornado path, and casualties are separated into deaths and injuries (Figure 5 and Figure 6, respectively). The number of deaths per cell ranges from a low of 0 to a high of 116, with a higher number of tornado deaths across the mid-south, especially over the states of Mississippi and Alabama. Indeed, Alabama has the two cells with the greatest number of deaths. Fewer deaths are noted across the High Plains region of the states of Kansas and Colorado, and also over southwestern Texas. Death counts are highest in urban grid cells including the settlements of Amarillo and Wichita Falls, Oklahoma City, Wichita, Kansas City, Joplin, and Little Rock. However, parts of western Kansas and eastern Colorado, where tornadoes (shown as gray rectangles) are quite numerous, have no deaths attributed to them.
In terms of injuries, counts per cell range from a low of 0 to a high of 1500. The spatial distribution is more uniform compared to deaths, but exceptions include parts of the High Plains and the southern Appalachians, where there are noticeably fewer, and areas of the southern Great Plains, the Southeast, and the Ohio Valley, where there are more. Again, urban cells tend to have the highest tornado injuries. Note that uncertainty about what constitutes a tornado injury, and about how reporting policies might have changed over time, makes the injury map more uncertain than the death map.
Clusters of high casualties are more clearly identified by computing local Moran's I (Anselin 1995) where the local neighbourhood is defined as eight contiguous cells (queen's case). Values higher than zero indicate cells with similar counts in neighbouring cells (either high-high or low-low) and values less than zero indicate cells with dissimilar counts in neighbouring cells. Values close to zero indicate randomness. The region across northern Alabama, extending into northern and central Mississippi, is identified as having the largest tornado death cluster by this method (Figure 7). Smaller clusters associated with urban areas are similarly demarcated.
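To make the cluster statistic concrete, the following is a minimal Python sketch of local Moran's I on a raster with queen's-case neighbours. It assumes row-standardised weights and scales by the population variance of the grid, which is one common convention; the paper does not state which variant or software was used, and the function name and toy grids are hypothetical. Cells on the edge of the grid simply have fewer than eight neighbours.

```python
def local_morans_i(grid):
    """Local Moran's I for each cell of a 2-D list, queen's-case neighbours."""
    n_rows, n_cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    mean = sum(values) / len(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)  # population variance
    if m2 == 0:
        raise ValueError("all cells are equal; the statistic is undefined")
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            # Deviations from the mean in the (up to eight) adjacent cells.
            neighbours = [
                grid[r + dr][c + dc] - mean
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr or dc) and 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            ]
            lag = sum(neighbours) / len(neighbours)  # row-standardised weights
            result[r][c] = (grid[r][c] - mean) / m2 * lag
    return result


# Hypothetical counts: a high cluster in one corner, and a lone high cell.
cluster = [[9, 9, 0, 0], [9, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
outlier = [[0, 0, 0], [0, 8, 0], [0, 0, 0]]
i_cluster = local_morans_i(cluster)
i_outlier = local_morans_i(outlier)
print(i_cluster[0][0] > 0, i_cluster[3][3] > 0, i_outlier[1][1] < 0)
# prints "True True True"
```

The corner of the cluster (high-high) and the far corner (low-low) both score above zero, while the isolated high cell surrounded by zeros scores below zero, matching the interpretation given in the text.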
Over time there are strong dissimilarities in the location of casualties as shown by time-series plots (Figure 8). Considerable spatial and temporal variations exist at this temporal scale. The elevated casualty rates during the second decade include the 1974 Super Outbreak of tornadoes across the eastern United States, and the elevated rates during the most recent decade are driven by the extremely active 2011 season. The most prominent consistency is the few injuries across the High Plains, but this is partly explained by the relatively low population. Clusters of high injury rates also show temporal variability; however, there is a consistent cluster across the northern half of Alabama.

Discussion
Deaths and injuries occur throughout the tornado-prone region of the United States with the highest concentrations in cells encompassing cities and towns. Results from the dasymetric approach match closely with those produced by Ashley (2007): both show a concentration of tornado deaths across the mid-south from central Mississippi northward and westward into northern Alabama and western Tennessee. The southwest-to-northeast corridor of relatively high death rates across Oklahoma is also noted by both. The method is limited by the quality of the data used. For example, the method does not control for the lack of distinction between direct and indirect deaths caused by the tornado. Also, the method uses population data from 2000 to infer deaths dating back to 1955. However, the median difference in the dasymetric-estimated deaths per grid cell (0.5° resolution) using 1990 versus 2000 population values is 1.2%, indicating that the average effect of using a static population is on the order of less than 10% over the six decades of tornado data.

Validation
The procedure is validated by comparing the sum of the estimated number of casualties with the total number. The comparison indicates whether the dasymetric method is volume-preserving, that is, whether it satisfies the pycnophylactic assumption. This is important as a check that the code is working correctly. To that end, calculations across the study area indicate that the method estimated 3525 deaths, compared to 3553 from the SPC database, along with an estimated 55,846 injuries compared to 56,424. The differences are very small, 0.8% and 1.0% respectively, and are due to the fact that some tornado paths cross outside the study area. For these tornadoes, the casualties apportioned to cells within the study area sum to less than the total number of casualties assigned to the tornado (see Figure 1).
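The two percentages follow directly from the totals quoted above, as this short Python check shows. The helper name is hypothetical; only the four totals come from the text.

```python
def relative_shortfall(estimated, reported):
    """Percentage by which the estimated total falls short of the reported total."""
    return 100.0 * (reported - estimated) / reported


deaths_gap = relative_shortfall(3525, 3553)
injuries_gap = relative_shortfall(55846, 56424)
print(f"deaths: {deaths_gap:.1f}%, injuries: {injuries_gap:.1f}%")
# prints "deaths: 0.8%, injuries: 1.0%"
```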
The results are validated first by comparing estimates made at the county level with counts available from the NCDC Storm Events database (http://www.ncdc.noaa.gov/stormevents/). The dasymetric method is applied to the per-tornado casualty counts from the SPC database using county polygons and population densities. In the SPC tornado database, there are 1137 tornado paths that intersect Tennessee throughout the study period. Tennessee is chosen as the first case because of recent interest (Brown et al. 2016). Of these tornadoes, 94 resulted in deaths, with an average of 2.9 deaths per county, a minimum of 0 and a maximum of 20 (Figure 9). The Spearman rank correlation between actual and estimated per-county tornado deaths is 0.86 and the correlation between actual and estimated injuries is 0.88, both indicating a strong relationship.
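For readers unfamiliar with the statistic, a Spearman rank correlation is the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below is a self-contained Python illustration; the function names are hypothetical and the county values are invented for the example, not taken from the Storm Events or SPC data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-county deaths: reported counts and dasymetric estimates.
actual = [0, 2, 5, 1, 20]
estimated = [0.3, 1.8, 4.1, 2.5, 12.0]
rho = spearman(actual, estimated)
print(round(rho, 2))  # 0.9
```

Because only ranks enter the calculation, the statistic is insensitive to the large outlying county (20 deaths), which is one reason a rank correlation suits heavily skewed casualty counts.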
Comparisons are also done for the states of Kansas, Missouri, Kentucky, Oklahoma, and Arkansas (Table 2). Overall, correlations are slightly higher for injuries than for deaths. For Kansas the correlation between actual and estimated tornado deaths is 0.79, and between actual and estimated tornado injuries is 0.92. Ranking the states from highest correlation between estimated and actual injuries, Arkansas is followed by Kansas, Tennessee, Missouri, Oklahoma, and Kentucky. There is no spatial pattern in these correlations and the results suggest that the dasymetric methodology of allocating casualties spatially is a good approximation. Note that the available county-level data provide a spatial disaggregation of the tornado deaths but they are not available with the same consistency throughout the United States nor do they allow for finer scale assessments.
There are 536 tornado deaths over the period across a spatial region extending from 71° to 106° W longitude and from 25° to 49° N latitude. Deaths occurring in each latitude-longitude cell are summed and compared with deaths estimated using the dasymetric method. Comparisons are made for different raster resolutions (Figure 10). The Spearman rank correlations between the location-specific death counts and the dasymetric estimated death counts exceed 0.8 and are highest for the larger cell sizes.

Table 2. Relationship between estimated and actual tornado casualties at the county level. $r_d$ is the Spearman rank correlation between actual (a) and estimated (e) tornado deaths, as well as the Spearman rank correlation between actual tornado deaths and number of killer tornadoes (nT) and $r_i$ is the correlation between actual and estimated tornado injuries, as well as the Spearman rank correlation between actual tornado deaths and number of killer tornadoes. States are ordered by decreasing correlation between estimated and actual injuries.

Limitations
The number of people under the path of a tornado is logically an important factor in predicting casualties, but there are others. The dasymetric method determines where along the path the casualties are most likely to have occurred given the underlying residential population and raster resolution. Limitations are highlighted by applying the method to individual tornadoes and noting the point locations where the deaths occurred (Figure 11). The first example is the infamous tornado that occurred during the 1974 Super Outbreak. In the late afternoon of April 3rd, an EF5 tornado struck the city of Xenia, OH killing 36 people and completely destroying 420 houses and 179 business and commercial buildings (Grazulis 1990; Boykin and Fisher 2014). Using the dasymetric method, the probability of at least one death in a cell under the path is determined by proportionally allocating population and path area intersection. Here, the cell size is the native resolution of the population raster (0.0416°). Probabilities are highest in the vicinity of Xenia. Cells with the three highest probabilities contain locations where multiple deaths occurred, although the cell with the highest probability contains fewer deaths than the cell with the next highest probability. There is good correspondence between cell probabilities and where the deaths occurred, but this example highlights the limitation of assuming a one-to-one relationship between population density and casualty count.

Figure 10. Spearman rank correlation between location-specific deaths and dasymetric estimates in rasters of increasing cell size. The 95% uncertainty interval on the statistic is shown as an error bar.
The second example is the tornado that occurred during the Palm Sunday outbreak of 1994. On the morning of 27 March, a tornado tracked northeastward across eastern Alabama killing 22 people. Again, the probability of at least one death in a cell is the proportional allocation using population and path area intersection. The probability of a death is highest for the cells near, or containing, the towns of Alexandria, Jacksonville, and Piedmont. One death occurred near Alexandria when a van was blown off the highway and twenty deaths occurred in a church near Piedmont. The point locations of the deaths in this example underscore the uncertainty associated with assuming a straight-line path. Given the fact that the church was destroyed, its location was, by definition, within the tornado damage path. The example also highlights the limitation of using residential population density when a tornado affects a community largely at work or elsewhere (in this case, at church). The above limitations for a particular event notwithstanding, this paper shows that the dasymetric approach to apportioning casualties spatially produces results that are valid when aggregated over a reasonable number of events and over large enough areas (e.g. state counties).

Conclusions and further work
Aggregated tornado casualty counts covering a long period of time are unavailable at a local level. This paper applied a dasymetric method linking tornado-induced casualty data with population density to estimate the spatial distribution of known casualties (deaths and injuries) and to aggregate them on a map. Validation of the method using state counties indicates strong correlations between estimated and observed counts, exceeding 0.73 for deaths and 0.83 for injuries. At relevant levels of spatial aggregation this method provides a better estimate of where casualties have occurred compared with counting the number of casualty-producing tornadoes.
What can be garnered from the methodology is that there is a relatively uniform spatial distribution of tornado-induced casualties across the study area covering the most tornado-prone areas of the United States. Slightly higher rates are found in the mid-south, particularly in northern Mississippi and Alabama, and also in many metropolitan areas. In addition, there is some degree of spatial variation over time, particularly clusters of high rates of injury across the northern half of Alabama. Validation of the results at the county level indicates that casualty numbers from six states correlate strongly with the dasymetric estimates. Validation of the results at various grid resolutions indicates that the best correspondence between location-specific deaths and estimated deaths occurs with cell sizes larger than 0.75°.
The technique is straightforward and relies on the assumption of a fixed, straight-line representation of a tornado path, which can be modified as a tapering, curved or even discontinuous path to better capture the actual tornado path. For instance, the Joplin tornado of 2011 represented as a straight-line path between genesis and dissipation would miss the part of a densely populated city that it severely impacted. Greater spatial precision on the path would be achieved by assigning more weight to the proportion of the path area with higher EF rating. In this regard, it would be interesting to apply the dasymetric method to the set of tornadoes covered by the more realistic paths in the National Weather Service's Damage Assessment Toolkit where this information is available (see e.g. Fricker et al. 2014). Assessing the feasibility of using socioeconomic data other than population is left for future work. Variables representing affluence and ethnic groups would determine the demographic profiles of the populations affected by tornadoes, and at the same time help identify vulnerable populations, which could inform mitigation policies.