Geospatial analysis of neighborhood deprivation index (NDI) for the United States by county

ABSTRACT Little is known about the spatial clustering of neighborhood deprivation across the United States (US). Using data from the 2010 US Census Bureau, we created a neighborhood deprivation index (NDI: higher NDI indicates higher deprivation/lower neighborhood socioeconomic status) for each county within the US. County level scores were loaded into ArcGIS 10.5.1, where they were mapped and analyzed using Moran’s I and Anselin Local Moran’s I. Ultimately, NDI varies spatially across the US. The highest NDI scores were found in the Southeastern and Southwestern US states, and inland regions of Southern California. This information is critical for public health initiative development as planners may need to tailor the scale of their efforts based on the higher NDI neighborhoods of the county or geographic region with potentially greater chronic disease burden.


Introduction
Living in an area with lower socioeconomic status (SES) has been linked to poor physical and mental health outcomes, as SES indicators (e.g. income, poverty, education) are major predictors of health and health disparities not just in the United States (US), but across the world (Commission on the Social Determinants of Health, 2008; Diez Roux et al., 2010; Robinette et al., 2017). A recent study illuminated the link between neighborhood SES and 18 mental and physical health conditions using a nationally representative cohort, the longitudinal Midlife in the United States (MIDUS) study (Robinette et al., 2017). Results indicated that even after adjusting for individual-level factors, the odds of developing two or more health conditions (mental and/or physical illness) were lower for every $10,000 increase in neighborhood income, regardless of length of time spent in the neighborhood (Robinette et al., 2017). These relationships may be due to neighborhood socioeconomic level influencing the number of grocery stores, recreation centers, and other available health-promoting community assets (Diez-Roux et al., 1999; Moore et al., 2008; Morland et al., 2002; Schootman et al., 2007; Yen & Kaplan, 1998).
In recent years, Geographic Information Systems (GIS) have been beneficial in community assessments, particularly in representing the distribution of resources within specific geographic areas (Graham et al., 2011).
For instance, GIS has been used to examine the socioeconomic distribution across particular states, such as through the South Carolina Commission for Minority Affairs, which identified South Carolina's most socioeconomically affluent and deprived counties (Carter, 2018). Z-standardized scores from variables on poverty, unemployment, per capita income, and median household income were used to create ordinal categories (below average, average, and above average) (Carter, 2018). Their maps illustrated that the most deprived counties appear to cluster in rural parts of South Carolina, whereas the more affluent counties appear to cluster in the more urban parts of the state (Carter, 2018). On a broader scale, the distribution of zip code-level neighborhood disadvantage throughout the US has recently been examined (Kind & Buckingham, 2018). The area deprivation index was created using 17 variables taken from the 2013 American Community Survey data, where higher deprivation represents lower SES. The most deprived zip codes appear to be geographically clustered (i.e. unusually concentrated) in the Southeastern and Southwestern regions of the US; however, there was no statistical analysis to examine the extent of the spatial clustering of deprivation in the US (Kind & Buckingham, 2018). Among the studies that have developed deprivation indices in the US, very few have used GIS to examine the spatial distribution and spatial clustering of neighborhood deprivation across the US and within geographic regions (Carter, 2018; Kind & Buckingham, 2018).
Using publicly available data from the US Census Bureau, we sought to create a neighborhood deprivation index (NDI) to visually represent county level deprivation across the US. We investigated the spatial distribution of NDI scores on the county level across the United States. Thus, the objectives of this study were to plot NDI across the US and to test whether NDI varies spatially. Ultimately, identifying where NDI clusters geographically in the US may be important in helping to inform community-based research, aid in targeting public health resources, and inform policy makers about potential deleterious characteristics within their jurisdictions.

Data and measures
The NDI was created using publicly available data from the 2010 5-year estimates of the American Community Survey from the US Census Bureau (Bureau USC, 2010), and the methods for creating this NDI were adapted from those published previously (Diez Roux et al., 2004; Lian et al., 2016). We gathered 13 sociodemographic variables on employment/occupation, education, housing conditions, wealth, and income from the 2010 American Community Survey (additional information on the variables within these constructs can be found in Table 1) (Feng et al., 2014). The variables were then loaded into SPSS, where they were z-standardized. Factor analysis was conducted using Promax rotation, a minimum loading score of 0.40, and a minimum eigenvalue of 1. Ultimately, those factors with a Cronbach's alpha greater than 0.70 were used to create the final NDI measures. The variables from the factors that fit the criteria were: (1) Household Income, (2) Home Value, (3) % Public Assistance, (4) % Family Poverty, (5) % Employed in Management, (6) % Housing Units Receiving Rental Income, (7) % Female-Headed Household, (8) % Households Without Telephone, (9) % Owner Occupied Housing Units, (10) % High School Graduates, and (11) % Bachelor's Degree or Higher. The sum of these variables was used to create the final NDI measure at the county level. Higher scores are associated with higher deprivation, indicating that these areas have lower SES.
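As an illustration of the scoring step only (not the SPSS workflow used here), the following minimal Python sketch z-standardizes a set of county-level variables and sums them into a composite score. The variable names, the toy values, the use of the sample standard deviation, and the sign flip applied to variables for which higher values indicate affluence are all assumptions made for the example; the factor analysis and reliability screening are not reproduced.

```python
from statistics import mean, stdev


def z_standardize(values):
    """Return z-scores (mean 0, sample SD 1) for a list of numbers."""
    mu = mean(values)
    sd = stdev(values)  # sample SD; an assumption about the standardization used
    return [(v - mu) / sd for v in values]


def composite_index(columns, reverse=()):
    """Sum z-standardized variables into one score per county.

    columns: dict mapping a variable name to a list of values (one per county).
    reverse: names whose z-scores are sign-flipped so that a higher total
             always means more deprivation (an assumption of this sketch).
    """
    n = len(next(iter(columns.values())))
    totals = [0.0] * n
    for name, values in columns.items():
        sign = -1.0 if name in reverse else 1.0
        for i, z in enumerate(z_standardize(values)):
            totals[i] += sign * z
    return totals


# Toy data for three hypothetical counties (not values from this study).
toy = {
    "pct_family_poverty": [5.0, 12.0, 25.0],
    "household_income": [72000.0, 51000.0, 33000.0],
}
ndi = composite_index(toy, reverse={"household_income"})
```

In this toy example the third county, with the highest poverty and lowest income, receives the highest composite score, consistent with higher scores indicating higher deprivation.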
Additional data on the racial/ethnic and age composition of these counties were downloaded from the 2010 Census Summary File. Data on health outcomes and behaviors were downloaded from the 2014 Robert Wood Johnson Foundation County Health Indicator Data as this report contains data from the 2010 study period. These data came from the National Center for Health Statistics, Behavioral Risk Factor Surveillance System, National Center for Chronic Disease Prevention and Health Promotion, Dartmouth Atlas of Health Care, and OneSource Global Business Browser. 2010 Rural Urban Classification Codes were downloaded from the US Census Bureau. These data were used to further contextualize the results of our Anselin Local Moran's I analyses.

GIS process
County cartographic boundary shapefiles were downloaded from the US Census Bureau and then uploaded into ArcGIS 10.5.1 (ESRI, Redlands, CA). NDI scores for each county were converted from Microsoft Excel into a comma separated value (CSV) file. The NDI CSV file was uploaded to ArcGIS 10.5.1 and spatially joined with the county shapefile data. Using choropleth maps in ArcGIS 10.5.1, NDI scores were divided into quintiles and mapped across the US, both overall and stratified by the four US regions (South, Northeast, Midwest, West).
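The quintile classification behind the choropleth maps can be sketched outside of ArcGIS as follows. This is a rank-based illustration under simple assumptions (ties are broken by input order and the scores are toy values); the class breaks produced by the ArcGIS quantile classifier may differ slightly.

```python
def quintile_labels(scores):
    """Assign each score a quintile from 1 (lowest fifth) to 5 (highest fifth)."""
    n = len(scores)
    # Indices of the scores sorted from lowest to highest.
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels


# Toy scores for ten hypothetical counties (not values from this study).
toy_scores = [3.2, -1.0, 0.5, 7.7, -4.1, 2.0, 9.9, -0.3, 1.1, 5.5]
toy_quintiles = quintile_labels(toy_scores)
```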

Analysis
The analysis was completed in a two-step process. In the first step, we used the Global Moran's I to investigate if NDI scores at the county level were spatially autocorrelated (i.e. similar scores are located near each other) based on the location of counties and the associated NDI values (Moran, 1950). Moran's I values range between −1 and +1. If Moran's I is positive, it represents a clustering of NDI values across the geographic area. If Moran's I is negative, it means that the NDI values are dispersed across the geographic area. Inverse distance was applied to conceptualize these spatial relationships whereby neighboring features have a larger influence on the computations for a target feature than those further away (ESRI). Using this method, we expected to identify evidence of areas of statistically significant NDI score clustering across the US.
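For readers unfamiliar with the statistic, the minimal sketch below computes Global Moran's I with inverse-distance weights in plain Python. It is an illustration rather than the ArcGIS implementation: the point coordinates stand in for county locations, the values are toy data, and the distance threshold and standardization options available in ArcGIS are ignored.

```python
import math


def inverse_distance_weights(points):
    """Build a weight matrix with w[i][j] = 1 / distance(i, j) and w[i][i] = 0."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / math.dist(points[i], points[j])
    return w


def global_morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i^2."""
    n = len(values)
    xbar = sum(values) / n
    dev = [v - xbar for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


# Six hypothetical locations on a line (not county centroids from this study).
points = [(float(k), 0.0) for k in range(6)]
w = inverse_distance_weights(points)
clustered = global_morans_i([1, 1, 1, 5, 5, 5], w)    # similar values adjacent
alternating = global_morans_i([1, 5, 1, 5, 1, 5], w)  # dissimilar values adjacent
```

With these toy inputs, the arrangement in which similar values sit next to each other yields a positive I, and the alternating arrangement yields a negative I, matching the interpretation given above.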
In the second step, we used the Anselin Local Moran's I to identify specific regions within the United States with high and low NDIs, in addition to counties whose NDI is significantly different from that of the counties near them (Anselin, 1995). The resulting output provides a map of the spatial distribution of significant clustering to identify hot spots (areas of higher neighborhood deprivation), cold spots (areas of low neighborhood deprivation), and spatial outliers (e.g. a county with a low neighborhood deprivation index that is surrounded by high deprivation counties). Given that both of these analytical tools rely on the attributes of neighboring counties, we used data only for counties in the contiguous US (Arc-GIS, 2018).
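The cluster and outlier categories can be illustrated with the following minimal sketch of a local Moran's I calculation. It assumes row-standardized inverse-distance weights on a line of hypothetical locations and assigns only the quadrant label (HH, LL, HL, LH); the permutation-based significance test that ArcGIS uses to decide which counties are reported as clustered is deliberately omitted, and the variance term is a simplification.

```python
def line_weights(n):
    """Inverse-distance weights for n hypothetical locations spaced 1 unit apart."""
    return [[0.0 if i == j else 1.0 / abs(i - j) for j in range(n)]
            for i in range(n)]


def local_morans_i(values, weights):
    """Return a (local I, quadrant label) pair for every location.

    HH: high value near high values (hot spot candidate)
    LL: low value near low values (cold spot candidate)
    HL / LH: high (low) value surrounded by low (high) values (outlier candidate)
    """
    n = len(values)
    xbar = sum(values) / n
    dev = [v - xbar for v in values]
    m2 = sum(d * d for d in dev) / n  # simplified variance term
    results = []
    for i in range(n):
        row_sum = sum(weights[i])
        # Weighted mean deviation of the neighbors (row-standardized lag).
        lag = sum(weights[i][j] * dev[j] for j in range(n)) / row_sum
        local_i = dev[i] * lag / m2
        if dev[i] >= 0 and lag >= 0:
            label = "HH"
        elif dev[i] < 0 and lag < 0:
            label = "LL"
        elif dev[i] >= 0:
            label = "HL"
        else:
            label = "LH"
        results.append((local_i, label))
    return results


# Toy example: one low-score location surrounded by high-score locations.
toy = local_morans_i([9, 9, 1, 9, 9], line_weights(5))
```

In this toy example the middle location is labeled LH with a negative local I, corresponding to the spatial outlier case described above.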

Overall and regional distributions of NDI
The NDI ranged from −15.62 (least deprived) to 30.58 (most deprived) across the 3,109 counties in the contiguous United States (Main Map). Based on the distribution of NDI scores, higher NDI scores can be found along the Mississippi River and Southeastern US (Main Map, Figure 1). Lower NDI scores tend to be found in Northeast and Midwest regions of the country (Figures 2 and 3), in addition to coastal areas of California (Figure 4).

Moran's I values
The value of Moran's I for the contiguous areas of the US was 0.41, indicating an overall spatial clustering of NDI by county (Table 2). Since the z-score was 80.57, there was a less than 1% likelihood that this clustered pattern was the result of random chance. On a regional level, these relationships still existed. Southern states had a Moran's I value of 0.37 and a z-score of 34.46, Northeast states had a Moran's I value of 0.21 with a corresponding z-score of 8.55, Midwestern states had a Moran's I of 0.29 and a z-score of 21.81, and Western states had a Moran's I of 0.30 and a corresponding z-score of 13.01. Based on the z-scores for each of these regions, there was spatial clustering of NDIs at the county level. For each of these regions, there is a less than 1% likelihood that these clustering patterns were the result of random chance.
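The link between the reported z-scores and the 1% threshold can be checked with a short calculation. The sketch below assumes each z-score is compared against a standard normal distribution using a two-sided test, which is an assumption about how the significance levels were read rather than a description of the ArcGIS output.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z-score under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# z-scores reported in the text for the contiguous US and the four regions.
reported_z = {
    "Contiguous US": 80.57,
    "South": 34.46,
    "Northeast": 8.55,
    "Midwest": 21.81,
    "West": 13.01,
}
p_values = {region: two_sided_p(z) for region, z in reported_z.items()}
```

Under this assumption, every reported z-score corresponds to a p-value far below 0.01; the very large z-scores underflow to zero in floating-point arithmetic.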

Anselin's local Moran's I - Overall
The Anselin Local Moran's I identified five different groups of counties based on their similarity or difference to the adjacent counties' NDI (Table 3). However, there were distinct differences in these relationships based on region. For example, 33% of counties in the South, 35% of counties in the Northeast, 32% of counties in the Mid-West, and 60% of counties in the West were classified as unclustered.
In the US, there were statistically significant clusters of both high and low NDIs (Table 3). For example, 27% of all US counties were considered to be within statistically significant clusters of high NDI values. These relationships differed by region with 53% of Southern counties, 2% of Northeastern counties, 3% of Mid-Western counties, and 8% of Western counties being within statistically significant clustering of high NDI values. Similarly, 28% of all US counties were classified as being within a clustering of low NDI values. There are regional differences in the clustering of low NDI scores with 4% of Southern counties, 53% of Northeastern counties, 57% of Mid-Western counties, and 25% of Western counties being classified within a statistically significant cluster of low NDI scores.

The Anselin local Moran's I - Regional
The results of the Anselin Local Moran's I indicated that high deprivation clusters are located in counties in Northern Arizona, California, and southern Texas in addition to the majority of the Southeastern US (Main Map). Clusters of low NDI scores are primarily located in the Northeastern and Midwestern states (Main Map).
In the Southeastern states, the highest deprived areas appear to be widespread among all of the states, except for Texas. In Texas, high deprivation clusters  Figure 5).
In the Northeastern US, the high deprivation clusters are found in the New York City metropolitan area and near Concord, New Hampshire. Large clusters of low deprivation areas are found throughout the majority of these states. However, there are high deprivation outliers located throughout central and south Pennsylvania, east New York, counties in southern Massachusetts, counties in northeastern Connecticut, and counties in northern Rhode Island (Figure 6). Low deprivation outliers were located in the counties outside of the New York City metropolitan area.
In the Midwestern states, the highest deprived areas appear to be concentrated in southern Missouri and southern Ohio (Figure 7). Low deprivation clusters are found in most states except Ohio; however, high deprivation outliers appear to be concentrated around several midwestern cities including Chicago and Springfield, Illinois (Figure 7). Low deprivation outliers are relatively sparse in this region except for several counties in central South Dakota, southern Indiana, and southern Ohio.
In the western states, high NDI clusters appear to be located throughout California, northeastern Arizona, and northwestern and southeastern New Mexico (Figure 8). Low deprivation clusters are located primarily in Colorado, Wyoming, Utah, and Montana. High deprivation outliers are found primarily in northeastern Colorado, southern Wyoming, and southeastern Montana. Low deprivation outliers are located in several counties near coastal California, southeastern Arizona, western New Mexico, and inland Washington.

County characteristics based on Anselin local Moran's I results
When examining characteristics of counties based on deprivation categories, we found that high deprivation counties were more likely to be mostly rural (n = 406), whereas low deprivation counties were more likely to be mostly urban (n = 310) (Table 4). Additionally, high deprivation counties had a higher percentage of people who are under 45 (58.38%), whereas low deprivation counties had a higher percentage of people 45 and older (46.59%). Of those who live in low deprivation areas, 90.04% were White, 1.54% were Black, 4.50% were Hispanic, 0.89% were Native American, 0.93% were Asian, 0.04% were Pacific Islander, 0.07% were Other, and 0.07% were two or more races. Of those who live in high deprivation areas, 65.74% were White, 21.57% were Black, 8.06% were Hispanic, 1.30% were Native American, 0.74% were Asian, 0.04% were Pacific Islander, 0.08% were Other, and 1.27% were two or more races. In the high deprivation areas, there were 9,922 years of potential life lost, 23% of people reporting fair or poor health, 24% of adults who smoke, 34% of adults who were obese, and 32% of adults who were physically inactive; all of these health outcomes were more prevalent in the high deprivation areas compared to the other types of geographic areas. However, 42% of adults in high deprivation clusters had access to exercise facilities, which was the lowest prevalence across the geographic regions. Low deprivation clusters had the highest percentage of adults with Medicare who received diabetes screening (84.35%) as compared to the other regions.

Discussion
Despite being one of the highest income countries in the world, the US has significant disparities in overall NDI. These geospatial analyses demonstrate that NDI scores are significantly clustered across the US. Based on the maps, NDI varies spatially within the US. For example, the highest deprivation areas are found in the Southeastern and Southwestern US and inland regions of Southern California, while the lowest deprivation areas are located in both the Northeastern and Midwestern regions of the US. Moreover, lower NDI scores are found in many metropolitan areas across the United States, especially in the Southern states, which may be related to these areas being centralized locations for post-secondary education, company headquarters, and military bases, thus drawing in a large population that allows these areas to be economically stable and wealthier. The Anselin Local Moran's I illustrated a significant clustering of counties with high NDI found along the Mississippi River in northeastern Louisiana, western Alabama, and eastern Arkansas and extending through southern regions of Alabama, Georgia, and South Carolina. These findings tend to overlap with regions that are considered to be part of the 'Stroke Belt/Alley', suggesting that neighborhood deprivation is likely to be related to cardiovascular disease (CVD) burden and CVD events in these regions (Prevention C-D for HD and S. Stroke Death Rates, 2018). Our findings contribute to the literature regarding the spatial distribution of neighborhood deprivation index scores across US counties. Given the link between NDI and cardiovascular disease, it is important to examine the regionality of cardiometabolic outcomes (Stimpson et al., 2007). For example, research has found that Type II diabetes appears to be clustered in the 'Stroke Belt' region of the Southeastern US (Cushman et al., 2008; Lee et al., 2014).
Using data from the Reasons for Geographic and Racial Difference in Stroke (REGARDS) Study, counties within the highest tertile of coronary heart disease mortality formed a band stretching from the Northeast through Texas through Southern California (Shuaib et al., 2012). Data from the Robert Wood Johnson Foundation indicated a similar pattern based on the Anselin Local Moran's I results. For example, high deprivation areas had higher premature deaths and a higher percentage of people reporting fair or poor health, adult smokers, adults who are obese, and adults who are physically inactive when compared to the other deprivation groups. Leonard et al. found that poor health and high clusters of food insecurity were common in the Mississippi Delta, Black Belt, Appalachia, and Alaska (Leonard et al., 2018). Our findings are supported by this paper and illuminate how social determinants of health are key to understanding the spatial relationship between material deprivation and adverse health outcomes.
Our Anselin Local Moran's I results take the existing knowledge around neighborhood deprivation and present it visually using GIS technology. Our study illustrated stark differences in the spatial distribution of both low and high areas of NDI based on region. Our results suggest that more urban areas are considered to be low deprivation areas, whereas rural areas are considered to be high deprivation areas. Additionally, 2010 Census data suggest that those who are in these high deprivation areas are younger and more likely to be Black, Hispanic, or Native American. Most of the largest visual differences can be identified in the Southeast as compared to counties in the Northeast. For example, in the Southeast, clusters of counties with low NDI scores are found in the capital cities including Austin (Texas), Tallahassee (Florida), Atlanta (Georgia), Columbia (South Carolina), Frankfort (Kentucky), Charleston (West Virginia), Annapolis (Maryland), and Raleigh (North Carolina). Richmond, Virginia is located in an area without any significant clustering and Dover, Delaware is located in a high deprivation cluster. While the counties immediately surrounding these cities have low NDI values, the counties that are located further away from these regions have clusters of high NDI values. However, the inverse of this phenomenon can be found in the Northeastern counties. With the exception of Providence (Rhode Island), Newark (New Jersey), Bronx (New York), and Boston (Massachusetts), the remaining major cities in the north are located in areas that are classified as low deprivation areas. Overall, these differences suggest that rurality may serve as a catalyst for deprivation in the South, whereas deprivation is concentrated in urban areas within the North.

Strengths and limitations
This investigation has several strengths and limitations. Strengths of this study include objective and publicly available measures of county level characteristics. Additionally, we tested for spatial autocorrelations using Moran's I and Anselin Local Moran's I, which have not been used in relation to NDI across the US.
Limitations include the geographic scale as NDI may be better illustrated on the census tract or census block level to draw more specific conclusions regarding a specific population's neighborhood-level exposure. Additionally, since we are using the county as our unit of analysis, there may be evidence of the modifiable areal unit problem as geopolitical boundaries may change over time, which may ultimately influence our results and the subsequent comparison of these results across multiple years (Waller & Gotway, 2004). Drawing attention to disparities in neighborhood deprivation can aid in developing health-related interventions for disadvantaged populations.

Future directions
These maps can be used in a variety of ways by public health professionals, local government, city planners/developers, and the public. For example, the Kirwan Institute for the Study of Race and Ethnicity from The Ohio State University was commissioned by the Massachusetts Law Reform Institute to examine the geography of opportunity within Massachusetts (Reece & Gambhir, 2009). Specifically, GIS was used to examine the concentration of subsidized housing, housing foreclosures, and subprime lending (Reece & Gambhir, 2009). By using maps highlighting socioeconomic disadvantage, policy makers could gain insight into the contextual factors faced by their constituents. This information is critical when considering public health or public policy initiatives on a regional scale.
Based on the NDI, neighborhood initiatives implemented in the Midwest region may not be as successful or applicable to the Southeast region due to differences in overall deprivation, rurality, and racial composition. In addition, it may be important for public health organizations in the Southeast region to place an emphasis on the neighborhood socioeconomic environment and community assets when planning public initiatives given the widespread clustering of high NDI in the Southeast. For example, The Georgia Smoke and Heart Attack Prevention Program provides monitoring, health assessments, and lifestyle coaching to low-income state residents with hypertension (Health T for A, 2009). North Carolina's Medicaid managed care program, North Carolina Area Health Education Centers, the University of North Carolina School of Medicine and primary care groups have collaborated to improve chronic disease self-management efforts which led to an increase in the overall number of patients meeting goals for diabetes and cholesterol control (Health T for A, 2009). In the Texas border areas, Northeastern, Midwestern, and Western states, specifically counties in Southern California, efforts to improve neighborhood deprivation, and subsequent health disparities, should consider focusing on individual counties and regions. For example, The Steps Program in Broome County, New York enrolled rural families in the area in a walking program to increase the number of adults that were walking for more than 30 min per day at least 5 days per week. Their efforts led to a nearly 7% increase in the number of adults meeting the activity recommendations of 150 min of physical activity per week (Health T for A, 2009; US Department of Health and Human Services, 2008). Additionally, many of the highest deprivation areas found in Arizona, New Mexico, Colorado, Utah, North Dakota, South Dakota, Minnesota, and Wisconsin are located on Native American/Indigenous reservations. 
These maps can be used to target resources for community-based interventions, healthcare facilities, improvements in the built environment, and other funding allocations that will improve health for individuals living on reservations. For example, The Racial and Ethnic Approaches to Community Health (REACH) project and The Albuquerque Area Indian Health Board, Inc., worked with the Ramah Band of Navajo Indians to provide mammograms for indigenous populations in addition to providing public health training and cancer screening techniques to tribal leaders (Health T for A, 2009). The development of future, targeted public health initiatives, similar to the examples provided above, may benefit from the visual representation of the spatial distribution of county-level NDI. However, this study highlights the future need for more granular geographic investigations of neighborhood deprivation. The maps developed in the study can provide policy makers, medical and lay health professionals alike, with insight into socioeconomic factors that are likely to influence adverse health outcomes for their patients (Kind & Buckingham, 2018). We recognize the barriers to changing the health status of disadvantaged communities, such as financial limitations of the tax base in high NDI areas with subsequent limited investment and available resources for healthy living (Leonard et al., 2018; Center FR & A). Our maps also demonstrate the clustering of limited-resource communities, which exacerbates these barriers. However, ongoing partnerships between public health departments in high NDI communities and academia may help in addressing these barriers. Ultimately, these maps can not only aid policy makers, but can help academic researchers when partnering with disadvantaged communities to empower advocacy work by these communities' leaders, particularly through community-based participatory research efforts to improve population health.
Engagement efforts, including community-based participatory research, can also facilitate improved neighborhood social cohesion, as increased neighborhood social cohesion has been shown to be protective against adverse health outcomes (Brisson et al., 2018). Recent research has also highlighted the impact of community and research partnerships. The Academic Community Engagement Core of the Mid-South Transdisciplinary Collaborative Center for Health Disparities Research partnered with a disadvantaged community in Birmingham, Alabama to engage in coalition building and a community survey (Bateman et al., 2017). By engaging the community, the research team was able to establish a community coalition to better address the needs of their community. An additional approach for community-academic partnerships could include asset mapping (Kretzmann & McKnight, 1993). By identifying community assets, researchers can work to help community leaders link residents in these disadvantaged communities to existing resources and identify additional resource needs for which programming can be developed.
Additionally, disadvantaged communities have a variety of structural disadvantages including an increased density of fast food establishments, less conducive environments for physical activity, increased violence, and lack of adequate housing (Khullar & Chokshi, 2018). A potential policy recommendation could include increasing the number of affordable housing developments in disadvantaged neighborhoods. A recent Stanford University study examined the impact of multifamily housing developments funded by the Low Income Housing Tax Credit on surrounding neighborhoods. Their results indicated that building more affordable housing units in low income areas would lead to a reduction in violent and property crime, an increase in the income of home buyers, and an increase in income diverse populations (Diamond & McQuade, 2019). These increases could potentially lead to more investment and community assets within these materially deprived areas.