Evaluating metropolitan spatial development: a method for identifying settlement types and depicting growth patterns

Abstract This research developed a method to explore density and growth rates in the 50 largest metropolitan areas in the United States. The outcomes of this method enable planners to understand the growth pattern of a region and compare it with other regions in the country. The main objectives of this research were: (1) to identify settlement types (high, medium and low density, suburbs, and urban fringes) in each metropolitan area; (2) to depict and compare general metropolitan structures across the nation; and (3) to evaluate the location of recently developed dwellings in relation to settlement types and structures and estimate growth trends. The paper combined the percentage of impervious surface raster dataset (30-m cell size) produced by the US Geological Survey (USGS) and block-level housing unit data produced by the Census Bureau to map housing density across the lower 48 states in 2000 and 2010. Neighbourhood density profiles were used to identify the settlement types (high, medium and low density, suburbs, and urban fringes). The paper then compared density-decay functions of 50 metropolitan areas to depict urban growth patterns and to quantify the number of housing units developed in each settlement type. This method effectively distinguishes regions with higher densification (infill development) rates versus regions with high-paced sprawl. The results showed that Los Angeles, New York and San Francisco metropolitan areas had high percentages of housing units developed in high-density areas between 2000 and 2010, while during the same period Birmingham, Nashville and Charlotte had high percentages of housing development in areas identified as suburbs.


INTRODUCTION
This research develops a method to explore the spatial distribution of growth rates of different settlement types in the 50 largest US metropolitan areas. The method identifies the spatial structure of regions and considerably improves previous methods such as density gradient models.

LITERATURE REVIEW
Evaluation of spatial structures and urban densification policies (infill development) in cities and metropolitan areas is an important ongoing task for many local and regional governments. This can be informed by the theories of the late 1990s related to the compact city form (Bertaud, 2004;Jenks & Burgess, 2000). Despite agreement on the benefits of encouraging higher densities, we can still observe suburbanization processes in urban fringes, which continue contributing to fragmented forms in many US metropolitan areas. Alig, Kline, and Lichtenstein (2004) projected that by 2025 developed land areas will expand by 79% in the United States. Development of natural land degrades many environmental, ecological and socio-economic qualities. Development of greenfield areas intensifies the urban heat island effect and increases heat-related mortality, morbidity and energy demand (Arnfield, 2003). Urbanization also degrades regional ecology (Grimm et al., 2008), increases flood intensities (Ntelekos, Oppenheimer, Smith, & Miller, 2010), and exacerbates stresses on water and air quality (Akimoto, 2003;Brabec, Schulte, & Richards, 2002). In addition, there is ongoing discussion about which spatial patterns of densification achieve a more sustainable structure (Davoudi, 2003). For example, depending on the size of the metropolitan area, mono-or polycentric structures may be preferred to improve accessibility to services and consequently to reduce vehicle miles travelled (Bertaud, 2004;Filion, Bunting, Pavlic, & Langlois, 2010;Hall, 2010).
Since the early 20th century, planners and urban geographers have used density models to explain urban and metropolitan structures (Hall & Barrett, 2012). Many studies have measured and mapped spatial structures and trends in metropolitan areas by measuring population density (e.g., Clark, 1951; Newling, 1969) or land cover types and changes (Auch, Taylor, & Acevedo, 2004; Herold, Goldstein, & Clarke, 2003; Luck & Wu, 2002). Density gradients, also called density-decay models, reveal a picture of density distribution around centres. Density-decay profiles have also been used to explore historical changes and trends (e.g., Filion, Bunting, McSpurren, & Tse, 2004; Griffith & Wong, 2007; Lee & Leigh, 2005; Waddell, 1994). Modifications of this method encompass diverse definitions of centres, density types and decay functions (linear versus non-linear), which have created a rich literature on the methodology of using density gradients. Filion et al. (2004) used density gradients to identify the pattern of 15 metropolitan areas (three in Canada, 12 in the United States). They quantified the density of built-up areas and modelled them using cubic polynomial regression to recognize more than one density peak. Estiri, Krause, and Heris (2015) used density gradients to explain spatial patterns in demographics of households in metropolitan areas. They found that younger households tend to reside in centres of urban areas whereas older households are more likely to choose suburban areas. Density-gradient models simplify the spatial data into graph form, which can mask much of the spatial variability in metropolitan form.
Regional Studies, Regional Science
Spatial autocorrelation is an issue when using density functions. Griffith and Wong (2007) used a spatial regression model in order to account for spatial autocorrelation. They studied the 20 largest US metropolitan areas using census data from 2000 to identify mono- and polycentric metropolitan areas. They found that most metropolitan areas have a monocentric structure. Their modified method to account for autocorrelation when measuring the structure of polycentric areas only improved the measurement for three cases (Los Angeles, Washington, DC, and Detroit). The general rule suggested in most studies is that steep density curves indicate stronger centralization and curves with multiple peaks indicate multiple centres (Filion et al., 2004). However, the diversity of metropolitan area forms and specifications requires a unique linear or polynomial function for each metropolitan region.
Identifying settlement types and combining them with density-decay models can reveal how development policies and economic variables may have influenced the location of new developments. Bibby and Shepherd (2004) used a method that measures housing density in multiple neighbourhoods to classify urban and rural settlements in England. They used urban and rural classifications produced through this method to analyze policy options requiring urban and rural differentiation, such as development intensification and historic preservation. In North America, however, there is still a gap between literature and policies when defining settlement types or urban morphology. In most studies, local government and municipal boundaries have been used to represent urban areas (Theobald, 2001). However, defining settlement types independently of administrative boundaries is essential for planning applications. Bibby and Shepherd (2004) discussed how urban versus rural definitions are sometimes confusing for researchers, and there is a need in policy-making for delineating such boundaries.
Quantification of urban form and trend analysis of settlement types also assists planners in identifying urban sprawl and suburbanization processes (Brueckner, 2000). Quantifying urban form for measuring sprawl has been very well researched (e.g., Ewing, Pendall, Chen, & America, 2003;Galster, Hanson, Coleman, & Freihage, 2001;Torrens & Alberti, 2000). Orenstein, Frenkel, and Jahshan (2014) highlighted how methodological differences can lead to disparate outcomes. The use of uniform main density variables and units of measurement are central to standardizing the quantification of form and sprawl. However, uniform variables are not the end goal. Each approach offers something unique. In most studies, population and housing density have been core variables. Alberti et al. (2003) offered a method for measuring urban development by quantifying ecosystem features. Ewing et al. (2003) carried out a thorough study of different methods of measuring sprawl. They proposed four major factors including (1) residential density, (2) mixed use, location of businesses and services, (3) strength of activity centres, and (4) accessibility of street network. They developed a 22-degree index to measure sprawl from various perspectives. The strength of their method is that it is operationalized for 83 metropolitan areas and sheds a light on different dimensions of sprawl.
The most common way to measure metropolitan growth is identifying land cover change (Auch et al., 2004; Yuan, Sawaya, Loeffelholz, & Bauer, 2005; Zhou, Troy, & Grove, 2008) because a national dataset is readily available. Since 1992, the US Geological Survey (USGS) has developed a National Land Cover Dataset (NLCD), which extracts important data from observations by Landsat satellites. The USGS publishes the NLCD every five years; it includes land cover type, land cover change and percentage of developed imperviousness. In 2004, the USGS published Urban Growth in American Cities which compares urbanized areas in 1970 and 2002 (Auch et al., 2004). That study, and many similar ones, simply uses the developed land category from land-cover data. This type of data and its scale are appropriate for general studies in regional planning. However, there are serious oversimplifications made when using these data (Fujii & Hartshorni, 1995). For example, it is not possible to measure the intensity of urbanization precisely because the data do not account for housing or population density (Bertaud & Malpezzi, 2003).
To date, few studies have given planners and policy-makers all the information needed to address adequately scales and methodologies so that they strategically and effectively inform the practice of planning and policy-making. In general, any proposed method should be easily replicable. This is crucial because studies depicting a precise image are merely a snapshot and are quickly outdated. Mitigating sprawl and processes of decentralization need a monitoring framework at national, regional and metropolitan levels. Therefore, the input data required for such an approach should be reproduced every 5-10 years and be consistently available for the scale of study. This research proposes and validates such a method to measure urban growth with a high spatial resolution using publicly available data throughout the country.

METHODS
All data used in this research are publicly available with nationwide coverage and are being updated every five or 10 years. Three resources were used to reach a high resolution of density distribution.
First, the number of housing units at the census block level: these data were compiled for the whole country for 2000 and 2010. This is the finest scale available for population and housing data. Tables of housing units were downloaded from the US Census Bureau website, and Census Block polygons (GIS files) from the TIGER website. As the volume of the data was very high, a Python script was used to join the data. Census block polygons of 2000 and 2010 are not identical.
Second, per cent of impervious surfaces: in core urban areas, census blocks are small and most of their areas are developed. This gives a relatively high resolution to measure housing density and intensity of urbanization. However, in suburbs, low-density urban areas and urban fringes, census blocks are relatively large and, in most cases, only a small portion of each block is developed. In other words, if we measure housing density by census block, we will have high-resolution density distribution in high-density urban areas and low resolution in suburbs and low-density areas. To distribute housing density or population density appropriately in low-resolution areas, it is necessary to identify the built sites and exclude open spaces. This was done using the NLCD product 'percent developed imperviousness'. The raster dataset covers the nation with 30-m cells.
Third, roads: the main purpose for using the NLCD impervious surface dataset is to exclude land without housing, particularly in suburban communities. However, the presence of roads as impervious surfaces in these areas creates significant noise in the data. Different road network datasets such as TIGER lines and Open Street Map (OSM) were compared, and OSM data were chosen to exclude roads from the impervious surface dataset. This increased the accuracy of the impervious surface dataset.
The methods used to address the objectives are presented in three steps.

Measuring the number of housing units in each cell
In order to measure the settlement type, spatial structure and growth trend, a raster dataset is needed in which each cell has a value for the number of housing units. Converting census blocks (as a vector layer) to a raster dataset allows the housing units to be distributed spatially and their locations to be constrained to developed areas. In addition, since some census blocks in 2010 and 2000 are not identical, converting them to raster cells helps to tackle the boundary disparity issue. Figure 1 shows the algorithm developed to calculate the number of housing units in each cell using four phases. The first phase (1) was filtering the impervious surface dataset to find developed cells. Cells with less than 6% impervious surface were assigned a zero value, indicating they are undeveloped cells (Figure 1B). The assumption is that cells with more than 6% (54 m²) of impervious area could have enough built land to accommodate housing units. To set this threshold, different values were tested iteratively on sample areas, and 6% was selected as the value that best matched the ground-truth data. Then cells that were matched with roads using the OSM dataset were assigned zero values, meaning that they are undeveloped cells and do not have any housing units (Figure 1G). A 3 × 3 neighbourhood analysis was used to reduce the noise of the data (Figure 1D). The filter assigns zeros to cells that have no developed cells in their 3 × 3 neighbourhood. Comparison of the filtered and unfiltered layers through ground truthing shows that the filter increases the accuracy of data. The outcome is a raster dataset presenting developed (one) and undeveloped (zero) cells.
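The three filtering steps of Phase 1 can be sketched on a small grid. This is a minimal illustration and not the authors' script: the function name `developed_mask`, the list-of-lists raster representation and the boolean road mask are assumptions made here, and the 3 × 3 filter is read as "drop a developed cell when none of its eight neighbours is developed".

```python
def developed_mask(impervious_pct, road_mask, threshold=6.0):
    """Return a 0/1 grid marking developed cells.

    impervious_pct: 2-D list, percent imperviousness of each 30-m cell.
    road_mask: 2-D list of booleans, True where a cell coincides with a road.
    threshold: minimum percent imperviousness (6% in the text).
    """
    rows, cols = len(impervious_pct), len(impervious_pct[0])

    # Steps 1 and 2: apply the imperviousness threshold, then zero out road cells.
    candidate = [
        [1 if impervious_pct[r][c] >= threshold and not road_mask[r][c] else 0
         for c in range(cols)]
        for r in range(rows)
    ]

    # Step 3: 3 x 3 noise filter. Assumption: the centre cell itself is not
    # counted, so an isolated developed cell is reset to zero.
    cleaned = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not candidate[r][c]:
                continue
            neighbours = sum(
                candidate[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            cleaned[r][c] = 1 if neighbours > 0 else 0
    return cleaned
```

In this sketch two adjacent impervious cells survive the filter, while a single impervious cell surrounded by open land is treated as noise.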
Phase 2 is counting the number of developed cells in each census block. A Python script was developed to run zonal analysis for all blocks (Figure 1K). Since the census block and raster datasets were extremely large (covering the entire country), the script splits the file into smaller pieces and runs zonal analysis for each piece. Since some blocks (in core urban areas) are too small to cover the number of cells necessary for zonal analysis, zonal analysis was tested with two minimum block sizes: 6000 m² (about six cells) and 30,000 m² (about 30 cells). Through iterative analysis and ground truthing using randomly selected sample areas, those two thresholds were chosen in order to compare the results and ensure that the accuracy of analysis would be good enough in urban and suburban areas. The blocks larger than 30,000 m² were eventually chosen for the final analysis (Figure 1M). Any blocks smaller than this threshold were assumed to be entirely developed. The output of the zonal analysis is a census block layer in which each block has the number of developed cells as an attribute.
Phase 3 distributed the number of housing units evenly across the developed cells to estimate the number of housing units for each developed cell (Figure 1N); the vector census blocks were then converted to a raster dataset. A 30 × 30 m cell grid derived from the raster dataset produced in the previous phase was then used. Phase 4 multiplied the outcome raster with the developed raster to assign numbers to the developed cells and assign zero to undeveloped cells (Figure 1O). Figure 2 shows the final outcome in which the value of each cell represents the number of housing units. The final output covers the entire country and is much more accurate than a simple housing unit density for each block. This dataset is produced for both 2000 and 2010.
[Figure: density profiles, conceptual patterns.]
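Phases 2 to 4 amount to a simple dasymetric allocation, which the following sketch illustrates on a toy grid. It is not the authors' implementation: the name `housing_units_per_cell`, the rasterized `block_ids` grid and the use of cell counts to approximate block area are assumptions made for the example.

```python
from collections import Counter

def housing_units_per_cell(block_ids, developed, block_units,
                           cell_area_m2=900.0, min_block_area_m2=30_000.0):
    """Spread each block's housing units evenly over its developed cells.

    block_ids: 2-D list giving the census block id of each cell.
    developed: 2-D list of 0/1 values from the Phase 1 filter.
    block_units: dict mapping block id to its housing unit count.
    """
    # Phase 2: zonal counts of all cells and of developed cells per block.
    total_cells = Counter()
    developed_cells = Counter()
    for id_row, dev_row in zip(block_ids, developed):
        for block, dev in zip(id_row, dev_row):
            total_cells[block] += 1
            developed_cells[block] += dev

    # Blocks below the size threshold are treated as entirely developed.
    # Assumption: block area is approximated as cell count times cell area.
    small = {b for b, n in total_cells.items()
             if n * cell_area_m2 < min_block_area_m2}

    # Phase 3: housing units per receiving cell in each block.
    units_per_cell = {}
    for block, n_total in total_cells.items():
        n = n_total if block in small else developed_cells[block]
        # Assumption: a large block with no developed cells receives no units.
        units_per_cell[block] = block_units.get(block, 0) / n if n else 0.0

    # Phase 4: keep values on developed cells (or small blocks), zero elsewhere.
    return [
        [units_per_cell[block] if (block in small or dev) else 0.0
         for block, dev in zip(id_row, dev_row)]
        for id_row, dev_row in zip(block_ids, developed)
    ]
```

Apart from large blocks with no developed cells, the sum of the output grid equals the total number of housing units in the input blocks, which offers a quick consistency check.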

Settlement-type classification
The first objective is the classification of settlement types. This classification defines the geography of density distribution in a metropolitan area. The location of recently developed dwellings and the corresponding land classification types partly reflect the influence of policy levers in shaping metropolitan areas. This explains, first, how dense a metropolitan area is and, second, in which settlement types most of the recently developed dwellings are located. To determine the types of settlements throughout the entire country, multi-distance density profiles were used. As Figure 3 shows, this method measures the housing density of each cell (30 m) for different neighbourhood sizes. Bibby and Brindley (2013) introduced this concept to identify settlement types. This method was changed and calibrated for two key reasons. First, the input datasets that Bibby and Brindley used for the UK (postcode dataset) are not available for the United States; instead, census block data were used as the main input. Second, the algorithm and rules Bibby and Brindley used to identify settlement types were trained for the UK context; here a training algorithm was used to find appropriate thresholds for the US context. This method assumes that changes in density as distance increases can help identify the type of settlement. For example, in urban core areas the density starts high and remains high (from a 60-m neighbourhood radius to 1500 m), while the density profile for a cell located in urban fringes, for example, starts lower and will drop quickly as distance (radius) increases.
The average density was calculated at 10 neighbourhood radii, starting from 60 m (60, 120, 200, 350, 500, 750, 1000, 1500, 2000 and 4000 m). To define a signature for each settlement type, sample areas in five metropolitan areas (Denver, Dallas, Boston, Seattle, and Chicago) were selected based on visual observation of development density. Between four and 10 sample areas from each city were used to train the algorithm and calculate the average density profile for each settlement type.
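A multi-distance density profile for one cell can be sketched as follows. The radii come from the text; everything else is an assumption for illustration: the function name `density_profile`, the circular neighbourhood measured between cell centres, and the choice to divide by the area of the cells actually found (so that grid edges do not bias the result).

```python
RADII_M = (60, 120, 200, 350, 500, 750, 1000, 1500, 2000, 4000)

def density_profile(units, row, col, radii_m=RADII_M, cell_size_m=30.0):
    """Return {radius: housing units per hectare} around the cell (row, col).

    units: 2-D list with the number of housing units in each cell.
    """
    rows, cols = len(units), len(units[0])
    cell_ha = cell_size_m ** 2 / 10_000.0  # 0.09 ha for a 30-m cell
    profile = {}
    for radius in radii_m:
        reach = int(radius // cell_size_m)  # search window half-width, in cells
        total = 0.0
        count = 0
        for r in range(max(0, row - reach), min(rows, row + reach + 1)):
            for c in range(max(0, col - reach), min(cols, col + reach + 1)):
                dist = cell_size_m * ((r - row) ** 2 + (c - col) ** 2) ** 0.5
                if dist <= radius:
                    total += units[r][c]
                    count += 1
        # count is at least 1 because the cell itself is always included.
        profile[radius] = total / (count * cell_ha)
    return profile
```

For an isolated cluster of housing the profile drops quickly as the radius grows, whereas in a uniformly built-up area it stays flat, which is the contrast the signatures rely on.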
Unique patterns of density profile changes were extracted from the sample areas and they were defined as signatures for each of the five settlement types. The dataset enables one to draw these profiles for every single cell in the entire country. Figure 4 shows density profiles for the different settlement types derived from the sample areas, reported in housing units/hectare. As presented, high-density urban areas have high density in their close neighbourhoods and remain high for a radius of about 800-1000 m, whereas medium- and low-density urban areas start at a lower level and remain steady for a radius of about 1000 m. These differences create a unique variation for each settlement type, and these variations define the signature of each type. For example, only the density of suburban areas is as low as eight units/hectare at a radius of about 200 m. A series of queries (rules) was created for each signature that can assign a settlement type to each cell across the full dataset. Figure 5 represents the rules applied for each settlement type in the Denver metropolitan area.
[Figure: density profiles of the sample areas (averages for each type).]
The queries detect each settlement type from the variations in its density signature. The same rules were applied to all metropolitan areas to provide a comparable picture of metropolitan forms. For example, high-density urban areas are the only type with densities higher than 15 units/hectare at a distance of 1500 m. This gives a unique window to use the density layer at 1500 m to find high-density urban areas. The same technique was used to identify other types with a series of queries.
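The rule-based step can be sketched as an ordered list of queries evaluated against a cell's density profile. Only the first rule below (more than 15 units/hectare at 1500 m for high-density urban areas) is stated in the text. The second rule is a hypothetical placeholder loosely based on the remark about eight units/hectare at 200 m, and the names `EXAMPLE_RULES` and `classify_cell` and the `"unclassified"` fallback are assumptions of this example, not the calibrated rule set.

```python
EXAMPLE_RULES = [
    # Stated in the text: only high-density urban areas exceed
    # 15 units/ha at a radius of 1500 m.
    ("high-density urban", lambda p: p[1500] > 15),
    # Hypothetical placeholder, NOT the paper's calibrated rule.
    ("suburb", lambda p: 0 < p[200] <= 8),
]

def classify_cell(profile, rules=EXAMPLE_RULES, default="unclassified"):
    """Return the label of the first rule that the density profile satisfies.

    profile: dict mapping neighbourhood radius (m) to density (units/ha).
    rules: ordered list of (label, predicate) pairs; order matters because
           the first match wins.
    """
    for label, matches in rules:
        if matches(profile):
            return label
    return default
```

Keeping the rules as data rather than nested conditionals makes it straightforward to apply one rule set to every metropolitan area, as the text describes.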

Metropolitan structures
Another objective of the research is comparing the density distribution in the selected metropolitan areas. This enables a comparison to be made of density profiles of metropolitan areas and measurement of the trends in centralization or sprawl. A simple description of metropolitan structure is the way density changes in its peripheral area. This profile can present the way development is spread around the centre or centres. The housing unit raster dataset was used to measure the density for concentric zones. This comparison shows the level of urban sprawl or fragmentation. However, it does not precisely show whether the metropolitan areas are mono- or polycentric. Although sub-centres may create a small bump in the profile, in most cases they might not be visible in the profile. The number of housing units was calculated for each ring from 1 to 46 km from the centres. Comparing the density profiles of metropolitan areas in 2000 and 2010 reveals either centralization or fragmentation trends.
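The concentric-ring profile can be sketched as below. The 1-km ring width and the 46 rings follow the text; the function name `ring_profile`, the single centre given as a (row, column) cell, and the use of counted cells as the ring area are assumptions for illustration.

```python
import math

def ring_profile(units, centre, ring_width_m=1000.0, n_rings=46,
                 cell_size_m=30.0):
    """Return a list of (housing units, units per hectare), one per ring.

    units: 2-D list with the number of housing units in each cell.
    centre: (row, col) of the cell taken as the metropolitan centre.
    """
    centre_row, centre_col = centre
    totals = [0.0] * n_rings
    counts = [0] * n_rings
    for r, row in enumerate(units):
        for c, value in enumerate(row):
            dist = cell_size_m * math.hypot(r - centre_row, c - centre_col)
            ring = int(dist // ring_width_m)  # ring 0 covers 0 to ring_width_m
            if ring < n_rings:
                totals[ring] += value
                counts[ring] += 1
    cell_ha = cell_size_m ** 2 / 10_000.0
    # Assumption: ring area is the area of the cells that fall in the ring.
    return [
        (totals[i], totals[i] / (counts[i] * cell_ha) if counts[i] else 0.0)
        for i in range(n_rings)
    ]
```

Running the function on the 2000 and 2010 rasters with the same centre and subtracting the two profiles ring by ring gives the kind of decade comparison discussed in the results.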

RESULTS
The classification of settlements is a useful outcome to provide the context for planning and design policies and growth patterns, which covers a gap in the literature and practical methods. This paper presented how one can create that context. In most planning studies administrative boundaries were used to define urban areas. For example, in the United States municipal areas are a political boundary that does not imply the type or morphology of urban areas. Using these boundaries as the geography of urban areas could be misleading. Providing a classification based on the morphology of built environments without being confused with political boundaries could help urban and regional researchers to contextualize their variables within the appropriate morphology. In addition, mapping settlement classification in different metropolitan areas illustrates their structures. Figure 6 shows the results of settlement classification for six metropolitan areas (Boston, Philadelphia, Denver, Chicago, DC-Baltimore and New York). This figure alone shows the magnitude of centrality and fragmentation. For instance, Chicago is surrounded by urban fringe areas encompassing a network of continuous low-density corridors, whereas in the Boston metropolitan area we do not see those corridors. Instead, there are some satellite towns with dense cores.
Two measures were used to compare the forms of metropolitan areas. First, the percentage of each settlement type was calculated from the area of all settlement types within each metropolitan area's boundaries. Second, density profiles of each metropolitan area were measured from their centres in 2000 and 2010. Among 50 cities, the five highest percentages of high-density urban areas are for Los Angeles, New York, San Francisco, Miami and Chicago. The highest percentages of suburb and fringe areas are for Birmingham-Hoover, AL; Nashville, TN; Pittsburgh, PA; Kansas City, KS; and Charlotte-Gastonia-Rock Hill, NC-SC. Table 1 presents the percentages of each settlement type from the area of all types for combined metropolitan areas in 2010. It shows the extent of density and compactness in each metropolitan area. However, it does not reveal the magnitude of the high-density classification. For example, according to Table 1, Los Angeles has a higher percentage of high-density areas compared with New York, whereas New York has a significantly denser centre. To visualize density distribution in each metropolitan area, density change was measured through density profiles. Figure 7 presents density profiles of New York, Los Angeles, Chicago and Boston in 2010. To produce this profile, the number of housing units from one to 46 km from centres was measured. As Figure 7 indicates, New York has significantly more high-density area compared with other metro areas. This density is the highest from Lower Manhattan to Central Park. The metropolitan area with the second highest magnitude of density is Los Angeles. These profiles also illuminate the way density changes when distance from the centre increases. For example, density profiles for Los Angeles and Chicago are rather similar, which shows that after 3-4 km from centres there is a steady line corresponding to relatively low-density urban areas.
Figure 6. Result of settlement classification for six metropolitan areas.
This rich dataset also explains changes of metropolitan forms from 2000 to 2010. Figure 8 shows density profiles for Denver in 2000 and 2010. Within 10 years, the density of the city centre has increased from 15 to 25 housing units/hectare. This is a significant densification and indicates considerable infill development over the decade in central areas. A closer inspection of Denver profiles also reveals that there is a rise occurring in the range of 7-14 km from the centre. This elevated density is due to growth in peripheral municipalities such as Westminster and Aurora, both CO. This method can also identify shrinking cities. Figure 9 shows the density profiles for Detroit in 2000 and 2010. Although it shows a slight increase in the density of the centre, there is a considerable drop in the range of 4-13 km. Although density profiles are not precise measures for distinguishing mono- or polycentricity, we can detect their signals. For example, the density profile for Detroit shows two bumps at 6 km (around New Center) and 10 km (around Dexter Ave) from the centre. These two are obvious sub-centres. However, the profiles suggest there is a centralization trend in Detroit due to increases in the centre and drops in the sub-centres between 2000 and 2010.

THE GEOGRAPHY OF RECENTLY DEVELOPED DWELLINGS
Although these density profiles can partially explain the location of recently developed dwellings (from 2000 to 2010), from the dataset we can measure the proportion of development in each settlement type (Table 2). According to Table 2, 47% of total dwellings developed in the New York-Northern New Jersey-Long Island metropolitan area were developed in high-density urban areas. This means that in this metropolitan area, intensification policies have been more successfully pursued compared with other metropolitan areas. Metropolitan areas such as San Francisco, Los Angeles and Boston have a similar pattern except that most recent development has taken place in low- and medium-density areas. Nevertheless, in these metropolitan areas, the number of new dwellings in suburbs or undeveloped areas has been relatively low. On the other hand, in some metropolitan areas there are inverse trends. For example, in metropolitan areas such as Birmingham-Hoover, AL, Raleigh-Cary, NC, and Sacramento, CA, development in undeveloped areas has a high percentage (60.9%, 42.6% and 42.2% respectively). Figure 10 shows the percentage of development in undeveloped areas.
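The share of recently developed dwellings in each settlement type can be derived from the two housing-unit rasters and the classification raster, as in the sketch below. The text does not say how losses are handled or which year's classification is used, so two assumptions are made here: only cells with a net gain between 2000 and 2010 are counted, and one classification grid is supplied by the caller. The name `new_dwelling_shares` is also introduced for this example.

```python
from collections import defaultdict

def new_dwelling_shares(units_2000, units_2010, settlement):
    """Return {settlement type: percent of new housing units, 2000 to 2010}.

    units_2000, units_2010: 2-D lists of housing units per cell.
    settlement: 2-D list with the settlement type label of each cell.
    """
    gains = defaultdict(float)
    for row_00, row_10, row_type in zip(units_2000, units_2010, settlement):
        for u00, u10, label in zip(row_00, row_10, row_type):
            gain = u10 - u00
            if gain > 0:  # assumption: cells that lost units are ignored
                gains[label] += gain
    total = sum(gains.values())
    if not total:
        return {}
    return {label: 100.0 * g / total for label, g in gains.items()}
```

The resulting percentages sum to 100 for each metropolitan area, which matches the way proportions per settlement type are reported.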
The results are in line with other studies reporting sprawl in these metropolitan areas. However, most studies have not covered all the 50 metropolitan areas included here. Furthermore, they were focused on developing indices for measuring sprawl. Galster et al. (2001) developed a comprehensive method for measuring urban sprawl through a multidimensional method exploring several aspects of sprawl. Among the 13 urban areas that they investigated, Atlanta had the highest sprawl score and New York the lowest. The present results also show that Atlanta still has a high rate of growth in low-density and undeveloped areas, and New York has the lowest rate.
Mapping the location and quantity of recently developed housing units in a metropolitan area can reveal the direction of growth and, to some extent, the drivers of such patterns. So far, this paper has shown whether the new housing units are developed in high-density urban areas or low-density areas. As an example, this trend was mapped for the Denver metropolitan area. Figure 11 shows three maps through which we can see the location and intensity of development in the region. As presented in the previous sections, Denver has experienced significant infill development in the urban core area. However, the middle map shows there is considerable development in the low-density and urban-fringe areas that creates a ring with an average radius of 20 km around downtown Denver. The right map shows the municipal boundaries in the region. Overlaying these two layers, we can see that each municipality has developed a significant quantity of housing, mostly in the suburbs and urban fringes. Putting together all the numbers, about 70% of recently developed housing in Denver has been in the suburbs and urban fringes. Municipal fragmentation in the region is one of the main drivers contributing to sprawl in the Denver metropolitan area. Of course, there are other economic drivers for such fragmentation that are not discussed here. Denver was mapped only as an example, as each region has its own patterns. This analysis provides very rich data for analyzing growth trends and patterns in each metropolitan area. This paper introduced a method for tracking growth trends and metropolitan structure changes over time. This method can be replicated at the temporal resolution of the input datasets, every five or 10 years. Changes in primary data such as census block boundaries do not affect this method.
The method also provides nationwide outcomes that can be used for comparison of different metropolitan areas and for analyzing mega-regions (Dewar & Epstein, 2007). Through application of the method the growth trends in 50 metropolitan areas were presented. Regions pursuing intensification policies (New York, San Francisco, Los Angeles and Boston) versus sprawling regions (Birmingham-Hoover, AL, Raleigh-Cary, NC, and Sacramento-Arden-Arcade-Roseville, CA) were identified. The density function of all 50 metropolitan areas was produced and then used to describe the growth patterns (centralization, intensification and shrinking processes). Some metropolitan areas such as Atlanta, New York and Los Angeles have a logarithmic decaying function. To model the decay function of polycentric metropolitan areas such as Detroit, a polynomial curve is needed. In addition, it was shown how the geography of growth for each region can be mapped. As an example, the Denver metropolitan area has experienced significant intensification in the centre, while there is considerable suburbanization at the edge due to municipal fragmentation. Developing such a methodology provides an accurate image of metropolitan structures and their spatial growth trends. The variation of development patterns in different cities can be helpful in exploring the presence and efficiency of policy levers. In addition, this high-resolution analysis using nationwide validated data (i.e., census and USGS land cover), which are updated every 5-10 years, provides a foundation to study many other variables such as environmental degradation, public health, economic development and relevant policy approaches.
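A logarithmic decay function can be fitted to a density profile by ordinary least squares. The text does not give the functional form, so this sketch assumes density = a + b · ln(distance), with b expected to be negative for a decaying profile; the name `fit_log_decay` is likewise introduced only for this example.

```python
import math

def fit_log_decay(distances_km, densities):
    """Fit density = a + b * ln(distance) by least squares; return (a, b).

    distances_km: positive distances from the centre (e.g. 1 to 46 km).
    densities: housing units per hectare observed at those distances.
    """
    x = [math.log(d) for d in distances_km]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(densities) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, densities))
    b = sxy / sxx            # slope: change in density per unit of ln(distance)
    a = mean_y - b * mean_x  # intercept: fitted density at 1 km, where ln(1) = 0
    return a, b
```

A profile with clear secondary peaks will leave systematic residuals under this form, which is one way to see why a polynomial curve is suggested for polycentric areas.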
The method presented in this paper offers a nationwide but high-resolution measurement of growth patterns. It produces a high-resolution basis for (1) identifying different settlement types including high-, medium-and low-density urban areas, urban fringes and suburbs, (2) tracing recently developed dwellings (from 2000 to 2010) by identifying their locations, and (3) producing density functions of the metropolitan areas to analyze how density is distributed around the centres. Furthermore, density profiles provide a comparable image of the growth trends for all metropolitan areas. Comparing the profiles of 2000 and 2010 explains whether a metropolitan area has become more centralized, polycentralized or decentralized. This method is easily replicable for the next decade when 2020 Census data become available.