Subpixel land-cover classification for improved urban area estimates using Landsat

ABSTRACT Urban areas are Earth’s fastest growing land use that impact hydrological and ecological systems and the surface energy balance. The identification and extraction of accurate spatial information relating to urban areas is essential for future sustainable city planning owing to its importance within global environmental change and human–environment interactions. However, monitoring urban expansion using medium resolution (30–250 m) imagery remains challenging due to the variety of surface materials that contribute to measured reflectance resulting in spectrally mixed pixels. This research integrates high spatial resolution orthophotos and Landsat imagery to identify differences across a range of diverse urban subsets within the rapidly expanding Perth Metropolitan Region (PMR), Western Australia. Results indicate that calibrating Landsat-derived subpixel land-cover estimates with correction values (calculated from spatially explicit comparisons of subpixel Landsat values to classified high-resolution data which accounts for over [under] estimations of Landsat) reduces moderate resolution urban area over (under) estimates by on an average 55.08% for the PMR. This approach can be applied to other urban areas globally through use of frequently available and/or low-cost high spatial resolution imagery (e.g. using Google Earth). This will improve urban growth estimations to help monitor and measure change whilst providing metrics to facilitate sustainable urban development targets within cities around the world.


Introduction
Urban areas are estimated to cover only 0.5% of Earth's surface yet are one of the fastest growing land use per area basis (Schneider, Friedl, and Potere 2010;Bettencourt and West 2010;Schneider, Friedl, and Potere 2009). Population growth has resulted in increased urbanization with 54% of the planet's seven billion people in 2014 residing in urban areas with an additional 2.5 billion urban dwellers projected by 2050, whilst concurrently extent can influence decision-making (e.g. policy for sustainable urban development) (Schneider, Seto, and Webster 2005;Hepinstall-Cymerman, Coe, and Hutyra 2013;Miller and Small 2003;Bagan and Yamagata 2014).
Due to the heterogeneity of urban areas, subpixel classification methodologies have been increasingly applied to medium spatial resolution data to more accurately represent the mixture of land covers within a pixel (Lu, Moran, and Hetrick 2011;Lu and Weng 2006;Powell and Roberts 2008;Weng and Ruiliang 2013;Wang et al. 2013). This has been achieved through variations of spectral mixture analysis (SMA) where a set number of representative endmembers, frequently following the Vegetation, Impervious and Soil (V-I-S) framework, are used to model the entire image based on their spectral characteristics (Powell et al. 2007;Ridd 1995). However, endmembers may not fully represent image spectral variability or a pixel may be modelled by endmembers that do not represent materials within its field of view resulting in an inability to adequately portray the high spectral heterogeneity of the urban landscape (Powell et al. 2007). Support Vector Machine (SVM) spectral unmixing attempts to resolve this issue through consideration of a large number of training pixels which provides preferential accuracy in comparison to SMA although high dimensional data and large training samples can hinder its performance (Wang et al. 2013).
Comparatively, the novel sub and hard pixel Import Vector Machine (IVM) classifier permits simultaneous multi-class comparison whilst continuously testing training samples for validity providing a more accurate solution (Roscher, Förstner, and Waske 2012). IVM has been found to consistently outperform decision trees, artificial neural networks, and maximum likelihood algorithms (Watanachaturaporn, Arora, and Varshney 2008;Kotsiantis, Zaharakis, and Pintelas 2006;Huang, Davis, and Townshend 2002), with preferential (Braun, Weidner, and Hinz 2012) and comparable results to SVM (Roscher, Waske, and Forstner 2010). However, due to the heterogeneity of urban areas, it is important to calibrate these subpixel approaches against high spatial resolution data that capture the diverse characteristics found within urban environments (Lu, Moran, and Hetrick 2011). Perth, Western Australia (WA), is characterized by extensive urban diversity, surpassing all other major Australian and US cities in terms of suburban development (Kelly, Weidmann, and Walsh 2011;U.S. Department of Commerce 2013). It therefore provides a suitable case study for assessing the ability of Landsat to map urban development, which is a prerequisite for appropriate policy incorporation. This article describes an approach to map the urban extent of the Perth Metropolitan Region (PMR) using an IVM classifier applied to medium spatial resolution imagery. The impact of subpixel landcover heterogeneity is investigated by comparing the urban area estimates to those derived from very high spatial resolution (20 cm) imagery. An innovative, spatially explicit correction to account for over (or under) estimation of urban area is derived which improves the urban land-cover estimates from medium resolution imagery.

Study area
The PMR (Figure 1(a)), WA, has experienced sustained urban development since the twentyfirst century in response to a rapidly growing resource sector (Kennewell and Shaw 2008). The majority of recent urban growth within the PMR has transpired as outward low-density development resulting in a maximum population density of 3662 km 2 which is 33.45% and 24.83% lower than Melbourne (10,827) and Sydney (14,747), respectively (Western  Planning Commission 2015;ABS 2015). The notion of the 'Australian dream,' depicted as detached living in a green suburb, is most pronounced in Perth (Western Australian Planning Commission 2013a). As a result, 79% of the current housing is detached, compared to 62% in Sydney, 72% in Melbourne, and a national average of 74% (Kelly, Weidmann, and Walsh 2011;Western Australian Planning Commission 2013b). Globally, Australia surpasses other developed countries in terms of detached suburban living with UK having 42% of housing as either detached or semi-detached (Department for Communities and Local Government 2015). Similarly, only 64.2% of USA housing stock is detached, with Perth eclipsing all of the major 25 USA metropolitan areas in terms of detached housing (U. S. Department of Commerce 2013). Low population density and outward expansion witnessed in Perth have generated high demand for dispersed amenities and services in a nonstrategic, 'lot-by-lot fashion' (Dhakal 2014). Suburbanization of this nature has been identified as unsustainable due to impacts on ecological systems (e.g. habitat fragmentation) and socio-economic issues (e.g. amenity provisioning costs), with accurate urban area identification essential for sustainable future planning and maximum resource efficiency, particularly in Perth owing to its globally high suburbanization and distributed population (Western Australian Planning Commission 2013a).

Australian
Therefore, the PMR provides a globally diverse range of urban characteristics (e.g. compact urban central business district, older residential areas, and new suburban developments) facilitating broad data set comparison opportunities between Landsat and high spatial resolution urban area estimates. The high spatial resolution data identify the complexity of these suburban and urban areas, which is obscured in medium and coarse spatial resolution data sets. This permits the extraction of individual features such as buildings, roads, and vegetation that compose the urban environment and which are represented as a spectrally mixed pixel in Landsat imagery (illustrated in Figure 2) (Myint et al. 2011).
Definitive feature detection from high-resolution data can assist in refining urban area estimates produced from moderate spatial resolution satellite imagery (Lu, Moran, and Hetrick 2011;Wu and Murray 2003). More accurate satellite-derived urban area estimates are imperative for ensuring appropriate data use for policy and environmental variable applications in order to mitigate the consequences of unsustainable urban development. This aligns with the criteria of effective land-use planning within the City Resilience Framework that is designed to improve city resilience (ARUP, and The Rockefeller Foundation 2015).

Landsat data
Cloud-free Landsat scenes were obtained for 2007 from Landsat 5 Thematic Mapper (TM), coinciding with high-resolution orthophotos (described in Section 3.2). Imagery was acquired within winter months (9 July 2007 for path 113 and 6 October 2007 for path 112) corresponding with peak vegetation green-up which limits issues concerning the spectral separation between senescent vegetation, bare earth, and some impervious surfaces (Feyisa et al. 2016;Chen et al. 2014). Landsat imagery was processed to standard terrain correction (Level 1T), geometrically and topographically corrected using ground control points and a digital elevation model from the Global Land Survey 2000 data set (Hansen and Loveland 2012). Landsat 5 TM surface reflectance values were derived from the Landsat Ecosystem Disturbance Adaptive Processing System (Hansen and Loveland 2012;Jeffrey G. Masek et al. 2006) which corrects for atmospheric effects using the Second Simulation of a Satellite Signal in the Solar Spectrum (6S) radiative transfer model (Vermote et al. 1997).

High spatial resolution airborne imagery
Radiometrically calibrated multispectral red (0.58-0.77 µm), green (0.48-0.63 µm), blue (0.41-0.54 µm), and near-infrared (0.69-1.00 µm) orthophotos were acquired over 19 cloud-free days commencing on 14 March 2007 as part of the Perth and Peel Urban Monitor Programme (Caccetta et al. 2012). Aerial imagery, obtained between 10:00 and 14:00 to reduce shadow effects, was captured using a Microsoft UltraCAM-D at a height of 1300 m resulting in a spatial resolution of 20 cm. Forward and side frame overlap of 60% and 30%, respectively, permitted automatic digital surface model (DSM) extraction using geometric control points provided by WA's land information authority (Landgate). Extraction of ground points exclusively representing terrain variations facilitated derivation of a Ground Elevation Model which, when subtracted from the DSM, generated a Relative Elevation Model, depicting elevation relative to ground points.
Spatial and temporal inconsistencies in reflectance can arise from atmospheric scattering and absorption, instrument noise and Bidirectional Reflection Distribution Function (BRDF) effects. The latter describes the systematic variation in reflectance across an image due to differences in view and illumination angles and which is dependent on the surface 3D structure (Collings, Caccetta, and Campbell 2011). The orthophotos were provided as a surface reflectance product, corrected for multiplicative and additive errors over frames (e.g. instrument noise and atmospheric effects) and within frame viewing and illumination geometry (Caccetta et al. 2012;Collings, Caccetta, and Campbell 2011). Image preprocessing consisted of two steps. First, a combined BRDF and atmospheric correction procedure was applied to retrieve surface reflectance for each image acquisition. Linear BRDF model parameters from the Li Sparse reciprocal kernel (Wanner, Li, and Strahler 1995) were used to correct for BRDF effects. Atmospheric perturbations were corrected by assuming that the obtained digital number represented the relative reflectance affected by spatially dependent multiplicative and additive terms. These combined steps generated an internally consistent mosaicked data set. 'True' surface reflectance was estimated through fitting global offset and gain values to replicate laboratory measured calibration targets based on the assumption that relative reflectance requires a linear transformation to true reflectance (Collings, Caccetta, and Campbell 2011).

Landsat preprocessing
The two Landsat scenes covering the study area were combined to form a seamless image mosaic following the methodology of Pan et al. (2009). Voroni diagrams were created on the bisector between images with adjacent edges defined as seamlines, identifying effective mosaic polygons that specify pixels from each image to include in the final mosaic, facilitating less visible boundaries through blending of overlapping pixels (Pan et al. 2009) (Figure 1(a)). Due to remaining residual noise in the mosaicked imagery caused by factors such as the brightening effect of thin clouds and atmospheric correction differences, surface reflectance values were standardized following the approach identified by Sexton et al. (2013): where p i;b is the standardized pixel value i, from band b based on the original surface reflectance x, standardized through division by a waveband-specific upper reflectance limit which are 0.10 (blue; 0.48 µm), 0.11 (green; 0.56 µm), 0.12 (red; 0.66 µm), 0.23 (near-infrared; 0.84 µm), 0.21 (shortwave infrared; 1.65 µm), 0.15 (shortwave infrared 2; 2.22 µm). The standardized values (p i;b ) were then normalized against the summed band standardized values: where P i p i;b is the sum of each standardized pixel across all bands (Sexton et al. 2013). This approach has been found to satisfactorily reduce variations generated from inherent residual noise across mosaicked imagery, for example due to differences in modelled atmospheric parameters within the LEDAPS algorithm (Sexton et al. 2013;Luo et al. 2014) (Figure 3(a)). Statistical assessment of image radiometric normalization provided in MacLachlan et al. (2017a) found that the post-processed Landsat data exhibited significantly lower inter-and intra-coefficient of variation when compared to the preprocessed data.

Landsat classification
The 2007 Landsat data were classified as a time series of data for seven sequential periods between 1990 and 2015 using an IVM classifier produced in MacLachlan et al. (2017a). The method uses a hybrid strategy that assesses whether new samples (termed import vectors) can be removed in each forward step in order to provide a smoother decision boundary that ideally leads to a more accurate solution (Roscher, Förstner, and Waske 2012). Samples are selected based on how much their incorporation decreases the objective function to minimize the decision boundary to form the optimal separating hyperplane between overlapping clusters (e.g. land-cover types) in spectral feature space (Mountrakis, Jungho, and Ogole 2011;Roscher, Förstner, and Waske 2012;J. Zhu and Hastie 2005). IVM generates two outputs: a soft (subpixel) data set that defines the probability of a pixel containing a given classification value (e.g. land-cover type) and a traditional 'hardened' classified data set (Braun, Weidner, and Hinz 2012). Training samples were collected from the 12 and 19 July 2005 Landsat 5 TM image composite, coinciding with peak vegetation greenness which provides the greatest spectral separability between vegetated and non-vegetated surfaces (Feyisa et al. 2016;Chen et al. 2014). Six land-cover types were defined based on existing literature (e.g. X. Hu and Weng 2009;Schneider 2012;Feyisa et al. 2016) and scene analysis which are high reflectance urban (e.g. concrete), low reflectance urban (e.g. asphalt), forest, water, grassland, and bare earth. Two urban land-cover classes are specified to reduce spectral confusion between spectrally similarly classes (e.g. urban and bare earth) (X. Hu and Weng 2009). For each land-cover type, 250 pixels were randomly identified from across the image for training the IVM classifier that follows the approach used by Foody and Mathur (2006) and Pal and Mather (2003). The IVM algorithm is parameterized using the training data that generate a classification model consisting of spectral profiles for each land-cover type, which are then matched to the Landsat mosaic during classification.
The resulting per-pixel (hardened) classification indicates that the total urban extent of the PMR has increased 45.32% (subpixel estimate of 32.96%) between 1990 (hardened estimate 706.88 km 2 , subpixel estimate 736.93 km 2 ) and 2015 (hardened estimate 1027.22 km 2 , subpixel estimate 979.84 km 2 ) (MacLachlan et al. 2017a). This can be broken down into low reflectance urban cover expanding from a hardened value of 592.83 km 2 (subpixel estimate 668.46 km 2 ) to 839.00 km 2 (subpixel estimate 850.87 km 2 ) and high reflectance urban cover increasing from a hardened value of 114.05 km 2 (subpixel estimate 135.32 km 2 ) to 188.20 km 2 (subpixel estimate 214.06 km 2 ) across the same temporal period.

Google Earth Landsat accuracy assessment
Google Earth imagery consistent with the Landsat acquisition date was used to assess the accuracy of the hardened Landsat classification following previously published methods (e.g. Dorais and Cardille 2011;Cunningham et al. 2015;Song et al. 2016;Sun et al. 2015;Bagan and Yamagata 2014;Z. Zhu and Woodcock 2014). Using the Google Earth imagery, 300 random locations (50 per land-cover class) within the PMR were visually identified and compared to the classified land-cover data, consistent with recommended land-cover accuracy sample size of Congalton (2001) (Song et al. 2016).

Aerial image classification
Urban areas are complex, heterogeneous environments that are challenging to classify even when using high spatial resolution multispectral imagery (Varshney and Rajesh 2014;Lu, Moran, and Hetrick 2011). Within urban areas, traditional moderate and coarse spatial resolution pixel-based classification methods present multiple challenges due to the land surface spatial heterogeneity and the spectral similarity between urban and non-urban materials (Myint et al. 2011). To characterize the influence of spatial resolution on the ability to map urban areas, high spatial resolution multispectral ortho-imagery (20 cm) was classified into the four broad land-cover types. To reduce data processing requirements, four 3 km 2 subsets were chosen which are representative of the land-cover composition and spatial heterogeneity found within Perth (Figure 1(a)). These subsets are an out of town development area (East Beechboro), the Central Business District (CBD), an older suburban area (Palmrya, Melville), and a largely vegetated region (Keysbrook). Using the high spatial resolution multispectral imagery and a relative elevation model, an Object Based Image Analysis (OBIA) method was applied to classify each subset into vegetation, urban, bare earth, and water ( Figure 3(b)). OBIA methods are often applied to high spatial resolution imagery as they include spatial, textural, and spectral information to classify the scene (Myint et al. 2011). Incorporating surface elevation measurements into urban classifications has been found to improve building (urban) extraction accuracy (Aguilar et al. 2012;Poznanska, Bayer, and Bucher 2013). Surface elevation estimates and normalised difference vegetation index data provided additional urban classification parameters, with refinement (e.g. additions and alterations) made based on object spatial, spectral, and textural properties. Unlike the Landsat imagery, the airborne imagery was collected during the late dry season when the grass was senescent which resulted in textural and spectral similarity between bare earth and roads. To mitigate the impact of potential misclassification between these features, Landgate road and, where appropriate, rail vector data set was used for identification of coincident image objects for urban assignment.

Data set comparison and Landsat refinement
In order to compare the orthophoto and Landsat land-cover classifications, the two urban (high and low reflectance) and two vegetation (woodland and grassland) Landsat landcover classes were merged so that both land-cover classifications contained four identical classes. To facilitate comparison between the high spatial resolution orthophoto-derived classification and the Landsat classification, the orthophoto land-cover data are aggregated to Landsat spatial resolution to provide a 'soft' and a 'hard' land-cover data set. To create the soft 30 m 2 orthophoto-derived classification, each resampled 30 m 2 pixel area contains the proportion of each land-cover type within it (Lu, Moran, and Hetrick 2011) (Figure 3(b)). This data set was subsequently 'hardened' by assigning the pixel land-cover type according to the dominant land cover found within the 30 m 2 area.
The comparison methodology is to first compare the per-pixel (i.e. hardened) Landsat land-cover classification with the aggregated (30 m 2 ) orthoimage classification. Misclassified Landsat pixels are assessed further to establish the conditions that lead to erroneous classification using the subpixel proportion information (i.e. soft classification data sets). The latter are also used to identify a spatially explicit correction model to improve urban area estimates from moderate spatial resolution imagery.

Orthophoto and Landsat land-cover comparison
A comparison is conducted between the orthophoto land-cover classification, aggregated to 30 m 2 spatial resolution using the majority land cover, and the IVM 'hardened' Landsat classification. At its native spatial resolution (20 cm; Figures 4(a-d(i))), the orthophoto land-cover classification (Figures 4(a-d(ii))) captures the land-cover spatial heterogeneity found within each region and highlights the difference in the spatial structure between these regions.
A comparison is carried out between the orthophoto land-cover classification, aggregated to 30 m 2 spatial resolution, and the 'hardened' Landsat classification. Figure 4(iii) illustrates the spatial agreement between these data sets and highlights those pixels where the same land-cover type (true) has been assigned to a pixel in both classifications. The areas that are more homogeneous at Landsat's spatial resolution, such as the CBD (urban, Figure 4(b)) and Keysbrook (vegetation, Figure 4(d)), have greater level of agreement (73.14% and 95.68%, respectively). In contrast, the more heterogeneous subsets (East Beechboro and Palmrya, Figures 4(a,c)) have much lower levels of agreement (56.09% and 32.03%, respectively). The differences in agreement result from the subpixel heterogeneity at 30 m 2 spatial resolution. Table 2 shows the percentage of Landsat pixels which contain >50% of a given land-cover for each subset region.
To investigate the influence of subpixel heterogeneity on the ability of Landsat to identify the pixel land-cover type, the classification accuracy is determined as a function of the percentage of urban area within each Landsat pixel for all four subsets ( Figure 5). The urban percentage cover within each Landsat pixel is derived from the orthophoto land-cover classification that has been aggregated to 30 m 2 and that provides the proportion of each land cover within each pixel. The accuracy of the hardened Landsat classification was determined through comparison against the 'hardened' (e.g. aggregated to 30 m 2 ) orthophoto land-cover classification where the per-pixel land-cover type was determined based on the land-cover type with the greatest subpixel proportion. Figure 5 indicates that the hardened Landsat classification results in a relatively high accuracy, with an average of 85.40% (excluding Keysbrook), for pixels containing >50% urban land cover (according to the high spatial resolution land-cover classification). In the subsets of East Beechboro, the CBD, and Palmrya, the overall Landsat classification accuracy drastically declines to 1.99-6.21% when urban land cover within a 30 m 2 pixel area decreases to 40-50%. The classification accuracy then increases with decreasing subpixel urban cover which is particularly evident with Landsat pixels containing 0-10% urban cover. Keysbrook, on the other hand, is a largely vegetated region and exhibits lower accuracy with increasing urban land cover.
In order to understand the counter-intuitive behaviour of such as rapid decrease in classification accuracy in pixels which contain between 40% and 50% urban area ( Figure 5), an analysis of the percentage of pixels classified as a given land-cover type is presented. To do so, all pixels containing different ranges in urban percentage cover (e.g. 0-10%, 20-30%, etc.) were identified using the high spatial resolution land-cover data set. The total percentage of each land-cover type was calculated for all pixels that contained urban percentage cover within each range urban percentage cover (e.g. 0-10%, 20-30%, etc.) using hardened IVM Landsat land-cover data set and the aggregated high spatial resolution land-cover data set (i.e. defined by the dominant land-cover type within a 30 m 2 pixel area). Figure 6 illustrates the percentage of pixels identified as a given land-cover type as indicated by the hardened Landsat land-cover data set and the hardened high spatial resolution orthophoto land-cover data set for pixels which contain differing percentage urban cover (e.g. 0-10%) derived using the original high spatial resolution orthophoto land-cover classification for the East Beechboro subset. This area was selected as it is an intermediate area in terms of land-cover heterogeneity (Figures 2 and 4(a)). The results indicate that the hardened Landsat classification consistently overestimates urban land cover when compared to the 'hardened' high spatial resolution classification which has been aggregated to 30 m 2 based on the dominate land cover within the Landsat pixel area for pixels with 10-50% urban defined by high-resolution data. Table 3 and Figure 7 illustrate the subpixel (30 m 2 ) percentage urban land cover for East Beechboro with the original reflectance imagery for this area shown in Figure 2. The hardened high spatial resolution land-cover data set (left bar in each plot [ Figure 6]) indicates that pixels containing <50% urban land cover are largely dominated by vegetation. In contrast, Landsat largely identifies these pixels as being either urban or vegetated to differing extents and more correctly identifies pixels with 0-10% urban land cover as being predominantly vegetated. For example, pixels containing 40-50% urban area are correctly identified as being vegetated (98.45% of pixels within this range) by the hardened high spatial resolution land-cover data set since these pixels contain on an average 54.72% vegetation, 44.83% urban, and 0.45% bare earth. In contrast, the hardened Landsat land-cover data set identifies 5.65% of pixels containing 40-50% urban cover as being vegetation, 74.28% being urban, and 20.07% being bare earth. As the percentage of urban land-cover decreases, the overall accuracy of the hardened Landsat classification increases due to the increase in Landsat vegetation cover which increases from 5.65% (40-50% urban cover) to 75.41% (0-10% urban cover). The results are similar for the other regional subsets. The rapid decrease in accuracy between 40-50% and 50-60% ( Figure 5) appears extreme as the subset regions are dominated by vegetation and urban land cover (Table 1) which results in the aggregated 30 m 2 pixels being Figure 5. Landsat classification accuracy as a function of the percentage urban cover within Landsat image pixels (as derived from the high spatial resolution land-cover data set) for each of the four subsets. In the Keysbrook subset, no Landsat pixels contained >60% urban land cover. Figure 6. Land-cover type disaggregation for urban land cover (according to the orthophoto imagery) Landsat pixels in East Beechboro. The left axis indicates the total percentage cover of a given land-cover type using all of the pixels within a given range of urban percentage cover range for: (a) 0-10%, (b) 10-20%, (c) 20-30%, (d) 30-40%, (e) 40-50%, (f) 50-60%, (g) 60-70%, (h) 70-80%, (i) 80-90%, and (j) 90-100%. For each percentage urban land-cover graph, the left bar illustrates the overall percentage of pixels from the hardened high spatial resolution classification identified as a given land types whilst the right bar indicates the percentage of hardened Landsat pixels mapped as a given land-cover type.
assigned to vegetation when the percentage urban cover is <50% (Figures 6(a-e)) or urban when the percentage urban cover is >50% (Figures 6(f-j)).
The results in Figure 6 suggest that the spectral data used to train the IVM classification (discussed in Section 4.2) contained spectrally 'mixed' pixels resulting in land-cover Figure 7. Comparison of percentage urban area aggregated to 30 m 2 from high-resolution data (a) and IVM 'soft' Landsat classification (b) highlighting the (overestimation) between the high (c) and moderate (d) spatial resolution estimates for the East Beechboro subset. The classified high spatial resolution data are shown in (e) with the moderate spatial resolution grid (30 m 2 ) overlaid for context (e). type misclassification. To investigate this, the spectral reflectance from Landsat pixels containing 20-30% urban cover for the Palmrya subset, which had the lowest overall agreement and which were identified as being mostly vegetated by the hardened high spatial resolution land-cover data set, is extracted and compared to the spectral reflectance profiles used to train the IVM classification algorithm. Figure 8 indicates that there are strong similarities between the average spectral reflectance profile used to train the IVM classification algorithm and the average spectral profile of the misclassified pixels. This suggests that the IVM classification algorithm is accurately representing the Landsat pixel spectral reflectance properties but that the training data used to develop the classification model contained a high proportion of mixed pixels.
Pure (i.e. homogeneous) pixels are conventionally selected to train classification models (e.g. Weng and Ruiliang 2013) but these are inherently difficult to identify in urban areas owing to the multitude of land covers within a Landsat pixel area. Using the high spatial resolution classification, the percentage of pure pixels, defined here as those containing between 90% and 100% of a single land-cover type, was identified (Table 4). It is evident that some regions contain a high percentage of pure pixels for a given land-cover type, such as vegetation in Keysbrook (92.05%), but that other land-cover types within a region typically have much lower percentages of pure pixels. Pure urban pixels are particularly limited in all subset regions. Whilst the CBD subset obtains a high percentage of pure urban pixels (28.77%), these are predominately urban areas with high spectral reflectance (e.g. concrete), differing from subsets with urban areas which have urban areas with both high and low spectral reflectance (e.g. East Beechboro; Figure 2).

Comparison between Landsat and high spatial resolution impervious surface estimates
Landsat data have been widely applied to map impervious surface area in order to assess its effects on: urban growth dynamics (J. G. Masek, Lindsay, and Goward 2000), the UHI effect (Y. , and surface run-off (Q. Weng 2001). Figure 6 indicates that the 'hardened' Landsat IVM classification overestimates urban land cover,  particularly for pixels containing <50% urban area. The IVM classifier also provides a 'soft' land-cover data set that quantifies the subpixel land-cover proportions.
Here, we investigate the utility of the subpixel Landsat urban land-cover estimates by comparing them to those derived from the high spatial resolution land-cover data set (20 cm) which is used to provide the actual land-cover proportion within each 30 m 2 pixel area. Urban area estimates from each of the four subsets (Figure 1(a)) were spatially averaged over different size spatial windows (30 × 30, 90 × 90, 150 × 150, and 210 × 210 m) in order to account for any errors resulting from pixel heterogeneity, spatial mis-registration, residual atmospheric and BRDF effects, and phenological differences (Ju et al. 2012;Liang, Fang, and Chen 2001;Maiersperger et al. 2013;Ghimire, Rogan, and Miller 2010;Lu, Moran, and Hetrick 2011) that may increase the uncertainty in estimating land-cover proportions (Sexton et al. 2013;Lu, Moran, and Hetrick 2011). Comparison of impervious surface proportions at 30 m 2 , for example the CBD subset (Figure 9), reiterates the overestimation of urban area at 30 m 2 spatial resolution, with a clustering of values towards the upper percentage boundaries associated with lower urban area estimates from the high spatial resolution classification. When neighbourhood averaging is applied, the agreement in urban area typically improves with increasing window size although the subset specific bias remains consistent (Table 5). It is also Figure 8. Average spectral reflectance profile for misclassified pixels (red) from the Palmrya subset for pixels containing 20-30% urban cover compared to the average spectral reflectance profile of pixels used to train the IVM classification algorithm (blue). For (a) forest, (b) low urban reflectance, (c) high urban reflectance, and (d) bare earth. The error bars show the standard deviation. evident that urban area is still overestimated with decreasing urban subpixel proportion even when utilizing the subpixel IVM Landsat classification results.

Refining Landsat estimations using high spatial resolution data
Subpixel land-cover heterogeneity influences Landsat urban area overestimation which must be considered in order to reduce the bias and improve Landsat-derived urban area estimation (Herold et al. 2002;Lu, Moran, and Hetrick 2011;Varshney and Rajesh 2014;Schneider 2012). The complexity and diversity of urban areas identified here from high spatial resolution data, with biases ranging from −2.50% to −34.67%, highlights the inappropriateness of applying a single model to adjust the moderate spatial resolution urban area estimates in a metropolitan region (e.g. Lu, Moran, and Hetrick 2011). The Landsat subpixel urban areas estimates from all four subsets were stratified based on the Landsat subpixel derived urban area and calibrated against the percentage of urban area from the high spatial resolution classification within each moderate spatial resolution pixel area. Both data sets were averaged at the neighbourhood level using a 210 × 210 m window as this provided the best overall relationship (Table 5). Stratification of Landsat subpixel urban estimates into divisions of 10%, consistent with previous results, were selected to develop (using 50% of the data) and test (remaining 50% of the data) regression models to improve the data set agreement (Lu, Moran, and Hetrick 2011). The applied spatially explicit models reduced the bias and root mean square error (RMSE) between the predicted (moderate spatial resolution) and observed (high spatial resolution) estimates (Table 6). It is evident from Table 6 that the adjustment made to the Landsat urban area estimates reduced the overestimation difference of urban area by between 34.38% and 80.67%, with the largest improvement found within Keysbrook. Whilst the corrected Landsat urban area estimates still overestimate the urban area compared to the high spatial resolution data set, the corrected moderate spatial resolution urban area reduces moderate resolution urban area over (under) estimation by on an average 55.08% in comparison to the high spatial resolution data set reducing the average overestimation from 11.86 km 2 per subset to just 0.09 km 2 (Table 6). In the case of this study area, this approach is appropriate for producing more accurate urban area statistics. Due to the frequently reported over and under estimation of land-cover estimates by moderate spatial resolution data, this approach can refine urban estimates for planning development policies that may inform decisionmakers (Z. Zhu and Woodcock 2014;Schneider, Seto, and Webster 2005;Hepinstall-Cymerman, Coe, and Hutyra 2013). However, the derived correction values are not globally applicable since the spatial structure and makeup of urban and suburban areas varies regionally, nationally, and globally. Nevertheless, the methodology implemented here could be replicated to produce localized correction values from other sources of highresolution imagery (e.g. digitization of Google Earth imagery) to calibrate urban area estimates from moderate spatial resolution data.

Discussion
Refined urban estimates are vital in ensuring that suitable sustainable and strategic planning decisions are implemented (Bettencourt and West 2010;Wu and Murray 2003). Table 5. Comparison between high (20 cm 2 ) and moderate (30 m 2 ) spatial resolution subpixel impervious surface estimates considering differing kernel sizes over four subsets (Figure 1)   The hybrid spatial resolution approach applied here to estimate urban area was necessary due to the difficulty in accurately estimating urban area using a traditional per-pixel classification method. This was due to a combination of the sensors moderate (30 m 2 ) spatial resolution, land surface heterogeneity, and the selection of 'mixed' pixels for use in training the classification algorithm. The overall classification accuracy, determined using Google Earth imagery, was on an average 84.00%, which is similar to that found in other studies, albeit for different urban areas (e.g. Gislason, Benediktsson, and Sveinsson 2006;Bagan and Yamagata 2014;Sundarakumar et al. 2012;Luo et al. 2014).
Closer examination of the moderate spatial resolution classification results using a higher resolution data set indicates that when urban land cover within a 30 m 2 area decreases to 40-50% (based on high spatial resolution classification), the Landsat classification accuracy decreased from 85.40% to between 1.99% and 6.21%. This resulted from the Landsat classification overestimating urban area in comparison to high spatial resolution data ( Figure 5) which more correctly identified these pixels as containing a greater per-pixel proportion of vegetation. Pixels containing 40-50% urban cover contained on an average 54.50% vegetation cover excluding Keysbrook. The dominance of vegetation and urban land covers in the regional subset, when ascribed to a 30 m 2 pixel area based on the majority land cover, results in a rapid change in classification accuracy. Strong spectral similarities between training data and misclassified pixels (Figure 8) suggest that the spectral reflectance observations used to train the classification algorithm contained spectrally mixed pixels. The average percentage urban area within a moderate spatial resolution pixel area derived from the high-resolution data was 16.56%, 65.66%, 42.21%, and 0.90% for East Beechboro, CBD, Palmyra, and Keysbrook, respectively. The percentage of 'pure' pixels, defined as those containing over 90% urban land cover, was 28.77% for the CBD but <2.50% for the suburban regional subsets. This highlights the difficulty in selecting pure pixels at moderate spatial resolution and in accurately disentangling mixed spectral reflectance's without the aid of high spatial resolution data. Overestimation of urban extent was most prominent in Keysbrook, where vegetation dominates the subset (97.36%, Table 1). In this instance, Landsat-derived urban area corresponded to 0.28 km 2 compared to 0.08 km 2 from high spatial resolution classification, a difference of only 0.20 km 2 but which equates to 251.74%. In terms of total area difference, the East Beechboro and the CBD Landsat subsets were found to contain 1.75 and 1.70 km 2 more urbanized area, whilst Palmyra data overestimated urban area by 2.79 km 2 compared to the high spatial resolution equivalent due to its suburban nature and associated pixel heterogeneity (Figure 4).
Spatially averaging the Landsat and orthophoto land-cover classifications, to account for potential errors in the data sets (Ghimire, Rogan, and Miller 2010), improved their relationship although Landsat still overestimated urban area with differing bias per subset. Over (under) estimation of urban land from Landsat estimations could result in an under (over) prediction on further environmental variables (e.g. UHI) or policy applications. Multiple studies have used classified per-pixel moderate spatial resolution data to influence policy changes through monitoring urban growth (e.g. Schneider, Seto, and Webster 2005; Hepinstall-Cymerman, Coe, and Hutyra 2013). However, per-pixel methodologies fail to address the issue of mixed pixels, which, as shown here, can result in overestimation of urban area (average: 126.25%, equivalent to 57.58 km 2 within the PMR) (Lu, Moran, and Hetrick 2011). Subpixel methods attempt to remedy this issue but have been found to inaccurately separate impervious land cover from other land-cover types resulting in poor representation of impervious surface area (Lu, Moran, and Hetrick 2011). Consequently, overestimation of urban area may have resulted in suboptimal policies that fail to maximize resource and amenity efficiency (Turner, Lambin, and Reenberg 2010;Downs 2005).
Calibrating Landsat urban estimates using high spatial resolution data reduces the bias, RMSE, and improves urban area estimation. However, the range of bias values across subsets of differing urban land-cover characteristic highlights the inappropriateness of a single regression model due to pixel heterogeneity influencing overestimation (Lu, Moran, and Hetrick 2011). Spatially explicit models, as presented here, permit varying moderate spatial resolution refinement by considering the influence of surface heterogeneity. Whilst the limited availability of low-cost high spatial resolution data can preclude analysis of this type, subset digitization of Google Earth or unmanned aerial vehicle (UAV) imagery may provide a suitable alternative for calibrating Landsat data for improved urban area estimates. Enhanced estimates of urban area would facilitate planning policies that avoid potential environmental and socio-economic consequences of urban development than can result from policies based on over (or under) predicted urban area (ARUP, and The Rockefeller Foundation 2015). For example, classified Landsat data were used to identify spatial clustering, peri urban development, and specialization of land-use in Chengdu, Sichuan province, not considered by China's original 1990 Go West policy, aimed at economically boosting the West of the country. Results were used to reform policy and remediate issues of urban management including service, infrastructure, and resource deficiencies (Schneider, Seto, and Webster 2005). However, traditional Landsat classification may over (or under) estimate urban area and result in ineffective planning, environmental, and policy decisions (Miller and Small 2003;Pravitasari et al. 2015). Therefore, classified subpixel data alongside high spatial resolution imagery (e.g. UAV, Google Earth, high spatial resolution aerial or satellite imagery) as presented here can refine urban estimates facilitating improved decision-making whilst maximizing often limited financial resources. This is especially important in developing countries in regards to directing urban development and resources based on factors including poverty, environmental hazards (e.g. flooding), and current amenity centres (Marfai, Sekaranom, and Ward 2014;Suryahadi and Sumarto 2003).

Conclusion
Landsat imagery from 2007 was used to map the urban extent within the PMR using an IVM classifier that provides both a per-pixel and a subpixel classified data sets. The 2007 Landsat classification overall average accuracy was 84.00% with associated kappa coefficient of 0.78. Comparison between the Landsat per-pixel urban area and urban area estimates obtained from a high spatial resolution (20 cm) orthophoto-derived classification indicates that the moderate spatial resolution classification overestimates urban extent by 126.25% on an average, which is equivalent to 57.58 km 2 in the study area. Similarly, when the high spatial resolution urban area estimates are compared to those derived using a subpixel Landsat classification, the latter still overestimates urban extent by 120.25%.
Accurately quantifying urban expansion within the PMR due to the large population growth over the last decade is important in order to make the efficient use of current resources and to avoid additional amenity, environmental and health expenditure that can impact sprawling cities. Landsat data provide the longest time series of medium spatial resolution imagery to map and monitor urban area. However, the reported over and underestimation inhibits accurate quantification of urbanized land cover which increases uncertainty within global climate models, environmental studies, and targeted urban planning policy. Neighbourhood averaging, to account for potential errors in the data sets, improved the agreement between the two data sets but Landsat subpixel overestimation still remained. The broad differences in bias between the difference subsets indicate that a single regression model is inappropriate to heterogeneous urban land-cover estimates. Therefore, the moderate spatial resolution urban area estimates were corrected using spatially explicit regression models which, on an average, across the four subsets reduced the bias and RMSE by 17.02 and 6.65 km 2 respectively, whilst reducing moderate resolution urban area over (under) estimation by 55.08% Current and future EO satellites that provide complimentary data with enhanced spatial, spectral, and temporal resolution, such as Sentinel-2, may further reduce over or underestimation of urban area experienced by moderate spatial resolution sensors such as Landsat. Similarly, high spatial resolution satellite sensors, such as Worldview-3, are able to remediate discrepancies by capturing the fine spatial detail of urban environments but their cost and small swath limit their widespread application. This might change with companies, such as Planet, which are launching large numbers of small microsatellites that provide high spatial resolution data more frequently. Accurate urban land-cover and land-use mapping is essential in understanding the impact of urban expansion on, for example, social-ecological systems and human health and will improve future sustainable planning of our cities.

Geolocation Information
This research was conducted over the Perth Metropolitan Region, Western Australia, Australia, 31.9505°S, 115.8605°E.