Spatial non-stationarity analysis to estimate dwelling units in Riyadh, Saudi Arabia

ABSTRACT Dwelling unit (population) data at finer zones are important for different applications. The census data are not available every year and published at coarse zones. While the remotely sensed data have addressed this limitation, analysis of spatial non-stationarity in the relationship between dwelling and detailed urban remote-sensing covariates for dwelling estimates is almost novel. Riyadh, Saudi Arabia, is chosen as a case study. The remote-sensing variables have been derived from QuickBird data using an object-based image analysis. To analyse the spatial non-stationarity, ordinary least squares (OLS) and geographically weighted regression (GWR) approaches are applied. The OLS models suggest that utilizing three-residential classes provides more accurate results than built area and residential built area. The GWR models are more accurate than the OLS model. This research proves that the GWR approach cannot completely handle the spatial non-stationarity problem without efficient explanatory variables. For example, the GWR model that uses three-residential classes is the best model to estimate dwelling compared with the GWR model that uses residential built area. This article confirms that the application of the OLS approach with efficient explanatory variables can successfully account the spatial non-stationarity and the results are relatively comparable with the GWR model.


Introduction
The size and distribution of small-area population are necessary for private business and governmental plans (Deng, Wu, & Wang, 2010). For instance, small-area population is powerful in transport planning and traffic engineering, land-use planning, site location analysis, urban-sprawl analysis and national resource management (Martin & Williams, 1992;Pattnaik, Mohan, & Tom, 1998;Rees, Norman, & Brown, 2004). Despite of its importance, census population data have limitation in terms of (1) spatial resolution and (2) temporal resolution (Bhaduri, Bright, Coleman, & Urban, 2007). Regarding the former limitation, census population data are released at a larger geographical area to protect the confidentiality of respondents. In terms of the latter limitation, the census is commonly done every 10 years in most countries and they may be either outdated or not available at all in some developing countries (Rees et al., 2004). Moreover, the census and field survey are time consuming and labour intensive. These limitations have fostered the search for alternative methods to estimate population in a timely and effective way (Liu & Herold, 2007). The developed models for mapping population using a geographic information system (GIS) and remote sensing were classified into two groups: areal interpolation approaches and statistical modelling approaches (Wu, Qiu, & Wang, 2005). This paper is focused on the statistical modelling approaches. However, other studies used social media and phone calls for mapping population (Deville et al., 2014;Patel et al., 2017;Tsou, 2015).
Inhabitants are commonly found in places classified as impervious surfaces, built area or residential built area. This information can be successfully extracted from the satellite data (Tatem, Noor, & Hay, 2005). Bivariate and multivariate regression models are commonly applied to demonstrate the correlation between population and urban remotesensing variables (Langford, 2006;Lu et al., 2010;Lu, Weng, & Li, 2006;Silvan-Cardenas et al., 2010). A bivariate regression model was applied by Lu et al. (2006) to estimate population at the block group level in Marion, Indiana, USA, using Landsat data. Impervious surface, residential area and residential impervious surface were regressed with the square root of population density. Lu et al. (2006) concluded that the residential impervious surface model was the best model to estimate the population density. Joseph et al. (2012) applied a multivariate regression model using three urban remote-sensing variables: mean of impervious surface, mean of vegetation and standard deviation of vegetation.
A potential problem with the most commonly used statistical modelling approaches (bivariate and multivariate regression models) is that the regression coefficients of the dependent and explanatory variables are often assumed to be stationary over space (Langford, 2006). There are some reasons to assume that the relationship between the response and explanatory variables might be varied across the study area. One reason is that it is difficult to get a landuse/cover map with 100% accuracy. It is possible to have some omission and commission errors between the land-use/cover classes and this would be a source of spatial data variability (Lo, 2008). For example, in some parts of the study area, a residential built area could be classified as bare land, and any inhabitants who have lived in those areas would have been missed. Another reason is that the relationship between the variables examined could be fundamentally different over space as spatial variation can be recognized in human behaviour, administrative boundaries or environmental factors that lead to different processes over space (Fotheringham, Brunsdon, & Charlton, 2002;Su, Foody, & Cheng, 2012). For example, the modifiable areal unit problem indicates that the results of spatial data analysis could be affected by modifying the areal boundaries of data aggregation. For this reason, it is not clear to what extent the results obtained (for example, dwelling density) are an artefact of the particular areal boundaries used and whether different results would have been obtained using different boundaries superimposed on the same data. Some researches (Langford, 2006;Lo, 2008;Yuan, Smith, & Limp, 1997) addressed this problem by subdividing the study area into smaller parts (based on administrative units) and then applying a regression model to each part. Langford (2006) indicated an important point about how the part size should be before applying the regression model. Other researches (Dmowska & Stepinski, 2014;Jia & Gaughan, 2016;Ural et al., 2011) applied dasymetric mapping to account the spatial non-stationarity.
Since the mid-2000s, a geographically weighted regression (GWR) has been applied in human geography. The GWR technique takes into account the spatially varying relationship between the investigated variables (Fotheringham et al., 2002). Mennis (2006) mapped the result of the GWR in Philadelphia, USA, at the census tract level using census data and concluded that the variation of the relationship between the variables was accurately explained by the GWR approach compared with the ordinary least squares (OLS) approach. A detailed study was undertaken by Lo (2008) who compared the use of one-class global, four-class global, four-class regional and four-class GWR regression models to estimate population at census tracts in Atlanta, USA, by utilizing Landsat TM data. Lo (2008) concluded that the four-class regional OLS model was more accurate than the one-class and the four-class global OLS models. Moreover, the four-class GWR model produced much more accurate results compared with all the other models (Lo, 2008). Landsat imagery and height data were used to estimate population in Riyadh by using regression through the origin (Alahmadi et al., 2013) and dasymetric mapping (Alahmadi et al., 2016). Alahmadi et al. (2013Alahmadi et al. ( , 2016 concluded that the remotely sensed height data were useful to accurately estimate dwelling units. Dong et al. (2010) extracted building counts, areas and volumes from Landsat TM imagery, remotely sensed height data and parcel boundaries to estimate population at the census block in the eastern part of Denton, USA. In that study, the results of the GWR models were more accurate than those of the OLS models. Moreover, the results indicated that areas with high-density population usually were underestimated (Dong et al., 2010). Joseph et al. (2012) derived houses and vegetation from Landsat ETM+ to estimate population at "Section d'énumération" in Portau-Prince, Haiti. In their study, the GWR models estimated population better than the OLS models (Joseph et al., 2012). Mulatu, Van Der Veen, Becht, Van Oel, and Bekalo (2013) estimated population at the sub-location level in Lake Naivasha Basin, Kenya, using general land-cover classes obtained from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data. In that attempt, the GWR model was more accurate than the OLS model (Mulatu et al., 2013). Bagan and Yamagata (2015) estimated population density at a coarse spatial resolution (1 km) using general landcover classes and concluded that the GWR model outperformed the OLS model.
From the above-mentioned studies, Landsat imagery was commonly used to estimate population. The spatial resolution of Landsat imagery is a source of limitation because it is difficult to demarcate urban objects empirically (e.g. villas) that are much smaller than the pixel size, while this would not be the case when the urban objects (e.g. built area) are relatively large compared with the pixel size of the Landsat imagery. However, objects such as built area may be classified inaccurately due to the presence of mixed pixels in the Landsat imagery. Most studies used urban remote-sensing variables such as impervious surface (Li & Lu, 2016;Lu et al., 2006;Patel et al., 2015), built area (Dong et al., 2010;Langford, 2006), residential built area (Lu et al., 2010(Lu et al., , 2006Silvan-Cardenas et al., 2010) and two-residential classes (Deng et al., 2010;Langford, 2006;Silvan-Cardenas et al., 2010). It is not realistic to deal with the city as homogenous. Transition can arise in building structure, dwelling density, family size, human attitude, landscape design and economic activity within the city blocks (Bhaduri et al., 2007;Cadenasso, Pickett, & Schwarz, 2007). Few studies (Lo, 2008;Mennis, 2003;Mulatu et al., 2013) gave attention to the city heterogeneity in terms of three-residential classes such as low-density, medium-density and high-density urban uses.
This paper examines the city heterogeneity in detail by (1) using various land-use/cover categories obtained from fine-spatial-resolution imagery and (2) exploring the application of different statistical modelling approaches for dwelling estimation. This research examines different models ranging from simple to complex models and demonstrating the amount of change in accuracy between them.
This research aims to analyse the spatial non-stationarity in the relationship between dwelling unit density and urban remote-sensing information for small-area population estimates. The study area and data are presented in the second section, while the methods are explained in the third section, covering object-based image analysis (OBIA), OLS, GWR and accuracy assessment. The results of image analysis, OLS models and GWR models are clarified in the fourth section. Finally, discussion and conclusions are provided in the fifth and sixth sections, respectively.

Study area
Riyadh is the capital and most populated city of Saudi Arabia. According to the census data, population of Riyadh was 2.6 million in 1992, 3.7 million in 2004, 5.2 million in 2010 and 6.5 million in 2016. This rapid increase needs timely and accurate approaches of population estimation to support urban and regional planning.
Riyadh city is heterogeneous in terms of urban structure, dwelling counts, population size, education services and municipal services. Unfortunately, it was not possible to obtain the dwelling unit data for a larger area or more than one ward due to the production cost and other issues. Um Alhamam ward locates in the west part of Riyadh ( Figure 1). This ward has a population of 45,991 inhabitants, an area of 4 km 2 , and contains 488 blocks and 2839 parcels. This ward is chosen because it represents various land covers (residential, services, vegetation and industrial.) and includes different types of residential units (e.g. lowdensity, medium-density and high-density areas), thus making it a very good representative ward of Riyadh and a perfect ward to examine the spatial non-stationarity between dwelling unit density and urban remote-sensing variables.
Data Demographic information at the ward level for 2004 was taken from the General Authority of Statistics (GaStat). A ward in Saudi Arabia has an average total population of 27,500 and a mean area of 4 km 2 . Dwelling unit data at the parcel and block levels for 2004 were obtained from Arriyadh Development Authority. A block includes 15-30 parcels that are encircled by the road network. A parcel is the smallest municipal unit at which statistical data are gathered and has an average area of 550 m 2 . QuickBird data acquired on 27 February 2004 were taken from King Abdulaziz City for Science and Technology (KACST). The geometric and radiometric distortions had been corrected by the National Centre for Remote Sensing Technology (NCRST). The coordinate system of the vector and satellite data is Ain El Abd 1970 UTM Zone 38 N.

Methods
One aim of this study is to estimate population at the parcel level. However, the GaStat publishes the census population data at a coarser spatial level (ward level) and this prevents validation at the parcel level. Dwelling unit counts are available at the parcel level and logically there is a strong positive correlation among population and dwelling units. Thus, dwelling unit density is used as a dependent variable instead of population density.
In this research, all blocks are utilized in model calibrations and then regression coefficients are used to estimate dwelling unit density at the parcel level.
Count of dwelling units (model validation) is computed from the dwelling unit density. Then, based on the average number of people per dwelling (5.9 in the study area), the small-area population map is produced.

Object-based image analysis
A new collection of very-fine-spatial-resolution sensors has been launched since 1999 such as QuickBird (0.65 cm), Pleiades (0.50 cm), GeoEye (0.46 cm) and WorldView-4 (0.31 cm). The development of spaceborne sensor technologies motivates researchers to use the ≤1 m spatial resolution that can minimize the mixed-pixel issue (Lu & Weng, 2009) to map detailed urban structures and land-use information. However, the limitation due to the lower spectral resolution (only four bands) increases the difficulties of distinguishing urban information associated with similar spectral characteristics (Pinho, Fonseca, Korting, De Almeida, & Kux, 2012) when using a classical pixel-based approach. Fine-spatial-resolution satellite data contain urban information with heterogeneous spectral characteristics. Classical pixel-based approaches cannot produce accurate land-cover/use maps (Pu, Landry, & Yu, 2011) because they utilize only spectral characteristics and thus produce a salt-and-pepper effect (Guan et al., 2013;Hamedianfar, Shafri, Mansor, & Ahmad, 2014;Qin, Niu, Chen, Li, & Ban, 2013).
To overcome the above-mentioned problem, region-or object-based approach is used instead of the pixel-based approach (Blaschke, 2010;Pu et al., 2011). The object-based approach can provide accurate results with fine-spatial-resolution satellite data. For example, (1) image pixels can be grouped into objects in a similar way to human visual interpretation of the landscape; (2) land-use classes are obtained based on both spectral and spatial characteristics; and (3) different object levels (i.e. different scales) are incorporated to inherit classification between the different object levels and extract a certain land-use class (Kux & Araújo, 2008).

OLS regression
An OLS model has been utilized to explore the relationship between a dependent variable (dwelling unit density) and an independent variable (urban remotesensing information). The OLS model estimates coefficients of the examined variables based on the bestfitting line of all observations (Equation (1)). The OLS coefficients are assumed to be stationary over space.
where Y i is the dependent variable for observation i (i = 1,2, . . ., m), β 0 denotes the constant, β j is the coefficient for variable j, X ij is the independent variable for the jth variable at ith observation, n denotes the number of independent variables and ε i is the residual term.

Geographically weighted regression
The GWR approach is built on the conventional regression model by allowing the relationship between variables to be local (spatial non-stationarity) rather than global. Thus, the model is expressed as: where (u i , v i ) represents the coordinate location for observation i (i = 1,2, . . ., m). The idea of the GWR is that nearby observations have larger weights than distant observations. The GWR result is influenced by the bandwidth of the spatial weighting function chosen and is relatively unaffected by the choice of weighting function. Although the weighting function can be drawn in several forms, there are two common types: fixed and adaptive. The former is suitable in areas where observations are regularly distributed in the study region, whereas the latter is preferred when the observations are irregularly distributed and the sample points have a clustered pattern in the study region (Fotheringham et al., 2002). More detailed explanation of the GWR approach is found in Fotheringham et al. (2002).

Accuracy assessment
The block level is used to calibrate models, whereas the parcel level is used to validate models. Three different measurements are used for accuracy assessment, including overall relative error (ORE), mean absolute error (MAE) and root mean square error (RMSE).
where EDW denotes the overall estimation of dwelling unit, DW is the overall truth of dwelling unit, EDW tz and DW tz refer to the estimation and truth information of dwelling unit for target zones (i.e. parcels) tz, respectively, and TZ is the overall number of target zones. Smaller values of the accuracy assessment measurements suggest a great model performance.

Dependent and explanatory variables
Square root of dwelling unit density is utilized as the dependent variable (computed as dwelling counts/ area of zones). The proportions of urban remotesensing variables in each zone are used as explanatory variables (computed as K extent of the urban remotesensing variable/overall extent of the urban remotesensing variables). For example, suppose a total area of a source zone is 100 m 2 and it comprises of 40 m 2 bare land and 60 m 2 residential built area. Now, the residential built area proportion will be computed in that zone as 60/100 = 0.6.

Land-use/cover information
The eCognition software (v.8.8) is used to analyse the QuickBird imagery. The implementation of the OBIA requires three essential phases: (1) image segmentation, (2) feature selection and (3) classification algorithms. Image segmentation is defined as a function that groups image pixels into homogenous regions or objects (Ryherd & Woodcock, 1996). In terms of the first phase, multiple layers of image segmentation are created to allow hierarchical processing (Pinho et al., 2012). Layer-1 and Layer-2 image segmentations are dependent on the block and parcel (Erol & Akdeniz, 2005) boundaries, respectively, by adopting a chessboard algorithm (a top-down algorithm). The most important factor at Layer-1 and Layer-2 is defining the vector boundaries. Layer-3 image segmentation is dependent on the image channels (Lyons, Phinn, & Roelfsema, 2012) using a multi-resolution segmentation (a bottom-up algorithm). Shape and scale factors are essential to control the diversity of the image objects. The scale factor influences the size of the image object, and a value of 30 (Walker & Blaschke, 2008) is applied to obtain smaller image objects within Layer-2 (a segmentation based on the parcel boundaries). The shape factor is determined by giving a weight to the colour and shape parameters. In this article, the colour is emphasized (0.9) compared with the shape (0.1) to obtain meaningful image objects (Walker & Blaschke, 2008;Johnson, 2013). During the second phase, many attribute features (mean, saturation, hue, area, compactness and texture) are utilized to demarcate the detailed land-use/ cover classes at the different layers. The transferability is taken into account when writing the rules through two approaches. The first approach is that the rule must be transferable in theory. For example, a rule "distance to" is based on the first law of geography that near things are more strongly related than distant things (Tobler, 1970), and this is mostly more transferable than a rule that is based on the spectral values. The second approach is based on the influence (omission and commission errors) of the rule. For example, if the rule just corrects few urban areas, or it affects other urban areas, which are correctly classified, the rule is ignored. In the third phase, three classification algorithms are used: (1) knowledge-based expert system (Hayes-Roth, Waterman, & Lenat, 1983), (2) nearest neighbour (Hubert-Moy, Cotannec, Le Du, Chardin, & Perez, 2001) and (3) classification and regression tress (Breiman, Friedman, Olshen, & Stone, 1984). The general implementation of the OBIA can be summarized in four steps.
(1) At Layer-3, general land-cover classes are obtained based on the nearest neighbour algorithm and knowledge-based expert system. (2) At Layer-1, detailed land-use classes are extracted based on the classification and regression tree (CART) algorithm and enhancement is done based on the knowledge-based expert system. (3) At Layer-2, detailed land-use classes are inherited from Layer-1 and enhancement is done based on (i) the knowledge-based expert system at the same layer (Layer-2) and (ii) the hierarchical process using Layer-3. (4) At Final Layer-3, the results from Layer-2 and Layer-3 are combined.
The land-use map at Layer-3 ( Figure 2) is assessed by utilizing 50 random image objects for each class. Table 1 defines the land-use classes and that represents the situation of the urban structure in the Arab Gulf countries such as Riyadh. Table 2 shows the error matrix of the land-use classes. The overall accuracy and Kappa statistics are 93% and 0.92, respectively. The producer's and user's accuracies of the land-use classes indicate that most of the residential categories are accurately classified. However, there is confusion when discriminating low-density and medium-density villas due to the similarity in terms of the urban structures.

Model-1-OLS: traditional model (built area)
The different land-use categories (Figure 2) are processed by different routes to achieve the purpose of this article. The built area class is obtained by combining the residential and non-residential built areas and used as an independent variable in Model-1. The block level (488 blocks) is utilized to provide the data for the calibration, whereas estimation and assessment are performed at the parcel level (2839 parcels). The correlation coefficient between dwelling unit density and built area proportion is 0.6. The dependent variable is transformed (Joseph et al., 2012) using a square root transformation to maximize linearity of model calibrations. The correlation coefficient is increased when the dependent variable is transformed. Figure 3(a) shows a large positive correlation (r = 0.73) between the square root of dwelling unit density and built area proportion. Model-1 is estimated by: SQRT DUD ¼ 141:9B À 19:5 R 2 ¼ 0:53; where SQRT DUD indicates the square root of dwelling unit density and B denotes the built area proportion. All the regression coefficients of the developed OLS models are statistically significant at the α = 0.05 level. The estimation of dwelling units and accuracy measurements of all models are reported in Table 3. It is noticed that low accuracies are resulted from Model-1.

Model-2-OLS: residential built area
The explanatory variable in Model-1 includes non-residential buildings (i.e. commercial and government buildings). The non-residential buildings may cause the low accuracies of Model-1. Thus, these areas are eliminated in Model-2 by combining only the six-residential classes (extensive residence, large villa, classic villa, low-density villa, medium-density villa and residential blocks of flats) to produce a new class termed "residential built area". Model-2 is estimated by: where B res denotes the residential built area proportion. The correlation between the square root of dwelling unit density and the residential built area proportion (Figure 3(b)) is improved (r = 0.79) as well as the other accuracy measurements compared with Model-1.

Model-3-OLS: three-residential classes
Due to the complexity of urban structure, variation of dwelling densities (Liu & Herold, 2007) is existent within the residential built area. To address this issue, the six-residential classes are combined to produce (1) low-density, (2) medium-density and (3) where B Lres , B Mres and B Hres denote the proportions of the low-density, medium-density and high-density areas, respectively. All the accuracies are increased in Model-3 compared with Model-2.

Model-4-OLS: six-residential classes
Model-3 provides more accurate results than Model-2. However, it is believed that the urban heterogeneity is still an issue. Thus, it is useful to consider whether there is a variation in dwelling unit densities within each class in Model-3. Model-4 is an attempt to utilize six-residential classes.

GWR models
All the previous models (Models 1, 2, 3 and 4) are replicated using the GWR approach. Thus, this allows a relevant comparison between and within each approach. The developed GWR models are: • Model-5-GWR utilizing built area.
The GWR tool in ArcGIS is used to run the models. The adaptive kernel is chosen because the observations are not regularly distributed in Um Alhamam ward and the Akaike information criterion (AICc) is used to determine the bandwidth. Table 4 shows the diagnostics of the GWR models. The R 2 values indicate that the square root of dwelling unit density is strongly correlated with the three-residential classes (adjR 2 = 0.89) in Model-7 than the built area (R 2 = 0.79) in Model-5 and the residential built area (R 2 = 0.83) in Model-6. Figure 4 shows the GWR parameters for Model-5, Model-6, Model-7 and Model-8. The GWR coefficients suggest that the relationship between the square root of dwelling unit density and the urban remote-sensing variables in Model-5, Model-6, Model-7 and Model-8 vary across the study area. As expected, the results of Model-7 and Model-8 are more accurate than those of Model-5 and Model-6. For example, the RMSEs in Model-7 and Model-8 are 2.0 and 2.1, respectively, whereas they are 3.8 and 2.7 in Model-5 and Model-6, respectively. On the other hand, it is surprising to note that the 3-class GWR model (Model-7) is more accurate than the 4-class GWR model (Model-8).
The conventional model (Model-1), the best OLS model (Model-3) and the best GWR model (Model-7) are selected to map the residual dwelling estimates and population distribution ( Figure 5). Most parcels are well estimated in Model-3 ( Figure 5(b)) and Model-7 ( Figure 5(c)), whereas overestimation and underestimation are obtained in the low-density and high-density parcels, respectively, in Model-1 (Figure 5(a)).

Discussion
This work reveals the efficiency of utilizing fine-spatialresolution imagery to demarcate a detailed land-use/   cover map similarly to other works such as Silvan-Cardenas et al. (2010) and Pu et al. (2011). However, this work is distinct because it demarcates six-residential classes and then manipulated them to produce a variety of urban remote-sensing variables. OBIA is used to analyse the QuickBird data. The accuracy measurements of the land-use map are greater than the common recommendation accuracies. However, it is challenging to discriminate classes that are similar in terms of the spectral and spatial properties such as lowdensity and medium-density villas. This challenge could be taken into account by using fine-spectral-resolution imagery or using ancillary urban information such as electricity and water consumption. Robust models of population estimation require (1) appropriate explanatory variables and (2) accurate land-use/cover data. Model-1 is based on the OLS approach and uses built area proportion as the explanatory variable. This variable includes non-residential uses such as public and commercial buildings. The influence of the non-residential uses in Model-1 is double. First, it decreases the slope value (β) in model calibration. Second, it distributes inhabitants in places which are not residential in reality (model validation) and this will cause overestimations. Thus, the results of Model-1 are disappointing (ORE = 27%, MAE = 1.7 and RMSE = 3.8). The limitation of Model-1 (the non-residential uses) has been addressed in Model-2 by using residential built area as the explanatory variable. It is clear that excluding the non-residential uses does improve the model calibration in Model-2 and this is proved by the R 2 value (0.62) as well as the model accuracies (ORE = 21%, MAE = 1.5 and RMSE = 2.7) compared with Model-1.
Model-1 and Model-2 assume that the relationship between the independent variables (built area and residential built area, respectively) and the square root of dwelling unit density is stationary over space. This is not the case in the study area because the study area comprises different types of residential housing. Thus, Model-1 and Model-2 fail to address the spatial non-stationarity. The limitation of Model-1 and Model-2 has been taken into account in Model-3 by using three explanatory variables (low-density, medium-density and high-density areas). The results of Model-3 are more accurate than those of Model-1 and Model-2. For example, the ORE and RMSE results improve to 8% and 2.5 in Model-3, respectively, compared with the results of Model-1 and Model-2. The different coefficients of the independent variables in Model-3 assume that the spatial non-stationarity is an issue in Um Alhamam ward. This leads to a further investigation of whether the relationship could also vary (spatial non-stationarity) within the different categories as well in Model-3. Thus, six-residential classes (Model-4) have been examined. The results of Model-4 are more accurate than those of Model-1 and Model-2. Regardless of the sophistication and complexity of Model-4, the accuracies are not improved compared with Model-3. That means representative models can outperform the sophisticated models. Similar findings were also reported by Langford (2006) in terms of dasymetric mapping. The reason is that the representative models are less sensitive to some important factors that affect models in the calibration and validation processes such as the actual dwelling densities and image classification accuracies. Models 1-4 investigate the spatial non-stationarity by adopting the OLS approach, whereas Models 5-8 account the spatial non-stationarity by adopting the GWR approach.
In general, all the GWR models are more accurate than the OLS models. For example, Model-5 shows an increase in the R 2 (0.79) compared with Model-1 (0.53), demonstrating that the GWR model (Model-5) explains the variation of the square root of dwelling unit density better than the OLS model (Model-1). The regression coefficients of Model-1 (Equation (6)) assume a stationary change (135.18) of the dependent variable in regression with the built area proportion. On the other hand, Model-5 assumes local coefficients (Figure 4) (Model-5 built area) that vary from 2 to 305 over Um Alhamam ward, with a strong influence in the middle where medium-density and high-density areas are concentrated and a weak influence in the east where lowdensity areas are distributed. The intercept value of Model-1 is constant (9), whereas it changes in the sign and quantity (−173 to 62) in Model-5. The range of the local coefficients in Model-5 is a clear evidence of the spatial non-stationarity. Thus, the MAE (1.7) and RMSE (3.8) are more accurate in Model-5 compared with Model-1.
Although Model-5 accounts the spatial non-stationarity, the explanatory variable (built area) may not be the most appropriate one. It is interesting to refine Model-5 by using the residential built area (Model-6). It is noticed that the AICc of Model-6 is 4271, which is smaller than the AICc (4365) of Model-5. This indicates that the calibration of Model-6 is better than Model-5 and this is also proved by the R 2 values (0.83 in Model-6 and 0.79 in Model-5). As a result, the ORE and RMSE are decreased from 27% and 3.8 in Model-5 to 21% and 2.7 in Model-6, respectively. Utilizing the residential built area as a covariate in a study area that comprises different residential types is a source of limitation. Model-7 aims to reduce the limitation of Model-6 by utilizing three residential classes. The explanatory variables in Model-7 show different coefficients across the study area. For example, the local coefficients for the low-density area (Figure 4) (Model-7 low-density areas) show high positive values in the east part where low-density area is concentrated and low negative values in the middle where high-density area is distributed. The AICc (3983) and the R 2 (0.89) in Model-7 compared with Model-6 suggest that the calibration of Model-7 is better than Model-6. Moreover, the accuracies of Model-7 are more than Model-6. For example, the ORE, MAE and RMSE are decreased to 10%, 1.2 and 2.0, respectively, in Model-7 compared with Model-6.
As a result of complexity of urban systems, the sixresidential class (Model-8) is examined. The results of Model-8 are more accurate than those of Model-5 and Model-6 but not than of Model-7. That means advanced models do not always give accurate results. It is important to mention that the results of Model-3 (the OLS approach) are similar to those of Model-7 (the GWR approach). This assumes that the spatial non-stationarity can be successfully addressed by using representative and appropriate urban covariates.
It should be noted that Um Alhamam ward is used to develop the OLS and GWR models. This indicates that all the regression coefficients are specifically relevant to the urban characteristics of that ward and they are not transferrable to other sites. However, the modelling process (i.e. OBIA), findings and conclusions will be applicable to other sites. The population maps ( Figure 5(d-f)) are a final conversion of the dwelling estimates. This means that the variation in terms of the family size between the different urban covariates is not addressed. This limitation could be overcome by additional information such as utility usage because a large family is likely to consume more electricity than a small family.

Conclusion
Small-area population distributions are essential for different applications such as national resource management and urban-sprawl analysis. This research analyses the spatial non-stationarity in the relationship between dwelling unit density and land-use/cover map to estimate population at the parcel level in Um Alhamam ward. General and detailed land-use/cover classes are obtained from QuickBird satellite data using the OBIA method. Most of the land-use/cover classes are correctly classified. Two approaches are applied to estimate dwelling density at the parcel level: OLS and GWR.
The OLS approach assumes constant regression coefficients across the study area, thus failing to deal with the spatial non-stationarity. Model-1 uses built area which is not suitable to describe the variation of the square root of dwelling unit density, and thus, unsatisfactory results are expected. A refined explanatory variable such as residential built area (Model-2) does improve the accuracy, but the spatial non-stationarity is still an issue. The three-residential classes (Model-3) account the spatial non-stationarity, and thus, the results are improved. The complex model (Model-4) does not improve the accuracies compared with the representative model (Model-3).
Generally, most of the GWR models are more accurate than the OLS models because the GWR approach does account the spatial non-stationarity unlike the OLS approach. However, the GWR approach is not enough to completely address the spatial non-stationarity. The explanatory variables are also important likewise the model construction. Thus, the application of the GWR approach in Model-7 (three-residential classes) provides more accurate results compared with Model-6 (residential built area) and Model-5 (built area).