Modelling spatial processes in quantitative human geography

ABSTRACT We discuss the nature of processes relating to human behaviour and how to model such processes when they vary over space. In so doing, we describe the role of local modelling and how the bandwidth parameter, a component of multiscale geographically weighted regression, can inform on the spatial scale over which processes are relatively constant. To do this, we translate properties of spatial data, such as heterogeneity and spatial dependency, into the realm of spatial processes. We argue that the modelling of spatially varying processes has important ramifications for how we see the world.


Introduction
The study of processes in quantitative human geography has a long history, nicely encapsulated by Hay and Johnston (1983). However, as 'process' is often interpreted as a sequence of events leading to a particular outcome, studies in quantitative human geography that focus on processes are often hampered by a lack of temporal data. Typically, data are observed at multiple locations but in only one time period. In contradistinction, studies in physical geography and economics often have the luxury of data sets that have a rich temporal dimension, although sometimes with limited spatial resolution. Under the assumption that processes are spatially stationary (the usual scenario in the physical sciences), the lack of spatial resolution is not a hindrance to the identification of processes if sufficient time-series data are available at even just a single location. However, what if the processes being studied are NOT spatially stationary (a likely scenario in the social sciences)? In this situation, can we then replace the variations we typically measure in the temporal dimension with variations we observe in the spatial dimension to infer something about such processes? In the remainder of this paper, we explore this situation.
Throughout most of its long history, human geography has been primarily concerned with data. Initially, the focus was on mapping data, but during the 1960s, statistical methods were introduced and the analysis of spatial data came to the fore, exemplified by measures of spatial dependence and point pattern analysis. The idea that we could quantify issues such as the degree to which similar data values clustered was appealing and spawned a huge literature (Moran 1948, 1950; Whittle 1954; Matheron 1963; Paelinck and Klaassen 1979; Cliff and Ord 1981; Hubert, Golledge, and Costanzo 1981; Anselin 1988, 1995; Getis 1991; Getis and Ord 1992; Getis 1995, 2001; Griffith 2003). Around this time, human geography developed its first, and possibly only, 'law', which again focused on data: 'everything is related to everything else, but near things are more related than distant things' (Tobler 1970). Although this is not really a 'law' but an empirical observation, it stands as a description of a regularity that has achieved the status of a 'law' within human geography and encapsulates the concept of data spatial dependency succinctly.
As human geography's 'quantitative revolution' proceeded and spatial analysis became both commonplace and more mature, the traditional emphasis of the discipline on data gradually gave way to a competing interest in processes. It became insufficient to describe how some variable was distributed over space and the emphasis shifted towards providing an explanation of why a variable was distributed in a certain way over space. This why question led to an interest in the process or processes causing the observed distribution of data. As Harvey (1967) noted: '. . . different processes become significant to our understanding of spatial patterns at different scales' (pp. 71-72), an observation expounded upon decades later by Manley, Flowerdew, and Steel (2006): 'Spatial distributions are based on processes taking place in geographical space. A mapped pattern may reflect several distinct processes, each of which may affect a different area and operate at a different scale. The challenge for the spatial analyst is to identify these processes and evaluate their importance from the spatial pattern observed' (p. 143).
As a consequence of the growing interest in process rather than form came a focus on the development of models, particularly explicitly spatial models, such as spatial regression models (Anselin, 1988, 2002, 2009; Griffith & Csillag, 1993; Haining, 1993; Tiefelsdorf, 2000; Gelfand et al., 2003; Haining, 2003; Waller & Gotway, 2004; LeSage & Pace, 2010; Banerjee et al., 2014) and spatial interaction models (Olsson, 1970; Wilson, 1971; Fotheringham, 1983a, 1983b; Fotheringham & O'Kelly, 1989). Although spatial models have several purposes, undoubtedly the main one is to try to uncover what factors affect the spatial distribution of a particular variable; or, rather crudely, to answer the question: 'Why are some values high and why are some low?'. This is typically done by making inferences about how covariates influence the dependent variable through the estimates of the parameters obtained in the calibration of the model.
The focus of this paper is encapsulated in Figure 1. In whatever subject matter we study, we understand that there are processes, which here we think of as being interactions between attributes, that lead to a certain outcome that we measure. In the physical sciences such processes can often be deduced from first principles, but in the social sciences such processes are often unknown and have to be inferred from the measured outcomes. The latter is not ideal because of the problem of equifinality (the same outcome can result from different processes), but it underlines the complexity of dealing with the actions of human beings, where universal laws of behaviour are rare. However, it can be argued that making inferences about properties of processes is better than ignoring them (see Note 1). Here, we focus on a situation with even greater complexity, where the processes being examined may not be spatially stationary, again a situation which highlights the difference between modelling the physical environment and modelling the human environment.

Modelling spatial processes
From the origins of the quantitative 'turn' across many social sciences came a focus on relationships between attributes, with regression-based models exemplified by equation (1):
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon \qquad (1)$$
where $y$ is the variable of interest, $x_1, x_2, \ldots, x_n$ are covariates, $\beta_0$ is the intercept, $\beta_1, \beta_2, \ldots, \beta_n$ are slope parameters and $\epsilon$ is a random error term. Each of the slope parameters represents the conditional effect of a change in the respective covariate on $y$ and hence is an indicator of a specific process operating to contribute to the value of $y$ observed at each location. Consequently, it is from the estimates of these parameters obtained in the calibration of the model that we make inferences about each of the processes that together create the observed distribution of $y$. It quickly became apparent that models of the format of equation (1) were often inadequate when applied to spatial data because the distribution of $\epsilon$ generally exhibited significant positive spatial autocorrelation, violating the requirement of Gaussian regression that the error terms be identically and independently distributed. If the error terms exhibited significant clustering, how could they be independently distributed? This prompted the development of various forms of what became known as spatial regression models, exemplified by the spatial error model shown in equation (2) (Anselin, 1988; Gibbons & Overman, 2012; Kelejian & Prucha, 1998, 1999; LeSage, 2016; LeSage & Pace, 2010).
$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n + \epsilon, \qquad \epsilon = \lambda W \epsilon + \mu \qquad (2)$$
where $\epsilon$ is a vector of spatially autocorrelated error terms, $\lambda$ is a parameter estimated to measure the spatial autocorrelation in the error term $\epsilon$, $W$ is a spatial weights matrix, $\mu$ is a vector of independent and identically distributed (i.i.d.) errors, $x_1, x_2, \ldots, x_n$ are covariates and $\beta_0, \beta_1, \ldots, \beta_n$ are parameters to be estimated given data on $y$ and $x_1, \ldots, x_n$. A fundamental assumption of OLS or spatial regression models (and of almost any other form of model used in Geography prior to 2000) is that the processes being inferred through the parameters of the model are stationary over space. That is, we typically collect data from various spatial locations and use all these data to calibrate a model of the form of either equation (1) or (2) to produce a single estimate of each parameter. The implicit assumption is therefore that the process represented by a single parameter in the model must be stationary over space. We now question this assumption and examine the ramifications of relaxing it.
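The residual diagnostic that motivated the move from equation (1) to equation (2) can be illustrated with a small sketch. It is only an illustration under stated assumptions: the sites lie on a one-dimensional transect, neighbours are the adjacent sites, the data are simulated, and the function names `ols_simple` and `morans_i` are hypothetical rather than taken from any software mentioned in this paper.

```python
# Illustrative sketch only: a 1-D transect where each site neighbours i-1 and i+1.
# All names and the simulated data are assumptions made for this example.
import math


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def morans_i(values):
    """Moran's I for values on a line, with binary weights between adjacent sites."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    # Each adjacent pair (i, i+1) appears twice in a symmetric weights matrix.
    cross = 2.0 * sum(z[i] * z[i + 1] for i in range(n - 1))
    s0 = 2.0 * (n - 1)  # sum of all weights
    return (n / s0) * cross / sum(zi * zi for zi in z)


n = 60
x = [float((7 * i) % 10) for i in range(n)]           # covariate with no spatial trend
smooth_error = [math.sin(i / 3.0) for i in range(n)]  # spatially clustered error
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, smooth_error)]

b0, b1 = ols_simple(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(round(morans_i(residuals), 3))  # positive: neighbouring residuals are alike
```

A clearly positive value signals that neighbouring residuals resemble one another, which is the symptom that a spatial error specification is designed to absorb.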
Until recently, there existed a logical inconsistency in spatial data analysis whereby, while it was taken as given that spatial data exhibited heterogeneity, spatial processes were assumed to be homogeneous. This belief was presumably a hold-over from borrowing quantitative techniques which had their origins in the natural sciences and where processes are typically stationary over space. Whereas most physical processes are invariant to location, social science processes, involving the beliefs, preferences and actions of human beings, are likely to vary according to location. Indeed, a huge literature exists supporting this idea (Chetty and Hendren 2018; Sampson 2019; Darmofal 2008; Plaut, Markus, and Lachman 2002; Escobar 2001; Diez-Roux 1998). Consequently, various local modelling paradigms have been developed by geographers and statisticians which challenge the assumption of spatial process stationarity by allowing the parameters in a model to vary over space, as typified by equation (3).
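$$y_i = \beta_{0i} + \beta_{1i} x_{1i} + \beta_{2i} x_{2i} + \dots + \beta_{ni} x_{ni} + \epsilon_i \qquad (3)$$
where $x_{ni}$ is an observation of the $n$th explanatory variable at location $i$, $\beta_{ni}$ is the $n$th parameter estimate, which is now specific to location $i$, and $\epsilon_i$ is a random error term.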
In this representation of the world, spatial process variation is accommodated by the flexibility of allowing each parameter to vary over space. Several frameworks have been established that allow the modelling of spatially varying processes, such as Geographically Weighted Regression (GWR), Bayesian Spatially Varying Coefficients Models (SVCM), Eigenvector Spatial Filtering (ESF) and Spatially Clustered Coefficient models (SCC) (Banerjee et al., 2014; Fotheringham et al., 2003; Fotheringham et al., 2017; Gelfand et al., 2003; Griffith, 2008; Li & Sang, 2019; Murakami et al., 2017). Although these modelling frameworks differ in how they represent spatially varying processes, they share the basic idea that processes might vary over space and that global model representations such as those in equations (1) and (2) are inadequate in such circumstances. Also common to all four frameworks is the assumption of process spatial dependency. That is, if processes do vary over space, they are unlikely to vary randomly but are likely to follow a process-based equivalent to Tobler's Law regarding the prevalence of spatial data dependence. We now examine why processes might vary over space and the ramifications of such process spatial nonstationarity.

The role of spatial context
Given the current interest in the development of local statistical methods that allow the modelling of spatially nonstationary processes, it is pertinent to ask why processes might vary over space. After all, they typically do not in the physical sciences, so why might they in the social sciences? The answer to this is not an easy one as the processes themselves are hidden, and we can only make inferences about them. Consequently, although there is a substantial amount of empirical evidence that supports the notion that many processes do vary over space, it is difficult to be absolutely certain that our models are not picking up some spurious cause of parameter variation. Equally, we have little theory to guide us on this issue and we have to rely largely on supposition and hypotheses. That said, there is a vast amount of literature which suggests that 'place matters' and that local 'context' can have a major impact on people's beliefs, preferences, and actions (Agnew, 2014; Duncan & Savage, 2006; Golledge, 1997; Goodchild, 2011; Gould, 1991; Hartshorne, 1939a, 1939b; Harvey & Wardenga, 2006; Pred, 1984; Relph, 1976; Sayer, 1985; Thomae, 1999; Tuan, 1979; Winter et al., 2009; Winter & Freksa, 2012). Such a link between place and behaviour can arise if people are influenced by the people they talk to on a regular basis or by their local media, or if they face long-term conditions that are particular to certain locales and which shape people's outlooks on issues. Evidence for this on a large scale can be seen in geographic variations in preferences for certain types of foods, music, house styles, political parties, etc. (Walker and Li 2007; Enos 2017; Escobar 2001; Shortridge 2003; Agnew 1996; Braha and De Aguiar 2017; Fotheringham et al. 2021).
However, despite a wealth of evidence that place matters and that location can help shape preferences and actions, an alternative viewpoint is that spatial context does not exist and that what is referred to as 'context' is merely a catch-all term for those covariates not included in the model, either because they have not been conceived of as having importance or because they are difficult to measure (McAllister 1987). It is difficult to argue definitively for either viewpoint because even though many sociological and psychological studies have pointed to the relevance of context (Enos 2017; Plaut, Markus, and Lachman 2002; Rentfrow, Jokela, and Lamb 2015; Krug and Kulhavy 1973) and a great number of geographical studies have espoused the role of location in affecting behaviour from a theoretical viewpoint (Blake, 2001; Books & Prysby, 2016; Carsey, 1995; Chandola et al., 2005; Rousseau & Fried, 2001; Snedker et al., 2009), it could equally be claimed that whatever the effects of location are, they could, theoretically, be measured and incorporated into the model. Whether context is a 'real' effect or simply a catch-all for variables that cannot be or have not been measured therefore remains elusive, but whatever its source, the ability to capture 'context' within local models is better than not accounting for it, as is the case in traditional global models. In global models, the omission of an important explanatory variable will create misspecification bias in the parameter estimates associated with any covariate that has some degree of covariance with the omitted variable. For an example of this, and of the calculation of the explicit degree of misspecification bias caused by an omitted variable, see Fotheringham (1983a, 1984).
Alternatively, if the processes being modelled do vary over space, but they are modelled with a traditional global model, the latter will be grossly misspecified and the estimates of the parameters derived from this model will simply represent an average process and will be as unrepresentative of the spatially varying processes as is, for example, the mean annual rainfall of the United States as a measure of the annual rainfall in each county. On balance, the strategic view would appear to be to assemble as comprehensive a set of covariates as possible and to calibrate a local model using these covariates. This will reduce any misinterpretation of the local intercept as a measure of context and the model will default to a global one in the situation where the processes are stationary over space.
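The sense in which a global estimate is merely an 'average process' can be shown with a toy calculation. The sketch below is an illustration under simplifying assumptions: two hypothetical regions ('west' and 'east') share identical covariate values but have different, noise-free slopes, and the helper `ols_slope` is a name introduced here for the example.

```python
# Illustrative sketch: two hypothetical regions share the same covariate values
# but respond to them differently (the regional slopes are assumptions).

def ols_slope(x, y):
    """Slope of the least-squares line through the points (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
west_y = [1.0 * xi for xi in x_values]   # process in the west: slope 1
east_y = [3.0 * xi for xi in x_values]   # process in the east: slope 3

west_slope = ols_slope(x_values, west_y)
east_slope = ols_slope(x_values, east_y)
# Pooling both regions, as a global model does, yields a single slope.
global_slope = ols_slope(x_values + x_values, west_y + east_y)

print(west_slope, east_slope, global_slope)  # prints 1.0 3.0 2.0
```

The pooled slope of 2 describes neither region: it is the mean of the two regional processes, which is the rainfall analogy above in miniature.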

The spatial scale of process nonstationarity
Local models of the type shown in equation (3) are able to capture spatially varying processes through the estimation of location-specific parameter estimates. However, two of the local modelling frameworks mentioned above, GWR and SVCM, also provide a bandwidth parameter which denotes the spatial scale over which a process varies. In the case of the multiscale versions of GWR and SVCM, covariate-specific bandwidth parameters are estimated, and in the case of MGWR, confidence intervals for these bandwidths can be calculated.
We now discuss what the notion of a 'bandwidth' in (M)GWR means in terms of the spatial scale over which a process varies.
In the (M)GWR framework, a local model is calibrated for each location so that location-specific parameters and various diagnostics are produced. Typically, there exists only one measurement of each variable at each location, so data are 'borrowed' from nearby locations and weighted according to the distance each location is from the regression point, with weights falling as distance increases (Fotheringham, Brunsdon, and Charlton 2003). The weighting is controlled by a distance-decay parameter, or bandwidth, so that the weights range between 1 (at the regression point) and 0 (at the bandwidth). This process is summarized in Figure 2.
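The data-borrowing step described above can be sketched in a few lines. This is a minimal illustration, not the (M)GWR software's implementation: the bi-square kernel is an assumed choice (the text requires only that weights fall from 1 at the regression point to 0 at the bandwidth), sites are placed on a line, there is a single covariate, and `bisquare_weight` and `local_fit` are hypothetical names.

```python
# Illustrative sketch: locations lie on a line and the bi-square kernel is an
# assumed choice; the text only requires weights of 1 at the regression point
# and 0 at the bandwidth.

def bisquare_weight(distance, bandwidth):
    """Weight that falls from 1 at distance 0 to 0 at the bandwidth and beyond."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def local_fit(sites, x, y, at, bandwidth):
    """Weighted least-squares (intercept, slope) for the regression point `at`."""
    w = [bisquare_weight(abs(s - at), bandwidth) for s in sites]
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


sites = [float(i) for i in range(20)]
x = [float((7 * i) % 10) for i in range(20)]
# Hypothetical process: slope 1 in the western half, slope 3 in the eastern half.
y = [(1.0 if s < 10 else 3.0) * xi for s, xi in zip(sites, x)]

print(local_fit(sites, x, y, at=2.0, bandwidth=5.0))    # borrows western sites only
print(local_fit(sites, x, y, at=17.0, bandwidth=5.0))   # borrows eastern sites only
print(local_fit(sites, x, y, at=2.0, bandwidth=1e6))    # effectively a global fit
```

With a small bandwidth each regression point recovers the slope of its own half of the transect, whereas a very large bandwidth weights every site almost equally and returns a blend of the two.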
The optimal bandwidth is a trade-off between the amount of bias in the local parameter estimates and their uncertainty (or variance). Bias in the local parameter estimates is caused by borrowing data from other locations where the processes that produced those data may not be the same as those that are being estimated at the regression location. If processes vary over space, this variation is unlikely to be random and will exhibit some degree of spatial dependency, so bias in local parameter estimates will tend to increase when data are borrowed from more distant locations. Uncertainty in the local parameter estimates exists because we are using a sample of data from which to calculate these estimates, and this uncertainty will decrease as more data are borrowed in each local regression. Hence, small bandwidths generate lower parameter bias but increased uncertainty, whereas large bandwidths generate greater parameter bias but lower uncertainty.
This situation is captured in Figure 3. Note that the range of data-borrowing for the local regression at i, the bandwidth, can be estimated in terms of a physical distance or in terms of the number of nearest neighbours. It is generally more intuitive, and eases comparisons across studies, if the latter is used.
As Yu et al. (2020) show, the bandwidth at which the bias-uncertainty trade-off in local parameter estimates is optimized can be found by minimizing a corrected Akaike Information Criterion statistic, which is the preferred option in MGWR and GWR software routines (see, for example, MGWR 2.2 software that can be downloaded from https://sgsup.asu.edu/sparc/mgwr and runs on both Windows and MacOS platforms). Essentially, as locations from which data are borrowed are added to the local regressions, the reduction in parameter uncertainty outweighs the increase in parameter bias until a point is reached (the optimal bandwidth) at which the increase in parameter bias outweighs the reduction in uncertainty. If the process being modelled has a high degree of spatial variability, the bandwidth will be small; if the process has low spatial variability, the bandwidth will be large. A global process will result in an infinitely large bandwidth, and a global model is therefore a special case of a local model in which all the processes being modelled have infinitely large bandwidths, which gives a weight of 1 to all data points. Consequently, the optimized bandwidth for each covariate that is reported in an MGWR calibration provides useful information on the spatial scales over which different processes vary (see  for an example of this in a voting context). In this sense, the spatial scale referred to by a covariate-specific optimal bandwidth represents the spatial extent over which a process is relatively stable. Given that processes may vary over space, we need some comparative measure of their relative variability, and for this we choose the point at which the amount of bias in the local parameter estimates exceeds the amount of uncertainty in the local estimates as measured by their standard error (for more details refer to Yu et al. 2020).
Hence, the optimal bandwidth for each covariate denotes the range around a location i where data borrowed for the local regression at i reduce parameter uncertainty more than they increase parameter bias. Beyond the bandwidth, the addition of further data to the regression at i would increase bias more than they would decrease uncertainty. This is a measure of a property of spatially varying processes which can be compared across covariates to inform on the spatially varying nature of each local process.
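The bandwidth search described in this section can be sketched as a comparison of corrected AIC values across candidate bandwidths. The sketch below is a rough illustration under several assumptions that are not part of the paper: a single covariate, sites on a line, a bi-square kernel, a fixed-distance bandwidth, simulated data, a small list of candidate bandwidths, and the commonly used GWR form of the AICc, $2n\ln\hat{\sigma} + n\ln(2\pi) + n(n + \mathrm{tr}(S))/(n - 2 - \mathrm{tr}(S))$. The function names are hypothetical and the code is not the MGWR software's routine.

```python
# Illustrative sketch only: one covariate, sites on a line, a bi-square kernel
# and a fixed-distance bandwidth are all simplifying assumptions. It is not the
# MGWR software's implementation.
import math
import random


def bisquare_weight(distance, bandwidth):
    """Weight that falls from 1 at distance 0 to 0 at the bandwidth and beyond."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def gwr_aicc(sites, x, y, bandwidth):
    """Corrected AIC of a one-covariate local regression at a given bandwidth."""
    n = len(sites)
    rss = 0.0
    trace = 0.0  # trace of the hat matrix = effective number of parameters
    for i in range(n):
        w = [bisquare_weight(abs(s - sites[i]), bandwidth) for s in sites]
        total = sum(w)
        x_bar = sum(wj * xj for wj, xj in zip(w, x)) / total
        y_bar = sum(wj * yj for wj, yj in zip(w, y)) / total
        sxx = sum(wj * (xj - x_bar) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - x_bar) * (yj - y_bar) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx
        fitted = y_bar + slope * (x[i] - x_bar)
        rss += (y[i] - fitted) ** 2
        # Influence of y[i] on its own fitted value (its kernel weight is 1).
        trace += 1.0 / total + (x[i] - x_bar) ** 2 / sxx
    if n - 2.0 - trace <= 0.0:
        return math.inf  # too few effective degrees of freedom
    sigma = math.sqrt(rss / n)
    return (2.0 * n * math.log(sigma) + n * math.log(2.0 * math.pi)
            + n * (n + trace) / (n - 2.0 - trace))


def best_bandwidth(sites, x, y, candidates):
    """Candidate bandwidth with the smallest AICc."""
    return min(candidates, key=lambda b: gwr_aicc(sites, x, y, b))


rng = random.Random(0)
n = 40
sites = [float(i) for i in range(n)]
x = [float((7 * i) % 10) for i in range(n)]
noise = [rng.gauss(0.0, 0.5) for _ in range(n)]
candidates = [5.0, 10.0, 20.0, 40.0, 1e6]

# Stationary process: the slope is 2 everywhere.
y_flat = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
# Nonstationary process: the slope drifts from 0 in the west to 4 in the east.
y_vary = [1.0 + (4.0 * s / (n - 1)) * xi + e for s, xi, e in zip(sites, x, noise)]

bw_flat = best_bandwidth(sites, x, y_flat, candidates)
bw_vary = best_bandwidth(sites, x, y_vary, candidates)
print(bw_flat, bw_vary)
```

In this simulated setting the spatially drifting slope should be assigned a smaller bandwidth than the constant slope, mirroring the interpretation above that a bandwidth indicates the extent over which a process is relatively stable. A production calibration would search bandwidths far more finely and, in MGWR, separately for each covariate.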

Summary and implications
The premise of this paper is that there are reasons to suspect that the processes producing the data we observe about human beings and their actions may not be stationary due to the influence of spatial context. This is a relatively new perspective in quantitative spatial sciences. Should social processes exhibit spatial nonstationarity, what are the ramifications of this? Should we ignore this possibility and continue calibrating global models which assume spatial stationarity and which will yield only average statements of spatially varying processes? This paper takes the view that we should try to model and explore any spatial process nonstationarity as an interesting and potentially useful facet of human behaviour. It further takes the view that if processes do vary over space because of spatial contextual effects, they are unlikely to vary randomly and almost certainly will exhibit some degree of spatial dependency: processes at locations in close proximity are more likely to be similar than are processes at locations farther apart. We can model the degree to which different processes vary over space through multiscale geographically weighted regression, as described above. This local modelling technique produces covariate-specific bandwidths, each of which yields a comparative measure of the spatial extent to which a process is relatively stable. Larger bandwidths indicate processes which are more stable over space, and a global model is an extreme case of a local model with extremely large bandwidths.
The development of local models of human behaviour thus provides a powerful set of tools for spatial analysts. However, it does not imply that all processes are spatially varying. Physical processes, for example those that govern the way matter is formed and behaves, do not vary over space: $E$ does not equal $mc^{1.5}$ in some locations! Processes which would seem most likely to exhibit spatial nonstationarity are those which involve human preferences and actions which are susceptible to local contextual effects. Modelling such processes with traditional global models seems unnecessarily restrictive. Why not allow the data to hint at which processes might be spatially varying by calibrating models locally? If the processes being modelled are all global, then nothing is lost as the local model will replicate a global model by making all the optimized bandwidths very large. If the processes do vary across space, however, local models will generate much better predictions of y and generate useful information on the nature of the spatially varying processes by mapping the local parameter estimates and by examining the optimized bandwidth values. Further, if some processes do exhibit spatial nonstationarity, this has profound implications for the current interest in the reproducibility and replicability of geographic research (Kedron et al., 2019, 2020; Sui & Kedron, 2020). If processes are spatially varying, then we cannot expect a model calibrated in one location to be replicated exactly in another location, because the processes being modelled might be different in the two locations.
Finally, returning to earlier comments on the focus of geographic studies, the increasing focus on local models of human activity leads us to explore a whole new geography: that of processes. We now have the means to produce maps, not of data, but of processes via the local parameter estimates produced in local models. Spatial processes can be mapped and analysed in much the same way that spatial data are. Spatial heterogeneity and spatial dependence are two properties of spatial data that are commonly and routinely described in the literature, but these concepts can now be applied to spatial processes.

Note
1. It should be noted that the physical sciences are not immune from the issue of equifinality, as the recent debates on climate change demonstrate.

Disclosure statement
No potential conflict of interest was reported by the author(s).