A spatial multinomial logit model for analysing urban expansion

The paper proposes a Bayesian multinomial logit model to analyse spatial patterns of urban expansion. The specification assumes that the log-odds of each class follow a spatial autoregressive process. Using recent advances in Bayesian computing, our model allows for a computationally efficient treatment of the spatial multinomial logit model. This allows us to assess spillovers between regions and across land use classes. In a series of Monte Carlo studies, we benchmark our model against other competing specifications. The paper also showcases the performance of the proposed specification using European regional data. Our results indicate that spatial dependence plays a key role in land sealing process of cropland and grassland. Moreover, we uncover land sealing spillovers across multiple classes of arable land.


Introduction
Increased urbanisation and expansion of cities as a direct result of economic and population growth, coupled with intensifying climate change, poses a key challenge for policy makers (IPBES, 2019).
The location choice of new urban developments is of particular importance, as land is a finite resource. Expanding artificial surfaces is both expensive and time consuming to reverse, resulting in long-term impacts on land use and land cover. The conversion of natural habitats to artificial surfaces thus has a direct and potentially irreversible impact on biodiversity (Leclère et al., 2018).
On the other hand, if arable land is built up, global food security is threatened and urban expansion might spillover to other types of land use.
Conversion of land to urban surfaces is a decision usually taken by the landowners, which are either regional governments or private land-holders. In an economic framework, this decision is understood as a trade-off between the relative profitabilities of land uses and respective conversion costs (Miller and Plantinga, 1999). Potential profits from land ownership are typically assessed using various proxies for land rents (Chakir and Lungarska, 2017), while conversion costs rely on the quality of land and national regulations restricting land transformation. Land use change models targeting aggregate administrative levels focus on capturing the outcomes of regional policies (Ay et al., 2017). This is of special importance within the EU, where regional policies (such as the structural funds) are aimed at this level (Alexiadis et al., 2013). Within a regional econometric framework, the land use expansion decision can be modeled as a random choice, with the multinomial logit model representing a popular option (Lubowski et al., 2008;Chakir, 2009). The particular advantage is that a joint modelling of land sealing processes can take into account spillovers across land use classes. When dealing with compositional (shares) data for land use, the multinomial logit random choice model can either be estimated directly from the multinomial logit form (Li et al., 2013) or from its log-linearized form (Chakir and Lungarska, 2017). While the log-linearized version of the model represents a popular choice due to its ease of transformation, it suffers from the usual problems of log-transformation, namely that frequently land use shares are zero and accommodating these observations inherently biases the estimates. Spatial dependence, both from unobserved spatially varying variables, as well as contingent on the choice of neighbouring regions, is well documented in the land use choice literature (Chakir and Parent, 2009;Chakir and Le Gallo, 2013;Li et al., 2013). In a regional econometrics context, a wide number of studies stress the inherent importance of spatial spillovers (LeSage and Pace, 2009). When estimating models for land use change on a small-scale level, the problem of spatial dependence becomes even more central. Specifically, neglecting to account for spatial autocorrelation may result in severely biased estimates and erroneous policy conclusions.
However, spatial dependence in multinomial logit frameworks has been so far neglected by the spatial econometric literature, with the exception of GMM based approaches (Klier and McMillen, 2008).
Within this paper our contribution to the existing literature is twofold. First and foremost we present a novel Bayesian approach for capturing spatial dependence among land use changes using a multinomial logit framework. By combining the spatial autoregressive and multionomial logit frameworks, our specification can account for cross-regional and cross-land use class spillovers.
The estimation approach builds on recent advances in Bayesian modelling of logit type models (Krisztin and Piribauer, 2020) and employs latent Pólya-Gamma distributed variables. We demonstrate the virtues of our approach in a series of Monte Carlo studies. Our second contribution is a novel examination land use change processes on a regional pan-European level. For this we rely on an extensive dataset of land use changes to assess the share of urban gains originating from cropland, grassland, forest and other fallow land. Our framework allows us to shed light on the small-scale spatial dynamics of land sealing processes in European regions.
The remainder of the paper is structured as follows. Section 2 outlines the theoretical model of urban expansion, as well as its multinomial logit variant. Section 3 focuses on the estimation framework. Section 4 presents the Monte Carlo benchmarks of the proposed econometric estimation approach. Section 5 presents the results for urban expansion in Europe. Section 6 concludes.

A spatial autoregressive multinomial logit model
In this paper, we estimate an econometric model which aims at explaining the choice of land buyers (both public and private) for the purpose of converting it to urban, artificial surfaces in N regions.
In a given region i (with i = 1, ..., N), land buyers may acquire land from J different land uses. In our case these are cropland, grassland, forest, and other natural land. Within a region the buyers are assumed to be price taker and their choices are assumed to be homogeneous and risk-neutral.
In an economic sense this constitutes a profit maximization problem of land buyers (Lubowski et al., 2008;Miller and Plantinga, 1999), which is directly dependent on the associated profits and costs of the converted land. In addition, to account for the expected net present value of rents from urban land use and the respective conversion costs, land buyers also face the opportunity costs of alternatives usages.
Such frameworks have been adopted among others by Lubowski et al. (2008), Chakir andParent (2009), andLi et al. (2013). For estimation of parameters relating to observed buyers' choices, the profit maximization problem can be formulated within a multinomial limited dependent variable framework. Let y i j be the observed share of urban expansion from land use j relative to the total urban expansion in region i.1 Econometric estimation thus concerns itself with modelling the probability of observing y i j . Within the multinomial logit framework, this probability can be modelled as a function of choice specific log-odds µ i j , weighted by the sum of log-odds over all 1When land use specific observations y ij are shares, a popular choice for estimating the multinomial model is to apply a log-linear transformation, where the dependent variables correspond to log(y ij /y iJ ) and perform standard regression analysis (see, for example Chakir and Lungarska, 2017;Chakir, 2009). The main drawbacks of this approach are twofold. First, Jensen's inequality states that the expectation of a logarithm is not equal to the logarithm of the expectation. Therefore, log-linearization inherently introduces a bias in the estimated slope coefficients. Second, in empirical applications frequently a large number of observed choices y ij are equal to zero, thus necessitating either a censoring of observations or adding a constant to all observations, both of which have been demonstrated to lead to substantial bias. choice alternatives µ i j ′ ( j ′ = 1, . . . , J): (2.1) In the standard non-spatial multinomial framework µ i j is specified as a function of k explanatory variables, with corresponding choice-specific slope coefficients, which are to be estimated. The explanatory variables correspond to the expected rents and conversion costs with respect to land use j.
Spatial dependence among log-odds µ i j in Eq. (2.1) involves the assumption that the choices of urban land buyers do not solely depend on rent and conversion costs in their own region i, but also on other regions' characteristics as well. This assumption implies that the probability of observing a land use choice in region i also depends on land use choices of all other regions. This assumption is based on the spatial nature of land expansion: before construction, investors typically scope multiple investment opportunities, which might not be contiguous, but located across regions in spatial proximity to each other.
Following the spatial econometric literature, such dependencies can be incorporated by imposing an exogenous neighbourhood structure through a non-negative and row-stochastic spatial weight matrix. Let W be such an N × N spatial weight matrix. Two regions i and i ′ are assumed to be neighbours of w ii ′ > 0, otherwise w ii ′ = 0. No region is a neighbour to itself, thus w ii = 0.
The resulting spatial autoregressive (SAR) multinomial logit model can be expressed as: . . . , x K ] collects the K vectors of explanatory variables and β j denote the respective K × 1 vector of slope parameters related to choice j. The N × 1 vector ε j contains independently and identically Gaussian distributed disturbance terms, with zero mean and σ 2 j variance. The (scalar) parameter ρ j measures the strength of spatial autocorrelation for land use class j, with sufficient stability condition ρ j ∈ (−1, 1), where positive (negative) values of ρ indicate positive (negative) spatial autocorrelation. Note, that the model allows for different ρ j across land use classes.2 In the absence of spatial autocorrelation (ρ 1 = ... = ρ J = 0), the model framework collapses to a classical multinomial logit setup.
In such a spatial autoregressive (SAR) model specification, the N × 1 vector of choice-specific log-odds µ j = [µ 1j , . . . , µ N j ] ′ thus also depend on the characteristics of other regions in the sample.

Spatial dependence is introduced by the spatial multiplier
2With the identifying restriction that the spatial autocorrelation coefficient associated with the J-th land use class ρ J = 0.
3It is worth noting that the standard SAR model can be extended to more flexible spatial econometric model specifications in a straightforward way. Specifically, one may additionally include spatially lagged explanatory variables, resulting in a so-called spatial Durbin model (SDM) specification (see, for example LeSage and Pace 2009). A similar extension is presented in the empirical exercise.
A core implication of the SAR modelling framework is that a change in the explanatory variables associated with region i, would not only result in changes of the observed shares y i j in the own region, but in other regions as well. Through the nature of the multinomial logit model, where marginal impacts to one choice j also affect the shares of all other choices, this implies that in a spatial dependent setting marginal impacts of y i j have spillover effects over regions and choices as well.
Note, that through the normally distributed residual error vector ε j , the residuals of the model in Eq. (2.2) are effectively decomposed into two components: first, the heteroscedastic errors arising from the logistic model in Eq. (2.1) and second, the normally distributed error term ε j with σ 2 j variance. Spatial dependence in the errors is captured through the latter. Similar to spatial autoregressive variants of standard probit (LeSage et al., 2011) and logit models (Krisztin and Piribauer 2020), innovation variances σ 2 j are restricted to unity, in order to identify the logistic errors.

Estimation strategy
We propose a Bayesian estimation strategy for the SAR multinomial logit model, which builds on the idea of introducing a latent variable in order to facilitate the estimation of the multinomial logit likelihood. This estimation strategy has been widely employed in recent Bayesian econometric literature for tackling models featuring non Gaussian distributions (see e.g. Frühwirth-Schnatter and Frühwirth, 2012;Frühwirth-Schnatter et al., 2009). To illustrate the core problem, consider the likelihood of the multinomial logit model in Eq. (2.1): (3.1) Note, that the likelihood contribution of observation i relies not only on µ i j , but on the log-odds of making other choices as well. This well-known non-linearity in the likelihood greatly complicates the estimation of the unknown slope and spatial autoregressive coefficients.
Within a Bayesian framework the focus of estimation frequently lies mainly on finding conditional posterior distributions for the parameters of interest. In fact, assuming suitable priors p(β j ), the conditional posterior of β j can be expressed conditional on all other slope coefficients β −j and ρ (see Holmes and Held, 2006): While this distribution cannot be easily sampled from, we follow the work of Polson et al. (2013), which has been adopted to the spatial autoregressive variant of a bivariate logit distribution (Krisztin and Piribauer, 2020). A particularly useful result in Polson et al. (2013) is the fact that conditional on introducing a Pólya-Gamma distributed latent random variable, exponential type distributions such as the one in Eq. (3.2) can be recast as Gaussian, where posterior sampling can be easily achieved.
Particularly, when conditioning on ω i j ∼ PG(1, 0) -where PG(1, 0) denotes a Pólya-Gamma distribution with rate one and shape zero -the conditional posterior of the slope parameters associated with choice j can be reformulated as: where ω j = [ω 1j , ..., ω N j ] ′ and κ i j = y i j − 1/2. The conditional posterior has working responses we elicit a Gaussian prior distribution for the slope coefficients, with p(β j ) = N µ β j , Σ β j , the conditional posteriors for the slope coefficients are also Gaussian: The Gaussian conditional posterior of the slope parameters reveals the particular appeal of using latent Pólya-Gamma distributed variables. A wide variety of Bayesian model extension, such as variable selection, or uncertainty over the W can be easily introduced in the above framework. Following Polson et al. (2013), the conditional distribution of ω j is also a Pólya-Gamma distribution: where η j = [η 1 , ..., η N ] ′ . Computationally efficient algorithms for sampling from the Pólya-Gamma distribution are readily available in the R package BayesLogit.
The conditional posterior of ρ relates directly to the multinomial logit: where p(ρ j ) denotes the prior distribution of ρ j . The conditional posterior in Eq. (3.6) is not from a well-known form and thus cannot be sampled from easily. This is usual in the spatial econometric literature, and the standard solution is to use a Metropolis-Hastings step, as in LeSage and Pace (2009).
The Markov-chain Monte Carlo algorithm cycles through steps I to III B times by excluding the first B 0 draws as burn-ins. Inference on the parameters is conducted using the B − B 0 remaining draws.4

Simulation study
In a Monte Carlo study we benchmark the SAR multinomial logit model in order to assess the predictive performance of our proposed modelling framework against two competing specifications: (i) a non-spatial version of the SAR multinomial logit, where all spatial autoregressive coefficients ρ j = 0 for all j, and (ii) J − 1 individual SAR logit models where each logit model captures the log-odds of not choosing option J.5 For the simulation study we use a SAR multinomial logit model as a benchmark data generating process, with three choice classes (J = 3) and two randomly generated explanatory variables (k = 2). The data generating process can be written as follows, where variables with a tilde denote generated quantities:ỹ The slope coefficients and the explanatory variables are generated anew in each Monte Carlo iteration, whereX stems from a standard normal distribution. The 4Convergence of the MCMC algorithm was checked using the convergence diagnostics proposed by Geweke (1992) and Raftery and Lewis (1992). Convergence diagnostics have been calculated using the R package coda. 5The SAR logit model is estimated using the method put forward in Krisztin and Piribauer (2020).
slope coefficients are generated as stated above. The row-stochastic spatial weight matrixW is based on a random spatial pattern generated from a Gaussian distribution for latitude and longitude, and constructed using seven nearest neighbours. Note, that our dependent variableỹ i j is a share variable, as is often used in land use share models (see, e.g. Chakir and Parent, 2009).
To assess the strength of the specifications along multiple scenarios, we vary the strength of spatial dependence ρ j ∈ {0, 0.5, 0.8}. To evaluate the accuracy of the sampler with respect to the chosen sample size, we consider N ∈ {400, 1000}. Across all models, our prior set up is as follows: we use a rather uninformative Gaussian prior for β 1 , ..., β J−1 with zero mean and variance 10 8 and for ρ 1 , ..., ρ J−1 we use a the standard beta prior specification as proposed in LeSage and Pace (2009).  In the case of no spatial autocorrelation (ρ j = 0), the non-spatial multinomial logit exhibits the highest estimation accuracy for both sample sizes under scrutiny. It is worth noting that this result is hardly surprising, as in the absence of spatial autocorrelation this model resembles the true data generating process most closely. However, that the SAR multinomial logit closely tracks the estimates of its non-spatial counterpart. In the case of N = 1, 000, our proposed model specification even slightly outperforms all competing specifications in terms of average direct effects.
For a moderate degree of spatial autocorrelation (ρ j = 0.5), the SAR multinomial logit model outperforms all other specifications under scrutiny for both considered sample sizes. In terms of direct average impacts, the non-spatial multinomial logit model performs comparatively better.
However, in the case of a smaller sample size (N = 400), the bias in terms of point predictions clearly increases. Note, that the competing bivariate SAR logit specification shows considerable bias in estimating the spatial autocorrelation parameters, albeit the bias is less than that of the non-spatial multinomial logit.
Turning attention to a high degree of spatial autocorrelation (ρ j = 0.8), we observe that the SAR multinomial logit model significantly outperforms its alternatives. Furthermore, when high spatial autocorrelation is present, the bivariate SAR logit exhibits lower bias in terms of point prediction of average indirect effects, as the non-spatial multinomial logit model.
Overall, we can conclude that the SAR multinomial logit model outperforms both a non-spatial multinomial logit, as well as the application of bivariate SAR logit models. This result applies both in moderate and large sample sizes. Even when no spatial autocorrelation is present, the SAR multinomial logit model produces rather promising results in terms of predictive performance, as it closely tracks the results of its non-spatial counterpart.

European land use change
Recent literature focused attention to land sealing resulting from urban sprawl, and associated spillovers to other land use classes. Results from van Vliet (2019) suggest that in the last decade in Europe 8.4 Mha of land has been converted to urban, out of which 6.3 Mha was converted from cropland. However, this land sealing led to 13.1 Mha displacement of other land use classes, as cropland was expanded elsewhere, to compensate for the lack of production resources, out of which the majority (13 Mha) was expanded in other regions. These spillover effects are well documented in the literature (e.g. Coisnon et al., 2014;Guastella et al., 2017;Zoppi and Lai, 2014), and serve as a motivation for an empirical application of the spatial multinomial logit model. Both global (Ay et al., 2017) and local (Deng et al., 2008) spillovers are considered of importance.
In the spirit of Chakir and Parent (2009), Zoppi and Lai (2014), and Lai and Zoppi (2017) we model the areal share of urban sprawl stemming from non-urban land in a given region within a spatial Durbin multinomial logit model, where the log odds take the following form: The scalar α is an intercept and the term W X is a spatial lag of the matrix of covariates with associated vector of parameters θ j . This lag explicitly controls for the regions' characteristics of their neighbors.

Regions, data, and spatial weights
Our sample covers a cross-section of 1,316 European regions across 27 countries. The regions are classified under the NUTS 2013 classification at the NUTS 3 level. They vary in size and population, however, they divide the territory of the EU for the purpose of harmonized regional statistics and analysis. Further, they are assumed to be appropriate spatial observation units for economic research and regional policy applications. The regions included in the sample are located in Austria (35 regions As a result, we obtain a compositional data vector that -by definition -sums up to unity. The types of land use we consider follow the empirical literature on land use changes and urban expansion (Chakir and Parent, 2009;Chakir and Le Gallo, 2013;Lai and Lombardini, 2016;Zoppi and Lai, 2014;Lai and Zoppi, 2017). We distinguish between the five classes cropland, grassland, forest, other, and urban. At this point it is worth noting that we recognize all artificial surfaces as urban region, as we want to focus our study especially on soil sealing. The raw data stem from the

CORINE Land Cover (CLC) maps provided by Copernicus Land Monitoring Service (CLMS).
Their maps are based on satellite data with Minimum Mapping Units (MMU) of 25 hectares for areal phenomena and a minimum width of 100 meter for linear phenomena. The data consists of an inventory of land cover in 44 classes, which we summarize to the five classes stated above.6 We use CLC change-layers also provided by CLMS, designed to capture the land cover changes at a higher resolution between two neighbour surveys. Regional aggregates at the NUTS 3 level Our set of covariates consists of K ′ = 19 candidate variables that are commonly employed in the literature on land use changes (for an overview, see Shaw et al., 2020). Further, to capture the complex spatial structure we include not only the spatially lagged dependent vector, but also the spatially lagged forms of the explanatory variables (except for the dummy variables). We also include a vector of ones as intercept. Therefore, the resulting design matrix is of column-dimension K = K ′ + 18 + 1 = 38 where 18 are the spatially lagged covariates and 1 is the intercept. Table 2 provides a short technical description for the variables included in our estimation.
Since the rent of a certain land use class is assumed to affect the decision of land-ownersyet it is usually not observed -many recent studies consider various proxies to control for the variation in returns from different land uses (see, e.g. Livanis et al., 2006;Lubowski et al., 2008). Chakir and Parent (2009) conclude that agricultural gross value added divided by the respective 6Table A1 in the appendix summarizes how each of the land cover classes was mapped. Eurostat land use area serves as a reasonably good proxy. Higher rents are therefore assumed to reduce the amount of land that is converted to artificial area.
The initial level of artificial areas and -especially -urban expansion rates are discussed in the literature in the context of the level of available agricultural amenities (Wu, 2006;Wu and Plantinga, 2003;Coisnon et al., 2014). Based on this strain of literature lower initial urban expansion would lead to higher urbanization rates, as regions surrounding population centres with low urban share are in higher demand.
On the other hand, quantities on employment, population and income are typical variables to represent the degree of economic development. Employment enters the model in the form of sectoral shares, with manufacturing (secondary) as baseline. Region specific population, a particularly important driver of land take (see e.g. Guastella et al., 2017;Terama et al., 2019;Paulsen, 2012), is divided by the respective area and therefore captured as density. Income is measured as gross domestic product per inhabitant. High shares of tertiary employment, paired with high income and population density is usually observed around the city centres and, therefore, associated with expansion of housing supply which again should translate into urban expansion.
Quantities usually associated with the quality of soil include measures of slope, elevation and moisture (usually in form of precipitation or humidity). Following Chang-Martínez et al. (2015) we include these physical drivers of land use conversion, as they implicitly influence the cost of land conversion. We consider slope and elevation in average meters and degrees respectively.
Soil moisture is captured as volumetric measure of liquid water in a surface soil layer of 2 to 5 cm depth. Variables capturing the quality of the land are assumed to have a negative impact on conversion of productive land, as they are to be interpreted as costs of conversion (Shaw et al., 2020;Huang et al., 2006).
Additionally, national regulations, as the amount of nature conservation areas, restrict the potential conversion. We include the share of area being protected under the Natura 2000 network of nature protection. The Natura 2000 network's main objective is to preserve natural habitats and secure biodiversity in the European Union, hence, forest and grassland areas are of main concern (Lai and Zoppi, 2017).
In the discussion of steering soil sealing, subsidies and taxes play a key role (Artmann, 2014;Shaw et al., 2020). As a proxy for European level subsidies we utilize observation on whether a region received Objective 2 level regional funding within the period, as this type of funding is also used to enhance infrastructure in the region. An additional major source of subsidy for land use management are agricultural subsidies of countries, as well as the European Union. These are not divided on the regional level, but by farm size and productivity. Therefore, to control for the heterogeneous structures of agricultural actors across Europe, variables that account for farm specific characteristics are incorporated (for a discussion see Delbecq et al., 2014).
For the spatial weights matrix W we suppose a neighbourhood structure known as seven nearest neighbour specification, where every region is constrained to be a neighbour of its seven closest regions. Our results, however, prove robust to variations in the assumed spatial dependence structure.

Empirical results
This subsection presents the Markov Chain Monte Carlo (MCMC) results obtained from 10,000 posterior draws for our spatial multinomial logit specification, where the first 5,000 were discarded as burn-in.7 Straightforward interpretation of coefficient estimates in spatial models could lead to deceptive or misleading conclusions (see, e.g. Anselin, 1988;LeSage and Fischer, 2009). One possibility is to provide summary metrics in form of direct, indirect (spillover) and total effects. Following (LeSage and Pace, 2009) we present marginal effects in Table 3. Direct effects are then to be interpreted similar to regular slope coefficients. In turn, indirect effects account for the impacts due to changes in other regions and are therefore to be interpreted as spillover effects. We find a significant class-specific spatial parameter ρ j for cropland as well as grassland, highlighting the necessity of incorporating the spatial dependence structure in the model. This result confirms the findings of Guastella et al. (2017) and especially of van Vliet (2019), in that the expansion of artificial surfaces on productive land leads to further spillover land conversions in surrounding regions.
In addition, the table reports the McFadden pseudo R 2 , which serves as a measure of the goodness of fit in limited dependent variable models. McFadden (1974) highlights that values between 0.2 and 0.4 already indicate a rather good fit. The rest of the reported results is to be interpreted as follows: For a certain explanatory variable the four values reported in the direct effect column represent the class specific (cropland, forest, grassland, and other) responses to variation in the respective explanatory. These responses are the changes of the probabilities to convert the respective class in that region to artificial area. Similar, the four class specific columns for the indirect effect are the responses to changes in the explanatory variable in all other regions.
The direct effects of the three types of land rent proxies (crop, forest and grass) confirm results from Chakir and Lungarska (2017) and Chakir and Parent (2009), in that for each land use class higher rents imply a significantly lower chance of conversion. Additionally, the joint modelling in a multinomial model indicates that significant spillover effects to other classes are present. Most notably, an increase in cropland rents in a region, would also increase the conversion of grassland to artificial areas. We find analogous relationships for forest rent and cropland, as well as grass rent and cropland.
A higher initial level of artificial areas indicates that land sealing of other natural vegetation in the own region has a significantly higher probability as compared to land sealing of the other land covers under scrutiny. Burnett (2012) have similar findings, where urbanization is a process which enforces itself. Moreover, as the crop, grass, and forest land surrounding cities is frequently the most productive (Shaw et al., 2020), it seems intuitive that urban expansion would take from the comparatively less productive other natural vegetation.The change of artificial area appears to have no direct or indirect impacts on the allocation of its origin.
Regarding the sectoral mix of employment, our results indicate that a higher share of tertiary employment in the own region implies a significantly higher probability of other natural vegetation being converted to artificial land. This reflects the findings of Salvati (2016) and 7Convergence of the sampler was checked using the diagnostics by (Geweke, 1992)   Our results with regards to gross domestic product per capita suggest that it is not a significant driver of land sealing in a European context. This is as opposed to findings of e.g. Deng et al. (2008) in developing countries, where gdp per capita is found to be one of the main drivers of urbanization. Moreover, we find that a higher gdp per capita in fact significantly lowers the probability of sealing grass land in the own region. When observed jointly with the direct effects of population density, this supports findings by McGrath (2005) and Guiling et al. (2009), who find that population is a more significant driver of urbanization, as opposed to personal income.
Note, however that only for the land take from grass land is the effect positive and significant. For land takes from forest and other natural land, a higher population density in fact results in a lower chance of land conversion. This result can be interpreted on the one hand with the fact that regions with a higher endowment of population density are more urban in nature and contain a much lower percentage of cropland or other natural vegetation. On the other hand, specific literature, such as Delbecq et al. (2014) and Wu (2006), provide evidence that private home-owners exhibit strong preferences for surrounding grassland amenities.
Turning our attention to the estimated impacts of the biophysical drivers elevation, slope, and soil moisture, we can largely confirm the overall conclusions of Shaw et al. (2020) and Chang-Martínez et al. (2015) in that the biophysical processes play a secondary role to socioeconomic ones in explaining land sealing processes. For the own-region, only slope plays has a small, albeit significant impact on the probability of sealing other natural vegetation. Additionally, a higher percentage of soil moisture indicates a significantly higher chance of converting grassland to urban land in neighbouring regions.
Our results seem to indicate that the Natura 2000 protection program has intended effects, as higher shares of protected forest and grassland would -according to our results -lead to significantly lower chance of the respective land cover being converted into urban. Note, that if a region has a higher share of other natural vegetation under Natura 2000 protection, this would lower the chances of neighbouring regions converting this land cover to urban. This largely confirms the findings of Lai and Lombardini (2016), Zoppi and Lai (2014), and Lai and Zoppi (2017). Our joint multinomial logit framework, however, allows us to uncover additional interdependencies among the natural protection of land covers. Our estimated results suggest that a higher share of protected forest in a NUTS3 level region would results in a significantly higher probability of converting cropland to urban, not only in the own region but also amongst neighbours. Additionally, this would also lower the odds sealing grassland under artificial surfaces.
The estimated results with regard to our subsidy proxies seem to show that regional funding plays a comparatively larger role as farm-specific subsidies. The own-regional effect of regional level Objective 2 subsidies is significant and negative for cropland, and positive for other natural vegetation. This finding supports the hypotheses that subsidies increase land conversion (Shaw et al., 2020). Particularly noteworthy is the result that the land take comes more significantly from natural vegetation (which is highest in biodiversity) as opposed to more productive cropland. Additionally, neighbours of regions under Objective 2 funding also have a significantly decreased chance of converting cropland to artificial areas. This might suggest an increase in cropland productivity, through better infrastructure. With regard to our farm structure variablesproxying the role of the Common Agricultural Policy -our results indicate a significant effect only with regard to farm density: a higher density of farms would lower the chance of converting land to grassland in the own region.
In this paper we put forth a Bayesian estimation approach for a multinomial logit specification for the modelling of land use conversion, which has a spatial autoregressive structure in the log odds, with differing strength of spatial autocorrelation for each choice alternative. The virtue of our specification is that it combines a spatial autoregressive framework (allowing for cross regional spillovers), and a joint multinomial framework (allowing for cross land use class dependencies).
The proposed approach is based on recent spatial econometric advances dealing with Bayesian estimation of the logit model Krisztin and Piribauer (2020). The core step of the estimation procedure relies on introducing a latent Pólya-Gamma variable (see Polson et al., 2013). Through the latent variable, the conditional posterior distribution of the slope parameters in the spatial autoregressive logit specification is rendered in a Gaussian form, which allows us to tackle the MCMC estimation in a particularly efficient way. We demonstrate in a simulation study the advantages and behaviour of our proposed model specification, benchmarking it against simpler alternatives.
The virtues of the spatial multinomial logit model are illustrated using an empirical specification. Specifically, we examine the land sealing activities in European NUTS-3 level regions. We consider the areal share of urban sprawl emanating from cropland, grassland, forest, and other natural vegetation from 2000 to 2018. The observation on land use data stem from the CORINE Land Cover (CLC) maps. Our results suggest, that spatial dependence indeed play a small, but significant role, particularly for the land use classes cropland and grassland. For all land covers proxied land rents are of central importance. Additionally, our findings corroborate evidence from recent literature, that socio-economic drivers play a much more central role, as opposed to biophysical ones (for an overview, see Shaw et al., 2020). The key role of population density Guastella et al. (2017); Deng et al. (2008); Lai and Lombardini (2016) in urban land take is confirmed by our results. Moreover, we confirm on a larger level that environmental protection not only has effects in the own-but also in neighbouring regions (Lai and Lombardini, 2016;Zoppi and Lai, 2014;Lai and Zoppi, 2017). Through the virtue of our multinomial analysis we also find evidence for forest protection having spillover effects to neighbouring regions and other land covers.

Marginal effects
Similar to the marginal effects of the spatial Durbin logit model (see Krisztin and Piribauer, 2020), in the presented multinomial logit model in Eq. (5.1) the interpretation of marginal effects of the k-th explanatory variable (with k = 1, ..., K) differs from those in linear models. This is due to the fact that the multinomial logit model is non-linear in nature, but also the the presence of spatial autocorrelation gives rise to an N × N matrix of partial derivatives, which makes interpretation of marginal effects richer, but also more complicated (see also LeSage and Pace 2009).
As is standard in the logit literature, and analogous with the proposed marginal effects of the spatial logit model (Krisztin and Piribauer, 2020), we provide marginal effects relative to the mean of the k-th explanatory variable, which we denote as x k = N i=1 x ik /N. Thus the interpretation of the marginal effects is the change in probability of observing y = j associated with a change in the average sample observation of the k-th explanatory variable. To write the partial derivatives of the model in Eq. (5.1), with respect to the k-th coefficient let us define: β k j and θ k j denote the k-th element of β j and θ j , respectively. x W k denotes the average value of the k-th spatially lagged explanatory variable. The partial derivatives can then be expressed as: where ⊙ is the Hadamard product. Note that marginal effects of the k-th coefficient on class j, denoted as Λ k j , are an N × N matrix due to the presence of the N × N spatial multiplier A −1 j . Interpreting N × N marginal effects proves cumbersome, therefore we define summary impact effects (LeSage and Pace, 2009). These can be readily calculated from Λ k j : (A.4) where ι N denotes an N × 1 vector of ones.