Building Hierarchies of Retail Centers Using Bayesian Multilevel Models

The perceived quality of urban environments is intrinsically tied to the availability of desirable leisure and retail opportunities. In this article, we explore methodological approaches for deriving indicators that estimate the willingness to pay for retail and leisure services offered by retail centers. Most often, because the quality of urban environments cannot be qualified by a natural unit, the willingness to pay for an urban environment is explored through the lens of the residential housing market. Traditional approaches control for individual characteristics of houses, meaning that the remaining variation in the price can be unpacked and related to the availability of local amenities or, equivalently, the willingness to pay. In this article, we use similar motivations but exchange housing prices for residential properties with property taxes paid by nondomestic properties to glean hierarchies of retail centers. We outline the applied methodological steps that include very recent, nontrivial contributions from the literature to estimate these hierarchies and provide clear instructions for reproducing the methodology. Using the case study of England and Wales, we undertake a series of econometric experiments to rigorously assess retail center willingness to pay (RWTP) as a test of the methods reviewed. We build intuition toward our preferred specification, a Bayesian multilevel model, that accounts for the possibility of a spatial autoregressive process. Overall, the applied methodology describes a blueprint for building hierarchies of retail spaces and addresses the limited availability of spatial data that measure the economic and social value of retail centers.

The quality of an urban environment is a principal determinant of attractiveness (Glaeser, Kolko, and Saiz 2001). Attractiveness, in this context, might be understood as an outcome of perceived place attributes (Finn and Louviere 1996), which can be argued as those perceptions, attitudes, and patronage behavior of consumers drawn to particular places (Teller and Elms 2012). The quality of an urban environment cannot be qualified by a natural unit of analysis, however, and so approaches typically observe its capitalization into housing prices (Rappaport 2009). The depth and breadth of consumer amenities, natural and cultural assets, and opportunities in the labor market are seen as influential drivers of demand for residential space (Oner 2017). As an example, the attractiveness of Paris might be considered as a product of fine-dining restaurants, art museums such as the Louvre, and the impressive stock of buildings (Brueckner, Thisse, and Zenou 1999). Accordingly, Rappaport (2009) argued that environments with above-average consumer amenities or, implicitly, quality of urban environment typically sustain a higher density of residential population, resulting in higher prices in the housing market.
Under these assumptions, the desirability of areas has often been explored through the lens of home buyer decisions in the residential housing market. Hedonic analyses that estimate the willingness to pay for consumer amenities through residential housing markets derive a snapshot for the desirability of particular places. In recent years, the proportion of the individual's spending allocated to consuming the economy's lifestyle amenities and services has increased substantially (Oner 2017). An increasing share of the individual's rising wealth is allocated to the pursuit of enjoyment and experience, which is reflected by an increase in the willingness to pay for properties that are proximate to retail and leisure destinations. Changing consumer desires have transformed traditional retail zones into spaces of leisure consumption that are increasingly service oriented. Concentrations of retail outlets are referred to as retail agglomerations and exist across a system in space, with their attractiveness to home buyers related to the composition and richness of the retail environment but also competing opportunities available elsewhere (Teller and Elms 2012). Moreover, areas of retail perform as attractors for a multitude of heterogeneous user groups such as prospective and existing residents, consumers, visitors, and employees (Teller and Reutterer 2008). In this way, the availability of consumer amenities is seen as a driver of urban vitality, so an estimation of the willingness to pay for an amenity-rich environment can be used to gauge how desirable that area is.
One particular area that attracts a number of retail opportunities is the town center. Town centers are complex urban economic systems that are characterized by the clustering of socioeconomic activity (Thurstain-Goodwin and Unwin 2000). Embedded within the urban fabric of town centers are retail centers that are agglomerations of consumer spaces and shopping destinations that are central to economic and civic life (Pavlis, Dolega, and Singleton 2017). Town centers are typically composed of a retail center but in some cases have more expansive functional areas that include office spaces in addition to retail and services. A focus on classifying retail center willingness to pay (RWTP) is foundational to understanding hierarchies of retail spaces, which, by implication, reveal geographic patterns in urban growth and development. Retail center hierarchies are the rankings of particular centers within a network, the position of which relates to the size, attractiveness, and gravity of their composite retailers' influence, with top-ranked centers typically offering multipurpose comparison shopping experiences that have a wider geographical reach on consumers (Dennis, Marsland, and Cockett 2002). By contrast, smaller district centers are more embedded in local economies and are patronized by a smaller catchment area. Although retail centers are an underlying driver of the sustainability of the built environment, since the 1970s they have become threatened by the decentralization and dispersal of development to out-of-town locations on the periphery of towns. In addition, Singleton et al. (2016) claimed that retail has become increasingly vulnerable to the effect of growing online shopping and so must be considered within a framework of e-resilience.
In this article, we introduce a statistical technique to derive indicators that describe hierarchies of retail centers across the national extent, which we obtain alongside a measure of uncertainty in the rank-ordered estimate for each retail center. Despite the concerns previously raised, and although retail centers in the United Kingdom have long been examined under a series of milestone reviews (Department of the Environment Urban and Economic Development Group 1994), there is little quantitative evidence that explores the performance of town center retail economies, which has undermined effective policy formulation and decision making (Astbury and Thurstain-Goodwin 2014). Indicators of retail hierarchies produced by commercial organizations (Javelin Group 2017; CACI 2018), for example, lack fine spatial granularity at the retail center scale. Our approach is motivated by a hedonic framework of analysis that is typically oriented toward residential housing markets, except that we exchange residential for commercial properties to execute our empirical strategy. We describe the methodological steps required to reproduce the RWTP estimates, which include very recent, nontrivial contributions from the econometrics literature. Finally, we introduce a validation exercise to verify that the RWTP estimates correspond to conventional wisdom by correlating the scores with socioeconomic characteristics of the retail center. Not only is the approach we operationalize novel in application but we note that our methodology is replicable and generalizable to international contexts, conditional on data availability.
The remainder of the article is organized as follows. The next section motivates the underlying conceptual framework of the article, followed by an introduction to the specification and underlying assumptions of the modeling approach. After elaborating on the nature and limitations of the data source, we step through the results of each model, including a validation exercise to confirm whether the RWTP estimate for each retail center responds to characteristics that are associated with attractive places. The final section summarizes the article, presenting extensions for future elaborations of the applied methodology.

Background and Motivation

Modern Consumption Patterns
The desirability of urban places to live is increasingly dependent on their ability to provide consumption opportunities, which are often reflected in housing prices (Glaeser, Kolko, and Saiz 2001). Leisure and retail amenities such as restaurants, live performance venues, and shopping districts have been shown to be crucial for attracting modern workers who balance economic and lifestyle opportunity in selecting places to live and work (Florida 2000). Because perceptual qualifications for the quality of leisure and retail environments cannot be directly counted or observed, they have often been evaluated by the willingness to pay for residential property through hedonic approaches (Rivera-Batiz 1988; Hui and Liang 2016). Jin and Sternquist (2004) argued that the desire for leisure and shopping is increasingly linked to the concept of enjoyment and experience. From a consumer perspective, shopping trips not only satisfy the individual's bundle of wants and needs at a given store but they allow the consumer to speak his or her own geographies of everyday life through the language of consumption (Sack 1988). This "credit-card citizenship" toward identities and preferred lifestyle choices provides an opportunity for social mixing and participatory entertainment (Goss 1993). Over the last few years, however, this traditional brick-and-mortar retailer landscape has been restructured by the growth of electronic retailing, with e-commerce sales in the United States rising by 101 percent in the period between 2011 and 2016 (Helm, Kim, and Silvia 2018). Due to the rise of the Internet, online consumption has tilted power from retailers to consumers through opportunities for 24/7 convenience and price comparison, increased ease of market entry and transparency, and a distribution of products to a wider geographical reach (Williams 2009). 
Evidence suggests that this rapid expansion in online consumption has affected the health of retail centers in complex ways and has been a principal driver of change to the geography of traditional UK high streets (Wrigley and Lambiri 2014).
Adjustments as a result of online shopping to the market share of retailing, leisure, and services in retail centers are typically considered detrimental effects that cause physical shopping opportunity to be substituted online (Doherty and Ellis-Chadwick 2010). Yet, online retailing has also been linked to complementarity and modification processes that blend traditional retail channels with e-commerce by refashioning the in-store consumer experience (Poushneh and Vasquez-Parraga 2017). In the United Kingdom, major retailers including Argos, John Lewis, and Boots have integrated new technologies by opening "click and collect" points that act as points of delivery for Internet sales by allowing customers to order goods online and collect them in store. Thus, the role of retail centers remains vital to modern consumption and the continuity of physical shopping environments, with consumers pointing to the hedonic experience that physical stores offer through recounted social experiences, the opportunity to discover new and exciting goods, and the gratification afforded by touching or trying products in store (Cho and Workman 2011). Under this lens, Singleton et al. (2016) recast the propensity of localized populations to engage with the mixture of online shopping and physical retailing provision under a framework of "e-resilience." The constraint or opportunity of e-commerce to retail centers is not uniform across all retail types, with retailers whose merchandise can be replicated and digitized online the most vulnerable to large-scale store closures and lost physical shopping opportunity (Zentner, Smith, and Kaya 2013).

Geographic Behavioral Drivers of Retail Center Hierarchies
More concretely, the geodemographic characteristics of catchments served by retail centers are fundamental drivers of consumer choices and behaviors that shape the willingness to pay for retail opportunity and, in turn, hierarchies of retail spaces (Birkin, Clarke, and Clarke 2002). In the United Kingdom, geographic variation of consumer disposable incomes affects the relative retail value of catchment areas. For example, hierarchies of retail centers for large conurbations and metropolitan centers are moderated by their propensity to attract highly mobile consumers who require multiple retail and leisure choices (Wrigley et al. 2015). More generally, steps in the hierarchy of retail centers have become contingent on a rising "convenience culture." This incorporates the progressive rise of online retail with preferences for "local" shopping (and derived product authenticity, traceability, and sustainability benefits) alongside a revaluation of consumer awareness toward "community-sustaining" consumption (Chalmers et al. 2012). Since the early 2000s, significant demographic and societal shifts have driven these trends, with particular growth among low-density households, aging populations, and younger workers who are faced with longer working hours and busy lifestyles (Wrigley et al. 2015). These groups in particular have an increasing desire for convenience at the local level. In the United Kingdom this is revealed by evidence from the Institute of Grocery Distribution (IGD) that suggests that consumers are increasingly shopping little and often at shops closer to home rather than shopping at larger out-of-town retail developments, a phenomenon described as "top-up shopping" (IGD 2014). Moreover, a report by the Ethical Consumers Market suggests that the number of shoppers purchasing produce from local shops increased from 15 percent to 42 percent between 2005 and 2012 (Ethical Consumer Research Association 2013). 
This has considerable beneficial implications for the configuration of UK retail centers, because high streets and town centers are now increasingly the preferred locations for consumers to undertake their top-up shopping. Not only has this driven footfall back to retail centers but local shopping has reshaped hierarchies of retail spaces by boosting the vitality and viability of town centers and high streets in the United Kingdom (Wrigley et al. 2015).
Yet there is significant demographic variation in the propensity for consumers to value local shopping and engage with Internet retail; this has determined the differential geographies of online shopping (Longley and Singleton 2009) and, in turn, been an influential driver of retail hierarchies. Although typically younger age groups have been the most receptive to online shopping, significant growth has been recorded in the rate of online purchasing among those sixty-five and older, with 48 percent buying online in 2014, increasing from 16 percent in 2008 (Office for National Statistics 2018). By exploiting opportunities provided by digital technologies and adapting retail spaces to meet the needs of every population group, retail centers have become virtual marketplaces. Here, consumers are able to access information online regarding the availability of products, stores, services, and brands prior to visiting, which has enhanced the retail center customer experience (Wrigley et al. 2015). Despite these significant structural changes, though, good product ranges, quality of retail provision, and traditional factors such as overall retail center experience, atmosphere, and leisure provisions remain foundational drivers of footfall in retail centers. This extends their use from shopping destinations to areas for economic and educational activities (in addition to social interaction; Warnaby et al. 2002).
In addition to demographic variation, consumer behavioral patterns vary spatially and are directly linked to the geographies of demand toward retail facilities. Steps in the hierarchy of retail centers are intertwined with the underlying characteristics of the catchment area itself. Variations in consumer confidence, the ownership of basic digital skills, and local supply factors such as convenience and accessibility at the small-area level are influential factors toward the vitality of retail centers (Wrigley and Dolega 2011). Thus, the propensity and desirability of consumers to engage with physical shopping opportunity are governed by a multitude of contexts and influences such as the rurality and remoteness of an area (Warren 2007), the extent of Internet connectivity and speed of connection, and even how informed (and educated) consumers are to access online retail (Helsper and Eynon 2010). Despite these factors, and even in a digitally transformed retail landscape, the demand for high street shops remains a permanent fixture of consumer desires, so an estimation of the willingness to pay for retail centers is foundational to unpacking hierarchies of retail spaces that reveal geographic patterns in urban growth and development.

Measuring Attractiveness
Within the academic literature, measures for estimating attractiveness¹ are most typically classified into two streams of research. Models of the first stream are inspired by Reilly's (1931) gravitational law of retail, which motivated the seminal work of Huff (1963). The Huff model applies Newtonian laws of physics to estimate a retail catchment area that factors in the spatial distribution of competing retail destinations when evaluating their gravity or consumer pull to different population groups (Dolega, Pavlis, and Singleton 2016). Huff models are advantageous because they simultaneously estimate break points in the demand surface for all competing retail destinations in the model and reduce the probability of a consumer to patronize a given location to three groups of variables, namely, distance between shops and consumers' homes; a measure of attractiveness such as store size, service levels, or opening hours; and competition proxied by the number of retail units in a location (Teller and Reutterer 2008). Yet, the usual criteria for retail attraction in Huff models are often argued to be incomplete, because additional factors that affect the consumer's propensity to visit a retail destination involve a suite of qualitative indicators such as the variety of retail tenants; site-related factors such as accessibility and parking conditions; and environmental factors reflected by sensual stimuli such as ambience, atmosphere, and perception of safety (Teller and Elms 2010). Clearly, these indicators influence the choice of shopping destination, but measuring across a national extent is difficult (Dolega, Pavlis, and Singleton 2016).
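To make the first stream concrete, the following is a minimal sketch of a textbook Huff-style patronage probability, in which the pull of a destination rises with its attractiveness and falls with distance, normalized over all competing destinations. The function name, the exponents `alpha` and `beta`, and the toy floor areas and distances are illustrative assumptions and are not taken from the studies cited above.

```python
import numpy as np

def huff_probabilities(attractiveness, distances, alpha=1.0, beta=2.0):
    """Return an (n_consumers, n_centers) matrix of patronage probabilities.

    attractiveness: length n_centers (e.g., floor area); hypothetical proxy.
    distances: (n_consumers, n_centers) travel distances, all > 0.
    alpha, beta: assumed attractiveness and distance-decay exponents.
    """
    attractiveness = np.asarray(attractiveness, dtype=float)
    distances = np.asarray(distances, dtype=float)
    # Utility of each center for each consumer location
    utility = attractiveness ** alpha / distances ** beta
    # Normalize across competing centers so each row sums to one
    return utility / utility.sum(axis=1, keepdims=True)

# Hypothetical data: three centers, two consumer locations
attractiveness = [5000.0, 1200.0, 800.0]   # e.g., retail floor area
distances = [[10.0, 2.0, 4.0],             # consumer 0 to each center (km)
             [3.0, 8.0, 6.0]]              # consumer 1 to each center (km)
probs = huff_probabilities(attractiveness, distances)
print(probs.round(3))
```

In this toy setting the first consumer is most likely to patronize the small but nearby second center, whereas the second consumer is drawn to the large first center, which illustrates how distance and attractiveness trade off against the competing alternatives.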
Methods of the second stream are motivated by findings that demonstrate that housing prices increase faster than wage levels, implying a premium for particular locations (Glaeser, Kolko, and Saiz 2001). This has led to a number of studies estimating the relevance of consumption opportunities to the desirability of places, with a focus on home buyer decisions toward urban amenities. That is, by controlling for property-specific characteristics of a residential property such as the number of bedrooms or bathrooms or whether the property has a garden, the residual variation in the property value can be unpacked and related to the local availability of amenities or lifestyle opportunity. Using this approach, the desirability of urban environments has been shown to be factored into property values and is broadly defined by the provision of place-specific assets and amenities that contribute to the allure of an urban area (Brueckner, Thisse, and Zenou 1999). Its importance, therefore, is intrinsically tied to population growth and development (Glaeser, Kolko, and Saiz 2001; Clark 2003), because attractive places that elevate one's experience of an urban environment through concentrations of arts, leisure, and retail have been shown to attract highly skilled individuals (Florida 2008). Clark (2003), for example, demonstrated that university graduates are more likely to locate in areas with high numbers of constructed amenities such as museums, libraries, and leisure outlets. Oner (2017) paid particular attention to the role of retail as an urban amenity, regressing a Q-ratio (a ratio of the marginal price of a property to the marginal production cost) on variables reflecting accessibility to shopping destinations. In all, the study found a significant increase in the Q-ratio of 0.1 for every 1 percent increase in the accessibility to shops for city municipalities.

Measuring Retail Center Attractiveness
In this article, we follow methods of the second stream. Thus, we apply a hedonic framework to estimate the willingness to pay for retail centers. Given our focus on retail environments, business rates paid by commercial property such as high street shops provide an alternative, yet more suitable, lens to explore hierarchies of retail centers than housing prices; although rent or housing prices are our idealized data set, these are difficult to obtain, particularly at the national level. With motivations similar to the way urban economists proxy willingness to pay through residential housing, by controlling for property-level characteristics in business rates (the total floor area, the number of car parking spaces, and the store type, for example), the remaining variation in a premise's business rate can be explained by home buyer desirability for a particular area or, in our case, the retail center. In the United Kingdom, nondomestic rates, or business rates, are a property-based tax levied on the estimated value of all nonresidential properties such as shops, offices, warehouses, and factories (Adam and Miller 2014). Business rates are determined using a ratable value for each nondomestic property. This is set by the Valuation Office Agency (VOA), which analyzes rent evidence (rent and lease agreement details) in addition to undertaking visual inspections of properties to ensure that all evidence is considered fairly. VOA surveyors set ratable values to reflect features including total floor area; business assets such as lifts, air conditioning, and closed-circuit television (CCTV) security systems; and changes in the local property market (VOA 2014). A valuation begins by setting a common basic value per square meter for similar properties in the same area. This basic value is then adjusted to reflect the property's individual features.
Each review of a property's valuation considers property-level characteristics and, most important, the buoyancy of the local property market. In this way, business rates are synchronized to local economic market conditions, reflecting the relative size and scale of retail economies (Astbury and Thurstain-Goodwin 2014).
In our study, we label the estimated phenomenon as RWTP, which describes the price that home buyers ascribe to the leisure and retail services offered by retail centers proximate to the property. In all, our findings are permissible because the residual variation in the business rate is attributed to local property market conditions (VOA 2014), which themselves are influenced by home buyer aspirations to reside in an environment that satisfies their wants and desires (Glaeser, Kolko, and Saiz 2001). By implication, this means that the ratable value, once property-level characteristics are controlled for, can be used to approximate RWTP for the retail center with a catchment that services the surrounding area. Using our conceptual approach, we can begin to unpack hierarchies of retail centers by undertaking a series of experiments on several econometric techniques to find a preferred specification that provides the most rigorous estimates of RWTP for retail centers across the case study of England and Wales.

Methodological Framework
Our approach to estimate RWTP relies on hedonic modeling (Rosen 1974). This technique is typically used in the real estate literature to disentangle the price of a complex good as a function of the multiple intrinsic and extrinsic characteristics common to the property. In our case, a hedonic framework is applied to unpack the determinants of business rates for individual stores. By controlling for various property-level descriptors, a hedonic approach that uses a variable to represent each retail center allows us to recover the implicit price for the retail and leisure opportunities provided by the retail center. Practically speaking, this approach translates into a regression that explains the willingness to pay for receiving consumer amenities inside different retail centers. Once controlling for property-specific characteristics, the RWTP effect can be recovered for the location where stores are located because the business rate for each property involves setting a common basic value per square meter for similar properties in the same area, reflecting the performance, size, and scale of local market conditions (Astbury and Thurstain-Goodwin 2014).
To estimate the most robust empirical hedonic model specification, we compare several approaches, with a focus on recent contributions to the literature. To begin, we introduce a baseline spatial fixed effects model (Anselin and Arribas-Bel 2013), which is expressed as

$$y_{ij} = x'_{ij}\beta + \theta_j D_j + \varepsilon_{ij}, \tag{1}$$

where $y_{ij}$, the business rate for each store $i$ in retail center $j$, is log-transformed to alleviate the potential impact of heteroskedasticity; $x'_{ij}$ is a $1 \times k$ vector of store-level variables listed in the Appendix, and $\beta$ is a $k \times 1$ vector of regression coefficients to be estimated; $D_j$ is the dummy variable for retail center membership, where $D_j = 1$ when $j = h$ for $i \in h$, 0 otherwise; and $\varepsilon_{ij}$ is the model residual term, following an independent normal distribution $N(0, \sigma_e^2)$. For model identification, the intercept is constrained to equal zero so that a separate RWTP effect $\theta_j$ can be estimated for each retail center. From a nontechnical standpoint, $\theta_j$ can be interpreted as the average willingness to pay (in log units) for stores to market their services in retail center $j$. One might expect different retail centers to offer varying degrees of utility such as access to particular socioeconomic groups, amount of footfall, or the prestige of surrounding consumer amenities. Taking into account individual store characteristics, $\theta_j$ captures the RWTP of retail centers.
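The fixed effects logic can be sketched on synthetic data: regress the log business rate on a store-level covariate plus one dummy per retail center, with no intercept, and read the dummy coefficients as center-level effects. This is only an illustrative least-squares sketch; the single covariate (log floor area), the parameter values, and all variable names are assumptions made for the example rather than the specification estimated in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setting (all values hypothetical): J centers, n_per stores each
J, n_per = 4, 50
center = np.repeat(np.arange(J), n_per)           # center index j for each store i
theta_true = np.array([9.0, 9.5, 10.0, 11.0])     # assumed center effects (log units)
beta_true = 0.8                                   # assumed floor-area elasticity
log_floor_area = rng.normal(5.0, 0.5, size=J * n_per)
log_rate = (beta_true * log_floor_area
            + theta_true[center]
            + rng.normal(0.0, 0.1, size=J * n_per))

# Design matrix: store covariate plus one dummy per center, no intercept
D = np.eye(J)[center]                             # (n, J) membership dummies
X = np.column_stack([log_floor_area, D])
coef, *_ = np.linalg.lstsq(X, log_rate, rcond=None)
beta_hat, theta_hat = coef[0], coef[1:]

# Rank centers from highest to lowest estimated effect
ranking = np.argsort(-theta_hat)
print(beta_hat.round(3), theta_hat.round(3), ranking)
```

Because the intercept is omitted, every center receives its own coefficient, and sorting those coefficients yields a simple hierarchy. With few stores per center these estimates become noisy, which is the weakness that motivates the multilevel approach.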
Limitations exist associated with the fixed effect estimation strategy for the RWTP. First, the estimator of $\theta_j$, $\hat{\theta}_j$, would not be reliable and precise if the number of stores in retail center $j$, $n_j$, is small. In addition, if different spatial processes operate at the property and retail center scale, the conflation of unobservable influences will violate the independence of errors assumption through heteroskedastic or spatially correlated error in the covariance structure (G. Dong and Wu 2016). Multilevel models are approaches that allow variance between areas, so they remedy these issues by treating the retail center as part of the explanation for geographically varying outcomes (Owen, Harris, and Jones 2016). Instead of fitting a spatial fixed effect that assumes the relationship between the predictors and response holds constant, multilevel models factor both spatial heterogeneity (differences) between areas and also dependencies (similarities) within them (Jones 1991). Put another way, this allows two stores located within the same retail center to be more alike in their outcomes than would be expected given their individual characteristics alone. Correlation within boundaries is expected because stores are assumed to be affected by the same aggregate effects, also known as group dependence (G. Dong and Harris 2015). Our second model thus requires a two-level hierarchical structure, with an outcome variable measured at the lower level geography (individual stores) and a more aggregate spatial scale for the higher level (retail centers). We specify a random intercept multilevel model as

$$y_{ij} = x'_{ij}\beta + u_j + \varepsilon_{ij}, \tag{2}$$

where $u_j$ ($j = 1, 2, \ldots, J$) measures the RWTP of the retail center $j$, assumed to be independently distributed as $N(0, \sigma_u^2)$. Under Equation 2, the dependency between stores in the same retail center $j$ is

$$\operatorname{Corr}(y_{ij}, y_{i'j}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}. \tag{3}$$

The random intercepts $u_j$ are a linear combination of fully pooled and no-pooling models.
The fully pooled model ignores heterogeneity by fitting a common intercept for all retail center boundaries, whereas the no-pooling model, identical to the spatial fixed effect, assumes a separate intercept for each retail center. The multilevel model introduces the partial pooling, or shrinkage, of the RWTP effect toward the global intercept (Gelman and Hill 2007). This is expressed as

$$u_j = s_j u_j^{NP} + (1 - s_j) u_j^{FP}, \tag{4}$$

where $u_j$ can be seen as a compromise between the no-pooling estimate $u_j^{NP}$, where each retail center is assigned its own indicator variable, and the fully pooled estimate $u_j^{FP}$, which assumes a single intercept for all retail centers. This precision-weighted compromise is governed by the shrinkage factor $s_j$ (e.g., Goldstein 2003),

$$s_j = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2 / n_j}, \tag{5}$$

where the weighting for $s_j$ is determined by the sample size in the $j$th retail center ($n_j$) and the variation within ($\sigma_e^2$) and between ($\sigma_u^2$) groups (Goldstein 2011). For example, when a retail center's boundaries contain a small number of stores $n_j$, the RWTP estimate is pulled toward the fully pooled estimate. Similarly, when the boundary-level variance $\sigma_u^2$ is small (that is, when the RWTP of retail center boundaries is similar), estimates are pooled more toward the mean level than when $\sigma_u^2$ is large.
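A short numerical sketch may help to show how partial pooling behaves. It uses the standard precision-weighted shrinkage factor from the multilevel modeling literature, $s_j = \sigma_u^2 / (\sigma_u^2 + \sigma_e^2 / n_j)$, with variance components treated as known; the function names and the numbers are illustrative assumptions, not estimates from this study.

```python
import numpy as np

def shrinkage_factor(n_j, sigma2_u, sigma2_e):
    """Weight placed on the no-pooling estimate for a center with n_j stores."""
    n_j = np.asarray(n_j, dtype=float)
    return sigma2_u / (sigma2_u + sigma2_e / n_j)

def partial_pool(u_np, u_fp, n_j, sigma2_u, sigma2_e):
    """Precision-weighted compromise between no-pooling and fully pooled estimates."""
    s = shrinkage_factor(n_j, sigma2_u, sigma2_e)
    return s * np.asarray(u_np, dtype=float) + (1.0 - s) * u_fp

# Hypothetical variance components and three centers of increasing size
sigma2_u, sigma2_e = 0.25, 1.0
n_j = np.array([2, 20, 200])
u_np = np.array([1.0, 1.0, 1.0])    # identical no-pooling estimates
u_fp = 0.0                          # fully pooled (global) estimate

s = shrinkage_factor(n_j, sigma2_u, sigma2_e)
pooled = partial_pool(u_np, u_fp, n_j, sigma2_u, sigma2_e)
icc = sigma2_u / (sigma2_u + sigma2_e)   # within-center correlation

print(s.round(3), pooled.round(3), round(icc, 3))
```

Although all three centers share the same no-pooling estimate, the center with only two stores is pulled most strongly toward the global value, whereas the center with 200 stores is left almost unchanged.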
The use of a multilevel model in the estimation routines for constructing hierarchies of retail centers represents a novel application. Multilevel models have been used to produce league tables by inferring school effectiveness from individual pupils' educational attainment but, to our knowledge, have never been applied to explore hierarchies of retail centers. Moreover, although this area of social science has a rich history in the direct application of multilevel models (Goldstein 2003), they rarely account for explicit spatial hierarchies in the empirical design. Thus, there has been emerging interest in incorporating spatial dependence into multilevel models (G. Dong and Harris 2015). Although we pursue a modeling objective similar to educational research by building a league table of retail centers, in the remainder of this section we develop an empirical strategy that accounts for potential spatial autocorrelation across the system of retail centers in space.
The model specified in Equation 2 adopts a deterministic, container-driven view of geographical space that contrasts with the reality that two retail centers located close together might be similar given their spatial proximity (Owen, Harris, and Jones 2016). In our case, we expect the RWTP effect induced by the retail center at a particular location to be directly dependent on observed values at surrounding locations, with the intensity of this influence moderated by geographic proximity. This interaction is described by a simultaneous autoregressive (SAR) process. If the data-generating process contains inherent spatial correlation, this could bias the estimated variance used for statistical inference. To account for this possibility in a spatially explicit hierarchy, G. Dong and Harris (2015) distinguished between two kinds of spatial dependence: horizontal and vertical. Horizontal dependencies are the spatial dependencies between lower level units that are the traditional concern of spatial econometrics (Anselin 1988), and vertical dependencies are top-down group dependencies due to regional effects. One potential problem is the vertical spatial dependence effect that causes the RWTP effect in nearby retail centers to be more similar than in those farther away. To account for this possibility, we specify a hierarchical spatial autoregressive (HSAR) model (G. Dong and Harris 2015) that integrates SAR processes for the higher level residuals (Equation 6). In this specification, $M$ is a $J \times J$ spatial weights matrix that captures the interaction structure of stores by assigning nonzero weight $M_{ij} \neq 0$ to pairs of observations assumed to be spatial neighbors and zero otherwise, and $M_j$ is the $j$th row of $M$. Given the spatial characteristics of the data points, we define neighbors using an exponential decay function with the distance bandwidth $d$ set to 5 km (see Note 2). Following convention, $M$ is row-standardized so that each row sums to unity, $\sum_j M_{ij} = 1$. The parameter $\lambda$ quantifies the correlation of RWTP, with higher values of $\lambda$ leading to spatial covariance that dissipates more slowly across higher orders of neighbors. The reduced form of $\theta$ in Equation 6 applies the spatial filter $(I_J - \lambda M)^{-1}$ to the higher level residuals, which captures any vertical spatial dependence in the RWTP effect $\theta_j$. A Leontief expansion of the matrix inverse gives $(I_J - \lambda M)^{-1} = I + \lambda M + \lambda^2 M^2 + \lambda^3 M^3 + \cdots$ and demonstrates spatial feedback, in which an increasing order of neighbors creates bands of ever larger reach around each location, relating every retail center to every other one (Anselin 2003).
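The weights construction and the Leontief expansion described above can be sketched numerically. The following is a minimal illustration rather than the implementation used in the study: the kernel form exp(-d_ij / d), the truncation of weights beyond the bandwidth, the function name `exp_decay_weights`, and the five centroid coordinates are all assumptions made for the example. The value 0.232 is the HSAR estimate of the spatial parameter reported in the empirical findings.

```python
import numpy as np

def exp_decay_weights(coords_km, bandwidth_km=5.0):
    """Row-standardized weights from an assumed kernel exp(-d_ij / bandwidth)."""
    coords_km = np.asarray(coords_km, dtype=float)
    diff = coords_km[:, None, :] - coords_km[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # Euclidean distances
    w = np.exp(-dist / bandwidth_km)
    w[dist > bandwidth_km] = 0.0                      # assumption: no neighbors beyond the bandwidth
    np.fill_diagonal(w, 0.0)                          # a center is not its own neighbor
    row_sums = w.sum(axis=1, keepdims=True)
    # Row-standardize; rows without neighbors stay at zero.
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

# Hypothetical retail center centroids, in km.
coords = [[0, 0], [3, 0], [6, 0], [6, 4], [9, 4]]
M = exp_decay_weights(coords)

lam = 0.232                                           # reported HSAR estimate
J = M.shape[0]
exact = np.linalg.inv(np.eye(J) - lam * M)            # the spatial filter

# Leontief (Neumann) expansion: I + lam*M + lam^2*M^2 + ...
approx = np.zeros((J, J))
term = np.eye(J)
for _ in range(30):
    approx += term
    term = lam * (term @ M)

print("rows sum to one:", np.allclose(M.sum(axis=1), 1.0))
print("largest gap between series and inverse:", np.abs(approx - exact).max())
```

Because the row-standardized matrix has spectral radius of at most one, the series converges whenever the spatial parameter is below one in absolute value, and every entry of the filter is positive once the centers form a connected neighbor graph. That is the sense in which the process relates every retail center to every other one.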
A different, but related, model that we specify next is a hierarchical spatial error (HSE) model, which is similar except that we specify a spatially autocorrelated error term in $\eta$.
A final methodological consideration relates to Lesage's (2014) empirical question as to whether the spatial process under study is global or local. The covariance structure induced by the HSAR model is global, because the spatial process relates every retail center to every other one. A hierarchical spatial moving average (HSMA) process, on the other hand, considers only first- and second-order neighbors, beyond which the spatial covariance is zero (Anselin 2003). The data-generating process of Equation 8 collapses to a reduced form in which everything holds as in Equation 6, except that we introduce the HSMA parameter $\gamma$. Unlike in Equation 6, because $(I_J + \gamma M)$ is not inverted in the HSMA specification, the induced spatial covariance has only local range. This approach is intuitive, because there might only be local interaction across a neighborhood of different retail center boundaries, as opposed to interaction across the entire system at the national extent.
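The contrast between global and local range can be checked on a toy example. The sketch below assumes unit-variance, independent higher level residuals, so that the induced covariance is the filter multiplied by its own transpose. The six centers arranged on a line and the helper `chain_weights` are hypothetical. The parameter values are the HSAR and HSMA estimates reported in the empirical findings.

```python
import numpy as np

def chain_weights(n):
    """Row-standardized contiguity weights for n centers on a line (hypothetical layout)."""
    adj = np.zeros((n, n))
    for i in range(n - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    return adj / adj.sum(axis=1, keepdims=True)

J = 6
M = chain_weights(J)
I = np.eye(J)
lam, gamma = 0.232, 0.189      # reported HSAR and HSMA estimates

sar_filter = np.linalg.inv(I - lam * M)   # inverted filter: global reach
sma_filter = I + gamma * M                # filter is not inverted: local reach

# Covariance induced on the center effects when the residuals are i.i.d. with unit variance.
sar_cov = sar_filter @ sar_filter.T
sma_cov = sma_filter @ sma_filter.T

for steps in (1, 2, 3):
    print(f"centers {steps} step(s) apart: "
          f"SAR cov = {sar_cov[0, steps]:.5f}, SMA cov = {sma_cov[0, steps]:.5f}")
```

Under the moving average filter, the covariance between the first center and a center three steps away is exactly zero, whereas the autoregressive filter leaves a small positive covariance at every separation.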
Whereas the standard multilevel model is estimated using maximum likelihood, the spatial models are estimated using Markov chain Monte Carlo (MCMC) simulation, the stationary distribution of which is the target probability distribution for the parameters. MCMC simulations are typically the only feasible approach for fitting spatial models that introduce the complexities of place relatedness into the variance-covariance structure (Lesage 1997). With these motivations, conditional Gibbs samplers are derived for the HSAR (G. Dong and Harris 2015) and HSE and HSMA (Wolf et al. 2018) models to obtain posterior samples for each parameter. In this way, the joint density for the parameters is broken into univariate conditional probabilities, where every successive parameter draw is conditioned on the draw for the previous parameter value (Geman and Geman 1984). Not only is this sampling technique computationally efficient, but the draws from the parameter space $\{\beta, \sigma^2_e, \sigma^2_u, \lambda\}$ accumulate to an entire distribution for each parameter. In our case, we summarize each parameter estimate by the median value across the distribution but also with interval calculations. Each sampling chain is simulated for 10,000 iterations, with the first 5,000 draws discarded as "burn-in" to allow the posterior distributions for each parameter to converge. In addition, we assess the serial autocorrelation of the posterior draws by examining the effective number of independent samples. As in time series analysis, we evaluate this because ignoring autocorrelation in correlated sequences can understate estimates of the variance. On a final methodological note, the same weakly informative prior distributions were assigned to the model parameters in each model (see Note 3). Further details on the technical implementation of the spatial models are found in G. Dong and Harris (2015) for the HSAR and in Wolf et al. (2018) for the HSE and HSMA.
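The post-processing steps described above (discarding burn-in, summarizing by the median and an interval, and checking the effective number of independent samples) can be sketched as follows. This is not the Gibbs sampler itself. The chain is a synthetic autocorrelated sequence standing in for posterior draws, and the function names and the simple truncated-autocorrelation estimator are assumptions made for illustration.

```python
import numpy as np

def effective_sample_size(draws):
    """Crude ESS: n / (1 + 2 * sum of autocorrelations up to the first non-positive lag)."""
    x = np.asarray(draws, dtype=float)
    n = x.size
    x = x - x.mean()
    acov = np.correlate(x, x, mode="full")[n - 1:] / n   # autocovariance at lags 0..n-1
    rho = acov / acov[0]
    tau = 1.0
    for k in range(1, n):
        if rho[k] <= 0:          # truncate once the autocorrelation dies out
            break
        tau += 2.0 * rho[k]
    return n / tau

def summarize_chain(chain, burn_in=5_000):
    """Median, 95 percent interval, and ESS of the draws kept after burn-in."""
    kept = np.asarray(chain, dtype=float)[burn_in:]
    lower, median, upper = np.percentile(kept, [2.5, 50.0, 97.5])
    return {"median": median, "interval": (lower, upper),
            "ess": effective_sample_size(kept), "n_kept": kept.size}

# Synthetic stand-in for 10,000 posterior draws of one parameter (not real output).
rng = np.random.default_rng(0)
phi, mu = 0.5, 0.23
chain = np.empty(10_000)
chain[0] = mu
for t in range(1, chain.size):
    chain[t] = mu + phi * (chain[t - 1] - mu) + rng.normal(scale=0.02)

summary = summarize_chain(chain)
print(summary)
```

When successive draws are positively correlated, the effective sample size falls below the number of retained draws, which is why an interval computed as if the draws were independent would be too narrow.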
Despite our empirical strategy becoming increasingly sophisticated, a commonality between each model is that we obtain a free measure of uncertainty alongside each estimate of the willingness to pay for a particular retail center, $\theta_j$. Uncertainty is expressed through confidence intervals for the spatial fixed effect and multilevel models and through Bayesian credible intervals for the HSAR, HSE, and HSMA models. Because point estimates for $\theta_j$ represent an absolute ranking, overlapping interval estimates for each retail center imply confidence or credibility regions that change the rank-ordered estimate of centers in the hierarchy. Where the density bands of the confidence or credible intervals become less disjoint, there is increased uncertainty in the disambiguation between ranks of a given set of retail centers. Uncertainty measurements are desirable in cases where retail centers contain a small number of stores $n_j$. Returning to Equation 4, because this carries implications for the calculated $u_j$, an uncertainty estimate is valuable for ascertaining a measure of trust in the rankings of retail centers.
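The logic of treating overlapping intervals as indistinct ranks can be illustrated with a short sketch. The three centers and their point and interval estimates below are hypothetical values in log units, and the helper names are assumptions introduced for the example.

```python
def intervals_overlap(a, b):
    """True when closed intervals a = (lower, upper) and b = (lower, upper) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def indistinct_pairs(estimates):
    """Rank centers by point estimate and list pairs whose intervals overlap.

    estimates maps a center name to (point, lower, upper).
    """
    ranked = sorted(estimates, key=lambda name: estimates[name][0], reverse=True)
    pairs = []
    for i, first in enumerate(ranked):
        for second in ranked[i + 1:]:
            if intervals_overlap(estimates[first][1:], estimates[second][1:]):
                pairs.append((first, second))
    return ranked, pairs

# Hypothetical estimates (log units): point, lower bound, upper bound.
estimates = {
    "Center A": (10.4, 9.7, 11.1),
    "Center B": (9.8, 9.3, 10.3),
    "Center C": (8.9, 8.6, 9.2),
}

ranked, pairs = indistinct_pairs(estimates)
print("rank order:", ranked)
print("pairs that cannot be separated:", pairs)
```

In this example the first two centers hold different point-estimate ranks but cannot be separated once their intervals are considered, whereas the third is distinct from both.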

Data
Our point of departure for the proposed methodological approach is a geographical data set sorted into a hierarchical structure consisting of units grouped at two different levels. The points in our lower level geography represent 355,076 individual high street stores across England and Wales that are located inside retail center boundaries. This includes franchised chains such as fast-food outlets, supermarkets, and clothing stores (McDonald's, Tesco, and Primark, for example) but also independent retailers with more local scope. These data were collected by a large pool of surveying teams from the Local Data Company (LDC) in 2015 and include various descriptors for each property such as retail function and occupancy status. The most important characteristic of the data is that commercial addresses in the LDC database are matched to addresses in the VOA 2010 rating list (VOA 2018). This affords us a business rate valuation for every nondomestic premise, allowing us to unlock a rich, unique, and highly granular data set that provides a new and alternative lens through which to explore the implicit value describing the willingness to pay of an area. In all, for every store we have store-level variables that offer a rich description of the premise's physical condition. This includes data collected by VOA surveyors on the date of assessment, such as the total floor area, the number of rooms in the premise, and the number of car parking spaces, but also data collected by the LDC that categorize the business's function (see Note 4). A full description of the variables that enter our design matrix is provided in the Appendix.
There are limitations to the VOA rating list that introduce error, though, especially given that the list is not primarily intended for data analysis. The most notable limitation is what Astbury and Thurstain-Goodwin (2014) described as the regional difference in data collection techniques, which affects the extent to which the ratable value reflects the market tone of a particular area and could lead to over- and underpredictions of the business rate assessed for the premise. Moreover, although the rating list was released in 2010, the ratable values set are actually conditioned on the 2008 market climate. Given that the UK economy underwent the shock of an economic crisis during this period, a time characterized by fragile consumer confidence, a decline in household disposable incomes, and rising shop vacancy on the high street (Department of Business, Innovation and Skills [BIS] 2011), it is likely that the overall market tone has been over- or undervalued across retail centers for England and Wales. Despite these considerations, the VOA rating list provides highly granular and geographically accurate access to data reflecting local market economic conditions for the national extent.
The retail center is the observational unit from which we obtain home buyer willingness to pay estimates. Our higher level units are represented by 2,951 exogenously determined retail centers across England and Wales. Conceptually, retail centers are an appropriate choice for this purpose because they are drivers of local economic performance and reflect the wider economic health and social well-being of the urban environment (BIS 2011). Moreover, although they are often viewed as hubs for retail activity, they also exhibit a multitude of heterogeneous uses, including services, offices, and residential and public buildings (Teller and Elms 2012). The boundaries used in this study were produced by Pavlis, Dolega, and Singleton (2017) as a successor to boundaries developed by Thurstain-Goodwin and Unwin (2000) for the Department for Communities and Local Government (DCLG) in 2004, with the exception that they were intended to move away from a definition of town center locations of employment to functional spaces delineated for retail. Although the resulting retail centers might not perfectly align with those designated in governmental planning policy, they provide a consistent method for comparing retail centers nationally. In all, these boundaries are our higher level geographical unit and represent the functional economic market area of the retail center. The resulting spatial hierarchical structure of the data is illustrated in Figure 1 through the example of Liverpool.

Empirical Findings
In this section, we develop a discussion of our empirical findings in two main directions: First, we step through each of the modeling approaches, building intuition toward our preferred specification; second, we introduce a validation exercise to evaluate whether variation in the estimated RWTP effect can be attributed to characteristics that are generally associated with attractive areas.

Model Validation
Our point of departure is a discussion of the results produced by the proposed methodology (see Note 5). Thus, before exploring the subtleties of the multilevel specifications, we first step through a description of the parameter estimates for the store-level explanatory characteristics. To do this, we use the classical multilevel model (shown by the second column in Table 1) as a baseline but note that the estimates are generally consistent across each model. Overall, the estimates for the store-level covariates that enter our design matrix are fairly intuitive and of the expected signs for all models. For example, every additional room in the premise increases the ratable value by 7 percent, which is consistent with the VOA's mandate to adjust the ratable value by property-level characteristics (VOA 2014). This is also reflected in the number of car parking spaces, where each additional ten spaces increases the ratable value by 1.3 percent. Somewhat surprisingly, increasing the total floor area by 1,000 m² only seemed to increase the ratable value by 2.3 percent, but given that we control for different store sizes latently with the store category variables, this is somewhat expected. On the whole, the store type categorizations are consistent with conventional wisdom. The ratable value for premises such as takeaway food outlets, for example, is generally 20.2 percent less than the reference category, showrooms. This makes sense because the locations of takeaway outlets are generally linked to geographical inequalities in health outcomes (Daras et al. 2018), which are simultaneously related to environments that are considered less desirable. On the other end, the ratable value for hypermarket stores (with a gross floor area over 2,500 m²) is three times greater, which is expected given the number of business assets such as lifts, warehouse machinery, and CCTV security systems common to large supermarket stores.
We next address model selection by means of goodness-of-fit tests. In each case, every model had a highly similar root mean square error (RMSE) and log-likelihood value. Although the $R^2$ of the spatial fixed effect model (67.6 percent) is marginally higher than that of the multilevel model(s) (67.1 percent to 67.2 percent), the spatial fixed effect fits a parameter for each of the $J = 2{,}951$ retail centers, which contrasts with the regularization introduced by hierarchical pooled effects in the multilevel models for smaller groups. In other words, not only does the spatial fixed effect likely overfit, but the estimates and standard errors of the retail center fixed effect will be noisier in places with a smaller number of properties. In our case this is pertinent because the minimum number of stores across retail center boundaries is two. For this reason, we motivate our preferred specification as the multilevel model(s). Because the performance of each multilevel specification is comparable on goodness-of-fit grounds, however, we undertake further examination of the substantive effects in the RWTP estimate later on.
A comparison of the rank-ordered estimates for the RWTP effect $\theta_j$ is visualized for each model in Figure 2, which reflects our rankings of point estimates for RWTP, along with a measure of uncertainty shown by the 95 percent confidence (fixed effect and multilevel model) and credible (HSAR, HSE, HSMA) intervals. If any of the confidence or credible density bands for any two models overlap, the two estimated ranks are not distinct. The rankings, 1 to 2,951, are presented on the x-axis, and the y-axis displays the estimated RWTP value in log units. Additionally, we include a zoomed inset to highlight movement in the estimated RWTP value, positioned at the window that displays the most variability in the estimated scores between each model. Taking a closer look, it appears that the movement of the RWTP estimate relative to the spatial fixed effect model, marked by the green line, is not uniform. In the upper and lower tails, for example, there is systematic variation in the parameter estimates between the spatial fixed effect and the estimates of the multilevel models. This suggests that the point estimates for RWTP values deviate widely from the multilevel models for the most and least desirable retail center boundaries, with little systematic variation in between. At a general level, Figure 2 reproduces a classical result, because the estimates of the multilevel model demonstrate hierarchical pooled effects; that is, shrinkage toward the global intercept. Here, the estimates exhibit improved precision, which contrasts with the higher magnitude of uncertainty in the spatial fixed effect estimates, as shown by the more extreme and noisy estimates in the upper and lower tails of the figure.
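The pooling behavior described here can be illustrated with the textbook shrinkage factor for a random-intercept model, in which a center with $n_j$ stores retains the share $n_j \sigma^2_u / (n_j \sigma^2_u + \sigma^2_e)$ of its raw deviation from the global intercept. This is a sketch of the general mechanism under that standard formula, which may differ in detail from the article's own equations. The between-center variance of 0.503 is the value reported for the multilevel model, whereas the within-center variance of 0.54 is an assumed figure chosen only for illustration.

```python
def shrinkage_weight(n_j, sigma2_u, sigma2_e):
    """Share of a center's raw deviation retained after partial pooling."""
    return n_j * sigma2_u / (n_j * sigma2_u + sigma2_e)

sigma2_u = 0.503   # between-center variance reported for the multilevel model
sigma2_e = 0.54    # assumed within-center variance (illustrative, not a reported figure)

# Compare a center with the minimum of two stores against larger centers.
weights = {n_j: shrinkage_weight(n_j, sigma2_u, sigma2_e) for n_j in (2, 20, 200)}
for n_j, w in weights.items():
    print(f"n_j = {n_j:>3}: retains {w:.3f} of the raw deviation from the global intercept")
```

Centers with few stores are pulled most strongly toward the global intercept, which is consistent with the fixed effect estimates being most extreme in the tails of the ranking.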
Shrinkage effects can be seen more clearly in Figure 3, where we sample nine retail centers from our rankings to demonstrate movement in the RWTP estimates by expanding the point estimates horizontally along a two-dimensional axis for each model. In the case of Meridian Leisure Park, Leicester, for example, the fixed effect estimate is shrunk from 10.42 ± 0.73 to 9.72 ± 0.51 in the MLM. In real terms, this reflects a change in magnitude from £33,523.43 to £16,647.24 when we exponentiate from log units. Interestingly, what is also observable for this retail center is what Wolf et al. (2018) described as "spatially-local shrinkage," where spillovers from the jth adjacent retail centers cause growth in the spatial multilevel estimates toward the mean of neighboring retail centers, from 9.72 ± 0.51 to 9.82 ± 0.51 under the HSAR model. Although none of the interval estimates become disjoint for each retail center, the findings from the spatial models suggest that the RWTP estimate is moderated by shrinkage toward the values of neighboring retail centers.
Having discussed our rankings, we now build intuition toward our preferred specification for the RWTP estimate, which we begin by turning our attention to the within-boundary ($\sigma^2_e$) and between-boundary ($\sigma^2_u$) variance components. By combining these measures, we calculate the variance partitioning coefficient (VPC) for the multilevel model, which measures the proportion of variance explained by the hierarchical structure, $\sigma^2_u / (\sigma^2_u + \sigma^2_e)$. This measure outlines the correlation between stores within the same retail center and is required to ascertain the percentage of variation explained by the retail center differences for store $i$ in retail center $j$ (Browne et al. 2005). The VPC statistic reveals a value of 0.482, meaning that 48.2 percent of the variance in the response is explained by the retail center geography.
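Two of the calculations in this passage are easy to verify: the back-transformation of the Meridian Leisure Park estimates from log units to pounds, and the VPC. In the sketch below, the log estimates, the between-boundary variance of 0.503, and the VPC of 0.482 are the values reported in the article. The within-boundary variance is not quoted there, so it is backed out from the reported VPC as an assumption, and the function name `vpc` is introduced for the example.

```python
import math

# Reported point estimates in log units for Meridian Leisure Park, Leicester.
fe_log, mlm_log = 10.42, 9.72
fe_gbp = math.exp(fe_log)      # spatial fixed effect estimate in pounds
mlm_gbp = math.exp(mlm_log)    # multilevel (MLM) estimate in pounds
print(f"fixed effect: £{fe_gbp:,.2f}   MLM: £{mlm_gbp:,.2f}")

def vpc(sigma2_u, sigma2_e):
    """Variance partitioning coefficient for a two-level random-intercept model."""
    return sigma2_u / (sigma2_u + sigma2_e)

sigma2_u = 0.503                                   # reported between-boundary variance (MLM)
reported_vpc = 0.482
# Assumption: within-boundary variance implied by the reported VPC, not a quoted figure.
sigma2_e = sigma2_u * (1 - reported_vpc) / reported_vpc
print(f"implied within-boundary variance: {sigma2_e:.3f}")
print(f"VPC: {vpc(sigma2_u, sigma2_e):.3f}")
```

Because the exponential is convex, a symmetric interval in log units becomes asymmetric in pounds, so interval bounds should be exponentiated separately rather than derived from the back-transformed point estimate.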
This VPC value motivates the empirical decision to take our multilevel models as the preferred specification(s) over the fixed effect model, with these models able to flexibly accommodate the covariance structure induced by the grouping of stores by retail center boundary. Our search for a preferred specification continues by evaluating potential spatial dependence in the RWTP effect $u_j$ estimated by the MLM. Given that the MLM assumes RWTP values to be independent of each other, we follow G. Dong and Harris (2015) and use Moran's I to test whether the estimates for RWTP are spatially dependent. A Moran's I statistic for $u_j$ premised on the spatial weights matrix $M$ for the retail center polygons returns a coefficient value of 0.174 (p < 0.001). This illustrates positive spatial autocorrelation for the estimated RWTP values, which motivates using the spatial models given that the core model assumption of independence for $u_j$ across retail centers does not hold.
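For readers unfamiliar with the statistic, Moran's I can be computed directly from a vector of estimates and a spatial weights matrix. The sketch below uses a hypothetical arrangement of six centers on a line with made-up values, purely to show the sign of the statistic under clustered and alternating patterns. It does not reproduce the reported coefficient, and it omits the permutation or analytical inference that would normally accompany the test.

```python
import numpy as np

def morans_i(values, W):
    """Moran's I: (n / S0) * (z' W z) / (z' z), with z the mean-centered values."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    n = z.size
    s0 = W.sum()                       # sum of all weights
    return (n / s0) * (z @ W @ z) / (z @ z)

def chain_weights(n):
    """Row-standardized contiguity weights for n centers on a line (hypothetical layout)."""
    adj = np.zeros((n, n))
    for i in range(n - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    return adj / adj.sum(axis=1, keepdims=True)

W = chain_weights(6)
clustered = [1, 1, 1, -1, -1, -1]      # similar values sit next to each other
alternating = [1, -1, 1, -1, 1, -1]    # neighbors always differ

i_clustered = morans_i(clustered, W)
i_alternating = morans_i(alternating, W)
print(f"clustered: {i_clustered:.3f}   alternating: {i_alternating:.3f}")
```

A positive value, as reported for the estimated RWTP effects, indicates that neighboring centers tend to have similar values, and a negative value indicates that neighbors tend to differ.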
We subsequently turn direct attention to the spatial multilevel models. Given that our hierarchical approach is fully Bayesian, trace plots are required to monitor the convergence of each parameter to the target distribution (see Appendix). In each case the parameters were assessed to have converged. Moreover, there was no serial autocorrelation identified in the stationary Markov chain for each parameter. The first substantive difference we observe is that not accounting for spatial dependence leads the MLM to marginally overestimate the retail center boundary variance $\sigma^2_u$ relative to the spatial models; $\sigma^2_u$ can be understood as the average variation of RWTP values across the retail centers in log units. Here, $\sigma^2_u$ falls from 0.503 in the MLM to 0.484, 0.477, and 0.492 in the HSAR, HSE, and HSMA models, respectively. We also recover evidence of a significant spatial autoregressive parameter $\lambda$, which is indicative of spatial spillover effects of RWTP values between neighboring retail centers. This is recovered because $\lambda$ is distinct from zero at the 95 percent credible interval. Interestingly, the density of the covariance structure seems to affect the estimate for $\lambda$. The HSMA model, with a sparse covariance structure that is restricted to first- and second-order neighbors, estimates a $\lambda$ value of 0.189. On the other hand, models with a denser covariance structure such as the HSAR and HSE estimate highly similar values of 0.232 and 0.230. Each of these estimates indicates spatial interaction effects among retail center boundaries.
To aid the visualization of spatial patterning, we illustrate the case of Liverpool in Figure 4 with the assistance of legendgrams that show the distribution of RWTP values across all 2,951 retail centers, color coded using $k = 8$ break points classified using Fisher-Jenks optimization (Jenks 1967). Each cell highlights a selected retail center in red, with the corresponding RWTP estimate shown by the vertical bar stemming from the x-axis of the legendgram, with 95 percent confidence and credible intervals shaded on either side to highlight uncertainty in the estimate. From left to right, the columns identify the RWTP estimate for the fixed effect, multilevel, HSAR, and HSMA models. From a first reading, the spatial patterning in Figure 4 seems to reveal a fragmented picture of vitality and decline, with less desirable retail centers observed in the immediate hinterland of the prospering regional center (identifiable by the large red polygon in the first row). Overall, from this reading of Figure 4, we are able to discern spatial hierarchies that possibly fragment Merseyside's functional market area, with certain retail centers eliciting a higher willingness to pay than neighboring centers.

Technical Validation
After motivating our preferred methodological approach, we undertake a validation exercise to evaluate whether the estimated RWTP effect $\theta_j$ for each retail center in the HSAR model responds to characteristics that are generally identifiable for prospering and thriving areas. Here, we regress $\theta_j$ on a selection of variables using ordinary least squares, first, to assess whether any of the variation in the estimated RWTP values can be attributed to variation in the selected explanatory variables and, second, to quantify the strength of relationship, if any, between the response and explanatory features. Principal attention is paid to the 2011 census Workplace Zone (WZ) population characteristics (Mitchell 2014) that represent individuals working in the retail center. As commuter patterns change, the spatial distribution of the working population changes, which holds when the bulk of economic activity occurs during "traditional" office hours (Mitchell 2014), and WZ statistics are preferable because they describe the daytime working population who commute to their places of work inside the retail center. The WZ variables we use include the percentage of people who report their general health as "good" or better, the percentage of individuals with no qualifications, the percentage of homeowners, the percentage of workers enrolled in higher managerial occupations, and the percentage of individuals in full-time employment. Other variables we consider include the vacancy rate of stores in the retail center calculated from the LDC database (see Note 6); a raw count of stores from the LDC database; the amount of urban green space (m²; Daras et al. 2018); logged median housing values for the 2015 rolling year (Land Registry 2016); and, finally, binary variables for regions in England and Wales that reflect Nomenclature of Territorial Units for Statistics (NUTS) subdivisions (North West, London, West Midlands, and Wales, for example).
In each case, the variables are spatially joined (see Note 7) from WZ statistical units to the retail center boundary polygons.
The findings are displayed in Table 2. Generally, they are consistent with expectations, although there are deviations from conventional wisdom. For WZ characteristics, an increase in the number of individuals with "good" health (or better) by 1 percent increases the RWTP value by 3.9 percent. Similarly, an increase in the number of people with no qualifications by 1 percent decreases the value by 2.9 percent. Surprisingly, an increase in the number of workers in higher managerial occupations by 1 percent decreases the RWTP of the retail center by 4.9 percent. At first glance this result appears counterintuitive, but managerial workers are more likely to work in financial districts characterized by mostly office space, which are not necessarily perceived as desirable in the same way that consumer amenities such as leisure plazas and urban green spaces are.
Next, we consider retail center boundary characteristics. For every additional 100 stores in the retail center, the RWTP value increases by 4.3 percent, which implies that patrons value a large number of available retail destinations. Similarly, as the vacancy rate increases by 1 percent, the RWTP of the area decreases by 2.2 percent. Again, this is consistent with expectations that a large number of vacant units deteriorates the vibrancy of the streetscape by revealing signs of decay. On the other hand, the availability of urban green space was not a significant determinant. For the regional indicators, relative to the East Midlands reference category, we recover some examples of regional inequality. Whereas retail centers in the East of England are estimated as having the highest RWTP value (28.9 percent), there is a clear disparity in the estimated values for North West England (9.5 percent), South West England (4.1 percent), and to a lesser extent, Yorkshire and The Humber (13.5 percent) when compared to South East England (20.3 percent) and London (20.3 percent). These inequalities are broadly consistent with regional variations in wealth across England and Wales (Rowlingson and McKay 2011). In all, the validation exercise demonstrates a relationship between RWTP values and socioeconomic characteristics that is consistent with conventional wisdom. Although not conclusive, the coefficients of our estimates suggest that a decline in RWTP is related to urban environments with poorer social and community well-being. This begins to address a key gap in the evidence linking retail center outcomes to characteristics of the urban environment that is identified by the Department of Business, Innovation and Skills for England and Wales (BIS 2011).

Conclusion
The depth and breadth of leisure and retail opportunity are increasingly linked to the desirability of places to live (Glaeser, Kolko, and Saiz 2001). Because the quality of urban environments cannot be qualified by a natural unit of analysis, the willingness to pay to receive an amenity-rich environment has often been explored through the lens of the residential housing market. The groundings of this article were motivated by similar hedonic analyses, except that we used business rates for commercial properties alongside a nontrivial methodological framework to estimate RWTP, for which we provide a detailed exposition for reproducing the analysis. Similar to approaches that analyze housing prices, by controlling for property-level characteristics such as the total floor area, car parking spaces, and store type, the remaining variation in the business rate was attributed to the RWTP. This was possible because business rates approximate local market conditions: ratable values are set by estimating a basic cost per square meter that is adjusted to reflect similar properties in the same area (VOA 2014). Despite our empirical motivations, particular attention to how the RWTP estimates interface with the unique geographic behavioral characteristics of the UK retail landscape was required. Due to restructuring of the traditional brick-and-mortar retail landscape through growth in electronic retailing, our study required particular attention to the nuances of UK retail spaces. It is often argued that growth in online retailing has deleterious effects that cause physical shopping opportunity to be substituted online (Doherty and Ellis-Chadwick 2010). Despite these concerns, online retail has recently been linked to complementarity and modification processes. These processes blend traditional retail with e-commerce through the integration of technologies such as click and collect points that operate as points of delivery for Internet sales.
Thus, through the market system of using business rates, the RWTP estimates relate to how much consumers, through their behavior, value a given retail area. Within the context of behavioral patterns, this allowed us to unpack hierarchies of retail spaces. These spaces are an underlying driver of the sustainability of built environments and so, by implication, reveal geographical patterns in urban growth and development.
Multilevel models have a rich history in the educational sciences literature for building league tables of school performance (Goldstein 2003). We used similar motivations to build a ranking of retail centers, except that unlike previous studies, we allowed for possible spatial autocorrelation that operates on the basis of geographical proximity. This is because the RWTP effect per retail center is likely to covary based on spatial proximity. With these motivations, and by revamping the traditional focus of multilevel modeling techniques, we were able to derive retail center estimates of RWTP. A particular focus on retail centers, our geography of choice, was because they have been argued as a moderating influence on urban hierarchies (Dennis, Marsland, and Cockett 2002). Yet, there is a limited availability of national data for measuring the economic and social value of retail centers, with a presumptive attitude in UK policy circles that the impacts of policy instruments such as the Town Centers First approach are "instinctively positive" (BIS 2011). In producing ranked estimates, we remedied these uncertainties by building quantifiable evidence to directly observe disparities in RWTP across networks of retail centers. More concretely, the derived scores allow an understanding of a particular retail center's position within a network of centers; this can be used as a proxy of economic health and an indicator of the pull that particular retail center catchments have on consumers in the area. From this, retail practitioners might be able to use the derived scores as proxies for footfall generation, which would allow them to deduce consumer appeal of particular centers. Knowledge of such characteristics might be used in decision-making processes, such as determining investment and divestment outcomes or the rationalization of store portfolios, for example. At a general level, our findings also provide a platform for researchers to build on. 
The applied methodology provides a blueprint for constructing hierarchies of retail centers that is replicable and generalizable to similar contexts, conditional on data availability. In addition, to our knowledge, the study is the first of its kind to build indicators that describe hierarchies of retail centers across a national extent, with previous studies typically limited to smaller case study areas. Finally, a core and intentional contribution of the article is the potential for exploration of hypotheses in retail geography that were previously unavailable due to the absence of statistical data on retail centers.
To conclude this article, we illustrate elaborations to consider for future research. One refinement involves the addition of further attributes at the store or retail center level to be specified into the modeling approach. This might involve undertaking visual, in-person surveys for small case study areas to collect image attributes identified in Gomes and Paula (2017) such as parking security, atmosphere perception, or mix and quality of stores within the retail center boundary, for example. Due to the practicality concerns of obtaining these highly granular measures in this study, this direction would reduce the number of retail centers for which the approach can return RWTP estimates. The benefit, however, is that it would allow an estimation of the willingness to pay for highly granular measures that describe image-based attributes of attractive shopping environments. As a final remark, the advantage of the applied methodology is that it can be redeployed in the future to generate timely updates. This is possible because the VOA continues to reassess the ratable values of nondomestic properties according to a five-year revaluation cycle (VOA 2014). Conditional on the VOA continuing to release their ratings list as an open data product, the area estimates of RWTP are updatable over time. Future research might develop retail center rankings into a longitudinal data product that allows an exploration into the temporal characteristics of RWTP and how successive five-year windows alter the rank-ordered positions of retail centers.

Funding
This project was funded by ESRC studentship funding.

ORCID
Daniel Arribas-Bel http://orcid.org/0000-0002-6274-1619

Notes
1. Given this article's intersection between retail geography and urban economics, particular attention to the conceptualization of attractiveness is required. Whereas urban economists perceive attractiveness through an estimation of willingness to pay, retail geographers might observe the attractiveness of shopping environments through a lens of image-based characteristics such as cleanliness of the shopping environment, plurality and variety of shops, or existence of fun and entertainment programs (El-Adly 2007; Chebat, Sirgy, and Grzeskowiak 2010; Gomes and Paula 2017). Thus, to avoid confusion, in this article we adopt the direction of the former and describe our measure of interest by willingness to pay.
2. Spatial connectivity at the retail center level is specified using an exponential decay function, where $d_{ij}$ is the Euclidean distance between retail centers $i$ and $j$ and $d$ is the fixed-distance bandwidth. A semivariogram was used as an exploratory tool for determining the distance at which the spatial dependence in business rates between retail centers became negligible (see Appendix).
3. The following conjugate priors are chosen:
4. LDC premise types were recoded in accordance with VOA Special Categories outlined in Rhodes and Brien (2017).
5. Potential problems of multicollinearity were assessed using variance inflation factor (VIF) scores for each predictor variable in the spatial fixed effect model. VIF scores revealed no evidence of such problems, with scores of about 3.0 leading us to continue with our inferential exercise.
6. Vacancy rates are defined as the proportion of all available retail units that are vacant or unoccupied.
7. Because there is only partial overlap between the retail centers and WZ polygons, the resulting WZ statistics are aggregated by the mean value for the intersecting WZ geometries when joined to each retail center polygon.