A Longitudinal Mixed Logit Model for Estimation of Push and Pull Effects in Residential Location Choice

Abstract We develop a random effects discrete choice model for the analysis of households’ choice of neighborhood over time. The model is parameterized in a way that exploits longitudinal data to separate the influence of neighborhood characteristics on the decision to move out of the current area (“push” effects) and on the choice of one destination over another (“pull” effects). Random effects are included to allow for unobserved heterogeneity between households in their propensity to move, and in the importance placed on area characteristics. The model also includes area-level random effects. The combination of a large choice set, large sample size, and repeated observations mean that existing estimation approaches are often infeasible. We, therefore, propose an efficient MCMC algorithm for the analysis of large-scale datasets. The model is applied in an analysis of residential choice in England using data from the British Household Panel Survey linked to neighborhood-level census data. We consider how effects of area deprivation and distance from the current area depend on household characteristics and life course transitions in the previous year. We find substantial differences between households in the effects of deprivation on out-mobility and selection of destination, with evidence of severely constrained choices among less-advantaged households. Supplementary materials for this article are available online.


Introduction
The sorting of households into neighborhoods has fundamental implications for many social outcomes. Patterns of residential mobility shape the spatial distribution of populations and the extent to which certain groups, such as ethnic minorities, immigrants and the economically disadvantaged, are geographically concentrated. Any study of neighborhood effects (the causal effect of place on people) must tackle the issue of nonrandom selection into places of residence. Indeed, it can be argued that the selection process is integral to understanding the key issues of interest (Bergström and van Ham 2010).
The availability of geographical identifiers in many datasets means it is relatively easy to document where different types of household are located. It is far more challenging to understand why households have chosen the neighborhoods in which they live. The difficulty arises because neighborhoods are multidimensional packages of different attributes, in which types of dwellings, physical geography, and social composition are chosen simultaneously and tradeoffs between different area characteristics when making a decision are inevitable. Of particular interest in residential location choice is the distinction between factors that "push" households to leave a neighborhood and factors that attract or "pull" households toward a particular neighborhood (Lee, Oropesa, and Kanan 1994). However, most previous research has focused exclusively on push factors (e.g., Crowder and South 2008;van Ham and Clark 2009), while studies that have investigated pull factors have restricted analysis to the sample of movers and characterised origin and destination areas along a single dimension of neighborhood quality (e.g., Clark 1992;Clark and Ledwith 2007;Rabe and Taylor 2010). A promising approach adopted in recent research is a conditional logit model, a type of discrete choice model where the categorical response is the neighborhood of residence and neighborhood attributes are included as predictors (Ioannides and Zabel 2008;Hedman, van Ham, and Manley 2011;Bruch and Mare 2012). These models allow location choice to depend on multiple neighborhood characteristics, but their use to date has been confined to cross-sectional data or, when applied to longitudinal data, has not fully exploited information on repeated residential choices.
Longitudinal data of the type collected in household panel studies and population registers provide rich information on changes in households' place of residence and in their demographic and socioeconomic characteristics. A particular advantage of longitudinal data, thus far ignored, is the ability to disentangle push and pull effects of area characteristics; these effects are confounded when a cross-sectional approach is taken. In this article, we develop a longitudinal discrete choice model parameterized in a way that permits separation and joint estimation of push and pull effects. While mixed logit models have been developed for the analysis of longitudinal discrete choice data (Jain, Vilcassim, and Chintagunta 1994;Bhat and Guo 2004), to our knowledge, the distinction between push and pull effects of alternative-specific attributes has not been considered previously. Our model also allows the importance of area characteristics to depend on observed time-varying household characteristics, and household-level random effects are included to account for unobserved heterogeneity in the effects of neighborhood characteristics which may lead to dependency in household choices over time. The random effects are assumed to follow a multivariate normal distribution which allows for correlation between push and pull effects of a given neighborhood characteristic and between the effects of different characteristics. We show that nonzero random effect correlations have an important role to play in relaxing the "independence of irrelevant alternatives" assumption. Spatial correlation can be incorporated by including random effects at a higher geographical level than neighborhood, for example districts, allowing for correlation in the choice propensities for neighborhoods within the same district.
A potential barrier to the application of random effects discrete choice models to neighborhood choice is the computational challenge presented by typically large analysis datasets. National panel studies usually have several thousand respondents which, when combined with multiple waves of measurement and a large choice set, render existing likelihood-based and Bayesian estimation approaches infeasible. Even when cross-sectional models have been used, previous studies have reduced the size of the problem by focusing on choices within a small geographical area (Bruch and Mare 2012) or by taking a small subsample of the full choice set (Hedman, van Ham, and Manley 2011). We propose an efficient, flexible Bayesian procedure for fitting random effects (mixed) discrete choice models to longitudinal data with large choice sets. The mixed logit model is illustrated in an analysis of residential choice in England between 1998 and 2008 using data from the British Household Panel Survey linked with neighborhood-level census data. In particular, we consider how the push and pull effects of area deprivation and the pull effect of distance from the current neighborhood depend on observed household characteristics, while allowing for unobserved household heterogeneity. More generally, the methods we propose have applications in other settings where push and pull effects may be of interest. For example, in studies of brand loyalty in market research, a high price could push customers away from a particular brand, while a low price of a rival product could cause customers to switch brands.

A Longitudinal Mixed Logit Model for Residential Location Choice

Preliminaries
The following models are described in terms of household rather than individual choices, while recognising that co-resident nonrelated adults may be independent decision makers with regard to residential mobility and neighborhood choice. Our working definition of a household is given in Section 3. We begin by setting out the classic discrete choice model before considering extensions that allow separation of push and pull effects of area characteristics, and differential push and pull effects by observed and unobserved household characteristics. Suppose that household i (i = 1, . . . , n) chooses its area of residence at year t (t = 2, . . . , T ) from a set C it containing R it potential areas. Some households will move between years t − 1 and t, which will typically lead to a change of area. The choice set is permitted to vary across households and time because it is both behaviorally unrealistic and computationally infeasible for households to choose from a common fixed set of areas (Lee and Waddell 2010). For example, the choice set might be restricted to the set of areas within a specified distance of a household's location at t − 1.
Let $y_{it}$ be the categorical response indicating the observed area of residence for household $i$ at year $t$. A general discrete choice model for the response probability is
$$p_{rit} = \Pr(y_{it} = r \mid \eta_{it}) = \frac{\exp(\eta_{rit})}{\sum_{k \in C_{it}} \exp(\eta_{kit})}, \quad r \in C_{it}, \qquad (1)$$
where $\eta_{rit}$ is the linear predictor, which will usually be a function of area characteristics and their interactions with household characteristics, both defined at year $t-1$, and $\eta_{it}$ is the vector of linear predictors for household-year observation $it$.
The model in Equation (1) can also be expressed in terms of the log-ratio of the choice probabilities for a pair of areas $r$ and $s$:
$$\log\left(\frac{p_{sit}}{p_{rit}}\right) = \eta_{sit} - \eta_{rit}. \qquad (2)$$
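Equations (1) and (2) can be checked numerically with a few lines of code. The following is a minimal sketch, not part of the original analysis: the function name `choice_probabilities` and the three linear predictor values are hypothetical and chosen only for illustration.

```python
import math

def choice_probabilities(eta):
    """Multinomial logit probabilities for one household-year (Equation 1)."""
    m = max(eta)  # subtract the maximum for numerical stability
    expd = [math.exp(e - m) for e in eta]
    total = sum(expd)
    return [e / total for e in expd]

# Hypothetical linear predictors for a choice set of three areas
eta = [2.0, 0.5, -1.0]
p = choice_probabilities(eta)

# Equation (2): the log-ratio for two areas equals the difference
# in their linear predictors, whatever else is in the choice set
log_ratio = math.log(p[1] / p[0])
```

The log-ratio for the first two areas does not involve the third linear predictor, which is the property discussed later as the independence of irrelevant alternatives.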

A Multinomial Logit Model with Push and Pull Effects of Area Characteristics
We begin with a model that includes only area characteristics, but allow their effects on the choice between areas $r$ and $s$ at $t$ in Equation (2) to depend on whether a household is resident in one of these areas at $t-1$. We show how this distinction, possible only with longitudinal data, allows estimation of push and pull effects of area characteristics. Let $w_{ri(t-1)} = I(y_{i(t-1)} = r)$, where $I(\cdot)$ is the indicator function, and denote by $z_{r(t-1)}$ a $p$-vector of characteristics of area $r$ defined at wave $t-1$. More generally, area characteristics can also be household specific, for example the distance between a potential area and the current place of residence. The linear predictor for a simple model including only area characteristics is
$$\eta_{rit} = \alpha w_{ri(t-1)} + \beta^T z_{r(t-1)} w_{ri(t-1)} + \gamma^T z_{r(t-1)} (1 - w_{ri(t-1)}), \qquad (3)$$
where $\alpha$ is a scalar and $\beta$ and $\gamma$ are parameter vectors. The type of model defined by Equations (1) and (3), with covariates relating to response alternatives and fixed coefficients across alternatives, was originally referred to as a conditional logit model (McFadden 1974). However, it was subsequently shown to be equivalent to the multinomial logit model, which traditionally has subject-specific covariates and alternative-specific coefficients (Maddala 1983). In common with most of the discrete choice literature, we therefore refer to the model as a multinomial logit model hereafter. These models are widely used for analyzing categorical responses where interest lies in the effects of attributes of the response alternatives on individual choice, including applications to brand preference (e.g., Jain, Vilcassim, and Chintagunta 1994) and transportation demand (e.g., Ben-Akiva and Lerman 1985).
In the case of residential choice, Bruch and Mare (2012) first proposed the inclusion of the lagged choice indicator $w_{ri(t-1)}$ and its interaction with area characteristics $z_{r(t-1)}$ to allow for the possibility that households "evaluate their current location differently from other potential destinations." Equation (3) represents a reparameterization of their model in which $\beta$ and $\gamma$ may be interpreted as push and pull effects of $z_{r(t-1)}$. The interpretation of $\beta$ and $\gamma$, and our definition of push and pull effects, can be seen more clearly by considering how a household's choice between two areas $r$ and $s$ at $t$ depends on their residence at $t-1$ according to the values of $w_{ri(t-1)}$ and $w_{si(t-1)}$.

Case 1. Resident in $r$ at $t-1$ ($w_{ri(t-1)} = 1$, $w_{si(t-1)} = 0$). From Equations (2) and (3), the log-odds of moving to area $s$ between $t-1$ and $t$ versus remaining in area $r$ are
$$\log\left(\frac{p_{sit}}{p_{rit}}\right) = -\alpha - \beta^T z_{r(t-1)} + \gamma^T z_{s(t-1)}. \qquad (4)$$
We can also derive the log-odds of moving out of area $r$ to any area as
$$\log\left(\frac{1 - p_{rit}}{p_{rit}}\right) = \log\left(\sum_{s \in C_{it},\, s \neq r} \exp(\gamma^T z_{s(t-1)})\right) - \alpha - \beta^T z_{r(t-1)}. \qquad (5)$$
From Equations (4) and (5) we see that area characteristics influence residential mobility and location choice in two ways. First, there are effects of the characteristics of the current area $r$ on moving out of $r$, represented by $\beta$. We refer to $\beta$ as the "push" effect of $z_{r(t-1)}$. Second, the characteristics of a potential area $s$ may attract (or repel) households from choosing $s$ as a new destination. We refer to $\gamma$ as the "pull" effect of $z_{s(t-1)}$. The push effect of $z$ is conditional on the level of $z$ in other areas in the choice set, while the pull effect of $z$ is conditional on the level of $z$ in the current area. The model may be extended to allow for interactions between elements of $z_{r(t-1)}$ and $z_{s(t-1)}$. For example, a household's sensitivity to area deprivation when choosing a new area may depend on the level of deprivation experienced in the current area. In Equation (4), each area characteristic $z$ has both a push and pull effect. For some types of $z$, for example the distance or travel time between a potential destination $s$ and the origin area, only a pull effect is defined. When the origin area is $r$, this is equivalent to setting $z_{r(t-1)} = 0$.
α is the baseline log-odds of staying in r rather than moving to a new area, referred to as the inertia parameter. The estimate of α is expected to be large and positive because most households do not change area between t − 1 and t.
Case 2. Resident in neither $r$ nor $s$ at $t-1$ ($w_{ri(t-1)} = 0$, $w_{si(t-1)} = 0$). For a household which is not resident in area $r$ or $s$ at $t-1$, the log-odds of choosing area $s$ over area $r$ are
$$\log\left(\frac{p_{sit}}{p_{rit}}\right) = \gamma^T (z_{s(t-1)} - z_{r(t-1)}), \qquad (6)$$
where, as in Case 1, $\gamma$ may be interpreted as a pull effect, but now of one new area over another.
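The two cases can be illustrated with a small numerical sketch. It assumes the linear predictor takes the form α w + β'z w + γ'z (1 − w), which is the sign convention implied by the interpretation of α, β, and γ in the text. The parameter values, area labels, and function name are hypothetical.

```python
import math

def linear_predictor(w, z, alpha, beta, gamma):
    """Assumed form of Equation (3) for one area.

    w: 1 if the household lived in the area at t-1, else 0.
    z: list of area characteristics measured at t-1.
    """
    push = sum(b * zk for b, zk in zip(beta, z))
    pull = sum(g * zk for g, zk in zip(gamma, z))
    return alpha * w + push * w + pull * (1 - w)

# Hypothetical parameter values and a three-area choice set (illustration only)
alpha, beta, gamma = 3.0, [-0.4], [-0.2]
z = {"r": [1.0], "s": [0.5], "k": [-0.5]}
w = {"r": 1, "s": 0, "k": 0}  # household resident in area r at t-1

eta = {a: linear_predictor(w[a], z[a], alpha, beta, gamma) for a in z}

# Case 1: log-odds of moving to s versus remaining in r
case1 = eta["s"] - eta["r"]
# Case 2: log-odds of choosing s over k, neither being the current area
case2 = eta["s"] - eta["k"]
# Log-odds of moving out of r to any other area in the choice set
move_out = math.log(sum(math.exp(eta[a]) for a in ("s", "k"))) - eta["r"]
```

With these values the Case 1 log-odds equal −α − βz_r + γz_s and the Case 2 log-odds equal γ(z_s − z_k), so the push coefficient enters only comparisons that involve the current area.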

Allowing for Household Heterogeneity in Inertia and the Effects of Area Characteristics
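The linear predictor in Equation (3) can be extended to allow inertia and the push and pull effects of $z_{r(t-1)}$ to depend on a $q$-vector of household characteristics $x_{i(t-1)}$:
$$\eta_{rit} = (\alpha_0 + \alpha_1^T x_{i(t-1)}) w_{ri(t-1)} + [\beta_0^T z_{r(t-1)} + \beta_1^T (x_{i(t-1)} \otimes z_{r(t-1)})] w_{ri(t-1)} + [\gamma_0^T z_{r(t-1)} + \gamma_1^T (x_{i(t-1)} \otimes z_{r(t-1)})] (1 - w_{ri(t-1)}), \qquad (7)$$
where $x_{i(t-1)} \otimes z_{r(t-1)} = [x_{1i(t-1)} z_{r(t-1)}^T, \ldots, x_{qi(t-1)} z_{r(t-1)}^T]^T$ is the $qp$-vector formed by taking the element-wise product of $x$ and $z$. $\alpha_1$ is the effect of $x$ on the log-odds of a move out of area $r$ between years $t-1$ and $t$, and $\beta_0$ and $\gamma_0$ are the push and pull effects of $z$ when $x = 0$. Writing $\beta_1 = [\beta_{11}^T, \ldots, \beta_{1q}^T]^T$, $\beta_{1k}$ is the change in the push effect of $z$ for a 1-unit change in household characteristic $x_k$ ($k = 1, \ldots, q$), and $\gamma_{1k}$ is the corresponding change in the pull effect. For simplicity of notation, the same set of household characteristics is assumed to influence mobility and to moderate the effects of the $z$, but this restriction may be relaxed.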
It is straightforward to extend Equation (7) to include choice-specific random intercepts and coefficients of x. We consider the addition of choice random effects, and random effects defined at a higher level than neighborhood, in Section 2.6.
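The interaction structure in Equation (7) can be sketched as follows. This is an illustration under the assumption that the household-by-area interaction vector stacks x_1 z, ..., x_q z in that order, matching the partition of β_1 into β_11, ..., β_1q. All numerical values and function names are hypothetical.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def kron(x, z):
    """Stack x_1 * z, ..., x_q * z into one list of length q * p."""
    return [xk * zj for xk in x for zj in z]

def linear_predictor_x(w, z, x, a0, a1, b0, b1, g0, g1):
    """Assumed form of Equation (7) for one area and one household-year."""
    xz = kron(x, z)
    inertia = a0 + dot(a1, x)        # inertia moderated by x
    push = dot(b0, z) + dot(b1, xz)  # push effect moderated by x
    pull = dot(g0, z) + dot(g1, xz)  # pull effect moderated by x
    return (inertia + push) * w + pull * (1 - w)

# Hypothetical values with q = 2 household and p = 2 area characteristics
x, z = [1.0, 2.0], [0.5, -1.0]
a0, a1 = 3.0, [-0.5, 0.1]
b0, b1 = [-0.4, 0.0], [0.1, 0.0, 0.0, 0.0]
g0, g1 = [-0.2, 0.0], [0.05, 0.0, 0.0, 0.0]

eta_current = linear_predictor_x(1, z, x, a0, a1, b0, b1, g0, g1)
eta_other = linear_predictor_x(0, z, x, a0, a1, b0, b1, g0, g1)
# Setting x = 0 recovers the simpler model with coefficients a0, b0, g0
eta_baseline = linear_predictor_x(1, z, [0.0, 0.0], a0, a1, b0, b1, g0, g1)
```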

The Independence of Irrelevant Alternatives Assumption
It follows from Equations (1) and (2) that the log ratio of the choice probabilities for areas $s$ and $r$ depends on measured characteristics of these areas, but not of other potential areas. This property of the discrete choice model is known as the "independence of irrelevant alternatives" (IIA), and it implies that the choice between $s$ and $r$ is unaffected by the addition or exclusion of other alternatives (Ben-Akiva and Lerman 1985). The source of the IIA assumption can be seen more clearly if the model is expressed in terms of continuous latent choice propensities (or utilities) $y^*_{rit}$ which underlie the observed choice $y_{it}$ such that $y_{it} = r$ if $y^*_{rit} > y^*_{sit}$ for all $s \neq r$. The discrete choice model given by Equation (1) can be written $y^*_{rit} = \eta_{rit} + \epsilon_{rit}$, where the $\epsilon_{rit}$ are iid residuals, assumed to follow a Type I extreme value distribution with variance $\pi^2/6$. The IIA property stems from the independence of the $\epsilon_{rit}$, and will be invalid if the latent propensities to choose areas $r$ and $s$ are correlated. In the context of residential choice, nonzero residual correlation may arise because of similarity between areas on unmeasured factors used by households in deciding where to live.
Various approaches have been proposed to relax the IIA assumption, including generalized extreme value (GEV) models and the multinomial probit model. The nested logit model is the most widely used type of GEV model (e.g., Hensher, Rose, and Greene 2005), but it requires the researcher to partition the alternatives a priori into subsets (nests) of similar alternatives within which IIA might reasonably hold. While the nested logit model is of limited use in modeling residential choice because of the difficulty in identifying areas that might be similar on unmeasured characteristics, it has been used to model residential location choice jointly with related decisions such as mode of transport and time of travel for various types of activity (Ben-Akiva and Bowman 1998) and residential mobility (Lee and Waddell 2010). Both applications embed a multinomial logit model for residential choice within a nested logit model for the joint decision processes. The multinomial probit model allows explicitly for residual correlation by assuming that the $\epsilon_{rit}$ follow a multivariate normal distribution, but this is infeasible in situations where the choice set is large. A more flexible way of accommodating similarity between alternatives is to use a mixed model which allows for unobserved heterogeneity between households in the effects of area characteristics.

Unobserved Household Heterogeneity
The most popular models for unobserved between-subject heterogeneity are the normal-mixed model and the latent class model (see Hensher and Greene (2003) and Keane and Wasi (2013) for reviews of mixed logit models). The normal-mixed model includes normally distributed random coefficients for the effects of $z_{r(t-1)}$. Log-normal distributions may be assumed for coefficients that are expected to have the same sign for all subjects. The latent class model avoids parametric distributional assumptions and assumes that subjects come from a finite set of subpopulations (Domanski and von Haefen 2010). We focus on normal-mixed models for several reasons. First, a large number of latent classes may be required to capture complex patterns of heterogeneity. In the application to residential choice, for example, there may be between-household differences in push and pull effects of multiple area characteristics, leading to a multidimensional finite mixture distribution. Second, the estimation and interpretation of latent class models is complicated by having a separate set of parameters for each class. Third, the direction and magnitude of the correlations between multivariate normal random effects are of direct substantive interest. For instance, correlated random effects can provide insights into the nature of the association between the household-specific push and pull effects of an area attribute $z$, or between the push or pull effects of two different area attributes.
While mixed logit models have been applied to cross-sectional data, identification of unobserved household heterogeneity will generally be much improved by the availability of longitudinal data. (See Revelt and Train (1998) for discussion and applications of mixed logit models with normal and log-normal random effects for repeated choice data.) In a normal-mixed model the coefficients of the main effects of $z$, $\beta_0$ and $\gamma_0$ in Equation (7), are replaced by
$$\beta_{0i} = \beta_0 + u_{\beta i}, \qquad \gamma_{0i} = \gamma_0 + u_{\gamma i},$$
where $u_{\beta i}$ and $u_{\gamma i}$ are vectors of household-specific random effects which capture variation between households in the importance placed on $z$ in location decisions. We also allow for unobserved heterogeneity in households' attachment to their current areas by replacing the inertia intercept parameter $\alpha_0$ by the random coefficient $\alpha_{0i} = \alpha_0 + u_{\alpha i}$. The random effects $u_i = [u_{\alpha i}, u_{\beta i}^T, u_{\gamma i}^T]^T$ are assumed to follow a multivariate normal distribution with mean 0 and variance $\Omega_u$. Thus the linear predictor can be partitioned as $\eta_{rit} = \mu_{rit} + \delta_{rit}$, where $\mu_{rit}$ is the systematic (or fixed) component given by Equation (7) and $\delta_{rit}$ is a random component which varies across households:
$$\delta_{rit} = u_{\alpha i} w_{ri(t-1)} + u_{\beta i}^T z_{r(t-1)} w_{ri(t-1)} + u_{\gamma i}^T z_{r(t-1)} (1 - w_{ri(t-1)}). \qquad (8)$$
$\delta_{rit}$ also varies over time, but only through the observed predictors $w_{ri(t-1)}$ and $z_{r(t-1)}$.
We now show how the inclusion of household random effects relaxes the IIA assumption by considering how they affect the ratio of (conditional and marginal) choice probabilities and the correlation between latent choice propensities.

Ratio of Choice Probabilities. Equation (1) with $\eta_{rit}$ now defined as the sum of (7) and (8) gives the probability of choosing area $r$ conditional on the household random effects $u_i$. The addition of random coefficients $u_{\beta i}$ and $u_{\gamma i}$ allows the ratio of the subject-specific choice probabilities for areas $r$ and $s$ to vary across households according to differences in the (unobserved) importance placed on observed area characteristics $z$. The IIA property is still assumed to hold at the household level because the ratio of subject-specific probabilities for areas $r$ and $s$ does not depend on characteristics of any other area. However, this is not the case for the ratio of unconditional or marginal choice probabilities. Letting $\eta_{rit}(u_i)$ denote the linear predictor for the random effects model, the marginal (population-averaged) response probability is given by
$$\Pr(y_{it} = r) = \int \frac{\exp(\eta_{rit}(u_i))}{\sum_{k \in C_{it}} \exp(\eta_{kit}(u_i))} \, \phi(u_i) \, du_i, \qquad (9)$$
where $\phi(\cdot)$ is a multivariate normal pdf. The log-ratio of the marginal response probabilities for areas $r$ and $s$ is no longer simply the difference in the linear predictors, as in Equation (2), because the summation in the denominator of (9) does not cancel. Thus the ratio of the marginal probabilities for $r$ and $s$ will depend on characteristics of other areas, and the IIA assumption is relaxed at the population level (Train 2003).
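The contrast between conditional and marginal probability ratios can be illustrated by Monte Carlo integration over the random effect distribution. The sketch below is deliberately stylised and is not the model fitted in this article: it uses a single area characteristic, a single normally distributed random pull coefficient, and three hypothetical candidate destinations, none of which is the current area. All values are assumptions made for illustration.

```python
import math
import random

def probs(eta):
    m = max(eta)
    e = [math.exp(v - m) for v in eta]
    s = sum(e)
    return [v / s for v in e]

def marginal_probs(z, gamma, draws):
    """Average the conditional probabilities over draws of the random effect u."""
    totals = [0.0] * len(z)
    for u in draws:
        p = probs([(gamma + u) * zk for zk in z])
        totals = [t + pk for t, pk in zip(totals, p)]
    return [t / len(draws) for t in totals]

rng = random.Random(0)
gamma, sigma = -0.5, 1.0  # hypothetical mean and SD of the pull coefficient
draws = [rng.gauss(0.0, sigma) for _ in range(20000)]

z_full = [1.0, -1.0, 1.5]  # three candidate areas
z_sub = [1.0, -1.0]        # the same choice set with the third area removed

# Conditional on u = 0, the ratio for areas 1 and 2 ignores the third area
c_full, c_sub = probs([gamma * v for v in z_full]), probs([gamma * v for v in z_sub])
cond_full, cond_sub = c_full[0] / c_full[1], c_sub[0] / c_sub[1]

# Marginally, the ratio changes when the third area is added or removed
m_full, m_sub = marginal_probs(z_full, gamma, draws), marginal_probs(z_sub, gamma, draws)
ratio_full, ratio_sub = m_full[0] / m_full[1], m_sub[0] / m_sub[1]
```

In this setup the third area resembles the first on z, so households with a positive coefficient on z divert probability from the first area to the third, and the marginal ratio for the first two areas shifts even though every conditional ratio is unchanged.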
Correlation Between Latent Choice Propensities. The inclusion of household-specific effects induces a correlation between the latent choice propensities for any pair of areas $r$ and $s$ because $u_i$ is common across the response alternatives faced by household $i$ at time $t$. Consider a simplified form of Equations (7) and (8) with one area characteristic $z_{r(t-1)}$, leading to three random effects $(u_{\alpha i}, u_{\beta i}, u_{\gamma i})$ with covariance matrix
$$\Omega_u = \begin{pmatrix} \sigma_\alpha^2 & & \\ \sigma_{\alpha\beta} & \sigma_\beta^2 & \\ \sigma_{\alpha\gamma} & \sigma_{\beta\gamma} & \sigma_\gamma^2 \end{pmatrix}.$$
Random effect covariances have received little attention in previous applications of mixed logit models and, indeed, random effects are commonly assumed to be independent. A notable exception is Revelt and Train (1998) who contended that correlation between random effects would generally be expected. In the present application, the random effect covariances are of particular interest because they provide information about the associations between households' latent mobility preferences and the importance they place on $z$ in residential decisions. Suppose, for example, that high values of $z$ in an area are associated with an increased probability of moving out of that area ($\beta_0 < 0$) and a reduced probability of moving in ($\gamma_0 < 0$). Then $\sigma_{\beta\gamma} > 0$ implies a positive association between the push and pull effects of $z$, adjusting for the moderating effects of $x$: households who attach a higher-than-average importance to $z$ in the decision to move out of an area ($u_{\beta i} < 0$) also tend to have a higher-than-average sensitivity to $z$ when choosing a new destination ($u_{\gamma i} < 0$). The random effect covariances also play a crucial role in relaxing the IIA assumption, as shown below.
The covariance between the propensities to choose areas r and s for household i at t depends on a household's residence at t − 1 as follows.
Case 1. Resident in $r$ at $t-1$ ($w_{ri(t-1)} = 1$, $w_{si(t-1)} = 0$). For a household considering a move from area $r$ to $s$ between $t-1$ and $t$, the covariance between the latent propensities of remaining in $r$ and moving to $s$ can be derived from Equation (8) as
$$\mathrm{cov}(y^*_{rit}, y^*_{sit}) = \mathrm{cov}(\delta_{rit}, \delta_{sit}) = \sigma_{\alpha\gamma} z_{s(t-1)} + \sigma_{\beta\gamma} z_{r(t-1)} z_{s(t-1)}, \qquad (10)$$
when $\mathrm{cov}(\epsilon_{rit}, \epsilon_{sit}) = 0$ and $\mathrm{cov}(\delta_{rit}, \epsilon_{sit}) = 0$ for $r \neq s$. Thus the covariance depends on two components: (i) the value of $z$ in the potential area, weighted by the covariance between the household-specific mobility propensity ($u_{\alpha i}$) and importance of $z$ as a pull factor ($u_{\gamma i}$), and (ii) the similarity between areas $r$ and $s$ on $z$, weighted by the covariance between household-specific importance of $z$ as a push and pull factor. If $z$ is mean centred, the second component of the covariance will be highest for two areas with extreme above-average or below-average values on $z$ and 0 for two average areas.
Case 2. Resident in neither $r$ nor $s$ at $t-1$ ($w_{ri(t-1)} = 0$, $w_{si(t-1)} = 0$). The covariance between the latent propensities of choosing between two potential areas when currently resident in neither is
$$\mathrm{cov}(y^*_{rit}, y^*_{sit}) = \sigma_\gamma^2 z_{r(t-1)} z_{s(t-1)}. \qquad (11)$$
Thus the covariance between the latent attractiveness of two potential areas depends on their similarity with respect to $z$ and on the between-household variance in the pull effect of $z$.
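Both covariance expressions follow from writing the random component for each area as a linear combination of the three random effects and applying the covariance matrix. A minimal sketch, assuming a single area characteristic and a hypothetical (positive definite) covariance matrix with rows and columns ordered as (u_α, u_β, u_γ):

```python
def design(w, z):
    """Weights on (u_alpha, u_beta, u_gamma) in the random component for one area."""
    return [w, z * w, z * (1 - w)]

def cov_delta(a_r, a_s, omega):
    """Covariance of two linear combinations of the random effects: a_r' Omega a_s."""
    return sum(a_r[i] * omega[i][j] * a_s[j] for i in range(3) for j in range(3))

# Hypothetical covariance matrix, ordered (alpha, beta, gamma)
omega = [[1.0, 0.2, 0.1],
         [0.2, 0.5, 0.3],
         [0.1, 0.3, 0.4]]
z_r, z_s = 0.5, 2.0

# Case 1: household resident in r, considering a move to s
case1 = cov_delta(design(1, z_r), design(0, z_s), omega)
# Case 2: household resident in neither r nor s
case2 = cov_delta(design(0, z_r), design(0, z_s), omega)
```

Here `case1` reduces to σ_αγ z_s + σ_βγ z_r z_s and `case2` to σ_γ² z_r z_s, so both vanish if the relevant (co)variances are zero.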

Unobserved Area Heterogeneity
The model may be further extended to allow for the effects of unmeasured area characteristics on location choice by including choice-specific random effects $v_r$ in the linear predictor. However, it can be seen from the expressions for the covariance between the latent propensities of choosing neighborhoods $r$ and $s$ given by Equations (10) and (11) that the inclusion of choice-specific random effects does not help to relax the IIA assumption unless $\mathrm{cov}(v_r, v_s) \neq 0$. A natural extension would be to impose a spatial autocorrelation structure on the neighborhood effects, for example to allow a nonzero covariance between neighborhoods that share a boundary. Such spatial correlation would arise if neighborhoods in close proximity share unmeasured attributes that influence a household's location choice. Bhat and Guo (2004) proposed a mixed spatially correlated logit model for residential choice at a cross-section that includes a dissimilarity parameter measuring the correlation between adjacent areas.
One issue when considering the addition of choice-specific effects in applications where the choice set is large is that the number of potential choices can exceed the number of decisions actually made, leading to an identification problem because many potential neighborhoods will not be chosen by the survey respondents over the observation period. An alternative approach is to specify random effects for a broader area classification which we refer to as "districts." Denote by $v_{r(d)} \sim N(0, \sigma_v^2)$ the random effect for the district $d$ containing neighborhood $r$. When neighborhoods in the choice set are nested within districts, we have a hierarchical structure and the inclusion of higher-level random effects allows for a form of spatial correlation. Specifically, for neighborhoods $r$ and $s$ in the same district, Equations (10) and (11) will include $\sigma_v^2$ as an additional term. The model assumes that neighborhoods are exchangeable within districts, but this may be reasonable after adjusting for neighborhood characteristics such as distance from the current location.
The area-level random effects capture differences between districts in the mobility propensity of their residents and in their attractiveness as places to live. We might expect the influence of these unmeasured area characteristics to differ for residents and nonresidents, perhaps due to differences in knowledge about area attributes that are not captured by covariates. A possible specification of the area effects, that mirrors the structure of Equation (7), is where parameter λ allows the effect of unmeasured district characteristics on household location choice, and thus the between-district variance, to depend on whether the household is currently resident in district d.

Sampling Alternatives in Large Choice Sets
Estimation of a multinomial logit model with alternative-specific attributes requires the data to be structured so that there is a record for each of the $R_{it}$ response alternatives for household $i$ at year $t$. This can lead to a prohibitively large analysis file when the choice set is large, especially when decisions are observed over a long period for a large sample of households. A useful consequence of the IIA property, however, is that consistent parameter estimates can be obtained from a random subset of the full choice set, selected without replacement and including the record corresponding to the chosen alternative (McFadden 1978). For each household-year observation $it$, denote by $q_{rit}$ the probability that the record for alternative $r$ is selected, where $q_{rit} = 1$ if $y_{it} = r$. McFadden (1978) described a situation where the choice set is fixed and $q_{rit} = q$. More generally, we may wish to include information about the likelihood that household $i$ chooses alternative $r$ at $t$, referred to as importance sampling (Ben-Akiva and Bowman 1998; Bhat, Govindarajan, and Pulugurta 1998; Brownstone, Bunch, and Train 2000). In residential location choice, for example, the choice set for many households is restricted to areas within commuting distance of the current place of work. This leads to substantial variation in $R_{it}$ across households and time, where $R_{it}$ will typically be considerably larger in metropolitan areas than in rural areas. In such cases $q_{rit}$ will be inversely proportional to $R_{it}$, and unequal selection probabilities are accommodated in the model by including $-\log(q_{rit})$ as an offset term (e.g., Ben-Akiva and Lerman 1985; Bruch and Mare 2012). Unfortunately, when the IIA assumption is relaxed, for example by including household-specific random coefficients, McFadden's theoretical result no longer holds and random sampling of choice sets may yield inconsistent estimates (Nerella and Bhat 2004; Keane and Wasi 2012).
This has led authors to explore empirically the impact of the size of the sampling fraction $q$ on parameter estimates and standard errors from mixed multinomial logit models. Nerella and Bhat (2004) conducted a simulation study with 750 individuals, a choice set of size 200 and $q$ varying between 0.025 and 0.75. They found a substantial impact of $q$ on the bias and efficiency of the estimated parameters, and suggested that $q$ should be set at 0.25 as a minimum. However, their study was based on a cross-sectional design for which mixed logit models may be weakly identified. Other research, using data on repeated choices as in our application to neighborhood choice, suggests that reliable estimates may be obtained using much smaller sampling fractions. From a potential choice set of 689 alternatives, Brownstone, Bunch, and Train (2000) used a random subset of 28 (4%) and reported that increasing the sampling fraction had little effect on parameter estimates. In the most comprehensive study to date, Keane and Wasi (2012) considered the impact of using random subsets of the choice set for three alternative mixed MNL models for panel data, including the normal mixed model, through Monte Carlo simulation and sensitivity analysis of real data. Based on simulations with 200 individuals, 20 choice occasions and 60 alternatives, biases were small when random subsets of 10 or 20 alternatives were used.
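The sampling scheme described in this section can be sketched for a single household-year as follows. The example assumes simple random sampling without replacement of the non-chosen alternatives, so that each has selection probability (J − 1)/(R − 1) when J records are retained from a choice set of size R. The function name, choice set size, and seed are hypothetical.

```python
import math
import random

def sample_alternatives(chosen, choice_set, n_sampled, rng):
    """Return (area, offset) records: the chosen area plus a random sample of the others.

    The offset is -log(q), where q is the probability that the record is selected.
    """
    others = [a for a in choice_set if a != chosen]
    sampled = rng.sample(others, n_sampled - 1)  # without replacement
    q = (n_sampled - 1) / len(others)            # selection probability of a non-chosen area
    records = [(chosen, 0.0)]                    # q = 1 for the chosen area, so -log(q) = 0
    records += [(a, -math.log(q)) for a in sampled]
    return records

rng = random.Random(1)
choice_set = list(range(200))  # hypothetical choice set of 200 areas
records = sample_alternatives(17, choice_set, 10, rng)
```

Because the number of available areas differs across households and years, the offset differs across household-year observations even when the number of retained records is held fixed.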

MCMC Estimation
The most commonly used approaches for fitting mixed logit models are maximum simulated likelihood and Markov chain Monte Carlo (MCMC) estimation. Train (2001) compares these approaches and favors MCMC for both theoretical and computational speed reasons. He gives an MCMC algorithm for such models which generalizes work by Allenby (1997), and builds on ideas of Albert and Chib (1993). In this article, we modify Train's algorithm to accommodate parameterizations designed to improve the efficiency of MCMC estimation, which is especially important when the sample size and choice set are large. We consider a combination of hierarchical centering and orthogonal parameterization, adapting algorithms used for estimation of multilevel binary response models (as in Browne et al. 2009). This section provides an overview of estimation of the model with household-level random effects, given by the sum of Equations (7) and (8). Further details of the MCMC algorithm for the extension to area effects are provided in the online supplementary materials.
The fixed part of the model, Equation (7), includes parameters for mobility (α 0 and α 1 ), and push effects (β 0 and β 1 ) and pull effects (γ 0 and γ 1 ) of choice-specific attributes z. From an algorithmic point of view, it is convenient to distinguish between coefficients that have an associated household-specific random effect (α 0 , β 0 , γ 0 ), as described in Section 2.5, and those with a fixed effect only (α 1 , β 1 , γ 1 ). Train (2003) focused on a general model where all coefficients are random at the subject level, but considers the above specification as a special case that may be useful in certain situations, such as when the full random effects covariance matrix cannot be identified. In the application that follows, the variances of the random effects for β 0 and γ 0 , the main push and pull effects of area attributes z r(t−1) , are of particular interest as measures of between-household heterogeneity in the effects of z r(t−1) that is unexplained by household covariates x i(t−1) .
Let $\theta = [\alpha_0, \beta_0^T, \gamma_0^T]^T$ and $\theta_i = \theta + u_i$ with associated data vector $A_{ri(t-1)}$, and let $\phi = [\alpha_1^T, \beta_1^T, \gamma_1^T]^T$ with data vector $B_{ri(t-1)}$. The linear predictor for the mixed logit model can be reexpressed as

$$\theta_i^T A_{ri(t-1)} + \phi^T B_{ri(t-1)} + \log(q_{rit}),$$

where $\theta_i \sim \mathrm{MVN}(\theta, \Omega_u)$, with $\Omega_u$ the covariance matrix of the household random effects $u_i$, and $\log(q_{rit})$ is an offset (see Section 2.7). It is common to parameterize the model in terms of $\theta_i$ rather than $\theta$ and $u_i$. This parameterization, known as hierarchical centering (Gelfand, Sahu, and Carlin 1995), can improve mixing when the random effect variances in $\Omega_u$ are not too small, as it allows a Gibbs sampling step for $\theta$ rather than a Metropolis step. The other speed-up we consider is an orthogonal reparameterization similar to that of Browne et al. (2009). This involves replacing the predictors in $B_{ri(t-1)}$ by a set of orthogonal predictors that spans the same space. This is achieved using a standard orthogonalising algorithm, and we can then run MCMC using the transformed predictors. The chains for the parameters in the original parameterization can be retrieved by a simple matrix transform based on the inverse of the transformation of the predictors (see Browne et al. 2009, for details).
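The orthogonal reparameterization can be illustrated with a minimal Python sketch. It is not the Stat-JR implementation: the function names are hypothetical, and Gram-Schmidt is assumed here only as one example of a standard orthogonalising algorithm. The sketch orthogonalises the columns of a small predictor matrix and shows how coefficients estimated on the orthogonal scale map back to the original scale.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(X):
    """Return (Z, U) with X = Z U, where the columns of Z are mutually
    orthogonal and U is unit upper triangular (illustrative helper)."""
    n, p = len(X), len(X[0])
    cols = [[X[i][k] for i in range(n)] for k in range(p)]
    z_cols = []
    U = [[1.0 if j == k else 0.0 for k in range(p)] for j in range(p)]
    for k in range(p):
        z = cols[k][:]
        for j in range(k):
            # Projection coefficient of column k on orthogonal column j.
            coef = dot(z_cols[j], cols[k]) / dot(z_cols[j], z_cols[j])
            U[j][k] = coef
            z = [a - coef * b for a, b in zip(z, z_cols[j])]
        z_cols.append(z)
    Z = [[z_cols[k][i] for k in range(p)] for i in range(n)]
    return Z, U


def to_orthogonal_scale(U, beta):
    """Coefficients on the orthogonal scale: beta_star = U beta."""
    p = len(beta)
    return [sum(U[j][k] * beta[k] for k in range(p)) for j in range(p)]


def back_transform(U, beta_star):
    """Recover original-scale coefficients by solving U beta = beta_star."""
    p = len(beta_star)
    beta = [0.0] * p
    for j in reversed(range(p)):
        beta[j] = beta_star[j] - sum(U[j][k] * beta[k] for k in range(j + 1, p))
    return beta
```

In an MCMC run, `back_transform` would be applied to each stored draw of the orthogonal-scale coefficients, so that the reported chains refer to the original predictors.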
The algorithm has been implemented in the Stat-JR package (Charlton et al. 2013) which allows MCMC chains to be run in parallel with both hierarchical centering and orthogonal parameterization of the fixed predictors. The code has also been optimized so that estimation times are significantly faster than implementing the same model in alternative packages such as WinBUGS (Lunn et al. 2000). Optimization for the mixed logit model involves the storage of intermediate quantities and constants within the likelihood to reduce the number of computationally-expensive calculations, for example exponentiations. The likelihood also contains, for each observation, a linear predictor with many parameters and several steps involve calculation of terms that are equal to the linear predictor minus one element. Storage of the linear predictor for each observation and a technique of subtracting the relevant element, updating it and then adding it back to the linear predictor lead to a substantial reduction in computing time.
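The caching technique described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions (a dense list-of-lists design matrix and hypothetical function names), not the optimized Stat-JR code: the linear predictor is stored for every observation, and a single-coefficient update subtracts the old contribution and adds the new one instead of recomputing the full sum.

```python
def full_linear_predictor(X, beta):
    """Recompute every linear predictor from scratch (the slow route)."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def update_coefficient(X, beta, eta, k, new_value):
    """Set beta[k] = new_value and patch the cached predictors eta in place."""
    for n, row in enumerate(X):
        partial = eta[n] - row[k] * beta[k]    # linear predictor minus element k
        eta[n] = partial + row[k] * new_value  # add the updated element back
    beta[k] = new_value
```

Each update costs one multiplication and two additions per observation, whatever the number of parameters, whereas `full_linear_predictor` scales with the number of parameters.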

Monte Carlo Simulation Study
A simulation study was conducted to explore the impact of sample size (number of households n) and panel length T on the bias and accuracy of estimates of the regression coefficients and random effect parameters. The linear predictor of the data generating model has the same form as Equations (7) and (8), with one area-level variable z and one household characteristic x and household random effects for inertia and the push and pull effects of z. The simulation conditions included sample sizes of 1000 and 1500 households and panel lengths of 5, 10, and 15. We also investigated the extent to which the correct random effects covariance structure was recovered by fitting models to simulated data with correlated and independent household random effects.
As expected, bias decreases as the number of waves per household increases, with the greatest impact on the inertia variance (σ 2 α ), the largest of the random effect variances. However, even in the case of 1000 households and 5 waves, the biases are not large (and are in line with posterior mean estimates from a simulation study by Browne and Draper (2000) that assessed MCMC estimation of random effects binary logistic models with a uniform prior). Increasing the number of households by 50% also leads to bias reduction. Finally, when a model with correlated random effects is fitted to data generated from a model with independent random effects, the correct correlation structure is recovered. There are substantial improvements in accuracy with increases in either the number of households or the number of waves, with the largest impact observed for the random effect parameters and in particular the inertia variance. For the regression coefficients, the mean standard error across replications and the empirical standard error (standard deviation of the parameter estimates) are very similar for all scenarios considered, and there is minimal loss in accuracy when the random effects correlation structure is incorrectly specified. Details of the design of the simulation study and results are given in online supplementary materials.

Data
Neighborhood Definition and Variables. We define a neighborhood as a Lower Super Output Area (LSOA), census areas which contain between 400 and 1200 households and an average of 1500 individuals. There were 32,482 LSOAs in England by the 2001 Census definition. We characterize neighborhoods along two dimensions: deprivation and distance from a person's current residence. Deprivation has been widely used as a measure of neighborhood quality in earlier analyses of residential choice. Previous studies have considered the push effect of deprivation and, among movers, compared the level of deprivation in the origin and destination neighborhood (e.g., Rabe and Taylor 2010).
The distance between the current neighborhood and a potential destination has not been previously studied, but is likely to be an important factor in location choice. We anticipate that many households would prefer to remain close to their current residence when considering a local move, especially families with school-age children. However, for new homeowners or for households seeking a larger home, a longer-distance move may be necessary to secure affordable housing. The measure of deprivation used is the LSOA-level English Index of Multiple Deprivation (IMD), a weighted combination of seven domain indices which capture different aspects of deprivation within a neighborhood, relating to income; employment; health and disability; education, skills, and training; barriers to housing and geographical access to services; crime; and living environment (see Dibben et al. 2007, for details). Three versions of the IMD have been constructed for 2004, 2007, and 2010. The score we allocate to a particular LSOA depends on the household survey year, as described below, and is standardised using the English LSOA-level average and standard deviation from 2007.
The second neighborhood measure is the distance in kilometres between a household's LSOA of current residence and each alternative LSOA. These are the straight-line distances between the population-weighted centroids of each area, calculated from the Ordnance Survey centroid grid references provided by the Office for National Statistics (ONS). An important difference between our distance and deprivation measures is that distance is household specific and, as a consequence, only its pull effect can be defined. If z si(t−1) denotes the distance between area s and the current location of household i at t − 1, then z ri(t−1) = 0 and β vanishes from the expression for the log-ratio of the probabilities of moving to area s rather than remaining in r (Equation 4).
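As a small illustration of the distance measure, the following Python sketch computes the straight-line distance between two centroids. It assumes that centroids are supplied as (easting, northing) pairs in metres; the function name and the coordinates in the usage note are hypothetical.

```python
import math


def distance_km(origin, destination):
    """Straight-line distance in km between two (easting, northing)
    points given in metres."""
    dx = destination[0] - origin[0]
    dy = destination[1] - origin[1]
    return math.hypot(dx, dy) / 1000.0
```

For example, `distance_km((0, 0), (3000, 4000))` gives 5.0, and the distance from an area to itself is 0, which is why only the pull effect of distance can be defined.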
To define the choice set of neighborhoods relevant to a household we use the household's current Travel-to-Work-Area (TTWA). TTWA is a labor market area definition, derived from 2001 Census information on home and work addresses, and used by the ONS to reflect areas where the bulk of the resident population also work within the same area. TTWA boundaries are nonoverlapping and contiguous, and cover the whole of the United Kingdom. TTWAs do cross national boundaries, and of the 243 that cover the entire United Kingdom, 166 contain at least one LSOA in England. In our study a household's choice set of neighborhoods includes all English LSOAs within the current TTWA. The mean number of LSOAs per TTWA is 196, but TTWAs are substantially larger in metropolitan areas such as London with 5467 LSOAs.
Due to the large number of LSOAs, it was not possible to identify choice-specific random effects to allow for between-area heterogeneity. Random effects were therefore specified at a higher "district" level, as described in Section 2.6, where "districts" are defined as Middle Layer Super Output Areas (MSOAs). MSOAs are census areas with between 2000 and 6000 households. There were 6781 MSOAs in England at the 2001 Census, and LSOAs are nested within MSOAs.
Household Panel Data. Data on the characteristics and residential locations of households are taken from the British Household Panel Survey (BHPS) (ISER 2009). The BHPS is a nationally representative sample of about 5500 private households recruited in 1991, containing approximately 10,000 adults who are interviewed annually. The core questionnaire elicits information on topics such as household composition, housing tenure, employment and income at each annual interview. Our analysis uses information from waves 8-18, covering the period 1998-2008. Earlier waves are excluded because of lack of comparable area-level IMD data in this period.
One challenge in the analysis of household panel data is how to define a household longitudinally when its composition may change over time. The usual approach to this problem is to follow individuals, rather than attempt to track households, with analyses based on person-year observations. However, this is inappropriate for couple households where decisions are likely to be made jointly while partners are co-resident (Steele, Clarke, and Washbrook 2013). For this reason, couples contribute only one person-year record to the analysis file while they are together. Using this approach, couples are regarded as a single decision-making unit, and any other individual is treated as an independent decision-maker. Thus an individual living with unrelated adults is treated the same as an individual living on their own.
We model a household's residential location choices for up to 10 years but, due to a combination of late entry to the study and dropout, only 17.7% of households are present at all waves and the mean number of waves is 4.9. (Late entrants include children of original sample members (OSMs) who join the panel at age 18 and new partners of OSMs.) The following analysis is based on records from any household observed for at least two adjacent years t − 1 and t. Households contribute records to the analysis file up to the time of dropout and, for cases with intermittent nonresponse, after their return to the study. This approach assumes that incomplete data are missing at random, conditional on observed residential location choices together with household and area covariates. In reality, a "missing not at random" (MNAR) mechanism is more plausible because moving house between t − 1 and t is a cause of dropout at t. Washbrook et al. (2014) investigated the effect of nonignorable dropout in an analysis of residential mobility using two types of selection model applied to BHPS. Although they found strong evidence of nonignorability, modeling the dropout process had little effect on the coefficient estimates of the residential mobility model. Our model of residential choice could also be extended to incorporate a dropout model, but we do not pursue this here because it seems reasonable to expect that MNAR is more of a concern for mobility than for choice of area.
In the residential mobility literature, it is usual to distinguish local or short-distance moves from longer-distance moves because the two types of moves have very different determinants. By restricting the choice set to LSOAs within the TTWA of residence at t − 1, we focus on local moves within a given labor or housing market which tend to be triggered by family events such as the arrival or departure of a child (Clark and Huang 2003). Because of our focus on movement within labor market areas, the sample is further restricted to individuals of working age, defined as between 18 and 59 years old at t.
Household-level characteristics considered in the analysis are for the most part defined at t − 1. We define seven categories of household type, distinguishing single males and females; single parents (of either sex) of a child under 16; and couples with a resident youngest child aged 0-4, 5-10, 11-15, and 16 or more. Housing tenure is categorized as owned (outright or with mortgage), private rented, social (council or Housing Association) rented, and living with family. This last group consists of people who are a relative (other than a spouse) of the household reference person (HRP). The HRP is the person legally or financially responsible for the accommodation. Ninety-five percent of individuals in the "living with family" group are children living in the parental home, with siblings of the HRP making up the largest group among the remainder. We measure the gross household income over the previous 12 months as the combined incomes of the two members of a couple, or the individual income of a single person. The log of income in pounds is included in the model. We also include some indicators of life course transitions that occur between t − 1 and t, that is, contemporaneously with the choice of location at t. These include the birth of a child, a move into or out of home ownership, a move out of the family home into social or private rental, and all other tenure transitions (e.g., between private and social rented accommodation). Indicators of partnership formation and dissolution between t − 1 and t were considered in preliminary analysis, but were not retained as the additional effects of these variables on location choice were found to be insignificant.
We also tested for period effects on mobility and in the push and pull effects of area characteristics by including year dummies among the predictors of inertia and their interactions with deprivation and distance. As there was little evidence of period effects over our observation period of 1998-2008, we adopted a simpler model specification which assumes time-invariant inertia and push and pull effects.
The analysis sample consists of 30,912 person-wave observations from 6249 individuals (treating couples as a single "individual" as described above). The overall annual mobility rate is 10.8% and 34.8% of households move at least once during the 10-year observation period.
Sampling Choice Sets. Expanding the data to obtain one record for each LSOA in a household's choice set results in a person-wave-LSOA dataset of over 29 million observations. LSOAs were randomly sampled from this expanded dataset with probability inversely proportional to the size of the TTWA, while always retaining the records for the LSOAs of residence at t − 1 and t. Thus, for household i resident in TTWA j at t − 1, the probability that LSOA r is selected from their choice set is

$$\frac{c}{R_{it|j}},$$

where $R_{it|j}$ is the number of areas in the choice set of household i at year t given residence in TTWA j, and the constant c is chosen so that the number of records in the person-wave-LSOA file is approximately equal to a target of $m_{tar}$ according to

$$\sum_j \frac{c\, m_j}{R_j} = m_{tar},$$

where $R_j$ is the total number of LSOAs in TTWA j and $m_j$ is the total number of person-wave-LSOA records in TTWA j.
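The sampling scheme can be sketched in Python as follows. This is a minimal illustration, not the procedure used to build the analysis file: the function name is hypothetical, and the selection probability is assumed to be the constant c divided by the number of areas in the choice set, capped at 1 for small choice sets. The areas of residence at t − 1 and t are always retained, as described in the text.

```python
import random


def sample_choice_set(areas, origin, destination, c, rng):
    """Sample alternatives from a household's choice set.

    Each area is kept with probability p = min(1, c / R), where R is the
    number of areas in the choice set (assumed form); the origin and
    destination areas are always kept. Returns (sampled_areas, p).
    """
    p = min(1.0, c / len(areas))
    sampled = []
    for area in areas:
        if area in (origin, destination) or rng.random() < p:
            sampled.append(area)
    return sampled, p
```

Returning `p` alongside the sampled areas keeps the selection probability available for any sampling correction applied at the estimation stage.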
The following results are based on an analysis file with m tar = 800,000. To assess sensitivity of estimates to random sampling of the choice set, the analysis was repeated for two different random subsets of person-wave-LSOA records: the first with the same value of m tar and the second with m tar = 1,600,000. The parameter estimates and credible intervals were found to be very similar for these different random samples.

Results
As described above, we focus on the effects of two neighborhood (LSOA) characteristics on location choice: area deprivation (IMD), and distance from a household's current residence. We allow for both observed and unobserved heterogeneity in the push and pull effects of IMD and the pull effect of distance through their interactions with the household characteristics x i(t−1) described in the previous section and the inclusion of household-specific random effects. In addition, mobility (inertia) is modeled as a function of x i(t−1) , again allowing for unobserved heterogeneity. We also consider models with random effects at the MSOA level.

Assessment of Convergence and Model Fit.
The results presented below are based on five parallel chains of 100,000 MCMC iterations, each using a different starting value and with a burn-in sample of 10,000. Uniform priors were assumed for all parameters. Convergence was assessed using a range of graphical diagnostics and the potential scale reduction factor (PSRF) (Gelman et al. 2004). Visual inspection of trace plots of each parameter for the multiple chains suggested adequate mixing. Following the iterative graphical approach of Brooks and Gelman (1998), the within-chain variance, a weighted average of the within- and between-chain variance, and the PSRF were examined for subsets of the chains. All had stabilized by 100,000 iterations, and the final PSRF estimates were close to 1 for all parameters. Furthermore, increasing the chain length led to little change in the running means of the posterior estimates. Hierarchical centering and orthogonal parameterization were considered, separately and in combination, in an attempt to improve mixing. For the fixed parameters, orthogonal parameterization led to substantial increases in the effective sample size (ESS) (Kass et al. 1998). Because hierarchical centering had almost no impact on the ESS of the random effect variances and covariances in this case, we present results from using only orthogonal parameterization. Three models were fitted and compared using the Bayesian Deviance Information Criterion (DIC) (Spiegelhalter et al. 2002). A model with household and area covariates, as in Equation (7), has a DIC of 52,525. The DIC decreases to 44,854 with the addition of household random effects, and there is a further decrease to 41,151 when area (MSOA) random effects are included. The specification of the area effects in the selected model is a special case of Equation (12) with λ = 1. In the more general model with λ unconstrained, the chains for the between-area variance and λ mixed very poorly, suggesting that the data did not support the extra complexity.
We, therefore, assume that unmeasured area characteristics have the same influence on moves out of an area and on that area being chosen as a new destination.
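For readers unfamiliar with the PSRF, the following Python sketch computes its basic form for a single parameter from several chains of equal length. It is a simplified illustration with a hypothetical function name; it omits the refinements of Brooks and Gelman (1998) and is not the diagnostic code used for the analysis.

```python
import math
import statistics


def psrf(chains):
    """Basic potential scale reduction factor for one parameter.

    `chains` is a list of equal-length lists of post-burn-in draws.
    """
    n = len(chains[0])
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance([statistics.fmean(c) for c in chains])
    # Weighted average of within- and between-chain variance.
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)
```

Values close to 1 indicate that the pooled variance is no larger than the within-chain variance, which is the pattern reported above; chains stuck in different regions give values well above 1.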
The fit of the selected model was assessed using graphical posterior predictive checks (Gelman and Hill 2007). Replicates of the multinomial response y rep were simulated using every 1000th draw from the parameter chains for the fitted model, including predicted values of the random effects. Simulation was carried out sequentially with the lagged choice indicator w ri(t−1) and covariates relating to the current area (distance and deprivation at origin) updated at each wave to reflect the dynamic structure of the model. The replicates were generated from predicted response probabilities computed for each area in the full choice set, not only the random subset selected for analysis. Test statistics T (y rep ) were chosen to assess the model's ability to capture the two main components of the model: household mobility and residential choice among movers. Figure 1 shows the distribution across replicates of the proportion of households who never moved over the observation period and the proportion of moves where there was an increase or decrease in deprivation of more than 0.5 standard deviation units between the destination and origin area. Comparison of the posterior predictions of each test statistic with the same statistic computed for the observed data T (y) shows that the model provides a good fit to these aspects of the data.

Effects of Household Characteristics and Differential Push Effects of Area Deprivation on Mobility.
The results for the full mixed logit model are shown in Tables 1-3 and Figure 2. The estimates shown in the "main effects" columns of Table 1 are the effects of household characteristics x i(t−1) on the odds of a move to a new area between years t − 1 and t; these are the exponentials of the α parameters in Equation (7) after reversing their signs. The first row gives exp(−α 0 ), the odds of a move for households taking the reference value for each variable in x i(t−1) . The remaining estimates are the odds ratios (OR) for each household characteristic k, exp(−α 1k ), where an OR greater than 1 implies that a higher value on that covariate (or being in a particular covariate category) is associated with an increased odds of a move. These estimates are for households in an area with an average level of deprivation. As the model includes household and area random effects, all estimates are median ORs, that is, the covariate effects for households at the mean of the multivariate random effects distribution and for neighborhoods (LSOA) in an area (MSOA) at the mean of the area-level random effects distribution. Lower household income and the presence of a new baby or young (preschool) child are associated with a higher annual probability of moving to a new neighborhood. There is a strong relationship between housing tenure and neighborhood change (estimates not shown): private renters are the most mobile and, not surprisingly, a change in tenure usually coincides with a move.

Differential Push and Pull Effects of Deprivation and Distance on Mobility and Area Choice.
Also shown in Table 1 are the interaction effects between x i(t−1) and area deprivation (IMD) on the odds of a move. These estimates are the exponentials of the β parameters in Equation (7), after reversing their sign. The first row gives exp(−β 0 ), which is the multiplicative effect of a 1 standard deviation increase in IMD in the current area on the odds of moving out for the reference group. Two types of IMD effects are presented: the interaction effects exp(−β 1k ), which represent differential push effects of IMD compared to the reference group, and, for categorical household covariates, the total or overall push effects for households in each category, exp(−(β 0 + β 1k )). Similarly, the estimates shown in Table 2 are the pull effects of IMD and distance from the current area on neighborhood choice for the reference group, exp(γ 0 ) in Equation (7), and the differential and total pull effects by household characteristics, exp(γ 1k ) and exp(γ 0 + γ 1k ). The estimates for IMD are the effects of a 1 standard deviation increase in IMD in area s on the odds of choosing s rather than another area r at year t, holding constant IMD in r, where r could be the area of residence at t − 1. The estimates for distance are the effects of an increase of 1 km in the distance between area s and the area at t − 1 on the odds of choosing s rather than r, holding constant the distance between r and the current area.
Table notes. ‡ Reference group: couple with no children at t − 1 or birth in (t − 1, t], owner-occupiers at t − 1 with no change in tenure in (t − 1, t], and mean log(household income). Estimates are posterior means and 95% credible intervals (2.5 and 97.5 percentiles) of odds ratios (OR). ORs with 95% CI excluding 1 are highlighted in bold.
For the reference group of households we find that, as expected, a higher level of deprivation in a neighborhood is associated with an increase in the probability of out-mobility (Table 1) and a decrease in the probability of being chosen as a destination by movers (Table 2). Households are also less likely to move to neighborhoods that are far from their current place of residence (Table 2). However, the interaction effects in Tables 1 and 2 show that the effects of IMD, and to a lesser extent of distance, vary by household characteristics. In Table 1 an interaction OR > 1 for characteristic k indicates that the positive push effect of IMD for that subgroup of households is stronger than for the reference group, implying a greater aversion to remaining in a deprived area (and a total OR exceeding the OR for the reference group). An interaction OR < 1 may imply a weaker positive effect of deprivation or even a negative effect for a particular group. As the pull effects of IMD and distance are negative for the reference group (Table 2), an interaction OR < 1 implies a stronger negative effect of increasing deprivation or distance in area s on the odds of choosing s over another area.
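The relationship between the reference, interaction, and total odds ratios can be made concrete with a short Python sketch. The function names and the coefficient values in the usage note are hypothetical and are not estimates from Tables 1 and 2; the sign conventions follow the text, with push effects reported as exp(−β) and pull effects as exp(γ).

```python
import math


def push_odds_ratios(beta0, beta1k):
    """Reference, interaction, and total push ORs for characteristic k."""
    reference = math.exp(-beta0)
    interaction = math.exp(-beta1k)
    total = math.exp(-(beta0 + beta1k))
    return reference, interaction, total


def pull_odds_ratios(gamma0, gamma1k):
    """Reference, interaction, and total pull ORs for characteristic k."""
    reference = math.exp(gamma0)
    interaction = math.exp(gamma1k)
    total = math.exp(gamma0 + gamma1k)
    return reference, interaction, total
```

With illustrative values β 0 = −0.2 and β 1k = 0.5, the reference OR exceeds 1 while the interaction OR is below 1, and the total OR equals their product. This is the case described above in which a positive push effect for the reference group is weakened, or reversed, for a subgroup.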
Although there is little evidence that the push effect of deprivation depends on household income, higher income is associated with a stronger aversion to deprivation when choosing a new area, most likely because higher-income households are better able to act on preferences toward living in better-off areas. This result is consistent with research in the United States, Sweden and Britain that finds that household income constrains movers' access to more advantaged neighborhoods (Ioannides and Zabel 2008; Hedman, van Ham, and Manley 2011; Clark, van Ham, and Coulter 2013). The differential pull effect of deprivation by the level of area deprivation at origin is in line with the income effect: deprivation is a less important factor when choosing a new area for households currently living in a more deprived area. The effect of distance also depends on income, with a weaker effect for higher-income households. Previous research on Britain suggests that area deprivation exerts a push effect on mobility among couples but not singles (Rabe and Taylor 2010) while singles are more likely than couples to move to less advantaged areas (Clark, van Ham, and Coulter 2013). However, we find little evidence of a difference in the push or pull effects of deprivation for singles and couples without children: differences between singles and couples only emerge when they have dependent children. Deprivation in the current area of residence has a weaker effect on the decision to move out for single parents and couples with a preschool child than for other household types, and there is also a suggestion that deprivation has a weaker deterrent effect on choosing a new area among couples with older children (aged 16+) than for households with a younger child or no children. Distance is a more important factor in movers' choice of destination for couples with school-age children than for other household types, which may reflect stronger local ties and a reluctance to move far from current schools among these families. A birth between years t − 1 and t strengthens the push effect of deprivation during the same period, but also weakens aversion to deprivation when choosing a new area to live. These apparently contradictory findings may be due to a desire to move out of a deprived area among some new parents while others compromise on neighborhood quality in the search for an affordable family home.
Figure 2. [...] (c) show the effects of increasing IMD and distance from the current area for area s on the odds of choosing s rather than another area. Estimates are the posterior means and 95% credible intervals of odds ratios for the effects of a 1 SD increase in IMD and a 1 km increase in distance.
Table 3 notes. Estimates of inertia, push and pull effects of IMD, and pull effect of distance on the log-odds scale for the reference group, that is α 0 , β 0 , γ 0,IMD and γ 0,Distance . ‡ Calculated as the random effect mean ± 1.96 times the random effect SD. Estimates are posterior means and 95% credible intervals (2.5 and 97.5 percentiles). Correlations with 95% CI excluding 0 are highlighted in bold.
Another source of heterogeneity in the importance of deprivation and distance in residential location decisions is housing tenure and changes in tenure. Figure 2 shows estimates of the push effect of IMD on mobility and the pull effects of IMD and distance from the current area for households whose tenure remains the same at t − 1 and t ("static") and for households who change tenure ("transitions"). Starting with households whose tenure did not change, we find that higher deprivation in the current area is associated with increased odds of a move for homeowners, but the effect of deprivation switches direction and becomes nonsignificant for private and social renters (Figure 2(a)). Furthermore, private renters are less averse to deprivation than homeowners when choosing a new area (Figure 2(b)), while the effect of deprivation on the area choices of social renters is not significant. Distance is an important factor for all three tenure groups, but less so for renters than for homeowners (Figure 2(c)). In Britain, private renters tend to be more mobile than owner-occupiers (e.g., Rabe and Taylor 2010; Steele, Clarke, and Washbrook 2013) and may, therefore, be less concerned about neighborhood factors. On the other hand, rented accommodation, especially social housing, tends to be located in the most deprived areas (Clark, van Ham, and Coulter 2013), which limits the chance of a move to an affluent area without a change in tenure. Low mobility within the social housing sector, particularly in high-demand areas in the South, is well documented (e.g., Kearns and Parkes 2003) and previous research has found that social renters have limited opportunities to "move up" to less-deprived areas or to maintain residence in better-off areas (Clark, van Ham, and Coulter 2013).
Turning to households who changed tenure, higher deprivation in the area of origin is associated with a lower probability of leaving that area for private renters who made the transition into home-ownership (Figure 2(a)), although new homeowners tend to choose less-deprived areas when choosing a new destination (Figure 2(b)). Individuals who left the family home for private rented housing have a tendency either to remain in or move to a deprived area, most likely because of a greater availability of lower-cost accommodation in such areas. Finally, proximity to the current residence is an important factor for all households who change tenure, but less so for new homeowners and homeleavers.
Unobserved Heterogeneity in Push and Pull Effects. Table 3 shows estimates of the variances and correlations between the four household-level random effects representing time-invariant propensities to stay in the same area (inertia), and sensitivities to deprivation as a push or pull factor in residential location choice and distance from the current area as a pull factor. The results presented are from the full model that allows for differential effects of deprivation and distance by the observed household characteristics of Tables 1 and 2 and Figure 2. There is considerable variation between households in the propensity to stay in the same area and in the effects of deprivation and distance on location choice. Based on the normality assumption, 95% of households are expected to have a baseline log-odds of remaining in the same area between 2.44 and 14.32, while the corresponding ranges for the effects of IMD and distance span zero. The proportion of households for whom higher deprivation is associated with higher odds of a move is estimated as Pr(−β 0i > 0) = Pr(β * 0i < 0.49/1.56) = 62%, where β * 0i ∼ N(0, 1). For the remaining 38%, higher deprivation is not associated with higher odds of moving out, which suggests that a substantial number of households are unable to move out of deprived areas (assuming an underlying preference to live in more prosperous neighborhoods). By similar calculations, the proportions of households with an aversion to deprivation when choosing a new area and with a preference toward areas that are close to their current residence are 96.5% and 91%, respectively. After accounting for differential effects of deprivation and distance by observed household characteristics, there remains a significant correlation between inertia and the pull effect of distance. This strong, positive correlation suggests that more mobile households have a stronger-than-average preference to remain close to their current neighborhood when moving house.
Put another way, households that move less frequently are prepared to move greater distances when they do so. There is also a moderate, positive correlation between the push and pull effects of deprivation, which implies that households with a stronger-than-average aversion to remaining in a deprived neighborhood have a tendency to avoid more deprived areas when choosing a new place to live.
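The proportion calculation for the push effect of deprivation can be checked with a few lines of Python. The sketch treats 0.49 and 1.56, the values appearing in the probability statement in the text, as the mean and standard deviation of the household-specific push effect −β 0i under the normality assumption; the function name is hypothetical.

```python
from statistics import NormalDist


def share_with_positive_effect(mean, sd):
    """Pr(effect > 0) when the household-specific effect is N(mean, sd^2)."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(0.0)


# Push effect of deprivation, using the values quoted in the text.
push_share = share_with_positive_effect(0.49, 1.56)
print(round(100 * push_share))  # percentage of households with a positive push effect
```

The same function applies to the pull effects of deprivation and distance once the corresponding random effect means and standard deviations from Table 3 are supplied, with the sign of the mean reversed where the proportion of interest is that with a negative effect.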
For neighborhoods (LSOAs) $r$ and $s$ in different MSOAs, the log-odds of choosing $s$ versus $r$ depends on the difference in their unmeasured MSOA-level characteristics, $v_{s(d)} - v_{r(d)}$. The contribution of area effects to the standard deviation in the log-odds is estimated as $\sqrt{2} \times 1.38 = 1.95$. Thus area effects are sizeable, although variation between households in neighborhood choice is dominated by the substantial variation in household inertia. This between-MSOA variation also implies that there is a similarity between neighborhoods in close proximity (in the same MSOA) in their probability of being chosen as a place to live.
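The factor of $\sqrt{2}$ in this calculation can be made explicit with a short derivation. It is a sketch under two assumptions that are our reading rather than statements from this section: the MSOA effects are independent across MSOAs, and they share a common standard deviation, written here as $\sigma_v$ (an illustrative symbol) with estimate 1.38.

```latex
\begin{align*}
\operatorname{Var}\bigl(v_{s(d)} - v_{r(d)}\bigr)
  &= \operatorname{Var}\bigl(v_{s(d)}\bigr) + \operatorname{Var}\bigl(v_{r(d)}\bigr)
   = 2\sigma_v^{2}, \\
\operatorname{SD}\bigl(v_{s(d)} - v_{r(d)}\bigr)
  &= \sqrt{2}\,\sigma_v \approx \sqrt{2} \times 1.38 \approx 1.95 .
\end{align*}
```

For two LSOAs in the same MSOA the two area effects coincide and the difference is zero, which is the source of the similarity between nearby neighborhoods noted above.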

Discussion
This article has presented a general mixed logit model which makes use of longitudinal data to distinguish, and estimate simultaneously, the push and pull effects of multiple area attributes on residential location choice. An efficient MCMC algorithm was proposed which, together with sampling of the choice set, allows consideration of a larger set of potential destination areas, a larger sample size, and a longer observation period than has been possible in previous research.
Our analysis of household heterogeneity in the effects of neighborhood deprivation on out-mobility and movers' selection of a new destination suggests that the residential choices of less-advantaged households are severely constrained. We find that low income is associated with a lower probability of moving to a more advantaged neighborhood while private and social renters are less likely than owner-occupiers to move out of deprived areas. As argued by other authors (e.g., Clark, van Ham, and Coulter 2013), such constraints in the housing market lead to increasingly selective migration with disadvantaged households unable to "move up" to better-off areas, a situation which is likely to worsen with rising house prices in the United Kingdom. We also find that even for local moves within labor market areas, the influence of distance of a potential destination from the current residence depends on household characteristics.
The focus of the analysis presented here is household heterogeneity in the importance placed on two area characteristics, deprivation and distance, in location choices. Another avenue for research would be to compare the push and pull effects of multiple area attributes (such as crime, house prices, and school quality) for different types of household. It is also straightforward to extend the model to include random coefficients on interactions between household and area characteristics, $x_{it-1} \times z_{rt-1}$ in Equation (7), thus allowing for heterogeneity in the importance placed on an area characteristic $z$ within groups defined by $x$. A consequence of this more general specification is that the covariance between individual choice propensities given by Equations (10) and (11) would depend not only on $z$, but also on (possibly time-varying) household characteristics.

Supplementary Materials
The online supplementary materials contain further details of the MCMC algorithm and the extension to include area effects. Also provided are details of the design of the simulation study and results.