Embedding road networks and travel time into distance metrics for urban modelling

ABSTRACT Urban environments are restricted by various physical, regulatory and customary barriers such as buildings, one-way systems and pedestrian crossings. These features create challenges for predictive modelling in urban space, as most proximity-based models rely on Euclidean (straight line) distance metrics which, given restrictions within the urban landscape, do not fully capture spatial urban processes. Here, we argue that road distance and travel time provide effective alternatives, and we develop a new low-dimensional Euclidean distance metric based on these distances using an isomap approach. The purpose of this is to produce a valid covariance matrix for Kriging. Our primary methodological contribution is the derivation of two symmetric dissimilarity matrices ( and ), with which it is possible to compute low-dimensional Euclidean metrics for the production of a positive definite covariance matrix with commonly utilised kernels. This new method is implemented into a Kriging predictor to estimate house prices on 3,669 properties in Coventry, UK. We find that a metric estimating a combination of road distance and travel time, in both and , produces a superior house price predictor compared with alternative state-of-the-art methods, that is, a standard Euclidean metric in and a non-restricted road distance metric in and . F


Introduction
By 2030, it is expected that 5 billion people will live in urban areas, 662 cities will have at least 1 million residents and there will be a total urban spread of 1.2 million km 2 (Biello 2012, Seto et al. 2012, Nations 2016. Hence, cities will continue to accommodate over 50% of the world's population. In the United Kingdom, over 82% of citizens live in its 64 cities, a figure which has grown by more than 13% in the past 30 years (Champion 2014). Many UK cities suffer from legacy infrastructurethe City of London, for example, relies on sewage infrastructure originally built in the 1860swhich impacts on their ability to support projected growth. Such challenges are well documented: Housing supply is not matching demand (Henretty 2018), commuting times are increasing (Gayle 2017) and there are shortages in services for the most vulnerable citizens (Stewart et al. 2003). Issues of urban growth and sustainability motivate the development of mathematical tools and models for explanatory and predictive analysis (Townsend 2015).
Urban models provide insight into the relationship between some chosen target value, house prices for example, and other potentially related variables, such as topography (Kok et al. 2011), building footprints (Pace et al. 1998) and crime (Thaler 1978). Space (Crosby et al. 2016) and time (Huang et al. 2010) consistently feature in most urban models, for example in house price prediction (Crosby et al. 2018), traffic flow prediction (Zou et al. 2012) and in the analysis of green space and its impact on wellbeing (Houlden et al. 2017). A typical approach to understanding spatial characteristics in this way is through geostatistical proximity-based modelling. An example of this approach is Kriging (defined in Section 4), which assumes random variables to be spatially dependent and non-stationary over space.
A common assumption in geostatistical models (including Kriging) is that proximity is based on Euclidean distance; this is in spite of the fact that dispersion in a city landscape is unlikely to exhibit such properties.
Traditionally, research in real-estate price modelling has considered distance to a specific location (e.g. workplace) and/or comparable prices of other sub-markets within close proximity. A more sophisticated approach to this is to include physical barriers such as buildings, road layout and non-accessible open space to the models, as distance, in practice, is clearly governed by such obstacles. This is evident in recent work on road distance-based Kriging, which has been shown to be highly effective for urban house price prediction (Crosby et al. 2018).
Our paper presents a natural extension to this earlier work by including travel time. In so doing, it integrates a number of otherwise difficult-to-capture variables such as traffic flow, road layout, junction priority and congestion caused by on-road parking. Our primary purpose is to show the effect that road distance and travel time have on predictive modelling; note that we do not prescribe reasons for these effects (i.e. we will not be considering any covariates).
Our methodological advances are motivated by our work in urban house price prediction; that is, we attempt to model unexplained variation through proximity between observations, to underpin and improve on hedonic pricing models already available in academia and in industry.
An essential prerequisite to geostatistical models is the production of a variogram and covariance function. Covariance and variogram functions must remain validpositive definite (PD) and conditionally negative definite (CND), respectively (Curriero 2005(Curriero , 2006) (see Section 4.1 for formal definition).
Given the extensive research based on Euclidean pairwise distance (straight lines), there is no guarantee that any non-Euclidean distance matrix (PD or otherwise) will produce valid covariance or variogram functions. For this reason, pairwise road distance and travel time matrices are unlikely to be valid. Hence, the purpose of this research is to propose an isometric embedding approach with which we can approximate road distance and travel time in a lower dimensional Euclidean space, to allow physical properties of cities to be represented in spatial prediction whilst still producing mathematically valid approximations.
In order to illustrate the benefits of these new distance metrics, a so-called realestate automated valuation model (AVM) for residential properties is developed for the city of Coventry in the United Kingdom. This AVM is used to provide mathematically modelled individual market values for 3669 properties. The case study in Section 5 shows that a combination of road distance and travel time produces a superior Kriging predictor compared with a Euclidean approach for all assessed validation metrics.

Contributions
The contributions of this research are as follows: • First, methodological contributions are made via the derivation of two symmetric dissimilarity matrices (B þ and B 2þ ), with which it is possible to compute low-dimensional Euclidean metrics for the production of a PD covariance matrix with commonly utilised kernels and non-valid, non-Euclidean, input spaces. • Second, we demonstrate the application of this new geostatistical approach to the calculation of (1) approximate restricted road distance, (2) approximate travel time and (3) combined road distance and travel time matrices, in each case within an embedded lower dimensional Euclidean space. • Third, we compare a number of the most popularly employed cross-validation techniques to assess the ability of each to estimate how well our model generalises to unseen data.

Sections
The remainder of this paper is organised as follows: background research is detailed in Section 2; Section 3 motivates the need for this research through two practical examples; new methodological contributions are described in Section 4 and applications of these methods, to urban house price prediction, can be found in Section 5. The paper concludes in Section 6 in which we also document avenues for future research.

Related literature and key concepts 2.1. Constructing optimal urban Kriging predictors
Kriging is a geostatistical spatial predictor which accounts for spatial covariance. The method utilises observation distances to understand the spatial structure of a dataset and hence determine its own interpolation parameters (Cressie 2015). Kriging is used extensively for interpolation by ecologists (Little et al. 1997), geographers Changling (1987) application and geo-scientists (Hudson and Wackernagel 1994). The first stage of Kriging models the degree to which distance between observations is correlated. The experimental variogram does this by calculating an average difference between observations (termed the lag). If the experimental variogram is able to observe spatial patterns accurately, then Kriging applies the modelling coefficients to determine interpolation parameters (Matheron 1963). Parameter optimisation, kernel selection and lag sizes are the primary strategies used in optimising experimental variogram and Kriging algorithms (Cressie 1985, Garcıa-Soidán et al. 2004, Yu et al. 2007. Kriging is commonly used in urban science and examples of its application include traffic flow prediction (Zou et al. 2012), and travel time (Miura 2010) and trip planning (Liebig et al. 2014). The use of Kriging for urban real-estate pricing is motivated by Dubin (1988), Basu and Thibodeau (1998) and Crosby et al. (2016) who together note that space and time are highly influential in house price prediction. Each of these approaches however uses Euclidean distance only.
A small number of non-Euclidean distance-based approaches have been employed to Kriging, including those based on Minkowski (including Manhattan) (Ganio et al. 2005, Theodoridou et al. 2015, Crosby et al. 2018, geodetic (Banerjee 2005) and water-based (shortest path over water) (Murphy et al. 2014) distances. Each offers its own benefits; however, it is difficult to assess whether each produces valid experimental variograms without access to the initial data; we show in Section 3 that relying on the fact that input distances are PD metrics is no guarantee of valid variograms.
In a similar manner to this research, Crosby et al. (2018) utilises Open Street Map (OSM) data to estimate restricted road distance and travel time between pairwise points. Crosby et al. (2018) uses a Minkowski P value of 1.6, which demonstrates the highest correlation to road distance, travel time and a linear combination of both, with the OSM data. Their work also discovered that the same P value returned positive results when applied to the domain of house price prediction. We directly compare the results of Crosby et al. (2018) with our new approach, see Section 5. Figure 1 provides a simple comparison of Minkowski distance (P = 1.6), a Euclidean distance, a Manhattan distance and regular road distance.
Research which bears similarity to our own can be found in Lu et al. (2014), who use geographically weighted regression (GWR) and a non-Euclidean distance metric for predicting London house prices. Their research also utilises road distance and travel time, however is limited to network shape and speed limit; our measures include a wealth of other data provided by OSM, see Table 5.
Approaches based on GWR have advantages, in particular because there is no requirement for the matrix to be Euclidean [the matrix w i of weights is diagonal; hence, there is no need to check for positive definiteness, which is not the case with the covariance matrix used in Kriging (Curriero 2006)]. However, it is noted in Crosby et al. (2018) that Kriging typically outperforms GWR in spatial pricing models; this is especially true when implemented locally, which is the case in Ordinary Kriging which assumes intrinsic stationarity (i.e. a moving mean but a stationary variance between any two points).

Overcoming non-metric input spaces
For the most part, geostatistics relies on the assumption that each set of distances lie in a metric space ðM; dÞ, where M is a set and d is a metric on M; for example, d might be a function: (1) R þ 2 M is a set of non-negative real numbers whose values satisfy the properties P1-P4: d i;j >0 (P1: Non negativity) d i;j ¼ 0 ( x i ¼ x j (P2: Identity of indiscernibles) d i;j ¼ d j;i ðP3 : SymmetryÞ d i;j <d i;k þ d k;j (P4: Triangle inequality).
There are three known methods which ensure that a distance matrix is valid (i.e. it produces a PD covariance matrix): the first uses isometric embedding to ensure a Euclidean input; the second is the use of kernel convolution so that the kernel fits any matrix; the third is to select a matrix which produces a valid covariance matrix. Previous research has assumed that the distance matrix does not have to satisfy P1-P4 per se, but that it must ensure a PD covariance matrix. We do not subscribe to this view, as the example in Section 3 highlights.
With regard to the three methods that ensure matrix validity: Isometric embedding provides a dimensionality reduction technique with which it is possible to build a low dimensional Euclidean approximation of non-Euclidean inputs for variogram modelling. Using simulated data with isotropic spatial dependence, Curriero (2006) builds four omnidirectional experimental variograms, each representing an α norm, for α = 1, …, 4 (α = 2 is Euclidean). When these data are applied with Kriging, the newly defined 'stream' distances outperform Euclidean distances in all cases; this is therefore our method of choice.
We note that other research proposes similar approaches to approximate road distance metrics, see Tenenbaum et al. (2000) and Zou et al. (2012). In Zou et al. (2012), the Floyd Warshall (FW) algorithm is applied to a road network to estimate the actual road distance between pairwise locations. We note however that FW only selects the shortest distance, irrespective of restrictions such as transport patterns and one-way systems.
The use of kernel convolutions, which can be used to express moving averages, assumes that correlated data can be expressed as linear combinations of uncorrelated data. This method has been successfully applied by Crawford and Young (2008); however, we note that this method can be difficult to implement on problems with large datasets and is hence not considered further in this work.
Finally, the selection or creation of a valid covariance function can be undertaken. For example, Curriero (2006) noted that a set of Manhattan distances produced nonvalid variograms with Gaussian, Matern and spherical kernels but were valid for an exponential kernel. We are aware that this approach has several restrictions and is also time consuming to compute, and so for this reason, isometric embedding remains our method of choice.

Motivation
The contributions of this work are based on the following assertion: The only way to guarantee that a covariance matrix and variogram function are valid in this context is to ensure that a Euclidean distance metric is input for their calculation.
The only way to ensure that a variogram is valid is to input a Euclidean distance function. This implies that even PD distance functions cannot always produce a valid variogram, a concept which has potential to invalidate much previous research.

Non-PD inputs
Non-PD matrices produce non-PD kernels (covariance functions) which is usually as a consequence of the L 2 norm; note that Section 3.2 provides other examples where this is the case. Matrices 1 and 2 show a set of possible pairwise distances. These matrices are not symmetric, much like a road network containing one-way systems, and hence, they are not PD. To test whether each matrix always produces a valid variogram, we select a Gaussian covariance function (CðhÞ ¼ σ 2 e ðÀh=aÞ 2 ) with σ 2 = 0.5, 0.08 and a = 450, 1.5. The output vectors from this calculation are shown in Vectors 1 and 2. In view of the negative roots in Vectors 1 and 2, it is clear that both covariance functions are not conditionally PD ( P n i¼1 P n j¼1 α i α j CðhÞ ! 0) and hence road distance and travel time are not valid for variogram modelling.

PD inputs
Additionally, non-Euclidean PD matrices may also produce non-PD kernels, a fact that previous research has been known to overlook. Matrix 3 below represents the same roads as in Matrix 1 and 2, but this time, the road distance is not restricted (much like the work by Zou et al. 2012); that is to say, one-way systems are not considered and hence are completely PD. The same covariance function and hyperparameters are used. Vector 3. PD road distances. Vector 3 shows that the output eigenvector still contains negative roots, which itself means that the covariance function is not conditionally PD, despite the input matrix being PD. This motivates our new approach for estimating non-Euclidean, non-PD distance matrices in a Euclidean space in order to produce valid covariance and variogram functions.

Method
We describe how current state-of-the-art approaches estimate city-based proximity (i.e. non-Euclidean distance metrics) and compare these approaches to our new method. We show how our proposed approach, isometric embedding with two symmetric dissimilarity matrices (B þ and B 2þ ), produces a PD covariance matrix. As a result of this, we then show application of this new technique to the establishment of an urban realestate price predictor.

Distance matrix calculation
To undertake geostatistical modelling, a pairwise distance metric is required. This pairwise distance metric is populated with distances d i;j from a list of locations {x i ; i ¼ 1; . . . ; n} in Euclidean space R n . The matrix provides the basis for a valid metric if all d i;j satisfy P1-P4, see Section 2.2.
As we have previously shown, road distance and travel time are not natural metrics. Given this, we compare four methods for calculating conforming geostatistical distance metrics from these inputs: (1) A Euclidean distance (see Section 4.1.1), (2) a Minkowski approximation of restricted road distance and travel time (see Section 4.1.2), (3) an isomap estimate of road distance (see Section 4.1.3) and (4) a newly formulated improved isometric embedding approach to estimating restricted road distance and travel time (see Section 4.1.3).

Euclidean distance
Unless otherwise stated, it is typical to assume a Euclidean function when referring to distance. Assuming two sites as vectors s = ðs 1 ; . . . ; s d Þ and u = ðu 1 ; . . . ; u d Þ, in Euclidean space R d , then the Euclidean distance is where d is the number of dimensions (or attributes) and s i and u i are the attributes.

4.1.2.
Minkowski distance Assuming the same notation as above, the Minkowski distance is where P is an user-defined parameter. Manhattan and Euclidean distances are special cases of Minkowski, with P = {1,2}, respectively. Crosby et al. (2018) show that Minkowski distances with PÞ {1,2} can better estimate road distance and travel time compared with Manhattan or Euclidean distances.

Isometric embedding and isomap
Isometric embedding provides the spatial transformation of a new metric space ζ 0 =ðs 0 ; d 0 Þ from ζ ¼ ðs; dÞ, with point set s ¼ ðs 1 ; s 2 ; . . . ; s n Þ, distance function D of ζ and distance function D′ of ζ 0 . All associated s and d ij values are intrinsic. If D ' D′, then the transformation still preserves topological adjacency among points in the original space ζ. Dimensionality reduction is a good means of achieving isometric embedding; multidimensional scaling (MDS) is the most popular such scheme.
Isomap, in addition to isometric embedding, attempts to detect the intrinsic characteristics of non-linear data, in which ζ may be a non-metric space. For example, isometric embedding assumes a Euclidean distance, whereas isomap supports other spatial features such as non-restricted approximate road (Zou et al. 2012) and geodesic (Banerjee 2005) distances on a set of discrete points (Tenenbaum et al. 2000). Figure 2 provides an example of a road distance layout (left) transformed into a low-dimensional Euclidean space (right) using isomap.
As stated, MDS is a dimensionality reduction technique used to achieve isometric embedding or isomap. Given an input metric D (which is e.g. Euclidean) in n-dimensional metric space ζ, the first stage of MDS is to calculate the dissimilarity matrix B B ¼ 1 2 a ij À a i : À a j : þ a:: where a i : is the average of all a ij across j. Formally, each element B ij in matrix B is calculated by Figure 2. Illustration of the spatial transformation from road distance (or travel time) into a Euclidean space.
where B is a new set of isometric distances which mimics a kernel where B is doubly centred. Although B is semi-PD, it is not guaranteed to produce a PD covariance function or a CND variogram (see proof in Section 3). B is definitely valid only when the input distance matrix D= d ij È É nxn is symmetric and positive. Given this, classical MDS requires that the eigenvalues of B are λ 1 λ 2 . . . λ α , where α is a user-selected value based on an optimal κ: where λ α >0. The optimal κ provides the smallest value of α given some user-defined minimum variation threshold. Thereafter, the corresponding eigenvectors (Γ ¼ i , for i ¼ 1; . . . ; α) are calculated. The penultimate step of MDS is to calculate a new dataset of points in the new α-dimensional subspace ζ 0 =ðs; dÞ, where s 0 ¼ ΓΛ 1 2 and Λ ¼ diagðλ 1 ; λ 2 ; . . . ; λ k Þ. This new s 0 point set is the isometric subspace which best describes point set D; this process is called eigenvalue decomposition and explains the variance of the data in a lower dimension. In the final stage of isomap, the new coordinates in s 0 are used to calculate a new approximate distance metric using the Euclidean function.
If some inputs are non-metric, such as may be the case with travel time or restricted road distance, the dissimilarity matrix B may not be semi-PD with an L 2 -norm, a property which is essential for MDS. For this reason, a new B þ dissimilarity matrix is proposed in which D is forced to be symmetric within the calculation: Additionally, B 2þ ij takes a combination of both road distance and travel time matrices (the maximum and minimum distances are normalised between 0 and 1) to produce isometric distances, where δ ij represents the normalised road distance and τ ij represents the normalised travel time distance between each i and j: Each new B þ ij and B 2þ ij solves the problem of non-symmetry for travel time and restricted road networks. This ensures that B is semi-PD so that the process of MDS and the output distance matrices is also both valid. B þ ij and B 2þ ij are key contributions of this research. 'Stress' validates the effectiveness of classical MDSit tests the goodness of fit for D′ with the input metric D (the normalised sum of squares), such that However, when implementing non-metric inputs, Stress should be calculated differently such that d b þ ij and d b 2þ ij are the Euclidean functions on space B þ and B 2þ , respectively. The reason for this is because we are no longer reconstructing elements d ij . Rather, we reconstruct the dissimilarity matrix for the new metric space. A metric space can be confirmed such that Given that we can define a Euclidean metric from B, we are assured that it is indeed a valid metric space.

Case study: real-estate valuation
Real-estate valuation has become a much more data-driven and quantitative process. This said, the process of estimating the value of a property or land parcel through market appraisal remains the de rigueur of skilled market professionals. Having now worked in this domain for several years, our aim has been to scale-up and semiautomate the use of big data for real-estate valuation.
To this end, we build a so-called AVM for a sample of 3669 residential properties in the city of Coventry in the United Kingdom, using Ordinary Kriging with a target valuation date of 1 January 2017. We develop a new approximate road distance and travel time metric for variogram calculations. Figure 3 diagrammatically depicts the entire process in this study and Algorithm 1 represents the purple coloured section of the diagram.
Algorithm 1. Pseudocode for the entire isomap algorithm displayed in the purple coloured section of Figure 3. Embed into low-dimensional Euclidean space ζ 0 ¼ ðs 0 ; d 0 Þ i such that α<r (Eq. 6) 14: Collect new coordinates s 0 given S 0 in ζ 0 15: Calculate the new Euclidean distances 16: end for For the purpose of comparison, and to ensure robust results, we run six experiments where each contains a different input distance metric: (1) Euclidean (vector norm of 2); (2) Optimal Minkowski (P = 1.6) Crosby17; (3) FW on a road network (PD Road) HaixongDistanceMatric; (4) OSM road distance with restrictions; (5) OSM travel time with restrictions; (6) A combination of normalised road distance and travel time with restrictions.
Each experiment is subsequently referred to using the numerical identifier (1-6).

Data description
Our AVM uses input data regarding all houses that were sold in Coventry in 2016. For each of these 3669 properties, the percentage change in house price, between the date sold and 1 January 2017, is calculated using the predicted change in value in each output area as defined by the UK Office for National Statistics. This provides a predicted price per property for the data as at the 1 January 2017.
The datasets that we use are all open source. The house prices are obtained from Her Majesty's Land Registry. In addition, experiment (3) requires road network data, which is also sourced from the Ordnance Survey. Experiments (4)-(6) all require distances between points along a roadway and the time that it takes to travel these distances; this is sourced from the Open Street Routing Machine (OSRM) powered by OSMs. Table 1 provides a description of each dataset and Table 2 lists roadway restrictions routinely used in the calculations by the OSRM. Figure 4 provides a graphical representation of all house prices from a low price (small, light-coloured circles) to a high price (large, dark-coloured circles). Empirically, it can be seen that the south west of the city of Coventry has a larger proportion of more expensive houses compared with the north east; hence, there exists some global spatial autocorrelation (SAC). Formally, a standard Moran's I test was implemented (see definition by Moran 1950) to confirm a statistically significant SAC. This dataset showed a strong result for I observed ¼ 0:1559136>>I expected ¼ À0:00267094, with a standard deviation of 0:001123158 and a P-value ' 0. These results allow us to reject the null hypothesis that there is no SAC present at α = 0:05. This result supports the notion that spatial regression is appropriate for our application.

Matrix construction
We process five distance matrices for our six experiments. Experiments (1) and (2) require a Euclidean and a Minkowski distance metric, respectively, which are valid for variogram modelling (see the beige-coloured portion of Figure 3). Experiment (2) uses a P-value of 1.6 which was previously reported to perform best on the same dataset, see Crosby et al. (2018). Experiments (3)-(6) require pre-processing using isomap (see purple-coloured portion of Figure 3). Experiment (3) utilises a road network to calculate a shortest path using the FW algorithm. Experiment (3) embeds the input distance matrix using dissimilarity matrix B Ã . Experiments (4) and (5) embed the distance matrices sourced from OSRM and dissimilarity B þ . Finally, experiment (6) utilises the same distance matrices sourced from OSRM but now implementing the B 2þ dissimilarity matrix. This entire process is  depicted in the matrix production column in Figure 3, the purple-coloured portion of which is captured in Algorithm 1 and used in experiments (3)-(6). Table 3 provides a comparison of all distance metrics calculated in our experiment compared with OSRM's actual road distance and travel time matrices.

Data sampling for cross-validation
The most sophisticated validation sampling techniques (hold-out and k-fold) assume data in both the test and training sets to be independent of each other. This is an assumption that may be unrealistic with datasets containing SAC, especially if the purpose of the modelling is for interpolation or close proximity extrapolation (Pohjankukka et al. 2017). As such, four sampling techniques are considered, three of which consider spatial dependence for comparison (see 'data sampling' in Figure 3): (1) 10-Fold cross-validation on the full dataset of 3669 properties; (2) spatially stratified 10-fold cross-validation (spatially stratified k-fold crossvalidation [SSKCV]) on the full dataset of 3669; (3) chequerboard holdout on a training set of 1832 properties, with a test set of 1837 properties; (4) spatial k-fold cross-validation (SKCV) (Pohjankukka et al. 2017) on samples of the entire dataset, with each sample including 3187 properties AE 135 for each fold.
5.3.1. k-Fold cross-validation k-Fold cross-validation (KCV) randomly partitions a dataset into k equally sized subsets. One of these subsets is retained for testing, whereas the other k − 1 are considered for training. For each fold, a different subset is retained for testing until all k subsets are tested. Figure 5(e,f) shows 2 of the 10-folds in our 6 experiments. KCV overestimates statistical effects on spatial random variables and hence produces an optimistic estimate of generalisation performance for unseen data.

Chequerboard holdout
Chequerboard holdout trains approximately 50% of the data and tests the remaining data based on whether they lay in the black or white grid squares (see Figure 5(a)). Our case study uses a training and test set of 1832 and 1837 properties, respectively. Chequerboard holdout is quick to apply, simple and removes some SAC. On the other hand, it removes a significant amount of training data and still contains bias at block borders.

SSKCV
SSKCV processes data in a similar manner to standard k-fold; however, the data splits are spatial and not random. Two of the 10-folds are shown in Figure 5(c,d). As can be seen, each test subset is spatially separated from the training set, which can appropriately remove some bias caused by SAC. However, the data splits still contain SAC at and near sample borders.

SKCV
SKCV estimates a predictor's performance by implementing traditional k-fold crossvalidation, whilst at the same time removing all training points within an empirically designed Euclidean dead zone from all test points (Pohjankukka et al. 2017). Figure 5(b) demonstrates this method where training points within 20 m of each test point are in yellow for a specific fold. This method more efficiently removes SAC than the other methods. However, it relies on a user-defined dead zone with no given heuristic and removes training points which in turn can cause pessimistic results. For our case study, we apply 20 m dead-zones, which removes approximately 8% of the total training points: this parameter value is selected as it is at this level that we see the most significant change in results; close inspection shows that this removes on average 3-5 of properties' closest neighbours.

Variogram construction and ordinary Kriging
Let s 2 R d be a single location representing a house in a d-dimensional Euclidean space and suppose that the house price Z(s) at spatial location s is a random quantity. Then, let s vary over index set D, which is a subset of R d (D & R d ), so as to generate the random process ZðsÞ : s 2 D.
For each experiment (six in total) and each sampling technique (four in total), a new variogram is produced together with a parametric model (kernel); see 'variogram construction' in Figure 3. The maximum distance and lag classes are empirically selected.  The nugget, sill and range are selected by ordinary least squares. For each fold in a k-fold sampling technique, a new variogram is estimated. By means of an example, Figure 6 graphically displays the variogram for the first fold of experiment (4) (restricted road distance metric) with its three best performing kernels: Gaussian, spherical and Matern (in improving order). We undertake two approaches to selecting the best variogram: (1) the user empirically selects the kernel; and (2) a maximum likelihood estimator (MLE) selects the best kernel (Lark 2002). We find that the empirical fitting approach, although lengthy to undertake, produces in all cases a matching or better predictor result. Hence, Section 5.6 reports the optimal results with empirical fitting for all sampling techniques as well as MLE for k-fold cross-validation as evidence that we selected the best approach. Table 4 provides the selected parameters and hyperparameters for each experiment with our most realistic sampling approach -SKCV. It can be seen that the kernel used can change between each experiment; this is because we select the kernel which produces the best Kriging result for each experiment. The kernels show that different distance matrices can make a significant difference to the parameters and weightings of an optimal Kriging predictor. Given that we provide the best result, irrelevant of the kernel, we are providing a more robust like-for-like comparison than we would if we just Figure 6. A graph of the three best kernels for a road distance matrix. selected one kernel for all experiments. We believe that this avoids overly optimistic results for one or two experiments and pessimistic results for the remainder.

Validation
Three validation metrics are utilised: (1) r 2 , (2) root mean squared error (RMSE) and (3) mean absolute percentage error (MAPE) (see Equations (10)- (12)). The r 2 calculation measures the predictor's 'goodness of fit', the RMSE calculates the square root of the sum of the mean squared errors and MAPE is the mean absolute error expressed as a percentage.

Results and analysis
A summary of all results is recorded in Table 5, which provides the validation results for each experiment (1-6) for all validation techniques (k-fold, chequerboard, SSKCV and SKCV). All values in bold represent the experiment which provides the best house price predictor for each sampling technique. If more than one experiment is selected for one sampling technique, then all results between are statistically insignificant based on a t-value of 0.05 on a paired t-test, and hence, all are optimal. It can be seen that prior state-of-the-art Euclidean and Minkowski consistently under-perform compared with the urban road distance and travel time-based models. For example, Euclideanbased Kriging delivers an r 2 of 0.23 compared with a combination of road distance and travel time of r 2 of 0.56 (>x 2 goodness of fit) on the most pessimistic/realistic sampling technique (SKCV). In addition, we note that by considering the shortest path with restrictions (i.e. experiments (4)-(6)), unlike the current state of the art in isomap (experiment (3)), we are able to find a statistically improved house price regression in 3 out of 4 sampling techniques.
Notably, the significance of the improvements between our new approaches (experiments (4)-(6)) compared to Euclidean distances increases as the sampling technique becomes more pessimistic. This is intuitive because in SKCV, a Euclidean dead zone is utilised to penalise the over bias caused by SAC. Additionally, our novel approaches take account of a more sophisticated SAC which better infers the covariates of an urban Table 5. Results from four validation techniques: 10-fold cross-validation, spatially stratified 10-fold cross-validation, chequerboard holdout and spatial deadzone 10-fold cross-validation. environment and hence is less affected by the assumption of independent and identically distritributed (IID) in k-fold cross-validation. As previously discussed (Section 5.4), Table 6 presents the results for all experiments with a MLE. These are inferior to the empirical approach; hence, we opted to undertake all experiments with the empirical approach; these results are shown in Table 5. Table 7 emphasises this point by reporting that our empirically selected kernels produce improved urban house price Kriging predictors compared with the MLE approach undertaken in Crosby et al. (2018).
Overall, we see that our isomap approach can, in some cases, deliver a goodness of fit which is twice as good as results from an approach using Euclidean distance. This statistically significant outcome highlights the potential of using restricted road distance, travel time and non-Euclidean distance matrices, in urban studies and in other geostatistical applications such as restricted stream distances.
Isomap is representative of a network's global structure and is theoretically understood across disciplines. Local isometric embedding, on the other hand, attempts to preserve the local geometry of data; these methods include sparse matrix computations that speed up calculation and utilise local geometry and Euclidean distances in a network, which may otherwise be non-Euclidean globally. Given that we have utilised the commonly understood global approach, further research would include testing against local isometric embedding, especially if one were interested in producing realtime applications which require a low computational complexity.

Conclusion
Through the use of a practical urban modelling case study, we demonstrate that variogram functions do not always remain valid with non-Euclidean distance inputs, and therefore establishing the validity of each distance function becomes essential. Using isomappinga method for nonlinear dimensionality reductionwe show that it is possible to produce PD Euclidean distance metrics, and as a result valid variogram functions.  In contrast to previous research, we demonstrate that shortest path link-based road distances do not always improve the output of geostatistical models compared with Euclidean-based approaches. However, road networks which consider real-world restrictions, such as one-way systems, congestion and the presence of traffic lights can significantly improve modelling accuracy. Two such approaches presented in this research are travel time and a combination of restricted road distance and travel time, the latter of which accounts for a greater number of factors than road distance alone.
More specifically, a newly defined isomap approach is presented, which shows that road distance and travel time can both be more accurately modelled against a PD approximation of both, compared to Euclidean, Minkowski and link-based approaches (Zou et al. 2012, Crosby et al. 2018. In some cases, this provides a goodness-of-fit value which is twice as good as state-of-the-art approaches.
Furthermore, an extensive comparison of spatial cross-validation techniques is conducted, in which we conclude that k-fold cross-validation does not accurately estimate how well a model generalises to unseen data in a spatial setting -SKCV is shown to be a more appropriate sampling technique for cross-validation.
We highlight that using an inappropriate validation sampling technique can lead to an incorrect selection of prediction models. In the case study that we present, the results for our combined road distance and travel time method are significantly better with SAC removal than with standard k-fold cross-validation. Our results show that restricted road distance and travel time predictions produce a statistically improved house price predictor with an r 2 = 0.56; this compares with a Euclidean-based approach which achieves a result of r 2 = 0.23 in the case of sampling with a pessimistic/realistic dead-zone k-fold cross-validation technique (SKCV).
Further avenues of research include the introduction of covariates for an optimal AVM, the production of a restricted road distance and travel time kernel for urban variogram modelling, and an improved estimate of combined road distance and travel time metrics.

Notes on contributors
Henry Crosby is a PhD candidate at the Centre for Doctoral Training in Urban Science in the Warwick Institute for the Science of Cities. He received his BSc (Hons) degree in Business and Mathematics from Aston University, United Kingdom, and his MSc in data analysis at the University of Warwick.