Value of incorporating geospatial information into the prediction of on-street parking occupancy – A case study

ABSTRACT In light of growing urban traffic, car parking becomes increasingly critical for cities to manage. As a result, the prediction of parking occupancy has sparked significant research interest in recent years. While many external data sources have been considered in the prediction models, the underlying geographic context has mostly been ignored. Thus, in order to study the contribution of geospatial information to parking occupancy prediction models, road network centrality, land use, and Point of Interest (POI) data were incorporated in Random Forest (RF) and Artificial Neural Network (ANN, specifically Feedforward Neural Network FFNN) prediction models in this work. Model performances were compared to a baseline, which only considers historical and temporal input data. Moreover, the influence of the amount of training data, the prediction horizon, and the spatial variation of the prediction were explored. The results show that the inclusion of geospatial information led to a performance improvement of up to 25% compared to the baseline. Besides, as the prediction horizon expanded, predictions became less reliable, while the relevance of geospatial data increased. In general, land use and POI data proved to be more beneficial than road network centrality. The amount of training data did not have a significant influence on the performance of the RF model. The ANN model, conversely, achieved optimal results on a training input of 5 days. Likely attributable to varying occupancy patterns, prediction performance disparities could be identified for different parking districts and street segments. Generally, the RF model outperformed the ANN model on all predictions.


Introduction
Finding a free parking space in urban areas can be a very challenging task. On average, parking search traffic, that is, vehicles roaming around looking for free parking spaces, amounts to as much as 30% of total traffic in certain urban areas (Zhang and Haghani 2015). As a result, it has far-reaching socio-economic and environmental consequences. A significant amount of fuel is wasted and an increased number of traffic accidents are linked to the search for parking spaces (Bush and Chavis 2017).
A potential solution to help mitigate parking search is the provision of parking-related data, e.g. by deploying intelligent sensors in parking lots. Ideally, this information can be transmitted, in the form of dynamic parking maps, to the navigation systems of drivers in search of a parking space (Zheng, Rajasegarar, and Leckie 2015; Bock 2018; Huang et al. 2018; Sester 2020). In order to develop more adaptive traffic management and traveler information systems, the prediction of parking occupancy has received significant attention in recent years (Ermagun and Levinson 2018). With such predictions, drivers can plan their trips ahead of time, allowing them to customize the destination and departure time. Once close to the destination, drivers can then be guided directly to a vacant parking space.
To make the prediction results more reliable, various external sources of information, in addition to historical parking occupancy data, have been incorporated in parking occupancy prediction models. Some examples of these external sources include temporal information (e.g. time of the day, day of the week), weather, and traffic information (e.g. Arjona et al. 2020;Awan et al. 2020;Camero et al. 2019;Xiao, Lou, and Frisby 2018). However, the underlying geographic context of the target area has not received due attention. Given that the geospatial characteristics of an urban environment largely influence how drivers behave in it, integrating these geospatial components into the prediction models can potentially contribute to the improvement of the predictive performance, as shown in a number of studies (Lu and Liao 2020;Bock 2018;Leu and Zhu 2015). Nevertheless, these existing studies only focused on the occupancy status of adjacent roads, neighboring garages, or parking lots. The explicit inclusion of geospatial information such as land use, Points of Interest (POIs), and the spatial configuration of the street network has, to the best of our knowledge, not been realized and assessed.
Recognizing the above research gap, this work aims to assess the contribution of geospatial information to parking occupancy prediction, via a case study in the city center of San Francisco. We focus on the prediction of on-street parking occupancy, which is more challenging than the occupancy prediction of off-street parking facilities (e.g. parking lots or garages) because the occupancy rates of parking street segments change considerably. Specifically, the following overall Research Question (RQ) and its four sub-RQs are addressed:
Overall RQ: To what extent does geospatial information help improve the performance of on-street parking occupancy prediction models?
Sub-RQ1: How do results change under varying amounts of training data?
Sub-RQ2: How do results change under different temporal prediction horizons (i.e. how far ahead the models predict the future)?
Sub-RQ3: How do predictions vary spatially?
Sub-RQ4: How is the predictive performance influenced by the choice of machine learning algorithms?

Parking occupancy prediction
In the literature, parking occupancy prediction has been defined as the estimation of occupancy for a specific parking facility at a given time in the future based on parking-related information (e.g. Zheng, Rajasegarar, and Leckie 2015;Li, Li, and Zhang 2018;Bock 2018). Initial studies focused mainly on the occupancy prediction of off-street parking facilities, such as garages and parking lots. This is due to the fact that they are ubiquitous in many cities, the problem is simpler, and parking data are more accessible (Monteiro and Ioannou 2018). In recent years, however, the focus has increasingly shifted to on-street parking prediction, which is more challenging, due to the absence of lot entrances and significant changes of the occupancy rate of a parking street segment following more frequent parking/leaving events (Bock 2018). In terms of the prediction horizon (i.e. how far ahead a model predicts the future), most of the existing studies focused on short-term (less than 1 h) and medium-term (less than 12 h) prediction (e.g. Ji et al. 2015;Klappenecker, Lee, and Welch 2014;Li, Li, and Zhang 2018).

Parking algorithms
With regard to parking occupancy prediction algorithms, three general approaches can be identified in the literature (Xiao, Lou, and Frisby 2018;Mei et al. 2019). The following section summarizes these three classes of prediction algorithms. For a concise overview of existing algorithms, please refer to Table 1 (columns "Method" and "Employed Algorithm(s)").

Model-based approach
The model-based approach involves the establishment of an underlying model for the parking process, where model parameters are estimated to make parking occupancy predictions. Stochastic arrival and departure processes are usually explicitly employed for this approach. Mostly applied to off-street parking facilities, it is based on the assumption that vehicles arrive at parking spaces following a Poisson distribution. A number of studies (Atif et al. 2020;Caliskan et al. 2007;Klappenecker, Lee, and Welch 2014;Peng and Li 2016;Wu et al. 2014) made parking occupancy predictions using a continuous-time Markov Chain. Lu et al. (2009) used advanced technologies for the provision of arrival and departure rates, whereas Caicedo, Blazquez, and Miranda (2012) made parking availability predictions based on request allocations.

Parametric statistical approach
Statistical time series methods have been a popular approach for making predictions in transportation problems (Karlaftis and Vlahogianni 2011). In this approach, the evolution of a system is considered, with historical observations indexed by time. Implementations using Auto Regressive Integrated Moving Average (ARIMA) models for parking occupancy prediction have been abundant in the literature. Dias, Bellalta, and Oechsner (2015) suggested an ARIMA prediction approach for the occupancy status of public bicycle stations in Barcelona, Spain. Similarly, Badii, Nesi, and Paoli (2018) found that an ARIMA model can make satisfactory predictions, on condition that the training is recomputed every hour. Yu et al. (2015) established an ARIMA model to forecast the remaining spaces of a central mall parking lot in real time by constantly updating the data. Time series analysis, model parameter estimation, and model adaptive testing were carried out to establish the model.

Non-parametric machine learning approach
Several Machine Learning (ML) algorithms have also been applied to predict parking occupancy. Amongst them, Artificial Neural Networks (ANNs) have been particularly popular as a means to establish parking occupancy prediction models. Feedforward Neural Networks (FFNNs), the simplest type of ANN models, have been advocated to make parking availability predictions. Yu et al. (2015) and Pengzi et al. (2017) used this type of model to make short-term predictions from the time and recent parking occupancy observations. In a similar manner, Zheng, Rajasegarar, and Leckie (2015) made parking occupancy predictions with longer prediction horizons. Recurrent Neural Networks (RNNs), more complex ANNs with loops, have been proposed in parking occupancy prediction schemes due to their strength in solving problems that are sequential and time-varying (Qolomany et al. 2017). Camero et al. (2019) implemented a short-term RNN for the occupancy prediction of several car parks in Birmingham. Similarly, Vlahogianni et al. (2016) suggested a real-time time series occupancy scheme based on RNNs. Further, the usage of Long Short-Term Memory (LSTM) networks, an extension of a traditional RNN, was proposed in the literature (Arjona et al. 2020; Li, Li, and Zhang 2018; Shao et al. 2019; Sun et al. 2018).
Ensemble methods have also been implemented for parking occupancy prediction. Koumetio Tekouabou et al. (2020) used bagging, boosting, and Random Forests (RF) in their prediction model. Emphasizing their robustness and competitiveness, Bock (2018) employed an RF model to predict occupancy status of parking segments from crowdsensed data. Dias, Bellalta, and Oechsner (2015), conversely, used an RF model to make long-term occupancy predictions for a public bicycle sharing programme in Barcelona. In addition, regression models have been proposed in the literature. Using support-vector regression, Leu and Zhu (2015) predicted the number of available parking spaces for bicycle stations in Taipei, China. Similarly, Zheng, Rajasegarar, and Leckie (2015), Badii, Nesi, and Paoli (2018), and Chen (2014) carried out prediction schemes for on-street and off-street parking in various cities.

Input data employed in the existing prediction model
In the following, we summarize the types of input data that have been used in the existing studies on parking occupancy prediction. For an overview, please refer to Table 1 (column "Data input").

Recent observations of parking occupancy (i.e. historical parking occupancy data)
Recent parking occupancy observations may be the most significant data input for future parking occupancy prediction. This stems from the fact that there is a strong temporal correlation for parking utilization (Rajabioun and Ioannou 2015;Liu et al. 2018). While statistical time-series methods such as ARIMA explicitly rely on the parking occupancy status of previous time steps, the previous observations have also been implemented as input features in ML models in the literature (Bock 2018;Liu et al. 2018;Zheng, Rajasegarar, and Leckie 2015).

Temporal information
Because the parking utilization rate follows recurrent within-day and day-to-day patterns (Chen 2014; Xiao, Lou, and Frisby 2018), temporal information such as the Time of the Day (TOD), the Day of the Week (DOW), and holidays has been implemented in many models to predict parking occupancy. The TOD is a very relevant factor to consider (e.g. Dias, Bellalta, and Oechsner 2015; Zheng, Rajasegarar, and Leckie 2015; Richter, Martino, and Mattfeld 2014). It is implemented either in a time series (e.g. Vlahogianni et al. 2016; Liu et al. 2018) or directly as a feature in ML methods (e.g. Pflügler et al. 2016; Zheng, Rajasegarar, and Leckie 2015; Badii, Nesi, and Paoli 2018). Furthermore, the literature has shown that long-term predictions especially benefit from the distinction between DOWs (e.g. Richter, Martino, and Mattfeld 2014; Vlahogianni et al. 2016; Rajabioun and Ioannou 2015). Drivers' parking behavior also tends to be different on holidays (Li, Li, and Zhang 2018; Wang et al. 2007).

Weather
Yang, Liu, and Wang (2003) and Greengard (2015) argued that weather information is of central importance, affecting the traffic behavior and traffic flow intensity. Badii, Nesi, and Paoli (2018) showed that weather conditions of 1 h before the parking time have a significant impact on the parking behavior. Similarly, Dias, Bellalta, and Oechsner (2015) and Leu and Zhu (2015) found that the relative humidity and extreme weather conditions, respectively, play an important role in making occupancy predictions for public bicycle sharing systems.

Traffic
Since traffic and parking are closely connected, it has also been argued that the inclusion of traffic information is advantageous to predict parking occupancy. In particular, traffic volume is an important factor, as high traffic volume makes it more difficult to find a vacant parking space (Shin and Jun 2014; Yang, Liu, and Wang 2003; Hössinger et al. 2014). Moreover, Badii, Nesi, and Paoli (2018) suggested that vehicle flow, concentration, and average speed have high predictive relevance.

Location (i.e. geospatial information)
As can be seen from Table 1 (column "Data input"), only a few existing studies have used geospatial information in their parking occupancy prediction models. Occupancy observations in adjacent roads or garages were considered by Lu and Liao (2020) and Bock (2018). Similarly, Leu and Zhu (2015) used the occupancy status of the target station's neighboring stations as feature input. Rajabioun and Ioannou (2015) pointed out that there is a correlation of parking usage between car parks that are at different distances from each other. However, beyond these initial studies, research on incorporating other types of geospatial information is missing.

Summary and research gap
In short, three types of methods have been proposed for parking occupancy prediction: model-based, parametric statistical, and non-parametric ML methods, among which ML methods such as RF and ANN have become popular in recent years. Different types of input data have been considered, such as historical parking occupancy data, temporal information, weather, and traffic information. Surprisingly, the underlying geospatial context of the target area has received little attention. The few studies that considered a spatial component in their prediction model only focused on the occupancy status of adjacent roads or parking lots, based on the assumption that there is a spatial correlation. The explicit inclusion of information such as land use, POIs and the spatial configuration of the street network has, to the best of our knowledge, not been realized. Hence, its implementation in parking occupancy prediction models holds great potential that has not yet been exploited. This article thus aims to address the above research gap and investigate the contribution of different types of geospatial information to the prediction of parking occupancy. We mainly focus on two popular ML methods, RF and FFNN, which were also shown to have high potential in providing reliable predictions (e.g. Bock 2018; Dias, Bellalta, and Oechsner 2015; Zheng, Rajasegarar, and Leckie 2015; Awan et al. 2020; Koumetio Tekouabou et al. 2020).

Parking occupancy data
The historical parking occupancy data used in this study are provided by the San Francisco Municipal Transportation Agency as part of SFpark, a large-scale smart parking project. The overall goal of the project was to increase parking efficiency and improve drivers' experience, as well as to evaluate demand-responsive pricing. In order to monitor parking occupancy, on-street parking spaces were equipped with sensors that recorded occupancy continuously from April 2011 to July 2013. According to the SFpark project, due to various sensor failures in the course of time, the data quality is best for the initial three months. Furthermore, the literature suggests that three months provide sufficient data to develop a reliable prediction model (e.g. Badii, Nesi, and Paoli 2018; Rajabioun and Ioannou 2015; Stolfi, Alba, and Yao 2017). Therefore, in this work, data from the initial phase of the project from April 2011 to June 2011 were considered. Overall, the data cover the hourly occupancy status in 312 on-street parking segments (with 6291 parking spaces) distributed across 9 parking districts. A map of the distribution of the parking districts and their corresponding parking segments within the city of San Francisco is shown in Figure 1. For information regarding how the average occupancy changes over the months, please refer to the Appendix.

Geospatial data
Primarily OpenStreetMap (OSM) data were utilized to represent the street network. OSM provides a detailed street and pedestrian network. The land use data are provided by the City and County of San Francisco. The data include land use categories for every parcel. In this work, three categories were considered: industrial (production, distribution and repair), office (management, information, professional services), and residential. Three categories of POI data were also selected, including business, public transport, and tourist attractions. Business comprises the locations of all registered businesses, which are provided by the City and County of San Francisco. Public transport includes the locations of all train stations and stops within the city, derived from the OSM data mentioned before. Tourist POIs comprise the 20 most popular tourist attractions, according to Tripadvisor. These data were collected in early 2019.

Overview
An overview of the methodology is shown in Figure 2. In a first step, the street data and the parking occupancy data were preprocessed. The geospatial data were then quantified according to the methods specified in Section 4.3, based on which the geographic features were derived. The values of the features were normalized to a scale of 0 to 1. Moreover, temporal and historical occupancy features were defined. Given the input data, the prediction models were trained and validated based on the parking occupancy data.
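The rescaling of feature values to the range 0 to 1 can be illustrated with a min-max transformation. The following is a minimal sketch, assuming a linear min-max scaling; the function name and the example counts are hypothetical and not taken from the study.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature cannot be rescaled; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical business POI counts in the service areas of three segments.
business_counts = [120, 480, 300]
scaled = min_max_scale(business_counts)
print(scaled)
```

After scaling, the smallest value maps to 0 and the largest to 1, so features measured in different units (counts, square footage, meters) become comparable in magnitude.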

Data preprocessing
In a first step, the parking occupancy data had to be prepared. A total of 11 weeks of data records was selected, from 11 April 2011 to 26 June 2011. In a next step, occupancy rates of each parking street segment were derived as follows: In this work, we computed the occupancy rates of each parking street segment for each hour in each day. In other words, the denominator of the above equation is equal to 1 h = 3600 s. Further, all parking segment locations were georeferenced and OSM street data were converted into network datasets. Two types of networks were considered: a road network including all roads, and a pedestrian network. Impedance was defined as the distance in meters.
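As an illustration of the hourly rate, the sketch below assumes that the occupancy rate of a segment is the occupied time averaged over its parking spaces, divided by the 3600 s observation window. This reading, the function name, and the per-space input format are assumptions for illustration and are not taken from the SFpark data description.

```python
SECONDS_PER_HOUR = 3600


def hourly_occupancy_rate(occupied_seconds_per_space):
    """Occupancy rate in [0, 1] of one segment for one hour.

    occupied_seconds_per_space: for each parking space of the segment,
    the number of seconds it was occupied during the hour (assumed input).
    """
    n_spaces = len(occupied_seconds_per_space)
    if n_spaces == 0:
        raise ValueError("a segment needs at least one parking space")
    mean_occupied = sum(occupied_seconds_per_space) / n_spaces
    return mean_occupied / SECONDS_PER_HOUR


# Hypothetical segment with four spaces observed for one hour.
rate = hourly_occupancy_rate([3600, 1800, 0, 1800])
print(rate)
```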

Quantification of geospatial information
In order to include geospatial information as input features of the prediction models, it needed to be quantified in a meaningful way. To this end, geographic predictors were created by assigning the quantified values to each parking segment's location. In the following, the three main approaches for the derivation of the geospatial features in this work are described.

Centrality
Centrality is a fundamental concept in network analysis and has been used in various fields, such as social network analysis, urban planning and transportation (Wilson 2000). Especially in urban areas, centrality has been studied by transforming the edges of the street network into a relational graph, representing urban street patterns as spatial networks (so-called line graphs). By doing so, streets are mapped onto graph nodes and intersections of street segments onto the edges between the nodes (Crucitti, Latora, and Porta 2006). In this work, three centrality indices were considered and computed for each street segment: closeness centrality (Bavelas 1950), betweenness centrality (Freeman 1977; Brandes 2001), and alpha centrality (Bonacich and Lloyd 2001). Below are the mathematical expressions for the indices in that order:
$$C_C(v) = \frac{1}{\sum_{w \neq v} d(w, v)}$$
$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
$$x = (I - \alpha A^{T})^{-1} e$$
where $\sigma_{st}$ is the number of shortest paths from node $s$ to node $t$, $\sigma_{st}(v)$ is the number of shortest paths from node $s$ to node $t$ that pass through $v$, $d(w, v)$ is the distance between vertices $v$ and $w$, $x$ is the vector of alpha centrality scores, $I$ is the identity matrix, $\alpha$ is the relative importance of endogenous versus exogenous factors in the determination of centrality, $A$ is the adjacency matrix and $e$ are the effects of external status characteristics.
The closeness centrality of a node (here, a street segment) is inversely related to the mean length of the shortest paths between this node and all other nodes in the network. The higher the score of a node, the closer it is to all other nodes and thus the more central in the network. The betweenness centrality of a node measures the number of times this node lies on the shortest path between two other nodes, acting as a "bridge" on that shortest path. High scores thus indicate critical nodes in a network. Finally, alpha centrality is a variation of eigenvector centrality, which is based on the concept that a node within a network is more important if it is linked to adjacent important nodes. For these three indices, a high centrality score indicates a high degree of importance of a street segment within the network.
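To make the first two indices concrete, the following sketch computes closeness and betweenness centrality on a small, connected, unweighted graph whose nodes stand for street segments. It is an illustrative pure-Python version, not the implementation used in this work: hop counts are assumed as distances, each unordered pair of nodes is counted once for betweenness, and the example graph and function names are hypothetical. Alpha centrality is omitted because it requires a matrix inversion.

```python
from collections import deque


def shortest_path_counts(graph, source):
    """Breadth-first search returning hop distances and the number of
    shortest paths from source to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                # Every shortest path to u extends to a shortest path to w.
                sigma[w] += sigma[u]
    return dist, sigma


def closeness(graph, v):
    """Inverse of the summed shortest-path distances from v to all others."""
    dist, _ = shortest_path_counts(graph, v)
    return 1.0 / sum(d for node, d in dist.items() if node != v)


def betweenness(graph, v):
    """Sum over unordered pairs (s, t) of the share of shortest s-t paths
    that pass through v."""
    info = {s: shortest_path_counts(graph, s) for s in graph}
    dist_v, sigma_v = info[v]
    nodes = sorted(graph)
    total = 0.0
    for i, s in enumerate(nodes):
        for t in nodes[i + 1:]:
            if v in (s, t):
                continue
            dist_s, sigma_s = info[s]
            # v lies on a shortest s-t path only if the distances add up.
            if dist_s[v] + dist_v[t] == dist_s[t]:
                total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total


# Hypothetical line graph: nodes are streets, edges are shared intersections.
streets = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "D"],
    "D": ["B", "C"],
}
print(closeness(streets, "B"), betweenness(streets, "B"))
```

In this toy network, street B connects A to the rest of the graph, so it receives both the highest closeness and the only non-zero betweenness score.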

Service area
Service areas are used to evaluate the accessibility of a facility, e.g. the service that is supplied along a traffic network (Talen and Anselin 1998). In this work, the associated buffer radius is drawn along the pedestrian network. The resulting service area is assumed to represent an area that covers a short walking distance from the parking street segment. The literature suggests that the maximum distance drivers are willing to walk to their destination is 500 m (Van Der Waerden, Timmermans, and De Bruin-verhoeven 2015). Accordingly, the service area radius was set to 500 m.
The land use features residential, office, and industrial were quantified within the service area of each parking street segment. The residential feature was measured as the number of residential units contained in the service area, while the office and industrial features were each measured as the sum of their square footage. All land use parcels lying partially or completely within the service area were considered. Similarly, the number of business POI locations was counted within each service area.

Shortest path
As for the quantification of the tourist and public transport POIs in relation to the parking segments, shortest paths (Dijkstra 1959) were computed. In this work, single-source shortest paths were computed from the center point of each parking segment to all POIs along the pedestrian network. As a result, the mean distance (i.e. the average distance of all shortest paths from each parking segment to all points) was computed.
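The mean shortest-path distance can be sketched with a single-source Dijkstra search followed by an average over the POI nodes. The example below is a simplified illustration under stated assumptions: the pedestrian network is a small hypothetical adjacency list with edge lengths in meters, all POIs are assumed to be reachable, and the function names are not taken from the study's code.

```python
import heapq


def dijkstra(graph, source):
    """Single-source shortest-path distances on a weighted graph.

    graph maps each node to a list of (neighbor, length_in_meters) pairs.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for w, length in graph[u]:
            nd = d + length
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist


def mean_distance_to_pois(graph, segment_node, poi_nodes):
    """Average network distance from a segment's center point to all POIs."""
    dist = dijkstra(graph, segment_node)
    return sum(dist[p] for p in poi_nodes) / len(poi_nodes)


# Hypothetical pedestrian network around one parking segment.
pedestrian = {
    "seg": [("x", 100.0)],
    "x": [("seg", 100.0), ("poi1", 200.0), ("poi2", 50.0)],
    "poi1": [("x", 200.0), ("poi2", 400.0)],
    "poi2": [("x", 50.0), ("poi1", 400.0)],
}
mean_dist = mean_distance_to_pois(pedestrian, "seg", ["poi1", "poi2"])
print(mean_dist)
```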

Geospatial feature selection
Making use of the approaches described above, nine geospatial features were derived in total. Three features were allocated to each of the categories centrality, POIs, and land use. Table 2 shows all geospatial features and their derivation.

Prediction framework
The aim of the prediction framework is to forecast the occupancy rate (a value in [0, 1]) of the parking street segments for a specific time in the future. A prediction horizon ranging from 1 step to 10 steps was considered. One time step corresponds to 1 h. Hence, occupancy rates from 1 h to 10 h ahead were predicted. This was realized by using historical occupancy rates as data input. Accordingly, to make a 1 step ahead prediction, the occupancy rate 1 h prior to the time at which occupancy is to be predicted was considered. Moreover, geospatial features and time (i.e. time of day) were considered in the data input. X is defined as the data input (i.e. a feature vector) and y corresponds to the prediction output. Generally, the prediction problem can therefore be described as follows:
$$X = \{t, O(t), GF_1, \ldots, GF_i\}, \qquad y = O(t + k)$$
where $t$ is the time (i.e. hour of the day), $O(t)$ is the occupancy at time $t$, $k$ is the number of steps (i.e. hours) ahead to be predicted and $GF_1, \ldots, GF_i$ are the geospatial features. Depending on which feature set in Table 3 is employed, there might be 0 (i.e. FS1) to 9 (i.e. FS8, with all features) geospatial features.
In order to assess the effect of different predictors (i.e. different input features) on the prediction models, 8 feature sets were defined (FS1 to FS8), differing in their geospatial information input. Table 3 summarizes each feature set with its respective input categories.
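The construction of training samples for a given prediction horizon can be sketched as follows. The example assumes that each input vector holds the hour of the day, the occupancy rate at that hour, and the (static) geospatial features of the segment, while the target is the occupancy rate k hours later. The function name, the assumption that the series starts at midnight, and the example values are hypothetical.

```python
def build_samples(occupancy, k, geo_features=()):
    """Turn an hourly occupancy series of one segment into (X, y) pairs.

    occupancy: hourly occupancy rates, assumed to start at hour 0 of a day.
    k: prediction horizon in steps (hours).
    geo_features: geospatial feature values of the segment; an empty tuple
                  corresponds to a baseline without geospatial input.
    """
    samples = []
    for t in range(len(occupancy) - k):
        x = [t % 24, occupancy[t], *geo_features]
        y = occupancy[t + k]
        samples.append((x, y))
    return samples


# Hypothetical four-hour series with one normalized geospatial feature.
samples = build_samples([0.2, 0.4, 0.6, 0.8], k=2, geo_features=(0.5,))
for x, y in samples:
    print(x, "->", y)
```

Because the geospatial features do not change over time, they are simply appended to every input vector of the segment; only the hour and the historical occupancy vary between samples.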

Prediction models and parameterization
In the following, we introduce the two ML models employed in this work. Both the RF and FFNN models are widely used for various prediction problems, especially for the prediction of parking occupancy (see Table 1). Furthermore, they are relatively straightforward to implement and train. The models, as well as the above preprocessing and geospatial feature preparation steps, were implemented using the R language. All R code can be accessed on the GitHub repository at https://github.com/michaelbalmer/OnStreetParkingPrediction.

Random forest
RF (Breiman 2001) is an ensemble learning method that uses a collection of decision trees for regression and classification tasks. Specifically, a randomly selected subset of predictors from the training data is used to build the trees. The algorithm subsequently outputs the mean prediction of the individual trees. As a compromise between required processing power and model performance, the number of trees was set to 200. Additionally, for each feature and training data set, two hyperparameters were tuned: the number of variables randomly selected at each split and the maximum number of terminal nodes each tree can have. Depending on the data input combination, the former was set to 2-5, whereas the latter was set to 100-1000.

Artificial neural network
ANNs (Haykin 2010) are connectionist systems, inspired by neural networks in the human brain. They have been implemented in various fields, for instance for engineering and technical problems. ANNs consist of neuron-like processing nodes, organized in layers. For this work, an FFNN with two hidden layers was built and optimized by the RMSprop optimizer. Each of the hidden layers consisted of 128 nodes which were activated by a rectified linear unit activation function. A minibatch gradient descent approach was implemented by setting the batch size to 32. The number of epochs was set to 100. The batch size defines the number of samples that are passed through the network before weights are updated, whereas the number of epochs denotes the number of times the algorithm works through the entire training dataset.
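The interplay of batch size and number of epochs determines how often the weights are updated during training. The short sketch below works this out, assuming that a final, incomplete batch also triggers an update; the function name and the sample count are hypothetical.

```python
import math


def weight_updates(n_samples, batch_size=32, epochs=100):
    """Total number of weight updates in minibatch gradient descent,
    assuming the last (possibly smaller) batch of each epoch is used."""
    updates_per_epoch = math.ceil(n_samples / batch_size)
    return updates_per_epoch * epochs


# Hypothetical training set of 1000 samples: 32 updates per epoch.
total_updates = weight_updates(1000)
print(total_updates)
```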

Performance assessment
A variety of evaluation metrics exist, such as Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and Mean Squared Error (MSE). Parking occupancy can be as low as zero, and any evaluation metric that divides by the true value (such as MAPE) is therefore inadequate for two reasons. The first is the immediate problem of division by zero, which can happen if there are no vehicles parked on the street segment during an observed period. The second problem is that the relative errors tend to be largest for small ground truth values, so the prediction model will focus on correctly predicting small values. However, most applications are likely to be interested primarily in the accurate prediction of hours with high occupancy, and hence a metric that does not disadvantage those occupancies is preferable.
MAE and MSE both avoid this problem by only looking at the magnitude of the deviation, irrespective of the target value. Applications can be assumed to be interested in avoiding gross errors, which are penalized more heavily by MSE. Hence, both MAE and MSE were used to evaluate the performance of the prediction models, and further complemented by the coefficient of determination (R²). Below, the mathematical expressions for the three metrics are given.

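$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}$$
where $n$ is the number of data points, $y_i$ are the observed values, $\hat{y}_i$ are the predicted values, and $\bar{y}$ is the mean of the observed values.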
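The three metrics can be computed directly from their standard definitions. The following sketch uses plain Python lists; the function names and the example values are hypothetical. Note that the example contains an observed occupancy of zero, for which a percentage-based error such as MAPE would be undefined.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares divided
    by the total sum of squares around the mean of the observations."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted occupancy rates.
observed = [0.0, 0.5, 1.0]
predicted = [0.1, 0.5, 0.8]
print(mae(observed, predicted), mse(observed, predicted),
      r_squared(observed, predicted))
```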

Experimental design
In order to address the research questions mentioned in Section 1, experiments were performed on all 8 feature sets. To analyze different input scenarios, different training and test splits were applied. For each feature set, the amount of training data was set to range from 1 day up to 10 days, in 1-day increments. Furthermore, to take into account potential differences between weekdays, the models were trained and tested on data from the same weekday. For instance, for a training data input of 3 days, 3 consecutive Mondays were considered. The test dataset in that case then consisted of records on the following Monday. Prediction performances among different weekdays were finally averaged. For all combinations, a ten-fold cross validation, a widely used and robust accuracy estimation method (Kohavi 1995), was applied. Hence, training and test subsets comprised 90% and 10% of the total reference data, respectively. Moreover, for each data input scenario and feature set, different prediction horizons were considered (see Section 4.5). All experiments were performed using both RF and ANN algorithms and results were compared in terms of MAE, MSE and R². The evaluation of the contribution of the geospatial information to the prediction models formed a key interest. This was achieved by comparing the prediction performance of the baseline (FS1) to those of all other feature sets (FS2 to FS8) for each training period. Similarly, for each prediction horizon, the effect of the geospatial features was examined. Moreover, in order to evaluate the predictive importance of each feature, the percent increase in MSE (%IncMSE) was derived. It indicates the increase in error as a consequence of a feature being permuted (i.e. values randomly shuffled). The higher the value, the more important the feature.
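The idea behind a permutation-based importance score can be sketched in a few lines. The example below is a simplified illustration and not the exact procedure of any particular RF implementation: it shuffles one feature column, recomputes the MSE of a fixed prediction function, and reports the relative increase in percent. The toy model, the data, and the function names are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def percent_increase_mse(predict, rows, targets, column, seed=0):
    """Relative increase in MSE (in percent) after shuffling one column.

    Assumes the unpermuted model has a non-zero error.
    """
    rng = random.Random(seed)
    base = mse(targets, [predict(r) for r in rows])
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    # Rebuild the rows with only the chosen column replaced.
    permuted_rows = [r[:column] + [v] + r[column + 1:]
                     for r, v in zip(rows, shuffled)]
    permuted = mse(targets, [predict(r) for r in permuted_rows])
    return 100.0 * (permuted - base) / base


# Toy model that uses only the first feature and ignores the second.
def toy_model(row):
    return 0.9 * row[0]


rows = [[0.1, 5.0], [0.3, 1.0], [0.5, 4.0], [0.7, 2.0], [0.9, 3.0]]
targets = [r[0] for r in rows]
inc_used = percent_increase_mse(toy_model, rows, targets, column=0)
inc_ignored = percent_increase_mse(toy_model, rows, targets, column=1)
print(inc_used, inc_ignored)
```

Shuffling the ignored feature leaves the predictions unchanged and therefore yields an increase of zero, whereas shuffling a feature the model relies on cannot reduce the error in this toy setting.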
In a last step, the variation of the models' predictive performance as a function of geographic space was explored. Notably, model performances were evaluated in a quantitative and qualitative fashion for all parking districts and street segments.

Results
In this section, the results are presented in four parts: the effect of varying the prediction horizon, that is, how far ahead the models predict the future (Section 5.1); the influence of the amount of training data on the model performance (Section 5.2); the contribution of the geospatial features to the overall model performance (Section 5.3); and finally the spatial variation of parking occupancy prediction (Section 5.4).

Prediction horizon
Figure 3 shows the performances as a function of the prediction horizon, up to 10 steps, in terms of MAE for both the RF and ANN algorithms. Generally, it is apparent that an increased prediction horizon had a negative influence on the performance, as the error increased. This held true for all feature sets and both prediction models. Hence, the further into the future parking occupancy was predicted, the more error-prone was the result.
A comparison of the feature sets revealed that there was little difference for short-term predictions (i.e. predictions of a few steps, or hours, ahead). For a 1-step (1 h) ahead prediction, results were almost identical among feature sets. However, as the prediction horizon increased, differences became more prominent.
Predictions on feature sets containing geospatial information (FS2 to FS8) consistently outperformed the baseline (FS1) on all prediction steps when the RF model was considered. More importantly, predictions based on all the categories of geospatial features (i.e. FS8) performed considerably better than all other feature sets. The same mostly applied to the ANN model. Table 4 provides an overview of model performances for all feature sets in terms of MAE, MSE and R². Results of prediction horizons of 1, 5 and 10 steps are listed.
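To make the notion of an h-step-ahead prediction concrete, the following sketch shows one common way to frame it: each observation is paired with the occupancy recorded h steps later. The function name `make_horizon_pairs` and the hourly series are illustrative assumptions, not the preprocessing used in this study.

```python
def make_horizon_pairs(occupancy, horizon):
    """Pair the occupancy at step t with the occupancy at step t + horizon.

    Returns a list of (last_known_occupancy, target_occupancy) tuples.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1 step")
    return [
        (occupancy[t], occupancy[t + horizon])
        for t in range(len(occupancy) - horizon)
    ]


# Made-up hourly occupancy rates for one street segment.
series = [0.2, 0.3, 0.5, 0.7, 0.8, 0.6, 0.4]

one_step = make_horizon_pairs(series, horizon=1)
five_step = make_horizon_pairs(series, horizon=5)
```

With a longer horizon, the last known occupancy lies further from the target in time, which is consistent with the growing error observed for longer prediction horizons.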

Training dataset size
To investigate how the results change under different amounts of training data, we vary the training dataset sizes from 1 day to 10 days in 1-day increments. This investigation is of interest, as sometimes the data available for model training might be very limited, especially at the beginning of the data collection. Figure 4 illustrates the relationship between training dataset size and prediction performance for a prediction horizon of 5 steps/h.
When considering the RF model (Figure 4(a)), an increased amount of data input was beneficial up to a training dataset comprising 5 days, where an optimum was reached. An input of additional training data (i.e. 6 to 10 days) was not associated with model improvement. This applied to all prediction horizons (though only the result for a prediction of 5 steps/h ahead is shown). Moreover, patterns were very similar regardless of the prediction horizon. However, it is important to note that performance differences as a function of the training dataset size were relatively small. It should also be noted that the overall pattern was very similar for all feature sets and that the baseline was outperformed by all other feature sets. Again, predictions using FS8 achieved best results consistently.
In comparison to the RF model, the impact of the training dataset size differs for the ANN model in terms of both pattern and magnitude (Figure 4(b)). Although best results were generally also achieved using a training dataset size of about 5 days, the model benefited from an increased amount of training data to a much greater extent. This was especially apparent for shorter prediction horizons. Moreover, similar to the results for RF, using FS8 generally achieved best results.
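The same-weekday scheme used to vary the training dataset size can be sketched as follows. This is an illustration under stated assumptions: the helper `same_weekday_split` and the calendar dates are hypothetical and do not correspond to the actual study period.

```python
from datetime import date, timedelta


def same_weekday_split(test_day, n_train_days):
    """Return (training_days, test_day) using only the weekday of `test_day`.

    The training days are the `n_train_days` same-weekday dates that
    directly precede the test day, in chronological order.
    """
    if n_train_days < 1:
        raise ValueError("need at least one training day")
    train = [test_day - timedelta(weeks=k) for k in range(n_train_days, 0, -1)]
    return train, test_day


# Three consecutive Mondays for training, the following Monday for testing.
# The date is an arbitrary Monday chosen for illustration.
train_days, test_day = same_weekday_split(date(2021, 3, 29), n_train_days=3)
```

Repeating the split for `n_train_days` from 1 to 10 and for each weekday, then averaging the resulting errors, reproduces the structure of the experiment described above.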

Feature importance
In order to evaluate the contribution of individual geospatial features to the overall model performance, feature importance was determined for the RF model. Figure 5 gives insight into feature importance in terms of IncMSE.
When average values across all prediction horizons are considered, the time of day feature exceeds an IncMSE of 0.05 and contributes most to the prediction model, followed by the historical occupancy. In terms of geospatial features, the land use feature office showed the highest predictive importance. The POI feature tourist attractions contributed slightly less, followed by the POI feature business. Centrality features alpha and betweenness contributed least to the prediction model.
Figure 6 shows the performance improvement as a function of the prediction horizon. For the RF algorithm (Figure 6(a)), the length of the prediction horizon was clearly correlated with a model performance improvement when results of FS2 to FS8 were compared to those of the baseline (FS1). The longer the prediction horizon, the more the values of each feature set diverged. Accordingly, on a short-term prediction horizon, the inclusion of geospatial information produced improvements of 3.1-4.2%, whereas on a prediction horizon of 10 steps these values amounted to 16.5-25.4%. The most significant improvements were recorded for FS8 across all prediction horizons, where all three categories of geospatial features were included.
Unlike the RF model, performance improvement for the ANN model did not steadily increase as a function of the prediction horizon (Figure 6(b)). Nevertheless, on longer prediction horizons, improvements were achieved. On average, again, FS8 showed most improvement across all prediction horizons, ranging from approximately 0-7%. It should be noted that certain prediction horizons (e.g. 3 and 8 steps) showed more improvement than others. However, no clear pattern could be recognized.

Quantitative assessment
A quantitative assessment was carried out by comparing the prediction model performance across the parking districts. The performances of each district on a 5-step prediction horizon are shown in Figure 7. Generally, there were performance disparities between the parking districts. The district Mission recorded the best parking occupancy prediction results for its parking segments for both feature sets and algorithms. Accordingly, using the RF model with FS8 led to an MAE of 0.06 on a 5-step prediction horizon for street segments in this district. In contrast, parking occupancy rates were most difficult to predict in the districts Civic Center and Downtown. The median MAE value in the former was about 0.14 with the above stated prediction set-up.
The introduction of geospatial information (i.e. the usage of FS8) improved median occupancy prediction values for all districts on all prediction horizons for both RF and ANN algorithms.
Drawing on the above results, relative model improvement for each district was derived by computing the relative differences in terms of MAE between FS8 and the baseline (FS1) in the RF model. As shown in Figure 8, by including geospatial information in the prediction model, performance improvements were observed for all the districts. However, prediction improvements deviated across the districts. The most significant improvements were recorded in the districts Mission and Inner Richmond on a 1-step prediction horizon and in Mission and Fisherman's Wharf on a 5-step prediction horizon. Median improvements reached 5-10% on the 1-step horizon and around 25% on the 5-step horizon. Parking segments in Civic Center saw the least prediction improvement, with median values of 0% and 10% on a 1-step and a 5-step prediction horizon, respectively.
Comparing Figures 7 and 8, we can see that locations that were difficult to predict when using the baseline remained difficult to predict after the inclusion of geospatial information in the model.
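The district-level comparison above rests on a simple computation: the relative difference in MAE between a feature set and the baseline for each street segment, summarized by the median per district. A minimal sketch follows; the helper names and all MAE values are made up for illustration and are not results from the study.

```python
from collections import defaultdict
from statistics import median


def relative_improvement(mae_baseline, mae_model):
    """Percent reduction in MAE relative to the baseline (positive = better)."""
    return 100.0 * (mae_baseline - mae_model) / mae_baseline


def median_improvement_by_district(records):
    """Median relative improvement per district.

    `records` is an iterable of (district, mae_baseline, mae_model) tuples,
    one per street segment.
    """
    grouped = defaultdict(list)
    for district, mae_base, mae_model in records:
        grouped[district].append(relative_improvement(mae_base, mae_model))
    return {district: median(values) for district, values in grouped.items()}


# Hypothetical per-segment MAE values (baseline, model with geospatial features).
segments = [
    ("District A", 0.10, 0.08),
    ("District A", 0.20, 0.15),
    ("District A", 0.50, 0.35),
    ("District B", 0.25, 0.25),
]
summary = median_improvement_by_district(segments)
```

Because the improvement is relative, a segment with a high baseline error can show a sizeable percentage gain and still remain among the hardest to predict in absolute terms.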

Qualitative assessment
In the following, a qualitative assessment of the spatial variability of prediction performance is provided. The parking segments' occupancy prediction results using FS8, and the relative performance improvement of FS8 compared to the baseline are visualized in Figure 9.
In general, no consistent pattern across space is identifiable for model performance with FS8 (Figure 9(a)). In many cases, parking segments whose occupancy rates were reliably predicted are located next to or in the vicinity of others with relatively poor model performance. Especially in the north-eastern parking districts of Fisherman's Wharf and Downtown as well as in the central districts of Civic Center and Fillmore, prediction results across parking segments were fairly heterogeneous. Nevertheless, in the southern parking district of Mission, parking occupancy was predicted very reliably for all segments.
Similarly, there was no distinct pattern when the relative performance improvement was considered (Figure 9(b)). However, some parking segments with already difficult to predict occupancy rates did not benefit much from the inclusion of geospatial information, especially in the centrally located Civic Center district. In contrast, parking segments that achieved good prediction results using the baseline experienced further improvement when geospatial information was added (e.g. in the Mission district).
Figure 9. Spatial distribution of performance on a 5-step prediction horizon. Random forest algorithm. (a) FS8 and (b) relative performance improvement using FS8 compared to the baseline (FS1).
To further investigate the effect of spatial context on parking occupancy prediction, the district Civic Center was examined in more detail. Figure 10 depicts a close-up view of the district. The performance of FS8 and the relative performance improvement compared to the baseline on a 5-step prediction horizon are shown. Additionally, the locations of the POIs (business, tourist, and public transport) are included. Evidently, there were two parking segments with especially distinct differences in terms of model performance in the Civic Center district (see the labeled street segments in Figure 10). 200 Polk Street (2) recorded an MAE of 0.28, whereas 500 Hayes Street (1) recorded an MAE of merely 0.07. Moreover, the inclusion of geospatial information improved the former and the latter by 12.5% and 35.2%, respectively. Other parking segments in the district recorded MAE performance values of 0.11 to 0.20 and improvements of −14.6 to 22.7%. Furthermore, it is apparent that there are only few businesses in the vicinity of 200 Polk St, the parking segment that was most difficult to predict. In contrast, a high number of businesses can be found at or near 500 Hayes St and other parking segments that were relatively easy to predict.
Another possible explanation of the diverging occupancy prediction results for the above two parking street segments might be the variation of their occupancy rates over time (Figure 11). Evidently, the occupancy rate for 200 Polk St fluctuated heavily over time, regularly reaching occupancy rates of 0 and 1. 500 Hayes St, by contrast, recorded steady rates, mostly hovering between 0.6 and 0.8. This suggests that parking segments with significant temporal variability of their occupancy rates are generally more difficult to predict.
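One simple way to quantify the temporal variability discussed above is the standard deviation of a segment's occupancy rates. The sketch below is an assumption-laden illustration: the study does not state that this measure was used, and both series are made up to mimic the two described patterns rather than taken from the recorded data.

```python
from statistics import pstdev


def occupancy_variability(rates):
    """Population standard deviation of a segment's occupancy rates (0 to 1)."""
    return pstdev(rates)


# Made-up hourly series: one swinging between empty and full, one steady.
fluctuating = [0.0, 1.0, 0.1, 0.9, 0.0, 1.0, 0.2, 0.8]
steady = [0.6, 0.7, 0.8, 0.7, 0.6, 0.7, 0.8, 0.7]

variability = {
    "fluctuating": occupancy_variability(fluctuating),
    "steady": occupancy_variability(steady),
}
```

Ranking street segments by such a measure and comparing the ranking with their MAE would be one way to examine whether heavily fluctuating segments are indeed the hardest to predict.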

Summary of results
In summary, the most relevant findings are:
• The incorporation of geospatial information consistently helped to improve prediction models. Improvements of up to 25.4% compared to the baseline were achieved on long-term prediction horizons (e.g. 10 h ahead prediction).
• The inclusion of more geospatial information was associated with better model performance. Land use and POI information were more beneficial than centrality. Among all the geospatial features, the land use feature office showed most predictive relevance, followed by POI features tourist and business.
• Longer prediction horizons entailed less reliable predictions.
• With the RF and FFNN models currently implemented in this study, an increased amount of training data (i.e. more days of training data) did not necessarily improve the prediction model.
• There were performance disparities across the parking districts. However, no clear spatial pattern was observed. The introduction of geospatial information into the model did not entail uniform improvement across space.
• The ANN model was outperformed by the RF model across all metrics. Moreover, the RF model benefited from geospatial information to a much larger degree.

Discussion

Benefits of the inclusion of geospatial data
To answer the leading research question, the benefit of adding geospatial information to the parking occupancy prediction models was addressed. In order to do so, experiments were conducted that compared the relative performance improvement of data inputs containing geospatial information (FS2 to FS8) with the baseline (FS1). Additionally, the importance of each individual feature was explored for the RF model using FS8. Overall, prediction performance could be increased by up to 25.4%. Across all prediction horizons, FS8 consistently performed best, indicating that more geospatial input is beneficial. These findings are relevant in that they show that incorporating the underlying geographic context helps to improve parking occupancy prediction models. Moreover, the results are novel: to the best of our knowledge, no study has previously considered the explicit inclusion of geospatial information for parking occupancy prediction problems.
In other fields, geospatial information has also been found beneficial as input for prediction problems. POI/land use data have been implemented for traffic models, potentially improving conventional models (Krause and Zhang 2019;Luo 2010;Sarlas and Axhausen 2016). Similarly, Chan and Cooper (2019) used centrality information to predict bicycle mode share and flows, achieving comparable results to more complex models that lack spatial input.
Generally, the prediction models performed better incorporating the geospatial categories land use and POI, whereas centrality contributed less to the prediction. It could be argued that the categories land use and POI are conceptually more similar and therefore had a similar impact on the prediction model. Moreover, the mere location of a parking street segment with respect to the entire street network did not appear to play as influential a role as the configuration of geographic context in its local vicinity.
Results of overall contributions of geospatial categories agreed with individual feature importance. Each POI feature contributed more to the model than centrality features. The land use feature office was considerably more beneficial than residential and industrial. This could partially be explained by the fact that many parking districts in the study area lack industrial areas. The land use classes residential and office, on the other hand, are more evenly distributed, with high concentrations in certain areas.

Training dataset size (Sub-RQ1)
As mentioned before, the impact of varying the amount of data inputs on the model performance is also a critical aspect, as sometimes the data available for model training might be very limited, especially at the beginning of the data collection. In this study, the amount of input data to train the model was varied from 1 day to 10 days of occupancy data. Overall, little change was recorded for the RF model. The fact that an increased amount of training data did not lead to improved results is unexpected, as a larger amount of input data is generally associated with increased performance (Figueroa et al. 2012;Bock 2018). Since the recorded occupancy data was relatively uniform across time (i.e. there were no significant deviations, such as special events or holidays), unexpected anomalies in the data can be excluded as a reason. Hence, further research is needed to investigate the influence of the training dataset size in the context of the prediction model implemented with RF.
The ANN model, by contrast, benefited significantly from an increased amount of training data, especially on short-term predictions. These findings are in agreement with Ji et al. (2015) and Bock (2018), who found that their models improved when more input data was added.

Temporal prediction horizon (Sub-RQ2)
Further, the influence of the temporal prediction horizon on the model performance was examined. Prediction horizons of 1 to 10 steps were considered, whereby 1 step corresponds to 1 h. Unsurprisingly, as the prediction horizon increased, the prediction results became less reliable, due to the fact that errors accumulate. This is in agreement with several studies that compared the length of the prediction horizon with the model performance (Monteiro and Ioannou 2018;Zheng, Rajasegarar, and Leckie 2015;Liu et al. 2018;Mei et al. 2019).
The fact that the length of the prediction horizon was correlated with the model performance was also reflected in the feature importance. The further the occupancy rates lay in the past, the less important they were to the model. As a result, beyond a certain prediction horizon, the temporal and geospatial features were relatively more important. This is also consistent with the study of Zheng, Rajasegarar, and Leckie (2015) who found that the inclusion of previous observations as a feature is very beneficial for short-term predictions. However, as the prediction horizon increased, other features became more relevant.

Spatial prediction variation (Sub-RQ3)
The spatial variation of the prediction performance was evaluated to address Sub-RQ3. Evidently, there were spatial disparities, manifested by prediction performances that were not uniform across space. Parking occupancy could be predicted more reliably in certain districts than others. However, as a whole, no clear pattern was recognizable. The same applied to the performance improvement after the inclusion of geospatial information, i.e. the extent of improvement showed no clear spatial pattern either. Nevertheless, the model was improved for many street segments whose occupancy rates were already reliably predicted. Conversely, a considerable number of street segments recording unreliable predictions could only slightly benefit from the inclusion of geospatial information. Hence, locations that were notoriously difficult to predict remained difficult to predict after the inclusion of geospatial information in the model. A potential reason might also be that both the RF and FFNN developed in this study might not be able to accurately model the changing patterns of the occupancy on heavily fluctuating data (e.g. the data of "200 Polk St", see Figure 11). Moreover, proximity of parking street segments did not automatically suggest similar prediction performance, as already suggested by Rajabioun and Ioannou (2015), Richter, Martino, and Mattfeld (2014), and Leu and Zhu (2015).

Machine learning algorithm comparison (Sub-RQ4)
Finally, Sub-RQ4 aimed to discover performance differences of ML algorithms that were implemented in this work, namely RF and ANN. The performance of RF surpassed that of the ANN in every aspect. Moreover, the ANN model was much less receptive to the inclusion of geospatial information, recording an improvement of up to 7.2%, compared to up to 25.4% for RF.
In the literature, there is no consensus as to which algorithm is better suited to solve the problem of parking occupancy prediction. Both RF and ANN algorithms have been used to reliably make predictions (e.g. Bock 2018;Dias, Bellalta, and Oechsner 2015;Zheng, Rajasegarar, and Leckie 2015;Awan et al. 2020;Koumetio Tekouabou et al. 2020). The fact that an FFNN algorithm was used instead of a more complex RNN or LSTM can be seen as a limitation. Therefore, the performance of other neural networks should be explored in future work. However, literature has shown that RNN or LSTM algorithms are not necessarily better suited than other algorithms (e.g. Arjona et al. 2020;Badii, Nesi, and Paoli 2018). The fact that the RF model outperformed the ANN model considerably in this work could be explained by its robustness to overfitting and its relative simplicity.

Conclusion and future work
Prediction of parking availability in urban environments has attracted significant research interest in recent years. Different prediction models have been proposed, considering various external data inputs, such as temporal, weather and traffic information. However, the underlying geospatial context of the study area has received little attention so far. This work aimed to investigate the value of geospatial information for the prediction of on-street parking occupancy. To this end, RF and ANN (i.e. FFNN in this paper) prediction models were implemented and geospatial data regarding road network centrality, land use, and POIs were used as input features. The key findings can be summarized as follows:
• The inclusion of geospatial information leads to a performance improvement of up to 3% and 25% on a short-term and a long-term prediction horizon, respectively. Hence, the incorporation of geospatial context adds value to parking occupancy prediction models. The inclusion of land use and POI information was more beneficial than network centrality measures.
• Generally, longer prediction horizons (i.e. how far ahead a model seeks to predict the future) produced less reliable predictions, due to the accumulation of errors. Nevertheless, the inclusion of spatial information showed much more relevance in long-term predictions.
• The amount of training data did not significantly impact the RF model's prediction performance. As occupancy data anomalies could be excluded, the reason is unknown and has to be investigated in further research. The ANN model's performance, however, benefited significantly from an increased amount of training data.
• There were prediction performance disparities across parking districts. Moreover, no clear spatial pattern could be identified, and proximity of parking segments did not necessarily imply similar prediction results.
• In terms of model performance, the RF model outperformed the ANN model in all respects. A possible explanation is its robustness to overfitting as well as its relative simplicity compared to the ANN.
In spite of the promising results, there are opportunities to further advance the research on parking occupancy prediction incorporating geospatial information. Firstly, more geospatial data sources such as the population distribution, or the public transportation network could be considered. Moreover, instead of using the last occupancy rate, a sequence of historical occupancy rates might be employed. Secondly, research could be extended in terms of prediction algorithms. For example, the usage of ANNs could be further exploited by implementing more complex architectures (e.g. RNNs and LSTM). The linking of advanced statistical methods (such as ARIMA) with ML would also be a conceivable option. Thirdly, to check the robustness of the findings, we are also interested in applying the proposed evaluation framework to other cities.