ActivityNET: Neural networks to predict public transport trip purposes from individual smart card data and POIs

ABSTRACT Predicting trip purpose from comprehensive and continuous smart card data is beneficial for transport and city planners in investigating travel behaviors and urban mobility. Here, we propose a framework, ActivityNET, using Machine Learning (ML) algorithms to predict passengers’ trip purpose from Smart Card (SC) data and Points-of-Interest (POIs) data. The feasibility of the framework is demonstrated in two phases. Phase I focuses on extracting activities from individuals’ daily travel patterns from smart card data and combining them with POIs using the proposed “activity-POIs consolidation algorithm”. Phase II feeds the extracted features into an Artificial Neural Network (ANN) with multiple scenarios and predicts trip purpose under primary activities (home and work) and secondary activities (entertainment, eating, shopping, child drop-offs/pick-ups and part-time work) with high accuracy. As a case study, the proposed ActivityNET framework is applied in Greater London and illustrates a robust competence to predict trip purpose. The promising outcomes demonstrate that the cost-effective framework offers high predictive accuracy and valuable insights into transport planning.


Introduction
Activity-based models aim to predict travel demand using trip purposes to understand and plan transport network usage under different socio-economic scenarios and land use structures. Transport planning with such models relies on travel surveys, which have relatively small sample sizes, are expensive to obtain and have relatively low update frequencies (collected for only one day). Therefore, they are prone to bias when estimating travel demand for the whole population (Yang et al. 2019). On the other hand, Smart Card (SC) data have shown great potential for investigating passengers' daily activities at an unprecedented scale, covering a much larger population and a longer period of data collection (Anda, Erath, and Fourie 2017). In addition, smart card data reveal an individual's spatial-temporal activity pattern as a sequence of activity locations, activity start and end times, activity durations and land use in the proximity of the alighting or boarding station (Faroqi, Mesbah, and Kim 2018), which could be further explored to derive the trip purpose of the travelers (Sari Aslam and Cheng 2018; Sari Aslam et al. 2020).
Trip purpose is essential for planning purposes, performance evaluation and the development of public transit networks and services (Faroqi, Mesbah, and Kim 2018). The scope of the research extends to consumer behavior for commercial establishments (Longley, Cheshire, and Singleton 2018), urban mobility and people flows for city planners (Yang et al. 2019), the aspiration for quality of life for economists (Nakamura et al. 2016), and public health for policy and decision-makers, e.g. the spread of COVID-19. Thus, longitudinal smart card data with volume and detail need to be investigated for trip purposes, such as home, work, entertainment, eating, shopping, drop-offs/pick-ups, and part-time work activities. However, the majority of trip purpose identification models from smart card data focus only on primary activities, such as home and work (Chakirov and Erath 2012; Devillaine, Munizaga, and Trepanier 2012; Zou et al. 2016; Yang et al. 2019), and rarely on secondary activities (Alsger et al. 2018; Sari Aslam et al. 2020). The reason is that the defined rules and the number of constraints are limited, which reduces the ability to identify trip purposes with high accuracy, specifically for secondary activities (Xiao, Juan, and Zhang 2016; Anda, Erath, and Fourie 2017), which are complex compared to regular commuters' activities. Therefore, there is a need to investigate trip purposes using data-driven Machine Learning (ML) approaches, which are flexible enough to capture complex information about trip purposes. Besides, they are capable of handling non-linear problems with high accuracy (Xiao, Juan, and Zhang 2016; Anda, Erath, and Fourie 2017).
Although ML methods have focused on clustering and classification of trips, passengers, and stations to investigate travel patterns and behaviors from smart card data (Faroqi, Mesbah, and Kim 2018), trip purposes have hardly been investigated from individuals' activities (Lee and Hickman 2014; Kusakabe and Asakura 2014; Han and Sohn 2016). The model performance is low compared to other methods for the following reasons. First, the noise in unprocessed smart card data requires pre-processing steps before applying prediction models to achieve high accuracy (Dacheng et al. 2018). Second, input features aggregated per user from a large volume of travel data, such as average travel duration and average departure time of the first/last trips (Goulet-Langlois, Koutsopoulos, and Zhao 2016; Han and Sohn 2016), may not accurately represent activity points. Third, how robustly smart card data are combined with other data sources is a crucial step in presenting the semantic interpretations of activities (Yang et al. 2019).
Therefore, in this study, we propose using the ActivityNET framework to predict passengers' trip purposes for each activity per individual from their smart card data. The framework consists of two phases. The first phase of the study focuses on extracting spatiotemporal activities from smart card data and combining these activities with points of interest (POIs) using an "activity-POIs consolidation algorithm". This part of the study offers an understanding of human mobility and urban flows from two big data sources in cities. In addition, the combined dataset provides input features under three sub-groups: activity characteristics, day characteristics, and land use characteristics. The second phase of the study uses the input features with multiple scenarios and predicts trip purposes with an Artificial Neural Network (ANN) under primary (home and work) and secondary activities (entertainment, eating, shopping, child drop-offs/pick-ups, and part-time work activities) with high accuracy.
The contributions of this study are summarized as follows:
• The proposed "activity-POIs consolidation algorithm" aims to explore how two large datasets, smart card data and POIs, are combined for trip purpose prediction.
• The proposed ActivityNET framework uses multiple scenarios and predicts trip purposes of primary and secondary activities using ML algorithms with high precision.
• The trip purpose prediction model, ActivityNET, is a cost-effective method using smart card data and POIs to help transport and urban planning.
The next section of the paper presents the data and methods with a logical framework. The following section (section 3) offers the results of a case study in London. Finally, discussion and conclusions of the work are presented in sections 4 and 5, respectively.

Methods
The proposed ActivityNET framework in Figure 1 predicts trip purpose in two phases. In Phase I, two large data sources, namely smart card data and POIs, are combined using the proposed "activity-POIs consolidation algorithm" after extracting activities. Thus, the location information from travel data, e.g. station name, can be enhanced by dynamic socioeconomic land use attributes. In Phase II, the extracted spatial-temporal features are selected with multiple scenarios and passed into the model to predict trip purposes within sub-categories, e.g. home, work, entertainment, eating, shopping, child drop-offs/pick-ups, and part-time work activities. Hence, the reason for the trips is investigated, revealing why people spend their spare time within the city using smart card data with the help of POIs.

Phase I: data pre-processing
This section aims to increase the accuracy of the large travel datasets by cleaning the SC data. First, single trips in a day are excluded, because a single trip provides insufficient information to define an activity. Thus, 1060 single trips are excluded from the total of 19,792 trip records. Besides, 499 trips with missing information, e.g. alighting time or station, boarding time or station, are also excluded, because they create uncertainty when extracting activities (Chakirov and Erath 2012). After the data pre-processing, the travel data in combination with POIs are used in the prediction model to explore trip purposes from travel data.
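The two cleaning rules above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (a dict per trip with `card_id`, `date` and the four boarding/alighting fields) and the function name `clean_trips` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical field names; the real Oyster card schema may differ.
REQUIRED_FIELDS = ("boarding_time", "boarding_station",
                   "alighting_time", "alighting_station")


def clean_trips(trips):
    """Apply the two exclusion rules in the order given in the text."""
    # Rule 1: group trips per card and day, drop days with a single trip.
    by_card_day = defaultdict(list)
    for trip in trips:
        by_card_day[(trip["card_id"], trip["date"])].append(trip)
    multi_trip = [t for group in by_card_day.values()
                  if len(group) > 1 for t in group]
    # Rule 2: drop trips with a missing boarding/alighting time or station.
    return [t for t in multi_trip
            if all(t.get(field) is not None for field in REQUIRED_FIELDS)]
```

Whether a card-day that is left with one usable trip after the second rule should also be discarded is not stated in the text, so the sketch leaves such trips in place.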

Extract activities.
A trip is defined as a one-way journey from one stop to another stop. An activity is the time duration between two consecutive trips, i.e. between the alighting station of the first trip and the boarding station of the second trip. There is a sequence of activities in a day per individual with their characteristics, such as the start and end time of the activity, the location of the activity, and the day of the activity, which can be used to infer trip purposes. Inferring the trip purpose (the reason for the trip) means answering the question "why has an activity happened at a specific location and time?" To achieve this, the location of the transit data needs to be enriched using other data sources, e.g. land use information. Then, it is possible to infer trip purposes based upon the type of activities, such as home, work, entertainment, eating, shopping, and other activities from POIs (Faroqi, Mesbah, and Kim 2018).
The assumptions of activity extraction are applied in this stage (Sari Aslam et al. 2020) using transfer time and walking distance between public transit stops, which were assumed to be 15 min (TfL 2019) and 800 m (RTPI 2018; Alsger et al. 2018;Sari Aslam et al. 2020), respectively. The resulting dataset consists of 18,232 trip records, which means 9,116 data points (activities) from smart card data.
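The activity definition and the two thresholds can be illustrated with a short sketch. How exactly the thresholds are applied is an assumption here: a gap of at most 15 min between alighting and the next boarding is treated as a transfer, and an activity is only kept when the next boarding stop lies within 800 m of the alighting stop. The record layout and the names `extract_activities` and `haversine_m` are hypothetical.

```python
from datetime import timedelta
from math import asin, cos, radians, sin, sqrt

MAX_TRANSFER = timedelta(minutes=15)  # transfer time assumed in the text
MAX_WALK_M = 800.0                    # walking distance assumed in the text


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371000.0 * asin(sqrt(a))


def extract_activities(day_trips):
    """Turn one card's trips for one day into a list of activities."""
    trips = sorted(day_trips, key=lambda t: t["board_time"])
    activities = []
    for prev, nxt in zip(trips, trips[1:]):
        gap = nxt["board_time"] - prev["alight_time"]
        walk = haversine_m(*prev["alight_xy"], *nxt["board_xy"])
        # Assumed rule: short gaps are transfers, distant re-boardings
        # cannot be tied to a single activity location.
        if gap > MAX_TRANSFER and walk <= MAX_WALK_M:
            activities.append({"start": prev["alight_time"],
                               "end": nxt["board_time"],
                               "duration": gap,
                               "location": prev["alight_xy"]})
    return activities
```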

Combining both datasets using activity-POIs consolidation algorithm.
POIs from Twitter and Foursquare data have been used to investigate trip purposes, human mobility and urban flows to generate an understanding of transport and urban planning in cities (Rashidi et al. 2017). To infer activities from transit data, the highest probability of activity types has been determined from POIs (Alsger et al. 2018; Sari . However, in this study, we explain how both large datasets, i.e. smart card data and land use information (POIs), can be combined and used for the machine learning algorithm to predict trip purposes. Figure 2 presents the proposed "activity-POIs consolidation algorithm" with details in three sections. First, Figure 2(a) illustrates the proposed activity-POIs consolidation algorithm to explain how relevant POIs are filtered for each activity. The algorithm starts by selecting a station and an activity at that station. Then the activity is checked: "do we have POIs at the station within walking distance?" If yes, a POI is selected for that activity. Then, the activity-POI temporal information match is tested against two conditions: "the start time of the activity is later than (>) the opening time of the POI and the end time of the activity is earlier than (<) the closing time of the POI". If the conditions are met, the number of check-ins is added under the activity type of the POI. Then, the algorithm moves to the next POI for the same activity. Once all possible POIs have been checked, the activity has the total number of check-ins for each of the activity types: home (H), work (W), entertainment (ENT), eating (EAT), shopping (SHO), outdoor & recreational (REC), and travel & transport (TPO). This process is conducted for all activities at each station. Thus, the characteristics of land use information using the check-ins of POIs are assigned to each activity with different weights.
Figure 2(b) illustrates the same scenario using data characteristics under three categories, including spatial information match using the coordinates of both datasets, temporal information match using the start/end time of activities from smart card data and opening/closing hours of POIs, and attractiveness of each activity using the total number of check-ins for the activity types from the POIs. The opening hours of the POIs may have some variation on different days. If this is the case, the earliest and latest working hours are used for each POI, e.g. if opening/closing hours of a place are 10:00/15:00 from Monday to Friday and 12:00/16:00 on Saturday and Sunday, the opening/closing hours are considered to be 10:00/16:00 for the place.
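The spatial match, temporal match and check-in aggregation described above can be sketched as below. This is an illustrative reading of the algorithm rather than the authors' implementation: the dictionary layout, the function names and the use of a haversine distance for the 800 m walking radius are assumptions.

```python
from datetime import time
from math import asin, cos, radians, sin, sqrt

ACTIVITY_TYPES = ("H", "W", "ENT", "EAT", "SHO", "REC", "TPO")


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371000.0 * asin(sqrt(a))


def opening_envelope(daily_hours):
    """Earliest opening and latest closing over all (open, close) pairs."""
    return (min(o for o, _ in daily_hours), max(c for _, c in daily_hours))


def consolidate(activity, pois, walk_m=800.0):
    """Sum check-ins per activity type over POIs that match the activity."""
    totals = dict.fromkeys(ACTIVITY_TYPES, 0)
    for poi in pois:
        # Spatial information match: POI within walking distance.
        if haversine_m(*activity["station_xy"], *poi["xy"]) > walk_m:
            continue
        # Temporal information match: activity strictly inside opening hours.
        opens, closes = opening_envelope(poi["hours"])
        if activity["start"] > opens and activity["end"] < closes:
            # Attractiveness: accumulate check-ins under the POI's type.
            totals[poi["type"]] += poi["checkins"]
    return totals
```

For instance, `opening_envelope([(time(10), time(15)), (time(12), time(16))])` returns 10:00 and 16:00, matching the opening-hours example in the text. The resulting dictionary of seven totals corresponds to the land use characteristics attached to each activity.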
The third column visualizes the same scenario using an example. Figure 2(c) starts with the spatial information match for an activity (A1) at a station (Oxford Circus station) using "walking distance 800 m", which captured 3023 POIs for A1. The same example is further investigated for A1 considering the temporal information match, as displayed in Figure 2(d). The start/end times of A1 are 10:00/13:00 and the opening/closing hours of the first POI (POI1Sho) are 9:00/22:00. According to the temporal information match, the time variables overlap; thus, POI1Sho moves to the next step and the number of check-ins is saved for the corresponding activity type (POISHOs) in Figure 2(e). Then, the next POIs (POI2Wor and POI3Eat) are similarly checked based on temporal information. The number of check-ins for POI2Wor is added to POIWORs, but the number of check-ins for POI3Eat is not counted in POIEATs due to non-overlapping temporal information. After running this process for each of the 3023 POIs, the aggregated check-ins are saved under seven categories for A1 as the characteristics of land use information, as shown in Figure 2(e).
Figure 2. A workflow for combining the two datasets. First, the proposed activity-POIs consolidation algorithm filters relevant POIs for each activity (a). Second, data characteristics are presented under the three subsections: spatial information match, temporal information match, and attractiveness (b). Third, an example, with visualizations, is presented spatially (c), temporally (d), and for aggregated (sum) check-ins for the activity types (e). In addition, m refers to the number of POIs around the station.
Note that the steps in Figure 2 may result in memory issues due to the processing of large datasets. The reason for this is that data processing packages, e.g. Pandas in Python, hold the data in the memory of a single machine. Therefore, PySpark is used to carry out the processing steps and analysis in this section.
As a result, combined input features are presented with details as temporal features (activity characteristics, day characteristics) and spatial features (land use characteristics) in Table 2.

Phase II: prediction of trip purposes
This section shows the structure of the model, training the model using input features, and the prediction of trip purposes using the trained model illustrated under "phase II" in Figure 1.

The structure of the artificial neural network with multiple scenarios.
The artificial neural network is applied for predictive analysis to classify multi-class trip purposes using its non-linear pattern classification capabilities. The reason is that neural networks are capable of handling the dimensionality of the problem using spatial dependencies in a large dataset with high accuracy and low computing time (Xiao, Juan, and Zhang 2016; Ibrahim et al. 2019), while statistical models are parametric and suffer from high computational complexity in large-scale scenarios. On the other hand, standard ML methods have narrow architectures that cannot comprehensively handle non-linear, large spatial-temporal data with high dimensionality.
The details of the structure of the model illustrated in Figure 3 are provided in the following subsections:
1. Input layer: The first layer of the neural network transfers the information from the input features using the same dimensionality. Due to class imbalance issues (see section 2.1.1), three options are compared in this section: (1) a random over-sampling technique that randomly duplicates data points in the minority classes, (2) a random under-sampling technique that randomly removes data points from the majority classes (Brownlee 2020a), and (3) unchanged values. In addition, the dimensionality of the layer is varied by including spatial features (input dimension = 11, with POIs) and excluding them (input dimension = 4, without POIs) to evaluate overall accuracy with different scenarios in the model (section 3.2).
2. Hidden layers: These layers process the information from the input layer to the output layer. In this section, the number of neurons and the activation functions need to be investigated. Even though there is no rule of thumb for choosing the number of layers in a neural network (Goodfellow, Bengio, and Courville 2017), two hidden layers perform the transformation, one with 100 and one with 60 units, which are activated using the Rectified Linear Unit (ReLU) (Glorot, Bordes, and Bengio 2011) to increase the nonlinearity of the model and improve the performance of the units (Dahl, Sainath, and Hinton 2013).
The dropout regularization technique (Hinton et al. 2012) is applied after the hidden layers with a dropout rate of 0.5 to reduce overfitting. The cross-entropy loss is applied to the model as the training objective function. The model is compiled using the Adam optimizer, a stochastic gradient-based method (Kingma and Ba 2015), to minimize the loss function with an initial learning rate of 0.001. Different mini-batch sizes with different numbers of epochs are also investigated, and the best accuracy is attained using a batch size of 64 with 700 epochs during the training process.
Hyper-parameters such as the number of neurons, dropout rate, optimizers, activation functions, and loss functions are tuned to decide the best possible parameters in the model using grid search techniques (one parameter is changed while others are unchanged) (Brownlee 2020b).
3. Output softmax layer: The output layer is activated using the softmax function to distribute the probability across the output classes. For a given input, the class with the highest probability value is taken as the predicted output class. The proposed model is trained with 70% of the data (training data) and tested with the rest of the dataset (30% testing data).
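To make the layer structure concrete, the forward pass of the described network (11 inputs, hidden layers of 100 and 60 ReLU units, dropout, and a softmax output) can be written out in plain Python. This is a sketch under stated assumptions and not the authors' model: seven output classes are assumed from the seven trip purposes listed in the text, dropout is applied after each hidden layer, the weight initialization is arbitrary, and training with Adam (batch size 64, 700 epochs) is omitted entirely.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (scaled Gaussian, an assumption) and zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def dropout(v, rate, rng):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, layers, rng=None, rate=0.5):
    """Class probabilities; pass `rng` to enable dropout (training mode)."""
    h = x
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
        if rng is not None:
            h = dropout(h, rate, rng)
    weights, biases = layers[-1]
    return softmax(dense(h, weights, biases))


def cross_entropy(probs, true_index):
    """Loss for one sample whose true class is `true_index`."""
    return -math.log(max(probs[true_index], 1e-12))
```

A network with the stated dimensions would be built as `[init_layer(11, 100, rng), init_layer(100, 60, rng), init_layer(60, 7, rng)]`, and the predicted class is the index of the largest probability returned by `forward`.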

Evaluating and validating the model performance.
Validation of the model is crucial for the study, and the model evaluation is illustrated under two subsections. The first approach of evaluating model performance is achieved under three sub-categories: (1) evaluating the model performance with three measures, namely precision, recall, and F1-score (Brownlee 2020c), (2) plotting the confusion matrix to illustrate the prediction performance for each class independently, and (3) comparing the effectiveness of the model to other baseline models using cross-validation.
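The three per-class measures can be read directly off a confusion matrix, as the following sketch shows. It uses the standard definitions of precision, recall and F1-score; the function name and the convention that rows are true classes and columns are predicted classes are choices made for this example.

```python
def per_class_scores(confusion, labels):
    """Precision, recall and F1 per class from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    scores = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # missed members of class k
        fp = sum(row[k] for row in confusion) - tp   # wrongly assigned to class k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores
```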
The second approach of the validation focuses on the comparison of the accuracy obtained from the highest probability of land use information (Alsger et al. 2018; Sari Aslam et al. 2020). Thus, after phase 1, we have inferred the activities from smart card data using the highest probability of POIs as a benchmark model and compared the results with the survey smart card data. The validation of activity type has been calculated as follows:
\[ V_{A_T} = \frac{CA_T}{TA_T^{n}} \times 100 \]
where $A_T$ is the activity type, such as home, work, etc., $V_{A_T}$ is the percentage of validated activity type, $CA_T$ is the number of correctly identified activity points from labeled data using the highest probability of land use (POIs) values and $TA_T^{n}$ is the total number of n (check-ins) in the activity type. Hence, $CA_T$ is normalized based on the total number of check-ins. As a result, the accuracy for each activity type is presented in section 3.2.2.

The result of the multiple scenarios for input features to predict trip purposes
The classification methods have the potential to examine trip purpose within travel data (Kuhlman 2015; Alsger et al. 2018). However, the representation of trip purposes in each class with a different number of data points may create class imbalance issues in the ML approach (Brownlee 2020a). For instance, almost 60% of the activities in the survey data are primary activities, and 40% are secondary activities, which reveals that the count of each secondary activity is much lower than the count of each primary activity. Therefore, random over- and under-sampling techniques are compared to unchanged values of each class to evaluate overall accuracy. In addition, the classification accuracy using different scenarios, i.e. including and excluding land use attributes (with/without POIs, respectively), is also evaluated in this stage to obtain the best possible model performance. According to the results in Figure 4, using random under-sampling techniques with POIs achieved an overall accuracy of 94%. Conversely, without POIs the overall accuracy decreases to 88%. The accuracy of using over-sampling techniques with POIs was 96%, and the accuracy without POIs was 89%. Finally, without balancing any classes, the overall accuracy was 89% and 83% with and without POIs, respectively. In addition, Figure 4(c,d) illustrates the convergence of the model accuracy and loss using under-sampling with POIs. As a result of this section, the under-sampling technique has a lower impact on training speed than the over-sampling technique. In addition, there is a consistent 6% to 7% accuracy difference for each model with and without POIs, as shown in Figure 4(b). Therefore, the rest of the analysis is presented using random under-sampling with POIs.
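The two balancing strategies compared above can be sketched in a few lines. This is a generic illustration of random under- and over-sampling, not the authors' code: samples are assumed to be `(features, label)` pairs, and balancing every class to the smallest or largest class size is one common convention that the text does not spell out.

```python
import random
from collections import defaultdict


def _group_by_label(samples):
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append((features, label))
    return groups


def random_under_sample(samples, rng):
    """Randomly remove points so every class matches the smallest class."""
    groups = _group_by_label(samples)
    target = min(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced


def random_over_sample(samples, rng):
    """Randomly duplicate points so every class matches the largest class."""
    groups = _group_by_label(samples)
    target = max(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced
```

Under-sampling shrinks the training set while over-sampling enlarges it, which is consistent with the difference in training speed noted above.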

The results of the validation process
We validated the results using two approaches. First, we evaluated the model performance using the testing data. Second, we compared the proposed model against benchmark models using the highest probability of land use information from POIs.

Evaluating the model performance
This section presents the performance of prediction under three sub-sections. First, we evaluate the models using three performance metrics in each class: precision, recall, and F1-score (Brownlee 2020c). The best results in precision, recall, and F1 were attained for work activities (primary activities) and for child drop-offs/pick-ups and part-time work activities (secondary activities), as presented in Table 3.
Then, we present the confusion matrix to clarify the prediction performance for each class independently. The confusion matrix using test data in Figure 5 illustrates that the probability of a correct prediction is larger than that of a misclassification. The lowest prediction score is for shopping activities, with 17% misclassified as entertainment or eating activities. The misclassification may suggest that the temporal variation in the three activities is overlapping. For example, shorter duration shopping activities might be misclassified as eating, and longer duration shopping activities might be misclassified as entertainment. The scores for the primary activities are fairly close, with 99% of home and 97% of work activities correctly predicted. The best predictions among secondary activities are obtained for drop-offs/pick-ups (84%) and PT-work activities (81%) as a result of regular activity patterns. The remaining secondary activities present similar outcomes with high temporal stability and regularity, with 84% of entertainment activities and 76% of eating activities correctly predicted.
The third is the comparison of the model with other baseline models using 10-fold cross-validation. In this section, the trip purpose prediction accuracy of ActivityNET is compared with several baseline models: Random Forest (RF), Support Vector Machine (SVM) (Cortes and Vapnik 1995), Logistic Regression Classifier (LR) and Naïve Bayes (NB). In the existing literature, these models have been adopted for trip purpose prediction from different data sources, such as GPS and phone data, but not smart card data. Therefore, they are considered baseline models to compare to the proposed model in this study. As shown in Figure 6, the original data is randomly partitioned into 10 subsamples. The highest accuracy, between 86% and 99% with a 12% spread, is achieved using ANN. The second highest accuracy, 84%-89% with a 6% spread, is achieved using RF. The third highest accuracy, 78%-81% with the smallest spread, is captured using SVM. Finally, LR and NB produce the lowest accuracy results in the cross-validation analysis compared to the other classifiers. These results support the assertion that neural networks can build computation-intensive classification with high accuracy using transport smart card data and locational POIs information with the help of data preprocessing steps.
Figure 4. The representation of the data points in each method (a) and the results of overall prediction with/without POIs using unchanged data (UD), random under- and over-sampling (RUS and ROS, respectively) techniques (b), the model accuracy (c) and loss (d) using random under-sampling with POIs.
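The random partition into 10 subsamples used for cross-validation can be sketched as follows. The function name `k_fold_indices` and the fixed seed are assumptions for the example; each fold serves once as the test set while the other nine folds form the training set, and any classifier can then be fitted and scored per split.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slicing with a stride gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

The spread reported for each classifier then corresponds to the difference between the highest and lowest accuracy over the ten test folds.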

Validation of the model
This section aims to compare the accuracy of the proposed framework to existing models using the highest probability of land use information from POIs. Note that this part of the enrichment is obtained after phase 1. With this benchmark, 51% of work, 49% of home, 44% of entertainment, 33% of eating, 35% of shopping, 34% of D/P and 39% of PTW activities are identified correctly. Hence, the proposed ActivityNET framework demonstrates a higher success rate compared to rule-based techniques in the literature.
The reason for the low accuracy in the heuristic approaches is that the distribution of highly mixed land use provides lower accuracy than the distribution of single land use, such as residential or work centers. Besides, sophisticated techniques provide higher accuracy to predict trip purposes (Anda, Erath, and Fourie 2017).

Discussion
This study aims to predict trip purposes using the spatial and temporal attributes of transport data and land use data derived from POIs with machine learning algorithms. Multiple scenarios, including spatial features with a random under-sampling technique, are investigated to optimize the accuracy of the model. The overall accuracy values of the model predictions for the training and testing datasets are 99% and 94%, respectively. To investigate the model robustness further, cross-validation is applied to represent the difference between the highest and lowest accuracies achieved in ANN versus other baseline methods. The results for each activity type are shown based on precision, recall (sensitivity), and F1-score, as well as the confusion matrix. Our results show that the ActivityNET framework provides consistent accuracy and model stability in detecting trip purposes using machine learning techniques for further developments.
Using new big data sources, such as smart card data and POIs, provides an excellent opportunity to explain where, when, and why people spend their time within urban settings. Both data sources offer great opportunities, such as investigating human mobility, urban flow and trip purposes, with some limitations. For instance, smart card data may lack demographic details of passengers, destination information for bus users (Gordon et al. 2013), and the trip purpose of the travelers, which is investigated further using land use attributes such as POIs. Similarly, regardless of the wide range of positive characteristics of POIs from Foursquare data, e.g. quantifying the weight of a place using check-ins and using the working hours of POIs to present the dynamics of activity patterns in cities, POIs may over-represent some locations, e.g. a small number of users with substantial check-ins in restaurants or shopping centers as compared to workplaces (Rashidi et al. 2017). In addition, demographic bias in the dataset is inevitable, as the application is mainly used by younger age groups, e.g. less than 30 years old, as compared to older age groups in the cities (Longley and Adnan 2016).
Even though the proposed framework provides high prediction accuracy compared to other ML models, trip purpose detection inherently involves uncertainty (Xiao, Juan, and Zhang 2016; Faroqi, Mesbah, and Kim 2018) in terms of temporal and spatial similarities in the dataset. For instance, a long shopping activity may be interrupted by an eating activity (drinking coffee/tea) at a location in which both shopping and eating places are available. Although it is difficult to separate those activities in individuals' daily lives, there are no multiple activities in the survey data used for the analysis. Therefore, we assume that this is not an issue for the proposed framework.
Moreover, this study also shows a comparison between what-if scenarios and ML approaches. The analysis demonstrates that the highest probability of activity type is dependent on the distribution of land use. That means the distribution of highly mixed land use provides lower accuracy than the distribution of single land use such as residential or work centers. In addition, the land use information from POIs is limited in representing primary locations. By contrast, the complex sequential relationship between spatial and temporal features can be captured by the ML approach with high accuracy to predict trip purposes.

Conclusion
The availability of big data sources such as smart card data and POIs provides a great opportunity to produce new insights into transport demand modeling. This study aims to predict trip purposes in a feasible framework using the spatial and temporal attributes of transport data and urban functions derived from POIs to generate an understanding of human mobility and urban flow in cities.
The proposed framework, ActivityNET, is demonstrated to provide improved accuracy in trip purpose prediction. First, the framework leverages the proposed "activity-POIs consolidation algorithm", which combines travel behaviors with socio-functional information from POIs, e.g. activity characteristics (activity start and end time, activity duration), day characteristics, and land use characteristics. Second, the framework utilizes an ANN method to predict trip purposes of primary (home and work) and secondary activities (entertainment, eating, shopping, child drop-offs/pick-ups, and part-time work activities). Third, the proposed framework is applied in a case study in London and achieves 94% overall accuracy using random under-sampling techniques with POIs. In addition, high accuracy for primary activities, 99% for home and 97% for work, is obtained from smart card data. Furthermore, improved accuracies are achieved for secondary activities, with 84% for entertainment, 84% for drop-offs/pick-ups, 81% for PT-work, 76% for eating activities, and 62% for shopping activities. In summary, ActivityNET offers trip purpose prediction with high accuracy, which has the potential to inform transport and urban planning. Future work includes creating travel diaries using the results of ActivityNET as an alternative method for travel demand research.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability statement
The data that support the findings of this study can be found at the following links. First, travel data (Oyster card data) can be downloaded from (https://api-portal.tfl.gov.uk/). Note that the user needs to label the journey data after extracting activities. Second, London stations data can be downloaded from (https://data.london.gov.uk/dataset/tfl-station-locations). Lastly, Foursquare POIs data such as user counts and check-ins, opening and closing hours, and activity types can be found here (https://developer.foursquare.com/docs/places-api/endpoints/). Besides, data are available from the authors upon reasonable request (DOI: 10.5281/zenodo.4527765).

Notes on contributors
Nilufer Sari Aslam is a PhD candidate at SpaceTimeLab, University College London. Her research interests are data mining techniques, spatial-temporal big datasets, urban and transport planning.
Mohamed R. Ibrahim is a PhD candidate at SpaceTimeLab, University College London. His research interests are artificial intelligence, urban modeling and geoinformatics.
Tao Cheng is a Professor in GeoInformatics and the Director of SpaceTimeLab at University College London. Her research interests span Space-Time AI, network complexity, and urban analytics.
Huanfa Chen is a Lecturer in Spatial Data Science in the Centre for Advanced Spatial Analysis at University College London. His current research interests include geospatial machine learning, agent-based modeling and spatial optimization.