Analysis of route choice based on path characteristics using Geolife GPS trajectories

ABSTRACT Navigation services are essential for daily navigation, providing turn-by-turn instructions to help wayfinders reach their destinations. These services often differ from the heuristics wayfinders use, resulting in a poor user experience. Researchers have attempted to address this issue by developing algorithms that find less complex routes, by integrating prominent locations along the route to make wayfinding easier and to improve a wayfinder’s knowledge about the environment. These approaches, however, have taken a bottom-up approach, involving a limited number of participants navigating in real or virtual environments which may limit generalisability of results. In this study, we took a top-down approach by analysing a large dataset of GPS-based trips in the real world. Using the Geolife dataset, we analysed individual heuristics for route selection in terms of complexity and prominent locations, and found that wayfinders prefer less complex routes, such as routes that require fewer turns or involve simpler intersections. Additionally, we found that wayfinders choose routes with fewer prominent locations, such as routes that bypass well-known landmarks or busy commercial areas. These findings suggest that simplicity and ease of use are prioritized when selecting a route, while overly complex routes or areas with many points of interest are avoided.


Introduction
For many people navigation services have become an essential part of their everyday life (Axon, Speake, and Crawford 2012;Kitchin and Dodge 2007). These tools provide both visual and auditory guidance to help wayfinders navigate from one location to another (Allen 1999). They have evolved from simple 2D map-based services to more advanced digitally assisted ones. One common feature of navigation services is turn-by-turn guidance, which breaks the overall journey down into segments connected by decision points where new instructions are provided to the wayfinder (Richter 2007).
While navigation services rely on pre-determined routes and turn-by-turn instructions, wayfinders often use a variety of heuristics to simplify their navigation decisions (Millonig and Schechtner 2007), instead of always choosing the shortest or fastest route to their destination (Pingel 2010). For instance, Shao et al. (2014) found that wayfinders prefer easiest-to-reach destinations rather than the nearest-to-reach ones. The specific heuristics that are used may depend on factors such as familiarity with the area and the availability of environmental information. However, while wayfinders consider a range of factors in their navigation decisions, navigation services typically prioritise efficient route planning.
Thus, there is a discrepancy between how navigation services operate and how people naturally navigate, which can cause confusion or frustration for wayfinders, who may find it difficult to follow the instructions or to remember the route (Dillemuth 2009;Hejtmánek et al. 2018;Ishikawa et al. 2008;Parush, Ahuvia, and Erev 2007;Sonmez and Onder 2019). This can be especially problematic in unfamiliar environments, where people may already be disoriented or unsure of their surroundings. To address this challenge, researchers have studied ways in which navigation services could be designed to operate more like the way people navigate. These studies have proposed a number of strategies that could be helpful in improving the effectiveness and usability of navigation services.
One such strategy is to find less complex routes that are easier for wayfinders to follow and remember (Duckham and Kulik 2003;Giannopoulos et al. 2014;Haque, Kulik, and Klippel 2006;Richter and Duckham 2008). This is important because overly complex or confusing routes can hinder spatial learning and wayfinding (Giudice, Bakdash, and Legge 2007). Navigation services can achieve this by avoiding routes with numerous turns or decision points or by breaking down the instructions into smaller, more manageable steps.
Another strategy that has been suggested is using prominent locations as reference points (Krukar, Anacta, and Schwering 2020;Lowen, Krukar, and Schwering 2019). People tend to rely on landmarks or other notable features to structure their environment (Schwering et al. 2017;Yesiltepe, Conroy Dalton, and Ozbil Torun 2021). By incorporating these locations into the instructions as reference points, navigation services can help wayfinders to better understand and remember the route.
Most previous research on navigation heuristics has taken a bottom-up approach, proposing new methodologies which use prominent locations to improve navigation services (Gramann, Hoepner, and Karrer-Gauss 2017;Schwering et al. 2017). These studies have typically involved a limited group of participants navigating in a real or virtual environment using either conventional navigation instructions or the proposed methods, and then comparing the results to evaluate the effectiveness of the new approaches. While this type of research can be useful in demonstrating the benefits of using prominent locations, it may not fully capture the complexity of real-world navigation. In addition, there are various factors that can affect the validity of such research, and it can be challenging to provide a controlled environment that is suitable for rigorous reproducible research methods.
In contrast to this bottom-up approach, this study takes a top-down approach, examining how individuals are naturally navigating in the real world. This can provide valuable insights into how navigation services could be designed to operate more like the way people navigate, and help to improve the effectiveness and usability of these services. This study seeks to answer four research questions:

RQ1. How does the number of intersections affect a wayfinder's path selection?
RQ2. How does route detour (compared to the shortest path) affect a wayfinder's path selection?
RQ3. How does path complexity affect a wayfinder's path selection?
RQ4. How does the number of prominent locations affect a wayfinder's path selection?
The Geolife dataset (Zheng et al. 2008, 2009), which contains GPS trajectories from 182 individuals, was used to answer these research questions. The GPS data was divided into trips and matched to a road network. The best matched paths were chosen based on their similarity to the individual's trip. Next, their number of intersections (RQ1), length (RQ2), complexity (RQ3), and number of prominent locations (RQ4) were calculated, and a statistical analysis of common vs. uncommon paths was performed.
The remainder of the paper is structured as follows. We start by reviewing and comparing human vs. navigation service wayfinding heuristics (see Section 2). Computation of path complexity and identification of prominent locations are presented in Section 3. Methods for analysing the dataset used in this study including trip segmentation, map matching, computation of similarity measures, and determining best matched paths as well as common and uncommon paths are presented in Section 4. This is followed by the presentation of analysis results in Section 5 and a discussion of the main findings and lessons learned in Section 6. Finally, in Section 7, potential directions for future research are outlined.

Related work
This section divides earlier studies into works concerning human wayfinding heuristics and wayfinding heuristics implemented in navigation services.

Human wayfinding heuristics
Wayfinding was formally defined by Lynch as the consistent use and organization of sensory cues from the external environment (Lynch 1964). While it may seem a straightforward activity due to its frequent occurrence in daily life, numerous studies have revealed the complexity of this process (Farr et al. 2012). For example, people do not always choose the shortest possible route, as it may be too complex to discern or communicate (Wiener et al. 2008). Instead, they may apply certain heuristics that minimise cognitive effort, such as seeking routes with fewer turns or taking the least deviation from the direction of the intended destination. These heuristics are applied by people regardless of their familiarity with the environment or spatial ability.
A review of the literature reveals that there are several well-established wayfinding heuristics that are commonly applied by pedestrians (Bhowmick et al. 2020), including the longest leg first heuristic (Bailenson, Shum, and Uttal 2000), the shortest leg first heuristic (Hochmair and Karlsson 2005), the least angle heuristic (Hochmair and Frank 2000), the southern route preference (Brunyé et al. 2010), and the fewest turns heuristic (Zhou et al. 2014).
The longest leg first heuristic prioritises routes with longer and straighter initial segments, postponing the first turn to minimise cognitive effort. The shortest leg first heuristic involves taking turns in the initial portion of the route to keep the latter portions as straight as possible, allowing for more exploration at decision points. The least angle heuristic involves selecting routes that deviate least from a destination's overall direction. The southern route preference is the tendency to select routes that go south, likely due to implicit associations between cardinal directions and elevation. The fewest turns heuristic involves selecting routes with the fewest number of turns to simplify decision making and reduce cognitive effort.

Wayfinding heuristics in navigation services
Navigation services that take into account human cognition and preferences have been developed to simplify the process of navigating from one location to another. These algorithms can be divided into two categories. The first category aims to avoid complex or ambiguous parts of an environment, and may compute complexity measures for different regions in order to apply different cost functions depending on the complexity of the region (Haque, Kulik, and Klippel 2006;Manley, Orr, and Cheng 2015;Richter 2009). The second category specifically focuses on the integration of prominent locations such as landmarks in calculated paths, using them to anchor actions in space and reduce wayfinding complexity (Caduff and Timpf 2005;Richter and Klippel 2004).

Complexity
The layout of the road network is a key factor in wayfinding (Dogu and Erkip 2000), as intersections in the network are decision points that require wayfinding decisions to be made. For example, the complexity of a decision point can be quantified using the InterConnection Density (ICD) (O'Neill 1991), which is the average number of path segments meeting at an intersection. However, this measure does not take into account certain dynamics of wayfinding, such as the fact that going straight at an intersection is easier than turning left or right (Montello 2005). The complexity of a decision point can also be affected by the ambiguity in the decision situation and the interaction between instructions and the environment during route following. For instance, the instruction 'turn right' becomes more complex when there are multiple options to turn right compared to when there is only one path segment heading in that direction (Haque, Kulik, and Klippel 2006). Landmarks can help to reduce ambiguity and assist in identifying the correct location to perform an action (Lovelace, Hegarty, and Montello 1999). During route following, the interplay between instructions and the environment is also crucial (Giannopoulos et al. 2014), as good instructions can facilitate wayfinding while bad instructions can make it more difficult (Padgitt and Hund 2012).

Prominent locations
One key aspect of wayfinding is the use of prominent locations as reference points. These locations serve as anchors for a wayfinder's knowledge of that environment. The anchor point theory (Golledge 1997) explains how reference points anchor known regions in an area, resulting in a hierarchical structure of spatial knowledge (Hirtle and Jonides 1985). This hierarchy is often used in path planning, with wayfinders using more fine-grained spatial information for nearby locations and more coarse information for distant locations (Wiener and Mallot 2003). This hierarchical structure helps to reduce cognitive load and make wayfinding more efficient.
Landmarks, in particular, are essential elements in human spatial knowledge and are often used in navigation and instruction giving (Richter and Winter 2014;Schwering et al. 2017;Yesiltepe, Conroy Dalton, and Ozbil Torun 2021). These features are often mentally structured in a hierarchical way, with some landmarks being more prominent or important than others. Highlighting local landmarks along a route can improve a wayfinder's route knowledge, while highlighting global landmarks can improve a wayfinder's survey knowledge (Krukar, Anacta, and Schwering 2020;Lowen, Krukar, and Schwering 2019).
The prominence of streets and pathways can also be hierarchical, with some roads being more prominent or important than others (Dale, Geldof, and Prost 2003;Manley, Orr, and Cheng 2015;Tomko and Winter 2006). This hierarchy can be used in route planning and instruction giving, even though individuals often prefer landmarks over street names in route instructions (Tom and Denis 2003).
Despite the recent interest in the role of path complexity and prominent locations in wayfinding, few studies have explored the impact of these factors on human wayfinding behaviour in their everyday life. By examining real-world wayfinding behaviour, this study aims to provide a more comprehensive understanding of the impact of path complexity and prominent locations on wayfinding.

Preliminaries
This section provides an overview of the algorithms used to compute path complexity and to identify prominent locations in a given path, laying the groundwork for their application in Section 4.

Complexity
According to Giannopoulos et al. (2014), the complexity of a path can be influenced by various factors such as the complexity of the environment, the instructions needed to navigate the path, and the wayfinding skills required to follow the path. In order to quantify these factors, Teimouri and Richter (2020) proposed a weighted sum model based on the model by Giannopoulos et al. (2014). The model considers three main components (see Table 1): environmental complexity, wayfinder-related complexity, and instruction complexity.

Table 1 (partially recovered). Parameter: Santa Barbara Sense of Direction. Description: wayfinder's ability to navigate and orient themselves in a given environment.
These components are each normalised to values between 0 and 1 and then combined using Equation 1, in which $w_e$, $w_w$, and $w_i$ are the weights assigned to the environmental complexity, wayfinder-related complexity, and instruction complexity factors, respectively. The environmental complexity is further broken down into three parameters: the node degree of the decision point, the deviation from prototypical angles, and the length of road segments branching from decision points. These parameters are combined using Equation 2, in which $nd$ is the node degree, $dv$ is the deviation from prototypical angles, $length$ is the length of road segments, and $w_{nd}$, $w_{dv}$, and $w_l$ are the weights assigned to each parameter.
The complexity related to instructions is calculated using Equation 3, in which $ie$ is the number of instruction equivalent turns, $ic$ is the complexity of describing the turn to take, $lm$ is the complexity of landmarks, and $w_{ie}$, $w_{ic}$, and $w_{lm}$ are the weights assigned to each parameter. Finally, the wayfinder-related complexity is calculated using the normalised Santa Barbara Sense of Direction Scale (SBSOD) (Equation 4). SBSOD is a psychometric tool designed to assess an individual's sense of direction and spatial orientation abilities (Hegarty et al. 2006). The model utilised in this study to identify the less complex paths does not take into account wayfinder-related factors, i.e. SBSOD scores. These scores are unavailable in the Geolife dataset, but since our main objective is to analyse general behaviour, this is not a significant issue.
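The weighted sums described above can be sketched as follows, assuming that each of Equations 1 to 3 is a plain weighted sum of its normalised parameters. The symbols $C$, $C_e$, $C_w$, and $C_i$ are labels introduced here for the overall, environmental, wayfinder-related, and instruction complexity; they are an assumption of this sketch and are not taken from the original model. Equation 4 is not sketched, because the text only states that it uses the normalised SBSOD score.

```latex
\begin{align}
C   &= w_e\,C_e + w_w\,C_w + w_i\,C_i \tag{1}\\
C_e &= w_{nd}\,\mathit{nd} + w_{dv}\,\mathit{dv} + w_l\,\mathit{length} \tag{2}\\
C_i &= w_{ie}\,\mathit{ie} + w_{ic}\,\mathit{ic} + w_{lm}\,\mathit{lm} \tag{3}
\end{align}
```

Under this reading, leaving out the wayfinder-related factors, as done in this study, corresponds to dropping the term $w_w\,C_w$ from the first line.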

Prominent locations
Wayfinders rely on prominent locations as reference points to orient themselves, stay on track, and reach their destination. These locations serve as key anchors for our spatial memory, which is organised around personally significant or salient anchor points in the environment. Different elements of an environment may serve as such reference points. Landmarks help anchor actions in space and, thus, are often memorised. Prominent streets that are frequently experienced tend to be highly ranked in the hierarchical mental representation of spatial information. Major turns indicate significant changes in direction along the route.
Teimouri and Richter (2022) proposed an algorithm (Algorithm 1) to identify the most prominent locations in a given route. The algorithm consists of three steps. First, the input route is simplified to its essential geometric form by removing irrelevant geometric details using shape simplification techniques. This results in only major turns remaining. Second, landmarks located along or near the route are ranked based on their salience in a hierarchical way. A landmark's salience, or ability to stand out, is influenced by factors such as uniqueness, spatial prominence, and cultural significance. The most prominent landmarks, those in higher levels of the hierarchy, are then added to the list of prominent locations. Third, the algorithm identifies prominent streets that the route passes or crosses. To rank streets in a street network and determine their hierarchical importance in the city network, the algorithm uses betweenness centrality.
The resulting list of prominent locations is then filtered using a threshold value to select the most prominent locations. Overall, the algorithm is a comprehensive method for identifying key locations in a route that are critical for spatial learning and navigation.
Algorithm 1: High-level pseudo-code of the prominent locations algorithm (Teimouri and Richter 2022)
Data: G = (V, E) is a connected, simple, directed graph of the environment
Input: Route R: a sequence of decision points dp ∈ V connected by edges e ∈ E
Output: PL: a set of prominent locations for the route R
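The first step of the algorithm, reducing a route to its major turns, can be illustrated with a small sketch. The text does not name the shape simplification technique, so the Douglas-Peucker algorithm is used here purely as an assumption. The function names, the `tolerance` parameter, and the example coordinates are hypothetical, and coordinates are assumed to be planar (projected) metres.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to the segment's extent.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify_route(points, tolerance):
    """Douglas-Peucker (assumed technique): drop vertices within `tolerance`."""
    if len(points) < 3:
        return list(points)
    # Find the interior vertex farthest from the chord between the endpoints.
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        return [points[0], points[-1]]
    left = simplify_route(points[: index + 1], tolerance)
    right = simplify_route(points[index:], tolerance)
    return left[:-1] + right  # avoid duplicating the shared vertex

# Hypothetical route with small wiggles and one right-angle turn.
route = [(0, 0), (50, 2), (100, 0), (102, 50), (100, 100)]
major = simplify_route(route, tolerance=10)
print(major)
```

In this example the interior vertex that survives simplification, (100, 0), plays the role of a major turn; the small deviations at (50, 2) and (102, 50) are removed.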

Methods
This section describes the workflow for analysing GPS trajectories (Figure 1). First, the raw trajectories are segmented into individual trips taken by different users. Then, the trip data is matched to the OpenStreetMap (OSM) road network to associate each location with a specific road segment, enabling the sequence of connected road segments to represent the actual path taken by each trip. Similarity measures are then calculated between pairs of trips and their matched paths to identify the best-matched path using the degree of similarity. Through this process, common and uncommon paths are determined by analysing the frequency of each matched path across all trips. Common paths refer to routes that are used by several travellers, whereas uncommon paths are taken only once. This approach allows for the identification of travel patterns and preferences. The following sections provide a detailed discussion of each step in the workflow.

GPS trajectories
In this study, we utilised a dataset consisting of GPS trajectories from 182 users, which were collected between April 2007 and August 2012 as part of the Geolife project (Zheng et al. 2008, 2009). Various GPS loggers and GPS phones were used to collect 17,621 trajectories that covered over 50,000 hours and spanned a distance of 1.2 million kilometres at various sampling rates, with 91.5% of them recorded at a high density, i.e. every 1 to 5 seconds or every 5 to 10 metres. It is important to note that the dataset is natural, as users collected it during their daily activities (Zheng et al. 2009), and it was collected prior to the widespread use of cell-phone-based routing.
Although the dataset contains GPS trajectories from 30 cities in China, as well as some cities in the US and Europe, the majority of the records were collected in Beijing, China. Therefore, our study focused on the GPS trajectories gathered in Beijing. It is important to consider that the road network can undergo changes over time due to construction and other factors. As such, the newest available OSM data may not necessarily align with the roads present in the GPS trajectories. In order to minimise this potential discrepancy, we opted to utilise OSM data from 2009, the oldest available data. This choice was made as it falls within the time frame of the GPS trajectories, which were recorded between 2007 and 2012. The road network for Beijing city centre obtained from the 2009 OSM data is illustrated in Figure 2.

Trip segmentation
To begin with, the raw GPS points for each user were filtered to only include those with transportation mode labels, which were necessary for trip segmentation (see below). However, out of the 182 users, only 69 had labelled their trajectories with transportation modes, which consisted of the following: walk, bike, bus, car, subway, train, airplane, boat, run, motorcycle, and taxi.
After matching each user's label file with its corresponding GPS trajectories file, we obtained a series of GPS data points for each user in chronological order. The labelled GPS points were then divided into separate trips to facilitate further analysis. To do so, we applied a trip segmentation threshold, also known as 'dwell time'. Previous studies have shown that 'dwell time' tends to vary based on the characteristics of local activity and can range between 45 and 900 seconds (Schuessler and Axhausen 2009). For our study, we decided to use a threshold of 600 seconds to distinguish between consecutive trips. Thus, the user's GPS data points were segmented into separate trips if the time difference between two consecutive GPS points exceeded the 'dwell time' or the transportation label changed.
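The segmentation rule described above can be sketched as follows. This is a minimal illustration, assuming each labelled GPS point is available as a `(timestamp, lat, lon, mode)` tuple in chronological order; the function name, the tuple layout, and the example points are hypothetical and not taken from the Geolife file format.

```python
from datetime import datetime, timedelta

DWELL_TIME = timedelta(seconds=600)  # threshold reported in the text

def segment_trips(points, dwell_time=DWELL_TIME):
    """Split chronologically ordered (timestamp, lat, lon, mode) tuples into trips.

    A new trip starts when the gap to the previous point exceeds `dwell_time`
    or when the transportation mode label changes.
    """
    trips = []
    current = []
    for point in points:
        if current:
            prev_time, _, _, prev_mode = current[-1]
            curr_time, _, _, curr_mode = point
            if curr_time - prev_time > dwell_time or curr_mode != prev_mode:
                trips.append(current)
                current = []
        current.append(point)
    if current:
        trips.append(current)
    return trips

# Illustrative points only (not real Geolife records).
t0 = datetime(2008, 10, 23, 8, 0, 0)
points = [
    (t0, 39.9841, 116.3184, "walk"),
    (t0 + timedelta(seconds=5), 39.9842, 116.3185, "walk"),
    (t0 + timedelta(seconds=10), 39.9843, 116.3186, "bus"),   # mode change
    (t0 + timedelta(seconds=15), 39.9850, 116.3190, "bus"),
    (t0 + timedelta(seconds=900), 39.9900, 116.3300, "bus"),  # gap > 600 s
]
trips = segment_trips(points)
print([len(trip) for trip in trips])
```

The example yields three trips: the first ends at the mode change from walk to bus, and the second ends at the gap that exceeds the dwell time.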

Map matching
In order to calculate complexity and prominent locations for trip trajectories, it is crucial to match the GPS trips to the road network. Map-matching is utilised to align the sequence of user trajectories with the road network.

Map-matching algorithm
We employed the Leuven Map-Matching algorithm, which is an open-source tool known for its robustness to noise and sparseness in GPS data (Meert and Verbeke 2018). This algorithm is based on a Hidden Markov Model (HMM) with non-emitting states. An HMM is a statistical model that assumes the system under consideration is a Markov process with hidden (unobserved) states. In the case of map matching, the hidden states are positions on the road network and the observations are the GPS points; the model generates candidate paths based on their probabilities and evaluates them sequentially. The algorithm updates the hypotheses of older GPS points to account for newly observed results, and the surviving path with the highest joint likelihood is selected as the final solution.
To ensure the accuracy of the map matching algorithm, we set the searching distance radius to 50 metres. This distance was determined based on road widths in Beijing, which are estimated to be between 10 and 40 metres (Tao 2021), and the standard deviation of GPS noise, which is approximately 4 metres (Newson and Krumm 2009). Map matching yielded a sequence of OSM nodes that the user traversed. We considered the first and last point of the obtained sequences as the origin and destination of each trip, respectively.

Similarity measures
Assessing the quality of the matching result is challenging since the 'true' path of the user trajectories in the road network is unknown. Therefore, a similarity measure between the user trajectory and the matched path in the road network is required to assess how closely the matched path resembles the original one.
There are several similarity measures available for comparing two curves, such as dynamic time warping (Berndt and Clifford 1994), edit distance (Chen, Ozsu, and Oria 2005), longest common subsequence (Vlachos, Kollios, and Gunopulos 2002), and Fréchet distance (Alt and Godau 1995;Fréchet 1906), which are among the most widely used ones. In our analysis, we utilised the Fréchet distance measure. The Fréchet distance is a continuous metric used to determine a smooth alignment between the entire paths of two trajectories. This property makes the Fréchet distance particularly useful when considering interpolated values between points on the trajectories. Moreover, continuous metrics like the Fréchet distance are capable of handling trajectories with varying sampling rates and gaps more effectively.
The Fréchet distance can be illustrated using a well-known example (Eiter and Mannila 1994). Consider a person walking their dog on one curve while the dog is walking on the other. Both are allowed to adjust their speed, but they cannot go backward. The minimum length of a leash required for both of them to walk the curves from start to end is the Fréchet distance of the curves.
Formally, the Fréchet distance between two curves $f, g : [0, 1] \to \mathbb{R}^2$ is defined as follows:
$$\delta_F(f, g) = \inf_{\alpha, \beta} \, \max_{t \in [0, 1]} \lVert f(\alpha(t)) - g(\beta(t)) \rVert$$
Here, $\alpha$ and $\beta$ range over continuous and nondecreasing reparametrizations with $\alpha(0) = \beta(0) = 0$ and $\alpha(1) = \beta(1) = 1$. The Fréchet distance is thus the smallest value, over all such reparametrizations, of the largest distance between the simultaneously traversed points $f(\alpha(t))$ and $g(\beta(t))$ as both curves are traversed from start to end. A lower Fréchet distance indicates a higher similarity between the two curves.
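To make the definition concrete, the following sketch computes the discrete Fréchet distance of Eiter and Mannila (1994), which only couples the vertices of the two polylines. It is a swapped-in approximation for illustration: the discrete value is an upper bound on the continuous Fréchet distance used in this study, and the function name and example coordinates are hypothetical. Coordinates are assumed to be planar.

```python
import math

def discrete_frechet(p, q):
    """Discrete Fréchet distance between two polylines given as lists of (x, y)."""
    n, m = len(p), len(q)
    # ca[i][j] holds the coupling distance between p[:i+1] and q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance on p, on q, or on both; neither walker moves backward.
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]

# Hypothetical trajectory and a matched path running parallel at distance 1.
trajectory = [(0, 0), (1, 0), (2, 0), (3, 0)]
matched = [(0, 1), (1, 1), (2, 1), (3, 1)]
print(discrete_frechet(trajectory, matched))
```

For the two parallel lines in the example, the leash never needs to be longer than the offset between them, so the result equals that offset.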

Best matched paths
After matching the user trajectories to the road network and calculating the Fréchet distance between user trajectories and matched paths, we encountered the challenge of precisely identifying all the best matched paths using a single threshold for the Fréchet distance. Therefore, to overcome this limitation, we employed a method to divide the user trajectories into different groups based on trajectory length and trajectory distance. Trajectory length means the number of data points in the trajectory, and trajectory distance means the physical distance between start data point and end data point of the trajectory. This comprehensive approach allows us to accurately identify the best matched paths by considering multiple thresholds for the Fréchet distance within each group.
To create these groups, we utilised quantiles to divide the data into three levels of trajectory length (short, moderate, long) and three levels of trajectory distance (short, moderate, long). Specifically, we calculated the 33rd and 66th percentiles of the data for each variable. Then, we used the data between the minimum and 33rd percentile as the short group, the data between the 33rd and 66th percentiles as the moderate group, and the data between the 66th percentile and the maximum as the long group. We combined the three levels of trajectory length with the three levels of trajectory distance to create the following nine groups: short-short, short-moderate, short-long, moderate-short, moderate-moderate, moderate-long, long-short, long-moderate, and long-long. This allowed us to create more precise groupings of trajectories based on their length and distance, which was crucial for subsequent analysis. The resulting nine trajectory groups are shown in Figure 3, which displays distributions of Fréchet distances for each group.
For each group, we calculated a threshold value by selecting the 80th percentile of all the Fréchet distances within that group. This threshold value was then assigned as the group threshold. This approach aimed to increase the accuracy of identifying the best matched paths as map matching typically has an accuracy of 70% to 90% depending on the measure used (Berjisian and Bigazzi 2022).
Finally, within each of the nine trajectory groups, we selected matched paths with a Fréchet distance lower than the group threshold as the best matched paths for subsequent analysis. This approach allowed us to tailor the selection of matched paths, as different groups of trajectories can have different best thresholds for the Fréchet distance. Figure 4 illustrates the matching results of user trajectories for two different scenarios. Figure 4(a) shows a successful match of a user trajectory from the long-long group (consisting of long distances and long lengths) that satisfies the Fréchet distance criterion and has a Fréchet distance value lower than its corresponding group threshold. The user trajectory is depicted as an orange solid line, and the matched path is represented by a green dashed line. On the other hand, Figure 4(b) exhibits an unsuccessful match scenario, where a user trajectory from the moderate-long group (comprising moderate distances and long lengths) fails to meet the Fréchet distance criterion and has a Fréchet distance value higher than its corresponding group threshold.
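The grouping and thresholding procedure described above can be sketched as follows. This is an illustration under stated assumptions: percentiles are computed with linear interpolation, values equal to a cut point fall into the lower level, and each trip is a dictionary with the hypothetical keys `n_points` (trajectory length), `od_distance` (trajectory distance), and `frechet`. None of these details are specified in the text.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)

def level(value, p33, p66):
    """Assign short / moderate / long from the 33rd and 66th percentiles."""
    if value <= p33:
        return "short"
    if value <= p66:
        return "moderate"
    return "long"

def select_best_matches(trips):
    """Keep trips whose Fréchet distance is below their group's 80th percentile."""
    lengths = [t["n_points"] for t in trips]
    distances = [t["od_distance"] for t in trips]
    p33_len, p66_len = percentile(lengths, 33), percentile(lengths, 66)
    p33_dist, p66_dist = percentile(distances, 33), percentile(distances, 66)

    groups = {}
    for t in trips:
        key = (level(t["n_points"], p33_len, p66_len),
               level(t["od_distance"], p33_dist, p66_dist))
        groups.setdefault(key, []).append(t)

    best = []
    for members in groups.values():
        threshold = percentile([t["frechet"] for t in members], 80)
        best.extend(t for t in members if t["frechet"] < threshold)
    return best

# Hypothetical trips that all fall into a single group.
trips = [{"n_points": 100, "od_distance": 500.0, "frechet": float(f)}
         for f in range(1, 11)]
best = select_best_matches(trips)
print(len(best))
```

Using a separate threshold per group means that a long trajectory is not penalised for the larger absolute Fréchet distances that long trajectories tend to accumulate.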

Common and uncommon paths
The process of matching GPS trajectories to the road network is crucial for identifying common and uncommon origin-destination (OD) pairs and paths. By obtaining a sequence of connected road segments for each trip, we can gain insights into travel patterns and preferences of individuals.
To identify common OD pairs, we compare the origin and destination points of the matched paths. Common OD pairs are those that appear in more than one of the matched paths and are used by multiple users. Our analysis used a search radius of 50 metres for map-matching. Therefore, any two origins or two destinations that are within 50 metres of each other are considered 'common', or equal, in our analysis. Common paths are those that have been traversed more than once, i.e. two or more matched paths are identical and use the same edges and nodes in the network. On the other hand, uncommon paths refer to those that have only been travelled once.
To illustrate the concept of common and uncommon paths, Figure 5 shows three maps with different matched paths between the same OD pair. Figure 5(a) displays a common path, which is more direct than the other two uncommon paths shown in Figures 5(b, c). The solid orange lines represent the original GPS trajectories, while the dashed green lines represent the matched paths that are either common or uncommon.
Overall, there are 78 common OD pairs, with a total of 547 paths between them. Of these paths, 351 are common and 196 are uncommon. A comparison of characteristics between common and uncommon paths is presented in the next section using Welch's t-test due to the uneven group sizes. The analysis includes examining the number of intersections (RQ1), detours (RQ2), complexity (RQ3), and number of prominent locations (RQ4) between common and uncommon paths. The aim of this analysis is to identify differences between the two path types and gain insights into how they are utilised.
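The distinction between common and uncommon paths can be sketched as a frequency count over matched paths. The sketch assumes each matched path is a sequence of OSM node identifiers and that every traversal is counted separately; the function name and the node identifiers are hypothetical.

```python
from collections import Counter

def classify_paths(matched_paths):
    """Split matched paths (sequences of node ids) into common and uncommon."""
    counts = Counter(tuple(path) for path in matched_paths)
    # A path is common if the identical node sequence was traversed more than once.
    common = [p for p in matched_paths if counts[tuple(p)] > 1]
    uncommon = [p for p in matched_paths if counts[tuple(p)] == 1]
    return common, uncommon

# Hypothetical matched paths between the same OD pair (nodes 1 and 4).
paths = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 5, 6, 4],
]
common, uncommon = classify_paths(paths)
print(len(common), len(uncommon))
```

Comparing whole node sequences implements the requirement that identical paths share the same edges and nodes; two paths that differ in a single node are treated as different paths.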

Results
In this section, we compare characteristics of common and uncommon paths. These analyses are conducted on the whole dataset, without separating different transportation types. This decision was made to ensure a comprehensive understanding of travel patterns across all modes of transportation.

Intersections
When comparing the number of intersections between common and uncommon paths, we found that on average uncommon paths (M = 23.58, SD = 31.858) have more intersections than common paths (M = 10.69, SD = 13.985). A Welch's t-test reveals a statistically significant difference between the two groups; t(237.689) = 5.382, p < .001; d = .583 (Table A1). This Cohen's d value corresponds to a medium to large effect (Cohen 2013).
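The reported statistics can be approximately recomputed from the summary values alone. The sketch below assumes group sizes of 196 uncommon and 351 common paths (the counts given in the Methods section) and assumes that Cohen's d is based on the pooled standard deviation; neither assumption is stated explicitly for this test, and the function name is hypothetical.

```python
import math

def welch_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch's t, Welch-Satterthwaite degrees of freedom, and pooled-SD Cohen's d."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    d = (m1 - m2) / pooled_sd
    return t, df, d

# Uncommon paths first, common paths second (summary values from the text).
t, df, d = welch_from_summary(23.58, 31.858, 196, 10.69, 13.985, 351)
print(round(t, 3), round(df, 2), round(d, 3))
```

Under these assumptions the sketch agrees with the reported t, degrees of freedom, and d up to rounding of the summary values, which supports the reading that the test compares 196 uncommon with 351 common paths.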

Detour
In order to examine differences in path lengths between common and uncommon paths, we divided the distance of each path by the shortest path distance between the same origin and destination. This ratio expresses the detour, or extra distance, of each path compared to the shortest possible route between these points. Our aim was to determine whether one group of paths was significantly longer than the other. A Welch's t-test did not show a significant difference between common and uncommon paths in terms of their detours; t(462.389) = -.403, p = .687 (Table A2). This finding suggests that the length of a path may not be the most important factor in determining its usability, and that other factors such as traffic, complexity, or presence of prominent locations may play a greater role.
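The detour measure can be sketched on a toy network. The sketch assumes the road network is given as a dictionary of directed edges with lengths in metres and uses Dijkstra's algorithm for the shortest path; the text does not state which shortest-path algorithm was used, and the graph, node names, and function names are hypothetical.

```python
import heapq

def shortest_path_length(graph, origin, destination):
    """Dijkstra over a graph of the form {node: {neighbour: edge_length_m}}."""
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == destination:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, length in graph.get(node, {}).items():
            candidate = dist + length
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    raise ValueError("destination not reachable from origin")

def path_length(graph, path):
    """Sum of edge lengths along a sequence of nodes."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))

def detour_ratio(graph, path):
    """Travelled length divided by the shortest length between the same OD pair."""
    return path_length(graph, path) / shortest_path_length(graph, path[0], path[-1])

# Hypothetical network with two routes from A to D.
graph = {
    "A": {"B": 100.0, "C": 150.0},
    "B": {"D": 100.0},
    "C": {"D": 150.0},
    "D": {},
}
print(detour_ratio(graph, ["A", "B", "D"]), detour_ratio(graph, ["A", "C", "D"]))
```

A ratio of 1.0 means the travelled path is itself a shortest path; larger values express the detour as a multiple of the shortest distance, which makes trips of different lengths comparable.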

Complexity
To compare the complexity of common and uncommon paths (see Section 3), we conducted Welch's t-test. The results show that the complexity of uncommon paths (M = 12.389, SD = 17.515) is statistically significantly higher than that of common paths (M = 5.311, SD = 7.696); t(237.773) = 5.375, p < .001; d = .582 (Table A3).
To make sure that the result has not been affected by the number of intersections in the paths, we normalised the complexity of each path by dividing it by the number of its intersections. The results again show a statistically significant difference between the complexity of each intersection on common paths (M = .432, SD = .087) and uncommon paths (M = .476, SD = .072), with the complexity of intersections on uncommon paths being higher than that of intersections on common paths; t(467.641) = 6.286, p < .001; d = .532 (Table A4).
In order to investigate the contribution of different parameters in the path complexity model (as shown in Table 1) to the differences between common and uncommon paths, separate t-tests were conducted for each of the following parameters: number of branches, deviation from prototypical angles, segment length, instruction equivalence, instruction complexity, and landmark complexity. To account for multiple testing, the Bonferroni correction was employed (Weisstein 2004), which adjusts the alpha value by dividing it by the number of tests conducted. In this case, the adjusted alpha value was 0.05/6 = 0.0083. Therefore, rather than comparing the p-values with the original alpha value of 0.05, they were compared with the new adjusted alpha value of 0.0083. The results of all t-tests indicated a statistically significant difference between common and uncommon paths. Table 2 presents the p-values, effect sizes, and direction of the differences for each t-test. For all parameters, the uncommon paths showed significantly higher values than the common paths, with effect sizes ranging from .279 to .522.
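The Bonferroni adjustment described above amounts to dividing the significance level by the number of tests. The sketch below illustrates this with placeholder p-values that are purely hypothetical and are not the values reported in Table 2. The function name `bonferroni_alpha` and the labels `test_a` and `test_b` are likewise assumptions made for illustration.

```python
def bonferroni_alpha(alpha, n_tests):
    """Return the per-test significance level after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

# Six complexity parameters were tested, so alpha = 0.05 is divided by 6.
adjusted = bonferroni_alpha(0.05, 6)
print(f"adjusted alpha = {adjusted:.4f}")

# Hypothetical p-values (placeholders, NOT results from the paper):
# test_b would pass at 0.05 but fails the adjusted threshold.
p_values = {"test_a": 0.004, "test_b": 0.02}
decisions = {name: p < adjusted for name, p in p_values.items()}
print(decisions)
```

The correction is conservative: a result such as the placeholder `test_b` is no longer treated as significant, which lowers the chance of a false positive across the family of tests.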

Prominent locations
In Section 3, we explained that prominent locations are computed using a threshold value. To identify prominent locations for common and uncommon paths, we applied five different thresholds (0, 0.25, 0.5, 0.75, and 1). Welch's t-test was then used to compare the number of prominent locations between the two groups for each threshold value. Furthermore, we ensured that the results were not affected by the number of path intersections by dividing the number of prominent locations for each path by its corresponding number of intersections. Our results show that there is a statistically significant difference between common and uncommon paths, with uncommon paths having a higher number of prominent locations overall and at each intersection. This is true for all the threshold values that we tested. To save space and avoid redundancy, we present only the results for the threshold value of 0.5, which reflects the overall findings well. With the threshold value set to 0.5, the number of prominent locations on common paths (M = 9.114, SD = 10.859) and uncommon paths (M = 30.796, SD = 57.267) is significantly different; t(202.864) = 5.248, p < .001; d = .613 (Tables A5 and A6).
To determine which parameters in the prominent locations algorithm (Algorithm 1) contributed most to the differences in the number of prominent locations on common and uncommon paths, we performed three separate t-tests for major turns, prominent streets, and prominent landmarks. As multiple tests were conducted, the Bonferroni correction was applied by adjusting the alpha value to 0.05/3 = 0.0167. These t-tests revealed statistically significant differences between common and uncommon paths. Table 3 presents the p-values, effect sizes, and direction of differences for each parameter. The analysis revealed that common paths had a higher number of prominent streets compared to uncommon paths, while uncommon paths had a higher number of major turns and prominent landmarks compared to common paths.

Discussion
In this study, we examined the heuristics of wayfinders by analysing a large sample of GPS-based trips in the real world. The main findings indicated that individuals tend to choose less complex routes with fewer prominent locations. Further analysis of the data revealed significant differences between common and uncommon paths in the road network. Uncommon paths were found to have more intersections (RQ1), to be more complex (RQ3), and to pass through more prominent locations (RQ4) than common paths. However, there was no significant difference between common and uncommon paths in terms of their detours (RQ2). These findings are important for understanding the wayfinding heuristics of individuals and for improving navigation services. There are several potential reasons for these findings. One possibility is that common paths are more direct and efficient than uncommon paths, which is consistent with previous research on wayfinding and navigation that suggests people prefer routes that are straightforward and easy to follow (Haque, Kulik, and Klippel 2006;Richter 2009). Additionally, t-test results showed that common paths tend to have more prominent streets, such as major roads and highways, which could contribute to their popularity. By presenting less complex routes, navigation services may be able to better support the wayfinding strategies of individuals and improve the user experience.
On the other hand, wayfinders may be more likely to choose uncommon paths when they encounter traffic or other obstacles on their usual routes. For example, in avoiding a highway or other major road, wayfinders may end up on small, residential roads that are not designed to handle large amounts of traffic. Such roads may be shorter but involve more turns, increasing the complexity of the route and the number of prominent locations (major turns). These findings are consistent with the observation that uncommon paths tend to be more complex and pass through more major turns than common paths.
The familiarity of wayfinders with their environment may play a significant role in their route selection (Millonig and Schechtner 2007). Familiarity with an area allows wayfinders to rely on prior knowledge to navigate, leading them to choose more straightforward or previously travelled paths, thus contributing to the higher likelihood of selecting common paths. In contrast, unfamiliar wayfinders may opt for uncommon paths to explore their surroundings or to avoid potentially confusing or busy areas. Although such paths may be more complex, they offer a more engaging and interesting route to the destination, which could explain why some wayfinders still choose to take them despite the potential difficulties.
Spatial ability may also influence the selection of common and uncommon paths. Those with higher spatial ability tend to perform better in tasks related to wayfinding and spatial navigation (Montello 2005). Individuals with higher spatial ability may therefore be more likely to use uncommon paths, which could require more complex spatial processing and decision-making skills. Additionally, personal preferences and specific goals or needs may contribute to the differences between common and uncommon paths (Redish et al. 1999). Wayfinders may choose routes that align with their personal interests or values, such as routes that are more environmentally friendly or aesthetically pleasing. They may also have specific goals or needs that influence their route selection, such as the need to reach a destination quickly or to avoid certain types of terrain or weather conditions.
While our study provides some insights into the factors that influence path selection, it is important to acknowledge its limitations. Our sample was drawn from the Geolife dataset, which may not be representative of all wayfinders or environments. Demographic characteristics of the 182 individuals whose GPS trajectories were used for path analysis are not known. Consequently, the top-down approach we used to examine wayfinding heuristics may not fully capture all the factors that influence path selection, such as individual characteristics like age and gender (Broach and Dill 2016;Hegarty et al. 2023) or cognitive ability (Manley, Orr, and Cheng 2015). Future research should explore these factors in greater depth, using more diverse samples, to strengthen the generalisability of the findings.

Conclusions
This study aimed to gain a better understanding of wayfinding heuristics in a real-world environment by analysing individual heuristics for path selection, with a focus on complexity and prominent locations, using the Geolife dataset. As opposed to most related work, which analyses individual trips to elicit route preferences in revealed preference methods (Hood, Sall, and Charlton 2011;Raveau et al. 2014), this study introduces the novel concept of common paths to cluster routes into more and less frequently taken routes before preference analysis. It therefore draws from the experience of travellers who are familiar with an environment to obtain characteristics of routes that demonstrate some level of agreement among travellers. Since common paths are characterised by relevant criteria used in route choice, future work could jointly analyse the top-down and bottom-up approaches by examining to what extent common paths (as identified from GPS trajectories) are already realised in routes suggested by commercial trip planners. While discrepancies between observed travel behaviour and trip recommendations provided by online trip planners, such as Google Maps and MapQuest, have already been discussed in earlier studies (Payyanadan, Sanchez, and Lee 2016;Schirck-Matthews et al. 2023), to what extent these discrepancies apply to common paths is currently unknown. A better understanding of this aspect could help to design future algorithms that generate more intuitive and commonly used routes.