A novel method for road network mining from floating car data

ABSTRACT Vehicles are increasingly equipped with GPS receivers that record their trajectories, which we call floating car data. Compared with other data sources, these data are characterized by low cost, wide coverage, and rapid updating, and they have become an important source for road network extraction. In this paper, we propose a novel approach for mining road networks from floating car data. First, a Gaussian model is used to transform the data into a bitmap, and the Otsu algorithm is utilized to detect road intersections. Then, a clothoid-based method is used to resample the GPS points to improve the clustering accuracy, and the data are clustered based on a distance-direction algorithm. Last, road centerlines are extracted with a weighted least squares algorithm. We report on experiments conducted on floating car data from Wuhan, China. Finally, we compare the proposed method with existing methods to show that it is practical and effective.


Introduction
A road map is a compilation of roads and transport links ("Roadmap" 2021). It plays an important role in many aspects of our lives, such as navigation, urban management (Wu, Gui, and Yang 2020), and Location-Based Services (LBSs) (Huang and Wang 2020; Zuo, Liu, and Fu 2020). Along with the development of LBSs, demands for map accuracy are increasingly stringent. However, road construction is a frequent activity, and roads change quickly. For instance, in China, the total length of highways was $3.73 \times 10^6$ km in 2008 and grew to $5.2 \times 10^6$ km in 2020. Therefore, road maps need to be kept up to date to keep pace with road construction. There are two main sources of update data: (1) road information extracted from aerial images (Yuan and Cheriyadat 2016; Karaduman, Cinar, and Eren 2019; Zhang et al. 2019; Wang, Hou, and Ren 2017) and (2) road information collected by professionally operated probe cars (Gwon et al. 2017; Jo and Sunwoo 2014; Massow et al. 2016). The first source consists of high-resolution satellite images that are processed with a shape classification algorithm to estimate the boundaries of roads. Aerial images are an important source of road map updates. However, it is costly to acquire suitable satellite images. The second source requires probe cars equipped with on-board sensors such as Real-Time Kinematic (RTK) GPS, Post-Processing Kinematic (PPK) devices, and laser scanners to collect road information (Sester 2020). The accuracy of the resulting maps is higher than that of the first method. However, in addition to the high cost of on-board sensors, this method is a labor-intensive way to collect road information.
Considering the limitations of the two methods mentioned above, some researchers have proposed extracting road maps from floating car data (Wang et al. 2015; Zheng and Zhu 2019; Fang et al. 2016). As the cost of GPS and communication technologies has decreased in recent years, many vehicles are equipped with a GPS receiver to record trace data. In contrast to aerial images, floating car data are accessible, have wide coverage, are available in large amounts, and are quickly updated. However, the accuracy of GPS data reaches only 5 m to 30 m due to signal interruptions and multipath transmission. The large set of trajectories can compensate for the shortcomings in accuracy, but the low accuracy still makes it difficult to mine road networks from these data. In this paper, we propose a novel method of extracting road maps from floating car data.
In general, the main contributions of this paper are as follows: (1) An Otsu-based background segmentation algorithm is introduced to detect road intersections; (2) A gamma-correction-based spatiotemporal prediction algorithm is utilized to increase the accuracy of intersection detection; (3) A clothoid curve is used to resample the GPS data, and the distance and direction similarity are combined to cluster the data.

Related work
There are three main steps to mining road maps from floating car data: intersection detection, data clustering, and centerline extraction of roads from the clustered data. As an important component of roads, an intersection has a more complex structure than a road segment. Therefore, intersection detection is the first step in generating a road network from floating car data. Then, it is necessary to cluster together the trajectories that belong to the same road. Finally, centerline extraction is essential to describe the shape of the road.
Road intersection extraction is one of the most important and difficult steps in road network mining. Some studies have identified road intersections and segments from the angle and distance of their trajectories (Fathi and Krumm 2010; Yang et al. 2018). In addition, a speed threshold combined with direction changes has been used to detect intersections (Chen et al. 2020). Deng et al. (2018) introduced a local G* statistic to detect GPS points with large turning angles. Wang et al. (2015) determined intersection boundaries by analyzing conflict points that have large intersection angles. The turning angle is an important feature for detecting intersections from trajectory data in the studies mentioned above. However, the heading angles of GPS points are inaccurate because of signal interruptions and multipath transmission.
The methods of clustering include (1) clustering based on the density of GPS points (Biagioni and Eriksson 2012; Li et al. 2018) and (2) clustering based on the direction and distance features of GPS traces (Deng et al. 2018; Li et al. 2012; Liu et al. 2012). The kernel density method is the most commonly used way to build the probability function of similar GPS points for clustering (Biagioni and Eriksson 2012). The Delaunay triangulation network is also utilized to cluster the GPS points. However, this density-based method cannot cluster points correctly in road intersections.
Clustering methods based on the direction and distance of traces are widely used and work on most occasions. A region-growing clustering method has been presented that clusters GPS trajectories using the vertical and angular differences of trajectory vectors and assigns different weighting values to the vertical distance and angle to calculate trajectory similarity. Deng et al. (2018) combined the longest common sub-track with a distance-direction function to calculate the total similarity of adjacent tracks. First, the shape similarity of two adjacent tracks was measured by calculating the ratio of the longest matched sub-trace between two associated trajectories. Then, a distance-direction function was used to compute the heading direction similarity. The overall similarity was measured by combining the two steps. Based on the position and direction components of GPS traces, Li et al. (2012) added a semantic relationship to classify GPS points. In addition, Liu et al. (2012) further optimized the distance- and direction-based method. The orientation similarity and geographical distance were used first to perform basic clustering. Then, a clustering refinement method was proposed. The main concept of the refinement method was to calculate backbone curves to represent roads first and then group the closest samples into smaller clusters. The algorithms mentioned above can work effectively in some instances, but they fail on complex roads.
Extracting the centerline of a road from floating car data is an important step. Various algorithms have been proposed to accurately describe the geometrical shape of roads. Some researchers converted floating car data into bitmaps and used a grayscale map skeletonization method to thin and prune the data to generate the centerline (Shi, Shen, and Liu 2009; Biagioni and Eriksson 2012; Chen and Cheng 2008). This method can extract centerlines successfully when the density of GPS points remains moderate. However, it fails when the density of points becomes very large or very small. Li et al. (2012) and Bruntrup et al. (2005) used an incremental method to generate road networks from GPS data. They first grouped the traces that belonged to the same road and then utilized an incremental approach to generate the centerline. This method needs to match all trajectories and modify the centerline step by step, which makes it inefficient. Some studies introduced a Gaussian mixture model to extract the number and location of lanes from GPS data (Chen and Krumm 2010; Guo, Iwamura, and Koga 2007), assuming that these data follow a normal distribution. However, this method is better suited to high-accuracy data obtained in a controlled environment (Winden, Biljecki, and Van Der Spek 2016; Ahmed et al. 2015). In addition, Cao and Krumm (2009) proposed a point-based physical attraction model to generate the centerline. It was assumed that two types of forces act on a GPS point. One is the total gravitational force from the neighboring points. The other is the spring force that keeps a point in its original position. This method is inefficient and fails when the density of GPS points is high.
In contrast to these methods, in this paper, the Otsu algorithm, which is often used in computer vision to segment foreground and background, is adopted to detect road intersections. Furthermore, a clothoid curve is utilized to resample the floating car data, and a direction-distance clustering algorithm is used to cluster the data to group similar trajectories together. Finally, the centerline of the road is extracted with a weighted least squares algorithm.

Method
In this section, we elaborate on our method, which includes road intersection detection, data clustering, and centerline extraction. First, we convert the GPS data to raster data with a Gaussian model and detect road intersections with the Otsu algorithm. Then, we use a clothoid-based method to resample the trajectories and calculate their distance and direction similarity to cluster similar ones together. Finally, we describe a piecewise weighted least squares fitting method that extracts the centerline of the clustered data and builds a road network that describes the topology and geometry of the roads.

Problem statement
A complete road can be separated into road segments and intersections, as shown in Figure 1. Road intersection detection is critical to mining road networks from floating car data. A road intersection is a junction where more than two roads meet or cross. Compared with a road segment, an intersection is more complex because it may include a left-turn lane, right-turn lane, straight lane, and U-turn lane, as represented in Figure 1. To represent the road shape exactly, a road intersection cannot be described by a single point. Therefore, clustering is necessary to group similar GPS trajectories. Centerline extraction is another important step in mining road networks: we need to calculate the centerline from the clustered data to describe the road. To address these problems, this paper proposes a novel method to extract the road network from floating car data. To detect road intersections from trajectories, we apply an Otsu-based background segmentation method. To the best of our knowledge, this is the first study to use this method to extract intersections from floating car data.

Road intersection detection
To avoid interference with other traffic, vehicles slow down at intersections, where traffic signals are usually installed. As a result, there are significant differences in the distribution of floating car data between road segments and intersections. Compared with road segments, the data are denser at intersections, as shown in Figure 2. Based on this, an Otsu-based background segmentation algorithm is utilized to detect road junctions. First, a Gaussian model is used to resample the data to a grid. To increase the distinction between the segments (background) and intersections (foreground), we utilize a gamma-correction-based spatiotemporal prediction algorithm to process the grid images. Finally, Otsu is introduced to separate the background and foreground features.

Resampling
Resampling is used to transform the GPS data into raster data. A raster can integrate a large amount of GPS data efficiently. A Gaussian model is used to calculate the weight of each grid cell. The cell's intensity is calculated from the weight of the surrounding GPS points as illustrated in Figure 3(a). P is a grid cell, and B is a buffer with a 3σ radius (σ is relevant to the error of GPS). The weight values of GPS points in B are calculated with Equation (1). The intensity of P is computed by accumulating the weights in buffer B as Equation (2) shows.
where $W(x_i, y_i)$ is the weight value of any point $P_i$ in $B$, $G(P)$ is the intensity value of grid cell $P$, $\sigma$ represents the standard deviation of the Gaussian model, and $(x_0, y_0)$ represents the coordinate of the center point of the grid cell. $(x_i, y_i)$ is the coordinate of GPS point $P_i$, and $dist(P, P_i)$ is the distance between $P$ and $P_i$. The result of resampling is depicted in Figure 4. The brighter the color, the greater the intensity value, and the result reveals that the intensity values at intersections are clearly larger than those along road segments.

Figure 1. An example of a road segment and an intersection.
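The resampling step can be illustrated with a short sketch. Equations (1) and (2) are not reproduced in this text, so the code assumes an unnormalized Gaussian weight $\exp(-dist^2 / (2\sigma^2))$ for each GPS point inside the $3\sigma$ buffer; the function name `gaussian_raster`, the cell size, and the grid origin are hypothetical choices made for illustration only.

```python
import math

def gaussian_raster(points, x_min, y_min, n_cols, n_rows, cell, sigma):
    """Return an n_rows x n_cols grid of accumulated Gaussian weights.

    Assumption: the weight of a point is exp(-d^2 / (2 sigma^2)); the
    paper's Equation (1) may include a different normalization.
    """
    radius_sq = (3.0 * sigma) ** 2            # squared radius of buffer B
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        y0 = y_min + (row + 0.5) * cell       # cell-center y coordinate
        for col in range(n_cols):
            x0 = x_min + (col + 0.5) * cell   # cell-center x coordinate
            total = 0.0
            for x_i, y_i in points:
                d_sq = (x_i - x0) ** 2 + (y_i - y0) ** 2
                if d_sq <= radius_sq:         # only points inside buffer B
                    total += math.exp(-d_sq / (2.0 * sigma ** 2))
            grid[row][col] = total            # intensity G(P) of the cell
    return grid
```

The triple loop is kept for readability; a production version would index the points spatially so that each cell only visits nearby points.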

Gamma-correction-based spatiotemporal prediction
As Figure 4 shows, the population of GPS points at intersections is clearly greater than that of points along road segments. As the amount of GPS data increases, the intensity differences become increasingly distinct. However, the calculation load also increases. To improve the accuracy and efficiency of Otsu, a gamma-correction-based spatiotemporal prediction algorithm is used in this paper. Gamma correction, which is widely used in image processing, edits an image nonlinearly by modifying its gamma curve: it separates the dark and light parts of the image and increases the ratio between them to improve the image contrast. We introduce a time coefficient $A$ based on gamma correction as in Equation (4). $A$ is the ratio of target time to test time. The gamma-correction-based spatiotemporal prediction method can increase the variance between the background (segments) and foreground (intersections), which can improve the accuracy of Otsu.
where $A$ is the time coefficient based on the ratio of the prediction time $T$ to the base time $T_0$, $\gamma$ is the separation coefficient between the background and foreground, $I_i$ represents the normalized intensity of the grid cell, and $G_i$ is the density of the grid cell corresponding to Equation (3). The results of the gamma-correction-based spatiotemporal prediction algorithm are shown in Figure 5. From the figure, it can be seen that the variance between the background and foreground is increased with this method.
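As an illustration of this step, the sketch below applies the conventional gamma-correction form $A \cdot I^{\gamma}$ to a grid of intensities. The equations of this subsection are not reproduced in this text, so both the max-normalization used for $I_i$ and the exact placement of $A$ are assumptions; `gamma_predict` is a hypothetical name.

```python
def gamma_predict(grid, time_coeff, gamma):
    """Apply A * I**gamma to every cell of a grid of intensities.

    Assumptions: I is the intensity divided by the grid maximum, and the
    time coefficient A scales the corrected value linearly.
    """
    g_max = max(max(row) for row in grid)
    if g_max == 0:
        # Empty grid: nothing to correct.
        return [[0.0 for _ in row] for row in grid]
    return [[time_coeff * (g / g_max) ** gamma for g in row] for row in grid]
```

With $\gamma > 1$, dense cells are attenuated less than sparse ones. For example, two cells with intensities 2 and 4 have a ratio of 2 before correction and a ratio of 4 after correction with $\gamma = 2$, which is the kind of contrast increase the text describes.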

Otsu-based intersections detection
Otsu is a method often used in computer vision and image processing. It assumes that images contain two classes of pixels, background and foreground, and calculates the optimum threshold that maximizes the interclass variance and separates the two classes (Otsu 1979). In this paper, we utilize this method to detect intersections from the results of the gamma-correction-based spatiotemporal prediction method. The ratios of intersections and segments are represented by $w_1$ and $w_2$, and their mean values are $u_1$ and $u_2$. The interclass variance can be calculated with Equation (7). By traversing different gray thresholds, the relevant ratios and mean values are calculated. Finally, the optimum gray threshold is calculated to classify the road segments and intersections, and the result is illustrated in Figure 6. The red dots indicate the points selected as foreground by Otsu, and the black dots are the background points.
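The threshold search can be sketched as follows. The code uses the standard form of the interclass variance, $w_1 w_2 (u_1 - u_2)^2$, which is assumed to match Equation (7); the function name `otsu_threshold` and the default of 256 histogram bins are illustrative choices, not taken from the paper.

```python
def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes the interclass variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo                              # a single gray level: no split
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_var, best_bin = -1.0, 0
    count_bg, sum_bg = 0, 0.0
    for t in range(bins - 1):                  # candidate split after bin t
        count_bg += hist[t]
        sum_bg += t * hist[t]
        count_fg = total - count_bg
        if count_bg == 0 or count_fg == 0:
            continue
        w1, w2 = count_bg / total, count_fg / total      # class ratios
        u1 = sum_bg / count_bg                           # class means
        u2 = (sum_all - sum_bg) / count_fg
        variance = w1 * w2 * (u1 - u2) ** 2              # interclass variance
        if variance > best_var:
            best_var, best_bin = variance, t
    return lo + (best_bin + 1) * width
```

Grid cells whose corrected intensity exceeds the returned threshold would be labeled as foreground (intersection candidates), and the remaining cells as background (road segments).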

Clustering
Because the sampling interval of floating car data ranges from 20 s to 60 s, the distance between two consecutive GPS points is usually long and variable. We therefore introduce a clothoid-based method to resample the trajectory. We cluster the trajectories in road segments and road intersections separately, as we have detected the boundary of a road intersection in Section 3.2.

Clothoid-based trajectory resampling
Clothoids are widely used as transition curves in road engineering for their connection to the geometry between tangents and circular curves (Meek and Walton 1992). The curvature of a clothoid changes linearly with its curve length, which is in accordance with the law of vehicle dynamics. Therefore, we utilize a clothoid curve to resample the trajectories. A clothoid curve can be described with an expansion of the Fresnel integral as shown in equations (8) and (9).
where $s$ is the length of the curve from its start point $(x_0, y_0)$, $\vartheta_0$ is the direction of the start point, $k$ is the curvature of the curve at the start point, and $k'$ is the rate of change of curvature.
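To make the clothoid parameterization concrete, the sketch below evaluates a point on the curve by numerically integrating the heading $\vartheta_0 + k\tau + \tfrac{1}{2}k'\tau^2$ along the arc length. This is the usual integral definition of a clothoid and is assumed to agree with Equations (8) and (9), which are not reproduced in this text. The names `clothoid_point` and `resample_clothoid`, the argument `dk` (standing for $k'$), and the use of Simpson's rule are illustrative choices.

```python
import math

def clothoid_point(x0, y0, theta0, k, dk, s, n=200):
    """Return (x, y, heading) at arc length s from the start point.

    Integrates cos/sin of the heading with composite Simpson's rule
    (n must be even).
    """
    def heading(t):
        return theta0 + k * t + 0.5 * dk * t * t

    h = s / n
    sum_x = math.cos(heading(0.0)) + math.cos(heading(s))
    sum_y = math.sin(heading(0.0)) + math.sin(heading(s))
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0         # Simpson weights 4, 2, 4, ...
        t = i * h
        sum_x += weight * math.cos(heading(t))
        sum_y += weight * math.sin(heading(t))
    return x0 + sum_x * h / 3.0, y0 + sum_y * h / 3.0, heading(s)

def resample_clothoid(x0, y0, theta0, k, dk, length, step):
    """Sample the curve every `step` units of arc length, starting at 0."""
    count = int(length // step)
    return [clothoid_point(x0, y0, theta0, k, dk, i * step)
            for i in range(count + 1)]
```

With `k = 0` and `dk = 0` the curve degenerates to a straight line, and with `dk = 0` alone it is a circular arc, which gives two simple sanity checks for the integration.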
In floating car data, the positions of the start and end points and their directions can be deduced. The main problem of generating a clothoid curve between two adjacent GPS points on a trajectory is to calculate the $k$ and $k'$ of the clothoid curve. According to the method proposed by Bertolazzi and Frego (2015), the parameters of a clothoid curve can be calculated from the positions and directions of two points, as depicted in Figure 7(a). To generate a clothoid curve between two points, the distance $d$ between the points and the incline angle $\varphi$ of the line segment are calculated first. Then, the angles between the incline angle $\varphi$ and the directions of the start and end points, $\vartheta_0$ and $\vartheta_1$, are denoted as $\phi_0$ and $\phi_1$. The total arc length $S_t$ of the curve is calculated by dividing $d$ by the Fresnel integrals of $\phi_0 - \phi_1$, $\vartheta_0 - \vartheta_1$, and $\phi_0$. The curvature at the start point $k$ and the rate of curvature change $k'$ are calculated based on the arc length $S_t$ according to Bertolazzi and Frego (2015).
According to the previous step, we create a series of clothoid curves to link the trajectories. To calculate the similarity accurately between different trajectories, we resample the clothoid curve of the trajectories with an arc length step l as shown in Figure 7(b). The orange dots are the original GPS points, and the red points represent the resample points. The blue line is the polyline composed of the original GPS points, and the green line is the clothoid curve composed of the original points and the resample points. The red arrows depict the direction of the original points.

Distance similarity
There is a high probability that two adjacent trajectories belong to the same road. Therefore, we propose a distance-contribution function to calculate the distance similarity of two associated trajectories. As illustrated in Figure 8, $T_1$ and $T_2$ are two adjacent trajectories, $T_1 = \{p_1, p_2, \dots, p_k\}$ and $T_2 = \{q_1, q_2, \dots, q_k\}$, after resampling by a clothoid curve. We set $T_1$ as the matching trajectory and $T_2$ as the reference trajectory. In the first step, we calculate the shortest distance $d_i$ from $p_i$ to $T_2$. If $d_i < r$ ($r$ is the distance threshold), then we calculate the difference $dis_{radius}$ between the turning radii at the two points. If $dis_{radius} < d_r$ ($d_r$ is the turning radius threshold), as depicted in Figure 9(a) ($d_i < r$, $dis_{radius} < d_r$), we calculate the value of the distance-contribution function $f_{con}$ with Equation (10). If $dis_{radius} > d_r$, as shown in Figure 9(b), the value of $f_{con}$ is zero. In addition, if $d_i > r$ in the first step, as represented in Figure 9(c), the value of $f_{con}$ is zero.
Then, the total value of the distance-contribution function of $T_1$ is calculated with Equation (11). Finally, the distance similarity from $T_1$ to $T_2$ is calculated as the ratio of $F_{CON}$ to the length of $T_2$, as shown in Equation (12).
where $d_i$ is the distance from $p_i$ to $T_2$ and $\sigma$ represents the precision of the position. $F_{CON}$ denotes the total value of the distance-contribution function of $T_1$, and $Sim_{dis}$ is the distance similarity of $T_1$ and $T_2$. $Len_{T_2}$ represents the length of $T_2$.
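The gating logic of the distance-contribution function can be sketched as below. Equations (10) to (12) are not reproduced in this text, so three points are assumptions and are marked as such in the code: the contribution is taken to be a Gaussian of $d_i$ with parameter $\sigma$, the shortest distance to $T_2$ is approximated by the distance to the nearest resampled node of $T_2$, and the length of $T_2$ is measured as its number of resampled nodes. The function name `distance_similarity` and the tuple layout are hypothetical.

```python
import math

def distance_similarity(t1, t2, r, d_r, sigma):
    """Distance similarity from matching trajectory t1 to reference t2.

    t1, t2: lists of (x, y, turning_radius) for the resampled nodes.
    """
    f_total = 0.0                                    # F_CON
    for x_p, y_p, radius_p in t1:
        # Assumption: nearest node of T2 stands in for the shortest
        # distance d_i from p_i to T2.
        d_i, radius_q = min(
            (math.hypot(x_p - x_q, y_p - y_q), rad_q)
            for x_q, y_q, rad_q in t2
        )
        if d_i < r and abs(radius_p - radius_q) < d_r:
            # Assumed form of f_con: Gaussian decay with distance.
            f_total += math.exp(-d_i ** 2 / (2.0 * sigma ** 2))
        # Otherwise f_con is zero, as in Figure 9(b) and 9(c).
    # Assumption: Len_T2 is the node count of the reference trajectory.
    return f_total / len(t2)
```

Under these assumptions two identical trajectories have a similarity of 1, and the value falls to 0 as soon as every node fails either the distance or the turning radius test.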

Direction similarity
Direction is another important parameter for clustering floating car traces. According to the expansion of a clothoid in Equations (8) and (9), the direction of any point on the curve can be calculated using the arc length to the start point of the curve. Therefore, we can use a piecewise function to calculate the direction of all the resample nodes. When the resampled node is between two GPS points $N_i$ and $N_{i+1}$ of a trajectory, the direction of the resample node can be calculated with Equation (13).
where $s$ is the arc length to the start point of the trajectory, $\vartheta_i$ is the direction of $N_i$, $k_i$ is the curvature of the curve at $N_i$, and $k'_i$ is the rate of the change in the curvature.
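For reference, the heading along a clothoid piece follows directly from its linearly varying curvature. The expression below is a reconstruction under that definition and is not copied from Equation (13); $s_i$, the arc length at which node $N_i$ lies, is a symbol introduced here for illustration.

```latex
\vartheta(s) = \vartheta_i + k_i\,(s - s_i) + \tfrac{1}{2}\,k'_i\,(s - s_i)^2,
\qquad s_i \le s \le s_{i+1}
```

Differentiating with respect to $s$ gives the curvature $k_i + k'_i (s - s_i)$, which is linear in arc length as required for a clothoid.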
The direction similarity between two trajectories can be calculated using a resample node that meets the distance and turning radius requirements. Figure 9 (a) shows a resampling node q j in the reference trajectory T 2 that has a corresponding nearest node p i in the matching trajectory T 1 that meets the distance and turning radius requirements. The direction similarity function can be described, as in Equation (14).

Clustering the trajectories
The overall similarity function is calculated by considering both the distance similarity and the direction similarity as in Equation (15).
A hierarchical clustering method is used to classify all the trajectories in the intersections for different turning directions.
Step 1: All the trajectories are marked as "unused", and the base clusters are initialized to empty.
Step 2: Two trajectories are chosen and their similarity value is calculated. If the similarity is less than a given threshold $T_g$, two different clusters are added to the base clusters, and each trajectory is marked with the index of its base cluster.
Step 3: Each of the remaining trajectories is compared with all the clusters in the list. If its similarity to one trajectory of a cluster is larger than the given threshold, the trajectory is added to that cluster. If its similarity to all the clusters in the list is smaller than the given threshold, the trajectory is treated as a new cluster and added to the base clusters.
Step 4: After all the trajectories of the intersection are clustered, the clusters that have more than $N$ trajectories are selected.
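The four steps above can be sketched as a single greedy pass. The similarity function is passed in as a parameter because Equations (14) and (15) are not reproduced in this text; the name `cluster_trajectories` and the rule that a trajectory joins the first cluster containing a sufficiently similar member are one reading of Step 3, not a confirmed detail of the original implementation.

```python
def cluster_trajectories(trajectories, similarity, t_g, n_min):
    """Group trajectories whose similarity exceeds t_g; keep large clusters.

    similarity: callable taking two trajectories and returning a number.
    """
    clusters = []                                   # base clusters (Step 1)
    for traj in trajectories:                       # Steps 2 and 3
        for cluster in clusters:
            if any(similarity(traj, member) > t_g for member in cluster):
                cluster.append(traj)
                break
        else:
            clusters.append([traj])                 # no match: new cluster
    # Step 4: keep only clusters with more than n_min trajectories.
    return [c for c in clusters if len(c) > n_min]
```

Because the result depends on the order in which trajectories are visited, a practical implementation might sort or shuffle the input and check the stability of the clusters.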

Centerline extraction
In the clustering result for an intersection, the trajectories having the same direction contain many discrete points that have certain aggregation characteristics. A piecewise weighted least squares fitting method is used to extract the centerline of the clusters. The detailed steps of the proposed method are as follows:
Step 1: A rectangular fitting region is created in front of the entry point of the cluster. The direction of the region is the same as the direction of the entry point, as shown in Figure 10.
Step 2: The grid cells of the bitmap generated in Section 3.2 that fall within the rectangular region are selected if their intensities are greater than the threshold $T_i$. The selected grid cells are used as the key points to fit the line segment of the region.
Step 3: A weighted least squares fit is used to compute the parameters of the line segment of the region. The weights of the key points are the normalized values of their intensities. The result of the weighted least squares fitting is shown in Figure 11.
The fitted line segment of the rectangular region is described by Equation (16). The residual error $E$ of fitting the line segment under weight $W$ is shown in Equation (17).
By minimizing the residual error $E$, the parameter matrix $c = [a, b]^T$ can be calculated with Equation (18).
where $A$ and $b$ are the matrix forms of the horizontal and vertical coordinates, respectively, and $W$ is the weight matrix.
Step 4: A sequence of line segments is extracted along the cluster by repeating Step 1 to Step 3. The direction of the next rectangular region is set to the direction of the current rectangular region.
Step 5: A clothoid curve is used to fit the global segments of the clusters, which makes the centerline smoother and closer to the real road.
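The fit in Step 3 can be illustrated with a closed-form weighted least squares solution for a line $y = ax + b$. This is the standard solution of the weighted normal equations and is assumed to be equivalent to Equation (18), which is not reproduced in this text; the function name `weighted_line_fit` is hypothetical, and the sketch assumes coordinates expressed in a local frame in which the segment is not vertical.

```python
def weighted_line_fit(xs, ys, weights):
    """Fit y = a*x + b by minimizing sum(w * (a*x + b - y)**2)."""
    s_w = sum(weights)
    s_x = sum(w * x for w, x in zip(weights, xs))
    s_y = sum(w * y for w, y in zip(weights, ys))
    s_xx = sum(w * x * x for w, x in zip(weights, xs))
    s_xy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    det = s_w * s_xx - s_x * s_x          # determinant of the normal matrix
    if det == 0:
        raise ValueError("key points do not determine a non-vertical line")
    a = (s_w * s_xy - s_x * s_y) / det    # slope
    b = (s_xx * s_y - s_x * s_xy) / det   # intercept
    return a, b
```

Here the weights would be the normalized intensities of the key grid cells, so that dense cells near the true centerline pull the fitted segment more strongly than sparse cells at the edge of the road.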

Road network building
Because we calculate the boundary of an intersection in Section 3.2.3, the traces are divided into two parts: road segment and intersection. The cutting points are the crossing points of the centerline of the clustered data and the boundary of the intersection. If the direction of a point is toward the inside of a road intersection, it is defined as an entry point. Otherwise, it is defined as an exit point, as depicted in Figure 12. If the start point of a road can be matched with the endpoint of another road, the two roads are connected. As shown in Figure 12, $L_1$ is connected to $L_9$, $L_{10}$, $L_{11}$, and $L_{12}$. The road segments are connected by road intersections. The connectivity attribute of Figure 12 is expressed in Table 1.

Experimental results and discussions
To evaluate the proposed approach, experiments are conducted on two datasets from Wuhan, China. In this section, we first introduce the two datasets and the parameters used in this paper. Thereafter, we show the results of intersection detection, clustering, and centerline extraction. Then, we compare the proposed method with two other methods. Finally, we discuss the advantages and disadvantages of the proposed method.

Experimental data and parameters
To test the method proposed in this paper, we use two datasets collected by thousands of vehicles in Wuhan, China. Figure 13(a) illustrates data 1, which contains 700,000 track points and was cleaned in our previous study. The sampling interval of the data ranges from 20 s to 60 s, and the position accuracy ranges from 5 m to 30 m. Figure 13(b) illustrates the original floating car data (data 2). We collected approximately 1.4 million track points in one week. The parameter setting values are shown in Table 2.
In this article, some parameters need to be set. First, $\sigma$ is related to the GPS error, and we set $\sigma = 30$. $A$ is the time coefficient in the gamma-correction-based spatiotemporal prediction process; in our study, $A = 12$. $\gamma$ is the separation coefficient between the background and foreground. The arc length step $l$ is used to resample the clothoid curve of the trajectories, and we set $l = 5$ for this paper. Between two different trajectories, the distance threshold $r$ and turning radius threshold $d_r$ are used to calculate their distance similarity. We set $r$ = 10 m and $d_r$ = 20 m. To cluster all the trajectories, we use a threshold $T_g$ to decide whether different trajectories are similar to each other. When a cluster contains more than $N$ trajectories, it is selected as a turning direction of the intersection. We set $T_g$ = 0.45 and $N$ = 10. For centerline extraction, the centerlines of the clusters are generated along the trajectories by a series of rectangular regions. The bitmaps of the rectangular regions are used in this step. The grid cells whose intensities are larger than $T_i$ are selected as key points to fit the centerline segment in this rectangular region with the least squares method. We set $T_i$ = 6 for this paper.

Figure 13. The experimental floating car data.

Intersection detection results
The results of intersection detection in data 1 are depicted in Figure 14. Figure 14(a) shows the results of Otsu, and the detection results are shown in Figure 14(b). Each circle represents the spatial coverage of an intersection. Manual inspection yields four kinds of results: correctly detected, incorrectly detected, correctly excluded, and not detected.
As the area of data 1 is one of the CBDs of Wuhan, traffic flow is high and traffic jams are frequent. Nevertheless, more than 92% of the intersections are correctly detected by our method, indicating the validity of the Otsu-based intersection detection algorithm. One road intersection is not detected, as shown in rectangle A. The main reason is that the variation of the data density at this intersection is not obvious compared with the nearby road segments. In addition, three intersections are incorrectly detected. As shown in rectangle B, the density of data in the red rectangle is greater than that of the nearby data because it is the entrance to a community; as a result, these data are incorrectly detected as an intersection.
The results of intersection detection in data 2 are depicted in Figure 15. Data 2 is the original floating car data that are full of noise points. Compared with data 1, data 2 contains more road scenarios and is more complex. However, from Figure 15(a,b), we can see that more than 85% of the road intersections are accurately detected by our method, demonstrating the robustness of the algorithm.
Some of the minor intersections are not detected or are incorrectly detected. For instance, intersections C and D in Figure 15(b) are not detected. The main reason is that few vehicles turn at these intersections, so the data density is not significantly different from that of the surrounding data. In this case, Otsu cannot tell the foreground from the background. Additionally, intersections E and F are incorrectly detected. The main reasons are that traffic is frequently congested on these roads, or that there is a community or shopping mall entrance where many vehicles stop. Therefore, the data density is significantly greater than that of the surroundings.

Results of clustering and centerline extraction
In the data clustering part, we first use a clothoid-based method to resample the GPS trajectory. The clothoid curve can correctly resample the trajectory to make it closer to the real trajectory of the vehicle, and after clothoid-based resampling the accuracy of clustering is not affected by the low sampling frequency of floating cars. Then, we calculate the distance and direction similarity of the trajectories to cluster them. Data with greater distance and direction similarity have a greater probability of belonging to the same cluster. We use satellite images as the background for comparison with the clustering and centerline extraction results, as shown in Figure 16. The results demonstrate that the proposed clustering and centerline extraction method can correctly delineate the geometries of road intersections and segments.
The result of centerline extraction is illustrated in Figure 17. Most road segments and intersections are correctly extracted. In addition, the clothoid curve is also used in the centerline extraction part to fit the global segments of the clusters, which can make the centerline smoother and closer to the real road compared with a polyline. The position accuracy of the road network can reach 2 m to 5 m.

Results comparison and evaluation
In the intersection detection step, we compare the proposed method with the algorithm proposed by Deng et al. (2018) using three indicators, precision, recall, and F value, on data 1 and data 2, as defined in Equations (20) to (22). Deng et al. detected intersections by a hotspot analysis, and we detected intersections through a computer-vision-based Otsu method. The comparative results of the two methods are shown in Figure 18. The recall value of our method is higher than that of Deng's method for data 1. However, the precision of our method is lower than that of Deng's method. The main reason is that our method is sensitive to changes in data density: roads where traffic congestion frequently forms are easily recognized as intersections by our algorithm. For data 2, road scenarios are complex and data quality is low. Nevertheless, our method achieved a higher precision value and significantly higher recall and F values than Deng's method, which demonstrates the robustness of our method for intersection detection.
$$\text{recall} = \frac{\text{correctly detected}}{\text{correctly detected} + \text{not detected}} \qquad (21)$$

In addition, for centerline extraction, we compare our method with the existing algorithm proposed by Biagioni and Eriksson (2012). The authors proposed a classic map inference method. First, they extracted the map centerline from density estimate data through a grayscale map skeletonization method. Then, they pruned and merged the edges and intersections based on a trajectory-based topology refinement technique. Finally, they estimated intersection geometries by a trajectory-based geometric refinement technique. We used a manually interpreted section of OpenStreetMap as ground truth. The method proposed in reference (Liu et al. 2012b) is used to perform a quantitative measurement of the two methods. In reference (Liu et al. 2012b), the authors performed a quantitative evaluation by measuring the precision and recall of the inferred map $M$ against the ground truth map $T_m$. In addition, they determined the true positive length $T_p = M \cap T_m$ as a measure of common road length. To calculate the true positive length $T_p$, we first sample the map at 5 m intervals and then compute it by one-to-one map matching.
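The three intersection detection indicators can be computed from the manual inspection counts as sketched below. Only the recall definition survives in this text; the precision and F value follow their conventional definitions, which are assumed to match Equations (20) and (22). The function name `detection_scores` and the example counts are hypothetical and are not results from the experiments.

```python
def detection_scores(correct, incorrect, missed):
    """Return (precision, recall, F value) from detection counts.

    correct:   intersections correctly detected
    incorrect: locations incorrectly detected as intersections
    missed:    intersections not detected
    """
    detected = correct + incorrect
    actual = correct + missed
    precision = correct / detected if detected else 0.0
    recall = correct / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_value

# Hypothetical counts, for illustration only.
print(detection_scores(correct=9, incorrect=1, missed=3))
```

The same three formulas apply to the centerline comparison if the counts are replaced by matched and unmatched road lengths.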
The comparison results are shown in Figure 19. For both data 1 and data 2, the proposed method achieved significantly higher precision and recall. The main reason is that the proposed method clusters the data first and extracts the road centerline from the clustering results, which describes the road geometry more precisely. In addition, we use a clothoid-based trajectory resampling algorithm to increase the data density, which also improves the accuracy of the results, as shown in Figure 20.

Discussion
The overall method of this paper can be divided into two parts: (1) intersection detection and (2) data clustering and centerline extraction. In the first part, we tested the algorithm on two datasets and compared our method with Deng's. The results show that our method achieves a clearly higher recall, which indicates its robustness. However, the precision of the algorithm needs further improvement, as it is sensitive to changes in data density. Roads where traffic congestion frequently forms, or the entrance to a shopping mall, may be recognized as intersections. In general, however, the method in this paper achieves better results in complex environments because it uses a computer-vision-based Otsu method. In our future work, we will consider the direction change of the data at intersections, combine the direction change with the density difference between road intersections and segments, and try to use machine learning to extract road intersections.

In the second part, we compared our method with Biagioni's. The algorithm in this article achieved higher precision and recall. Our method describes the geometry and topology of roads in more detail because we resample the trajectories based on clothoid curves and cluster the data into different classes. However, in places where GPS satellite coverage is severely occluded, trajectory errors make resampling impossible, which affects the results of clustering and centerline extraction. In our future work, we will try to incorporate image data to infer higher-precision road information.

Figure 18. Results comparison of intersection detection.
Figure 19. Results comparison of centerline extraction.

Conclusions
In this paper, we proposed a novel method to mine road networks from floating car data. We first presented an Otsu-based algorithm for intersection detection, which is the first use of this method for that task. In the clustering step, we proposed a clothoid-based method to resample the trajectories for improved clustering accuracy. Then, a distance-direction method was used to cluster the data. Last, a piecewise weighted least squares fitting method was used to extract the cluster centerlines. We compared the proposed method with existing methods. Our method detects intersections effectively and robustly, and the extracted road centerlines are more accurate and smoother than those of the compared algorithms. The geometric information and topological structure of roads are important parts of HD maps. Our method can be used to update road maps and to provide a geometric basis for HD maps. In our future work, we will try to extract more detailed and accurate road information and build a complete HD map.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability statement
The data that support the findings of this study are available from DF GO. DF GO is a mobility technology platform. It offers app-based services including taxi-hailing. Restrictions apply to the availability of these data, which were used under license for this study. Data are available with the permission of DF GO (www.dfcx.com.cn).