Length-squared L-function for identifying clustering pattern of network-constrained flows

ABSTRACT The network-constrained flow is defined as the movement between two locations along road networks, such as the residence-workplace flow of city dwellers. Among flow patterns, clustering (i.e. the origins and destinations are aggregated simultaneously) is one of the cities’ most common and vital patterns, which assists in uncovering fundamental mobility trends. The existing methods for detecting the clustering pattern of network-constrained flows do not consider the impact of road network topology complexity on detection results. They may underestimate the flow clustering between networks of simple topology (roads with simpler shapes and fewer links, e.g. straight roads) but with high network intensity (i.e. flow number per network flow space), and determining the actual scale of an observed pattern remains challenging. This study develops a novel method, the length-squared L-function, to identify clustering patterns of network-constrained flows. We first use the L-function and its derivative to examine the clustering scales. Further, we calculate the local L-function to ascertain the locations of the clustering patterns. In synthetic and practical experiments, our method can identify flow clustering patterns of high intensities and reveal the residents’ main travel behavior trends with taxi OD flows, providing more reasonable suggestions for urban planning.


Introduction
Spatial movement, such as population migration, economic mobility, and disease spread, can be naturally abstracted into a simple origin-to-destination (OD) model called geographical flow (hereafter, flow) (Guo et al. 2012;Zhu and Guo 2014;Zhao et al. 2020).Flow can be classified into two types concerning space.One is free flow, whose origin and destination can be located anywhere within the planar (see Figure 1a), e.g. the OD flow of a typhoon.The other is constrained flow, whose origin and destination can be located only at restricted locations (see Figure 1b).Many social and economic activities in urban areas are subject to location restrictions imposed by road structures; thus, the network-constrained flow can better depict the movement of goods and people in cities than the free flow.A network-constrained flow is noted as (X O , Y O , X D , Y D ), it can be treated as a point in a network-constrained flow space (or network flow space), which is seen as the Cartesian product of two networks (i.e. the O network and D network) (Gao et al. 2018;Jiang et al. 2022).Studying how people and things move within urban networks and analysing how these flows are distributed across space is crucial for gaining insights into spatial interactions and human activities.
Clustering (or aggregation) is one of the most significant flow patterns.Identifying flow clustering in network flow space is essential in quantifying urban research.For instance, the cluster of transportation flows with a small scale might represent the strong traffic demand between the two locations, which may help to customize a more optimized traffic plan (Chen et al. 2013); detecting aggregated stolen-recovery flow of crimes would provide a perspective for understanding the scope and pattern of criminal events, thus achieving better crime prevention (Tao and Thill 2016); examining the variation of aggregated human flow distribution before and after extreme weather events could analyse the impact of the events on the residents' life (Chen et al. 2020).The size of the aggregation is referred to as the clustering scale (or aggregation scale); it is the basis for identifying the underlying clusters (Kiskowski, Hancock, and Kenworthy 2009;Shu et al. 2021).The clustering scale is usually measured based on the radius of the cluster, namely the distance between the flow at the center of the cluster and the flow farthest from it; this kind of radius-based scale is also called the planar scale (Fang et al. 2023).For example, the planar scales of the four clusters in Figure 1 are r under the maximum distance of flows (i.e. the maximum value of the distance between O points and the distance between D points of the two flows).Similar to detecting aggregated free flows, if the number of flows within the distance r of a particular flow is more than that expected under completely random assumption, clustering pattern centered at this flow is considered to be significant, and the corresponding planar scale is r.For instance, the flow sets in Figure 1 can be recognized as clustering patterns centered at central flows (colored in red) (Shu et al. 2021).
Progresses have been achieved in identifying free flow clustering by extending the existing methods of detecting point clustering in planar.Existing methods can be grouped into three categories (Song et al. 2019).The first is the hierarchical clustering method proposed for cluster identification and visualization (Doantam et al. 2005;Lee, Han, and Whang 2007;Wood, Dykes, and Slingsby 2010;Adrienko and Adrienko 2011).This approach can be used to classify flows into clusters with multiple scales according to the parameters set for cluster numbers (Zhu and Guo 2014).However, the optimal parameter selection is worth further studying, and even an outlier flow would be included in a certain cluster.The second is the density-based method (Pei et al. 2015;Nanni and Pedreschi 2006;Zhu et al. 2013), which is typically developed by altering the distance, density, and reachability using the density-based spatial clustering of applications with noise (DBSCAN) algorithm.DBSCAN can detect arbitrarily shaped clusters and reject outlier flows.However, determining the parameters MinPts and 1 is realized in an interactive way.The third method, namely the spatial statistic-based method aiming to analyse heterogeneity, can overcome the previous problems to a certain extent.It extended existing spatial statistics of point data to flow data, such as Moran's I (Liu, Tong, and Liu 2015) and Getis-Ord G statistics (Berglund and Karlström 1999).Though this kind of method can be seen as a nonparametric method, it mainly focuses on the spatial association of proximate flows.In conclusion, all the methods mentioned above may suffer the same weakness: they are not concerned with the scale characteristics of clustering.So the existing methods cannot accurately estimate the intensity of the clustering pattern, which is the key to pattern detection.
Among existing spatial statistic-based methods for free flows, the flow K-function is regarded as the most effective one.K-function method is able to detect clustering patterns and examine the scale properties (Tao and Thill 2016).In the K-function family, the incremental K-function was proposed to detect the clustering scale (Yamada and Thill 2007;Tao, Thill, and Yamada 2015).Further, the flow K-function was normalized by its expected value under CSR to flow L-function, which further magnifies the difference between observed distribution and random distribution.L-function's first derivative can effectively reveal the clustering scale without manually setting parameters (Sterner, Ribic, and Schatz 1986;Haase 1995;Shu et al. 2021).Though the K-function for network-constrained flows (network flow K-function) (Kan, Kwan, and Tang 2021) was proposed based on studies about the K-function for network-constrained points (Yamada and Thill 2004;Lu and Chen 2007;Okabe and Yamada 2001), it does not consider the effect of network topology complexity.This defect makes it unable to detect the flow clustering pattern of high intensity in network flow space.
To detect clustering patterns with high intensity in the network, the existing methods still have some limitations: First, the network flow K-function detects clustering patterns under the radiusbased planar scale originally designed for planar space, leading to the under or over-estimation of flow intensity in networks (Fang et al. 2023).This is because the planar scale r indicates the same spatial size in planar flow space (see the yellow planar area pairs in Figure 1a), but the spatial size in network flow space indicated by this scale is different and is dependent on the complexity of the network topology (see the yellow network segment pairs in Figure 1b).Specifically, the flow set located at road networks with complex topology (cluster ③ in Figure 1b) has larger flow space and thus has the chance to possess more flows at the scale r than the one between simple roads (cluster ④ in Figure 1b).So, the former flow set is more likely to be identified as a clustering pattern, even though the latter one has a higher intensity in networks.Second, flaws exist in the definition of the flow volume, i.e. the size of the network flow space.Considering that the flow space is the Cartesian product of two networks, simply measuring the size of the network flow space by the length of the network where the flows are located is broad and inaccurate.This definition needs to be rectified for a sounder mathematical foundation for network-constrained flows.Third, although there exist methods of detecting the clustering scale in planar space using the incremental K-function or the first derivative of the L-function, these methods have yet to be extended to determine the clustering scale of the network-constrained flow.
To overcome the aforementioned challenges, we proposed the length-squared K and L-function inspired by the idea of the length-based network scale (Fang et al. 2023).Unlike the radius-based planar scale used in the existing methods, where the clustering scale is represented by the radius of the cluster, the network scale considers the network lengths (i.e. the length of an imaginary straight line that is extended from the network) of the O and D points ranges.For example, the clustering scales under the network scale for cluster ③ and ④ in Figure 1b should be related to 4r and 2r, respectively.Hence, the methods based on the network scale would obtain the actual spatial size, helping detect the true-to-life intensity of the flow set.
In the rest of this paper, we correct the defects in the concepts of network flow space, helping define the K-function under the network scale, namely the length-squared K-function.Then, the expected value of this K-function under complete spatial randomness (CSR) is derived, which is the base of the length-squared L-function.The length-squared L-function could determine clustering patterns at a series of scales, its first derivative can be used to detect the scales of clusters, and its local version help identify the location of clustering patterns.In both synthetic and practical experiments, we found that the existing techniques using the radius-based planar scale may focus on the clustering pattern between complex road networks but with lower intensity, while the proposed length-squared L-function is able to detect clusters between relatively simple roads but with higher intensity, helping to supplement a new perspective to flow clustering detection and find location pairs with close interaction in the city.

Network flow space under the length-based network scale
Network flow space is defined as the Cartesian product of two road networks where flow events happen, which is similar to the free flow space definition (Shu et al. 2021) ).The following subsections will first define the distance and metrics of flows, then introduce the two kinds of neighborhoods under different metrics, which are designed to detect planar and network scales of clustering patterns, respectively.At last, the volume of the network flow space and the subset of it, i.e. h 2 -neighborhood, are deduced to help derive the flow intensity in the network flow space.These definitions support the subsequent derivation for the L-function.
Definition 1. Flow distance and metrics: In networks, the network distance between point p i (x i , y i ) and point p j (x j , y j ) is denoted as d ij : the shortest path distance between two points along the road network, calculated with the aid of Dijkstra's algorithm (Dijkstra 1959).For flow f i and f j , we calculate and reserve both the network distance between O points, i.e. d O ij , and the network distance between D points, i.e. d D ij , as the distance matrix, i.e. 2).In previous studies, metrics related to flows typically synthesize the distribution of O and D points.The most common ones are the additive metric and the maximum metric, which represent the linear combination and maximum value of a certain measurement in the O and D networks, respectively (Tao and Thill 2016).For example, the distance with additive metric (i.e.additive distance) is denoted as , and the distance with maximum metric (i.e.maximum distance) is denoted as ).The former distance can comprehensively consider both the proximity between the O points and the D points, while the latter focuses on the scope of the flows.
Definition 2. R 2 -neighborhood and h 2 -neighborhood: they are both subsets of the network flow space, which center around a certain flow, namely, the central flow.They are designed to detect the planar and network scales of clustering patterns, respectively.For r 2 -neighborhood, it is a flow space where the distance between an arbitrary flow and the central flow is not larger than r.For example, the yellow network segment pairs in Figure 1b refer to r 2 -neighborhoods under the maximum metric in networks.For h 2 -neighborhood, we first introduce the maximum-metric-based h 2 -neighborhood as an example (see Figure 3a).It follows the following two points: First, the total length of the road network occupied by its projections in the O and D network should be the same as h.Second, the radius r (i.e. the distance between each endpoint and the central point) should be the same in the corresponding projection.However, due to the complex configuration of road networks, r will change with the central flow's position.For example, as the h 2 -neighborhood shown in Figure 3a  Counting flows in the neighborhood is one of the most important steps in K and L-function.For the maximum-metric-based h 2 -neighborhood centered on a certain flow f i , the method to count flow in it is as follows: First, for each flow, i.e. f j ( j = 1, . . ., n, j = i), where n is the flow numbers in the research area, calculate the length of h , represented by the yellow path in the O and D network, respectively, are both within the extent of the projections of h 2 -neighborhood centered on flow f i (road segments colored in blue).Thus, flow f p will be counted within this h 2 -neighborhood.However, the flow f q in Figure 3c will not be included since h D r=d D iq exceeds the projection of the h 2 -neighborhood in the D network.Similarly, the corresponding flow counting method for the additive-metric-based h 2 -neighborhood can be deduced in the same manner as above.Definition 3. Network flow volume: The previous study proposed a method that uses the total length of the road network in the experimental area to represent the volume of the network flow space (Kan, Kwan, and Tang 2021).This definition has defects because the network flow space should be regarded as the Cartesian product of the O and D network.Suppose the total length of the O network to be L O and the total length of the D network to be L D ; the volume of this network flow space should be V = L O • L D .The volume of h 2 -neighborhood under different metrics can be derived by the shell integration method (Shu et al. 2021).This method consists of the inner and the outer integrations, which depicts the range of h 2 -neighborhood under certain metric.The inner integration integrates the sub-network from the O network to the D network, while the outer integration integrates the results in inner integration.The relevant integration processes under the maximum and the additive metric are detailed in Figure 4. Suppose a maximum-metric-based h 2 -neighborhood in network flow space centered on a central flow  h 2 -neighborhood under the maximum metric is: When using the additive metric, the h 2 -neighborhood based on a central flow f c is ≤ h}, and the volume of it is: Definition 4. Flow process intensity: Flow process intensity is the expected number of flows per unit volume in the research flow space.It can usually be calculated as , where L O and L D are the length of road networks covered by O points and D points, respectively.In the real world, however, it is sometimes difficult to precisely identify the covering network flow space of the flow process; thus, the intensity parameter can also be estimated using the second-order properties of flows (Pei et al. 2015;Pei et al. 2007;Shu et al. 2021).Due to the limited length of the paper, the detailed deduction is not mentioned here.
As introduced in Definition 1, the maximum metric can better measure the spatial extent of a flow cluster since it binds the distribution range of O and D points, satisfying the demand to quantify the clustering scale of flows.Therefore, we use the maximum metric in the following derivation and experiments.

Network flow K-function and its incremental function
As with Ripley's K-function in point-pattern analysis (Ripley 1976), network flow K-function should be defined as the expected number of additional flows within the r 2 -neighborhood of an arbitrary flow normalized by flow intensity in the network (Kan, Kwan, and Tang 2021).The derived formula for the network flow K-function K(r) is: where r is a designated planar scale, n is the number of flows in the research area, l f is the flow intensity in the network, D max ij is the maximum distance between two flows, and s f ij (r) is used to count flows within the r 2 -neighborhood centered at flow f i .
The local network K-function for each flow is defined as.
where r, l f , n, and s f ij (r) in Equation ( 5) have the same meanings as in Equations ( 3) and (4).The incremental network K-function for constrained points was proposed based on the network K-function to weaken the cumulative effect and thus help quantify clustering scales, which examines the number of points by unit increments of planar scale r rather than the total number of points within r-neighborhood (Yamada and Thill 2007).Therefore, the incremental network flow K-function K incre (r t ) can also be derived as the difference between the network flow K-function value at two consecutive scales: where K(r) is calculated as Equation (3) and r t represents the t th scale in ordered scales.

Length-squared K-function and L-function
Length-squared K-function should be defined as the expected number of additional flows within h 2neighborhood of an arbitrary flow normalized by flow intensity in the network.The derived formula for the length-squared K-function K(h) is: where h is a designated network scale, n is the number of flows in the research area, l f is the flow intensity in the network, and s f ij (h) is used to count flows within the h 2 -neighborhood centered at flow f i .
In point pattern analysis, CSR describes a point process whereby point events occur within a given study area in a completely random distribution.CSR is often applied as the standard or benchmark against which datasets are tested (Tao, Thill, and Yamada 2015;Yamada and Thill 2007;Okabe and Yamada 2001).In the present study, inferences about CSR flow assist in deriving the L-function from the K-function.CSR flows are defined as a homogeneous spatial Poisson process.Given a subset S of flow space, the number of flows in S obeys the Poisson distribution as l f V S , where l f is the intensity of this flow process in networks and V S is the volume of flow space S.
As the definition of K-function, the expected number of additional flows within an h 2 -neighborhood is l f K(h).Combining that with the property of CSR flows, we can conclude that the expected value of the length-squared K-function under CSR equals the neighborhood's volume at each scale.When using the maximum metric for flows, we get that l f K(h) = l f h 2 .Thus, the expectation for the length-squared K-function with a homogeneous Poisson flow process under the maximum metric is: By normalizing the length-squared K-function with its expectation, we can obtain the lengthsquared L-function, the expectation of which is 0 at any scale for CSR flows.In this way, we can amplify the deviation from the flow CSR (Besag 1977): The local length-squared L-function for each flow is defined as.
The benchmark we use in this study is flow CSR, which assumes unbounded space (e.g. the Poisson point process), while our study is performed in the context of bounded networks, which leads to the problem of edge effects.In the context of networks, edge effects do not represent a critical issue as the data are always located in a reduced area in the networks, making the edge correction not necessary (Okabe and Sugihara 2012).

Simulation experiment
In this section, we first test the correctness of the benchmark (i.e.expectation value under CSR) for length-squared L-function using Monte Carlo simulation.Then, methodologies of using the existing planar-scale-based network K-function and the proposed length-based length-squared L-function to identify clustering are introduced.At last, we generate flow datasets with designed clustering patterns to examine and compare the ability of network flow K-function and length-squared L-function.

Verifying the correctness of Null models
As deduced before, the expectation of K(h) is h 2 , and the expectation of L(h) is 0 at any scale under CSR, which are also called the theoretical curves for the corresponding functions.To prove that the complexity of the road network does not influence these null models, we designed two synthetic networks with different structure complexities.Monte Carlo simulation repeated 99 times was used to test the null models.In each simulation, the flow dataset contains 400 CSR flows.For these two synthetic networks, the data example and the results are shown in Figure 5, where the blue line with stars in Figure 5b and e, the purple line with stars in Figure 5c and f correspond to the theoretical K(h) and L(h) curves; the black lines correspond to the simulated means of K(h) and L(h) with 99 runs.The grey belts are the 95% confidence bands based on the Monte Carlo simulations.Both the theoretical K(h) and L(h) curves fall between the 95% confidence intervals with 99 runs, and the simulated means match the theoretical curves.Therefore, the benchmark of K(h) and L(h) under CSR for any network can be the derived expectations (i.e.K(h) = h 2 and L(h) = 0), which proves that the derivation of L-function is correct and provides a reference for the significance test of L-function.

Detecting clustering scale and identifying clustering
To test the ability of length-squared L-function to detect clustering patterns in the network, we designed three flow datasets.As the OD maps (Wood, Dykes, and Slingsby 2010) show in Figure 6, Case 1 contains 60 aggregated flows from a straight road to another straight road, the network scale is h = 1.0, the planar scale is r = 0.5, and the intensity is I = 60 1 × 1 = 60; 400 randomly distributed flows.Case 2 contains 140 aggregated flows from an intersection with seven exits to another, the network scale is h = 3.5, the planar scale is r = 0.5, and the intensity is step is to identify the location of the cluster, which is achieved by calculating the local function values of these two functions based on the detected clustering scales.The following paragraphs will introduce these three steps in detail.
In the first step, we calculate the network K-function and the length-squared L-function at a series of scales.For these two functions, if the function value of the observed data is greater than the benchmark on a certain scale, then there exhibits a significant clustering pattern at this scale.We use 99 times Monte Carlo simulation to get the benchmark (i.e.95% confidence interval) for these two functions.For each simulation, randomly distributed flows are generated, and the number of the generated simulated flows is consistent with the number of observed data.
In the second step, we use the incremental network K-function to detect the planar scales.We calculate the first derivative of the L-function, namely the L'-function, to discern the network scales of the clustering pattern.The maxima of the incremental network flow K-function can be regarded as a planar clustering scale if the function value at this scale is higher than its benchmark (which is also stemmed from the Monte Carlo simulation method as in the K-function).The transformation from L to L'-function is also about quantifying the L-function's increment rate.Specifically, the scale h that minimizes L'-function indicates a clustering scale and h 2 is the network scale of the clustering pattern (Shu et al. 2021).
In the third step, we calculate the local function values for each flow at the clustering scales r and h, respectively.Local function value reveals the clustering pattern's location; if a flow shows a high local function value at a certain clustering scale, then there is a flow cluster with the size of the detected scale.
For Case 1, the network K-function detected clustering scale at r = 0.4 and 0.6 (Figure 7a).The other maxima of incremental network flow K-function are unable to reflect clustering scales since the function values corresponding to them are smaller than the benchmark.The length-squared Lfunction detected the clustering scale at h = 0.2 2 = 0.1 and h = 2.0 2 = 1.0 (Figure 7b).The other larger scales by the minima of the L'-function are omitted since they may represent clustering patterns at larger scales but consist of several independent ones, which could be verified by calculating the local function value at these scales (Shu et al. 2021).Figure 7c and d show the local function value at detected scales; the smaller ones, namely r = 0.4 and h = 0.1 are omitted since they can only reveal subsets of the clustering.In this case, both methods can identify the clustering between straight roads.However, the length-squared L-function has the advantage that it could amplify the difference between the observed data with CSR data as the L-function is larger than the benchmark (theoretical curve and the simulated envelope).In contrast, the difference between the network Kfunction for the observed and simulated data is slight.For Case 2, the detected clustering scale is between intersections (Figure 9c) and ignored the one between straight roads with higher intensity.However, our method identified the two clustering patterns with two different network scales (Figure 9d and e).
Cases 1 and 2 demonstrate the ability of both the existing and proposed methods to identify clustering patterns in road networks.Case 3 demonstrates the superiority of our method in detecting high-intensity clusters.The network K-function failed to detect the high-intensity cluster because the r 2 -neighborhood is based on the radius of the flow set.Specifically, the r 2 -neighborhood starting and ending at crossings (relatively complex road networks) would have the chance to include more flows than that between straight roads (relatively simple road networks).The flows between crossings would have the priority to be identified as clustering patterns by network K-function.

Experiment with real data
The behavior mode depicts the main human daily activities in cities, such as modes between residential, working, medical, entertainment and transit areas.Understanding the behavior mode in a region helps perceive the travel demand and support better urban planning.By analysing the distribution range of O points and D points of the taxi OD flow clustering in areas, we can understand the behavior mode behind it.To study the clustering pattern of residents' travel demand, we use the length-squared L-function to identify the clustering pattern compared with the network flow Kfunction.This study helps analyse the behavior mode in regions combined with the facilities around the detected clustering patterns.
In this case study, we conducted experiments using taxi OD flow.We chose Wangjing Area and Financial Street as study areas (see Figure 10) because these zones are in the main urban area of  Beijing and contain facilities with distinctive functions; they can better reflect the practicability of our method in analysing the dynamics of urban residents.All experimental OD data are matched to the road network according to the travel trajectory of the taxis.Due to the large size of the taxi dataset, we randomly sampled 2000 flows of each study area from October 20th to 24th, 2014, which were workdays.
The results of the real data using network K-function and length-squared L-function are demonstrated in Figure 11 We further compare the difference between the result of the network flow K-function and length-squared L-function.For each detected clustering scale, we extracted flows that possess local function values in the top 25% of the entire value range, revealing the clustering patterns' locations (see Figure 14).For Wangjing Area, the clustering patterns detected by network K-function are concentrated in dense road networks around Wangjing Central Park (see Figure 14a).This is because the r 2 -neighborhood within this area covers a more complex road network (longer length of the roads) than other sparse road networks and has the opportunity to contain more flows.Thus, the corresponding local network K-function values for flows here are higher.However, this does not mean that there are more points per road network length, namely clustering patterns with higher intensity.The clustering patterns that our method detects mainly cross the Fourth Ring Road from the periphery to the city's center (see Figure 14b).For example, the clustering patterns from a residential area to Taiyanggong metro station and Shaoyaoju metro station, and the clustering pattern from a residential area to Liangmaqiao metro station.These are travel demands of residential-transit mode, which reflect a common mode that residents living in the suburbs and far away from the subway station usually take a taxi to the nearest subway station and then commute.In addition, our method also detected other clusters, such as the one from the Sanyuanqiao metro station to the 798 Art Block, a famous entertainment place in this research area, which shows transit-entertainment mode.Furthermore, it could be concluded that our method is more sensitive to clustering patterns as it is able to detect patterns at relatively small scales.For example, our method detects a clustering pattern at the network scale of 275 m, but the network flow K-function did not identify that there is a clustering pattern on the equivalent planar scale.
For Financial Street, the clustering patterns detected by the network K-function are also distributed at road networks with relatively dense and complex structures (see Figure 14c).Similar to the case study of Wangjing Area, the results of the network K-function in Financial Street cannot reflect clustering patterns with high intensity either.In addition, we also found that the clustering scale detected in this area is very large, meaning that each detected clustering pattern at this scale occupies a large part of the study area, which is not conducive to analysing local anomalies.However, our method detected clustering patterns consisting of four main parts (see Figure 14d), from west to east: the one from the residential place to Muxidi metro station, the one from the residential The result of the length-squared L-function adds a new perspective and brings new discoveries into urban anomaly detection.In addition, our proposed method is based on the network scale; thus, the detected clustering pattern is the one with higher intensity, namely with more flows per unit of network flow space, revealing more concentrated interactions (Fang et al. 2023).Furthermore, our method is more sensitive to clustering patterns since it can amplify the difference between observed data and CSR data.The network flow K-function does not regard that there is a clustering pattern at relatively small planar scales, but at the equivalent network scale, our method could detect clustering patterns.

Conclusions
Based on the length-based network scale, this study proposes the length-squared L-function to identify clustering patterns of network-constrained flows.Experiments with synthetic and real data show that our method can accurately identify the location and extent of clustering patterns in road networks.Compared with the existing network flow K-function, which is based on the radius-based planar scale, our method can overcome the influence of road topology and detect the clustering patterns with high intensities.In the case study with taxi OD data in selected areas, our method can detect the clustering patterns of high intensities, which the existing network K-function ignored.The detected clustering patterns reveal the primary behavior mode (e.g. the residential-transit mode), which helps perceive the strong travel demand in a selected area and supports appropriate urban planning.
The contribution of our work is reflected in four aspects: First of all, with the novel h 2 -neighborhood based on the network scale, the proposed method would not be affected by the topological complexity of road networks and can thus identify the clustering pattern of high intensity.Moreover, the h 2 -neighborhood proposed in this study can also be used to extend other spatial statistics methods for more precise spatial quantification in the context of road networks.Second, our method can identify the clustering pattern without setting scale parameters (i.e.clustering scales).In other related methods (e.g.DBSCAN), the scales of the clusters should be set in advance.In our method, however, the clustering scales can be detected by the derivative of the L-function.Third, the L-function is able to amplify the difference between observed data and randomly distributed data and is thus more sensitive to the clustering scale.Forth, our work has corrected and improved the concepts in network flow space and laid a solid mathematical foundation for future statistics in this direction.
Our work still has shortcomings, which could be crucial increments of future work.First, though our method can be applied to networks of any size, it is usually necessary to select some study regions manually to get more precise results.In the L-function method, when the research area is large, there might be too many clustering patterns with different scales, and the detected scales are the comprehensive contribution of these multiple clustering patterns (Yamada and Thill 2007;Kan, Kwan, and Tang 2021).It will be challenging to determine a specific clustering pattern based on the detected scale in a large area.Thus, when selecting a research area, it is better to use prior knowledge (e.g. the distribution of flow data) to select areas with only a few potential clustering patterns.Furthermore, we use flow CSR as the benchmark of our method because it can be a base for deriving the L function.Such CSR assumption is too strong for most true-to-life ways.As a result, more research should be done to extend our method to real applications, such as using the Monte Carlo method or developing some other simulation methods (Liu, Tong, and Liu 2015;Kan, Kwan, and Tang 2021).Moreover, the time attribute of the flow is not considered in our method for flows.This is because this work focuses on the spatial pattern of flows.However, some approaches have been made for spatiotemporal flows (Yan et al. 2023a;2023b;Hohl et al. 2022), which may provide an indication of extending our work to spatiotemporal flows.

Geolocation information
In this paper, we used two urban areas of Beijing as study areas, which mainly include the Wangjing Area in Chaoyang District (the latitude range is about from 116.42-116.52 degrees, the longitude range is from 39.95-40.02degrees) and the Financial Street Area in Xicheng District (the latitude range is about from 116.33-116.37 degrees, the longitude range is from 39.90-39.92degrees).

Figure 1 .
Figure1.The clustering patterns in free flow space (located in yellow planar area pairs) and in network flow space (located in yellow network segment pairs).(a) In free flow space, cluster ① has higher intensity (flow number per flow space) than ②.(b) In network flow space, the flow cluster ③, which is from and to complex roads, has a lower intensity than cluster ④, which is between the simple roads.
, the central flow is from a straight road to a place close to crossroads.The neighborhood has r O (radius of its projection in O network) and r D (radius of its projection in D network) as h .Although the shapes and structures of the projections in the O and D network vary with the position of central flows, their total length are consistent as h.This fundamental and vital property provides a guarantee for the subsequent derivation.

Figure 2 .Figure 3 .
Figure 2. Network distance between flows.The path colored in orange is the shortest path distance between O points of flow f i and f j , i.e. d O ij ; the path colored in yellow is the shortest path distance between D points of flow f i and f j .i.e. d D ij .The distance between flows has usually been designed to be the linear combination or extreme value of d O ij and d D ij .
r=d D ic ) ≤ h}.Based on the metric defined in Definition 1, the volume of a

Figure 4 .
Figure 4. Calculating flow volume of h 2 -neighborhood under maximum metric and additive metric.The inner integration integrates the sub-network from the O network to the D network, while the outer integration integrates the results in inner integration.

Figure 5 .
Figure 5. Monte Carlo simulation with CSR flows of the length-squared K-function and the length-squared L-function in the context of the (a) simple and (d) complex road networks: (b) and (e) are the theoretical curves and the simulated curves of the lengthsquared K-function, (c) and (f) are the theoretical curves and the simulated curves of the length-squared L-function.
Figure 6.Synthetic flow data.(a) Case 1 consists of a flow cluster between straight roads and 400 CSR flows.(b) Case 2 consists of a flow cluster between seven-fork intersections and 400 CSR flows.(c) Case 3 consists of the flow clusters in Cases 1 and 2, and 400 CSR flows.

Figure 7 .
Figure 7. Results of the network K-function and the length-squared L-function for Case 1.The first row consists of (a) the network K-function and the incremental network K-function (the scale that maximizes the incremental network K-function is represented by grey dotted lines), (b) the length-squared L-function and the L'-function (the minimum of L'-function is represented by the grey dotted line).The second row consists of (c) local network K-function and (d) local length-squared L-function values for each flow at the detected clustering scales.

rFigure 8 .
Figure8.Results of the network K-function and the length-squared L-function for Case 2 (see Figure7for the explanation for the legend).

Figure 9 .
Figure 9. Results of the network K-function and the length-squared L-function for Case 3 (see Figure 7 for the explanation for the legend).

Figure 10 .
Figure 10.Sub-areas and selected flow data in Beijing.(a) Study areas; (b)selected flows in Wangjing Area and (d) the corresponding OD map; (c)selected flows in Financial Street and (e) the corresponding OD map.
. The network flow K-function detected obvious clustering patterns from r = 1500m and the length-squared L-function detected clustering patterns at all scales for these two areas.According to the result of incremental network K-function, for the Wangjing Area, network flow K-function obtained the planar scale of clustering pattern at r = 1950, 2650, 3150, 3500, 3600, 3850m.According to the result of the first derivative of lengthsquared L-function, detected network scales are h = 275, 825, 2100, 3175, 4000, 4375m.For the Financial Street, the network flow K-function method obtained the planar scale of the clustering patterns at r = 1900, 2000, 2200, 2600, 2700, 2900m and length-squared L-function method detected the network scale at h = 1025, 1575, 2075, 2625, 3025, 3325m.The corresponding local function values at detected clustering scales are shown in Figures 12 and 13.

Figure 11 .
Figure11.Results of the case study.Each row corresponds to the sub-areas in Figure10(see Figure7for the explanation for the legend).

Figure 12 .
Figure 12.Local network flow K-function (NK) and local length-squared L-function (LL) for each flow at the detected clustering scales for Wangjing Area.

Figure 13 .
Figure 13.Local network flow K-function (NK) and local length-squared L-function (LL) for each flow at the detected clustering scales for Financial Street.

Figure 14 .
Figure 14.Flows with top local function values at detected clustering scales.Flows with (a) top local NK and (b) top local LL values at detected scales for Wangjing Area, (c) top local NK, and (d) top local LL values at detected scales for Financial Street.
2 -neighborhood's projection in the O and D network based on d O ij and d D ij , i.e. h O