Non-recurrent traffic congestion detection on heterogeneous urban road networks

This paper proposes two novel methods for non-recurrent congestion (NRC) event detection on heterogeneous urban road networks based on link journey time (LJT) estimates. Heterogeneity exists on urban road networks in two main aspects: variation in link lengths and variation in data quality. The proposed NRC detection methods are referred to as percentile-based NRC detection and space–time scan statistics (STSS) based NRC detection. Both methods capture the heterogeneity of an urban road network by modelling the LJTs with a lognormal distribution. Empirical analyses are conducted on London's urban road network, consisting of 424 links, for the 20 weekdays of October 2010. Various parameter settings are tested for both methods, and the results favour the STSS-based NRC detection method over the percentile-based NRC detection method. Link-based analyses demonstrate the effectiveness of the proposed methods in capturing the heterogeneity of the analysed road network.


Introduction
The occurrence of a non-recurrent congestion (NRC) event frustrates commuters and traffic operators because it causes an unexpected delay, which might then result in missing a meeting or an appointment (FHWA 2012b). An NRC event is mainly caused by unexpected events such as traffic accidents or vehicle breakdowns. It can also occur due to planned engineering works, special events (e.g. football matches or concerts) or inclement weather (FHWA 2012a; Kwon, Mauch, and Varaiya 2006). The duration, timing and location of these events vary, all of which makes it difficult to provide a formal definition of an NRC event. The occurrence of an NRC event leads to much higher travel times than expected on one or more links. Previous research highlights that NRC events are the major source of travel time variability (Noland and Polak 2002). Consequently, how much of the total congestion is due to NRC has been studied thoroughly (Chow et al. 2014; Dowling et al. 2004; Skabardonis, Varaiya, and Petty 2003); however, research on detecting NRC events has only recently gained importance (Anbaroglu, Heydecker, and Cheng 2014).
Detection of NRC events has gained importance, especially in an urban road network context, as traffic operation centres need to understand the impact of such events in order to take proactive measures (Hallenbeck, Ishimaru, and Nee 2003). Such an understanding allows researchers and practitioners to characterise NRCs. In this way, traffic operation centres would be provided with valuable information on (i) developing NRC mitigation strategies based on their cause, (ii) estimating how much NRC is caused by a particular type of incident and developing incident response strategies, and (iii) managing planned events effectively. For example, an operational use of NRC detection would be a traffic operation centre charging the company responsible for an engineering work based on the NRC that the work caused. In this way, the traffic operation centre would encourage engineering companies to carry out their work during off-peak times.
Most existing studies consider NRC as a consequence of incidents on motorways (freeways). The structure of links on most motorways, however, is homogeneous. Specifically, most of the links are of similar length (Dowling et al. 2004), and the data quality of these links does not vary substantially across a motorway network (Han and May 1989). In most theoretical traffic research, links are even assumed to have equal lengths. A network containing links of different lengths is considered to be a 'complex network' (Long, Ren, and Lian 2008; Mazloumian, Geroliminis, and Helbing 2010). Consequently, the spatial correlation between traffic data collected by adjacent sensors is higher on motorways than on urban road networks (Hawas 2007; Yang, Lin, and Gong 2009).
Detecting NRCs on an urban road network, on the other hand, is challenging due to heterogeneity. An urban road network is heterogeneous in two main aspects. First, link lengths vary substantially; therefore, it is difficult to accommodate day-to-day variations on short links while capturing variations on longer links that may be due to an NRC. Second, low data quality may lead to signalling an NRC event when there is none in reality. Previous researchers have highlighted that data quality in an urban road network is a serious issue that should be considered in traffic modelling (Hasan, Ben-Akiva, and Emmonds 2011; Thomas and van Berkum 2009). Existing NRC detection methods fail to capture this heterogeneity, as they do not consider the statistical characteristics of the traffic data (Anbaroglu, Heydecker, and Cheng 2014).
The aim of this paper is to support the accurate detection of NRCs on large heterogeneous urban road networks. To achieve this aim, link journey time (LJT) data are exploited, as they are a common data source that traffic operation centres use to monitor road network performance (OECD 2007, 43; TfL 2010, 86). Accordingly, this paper proposes two novel methods to detect NRCs on a large heterogeneous urban road network based on LJT data. The first method, referred to as 'percentile-based NRC detection', relies on the percentile values of the estimated LJTs to detect NRCs. The second method, referred to as 'space-time scan statistics (STSS) based NRC detection', relies on a statistical model to detect statistically significant clusters of high LJTs. Although STSS-based approaches have commonly been used in epidemiology to detect disease outbreaks (Kulldorff et al. 2005; Neill et al. 2005) and in crime science to detect crime hotspots (Maciejewski et al. 2010; Nakaya and Yano 2010), amongst other fields (SaTScan 2010), their investigation in transportation science is a recent research endeavour. To the best of our knowledge, only Huang et al. (2009) have investigated the use of spatial scan statistics to detect clusters of active transportation (i.e. walking, cycling); however, they did not consider the temporal aspect of the phenomenon.
Therefore, the contributions of this paper are summarised as follows:
• How STSS could be used in transportation science to detect clusters of substantially high LJTs corresponding to NRCs in an urban road network context is introduced (Section 2.2).
• The percentile-based NRC detection method is proposed (Section 2.1).
• The distribution of LJTs on a large urban road network consisting of 424 links is investigated (Section 3.1).
• The effectiveness of the proposed methods is assessed on London's urban road network (Section 3.2) by relying on the evaluation criteria discussed in Section 2.3.

NRC detection on heterogeneous urban road networks
Two novel NRC detection methods, both of which rely on statistical principles to capture the heterogeneity of an urban road network, are proposed in this section. Both methods aim to identify accurately the NRCs that occurred on a given date of analysis (i.e. to recognise all NRCs without misinterpreting recurrent traffic patterns as NRCs). The way in which the identified NRCs are represented allows a researcher to observe the evolution of an NRC in space and time. In this way, the dynamic nature of an NRC can be captured, and each detected NRC can be quantified based on its spatial and temporal impacts.
The proposed methods rely on LJT data, where an LJT is the estimated journey time through a link during a given time interval. The wide usage of automatic number plate recognition (ANPR) cameras for traffic enforcement purposes allows traffic operation centres to estimate LJTs from vehicle journey times obtained by matching the readings of ANPR cameras. The higher the number of vehicle journey times used to estimate an LJT (i.e. the sample size), the higher the data quality of the estimated LJT. Because LJT is a prominent traffic data type in network-wide analysis and is used in traffic operation centres for road network performance monitoring, the proposed NRC detection methods could readily be adapted to a real-life context. A road network can thus be characterised by a directed graph consisting of a set of nodes corresponding to ANPR cameras and a set of arcs (i.e. A) corresponding to links. An estimated LJT on link a at time interval t is denoted by $y_a(t)$.
The proposed NRC detection methods rely on four inputs: an adjacency matrix (M), historic LJT data, a congestion factor (c) and a date of analysis. The adjacency matrix is a binary matrix that defines the connectivity of the links. A hierarchical structure can be built upon the adjacency relationships: first-order adjacent links are directly connected with each other, second-order adjacent links are connected through one intermediate link, and so on. Historic LJT data are used for statistical analysis, such as estimating the expected LJTs. The congestion factor is a real number that is multiplied by the expected LJT to determine the threshold for identifying whether an LJT is excessive. The date of analysis is the day on which NRCs will be detected.
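The adjacency hierarchy can be derived from M with a breadth-first search. The following is a minimal Python sketch, assuming M is stored as a list of 0/1 rows indexed by link and that the order of adjacency between two links is the smallest number of link-to-link steps separating them; the function name `adjacency_orders` and the four-link toy matrix are hypothetical.

```python
from collections import deque


def adjacency_orders(M, source):
    """Return {link: order} for every link reachable from `source`.

    Order 1 = first-order adjacent (M[source][b] == 1), order 2 = reachable
    through one intermediate link, and so on. `source` itself gets order 0.
    """
    n = len(M)
    order = {source: 0}
    queue = deque([source])
    while queue:
        a = queue.popleft()
        for b in range(n):
            if M[a][b] == 1 and b not in order:
                order[b] = order[a] + 1
                queue.append(b)
    return order


# Toy example: four linearly aligned links (indices 0..3), symmetric adjacency.
M = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(adjacency_orders(M, 0))  # {0: 0, 1: 1, 2: 2, 3: 3}
```

For a symmetric M the orders are symmetric; for a directed M, the search follows only the direction in which M is populated.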
Given these four inputs, two measures can be determined. The expected LJT is the mean travel time on a given link a at time interval t under normal traffic conditions. It is denoted by $\bar{y}_a(t)$ and measured in minutes. Expected LJTs capture the recurrent nature of traffic. The excess LJT is the difference between the observed and expected LJT when the observed LJT is higher than a threshold, which is determined by multiplying the congestion factor c by the expected LJT. An LJT is said to be excessive if it is higher than this threshold, i.e. if $y_a(t) > c \times \bar{y}_a(t)$.
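The two measures can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the expected LJT is taken as the arithmetic mean of the historic LJTs for the same link and interval, the excess LJT is set to zero when the threshold is not exceeded, and all function names are hypothetical.

```python
from statistics import mean


def expected_ljt(historic):
    """Expected LJT (minutes): mean of the historic LJTs for one link and interval."""
    return mean(historic)


def is_excessive(observed, expected, c):
    """An LJT is excessive if it exceeds the threshold c * expected."""
    return observed > c * expected


def excess_ljt(observed, expected, c):
    """Observed minus expected LJT when excessive; 0.0 otherwise (assumed convention)."""
    return observed - expected if is_excessive(observed, expected, c) else 0.0


ybar = expected_ljt([2.0, 2.2, 1.8, 2.0])  # hypothetical historic LJTs, mean 2.0 min
print(excess_ljt(3.0, ybar, c=1.2))  # 1.0, since 3.0 > 2.4
print(excess_ljt(2.3, ybar, c=1.2))  # 0.0, since 2.3 is below the 2.4 threshold
```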
Both proposed NRC detection methods use a statistical distribution to model LJTs. The evidence from the literature (Arezoumandi 2011; Dandy and McBean 1984; Hollander and Liu 2008) and the empirical analysis conducted in Section 3.1 indicate that LJTs are best modelled by a lognormal distribution.

Percentile-based NRC detection
This method aims to capture the heterogeneous nature of an urban road network by relying on the percentile values of the estimated LJTs. There are different ways of estimating the percentiles of a given data set; here, we use the percent point function (also referred to as the 'inverse cumulative distribution function'), as it accounts for the statistical distribution of LJTs. The percent point function of the two-parameter lognormal distribution is given in Equation (1) (Pu 2011):

$$G(p) = \exp\left(\mu + \sigma\,\Phi^{-1}(p)\right), \quad 0 < p < 1 \qquad (1)$$

where p is the cumulative probability, $\Phi^{-1}(p)$ is the percent point function of the standard normal distribution, and μ and σ are the mean and standard deviation of the underlying normal distribution, respectively. The value of $\Phi^{-1}(p)$ can readily be obtained for any given p. For example, when p = 0.5, $\Phi^{-1}(p) = 0$; hence, $G(0.5) = \exp(\mu)$, which is the median of the lognormal distribution.
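Equation (1) can be evaluated with Python's standard library, whose `statistics.NormalDist().inv_cdf` gives $\Phi^{-1}(p)$. In this sketch, μ and σ are estimated from the logarithms of the historic LJTs of one link and interval; using the population standard deviation, the helper names `lognormal_ppf` and `percentile_threshold`, and the LJT values are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, fmean, pstdev


def lognormal_ppf(p, mu, sigma):
    """Percent point function G(p) = exp(mu + sigma * Phi^{-1}(p)), 0 < p < 1."""
    z = NormalDist().inv_cdf(p)  # Phi^{-1}(p) of the standard normal
    return math.exp(mu + sigma * z)


def percentile_threshold(historic, pi):
    """pi-th percentile LJT of one link/interval from its historic LJTs.

    mu and sigma are the mean and (population) standard deviation of the log
    LJTs; using pstdev rather than stdev is an assumption of this sketch.
    """
    logs = [math.log(y) for y in historic]
    return lognormal_ppf(pi / 100.0, fmean(logs), pstdev(logs))


historic = [4.1, 4.5, 3.9, 4.8, 4.2, 5.6, 4.0]  # hypothetical LJTs in minutes
y95 = percentile_threshold(historic, 95)
observed = 6.3
print(observed > y95)  # True means this LJT is flagged as a candidate NRC cell
```

Repeating this for every link and interval gives the $|A| \cdot T$ thresholds described next.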
The aforementioned process to calculate the πth percentile value (π = 100p) of an LJT is conducted for all $a \in A$ and $t \in \{1, 2, \ldots, T\}$, where A denotes the set of links and T the total number of LJTs within the analysis interval. Specifically, G(p) is calculated $|A| \cdot T$ times for a given value of π. An estimated LJT is then considered to belong to an NRC if it is greater than its πth percentile value. Formally, $y_a(t)$ belongs to an NRC if $y_a(t) > y^{\pi}_a(t)$, where $y^{\pi}_a(t)$ denotes the πth percentile value of link a at time interval t. Those LJTs that are higher than their πth percentile values and spatio-temporally overlap with each other are clustered to detect NRCs. Two LJTs spatio-temporally overlap if they either occur on the same link at adjacent time intervals (i.e. $y_a(t)$ and $y_a(t+1)$) or occur on adjacent links at the same time interval (i.e. $y_a(t)$ and $y_b(t)$, where $M(a, b) = 1$). This clustering procedure is repeated until every LJT that is higher than its πth percentile value is included within an NRC.
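Clustering spatio-temporally overlapping LJTs amounts to finding connected components in a graph whose nodes are the flagged (link, interval) cells. A minimal Python sketch using breadth-first search follows; the cell representation, the name `cluster_nrcs` and the toy data are assumptions, and adjacency is treated as symmetric (M(a, b) = 1 or M(b, a) = 1).

```python
from collections import deque


def cluster_nrcs(flagged, M):
    """Group flagged (link, t) cells into NRCs by spatio-temporal overlap."""
    remaining = set(flagged)
    n = len(M)
    nrcs = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        queue = deque([seed])
        while queue:
            a, t = queue.popleft()
            # Same link at adjacent intervals, or adjacent links at the same interval.
            neighbours = [(a, t - 1), (a, t + 1)]
            neighbours += [(b, t) for b in range(n) if M[a][b] == 1 or M[b][a] == 1]
            for cell in neighbours:
                if cell in remaining:
                    remaining.remove(cell)
                    cluster.add(cell)
                    queue.append(cell)
        nrcs.append(cluster)
    return nrcs


M = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
flagged = {(3, 1), (3, 2), (2, 2), (0, 7)}  # hypothetical flagged cells
nrcs = cluster_nrcs(flagged, M)
print(sorted(len(k) for k in nrcs))  # [1, 3]
```

For percentile-based detection, `flagged` would hold the cells with $y_a(t) > y^{\pi}_a(t)$; the same routine can be reused for the LJTs enclosed by significant STRs.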

STSS-based NRC detection
This section develops a modified version of the expectation-based STSS described in Neill (2008) for the purpose of detecting NRCs. The method consists of four main steps, which are summarised in Figure 1. First, space-time regions (STRs) are generated, which requires two inputs: the maximum spatial window size (ρ) and the maximum temporal window size (τ). Second, the likelihood ratio function (F) is determined by considering the distribution of LJTs. Third, significant STRs are determined by comparing the likelihood ratio scores of the observed data with those obtained from the replications. Finally, significant STRs are clustered to detect NRCs.

Generating STRs
An STR is the aggregation of spatial regions in time, where links correspond to the spatial regions. In order to detect any NRC regardless of the number of links that it contains or its duration, it is necessary to scan the entire study area with overlapping STRs of varying size and location. Two values must be determined to generate all STRs: the maximum spatial window size (ρ) and the maximum temporal window size (τ). Because scanning a large spatial area containing hundreds of regions is computationally infeasible (Neill and Moore 2004), we propose to reduce the number of STRs with the following two adjustments. First, only those STRs whose individual LJTs are all excessive are evaluated. This prevents the detection of a significant STR that includes several links of which only one has a substantial increase in its LJTs, which is a known limitation of scan statistics-based approaches (Rogerson and Yamada 2009, 115). Second, spatial regions are created by considering only the link itself and its first-order adjacent links. This adjustment, however, does not prevent the proposed method from detecting arbitrarily shaped NRCs consisting of many links, as described in Section 2.2.4. These two adjustments lead to more reasonable NRCs and also reduce the runtime of the STSS, as there are fewer STRs to evaluate in total.
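The two adjustments can be illustrated with a small STR generator. This Python sketch assumes that a spatial region consists of a link plus up to ρ - 1 of its first-order adjacent links, that temporal windows are contiguous runs of 1 to τ intervals within 1..T, and that `excessive` is the set of (link, interval) cells with $y_a(t) > c \times \bar{y}_a(t)$; the exact region construction used in the paper may differ, and the function names and data are hypothetical.

```python
from itertools import combinations


def spatial_regions(M, rho):
    """Regions made of one link plus up to rho - 1 of its first-order neighbours."""
    n = len(M)
    regions = set()
    for a in range(n):
        neighbours = [b for b in range(n) if b != a and (M[a][b] == 1 or M[b][a] == 1)]
        for k in range(rho):  # k extra neighbours, 0 <= k <= rho - 1
            for extra in combinations(neighbours, k):
                regions.add(frozenset((a, *extra)))
    return regions


def generate_strs(M, excessive, rho, tau, T):
    """STRs (region, t_start, t_end) whose every LJT is excessive."""
    strs = []
    for region in spatial_regions(M, rho):
        for t_start in range(1, T + 1):
            for t_end in range(t_start, min(t_start + tau - 1, T) + 1):
                cells = [(a, t) for a in region for t in range(t_start, t_end + 1)]
                if not all(cell in excessive for cell in cells):
                    # Every longer window from t_start also has a non-excessive cell.
                    break
                strs.append((region, t_start, t_end))
    return strs


M = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
excessive = {(2, 1), (3, 1), (3, 2)}  # hypothetical excessive cells
strs = generate_strs(M, excessive, rho=2, tau=2, T=3)
print(len(strs))  # 5
```

The early `break` is safe because any longer window starting at the same interval contains the cell that caused the failure.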

Determining the likelihood ratio function
This section describes how an expectation-based STSS can be derived by assuming that LJTs follow a lognormal distribution.
The likelihood ratio function is derived from $\Pr(\text{Data}\mid H_1(s))/\Pr(\text{Data}\mid H_0)$, where $H_1(s)$ is the alternative hypothesis that STR s is a significant cluster, and $H_0$ is the null hypothesis that there is no significant STR across the analysis period. $\Pr(\text{Data}\mid H_1(s))$ is the probability of the observed data if STR s is a significant cluster, whereas $\Pr(\text{Data}\mid H_0)$ is the probability of the observed data if there are no significant STRs, and hence no NRC, on the analysed date. The null and alternative hypotheses are formally defined as follows:
• $H_0$: $y_a(t) \sim \text{lognormal}(\mu_{a,t}, \sigma_{a,t})$ for all links $a \in A$ and time intervals $t \in \{1, 2, \ldots, T\}$. Here, $\mu_{a,t}$ is the location parameter, calculated as the mean of the natural logarithm of the historic LJTs, and $\sigma_{a,t}$ is the scale parameter, calculated as the standard deviation of the natural logarithm of the historic LJTs.
• $H_1(s)$: $y_a(t) \sim \text{lognormal}(\mu_{a,t} + q, \sigma_{a,t})$ for all $(a, t) \in s$, for some constant $q \in \mathbb{R}$ with $q > 0$, and all LJTs outside s are distributed as under $H_0$; s denotes the STR that rejects the null hypothesis.
The reason for proposing a model that is additive on the log scale is to link the congestion factor (c) and q. Specifically, the threshold that determines whether an LJT is excessive is $c \times \bar{y}_a(t)$. Its natural logarithm is $\ln(c \times \bar{y}_a(t)) = \ln(c) + \ln \bar{y}_a(t)$, which equals $\ln(c) + \mu_{a,t}$ when $\bar{y}_a(t)$ is taken as $\exp(\mu_{a,t})$, the geometric mean of the historic LJTs. Therefore, q corresponds to ln(c).
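For an STR s and a given q, the likelihood ratio is calculated as in Equation (2); all terms that are identical under both hypotheses (the LJTs outside s and the non-exponential factors of the lognormal densities) cancel:

$$\frac{\Pr(\text{Data}\mid H_1(s))}{\Pr(\text{Data}\mid H_0)} = \prod_{(a,t)\in s} \frac{\exp\left(-\frac{(\ln y_a(t) - \mu_{a,t} - q)^2}{2\sigma_{a,t}^2}\right)}{\exp\left(-\frac{(\ln y_a(t) - \mu_{a,t})^2}{2\sigma_{a,t}^2}\right)} = \exp\left(q\alpha - \frac{q^2\beta}{2}\right) \qquad (2)$$

where $\alpha = \sum_{(a,t)\in s} (\ln y_a(t) - \mu_{a,t})/\sigma_{a,t}^2$ and $\beta = \sum_{(a,t)\in s} 1/\sigma_{a,t}^2$. Maximising Equation (2) with respect to q gives the maximum likelihood estimate $\hat{q} = \alpha/\beta$. Substituting this into Equation (2) yields the likelihood ratio function:

$$F(s) = \exp\left(\frac{\alpha^2}{2\beta}\right) \qquad (3)$$

Because q is constrained to be positive, Equation (3) applies when α > 0; otherwise F(s) = 1. The likelihood ratio values are calculated for each s ∈ STR, where STR denotes the set of STRs.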
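Equation (3) translates directly into code. The Python sketch below, with hypothetical names and parameter values, computes α, β, the estimate $\hat{q} = \alpha/\beta$ and $\ln F(s) = \alpha^2/(2\beta)$ for one STR. It returns the logarithm of F rather than F itself to avoid overflow for large STRs; because the exponential is monotone, ranking STRs or comparing them with Monte Carlo maxima gives the same outcome on either scale.

```python
import math


def log_likelihood_ratio(cells, y, mu, sigma):
    """Return (ln F(s), q_hat) for the STR made of `cells`.

    y, mu, sigma: dicts keyed by (link, t) holding the observed LJT and the
    location/scale parameters of its logarithm.
    """
    alpha = sum((math.log(y[c]) - mu[c]) / sigma[c] ** 2 for c in cells)
    beta = sum(1.0 / sigma[c] ** 2 for c in cells)
    if alpha <= 0:
        # q must be positive; the supremum is approached as q -> 0, so F(s) = 1.
        return 0.0, 0.0
    q_hat = alpha / beta  # maximum likelihood estimate of q
    return alpha ** 2 / (2.0 * beta), q_hat


# Two-cell STR with hypothetical parameters.
cells = [(0, 1), (0, 2)]
mu = {(0, 1): 0.0, (0, 2): 0.0}
sigma = {(0, 1): 1.0, (0, 2): 0.5}
y = {(0, 1): math.exp(1.0), (0, 2): math.exp(0.5)}
log_f, q_hat = log_likelihood_ratio(cells, y, mu, sigma)
```

Here α = 1 + 0.5/0.25 = 3 and β = 1 + 4 = 5, so $\hat{q} = 0.6$ and ln F = 0.9, which also equals $\hat{q}\alpha - \hat{q}^2\beta/2$ from Equation (2).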

Determining significant STRs
In order to determine whether an STR is significant, the distribution of the likelihood ratio values under the null hypothesis must be determined. This is achieved by conducting Monte Carlo simulations, as no analytical expression for the null distribution of the test statistic in Equation (3) is available.
Monte Carlo simulations generate LJTs under the null hypothesis that there is no NRC across the analysis period. STRs are then used to scan the simulated data sets, and their likelihood ratio values are calculated based on Equation (3). It is important to note that the maximum spatial and temporal window sizes used to create the STRs must be the same for the simulations as for the observed data. Since the simulated data (i.e. replications) are generated under the null hypothesis, the highest likelihood ratio score of each simulation (denoted as F*) indicates the score expected to be observed when there is no NRC on the analysed day.
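A Monte Carlo loop of this kind can be sketched with Python's `random.Random.lognormvariate`, which draws from a lognormal distribution parameterised by the mean and standard deviation of the underlying normal, matching $H_0$. To stay short and self-contained, this sketch uses a deliberately simplified scan (single-link windows of up to τ intervals, without the excessive-LJT filter or multi-link regions), so it illustrates the mechanics of computing F* rather than reproducing the full method; all names and parameter values are hypothetical.

```python
import math
import random


def simulate_null(mu, sigma, rng):
    """One replication under H0: y_a(t) ~ lognormal(mu_{a,t}, sigma_{a,t})."""
    return {cell: rng.lognormvariate(mu[cell], sigma[cell]) for cell in mu}


def max_log_score(y, mu, sigma, links, T, tau):
    """Simplified scan: best ln F over single-link windows of 1..tau intervals."""
    best = 0.0
    for a in links:
        for t0 in range(1, T + 1):
            alpha = beta = 0.0
            # Grow the window [t0, t] one interval at a time, up to tau intervals.
            for t in range(t0, min(t0 + tau - 1, T) + 1):
                alpha += (math.log(y[(a, t)]) - mu[(a, t)]) / sigma[(a, t)] ** 2
                beta += 1.0 / sigma[(a, t)] ** 2
                if alpha > 0:
                    best = max(best, alpha ** 2 / (2.0 * beta))
    return best


def null_max_scores(mu, sigma, links, T, tau, n_rep=99, seed=0):
    """F* (on the log scale) of each of n_rep replications generated under H0.

    tau must equal the value used when scanning the observed data.
    """
    rng = random.Random(seed)
    return [max_log_score(simulate_null(mu, sigma, rng), mu, sigma, links, T, tau)
            for _ in range(n_rep)]


# Hypothetical parameters: two links, ten intervals, same mu and sigma everywhere.
links, T = [0, 1], 10
mu = {(a, t): 0.5 for a in links for t in range(1, T + 1)}
sigma = {cell: 0.2 for cell in mu}
f_star = null_max_scores(mu, sigma, links, T, tau=3, n_rep=99, seed=1)
```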
After conducting the Monte Carlo simulations and finding the highest-scoring region of each simulation under the null hypothesis, the p-value of each STR in the observed data is calculated. The likelihood ratio value of the investigated STR is compared with the highest score obtained from each simulation, and the number of simulations whose F* values are higher than that of the investigated STR is counted. Suppose that 7 out of 99 simulations have a higher likelihood ratio than the STR being investigated. Then, the p-value of that STR would be (7 + 1)/(99 + 1) = 0.08.
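The p-value rule can be written as a short helper. This Python sketch assumes that the observed score and the replication maxima are on the same scale (for example, the log-likelihood ratio) and counts strictly higher replication maxima, as in the text; the name `monte_carlo_p_value` is hypothetical.

```python
def monte_carlo_p_value(observed_score, null_max_scores):
    """p = (number of replications whose F* exceeds the observed score + 1) / (R + 1)."""
    exceed = sum(1 for f_star in null_max_scores if f_star > observed_score)
    return (exceed + 1) / (len(null_max_scores) + 1)


# Worked example from the text: 7 of 99 replications score higher.
replications = [10.0] * 7 + [1.0] * 92
p = monte_carlo_p_value(5.0, replications)
print(p)  # 0.08
```

With 99 replications the smallest attainable p-value is (0 + 1)/100 = 0.01, which bounds the significance levels that can be meaningfully used.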
The p-value of an STR is the probability of observing a likelihood ratio value at least as high as that of the STR if the data were generated under the null hypothesis. Thus, the lower the p-value, the stronger the evidence that the LJTs within that STR were not generated according to the null hypothesis. An STR is considered significant if its p-value is less than a pre-determined significance level.

Clustering significant STRs
There would be a number of significant STRs with the same p-value as the most significant STR (i.e. the one having the highest likelihood ratio value), since adding or removing an estimated LJT from an STR does not have a noticeable effect on the likelihood ratio value (Kulldorff and Nagarwalla 1995). This step clusters the significant STRs found in the previous step, so that the evolution of an NRC can be observed and the impact of each detected NRC can be quantified spatially and temporally. In this way, it becomes possible to detect any NRC regardless of the number of links that are congested and the duration for which they are congested.
Clustering significant STRs follows a similar procedure to the one described for percentile-based NRC detection. This is because an STR is a group of LJTs, and all the LJTs that belong to a statistically significant STR are considered to belong to an NRC. Spatio-temporally overlapping LJTs can then be clustered to detect NRCs. An example clarifying the concept of an STR and STSS-based NRC detection is presented in Figure 2, in which there are four linearly aligned links a1, a2, a3 and a4, and traffic flows from a1 towards a4. The excessive LJTs are indicated with a '+' sign. The statistically significant STRs ($s_1, s_2, \ldots, s_8$) and the resulting NRCs ($k_1$ and $k_2$) are illustrated in the figure. The NRCs $k_1$ and $k_2$ enclose 8 and 11 LJTs, respectively.
In this example, we assume the maximum temporal window size to be four (i.e. τ = 4) and the maximum spatial window size to be two (i.e. ρ = 2). Consequently, an STR may enclose anything from a single LJT up to two adjacent links across four consecutive time intervals. For example, the STR enclosing links a3 and a4 for time intervals six to eight has not been detected as a significant STR, as $y_{a4}(8)$ is not excessive. Furthermore, to improve legibility, not all significant STRs are illustrated. For instance, there may be additional significant STRs enclosing a single LJT, just like $s_4$. Nevertheless, once all the LJTs enclosed within the statistically significant STRs are determined, the corresponding NRCs would be the same. Last but not least, the occurrence of excessive LJTs such as $y_{a1}(1)$ and $y_{a1}(2)$ does not guarantee a statistically significant STR, and hence does not guarantee an NRC.

Evaluating the detected NRCs
The two proposed NRC detection methods do not make any assumption about the spatial or temporal extent of NRCs. Both methods consider substantially high LJTs as the main indicator of an NRC. Naturally, different values of π in percentile-based NRC detection, and of the maximum spatial and temporal window sizes in STSS-based NRC detection, lead to different NRCs. In order to determine the most suitable values of these parameters, a formal way to evaluate an NRC detection method is needed.
Even though it is possible to obtain ground-truth data providing the true spatio-temporal extent of NRCs, obtaining such data poses considerable challenges that cannot be met in most real-life scenarios. For example, a team of traffic analysts might label the NRCs; however, such an effort would be extremely labour-intensive for a large urban road network, even for a single day. Consequently, this paper relies on the two criteria proposed by Anbaroglu, Heydecker, and Cheng (2014), namely high-confidence episodes and the Localisation Index, as these criteria assess different aspects of an NRC detection method.

High-confidence episodes
This evaluation criterion assesses the extent to which an NRC detection method detects 'high-confidence' episodes. A 'high-confidence' episode is an NRC event on a link that lasts for at least a minimum duration, during which all LJTs are excessive. The main reason for relying on this evaluation criterion is that such NRCs would attract a traffic analyst's attention. Because a high-confidence episode satisfies this requirement, an NRC detection method should be able to detect the high-confidence episodes that occurred on an analysed day.
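High-confidence episodes can be extracted by scanning each link for runs of consecutive excessive LJTs. In this Python sketch, the LJT series are lists indexed from interval 1, and converting the minimum duration (25 min in the case study) into a number of 5-min intervals (`min_intervals`) is left to the caller, since it depends on how the episode boundaries are counted; the function name and data are hypothetical.

```python
def high_confidence_cells(y, ybar, c, min_intervals):
    """Cells (link, t) of all high-confidence episodes.

    y, ybar: dict link -> list of observed / expected LJTs for t = 1..T.
    An episode is a run of at least `min_intervals` consecutive LJTs on one
    link with y > c * ybar.
    """
    cells = set()
    for a in y:
        run = []
        for t, (obs, expected) in enumerate(zip(y[a], ybar[a]), start=1):
            if obs > c * expected:
                run.append((a, t))
                continue
            if len(run) >= min_intervals:
                cells.update(run)
            run = []
        if len(run) >= min_intervals:  # run reaching the end of the period
            cells.update(run)
    return cells


# Hypothetical link 0: excessive at t = 1, 2, 3, 5, 6 with c = 1.4.
y = {0: [3.0, 3.0, 3.0, 1.0, 3.0, 3.0]}
ybar = {0: [2.0] * 6}
print(sorted(high_confidence_cells(y, ybar, c=1.4, min_intervals=3)))
# [(0, 1), (0, 2), (0, 3)]
```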
Once all the NRCs and high-confidence episodes are identified on the analysed day, they are compared with each other based on the LJTs they enclose. From this comparison, two measures can be estimated. The false alarm rate (FAR) is the ratio of the number of LJTs that are enclosed within an NRC but not within a high-confidence episode to the total number of LJTs enclosed by the NRCs. The false negative rate (FNR) is the ratio of the number of LJTs that are enclosed within a high-confidence episode but not within any detected NRC to the total number of LJTs enclosed by the high-confidence episodes. Of these two measures, FNR is the critical one, as it determines the proportion of missed high-confidence episodes. Therefore, a very good NRC detection method should have a low FNR. On the other hand, an NRC may last less than the minimum duration defined for a high-confidence episode. Consequently, a very good NRC detection model should produce some false alarms, but the corresponding FAR would be unknown due to the lack of ground-truth data.
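Both rates are set differences over (link, interval) cells. The Python sketch below assumes the cells of all detected NRCs and of all high-confidence episodes have been pooled into two sets; returning 0.0 for an empty denominator is an assumed convention, and the name `far_fnr` and the data are hypothetical.

```python
def far_fnr(nrc_cells, hc_cells):
    """False alarm rate and false negative rate over (link, t) cells.

    nrc_cells: cells enclosed by any detected NRC.
    hc_cells: cells enclosed by any high-confidence episode.
    """
    far = len(nrc_cells - hc_cells) / len(nrc_cells) if nrc_cells else 0.0
    fnr = len(hc_cells - nrc_cells) / len(hc_cells) if hc_cells else 0.0
    return far, fnr


nrc_cells = {(0, 1), (0, 2), (0, 3), (1, 3)}
hc_cells = {(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)}
far, fnr = far_fnr(nrc_cells, hc_cells)
print(far, fnr)  # 0.25 0.4
```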

Localisation Index
The 'Localisation Index' is an evaluation criterion that quantifies the extent to which the detected NRCs include day-to-day variations in traffic. To calculate the Localisation Index, the number of connected components of the detected NRCs must be determined. A connected component comprises links that are adjacent to each other. To clarify, consider the example illustrated in Figure 2.
Relating the first NRC ($k_1$) to an incident is possible, as the effect of an incident on link a4 at the first time interval could have propagated towards its upstream links. On the other hand, identifying such a relation for the second NRC is more difficult. This observation is also reflected in the number of connected components of each NRC. Regarding $k_1$, there is only one connected component between time intervals one and four, even though the links included within the connected component change over time. Specifically, at the first time interval, the connected component consists of link a4 only, whereas at the third time interval it contains links a2, a3 and a4. These three links constitute a single connected component, because a2 is adjacent to a3 and a3 is adjacent to a4. The average number of connected components for $k_1$ is therefore one. On the other hand, the average number of connected components for $k_2$ is 7/4, as there are two, one, two and two connected components at the sixth, seventh, eighth and ninth time intervals, respectively. The number of connected components is two at time intervals six, eight and nine because link a1 is not adjacent to links a3 or a4.
Once the average number of connected components is determined for all the detected NRCs, the maximum of these averages is referred to as the 'Localisation Index'. The higher the Localisation Index, the harder it is to relate the detected NRCs to reported incidents, as each connected component could be related to a different incident. The minimum value of the Localisation Index is one, which is the best value, as it means that all the detected NRCs consist of a single connected component throughout their lifetime.
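The Localisation Index can be sketched as follows in Python: for each NRC, count the connected components of its links at every interval it covers, average these counts, and take the maximum over NRCs. The toy NRCs below are hypothetical; the second is chosen so that its per-interval component counts are 2, 1, 2 and 2, the same pattern described for $k_2$, but its cells are not taken from Figure 2.

```python
from collections import defaultdict


def count_components(links, M):
    """Connected components among `links`, treating adjacency as symmetric."""
    links = set(links)
    seen = set()
    components = 0
    for start in links:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            a = stack.pop()
            for b in links:
                if b not in seen and (M[a][b] == 1 or M[b][a] == 1):
                    seen.add(b)
                    stack.append(b)
    return components


def localisation_index(nrcs, M):
    """Maximum over NRCs of the average number of components per interval."""
    averages = []
    for nrc in nrcs:
        links_at = defaultdict(set)
        for a, t in nrc:
            links_at[t].add(a)
        counts = [count_components(ls, M) for ls in links_at.values()]
        averages.append(sum(counts) / len(counts))
    return max(averages)


# Four linearly aligned links (indices 0..3 stand for a1..a4).
M = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
k_first = {(3, 1), (3, 2), (2, 2), (3, 3), (2, 3), (1, 3)}
k_second = {(0, 6), (2, 6), (3, 6), (0, 7), (1, 7), (2, 7), (3, 7),
            (0, 8), (3, 8), (0, 9), (2, 9)}
print(localisation_index([k_first, k_second], M))  # 1.75
```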

Case study: detecting NRCs in London's urban road network
London, the capital city of the UK, is a prominent example of a heterogeneous urban road network. Link lengths vary from 175 to 13,487 m. The links also differ with respect to traffic flow direction, the number of lanes and the number of junctions they pass through. In addition, the quality of LJTs varies spatially and temporally. Specifically, some links have higher sample sizes than others owing to factors such as better positioning of cameras or better hardware and software, which lead to better capture of vehicle licence plates. Moreover, more vehicle journey times are collected during some time periods owing to luminosity or higher traffic flows. London's urban road network, whose LJTs are obtained by matching the readings of ANPR cameras, has 424 links and is illustrated in Figure 3. These data were provided by Transport for London (TfL), the institution responsible for the management of London's urban road network.

Determining the distribution of LJTs
This section reports the analysis conducted to determine the distribution of LJTs. Historic LJTs are collected for the weekdays between 1 April and 1 August 2010. This period consists of 87 weekdays, which exceeds the number of observations required to derive reliable statistical estimates for modelling an LJT, reported to be approximately 40 (Hollander and Liu 2008). All of the collected data are analysed between 07:00 and 19:00, as this interval covers the AM peak, inter-peak and PM peak periods (TfL 2010, 268). Within this interval, there are T = 145 LJT estimations, as an LJT is estimated every 5 min. Because there are 424 links, the distribution has been tested for 61,480 (424 × 145) LJTs. In order to determine the best model, four distributions have been tested: lognormal, gamma, normal (i.e. Gaussian) and exponential. The normal and exponential distributions are included as benchmarks for comparison, since both are theoretically unsuitable for modelling LJTs. The normal distribution is symmetric and assigns probability to negative values, whereas an LJT cannot be negative. The exponential distribution, on the other hand, has its mode at zero and therefore lacks the interior peak observed in an LJT distribution, where most trips are completed within a narrow LJT interval. Furthermore, because the exponential distribution is a special case of the gamma distribution, the gamma distribution is expected to be more flexible than the exponential distribution in modelling LJTs.
The hypothesised distributions are fitted using maximum likelihood estimation, and the Kolmogorov-Smirnov (KS) test is then applied to establish whether the difference between the empirical distribution of the observed LJTs and the fitted distribution is significant at the 5% significance level. If the p-value of the KS test statistic is lower than 5%, the null hypothesis is rejected, suggesting that the LJT cannot be modelled with the hypothesised distribution.
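For the lognormal case, the fit and test can be sketched with Python's standard library alone: the maximum likelihood estimates of μ and σ are the mean and population standard deviation of the log LJTs, and the KS statistic compares the empirical CDF with $\Phi((\ln x - \mu)/\sigma)$. This sketch replaces an exact p-value with the large-sample 5% critical value $1.36/\sqrt{n}$, which is an approximation; note also that when the parameters are estimated from the same data, the standard KS critical values are conservative. The function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def fit_lognormal_mle(data):
    """MLE of (mu, sigma): mean and population std. dev. of the log observations."""
    logs = [math.log(x) for x in data]
    return fmean(logs), pstdev(logs)


def ks_statistic_lognormal(data, mu, sigma):
    """One-sample KS statistic D between the empirical CDF and a lognormal CDF."""
    xs = sorted(data)
    n = len(xs)
    norm = NormalDist(mu, sigma)  # sigma must be > 0
    d = 0.0
    for i, x in enumerate(xs):
        cdf = norm.cdf(math.log(x))  # lognormal CDF = normal CDF of ln(x)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def ks_reject_5pct(data):
    """Approximate 5% test using the large-sample critical value 1.36 / sqrt(n)."""
    mu, sigma = fit_lognormal_mle(data)
    return ks_statistic_lognormal(data, mu, sigma) > 1.36 / math.sqrt(len(data))


# 87 observations placed at lognormal quantiles should not be rejected.
n = 87
quantile_data = [math.exp(NormalDist().inv_cdf((i + 0.5) / n)) for i in range(n)]
print(ks_reject_5pct(quantile_data))  # False
```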
Two different analyses are conducted to assess the distribution of LJTs. The first uses all the available data for a given LJT (i.e. 87 observations for each LJT within the analysis period). However, low data quality might hamper the quality of the analysis. Therefore, the distribution of LJTs has also been tested on cleaned data. To obtain the cleaned data, outliers are identified and removed using the interquartile range method described in Frigge, Hoaglin, and Iglewicz (1989). The results for both raw and cleaned data are presented in Table 1.
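The interquartile range filter can be sketched as below, assuming Tukey's fences at 1.5 × IQR and the quartiles returned by Python's `statistics.quantiles` (default 'exclusive' method). Frigge, Hoaglin, and Iglewicz (1989) discuss several quartile definitions, so the exact choice used in the paper may differ; the function name and data are hypothetical.

```python
from statistics import quantiles


def remove_outliers_iqr(data, k=1.5):
    """Drop observations outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(data, n=4)  # needs at least two observations
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if low <= x <= high]


ljts = [2.0, 2.1, 2.2, 2.0, 2.1, 2.3, 2.2, 9.0]  # hypothetical LJTs in minutes
print(remove_outliers_iqr(ljts))  # [2.0, 2.1, 2.2, 2.0, 2.1, 2.3, 2.2]
```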
The lognormal distribution describes the LJTs best, as it led to the fewest rejected hypotheses for both raw and cleaned data. For raw data, the lognormal distribution can be used to describe 63% of the LJTs (…909/61,480), whereas the gamma, normal and exponential distributions describe 55%, 40% and 0% of the LJTs, respectively. Although the majority of LJTs can be modelled by a lognormal distribution, a large proportion of LJTs still cannot. For cleaned data, however, the results reveal a substantial improvement: the lognormal distribution can be used to describe 88% of the LJTs, whereas the gamma, normal and exponential distributions model 87%, 80% and 0% of all LJTs, respectively. Consequently, the πth percentile values are estimated based on the cleaned data.

Results
This section reports the results of the two proposed NRC detection methods based on the evaluation criteria presented in Section 2.3. The analysis is conducted for the weekdays of October 2010 on London's urban road network; consequently, a total of 20 weekdays are analysed (October 2010 has 21 weekdays, and the tube strike day, 4 October, is excluded as explained below). The analysis covers 07:00 to 19:00 on each analysed weekday (i.e. T = 145, as each LJT is estimated every 5 min). In order to investigate the NRCs in a real-life context, raw LJT estimates, as a traffic operator would see them, are used in this section. The results of both proposed NRC detection methods are compared under different parameter settings. These parameters and how they are chosen are discussed in Table 2.
Note that a tube strike occurred on 4 October, which increased LJTs considerably across the entire road network (Tsapakis et al. 2013). This research focuses on local NRCs, which led us to omit the tube strike day. This decision is related to the question 'what is recurrent?'. Most traffic scientists and domain experts acknowledge that, when distinguishing recurrent congestion from NRC, the temporal scale for analysing traffic data is a 'day'. Tube strikes are very rare events that may occur only a few times a year; hence, they are not within the scope of this paper.

High-confidence episodes
A high-confidence episode is reported to occur when the estimated LJTs are at least 40% higher than their expected values (i.e. c = 1.4) for a minimum duration of 25 min (Anbaroglu, Heydecker, and Cheng 2014). Detecting such high-confidence episodes would correspond to detecting the majority of the total delay on a given date of analysis. The average FNR is plotted against the average FAR over all analysed dates in Figure 4 for different NRC detection models.
The results suggest that the temporal window size (τ) in STSS-based NRC detection has an important effect on the detection of high-confidence episodes. When the temporal window size is increased, STSS models detect high-confidence episodes much more effectively: an additional almost 50% of the high-confidence episodes could be detected by increasing τ from one to six. The corresponding increase in FAR suggests that more LJTs that do not belong to a high-confidence episode are also considered to belong to an NRC. Another noticeable result is that performance gains are larger at lower values of τ. For example, increasing τ from one to two decreases FNR by approximately 15%, whereas increasing τ from five to six decreases it by only approximately 4%. Similarly, increasing ρ from one to two results in a greater decrease in FNR than increasing ρ from two to three. However, increasing ρ does not always lead to better detection of high-confidence episodes. For example, when τ is three, increasing ρ from two to three results in a slight performance loss. This is because increasing ρ leads to higher maximum likelihood ratio values in the Monte Carlo replications, which makes it harder for an observed STR to be significant. This might lead to the disintegration of some NRCs or to failing to detect them entirely. However, in most cases increasing ρ from two to three does not lead to a substantial change in the results, suggesting that information obtained from the spatial domain has less effect than information from the temporal domain.
It is also evident that percentile-based NRC detection has generally shown better performance than STSS-based NRC detection, as it missed far fewer high-confidence episodes. Even the most conservative percentile-based NRC detection model (i.e. π = 95) successfully detected approximately 75% of the high-confidence episodes, and this rises to approximately 95% with the π = 75 model. These results reveal that STSS-based NRC detection is more conservative than percentile-based NRC detection. Liberal methods, however, are susceptible to treating day-to-day variations in traffic as part of an NRC. This can also be observed in the FAR values, as liberal models result in higher FAR values; note that all of the percentile-based NRC detection models have higher FAR values than the STSS models. The lack of ground-truth data prevents us from determining the most suitable NRC detection model by examining the FAR values.
The parameter settings listed in Table 2 and their justifications are as follows.
Significance level (α) = 0.05: the conventional significance level in the statistical literature since the 1920s (Fisher 1926). In our context, this decision corresponds to considering an NRC as an event happening almost once in a month of weekdays (1/20 = 0.05).
Congestion factor (c) = 1.2: One of the major limitations of unmodified (i.e. c = 1) spatial and STSS-based approaches is that the detected clusters might contain links that do not have an NRC but are adjacent to links that do. This was the main motivation for researchers to detect irregularly shaped clusters (Duczmal and Assunção 2004). Using a congestion factor guarantees that all LJTs included in a significant STR are at least 20% higher than their expected values. In addition, running an unmodified STSS is cost-prohibitive in terms of computational runtime (Neill and Moore 2004). The main reason for choosing c = 1.2 is practical guidance: TfL considers a link to have 'minimal congestion' whenever estimated LJTs are 20% higher than their expected values (TfL 2010, 95).
Spatial window size (ρ) = 1-3 links: Our empirical analyses of London's urban road network suggest that the combined effect of links does not substantially change the detected NRCs for STRs containing three or more adjacent links. Consequently, the upper limit of the maximum spatial window size is set to three.
Temporal window size (τ) = 1-6 intervals (i.e. 5-30 min): The LJT estimation interval is 5 min. Therefore, by varying the temporal window size between 1 and 6, we allow an NRC event to develop over 5 to 30 min. The maximum temporal window size is limited to 30 min, as such a time lapse is sufficiently long for a congestion event to develop (Jain, Sharma, and Subramanian 2012).
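The percentile-based threshold can be sketched from the lognormal model: if log-LJTs are normal with mean μ and standard deviation σ, the πth percentile is exp(μ + z_π σ), where z_π is the standard normal quantile. This is an illustrative sketch only; it assumes μ and σ are fitted by maximum likelihood from a hypothetical set of historical LJTs for one link and time of day, and the function names and data are hypothetical rather than taken from the paper.

```python
import math
from statistics import NormalDist, fmean, pstdev
from typing import List, Sequence, Tuple


def fit_lognormal(history: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood lognormal fit: mean and population std of log-LJTs."""
    logs = [math.log(x) for x in history]
    return fmean(logs), pstdev(logs)


def percentile_threshold(mu: float, sigma: float, pi: float) -> float:
    """pi-th percentile of LogNormal(mu, sigma), i.e. exp(mu + z_pi * sigma)."""
    return math.exp(mu + NormalDist().inv_cdf(pi / 100.0) * sigma)


def percentile_flags(estimated: Sequence[float], mu: float, sigma: float,
                     pi: float = 95) -> List[bool]:
    """Flag estimated LJTs exceeding the pi-th percentile as NRC candidates."""
    threshold = percentile_threshold(mu, sigma, pi)
    return [x > threshold for x in estimated]


# Hypothetical historical LJTs (seconds) for one link at one time of day.
history = [50, 55, 60, 58, 52, 65, 61, 57, 54, 59,
           56, 62, 53, 60, 58, 55, 57, 61, 52, 63]
mu, sigma = fit_lognormal(history)
print(percentile_flags([58, 90], mu, sigma, pi=95))  # [False, True]
```

Lowering π (e.g. from 95 to 75) lowers the threshold, which is why the π = 75 model behaves more liberally than the π = 95 model.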

Localisation Index
The Localisation Index of each analysed NRC detection model is calculated for every analysed date. Figure 5 shows boxplots of the Localisation Index values on a log scale to improve the legibility of the results.
Liberal NRC detection models perform poorly on the Localisation Index (i.e. they have high Localisation Index values), as they are more likely than conservative models to treat day-to-day variations in traffic as part of an NRC. This trend is clear for the percentile-based NRC detection models: as π increases from 75 to 95, the NRC detection method becomes more conservative and hence achieves a lower Localisation Index. However, even the most conservative percentile-based NRC detection model analysed (i.e. π = 95) did not achieve the best Localisation Index value (i.e. zero in log-scale) on any of the analysed dates. STSS-based NRC detection models, on the other hand, did achieve the best Localisation Index score: on some analysed dates, all τ = 1 models and the {τ = 2, ρ = 1} model achieved it. These results suggest that, in a heterogeneous urban road network context, it is better to liberalise an STSS-based NRC detection model by increasing the maximum temporal window size rather than the maximum spatial window size.

Link-based analysis
The aim of this section is to illustrate the advantages of the proposed NRC detection methods on exemplar links that demonstrate the heterogeneity of an urban road network. To compare different NRC detection models, the two aforementioned conflicting criteria are combined into one using a weighted product model with equal weights for both criteria (Triantaphyllou and Mann 1989). Our empirical analysis, which is omitted here due to space limitations, suggests that the STSS-based NRC detection method outperforms the percentile-based NRC detection method whenever τ ≥ 4 (20 min) and ρ is one. The best percentile-based NRC detection model is found to be π = 95. These two best-performing models are compared with the traditional way of detecting NRCs, clustering episodes (CE), which considers an LJT to belong to an NRC whenever it is at least 40% higher than its expected value (Anbaroglu, Heydecker, and Cheng 2014); that is, the CE 'threshold' is obtained by multiplying the expected LJTs by 1.4. These three models are denoted by STSS, π and CE, respectively, in Figure 6. The analysis date is 23 June 2010 (Wednesday), between 07:00 and 19:00. The estimated LJTs that are found to belong to an NRC are highlighted in red, and the estimated and expected LJTs are shown together with the sample size. In this way, the analysed links show how effectively the proposed methods handle the two main issues that make an urban road network heterogeneous.
Low data quality is a common issue in traffic science, and LJT data are no exception. Unreliable LJT estimates occur when only a few vehicles' journey times are captured in a given time interval. A good example of this situation is link 2394, which is 1.3 km long. Most of the LJTs of this link are estimated from a single vehicle's journey time, which reduces the reliability of the estimated LJTs. This could explain the erratic increases in estimated LJTs shown in Figure 6(a), as the captured vehicles might have detoured. Because no incident was reported on this link, it is even more likely that the substantially high LJTs are due to sampling error. Furthermore, empirical investigation shows that the sample size used to estimate an LJT on this link is usually very low regardless of the analysed day; therefore, most estimated LJTs exhibit high variance. STSS captures this high variance effectively, as it reported the fewest LJTs as belonging to an NRC. Although the percentile-based NRC detection method also managed to capture some of the variation in LJTs, it was not as successful as STSS. The traditional method failed on this link: it could not handle the variation of the LJTs and reported the most LJTs as belonging to an NRC.
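The weighted product model used to combine the two criteria can be sketched as follows. Following Triantaphyllou and Mann, alternative A_K is compared with A_L through the ratio of products of criterion values raised to their weights; because that ratio equals score(A_K) / score(A_L), ranking by the per-model product is equivalent. The sketch assumes both criteria are costs (lower is better), takes them to be a detection-miss measure and the Localisation Index as an illustrative assumption, and uses hypothetical model names and values rather than the paper's results.

```python
import math
from typing import Dict, List, Sequence


def wpm_score(criteria: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted product of cost criteria (lower is better for every criterion)."""
    if any(v <= 0 for v in criteria):
        # Products of powers are undefined or degenerate for zero/negative values.
        raise ValueError("the weighted product model needs strictly positive values")
    return math.prod(v ** w for v, w in zip(criteria, weights))


def rank_models(models: Dict[str, Sequence[float]],
                weights: Sequence[float] = (0.5, 0.5)) -> List[str]:
    """Rank models from best to worst. The pairwise WPM ratio
    prod((a_Kj / a_Lj) ** w_j) equals wpm_score(K) / wpm_score(L)."""
    return sorted(models, key=lambda name: wpm_score(models[name], weights))


# Hypothetical (criterion_1, criterion_2) values for three models.
models = {"model_1": (0.2, 4.0), "model_2": (0.4, 1.0), "model_3": (0.1, 9.0)}
print(rank_models(models))  # ['model_2', 'model_1', 'model_3']
```

Equal weights of 0.5 make the score a geometric mean, so neither criterion's scale dominates the other.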
Shorter links exhibit high variability around their expected values, as their estimated LJTs are very low, so even a small absolute delay is a large relative change. The timing of a traffic light, for example, could substantially affect the estimated LJTs on short links. Therefore, it is interesting to examine the performance of the NRC detection models on the shortest link of London's road network (i.e. link 2098), which is 175 m long. The advantages of the STSS and percentile-based NRC detection methods can be observed on this link. Specifically, STSS did not report any LJT as belonging to an NRC, and percentile-based NRC detection flagged only a single LJT as belonging to an NRC, as shown in Figure 6(b). Although some LJTs are excessive (more than 40% higher than their expected values), this is probably due to day-to-day variation, as the delay is negligible. However, the existing method (CE) considered many LJTs to belong to an NRC. As no incident was recorded in the incident database and the delay was small, the detected NRCs are likely to be wrong. Note that the sample size on this link is also relatively small, yet the estimated LJTs are plausible. This illustrates the difficulty of identifying low-quality links based on sample size information alone.
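Why a fixed 40% rule struggles on high-variance links can be illustrated with synthetic, NRC-free LJTs. The sketch assumes LJTs on a link follow a lognormal distribution whose median is the expected LJT, and compares how many ordinary LJTs the CE rule (c = 1.4) and a lognormal 95th-percentile rule flag. The link parameters and function name are hypothetical, so every flag here is a false alarm by construction.

```python
import math
import random
from statistics import NormalDist
from typing import Sequence, Tuple


def flag_counts(samples: Sequence[float], expected: float, sigma: float,
                c: float = 1.4, pi: float = 95) -> Tuple[int, int]:
    """Count LJTs flagged by the fixed-ratio CE rule and by a lognormal
    percentile rule, taking the expected LJT as the lognormal median."""
    ce_threshold = c * expected
    pct_threshold = expected * math.exp(NormalDist().inv_cdf(pi / 100.0) * sigma)
    ce = sum(x >= ce_threshold for x in samples)
    pct = sum(x > pct_threshold for x in samples)
    return ce, pct


rng = random.Random(2010)
for sigma in (0.1, 0.5):  # hypothetical low- and high-variance links
    samples = [rng.lognormvariate(math.log(30.0), sigma) for _ in range(1000)]
    print(sigma, flag_counts(samples, 30.0, sigma))
```

The percentile rule flags roughly 5% of ordinary LJTs on both links, whereas the CE rule flags almost none on the low-variance link but about a quarter of them on the high-variance link, because its threshold ignores the link's variability.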

Conclusions
This paper proposes two novel NRC detection methods, both of which aim to capture the heterogeneous nature of an urban road network by modelling the estimated LJTs. The lognormal distribution is found to be the most suitable model for the estimated LJTs. This information is used to estimate the πth percentile values in the percentile-based NRC detection method and to derive a likelihood ratio score function in the STSS-based NRC detection method. Both NRC detection methods have been evaluated with different parameter settings on London's urban road network. STSS-based NRC detection is observed to outperform percentile-based NRC detection when ρ is one and τ ≥ 4. Link-based analyses demonstrate the effectiveness of the proposed methods on links whose LJTs exhibit high variance. Such links occur either when the quality of the estimated LJTs is low or when a link is short.
To explore the full potential of NRC detection on large urban road networks, this paper could be extended in the following directions. First, missing LJT data is another important challenge that needs to be handled. This could be addressed by fusing probe-vehicle data (e.g. mobile phone data) with the proposed NRC detection methodology. Second, the definition of the adjacency matrix could be improved, either by incorporating domain knowledge or by developing a dynamic adjacency matrix that reflects the current traffic situation. In addition, the theory of STSS-based NRC detection could be improved by developing a likelihood ratio score that accounts for the spatial and temporal correlations within the estimated LJTs. Furthermore, it would be valuable to apply the proposed NRC detection methods to different urban road networks and understand how the results vary across them. Finally, it would be worthwhile to develop effective visualisations so that traffic operators could investigate the detected NRCs easily.
Above all, the ultimate goal and research direction would be to develop methods that can predict emerging NRCs. This research direction has several challenges. The first is the collection of (near) real-time data. Specifically, for London's ANPR network, the LJT estimate for the current interval only becomes available approximately 30 min later. The LJT estimation interval and the complexity of the algorithm that converts vehicle journey times into LJTs are the two main factors behind this delay. Second, some causes of NRC are simply unpredictable (e.g. a traffic accident). Accounting for such unpredictable factors when developing methods to detect emerging NRCs would be a challenging task.

Figure 4. Performance of STSS and percentile-based NRC detection models regarding the detection of high-confidence episodes.

Figure 5. Boxplots indicating the Localisation Index of different NRC detection models.

Figure 6. Comparison of the best-performing NRC detection models on a low-quality link (a) and on the shortest link of London's urban road network (b) on 23 June 2010.

Table 1. Identifying the distribution of LJTs on raw and cleaned data.

Table 2. Parameters of the NRC detection methods. The significance level of 0.05 has commonly been used in the statistical literature to determine the significance of a statistical test since the 1920s (Fisher 1926).