Determining the road traffic accident hotspots using GIS-based temporal-spatial statistical analytic techniques in Hanoi, Vietnam

This study applied GIS-based statistical techniques to investigate how the accident Severity Index (SI) influences the temporal-spatial patterns of accident hotspots across specific time intervals of the day and across seasons. Road Traffic Accident (RTA) data covering 3 years (2015–2017) in Hanoi, Vietnam were used to analyze and test this approach. First, the RTA data were divided into four seasons in accordance with Hanoi's weather conditions and into time intervals such as daytime, nighttime, and peak hours. Then, the Kernel Density Estimation (KDE) method was applied to analyze hotspots by time interval and season. Finally, the results were presented using the comap technique. The analysis was carried out both with and without SI. The accident SI measures the seriousness of an accident. The approach gives higher weights to more serious accidents, but without the extremely high values that would result from weighting in direct proportion to accident costs. The results showed that both analyses identified relatively similar hotspots, but the rankings of some hotspots differed considerably once SI was integrated. It is better to take SI into account when determining RTA hotspots because the results are more precise and the hotspot rankings are more accurate. On this basis, traffic authorities can more easily understand the causes behind each accident and direct limited budgets and resources to the most dangerous hotspots. This is also the first study of this issue in Vietnam, so it can help traffic authorities address the problem not only in Hanoi but also in other cities.
ARTICLE HISTORY: Received 10 December 2018; Accepted 10 October 2019


Introduction
Road Traffic Accidents (RTAs) are one of the most complicated issues worldwide. Every year, RTAs cause around 1.3 million deaths and 50 million injuries (ITF 2017). To significantly decrease the number of accidents, it is necessary to know exactly where and when accidents happen frequently. Locations with a high accident occurrence compared with other locations are known as hotspots or black spots (Dereli and Erdogan 2017). Past studies show that the occurrences of RTAs are not random in space and time. In fact, these locations are shaped by several key factors such as geometric design, traffic volume, surroundings, and severe weather conditions (Xia and Yan 2008). Therefore, to build effective accident prevention plans, it is vital to determine potentially dangerous locations together with the times at which accidents occur (Harirforoush 2017).
The road transportation system is considered the highest-risk system that people must confront every day. RTAs are increasing because transportation infrastructure develops more slowly than other sectors such as real estate and industrial zones. Over a third of RTA deaths in low- and middle-income nations, where transportation infrastructure is underdeveloped, are among vehicle occupants, cyclists, and pedestrians (WHO 2015). Vietnam is a developing country, so RTAs are also one of the major concerns of its transportation authorities. As of 19 April 2018, the National Traffic Safety Committee (NTSC) of Vietnam reported on its website that the annual social cost of RTAs in Hanoi, the capital of Vietnam, in terms of medical treatment, deaths, and property damage amounts to 2.9% of GDP (5–12 billion USD) (Mai 2018).
In 2018, there were 18 736 traffic accidents, with about 8248 deaths and 14 802 injuries, on Vietnam's road networks (Chung 2019). Currently, non-spatial modeling is used in Vietnam to identify RTA hotspot locations, namely the Accident Frequency Method (classified by levels of injury) over a 1-year period (MOT 2012). This is the oldest and simplest method of identifying dangerous locations. However, it has many limitations: it offers no visualization, no connection between space and time, and no priority ranking of hotspots, and it does not take traffic volume into consideration (Li 2006). To date, no study has dealt with accident mapping in Vietnam.
Accident datasets presented in the form of tables or graphs are too complicated to understand comprehensively (Shi and Pun-Cheng 2019). An effective spatial-temporal analysis method is therefore needed to investigate large accident datasets (Shekhar et al. 2011). Clustering is one of the most important unsupervised learning approaches and has been widely applied in determining RTA hotspots (Han, Pei, and Kamber 2011). In clustering, events are grouped according to their similarity. There are four main types of clustering algorithms: partitional, hierarchical, grid-based, and density-based (Han and Kamber 2001). Partitional algorithms include k-means and k-medoids. These methods require a predefined number of clusters, which limits their use in many applications. Hierarchical algorithms include Clustering Using REpresentatives (CURE) and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH), which are effective for summarizing and visualizing data. However, these methods are difficult to scale up since each decision needs to evaluate many events. Grid-based algorithms include STatistical INformation Grid (STING) and CLustering In QUEst (CLIQUE). Partitional and hierarchical algorithms are suitable for identifying spherical-shaped clusters. Density-based algorithms include Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Ordering Points to Identify the Clustering Structure (OPTICS), and DENsity-based CLUstEring (DENCLUE), which are popular techniques for identifying clusters of arbitrary shape. In DBSCAN and OPTICS, density is computed by counting the number of events in an area defined by a bandwidth, so the result is highly sensitive to the bandwidth value applied. To deal with this issue, DENCLUE or Kernel Density Estimation (KDE), a nonparametric density estimation technique, can be applied. The KDE method can efficiently lessen the effect of noise by equally allocating noise into the input data (Han and Kamber 2001).
As a common GIS algorithm, KDE can create a density map that shows the density of the accident points (Mohaymany, Shahri, and Mirbagheri 2013). DBSCAN, by contrast, only produces the shape of the clusters and does not report their density, so the density differences among clusters are not highlighted (Hegyi, Borsos, and Koren 2017). The biggest drawback of DBSCAN is that it cannot group events well when densities vary (Qiu, Xu, and Bao 2016). Its great advantage is that it can identify clusters with arbitrary shapes (Shi and Pun-Cheng 2019). In addition, unlike several other clustering methods, KDE and DBSCAN do not require a predefined number of clusters (Aoying and Shuigeng 2000).
GIS has been applied widely in traffic safety studies in many countries for a long time (Mohaymany, Shahri, and Mirbagheri 2013; Dereli and Erdogan 2017). The locations of accidents and their attributes are stored in GIS, which makes it easier to investigate the reasons for each accident. Spatial data play a critical role in traffic safety analysis. GIS enables us to collect, store, manipulate, query, analyze, and visualize spatial data (Lloyd 2010). However, GIS has not yet been applied to identifying RTA hotspot locations in Vietnam.
GIS-based spatial analysis of RTAs has been widely applied to explore hotspots (Anderson 2009; Vemulapalli et al. 2017). The hotspots are shown accurately on the map together with the attribute information of each accident, which helps us understand the reasons behind each accident. The KDE method has been commonly applied to identify high accident risk locations on road networks (Harirforoush and Bellalite 2016; Vemulapalli et al. 2017). In contrast to spatial analysis, temporal analysis of RTAs has received little serious attention (Plug, Xia, and Caulfield 2011; Vemulapalli et al. 2017). Several past studies used temporal analysis of traffic accidents, but their outcomes were mainly depicted by simple graphs, which do not allow us to visualize how accident clusters vary over time (Harirforoush 2017).
A comprehensive view of accident hotspots requires understanding the temporal and spatial dimensions of accidents simultaneously. Nevertheless, very few studies focus on spatial-temporal analysis in identifying RTA hotspots (Plug, Xia, and Caulfield 2011; Dai 2012; Vemulapalli et al. 2017). The comap method enables effective analysis of temporal-spatial integration (Brunsdon, Corcoran, and Higgs 2007). Past studies revealed that the comap method clearly highlights specific locations with high accident intensity in a specific time period (Harirforoush 2017). Plug, Xia, and Caulfield (2011) applied the comap method to discover the influence of temporal-spatial interaction on single-vehicle accident patterns, but did not consider the seasonal factor or the Severity Index (SI). Vemulapalli et al. (2017) used this method to investigate the spatial-temporal patterns of aging-involved accidents in three urban counties in Florida and illustrated how accident hotspots varied over time, but again did not consider the seasonal factor or SI. Harirforoush (2017) used this method to explore the spatial-temporal patterns of seasonal accidents, but did not consider how accidents varied over specific time intervals of the day.
In addition, many past studies use accident counts to assess the safety of a location. Past studies also show that, without weighted data, it is difficult to tell whether a high or low cluster is occurring. More serious accidents should carry greater weights, based on accident costs, when determining dangerous places. Thus, accidents should be weighted by severity when identifying unsafe locations (Geurts et al. 2004). SI was integrated into the KDE method to investigate accident hotspots on some highways in India (Sandhu et al. 2016), and SI was also used in the hotspot study of Iyanda (2019); however, temporal-spatial integration was not addressed.
Therefore, the purpose of this paper was to explore how the temporal-spatial patterns of RTA hotspots in Hanoi vary with specific times of day and seasons, both with and without accident SI. First, the KDE method was used to determine RTA hotspots with and without weighted data. Second, the comap technique was applied to analyze the spatial-temporal patterns of RTA hotspots. Finally, the hotspots were ranked according to their significance. The aim of our study is to present an advanced process for identifying RTA hotspots. Since this is the first research on this issue in Vietnam, it can help traffic authorities address the problem not only in Hanoi but also in other cities. The remainder of the article is arranged as follows. The study data are described in Section 2. Section 3 presents the proposed methodology. Section 4 presents the results and discussion. Finally, conclusions and future research are presented in Section 5.

Study area and data
This study was carried out in Hanoi, Vietnam. Because this is the first study to apply GIS to RTA analysis in Vietnam, collecting adequate data was difficult. Hanoi, the capital of Vietnam, covers an area of around 3344.47 km² and had a population of about 7 742 200 in 2017. This study used two datasets. First, a digital road network map, including specifications such as road length, width, and type, was provided by the Environment and Resource Department in Hanoi. Second, an RTA dataset covering 3 years (2015–2017) was provided by the Transport Police Department in Hanoi. There were 1132 accidents on Hanoi's roads during this period. The RTA dataset included key accident information such as the date and time of accidents, accident locations, accident and vehicle types, age and gender of drivers, the number of injured, and accident injury levels. Figure 1 illustrates the study zone with the locations of accidents in Hanoi from 2015 to 2017. Table 1 describes the accident severity and the corresponding percentage of injuries.
Jupyter Notebook was used to calculate and describe the traffic accident data. Graphic representations give a general understanding of the accident data. Within a year, accidents occur more frequently in December, January, and April (Figure 2(a)); within a week, on Saturday (Figure 2(b)). Within a day, accidents often occur from 14:00 to 15:00 and from 19:00 to 23:00 (Figure 2(c)).
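The descriptive counts behind plots such as Figure 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample timestamps, the timestamp format, and the function name `temporal_counts` are hypothetical, since the paper does not describe the layout of the police dataset.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical sample records; the real dataset's fields are not given in the paper.
accident_times = [
    "2015-01-03 14:20",
    "2015-01-10 19:45",
    "2016-04-16 22:10",
    "2016-12-24 14:05",
    "2017-07-05 08:30",
]

def temporal_counts(timestamps):
    """Count accidents by month, weekday name, and hour of day."""
    parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M") for t in timestamps]
    by_month = Counter(p.month for p in parsed)
    # weekday() returns 0 for Monday, which indexes WEEKDAYS directly.
    by_weekday = Counter(WEEKDAYS[p.weekday()] for p in parsed)
    by_hour = Counter(p.hour for p in parsed)
    return by_month, by_weekday, by_hour

by_month, by_weekday, by_hour = temporal_counts(accident_times)
```

Each `Counter` can be passed to any plotting tool to produce monthly, weekly, and hourly bar charts.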

Comap method
The comap method enables analysis of temporal-spatial integration and helps us understand the relationship between the locations of RTAs and how they change over time (Brunsdon, Corcoran, and Higgs 2007). In this paper, the three-year (2015–2017) accident data for Hanoi, Vietnam were divided according to specific time intervals of the day and the seasons of the year, in accordance with Hanoi's weather conditions. Next, the KDE method was used to compute and analyze the intensity of each subset. Finally, the spatial distributions of RTAs over time were displayed in separate maps (Vemulapalli et al. 2017; Plug, Xia, and Caulfield 2011).
Some researchers suggest that adjacent class boundaries should overlap (Plug, Xia, and Caulfield 2011). In this research, the RTA data were divided into four seasons, with several days of overlap between adjacent seasons to avoid the temporal edge problem, as shown in Table 2.
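The subsetting step can be illustrated with a minimal Python sketch. The season boundaries and the seven-day overlap below are hypothetical placeholders, not the values of Table 2, and the function names are assumptions made for illustration. The four six-hour intervals follow the panel labels of the time-of-day comaps.

```python
from datetime import date

# Hypothetical season boundaries as (month, day); NOT the paper's Table 2 values.
SEASONS = {
    "spring": ((2, 1), (4, 30)),
    "summer": ((5, 1), (8, 31)),
    "fall": ((9, 1), (10, 31)),
    "winter": ((11, 1), (1, 31)),
}
OVERLAP_DAYS = 7  # assumed overlap added on each side of a season

def _day_of_year(month, day):
    # A fixed non-leap reference year keeps boundaries comparable across years.
    return date(2015, month, day).timetuple().tm_yday

def seasons_for(d, overlap=OVERLAP_DAYS):
    """Return every season whose widened window contains date d."""
    day = 28 if (d.month == 2 and d.day == 29) else d.day  # fold Feb 29 into Feb 28
    doy = _day_of_year(d.month, day)
    matches = []
    for name, (start, end) in SEASONS.items():
        lo = (_day_of_year(*start) - overlap - 1) % 365 + 1
        hi = (_day_of_year(*end) + overlap - 1) % 365 + 1
        if lo <= hi:
            inside = lo <= doy <= hi
        else:  # the window wraps around the new year
            inside = doy >= lo or doy <= hi
        if inside:
            matches.append(name)
    return matches

def time_interval(hour):
    """Map an hour (0-23) to one of four six-hour intervals of the day."""
    labels = ["00:00-05:59", "06:00-11:59", "12:00-17:59", "18:00-23:59"]
    return labels[hour // 6]
```

An accident that falls inside an overlap is assigned to both neighboring seasons, so it contributes to the density surface of each panel. This is what softens the edge between adjacent subsets.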

Severity Index (SI)
The accident SI measures the seriousness of an accident. Without weighted data, it is difficult to tell whether a high or low cluster is occurring. Different weighting systems yield different accident severity values. The approach adopted here gives higher weights to more serious accidents, but without the extremely high values that would result from weighting in direct proportion to the cost of accidents. This research applied the accident severity weighting system used by the Belgian government. Under this system, weights of 1, 3, and 5 are assigned to slight, serious, and fatal accidents, respectively (Geurts et al. 2004). SI for each location can be calculated by Equation (1):
$$\mathrm{SI} = L + 3S + 5D \qquad (1)$$
where SI is the severity index for each location; L is the total number of slight injuries; S is the total number of serious injuries; and D is the total number of deaths. Belgium's definition of hotspots thus reflects the level of economic damage: one death is equivalent to five slight injuries, and one serious injury is equivalent to three slight injuries. This combination of weights is a moderate and appropriate way to emphasize the significance of fatal accidents (Geurts et al. 2004). The sensitivity of the hotspot ranking in Belgium to this choice of injury severity weights was investigated by Geurts (2006). In addition, this weighting system was successfully adapted to determine accident hotspots in several cities in India (Karuppanagounder 2011; Choudhary, Ohri, and Kumar 2015). On this basis, it can be estimated how much damage qualifies a location as a hotspot (Phong 2018). To account for accident severity, the weight from the Belgian system is assigned to each accident, which is identified by its Identification Number (ID). This allows every accident to be counted according to its assigned weight.
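As a worked illustration of the 1/3/5 weighting described above, the following Python sketch computes SI for a few locations and compares the ranking by raw count with the ranking by SI. The location names and casualty figures are invented for illustration only and are not taken from the Hanoi dataset.

```python
# Belgian weights as cited in the text: slight = 1, serious = 3, fatal = 5.
WEIGHTS = {"slight": 1, "serious": 3, "fatal": 5}

def severity_index(slight, serious, deaths):
    """Severity index of a location from its casualty totals."""
    return (WEIGHTS["slight"] * slight
            + WEIGHTS["serious"] * serious
            + WEIGHTS["fatal"] * deaths)

# Hypothetical casualty totals per location (illustrative values only).
locations = {
    "A": {"slight": 4, "serious": 0, "deaths": 0},
    "B": {"slight": 0, "serious": 0, "deaths": 2},
    "C": {"slight": 0, "serious": 1, "deaths": 0},
}

# Ranking by raw casualty count versus ranking by severity index.
by_count = sorted(locations, key=lambda k: sum(locations[k].values()), reverse=True)
by_si = sorted(locations, key=lambda k: severity_index(**locations[k]), reverse=True)
```

Here location A leads on raw counts (4 casualties, SI = 4), while location B leads once severity is weighted (2 deaths, SI = 10). This is the kind of rank change the with-SI and without-SI comparison is designed to reveal.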

Kernel Density Estimation (KDE) method
KDE is one of the most effective methods for determining the spatial patterns of RTAs. The intensity of events is calculated within a defined search bandwidth in the study region to create a smoothed surface. A kernel function assigns a weight to the area surrounding each event according to its distance from the event: the value is highest at the event location and decreases smoothly to zero at the radius of the search circle. A smooth, continuous intensity surface is then created by summing the individual kernels over the study region (Anderson 2009). The density at a given position is computed by Equation (2):
$$f(s) = \frac{1}{n h^{2}} \sum_{i=1}^{n} K\!\left(\frac{d_i}{h}\right) \qquad (2)$$
where f(s) is the density estimate at location s; n is the number of observations; h is the search bandwidth; K is the kernel function; and d_i is the distance between location s and the location of the ith observation. The output of the KDE method is a raster consisting of a grid of cells. The two main parameters that influence the KDE method are cell size and bandwidth. The choice of bandwidth is quite subjective (Anderson 2009). Past studies used values ranging from 20 m to 1000 m (Xie and Yan 2013). In our research, we tested 10 bandwidths (100 m, 200 m, ..., 1000 m) to find the most suitable value. We finally selected a bandwidth of 1000 m because it allows RTA hotspot locations to be visualized easily.
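The kernel summation described above can be sketched in plain Python. The paper does not state which kernel function was used, so the quartic (biweight) kernel below is an assumption, as are the function names and the choice to apply severity weights as simple multipliers on each kernel. It is a sketch of planar KDE at a single location, not the GIS tool's implementation.

```python
import math

def quartic_kernel(u):
    """Quartic (biweight) kernel on the plane; zero at and beyond u = 1.

    The 3/pi factor makes the kernel integrate to 1 over the unit disc.
    """
    if u >= 1.0:
        return 0.0
    return (3.0 / math.pi) * (1.0 - u * u) ** 2

def kde_at(s, points, h, weights=None):
    """Density estimate at location s: (1 / (n h^2)) * sum_i w_i K(d_i / h).

    s       -- (x, y) location where the density is evaluated
    points  -- list of (x, y) accident locations, same units as h
    h       -- search bandwidth
    weights -- optional per-accident weights (e.g. severity); default 1 each
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for (x, y), w in zip(points, weights):
        d = math.hypot(s[0] - x, s[1] - y)  # distance from s to the event
        total += w * quartic_kernel(d / h)
    return total / (len(points) * h * h)
```

Evaluating `kde_at` at the center of every raster cell yields the density surface. Because the normalizing constant is the same for every cell, it does not affect the relative ranking of hotspots within one map.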

Categorization of hotspots
Because KDE provides no index of statistical significance, hotspots were categorized using equal intervals. In this study, five categories were used, based on accident density: very low, low, medium, high, and very high. Figure 3 illustrates the distribution of the severity indices at accident locations in relation to specific time intervals of the day and seasons. Figure 3(a-d) shows the accident severity indices by time interval of the day. Accidents with high severity indices are concentrated in the period 12:00 pm-11:59 pm, as shown in Figure 3(c,d). The seasonal factor also directly affects the accident severity index. Figure 3(e-h) depicts how the severity indices vary among the seasons of the year. Accidents with high severity indices often happen in winter, as shown in Figure 3(h). In contrast, accidents with low severity indices often happen in spring and fall, as shown in Figure 3(e,g).
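The equal-interval categorization introduced at the start of this section can be sketched as follows. The function name and the convention that a value exactly on a class boundary falls into the upper class are assumptions made for illustration; GIS packages may treat boundaries differently.

```python
LABELS = ["very low", "low", "medium", "high", "very high"]

def equal_interval_class(value, vmin, vmax, labels=LABELS):
    """Assign a density value to one of len(labels) equal-width classes.

    vmin and vmax are the minimum and maximum density of the surface
    being classified.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    width = (vmax - vmin) / len(labels)
    index = int((value - vmin) / width)
    # Clamp so that value == vmax falls in the top class.
    index = min(max(index, 0), len(labels) - 1)
    return labels[index]
```

For a surface whose densities range from 0 to 10, values in [0, 2) are "very low", [2, 4) "low", and so on up to [8, 10] "very high".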

Distribution of time-intervals-related hotspots without accident severity index
The comap method enables us to identify whether the same dangerous locations depend on temporal variations in RTAs. Therefore, a comap was created to investigate the temporal-spatial patterns of RTA hotspots. Figure 4 shows that the RTA hotspot patterns in Hanoi varied over specific time intervals of the day. As shown in Figure 4(a), in the early morning (00:00-5:59 am) there were 170 accidents even though there were few road users. The hotspots are mainly located at the southern end of Thang Long Bridge, Nhat Tan Bridge, Chuong Duong Bridge, Vinh Tuy Bridge, and Nguyen Trai Road (encircled in black). These are main roads that connect the center of Hanoi with its surrounding areas. A likely explanation is that most road users at this time are workers from the suburbs of Hanoi traveling to the city center. In addition, in Vietnam, heavy trucks and construction material trucks are allowed to operate at night. At this time, poor lighting and driver fatigue lead to serious accidents between trucks and motorcycles at these hotspots. These points were also RTA hotspots observed in reality. This indicates that several RTA hotspots occurred only during a specific time of the day, as shown in Figure 4(a) (encircled in black): these spots had high intensity during 00:00-5:59 am, while their intensity was not high in other time intervals of the day. The accident distributions in Figure 4(b,c) look similar. The accidents mainly occur in the center of Hanoi and on the National Highway (NH)-1A segment, with a higher intensity in Figure 4(b). Figure 4(d), in particular, shows an anomaly. Although most accidents occurred in the evening (6:00 pm-11:59 pm), there appeared to be fewer hotspots than in the other intervals, and their intensity was not high compared with other time intervals of the day. The explanation is that, without accident SI, it is difficult to tell whether a high or low cluster is occurring.

Distribution of time-intervals-related hotspots with accident severity index
[Figure panels: (a) 00:00-5:59 am; (b) 6:00 am-11:59 am; (c) 12:00 pm-5:59 pm; (d) 6:00 pm-11:59 pm]
This section clarifies the difference between identifying RTA hotspots with and without SI. Thanks to the comap technique, the distinction between taking SI into account and not doing so becomes more pronounced and easier to investigate. The distribution of time-interval-related hotspots with accident SI is shown in Figure 5. Both analyses identified relatively similar hotspots, but the rankings of some hotspots differed considerably once accident SI was integrated (Figures 4 and 5). As before, the hotspots were categorized using equal intervals into five density classes: very low, low, medium, high, and very high. After ranking, the effect of the accident SI on the results can be easily observed. For instance, Location 1 (encircled in blue in Figure 4(a)) was ranked as medium density without SI but as very high density with SI (Figure 5(a)). Similarly, Location 2 (encircled in blue in Figure 4(b)) was ranked as low density without SI but as very high density with SI (Figure 5(b)). In addition, the hotspots encircled in black in Figure 4(d) were ranked as medium density without SI but as very high density with SI (Figure 5(d)). This is due to the larger percentage of fatal and serious accidents at these locations, which affects the results. Notably, the locations in the southern part of Hanoi (highlighted by the ellipses in Figures 4 and 5) were hotspots at all time intervals of the day. This is consistent with the report of the Hanoi Transport Department, which lists these two locations as RTA hotspots. They are two intersections on NH-1A near Cho Tia station and Van Dien station. These zones have many pedestrians and vehicles traveling all day, with many illegal crossings and passages between the railway and the roadway, as shown in Figure 6. These areas have high traffic volume, while the road is narrow and has no median strip.

The distribution of season-related hotspots without accident severity index
In this part, a comap was also generated to examine the temporal-spatial patterns of season-related hotspots without SI, as shown in Figure 7. Figure 7 shows that the distribution of RTA hotspots in Hanoi varied among seasons. The number of accidents is similar in summer and winter, at 367 and 392 accidents, respectively. More importantly, the distribution of RTA hotspots is quite similar in both seasons, as shown in Figure 7(b,d). The number of RTAs in summer and winter is about twice that of the other seasons, and RTA hotspots with higher intensity (in red) were concentrated mainly in the city center, where trade centers, schools, and hospitals are located, and in particular along the NH-1A segment, especially in the Van Dien, Cho Tia, and Thuong Tin station zones (encircled in black in Figure 7(b,d)). RTAs fluctuated in both space and time, as shown in Figure 7. Several key factors affect the degree of change. In particular, the weather conditions in Hanoi are quite severe in summer and winter: the temperature can exceed 42 °C in summer and fall to 8 °C in winter. In addition, summer includes rainy months (e.g. June and July). These severe conditions make the number of accidents much higher than in fall and spring. In contrast, fall in Hanoi is pleasant, cool, and dry, which probably explains why only 163 accidents occurred in that season.
The comap also made it possible to determine season-related RTA hotspots. Several hotspots occurred only in a specific season; these are surrounded by ellipses. In fall, accidents often happened at the Vo Chi Cong-Xuan La intersection (Figure 7(c)), while its intensity was not high in other seasons. In spring, the accident intensity was high at the Pham Hung-Duong Dinh Nghe intersection (Figure 7(a)), while its density was not high in the other seasons.
There was an unusual pattern in fall (Figure 7(c)). Although the number of accidents was lowest in fall, this season showed more hotspots, and at higher intensity, than the other seasons. This can be explained after recalculating the hotspots with accident SI.

Distribution of season-related hotspots with accident severity index
This section clarifies the difference between identifying season-related RTA hotspots with and without SI. The distribution of season-related hotspots with accident SI is shown in Figure 8. Both analyses identified relatively similar hotspots, but the rankings of some hotspots differed considerably once accident SI was integrated (Figures 7 and 8). After ranking, the effect of the accident SI on the results can be easily observed. Figure 8(c) appears to give more reasonable results than Figure 7(c): with SI, fall shows only a few hotspots with low intensity levels and one hotspot with a very high intensity level, at the Cho Tia station location. In addition, comparing Figure 7(b) with Figure 8(b) shows that several hotspots ranked as high density without SI were ranked as medium density with SI. Taking SI into account therefore helps detect hotspots that are not really dangerous. However, the results need to be validated against reference data and reality. After this validation, the results of the proposed method were found to agree with the observations from the reference data and reality.

Conclusions and further researches
This article has shown that the combined analysis of space and time in identifying RTA hotspots enables traffic authorities to capture the situation accurately and in a timely manner.
In addition, hotspots were ranked according to their level of danger. On this basis, traffic authorities can more easily devise reasonable solutions and allocate limited budgets and resources appropriately.
This approach demonstrates strengths in traffic safety analysis, particularly in identifying RTA hotspots. The distribution of RTA hotspots varied over time according to the season and the time interval of the day. Our study showed that traffic accidents occurred frequently in winter and summer. In addition, traffic accidents occurred frequently in two periods of the day, 2:00 pm-3:00 pm and 7:00 pm-11:00 pm. The hotspots often occurred at intersections in the center of Hanoi and near the illegal crossings and stations along NH-1A. This study carried out the analysis both with and without accident SI, using the accident severity weighting system of the Belgian government. Both analyses identified relatively similar hotspots, but the ranking of some hotspots differed considerably once SI was integrated. In evaluating accident hotspots, accident severity matters alongside accident frequency because it helps to identify the accidents that cause great damage. According to the above analysis, it is better to take accident SI into account when determining RTA hotspots because the results are more precise and the hotspot rankings are more accurate. On this basis, traffic authorities can more easily devise reasonable solutions for the most dangerous hotspots under limited budgets and resources.
The research identified several important hotspots, for instance the intersections near Cho Tia and Van Dien stations on NH-1A. These two dangerous locations often see serious accidents at all time intervals of the day. Based on the causes and types of accidents at these locations, as well as the above analysis, the authors propose several possible solutions. Both zones have a high population density on both sides of the road, many illegal crossings and passages, many at-grade rail crossings without barriers or signals, insufficient lighting, a narrow road, and no median strip. Therefore, it is necessary to add signs and signal lights at intersections, close the illegal crossings, install a hard median in the middle of the road, add pedestrian walkways, and widen the road lanes. Roadside lighting should be added for nighttime. These countermeasures can also be applied to the remaining hotspots along NH-1A because they have similar characteristics. This is also the first study of this issue in Vietnam, so it can help traffic authorities address the problem not only in Hanoi but also in other cities. However, the paper has several limitations that need to be addressed in the near future. First, traffic volume needs to be considered in identifying RTA hotspots because it has a direct relationship with accident frequency. Second, the statistical significance of these hotspots needs to be tested, because a drawback of the KDE method is that the uncertainty about the exact location of a traffic accident is represented only by the search bandwidth of the kernel (Anderson 2009).

Notes on contributors
Khanh Giang Le is currently a PhD candidate in the PhD program of Civil and Hydraulic Engineering, College of Construction and Development, Feng Chia University, Taichung. He is also a lecturer at the Faculty of Civil Engineering, University of Transport and Communications, Hanoi, Vietnam. He is interested in applying GIS, GPS, spatial statistics, and geospatial analysis in the transportation sector and urban studies.
Pei Liu is an associate professor at College of Construction and Development, Feng Chia University. His areas of expertise include highway engineering, artificial intelligence methods, numerical methods, and pavement engineering.
Liang-Tay Lin is a professor at College of Construction and Development, Feng Chia University. His areas of expertise include traffic engineering, traffic control, traffic flow theory, and urban traffic management.