An image generation approach for traffic density classification at large-scale road network

ABSTRACT With the rapid development of deep learning models, traffic analysis using image datasets has recently attracted more attention. Specifically, network traffic can be represented as images that serve as the input for deep learning models in various applications (e.g. spatio-temporal traffic forecasting). In this study, we propose a new image generation approach for traffic density classification in large-scale road networks. Traffic volume and speed can be measured at certain locations by surveillance systems (e.g. loop detectors). However, density is difficult to measure because it depends on the spatial correlation across the network. Consequently, an effective image generation approach, based on the arrival and departure times of vehicles, is proposed to deal with this problem. In the experiment, traffic density classification using a convolutional neural network is performed on roadside equipment data from 11 continuous intersections to evaluate the effectiveness of the proposed approach.


Introduction
In recent years, Deep Learning (DL) models have achieved superior performance in a variety of application domains (Alom et al., 2019). In Intelligent Transportation Systems (ITS), researchers are interested in applying DL models to analyse traffic flow (e.g. traffic classification and prediction). Specifically, several well-known DL models such as Convolutional Neural Network (CNN) (Chen et al., 2018), Recurrent Neural Network (RNN), Deep Neural Network (DNN), and Deep Reinforcement Learning (DQN) have been successfully applied to traffic flow analysis. The appropriate model for training depends on the assumptions made about the traffic dataset. For instance, CNN-based methods model the road network as an image, which captures the spatial dependency of traffic flow. On the other hand, RNN-based methods (e.g. LSTM networks) are suitable for learning the temporal dependency. Recent research focuses on combining the two aforementioned models to deal with both the spatial and temporal dependencies in traffic flow analysis (Yu et al., 2017). However, conventional models are not able to explore hidden patterns such as the topology of the road network and the connections within it. Therefore, it is difficult to analyse the traffic condition of a large-scale network. Recently, Graph Neural Networks (GNN) have been proposed as a promising concept for improving the ability to learn hidden patterns from spatial-temporal graphs, which is an important issue for various applications such as human action recognition and traffic prediction (Wu et al., 2019). Nonetheless, the dynamicity of graphs has become an emergent issue: graph convolution has to be performed under dynamic spatial relations, especially in the transportation domain, where the spatial dependencies change over time with the road condition (e.g. traffic congestion) (Diao et al., 2019).
Based on insights from previous works, we find that representing the road network as static images for the input datasets is an important step in applying DL models to traffic analysis (Li et al., 2017). Particularly, traffic datasets that have been collected from connected devices on the road (e.g. VDS, RSE, and GPS) can be converted to images with both spatial and temporal relations (Yao et al., 2018). Consequently, the network traffic can be modeled with grid or graph structures. Specifically, Figure 1 depicts the main steps for traffic analysis from traffic datasets using the image generation approach.
In this study, we propose a new approach for converting network traffic to images. Specifically, we focus on how to classify traffic density in a large-scale road network for spatial-temporal traffic flow analysis. The contributions of this paper can be summarized as follows:
(i) We propose a traffic density classification approach for classifying the traffic condition, which is an emergent issue in providing solutions for improving traffic flow in large-scale road networks (e.g. congestion detection, dynamic traffic light control systems).
(ii) A new image generation method is presented for covering the traffic information in terms of both spatial and temporal correlations. Specifically, in contrast to previous works, the images in this paper are generated based on the average travel speed values, which specify the traffic condition at each location (i.e. sensor's location) following the spatial and temporal relations.
(iii) The evaluation is executed on a real traffic dataset, which is collected and pre-processed from continuous intersections in an urban area. Therefore, we are able to evaluate the performance of the proposed method by estimating the traffic condition in a large-scale road network.
The rest of this paper is organized as follows. In Section 2, we review the literature on DL models for traffic analysis, together with previous works on representing the traffic road network as images and applications of image classification. In Section 3, we present our proposed approach for the traffic density classification problem in a large-scale road network. The experimental results, which are evaluated on real traffic datasets from an urban area, are shown in Section 4. Section 5 includes discussions and future work.

Deep learning models for traffic analysis
The rapid growth of traffic data has become an emergent challenge in Intelligent Transportation Systems (ITS), because traditional processing systems are not able to meet the data analytics requirements. Recently, DL has been introduced as a promising approach to deal with characteristic problems of traffic data such as high nonlinearity, time variation, and randomness (Zhu et al., 2019). Specifically, different DL models enable different data representations to be learned for different applications. Figure 2 depicts the applications of DL models for different fundamental tasks in ITS. The preliminary background of several well-known DL models is described as follows:
(i) Deep Neural Network (DNN): an artificial neural network (ANN) with multiple layers between the input and output layers. Several variants of DNN, including Multilayer Perceptron (MLP), Deep Belief Network (DBN), and Stacked Auto-Encoder (SAE), which differ mainly in the design of the hidden layers, have been widely applied for traffic prediction (Lv et al., 2015).
(ii) Recurrent Neural Network (RNN): designed to model sequence data. Specifically, Long Short-Term Memory (LSTM) is a special kind of RNN that is capable of learning long-term dependencies, addressing a limitation of the conventional RNN model (e.g. vanishing or exploding gradients). Therefore, this model has proved capable of training on time series data in ITS.
(iii) Convolutional Neural Network (CNN): starts with a convolutional layer to extract common patterns of the training instances, and has achieved great success in image classification in the ImageNet competition (Krizhevsky et al., 2012). The model is most commonly applied to analysing visual imagery. Specifically, CNN has been used for visual recognition in traffic analysis (e.g. vehicle counting). Furthermore, the hybrid model, which combines CNN and LSTM, has recently emerged for traffic analysis because it incorporates both the spatial and temporal dependencies in the traffic flow.
(iv) Deep Reinforcement Learning (DQN): a DNN is set up to learn the Q-function of reinforcement learning from inputs such as internal states and a set of actions. The objective is to maximize the future rewards through a sequence of actions. This concept has been applied successfully to improve traffic flow in urban areas.

Representing network traffic as images
Time series analysis based on DL models has exploited more complex architectures and can achieve better results than traditional methods. However, when applying DL models to the transportation domain, there are still challenges that need to be taken into account:
(i) Traffic time series capture temporal correlations, but long-term traffic analysis is difficult, especially in the case of recurring incidents (e.g. rush hours or accidents), which can cause non-stationarity.
(ii) The models mainly focus on training on the dataset of a single road location or a small network region (e.g. an intersection), because the traffic data from sensors do not consider the spatial correlations from the perspective of the network.
Recent studies apply Graph Neural Networks (GNN) (Zhu et al., 2019) to deal with the spatial-temporal correlation problem. Specifically, several state-of-the-art GNN models such as Spatiotemporal Recurrent Convolutional Networks (SRCN), Spatial-temporal Graph Neural Networks (STGNN), Spatio-Temporal Graph Convolutional Networks, and Graph Multi-Attention Network (GMAN) (Zheng et al., 2020) have been proposed as emergent models for analysing the temporal evolution and spatial dependencies in large-scale road networks. However, these methods mainly focus on the traffic forecasting problem (Yin et al., 2020), which analyses and predicts traffic flow at each individual point (i.e. sensor's location).
For estimating the traffic condition (e.g. congestion estimation) in large-scale areas, representing the road network as an image for the input data has become a promising method. Specifically, the traffic road network is converted into a time-space matrix, which is constructed using time and space dimension information (e.g. vehicle position or average speed) as shown in Figure 3.
However, the drawback of this approach is that the spatial structure of the traffic road network is represented in Euclidean space among road sections. Therefore, it is difficult to specify the traffic condition among road sections. In this regard, this paper proposes a new method for generating images from the traffic road network based on average traffic features (e.g. traffic volume, traffic speed). To the best of our knowledge, this is the first paper that focuses on converting network traffic to images in order to classify the traffic condition in a large-scale road network.
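For reference, the time-space matrix used by the earlier approaches (Figure 3) can be sketched as follows. This is a minimal illustration, not the construction used in any of the cited works: the record layout `(timestamp_s, section_index, speed)`, the function name `time_space_matrix`, and the 300-second interval are assumptions made for the example.

```python
from collections import defaultdict

def time_space_matrix(records, n_sections, n_intervals, interval_s=300):
    """Build a time-space matrix of mean speeds.

    records: iterable of (timestamp_s, section_index, speed) tuples
             (a hypothetical layout chosen for this sketch).
    Rows are time intervals, columns are road sections; cells without
    any observation stay None.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, section, speed in records:
        row = int(ts // interval_s)
        if 0 <= row < n_intervals and 0 <= section < n_sections:
            sums[(row, section)] += speed
            counts[(row, section)] += 1
    return [
        [sums[(r, c)] / counts[(r, c)] if counts[(r, c)] else None
         for c in range(n_sections)]
        for r in range(n_intervals)
    ]

# Two observations on section 0 in the first interval,
# one observation on section 1 in the second interval.
records = [(10, 0, 40.0), (200, 0, 50.0), (320, 1, 20.0)]
matrix = time_space_matrix(records, n_sections=2, n_intervals=2)
```

Each column of such a matrix corresponds to one road section placed next to its neighbour on a regular grid, which is the Euclidean layout that the proposed method aims to avoid.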

Image classification using CNN model
CNN is the most popular neural network model for the image classification problem. Specifically, the model follows a hierarchical architecture to build a network. Subsequently, the output is processed by a fully-connected layer in which all the neurons are connected to each other. Figure 4 depicts a CNN architecture for classifying handwritten digit datasets (i.e. the MNIST dataset). CNN-based image classification has been applied in various applications. For instance, Li et al. (2014) present a medical image classification approach using CNN for classifying lung image patches with interstitial lung disease. Hershey et al. (2017) adopt various CNN architectures (e.g. AlexNet, VGG, Inception, and ResNet) to classify the soundtracks of large-scale audio datasets. Seo and Shin (2019) propose a hierarchical convolutional neural network for fashion image datasets. Regarding the transportation domain, our previous works apply CNN architectures for feature extraction to analyse traffic flow from CCTV and road sound datasets (Bui, Oh et al., 2020).

Proposed approach
Traffic information that has been collected from connected devices on the road network is considered along the time and space dimensions in order to classify traffic congestion. In this regard, we first present the large-scale traffic density classification problem. Then, an image generation approach is proposed for representing the traffic conditions.

Image-based traffic density classification
The main focus of traffic analysis with the proposed image generation approach is to classify the traffic density in a large-scale road network. Specifically, traffic conditions are represented by three conventional parameters: traffic volume, speed, and density. Surveillance systems (e.g. loop detectors) are able to measure the traffic volume and speed. However, measuring the density is difficult, a problem that has recently attracted more attention (Chung & Sohn, 2018; Kurniawan et al., 2018). Normally, traffic densities can be classified into different conditions according to the time of day. For instance, Figure 5 shows the traffic volume of a symmetric road during one day, which has been collected from a Vehicle Detection System (VDS). Consequently, traffic conditions can be divided into three groups, namely low, normal, and high densities, according to the time of day.
In this regard, supposing $L = \{l_1, l_2, \ldots, l_n\}$ denotes a set of traffic condition labels, the traffic density classification problem is regarded as a supervised learning problem in which, given a training set $D = \{(X_i, Y_i)\}$ of images $X_i$ with labels $Y_i \in L$, the objective is to learn a multi-label classifier from $D$ to predict the labels of new images.

Image generation approach
The traffic information can be represented with its spatio-temporal correlation as a graph G(V, E), in which V is the set of nodes (e.g. intersections) and E denotes the set of links, which are the possible directions among nodes. The time interval depends on the sampling resolution of the connected devices. Normally, data might be aggregated over several minutes (e.g. 5 minutes) for analysing the traffic condition. In this regard, the research question is how to represent the weight (traffic density) among nodes for generating the image datasets. Figure 6 illustrates the image generation process of our proposed approach. Specifically, surveillance systems (e.g. loop detectors) collect traffic information such as speed and traffic volume at a certain location. Since we consider the density of traffic flow in a large road network, the speeds of vehicles are taken into account. However, the average speed at a certain location cannot explain the traffic condition (e.g. congestion or non-congestion) in detail. In this regard, we calculate the travel time based on the arrival and departure times of vehicles in order to represent the traffic conditions at intersections. The main steps of this process are described in the following subsections.
Figure 5. Traffic volumes of a symmetric road with two sensors.

Traffic condition representation
The average travel speed between two nodes/intersections is calculated as the distance divided by the travel time (Equation (1)), where $d_{i,j}$ is the distance from intersection $i$ (start point) to intersection $j$ (end point) and $\bar{t}_{i,j}$ is the approximated travel time between the two intersections. The travel time is calculated (Equation (2)) from $t^v_i$ and $t^v_j$, the arrival and departure times of vehicle $v$ at intersections $i$ and $j$, respectively. Since the distance is constant, the average travel speed depends on the travel time values. In this research, the travel time is used to represent the traffic conditions (e.g. congestion or non-congestion).
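The computation described above can be sketched as follows. The displayed forms of Equations (1) and (2) are not available in this text, so the sketch rests on assumptions: the approximated travel time is taken to be the arithmetic mean of the per-vehicle differences between the times recorded at the two intersections, and the function names, dictionary layout, and units are hypothetical.

```python
def mean_travel_time(passages_i, passages_j):
    """Mean travel time (s) over vehicles seen at both intersections.

    passages_i / passages_j map a vehicle id to the time (s) at which the
    vehicle was recorded at the start / end intersection.
    Assumption: the travel time is averaged over vehicles.
    """
    times = [
        passages_j[v] - passages_i[v]
        for v in passages_i
        if v in passages_j and passages_j[v] > passages_i[v]
    ]
    if not times:
        return None  # no vehicle completed the link in this interval
    return sum(times) / len(times)

def mean_travel_speed(distance_m, travel_time_s):
    """Average travel speed in km/h from a distance (m) and a time (s)."""
    return distance_m / travel_time_s * 3.6

at_i = {"a": 0.0, "b": 10.0, "c": 20.0}
at_j = {"a": 60.0, "b": 100.0}          # vehicle "c" never reached j
t_ij = mean_travel_time(at_i, at_j)      # mean of 60 s and 90 s
v_ij = mean_travel_speed(500.0, t_ij)    # 500 m link, roughly 24 km/h
```

Because the distance of a link is fixed, the travel time alone carries the information about the traffic condition, which is why the sketch returns it as a separate quantity.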

Image representation at single intersection
Since an intersection consists of three or more legs, we use polygons to present the traffic conditions at intersections. Specifically, the polygon is a triangle, square, or pentagon for an intersection with 3, 4, or 5 legs, respectively. Furthermore, since we represent the traffic condition using the travel time of vehicles, the radius of the polygon is the maximum travel time of vehicles, expressed through $I_{max}$, the maximum normalized travel time at a certain intersection. The values $t_{max}$ and $t_{min}$ are calculated based on the maximum and minimum speeds. Normally, in the field of transportation, we are able to determine the maximum and minimum speeds from the posted speed limit $v_{lim}$, which depends on the type of road. In this regard, Equation (3) can be re-formulated with $\alpha$, a value that defines the ratio between the maximum speed and the posted speed limit. Figure 7 illustrates the concept of converting traffic conditions at an intersection with 4 legs to a polygon image at a certain time. Specifically, the vertices of the polygon are computed from the values of the average travel time in the different directions (e.g. North, East, West, and South), which are computed from the approximated travel time $\bar{t}_{i,j}$ (Equation (2)).
Figure 6. Image generation process of the proposed approach.
In particular, the area of the polygon represents the traffic condition of the area around the intersection. Specifically, if the area is close to the maximum value ($I_{max}$), the traffic is congested. For instance, the example in Figure 6 shows that congestion occurs in the East direction of the intersection, while the other directions show normal traffic flow.
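A minimal sketch of the polygon construction is given below. Several details are assumptions, because the exact form of the normalization is not available in this text: the travel time is min-max normalized between `t_min` (travel at the maximum speed, `alpha` times the posted limit) and `t_max` (travel at a hypothetical minimum speed `v_min_kmh`), the legs are spaced at equal angles around the centre, and the area is obtained with the shoelace formula.

```python
import math

def normalized_travel_time(t, distance_m, v_lim_kmh, alpha=1.5, v_min_kmh=5.0):
    """Min-max normalize a travel time (s) into [0, 1] (assumed form)."""
    v_max = alpha * v_lim_kmh / 3.6   # maximum speed in m/s
    v_min = v_min_kmh / 3.6           # hypothetical minimum speed in m/s
    t_min = distance_m / v_max        # fastest plausible travel time
    t_max = distance_m / v_min        # slowest plausible travel time
    ratio = (t - t_min) / (t_max - t_min)
    return min(1.0, max(0.0, ratio))

def polygon_vertices(radii):
    """One vertex per leg, with legs spaced evenly around the centre."""
    n = len(radii)
    return [
        (r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
        for k, r in enumerate(radii)
    ]

def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    n = len(vertices)
    s = 0.0
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# Four legs ordered East, North, West, South; only East is congested.
radii = [0.9, 0.3, 0.3, 0.3]
verts = polygon_vertices(radii)
area = polygon_area(verts)
full_area = polygon_area(polygon_vertices([1.0, 1.0, 1.0, 1.0]))
```

Comparing `area` with `full_area` gives a single number for how close the intersection is to the fully congested shape, while the individual radii still show which direction is affected.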

Converting network traffic to images
Supposing the network traffic includes multiple intersections, we can represent the network as a graph G(V, E), where V is the set of intersections and E denotes the set of possible links among intersections. Consequently, the traffic condition of the road network is represented based on the densities of traffic flow at each intersection, as illustrated in Figure 8.
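One possible way to compose the per-intersection polygons into a single network image is sketched below. The text does not specify how the polygons are positioned or rendered, so the pixel centres, the `scale` factor, the binary pixel values, and the function names are all assumptions made for illustration.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def render_network(polygons, centres, width, height, scale):
    """Rasterise per-intersection polygons onto one binary image.

    polygons: list of vertex lists in unit coordinates (radius <= 1).
    centres:  list of (col, row) pixel positions, one per intersection.
    scale:    pixels per unit radius.
    """
    image = [[0] * width for _ in range(height)]
    for verts, (cx, cy) in zip(polygons, centres):
        placed = [(cx + scale * vx, cy + scale * vy) for vx, vy in verts]
        for row in range(height):
            for col in range(width):
                # Test the centre of each pixel against the placed polygon.
                if point_in_polygon(col + 0.5, row + 0.5, placed):
                    image[row][col] = 1
    return image

# A congested intersection (large polygon) next to a free-flowing one.
congested = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
free_flow = [(0.3, 0.0), (0.0, 0.3), (-0.3, 0.0), (0.0, -0.3)]
image = render_network([congested, free_flow], [(8, 8), (24, 8)],
                       width=32, height=16, scale=6)
```

In this sketch a congested intersection covers more pixels than a free-flowing one, so the filled area of the image grows with the density of the network.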

Data description and experiment setup
In this study, data from roadside equipment (RSE) are taken into account. Specifically, RSE devices have been installed at major intersections and are able to connect to on-board equipment (OBE). When a vehicle arrives at an intersection, the OBE automatically connects to the RSE through a communication system based on dedicated short range communication (DSRC) technology. We collect and pre-process RSE data of a test area over 5 months, which includes 11 continuous intersections, in order to analyse traffic conditions as shown in Figure 9. Specifically, the traffic density images are generated based on the values of the maximum speed, the posted speed limit, and the approximated travel time of each node/intersection. In particular, the posted speed limit $v_{lim}$ is set to around 10 KPH (6.2 mph), and the desired maximum speed equals 1.5 times the posted speed.
Furthermore, we consider the scenario of traffic flow in the morning (6AM to 10AM) at the test area for evaluating our proposed method, and the time interval for each record/image is set to 5 min. Consequently, the traffic image dataset includes 5040 samples for training, 720 samples for validation, and 1440 samples for testing. Specifically, the set of traffic condition labels L is described as follows:
Label 1: from 6AM to 7AM, traffic condition is low density.
Label 2: from 7AM to 8AM, traffic condition is normal density.
Label 3: from 8AM to 9AM, traffic condition is normal density.
Label 4: from 9AM to 10AM, traffic condition is high density.
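The labelling rule and the dataset size can be checked with a short sketch. The function name `density_label` is hypothetical, and the day count assumes that exactly one image is generated per 5-minute interval for the whole network, which is consistent with the 5 months of data mentioned above but is not stated explicitly.

```python
def density_label(hour):
    """Map an hour of day (6 <= hour < 10) to the label index in the text."""
    if not 6 <= hour < 10:
        raise ValueError("only the 6AM-10AM window is labelled")
    return int(hour) - 5   # 6 -> 1, 7 -> 2, 8 -> 3, 9 -> 4

intervals_per_day = 4 * 60 // 5          # 48 five-minute images per morning
total_samples = 5040 + 720 + 1440        # 7200 images in the dataset
days_covered = total_samples / intervals_per_day   # 150 mornings
split = [n / total_samples for n in (5040, 720, 1440)]
```

Under this assumption the dataset covers 150 mornings, i.e. about 5 months, and the three subsets correspond to a 70/10/20 split.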
Regarding the training model, the architecture of the proposed CNN is determined by hyperparameter optimization using the Tree-structured Parzen Estimator (TPE) (Bergstra et al., 2011). The hyperparameters of the model are shown in Table 1. Figure 10 demonstrates the accuracy of the trained model on the new image dataset. Specifically, we compare the results according to the number of labels. We are able to achieve a training accuracy of over 0.95 when classifying low density and high density. In the cases of 3 and 4 labels, the training accuracies are 0.86 and 0.74, respectively.

Experiment result
In more detail, Table 2 shows the results of all cases for the classification of the generated image datasets. The experimental results indicate that the generated image dataset can be used to classify the traffic density of a large-scale road network, especially when distinguishing congested and non-congested conditions. Consequently, the traffic condition can be estimated at certain times to provide smart traffic control for improving the traffic flow.
Moreover, we are able to improve the classification by increasing the number of layers in the CNN architecture. For instance, Figure 11 demonstrates the training results obtained by adopting two well-known CNN architectures, ResNet50 (He et al., 2016) and Inception-v3 (Szegedy et al., 2016) (without pre-trained models). However, applying the aforementioned CNN models raises several challenging issues, such as long training time and transfer learning problems. Therefore, we will address this issue in future work.
Figure 10. Results of the trained model with different values of labels.

Conclusion and future work
Recently, text-to-image generation has emerged as an approach that provides many applications using Deep Learning models, and transportation systems are no exception. In this study, we propose a new image generation approach to deal with traffic density classification in large-scale network traffic. Specifically, the traffic density at a certain area (e.g. an intersection) can be represented based on the average travel time of vehicles. Consequently, we are able to classify the traffic conditions from the generated images using a CNN. RSE data from an urban area with traffic flow from 11 continuous intersections have been collected and pre-processed to evaluate our proposed approach. The experiment shows promising results for classifying the traffic density of large-scale network traffic. From our point of view, there are several ways to extend the trained model in order to improve performance: (i) designing a custom model that focuses on specific traffic datasets. Specifically, compared with other applications, traffic image datasets have several different characteristics, such as a one-channel value (e.g. traffic speed) and abstract features in the input images; (ii) increasing the depth of the CNN architecture, which enables learning more complex relations while keeping the time consumption acceptable. The aforementioned problems are interesting issues that we will take into account in future work.

Acknowledgements
This work was partly supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-00494, Development of deep learning-based urban traffic congestion prediction and signal control solution system) and Korea Institute of Science and Technology Information (KISTI) grant funded by the Korea government (MSIT) (K-20-L02-C09-S01).

Disclosure statement
No potential conflict of interest was reported by the author(s).