A numerical-analysis-based optimization method for location selection for planning residential areas in grid transportation networks

ABSTRACT China's economy has grown rapidly over the past few decades; however, a growing number of modern cities face problems caused by poor city planning, in areas such as traffic networks, resource distribution and routine management. Optimization methods are urgently needed during city development. This paper addresses the residential area issue from a location planning perspective and proposes a numerical-analysis-based method for selecting proper residential areas in grid transportation networks. First, a grid layout is introduced to formalize a real traffic network. To guarantee that the approximation is realistic, a considerable amount of data is obtained from real-world scenarios based on the shortest routes to demonstrate the problem of traffic jams. Then, a quantitative evaluation system is proposed, which evaluates a specified location with the traffic flow distribution index, the traffic congestion index and the infrastructure convenience index. Each index reflects the attitude of citizens from a distinct perspective, which affects the final location planning and selection. Finally, an analytic hierarchy process is designed to combine these indices into a comprehensive score, and case studies and experiments are conducted to demonstrate the effectiveness of the proposed method. The generated residential location can serve as a technical reference for further city planning and development decisions.


Introduction
With the development of China's economy, many cities in China have reached a relatively developed level [1]. However, many formidable traffic problems have also emerged because these cities did not prepare sufficiently for such growth. More and more people are realizing that temporary methods and adaptations will not take effect quickly even if they are implemented immediately [2]. In this situation, failures and shutdown inspections are not acceptable to the public. At present, people spend considerable time in the transportation system every day because of traffic jams [3]. Thus, an effective method for planning transportation and residential areas is required to address further challenges of city development, such as population and traffic [4].
A grid is the ideal layout in modern cities [5]. In places such as Manhattan, New York, people have far more available routes and experience relatively fewer traffic jams than in other comparably busy locations. In addition, it is easy to locate a destination quickly. For example, Times Square is located at the intersection of 46th Street and 7th Avenue. Every location can be precisely identified by simply providing the street and avenue numbers. A grid layout is a modern trend; however, for historical reasons, it is difficult for most cities in China to change [6]. Given the substantial traffic, congestion cannot be resolved without a major change, so planning under a grid layout is needed to prepare for the future [7]. Thus, in this paper, a grid layout is naturally considered. The following question then arises: What should our study mainly focus on? Cities are vast combinations. The complexity of a city can vary from a single castle to an entire country. However, irrespective of what a city could be, humans are always the core. Thus, the residential area is selected as the main topic of this study. The location of residential areas is the key point of city planning.
Selecting the target location of residential areas is an optimization problem with constraints in the real world [8]. Both city planning and the demands of residents need to be taken into consideration. These two aspects are closely related. From this perspective, this paper focuses on how to select and evaluate the locations of residential areas in an existing city plan.
Adding a residential area breaks the existing balance, and the surrounding community then develops towards a new balance. To illustrate this deviation, a traffic flow distribution index is introduced [9]. Meanwhile, traffic congestion is a problem that cannot be ignored. Therefore, a traffic congestion index is defined to evaluate whether the selection would cause substantial traffic congestion in the future [10]. The distance between the infrastructure facilities and the residential area directly affects how citizens feel about a location, so it is also an important aspect, and it is quantified as the infrastructure convenience index [11]. Finally, an analytic hierarchy process (AHP) model is introduced to obtain the weight of each index, and then each location selection can be quantitatively evaluated.
The remainder of this paper is organized as follows. Section 2 reviews the background material. Section 3 proposes the traffic flow distribution simulation method. Section 4 presents the definition and the implementation details. Section 5 conducts the case study and experimental analysis. Section 6 concludes this paper and presents directions for future work.

Research background
In this section, we present examples to show the motivation for this research. For example, Chongqing, one of the largest cities in China, has long been criticized for its traffic congestion. Chongqing was even ranked as the most congested city in China. In fact, considering the high levels of car ownership, it is rather reasonable for Chongqing to be more congested than some large cities, such as Beijing and Shanghai. However, the area of Chongqing is 5.02 times larger than that of Beijing, 6.92 times larger than that of Tianjin and 13.00 times larger than that of Shanghai. The characteristics of these cities are shown in Table 1. From this perspective, the congestion situation in Chongqing appears to be quite abnormal.
In fact, most congestion could be dispersed or even eliminated [12]. The current situation in Chongqing is completely due to its poor city planning. As shown in Figures 1 and 2, the roads in Chongqing are randomly scattered, making the traffic network extremely complex [13].
As illustrated above, the roads were not constructed in an orderly manner. Consequently, a highly efficient traffic network cannot be formed, and the traffic situation is stuck in a vicious loop [14]. Although some branches have been constructed to disperse the traffic, their effect is limited and is still far from what was expected. In a grid layout, the inner roads are supposed to share the same or a similar level of priority. All of the roads can disperse the traffic flow, fully utilizing the space [15,16]. As a comparison, Figure 3 shows the situation of Xi'an. The roads in Xi'an are strictly arranged based on a grid layout. When travelling to a destination, people can follow many different routes that have the same distance. This arrangement directly disperses the traffic flow since drivers do not need to worry about an increase in distance. Congestion can be avoided by simply changing to another route. Grid layouts are a promising trend. Fortunately, several cities in China have begun to use grid layouts. Shanghai is among such cities and sets a good example for other cities [17][18][19].
In China, increasingly more cities will face such transportation problems. Thus, studying the application of a grid layout and providing a scientific method to perform city planning are of great significance [20].
To handle the optimization problem of residential area selection, a numerical-analysis-based method is proposed considering a grid transportation network, as shown in Figure 4. In this paper, the traffic flow is simulated first. On this basis, each of the three indices can be calculated once a residential location is selected. After iterating over all possible locations, we obtain the score of each location. Finally, the location with the highest score is selected as the optimal location.

The traffic flow simulation
In reality, it is impossible to acquire a precise real-time traffic flow. Therefore, a model should be established to perform the simulation. Two fundamental aspects should first be considered [21,22]: (1) people tend to choose the shortest route to their destination to save time and expenses, and (2) if traffic jams occur on the shortest route, people might drive farther to avoid the congestion but still attempt to find the shortest route to their destination. Based on these two points, the concept of 'shortest routes' is defined. In a grid layout, once the destination is fixed, there can be several available routes with the same minimal distance. These routes are called 'shortest routes' in this paper. The more shortest routes pass through a point, the more potential traffic flow that point carries [23]. To better understand this situation, some basic concepts are introduced below. For clarity, each point is marked with the number of shortest routes that reach it from the departure point. For example, in a grid network with only one grid, the bottom-right point has 2 shortest routes. Thus, the number 2 is written next to the point, as shown in Figure 5.
Then, the situation of 4 grids is considered, as shown in Figure 6.
In Figures 5 and 6, the number of a certain point's shortest routes is the sum of the numbers on its left and top. In fact, the number of shortest routes could be given in a general way.
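As shown in Figure 7, $v_{0,0}$ stands for the departure point and $v_{i,j}$ refers to the destination point. The term $N(v_{0,0}, v_{i,j})$ represents the number of shortest routes between point $v_{0,0}$ and point $v_{i,j}$. Every point on the first row or the first column is reached by exactly one shortest route, so $N(v_{0,0}, v_{0,j}) = 1$. Following this pattern, the number of shortest routes between point $v_{0,0}$ and point $v_{1,j}$ is

$$N(v_{0,0}, v_{1,j}) = j + 1.$$

The situation with departure point $v_{0,0}$ and destination point $v_{2,n}$ is now considered. According to the above examples, the number of shortest routes is given by adding $N(v_{0,0}, v_{1,n})$ and $N(v_{0,0}, v_{2,n-1})$:

$$N(v_{0,0}, v_{2,n}) = N(v_{0,0}, v_{1,n}) + N(v_{0,0}, v_{2,n-1}).$$

The value of $N(v_{0,0}, v_{2,n-1})$ can also be obtained with a similar method:

$$N(v_{0,0}, v_{2,n-1}) = N(v_{0,0}, v_{1,n-1}) + N(v_{0,0}, v_{2,n-2}). \quad (2)$$

Then,

$$N(v_{0,0}, v_{2,n}) = N(v_{0,0}, v_{1,n}) + N(v_{0,0}, v_{1,n-1}) + N(v_{0,0}, v_{2,n-2}).$$

If $N(v_{0,0}, v_{2,j})$ continues to be substituted in the equation, then the eventual equation would be

$$N(v_{0,0}, v_{2,n}) = \sum_{j=0}^{n} N(v_{0,0}, v_{1,j}) = \frac{(n+1)(n+2)}{2}.$$

Similarly,

$$N(v_{0,0}, v_{3,n}) = \sum_{j=0}^{n} N(v_{0,0}, v_{2,j}).$$

Then, the following can be deduced:

$$N(v_{0,0}, v_{i,j}) = \sum_{k=0}^{j} N(v_{0,0}, v_{i-1,k}) = \binom{i+j}{i}.$$

This gives a general expression for the number of shortest routes between two fixed points.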
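The left-plus-top rule described above can be checked numerically. The following is a minimal Python sketch, not the authors' implementation; the function names `count_shortest_routes` and `closed_form` are hypothetical, and the closed form used for comparison is the standard binomial count of monotone lattice paths.

```python
from math import comb


def count_shortest_routes(rows, cols):
    """Return a (rows+1) x (cols+1) table n where n[i][j] is the number of
    shortest routes from v_{0,0} to v_{i,j} on an unobstructed grid."""
    n = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows + 1):
        for j in range(cols + 1):
            if i == 0 or j == 0:
                n[i][j] = 1  # a single straight route along the border
            else:
                n[i][j] = n[i - 1][j] + n[i][j - 1]  # top neighbour + left neighbour
    return n


def closed_form(i, j):
    """Binomial count of monotone lattice paths from (0, 0) to (i, j)."""
    return comb(i + j, i)


if __name__ == "__main__":
    for row in count_shortest_routes(2, 2):
        print(row)
```

For a single grid cell, the table entry at the bottom-right point is 2, which matches the value marked in the one-grid example.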
However, the above result cannot account for areas with barricades, such as buildings and parks. To ensure the practicability of the model in reality, these barricades should be taken into consideration. To simplify this problem, the situation with only one barricade is considered. The barricade can be regarded as a neighbourhood $L_{ij}(a, b)$ occupying the space of $a \times b$ grids [21]. The neighbourhoods in China are designed as closed neighbourhoods, which means that the roads inside the neighbourhoods are not available to the external traffic system. Thus, the roads inside the neighbourhoods are removed, forming the traffic network shown in Figure 8.
After the analysis, the new network can be divided into two parts. Let the number of shortest routes of part one be $N_{L_{ij}(a,b)}(P_1)$ and the number of shortest routes of part two be $N_{L_{ij}(a,b)}(P_2)$. Because the methods to calculate the two parts are symmetric, only the method to calculate $N_{L_{ij}(a,b)}(P_1)$ is discussed in this paper. The rectangular area, or so-called barricade, does not interfere with the result in the area to the left of and above the rectangle. It only changes the result of the area to the right of and below the rectangle. Additionally, recall that the relation between $N(v_{0,0}, v_{k,j})$ and the values at its neighbouring points is already known. Then, $N_{L_{ij}(a,b)}(P_1)$ follows by applying this relation step by step, and $N_{L_{ij}(a,b)}(P_2)$ can be generalized by symmetry. The total number of shortest routes is given by combining $N_{L_{ij}(a,b)}(P_2)$ and $N_{L_{ij}(a,b)}(P_1)$.
Take a 4 × 4 network as an example, and set the barricade of a 2 × 2 rectangle at the centre in Figure 9. The number of shortest routes can be presented as a matrix, as shown in Figure 10.
Since the number of each point is only related to its neighbouring point, the result can only be calculated step by step. Thus, the existence of the barricade makes the calculation considerably more complicated. Additionally, the networks with more than one barricade need to be divided into several subnetworks composed of one barricade to complete the calculation. After being calculated, they should then be added together and presented as a matrix, forming a rough estimate of the original distribution of the target area, as shown in Figure 11.
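The step-by-step calculation with one barricade can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the paper's procedure: routes are assumed to move only right or down, the closed neighbourhood is modelled by removing every road segment that lies strictly inside the block, and the function name and its parameters (`top`, `left`, `a`, `b`) are hypothetical.

```python
def count_routes_with_barricade(rows, cols, top, left, a, b):
    """Count right/down routes from v_{0,0} to every node of a rows x cols grid
    in which the a x b block of cells whose top-left node is (top, left) is closed.
    Road segments strictly inside the block are removed; its boundary stays open."""
    n = [[0] * (cols + 1) for _ in range(rows + 1)]
    n[0][0] = 1
    for i in range(rows + 1):
        for j in range(cols + 1):
            if i == 0 and j == 0:
                continue
            # Vertical segment (i-1, j) -> (i, j) is inside the block?
            vertical_closed = top < i <= top + a and left < j < left + b
            # Horizontal segment (i, j-1) -> (i, j) is inside the block?
            horizontal_closed = top < i < top + a and left < j <= left + b
            from_top = n[i - 1][j] if i > 0 and not vertical_closed else 0
            from_left = n[i][j - 1] if j > 0 and not horizontal_closed else 0
            n[i][j] = from_top + from_left
    return n


if __name__ == "__main__":
    # 4 x 4 network with a 2 x 2 barricade at the centre.
    for row in count_routes_with_barricade(4, 4, top=1, left=1, a=2, b=2):
        print(row)
```

Working at the level of road segments rather than nodes keeps the sketch correct for thin blocks (a = 1 or b = 1), where no node lies strictly inside the barricade but interior segments still have to be removed.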

The traffic flow distribution index
The centre part is generally the busiest area in a city. Then, the traffic flow will gradually decrease as it becomes closer to the suburbs, unless there is a subcentre. This pattern is reasonable, and it is exploited in this paper. We hope that the traffic flow can be relatively evenly distributed such as in the aforementioned situation. Then, it is natural to set a standard value, which one can call the ideal distribution. If this value is exceeded, then the traffic condition of that location would be considered poor. Based on this situation, the traffic flow distribution index is defined. Its value increases when the traffic flow distribution is more reasonable, which means that it better fits the city planning [24]. The method to calculate this index is given below.
This index directly shows the weight of each point. Before presenting the method to calculate the ideal distribution $p_{v_{(i,j)}}$, a function needs to be established to compute the distance to the city centre. Let $r_{v_{(i,j)}}$ denote the Euclidean distance from point $v_{(i,j)}$ to the city centre. A second function is needed to help interpret the distribution: it distinguishes different parts of the city by setting the boundary line $R$, which makes it convenient to apply different ideal distributions. In fact, the two can be combined into one function.

Definition 4.2:
The ideal distribution index of $v_{(i,j)}$ is

$$p_{v_{(i,j)}} = \begin{cases} f_1(r_{v_{(i,j)}}), & r_{v_{(i,j)}} \le R, \\ f_2(r_{v_{(i,j)}}), & r_{v_{(i,j)}} > R. \end{cases} \quad (14)$$

This formula indicates the change in the traffic distribution starting from the city centre. $f_1(r_{v_{(i,j)}})$ refers to the city centre, and $f_2(r_{v_{(i,j)}})$ stands for the suburbs [25][26][27]. If the city is very large, such as Tokyo, then more functions could be added to illustrate the complicated distribution [28].
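The two-zone structure of the ideal distribution can be illustrated with a short Python sketch. It assumes that the combined function is piecewise in the distance to the centre with threshold R; the particular shapes used for `f1` and `f2` below are placeholders chosen only for illustration and are not taken from the paper.

```python
import math


def distance_to_centre(i, j, centre):
    """Euclidean distance from grid point (i, j) to the city centre."""
    return math.hypot(i - centre[0], j - centre[1])


def ideal_distribution(i, j, centre, radius, f1, f2):
    """Apply f1 inside the boundary line R (city centre) and f2 outside it (suburbs)."""
    r = distance_to_centre(i, j, centre)
    return f1(r) if r <= radius else f2(r)


# Placeholder shapes (assumptions): linear decay in the centre,
# exponential decay in the suburbs, continuous at r = R = 2.
def f1(r):
    return 1.0 - 0.1 * r


def f2(r):
    return 0.8 * math.exp(-0.5 * (r - 2.0))


if __name__ == "__main__":
    for j in range(5):
        print(j, round(ideal_distribution(2, 2 + j, (2, 2), 2.0, f1, f2), 4))
```

More zones, as suggested for very large cities, would amount to adding further thresholds and functions to the same dispatch.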
In addition, the traffic flow can always be related to its economic condition. Thus, some external factors need to be considered, such as its economic environment and industrial background. Take the largest city in China, Shanghai, as an example; it has held the first position for a long time in terms of population, economy, and particularly traffic. This is shown in Table 2. The data of the population growth are shown in Table 3.
Based on this information, an ideal traffic flow function can be obtained through interpolation. From the perspective of each point, after obtaining the actual traffic flow $q_{v_{(i,j)}}$ and the ideal distribution index $p_{v_{(i,j)}}$, their difference ratio can be illustrated through the reasonable scale $\rho_{v_{(i,j)}}$.

Definition 4.3:
The reasonable scale $\rho_{v_{(i,j)}}$ of each point measures the difference ratio between the ideal distribution and the actual traffic flow at that point. The value $\rho_{v_{(i,j)}}$ increases as the traffic flow better fits the ideal distribution. The traffic network reaches a critical condition when $\rho_{v_{(i,j)}}$ takes a value of 0. When $\rho_{v_{(i,j)}} < 0$, the traffic flow is very high, indicating considerable traffic congestion. Finally, the traffic flow distribution index is obtained by adding all the values of $\rho_{v_{(i,j)}}$ together.
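A minimal Python sketch of this index follows. It assumes that the reasonable scale takes the relative-difference form ρ = (p − q)/p, which is 0 when the actual flow q equals the ideal value p and negative when q exceeds it. This specific form is an assumption chosen to match the description, and the function names are hypothetical.

```python
def reasonable_scale(ideal, actual):
    """Assumed form: relative gap between ideal flow p and actual flow q."""
    return (ideal - actual) / ideal


def flow_distribution_index(ideal_grid, actual_grid):
    """Sum the reasonable scale over every point of the grid."""
    return sum(
        reasonable_scale(p, q)
        for ideal_row, actual_row in zip(ideal_grid, actual_grid)
        for p, q in zip(ideal_row, actual_row)
    )


if __name__ == "__main__":
    ideal = [[10.0, 20.0], [20.0, 40.0]]
    actual = [[8.0, 20.0], [25.0, 30.0]]
    print(flow_distribution_index(ideal, actual))
```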

The congestion index
A city traffic network should be able to tolerate certain problems [29]. For example, the overall network should not be paralysed by the breakdown of one or two roads due to accidents. In addition, traffic congestion spreads like a plague: once congestion occurs, nearby roads might also become congested. Thus, the epidemic model is introduced [30]. In this model, all healthy people can be infected with an epidemic disease, including those who have already recovered. Here, this means that areas that suffered from congestion could become congested again [31,32]. However, the epidemic model only describes the entire network rather than a specific point; thus, some changes need to be made. In the previous section, we obtained the share of each point in the entire network. Thus, the value of each point can be obtained by simply multiplying the entire traffic flow by its share. Each calculation step is listed below.
In the epidemic model, let the traffic flow function of point x(t) be a continuous and differentiable function of time t. The share of the light traffic flow is s(t), and that of the heavy traffic flow is i(t). As stated in the model description, neighbouring points of congested points can gradually be transformed into congested points. Let the average number of good points that could be transformed into congested points by one congested point be λ. Then, the number of new congested points is λs(t). With N denoting the total number of vehicles, the number of vehicles caught in congestion is Ni(t), and the number of new vehicles caught in congestion per unit time is λNs(t)i(t). Therefore, the growth rate of the congestion with Ni vehicles caught in congestion is λNsi. Additionally, it is necessary to consider the points that become congested again after the congestion was fully cleared. Let the percentage of this type of point occupy μ of the total number. Then, the following can be deduced:

$$N \frac{di}{dt} = \lambda N s i - \mu N i.$$

Since s(t) + i(t) = 1, the equation can be simplified:

$$\frac{di}{dt} = \lambda i (1 - i) - \mu i, \qquad i(0) = i_0.$$

The variable $i_0$ stands for the congestion percentage when t = 0. Define the variable σ as follows:

$$\sigma = \frac{\lambda}{\mu}.$$

Then, the equation is equivalent to:

$$\frac{di}{dt} = -\lambda i \left[ i - \left( 1 - \frac{1}{\sigma} \right) \right].$$

The function i(t) converges when t goes to infinity:

$$i(\infty) = \begin{cases} 1 - \dfrac{1}{\sigma}, & \sigma > 1, \\ 0, & \sigma \le 1. \end{cases}$$
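The long-run behaviour of the congestion share can be checked numerically. The sketch below assumes the standard SIS form of the epidemic model, di/dt = λi(1 − i) − μi with σ = λ/μ, and integrates it with a plain forward Euler step; the parameter values and function names are illustrative only.

```python
def simulate_congestion(lam, mu, i0, dt=0.01, steps=5000):
    """Forward-Euler integration of di/dt = lam * i * (1 - i) - mu * i."""
    i = i0
    for _ in range(steps):
        i += dt * (lam * i * (1.0 - i) - mu * i)
    return i


def limit_share(lam, mu):
    """Limit of i(t) for the SIS form: 1 - 1/sigma if sigma > 1, otherwise 0."""
    sigma = lam / mu
    return 1.0 - 1.0 / sigma if sigma > 1.0 else 0.0


if __name__ == "__main__":
    print(simulate_congestion(0.6, 0.2, 0.05), limit_share(0.6, 0.2))
    print(simulate_congestion(0.2, 0.6, 0.50), limit_share(0.2, 0.6))
```

When σ > 1 the congested share settles at a positive level regardless of the initial value; when σ ≤ 1 it dies out.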
The number of vehicles at each point is $N^0_{v_{(i,j)}}$, obtained by multiplying the entire traffic flow by the share of that point.
The critical value of the traffic flow at each point depends on three lengths.
$l_c$ stands for the average length of the vehicles, $l_s$ stands for the length of the road and $l_a$ refers to the safety distance between vehicles. The state of the traffic flow can be analysed through the critical value. When the number of vehicles $N^0_{v_{(i,j)}}$ exceeds the critical value, the traffic is congested, and the safety distance between vehicles $l_a$ decreases. Now consider the ratio between the length of the road $l_s$ and the average length of the vehicles $l_c$, that is, $l_s / l_c$.
This ratio is the theoretical maximum of the traffic congestion. If the number of vehicles reaches or exceeds this maximum, then the road is fully congested. Such locations should be ruled out when selecting the location of a neighbourhood since they can easily become congested. Both the critical value and the maximum are constants. Now introduce a new variable $\theta_{v_{(i,j)}}$ to illustrate the scale of the congestion of each point. The value of $\theta_{v_{(i,j)}}$ increases as the traffic becomes clear at the point. The variable $\theta_{v_{(i,j)}}$ becomes negative in the case of traffic congestion. Then, the traffic congestion evaluation of the overall system can be performed as follows.

Definition 4.5: The congestion index is

$$\theta = \frac{n_+}{n_{all}},$$

where $n_+$ stands for the number of points with a positive $\theta_{v_{(i,j)}}$ and $n_{all}$ stands for the total number of points in the network. This index indicates the proportion of the points with a positive value of the variable $\theta_{v_{(i,j)}}$. The higher the proportion is, the less congestion there is.
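The proportion described in Definition 4.5 can be computed directly once a per-point congestion scale is available. The Python sketch below treats those per-point values as given inputs. The helper `maximum_capacity` follows the ratio of road length to vehicle length stated in the text, while `critical_capacity` assumes that each vehicle occupies its own length plus the safety distance; that formula and all function names are assumptions made for illustration.

```python
def maximum_capacity(road_length, vehicle_length):
    """Theoretical maximum number of vehicles: road length / vehicle length."""
    return road_length / vehicle_length


def critical_capacity(road_length, vehicle_length, safety_distance):
    """Assumed critical value: each vehicle needs its length plus the safety gap."""
    return road_length / (vehicle_length + safety_distance)


def congestion_index(theta_values):
    """Share of points whose congestion scale theta is strictly positive."""
    values = list(theta_values)
    if not values:
        raise ValueError("at least one point is required")
    n_plus = sum(1 for theta in values if theta > 0)
    return n_plus / len(values)


if __name__ == "__main__":
    print(maximum_capacity(100.0, 5.0), critical_capacity(100.0, 5.0, 5.0))
    print(congestion_index([0.4, -0.1, 0.2, 0.0]))
```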

The infrastructure convenience index
When choosing the residential area, people tend to pay attention to the infrastructure facilities nearby. People expect the infrastructure facilities, such as hospitals and schools, to be as close to their homes as possible. Therefore, the distance is of great importance.
A simplified model is presented that only takes parks, hospitals, schools and commercial areas into consideration. The level of importance decreases in the order of schools, commercial areas, hospitals and parks. For each of them, a location is given randomly, as shown in Figure 12.
At this moment, it is very natural to consider how to find the optimal location of the residential area. The location should be expected to be the one where the transportation is the most convenient. This is exactly what particle swarm optimization (PSO) can do [33].
The target of the optimization is

$$\min \sum_{i} \omega_i l_i.$$

The variable $l_i$ stands for the distance to the i-th facility (the primary school, the middle school, the commercial area, the park and the hospital), and the variable $\omega_i$ stands for its weight. A simple illustration is shown in Figure 13. After applying PSO, the optimal location can be directly obtained.
In Figure 14, the curve stands for the scale of the stability. A high value means that the location is unstable. Thus, the bottom of the curve is the most stable location. This agrees with our final results, which are mostly located at these stable locations.
The result of PSO becomes stable as the number of iterations increases. After a specific number of iterations, the result is regarded as the optimal location in this section, regardless of the two former indices. Then, the method to quantify the infrastructure convenience index is introduced. The index is obtained by calculating the distance $d$ between the last two iterated locations and taking the arithmetic square root of its inverse.

Definition 4.6:
The infrastructure convenience index is

$$\delta = \sqrt{\frac{1}{d}}.$$

This index shows the scale of the convenience of the target area. The higher the value is, the better the convenience is.
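The PSO step and the resulting index can be sketched in plain Python. This is a generic textbook PSO under stated assumptions, not the configuration used in the experiments: the objective is assumed to be the weighted sum of Euclidean distances to the facilities, the inertia and acceleration coefficients (0.7, 1.5, 1.5) are common default choices, "the last two iterated locations" is read as the last two global-best positions, and the facility coordinates and weights are made up for the example.

```python
import math
import random


def weighted_distance(point, facilities, weights):
    """Assumed objective: sum of weight * Euclidean distance to each facility."""
    return sum(w * math.dist(point, f) for f, w in zip(facilities, weights))


def pso_location(facilities, weights, bounds, particles=30, iterations=100, seed=0):
    """Return (best position, best cost, history of global-best positions)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi), rng.uniform(lo, hi)] for _ in range(particles)]
    vel = [[0.0, 0.0] for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_cost = [weighted_distance(p, facilities, weights) for p in pos]
    g = min(range(particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]
    history = [g_pos[:]]
    for _ in range(iterations):
        for k in range(particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (0.7 * vel[k][d]
                             + 1.5 * r1 * (best_pos[k][d] - pos[k][d])
                             + 1.5 * r2 * (g_pos[d] - pos[k][d]))
                # Keep every particle inside the planning area.
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))
            cost = weighted_distance(pos[k], facilities, weights)
            if cost < best_cost[k]:
                best_pos[k], best_cost[k] = pos[k][:], cost
                if cost < g_cost:
                    g_pos, g_cost = pos[k][:], cost
        history.append(g_pos[:])
    return g_pos, g_cost, history


def convenience_index(history):
    """delta = sqrt(1 / d), with d the distance between the last two locations."""
    d = math.dist(history[-1], history[-2])
    return math.sqrt(1.0 / d) if d > 0 else math.inf


if __name__ == "__main__":
    # Hypothetical facility layout and weights, for illustration only.
    facilities = [(1.0, 1.0), (1.0, 9.0), (9.0, 1.0), (9.0, 9.0), (5.0, 6.0)]
    weights = [0.3, 0.25, 0.2, 0.1, 0.15]
    location, cost, history = pso_location(facilities, weights, bounds=(0.0, 10.0))
    print(location, cost, convenience_index(history))
```

Because the index grows without bound as the swarm converges, the sketch returns infinity when the last two locations coincide; how the experiments handle that case is not stated in the text.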

The analytic hierarchy process
Thus far, the quantification of the three factors has been presented, but their relationship still remains unknown. Thus, AHP is introduced [34]. In this paper, two hierarchies are taken into consideration: (I) The target hierarchy stands for the overall score. (II) The discipline hierarchy consists of the traffic flow distribution index ρ, the congestion index θ and the infrastructure convenience index δ. Although the relationship among indices is very difficult to quantify, Saaty presented a convincing method: the indices are compared pairwise rather than all together, which overcomes the difficulty of specifying all the weights at one time. We assume that the importance of the three indices satisfies θ > ρ > δ. The relationship can be illustrated by the following matrix, with rows and columns ordered as ρ, θ, δ:

$$A = \begin{bmatrix} 1 & 1/2 & 2 \\ 2 & 1 & 3 \\ 1/2 & 1/3 & 1 \end{bmatrix}$$

The 1−9 scale proposed by Saaty is adopted [35,36]. The value of $a_{ij}$ represents the importance ratio between index i and index j; it increases as index i becomes more important relative to index j. Take the first row as an example. The number 1/2 indicates that the importance of the first index is only half the importance of the second index. The number 2 indicates that the first index is twice as important as the third index. Given the relations of the indices in the first row, we can infer that the value of the element in the second row, third column should be $a_{21} a_{13} = 4$. However, the assumed value is 3. This result shows that the importance of the traffic convenience will relatively decrease when it comes to the congestion, so the matrix is not perfectly consistent. The result is not ideal, but it commonly occurs in real life. Then, to identify the validity of this matrix, the uniformity is tested by calculating the largest eigenvalue λ of the matrix A and its eigenvector α:

$$\lambda = 3.0092, \quad (30)$$

$$\alpha = (0.4660, 0.8467, 0.2564)^T.$$

If the value of λ proves to be considerably larger than n, the order of the matrix, then the uniformity is considered to be poor. In that case, the entries of the matrix should be adjusted. Then, the uniformity index CI is considered:

$$CI = \frac{\lambda - n}{n - 1} = \frac{3.0092 - 3}{3 - 1} = 0.0046.$$
To identify the limits of the uniformity, Saaty introduced a random uniformity index RI. His idea is to construct many positive reciprocal matrices and take the average of their uniformity index as the random uniformity index. With different values of n, Saaty provided the value of the random uniformity index RI calculated with the sample capacity from 100 to 500, which is shown in Table 4.
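When n = 3 and RI = 0.58, the uniformity ratio can be obtained by dividing the uniformity index by its random uniformity index:

$$CR = \frac{CI}{RI} = \frac{0.0046}{0.58} \approx 0.0079.$$

The result is less than 0.1, which means that the matrix is well selected and the vector obtained by normalizing α so that its entries sum to 1 can be adopted as the weight vector:

$$\alpha' = (0.2970, 0.5396, 0.1634)^T.$$

Therefore, the final score I of each residential area can be given by

$$I = 0.2970\,\rho + 0.5396\,\theta + 0.1634\,\delta.$$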
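The eigenvalue, the uniformity check and the weights can be reproduced with a few lines of Python. This is a minimal sketch that uses power iteration in place of a library eigen-solver; the third row of the matrix is assumed from reciprocity, and the function names are hypothetical.

```python
def principal_eigen(matrix, iterations=100):
    """Power iteration: return (largest eigenvalue, eigenvector scaled to sum to 1)."""
    n = len(matrix)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(matrix[r][c] * v[c] for c in range(n)) for r in range(n)]
        lam = sum(w)  # valid estimate because v sums to 1
        v = [x / lam for x in w]
    return lam, v


def uniformity(lam, n, random_index):
    """Return (CI, CR) for an n x n pairwise comparison matrix."""
    ci = (lam - n) / (n - 1)
    return ci, ci / random_index


# Rows and columns ordered as rho, theta, delta.
A = [[1.0, 0.5, 2.0],
     [2.0, 1.0, 3.0],
     [0.5, 1.0 / 3.0, 1.0]]

lam, weights = principal_eigen(A)
ci, cr = uniformity(lam, n=3, random_index=0.58)


def final_score(rho, theta, delta, w=weights):
    """Weighted sum of the three indices."""
    return w[0] * rho + w[1] * theta + w[2] * delta


if __name__ == "__main__":
    print(round(lam, 4), [round(x, 4) for x in weights], round(ci, 4), round(cr, 4))
```

The weights returned here already sum to 1, so they correspond to the normalized vector rather than to the unit-length eigenvector α quoted in the text.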

Results and discussion
In this section, we focus on the validity of the method proposed in this paper. The aim of the experiment is to verify and demonstrate (I) whether the ideal distribution index is reasonable; (II) whether the method is flexible and (III) how stable all the indices are. To address these questions, experiments were conducted on a machine with an Intel Core i7-3770 3.4 GHz CPU and 16 GB of RAM. The software used is MATLAB, and the operating system is Microsoft Windows 10. The infrastructure layout is shown in Figure 12.

Experimental process
During the iteration section, the scores of the three indices at each location are calculated. The final score is then obtained by using AHP. The location with the highest score is the optimal choice.
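The scan over candidate locations can be sketched as follows. This Python sketch assumes a row-by-row scan with a step of one grid and a weighted-sum score; the three index functions passed in are placeholders standing in for the traffic flow distribution, congestion and infrastructure convenience indices, and the weights are the rounded AHP values.

```python
def best_location(rows, cols, index_functions, weights):
    """Scan the grid row by row and return ((i, j), score) of the best location."""
    best, best_score = None, float("-inf")
    for i in range(rows):          # move to the next row after finishing one
        for j in range(cols):      # step of one grid along the row
            score = sum(w * f(i, j) for w, f in zip(weights, index_functions))
            if score > best_score:  # ties keep the first location encountered
                best, best_score = (i, j), score
    return best, best_score


if __name__ == "__main__":
    # Placeholder index functions (assumptions), ordered as rho, theta, delta.
    indices = [
        lambda i, j: 1.0 / (1 + abs(i - 2) + abs(j - 2)),
        lambda i, j: (i + j) / 8.0,
        lambda i, j: 1.0 / (1 + i),
    ]
    print(best_location(5, 5, indices, (0.2970, 0.5396, 0.1634)))
```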

Validity of the ideal distribution index
In the method for calculating the ideal traffic flow index, there is a special function based on the practical situation and previous experience. However, this function might cause a substantial deviation. Therefore, the expected target needs to be corrected by interpolating with the simulation data before adding the residential area. In this way, the corrected function could be considered ideal. An example is shown in Figure 15. In addition, if there are no special requirements, then the interpolation result can be directly applied to the experiment. This could considerably reduce configuration issues.

Feasibility analysis
Take 1 grid as the step length of each iteration. After reaching the end of one row, the iteration jumps to the head of the next row. The entire grid network can be iterated in this way. The result is shown in Figure 16. In Figure 16, it is easy to see that, for each row, the head always obtains the highest score. The score decreases when the location becomes closer to the centre and reaches the lowest value at the end. This result is consistent with the practical situation. In reality, the traffic flow increases as a location becomes closer to the centre. Such an area is not suitable for a residential area. At the bottom of the target area, the traffic situation is better and more convenient than in the other areas. If the residential area is constructed there, it will not create much traffic burden.
To conclude, the highest scores appear at the edge and the bottom of the area, which means that the outer part of the city is the optimal location for the residential area.

Stability analysis
The iteration is not continuous since there are jumps when reaching the end of each row. Therefore, for convenience, three continuous location chains (three rows) are selected for analysis. The results are shown in Figures 17-19. Consider the result of the traffic flow distribution index in Figure 17. Unlike the top area, both the middle area and the bottom area show a significant decline before increasing. The values at the head and the end of each line are quite similar. However, the scale of the change in the three areas is different. For the bottom area, the flow index fits the ideal distribution the best, and it therefore has the highest score. At the same time, the result for this area is not sufficiently stable, particularly when the number of iterations reaches 6. All three rows reach their highest points at the head and the end. Consequently, a location is more suitable the closer it is to the bottom area. Figure 18 shows the distribution of the congestion index. It illustrates the influence of the location on the entire network. According to the traffic simulation method, the traffic flow decreases when the residential area becomes farther away from the centre. However, the result of the experiment is not fully consistent with this assumption. Note that the higher the score is, the better the traffic condition is. Apart from the high score of the first location, the two points following it show a remarkable decrease in the score, and the score does not recover until it reaches the points in the middle. Subsequently, the bottom area shows a bulge. This is because the traffic flow will mostly be cleared or will not form a massive traffic flow due to the residential area. The effect decreases towards the end, as do the scores in the graph.
The distribution of the infrastructure convenience index in Figure 19 is quite reasonable compared with the former two. After applying PSO to obtain the optimal location, the scores directly show the convenience. The highest scores gather at the head, and the score decreases as the iteration proceeds.
As suggested by the above results, the change rate of each index is steep between neighbouring points in the middle area. However, considering that most high scores appear in the bottom area and the top area, the change in the middle area does not greatly matter. The stability of the scores in these two areas contributes to the reliability of the location selection in this paper.

Conclusion
This paper proposes an optimization method to select the location of the residential area, focusing on the traffic network. The residential area is the basic element of the city and the source of traffic flow, which could directly affect the traffic distribution. The problems of a non-grid traffic network, analysed using the traffic condition in Chongqing, include low traffic network rates and low overall efficiency. This supports the necessity and efficiency of the grid network. Then, the traffic flow is simulated under consideration of the shortest routes. Based on this simulation, a comprehensive evaluation model is developed from three indices. To simulate the real-life traffic congestion problem, the epidemic model is added. Considering the multiple targets of the location selection, PSO is applied. Subsequently, the score of each point is calculated with the weights obtained from AHP. By iterating through all the locations, the score of each location is returned, and the location with the highest score is taken as the optimal location. The experiment discusses the validity, feasibility and stability of our proposed model. However, the ideal traffic flow index involves considerably more factors in reality than the model that we proposed. Thus, the quantified result might not be completely consistent with the actual situation. For example, the migration of population and infrastructure facilities will influence the returned data. In the future, our method will be improved by integrating the Skyline algorithm for data retrieval to ensure the generalization of the areas discussed, thereby making the model more flexible.