Improved ACO-based path planning with rollback and death strategies

ABSTRACT This paper is concerned with the path planning problem for a class of mobile robot systems in a complex environment. By introducing the rollback strategy into the traditional ACO, an ant can return to the previous node when no feasible next node is available. In this way, the number of ants that successfully reach the target is increased. Then, in order to reduce the effect of invalid pheromone on the evolution of the ant colony as well as the time cost, the death strategy is utilized. The aim of this paper is to incorporate the rollback and death strategies into ACO such that the state transition rule is improved and the composition of the pheromone is optimized. In addition, an upper bound is imposed on the pheromone of each node, and a node whose pheromone exceeds this bound will not be selected. Therefore, the efficiency of the algorithm is improved. Finally, a simulation example is given to illustrate the effectiveness of the proposed algorithm.


Introduction
It is well recognized that path planning is an attractive research topic in the field of mobile robots, the purpose of which is to find an optimal path from the starting point to the goal point in an environment with obstacles (Blackmore, Ono, & Williams, 2011). In recent years, considerable research effort has been devoted to the path planning problem of mobile robots, and a number of representative algorithms have been put forward, such as the Ant Colony Optimization (ACO) (Dorigo, Maniezzo, & Colorni, 1996; Uriol & Moran, 2017), the genetic algorithm (Roberge, Tarbouchi, & Labonte, 2013; Song, Wang, & Sheng, 2016), the particle swarm optimization (Zeng, Wang, Zhang, & Alsaadi, 2016; Zeng, Zhang, Chen, Chen, & Liu, 2016; Zeng, Zhang, Liu, Liang, & Alsaadi, 2017), the smoothing algorithm (Song, Tian, & Zhou, 2010), the sliding mode control algorithm (Capisani & Ferrara, 2012; Wang, Song, Zhang, & Liu, 2016), the A star algorithm (Duchoň et al., 2014) and the immune algorithm. Among these optimization algorithms, ACO is robust against external disturbances and is easy to combine with other optimization algorithms in practice. To date, a rich body of results on ACO has been reported in the literature, see e.g. Tsou and Hsueh (2010) and Zhu and Zhang (2005).
Although ACO has a great ability to handle the path planning problem and the Traveling Salesman Problem (TSP), it also has some disadvantages in practical applications. The main obstacles of ACO can be summarized as follows: (1) the search time of the ant colony algorithm is longer than that of other algorithms owing to its inherent computational complexity; (2) the occurrence of the 'stagnation phenomenon' after a certain number of search steps limits its search space to a certain extent. In Tsou and Hsueh (2010), the concept of e-navigation has been used as a framework, and the positioning collision avoidance path planning and ACO have been applied in the field of artificial intelligence to construct a collision avoidance model. This method can imitate optimization behaviors in real-life applications but increases the computational complexity. As stated in Dorigo et al. (1996), due to the pheromone left by each ant, the initial path might have a heavy influence on the final path. In this sense, the 'stagnation phenomenon' cannot be completely avoided, since ACO might fall into a local optimum or stagnate. Therefore, it is of great importance to overcome the two obstacles mentioned above. In order to overcome the time-consuming obstacle, some improved ACOs have been developed, such as the Ant System (AS) (Dorigo & Gambardella, 1997; Dorigo et al., 1996), the elitist ant system (Ataie-Ashtiani & Ketabchi, 2011), the Max-Min Ant System (MMAS) (Stutzle & Hoos, 2000) and the rank-based AS (Bagheri & Golbraikh, 2012). In Stutzle and Hoos (2000), both the upper and lower bounds of the amount of pheromone have been set to make the volatilization coefficient smaller, which has effectively avoided the stagnation phenomenon. However, a major disadvantage of this method is that the search process is quite time-consuming.
CONTACT Guoliang Wei guoliang.wei@usst.edu.cn
In Bagheri and Golbraikh (2012), the ants have been sorted and assigned different weights according to the length of their paths, for the sake of speeding up the search process. It should be noted that the 'stagnation phenomenon' has not yet been handled, although the above algorithms achieve a guaranteed reduction of the simulation time. Thus, it is of practical significance to propose a novel algorithm that deals with the 'stagnation phenomenon' and also saves time.
For the purpose of coping with the 'stagnation phenomenon', some advanced optimization strategies (e.g. the rollback and death strategies) have been proposed in recent years. In Bu, Law, and Feng (2008), the rollback strategy has been successfully applied to the genetic algorithm (GA) to solve the vehicle routing problem. In Shi, Liang, Lee, Lu, and Wang (2005), the death strategy has been introduced in the variable population-size genetic algorithm (VPGA) to solve function optimization problems. Since the above rollback and death strategies have great abilities in overcoming the 'stagnation phenomenon', it is natural to apply these two strategies to ACO. To the best of the authors' knowledge, the rollback strategy and the death strategy have not yet been applied to ACO, since it is challenging to optimize the composition of the pheromone and to eliminate the interference of invalid pheromone with the system. This is the primary motivation of this paper.
According to the above discussions, we endeavor to improve the ACO by utilizing the rollback strategy and the death strategy. The main contributions can be highlighted as follows:
(1) When there is no reachable point near the ant, the ant returns to the former node, puts the original node into the tabu list and searches again. This method gives the artificial ant more chances to reach the target point.
(2) When the ant has returned to the previous node and searched nearby, it is killed if there is still no reachable point. The dead ant is then marked as an invalid ant. In addition, a fixed value is subtracted from the pheromone on the path it has passed, and this pheromone is not updated anymore. Such a strategy can effectively reduce the influence of the invalid pheromone on the subsequent ant colony.
(3) An upper bound is imposed on the pheromone of each node, and a node whose pheromone exceeds this bound will not be selected. As such, the negative phenomenon that too many ants choose the same path can be effectively avoided, and the efficiency of the algorithm can be improved.
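The upper-bound rule in contribution (3) can be illustrated with a minimal Python sketch. The function name `filter_by_upper_bound`, the per-node dictionary `node_pheromone` and the bound `tau_max` are hypothetical names introduced here for illustration only. The text does not say what happens when every candidate exceeds the bound, so the sketch simply returns an empty list in that case.

```python
def filter_by_upper_bound(candidates, node_pheromone, tau_max):
    """Keep only the candidate nodes whose pheromone does not exceed tau_max.

    candidates     : iterable of candidate next nodes
    node_pheromone : dict mapping node -> pheromone level (assumed layout)
    tau_max        : upper bound on the pheromone of a selectable node
    """
    # Nodes missing from the dictionary are treated as carrying no pheromone.
    return [n for n in candidates if node_pheromone.get(n, 0.0) <= tau_max]


levels = {(0, 1): 2.0, (1, 0): 9.5, (1, 1): 4.0}
selectable = filter_by_upper_bound([(0, 1), (1, 0), (1, 1)], levels, tau_max=5.0)
```

Here the node `(1, 0)` is dropped because its pheromone exceeds the bound, which steers later ants away from an over-used node.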
The remainder of this paper is organized as follows. Section 2 introduces the basic concepts and formulas of ACO. Section 3 presents the detailed implementation of the improved ACO, including the flow chart and the algorithm steps. Section 4 gives the simulation and the corresponding analysis results under different scenarios. In Section 5, we draw the conclusion of this paper.

Ant colony algorithm
Generally speaking, ACO is a heuristic intelligent search algorithm which imitates the foraging behavior of an ant colony and finds the optimal path in an unknown environment. Existing research shows that ants mark paths by releasing a specific secretion, the pheromone, along them. As more ants walk along the same path, the pheromone concentration on that path increases. In this sense, the probability that subsequent ants choose such a path also increases. In addition, ants can find a new path when the environment changes. In order to improve the efficiency of path planning, the heuristic function η and the tabu list $tabu_k$ are introduced into the artificial ant colony model. To be more specific, in the random search algorithm, the heuristic function is used to improve the search efficiency. Every node an ant has passed through is stored in the tabu list to ensure that the ant does not return and circle in the same place. During its movement, each ant searches the nearby nodes based on the transition probability.
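The state transition rule, also called the foraging rule, is given as follows:

$$p_{ij}^{k}(t) = \begin{cases} \dfrac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{s \in allowed_k} [\tau_{is}(t)]^{\alpha}\,[\eta_{is}]^{\beta}}, & j \in allowed_k, \\ 0, & \text{otherwise}, \end{cases} \tag{1}$$

where $p_{ij}^{k}(t)$ is the probability that ant $k$ moves from node $i$ to node $j$ at time $t$, $\tau_{ij}(t)$ represents the pheromone concentration on the path from $i$ to $j$, $s$ ranges over the candidate nodes in $allowed_k$, and $\eta_{ij} = 1/d_{ij}$ is the reciprocal of the distance $d_{ij}$ between $i$ and $j$. The tabu list $tabu_k$ is the set of nodes that have already been chosen by ant $k$, and $allowed_k$ is the set of nodes that the ant can choose in the next step. α and β represent the influence of the pheromone concentration and of the heuristic information, respectively. The above formula shows that the greater the pheromone and the heuristic function of a node are, the higher the probability that the node is chosen.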
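As an illustration of the transition rule, the following Python sketch computes the selection probabilities in the standard ACO form, where each candidate is weighted by pheromone raised to α times heuristic value raised to β, and then normalized. The dictionary layout keyed by `(i, j)` edges and the uniform fallback when all weights are zero (which can occur when the initial pheromone is 0) are assumptions made for this sketch, not details taken from the paper.

```python
import random


def transition_probabilities(tau, eta, current, allowed, alpha, beta):
    """Return {node: probability} for moving from `current` to each allowed node.

    tau, eta : dicts keyed by (i, j) edges (assumed layout);
               eta[(i, j)] is 1 / d_ij.
    """
    if not allowed:
        return {}
    weights = {
        j: (tau[(current, j)] ** alpha) * (eta[(current, j)] ** beta)
        for j in allowed
    }
    total = sum(weights.values())
    if total == 0.0:
        # Assumption: with no pheromone information at all, choose uniformly.
        return {j: 1.0 / len(allowed) for j in allowed}
    return {j: w / total for j, w in weights.items()}


def choose_next(probabilities, rng=random):
    """Sample the next node according to the computed probabilities."""
    nodes = list(probabilities)
    return rng.choices(nodes, weights=[probabilities[n] for n in nodes], k=1)[0]


tau = {("a", "b"): 1.0, ("a", "c"): 1.0}
eta = {("a", "b"): 1.0, ("a", "c"): 0.5}  # node c is twice as far away as b
probs = transition_probabilities(tau, eta, "a", ["b", "c"], alpha=1.0, beta=1.0)
next_node = choose_next(probs)
```

With equal pheromone on both edges, the closer node `b` receives the larger probability, which is the role the heuristic function plays early in the search.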
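On the other hand, the pheromone on the map accumulates and evaporates over time, and the pheromone update formula is given by

$$\tau_{ij}(t+1) = (1-\rho)\,\tau_{ij}(t) + \Delta\tau_{ij}(t), \qquad \Delta\tau_{ij}(t) = \sum_{k=1}^{m} \Delta\tau_{ij}^{k}(t), \tag{2}$$

where ρ is the pheromone evaporation coefficient taking values in the interval [0, 1], which prevents excessive accumulation of pheromone; $m$ is the number of ants; and $\Delta\tau_{ij}(t)$ represents the incremental pheromone on the path from $i$ to $j$ after time $t$, with initial value $\Delta\tau_{ij}(0) = 0$. $\Delta\tau_{ij}^{k}(t)$ denotes the increment of pheromone left by ant $k$ after time $t$ on the path from $i$ to $j$, which is defined as

$$\Delta\tau_{ij}^{k}(t) = \begin{cases} Q / L_k, & \text{if ant } k \text{ passes through } (i, j) \text{ in this cycle}, \\ 0, & \text{otherwise}, \end{cases} \tag{3}$$

where $L_k$ is the total length of the path taken by ant $k$ in this cycle, and $Q$ stands for the pheromone intensity, which affects the convergence speed of the algorithm.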
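The evaporation and accumulation described above can be sketched in Python as follows, assuming the usual ant-cycle form in which every pheromone value is first scaled by (1 - ρ) and each ant k then deposits Q / L_k on every edge of its path. The function name `update_pheromone` and the `(edges, length)` tuple layout are hypothetical choices for this sketch.

```python
def update_pheromone(tau, tours, rho, q):
    """Apply one evaporation-plus-deposit step and return the new pheromone map.

    tau   : dict mapping edge (i, j) -> current pheromone level
    tours : list of (edges, length) pairs, one per ant (assumed layout)
    rho   : evaporation coefficient in [0, 1]
    q     : pheromone intensity Q
    """
    # Evaporation: every edge keeps the fraction (1 - rho) of its pheromone.
    new_tau = {edge: (1.0 - rho) * level for edge, level in tau.items()}
    # Accumulation: each ant deposits Q / L_k on every edge it traversed.
    for edges, length in tours:
        deposit = q / length
        for edge in edges:
            new_tau[edge] = new_tau.get(edge, 0.0) + deposit
    return new_tau


tau = {("a", "b"): 1.0, ("b", "c"): 1.0}
tours = [([("a", "b")], 2.0)]  # one ant, path length 2
tau = update_pheromone(tau, tours, rho=0.1, q=10.0)
```

Shorter paths yield a larger deposit per edge, so they become more attractive in later cycles, while edges that no ant uses slowly lose pheromone.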
Although the ants do not communicate directly with each other during the foraging process, each ant leaves pheromone as the medium for communication within the population. Due to the lack of proper guidance in the early stage of the population evolution, it would take a long time to find several effective paths. As a consequence, the heuristic function and the tabu list are introduced into the artificial ant colony for the purpose of solving this problem to a certain extent.

Improved algorithm
It should be stressed that the improved ACO still searches according to the transition probability (1). However, in comparison with the existing algorithms, the main difference is that when there are only barrier grids or nodes in the tabu list nearby, the ant is allowed to return to the previous node, which is called the rollback strategy. Under the rollback strategy, if there is still no accessible node after the ant has returned to the previous node, the ant is labeled as an invalid ant and the pheromone it left is not updated, which gives rise to the death strategy.
A prominent advantage of the rollback strategy is that it can effectively increase the number of ants that reach the target point, and thus raises the concentration of effective pheromone. By utilizing the death strategy, the misleading effect of pheromone on the ant colony can be reduced. In ACO, the updating of pheromone depends mainly on the number of ants passing through the path (i, j) at the moment t. However, some of them are invalid ants. In other words, these ants have passed through the path but never reach the target point. As such, the undifferentiated pheromone left by these invalid ants will mislead the evolution of the ant colony. The effects discussed above heavily deteriorate the performance of the algorithm in terms of both the evolution time of the ant colony and the convergence rate.
For the sake of countering these side effects on the algorithm performance, a new pheromone updating rule, which refers to the Max-Min ant system, is adopted in this paper. After one cycle is finished, only the pheromone left by the ants that reach the target point is updated. Therefore, the path length plays a central role in the ACO. The new updating rule is given by

$$\tau_{ij}(t+1) = (1-\rho)\,\tau_{ij}(t) + \Delta\tau_{ij}^{best}(t), \qquad \Delta\tau_{ij}^{best}(t) = \frac{Q}{L_{best}}, \tag{4}$$

where $\Delta\tau_{ij}^{best}(t)$ represents the increment of the pheromone left by the ant which has found the optimal path, $L_{best}$ is the total length of that path from the starting point to the target point, and $Q$ stands for the pheromone intensity. Instead of updating the pheromone at each search step, the improved algorithm updates the pheromone after each iteration.
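A minimal Python sketch of this updating rule is given below, under the assumption that it follows the Max-Min ant system pattern: all edges evaporate, then only the shortest path among the ants that actually reached the target receives a deposit of Q / L_best. The `(edges, length, reached_target)` tuple layout and the function name are hypothetical.

```python
def update_pheromone_best_only(tau, tours, rho, q):
    """Evaporate all edges, then reinforce only the best successful path.

    tours : list of (edges, length, reached_target) triples, one per ant
            (assumed layout); invalid (dead) ants have reached_target=False.
    """
    new_tau = {edge: (1.0 - rho) * level for edge, level in tau.items()}
    # Invalid ants are ignored, so their pheromone cannot mislead the colony.
    successful = [tour for tour in tours if tour[2]]
    if successful:
        best_edges, best_length, _ = min(successful, key=lambda tour: tour[1])
        for edge in best_edges:
            new_tau[edge] = new_tau.get(edge, 0.0) + q / best_length
    return new_tau


e1, e2, e3 = ("s", "a"), ("s", "b"), ("s", "c")
tau = {e1: 1.0, e2: 1.0, e3: 1.0}
tours = [
    ([e1], 1.0, False),  # short, but the ant died before reaching the target
    ([e2], 4.0, True),
    ([e3], 2.0, True),   # best successful path
]
tau = update_pheromone_best_only(tau, tours, rho=0.1, q=10.0)
```

In this example only edge `e3` is reinforced, even though the dead ant travelled a shorter distance on `e1`.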
Inspired by the MAX-MIN ant system, the improved ACO sets a large initial value of the pheromone concentration. The pheromone concentration decreases with the number of iterations, and the pheromone concentration of a path that no ant passes is eventually reduced to 0. Finally, the algorithm outputs all the nodes that still carry pheromone, which form the final path. The objective function of the optimization problem is given in (5), where (i, j) is the grid coordinate and $\tau_{(i,j)}$ is the concentration of pheromone. Connecting all the nodes that still carry pheromone gives the desired path. The flow diagram of the algorithm is shown in Figure 1 and the corresponding steps are summarized as follows.

Algorithm 3.1:
Step 1: System initialization. Let the number of ants in each generation be m, the total wave number be n, the maximum number of iterations be NG, the pheromone weight be α, the heuristic function weight be β, the initial pheromone be 0, the pheromone evaporation coefficient be ρ and the pheromone intensity be Q, and add the starting point S to the tabu list.
Step 2: Select the next node j according to the transition probability (1) and place the selected node in the tabu list. If the destination point is placed in the tabu list, jump to Step 4.
Step 3: If there is no alternative node nearby, the ant returns to the former node, the original node is put into the tabu list, and Step 2 is re-executed. If still no node can be selected, a fixed value is subtracted from the pheromone on the path and the next iteration is executed.
Step 5: If n ≥ NG, stop and save the final output path; otherwise jump to Step 2.
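The node selection, rollback and death steps for a single ant can be sketched as follows. This is a sketch under stated assumptions rather than the authors' implementation: the grid is a list of rows with 0 for free cells and 1 for obstacles, movement is 4-connected, and the `choose` callback is a hypothetical stand-in for the transition probability rule.

```python
def free_neighbours(grid, node):
    """Yield the free 4-connected neighbours of `node` (assumed connectivity)."""
    r, c = node
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
            yield (nr, nc)


def walk_ant(grid, start, goal, choose):
    """Move one ant from start towards goal; return (path, reached_goal).

    choose(current, allowed) picks the next node from the allowed list.
    """
    path = [start]
    tabu = {start}
    rolled_back = False
    while path[-1] != goal:
        current = path[-1]
        allowed = [n for n in free_neighbours(grid, current) if n not in tabu]
        if allowed:
            nxt = choose(current, allowed)
            tabu.add(nxt)
            path.append(nxt)
            rolled_back = False
        elif not rolled_back and len(path) > 1:
            # Rollback: the dead-end node stays in the tabu list and the
            # ant steps back to the previous node to search again.
            path.pop()
            rolled_back = True
        else:
            # Death: still no reachable node after one rollback, so the ant
            # is marked invalid and takes no part in the pheromone update.
            return path, False
    return path, True


grid = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
# Always taking the first allowed neighbour sends the ant up into the
# dead end at (0, 0) first, which forces one rollback.
path, ok = walk_ant(grid, (1, 0), (2, 2), choose=lambda cur, allowed: allowed[0])
```

A caller could subtract a fixed value from the pheromone along `path` whenever the second return value is `False`, following the penalty described in Step 3. The size of that value is not specified in the text.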

Simulation analysis
In this section, a simulation is provided to show the effectiveness of the proposed modified algorithm, and the corresponding analysis is presented. Two groups of comparison experiments were carried out in two different environments. Environment 1 (E1) is an environment with a small number of concentrated obstacles, and its scale is 20 × 20 grids. Environment 2 (E2) is a larger environment with scattered obstacles, and its scale is 30 × 30 grids.
According to the mechanism of ACO, the selection of parameters has a certain effect on the search efficiency of the algorithm. As usual, the simulation parameters are selected according to engineering practice. The main function of the heuristic factor is to guide the evolution of the ant colony when the pheromone is insufficient. The mechanism of the heuristic factor determines that the ant colony will choose the shortest path if the pheromone is deficient or the pheromone concentration is the same. The heuristic factor will be ineffective if its weight is too small. Meanwhile, the ant colony chooses the locally shortest path mechanically if the weight is too large.
Pheromone is another important parameter of the ant colony algorithm. Its degree of accumulation and its degree of volatilization have a direct influence on the output of the algorithm. If the pheromone weight is too large, the pheromone will overwhelm the heuristic information. An excessive degree of accumulation leads to premature convergence, in which case we may find a local optimum rather than the global optimum. If the pheromone weight is too small, the pheromone will not effectively guide the ant colony in searching for a path. Moreover, the number of ants should be set according to the size of the environment to ensure that the algorithm can fully search the space. The accumulation factor and the volatilization factor should be adapted to the initial density of the pheromone.
In this example, the iteration number is chosen as n = 100; the ant population is selected as m = 50; the weight of the pheromone is α = 0.1; the weight of the heuristic function is β = 0.2; the pheromone volatilization coefficient is ρ = 0.1; and the pheromone intensity is Q = 10.
Experiments are carried out with the same parameters. Some comparisons are made between the traditional ACO and the proposed improved ACO. The comparison results are shown in Figures 2-7 and Table 1. In E1, since the pheromone left by each ant is the same, the initial path has a direct influence on the final path. Therefore, the initial path directly affects whether the optimal path can be found by the ants. In this case, an effective remedy is to increase the weight of the heuristic function. However, the problem cannot be completely solved because the ant colony algorithm may fall into a local optimum or stagnate. From Figures 2-7 and Table 1, we can conclude that the improved ACO is better than the traditional ACO. In other words, the simulation example demonstrates the validity of our method.

Conclusion
In this paper, we have investigated the path planning problem for a class of mobile robot systems. In order to handle the 'stagnation phenomenon' as well as reduce the computation time, we have applied the rollback and death strategies to the traditional ACO. By using the rollback strategy, an ant that finds no feasible next node can return to the previous node. As such, more ants can reach the target. Furthermore, the death strategy has been employed to reduce the effect of invalid pheromone on the evolution of the ant colony. Then, an upper bound on the pheromone of each node has been put forward to limit the reselection of nodes. In the end, an illustrative example has been carried out to demonstrate the validity of the results.

Disclosure statement
No potential conflict of interest was reported by the authors.