Permutation rules and genetic algorithm to solve the traveling salesman problem

Abstract In this paper, a new approach including permutation rules and a genetic algorithm is proposed to solve the symmetric travelling salesman problem. This problem is known to be NP-hard. In order to increase the efficiency of the genetic algorithm, the initial population of feasible solutions is carefully generated. In addition, dynamic crossover and mutation rates were developed. The proposed method was successfully tested using a large number of benchmarks of different sizes. The computational results show that the proposed solution approach outperforms many existing methods. In addition, for many problem instances the proposed algorithm is able to generate solutions with the same value as the best known solutions.


Introduction and background
The travelling salesman problem (TSP) is a popular and challenging optimization problem and is known to be NP-hard. In this problem, the salesman aims to visit all the cities and return to the start city, with the constraint that each city is visited exactly once. The classical objective is to minimize the total travelled distance. TSPs arise in real-world settings such as transportation and logistics. In transportation, school bus routes are designed to find the cheapest link through every stop. Another real-life application is found in logistics, where the aim is to find the cheapest route to deliver goods to customers. Thus, proposing solutions for them is of great interest (Osaba, Yang, Diaz, Garcia, & Carballedo, 2016). Most real-world TSPs are very complex. Therefore, finding an optimal solution within an acceptable time is a big challenge. In line with this, any related TSP problem is hard to solve using exact procedures (Lawler, Lenstra, Kan, & Shmoys, 1985).
Various heuristics and approximation algorithms have been developed in the last few decades. In addition, many local optimization techniques such as simulated annealing, tabu search, neural networks and genetic algorithms were developed (see Aarts, Korst, & van Laarhoven, 1988; Fiechter, 1994; Gendreau, Laporte, & Semet, 1998; Grefenstette, Gopal, Rosmaita, & VanGucht, 1985; Knox, 1994; Larranaga, Kuijpers, Murga, Inza, & Dizdarevic, 1999; Leung, Jin, & Xu, 2004; Malek, Guruswamy, Pandya, & Owens, 1989). Hybrid algorithms were also proposed to solve the TSP efficiently. In Chen and Chien (2011), the authors proposed a hybridization of the genetic algorithm, simulated annealing and the ant colony system with particle swarm optimization techniques. The experimental results showed that the proposed solution approach gives a better average solution and percentage deviation of the average solution than existing methods. Dong, Guo, and Tickle (2012) proposed a new hybrid algorithm, the cooperative genetic ant system (CGAS). This approach combines Genetic Algorithm (GA) and Ant Colony Optimization (ACO) cooperatively with the aim of improving the performance of ACO. The experimental results of CGAS are better than those of the GA and ACO algorithms in terms of the quality of the average optimal solutions, particularly for small TSP instances. Masutti and De Castro (2009) proposed a solution approach for the TSP by modifying RABNET-TSP (an immune-inspired self-organizing neural network). The algorithm was then compared with other neural network-based methods proposed in the literature. The overall results were promising and better, even though the algorithm requires a longer processing time for convergence in many cases. The symmetric and asymmetric TSPs were studied in Osaba et al. (2016) and Osaba, Javier, Sadollah, Bilbao, and Camacho (2018).
To solve these problems, the authors proposed an improvement of the classic bat algorithm as well as a discrete water cycle algorithm. Another type of TSP, called the Family Travelling Salesman Problem (FTSP), has attracted many researchers. In this problem, instead of knowing exactly which cities are to be visited, we only know how many cities to visit in each subset of cities, which is called a family. For example, Bernardino and Paias (2018) proposed exact and heuristic procedures to solve the FTSP. In addition, compact and non-compact models were theoretically and practically compared. In the classical TSP, the distance between any two cities is fixed and known in advance. This assumption does not always hold, since distances may be subject to change. The resulting problem is referred to as the dynamic TSP, as studied by Mavrovouniotis, Muller, and Yang (2017).
Real-world TSPs are large. This fact motivated Taillard and Helsgaun (2019) to propose a metaheuristic that locally optimizes sub-parts of a solution to a given problem. This heuristic can be used as part of many neighbourhood search methods. The authors showed that it efficiently finds high-quality solutions.
The main contribution of this paper is the development of a Genetic Algorithm that solves the symmetric TSP. Usually, the initial population is randomly generated. However, in this work we proposed permutation rules to start with relatively good solutions. In addition, to help the genetic algorithm converge toward optimal or near optimal solutions, we proposed a dynamic crossover rate that depends on the population being considered.
This paper is organized as follows. The studied TSP is defined and formulated in Section 2. Section 3 describes the proposed method used to solve the TSP. Section 4 presents the testing and simulation results in addition to discussion and analysis of the obtained results. Section 5 concludes the paper and proposes some recommendations to improve the current solution approach as well as including more constraints to deal with more realistic TSPs.

Problem definition and formulation
The Travelling Salesman Problem (TSP) is one of the most famous and well-studied problems in operations research. The problem is NP-hard and is of great interest (Lawler et al., 1985). Hence, it has attracted many researchers.
Different versions of the travelling salesman problem exist based on the definition of the distances between the cities. The problem is said to be symmetric if for every pair of cities (a, b), the distance from city a to city b is the same as the distance from city b to city a. The problem is said to be asymmetric if this property does not hold. In addition, the problem is said to be Euclidean if the distance between two cities is the Euclidean distance. In this study we consider a symmetric and Euclidean TSP. Many real world applications of the TSP problem exist such as computer wiring, order-picking and overhauling gas turbine engines (Matai, Mittal, & Singh, 2010).
The TSP can be defined on a complete undirected graph $G = (V, E)$, where $V = \{1, 2, \ldots, n\}$ is the set of $n$ vertices and $E = \{(i, j) : i, j \in V \text{ and } i \neq j\}$ is the set of edges (connections between cities). Figure 1 shows the graphical representation of the TSP described in Table 1. A cost $c_{ij}$ (can be distance, time, etc.) is assigned to each edge $(i, j)$, where all the costs satisfy the constraint $c_{ij} \leq c_{ik} + c_{kj}$ for all $i, j, k \in V$. In the current study, the vertices are assumed to be points $P_i = (X_i, Y_i)$ in the plane and $c_{ij} = \sqrt{(X_i - X_j)^2 + (Y_i - Y_j)^2}$ is the Euclidean distance between $P_i$ and $P_j$. The distances $c_{ij}$ can be represented by a symmetric square matrix $C$ as shown in Table 1 describing a symmetric salesman problem of five cities. A feasible solution to the TSP is a tour $T$ formed by a sequence of $n$ pairwise distinct edges that visits each of the $n$ cities exactly once, except the start city, which appears at both the beginning and the end of the tour. In addition to the graphical representation, the TSP can be formulated as an integer programming problem as shown in the following subsections.
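As a minimal sketch of how the symmetric cost matrix $C$ can be built from planar coordinates, the following Python function computes all pairwise Euclidean distances. The function name `distance_matrix`, the 0-based city indices and the sample coordinates are illustrative assumptions, not taken from the paper.

```python
import math

def distance_matrix(points):
    """Return the symmetric matrix C with C[i][j] = Euclidean distance
    between points[i] and points[j] (cities are indexed from 0 here)."""
    n = len(points)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            C[i][j] = C[j][i] = d  # symmetry: c_ij = c_ji
    return C

# Made-up coordinates, used only to illustrate the call.
points = [(0, 0), (3, 0), (3, 4)]
C = distance_matrix(points)
```

Only the upper triangle is computed and then mirrored, which makes the symmetry of the instance explicit.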

Decision variables
Equation (1) defines a binary variable $x_{ij}$ associated with each edge $(i, j)$ in the graph $G$. The values 1 and 0 of $x_{ij}$ indicate, respectively, the inclusion or exclusion of the edge $(i, j)$ in the optimal tour.

Objective function
The objective expressed in Equation (2) is to find the shortest tour.

Constraints
The constraints to be considered are formulated in this section. Any feasible tour, including the optimal tour, must contain exactly $n$ edges, as expressed in Equation (3).
Equation (4) states that exactly two edges must be selected for each vertex. This constraint produces tours where every city is visited only once and the salesman returns to the start city. Finally, Equation (5) prohibits the formation of subtours containing fewer than $n$ vertices. This constraint guarantees that all cities are visited.
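One standard way to write a model matching the descriptions of Equations (1) to (5) is sketched below. It assumes that each undirected edge is indexed once with $i < j$; this indexing choice and the range of subset sizes in the last constraint are assumptions made here, so the authors' exact notation may differ.

```latex
\begin{align}
x_{ij} &=
  \begin{cases}
    1 & \text{if edge } (i,j) \text{ belongs to the tour} \\
    0 & \text{otherwise}
  \end{cases} \tag{1} \\
\min \;& \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} c_{ij}\, x_{ij} \tag{2} \\
& \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} x_{ij} = n \tag{3} \\
& \sum_{j<i} x_{ji} + \sum_{j>i} x_{ij} = 2 \qquad \forall i \in V \tag{4} \\
& \sum_{\substack{i,j \in S \\ i<j}} x_{ij} \le |S| - 1 \qquad \forall S \subset V,\; 3 \le |S| \le n-1 \tag{5}
\end{align}
```

In this form, summing constraint (4) over all vertices counts every selected edge twice, so (3) follows from (4); it is kept here because the text states it separately.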

Solution approach
In this section, we describe in detail the proposed genetic algorithm to solve the travelling salesman problem. The motivation behind using Genetic Algorithms (GAs) is that they are simple and powerful optimization techniques for NP-hard problems. GAs start with a population of feasible solutions to an optimization problem and iteratively apply different operators to generate better solutions. These operators, based on random processes, allow GAs to explore the search space in different directions. GAs evaluate each individual of the population using a fitness function. The fittest individuals are selected for reproduction with the aim of obtaining better feasible solutions. The reproduction process includes crossover and mutation operators. The evaluation, selection and reproduction are repeated until some stopping conditions are reached. The different components used to develop the genetic algorithm are detailed in the next subsections. The process of the proposed genetic algorithm is described in Figure 2.

Encoding
The genotype of the individuals (chromosomes) is represented by a sequence of $n$ integers with no repeated values. This kind of path representation is simple to implement and provides chromosomes that are directly interpreted as TSP solutions (tours). A chromosome $Cr$ can be formulated as $Cr = (g_1, g_2, \ldots, g_n)$, where $g_i \in V$, $1 \leq i \leq n$, represents a gene (node) in the chromosome. An example of a chromosome for the TSP instance shown in Table 1 is $(1, 3, 4, 5, 2, 6)$. The resulting tour $1, 3, 4, 5, 2, 6, 1$ is obtained from the chromosome by appending the initial city at the end.
Figure 2. Genetic algorithm flowchart diagram.

Initial population
Generating the initial population is a very important step in the GA. Usually, the initial population is randomly generated. In this paper, the initial population is carefully created to promote better and faster convergence toward optimal or near-optimal solutions. The initial population for the developed genetic algorithm was generated using two proposed heuristics. The first heuristic, named ANN, is a generalization of the well-known Nearest Neighbour approach. The second heuristic, namely 4Perm, aims to improve the solutions obtained by ANN by applying some permutation rules. These heuristics are detailed in the next paragraphs.

All nearest neighbors (ANN)
ANN is a generalization of the classical Nearest Neighbor heuristic (NN), which constructs a solution to the TSP always starting from city 1 (the initial city). In every step, the salesman visits the non-visited city nearest to his current location. We generalized this rule to generate $n$ tours $T_i$ $(1 \leq i \leq n)$, where each tour $T_i$ starts from city $i$. The rationale is that every generated tour can be rotated so that it starts from the initial city 1, while the intermediate cities are visited in different orders. ANN is described in Figure 3.
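The ANN idea can be sketched in Python as follows: run the nearest-neighbour construction from every start city, then rotate each tour so that it begins at the initial city. This is an illustrative sketch, not the authors' code. The function names, the 0-based city indices (city 0 plays the role of city 1) and the tie-breaking by smallest index are assumptions.

```python
def nearest_neighbour_tour(C, start):
    """Build a tour from `start`, always moving to the nearest unvisited city."""
    tour = [start]
    unvisited = set(range(len(C))) - {start}
    while unvisited:
        current = tour[-1]
        # Ties are broken by the smallest city index (an assumption).
        nxt = min(unvisited, key=lambda j: (C[current][j], j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def rotate_to_start(tour, city=0):
    """Rotate a cyclic tour so that it begins at `city`."""
    k = tour.index(city)
    return tour[k:] + tour[:k]

def all_nearest_neighbours(C):
    """Return one nearest-neighbour tour per start city, each rotated to city 0."""
    return [rotate_to_start(nearest_neighbour_tour(C, s)) for s in range(len(C))]
```

Different start cities can produce the same tour after rotation, so the `n` tours returned are not necessarily distinct.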
Computational complexity. First, we study the computational complexity of the nearest neighbour heuristic NN which starts from the city 1.
Step 1 consists of selecting the nearest city to city 1. This step requires the comparison of $(n-1)$ edges $(1, 2), (1, 3), \ldots, (1, n)$ to select the edge with minimum cost. In step 2, we compare $(n-2)$ edges and so on until there is only one edge. Therefore, there are $(n-1) + (n-2) + \ldots + 1 = \frac{n(n-1)}{2}$ comparisons. Since ANN generates $n$ different tours, each starting from a distinct first node, the computational complexity of ANN is of the order of $O(n^3)$.

Four permutation rules-based heuristic (4Perm)
The idea of the second heuristic is to check whether or not successive local and slight changes of a tour can lead to a better one. Since a tour can be represented graphically as a Hamiltonian cycle, replacing two carefully selected edges $(i, j)$ and $(k, l)$ of the tour that have no common city with two other edges that are not in the tour can lead to a new feasible tour of lower cost. Following are four proposed rules to perform the edge permutations in a feasible tour $T$ with $n$ edges. The digit 1 in the rule's name indicates the addition to the tour of an edge $(i, j) \in E$ that is currently not in the tour, whereas the digit 0 indicates the removal of an edge from the tour.
Rule 1 (Perm$_{0011}$): Delete the edge $(i, j) \in T$ having the highest cost $c_{ij}$ from the tour. Then, delete the edge $(k, l) \in T$ having the second highest cost $c_{kl}$ and that has no common city with the edge $(i, j)$. Finally, replace the deleted edges by their crossings $(i, l) \notin T$ and $(j, k) \notin T$.
Rule 2 (Perm$_{1100}$): Add to the tour the edge $(i, j) \notin T$ having the lowest cost $c_{ij}$. Then, insert the edge $(k, l) \notin T$ having the second lowest cost $c_{kl}$ and that has no common city with the edge $(i, j)$. Finally, delete the edges $(i, k)$ and $(l, j)$ from the tour.
Rule 3 (Perm$_{1010}$): Add to the tour the edge $(i, j) \notin T$ having the lowest cost $c_{ij}$. Then, compare the costs of the four edges $(i, i_1)$, $(i, i_2)$, $(j, j_1)$ and $(j, j_2)$ that belong to the tour and are adjacent to the cities $i$ and $j$, and exclude the edge with the highest cost. Without loss of generality, suppose that the edge $(j, j_1)$ was excluded from the tour. Next, add to the tour the edge $(j_1, i_2)$ or the edge $(j_1, i_1)$ (the one that keeps the tour feasible); say $(j_1, i_2)$. Finally, delete the edge $(i, i_2)$ from the tour.
Rule 4 (Perm$_{0101}$): Delete from the tour the edge $(i, j) \in T$ having the highest cost $c_{ij}$. Then, insert the edge having the lowest cost from the set $\{(i, k) \notin T, (j, k) \notin T : k \in V\}$, say for example $(i, k)$. Next, delete the edge $(k, l) \in T$ so that the insertion of the edge $(l, j) \notin T$ into $T$ keeps $T$ feasible.
To illustrate the above described process, the permutation rules were applied to the TSP instance shown in Figure 1. The results are detailed in Figure 4. For example, Perm$_{0011}$ deletes the edges $(1, 3)$ and $(4, 5)$ and inserts the edges $(1, 5)$ and $(4, 3)$ into the tour. The resulting tour has a length slightly smaller than the initial one. It is clear from this figure that the four rules apply a slight perturbation to a given tour, but an improvement in cost is not always guaranteed. Even an increase of the total cost of the resulting tour may occur.
Figure 4 shows a cost improvement of the initial tour $T_0$ using the rules Perm$_{0011}$ and Perm$_{0101}$ and a cost increase given by the rules Perm$_{1100}$ and Perm$_{1010}$.
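Rule 1 can be read as a 2-opt style move: once two non-adjacent edges are removed, only one reconnection other than the original yields a single closed tour, and it is obtained by reversing the segment between the two removed edges. The Python sketch below follows that reading. The function names, the 0-based city indices and the position-based tie-breaking are assumptions, and the paper's own implementation may differ.

```python
def tour_length(tour, C):
    """Length of the closed tour, including the return to the first city."""
    n = len(tour)
    return sum(C[tour[p]][tour[(p + 1) % n]] for p in range(n))

def perm_0011(tour, C):
    """Remove the costliest edge and the costliest edge sharing no city with it,
    then reconnect the two paths by reversing the segment between them."""
    n = len(tour)
    if n < 4:
        return list(tour)  # no pair of disjoint edges exists

    def cost(p):
        # Cost of the tour edge leaving position p.
        return C[tour[p]][tour[(p + 1) % n]]

    p1 = max(range(n), key=cost)
    # Positions whose edge shares no city with the edge at p1.
    blocked = {(p1 - 1) % n, p1, (p1 + 1) % n}
    p2 = max((p for p in range(n) if p not in blocked), key=cost)
    a, b = sorted((p1, p2))
    return tour[:a + 1] + tour[a + 1:b + 1][::-1] + tour[b + 1:]
```

As the text notes, the move is applied whether or not it shortens the tour, so a caller should compare `tour_length` before and after.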
It is also clear that a given tour cannot always be improved by applying a single permutation rule. Thus, applying all of them gives a higher chance of cost improvement. In addition, applying these rules iteratively to a given tour may lead to a nearby locally optimal tour (local optimum).
The aim of the 4Perm heuristic is to improve the solutions generated by ANN. The idea is to iteratively apply the four permutation rules to every solution. At every iteration, the best tour obtained is kept for the next iteration. This step generates $n$ new tours with lower or equal costs. This process is repeated until no more cost improvement is obtained. Figure 5 illustrates the heuristic process.
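The iteration described above can be sketched as a small improvement loop that is independent of how the individual rules are coded. In this Python sketch, `rules` is a list of functions that each map a tour to a new tour and `length` evaluates a tour; both names and the function `four_perm` are hypothetical, and the stopping test (no strict improvement) is an assumption drawn from the description.

```python
def four_perm(tour, rules, length):
    """Apply every rule to the current tour, keep the best candidate,
    and stop as soon as no rule gives a strictly shorter tour."""
    best = list(tour)
    best_len = length(best)
    while True:
        # Best tour among the candidates produced by the rules.
        candidate = min((rule(best) for rule in rules), key=length)
        cand_len = length(candidate)
        if cand_len >= best_len:
            return best
        best, best_len = candidate, cand_len
```

Because the tour length strictly decreases at every accepted step and the number of tours is finite, the loop always terminates. Applying it to each of the `n` ANN tours gives the improved initial population.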
Computational complexity. The computational complexity of the 4Perm heuristic is not easy to estimate. The reason is that there are $n$ loops, and in each of them a given tour is slightly modified using the four permutation rules until there is no more improvement. However, the experimental study showed that these loops stop quickly even for large instances and that the improvement of every tour is very small, as shown in Table 2. The experimental study therefore suggests that the computational complexity of the 4Perm method is polynomial.

Evaluation
The evaluation function is used to assign a fitness value to each chromosome in the population. For the TSP, the fitness value is the inverse of the resulting tour's length. Therefore, the higher the fitness, the better the chromosome. The fitness function $f$ of a chromosome $Cr = (g_1, g_2, \ldots, g_n)$ is given by Equation (6).
$$f(Cr) = \frac{1}{\sum_{i=1}^{n-1} c_{g_i g_{i+1}} + c_{g_n g_1}} \qquad (6)$$
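Taking the fitness as the reciprocal of the closed tour length, as described above, a minimal Python sketch is given below. The function name `fitness`, the 0-based city indices and the small cost matrix in the usage lines are illustrative assumptions.

```python
def fitness(chromosome, C):
    """Inverse of the closed tour length encoded by the chromosome."""
    n = len(chromosome)
    # The modulo closes the tour by returning to the first city.
    length = sum(C[chromosome[p]][chromosome[(p + 1) % n]] for p in range(n))
    return 1.0 / length

# Made-up symmetric costs for four cities, for illustration only.
C = [[0, 1, 2, 1],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [1, 2, 1, 0]]
f_short = fitness([0, 1, 2, 3], C)  # tour length 4
f_long = fitness([0, 2, 1, 3], C)   # tour length 6
```

The shorter tour receives the higher fitness, which is the ordering the selection step relies on.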

Selection
The selection is inspired by the natural phenomenon that promotes fitter individuals for reproduction. Meanwhile, some individuals with medium or even low fitness can be useful to introduce their good characteristics in the next generations. We followed this principle for the selection of chromosome candidates. The population is sorted in decreasing order of the chromosomes' fitness values. Then, most of the current fittest chromosomes, some with medium fitness, and a few weak individuals are selected. The selection of the chromosomes for the reproduction phase proceeds as follows. First, 50% of the population is selected, representing the fittest chromosomes. Then, 20% of the population is selected from the chromosomes with medium fitness values. Finally, the last 10% of the chromosomes in the sorted order are selected. This process is illustrated in Figure 6.
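The 50/20/10 split can be sketched in Python as below. The text does not say exactly which chromosomes count as having medium fitness, so this sketch assumes they are the ones ranked immediately after the top block. That choice, the function name and the integer rounding of the shares are all assumptions.

```python
def select_for_reproduction(population, fitness):
    """Select the top 50%, the next 20% (assumed medium band) and the last 10%
    of the population ranked by decreasing fitness."""
    ranked = sorted(population, key=fitness, reverse=True)
    n = len(ranked)
    n_top, n_mid, n_low = n * 50 // 100, n * 20 // 100, n * 10 // 100
    top = ranked[:n_top]
    mid = ranked[n_top:n_top + n_mid]  # assumption: medium band follows the top block
    low = ranked[n - n_low:]           # weakest individuals
    return top + mid + low
```

With a population of 100 this returns 80 candidates, so under this reading 20% of the ranked population takes no part in reproduction.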

Crossover
The crossover operator is the most important component of the GA. The process of crossover consists of combining two individuals to create new ones. These newly created chromosomes are then copied into the new population. In this study, many crossover operators were empirically tested with the aim to choose the most suitable one(s). Two different crossover operators were used, namely Ordered Crossover (OX) proposed by Goldberg (1989) and Sequential Constructive Crossover (SCX) proposed by Zakir (2010).
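For readers unfamiliar with ordered crossover, the sketch below shows one widely used variant on the path representation. A slice of the first parent is copied to the child, and the remaining positions are filled with the missing cities in the order they appear in the second parent, starting after the second cut point. The exact variant used in the paper is not specified, so the function name, the optional cut-point arguments and this filling convention are assumptions.

```python
import random

def ordered_crossover(p1, p2, a=None, b=None, rng=random):
    """Ordered crossover (OX): keep p1[a:b], fill the rest in p2's order."""
    n = len(p1)
    if a is None or b is None:
        a, b = sorted(rng.sample(range(n + 1), 2))  # random cut points, a < b
    child = [None] * n
    child[a:b] = p1[a:b]
    kept = set(p1[a:b])
    # Genes of p2 read cyclically from position b, skipping those already kept.
    donors = [p2[(b + k) % n] for k in range(n) if p2[(b + k) % n] not in kept]
    # Free positions of the child, also visited cyclically from position b.
    slots = [(b + k) % n for k in range(n) if not (a <= (b + k) % n < b)]
    for pos, gene in zip(slots, donors):
        child[pos] = gene
    return child
```

Every city appears exactly once in the child, so the offspring is always a feasible tour and needs no repair step.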

Mutation
The mutation is used so that the genetic algorithm does not get stuck in a local optimum. It is used to maintain the genetic diversity in the population. This operator modifies the genes of a chromosome selected with a mutation probability $p_m$. Different mutation operators were empirically tested, including four operators proposed by Sivanandam and Deepa (2007).
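As an illustration of one such operator, the sketch below implements a reverse sequence mutation in Python, taken here in its usual sense: the genes between two positions are written back in reverse order. The function name, the optional position arguments and the wrapper applying the mutation with probability `p_m` are assumptions.

```python
import random

def reverse_sequence_mutation(chromosome, i=None, j=None, rng=random):
    """Reverse the genes between positions i and j (both inclusive)."""
    if i is None or j is None:
        i, j = sorted(rng.sample(range(len(chromosome)), 2))
    return chromosome[:i] + chromosome[i:j + 1][::-1] + chromosome[j + 1:]

def maybe_mutate(chromosome, p_m, rng=random):
    """Apply the mutation with probability p_m, otherwise return a copy."""
    if rng.random() < p_m:
        return reverse_sequence_mutation(chromosome, rng=rng)
    return list(chromosome)
```

Reversing a segment changes only the two edges at its ends in a symmetric instance, so the mutated tour stays close to the original.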

Insertion method
The insertion method consists of copying the best chromosome from the old population to the new generation (Chakraborty & Chaudhuri, 2003). The solutions resulting from the crossover and mutation stages are also added to the new population so that the population size remains fixed from one generation to another. Forming the new population in this way ensures that the best solution found so far is never lost, which supports the convergence of the GA toward a global optimal solution.

New dynamic crossover and mutation rates
Since the mutation and particularly the crossover rates have an impact on the quality of the solutions generated by the genetic algorithm, the crossover and mutation rates in this method are no longer fixed but vary from one generation to another. These rates depend on the maximum fitness $f_{max}$ and the average fitness $f_{avg}$ among the individuals in the population, as well as on the best known fitness $f_{bkn}$. This idea was inspired by Li (2010). The new dynamic crossover and mutation rates are given in Equations (7) and (8), respectively.
where $k_1$ and $k_2$ are parameters varying in $[0, 1]$ that promote some chromosomes to be selected for crossover and mutation depending on their fitness values. Chromosomes with high fitness values have a greater chance of being selected, and vice versa. This process is expected to help produce fitter chromosomes in future generations.

Genetic algorithm parameter tuning
The genetic algorithm parameter values were empirically tuned. The population size is 100. The genetic algorithm was run 50 independent times. In each run, the cycle of the genetic algorithm, from the evaluation of the current population to the application of the mutation, is repeated 100 times. The best solution of each run is saved. Finally, the best solution over all 50 runs is returned. Applying dynamic crossover and mutation operators with variable probability values helps the GA converge toward good solutions. The experimental results support this, as the obtained results are in general very promising. Moreover, multiple tests were conducted to adjust the values of the different genetic algorithm parameters. It was noticed that the best values for the parameters $k_1$ and $k_2$ are 0.6 and 0.1, respectively.

Experimental results
The ANN and 4Perm heuristics were coded in Python (Version 3.5.x), whereas the proposed genetic algorithm was implemented in Java on a MacBook Pro with a 2.5 GHz Intel Core i5. In order to validate the performance of the developed methods, many benchmarks of different sizes from the TSPLIB website (Reinelt, 1997) have been used. We first tested the heuristics ANN and 4Perm. The computational results are summarized in Table 2, where $C_{ANN}$ and $C_{4Perm}$ represent the lengths of the tours generated by ANN and 4Perm, respectively, and $C^*$ is the best-known tour length. We define the percentage error $pe_H$, given by Equation (9), as a measure of how far the best solution generated by a heuristic $H$ is from the best-known solution. The results clearly show a better performance of 4Perm compared to ANN. The last column of Table 2 shows that the percentage error of 4Perm compared to the best-known solutions varies from 8.5% to 21.6%. This can be explained by the fact that ANN and 4Perm do not have enough ability to avoid local minima. Therefore, the genetic algorithm is used to improve the quality of the solutions generated by ANN and 4Perm.
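Read as a relative gap to the best-known tour length, the percentage error can be sketched as $pe_H = 100 \times (C_H - C^*) / C^*$. This form is assumed from the description rather than copied from the paper, and the numbers in the usage line are made up.

```python
def percentage_error(c_h, c_best):
    """Relative gap, in percent, between a heuristic tour length c_h
    and the best-known tour length c_best."""
    return 100.0 * (c_h - c_best) / c_best

# Made-up lengths: a tour of length 110 against a best-known length of 100.
gap = percentage_error(110, 100)
```

A value of zero means the heuristic matched the best-known solution, and positive values measure how much longer its tour is.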
It was also noted from the computational results that the SCX crossover along with the Reverse Sequence Mutation (RSM) outperforms all other crossover/mutation combinations for almost all the tested problems. Figure 7 shows the experimental results obtained for the Eil76 dataset using different combinations of crossover and mutation operators. From that figure we can see that SCX/RSM is the best crossover/mutation combination. Table 3 shows a comparison of the best solution $C_{GA}$ (the cost provided by the proposed genetic algorithm) with the best-known solution $C^*$. Different combinations of crossover and mutation operators were used. When examining Table 3, it can be seen that the proposed method solves four instances optimally and provides solutions with costs very close to the optimal for all other instances. Table 4 shows that the proposed method was able to generate the same or better solutions than most of the selected methods used for comparison.

Conclusion and future work
In this paper, we presented a new genetic algorithm to solve the symmetric travelling salesman problem. It combines a genetic algorithm with several heuristics to solve this problem. Many crossover and mutation operators have been tested. In addition, dynamic crossover and mutation rates were proposed to increase the efficiency of the proposed approach. These parameters were calculated for every new population of individuals based on the average fitness and the maximum fitness. We also conducted experiments using benchmarks from the TSPLIB and compared the obtained results with those of many methods already proposed in the literature. The experimental results showed that the proposed heuristic performs very well on all tested instances. In addition, it outperforms many existing methods by providing optimal or very near-optimal solutions. Future work could study other versions of the travelling salesman problem, including random and dynamic new cities to visit, more than one salesman, multiple objectives, etc.