Krill herd algorithm with chaotic time interval and elitism scheme

ABSTRACT We propose a new chaotic krill herd (CKH) algorithm, based on the recently developed krill herd (KH) algorithm, to solve global numerical optimization problems. In CKH, chaos characteristics are introduced into the KH so as to further enhance its global search ability. An elitism scheme is also applied to preserve the best krill while the krill positions are updated. This new approach can speed up the global convergence while preserving the advantages of the standard KH, thus making the approach more feasible for a wider range of practical applications. Here, thirteen different chaotic maps are used to tune the time interval of the krill in the KH algorithm. Twenty-four standard benchmark functions are utilized to verify the effectiveness of CKH, and it is demonstrated that, in most cases, the performance of CKH with a proper chaotic map is superior to, or at least highly competitive with, the standard KH and other population-based optimization methods. Highlights A new meta-heuristic algorithm, namely CKH, is proposed for global optimization. 13 different chaotic maps are applied to tune the main parameter of the KH algorithm. The elitism scheme is applied to keep the krill with the best fitness. The CKH algorithm is compared with ten well-known methods.


Introduction
The process of optimization is searching for the vector that produces an optimal value of a function: all feasible values are candidate solutions, and the extreme value is the optimal solution. In general, modern intelligent algorithms are applied to solve optimization problems. A common way to classify these intelligent algorithms is by their nature, which divides optimization algorithms into two main categories: deterministic algorithms and stochastic algorithms. Deterministic algorithms that use gradients, such as hill climbing, follow a rigorous procedure and will generate the same set of solutions if the iterations start from the same initial point. On the other hand, stochastic algorithms, which do not use gradients, often generate different solutions even with the same initial value (Wang & Tan, 2019). Generally speaking, however, the final values, though slightly different, converge to the same optimal solution within a given accuracy. The emergence of metaheuristic optimization methods, as a blessing of the global optimization theorem, has opened up a new facet for optimizing a function.
First presented by Gandomi and Alavi in 2012 and inspired by the herding behaviour of krill individuals, krill herd (KH) is a swarm intelligence approach for optimizing possibly non-differentiable and nonlinear functions in continuous space (Gandomi & Alavi, 2012). In the KH algorithm, the objective function for the krill movement is determined by the minimum distances of each krill individual from the food and from the highest density of the herd. The time-dependent position of a krill individual consists of three main components: (i) movement induced by other individuals, (ii) foraging motion, and (iii) random physical diffusion. One remarkable advantage of the KH algorithm is that it does not need derivative information, because it uses a stochastic search rather than a gradient search. Furthermore, compared with other population-based metaheuristic algorithms, the method requires few control variables, in essence only a single parameter, the time interval Δt, which makes KH easy to implement, robust, and well suited to parallel computation.
KH is an efficient and powerful algorithm in exploitation, but at times it may become trapped in local optima and thus fail to perform the global search well. In KH, the search depends largely on random walks, so a fast convergence cannot be guaranteed. To improve KH for optimization problems, a method has been proposed (Wang et al., 2014c) that introduces a mutation scheme into KH to increase population diversity. Recently, other updating strategies have also been proposed to further improve the performance of the standard KH algorithm (Wang, Gandomi, Alavi, & Deb, 2016g).
On the other hand, recent significant advances in theories and applications of nonlinear dynamics, especially of chaos, have drawn more attention in many fields. One of these fields is the applications of chaos in function optimization methods (Yang, Li, & Cheng, 2007).
Chaotic KH-based methods, first presented here, are proposed in this paper with the aim of accelerating the convergence speed, thus making the approach more feasible for a wider range of practical applications without losing the attractive characteristics of the standard KH. In these algorithms, thirteen different 1-D chaotic maps are used to tune the time interval Δt of the KH; in this way, chaotic maps serve as efficient alternatives to pseudorandom sequences. The proposed method is evaluated on twenty-four standard benchmark functions that have been widely used to verify optimization algorithms for continuous optimization problems. Experimental results show that CKH performs more efficiently and accurately than the standard KH, ACO, BA, CS, DE, ES, GA, HS, PBIL, and PSO. The results reveal the improvement achieved by the new methods through the application of deterministic chaotic signals in place of a constant time interval.
The remainder of this paper is organized as follows. Section 2 gives a brief description of the standard KH algorithm and the 13 chaotic maps. Our proposed CKH method is presented in detail in Section 3. Subsequently, the tuning of the time interval and the selection of the optimal chaotic KH are discussed in Section 4, where our method is also evaluated on twenty-four benchmark functions against ACO, BA, CS, DE, ES, GA, HS, KH, PBIL, and PSO. Finally, Section 5 presents the conclusion and proposals for future work.

Preliminary
This section provides a brief background on the krill herd algorithm and the 13 chaotic maps.

Krill herd algorithm
Krill herd (KH) (Gandomi & Alavi, 2012) is a metaheuristic optimization method for solving optimization problems that imitates the herding of krill swarms in response to specific biological and environmental processes. The time-dependent position of an individual krill in a 2-D surface is determined by three main actions: (1) movement induced by other krill individuals; (2) foraging action; and (3) random physical diffusion.
The KH algorithm adopts the following Lagrangian model in a d-dimensional decision space, as shown in Equation (1):

\frac{dX_i}{dt} = N_i + F_i + D_i \qquad (1)

where N_i is the motion induced by other krill individuals, F_i is the foraging motion, and D_i is the physical diffusion of the ith krill individual.
In the movement induced by other krill individuals, the direction of the induced motion, α_i, is estimated from the target swarm density (target effect), a local swarm density (local effect), and a repulsive swarm density (repulsive effect). For a krill individual, this movement can be given as

N_i^{new} = N^{max} \alpha_i + \omega_n N_i^{old}

where N^{max} is the maximum induced speed, ω_n is the inertia weight of the induced motion in [0, 1], and N_i^{old} is the last induced motion.
The foraging motion is determined by two main factors: the food location and the previous experience of the food location. For the ith krill individual, this motion can be defined as

F_i = V_f \beta_i + \omega_f F_i^{old}

where β_i is the foraging direction estimated from these two effects, V_f is the foraging speed, ω_f is the inertia weight of the foraging motion between 0 and 1, and F_i^{old} is the last foraging motion.
The physical diffusion of the krill individuals can be treated as a random process. This motion can be described in terms of a maximum diffusion speed and a random directional vector:

D_i = D^{max} \delta

where D^{max} is the maximum diffusion speed and δ is a random directional vector whose entries are random values in [−1, 1]. Based on the three above-mentioned movements, and using the different effective parameters of the motion over time, the position vector of a krill individual during the interval from t to t + Δt is formulated as

X_i(t + \Delta t) = X_i(t) + \Delta t \, \frac{dX_i}{dt} \qquad (6)

Note that Δt is one of the most important constants and should be carefully regulated for the given practical optimization problem, because it can be considered a scale factor of the speed vector. Δt depends entirely on the search space and can simply be obtained from

\Delta t = C_t \sum_{j=1}^{NV} (UB_j - LB_j) \qquad (7)

where NV is the total number of variables, and UB_j and LB_j are the upper and lower bounds of the jth variable (j = 1, 2, . . . , NV), respectively; the absolute value of their difference thus characterizes the search space. Clearly, low values of C_t make the krill individuals search the space carefully. More details about the three main motions and the KH algorithm can be found in (Bolaji, Al-Betar, Awadallah, Khader, & Abualigah, 2016; Gandomi & Alavi, 2012; Wang, Gandomi, Alavi, & Gong, 2019).
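For concreteness, the following minimal Python sketch outlines one KH position update for a single krill according to Equations (1), (6), and (7). The function names, the default parameter values (taken from the settings reported later in the experiments), and the way α_i and β_i are supplied are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kh_update_position(X_i, N_old, F_old, alpha_i, beta_i, dt,
                       N_max=0.01, V_f=0.02, D_max=0.005,
                       omega_n=0.5, omega_f=0.5):
    """One KH position update for a single krill (illustrative sketch)."""
    N_new = N_max * alpha_i + omega_n * N_old            # motion induced by other krill
    F_new = V_f * beta_i + omega_f * F_old               # foraging motion
    delta = np.random.uniform(-1.0, 1.0, size=X_i.shape) # random directional vector in [-1, 1]
    D_i = D_max * delta                                   # random physical diffusion
    dX_dt = N_new + F_new + D_i                           # Lagrangian model, Eq. (1)
    X_next = X_i + dt * dX_dt                             # position update, Eq. (6)
    return X_next, N_new, F_new

def time_interval(lb, ub, C_t=0.5):
    """Constant time interval of the standard KH, Eq. (7)."""
    return C_t * float(np.sum(np.asarray(ub) - np.asarray(lb)))
```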

Chaotic maps
In random-based optimization methods, those that use chaotic variables instead of random variables are called chaotic optimization algorithms (COA). Because chaos is non-repetitive and ergodic, such algorithms can carry out overall searches at higher speeds than stochastic searches that rely on probabilities. To this end, 1-D non-invertible maps are used here to produce chaotic sets. In the present study, the following 13 well-known 1-D chaotic maps, listed in Table 1, are applied to form the chaotic KH. More details about the 13 chaotic maps can be found in (Gandomi et al., 2013b).
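By way of illustration, the sketch below generates a normalized chaotic sequence with two such maps. The exact formulations and constants of all thirteen maps follow (Gandomi et al., 2013b), so the Sine and Logistic forms shown here are common textbook variants offered only as an assumption-laden example.

```python
import math

def sine_map(x):
    """Sine map: x_{k+1} = sin(pi * x_k); stays in [0, 1] for x in [0, 1]."""
    return math.sin(math.pi * x)

def logistic_map(x, a=4.0):
    """Logistic map: x_{k+1} = a * x_k * (1 - x_k); chaotic for a = 4."""
    return a * x * (1.0 - x)

def chaotic_sequence(map_fn, x0=0.7, n=50):
    """Iterate a 1-D map to obtain a chaotic sequence in [0, 1]."""
    seq, x = [], x0
    for _ in range(n):
        x = map_fn(x)
        seq.append(x)
    return seq
```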

Our approach: CKH
As presented in Equation (6), the main parameter of the KH is the time interval Δt, which can be considered a scale factor of the speed vector and characterizes the variation of the attraction towards the global best; its value is of vital importance in determining the convergence speed and how well KH works.
The standard KH is very efficient and powerful, but the solutions change only slightly as the optima are approached. In KH, the time interval Δt is an unaltered constant evaluated by Equation (7).
In the standard KH, however, there is no need to keep the time interval constant. In fact, a larger time interval emphasizes exploration at the beginning of the search, while a smaller time interval encourages exploitation and makes the krill individuals search the space carefully towards the end of the search. Therefore, a chaotically varying time interval Δt may be advantageous, and it may also speed up the convergence of the method, as we will see in the next section. Since all chaotic maps are normalized, the values of a chaotic map always lie in the range [0, 1]; chaotic maps can therefore be applied to tune the time interval Δt, and such a chaos-based KH is called the chaotic KH. The pseudo code of the chaotic KH algorithm is summarized in Figure 1.
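The core of the loop in Figure 1 can be pictured with the minimal sketch below, in which a chaotic map value in [0, 1] replaces the constant Δt in each generation. The arguments kh_motions and chaotic_map are placeholders for the three KH motion operators and the selected chaotic map (e.g. the Sine map sketched earlier), not the paper's actual routines.

```python
def ckh_generation(population, fitness_fn, kh_motions, chaotic_map, chaos, lb, ub):
    """One CKH generation (sketch): the time interval Delta t is tuned chaotically.

    kh_motions  : callable applying the three KH motions with a given time interval
    chaotic_map : the selected 1-D chaotic map, returning a value in [0, 1]
    chaos       : current chaotic value in [0, 1]
    """
    fitness = [fitness_fn(x) for x in population]
    chaos = chaotic_map(chaos)                         # next value of the chaotic map
    dt = chaos * sum(u - l for l, u in zip(lb, ub))    # chaos-tuned time interval
    population = kh_motions(population, fitness, dt)   # induced motion + foraging + diffusion
    return population, chaos
```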
Tuning the time interval Δt with chaotic maps also enhances the ability of the KH to avoid becoming trapped in local optima in a multimodal landscape, thus making the method more feasible for a wider range of applications without losing the attractive characteristics of the original method. Compared with other optimization methods, this can be an advantage of the algorithm, as the simulations below show.
In addition, another important improvement is the addition of an elitism scheme to the CKH. As with other population-based optimization methods, we typically incorporate some form of elitism to keep the best solutions in the population. At the beginning of the main cycle of the CKH, the KEEP best solutions are stored in a variable KEEPKRILL; at the end of the cycle, the KEEP worst solutions are replaced by these stored best solutions. This elitism scheme guarantees that the whole population cannot degrade to one with worse fitness than before, and it prevents the best solutions from being ruined by the motion-calculation operator. Note that we use this elitism strategy to save the properties of the krill with the best fitness during the CKH process, so even if the motion-calculation operation corrupts the corresponding krill, the stored copy allows it to be recovered to its former good status if needed; a minimal sketch of this step is given below.
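The following sketch shows one way the elitism step could be implemented, assuming a minimization problem and krill positions stored as arrays with a copy() method; the function names and the keep argument (corresponding to KEEP) are illustrative assumptions.

```python
def store_elites(population, fitness, keep=2):
    """Store copies of the KEEP best krill (the KEEPKRILL variable)."""
    order = sorted(range(len(population)), key=lambda i: fitness[i])
    return [population[i].copy() for i in order[:keep]]

def apply_elitism(population, fitness, keep_krill):
    """Replace the KEEP worst krill with the stored best ones."""
    order = sorted(range(len(population)), key=lambda i: fitness[i])
    worst = order[-len(keep_krill):]                 # indices of the worst krill
    for stored, idx in zip(keep_krill, worst):
        population[idx] = stored.copy()
    return population
```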

Simulation experiments
In this section, we evaluate the performance of our proposed CKH method on global numerical optimization through a series of experiments on benchmark functions.
To allow an unbiased comparison of running times, all the experiments were implemented on a PC with a Pentium IV processor running at 2.0 GHz, 512 MB of RAM and a hard drive of 160 Gbytes. Our implementation was compiled using MATLAB R2012a (7.14) running under Windows XP3. No commercial KH tool was used in the following simulation experiments.
Well-defined problem sets are good for testing the performance of the optimization methods proposed in this work. Based on mathematical functions, benchmark functions can be applied as objective functions to carry out such tests. In our study, twenty-four different benchmark functions are applied to verify our proposed metaheuristic method CKH. The benchmark functions described in Table 2 are standard testing functions. More details of all the benchmark functions can be found in (Ali, Khompatraporn, & Zabinsky, 2005; Yao, Liu, & Lin, 1999).

The performance of CKH with thirteen different chaotic maps
Different chaotic KH variants were benchmarked using fourteen well-known numerical examples. In our present study, we use the benchmark functions F01-F14 to select the optimal chaotic map. These benchmark functions are provided in Table 2.
It is feasible to improve the solution quality by using chaotic maps. In this subsection, the tuning of the time interval Δt is carried out: the value of Δt is driven by the different chaotic maps presented in Subsection 2.2. To this end, all the maps are normalized to [0, 1].
We set the population size NP = 50, the elitism parameter Keep = 2, and the maximum generation Maxgen = 50 for the CKH algorithm. We ran 100 Monte Carlo simulations of the CKH method on the fourteen selected benchmarks to obtain typical performances. Table 3 presents the results: the first half reports the average minima found by the CKH algorithm, averaged over 100 Monte Carlo runs, and the second half reports the absolute best minima found by the CKH algorithm over those runs. The best value obtained for each benchmark is marked in bold, and the values are normalized so that the minimum in each row is 1.00. Note that the normalizations for the average and best values are based on different scales, so values cannot be compared between the two parts. Each of the functions in this experiment has 20 independent variables (i.e. d = 20).
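As a side note, the row-wise normalization used in Table 3 (and later in Tables 4 and 5) amounts to dividing each row of results by its smallest entry so that the best value in each row becomes 1.00; a minimal sketch, assuming strictly positive row minima, is:

```python
import numpy as np

def normalize_rows(results):
    """Scale each row so its minimum equals 1.00 (assumes positive minima)."""
    results = np.asarray(results, dtype=float)
    return results / results.min(axis=1, keepdims=True)
```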
From Table 3, it can be seen that M9 (the Sine map) significantly outperforms and is more effective than the other chaotic maps when multiple runs are made. Therefore, it is selected as the best map for the chaotic KH (CKH), which will be compared and discussed in more detail in the next subsection.

General performance of CKH
In order to demonstrate the superiority of CKH, in this subsection we compare its performance on global numerical optimization problems with that of the standard KH and nine other population-based optimization methods: ACO, BA, CS, DE, ES, GA, HS, PBIL, and PSO. ACO (ant colony optimization) (Dorigo & Stutzle, 2004) is an efficient swarm intelligence algorithm for solving optimization problems based on the pheromone deposition of ants. BA (bat algorithm) is a powerful metaheuristic optimization method inspired by the echolocation behaviour of bats with varying pulse rates of emission and loudness. CS (cuckoo search) is a metaheuristic optimization algorithm inspired by the obligate brood parasitism of some cuckoo species, which lay their eggs in the nests of other host birds. DE (differential evolution) (Storn & Price, 1997) is a simple but robust optimization method that uses the difference between two solutions to probabilistically adapt a third solution. An ES (evolutionary strategy) (Beyer, 2001) is an evolutionary algorithm (Hao, Lim, Ong, Huang, & Wang, 2019) that generally gives equal importance to mutation and recombination and allows more than two parents to produce an offspring. A GA (genetic algorithm) (Goldberg, 1998) is a search heuristic that mimics the process of natural evolution. HS (harmony search) (Geem et al., 2001; Wang et al., 2013b) is a metaheuristic approach inspired by the improvisation process of musicians. PBIL (probability-based incremental learning) (Baluja, 1994) is a type of genetic algorithm in which the genotype of an entire population (a probability vector) is evolved instead of individual members. PSO (particle swarm optimization) (Kennedy & Eberhart, 1995) is also a swarm intelligence algorithm, based on the swarming behaviour of birds and the schooling of fish in nature. In addition, we must point out that, in (Gandomi & Alavi, 2012), Gandomi and Alavi showed that, among all the compared algorithms, KH II (KH with the crossover operator) performed the best, which confirms the robustness of the KH algorithm. Therefore, in our work, we use KH II as the standard KH algorithm. In the following experiments, we use the same parameters for KH and CKH: the foraging speed V_f = 0.02, the maximum diffusion speed D_max = 0.005, and the maximum induced speed N_max = 0.01. For ACO, DE, ES, GA, PBIL, and PSO, we use the same parameters as in (Simon, 2008). For BA, we set the loudness A = 0.95, the pulse rate r = 0.5, and the scaling factor to 0.1; for CS, the discovery rate p_a = 0.25. For HS, we set the harmony memory accepting rate to 0.75 and the pitch adjusting rate to 0.7.
For each algorithm, we set the population size NP = 50, the elitism parameter Keep = 2, and the maximum generation Maxgen = 50. Table 4 shows the average minima found by each method, averaged over 100 Monte Carlo runs, and Table 5 shows the absolute best minima found by each algorithm over the 100 Monte Carlo runs. In other words, Table 4 shows the average performance of each algorithm, while Table 5 shows its best performance. In both tables, the values are normalized so that the minimum in each row is 1.00, and the best value obtained for each test problem is marked in bold. Note that the normalizations in Tables 4 and 5 are based on different scales, so values cannot be compared between the two tables. In our present study, benchmarks F01-F14, F15-F18, F19-F23, and F24 have 20, 2, 4, and 6 independent variables (i.e. d = 20, 2, 4, and 6), respectively.
From Table 4 we see that, on average, CKH is the most effective at finding the objective function minimum, performing best on seventeen of the twenty-four benchmarks (F01-F08, F12-F16, and F21-F23). DE is the second most effective, performing best on eight of the twenty-four benchmarks (F11, F11-F20 and F24) when multiple runs are made. CS and GA are the third most effective, performing best on four of the twenty-four benchmarks (F10, F15, F16, F18 and F09, F16-F18, respectively) when multiple runs are made. Looking carefully at Table 4, for the high-dimensional benchmarks F01-F14 (d = 20), CKH performs significantly better than the other algorithms by and large. However, for benchmarks F15-F24 (d = 2, 4, and 6), there is little difference between CKH and the other algorithms, especially for benchmarks F15-F18 (d = 2). This demonstrates the superiority of CKH, with an appropriate chaotic map, in solving high-dimensional function optimization problems.
Table 5 shows that CKH performs the best on twenty of the twenty-four benchmarks (F01, F03, F06-F08, and F10-F24). DE and KH are the second and third most effective, performing the best on benchmarks F11, F15-F19, F24 and F15, F16, F18, F21-F23, respectively, when multiple runs are made. ACO and GA are the fourth most effective, performing the best on benchmarks F04-F05, F16-F18 and F02, F09, F16-F18, respectively. Among the remaining algorithms, BA, CS, ES, HS, and PSO rank fifth, performing the best on four of the twenty-four benchmarks (F15-F18). In addition, PBIL also works well: it ranks sixth and performs the best on benchmarks F16-F18. Similar to the results shown in Table 4, for benchmarks F01-F14 (d = 20), CKH performs significantly better than the other algorithms, whereas for benchmarks F15-F24 (d = 2, 4, and 6) there is only a slight difference between CKH and the other algorithms. Furthermore, for benchmarks F15-F18 (d = 2), all eleven optimization algorithms perform almost identically; benchmark functions F15-F24 are too simple to reveal the performance differences among the eleven methods. As can be seen from Tables 4 and 5, replacing the time interval Δt with chaotic maps definitely improves the performance of the KH. Moreover, the computational requirements of the eleven optimization algorithms were similar. We collected the average computational time of each optimization algorithm over the 24 benchmarks discussed in this section; the results are shown in the last row of Table 4. For this case, PBIL was clearly the most efficient optimization method, and CKH was the fifth fastest of the eleven methods. However, note that in the great majority of practical applications, the fitness function evaluation is by far the most expensive part of a population-based optimization method.
Further, limited by the length of the paper, only some representative convergence results are presented here.

Figure 2 shows the results obtained by the eleven methods on the F01 Ackley function. The Ackley function is a multimodal function with a narrow global minimum basin (the minimum of F01 is 0) and many minor local optima. From Figure 2 it is clear that CKH is significantly superior to the other algorithms throughout the optimization process, while KH II performs second best on this multimodal benchmark function. All the algorithms start from almost the same point, but CKH has a consistently faster convergence rate. Although slower, DE performs third best at finding the global minimum. All methods clearly outperform the PBIL algorithm.

Figure 3 shows the optimization results for the F07 Rastrigin function, a complex multimodal problem with a unique global minimum (the minimum of F07 is 0) and a great number of local optima. When trying to solve F07, methods may easily become trapped in a local optimum, so a method capable of maintaining larger diversity is likely to produce better solutions. At first glance, it is clear that CKH significantly outperforms all other approaches. Among the other algorithms, PSO has the fastest initial convergence rate towards the global minimum; however, it seems to be attracted to sub-minima, its distance from the global minimum increases slightly, and it is outperformed by CKH after 15 generations. Further, DE and GA also work very well, ranking second and third, respectively, among the eleven algorithms.

Figure 4 shows the results for the F12 Schwefel 2.21 function. It is obvious that CKH has the fastest convergence rate towards the global minimum over the whole optimization process and reaches the optimal solution well ahead of the other algorithms. KH II is inferior only to CKH and performs second best on this unimodal function.

Figure 5 shows the results for the F13 Sphere function, also known as De Jong's function, which has a unique global minimum (the minimum of F13 is 0) and is therefore easy to solve. From Figure 5, CKH has the fastest convergence rate towards the global minimum and outperforms all other approaches. Looking more closely at DE and PSO, PSO converges faster than DE, but DE finally converges to the value found by CKH, whereas PSO does not. KH II does not manage to solve this relatively simple problem within the maximum number of iterations, showing a wide range of obtained results.

Figure 6 shows the results for the F21 Shekel 1 function. Clearly, CKH is significantly superior to the other algorithms during the optimization process, while DE performs second best on this multimodal benchmark function.

Figure 7 shows the results for the F22 Shekel 2 function. Analogous to the F21 Shekel 1 function shown in Figure 6, CKH shows the fastest convergence rate towards the global minimum and significantly outperforms all other approaches. Among the other algorithms, DE performs second best on this multimodal benchmark function, inferior only to CKH, and reaches a minimum very close to that of CKH. Further, KH II also works very well, ranking third among the eleven algorithms. The remaining algorithms (ACO, BA, CS, ES, HS, PBIL, and PSO) do not manage to solve this benchmark function within the maximum number of generations, showing a wide range of obtained results.
From the above analyses of Figures 2-7, we can conclude that CKH's performance is superior to, or at least highly competitive with, that of the other ten acclaimed state-of-the-art population-based algorithms. In general, only DE and KH II come close to CKH. Further, benchmark F07 also illustrates that PSO has a much faster convergence rate initially, while later it converges more and more slowly towards the true objective function value.

Discussion
For all the standard benchmark functions discussed in this section, the CKH has been demonstrated to perform better than, or at least be highly competitive with, the standard KH and other acclaimed state-of-the-art population-based methods. The advantages of CKH include its simplicity, its ease of implementation, and the fact that it requires hardly any parameter tuning. The work conducted here shows the CKH to be robust, powerful, and efficient across all kinds of benchmark functions.
Benchmark testing is a good way to evaluate the performance of metaheuristic methods, but it is not perfect and has some limitations. First, we did not painstakingly tune the optimization methods compared in this section; in general, different tuning parameter values might lead to significant differences in their performance. Second, practical optimization problems may bear little relationship to benchmark functions. Third, benchmark tests may lead to quite different conclusions if the grading criteria or the problem setup change. In our present study, we investigated the mean and best results achieved with a given population size and after a given number of iterations. However, we might draw different conclusions if, for example, we changed the population size, examined how many generations are needed to reach a certain function value, or changed the number of iterations. Regardless of these caveats, the benchmark results shown here are promising for CKH and indicate that this new method might be capable of finding a niche among the plethora of population-based optimization methods.
It should be noted that CPU time is a bottleneck in the implementation of many population-based optimization methods. If a method cannot converge quickly, it will be impractical and infeasible, since it would take too long to find an optimal or near-optimal solution. CKH does not appear to require an unreasonable amount of computation; of the eleven optimization methods compared in this paper, CKH was the fifth fastest. How to further speed up CKH's convergence still deserves scrutiny.
In our present work, 24 benchmark functions are used to verify the performance of our method; we will test the proposed method on more problems, such as the high-dimensional (d ≥ 20) CEC 2010 test suite (Tang, Li, Suganthan, Yang, & Weise, 2010) and practical engineering problems. Moreover, we will compare CKH with further optimization methods. In addition, only unconstrained function optimization is considered in this study; our future work will include adding diversity rules into CKH for constrained optimization problems, such as the CEC 2010 constrained real-parameter optimization test suite (Mallipeddi & Suganthan, 2010).

Conclusion and future work
Chaos has been introduced into the standard KH to develop a novel, improved metaheuristic chaotic KH method for optimization problems. Thirteen different chaotic maps have been investigated to tune the time interval, Δt, of the KH. By comparing the different chaotic KH variants, the algorithm that uses the Sine map to tune Δt is found to be the best chaotic KH (CKH). The results have also revealed the significant improvement brought by the application of deterministic chaotic signals in place of a constant time interval. This new method speeds up the global convergence rate without losing the strong robustness of the standard KH. From the analysis of the experimental results, we observe that the chaos-tuned KH clearly improves the reliability of reaching the global optimum as well as the quality of the solutions. Based on the results of the eleven approaches on the test problems, we conclude that CKH significantly improves the performance of KH on most multimodal and unimodal problems. In addition, CKH is simple and easy to implement.
Further, it can be observed that, compared with other population-based optimization methods, the main advantage of CKH is its smaller number of parameters. All methods have method-dependent parameters, and the performance of a method is largely influenced by them; finding optimal parameter values is therefore often one of the most difficult tasks in applying an optimization method. In this paper, we provided a strategy to address this issue by using chaotic maps to replace such parameters. In CKH, there is only one parameter, Δt, and it is tuned using chaotic maps.
In the field of numerical optimization, many issues are worthy of further study, and more efficient optimization methods should be developed based on the analysis of specific engineering problems. Our future work will focus on two issues. On the one hand, we will apply the proposed CKH method to practical engineering optimization problems, for which CKH appears to be a promising approach. On the other hand, we will develop new metaheuristic methods for solving optimization problems.

Disclosure statement
No potential conflict of interest was reported by the authors.