Improving CPU utilization of interleaving generation parallel evolutionary algorithm with precedence evaluation of tentative solutions and their suspension

This paper proposes a new mechanism to improve the CPU efficiency of parallel evolutionary algorithms (PEAs). The proposed method is based on the interleaving generation evolutionary algorithm (IGEA) proposed in a previous study. Whereas a PEA generates offspring after all individuals are evaluated, IGEA generates offspring whose parents have all been determined before the other evaluations are completed. The proposed method introduces a precedence evaluation of tentative offspring and their suspension mechanism into IGEA. In particular, while IGEA generates offspring only when all of their parents have been determined, the proposed method tentatively generates offspring when one of the two parents has been determined and then begins their evaluations. The evaluation of unnecessary offspring is suspended when the other parent of the tentative offspring is determined. We compare the proposed method with the original IGEA and a simple PEA to investigate its effectiveness. This paper considers two replacement schemes of PEAs, (λ, λ)-PEA and (λ + λ)-PEA. The experimental results reveal that the proposed method has higher CPU utilization than the original IGEA and the simple PEA on both schemes.


Introduction
Evolutionary algorithms (EAs), typified by the genetic algorithm (GA) [1] and genetic programming (GP) [2], have high search capability without any problem-specific knowledge. EAs are widely applied to real-world optimization problems [3,4]. When applying EAs to real-world optimization problems, solution evaluations need much computational time due to, for example, physical simulations to evaluate solutions or measurement of actual consumption time. In such a situation, parallel evolutionary algorithms (PEAs) [5,6] are a possible approach to accelerating the optimization process by parallelizing fitness evaluations. Although PEAs have been widely studied for reducing the enormous computation time of EAs, the conventional PEA has the problem that the parallelization efficiency decreases when the difference in the evaluation time of solutions is significant [7][8][9]. This is because the conventional PEA is generally based on the generation-based approach, where the new population is generated after all solutions are evaluated, so it is necessary to wait for the slowest evaluation.
To improve the computational efficiency of PEAs, asynchronous PEAs (APEAs) have been proposed [10]. Since APEAs generate a new individual immediately after one individual is evaluated, they can eliminate the idling time spent waiting for the evaluation of the slowest individual and utilize 100% of the computing resources regardless of the difference in the evaluation time of solutions. However, APEAs have a problem when the evaluation time is biased. Due to the evaluation time bias, APEAs tend to converge to the search area with a shorter evaluation time [11]. Therefore, APEAs fall into local optima with a shorter evaluation time rather than the global optimum with a longer evaluation time [12].
To overcome such a problem, Pilát and Neruda proposed an interleaving generation evolutionary algorithm (IGEA) [13]. IGEA is based on a generation-based approach, but it can reduce the computation time by generating new offspring whose parents have been determined even when not all individuals have been evaluated. Since IGEA is a generation-based scheme, it does not suffer from falling into local optima with a shorter evaluation time. However, the computational efficiency of IGEA decreases as the number of CPUs increases. This is because the number of unused processors increases as it becomes more probable that one of the two parents has not been evaluated.
In this work, we propose a method for improving the computational efficiency of IGEA. In particular, to increase the CPU utilization of IGEA, the proposed method introduces a precedence evaluation of tentative offspring that are generated when one of the two parents is determined. Additionally, when the other parent of the tentative offspring completes its evaluation, the proposed method suspends the evaluation of tentative offspring that no longer need to be evaluated.
To investigate the effectiveness of the proposed method, we conduct two experiments to compare the performance of the proposed method with the original IGEA and a simple generation-based PEA. Each experiment uses several fitness functions and the different features of the evaluation time. We consider two variants of replacement schemes, (λ, λ)-EA and (λ + λ)-EA.
This paper is an extension of our conference paper [14]. The additional contributions of this paper over the conference paper are as follows:
• This paper implements the proposed method in the (λ + λ)-IGEA variant, whereas our conference paper implemented it only in the (λ, λ)-IGEA variant.
• This paper compares the proposed method not only with the original IGEA but also with the simple generation-based PEA, whereas our conference paper only compared the proposed method with the original IGEA.
• This paper further analyses the behaviour of the precedence evaluation of tentative solutions and the suspension mechanism.
The remainder of this paper is organized as follows. The following section describes the overview of PEAs. Section 3 shows the detail of IGEA. Then, Section 4 describes the detail of the proposed method based on the IGEA. Section 5 describes the experimental settings in which the proposed method is compared with a simple PEA and the original IGEA. Section 6 shows the experimental results and their analyses. Section 7 analyses the effectiveness of the suspension of precedence evaluation, and we discuss the applicability and limitation of the proposed method. Finally, Section 8 concludes this paper and presents future works.

Parallel evolutionary algorithm
Evolutionary algorithms (EAs) are meta-heuristics that search for optimal solutions by imitating the natural evolution of organisms. Since an EA needs a huge amount of computation time for solution evaluations, parallel EAs (PEAs) have been studied.
The master-slave model is a simple and general parallelization model that evaluates the fitness of individuals in parallel [15]. PEAs using the master-slave parallelization model perform the main procedure of EAs, e.g. initialization, genetic operators, selection, and population replacement, on a master node, while the fitness evaluation is performed on multiple processors, called slave nodes, in parallel [16]. An illustration of the master-slave model is shown in Figure 1. The master processor sends the information of the individual to each slave processor. Then, the slave processor calculates the fitness value of the individual based on the received information and returns the result to the master processor.
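The master-slave evaluation step described above can be sketched in Python as follows. This is a minimal illustration and not the implementation used in the experiments: a thread pool from the standard library stands in for the slave nodes, and `evaluate` and `evaluate_generation` are hypothetical names with a placeholder fitness function.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(individual):
    # Placeholder fitness (sphere function), assumed for illustration only.
    # A real-world problem would run a costly simulation here.
    return sum(x * x for x in individual)


def evaluate_generation(population, n_slaves=4):
    # The master hands each individual to a slave and collects the fitness
    # values in population order. Consuming the map() iterator with list()
    # blocks until every evaluation has finished, which is the
    # synchronization point of a generation-based PEA.
    with ThreadPoolExecutor(max_workers=n_slaves) as slaves:
        return list(slaves.map(evaluate, population))


if __name__ == "__main__":
    print(evaluate_generation([[1.0, 2.0], [0.0, 0.5], [3.0, 0.0]]))
```

Because the master waits for the whole generation, a single slow evaluation leaves the other slaves idle, which is the inefficiency that IGEA and the proposed method address.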

Interleaving generation evolutionary algorithm
This section describes previous research related to our proposed method, named interleaving generation evolutionary algorithm (IGEA) [13] proposed by Pilát and Neruda.
IGEA is based on a generation-based PEA and improves its CPU efficiency by reducing the waiting time of the slave nodes. In particular, IGEA generates individuals whose parents have completed their evaluations and starts their evaluations on the slave nodes. This is possible because, in general, only a few parent individuals are needed to generate an offspring individual, and it is enough to wait for the evaluations of those parent individuals. The original IGEA is available in two population replacement schemes, (λ, λ)-EA and (λ + λ)-EA. (λ, λ)-EA generates λ offspring from a population of λ individuals and fully replaces the current population with the newly generated offspring. (λ + λ)-EA, on the other hand, generates λ offspring from a population of λ individuals and selects the best λ individuals from the merged set of the λ parents and the λ offspring.
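The difference between the two replacement schemes can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and minimization of the fitness value is assumed.

```python
def comma_replacement(parents, offspring):
    # (lambda, lambda): the offspring fully replace the current population.
    return list(offspring)


def plus_replacement(parents, offspring, fitness):
    # (lambda + lambda): keep the best lambda individuals from the merged
    # set of parents and offspring (minimization is assumed here).
    lam = len(parents)
    merged = list(parents) + list(offspring)
    return sorted(merged, key=fitness)[:lam]


if __name__ == "__main__":
    parents, offspring = [3.0, 1.0], [2.0, 5.0]
    print(comma_replacement(parents, offspring))      # next population: offspring only
    print(plus_replacement(parents, offspring, abs))  # next population: best of both
```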

(λ, λ)-IGEA
(λ, λ)-IGEA is composed of the main function and the propagation function. The main function and the propagation function are shown in Algorithms 1 and 2, respectively.
In the main function in Algorithm 1, first, an initial population G_0 is randomly generated. G_0 contains the aspirant pairs for tournament selection. Then, individuals in the aspirant list are submitted to the slave nodes and evaluated. The obtained fitness is assigned to the individual completing the evaluation, and then the propagation function in Algorithm 2 is performed.

Algorithm 1 (λ, λ)-IGEA.
1: G_0 ← initial population
2: for all p in G_0 do
3:   if p is among the aspirants in G_0 then
4:     Submit p to a slave node for the evaluation
5:   end if
6: end for
7: while not termination condition do
8:   i, f ← receive an evaluation result from a slave node
9:   Assign fitness f to individual i in G_{i.gen}
10:   Comma-Propagate(i, G)
11: end while
In the propagation function in Algorithm 2, first, all aspirant pairs in the current generation g are examined. If both individuals in an aspirant pair (a and b in Algorithm 2) have already been evaluated, the tournament selection is performed (line 6 in Algorithm 2). The individual selected by the tournament selection is placed at the same index as the aspirant pair index in the array g_sel. Once both parents (p_1 and p_2) have been selected in this way, their offspring (g_off[p] and g_off[p′]) are generated by performing genetic operators. The generated offspring are placed in the offspring array g_off and copied to the parent population of the next generation (g′). If the new offspring are not evaluated and are needed to generate the next offspring, they are submitted to the slave nodes for evaluation. Otherwise, the propagation function is called recursively for the evaluated offspring. Since new offspring are submitted for evaluation in line 18 in Algorithm 2 before the other individuals in the same generation have been evaluated, (λ, λ)-IGEA can improve the efficiency of the slave processors compared to the generation-based PEA. Figure 2 shows an example of the (λ, λ)-IGEA flow at the tournament selection and genetic operators (lines 13-23 in Algorithm 2). In this example, the offspring x_3 and x_4 (corresponding to g_off[p] and g_off[p′] in Algorithm 2) are generated by the winners (p_1 and p_2 in Algorithm 2) of two tournament selections; one is the winner of x_1 and x_5, while the other is the winner of x_3 and x_6. In such a case, the tournament selection can be performed when individuals x_1, x_3, x_5, and x_6 have been evaluated, even if other individuals have not been evaluated.
Algorithm 2 Propagation function of (λ, λ)-IGEA (Comma-Propagate).
[...]
2: [...] ▷ Next generation
3: for all aspirant pairs (a, b) in g that contain ind do
4:   if both a and b are already in g and evaluated then
5:     i ← index of the aspirant pair
6:     g_sel[i] ← better of a and b
7:     if i even then ▷ Get parents
8:       i ← i + [...]
[...]
16: for all new offspring o in g do
17:   if o not evaluated and in aspirants in g then
18:     Submit o for evaluation
19: [...]

(λ + λ)-IGEA
The main function of (λ + λ)-IGEA is the same as the one of (λ, λ)-IGEA, but calls the propagation function shown in Algorithm 3 at line 10 in Algorithm 1.

Algorithm 3 Propagation function of (λ + λ)-IGEA.
[...]
2: [...] ▷ Next generation
3: P ← evaluated parents from g
4: O ← evaluated offspring from g
5: S ← best |P ∪ O| − λ individuals from P ∪ O
6: S ← individuals from S which are not among parents of g
7: for all individuals ind from S do
8:   Add ind to parents of g
9:   for all aspirant pairs (a, b) in g that contain ind do
10:     if both a and b are already in g and evaluated then
11:       i ← index of the aspirant pair
12:       g_sel[i] ← better of a and b
13:       if i even then ▷ Get parents
14:         i ← i + 1
15:       else
16:         i [...]
[...]
In the propagation function in Algorithm 3, when k > λ individuals in the merged set of parents and offspring of the previous generation have been evaluated, the best k − λ individuals are determined to be selected for the next generation. Thus, offspring whose parents are determined can be generated, similarly to (λ, λ)-IGEA. [...] generated by the winners from two tournament selections of p_1, p_5, p_3, and p_4 (p_1 and p_2 in Algorithm 3). In such a case, the tournament selection can be performed when the top 5 individuals are determined.
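The condition under which (λ, λ)-IGEA can generate offspring early, as in the Figure 2 example, can be sketched as follows. This is a simplified illustration rather than the original code: `winner` and `ready_parents` are hypothetical names, the fitness values are made up, and minimization is assumed.

```python
def winner(pair, fitness):
    # Tournament winner of an aspirant pair, or None while either aspirant
    # is still unevaluated (minimization is assumed).
    a, b = pair
    if a not in fitness or b not in fitness:
        return None
    return a if fitness[a] <= fitness[b] else b


def ready_parents(pair_1, pair_2, fitness):
    # Offspring can be generated as soon as both tournaments are decided,
    # regardless of how much of the rest of the generation is evaluated.
    p1, p2 = winner(pair_1, fitness), winner(pair_2, fitness)
    if p1 is None or p2 is None:
        return None
    return p1, p2


if __name__ == "__main__":
    # Only the four aspirants of Figure 2 are evaluated (made-up values).
    fitness = {"x1": 0.3, "x5": 0.9, "x3": 0.7, "x6": 0.2}
    print(ready_parents(("x1", "x5"), ("x3", "x6"), fitness))
```

When any of the four aspirants is missing from `fitness`, the function returns `None`, which corresponds to the waiting situation discussed in the next section.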

Proposed method
In this paper, we propose a modified version of IGEA that introduces the precedence evaluation of tentative offspring for improving the efficiency of the slave processors. This section firstly describes the problem of the original IGEA. Then, we propose the modification of IGEA and show its detailed algorithm.

Problem of IGEA
Since the original IGEA is based on a generation-based EA, the efficiency of its slave processors decreases as the number of processors increases. This is because the number of idle slave processors increases as the time spent waiting for the evaluations of other individuals grows owing to the evaluation time bias. For example, in Figure 2, when an individual x_8 has not been evaluated yet, the tournament selection using x_8 cannot be performed and waiting time occurs. A similar problem arises in (λ + λ)-IGEA. For example, in Figure 3, since the sixth parent individual cannot be determined, the tournament selection using p_6 cannot be performed, and the generation procedure stagnates.

Two modifications
For the problems of the original IGEA described in Section 4.1, we propose a method to improve the efficiency of the slave processors in IGEA by introducing two modifications, the precedence evaluation of tentative offspring and the suspension of unnecessary precedence evaluation.

Precedence evaluation of tentative offspring
In order to reduce the waiting time in IGEA, we propose a method to execute the precedence evaluation of tentative offspring.
In the original (λ, λ)-IGEA, when two parents are determined, new offspring are generated, and their evaluations start. However, even when one of the two parents is determined and the other is not, there are only two possible outcomes for the undetermined parent, namely that either member of the corresponding aspirant pair wins. Based on this fact, our proposed method performs the tournament selection and genetic operations in advance when one of the two parents is determined, and it starts the evaluations of the newly generated tentative offspring. Figure 4 shows an example of the tentative offspring generation. In this figure, three of the four parent candidates, i.e. x_2, x_4, and x_7, have completed their evaluations, and one parent, x_4, is determined as the winner of x_4 and x_7. On the other hand, since the evaluation of x_8 has not been completed, the other parent is not determined. However, in such a case, the other parent is either x_2 or x_8. Then, the proposed method tentatively generates offspring for the case that x_2 wins and for the case that x_8 wins, as shown in Figure 4. As a result, four tentative offspring are generated, and the proposed method starts their evaluations immediately, even though the evaluation of x_8 has not been completed yet.
A similar mechanism can be considered in (λ + λ)-IGEA. In particular, when three of the four parent candidates have been evaluated, one parent is determined, while the other parent is either an individual that has already been evaluated or one that has not been evaluated. In such a case, similarly to the mechanism of the (λ, λ) scheme, tentative offspring can be generated. Figure 5 shows an example of the tentative offspring generation in (λ + λ)-IGEA. In this figure, o_7 and o_8 are generated from the parent candidates p_2, p_4, p_5, and p_6, of which p_2, p_4, and p_5 are determined. In this case, one parent is determined as the winner of p_4 and p_5, while the other parent is either p_2 or the other newly evaluated solution. Then, the proposed method tentatively generates offspring for the case that p_2 wins and starts their evaluations.
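The enumeration of candidate parent pairs in the (λ, λ) example of Figure 4 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: `candidate_parent_pairs` is a hypothetical helper, the fitness values are made up, and minimization is assumed.

```python
def candidate_parent_pairs(pair_1, pair_2, fitness):
    # For each tournament, return the single winner if it is decided;
    # otherwise both aspirants remain possible winners.
    def possible_winners(pair):
        a, b = pair
        if a in fitness and b in fitness:
            return [a if fitness[a] <= fitness[b] else b]
        return [a, b]

    return [(p1, p2)
            for p1 in possible_winners(pair_1)
            for p2 in possible_winners(pair_2)]


if __name__ == "__main__":
    # x8 is still being evaluated; x4 has already beaten x7.
    fitness = {"x2": 0.4, "x4": 0.1, "x7": 0.5}
    pairs = candidate_parent_pairs(("x4", "x7"), ("x2", "x8"), fitness)
    print(pairs)           # one candidate pair per possible outcome
    print(2 * len(pairs))  # tentative offspring, two per candidate pair
```

Each candidate pair yields two offspring through crossover, so the two possible outcomes give the four tentative offspring mentioned in the text. Only the genotype of x_8 is needed to build its tentative offspring, not its fitness.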

Suspension of precedence evaluation
The precedence evaluations of the tentative offspring in the two schemes continue until their evaluations are completed or the remaining parent is determined. If the evaluation of unnecessary tentative offspring continued after the remaining parent is determined, the slave processors would waste computing resources. Therefore, the unnecessary evaluations of tentative offspring need to be suspended and deleted. Thus, the proposed method introduces a mechanism to suspend the evaluations of unnecessary individuals. Figure 6 shows an example of the suspension procedure in (λ, λ)-IGEA. In Figure 4, the evaluation of x_8 was not completed, while later, in Figure 6, its evaluation is completed, and x_2 is determined as the winner of the tournament selection. In such a case, the tentative offspring generated from x_8 and x_4 are no longer necessary. Thus, the proposed method suspends the evaluations of these unnecessary offspring.
A similar mechanism is introduced in the (λ + λ) scheme. Figure 7 shows an example of the suspension procedure of (λ + λ)-IGEA. In this figure, when p_6 is determined, either p_6 or p_2 wins. If p_2 wins, the tentative offspring are kept. However, if p_6, the newly evaluated individual, wins, the precedence evaluation is suspended, and the tentative offspring are deleted. Then, new offspring are generated from p_6 and p_5, and their evaluations start. By this procedure, the proposed method avoids wasting computing resources.
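The bookkeeping behind the suspension step can be sketched as follows. This is a minimal illustration only: `resolve_tentative` and the offspring labels are hypothetical, and actually stopping a running evaluation on a slave node is outside the scope of the sketch.

```python
def resolve_tentative(tentative_jobs, decided_parents):
    # tentative_jobs maps an assumed parent pair to the offspring that are
    # being evaluated under that assumption. Once the real parent pair is
    # known, matching offspring are kept and all others are suspended.
    kept, suspended = [], []
    for assumed_parents, offspring in tentative_jobs.items():
        if assumed_parents == decided_parents:
            kept.extend(offspring)
        else:
            suspended.extend(offspring)
    return kept, suspended


if __name__ == "__main__":
    # Situation of Figures 4 and 6: x8 finishes and x2 wins the tournament.
    jobs = {("x4", "x2"): ["t1", "t2"], ("x4", "x8"): ["t3", "t4"]}
    kept, suspended = resolve_tentative(jobs, ("x4", "x2"))
    print(kept, suspended)
```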

Experiments
To investigate the effectiveness of the proposed method, we conduct experiments to compare the performance of the proposed method with the original IGEA and a simple generation-based PEA. The EA used in our experiments is a real-coded GA that evolves continuous (real-valued) design variables. We consider two schemes, (λ, λ)-EA and (λ + λ)-EA. Hereafter, we denote the proposed IGEA with the (λ, λ) scheme as P-(λ, λ)-IGEA, and the one with the (λ + λ) scheme as P-(λ + λ)-IGEA.
The experiments use various fitness functions and the different features of the evaluation time. The experiments are implemented based on the Python code provided by the authors of [13]. The master-slave parallel processing is implemented in the simulation environment.

Experimental cases
This paper conducts two experiments: the first one uses a constant fitness function, while the second one uses the Rastrigin function.

Experiment 1
Experiment 1 uses a constant fitness function and three different features of the evaluation time. Concretely, we use the following three evaluation time functions:
• Fixed: the evaluation time is fixed to 1 s.
• Uniform: the evaluation time is randomly sampled from the uniform distribution of 1 to 100 s.
• Exponential: the evaluation time is sampled from the exponential distribution of 1 s on average.
The maximum number of evaluations is 10,000 in each method, and each method is performed in 20 trials. We change the number of utilized CPUs as 1, 10, 20, ··· , 90, 100, and compare the evaluation time on the simulation.
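The three evaluation time functions of Experiment 1 can be sketched with Python's standard `random` module as follows. The function names are hypothetical and the sketch is not taken from the experimental code.

```python
import random


def fixed_time(rng):
    # Fixed: every evaluation takes 1 s.
    return 1.0


def uniform_time(rng):
    # Uniform: sampled from the uniform distribution of 1 to 100 s.
    return rng.uniform(1.0, 100.0)


def exponential_time(rng):
    # Exponential: mean of 1 s. expovariate() takes the rate, i.e. 1 / mean.
    return rng.expovariate(1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    for sampler in (fixed_time, uniform_time, exponential_time):
        samples = [sampler(rng) for _ in range(10000)]
        print(sampler.__name__, sum(samples) / len(samples))
```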

Experiment 2
Experiment 2 uses the Rastrigin function, which is a well-known multimodal benchmark problem and is defined as:

f_R(x) = 10n + \sum_{i=1}^{n} \left( x_i^2 - 10 \cos(2 \pi x_i) \right)   (1)

where n is the dimension of the individual. This experiment sets n = 5, which is the same setting as the original work.
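For reference, the standard form of the Rastrigin function can be written in Python as follows. This is a minimal sketch of the textbook definition; the function name is illustrative.

```python
import math


def rastrigin(x):
    # Standard Rastrigin function: 10 n + sum(x_i^2 - 10 cos(2 pi x_i)).
    # The global minimum is 0 at x = (0, ..., 0).
    n = len(x)
    return 10.0 * n + sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x)


if __name__ == "__main__":
    print(rastrigin([0.0] * 5))  # global optimum for n = 5
    print(rastrigin([1.0] * 5))  # a nearby local optimum region
```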
We use evaluation time functions that are negatively or positively correlated with the fitness value. The evaluation time function with the positive correlation is formulated as: [...] where x is the individual whose evaluation time is to be determined, while f_R(x) indicates the fitness value of x calculated with (1). On the other hand, the evaluation time function with the negative correlation is formulated as: [...] where the minimum evaluation time is limited to 1 s, even if the fitness value is greater than 100. The maximum number of evaluations is 10,000 in each method, and each method is performed in 20 trials. We change the number of utilized CPUs as 1, 10, 20, ··· , 90, 100, and compare the evaluation time on the simulation and the transition of the fitness value.
Note that the search performance of the competitive methods (both in the (λ, λ) and (λ + λ) replacement schemes) is identical because there is no change in the optimization result with respect to the same generation. Therefore, the purpose of this experiment is to investigate whether the proposed parallelization scheme can reduce the execution time, and we do not discuss the search performance.

Parameter settings
We use the same experimental parameter settings as the original work. The population size is set to 100. We use the two-point crossover with a probability of 0.8 and the Gaussian mutation (with a standard deviation of 1.0) with a probability of 0.1.
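The genetic operators with these parameter values can be sketched as follows. This is a hedged illustration, not the original implementation: the function names are hypothetical, and applying the mutation probability of 0.1 per gene is an assumption, since the text does not state whether it applies per gene or per individual.

```python
import random


def two_point_crossover(p1, p2, rng):
    # Exchange the segment between two distinct cut points.
    a, b = sorted(rng.sample(range(len(p1) + 1), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]


def gaussian_mutation(ind, rng, sigma=1.0, p_gene=0.1):
    # ASSUMPTION: the probability 0.1 is applied independently to each gene.
    return [x + rng.gauss(0.0, sigma) if rng.random() < p_gene else x for x in ind]


def make_offspring(p1, p2, rng, p_cx=0.8):
    # Crossover with probability 0.8, then mutation of both children.
    if rng.random() < p_cx:
        c1, c2 = two_point_crossover(p1, p2, rng)
    else:
        c1, c2 = list(p1), list(p2)
    return gaussian_mutation(c1, rng), gaussian_mutation(c2, rng)


if __name__ == "__main__":
    rng = random.Random(1)
    print(make_offspring([0.0] * 5, [1.0] * 5, rng))
```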

Evaluation criteria
To compare the performance of the competitive methods, we first calculate the execution time to complete 10,000 evaluations. Then, we calculate the CPU utilization, which is calculated as u = t_e / (N t_w), where N is the number of CPUs used, t_w is the execution time to complete 10,000 evaluations in the simulation of the experiment, and t_e is the sum of all fitness evaluation times.
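The CPU utilization formula can be illustrated with a small worked example. The sketch below is not the simulator used in the experiments: `generation_makespan` is a hypothetical helper that models one synchronous generation, assuming each evaluation is dispatched to whichever CPU becomes free first.

```python
import heapq


def generation_makespan(eval_times, n_cpus):
    # Wall-clock time of one synchronous generation: each evaluation goes to
    # the CPU that becomes free first, and the generation ends when the last
    # evaluation finishes.
    free_at = [0.0] * n_cpus
    heapq.heapify(free_at)
    for t in eval_times:
        heapq.heappush(free_at, heapq.heappop(free_at) + t)
    return max(free_at)


def cpu_utilization(eval_times, n_cpus, wall_time):
    # u = t_e / (N * t_w)
    return sum(eval_times) / (n_cpus * wall_time)


if __name__ == "__main__":
    times = [1.0, 1.0, 1.0, 5.0]          # one slow evaluation
    t_w = generation_makespan(times, 4)   # 5.0: everyone waits for the slowest
    print(t_w, cpu_utilization(times, 4, t_w))
```

With four CPUs and one evaluation that takes five times longer than the others, three CPUs idle for most of the generation, so the utilization is 8 / (4 × 5) = 0.4.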

Result
This section presents the results of the experiments described in Section 5. First, we show the results when using a constant fitness function, and then, the results with the Rastrigin function are presented. Figure 8 shows the CPU utilization when using the fixed evaluation time, the uniform evaluation time, and the exponentially distributed evaluation time. The vertical axis shows the CPU utilization, while the horizontal axis shows the number of CPUs. This result shows the median of 20 trials. The blue lines with circles indicate (λ, λ)-EA and (λ + λ)-EA. The orange lines with squares indicate (λ, λ)-IGEA and (λ + λ)-IGEA, while the red lines with triangles indicate P-(λ, λ)-IGEA and P-(λ + λ)-IGEA. As to the line type, the solid lines indicate the (λ, λ) scheme, while the dashed lines indicate the (λ + λ) one.

CPU utilization
From these results, we can first find that the CPU utilization of a simple PEA decreases as the number of utilized CPUs increases for all evaluation time functions. In particular, for the exponential evaluation time shown in Figure 8(c), the CPU utilization decreases to less than 20%. This indicates that the simple generation-based PEA wastes many computational resources when the variance of the evaluation time is large.
Focusing on the original IGEA and the proposed method, we can generally find that the CPU utilization is 100% up to 30 or 40 CPUs, but it decreases when using more than 50 CPUs. However, the proposed method shows higher CPU utilization than the original IGEA in both schemes of (λ, λ) and (λ + λ) in all features of the evaluation time. In particular, for the fixed evaluation time, as shown in Figure 8(a), the CPU utilization of P-(λ, λ)-IGEA is about 74% when the number of CPUs is 100, while that of (λ, λ)-IGEA is about 71%. Meanwhile, the CPU utilization of P-(λ + λ)-IGEA is about 83%, while that of (λ + λ)-IGEA is about 82%.
On the other hand, when the evaluation times differ, i.e. for the uniform evaluation time and the exponentially distributed evaluation time, the difference between the proposed method and the original IGEA is larger in both replacement schemes. For the uniform evaluation time, when the number of CPUs is 100, the CPU utilization of P-(λ, λ)-IGEA is about 55%, while that of (λ, λ)-IGEA is about 42%. In addition, the CPU utilization of P-(λ + λ)-IGEA is about 70%, while that of (λ + λ)-IGEA is about 49%. For the exponentially distributed evaluation time, when the number of CPUs is 100, the CPU utilization of P-(λ, λ)-IGEA is about 32%, while that of (λ, λ)-IGEA is about 26%. In addition, the CPU utilization of P-(λ + λ)-IGEA is about 44%, while that of (λ + λ)-IGEA is about 30%.
These results reveal that the proposed method improves the CPU utilization of PEAs by generating precedence offspring. In particular, the proposed method improves the CPU utilization by up to 21 percentage points compared with the original IGEA when 100 CPUs are utilized (about 70% versus about 49% in the (λ + λ) scheme with the uniform evaluation time).

Execution time
Tables 1-3 show the execution times to complete the maximum number of evaluations and the occurrence percentage of precedence evaluations for each number of CPUs in the fixed evaluation time, the uniform evaluation time, and the exponential evaluation time, respectively. In these tables, T_PEA indicates the reduction ratio of the computational time of the proposed method compared with the simple PEA, while T_IGEA indicates that compared with the original IGEA. These ratios are calculated as: [...] [...] indicates the occurrence percentage of precedence evaluations in P-(λ, λ)-IGEA and P-(λ + λ)-IGEA, which is calculated as: (#precedence evaluations / #all evaluations) × 100, where #precedence evaluations means the number of precedence evaluations during the evolution, while #all evaluations means the number of all evaluations, i.e. 10,000 evaluations in our experiments. As shown in Tables 1-3, when the number of CPUs is small, the execution time of the proposed IGEA is almost the same as that of the original IGEA. In these cases, since no precedence evaluation is executed, the proposed method shows the same behaviour as the original IGEA. On the other hand, the execution time of the proposed IGEA becomes shorter than that of the original IGEA as the number of CPUs increases, and when 100 CPUs are utilized, the proposed method reduces the execution time by up to 62.44% relative to the simple PEA and by up to 32.65% relative to the original IGEA. When a large number of cores is used, the occurrence percentage of precedence evaluations increases. In particular, more than 50% of newly generated individuals are tentative offspring for the precedence evaluations when using 100 CPUs. This contributes to improving the computational efficiency of IGEA. This result reveals that the proposed precedence evaluation of tentative offspring and their suspension shorten the execution time of PEAs. Figure 9 shows the CPU utilization for the evaluation times with negative and positive correlation when solving the Rastrigin function. The axes and the lines in these figures have the same meaning as in Figure 8.

CPU utilization
As shown in Figure 9(a), the CPU utilization decreases from 50 CPUs in both the proposed method and the original IGEA when using the evaluation time with a negative correlation. The proposed method achieves higher CPU utilization than the original IGEA in both replacement schemes. When the number of CPUs is 100, the CPU utilization of (λ, λ)-IGEA is about 68%, while that of P-(λ, λ)-IGEA is about 73%. In addition, the CPU utilization of (λ + λ)-IGEA is about 79%, while that of P-(λ + λ)-IGEA is about 80%. On the other hand, as shown in Figure 9(b), the CPU utilization decreases from 10 CPUs in both the proposed method and the original IGEA when using the evaluation time with a positive correlation. As in the case of a negative correlation, the proposed method achieves higher CPU utilization than the original IGEA in both replacement schemes. When the number of CPUs is 100, the CPU utilization of (λ, λ)-IGEA is about 27%, while that of P-(λ, λ)-IGEA is about 33%. In addition, the CPU utilization of (λ + λ)-IGEA is about 29%, while that of P-(λ + λ)-IGEA is about 43%.
These results indicate that the proposed method can improve the CPU utilization over the simple PEA and the original IGEA even when there is a correlation between the fitness value and the evaluation time. Tables 4 and 5 show the execution time to complete 10,000 evaluations and the occurrence percentage of precedence evaluations for each number of CPUs in the simple PEA, the original IGEA, and the proposed method. A tendency similar to the results of the constant fitness function can be found. When the number of CPUs is small, the execution time of the proposed method is slightly longer than that of IGEA. In these cases, since no precedence evaluations are executed, the difference in the execution time is negligible. On the other hand, the reduction ratio of the proposed method increases as the number of CPUs increases. As the number of used CPUs increases, the percentage of precedence evaluations increases, and almost half of the newly generated individuals are tentative offspring for the precedence evaluation when using 100 CPUs. This indicates that the proposed method contributes to decreasing the execution time when using a large number of CPUs.

Fitness transition
Finally, the fitness transitions with the negative and the positive correlation evaluation times are shown in Figure 10. The vertical axis shows the logarithm of fitness, while the horizontal axis shows the elapsed simulation time. This result shows the median of [...] From Figure 10(a), when we compare all methods at the same elapsed time, the proposed IGEA, in both the (λ, λ) and (λ + λ) schemes, obtains a fitness function value equivalent to those of the simple PEA and the original IGEA. This indicates that the proposed IGEA maintains the search efficiency of the simple PEA and the original IGEA when the evaluation time negatively correlates with the fitness value. On the other hand, from Figure 10(b), when comparing each method at the same elapsed time in the case where the evaluation time positively correlates with the fitness value, the proposed IGEA obtains a better fitness function value than the original IGEA. This indicates that the proposed IGEA has a search efficiency equal to or better than the simple PEA and the original IGEA when there is a positive correlation in the evaluation time.
These results reveal that the proposed method can improve the CPU utilization regardless of the feature of the evaluation time and can achieve the equivalent search capability to the original IGEA. The previous research has shown that the original IGEA does not affect the search performance even when there is a bias in the evaluation time. The proposed method inherits the search performance of the original IGEA while improving CPU utilization, which is not affected by the evaluation time bias. Overall, these results show that P-(λ + λ)-IGEA performs best.

Discussion
This section further analyses and discusses the proposed method. First, the following subsection analyses the effectiveness of the evaluation suspension in the proposed method. Then, we consider the influence of the population size and the tournament size on the proposed method. Finally, we discuss the range of applications in the proposed method.

Effectiveness of the suspension of precedence evaluation
Our proposed method introduces the evaluation suspension, which suspends and deletes tentative offspring whose evaluations are no longer necessary once their parents have been decided. In this subsection, we analyse the effectiveness of the evaluation suspension. Figure 11 shows the CPU utilization of the proposed method with and without the evaluation suspension. The horizontal axis shows the number of used CPUs, while the vertical axis shows the CPU utilization. The red lines with triangles indicate the result of the proposed method with the evaluation suspension, while the blue ones with inverse triangles indicate the result of the proposed method without the suspension.
From this figure, in the (λ, λ) scheme, the CPU utilization without evaluation suspension is consistently higher than that of the proposed method using the evaluation suspension. On the other hand, in the (λ + λ) scheme, there is no difference between the proposed method with and without evaluation suspension for the constant fitness function shown in Figure 11(a)-(c). This is because tentative offspring are always kept for evaluation when the fitness function is constant. Meanwhile, in Figure 11(d) and (e), in which the Rastrigin function is used, the CPU utilization increases if the suspension of precedence evaluation is not performed. These results indicate that the evaluation suspension successfully reduces the unnecessary use of CPUs for both the (λ, λ) and (λ + λ) schemes. Figure 12 shows the fitness transition of the proposed method with and without evaluation suspension. The horizontal axis shows the elapsed execution time, while the vertical axis shows the fitness value. From these figures, there is no large difference between the proposed method with and without evaluation suspension in both the (λ, λ) and (λ + λ) schemes. This indicates that the proposed method without the suspension merely increases the CPU utilization without any improvement of the search efficiency. Therefore, the suspension of precedence evaluation is an important mechanism for reducing the execution time without consuming CPU resources unnecessarily.

Influence of population size and tournament size
We consider the influence of the population size and the tournament size on the efficiency of the proposed method. First, as the population size increases, the number of combinations of aspirant pairs increases, and therefore the percentage of tentative offspring that can be generated may decrease. In this case, the CPU utilization is expected to decrease towards that of IGEA without the precedence evaluation because precedence evaluations occur less frequently. Similarly, if the tournament size increases, the efficiency of the proposed method is also expected to decrease because the number of individuals required to generate new (tentative) offspring increases. From this point of view, we are concerned that the proposed method will be adversely affected by increasing the population size and the tournament size.
To overcome these problems, the CPU utilization can be improved by changing the condition for generating tentative offspring so that the precedence evaluation is actively performed even when the population size and the tournament size are large. Specifically, P-(λ, λ)-IGEA in the proposed method generates four tentative offspring when three of the four individuals necessary for generating offspring are determined. Relaxing this condition, for example by generating eight tentative offspring when two of the four individuals are determined, can improve the CPU efficiency. This relaxation reduces the waiting time when the population size or the tournament size increases and allows the available computing resources to be utilized as much as possible. This analysis will be tackled in future research.
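One reading that reproduces the counts quoted above (four and eight tentative offspring) is sketched below. It rests on assumptions that this section does not state: the tournament size is 2, each crossover yields two children, and an undecided tournament still has two possible winners. The function names and the constant `CHILDREN_PER_CROSSOVER` are hypothetical.

```python
from itertools import product

CHILDREN_PER_CROSSOVER = 2  # assumption: one crossover yields two children


def tentative_pairs(candidates_a, candidates_b):
    """Enumerate every parent pair that is still possible.

    Each argument lists the possible winners of one tournament:
    one entry once the tournament is decided, two while it is open.
    """
    return list(product(candidates_a, candidates_b))


def count_tentative_offspring(candidates_a, candidates_b):
    """Number of offspring to evaluate if every possible pair is tried."""
    return CHILDREN_PER_CROSSOVER * len(tentative_pairs(candidates_a, candidates_b))


# One tournament decided, the other still open: 1 x 2 pairs -> 4 offspring.
current_condition = count_tentative_offspring(["a1"], ["b1", "b2"])

# Both tournaments still open: 2 x 2 pairs -> 8 offspring.
relaxed_condition = count_tentative_offspring(["a1", "a2"], ["b1", "b2"])
```

Under this reading, the number of tentative evaluations grows with the product of the open candidates, so relaxing the condition trades additional speculative work for less idle time.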

Applicability of the proposed method
Finally, this subsection discusses the applicability of the proposed method. Although this paper used a real-coded GA in the experiments, the application scope of the proposed method is not limited to it. However, it should be noted that the proposed method cannot be applied to all existing EA methods. In particular, the following conditions must be met to apply the proposed method.

Condition 1: Offspring can be generated with only a few individuals in the population.
Condition 2: Some of the individuals in the new population can be determined without evaluating all individuals.
The first condition is necessary because the proposed method achieves the interleaving generation by generating offspring without waiting for the entire population to be evaluated. For example, roulette-wheel selection does not satisfy Condition 1 because it requires evaluating all individuals in the population. Another example is particle swarm optimization (PSO) [17]. Since PSO uses the global best to calculate the next position and velocity, it requires the evaluations of the entire population and does not satisfy Condition 1.
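The difference between the two selection operators with respect to Condition 1 can be illustrated with a short Python sketch. It assumes maximization, non-negative fitness values, and that a fitness of `None` marks an individual whose evaluation has not finished; the function names are hypothetical.

```python
import random


def tournament_select(fitness, entrants):
    """Return the index of the best entrant.

    Only the fitness values of the entrants are read, so the rest of
    the population may still be under evaluation.
    """
    return max(entrants, key=lambda i: fitness[i])


def roulette_select(fitness, rng):
    """Fitness-proportionate selection over the whole population.

    The total fitness is needed, so every evaluation must be finished.
    """
    if any(f is None for f in fitness):
        raise ValueError("roulette-wheel selection needs all evaluations")
    total = sum(fitness)
    r = rng.uniform(0.0, total)
    acc = 0.0
    for i, f in enumerate(fitness):
        acc += f
        if r <= acc:
            return i
    return len(fitness) - 1


# Individuals 1 and 3 are still being evaluated.
fitness = [3.0, None, 5.0, None]
winner = tournament_select(fitness, entrants=[0, 2])  # works with partial results
```

Calling `roulette_select(fitness, random.Random(0))` on the same partially evaluated population raises an error in this sketch, which mirrors why roulette-wheel selection forces a wait for the slowest evaluation.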
The second condition is also necessary. If the new population is selected depending on the evaluations of the entire population, no aspirant is determined without waiting for all evaluations. For example, IBEA [18], a well-known multi-objective EA, selects the next
On the other hand, if these conditions are satisfied, the proposed method can be applied not only to the real-coded GA but also to other EA methods for combinatorial or structural optimization. One applicable example is differential evolution (DE) [19], which satisfies both conditions. Specifically, DE satisfies Condition 1 because an offspring can be generated with only the parent individual and three randomly selected individuals. DE can also determine the next population by comparing each parent with its offspring, so Condition 2 is satisfied as well. The details are discussed in another work of ours [20].
Another example is crossover methods with multiple parents, such as SPX [21] and UNDX [22]. Since these crossover methods generate offspring from some, but not all, individuals in the population, the proposed method can be applied to them. However, when a crossover requiring multiple parents is used, the efficiency of the precedence evaluation decreases as the number of required parents increases because the method needs to wait until all of those parents have been evaluated.
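The two properties of DE mentioned above can be made concrete with a minimal Python sketch. It assumes the common DE/rand/1/bin variant, minimization, and the parameter values `F=0.5` and `CR=0.9`; none of these choices is taken from this paper, and the function names are hypothetical.

```python
import random


def de_trial(population, i, rng, F=0.5, CR=0.9):
    """Build a trial vector for parent i (DE/rand/1/bin, assumed variant).

    Only the parent and three other randomly chosen individuals are
    read, which is the property related to Condition 1.
    """
    others = [j for j in range(len(population)) if j != i]
    r1, r2, r3 = rng.sample(others, 3)
    target = population[i]
    j_rand = rng.randrange(len(target))  # at least one gene comes from the mutant
    trial = []
    for d in range(len(target)):
        if rng.random() < CR or d == j_rand:
            mutant = population[r1][d] + F * (population[r2][d] - population[r3][d])
            trial.append(mutant)
        else:
            trial.append(target[d])
    return trial


def de_replace(target, target_fit, trial, trial_fit):
    """One-to-one survivor selection (minimization assumed).

    The slot is decided from the parent and its own offspring alone,
    which is the property related to Condition 2.
    """
    if trial_fit <= target_fit:
        return trial, trial_fit
    return target, target_fit


rng = random.Random(0)
population = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
trial = de_trial(population, 0, rng)
```

Because `de_replace` needs only two fitness values, each slot of the next population can be fixed as soon as its own offspring has been evaluated.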

Conclusion
This paper proposed an improved IGEA that introduces the precedence evaluation of tentative offspring and its suspension. To investigate the effectiveness of the proposed method, we compared the performance of the proposed IGEA with the original IGEA and a simple generation-based PEA. In the experiments, we considered two replacement schemes. One is (λ, λ)-EA, which replaces the population with newly generated offspring. The other is (λ + λ)-EA, which selects the best λ individuals from the current λ individuals and λ offspring.
In these experiments, we used a constant fitness function with three evaluation time functions: fixed to 1 s, sampled from the uniform distribution over 1 to 100 s, and sampled from the exponential distribution with a mean of 1 s. In addition, we used the Rastrigin function with evaluation time functions that correlate negatively or positively with the fitness value. The experimental results showed that the proposed (λ, λ)-IGEA has higher CPU utilization and shorter execution time than the original (λ, λ)-IGEA and the (λ, λ)-PEA regardless of the features of the fitness function and the evaluation time. For the (λ + λ) version, the experimental results showed a tendency similar to that of the (λ, λ) version. In addition, the experimental results showed that the search capability of the proposed method is almost the same as that of the original IGEA for both versions. Overall, the proposed IGEA with the (λ + λ) replacement scheme showed the best performance.
In the near future, we will propose a method that further improves the CPU utilization. We will also analyse the influence of the parameters (population size and tournament size) on the proposed method. In addition, we will implement concrete EAs with the proposed method and compare them with the original IGEA.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by Japan Society for the Promotion of Science Grant-in-Aid for Young Scientists [grant number JP19K20362].