An improved Monte Carlo Tree Search approach to workflow scheduling

Workflow computing has become an essential part of many scientific and engineering fields, while workflow scheduling has long been known to be an NP-complete problem. Most previous work falls into two categories: heuristic-based and guided random-search-based workflow scheduling methods. Monte Carlo Tree Search (MCTS) is a relatively recent search methodology that has achieved great success in AI research on game playing, such as Computer Go. Researchers have also found that MCTS has potential applications in many other domains, including combinatorial optimization, task scheduling, and planning. In this paper, we present a new workflow scheduling approach based on MCTS, a direction that remains rarely explored. Several new mechanisms are developed for the major steps of MCTS to further improve workflow execution schedules. Experimental results show that our approach significantly outperforms previous methods in terms of execution makespan.


Introduction
As the computational structure of modern scientific and engineering applications grows ever more complex and large-scale, workflow computing has become an inevitable parallel processing model. To improve computational performance, workflow scheduling (Topcuoglu et al., 2002) has become an important research topic on modern parallel processing platforms, including various high-performance computing platforms and cloud computing environments. Many methods have been proposed in the literature to tackle the challenging NP-complete workflow scheduling problem. Most of them are heuristic-based algorithms such as HEFT (Topcuoglu et al., 2002), PEFT (Arabnejad & Barbosa, 2014), and IPPTS (Djigal et al., 2021). Another well-known category of workflow scheduling approaches is based on meta-heuristic methods (Keshanchi et al., 2017; Xu et al., 2014), e.g. the genetic algorithm (Srinivas & Patnaik, 1994) and other evolutionary algorithms.
Search-based approaches received less attention in previous workflow scheduling studies because of the search space explosion problem, especially when dealing with large-scale workflows and parallel computing platforms. However, the recent development of advanced search methods has made search-based approaches a potentially promising direction for workflow scheduling. Monte Carlo Tree Search (MCTS) (Browne et al., 2012) is one of the most well-known advanced search methods and has shown great performance in several application domains (Chaslot et al., 2006).
Traditionally, MCTS was used in computer programs playing turn-based strategy games, such as Go, chess, shogi, and poker. However, some previous research has tried to apply it to specific optimization problems, such as production management (Chaslot et al., 2006), and found that it has the potential to outperform commonly used heuristic-based algorithms. Recently, MCTS was applied to the workflow scheduling problem in (Liu et al., 2019). In addition to applying the traditional MCTS method, the authors also developed a pruning algorithm to reduce the search space in MCTS and thus improve search efficiency and effectiveness. The proposed approach leverages the idea of branch-and-bound and is called MCTS-BB (Liu et al., 2019).
The experimental results in (Liu et al., 2019) show that MCTS outperforms well-known heuristic algorithms for workflow scheduling, including HEFT, PEFT and CPOP (Topcuoglu et al., 2002). However, the experimental results also reveal that in general, the performance of MCTS-BB is worse than the original MCTS approach, indicating that the proposed pruning algorithm is not as effective as expected. The pruning algorithm in (Liu et al., 2019) relies on the makespan of the current partial schedule and the length of the critical path among unscheduled tasks to determine the upper bound guiding the search process. In this paper, we present new pruning and heuristic-guiding mechanisms to further improve the performance of MCTS in workflow scheduling. Experimental results show that our new approaches can achieve better performance than the original MCTS and MCTS-BB methods proposed in (Liu et al., 2019).
The remainder of this paper is organized as follows. In the next section, we present the formulation of the workflow scheduling problem. Section III discusses related works of heuristic-based and guided random-search-based workflow scheduling methods. Section IV illustrates how MCTS could be applied to solve the workflow scheduling problem and presents the MCTS-BB methods in (Liu et al., 2019). Section V describes our new heuristic guided MCTS approach for workflow scheduling. Section VI presents the overall performance evaluation of our improved MCTS approach for workflow scheduling with a comparison to well-known previous methods in the literature. Section VII concludes this paper and sheds some light on promising future research directions.

Workflow scheduling problem
In most workflow scheduling research works (Arabnejad & Barbosa, 2014;Liu et al., 2019;Topcuoglu et al., 2002), a workflow is usually represented by a Directed Acyclic Graph (DAG), i.e. G = (V, E), for describing the inter-task structure, where V represents the set of nodes and E is the set of edges. Figure 1 shows a workflow example represented by a DAG. Each node in the workflow represents a computational task with the annotated value indicating the required execution time. The directed edges between tasks represent the computational or data interdependence among them and thus reduce potential task parallelism and enforce specific execution order of all tasks. The number next to an edge indicates the required data transfer time between tasks. The workflow scheduling problem on a parallel computing platform is to determine the start time and allocated processor for each task in a workflow, aiming for optimizing a specific goal, e.g. execution makespan.
When speed heterogeneity of processors is considered in workflow scheduling, the required execution time for each task in the DAG is usually set to the average execution time across all processors in the system (Arabnejad & Barbosa, 2014; Topcuoglu et al., 2002) as follows, where w_{i,j} is the required execution time of task i on processor j and there are q processors in total.

w̄_i = (1/q) Σ_{j=1}^{q} w_{i,j}  (1)
For two tasks i and j executed on processors m and n respectively, the required communication time c_{i,j} for the edge connecting nodes n_i and n_j in the DAG is usually measured by summing the latency L_{m,n} and the data transfer time, which is obtained by dividing the data size data_{i,j} by the data transfer rate B_{m,n}, i.e. the network bandwidth, between the two processors running tasks i and j.

c_{i,j} = L_{m,n} + data_{i,j} / B_{m,n}  (2)
When a heterogeneous computing environment is concerned, the weight of an edge c_{i,j} in the workflow's DAG can be represented by the average communication time for transferring data of size data_{i,j} across all possible network links in the system as follows, where L̄ and B̄ denote the average latency and average bandwidth over all links.

c̄_{i,j} = L̄ + data_{i,j} / B̄  (3)
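The averaging just described can be written as two small helpers. This is our own sketch with made-up numbers, not code from the paper:

```python
# Average computation and communication costs used as DAG node/edge
# weights in a heterogeneous system (toy data, for illustration only).

def avg_execution_time(w_i):
    """Average of task i's execution times over all q processors."""
    return sum(w_i) / len(w_i)

def avg_communication_time(data_size, latencies, bandwidths):
    """Average latency plus data size divided by average bandwidth."""
    avg_latency = sum(latencies) / len(latencies)
    avg_bandwidth = sum(bandwidths) / len(bandwidths)
    return avg_latency + data_size / avg_bandwidth

print(avg_execution_time([10, 14, 18]))               # 14.0
print(avg_communication_time(100, [1, 3], [20, 30]))  # 2 + 100/25 = 6.0
```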
The Earliest Start Time (EST) and Earliest Finish Time (EFT) of a task are the earliest possible times at which it can start and finish its execution, respectively. EST and EFT depend on both data and resource availability, while EFT additionally depends on the task's workload.
A workflow is assumed to have exactly one entry node without any precedent tasks and one exit node with no following tasks. EST of the entry node is defined to be zero.
EST(n_entry, p_j) = 0  (4)

The ESTs of the other non-entry nodes allocated onto specific processors are computed recursively from the entry node as follows, where avail[j] indicates the earliest available time of processor j and AFT() returns the actual finish time of a task in a partial schedule.

EST(n_i, p_j) = max{ avail[j], max_{n_m ∈ pred(n_i)} ( AFT(n_m) + c_{m,i} ) }  (5)
The EFT of a task on a specific processor is the sum of its EST and its required execution time on that processor.

EFT(n_i, p_j) = EST(n_i, p_j) + w_{i,j}  (6)
The makespan of a workflow is defined to be the actual finish time of the exit task, which is a common optimization goal of workflow scheduling algorithms.
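The EST/EFT recursion and the makespan definition above can be illustrated with a deliberately simplified single-processor sketch (our own toy DAG and cost values, not the paper's code):

```python
# Minimal sketch of the EST/EFT recursion on one processor timeline.
# tasks must be given in topological order; edges maps (u, v) -> comm time.

def schedule_est_eft(tasks, edges, exec_time):
    aft = {}      # actual finish time of each scheduled task
    avail = 0.0   # earliest available time of the (single) processor
    for t in tasks:
        preds = [(u, c) for (u, v), c in edges.items() if v == t]
        data_ready = max((aft[u] + c for u, c in preds), default=0.0)
        est = max(avail, data_ready)   # EST: resource and data availability
        eft = est + exec_time[t]       # EFT = EST + execution time
        aft[t] = eft
        avail = eft                    # single processor: tasks serialize
    return aft

aft = schedule_est_eft(["A", "B"], {("A", "B"): 0.0}, {"A": 3.0, "B": 2.0})
print(aft["B"])  # makespan = AFT of the exit task = 5.0
```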

Related works
Minimization of execution makespan has long been an important topic in workflow scheduling research. Recently, as new parallel and distributed computing platforms such as cloud and fog computing emerge, research works have also explored new issues on such platforms (Li et al., 2021; Wang & Zuo, 2021; Wu et al., 2020; Yadav et al., 2021, 2022). Some recent works also deal with multi-objective scheduling problems, considering extra optimization goals such as energy saving in addition to workflow execution makespan (Hu et al., 2021; Wu et al., 2020; Yadav et al., 2021, 2022). In this paper, we focus on the most fundamental workflow scheduling problem, which aims to minimize workflow execution makespan on a parallel processor platform. Most previously proposed approaches for this scheduling problem can be roughly classified into two categories: heuristic-based (Arabnejad & Barbosa, 2014; Djigal et al., 2021; Topcuoglu et al., 2002) and guided random-search-based (Keshanchi et al., 2017) methods.

HEFT
Heterogeneous Earliest Finish Time (HEFT, Topcuoglu et al., 2002) is one of the most well-known heuristic-based workflow scheduling algorithms, featuring simplicity, efficiency, and effectiveness. HEFT starts with a task prioritizing phase that calculates task priorities using an upward ranking mechanism, followed by a processor selection phase that allocates each task onto the processor minimizing its finish time. The upward rank of a task is calculated recursively as follows, where succ(n_i) is the set of immediate successors of task n_i and the rank of the exit task is its average execution time.

rank_u(n_i) = w̄_i + max_{n_j ∈ succ(n_i)} ( c̄_{i,j} + rank_u(n_j) )  (7)
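The upward rank recursion can be sketched in a few lines; the toy DAG and average costs below are our own illustration, where `succ` maps each task to (successor, average communication cost) pairs:

```python
# Recursive upward rank: a task's average cost plus the most expensive
# path (communication + successor rank) to the exit task.
from functools import lru_cache

w_avg = {"A": 3.0, "B": 2.0, "C": 4.0}                 # assumed avg costs
succ = {"A": [("B", 1.0), ("C", 2.0)], "B": [], "C": []}

@lru_cache(maxsize=None)
def rank_u(task):
    if not succ[task]:                 # exit task: rank is its own cost
        return w_avg[task]
    return w_avg[task] + max(c + rank_u(j) for j, c in succ[task])

print(rank_u("A"))  # 3 + max(1 + 2, 2 + 4) = 9.0
```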

PEFT
Predict Earliest Finish Time (PEFT, Arabnejad & Barbosa, 2014) is an efficient workflow scheduling algorithm improved from HEFT. PEFT adopts a lookahead mechanism by introducing an Optimistic Cost Table (OCT). Instead of the upward rank used in HEFT, PEFT ranks tasks using the OCT value defined as follows, which estimates the minimum processing time of the longest path from the current task to the exit task, given that the current task is allocated onto a specific processor.

OCT(t_i, p_k) = max_{t_j ∈ succ(t_i)} [ min_{p_w ∈ P} { OCT(t_j, p_w) + w_{j,w} + c̄_{i,j} } ], where c̄_{i,j} = 0 if p_w = p_k  (8)
The OCT values of the exit node on all processors are defined to be zero.
Based on the OCT values, the rank of each task is calculated as the average OCT value over all processors.

rank_oct(t_i) = ( Σ_{p_k ∈ P} OCT(t_i, p_k) ) / |P|  (9)

In the processor selection phase, PEFT uses O_EFT instead of the EFT mechanism in HEFT, which is defined as

O_EFT(t_i, p_j) = EFT(t_i, p_j) + OCT(t_i, p_j)  (10)
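A toy rendering of the two PEFT quantities just described (the OCT values in the table are placeholders of our own, not from the paper):

```python
# PEFT-style ranking and selection: rank tasks by average OCT, and pick
# the processor minimizing O_EFT = EFT + OCT.

oct_table = {("t1", "p1"): 6.0, ("t1", "p2"): 4.0}   # hypothetical OCT values

def rank_oct(task, processors):
    """Average OCT value of a task over all processors."""
    return sum(oct_table[(task, p)] for p in processors) / len(processors)

def optimistic_eft(eft, task, proc):
    """O_EFT: the task's EFT on a processor plus its OCT there."""
    return eft + oct_table[(task, proc)]

print(rank_oct("t1", ["p1", "p2"]))     # (6 + 4) / 2 = 5.0
print(optimistic_eft(8.0, "t1", "p2"))  # 8 + 4 = 12.0
```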

ResourceAwareEFT
A Resource Aware Earliest Finish Time algorithm (ResourceAwareEFT, Chen & Huang, 2019, July) was proposed for workflow scheduling; it is also an improved algorithm based on HEFT. While adopting the same EFT mechanism as HEFT in the processor selection phase, ResourceAwareEFT ranks tasks in the task prioritizing phase based on the observation of different DAG structure patterns. Figure 2 shows the four major patterns utilized in ResourceAwareEFT, whose rank calculation mechanisms are defined in equations (11), (14), (15), (16), and (17), respectively. For the pattern in Figure 2(a), the communication cost between tasks A and B is ignored since both tasks involve only a single edge. The rank_spc(t_i) of this structure pattern is defined by

rank_spc(t_i) = 0  (11)

In (12) and (13), |ParentTask(t_B)| is the number of parents of t_B and |ChildrenTask(t_A)| is the number of children of t_A. |P| represents the total number of processors in the parallel system.
Figure 2(b) shows a join pattern among tasks, whose rank_spc(t_i) is defined in equation (14). Figure 2(c) illustrates a fork pattern among tasks, whose rank_spc(t_i) is defined in equation (15). Figure 2(d) shows a more complicated pattern mixing those of Figure 2(a-c), whose rank_spc(t_i) is defined in equations (16) and (17). Based on the above definitions, the task rank in ResourceAwareEFT is calculated accordingly.

IPPTS
Improved Predict Priority Task Scheduling (IPPTS) was proposed in (Djigal et al., 2021), which is the latest heuristic-based workflow scheduling algorithm. IPPTS maintains the same time complexity as previous well-known heuristic algorithms (Arabnejad & Barbosa, 2014;Topcuoglu et al., 2002;Xie et al., 2015), while outperforming them in terms of workflow execution makespan. IPPTS is an improved version of PPTS (Djigal et al., 2019) based on a Predict Cost Matrix (PCM). The PCM equation (18) is similar to the OCT equation (8) in PEFT (Arabnejad & Barbosa, 2014), except that PCM takes into consideration the computation cost of the current task.
In the task prioritization phase, the average PCM value of each task is denoted by rank PCM (t i ) and defined as follows.
The rank of each task, denoted by Prank(t i ), is computed by multiplying rank PCM (t i ) with its out-degree outd(t i ).
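The Prank computation just described can be sketched as follows (the PCM values and out-degree are placeholders, not taken from the paper):

```python
# Prank: the average PCM value of a task multiplied by its out-degree.

def prank(pcm_values, out_degree):
    """pcm_values: PCM entries of one task across all processors."""
    rank_pcm = sum(pcm_values) / len(pcm_values)
    return rank_pcm * out_degree

print(prank([10.0, 14.0], 3))  # (24 / 2) * 3 = 36.0
```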
In the processor selection phase, IPPTS uses the Looking Ahead Earliest Finish Time Lhead_eft instead of the EFT mechanism (Sinnen, 2007). Computing Lhead_eft involves the Looking-ahead Exit Time LHET together with EFT. LHET(t_i, p_j) and Lhead_eft(t_i, p_j) are defined in (Djigal et al., 2021).

Genetic Algorithm
The Genetic Algorithm (GA, Srinivas & Patnaik, 1994) is an evolutionary algorithm commonly used for optimization problems. Previous works (Keshanchi et al., 2017; Sardaraz & Tahir, 2020; Xu et al., 2014) have applied GA to workflow scheduling, where a gene in a chromosome represents a task and the content of each chromosome is a possible task execution sequence for a specific workflow, as shown in Figure 3. A population contains a pre-defined number of chromosomes representing various possible workflow execution schedules. The approach proposed in (Keshanchi et al., 2017) starts with population initialization based on both heuristic and random chromosome generation. In the first step, heuristic chromosome generation is applied, where the schedules produced by heuristic workflow scheduling algorithms, such as HEFT-B (Topcuoglu et al., 2002), HEFT-T (Topcuoglu et al., 2002), and HEFT-L (Xu et al., 2014), are set as seed chromosomes. Then, in the second step of random chromosome generation, more chromosomes are randomly generated until the pre-defined population size is reached. After population initialization, the GA approach enters a loop of three essential operations, i.e. selection, crossover, and mutation, until a pre-defined termination condition is met. In the selection phase, chromosomes are randomly selected for the crossover and mutation operations. In the crossover phase, the elitist chromosome, i.e. the chromosome with the best fitness value, is marked and preserved without update. Other chromosomes are selected for one-point or two-point crossover operations to produce offspring as new chromosomes. Figure 4 shows the one-point crossover operation used in (Keshanchi et al., 2017).
The fitness of a chromosome is the resultant execution makespan of the schedule represented by it. In (Keshanchi et al., 2017), the execution makespan of a chromosome is calculated by applying the Earliest Finish Time (EFT, Sinnen, 2007) processor allocation policy to the task execution sequence represented in it. Therefore, a smaller value indicates better fitness.
In the mutation phase, chromosomes are modified according to the mutation rule. For example, a task-swapping mutation was used in (Xu et al., 2014), where a mutation point and a random point in a chromosome are selected randomly and the two tasks represented by the corresponding genes are swapped, provided the swap does not violate the inter-task dependence. The task-swapping mutation operation is shown in Figure 5.
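Toy versions of the two GA operators discussed above, in the spirit of Figures 4 and 5 (our own sketch; the gene values and the dependence check are placeholders):

```python
# One-point crossover and dependence-respecting task-swap mutation
# on task-sequence chromosomes.

def one_point_crossover(p1, p2, point):
    """Keep p1's prefix up to `point`, then append p2's remaining genes
    in p2's order, skipping duplicates, so each task appears once."""
    head = p1[:point]
    return head + [g for g in p2 if g not in head]

def swap_mutation(chrom, i, j, violates_dependence):
    """Swap the tasks at positions i and j unless the swap breaks an
    inter-task dependence, in which case the original is kept."""
    child = chrom[:]
    child[i], child[j] = child[j], child[i]
    return chrom if violates_dependence(child) else child

print(one_point_crossover([1, 2, 3, 4], [4, 3, 1, 2], 2))  # [1, 2, 4, 3]
print(swap_mutation([1, 2, 3], 0, 2, lambda c: False))     # [3, 2, 1]
```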
In our implementation of the GA-based workflow scheduling approach used in this paper for performance comparison with MCTS approaches, two chromosomes are used for representing a workflow execution schedule, one for the task execution sequence and the other for processor allocation, as shown in Table 6. Therefore, when performing the mutation operation, in addition to task swapping, the processor allocation of the genes at the selected mutation and random points is also changed randomly. At the end of each iteration of the loop, the fitness value of each chromosome is calculated. The termination condition of the loop is that all chromosomes have the same fitness. The elitist chromosome after the loop finishes is returned as the best workflow execution schedule found by the GA approach. The entire GA workflow scheduling process is shown in Figure 7.

MCTS applied to workflow scheduling
Monte Carlo Tree Search (MCTS) is a heuristic search algorithm consisting of four major steps. The entire search process of MCTS is illustrated in Figure 8. When we apply MCTS to workflow scheduling, each node in the MCTS tree represents a possible partial or complete schedule of a workflow to be executed on a set of processors. Nodes with complete schedules are called terminal nodes. The children of a node represent all possibilities of choosing an unscheduled ready task and allocating it to one of the processors. The root is a special node with an empty schedule where no tasks have been scheduled. MCTS starts with a single root node, and the tree gradually grows as the search process proceeds, as illustrated in Figure 8.
The four major steps in MCTS for workflow scheduling are described as follows.
(1) Selection. Starting from the root node, a child is recursively selected at each visited node according to a selection policy until a leaf node is reached.
(2) Expansion. If the leaf node selected in the previous step is not a terminal node, all its child nodes are created and their visit counts and reward values initialized. Then, one of the expanded child nodes is randomly selected for the following simulation step.
(3) Simulation. Starting from the chosen node, a simulation strategy extends the current partial schedule to a complete schedule and calculates the execution makespan of the resultant schedule.
(4) Backpropagation. The visit counts and reward values of the nodes on the path from the chosen node to the root are updated based on the complete schedule. The reward value of a node represents the shortest makespan found among all simulations that went through it.
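The four steps above can be condensed into a structural skeleton of one MCTS iteration. This is our own sketch, not the authors' implementation: the scheduling state is abstracted away, and `expand()` and `simulate()` are supplied by the caller.

```python
# One MCTS iteration: selection, expansion, simulation, backpropagation.
# Rewards are makespans, so smaller is better.
import math
import random

class Node:
    def __init__(self, parent=None):
        self.parent, self.children = parent, []
        self.visits = 0
        self.best_makespan = float("inf")  # shortest makespan seen below here

def uct(node, c=1.4):
    if node.visits == 0:
        return float("inf")                # explore unvisited children first
    exploration = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return -node.best_makespan + exploration   # smaller makespan = better

def mcts_iteration(root, expand, simulate):
    node = root
    while node.children:                   # (1) selection
        node = max(node.children, key=uct)
    for _ in expand(node):                 # (2) expansion: one child per choice
        node.children.append(Node(parent=node))
    leaf = random.choice(node.children) if node.children else node
    makespan = simulate(leaf)              # (3) simulation to a full schedule
    while leaf is not None:                # (4) backpropagation to the root
        leaf.visits += 1
        leaf.best_makespan = min(leaf.best_makespan, makespan)
        leaf = leaf.parent
    return makespan
```

A caller would invoke `mcts_iteration` in a loop until a simulation-count or time budget is exhausted, then read the best schedule off the tree.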

MCTS-BB
MCTS-BB is a recent MCTS approach for workflow scheduling, which is equipped with a pruning method based on the branch-and-bound concept. The proposed pruning method is used to prevent MCTS from exploring some unpromising subtrees. Tree pruning is performed in the selection step, where the selected maximum UCT node will be marked as disabled if it meets the pruning criterion, to prevent it from further exploration. Once tree pruning occurs, the search process would start over again from the root node. Otherwise, the search process will continue as shown in Algorithm 1.
The pruning criterion is that the estimated makespan going through the selected node exceeds the shortest execution makespan found so far, i.e. β in Algorithm 1. The estimated makespan M of the selected node is calculated by summing the makespan of the current partial schedule and the length of the critical path CP_last among the unscheduled tasks as follows, where max{AFT(t_i)} over the scheduled tasks is the makespan of the current partial schedule.

M = max_{t_i ∈ scheduled tasks} { AFT(t_i) } + CP_last

Algorithm 1: MCTS-BB Pruning
Input: selected node n with maximum UCT value
Variable: β as the current best makespan found so far
1. Calculate Mark(n)
2. if Mark(n) > β
3.   mark node n as disabled
4.   cut off n's children nodes
5.   return to the root // restart the selection step
6. else
7.   return n // continue to selection step
8. end if
The critical path among the unscheduled tasks, i.e. CP_last, can be determined by calculating the sum of the upward rank, i.e. equation (7), and the downward rank, i.e. equation (24), for each task. The tasks with the largest rank sum belong to the critical paths.
Since the entire MCTS process is usually controlled by a pre-defined total search time or simulation count limit, a good pruning algorithm can avoid wasting time on exploring those nodes which look unlikely to lead to a better schedule than ever found, and thus increase the probability of finding a better schedule within the same search time limit. However, on the other hand, a flawed pruning algorithm might run the risk of missing some nodes which actually would result in better schedules. The experimental results in (Liu et al., 2019) showed that MCTS-BB achieved worse makespan than MCTS on average, indicating that the pruning algorithm adopted in MCTS-BB leaves room for further improvement.
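The MCTS-BB pruning test described above reduces to a one-line comparison; the following sketch (our own, with illustrative numbers) makes the bound explicit:

```python
# MCTS-BB pruning criterion: estimate M as the partial-schedule makespan
# plus the critical-path length of the unscheduled tasks, and disable
# the node when M exceeds the best makespan found so far (beta).

def should_prune(partial_makespan, cp_unscheduled, best_makespan):
    estimated = partial_makespan + cp_unscheduled   # M
    return estimated > best_makespan                # prune if bound exceeded

print(should_prune(12.0, 9.0, 20.0))  # True: 21 > 20, subtree disabled
print(should_prune(12.0, 7.0, 20.0))  # False: 19 <= 20, keep exploring
```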

Our improved MCTS approach for workflow scheduling
This section presents the methods we propose to improve the MCTS process for workflow scheduling.

Pre-calculated upper bound of makespan
The first improvement we made is a new pruning algorithm utilizing existing workflow scheduling heuristics (Arabnejad & Barbosa, 2014; Chen & Huang, 2019, July; Djigal et al., 2021; Topcuoglu et al., 2002). Instead of the critical path length of unscheduled tasks used in MCTS-BB, our approach adopts the minimum makespan achieved by HEFT, PEFT, ResourceAwareEFT, CPOP, and IPPTS as the initial upper bound before searching for the best schedule via MCTS, i.e. lines 1-2 in Algorithm 2.

Estimated makespan of unscheduled tasks
Instead of the critical path length used in MCTS-BB, our approach estimates the minimum makespan of all possible schedules extended from a node allocated to a specific processor by adding the node's EFT to the sum of all its successors' shortest required execution times divided by the number of processors in the system, i.e. equation (25). Based on this makespan estimation, our new pruning algorithm avoids the risk of disabling promising nodes from further exploration.
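Our reading of equation (25) can be sketched as follows (a hedged illustration with made-up numbers; the per-processor time lists are assumptions, not the paper's data):

```python
# Lower-bound the remaining work by the sum of each unscheduled
# successor's fastest execution time, spread evenly over the processors,
# and add it to the node's EFT.

def estimated_makespan(eft_node, succ_exec_times, num_processors):
    """succ_exec_times: per-successor lists of times on each processor."""
    remaining = sum(min(times) for times in succ_exec_times)
    return eft_node + remaining / num_processors

print(estimated_makespan(10.0, [[4, 6], [2, 8]], 2))  # 10 + (4 + 2)/2 = 13.0
```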

Node selection policy
At the selection step, the UCT equation in (Kocsis & Szepesvári, 2006) is the most widely used node selection policy. In (Liu et al., 2019), a variant of the original UCT equation as shown in equation (26) was used in the selection step, where Q(s, a) and U(s, a) represent the reward value and the visit count of the current tree node, respectively. In this paper, we evaluate other possible node selection policies, which will be described in the performance evaluation section.

Node expansion strategy
At the expansion step, we propose a new heuristic strategy for guiding the expansion aiming to improve the search effectiveness. In MCTS-BB, a random expansion policy was used.
In this paper, we propose and evaluate three new expansion policies based on a task's upward rank (Topcuoglu et al., 2002), OCT rank (Arabnejad & Barbosa, 2014), and STR rank (Chen & Huang, 2019, July), respectively. The task with the highest rank value is chosen for expansion, as described in line 10 of Algorithm 4.

Simulation strategy
At the simulation step, we propose a new strategy aiming to improve simulation effectiveness. In MCTS-BB, a random simulation strategy was used, which repeatedly selects an unscheduled task at random until all tasks are scheduled. Regarding processor allocation, in (Liu et al., 2019) a probability parameter R_threshold was set: a task has probability R_threshold of being allocated randomly and probability (1 − R_threshold) of being allocated to the processor chosen by the EFT mechanism (Sinnen, 2007). In this paper, we propose and evaluate three policies for selecting unscheduled tasks, based on HEFT, OCT rank + EFT, and ResourceAwareEFT, respectively. We also adopt the R_threshold concept for processor allocation; in the experiments, we evaluate the effectiveness of different values of R_threshold. The overall process of our improved MCTS approach and its major steps are presented in the following algorithms, starting from Algorithm 2.

Input:
  the node where the simulation starts
  the specified workflow scheduling algorithm
  the probability threshold R_threshold used for processor allocation
Output: makespan
1. Initialize a list<Cpu, Event> schedule for all scheduled tasks and put them into the list
2. Schedule the remaining unscheduled tasks with the specified workflow scheduling algorithm
3. return the resultant makespan

Performance evaluation
This section presents the performance evaluation of the proposed MCTS-based workflow scheduling approach, compared to well-known previous methods.

Experiment setup and performance metrics
There are several toolkits available for generating various workflow structures to evaluate workflow scheduling algorithms. WorkflowGenerator (Juve, 2014) is a toolkit developed in the Pegasus project that can generate the workflow structures of various real-world applications, including CyberShake, Montage, Epigenomics, Inspiral, and Sipht, described in the Pegasus DAX format. DAGGEN (Suter, 2015) is a software tool developed by Suter et al. for generating a wide range of random workflow structures based on user input parameters such as the number of tasks and the fatness and density of the DAG, which makes it useful for evaluating workflow scheduling algorithms. DAGGEN supports both dot and text output formats. Both toolkits were used in the following experiments.
The experiments were conducted on WorkflowSim (Chen, 2015; Chen & Deelman, 2012), a workflow scheduling simulation toolkit developed by Chen based on CloudSim (Calheiros et al., 2011). We wrote a simple program to convert the dot format files produced by DAGGEN into the XML input format used by WorkflowSim.
DAGGEN has the following parameters to set for generating workflows of various structures and properties.
• n: the number of tasks in a workflow.
• fat: the shape of the DAG, ranging in [0, 1]. A value close to 1 means a fat DAG with greater parallelism, while a value close to 0 indicates a thin DAG with less parallelism.
• density: controlling the inter-dependence of tasks in adjacent levels of a DAG.
• regularity: controlling the distribution of tasks between different levels of a DAG.
• jump: the allowed distance, in levels, between a task and its parent in a DAG.
• ccr: the ratio of communication cost to computation cost.

Table 1 lists the parameter values used in the following experiments for generating workflows of various structures.
In the experiments, we also used WorkflowGenerator to generate the real-world applications' workflow structures, such as CyberShake, Epigenomics, and Montage, to evaluate the performance of different workflow scheduling algorithms.  Table 2 shows the parameter values used in the GA-based workflow scheduling method in the following experiments.
The following performance metrics were used in our simulation experiments for performance evaluation.
• Normalized Makespan. Equation (27) normalizes the makespan when comparing the performance of different workflow scheduling algorithms. The normalized makespan ranges between 0 and 1; a value close to 1 means relatively worse performance, while a value near 0 indicates superior performance.

normalized makespan = (makespan − makespan_min) / (makespan_max − makespan_min)  (27)

• Schedule Length Ratio (SLR). As shown in equation (28), SLR is the makespan divided by the sum of the shortest required execution times of all tasks on the critical path of the workflow, i.e. the makespan normalized to a lower bound.

SLR = makespan / Σ_{n_i ∈ CP_last} min_{p_j ∈ P} { w_{i,j} }  (28)

• Speedup. Speedup is the minimum cumulative computation cost of all tasks in a workflow divided by the parallel execution time, i.e. the makespan, as shown in equation (29).

speedup = ( min_{p_j ∈ P} Σ_{n_i ∈ V} w_{i,j} ) / makespan  (29)
• Algorithm Run Time. This is the measure used to compare the computational overheads of different workflow scheduling algorithms.
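The three makespan-derived metrics above can be written out as plain functions (a sketch of equations (27)-(29); variable names and the example numbers are ours):

```python
# Normalized makespan, SLR, and speedup for comparing schedulers.

def normalized_makespan(m, m_min, m_max):
    """Map a makespan into [0, 1] relative to the best/worst observed."""
    return (m - m_min) / (m_max - m_min)

def slr(m, min_exec_times_on_cp):
    """Makespan over the sum of fastest execution times on the critical path."""
    return m / sum(min_exec_times_on_cp)

def speedup(min_sequential_time, m):
    """Minimum cumulative computation cost divided by the makespan."""
    return min_sequential_time / m

print(normalized_makespan(30.0, 20.0, 40.0))  # 0.5
print(slr(30.0, [5.0, 10.0]))                 # 2.0
print(speedup(60.0, 30.0))                    # 2.0
```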

Experimental results and discussions
This section presents the simulation experiments conducted to evaluate the pruning algorithm and guided heuristics proposed in this paper. Four sets of experiments were conducted to evaluate different types of pruning algorithms and guiding heuristics. We conducted the experiments for the simulation strategies first to determine the best R threshold for use in other sets of experiments. In each experiment, MCTS would stop when the simulation count reaches 100,000. Equation (25) was applied in all cases except for MCTS-BB. Workflows of three different CCR values, 0.1, 1, and 10, were used as input data. MCTS-1111 represents the standard MCTS algorithm.

Experiment set 1: simulation strategies
To determine the best simulation strategy in MCTS and the corresponding R threshold , we conducted experiments for the random policy in MCTS-BB and the simulation policies guided by HEFT, OCT rank + EFT, and ResourceAwareEFT with R threshold set to 0, 20 and 30. Tables 3-5 show the experimental results.
• MCTS-1112-0: The simulation policy guided by HEFT, R threshold is set to 0.
• MCTS-1112-20: The simulation policy guided by HEFT, R threshold is set to 20.
• MCTS-1112-30: The simulation policy guided by HEFT, R threshold is set to 30.
• MCTS-1113-0: The simulation policy guided by OCT rank + EFT, R threshold is set to 0.
• MCTS-1113-20: The simulation policy guided by OCT rank + EFT, R threshold is set to 20.
• MCTS-1113-30: The simulation policy guided by OCT rank + EFT, R threshold is set to 30.
According to Table 6, on average the simulation policy guided by HEFT with R_threshold set to 20% (MCTS-1112-20) achieved significantly the best performance among all simulation policies and configurations. In contrast, the random simulation policy (MCTS-1111) used in MCTS-BB resulted in the worst performance. Moreover, the heuristic simulation policies (MCTS-1112, MCTS-1113, MCTS-1114) with R_threshold = 0 resulted in poorer performance compared to R_threshold = 20 and R_threshold = 30, indicating the importance of introducing randomness in the simulation step. Therefore, we used R_threshold = 20% in the following experiments.

Experiment set 2: pruning methods
This section evaluates the performance of our pruning method.
• MCTS-1211: our pruning method described in the previous section.

Figures 9-11 show the experimental results in terms of speedup for workflows of different CCR values, respectively. Our pruning method shows the greatest potential for performance improvement in all cases. Moreover, the improvement is even larger for large-scale workflows.

Experiment set 3: selection policies
This section evaluates different selection policies.

Experiment set 4: expansion polices
This section presents experimental results evaluating different expansion policies.
• MCTS-1121: using the upward rank in HEFT to choose a node for expansion. The task with the highest rank value would be chosen.
• MCTS-1131: using the OCT rank in PEFT to choose a node for expansion. The task with the highest rank value would be chosen.
• MCTS-1141: using the STR rank in ResourceAwareEFT to choose a node for expansion.
The task with the highest rank value would be chosen. Tables 7-10 present the experimental results. In general, the random expansion policy (MCTS-1111) achieves the best performance for workflows with smaller or larger CCR values, e.g. CCR = 0.1 and CCR = 10, as shown in Tables 7 and 9. On the other hand, the expansion policy based on the upward rank (MCTS-1121) outperforms the others for workflows with a medium CCR value, e.g. CCR = 1, as shown in Table 8. Considering all CCR values, the random expansion policy (MCTS-1111) achieved the best average performance, as shown in Table 10.

Overall performance evaluation
In this section, we present the overall performance evaluation of our improved MCTS approach for workflow scheduling, which combines the proposed policies described in the previous sections. Based on the previous experimental results, the following three variants of our approach were compared to the standard MCTS approach (MCTS-1111), MCTS-BB, IPPTS, and the GA-based workflow scheduling method. All three variants adopt our pruning method and the same HEFT-guided simulation policy, but differ in their selection and expansion policies.
• MCTS-1212: using equation (26) in the selection policy and the random node expansion policy.
• MCTS-3212: using equation (31) in the selection policy and the random node expansion policy.
• MCTS-3222: using equation (32) in the selection policy and the node expansion policy based on the upward rank in HEFT.
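Equations (26), (31) and (32) are not reproduced in this section; as background, the selection step in MCTS conventionally balances exploitation and exploration with a UCT-style score, of which the paper's equations are variants. The following sketch uses the standard UCT formula as an illustrative stand-in; the function names and the √2 exploration constant are assumptions.

```python
# Sketch: UCT-style selection as used in the MCTS selection step.
# The paper's equations (26)/(31)/(32) are variants of this trade-off.
import math

def uct_score(value_sum, visits, parent_visits, c=math.sqrt(2)):
    """Exploitation (mean reward) plus exploration bonus for rarely tried children."""
    if visits == 0:
        return float("inf")  # always try unvisited children first
    exploit = value_sum / visits  # mean normalized reward (e.g. from makespan)
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children):
    """children: list of (value_sum, visit_count); returns index of best child."""
    parent_visits = sum(v for _, v in children)
    scores = [uct_score(val, vis, parent_visits) for val, vis in children]
    return max(range(len(children)), key=scores.__getitem__)
```

For makespan minimization, the reward fed into `value_sum` would typically be a normalized quantity that increases as the simulated makespan decreases.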
The following presents the performance evaluation of our MCTS workflow scheduling approaches compared to the standard MCTS method and MCTS-BB in a homogeneous computing environment. Tables 11-13 show the experimental results for workflows of different CCR values, respectively. In general, our approaches outperform the standard MCTS method and MCTS-BB. MCTS-1222 achieves the best performance across all three CCR values.
The following presents the performance evaluation of our MCTS workflow scheduling approaches in a parallel computing environment of speed-heterogeneous processors. Table 14 shows the performance comparison in terms of normalized makespan. Figures 15 and 16 compare different scheduling approaches in terms of SLR, while Figure 17 makes the comparison in terms of speedup. Figure 18 compares the algorithm runtime of different approaches. All the above performance comparisons were made on synthetic workflows generated by DAGGEN. In terms of normalized makespan, MCTS-1222 achieved the best average performance. MCTS-1222 also performed very well in terms of SLR, speedup and runtime, as shown in Figures 15-18. Table 15 evaluates the scheduling approaches using workflow structures of real-world applications generated by WorkflowGenerator. On average, MCTS-1222 achieved the best performance. Figures 19-23 show the detailed performance comparison for each specific real-world workflow application structure, where each scheduling approach exhibited superiority over the others in different scenarios. In (Liu et al., 2019), MCTS-based methods were only compared to heuristic workflow scheduling algorithms. In the following, we compare the proposed MCTS-based method to both major categories of workflow scheduling approaches, including a GA-based method (Keshanchi et al., 2017) and the latest heuristic-based algorithm IPPTS (Djigal et al., 2021), which has been shown to outperform well-known previous heuristic algorithms such as HEFT and PEFT.
To make a fair comparison between MCTS and GA, we first record the total number of generated chromosomes when the GA-based approach converges. Then, we let the MCTS-based scheduling methods stop when their simulation count reaches the recorded number. To conduct a comprehensive comparison, we also evaluated the performance of the MCTS-based approach with different simulation counts, set to specific ratios of the recorded number for the GA-based method. For example, suppose the GA-based method generated 10,000 chromosomes for a specific input workflow DAG-01.dot. We would then evaluate MCTS-1222 with simulation counts of 10,000 (ratio 1), 1000 (ratio 0.1), 100 (ratio 0.01), and 10 (ratio 0.001). The average recorded number for the GA-based method and the corresponding simulation counts for the MCTS-based approach are shown in Table 16. Figures 24 and 25 present the performance comparison in terms of average speedup. It is obvious that for smaller workflows, e.g. 20 tasks in Figure 24, our approach MCTS-1222 outperforms all the others even with a smaller simulation count, 1% of the total number of chromosomes generated by the GA-based method. The experimental results also show that more simulations could effectively improve the workflow execution schedules produced by the MCTS-based approach. Figures 26 and 27 compare the scheduling time required by the different methods. It is clear that in general MCTS-based approaches take longer to produce the schedules, and the time increases substantially as the workflow size grows. This points to an important future research topic for MCTS-based workflow scheduling: developing efficient mechanisms to reduce the required search time. According to the above experimental results, our improved MCTS-based workflow scheduling approach not only outperforms MCTS-BB (Liu et al., 2019) in terms of schedule quality but also requires less scheduling time.
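The budget-matching protocol above can be sketched as follows. The `run_ga` and `run_mcts` functions are hypothetical placeholders standing in for the actual schedulers, and the returned values are dummy numbers; only the matching logic reflects the protocol described in the text.

```python
# Sketch: budget-matched comparison between a GA scheduler and MCTS.
# run_ga/run_mcts are placeholders; their return values are dummy data.
def run_ga(workflow):
    """Placeholder: returns (best_makespan, chromosomes_generated_at_convergence)."""
    return 118.0, 10_000

def run_mcts(workflow, budget):
    """Placeholder: returns best makespan found within `budget` simulations."""
    return 120.0 - budget / 1000  # dummy: more simulations -> shorter makespan

def budget_matched_eval(workflow, ratios=(1.0, 0.1, 0.01, 0.001)):
    """Run GA to convergence, then run MCTS at fractions of the GA's budget."""
    ga_makespan, n_chromosomes = run_ga(workflow)
    results = {"GA": ga_makespan}
    for r in ratios:
        budget = max(1, int(n_chromosomes * r))
        results[f"MCTS@{budget}"] = run_mcts(workflow, budget)
    return results
```

With the 10,000-chromosome example from the text, this yields MCTS runs at budgets of 10,000, 1000, 100, and 10 simulations alongside the GA baseline.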
In the following, we present an experimental analysis of the advantage of our approach. Benefitting from the new pruning method and the other proposed mechanisms for the selection, expansion, and simulation steps in MCTS, our approach could disable more unpromising nodes than MCTS-BB, as shown in Figure 28. Table 17 takes a specific input workflow, n-0040-ccr-1.0-fat-0.8-regular-0.2-density-0.8jump-2-000001.dot, as a case study, comparing the workflow execution makespan, required scheduling time, and resultant MCTS trees of MCTS-BB and our approach. In this case, our approach produces a better schedule of shorter makespan while spending less time in scheduling. Table 17 shows that our approach grows a smaller MCTS tree and disables more nodes in the tree, resulting in its superiority over MCTS-BB. Figure 29 compares the MCTS tree properties of our approach and MCTS-BB in terms of the number of nodes at different tree depths. The comparison shows that our approach builds an MCTS tree with narrower widths at the top levels, demonstrating the advantage of focusing on promising directions quickly.
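The generic idea behind disabling unpromising nodes can be illustrated with a small sketch. The actual pruning rule is the one defined in section VI of the paper; here we assume, for illustration only, a bound-based rule in the spirit of branch and bound: a subtree is disabled when an optimistic (lower-bound) makespan estimate at its root cannot beat the incumbent schedule. The `Node` class and bound values are hypothetical.

```python
# Sketch: disabling provably unpromising subtrees in an MCTS tree.
# Illustrative bound-based rule; the paper's actual rule is in section VI.
class Node:
    def __init__(self, lower_bound):
        self.lower_bound = lower_bound  # optimistic makespan estimate (assumed)
        self.disabled = False
        self.children = []

def count_nodes(node):
    """Total nodes in the subtree rooted at `node`."""
    return 1 + sum(count_nodes(c) for c in node.children)

def prune(node, best_makespan):
    """Disable subtrees that cannot improve on the incumbent schedule.

    Marks the subtree root as disabled and returns the number of nodes
    thereby removed from further search.
    """
    if node.lower_bound >= best_makespan:
        node.disabled = True
        return count_nodes(node)
    return sum(prune(c, best_makespan) for c in node.children)
```

Disabling a node this way removes its entire subtree from the selection step, which is what narrows the tree at the top levels as observed in Figure 29.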

Conclusion and future work
Monte Carlo Tree Search (MCTS) is a promising direction for workflow scheduling but has been less explored in previous studies. In this paper, we present and evaluate several new mechanisms to further improve the effectiveness of MCTS when applied to workflow scheduling, including a new pruning algorithm and new heuristics for guiding the selection, expansion and simulation steps in MCTS. Performance evaluation was conducted with simulation experiments based on both synthetic and real-world workflow application structures. The experimental results show that our new approach can achieve better workflow execution schedules than the previous method MCTS-BB in (Liu et al., 2019).
In addition to MCTS-BB, we also compared our approach to the two major categories of workflow scheduling methods, i.e. heuristic-based and meta-heuristic-based methods, including the latest heuristic-based algorithm IPPTS (Djigal et al., 2021) and a recent GA-based meta-heuristic approach in (Keshanchi et al., 2017). The experimental results show significant performance superiority of our approach over both of them in terms of workflow execution makespan, indicating the great potential of MCTS for workflow scheduling.
Based on our experience in the research work presented in this paper, there are several promising and desirable future research directions for applying MCTS to workflow scheduling problems. Firstly, according to Figures 13 and 14, currently no single equation for the selection step can consistently achieve the best performance across different workflow configurations and properties. Therefore, developing a new selection equation that consistently delivers the best performance is an important piece of future work. Secondly, Figures 24 and 25 show that our MCTS approach outperforms the GA-based scheduling method for small and medium workflows, while the GA-based method exhibits an advantage for large workflows. Further studies on the root cause are therefore desired, and the development of hybrid approaches might be a worthwhile research direction. Lastly, MCTS can be viewed as an online learning approach. To further improve workflow scheduling performance, it is promising to integrate MCTS with offline learning and reinforcement learning methods, as demonstrated in many AI research domains such as game playing.

Disclosure statement
No potential conflict of interest was reported by the author(s).