Business process recommendation method based on cost constraints

ABSTRACT Business process recommendation can simplify enterprise working procedures, avoid unnecessary expenses, and promote enterprise development. During recommendation, many candidate processes are similar in structure and therefore hard to choose between. This paper proposes a process recommendation method based on cost constraints to address the difficulty of distinguishing similar processes. First, each business process is transformed into a labelled Petri net, and the execution probability of each transition is calculated from the business process log. Then, a matrix representing each Petri net is constructed from the adjacency relations between transitions, the matrices are brought to a common dimension, the similarity between matrices is calculated by the biggest–smallest approach degree, and a set of Petri nets with similar structure is established. Finally, a cost-constraint-based process recommendation method is proposed to find the processes with lower service cost within the set of similar processes. Experiments compare the method with mainstream approaches and verify its feasibility.


Introduction
In the past decades, the application fields of business processes have kept expanding, for example to intelligent financial management, underwater sensor networks, intelligent manufacturing systems (Fu et al., 2021), and robotic mission planning. At the same time, enterprises keep innovating and the scale of their business processes keeps growing; large enterprises in particular may maintain hundreds of business processes. This confronts enterprises with new challenges in process analysis, process management, process retrieval, and process recommendation. For process management, process model libraries are established to manage processes more effectively. For process retrieval and recommendation, a model retrieval mechanism is needed to retrieve suitable process models from the library and recommend them. All of these tasks require computing the similarity between process models, which makes similarity calculation a key technique for handling business process problems efficiently.
Because users have different needs, similarity is computed on different bases, which fall roughly into three categories (Dijkman et al., 2011): node matching similarity, structural similarity, and behavioural similarity.
Node matching similarity usually maps the node labels of two process models onto each other to calculate similarity. Dijkman et al. (2011) analysed process model similarity from five perspectives: the syntax, semantics, and attributes of node labels, node types, and node contexts. Ehrig et al. (2007) measured similarity by transforming Petri nets into Semantic Business Process Models (SBPMs); this method does not consider the structure and behaviour of the process, so its similarity results can be inaccurate. Bergmann and Gil (2014) proposed a graph-based semantic workflow similarity measure and combined it with a case-based approach to improve the traditional graph similarity algorithm, but retrieval performance drops as the number of cases grows.
Most structural similarity methods convert business processes into graphs or trees and measure the similarity between processes by edit distance. Dijkman et al. (2009) used graph edit distance, i.e. the minimum cost required to transform one graph into another, to measure the similarity between two processes, but their method cannot distinguish parallel relationships. Zhou et al. (2019) first constructed a weighted business process graph and then used the weighted graph edit distance to measure business process similarity, which can distinguish parallel relationships. Jia et al. (2012) measured similarity by the tree edit distance between two trees. Since automata can be represented as directed graphs, Wombacher and Rozie (2006) analysed the structural similarity of workflows from the perspective of automata. Bae et al. (2007) introduced the process dependency graph and transformed it into a process matrix to measure the distance between processes.
Many methods have been proposed for behavioural similarity. Wang et al. (2010) studied process similarity based on PTS, but divided sequences into cyclic and acyclic structures to calculate similarity, which breaks the semantics of complete sequences. To address the loop-structure problem in Wang et al. (2010), Dong et al. (2014) proposed using complete firing sequences to calculate process similarity; this handles loop structures effectively, but concurrent structures must be enumerated one by one. Wang et al. (2013) constructed an SSDT matrix from the shortest succession distances between tasks in a process and calculated similarity as the number of identical elements in the matrices divided by the total number of elements; this handles various structures but cannot be used for process indexing. Zha et al. (2010) measured process similarity from the adjacency relations between activities; their method can handle loop structures but is insensitive to non-free-choice structures and ignores the relative importance of adjacency relations. Yin et al. (2015) added importance coefficients to the measure of Zha et al. (2010). Weidlich et al. (2010) extended the adjacency relations of activities into the concept of a behaviour profile and calculated process similarity from behavioural relationships, but the approach is limited in handling hidden transitions and struggles to distinguish similar structures, which hampers retrieval. When retrieving among processes with similar structures, users want to find an existing business process that meets their requirements and conditions rather than re-develop the process themselves.
Most current process retrieval returns multiple processes with similar structure without considering variable constraints such as service cost, even though different levels of service quality correspond to different service costs. When processes are structurally similar and indistinguishable, users prefer the business process with lower cost. This paper therefore analyses and recommends business processes from the perspective of cost constraints.
Process similarity calculation, the premise of process recommendation, currently considers either only process behaviour or only the control-flow structure of the process, and ignores the data flow of the process, that is, how activities actually occur. This paper therefore proposes a process similarity calculation method that combines the structure, behaviour, and actual activity execution of processes, and adds cost constraints for process recommendation.
The main contributions of this paper are as follows: (1) a process matrix annotated with execution probabilities is established on the basis of Petri nets and homodimensionalised, the similarity between two processes is described by the biggest-smallest approach degree, and a set of structurally similar processes is established; (2) a process recommendation method based on cost constraints is proposed, which finds the business processes with lower service cost in the process set for recommendation.
The paper is organised as follows: Section 2 presents a motivating example, Section 3 gives the basic concepts, Section 4 introduces the method for calculating similarity between processes, Section 5 describes the cost-based process recommendation algorithm, Section 6 presents the experimental analysis, and Section 7 gives conclusions and future directions.

Motivation
Taking the three bank loan processes N1, N2 and N3 in Figure 1 as an example, we analyse their similarity from the perspective of process behaviour.
The similarity between two processes is first calculated with the transition adjacency relation (TAR) proposed by Zha et al. (2010). The transition adjacency sets of N1, N2 and N3 are TAR1 = {ab, ac, bd, cd, de, ef, eg, eh}, TAR2 = {ab, ac, bd, cd, de, ef, ei, eh, ig} and TAR3 = {ab, ac, bd, cd, de, ef, eg, ej, jh}, respectively. With the similarity formula sim(N1, N2) = |TAR1 ∩ TAR2| / |TAR1 ∪ TAR2|, we obtain sim(N1, N2) = 7/10 = 0.7 and sim(N1, N3) = 7/10 = 0.7. The processes are structurally similar, and the transition adjacency relation cannot distinguish the similarity between N1 and N2 from that between N1 and N3, so no further recommendation can be made. We therefore improve the calculation of process similarity: we first analyse activity execution probabilities and establish a process matrix, use the fuzzy-mathematics concept of approach degree (closeness) to measure the similarity of two processes, and construct a set of structurally similar processes; we then propose a cost-constraint-based process recommendation method that finds the business process with lower service cost in that set, which is the process to recommend.
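The TAR comparison above can be reproduced in a few lines. This is a minimal sketch: `tar_similarity` is a hypothetical helper implementing the formula |TAR1 ∩ TAR2| / |TAR1 ∪ TAR2|, each pair is written as a two-letter string, and the leading pair ab of TAR1 is inferred (from the adjacent activities of N1 listed in Definition 3.5) rather than read from the damaged source.

```python
def tar_similarity(tar1: set[str], tar2: set[str]) -> float:
    """Jaccard-style similarity of two transition adjacency relation (TAR) sets."""
    union = tar1 | tar2
    if not union:
        return 1.0  # assumption: two empty relations are treated as identical
    return len(tar1 & tar2) / len(union)


# Each pair "xy" means activity y can directly follow activity x.
TAR1 = {"ab", "ac", "bd", "cd", "de", "ef", "eg", "eh"}
TAR2 = {"ab", "ac", "bd", "cd", "de", "ef", "ei", "eh", "ig"}
TAR3 = {"ab", "ac", "bd", "cd", "de", "ef", "eg", "ej", "jh"}

print(tar_similarity(TAR1, TAR2))  # 0.7
print(tar_similarity(TAR1, TAR3))  # 0.7
```

Both comparisons share 7 pairs out of a 10-pair union, which is exactly why TAR alone cannot rank N2 against N3.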

Basic concepts
The business process model is crucial to realising process recommendation. Available modelling notations include Event-Driven Process Chains (EPC) (Van der Aalst, 1999), UML Activity Diagrams (Eshuis & Wieringa, 2004), Business Process Model and Notation (BPMN) (Weske, 2019) and Petri nets (Cheng et al., 2014); after comparing them, this paper uses Petri nets to represent business process models. Petri nets are widely used in intelligent manufacturing systems, communication, artificial intelligence, and other fields. Various model extensions retain the basic Petri net structure and representation while making modelling results easier to understand; for example, extending the meaning of transitions yields the labelled Petri net, defined as follows.

Definition 3.1 (Labelled Petri Net): A 5-tuple N = (P, T, F, Σ, λ) is called a labelled Petri net if it satisfies the following conditions: (1) P ∪ T ≠ ∅; (2) P ∩ T = ∅; (3) F ⊆ (P × T) ∪ (T × P); (4) Σ is the set of activity labels of transitions; (5) λ : T → Σ is a function that assigns labels to transitions. Here P is the set of places, T is the set of transitions, and F is the flow relation.
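One way to hold Definition 3.1 in code is a small container that reports which structural conditions fail. This is a sketch under our own naming: `LabelledPetriNet` and its `problems` method are hypothetical, and the arc check encodes the standard requirement F ⊆ (P × T) ∪ (T × P).

```python
from dataclasses import dataclass, field


@dataclass
class LabelledPetriNet:
    """Hypothetical container for N = (P, T, F, Sigma, lambda) from Definition 3.1."""
    places: set[str]
    transitions: set[str]
    flow: set[tuple[str, str]]                               # arcs (source, target)
    labels: set[str]                                         # Sigma: activity labels
    label_of: dict[str, str] = field(default_factory=dict)   # lambda: T -> Sigma

    def problems(self) -> list[str]:
        """Return the violated conditions; an empty list means the net is well formed."""
        issues = []
        if not (self.places | self.transitions):
            issues.append("P ∪ T is empty")
        if self.places & self.transitions:
            issues.append("P and T overlap")
        for src, dst in self.flow:
            place_to_trans = src in self.places and dst in self.transitions
            trans_to_place = src in self.transitions and dst in self.places
            if not (place_to_trans or trans_to_place):
                issues.append(f"arc {src}->{dst} does not join a place and a transition")
        for t in self.transitions:
            if self.label_of.get(t) not in self.labels:
                issues.append(f"transition {t} has no label in Sigma")
        return issues


# A two-step net p0 -> t1 -> p1 -> t2 -> p2, with t1 labelled "a" and t2 labelled "b".
net = LabelledPetriNet(
    places={"p0", "p1", "p2"},
    transitions={"t1", "t2"},
    flow={("p0", "t1"), ("t1", "p1"), ("p1", "t2"), ("t2", "p2")},
    labels={"a", "b"},
    label_of={"t1": "a", "t2": "b"},
)
print(net.problems())  # []
```

Returning a list of violations instead of raising keeps the check usable when screening many models at once.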
Definition 3.2 (Trace, Event Log) (Fang et al., 2020): Let Σ be the set of activity labels of transitions. A sequence of activity labels σ ∈ Σ* is called a trace, and a multiset of traces L ∈ B(Σ*) is called an event log. In short, a trace records only activity names, whereas events in a log may also carry timestamps, resources, and other attributes. During the actual execution of a business process, some activities are performed often while others may never occur. The actual executions of net N1 in Figure 1 are recorded in Table 1.

Definition 3.3 (Activity Execution Probability): Let N be a labelled Petri net and L the trace set of an event log containing K traces in total. The execution probability ρ of each activity t in N is its participation rate in the event log, that is, ρ(t) = |t|/K, where |t| is the number of traces in L in which t occurs.
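A minimal sketch of Definition 3.3, assuming traces are strings of single-letter activity labels and reading |t| as the number of traces in which t occurs, so an activity repeated inside one trace is counted once. The function name and the toy log are hypothetical, not the contents of Table 1.

```python
from collections import Counter


def execution_probability(log: list[str]) -> dict[str, float]:
    """rho(t) = (number of traces containing activity t) / K."""
    K = len(log)
    participation = Counter()
    for trace in log:
        participation.update(set(trace))  # each activity counted at most once per trace
    return {t: n / K for t, n in participation.items()}


# Hypothetical log with K = 10 traces.
log = ["abdef"] * 5 + ["acdeg"] * 3 + ["abdeh"] * 2
rho = execution_probability(log)
print(rho["a"], rho["b"], rho["h"])  # 1.0 0.7 0.2
```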
Definition 3.4 (Low-Frequency Sequence): Let L be an event log containing K traces in total. If a trace l ∈ L occurs κ times in the event log, its occurrence frequency is κ/K. If this frequency is lower than a given threshold ξ, the trace is a low-frequency sequence.
For example, let ξ = 0.01. Trace acdgf has occurrence frequency 3/600 = 0.005 < 0.01, so it is a low-frequency sequence and is deleted during preprocessing. Combining Figure 1 and Table 1 then gives the execution probability of each activity in net N1, e.g. ρ(a) = 600/600 = 1. The execution probability graph of net N1 is shown in Figure 2.
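The preprocessing step can be sketched as follows, using the same string encoding of traces. Only the 3 occurrences of acdgf among 600 traces come from the example above; the split of the remaining 597 traces is invented for illustration, and `remove_low_frequency` is a hypothetical name.

```python
from collections import Counter


def remove_low_frequency(log: list[str], xi: float) -> list[str]:
    """Drop every trace whose relative frequency kappa / K is below xi (Definition 3.4)."""
    K = len(log)
    kappa = Counter(log)  # number of occurrences of each distinct trace
    return [trace for trace in log if kappa[trace] / K >= xi]


# Hypothetical log of K = 600 traces; acdgf occurs 3 times (3/600 = 0.005).
log = ["abdef"] * 300 + ["abdeg"] * 297 + ["acdgf"] * 3
cleaned = remove_low_frequency(log, xi=0.01)
print(len(cleaned), "acdgf" in cleaned)  # 597 False
```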
Definition 3.5 (Adjacent Activity): In a process model, for x, y ∈ T, x and y are adjacent activities if and only if there is an occurrence sequence σ = t_1 t_2 … t_n and an index i (1 ≤ i < n) such that t_i = x and t_{i+1} = y. For example, the adjacent activities in N1 include ab, ac, bd and cd.
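Adjacent activities can be collected by sliding a window of length two over occurrence sequences. This sketch assumes the sequences are already enumerated (loops would need a bounded unfolding). The four N1 sequences used here are abdef, abdeg and acdef from the later example plus abdeh, which the pair eh in TAR1 implies, so treat the input as illustrative; `adjacent_pairs` is a hypothetical helper.

```python
def adjacent_pairs(sequences: list[str]) -> set[str]:
    """Return every pair "xy" such that y directly follows x in some sequence."""
    pairs = set()
    for seq in sequences:
        for x, y in zip(seq, seq[1:]):
            pairs.add(x + y)
    return pairs


n1_sequences = ["abdef", "abdeg", "abdeh", "acdef"]
print(sorted(adjacent_pairs(n1_sequences)))
# ['ab', 'ac', 'bd', 'cd', 'de', 'ef', 'eg', 'eh']
```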
Definition 3.6 (Process Matrix): Let N be a labelled Petri net. The process matrix NM of N is defined entry-wise by NM(i, j) = … if t_i and t_j are adjacent activities, and NM(i, j) = 0 otherwise. (1)

The process matrix of N1 is shown in Table 2. The process matrices of N2 and N3 are obtained in the same way from their actual executions, as shown in Tables 3 and 4.
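The entry value in Equation (1) is not legible in this text, so the sketch below takes it as a caller-supplied function `weight(ti, tj)`. The example passes ρ(tj), the execution probability of the successor activity, purely as a hypothetical choice; the function name and the probabilities are illustrative.

```python
from typing import Callable


def process_matrix(activities: list[str],
                   adjacency: set[tuple[str, str]],
                   weight: Callable[[str, str], float]) -> list[list[float]]:
    """NM(i, j) = weight(ti, tj) if ti and tj are adjacent activities, else 0."""
    index = {t: i for i, t in enumerate(activities)}
    nm = [[0.0] * len(activities) for _ in activities]
    for ti, tj in adjacency:
        nm[index[ti]][index[tj]] = weight(ti, tj)
    return nm


# Hypothetical execution probabilities for a choice a -> (b | c) -> d.
rho = {"a": 1.0, "b": 0.6, "c": 0.4, "d": 1.0}
adj = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
nm = process_matrix(["a", "b", "c", "d"], adj, weight=lambda ti, tj: rho[tj])
for row in nm:
    print(row)
```

Keeping the weight pluggable lets the same construction be reused once the exact entry formula of Equation (1) is known.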
The three process matrices have different dimensions, so to compare two processes through their process matrices, the matrices must first be homodimensionalised.
Definition 3.7 (Homodimensionalisation of Process Matrices): Let N1 and N2 be two labelled Petri nets with process matrices NM1 and NM2. Their homodimensional process matrices DNM1 and DNM2 are defined as follows: (1) DNM1 and DNM2 are n × n matrices, where n = |T1 ∪ T2|; (2) DNM1 and DNM2 have the same row and column activity names, namely the union of the activity names of N1 and N2, T1 ∪ T2 = {a_1, a_2, ..., a_n}; (3) DNM1(i, j) and DNM2(i, j), the elements in the i-th row and j-th column of DNM1 and DNM2, are DNM1(i, j) = NM1(a_i, a_j) if a_i, a_j ∈ T1 and 0 otherwise, and DNM2(i, j) = NM2(a_i, a_j) if a_i, a_j ∈ T2 and 0 otherwise. For example, homodimensionalising the process matrices of N1, N2 and N3 yields Tables 5-7.
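A sketch of homodimensionalisation under the zero-padding reading: rows and columns are re-indexed by the sorted union T1 ∪ T2, and every entry involving an activity that a process lacks is 0. Function and variable names, and the two small matrices, are hypothetical.

```python
def homodimensionalise(nm: list[list[float]], activities: list[str],
                       union: list[str]) -> list[list[float]]:
    """Embed a process matrix into the |union| x |union| matrix DNM, zero-padded."""
    pos = {t: i for i, t in enumerate(activities)}
    dnm = [[0.0] * len(union) for _ in union]
    for i, ai in enumerate(union):
        for j, aj in enumerate(union):
            if ai in pos and aj in pos:
                dnm[i][j] = nm[pos[ai]][pos[aj]]
    return dnm


acts1 = ["a", "b", "c"]
nm1 = [[0.0, 0.5, 0.5], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
acts2 = ["a", "b", "d"]
nm2 = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
union = sorted(set(acts1) | set(acts2))  # ['a', 'b', 'c', 'd']
dnm1 = homodimensionalise(nm1, acts1, union)
dnm2 = homodimensionalise(nm2, acts2, union)
print(dnm1[0], dnm2[1])  # [0.0, 0.5, 0.5, 0.0] [0.0, 0.0, 0.0, 1.0]
```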

Similarity between business processes
Approach degree (nearness degree) describes the degree of similarity between fuzzy sets in fuzzy mathematics (Xie & Liu, 2013). This paper adopts the approach degree to measure the similarity between business processes.
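Proximity principle: Let A_1, A_2, ..., A_n and B be fuzzy sets and σ an approach degree. If σ(B, A_k) = max{σ(B, A_1), σ(B, A_2), ..., σ(B, A_n)}, then B is said to be closest to A_k, that is, B is assigned to the class of A_k; this is the proximity principle.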

Definition 4.3 (Biggest-Smallest Approach Degree): Let N1 and N2 be two labelled Petri nets with homodimensional process matrices DNM1 and DNM2. The biggest-smallest approach degree σ(N1, N2) of N1 and N2 is defined over the entries of DNM1 and DNM2, where DNM1(i, j) ∧ DNM2(i, j) = inf{DNM1(i, j), DNM2(i, j)}. As an example, we calculate the biggest-smallest approach degree between N1 and N2 and between N1 and N3.
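Because the defining formula is not legible here, the sketch below uses the textbook form of the max-min (biggest-smallest) approach degree: the sum of entry-wise minima divided by the sum of entry-wise maxima. Treat this as an assumption; the exact normalisation used in the paper is not recoverable from this text, and `max_min_approach_degree` and the matrices are hypothetical.

```python
def max_min_approach_degree(A: list[list[float]], B: list[list[float]]) -> float:
    """Textbook max-min approach degree of two equally sized non-negative matrices."""
    pairs = [(a, b) for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b)]
    num = sum(min(a, b) for a, b in pairs)  # sum of DNM1(i, j) ∧ DNM2(i, j)
    den = sum(max(a, b) for a, b in pairs)  # sum of DNM1(i, j) ∨ DNM2(i, j)
    return 1.0 if den == 0 else num / den   # two all-zero matrices count as identical


A = [[0.0, 0.5], [0.5, 1.0]]
B = [[0.0, 0.5], [0.0, 1.0]]
print(max_min_approach_degree(A, B))  # 0.75
```

The value lies in [0, 1] and equals 1 exactly when the two matrices coincide, which is what the distance properties in the performance section rely on.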
Besides the biggest-smallest approach degree, common approach degrees include the lattice approach degree, the distance approach degree, and the minimum approach degree (Table 8). Comparing the four approach degrees shows that σ(N1, N2) = σ(N1, N3) in each case. To select an appropriate approach degree for calculating similarity, we compute the average value of the four methods, which is 0.775. The biggest-smallest approach degree is closest to this average, so it is selected as the method for calculating similarity.

Algorithm 1: Process Similarity Algorithm
Input: two labelled Petri nets N1 and N2, event logs L1 and L2
Output: similarity sim of N1 and N2
1. for each a_i ∈ T1 do
2. ρ(a_i) ← ComputeActivityExecutionProbability(N1, L1, a_i)
…
14. return sim(N1, N2)
15. end

Algorithm 1 calculates the similarity between processes: lines 1-4 construct the process matrices, lines 5-10 homodimensionalise them, and lines 12-13 calculate the process similarity.
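The rule used to pick an approach degree (keep the method whose value lies closest to the mean of all candidates) is easy to state in code. The values below are hypothetical placeholders, not the entries of Table 8, and `closest_to_mean` is our own helper name; on a tie, `min` keeps whichever method is listed first.

```python
def closest_to_mean(values: dict[str, float]) -> str:
    """Return the name of the approach degree whose value is nearest the mean."""
    mean = sum(values.values()) / len(values)
    return min(values, key=lambda name: abs(values[name] - mean))


# Hypothetical approach-degree values for one pair of processes.
candidates = {"biggest-smallest": 0.72, "lattice": 0.95, "distance": 0.80, "minimum": 0.45}
print(closest_to_mean(candidates))  # biggest-smallest
```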
For N1, sim(N1, N2) = sim(N1, N3), so the optimal process still cannot be selected. To select the business process with the lowest service cost, this paper proposes a process recommendation method based on cost constraints.

Process recommendation method based on cost constraints
In actual execution, a process may follow certain preferences, so some activities have a high execution probability and others a low one; correspondingly, some paths are frequent sequences and others are infrequent. Since N2 and N3 are both structurally similar to N1 with the same similarity value, we add a second criterion, the cost constraint, to distinguish them, and choose the one with lower cost as the final process. A process recommendation method based on cost constraints is therefore proposed.
Definition 5.1 (Cost Constraint): A cost constraint is denoted by an interval [p, q], where p and q are non-negative real numbers with p ≤ q. The length of the cost constraint [p, q] is l([p, q]) = q − p + 1.
Algorithm 2:
…
17. min_distance = distance
18. occurrence_sequence = θ_j
19. end if
20. end for
21. end for
22. C = cost(θ_j) × ρ(θ_i)

Definition 5.2 (Occurrence Sequence Set): Let N be a labelled Petri net. The set of all possible occurrence sequences from the start node to the target node is denoted SN.
Definition 5.3 (Frequent Sequence, Sequence Cost): For any occurrence sequence θ ∈ SN, let w = |θ| be the length of θ. The occurrence probability of θ is P(θ) = …, and θ is a frequent sequence when P(θ) is not lower than a given threshold δ. The cost of the sequence occurrence is cost(θ) = Σ_{i=1}^{w} cost[θ(i)], where the cost of each activity θ(i) is determined by the length of its cost constraint.
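A sketch of the cost side of Definition 5.3, assuming each activity carries its own cost constraint [p, q] and reading the interval length as q − p + 1. The probability formula is not legible here, so occurrence probabilities are passed in directly, and a sequence is treated as frequent when its probability is at least δ (an assumption consistent with the example that follows). The constraint intervals and the probability of abdeh are invented; all helper names are hypothetical.

```python
def interval_length(p: float, q: float) -> float:
    """Length of a cost constraint [p, q], read here as q - p + 1."""
    if not 0 <= p <= q:
        raise ValueError("a cost constraint needs 0 <= p <= q")
    return q - p + 1


def sequence_cost(seq: str, constraints: dict[str, tuple[float, float]]) -> float:
    """cost(theta) = sum over positions i of cost[theta(i)]."""
    return sum(interval_length(*constraints[act]) for act in seq)


def frequent_sequences(prob: dict[str, float], delta: float) -> list[str]:
    """Occurrence sequences whose probability is not lower than delta."""
    return [seq for seq, p in prob.items() if p >= delta]


constraints = {"a": (2, 4), "b": (1, 5), "d": (0, 3), "e": (1, 1), "f": (3, 6)}
print(sequence_cost("abdef", constraints))  # 17  (3 + 5 + 4 + 1 + 4)
prob = {"abdef": 0.42, "abdeg": 0.21, "acdef": 0.18, "abdeh": 0.10}
print(frequent_sequences(prob, delta=0.15))  # ['abdef', 'abdeg', 'acdef']
```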
Analysis of Algorithm 2: the algorithm calculates the occurrence probability of each occurrence sequence of N1, determines which sequences are frequent, and calculates the consumption cost of each occurrence sequence of N2 (Algorithm 2, lines 1-12). It then finds, for each frequent sequence of N1, the most similar occurrence sequence in N2, using the edit distance of Levenshtein (1966), and calculates the corresponding consumption cost (Algorithm 2, lines 13-22).
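The matching step can be sketched with the classic dynamic-programming edit distance of Levenshtein (1966), treating each activity as one symbol. Breaking ties by lower cost is our assumption (the text states no tie rule), chosen because it reproduces the N2 match abdeh for abdeg in the example below. Restricting the candidates to the three N2 sequences whose costs are reported is also a simplification; `levenshtein` and `nearest_sequence` are hypothetical helpers.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def nearest_sequence(target: str, cost: dict[str, float]) -> str:
    """Candidate closest to target by edit distance; ties go to the cheaper one."""
    return min(cost, key=lambda cand: (levenshtein(target, cand), cost[cand]))


n2_costs = {"abdef": 24, "abdeh": 23, "acdef": 26}  # consumption costs reported for N2
for frequent in ["abdef", "abdeg", "acdef"]:
    print(frequent, "->", nearest_sequence(frequent, n2_costs))
# abdef -> abdef, abdeg -> abdeh, acdef -> acdef
```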
Example: we calculate the probabilistic cost between N1 and N2 and between N1 and N3 (Figures 3 and 4).
Let δ = 0.15. The frequent sequences of process N1 are abdef, abdeg and acdef, with occurrence probabilities 0.42, 0.21 and 0.18. The occurrence sequences of process N2 most similar to these frequent sequences are abdef, abdeh and acdef, with consumption costs 24, 23 and 26, respectively, so the probabilistic cost is 24 × 0.42 + 23 × 0.21 + 26 × 0.18 = 19.59. The occurrence sequences of process N3 most similar to the frequent sequences of N1 are abdef, abdeg and acdef, with consumption costs 29, 30 and 31, respectively, so the probabilistic cost is 29 × 0.42 + 30 × 0.21 + 31 × 0.18 = 24.06. Since N2 has the lower probabilistic cost, it is selected as the optimal process for N1 (Tables 9 and 11).
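The arithmetic of this example can be checked directly. The sketch multiplies each matched sequence's consumption cost by the occurrence probability of the corresponding frequent sequence of N1 and keeps the candidate with the smaller total; `probabilistic_cost` is a hypothetical helper name, and the numbers are those of the example.

```python
from typing import Iterable


def probabilistic_cost(matches: Iterable[tuple[float, float]]) -> float:
    """Sum of cost(matched sequence) * P(frequent sequence of N1)."""
    return sum(cost * prob for cost, prob in matches)


probs = [0.42, 0.21, 0.18]  # P(abdef), P(abdeg), P(acdef) in N1
n2 = probabilistic_cost(zip([24, 23, 26], probs))
n3 = probabilistic_cost(zip([29, 30, 31], probs))
best = min({"N2": n2, "N3": n3}.items(), key=lambda kv: kv[1])[0]
print(round(n2, 2), round(n3, 2), best)  # 19.59 24.06 N2
```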

Feasibility analysis of similarity calculation method
The similarity between the processes is calculated with current mainstream similarity algorithms and compared with the method in this paper; the results are shown in Table 12. Algorithm 1 yields 0.78, which lies within the range 0.70 to 0.94 produced by the mainstream algorithms, indicating that the method is feasible.

Time complexity analysis
Algorithm 1 consists of two main parts: computing the homodimensional matrices and computing the similarity. For homodimensional matrices of order n, computing the matrices takes O(n²) time. The similarity calculation evaluates DNM1(i, j) ∧ DNM2(i, j) for every entry, which also takes O(n²) time, so the total time complexity is O(n²). For Algorithm 2, let n_1 and m_1 be the numbers of occurrence sequences of N1 and N2, respectively. Traversing the occurrence sequences of N1 and N2 and computing the similarity between every pair of sequences takes O(n_1 + m_1 + n_1 × m_1) time.

Performance evaluation
To evaluate the performance of the algorithm, we manually created 200 process models and randomly divided them into three datasets: datasets 1, 2 and 3 contain 38, 62 and 100 process models, respectively.
The greater the distance between two processes, the smaller their similarity, and vice versa, so similarity can be converted into a distance to check the properties of a metric. The distance between two processes is d(N1, N2) = 1 − sim(N1, N2).

(1) Non-negativity: by the definition of the approach degree, 0 ≤ sim(N1, N2) ≤ 1, so d(N1, N2) ≥ 0. (2) Symmetry: the distance between any two processes is unique, d(N1, N2) = d(N2, N1); that is, the similarity of two processes does not depend on their order. (3) Identity: if two processes are identical, the distance between them is 0, that is, sim(N1, N2) = 1. (4) Triangle inequality: for any three processes N1, N2 and N3, the sum of any two of the distances d(N1, N2), d(N1, N3) and d(N2, N3) is greater than or equal to the third, e.g. d(N1, N2) + d(N2, N3) ≥ d(N1, N3).

The triangle inequality is verified through its satisfaction rate. Given n process models, three can be selected in C(n, 3) ways; if m of these triples satisfy the triangle inequality, the satisfaction rate is m/C(n, 3). Comparing the satisfaction rates of mainstream algorithms (Figure 5) shows that the SSDT and CF algorithms perform poorly, PTS satisfies the triangle inequality on only part of the data, and TAR, GED, BP and Algorithm 1 achieve better satisfaction rates than PTS, SSDT and CF.
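The triangle-inequality satisfaction rate m/C(n, 3) can be computed by enumerating every triple of models. This sketch converts similarity to distance as d = 1 − sim, checks all three orderings of each triple with a small tolerance for floating-point error, and uses Jaccard similarity on toy activity sets only as a stand-in for Algorithm 1; the function names and models are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Callable, Sequence, TypeVar

M = TypeVar("M")


def triangle_satisfaction_rate(models: Sequence[M], sim: Callable[[M, M], float],
                               eps: float = 1e-12) -> float:
    """Fraction m / C(n, 3) of model triples whose distances obey the triangle inequality."""
    def d(x: M, y: M) -> float:
        return 1.0 - sim(x, y)

    m = 0
    for a, b, c in combinations(models, 3):
        ab, ac, bc = d(a, b), d(a, c), d(b, c)
        if ab + bc >= ac - eps and ab + ac >= bc - eps and ac + bc >= ab - eps:
            m += 1
    return m / comb(len(models), 3)


def jaccard(x: frozenset, y: frozenset) -> float:
    """Stand-in similarity: Jaccard index of two non-empty activity sets."""
    return len(x & y) / len(x | y)


toy_models = [frozenset("ab"), frozenset("abc"), frozenset("cd"), frozenset("a")]
print(triangle_satisfaction_rate(toy_models, jaccard))  # 1.0
```

Jaccard distance is a true metric, so every triple passes; a similarity measure that violates the inequality on some triples yields a rate below 1.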
Besides satisfying the four properties above, the similarity method in this paper also takes data and cost into account (Table 13).
In terms of running time, the running times of the different algorithms on the datasets are compared in Figure 6. The CF algorithm is time-consuming and has poor running efficiency, while Algorithm 1 takes less time than all the other algorithms except TAR.

Figure 6. Comparison of running time.

Conclusion and future work
To simplify enterprise working procedures and avoid unnecessary expenses, this paper proposes a process recommendation method based on cost constraints that addresses existing problems in process retrieval. First, the concept of activity execution probability is introduced, the execution probability of each activity is calculated, and a process matrix is constructed from it; the similarity between processes is then calculated by the biggest-smallest approach degree to find a set of structurally similar processes. Finally, to distinguish processes with similar structures, a cost-constraint-based process recommendation method is proposed to find the business processes with lower service cost in that set. The experimental results show that the similarity calculation is feasible and that the business process with lower cost is recommended. Although data and cost are taken into account, the running time is lower than that of all compared algorithms except TAR. However, the method requires both a process model and a concrete process execution log, so it needs more inputs than other methods, and infrequent behaviour is simply discarded when calculating execution probabilities. In future work, besides handling infrequent behaviour, we will study the application of the process recommendation method to industrial scenarios to make it more widely applicable.