Machine learning methods based on probabilistic decision tree under the multi-valued preference environment

Abstract In classification problems, an attribute of a sample is sometimes not unique: it may take several values, each with its own probability. It is therefore meaningful to develop appropriate methods for making classification decisions on such data. To solve this issue, this paper proposes machine learning methods based on a probabilistic decision tree (DT) under the multi-valued preference environment and the probabilistic multi-valued preference environment, respectively, for the different classification aims. First, this paper develops a data pre-processing method to deal with the weight and quantity matching under the multi-valued preference environment. In this method, we use the least common multiple and weight assignments to balance the probability of each preference. Then, based on the training data, this paper introduces the entropy method to further optimize the DT model under the multi-valued preference environment. After that, the corresponding calculation rules and probability classifications are given. In addition, considering the different numbers and probabilities of the preferences, this paper also uses the entropy method to develop the DT model under the probabilistic multi-valued preference environment, and the calculation rules and probability classifications are derived in the same way. Finally, we demonstrate the feasibility of the machine learning methods and the DT models under the above two preference environments with illustrated examples.


Introduction
Machine learning plays a vital role in the development of artificial intelligence. It is used to handle data sets containing a lot of messy information (Blum & Langley, 1997), and it can be widely applied in various fields, such as sales (Sun et al., 2008; Tsoumakas, 2019; Wong & Guo, 2010), credit assessment (Kruppa et al., 2013; Pal et al., 2016), and autonomous vehicles (Qayyum et al., 2020). Moreover, machine learning is commonly divided into two types: unsupervised learning and supervised learning. Unsupervised learning classifies samples through data analysis of a large number of samples without category information (Barlow, 1989; Goldsmith, 2001). The most common unsupervised learning algorithm is the k-means method (Capó et al., 2020; Han et al., 2020). Supervised learning learns a function from given training data; when new data arrive, it predicts the result based on the learned function (Figueiredo, 2003; Gerfo et al., 2008). Supervised learning contains many algorithms, such as the decision tree (DT) (Pal & Mather, 2003; Polat & Güneş, 2007), the k-nearest neighbor (Zhang & Zhou, 2007; Lee et al., 2019), the random forests (Mantas et al., 2019), and the support vector machines (Utkin, 2019; Yuan et al., 2010). Machine learning is thus not only widely used but also has many kinds of extended algorithms. Based on this, machine learning methods can be developed further by incorporating new conditions according to the actual requirements.
Among the above machine learning methods, the DT is a basic classification, regression (Galton, 1889), and data mining (Han & Micheline, 2006) method. Hunt (1965) proposed the Concept Learning System (CLS), which introduced the concept of the DT. Based on the ID3 algorithm, the DT was formally proposed and defined (Quinlan, 1986, 1987). After that, some scholars continued to research and improve this method based on the ID3 algorithm, resulting in many classic classification algorithms such as the CART (Crawford, 1989; Rutkowski et al., 2014), the C4.5 (Quinlan, 1996), and the SLIQ (Narasimha & Naidu, 2014). In addition, these methods have a large number of extended applications (Bhargava et al., 2013; Hardikar et al., 2012; Ruggieri, 2002; Santhosh, 2013). Although the above inductive algorithms for constructing DTs can perform classification and regression in the single-valued attributes environment, the operation results are not satisfactory when the environment changes (Chen et al., 2003).
To address this limitation, the multi-valued concept and the multi-valued environment were presented (Miao & Wang, 1997). Some scholars built models by using the similarity (Clarke, 1993; Santini & Jain, 1999; Tversky, 1977) to solve the problem under the defined multi-valued and multi-labeled environments (Chen et al., 2010; Cheng, 2014; Chou & Hsu, 2005; Hsu, 2021; Yi et al., 2011; Zhao & Li, 2007). These studies realized the transition from the single-valued to the multi-valued environments, which lays a research foundation for a decision classification algorithm under the multi-valued preference environment. However, in the decision-making process, the numbers and weights of the preferences are sometimes different. This provides a further research direction for this paper: extending the multi-valued preference environment to the probabilistic multi-valued preference environment.
To achieve the above aims, this paper is organized as follows: This paper briefly reviews the traditional DT models in Section 2. The weight and quantity matching method and the DT algorithm under the multi-valued preference environment are proposed in Section 3. In Section 4, we further propose a machine learning model under the probabilistic multi-valued preference environment and demonstrate the operation process with an example. Section 5 gives the derived conclusions.

Preliminaries
The concept of the probabilistic multi-valued attributes environment can be understood from two perspectives. The first perspective is the environment with the multi-valued preferences; the second perspective is the environment with the probabilistic multi-valued preferences. Both of these environments are established based on the traditional single-valued attributes environment. Therefore, before introducing the two new environments in detail, we first briefly review machine learning in the traditional single-valued attributes environment. Then, this paper introduces the construction process of the DT model under the traditional single-valued attributes environment. Finally, we describe the multi-valued and probabilistic multi-valued preference environments. Comparing the three environments through examples presents the changes intuitively.

The DT model in a traditional single-valued attributes environment
The traditional DTs include the ID3, C4.5, and CART methods. In this paper, the ID3 algorithm is selected as the basic technique. This method uses the entropy method and information gain to construct a DT, where the information gain is the difference between the information entropy and the conditional entropy, and then selects the attribute with the largest information gain as the optimal attribute to form a classification node. The main construction steps of the DT model are shown below.
Assume that the training data set is $S$, where $|S|$ denotes the number of samples. The samples have $K$ classes labeled $C_k$, $k = 1, 2, 3, \ldots, K$, and $|C_k|$ represents the number of samples belonging to the class $C_k$. If a discrete attribute $A$ has $m$ possible values, then $A$ divides the training data set $S$ into $D$ training data subsets, each labeled $S_d$ $(d = 1, 2, \ldots, D)$, where $|S_d|$ is the number of samples in $S_d$. Let $S_{dk}$ denote the set of samples in the subset $S_d$ that belong to the class $C_k$, and let $|S_{dk}|$ be its number of samples. In the standard ID3 form, the information gain is calculated as follows:
$$g(S, A) = H(S) - H(S \mid A) \tag{1}$$
$$H(S) = -\sum_{k=1}^{K} \frac{|C_k|}{|S|} \log_2 \frac{|C_k|}{|S|} \tag{2}$$
$$H(S \mid A) = \sum_{d=1}^{D} \frac{|S_d|}{|S|} H(S_d) = -\sum_{d=1}^{D} \frac{|S_d|}{|S|} \sum_{k=1}^{K} \frac{|S_{dk}|}{|S_d|} \log_2 \frac{|S_{dk}|}{|S_d|} \tag{3}$$
Eq. (1) denotes the information gain of the attribute $A$ in the training data set $S$; it measures the degree to which information uncertainty is reduced. Eq. (2) represents the information entropy of the training set $S$. Eq. (3) denotes the conditional entropy of the attribute $A$ in the training data set $S$. Generally, the classification ability of an attribute is enhanced with the increase of the information gain. Then, we can select the attribute with the largest information gain as the optimal attribute.
According to this idea, this paper traverses the entire training data set and finds out all the attributes to construct the probabilistic DT. Note that the study in this subsection is only based on the traditional single-valued attributes environment, and then we study the multi-valued and probabilistic multi-valued preference environment in the next section.
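The information-gain calculation described in this subsection can be sketched in a few lines of Python. This is a minimal illustration of the standard ID3 quantities (entropy, conditional entropy, information gain), not the authors' implementation; the function names and the six toy samples are assumptions made only for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Information entropy H(S) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Information gain H(S) - H(S|A) of one discrete attribute."""
    total = len(rows)
    subsets = {}
    for row, label in zip(rows, labels):
        # Group the class labels by the value the attribute takes (the subsets S_d).
        subsets.setdefault(row[attribute], []).append(label)
    conditional = sum(len(sub) / total * entropy(sub) for sub in subsets.values())
    return entropy(labels) - conditional


# Hypothetical single-valued toy data (not taken from Table 1).
rows = [
    {"outlook": "sunny", "windy": "false"},
    {"outlook": "sunny", "windy": "true"},
    {"outlook": "overcast", "windy": "false"},
    {"outlook": "rain", "windy": "false"},
    {"outlook": "rain", "windy": "true"},
    {"outlook": "overcast", "windy": "true"},
]
labels = ["N", "N", "P", "P", "N", "P"]

gains = {a: information_gain(rows, labels, a) for a in ("outlook", "windy")}
best = max(gains, key=gains.get)  # attribute with the largest information gain
```

On this toy data "outlook" separates the classes far better than "windy", so it would be chosen as the root node.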

The multi-valued and probabilistic multi-valued preference environment
Many real-life situations show multi-valued preference and probabilistic multi-valued preference characteristics. However, few models have been developed for the multi-valued and probabilistic multi-valued preference environments. Therefore, we study these two new environments and propose feasible probabilistic classification and machine learning models in this paper.
It is noted that the original data are derived from the DT cases of Quinlan (1987). Tables 1-3 respectively represent the above three environments. Each table has four attributes, namely "outlook, temperature, humidity, windy", and each attribute has its sub attributes: "outlook" contains the three sub attributes "sunny, overcast, rain"; "temperature" contains the three sub attributes "hot, mild, cool"; "humidity" contains the two sub attributes "high, normal"; and "windy" contains the two sub attributes "true, false". $D_i$ denotes the sample of day $i$. $o_{ij}$, $t_{ij}$, $h_{ij}$, and $w_{ij}$ represent the probability values of the $j$-th preference of the corresponding attribute in the $i$-th sample. Classes $P$ and $N$ represent the positive and negative instances, and $p_i$ and $1 - p_i$ denote the probability values of $P$ and $N$ in the $i$-th sample. With this notation, the three environments can be described in detail.
Table 1 shows a single-valued preference environment. Taking sample D13 from Table 1 as an illustration, each attribute retains only one attribute value as the final state: "outlook" selects "overcast", "temperature" selects "hot", "humidity" selects "high", and "windy" selects "false". In this single-valued preference environment, each rule corresponds to only one classification result, $P$ or $N$, which is called the absolute classification.
Table 2 shows a multi-valued preference environment, in which the preference values of a sample take multi-valued forms. Taking sample D23 in Table 2 as an example, each attribute contains one or more preferences. Here the attribute "outlook" selects three preferences, namely "overcast, sunny, rain", which means that three different preferences of "outlook" appeared on this day. This situation is common in real life. In this paper, we focus on which preferences occur, not on the order in which they occur. Based on this, a multi-valued preference environment can be defined as follows:
Definition 1. Let $D_i$ be the $i$-th sample with $j$ attributes, where each attribute contains $t$ preferences. This case is defined as the multi-valued preference environment, where the probability of each preference is equal, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, and $t = 1, 2, \ldots, l$.
According to Def. 1, all the attributes in the data may retain single or multiple preferences, and the weight of each preference in the same sample and the same attribute is equal. The probability of a preference is therefore not involved here.
Table 3 shows the multi-valued preference environment with probabilities. Taking D33 in Table 3 as an example, the difference from Table 2 is that each preference has its own probability; $O$, $T$, $H$, and $W$ respectively represent the possible probabilities. For the attribute "temperature" in D33, the occurrence probability of the preference "hot" is $t_{31}$, that of "mild" is $t_{32}$, and that of "cool" is $t_{33}$ on the same day, and the sum of the probabilities of these three preferences is 1.
Moreover, the attribute "humidity" has just one attribute value, "high". The occurrence probability of the attribute value "high" on this day can therefore be set as 100%, and the occurrence probability of the other attribute value "normal" can be set as 0%. Finally, by observing the three small samples in Table 3, we can find that the number of attribute values in each sample can differ, which means that the attribute situation that occurs every day is also different. We think that this situation, which includes the multi-valued preferences and their probabilities, is closer to real life. The three environments described above are three kinds of presentations and show different relationships. The multi-valued preference is a development of the single-valued attribute, while the probabilistic multi-valued preference is a development of the multi-valued preference. Therefore, this paper develops the DT model based on these different environments and analyzes them according to these two situations, which are the main innovations of this paper.
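To make the relationship between the three environments concrete, the following Python sketch stores one sample from each environment as a plain dictionary. It is only an illustration: the D13 values and the "outlook" preferences of D23 follow the text, while the remaining D23 values, all D33 probabilities, and the helper names `equal_weights` and `is_valid` are hypothetical.

```python
from math import isclose

# Single-valued environment: sample D13 as described in the text.
d13 = {"outlook": ["overcast"], "temperature": ["hot"],
       "humidity": ["high"], "windy": ["false"]}

# Multi-valued environment: the "outlook" preferences follow the text for D23;
# the other three attributes are hypothetical placeholders.
d23 = {"outlook": ["overcast", "sunny", "rain"], "temperature": ["hot", "mild"],
       "humidity": ["high"], "windy": ["false"]}

# Probabilistic multi-valued environment: all probabilities are hypothetical,
# standing in for the symbols o_ij, t_ij, h_ij, w_ij of Table 3.
d33 = {"outlook": {"overcast": 0.6, "rain": 0.4},
       "temperature": {"hot": 0.5, "mild": 0.3, "cool": 0.2},
       "humidity": {"high": 1.0, "normal": 0.0},
       "windy": {"false": 1.0}}


def equal_weights(sample):
    """Definition 1: every preference of an attribute gets the same probability."""
    return {attr: {p: 1 / len(prefs) for p in prefs} for attr, prefs in sample.items()}


def is_valid(prob_sample):
    """Check that the preference probabilities of every attribute sum to 1."""
    return all(isclose(sum(probs.values()), 1.0) for probs in prob_sample.values())


d23_as_probabilities = equal_weights(d23)
```

Seen this way, a single-valued sample is a multi-valued sample with one preference per attribute, and a multi-valued sample is a probabilistic one whose probabilities are all equal.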

The machine learning model in the multi-valued preference environment
As mentioned in Section 2.1, the ID3 algorithm is used to calculate single-valued discrete data. This method directly selects the optimal attribute by calculating the information gains and constructs the machine learning model. The hypothesis of the ID3 algorithm is that the weight of each sample in each column of attributes is 1. However, the main characteristic of the multi-valued and probabilistic multi-valued data sets studied in this paper is that the number of attribute values can differ within the same attribute and sample. Thus, these two types of data cannot be directly embedded in the information gain formula to select the optimal attribute in the calculation process.
Following the above discussion, building the DT in the multi-valued preference environment includes two main steps. The first step is to solve the aforementioned problem by preprocessing the original training data. In this step, the numbers and weights of the preferences in the original training data are re-matched to balance them. The second step is to construct a DT using the new training data. Comparing the optimal degree of each attribute and its preferences based on the bifurcation criterion shows that the optimal attribute differs from node to node. Based on this, we first propose the weight and quantity matching method under the multi-valued preference environment in the next subsection.

The weight and quantity matching in the multi-valued preference environment
In the multi-valued preference environment, the training data set $S$ contains $n$ samples with multi-dimensional attributes, which can be presented as a multi-dimensional matrix. Suppose that the attribute $A$ in the $i$-th sample contains $l$ preferences, which can be expressed as $\{a_{i1}, a_{i2}, \ldots, a_{il}\}$. Then, the preferences that are selected for the attribute $A$ in the training data set $S$ can be shown as the matrix below:
$$S_A = \begin{bmatrix} a_{11}, a_{12}, a_{13}, \ldots, a_{1l} \\ a_{21}, a_{22}, a_{23}, \ldots, a_{2l} \\ \vdots \\ a_{n1}, a_{n2}, a_{n3}, \ldots, a_{nl} \end{bmatrix}$$
where $S_A$ represents a matrix that includes the different preferences in the attribute $A$ of all the samples, which is a combination of the $n$ sets $\{a_{i1}, a_{i2}, a_{i3}, \ldots, a_{il}\}$, $i = 1, 2, \ldots, n$. $a_{it}$ denotes that the $i$-th sample selects the $t$-th preference in the attribute $A$, $t = 1, 2, \ldots, l$, and $|a_{it}|$ is defined as the weight of the preference $a_{it}$. Moreover, let $L_i$ denote the number of preferences belonging to the attribute $A$ in the $i$-th sample. Since each sample exists independently, the $L_i$ can be unequal. Therefore, different preferences have different weights under the attribute $A$.
To address the issue of uneven weights caused by the unequal numbers, this paper proposes a weight and quantity matching method, which uses the least common multiple to match the quantity of data. The principle of the least common multiple brings the amounts of data to the same value. Let the least common multiple be denoted by $h = 〚\cdot〛$. In this multi-valued training data set $S$, the least common multiple of the attribute $A$ can be calculated as $h = 〚L_1, L_2, L_3, \ldots, L_n〛$. The original training data set $S$ can be expanded accordingly, and the expansion multiple is $h / L_i$. In this way, the number of preferences in each sample with respect to the attribute $A$ becomes $h$, and the weight of each preference is reduced correspondingly. The preferences of the other attributes obtain new weights using the same weight and quantity matching method.
Then, we get a new multi-valued preference data set $S'$, and take $|a_{it}| / L_i$ $(i = 1, 2, \ldots, n;\ t = 1, 2, \ldots, l)$ as the new weight of each preference.
This completes the weight and quantity matching in the multi-valued preference environment. Next, we analyze how to use the weight and quantity matching method to develop the DT algorithm under the multi-valued preference environment.
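The weight and quantity matching of this subsection can be sketched as follows for a single attribute. This is a hedged illustration rather than the authors' code: it assumes that every original preference weight $|a_{it}|$ equals 1 (so the new weight is $1/L_i$), that preferences within a sample are distinct, and the function name `match_attribute` and the three toy samples are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def match_attribute(preference_sets):
    """Quantity and weight matching for one attribute.

    preference_sets[i] lists the preferences of the attribute in sample i.
    Returns the least common multiple h, the expanded preference lists
    (each of length h), and the new weight of every original preference.
    """
    counts = [len(prefs) for prefs in preference_sets]            # L_1, ..., L_n
    h = reduce(lambda a, b: a * b // gcd(a, b), counts, 1)        # least common multiple
    expanded = []
    for prefs in preference_sets:
        multiple = h // len(prefs)                                # expansion multiple h / L_i
        # Repeat each preference h / L_i times so that every sample holds h entries.
        expanded.append([p for p in prefs for _ in range(multiple)])
    # Assumed original weight |a_it| = 1, so the new weight is 1 / L_i.
    weights = [{p: Fraction(1, len(prefs)) for p in prefs} for prefs in preference_sets]
    return h, expanded, weights


# Hypothetical "outlook" preferences of three samples with L = 1, 2, 3.
outlook = [["sunny"], ["overcast", "rain"], ["sunny", "overcast", "rain"]]
h, expanded, weights = match_attribute(outlook)
```

With preference counts 1, 2, and 3 the least common multiple is 6, so each sample is expanded to six entries while the weights within a sample still sum to 1.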

The DT algorithm in the multi-valued preference environment
The DT model under the single-valued environment uses the information gain to select bifurcation nodes. Similarly, under the multi-valued attribute environment, we also construct the DT model based on the bifurcation nodes. However, because the environments differ, the selection criterion for the bifurcation nodes changes accordingly. In this paper, we develop a new method of selecting nodes under the multi-valued preference environment as the bifurcation criterion.
The following content explains these concepts in conjunction with the mathematical expressions and modeling process of the new DTs and machine learning. We also introduce the notations that are used in the following Sections, which are shown in Table 4.
Due to the different environments, this paper optimizes the algorithm in the single-valued preference environment and improves the information gain as a bifurcation criterion that can handle the multi-valued preferences. Moreover, this paper divides the node selection into two steps, which are the selection of the root node and the other nodes. The reason is that the selection of the root node only considers the changing relationship between the preference and classification. However, the selection of other nodes needs to be based on the known conditions of the previous node. Therefore, the weight modification should be considered, as well as the previous node's weights.
With respect to these principles, the root node in the multi-valued preference environment is characterized by three quantities: the root node bifurcation criterion, the bifurcation information entropy $bif_r(S')$ of the root node, and the conditional information entropy of the root node. Similarly, the sub node is characterized by the sub node bifurcation criterion, the bifurcation information entropy $bif_{sub}(S'_w)$ $(w = 1, 2, \ldots, W)$ of the sub node, and the conditional information entropy of the sub node. These six quantities correspond to Eqs. (4)-(9).
The notations of Table 4 used in these calculations include the following:
$|C_k|$: the sample number of the class $C_k$.
$1/R^{c}_{ik}$: the multiple by which the preferences $a^{c}_{iuk}$ should be reduced after quantity matching; an analogous multiple applies to the preferences $a^{c}_{iu}$.
$|a_{iw}|$: the weight of the $w$-th preference of the root node attribute in the $i$-th sample.
$|a_{iwk}|$: the weight of the $w$-th preference of the root node attribute that belongs to the class $k$ in the $i$-th sample.
In addition, for the preferences $a_{iw}$ of the root node attribute under a branch $Z$, $b$ denotes the number of samples with classification result $P$.
Therefore, we can obtain the root node and sub node bifurcation criteria in the multi-valued preference environment. After the numbers of attributes are matched, the preference weights change, so the previous nodes should be considered again when calculating the bifurcation criterion values of the nodes. Generally, the larger the bifurcation criterion, the more information is contained in the data and the greater the uncertainty.
However, the classification result of the DT may not be exact at this stage, because different classification results can appear under the same attribute branch. The final result can then be a multi-class result containing probabilities. Therefore, we should calculate the corresponding probabilities of the classification results in all branches to help decision-making. Assume that the preference set corresponds to a branch $Z$ and the classification results are $P$ and $N$. Then, we can obtain the probability $e_P$ that the classification result is $P$ by Eq. (10). Similarly, we can also get $e_N$. Thus, the main construction process of the probabilistic DT and machine learning in the multi-valued preference environment is given. We can make a decision based on the classification results that correspond to each branch; here, we choose the classification with a probability greater than 50%. To further demonstrate the calculation steps, we give the following pseudo-code.
Input: The multi-valued training data set $S$.
Step 1: Preprocess the multi-valued training data set $S$ to match its weight and quantity, and obtain a new data set $S'$.
Step 2: Take each attribute $A$ in the new training data set $S'$ as a node.
Step 3: Traverse the preferences in an attribute and divide the corresponding data into the different sub nodes.
Step 4: Use the bifurcation criterion to calculate the optimal preference for partitioning.
Step 5: Process all the attributes according to Step 2, select the best attribute and preference, and derive the final sub node.
Step 6: Perform Steps 3-5 again for all the nodes until each node becomes a final node, and then we can obtain a DT.
Step 7: Compare the DT with the original data and obtain the classification results under the same bifurcation, and then calculate the probability of each classification and obtain a probabilistic DT.
Output: A probabilistic DT.
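The seven steps above can be sketched as a small recursive procedure. Because the bifurcation criterion of this subsection is not reproduced here, the sketch swaps in the standard weighted information gain as a stand-in, and it carries the matched weights as fractional sample weights (a preference in a sample with $L_i$ preferences receives $1/L_i$ of the sample's current weight). The data layout, the function names, and the four toy samples are all assumptions made for illustration.

```python
from math import log2


def class_weights(samples):
    """Sum the current sample weights per class label."""
    totals = {}
    for s in samples:
        totals[s["label"]] = totals.get(s["label"], 0.0) + s["weight"]
    return totals


def entropy(samples):
    """Weighted entropy of the class labels."""
    totals = class_weights(samples)
    total = sum(totals.values())
    return -sum(w / total * log2(w / total) for w in totals.values() if w > 0)


def split(samples, attribute):
    """Send each sample down every branch it prefers, dividing its weight by L_i."""
    branches = {}
    for s in samples:
        prefs = s["prefs"][attribute]
        for p in prefs:
            branches.setdefault(p, []).append(dict(s, weight=s["weight"] / len(prefs)))
    return branches


def gain(samples, attribute):
    """Weighted information gain (stand-in for the paper's bifurcation criterion)."""
    total = sum(s["weight"] for s in samples)
    conditional = sum(
        sum(c["weight"] for c in subset) / total * entropy(subset)
        for subset in split(samples, attribute).values()
    )
    return entropy(samples) - conditional


def build(samples, attributes):
    """Return a nested dict: inner nodes hold branches, leaves hold class probabilities."""
    totals = class_weights(samples)
    total = sum(totals.values())
    if len(totals) == 1 or not attributes:
        return {"leaf": {label: w / total for label, w in totals.items()}}
    best = max(attributes, key=lambda a: gain(samples, a))
    rest = [a for a in attributes if a != best]
    return {"attribute": best,
            "branches": {p: build(sub, rest) for p, sub in split(samples, best).items()}}


# Hypothetical multi-valued training data (not taken from Table 5).
S = [
    {"prefs": {"outlook": ["sunny"], "windy": ["false"]}, "label": "N", "weight": 1.0},
    {"prefs": {"outlook": ["sunny", "rain"], "windy": ["true"]}, "label": "N", "weight": 1.0},
    {"prefs": {"outlook": ["overcast"], "windy": ["false", "true"]}, "label": "P", "weight": 1.0},
    {"prefs": {"outlook": ["rain", "overcast"], "windy": ["false"]}, "label": "P", "weight": 1.0},
]
tree = build(S, ["outlook", "windy"])
```

Each leaf stores the weight share of every class under its branch, which plays the role of the branch probabilities of Step 7; on this toy data the root is "outlook" and only the "rain" branch needs a further split on "windy".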
To clarify the above algorithm and its steps, we use an illustrated example to show its feasibility in the next subsection.

An illustrated example
To show the above machine learning model, a simple example is given in this section to demonstrate the operation process. For an intuitive comparison, this section follows the example in Section 2.2 and retains 10 samples, namely $S_i$ $(i = 1, 2, 3, \ldots, 10)$. We consider four attributes, which are outlook $A_1$, temperature $A_2$, humidity $A_3$, and windy $A_4$. The training data set $S$ is shown in Table 5, and it is further expanded and shown in Table 6. The final classification DT is shown in Figure 1.
According to the steps to build a multi-valued DT, this subsection first performs the quantity matching in the original data set $S$ and calculates the least common multiple of each of the four attributes, which gives the matching values $h_1 = 6$, $h_2 = 6$, $h_3 = 2$, and $h_4 = 2$. Then, based on the result of quantity matching, the original data set $S$ is expanded to a new data set $S'$ as shown in Table 6. The number and weights of attribute values in the same attribute are now kept consistent. Further, we use the bifurcation criterion to select the optimal attributes and construct a DT. In the following content, we only show the selection process of the first node. The calculation processes of the other nodes are the same, and the results are shown in Table 7.
In Table 7, the left part shows the classification process when the attribute value is "sunny", and the right part shows the classification process when the attribute value is "rain". Under both branches, the bifurcation criterion value of $A_3$ is greater than the values of $A_2$ and $A_4$. Thus, the attribute $A_3$ is selected as the optimal node. Since the classification with respect to the attribute $A_3$ still contains $P$ and $N$, $A_3$ is only used as a bifurcation node. The two preferences "high" and "normal" of attribute $A_3$ can divide the data set again. Thus, $S_1$ is divided into $S_{11}$ and $S_{12}$. These two sets each select the next bifurcation node from the remaining attributes $A_2$ and $A_4$. Table 7 shows the more detailed results.
From the calculation results in Table 7, it can be seen that in $S_{11}$ the bifurcation criterion value of the attribute $A_2$ is greater than the value of $A_4$, so $A_2$ is selected as the bifurcation node and the last attribute $A_4$ is left as a stop node. The same holds in $S_{12}$: the bifurcation criterion value of $A_2$ is greater than the value of $A_4$, so $A_2$ is selected as the bifurcation node and $A_4$ is left as a stop node. Based on this, we temporarily get a DT that lacks classification results. Since the classification results also contain probabilities, we need to calculate them again. Figure 1 is a probabilistic DT derived from the training data set. The ellipse represents the attribute of the bifurcation node; the better the classification performance of an attribute is, the closer it is to the root of the tree. The line denotes the division preference, and the rectangle shows the probability of the classification result. The upper and lower numbers represent the probabilities of the classification results $P$ and $N$, respectively, under the bifurcation. For example, when the preferences are "rain, high, false, mild", the corresponding classification result is $P = 100\%$ and $N = 0\%$, so $P$ is the better choice under this branch. If the preferences are "sunny, high, cool, false", the corresponding classification result is $P = 18.18\%$ and $N = 81.82\%$, so $N$ is the better choice under this branch. If the preferences are "sunny, high, hot, true", the corresponding classification result is $P = 55\%$ and $N = 45\%$. There is very little difference between the classification probabilities of $P$ and $N$ under this branch, so the two alternatives are nearly equivalent.
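The way these branch probabilities are turned into a decision can be sketched as follows. The three probability pairs are the ones quoted above for Figure 1; the function `decide` and its `margin` parameter are hypothetical additions that make the "nearly equivalent" case explicit and are not part of the method as stated.

```python
def decide(probabilities, margin=0.0):
    """Return the class whose probability exceeds 50% by more than `margin`.

    Returns None when no class is that far ahead, i.e. the branch is treated
    as undecided.
    """
    label, p = max(probabilities.items(), key=lambda item: item[1])
    return label if p > 0.5 + margin else None


# Branch probabilities quoted in the text for Figure 1.
branches = {
    ("rain", "high", "false", "mild"): {"P": 1.00, "N": 0.00},
    ("sunny", "high", "cool", "false"): {"P": 0.1818, "N": 0.8182},
    ("sunny", "high", "hot", "true"): {"P": 0.55, "N": 0.45},
}

strict = {path: decide(probs) for path, probs in branches.items()}
cautious = {path: decide(probs, margin=0.1) for path, probs in branches.items()}
```

With the plain 50% rule the third branch is assigned to $P$; with an assumed margin of ten percentage points it is left undecided, which matches the observation that the two alternatives are close there.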
In summary, to develop a DT model in machine learning under the multi-valued preference environment, this section first proposes a weight and quantity matching method in the multi-valued preference environment. Then, the corresponding DT algorithm is proposed and applied to an illustrated example. In the next section, we develop a machine learning model in the probabilistic multi-valued preference environment.

The machine learning model in the probabilistic multi-valued preference environment
It is found that the probability of each preference is the same in the machine learning models under the multi-valued preference environment. However, in the real classification calculation process, the probability of each preference may be different. Therefore, in this section, we propose a machine learning model in the probabilistic multi-valued preference environment. First, we propose a DT algorithm under this environment in the following subsection.

The DT algorithm in the probabilistic multi-valued preference environment
The probabilistic multi-valued preference environment is an expansion of the multi-valued preference environment. The training data set in the probabilistic multi-valued preference environment not only contains multi-valued preferences, but each preference also carries its own probability instead of an equal weight value.
In Section 3.1, to re-match the numbers and weights of the preferences under all the attributes, the original data $S$ in the multi-valued preference environment needs to be preprocessed, which ensures that each attribute is balanced during the calculation process. However, each preference in the probabilistic multi-valued environment already contains a probability. It is therefore unnecessary to preprocess the data in this environment, and we can directly calculate the bifurcation criterion of each attribute. Based on this, we can select the optimal bifurcation node and bifurcation sub node. Additionally, since each preference contains a probability, the preference probability needs to be included throughout the entire calculation process. In this way, we can build a DT model in the probabilistic multi-valued preference environment, which is also a machine learning model. Some related notations used in this section are given in Table 8.
Although this environment does not require preprocessing of the original training data set, we need to consider the probability contained in each preference. Then, we subdivide this modeling process into two steps, namely the selection of the root node and the other nodes.
The selection of the optimal root node is done individually, and each branch only considers a certain attribute. We calculate the bifurcation criterion of each attribute separately, and the attribute with the largest bifurcation criterion is the optimal one. There is only one optimal root node in a DT, while there may be multiple optimal sub nodes. The selection principle of all the sub nodes is the same. In this case, the sub node not only considers the current attribute and its belonging preference, but also considers all the previous nodes including the root node. Then, we can take the conditional probability in this environment as the probabilities of other sub nodes when the root node is known, and the corresponding probabilities are multiplied in the calculation process.
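The multiplication of probabilities along a branch can be illustrated with a short sketch. It assumes, for illustration only, that the contribution of a sample to a branch is the product of the probabilities of the preferences on that branch and that each training sample carries a single class label; the function names and the two toy samples are hypothetical.

```python
def branch_weight(sample, path):
    """Product of the sample's preference probabilities along a branch.

    `path` is a list of (attribute, preference) pairs from the root downwards.
    """
    weight = 1.0
    for attribute, preference in path:
        weight *= sample["prefs"][attribute].get(preference, 0.0)
    return weight


def class_probabilities(samples, path):
    """Share of the total branch weight that falls on each class."""
    totals = {}
    for s in samples:
        totals[s["label"]] = totals.get(s["label"], 0.0) + branch_weight(s, path)
    total = sum(totals.values())
    return {label: w / total for label, w in totals.items()} if total else {}


# Hypothetical probabilistic multi-valued samples (not taken from Table 9).
T = [
    {"prefs": {"outlook": {"sunny": 0.7, "rain": 0.3}, "humidity": {"high": 1.0}},
     "label": "N"},
    {"prefs": {"outlook": {"sunny": 0.2, "overcast": 0.8},
               "humidity": {"high": 0.5, "normal": 0.5}},
     "label": "P"},
]
path = [("outlook", "sunny"), ("humidity", "high")]
weights = [branch_weight(s, path) for s in T]
probs = class_probabilities(T, path)
```

Here the first sample reaches the branch "sunny, high" with weight 0.7 × 1.0 and the second with weight 0.2 × 0.5, so the branch leans strongly towards $N$.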
With respect to these principles, the root node in the probabilistic multi-valued environment is characterized by three quantities: the root node bifurcation criterion, the bifurcation information entropy $Mpbif_r(T')$ of the root node, and the conditional information entropy of the root node. Similarly, the sub node is characterized by the sub node bifurcation criterion, the bifurcation information entropy $Mpbif_{sub}(T'_w)$ $(w = 1, 2, \ldots, W)$ of the sub node, and the conditional information entropy of the sub node. These six quantities correspond to Eqs. (11)-(16).
Table 8. Some main notations in the probabilistic multi-valued preference environment.
$T'$: the new training data set after quantity matching.
$|T'|$: the sample number of $T'$.
$Q$: an attribute of $T'$.
$|C_k|$: the sample number of the class $C_k$.
$|q_{iw}|$: the weight of the $w$-th preference of the root node attribute in the $i$-th sample.
$|q_{iwk}|$: the weight of the $w$-th preference of the root node attribute that belongs to the class $k$ in the $i$-th sample.
$|q^{c}_{iu}|$: the weight of the $u$-th preference of the sub node attribute $c$ in the $i$-th sample.
$|q^{c}_{iuk}|$: the weight of the $u$-th preference of the sub node attribute $c$ that belongs to the class $k$ in the $i$-th sample.
$P^{Z}_{iw}(b)'$: the preference $q_{iw}$ of the root node attribute is under branch $Z$, where there are $b$ samples with classification result $P$.
$P^{Z}_{icu}(b)'$: the preference $q_{iu}$ of the $c$-th sub node is under branch $Z$, where there are $b$ samples with classification result $P$.
The corresponding notation for the classification result $N$ covers the preference $q_{iw}$ of the root node attribute under branch $Z$, where there are $g$ samples with classification result $N$.
Therefore, we can obtain the root node and sub node bifurcation criteria in the probabilistic multi-valued preference environment. Furthermore, we need to calculate the probability of each branch in the DT, and then obtain the probability $e'_P$ that the classification result is $P$ by Eq. (17). Similarly, we can also get $e'_N$. Thus, we construct the DT model in machine learning under the probabilistic multi-valued preference environment, and the DT shows the probabilistic character. The above calculation process can be further shown by the following pseudo-code.
Input: The training data set $T$.
Step 1: Treat each attribute $Q$ in the new training data set $T'$ as a node.
Step 2: Traverse each preference q of the current attribute and divide the data into different sub nodes by preference.
Step 3: Use the bifurcation criterion to determine the optimal preference for partitioning.
Step 4: Consider the probability of each preference in the calculation process.
Step 5: Traverse all the attributes as in Step 2, select the best attribute and the best preference of that attribute, and get the final sub node.
Step 6: Continue to perform Steps 2-5 for all sub nodes, until each final sub node becomes a stop node, and obtain a DT.
Step 7: Calculate the probability of the classification result under each branch in the DT.
Output: A probabilistic DT.
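The attribute selection in Steps 2-5 can be illustrated with a minimal Python sketch. It does not reproduce the bifurcation criterion of this paper. As an assumption, it uses a classical gain-ratio criterion in which each sample contributes the probability of a preference as a fractional count. The function names, the attribute names, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def entropy(weights):
    """Shannon entropy (base 2) of a list of non-negative weights."""
    total = sum(weights)
    return -sum(w / total * math.log2(w / total) for w in weights if w > 0)


def gain_ratio(samples, attr):
    """Gain ratio of `attr`, with preference probabilities used as
    fractional counts (an assumed stand-in for the paper's criterion)."""
    class_w = defaultdict(float)                        # class -> weight
    branch_w = defaultdict(lambda: defaultdict(float))  # preference -> class -> weight
    for prefs, label in samples:
        class_w[label] += 1.0
        for pref, prob in prefs[attr].items():
            branch_w[pref][label] += prob
    total = sum(class_w.values())
    h_total = entropy(list(class_w.values()))
    # Conditional entropy: branch entropies weighted by branch mass.
    h_cond = sum(
        sum(cw.values()) / total * entropy(list(cw.values()))
        for cw in branch_w.values()
    )
    # Entropy of the split itself (how the mass spreads over preferences).
    split = entropy([sum(cw.values()) for cw in branch_w.values()])
    return (h_total - h_cond) / split if split > 0 else 0.0


def best_attribute(samples, attrs):
    """Return the attribute with the largest gain ratio."""
    return max(attrs, key=lambda a: gain_ratio(samples, a))


# Hypothetical toy data: (attribute -> {preference: probability}, class label).
samples = [
    ({"outlook": {"sunny": 1.0}, "windy": {"yes": 0.5, "no": 0.5}}, "N"),
    ({"outlook": {"sunny": 1.0}, "windy": {"yes": 0.5, "no": 0.5}}, "N"),
    ({"outlook": {"overcast": 1.0}, "windy": {"yes": 0.5, "no": 0.5}}, "P"),
    ({"outlook": {"overcast": 1.0}, "windy": {"yes": 0.5, "no": 0.5}}, "P"),
]
print(best_attribute(samples, ["outlook", "windy"]))
```

In this toy set, "outlook" separates the classes perfectly while "windy" carries no information, so "outlook" is chosen. Repeating the selection on each resulting subset corresponds to Step 6.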

An illustrated example
In this section, to show the effectiveness and feasibility of the above algorithm in the probabilistic multi-valued preference environment, we give the detailed calculation process based on an illustrated example. To present the advantages, we follow the example in Subsection 3.2 and expand the data set to 60 samples, which are shown in Table 9. The entire data set is divided into two parts: the first 50 samples are taken as the training data set, and the last 10 samples are taken as the prediction data set. In addition, each preference in a sample carries a probability, and the preference probabilities of the same attribute in the same sample sum to 1.
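The constraint that the preference probabilities of one attribute sum to 1 can be checked with a short sketch. The representation (a mapping from attribute to preference probabilities) and the attribute values are assumptions for illustration only and are not taken from Table 9.

```python
import math


def is_valid_sample(sample, tol=1e-9):
    """True if, for every attribute, the preference probabilities are
    non-negative and sum to 1 (within the tolerance `tol`)."""
    return all(
        all(p >= 0 for p in prefs.values())
        and math.isclose(sum(prefs.values()), 1.0, abs_tol=tol)
        for prefs in sample.values()
    )


# Hypothetical samples: attribute -> {preference: probability}.
good = {"A1": {"sunny": 0.6, "rain": 0.4}, "A2": {"hot": 1.0}}
bad = {"A1": {"sunny": 0.6, "rain": 0.3}}  # probabilities sum to 0.9

print(is_valid_sample(good), is_valid_sample(bad))
```

A check of this kind would be run on every sample before the tree is built, so that the fractional counts of each sample add up to one unit per attribute.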
In this example, we only show the calculation process of the root node selection. The selection of the other sub nodes can be calculated similarly. The calculation results are shown in Table 10.
The above calculation shows the selection process of the root node. From the results, we can find that the attribute $A_1$ has the largest bifurcation criterion value, so $A_1$ is selected as the optimal attribute. Specifically, the three preferences "sunny", "overcast", and "rain" in the attribute $A_1$ divide the data set $T'$ into three subsets $T'_1$, $T'_2$, and $T'_3$. When the preference contained in the branch is "overcast", all the classification results are the same and the category is $P$. In this case, there is no need to further divide this node, and the classification process for this branch ends.
Then, we calculate the bifurcation criterion of the other attributes in the two subsets $T'_1$ and $T'_3$, namely "sunny" and "rain", to find all the sub nodes. This calculation process involves the bifurcation criteria of several attributes. The bifurcation criterion values and node selection results are given in Table 10. We can find that the attribute selection order is $A_1$, $A_4$, $A_3$, and $A_2$. Based on this, we can get a DT in machine learning. At last, we use Eq. (17) to calculate the probability of each branch in the tree and get a probabilistic DT, which is shown in Figure 2.

Table 9. The probabilistic multi-valued training set containing 60 samples.

Based on Figure 2, we can obtain the probabilities of all the branches in the DT and their corresponding classification results. In Figure 2, the ellipse represents the bifurcation node, the line represents the division preference, and the rectangle denotes the probability of the classification result. The upper and lower numbers in the rectangle are the probabilities of the classification results $P$ and $N$, respectively. According to this, we can further obtain the classification results of the prediction data set.
As aforementioned, in this section, we propose a DT model in machine learning under the probabilistic multi-valued preference environment. In addition, we apply it to an illustrated example which reflects its feasibility and effectiveness.
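To make the use of a probabilistic DT on the prediction data set concrete, the following Python sketch shows one plausible way to classify a sample whose preferences carry probabilities. It is an assumption rather than the procedure behind Eq. (17): the sample is sent down every branch it has a preference for, and the leaf probabilities are mixed with the preference probabilities as weights. The tree layout, the leaf values, and the preferences of $A_4$ are hypothetical and are not the values of Figure 2.

```python
def predict(tree, sample):
    """Return {"P": ..., "N": ...} for `sample` by mixing leaf
    probabilities, weighted by the sample's preference probabilities."""
    if "attr" not in tree:  # leaf: already a {"P": ..., "N": ...} mapping
        return dict(tree)
    out = {"P": 0.0, "N": 0.0}
    for pref, prob in sample[tree["attr"]].items():
        sub = predict(tree["branches"][pref], sample)
        for label in out:
            out[label] += prob * sub[label]
    return out


# Hypothetical probabilistic DT (values are illustrative only).
tree = {
    "attr": "A1",
    "branches": {
        "overcast": {"P": 1.0, "N": 0.0},
        "rain": {"P": 0.5, "N": 0.5},
        "sunny": {
            "attr": "A4",
            "branches": {
                "yes": {"P": 0.0, "N": 1.0},
                "no": {"P": 0.5, "N": 0.5},
            },
        },
    },
}

# Hypothetical prediction sample with probabilistic preferences.
sample = {
    "A1": {"sunny": 0.5, "overcast": 0.5},
    "A4": {"yes": 0.5, "no": 0.5},
}
print(predict(tree, sample))  # {'P': 0.625, 'N': 0.375}
```

Under this assumed rule, the output probabilities always sum to 1 as long as every leaf does and every attribute's preference probabilities do.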

Conclusions
In the classification process, the data are often not unique: a single entry may take multiple values with different probabilities. It is therefore meaningful to develop an appropriate method to make a classification decision in this situation. To do this, this paper has defined the multi-valued preference environment and the probabilistic multi-valued preference environment, and then constructed two optimized machine learning methods based on the new DT models. First, in the multi-valued preference environment, a training data set with matching quantity and weight has been obtained through data preprocessing, and a DT model has been proposed that uses entropy to generate the branches and the probabilistic classifications. Then, according to the different probabilities, we have developed a DT model in the probabilistic multi-valued preference environment, and the branches and their corresponding probabilities have been similarly generated. Meanwhile, we have given two illustrated examples to show the calculation process and demonstrate the feasibility of the proposed machine learning methods. Therefore, there are three contributions of this paper: (1) This paper has proposed a data preprocessing method to match the weights and numbers in the multi-valued preference environment; (2) This paper has developed a machine learning method and a DT algorithm in the multi-valued preference environment to obtain the corresponding probability classification; (3) This paper has constructed a machine learning method and a DT algorithm in the probabilistic multi-valued preference environment, and the corresponding probability classification has been obtained.
The machine learning methods and the DT algorithms proposed in this paper for the multi-valued preference and probabilistic multi-valued preference environments have practical significance and can provide decision makers with reasonable classification suggestions. However, these methods also have some limitations in data preprocessing. For example, they are only suitable for small samples. Addressing these limitations is the focus of our future research.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by the Natural Science Foundation of China (Nos. 72071176 and 71840001).