Determine OWA operator weights using kernel density estimation

Abstract Some subjective methods divide input values into local clusters before determining the ordered weighted averaging (OWA) operator weights from the data distribution characteristics of the input values. However, clustering the input values is complex. In this paper, a novel probability density based OWA (PDOWA) operator is put forward based on the data distribution characteristics of the input values. To capture the local cluster structures of the input values, kernel density estimation (KDE) is used to estimate the probability density function (PDF) that fits the input values. The derived PDF carries the density information of the input values, which reflects their importance: input values with high probability densities (PDs) should be assigned large weights, while those with low PDs should be assigned small weights. Afterwards, the desirable properties of the proposed PDOWA operator are investigated. Finally, the proposed PDOWA operator is applied to a multicriteria decision making problem concerning the evaluation of smart phones, and it is compared with some existing OWA operators. The comparative analysis shows that the proposed PDOWA operator is simpler and more efficient than the existing OWA operators.


Introduction
During multi-criteria decision making (MCDM) processes (Garg, 2018a, 2018b), various decision methods can be chosen to handle the criteria evaluation information of alternatives for ranking them (He et al., 2019a, 2019b; Lin et al., 2019; Zeng et al., 2019). Aggregation operators are considered a simple yet efficient MCDM method (Garg, 2018c; Garg & Kaur, 2020), which aggregates the criteria evaluation information of alternatives into overall criteria values (Roy et al., 2019; Tang et al., 2019). According to the overall criteria values of the alternatives, all the alternatives can be ranked and the optimal one obtained (Mi et al., 2019). To improve the ranking results of MCDM problems, a variety of aggregation operators have been proposed (Kang et al., 2018; Liu et al., 2020; Riaz & Tehrim, 2019), such as the weighted averaging operator, the weighted geometric operator, and the ordered weighted averaging (OWA) operator (Yager, 1988). The OWA operator, proposed by Yager (1988), is a parameterized class of mean-type aggregation operators. Once the OWA operator weights are determined, special aggregation operators are obtained, such as the max, arithmetic average, and min operators (Amarante, 2018; Gong et al., 2019; Jin et al., 2019a; Merigó & Yager, 2019; Yager, 2019). Since it appeared, it has attracted much attention from a large number of well-known researchers (Beliakov et al., 2018; Leite & Skrjanc, 2019; Mesiar et al., 2018). Due to its practicality, it has been widely applied in various fields such as machine learning (Maldonado et al., 2018), EEG signal improvement (Pander, 2019), time series data fusion (Liu & Xiao, 2019), and risk assessment (Ma & Cong, 2019). The implementation of the OWA operator consists of three steps: (1) The input values are rearranged in descending order.
(2) The weights of the rearranged input values are determined using an efficient method. (3) According to the derived weights, the rearranged input values are aggregated into a single value (Yager, 1988). How to determine the weights of the rearranged input values is important for the OWA operator, since using different weight information to aggregate the rearranged input values may yield different ranking results (Casanovas et al., 2020; Jin et al., 2019b). Deriving the weights of the OWA operator has become a hot research topic in decision making analysis, and a large number of methods have been put forward, such as constrained optimization models (Filev & Yager, 1995; Fuller & Majlender, 2001), quantifier functions (Yager, 1999; Yager & Filev, 1994), and distribution assumptions (Lenormand, 2018; Sadiq & Tesfamariam, 2007; Sha et al., 2019; Xu, 2005). These methods determine the weights of the OWA operator in an objective way; they do not consider the complex data distribution characteristics of the input values. The distribution assumption methods assume that the OWA operator weights follow one of several commonly used probability density functions. Although these methods have solid theoretical foundations and desirable properties, the ideal hypothesis is unrealistic, since the real OWA operator weights usually do not fit these probability density functions in practical situations.
Based on the data distribution characteristics of input values, several methods (Boongoen & Shen, 2008; Li et al., 2016; Xu, 2006; Yager, 1993) have been devised. The argument-dependent method (Xu, 2006; Yager, 1993) assigns small weights to input values far from the average value and large weights to input values close to it. It treats all the input values as a single cluster, whose average value alone is used to derive the OWA operator weights. However, in practical cases, the input values show complex data distribution characteristics, with possibly two or more local clusters. To identify these local clusters, an agglomerative hierarchical clustering method was modified by Boongoen and Shen (2008) to partition the input values into multiple local clusters. A clus-DOWA operator was then proposed to derive the weights of the input values based on their distances to their nearest local clusters. Using the concept of majority clusters, a majority clusters DOWA (MC-DOWA) operator was proposed by Li et al. (2016). It computes the weights based on the ratio of the number of input values in each local cluster to the number of all the input values. The clus-DOWA and MC-DOWA operators are capable of generating relatively reasonable ranking results by considering the complex data distribution characteristics of the input values, but they have high time complexity, since classification methods must first be adopted to partition the input values into local clusters before calculating the weights. Moreover, in the MC-DOWA operator, different input values within the same local cluster are assigned equal weights, which is unreasonable.
To overcome these drawbacks, a novel probability density method is proposed to derive the weights of the OWA operator. A powerful mathematical tool, kernel density estimation (KDE), is introduced to estimate the probability density function (PDF) fitting all the input values. KDE can effectively capture the complex data distribution characteristics of the input values without using complicated classification methods. According to the derived PDF, input values with high probability densities are assigned large weights, while input values with low probability densities are assigned small weights. Based on this probability density method, a novel probability density based OWA (PDOWA) operator is proposed in this paper and its desirable properties are investigated. An application is also developed to implement the processes of reordering, weighting, and aggregating the input values of the proposed PDOWA operator. Finally, a practical MCDM example of evaluating smart phones is provided to show the application of the proposed PDOWA operator to the MCDM problem, and a comparative analysis is given.
The rest of this paper is organized as follows: Some basic knowledge of the OWA operator is provided in Section 2. Section 3 proposes a novel probability density method to determine the OWA weights. Section 4 presents a novel probability density based OWA (PDOWA) operator and develops a smart application for it. In Section 5, an illustrative MCDM example concerning the evaluation of smart phones is provided, followed by a comparative analysis. Finally, conclusions are drawn in Section 6.
Preliminaries

An OWA operator of dimension $n$ is a mapping with an associated weight vector $W = (w_1, w_2, \ldots, w_n)$, where $w_j \in [0,1]$ and $\sum_{j=1}^{n} w_j = 1$, that aggregates the input values after rearranging them in descending order (Yager, 1988). Two measures characterize an OWA operator. The orness measure is defined as

$$\mathrm{orness}(W) = \frac{1}{n-1}\sum_{j=1}^{n}(n-j)\,w_j. \qquad (1)$$

It can be noted that the value of the orness measure of the OWA operator falls in the unit interval $[0,1]$, and the orness measure actually characterizes the similarity of the OWA operator to the max operator.

The latter measure, also called entropy, is defined as

$$E(W) = -\sum_{j=1}^{n} w_j \ln w_j, \qquad (2)$$

where $E(W)$ denotes the dispersion of the OWA operator. It characterizes how uniformly the input values are being used.
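The two measures above can be sketched directly in Python; this is a minimal sketch, and the weight vectors in the usage lines are illustrative:

```python
import math

def orness(weights):
    """Orness measure: (1/(n-1)) * sum_{j=1}^{n} (n-j) * w_j."""
    n = len(weights)
    return sum((n - j) * w for j, w in enumerate(weights, start=1)) / (n - 1)

def dispersion(weights):
    """Dispersion (entropy): -sum_j w_j * ln(w_j), with 0 * ln(0) taken as 0."""
    return -sum(w * math.log(w) for w in weights if w > 0)

# The max operator W = [1, 0, ..., 0] has orness 1; equal weights give orness 1/2
# and the maximal dispersion ln(n).
print(orness([1.0, 0.0, 0.0, 0.0]))  # → 1.0
print(orness([0.25] * 4))            # → 0.5
print(dispersion([0.25] * 4))        # ≈ ln(4)
```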

Probability density method
As demonstrated in Figure 1, the implementation process of the proposed probability density based OWA (PDOWA) operator works as follows:
1. A given set of input values $\{I_1, I_2, \ldots, I_n\}$ is reordered in descending order as $\{m_1, m_2, \ldots, m_n\}$;
2. The mathematical tool of kernel density estimation (KDE) is used to estimate a probability density function (PDF) that describes how the input values are distributed;
3. The estimated probability density function is used to compute the OWA operator weights;
4. The reordered input values, associated with their weights, are aggregated into a single value.
The key to the PDOWA operator is how to identify the data distribution characteristics of the input values. In the OWA operator studied by Xu (2005), the input values are supposed to be independent and identically distributed according to the normal distribution. Nevertheless, in real situations, the input values are often distributed irregularly and do not follow the ideal normal distribution, so it is unreasonable to assume that they do. To avoid this assumption, a novel probability density method is proposed, which uses kernel density estimation to estimate the probability density function that describes how the input values are distributed. As a nonparametric method, kernel density estimation can estimate the probability density function of the given input values without prior knowledge about their data distribution characteristics. Based on kernel density estimation, the probability density function of the reordered input values $\{m_1, m_2, \ldots, m_n\}$ can be estimated as

$$\hat{p}(v) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{v - m_i}{h}\right), \qquad (3)$$

where $\hat{p}(v)$ denotes the estimated probability density function of the random variable $v$, $n$ denotes the number of input values, $h$ is a smoothing parameter, also called the bandwidth, and $K(\cdot)$ is a kernel. The kernel $K(\cdot)$ has the following three features: (1) $K(x)$ is symmetric; (2) $\int_{-\infty}^{\infty} K(x)\,dx = 1$; and (3) $K(x) \geq 0$ for all $x$. As depicted in Figure 2, some classical kernel functions are the normal (Gaussian), uniform, and Epanechnikov kernels. Because of its desirable mathematical properties, the Gaussian kernel is chosen in this paper, which is defined as

$$K(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}.$$

From Equation (3), it can be seen that the accuracy of the estimated probability density function depends on the smoothing parameter $h$ and the kernel function $K(\cdot)$ when the number of input values is large enough.
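Equation (3) with the Gaussian kernel can be sketched as follows; the query point and the bandwidth h = 1.4 are illustrative choices, not values fixed by the method:

```python
import math

def gaussian_kernel(x):
    """Gaussian kernel K(x) = exp(-x^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def kde(v, samples, h):
    """Kernel density estimate: p_hat(v) = (1/(n*h)) * sum_i K((v - m_i) / h)."""
    n = len(samples)
    return sum(gaussian_kernel((v - m) / h) for m in samples) / (n * h)

# Density of Example 1's preferences evaluated at an arbitrary query point.
prefs = [8.0, 9.8, 5.5, 9.5, 2.8, 8.6, 3.2]
print(round(kde(9.0, prefs, 1.4), 4))
```

A query point inside the dense upper cluster (around 8-10) yields a noticeably higher density than one near the isolated value 5.5, which is exactly the signal the weighting scheme exploits.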
Through statistical experiments, Epanechnikov (1969) and Scott (1992) found that, for a fixed smoothing parameter, the type of kernel function has only a slight influence on the accuracy, whereas different values of the smoothing parameter have a great impact. The smoothing parameter $h$ governs the smoothness of the estimated probability density function: large values generate oversmoothed estimates, while small values generate undersmoothed ones. Hence, choosing an appropriate value for the smoothing parameter is crucial to kernel density estimation. Many methods have been put forward to determine an optimal value, such as Scott's rule (Scott, 1992), Silverman's rule (Silverman, 1986), and cross validation (Rudemo, 1982). Scott's rule and Silverman's rule assume that the underlying distribution of the input values is unimodal and normal, even though the real distribution may be multimodal and non-normal. Thus, the derived optimal value can be expected to be too large for multimodal distributions, producing oversmoothed probability density functions. Cross validation is an empirical method that produces more trustworthy optimal values regardless of the underlying distribution characteristics of the input values. All of these methods have been implemented in Python: Scott's rule and Silverman's rule are implemented in the SciPy module, and cross validation is implemented in the Statsmodels KDEMultivariate class.
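A minimal sketch of how these rules are accessed in practice, using the preferences of Example 1 as data; the statsmodels cross-validation call is shown only as a comment since it requires an extra dependency:

```python
import numpy as np
from scipy.stats import gaussian_kde

prefs = np.array([8.0, 9.8, 5.5, 9.5, 2.8, 8.6, 3.2])

# SciPy's gaussian_kde implements Scott's and Silverman's rules directly;
# the effective bandwidth is kde.factor times the sample standard deviation.
for rule in ("scott", "silverman"):
    estimator = gaussian_kde(prefs, bw_method=rule)
    print(rule, round(estimator.factor * prefs.std(ddof=1), 3))

# Cross validation is available in statsmodels, e.g. (not run here):
#   from statsmodels.nonparametric.kernel_density import KDEMultivariate
#   kde_cv = KDEMultivariate(prefs, var_type="c", bw="cv_ml")
#   print(kde_cv.bw)
```

For one-dimensional data, SciPy's Scott factor is $n^{-1/5}$ and the Silverman factor is $(3n/4)^{-1/5}$, so both shrink slowly as more input values arrive.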
Example 1. Suppose that seven experts are invited to provide their preferences for a cloud storage service product with respect to one of its criteria. The collected preferences are $I_1 = 8.0$, $I_2 = 9.8$, $I_3 = 5.5$, $I_4 = 9.5$, $I_5 = 2.8$, $I_6 = 8.6$, $I_7 = 3.2$. These preferences are reordered in descending order as $\{9.8, 9.5, 8.6, 8.0, 5.5, 3.2, 2.8\}$. The probability density functions estimated with different bandwidth selection methods are depicted in Figures 3-5.
As shown in Figures 3 and 4, the estimated probability density functions are oversmoothed and incapable of identifying the local cluster structures of the input values when Scott's rule and Silverman's rule are used to determine the optimal value of the smoothing parameter. As shown in Figure 5, when cross validation is used, two local cluster structures can be identified in the estimated probability density function. Hence, the optimal value derived from the cross validation method is more trustworthy.
We can also estimate the desirable probability density function by manually adjusting the value of the smoothing parameter. As the value of the smoothing parameter varies, various estimated probability density functions can be generated as depicted in Figure 6.
As shown in Figure 6, the probability density function shown as the blue curve is undersmoothed, since it fits "biased" data when the smoothing parameter is $h = 0.6$. The probability density function shown as the black curve is oversmoothed, since it is incapable of identifying the underlying local cluster structures of the input values when $h = 2.0$. The probability density function shown as the green curve is considered optimally smoothed, since it approximates the real data distribution characteristics of the input values when $h = 1.4$. The above analysis shows that the estimated probability density function can not only capture the local densities of the given input values but also identify their local cluster structures without using time-consuming classification methods. Input values with high probability densities have large importance, and input values with low probability densities have small importance. Following this rule, input values with high probability densities should be assigned large weights, and input values with low probability densities should be assigned relatively small weights. Therefore, based on the estimated probability density function, the weights of the OWA operator can be derived as

$$w_j = \frac{\hat{p}(v_j)}{\sum_{k=1}^{n} \hat{p}(v_k)}, \qquad (4)$$

where $W = [w_1, w_2, \ldots, w_j, \ldots, w_n]$ denotes the weight vector of the OWA operator, which satisfies $w_j \in [0,1]$ and $\sum_{j=1}^{n} w_j = 1$. Let $d(v_j) = \hat{p}(v_j)$ denote the probability density (PD) of the input value $v_j$ relative to the other input values; then we can obtain the following theorem.
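Equation (4) can be sketched as follows; the data are Example 1's reordered preferences, and the bandwidth h = 1.4 follows the discussion of Figure 6:

```python
import math

def gaussian_kernel(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def kde(v, samples, h):
    return sum(gaussian_kernel((v - m) / h) for m in samples) / (len(samples) * h)

def pdowa_weights(ordered_values, h):
    """Equation (4): w_j = p_hat(v_j) / sum_k p_hat(v_k)."""
    densities = [kde(v, ordered_values, h) for v in ordered_values]
    total = sum(densities)
    return [d / total for d in densities]

# Example 1's preferences, already reordered in descending order.
ordered = [9.8, 9.5, 8.6, 8.0, 5.5, 3.2, 2.8]
w = pdowa_weights(ordered, h=1.4)
print([round(x, 3) for x in w])
print(sum(w))  # the normalized weights sum to 1
```

Note how the isolated value 5.5, far from both local clusters, receives the smallest weight, exactly as the rule above prescribes.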
Theorem 1. Given a set of input values $\{I_1, I_2, \ldots, I_n\}$, let $\{m_1, m_2, \ldots, m_n\}$ be its descending ordered set. If the probability densities of the input values $v_j$ and $v_k$ satisfy $d(v_j) \geq d(v_k)$, then $w(v_j) \geq w(v_k)$.

Proof. According to Equation (4), we have

$$w(v_j) - w(v_k) = \frac{d(v_j) - d(v_k)}{\sum_{i=1}^{n} d(v_i)}.$$

Since $d(v_j) \geq d(v_k)$, it follows that $w(v_j) - w(v_k) \geq 0$. This completes the proof of Theorem 1.

It can be observed that the OWA operator weights of the descending ordered input values depend on their probability densities.

Probability density based OWA operator
Based on the above weight determining method, the PDOWA operator is defined as

$$\mathrm{PDOWA}(I_1, I_2, \ldots, I_n) = \sum_{j=1}^{n} w_j m_j, \qquad (5)$$

where $\{m_1, m_2, \ldots, m_n\}$ is the descending ordered set of the input values and the weights $w_j$ are given by Equation (4). According to Equations (1) and (2), the orness and dispersion of the PDOWA operator can be computed; when $I_1 = I_2 = \cdots = I_n$, we have $O(W) = \frac{1}{2}$ and $E(W) = \ln n$. In the following, some properties of the proposed PDOWA operator are discussed.
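A minimal end-to-end sketch of the PDOWA operator combining Equations (3)-(5); the equal-input usage line illustrates the special case noted above:

```python
import math

def gaussian_kernel(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def pdowa(inputs, h):
    """PDOWA: reorder descending, weight by normalized KDE density (Eq. (4)),
    then aggregate with the weighted sum of Equation (5)."""
    ordered = sorted(inputs, reverse=True)
    n = len(ordered)
    dens = [sum(gaussian_kernel((v - m) / h) for m in ordered) / (n * h)
            for v in ordered]
    total = sum(dens)
    weights = [d / total for d in dens]
    value = sum(w * m for w, m in zip(weights, ordered))
    return value, weights

# Equal inputs give equal densities, hence equal weights 1/n, so the
# aggregated value is the common input, orness = 1/2, dispersion = ln n.
value, weights = pdowa([5.0, 5.0, 5.0, 5.0], h=1.0)
print(value, weights)
```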
Property 1 (Boundedness). Given a set of input values $\{I_1, I_2, \ldots, I_n\}$, we have
$$\min(I_1, I_2, \ldots, I_n) \leq \mathrm{PDOWA}(I_1, I_2, \ldots, I_n) \leq \max(I_1, I_2, \ldots, I_n).$$

Proof. According to Equation (5), the weights are nonnegative and sum to one, so the aggregated value is a convex combination of the input values and therefore lies between their minimum and maximum.

Property 2 (Commutativity). If $\tilde{I}_1, \tilde{I}_2, \ldots, \tilde{I}_n$ is any permutation of $I_1, I_2, \ldots, I_n$, then
$$\mathrm{PDOWA}(I_1, I_2, \ldots, I_n) = \mathrm{PDOWA}(\tilde{I}_1, \tilde{I}_2, \ldots, \tilde{I}_n).$$

Proof. If $\tilde{I}_1, \tilde{I}_2, \ldots, \tilde{I}_n$ is any permutation of the given input values $I_1, I_2, \ldots, I_n$, then their descending ordered sets are equal, namely $\{v_1, v_2, \ldots, v_n\}$. Therefore, by Equation (4), the derived weights are equal as well, and the aggregated values coincide.

Example 2. The online reviews of a brand of car are collected as $\{2.7, 3.2, 3.6, 8.2, 8.5, 5.4, 7.6, 3.3\}$. When the proposed PDOWA operator is used, the implementation proceeds as follows: 1. The reviews are reordered in descending order as $\{8.5, 8.2, 7.6, 5.4, 3.6, 3.3, 3.2, 2.7\}$; 2. The probability density function of the online reviews is estimated as depicted in Figure 7; 3. The estimated probability density function is used to compute the weights of the PDOWA operator.

To automatically perform the implementation processes of the PDOWA operator, a smart application is developed using Python. Tkinter, the standard GUI (graphical user interface) library in Python, is adopted to develop the GUI for this application. As shown in Figure 8, after running the application, users can enter a series of input values separated by commas and specify a value for the smoothing parameter $h$. When the "submit" button is clicked, the processes of reordering, weighting, and aggregating the input values are performed automatically, along with the estimation of the probability density function.

Illustrative example and comparative analysis
In this section, an illustrative example is presented to show the application of the proposed PDOWA operator to an MCDM problem, followed by a comparative analysis.

Illustrative example
In this subsection, the proposed PDOWA operator is applied to aggregate the evaluation information of multiple experts. Since experts usually cannot reach a consensus, their evaluation information shows complex data distribution characteristics in terms of local clusters.
Example 3. An organization intends to purchase a batch of smart phones. After screening, four alternatives $\{a_1, a_2, a_3, a_4\}$ are selected for further evaluation according to four criteria: price ($c_1$), performance ($c_2$), battery life ($c_3$), and after-sale service ($c_4$). The weight values of these four criteria are $(0.3, 0.3, 0.2, 0.2)$. Eight experts $\{e_1, e_2, \ldots, e_8\}$ are invited to evaluate the four alternatives with respect to the four criteria, and the collected evaluation information is used to construct eight decision matrices, as shown in Tables 1-8.
In the following part, we show the application of the proposed PDOWA operator in this group MCDM problem.
First, for each expert, the criteria evaluation values of each alternative are aggregated by the weighted sum

$$I_i^k = \sum_{j=1}^{4} \omega_j\, r_{ij}^k,$$

where $r_{ij}^k$ is the evaluation information of alternative $a_i$ with respect to criterion $c_j$ provided by the $k$th expert, $\omega_j$ denotes the weight value of criterion $c_j$, and $I_i^k$ denotes the aggregated criteria value of alternative $a_i$ in the decision matrix provided by the $k$th expert.
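Assuming the weighted-sum aggregation described above, the per-expert step can be sketched as follows; the matrix row is hypothetical, not taken from Tables 1-8:

```python
criteria_weights = [0.3, 0.3, 0.2, 0.2]  # (c1, c2, c3, c4) from Example 3

def aggregate_alternative(row, weights):
    """I_i^k = sum_j omega_j * r_ij^k for one alternative in one expert's matrix."""
    return sum(w * r for w, r in zip(weights, row))

# Hypothetical evaluations of alternative a1 by expert e1 (not Table 1's data):
r_1 = [7.0, 8.0, 6.0, 9.0]
print(aggregate_alternative(r_1, criteria_weights))
```

Repeating this for all eight experts yields, per alternative, the eight aggregated criteria values that the PDOWA operator then reorders, weights, and fuses.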

Comparative analysis
To verify the superiority of our proposed PDOWA operator, we also use the normal distribution based OWA operator (Xu, 2005), clus-DOWA operator (Boongoen & Shen, 2008), and MC-DOWA operator (Li et al., 2016) to handle the above MCDM problem.
(1) When the normal distribution based OWA (NDOWA) operator is applied to Example 3, the OWA weights of the aggregated criteria values of the four alternatives are computed accordingly, and the ranking result of the four alternatives is $a_1 \succ a_4 \succ a_3 \succ a_2$. (2) When the clus-DOWA operator is applied to Example 3, the modified agglomerative hierarchical clustering algorithm must first be used to divide the aggregated criteria values of the alternatives into clusters, as shown in Table 11.
Then, the OWA weights of the aggregated criteria values of the four alternatives are computed, and the ranking result is $a_1 \succ a_4 \succ a_3 \succ a_2$. (3) When the MC-DOWA operator is used to handle Example 3, a classification method must first be used to group the aggregated criteria values of the four alternatives into clusters. Here, the k-means method is used to divide the aggregated criteria values into clusters, as shown in Table 12. The ranking results obtained from the above four operators are summarized in Table 13.
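For reference, the normal-distribution-based weighting of step (1) can be sketched as commonly stated for Xu's (2005) method; this is a sketch of the general formula, not the exact weights reported in the comparison:

```python
import math

def ndowa_weights(n):
    """Normal-distribution-based OWA weights (Xu, 2005):
    w_j ∝ exp(-(j - mu)^2 / (2 * sigma^2)), with mu = (n + 1) / 2 and
    sigma = sqrt((1/n) * sum_i (i - mu)^2), then normalized to sum to 1."""
    mu = (n + 1) / 2
    sigma = math.sqrt(sum((i - mu) ** 2 for i in range(1, n + 1)) / n)
    raw = [math.exp(-((j - mu) ** 2) / (2 * sigma ** 2)) for j in range(1, n + 1)]
    total = sum(raw)
    return [r / total for r in raw]

# For the eight aggregated criteria values per alternative in Example 3:
w = ndowa_weights(8)
print([round(x, 4) for x in w])
```

The weights are symmetric and peak in the middle positions regardless of the data, which is exactly why this operator cannot reflect local cluster structures.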
From Table 13, it can be noted that the proposed PDOWA operator identifies the same best alternative as the NDOWA, clus-DOWA, and MC-DOWA operators, which shows the effectiveness of the proposed PDOWA operator. However, the complete ranking obtained from the proposed PDOWA operator differs from those obtained with the other operators:

Table 13. Ranking results of the four operators.
The proposed PDOWA: $a_1 \succ a_4 \succ a_2 \succ a_3$
NDOWA (Xu, 2005): $a_1 \succ a_4 \succ a_3 \succ a_2$
clus-DOWA (Boongoen & Shen, 2008): $a_1 \succ a_4 \succ a_3 \succ a_2$
MC-DOWA (Li et al., 2016): $a_1 \succ a_3 \succ a_4 \succ a_2$
Source: The authors' data.

From the above analysis, the advantages of the proposed PDOWA operator are summarized as follows: 1. The clus-DOWA and MC-DOWA operators use classification methods to divide the aggregated criteria values into clusters before deriving the OWA operator weights. However, the process of clustering the aggregated criteria values is
time-consuming. Moreover, additional parameters must be specified during the clustering process. For example, if the k-means method is used to cluster the aggregated criteria values, determining the value of the parameter k is a key problem, since it influences the clustering result. The NDOWA operator uses the probability density function of the normal distribution to determine the OWA weights. Since that probability density function depends only on the number of aggregated criteria values, it is incapable of identifying their local cluster structures. The proposed PDOWA operator is capable of identifying the underlying local cluster structures among the aggregated criteria values without using time-consuming classification methods. Therefore, the proposed PDOWA operator is simple but efficient. 2. The clustering results generated by the modified agglomerative hierarchical clustering algorithm in the clus-DOWA operator may be wrong, which can lead to unreasonable OWA weights and ranking results. The OWA weights obtained by the MC-DOWA operator are also unreasonable, since the aggregated criteria values in each cluster are assigned equal weights. The proposed PDOWA operator computes the OWA weights of the aggregated criteria values according to their probability densities. Hence, the OWA weights and ranking results obtained from the proposed PDOWA operator are more reasonable.

Conclusions
In this paper, a novel PDOWA operator is proposed to determine the OWA weights by considering the data distribution characteristics of the input values. Kernel density estimation is first applied to estimate the PDF that fits the input values. Using the estimated PDF, the weights of the OWA operator are derived. Afterwards, some desirable properties are discussed. Finally, a practical example concerning the evaluation of smart phones is provided to show the application of the proposed PDOWA operator to MCDM problems. The proposed PDOWA operator is capable of identifying the underlying local cluster structures among the input values without using complicated classification methods. It is simple but efficient. Moreover, its OWA weights and ranking results are more reasonable than those of the existing OWA operators. However, it also has limitations: (1) it does not consider the subjective weights of the input values during aggregation; (2) it cannot be used to derive the OWA weights for interval-valued input values.
In future studies, we will extend the OWA operator to process rough sets (Sharma et al., 2020) and explore its applications in multicriteria evaluation (Roy et al., 2019).

Disclosure statement
The authors declare that they have no conflict of interest.