Hybridizing association rules with adaptive weighted decision fusion for personal credit assessment

ABSTRACT Credit risk has become one of the major challenges facing the banking industry in modern financial markets. As a typical method, association classification (AC) has been widely used for personal credit risk assessment. It focuses on the relationship between an item and a class based on mined association rules, where three measures, i.e. support, confidence and weighted Chi-square, are generally used to generate association rules. However, most existing approaches neglect the differences in discrimination power between items, between measures and between rules. To deal with this problem, we present a novel approach that hybridizes association rules with adaptive weighted decision fusion (HAR-AWDF) for personal credit assessment. Here, measures and rules both work as classifiers: adaptive weightings are assigned to items via information entropies computed from the posterior probabilities of individual items, and to measures and rules according to their classification performance. In particular, a threshold scheme is proposed to determine whether a rule is effective, so that the final decision is made via weighted voting in classification. The experimental results obtained by running our approach on the German Credit Data set show that, in terms of classification accuracy, among all items of the applied credit data, property, savings account and credit history are vital to evaluating personal credit state.


Introduction
With the rapid development of inclusive finance, especially online lending, consumer finance and related industries, the personal credit industry has vast demand and potential and has become a focus of the credit industry. It is therefore necessary to build a reliable personal credit evaluation system to differentiate the credit state of a customer (Lv, Li, & Zhang, 2017). Against the background of big data (Jones, Johnstone, & Wilson, 2015), several data mining techniques, including clustering (Huang, Hung, & Jiau, 2006), classification (Jurgovsky et al., 2018), association rules (Ma & Cheng, 2016) and prediction (García, Marqués, & Sánchez, 2019), have been applied to improve personal credit rating mechanisms.
Association rules are especially suitable for market-basket datasets, as they reflect the interdependence and correlation between one thing and others. They are used to mine correlations between valuable data items from a large amount of data (Jan, 2015). The AC technique (Wanaskar, Vij, & Mukhopadhyay, 2013; Yue & Shi, 2017) employs association rules in the classification process to increase classification accuracy, where the association rules are generally mined by Apriori-based algorithms (Apilettia et al., 2017; Jorge, Marcio, & Mario, 2018; Xie et al., 2019) or their improvements, e.g. FP-growth-based algorithms (Pei, Wang, & Wang, 2016). These algorithms aim to find all itemsets and association rules whose support, confidence or weighted Chi-square are respectively greater than the minimum counterparts specified by users according to the applied database. However, they indicate neither the significance differences between candidate items nor the classification performance differences between the derived rules. In addition, three measures, namely support, confidence and weighted Chi-square, are linked to a selected rule, which raises a further problem: how to integrate the decisions from the three measures in the classification process. To handle the above problems, we propose the HAR-AWDF approach to evaluate personal credit in this work. Adaptive weightings are first assigned in our assessment system according to the entropies calculated from the posterior probabilities of the different items. The adaptive weightings of the three classical measures are subsequently computed from their classification performance, and the fusion measurement for each candidate itemset is obtained as the weighted sum of the three measures.

CONTACT Yue Zhang zhangyue@ahpu.edu.cn
The critical itemsets and rules can be then found, as the deviation between fusion measurements of two rules associated with the same itemset exceeds the given threshold value. Finally, the adaptive weightings of critical itemsets are computed based on their classification performance, and the class label for an instance is obtained by means of weighted voting.
The rest of this paper is organized as follows. Section 2 gives the background knowledge of this study; Section 3 details the HAR-AWDF approach; Section 4 reports the experimental results; and the final section concludes the work.

Association Classification (AC)
The traditional AC technique (Hadi, Aburub, & Alhawari, 2016) combines association rule mining with the resultant classification, and has played an important role in the decision-making process in many previous applications. Association rules aim to discover correlations or associations between items, while association-rule-based classification is conducted for label prediction. The derived rules whose support and confidence are no less than the given thresholds can mine strongly relevant data, i.e. frequent itemsets, from the data set.
Let T be a transactional database, defined as a set of transactions {t_1, …, t_n}, and X be a set of different items (symbols or item values) {x_1, …, x_n, c_1, c_2}. An association rule for classification is an implication of the form A → c, where A ⊆ X and c represents a class value belonging to {c_1, c_2}. For example, the interpretation of a rule A → c_1 is that if the itemset A appears in a transaction, then the transaction is categorized as class one.
In the phase of determining the interestingness of a rule, support (Sup), confidence (Conf) and weighted Chi-square (Wch) are the most widely used measures, which can be described as:

(1) Support:
Sup(x_1 · · · x_k → c) = σ(x_1 · · · x_k, c) / |T|

(2) Confidence:
Conf(x_1 · · · x_k → c) = σ(x_1 · · · x_k, c) / σ(x_1 · · · x_k)

(3) Weighted Chi-square:
Wch(x_1 · · · x_k → c) = χ²(x_1 · · · x_k, c)² / maxχ²(x_1 · · · x_k, c)

where σ(·) denotes the number of transactions containing the item x or the itemset A = {x_1 · · · x_k} in the transaction dataset, |T| represents the size of the transactional database T, χ² is the Chi-square statistic of the rule, and maxχ² is its upper bound (as in CMAR).
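The two counting-based measures can be sketched as follows; this is a minimal illustration assuming a transaction is stored as a frozenset of items together with its class label, and all function names are ours, not the authors':

```python
# Illustrative sketch of Sup and Conf over a transaction database.
# A transaction is (itemset, class_label); names are assumptions.
from typing import FrozenSet, List, Tuple

Transaction = Tuple[FrozenSet[str], str]

def sigma(itemset: FrozenSet[str], db: List[Transaction]) -> int:
    """sigma(A): number of transactions containing every item of A."""
    return sum(1 for items, _ in db if itemset <= items)

def support(itemset: FrozenSet[str], label: str, db: List[Transaction]) -> float:
    """Sup(A -> c) = sigma(A together with c) / |T|."""
    hits = sum(1 for items, c in db if itemset <= items and c == label)
    return hits / len(db)

def confidence(itemset: FrozenSet[str], label: str, db: List[Transaction]) -> float:
    """Conf(A -> c) = sigma(A together with c) / sigma(A)."""
    denom = sigma(itemset, db)
    hits = sum(1 for items, c in db if itemset <= items and c == label)
    return hits / denom if denom else 0.0
```

For instance, in a four-transaction database where item a appears three times and twice with class 1, Sup(a → 1) = 2/4 and Conf(a → 1) = 2/3.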

Shannon entropy
Shannon entropy (Xiong & Shang, 2016), also known as information entropy, reflects the relationship between the amount of information and the uncertainty of a piece of information. Given an item x, the Shannon entropy is computed by

H(x) = −Σ_{j=1}^{N} p_j log p_j,   (5)

where p_j represents the posterior probability of class j given the item x, and N denotes the number of possible categories. The entropy H(x) measures the information content of the item x: the larger the entropy, the smaller the amount of information carried by the item, and the weaker the discriminative power of the item for classifying an instance with item value x. Using H(x), the item x is assigned the weight W(x) with

W(x) = (1 − H(x)) / Σ_{x'} (1 − H(x')),   (6)

so that a more discriminative (lower-entropy) item receives a larger weight.
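The entropy-based weighting can be sketched as below. This is a hedged illustration: the `(1 - H)` normalization is one plausible reading of the weight formula for the binary-class case, and the function names are ours.

```python
# Illustrative sketch of item weighting via Shannon entropy.
# The normalization (1 - H, renormalized over items) is an assumption
# suited to the binary-class case, where H is at most 1 bit.
import math

def entropy(posteriors):
    """H(x) = -sum_j p_j * log2(p_j), skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in posteriors if p > 0)

def item_weights(posteriors_by_item):
    """Map each item x to (1 - H(x)) / sum over items of (1 - H)."""
    raw = {x: 1.0 - entropy(p) for x, p in posteriors_by_item.items()}
    total = sum(raw.values())
    return {x: v / total for x, v in raw.items()}
```

A perfectly informative item (posteriors [1, 0]) has zero entropy and so takes all of the weight relative to a completely uninformative one (posteriors [0.5, 0.5]).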

Multi-classifier fusion
Motivated by the fact that individual classifiers perform differently on the same task, the technique in Kannatey-Asibu, Yum, and Kim (2017) can be used to enhance the performance of a classifier ensemble by reducing the classification errors of individual classifiers. Let L be the matrix

L = [l_ij]_{n×m},   (7)

consisting of the class patterns (class labels) determined by the individual classifiers, where l_ij, N, m and n denote the class label assigned to observation j by classifier i, the number of categories, the total number of samples, and the number of classifiers, respectively. The weight w_i of classifier i, indicating its classification performance over the m observations, is calculated as

w_i = (1/m) Σ_{j=1}^{m} g(l_ij, y_j),   (8)

where y_j is the true class of observation j (j = 1, 2, 3, …, m), and

g(l_ij, y_j) = 1 if l_ij = y_j, and 0 otherwise.   (9)

For making the final decision on label prediction, the voting matrix V is defined as

V = [v(l_1)^T, …, v(l_n)^T]^T,   (10)

where l_i is the label predicted by classifier i for the test sample and v(i) is the ith row of the identity matrix I, whose rank is determined by the number of classes. The final decision is then obtained from the dot product of the weighting vector W = [w_1, …, w_n] and the voting matrix V:

c* = argmax_j W · V(j),   (11)

where V(j) denotes the jth column of the voting matrix V.
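The fusion scheme above can be sketched compactly: each classifier's weight is its accuracy on m labelled observations, and a test sample's class is the argmax of the weighted one-hot votes. Function names and the list-based layout are illustrative assumptions.

```python
# Illustrative sketch of accuracy-weighted majority voting.
def classifier_weights(L, y):
    """L[i][j] is classifier i's label for observation j; y[j] is the truth.
    Each weight is the classifier's accuracy over the m observations."""
    m = len(y)
    return [sum(1 for j in range(m) if row[j] == y[j]) / m for row in L]

def weighted_vote(weights, votes, num_classes):
    """votes[i] is classifier i's predicted class index for one sample.
    Accumulating weights per class is the dot product W . V(j)."""
    scores = [0.0] * num_classes
    for w, v in zip(weights, votes):
        scores[v] += w
    return max(range(num_classes), key=lambda c: scores[c])
```

For example, a classifier that is right on 3 of 3 observations outvotes one that is right on 2 of 3, so a disagreement between them is resolved in favour of the former.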

HAR-AWDF-based assessment system
As discussed above, existing AC-based credit assessment systems neglect the differences in classification performance stemming from different attributes, measures and rules. As the improvements of our approach, Shannon entropy is used as the criterion to measure the discriminative power of attributes, and the aforesaid multi-classifier fusion technique is employed to increase the class-prediction accuracy of the adopted measures and rules. Furthermore, the strategy of weighted voting is used for final decision-making.

Problem formulation
In this section, we analyze the flaws of the conventional AC technique through a simple example; the raw data are shown in Table 1. For these data, we first calculate the measures Sup, Conf and Wch of single-itemset rules, and the results are displayed in Table 2. As shown in Table 2, the following observations are clear.

(1) The support for items a and b is higher than the given threshold 0.2. Hence, items a and b become frequent items when the Apriori algorithm is applied. However, the confidences of rules b → 1 and b → 2 are both 50%, while the confidences of rules a → 1 and a → 2 are 85.71% and 14.29%, respectively. This shows that the contributions of items a and b to categorizing transactions are very different. Therefore, weights need to be assigned to items to reflect the differences in discrimination between items.
(2) The confidence of rule a → 1 is 85.71%, which is higher than that of rule a → 2 (14.29%). Nevertheless, the weighted Chi-square of rule a → 1 is 1.0, which is lower than that of rule a → 2 (2.250). Thus, from the view of both support and confidence, a transaction with item a is classified into class 1, while with weighted Chi-square it is categorized into class 2. This indicates that it is necessary to integrate the decisions supported by Sup, Conf and Wch.

(3) According to the five items of the transaction dataset shown in Table 1, ten rules linked to single-itemsets are mined. By analogy, a large number of candidate itemsets and rules could be mined from a big dataset, which is bound to pose a huge challenge to algorithmic efficiency. So it is appropriate to set a threshold to prune redundant itemsets and rules.

(4) There are many vital categorization rules to be mined for each category, and individual rules give different predictions. Hence, it is reasonable to fuse the decisions made by all selected rules for prediction.

Our schemes
The proposed HAR-AWDF system improves previous AC approaches by providing new schemes for the typical issues described above. The technical details of our approach are schematically illustrated in Figure 1. The HAR-AWDF algorithm functions in two phases: mining crucial single-itemsets and multi-itemsets, and making the decision fusion. The first phase contains four steps: weighting items, weighting and integrating measures of candidate single-itemsets, selecting crucial single-itemsets, and selecting crucial multi-itemsets. The second phase consists of weighting crucial itemsets and rules, and making a final decision. The following is a step-by-step demonstration of how the HAR-AWDF algorithm works:

Step 1: Weighting items
According to the confidences for items a, b, c, d and e in Table 2, we calculate the Shannon entropies and weights of the items via Equations (5) and (6), respectively. Table 3 shows the posterior probabilities, Shannon information entropies, and weights of the five items. By comparing the entropy values of the five items, we observe that the entropy values of items b, c and d are higher than those of items a and e. This means that the discriminative powers of items b, c and d are weaker than those of items a and e, which are assigned weights 0.2627 and 0.2109, respectively. Thus, the equal-weight scheme for items, widely used in previous approaches, is improved by weighting the items.
Step 2: Weighting and integrating measures of candidate single-itemsets
The weighted support (Wsup) of a single-itemset rule x → c is calculated by

Wsup(x → c) = W(x) · Sup(x → c),   (12)

where W(x) denotes the weight of item x and is computed via Equation (6), and Sup(x → c) is computed via Equation (2). The three measures Wsup, Conf and Wch of a single-itemset rule x → c can then be weighted via Equation (8), where each measure is regarded as a classifier. The integrated measure of a rule x → c is finally obtained as the weighted sum of Wsup, Conf and Wch. The integrated measures of the rules with the same single-itemset pointing to class 1 and class 2 are denoted M1 and M2, respectively. For the example shown in Table 1, the integrated measures of single-itemset rules are displayed in the second and third columns of Table 4. For the single-itemset a, the integrated measures of rules a → 1 and a → 2 are 1.5599 and 0.4683, respectively. As a component of our algorithmic improvement, this procedure responds to the second issue addressed in Section 3.1.
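The weighting-and-integration step can be sketched as below. It is a minimal illustration under stated assumptions: the per-measure fusion weights come from normalized measure accuracies, and the function names are ours.

```python
# Illustrative sketch of Step 2: fuse Wsup, Conf and Wch by a weighted
# sum, with each measure's weight derived from its accuracy as a
# classifier (the normalization is an assumption).
def measure_weights(accuracies):
    """Normalize the per-measure accuracies into fusion weights."""
    total = sum(accuracies)
    return [a / total for a in accuracies]

def integrated_measure(wsup, conf, wch, weights):
    """M = w1*Wsup + w2*Conf + w3*Wch."""
    return weights[0] * wsup + weights[1] * conf + weights[2] * wch
```

With equal measure values, the integrated measure reduces to that common value, since the fusion weights sum to one.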
Step 3: Selecting crucial single-itemsets
The selection criterion for crucial single-itemsets can be described as

|M1 − M2| ≥ δ,   (13)

where M1 and M2 denote the integrated measures of the rules with the same candidate single-itemset linking to class 1 and class 2, respectively, and δ is the threshold for selecting crucial single-itemsets. Specifically, the threshold δ is set to 0.1. The difference between the integrated measures of e → 1 and e → 2 is 0.3; Equation (13) is then satisfied, so the single-itemset e is treated as a crucial single-itemset. As displayed in Table 4, the crucial single-itemsets satisfying Equation (13) include a, b, c and e. This scheme is distinct from the existing AC approaches, in which the five items are all treated at the same level.
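The thresholding of Step 3 can be sketched in a few lines; the dictionary layout and function name are illustrative assumptions:

```python
# Illustrative sketch of Step 3: keep an itemset as "crucial" only when
# its two class-wise integrated measures differ by at least delta.
def crucial_itemsets(measures, delta=0.1):
    """measures maps itemset -> (M1, M2); keep |M1 - M2| >= delta."""
    return [s for s, (m1, m2) in measures.items() if abs(m1 - m2) >= delta]
```

For instance, an itemset with measures (1.5599, 0.4683) survives the δ = 0.1 threshold, while one with near-identical measures such as (0.50, 0.45) is pruned, which is exactly how redundant itemsets are discarded.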
Step 4: Selecting crucial multi-itemsets
Following the above procedures, we consider the combinations of items in the crucial single-itemsets, and steps 1 to 3 are repeated to select crucial multi-itemsets. The weighted support (Wsup) of a rule x_1 · · · x_k → c is calculated by

Wsup(x_1 · · · x_k → c) = (1/k) Σ_{i=1}^{k} W(x_i) · Sup(x_1 · · · x_k → c),   (14)

where W(x_i) denotes the weight of item x_i and is computed via Equation (6), and Sup(x_1 · · · x_k → c) is computed via Equation (2). As shown in Table 5, the crucial multi-itemsets consist of five two-itemsets (ab, ac, ae, bc and ce), three three-itemsets (abc, abe and ace), and one four-itemset (abce). Explicitly, the scheme of selecting crucial single-itemsets and multi-itemsets, executed in Steps 3 and 4, aims to speed up the algorithmic implementation. This distinctly differs from previous works that use all single- and multi-itemsets for prediction.
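The multi-itemset weighted support can be sketched as below; averaging the item weights before scaling the support is one plausible reading of the formula (the exact aggregation in the original is not fully recoverable), and the names are ours:

```python
# Illustrative sketch of weighted support for a multi-itemset rule:
# scale Sup(x1..xk -> c) by the mean of the item weights (an assumption).
def weighted_support(items, item_w, sup):
    """Wsup(x1..xk -> c) ~ mean(W(x_i)) * Sup(x1..xk -> c)."""
    return sum(item_w[x] for x in items) / len(items) * sup
```

With the Step 1 example weights for items a and e, a two-itemset rule with support 0.5 would obtain a weighted support of (0.2627 + 0.2109)/2 × 0.5.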
Step 5: Weighting crucial itemsets and rules
In Steps 3 and 4, we obtain the crucial single-itemsets and multi-itemsets, whose classification performance shows actual differences. Accordingly, a crucial single-itemset or multi-itemset rule is endowed with the weight computed via Equation (8) when it works as a classifier. As shown in Table 5, the weights of the crucial single-itemset rules e → 1 and e → 2, corresponding to the crucial single-itemset e, are both 0.3, while the weights of the crucial multi-itemset rules abce → 1 and abce → 2, corresponding to the crucial multi-itemset abce, are both 0.1. The scheme of weighting crucial itemsets and rules is another element of our approach's algorithmic improvement.
Step 6: Making the final decision
The weighted voting depicted by Equation (10) is then used to make the final decision via Equation (11), which concludes the algorithm's execution. This procedure further boosts the performance of the proposed HAR-AWDF-based system.

Experiments and results analysis
In this section, we conduct several experiments to verify the effectiveness of the proposed HAR-AWDF-based approach. The experimental data are the German Credit Data set,1 in which each data point contains 20 items and a class label indicating good or bad credit risk, over 150,000 loan applicants. We perform the assessment task on a laptop with an Intel i5-8250U processor under the MATLAB (Version 2016a) programming environment.

Selection of the threshold δ
To balance classification accuracy against algorithmic complexity, we first select an appropriate setting for the threshold δ experimentally. The results are displayed in Table 6. As shown in Table 6, when δ is set to 6, the accuracy reaches 88.31%, while the implementation takes only 10 s. In the case of δ = 0.1, the accuracy increases to 93.6%; however, it takes 267 s to execute the algorithm, far exceeding the time in the first case. We therefore set δ = 3 for the following experiments.

Integration of measures
Considering that the classification performances of the three measures (Sup, Conf, Wch) are sometimes inconsistent, we realize the measures' integration with a weighted sum. In this experiment, we randomly select a subset consisting of 1000 data points from the original dataset, and then carry out our algorithm by randomly splitting the training and testing samples in three cases where the ratios of training samples are set to 25%, 50% and 75%, respectively. Figure 2 shows the classification accuracies of the four measures (Sup, Conf, Wch and the integrated measure) obtained in these cases. As Figure 2 shows, the integrated measure proposed in our approach consistently outperforms the three other measures (Sup, Conf, Wch), with a performance advantage of more than ten percentage points in each trial.

Comparison with representative AC-based approaches
To validate our improvement, we compare our approach with representative AC-based approaches by selecting 10,000 instances from the original dataset. The ratios of the 12 rounds of random sample selection from the experimental set that make up the training sets are 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18% and 20% in turn. For each trial, we run our approach together with three classical AC-based approaches: CMAR (Li, Han, & Pei, 2001), WCBA (Alwidian, Hammo, & Obeid, 2018) and CBA (Azmi, Runger, & Berrado, 2019). The experimental results are shown in Figure 3. From the results displayed in Figure 3, we observe that our approach is the best performer in most cases, the exception being the first one. In addition, when the number of training samples is no less than 800, our assessment system holds a stable and remarkable advantage over its opponents. For a further evaluation of the proposed approach, we randomly select 1000 training samples from the experimental set for each trial of a 10-fold cross-validation. The results are shown in Table 7. The experiment shows that the average recognition rate of HAR-AWDF is above 92% with a standard deviation of 0.0185, while the average running time is only 14.7 s. The average recognition rates of the three reference approaches are all under 71%, their standard deviations are above 0.02, and their average running times are all more than 30 s. These results indicate that our approach outperforms these congeneric AC-based approaches by accounting for the significance and performance of items, measures and rules in classification.

Rationality of mined crucial rules
In this section, we interpret the rationality of the first seven crucial itemsets in terms of their significance for evaluating personal credit. These itemsets, shown in Table 7, were mined in the second experiment of Section 4.3. The items in the crucial itemsets include: (1) A0: all credits paid back duly, or no credits taken, in the credit history; (2) B3: owning real estate; (3) E2: savings account or bonds ≥ 1000 DM.
As shown in Table 8, we find that the attributes of property, savings account and credit history are closely related to one's credit score. Item B3 has a weight of 0.94, while the weights of items A0 and E2 are 0.78 and 0.92, respectively. We can therefore conclude that the item 'owning real estate' is more important than the items 'all credits paid back duly in credit history' and 'savings account or bonds ≥ 1000 DM', in accordance with the real-life intuition that a person owning real estate tends to have a good credit rating.

Conclusions
In this paper, we have put forward the HAR-AWDF-based personal credit assessment system as an improvement of the traditional AC technique. Our approach is characterized by item weighting, measure integration and thresholding to mine crucial itemsets and rules. In addition, the performance of the proposed assessment scheme is strengthened by weighted voting in the classification step. The effectiveness of our approach is verified by the desirable experimental results obtained in comparisons with different measures and AC-based algorithms.
In particular, the average recognition rate of HAR-AWDF is above 92%, with a standard deviation under 0.02, in the case where 10% of the training samples are randomly selected from the German Credit Data set. This manifests that three attributes of the credit data, namely property, savings account and credit history, are vital to evaluating personal credit state, and that one who owns real estate tends to have good credit.