Dynamic Fuzzy Rule-based Source Selection in Distributed Decision Fusion Systems

A key challenge in decision fusion systems is to determine the best performing combination of local decision makers. This selection process can be performed statically at the training phase or dynamically at the execution phase, taking into consideration various features of the data being processed. Dynamic algorithms for the selection of competent sources are generally more accurate, but they are also computationally more intensive and require more memory. In this research, we propose a fuzzy rule-based approach for dynamic source selection (FDSS) that compresses the knowledge from local sources using a divide-and-conquer strategy along with the basic concepts of coverage and truth value criteria, leading to lower memory requirements and faster processing. A top-down approach to FDSS is then used to reach a parameter-free algorithm, i.e. one that avoids the restrictive parameter/threshold settings of FDSS. The rule bases in both approaches are created recursively and use the conditional probabilities of each class's correctness as the rule's weight. The proposed approaches are compared against several competing dynamic classifier selection methods based on local accuracy. Results indicate that the proposed fuzzy rule structures are generally faster and require less memory, while they also lead to more accurate final decisions from the uncertain decisions of multiple sources.


Introduction
Nowadays, data are produced at staggering rates due to the widespread communication and sensor technologies around the world. While the availability of data is an enormous opportunity for making better decisions, storing and processing them presents a great challenge. A reasonable approach is to keep only the most relevant data and to process only the most appropriate. In this way, data can be gainfully transformed into information, subsequently processed into knowledge, and ultimately into wisdom; with wisdom here being defined as the essence of what is most relevant, widely appropriate and most lasting.
However, it is non-trivial to define the concepts of 'relevance', 'appropriateness', and 'lastingness'. The process is further complicated when these data are not in one place, a common occurrence given the distributed nature of the data available today. These data are often provided by different sources and contain different aspects of information, with concerns for their privacy. Moreover, it is virtually impossible to aggregate all data in one place for processing and mining, due to the high volume, variety and velocity of the databases.
Accordingly, handling vast data sets and developing distributed data mining algorithms that analyze and summarize distributed data into usable knowledge is highly challenging. In this process, decision fusion is a key methodology for making decisions with vast amounts of data located at different sources. The main concerns in developing such decision-making systems are handling the inconsistency and uncertainty of decisions, the differing types and competency levels of local sources, and the variety of decision fusion strategies.
Decision fusion is the process of combining the decisions made by multiple information sources. These sources can be human experts, classifiers, regressors, or any other kind of decision maker. A simple visualization of a decision fusion center is depicted in Figure 1. Decision fusion approaches let each source make its own decision with its local information. The main purpose of such approaches is to make effective use of the disparate local sites without direct access, in order to preserve their privacy. The three main phases typically included in fusion-based decision-making systems are generating local sources, pruning them, and fusing their decisions [1]. Decision fusion has many advantages for data mining, including the possibility of using a set of low-cost computers to train a number of algorithms on subsets of the whole training data that fit in their main memory. In this way, it also becomes possible to apply algorithms to very large data sets in a feasible time.
There are a number of challenges in distributed decision fusion. First of all, the local systems are individually trained by the separately collected data obtained from each site. Based on the specifics of gathering information such as data location and time, these local data may not cover the entire feature space. These differences also lead to different views about the data. Moreover, inconsistencies commonly exist in a group of separately made decisions from different views and localities. As a result, some sources may be inefficient and reduce the performance of the system [2]. Too many of these poor sources can suppress correct predictions of good sources [3]. For this reason, some local sources are eliminated before fusion and the final decision is made by using a selected subset of sources [4]. In this paper, we focus on effective selection of sources.
The current approaches for selecting local sources in the literature generally fall into two categories: static and dynamic. Static approaches determine a region of competence for each source in the feature space at the training phase, while dynamic approaches determine the competence regions during the execution phase, using the training data in light of the data being analyzed. In other words, the accuracy of each source is estimated in a local area surrounding the data, called its 'region of competence' or 'local accuracy' [5]. In general, dynamic approaches are more accurate than static approaches [6]. The most common strategy for dynamic approaches in the literature so far uses the neighborhood of the data [1], which requires keeping all training data in memory and searching them for each test datum in order to find its neighbors. Determining a specific number of neighbors for each datum is very time consuming [7], and the final decision depends on the size and shape of the neighborhood [1]. Therefore, dynamic approaches require more memory and are more computationally intensive than static methods [7,8]. This is particularly so in applications with larger scale and higher computational complexity [1,8]; as a result, their applicability is often criticized [1] and their benefits have remained out of reach.
The objective of the present work is to use fuzzy logic to compress the knowledge about the expertise of sources into a single rule base that fits well in memory while also providing fast and accurate results. Specifically, we propose two algorithms which extract useful knowledge about the competence regions and performances of local sources and store this knowledge in the form of a fuzzy rule base. Upon arrival of new data, both algorithms gradually partition the input space and define one or more rules for each partition. Each rule in the final rule base assigns a competence region to a local source. Since the firing degree of each rule depends on the location of the data, the contribution of each source in the final decision is determined based on the data at hand. Hence, the above methods are categorized as dynamic selection methods.
The first proposed algorithm, Fuzzy Distributed Source Selection (FDSS), dynamically weights the local sources for each new datum using a rule-based system which represents the a priori stored knowledge about the competence regions of the sources. This rule-based system is constructed using the data previously available in the fusion center, called the validation set. A divide-and-conquer strategy is used to recursively partition the input space and determine the competence regions of the local sources in order to construct the rule base. Inspired by the fuzzy rule measures and linguistic summarization concepts of truth value and coverage [9], we propose two measures for defining the conditions of recursion termination, and we prevent the generation of redundant rules by defining appropriate thresholds for these two measures. In order to control the contribution of each source to the final decision, we also provide the rule base with an extra parameter that controls the firing degrees of the rules. We use the conditional probability of correctness of each rule for each class, estimated over the validation set, and create the rule base in a probabilistic form. The second proposed algorithm, top-down FDSS, differs from FDSS by omitting the pre-defined threshold values and constructing the rule base in a top-down manner. This methodology does not require the pre-defined parameters and conditions of the first method and hence is no longer affected by threshold settings. Training and storing a fuzzy rule-based system for weighting and selecting sources carries several benefits. Using fuzzy logic, we combine the merits of dynamic selection methods with those of static methods. While typically less memory is required for storing the rule-based system than for storing the complete data set, we also omit the computationally intensive process of searching the whole training data set to find the nearest neighbors of each test datum.
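The divide-and-conquer construction described above can be sketched in code. This is a minimal illustration under our own assumptions, not the authors' implementation: a median split stands in for the clustering step used in the paper, and all function and variable names are ours.

```python
import numpy as np

def build_rule_base(X, correctness, t_min, v_min, n_total=None, rules=None):
    """Recursive divide-and-conquer sketch of FDSS rule-base construction.

    correctness[k, i] is True where source i labels point X[k] correctly.
    The median split below stands in for the paper's clustering step.
    """
    if rules is None:
        rules = []
    if n_total is None:
        n_total = len(X)                           # size of the full training set
    truths = correctness.mean(axis=0)              # per-source truth value in region
    coverages = correctness.sum(axis=0) / n_total  # per-source coverage
    if truths.max() >= t_min:
        best = int(np.argmax(truths))              # suitable rule found: emit and stop
        rules.append((X.mean(axis=0), best, float(truths[best])))
        return rules
    if coverages.max() < v_min:
        # not enough support for further splits: keep above-average sources
        for i in np.where(truths >= truths.mean())[0]:
            rules.append((X.mean(axis=0), int(i), float(truths[i])))
        return rules
    # otherwise divide the region in two and recurse
    dim = int(np.argmax(X.std(axis=0)))
    med = np.median(X[:, dim])
    left = X[:, dim] <= med
    if left.all() or (~left).all():
        return rules
    build_rule_base(X[left], correctness[left], t_min, v_min, n_total, rules)
    build_rule_base(X[~left], correctness[~left], t_min, v_min, n_total, rules)
    return rules
```

With two well-separated regions, each owned by one accurate source, the sketch emits exactly one rule per region.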
By not restricting each source to a single crisp region of competence, and by letting each source contribute to the estimation of the accuracies according to its adjacency to the data, we avoid the problems arising from crisp boundaries [10]. Using fuzzy logic also makes it easier to cope with the uncertainties that often exist in the outputs of local sources. The proposed approaches further benefit from the interpretability and representation ability of fuzzy rule-based systems, which make them suitable for real-life applications.
The rest of this paper is organized as follows. For simplification, symbols used in the paper are shown in Table 1. In Section 2, we review the current algorithms in the literature for selecting sources in fusion systems. In Section 3, the details of the proposed approaches are explained. Experimental results are included in Section 4. Finally, Section 5 concludes the paper.

Literature review
Static and dynamic methods are the two main categories of source selection strategies in decision fusion systems. The first category, static selection, selects the best decision maker or decision makers at the training phase. The second category, dynamic selection, applies the selection process at the testing phase involving the input data. Yin et al. [11] classify decision combination methods into two main groups. The first group aims to train and combine an ensemble of classifiers during the learning process, e.g. Boosting [12] and Bagging [13]. The selection phase of this group commonly appears in the literature under different names, such as ensemble pruning, ensemble selection or ensemble thinning [14]. The second group combines the results of multiple available decision makers to solve the targeted problem. This group often trains a meta-learner to combine the component decision makers intelligently [11]. Although this paper focuses on the second group, we explore the current literature of both groups from the viewpoint of source selection strategies.
The rest of this section is dedicated to a brief literature review on each category, emphasizing dynamic methods which this paper focuses on.

Static selection
The selection phase in static selection methods aims to find the subset of classifiers with optimal accuracy [8]. Different methods have been proposed in the literature to select one subset of classifiers. Based on the categorization of ensemble selection methods [14], among the simplest methods for static selection are ranking-based approaches, which choose the N best performing classifiers, or the weighted N best performing classifiers, over the training data set [6,15]. Greedy approaches [16] are also used in the literature, iteratively adding classifiers to or removing them from the ensemble with the aim of increasing the overall accuracy. Genetic algorithms are also popular for selecting and/or weighting local sources [17][18][19]. Clustering the input space into disjoint regions and designating one dominant local classifier in each region is also used in the literature, e.g. [7,20]. The authors have used fuzzy logic earlier in [21] for weighting local sources in a multi-source decision fusion problem. They train a fuzzy rule base which approximates the reliabilities of sources over the input space. The estimated reliabilities are used as weights of the local sources. In ensemble-based multiple classifier systems, different criteria such as incorrect predictions of local classifiers [22], diversity [15,23], independence [24], or combinations of measures such as diversity together with sparsity [11,25] are used to prune the ensemble and select a subset of classifiers. Some of these criteria are discussed in [26].

Dynamic selection
Using the neighborhood of the pattern to estimate the accuracy of multiple classifiers, under the notion of 'local accuracy' or 'region of competence', was first introduced by Woods et al. [5] as the DCS-LA algorithm. This algorithm estimates the accuracy of each source in the vicinity of the test pattern using its K nearest neighbors. It was further extended in the literature. In [27], multiple classifier behavior (MCB) is used to determine the K nearest neighbors of the test pattern. In [28], linear programming is used to weight sources in the neighborhood of the test pattern. In [4], the competence of each source in the neighborhood is determined by comparing it to a random classifier. In [29], KNNE is used instead of KNN, which selects the K nearest neighbors of each class separately. Cevikalp and Polikar [30] use quadratic programming to weight local sources in the neighborhood of the test pattern based on their accuracy.
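For contrast with the proposed method, the local-accuracy estimation at the core of DCS-LA can be sketched as follows; the Euclidean distance and all names are illustrative assumptions, not the original authors' code.

```python
import numpy as np

def dcs_la_select(x, X_val, y_val, predictions, k=10):
    """Select the most locally accurate source for pattern x (DCS-LA sketch).

    predictions[i] holds source i's labels for the validation set X_val.
    """
    # Find the k nearest validation points to x (Euclidean distance).
    dists = np.linalg.norm(X_val - x, axis=1)
    neighbors = np.argsort(dists)[:k]
    # Local accuracy of each source = fraction of neighbors it labels correctly.
    local_acc = [np.mean(predictions[i][neighbors] == y_val[neighbors])
                 for i in range(len(predictions))]
    return int(np.argmax(local_acc))
```

Note that this requires the whole validation set in memory and a full distance scan per test pattern, which is precisely the cost the paper aims to avoid.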
In [1], it is emphasized that using K nearest neighbors to define the local region of competence has several disadvantages. First of all, using the concept of neighborhood, it is assumed that the local accuracy of each source is constant within the region. Also, the reliability of the results depends deeply on the number of points in the neighborhood. In other words, the result depends on the shape (distance measure) and size (number of points) of the neighborhood. Some attempts have been made to reduce these disadvantages. Considering that with KNN the final decision depends on the value of K, Zou et al. [31] proposed adding a phase that selects a suitable value of K to the algorithm. This selection is performed using the margin error. Ko et al. [32] also proposed reducing the number of neighbors considered in estimating local accuracy until at least one source is found that correctly classifies all the neighbors. Although these attempts have improved the performance of neighborhood-based methods, adding an extra phase for deciding the number of neighbors is time consuming and increases the complexity of the system.
Ensemble-based methods also apply dynamic selection using different criteria for effective selection of local sources. Calculating confidence measures rather than performance, Dos Santos et al. [8] proposed selecting, from a pool of ensembles, the ensemble with the least ambiguity, which increases the degree of certainty of the final decision about the test pattern. Lysiak et al. [33] propose dynamic weighting of sources in which sources are first eliminated using a diversity measure and the remaining sources are then weighted using their accuracy over the entire data set. Considering both local accuracy and diversity, Giacinto and Roli [26] propose to select the most accurate classifiers in the vicinity of the test data together with the most diverse set of classifiers among them. Li et al. [34] consider error diversity measures to select from the initial pool of classifiers and then use local accuracy to select the final classifier. Swiderski et al. [35] propose using the area under the receiver operating characteristic curve (AUC) of each classifier as a measure for selecting a proper subset of classifiers. Nazemi et al. [36] use fuzzy logic for dynamically weighting ensemble members in a loss-given-default modeling problem. They create a fuzzy rule base using clustering methods and use it to weight the trained regression sources dynamically; several combination formulas are tested and compared. Ykhlef and Bouchaffra [37] consider game theory algorithms and solve the selection problem as a coalitional game.
When dealing with vast amounts of data, it is hard to find one local source with expertise over the entire input space. Applying a selection phase before combination, in order to use locally accurate sources in the vicinity of the input pattern, leads to more precise results. Dynamic methods involve the input pattern in the process of selecting sources by defining a local region of competence in its neighborhood. Since selecting a weak source can significantly reduce the performance of the system, determining this region directly affects the system's performance [28]. To date, the most used method for dynamically defining local accuracies in the literature is the K nearest neighbor method [1]. Dynamic methods so far consume considerable memory and are computationally intensive. Even though the dynamic selection methods presented so far have shown remarkable results, there is a notable lack of methodologies that deliver the high accuracy of dynamic algorithms while avoiding the time-consuming process of finding neighbors and the high memory consumption. In this paper, we aim to expand the limits of dynamic selection methods to better use their advantages in effective selection of local sources.

The proposed FDSS algorithm
Let {S_1, S_2, ..., S_n} be a set of local decision makers, each trained on one of n separate data sets. The feature vector x = [x_1, x_2, ..., x_d] is presented to be labeled as one of m classes [c_1, c_2, ..., c_m]. After all sources are trained, one separate data set, called the validation set, is labeled by all the local sources. The labeled data set is then used as the training set at the fusion center. Figure 2 shows this process. The proposed approach generates a fuzzy rule base that contains useful information about the competence regions of local sources and their performances, in order to dynamically assign a proper weight to each source for each given datum. To this end, the algorithm searches for local regions of the search space in which there is at least one decision maker with high performance. After the rule base is constructed, each rule defines a competence region and specifies an efficient decision maker for that region. The proposed algorithm constructs the rule base iteratively, and no pre-defined number of regions is necessary. The pseudo code of the algorithm, named FDSS (Fuzzy Distributed Source Selection), is shown in Figure 3. In the following, the training and testing phases of the algorithm are explained in detail. Since the proposed algorithm constructs the rule base iteratively, its convergence is discussed at the end of this section.

Training phase in the proposed FDSS algorithm
Inspired by [38], each iteration of the algorithm at the training phase divides the feature space into two regions, aiming to find local regions with powerful decision makers and assign a fuzzy rule to each such region. The added rule has the following form:

IF x is in region A THEN the decision of source s is correct with probability P_A^S.

This rule specifies that, if x is in the region A, then the result of decision maker s would be correct with probability P_A^S, where P_A^S is the conditional probability of the correctness of s given each class.
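A rule of this form might be represented as follows; the Gaussian membership function and all field names are assumptions made for illustration, not taken from the paper.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class FuzzyRule:
    """One rule: IF x is near `center` THEN trust source `source_id`."""
    center: np.ndarray        # center A of the competence region
    width: float              # spread of the fuzzy region
    source_id: int            # index of the recommended decision maker
    class_probs: np.ndarray   # P_j = P(correct | source labels x as c_j)

    def firing(self, x: np.ndarray) -> float:
        # Gaussian membership of x in the rule's competence region.
        return float(np.exp(-np.sum((x - self.center) ** 2) / (2 * self.width ** 2)))
```

Because the firing degree decays with distance from the region's center, nearby data activate the rule strongly while distant data barely contribute, which is what makes the selection dynamic.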
Selection algorithms are prone to overfitting [39,40]. Overfitting happens when the algorithm that selects the local sources fits the training data set so well that, while being accurate on the training data, its selections fail to yield proper decisions during execution. For the generated rules to be highly reliable and to prevent overfitting, a region is turned into a rule only if the following conditions are met: (1) there exists at least one source with a high correctness probability for the data in that region; (2) there is a sufficient number of data in that region (for the sake of the generalization ability of the final rule base).
To evaluate the fulfillment of these conditions, we propose two measures, truth value and coverage, based on the quality measures introduced in [9]. The proposed measures are described in the following.

Degree of truth value (T)
T can be viewed as the proportion of the data satisfying the consequent among those which satisfy the antecedent [9]. Since each rule in our problem specifies one of the local decision makers as suitable for making decisions about the incoming data, the truth value of the rule depends on the performance of the specified decision maker. In other words, the degree of truth of each rule is related to the correctness degree of the specified decision maker. T increases as more of the data satisfying the antecedent part (located in the determined competence region) also satisfy the consequent part (are labeled correctly by the specified decision maker). Hence, we formulate T as

T = ( Σ_{x_k labeled correctly by s} μ_A(x_k) ) / ( Σ_{x_k} μ_A(x_k) ),   (1)

where μ_A(x_k) is the membership degree of x_k in the competence region A. The numerator sums the memberships of the data which the corresponding decision maker labels correctly, so the formula computes the normalized sum of these membership values. As the formula shows, we consider the correctness of the decisions over the training data set.
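A minimal numeric sketch of this computation, assuming the membership degrees μ_A(x_k) have already been evaluated:

```python
import numpy as np

def truth_value(memberships, correct):
    """Degree of truth T of a rule: membership-weighted accuracy of the
    recommended source inside the competence region.

    memberships: mu_A(x_k) for each training point.
    correct: boolean array, True where the source labels x_k correctly.
    """
    memberships = np.asarray(memberships, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    total = memberships.sum()
    if total == 0.0:
        return 0.0  # empty region: no evidence for the rule
    return float(memberships[correct].sum() / total)
```

For example, with memberships [1.0, 0.5, 0.2] and the source correct on the first and last points, T = (1.0 + 0.2) / 1.7.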

Degree of coverage (V)
The coverage value specifies the generalization ability of the rule. The degree of sufficient coverage, V, describes whether the rule is supported by enough data [9]. V increases as more data satisfy both the antecedent and the consequent parts; as V increases, the generality of the rule increases. Since each rule in our problem indicates a competence region and a decision maker, we consider its coverage value to be the proportion of training data that the source labels correctly in the specified region, as shown in (2):

V = |{x_k in region A : s labels x_k correctly}| / N,   (2)

where N is the total number of training data.

Two threshold values are initialized at the beginning of the algorithm for the truth and coverage measures above. At each iteration, a divided region turns into a rule if there exists at least one source whose truth value for the corresponding data exceeds the desired threshold. If no source satisfies this condition, the coverage measure is checked. When coverage is below its threshold for all of the sources, further division of the space would lead to rules that are not supported by enough training data and hence are inefficient for the rule base; the division is therefore stopped. To increase the diversity of the sources included in the rule base, all sources with a truth value higher than the average truth value in that region are added to the rule base, and the algorithm breaks the chain of recursion. Otherwise, the process repeats by further dividing the area.
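The coverage measure and the threshold-based recursion control described above can be sketched as follows; normalizing by the total training size is our reading of the 10/(#training data) threshold used in the experiments, and all names are illustrative.

```python
def coverage(correct_in_region, n_train_total):
    """Degree of coverage V: fraction of all training data that the
    source labels correctly inside the region (sketch)."""
    return correct_in_region / n_train_total

def should_stop(truths, coverages, t_min, v_min):
    """Recursion control sketch: stop dividing when some source is
    truthful enough, or when no source has enough coverage."""
    if max(truths) >= t_min:
        return True   # at least one suitable rule found
    if max(coverages) < v_min:
        return True   # further splits would lack data support
    return False      # keep dividing this region
```

The two thresholds thus bound the recursion from both sides: `t_min` stops it on success, `v_min` stops it when the region has become too sparse.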
A conditional probability vector of the following form is assigned to each rule in the final rule base:

P^{S_i} = [P_1, P_2, ..., P_m].

This vector assigns a probability to source S_i per class: whenever a rule uses source S_i to label data as class c_j, P_j indicates the probability that this decision is correct. In this way, for each new datum, the correctness probabilities of the sources' decisions are considered in the final decision in addition to its membership degree in the local competence regions.
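A possible estimator for this vector over the validation set, under the assumption that P_j is the empirical accuracy of the source restricted to the points it assigns to class c_j:

```python
import numpy as np

def class_conditional_probs(pred, truth, n_classes):
    """Estimate P_j = P(decision correct | source predicts class c_j)
    over the validation set (illustrative estimator)."""
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    probs = np.zeros(n_classes)
    for j in range(n_classes):
        mask = pred == j
        if mask.any():
            # fraction of the points predicted as c_j that truly are c_j
            probs[j] = np.mean(truth[mask] == j)
    return probs
```

A class the source never predicts keeps probability zero, so such decisions carry no weight at fusion time.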
At the end, the final constructed rule base includes R rules of the following form:

Rule r: IF x is in region A_r THEN use source s_r with probability vector P^{s_r},   r = 1, ..., R,

such that s_r ∈ {S_1, ..., S_n} is one of the base decision makers and A_r is a vector giving the center of the competence region that the r-th rule specifies.

Decision-making process in the proposed method
After the training phase, an unseen pattern x_k is received for processing. The output of each rule suggests one source with a vector ω, which is the product of the firing value of the rule and its conditional probability vector. The outputs of the rule base are collected into a matrix W_Rules. W_Rules is then set to the element-by-element multiplication of itself and the matrix of the sources' results:

W_Rules ← W_Rules ∘ O,   (5)

where O_{s_r,x_k,j} equals one if the corresponding source labels x_k as class j and zero otherwise. Therefore, W_{s_r,c_j} turns into zero if s_r labels the datum as a class other than c_j. We then compute the weight of each source per class: W_{S_i,C_j} specifies the weight of source S_i for class C_j, computed as

W_{S_i,C_j} = max_{r : s_r = S_i} W_Rules(r, c_j).   (7)

As Equation (7) shows, the determined weight of S_i for class C_j is the maximum output value of the rules for that specific source and class, yielding a weight matrix W of size n × m.
At the end, we compute the weight of each class as the maximum weight among all sources. If there exists one class which strongly dominates the other classes, that class is selected as the final decision, as in [41]. Otherwise, we compute the weight of each class as the average of the weights in the rule base, as shown in (8):

W_{C_j} = (1/n) Σ_{i=1}^{n} W_{S_i,C_j}.   (8)

The class with the maximum weight is then selected as the final decision. The process of making a decision about the new data is shown in Figure 4.
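Putting the decision-making steps together, a compact sketch follows; the concrete dominance test and all names are our assumptions, not the paper's code.

```python
import numpy as np

def fuse_decision(rules, x, source_outputs, n_sources, n_classes, dominance=0.5):
    """Fusion sketch: weight each (source, class) pair by the best-firing
    rule recommending that source, masked by what the source predicted.

    rules: list of (firing_fn, source_id, class_probs)
    source_outputs: predicted class of x per source.
    """
    W = np.zeros((n_sources, n_classes))
    for firing_fn, s, probs in rules:
        omega = firing_fn(x) * probs           # rule output vector
        mask = np.zeros(n_classes)
        mask[source_outputs[s]] = 1.0          # zero unless s predicted c_j
        W[s] = np.maximum(W[s], omega * mask)  # max over rules per source
    class_w = W.max(axis=0)                    # weight of each class
    order = np.sort(class_w)[::-1]
    if order[0] - order[1] > dominance:        # one class strongly dominates
        return int(np.argmax(class_w))
    return int(np.argmax(W.mean(axis=0)))      # otherwise use averaged weights
```

Because only firing degrees and per-class probabilities are consulted, no search over the stored training data is needed at test time.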

Remarks on convergence of the algorithm
Since the proposed algorithms run iteratively and recursively, we discuss their convergence here. The proposed algorithm continues dividing a region only while both of the following conditions hold: (1) the local region covers a sufficient number of data, and (2) no source is accurate enough for the current region's data.
If there does not exist any source with sufficiently high accuracy for a region, the coverage criterion eventually breaks the chain of recursion and prevents an unlimited number of divisions. Therefore, the algorithm always converges and does not fall into an infinite loop.

The proposed top-down FDSS algorithm
One of the main disadvantages of dynamic approaches is their dependence on pre-defined parameters. Although FDSS has many advantages, including low memory consumption, its performance depends on two parameter settings: the minimum truth and minimum coverage threshold values. One way to remove the parameters of the proposed FDSS algorithm is to construct the rule base in a top-down manner. The top-down algorithm starts with one rule over the entire data set. It then divides the area recursively, just like FDSS. The difference is that the maximum conditional probability achieved so far for each class, by the rules added for the areas containing the current area, is passed down to each step. In each step, only those rules are added to the rule base that outperform the previous rules in terms of conditional probabilities for at least one class. In this way, the algorithm gives up searching for suitable rules, which requires suitability parameters to be defined in advance, in favor of adding better rules at each step. We also omit the minimum coverage threshold. FDSS divides the search space to find at least one suitable rule; in some cases, there might not exist any suitable source for the local regions, so the coverage threshold was defined to stop the algorithm from adding dysfunctional rules that are not supported by a sufficient amount of data. These two parameters were provided in FDSS to prevent overfitting. The methodology of creating the rule base in top-down FDSS is immune to overfitting: a rule is added at each step only if it is better than the previously added ones. Even if a rule with very low coverage is added to the rule base, it does not lead to overfitting, because of its very low variance and the fact that the algorithm has already added efficient rules for this area in previous steps. This lets us remove the coverage parameter.
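The rule-admission criterion of top-down FDSS can be sketched as a simple per-class improvement check; the names here are illustrative assumptions.

```python
import numpy as np

def admit_rule(candidate_probs, best_so_far):
    """Top-down FDSS admission sketch: a new rule is kept only if it
    improves on the best conditional probability achieved by the
    enclosing regions for at least one class.

    Updates `best_so_far` in place when the candidate is admitted.
    """
    candidate_probs = np.asarray(candidate_probs, dtype=float)
    if np.any(candidate_probs > best_so_far):
        np.maximum(best_so_far, candidate_probs, out=best_so_far)
        return True
    return False
```

Since a rule is admitted only when it strictly improves some class, no truth or coverage threshold has to be tuned by hand.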
Newly added rules in each step focus on local regions where sources perform better in comparison to the previous more global areas. We should note that this algorithm provides a rule base with overlapping local regions of competence. It means that unlike the previous algorithm, each region might fit to more than one rule. Figure 5 shows the pseudo code of the proposed top-down FDSS algorithm.

Experimental results
To evaluate the performance of the proposed method, we consider two sets of experiments. First, we evaluate the classification accuracy using homogeneous local sources on 14 benchmark data sets from the UCI [42] and Keel [43] machine learning repositories. Table 2 shows the main features of the selected data sets. The number of features varies from 2 to 24, and the number of instances varies from 569 to 19,020. After describing the experimental setup, classification results on the homogeneous benchmarks are presented. We then evaluate the proposed algorithm using heterogeneous local sources; performance results and comparisons for this part are included in Section 4.2. Finally, we present the memory consumption of the proposed method in Section 4.3. The classification results are compared against five other selection approaches: (1) DCS-LA [5]: this algorithm defines the competence of each local classifier as its local accuracy, estimated using the k nearest neighbors of the test datum. We choose k = 10 since this value of k represents the best performance according to [5]. (2) DCS-P and DCS-KL [4]: these two dynamic approaches use comparison to a random classifier as the measure of competence. They use the k nearest neighbor method, and we set k = 10 according to that paper's results.

Experimental setup
In order to simulate distributed local sources where each source is trained on its own separate data set, we divide the training data randomly between the sources with no overlap. Training and testing sets are extracted using fivefold cross validation. At each iteration, three folds are used for training the sources, one fold as the validation set and one fold as the testing set. The presented results are therefore the average of 20 runs of the algorithm. Naïve Bayes is used as the local classifier and FCM clustering is used for dividing the space. The threshold for the truth value is set to 0.96 and for the coverage value to 10/(#training data), based on experiments. In both the homogeneous and heterogeneous tests, we apply the following process. At each iteration, we first train the local sources, each using its own dedicated set of training data. We then compute the outputs of the local sources for the validation set and train the combination algorithm using these output values together with the validation set. The algorithm is then tested on the testing data set.
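The non-overlapping random division of the training data among the sources can be sketched as follows (an illustrative helper, not the authors' code):

```python
import numpy as np

def split_among_sources(n_samples, n_sources, seed=0):
    """Simulate distributed local sources: shuffle the training indices
    and split them into non-overlapping parts, one per source."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return np.array_split(idx, n_sources)
```

Each source then trains only on its own index subset, so no training point is shared between sources.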

Test with homogeneous sources
In this test, we use the same classifier and the same set of features for all the local sources.
In order to evaluate the performance of the algorithm in dealing with the unreliability of local sources, we increase the number of local sources that the training data is divided among. Since the data are divided with no overlap, the available training data for each source decreases as the number of local sources increases. For 2 to 10 local sources, each source receives approximately 50%, 25%, 16%, 12% and 10% of the whole available training data, respectively. This reduces the reliability of the local sources. The accuracy results averaged over the different numbers of sources are shown in Table 3. Figure 6 compares the accuracies of the different algorithms for different numbers of sources. As this figure and Table 3 show, the proposed approaches perform better than the other approaches in most cases. Increasing the number of local sources produces interesting results. The single best approach shows an average decrease of 0.234 in accuracy, because decreasing the amount of available training data leads to classifiers with lower performance. FRBMCS, DCS-P, SB, FDSS, DCS-LA and DCS-KL show average accuracy changes of −4.69, −3.09, 0.578, 0.054, −0.245 and 0.035, respectively. As the results show, increasing the number of local sources might lead to better results, because with more local sources the fusion center can select among more complementary classifiers. While the overall performance of the proposed method is better than the other methods according to Table 3, the change in accuracy is lower in the proposed method than in all others except DCS-KL; DCS-KL, however, shows lower prediction accuracy. In other words, the proposed method produces a smoother and more accurate performance as the number of local sources increases. This results from the main feature of the proposed algorithm: leveraging the power of fuzzy logic in handling uncertainty.
The Kolmogorov-Smirnov test is used for a deeper comparison of the two proposed algorithms. The Kolmogorov-Smirnov test is a nonparametric test that does not assume the data are sampled from any specific distribution. The significance level for rejecting the null hypothesis is set to 5%. The p-values of the test are also shown in Table 3. As the test shows, the two proposed algorithms perform almost identically. Which algorithm is preferable therefore depends on their differences. FDSS has two parameters that control the number of iterations of the algorithm: the larger the minimum coverage threshold or the smaller the minimum truth value threshold, the faster the algorithm converges and the smaller the rule base becomes. Top-down FDSS, on the other hand, is parameter free. Although this is a significant advantage, it forces the algorithm to continue searching to the deepest possible level, which makes the training phase more time consuming. The memory consumption of the two algorithms is compared in Subsection 4.3.
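The two-sample comparison can be sketched as below. This is a minimal illustrative sketch, not the paper's code: the accuracy arrays are hypothetical, and the KS statistic is computed directly as the maximum gap between the two empirical CDFs (in practice one would use a library routine such as `scipy.stats.ks_2samp`, which also returns the p-value):

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the ECDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Hypothetical per-fold accuracies for FDSS and top-down FDSS.
acc_fdss = np.array([0.81, 0.83, 0.80, 0.82, 0.84])
acc_td   = np.array([0.82, 0.82, 0.81, 0.83, 0.83])
d = ks_statistic(acc_fdss, acc_td)
```

A small statistic (and a p-value above 0.05) means the null hypothesis that the two accuracy samples come from the same distribution cannot be rejected, which is the conclusion drawn in Table 3.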

Test with heterogeneous sources
In this test, we use a different set of features for each local classifier, hence heterogeneous sources. For this purpose, we use the Random Subspace method [45] to create the local data: each local classifier is trained on a different randomly selected subset of the features. Here we use ten local sources, each trained with 50% of the features selected at random. The rule base is trained with the full set of features of the validation set. Table 4 shows the average fivefold cross-validation results. For heterogeneous sources, we observe that the proposed approach works better than the others on six datasets. The average difference between FDSS and the best of the other approaches over all data sets is −0.7, and for top-down FDSS it is −1.14. FDSS also works better than top-down FDSS on eight data sets. The p-values of the test are also shown in Table 4. As the table shows, the two proposed algorithms maintain similar performance.
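The Random Subspace setup above can be sketched as follows; a minimal illustrative sketch with a hypothetical function name and feature count, showing only the feature-subset sampling, not the classifier training:

```python
import numpy as np

def random_subspaces(n_features, n_sources, frac=0.5, seed=0):
    """Pick a random feature subset (here 50% of the features) for each source."""
    rng = np.random.default_rng(seed)
    k = max(1, int(round(frac * n_features)))
    return [np.sort(rng.choice(n_features, size=k, replace=False))
            for _ in range(n_sources)]

# Ten heterogeneous sources, each trained on half of (say) 20 features.
subspaces = random_subspaces(20, 10)
```

Each local classifier then sees only its own feature subset, while the fusion-center rule base is built over the complete feature set, matching the experimental setup.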

Memory consumption and runtime
To evaluate the proposed methods in terms of memory consumption, Table 5 shows the mean number of rules they generate. The number of generated rules corresponds to the number of leaves in the tree that partitions the input space. In most cases, top-down FDSS inherently produces fewer rules than FDSS: since the top-down strategy prefers global rules to local ones and each iteration only adds better rules, the final rule base is smaller than that of FDSS. Compared with storing all the training data in memory, FDSS yields a 96.12% reduction in memory usage and top-down FDSS a 97.5% reduction. The results show that the memory required to store the produced rule base is much smaller than that required to keep the complete data, while Tables 3-5 demonstrate that the same or even higher classification performance is achieved with this lower memory consumption.
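The memory comparison reduces to simple arithmetic; the sketch below uses hypothetical sizes (the actual sample, feature and rule counts are in Table 5) to show how a reduction figure of this kind is computed:

```python
def memory_reduction(n_samples, n_features, n_rules, floats_per_rule):
    """Percent reduction from storing a rule base instead of the full training set."""
    full_data = n_samples * n_features      # values kept by neighborhood methods
    rule_base = n_rules * floats_per_rule   # values kept by the fuzzy rule base
    return 100.0 * (1 - rule_base / full_data)

# Hypothetical sizes: 5000 samples x 20 features vs. 150 rules of 26 floats each.
r = memory_reduction(5000, 20, 150, 26)
```

With these illustrative numbers the rule base occupies under 4% of the space of the raw training data, which is the kind of saving reported for FDSS and top-down FDSS.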
Average runtime for a single source is compared in Table 6 for the dynamic algorithms. The algorithms are implemented in MATLAB, and runtimes are measured on an Intel(R) Core(TM) i3 with 4 GB RAM running Windows 7. As the table shows, the runtimes of the proposed algorithms are significantly lower than those of the other dynamic algorithms. The reason is that the proposed algorithms do not need to search for a specific number of neighbors of the input data, which is very time consuming. The runtime of top-down FDSS is generally lower than that of FDSS, since the average number of rules generated by top-down FDSS is generally smaller, as shown in Table 5.

Conclusion
In this paper, we propose two fuzzy-logic-based algorithms for dynamic source selection in decision fusion systems. The first algorithm works recursively, using properties of the fuzzy rules, namely their truth and coverage values, to construct the rule base. The second algorithm works independently of these two parameters. We find that by compressing the knowledge extracted from the training data set into a single fuzzy rule base, we can achieve similar or better performance while storing much less data in memory. The proposed approach can therefore be regarded as an alternative to neighborhood-based approaches, especially when the data size is large, memory is limited and processing speed is important.
There are several directions for extending the proposed approach. First, the proposed approach divides the data into two clusters at each iteration, whereas it may be useful to choose an appropriate number of clusters at each iteration based on the available data. Second, it would be desirable to cluster the data not only by distance measures but also by considering the accuracies of the different sources, in order to find the division that leads to the best set of possible rules. Finally, a post-processing phase could be added that removes or merges rules to obtain a more efficient rule base.