1
Similarity-based Classifier Combination for Decision Making
Authors: Gongde Guo, Daniel Neagu
Department of Computing, University of Bradford
2
Outline of Presentation
1. Background
   - Classification process
   - Drawbacks of a single classifier
   - Solutions
2. Approaches for Multiple Classifier Systems
   - Explanation of the four approaches
3. An Architecture of Multiple Classifier System
4. Involved Classifiers for Combination
   - k-Nearest Neighbour Method (kNN)
   - Weighted k-Nearest Neighbour Method (wkNN)
   - Contextual Probability-based Classification (CPC)
   - kNN Model-based Method (kNNModel)
5. Combination Strategies
   - Majority Voting-based Combination
   - Maximal Similarity-based Combination
   - Average Similarity-based Combination
   - Weighted Similarity-based Combination
6. Experimental Results
7. Conclusions
8. References
3
Background - Classification Process
Classification occurs in a wide range of human activities. At its broadest, the term could cover any activity in which some decision or forecast is made on the basis of currently available information, and a classifier is then some formal method for repeatedly making such judgments in new situations (Michie et al. 1994). Various approaches to classification have been developed and applied to real-world applications for decision making. Examples include probabilistic decision theory, discriminant analysis, fuzzy-neural networks, belief networks, non-parametric methods, tree-structured classifiers, and rough sets.
4
Background - Drawbacks of a Single Classifier
Unfortunately, no single classifier is dominant for all data distributions, and the data distribution of the task at hand is usually unknown. A single classifier also may not be discriminative enough when the number of classes is large. For applications where the classes of content are numerous, unlimited, and unpredictable, one specific classifier cannot solve the problem with good accuracy.
5
Background - Solutions
A Multiple Classifier System (MCS) is a powerful solution to difficult decision-making problems involving large data sets and noisy input, because it allows the simultaneous use of arbitrary feature descriptors and classification procedures. The ultimate goal of designing such a multiple classifier system is to achieve the best possible classification performance for the task at hand. Empirical studies have observed that different classifier designs potentially offer complementary information about the patterns to be classified, which can be harnessed to improve the performance of the selected classifier.
6
Architecture of Multiple Classifier Systems
Given a set of classifiers C = {C_1, C_2, ..., C_L} and a dataset D, each instance x in D is represented as a feature vector [x_1, x_2, ..., x_n]^T, x ∈ R^n. A classifier takes x as its input and assigns it a class label from Ω, i.e. C_i : R^n → Ω. Four approaches are generally used to design a classifier combination system (Kuncheva, 2003):
Approach 1: different combination schemes.
Approach 2: different classifier models.
Approach 3: different feature subsets.
Approach 4: different training sets.
[Figure: block diagrams of the four approaches. In each, classifiers C_1, ..., C_i, ..., C_L feed a combiner that receives x; Approach 3 builds the classifiers on different feature subsets and Approach 4 on different training subsets.]
7
Explanation of the Four Approaches
Approach 1: the problem is to pick a combination scheme for the L studied classifiers C_1, C_2, ..., C_L to form a combiner.
Approach 2: the problem is to choose the individual classifiers, considering issues such as similarity/diversity and homogeneous/heterogeneous design.
Approach 3: the problem is to build each C_i on an individual subset of features (a subspace of R^n).
Approach 4: the problem is to select training subsets D_1, D_2, ..., D_m of the dataset D so as to obtain a team of diverse classifiers, as sketched below.
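One common way to realise Approach 4 (the slides only ask for training subsets D_1, ..., D_m and do not prescribe a sampling scheme) is bagging-style bootstrap sampling. A minimal Python sketch, assuming the training data are held in NumPy arrays:

```python
import numpy as np

def make_training_subsets(X, y, m, seed=0):
    """Draw m bootstrap samples of (X, y) so each classifier is trained on a
    different subset of the data (one possible realisation of Approach 4)."""
    rng = np.random.default_rng(seed)
    subsets = []
    for _ in range(m):
        idx = rng.choice(len(X), size=len(X), replace=True)  # sample with replacement
        subsets.append((X[idx], y[idx]))
    return subsets
```

Each subset can then be used to train one member of the team; diversity comes from the differing samples rather than from differing classifier models.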
8
An Architecture of Multiple Classifier System
[Figure: architecture of the proposed multiple classifier system. Data sets pass through a data pre-processing stage; four classifiers (kNN, wkNN, CPC, kNNModel) each produce an output (Output 1-4); these outputs are passed to the classifier combination stage (MSC, ASC, WSC) to give the final output.]
9
Involved Classifiers for Combination - kNN
Given an instance x, the k-nearest neighbour classifier finds its k nearest instances and traditionally uses the majority rule (or majority voting rule) to determine its class, i.e. it assigns to x the single most frequent class label associated with the k nearest neighbours. This is illustrated in Figure 3: the two classes are depicted by "□" and "o", with ten instances for each class, and each instance is represented as a two-dimensional point within a continuous-valued Euclidean space. The new instance x is classified using its k = 5 nearest neighbours.
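A minimal sketch of this rule (illustrative only, not the authors' implementation; names are ours), using Euclidean distance and majority voting:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Assign x the most frequent class label among its k nearest training instances."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training instance
    nearest = np.argsort(dists)[:k]               # indices of the k nearest neighbours
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]             # majority class
```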
10
Involved Classifiers for Combination - wkNN
In wkNN, the k nearest neighbours are assigned different weights. Let ∆ be a distance measure, and let x_1, x_2, ..., x_k be the k nearest neighbours of x arranged in increasing order of ∆(x_i, x), so that x_1 is the first nearest neighbour of x. The distance weight w_i for the i-th neighbour x_i is defined by a formula given on the slide (a commonly used form is shown below). Instance x is assigned to the class for which the weights of the representatives among the k nearest neighbours sum to the greatest value.
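The exact weight formula appears only as an image on the original slide; a commonly used distance-weighting scheme consistent with this description (Dudani-style weighting, shown here as an assumption rather than the authors' definition) is:

```latex
% Assumed Dudani-style distance weights for the k nearest neighbours
w_i =
\begin{cases}
\dfrac{\Delta(x_k, x) - \Delta(x_i, x)}{\Delta(x_k, x) - \Delta(x_1, x)}, & \text{if } \Delta(x_k, x) \neq \Delta(x_1, x),\\[6pt]
1, & \text{if } \Delta(x_k, x) = \Delta(x_1, x).
\end{cases}
```

Under this scheme the closest neighbour receives weight 1 and the furthest receives weight 0, and the class with the largest weight sum wins.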
11
Involved Classifiers for Combination - CPC
The contextual probability-based classifier (CPC) (Guo et al., 2004) is based on a new function G, a probability function used to calculate the support given by overlapping or non-overlapping neighbourhoods. The idea of CPC is to aggregate the support of multiple sets of nearest neighbours of a new instance for the various classes, giving a more reliable support value that better reveals the true class of the instance.
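The G function itself is defined in (Guo et al., 2004) and is not given on the slide; the sketch below only illustrates the aggregation idea, using per-class neighbour fractions over several neighbourhood sizes as a stand-in for the real support function (an assumption, not the CPC definition):

```python
import numpy as np

def aggregated_support(X_train, y_train, x, ks=(3, 5, 7)):
    """Aggregate per-class support over several neighbourhoods of x (illustrative only)."""
    classes = np.unique(y_train)
    order = np.argsort(np.linalg.norm(X_train - x, axis=1))   # neighbours by increasing distance
    support = np.zeros(len(classes))
    for k in ks:                                              # one neighbourhood per value of k
        labels = y_train[order[:k]]
        support += np.array([np.mean(labels == c) for c in classes])
    support /= len(ks)
    return classes[np.argmax(support)], support               # predicted class and support vector
```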
12
Involved Classifiers for Combination - kNNModel
The basic idea of the kNN model-based classification method (kNNModel) (Guo et al. 2003) is to find a set of more meaningful representatives of the complete data set to serve as the basis for further classification. Each chosen representative x_i is stored as a tuple <Cls(x_i), Sim(x_i), Num(x_i), Rep(x_i)>, whose elements respectively represent: the class label of x_i; the similarity of x_i to the furthest instance among the instances covered by N_i; the number of instances covered by N_i; and a representation of instance x_i. The symbol N_i denotes the neighbourhood of x_i, i.e. the area within which the distance to x_i is no greater than that corresponding to Sim(x_i). kNNModel can generate a set of optimal representatives by inductively learning from the dataset.
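Classification with a pre-built model can then be sketched as below. This is a simplification that assumes each representative is stored as a (class, radius, coverage count, centre) tuple, with a distance radius in place of the similarity threshold; the model-construction step of (Guo et al. 2003) is omitted:

```python
import numpy as np

def knnmodel_predict(representatives, x):
    """representatives: list of (cls, radius, num_covered, centre) tuples.
    Prefer the covering representative with the largest coverage; otherwise
    fall back to the class of the nearest representative centre."""
    best = None
    for cls, radius, num, centre in representatives:
        if np.linalg.norm(x - centre) <= radius and (best is None or num > best[1]):
            best = (cls, num)
    if best is not None:
        return best[0]
    return min(representatives, key=lambda r: np.linalg.norm(x - r[3]))[0]
```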
13
Combination Strategy - Majority Voting-based Combination
Given a new instance x to be classified, whose true class label is t_x, and k predefined classifiers denoted A_1, A_2, ..., A_k, each classifier A_i approximates a discrete-valued function f_i : R^n → Ω. The final class label of x, obtained by majority voting-based classifier combination, is
f(x) = arg max_{c ∈ Ω} Σ_{i=1..k} δ(c, f_i(x)),
where δ(a, b) = 1 if a = b, and δ(a, b) = 0 otherwise.
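A minimal sketch of this combiner (function and variable names are illustrative), assuming each base classifier already returns a class label for x:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the class labels predicted by the k classifiers for one instance."""
    return Counter(predictions).most_common(1)[0][0]

# Example: labels from kNN, wkNN, CPC and kNNModel for one instance
# majority_vote(["pos", "neg", "pos", "pos"])  ->  "pos"
```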
14
Combination Strategy - Class-wise Similarity-based Classifier Combination
The classification result of x produced by classifier A_j, j = 1, 2, ..., k, is a vector S_j of normalized similarity values of x to each class. The final class label of x can be obtained in three different ways:
a) Maximal Similarity-based Combination (MSC): assign x to the class receiving the single largest similarity value over all k classifiers.
b) Average Similarity-based Combination (ASC): assign x to the class whose similarity, averaged over the k classifiers, is largest.
c) Weighted Similarity-based Combination (WSC): assign x to the class maximizing an α-weighted mixture of the maximal and the average similarity, where α is a control parameter used for setting the relative importance of local optimization and global optimization of the combination.
A sketch of the three rules is given below.
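The exact formulas appear only as images on the original slide; the sketch below shows one plausible reading of the three rules. In particular, the WSC form (an α-weighted mix of the per-class maximum and average) is an assumption consistent with the slide's description rather than the authors' definition:

```python
import numpy as np

def combine_similarities(S, mode="WSC", alpha=0.7):
    """S: array of shape (k, m) holding each of the k classifiers' normalized
    similarities of x to the m classes. Returns the index of the chosen class."""
    if mode == "MSC":        # maximal similarity over the classifiers
        score = S.max(axis=0)
    elif mode == "ASC":      # average similarity over the classifiers
        score = S.mean(axis=0)
    else:                    # WSC: assumed alpha-weighted mix of max and average
        score = alpha * S.max(axis=0) + (1 - alpha) * S.mean(axis=0)
    return int(np.argmax(score))
```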
15
Experimental Results
This study mainly focuses on Approach 1. Given four classifiers (kNN, kNNModel, CPC and wkNN), we empirically propose three similarity-based classifier combination schemes. After evaluating them on fifteen public datasets from the UCI machine learning repository, we apply the best approach to a real-world application, toxicity prediction of the environmental effects of chemicals, in order to obtain better classification performance.
16
Fifteen public data sets from the UCI machine learning repository and one data set (Phenols) from a real-world application (toxicity prediction of chemical compounds) have been collected for training and testing. Some information about these data sets is given in Table 1, where NF = Number of Features, NN = Number of Nominal features, NO = Number of Ordinal features, NB = Number of Binary features, NI = Number of Instances, and CD = Class Distribution. Four Phenols data sets are used in the experiments: Phenols_M denotes the Phenols data set with MOA (Mechanism of Action) as the endpoint for prediction; Phenols_M_FS denotes the Phenols_M data set after feature selection; Phenols_T denotes the Phenols data set with toxicity as the endpoint for prediction; and Phenols_T_FS denotes the Phenols_T data set after feature selection.

Table 1. Some information about the data sets (the NN, NO and NB columns are omitted here).

Data set      NF    NI    CD
Australian    14    690   383:307
Colic         23    368   232:136
Diabetes       8    768   268:500
Glass          9    214   70:17:76:0:13:9:29
HCleveland    13    303   164:139
Heart         13    270   120:150
Hepatitis     19    155   32:123
Ionosphere    34    351   126:225
Iris           4    150   50:50:50
LiverBupa      6    345   145:200
Sonar         60    208   97:111
Vehicle       18    846   212:217:218:199
Vote          16    435   267:168
Wine          13    178   59:71:48
Zoo           16     90   37:18:3:12:4:7:9
P_MOA        173    250   173:27:4:19:27
P_MOA_FS      20    250   173:27:4:19:27
P_T          173    250   37:152:61
P_T_FS        20    250   37:152:61
17
Table 2. A comparison of four individual algorithms and MV in classification performance.

Data set      kNNModel   ε   N   vkNN    wkNN    CPC     MV
Australian    86.09      2   5   85.22   82.46   84.64   85.65
Colic         83.61      1   4   83.06   81.94   83.61   83.33
Diabetes      75.78      1   5   74.21   72.37   72.63   74.87
Glass         69.52      3   3   67.62   67.42   68.57   69.52
HCleveland    82.67      0   1   81.00   81.33   82.67   83.00
Heart         81.85      1   3   80.37   77.41   81.48   81.85
Hepatitis     89.33      1   2   83.33   83.33   82.67   85.33
Ionosphere    94.29      0   1   84.00   87.14   84.86   88.57
Iris          96.00      0   2   96.67   95.33   96.00   96.00
LiverBupa     68.53      2   2   66.47   66.47   65.88   68.82
Sonar         84.00      0   3   85.00   86.50   87.50   87.00
Vehicle       66.55      2   3   69.29   71.43   70.12   71.43
Vote          91.74      4   5   92.17   90.87   91.74   91.74
Wine          95.29      0   1   94.71   95.29   95.88   95.29
Zoo           92.22      0   2   95.56   95.56   96.67   95.56
P_MOA         83.20      0   0   87.20   86.80   87.60   87.60
P_MOA_FS      89.20      0   0   88.80   92.80   91.20   91.60
P_T           71.60      2   4   74.40   74.40   74.80   74.00
P_T_FS        75.60      2   4   73.60   72.40   77.20   76.00
Average       83.00      /   /   82.25   82.17   82.93   83.53
18
Table 3. A comparison of different combination schemes (α, the WSC control parameter, was tuned per data set, typically to a value between 0.6 and 0.8).

Data set      SVM     C5.0   MV      MSC     ASC     WSC     ε   N
Australian    81.45   85.5   85.65   86.52   86.23   86.52   2   5
Colic         83.89   80.9   83.33   84.72   84.17   84.72   3   0
Diabetes      77.11   76.6   74.87   75.13   75.13   75.13   4   0
Glass         62.86   66.3   69.52   70.95   70.95   70.95   2   0
HCleveland    83.67   74.9   83.00   82.33   82.33   82.67   4   0
Heart         84.07   75.6   81.85   81.85   81.48   81.85   2   4
Hepatitis     82.67   80.7   85.33   87.33   86.67   87.33   1   5
Ionosphere    87.14   84.5   88.57   89.43   88.86   89.43   1   0
Iris          98.67   92.0   96.00   96.67   96.67   96.67   1   4
LiverBupa     69.71   65.8   68.82   70.59   71.18   71.18   3   5
Sonar         74.00   69.4   87.00   88.50   88.50   89.00   2   0
Vehicle       77.50   67.9   71.43   70.83   71.90   71.90   2   5
Vote          96.96   96.1   91.74   92.61   92.61   92.61   2   0
Wine          95.29   92.1   95.29   96.47   96.47   96.47   1   0
Zoo           97.79   91.1   95.56   95.56   96.67   96.67   0   0
P_MOA         84.40   90.0   87.60   88.40   88.20   88.80   0   0
P_MOA_FS      89.20   89.2   91.60   92.40   92.40   92.40   4   0
P_T           65.60   72.8   74.00   76.40   76.00   76.40   2   0
P_T_FS        76.00   74.0   76.00   77.20   76.40   77.20   3   0
Average       82.53   80.28  83.53   84.42   84.39   84.64   /   /
19
Table 4. The signed test of different classifiers.

        SVM         C5.0       kNNModel    vkNN       wkNN       CPC        MV
MV      -0.69 (-)   2.98 (+)   -0.33 (-)   2.52 (+)   2.07 (+)   0.23 (-)   /
WSC      1.15 (-)   2.98 (+)    2.07 (+)   3.44 (+)   2.98 (+)   2.98 (+)   2.52 (+)

In Table 4, the entry 2.07 (+) in cell (3, 4), for example, means that WSC is significantly better than kNNModel in terms of performance over the nineteen data sets, i.e. the corresponding |Z| > Z_0.95 = 1.729. The entry 1.15 (-) in cell (3, 2) means there is no significant difference in performance between WSC and SVM over the nineteen data sets, as the corresponding |Z| < Z_0.95 = 1.729.
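The slide does not give the exact form of the signed-test statistic; the sketch below uses one common normal-approximation form of the sign test (an assumption, not necessarily the statistic used by the authors), counting how often one classifier beats the other across the data sets:

```python
import math

def sign_test_z(acc_a, acc_b):
    """Normal-approximation sign test comparing classifier A with classifier B
    from their per-data-set accuracies; ties are discarded."""
    wins = sum(a > b for a, b in zip(acc_a, acc_b))
    losses = sum(a < b for a, b in zip(acc_a, acc_b))
    n = wins + losses
    return (wins - losses) / math.sqrt(n) if n else 0.0
```

A positive Z larger than the critical value indicates that A is significantly better than B over the collection of data sets.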
20
Conclusions
The proposed methods directly employ the class-wise similarity measures used by each individual classifier for combination, without changing the representation from similarity to probability. This significantly improves the average classification accuracy over the nineteen data sets: the average classification accuracy of WSC is better than that of any individual classifier and of the majority voting-based combination method. The statistical test also shows that the proposed combination method WSC is better than any individual classifier, with the exception of SVM; even so, the average classification accuracy of WSC is still better than that of SVM, with a 2.49% improvement. Further research is required into how to combine heterogeneous classifiers using class-wise similarity-based combination methods.
21
References
(Michie et al. 1994) D. Michie, D.J. Spiegelhalter, and C.C. Taylor. Machine Learning, Neural and Statistical Classification. Ellis Horwood, 1994.
(Guo et al. 2003) G. Guo, H. Wang, D. Bell, Y. Bi, and K. Greer. kNN Model-Based Approach in Classification. In Proc. of ODBASE 2003, LNCS 2888, pp. 986-996, Springer-Verlag, 2003.
(Guo et al. 2004) G. Guo, H. Wang, D. Bell, and Z. Liao. Contextual Probability-Based Classification. In Proc. of ER 2004, LNCS 3288, pp. 313-326, Springer-Verlag, 2004.
(Kuncheva, 2003) L.I. Kuncheva. Combining Classifiers: Soft Computing Solutions. In S.K. Pal (Ed.), Pattern Recognition: From Classical to Modern Approaches, pp. 427-452, World Scientific, Singapore, 2003.
22
Thank you very much!