Published by Herbert Heath. Modified over 9 years ago.
1
Ensemble Based Systems in Decision Making. Advisor: Hsin-Hsi Chen. Reporter: Chi-Hsin Yu. Date: 2008.07.02. IEEE Circuits and Systems Magazine, 2006, Q3. Robi Polikar, Dept. of Electrical and Computer Engineering (ECE) at Rowan University.
2
Outline Introduction Ensemble Based Systems Creating an Ensemble Combining Classifiers/Learners Current & Emerging Areas Software Conclusions
3
Introduction (1) Ensemble system ◦ Multiple classifier system ◦ Meta-learning algorithm
4
Introduction (2) Reasons for using ensemble based systems ◦ Statistical reasons ◦ Large volumes of data ◦ Too little data ◦ Divide and conquer ◦ Data fusion
5
Ensemble Based Systems (1) Two major steps for creating ensemble systems ◦ The specific procedure used for generating individual classifiers Partitioning feature space (subspace) Partitioning data/resampling data ◦ The strategy employed for combining the classifiers Classifier selection Classifier fusion
6
Ensemble Based Systems (2) Diversity, the cornerstone of ensemble systems ◦ To make each classifier as unique as possible Individual classifiers make errors on different instances. Decision boundaries are adequately different from those of the others. Different classifiers can handle different kinds of instances – general instances, informative instances. ◦ The methods to achieve diversity Use different training datasets Use different training parameters Use different feature subsets
7
Ensemble Based Systems (3) Popular data split techniques ◦ Bootstrap resampling ◦ jackknife (k-fold data split) Ex. in cross validation
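The two data split techniques named on this slide can be sketched as follows. This is a minimal illustration, not code from the presentation: the helper names `bootstrap_sample` and `k_fold_splits` are hypothetical, and the folds are formed by simple striding rather than by shuffling.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]

def k_fold_splits(data, k):
    """Yield (train, held_out) pairs; each of the k blocks is held out once."""
    folds = [data[i::k] for i in range(k)]  # strided folds (an assumption)
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
data = list(range(10))
sample = bootstrap_sample(data, rng)
splits = list(k_fold_splits(data, 5))
```

A bootstrap replica usually contains repeated items and omits others, which is what makes classifiers trained on different replicas differ. Each k-fold training set instead leaves out one disjoint block.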
8
Ensemble Based Systems (4) Measures of diversity ◦ Pair-wise measures, defined between two classifiers For T classifiers, we can calculate T(T-1)/2 pair-wise diversity measures ◦ Given two hypotheses hi and hj,
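The slide's own formulas are not part of this transcript. As a hedged illustration of a pair-wise measure, the sketch below uses the disagreement measure (the fraction of instances on which exactly one of the two classifiers is correct), which is one common choice and an assumption here. The function names and toy predictions are hypothetical.

```python
from itertools import combinations

def disagreement(preds_i, preds_j, truth):
    """Fraction of instances where exactly one of the two classifiers is correct."""
    differing = sum((a == y) != (b == y) for a, b, y in zip(preds_i, preds_j, truth))
    return differing / len(truth)

def pairwise_diversity(all_preds, truth):
    """Compute the measure for every unordered pair: T(T-1)/2 values for T classifiers."""
    return {
        (i, j): disagreement(all_preds[i], all_preds[j], truth)
        for i, j in combinations(range(len(all_preds)), 2)
    }

truth = [0, 1, 1, 0]
preds = [
    [0, 1, 1, 0],  # classifier 0: all correct
    [0, 1, 0, 0],  # classifier 1: wrong on the third instance
    [1, 1, 1, 0],  # classifier 2: wrong on the first instance
]
diversity = pairwise_diversity(preds, truth)
```

With T = 3 classifiers the dictionary holds 3 entries, matching T(T-1)/2.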
9
Ensemble Based Systems (5) But … ◦ There is no diversity measure that consistently correlates with higher accuracy. No Free Lunch Theorem for optimization (D.H. Wolpert and W.G. Macready, 1997) ◦ No matter what algorithm we use, there is at least one target function for which random guessing is a better algorithm. ◦ Learning algorithm -> Turing machine -> language -> distribution So … ◦ Prior knowledge, data distribution, amount of training data, and cost function are important. ◦ Try other algorithms if the performance is unacceptable.
10
Creating an Ensemble (1) Bootstrap aggregation
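The slide's algorithm figure is not in the transcript. Bootstrap aggregation (bagging) can be sketched roughly as below: train each classifier on its own bootstrap replica, then combine by plurality vote. The 1-D threshold "stump" base learner, the toy data, and all function names are assumptions chosen to keep the example self-contained.

```python
import random
from collections import Counter

def train_stump(points):
    """Fit a 1-D threshold classifier with the fewest training errors."""
    best = None
    for thr in sorted({x for x, _ in points}):
        for polarity in (0, 1):  # polarity = label predicted when x >= thr
            errors = sum((polarity if x >= thr else 1 - polarity) != y
                         for x, y in points)
            if best is None or errors < best[0]:
                best = (errors, thr, polarity)
    _, thr, polarity = best
    return lambda x: polarity if x >= thr else 1 - polarity

def bagging(points, n_classifiers, rng):
    """Train each classifier on a bootstrap replica of the training data."""
    ensemble = []
    for _ in range(n_classifiers):
        replica = [rng.choice(points) for _ in points]  # sample with replacement
        ensemble.append(train_stump(replica))
    return ensemble

def predict(ensemble, x):
    """Combine the individual decisions by plurality vote."""
    votes = Counter(h(x) for h in ensemble)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
points = [(x / 10, int(x >= 5)) for x in range(10)]  # label is 1 when x >= 0.5
ensemble = bagging(points, 11, rng)
```

An odd ensemble size avoids ties in the two-class vote.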
11
Creating an Ensemble (2) Adaptive boosting
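The boosting pseudocode shown on this slide is not in the transcript. As a hedged sketch, one round of the AdaBoost.M1 weight update looks roughly like this: compute the weighted error, shrink the weights of correctly classified instances by beta = error / (1 - error), and renormalize. The function name and the toy inputs are hypothetical.

```python
import math

def adaboost_m1_update(weights, correct):
    """One AdaBoost.M1 round; returns (new_weights, voting_weight)."""
    # Weighted error: total weight of the misclassified instances.
    eps = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0 < eps < 0.5:
        raise ValueError("weighted error must lie in (0, 0.5)")
    beta = eps / (1 - eps)
    # Down-weight the instances this classifier got right, then renormalize.
    scaled = [w * beta if ok else w for w, ok in zip(weights, correct)]
    total = sum(scaled)
    return [w / total for w in scaled], math.log(1 / beta)

weights = [0.25] * 4                   # uniform initial distribution
correct = [True, True, True, False]    # the classifier misses the last instance
new_weights, vote = adaboost_m1_update(weights, correct)
```

After the update the misclassified instances carry half of the total weight, so the next classifier is pushed toward the instances the current one got wrong.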
12
Creating an Ensemble (3)
13
Creating an Ensemble (4) For a certain instance, not for a dataset
14
Creating an Ensemble (5) Classifier selection Dynamic combination rules
15
Combining Classifiers (1) Combination rules are often grouped as ◦ Trainable vs. non-trainable ◦ Combination rules that apply to class labels vs. to class-specific continuous outputs
16
Combining Classifiers (2) Combining class labels ◦ Majority voting Unanimous voting Simple majority (> 50%) Plurality voting (majority voting) ◦ Weighted majority voting ◦ Borda count Devised in 1770 by Jean Charles de Borda Each voter (classifier) rank-orders the candidates (1st, 2nd, …, Nth) place -> (N-1, N-2, …, 0) votes ◦ Behavior knowledge space (BKS)
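Plurality voting and the Borda count from this slide can be sketched as follows. The function names and the toy votes are hypothetical. The Borda scoring follows the slide: with N candidates, the 1st to Nth places earn N-1 down to 0 votes.

```python
from collections import Counter

def plurality_vote(labels):
    """Return the class label chosen by the most classifiers."""
    return Counter(labels).most_common(1)[0][0]

def borda_count(rankings):
    """Sum Borda votes: place p (0-based) among N candidates earns N-1-p votes."""
    scores = Counter()
    for ranking in rankings:
        n = len(ranking)
        for place, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - place
    return scores

labels = ["a", "b", "a", "c", "a"]      # one label per classifier
rankings = [                            # one full ranking per classifier
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["b", "c", "a"],
]
winner = plurality_vote(labels)
scores = borda_count(rankings)
```

Here `winner` is `"a"`, and the Borda scores are a = 3, b = 5, c = 1, so the Borda count selects `"b"`.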
17
Combining Classifiers (3)
18
Combining Classifiers (4) Combining continuous outputs C: number of classes
19
Combining Classifiers (5) Approaches to combine continuous outputs ◦ Algebraic combiners Mean rule, weighted average, trimmed mean, min/max/median rule, product rule, generalized means ◦ Decision templates ◦ Dempster-Shafer-based combination But, which one is better? ◦ No Free Lunch Theorem
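Several of the algebraic combiners listed above can be sketched in a few lines. This is an illustrative example only: the `combine` function, the rule names used as dictionary keys, and the support values are assumptions, and `math.prod` requires Python 3.8 or later.

```python
import math
from statistics import mean, median

RULES = {"mean": mean, "median": median, "min": min, "max": max,
         "product": math.prod}

def combine(supports, rule):
    """supports[t][c] is classifier t's support for class c.

    Returns (winning class index, combined support per class).
    """
    per_class = list(zip(*supports))  # regroup the supports by class
    combined = [RULES[rule](column) for column in per_class]
    winner = max(range(len(combined)), key=combined.__getitem__)
    return winner, combined

supports = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],
]
mean_winner, _ = combine(supports, "mean")
median_winner, _ = combine(supports, "median")
product_winner, _ = combine(supports, "product")
```

On these toy supports the mean and product rules pick class 1 while the median rule picks class 0, a small illustration of why no single rule can be called best in general.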
20
Current & Emerging Areas Incremental learning Data fusion Feature selection Confidence estimation Error correcting output codes (ECOC) Output [0 1 1 1 0 1 0 1 0 1 0 1 0 1 0] is closest to the ω5 code word, with a Hamming distance of 1.
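ECOC decoding assigns the class whose code word is nearest to the ensemble's binary output in Hamming distance. The slide's 15-bit code matrix is not in the transcript, so the sketch below uses a small hypothetical 6-bit codebook. The class names and function names are also assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(output, codebook):
    """Return the class whose code word is closest to the output."""
    return min(codebook, key=lambda label: hamming(output, codebook[label]))

# Hypothetical codebook; its code words differ pairwise in at least 3 bits,
# so any single flipped bit still decodes to the right class.
codebook = {
    "w1": [0, 0, 0, 0, 0, 0],
    "w2": [1, 1, 1, 0, 0, 0],
    "w3": [0, 0, 1, 1, 1, 1],
}
output = [1, 1, 0, 0, 0, 0]   # the code word of w2 with its third bit flipped
decoded = ecoc_decode(output, codebook)
distance = hamming(output, codebook[decoded])
```

Here `decoded` is `"w2"` at a Hamming distance of 1, mirroring the slide's example on a smaller scale.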
21
Software Entool ◦ A Matlab Toolbox for Ensemble Modeling PRTools ◦ http://www.prtools.org Weka ◦ Java, Open source (GNU license) RapidMiner (formerly YALE) ◦ Java, Open source (GNU/OEM license)
22
Conclusions Ensemble-based systems ◦ They have enjoyed growing attention. ◦ Diversity among the classifiers is important. ◦ “No Free Lunch Theorem”
23
Thanks!!