Ensemble Methods: Bagging and Boosting

Ensemble Methods: Bagging and Boosting Lucila Ohno-Machado HST951

Topics: combining classifiers into ensembles (vs. mixture of experts); bagging; boosting (Ada-Boosting, arcing); stacked generalization.

Combining classifiers. Examples: classification trees and neural networks, several neural networks, several classification trees, etc.; the results from the different models are averaged. Why? Better classification performance than individual classifiers and more resilience to noise. Why not? It is time consuming and can overfit.

Bagging (Breiman, 1996). Derived from the bootstrap (Efron, 1993): create classifiers using training sets that are bootstrapped (drawn with replacement from the original training set), then average the results for each case.
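
A minimal bagging sketch, assuming scikit-learn decision trees as the base classifiers and integer class labels; the helper names bagging_fit and bagging_predict are illustrative, not from the lecture:

    # Bagging sketch: train each classifier on a bootstrap sample and combine
    # predictions by majority vote.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, n_classifiers=25, seed=0):
        rng = np.random.default_rng(seed)
        n = len(X)
        models = []
        for _ in range(n_classifiers):
            idx = rng.integers(0, n, size=n)   # draw n cases with replacement
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        votes = np.stack([m.predict(X) for m in models])  # one row per classifier
        # majority vote per case (assumes non-negative integer class labels)
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)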

Bagging example (Opitz, 1999): the original training set contains cases 1 through 8; training sets 1 through 4 are bootstrap resamples of it, each containing 8 cases drawn with replacement, so some cases are repeated and others left out.

Boosting: a family of methods in which classifiers are produced sequentially. Each classifier depends on the previous one and focuses on the previous one's errors: examples that were incorrectly predicted by earlier classifiers are chosen more often or weighted more heavily.

Ada-Boosting (Freund and Schapire, 1996). Two approaches: (1) select examples according to the error of the previous classifier, so that more representatives of the misclassified cases are selected (the more common approach); (2) weigh the misclassified cases more heavily, so that all cases are incorporated but with different weights (this does not work for some algorithms).

Boosting example (Opitz, 1999): again the original training set contains cases 1 through 8; training sets 1 through 4 are drawn sequentially, with cases misclassified by earlier classifiers appearing more often in the later sets.

Ada-Boosting. Define e_k as the sum of the selection probabilities of the instances misclassified by the current classifier C_k. Multiply the selection probability of each misclassified case by b_k = (1 − e_k) / e_k, then “renormalize” the probabilities (i.e., rescale them so that they sum to 1). Combine classifiers C_1 … C_k using weighted voting, where C_k has weight log(b_k).
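
A sketch of the resampling approach described above, assuming a scikit-learn-style base learner (decision stumps here) and discrete class labels; the helper names and the early stop when e_k reaches 0 or 0.5 are added assumptions, not part of the slide:

    # Ada-Boosting sketch (resampling variant): sample according to p, boost the
    # selection probability of misclassified cases by b_k = (1 - e_k) / e_k,
    # renormalize, and vote with weight log(b_k).
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, n_classifiers=25, seed=0):
        rng = np.random.default_rng(seed)
        n = len(X)
        p = np.full(n, 1.0 / n)                 # selection probabilities
        models, weights = [], []
        for _ in range(n_classifiers):
            idx = rng.choice(n, size=n, p=p)    # resample according to p
            clf = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
            wrong = clf.predict(X) != y
            e = p[wrong].sum()                  # e_k: weighted error
            if e <= 0 or e >= 0.5:              # stop if perfect or too weak
                break
            b = (1 - e) / e
            p[wrong] *= b                       # boost misclassified cases
            p /= p.sum()                        # renormalize
            models.append(clf)
            weights.append(np.log(b))           # voting weight log(b_k)
        return models, weights

    def adaboost_predict(models, weights, X, classes=(0, 1)):
        score = np.zeros((len(X), len(classes)))
        for clf, w in zip(models, weights):
            pred = clf.predict(X)
            for j, c in enumerate(classes):
                score[:, j] += w * (pred == c)
        return np.array(classes)[score.argmax(axis=1)]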

Arcing. Arc-x4 (Breiman, 1996): for the ith example in the training set, m_i is the number of times it was misclassified by the previous K classifiers. The probability p_i of selecting example i when training the next classifier is p_i = (1 + m_i^4) / Σ_j (1 + m_j^4); the fourth power was determined empirically.
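
A minimal sketch of the Arc-x4 selection probabilities, assuming m is the vector of per-example misclassification counts (the helper name is illustrative):

    import numpy as np

    def arc_x4_probabilities(m):
        # p_i is proportional to 1 + m_i**4, normalized to sum to 1
        w = 1.0 + np.asarray(m, dtype=float) ** 4
        return w / w.sum()

    # Example: a case misclassified 3 times dominates the next sample:
    # arc_x4_probabilities([0, 1, 3]) -> approximately [0.012, 0.024, 0.965]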

Bias plus variance decomposition (Geman, 1992). Bias: how close the average classifier is to the gold standard. Variance: how often classifiers disagree. Intrinsic target noise: the error of the Bayes optimal classifier.
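
For squared-error loss, one standard statement of the decomposition (the version for 0/1 classification loss is more involved) is, with the expectation taken over training sets:

    \mathbb{E}\big[(y - \hat{f}(x))^2\big]
      = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
      + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
      + \underbrace{\sigma^2}_{\text{intrinsic noise}}

where f(x) is the gold-standard target, \hat{f}(x) is the learned classifier, and \sigma^2 is the irreducible noise (the error of the Bayes optimal predictor).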

Empirical comparison (Opitz, 1999): 23 data sets from the UCI repository, 10-fold cross-validation, backpropagation neural nets and classification trees as base learners. The methods compared were Single, Simple (multiple NNs with different initial weights), Bagging, Ada-boosting, and Arcing; the number of classifiers in the ensemble was varied, and “accuracy” was the evaluation measure.

Results: ensembles were generally better than a single classifier, and not so different from “Simple”; ensembles built on NNs were strongly correlated with one another, as were ensembles built on CTs; Ada-boosting and arcing were strongly correlated even across different algorithms (boosting depends more on the data set than on the type of classifier algorithm); 40 networks in an ensemble were sufficient; NNs were generally better than CTs.

Correlation coefficients (table). Recoverable values: Simple NN vs. Bagging / Arcing / Ada NN = .88 / .87 / .85; Simple NN vs. Bagging / Arcing / Ada CT = -.10 / .38 / .37; Arcing NN vs. Ada NN = .99; Arcing CT vs. Ada CT = .96; the cross-algorithm boosting correlations (Arcing or Ada NN vs. Arcing or Ada CT) are around .6.

More results: data sets with different levels of noise (random selection of a possible value for a feature or for the outcome) were created from the 23 sets, and artificial noisy data were also created; boosting performed worse as noise increased.

Other work (Opitz and Shavlik): create diverse classifiers by genetic search for classifiers that are accurate yet different, by using different parameters, or by using different training sets.

Stacked Generalization (Wolpert, 1992). Level-0 models are based on different learning algorithms and use the original data (level-0 data). Level-1 models, also called “generalizers”, are trained on the results of the level-0 models (the level-1 data are the outputs of the level-0 models).
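
A minimal stacking sketch in Python, assuming scikit-learn level-0 learners roughly matching the next slide (a decision tree, naïve Bayes, and a nearest-neighbor learner) and substituting logistic regression for multi-response linear regression as the generalizer; the function names are illustrative:

    # Stacking sketch: level-1 data are out-of-fold probability estimates
    # produced by the level-0 models; the level-1 "generalizer" learns to
    # combine them.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    def stack_fit(X, y):
        level0 = [DecisionTreeClassifier(), GaussianNB(), KNeighborsClassifier()]
        Z = np.hstack([cross_val_predict(m, X, y, cv=5, method="predict_proba")
                       for m in level0])        # level-1 training data
        level1 = LogisticRegression(max_iter=1000).fit(Z, y)
        for m in level0:                        # refit level-0 models on all data
            m.fit(X, y)
        return level0, level1

    def stack_predict(level0, level1, X):
        Z = np.hstack([m.predict_proba(X) for m in level0])
        return level1.predict(Z)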

Empirical comparison (Ting, 1999): stacked generalization was compared with the best single model and with arcing and bagging; the stack combined C4.5, naïve Bayes, and a nearest-neighbor learner, and used multi-response linear regression as the generalizer.

Results: better performance (accuracy) than the best level-0 model; using continuous estimates works better than using the predicted class; better than majority vote; performance similar to arcing and bagging; well suited to parallel computation (like bagging).

Related work: decomposition of the problem into subtasks. Mixture of experts (Jacobs, 1991): the difference is that each expert takes care of a certain region of the input space. Hierarchical neural networks: the difference is that cases are routed to pre-defined expert networks.

Ideas for final projects: compare single classifiers, bagging, and boosting on other classifiers (e.g., logistic regression, rough sets); use other data sets for comparison; use other performance measures; study the effect of the voting scheme; try to find a relationship between initial performance, number of cases, and number of classifiers within an ensemble; genetic search for good, diverse classifiers; analyze the effect of prior outlier removal on boosting.