Bagging and Boosting in Data Mining
Carolina Ruiz

Motivation and Background

Problem Definition:
Given: a dataset of instances and a target concept.
Find: a model (e.g., a set of association rules, a decision tree, or a neural network) that helps in predicting the classification of unseen instances.

Difficulties:
- The model should be stable, i.e., it should not depend too much on the input data used to construct it.
- The model should be a good predictor (difficult to achieve when the input dataset is small).

Two Approaches

Bagging (Bootstrap Aggregating): Leo Breiman, UC Berkeley.
Boosting: Rob Schapire, AT&T Research; Jerry Friedman, Stanford University.

Bagging

Model creation: create bootstrap replicates of the dataset and fit a model to each one.
Prediction: average/vote the predictions of the individual models.

Advantages:
- Stabilizes "unstable" methods.
- Easy to implement and parallelizable.

Bagging Algorithm

1. Create k bootstrap replicates of the dataset.
2. Fit a model to each of the replicates.
3. Average/vote the predictions of the k models.
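
A minimal sketch of these three steps in Python, assuming scikit-learn decision trees as the base learner, k = 25 replicates, and integer class labels for the majority vote; the function names and defaults here are illustrative, not from the slides.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=25, seed=0):
    # Steps 1-2: create k bootstrap replicates and fit a model to each.
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # Step 3: majority vote across the k models (use the mean for regression).
    preds = np.stack([m.predict(X) for m in models]).astype(int)  # (k, n) array
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)

Because each replicate is drawn with replacement, roughly 37% of the instances are left out of any given replicate, which is what makes the fitted models differ enough for voting to stabilize the result.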

Boosting

Model creation: construct a sequence of datasets and models such that each dataset in the sequence weights an instance heavily when the previous model misclassified it.
Prediction: "merge" the models in the sequence.

Advantage: improves classification accuracy.

Generic Boosting Algorithm

1. Equally weight all instances in the dataset.
2. For i = 1 to T:
   2.1. Fit a model to the current (weighted) dataset.
   2.2. Upweight poorly predicted instances.
   2.3. Downweight well-predicted instances.
3. Merge the models in the sequence to obtain the final model.
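
One concrete instance of this generic loop is discrete AdaBoost, sketched below; the choices here (binary labels coded as -1/+1, decision stumps as base models, T = 50 rounds) are assumptions for illustration, not details from the slides.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    # Assumes y is coded as -1/+1 for the multiplicative weight update below.
    n = len(X)
    w = np.full(n, 1.0 / n)               # step 1: equally weight all instances
    models, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)  # step 2.1: fit to the weighted data
        pred = stump.predict(X)
        err = w[pred != y].sum()
        if err >= 0.5:                    # no better than chance: stop early
            break
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)    # steps 2.2-2.3: misclassified weights
        w /= w.sum()                      # grow, correct ones shrink
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    # Step 3: merge the sequence by an alpha-weighted vote.
    return np.sign(sum(a * m.predict(X) for a, m in zip(alphas, models)))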

Conclusions and References

Boosted naïve Bayes tied for first place in the 1997 KDD Cup.

Reference: John F. Elder and Greg Ridgeway, "Combining Estimators to Improve Performance," KDD-99 tutorial notes.