Bagging and Boosting in Data Mining
Carolina Ruiz

Motivation and Background

Problem Definition:
Given: a dataset of instances and a target concept.
Find: a model (e.g., a set of association rules, a decision tree, or a neural network) that helps in predicting the classification of unseen instances.

Difficulties:
- The model should be stable, i.e., it should not depend too much on the input data used to construct it.
- The model should be a good predictor (difficult to achieve when the input dataset is small).

Two Approaches

Bagging (Bootstrap Aggregating): Leo Breiman, UC Berkeley.
Boosting: Rob Schapire, AT&T Research; Jerry Friedman, Stanford University.

Bagging

Model creation: create bootstrap replicates of the dataset and fit a model to each one.
Prediction: average/vote the predictions of the individual models.

Advantages:
- Stabilizes "unstable" methods.
- Easy to implement and parallelizable.

Bagging Algorithm

1. Create k bootstrap replicates of the dataset.
2. Fit a model to each of the replicates.
3. Average/vote the predictions of the k models.
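
A minimal sketch of these three steps in Python, assuming scikit-learn decision trees as the base learner, k = 25 replicates, and integer class labels for the majority vote; the function names and defaults here are illustrative, not from the slides.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=25, seed=0):
    # Steps 1-2: create k bootstrap replicates and fit a model to each.
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # Step 3: majority vote across the k models (use the mean for regression).
    preds = np.stack([m.predict(X) for m in models]).astype(int)  # (k, n) array
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)

Because each replicate is drawn with replacement, roughly 37% of the instances are left out of any given replicate, which is what makes the fitted models differ enough for voting to stabilize the result.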

Boosting

Model creation: construct a sequence of datasets and models such that each dataset in the sequence weights an instance heavily when the previous model misclassified it.
Prediction: "merge" the models in the sequence.

Advantage: improves classification accuracy.

Generic Boosting Algorithm

1. Equally weight all instances in the dataset.
2. For i = 1 to T:
   2.1. Fit a model to the current (weighted) dataset.
   2.2. Upweight poorly predicted instances.
   2.3. Downweight well-predicted instances.
3. Merge the models in the sequence to obtain the final model.
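
One concrete instance of this generic loop is discrete AdaBoost, sketched below; the choices here (binary labels coded as -1/+1, decision stumps as base models, T = 50 rounds) are assumptions for illustration, not details from the slides.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    # Assumes y is coded as -1/+1 for the multiplicative weight update below.
    n = len(X)
    w = np.full(n, 1.0 / n)               # step 1: equally weight all instances
    models, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)  # step 2.1: fit to the weighted data
        pred = stump.predict(X)
        err = w[pred != y].sum()
        if err >= 0.5:                    # no better than chance: stop early
            break
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)    # steps 2.2-2.3: misclassified weights
        w /= w.sum()                      # grow, correct ones shrink
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    # Step 3: merge the sequence by an alpha-weighted vote.
    return np.sign(sum(a * m.predict(X) for a, m in zip(alphas, models)))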

Conclusions and References

Boosted naïve Bayes tied for first place in the 1997 KDD Cup.

Reference: John F. Elder and Greg Ridgeway, "Combining Estimators to Improve Performance," KDD-99 tutorial notes.