Ensemble Learning (Lecturer: Dr. Bo Yuan)

Presentation transcript:

Ensemble Learning (Lecturer: Dr. Bo Yuan)

Real World Scenarios (figure slides)

What is ensemble learning?
- Many individual learning algorithms are available: Decision Trees, Neural Networks, Support Vector Machines, ...
- Ensemble learning is the process by which multiple models are strategically generated and combined in order to better solve a particular machine learning problem.
- Motivations:
  - To improve the performance of a single model.
  - To reduce the likelihood of an unfortunate selection of a poor model.
- Multiple Classifier Systems: one idea, many implementations.
  - Bagging
  - Boosting

Algorithm Hierarchy
- Machine learning
  - Supervised learning
    - Classification
      - Single algorithms: SVM, DT, NN
      - Ensemble algorithms: Boosting, Bagging
  - Semi-supervised learning
  - Unsupervised learning
    - Clustering

Combination of Classifiers

Model Selection

Divide and Conquer

Diversity
- Diversity is the key to the success of ensemble learning:
  - Each model needs to correct the errors made by the other classifiers.
  - The ensemble does not work if all models are identical.
- Different learning algorithms: DT, SVM, NN, KNN, ...
- Different training processes:
  - Different parameters
  - Different training sets
  - Different feature sets
- Weak learners: easy to create different decision boundaries (e.g., stumps).
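
The slides contain no code; the following is a small Python sketch of my own (scikit-learn on synthetic data, not the lecture's material) of the point above: different learning algorithms trained on the same data produce models that disagree on part of the input space, which is exactly the diversity an ensemble needs.

from itertools import combinations
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Three different learning algorithms trained on the same data.
models = {
    "DT": DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y),
    "SVM": SVC(random_state=0).fit(X, y),
    "KNN": KNeighborsClassifier(n_neighbors=5).fit(X, y),
}
preds = {name: m.predict(X) for name, m in models.items()}

# Pairwise disagreement: the larger it is, the more room there is for the
# ensemble members to correct each other's mistakes.
for a, b in combinations(preds, 2):
    disagreement = np.mean(preds[a] != preds[b])
    print(f"{a} vs {b}: disagree on {disagreement:.1%} of the points")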

Combiners
- How to combine the outputs of the classifiers:
  - Averaging
  - Voting
    - Majority voting (used by Random Forest)
    - Weighted majority voting (used by AdaBoost)
  - Learned combiner
    - General combiner (Stacking)
    - Piecewise combiner (RegionBoost)
- No Free Lunch: no single combination rule is best for every problem.
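
A minimal sketch of the two voting rules listed above (my own illustration, assuming base classifiers that output labels in {-1, +1}; the vote matrix and weights are made up for the example):

import numpy as np

# Hypothetical outputs of three classifiers on four test points (rows = models).
votes = np.array([[ 1,  1, -1, -1],
                  [ 1, -1, -1,  1],
                  [-1,  1, -1,  1]])

# Plain majority voting: each model gets one vote.
majority = np.sign(votes.sum(axis=0))

# Weighted majority voting, AdaBoost-style: sign(sum_i alpha_i * h_i(x)).
alphas = np.array([0.9, 0.5, 0.2])   # illustrative per-model weights
weighted = np.sign(alphas @ votes)

print("Majority voting:         ", majority)
print("Weighted majority voting:", weighted)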

Bagging

Bootstrap Samples (figure: Sample 1, Sample 2 and Sample 3 drawn from the original training set)
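
The figure itself is not reproduced in this transcript; the short Python sketch below (my own) shows what drawing one bootstrap sample means and why each sample contains roughly two thirds of the original points, as noted on the Random Forests slide.

import numpy as np

rng = np.random.default_rng(0)
N = 1000
indices = rng.integers(0, N, size=N)     # draw N indices with replacement
unique_fraction = len(np.unique(indices)) / N
print(f"Fraction of original points in the bootstrap sample: {unique_fraction:.3f}")
# Expected value: 1 - (1 - 1/N)**N, which is about 1 - 1/e ~ 0.632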

A Decision Tree

Tree vs. Forest

Random Forests
- Developed by Prof. Leo Breiman, the inventor of CART.
  - Breiman, L.: Random Forests. Machine Learning 45(1), 5-32, 2001.
- Bootstrap Aggregation (Bagging):
  - Resample with replacement.
  - Each tree uses around two thirds of the original data.
- A collection of CART-like trees:
  - Binary partitions
  - No pruning
  - Inherent randomness
- Majority voting over the trees.

RF Main Features
- Generates substantially different trees:
  - Uses random bootstrap samples of the training data.
  - Uses a random subset of variables at each node.
- Number of variables per node: square root of K, where K is the total number of available variables.
  - Can dramatically speed up the tree building process.
- Number of trees: 500 or more.
- Self-testing:
  - Around one third of the original data is left out of each bootstrap sample.
  - These Out of Bag (OOB) data act like a built-in cross-validation set.

RF Advantages
- All data can be used in the training process:
  - No need to leave some data out for testing or to do conventional cross-validation.
  - The OOB data are used to evaluate the current tree.
- Performance of the entire RF: each data point is tested over the subset of trees for which it is OOB.
- High levels of predictive accuracy with only a few parameters to experiment with.
- Suitable for both classification and regression.
- Resistant to overtraining (overfitting).
- No need for prior feature selection.
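
The settings above map directly onto scikit-learn's RandomForestClassifier; the snippet below is a minimal sketch of my own (synthetic data, not the lecture's code) of a forest with 500 trees, sqrt(K) candidate variables per split, and OOB self-testing.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=25, random_state=0)

rf = RandomForestClassifier(
    n_estimators=500,       # "500 or more" trees
    max_features="sqrt",    # try sqrt(K) variables at each node
    bootstrap=True,         # resample with replacement (bagging)
    oob_score=True,         # evaluate each point on the trees where it is OOB
    random_state=0,
)
rf.fit(X, y)
print("OOB accuracy estimate:", rf.oob_score_)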

Stacking

Boosting  Bagging reduces variance.  Bagging does not reduce bias.  In Boosting, classifiers are generated sequentially.  Focuses on most informative data points.  Training samples are weighted.  Outputs are combined via weighted voting.  Can create arbitrarily strong classifiers.  The base learners can be arbitrarily weak.  As long as they are better than random guess! 23

Boosting (diagram: training data fed to base classifiers h_1(x), h_2(x), h_3(x); test predictions combined by the boosting classifier H(x) = sign(∑ α_i h_i(x)))
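
As a concrete, runnable version of this scheme, here is a minimal AdaBoost sketch of my own (scikit-learn decision stumps on synthetic data, not the lecture's code); it reweights the training samples each round and combines the stumps with H(x) = sign(∑ α_i h_i(x)).

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
y = np.where(y == 1, 1, -1)            # AdaBoost works with labels in {-1, +1}

n_rounds = 20
w = np.full(len(y), 1.0 / len(y))      # start with uniform sample weights
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = np.sum(w[pred != y])          # weighted training error
    if err >= 0.5:                      # the weak learner must beat random guessing
        break
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
    w *= np.exp(-alpha * y * pred)      # up-weight the misclassified points
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Final classifier: H(x) = sign(sum_i alpha_i * h_i(x))
H = np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))
print("Training accuracy:", np.mean(H == y))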

AdaBoost

Demo (figure slides; "Which one is bigger?")

Example

The Choice of α

Error Bounds
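
The equations on the "The Choice of α" and "Error Bounds" slides are not preserved in this transcript. For reference, the standard AdaBoost argument is sketched below (my reconstruction, not copied from the slides): ε_t is the weighted error of weak learner h_t and Z_t is the weight-normalization factor of round t.

\[
  Z_t = (1-\epsilon_t)\,e^{-\alpha_t} + \epsilon_t\,e^{\alpha_t},
  \qquad
  \frac{\partial Z_t}{\partial \alpha_t} = 0
  \;\Longrightarrow\;
  \alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t},
  \qquad
  Z_t = 2\sqrt{\epsilon_t(1-\epsilon_t)} \le 1 .
\]
\[
  \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}\big[H(x_i)\ne y_i\big]
  \;\le\; \prod_t Z_t
  \;=\; \prod_t 2\sqrt{\epsilon_t(1-\epsilon_t)},
  \qquad
  H(x)=\operatorname{sign}\Big(\sum_t \alpha_t h_t(x)\Big).
\]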

Summary of AdaBoost
- Advantages:
  - Simple and easy to implement.
  - No parameters to tune.
  - Proven upper bounds on the training error.
  - Often resistant to overfitting in practice.
- Disadvantages:
  - Suboptimal α values (it performs a form of steepest descent).
  - Sensitive to noise.
- Future work: theory, comprehensibility, new frameworks.

Fixed Weighting Scheme

Dynamic Weighting Scheme (diagram: each base classifier h_i(x) is paired with an estimator α_i(x); the combined classifier becomes H(x) = sign(∑ α_i(x) h_i(x)))

Boosting with Dynamic Weighting: RegionBoost, iBoost, DynaBoost, WeightBoost

RegionBoost  AdaBoost assigns fixed weights to models.  However, different models emphasize different regions.  The weights of models should be input-dependent.  Given an input, only invoke appropriate models.  Train a competency predictor for each model.  Estimate whether the model is likely make a right decision.  Use this information as the weight.  Many classifiers can be used such as KNN and Neural Networks.  Maclin, R.: Boosting classifiers regionally. AAAI, ,

RegionBoost (diagram: each base classifier is paired with a competency predictor)

RegionBoost with KNN
- To calculate the competency weight of model h_j at a point x_i:
  - Find the K nearest neighbors of x_i in the training set.
  - Calculate the percentage of these neighbors correctly classified by h_j.
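
A minimal sketch of this estimate (my own illustration; the function name knn_competency and the stump used in the usage example are hypothetical, not from the lecture):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def knn_competency(model, X_train, y_train, X_query, k=10):
    # Weight of `model` at each query point: the fraction of the point's K
    # nearest training neighbors that the model classifies correctly.
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_query)                  # K nearest neighbors of each query point
    correct = model.predict(X_train) == y_train      # which training points the model gets right
    return correct[idx].mean(axis=1)

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
weights = knn_competency(stump, X, y, X[:5], k=10)
print("Input-dependent weights for the first 5 points:", np.round(weights, 2))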

RegionBoost Results (figures)

Review
- What is ensemble learning, and what can it do for us?
- Two major types of ensemble learning:
  - Parallel (Bagging)
  - Sequential (Boosting)
- Different ways to combine models:
  - Averaging
  - Majority voting
  - Weighted majority voting
- Some representative algorithms:
  - Random Forests
  - AdaBoost
  - RegionBoost

Next Week's Class Talk
- Volunteers are needed for next week's class talk.
- Topic: Applications of AdaBoost
- Suggested reading: Robust Real-Time Object Detection, P. Viola and M. Jones, International Journal of Computer Vision 57(2).
- Length: 20 minutes plus question time.