CSSE463: Image Recognition, Day 31

Due tomorrow night: project plan
- Evidence that you've tried something, and what specifically you hope to accomplish.
- Maybe ¼ of your experimental work?
- Go, go, go!

This week
- Today: topic du jour: classification by "boosting"
  - Yoav Freund and Robert Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Proceedings of the 2nd European Conference on Computational Learning Theory, March 1995.
- Friday: project workday (class cancelled)
- Next Monday/Tuesday (and Wednesday?) are also project workdays.
  - I'll be free during the time class usually meets.
- You'll send a status report by the end of next week.

Questions?
Pedestrian Detection
- Ghost_outline.htm
- Thanks to Thomas Root for the link.
Motivation for AdaBoost
- SVMs are fairly expensive in terms of memory: you need to store a list of support vectors, which for RBF kernels can be long.
[Diagram: a monolithic SVM mapping input to output.]
- By the way, how does svmfwd compute y1? y1 is just the weighted sum of the contributions of the individual support vectors, plus a bias:
  y1 = sum over support vectors i of svcoeff(i) * K(x, sv(i,:)) + bias,
  where K is the RBF kernel, d is the data dimension (e.g., 294), and sv, svcoeff, and bias are learned during training.
- BTW: looking at which of your training examples are support vectors can be revealing! (Keep in mind for the term project.)
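A minimal MATLAB sketch of that weighted sum, just to make it concrete. This is illustrative, not the actual svmfwd code: the function name rbf_svm_output, the kernel-width parameter sigma, and the exact RBF form are assumptions; sv, svcoeff, and bias follow the names above.

function y1 = rbf_svm_output(x, sv, svcoeff, bias, sigma)
% x:       1 x d test sample            (d = data dimension, e.g., 294)
% sv:      nSV x d support vectors      (learned during training)
% svcoeff: nSV x 1 coefficients         (learned during training)
% bias:    scalar offset                (learned during training)
% sigma:   assumed RBF kernel width
d2 = sum(bsxfun(@minus, sv, x).^2, 2);   % squared distance to each support vector
k  = exp(-d2 / (2 * sigma^2));           % RBF kernel value per support vector
y1 = svcoeff' * k + bias;                % weighted sum of contributions, plus bias
end

Note the cost this slide is complaining about: evaluating y1 touches every support vector, so both memory and prediction time grow with the number of support vectors.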
Motivation
- SVMs are fairly expensive in terms of memory: you need to store a list of support vectors, which for RBF kernels can be long.
- Can we do better? Yes: consider simpler classifiers, like thresholding or decision trees.
- The idea of boosting is to combine the results of a number of "weak" classifiers in a smart way.
[Diagram: a monolithic SVM (input → output) contrasted with a "team of experts": weak learners L1–L4, weighted by w1–w4, whose outputs are combined into a single output.]
Idea of Boosting
- Consider each weak learner as an expert that produces "advice" (a classification result, possibly with confidence).
- We want to combine their predictions intelligently.
- Call the weak learner L multiple times, each time with a different distribution over the data (perhaps a subset of the whole training set).
- Use a weighting scheme in which each expert's advice is weighted by its accuracy on the training set.
  - This is not memorizing the training set, since each classifier is weak.
Notation
- The training set contains N labeled samples: {(x1, C(x1)), (x2, C(x2)), ..., (xN, C(xN))}.
- C(xi) is the class label for sample xi, either 1 or 0.
- w is a weight vector with one entry per example, saying how important each example is.
  - It can be initialized to uniform (1/N), or examples guessed to be "important" can be weighted more heavily.
- h is a hypothesis (weak classifier output) in the range [0,1]. When called with a vector of inputs, h is a vector of outputs.
AdaBoost Algorithm

Initialize the weights w (e.g., uniformly, as above).
for t = 1:T
  1. Set p_t = w_t / (sum over i of w_t(i))  — normalizes so the p weights sum to 1.
  2. Call weak learner L_t with x distributed according to p_t, and get back the result vector h_t (one output in [0,1] per training sample).
  3. Find the error ε_t of h_t:  ε_t = sum over i of p_t(i) * |h_t(xi) − C(xi)|  — the average error on the training set, weighted by p.
  4. Set β_t = ε_t / (1 − ε_t).
     Low overall error → β_t close to 0 (big impact on the next step); error of almost 0.5 → β_t almost 1 (small impact on the next step).
  5. Re-weight the training data:  w_{t+1}(i) = w_t(i) * β_t^(1 − |h_t(xi) − C(xi)|).
     This weights the ones it got right less, so next time it will focus more on the incorrect ones.
Output answer: combine the T hypotheses (next slide). A sketch of this loop follows below.
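A minimal MATLAB sketch of the training loop, just to make the five steps concrete. It is illustrative, not toolbox code: adaboost_train is a hypothetical name, and weak_learn is an assumed function handle that, given the data, labels, and distribution p, returns an N x 1 hypothesis vector in [0,1] on the training samples.

function [betas, hyps] = adaboost_train(x, y, T, weak_learn)
% x: N x d training samples; y: N x 1 labels in {0,1}; T: number of rounds.
% weak_learn(x, y, p) is assumed to return an N x 1 hypothesis vector in [0,1].
N = size(x, 1);
w = ones(N, 1) / N;                         % initialize weights uniformly
betas = zeros(T, 1);
hyps  = cell(T, 1);
for t = 1:T
    p = w / sum(w);                         % 1. normalize so p sums to 1
    h = weak_learn(x, y, p);                % 2. call the weak learner
    err = sum(p .* abs(h - y));             % 3. error weighted by p
    betas(t) = err / (1 - err);             % 4. low error -> beta near 0
    w = w .* betas(t) .^ (1 - abs(h - y));  % 5. shrink weights of examples it got right
    hyps{t} = h;                            % kept for inspection; in practice, store the
end                                         %    learner itself so it can label new data
end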
AdaBoost Algorithm (continued)

Output answer:
  h_final(x) = 1 if sum over t of log(1/β_t) * h_t(x) ≥ ½ * (sum over t of log(1/β_t)), and 0 otherwise.
- This combines the predictions made at all T iterations.
- Label as class 1 if the weighted average of all T predictions is over ½.
- Each prediction is weighted by the error of the classifier at that iteration, via log(1/β_t), so better classifiers are weighted more heavily.
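A matching illustrative sketch of this combination step (again hypothetical names, not toolbox code). It assumes hyps_new is a T x 1 cell array holding each round's hypothesis values, in [0,1], on the M samples to be classified; in practice you would store each weak learner from training and evaluate it on the new data to build this.

function label = adaboost_predict(hyps_new, betas)
% hyps_new: T x 1 cell array, each an M x 1 vector of outputs in [0,1]
% betas:    T x 1 vector of beta values from training
T = numel(betas);
alpha = log(1 ./ betas);                  % better rounds (smaller beta) get larger weight
score = zeros(numel(hyps_new{1}), 1);
for t = 1:T
    score = score + alpha(t) * hyps_new{t};
end
label = score >= 0.5 * sum(alpha);        % class 1 iff weighted average exceeds 1/2
end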
Practical Issues
- For binary problems, weak learners have to have at least 50% accuracy (better than chance).
- Running time for training is O(nT), for n samples and T iterations.
- How do we choose T?
  - There is a trade-off between training time and accuracy.
  - Best determined experimentally (see the optT demo); one way is sketched below.
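A generic sketch of choosing T experimentally (this is not the optT demo itself; choose_T and hyps_val are hypothetical). The idea: run boosting for many rounds, then measure the validation error of the combined classifier built from only the first t rounds, for t = 1..T, and keep the t with the lowest error.

function [bestT, val_err] = choose_T(hyps_val, betas, y_val)
% hyps_val: T x 1 cell array, each an M x 1 hypothesis vector in [0,1] on validation data
% betas:    T x 1 beta values from training; y_val: M x 1 labels in {0,1}
T = numel(betas);
alpha = log(1 ./ betas);
score  = zeros(numel(y_val), 1);
thresh = 0;
val_err = zeros(T, 1);
for t = 1:T
    score  = score + alpha(t) * hyps_val{t};        % running weighted sum of predictions
    thresh = thresh + 0.5 * alpha(t);               % running 1/2 * sum of weights
    val_err(t) = mean((score >= thresh) ~= y_val);  % error using only the first t rounds
end
[~, bestT] = min(val_err);
end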
Multiclass Problems
- AdaBoost can be generalized to multiple classes using the variants AdaBoost.M1 and AdaBoost.M2.
- The challenge is finding a weak learner with over 50% accuracy when there are lots of classes.
  - Random chance gives only about 17% with 6 classes.
  - Version M2 handles this.
Available Software
- GML AdaBoost Matlab Toolbox
  - Weak learner: classification tree
  - Uses less memory if the weak learner is simple
  - Better accuracy than SVM in tests (on toy and real problems)
  - Demo/visualization to follow