CSC2515 Fall 2008 Introduction to Machine Learning. Lecture 11a: Boosting and Naïve Bayes. All lecture slides will be available as .ppt, .ps, & .htm at www.cs.toronto.edu/~hinton.

Many of the figures are provided by Chris Bishop from his textbook "Pattern Recognition and Machine Learning".

A commonsense way to use limited computational resources
First train a model on all of the data.
– Let's assume it gets the great majority of the cases right.
Then train another model on all the cases the first model got wrong, plus an equal number that it got right.
– This focusses the resources on modelling the hard cases.
Then train a third model focussing on cases that either or both of the previous models got wrong.
– Finally, use a simple committee of the three models.
This is quite effective for learning to recognize handwritten digits, but it is also very heuristic.
– Can we give it a theoretical foundation?
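A minimal sketch of this heuristic in Python, assuming scikit-learn-style classifiers with fit/predict. The shallow decision tree, the helper names (make_model, train_committee), and the ±1 label convention are illustrative assumptions, not part of the slides:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def make_model():
    # Any classifier with fit/predict would do; a shallow tree is a placeholder.
    return DecisionTreeClassifier(max_depth=3)

def train_committee(X, y, rng=np.random.default_rng(0)):
    m1 = make_model().fit(X, y)
    wrong1 = m1.predict(X) != y

    # Second model: all cases m1 got wrong, plus an equal number it got right
    # (assumes, as the slide does, that m1 gets the great majority right).
    right_idx = rng.choice(np.flatnonzero(~wrong1), size=wrong1.sum(), replace=False)
    idx2 = np.concatenate([np.flatnonzero(wrong1), right_idx])
    m2 = make_model().fit(X[idx2], y[idx2])

    # Third model: cases that either (or both) of the previous models got wrong.
    wrong2 = m2.predict(X) != y
    idx3 = np.flatnonzero(wrong1 | wrong2)
    m3 = make_model().fit(X[idx3], y[idx3])
    return [m1, m2, m3]

def committee_predict(models, X):
    # Simple (unweighted) majority vote; assumes labels in {-1, +1}.
    votes = np.stack([m.predict(X) for m in models])
    return np.sign(votes.sum(axis=0))
```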

Making weak learners stronger
Suppose you have a weak learning module (a "base classifier") that can always get 0.5 + epsilon of the cases correct when given a two-way classification task.
– That seems like a weak assumption, but beware!
Can you apply this learning module many times to get a strong learner that can get close to zero error rate on the training data?
– Theorists showed how to do this, and it actually led to an effective new learning procedure (Freund & Schapire, 1996).

Boosting (AdaBoost)
First train the base classifier on all the training data with equal importance weights on each case.
Then re-weight the training data to emphasize the hard cases and train a second model.
– How do we re-weight the data? Maybe some theory can help.
Keep training new models on re-weighted data.
Finally, use a weighted committee of all the models for the test data.
– How do we weight the models in the committee? Maybe some theory can help.

How to train each classifier
Train classifier $y_m(\mathbf{x}) \in \{-1, +1\}$ to minimize its weighted error on the training cases:

$$J_m = \sum_{n=1}^{N} w_n^{(m)} \, I\big(y_m(\mathbf{x}_n) \neq t_n\big)$$

where $t_n \in \{-1, +1\}$ is the target for case $n$ and the indicator $I(\cdot)$ is 1 if error, 0 if correct.

How to weight each training case for classifier m
The weighted error rate of classifier $m$ is

$$\epsilon_m = \frac{\sum_n w_n^{(m)} \, I\big(y_m(\mathbf{x}_n) \neq t_n\big)}{\sum_n w_n^{(m)}}$$

and the quality of the classifier is

$$\alpha_m = \ln\!\left(\frac{1 - \epsilon_m}{\epsilon_m}\right)$$

This is zero if the classifier has a weighted error rate of 0.5 and infinity if the classifier is perfect. The weights for the next classifier are then

$$w_n^{(m+1)} = w_n^{(m)} \exp\big\{\alpha_m \, I\big(y_m(\mathbf{x}_n) \neq t_n\big)\big\}$$

so the cases that classifier $m$ gets wrong are up-weighted.
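As a quick worked check of these formulas (the numbers are chosen purely for illustration):

$$\epsilon_m = 0.25 \;\Rightarrow\; \alpha_m = \ln\frac{1 - 0.25}{0.25} = \ln 3 \approx 1.10, \qquad e^{\alpha_m} = 3$$

so a classifier that gets a quarter of the weighted cases wrong has quality $\ln 3$, and every case it misclassifies has its weight tripled before the next classifier is trained.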

How to make predictions using a committee of classifiers
Weight the binary prediction of each classifier by the quality of that classifier:

$$Y_M(\mathbf{x}) = \operatorname{sign}\!\left(\sum_{m=1}^{M} \alpha_m \, y_m(\mathbf{x})\right)$$

This completely ignores the confidence of each classifier, which is very un-Bayesian and probably inefficient, but it may make it easier to prove things.
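Putting the last three slides together, here is a minimal sketch of AdaBoost in Python. The choice of decision stumps as the base classifier, the cap of M = 50 rounds, and the numerical clipping are illustrative assumptions, not part of the original slides:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, t, M=50):
    """Train an AdaBoost committee; targets t must be in {-1, +1}."""
    N = len(t)
    w = np.full(N, 1.0 / N)                        # start with equal importance weights
    models, alphas = [], []
    for m in range(M):
        stump = DecisionTreeClassifier(max_depth=1)    # a weak base classifier
        stump.fit(X, t, sample_weight=w)
        wrong = stump.predict(X) != t                  # 1 if error, 0 if correct
        eps = np.clip(w[wrong].sum() / w.sum(), 1e-10, 1 - 1e-10)
        if eps >= 0.5:                                 # no better than chance: stop
            break
        alpha = np.log((1 - eps) / eps)                # quality of this classifier
        w = w * np.exp(alpha * wrong)                  # up-weight the hard cases
        w = w / w.sum()
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    # Weight the binary prediction of each classifier by its quality.
    scores = sum(a * m.predict(X) for a, m in zip(alphas, models))
    return np.sign(scores)
```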

An alternative derivation of AdaBoost
Just write down the right cost function and let each classifier minimize it greedily (Friedman et al., 2000). The exponential loss for classifier $m$ is

$$E = \sum_{n=1}^{N} \exp\big\{-t_n \, f_m(\mathbf{x}_n)\big\}$$

where

$$f_m(\mathbf{x}) = \frac{1}{2} \sum_{l=1}^{m} \alpha_l \, y_l(\mathbf{x})$$

is the real-valued prediction made by the committee of models up to $m$.

Learning classifier m using exponential loss
Hold the committee of the first $m-1$ classifiers fixed and optimize only classifier $m$ and its coefficient $\alpha_m$. The loss then factors into a fixed per-case weight and a term involving classifier $m$:

$$E = \sum_{n=1}^{N} \exp\big\{-t_n f_{m-1}(\mathbf{x}_n) - \tfrac{1}{2}\, t_n \alpha_m y_m(\mathbf{x}_n)\big\} = \sum_{n=1}^{N} w_n^{(m)} \exp\big\{-\tfrac{1}{2}\, t_n \alpha_m y_m(\mathbf{x}_n)\big\}$$

where $w_n^{(m)} = \exp\{-t_n f_{m-1}(\mathbf{x}_n)\}$.

Re-writing the part of the exponential loss that is relevant when fitting classifier m
Split the sum into the cases classifier $m$ gets right and the cases it gets wrong:

$$E = \underbrace{\big(e^{\alpha_m/2} - e^{-\alpha_m/2}\big)}_{\text{multiplicative constant}} \underbrace{\sum_{n} w_n^{(m)}\, I\big(y_m(\mathbf{x}_n) \neq t_n\big)}_{\text{wrong cases}} \;+\; \underbrace{e^{-\alpha_m/2} \sum_{n} w_n^{(m)}}_{\text{unmodifiable}}$$

So the best choice of classifier $m$ is the one that minimizes the weighted error rate, exactly as in AdaBoost.
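One step the slides leave implicit: minimizing $E$ with respect to $\alpha_m$ recovers the earlier weighting rule. Dividing $E$ by the constant $\sum_n w_n^{(m)}$ and writing $\epsilon_m$ for the weighted error rate:

$$\frac{\partial}{\partial \alpha_m}\Big[\big(e^{\alpha_m/2} - e^{-\alpha_m/2}\big)\,\epsilon_m + e^{-\alpha_m/2}\Big] = \tfrac{1}{2}\big(e^{\alpha_m/2} + e^{-\alpha_m/2}\big)\,\epsilon_m - \tfrac{1}{2}\, e^{-\alpha_m/2} = 0$$

$$\Rightarrow\; \epsilon_m\, e^{\alpha_m/2} = (1 - \epsilon_m)\, e^{-\alpha_m/2} \;\Rightarrow\; \alpha_m = \ln\!\left(\frac{1 - \epsilon_m}{\epsilon_m}\right)$$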

An impressive example of boosting
Viola and Jones created a very fast face detector that can be scanned across a large image to find the faces.
The base classifier just compares the total intensity in two rectangular pieces of the image.
– There is a neat trick for computing the total intensity in a rectangle in a few operations. So it's easy to evaluate a huge number of base classifiers, and they are very fast at runtime.
– We add classifiers greedily based on their quality on the weighted training cases.
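The "neat trick" is the integral image (summed-area table): precompute cumulative sums once, and the total intensity of any rectangle then costs just four lookups. A minimal NumPy sketch; the function names are illustrative, not from the Viola-Jones code:

```python
import numpy as np

def integral_image(img):
    # ii[r, c] = sum of img[0:r, 0:c]; the extra row/column of zeros
    # avoids special cases at the image border.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    # Total intensity of any rectangle from just four lookups.
    b, r = top + height, left + width
    return ii[b, r] - ii[top, r] - ii[b, left] + ii[top, left]

def two_rect_feature(ii, top, left, height, width):
    # A Viola-Jones-style base classifier compares the total intensity
    # in two rectangles, e.g. this two-rectangle "edge" feature.
    return (rect_sum(ii, top, left, height, width)
            - rect_sum(ii, top + height, left, height, width))
```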