Advanced Machine Learning & Perception
Instructor: Tony Jebara, Columbia University

Boosting
Combining multiple classifiers, voting, boosting, AdaBoost.
Based on material by Y. Freund, P. Long & R. Schapire.

Combining Multiple Learners
- Have many simple learners, also called base learners or weak learners, each with classification error < 0.5.
- Combine or vote them to get a higher accuracy.
- No free lunch: there is no guaranteed best approach here.
- Different approaches:
  - Voting: combine learners with fixed weights.
  - Mixture of Experts: adapt the learners and use a variable weight / gating function.
  - Boosting: actively search for the next base learner and vote.
  - Also cascading, stacking, bagging, etc.

Voting
- Have T classifiers $h_1, \dots, h_T$.
- Average their predictions with fixed weights: $\hat{y}(x) = \mathrm{sign}\big(\sum_{t=1}^{T} \alpha_t h_t(x)\big)$.
- Like a mixture of experts, but the weights are constant with respect to the input x.
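As a concrete illustration, a minimal sketch of fixed-weight voting in Python (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def vote(classifiers, alphas, X):
    """Fixed-weight voting: sign of the alpha-weighted sum of +/-1 predictions.
    classifiers: callables mapping an (N, d) array to a vector of +/-1 labels.
    alphas: fixed weights that do not depend on the input x."""
    scores = sum(a * clf(X) for a, clf in zip(alphas, classifiers))
    return np.sign(scores)
```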

Mixture of Experts
- Have T classifiers (experts) and a gating function.
- Average their predictions with variable, input-dependent weights: $\hat{y}(x) = \sum_{t=1}^{T} g_t(x)\, h_t(x)$.
- But adapt both the parameters of the gating function and of the experts (the total number T of experts is fixed).
[Figure: gating network combining the expert outputs for a given input.]
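For contrast with voting, a sketch of how a mixture of experts combines predictions with input-dependent gating weights (a toy interface, assuming the gate returns one weight per expert per example):

```python
import numpy as np

def mixture_of_experts(experts, gate, X):
    """Variable-weight combination: the gating network produces input-dependent
    weights g_t(x) (each row sums to 1), unlike the fixed weights of plain voting."""
    G = gate(X)                                    # shape (N, T): gating weights per example
    P = np.column_stack([e(X) for e in experts])   # shape (N, T): expert predictions
    return np.sign((G * P).sum(axis=1))            # gated average, then threshold
```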

Boosting
- Actively find complementary or synergistic weak learners.
- Train the next learner based on the mistakes of the previous ones: find it by training on weighted versions of the data.
- Average the predictions with fixed weights.
[Figure: boosting loop - weights on the training data feed the weak learner, which adds a weak rule to the ensemble; the ensemble's prediction determines the next weights.]

AdaBoost
- The most popular weighting scheme.
- Define the margin for point i as $m_i = y_i f(x_i) = y_i \sum_t \alpha_t h_t(x_i)$.
- Find an $h_t$ and a weight $\alpha_t$ to minimize the cost function, the sum of exponentiated negative margins: $J = \sum_i \exp(-y_i f(x_i))$.

AdaBoost
- Choose the base learner and $\alpha_t$: pick $h_t$ to minimize the weighted error $\epsilon_t = \sum_i w_i^{(t)} \mathbf{1}[h_t(x_i) \ne y_i]$.
- Recall the error of the base classifier $h_t$ must be $\epsilon_t < 1/2$.
- For binary h, AdaBoost puts this weight on the weak learners: $\alpha_t = \tfrac{1}{2}\ln\!\big(\tfrac{1-\epsilon_t}{\epsilon_t}\big)$ (instead of the more general rule).
- AdaBoost picks the following weights on the data for the next round: $w_i^{(t+1)} = \tfrac{1}{Z_t}\, w_i^{(t)} \exp(-\alpha_t y_i h_t(x_i))$ (here $Z_t$ is the normalizer so the weights sum to 1).
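Putting these update rules together, a minimal AdaBoost with decision stumps (a sketch: the stump search is brute force and all names are illustrative):

```python
import numpy as np

def best_stump(X, y, w):
    """Return the (feature, threshold, polarity) stump with lowest weighted error."""
    best_err, best_params = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (+1, -1):
                pred = np.where(X[:, j] > thr, pol, -pol)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best_params = err, (j, thr, pol)
    return best_err, best_params

def adaboost(X, y, T=50):
    """y must be in {-1, +1}. Returns the stumps and their weights alpha_t."""
    n = len(y)
    w = np.full(n, 1.0 / n)                    # uniform weights on round 1
    stumps, alphas = [], []
    for _ in range(T):
        err, (j, thr, pol) = best_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)   # guard against a perfect or useless stump
        alpha = 0.5 * np.log((1 - err) / err)  # alpha_t = 1/2 ln((1 - eps_t) / eps_t)
        pred = np.where(X[:, j] > thr, pol, -pol)
        w *= np.exp(-alpha * y * pred)         # up-weight mistakes, down-weight correct points
        w /= w.sum()                           # divide by Z_t so the weights sum to 1
        stumps.append((j, thr, pol))
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, X):
    f = sum(a * np.where(X[:, j] > thr, pol, -pol)
            for a, (j, thr, pol) in zip(alphas, stumps))
    return np.sign(f)                          # H(x) = sign(sum_t alpha_t h_t(x))
```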

Decision Trees
[Figure: a small decision tree that splits on X > 3 and then Y > 5, with +1/-1 leaf predictions, shown next to the corresponding partition of the (X, Y) plane.]

Decision Tree as a Sum
[Figure: the same tree rewritten as a sum of real-valued scores (a -0.2 baseline plus per-split contributions for the X and Y tests); the prediction is the sign of the total.]

An Alternating Decision Tree
[Figure: an alternating decision tree extending the previous sum with an additional splitter (Y < 1, score 0.0); each example accumulates the scores of all conditions it satisfies, and the prediction is the sign of the sum.]

Example: Medical Diagnostics
- Cleve dataset from the UC Irvine database.
- Heart disease diagnostics (+1 = healthy, -1 = sick).
- 13 features from tests (real-valued and discrete).
- 303 instances.

AD-Tree Example
[Figure: alternating decision tree learned on the Cleve heart-disease data.]

Cross-Validated Accuracy

Learning algorithm | Number of splits | Average test error | Test error variance
ADtree             | 6                | 17.0%              | 0.6%
C5.0               | -                | -                  | 0.5%
C5.0 + boosting    | -                | -                  | 0.5%
Boost Stumps       | -                | -                  | 0.8%

(entries marked "-" were lost in the transcript)

AdaBoost Convergence
- Rationale? Consider a bound on the training error:
  $\frac{1}{N}\sum_i \mathbf{1}[y_i \ne \mathrm{sign}(f(x_i))] \le \frac{1}{N}\sum_i \exp(-y_i f(x_i)) = \prod_t Z_t$
  (using the exponential bound on the 0-1 step loss, the definition of f(x), and the recursive use of Z).
- AdaBoost is essentially doing gradient descent on this exponential loss.
[Figure: losses as a function of the margin - the 0-1 loss separating mistakes from correct examples, and the smoother surrogates used by AdaBoost (exponential), LogitBoost and BrownBoost.]

AdaBoost Convergence
- Convergence? Consider the binary $h_t$ case, where $Z_t = 2\sqrt{\epsilon_t(1-\epsilon_t)}$.
- If every weak learner beats chance by $\gamma$, i.e. $\epsilon_t \le \tfrac{1}{2} - \gamma$, then $Z_t \le \sqrt{1-4\gamma^2}$ and the training error is at most $\prod_t Z_t \le e^{-2\gamma^2 T}$.
- So the final learner converges exponentially fast in T if each weak learner is at least better than chance by gamma!
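For reference, the chain of inequalities behind this claim (the standard AdaBoost training-error analysis, with $\gamma_t = 1/2 - \epsilon_t$), written out as a LaTeX display:

```latex
\frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\big[y_i \ne \operatorname{sign}(f(x_i))\big]
  \;\le\; \prod_{t=1}^{T} Z_t
  \;=\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)}
  \;=\; \prod_{t=1}^{T} \sqrt{1-4\gamma_t^2}
  \;\le\; \exp\Big(-2\sum_{t=1}^{T}\gamma_t^2\Big)
  \;\le\; e^{-2\gamma^2 T} \quad\text{when every } \gamma_t \ge \gamma .
```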

Curious Phenomenon
Boosting decision trees, using 2,000,000 parameters.
[Figure: training and test error versus the number of boosting rounds - the test error keeps improving even after the training error has flattened out.]

Explanation Using Margins
[Figure: the 0-1 loss as a function of the margin $y f(x)$.]

Explanation Using Margins
[Figure: the 0-1 loss versus the margin.] After enough rounds of boosting there are no examples with small margins!!

Experimental Evidence
[Figure: cumulative distributions of training-set margins after different numbers of boosting rounds.]

AdaBoost Generalization Bound
- A VC analysis gives a generalization bound: with high probability,
  $\mathrm{err}_{\mathrm{test}} \le \mathrm{err}_{\mathrm{train}} + \tilde{O}\Big(\sqrt{\tfrac{T d}{N}}\Big)$
  (where d is the VC dimension of the base classifier, T the number of rounds and N the number of samples).
- But this suggests more iterations lead to overfitting!
- A margin analysis is possible; redefine the (normalized) margin as
  $\mathrm{margin}(x, y) = \dfrac{y \sum_t \alpha_t h_t(x)}{\sum_t \alpha_t}$.
- Then, for any $\theta > 0$,
  $\mathrm{err}_{\mathrm{test}} \le \Pr_{\mathrm{train}}\big[\mathrm{margin}(x,y) \le \theta\big] + \tilde{O}\Big(\sqrt{\tfrac{d}{N\theta^2}}\Big)$,
  which does not depend on the number of rounds T.

AdaBoost Generalization Bound
The margin bound suggests this optimization problem: maximize the minimum margin over the training set,
$\max_{\alpha} \min_i \dfrac{y_i \sum_t \alpha_t h_t(x_i)}{\sum_t \alpha_t}$.

AdaBoost Generalization Bound
Proof sketch. [Derivation on the slide not captured in the transcript.]

UCI Results (% test error rates)

Database   | Other       | Boosting | Error reduction
Cleveland  | 27.2 (DT)   | 16.5     | 39%
Promoters  | 22.0 (DT)   | 11.8     | 46%
Letter     | 13.8 (DT)   | 3.5      | 74%
Reuters 4  | 5.8, 6.0, - | -        | ~60%
Reuters 8  | 11.3, 12.1, - | -      | ~40%

(entries marked "-" were lost in the transcript)

Boosted Cascade of Stumps
- Consider classifying an image as face/not-face.
- Use as weak learner a stump on a rectangle feature that compares average pixel intensities: easy to calculate, the sum over the white areas is subtracted from the sum over the black ones.
- A special representation of the sample, called the integral image, makes feature extraction faster.

Boosted Cascade of Stumps
Summed-area tables: a representation in which any rectangle's sum can be computed with four accesses of the integral image.
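A small sketch of the summed-area table and the four-access rectangle sum (the array layout with an extra zero row and column is my choice, not from the slides; integer pixel intensities assumed):

```python
import numpy as np

def integral_image(img):
    """ii[r, c] = sum of img[:r, :c]; the extra zero row/column avoids special cases."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, r, c, h, w):
    """Sum of the h-by-w rectangle with top-left corner (r, c): four accesses of ii."""
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]
```

A two-rectangle feature value is then just the difference of two rect_sum calls (white region minus black region).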

Boosted Cascade of Stumps
[Figure: summed-area table illustration.]

Boosted Cascade of Stumps
- The base size for a sub-window is 24 by 24 pixels.
- Each of the four feature types is scaled and shifted across all possible combinations.
- In a 24 pixel by 24 pixel sub-window there are ~160,000 possible features to be calculated.
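A quick enumeration that reproduces this count (a sketch: it sums all placements and scales of the five standard rectangle templates, with the three-rectangle feature counted in both orientations):

```python
def count_haar_features(win=24):
    """Count every position and scale of the rectangle templates in a win x win window.
    Base cells: 2x1 and 1x2 (two-rectangle), 3x1 and 1x3 (three-rectangle), 2x2 (four-rectangle)."""
    total = 0
    for dw, dh in [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]:
        for w in range(dw, win + 1, dw):                 # widths that are multiples of the template
            for h in range(dh, win + 1, dh):             # heights that are multiples of the template
                total += (win - w + 1) * (win - h + 1)   # all top-left placements
    return total

print(count_haar_features(24))   # 162336, i.e. roughly 160,000
```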

Boosted Cascade of Stumps
- Viola-Jones algorithm: with K attributes (e.g., K = 160,000) we have 160,000 different decision stumps to choose from.
- At each stage of boosting, given the reweighted data from the previous stage:
  - Train all K (160,000) single-feature perceptrons.
  - Select the single best classifier at this stage.
  - Combine it with the previously selected classifiers.
  - Reweight the data.
  - Learn all K classifiers again, select the best, combine, reweight.
  - Repeat until you have T classifiers selected.
- Very computationally intensive!

Boosted Cascade of Stumps
[Figure: reduction in error as boosting adds classifiers.]

Boosted Cascade of Stumps
First (i.e. best) two features learned by boosting.
[Figure: the two rectangle features overlaid on an example face.]

Boosted Cascade of Stumps
[Figure: example training data - face and non-face image patches.]

Boosted Cascade of Stumps
- To find faces, scan all squares at different scales: slow!
- Boosting finds an ordering on the weak learners (best ones first).
- Idea: cascade the stumps to avoid too much computation!
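A sketch of the cascade idea: cheap early stages reject most non-face windows so the expensive later stages run only on promising ones (the stage structure and thresholds here are illustrative, not the paper's exact interface):

```python
def cascade_detect(window, stages):
    """stages: list of (stumps, alphas, threshold); each stump maps a window to +/-1.
    A window must pass every stage to be declared a face."""
    for stumps, alphas, threshold in stages:
        score = sum(a * h(window) for a, h in zip(alphas, stumps))
        if score < threshold:
            return False          # rejected early: no further work on this window
    return True                   # survived all stages -> report a face
```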

Boosted Cascade of Stumps
- Training time: weeks (with 5k faces and 9.5k non-faces).
- The final detector has 38 layers in the cascade and 6060 features in total.
- On a 700 MHz processor it can process a 384 x 288 image in about 0.067 seconds (in 2003, when the paper was written).

Boosted Cascade of Stumps
Results. [Figure: example face-detection outputs.]