Lecture 6: Classification – Boosting and SVMs CAP 5415 Fall 2006

Course Project
Basic Requirement: Implement a vision algorithm.
How complex?
- The experiments/implementation details should be interesting enough for a 4-5 page write-up.
- If you choose a relatively simple algorithm, then you should do interesting experiments to test the algorithm's limits.

Groups
I encourage you to work in groups:
- You can do more interesting projects
- Your projects should be more interesting
Come talk to me if you would like to work in a group but don't know anyone.
Group write-up: 6-8 pages
Possible goal: CVPR07 submission (Dec 4)
- ~20% acceptance rate; don't plan on submitting second-rate work

How do I pick a project?
Strategy #1:
- Pick a topic that you think is interesting
- Read three papers on that topic
- Implement one, or implement your own solution
  - Could be original research
There is lots of opportunity in the area of computational photography.
- Come talk to me!!! I can point you to interesting papers that have come out recently.

Strategy #2
I have a few original research ideas:
- Computational photography
- Surveillance
- Object segmentation
Come talk to me to see what you're interested in and if you need help finding partners for a group project.
There is no advantage in terms of grading.

Q: I work in one of the vision groups. Can I just turn in my CVPR07 submission?
A: No

Well, actually...
Your project may be related to your current research, but it should not just be your current research project.
Examples:
- A related side project that you haven't had time to pursue in depth
- Application of algorithms that you have developed for one problem to a different problem
Either way, the project should have interesting experiments.

Getting it done
Write-ups due Dec 2.
Brief proposal due Nov 7th.
- I would prefer Oct 18th or 25th.
Whatever you work on, keep me updated!!!!
I am here to help!

Grading
I will give you feedback on your proposal.
- The earlier you touch base with me, the better.
Once we agree, if you do what your proposal stated and turn in a good-quality write-up, you will get an "A".
What if it doesn't work?
- It happens a lot!
- Turn in a good write-up explaining what went wrong, what you think the underlying problems are, and how you would fix them if you were to keep working on this project.
  - I'm not talking about "I didn't understand the math" or "My code kept crashing."
- You can still get an "A".

One last thing about projects
I will be scheduling project meetings to meet with each group at the end of November.
Class will be canceled on November 21; that class slot will be your project meeting.

What's wrong with this decision boundary?
(Assume this is the training data.)

What's wrong with this decision boundary?
What if you then tested on this data?
This decision boundary over-fit the training data.
- Overfitting this badly is hard to do with a linear classifier, but easy with a non-linear classifier.

How to tell if your classifier is overfitting
Strategy #1: Hold out part of your data as a test set.
- What if data is hard to come by?
Strategy #2: k-fold cross-validation (sketched below)
- Break the data set into k parts
- For each part: hold it out, train the classifier on the remaining data, and use the held-out part as a test set
- Slower than the test-set method
- More efficient use of limited data
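
As a concrete sketch, k-fold cross-validation looks like this in Python; the `train_fn`/`predict_fn` interface and all names are illustrative assumptions, not from the lecture:

```python
import numpy as np

def k_fold_cv(X, y, train_fn, predict_fn, k=5, seed=0):
    """Estimate test accuracy by k-fold cross-validation.
    X, y are numpy arrays; train_fn(X, y) -> model and
    predict_fn(model, X) -> labels are assumed interfaces."""
    n = len(y)
    idx = np.random.RandomState(seed).permutation(n)  # shuffle once
    folds = np.array_split(idx, k)                    # k roughly equal parts
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn(X[train], y[train])          # train on k-1 parts
        preds = predict_fn(model, X[test])            # test on held-out part
        accs.append(np.mean(preds == y[test]))
    return float(np.mean(accs))                       # average over folds
```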

Basic Set-up for Boosting
We want to learn a classifier $F(x)$.
We will assume that $F(x)$ has the form $F(x) = \mathrm{sign}\left(\sum_t \alpha_t h_t(x)\right)$, a weighted combination of weak learners $h_t$.
Basic idea:
- Iteratively choose weak learners $h_t$ and set the weights $\alpha_t$

AdaBoost
Initialization: $D_1(i) = 1/m$, a distribution over the $m$ training examples.
- Can also be thought of as a weight on each example.
(From "A short introduction to boosting" by Freund and Schapire)

Next Step: Get a Weak Learner
The weak learner $h_t$ is trained to do as well as possible on the weighted training set.
- It must have better than 50% (weighted) accuracy.
(From "A short introduction to boosting" by Freund and Schapire)

Next: Reset the Weights
Each example's weight is increased if $h_t$ misclassified it and decreased otherwise: $D_{t+1}(i) \propto D_t(i)\,e^{-\alpha_t y_i h_t(x_i)}$, where $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ and $\epsilon_t$ is the weighted error.
(From "A short introduction to boosting" by Freund and Schapire)

Demo

In this demo, each weak learner is a stump of the form (ax + by) > c.

Demo

Looking at the algorithm again
(From "A short introduction to boosting" by Freund and Schapire)
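
To make the loop concrete, here is a minimal sketch of discrete AdaBoost with decision stumps as the weak learners, following the Freund–Schapire formulation; the exhaustive stump search and all names are illustrative, not from the slides:

```python
import numpy as np

def train_stump(X, y, D):
    """Pick the (feature, threshold, sign) stump with the lowest
    weighted error under distribution D. Labels y must be in {-1, +1}."""
    n, d = X.shape
    best = (np.inf, None)
    for j in range(d):
        for thresh in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = np.where(X[:, j] > thresh, sign, -sign)
                err = np.sum(D[pred != y])      # weighted error
                if err < best[0]:
                    best = (err, (j, thresh, sign))
    return best

def adaboost(X, y, T=50):
    n = len(y)
    D = np.full(n, 1.0 / n)                     # uniform initial distribution
    ensemble = []
    for t in range(T):
        err, (j, thresh, sign) = train_stump(X, y, D)
        if err >= 0.5:                          # weak learner must beat chance
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        pred = np.where(X[:, j] > thresh, sign, -sign)
        D *= np.exp(-alpha * y * pred)          # up-weight mistakes
        D /= D.sum()                            # renormalize to a distribution
        ensemble.append((alpha, j, thresh, sign))
    return ensemble

def predict(ensemble, X):
    """F(x) = sign(sum_t alpha_t * h_t(x))."""
    F = np.zeros(len(X))
    for alpha, j, thresh, sign in ensemble:
        F += alpha * np.where(X[:, j] > thresh, sign, -sign)
    return np.sign(F)
```

Calling `predict(adaboost(X, y), X)` evaluates the weighted vote of all the stumps chosen during training.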

Advantages
A simple algorithm for learning robust classifiers
- Freund & Schapire, 1995
- Friedman, Hastie, Tibshirani, 1998
Provides an efficient algorithm for sparse visual feature selection
- Tieu & Viola, 2000
- Viola & Jones, 2003
Easy to implement; does not require external optimization tools.
(From "Tutorial on Object Detection" by Torralba, Fergus, and Li – ICCV 2005)

Where do the weak learners come from?
Any classifier can be a weak learner. Common ones:
- Stump: r(x) > c
- Decision tree (another kind of classifier)
  - Combined with AdaBoost, the decision tree has been dubbed the "best off-the-shelf classifier" (Friedman, Hastie, and Tibshirani)

Application: Face Detection (Viola and Jones 2001)

Features
Threshold on the response to simple rectangular features.
(Figures copied from "Robust Real-time Object Detection" by Viola and Jones, 2001)

Why?
Viola and Jones introduce a trick, called the integral image, that lets them compute the response to these features very quickly.
First step: compute a running, cumulative sum across the image.

Integral Image
With the integral image, you can compute the sum of the pixels inside any rectangle very easily.
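
A minimal sketch of the two steps, assuming the image is a 2-D numpy array; names are illustrative:

```python
import numpy as np

def integral_image(img):
    """ii[y, x] holds the sum of all pixels above and to the left of
    (y, x), inclusive: a cumulative sum over rows, then columns."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] using only 4 lookups
    on the integral image, regardless of the rectangle's size."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total
```

A rectangular feature's response is then just a difference of a few `box_sum` calls, which is why evaluation is so cheap.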

These features also capture important characteristics of faces.

How well does it work?
95% detection rate with a false positive rate of 1 in 14,084.

Is it fast?
In 2001: one 384x288 image every 0.7 seconds.
Not real-time. How can we make it faster?

Use a cascade
A classifier with 2 weak learners can detect 100% of the faces with a 40% false positive rate.
- You have then eliminated 60% of the training set with very little computation.
- You can now train a slightly more complicated classifier to eliminate even more examples.

The implementation
32 layers:
- Layer 1 – two weak learners (rejects 60% of non-faces)
- Layer 2 – five weak learners (rejects 80% of non-faces)
- Layers 3-5 – 20 weak learners each
- Layers 6-7 – 50 weak learners each
- Layers 8-12 – 100 weak learners each
- Layers 13-32 – 200 weak learners each
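
A sketch of how detection-time evaluation of such a cascade works; representing each layer as a `(score_fn, threshold)` pair is an assumption for illustration, not the actual Viola–Jones data structure:

```python
def cascade_detect(window, layers):
    """layers is a list of (score_fn, threshold) pairs ordered from
    cheapest to most expensive. A window counts as a face only if it
    passes every layer, so the cheap early layers reject most
    non-face windows almost for free."""
    for score_fn, threshold in layers:
        if score_fn(window) < threshold:
            return False   # rejected early; later layers never run
    return True
```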

Computation
On average, only 8 of the 4297 possible features are evaluated at each pixel.
On a 700 MHz Pentium III, the detector can process a 384x288 image in about 0.067 seconds.
Almost as accurate as without a cascade.

The Support Vector Machine
Boosted classifiers and SVMs are probably the two most popular classifiers today.
I won't get into the math behind SVMs; if you are interested, you should take the pattern recognition course (highly recommended).

The Support Vector Machine
Last time, we considered the problem of linear classification.
We used probabilities to fit the line.

The Support Vector Machine
Consider a different criterion, called the margin.

The Support Vector Machine
Margin: the minimum distance from any data point to the decision boundary.

The Support Vector Machine
The SVM finds the decision boundary that maximizes the margin.
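
One standard way to write this criterion, supplied here for reference: with labels $y_i \in \{-1, +1\}$ and a linear boundary $w \cdot x + b = 0$, maximizing the margin is equivalent to solving

$$
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i\,(w \cdot x_i + b) \ge 1 \ \text{ for all } i,
$$

and the margin of the resulting boundary is $1/\lVert w \rVert$ on each side.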

The Support Vector Machine
Data points that lie along the margin are known as support vectors.

Non-Linear Classification in SVMs
Last time, I showed how you could do non-linear classification by using non-linear transformations of the features.
For example, the decision boundary from $x^2 + 8xy + y^2 > 0$ is non-linear in $(x, y)$, but it is the same as making a new set of features $(x^2, xy, y^2)$ and then doing linear classification.
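
A small sketch of that idea; the feature map and test points are illustrative:

```python
import numpy as np

def quadratic_features(X):
    """Map 2-D points (x, y) to (x^2, x*y, y^2) so that the quadratic
    boundary x^2 + 8xy + y^2 > 0 becomes linear in the new features."""
    x, y = X[:, 0], X[:, 1]
    return np.stack([x**2, x * y, y**2], axis=1)

w = np.array([1.0, 8.0, 1.0])            # linear weights in feature space
X = np.array([[1.0, 2.0], [0.5, -1.0]])
labels = quadratic_features(X) @ w > 0   # same test as x^2 + 8xy + y^2 > 0
```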

Non-Linear Classification in SVMs
The decision function can be expressed in terms of dot products: $f(x) = \mathrm{sign}\left(\sum_i \alpha_i y_i (x_i \cdot x) + b\right)$
Each $\alpha_i$ will be zero unless $x_i$ is a support vector.

Non-Linear Classification in SVMs
What if we wanted to do non-linear classification?
We could transform the features and compute the dot product of the transformed features. But there may be an easier way!

The Kernel Trick
Let $\Phi(x)$ be a function that transforms $x$ into a different space.
A kernel function $K$ is a function such that $K(x, y) = \Phi(x) \cdot \Phi(y)$.

Example (Burges 98)
If $\Phi(x) = (x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2)$ for $x \in \mathbb{R}^2$, then $\Phi(x) \cdot \Phi(y) = (x \cdot y)^2$, so $K(x, y) = (x \cdot y)^2$.
This is called the polynomial kernel.
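
A quick numerical check of this identity; the code is illustrative, not from the slides:

```python
import numpy as np

def phi(v):
    """Explicit feature map for the degree-2 polynomial kernel in 2-D."""
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
# Dot product in the transformed space equals the kernel value:
assert np.isclose(phi(x) @ phi(y), (x @ y) ** 2)
```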

Gaussian RBF Kernel
$K(x, y) = \exp\left(-\lVert x - y \rVert^2 / 2\sigma^2\right)$
One of the most commonly used kernels.
Equivalent to doing a dot product in an infinite-dimensional space.

The Kernel Trick
So, with a kernel function $K$, the new classification rule is $f(x) = \mathrm{sign}\left(\sum_i \alpha_i y_i K(x_i, x) + b\right)$
Basic ideas:
- Computing the kernel function should be easier than computing a dot product in the transformed space.
- Other algorithms, like logistic regression, can also be "kernelized".
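
As a sketch, here is the kernelized rule in code; the trained quantities (support vectors, labels, alphas, bias) are assumed to come from an already-trained SVM, and the names are my own:

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel: exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def svm_predict(x, support_vectors, sv_labels, alphas, b, kernel=rbf_kernel):
    """Kernelized decision rule: sign(sum_i alpha_i y_i K(x_i, x) + b).
    Only the support vectors (alpha_i > 0) contribute to the sum."""
    score = sum(a * y * kernel(sv, x)
                for a, y, sv in zip(alphas, sv_labels, support_vectors))
    return 1 if score + b > 0 else -1
```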

So what if I want to use an SVM?
There are well-developed packages with Python and MATLAB interfaces:
- libSVM
- SVMLight
- SVMTorch
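
For instance, a minimal usage sketch with scikit-learn, whose SVC class is a Python wrapper around libSVM; the toy XOR-style data here is illustrative:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [1, 0], [0, 1]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel='rbf', C=1.0, gamma='scale')  # Gaussian RBF kernel
clf.fit(X, y)
print(clf.predict([[0.9, 0.1]]))
print(clf.support_vectors_)  # the training points with nonzero alpha
```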