Learning and Vision: Discriminative Models

Presentation transcript:

Learning and Vision: Discriminative Models Paul Viola Microsoft Research viola@microsoft.com

Overview
- Perceptrons
- Support Vector Machines
- AdaBoost
- Face and pedestrian detection: AdaBoost on faces
- Building fast classifiers: trading off speed for accuracy…
- Face and object detection
- Memory-based learning (Simard; Moghaddam)

History Lesson
- 1950s: Perceptrons are cool. Very simple learning rule; can learn "complex" concepts. Generalized perceptrons are better, but have too many weights.
- 1960s: Perceptrons stink (Minsky & Papert). Some simple concepts require an exponential number of features. Can't possibly learn that, right?
- 1980s: MLPs are cool (Rumelhart & McClelland / PDP). Sort-of-simple learning rule; can learn anything (?). Create just the features you need.
- 1990: MLPs stink. Hard to train: slow, local minima.
- 1996: Perceptrons are cool again.

Why did we need multi-layer perceptrons? Problems like this seem to require very complex non-linearities. Minsky and Papert showed that an exponential number of features is necessary to solve generic problems.

Why an exponential number of features? 14th order??? 120 features. With N = 21 inputs and polynomial degree k = 5 you already get about 65,000 features: the number of monomials of degree at most k in N variables is C(N + k, k) = C(26, 5) = 65,780.
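
A one-line check of that count (standard binomial arithmetic, not from the slide):

```python
from math import comb

# Monomials of degree <= k in N variables: C(N + k, k).
N, k = 21, 5
print(comb(N + k, k))  # 65780 -- the "65,000 features" above
```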

MLPs vs. perceptrons. MLPs are hard to train: training takes a long time (unpredictably long) and can converge to poor local minima. MLPs are also hard to understand: what are they really doing? Perceptrons are easy to train: training is a type of linear programming, runs in polynomial time, and has a single minimum, which is global. Generalized perceptrons are easier to understand: polynomial functions.

Perceptron training is linear programming: polynomial time in the number of variables and in the number of constraints. But what about linearly inseparable data?
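
A minimal sketch of that formulation, assuming labels in {-1, +1} and no bias term (my illustration with SciPy, not code from the talk): finding any w with y_i (w · x_i) ≥ 1 for all i is a linear feasibility problem, and the solver either returns a separating hyperplane or reports infeasibility when the data are linearly inseparable.

```python
import numpy as np
from scipy.optimize import linprog

def perceptron_lp(X, y):
    """X: (n, d) examples; y: (n,) labels in {-1, +1}."""
    n, d = X.shape
    # y_i (w . x_i) >= 1 rewritten for linprog as -y_i x_i . w <= -1.
    A_ub = -(y[:, None] * X)
    b_ub = -np.ones(n)
    # Zero objective: we only care about feasibility, not optimality.
    res = linprog(c=np.zeros(d), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * d)
    return res.x if res.success else None   # None: not linearly separable

X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
print(perceptron_lp(X, y))  # some separating weight vector
```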

Rebirth of perceptrons
- How to train effectively: linear programming (and, later, quadratic programming), though online training works great too.
- How to get so many features inexpensively?!? The kernel trick.
- How to generalize with so many features? VC dimension. (Or is it regularization?)
Result: Support Vector Machines.

Lemma 1: weight vectors are simple. The weight vector lives in a subspace spanned by the examples, so its dimensionality is determined by the number of examples, not by the complexity of the feature space.

Lemma 2: we only ever need to compare examples, i.e., compute inner products between them. The matrix of all pairwise inner products is called the kernel matrix.
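
In standard notation (my formulation; the slides show these pictorially), the two lemmas say the weight vector is a combination of the examples, so classification needs only pairwise inner products, collected in the kernel (Gram) matrix:

```latex
w = \sum_{i=1}^{n} \alpha_i y_i \, x_i ,
\qquad
K_{ij} = \langle x_i , x_j \rangle ,
\qquad
f(x) = \operatorname{sign}\Big( \sum_{i=1}^{n} \alpha_i y_i \langle x_i , x \rangle \Big).
```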

Simple Kernels yield Complex Features
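
A small numerical check of this point (my illustration, using the degree-2 polynomial kernel): evaluating the kernel costs O(d), yet it equals an ordinary inner product in the O(d^2)-dimensional space of all degree-2 monomials.

```python
import numpy as np
from itertools import product

def phi2(x):
    # Explicit degree-2 monomial features: every product x_i * x_j.
    return np.array([x[i] * x[j] for i, j in product(range(len(x)), repeat=2)])

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])
print((x @ z) ** 2)        # kernel evaluation: O(d)
print(phi2(x) @ phi2(z))   # identical value via explicit features: O(d^2)
```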

But Kernel Perceptrons Can Generalize Poorly

Perceptron rebirth: generalization. Too many features… Occam is unhappy. Perhaps we should encourage smoothness? [Figure: a smoother vs. a rougher separating boundary.]

The linear program is not unique: it can return any positive multiple of a correct weight vector... Slack variables and a prior on the weights force the solution toward zero.

Definition of the margin. Geometric margin: the gap between negatives and positives, measured perpendicular to the separating hyperplane. [Figure: the classifier margin.]

Require a non-zero margin. A constraint of the form y_i (w · x_i) ≥ 0 allows solutions with zero margin; requiring y_i (w · x_i) ≥ 1 instead enforces a non-zero margin between the examples and the decision boundary.

Constrained optimization: find the smoothest function that separates the data. This is quadratic programming (similar to linear programming): a single global minimum and a polynomial-time algorithm.
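
Written out, this is the standard hard-margin SVM primal (the textbook statement of "find the smoothest separating function"; the exact form is not spelled out on the slide):

```latex
\min_{w,\, b} \; \tfrac{1}{2} \lVert w \rVert^2
\quad \text{subject to} \quad
y_i \,( w \cdot x_i + b ) \ge 1 \quad \text{for all } i .
```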

Constrained Optimization 2

SVM: examples

SVM: key ideas
- Augment inputs with a very large feature set (polynomials, etc.)
- Use the kernel trick(TM) to do this efficiently
- Enforce/encourage smoothness with a weight penalty
- Introduce a margin
- Find the best solution using quadratic programming
A toy end-to-end example follows below.
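
The recipe above in runnable form, using scikit-learn purely as an illustration (the talk predates the library): a polynomial kernel supplies the large feature set, and C controls the smoothness/error trade-off.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
# kernel='poly' applies the kernel trick for polynomial features;
# C trades margin width (smoothness) against training errors.
clf = SVC(kernel="poly", degree=4, C=1.0)
clf.fit(X, y)
print(clf.score(X, y), len(clf.support_))  # accuracy, number of support vectors
```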

SVM: ZIP code recognition. Data dimension: 256. Feature space: 4th order, roughly 100,000,000 dimensions.

The classical face detection process: slide a window across the image at every location and scale, from the smallest scale up to larger ones, roughly 50,000 locations/scales per image.

The classifier is learned from labeled data. Training data: 5,000 faces (all frontal) and 10^8 non-face windows. Faces are normalized for scale and translation, with many variations across individuals, illumination, and pose (rotation both in plane and out of plane). This situation with negative examples is actually quite common: negative examples are effectively free.

Key properties of face detection. Each image contains 10-50 thousand locations/scales, and faces are rare: 0-50 per image, perhaps 1 or 2 in a typical image. There are roughly 1,000 times as many non-face windows as faces, so an extremely small false-positive rate is required: about 10^-6. Since the classifier is evaluated some 50,000 times per image, a reasonable goal is to make the false-positive rate less than the true-positive rate...

Sung and Poggio

Rowley, Baluja & Kanade: the first fast system - low resolution to high.

Osuna, Freund, and Girosi

Support Vectors

P, O, & G: the first pedestrian work. Flaws: a very poor model for feature selection, and little real motivation for the features at all. Why not use the original image?

On to AdaBoost
- Given a set of weak classifiers, none much better than random
- Iteratively combine classifiers to form a linear combination
- Training error converges to 0 quickly
- Test error is related to the training margin

AdaBoost (Freund & Schapire). [Figure: weak classifier 1 is trained; the weights of misclassified examples are increased; the next weak classifier is trained on the reweighted data; and so on.] The final classifier is a linear combination of the weak classifiers. A minimal sketch follows below.
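
A minimal sketch of discrete AdaBoost as described above (the weak-learner interface is my own illustrative choice, not code from the talk):

```python
import numpy as np

def adaboost(X, y, weak_learners, T):
    """y: labels in {-1, +1}; weak_learners: callables h(X) -> {-1, +1} array."""
    n = len(y)
    w = np.full(n, 1.0 / n)                  # example weights
    ensemble = []
    for _ in range(T):
        # Pick the weak classifier with the lowest weighted error.
        errors = [np.sum(w[h(X) != y]) for h in weak_learners]
        best = int(np.argmin(errors))
        eps = errors[best]
        if eps >= 0.5:                       # none better than random: stop
            break
        alpha = 0.5 * np.log((1.0 - eps) / max(eps, 1e-12))
        # Increase the weights of misclassified examples, then renormalize.
        w *= np.exp(-alpha * y * weak_learners[best](X))
        w /= w.sum()
        ensemble.append((alpha, weak_learners[best]))
    # Final classifier: a linear combination of the selected weak classifiers.
    return lambda X: np.sign(sum(a * h(X) for a, h in ensemble))
```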

AdaBoost Properties

AdaBoost: a super-efficient feature selector. Features = weak classifiers. Each round selects the optimal feature given the previously selected features, under the exponential loss.
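
The exponential loss referred to here is the standard AdaBoost objective (notation mine):

```latex
L(f) = \sum_{i=1}^{n} e^{-y_i f(x_i)} ,
\qquad
f(x) = \sum_{t=1}^{T} \alpha_t \, h_t(x) .
```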

Boosted face detection: image features. "Rectangle filters", similar to Haar wavelets (Papageorgiou et al.). For real problems, results are only as good as the features used; this is the main piece of ad-hoc (domain) knowledge. Rather than raw pixels, we select from a very large set of simple functions that are sensitive to edges and other critical features of the image, at multiple scales. Since the final classifier is a perceptron, it is important that the features be non-linear; otherwise the final classifier would itself be a simple perceptron. We therefore introduce a threshold to yield unique binary features.
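
The reason rectangle filters are cheap is the integral image: after one pass over the image, any rectangle sum costs four lookups, at every location and scale. A sketch (my own illustrative code, not Viola's):

```python
import numpy as np

def integral_image(img):
    # ii[r, c] = sum of img[:r+1, :c+1]; computed in one pass over the image.
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] via four lookups in the integral image."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def two_rect_filter(ii, r, c, h, w):
    """Left-minus-right two-rectangle filter; threshold it for a binary feature."""
    left = rect_sum(ii, r, c, r + h, c + w // 2)
    right = rect_sum(ii, r, c + w // 2, r + h, c + w)
    return left - right
```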

Feature selection. For each round of boosting: evaluate each rectangle filter on each example; sort the examples by filter value; select the best threshold for each filter (minimizing the weighted error Z); select the best filter/threshold combination (= the feature); reweight the examples. With M filters, T thresholds, N examples, and learning time L, a naïve wrapper method costs O(MT L(MTN)); the AdaBoost feature selector costs O(MN). A sketch of the sort-and-sweep threshold search follows below.
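
The sort-then-sweep threshold search mentioned above, sketched for a single filter (variable names and the {-1, +1} stump convention are mine): after sorting, each candidate threshold updates the weighted error in O(1), so the whole sweep is linear.

```python
import numpy as np

def best_threshold(values, y, w):
    """values: filter outputs; y: labels in {-1, +1}; w: weights summing to 1.
    Returns (threshold, polarity, weighted error) of the best decision stump."""
    order = np.argsort(values)
    v, yy, ww = values[order], y[order], w[order]
    # Polarity +1 predicts +1 when value > threshold. Start below all values:
    # everything is predicted +1, so the error is the weight of the negatives.
    err = np.sum(ww[yy == -1])
    best = (v[0] - 1.0, 1, err)
    for i in range(len(v)):
        # Raising the threshold past example i flips its prediction to -1.
        err += ww[i] if yy[i] == 1 else -ww[i]
        # Polarity -1 makes the complementary predictions: its error is 1 - err.
        cand = (err, 1) if err <= 1.0 - err else (1.0 - err, -1)
        if cand[0] < best[2]:
            best = (v[i], cand[1], cand[0])   # (ties in values glossed over)
    return best
```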

Example classifier for face detection. A classifier with 200 rectangle features was learned using AdaBoost, giving 95% correct detection on the test set with 1 false positive in 14,084 windows. Not quite competitive... [Figure: ROC curve for the 200-feature classifier.]

Building fast classifiers. Given a nested set of classifier hypothesis classes: computational risk minimization. [Figure: ROC sketch, % detection vs. % false positives, showing the trade-off against false negatives.] In general, simple classifiers are more efficient but also weaker. We could define a computational risk hierarchy (in analogy with structural risk minimization): a nested set of classifier classes. The training process is reminiscent of boosting: previous classifiers reweight the examples used to train subsequent classifiers. But the goal of training is different: instead of minimizing errors, minimize false positives. [Figure: an image sub-window passes through classifiers 1, 2, 3 in turn; a T result continues, an F result exits as NON-FACE.]

Other fast classification work: Simard; Rowley (faces); Fleuret & Geman (faces).

Cascaded classifier. [Figure: image sub-window → 1-feature stage → 5-feature stage → 20-feature stage → FACE; a rejection (F) at any stage outputs NON-FACE; 50%, 20%, and 2% of windows survive the successive stages.] A 1-feature classifier achieves a 100% detection rate and about a 50% false-positive rate. A 5-feature classifier achieves a 100% detection rate and a 40% false-positive rate (20% cumulative), using data that passed the previous stage. A 20-feature classifier achieves a 100% detection rate with a 10% false-positive rate (2% cumulative). A sketch of this control flow follows below.
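
The cascade's control flow in a few lines (a sketch; the stage classifiers and thresholds stand in for the learned 1-, 5-, and 20-feature stages):

```python
def cascade_classify(window, stages):
    """stages: list of (classifier, threshold) pairs, cheapest stage first.
    Each classifier maps a window to a score; most windows exit early."""
    for classifier, threshold in stages:
        if classifier(window) < threshold:
            return False   # NON-FACE: rejected at this stage, stop immediately
    return True            # passed every stage: FACE
```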

Comparison to other systems: detection rate (%) on the MIT+CMU test set at a given number of false detections (dashes mark values not reported).

Detector               False detections
                         10    31    50    65    78    95   110   167   422
Viola-Jones            78.3  85.2  88.8     -  90.0  90.8  91.1  91.8  93.7
Rowley-Baluja-Kanade   83.2  86.0     -     -     -  89.2     -  90.1  89.9
Schneiderman-Kanade       -     -     -  94.4     -     -     -     -     -
Roth-Yang-Ahuja           -     -     -     - (94.8)    -     -     -     -

Output of Face Detector on Test Images

Solving other “Face” Tasks Profile Detection Facial Feature Localization Demographic Analysis

Feature localization. Surprising properties of our framework: the cost of detection is not a function of image size, just of the number of features; and learning automatically focuses attention on key regions. Conclusion: the "feature" detector can include a large contextual region around the feature.

Feature localization features: the learned features reflect the task.

Profile Detection

More Results

Profile Features

Features, features, features. In almost every case: good features beat good learning, and learning beats no learning. Critical classifier ratio: AdaBoost >> SVM. This is not to say that a…