Fast Asymmetric Learning for Cascade Face Detection. Jianxin Wu and Charles Brubaker, IEEE PAMI, 2008. Presented by Chun-Hao Chang 張峻豪, 2009/12/01.



2 Outline 1. Introduction 2. Recall Adaboost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Result 7. Conclusion

3 1. Introduction 2. Recall Adaboost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Result 7. Conclusion

4 1. Introduction
Three asymmetries are observed in the face detection problem:
1. Uneven class priors: in the training database, the number of positives vs. the number of negatives.
2. Goal asymmetry: detection rate vs. false positive rate, rather than equal error rate (EER).
3. Unequal complexity of the positive and negative classes: Face vs. Car (non-face) is easy to classify, while Face vs. Animal (non-face) is hard.
This paper presents a framework similar to Adaboost, but faster in learning and with the freedom to design the ensemble classifier.

5 1. Introduction
- Decouples the classifier design step into feature selection and ensemble classifier design (e.g., FDA, SVM, ...).
- Proposes Forward Feature Selection (FFS) and the Linear Asymmetric Classifier (LAC).
- Advantages:
1. FFS is about 2.5~3.5 times faster than Fast Adaboost and 50~100 times faster than Adaboost in training.
2. FFS requires only about 3% of the memory used by Adaboost.
3. There is freedom to design the ensemble classifier.

6 1. Introduction: Adaboost vs. FFS+LAC
[Diagram] Notation: N = N_p + N_n training images z_i (N_p positives, N_n negatives), each with weight w_i; c_i is the label of z_i; h_1 is a weak classifier with vote weight α_1; k = number of weak classifiers.
Adaboost: the weak classifiers h_1, ..., h_5 and their vote weights α_1, ..., α_5 are learned jointly, with the sample weights updated every round.
FFS+LAC: FFS first selects h_1, ..., h_5 (each with a unit vote), then LAC assigns the vote weights α_1', ..., α_5'.

7 1. Introduction 2. Recall Adaboost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Result 7. Conclusion

8 2. Recall Adaboost (1/2)
[Flowchart] 1. Input data: N = N_p + N_n training samples. 2. Cascaded framework: repeat node learning, adding a new node H_{k+1}, until the learning goal is satisfied. Node learning (Adaboost, T iterations; feature selection and the ensemble classifier are coupled together, not separable): (1) normalize the weights; (2) pick an appropriate threshold τ for each weak classifier h_i, 1 ≤ i ≤ M, where M is the number of features and each h has a corresponding mask (feature) m; (3) choose the classifier h_t with the lowest error (α_t is the weight of h_t); (4) update the weights with the input data z. 3. Output: the cascaded detector H_1, H_2, H_3, ...

9 2. Recall Adaboost (2/2)
- α_t is decided once h_t is chosen.
- The weight w_{t,i} is updated using the round's error rate at the end of each iteration, where w_{t,i} is the weight of sample i at iteration t.
- Feature = (filter, position); feature value = feature * example, where * denotes convolution; classifier = (feature, threshold).
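A minimal Python sketch of one such discrete Adaboost node learner over a matrix of precomputed feature values; the names, the uniform weight initialization, and the brute-force threshold search are illustrative simplifications, not the authors' code:

```python
import numpy as np

def adaboost_train(F, labels, T):
    """Discrete AdaBoost with decision stumps, as recalled on the slides above.

    F      : (M, N) matrix of precomputed feature values (M features, N samples).
    labels : length-N array with entries +1 (face) / -1 (non-face).
    T      : number of boosting rounds (weak classifiers to pick).
    Returns a list of (feature_index, threshold, polarity, alpha)."""
    M, N = F.shape
    w = np.full(N, 1.0 / N)           # uniform initial weights (a simplification)
    ensemble = []
    for _ in range(T):
        w /= w.sum()                   # 1. normalize the weights
        best = None
        for i in range(M):             # 2. best threshold for every weak classifier h_i
            for theta in np.unique(F[i]):
                for polarity in (+1, -1):
                    pred = np.where(polarity * (F[i] - theta) >= 0, 1, -1)
                    err = w[pred != labels].sum()
                    if best is None or err < best[0]:
                        best = (err, i, theta, polarity)
        err, fi, theta, polarity = best              # 3. lowest weighted error wins
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # vote weight, fixed once h_t is chosen
        pred = np.where(polarity * (F[fi] - theta) >= 0, 1, -1)
        w *= np.exp(-alpha * labels * pred)          # 4. re-weight samples by this round's result
        ensemble.append((fi, theta, polarity, alpha))
    return ensemble
```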

10 1. Introduction 2. Recall Adaboost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Result 7. Conclusion

11 3. System Flowchart: Notations
- z: input example.
- x: vector of feature values of a positive example.
- y: vector of feature values of a negative example.
- Σ_x: covariance matrix of x.
- a: optimal weight vector.
- b: optimal threshold.
[Diagram: a sample x_i is obtained by convolving the example with the weak classifiers h_1, ..., h_4.]

12 3. System Flowchart: FFS+LAC
[Flowchart] 1. Input data: N = N_p + N_n training samples. 2. Cascaded framework: repeat node learning, adding a new node H_{k+1}, until the learning goal is satisfied. Node learning (T iterations), now separable into two steps: Feature Selection (FFS): (1) build the feature table; (2) choose the weak classifier h_i that gives the current ensemble H' the smallest error rate (Θ is the threshold of H(z)); Ensemble Classifier (LAC): assign the vote weights. 3. Output: the cascaded detector H_1, H_2, H_3, ...
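To make the flowchart concrete, here is a minimal sketch of how a trained cascade of such nodes would be applied to one image window; the node layout (weak_classifiers, a, b) is an assumption for illustration, not the paper's data structure:

```python
import numpy as np

def cascade_detect(window, nodes):
    """Evaluate one window with the learned cascade H_1, H_2, H_3, ...

    Assumed layout: each node is (weak_classifiers, a, b), where the weak
    classifiers are callables returning 0/1, a holds their LAC/FDA vote
    weights, and b is the node threshold."""
    for weak_classifiers, a, b in nodes:
        votes = np.array([h(window) for h in weak_classifiers], dtype=float)
        if np.dot(a, votes) < b:
            return False   # rejected by this node; most non-face windows exit early
    return True            # accepted by every node => report a face
```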

13 3. System Flowchart: Q&A (1/2)
- Q1: What is the difference between Adaboost and FFS+LAC?
- A1: Adaboost cannot be separated into a feature selection step and an ensemble classifier step: in Adaboost, α_i is decided once h_i is chosen, and each sample weight w_i is updated at the end of each round; in FFS, α_i is 1 for all h_i.
- Q2: Why use FFS instead of Adaboost?
- A2: FFS stores 1 bit per table entry (only about 3% of the memory), whereas Adaboost stores 32 bits each.
- Q3: Can Adaboost be expedited by a pre-computing strategy?
- A3: Yes. If the weights are kept unchanged (no weight update), we get Fast Adaboost.

14 3. System Flowchart: Q&A (2/2)
Conclusion:
1. (Training process) FFS is: a. about 2.5~3.5 times faster than Fast Adaboost; b. 50~100 times faster than Adaboost; c. only about 3% of the memory usage.
2. It is much easier to implement on a given platform.
3. We have the freedom to design our own ensemble algorithms (e.g., SVM, FDA, ...) for solving different problems.

15 1. Introduction 2. Recall Adaboost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Result 7. Conclusion

16 4. Forward Feature Selection (FFS)
Fig. 1. Adaboost vs. FFS node learning (each repeated for T rounds):
(a) Adaboost: train all weak classifiers, O(NMT log N); add the feature with minimum weighted error to the ensemble, O(T); adjust the threshold of the ensemble to meet the learning goal, O(N).
(b) FFS: train all weak classifiers, O(NM log N); add the feature that minimizes the error of the current ensemble, O(NMT); adjust the threshold of the ensemble to meet the learning goal, O(N).

17 4. FFS: Adaboost vs. FFS - Adaboost
[Toy example] Samples with weights w_1, ..., w_6; weak classifiers h_1, ..., h_4.
Iteration 1: weighted errors ε_1 = 2, ε_2 = 5, ε_3 = 7, ε_4 = 4; take the minimum => select h_1, then update the sample weights to w_1', ..., w_6'.
Iteration 2: with the updated weights, ε_2' = 9, ε_3' = 5, ε_4' = 3; take the minimum => select h_4.

18 4. FFS: Adaboost vs. FFS - FFS
[Toy example] The same samples w_1, ..., w_6 and weak classifiers h_1, ..., h_4, but the weights remain unchanged.
Iteration 1: errors ε_1 = 2, ε_2 = 5, ε_3 = 7, ε_4 = 4; take the minimum => select h_1 (the one chosen in the first iteration).
Iteration 2: the errors are now those of the ensemble containing h_1 plus each candidate: ε_2' = 6, ε_3' = 10, ε_4' = 8; take the minimum => select h_2.

19 4. FFS: Training Process (with N samples (images) and M features)
1. Train all weak classifiers. For each feature i: a. sort the feature values V_i1 ~ V_iN; b. choose a threshold τ with the smallest error.
2. Build the feature table (size M x N) from the input examples z and each h's corresponding mask (feature) m and threshold τ.
3. Add the feature that minimizes the error of the current ensemble (repeat for T iterations, with the selected set S initially empty): a. for i = 1 to M, find the θ that gives H' the smallest error rate and let ε_i be that error rate; b. k <= arg min_{1 ≤ i ≤ M} ε_i; c. add h_k to S.
4. Adjust the value of θ: adjust θ so that H has a 50% false positive rate on the training set.
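A minimal Python sketch of this selection loop over a precomputed binary feature table; this is a sketch under the slide's simplifications (names are illustrative, and the final threshold step follows the slide's 50% false-positive-rate rule only), not the authors' code:

```python
import numpy as np

def ffs_select(table_pos, table_neg, T):
    """Forward Feature Selection over a precomputed binary table.

    table_pos : (M, N_p) bool array, [i, j] = 1 if weak classifier h_i calls
                positive sample j a face; table_neg likewise for negatives.
    T         : number of weak classifiers to pick.
    Returns (selected feature indices, ensemble vote threshold theta)."""
    M = table_pos.shape[0]
    votes_pos = np.zeros(table_pos.shape[1], dtype=np.int32)   # current ensemble votes
    votes_neg = np.zeros(table_neg.shape[1], dtype=np.int32)
    selected = []
    for t in range(1, T + 1):
        best_err, best_i = np.inf, -1
        for i in range(M):                       # no weight update: just re-read the table
            if i in selected:
                continue
            vp = votes_pos + table_pos[i]
            vn = votes_neg + table_neg[i]
            # error of the tentative ensemble H' at its best vote threshold
            err = min((vp < th).sum() + (vn >= th).sum() for th in range(t + 2))
            if err < best_err:
                best_err, best_i = err, i
        selected.append(best_i)                  # add the feature minimizing ensemble error
        votes_pos += table_pos[best_i]
        votes_neg += table_neg[best_i]
    # adjust theta so that H has roughly a 50% false positive rate on the training set
    fprs = np.array([(votes_neg >= th).mean() for th in range(T + 2)])
    theta = int(np.argmin(np.abs(fprs - 0.5)))
    return selected, theta
```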

20 4. FFS: Example - Train all weak classifiers (Paper: p. 5, Algorithm 3)
[Worked example, N = 6] For a given feature i, 1 ≤ i ≤ M: compute the feature value z^T m for each example, sort the examples by feature value (face vs. non-face), and sweep the threshold across the sorted list. At each step the error ε is updated by adding or subtracting the weight of the sample that changes side; the threshold with the smallest error is kept (τ = 16 in the slide's example).
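A sketch of this sorted threshold sweep for a single feature; it is a simplification for illustration (one polarity only, ties between equal feature values not handled, helper name invented):

```python
import numpy as np

def train_stump(values, labels, weights):
    """Pick a threshold for one feature by scanning its sorted values,
    as in the slide's 6-sample example.

    values  : length-N feature values for every sample.
    labels  : +1 (face) / -1 (non-face).
    weights : per-sample weights (uniform in FFS).
    Returns (threshold, error) for the rule "face if value >= threshold"."""
    order = np.argsort(values)
    v, y, w = values[order], labels[order], weights[order]
    # threshold below every value: everything is called a face,
    # so the starting error is the total weight of the negatives
    err = w[y == -1].sum()
    best_err, best_thr = err, v[0] - 1.0
    for j in range(len(v)):
        # move the threshold just above v[j]: sample j flips from "face" to "non-face"
        err += w[j] if y[j] == +1 else -w[j]
        thr = v[j] + 0.5 if j == len(v) - 1 else (v[j] + v[j + 1]) / 2.0
        if err < best_err:
            best_err, best_thr = err, thr
    return best_thr, best_err
```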

21 4. FFS: Example - Feature Selection Using Table (1/2)
[Table, M = 4 features, N = 6 samples] t = 1: h_3 is selected as the first round's weak classifier. Each table entry is a classification result, e.g., the entry in row 3, column 2 is the result of applying h_3 to sample 2 (Pos/Neg).

22 4. FFS: Example - Feature Selection Using Table (2/2)
[Table, M = 4, N = 6] t = 2: h_1 is selected as the second round's weak classifier; the table entries stay unchanged, only the ensemble votes are updated.

23 4. FFS: FFS vs. Adaboost
Three major differences between FFS and Adaboost in implementation:
- No weight update: faster, thanks to the precomputed table.
- The total vote (confidence value before normalization) in FFS is between 0 and T; in Adaboost it can be any real number.
- Criterion: FFS selects the feature that gives the ensemble classifier the smallest error on the training set; Adaboost chooses the feature with the smallest weighted error on the training set.

24 1. Introduction 2. Recall Adaboost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Result 7. Conclusion

25 5. Linear Asymmetric Classifier (LAC)
We can treat β as (1 - the false positive rate); the node's detection rate is the quantity we want to optimize.
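The learning goal this slide shows as an image can be written as follows; this is a reconstruction of the paper's LAC formulation as I read it (x is the feature vector of positives, y of negatives, and the cascade uses β = 0.5):

```latex
\max_{a \neq 0,\; b}\;
  \Pr_{x \sim (\bar{x}, \Sigma_x)} \left\{ a^{\mathsf{T}} x \ge b \right\}
\quad \text{subject to} \quad
  \Pr_{y \sim (\bar{y}, \Sigma_y)} \left\{ a^{\mathsf{T}} y \le b \right\} = \beta
```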

26 5. LAC: Definitions
[The slide shows the definitions used in the derivation, including the constant k and the normalization term.]

27 5. LAC: Derivation (1/3)
[Equations shown on the slide.] Constraint (1) can be rewritten as (2). We want to maximize the objective, which is equivalent to a minimization problem; (2) is then substituted for b.

28 5. LAC: Derivation (2/3)
[Equations and Fig. 2 shown on the slide.] Assuming y has a symmetric distribution, for β = 0.5 the constraint simplifies (constants k_1 and k_2 appear in Fig. 2).

29 5. LAC: Derivation (3/3)
Fig. 3. Normality test for a^T y, in which y is a feature vector extracted from non-face data and a is drawn from the uniform distribution on [0, 1]^T. The closer the points lie to the red line, the more likely a^T y is normally distributed.

30 5. LAC: Optimal Result Compared with FDA
[Table comparing the optimal results of FDA and LAC; in both cases the output is a linear classifier.]
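For reference, a numpy sketch of the two closed forms this table compares, as I read them from the paper (LAC with β = 0.5: a = Σ_x^{-1}(x̄ - ȳ), b = a^T ȳ; FDA: a = (Σ_x + Σ_y)^{-1}(x̄ - ȳ)); the regularization terms and function names are illustrative additions:

```python
import numpy as np

def lac_fit(x, y):
    """Closed-form LAC solution for beta = 0.5.
    x : (N_p, T) weak-classifier outputs on positives, y : (N_n, T) on negatives."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False) + 1e-6 * np.eye(x.shape[1])  # regularize: cov may be singular
    a = np.linalg.solve(cov_x, mx - my)
    return a, a @ my

def fda_fit(x, y):
    """Fisher discriminant direction for comparison, with the same threshold convention."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    sw = np.cov(x, rowvar=False) + np.cov(y, rowvar=False) + 1e-6 * np.eye(x.shape[1])
    a = np.linalg.solve(sw, mx - my)
    return a, a @ my
```

In either case the node classifies a window as a face when a^T v(z) >= b, where v(z) collects the selected weak-classifier outputs; as on the earlier flowchart slides, the threshold is afterwards adjusted to meet the node learning goal.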

31 1. Introduction 2. Recall Adaboost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Result 7. Conclusion

32 6. Experimental Result: LAC vs. FDA
Fig. 4. Comparing LAC and FDA on a synthetic data set where both x and y are Gaussian (red for positives, blue for negatives).

33 6. Experimental Result: Synthetic Data Fig. 5. Synthetic data where y is not symmetric.

34 6. Experimental Result: Adaboost vs. FDA & LAC
Fig. 6. Experiments comparing different linear discriminant functions. In 6(a), the training data sets are collected from nodes 11 to 21 of the AdaBoost+FDA cascade; in 6(b), they are collected from the AdaBoost+LAC cascade.

35 6. Experimental Result: Adaboost vs. FDA & LAC
Fig. 7. Experiments comparing different linear discriminant functions. The training sets were collected from the nodes of the AdaBoost cascade.

36 6. Experimental Result: Adaboost Vs. FFS Fig. 8. Experiments comparing cascade performances on the MIT+CMU test set (ROC).

37 6. Experimental Result: Effect of Post-Processing
Fig. 8. Experiments comparing cascade performances on the MIT+CMU test set: (a) with post-processing, (b) without post-processing. (Open question from the presenter: what exactly does the post-processing consist of?)

38 1. Introduction 2. Recall Adaboost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Result 7. Conclusion

39 7. Conclusion (Contributions)
- Three types of asymmetry are categorized.
- The classifier design step is decoupled into feature selection and ensemble classifier design.
- FFS is proposed for feature selection; it is 2.5~3.5 times faster than Fast Adaboost (50~100 times faster than Adaboost) and uses only about 3% of Adaboost's memory.
- LAC is proposed as the ensemble classifier to address the asymmetry problem.

40 Problems: Q&A

41 References
[1] J. Wu and C. Brubaker, "Fast Asymmetric Learning for Cascade Face Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, March 2008.
[2] P. Viola and M. Jones, "Robust Real-time Object Detection," International Journal of Computer Vision, 57(2), 2004.
[3] P. Viola and M. Jones, "Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade," NIPS, 2001.