Robust real-time face detection
Paul A. Viola and Michael J. Jones
Intl. J. Computer Vision 57(2), 137–154, 2004 (originally in CVPR’2001)
(slides adapted from Bill Freeman, MIT 6.869, April 2005)

Scan the classifier over locations and scales

“Learn” the classifier from data
Training data
- 5000 faces (frontal)
- $10^8$ non-faces
- Faces are normalized for scale and translation
Many variations
- Across individuals
- Illumination
- Pose (rotation both in plane and out of plane)

Characteristics of Algorithm
- Feature set (…is huge, about 16M features)
- Efficient feature selection using AdaBoost
- New image representation: Integral Image
- Cascaded Classifier for rapid detection
- Fastest known frontal face detector for gray-scale images

Integral Image
- Allows for fast feature evaluation
- Do not work directly on image intensities
- Compute the integral image using a few operations per pixel (the features are similar to Haar basis functions)

Simple and Efficient Classifier
- Select a small number of important features from a huge library of potential features using AdaBoost [Freund and Schapire, 1995]

AdaBoost, Adaptive Boosting
- Formulated by Yoav Freund and Robert Schapire.
- It is a meta-algorithm: it can be used in conjunction with many other learning algorithms to improve their performance.
- AdaBoost is adaptive: subsequent classifiers are tweaked in favor of instances misclassified by previous classifiers.
- Sensitive to noisy data and outliers.
- Less susceptible to the overfitting problem than most algorithms in some problems.
- Calls a weak classifier repeatedly in a series of rounds $t = 1, \dots, T$.
  - For each call, a distribution of weights $D_t$ is updated that indicates the importance of each example in the data set.
  - On each round, the weights of incorrectly classified examples are increased (or, alternatively, the weights of correctly classified examples are decreased), so the new classifier focuses more on those examples.

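AdaBoost
- Given $(x_1, y_1), \dots, (x_m, y_m)$, where $x_i \in X$ and $y_i \in \{-1, +1\}$
- Initialize $D_1(i) = 1/m$ for $i = 1, \dots, m$
- For $t = 1, \dots, T$:
  - Find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the error with respect to the distribution $D_t$: $h_t = \arg\min_{h_j} \epsilon_j$, where $\epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \ne h_j(x_i)]$ is the weighted error rate of classifier $h_j$
  - If $\epsilon_t \ge 0.5$, then stop
  - Choose $\alpha_t \in \mathbb{R}$, typically $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$
  - Update $D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$, where $Z_t$ is a normalization factor (chosen so that $D_{t+1}$ sums to 1)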

AdaBoost
- Output the final classifier: $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
- The equation to update the distribution $D_t$ is constructed so that the factor $\exp(-\alpha_t y_i h_t(x_i))$ is below 1 when $y_i = h_t(x_i)$ and above 1 when $y_i \ne h_t(x_i)$
- After selecting an optimal classifier $h_t$ for the distribution $D_t$:
  - Examples that the classifier identified correctly are weighted less
  - Examples that it identified incorrectly are weighted more
- When the algorithm tests the classifiers on the distribution $D_{t+1}$, it will select a classifier that better identifies the examples the previous classifier missed.
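The boosting loop described on these two slides can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes scalar inputs, labels in {-1, +1}, and decision stumps as the weak classifiers; the function names and the toy data set are hypothetical.

```python
import math

def stump_predict(x, threshold, polarity):
    """Decision stump on a scalar input: returns +1 or -1."""
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    values = sorted(set(xs))
    # Candidate thresholds: below all values, between neighbours, above all values.
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)
    best = None
    for threshold in candidates:
        for polarity in (+1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    m = len(xs)
    weights = [1.0 / m] * m              # D_1(i) = 1/m
    ensemble = []                        # list of (alpha_t, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        if err >= 0.5:                   # no better than chance: stop
            break
        err = max(err, 1e-12)            # guard against log(0) for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # D_{t+1}(i) is proportional to D_t(i) * exp(-alpha * y_i * h_t(x_i)).
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        z = sum(weights)                 # normalization factor Z_t
        weights = [w / z for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, th, p) for a, th, p in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: no single stump separates these labels.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

On this toy set the best single stump still misclassifies three of the ten examples, while the three-stump vote classifies all of them correctly.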

Characteristics of Algorithm
- Feature set (…is huge, about 16M features)
- Efficient feature selection using AdaBoost
- New image representation: Integral Image
- Cascaded Classifier for rapid detection

Cascaded Classifier
- Combines successively more complex classifiers in a cascade structure
- Dramatically increases the speed of the detector by focusing attention on promising regions of the image
- Focus-of-attention approaches
  - It is often possible to rapidly determine where in an image a face might occur (Tsotsos et al., 1995; Itti et al., 1998; Amit and Geman, 1999; Fleuret and Geman, 2001).
  - More complex processing is reserved only for these promising regions.
- The key measure of such an approach is the “false negative” rate of the attentional process.

Cascaded Classifier
- Training process
  - An extremely simple and efficient classifier
  - Used as a “supervised” focus-of-attention operator
- A face detection attentional operator
  - Filters out over 50% of the image
  - Preserves 99% of the faces (evaluated over a large dataset)
- This filter is exceedingly efficient: it can be evaluated in 20 simple operations per location/scale

Overview
- Features: form and computation
- Combining features to form a classifier: AdaBoost
- Constructing a cascade of classifiers
- Experimental results
- Discussion

Features
- Use features rather than image pixels
- Features act to encode ad-hoc domain knowledge that is difficult to learn using a finite quantity of training data
- A feature-based system is much faster than a pixel-based system

Image features
“Rectangle filters” [Papageorgiou et al. 1998]
- Similar to Haar wavelets
- Differences between sums of pixels in adjacent rectangles
- About 160,000 rectangle features for a 24x24 detection window
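The figure of roughly 160,000 features can be sanity-checked by enumeration. The sketch below assumes the 24 × 24 base window used by the detector and five rectangle templates (two-rectangle in both orientations, three-rectangle in both orientations, and one four-rectangle), scaled in whole-pixel multiples. The template list and the function name are illustrative assumptions, not taken from the slides.

```python
def count_rectangle_features(window=24):
    """Count every position and size of each template inside a square window."""
    # (width, height) of the smallest instance of each template, in pixels.
    templates = {
        "two-rectangle, side by side": (2, 1),
        "two-rectangle, stacked": (1, 2),
        "three-rectangle, side by side": (3, 1),
        "three-rectangle, stacked": (1, 3),
        "four-rectangle": (2, 2),
    }
    counts = {}
    for name, (base_w, base_h) in templates.items():
        total = 0
        # Sizes grow in multiples of the base size so the sub-rectangles stay equal.
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                # Number of top-left positions at which a w-by-h feature fits.
                total += (window - w + 1) * (window - h + 1)
        counts[name] = total
    return counts

counts = count_rectangle_features(24)
total = sum(counts.values())
```

Under these assumptions the total comes to 162,336, in line with the quoted figure of about 160,000.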

Integral Image
- Partial sum: $ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$
- The sum over any rectangle D is obtained from the integral image values at its four corners: D = 1+4-(2+3)
- Also known as: summed area tables [Crow84], boxlets [Simard98]
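The four-corner lookup can be illustrated with a short sketch. It uses plain Python lists and a zero-padded integral image so that no boundary checks are needed; the padding convention and the function names are assumptions made for this example.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum above plus the running sum of the current row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum over the rectangle with top-left corner (x, y): 4 + 1 - (2 + 3)."""
    return ii[y + h][x + w] + ii[y][x] - ii[y][x + w] - ii[y + h][x]

def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half of a w-by-h region (w must be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
ii = integral_image(image)
whole = rect_sum(ii, 0, 0, 3, 3)              # all nine pixels
lower_right = rect_sum(ii, 1, 1, 2, 2)        # 5 + 6 + 8 + 9
left_minus_right = two_rect_feature(ii, 0, 0, 2, 3)
```

Each rectangle sum costs four array references regardless of the rectangle's size, which is what makes evaluating a feature at any scale equally cheap.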

Huge library of filters

Feature Discussion
- Rectangle features are primitive when compared with steerable filters, etc.
  - Steerable filters are excellent for the detailed analysis of boundaries, image compression, and texture analysis.
- Sensitive to the presence of edges, bars, and other simple image structure
- Quite coarse: only three orientations (|, X, --)
- Overcomplete (about 400 times): features at every aspect ratio and location

Computational Advantage
- The face detector scans the input at many scales
  - Starting at the base scale: faces are detected at a size of 24 × 24 pixels
  - A 384 × 288 pixel image is scanned at 12 scales, each a factor of 1.25 larger than the last
- The conventional approach
  - Compute a pyramid of 12 images (each smaller than the last)
  - Scan a fixed-scale detector across each image
  - Computing the pyramid directly requires significant time
  - Even when implemented efficiently on conventional hardware (using bilinear interpolation to scale each level of the pyramid), it takes around 0.05 seconds to compute a 12-level pyramid of this size (on an Intel PIII 700 MHz processor)

Computational Advantage
- Define a meaningful set of rectangle features
  - A single feature can be evaluated at any scale and location in a few operations.
  - Effective detectors can be constructed with as few as two rectangle features.
- Computational efficiency of features
  - Face detection can be completed for an entire image at every scale at 15 frames per second
  - This is about the same time required to evaluate the 12-level image pyramid alone.

Learning Classification Functions
- Given the feature set and a training set, any machine learning method could be used:
  - Mixture of Gaussians model (Sung and Poggio, 1998)
  - Simple image features and a neural network (Rowley et al. 1998)
  - Support Vector Machine (Osuna et al. 1997b)
  - Winnow learning procedure (Roth et al. 2000)
- 160,000 features: even though each feature can be computed very efficiently, computing the complete set is prohibitively expensive

AdaBoost
- A very small number of features can be combined to form an effective classifier
- Boosts classification performance by combining a collection of weak classification functions to form a stronger classifier
- Weak learner: we do not expect even the best classification function to classify the training data well
- After the first round of learning, examples are re-weighted in order to emphasize those which were incorrectly classified by the previous weak classifier
- The final strong classifier takes the form of a perceptron: a weighted combination of weak classifiers followed by a threshold
- Training error of the strong classifier approaches zero exponentially in the number of rounds

AdaBoost
- Selects a small set of good classification functions which nevertheless have significant variety
  - Equivalently, selects effective features which nevertheless have significant variety
- Restrict the weak learner to classification functions that each depend on a single feature
- Select the single rectangle feature which best separates the positive and negative examples
- Weak classifier: $h(x, f, p, \theta) = 1$ if $p f(x) < p \theta$, and 0 otherwise, where $x$ is a 24x24 sub-window, $f$ is the feature, $\theta$ is the threshold, and $p$ is the polarity indicating the direction of the inequality

AdaBoost
- No single feature can perform the classification task with low error
  - Features selected early: error rates of 0.1 to 0.3
  - Features selected later: error rates of 0.4 to 0.5
- Thresholded single features are single-node decision trees (decision stumps)

Constructing the classifier
- A perceptron yields a sufficiently powerful classifier
- Use AdaBoost to efficiently choose the best features
  - Add a new $h_i(x)$ at each round
  - Each $h_i(x_k)$ is a “decision stump” taking the value $a = E_w(y\,[x < \theta])$ below the threshold $\theta$ and $b = E_w(y\,[x > \theta])$ above it
- [Figure: decision stump $h_i(x)$ plotted against $x$]

Constructing the Classifier
For each round of boosting:
- Evaluate each rectangle filter on each example
- Sort examples by filter values
- Select the best threshold for each filter (minimum error); sorting makes it quick to scan for the optimal threshold
- Select the best filter/threshold combination
- The weight is a simple function of the error rate
- Reweight the examples
(There are many tricks to make this more efficient.)

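AdaBoost using single rectangular feature
- Given example images $(x_1, y_1), \dots, (x_n, y_n)$, where $y_i = 0, 1$ for negative and positive examples respectively
- Initialize weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for $y_i = 0, 1$ respectively, where $m$ and $l$ are the numbers of negative and positive examples
- For $t = 1, \dots, T$:
  - Normalize the weights: $w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}}$
  - Select the best weak classifier with respect to the weighted error: $\epsilon_t = \min_{f, p, \theta} \sum_i w_{t,i}\,|h(x_i, f, p, \theta) - y_i|$
  - Define $h_t(x) = h(x, f_t, p_t, \theta_t)$, where $f_t$, $p_t$, and $\theta_t$ are the parameters minimizing $\epsilon_t$
  - Update the weights: $w_{t+1,i} = w_{t,i}\,\beta_t^{1 - e_i}$, where $e_i = 0$ if example $x_i$ is classified correctly, $e_i = 1$ otherwise, and $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$

AdaBoost using single rectangular feature
- The final strong classifier: $C(x) = 1$ if $\sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t$, and 0 otherwise, where $\alpha_t = \log \frac{1}{\beta_t}$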

Good Reference on Boosting
- Friedman, J., Hastie, T. and Tibshirani, R. Additive Logistic Regression: a Statistical View of Boosting. http://www-stat.stanford.edu/~hastie/Papers/boost.ps
- “We show that boosting fits an additive logistic regression model by stagewise optimization of a criterion very similar to the log-likelihood, and present likelihood based alternatives. We also propose a multi-logit boosting procedure which appears to have advantages over other methods proposed so far.”

Learning Discussion
- The set of weak classifiers is extraordinarily large
  - One weak classifier for each distinct feature/threshold combination
  - KN weak classifiers (K: the number of features; N: the number of examples)
- Others have larger classifier sets
- Wrapper method, selecting M weak classifiers: O(MNKN), about $10^{16}$ operations
- AdaBoost: O(MKN), about $10^{11}$ operations

Learning Discussion
- Dependency on N?
  - Suppose that the examples are sorted by a given feature value.
  - Any two thresholds that lie between the same pair of sorted examples are equivalent.
  - Therefore the total number of distinct thresholds is N.
- For each feature, sort the examples by feature value
- Compute the optimal threshold for that feature in a single pass over this sorted list
- For each element in the list, compute:
  - $T^+$: the total sum of positive example weights
  - $T^-$: the total sum of negative example weights
  - $S^+$: the sum of positive weights below the current example
  - $S^-$: the sum of negative weights below the current example

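Learning Discussion
- The error of a threshold which splits the sorted list between the current and the previous example: $e = \min\left(S^+ + (T^- - S^-),\; S^- + (T^+ - S^+)\right)$, where $T^+$ and $T^-$ are the total positive and negative weights and $S^+$ and $S^-$ are the positive and negative weights below the current example
- The final application demanded a very aggressive process which would discard the vast majority of features.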
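The single-pass threshold search can be sketched as follows. This is an illustration under stated assumptions: labels are 1 for faces and 0 for non-faces, the stump predicts a face when polarity * f < polarity * threshold, candidate thresholds are placed midway between distinct neighbouring feature values, and the input is non-empty. The function name is hypothetical.

```python
def best_threshold(values, labels, weights):
    """Return (error, threshold, polarity) for one feature in one sorted pass."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    t_pos = sum(weights[i] for i in order if labels[i] == 1)   # T+
    t_neg = sum(weights[i] for i in order if labels[i] == 0)   # T-
    s_pos = s_neg = 0.0                                        # S+ and S-
    best = (float("inf"), 0.0, 1)
    n = len(order)
    for k in range(n + 1):
        # The candidate threshold sits between sorted positions k-1 and k.
        if k == 0:
            theta = values[order[0]] - 1.0
        elif k == n:
            theta = values[order[-1]] + 1.0
        else:
            lo, hi = values[order[k - 1]], values[order[k]]
            theta = (lo + hi) / 2.0 if lo != hi else None   # skip ties
        if theta is not None:
            # Polarity +1 calls everything below theta a face.
            err_faces_below = s_neg + (t_pos - s_pos)
            # Polarity -1 calls everything above theta a face.
            err_faces_above = s_pos + (t_neg - s_neg)
            for err, polarity in ((err_faces_below, 1), (err_faces_above, -1)):
                if err < best[0]:
                    best = (err, theta, polarity)
        if k < n:
            i = order[k]
            if labels[i] == 1:
                s_pos += weights[i]
            else:
                s_neg += weights[i]
    return best

# Hypothetical feature values: faces tend to have low responses here.
values = [0.2, 0.9, 0.4, 0.7, 0.1]
labels = [1, 0, 1, 0, 1]
weights = [0.2] * 5
error, theta, polarity = best_threshold(values, labels, weights)
```

After the sort, each feature costs one linear scan, which is where the O(MKN) figure for AdaBoost comes from.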

Other feature selection
- Papageorgiou et al. 1998
  - Feature selection based on feature variance
  - 37 features out of 1734 features for every image sub-window: still large
- Roth et al. 2000
  - Feature selection process based on the Winnow exponential perceptron learning rule
  - A very large and unusual feature set: each pixel is mapped into a binary vector of d dimensions
  - Concatenate all n pixels into an nd-dimensional vector
  - Perceptron: assign a weight to each dimension
  - The Winnow learning process converges to a solution where many of the weights are zero
  - A very large number of features is retained (perhaps a few hundred or thousand)

Learning Results
- The classifier constructed from 200 features yields reasonable results, with a false positive rate of 1 in 14084
- For a face detector to be practical for real applications, the false positive rate must be closer to 1 in 1,000,000.

Learning Results
- Features selected by AdaBoost are meaningful and easily interpreted
- In terms of detection: results are compelling but not sufficient for many real-world tasks
- In terms of computation: very fast, requiring 0.7 seconds to scan a 384 by 288 pixel image

Attentional Cascade
- Achieves increased detection performance while radically reducing computation time
- Construct boosted classifiers that
  - reject many of the negative sub-windows
  - detect almost all positive instances
- Adjust the strong classifier threshold to minimize false negatives (a lower threshold)
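The early-rejection behaviour can be sketched as below. The stage layout (a dict with "stumps" and "threshold"), the precomputed feature values, and all numbers are hypothetical and chosen only to illustrate the control flow; a real detector would evaluate features lazily from the integral image.

```python
def stage_passes(feature_values, stage):
    """Weighted vote of the stage's stumps compared against the stage threshold."""
    score = 0.0
    for index, theta, polarity, alpha in stage["stumps"]:
        f = feature_values[index]
        if polarity * f < polarity * theta:   # the weak classifier fires
            score += alpha
    return score >= stage["threshold"]

def cascade_classify(feature_values, stages):
    """Return (is_face, number_of_stages_evaluated)."""
    for depth, stage in enumerate(stages, start=1):
        if not stage_passes(feature_values, stage):
            return False, depth               # rejected: later stages never run
    return True, len(stages)

# Hypothetical two-stage cascade; each stump is (feature index, theta, polarity, alpha).
stages = [
    {"stumps": [(0, 0.5, 1, 1.0)], "threshold": 0.5},
    {"stumps": [(1, 0.3, 1, 0.8), (2, 0.6, -1, 0.6)], "threshold": 0.7},
]
rejected_early = cascade_classify([0.9, 0.1, 0.9], stages)
accepted = cascade_classify([0.1, 0.2, 0.9], stages)
rejected_late = cascade_classify([0.1, 0.5, 0.2], stages)
```

Lowering a stage's threshold lets more sub-windows through, which reduces false negatives at the price of more false positives for later stages to remove.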

Attentional Cascade
[Figure label: Further processing]
Evaluating a classifier on a sub-window involves three steps:
1. Evaluate the rectangle features (requires between 6 and 9 array references per feature).
2. Compute the weak classifier for each feature (requires one threshold operation per feature).
3. Combine the weak classifiers (requires one multiply per feature, an addition, and finally a threshold).

Attentional Cascade
- Subsequent classifiers

Trading speed for accuracy
- Given a nested set of classifier hypothesis classes
- Computational Risk Minimization

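Training a Cascade of Classifiers
- Detection goals
  - Good detection rates (85% to 95%)
  - Extremely low false positive rates (on the order of $10^{-5}$ or $10^{-6}$)
- False positive rate of the cascade: $F = \prod_{i=1}^{K} f_i$, where $f_i$ is the false positive rate of the ith stage
- Detection rate of the cascade: $D = \prod_{i=1}^{K} d_i$, where $d_i$ is the detection rate of the ith stage
- To achieve a detection rate of 0.9 with a 10-stage classifier, each stage needs a detection rate of 0.99 ($0.99^{10} \approx 0.9$), while its false positive rate can be as high as 30% ($0.30^{10} \approx 6 \times 10^{-6}$)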
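The arithmetic on this slide is easy to reproduce. The helper below simply multiplies the per-stage rates, treating the stages as independent in the same way the slide does; the function name is an assumption.

```python
def cascade_rates(stage_detection_rates, stage_false_positive_rates):
    """Overall detection and false positive rates as products of per-stage rates."""
    detection = 1.0
    false_positive = 1.0
    for d, f in zip(stage_detection_rates, stage_false_positive_rates):
        detection *= d
        false_positive *= f
    return detection, false_positive

# Ten stages, each with a 0.99 detection rate and a 0.30 false positive rate.
D, F = cascade_rates([0.99] * 10, [0.30] * 10)
```

With these inputs D comes out slightly above 0.90 and F slightly below 6e-6, matching the figures quoted above.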

Training a Cascade of Classifiers
- The expected number of features evaluated per sub-window: $N = n_0 + \sum_{i=1}^{K} \left( n_i \prod_{j<i} p_j \right)$, where $p_i$ is the positive rate of the ith classifier and $n_i$ is the number of features in the ith classifier
- One scheme for trading off these errors is to adjust the threshold of the perceptron produced by AdaBoost
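The expected feature count can be computed by tracking the probability that a sub-window reaches each stage. In the sketch below the stage sizes and pass rates are hypothetical inputs, and the function name is an assumption.

```python
def expected_features(features_per_stage, positive_rates):
    """Expected number of features evaluated per sub-window."""
    expected = 0.0
    reach_probability = 1.0   # probability that a sub-window reaches this stage
    for n_i, p_i in zip(features_per_stage, positive_rates):
        expected += n_i * reach_probability
        reach_probability *= p_i          # only sub-windows that pass move on
    return expected

# Hypothetical three-stage cascade: 2, 10, and 25 features,
# passing 50%, 20%, and 10% of the sub-windows that reach each stage.
average = expected_features([2, 10, 25], [0.5, 0.2, 0.1])
```

Here the average is 2 + 10(0.5) + 25(0.5)(0.2) = 9.5 features, even though the three stages hold 37 features in total.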

Tradeoffs in Training
- Classifiers with more features
  - achieve higher detection rates and lower false positive rates
  - require more time to compute
- An optimization framework in which
  - the number of classifier stages,
  - the number of features, $n_i$, of each stage, and
  - the threshold of each stage
  are traded off in order to minimize the expected number of features N given a target for F and D
- Finding this optimum is a tremendously difficult problem.

Training Cascaded Detector
- A simple framework to produce an effective and efficient classifier
- The user selects the maximum acceptable rate for $f_i$ and the minimum acceptable rate for $d_i$.
- Each layer of the cascade is trained by AdaBoost, with the number of features used being increased until the target detection and false positive rates are met for this level.
- The rates are determined by testing the current detector on a validation set.
- If the overall target false positive rate is not yet met, then another layer is added to the cascade.
- The negative set for training subsequent layers is obtained by collecting all false detections found by running the current detector on a set of images which do not contain any instances of faces.

Training Cascaded Detector
- User selects values for f, the maximum acceptable false positive rate per layer, and d, the minimum acceptable detection rate per layer.
- User selects the target overall false positive rate, F_target.
- P = set of positive examples, N = set of negative examples
- F_0 = 1.0; D_0 = 1.0; i = 0
- while F_i > F_target
  - i ← i + 1
  - n_i = 0; F_i = F_{i−1}
  - while F_i > f × F_{i−1}
    - n_i ← n_i + 1
    - Use P and N to train a classifier with n_i features using AdaBoost
    - Evaluate the current cascaded classifier on the validation set to determine F_i and D_i
    - Decrease the threshold for the ith classifier until the current cascaded classifier has a detection rate of at least d × D_{i−1} (this also affects F_i)
  - N ← ∅
  - If F_i > F_target, evaluate the current cascaded detector on the set of non-face images and put any false detections into the set N

Simple Experiment
- A monolithic 200-feature classifier and a cascade of ten 20-feature classifiers
- Training using 5000 faces + 10000 non-face sub-windows

Simple Experiment
- A monolithic 200-feature classifier and a cascade of ten 20-feature classifiers
- Training using 5000 faces + 10000 non-face sub-windows
- Little difference between them in terms of accuracy
- But the cascaded classifier is nearly 10 times faster, since its first stage throws out most non-faces so that they are never evaluated by subsequent stages.

Detector Cascade Discussion
- Similar to the fast version of Rowley et al. (1998)
  - Trained two neural networks
  - One was moderately complex: it focused on a small region of the image and detected faces with a low false positive rate.
  - The second neural network was much faster: it focused on a larger region of the image and detected faces with a higher false positive rate.
- Rowley et al. use a two-stage cascade; this method includes 38 stages

Training Dataset
- 4916 hand-labeled faces, scaled and aligned to a base resolution of 24 by 24 pixels.

Structure of the Detector Cascade
- The 38-layer cascade of classifiers includes a total of 6060 features
- The first classifier is constructed using two features
  - rejects about 50% of non-faces while
  - correctly detecting close to 100% of faces
- The next classifier has ten features
  - rejects 80% of non-faces while
  - detecting almost 100% of faces
- The next two layers are 25-feature classifiers
- Then three 50-feature classifiers
- Then classifiers with a variety of different numbers of features, chosen according to the training algorithm

Speed of Face Detector
- Speed is proportional to the average number of features computed per sub-window.
- On the MIT+CMU test set, an average of 9 features (out of 6061) are computed per sub-window.
- On a 700 MHz Pentium III, a 384x288 pixel image takes about 0.067 seconds to process (15 fps).
- Roughly 15 times faster than Rowley-Baluja-Kanade and 600 times faster than Schneiderman-Kanade.

Scanning The Detector
- Multiple scales
  - Scaling is achieved by scaling the detector itself, rather than scaling the image
  - The features can be evaluated at any scale with the same cost
- Locations
  - Subsequent locations are obtained by shifting the window some number of pixels Δ
  - The choice of Δ affects both speed and accuracy
  - A step size greater than 1 pixel tends to decrease the detection rate slightly while also decreasing the number of false positives
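The scanning scheme can be sketched as a generator of sub-windows. The sketch assumes a 24-pixel base window, a scale factor of 1.25 per step, and a shift that grows in proportion to the current scale; that last choice, the rounding, and the function name are assumptions made for this example.

```python
def scan_windows(image_w, image_h, base=24, scale_step=1.25, shift=1.0):
    """Yield (x, y, size) for every sub-window at every scale."""
    scale = 1.0
    while True:
        size = int(round(base * scale))
        if size > image_w or size > image_h:
            break                                    # detector no longer fits
        step = max(1, int(round(shift * scale)))     # assumed: shift scales too
        for y in range(0, image_h - size + 1, step):
            for x in range(0, image_w - size + 1, step):
                yield x, y, size
        scale *= scale_step                          # scale the detector, not the image

windows = list(scan_windows(384, 288))
sizes = sorted({size for _, _, size in windows})
```

For a 384 × 288 image this produces 12 distinct window sizes, from 24 up to 279 pixels, consistent with the 12 scales mentioned earlier in the deck.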

Integration of Multiple Detections
- Postprocess: combine overlapping detections into a single detection
  - The set of detections is first partitioned into disjoint subsets
  - Two detections are in the same subset if their bounding regions overlap
  - Each partition yields a single final detection
  - The corners of the final bounding region are the average of the corners of all detections in the set
- Decreases the number of false positives.
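One way to realize this postprocessing step is with a small union-find over the detections. The sketch assumes axis-aligned boxes given as (x0, y0, x1, y1) and treats overlap transitively so that the subsets are disjoint; both the box format and that reading of the rule are assumptions made for this example.

```python
def overlaps(a, b):
    """True if two (x0, y0, x1, y1) boxes share any area."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

def merge_detections(boxes):
    """Group overlapping boxes and average the corners of each group."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)   # put both boxes in one subset

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)

    merged = []
    for members in groups.values():
        n = len(members)
        merged.append(tuple(sum(b[k] for b in members) / n for k in range(4)))
    return merged

# Hypothetical raw detections: two overlapping boxes and one isolated box.
raw = [(0, 0, 10, 10), (2, 2, 12, 12), (50, 50, 60, 60)]
final = sorted(merge_detections(raw))
```

The two overlapping boxes collapse into one detection with averaged corners, while the isolated box passes through unchanged.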

Integration of Multiple Detections
- A simple voting scheme further improves results
  - Three detectors performed similarly on the final task, but in some cases their errors were different.
  - Retain only those detections where at least 2 out of 3 detectors agree.
  - This improves the final detection rate as well as eliminating more false positives.
  - Since detector errors are not uncorrelated, the combination results in a measurable, but modest, improvement over the best single detector.

Sample results
- MIT + CMU test set

Failure Cases
- Trained on frontal, upright faces.
  - The faces were only very roughly aligned, so there is some variation in rotation both in plane and out of plane.
  - Detects faces that are tilted up to about ±15 degrees in plane and about ±45 degrees out of plane (toward a profile view).
  - The detector becomes unreliable with more rotation.
- Harsh backlighting, in which the faces are very dark while the background is relatively light, sometimes causes failures.
  - A possible remedy is nonlinear variance normalization based on robust statistics to remove outliers.
  - The problem with such a normalization is the greatly increased computational cost within our integral image framework.
- Fails on significantly occluded faces.
  - Occluded eyes: usually fails.
  - A face with a covered mouth will usually still be detected.

Summary (Viola-Jones)
- Fastest known face detector for gray-scale images
- Three contributions with broad applicability:
  - Cascaded classifier yields rapid classification
  - AdaBoost as an extremely efficient feature selector
  - Rectangle features + integral image can be used for rapid image analysis

Face detector comparison
- Informal study by Andrew Gallagher, CMU, for CMU 16-721 Learning-Based Methods in Vision, Spring 2007
- The OpenCV implementation of the Viola-Jones algorithm was used (under 2 seconds per image).
- For Schneiderman and Kanade, Object Detection Using the Statistics of Parts [IJCV’04], the www.pittpatt.com demo was used (~10-15 seconds per image, including web transmission).

[Figure: detection results compared side by side, Schneiderman-Kanade vs. Viola-Jones]
