Published by Alberta Bridges. Modified over 9 years ago.
1
Robust real-time face detection Paul A. Viola and Michael J. Jones Intl. J. Computer Vision 57(2), 137–154, 2004 (originally in CVPR’2001) (slides adapted from Bill Freeman, MIT 6.869, April 2005) Robust Real-Time Face Detection1
2
Scan classifier over locations and scales
3
“Learn” classifier from data
Training data:
  5000 faces (frontal)
  10^8 non-faces
  Faces are normalized for scale and translation
Many variations:
  Across individuals
  Illumination
  Pose (rotation both in plane and out of plane)
4
Characteristics of Algorithm
Feature set is huge (about 16M features)
Efficient feature selection using AdaBoost
New image representation: Integral Image
Cascaded Classifier for rapid detection
Fastest known frontal face detector for gray scale images
5
Integral Image
Allows for fast feature evaluation.
The detector does not work directly on image intensities; it uses features reminiscent of Haar basis functions.
The integral image is computed using a few operations per pixel.
6
Simple and Efficient Classifier
Select a small number of important features from a huge library of potential features using AdaBoost [Freund and Schapire, 1995].
7
AdaBoost, Adaptive Boosting
Formulated by Yoav Freund and Robert Schapire [1].
It is a meta-algorithm and can be used in conjunction with many other learning algorithms to improve their performance.
AdaBoost is adaptive: subsequent classifiers are tweaked in favor of instances misclassified by previous classifiers.
Sensitive to noisy data and outliers.
Less susceptible to the overfitting problem than most algorithms in some problems.
Calls a weak classifier repeatedly in a series of rounds t = 1, ..., T.
For each call, a distribution of weights D_t is updated that indicates the importance of the examples in the data set.
On each round, the weights of each incorrectly classified example are increased (or, alternatively, the weights of each correctly classified example are decreased), so the new classifier focuses more on those examples.
8
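AdaBoost
Given $(x_1, y_1), \ldots, (x_m, y_m)$ with $x_i \in X$ and $y_i \in \{-1, +1\}$.
Initialize $D_1(i) = 1/m$ for $i = 1, \ldots, m$.
For $t = 1, \ldots, T$:
  Find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the error with respect to the distribution $D_t$: $h_t = \arg\min_{h_j} \epsilon_j$, where $\epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \ne h_j(x_i)]$; $\epsilon_t$ is the weighted error rate of classifier $h_t$.
  If $\epsilon_t \ge 0.5$, then stop.
  Choose $\alpha_t \in \mathbb{R}$, typically $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$.
  Update $D_{t+1}(i) = D_t(i) \exp(-\alpha_t y_i h_t(x_i)) / Z_t$, where $Z_t$ is a normalization factor (chosen so that $\sum_i D_{t+1}(i) = 1$).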
9
AdaBoost
Output the final classifier $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$.
The equation to update the distribution $D_t$ is constructed so that $\exp(-\alpha_t y_i h_t(x_i)) < 1$ when $y_i = h_t(x_i)$ and $> 1$ when $y_i \ne h_t(x_i)$.
After selecting an optimal classifier $h_t$ for the distribution $D_t$:
  examples that the classifier identified correctly are weighted less;
  examples that it identified incorrectly are weighted more.
When the algorithm tests the classifiers on the distribution $D_{t+1}$, it will select a classifier that better identifies those examples that the previous classifier missed.
10
Characteristics of Algorithm
Feature set is huge (about 16M features)
Efficient feature selection using AdaBoost
New image representation: Integral Image
Cascaded Classifier for rapid detection
11
Cascaded Classifier
Combining successively more complex classifiers in a cascade structure dramatically increases the speed of the detector by focusing attention on promising regions of the image.
Focus of attention approaches:
  It is often possible to rapidly determine where in an image a face might occur (Tsotsos et al., 1995; Itti et al., 1998; Amit and Geman, 1999; Fleuret and Geman, 2001).
  More complex processing is reserved only for these promising regions.
  The key measure of such an approach is the “false negative” rate of the attentional process.
12
Cascaded Classifier
Training process: an extremely simple and efficient classifier is used as a “supervised” focus of attention operator.
A face detection attentional operator can:
  filter out over 50% of the image
  while preserving 99% of the faces over a large dataset.
This filter is exceedingly efficient: it can be evaluated in 20 simple operations per location/scale.
13
Overview
Features: form and computing
Combining features to form a classifier: AdaBoost
Constructing a cascade of classifiers
Experimental results
Discussions
14
Features
Using features rather than image pixels:
  Features act to encode ad-hoc domain knowledge that is difficult to learn using a finite quantity of training data.
  A feature-based system is much faster than a pixel-based system.
15
Image features
“Rectangle filters” [Papageorgiou et al. 1998]
Similar to Haar wavelets
Differences between sums of pixels in adjacent rectangles
About 160,000 rectangle features for a 24x24 sub-window
16
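Integral Image
Partial sum: $ii(x, y) = \sum_{x' \le x,\, y' \le y} i(x', y')$, where $i$ is the original image.
The sum over any rectangle D is obtained from the integral image values at its four corners: D = 1 + 4 - (2 + 3).
Also known as: summed area tables [Crow84], boxlets [Simard98].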
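The partial-sum idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`integral_image`, `rect_sum`, `two_rect_feature`), the zero-padded layout, and the sign convention of the two-rectangle feature are assumptions made here for clarity.

```python
def integral_image(img):
    """Return ii padded with one row/column of zeros, so that
    ii[y][x] = sum of img[r][c] for all r < y and c < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # cumulative sum along the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left pixel (x, y),
    using four array references: D = 4 + 1 - (2 + 3)."""
    return ii[y + h][x + w] + ii[y][x] - ii[y][x + w] - ii[y + h][x]


def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle feature: left half minus right half
    (the sign convention is an assumption of this sketch)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 3, 3)          # whole image
lower_right = rect_sum(ii, 1, 1, 2, 2)    # 5 + 6 + 8 + 9
edge = two_rect_feature(ii, 0, 0, 2, 3)   # column 0 minus column 1
```

Once `ii` is built, the cost of `rect_sum` does not depend on the rectangle size, which is why a feature can be evaluated at any scale for the same cost.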
17
Huge library of filters
18
Feature Discussion
Rectangle features are primitive when compared with steerable filters, etc.
  Steerable filters are excellent for the detailed analysis of boundaries, image compression, and texture analysis.
Rectangle features are sensitive to the presence of edges, bars, and other simple image structure.
They are quite coarse: only three orientations (vertical, diagonal, horizontal).
The set is overcomplete (about 400 times), varying in aspect ratio and location.
19
Computational Advantage
The face detector scans the input at many scales.
  At the base scale, faces are detected at a size of 24 × 24 pixels.
  A 384 × 288 pixel image is scanned at 12 scales, each a factor of 1.25 larger than the last.
The conventional approach:
  Compute a pyramid of 12 images (smaller and smaller images).
  Scan a fixed-scale detector across each image.
  Computing the pyramid directly requires significant time: implemented efficiently on conventional hardware (using bilinear interpolation to scale each level of the pyramid), it takes around 0.05 seconds to compute a 12-level pyramid of this size on an Intel PIII 700 MHz processor.
20
Computational Advantage
Define a meaningful set of rectangle features.
A single feature can be evaluated at any scale and location in a few operations.
Effective detectors can be constructed with as few as two rectangle features.
Given the computational efficiency of these features, face detection can be completed for an entire image at every scale at 15 frames per second, about the same time required to compute the 12-level image pyramid alone.
21
Learning Classification Functions
Given the feature set and a training set, any machine learning method could be used:
  Mixture of Gaussians model (Sung and Poggio, 1998)
  Simple image features and a neural network (Rowley et al. 1998)
  Support Vector Machine (Osuna et al. 1997b)
  Winnow learning procedure (Roth et al. 2000)
There are 160,000 features. Even though each feature can be computed very efficiently, computing the complete set is prohibitively expensive.
22
AdaBoost
A very small number of features can be combined to form an effective classifier.
Boosting improves classification performance by combining a collection of weak classification functions to form a stronger classifier.
Weak learner: do not expect even the best classification function to classify the training data well.
After the first round of learning, the examples are re-weighted in order to emphasize those which were incorrectly classified by the previous weak classifier.
The final strong classifier takes the form of a perceptron: a weighted combination of weak classifiers followed by a threshold.
Training error of the strong classifier approaches zero exponentially in the number of rounds.
23
AdaBoost
Select a small set of good classification functions (effective features) which nevertheless have significant variety.
Restrict the weak learner to classification functions that each depend on a single feature.
Select the single rectangle feature which best separates the positive and negative examples.
Weak classifier: $h(x, f, p, \theta) = 1$ if $p f(x) < p \theta$, and $0$ otherwise, where $x$ is a 24x24 sub-window, $f$ is the feature, $\theta$ is the threshold, and $p$ is the polarity indicating the direction of the inequality.
24
AdaBoost
No single feature can perform the classification task with low error.
  Features selected early: error rates 0.1 to 0.3
  Features selected later: error rates 0.4 to 0.5
Thresholded single features are single-node decision trees, also called decision stumps.
25
Constructing the classifier
A perceptron yields a sufficiently powerful classifier.
Use AdaBoost to efficiently choose the best features:
  add a new $h_i(x)$ at each round;
  each $h_i(x_k)$ is a “decision stump” that takes the value $a$ for $x < \theta$ and $b$ for $x > \theta$, with $a = E_w(y\,[x < \theta])$ and $b = E_w(y\,[x > \theta])$.
[Figure: plot of the decision stump $h_i(x)$ against $x$]
26
Constructing the Classifier
For each round of boosting:
  Evaluate each rectangle filter on each example.
  Sort examples by filter values.
  Select the best threshold for each filter (minimum error); use the sorting to quickly scan for the optimal threshold.
  Select the best filter/threshold combination.
  The weight is a simple function of the error rate.
  Reweight the examples.
(There are many tricks to make this more efficient.)
27
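AdaBoost using single rectangular feature
Given example images $(x_1, y_1), \ldots, (x_n, y_n)$, where $y_i = 0, 1$ for negative and positive examples respectively.
Initialize weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for $y_i = 0, 1$ respectively, where $m$ and $l$ are the numbers of negative and positive examples.
For $t = 1, \ldots, T$:
  1. Normalize the weights: $w_{t,i} \leftarrow w_{t,i} / \sum_{j=1}^{n} w_{t,j}$.
  2. Select the best weak classifier with respect to the weighted error: $\epsilon_t = \min_{f, p, \theta} \sum_i w_{t,i}\, |h(x_i, f, p, \theta) - y_i|$.
  3. Define $h_t(x) = h(x, f_t, p_t, \theta_t)$, where $f_t$, $p_t$, and $\theta_t$ are the parameters minimizing $\epsilon_t$.
  4. Update the weights: $w_{t+1,i} = w_{t,i}\, \beta_t^{1 - e_i}$, where $e_i = 0$ if example $x_i$ is classified correctly, $e_i = 1$ otherwise, and $\beta_t = \epsilon_t / (1 - \epsilon_t)$.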
28
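AdaBoost using single rectangular feature
The final strong classifier: $C(x) = 1$ if $\sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t$, and $0$ otherwise, where $\alpha_t = \log(1/\beta_t)$, $\beta_t = \epsilon_t / (1 - \epsilon_t)$, and $\epsilon_t$ is the weighted error of the weak classifier $h_t$.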
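A compact Python sketch of this boosting loop with single-feature threshold classifiers may help. It is illustrative only: the names `train_adaboost` and `strong_predict` are hypothetical, features are assumed to be precomputed into a table, the threshold search is brute force rather than the faster sorted scan, and the clamp on the error is a numerical guard that is not part of the slides.

```python
import math


def stump_predict(value, polarity, theta):
    """Weak classifier: 1 if polarity * value < polarity * theta, else 0."""
    return 1 if polarity * value < polarity * theta else 0


def train_adaboost(features, labels, rounds):
    """features[j][i] is feature j on example i; labels are 0/1.
    Returns a list of (alpha, feature_index, polarity, theta)."""
    n_pos, n_neg = labels.count(1), labels.count(0)
    # Initial weights: 1/(2l) for positives, 1/(2m) for negatives.
    w = [1.0 / (2 * n_pos) if y == 1 else 1.0 / (2 * n_neg) for y in labels]
    strong = []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]  # normalize the weights
        best = None
        # Brute-force search over feature, threshold and polarity.
        for j, vals in enumerate(features):
            for theta in sorted(set(vals)):
                for p in (+1, -1):
                    err = sum(wi for wi, v, y in zip(w, vals, labels)
                              if stump_predict(v, p, theta) != y)
                    if best is None or err < best[0]:
                        best = (err, j, p, theta)
        err, j, p, theta = best
        # Assumption of this sketch: clamp to avoid log(0) on separable data.
        err = min(max(err, 1e-10), 1 - 1e-10)
        beta = err / (1.0 - err)
        # Correctly classified examples are down-weighted by beta.
        w = [wi * (beta if stump_predict(v, p, theta) == y else 1.0)
             for wi, v, y in zip(w, features[j], labels)]
        strong.append((math.log(1.0 / beta), j, p, theta))
    return strong


def strong_predict(strong, x):
    """x[j] is the value of feature j on one example."""
    score = sum(a * stump_predict(x[j], p, theta) for a, j, p, theta in strong)
    return 1 if score >= 0.5 * sum(a for a, _, _, _ in strong) else 0


# Toy data: feature 0 separates the classes, feature 1 does not.
features = [[1, 2, 3, 4, 5, 6], [5, 1, 4, 2, 6, 3]]
labels = [1, 1, 1, 0, 0, 0]
strong = train_adaboost(features, labels, rounds=2)
preds = [strong_predict(strong, [features[0][i], features[1][i]])
         for i in range(len(labels))]
```

On this toy data the first round already picks feature 0 with threshold 4; real rectangle features rarely separate the classes this cleanly, which is why many rounds are needed.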
29
Good Reference on Boosting
Friedman, J., Hastie, T. and Tibshirani, R. Additive Logistic Regression: a Statistical View of Boosting. http://www-stat.stanford.edu/~hastie/Papers/boost.ps
“We show that boosting fits an additive logistic regression model by stagewise optimization of a criterion very similar to the log-likelihood, and present likelihood based alternatives. We also propose a multi-logit boosting procedure which appears to have advantages over other methods proposed so far.”
30
Learning Discussion
The set of weak classifiers is extraordinarily large.
  One weak classifier for each distinct feature/threshold combination.
  KN weak classifiers, where K is the number of features and N is the number of examples.
  Others have larger classifier sets.
Cost of selecting M weak classifiers:
  Wrapper method: O(MNKN), about 10^16 operations.
  AdaBoost: O(MKN), about 10^11 operations.
31
Learning Discussion
Dependency on N?
Suppose that the examples are sorted by a given feature value. Any two thresholds that lie between the same pair of sorted examples are equivalent, so the total number of distinct thresholds is N.
For each feature, sort the examples based on feature value, then compute the optimal threshold for that feature in a single pass over this sorted list.
For each element in the list, compute:
  T+, the total sum of positive example weights
  T-, the total sum of negative example weights
  S+, the sum of positive weights below the current example
  S-, the sum of negative weights below the current example
32
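Learning Discussion
Error of a threshold that splits the sorted list just before the current example: $e = \min\left(S^+ + (T^- - S^-),\; S^- + (T^+ - S^+)\right)$, the smaller of the error of labeling everything below the current example negative and everything above positive, and the error of the converse.
The final application demanded a very aggressive process which would discard the vast majority of features.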
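The single-pass threshold search can be sketched as follows. This is a hedged illustration: `best_stump_threshold` is a hypothetical name, and the polarity convention at the threshold itself (polarity -1 predicts a face when the value is greater than or equal to the threshold) is an assumption chosen so that "below" means strictly smaller values.

```python
def best_stump_threshold(values, labels, weights):
    """One pass over examples sorted by feature value.

    Returns (threshold, polarity, error), where
    polarity +1 predicts 1 when value <  threshold and
    polarity -1 predicts 1 when value >= threshold.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    t_pos = sum(w for w, y in zip(weights, labels) if y == 1)  # T+
    t_neg = sum(w for w, y in zip(weights, labels) if y == 0)  # T-
    s_pos = s_neg = 0.0  # S+, S-: weight strictly below the current example
    best = (None, 0, float("inf"))
    prev = None
    for i in order:
        v = values[i]
        if v != prev:  # equal values cannot be split by a threshold
            err_below_neg = s_pos + (t_neg - s_neg)  # below -> 0, above -> 1
            err_below_pos = s_neg + (t_pos - s_pos)  # below -> 1, above -> 0
            if err_below_neg < best[2]:
                best = (v, -1, err_below_neg)
            if err_below_pos < best[2]:
                best = (v, +1, err_below_pos)
            prev = v
        # Move the current example into the "below" sums.
        if labels[i] == 1:
            s_pos += weights[i]
        else:
            s_neg += weights[i]
    return best


values = [0.1, 0.4, 0.35, 0.8]
labels = [1, 0, 1, 0]
weights = [0.25, 0.25, 0.25, 0.25]
best = best_stump_threshold(values, labels, weights)
```

After the sort, each feature costs one linear pass, so a boosting round touches every feature/example pair only once.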
33
Other feature selection
Papageorgiou et al. 1998:
  Feature selection based on feature variance.
  37 features out of 1734 features for every image sub-window: still large.
Roth et al. 2000:
  Feature selection process based on the Winnow exponential perceptron learning rule.
  A very large and unusual feature set: each pixel is mapped into a binary vector of d dimensions, and all n pixels are concatenated into an nd-dimensional vector.
  Perceptron: assigns a weight to each dimension.
  Winnow learning process: converges to a solution where many of the weights are zero.
  A very large number of features are retained (perhaps a few hundred or thousand).
34
Learning Results
The classifier constructed from 200 features yields reasonable results: a detection rate of 95% at a false positive rate of 1 in 14,084.
For a face detector to be practical for real applications, the false positive rate must be closer to 1 in 1,000,000.
35
Learning Results
Features selected by AdaBoost are meaningful and easily interpreted.
In terms of detection: results are compelling but not sufficient for many real-world tasks.
In terms of computation: very fast, requiring 0.7 seconds to scan a 384 by 288 pixel image.
36
Attentional Cascade
Achieves increased detection performance while radically reducing computation time.
Construct boosted classifiers that:
  reject many of the negative sub-windows
  while detecting almost all positive instances.
Adjust the strong classifier threshold to minimize false negatives: use a lower threshold.
37
Attentional Cascade
Further processing:
  1. Evaluate the rectangle features (requires between 6 and 9 array references per feature).
  2. Compute the weak classifier for each feature (requires one threshold operation per feature).
  3. Combine the weak classifiers (requires one multiply per feature, an addition, and finally a threshold).
38
Attentional Cascade
Subsequent classifiers
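The early-rejection behaviour of the cascade can be sketched like this. The data layout (a list of stages, each a list of weak classifiers plus a stage threshold) and the name `cascade_classify` are assumptions of this illustration; feature values are taken as already computed for one sub-window.

```python
def cascade_classify(stages, feature_values):
    """stages: list of (weak_classifiers, stage_threshold), where each weak
    classifier is (alpha, feature_index, polarity, theta).
    Returns (is_face, number_of_features_evaluated)."""
    evaluated = 0
    for weak_classifiers, stage_threshold in stages:
        score = 0.0
        for alpha, j, polarity, theta in weak_classifiers:
            evaluated += 1
            # One threshold operation, then a weighted vote.
            if polarity * feature_values[j] < polarity * theta:
                score += alpha
        if score < stage_threshold:
            return False, evaluated  # rejected: later stages never run
    return True, evaluated


# Hypothetical two-stage cascade: a 1-feature stage, then a 2-feature stage.
stages = [
    ([(1.0, 0, +1, 5.0)], 0.5),
    ([(1.0, 1, +1, 5.0), (1.0, 2, -1, 2.0)], 1.5),
]
rejected_early = cascade_classify(stages, [9.0, 1.0, 3.0])
accepted = cascade_classify(stages, [3.0, 1.0, 3.0])
```

Lowering a stage threshold lets more faces through at the price of more false positives, which later stages are then expected to remove.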
39
Trading speed for accuracy
Given a nested set of classifier hypothesis classes
Computational Risk Minimization
40
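Training a Cascade of Classifiers
Detection goals: good detection rates (85% to 95%) and extremely low false positive rates (on the order of $10^{-5}$ or $10^{-6}$).
False positive rate of the cascade: $F = \prod_{i=1}^{K} f_i$, where $K$ is the number of classifiers and $f_i$ is the false positive rate of the $i$th classifier.
Detection rate: $D = \prod_{i=1}^{K} d_i$, where $d_i$ is the detection rate of the $i$th classifier.
To achieve a detection rate of 0.9 with a 10-stage classifier, each stage needs a detection rate of 0.99 ($0.99^{10} \approx 0.9$), while a per-stage false positive rate of 30% suffices ($0.30^{10} \approx 6 \times 10^{-6}$).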
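The arithmetic behind these targets is easy to check. The snippet below simply multiplies the per-stage rates, assuming (as the product formulas do) that every stage has the same rates.

```python
stages = 10
d, f = 0.99, 0.30     # per-stage detection rate and false positive rate

D = d ** stages       # overall detection rate of the cascade
F = f ** stages       # overall false positive rate of the cascade

print(f"D = {D:.3f}, F = {F:.1e}")
```

The asymmetry is the point: each stage may pass almost a third of the non-faces, yet must keep nearly every face.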
41
Training a Cascade of Classifiers
The expected number of features evaluated: $N = n_0 + \sum_{i=1}^{K} \left( n_i \prod_{j<i} p_j \right)$, where $p_i$ is the positive rate of the $i$th classifier and $n_i$ is the number of features in the $i$th classifier.
A scheme for trading off these errors is to adjust the threshold of the perceptron produced by AdaBoost.
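A small sketch of the expected-feature count, under the assumption that stages are listed in order and each pass rate is measured on the windows reaching that stage. The function name and the example numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def expected_features(n, p):
    """n[i]: number of features in stage i.
    p[i]: fraction of sub-windows that stage i passes to the next stage."""
    expected = 0.0
    reach = 1.0  # probability that a sub-window reaches the current stage
    for n_i, p_i in zip(n, p):
        expected += reach * n_i
        reach *= p_i
    return expected


# Hypothetical three-stage cascade.
n = [2, 10, 25]
p = [0.5, 0.2, 0.1]
avg = expected_features(n, p)  # 2 + 0.5 * 10 + 0.5 * 0.2 * 25
```

Because most sub-windows are rejected early, the average is dominated by the first few stages even when later stages are large.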
42
Tradeoffs in Training
Classifiers with more features:
  achieve higher detection rates and lower false positive rates;
  require more time to compute.
An optimization framework in which the number of classifier stages, the number of features $n_i$ of each stage, and the threshold of each stage are traded off in order to minimize the expected number of features $N$, given a target for $F$ and $D$.
Finding this optimum is a tremendously difficult problem.
43
Training Cascaded Detector
A simple framework to produce an effective and efficient classifier:
  The user selects the maximum acceptable rate for $f_i$ and the minimum acceptable rate for $d_i$.
  Each layer of the cascade is trained by AdaBoost, with the number of features used being increased until the target detection and false positive rates are met for this level.
  The rates are determined by testing the current detector on a validation set.
  If the overall target false positive rate is not yet met, another layer is added to the cascade.
  The negative set for training subsequent layers is obtained by collecting all false detections found by running the current detector on a set of images which do not contain any instances of faces.
44
Training Cascaded Detector
User selects values for f, the maximum acceptable false positive rate per layer, and d, the minimum acceptable detection rate per layer.
User selects the target overall false positive rate, F_target.
P = set of positive examples, N = set of negative examples
F_0 = 1.0; D_0 = 1.0; i = 0
while F_i > F_target
  i ← i + 1
  n_i = 0; F_i = F_{i-1}
  while F_i > f × F_{i-1}
    n_i ← n_i + 1
    Use P and N to train a classifier with n_i features using AdaBoost
    Evaluate the current cascaded classifier on the validation set to determine F_i and D_i
    Decrease the threshold for the ith classifier until the current cascaded classifier has a detection rate of at least d × D_{i-1} (this also affects F_i)
  N ← ∅
  If F_i > F_target, evaluate the current cascaded detector on the set of non-face images and put any false detections into the set N
45
Simple Experiment
A monolithic 200-feature classifier versus a cascade of ten 20-feature classifiers.
Training used 5000 faces + 10000 non-face sub-windows.
47
Simple Experiment
A monolithic 200-feature classifier versus a cascade of ten 20-feature classifiers.
Training used 5000 faces + 10000 non-face sub-windows.
There is little difference between them in terms of accuracy, but the cascaded classifier is nearly 10 times faster, since its first stage throws out most non-faces so that they are never evaluated by subsequent stages.
48
Detector Cascade Discussion
Similar to Rowley et al. (1998), who trained two neural networks to make detection fast:
  one was moderately complex, focused on a small region of the image, and detected faces with a low false positive rate;
  the second was much faster, focused on a larger region of the image, and detected faces with a higher false positive rate.
Rowley et al. used a two-stage cascade; this method's cascade includes 38 stages.
49
Training Dataset
4916 hand-labeled faces, scaled and aligned to a base resolution of 24 by 24 pixels.
50
Structure of the Detector Cascade
A 38-layer cascade of classifiers including a total of 6060 features.
The first classifier, constructed using two features, rejects about 50% of non-faces while correctly detecting close to 100% of faces.
The next classifier has ten features and rejects 80% of non-faces while detecting almost 100% of faces.
The next two layers are 25-feature classifiers.
Then three 50-feature classifiers.
Then classifiers with a variety of different numbers of features chosen according …
51
Speed of Face Detector
Speed is proportional to the average number of features computed per sub-window.
On the MIT+CMU test set, an average of 9 features (out of 6060) are computed per sub-window.
On a 700 MHz Pentium III, a 384x288 pixel image takes about 0.067 seconds to process (15 fps).
Roughly 15 times faster than Rowley-Baluja-Kanade and 600 times faster than Schneiderman-Kanade.
52
Scanning The Detector
Multiple scales:
  Scaling is achieved by scaling the detector itself, rather than scaling the image.
  The features can be evaluated at any scale with the same cost.
Locations:
  Subsequent locations are obtained by shifting the window some number of pixels Δ.
  The choice of Δ affects both speed and accuracy: a step size greater than 1 pixel tends to decrease the detection rate slightly while also decreasing the number of false positives.
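The scanning loop can be sketched as a generator over sub-windows. The defaults (24-pixel base window, 12 scales, factor 1.25) come from the slides; the function name, the rounding of the window size, and the choice to grow the step with the scale as round(scale × Δ) are assumptions of this sketch.

```python
def scan_windows(img_w, img_h, base=24, scale_factor=1.25,
                 n_scales=12, delta=1.0):
    """Yield (x, y, size) for each sub-window to classify.
    The detector window is scaled; the image is never resized."""
    scale = 1.0
    for _ in range(n_scales):
        size = int(round(base * scale))
        if size > img_w or size > img_h:
            break  # the window no longer fits in the image
        step = max(1, int(round(scale * delta)))  # shift grows with the scale
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size
        scale *= scale_factor


small = list(scan_windows(26, 26, n_scales=1))
sizes = sorted({s for _, _, s in scan_windows(384, 288, delta=8.0)})
```

With the defaults, all 12 scales fit inside a 384 × 288 image, the largest window being 279 pixels on a side.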
54
Integration of Multiple Detections
Postprocess: combine overlapping detections into a single detection.
  The set of detections is first partitioned into disjoint subsets.
  Two detections are in the same subset if their bounding regions overlap.
  Each partition yields a single final detection.
  The corners of the final bounding region are the average of the corners of all detections in the set.
This decreases the number of false positives.
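One way to sketch this postprocessing step is with a small union-find over the detections. Treating overlap transitively (A overlaps B and B overlaps C puts all three in one subset) is an assumption of this sketch, as are the function names and the (x1, y1, x2, y2) box layout.

```python
def overlaps(a, b):
    """Boxes are (x1, y1, x2, y2); True if the two regions intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_detections(boxes):
    """Partition boxes into groups connected by overlap and
    return one box per group with averaged corners."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)  # union the two subsets

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # Average each corner coordinate over the group.
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]


boxes = [(10, 10, 34, 34), (12, 12, 36, 36), (100, 100, 124, 124)]
merged = merge_detections(boxes)
```

Here the two overlapping boxes collapse to (11.0, 11.0, 35.0, 35.0) and the isolated one is kept unchanged.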
55
Integration of Multiple Detections
A simple voting scheme further improves results.
  Three detectors performed similarly on the final task, but in some cases their errors were different.
  Retain only those detections where at least 2 out of 3 detectors agree.
  This improves the final detection rate as well as eliminating more false positives.
  Since detector errors are not uncorrelated, the combination results in a measurable, but modest, improvement over the best single detector.
56
Sample results
MIT + CMU test set
57
Failure Cases
Trained on frontal, upright faces.
  The faces were only very roughly aligned, so there is some variation in rotation both in plane and out of plane.
  Detects faces that are tilted up to about ±15 degrees in plane and about ±45 degrees out of plane (toward a profile view).
  The detector becomes unreliable with more rotation.
Harsh backlighting, in which the faces are very dark while the background is relatively light, sometimes causes failures.
  A nonlinear variance normalization based on robust statistics to remove outliers helps in this situation.
  The problem with such a normalization is the greatly increased computational cost within the integral image framework.
Fails on significantly occluded faces.
  Occluded eyes: detection usually fails.
  A face with a covered mouth will usually still be detected.
58
Summary (Viola-Jones)
Fastest known face detector for gray images.
Three contributions with broad applicability:
  Cascaded classifier yields rapid classification
  AdaBoost as an extremely efficient feature selector
  Rectangle Features + Integral Image can be used for rapid image analysis
59
Face detector comparison
Informal study by Andrew Gallagher, CMU, for CMU 16-721 Learning-Based Methods in Vision, Spring 2007.
For the Viola-Jones algorithm, the OpenCV implementation was used (<2 sec per image).
For Schneiderman and Kanade, Object Detection Using the Statistics of Parts [IJCV’04], the www.pittpatt.com demo was used (~10-15 seconds per image, including web transmission).
60
[Figure: side-by-side detection results, Schneiderman-Kanade versus Viola-Jones]