Rapid Object Detection using a Boosted Cascade of Simple Features Paul Viola, Michael Jones Conference on Computer Vision and Pattern Recognition 2001 (CVPR 2001)
Outline Introduction Features Learning classification functions The attentional cascade Result Conclusion
Introduction New object detection framework Motivation: face detection Characteristics: robust, rapid
Contributions 1. New image representation Integral image 2. Method for constructing a classifier Selecting a small number of important features using AdaBoost 3. Method for combining classifiers in a cascade structure
Application Rapid face detector can be used in User interfaces Image databases Teleconferencing Especially, … Allow for post-processing When rapid frame-rates are not necessary Can be implemented on small low power devices Handhelds, embedded processors
Features Why not pixels? The most common reason: features can encode ad-hoc domain knowledge. The critical reason for this system: a feature-based system operates much faster. Three kinds of features are used: two-rectangle, three-rectangle, and four-rectangle features
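Integral Image The integral image at location (x, y) holds the sum of the pixels of the original image above and to the left of (x, y), inclusive, measured from the origin (0, 0). [Figure: original image and integral image]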
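Rectangular sum The sum over each rectangle, written in terms of the integral image values at corner locations 1 to 4: A = 1; B = 2 - 1; C = 3 - 1; D = 4 + 1 - (2 + 3)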
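The corner arithmetic above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the integral image is padded with a leading row and column of zeros (so `ii[y, x]` covers pixels strictly above and to the left, which avoids special cases at the border), and the sign convention of the two-rectangle feature is an assumption.

```python
import numpy as np

def integral_image(img):
    """Zero-padded integral image: ii[y, x] == img[:y, :x].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Pixel sum of the rectangle with top-left (x, y), width w, height h.

    Four array references, matching D = 4 + 1 - (2 + 3):
    4 = bottom-right, 1 = top-left, 2 = top-right, 3 = bottom-left.
    """
    return ii[y + h, x + w] + ii[y, x] - (ii[y, x + w] + ii[y + h, x])

def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half of a w-by-h window (sign is an assumption)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Toy 4x4 "image" used only to exercise the functions.
img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
inner = rect_sum(ii, 1, 1, 2, 2)          # same as img[1:3, 1:3].sum()
feature = two_rect_feature(ii, 0, 0, 4, 4)
```

Once the integral image is built, every rectangle sum costs the same four lookups regardless of the rectangle's size, which is what makes the features cheap to evaluate at any scale.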
Learning classification functions Hypothesis: a very small number of features can form an effective classifier. How to find them: in each round, select the single rectangle feature that best separates the positive and negative examples. Weak classifier: h_j(x) = 1 if p_j f_j(x) < p_j θ_j, and 0 otherwise, where f_j is the feature, θ_j the threshold, and p_j the polarity. Result: features selected in early rounds have error rates of 0.1 to 0.3; features selected in later rounds have error rates of 0.4 to 0.5
AdaBoost algorithm
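The boosting loop can be sketched as follows, under stated assumptions. Feature values are taken as a precomputed matrix `F` (one row per example, one column per feature), labels are 0/1, and the threshold search is a plain exhaustive scan rather than whatever optimized search a real detector would use. All function names are hypothetical, and the clipping of the error to avoid division by zero is an addition for numerical safety.

```python
import numpy as np

def stump_predict(f, theta, p):
    """Weak classifier: 1 if p * f < p * theta, else 0."""
    return (p * f < p * theta).astype(int)

def best_stump(F, y, w):
    """Exhaustively pick (error, feature, threshold, polarity) with the
    lowest weighted error. Brute force is an assumption made for clarity."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(F.shape[1]):
        for theta in np.unique(F[:, j]):
            for p in (1, -1):
                err = np.sum(w * (stump_predict(F[:, j], theta, p) != y))
                if err < best[0]:
                    best = (err, j, theta, p)
    return best

def adaboost(F, y, T):
    """Run T boosting rounds; return a list of (alpha, feature, theta, p)."""
    n_pos, n_neg = np.sum(y == 1), np.sum(y == 0)
    # Initial weights: 1/(2l) for positives, 1/(2m) for negatives.
    w = np.where(y == 1, 1.0 / (2 * n_pos), 1.0 / (2 * n_neg))
    stumps = []
    for _ in range(T):
        w = w / w.sum()                           # normalize to a distribution
        err, j, theta, p = best_stump(F, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)     # guard against err == 0 or 1
        beta = err / (1.0 - err)
        correct = stump_predict(F[:, j], theta, p) == y
        w = w * np.where(correct, beta, 1.0)      # down-weight correct examples
        stumps.append((np.log(1.0 / beta), j, theta, p))
    return stumps

def strong_score(stumps, F):
    """Weighted vote of the selected weak classifiers."""
    return sum(a * stump_predict(F[:, j], t, p) for a, j, t, p in stumps)

def strong_predict(stumps, F):
    """Final classifier: 1 if the vote reaches half the total alpha."""
    half = 0.5 * sum(a for a, _, _, _ in stumps)
    return (strong_score(stumps, F) >= half).astype(int)

# Toy data: feature 0 separates the classes, feature 1 does not.
F = np.array([[1, 5], [2, 1], [3, 7], [6, 2], [7, 6], [8, 3]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0])
stumps = adaboost(F, y, T=2)
pred = strong_predict(stumps, F)
```

Because each round selects exactly one feature, the number of rounds `T` is also the number of features in the final classifier, which is how boosting doubles as feature selection here.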
Learning result A frontal face classifier: 200 features (among 180,000); detection rate: 95%; false positive rate: 1/14084; 0.7 s to scan a 384×288 pixel image. Not sufficient for many real-world tasks. First feature selected: the eye region is often darker than the nose and cheeks. Second feature selected: the eyes are darker than the bridge of the nose
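The attentional cascade Construction goal: reject many of the negative sub-windows while detecting almost all positive instances (false negative rate → 0). Cascade: a sub-window must pass every stage in sequence; a negative outcome at any stage rejects it immediately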
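The early-rejection behaviour of the cascade can be illustrated with a short sketch. Here each stage is modelled as any function that returns `True` to pass a sub-window on; the scalar "windows" and the three toy stages are purely hypothetical stand-ins for boosted classifiers.

```python
def run_cascade(stages, window):
    """Evaluate stages in order; stop at the first rejection.

    Returns (accepted, number_of_stages_evaluated).
    """
    for k, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, k        # rejected: later stages never run
    return True, len(stages)

# Hypothetical stages of increasing strictness on a scalar "window".
stages = [lambda v: v > 0.2, lambda v: v > 0.5, lambda v: v > 0.8]

easy_negative = run_cascade(stages, 0.1)   # fails the first stage
hard_negative = run_cascade(stages, 0.6)   # survives until the last stage
positive = run_cascade(stages, 0.9)        # passes every stage
```

Most sub-windows in an image behave like `easy_negative`, so the average cost per window stays close to the cost of the first, cheapest stage.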
Training a cascade of classifiers Tradeoffs: features ↑ ↔ detection rate ↑; features ↑ ↔ computation time ↑. Constructing stages: train each classifier using AdaBoost, then adjust its threshold to minimize false negatives
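One way to picture the threshold adjustment is sketched below: given the strong-classifier scores of a set of positive examples, choose a stage threshold that keeps at least a target fraction of them. The function names, the toy scores, and the use of a held-out score list are assumptions made for illustration, not the procedure from the paper.

```python
import numpy as np

def stage_threshold(pos_scores, min_detection_rate):
    """A threshold that keeps at least min_detection_rate of the positives."""
    s = np.sort(np.asarray(pos_scores, dtype=float))
    n_miss = int(np.floor((1.0 - min_detection_rate) * len(s)))  # allowed misses
    return s[n_miss]

def rates(pos_scores, neg_scores, thr):
    """Detection rate and false positive rate at a given threshold."""
    d = np.mean(np.asarray(pos_scores) >= thr)
    f = np.mean(np.asarray(neg_scores) >= thr)
    return d, f

# Hypothetical scores for four positive and four negative examples.
pos = [0.9, 0.4, 0.7, 0.8]
neg = [0.1, 0.5, 0.75, 0.3]

thr_75 = stage_threshold(pos, 0.75)      # may miss one positive in four
d_75, f_75 = rates(pos, neg, thr_75)

thr_100 = stage_threshold(pos, 1.0)      # must keep every positive
d_100, f_100 = rates(pos, neg, thr_100)
```

Lowering the threshold from `thr_75` to `thr_100` recovers the missed positive but also lets more negatives through, which is the trade-off each stage accepts and later stages are expected to clean up.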
Result Face training set: 4916 hand-labeled faces; resolution: 24×24 pixels; source: random crawl of the WWW. Non-face training set: 9544 manually inspected images containing no faces, about 350 million sub-windows. The complete face detection cascade has 38 stages and 6061 features, and is roughly 15 times faster than previous detectors. [Table: number of features per layer]
Performance Receiver operating characteristic (ROC) curve What is an ROC curve? A plot of the detection rate against the false positive rate as the detector's threshold is varied
Performance comparison Detection rates for various numbers of false positives on the MIT+CMU test set, which contains 130 images and 507 faces
Conclusions An approach for object detection that minimizes computation time (about 15 times faster than any previous approach) while achieving high detection accuracy (low false negative and false positive rates)