AdaBoost & Its Applications


1 AdaBoost & Its Applications
Presenter: 虞台文

2 Outline
Overview
The AdaBoost Algorithm
How and Why Does AdaBoost Work?
AdaBoost for Face Detection

3 AdaBoost & Its Applications
Overview

4 Introduction
AdaBoost = Adaptive + Boosting: a learning algorithm for building a strong classifier out of a lot of weaker ones.

5 AdaBoost Concept
Build a strong classifier by combining many weak classifiers, each of which need only be slightly better than random.

6 The Weak Classifiers
Each weak classifier learns by considering one simple feature.
The T most beneficial features for classification should be selected. Key questions:
How to define features?
How to select beneficial features?
How to train weak classifiers?
How to manage (weight) the training samples?
How to associate a weight with each weak classifier?

7 The Strong Classifier
Given weak classifiers that are each only slightly better than random, how good will the combined strong classifier be?

8 AdaBoost & Its Applications
The AdaBoost Algorithm

9 The AdaBoost Algorithm
Given: $(x_1, y_1), \ldots, (x_m, y_m)$, where $x_i \in X$ and $y_i \in \{-1, +1\}$
Initialization: $D_1(i) = 1/m$, $i = 1, \ldots, m$
For $t = 1, \ldots, T$:
Find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the error with respect to $D_t$, i.e., $h_t = \arg\min_h \epsilon_h$, where $\epsilon_h = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h(x_i)]$
Weight classifier: $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$
Update distribution: $D_{t+1}(i) = \frac{D_t(i)\,e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$, where $Z_t$ is a normalization factor chosen so that $D_{t+1}$ sums to one

10 The AdaBoost Algorithm
The same algorithm as on slide 9, now with the output step.
Output final classifier: $H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
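
As a concrete illustration of the algorithm above, here is a minimal NumPy sketch using decision stumps as weak classifiers. The function names and the brute-force stump search are mine, chosen for clarity over speed; this is a sketch of the textbook algorithm, not production code.

```python
import numpy as np

def train_stump(X, y, D):
    """Exhaustive search over (feature j, threshold theta, parity p) for
    the decision stump with minimum weighted error under distribution D."""
    best_err, best_params = np.inf, None
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j]):
            for p in (+1, -1):
                pred = np.where(p * X[:, j] < p * theta, 1, -1)
                err = D[pred != y].sum()
                if err < best_err:
                    best_err, best_params = err, (j, theta, p)
    return best_err, best_params

def adaboost(X, y, T):
    """AdaBoost as on the slide; labels y must be in {-1, +1}."""
    m = len(y)
    D = np.full(m, 1.0 / m)                        # Initialization: D_1(i) = 1/m
    ensemble = []
    for _ in range(T):
        eps, (j, theta, p) = train_stump(X, y, D)  # h_t minimizes error w.r.t. D_t
        eps = max(eps, 1e-12)                      # guard against log of zero
        alpha = 0.5 * np.log((1 - eps) / eps)      # weight classifier
        pred = np.where(p * X[:, j] < p * theta, 1, -1)
        D = D * np.exp(-alpha * y * pred)          # update distribution ...
        D = D / D.sum()                            # ... normalizing by Z_t
        ensemble.append((alpha, j, theta, p))
    return ensemble

def predict(ensemble, X):
    """Final classifier: H(x) = sign(sum_t alpha_t h_t(x))."""
    f = np.zeros(len(X))
    for alpha, j, theta, p in ensemble:
        f += alpha * np.where(p * X[:, j] < p * theta, 1, -1)
    return np.sign(f)
```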

11 Boosting illustration
Weak Classifier 1

12 Boosting illustration
Weights Increased

13 Boosting illustration
Weak Classifier 2

14 Boosting illustration
Weights Increased

15 Boosting illustration
Weak Classifier 3

16 Boosting illustration
Final classifier is a combination of weak classifiers

17 AdaBoost & Its Applications
How and Why Does AdaBoost Work?

18 The AdaBoost Algorithm
What goal does AdaBoost want to reach? (The full algorithm of slides 9-10 is shown again for reference.)

19 The AdaBoost Algorithm
What goal does AdaBoost want to reach? The particular choices of the classifier weight $\alpha_t$ and of the distribution update are goal dependent. (Algorithm shown again.)

20 Goal
Final classifier: $H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
Goal: minimize the exponential loss $\mathrm{Loss} = \sum_{i=1}^{m} e^{-y_i f(x_i)}$, where $f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$.

21 Goal
Final classifier: $H(x) = \operatorname{sign}(f(x))$
Minimize exponential loss $\Rightarrow$ maximize the margin $y f(x)$: the loss $e^{-y f(x)}$ decreases monotonically as the margin grows, so driving the loss down pushes training examples toward large positive margins.

22 Minimize
Final classifier: $H(x) = \operatorname{sign}(f_T(x))$
Define $f_t(x) = \sum_{s=1}^{t} \alpha_s h_s(x)$ with $f_0(x) = 0$, so that $f_t(x) = f_{t-1}(x) + \alpha_t h_t(x)$.
Then the quantity to minimize greedily at round $t$ is $\mathrm{Loss}_t = \sum_{i=1}^{m} e^{-y_i f_t(x_i)}$.

23 Minimize
$\mathrm{Loss}_t = \sum_{i=1}^{m} e^{-y_i f_{t-1}(x_i)}\, e^{-\alpha_t y_i h_t(x_i)}$
Set $w_i^{(t)} = e^{-y_i f_{t-1}(x_i)}$; these weights are fixed at round $t$ and are proportional to $D_t(i)$.

24 Minimize
Splitting the sum over correctly and incorrectly classified examples:
$\mathrm{Loss}_t = e^{-\alpha_t} \sum_{i:\, y_i = h_t(x_i)} w_i^{(t)} + e^{\alpha_t} \sum_{i:\, y_i \neq h_t(x_i)} w_i^{(t)}$

25 Minimize
Equivalently, $\mathrm{Loss}_t = (e^{\alpha_t} - e^{-\alpha_t}) \sum_{i=1}^{m} w_i^{(t)}\,[y_i \neq h_t(x_i)] + e^{-\alpha_t} \sum_{i=1}^{m} w_i^{(t)}$

26 Minimize
For any $\alpha_t > 0$, the loss is minimized over $h_t$ by the weak classifier with the smallest weighted error: $h_t = \arg\min_h \sum_{i=1}^{m} D_t(i)\,[y_i \neq h(x_i)]$, which is exactly the selection step of the algorithm.

27 Minimize
Setting $\partial \mathrm{Loss}_t / \partial \alpha_t = 0$ gives $(1-\epsilon_t)\, e^{-\alpha_t} = \epsilon_t\, e^{\alpha_t}$, hence $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$, exactly the classifier weight of the algorithm.

28 Minimize
With this choice of $\alpha_t$, the per-round normalization factor becomes $Z_t = (1-\epsilon_t)\, e^{-\alpha_t} + \epsilon_t\, e^{\alpha_t} = 2\sqrt{\epsilon_t (1-\epsilon_t)}$.

29 Minimize
$Z_t = 2\sqrt{\epsilon_t(1-\epsilon_t)}$ is maximized when $\epsilon_t = \frac{1}{2}$: a weak classifier no better than random makes no progress, while any $\epsilon_t < \frac{1}{2}$ gives $Z_t < 1$ and shrinks the loss.

30 Minimize
At time $t$: normalizing the weights $w_i^{(t+1)} = w_i^{(t)}\, e^{-\alpha_t y_i h_t(x_i)}$ yields $D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$, exactly the distribution update of the algorithm.

31 Minimize
At time 1: $D_1(i) = 1/m$. At time $t+1$: unrolling the update gives $D_{t+1}(i) = \frac{e^{-y_i f_t(x_i)}}{m \prod_{s=1}^{t} Z_s}$. Since $D_{T+1}$ sums to one, the training error satisfies $\frac{1}{m}\sum_{i=1}^{m} [y_i \neq H(x_i)] \le \frac{1}{m}\sum_{i=1}^{m} e^{-y_i f_T(x_i)} = \prod_{t=1}^{T} Z_t$.
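
Combining slides 28 and 31 gives the standard training-error bound of Freund and Schapire. Writing $\gamma_t = \frac{1}{2} - \epsilon_t$ for the edge of the $t$-th weak classifier over random guessing:

\[
\frac{1}{m}\sum_{i=1}^{m} [y_i \neq H(x_i)] \;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)} \;=\; \prod_{t=1}^{T} \sqrt{1 - 4\gamma_t^2} \;\le\; \exp\Big(\!-2\sum_{t=1}^{T} \gamma_t^2\Big)
\]

So if every weak classifier beats random guessing by some fixed amount ($\gamma_t \ge \gamma > 0$), the training error drops exponentially fast in the number of rounds $T$.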

32 AdaBoost & Its Applications
AdaBoost for Face Detection

33 The Task of Face Detection
Many slides adapted from P. Viola

34 Basic Idea
Slide a window across the image and evaluate a face model at every location.

35 Challenges
Slide a window across the image and evaluate a face model at every location.
A sliding-window detector must evaluate tens of thousands of location/scale combinations.
Faces are rare: 0-10 per image.
For computational efficiency, we should try to spend as little time as possible on the non-face windows.
A megapixel image has ~10^6 pixels and a comparable number of candidate face locations.
To avoid having a false positive in every image, the false positive rate has to be less than 10^-6.

36 The Viola/Jones Face Detector
A seminal approach to real-time object detection. Training is slow, but detection is very fast.
Key ideas:
Integral images for fast feature evaluation
Boosting for feature selection
Attentional cascade for fast rejection of non-face windows
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.

37 Image Features Rectangle filters

38 Image Features
Rectangle filters: the value of a filter is the sum of the pixels in the white area minus the sum of the pixels in the dark area.

39 Size of Feature Space
How many possible rectangle features are there for a 24×24 detection region?
(The count is tallied by filter type: two-rectangle filters (A and B), three-rectangle filters (C), and four-rectangle filters (D).)

40 Feature Selection
How many possible rectangle features are there for a 24×24 detection region?
What features are good for face detection?

41 Feature Selection
How many possible rectangle features are there for a 24×24 detection region?
Can we create a good classifier using just a small subset of all possible features? How do we select such a subset?

42 Integral images
The integral image computes a value at each pixel (x, y) that is the sum of the pixel values above and to the left of (x, y), inclusive.

43 Computing the Integral Image
The integral image computes a value at each pixel (x, y) that is the sum of the pixel values above and to the left of (x, y), inclusive. It can be computed in one pass through the image using the recurrence $ii(x, y) = i(x, y) + ii(x-1, y) + ii(x, y-1) - ii(x-1, y-1)$.

44 Computing Sum within a Rectangle
With the integral image, the sum of the pixels inside any rectangle is obtained from its four corner values: sum = ii(bottom-right) + ii(top-left) - ii(top-right) - ii(bottom-left).
Only 3 additions are required for any size of rectangle!
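
Slides 42-44 in a short NumPy sketch (the function names are mine): one cumulative-sum pass builds a zero-padded integral image, after which any rectangle sum costs four lookups, whatever the rectangle's size.

```python
import numpy as np

def integral_image(img):
    """ii[r, c] = sum of img over all pixels above and to the left,
    computed in one pass via cumulative sums. A zero row and column
    are padded on so rectangle sums need no boundary checks."""
    ii = img.cumsum(axis=0).cumsum(axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, top, left, h, w):
    """Sum of the h-by-w rectangle with top-left corner (top, left):
    3 additions/subtractions, independent of the rectangle's size."""
    return (ii[top + h, left + w] - ii[top, left + w]
            - ii[top + h, left] + ii[top, left])
```

A rectangle feature is then just the rect_sum of its white rectangles minus the rect_sum of its dark ones, which is why the features, rather than the image, can be rescaled (slide 45).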

45 Scaling
The integral image enables us to evaluate rectangle features of any size in constant time. Therefore, no image scaling is necessary: scale the rectangular features instead!

46 Boosting
Boosting is a classification scheme that works by combining weak learners into a more accurate ensemble classifier.
A weak learner need only do better than chance.
Training consists of multiple boosting rounds. During each round, we select a weak learner that does well on examples that were hard for the previous weak learners. "Hardness" is captured by weights attached to the training examples.
Y. Freund and R. Schapire. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September 1999.

47 The AdaBoost Algorithm
(The AdaBoost algorithm of slide 9 is shown again.)

48 The AdaBoost Algorithm
(The full algorithm with the final classifier $H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$ is shown again, as on slide 10.)

49 Weak Learners for Face Detection
What base learner is appropriate for face detection? (Algorithm shown again for reference.)

50 Weak Learners for Face Detection
$h_j(x) = \begin{cases} +1 & \text{if } p_j f_j(x) < p_j \theta_j \\ -1 & \text{otherwise} \end{cases}$
where $f_j(x)$ is the value of rectangle feature $j$ on window $x$, $\theta_j$ is a threshold, and $p_j \in \{+1, -1\}$ is a parity that sets the direction of the inequality.
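
In code this weak learner is only a few lines. A sketch reusing rect_sum from the integral-image example, with a feature represented (my choice, for illustration) as two lists of rectangles:

```python
def weak_classify(ii, feature, theta, p):
    """h(x) = +1 if p * f(x) < p * theta, else -1, where f(x) is the
    rectangle-feature value computed on the window's integral image."""
    white, black = feature  # lists of (top, left, h, w) rectangles
    f = (sum(rect_sum(ii, *r) for r in white)
         - sum(rect_sum(ii, *r) for r in black))
    return 1 if p * f < p * theta else -1
```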

51 Boosting
Training set contains face and non-face examples, initially with equal weights.
For each round of boosting:
Evaluate each rectangle filter on each example
Select the best threshold for each filter (a sketch of this search follows below)
Select the best filter/threshold combination
Reweight the examples
Computational complexity of learning: O(MNK) for M rounds, N examples, and K features.
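
The "select the best threshold for each filter" step can be done in one sorted sweep instead of re-testing every candidate threshold against every example. This formulation is mine, not from the slides, but it is the standard trick:

```python
import numpy as np

def best_threshold(f, y, D):
    """Best (threshold, parity) for one filter given feature values f,
    labels y in {-1, +1}, and weights D: one O(N log N) sorted sweep."""
    order = np.argsort(f)
    f, y, D = f[order], y[order], D[order]
    t_pos, t_neg = D[y == 1].sum(), D[y == -1].sum()  # total class weights
    s_pos = s_neg = 0.0      # weight of positives/negatives below the cut
    best_err, best = np.inf, None
    for i in range(len(f)):
        # parity +1 predicts "+1 below the cut": errors are the negatives
        # below plus the positives above (and symmetrically for parity -1)
        err_p = s_neg + (t_pos - s_pos)
        err_n = s_pos + (t_neg - s_neg)
        err, p = (err_p, +1) if err_p < err_n else (err_n, -1)
        if err < best_err:
            best_err, best = err, (f[i], p)
        if y[i] == 1:
            s_pos += D[i]
        else:
            s_neg += D[i]
    return best_err, best
```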

52 Features Selected by Boosting
First two features selected by boosting: the first measures the difference in intensity between the eye region and the upper cheeks; the second compares the eye regions to the bridge of the nose.
This feature combination can yield a 100% detection rate with a 50% false positive rate.

53 ROC Curve for 200-Feature Classifier
A 200-feature classifier can yield a 95% detection rate with a false positive rate of 1 in 14084.
Not good enough! To be practical for real applications, the false positive rate must be closer to 1 in 1,000,000.

54 Attentional Cascade
We start with simple classifiers that reject many of the negative sub-windows while detecting almost all positive sub-windows.
A positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on.
A negative outcome at any point leads to the immediate rejection of the sub-window.
(Diagram: IMAGE SUB-WINDOW → Classifier 1 →T→ Classifier 2 →T→ Classifier 3 →T→ FACE; any F exit → NON-FACE.)
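
The cascade control flow of this slide is tiny. A sketch in which each stage is any callable answering "does this window still look like a face?":

```python
def cascade_classify(window, stages):
    """Evaluate stages in order; the first negative vote rejects the
    sub-window immediately, so later (more expensive) stages run only
    on windows that survived all earlier ones."""
    for stage in stages:
        if not stage(window):
            return False   # NON-FACE: rejected, remaining stages never run
    return True            # survived every stage: FACE
```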

55 Attentional Cascade
Chain classifiers that are progressively more complex and have lower false positive rates.
(ROC curve: % detection vs. % false positives; cascade diagram as on slide 54.)

56 Detection Rate and False Positive Rate for Chained Classifiers
The detection rate and the false positive rate of the cascade are found by multiplying the respective rates of the individual stages.
A detection rate of 0.9 and a false positive rate on the order of 10^-6 can be achieved by a 10-stage cascade if each stage has a detection rate of 0.99 (0.99^10 ≈ 0.9) and a false positive rate of about 0.30 (0.30^10 ≈ 6×10^-6).
(Cascade diagram as on slide 54.)
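
A quick check of the arithmetic on this slide:

```python
# 10 stages, each with 0.99 detection rate and 0.30 false positive rate
print(0.99 ** 10)  # 0.9043... -> overall detection rate of about 0.9
print(0.30 ** 10)  # 5.9e-06   -> overall false positive rate of about 6e-6
```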

57 Training the Cascade
Set target detection and false positive rates for each stage.
Keep adding features to the current stage until its target rates have been met.
The AdaBoost threshold needs to be lowered to maximize detection (as opposed to minimizing total classification error).
Test on a validation set.
If the overall false positive rate is not low enough, add another stage.
Use the false positives from the current stage as the negative training examples for the next stage. (A sketch of this loop follows below.)
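
The loop above as a sketch. Here train_adaboost_stage and false_positive_rate are hypothetical helpers standing in for the stage-training and evaluation machinery, and cascade_classify is the evaluation sketch from slide 54:

```python
def train_cascade(pos, neg, val, f_target, f_stage, d_stage):
    """Add stages until the cascade's overall false positive rate on
    the validation set val drops below f_target. Each stage must reach
    false positive rate <= f_stage at detection rate >= d_stage (its
    AdaBoost threshold is lowered to favor detection)."""
    stages, overall_fpr = [], 1.0
    while overall_fpr > f_target:
        stage = train_adaboost_stage(pos, neg,             # hypothetical
                                     max_fpr=f_stage, min_det=d_stage)
        stages.append(stage)
        overall_fpr = false_positive_rate(stages, val)     # hypothetical
        # negatives for the next stage: only the current false positives
        neg = [x for x in neg if cascade_classify(x, stages)]
    return stages
```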

58 Training the Cascade

59 ROC Curves: Cascaded Classifier vs. Monolithic Classifier

60 ROC Curves: Cascaded Classifier vs. Monolithic Classifier
There is little difference between the two in terms of accuracy, but a big difference in terms of speed: the cascaded classifier is nearly 10 times faster, since its first stage throws out most non-faces so that they are never evaluated by subsequent stages.

61 The Implemented System
Training data:
5000 faces, all frontal, rescaled to 24×24 pixels
300 million non-face sub-windows drawn from 9500 non-face images
Faces are normalized for scale and translation
Many variations: across individuals, illumination, pose

62 Structure of the Detector Cascade
Combines successively more complex classifiers in a cascade: 38 stages with a total of 6060 features.
(Diagram: All Sub-Windows → stages 1, 2, 3, ..., 38 → Face; each F exit → Reject Sub-Window.)

63 Structure of the Detector Cascade
Stage 1: 2 features, rejects 50% of non-faces, detects 100% of faces
Stage 2: 10 features, rejects 80% of non-faces, detects 100% of faces
Later stages: 25 features, then 50 features, and so on
(Cascade diagram as on slide 62.)

64 Speed of the Final Detector
On a 700 MHz Pentium III processor, the face detector can process a 384×288 pixel image in about 0.067 seconds (roughly 15 Hz).
This is 15 times faster than the previous detector of comparable accuracy (Rowley et al., 1998).
On average, only 8 features are evaluated per window on the test set.

65 Image Processing
Training: all example sub-windows were variance-normalized to minimize the effect of different lighting conditions.
Detection: sub-windows are variance-normalized as well. The variance can be computed using the integral image together with the integral image of the squared image.
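
The normalization in a sketch: with integral images of the image and of its square (built with the integral_image/rect_sum sketch from slide 44), the variance of any window comes from sigma^2 = mean(x^2) - mean(x)^2 in a handful of lookups.

```python
def window_variance(ii, ii_sq, top, left, h, w):
    """Variance of an h-by-w window from two integral images:
    ii of the image and ii_sq of the squared image."""
    n = h * w
    s = rect_sum(ii, top, left, h, w)       # sum of pixels
    sq = rect_sum(ii_sq, top, left, h, w)   # sum of squared pixels
    mean = s / n
    return sq / n - mean ** 2               # E[x^2] - (E[x])^2
```

Dividing the window's pixels (or, equivalently, scaling the feature values) by the resulting standard deviation gives the variance-normalized window.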

66 Scanning the Detector
Scaling is achieved by scaling the detector itself, rather than scaling the image. Good detection results were obtained for a scaling factor of 1.25.
The detector is also scanned across location: subsequent locations are obtained by shifting the window [sΔ] pixels, where s is the current scale and Δ is the step size. Results for Δ = 1.0 and Δ = 1.5 were reported.
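
The scanning strategy as a loop sketch; detector_at is a hypothetical function that evaluates the feature-scaled detector at one location and scale:

```python
def scan(image, base=24, scale_factor=1.25, delta=1.0):
    """Slide a detector of growing size across the image. The features
    are scaled rather than the image, and the window is shifted by
    about s * delta pixels at scale s."""
    H, W = image.shape
    detections, s = [], 1.0
    while base * s <= min(H, W):
        size = int(base * s)
        step = max(1, round(s * delta))
        for top in range(0, H - size + 1, step):
            for left in range(0, W - size + 1, step):
                if detector_at(image, top, left, s):   # hypothetical
                    detections.append((top, left, size))
        s *= scale_factor
    return detections
```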

67 Merging Multiple Detections

68 ROC Curves for Face Detection

69 Output of Face Detector on Test Images

70 Other Detection Tasks
Facial feature localization
Profile detection
Male vs. female

71 Other Detection Tasks
Facial feature localization
Profile detection

72 Other Detection Tasks Male vs. Female

73 Conclusions
How AdaBoost works and why it works
AdaBoost for face detection:
Rectangle features
Integral images for fast computation
Boosting for feature selection
Attentional cascade for fast rejection of negative windows

