Stanford CS223B Computer Vision, Winter 2007
Lecture 5: Advanced Image Filters
Professors Sebastian Thrun and Jana Košecká
CAs: Vaibhav Vaish and David Stavens
Advanced Features: Topics
- Advanced Edge Detection
- Global Image Features (Hough Transform)
- Templates, Image Pyramid
- SIFT Features
- Learning with Many Simple Features
Features in Matlab
im = imread('bridge.jpg');
bw = rgb2gray(im);
edge(bw,'sobel')  - (almost) linear
edge(bw,'canny')  - not local, no closed form
Sobel Operator
S1 = [-1 -2 -1; 0 0 0; 1 2 1]    S2 = [-1 0 1; -2 0 2; -1 0 1]
Edge magnitude = sqrt(S1^2 + S2^2)
Edge direction = tan^-1(S1 / S2)
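The two Sobel responses and the magnitude/direction formulas above can be sketched in a few lines. This is a minimal illustration in plain Python (not the slides' MATLAB); the tiny 5x5 patch and the helper names are hypothetical.

```python
import math

# Hypothetical 5x5 grayscale patch with a vertical brightness step.
img = [[0, 0, 0, 9, 9],
       [0, 0, 0, 9, 9],
       [0, 0, 0, 9, 9],
       [0, 0, 0, 9, 9],
       [0, 0, 0, 9, 9]]

S1 = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges
S2 = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges

def apply_at(img, k, y, x):
    # Apply the 3x3 kernel centered at (y, x).
    return sum(k[j][i] * img[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))

def sobel(img, y, x):
    g1 = apply_at(img, S1, y, x)
    g2 = apply_at(img, S2, y, x)
    magnitude = math.hypot(g1, g2)          # sqrt(g1^2 + g2^2)
    direction = math.atan2(g1, g2)          # tan^-1(g1 / g2)
    return magnitude, direction

mag, ang = sobel(img, 2, 2)   # a pixel right at the vertical step
print(mag, ang)
```

At this pixel only the vertical-edge kernel fires, so the gradient points horizontally (direction 0).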
Sobel in Matlab
edge(bw,'sobel')
Canny Edge Detector
edge(bw,'canny')
Comparison: Sobel vs. Canny
Canny Edge Detection
Steps:
1. Apply derivative of Gaussian
2. Non-maximum suppression: thin multi-pixel wide "ridges" down to single-pixel width
3. Linking and thresholding: low and high edge-strength thresholds; accept all edges over the low threshold that are connected to an edge over the high threshold
Non-Maximum Suppression
Non-maximum suppression: select the single maximum point across the width of an edge.
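The suppression step can be sketched in one dimension: walking across the width of a blurred edge, keep only the pixel whose magnitude beats both neighbors along the gradient. A minimal Python sketch (the magnitude profile is made up for illustration):

```python
# Hypothetical gradient magnitudes sampled across the width of a soft edge.
mags = [0, 1, 3, 7, 9, 7, 3, 1, 0]

def nonmax_suppress(mags):
    # Keep a sample only if it is >= both neighbors along the gradient.
    out = [0] * len(mags)
    for i in range(1, len(mags) - 1):
        if mags[i] >= mags[i - 1] and mags[i] >= mags[i + 1]:
            out[i] = mags[i]
    return out

thinned = nonmax_suppress(mags)
print(thinned)
```

The multi-pixel ridge collapses to the single maximum in the middle; the real 2-D step does the same comparison along the quantized gradient direction.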
Linking to the Next Edge Point
Assume the marked point q is an edge point. Take the normal to the gradient at that point and use it to predict continuation points (either r or p).
Edge Hysteresis
- Hysteresis: a lag or momentum factor
- Idea: maintain two thresholds k_high and k_low
  - Use k_high to find strong edges to start an edge chain
  - Use k_low to find weak edges that continue an edge chain
- Typical ratio of thresholds is roughly k_high / k_low = 2
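A minimal sketch of the two-threshold rule, in plain Python on a 1-D chain of edge strengths (the values are made up): strong pixels seed the chain, and weak pixels are kept only if they connect to a kept pixel.

```python
def hysteresis(strength, k_low, k_high):
    # Accept every pixel above k_low that is connected (here: adjacent
    # in 1-D) to some pixel above k_high.
    n = len(strength)
    keep = [s >= k_high for s in strength]
    changed = True
    while changed:                      # propagate along connected chains
        changed = False
        for i in range(n):
            if not keep[i] and strength[i] >= k_low:
                if (i > 0 and keep[i - 1]) or (i + 1 < n and keep[i + 1]):
                    keep[i] = True
                    changed = True
    return keep

# Strengths along one edge chain; note the typical ratio k_high / k_low = 2.
chain = [1, 5, 9, 5, 4, 9, 1]
kept = hysteresis(chain, k_low=4, k_high=8)
print(kept)
```

The weak middle pixels (5, 5, 4) survive because they chain back to a strong pixel, while the end pixels below k_low are dropped.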
Canny Edge Detection (Example)
Original image; strong edges only; strong + connected weak edges; weak edges. Note the gap is gone after linking. (Courtesy of G. Loy)
Canny Edge Detection (Example)
Using Matlab with default thresholds
Bridge Example Again
edge(bw,'canny')
Corner Effects
Summary: Canny Edge Detection
- Most commonly used method
- Traces edges, accommodates variations in contrast
- Not a linear filter!
- Problems with corners
Advanced Features: Topics
- Advanced Edge Detection
- Global Image Features (Hough Transform)
- Templates, Image Pyramid
- SIFT Features
- Learning with Many Simple Features
Towards Global Features
Local versus global
Vanishing Points
(Example photographs with converging parallel lines.)
Vanishing Points
A. Canaletto [1740], Arrival of the French Ambassador in Venice
Vanishing Points…?
A. Canaletto [1740], Arrival of the French Ambassador in Venice
From Edges to Lines
Hough Transform
Hough Transform: Quantization
Detecting lines by finding maxima / clustering in parameter space
Hough Transform: Algorithm
- For each image point, determine
  - most likely line parameters b, m (direction of gradient)
  - strength (magnitude of gradient)
- Increment parameter counter by strength value
- Cluster in parameter space, pick local maxima
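The counting step can be sketched in a few lines of plain Python. Note the slide's warning about parameterization: instead of (b, m), this sketch votes in (theta, rho) space with rho = x·cos(theta) + y·sin(theta), which avoids unbounded slopes; the points and bin sizes are made up, and every point votes with unit strength for simplicity.

```python
import math

# Points lying on the line y = x, plus one outlier.
points = [(0, 0), (1, 1), (2, 2), (3, 3), (5, 0)]

# Accumulator over quantized (theta, rho); rho = x cos(theta) + y sin(theta).
thetas = [math.radians(d) for d in range(0, 180, 15)]
acc = {}
for x, y in points:
    for t_idx, t in enumerate(thetas):
        rho = round(x * math.cos(t) + y * math.sin(t))
        acc[(t_idx, rho)] = acc.get((t_idx, rho), 0) + 1

# Pick the maximum of the accumulator: the dominant line.
(t_idx, rho), votes = max(acc.items(), key=lambda kv: kv[1])
print(t_idx, rho, votes)
```

The four collinear points all fall into the same (theta = 135 degrees, rho = 0) bin, so that cell collects the most votes; the outlier's votes are scattered.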
Hough Transform: Results
Image; edge detection; Hough transform
Summary: Hough Transform
- Smart counting
  - Local evidence for global features
  - Organized in a table
  - Careful with parameterization!
- Problem: curse of dimensionality
  - Works great for simple features with 3 unknowns
  - Will fail for complex objects
- Problem: not a local algorithm
Advanced Features: Topics
- Advanced Edge Detection
- Global Image Features (Hough Transform)
- Templates, Image Pyramid
- SIFT Features
- Learning with Many Simple Features
Features for Object Detection/Recognition
Want to find … in here
Templates
- Find an object in an image!
- We want invariance!
  - Scaling
  - Rotation
  - Illumination
  - Perspective projection
Convolution with Templates
% read image
im = imread('bridge.jpg');
bw = double(im(:,:,1)) ./ 256;
imshow(bw)

% apply FFT
FFTim = fft2(bw);
bw2 = real(ifft2(FFTim));
imshow(bw2)

% define a kernel
kernel = zeros(size(bw));
kernel(1, 1) = 1;
kernel(1, 2) = -1;
FFTkernel = fft2(kernel);

% apply the kernel and check out the result
FFTresult = FFTim .* FFTkernel;
result = real(ifft2(FFTresult));
imshow(result)

% select an image patch
patch = bw(221:240, 351:370);
imshow(patch)
patch = patch - (sum(sum(patch)) / size(patch,1) / size(patch,2));
kernel = zeros(size(bw));
kernel(1:size(patch,1), 1:size(patch,2)) = patch;
FFTkernel = fft2(kernel);

% apply the kernel and check out the result
FFTresult = FFTim .* FFTkernel;
result = max(0, real(ifft2(FFTresult)));
result = result ./ max(max(result));
result = (result .^ 1 > 0.5);
imshow(result)

% alternative convolution
imshow(conv2(bw, patch, 'same'))
Template Convolution
Aside: Convolution Theorem
Fourier transform F(g) of g; F is invertible. Convolution in the image domain corresponds to multiplication in the frequency domain: F(f * g) = F(f) · F(g).
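This is the property the Matlab template code above relies on (fft2, pointwise multiply, ifft2). A minimal numerical check in plain Python, using a naive DFT on a length-4 signal; the signal and kernel are made up, and periodic (circular) convolution is used so the theorem holds exactly:

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def circular_conv(f, g):
    # Direct periodic convolution: (f * g)[n] = sum_m f[m] g[n - m mod N].
    N = len(f)
    return [sum(f[m] * g[(n - m) % N] for m in range(N)) for n in range(N)]

f = [1, 2, 3, 4]
g = [1, -1, 0, 0]          # finite-difference kernel, zero padded

direct = circular_conv(f, g)
via_fft = [c.real for c in idft([a * b for a, b in zip(dft(f), dft(g))])]
print(direct)
print([round(v, 6) for v in via_fft])
```

Both routes give the same result, which is why one FFT of the image plus one pointwise product replaces sliding the template over every position.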
Convolution with Templates
(Matlab code as shown earlier.)
Convolution with Templates
- Invariances:
  - Scaling
  - Rotation
  - Illumination
  - Perspective projection
- Provides
  - Good localization: No
Scale Invariance: Image Pyramid
Pyramid Convolution with Templates
- Invariances: scaling, rotation, illumination, perspective projection
- Provides: good localization
No / Yes / No
Pyramid Warning: Aliasing
Aliasing Effects
Constructing a pyramid by taking every second pixel leads to layers that badly misrepresent the top layer. (Slide credit: Gary Bradski)
Solution to Aliasing
- Convolve with a Gaussian before downsampling
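A 1-D sketch in plain Python of why the Gaussian blur matters: naively keeping every second sample of a high-frequency signal makes it look flat, while smoothing first (here with a simple [1, 2, 1]/4 kernel standing in for a Gaussian) preserves its average energy. The signal is made up for illustration.

```python
# High-frequency 1-D signal: alternating 0, 8, 0, 8, ...
sig = [0, 8] * 8

def decimate(s):
    # Naive pyramid level: keep every second sample.
    return s[::2]

def smooth_decimate(s):
    # Blur with a [1, 2, 1]/4 kernel first, then keep every second sample.
    blurred = [(s[max(i - 1, 0)] + 2 * s[i] + s[min(i + 1, len(s) - 1)]) / 4
               for i in range(len(s))]
    return blurred[::2]

naive = decimate(sig)          # all zeros: the layer misrepresents the signal
smoothed = smooth_decimate(sig)
print(naive)
print(smoothed)
```

The naive layer reports a constant-zero image; the smoothed layer correctly reports a mid-gray level of about 4.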
Templates with Image Pyramid
- Invariance:
  - Scaling: Yes
  - Rotation: No (maybe rotate template?)
  - Illumination: Not really
  - Perspective projection: No
- Provides
  - Good localization: No
Template Matching, Commercial
http://www.seeingmachines.com/facelab.htm
Advanced Features: Topics
- Advanced Edge Detection
- Global Image Features (Hough Transform)
- Templates, Image Pyramid
- SIFT Features
- Learning with Many Simple Features
Improved Invariance Handling
Want to find … in here
SIFT Features
- Invariances: scaling, rotation, illumination, deformation
- Provides: good localization
Yes / Not really / Yes
SIFT Reference
Distinctive image features from scale-invariant keypoints. David G. Lowe, International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
SIFT = Scale Invariant Feature Transform
Invariant Local Features
- Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters
Advantages of Invariant Local Features
- Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
- Distinctiveness: individual features can be matched to a large database of objects
- Quantity: many features can be generated for even small objects
- Efficiency: close to real-time performance
- Extensibility: can easily be extended to a wide range of differing feature types, each adding robustness
SIFT On-A-Slide
1. Enforce invariance to scale: compute Gaussian difference maxima for many different scales; non-maximum suppression; find local maxima: keypoint candidates.
2. Localizable corner: for each maximum, fit a quadratic function. Compute the center with sub-pixel accuracy by setting the first derivative to zero.
3. Eliminate edges: compute the ratio of eigenvalues; drop keypoints for which this ratio is larger than a threshold.
4. Enforce invariance to orientation: compute orientation, to achieve rotation invariance, by finding the strongest second-derivative direction in the smoothed image (possibly multiple orientations). Rotate the patch so that the orientation points up.
5. Compute feature signature: compute a "gradient histogram" of the local image region in a 4x4 pixel region. Do this for 4x4 regions of that size. Orient so that the largest gradient points up (possibly multiple solutions). Result: feature vector with 128 values (16 fields, 8 gradients).
6. Enforce invariance to illumination change and camera saturation: normalize to unit length to increase invariance to illumination, then threshold all gradients to become invariant to camera saturation.
Finding "Keypoints" (Corners)
Idea: find corners, but with scale invariance
Approach:
- Run a linear filter (difference of Gaussians)
- Do this at different resolutions of an image pyramid
Difference of Gaussians
(Wide Gaussian minus narrow Gaussian equals band-pass kernel.)
Difference of Gaussians
surf(fspecial('gaussian',40,4))
surf(fspecial('gaussian',40,8))
surf(fspecial('gaussian',40,8) - fspecial('gaussian',40,4))
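The same kernels can be built in a 1-D Python sketch (standing in for the Matlab fspecial calls): two normalized Gaussians of sigma 4 and 8, whose difference is the band-pass "Mexican hat"-like profile shown on the slide. The radius is an arbitrary choice.

```python
import math

def gauss_kernel(sigma, radius):
    # Discrete, normalized 1-D Gaussian sampled on [-radius, radius].
    k = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

g4 = gauss_kernel(4.0, 20)
g8 = gauss_kernel(8.0, 20)
dog = [a - b for a, b in zip(g8, g4)]   # difference of Gaussians

# Both kernels sum to 1, so the DoG kernel sums to (about) zero:
print(sum(dog))
# Negative at the center, positive in the surround:
print(dog[20] < 0)
```

A zero-sum kernel gives no response on constant regions, which is exactly why DoG filtering highlights blobs and corners rather than flat areas.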
Find Corners with DiffOfGauss
im = imread('bridge.jpg');
bw = double(im(:,:,1)) / 256;
for i = 1 : 10
    gaussD = fspecial('gaussian',40,2*i) - fspecial('gaussian',40,i);
    res = abs(conv2(bw, gaussD, 'same'));
    res = res / max(max(res));
    imshow(res);
    title(['\bf i = ' num2str(i)]);
    drawnow
end
Gaussian Kernel Size: i = 1, 2, …, 10
(Ten slides showing the difference-of-Gaussian response for increasing kernel scales.)
Keypoint Localization
- Detect maxima and minima of difference-of-Gaussian in scale space
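A minimal Python sketch of the scale-space extremum test, reduced to 1-D positions across three scales (the numbers are made up): a sample is a keypoint candidate only if it beats all its neighbors in both position and scale. In the real 3-D case each sample is compared against its 26 neighbors.

```python
# Three "scales" of a 1-D filter response.
stack = [
    [0, 1, 2, 1, 0],   # finer scale
    [0, 2, 5, 2, 0],   # current scale
    [0, 1, 3, 1, 0],   # coarser scale
]

def keypoints(stack):
    found = []
    for x in range(1, len(stack[1]) - 1):
        v = stack[1][x]
        # All neighbors in position and scale, excluding the sample itself.
        neighbours = [stack[s][x + dx]
                      for s in range(3) for dx in (-1, 0, 1)
                      if not (s == 1 and dx == 0)]
        if v > max(neighbours):
            found.append(x)
    return found

print(keypoints(stack))
```

Only the center sample (value 5) dominates its full position-and-scale neighborhood, so it is the single candidate.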
Example of Keypoint Detection
(a) 233x189 image
(b) 832 DoG extrema
(c) 729 above threshold
Example of Keypoint Detection
Threshold on value at DoG peak and on ratio of principal curvatures (Harris approach)
(c) 729 left after peak value threshold (from 832)
(d) 536 left after testing ratio of principal curvatures
Select Canonical Orientation
- Create a histogram of local gradient directions computed at the selected scale
- Assign the canonical orientation at the peak of the smoothed histogram
- Each key specifies stable 2D coordinates (x, y, scale, orientation)
SIFT Vector Formation
- Thresholded image gradients are sampled over a 16x16 array of locations in scale space
- Create an array of orientation histograms
- 8 orientations x 4x4 histogram array = 128 dimensions
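The 128-dimension bookkeeping can be sketched in plain Python: pool 16x16 gradient samples into a 4x4 grid of cells, 8 orientation bins each, then apply the normalize / clamp / renormalize steps from the SIFT overview. The random gradients are purely illustrative, and the 0.2 clamp value follows Lowe's paper.

```python
import math, random

random.seed(0)
# Hypothetical gradients for a 16x16 patch: (magnitude, orientation) pairs.
grads = [[(random.random(), random.uniform(0, 2 * math.pi))
          for _ in range(16)] for _ in range(16)]

# 4x4 grid of cells, 8 orientation bins each -> 128 values.
desc = [0.0] * 128
for y in range(16):
    for x in range(16):
        mag, ori = grads[y][x]
        cell = (y // 4) * 4 + (x // 4)           # which 4x4 cell
        obin = int(ori / (2 * math.pi) * 8) % 8  # which orientation bin
        desc[cell * 8 + obin] += mag

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

desc = normalize(desc)               # illumination invariance
desc = [min(c, 0.2) for c in desc]   # clamp: robustness to saturation
desc = normalize(desc)
print(len(desc))
```

The result is the unit-length 128-vector that gets matched against the database in the next step.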
Nearest-Neighbor Matching to Feature Database
- Hypotheses are generated by approximate nearest-neighbor matching of each feature to vectors in the database
  - SIFT uses the best-bin-first (Beis & Lowe, 97) modification to the k-d tree algorithm
  - Uses a heap data structure to identify bins in order by their distance from the query point
- Result: can give a speedup by a factor of 1000 while finding the nearest neighbor (of interest) 95% of the time
3D Object Recognition
- Extract outlines with background subtraction
3D Object Recognition
- Only 3 keys are needed for recognition, so extra keys provide robustness
- Affine model is no longer as accurate
Recognition under Occlusion
Test of Illumination Invariance
- Same image under differing illumination
- 273 keys verified in the final match
Examples of View Interpolation
Location Recognition
SIFT
- Invariances: scaling, rotation, illumination, perspective projection
- Provides: good localization
Yes / Maybe / Yes
SIFT for Matlab (at UCLA)
SIFT Demos
Run:
sift_compile
sift_demo2
Summary SIFT
(Steps 1-6 as above.)
Defines state-of-the-art in invariant feature matching!
Advanced Features: Topics
- Advanced Edge Detection
- Global Image Features (Hough Transform)
- Templates, Image Pyramid
- SIFT Features
- Learning with Many Simple Features
A Totally Different Idea
- Use many very simple features
- Learn a cascade of tests for the target object
- Efficient if:
  - features are easy to compute
  - the cascade is short
Using Many Simple Features
- Viola/Jones (generalized) Haar features: rectangular blocks, white or black
- 3 types of features:
  - two rectangles: horizontal/vertical
  - three rectangles
  - four rectangles
- In a 24x24 window: 180,000 possible features
Integral Image
Def: the integral image ii(x,y) at location (x,y) is the sum of the pixel values above and to the left of (x,y), inclusive.
We can calculate the integral image representation of the image in a single pass:
s(x,y) = s(x,y-1) + i(x,y)
ii(x,y) = ii(x-1,y) + s(x,y)
(Slide credit: Gyozo Gidofalvi)
Efficient Computation of Rectangle Value
Using the integral image representation, one can compute the value of any rectangular sum in constant time.
Example: rectangle D = ii(4) + ii(1) - ii(2) - ii(3)
As a result, two-, three-, and four-rectangle features can be computed with 6, 8, and 9 array references respectively.
(Slide credit: Gyozo Gidofalvi)
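The two recurrences and the four-reference rectangle sum can be sketched in plain Python on a tiny made-up image (here the first index walks rows, matching i(x,y) on the slide):

```python
# s(x,y) = s(x,y-1) + i(x,y);  ii(x,y) = ii(x-1,y) + s(x,y)
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]

h, w = len(img), len(img[0])
ii = [[0] * w for _ in range(h)]
for x in range(h):
    s = 0                                    # running row sum s(x, .)
    for y in range(w):
        s += img[x][y]
        ii[x][y] = (ii[x - 1][y] if x > 0 else 0) + s

def rect_sum(ii, top, left, bottom, right):
    # Sum over img[top..bottom][left..right] from 4 array references:
    # ii(bottom,right) + ii(top-1,left-1) - ii(top-1,right) - ii(bottom,left-1)
    a = ii[top - 1][left - 1] if top > 0 and left > 0 else 0
    b = ii[top - 1][right] if top > 0 else 0
    c = ii[bottom][left - 1] if left > 0 else 0
    return ii[bottom][right] + a - b - c

total = rect_sum(ii, 1, 1, 2, 2)   # bottom-right 2x2 block: 5+6+8+9
print(total)
```

Every rectangle costs the same four lookups regardless of its size, which is what makes evaluating thousands of Haar features per window affordable.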
Idea 1: Linear Separator
(Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce)
Linear Separator for Image Features
(highly related to Vapnik's Support Vector Machines)
Problem
- How to find the hyperplane?
- How to avoid evaluating 180,000 features?
- Answer: boosting [AdaBoost, Freund & Schapire]
  - Finds a small set of features that are "sufficient"
  - Generalizes very well (a lot of max-margin theory)
  - Requires positive and negative examples
AdaBoost
Idea (in Viola/Jones):
- Given a set of "weak" classifiers (here: Haar features):
  - Pick the best one
  - Reweight training examples so that misclassified images have larger weight
  - Reiterate; then linearly combine the resulting classifiers
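The pick / reweight / combine loop can be sketched in plain Python with one-dimensional threshold "stumps" standing in for Haar features; the tiny dataset is made up, and this follows the standard AdaBoost weight-update rule rather than Viola-Jones' exact variant.

```python
import math

# Tiny 1-D dataset: x values with labels +1 / -1.
X = [1, 2, 3, 4, 5, 6]
Y = [1, 1, 1, -1, -1, -1]

def stump(thresh, sign):
    return lambda x: sign if x < thresh else -sign

# Weak-classifier pool: all threshold/sign combinations (the "features").
pool = [stump(t + 0.5, s) for t in X for s in (1, -1)]

w = [1 / len(X)] * len(X)         # uniform example weights
ensemble = []
for _ in range(3):
    # Pick the weak classifier with the lowest weighted error.
    errs = [sum(wi for wi, x, y in zip(w, X, Y) if h(x) != y) for h in pool]
    e, h = min(zip(errs, pool), key=lambda p: p[0])
    e = max(e, 1e-10)                          # avoid log(0)
    alpha = 0.5 * math.log((1 - e) / e)        # classifier weight
    ensemble.append((alpha, h))
    # Reweight: misclassified examples get larger weight.
    w = [wi * math.exp(-alpha * y * h(x)) for wi, x, y in zip(w, X, Y)]
    z = sum(w)
    w = [wi / z for wi in w]

def predict(x):
    # Final classifier: sign of the linear combination of weak classifiers.
    return 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1

print([predict(x) for x in X])
```

On this separable toy data the first round already finds a perfect stump; on real data each round picks a different feature, which is how boosting selects the small "sufficient" subset of the 180,000 features.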
AdaBoost
Weak classifier 1; weights increased; weak classifier 2; weak classifier 3. The final classifier is a linear combination of the weak classifiers. (Freund & Schapire)
AdaBoost Algorithm
(Freund & Schapire)
AdaBoost Gives an Efficient Classifier
- Features = weak classifiers
- Each round selects the optimal feature given:
  - previously selected features
  - exponential loss
- AdaBoost surprise:
  - generalization error decreases even after all training examples are 100% correctly classified (margin maximization phenomenon)
Boosted Face Detection: Image Features
"Rectangle filters" provide unique binary features
Example Classifier for Face Detection
- A classifier with 200 rectangle features was learned using AdaBoost
- 95% correct detection on the test set with 1 in 14,084 false positives
- (ROC curve for the 200-feature classifier)
Classifiers are Efficient
- Given a nested set of classifier hypothesis classes
- Each stage either rejects the image sub-window as NON-FACE or passes it to the next classifier; only windows passing all classifiers are labeled FACE
- The per-stage threshold trades off % detection against % false positives
Cascaded Classifier
IMAGE SUB-WINDOW → 1 feature (passes 50%) → 5 features (passes 20%) → 20 features (passes 2%) → FACE; any rejection → NON-FACE
- A 1-feature classifier achieves 100% detection rate and about 50% false positive rate.
- A 5-feature classifier achieves 100% detection rate and 40% false positive rate (20% cumulative), using data from the previous stage.
- A 20-feature classifier achieves 100% detection rate with 10% false positive rate (2% cumulative).
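The early-rejection structure can be sketched in a few lines of plain Python; the stages here are hypothetical threshold tests on a single score (real stages are boosted classifiers over Haar features), but the control flow is the point:

```python
# Hypothetical cascade: each stage returns True (maybe face) or False.
def make_stage(threshold):
    return lambda score: score >= threshold

stages = [make_stage(1), make_stage(3), make_stage(5)]   # cheap -> expensive

def cascade(score):
    # Evaluate stages in order; any rejection ends the computation early,
    # so most (non-face) windows only pay for the first cheap test.
    for evaluated, stage in enumerate(stages, start=1):
        if not stage(score):
            return False, evaluated
    return True, len(stages)

print(cascade(0))   # rejected immediately by stage 1
print(cascade(4))   # survives two stages, rejected by stage 3
print(cascade(9))   # passes all stages: labeled FACE
```

Because the overwhelming majority of sub-windows in an image are non-faces, almost all of them exit after the cheap first stage, which is what makes 15 fps detection possible.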
Output of Face Detector on Test Images
Solving Other "Face" Tasks
Facial feature localization; demographic analysis; profile detection
Face Localization Features
- Learned features reflect the task
Face Profile Detection
Face Profile Features
Finding Cars (DARPA Urban Challenge)
- Hand-labeled images of generic car rear-ends (1100 images)
- Training time: ~5 hours, offline
(Credit: Hendrik Dahlkamp)
Generating Even More Examples
- A generic classifier finds all cars in recorded video (28,700 images)
- Compute offline and store in a database
(Credit: Hendrik Dahlkamp)
Results - Video
Summary: Viola-Jones
- Many simple features
  - Generalized Haar features (multi-rectangles)
  - Easy and efficient to compute
- Discriminative learning:
  - finds a small subset for object recognition
  - uses AdaBoost
- Result: feature cascade
  - 15 fps on a 700 MHz laptop (= fast!)
- Applications
  - Face detection
  - Car detection
  - Many others