Advanced Features. Jana Kosecka, CS223b. Slides from: S. Thrun, D. Lowe, Forsyth and Ponce
CS223b 2 Advanced Features: Topics. Template matching; SIFT features; Haar features
CS223b 3 Features for Object Detection/Recognition Want to find … in here
CS223b 4 Template Convolution. Pick a template: a rectangular/square region of an image. Goal: find it in the same image, or in images of the same scene taken from a different viewpoint.
CS223b 5 Convolution with Templates

% read image
im = imread('bridge.jpg');
bw = double(im(:,:,1)) ./ 255;
imshow(bw)

% apply FFT and invert to check
FFTim = fft2(bw);
bw2 = real(ifft2(FFTim));
imshow(bw2)

% define a kernel
kernel = zeros(size(bw));
kernel(1,1) = 1;
kernel(1,2) = -1;
FFTkernel = fft2(kernel);

% apply the kernel and check out the result
FFTresult = FFTim .* FFTkernel;
result = real(ifft2(FFTresult));
imshow(result)

% select an image patch and subtract its mean
patch = bw(221:240, 351:370);
imshow(patch)
patch = patch - sum(sum(patch)) / numel(patch);
kernel = zeros(size(bw));
kernel(1:size(patch,1), 1:size(patch,2)) = patch;
FFTkernel = fft2(kernel);

% apply the kernel and check out the result
FFTresult = FFTim .* FFTkernel;
result = max(0, real(ifft2(FFTresult)));
result = result ./ max(max(result));
result = (result > 0.5);
imshow(result)

% alternative convolution
imshow(conv2(bw, patch, 'same'))
CS223b 6 Template Convolution
CS223b 7 Aside: Convolution Theorem. Fourier transform of g: F is invertible. Convolution in the spatial domain is multiplication in the frequency domain, which is often more efficient when a fast FFT is available.
CS223b 8 Convolution with Templates (same code as slide 5)
CS223b 9 Feature Matching with Templates. Given a template, find the region in the image with the highest matching score; the matching score is where the result of convolution is maximal (or use SSD, SAD, or NCC similarity measures). Given a rotated, scaled, or perspectively distorted version of the image, can we still find the same patch (we want invariance!)? Scaling? Rotation? Illumination? Perspective projection?
CS223b 10 Feature Matching with Templates (same question as slide 9, with answers): Scaling - NO; Rotation - NO; Illumination - depends; Perspective projection - NO.
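For completeness, the NCC score mentioned above can be sketched as follows; the image, the planted patch location, and the names are made up for illustration:

```python
import numpy as np

# Zero-mean normalized cross-correlation (NCC) template matching sketch.
# The template is cut from the image itself, so the score should peak
# exactly where it was taken.
rng = np.random.default_rng(1)
img = rng.random((40, 40))
t = img[10:18, 20:28].copy()          # plant the template at (10, 20)

th, tw = t.shape
tz = t - t.mean()                     # zero-mean template
best, best_pos = -np.inf, None
for r in range(img.shape[0] - th + 1):
    for c in range(img.shape[1] - tw + 1):
        w = img[r:r+th, c:c+tw]
        wz = w - w.mean()
        denom = np.sqrt((wz**2).sum() * (tz**2).sum())
        score = (wz * tz).sum() / denom if denom > 0 else 0.0
        if score > best:
            best, best_pos = score, (r, c)

assert best_pos == (10, 20)           # NCC peaks where the patch came from
```

Note the NCC score is invariant to affine brightness changes of the window (it subtracts the mean and divides by the norm), which is why illumination invariance above is "depends" rather than a flat NO.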
CS223b 11 Scale Invariance: Image Pyramid
CS223b 12 Aliasing Effects. Constructing a pyramid by simply taking every second pixel leads to layers that badly misrepresent the top (finest) layer. Slide credit: Gary Bradski
CS223b 13 “Drop” vs. “Smooth and Drop”: drop every second pixel (aliasing problems) vs. smooth, then drop every second pixel.
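The "smooth and drop" step can be sketched in 1-D NumPy; the 5-tap binomial kernel here is a hypothetical stand-in for a proper Gaussian:

```python
import numpy as np

# "Smooth and drop": low-pass filter before taking every second sample,
# so high frequencies do not alias into the coarser pyramid level.
def gaussian_blur_1d(x, kernel=np.array([1, 4, 6, 4, 1]) / 16.0):
    return np.convolve(x, kernel, mode="same")

def pyramid_down(x):
    return gaussian_blur_1d(x)[::2]   # smooth, then drop every 2nd sample

# A fast oscillation: plain dropping keeps it at full amplitude (aliased),
# while smoothing attenuates it before decimation.
x = np.sin(np.linspace(0, 100 * np.pi, 256, endpoint=False))
naive = x[::2]                        # plain "drop"
smooth = pyramid_down(x)              # "smooth and drop"

assert smooth.shape == (128,)
assert np.abs(smooth).max() < np.abs(naive).max()
```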
CS223b 14 Improved Invariance Handling Want to find … in here
CS223b 15 SIFT Features. Invariances: scaling, rotation, illumination, deformation. Provides: good localization.
CS223b 16 SIFT Reference: David G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, 60, 2 (2004), pp. 91-110. SIFT = Scale Invariant Feature Transform
CS223b 17 Invariant Local Features Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters SIFT Features
CS223b 18 Advantages of invariant local features. Locality: features are local, so robust to occlusion and clutter (no prior segmentation). Distinctiveness: individual features can be matched to a large database of objects. Quantity: many features can be generated for even small objects. Efficiency: close to real-time performance. Extensibility: can easily be extended to a wide range of differing feature types, with each adding robustness.
CS223b 19 SIFT On-A-Slide
1. Enforce invariance to scale: compute difference-of-Gaussian maxima at many different scales; apply non-maximum suppression and find local maxima: keypoint candidates.
2. Localizable corner: for each maximum, fit a quadratic function and compute the center with sub-pixel accuracy by setting the first derivative to zero.
3. Eliminate edges: compute the ratio of eigenvalues and drop keypoints for which this ratio is larger than a threshold.
4. Enforce invariance to orientation: compute orientation, to achieve rotation invariance, by finding the strongest second derivative direction in the smoothed image (possibly multiple orientations); rotate the patch so that the orientation points up.
5. Compute feature signature: compute a "gradient histogram" of the local image region in a 4x4 pixel region; do this for 4x4 regions of that size; orient so that the largest gradient points up (possibly multiple solutions). Result: feature vector with 128 values (16 fields, 8 gradients).
6. Enforce invariance to illumination change and camera saturation: normalize to unit length to increase invariance to illumination, then threshold all gradients to become invariant to camera saturation.
CS223b 20 Finding “Keypoints” (Corners). Idea: find corners, but with scale invariance. Approach: run a linear filter (difference of Gaussians) at different resolutions of an image pyramid.
CS223b 21 Difference of Gaussians: one Gaussian minus a narrower one yields a band-pass kernel that approximates the Laplacian (see filtering lecture).
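A quick numerical sanity check of the DoG construction; the kernel size and sigmas are illustrative choices, not values from the slides:

```python
import numpy as np

# Difference of Gaussians: a broad Gaussian minus a narrower one gives a
# band-pass, Laplacian-like kernel (zero DC, centre-surround structure).
def gaussian2d(size, sigma):
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return g / g.sum()                # normalize to unit sum

dog = gaussian2d(41, 8.0) - gaussian2d(41, 4.0)

assert abs(dog.sum()) < 1e-6          # zero DC response (band-pass)
assert dog[20, 20] < 0                # centre negative, surround positive
```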
CS223b 22 Difference of Gaussians

surf(fspecial('gaussian',40,4))
surf(fspecial('gaussian',40,8))
surf(fspecial('gaussian',40,8) - fspecial('gaussian',40,4))

im = imread('bridge.jpg');
bw = double(im(:,:,1)) / 256;
for i = 1 : 10
    gaussD = fspecial('gaussian',40,2*i) - fspecial('gaussian',40,i);
    res = abs(conv2(bw, gaussD, 'same'));
    res = res / max(max(res));
    imshow(res);
    title(['\bf i = ' num2str(i)]);
    drawnow
end
CS223b 23 Gaussian Kernel Size i=1
CS223b 24 Gaussian Kernel Size i=2
CS223b 25 Gaussian Kernel Size i=3
CS223b 26 Gaussian Kernel Size i=4
CS223b 27 Gaussian Kernel Size i=5
CS223b 28 Gaussian Kernel Size i=6
CS223b 29 Gaussian Kernel Size i=7
CS223b 30 Gaussian Kernel Size i=8
CS223b 31 Gaussian Kernel Size i=9
CS223b 32 Gaussian Kernel Size i=10
CS223b 33 Key point localization. In D. Lowe’s paper the image is decomposed into octaves (consecutively sub-sampled versions of the same image); within an octave the kernels are kept the same size instead of convolving with ever larger kernels. Detect maxima and minima of the difference-of-Gaussian in scale space by comparing each sample against its 3x3 neighbourhoods at the current and adjacent scales.
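The scale-space extremum test (a sample against its 26 neighbours in the 3x3x3 block spanning its own and the two adjacent scales) can be sketched as follows; the data and function name are toy illustrations:

```python
import numpy as np

# A sample at (scale s, row r, col c) is a keypoint candidate iff it is
# strictly the largest or strictly the smallest of the 27 values in the
# 3x3x3 neighbourhood across adjacent scales.
def is_extremum(dog, s, r, c):
    block = dog[s-1:s+2, r-1:r+2, c-1:c+2]
    v = dog[s, r, c]
    is_max = v == block.max() and (block == v).sum() == 1
    is_min = v == block.min() and (block == v).sum() == 1
    return is_max or is_min

dog = np.zeros((3, 5, 5))             # toy DoG stack: 3 scales, 5x5 images
dog[1, 2, 2] = 1.0                    # lone peak at the middle scale

assert is_extremum(dog, 1, 2, 2)      # the peak is an extremum
assert not is_extremum(dog, 1, 2, 1)  # its neighbour is not
```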
CS223b 34 Example of keypoint detection (a) 233x189 image (b) 832 DOG extrema (c) 729 above threshold
CS223b 35 SIFT On-A-Slide (recap of slide 19)
CS223b 36 Example of keypoint detection. Threshold on the value at the DOG peak and on the ratio of principal curvatures (Harris approach). (c) 729 left after peak value threshold (from 832). (d) 536 left after testing ratio of principal curvatures.
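The principal-curvature ratio test can be sketched with Lowe's trace/determinant trick, which avoids computing the eigenvalues explicitly; r = 10 is the value used in Lowe's paper, the function name is made up:

```python
import numpy as np

# Edge rejection: for the 2x2 Hessian H of the DoG at a keypoint, keep the
# point iff tr(H)^2 / det(H) < (r + 1)^2 / r, i.e. the ratio of principal
# curvatures is below r. Edge-like points have one dominant curvature.
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:                      # curvatures of opposite sign: reject
        return False
    return tr * tr / det < (r + 1) ** 2 / r

assert passes_edge_test(5.0, 4.0, 0.0)        # blob-like: similar curvatures
assert not passes_edge_test(50.0, 1.0, 0.0)   # edge-like: one dominant curvature
```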
CS223b 37 SIFT On-A-Slide (recap of slide 19)
CS223b 38 Select canonical orientation. Create a histogram of local gradient directions computed at the selected scale; assign the canonical orientation at the peak of the smoothed histogram. Each key then specifies stable 2D coordinates (x, y, scale, orientation).
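A minimal sketch of the orientation histogram (36 bins of 10 degrees is Lowe's choice; the patch and helper name here are synthetic, and the histogram smoothing is omitted):

```python
import numpy as np

# Canonical orientation: histogram gradient directions around the keypoint,
# weighted by gradient magnitude, and take the peak bin.
def dominant_orientation(patch, nbins=36):
    dy, dx = np.gradient(patch.astype(float))
    mag = np.hypot(dx, dy)
    ang = np.degrees(np.arctan2(dy, dx)) % 360.0
    hist, edges = np.histogram(ang, bins=nbins, range=(0, 360), weights=mag)
    peak = np.argmax(hist)
    return (edges[peak] + edges[peak + 1]) / 2.0   # bin centre, degrees

# The gradient of a horizontal ramp points along +x, i.e. bin [0, 10).
ramp = np.tile(np.arange(16.0), (16, 1))
assert abs(dominant_orientation(ramp) - 5.0) < 1e-9
```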
CS223b 39 SIFT On-A-Slide (recap of slide 19)
CS223b 40 SIFT vector formation. Thresholded image gradients are sampled over a 16x16 array of locations in scale space; create an array of orientation histograms. 8 orientations x 4x4 histogram array = 128 dimensions.
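The 4x4 x 8-bin layout can be sketched as follows; this omits the Gaussian weighting and trilinear interpolation of the full method, and the helper name is made up:

```python
import numpy as np

# Descriptor sketch: split a 16x16 patch into a 4x4 grid of 4x4 cells,
# build an 8-bin orientation histogram per cell (4*4*8 = 128), then
# normalize, clip at 0.2 (the camera-saturation step) and renormalize.
def sift_like_descriptor(patch):
    dy, dx = np.gradient(patch.astype(float))
    mag, ang = np.hypot(dx, dy), np.degrees(np.arctan2(dy, dx)) % 360.0
    desc = []
    for i in range(4):
        for j in range(4):
            sl = np.s_[4*i:4*i+4, 4*j:4*j+4]
            hist, _ = np.histogram(ang[sl], bins=8, range=(0, 360),
                                   weights=mag[sl])
            desc.extend(hist)
    v = np.array(desc)
    v = v / (np.linalg.norm(v) or 1.0)     # illumination normalization
    v = np.minimum(v, 0.2)                 # threshold strong gradients
    return v / (np.linalg.norm(v) or 1.0)  # renormalize to unit length

d = sift_like_descriptor(np.random.default_rng(2).random((16, 16)))
assert d.shape == (128,)
assert abs(np.linalg.norm(d) - 1.0) < 1e-9
```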
CS223b 41 Nearest-neighbor matching to feature database. Hypotheses are generated by approximate nearest-neighbor matching of each feature to vectors in the database. SIFT uses the best-bin-first (Beis & Lowe, 97) modification of the k-d tree algorithm: a heap data structure identifies bins in order of their distance from the query point. Result: a speedup by a factor of 1000 while still finding the nearest neighbor (of interest) 95% of the time.
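A matching sketch; note it uses exhaustive nearest-neighbour search plus Lowe's distance-ratio acceptance test rather than the best-bin-first k-d tree the slide describes, and all data is synthetic:

```python
import numpy as np

# Brute-force nearest neighbour in descriptor space with a distance-ratio
# test: accept a match only if the best database entry is clearly closer
# than the second best, otherwise reject as ambiguous.
def match(query, db, ratio=0.8):
    d = np.linalg.norm(db - query, axis=1)
    i, j = np.argsort(d)[:2]                    # two closest entries
    return i if d[i] < ratio * d[j] else None   # None: ambiguous, reject

db = np.eye(4)                        # four well-separated unit descriptors
assert match(np.array([0.9, 0.1, 0.0, 0.0]), db) == 0
assert match(np.array([0.5, 0.5, 0.0, 0.0]), db) is None  # tie: rejected
```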
CS223b 42 3D Object Recognition Extract outlines with background subtraction
CS223b 43 3D Object Recognition. Only 3 keys are needed for recognition, so extra keys provide robustness; the affine model is no longer as accurate.
CS223b 44 Recognition under occlusion
CS223b 45 Test of illumination invariance Same image under differing illumination 273 keys verified in final match
CS223b 46 Examples of view interpolation
CS223b 47 Location recognition
CS223b 48 SIFT. Invariances: scaling, rotation, illumination, perspective projection. Provides: good localization. State-of-the-art in invariant feature matching! Alternative detectors/descriptors/references can be found at http://www.robots.ox.ac.uk/~vgg/software/
CS223b 49 SOFTWARE for Matlab (at UCLA)
CS223b 50 SIFT demos. Run sift_compile, then sift_demo2.
CS223b 51 Advanced Features: Topics. SIFT features; learning with many simple features
CS223b 52 A totally different idea. Use many very simple features and learn a cascade of tests for the target object. Efficient if the features are easy to compute and the cascade is short.
CS223b 53 Using Many Simple Features: Viola-Jones / Haar Features. (Generalized) Haar features: rectangular blocks, white or black. 3 types of features: two rectangles (horizontal/vertical), three rectangles, four rectangles. In a 24x24 window: 180,000 possible features.
CS223b 54 Integral Image. Def: the integral image at location (x,y) is the sum of the pixel values above and to the left of (x,y), inclusive. We can calculate the integral image representation in a single pass over the image using the recurrences s(x,y) = s(x,y-1) + i(x,y) and ii(x,y) = ii(x-1,y) + s(x,y). Slide credit: Gyozo Gidofalvi
CS223b 55 Efficient Computation of Rectangle Value. Using the integral image representation, one can compute the value of any rectangular sum in constant time. Example: rectangle D = ii(4) + ii(1) - ii(2) - ii(3). As a result, two-, three-, and four-rectangle features can be computed with 6, 8, and 9 array references respectively. Slide credit: Gyozo Gidofalvi
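The recurrences and the four-reference rectangle sum can be sketched as follows (a cumulative sum stands in for the single-pass recurrence; names are illustrative):

```python
import numpy as np

# Integral image: ii(x, y) holds the sum of all pixels above and to the
# left (inclusive); any rectangle sum then needs at most 4 array references.
def integral_image(img):
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):     # inclusive corner coordinates
    total = ii[r1, c1]
    if r0 > 0: total -= ii[r0 - 1, c1]
    if c0 > 0: total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0: total += ii[r0 - 1, c0 - 1]
    return total

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 2, 2) == img[1:3, 1:3].sum()   # 5+6+9+10 = 30
assert rect_sum(ii, 0, 0, 3, 3) == img.sum()             # whole image
```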
CS223b 56 Idea 1: Linear Separator Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
CS223b 57 Linear Separator for Image features (highly related to Vapnik’s Support Vector Machines) Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
CS223b 58 Problem. How to find the hyperplane? How to avoid evaluating all 180,000 features? Answer: Boosting [AdaBoost, Freund/Schapire]. Finds a small set of features that are “sufficient”; generalizes very well (a lot of max-margin theory); requires positive and negative examples.
CS223b 59 AdaBoost. Idea (in Viola/Jones): given a set of “weak” classifiers, pick the best one; reweight the training examples so that misclassified images have larger weight; reiterate, then linearly combine the resulting classifiers. Weak classifiers: Haar features.
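The pick-reweight-repeat loop can be sketched on toy 1-D data with threshold stumps standing in for Haar features; everything here is an illustrative sketch, not the Viola-Jones implementation:

```python
import numpy as np

# AdaBoost sketch: each round picks the threshold stump with the lowest
# weighted error, then up-weights the examples it misclassified. The
# final classifier is the sign of an alpha-weighted vote of the stumps.
def train_adaboost(x, y, rounds=5):
    w, ensemble = np.ones(len(x)) / len(x), []
    thresholds = (x[:-1] + x[1:]) / 2.0
    for _ in range(rounds):
        best = None
        for t in thresholds:                   # weak learner search
            for sign in (1, -1):
                pred = sign * np.where(x > t, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, t, sign)
        err, t, sign = best
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        pred = sign * np.where(x > t, 1, -1)
        w = w * np.exp(-alpha * y * pred)      # up-weight mistakes
        w /= w.sum()
        ensemble.append((alpha, t, sign))
    return ensemble

def predict(ensemble, x):
    votes = sum(a * s * np.where(x > t, 1, -1) for a, t, s in ensemble)
    return np.where(votes > 0, 1, -1)

x = np.array([0., 1, 2, 3, 4, 5, 6, 7])
y = np.array([-1, -1, 1, 1, -1, 1, 1, 1])  # not separable by one stump
model = train_adaboost(x, y)
assert (predict(model, x) == y).all()      # boosting fits the training set
```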
CS223b 60 AdaBoost. We will discuss the classification later; here is a sneak preview of AdaBoost and results on face and car detection, to be continued when discussing object detection and recognition.
CS223b 61 AdaBoost. Weak classifier 1; weights increased; weak classifier 2; weak classifier 3. The final classifier is a linear combination of the weak classifiers. Freund & Schapire
CS223b 62 Adaboost Algorithm. Freund & Schapire
CS223b 63 AdaBoost gives an efficient classifier: features = weak classifiers; each round selects the optimal feature given the previously selected features and the exponential loss. AdaBoost surprise: generalization error decreases even after all training examples are 100% correctly classified (margin maximization phenomenon).
CS223b 64 Boosted Face Detection: Image Features “Rectangle filters” Unique Binary Features Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
CS223b 65 Example Classifier for Face Detection. A classifier with 200 rectangle features was learned using AdaBoost: 95% correct detection on the test set with 1 in 14084 false positives. ROC curve for the 200-feature classifier. Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
CS223b 66 Classifiers are Efficient. Given a nested set of classifier hypothesis classes (trading % detection against % false positives), each image sub-window passes through Classifier 1, then 2, then 3; a negative response at any stage rejects the sub-window as NON-FACE, and only windows accepted by every stage are labeled FACE. Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
CS223b 67 Cascaded Classifier. An image sub-window passes a 1-feature, then a 5-feature, then a 20-feature classifier; any failure rejects it as NON-FACE. The 1-feature classifier achieves a 100% detection rate and about a 50% false positive rate. The 5-feature classifier achieves a 100% detection rate and a 40% false positive rate (20% cumulative) using data from the previous stage. The 20-feature classifier achieves a 100% detection rate with a 10% false positive rate (2% cumulative). Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
CS223b 68 Output of Face Detector on Test Images. Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
CS223b 69 Solving other “Face” Tasks: facial feature localization, demographic analysis, profile detection. Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
CS223b 70 Face Localization Features Learned features reflect the task Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
CS223b 71 Face Profile Detection. Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
CS223b 72 Face Profile Features
CS223b 73 Finding Cars (DARPA Urban Challenge). Hand-labeled images of generic car rear-ends: 1100 images; training time ~5 hours, offline. Credit: Hendrik Dahlkamp
CS223b 74 Generating even more examples. The generic classifier finds all cars in recorded video; computed offline and stored in a database: 28700 images. Credit: Hendrik Dahlkamp
CS223b 75 Results - Video
CS223b 76 Summary: Viola-Jones. Many simple features: generalized Haar features (multi-rectangles), easy and efficient to compute. Discriminative learning finds a small subset for object recognition, using AdaBoost. Result: a feature cascade running at 15 fps on a 700 MHz laptop (= fast!). Applications: face detection, car detection, many others.