Interest Point Descriptors and Matching


Interest Point Descriptors and Matching CS485/685 Computer Vision Dr. George Bebis

Interest Point Descriptors: (1) extract affine regions, (2) normalize regions, (3) eliminate rotational ambiguity, (4) compute appearance descriptors, e.g., SIFT (Lowe ’04)

Simplest approach: correlation

Simplest approach: correlation (cont’d) Works satisfactorily when matching corresponding regions related mostly by translation, e.g., stereo pairs or video sequences with small camera motion.

Simplest approach: correlation (cont’d) Sensitive to small variations in location, pose, scale, and intra-class variability. Poorly distinctive! Need more powerful descriptors!

Scale Invariant Feature Transform (SIFT) Take a 16×16 window around the interest point (i.e., at the scale detected). Divide it into a 4×4 grid of cells. Compute a histogram of image gradient orientations in each cell (8 bins each). 16 histograms × 8 orientation bins = 128 features
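The descriptor layout above can be sketched in a few lines of NumPy. This is illustrative only: it omits the Gaussian weighting and soft binning described later, and the function name sift_descriptor is mine, not Lowe's.

```python
import numpy as np

def sift_descriptor(patch):
    """Sketch of the SIFT descriptor layout: a 16x16 patch is split into
    a 4x4 grid of 4x4 cells; each cell contributes an 8-bin histogram of
    gradient orientations, giving 16 * 8 = 128 features."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)   # orientation in [0, 2*pi)
    desc = []
    for i in range(0, 16, 4):                     # 4x4 grid of cells
        for j in range(0, 16, 4):
            bins = (ang[i:i+4, j:j+4] / (2 * np.pi) * 8).astype(int) % 8
            hist = np.bincount(bins.ravel(),
                               weights=mag[i:i+4, j:j+4].ravel(),
                               minlength=8)
            desc.append(hist)
    return np.concatenate(desc)                   # 128-dimensional vector

patch = np.random.rand(16, 16)
d = sift_descriptor(patch)
```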

Properties of SIFT Highly distinctive: a single feature can be correctly matched with high probability against a large database of features from many images. Scale and rotation invariant. Partially invariant to 3D camera viewpoint: can tolerate up to about 60-degree out-of-plane rotation. Partially invariant to changes in illumination. Can be computed fast and efficiently.

Example: http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT

SIFT Computation – Steps (1) Scale-space extrema detection Extract scale and rotation invariant interest points (i.e., keypoints). (2) Keypoint localization Determine location and scale for each interest point. Eliminate “weak” keypoints (3) Orientation assignment Assign one or more orientations to each keypoint. (4) Keypoint descriptor Use local image gradients at the selected scale. D. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, 60(2):91-110, 2004. Cited 9589 times (as of 3/7/2011)

Scale-space Extrema Detection Harris-Laplace: find local maxima of the Harris detector in space and of the LoG in scale. SIFT: find local maxima of the Hessian in space and of the DoG in scale.

1. Scale-space Extrema Detection (cont’d) DoG images are grouped into octaves (each octave corresponds to a doubling of σ0), with a fixed number of levels per octave; the image is down-sampled when the smoothing reaches 2σ0.

1. Scale-space Extrema Detection (cont’d) Images within each octave are separated by a constant factor k. If each octave is divided into s intervals, then ks = 2, i.e., k = 21/s.
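A minimal sketch of the scale levels within one octave (the helper name octave_sigmas is mine, not from the slides):

```python
def octave_sigmas(sigma0, s):
    """Scales within one octave: k = 2**(1/s), so after s intervals
    the smoothing sigma has exactly doubled."""
    k = 2 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 1)]

sig = octave_sigmas(1.6, 3)   # base scale 1.6, 3 intervals per octave
```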

Choosing SIFT parameters Parameters (i.e., scales per octave, σ0, etc.) can be chosen experimentally based on keypoint (i) repeatability, (ii) localization, and (iii) matching accuracy. In Lowe’s paper, keypoints were extracted from 32 real images (outdoor scenes, faces, aerial images, etc.), and the images were subjected to a wide range of transformations (i.e., rotation, scaling, shear, change in brightness, noise).

Choosing SIFT parameters (cont’d) How many scales per octave? Lowe uses 3 scales per octave; with more scales the number of keypoints increases, but they are not stable!

Choosing SIFT parameters (cont’d) Smoothing is applied to the first level of each octave. How should we choose σ0? Lowe’s experiments suggest σ0 = 1.6.

Choosing SIFT parameters (cont’d) Pre-smoothing discards high frequencies, so the size of the input image is doubled (i.e., using linear interpolation) prior to building the first level of the DoG pyramid. This increases the number of stable keypoints by a factor of 4.

1. Scale-space Extrema Detection (cont’d) Extract local extrema (i.e., minima or maxima) of the DoG pyramid: compare each point to its 8 neighbors at the same level, 9 neighbors in the level above, and 9 neighbors in the level below (i.e., 26 neighbors in total).
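The 26-neighbor test can be sketched as follows (a simplified check on a DoG stack stored as a NumPy array indexed by scale, row, column; not an optimized implementation):

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """Check whether dog[s, y, x] is a strict extremum among its 26
    neighbors: 8 at the same scale, 9 in the scale above, 9 below."""
    cube = dog[s-1:s+2, y-1:y+2, x-1:x+2]     # 3x3x3 neighborhood
    center = dog[s, y, x]
    others = np.delete(cube.ravel(), 13)      # drop the center itself
    return center > others.max() or center < others.min()

dog = np.zeros((3, 5, 5))
dog[1, 2, 2] = 1.0                            # a single isolated maximum
```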

2. Keypoint Localization Determine the location and scale of keypoints to sub-pixel and sub-scale accuracy by fitting a 3D quadratic to the DoG function around each keypoint; the fitted offset gives the sub-pixel, sub-scale estimated location. Substantial improvement to matching and stability!

2. Keypoint Localization Use a Taylor expansion to locally approximate D(x, y, σ) (i.e., the DoG function) around a sample point X = (x, y, σ): D(X + ΔX) ≈ D + (∂D/∂X)ᵀ ΔX + ½ ΔXᵀ (∂²D/∂X²) ΔX. Setting the derivative with respect to ΔX to zero gives the extremum offset ΔX̂ = −(∂²D/∂X²)⁻¹ (∂D/∂X).

2. Keypoint Localization ΔX can be computed by solving the 3×3 linear system (∂²D/∂X²) ΔX = −(∂D/∂X), where the derivatives are approximated using finite differences. If the offset is larger than 0.5 in any dimension, the extremum lies closer to a neighboring sample point; repeat the fit there.
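A sketch of this refinement step, assuming the gradient and Hessian of D have already been estimated with finite differences (the function name refine_offset is mine):

```python
import numpy as np

def refine_offset(grad, hessian):
    """Solve H * dX = -g for the sub-pixel/sub-scale offset
    dX = (dx, dy, dsigma). If any |offset| > 0.5, the extremum lies
    closer to a neighboring sample, so the fit should be repeated there."""
    dX = np.linalg.solve(hessian, -grad)
    needs_repeat = bool(np.any(np.abs(dX) > 0.5))
    return dX, needs_repeat

g = np.array([0.2, -0.1, 0.05])   # toy finite-difference gradient
H = np.eye(3)                     # toy Hessian
dX, again = refine_offset(g, H)
```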

2. Keypoint Localization (cont’d) Reject keypoints having low contrast (i.e., sensitive to noise): if |D(X̂)| < 0.03, reject the keypoint (this assumes that image values have been normalized to [0, 1]).

2. Keypoint Localization (cont’d) Reject points lying on edges (or being close to edges). Harris uses the auto-correlation matrix: R(AW) = det(AW) − α trace²(AW), or R(AW) = λ1λ2 − α(λ1 + λ2)².

2. Keypoint Localization (cont’d) SIFT uses the Hessian matrix H (for efficiency); the Hessian encodes the principal curvatures. Let α be the largest eigenvalue (λmax) and β the smallest (λmin) (proportional to the principal curvatures), and let r = α/β (SIFT uses r = 10). Reject the keypoint if Tr(H)²/Det(H) > (r + 1)²/r.
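The edge-rejection test can be sketched directly from the formula (illustrative; in practice the 2×2 Hessian comes from finite differences of the DoG image):

```python
import numpy as np

def passes_edge_test(hessian2x2, r=10.0):
    """Keep a keypoint only if Tr(H)^2 / Det(H) < (r+1)^2 / r, which
    bounds the ratio of the principal curvatures by r (SIFT uses r = 10)."""
    tr = np.trace(hessian2x2)
    det = np.linalg.det(hessian2x2)
    if det <= 0:                      # curvatures of opposite sign: reject
        return False
    return tr * tr / det < (r + 1) ** 2 / r

blob = np.array([[2.0, 0.0], [0.0, 2.0]])    # equal curvatures: keep
edge = np.array([[50.0, 0.0], [0.0, 0.5]])   # elongated ridge: reject
```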

2. Keypoint Localization (cont’d) (a) 233x189 image (b) 832 DoG extrema (c) 729 left after low contrast threshold (d) 536 left after testing ratio based on Hessian

3. Orientation Assignment Create a histogram of gradient directions within a region around the keypoint, at the selected scale. The [0, 2π] range is divided into 36 bins (i.e., 10° per bin). Histogram entries are weighted by (i) the gradient magnitude and (ii) a Gaussian function with σ equal to 1.5 times the scale of the keypoint.

3. Orientation Assignment (cont’d) Assign the canonical orientation at the peak of the smoothed histogram (fit a parabola to better localize the peak). For peaks within 80% of the highest peak, multiple orientations are assigned to the keypoint. About 15% of keypoints have multiple orientations assigned. Significantly improves the stability of matching.
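Peak selection with parabolic refinement over a 36-bin histogram can be sketched as follows (simplified; histogram smoothing is omitted, and the function name is mine):

```python
import numpy as np

def dominant_orientations(hist, peak_ratio=0.8):
    """Return orientations (degrees) at local peaks of a circular
    gradient-direction histogram that lie within peak_ratio of the
    highest peak; a parabola through each peak and its two neighbors
    refines the bin position."""
    n = len(hist)
    thresh = peak_ratio * hist.max()
    out = []
    for i in range(n):
        l, r = hist[(i - 1) % n], hist[(i + 1) % n]
        if hist[i] >= thresh and hist[i] > l and hist[i] > r:
            # vertex of the parabola through (-1, l), (0, hist[i]), (+1, r)
            offset = 0.5 * (l - r) / (l - 2 * hist[i] + r)
            out.append(((i + offset) % n) * (360.0 / n))
    return out

h = np.zeros(36)
h[9] = 10.0
h[8] = h[10] = 5.0          # symmetric peak centered on bin 9 (90 degrees)
ori = dominant_orientations(h)
```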

3. Orientation Assignment (cont’d) Stability of location, scale, and orientation (within 15 degrees) under noise.

4. Keypoint Descriptor (figure: gradient orientations quantized into 8 bins)

4. Keypoint Descriptor (cont’d) Take a 16×16 window around the detected interest point. Divide it into a 4×4 grid of cells. Compute a gradient-orientation histogram in each cell (8 bins). 16 histograms × 8 orientation bins = 128 features

4. Keypoint Descriptor (cont’d) Each histogram entry is weighted by (i) gradient magnitude and (ii) a Gaussian function with σ equal to 0.5 times the width of the descriptor window.

4. Keypoint Descriptor (cont’d) Partial voting: distribute histogram entries into adjacent bins (i.e., additional robustness to shifts). Each entry is added to each adjacent bin, multiplied by a weight of 1 − d, where d is the distance from the sample to that bin’s center.
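The partial-voting rule for a single gradient sample can be sketched as follows, for the 8 orientation bins (spatial soft binning works the same way; the helper name soft_vote is mine):

```python
import numpy as np

def soft_vote(hist, angle, weight, n_bins=8):
    """Distribute one gradient sample between its two nearest orientation
    bins: each bin receives weight * (1 - d), where d is the distance
    from the sample to that bin center, in bin units."""
    pos = angle / (2 * np.pi) * n_bins        # fractional bin position
    lo = int(np.floor(pos)) % n_bins
    hi = (lo + 1) % n_bins
    d = pos - np.floor(pos)
    hist[lo] += weight * (1 - d)
    hist[hi] += weight * d
    return hist

# A sample halfway between bins 0 and 1 splits its weight evenly.
h = soft_vote(np.zeros(8), angle=np.pi / 8, weight=1.0)
```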

4. Keypoint Descriptor (cont’d) The descriptor depends on two main parameters: (1) the number of orientation bins r and (2) the n × n array of orientation histograms, giving r·n² features. SIFT uses r = 8 and n = 4: 128 features.

4. Keypoint Descriptor (cont’d) Invariance to linear illumination changes: normalization to unit length is sufficient.

4. Keypoint Descriptor (cont’d) Non-linear illumination changes: saturation affects gradient magnitudes more than orientations. Threshold entries to be no larger than 0.2, then renormalize to unit length.
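The two normalization steps can be sketched as:

```python
import numpy as np

def normalize_descriptor(desc, clip=0.2):
    """SIFT illumination normalization: scale to unit length (handles
    linear changes), clip entries at 0.2 to damp saturation effects,
    then renormalize to unit length."""
    desc = desc / np.linalg.norm(desc)
    desc = np.minimum(desc, clip)
    return desc / np.linalg.norm(desc)

# A toy 4-entry "descriptor" with one dominant gradient magnitude.
d = normalize_descriptor(np.array([10.0, 1.0, 1.0, 1.0]))
```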

Robustness to viewpoint changes Match features after random change in image scale and orientation, with 2% image noise, and affine distortion. Find nearest neighbor in database of 30,000 features. Additional robustness can be achieved using affine invariant region detectors.

Distinctiveness Vary size of database of features, with 30 degree affine change, 2% image noise. Measure % correct for single nearest neighbor match.

Matching SIFT features Given a feature in I1, how do we find the best match in I2? (1) Define a distance function that compares two descriptors. (2) Test all the features in I2 and find the one with minimum distance.

Matching SIFT features (cont’d) (figure: candidate feature f1 in image I1 compared against feature f2 in image I2)

Matching SIFT features (cont’d) Accept a match if SSD(f1, f2) < t. How do we choose t?

Matching SIFT features (cont’d) A better distance measure is the ratio SSD(f1, f2) / SSD(f1, f2’), where f2 is the best SSD match to f1 in I2 and f2’ is the 2nd-best SSD match to f1 in I2.

Matching SIFT features (cont’d) Accept a match if SSD(f1, f2) / SSD(f1, f2’) < t. A threshold of t = 0.8 has given good results in object recognition: 90% of false matches were eliminated while fewer than 5% of correct matches were discarded.
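Lowe's ratio test can be sketched with brute-force nearest neighbors (illustrative; real systems use approximate nearest-neighbor search over 128-dimensional descriptors):

```python
import numpy as np

def match_ratio_test(desc1, desc2, t=0.8):
    """For each descriptor in desc1, find its two nearest neighbors in
    desc2 (squared Euclidean distance) and accept the match only if
    best / second_best < t."""
    matches = []
    for i, f in enumerate(desc1):
        d2 = np.sum((desc2 - f) ** 2, axis=1)
        j, k = np.argsort(d2)[:2]
        if d2[j] < t * d2[k]:
            matches.append((i, int(j)))
    return matches

a = np.array([[0.0, 0.0]])
b = np.array([[0.1, 0.0], [5.0, 5.0]])   # clear winner -> accepted
c = np.array([[1.0, 0.0], [0.0, 1.0]])   # ambiguous    -> rejected
```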

Matching SIFT features (cont’d) How do we evaluate the performance of a feature matcher? (figure: candidate matches with distances 50, 75, and 200)

Matching SIFT features (cont’d) The threshold t affects the number of correct/false matches. True positives (TP) = # of detected matches that are correct. False positives (FP) = # of detected matches that are incorrect.

Matching SIFT features (cont’d) ROC curve (TP rate vs. FP rate): generated by computing (FP, TP) for different thresholds. A good matcher maximizes the area under the curve (AUC). http://en.wikipedia.org/wiki/Receiver_operating_characteristic

Applications of SIFT Object recognition Object categorization Location recognition Robot localization Image retrieval Image panoramas

Object Recognition Object Models

Object Categorization

Location recognition

Robot Localization

Map continuously built over time

Image retrieval – Example 1 Database of > 5000 images; change in viewing angle.

Matches: 22 correct matches.

Image retrieval – Example 2 Database of > 5000 images; change in viewing angle + scale change.

Matches: 33 correct matches.

Image panoramas from an unordered image set

Variations of SIFT features PCA-SIFT SURF GLOH

SIFT Steps - Review (1) Scale-space extrema detection Extract scale and rotation invariant interest points (i.e., keypoints). (2) Keypoint localization Determine location and scale for each interest point. Eliminate “weak” keypoints (3) Orientation assignment Assign one or more orientations to each keypoint. (4) Keypoint descriptor Use local image gradients at the selected scale. D. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, 60(2):91-110, 2004. Cited 9589 times (as of 3/7/2011)

PCA-SIFT Steps 1–3 are the same; Step 4 is modified. Take a 41×41 patch at the given scale, centered at the keypoint, and normalized to a canonical direction. Yan Ke and Rahul Sukthankar, “PCA-SIFT: A More Distinctive Representation for Local Image Descriptors”, Computer Vision and Pattern Recognition, 2004.

PCA-SIFT Instead of using weighted histograms, concatenate the horizontal and vertical gradients (39×39 each) into a long vector: 2 × 39 × 39 = 3042 elements. Normalize the vector to unit length.

PCA-SIFT Reduce the dimensionality of the vector using Principal Component Analysis (PCA), e.g., from 3042 to 36. Sometimes less discriminatory than SIFT.

SURF: Speeded Up Robust Features Speed up computations by fast approximation of (i) the Hessian matrix and (ii) the descriptor using “integral images”. What is an “integral image”? Herbert Bay, Tinne Tuytelaars, and Luc Van Gool, “SURF: Speeded Up Robust Features”, European Conference on Computer Vision (ECCV), 2006.

Integral Image The integral image IΣ(x, y) of an image I(x, y) represents the sum of all pixels of I over the rectangular region formed by the origin (0, 0) and (x, y). Using integral images, it takes only four array references to calculate the sum of pixels over a rectangular region of any size.
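An integral image and the four-reference box sum can be sketched as:

```python
import numpy as np

def integral_image(img):
    """I_sum(x, y) = sum of all pixels of img in the rectangle from
    (0, 0) to (x, y) inclusive; two cumulative sums compute it."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] using only four array references."""
    s = ii[r1, c1]
    if r0 > 0:
        s -= ii[r0 - 1, c1]
    if c0 > 0:
        s -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        s += ii[r0 - 1, c0 - 1]
    return s

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
```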

SURF: Speeded Up Robust Features (cont’d) Approximate Lxx, Lyy, and Lxy using box filters; these can be computed very fast using integral images! (The box filters shown are 9×9 – good approximations of Gaussian second derivatives with σ = 1.2.)

SURF: Speeded Up Robust Features (cont’d) In SIFT, images are repeatedly smoothed with a Gaussian and subsequently sub-sampled in order to achieve a higher level of the pyramid.

SURF: Speeded Up Robust Features (cont’d) Alternatively, we can use filters of larger size on the original image. Due to the use of integral images, filters of any size can be applied at exactly the same speed! (See Tuytelaars’ paper for details.)

SURF: Speeded Up Robust Features (cont’d) Approximation of H: using DoG vs. using box filters.

SURF: Speeded Up Robust Features (cont’d) Instead of using a different measure for selecting the location and scale of interest points (e.g., Hessian and DoG in SIFT), SURF uses the determinant of the approximated Hessian to find both. The determinant elements must be weighted to obtain a good approximation: det(Happrox) = DxxDyy − (0.9 Dxy)².

SURF: Speeded Up Robust Features (cont’d) Once interest points have been localized both in space and scale, the next steps are: (1) orientation assignment, (2) keypoint descriptor.

SURF: Speeded Up Robust Features (cont’d) Orientation assignment: compute Haar wavelet responses (side length 4σ) in x and y within a circular neighborhood of radius 6σ around the interest point (σ = the scale at which the point was detected); responses are weighted with a Gaussian. The dominant orientation is estimated with a sliding window of 60° angle. Can be computed very fast using integral images!

SURF: Speeded Up Robust Features (cont’d) Keypoint descriptor: a square region of size 20σ, divided into a 4×4 grid of sub-regions. Sum the responses over each sub-region for dx and dy separately; to bring in information about the polarity of the intensity changes, also extract the sums of the absolute values |dx| and |dy|. Feature vector size: 4 × 16 = 64.

SURF: Speeded Up Robust Features (cont’d) In the extended descriptor, the sums of dx and |dx| are computed separately for points where dy < 0 and where dy > 0, and similarly for the sums of dy and |dy|. More discriminatory!

SURF: Speeded Up Robust Features Has been reported to be 3 times faster than SIFT. Less robust to illumination and viewpoint changes compared to SIFT. K. Mikolajczyk and C. Schmid,"A Performance Evaluation of Local Descriptors", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615-1630, 2005. 72 72

Gradient location-orientation histogram (GLOH) Compute SIFT using a log-polar location grid: 3 bins in the radial direction (radii 6, 11, and 15) and 8 bins in the angular direction (the central bin is not divided), giving 2×8 + 1 = 17 location bins. Gradient orientation is quantized into 16 bins. Total: 17 × 16 = 272 bins, reduced with PCA (to 128 dimensions in Mikolajczyk & Schmid’s evaluation). K. Mikolajczyk and C. Schmid, "A Performance Evaluation of Local Descriptors", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615-1630, 2005.

Shape Context A 3D histogram of edge point locations and orientations. Edges are extracted by the Canny edge detector. Location is quantized into 9 bins (using a log-polar coordinate system). Orientation is quantized into 4 bins (horizontal, vertical, and the two diagonals). Total number of features: 4 × 9 = 36. K. Mikolajczyk and C. Schmid, "A Performance Evaluation of Local Descriptors", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615-1630, 2005.

Spin image A histogram of quantized pixel locations and intensity values. A normalized histogram is computed for each of five rings centered on the region. The intensity of a normalized patch is quantized into 10 bins. Total number of features: 5 × 10 = 50. K. Mikolajczyk and C. Schmid, "A Performance Evaluation of Local Descriptors", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615-1630, 2005.

Differential Invariants “Local jets” of derivatives obtained by convolving the image with Gaussian derivatives. Derivatives are computed at different orientations by rotating the image patches. Example: some Gaussian derivatives up to fourth order are combined to compute invariants.

Bank of Filters (e.g., Gabor filters)

Moment Invariants Moments are computed for derivatives of an image patch using Mpqa = (1/(XY)) Σx Σy xp yq [Id(x, y)]a, where p and q give the order, a is the degree, and Id is the image gradient in direction d. Derivatives are computed in the x and y directions.
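A sketch of one generalized moment, assuming the normalized form M_pq^a = (1/(X·Y)) Σx Σy x^p y^q Id(x, y)^a implied by the slide (the function name moment is mine):

```python
import numpy as np

def moment(patch, p, q, a):
    """Generalized moment of order (p, q) and degree a of a patch,
    normalized by the patch area; in the descriptor, patch would be
    the gradient image Id in one direction."""
    Y, X = patch.shape
    x = np.arange(X)[None, :]      # column coordinates
    y = np.arange(Y)[:, None]      # row coordinates
    return (x ** p * y ** q * patch.astype(float) ** a).sum() / (X * Y)

patch = np.ones((2, 2))
m00 = moment(patch, 0, 0, 1)       # zeroth-order moment = mean intensity
```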

Bank of Filters: Steerable Filters