Distinctive Image Features from Scale-Invariant Keypoints David G. Lowe, 2004
Presentation Content Introduction Related Research Algorithm Keypoint localization Orientation assignment Keypoint descriptor Recognizing images using keypoint descriptors Achievements and Results Conclusion
Introduction Image matching is a fundamental aspect of many problems in computer vision. So how do we do that?
Scale Invariant Feature Transform (SIFT) Object or scene recognition using local invariant image features (keypoints). Invariant or robust to: scaling, rotation, illumination, 3D camera viewpoint (affine), clutter / noise, occlusion. Realtime.
Related Research Corner detectors: Moravec 1981; Harris and Stephens 1988; Zhang 1995; Torr 1995; Schmid and Mohr 1997. Scale invariant: Crowley and Parker 1984; Shokoufandeh 1999; Lindeberg 1993, 1994; Lowe 1999 (this author). Invariant to full affine transformation: Baumberg 2000; Tuytelaars and Van Gool 2000; Mikolajczyk and Schmid 2002; Schaffalitzky and Zisserman 2002; Brown and Lowe 2002.
Keypoint Detection Goal: Identify locations and scales that can be repeatably assigned under differing views of the same object. Keypoint detection is done at a specific scale and location, using the difference-of-Gaussian function.
Search for stable features across all possible scales: D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ I(x, y) = L(x, y, kσ) − L(x, y, σ), where σ = amount of smoothing and k = constant multiplicative factor, 2^(1/s), with s the number of scale intervals per octave.
Keypoint Detection Reasonably low cost. Scale sensitive. Number of scale samples per octave?
3 scale samples per octave were used (more samples yield more keypoints, but at a higher computational cost).
Determine amount of smoothing (σ) Smoothing discards high-frequency information, so the input image is doubled in size before the pyramid is built.
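The difference-of-Gaussian construction above can be sketched on a 1-D signal, which keeps the example short while following D = L(kσ) − L(σ) with k = 2^(1/s). This is a minimal illustration, not the paper's implementation: the function names are hypothetical, the starting value σ = 1.6 and the s + 3 blurred levels per octave are taken from Lowe's paper rather than from these slides, and border handling by clamping is an assumption.

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Convolve a 1-D signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def dog_octave(signal, sigma0=1.6, s=3):
    """Build one octave: s + 3 blurred levels and their s + 2 differences."""
    k = 2.0 ** (1.0 / s)                      # scale step between levels
    sigmas = [sigma0 * k ** i for i in range(s + 3)]
    blurred = [smooth(signal, sg) for sg in sigmas]
    # D(sigma) = L(k * sigma) - L(sigma) for each adjacent pair of levels
    dogs = [[hi - lo for lo, hi in zip(a, b)]
            for a, b in zip(blurred, blurred[1:])]
    return sigmas, dogs
```

With s = 3, the fourth level has twice the smoothing of the first, which is where the next octave would start.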
Accurate Keypoint Localization (1/2) Use a Taylor expansion to determine the interpolated location of each extremum (local maximum or minimum). Evaluate the function at this interpolated location and discard low-contrast extrema (below 3% difference from their surroundings).
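The interpolation step can be illustrated in one dimension, where the Taylor expansion reduces to fitting a parabola through three neighbouring samples. This is a simplified sketch: the paper solves the same problem jointly in x, y and scale, the function name is hypothetical, and the 0.03 threshold assumes values normalized to the range [0, 1].

```python
def refine_extremum_1d(d_prev, d_here, d_next, contrast_threshold=0.03):
    """Refine a sampled extremum using a second-order Taylor expansion.

    Returns (offset, value, keep), or None if the curvature is zero.
    offset is the sub-sample shift of the extremum from the centre sample,
    value is the interpolated function value at that location.
    """
    grad = 0.5 * (d_next - d_prev)            # first derivative (central difference)
    curv = d_next - 2.0 * d_here + d_prev     # second derivative
    if curv == 0.0:
        return None
    offset = -grad / curv                     # location where the derivative vanishes
    value = d_here + 0.5 * grad * offset      # function value at the refined location
    keep = abs(value) >= contrast_threshold   # reject low-contrast extrema
    return offset, value, keep
```

For samples drawn from an exact parabola the refined offset and value are exact; for real data they are a local approximation.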
Accurate Keypoint Localization (2/2) Eliminating Edge Responses Define a Hessian matrix with derivatives of pixel values in 4 directions. Determine the ratio of the larger eigenvalue to the smaller one.
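The eigenvalue ratio does not require computing the eigenvalues themselves: for a 2×2 Hessian, Tr(H)² / Det(H) depends only on the ratio r of the two eigenvalues and equals (r + 1)² / r. A minimal sketch of that check follows; the function name is hypothetical, and the default r = 10 is the threshold used in Lowe's paper, not a value given on the slide.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Return True if the principal-curvature ratio is below r.

    dxx, dyy, dxy are second derivatives of the difference-of-Gaussian
    function at the keypoint (entries of the 2x2 Hessian).
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:
        # Curvatures have opposite signs (or one is zero): not a stable extremum.
        return False
    return (trace * trace) / det < ((r + 1.0) ** 2) / r
```

A blob-like point with similar curvature in both directions passes, while a point on an edge (one large and one small curvature) is rejected.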
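Number of keypoints at each stage (figure panels): 0 (input image), 832 (initial extrema), 729 (after contrast threshold), 536 (after edge-response threshold).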
Orientation Assignment Calculate the orientation and magnitude of the gradient at each pixel. Build a histogram of the orientations of sample points near the keypoint, each weighted by its gradient magnitude and by a Gaussian-weighted circular window with a σ that is 1.5 times the scale of the keypoint.
Stable orientation results Multiple keypoints for multiple histogram peaks Interpolation
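The orientation step can be sketched as a weighted histogram followed by peak picking with parabolic interpolation. The 36 bins and the rule of keeping every peak within 80% of the highest one come from Lowe's paper rather than from these slides; the function names and the sample format (offsets from the keypoint plus gradient components) are assumptions made for illustration.

```python
import math

def orientation_histogram(samples, keypoint_scale, num_bins=36):
    """Accumulate gradient orientations around a keypoint.

    samples: iterable of (x_offset, y_offset, dx, dy), where the offsets are
    relative to the keypoint and (dx, dy) is the image gradient there.
    """
    sigma = 1.5 * keypoint_scale              # Gaussian window, as on the slide
    hist = [0.0] * num_bins
    for x_off, y_off, dx, dy in samples:
        magnitude = math.hypot(dx, dy)
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        weight = math.exp(-(x_off ** 2 + y_off ** 2) / (2.0 * sigma ** 2))
        hist[int(angle * num_bins / 360.0) % num_bins] += weight * magnitude
    return hist

def dominant_orientations(hist, peak_ratio=0.8):
    """Return interpolated peak angles (degrees) for all strong local peaks."""
    n = len(hist)
    top = max(hist)
    if top <= 0.0:
        return []
    peaks = []
    for i, h in enumerate(hist):
        left, right = hist[i - 1], hist[(i + 1) % n]
        if h >= peak_ratio * top and h > left and h > right:
            # Fit a parabola through the three bins to refine the peak position.
            shift = 0.5 * (left - right) / (left - 2.0 * h + right)
            peaks.append(((i + 0.5 + shift) * 360.0 / n) % 360.0)
    return peaks
```

Each returned angle would produce its own keypoint at the same location and scale, which is how multiple histogram peaks lead to multiple keypoints.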
The Local Image Descriptor We can now find keypoints invariant to location, scale and orientation. Now compute descriptors for each keypoint. Highly distinctive yet invariant to illumination and 3D viewpoint changes. Biologically inspired approach.
Divide sample points around the keypoint into 16 regions (4 regions used in picture). Create a histogram of orientations for each region (8 bins). Trilinear interpolation. Vector normalization.
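With 16 regions of 8 bins each, the descriptor has 16 × 8 = 128 entries. The vector normalization step can be sketched as below. Normalizing to unit length cancels changes in image contrast; the additional clamping of large entries at 0.2 followed by renormalization is the scheme described in Lowe's paper, not something stated on the slide, and the function name is hypothetical.

```python
import math

def normalize_descriptor(vec, clamp=0.2):
    """Normalize a descriptor to unit length, limiting the influence of large entries."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return list(vec)
    # First normalization removes the effect of contrast changes,
    # then large gradient magnitudes are clamped.
    unit = [min(v / norm, clamp) for v in vec]
    norm2 = math.sqrt(sum(v * v for v in unit))
    # Renormalize so the final vector has unit length again.
    return [v / norm2 for v in unit]
```

Scaling all input entries by a constant, as a contrast change would, leaves the output unchanged.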
Descriptor Testing This graph shows the percent of keypoints giving the correct match to a database of 40,000 keypoints as a function of width of the n×n keypoint descriptor and the number of orientations in each histogram. The graph is computed for images with affine viewpoint change of 50 degrees and addition of 4% noise.
Keypoint Matching Look for the nearest neighbor in the database (Euclidean distance). Compare the distance of the closest neighbor to that of the second-closest neighbor. If distance closest / distance second-closest > 0.8, discard the match.
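The distance-ratio rule can be sketched with an exhaustive search over the database. This is only an illustration of the acceptance test: the function name is hypothetical, and a linear scan is used for clarity instead of an approximate index.

```python
import math

def match_keypoint(query, database, max_ratio=0.8):
    """Return the index of the best match in database, or None if ambiguous.

    query is one descriptor (sequence of floats); database is a list of
    descriptors of the same length.
    """
    if len(database) < 2:
        return None
    # Sort all candidates by Euclidean distance to the query.
    dists = sorted((math.dist(query, d), i) for i, d in enumerate(database))
    (best, best_index), (second, _) = dists[0], dists[1]
    # Discard when the closest match is not clearly better than the runner-up.
    if second == 0.0 or best / second > max_ratio:
        return None
    return best_index
```

For example, a query at distance 1 from its nearest neighbor and 5 from the next is accepted, while distances of 1 and 1.1 are rejected as ambiguous.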
Efficient Nearest Neighbor Indexing 128-dimensional feature vector. Best-Bin-First (BBF): a modified k-d tree algorithm. Only finds an approximate answer. Works well because of the 0.8 distance rule.
Clustering with the Hough Transform Select 1% inliers among 99% outliers. Find clusters of features that vote for the same object pose: 2D location, scale, orientation, location relative to original training image. Use broad bin sizes.
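Pose voting with broad bins can be sketched as quantizing each predicted pose and grouping the matches that fall into the same bin. This is a simplified illustration with hypothetical names: the 30-degree orientation bins, factor-of-2 scale bins and minimum of 3 votes follow Lowe's paper, while the fixed location bin width is an assumption (the paper ties it to the projected training image size), and each match votes for a single bin here instead of the neighbouring bins as well.

```python
import math
from collections import defaultdict

def pose_bin(x, y, scale, orientation_deg, location_bin=64.0):
    """Quantize a predicted pose (location, scale, orientation) into a coarse bin."""
    return (
        math.floor(x / location_bin),
        math.floor(y / location_bin),
        math.floor(math.log2(scale)),                    # factor-of-2 scale bins
        math.floor((orientation_deg % 360.0) / 30.0),    # 30-degree orientation bins
    )

def cluster_votes(votes, min_votes=3, location_bin=64.0):
    """Group match ids by pose bin and keep bins with enough agreeing votes.

    votes: iterable of (match_id, (x, y, scale, orientation_deg)).
    """
    bins = defaultdict(list)
    for match_id, (x, y, scale, orientation_deg) in votes:
        bins[pose_bin(x, y, scale, orientation_deg, location_bin)].append(match_id)
    return {key: ids for key, ids in bins.items() if len(ids) >= min_votes}
```

Because the bins are broad, matches whose predicted poses differ slightly still land together, while an isolated outlier ends up alone in its bin and is dropped.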
Solution for Affine Parameters An affine transformation correctly accounts for 3D rotation of a planar surface under orthographic projection, but the approximation can be poor for 3D rotation of non-planar objects. Basically: we do not create a 3D representation of the object.
The affine transformation of a model point [x y]^T to an image point [u v]^T can be written as [u v]^T = [m1 m2; m3 m4] [x y]^T + [tx ty]^T. Outliers are discarded. New matches can be found by top-down matching.
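Writing the transformation out as u = m1·x + m2·y + tx and v = m3·x + m4·y + ty, the six parameters can be estimated from three or more matches by least squares. The sketch below is one way to do this with the standard library only: because u and v share the same design matrix with rows [x, y, 1], the problem splits into two 3-unknown systems solved through the normal equations. The function names are hypothetical, and a numerical library would normally be used instead of the hand-written solver.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_affine(model_pts, image_pts):
    """Least-squares affine fit; returns (m1, m2, m3, m4, tx, ty)."""
    ata = [[0.0] * 3 for _ in range(3)]   # A^T A for rows [x, y, 1]
    atu = [0.0] * 3                       # A^T u
    atv = [0.0] * 3                       # A^T v
    for (x, y), (u, v) in zip(model_pts, image_pts):
        row = (x, y, 1.0)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    m1, m2, tx = solve3(ata, atu)
    m3, m4, ty = solve3(ata, atv)
    return (m1, m2, m3, m4, tx, ty)
```

After fitting, matches whose predicted image location disagrees with the fitted transformation by more than a chosen tolerance would be treated as outliers and the fit repeated on the rest.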
Results
Conclusion Invariant to image rotation and scale and robust across a substantial range of affine distortion, addition of noise, and change in illumination. Realtime Lots of applications
Further Research Color 3D representation of world.