Distinctive Image Features from Scale-Invariant Keypoints David G. Lowe International Journal of Computer Vision(IJCV), 2004
Extracting distinctive invariant features Points are individually ambiguous More unique matches are possible with small regions of images http://www.csie.ntu.edu.tw/~cyy/courses/vfx/05spring/lectures/handouts/lec04_feature.pdf
Desired properties for features Invariant: invariant to scale, rotation, affine, illumination and noise for robust matching across a substantial range of affine distortion, viewpoint change and so on. Distinctive: a single feature can be correctly matched with high probability
Moravec corner detector (1980) We should easily recognize the point by looking through a small window Shifting a window in any direction should give a large change in intensity
Moravec corner detector flat
Moravec corner detector flat
Moravec corner detector flat edge
Moravec corner detector isolated point flat edge
Moravec corner detector Change of intensity for the shift [u,v]: Intensity Window function Shifted intensity Four shifts: (u,v) = (1,0), (1,1), (0,1), (-1, 1) Problem: responds too strong for edges because only minimum of E is taken into account
Harris corner detector [1992] Consider all small shifts by Taylor’s expansion W(x, y): Gaussian function => M: 2x2 Hessian matrix, 1, 2 – eigenvalues of M
Harris corner detector 1 2 Corner 1 and 2 are large, 1 ~ 2; E increases in all directions edge 1 >> 2 edge 2 >> 1 flat Measure of corner response: Classification of image points using eigenvalues of M:
Harris Detector: Problem non-invariant to image scale! All points will be classified as edges Corner !
Scale-invariant feature transform (SIFT) Scale-invariant feature transform (or SIFT) is an algorithm to detect and describe local features in images. Distinctive features Invariant to image scale, rotation and affine distortion Applied locally on key-points Based upon the image gradients in a local neighborhood
Scale-space extrema detection Keypoint localization SIFT stages: Scale-space extrema detection Keypoint localization Orientation assignment Keypoint descriptor detector descriptor local descriptor
1. Detection of scale-space extrema Convolution with a variable-scale Gaussian Difference-of-Gaussian (DoG) filter
Scale space doubles for the next octave 2 K=2(1/s), s+3 images for each octave k∙
DoG Efficient function to compute A close approximation to the scale-normalized Laplacian of Gaussian
2. Keypoint localization X is selected if it is larger or smaller than all 26 neighbors
Decide scale sampling frequency
Pre-smoothing =1.6, plus a double expansion
2. Accurate keypoint localization Reject points with low contrast and poorly localized along an edge If has offset larger than 0.5, sample point is changed. If is less than 0.03 (low contrast), it is discarded.
Eliminating edge responses Let Keep the points with r=10
3. Orientation assignment By assigning a consistent orientation, the keypoint descriptor can be orientation invariant. For a keypoint, L is the image with the closest scale 36-bin orientation histogram over 360° weighted by m Peak is the dominant orientation Local peak within 80% creates multiple orientations About 15% has multiple orientations
4. Local image descriptor Image gradients are sampled over 16x16 array of locations in scale space Create array of orientation histograms 8 orientations x 4x4 histogram array = 128 dimensions
Recognition examples