Distinctive Image Features from Scale-Invariant Keypoints By David G. Lowe, University of British Columbia Presented by: Tim Havinga, Joël van Neerbos and Robert van der Linden
Organization Introduction Keypoint extraction Applications
Introduction Matching images across affine transformation: Change in lighting and 3D viewpoint:
Introduction Motion tracking Object and scene recognition Stereo correspondence
Extracting features Extrema detection Keypoint localization Orientation assignment Local image descriptor
Extrema detection Blur copies of the image with progressively wider Gaussian filters.
Extrema detection Subtract adjacent blurred copies (Difference of Gaussians, DoG) to find local extrema.
Extrema detection Calculate the DoGs for different Gaussians. [Figure labels: 2 x; Blur]
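The blur-and-subtract step above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the implementation from the paper: the function names, the 3-sigma kernel truncation, the border clamping and the example sigma values are all assumptions made for the sketch.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel, truncated at about 3 sigma (an assumption)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(kernel[k + radius] * row[min(max(x + k, 0), width - 1)]
             for k in range(-radius, radius + 1))
         for x in range(width)]
        for row in image
    ]

def transpose(image):
    return [list(column) for column in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def difference_of_gaussians(image, sigmas):
    """Blur with each sigma, then subtract each blurred copy from the next wider one."""
    blurred = [gaussian_blur(image, s) for s in sigmas]
    return [
        [[b - a for a, b in zip(row_a, row_b)] for row_a, row_b in zip(narrow, wide)]
        for narrow, wide in zip(blurred, blurred[1:])
    ]
```

For n sigma values the function returns n - 1 DoG layers, each the same size as the input image.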
Keypoint localization Select keypoints that are higher or lower than all of their 26 neighbours (8 in the same DoG image and 9 in each of the two adjacent scales).
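A minimal sketch of the 26-neighbour test, assuming the DoG images are stored as a list of equally sized 2D lists indexed as `dog[scale][y][x]`. The function name and data layout are hypothetical, and the caller is assumed to keep away from the borders in all three dimensions.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or strictly below all 26 neighbours.

    The neighbours are the 8 surrounding pixels in the same layer plus the
    9 pixels in each of the layers directly above and below.
    """
    value = dog[s][y][x]
    neighbours = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return value > max(neighbours) or value < min(neighbours)
```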
Keypoint localization Reject all points where the contrast is too low.
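As a rough sketch, the contrast check is a threshold on the absolute DoG value at the candidate. The paper uses a threshold of 0.03 for pixel values scaled to [0, 1] and evaluates the DoG at the interpolated extremum; the version below skips the interpolation and uses the sampled value, which is a simplification. The function names and the `(x, y, dog_value)` tuple layout are assumptions.

```python
def passes_contrast_check(dog_value, threshold=0.03):
    """Keep a candidate only if its DoG response is strong enough."""
    return abs(dog_value) >= threshold

def filter_low_contrast(candidates, threshold=0.03):
    """candidates: list of (x, y, dog_value) tuples (hypothetical layout)."""
    return [c for c in candidates if passes_contrast_check(c[2], threshold)]
```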
Keypoint localization Reject all points that lie on an edge.
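The edge check in the paper compares the principal curvatures of the DoG at the keypoint through the 2x2 Hessian H: a point is kept only if Tr(H)^2 / Det(H) < (r + 1)^2 / r, with r = 10. A minimal sketch, assuming one DoG layer stored as a 2D list and finite-difference derivatives; the function name is hypothetical.

```python
def passes_edge_check(layer, y, x, r=10.0):
    """True if the point at (x, y) is not edge-like according to the Hessian ratio test."""
    centre = layer[y][x]
    # Second derivatives estimated from neighbouring pixel differences.
    dxx = layer[y][x + 1] + layer[y][x - 1] - 2.0 * centre
    dyy = layer[y + 1][x] + layer[y - 1][x] - 2.0 * centre
    dxy = (layer[y + 1][x + 1] - layer[y + 1][x - 1]
           - layer[y - 1][x + 1] + layer[y - 1][x - 1]) / 4.0
    trace = dxx + dyy
    determinant = dxx * dyy - dxy * dxy
    if determinant <= 0:
        # Curvatures have different signs or one is zero: reject.
        return False
    return trace * trace / determinant < (r + 1.0) ** 2 / r
```

An isolated peak has similar curvature in both directions and passes; a ridge has strong curvature in only one direction and is rejected.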
Effects of this elimination Extrema detection
Effects of this elimination Contrast check
Effects of this elimination Edge check
Extracting features Extrema detection Keypoint localization Orientation assignment Local image descriptor
Orientation assignment Assign an orientation to a keypoint to make its descriptor invariant to rotation
Orientation assignment The orientation of a keypoint is determined in four steps: 1. Determine sample points 2. Determine the gradient magnitude and orientation of each sample point 3. Create an orientation histogram of the sample points 4. Extract the dominant directions from the histogram
Step 1: Determine sample points The source image is the Gaussian-smoothed image whose scale is closest to the keypoint's scale. Use all pixels within a certain radius of the keypoint. [Figure labels: Actual scale; Used Gaussian]
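Step 2: Determine gradient magnitude and orientation of each sample point Gradient magnitude: $m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$ Gradient orientation: $\theta(x, y) = \tan^{-1}\big((L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y))\big)$, where $L$ is the Gaussian-smoothed image and the differences are taken between neighbouring pixels.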
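A minimal sketch of the per-pixel gradient computation from pixel differences, as defined in the paper. It assumes the smoothed image `L` is a 2D list indexed as `L[y][x]`; the function name and the choice to return the orientation in degrees in [0, 360) are assumptions.

```python
import math

def gradient(L, x, y):
    """Return (magnitude, orientation in degrees) at pixel (x, y) of the smoothed image L."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    magnitude = math.hypot(dx, dy)
    # atan2 keeps the full 360 degree range, unlike a plain arctangent of dy / dx.
    orientation = math.degrees(math.atan2(dy, dx)) % 360.0
    return magnitude, orientation
```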
Step 3: Create an orientation histogram The histogram has 36 bins, each covering 10 degrees Each sample is weighted by its gradient magnitude and by a Gaussian-weighted circular window
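The histogram construction can be sketched as follows. The sample layout `(dx, dy, magnitude, orientation_degrees)`, with offsets relative to the keypoint, and the function name are assumptions; `sigma` is the width of the Gaussian window, which the paper sets to 1.5 times the keypoint scale.

```python
import math

def orientation_histogram(samples, sigma, num_bins=36):
    """Accumulate weighted gradient orientations into num_bins bins over 360 degrees."""
    histogram = [0.0] * num_bins
    bin_width = 360.0 / num_bins
    for dx, dy, magnitude, orientation in samples:
        # Gaussian window: samples far from the keypoint contribute less.
        window = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
        index = int((orientation % 360.0) // bin_width) % num_bins
        histogram[index] += magnitude * window
    return histogram
```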
Step 4: Extract dominant directions Take the peak(s) from the orientation histogram Use all peaks greater than 80% of the highest peak Every direction gets its own keypoint
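A minimal sketch of the peak selection, assuming the histogram is a list of bin weights covering 360 degrees. It returns the centre of each qualifying bin; the paper additionally fits a parabola through the three values around each peak to interpolate its position, which is omitted here. The function name is hypothetical.

```python
def dominant_orientations(histogram, ratio=0.8):
    """Return bin-centre angles (degrees) of local peaks within `ratio` of the highest peak."""
    n = len(histogram)
    highest = max(histogram)
    if highest <= 0:
        return []
    bin_width = 360.0 / n
    peaks = []
    for i, value in enumerate(histogram):
        # The histogram is circular, so the first and last bins are neighbours.
        left = histogram[(i - 1) % n]
        right = histogram[(i + 1) % n]
        if value > left and value > right and value >= ratio * highest:
            peaks.append((i + 0.5) * bin_width)
    return peaks
```

Each returned angle would produce its own keypoint at the same location and scale.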
The Local image descriptor Every keypoint now has a location, scale and orientation, from which a repeatable 2D grid can be determined We want distinctive descriptor vectors, partially invariant to illumination and viewpoint changes
Computing the Local image descriptor Take the 16 x 16 sample array around the keypoint Compute 4 x 4 orientation histograms from this array Use 8 bins per histogram: 4x4x8=128 features
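The 4x4x8 layout can be sketched as below. This is a simplified illustration, not the full method from the paper: it assumes the 16 x 16 gradient magnitudes and orientations are already sampled relative to the keypoint's scale and orientation, and it leaves out the Gaussian weighting and the interpolation between neighbouring bins. The function name and argument layout are assumptions.

```python
def build_descriptor(magnitudes, orientations):
    """Build a 128-element vector from 16x16 gradient samples.

    magnitudes, orientations: 16x16 lists; orientations in degrees,
    measured relative to the keypoint orientation.
    """
    descriptor = [0.0] * (4 * 4 * 8)
    for row in range(16):
        for col in range(16):
            # Each 4x4 block of samples feeds one of the 16 histograms.
            cell = (row // 4) * 4 + (col // 4)
            # 8 orientation bins of 45 degrees each.
            bin_index = int((orientations[row][col] % 360.0) // 45.0) % 8
            descriptor[cell * 8 + bin_index] += magnitudes[row][col]
    return descriptor
```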
Local image descriptor optimizations Normalize the obtained feature vector to enhance invariance to illumination changes Reduce the influence of large gradient magnitudes by capping the normalized features to 0.2 Normalize again
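The three steps above (normalize, cap at 0.2, normalize again) can be sketched as follows; the function names are hypothetical.

```python
import math

def normalize(vector):
    """Scale the vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def finalize_descriptor(vector, cap=0.2):
    """Normalize, cap large entries, and normalize again."""
    unit = normalize(vector)
    # Capping limits the influence of a few very large gradient magnitudes.
    capped = [min(x, cap) for x in unit]
    return normalize(capped)
```

Because of the first normalization, multiplying all gradient magnitudes by a constant (a change in image contrast) leaves the result unchanged.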
Possible applications for SIFT We have a feature extraction method which yields useful keypoints, what's next? Some applications: Object recognition in images Panorama stitching 3D scene modelling 3D human action tracking (for example for security surveillance) Robot localisation and mapping
Panorama stitching
Brown, ICCV 2003 Panorama stitching
(from Sudderth et al., 2006) 3D modelling
Application: SIFT for object recognition We can apply SIFT to recognize objects in images. Say we have an image that contains an object. How do we recognize it? Key idea: compare keypoints; if these are similar, it is likely that it is the same object. First problem: many features arise from background clutter. How to remove these? Possible approaches: - Look for clusters of matching features - Compare the distance of the closest match to the distance of the second-closest match
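The second approach, comparing the closest match with the second-closest, can be sketched as a ratio test. The paper rejects matches whose distance ratio exceeds 0.8; the function names and the plain-list descriptor representation below are assumptions, and the search is a simple exhaustive one.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_with_ratio_test(query, database, max_ratio=0.8):
    """Return the index of the best match in `database`, or None if it is ambiguous.

    A match is accepted only if the closest descriptor is clearly closer
    than the second-closest one.
    """
    if len(database) < 2:
        return None
    distances = sorted((euclidean(query, d), i) for i, d in enumerate(database))
    (best, best_index), (second, _) = distances[0], distances[1]
    if best < max_ratio * second:
        return best_index
    return None
```

Features from background clutter tend to have several similarly distant candidates, so they fail the test and are discarded.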
Efficiently locating the nearest neighbour Each keypoint has a 128-dimensional feature vector: in such high-dimensional spaces no known algorithm finds the exact nearest neighbour more efficiently than exhaustive search. But: as few as 3 features are enough to locate an object, for example when it is partly occluded. The Hough transform is used to find clusters of keypoints that 'vote' for the same pose of an object, described by location, orientation and scale.
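A rough sketch of the voting idea, assuming each feature match has already been converted into a predicted object pose `(x, y, scale_ratio, rotation_degrees)`. The bin sizes for orientation (30 degrees) and scale (a factor of 2) follow the paper; the fixed `location_bin` in pixels is a simplifying assumption (the paper ties it to the size of the training image), and voting into neighbouring bins is omitted. The function name is hypothetical.

```python
import math
from collections import defaultdict

def hough_pose_clusters(matches, location_bin=64.0, min_votes=3):
    """Group matches that vote for the same coarse pose bin.

    matches: list of (x, y, scale_ratio, rotation_degrees) pose predictions.
    Returns lists of match indices, one list per bin with at least min_votes votes.
    """
    votes = defaultdict(list)
    for index, (x, y, scale_ratio, rotation) in enumerate(matches):
        key = (
            math.floor(x / location_bin),
            math.floor(y / location_bin),
            math.floor(math.log2(scale_ratio)),           # bins a factor of 2 wide
            math.floor((rotation % 360.0) / 30.0) % 12,   # bins 30 degrees wide
        )
        votes[key].append(index)
    return [members for members in votes.values() if len(members) >= min_votes]
```

With `min_votes=3`, a bin needs only three consistent matches to propose an object pose.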
Application: robot vision, localization and mapping Se, S., Lowe, D. G., Little, J., Vision-based Mobile Robot Localization And Mapping using Scale-Invariant Features, 2001 Application of SIFT to mobile robotics SIFT features combined with Simultaneous Localization And Map Building (SLAMB) Recognizing landmarks: estimation of the 10 m by 10 m lab, 3000 features collected Preliminary results: quite good
Conclusions from the paper The keypoints SIFT extracts are invariant to image rotation and scale, and robust to affine distortion, noise and changes in illumination. SIFT can be optimized to run in real time. The proposed approach (SIFT combined with the Hough transform for object recognition) has been shown to work reliably.
Discussion Is the SIFT method for keypoint extraction the best way to get distinctive features from images? Is SIFT biologically plausible? Is it important to have biologically inspired methods in object recognition / localization?
References Main article: Distinctive Image Features from Scale-Invariant Keypoints, D. G. Lowe, International Journal of Computer Vision 60. Other articles: Depth from Familiar Objects: A Hierarchical Model for 3D Scenes, Sudderth et al., Proceedings of the 2006 IEEE Conference on Computer Vision and Pattern Recognition, volume II. Vision-based Mobile Robot Localization And Mapping using Scale-Invariant Features, Se, S., Lowe, D. G., Little, J., 2001