NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.


Object Recognition
- Definition: Identify an object and determine its pose and model parameters
- Commercial object recognition
  - Currently a $4 billion/year industry for inspection and assembly
  - Almost entirely based on template matching
- Upcoming applications
  - Mobile robots, toys, user interfaces
  - Location recognition
  - Digital camera panoramas, 3D scene modeling

Invariant Local Features
- Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters (SIFT features)

Advantages of invariant local features
- Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
- Distinctiveness: individual features can be matched to a large database of objects
- Quantity: many features can be generated for even small objects
- Efficiency: close to real-time performance
- Extensibility: can easily be extended to a wide range of differing feature types, with each adding robustness

Zhang, Deriche, Faugeras, Luong (1995)
- Apply Harris corner detector
- Match points by correlating only at corner points
- Derive epipolar alignment using robust least-squares

Cordelia Schmid & Roger Mohr (1997)
- Apply Harris corner detector
- Use rotational invariants at corner points
  - However, not scale invariant; sensitive to viewpoint and illumination change

Scale invariance
Requires a method to repeatably select points in location and scale:
- The only reasonable scale-space kernel is a Gaussian (Koenderink, 1984; Lindeberg, 1994)
- An efficient choice is to detect peaks in the difference-of-Gaussian pyramid (Burt & Adelson, 1983; Crowley & Parker, 1984 – but examining more scales)
- Difference-of-Gaussian with a constant ratio of scales is a close approximation to Lindeberg's scale-normalized Laplacian (can be shown from the heat diffusion equation)
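The constant-ratio difference-of-Gaussian pyramid described above can be sketched in a few lines. The base smoothing `sigma0 = 1.6` and 3 scales per octave are the values commonly used with SIFT; the helper name and shapes are illustrative, not from the slides:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_octave(image, sigma0=1.6, scales_per_octave=3):
    """Build one octave of a difference-of-Gaussian pyramid.

    Gaussians are spaced by a constant ratio k = 2**(1/s), so adjacent
    differences approximate the scale-normalized Laplacian.
    """
    k = 2 ** (1.0 / scales_per_octave)
    # s + 3 blurred images give s + 2 differences, enough to search for
    # extrema at s scales per octave (each extremum needs neighbors
    # above and below in scale).
    blurred = [gaussian_filter(image, sigma0 * k**i)
               for i in range(scales_per_octave + 3)]
    return [b - a for a, b in zip(blurred, blurred[1:])]

octave = dog_octave(np.random.rand(64, 64))
```

After each octave, the image would be downsampled by 2 and the process repeated, which is what "processed one octave at a time" refers to.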

Scale space processed one octave at a time

Keypoint localization
- Detect maxima and minima of the difference-of-Gaussian in scale space
- Fit a quadratic to surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002)
- Taylor expansion around the sample point: D(x) = D + (∂D/∂x)ᵀx + ½ xᵀ(∂²D/∂x²)x
- Offset of the extremum (use finite differences for derivatives): x̂ = −(∂²D/∂x²)⁻¹(∂D/∂x)
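The interpolation step above reduces to solving a small linear system. A minimal sketch (function name mine), taking the 3x3x3 DoG neighborhood around a candidate extremum:

```python
import numpy as np

def subpixel_offset(cube):
    """Offset of the extremum within a 3x3x3 DoG neighborhood.

    cube is indexed [scale, y, x] with the candidate extremum at the
    centre (1,1,1). Gradient and Hessian use central finite
    differences; the offset solves H x = -g (Brown & Lowe, 2002).
    """
    c = cube
    g = 0.5 * np.array([c[2,1,1] - c[0,1,1],
                        c[1,2,1] - c[1,0,1],
                        c[1,1,2] - c[1,1,0]])
    H = np.empty((3, 3))
    H[0,0] = c[2,1,1] - 2*c[1,1,1] + c[0,1,1]
    H[1,1] = c[1,2,1] - 2*c[1,1,1] + c[1,0,1]
    H[2,2] = c[1,1,2] - 2*c[1,1,1] + c[1,1,0]
    H[0,1] = H[1,0] = 0.25*(c[2,2,1] - c[2,0,1] - c[0,2,1] + c[0,0,1])
    H[0,2] = H[2,0] = 0.25*(c[2,1,2] - c[2,1,0] - c[0,1,2] + c[0,1,0])
    H[1,2] = H[2,1] = 0.25*(c[1,2,2] - c[1,2,0] - c[1,0,2] + c[1,0,0])
    return np.linalg.solve(H, -g)
```

In a full implementation, an offset larger than 0.5 in any dimension means the extremum lies closer to a neighboring sample, and the fit is repeated there.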

Sampling frequency for scale
- More points are found as sampling frequency increases, but accuracy of matching decreases after 3 scales/octave

Eliminating unstable keypoints
- Discard points with DoG value below a threshold (low contrast)
- However, points along edges may have high contrast in one direction but low in another
- Compute principal curvatures from the eigenvalues of the 2x2 Hessian matrix H, and limit their ratio (Harris approach): Tr(H)²/Det(H) < (r+1)²/r
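The edge test only needs the trace and determinant of the 2x2 Hessian, so the eigenvalues never have to be computed explicitly. A sketch (function name mine; r = 10 is the threshold used in Lowe's paper):

```python
import numpy as np

def passes_edge_test(dog, y, x, r=10.0):
    """Reject keypoints on edges by limiting the ratio of principal
    curvatures, computed from the 2x2 Hessian of the DoG image."""
    d = dog
    dxx = d[y, x+1] - 2*d[y, x] + d[y, x-1]
    dyy = d[y+1, x] - 2*d[y, x] + d[y-1, x]
    dxy = 0.25 * (d[y+1, x+1] - d[y+1, x-1] - d[y-1, x+1] + d[y-1, x-1])
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    # Non-positive determinant means curvatures of opposite sign
    # (or a flat direction): reject.
    if det <= 0:
        return False
    return tr * tr / det < (r + 1) ** 2 / r
```

A blob-like extremum (similar curvature in both directions) passes, while a point on an edge (one large curvature, one near zero) fails.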

Select canonical orientation
- Create a histogram of local gradient directions computed at the selected scale
- Assign canonical orientation at the peak of the smoothed histogram
- Each key specifies stable 2D coordinates (x, y, scale, orientation)
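A minimal version of the orientation histogram, using the 36 bins of 10° that SIFT implementations typically use; histogram smoothing and the handling of multiple strong peaks are omitted, and the function name is my own:

```python
import numpy as np

def dominant_orientation(patch, nbins=36):
    """Canonical orientation from a magnitude-weighted histogram of
    local gradient directions; returns the peak angle in radians."""
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    bins = (ang / (2 * np.pi) * nbins).astype(int) % nbins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=nbins)
    # Return the centre of the winning bin.
    return (np.argmax(hist) + 0.5) * 2 * np.pi / nbins
```

All later gradient measurements for the descriptor are made relative to this orientation, which is what makes the feature rotation invariant.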

Example of keypoint detection
Threshold on value at DoG peak and on ratio of principal curvatures (Harris approach):
(a) 233x189 image
(b) 832 DoG extrema
(c) 729 left after peak value threshold
(d) 536 left after testing ratio of principal curvatures

Creating features stable to viewpoint change
- Edelman, Intrator & Poggio (1997) showed that complex cell outputs are better for 3D recognition than simple correlation

Stability to viewpoint change
- Classification of rotated 3D models (Edelman, 1997): complex cells 94% vs. simple cells 35%

SIFT vector formation
- Thresholded image gradients are sampled over a 16x16 array of locations in scale space
- Create an array of orientation histograms
- 8 orientations x 4x4 histogram array = 128 dimensions

Feature stability to noise
- Match features after a random change in image scale & orientation, with differing levels of image noise
- Find the nearest neighbor in a database of 30,000 features

Feature stability to affine change
- Match features after a random change in image scale & orientation, with 2% image noise and affine distortion
- Find the nearest neighbor in a database of 30,000 features

Distinctiveness of features
- Vary the size of the database of features, with a 30 degree affine change and 2% image noise
- Measure % correct for single nearest-neighbor match

Nearest-neighbor matching to feature database
- Hypotheses are generated by matching each feature to its nearest-neighbor vectors in the database
- No fast method exists for always finding the exact nearest neighbor of a 128-element vector in a large database
- Therefore, use approximate nearest-neighbor search:
  - We use the best-bin-first (Beis & Lowe, 1997) modification to the k-d tree algorithm
  - Use a heap data structure to identify bins in order by their distance from the query point
- Result: can give a speedup by a factor of 1000 while finding the nearest neighbor (of interest) 95% of the time
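The same speed/accuracy trade-off is easy to reproduce with an off-the-shelf k-d tree. This is not Beis & Lowe's best-bin-first algorithm, but scipy's `cKDTree` exposes an `eps` parameter that similarly trades exactness for pruning; the descriptor data here is a random stand-in:

```python
import numpy as np
from scipy.spatial import cKDTree

# Toy stand-in for a database of 128-D SIFT descriptors.
rng = np.random.default_rng(0)
database = rng.standard_normal((10_000, 128))
tree = cKDTree(database)

# A query near a known database entry.
query = database[42] + 0.01 * rng.standard_normal(128)

# eps > 0 permits an approximate answer: the returned neighbor's
# distance is within (1 + eps) of the true nearest distance, and the
# search prunes far more of the tree in exchange.
dist, idx = tree.query(query, k=2, eps=0.5)
```

Requesting k=2 neighbors also provides exactly what the distance-ratio test on a later slide needs.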

Detecting 0.1% inliers among 99.9% outliers
- Need to recognize clusters of just 3 consistent features among 3000 feature match hypotheses
- LMS or RANSAC would be hopeless!
- Generalized Hough transform
  - Vote for each potential match according to model ID and pose
  - Insert into multiple bins to allow for error in the similarity approximation
  - Using a hash table instead of an array avoids the need to form empty bins or predict array size
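The hash-table voting scheme can be sketched with a plain dict keyed by quantized pose. This simplified version casts one vote per match (the slides' multi-bin insertion for quantization error is omitted), and the bin sizes are illustrative values, not Lowe's exact ones:

```python
from collections import defaultdict

def hough_vote(matches, loc_bin=0.25, ori_bin=30.0, scale_bin=2.0):
    """Cluster feature matches by voting in a pose hash table.

    Each match carries a model id plus a predicted location (as a
    fraction of model size), orientation (degrees) and log2 scale.
    Keying a dict by the quantized pose avoids allocating empty bins.
    """
    table = defaultdict(list)
    for m in matches:
        key = (m["model"],
               int(m["x"] / loc_bin), int(m["y"] / loc_bin),
               int(m["ori"] / ori_bin), int(m["scale"] / scale_bin))
        table[key].append(m)
    # Keep only clusters of at least 3 consistent features.
    return {k: v for k, v in table.items() if len(v) >= 3}
```

Isolated wrong matches scatter across many bins and never reach the 3-vote threshold, which is how the tiny inlier fraction survives.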

Probability of correct match
- Compare the distance of the nearest neighbor to that of the second nearest neighbor (from a different object)
- A threshold of 0.8 provides excellent separation
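The distance-ratio test itself is a one-liner once both neighbors are known. A brute-force sketch (function name mine; a real system would get the two distances from the approximate k-d tree search instead):

```python
import numpy as np

def ratio_test_match(desc, database, threshold=0.8):
    """Accept a match only when the nearest neighbor is clearly closer
    than the second nearest (Lowe's 0.8 distance-ratio threshold).
    Returns the database index, or None for an ambiguous match."""
    d = np.linalg.norm(database - desc, axis=1)
    nearest, second = np.argsort(d)[:2]
    if d[nearest] < threshold * d[second]:
        return int(nearest)
    return None
```

The intuition: a correct match should be much closer than any incorrect one, while an incorrect match has many roughly equidistant near-misses.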

Model verification
1) Examine all clusters in the Hough transform with at least 3 features
2) Perform a least-squares affine fit to the model
3) Discard outliers and perform a top-down check for additional features
4) Evaluate the probability that the match is correct
   a) Use a Bayesian model, with the probability that the features would arise by chance if the object were not present
   b) Takes account of object size in image, textured regions, model feature count in database, accuracy of fit (Lowe, CVPR 2001)

Solution for affine parameters
- Affine transform of [x, y] to [u, v]:
  [u; v] = [m1 m2; m3 m4][x; y] + [tx; ty]
- Rewrite to solve for the transform parameters: each correspondence contributes two linear equations in the six unknowns (m1, m2, m3, m4, tx, ty), giving an overdetermined least-squares system
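The least-squares solution can be written directly from the equations above; each model/image correspondence adds the rows [x y 0 0 1 0] and [0 0 x y 0 1] to the design matrix (function name mine):

```python
import numpy as np

def fit_affine(model_pts, image_pts):
    """Least-squares affine fit mapping model (x, y) to image (u, v).

    Stacks two rows per correspondence so the six unknowns
    (m1, m2, m3, m4, tx, ty) appear linearly; three or more
    non-collinear points determine them.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        rows.append([x, y, 0, 0, 1, 0]); rhs.append(u)  # u = m1*x + m2*y + tx
        rows.append([0, 0, x, y, 0, 1]); rhs.append(v)  # v = m3*x + m4*y + ty
    p, *_ = np.linalg.lstsq(np.array(rows, float), np.array(rhs, float),
                            rcond=None)
    return p
```

With more than three correspondences the extra rows average out localization noise, which is why additional keys on later slides "provide robustness".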

Planar texture models
- Models for planar surfaces with SIFT keys

Planar recognition
- Planar surfaces can be reliably recognized at a rotation of 60° away from the camera
- Affine fit approximates perspective projection
- Only 3 points are needed for recognition

3D Object Recognition
- Extract outlines with background subtraction

3D Object Recognition
- Only 3 keys are needed for recognition, so extra keys provide robustness
- Affine model is no longer as accurate

Recognition under occlusion

Test of illumination invariance
- Same image under differing illumination: 273 keys verified in the final match

Examples of view interpolation

Recognition using View Interpolation

Location recognition

Robot Localization (joint work with Stephen Se, Jim Little)

Map continuously built over time

Locations of map features in 3D

Recognizing Panoramas Matthew Brown and David Lowe (ICCV 2003) n Recognize overlap from an unordered set of images and automatically stitch together n SIFT features provide initial feature matching n Image blending at multiple scales hides the seams Panorama of our lab automatically assembled from 143 images

Why Panoramas?
- Are you getting the whole picture?
  - Compact camera FOV = 50 x 35°
  - Human FOV = 200 x 135°
  - Panoramic mosaic = 360 x 180°

RANSAC for Homography
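The RANSAC step on this slide can be sketched end to end: repeatedly fit a homography to minimal 4-point samples of the SIFT matches and keep the model with the most inliers. This is the standard RANSAC loop with a direct-linear-transform fit, not Brown & Lowe's exact implementation; function names and thresholds are illustrative:

```python
import numpy as np

def dlt_homography(src, dst):
    """Direct linear transform: homography from >= 4 correspondences,
    taken as the null vector of the stacked constraint matrix."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    return np.linalg.svd(np.array(A, float))[2][-1].reshape(3, 3)

def ransac_homography(src, dst, iters=200, tol=2.0, seed=0):
    """Robust homography fit: sample minimal sets, score each model by
    reprojection error, keep the one with the most inliers."""
    rng = np.random.default_rng(seed)
    best_H, best_inliers = None, 0
    src_h = np.hstack([src, np.ones((len(src), 1))])
    for _ in range(iters):
        pick = rng.choice(len(src), 4, replace=False)
        H = dlt_homography(src[pick], dst[pick])
        proj = src_h @ H.T
        proj = proj[:, :2] / proj[:, 2:]
        err = np.linalg.norm(proj - dst, axis=1)
        inliers = int((err < tol).sum())
        if inliers > best_inliers:
            best_H, best_inliers = H, inliers
    return best_H / best_H[2, 2], best_inliers
```

Because a single minimal sample free of outliers recovers the model, RANSAC tolerates the large outlier fractions that SIFT matching between overlapping photos produces.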

Finding the panoramas

Bundle Adjustment
- New images initialised with the rotation and focal length of the best matching image

Multi-band Blending (Burt & Adelson, 1983)
- Blend frequency bands over a range of scales

2-band Blending
- Low frequency (> 2 pixels)
- High frequency (< 2 pixels)
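A two-band blend is simple to sketch: split each image into low and high frequencies, blend the low band with a smooth transition, and switch the high band sharply at the seam. Function name and the sigma value are illustrative stand-ins for the 2-pixel split above:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def two_band_blend(a, b, mask, sigma=2.0):
    """2-band blend of images a and b.

    mask is 1 where image a should dominate. Low frequencies are mixed
    with a smooth (blurred-mask) transition; high frequencies switch
    sharply at the seam so detail is not doubled or washed out.
    """
    soft = gaussian_filter(mask.astype(float), sigma)
    low_a, low_b = gaussian_filter(a, sigma), gaussian_filter(b, sigma)
    high_a, high_b = a - low_a, b - low_b
    hard = (soft > 0.5).astype(float)
    low = soft * low_a + (1 - soft) * low_b
    high = hard * high_a + (1 - hard) * high_b
    return low + high

a, b = np.ones((16, 16)), np.zeros((16, 16))
mask = np.zeros((16, 16)); mask[:, :8] = 1.0
out = two_band_blend(a, b, mask)
```

Linear blending uses one soft transition for all frequencies, which blurs high-frequency detail across the seam; the comparison slides below show the difference.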

Linear Blending

2-band Blending

Sony Aibo SIFT usage:
- Recognize charging station
- Communicate with visual cards
- Teach object recognition

Object recognition in primates
- Object recognition is mediated by neurons in IT cortex that respond to features of intermediate complexity (Tanaka 93, 95; Booth & Rolls, 98)
- These neurons are largely invariant to location, scale, and contrast of features, but only to limited rotation
- Complexity is roughly at the same level as SIFT keys, but some neurons respond to complex objects (e.g., faces)

Object recognition in primates
- Neurons are organized in columns of related features, presumably to aid fine discrimination
- Room for about 2000 columns (but many more if smaller)

Comparison to template matching
- Cost of template matching
  - 250,000 locations x 30 orientations x 8 scales = 60,000,000 evaluations
  - Does not easily handle partial occlusion and other variation without a large increase in template numbers
  - Viola & Jones cascade must start again for each qualitatively different template
- Cost of local feature approach
  - 3000 evaluations (a reduction by a factor of 20,000)
  - Features are more invariant to illumination, 3D rotation, and object variation
  - Use of many small subtemplates increases robustness to partial occlusion and other variations

Future directions
- Build true 3D models
  - Integrate features from a large number of training views and perform continuous learning
- Feature classes can be greatly expanded
  - Affine-invariant features (Tuytelaars & Van Gool, Mikolajczyk & Schmid, Schaffalitzky & Zisserman, Brown & Lowe)
  - Incorporate color, texture, varying feature sizes
  - Include edge features that separate figure from ground
- Address instance recognition of generic models
  - Map feature probabilities to measurements of interest (e.g., specific person, expression, age)