CS 1699: Intro to Computer Vision Local Image Features: Extraction and Description Prof. Adriana Kovashka University of Pittsburgh September 17, 2015
Announcement What: CS Welcome Party When: Friday, September 18, 2015 at 11:00AM Where: SENSQ 5317 Why (1): To learn about the CS Department, the CS Major and other interesting / useful things Why (2): To meet other students and faculty Why (3): To have some pizza!
Plan for today Feature extraction / keypoint detection Feature description Homework due on Tuesday Office hours moved from today to tomorrow, 1pm-2pm
An image is a set of pixels… What we see vs. what a computer sees. Adapted from S. Narasimhan
Problems with pixel representation Not invariant to small changes – Translation – Illumination – etc. Some parts of an image are more important than others What do we want to represent?
Interest points Note: “interest points” = “keypoints”, also sometimes called “features” Many applications – tracking: which points are good to track? – recognition: find patches likely to tell us something about object category – 3D reconstruction: find correspondences across different views D. Hoiem
Human eye movements Yarbus eye tracking D. Hoiem
Interest points Suppose you have to click on some point, go away and come back after I deform the image, and click on the same points again. – Which points would you choose? original deformed D. Hoiem
Choosing distinctive interest points If you wanted to meet a friend would you say a)“Let’s meet on campus.” b)“Let’s meet on Green street.” c)“Let’s meet at Green and Wright.” – Corner detection Or if you were in a secluded area: a)“Let’s meet in the Plains of Akbar.” b)“Let’s meet on the side of Mt. Doom.” c)“Let’s meet on top of Mt. Doom.” – Blob (valley/peak) detection D. Hoiem
Choosing interest points Where would you tell your friend to meet you? D. Hoiem
What points would you choose? K. Grauman
Overview of Keypoint Matching 1. Find a set of distinctive keypoints. 2. Define a region around each keypoint. 3. Extract and normalize the region content. 4. Compute a local descriptor from the normalized region. 5. Match local descriptors between the query image and the database. K. Grauman, B. Leibe
Goals for Keypoints Detect points that are repeatable and distinctive Adapted from D. Hoiem
Key trade-offs Detection: more repeatable (precise localization, robust to expected variations) vs. more points (robust to occlusion). Description: more distinctive (minimize wrong matches) vs. more flexible (robust to expected variations, maximize correct matches). Adapted from D. Hoiem
Why extract features? Motivation: panorama stitching. We have two images – how do we combine them? Step 1: extract features. Step 2: match features. Step 3: align images. L. Lazebnik
Goal: interest operator repeatability We want to detect (at least some of) the same points in both images. Otherwise there is no chance to find true matches, yet we must be able to run the detection procedure independently on each image. K. Grauman
Goal: descriptor distinctiveness We want to be able to reliably determine which point goes with which. The descriptor must provide some invariance to geometric and photometric differences between the two views. K. Grauman
Corners as distinctive interest points We should easily recognize the point by looking through a small window: shifting the window in any direction should give a large change in intensity. "Flat" region: no change in any direction. "Edge": no change along the edge direction. "Corner": significant change in all directions. Adapted from A. Efros, D. Frolova, D. Simakov
Harris Detector: Mathematics Window-averaged squared change of intensity induced by shifting the image data by $[u,v]$: $E(u,v) = \sum_{x,y} w(x,y)\,[\,I(x+u,\,y+v) - I(x,y)\,]^2$, where $I(x,y)$ is the intensity, $I(x+u,\,y+v)$ the shifted intensity, and the window function $w(x,y)$ is either 1 inside the window and 0 outside, or a Gaussian. D. Frolova, D. Simakov
Harris Detector: Mathematics Expanding $I(x,y)$ in a Taylor series, we have, for small shifts $[u,v]$, a quadratic approximation to the error surface between a patch and itself, shifted by $[u,v]$: $E(u,v) \approx [u\ v]\, M \,[u\ v]^\top$, where $M$ is a 2×2 matrix computed from image derivatives: $M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$. D. Frolova, D. Simakov
Harris Detector: Mathematics Notation: $I_x = \partial I / \partial x$, $I_y = \partial I / \partial y$; $I_x^2$, $I_y^2$, and $I_x I_y$ denote the corresponding products of derivatives. K. Grauman
Harris Detector [Harris88] 1. Compute image derivatives $I_x$, $I_y$. 2. Square of derivatives: $I_x^2$, $I_y^2$, $I_x I_y$. 3. Gaussian filter: $g(I_x^2)$, $g(I_y^2)$, $g(I_x I_y)$ – these form the second moment matrix $M$. 4. Cornerness function – large when both eigenvalues are strong: $\mathit{har} = \det[M] - k\,(\mathrm{trace}[M])^2 = g(I_x^2)\,g(I_y^2) - [g(I_x I_y)]^2 - k\,[g(I_x^2) + g(I_y^2)]^2$. 5. Non-maxima suppression (optionally, blur the response first). D. Hoiem
What does this matrix reveal? First, consider an axis-aligned corner, where the dominant gradient directions align with the x or y axis; then $M = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$. Look for locations where both $\lambda$'s are large; if either $\lambda$ is close to 0, then this is not corner-like. What if we have a corner that is not aligned with the image axes? K. Grauman
What does this matrix reveal? Since $M$ is symmetric, we have $M = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R$. The eigenvalues of $M$ reveal the amount of intensity change in the two principal orthogonal gradient directions in the window. K. Grauman
Corner response function "Flat" region: $\lambda_1$ and $\lambda_2$ are small. "Edge": $\lambda_1 \gg \lambda_2$ or $\lambda_2 \gg \lambda_1$. "Corner": $\lambda_1$ and $\lambda_2$ are large, $\lambda_1 \sim \lambda_2$. Adapted from A. Efros, D. Frolova, D. Simakov, K. Grauman
Harris Detector: Mathematics Measure of corner response: $R = \det M - k\,(\mathrm{trace}\,M)^2 = \lambda_1 \lambda_2 - k\,(\lambda_1 + \lambda_2)^2$ ($k$ – empirical constant, $k \approx 0.04$–$0.06$). D. Frolova, D. Simakov
Harris Detector: Summary 1. Compute image gradients $I_x$ and $I_y$ for all pixels. 2. For each pixel, compute the second moment matrix $M$ by summing over neighbors $(x, y)$ in the window, then compute the corner response $R = \det M - k\,(\mathrm{trace}\,M)^2$. 3. Find points with large corner response ($R$ > threshold). 4. Take the points of locally maximum $R$ as the detected feature points (i.e., pixels where $R$ is larger than at all 4 or 8 neighbors). D. Frolova, D. Simakov
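For concreteness, here is a minimal Python sketch of the summary above. The Sobel gradients, Gaussian window width sigma, constant k, and threshold are illustrative assumptions, not values prescribed in the slides:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, sobel

def harris_corners(image, sigma=1.0, k=0.05, thresh=0.01):
    """Minimal Harris sketch: gradients -> second moment matrix -> R -> NMS."""
    img = image.astype(float)
    # 1. Image gradients for all pixels
    Ix = sobel(img, axis=1)
    Iy = sobel(img, axis=0)
    # 2.-3. Squared derivatives, smoothed by a Gaussian window w(x, y);
    # these are the entries of the second moment matrix M at each pixel.
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    # 4. Corner response R = det(M) - k * trace(M)^2
    R = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
    # 5. Threshold, then keep only local maxima of R (8-neighborhood)
    corners = (R > thresh * R.max()) & (R == maximum_filter(R, size=3))
    ys, xs = np.nonzero(corners)
    return xs, ys
```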
Example of Harris application. K. Grauman
Example of Harris application: compute corner response at every pixel. K. Grauman
Harris Detector – Responses [ Harris88 ] Effect: A very precise corner detector. D. Hoiem
Affine intensity change Only derivatives are used ⇒ invariance to intensity shift $I \to I + b$. Intensity scaling $I \to aI$ scales the response $R$, so detections near the threshold can change. Overall: partially invariant to affine intensity change $I \to aI + b$. L. Lazebnik
Image translation Derivatives and window function are shift-invariant Corner location is covariant w.r.t. translation L. Lazebnik
Image rotation Second moment ellipse rotates but its shape (i.e. eigenvalues) remains the same Corner location is covariant w.r.t. rotation L. Lazebnik
Scaling Invariant to image scale? (original image vs. zoomed image) A. Torralba
Scaling When the image is zoomed, all points along the corner will be classified as edges. Corner location is not covariant to scaling! L. Lazebnik
Scale Invariant Detection The problem: how do we choose corresponding circles independently in each image? Do objects in the image have a characteristic scale that we can identify? D. Frolova, D. Simakov
Scale Invariant Detection Solution: – Design a function on the region (circle) which is "scale invariant" (the same for corresponding regions, even if they are at different scales). – Take a local maximum of this function. (Plots: f vs. region size for Image 1 and Image 2; the maxima occur at region sizes s1 and s2, here related by scale = 1/2.) A. Torralba
Scale Invariant Detection A "good" function for scale detection has one stable, sharp peak. (Plots: f vs. region size – flat or multi-peaked responses are bad; a single sharp peak is good.) For typical images, a good function is one that responds to contrast (sharp local intensity change). A. Torralba
Automatic Scale Selection How do we find corresponding patch sizes? K. Grauman, B. Leibe
Automatic Scale Selection Function responses for increasing scale (scale signature). K. Grauman, B. Leibe
Recall: Edge detection Convolve the signal f with the derivative of a Gaussian; the edge corresponds to the maximum of the derivative response. S. Seitz
Recall: Edge detection Convolve f with the second derivative of a Gaussian; the edge corresponds to the zero crossing of the second derivative. S. Seitz
From edges to blobs Edge = ripple. Blob = superposition of two ripples. Spatial selection: the magnitude of the Laplacian response will achieve a maximum at the center of the blob, provided the scale of the Laplacian is "matched" to the scale of the blob. L. Lazebnik
Blob detection in 2D Laplacian of Gaussian: circularly symmetric operator for blob detection in 2D: $\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}$. K. Grauman
What Is A Useful Signature Function? Laplacian of Gaussian = “blob” detector K. Grauman, B. Leibe
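A sketch of how this signature function could be computed, using SciPy's Laplacian-of-Gaussian filter; the scale normalization by sigma squared follows the standard convention so that blob responses are comparable across scales (the helper name is illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_signature(image, sigma):
    # Scale-normalized Laplacian of Gaussian: multiplying by sigma^2
    # keeps blob responses comparable across scales; the response
    # peaks when sigma is "matched" to the blob size.
    return (sigma ** 2) * gaussian_laplace(image.astype(float), sigma)
```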
We can approximate the Laplacian with a difference of Gaussians, which is more efficient to implement: $\mathrm{DoG}(x, y, \sigma) = G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G$.
Gaussian pyramid example
DoG – Efficient Computation Computation in a Gaussian scale pyramid: the original image is repeatedly smoothed and subsampled with step 2 between octaves. K. Grauman, B. Leibe
Find local maxima in the position-scale space of the Difference-of-Gaussian → list of (x, y, s). K. Grauman, B. Leibe
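A minimal single-octave sketch of this step (no pyramid subsampling or subpixel refinement, and the threshold value is an illustrative assumption):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def dog_keypoints(image, sigmas, thresh=0.01):
    """Find local maxima in the (x, y, scale) space of a DoG stack."""
    img = image.astype(float)
    # Build a Difference-of-Gaussian stack from successive blurs.
    blurred = [gaussian_filter(img, s) for s in sigmas]
    dog = np.stack([blurred[i + 1] - blurred[i]
                    for i in range(len(sigmas) - 1)])
    # Keep points that are the maximum of their 3x3x3 neighborhood
    # in (scale, y, x) and exceed the response threshold.
    peaks = (dog == maximum_filter(dog, size=3)) & (dog > thresh)
    s_idx, ys, xs = np.nonzero(peaks)
    return [(x, y, sigmas[s]) for x, y, s in zip(xs, ys, s_idx)]
```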
Results: Difference-of-Gaussian K. Grauman, B. Leibe
Local features: desired properties Repeatability – The same feature can be found in several images despite geometric and photometric transformations Saliency – Each feature has a distinctive description Compactness and efficiency – Many fewer features than image pixels Locality – A feature occupies a relatively small area of the image; robust to clutter and occlusion K. Grauman
Plan for today Feature extraction / keypoint detection – Corners – Blobs Feature description
Raw patches as local descriptors The simplest way to describe the neighborhood around an interest point is to write down the list of intensities to form a feature vector. But this is very sensitive to even small shifts and rotations. K. Grauman
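For concreteness, a sketch of this baseline descriptor; the 16x16 patch size mirrors the SIFT window used later, and the helper assumes the point lies at least 8 pixels from the image border:

```python
import numpy as np

def raw_patch_descriptor(image, x, y, half=8):
    # Flatten the raw 16x16 intensity patch around (x, y) into a vector.
    # Simple, but shifting the patch by even one pixel changes every entry,
    # so SSD between such vectors is fragile.
    patch = image[y - half:y + half, x - half:x + half]
    return patch.astype(float).ravel()
```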
Geometric transformations e.g. scale, translation, rotation K. Grauman
Photometric transformations T. Tuytelaars
Local Descriptors The ideal descriptor should be – Robust – Distinctive – Compact – Efficient Most available descriptors focus on edge/gradient information – Capture texture information – Color rarely used K. Grauman, B. Leibe
Local Descriptors: SIFT Descriptor [Lowe, ICCV 1999] Histogram of oriented gradients Captures important texture information Robust to small translations / affine deformations K. Grauman, B. Leibe
Basics of Lowe’s SIFT algorithm Run Difference-of-Gaussian keypoint detector – Find maxima in location/scale space Find all major orientations – Bin orientations – Weight by gradient magnitude – Weight by distance to center (Gaussian-weighted mean) – Return orientations within 0.8 of peak For each (x, y, scale, orientation), create descriptor: – Sample 16x16 gradient magnitude and relative orientation – Bin 4x4 samples into 4x4 histograms – Threshold values to max of 0.2, divide by L2 norm – Final descriptor: 4x4x8 normalized histograms Lowe IJCV 2004
Scale Invariant Feature Transform Basic idea: take a 16x16 square window around the detected feature; compute the gradient orientation for each pixel; create a histogram over edge orientations (0 to 2π) weighted by gradient magnitude. L. Zitnick, adapted from D. Lowe
SIFT descriptor Full version: divide the 16x16 window into a 4x4 grid of cells (a 2x2 case is shown in the figure); compute an orientation histogram for each cell; 16 cells * 8 orientations = 128-dimensional descriptor. Then threshold-normalize the descriptor: normalize to unit length, clamp each component d_i such that d_i ≤ 0.2, and renormalize. L. Zitnick, adapted from D. Lowe
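In practice one rarely reimplements this pipeline by hand; a sketch using OpenCV's SIFT implementation (assuming opencv-python 4.4 or later, where SIFT lives in the main module; the filename is a placeholder):

```python
import cv2

# Load a grayscale image (placeholder filename).
img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)

# Detect DoG keypoints and compute 128-D SIFT descriptors in one call.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# descriptors has shape (N, 128): 4x4 cells x 8 orientation bins each.
```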
Making descriptor rotation invariant Rotate each patch according to its dominant gradient orientation; this puts the patches into a canonical orientation. Image from Matthew Brown; K. Grauman
SIFT descriptor [Lowe 2004] Extraordinarily robust matching technique: can handle changes in viewpoint (up to about 60 degrees of out-of-plane rotation); can handle significant changes in illumination (sometimes even day vs. night); fast and efficient – can run in real time; lots of code available. S. Seitz
Matching local features To generate candidate matches, find patches that have the most similar appearance (e.g., lowest SSD). Simplest approach: compare them all and take the closest (or the closest k, or all within a thresholded distance). K. Grauman
Ambiguous matches At what SSD value do we have a good match? To add robustness to matching, we can consider the ratio: distance to best match / distance to second-best match. If the ratio is low, the first match looks good; if it is high, the match could be ambiguous. K. Grauman
Matching SIFT Descriptors Nearest neighbor (Euclidean distance) Threshold ratio of nearest to 2 nd nearest descriptor Lowe IJCV 2004
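A minimal sketch of this nearest-neighbor matching with the ratio test; the 0.8 default follows Lowe IJCV 2004, and the brute-force search is an illustrative simplification (real systems use approximate nearest-neighbor structures):

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.8):
    # For each descriptor in image 1, find its two nearest neighbors
    # in image 2 (Euclidean distance) and apply the ratio test.
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:  # unambiguous match
            matches.append((i, j1))
    return matches
```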
SIFT Repeatability Lowe IJCV 2004
Robust feature-based alignment 1. Extract features. 2. Compute putative matches. 3. Loop: hypothesize a transformation T (from a small group of putative matches that are related by T); verify the transformation (search for other matches consistent with T). L. Lazebnik
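This hypothesize-and-verify loop is the idea behind RANSAC-style fitting. A toy sketch for a pure-translation model between matched keypoints; real alignment would fit an affine transform or homography, and the iteration count and inlier tolerance are illustrative assumptions:

```python
import numpy as np

def ransac_translation(pts1, pts2, n_iters=1000, tol=3.0):
    # pts1, pts2: (N, 2) arrays of matched keypoint locations.
    best_t, best_inliers = None, 0
    for _ in range(n_iters):
        # Hypothesize T from a single putative match.
        i = np.random.randint(len(pts1))
        t = pts2[i] - pts1[i]
        # Verify: count matches consistent with this translation.
        errors = np.linalg.norm(pts1 + t - pts2, axis=1)
        inliers = int(np.sum(errors < tol))
        if inliers > best_inliers:
            best_t, best_inliers = t, inliers
    return best_t, best_inliers
```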
Applications of local invariant features Image alignment Indexing and retrieval Recognition Robot navigation 3D reconstruction Wide baseline stereo Panoramas Motion tracking … Adapted from K. Grauman and L. Lazebnik
Recognition of specific objects, scenes Rothganger et al.; Lowe 2002; Schmid and Mohr 1997; Sivic and Zisserman 2003. K. Grauman
Wide baseline stereo [Image from T. Tuytelaars ECCV 2006 tutorial]
Automatic mosaicing K. Grauman
Summary Keypoint detection: repeatable and distinctive – corners, blobs, stable regions; Laplacian of Gaussian, automatic scale selection. Descriptors: robust and selective – histograms for robustness to small shifts and rotations (SIFT descriptor). Adapted from D. Hoiem, K. Grauman