CS 1699: Intro to Computer Vision Local Image Features: Extraction and Description Prof. Adriana Kovashka University of Pittsburgh September 17, 2015.

CS 1699: Intro to Computer Vision Local Image Features: Extraction and Description Prof. Adriana Kovashka University of Pittsburgh September 17, 2015

Announcement What: CS Welcome Party When: Friday, September 18, 2015 at 11:00AM Where: SENSQ 5317 Why (1): To learn about the CS Department, the CS Major and other interesting / useful things Why (2): To meet other students and faculty Why (3): To have some pizza!

Plan for today Feature extraction / keypoint detection Feature description Homework due on Tuesday Office hours today moved to 1pm-2pm tomorrow

An image is a set of pixels… What we seeWhat a computer sees Source: S. Narasimhan Adapted from S. Narasimhan

Problems with pixel representation Not invariant to small changes – Translation – Illumination – etc. Some parts of an image are more important than others What do we want to represent?

Interest points Note: “interest points” = “keypoints”, also sometimes called “features” Many applications – tracking: which points are good to track? – recognition: find patches likely to tell us something about object category – 3D reconstruction: find correspondences across different views D. Hoiem

Human eye movements Yarbus eye tracking D. Hoiem

Interest points Suppose you have to click on some point, go away and come back after I deform the image, and click on the same points again. – Which points would you choose? original deformed D. Hoiem

Choosing distinctive interest points If you wanted to meet a friend would you say a)“Let’s meet on campus.” b)“Let’s meet on Green street.” c)“Let’s meet at Green and Wright.” – Corner detection Or if you were in a secluded area: a)“Let’s meet in the Plains of Akbar.” b)“Let’s meet on the side of Mt. Doom.” c)“Let’s meet on top of Mt. Doom.” – Blob (valley/peak) detection D. Hoiem

Choosing interest points Where would you tell your friend to meet you? D. Hoiem

What points would you choose? K. Grauman

Overview of Keypoint Matching K. Grauman, B. Leibe B1B1 B2B2 B3B3 A1A1 A2A2 A3A3 1. Find a set of distinctive keypoints 3. Extract and normalize the region content 2. Define a region around each keypoint 4. Compute a local descriptor from the normalized region 5. Match local descriptors Query In database

Goals for Keypoints Detect points that are repeatable and distinctive Adapted from D. Hoiem

Key trade-offs More Repeatable More Points B1B1 B2B2 B3B3 A1A1 A2A2 A3A3 Detection More Distinctive More Flexible Description Robust to occlusion Minimize wrong matches Robust to expected variations Maximize correct matches Precise localization Adapted from D. Hoiem

Why extract features? Motivation: panorama stitching We have two images – how do we combine them? L. Lazebnik

Why extract features? Motivation: panorama stitching We have two images – how do we combine them? Step 1: extract features Step 2: match features L. Lazebnik

Why extract features? Motivation: panorama stitching We have two images – how do we combine them? Step 1: extract features Step 2: match features Step 3: align images L. Lazebnik

Goal: interest operator repeatability We want to detect (at least some of) the same points in both images. No chance to find true matches, yet we have to be able to run the detection procedure independently per image. K. Grauman

We want to be able to reliably determine which point goes with which. Must provide some invariance to geometric and photometric differences between the two views. ? Goal: descriptor distinctiveness K. Grauman

Corners as distinctive interest points We should easily recognize the point by looking through a small window Shifting a window in any direction should give a large change in intensity “edge”: no change along the edge direction “corner”: significant change in all directions “flat” region: no change in all directions A. Efros, D. Frolova, D. Simakov

Corners as distinctive interest points We should easily recognize the point by looking through a small window Shifting a window in any direction should give a large change in intensity “edge”: no change along the edge direction “corner”: significant change in all directions “flat” region: no change in all directions Adapted from A. Efros, D. Frolova, D. Simakov

Harris Detector: Mathematics Window-averaged squared change of intensity induced by shifting the image data by [ u,v ]: Intensity Shifted intensity Window function or Window function w(x,y) = Gaussian1 in window, 0 outside D. Frolova, D. Simakov

Harris Detector: Mathematics Window-averaged squared change of intensity induced by shifting the image data by [ u,v ]: Intensity Shifted intensity Window function E(u, v) D. Frolova, D. Simakov

Harris Detector: Mathematics Expanding I(x,y) in a Taylor series expansion, we have, for small shifts [u,v], a quadratic approximation to the error surface between a patch and itself, shifted by [u,v]: where M is a 2  2 matrix computed from image derivatives: D. Frolova, D. Simakov

Notation: K. Grauman Harris Detector: Mathematics

Harris Detector [ Harris88 ] Second moment matrix 30 1. Image derivatives 2. Square of derivatives 3. Gaussian filter g(  I ) IxIx IyIy Ix2Ix2 Iy2Iy2 IxIyIxIy g(I x 2 )g(I y 2 ) g(I x I y ) 4. Cornerness function – both eigenvalues are strong har 5. Non-maxima suppression (optionally, blur first) D. Hoiem

First, consider an axis-aligned corner: What does this matrix reveal? K. Grauman

First, consider an axis-aligned corner: This means dominant gradient directions align with x or y axis Look for locations where both λ’s are large. If either λ is close to 0, then this is not corner-like. What does this matrix reveal? What if we have a corner that is not aligned with the image axes? K. Grauman

What does this matrix reveal? Since M is symmetric, we have The eigenvalues of M reveal the amount of intensity change in the two principal orthogonal gradient directions in the window. K. Grauman

Corner response function “flat” region: 1 and 2 are small “edge”: 1 >> 2 2 >> 1 “corner”: 1 and 2 are large, 1 ~ 2 Adapted from A. Efros, D. Frolova, D. Simakov, K. Grauman

Harris Detector: Mathematics Measure of corner response: ( k – empirical constant, k = 0.04-0.06) D. Frolova, D. Simakov

Harris Detector: Summary Compute image gradients I x and I y for all pixels For each pixel – Compute by looping over neighbors x, y – compute Find points with large corner response function R ( R > threshold) Take the points of locally maximum R as the detected feature points (i.e., pixels where R is bigger than for all the 4 or 8 neighbors). 36 D. Frolova, D. Simakov (k :empirical constant, k = 0.04-0.06)

K. Grauman Example of Harris application

Compute corner response at every pixel. Example of Harris application K. Grauman

Harris Detector – Responses [ Harris88 ] Effect: A very precise corner detector. D. Hoiem

Harris Detector - Responses [ Harris88 ] D. Hoiem

Affine intensity change Only derivatives are used => invariance to intensity shift I  I + b Intensity scaling: I  a I R x (image coordinate) threshold R x (image coordinate) Partially invariant to affine intensity change I  a I + b L. Lazebnik

Image translation Derivatives and window function are shift-invariant Corner location is covariant w.r.t. translation L. Lazebnik

Image rotation Second moment ellipse rotates but its shape (i.e. eigenvalues) remains the same Corner location is covariant w.r.t. rotation L. Lazebnik

Scaling Invariant to image scale? image zoomed image A. Torralba

Scaling All points will be classified as edges Corner Corner location is not covariant to scaling! L. Lazebnik

Scale Invariant Detection The problem: how do we choose corresponding circles independently in each image? Do objects in the image have a characteristic scale that we can identify? D. Frolova, D. Simakov

Scale Invariant Detection Solution: – Design a function on the region (circle), which is “scale invariant” (the same for corresponding regions, even if they are at different scales) – Take a local maximum of this function scale = 1/2 f region size Image 1 f region size Image 2 A. Torralba s1s1 s2s2

Scale Invariant Detection A “good” function for scale detection: has one stable sharp peak f region size Bad f region size Bad f region size Good ! For usual images: a good function would be a one which responds to contrast (sharp local intensity change) A. Torralba

Automatic Scale Selection K. Grauman, B. Leibe How to find corresponding patch sizes?

Automatic Scale Selection Function responses for increasing scale (scale signature) K. Grauman, B. Leibe

Recall: Edge detection f S. Seitz Edge Derivative of Gaussian Edge = max of derivative

f Edge Second derivative of Gaussian Edge = zero crossing of 2 nd derivative Recall: Edge detection S. Seitz

From edges to blobs Edge = ripple Blob = superposition of two ripples Spatial selection: the magnitude of the Laplacian response will achieve a maximum at the center of the blob, provided the scale of the Laplacian is “matched” to the scale of the blob maximum L. Lazebnik

Blob detection in 2D Laplacian of Gaussian: Circularly symmetric operator for blob detection in 2D K. Grauman

What Is A Useful Signature Function? Laplacian of Gaussian = “blob” detector K. Grauman, B. Leibe

We can approximate the Laplacian with a difference of Gaussians; more efficient to implement. (Laplacian) (Difference of Gaussians)

Gaussian pyramid example

DoG – Efficient Computation Computation in Gaussian scale pyramid K. Grauman, B. Leibe  Original image Sampling with step    =2   

Find local maxima in position-scale space of Difference-of-Gaussian K. Grauman, B. Leibe       List of (x, y, s)

Results: Difference-of-Gaussian K. Grauman, B. Leibe

Gaussian pyramid example

Local features: desired properties Repeatability – The same feature can be found in several images despite geometric and photometric transformations Saliency – Each feature has a distinctive description Compactness and efficiency – Many fewer features than image pixels Locality – A feature occupies a relatively small area of the image; robust to clutter and occlusion K. Grauman

Plan for today Feature extraction / keypoint detection – Corners – Blobs Feature description

Raw patches as local descriptors The simplest way to describe the neighborhood around an interest point is to write down the list of intensities to form a feature vector. But this is very sensitive to even small shifts, rotations. K. Grauman

Geometric transformations e.g. scale, translation, rotation K. Grauman

Photometric transformations T. Tuytelaars

Local Descriptors The ideal descriptor should be – Robust – Distinctive – Compact – Efficient Most available descriptors focus on edge/gradient information – Capture texture information – Color rarely used K. Grauman, B. Leibe

Local Descriptors: SIFT Descriptor [Lowe, ICCV 1999] Histogram of oriented gradients Captures important texture information Robust to small translations / affine deformations K. Grauman, B. Leibe

Basics of Lowe’s SIFT algorithm Run Difference-of-Gaussian keypoint detector – Find maxima in location/scale space Find all major orientations – Bin orientations – Weight by gradient magnitude – Weight by distance to center (Gaussian-weighted mean) – Return orientations within 0.8 of peak For each (x, y, scale, orientation), create descriptor: – Sample 16x16 gradient magnitude and relative orientation – Bin 4x4 samples into 4x4 histograms – Threshold values to max of 0.2, divide by L2 norm – Final descriptor: 4x4x8 normalized histograms Lowe IJCV 2004

Basic idea: Take 16x16 square window around detected feature Compute gradient orientation for each pixel Create histogram over edge orientations weighted by magnitude Scale Invariant Feature Transform L. Zitnick, adapted from D. Lowe 0 22 angle histogram

SIFT descriptor Full version Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below) Compute an orientation histogram for each cell 16 cells * 8 orientations = 128 dimensional descriptor L. Zitnick, adapted from D. Lowe

Full version Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below) Compute an orientation histogram for each cell 16 cells * 8 orientations = 128 dimensional descriptor Threshold normalize the descriptor: SIFT descriptor 0.2 such that: L. Zitnick, adapted from D. Lowe

CSE 576: Computer Vision Making descriptor rotation invariant Image from Matthew Brown Rotate patch according to its dominant gradient orientation This puts the patches into a canonical orientation. K. Grauman

Extraordinarily robust matching technique Can handle changes in viewpoint Up to about 60 degree out of plane rotation Can handle significant changes in illumination Sometimes even day vs. night (below) Fast and efficient—can run in real time Lots of code available http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT SIFT descriptor [Lowe 2004] S. Seitz

Matching local features ? To generate candidate matches, find patches that have the most similar appearance (e.g., lowest SSD) Simplest approach: compare them all, take the closest (or closest k, or within a thresholded distance) Image 1 Image 2 K. Grauman

Ambiguous matches At what SSD value do we have a good match? To add robustness to matching, can consider ratio : distance to best match / distance to second best match If low, first match looks good. If high, could be ambiguous match. Image 1 Image 2 ???? K. Grauman

Matching SIFT Descriptors Nearest neighbor (Euclidean distance) Threshold ratio of nearest to 2 nd nearest descriptor Lowe IJCV 2004

SIFT Repeatability Lowe IJCV 2004

Robust feature-based alignment L. Lazebnik

Robust feature-based alignment Extract features L. Lazebnik

Robust feature-based alignment Extract features Compute putative matches L. Lazebnik

Robust feature-based alignment Extract features Compute putative matches Loop: Hypothesize transformation T (small group of putative matches that are related by T) L. Lazebnik

Robust feature-based alignment Extract features Compute putative matches Loop: Hypothesize transformation T (small group of putative matches that are related by T) Verify transformation (search for other matches consistent with T) L. Lazebnik

Applications of local invariant features Image alignment Indexing and retrieval Recognition Robot navigation 3D reconstruction Wide baseline stereo Panoramas Motion tracking … Adapted from K. Grauman and L. Lazebnik

Recognition of specific objects, scenes Rothganger et al. 2003 Lowe 2002 Schmid and Mohr 1997 Sivic and Zisserman, 2003 K. Grauman

Wide baseline stereo [Image from T. Tuytelaars ECCV 2006 tutorial]

Automatic mosaicing http://www.cs.ubc.ca/~mbrown/autostitch/autostitch.html K. Grauman

Summary Keypoint detection: repeatable and distinctive – Corners, blobs, stable regions – Laplacian of Gaussian, automatic scale selection Descriptors: robust and selective – Histograms for robustness to small shifts and translations (SIFT descriptor) Adapted from D. Hoiem, K. Grauman

CS 1699: Intro to Computer Vision Local Image Features: Extraction and Description Prof. Adriana Kovashka University of Pittsburgh September 17, 2015.

Similar presentations

Presentation on theme: "CS 1699: Intro to Computer Vision Local Image Features: Extraction and Description Prof. Adriana Kovashka University of Pittsburgh September 17, 2015."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CS 1699: Intro to Computer Vision Local Image Features: Extraction and Description Prof. Adriana Kovashka University of Pittsburgh September 17, 2015.

Similar presentations

Presentation on theme: "CS 1699: Intro to Computer Vision Local Image Features: Extraction and Description Prof. Adriana Kovashka University of Pittsburgh September 17, 2015."— Presentation transcript:

Similar presentations

About project

Feedback