4.1 Feature: Point and Patches Xuejin Chen Reading: Szeliski Chap. 4 222~238.

Slides:

Advertisements

Similar presentations

Feature matching “What stuff in the left image matches with stuff on the right?” Necessary for automatic panorama stitching (Part of Project 4!) Slides.

Advertisements

CSE 473/573 Computer Vision and Image Processing (CVIP)

The SIFT (Scale Invariant Feature Transform) Detector and Descriptor

TP14 - Local features: detection and description Computer Vision, FCUP, 2014 Miguel Coimbra Slides by Prof. Kristen Grauman.

Object Recognition using Invariant Local Features Applications l Mobile robots, driver assistance l Cell phone location or object recognition l Panoramas,

Instructor: Mircea Nicolescu Lecture 15 CS 485 / 685 Computer Vision.

Matching with Invariant Features

Object Recognition with Invariant Features n Definition: Identify objects or scenes and determine their pose and model parameters n Applications l Industrial.

Robust and large-scale alignment Image from

Patch Descriptors CSE P 576 Larry Zitnick

CS4670: Computer Vision Kavita Bala Lecture 10: Feature Matching and Transforms.

Lecture 6: Feature matching CS4670: Computer Vision Noah Snavely.

Lecture 4: Feature matching

Feature matching and tracking Class 5 Read Section 4.1 of course notes Read Shi and Tomasi’s paper on.

Scale Invariant Feature Transform

Automatic Image Alignment (feature-based) : Computational Photography Alexei Efros, CMU, Fall 2005 with a lot of slides stolen from Steve Seitz and.

Distinctive image features from scale-invariant keypoints. David G. Lowe, Int. Journal of Computer Vision, 60, 2 (2004), pp Presented by: Shalomi.

Scale Invariant Feature Transform (SIFT)

Lecture 3a: Feature detection and matching CS6670: Computer Vision Noah Snavely.

1 Invariant Local Feature for Object Recognition Presented by Wyman 2/05/2006.

Automatic Image Alignment (feature-based) : Computational Photography Alexei Efros, CMU, Fall 2006 with a lot of slides stolen from Steve Seitz and.

CS4670: Computer Vision Kavita Bala Lecture 8: Scale invariance.

Distinctive Image Features from Scale-Invariant Keypoints David G. Lowe – IJCV 2004 Brien Flewelling CPSC 643 Presentation 1.

Sebastian Thrun CS223B Computer Vision, Winter Stanford CS223B Computer Vision, Winter 2005 Lecture 3 Advanced Features Sebastian Thrun, Stanford.

Lecture 6: Feature matching and alignment CS4670: Computer Vision Noah Snavely.

Scale-Invariant Feature Transform (SIFT) Jinxiang Chai.

Overview Introduction to local features

Interest Point Descriptors

Advanced Computer Vision Chapter 4 Feature Detection and Matching Presented by: 傅楸善 & 許承偉

1 Interest Operators Harris Corner Detector: the first and most basic interest operator Kadir Entropy Detector and its use in object recognition SIFT interest.

Object Tracking/Recognition using Invariant Local Features Applications l Mobile robots, driver assistance l Cell phone location or object recognition.

Overview Harris interest points Comparing interest points (SSD, ZNCC, SIFT) Scale & affine invariant interest points Evaluation and comparison of different.

Local invariant features Cordelia Schmid INRIA, Grenoble.

Computer Vision Lecture 4 FEATURE DETECTION. Feature Detection Feature Description Feature Matching Today’s Topics 2.

Lecture 4: Feature matching CS4670 / 5670: Computer Vision Noah Snavely.

Feature extraction: Corners 9300 Harris Corners Pkwy, Charlotte, NC.

776 Computer Vision Jan-Michael Frahm, Enrique Dunn Spring 2013.

Lecture 7: Features Part 2 CS4670/5670: Computer Vision Noah Snavely.

Harris Corner Detector & Scale Invariant Feature Transform (SIFT)

Distinctive Image Features from Scale-Invariant Keypoints David Lowe Presented by Tony X. Han March 11, 2008.

Lecture 8: Feature matching CS6670: Computer Vision Noah Snavely.

Feature extraction: Corners and blobs. Why extract features? Motivation: panorama stitching We have two images – how do we combine them?

CSE 185 Introduction to Computer Vision Feature Matching.

Project 3 questions? Interest Points and Instance Recognition Computer Vision CS 143, Brown James Hays 10/21/11 Many slides from Kristen Grauman and.

Local features: detection and description

CS654: Digital Image Analysis

776 Computer Vision Jan-Michael Frahm Spring 2012.

Image matching by Diva SianDiva Sian by swashfordswashford TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAA.

Announcements Project 1 Out today Help session at the end of class.

Recognizing specific objects Matching with SIFT Original suggestion Lowe, 1999,2004.

Lecture 14: Feature matching and Transforms CS4670/5670: Computer Vision Kavita Bala.

Invariant Local Features Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging.

776 Computer Vision Jan-Michael Frahm Spring 2012.

Interest Points EE/CSE 576 Linda Shapiro.

Lecture 13: Feature Descriptors and Matching

Lecture 07 13/12/2011 Shai Avidan הבהרה: החומר המחייב הוא החומר הנלמד בכיתה ולא זה המופיע / לא מופיע במצגת.

Image Features: Points and Regions

TP12 - Local features: detection and description

Local features: detection and description May 11th, 2017

Feature description and matching

Features Readings All is Vanity, by C. Allan Gilbert,

CAP 5415 Computer Vision Fall 2012 Dr. Mubarak Shah Lecture-5

Announcements Project 1 Project 2 (panoramas)

The SIFT (Scale Invariant Feature Transform) Detector and Descriptor

Announcement Project 1 1.

CSE 185 Introduction to Computer Vision

Feature descriptors and matching

Presentation transcript:

4.1 Feature: Point and Patches Xuejin Chen Reading: Szeliski Chap ~238

4.1.2 Feature Descriptor

Feature Descriptors We know how to detect good points Next question: How to match them? ?

Feature Descriptors Lots of possibilities (this is a popular research area) – Simple option: match square windows around the point – State of the art approach: SIFT David Lowe, UBC ?

Feature Descriptors Sum of Squared Difference (SSD) Normalized Cross-Correlation (NCC) For small motion: video stereo, tracking… Time consuming and sensitive to noise, transforms..

Invariance Suppose we are comparing two images I 1 and I 2 – I 2 may be a transformed version of I 1 – What kinds of transformations are we likely to encounter in practice? We’d like to find the same features regardless of the transformation – This is called transformational invariance – Most feature methods are designed to be invariant to Translation, 2D rotation, scale – They can usually also handle Limited 3D rotations (SIFT works up to about 60 degrees) Limited affine transformations (some are fully affine invariant) Limited illumination/contrast changes

How to Achieve Invariance Need both of the following: 1.Make sure your detector is invariant – Harris is invariant to translation and rotation – Scale is trickier common approach is to detect features at many scales using a Gaussian pyramid (e.g., MOPS) More sophisticated methods find “the best scale” to represent each feature (e.g., SIFT) 2. Design an invariant feature descriptor – A descriptor captures the information in a region around the detected feature point – The simplest descriptor: a square window of pixels What’s this invariant to? – Let’s look at some better approaches…

Find dominant orientation of the image patch – This is given by x +, the eigenvector of H corresponding to + + is the larger eigenvalue – Rotate the patch according to this angle Rotation Invariance for Feature Descriptors Figure by Matthew Brown

Take 40x40 square window around detected feature – Scale to 1/5 size (using prefiltering) – Rotate to horizontal – Sample 8x8 square window centered at feature – Intensity normalize the window by subtracting the mean, dividing by the standard deviation in the window Multiscale Oriented PatcheS descriptor 8 pixels 40 pixels Adapted from slide by Matthew Brown

Detections at Multiple Scales

Basic idea: Take 16x16 square window around detected feature Compute edge orientation (angle of the gradient - 90  ) for each pixel Throw out weak edges (threshold gradient magnitude) Create histogram of surviving edge orientations Scale Invariant Feature Transform Adapted from slide by David Lowe 0 22 angle histogram

SIFT Descriptor Full version Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below) Compute an orientation histogram for each cell 16 cells * 8 orientations = 128 dimensional descriptor Adapted from slide by David Lowe

Properties of SIFT Extraordinarily robust matching technique – Can handle changes in viewpoint Up to about 60 degree out of plane rotation – Can handle significant changes in illumination Sometimes even day vs. night (below) – Fast and efficient—can run in real time – Lots of code available

PCA-SIFT 39x39 patches Ix, Iy 39x39x2=3021 D vector  36 D using Principle Component Analysis Ke and Sukthankar 2004

Gradient Location-Orientation Histogram (GLOH) Log-polar binning structure instead of four quadrants 17 spatial bins * 16 orientation bins = 272 D PCA -> 128 D Mikolajczyk and Schmid 2005 Best performance overall

Maximally Stable Extremal Regions – Threshold image intensities: I > thresh for several increasing values of thresh – Extract connected components (“Extremal Regions”) – Find a threshold when region is “Maximally Stable”, i.e. local minimum of the relative growth – Approximate each region with an ellipse J.Matas et.al. “Distinguished Regions for Wide-baseline Stereo”. BMVC 2002.

Performance of Local Descriptor Comparison in GLOH > SIFT > others Mikolajczyk and Schmid 2005

Many New Descriptors Many newer techniques: tuning parameters.. Trained from large database Class- or instance- specific features…

4.1.3 Feature Matching Given a feature in I 1, how to find the best match in I 2 ? 1.Define distance function that compares two descriptors 2.Test all the features in I 2, find the one with min distance 3.Efficient data structures and algorithms

Feature Distance How to define the difference between features f 1, f 2 ? – Simple approach is SSD(f 1, f 2 ) Sum of square differences between entries of the two descriptors Can give good scores to very ambiguous (bad) matches I1I1 I2I2 f1f1 f2f2

Feature Distance How to define the difference between features f 1, f 2 ? – Better approach: ratio distance = SSD(f 1, f 2 ) / SSD(f 1, f 2 ’) f 2 is best SSD match to f 1 in I 2 f 2 ’ is 2 nd best SSD match to f 1 in I 2 gives small values for ambiguous matches I1I1 I2I2 f1f1 f2f2 f2'f2'

Evaluating the Results How can we measure the performance of a feature matcher? feature distance

True/False Positives The distance threshold affects performance – True positives = # of detected matches that are correct Suppose we want to maximize these—how to choose threshold? – False positives = # of detected matches that are incorrect Suppose we want to minimize these—how to choose threshold? feature distance false match true match Threshold

Performance Evaluation TP: true positives, i.e., number of correct matches; FN: false negatives, matches that were not correctly detected; FP: false positives, proposed matches that are incorrect; TN: true negatives, non-matches that were correctly rejected.

True/False Positive/Negative Black 1, 2: features Threshold solid circle – Green 1: true positive – Blue 1: false negative – 3: false positive – 4: true negative Threshold dashed circle – Green 1: true positive – Blue 1: true positive – 3: false positive – 4: false positive

Performance Evaluation Unit rates: – True positive rate (TPR) – False positive rate (FPR) – Positive predictive value (PPV) – Accuracy (ACC)

Confusion Matrix TP=18FP=4P’=22 FN=2TN=76N’=78 P=20N=80Total =100 TPR=0.90FPR=0.05 PPV=0.82 ACC=0.94 True matches True non-matches Predicted matches Predicted non-matches Ideally, TPR = 1, FPR =0

0.7 Evaluating the Results How can we measure the performance of a feature matcher? false positive rate true positive rate # true positives # matching features (positives) 0.1 # false positives # unmatched features (negatives) For a specific threshold

0.7 Evaluating the Results How can we measure the performance of a feature matcher? false positive rate true positive rate # true positives # matching features (positives) 0.1 # false positives # unmatched features (negatives) ROC curve (“Receiver Operator Characteristic”) ROC Curves Generated by counting # current/incorrect matches, for different thresholds Want to maximize area under the curve (AUC) Useful for comparing different feature matching methods For more info: Close to upper left corner

Performance Evaluation Give the distribution, look for the best threshold However, hard to have a clear distribution function Hard to set a best threshold Inter-space distance Distribution of positives and negatives

Fixed threshold – Db: FN; Dc, De: FP Nearest Neighbor – Db: TP; Dc: FP Trained from large database to get appropriate threshold Nearest neighbor distance ratio (NNDR) – Small d1/d2 – Db: TP; Dc, De: TN Threshold will adapt to different regions of the feature space (Mikolajczyk and Schmid 2005)

Performance Evaluation (Mikolajczyk and Schmid 2005) Fixed threshold

Performance Evaluation (Mikolajczyk and Schmid 2005) Nearest neighbor

Performance Evaluation (Mikolajczyk and Schmid 2005) NNDR

Feature Space Nearest Neighbor Distance Ratio Adaptive threshold based on different regions in feature space Trained from database..

Efficient Matching Compare all features against all others – Quadratic, impractical for most applications Indexing structure – For individual image, or globally for all images – Multi-dimensional search tree – Hash table – Vocabulary tree – …

Efficient Matching Multi-dimensional hashing – Map descriptor into fixed size buckets – When matching, new feature is hashed into a bucket – Search for nearby buckets to return potential candidates 123 N ……

Haar Wavelets 3D index by performing sum over the quadrants Normalize the 3 values by their expected standard deviations V_n  two nearest bins (of 10)  index 2^3 = 8 bins Query: primary 3D bin, k nearest neighbors for further process MOPS, Brown, Szeliski, andWinder (2005)

Haar Wavelets V_n  two nearest bins (of 10)  index 2^3 = 8 bins MOPS, Brown, Szeliski, andWinder (2005)

Haar Wavelets Query primary 3D bin, k nearest neighbors for further process MOPS, Brown, Szeliski, andWinder (2005) [v1 v2 v3]

More Structures Locality Sensitive Hashing – Use unions of independently computed hashing functions to index the features Parameter-Sensitive Hashing – More sensitive to the distribution of points in parameter space High-D vectors to binary codes – Compared using Hamming distance – Accommodate arbitrary kernel functions Page 232, 233 [Gionis, Indyk, and Motwani 1999] [Shakhnarovich, et al,2003] Torralba, Weiss, and Fergus 2008; Weiss, Torralba, and Fergus 2008; Kulis and Grauman 2009; Raginsky and Lazebnik 2009

KD Tree Multi-dimensional search tree Recursively divide multi-D feature space along alternating axis-aligned hyperplanes Choose the threshold along each axis so as to maximize some criterion ( tree balance, maximum depth as small as possible …) Query: nearby bins

KD Tree (.22,.56) More complicated case

Additional Data Structure Slicing – use a series of 1D binary searches on point list sorted along different dimensions to efficiently cull down a list of candidate points that lie within a hypercube of the query point Reweight the matches at different levels of the indexing tree – Less sensitive to discretization errors in tree construction Metric tree – A small number of prototypes at each level in a hierarchy – Visual words  classical information retrieval, fast Page 234

Indexing Structure Survey and comparison on indexing structures Kd tree works best Rapid computation of image feature correspondences remains a challenging open research problem Muja and Lowe (2009)

Verification and Densification Geometry alignment to verify Inliers and outliers RANSAC, Random sampling Will be discussed further in next sections

4.1.4 Feature Tracking Find features in all candidate images – Detect in one then search in others – Detect and track: for video tracking applications, where the expected amount of motion and appearance deformation between adjacent frames in expected to be small Subsequent frames – SSD works well – NCC is better if there is brightness change – Hierarchical search strategy when search range is large: matches in lower resolution images to provide better initial guesses and speed up the search

4.1.4 Feature Tracking Long image sequence: large appearance change – Whether to continue matching against the originally detected patch or – Re-sample each subsequent frame at the matching location Fail when original patch apperance changes such as foreshotening Risk: features drift from its original location

Feature Tracking Affine motion model – Compare patches in neighboring frames using a translational model – Use the location to initialize an affine registration between the patch in the current frame and the base frame – The area around the predicted feature location is searched with an incremental registration algorithm (Shi and Tomasi 1994) Detect new features in regions where the tracking has fail Kanade–Lucas–Tomasi (KLT) tracker.

A lot of Expansions Tracking combined with structure from motion Tie together the corners Tracking in video with large number of moving objects or points Special purpose recognizer: Learning algorithms – Train classifiers on sample patches and their affine deformation, fast and reliable, fast motion is supported Survey … (Yilmaz, Javed, and Shah 2006) Page 236

Real-time head tracking using the fast trained classifiers of Lepetit, Pilet, and Fua (2004) 2004 IEEE.

Application: Performance-driven animation Expression and head tracking Morph among a series of hand-drawn sketches Buck, Finkelstein, Jacobs et al. (2000)

Performance-Driven Animation An animator first extracts the eye and mouth regions of each sketch and draws control lines over each image

Performance-Driven Animation An animator first extracts the eye and mouth regions of each sketch and draws control lines over each image At run time, a face-tracking system (Toyama 1998) determines the current location of these features.

Performance-Driven Animation An animator first extracts the eye and mouth regions of each sketch and draws control lines over each image At run time, a face-tracking system (Toyama 1998) determines the current location of these features. The animation system decides which input images to morph based on nearest neighbor feature appearance matching and triangular barycentric interpolation. It also computes the global location and orientation of the head from the tracked features.

correspondence Morph based on nearest neighbor feature appearance matching triangular barycentric interpolation

Performance-Driven Animation An animator first extracts the eye and mouth regions of each sketch and draws control lines over each image At run time, a face-tracking system (Toyama 1998) determines the current location of these features. The animation system decides which input images to morph based on nearest neighbor feature appearance matching and triangular barycentric interpolation. It also computes the global location and orientation of the head from the tracked features. The resulting morphed eye and mouth regions are then composited back into the overall head model to yield a frame of hand-drawn animation

More on feature detection/description

Lots of Applications Features are used for: – Image alignment (e.g., mosaics) – 3D reconstruction – Motion tracking – Object recognition – Indexing and database retrieval – Robot navigation – … other

Object Recognition (David Lowe)

Sony Aibo SIFT usage: Recognize charging station Communicate with visual cards Teach object recognition