Deconstruction: Discriminative learning of local image descriptors Samantha Horvath Learning Based Methods in Vision 2/14/2012.



Introduction Computer vision makes use of many "hand-crafted" descriptors. These descriptors share many common components. This paper presents a modular framework for designing and optimizing new feature descriptors.

Common Descriptors SIFT ▫Most well-known descriptor ▫Quantized gradient vectors ▫Grid-based spatial histogram ▫Post-normalization PCA-SIFT ▫Quantized gradient vectors ▫PCA to reduce dimensionality GLOH ▫Quantized gradient vectors ▫Polar-based histogram ▫PCA to reduce dimensionality

Common Descriptors SURF ▫Haar wavelet responses ▫Grid-based histograms HOG ▫Dense SIFT Shape Context ▫Extract points from object contour ▫Polar-based histogram Geometric blur ▫Extract sparse channels ▫Apply spatially varying blur ▫Subsample to create descriptor

[Pipeline figure: image with interest point → extracted patch → feature extraction → pooling → dimensionality reduction]

Generalized Framework Represent each portion of the descriptor algorithm as an interchangeable block Blocks inspired by existing descriptor algorithms Blocks are organized into candidate descriptors, then parameters are optimized

Algorithmic building blocks G-block: Gaussian smoothing T-block: non-linear transformation S-block: Spatial summation/pooling E-block: Embedding (dimensionality reduction) N-block: Normalization
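The block decomposition above can be sketched in code. This is an illustrative simplification (not the paper's implementation): each block is a function from array to array, and a candidate descriptor is a composition of a T-block, an S-block, and an N-block.

```python
import numpy as np

# Illustrative sketch of the block pipeline; the specific forms below are
# assumed simplifications, not the paper's exact algorithms.

def t_block_gradients(patch):
    # T-block: per-pixel gradient magnitude and orientation
    gy, gx = np.gradient(patch.astype(float))
    return np.hypot(gx, gy), np.arctan2(gy, gx)

def s_block_grid(mag, ori, n_bins=4, grid=2):
    # S-block: pool magnitude-weighted orientation histograms over a
    # grid x grid spatial layout (SIFT-like, without bilinear weighting)
    h, w = mag.shape
    bins = ((ori + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    desc = np.zeros((grid, grid, n_bins))
    for i in range(grid):
        for j in range(grid):
            ys = slice(i * h // grid, (i + 1) * h // grid)
            xs = slice(j * w // grid, (j + 1) * w // grid)
            for b in range(n_bins):
                desc[i, j, b] = mag[ys, xs][bins[ys, xs] == b].sum()
    return desc.ravel()

def n_block(desc):
    # N-block: normalize to unit length
    n = np.linalg.norm(desc)
    return desc / n if n > 0 else desc

rng = np.random.default_rng(0)
patch = rng.random((16, 16))
mag, ori = t_block_gradients(patch)
desc = n_block(s_block_grid(mag, ori))
print(desc.shape)  # (16,)
```

Swapping any one function for another of the same signature yields a new candidate descriptor, which is the point of the framework.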

General Pipeline

Transformations  T1 – Gradient orientation binning  Variants: number of orientation bins  T2 – Rectified gradient binning  Variants: number of rectification bins  T3 – Steerable filters  Variants: filter order and number of orientations  T4 – Difference-of-Gaussian responses (center-surround)  Parameter: size of center  T5 – Haar wavelet transform  T6 – Fixed 4 x 4 classifier  T7 – Quantized gray levels  Variant: number of gray level bins
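The rectified-gradient idea (T2) can be sketched as splitting each gradient component into positive and negative halves, so opposite gradient directions land in different channels. This is an assumed minimal form, not the paper's exact formulation:

```python
import numpy as np

# Sketch of T2-style rectified gradient binning (4 rectification bins):
# each pixel's gradient (gx, gy) becomes a 4-vector of non-negative
# rectified components.
def rectified_gradient(patch):
    gy, gx = np.gradient(patch.astype(float))
    return np.stack([np.maximum(gx, 0), np.maximum(-gx, 0),
                     np.maximum(gy, 0), np.maximum(-gy, 0)], axis=-1)

patch = np.arange(25, dtype=float).reshape(5, 5)  # simple intensity ramp
channels = rectified_gradient(patch)
print(channels.shape)  # (5, 5, 4)
```

On the ramp patch only the positive-x and positive-y channels are nonzero, which shows how rectification separates gradient polarity.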

Spatial summation blocks - parametric  S1 – SIFT-style bilinear weighted grid  Parameters: overall footprint size (continuous)  S2 – GLOH-style log-polar regions  Variants: number/arrangement of pooling regions  Parameters: ring radius

Spatial summation blocks - parametric  S3 – Gaussian weighted pooling regions on a grid  Variants: grid size  Parameters: grid sample positions, size of the Gaussians  S4 – Gaussian weighted, polar-arranged pooling regions  Variants: number of pooling regions  Parameters: ring radii, Gaussian kernel size, relative angular rotation
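A single Gaussian-weighted pooling region (the core of S3/S4) can be sketched as a weighted sum of a response map under a normalized Gaussian mask centered at a sample position. The function below is an assumed illustration; the center position and sigma are the kinds of parameters the paper optimizes:

```python
import numpy as np

# Sketch of one Gaussian-weighted pooling region: sum a response map
# under a normalized Gaussian mask centred at (cy, cx) with scale sigma.
def gaussian_pool(response, cy, cx, sigma):
    h, w = response.shape
    yy, xx = np.mgrid[0:h, 0:w]
    weights = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    weights /= weights.sum()  # normalize so a constant map pools to itself
    return float((weights * response).sum())

response = np.ones((9, 9))
pooled = gaussian_pool(response, 4, 4, 2.0)
print(pooled)  # 1.0
```

A full S-block would evaluate this at each sample position on the grid or polar arrangement and concatenate the results per channel.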

Embedding – non-parametric E1 – PCA (non discriminative technique) E2/E3 – Projection minimizes the ratio of in-class variance for match pairs to the variance of all match pairs (LPP) E4/E5 – Projection minimizes the ratio of variance between matched and non-matched pairs (LDE) E6/E7 – Projection minimizes the ratio of in-class variance for match pairs to the total data variance (GLDE)
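The simplest of these, the E1 block, is plain PCA: project descriptors onto the top eigenvectors of the data covariance. A minimal sketch (the discriminative variants E2–E7 replace the covariance with ratios of match/non-match scatter matrices and solve a generalized eigenvalue problem instead):

```python
import numpy as np

# Minimal PCA sketch of the E1 block (illustration only).
def pca_embed(X, k):
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)
    _, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :k]        # keep the top-k eigenvectors
    return Xc @ W, W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))       # 100 toy 8-D descriptors
Z, W = pca_embed(X, 3)
print(Z.shape)  # (100, 3)
```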

Smoothing and Normalization Gaussian smoothing ▫Parameter: σ Normalization ▫Normalize to unit vector ▫Clip to threshold ▫Renormalize, rinse, repeat ▫Parameter: clipping threshold
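The normalize-clip-renormalize loop can be sketched directly (SIFT uses a clipping threshold of 0.2; here the threshold and number of iterations are free parameters):

```python
import numpy as np

# Sketch of the N-block's normalize/clip/renormalize loop: clipping large
# components reduces the influence of dominant gradients, then the final
# renormalization restores unit length.
def normalize_clip(desc, threshold=0.2, iters=2):
    d = desc.astype(float)
    for _ in range(iters):
        n = np.linalg.norm(d)
        if n > 0:
            d = d / n
        d = np.minimum(d, threshold)
    n = np.linalg.norm(d)
    return d / n if n > 0 else d

desc = np.array([10.0, 1.0, 1.0, 1.0])  # one dominant component
out = normalize_clip(desc)
print(out)  # clipping has flattened the dominant component
```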

New names! SIFT ▫T1b – S1-16 GLOH ▫T1b – S2-17 – E1 PCA-SIFT ▫T1b – E1 SURF ▫T5 – S1

Ground Truth dataset

Ground truth dataset Uses camera calibration and dense multi-view stereo data DoG interest points are detected Interest points are mapped from one image to another view of the scene Stereo constraints are used to help match interest points

Same to same vs. like to like

Learning/optimization The G, S, N blocks and one T block (T4) contain parameters for optimization The G, S, N, and T blocks are jointly optimized using Powell minimization ▫Powell minimization is a conjugate-direction method that does not require derivatives ▫Optimization is initialized with reasonable values E blocks are optimized separately – a generalized eigenvalue problem ▫Power regularization to avoid overfitting
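Derivative-free Powell optimization is available off the shelf, e.g. in SciPy. The sketch below stands in a toy quadratic for the paper's actual objective (descriptor error rate on the ground-truth match set), with two made-up block parameters:

```python
import numpy as np
from scipy.optimize import minimize

# Toy objective standing in for descriptor error as a function of two
# hypothetical block parameters (a smoothing sigma and a pooling radius).
def toy_error(params):
    sigma, radius = params
    return (sigma - 1.5) ** 2 + (radius - 4.0) ** 2

# Powell's method needs no derivatives, only function evaluations.
res = minimize(toy_error, x0=[1.0, 1.0], method="Powell")
print(np.allclose(res.x, [1.5, 4.0], atol=1e-2))  # True
```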

Dimension Reduction on SIFT

Results – Pipeline 1 T3 blocks (rectified steerable filters) with polar summation regions (S4/S2) performed the best Consistently outperformed SIFT descriptors

Optimal Summation regions Optimal summation regions are foveated!

Foveated summation regions The S4-25 spatial pooling variant is very similar to the DAISY descriptor (designed for dense matching) Foveated regions are similar to geometric blur (increased blurring away from interest point)
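The foveated idea (resolution highest at the interest point, blurrier with distance, as in geometric blur) can be sketched as a per-pixel smoothing scale. The linear schedule and its parameters below are assumptions for illustration, not values from the paper:

```python
import numpy as np

# Geometric-blur-style schedule: smoothing scale grows linearly with
# distance from the patch centre (base and alpha are assumed parameters).
def foveated_sigma(h, w, base=0.5, alpha=0.1):
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - (h - 1) / 2, xx - (w - 1) / 2)
    return base + alpha * r

sig = foveated_sigma(9, 9)
print(sig[4, 4] < sig[0, 0])  # True: the centre is sharpest
```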

Results – Pipeline 2 Performance more varied Steerable filters still perform the best LDE and LPP best embedding methods Does not consistently outperform SIFT

Results – Pipeline 3 Dimensionality reduction after learned T/S block combination Greatly outperforms SIFT Straightforward PCA works the best

Thoughts Would like to see how the optimized spatial pooling blocks vary with the different training sets Ultimately, would like to see this framework tested on different dataset types Difficulty is getting “ground truth” matches for identification/classification tasks

Synthesis Majority of my computer vision research involves medical imaging Medical images are very different from natural scene images ▫Images represent a planar slice through the patient ▫Often poor contrast between different structures

Synthesis Interest point detection has applications for medical imaging Non-rigid registrations ▫Warp one set of interest points to overlay the second set Tracking

References
1. M. Brown, G. Hua and S. Winder, "Discriminant Learning of Local Image Descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence. (original)
2. S. Winder and M. Brown, "Learning local image descriptors," in Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR07), Minneapolis, June. (technical)
3. K. Mikolajczyk and C. Schmid, "Scale and affine invariant interest point detectors," International Journal of Computer Vision, vol. 60, no. 1, pp. 63–86. (overview)
4. E. Tola, V. Lepetit, and P. Fua, "A fast local descriptor for dense matching," in Proceedings of the International Conference on Computer Vision and Pattern Recognition, Anchorage, June. (DAISY)
5. Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proceedings of the International Conference on Computer Vision and Pattern Recognition, vol. 2, July 2004, pp. 506–513. (PCA-SIFT)
6. A. Berg and J. Malik, "Geometric blur and template matching," in International Conference on Computer Vision and Pattern Recognition, 2001, pp. I:607–.
7. D. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110. (SIFT)
8. H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features," 9th European Conference on Computer Vision, 2006.
9. G. Sharma and F. Jurie, "Learning discriminative spatial representation for image classification," British Machine Vision Conference (BMVC 2011). (grid paper)
10. W. T. Freeman and E. H. Adelson, "The design and use of steerable filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 891–906, 1991.