MIT CSAIL Vision Interfaces: Approximate Correspondences in High Dimensions. Kristen Grauman*, Trevor Darrell, MIT CSAIL ((*) UT Austin)…

Presentation transcript:


Key challenges: robustness. Illumination, object pose, clutter, viewpoint, intra-class appearance, occlusions.

Key challenges: efficiency.
- Thousands to millions of pixels in an image
- 3,000-30,000 human-recognizable object categories
- Billions of images indexed by Google Image Search
- 18 billion+ prints produced from digital camera images
- Millions of camera phones sold in 2005

Local representations: describe component regions or patches separately. Examples: Superpixels [Ren et al.], Shape context [Belongie et al.], Maximally Stable Extremal Regions [Matas et al.], Geometric Blur [Berg et al.], SIFT [Lowe], Salient regions [Kadir et al.], Harris-Affine [Schmid et al.], Spin images [Johnson and Hebert].

How to handle sets of features? Each instance is an unordered set of vectors, and the number of vectors varies from instance to instance.

Partial matching: compare sets by computing a partial matching between their features.

Pyramid match overview: approximate the optimal partial matching between two feature sets.

Computing the partial matching, for sets with m features of dimension d:
- Optimal matching: O(m^3)
- Greedy matching: O(m^2 log m)
- Pyramid match: linear in m
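For intuition, the optimal partial matching that the pyramid match approximates can be computed exactly by brute force on tiny sets. The sketch below is illustrative only: the L1 cost and the function name are assumptions, not from the talk.

```python
# Brute-force optimal partial matching for tiny point sets (illustrative;
# practical implementations use the Hungarian algorithm for O(m^3) time).
# Cost here is total L1 distance, an assumption for the example.
import itertools

def optimal_partial_match_cost(X, Y):
    """Minimum total L1 cost over one-to-one matchings of X into Y (|X| <= |Y|)."""
    return min(
        sum(sum(abs(a - b) for a, b in zip(X[i], Y[j]))
            for i, j in enumerate(perm))
        for perm in itertools.permutations(range(len(Y)), len(X)))

print(optimal_partial_match_cost([(0, 0), (2, 2)], [(2, 1), (0, 1)]))  # -> 2
```

Enumerating permutations is factorial in the set size, which is exactly why a measure linear in the number of features matters for sets of hundreds of local descriptors.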

Pyramid match overview:
- Place a multi-dimensional, multi-resolution grid over the point sets
- Consider points matched at the finest resolution where they fall into the same grid cell
- Approximate the optimal similarity with the worst-case similarity within a pyramid cell
- No explicit search for matches!
The pyramid match measures the similarity of a partial matching between two sets.

Pyramid match: approximate partial match similarity [Grauman and Darrell, ICCV 2005]. P(X, Y) = sum over levels i of w_i * N_i, where N_i is the number of newly matched pairs at level i and w_i is a weight reflecting the difficulty of a match at level i.

Pyramid extraction. Histogram pyramid: level i has bins of side length 2^i.

Counting matches with histogram intersection: I(H(X), H(Y)) = sum over bins j of min(H(X)_j, H(Y)_j).
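Putting the last three slides together, here is a minimal sketch of the uniform-bin pyramid match. It assumes non-negative integer feature coordinates, and the function names are illustrative, not the authors' code.

```python
# Uniform-bin pyramid match sketch: histograms with bin side 2**i at
# level i, histogram intersection to count matches, and weights 1/2**i
# so that matches formed in coarser (less precise) cells count less.
from collections import Counter

def histogram(points, level):
    """Quantize points into a sparse histogram with bins of side 2**level."""
    size = 2 ** level
    return Counter(tuple(coord // size for coord in p) for p in points)

def intersection(hx, hy):
    """Histogram intersection: sum over bins of the per-bin minimum."""
    return sum(min(count, hy[b]) for b, count in hx.items())

def pyramid_match(X, Y, L):
    """Weighted count of newly matched pairs across pyramid levels 0..L."""
    score, prev = 0.0, 0
    for i in range(L + 1):
        matches = intersection(histogram(X, i), histogram(Y, i))
        score += (matches - prev) / (2 ** i)  # new matches at level i
        prev = matches
    return score

print(pyramid_match([(0, 0)], [(1, 0)], 2))  # first matched at level 1 -> 0.5
```

Note that no search for correspondences happens anywhere: each point is only binned, and the intersection counts implicit matches.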

Example pyramid match


Example pyramid match: pyramid match vs. optimal match.

Approximating the optimal partial matching: randomly generated, uniformly distributed point sets with m = 5 to 100, d = 2.

PM preserves rank…

and is robust to clutter…

Learning with the pyramid match. Kernel-based methods:
- Embed data into a Euclidean space via a similarity function (kernel), then seek linear relationships among the embedded data
- Efficient, with good generalization
- Include classification, regression, clustering, dimensionality reduction, …
The pyramid match forms a Mercer kernel.

Category recognition results (ETH-80 data set): kernel complexity, time (s), and accuracy for the pyramid match vs. the match kernel of [Wallraven et al.], plotted against the mean number of features.

Category recognition results: pyramid match kernel over spatial features with quantized appearance. (Figure: results by time of publication, 2004 to 6/06; per-match times in seconds.)

But a rectangular histogram may scale poorly with input dimension, so build a data-dependent histogram structure. New: vocabulary-guided pyramid match [NIPS 06]:
- Hierarchical k-means over the training set
- Irregular cells; record the diameter of each bin
- VG pyramid structure takes O(k^L) space and is stored once
- Individual histograms are still stored sparsely

Vocabulary-guided bins vs. uniform bins:
- Tune pyramid partitions to the feature distribution
- Accurate for d > 100
- Requires an initial corpus of features to determine the pyramid structure
- Small cost increase over uniform bins: kL distances against bin centers to insert points

Vocabulary-guided pyramid match (a Mercer kernel). Notation:
- n_ij(X): number of X's points in cell j at level i
- w_ij: weight for cell j at level i, set from either (1) the diameter of the cell, or (2) d_ij(X) + d_ij(Y), where d_ij(H) is the maximum distance of H's points in cell (i, j) to the cell center
- c_h(n): child h of node n
- New matches at level i are counted per cell as w_ij * (number of matches in cell j at level i minus the number of matches in its children); the recorded cell size bounds the distance between points matched in that cell.
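To make the idea concrete, here is a small sketch of vocabulary-guided binning, reduced to a single pyramid level for brevity (the slides use a k-way tree of depth L built by hierarchical k-means). The tiny k-means, the inverse-diameter weighting, and all names are assumptions for illustration, not the authors' implementation.

```python
# Vocabulary-guided cells: cluster a training corpus, record each cell's
# diameter, then compare sets by weighted histogram intersection, with
# tighter (smaller-diameter) cells contributing more per match.
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; assumes len(points) >= k."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: sq_dist(p, centers[j]))].append(p)
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def build_cells(corpus, k):
    """Data-dependent cells: centers plus a recorded diameter per cell."""
    centers = kmeans(corpus, k)
    assign = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in corpus]
    diam = [max((2 * sq_dist(p, centers[j]) ** 0.5
                 for p, a in zip(corpus, assign) if a == j), default=1.0)
            for j in range(k)]
    return centers, diam

def vg_similarity(X, Y, centers, diam):
    """Weighted intersection: per-cell min count, weighted by 1/diameter."""
    def hist(S):
        h = [0] * len(centers)
        for p in S:
            h[min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))] += 1
        return h
    return sum(min(a, b) / max(d, 1e-9)
               for a, b, d in zip(hist(X), hist(Y), diam))
```

The design point the slides emphasize survives even in this reduced form: the cell boundaries come from the data, and each point is placed by a handful of distance comparisons against centers rather than by a fixed rectangular grid.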

Results: evaluation criteria.
- Quality of match scores: how similar are the rankings produced by the approximate measure to those produced by the optimal measure?
- Quality of correspondences: how similar is the approximate correspondence field to the optimal one?
- Object recognition accuracy: used as a match kernel over feature sets, what is the recognition output?

Match score quality: uniform-bin pyramid match vs. vocabulary-guided pyramid match on ETH-80 images with sets of dense SIFT features (d = 128, and d = 8 via PCA); k = 10, L = 5 for the VG pyramid match.

Match score quality (ETH-80 images, sets of SIFT features).

Spearman correlation: a coefficient measuring how well two ordinal rankings agree. With r_i the rank of item i in the true ordering and s_i the corresponding rank assigned by the approximate ordering, rho = 1 - 6 * sum_i (r_i - s_i)^2 / (n(n^2 - 1)).
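The formula above is straightforward to implement. This is a generic sketch, not the talk's evaluation code; it assumes the rankings are given as rank vectors with no ties.

```python
# Spearman rank correlation: 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
# where d_i is the difference between the two ranks assigned to item i.
def spearman(true_rank, approx_rank):
    n = len(true_rank)
    d2 = sum((t - a) ** 2 for t, a in zip(true_rank, approx_rank))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(spearman([1, 2, 3, 4], [1, 2, 3, 4]))  # perfect agreement -> 1.0
print(spearman([1, 2, 3, 4], [4, 3, 2, 1]))  # fully reversed -> -1.0
```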

Bin structure and match counts: data-dependent bins allow more gradual distance ranges. (Figure panels for d = 3, 8, 13, 68, 113, 128.)

Approximate correspondences: use pyramid intersections to compute smaller explicit matchings.

Approximate correspondences: use pyramid intersections to compute smaller explicit matchings, taking either the optimal or a random matching per bin.
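One way to read "random per bin" is the following sketch (an assumed simplification with illustrative names, again for non-negative integer coordinates): points sharing a cell at the finest level are paired arbitrarily, and leftovers are re-binned at the next coarser level. The optimal-per-bin variant would instead solve a small explicit matching inside each cell.

```python
# Extract explicit correspondences from pyramid intersections: pair
# points cell by cell from fine to coarse, carrying unmatched points up.
from collections import defaultdict

def correspondences(X, Y, L):
    pairs, x_left, y_left = [], list(X), list(Y)
    for level in range(L + 1):
        size = 2 ** level
        cells = defaultdict(lambda: ([], []))
        for p in x_left:
            cells[tuple(c // size for c in p)][0].append(p)
        for q in y_left:
            cells[tuple(c // size for c in q)][1].append(q)
        x_left, y_left = [], []
        for xs, ys in cells.values():
            k = min(len(xs), len(ys))
            pairs += list(zip(xs[:k], ys[:k]))  # arbitrary pairing per bin
            x_left += xs[k:]
            y_left += ys[k:]
    return pairs

print(correspondences([(0, 0), (5, 5)], [(0, 1), (5, 4)], 3))
```

Because each bin's matching is tiny, the explicit correspondence field comes almost for free once the pyramids are built.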

Correspondence examples

Approximate correspondences (ETH-80 images, sets of SIFT descriptors).


Impact on recognition accuracy: VG-PMK as the kernel for an SVM, Caltech-4 data set, SIFT descriptors extracted at Harris and MSER interest points.

Sets of features elsewhere: diseases as sets of gene expressions, documents as bags of words, methods as sets of instructions.