
Indexing Techniques Mei-Chen Yeh

Last week: matching two sets of features. Strategy 1: convert each set to a fixed-length feature vector (bag-of-words) and use a conventional proximity measure. Strategy 2: build point correspondences.

Last week: bag-of-words with a visual vocabulary. [Figure: histogram of codeword frequencies for an image.]

Matching local features: building patch correspondences between Image 1 and Image 2. To generate candidate matches, find patches that have the most similar appearance (e.g., lowest SSD). Slide credit: Prof. Kristen Grauman

Matching local features: building patch correspondences. Simplest approach: compare them all and take the closest (or the closest k, or all matches within a thresholded distance). Slide credit: Prof. Kristen Grauman

Indexing local features. Each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT).

Indexing local features. When we see close points in the descriptor feature space, we have similar descriptors, which indicates similar local content.

Problem statement: with potentially thousands of features per image, and hundreds to millions of images to search, how do we efficiently find the ones that are relevant to a new image?

50 thousand images: printed on paper and stacked, the pile would be about 4 m high. Slide credit: Nistér and Stewénius

110 million images?

To continue the analogy, if we printed all these images on paper and stacked them, the pile would be as high as Mount Everest.

Scalability matters! To put this in perspective: Google image search not too long ago claimed to index 2 billion images, although based on meta-data, while we do it based on image content. So, with about 20 desktop systems like the one just shown, it seems that it may be possible to build a web-scale content-based image search engine, and we are hoping this work will fuel the race for the first such search engine. That is some motivation; the contribution of the paper, as you can guess by now, is the scalability of recognition and retrieval.

The nearest-neighbor search problem. Given a set S of n points in d dimensions and a query point q, which point in S is closest to q? Time complexity of a linear scan: O(dn).
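As a baseline, here is a minimal linear-scan sketch in Python/NumPy (function and variable names are illustrative):

```python
import numpy as np

def nearest_neighbor(S, q):
    """Linear scan: O(d*n) for n points in d dimensions."""
    d2 = np.sum((S - q) ** 2, axis=1)   # squared distance from q to every point
    i = int(np.argmin(d2))
    return i, float(np.sqrt(d2[i]))

# Toy usage: 1000 points in 128 dimensions (SIFT-sized descriptors).
S = np.random.rand(1000, 128)
q = np.random.rand(128)
idx, dist = nearest_neighbor(S, q)
```

Every query touches all n points, which is exactly the O(dn) cost the indexing techniques below try to avoid.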

The nearest-neighbor search problem, relaxed. r-nearest neighbor: for any query q, returns a point p ∈ S s.t. d(p, q) ≤ r. c-approximate r-nearest neighbor: for any query q, returns a point p’ ∈ S s.t. d(p’, q) ≤ cr whenever some point p with d(p, q) ≤ r exists.

Today: indexing local features with the inverted file, the vocabulary tree, and locality-sensitive hashing.

Indexing local features: inverted file

Indexing local features: inverted file. For text documents, an efficient way to find all pages on which a word occurs is to use an index. We want to find all images in which a feature occurs: page ~ image, word ~ feature. To use this idea, we’ll need to map our features to “visual words”.
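For concreteness, a toy inverted file in Python (illustrative names; it assumes features have already been quantized to visual-word ids):

```python
from collections import defaultdict

# Inverted file: visual word id -> ids of the images containing that word.
index = defaultdict(set)

def add_image(image_id, word_ids):
    for w in word_ids:
        index[w].add(image_id)

def candidate_images(query_word_ids):
    """Images that share at least one visual word with the query."""
    hits = set()
    for w in query_word_ids:
        hits |= index[w]
    return hits

add_image(0, [3, 17, 42])
add_image(1, [17, 99])
print(candidate_images([17, 42]))   # {0, 1}
```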

Text retrieval vs. image search: what makes the problems similar, and what makes them different?

Visual words. E.g., the SIFT descriptor space: each point is 128-dimensional. Extract some local features from a number of images… Slide credit: D. Nister, CVPR 2006


Each point is a local descriptor, e.g. SIFT vector.

Example: Quantize into 3 words

Visual words: map high-dimensional descriptors to tokens/words by quantizing the feature space. Quantize via clustering and let the cluster centers be the prototype “words”. Determine which word to assign to each new image region by finding the closest cluster center.
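A sketch of this quantization step, assuming the descriptors are rows of a NumPy array (plain k-means; a toy illustration, not a scalable implementation):

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20):
    """Plain k-means: the k cluster centers are the prototype 'visual words'."""
    X = np.asarray(descriptors, dtype=float)
    centers = X[np.random.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its closest center (O(n*k) matrix: toy scale only)...
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # ...then move each center to the mean of its assigned descriptors.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return centers

def assign_words(descriptors, centers):
    """Map each new descriptor to the id of its closest cluster center."""
    X = np.asarray(descriptors, dtype=float)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.argmin(1)
```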

Visual words. Each group of patches belongs to the same visual word! Figure from Sivic & Zisserman, ICCV 2003

Visual vocabulary formation. Issues:
- Sampling strategy: where to extract features? Fixed locations or interest points?
- Clustering / quantization algorithm
- What corpus provides the features (a universal vocabulary?)
- Vocabulary size / number of words
- Weight of each word?

Inverted file index: the index maps each word to image ids. Why does the index give us a significant gain in efficiency? Because a query only has to touch the image lists of the words it actually contains, rather than every image in the database.

Inverted file index: a query image is matched to the database images that share visual words with it.

tf-idf weighting (term frequency – inverse document frequency): describe the frequency of each word within an image, and decrease the weights of the words that appear often in the database. Discriminative words (e.g., “economic”, “trade”) get higher weight; common words (e.g., “the”, “most”, “we”) get lower weight. This is the standard weighting for text retrieval, measuring a word’s importance in a particular document.

tf-idf weighting. The weight of word i in document d is

t_i = (n_id / n_d) * log(N / n_i)

where n_id is the number of occurrences of word i in document d, n_d is the number of words in document d, n_i is the number of documents in the whole database in which word i occurs, and N is the total number of documents in the database. This is the standard weighting for text retrieval.
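A direct transcription of this formula, assuming H holds raw word counts per image and every image contains at least one word (a sketch; names are illustrative):

```python
import numpy as np

def tfidf_weights(H):
    """H[d, i] = n_id, the raw count of word i in image d.
    Returns t[d, i] = (n_id / n_d) * log(N / n_i)."""
    N = H.shape[0]                            # total number of documents (images)
    n_d = H.sum(axis=1, keepdims=True)        # number of words in each document
    n_i = np.maximum((H > 0).sum(axis=0), 1)  # documents containing word i
    return (H / n_d) * np.log(N / n_i)
```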

Bag-of-words + inverted file. Bag-of-words representation: http://people.cs.ubc.ca/~lowe/keypoints/ Inverted file: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html

D. Nistér and H. Stewénius. Scalable Recognition with a Vocabulary Tree, CVPR 2006.

We then run k-means on the descriptor space. In this setting, k defines what we call the branch factor of the tree, which indicates how fast the tree branches. In this illustration, k is three. We then run k-means again, recursively on each of the resulting quantization cells. This defines the vocabulary tree, which is essentially a hierarchical set of cluster centers and their corresponding Voronoi regions. We typically use a branch factor of 10 and six levels, resulting in a million leaf nodes. We lovingly call this the Mega-Voc.
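A compact sketch of this recursive construction, using SciPy's kmeans2 as the clustering routine (structure and names are illustrative):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def build_voc_tree(descriptors, branch_factor=10, levels=6):
    """Recursive k-means: each node splits its cell into branch_factor children."""
    if levels == 0 or len(descriptors) < branch_factor:
        return None                                   # leaf node
    centers, labels = kmeans2(descriptors, branch_factor, minit='points')
    children = [build_voc_tree(descriptors[labels == j], branch_factor, levels - 1)
                for j in range(branch_factor)]
    return {'centers': centers, 'children': children}

def quantize(tree, x, path=()):
    """Descend the tree; the path of chosen children identifies the leaf word."""
    if tree is None:
        return path
    j = int(((tree['centers'] - x) ** 2).sum(1).argmin())
    return quantize(tree['children'][j], x, path + (j,))
```

With branch_factor=10 and levels=6 this yields up to a million leaves, matching the Mega-Voc described above.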

Visualize as a tree

Vocabulary tree training: filling the tree (a sequence of figure slides) [Nister & Stewenius, CVPR'06]. Slide credit: David Nister

Vocabulary tree recognition: rank the retrieved images, or perform geometric verification [Nister & Stewenius, CVPR'06]. Slide credit: David Nister

Think about the computational advantage of the hierarchical tree vs. a flat vocabulary: with branch factor k and L levels, quantizing a descriptor takes only k·L comparisons, instead of k^L comparisons against a flat vocabulary of the same size (60 vs. one million for the Mega-Voc).

Hashing

Direct addressing: create a direct-address table with m slots, one per possible key. [Figure: a universe of keys U with slots 0-9; the actual keys K = {2, 3, 5, 8} each index their own slot, which stores the key and its satellite data.]

Direct addressing. Search operation: O(1). Problem: the range of keys can be large! 64-bit numbers already give 2^64 = 18,446,744,073,709,551,616 different keys, and a SIFT descriptor has 128 × 8 = 1024 bits.

Hashing: O(1) average-case time. Use a hash function h to compute the slot from the key k, so key k is stored in slot h(k), which in general is not k anymore. Two keys may hash to the same slot (a collision) and must then share a bucket. [Figure: actual keys k1…k5 mapped by h into a hash table T with slots 0…m-1; h(k4) and h(k5) collide.]
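For concreteness, a toy chained hash table in Python (illustrative only; Python's built-in hash stands in for h):

```python
class ChainedHashTable:
    """Hash table with chaining: colliding keys share a bucket."""
    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]       # m buckets

    def _h(self, k):
        return hash(k) % self.m                   # h maps a key to one of m slots

    def insert(self, k, v):
        self.slots[self._h(k)].append((k, v))

    def search(self, k):
        for key, v in self.slots[self._h(k)]:     # scan only this one bucket
            if key == k:
                return v
        return None
```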

Hashing. A good hash function satisfies the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots. How do we design a hash function for indexing high-dimensional data?

[Figure: how do we map a 128-d descriptor to a slot of the hash table T?]

Locality-sensitive hashing Indyk and Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality, STOC 1998.

Locality-sensitive hashing (LSH). Hash functions are locality-sensitive if, for any pair of points p, q: Pr[h(p)=h(q)] is “high” if p is close to q, and Pr[h(p)=h(q)] is “low” if p is far from q.

Locality-sensitive hashing. A family H of functions h: R^d → U is called (r, cr, P1, P2)-sensitive if, for any p, q: if d(p, q) ≤ r then Pr[h(p)=h(q)] > P1, and if d(p, q) ≥ cr then Pr[h(p)=h(q)] < P2 (with P1 > P2).

LSH function: Hamming space. Consider binary vectors, i.e., points from {0, 1}^d. The Hamming distance D(p, q) is the number of positions in which p and q differ. Example (d = 3): D(100, 011) = 3, D(010, 111) = 2.

LSH function: Hamming space. Define the hash function as h_i(p) = p_i, where p_i is the i-th bit of p. Example, selecting the 1st dimension: h(010) = 0, h(111) = 1. For a randomly chosen dimension i, Pr[h(p)≠h(q)] = D(p, q)/d (here 2/3), so Pr[h(p)=h(q)] = 1 - D(p, q)/d. Clearly, h is locality-sensitive.

LSH function: Hamming space. A k-bit locality-sensitive hash function is defined as g(p) = [h_1(p), h_2(p), …, h_k(p)]^T, where each h_i is chosen randomly and each h_i(p) yields a single bit. Then Pr(similar points collide) ≥ P1^k and Pr(dissimilar points collide) ≤ P2^k. Indyk and Motwani [1998]
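A sketch of this bit-sampling scheme in Python (the helper make_g is hypothetical; points are bit strings as in the example above):

```python
import random

def make_g(d, k, rng=random):
    """g(p) = (h_1(p), ..., h_k(p)): sample k random bit positions of a d-bit point."""
    dims = [rng.randrange(d) for _ in range(k)]
    return lambda p: tuple(p[i] for i in dims)   # p is a d-bit string such as "010"

g = make_g(d=3, k=2)
g("010"), g("111")   # points at Hamming distance D collide with probability (1 - D/d)**k
```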

LSH function: R^2 space. Consider 2-d vectors.

LSH function: R^2 space. The probability that a random hyperplane separates two unit vectors depends on the angle between them: for h(p) = sign(a·p) with a drawn at random, Pr[h(p)=h(q)] = 1 - θ(p, q)/π.

LSH pre-processing: each image is entered into L hash tables indexed by independently constructed functions g_1, g_2, …, g_L. Preprocessing space: O(LN).

LSH Querying For each hash table, return the bin indexed by gi(q), 1 ≤ i ≤ L. Perform a linear search on the union of the bins.
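Putting pre-processing and querying together, a minimal sketch (names are illustrative; any g_i constructed as above will do):

```python
from collections import defaultdict

def preprocess(points, gs):
    """Enter every point into L hash tables; table i is keyed by g_i(p). Space: O(LN)."""
    tables = [defaultdict(list) for _ in gs]
    for pid, p in enumerate(points):
        for table, g in zip(tables, gs):
            table[g(p)].append(pid)
    return tables

def query(q, points, gs, tables, dist):
    """Union the L bins indexed by g_i(q), then linear-search the union."""
    candidates = set()
    for table, g in zip(tables, gs):
        candidates.update(table[g(q)])
    return min(candidates, key=lambda pid: dist(points[pid], q), default=None)
```

Only the points that land in the same bins as q are compared exactly, which is where the speedup over a full linear scan comes from.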

W.-T. Lee and H.-T. Chen. Probing the local-feature space of interest points, ICIP 2010.

Hash family: h_{a,b}(v) = ⌊(a·v + b) / r⌋, where a is a random vector sampled from a Gaussian distribution (the dot product a·v projects each vector v onto “a line”), b is a real value chosen uniformly from the range [0, r], and r is the segment width.
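A sketch of one member of this family in Python/NumPy, assuming descriptors are 1-D float arrays (names are illustrative):

```python
import numpy as np

def make_hash(d, r, rng=np.random.default_rng()):
    """One member of the family: h(v) = floor((a . v + b) / r)."""
    a = rng.standard_normal(d)   # a ~ Gaussian: a . v projects v onto a line
    b = rng.uniform(0, r)        # b chosen uniformly from [0, r)
    return lambda v: int(np.floor((a @ v + b) / r))

h = make_hash(d=128, r=4.0)      # r is the segment width
```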

Building the hash table

Building the hash table. Segment width r = (max - min)/t: for each random projection, we get t buckets.

Building the hash table. Generate K projections and combine them to get an index into the hash table. How many buckets do we get? t^K.

Building the hash table. Example: 5 projections (K = 5) and 15 segments (t = 15) give 15^5 = 759,375 buckets in total!
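A sketch of the combined index under these parameters, assuming the projections of the database vectors fall roughly in [lo, hi] (out-of-range segments are wrapped here for simplicity; all names are illustrative):

```python
import numpy as np

def make_bucket_index(d, K, t, lo, hi, rng=np.random.default_rng()):
    """Combine K random projections, each cut into t segments: t**K buckets."""
    r = (hi - lo) / t                      # segment width = (max - min) / t
    A = rng.standard_normal((K, d))        # K Gaussian projection directions
    b = rng.uniform(0, r, size=K)          # one random offset per projection
    def index(v):
        segs = np.floor((A @ v + b - lo) / r).astype(int) % t  # K base-t digits
        out = 0
        for s in segs:                     # combine the digits into one bucket id
            out = out * t + int(s)
        return out                         # an integer in [0, t**K)
    return index

# Example from the slide: K = 5, t = 15  ->  15**5 = 759,375 buckets.
index = make_bucket_index(d=128, K=5, t=15, lo=-10.0, hi=10.0)
```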

Sketching the feature space. Natural image patches (from the Berkeley segmentation database) and noise image patches (randomly generated). Collect image patches at three different sizes: 16×16, 32×32, 64×64. Each set consists of 200,000 patches.

Patch distribution over buckets

Summary. Indexing techniques are essential for organizing a database and enabling fast matching. For indexing high-dimensional data: the inverted file, the vocabulary tree, and locality-sensitive hashing.

Resources and extended readings. LSH Matlab Toolbox: http://www.cs.brown.edu/~gregory/download.html Yeh et al., “Adaptive Vocabulary Forests for Dynamic Indexing and Category Learning,” ICCV 2007.