Indexing Techniques Mei-Chen Yeh
Last week: matching two sets of features. Strategy 1: convert each set to a fixed-length feature vector (bag-of-words) and use a conventional proximity measure. Strategy 2: build point correspondences.
Last week: bag-of-words — build a visual vocabulary of codewords and represent each image by its codeword frequency histogram.
Matching local features: building patch correspondences between Image 1 and Image 2. To generate candidate matches, find patches that have the most similar appearance (e.g., lowest SSD). Slide credits: Prof. Kristen Grauman
Matching local features: building patch correspondences. Simplest approach: compare them all and take the closest (or the closest k, or all matches within a thresholded distance). Slide credits: Prof. Kristen Grauman
Indexing local features: each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT).
Indexing local features: when we see close points in feature space, we have similar descriptors, which indicates similar local content.
Problem statement: with potentially thousands of features per image, and hundreds to millions of images to search, how can we efficiently find the images that are relevant to a new query image?
Scalability matters! To get a sense of scale: 50 thousand images, printed on paper and stacked, make a pile about 4 m high; 110 million images stacked the same way would reach roughly the height of Mount Everest. For further perspective, Google image search not long ago claimed to index 2 billion images, although based on meta-data, whereas here retrieval is based on image content. With about 20 desktop systems like the one in the demo, it may be possible to build a web-scale content-based image search engine — that is the motivation; the contribution of the paper is the scalability of recognition and retrieval. Slide credit: Nistér and Stewénius
The Nearest-Neighbor Search Problem. Given a set S of n points in d dimensions and a query point q, which point in S is closest to q? Time complexity of a linear scan: O(dn).
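To make the O(dn) linear scan concrete, here is a minimal brute-force nearest-neighbor sketch in NumPy; the array sizes and names are illustrative, not from the lecture.

```python
import numpy as np

def linear_scan_nn(S, q):
    """Return the index of the point in S (n x d) closest to query q (d,).

    One pass over all n points, d work per point: O(dn) overall.
    """
    # Squared Euclidean distance from q to every point in S.
    dists = np.sum((S - q) ** 2, axis=1)
    return int(np.argmin(dists))

# Toy usage: n = 1000 points in d = 128 dimensions (SIFT-sized).
rng = np.random.default_rng(0)
S = rng.standard_normal((1000, 128))
q = rng.standard_normal(128)
print("nearest index:", linear_scan_nn(S, q))
```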
The Nearest-Neighbor Search Problem. r-near neighbor: for any query q, return a point p ∈ S such that ‖p − q‖ ≤ r (if one exists). c-approximate r-near neighbor: for any query q, return a point p′ ∈ S such that ‖p′ − q‖ ≤ c·r whenever some point p ∈ S satisfies ‖p − q‖ ≤ r.
Today: indexing local features — inverted file, vocabulary tree, locality-sensitive hashing.
Indexing local features: inverted file
Indexing local features: inverted file. For text documents, an efficient way to find all pages on which a word occurs is to use an index. We want to find all images in which a feature occurs: page ↔ image, word ↔ feature. To use this idea, we'll need to map our features to "visual words".
Text retrieval vs. image search: what makes the two problems similar, and what makes them different?
Visual words. E.g., SIFT descriptor space: each point is 128-dimensional. Extract some local features from a number of images… Slide credit: D. Nistér, CVPR 2006
Each point is a local descriptor, e.g. SIFT vector.
Example: Quantize into 3 words
Visual words: map high-dimensional descriptors to tokens/"words" by quantizing the feature space. Quantize via clustering and let the cluster centers be the prototype "words". Determine which word to assign to each new image region by finding the closest cluster center.
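A minimal sketch of this quantization step using scikit-learn's k-means (vocabulary size and variable names are illustrative assumptions): cluster training descriptors, keep the centers as the visual words, and assign each new descriptor to its closest center.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_descriptors = rng.standard_normal((10000, 128))   # e.g., SIFT from many images

# Build the visual vocabulary: cluster centers are the prototype "words".
k = 3   # tiny vocabulary, as in the 3-word example above; real vocabularies are far larger
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(train_descriptors)

# Assign each new image region's descriptor to the closest cluster center.
new_descriptors = rng.standard_normal((500, 128))
word_ids = kmeans.predict(new_descriptors)    # one visual-word id per descriptor

# Bag-of-words histogram for one image.
bow = np.bincount(word_ids, minlength=k)
print(bow)
```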
Visual words Each group of patches belongs to the same visual word! Figure from Sivic & Zisserman, ICCV 2003 CS 376 Lecture 18
Visual vocabulary formation — issues: sampling strategy (where to extract features? fixed locations or interest points?); clustering / quantization algorithm; what corpus provides the features (a universal vocabulary?); vocabulary size / number of words; weight of each word?
Inverted file index: the index maps each word to the ids of the images that contain it. Why does the index give us a significant gain in efficiency?
Inverted file index: a query image is matched to database images that share visual words.
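A minimal inverted-file sketch (all names are hypothetical): the index maps each visual word to the ids of database images containing it, so a query only touches images that share at least one word with it.

```python
from collections import defaultdict, Counter

def build_inverted_index(db_words):
    """db_words: {image_id: iterable of visual-word ids observed in that image}."""
    index = defaultdict(set)           # word id -> set of image ids
    for image_id, words in db_words.items():
        for w in words:
            index[w].add(image_id)
    return index

def query(index, query_words):
    """Rank database images by how many visual words they share with the query."""
    votes = Counter()
    for w in set(query_words):
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return votes.most_common()         # [(image_id, shared-word count), ...]

db = {"img1": [3, 7, 7, 42], "img2": [7, 9], "img3": [1, 3, 42]}
index = build_inverted_index(db)
print(query(index, [3, 42, 9]))        # img1 and img3 share 2 words, img2 shares 1
```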
tf-idf weighting (term frequency – inverse document frequency): describe the frequency of each word within an image, and decrease the weights of words that appear often in the database. Discriminative words (e.g., "economic", "trade") get higher weight; common words (e.g., "the", "most", "we") get lower weight. This is the standard weighting for text retrieval, measuring a word's importance in a particular document.
tf-idf weighting (term frequency – inverse document frequency): the weight of word i in document d is t_i = (n_id / n_d) · log(N / n_i), where n_id is the number of occurrences of word i in document d, n_d is the number of words in document d, n_i is the number of documents in the whole database in which word i occurs, and N is the total number of documents in the database. Standard weighting for text retrieval.
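The same formula written as a small sketch, assuming the images are already represented as bag-of-words count vectors (variable names are illustrative):

```python
import numpy as np

def tfidf_weights(bow_matrix):
    """bow_matrix: N x V array, bow_matrix[d, i] = occurrences of word i in document d.

    Returns an N x V matrix of weights t_id = (n_id / n_d) * log(N / n_i).
    """
    N = bow_matrix.shape[0]
    n_d = bow_matrix.sum(axis=1, keepdims=True)     # words per document
    n_i = np.count_nonzero(bow_matrix, axis=0)      # documents containing word i
    tf = bow_matrix / np.maximum(n_d, 1)
    idf = np.log(N / np.maximum(n_i, 1))            # guard against words that never occur
    return tf * idf

bows = np.array([[2, 0, 1],
                 [1, 1, 0],
                 [0, 3, 0]])
print(tfidf_weights(bows))
```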
Bag-of-Words + Inverted file. Bag-of-words representation: http://people.cs.ubc.ca/~lowe/keypoints/ Inverted file: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html
D. Nistér and H. Stewénius. Scalable Recognition with a Vocabulary Tree, CVPR 2006.
Vocabulary tree construction: we run k-means on the descriptor space. In this setting, k defines what we call the branch factor of the tree, which indicates how fast the tree branches; in this illustration, k is three. We then run k-means again, recursively on each of the resulting quantization cells. This defines the vocabulary tree, which is essentially a hierarchical set of cluster centers and their corresponding Voronoi regions. We typically use a branch factor of 10 and six levels, resulting in a million leaf nodes. We lovingly call this the Mega-Voc.
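A minimal hierarchical k-means sketch of this construction (not the authors' code; branch factor, depth, and all names are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocab_tree(descriptors, branch_factor=3, depth=2, min_points=50):
    """Recursively k-means-cluster the descriptor space into a vocabulary tree.

    Each internal node stores its k cluster centers; leaves act as visual words.
    """
    node = {"children": [], "centers": None}
    if depth == 0 or len(descriptors) < max(branch_factor, min_points):
        return node                                    # leaf node
    km = KMeans(n_clusters=branch_factor, n_init=4, random_state=0).fit(descriptors)
    node["centers"] = km.cluster_centers_
    labels = km.labels_
    for c in range(branch_factor):
        child = build_vocab_tree(descriptors[labels == c], branch_factor,
                                 depth - 1, min_points)
        node["children"].append(child)
    return node

def lookup(node, descriptor, path=()):
    """Propagate one descriptor down the tree; the path of branch choices is its word id."""
    if node["centers"] is None:
        return path
    dists = np.linalg.norm(node["centers"] - descriptor, axis=1)
    c = int(np.argmin(dists))
    return lookup(node["children"][c], descriptor, path + (c,))

rng = np.random.default_rng(0)
data = rng.standard_normal((2000, 128))
tree = build_vocab_tree(data, branch_factor=3, depth=2)
print(lookup(tree, data[0]))   # e.g., (2, 0): the branch chosen at each level
```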
Visualize as a tree
Vocabulary tree training: filling the tree — database descriptors are propagated down the tree to the leaf nodes. [Nistér & Stewénius, CVPR 2006] Slide credit: David Nistér
Vocabulary tree recognition: the most similar database images are retrieved for the query; optionally, perform geometric verification on the retrieved candidates. [Nistér & Stewénius, CVPR 2006] Slide credit: David Nistér
Think about the computational advantage of the hierarchical tree vs. a flat vocabulary!
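As a rough back-of-the-envelope comparison (using the settings quoted above — branch factor 10, six levels, about 10^6 leaf words): quantizing one descriptor against a flat vocabulary costs on the order of 10^6 distance computations, while descending the tree costs only about 10 comparisons per level, i.e., 10 × 6 = 60 — roughly four orders of magnitude fewer per descriptor, at the price of possibly ending in a slightly suboptimal leaf.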
Hashing
Direct addressing: create a direct-address table with m slots, one slot per possible key in the universe U of keys; each actual key k ∈ K indexes directly into its slot, which stores the key and its satellite data.
Direct addressing: the search operation is O(1). Problem: the range of keys can be large! 64-bit numbers ⇒ 2^64 = 18,446,744,073,709,551,616 different keys; a SIFT descriptor has 128 × 8 bits.
Hashing: O(1) average-case time. Use a hash function h to compute the slot from the key k — the slot index h(k) is generally not k itself. Collisions: distinct keys may hash to the same slot and share a bucket in the hash table T.
Hashing: a good hash function satisfies the assumption of simple uniform hashing — each key is equally likely to hash to any of the m slots. How do we design a hash function for indexing high-dimensional data?
How do we map a 128-d descriptor to a slot in a hash table T?
Locality-sensitive hashing Indyk and Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality, STOC 1998.
Locality-sensitive hashing (LSH): hash functions are locality-sensitive if, for any pair of points p, q, we have: Pr[h(p)=h(q)] is "high" if p is close to q; Pr[h(p)=h(q)] is "low" if p is far from q.
Locality Sensitive Hashing: a family H of functions h: R^d → U is called (r, cr, P1, P2)-sensitive if, for any p, q: if ‖p − q‖ ≤ r then Pr[h(p)=h(q)] > P1; if ‖p − q‖ ≥ c·r then Pr[h(p)=h(q)] < P2 (with P1 > P2).
LSH Function: Hamming space. Consider binary vectors, i.e., points from {0, 1}^d. The Hamming distance D(p, q) is the number of positions on which p and q differ. Example (d = 3): D(100, 011) = 3, D(010, 111) = 2.
LSH Function: Hamming space. Define the hash function h_i(p) = p_i, where p_i is the i-th bit of p. Example: select the 1st dimension, so h(010) = 0 and h(111) = 1. For a randomly chosen bit position, Pr[h(p)≠h(q)] = D(p, q)/d (here 2/3), and therefore Pr[h(p)=h(q)] = 1 − D(p, q)/d. Clearly, h is locality-sensitive.
LSH Function: Hamming space. A k-bit locality-sensitive hash function is defined as g(p) = [h_1(p), h_2(p), …, h_k(p)]^T, where each h_i is a randomly chosen bit-sampling function and contributes a single bit. Then Pr[g(p)=g(q)] = (1 − D(p, q)/d)^k, so Pr(similar points collide) ≥ (1 − r/d)^k and Pr(dissimilar points collide) ≤ (1 − cr/d)^k. Indyk and Motwani [1998]
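A minimal sketch of the k-bit bit-sampling function g for binary vectors (the sampled positions and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 128, 12                                       # vector length, number of sampled bits
positions = rng.choice(d, size=k, replace=False)     # the randomly chosen h_i's

def g(p):
    """Concatenate k sampled bits of p into a hashable bucket key."""
    return tuple(p[positions])

p = rng.integers(0, 2, size=d)
q = p.copy()
q[:8] ^= 1                                           # flip 8 bits: Hamming distance 8
# Per-bit collision probability is 1 - D(p, q)/d = 1 - 8/128; for k bits, (1 - 8/128)**k.
print(g(p) == g(q), (1 - 8 / 128) ** k)
```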
LSH Function: R^2 space. Consider 2-d vectors.
LSH Function: R^2 space. The probability that a random hyperplane separates two unit vectors depends on the angle between them: with h(p) = sign(a·p) for a random direction a, Pr[h(p) = h(q)] = 1 − θ(p, q)/π.
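A minimal random-hyperplane sketch (sign-of-dot-product hash; the code just checks the 1 − θ/π collision rate empirically, with illustrative names and a chosen angle):

```python
import numpy as np

rng = np.random.default_rng(0)

def hyperplane_hash(v, a):
    """1 if v falls on the positive side of the hyperplane with normal a, else 0."""
    return int(np.dot(a, v) >= 0)

# Two unit vectors at a known angle theta.
theta = np.pi / 4
p = np.array([1.0, 0.0])
q = np.array([np.cos(theta), np.sin(theta)])

# Empirical collision rate over many random hyperplanes vs. 1 - theta/pi.
normals = rng.standard_normal((50000, 2))
collisions = np.mean((normals @ p >= 0) == (normals @ q >= 0))
print(collisions, 1 - theta / np.pi)   # both should be close to 0.75
```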
LSH pre-processing: each image is entered into L hash tables, indexed by independently constructed functions g_1, g_2, …, g_L. Preprocessing space: O(LN).
LSH querying: for each of the L hash tables, return the bin indexed by g_i(q), 1 ≤ i ≤ L; then perform a linear search over the union of the returned bins.
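Putting the two steps together, a minimal sketch of building L tables and querying them (here the g_i are random-hyperplane sign bits; the parameters and names are illustrative assumptions):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
d, k, L = 128, 10, 8                      # dimensionality, bits per key, number of tables

# Independently constructed g_1 ... g_L: k random-hyperplane sign bits each.
projections = [rng.standard_normal((k, d)) for _ in range(L)]

def g(x, A):
    """k sign-of-projection bits packed into a tuple key."""
    return tuple((A @ x >= 0).astype(int))

def preprocess(points):
    tables = [defaultdict(list) for _ in range(L)]   # O(L * N) space
    for idx, x in enumerate(points):
        for table, A in zip(tables, projections):
            table[g(x, A)].append(idx)
    return tables

def query(tables, points, q):
    # Union of the L bins indexed by g_i(q), then a linear search inside it.
    candidates = set()
    for table, A in zip(tables, projections):
        candidates.update(table.get(g(q, A), []))
    if not candidates:
        return None
    cand = np.array(sorted(candidates))
    dists = np.linalg.norm(points[cand] - q, axis=1)
    return int(cand[np.argmin(dists)])

points = rng.standard_normal((5000, d))
tables = preprocess(points)
print(query(tables, points, points[123] + 0.01 * rng.standard_normal(d)))   # likely 123
```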
W.-T. Lee and H.-T. Chen. Probing the local-feature space of interest points, ICIP 2010.
Hash family: the dot product a·v projects each vector v onto "a line". a: random vector sampled from a Gaussian distribution; b: real value chosen uniformly from the range [0, r]; r: segment width. The projected value a·v + b is then quantized into segments of width r.
Building the hash table
Building the hash table. r: segment width = (max − min)/t, where [min, max] is the range of the projected values; so each random projection gives t buckets.
Building the hash table. Generate K projections and combine them to get an index into the hash table. How many buckets do we get? t^K.
Building the hash table. Example: 5 projections (K = 5), 15 segments each (t = 15): 15^5 = 759,375 buckets in total!
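A minimal sketch of this construction (not the authors' code; the projection range [lo, hi] and all names are assumptions): K Gaussian projections, each quantized into t equal-width segments, combined into a single bucket index out of t^K.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, t = 128, 5, 15                        # descriptor dim, projections, segments per projection

A = rng.standard_normal((K, d))             # a_1 ... a_K, Gaussian random vectors
b = rng.uniform(0.0, 1.0, size=K)           # offsets, one per projection

# Assumed range of projected values; segment width r = (hi - lo) / t.
lo, hi = -30.0, 30.0
r = (hi - lo) / t

def bucket_index(v):
    """Map a descriptor to one of t**K buckets (here 15**5 = 759,375)."""
    proj = A @ v + b * r                    # K projected values, offset by b in [0, r]
    segs = np.clip(np.floor((proj - lo) / r), 0, t - 1).astype(int)   # segment id per projection
    # Combine the K segment ids into a single base-t index.
    idx = 0
    for s in segs:
        idx = idx * t + int(s)
    return idx

v = rng.standard_normal(d)
print(bucket_index(v))
```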
Sketching the feature space. Natural image patches (from the Berkeley segmentation database) vs. noise image patches (randomly generated). Collect patches at three different sizes — 16×16, 32×32, 64×64 — where each set consists of 200,000 patches.
Patch distribution over buckets
Summary: indexing techniques are essential for organizing a database and for enabling fast matching. For indexing high-dimensional data: inverted file, vocabulary tree, locality-sensitive hashing.
Resources and extended readings LSH Matlab Toolbox http://www.cs.brown.edu/~gregory/download.html Yeh et al., “Adaptive Vocabulary Forests for Dynamic Indexing and Category Learning,” ICCV 2007.