Recent Advances in Compact Hashing for Large-Scale Visual Search. Shih-Fu Chang, Columbia University, October 2012. Joint work with Junfeng He (Facebook), Sanjiv Kumar (Google), Wei Liu (IBM Research), and Jun Wang (IBM Research).
Outline
– Lessons learned in designing hashing functions
– The importance of balancing hash bucket size
– How to incorporate supervised information
– Prediction of NN search difficulty & hashing performance
– Demo: Bag of Hash Bits for mobile visual search
Fast Nearest Neighbor Search
– Applications: image search, texture synthesis, denoising, …
– Avoid exhaustive search (O(n) time complexity)
– Example applications: dense matching / coherence-sensitive hashing (Korman & Avidan '11), Photo Tourism patch search, image search
Locality-Sensitive Hashing [Indyk & Motwani 1998] [Datar et al. 2004]
– Hash code collision probability is proportional to the original similarity
– l: # hash tables, K: hash bits per table
– Hash functions: random projections; index by compact code
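For concreteness, a minimal sketch of the random-projection (hyperplane) LSH family; the dimensions, bit count, and helper names are illustrative, not from the talk:

```python
# Minimal random-projection (hyperplane) LSH sketch. The dimensionality,
# bit count, and names are illustrative, not from the talk.
import numpy as np

def make_hyperplanes(dim, n_bits, seed=0):
    """One random hyperplane per hash bit."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, n_bits))

def hash_codes(X, W):
    """Each point's code = sign pattern of its random projections."""
    return (X @ W > 0).astype(np.uint8)

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 128))   # toy database of SIFT-like vectors
W = make_hyperplanes(128, 32)
codes = hash_codes(X, W)               # (1000, 32): one 32-bit code per point
# Nearby points (small angle) agree on most bits with high probability,
# so collision probability tracks the original (cosine) similarity.
```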
Hash Table based Search
– O(1) search time by table lookup: the code serves as the hash bucket address
– Bucket size is important (affects accuracy & post-processing cost)
Different Approaches
– Unsupervised hashing: LSH '98, SH '08, KLSH '09, AGH '10, PCAH, ITQ '11
– Semi-supervised hashing: SSH '10, WeaklySH '10
– Supervised hashing: RBM '09, BRE '10, MLH '11, LDAH '11, ITQ '11, KSH '12
PCA + Minimize Quantization Errors (ITQ method, Gong & Lazebnik, CVPR '11)
– PCA to maximize variance in each hash dimension
– Find the optimal rotation in the subspace to minimize quantization error
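A compact sketch of the ITQ alternation, assuming the data has already been zero-centered and projected onto the top PCA directions; the iteration count and random-rotation initialization follow the common recipe, but details here are illustrative:

```python
# A compact ITQ sketch: alternate between binarizing the rotated PCA
# projections and solving an orthogonal Procrustes problem for the rotation.
import numpy as np

def itq(V, n_iter=50, seed=0):
    """V: n x c zero-centered PCA projections; returns codes and rotation."""
    rng = np.random.default_rng(seed)
    R, _ = np.linalg.qr(rng.standard_normal((V.shape[1], V.shape[1])))  # random rotation
    for _ in range(n_iter):
        B = np.sign(V @ R)                 # fix R: quantize to +/-1 codes
        U, _, Wt = np.linalg.svd(V.T @ B)  # fix B: Procrustes for best rotation
        R = U @ Wt
    return np.sign(V @ R), R
```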
Effects of Minimizing Quantization Errors
Figure: code visualizations on 580K tiny images, PCA with random rotation vs. PCA-ITQ optimal alignment (Gong & Lazebnik, CVPR '11).
Utilize Supervised Labels
Two forms of supervision: semantic category supervision and metric supervision (similar / dissimilar pairs).
Design Hash Codes to Match Supervised Information
The preferred hashing function assigns similar pairs the same bit (0/1) and dissimilar pairs different bits.
Adding Supervised Labels to PCA Hash (Wang, Kumar & Chang, CVPR '10, ICML '10)
– Relaxation: a label-fitting term plus the PCA covariance term gives an "adjusted" covariance matrix M = X S Xᵀ + η X Xᵀ, where S encodes similar (+1) and dissimilar (−1) pairs
– Solution W: top eigenvectors of the adjusted covariance matrix
– If there is no supervision (S = 0), it reduces to PCA hash
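A hedged sketch of this solution, assuming the adjusted covariance takes the form M = X S Xᵀ + η X Xᵀ as described above; η and the dense S are illustrative (in practice S is sparse and built only over labeled pairs):

```python
# Hedged sketch of the SSH solution via the adjusted covariance
# M = X S X^T + eta * X X^T. eta and the dense S are illustrative.
import numpy as np

def ssh_projections(X, S, n_bits, eta=1.0):
    """X: d x n zero-centered data; S: n x n with +1 (similar) / -1 (dissimilar)."""
    M = X @ S @ X.T + eta * (X @ X.T)     # label-fitting term + variance term
    vals, vecs = np.linalg.eigh(M)        # M is symmetric when S is
    W = vecs[:, np.argsort(vals)[::-1][:n_bits]]   # top eigenvectors
    return W                              # hash codes: sign(W.T @ X)
```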
Semi-Supervised Hashing (SSH)
Figure: precision of the top-1K results on 1 million GIST images (1% labeled, 99% unlabeled), comparing SSH against supervised RBM, random LSH, and unsupervised SH.
Problem of Orthogonal Projections
– Many buckets become empty as # bits increases
– Need to search many neighboring buckets (e.g., within Hamming radius 2) at query time
SPICA Hash (He et al., CVPR '11): explicitly optimize two terms
– Preserve similarity (search accuracy)
– Balance bucket sizes (search time): maximize the entropy of each bit, minimize the mutual information I between bits
ICA-type hashing: uses Fast ICA to find non-orthogonal projections
The Importance of Balanced Size
Figure: bucket size vs. bucket index for LSH and SPICA Hash, simulated over 1M tiny-image samples; the largest LSH bucket contains 10% of all 1M samples, while SPICA Hash keeps bucket sizes balanced. A toy version of this measurement is sketched below.
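A quick way to reproduce this kind of diagnostic on any dataset is to hash the data and histogram the bucket occupancy; a minimal sketch with random-projection LSH on synthetic correlated data (sizes and bit counts are illustrative, not the talk's setup):

```python
# Toy reproduction of the bucket-balance diagnostic: hash correlated data
# with random-projection LSH and histogram bucket occupancy.
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
X = rng.standard_normal((100_000, 64)) @ rng.standard_normal((64, 64))  # correlated data
W = rng.standard_normal((64, 16))
bits = (X @ W > 0).astype(np.int64)
bucket_ids = bits @ (1 << np.arange(16))          # pack 16 bits into a bucket id

sizes = sorted(Counter(bucket_ids.tolist()).values(), reverse=True)
print(f"buckets used: {len(sizes)} of {2**16}")
print(f"largest bucket: {100 * sizes[0] / len(X):.1f}% of the data")
```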
Different Approaches (recap): turning now to supervised hashing — RBM '09, BRE '10, MLH '11, LDAH '11, ITQ '11, KSH '12.
Better Ways to Handle Supervised Information?
– BRE [Kulis & Darrell '10] and MLH [Norouzi & Fleet '11] fit the Hamming distance between H(x_i) and H(x_j) to the labels (MLH via a hinge loss)
– But optimizing Hamming distance (D_H, an XOR-based quantity) is not easy!
A New Supervision Form: Code Inner Products (Liu, Wang, Ji, Jiang & Chang, CVPR '12)
– Supervised hashing from labeled data: fit the code inner products of the r-bit code matrix to the pairwise label matrix S (similar / dissimilar), i.e., B Bᵀ ≈ r S for codes in {−1, +1}^r
– Proof sketch: code inner product ≡ Hamming distance, since ⟨H(x_i), H(x_j)⟩ = r − 2 D_H(H(x_i), H(x_j))
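The identity is easy to verify numerically; a self-contained check (sizes illustrative):

```python
# Numeric check of the identity: for codes in {-1,+1}^r,
#   <b_i, b_j> = r - 2 * D_H(b_i, b_j).
import numpy as np

rng = np.random.default_rng(0)
r = 48
B = rng.choice([-1, 1], size=(5, r))               # five random 48-bit codes

inner = B @ B.T                                    # code inner products
hamming = (B[:, None, :] != B[None, :, :]).sum(-1) # pairwise Hamming distances
assert np.array_equal(inner, r - 2 * hamming)      # identity holds exactly
```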
Code Inner Product Enables Efficient Optimization (Liu, Wang, Ji, Jiang & Chang, CVPR 2012)
– Much easier / faster to optimize than Hamming distance, and extends to kernels
– Hashing goal restated: design hash codes whose inner products match the supervised information
Extend Code Inner Product to Kernels
Following KLSH, construct each hash function from a kernel function and m anchor samples:
h(x) = sgn( Σ_{j=1..m} a_j k̄(x_j, x) )
where a holds the hash coefficients and k̄ is the kernel with zero-mean normalization applied to k(x) over the l training samples (an l × m kernel matrix).
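A sketch of this kernel hash function, assuming an RBF kernel; the anchors, bandwidth, and the coefficient vector `a` (which KSH learns by fitting code inner products) are placeholders here:

```python
# Sketch of the kernel hash function h(x) = sgn(a . kbar(x)) with an RBF
# kernel. Anchors, bandwidth, and the coefficients `a` are placeholders.
import numpy as np

def kernel_features(X, anchors, gamma, mean=None):
    """n x m RBF kernel map to the anchors, optionally zero-mean normalized."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)
    return K if mean is None else K - mean

rng = np.random.default_rng(0)
Xtrain = rng.standard_normal((1000, 32))                  # l = 1000 samples
anchors = Xtrain[rng.choice(1000, 300, replace=False)]    # m = 300 anchors
mu = kernel_features(Xtrain, anchors, 0.1).mean(axis=0)   # zero-mean normalization

a = rng.standard_normal(300)                              # one bit's coefficients
bit = np.sign(kernel_features(Xtrain[:5], anchors, 0.1, mu) @ a)  # hash bit for 5 samples
```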
Benefits of Code Inner Product
Figure: retrieval results on CIFAR-10 (60K object images from 10 classes, 1K query images, 1K supervised labels), comparing supervised methods; KSH_0 uses spectral relaxation, KSH uses a sigmoid hashing function. Open issue: empty buckets and bucket balance are not addressed.
Speedup by Code Inner Product (CVPR 2012)
Table: training and test times at 48 bits for SSH, LDAH, BRE, MLH, KSH_0, and KSH; per-query test times are on the order of ×10⁻⁵ seconds. Code inner product yields a significant speedup.
Effect of Training Data Size (figure).
Tiny-1M (CVPR 2012)
– 1M tiny images from the web, 2K query images
– Pseudo labels: top 5% L2 neighbors; the top-5K NNs are used as supervised labels
Figure: comparison of supervised methods.
Tiny-1M: Visual Search Results (CVPR 2012)
Figure: example retrievals; the results are more visually relevant.
Comparison: KD-Tree
– O(log n) search time (e.g., tree depth ≈ 20 for 1 million nodes)
– E.g., VLFeat / FLANN tools, best-bin-first search strategy
– Curse of dimensionality: needs backtracking
– Tree indexes may be hard to store on small devices
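For reference, an exact KD-tree query via SciPy's cKDTree (sizes illustrative); in high dimensions the backtracking cost grows toward that of exhaustive search, which is part of the motivation for hashing:

```python
# Exact KD-tree query with SciPy's cKDTree. At low dimension this is fast;
# in high dimensions backtracking pushes the cost toward exhaustive search.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.random((100_000, 16))                 # 100K points, 16-D
tree = cKDTree(X)                             # depth ~ log2(n)
dist, idx = tree.query(rng.random((5, 16)), k=10)   # exact 10-NN for 5 queries
```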
Comparison of Hashing vs. KD-Tree
Figure: supervised hashing vs. Anchor Graph Hashing vs. KD-tree on the Photo Tourism patch set (Notre Dame subset, 103K samples), 512-D GIST features.
Understanding the Difficulty of Approximate Nearest Neighbor Search (He, Kumar & Chang, ICML 2012)
– How difficult is approximate NN search in a given dataset?
– x is an ε-approximate NN of query q if d(q, x) ≤ (1 + ε) d(q, x_nn)
– Toy example: if all points lie nearly the same distance from q, search is not meaningful!
– Goal: a concrete measure of the difficulty of search in a dataset
Relative Contrast (He, Kumar & Chang, ICML 2012)
– A naïve search approach: randomly pick a point and compare it to the true NN
– Relative contrast C_r = D_mean / D_min: the expected distance from q to a random database point over the distance to its nearest neighbor
– High relative contrast ⇒ easier search; if C_r ≈ 1, search is not meaningful
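Relative contrast is straightforward to estimate empirically; a minimal sketch, with data, dimensionality, and the L2 metric chosen for illustration:

```python
# Direct empirical estimate of relative contrast C_r = D_mean / D_min,
# averaged over queries.
import numpy as np

def relative_contrast(X, queries):
    ratios = []
    for q in queries:
        d = np.linalg.norm(X - q, axis=1)     # distances to all database points
        ratios.append(d.mean() / d.min())     # random-point vs. true-NN distance
    return float(np.mean(ratios))

rng = np.random.default_rng(0)
print(relative_contrast(rng.random((10_000, 10)), rng.random((20, 10))))     # low-D: C_r well above 1
print(relative_contrast(rng.random((10_000, 1000)), rng.random((20, 1000)))) # high-D: C_r near 1
```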
Estimation of Relative Contrast
With the CLT and a binomial approximation, C_r can be estimated in closed form from: φ, the standard Gaussian CDF; σ′, a function of data properties (dimensionality and sparsity); n, the data size; and p, the L_p distance exponent.
Synthetic Data (sampled randomly from U[0,1])
Figure: relative contrast vs. dimensionality and sparsity; higher dimensionality is bad, sparser vectors are good (s: probability of a non-zero element in each dimension; d: feature dimension).
Synthetic Data (sampled randomly from U[0,1])
Figure: relative contrast vs. p and database size; lower p is good, a larger database is good.
Predict Hashing Performance on Real-World Data
Table: dimensionality (d), sparsity (s), and relative contrast (C_r, for p = 1) of SIFT, GIST, Color Histogram, and ImageNet BoW features; higher relative contrast predicts better 16-bit LSH performance.
Mobile Search System by Hashing
Requirements: light computing, low bit rate, big-data indexing.
He, Feng, Liu, Cheng, Lin, Chung, Chang. Mobile Product Search with Bag of Hash Bits and Boundary Reranking. CVPR 2012.
Estimate the Complexity
– 500 local features per image: feature size ~128 KB, i.e., more than 10 seconds for transmission over 3G
– Database indexing: 1 million images yield 0.5 billion local features, so finding matched features becomes challenging
– Idea: directly compute compact hash codes on mobile devices
Approach: Hashing
– Each local feature is coded as hash bits: locality-sensitive, efficient in high dimensions
– Each image is represented as a Bag of Hash Bits
Bit Reuse for Multi-Table Hashing
– To reduce transmission size, reuse a single hash-bit pool by random subsampling
– Compute one optimal hash-bit pool (e.g., 80 bits, PCA Hash or SPICA Hash); each table (Table 1, Table 2, …, Table 11) indexes a random subset of the pool's bits, and search results are the union over tables (sketched below)
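A minimal sketch of the bit-reuse idea, using the 80-bit pool and 11 tables mentioned on the slide; the subset size (24 bits per table here) is an assumption for illustration:

```python
# Sketch of bit reuse: the phone sends one 80-bit pool code per feature;
# the server builds several tables, each keyed on a random subset of pool
# bits, and unions the candidates. 24 bits per table is an assumption.
import numpy as np

rng = np.random.default_rng(0)
POOL_BITS, N_TABLES, TABLE_BITS = 80, 11, 24
subsets = [rng.choice(POOL_BITS, TABLE_BITS, replace=False) for _ in range(N_TABLES)]

def table_keys(pool_code):
    """pool_code: length-80 0/1 array (the only thing transmitted)."""
    return [tuple(pool_code[s]) for s in subsets]   # one lookup key per table

feature_code = rng.integers(0, 2, POOL_BITS)        # one local feature's pool code
keys = table_keys(feature_code)   # probe all 11 tables, union the matched candidates
```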
Rerank Results with Boundary Features
– Use automatic salient-object segmentation for every image in the DB [Cheng et al., CVPR 2011]
– Compute boundary features: normalized central distance, Fourier magnitude
– Invariance: translation, scaling, rotation
Boundary Feature – Central Distance
Figure: sample the contour, compute the distance to center D(n), and take its FFT F(n); the normalized magnitudes form the descriptor.
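A minimal sketch of the central-distance descriptor, assuming an ordered contour from the segmentation step; the sample and coefficient counts are illustrative:

```python
# Sketch of the central-distance boundary descriptor.
import numpy as np

def boundary_descriptor(contour, n_samples=128, n_coeffs=32):
    """contour: (m, 2) boundary points ordered along the object outline."""
    idx = np.linspace(0, len(contour) - 1, n_samples).astype(int)
    pts = contour[idx]
    center = pts.mean(axis=0)                  # subtracting center: translation invariance
    D = np.linalg.norm(pts - center, axis=1)   # central-distance signal D(n)
    F = np.abs(np.fft.fft(D))                  # magnitude only: rotation / start-point invariance
    return F[1:n_coeffs + 1] / F[0]            # divide by F(0): scale invariance
```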
Reranking with Boundary Features (figure).
Mobile Product Search System: Bag of Hash Bits and Boundary Features (video demo, 52″)
He, Feng, Liu, Cheng, Lin, Chung, Chang. Mobile Product Search with Bag of Hash Bits and Boundary Reranking. CVPR 2012.
– Server: 1 million product images crawled from Amazon, eBay, and Zappos; hundreds of categories: shoes, clothes, electrical devices, groceries, kitchen supplies, movies, etc.
– Speed: feature extraction ~1 s; transmission 80 bits/feature, ~1 KB/image; server search ~0.4 s; download / display 1–2 s
Performance
– Baseline [Chandrasekhar et al., CVPR '10]: client compresses local features with CHoG; server uses BoW with a vocabulary tree (1M codes)
– The hash-based system achieves 30% higher recall and a 6×–30× search speedup over this baseline
Summary
Some ideas discussed:
– Bucket balancing is important
– Code inner product: an efficient form of supervised hashing
– Insights on predicting search difficulty
– Large-scale mobile search: a good test case for hashing
Open issues:
– Supervised hashing vs. attribute discovery
– Hashing beyond point-to-point search
– Hashing that incorporates structured relations (spatio-temporal)
References
– (Supervised Kernel Hash) W. Liu, J. Wang, R. Ji, Y. Jiang, and S.-F. Chang. Supervised Hashing with Kernels. CVPR 2012.
– (Difficulty of Nearest Neighbor Search) J. He, S. Kumar, and S.-F. Chang. On the Difficulty of Nearest Neighbor Search. ICML 2012.
– (Hash-Based Mobile Product Search) J. He, T. Lin, J. Feng, X. Liu, and S.-F. Chang. Mobile Product Search with Bag of Hash Bits and Boundary Reranking. CVPR 2012.
– (Hashing with Graphs) W. Liu, J. Wang, S. Kumar, and S.-F. Chang. Hashing with Graphs. ICML 2011.
– (Iterative Quantization) Y. Gong and S. Lazebnik. Iterative Quantization: A Procrustean Approach to Learning Binary Codes. CVPR 2011.
– (Semi-Supervised Hash) J. Wang, S. Kumar, and S.-F. Chang. Semi-Supervised Hashing for Scalable Image Retrieval. CVPR 2010.
– (ICA Hashing) J. He, R. Radhakrishnan, S.-F. Chang, and C. Bauer. Compact Hashing with Joint Optimization of Search Accuracy and Time. CVPR 2011.