Spectral Hashing. Y. Weiss (Hebrew U.), A. Torralba (MIT), Rob Fergus (NYU)
Motivation: What does the world look like? Object recognition for large-scale search needs to capture high-level image statistics, and also the relationships between objects and the scene in general.
Semantic Hashing [Salakhutdinov & Hinton, 2007]: a semantic hash function maps a query image to a binary code, used as an address into the database; images whose codes lie near the query address are semantically similar. Quite different to a (conventional) randomizing hash.
1. Locality Sensitive Hashing [Gionis, Indyk & Motwani, 1999]: take random projections of the data (e.g. the Gist descriptor) and quantize each projection with a few bits. No learning involved.
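The random-projection idea above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function name `lsh_codes` and its interface are ours, and for simplicity each projection is quantized with a single sign bit.

```python
import numpy as np

def lsh_codes(X, n_bits, seed=0):
    """Hash rows of X to n_bits-bit codes via random projections.

    Each bit is the sign of one random projection, quantized to {0, 1}.
    (Sketch of the Gionis/Indyk/Motwani idea; names are ours.)
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.standard_normal((d, n_bits))   # random projection directions
    return (X @ R > 0).astype(np.uint8)    # one sign bit per projection

# nearby points tend to share bits; distant points tend not to
X = np.array([[1.0, 1.0], [1.1, 0.9], [-5.0, 4.0]])
codes = lsh_codes(X, n_bits=16)
```

Because no learning is involved, the codes depend only on the random seed, which is why LSH needs many bits to compete with learned codes.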
Toy Example 2D uniform distribution
2. Boosting: a modified form of BoostSSC [Shakhnarovich, Viola & Darrell, 2003]. Learn a threshold & dimension for each bit (a weak classifier); positive examples are pairs of similar images, negative examples are pairs of unrelated images.
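A single weak learner of this kind can be sketched as an exhaustive search over (dimension, threshold), scoring each candidate bit by how often it agrees on similar pairs and disagrees on dissimilar ones. This is a hedged sketch of the BoostSSC-style step, not the authors' implementation; `best_bit` and its interface are ours.

```python
import numpy as np

def best_bit(X1, X2, y, w):
    """Pick the (dimension, threshold) whose bit agrees on similar pairs
    and disagrees on dissimilar ones, minimising weighted error.

    X1, X2 : (n, d) arrays, the two feature vectors of each pair
    y      : (n,) labels, +1 for similar pairs, -1 for dissimilar
    w      : (n,) boosting weights over the pairs
    """
    n, d = X1.shape
    best = (0, 0.0, np.inf)
    for dim in range(d):
        for t in np.unique(np.concatenate([X1[:, dim], X2[:, dim]])):
            b1 = X1[:, dim] > t
            b2 = X2[:, dim] > t
            agree = np.where(b1 == b2, 1, -1)   # +1 if the pair gets the same bit
            err = np.sum(w * (agree != y))      # weighted disagreement with label
            if err < best[2]:
                best = (dim, t, err)
    return best

# toy pairs: two similar pairs below the gap, one dissimilar pair across it
X1 = np.array([[0.1], [0.2], [0.9]])
X2 = np.array([[0.15], [0.25], [0.1]])
y = np.array([1, 1, -1])
dim, t, err = best_bit(X1, X2, y, np.ones(3) / 3)
```

In the full algorithm the pair weights `w` would be re-weighted after each bit, as in standard boosting.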
Toy Example 2D uniform distribution
3. Restricted Boltzmann Machine (RBM), the building block of Deep Belief Networks [Hinton & Salakhutdinov, Science 2006]. A single RBM layer has visible and hidden units connected by symmetric weights W; units are binary & stochastic. The layer attempts to reconstruct the input at the visible layer from the activation of the hidden layer.
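The reconstruction-based training described above can be sketched as one contrastive-divergence (CD-1) update. This is a minimal illustration of the standard CD-1 step, not the authors' training code; all variable names are ours and details like momentum and weight decay are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, bh, bv, lr=0.1):
    """One CD-1 update for a single RBM layer.

    v0 : (n, d_vis) batch of visible vectors
    W  : (d_vis, d_hid) symmetric weights; bh, bv: hidden/visible biases
    """
    # up-pass: activate hidden units from the data
    ph0 = sigmoid(v0 @ W + bh)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # stochastic binary hiddens
    # down-pass: reconstruct visibles, then re-activate hiddens
    pv1 = sigmoid(h0 @ W.T + bv)
    ph1 = sigmoid(pv1 @ W + bh)
    # move toward data statistics, away from reconstruction statistics
    W = W + lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    bh = bh + lr * (ph0 - ph1).mean(axis=0)
    bv = bv + lr * (v0 - pv1).mean(axis=0)
    return W, bh, bv

# tiny batch of binary data, small random weights
v0 = (rng.random((8, 6)) < 0.5).astype(float)
W = 0.01 * rng.standard_normal((6, 4))
W, bh, bv = cd1_step(v0, W, np.zeros(4), np.zeros(6))
```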
Multi-Layer RBM: non-linear dimensionality reduction. Input Gist vector (512 dimensions, linear units at the first layer) → Layer 1 (w1, 512 units) → Layer 2 (w2, 256 units) → Layer 3 (w3, N units) → output binary code (N-dimensional).
Toy Example 2D uniform distribution
2-D Toy example: retrieved neighborhoods around a query point for 3-bit, 7-bit and 15-bit codes. Colors show Hamming distance from the query point: red = 0 bits, green = 1 bit, blue = 2 bits, black = >2 bits.
Toy results: colors show Hamming distance (red = 0 bits, green = 1 bit, blue = 2 bits).
Spectral Hash: non-linear dimensionality reduction maps real-valued vectors (query image and database images alike) to binary codes used as addresses, so the query address retrieves semantically similar images. Quite different to a (conventional) randomizing hash.
Spectral Hashing (NIPS '08). Assume points are embedded in Euclidean space. How do we binarize so that Hamming distance approximates Euclidean distance? E.g. Ham_Dist(10001010, 11101110) = 3.
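The Hamming distance used throughout is just the count of differing bit positions; a two-line helper reproduces the slide's example:

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length binary strings."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

d = hamming("10001010", "11101110")  # the slide's example: distance 3
```

On packed integer codes the same quantity is `bin(a ^ b).count("1")`, which is what makes Hamming-ball lookups so cheap in practice.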
Spectral Hashing theory. Want to minimize trace(Y^T (D - W) Y) subject to: each bit is on 50% of the time, and the bits are independent. Sadly, this is NP-complete. Relax the problem by letting Y be continuous: it then becomes an eigenvector problem on the graph Laplacian D - W.
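The relaxed problem can be demonstrated numerically: build a Gaussian affinity matrix W, form the Laplacian D - W, take the eigenvectors with the smallest non-trivial eigenvalues, and threshold them at zero. This is a sketch of the relaxation for in-sample points only (it does not handle novel points, which is the subject of the next slides); the function name and the Gaussian-kernel choice are ours.

```python
import numpy as np

def laplacian_codes(X, k, sigma=1.0):
    """Threshold the k smallest non-trivial eigenvectors of D - W."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))   # Gaussian affinities
    D = np.diag(W.sum(1))
    L = D - W                            # graph Laplacian
    vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    Y = vecs[:, 1:k + 1]                 # skip the trivial constant eigenvector
    return (Y > 0).astype(np.uint8)

# two tight clusters: the first bit should cut between them
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
codes = laplacian_codes(X, k=1)
```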
Nyström Approximation: a method for approximating eigenfunctions by interpolating between existing data points. It requires evaluating the distance to the existing data, so the cost grows linearly with the number of points; it also overfits badly in practice.
What about a novel data point? We need a function to map new points into the space. Take the limit of the eigenvectors as n → ∞ (the graph Laplacian must be carefully normalized). An analytical form of the eigenfunctions exists for certain distributions (uniform, Gaussian), giving constant-time evaluation for a new point. For the uniform case, the eigenfunctions depend only on the extent of the distribution (b - a).
Eigenfunctions for uniform distribution
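For a 1-D uniform distribution on [a, b], the limiting eigenfunctions are the Neumann eigenfunctions of the Laplacian on that interval, which can be written in cosine form (equivalent, up to a phase, to the sine form in the paper). The sketch below uses the raw Laplacian eigenvalue (k·π/(b-a))²; the paper's eigenvalue expression is a monotone function of this quantity, so the ordering used to pick bits is the same. Function names are ours.

```python
import numpy as np

def eigenfunction(x, k, a, b):
    """k-th 1-D Laplacian eigenfunction with Neumann boundary conditions
    on [a, b]. Note it depends only on the extent b - a and where x sits
    inside it, as the slide states."""
    return np.cos(k * np.pi * (x - a) / (b - a))

def eigenvalue(k, a, b):
    """Grows like (k*pi/(b-a))**2: slower oscillation (small k) and larger
    extent (big b - a) both give smaller eigenvalues, so those cuts are
    picked first."""
    return (k * np.pi / (b - a)) ** 2
```

This is why long dimensions of the data receive more bits: their eigenfunctions have smaller eigenvalues at every k.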
The Algorithm. Input: data {xi} of dimensionality d and the desired number of bits k. (1) Fit a multidimensional rectangle to the data: run PCA to align the axes, then bound a uniform distribution. (2) For each dimension, calculate the k smallest eigenfunctions, giving dk candidates. (3) Pick the k candidates with the smallest eigenvalues. (4) Threshold those eigenfunctions at zero to give the binary codes.
1. Fit Multidimensional Rectangle: run PCA to align the axes, then bound a uniform distribution.
2. Calculate Eigenfunctions
3. Pick the k Eigenfunctions with smallest Eigenvalues (e.g. k = 3)
4. Threshold chosen Eigenfunctions
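The four steps above can be sketched end-to-end. This is an illustrative implementation under our own simplifications (raw (k·π/extent)² eigenvalues for ranking, min/max bounding of the PCA-rotated data); the authors' released code handles further details, and `spectral_hash` is our name.

```python
import numpy as np

def spectral_hash(X, n_bits):
    """Sketch of the pipeline: PCA-align, bound a uniform box, enumerate
    1-D eigenfunctions per dimension, keep the n_bits with the smallest
    eigenvalues, threshold at zero."""
    # 1. fit a multidimensional rectangle: PCA, then bound each dimension
    mu = X.mean(0)
    Xc = X - mu
    V = np.linalg.svd(Xc, full_matrices=False)[2].T  # principal directions
    Z = Xc @ V
    a, b = Z.min(0), Z.max(0)                        # per-dimension extent

    # 2-3. candidate eigenfunctions (dim, k), ranked by (k*pi/extent)^2;
    # k starts at 1 because k = 0 is the trivial constant eigenfunction
    d = Z.shape[1]
    cands = [(dim, k, (k * np.pi / (b[dim] - a[dim])) ** 2)
             for dim in range(d) for k in range(1, n_bits + 1)]
    cands.sort(key=lambda c: c[2])
    chosen = cands[:n_bits]

    # 4. evaluate each chosen eigenfunction and threshold at zero
    codes = np.empty((len(X), n_bits), dtype=np.uint8)
    for j, (dim, k, _) in enumerate(chosen):
        phi = np.cos(k * np.pi * (Z[:, dim] - a[dim]) / (b[dim] - a[dim]))
        codes[:, j] = phi > 0
    return codes

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 5))       # matches the uniform assumption
codes = spectral_hash(X, n_bits=4)
```

Note that a new point can be hashed in constant time by projecting with V and evaluating the same cosines, with no reference to the training data.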
Back to the 2-D Toy example 3 bits 7 bits 15 bits Distance Red – 0 bits Green – 1 bit Blue – 2 bits
2-D Toy Example Comparison
10-D Toy Example
Experiments on Real Data
Input image representation: Gist vectors. Pixels are not a convenient representation; we use the Gist descriptor instead [Oliva & Torralba, IJCV 2001]: 512 real-valued dimensions per image (16,384 bits). L2 distance between Gist vectors is not a bad substitute for human perceptual distance. Note: no color information.
LabelMe images: 22,000 images (20,000 train | 2,000 test), with ground-truth segmentations for all. Assume L2 Gist distance is the true distance.
LabelMe data
Extensions
How to handle non-uniform distributions
Bit allocation between dimensions: compare the value of cuts in the original space, i.e. before the pointwise nonlinearity.
Summary. Spectral Hashing is a simple way of computing good binary codes, but it is forced to make a big assumption about the data distribution; point-wise non-linearities can map the distribution to uniform. More experiments on real data are needed.
Overview: assume points are embedded in Euclidean space (e.g. the output from an RBM). How do we binarize the space so that Hamming distance between points approximates L2 distance?
Semantic Hashing beyond 30 bits
Strategies for Binarization: deliberately add noise during backprop, which forces activations to extreme values in order to overcome the noise.