1 Large Scale Similarity Learning and Indexing Part II: Learning to Hash for Large Scale Search Fei Wang and Jun Wang IBM TJ Watson Research Center
2 Outline Background Approximate nearest neighbor search Tree and hashing for data indexing Locality sensitive hashing Learning to Hash: Unsupervised hashing Supervised hashing Semi-supervised hashing (pointwise/pairwise/listwise) Large Scale Active Learning with Hashing Hyperplane hashing Fast query selection with hashing Summary and Discussion
3 Motivation Similarity based search has been popular in many applications –Image/video search and retrieval: finding the most similar images/videos –Audio search: finding similar songs –Product search: finding shoes with a similar style but a different color –Patient search: finding patients with a similar diagnostic status Two key components: –Similarity/distance measure –Indexing scheme Whittlesearch (Kovashka et al. 2013)
4 Nearest Neighbor Search (NNS) Given a set of points X in a metric space and a query point q, find the closest point in X to q (1-nearest neighbor), or the k closest points (k-nearest neighbor search) Nearest neighbor search is a fundamental problem in many fields, including computational geometry, information retrieval, machine learning, data mining, computer vision, and so on Time complexity: linear in the size of the data; the entire dataset also needs to be loaded into memory Example: 1 billion images with 10K-dim BOW features Linear scan takes ~15 hrs; storage for such a dataset is ~40 TB
5 Approximate Nearest Neighbor Instead of finding the exact nearest neighbors, return approximate nearest neighbors (Indyk, 03) ANNs are reasonably good for many applications Retrieving ANNs can be much faster (with sublinear complexity) Trees and hashing are two popular indexing schemes for fast ANN search
6 Tree-Based ANN Search Recursively partition the data: divide and conquer Search complexity is O(log n) (the worst case can be O(n)) Inefficient for high-dimensional data Requires significant memory cost Example: KD-tree
7 Various Tree-Based Methods Different ways to construct the tree structure: KD-tree, Ball-tree, PCA-tree, Random Projection-tree (a PCA-tree splits along the 1st and 2nd principal components)
8 Hashing-Based ANN Search Repeatedly partition the data; each item in the database is represented as a hash code Significantly reduced storage cost: for 1 billion images, 40 TB of features -> 8 GB of hash codes Search complexity: constant time or sublinear; for 1 billion images, linear scan drops from ~15 hrs to ~13 sec [Figure: database items x1...x5 mapped by hash functions h1...hk to binary codes such as 01101 and 10101]
9 Hashing: Training Step Design models (hash functions) for computing hash codes, e.g., a general linear projection based hash function family h_k(x) = sgn(w_k^T x + b_k), and estimate the model parameters (the projections w_k and thresholds b_k)
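As an illustration of this hash function family, here is a minimal Python sketch (not the presenters' code): it draws random projections w_k and sets the thresholds b_k from the data mean, whereas a learned method would fit W and b to the data. All function and variable names are illustrative.

```python
import numpy as np

def train_random_projection_hash(X, n_bits, seed=0):
    """Fit a linear-projection hash family h_k(x) = sgn(w_k^T x + b_k).

    Sketch only: projections are random (LSH-style); learned hashing
    methods would estimate W and b from the data instead.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_bits))   # one projection per bit
    b = -X.mean(axis=0) @ W                # threshold each projection at the data mean
    return W, b

def hash_codes(X, W, b):
    """Map each row of X to an n_bits binary code in {0, 1}."""
    return (X @ W + b > 0).astype(np.uint8)
```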
10 Hashing: Indexing Step Compute the hash code for each database item Organize all the codes in hash tables (inverse look-up): each hash bucket stores the ids of the database items that share the same code
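A hedged sketch of the inverse look-up structure: a Python dict mapping each binary code (as bytes) to the list of database item ids in that bucket. The layout and names are illustrative, not taken from the slides.

```python
from collections import defaultdict

def build_hash_table(codes):
    """Group database item ids by their binary code (inverse look-up).

    `codes` is an (n, n_bits) uint8 array in {0, 1}; each distinct code
    becomes a bucket holding the indices of all items that hashed to it.
    """
    table = defaultdict(list)
    for idx, code in enumerate(codes):
        table[code.tobytes()].append(idx)   # bytes of the code as the bucket key
    return table
```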
11 Hashing: Query Step (Hash Lookup) Compute the hash code for the query point Return the points within a small Hamming radius of the query code For K-bit codes, the number of hash codes within Hamming radius r is sum_{i=0}^{r} C(K, i)
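The sketch below probes every bucket within the given Hamming radius by flipping up to r bits of the query code; it assumes the `build_hash_table` layout sketched above and codes stored as uint8 arrays in {0, 1}.

```python
from itertools import combinations

def hash_lookup(query_code, table, radius=2):
    """Return candidate item ids whose codes lie within `radius` of the query.

    Flips every subset of up to `radius` bits of the query code and probes
    the corresponding buckets; the number of probed codes is
    sum_{i <= radius} C(n_bits, i).
    """
    n_bits = query_code.shape[0]
    candidates = []
    for r in range(radius + 1):
        for bits in combinations(range(n_bits), r):
            probe = query_code.copy()
            probe[list(bits)] ^= 1               # flip the chosen bits
            candidates.extend(table.get(probe.tobytes(), []))
    return candidates
```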
12 Hashing: Query Step (Hamming Ranking) Hamming distance: the number of differing bits between two hash codes Rank the database items by their Hamming distance to the query's hash code to generate a ranked list
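A small sketch of Hamming ranking over {0, 1} codes; with packed bit codes one would XOR and popcount instead, but the element-wise comparison below keeps the idea explicit.

```python
import numpy as np

def hamming_ranking(query_code, codes):
    """Rank all database items by Hamming distance to the query code.

    `codes` is (n, n_bits) in {0, 1}; counting differing bits per row gives
    the Hamming distance, and argsort gives the ranked list.
    """
    dists = np.count_nonzero(codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")     # nearest codes first
    return order, dists[order]
```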
13 A Conceptual Diagram for a Hashing Based Image Search System [Diagram components: image database, hash function design, indexing and search, similarity search & retrieval, reranking/refinement, visual search applications] Designing compact yet accurate hash codes is a critical component in making the search effective
14 Locality Sensitive Hashing (LSH) [Figure: a random hash function assigns each database item and the query a single bit (0 or 1); concatenating several such bits yields codes like 101]
15 Single hash bit: h(x) = sgn(w^T x) for a random projection w; a hash table concatenates K such bits (the bit length of the code) Collision probability for one bit: Pr[h(x) = h(y)] = 1 − θ(x, y)/π, where θ is the angle between x and y A high dot product (small angle) means the pair is unlikely to be split; a lower dot product means it is likely to be split
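A quick numerical sketch (names are illustrative) that checks the random-hyperplane collision probability empirically against the closed form 1 − θ(x, y)/π.

```python
import numpy as np

def lsh_collision_probability(x, y, n_trials=10000, seed=0):
    """Estimate Pr[h(x) = h(y)] for h(x) = sgn(w^T x) with Gaussian w.

    Returns the empirical collision rate and the analytical value
    1 - theta(x, y) / pi, where theta is the angle between x and y.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_trials, x.shape[0]))     # one random hyperplane per trial
    empirical = np.mean(np.sign(W @ x) == np.sign(W @ y))
    cos_xy = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    theta = np.arccos(np.clip(cos_xy, -1.0, 1.0))
    return empirical, 1.0 - theta / np.pi
```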
16 Outline Background Approximate nearest neighbor search Tree and hashing for data indexing Locality sensitive hashing Learning to Hash: Unsupervised hashing Supervised hashing Semi-supervised hashing (pointwise/pairwise/listwise) Large Scale Active Learning with Hashing Hyperplane hashing Fast query selection with hashing Summary and Discussion
17 Overview: Learning-Based Hashing Techniques Unsupervised: only use the properties of unlabeled data (data-dependent) Spectral hashing (SH, Weiss et al. NIPS 2008) Kernelized methods (Kulis et al. ICCV 2009) Graph hashing (Liu et al. ICML 2011) Isotropic hashing (Kong et al. NIPS 2012) Angular quantization hashing (Gong et al. NIPS 2012) Supervised: use labeled data (task-dependent) Deep learning based (Torralba et al. CVPR 2008) Binary reconstructive embedding (Kulis et al. NIPS 2009) Supervised kernel method (Liu et al. CVPR 2012) Minimal loss hashing (Norouzi & Fleet ICML 2011) Semi-supervised: use both labeled and unlabeled data Metric learning based (Jain et al. CVPR 2008) Semi-supervised hashing (Wang et al. CVPR 2010, PAMI 2012) Sequential hashing (Wang et al. ICML 2011)
18 Overview: Advanced (Other) Hashing Techniques Triplet and Listwise Hashing Hamming metric learning based hashing (Norouzi et al. NIPS 2012) Order preserving hashing (Wang et al. ACM MM 2013) Column generation hashing (Li et al. ICML 2013) Ranking supervised hashing (Wang et al. ICCV 2013) Hyperplane Hashing and Active Learning Angle & embedding hyperplane hashing (Jain et al. NIPS 2010) Bilinear hashing (Liu et al. ICML 2012) Fast pairwise query selection (Qian et al. ICDM 2013) Hashing for Complex Data Sources Heterogeneous hashing (Ou et al. KDD 2013) Structured hashing (Ye et al. ICCV 2013) Multiple feature hashing (Song et al. ACM MM 2011) Composite hashing (Zhang et al. ACM SIGIR 2011) Submodular hashing (Cao et al. ACM MM 2012)
19 Unsupervised: PCA Hashing Partition the data along the PCA directions Projections with high variance are more reliable Low-variance projections are very noisy
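A minimal PCA hashing sketch: center the data, take the top principal directions via SVD, and threshold each projection at zero. This is an illustration of the idea, not the presenters' implementation.

```python
import numpy as np

def pca_hashing(X, n_bits):
    """PCA hashing sketch: threshold projections onto the top principal directions.

    Directions with larger variance come first; each bit is the sign of one
    centered PCA projection.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows of Vt are principal directions
    W = Vt[:n_bits].T
    codes = (Xc @ W > 0).astype(np.uint8)
    return codes, W
```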
20 Unsupervised: Spectral Hashing (Weiss et al. 2008) Partition the data along the PCA directions Essentially a balanced minimum cut problem (NP hard even for a single-bit partition) Approximation through spectral relaxation (with a uniform distribution assumption), subject to balancing (each bit splits the data evenly) and orthogonality (different bits are uncorrelated) constraints
21 Unsupervised: Spectral Hashing Illustration of spectral hashing Main steps: 1) extract projections by performing PCA on the data, 2) select projections (preferring those with large spread and small spatial frequency), 3) generate hash codes by thresholding a sinusoidal function of each selected projection
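A simplified sketch of those three steps, under the slide's uniform-distribution assumption. The eigenvalue proxy and mode selection below are a rough stand-in for the analytical one-dimensional Laplacian eigenfunctions used in the paper, so treat this as illustrative rather than a faithful reimplementation.

```python
import numpy as np

def spectral_hashing_sketch(X, n_bits):
    """Spectral-hashing-style encoder: PCA, mode selection, sinusoidal thresholding.

    Assumes each PCA projection is roughly uniform and non-degenerate
    (non-zero spread); modes with the smallest frequency-based eigenvalue
    proxy are kept, favoring large spread and small spatial frequency.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Xc @ Vt[:n_bits].T                       # top PCA projections
    lo, hi = P.min(axis=0), P.max(axis=0)
    # eigenvalue proxy ~ (k * pi / range)^2 for frequency mode k on direction d
    modes = [((k * np.pi / (hi[d] - lo[d])) ** 2, d, k)
             for d in range(P.shape[1]) for k in range(1, n_bits + 1)]
    modes.sort()
    bits = []
    for _, d, k in modes[:n_bits]:
        phase = k * np.pi * (P[:, d] - lo[d]) / (hi[d] - lo[d])
        bits.append((np.sin(np.pi / 2 + phase) > 0).astype(np.uint8))
    return np.stack(bits, axis=1)
```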
22 Unsupervised: Graph Hashing (Liu et al. 2011) A graph is capable of capturing complex nonlinear structure The same objective as spectral hashing
23 Unsupervised: Graph Hashing The same objective as spectral hashing, but with a different solution: full graph construction and eigen-decomposition are not scalable
24 Unsupervised: Angular Quantization (Gong et al. 2012) Data-independent angular quantization: the binary code of a data point is the nearest binary vertex on the hypercube Data-dependent angular quantization
25 From Unsupervised to Supervised Hashing Existing hashing methods mostly rely on random or principal projections Not compact Insufficient accuracy Simple metrics and features are usually not enough to express semantic similarity – semantic gap Goal: to learn effective binary hash functions through incorporating supervised information Five categories of objects from Caltech 101, 40 images for each category
26 Binary Reconstructive Embedding (Kulis & Darrell 2009) Kernelized hashing function Objective: minimize the difference between the Euclidean distance in the input space and the binary (Hamming) distance between codes Can be a supervised method if a semantic distance/similarity is used
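To make the objective concrete, here is a hedged sketch of the reconstruction error over a set of sampled pairs, assuming unit-norm features and {0, 1} codes; the exact scaling follows the usual BRE convention and should be checked against the paper.

```python
import numpy as np

def bre_objective(X, B, pairs):
    """BRE-style reconstruction error over selected (i, j) pairs (sketch).

    X: unit-norm features; B: (n, K) codes in {0, 1}. The normalized Hamming
    distance should reconstruct half the squared Euclidean distance.
    """
    K = B.shape[1]
    err = 0.0
    for i, j in pairs:
        d_euc = 0.5 * np.sum((X[i] - X[j]) ** 2)
        d_ham = np.count_nonzero(B[i] != B[j]) / K
        err += (d_euc - d_ham) ** 2
    return err
```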
27 RBMs Based Binary Coding (Torralba et al. 2008) Restricted Boltzmann Machine (RBM): an energy-based model over visible and hidden units, parameterized by weights and offsets, trained to maximize the expected log probability of the data (Hinton & Salakhutdinov, 2006) Stacking RBMs into multiple layers gives a deep network (512-512-256-N) The training process has two stages: unsupervised pre-training and supervised fine-tuning
28 Supervised Hashing with Kernels (Liu et al. 2012) Supervision given as pairwise similarity labels The code inner product approximates the pairwise similarity
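A short sketch of the "code inner product approximates pairwise similarity" idea: with codes B in {−1, +1}^(n×K) and a target similarity matrix S (e.g., +1 for same-label pairs, −1 otherwise), the fitting error is the Frobenius norm below. Names are illustrative.

```python
import numpy as np

def ksh_objective(B, S):
    """KSH-style fitting error between code inner products and similarities.

    B: (n, K) codes in {-1, +1}; S: (n, n) target similarities in [-1, 1].
    Good codes make (1/K) * B B^T approximate S.
    """
    K = B.shape[1]
    return np.linalg.norm(B @ B.T / K - S, ord="fro") ** 2
```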
29 Metric Learning Based Hashing (Jain et al. 2008) Given a learned distance metric M = G^T G, the generalized distance is d_M(x, y) = (x − y)^T M (x − y) and the generalized similarity measure is s_M(x, y) = x^T M y The parameterized hash function is h(x) = sgn(w^T G x), whose collision probability depends on the generalized similarity
30 Semi-Supervised Hashing (Wang et al. 2010) Different ways to preserve pairwise relationships: minimize the empirical loss over labeled neighbor pairs and non-neighbor pairs Besides minimizing the empirical loss on labeled data, regularize the unlabeled data with the maximum entropy principle (each bit should partition the data evenly)
31 Hamming Metric Learning (Norouzi et al. 2012) Supervised information given as triplets Triplet ranking loss Objective: minimize the regularized ranking loss Optimization: stochastic gradient descent
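A minimal sketch of a triplet ranking loss on Hamming distances (a hinge on "the positive item should be closer to the query than the negative item by a margin"); the smoothed, differentiable surrogate actually optimized by the method is not reproduced here.

```python
import numpy as np

def triplet_ranking_loss(code_q, code_pos, code_neg, margin=1):
    """Hinge-style triplet ranking loss on binary codes (sketch).

    Penalizes triplets where the positive code is not at least `margin`
    bits closer to the query code than the negative code.
    """
    d_pos = np.count_nonzero(code_q != code_pos)
    d_neg = np.count_nonzero(code_q != code_neg)
    return max(0, d_pos - d_neg + margin)
```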
32 Column Generation Hashing (Li et al. 2013) Learn the hash functions and the weights of the hash bits (retrieval uses a weighted Hamming distance) Large-margin formulation Column generation iteratively learns hash bits and updates the bit weights
33 Ranking Supervised Hashing (Wang et al. 2013) Preserve the ranking list in the Hamming space A triplet matrix represents the ranking order; the objective enforces ranking consistency
34 Outline Background Approximate nearest neighbor search Tree and hashing for data indexing Locality sensitive hashing Learning to Hash: Unsupervised hashing Supervised hashing Semi-supervised hashing (pointwise/pairwise/listwise) Large Scale Active Learning with Hashing Hyperplane hashing Fast query selection with hashing Summary and Discussion
35 Point-to-Point NN vs. Point-to-Hyperplane NN Hyperplane hashing aims at finding the points nearest to a hyperplane An efficient way to select queries in the active learning paradigm: the points nearest to the hyperplane are the most uncertain ones
36 Hyperplane Hashing Objective: find the data points with the shortest point-to-hyperplane distance
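For reference, the quantity being minimized is the usual point-to-hyperplane distance; a two-line sketch with hypothetical names:

```python
import numpy as np

def point_to_hyperplane_distance(X, w, b=0.0):
    """Distance from each row of X to the hyperplane {x : w^T x + b = 0}.

    Active learning picks the points with the smallest such distance
    (the most uncertain under a linear classifier with normal w).
    """
    return np.abs(X @ w + b) / np.linalg.norm(w)
```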
37 Angle and Embedding Based Hyperplane Hashing (Jain et al. 2010) Angle-based hyperplane hashing: the collision probability depends on the angle between a point and the hyperplane normal Embedding-based hyperplane hashing: the distance in the embedded space is proportional to the distance in the original space Figures from http://vision.cs.utexas.edu/projects/activehash/
38 Bilinear Hyperplane Hashing (Liu et al. 2012) Bilinear hash functions defined by two projection vectors u and v: for a data point x, the bit is the sign of (u^T x)(x^T v)
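A one-bit sketch of the bilinear form described above, with two projection vectors u and v (illustrative only; the full construction, including the opposite sign used when hashing the query hyperplane normal, follows the paper).

```python
import numpy as np

def bilinear_hash_bit(x, u, v):
    """One bilinear hash bit: the sign of (u^T x)(x^T v).

    For a database point x the bit is sgn(u @ x) * sgn(v @ x); u and v
    would typically be random projection vectors.
    """
    return int(np.sign(u @ x) * np.sign(v @ x))
```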
39 Analysis and Comparison Collision probability of bilinear hyperplane hashing Compared with angle-based and embedding-based hyperplane hashing
40 Active Learning for Big Data Active learning aims at reducing the annotation cost by connecting humans and prediction models The key idea is to iteratively identify the points that are most ambiguous to the current prediction model This requires exhaustive testing over all data samples: at least linear complexity, which is not feasible for big data applications Example and figure from "Active Learning Literature Survey" by Burr Settles
41 Active Learning with Hashing Conceptual diagram with two key components: index the unlabeled data into hash tables; compute the hash code of the current classifier (hyperplane) and treat it as a query Figures from http://vision.cs.utexas.edu/projects/activehash/
42 Empirical Study 20 Newsgroups data (18,846 documents, 20 classes) Starting with 5 randomly labeled documents per class Perform 300 iterations of active learning with different query selection strategies
43 Active Learning with Pairwise Queries Typical applications can be found in pairwise comparison based ranking (Jamieson & Nowak 2011, Wauthier et al. 2013) In an active learning setting, the system sends the annotator a pair of points and receives the relevance comparison result as supervision Exhaustively selecting optimal sample pairs has quadratic complexity (Qian et al. 2013; Jamieson & Nowak 2011)
44 Fast Pairwise Query Selection with Hashing Key motivations: the selected query pairs should have high relevance, and the ordering between the two points in a pair should be uncertain Two-step selection strategy: relevance selection (point-to-point hashing), then uncertainty selection (point-to-hyperplane hashing)
45 Outline Background Approximate nearest neighbor search Tree and hashing for data indexing Locality sensitive hashing Learning to Hash: Unsupervised hashing Supervised hashing Semi-supervised hashing (pointwise/pairwise/listwise) Large Scale Active Learning with Hashing Hyperplane hashing Fast query selection with hashing Summary and Discussion
46 Summary and Trend in Metric Learning
47 Summary and Trend in Learning to Hash From data-independent to data-dependent From task-independent to task-dependent From simple supervision to complex supervision (pointwise -> pairwise -> triplet/listwise) From linear methods to kernel based methods From homogeneous data to heterogeneous data From simple data to structured data From point-to-point methods to point-to-hyperplane methods From model driven to application driven From a single hash table to multiple hash tables
48 References