1
Active hashing and its application to image and text retrieval
Yi Zhen, Dit-Yan Yeung. Published in Data Mining and Knowledge Discovery (DMKD), Feb 2012.
Presented by Arshad Jamal, Rajesh Dhania, Vinkal Vishnoi
2
Introduction
Computing similarity plays a fundamental role in retrieval, and hashing-based methods have gained popularity for large-scale similarity search. [Diagram: similarity-search methods split into tree-based approaches, suitable only for low dimensions, and hashing-based approaches; hashing-based methods divide into data-independent and data-dependent, and data-dependent methods are either unsupervised or semi-supervised.] This paper proposes a novel framework for active hashing.
3
Related work
Locality Sensitive Hashing: the goal is to assign similar binary codes to data points that are close in feature space (random linear projection + threshold); the code length can become quite large. A sketch is given below.
Spectral Hashing: performs spectral decomposition to learn hash functions; assumes the data are uniformly distributed.
Active Learning: identify the most informative unlabeled data and present it to human experts for labeling.
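A minimal sketch of a random-projection LSH hash function, to make the "random linear projection + threshold" recipe concrete. All names and parameters here are illustrative, not taken from the paper.

import numpy as np

def make_lsh_hash(dim, n_bits, seed=0):
    # One random hyperplane per bit; thresholding each projection
    # at zero turns it into a single binary digit.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_bits, dim))
    def hash_fn(x):
        return (W @ x > 0).astype(np.uint8)
    return hash_fn

# Nearby points collide on most bits with high probability.
h = make_lsh_hash(dim=128, n_bits=32)
code = h(np.random.default_rng(1).standard_normal(128))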
4
Related Work: Semi-supervised Hashing (SSH)
Given N normalized data points of D dimensions, learn K hash functions that generate a K-bit binary code. Build two sets of point pairs, S (similar) and D (dissimilar); together they characterize the semantic similarity. The hash functions are learned by maximizing an objective function (reconstructed below).
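The objective appeared as an image on the original slide. A reconstruction from the cited SSH paper (Wang et al. 2010), with hash functions h_k(x) = sgn(w_k^T x) and regularization terms omitted:

J(W) = \frac{1}{2} \sum_{k=1}^{K} \Bigg[ \sum_{(x_i, x_j) \in S} h_k(x_i)\, h_k(x_j) \;-\; \sum_{(x_i, x_j) \in D} h_k(x_i)\, h_k(x_j) \Bigg]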
5
Limitations of SSH
Point pairs from both the S and D sets are treated as equally important. For multi-class data, dissimilar points picked from a nearby class and from a faraway class contribute the same weight, so the most dissimilar points can spoil the learned hash functions. [Figure: three classes C1, C2, C3 illustrating near and far dissimilar pairs.]
6
Active Hashing (Greedy AH)
Tries to overcome the limitations of SSH by picking the most informative points.
Algorithm, three main steps. Given labeled points L, unlabeled points U, and a candidate set C:
1. Select the most informative points A from C.
2. Get A labeled by a human expert.
3. Update L, U, and C.
Then train the hash functions based on L and U, and repeat. A sketch of this loop is given below.
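One possible rendering of the Greedy AH loop. The helpers select_informative, expert_label, and train_ssh are hypothetical placeholders, not the paper's API.

def greedy_active_hashing(L, U, C, rounds, select_informative, expert_label, train_ssh):
    # L: set of (point, label) pairs; U, C: sets of unlabeled points.
    model = train_ssh(L, U)                        # initial hash functions
    for _ in range(rounds):
        A = select_informative(model, C)           # most informative candidates
        L = L | {(x, expert_label(x)) for x in A}  # expert labels A
        U = U - A                                  # update the unlabeled pool
        C = C - A                                  # update the candidate set
        model = train_ssh(L, U)                    # retrain on L and U
    return model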
7
Greedy AH: selecting data points
The criterion is based on the SSH model's hash functions, h_k(x) = sgn(w_k^T x). Intuitively, the magnitude of the projection w_k^T x indicates how certain the model is about x. Data certainty (DC) aggregates these per-bit certainties (formula reconstructed below); the data points with the smallest certainty f will be the most informative.
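The DC formula was an image on the original slide. One plausible reconstruction, averaging the per-bit projection magnitudes (the paper's exact normalization may differ):

f(x) = \frac{1}{K} \sum_{k=1}^{K} \left| \mathbf{w}_k^{\top} \mathbf{x} \right|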
8
Batch Mode Active Hashing (BMAH)
Selecting points one by one is inefficient and suboptimal, so a whole set of points is selected and processed together before the hash functions are relearned. In the selection objective (shown as an image on the original slide):
µ is an indicator vector deciding whether each point is selected,
f is the vector of normalized certainty values over C,
K is a positive semi-definite similarity matrix defined on C.
Choose the M examples with the largest µ. A sketch of this selection is given below.
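A hedged sketch of the batch selection. Because the objective itself was an image, its exact form is unknown here; this greedy variant simply favors low-certainty points while penalizing similarity (via K) to points already chosen, and the trade-off weight lam is an assumption.

import numpy as np

def select_batch(f, K, M, lam=1.0):
    # f: certainty per candidate (lower = more informative).
    # K: positive semi-definite similarity matrix over the candidates.
    score = -np.asarray(f, dtype=float)  # low certainty -> high score
    chosen = []
    for _ in range(M):
        i = int(np.argmax(score))
        chosen.append(i)
        score[i] = -np.inf               # never pick the same point twice
        score -= lam * K[:, i]           # discount points similar to i
    return chosen

# Toy usage: 5 candidates with an RBF-style similarity matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
print(select_batch(f=rng.random(5), K=K, M=2))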
9
Experimental evaluation I
Image retrieval (MNIST dataset): results reported for different parameter settings.
Text retrieval (20 Newsgroups (NEWS) dataset).
Random vs. BMAH: BMAH yields a performance improvement.
10
Experimental evaluation II
Image retrieval (MNIST dataset).
BMAH vs. GAH: BMAH takes less time.
11
References
Andoni A, Indyk P (2006) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '06), IEEE Computer Society, Washington, DC, pp 459–468.
Weiss Y, Torralba A, Fergus R (2008) Spectral hashing. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in Neural Information Processing Systems 21 (NIPS), MIT Press, Cambridge, MA, pp 1753–1760.
Wang J, Kumar S, Chang S-F (2010) Semi-supervised hashing for scalable image retrieval. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3424–3431.
Salakhutdinov R, Hinton GE (2009) Semantic hashing. Int J Approx Reason 50:969–978.
Thanks