Sequential Projection Learning for Hashing with Compact Codes
Jun Wang (Columbia University, New York, USA), Sanjiv Kumar (Google Research, New York, USA), and Shih-Fu Chang (Columbia University, New York, USA)

Nearest Neighbor Search
 Nearest neighbor search in large databases with high-dimensional points is important (e.g. image/video/document retrieval)
 Exact search is not practical
  Computational cost
  Storage cost (need to store the original data points)
 Approximate nearest neighbor (ANN)
  Tree approaches (KD-tree, metric tree, ball tree, …)
  Hashing methods (locality sensitive hashing, spectral hashing, …)

Binary Hashing
 Hyperplane partitioning
 Linear projection based hashing
 [Figure: data points x1 … x5 are split by hyperplanes h1, h2, …, hk; each point is mapped to a k-bit binary code such as 010…, 100…, 111…, 001…, 110…]
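The linear projection hashing sketched on this slide amounts to h_k(x) = sign(w_k^T x + b_k). A minimal illustrative sketch (not the authors' code; the random projections and zero thresholds below are placeholder assumptions):

import numpy as np

def linear_hash(X, W, b):
    """Map each row of X to a binary code: bit k is sign(w_k^T x + b_k)."""
    # X: (n, d) data matrix, W: (d, K) projection matrix, b: (K,) thresholds
    return (X @ W + b > 0).astype(np.uint8)

# Toy usage with random projections (LSH-style placeholder, not learned ones)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))          # five points x1..x5 in 16 dimensions
W = rng.normal(size=(16, 8))          # eight hyperplanes h1..h8
b = np.zeros(8)
codes = linear_hash(X, W, b)          # (5, 8) array of 0/1 bits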

Related Work
 Different choices of projections
  random projections: locality sensitive hashing (LSH, Indyk et al. 98), shift-invariant kernel hashing (SIKH, Raginsky et al. 09)
  principal projections: spectral hashing (SH, Weiss et al. 08)
 Different choices of the function applied to the projections before binarization
  identity function: LSH & Boosted SSC (Shakhnarovich, 05)
  sinusoidal function: SH & SIKH
 Other recent work
  Restricted Boltzmann machines (RBMs, Hinton et al. 06)
  Jain et al. 08 (metric learning)
  Kulis et al. 09 & Mu et al. 10 (kernelized)
  Kulis NIPS 09 (binary reconstructive embedding)

Main Issues
 Existing hashing methods mostly rely on random or principal projections
  not compact
  low accuracy
 Simple metrics are usually not enough to express semantic similarity
  similarity given by a few pairwise labels
 Goal: to learn binary hash functions that give high accuracy with compact codes, in both the semi-supervised and the unsupervised case

Formulation
 [Figure: pairwise relations among training points x1 … x8]
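The pairwise supervision sketched on this slide is commonly encoded as a label matrix over the labeled points; a plausible reconstruction of that convention (the symbols M, C and S are assumed, not visible in the transcript) is

S_{ij} = \begin{cases} \;\;\,1 & (x_i, x_j) \in \mathcal{M} \quad \text{(neighbor pair)} \\ -1 & (x_i, x_j) \in \mathcal{C} \quad \text{(non-neighbor pair)} \\ \;\;\,0 & \text{otherwise.} \end{cases}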

Formulation - Empirical Fitness
 Empirical fitness = (# of correctly hashed pairs) − (# of wrongly hashed pairs)

Empirical Fitness
 How well the hash codes fit the training data
 Recall the definition of the pairwise label matrix
 Objective:
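The objective itself did not survive the transcript. Based on the previous slide ("correctly hashed pairs minus wrongly hashed pairs"), a plausible reconstruction for K hash functions h_1, …, h_K with values in {−1, +1} is

J(H) = \sum_{k=1}^{K} \Big[ \sum_{(x_i, x_j) \in \mathcal{M}} h_k(x_i)\, h_k(x_j) \;-\; \sum_{(x_i, x_j) \in \mathcal{C}} h_k(x_i)\, h_k(x_j) \Big] = \tfrac{1}{2}\, \mathrm{tr}\{ H(X_l)\, S\, H(X_l)^{\top} \},

where X_l collects the labeled points and H(X_l) = \mathrm{sign}(W^{\top} X_l) stacks their K-bit codes.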

Relaxing Empirical Fitness
 Replace the sign of the projections with their signed magnitude
 This gives a more compact matrix form:
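The matrix form referred to here is also missing from the transcript; dropping the sign and using the signed magnitude w_k^T x plausibly gives

J(W) \approx \tfrac{1}{2}\, \mathrm{tr}\{ W^{\top} X_l\, S\, X_l^{\top} W \}, \qquad W = [w_1, \dots, w_K].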

Information Theoretic Regularizer
 Maximizing empirical fitness alone is not sufficient
 Instead, maximize the combination of empirical fitness over the training data and the entropy of the hash codes (maximum entropy principle)
 [Figure: toy example with a neighbor pair and a non-neighbor pair]
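Putting the two terms together, the regularized objective has the general form (η is a trade-off weight; the exact notation is an assumption, since the slide equation is not preserved)

J(W) = \tfrac{1}{2}\, \mathrm{tr}\{ W^{\top} X_l\, S\, X_l^{\top} W \} \;+\; \eta \sum_{k=1}^{K} \mathcal{H}\big(h_k(X)\big),

where \mathcal{H}(h_k(X)) is the entropy of the k-th bit computed over all labeled and unlabeled data X.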

Relaxing the Regularization Term
 Step 1: Maximum entropy is equivalent to a balanced partition
 Step 2: A balanced partition is equivalent to a partition with maximum variance
 Step 3: Substitute the maximum bit-variance term by its lower bound
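The three steps correspond, roughly, to the following chain (a reconstruction with constants omitted): a bit has maximum entropy iff it is balanced, \sum_i h_k(x_i) = 0; a balanced ±1 bit is exactly a maximum-variance bit; and for mean-centered data the bit variance is lower-bounded (up to scaling) by the variance of the projected data, so the regularizer is replaced by the relaxed surrogate

\sum_{k=1}^{K} \mathbb{E}\big[ (w_k^{\top} x)^2 \big] = \tfrac{1}{n}\, \mathrm{tr}\{ W^{\top} X X^{\top} W \}.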

Final Objective  Orthogonal solution by Eigen-decomposition on adjusted covariance matrix M “adjusted” covariance matrix Not very accurate!

Sequential Solution  Motivation : to learn a new hash function which tries to correct the mistakes made by previous hash function

Update Label Matrix
 Impose higher weights on point pairs violated by the previous hash functions

S3PLH Algorithm Summary
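The algorithm box itself did not survive the transcript. The sketch below shows the sequential idea in code form, under the assumptions stated in the comments (in particular the simple additive reweighting of violated pairs is a generic boosting-style rule, not necessarily the authors' exact update):

import numpy as np

def s3plh(X, X_l, S, eta, K, alpha=1.0):
    """Sequential projection learning (illustrative sketch, not the official code).

    Each bit is the leading eigenvector of the adjusted covariance matrix built
    from the *current* label matrix; pairs violated by that bit then get larger
    weights, so later bits try to correct earlier mistakes.
    """
    d = X.shape[1]
    S_cur = S.astype(float).copy()
    W = np.zeros((d, K))
    cov = eta * (X.T @ X)                              # unsupervised part, fixed
    for k in range(K):
        M = X_l.T @ S_cur @ X_l + cov                  # adjusted covariance
        M = (M + M.T) / 2
        _, vecs = np.linalg.eigh(M)
        w = vecs[:, -1]                                # leading eigenvector
        W[:, k] = w
        bits = np.sign(X_l @ w)                        # +1 / -1 bit for labeled points
        agree = np.outer(bits, bits)                   # +1 if a pair gets the same bit
        violated = (S_cur * agree) < 0                 # labeled pairs this bit got wrong
        S_cur[violated] += alpha * np.sign(S_cur[violated])  # boost their weight
    return W                                           # hash codes: (X @ W) > 0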

Unsupervised Extension (USPLH)
 Observation: boundary errors (neighboring points that fall on opposite sides of a partition boundary receive different bits)

Pseudo Pair-Wise Labels
 A pseudo set of labeled data, drawn from points near the current partition boundary, contains point pairs
 Accordingly, generate a pseudo label matrix
 Unlike SPLH, USPLH generates new pseudo labels and the corresponding label matrix at each step, instead of updating the weights of a fixed set of given labels
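A heavily hedged sketch of one plausible way to generate such pseudo labels from points near the current boundary; the selection and thresholding rules here are assumptions made for illustration, not taken from the slides:

import numpy as np

def pseudo_labels(X, w, m=100, tau=None):
    """Pick the m points closest to hyperplane w and pseudo-label pairs among them.

    Assumed rule (illustrative): pairs that are close in input space but fall on
    opposite sides of the boundary become pseudo neighbor pairs (boundary errors
    to correct); pairs that are far apart but on the same side become pseudo
    non-neighbor pairs.
    """
    proj = X @ w
    idx = np.argsort(np.abs(proj))[:m]          # points nearest the boundary
    Xb, pb = X[idx], np.sign(proj[idx])
    D = np.linalg.norm(Xb[:, None, :] - Xb[None, :, :], axis=-1)
    if tau is None:
        tau = np.median(D)                      # crude distance threshold
    S = np.zeros((m, m))
    S[(D < tau) & (np.outer(pb, pb) < 0)] = 1   # close but split  -> neighbor label
    S[(D > tau) & (np.outer(pb, pb) > 0)] = -1  # far but together -> non-neighbor label
    np.fill_diagonal(S, 0)
    return idx, S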

USPLH Algorithm Summary

Experiments  Datasets  MNIST (70K) – supervised case  SIFT Data (1 Million SIFT features) – unsupervised case  Evaluation protocol  Mean Average Precision and Recall  Setting of training: MNIST data semi-supervised: 1K labeled samples for training, 1K for query test SIFT Data unsupervised: 2K pseudo labels for training, 10K for query test

MNIST Digits
 [Figure: 48-bit recall curves]

Training and Test Time

SIFT 1 Million Data
 [Figure: 48-bit recall curves]

Summary and Conclusion
 Summary and contributions
  A semi-supervised paradigm for learning to hash (empirical risk with an information-theoretic regularizer)
  Sequential learning for error correction
  Extension to the unsupervised case
  Easy to implement and highly scalable
 Future work
  Theoretical analysis of performance guarantees
  Weighted Hamming embedding

Thank you!