Spectral Approaches to Nearest Neighbor Search Alex Andoni Joint work with:Amirali Abdullah Ravi Kannan Robi Krauthgamer.

Slides:

Advertisements

Similar presentations

Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009.

Advertisements

Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.

Algorithmic High-Dimensional Geometry 1 Alex Andoni (Microsoft Research SVC)

Spectral Approaches to Nearest Neighbor Search arXiv: Robert Krauthgamer (Weizmann Institute) Joint with: Amirali Abdullah, Alexandr Andoni, Ravi.

Overcoming the L 1 Non- Embeddability Barrier Robert Krauthgamer (Weizmann Institute) Joint work with Alexandr Andoni and Piotr Indyk (MIT)

MIT CSAIL Vision interfaces Towards efficient matching with random hashing methods… Kristen Grauman Gregory Shakhnarovich Trevor Darrell.

Dimensionality reduction. Outline From distances to points : – MultiDimensional Scaling (MDS) Dimensionality Reductions or data projections Random projections.

Dimensionality Reduction PCA -- SVD

An Overview of Machine Learning

Similarity Search in High Dimensions via Hashing

Dave Lattanzi’s RRT Algorithm. General Concept Use dictionaries for trees Create a randomized stack of nodes Iterate through stack “Extend” each tree.

Data Structures and Functional Programming Algorithms for Big Data Ramin Zabih Cornell University Fall 2012.

Nearest Neighbor Search in high-dimensional spaces Alexandr Andoni (Microsoft Research)

Large-scale matching CSE P 576 Larry Zitnick

Coherency Sensitive Hashing (CSH) Simon Korman and Shai Avidan Dept. of Electrical Engineering Tel Aviv University ICCV2011 | 13th International Conference.

Nearest Neighbor Search in High-Dimensional Spaces Alexandr Andoni (Microsoft Research Silicon Valley)

Affine-invariant Principal Components Charlie Brubaker and Santosh Vempala Georgia Tech School of Computer Science Algorithms and Randomness Center.

MACHINE LEARNING 9. Nonparametric Methods. Introduction Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2 

Principal Component Analysis

Three Papers: AUC, PFA and BIOInformatics The three papers are posted online.

1 Jun Wang, 2 Sanjiv Kumar, and 1 Shih-Fu Chang 1 Columbia University, New York, USA 2 Google Research, New York, USA Sequential Projection Learning for.

Dimension reduction : PCA and Clustering Slides by Agnieszka Juncker and Chris Workman.

1 Lecture 18 Syntactic Web Clustering CS

KNN, LVQ, SOM. Instance Based Learning K-Nearest Neighbor Algorithm (LVQ) Learning Vector Quantization (SOM) Self Organizing Maps.

Summer School on Hashing’14 Locality Sensitive Hashing Alex Andoni (Microsoft Research)

Optimal Data-Dependent Hashing for Approximate Near Neighbors

Sketching and Embedding are Equivalent for Norms Alexandr Andoni (Simons Inst. / Columbia) Robert Krauthgamer (Weizmann Inst.) Ilya Razenshteyn (MIT, now.

Dimensionality Reduction

Dimensionality Reduction. Multimedia DBs Many multimedia applications require efficient indexing in high-dimensions (time-series, images and videos, etc)

FLANN Fast Library for Approximate Nearest Neighbors

Indexing Techniques Mei-Chen Yeh.

Presented By Wanchen Lu 2/25/2013

Feature extraction 1.Introduction 2.T-test 3.Signal Noise Ratio (SNR) 4.Linear Correlation Coefficient (LCC) 5.Principle component analysis (PCA) 6.Linear.

COMMON EVALUATION FINAL PROJECT Vira Oleksyuk ECE 8110: Introduction to machine Learning and Pattern Recognition.

Beyond Locality Sensitive Hashing Alex Andoni (Microsoft Research) Joint with: Piotr Indyk (MIT), Huy L. Nguyen (Princeton), Ilya Razenshteyn (MIT)

Fast Similarity Search for Learned Metrics Prateek Jain, Brian Kulis, and Kristen Grauman Department of Computer Sciences University of Texas at Austin.

Nearest Neighbor Search in high-dimensional spaces Alexandr Andoni (Princeton/CCI → MSR SVC) Barriers II August 30, 2010.

Using Support Vector Machines to Enhance the Performance of Bayesian Face Recognition IEEE Transaction on Information Forensics and Security Zhifeng Li,

Computer Vision Lab. SNU Young Ki Baik Nonlinear Dimensionality Reduction Approach (ISOMAP, LLE)

Max-Margin Classification of Data with Absent Features Presented by Chunping Wang Machine Learning Group, Duke University July 3, 2008 by Chechik, Heitz,

Sketching and Nearest Neighbor Search (2) Alex Andoni (Columbia University) MADALGO Summer School on Streaming Algorithms 2015.

Project 11: Determining the Intrinsic Dimensionality of a Distribution Okke Formsma, Nicolas Roussis and Per Løwenborg.

Project 11: Determining the Intrinsic Dimensionality of a Distribution Okke Formsma, Nicolas Roussis and Per Løwenborg.

Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality Piotr Indyk, Rajeev Motwani The 30 th annual ACM symposium on theory of computing.

A NOVEL METHOD FOR COLOR FACE RECOGNITION USING KNN CLASSIFIER

Face recognition via sparse representation. Breakdown Problem Classical techniques New method based on sparsity Results.

Sketching, Sampling and other Sublinear Algorithms: Euclidean space: dimension reduction and NNS Alex Andoni (MSR SVC)

Fast Query-Optimized Kernel Machine Classification Via Incremental Approximate Nearest Support Vectors by Dennis DeCoste and Dominic Mazzoni International.

11 Lecture 24: MapReduce Algorithms Wrap-up. Admin PS2-4 solutions Project presentations next week – 20min presentation/team – 10 teams => 3 days – 3.

Optimal Data-Dependent Hashing for Nearest Neighbor Search Alex Andoni (Columbia University) Joint work with: Ilya Razenshteyn.

Sampling in Graphs Alexandr Andoni (Microsoft Research)

Iterative K-Means Algorithm Based on Fisher Discriminant UNIVERSITY OF JOENSUU DEPARTMENT OF COMPUTER SCIENCE JOENSUU, FINLAND Mantao Xu to be presented.

Summer School on Hashing’14 Dimension Reduction Alex Andoni (Microsoft Research)

Debrup Chakraborty Non Parametric Methods Pattern Recognition and Machine Learning.

Sketching complexity of graph cuts Alexandr Andoni joint work with: Robi Krauthgamer, David Woodruff.

Tight Lower Bounds for Data- Dependent Locality-Sensitive Hashing Alexandr Andoni (Columbia) Ilya Razenshteyn (MIT CSAIL)

Data-dependent Hashing for Similarity Search

Data Driven Resource Allocation for Distributed Learning

Sublinear Algorithmic Tools 3

Lecture 11: Nearest Neighbor Search

Spectral Approaches to Nearest Neighbor Search [FOCS 2014]

Sketching and Embedding are Equivalent for Norms

Object Modeling with Layers

Lecture 7: Dynamic sampling Dimension Reduction

Data-Dependent Hashing for Nearest Neighbor Search

Design of Hierarchical Classifiers for Efficient and Accurate Pattern Classification M N S S K Pavan Kumar Advisor : Dr. C. V. Jawahar.

Locality Sensitive Hashing

Dimension reduction : PCA and Clustering

Minwise Hashing and Efficient Search

President’s Day Lecture: Advanced Nearest Neighbor Search

Presentation transcript:

Spectral Approaches to Nearest Neighbor Search Alex Andoni Joint work with:Amirali Abdullah Ravi Kannan Robi Krauthgamer

Nearest Neighbor Search (NNS)

Motivation  Generic setup:  Points model objects (e.g. images)  Distance models (dis)similarity measure  Application areas:  machine learning: k-NN rule  speech/image/video/music recognition, vector quantization, bioinformatics, etc…  Distance can be:  Hamming,Euclidean, edit distance, Earth-mover distance, etc…  Primitive for other problems

Curse of dimensionality AlgorithmQuery timeSpace Full indexing No indexing – linear scan

Approximate NNS

NNS algorithms 6 It’s all about space partitions ! Low-dimensional  [Arya-Mount’93], [Clarkson’94], [Arya-Mount- Netanyahu-Silverman-We’98], [Kleinberg’97], [Har-Peled’02],[Arya-Fonseca-Mount’11],… High-dimensional  [Indyk-Motwani’98], [Kushilevitz-Ostrovsky- Rabani’98], [Indyk’98, ‘01], [Gionis-Indyk- Motwani’99], [Charikar’02], [Datar-Immorlica- Indyk-Mirrokni’04], [Chakrabarti-Regev’04], [Panigrahy’06], [Ailon-Chazelle’06], [A-Indyk’06], [A-Indyk-Nguyen-Razenshteyn’14], [A- Razenshteyn’14]

Low-dimensional 7

High-dimensional 8

Practice  Data-aware partitions  optimize the partition to your dataset  PCA-tree [S’91, M’01, VKD’09]  randomized kd-trees [SAH08, ML09]  spectral/PCA/semantic/WTA hashing [WTF08, CKW09, SH09, YSRL11] 9

Practice vs Theory 10  Data-aware projections often outperform (vanilla) random-projection methods  But no guarantees (correctness or performance)  JL generally optimal [Alo’03, JW’13]  Even for some NNS regimes! [AIP’06] Why do data-aware projections outperform random projections ? Algorithmic framework to study phenomenon?

Plan for the rest 11  Model  Two spectral algorithms  Conclusion

Our model 12

Model properties 13

14

Tool: PCA 15  Principal Component Analysis  like SVD  Top PCA direction: direction maximizing variance of the data

Naïve attempt via PCA 16

1 st Algorithm: intuition 17  Extract “well-captured points”  points with signal mostly inside top PCA space  should work for large fraction of points  Iterate on the rest

Iterative PCA 18

Analysis of PCA 19

Closeness of subspaces ? 20

Wedin’s sin-theta theorem 21

Additional issue: Conditioning 22  After an iteration, the noise is not random anymore!  Conditioning because of selection of points (non-captured points)  Fix: estimate top PCA subspace from a small sample of the data  Might be purely due to analysis  But does not sound like a bad idea in practice either

Performance of Iterative PCA 23

2 nd Algorithm: PCA-tree 24  Closer to algorithms used in practice

2 algorithmic modifications 25  Centering:  Need to use centered PCA (subtract average)  Otherwise errors from perturbations accumulate  Sparsification:  Need to sparsify the set of points in each node of the tree  Otherwise can get a tight cluster:  not enough variance in signal  lots of noise

Analysis 26

Wrap-up 27 Why do data-aware projections outperform random projections ? Algorithmic framework to study phenomenon?

Post-perspective 28  Can data-aware projections help in worst-case situation?  Data-aware LSH provably better than classic LSH [A-Indyk-Nguyen-Razenshteyn’14], [A-Razenshteyn]  Overall improve performance almost quadratically  But not based on (spectral) projections  random data-dependent hash function  Data-aware projections for LSH ?  Instance-optimal partitions/projections?