Optimal Data-Dependent Hashing for Nearest Neighbor Search Alex Andoni (Columbia University) Joint work with: Ilya Razenshteyn.

Slides:



Advertisements
Similar presentations
A Nonlinear Approach to Dimension Reduction Robert Krauthgamer Weizmann Institute of Science Joint work with Lee-Ad Gottlieb TexPoint fonts used in EMF.
Advertisements

Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009.
Algorithmic High-Dimensional Geometry 1 Alex Andoni (Microsoft Research SVC)
Spectral Approaches to Nearest Neighbor Search arXiv: Robert Krauthgamer (Weizmann Institute) Joint with: Amirali Abdullah, Alexandr Andoni, Ravi.
Overcoming the L 1 Non- Embeddability Barrier Robert Krauthgamer (Weizmann Institute) Joint work with Alexandr Andoni and Piotr Indyk (MIT)
Searching on Multi-Dimensional Data
MIT CSAIL Vision interfaces Towards efficient matching with random hashing methods… Kristen Grauman Gregory Shakhnarovich Trevor Darrell.
Efficiently searching for similar images (Kristen Grauman)
Cse 521: design and analysis of algorithms Time & place T, Th pm in CSE 203 People Prof: James Lee TA: Thach Nguyen Book.
Similarity Search in High Dimensions via Hashing
Fast Algorithm for Nearest Neighbor Search Based on a Lower Bound Tree Yong-Sheng Chen Yi-Ping Hung Chiou-Shann Fuh 8 th International Conference on Computer.
Data Structures and Functional Programming Algorithms for Big Data Ramin Zabih Cornell University Fall 2012.
Navigating Nets: Simple algorithms for proximity search Robert Krauthgamer (IBM Almaden) Joint work with James R. Lee (UC Berkeley)
VLSH: Voronoi-based Locality Sensitive Hashing Sung-eui Yoon Authors: Lin Loi, Jae-Pil Heo, Junghwan Lee, and Sung-Eui Yoon KAIST
Nearest Neighbor Search in high-dimensional spaces Alexandr Andoni (Microsoft Research)
Spectral Approaches to Nearest Neighbor Search Alex Andoni Joint work with:Amirali Abdullah Ravi Kannan Robi Krauthgamer.
Large-scale matching CSE P 576 Larry Zitnick
Coherency Sensitive Hashing (CSH) Simon Korman and Shai Avidan Dept. of Electrical Engineering Tel Aviv University ICCV2011 | 13th International Conference.
Nearest Neighbor Search in High-Dimensional Spaces Alexandr Andoni (Microsoft Research Silicon Valley)
Algorithms for Nearest Neighbor Search Piotr Indyk MIT.
Given by: Erez Eyal Uri Klein Lecture Outline Exact Nearest Neighbor search Exact Nearest Neighbor search Definition Definition Low dimensions Low dimensions.
Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform Nir Ailon, Bernard Chazelle (Princeton University)
KNN, LVQ, SOM. Instance Based Learning K-Nearest Neighbor Algorithm (LVQ) Learning Vector Quantization (SOM) Self Organizing Maps.
Nearest Neighbour Condensing and Editing David Claus February 27, 2004 Computer Vision Reading Group Oxford.
Summer School on Hashing’14 Locality Sensitive Hashing Alex Andoni (Microsoft Research)
Optimal Data-Dependent Hashing for Approximate Near Neighbors
Sketching and Embedding are Equivalent for Norms Alexandr Andoni (Simons Inst. / Columbia) Robert Krauthgamer (Weizmann Inst.) Ilya Razenshteyn (MIT, now.
FLANN Fast Library for Approximate Nearest Neighbors
Efficient Image Search and Retrieval using Compact Binary Codes
Indexing Techniques Mei-Chen Yeh.
Approximation algorithms for large-scale kernel methods Taher Dameh School of Computing Science Simon Fraser University March 29 th, 2010.
Beyond Locality Sensitive Hashing Alex Andoni (Microsoft Research) Joint with: Piotr Indyk (MIT), Huy L. Nguyen (Princeton), Ilya Razenshteyn (MIT)
Nearest Neighbor Paul Hsiung March 16, Quick Review of NN Set of points P Query point q Distance metric d Find p in P such that d(p,q) < d(p’,q)
Fast Similarity Search for Learned Metrics Prateek Jain, Brian Kulis, and Kristen Grauman Department of Computer Sciences University of Texas at Austin.
Minimal Loss Hashing for Compact Binary Codes
Nearest Neighbor Search in high-dimensional spaces Alexandr Andoni (Princeton/CCI → MSR SVC) Barriers II August 30, 2010.
Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.7: Instance-Based Learning Rodney Nielsen.
Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction / 16 I9 CHAIR OF COMPUTER SCIENCE 9 DATA MANAGEMENT.
Sketching and Nearest Neighbor Search (2) Alex Andoni (Columbia University) MADALGO Summer School on Streaming Algorithms 2015.
Similarity Searching in High Dimensions via Hashing Paper by: Aristides Gionis, Poitr Indyk, Rajeev Motwani.
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality Piotr Indyk, Rajeev Motwani The 30 th annual ACM symposium on theory of computing.
Sketching, Sampling and other Sublinear Algorithms: Euclidean space: dimension reduction and NNS Alex Andoni (MSR SVC)
1 Efficient Algorithms for Substring Near Neighbor Problem Alexandr Andoni Piotr Indyk MIT.
Fast Query-Optimized Kernel Machine Classification Via Incremental Approximate Nearest Support Vectors by Dennis DeCoste and Dominic Mazzoni International.
11 Lecture 24: MapReduce Algorithms Wrap-up. Admin PS2-4 solutions Project presentations next week – 20min presentation/team – 10 teams => 3 days – 3.
Summer School on Hashing’14 Dimension Reduction Alex Andoni (Microsoft Research)
KNN & Naïve Bayes Hongning Wang
Tight Lower Bounds for Data- Dependent Locality-Sensitive Hashing Alexandr Andoni (Columbia) Ilya Razenshteyn (MIT CSAIL)
Data-dependent Hashing for Similarity Search
SIMILARITY SEARCH The Metric Space Approach
Grassmannian Hashing for Subspace searching
Approximate Near Neighbors for General Symmetric Norms
Fast nearest neighbor searches in high dimensions Sami Sieranoja
Sublinear Algorithmic Tools 3
Lecture 11: Nearest Neighbor Search
Spectral Approaches to Nearest Neighbor Search [FOCS 2014]
K Nearest Neighbor Classification
Sketching and Embedding are Equivalent for Norms
Lecture 7: Dynamic sampling Dimension Reduction
Near(est) Neighbor in High Dimensions
Data-Dependent Hashing for Nearest Neighbor Search
Locality Sensitive Hashing
Fast Nearest Neighbor Search in the Hamming Space
Overcoming the L1 Non-Embeddability Barrier
CS5112: Algorithms and Data Structures for Applications
Data Mining Classification: Alternative Techniques
Lecture 15: Least Square Regression Metric Embeddings
Minwise Hashing and Efficient Search
President’s Day Lecture: Advanced Nearest Neighbor Search
LSH-based Motion Estimation
Presentation transcript:

Optimal Data-Dependent Hashing for Nearest Neighbor Search Alex Andoni (Columbia University) Joint work with: Ilya Razenshteyn

Nearest Neighbor Search (NNS) 2

Motivation  Generic setup:  Points model objects (e.g. images)  Distance models (dis)similarity measure  Application areas:  machine learning: k-NN rule  speech/image/video/music recognition, vector quantization, bioinformatics, etc…  Distances:  Hamming, Euclidean, edit distance, earthmover distance, etc…  Core primitive for: closest pair, clustering …

Curse of Dimensionality 4 AlgorithmQuery timeSpace Full indexing No indexing – linear scan

Approximate NNS 5

NNS algorithms 6 Exponential dependence on dimension  [Arya-Mount’93], [Clarkson’94], [Arya-Mount-Netanyahu-Silverman- We’98], [Kleinberg’97], [Har-Peled’02],[Arya-Fonseca-Mount’11],… Linear dependence on dimension  [Kushilevitz-Ostrovsky-Rabani’98], [Indyk-Motwani’98], [Indyk’98, ‘01], [Gionis-Indyk-Motwani’99], [Charikar’02], [Datar-Immorlica-Indyk- Mirrokni’04], [Chakrabarti-Regev’04], [Panigrahy’06], [Ailon-Chazelle’06], [A.-Indyk’06], [A.-Indyk-Nguyen-Razenshteyn’14], [A.-Razenshteyn’15], [Pagh’16],…

Locality-Sensitive Hashing 1 [Indyk-Motwani’98] “not-so-small” 7

LSH Algorithms 8 SpaceTimeExponentReference [IM’98] [IM’98, DIIM’04] Hamming space Euclidean space [MNP’06, OWZ’11] [AI’06] LSH is tight. Are we really done with basic NNS algorithms? LSH is tight. Are we really done with basic NNS algorithms?

Detour: Closest Pair Problem 9

Beyond Locality Sensitive Hashing SpaceTimeExponentReference [IM’98] [AI’06] Hamming space Euclidean space [MNP’06, OWZ’11] [AR’15] LSH 10 complicated [AINR’14] complicated [AINR’14]

New approach? 11  A random hash function, chosen after seeing the given dataset  Efficiently computable Data-dependent hashing

A look at LSH lower bounds 12 [O’Donnell-Wu-Zhou’11]

Why not NNS lower bound? 13

Construction of hash function 14  Two components:  Average-case: nicer structure  Reduction to such structure  Like a (very weak) “regularity lemma” for a set of points has better LSH data-dependent

Average-case: nicer structure 15 [A.-Indyk-Nguyen-Razenshteyn’14]

Reduction to nice structure 16  Idea: iteratively decrease the radius of minimum enclosing ball  Algorithm:  find dense clusters  with smaller radius  large fraction of points  recurse on dense clusters  apply cap carving on the rest  recurse on each “cap”  eg, dense clusters might reappear *picture not to scale & dimension Why ok?

Hash function 17  Described by a tree (like a hash table) *picture not to scale&dimension

Dense clusters ?

Tree recap 19

Fast preprocessing 20

Even more details 21

Data-dependent hashing: beyond LSH 22 Lower bounds: an opportunity to discover new algorithmic techniques