Beyond Locality Sensitive Hashing Alex Andoni (Microsoft Research) Joint with: Piotr Indyk (MIT), Huy L. Nguyen (Princeton), Ilya Razenshteyn (MIT)



Nearest Neighbor Search (NNS)

Motivation
Generic setup: points model objects (e.g., images); distance models a (dis)similarity measure.
Application areas: machine learning (k-NN rule); image/video/music recognition, deduplication, bioinformatics, etc.
Distance can be: Hamming, Euclidean, …
Primitive for other problems: finding similar pairs, clustering, …

Approximate NNS: c-approximate
[Figure: query q, point p, and radii r and cr.]
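The slide's figure is not reproduced here, so the following is only a sketch under the standard formulation of the c-approximate r-near neighbor problem (an assumption, not text from the slides): if some data point lies within distance r of the query q, any point within distance cr is an acceptable answer. The function name `approx_near_neighbor` and the linear-scan strategy are illustrative; a linear scan is the trivial baseline that sublinear data structures aim to beat.

```python
def hamming(x, y):
    """Hamming distance between two equal-length sequences."""
    return sum(a != b for a, b in zip(x, y))


def approx_near_neighbor(points, q, r, c, dist=hamming):
    """Return any point within distance c*r of q, or None if there is none.

    Hypothetical helper: if some point lies within distance r of q, then a
    point within c*r certainly exists, so the result is not None.
    """
    for p in points:
        if dist(p, q) <= c * r:
            return p
    return None


points = [(0, 0, 0, 0), (1, 1, 1, 1)]
hit = approx_near_neighbor(points, (0, 0, 0, 1), r=1, c=2)
miss = approx_near_neighbor(points, (0, 0, 1, 1), r=0.5, c=2)
```

Here the first query has a point at distance 1, so a point is returned; the second query is at distance 2 from both points, which exceeds cr = 1, so nothing is returned.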

Locality-Sensitive Hashing [Indyk-Motwani’98]
[Figure: points q and p; collision probability labeled “not-so-small”.]

Locality sensitive hash functions [Indyk-Motwani’98]
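As one concrete instance of a locality sensitive hash family, the sketch below uses bit sampling for Hamming space: each hash function reads k randomly chosen coordinates, so close points agree on all of them with higher probability than far points. The class name `HammingLSH` and the parameters `k`, `num_tables`, and `seed` are illustrative assumptions; the slides do not specify an implementation or parameter settings.

```python
import random


def hamming(x, y):
    """Hamming distance between two equal-length sequences."""
    return sum(a != b for a, b in zip(x, y))


class HammingLSH:
    """Bit-sampling LSH index (illustrative sketch, parameters not tuned)."""

    def __init__(self, dim, k, num_tables, seed=0):
        rng = random.Random(seed)
        # Each table hashes a point by k randomly sampled coordinates.
        self.coords = [[rng.randrange(dim) for _ in range(k)]
                       for _ in range(num_tables)]
        self.tables = [{} for _ in range(num_tables)]

    def _key(self, table_index, point):
        return tuple(point[i] for i in self.coords[table_index])

    def insert(self, point):
        for t, table in enumerate(self.tables):
            table.setdefault(self._key(t, point), []).append(point)

    def query(self, q, cr):
        """Return a colliding point within distance cr of q, or None."""
        for t, table in enumerate(self.tables):
            for p in table.get(self._key(t, q), []):
                if hamming(p, q) <= cr:
                    return p
        return None


index = HammingLSH(dim=16, k=4, num_tables=3)
stored = (0,) * 16
index.insert(stored)
same = index.query(stored, cr=0)     # identical points always collide
far = index.query((1,) * 16, cr=4)   # distance 16 > cr, never reported
```

Using several independent tables raises the chance that a near point collides with the query in at least one of them, while the final distance check filters out far points that collide by chance.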

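Algorithms and Lower Bounds
[Table: columns Space, Time, Comment, Reference; only the Reference entries are recoverable.]
References, in row order: [IM’98]; [PTW’08, PTW’10]; [IM’98]; [DIIM’04, AI’06]; [MNP’06]; [OWZ’11]; [PTW’08, PTW’10]; [MNP’06]; [OWZ’11]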

LSH is tight… leave the rest to cell-probe lower bounds?

Main Result

A look at LSH lower bounds [O’Donnell-Wu-Zhou’11]

Why not NNS lower bound?

Our algorithm: intuition

Nice Configuration: “sparsity”

Reduction: into spherical LSH

Two-level algorithm

Details

Practice
Practice uses data-dependent partitions!
“wherever theoreticians suggest to use random dimensionality reduction, use PCA”
Lots of variants
Trees: kd-trees, quad-trees, ball-trees, rp-trees, PCA-trees, sp-trees…
No guarantees: e.g., they are deterministic
Is there a better way to do partitions in practice?
Why do PCA-trees work? [Abdullah-A-Kannan-Krauthgamer]: if the data has more structure
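To make "data-dependent partition" concrete, here is a minimal sketch of a single PCA-tree split: project the points onto their top principal direction and cut at the median projection. This is an illustration under assumptions, not the method analyzed in the talk; the function names, the use of power iteration, and the `iters` and `seed` parameters are all hypothetical choices made to keep the example dependency-free.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_principal_direction(points, iters=100, seed=0):
    """Approximate the top principal direction by power iteration."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    centered = [[p[i] - mean[i] for i in range(d)] for p in points]
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # One step of v <- X^T (X v), followed by normalization.
        proj = [dot(x, v) for x in centered]
        w = [sum(proj[j] * centered[j][i] for j in range(n)) for i in range(d)]
        norm = dot(w, w) ** 0.5
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return mean, v


def pca_split(points):
    """Split points at the median projection onto the top direction."""
    mean, v = top_principal_direction(points)
    scores = [dot([a - m for a, m in zip(p, mean)], v) for p in points]
    median = sorted(scores)[len(points) // 2]
    left = [p for p, s in zip(points, scores) if s < median]
    right = [p for p, s in zip(points, scores) if s >= median]
    return v, left, right


# Points spread along the x-axis with tiny variation in y.
points = [(float(x), 0.01 * (x % 2)) for x in range(10)]
direction, left, right = pca_split(points)
```

Unlike a random projection, the split direction here is computed from the data itself, which is the sense in which the partition is data-dependent; applying the split recursively to each half would yield a tree.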

Finale