Data-Dependent Hashing for Nearest Neighbor Search

Presentation transcript:

Data-Dependent Hashing for Nearest Neighbor Search
Alex Andoni (Columbia University), Ilya Razenshteyn (MIT)

Nearest Neighbor Search (NNS)
Preprocess: a set P of points
Query: given a query point q, report a point p* ∈ P with the smallest distance to q
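For reference, the exact problem can always be solved with no preprocessing by a linear scan over P. A minimal Python sketch (Euclidean distance assumed; this is the "no indexing" baseline of a later slide):

    import math

    def nearest_neighbor(P, q):
        """Exact NNS by linear scan: O(n*d) time per query, no preprocessing."""
        best, best_dist = None, math.inf
        for p in P:
            dist = math.dist(p, q)              # Euclidean distance between coordinate tuples
            if dist < best_dist:
                best, best_dist = p, dist
        return best

    # e.g. nearest_neighbor([(0, 0), (1, 3), (5, 2)], (1, 2)) returns (1, 3)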

Motivation
Generic setup: points model objects (e.g. images); distance models a (dis)similarity measure
Applications: machine learning (the k-NN rule), speeding up deep learning; core primitive for closest pair, clustering, etc.
Distances: Hamming, Euclidean, ℓ∞, edit distance, earth-mover distance, etc.

Curse of Dimensionality
All exact algorithms degrade rapidly with the dimension d:

    Algorithm                     Query time           Space
    Full indexing                 d · (log n)^O(1)     n^O(d)  (Voronoi diagram size)
    No indexing (linear scan)     O(n · d)             O(n · d)

Approximate NNS
c-approximate r-near neighbor: given a query point q, report a point p′ ∈ P s.t. ‖p′ − q‖ ≤ cr, as long as there is some point within distance r of q
Practice: used for exact NNS
Filtering: gives a set of candidates (hopefully small)

NNS algorithms
Exponential dependence on dimension: [Arya-Mount'93], [Clarkson'94], [Arya-Mount-Netanyahu-Silverman-Wu'98], [Kleinberg'97], [Har-Peled'02], [Arya-Fonseca-Mount'11], …
Linear/poly dependence on dimension: [Kushilevitz-Ostrovsky-Rabani'98], [Indyk-Motwani'98], [Indyk'98, '01], [Gionis-Indyk-Motwani'99], [Charikar'02], [Datar-Immorlica-Indyk-Mirrokni'04], [Chakrabarti-Regev'04], [Panigrahy'06], [Ailon-Chazelle'06], [A.-Indyk'06], [A.-Indyk-Nguyen-Razenshteyn'14], [A.-Razenshteyn'15], [Pagh'16], [Becker-Ducas-Gama-Laarhoven'16], …

Locality-Sensitive Hashing [Indyk-Motwani'98]
Random hash function h on R^d satisfying:
    for a close pair, when ‖q − p‖ ≤ r: Pr[h(q) = h(p)] is "high" (call it P1, "not-so-small")
    for a far pair, when ‖q − p′‖ > cr: Pr[h(q) = h(p′)] is "small" (call it P2)
Use several hash tables: n^ρ of them, where ρ = log(1/P1) / log(1/P2)
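A minimal sketch of this scheme for Hamming space, using bit-sampling hash functions: each of L tables hashes points by k randomly chosen coordinates, and L plays the role of the n^ρ tables above. The class name and parameters are illustrative, not from the talk:

    import random
    from collections import defaultdict

    class HammingLSH:
        """Toy LSH index for binary vectors under Hamming distance."""

        def __init__(self, points, k, L):
            self.points = points
            d = len(points[0])
            self.tables = []
            for _ in range(L):
                coords = random.sample(range(d), k)        # one hash function = k sampled bit positions
                table = defaultdict(list)
                for idx, p in enumerate(points):
                    table[tuple(p[i] for i in coords)].append(idx)
                self.tables.append((coords, table))

        def query(self, q, r):
            # Candidates are points colliding with q in some table; verify each candidate.
            for coords, table in self.tables:
                for idx in table.get(tuple(q[i] for i in coords), []):
                    p = self.points[idx]
                    if sum(a != b for a, b in zip(p, q)) <= r:   # use <= c*r in the approximate setting
                        return p
            return None

With per-bit collision probabilities P1 = 1 − r/d and P2 = 1 − cr/d, choosing k ≈ log n / log(1/P2) and L ≈ n^ρ tables gives the n^(1+ρ) space and n^ρ query time bounds on the next slide.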

LSH Algorithms (space n^(1+ρ), query time n^ρ)
Hamming space (sample random bits!):
    ρ = 1/c       (ρ = 1/2 for c = 2)     [IM'98]
    ρ ≥ 1/c       (lower bound)           [MNP'06, OWZ'11]
Euclidean space (other ways to use LSH):
    ρ = 1/c       (ρ = 1/2 for c = 2)     [IM'98, DIIM'04]
    ρ ≈ 1/c²      (ρ = 1/4 for c = 2)     [AI'06]
    ρ ≥ 1/c²      (lower bound)           [MNP'06, OWZ'11]
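As a sanity check on the ρ = 1/c row, the bit-sampling exponent can be computed directly from the per-bit collision probabilities (parameters below are illustrative):

    import math

    def bit_sampling_rho(d, r, c):
        p1 = 1 - r / d          # collision probability for a close pair (distance <= r)
        p2 = 1 - c * r / d      # collision probability for a far pair (distance > c*r)
        return math.log(1 / p1) / math.log(1 / p2)

    print(bit_sampling_rho(d=1000, r=10, c=2))   # ~0.497, close to 1/c = 0.5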

LSH is tight… what's next?
Lower bounds (cell probe): [A.-Indyk-Patrascu'06], [Panigrahy-Talwar-Wieder'08, '10], [Kapralov-Panigrahy'12]
Datasets with additional structure: [Clarkson'99], [Karger-Ruhl'02], [Krauthgamer-Lee'04], [Beygelzimer-Kakade-Langford'06], [Indyk-Naor'07], [Dasgupta-Sinha'13], [Abdullah-A.-Krauthgamer-Kannan'14], …
Space-time trade-offs: [Panigrahy'06], [A.-Indyk'06], [Kapralov'15]
But are we really done with basic NNS algorithms?

Beyond Locality Sensitive Hashing (space n^(1+ρ), query time n^ρ)
Hamming space:
    LSH:             ρ = 1/c            (ρ = 1/2 for c = 2)        [IM'98]
                     ρ ≥ 1/c            (LSH lower bound)          [MNP'06, OWZ'11]
    data-dependent:  complicated        (ρ = 1/2 − ε for c = 2)    [AINR'14]
                     ρ ≈ 1/(2c − 1)     (ρ = 1/3 for c = 2)        [AR'15]
Euclidean space:
    LSH:             ρ ≈ 1/c²           (ρ = 1/4 for c = 2)        [AI'06]
                     ρ ≥ 1/c²           (LSH lower bound)          [MNP'06, OWZ'11]
    data-dependent:  complicated        (ρ = 1/4 − ε for c = 2)    [AINR'14]
                     ρ ≈ 1/(2c² − 1)    (ρ = 1/7 for c = 2)        [AR'15]

Data-dependent hashing
New approach? A random hash function, chosen after seeing the given dataset, yet efficiently computable
Example: the Voronoi diagram of the dataset, but then it is expensive to compute the hash function
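A toy example of a hash function "chosen after seeing the dataset": sample a few data points as centers and hash every vector to its nearest center, i.e. a coarse Voronoi cell. All names and parameters below are illustrative:

    import math
    import random

    def voronoi_hash(points, num_centers=16, seed=0):
        """Data-dependent hash: map a vector to the index of its nearest sampled center."""
        rng = random.Random(seed)
        centers = rng.sample(points, num_centers)      # centers come from the dataset itself
        def h(x):
            return min(range(num_centers), key=lambda i: math.dist(centers[i], x))
        return h

With num_centers = n this becomes the full Voronoi diagram of the dataset, but then evaluating the hash is as expensive as a linear scan; the construction below keeps the hash function efficiently computable.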

Construction of hash function
Two components:
    a pseudo-random structure (which admits a better LSH)
    a reduction to such structure (this is the data-dependent part)
Like a (weak) "regularity lemma" for a set of points

Pseudo-random structure
Think: a random dataset on a sphere of radius cr/√2, i.e. vectors nearly perpendicular to each other, so that random points are at distance ≈ cr
Lemma: ρ = 1/(2c² − 1) via cap carving
Curvature helps
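A quick numerical illustration of this picture (numpy assumed): random points on a high-dimensional unit sphere are nearly orthogonal, so pairwise distances concentrate around √2 times the radius, which is why a sphere of radius cr/√2 puts random points at distance ≈ cr.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 400))
    X /= np.linalg.norm(X, axis=1, keepdims=True)      # 100 random points on the unit sphere in R^400

    # Pairwise distances concentrate around sqrt(2) ~ 1.414 (nearly orthogonal vectors).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    print(D[np.triu_indices(100, k=1)].mean())         # ~1.41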

Reduction to nice structure
Idea: iteratively decrease the radius of the minimum enclosing ball (say, from 100cr to 99cr)
Algorithm:
    find dense clusters: clusters of smaller radius containing a large fraction of the points
    recurse on the dense clusters (why ok? the radius decreased, which is even better)
    apply cap carving on the rest (why ok? with no dense clusters left, it looks like a "random dataset" of radius 100cr)
    recurse on each "cap" (e.g., dense clusters might reappear there)
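A highly simplified, runnable sketch of this recursion (numpy assumed; the thresholds and the random-direction split standing in for cap carving are purely illustrative, and the real construction controls radii and densities much more carefully):

    import numpy as np

    def build(points, radius, min_frac=0.1, depth=0, max_depth=8):
        """Toy decomposition: peel off dense clusters, then carve up the 'random-looking' rest."""
        points = np.asarray(points, dtype=float)
        if len(points) <= 1 or depth >= max_depth:
            return {"leaf": points}
        node = {"clusters": [], "caps": []}
        remaining = points
        # 1. Dense clusters: a ball of radius 0.99*radius around a data point that captures
        #    a large fraction of the points; recurse on it with the smaller radius.
        for center in points:
            if len(remaining) == 0:
                break
            mask = np.linalg.norm(remaining - center, axis=1) <= 0.99 * radius
            if mask.mean() >= min_frac:
                node["clusters"].append(build(remaining[mask], 0.99 * radius, min_frac, depth + 1))
                remaining = remaining[~mask]
        # 2. What is left has no dense clusters, so it behaves like a "random" dataset:
        #    split it along a random direction (a crude stand-in for cap carving) and recurse.
        if len(remaining) >= 2:
            g = np.random.default_rng(depth).standard_normal(points.shape[1])
            side = remaining @ g >= np.median(remaining @ g)
            for cap in (remaining[side], remaining[~side]):
                node["caps"].append(build(cap, radius, min_frac, depth + 1))
        elif len(remaining) == 1:
            node["caps"].append({"leaf": remaining})
        return node

The recursion naturally produces a tree of pieces, which is the tree the next slide refers to.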

Hash function
Described by a tree (like a hash table)

Practice
"My dataset is not worst-case": adapt the partition to the given dataset
    kd-trees, PCA-tree, spectral hashing, etc. [S91, McN01, VKD09, WTF08, …]
    no guarantees (performance or correctness)
Theory: assume special structure on the dataset
    low intrinsic dimension [KR'02, KL'04, BKL'06, IN'07, DS'13, …]
    structure + noise [AAKK'14]
Data-dependent hashing helps even when there is no a priori structure!

Follow-ups
Dynamicity? Dynamization techniques [Overmars-van Leeuwen'81]
Better bounds?
    For dimension d = O(log n), can get better ρ! [Becker-Ducas-Gama-Laarhoven'16], as is the case for closest pair [Alman-Williams'15]
    For dimension d > log^(1+δ) n: our ρ is optimal even for data-dependent hashing! [A.-Razenshteyn'16], in the right formalization (to rule out the Voronoi diagram): the description complexity of the hash function is n^(1−Ω(1))
Practical variant: FALCONN [A.-Indyk-Laarhoven-Razenshteyn-Schmidt'15]
Time-space trade-offs: n^ρ_q query time and n^(1+ρ_s) space [Laarhoven'15]
    Optimal* bound [A.-Laarhoven-Razenshteyn-Waingarten'16, Christiani'16]: c²·√ρ_q + (c² − 1)·√ρ_s = √(2c² − 1)
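For intuition, setting ρ_q = ρ_s = ρ in this trade-off recovers the balanced exponent ρ = 1/(2c² − 1) from [AR'15]; a one-line check:

    def rho_balanced(c):
        # c^2*sqrt(rho) + (c^2 - 1)*sqrt(rho) = sqrt(2c^2 - 1)  =>  rho = 1/(2c^2 - 1)
        return 1 / (2 * c ** 2 - 1)

    print(rho_balanced(2))   # 1/7 ~ 0.1428..., the rho = 1/7 exponent seen earlier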

Data-dependent hashing wrap-up
Main algorithmic tool: reduction from worst case to "random case"
Apply the reduction in other settings? E.g., closest pair can be solved much faster in the random case [Valiant'12, Karppa-Kaski-Kohonen'16]
Understand data-dependency for "beyond worst case"? Instance-optimal partitions?
Data-dependency to understand space partitions for harder metrics?
    Classic hard distance: ℓ∞ has no non-trivial sketch, but [Indyk'98] gets efficient NNS with approximation O(log log d)
    Tight approximation in the relevant models [ACP'08, PTW'10, KP'12]; can be thought of as data-dependent hashing!
NNS for any norm (e.g., matrix norms, EMD) or metric (edit distance)?