Data-Dependent Hashing for Nearest Neighbor Search


1 Data-Dependent Hashing for Nearest Neighbor Search
Alex Andoni (Columbia University) Ilya Razenshteyn (MIT)

2 Nearest Neighbor Search (NNS)
Preprocess: a set P of points.
Query: given a query point q, report a point p* ∈ P with the smallest distance to q.
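For concreteness, here is a minimal brute-force baseline for the problem above (a linear scan over P); the function name and the toy data are purely illustrative, and the rest of the talk is about doing better than this when n and d are large.

```python
import numpy as np

def nearest_neighbor(P, q):
    """Linear scan: return the point p* in P with the smallest Euclidean distance to q."""
    dists = np.linalg.norm(P - q, axis=1)  # distance from q to every point in P
    return P[np.argmin(dists)]

# Toy usage: n = 1000 points in d = 128 dimensions.
P = np.random.randn(1000, 128)
q = np.random.randn(128)
p_star = nearest_neighbor(P, q)
```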

3 Motivation
Generic setup: points model objects (e.g. images); the distance models a (dis)similarity measure.
Applications: machine learning (the k-NN rule), speeding up deep learning.
Distances: Hamming, Euclidean, ℓ∞, edit distance, earthmover distance, etc.
Core primitive for: closest pair, clustering, etc.

4 Curse of Dimensionality
All exact algorithms degrade rapidly with the dimension d.
Full indexing: query time (d · log n)^O(1), space n^O(d) (Voronoi diagram size).
No indexing (linear scan): query time O(n · d).

5 Approximate NNS
c-approximate r-near neighbor: given a query point q, report a point p′ ∈ P s.t. ‖p′ − q‖ ≤ cr, as long as there is some point within distance r of q.
In practice: used as a subroutine for exact NNS.
Filtering: the structure gives a set of candidates (hopefully small).

6 NNS algorithms
Exponential dependence on dimension: [Arya-Mount’93], [Clarkson’94], [Arya-Mount-Netanyahu-Silverman-Wu’98], [Kleinberg’97], [Har-Peled’02], [Arya-Fonseca-Mount’11], …
Linear/poly dependence on dimension: [Kushilevitz-Ostrovsky-Rabani’98], [Indyk-Motwani’98], [Indyk’98, ’01], [Gionis-Indyk-Motwani’99], [Charikar’02], [Datar-Immorlica-Indyk-Mirrokni’04], [Chakrabarti-Regev’04], [Panigrahy’06], [Ailon-Chazelle’06], [A.-Indyk’06], [A.-Indyk-Nguyen-Razenshteyn’14], [A.-Razenshteyn’15], [Pagh’16], [Becker-Ducas-Gama-Laarhoven’16], …

7 Locality-Sensitive Hashing
[Indyk-Motwani’98]
Random hash function h on R^d satisfying:
for a close pair (when ‖q − p‖ ≤ r): Pr[h(q) = h(p)] = P1 is “not-so-small”
for a far pair (when ‖q − p′‖ > cr): Pr[h(q) = h(p′)] = P2 is “small”
Use several hash tables: n^ρ of them, where ρ = log(1/P1) / log(1/P2).
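A minimal sketch of the LSH framework for Hamming space, using the bit-sampling hash of the next slide; the parameters k (bits per table) and L (number of tables, ≈ n^ρ in the theory) are hard-coded placeholders here.

```python
import random
from collections import defaultdict

class BitSamplingLSH:
    """Toy LSH index for binary vectors under Hamming distance.
    Each of the L tables hashes a point by k randomly sampled coordinates;
    the theory takes L ~ n^rho with rho = log(1/P1) / log(1/P2)."""

    def __init__(self, dim, k=16, L=10):
        self.projs = [random.sample(range(dim), k) for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, x, proj):
        return tuple(x[i] for i in proj)

    def insert(self, idx, x):
        for proj, table in zip(self.projs, self.tables):
            table[self._key(x, proj)].append(idx)

    def query(self, q):
        # Union of colliding buckets = candidate set, to be filtered exactly.
        candidates = set()
        for proj, table in zip(self.projs, self.tables):
            candidates.update(table.get(self._key(q, proj), []))
        return candidates
```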

8 LSH Algorithms
Space n^(1+ρ), query time n^ρ.
Hamming space (sample random bits!):
ρ = 1/c, i.e. ρ = 1/2 for c = 2 [IM’98]
lower bound: ρ ≥ 1/c [MNP’06, OWZ’11]
Euclidean space:
ρ = 1/c, i.e. ρ = 1/2 for c = 2 [IM’98, DIIM’04]
other ways to use LSH: ρ ≈ 1/c², i.e. ρ = 1/4 for c = 2 [AI’06]
lower bound: ρ ≥ 1/c² [MNP’06, OWZ’11]
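For the Euclidean rows, a sketch of a single hash function in the style of [DIIM’04]: project onto a random Gaussian direction and cut the line into buckets of width w; the value of w is a tuning parameter and the class name is just for illustration.

```python
import numpy as np

class EuclideanLSH:
    """One Euclidean LSH function in the style of [DIIM'04]:
    h(x) = floor((<a, x> + b) / w) with a Gaussian and b uniform in [0, w)."""

    def __init__(self, dim, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.standard_normal(dim)
        self.b = rng.uniform(0.0, w)
        self.w = w

    def __call__(self, x):
        return int(np.floor((self.a @ x + self.b) / self.w))
```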

9 LSH is tight… what’s next?
Lower bounds (cell probe): [A.-Indyk-Patrascu’06, Panigrahy-Talwar-Wieder’08, ’10, Kapralov-Panigrahy’12]
Datasets with additional structure: [Clarkson’99, Karger-Ruhl’02, Krauthgamer-Lee’04, Beygelzimer-Kakade-Langford’06, Indyk-Naor’07, Dasgupta-Sinha’13, Abdullah-A.-Krauthgamer-Kannan’14, …]
Space-time trade-offs: [Panigrahy’06, A.-Indyk’06, Kapralov’15]
But are we really done with basic NNS algorithms?

10 Beyond Locality Sensitive Hashing
Space n^(1+ρ), query time n^ρ.
Hamming space:
LSH: ρ = 1/c, i.e. ρ = 1/2 for c = 2 [IM’98]; lower bound ρ ≥ 1/c [MNP’06, OWZ’11]
ρ = 1/2 − ϵ for c = 2 (complicated) [AINR’14]
ρ ≈ 1/(2c − 1), i.e. ρ = 1/3 for c = 2 [AR’15]
Euclidean space:
LSH: ρ ≈ 1/c², i.e. ρ = 1/4 for c = 2 [AI’06]; lower bound ρ ≥ 1/c² [MNP’06, OWZ’11]
ρ = 1/4 − ϵ for c = 2 (complicated) [AINR’14]
ρ ≈ 1/(2c² − 1), i.e. ρ = 1/7 for c = 2 [AR’15]

11 Data-dependent hashing
New approach: a random hash function, chosen after seeing the given dataset, that is still efficiently computable.
Example: the Voronoi diagram of the dataset is data-dependent, but it is expensive to compute the hash function (locating the cell of a query is the original NNS problem).
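To illustrate "chosen after seeing the dataset, yet efficiently computable" (and not the construction of this talk), one can hash to the nearest of a few centers sampled from the data, i.e. a coarse Voronoi partition that costs only m distance computations to evaluate; all names and the choice of m are illustrative.

```python
import numpy as np

def make_coarse_voronoi_hash(P, m=16, seed=0):
    """Data-dependent hash: pick m centers from the dataset itself and map
    any point to its nearest center. Unlike the full Voronoi diagram of P,
    whose evaluation is the original NNS problem, this costs m distances."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), size=m, replace=False)]
    def h(x):
        return int(np.argmin(np.linalg.norm(centers - x, axis=1)))
    return h
```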

12 Construction of hash function
Two components:
a pseudo-random structure (which admits a better LSH)
a reduction to such a structure (this is the data-dependent part)
Like a (weak) “regularity lemma” for a set of points.

13 Pseudo-random structure
Think: a random dataset on a sphere of radius cr/√2, with vectors (nearly) perpendicular to each other, so that random points are at distance ≈ cr.
Lemma: ρ = 1/(2c² − 1) via cap carving. Curvature helps.
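A schematic of cap carving on the unit sphere, assuming random Gaussian directions and a threshold η: each point goes to the first cap that contains it. This is only meant to convey the shape of the partition; the actual parameter choices and the analysis giving ρ = 1/(2c² − 1) are in [AR’15].

```python
import numpy as np

def cap_carving_partition(X, T=64, eta=1.0, seed=0):
    """Assign each unit vector x (a row of X) to the first spherical cap
    {y : <g_t, y> >= eta} over T random Gaussian directions g_1..g_T,
    or to a leftover bucket T if no cap contains it."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((T, X.shape[1]))
    labels = np.full(len(X), T)
    for t in range(T):
        unassigned = labels == T
        labels[unassigned & (X @ G[t] >= eta)] = t
    return labels
```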

14 Reduction to nice structure
Idea: iteratively decrease the radius of the minimum enclosing ball.
Algorithm:
find dense clusters (with smaller radius, containing a large fraction of points)
recurse on the dense clusters (dense clusters might reappear inside, hence the recursion)
apply cap carving on the rest (why is this ok? no dense clusters are left, so it looks like a “random dataset”)
recurse on each “cap” (why is this ok? the radius drops, e.g. from 100cr to 99cr, which is even better)
*picture not to scale & dimension
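A heavily simplified, runnable sketch of this recursion (not the actual algorithm or parameters): dense clusters are detected greedily as balls around data points, and a random hyperplane split stands in for cap carving on the remainder.

```python
import numpy as np

def dense_cluster(points, cluster_radius, min_frac=0.1):
    """Hypothetical 'dense cluster' test: a ball of radius cluster_radius
    around some data point containing at least a min_frac fraction of points."""
    for i in range(len(points)):
        mask = np.linalg.norm(points - points[i], axis=1) <= cluster_radius
        if mask.sum() >= min_frac * len(points):
            return mask
    return None

def build_tree(points, radius, depth=0, max_depth=8):
    """Simplified sketch of the reduction: peel off dense clusters and recurse
    on them with a smaller radius, then split the remainder (a stand-in for
    cap carving) and recurse with a slightly smaller radius."""
    if len(points) <= 8 or depth >= max_depth:
        return ("leaf", points)
    children = []
    mask = dense_cluster(points, 0.7 * radius)
    while mask is not None and mask.sum() < len(points):
        children.append(build_tree(points[mask], 0.7 * radius, depth + 1, max_depth))
        points = points[~mask]
        mask = dense_cluster(points, 0.7 * radius)
    g = np.random.randn(points.shape[1])   # random split of the 'random-like' rest
    side = points @ g >= 0
    for part in (points[side], points[~side]):
        children.append(build_tree(part, 0.99 * radius, depth + 1, max_depth))
    return ("node", children)
```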

15 Hash function
Described by a tree (like a hash table).

16 Practice
“My dataset is not worst-case”: adapt the partition to your given dataset.
Practice: kd-trees, PCA-trees, spectral hashing, etc. [S91, McN01, VKD09, WTF08, …], but no guarantees (performance or correctness).
Theory: assume special structure on the dataset, e.g. low intrinsic dimension [KR’02, KL’04, BKL’06, IN’07, DS’13, …] or structure + noise [AAKK’14].
Data-dependent hashing helps even when there is no a priori structure!

17 Follow-ups
Dynamicity? Dynamization techniques apply [Overmars-van Leeuwen’81].
Better bounds? For dimension d = O(log n), one can get a better ρ [Becker-Ducas-Gama-Laarhoven’16], as is the case for the closest-pair problem [Alman-Williams’15]. For dimension d > log^(1+δ) n, our ρ is optimal even for data-dependent hashing [A.-Razenshteyn’16], in the right formalization (to rule out the Voronoi diagram): the description complexity of the hash function is restricted to n^(1−Ω(1)).
Practical variant: FALCONN [A.-Indyk-Laarhoven-Razenshteyn-Schmidt’15].
Time-space trade-offs: n^(ρ_q) query time and n^(1+ρ_s) space along the curve c²·√(ρ_q) + (c²−1)·√(ρ_s) ≥ √(2c²−1) [Laarhoven’15]; this bound is optimal* [A.-Laarhoven-Razenshteyn-Waingarten’16, Christiani’16].
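A quick numeric sanity check of the trade-off curve as written above (the helper name is mine): on the curve, the balanced point ρ_q = ρ_s recovers 1/(2c² − 1), i.e. 1/7 for c = 2, matching [AR’15].

```python
import numpy as np

def rho_q_on_curve(rho_s, c):
    """Query exponent on the curve c^2*sqrt(rho_q) + (c^2-1)*sqrt(rho_s) = sqrt(2c^2-1)."""
    lhs = np.sqrt(2 * c**2 - 1) - (c**2 - 1) * np.sqrt(rho_s)
    return (lhs / c**2) ** 2

c = 2.0
balanced = 1.0 / (2 * c**2 - 1)
print(rho_q_on_curve(balanced, c))  # ~0.1428 = 1/7: the balanced point, as in [AR'15]
print(rho_q_on_curve(0.0, c))       # 0.4375 = 7/16: query exponent at near-linear space
```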

18 Data-dependent hashing wrap-up
Main algorithmic tool: a reduction from the worst case to the “random case”.
Apply the reduction in other settings? E.g., closest pair can be solved much faster in the random case [Valiant’12, Karppa-Kaski-Kohonen’16].
Understand data-dependency for “beyond-worst-case”? Instance-optimal partitions?
Use data-dependency to understand space partitions for harder metrics? The classic hard distance is ℓ∞: it admits no non-trivial sketch, but [Indyk’98] gets efficient NNS with approximation O(log log d); this approximation is tight in the relevant models [ACP’08, PTW’10, KP’12], and the algorithm can be thought of as data-dependent hashing!
NNS for any norm (e.g. matrix norms, EMD) or metric (edit distance)?

