Data-Dependent Hashing for Nearest Neighbor Search


1 Data-Dependent Hashing for Nearest Neighbor Search
Alex Andoni (Columbia University) Ilya Razenshteyn (MIT)

2 Nearest Neighbor Search (NNS)
Preprocess: a set P of points.
Query: given a query point q, report a point p* ∈ P with the smallest distance to q.
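For concreteness, here is a minimal brute-force baseline for the problem above (a linear scan over P); the function name and the toy data are purely illustrative, and the rest of the talk is about doing better than this when n and d are large.

```python
import numpy as np

def nearest_neighbor(P, q):
    """Linear scan: return the point p* in P with the smallest Euclidean distance to q."""
    dists = np.linalg.norm(P - q, axis=1)  # distance from q to every point in P
    return P[np.argmin(dists)]

# Toy usage: n = 1000 points in d = 128 dimensions.
P = np.random.randn(1000, 128)
q = np.random.randn(128)
p_star = nearest_neighbor(P, q)
```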

3 Motivation
Generic setup: points model objects (e.g. images); the distance models a (dis)similarity measure.
Applications: machine learning (the k-NN rule), speeding up deep learning.
Distances: Hamming, Euclidean, ℓ∞, edit distance, earthmover distance, etc.
Core primitive for: closest pair, clustering, etc.

4 Curse of Dimensionality
All exact algorithms degrade rapidly with the dimension d.
Full indexing: query time (d · log n)^O(1), space n^O(d) (Voronoi diagram size).
No indexing (linear scan): query time O(n · d).

5 Approximate NNS
c-approximate r-near neighbor: given a query point q, report a point p′ ∈ P s.t. ‖p′ − q‖ ≤ cr, as long as there is some point within distance r of q.
In practice: used as a subroutine for exact NNS.
Filtering: the structure gives a set of candidates (hopefully small).

6 NNS algorithms
Exponential dependence on dimension: [Arya-Mount’93], [Clarkson’94], [Arya-Mount-Netanyahu-Silverman-Wu’98], [Kleinberg’97], [Har-Peled’02], [Arya-Fonseca-Mount’11], …
Linear/poly dependence on dimension: [Kushilevitz-Ostrovsky-Rabani’98], [Indyk-Motwani’98], [Indyk’98, ’01], [Gionis-Indyk-Motwani’99], [Charikar’02], [Datar-Immorlica-Indyk-Mirrokni’04], [Chakrabarti-Regev’04], [Panigrahy’06], [Ailon-Chazelle’06], [A.-Indyk’06], [A.-Indyk-Nguyen-Razenshteyn’14], [A.-Razenshteyn’15], [Pagh’16], [Becker-Ducas-Gama-Laarhoven’16], …

7 Locality-Sensitive Hashing
[Indyk-Motwani’98]
Random hash function h on R^d satisfying:
for a close pair (when ‖q − p‖ ≤ r): Pr[h(q) = h(p)] = P1 is “not-so-small”
for a far pair (when ‖q − p′‖ > cr): Pr[h(q) = h(p′)] = P2 is “small”
Use several hash tables: n^ρ of them, where ρ = log(1/P1) / log(1/P2).
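A minimal sketch of the LSH framework for Hamming space, using the bit-sampling hash of the next slide; the parameters k (bits per table) and L (number of tables, ≈ n^ρ in the theory) are hard-coded placeholders here.

```python
import random
from collections import defaultdict

class BitSamplingLSH:
    """Toy LSH index for binary vectors under Hamming distance.
    Each of the L tables hashes a point by k randomly sampled coordinates;
    the theory takes L ~ n^rho with rho = log(1/P1) / log(1/P2)."""

    def __init__(self, dim, k=16, L=10):
        self.projs = [random.sample(range(dim), k) for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, x, proj):
        return tuple(x[i] for i in proj)

    def insert(self, idx, x):
        for proj, table in zip(self.projs, self.tables):
            table[self._key(x, proj)].append(idx)

    def query(self, q):
        # Union of colliding buckets = candidate set, to be filtered exactly.
        candidates = set()
        for proj, table in zip(self.projs, self.tables):
            candidates.update(table.get(self._key(q, proj), []))
        return candidates
```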

8 LSH Algorithms
Space n^(1+ρ), query time n^ρ.
Hamming space (sample random bits!):
ρ = 1/c, i.e. ρ = 1/2 for c = 2 [IM’98]
lower bound: ρ ≥ 1/c [MNP’06, OWZ’11]
Euclidean space:
ρ = 1/c, i.e. ρ = 1/2 for c = 2 [IM’98, DIIM’04]
other ways to use LSH: ρ ≈ 1/c², i.e. ρ = 1/4 for c = 2 [AI’06]
lower bound: ρ ≥ 1/c² [MNP’06, OWZ’11]
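For the Euclidean rows, a sketch of a single hash function in the style of [DIIM’04]: project onto a random Gaussian direction and cut the line into buckets of width w; the value of w is a tuning parameter and the class name is just for illustration.

```python
import numpy as np

class EuclideanLSH:
    """One Euclidean LSH function in the style of [DIIM'04]:
    h(x) = floor((<a, x> + b) / w) with a Gaussian and b uniform in [0, w)."""

    def __init__(self, dim, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.standard_normal(dim)
        self.b = rng.uniform(0.0, w)
        self.w = w

    def __call__(self, x):
        return int(np.floor((self.a @ x + self.b) / self.w))
```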

9 LSH is tight… what’s next?
Lower bounds (cell probe): [A.-Indyk-Patrascu’06, Panigrahy-Talwar-Wieder’08, ’10, Kapralov-Panigrahy’12]
Datasets with additional structure: [Clarkson’99, Karger-Ruhl’02, Krauthgamer-Lee’04, Beygelzimer-Kakade-Langford’06, Indyk-Naor’07, Dasgupta-Sinha’13, Abdullah-A.-Krauthgamer-Kannan’14, …]
Space-time trade-offs: [Panigrahy’06, A.-Indyk’06, Kapralov’15]
But are we really done with basic NNS algorithms?

10 Beyond Locality Sensitive Hashing
Space n^(1+ρ), query time n^ρ.
Hamming space:
LSH: ρ = 1/c, i.e. ρ = 1/2 for c = 2 [IM’98]; lower bound ρ ≥ 1/c [MNP’06, OWZ’11]
ρ = 1/2 − ϵ for c = 2 (complicated) [AINR’14]
ρ ≈ 1/(2c − 1), i.e. ρ = 1/3 for c = 2 [AR’15]
Euclidean space:
LSH: ρ ≈ 1/c², i.e. ρ = 1/4 for c = 2 [AI’06]; lower bound ρ ≥ 1/c² [MNP’06, OWZ’11]
ρ = 1/4 − ϵ for c = 2 (complicated) [AINR’14]
ρ ≈ 1/(2c² − 1), i.e. ρ = 1/7 for c = 2 [AR’15]

11 Data-dependent hashing
New approach: a random hash function, chosen after seeing the given dataset, that is still efficiently computable.
Example: the Voronoi diagram of the dataset is data-dependent, but it is expensive to compute the hash function (locating the cell of a query is the original NNS problem).
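To illustrate "chosen after seeing the dataset, yet efficiently computable" (and not the construction of this talk), one can hash to the nearest of a few centers sampled from the data, i.e. a coarse Voronoi partition that costs only m distance computations to evaluate; all names and the choice of m are illustrative.

```python
import numpy as np

def make_coarse_voronoi_hash(P, m=16, seed=0):
    """Data-dependent hash: pick m centers from the dataset itself and map
    any point to its nearest center. Unlike the full Voronoi diagram of P,
    whose evaluation is the original NNS problem, this costs m distances."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), size=m, replace=False)]
    def h(x):
        return int(np.argmin(np.linalg.norm(centers - x, axis=1)))
    return h
```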

12 Construction of hash function
Two components:
a pseudo-random structure (which admits a better LSH)
a reduction to such a structure (this is the data-dependent part)
Like a (weak) “regularity lemma” for a set of points.

13 Pseudo-random structure
Think: a random dataset on a sphere of radius cr/√2, with vectors (nearly) perpendicular to each other, so that random points are at distance ≈ cr.
Lemma: ρ = 1/(2c² − 1) via cap carving. Curvature helps.
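A schematic of cap carving on the unit sphere, assuming random Gaussian directions and a threshold η: each point goes to the first cap that contains it. This is only meant to convey the shape of the partition; the actual parameter choices and the analysis giving ρ = 1/(2c² − 1) are in [AR’15].

```python
import numpy as np

def cap_carving_partition(X, T=64, eta=1.0, seed=0):
    """Assign each unit vector x (a row of X) to the first spherical cap
    {y : <g_t, y> >= eta} over T random Gaussian directions g_1..g_T,
    or to a leftover bucket T if no cap contains it."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((T, X.shape[1]))
    labels = np.full(len(X), T)
    for t in range(T):
        unassigned = labels == T
        labels[unassigned & (X @ G[t] >= eta)] = t
    return labels
```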

14 Reduction to nice structure
Idea: iteratively decrease the radius of the minimum enclosing ball.
Algorithm:
find dense clusters (with smaller radius, containing a large fraction of points)
recurse on the dense clusters (dense clusters might reappear inside, hence the recursion)
apply cap carving on the rest (why is this ok? no dense clusters are left, so it looks like a “random dataset”)
recurse on each “cap” (why is this ok? the radius drops, e.g. from 100cr to 99cr, which is even better)
*picture not to scale & dimension
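A heavily simplified, runnable sketch of this recursion (not the actual algorithm or parameters): dense clusters are detected greedily as balls around data points, and a random hyperplane split stands in for cap carving on the remainder.

```python
import numpy as np

def dense_cluster(points, cluster_radius, min_frac=0.1):
    """Hypothetical 'dense cluster' test: a ball of radius cluster_radius
    around some data point containing at least a min_frac fraction of points."""
    for i in range(len(points)):
        mask = np.linalg.norm(points - points[i], axis=1) <= cluster_radius
        if mask.sum() >= min_frac * len(points):
            return mask
    return None

def build_tree(points, radius, depth=0, max_depth=8):
    """Simplified sketch of the reduction: peel off dense clusters and recurse
    on them with a smaller radius, then split the remainder (a stand-in for
    cap carving) and recurse with a slightly smaller radius."""
    if len(points) <= 8 or depth >= max_depth:
        return ("leaf", points)
    children = []
    mask = dense_cluster(points, 0.7 * radius)
    while mask is not None and mask.sum() < len(points):
        children.append(build_tree(points[mask], 0.7 * radius, depth + 1, max_depth))
        points = points[~mask]
        mask = dense_cluster(points, 0.7 * radius)
    g = np.random.randn(points.shape[1])   # random split of the 'random-like' rest
    side = points @ g >= 0
    for part in (points[side], points[~side]):
        children.append(build_tree(part, 0.99 * radius, depth + 1, max_depth))
    return ("node", children)
```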

15 Hash function
Described by a tree (like a hash table).

16 Practice
“My dataset is not worst-case”: adapt the partition to your given dataset.
Practice: kd-trees, PCA-trees, spectral hashing, etc. [S91, McN01, VKD09, WTF08, …], but no guarantees (performance or correctness).
Theory: assume special structure on the dataset, e.g. low intrinsic dimension [KR’02, KL’04, BKL’06, IN’07, DS’13, …] or structure + noise [AAKK’14].
Data-dependent hashing helps even when there is no a priori structure!

17 Follow-ups
Dynamicity? Dynamization techniques apply [Overmars-van Leeuwen’81].
Better bounds? For dimension d = O(log n), one can get a better ρ [Becker-Ducas-Gama-Laarhoven’16], as is the case for the closest-pair problem [Alman-Williams’15]. For dimension d > log^(1+δ) n, our ρ is optimal even for data-dependent hashing [A.-Razenshteyn’16], in the right formalization (to rule out the Voronoi diagram): the description complexity of the hash function is restricted to n^(1−Ω(1)).
Practical variant: FALCONN [A.-Indyk-Laarhoven-Razenshteyn-Schmidt’15].
Time-space trade-offs: n^(ρ_q) query time and n^(1+ρ_s) space along the curve c²·√(ρ_q) + (c²−1)·√(ρ_s) ≥ √(2c²−1) [Laarhoven’15]; this bound is optimal* [A.-Laarhoven-Razenshteyn-Waingarten’16, Christiani’16].
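A quick numeric sanity check of the trade-off curve as written above (the helper name is mine): on the curve, the balanced point ρ_q = ρ_s recovers 1/(2c² − 1), i.e. 1/7 for c = 2, matching [AR’15].

```python
import numpy as np

def rho_q_on_curve(rho_s, c):
    """Query exponent on the curve c^2*sqrt(rho_q) + (c^2-1)*sqrt(rho_s) = sqrt(2c^2-1)."""
    lhs = np.sqrt(2 * c**2 - 1) - (c**2 - 1) * np.sqrt(rho_s)
    return (lhs / c**2) ** 2

c = 2.0
balanced = 1.0 / (2 * c**2 - 1)
print(rho_q_on_curve(balanced, c))  # ~0.1428 = 1/7: the balanced point, as in [AR'15]
print(rho_q_on_curve(0.0, c))       # 0.4375 = 7/16: query exponent at near-linear space
```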

18 Data-dependent hashing wrap-up
Main algorithmic tool: a reduction from the worst case to the “random case”.
Apply the reduction in other settings? E.g., closest pair can be solved much faster in the random case [Valiant’12, Karppa-Kaski-Kohonen’16].
Understand data-dependency for “beyond-worst-case”? Instance-optimal partitions?
Use data-dependency to understand space partitions for harder metrics? The classic hard distance is ℓ∞: it admits no non-trivial sketch, but [Indyk’98] gets efficient NNS with approximation O(log log d); this approximation is tight in the relevant models [ACP’08, PTW’10, KP’12], and the algorithm can be thought of as data-dependent hashing!
NNS for any norm (e.g. matrix norms, EMD) or metric (edit distance)?

