Slide 1: Beyond Locality-Sensitive Hashing. Alex Andoni (Microsoft Research); joint work with Piotr Indyk (MIT), Huy L. Nguyen (Princeton), and Ilya Razenshteyn (MIT).
Slide 2: Nearest Neighbor Search (NNS)
Slide 3: Motivation. Generic setup: points model objects (e.g. images); distance models a (dis)similarity measure. Application areas: machine learning (the k-NN rule), image/video/music recognition, deduplication, bioinformatics, etc. The distance can be Hamming, Euclidean, ... NNS is also a primitive for other problems: finding similar pairs, clustering, ... [Figure: example bit strings such as 000000, 011100, 010100, compared coordinate-wise.]
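The bit strings on the slide are compared with the Hamming distance (the number of coordinates in which they differ); a minimal sketch:

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit strings:
    count the positions where they differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

print(hamming("000000", "011100"))  # 3: positions 1, 2, 3 differ
```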
Slide 4: Approximate NNS. In the c-approximate version: given a query q and radius r, if some data point p lies within distance r of q, the algorithm may return any point within distance cr. [Figure: query q with radii r and cr around it.]
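The c-approximate contract can be phrased as a checker; a minimal sketch, assuming Euclidean distance (names are illustrative, not from the talk):

```python
import math

def is_valid_answer(q, p, points, r, c):
    """c-approximate NNS contract: if any data point lies within r of the
    query q, the returned point p must lie within c * r of q.
    (Function and parameter names are illustrative.)"""
    exists_near = any(math.dist(q, x) <= r for x in points)
    return (not exists_near) or math.dist(q, p) <= c * r
```

When no data point is within r, any answer (or none) is acceptable; this slack is what makes the approximate problem easier than exact NNS.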
Slide 5: Locality-Sensitive Hashing [Indyk-Motwani'98]. Hash the points so that a near pair (q, p) collides with a "not-so-small" probability P1, while far pairs collide with a smaller probability.
Slide 6: Locality-sensitive hash functions [Indyk-Motwani'98]. A family H is (r, cr, P1, P2)-sensitive if, for a random h in H: Pr[h(q) = h(p)] >= P1 whenever dist(q, p) <= r, and Pr[h(q) = h(p)] <= P2 whenever dist(q, p) >= cr, with P1 > P2.
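A concrete member of such a family for Hamming space is bit sampling, used in [Indyk-Motwani'98]: each hash reads k random coordinates, so strings at small Hamming distance agree on them with higher probability. A minimal sketch (names are illustrative):

```python
import random

def bit_sampling_hash(d, k, seed=None):
    """Draw one hash from the bit-sampling LSH family for {0,1}^d:
    concatenate k coordinates chosen uniformly at random. For a single
    sampled coordinate, two strings collide with probability 1 - dist/d."""
    rng = random.Random(seed)
    idx = [rng.randrange(d) for _ in range(k)]
    return lambda s: tuple(s[i] for i in idx)

h = bit_sampling_hash(6, 3, seed=0)
```

The standard next step is boosting: AND k coordinates within one table and OR over L independent tables to trade space against query time.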
Slide 7: Algorithms and Lower Bounds. [Table with columns Space, Time, Comment, Reference: upper bounds from [IM'98] and [DIIM'04, AI'06], lower bounds from [MNP'06], [OWZ'11], and [PTW'08, PTW'10]; the numeric bounds did not survive the transcript.]
Slide 8: LSH is tight... leave the rest to cell-probe lower bounds?
Slide 9: Main Result
Slide 10: A look at LSH lower bounds [O'Donnell-Wu-Zhou'11]
Slide 11: Why not an NNS lower bound?
Slide 12: Our algorithm: intuition
Slide 13: Nice configuration: "sparsity"
Slide 14: Reduction to spherical LSH
Slide 15: Two-level algorithm
Slide 16: Details
Slide 17: Practice. Practice uses data-dependent partitions! "Wherever theoreticians suggest using random dimensionality reduction, use PCA." Lots of tree variants: kd-trees, quad-trees, ball-trees, rp-trees, PCA-trees, sp-trees... These come with no guarantees (e.g., they are deterministic). Is there a better way to do partitions in practice? Why do PCA-trees work? [Abdullah-A-Kannan-Krauthgamer]: they do if the data has more structure.
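To make the random-projection vs. PCA contrast concrete, here is a sketch of one split of each kind, assuming NumPy is available; the helper names are illustrative and this is not the partitioning scheme from [Abdullah-A-Kannan-Krauthgamer]:

```python
import numpy as np

def pca_split(X):
    """PCA-tree-style split: project the centered points onto the top
    principal component and cut at the median projection."""
    Xc = X - X.mean(axis=0)
    _, vecs = np.linalg.eigh(Xc.T @ Xc)   # eigenvectors, ascending order
    proj = Xc @ vecs[:, -1]               # top principal direction
    return proj <= np.median(proj)        # boolean mask for one child

def rp_split(X, seed=0):
    """rp-tree-style split: same cut, but along a random Gaussian
    direction (data-independent, which is what theory can analyze)."""
    rng = np.random.default_rng(seed)
    proj = (X - X.mean(axis=0)) @ rng.standard_normal(X.shape[1])
    return proj <= np.median(proj)
```

The PCA split adapts to the data's dominant direction, which is the empirical advantage the slide alludes to; the random split is oblivious to the data, which is why it admits guarantees.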
Slide 18: Finale