Locality Sensitive Hashing
Alex Andoni (Microsoft Research)
Summer School on Hashing '14
Nearest Neighbor Search (NNS)
Approximate NNS (c-approximate): if the nearest neighbor p of the query q lies at distance r, it suffices to return any point within distance cr of q.
Heuristic for Exact NNS: use the c-approximate algorithm as a heuristic for the exact problem.
Locality-Sensitive Hashing [Indyk-Motwani'98]: a random hash function h under which a close pair p, q (distance at most r) collides with some “not-so-small” probability, while a far pair (distance more than cr) collides with strictly smaller probability.
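A minimal sketch of the simplest such family, bit sampling for Hamming distance, which [Indyk-Motwani'98] analyzes: h(x) returns one uniformly random coordinate of x, so two points at Hamming distance d in dimension dim collide with probability exactly 1 − d/dim. The function name and test points below are illustrative.

```python
import random

def sample_bit_hash(dim, rng):
    """Draw one hash function from the bit-sampling LSH family for
    Hamming distance: h(x) = x[i] for a uniformly random coordinate i."""
    i = rng.randrange(dim)
    return lambda x: x[i]

# Points at Hamming distance d collide with probability 1 - d/dim,
# so a near pair collides noticeably more often than a far pair.
rng = random.Random(0)
dim = 100
p = [0] * dim
near = [1] + [0] * (dim - 1)       # Hamming distance 1 from p
far = [1] * 50 + [0] * 50          # Hamming distance 50 from p

hs = [sample_bit_hash(dim, rng) for _ in range(2000)]
near_rate = sum(h(p) == h(near) for h in hs) / len(hs)
far_rate = sum(h(p) == h(far) for h in hs) / len(hs)
# near_rate concentrates near 0.99, far_rate near 0.50
```

The gap between the two collision probabilities is small for a single function; the full algorithm amplifies it.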
Locality sensitive hash functions
Full algorithm
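The full algorithm is standard: concatenate k hash functions into one bucket key to sharpen the gap, build L independent tables, and at query time scan only the L buckets the query falls into. A hedged sketch using the bit-sampling family for Hamming distance (the class name and parameter names are illustrative):

```python
import random

class LSHIndex:
    """Sketch of the standard LSH index for Hamming distance: L tables,
    each keyed by a concatenation of k bit-sampling hashes."""

    def __init__(self, dim, k, L, seed=0):
        rng = random.Random(seed)
        # Table j hashes x to the tuple of its values on k random coordinates.
        self.coords = [[rng.randrange(dim) for _ in range(k)] for _ in range(L)]
        self.tables = [{} for _ in range(L)]
        self.points = []

    def _key(self, j, x):
        return tuple(x[i] for i in self.coords[j])

    def insert(self, x):
        idx = len(self.points)
        self.points.append(x)
        for j, table in enumerate(self.tables):
            table.setdefault(self._key(j, x), []).append(idx)

    def query(self, q, radius):
        # Scan only the L buckets q falls into; report the first stored
        # point within the given Hamming radius, or None.
        for j, table in enumerate(self.tables):
            for idx in table.get(self._key(j, q), []):
                p = self.points[idx]
                if sum(a != b for a, b in zip(p, q)) <= radius:
                    return p
        return None
```

With k large enough, far points rarely share a full key with q, while each of the L tables gives a near point another chance to collide; the analysis slides below balance k and L to get sublinear query time.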
Analysis of LSH Scheme
Analysis: Correctness
Analysis: Runtime
NNS for Euclidean space [Datar-Immorlica-Indyk-Mirrokni'04]
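The [DIIM'04] family for Euclidean distance projects onto a random Gaussian direction and quantizes into buckets of width w: h(x) = floor((<a, x> + b) / w). A small sketch; the bucket width w = 4.0, the dimension, and the trial count are arbitrary illustrative choices.

```python
import math
import random

def pstable_hash(dim, w, rng):
    """One function from the [DIIM'04] LSH family for Euclidean distance:
    h(x) = floor((<a, x> + b) / w), a ~ N(0,1)^dim, b ~ Uniform[0, w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)
    return lambda x: math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w)

# The collision probability depends only on ||p - q||, so nearby points
# land in the same bucket far more often than distant ones.
rng = random.Random(0)
dim, w = 10, 4.0
p = [0.0] * dim
near = [1.0] + [0.0] * (dim - 1)    # distance 1 from p
far = [10.0] + [0.0] * (dim - 1)    # distance 10 from p

hs = [pstable_hash(dim, w, rng) for _ in range(2000)]
near_rate = sum(h(p) == h(near) for h in hs) / len(hs)
far_rate = sum(h(p) == h(far) for h in hs) / len(hs)
# for these parameters near_rate is around 0.8 and far_rate around 0.16
```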
Optimal* LSH: regular grid → grid of balls [A-Indyk'06]
- A point p can hit empty space, so take more such grids until p falls inside a ball; this needs (too) many grids of balls.
- So start by projecting down to dimension t.
- Choice of the reduced dimension t: a trade-off between the number of hash tables and the time to hash, t^O(t).
- The analysis gives total query time d·n^(1/c² + o(1)).
Proof idea. Claim: P(r) ≈ exp(−A·r²), where P(r) is the probability of collision when ||p−q|| = r.
Intuitive proof:
- The projection approximately preserves distances [JL].
- P(r) = intersection / union of the two balls around p and q.
- P(r) ≈ the probability that a random point u lands beyond the dashed line.
- Fact (high dimensions): the x-coordinate of u has a nearly Gaussian distribution, so P(r) ≈ exp(−A·r²).
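The claim can be written out explicitly (a hedged reconstruction; the constants A and α are left implicit, as on the slide):

```latex
% Collision probability of the ball hash at distance r = ||p - q||:
P(r) \;=\; \frac{\operatorname{vol}\!\left(B(p,R)\cap B(q,R)\right)}
                {\operatorname{vol}\!\left(B(p,R)\cup B(q,R)\right)},
\qquad \|p-q\| = r .
% A random point u of B(p,R) also lies in B(q,R) when its coordinate
% along the direction p - q falls beyond the dashed line; in high
% dimension t that coordinate is nearly Gaussian, giving a tail bound
P(r) \;\approx\; \Pr\left[\, g \ge \alpha r \,\right]
     \;\approx\; e^{-A r^{2}},
\qquad g \sim N(0,1),\ A > 0 .
```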
LSH Zoo: min-hash on sets. Example: “to be or not to be” and “to Simons or not to Simons” give the word sets {be, not, or, to} and {not, or, to, Simons}; under a random ordering of the vocabulary, each set hashes to its minimum word, and the two sets collide exactly when that minimum lies in their intersection.
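For min-hash, the probability of collision equals the Jaccard similarity of the two sets, here |{not, or, to}| / |{be, not, or, to, Simons}| = 3/5. A small simulation of the slide's own example (the trial count is arbitrary):

```python
import random

def minhash(s, perm):
    """Min-wise hash: the smallest rank, under a random permutation of
    the vocabulary, among the elements of the set s."""
    return min(perm[x] for x in s)

# Word sets of the slide's two sentences.
A = {"be", "not", "or", "to"}            # "to be or not to be"
B = {"not", "or", "to", "Simons"}        # "to Simons or not to Simons"

universe = sorted(A | B)
rng = random.Random(0)
trials = 10000
coll = 0
for _ in range(trials):
    order = universe[:]
    rng.shuffle(order)                    # a fresh random permutation
    perm = {x: i for i, x in enumerate(order)}
    coll += minhash(A, perm) == minhash(B, perm)

jaccard = len(A & B) / len(A | B)        # 3 shared words out of 5 total
# coll / trials concentrates around jaccard = 0.6
```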
LSH in the wild (“safety not guaranteed”): parameters are tuned heuristically, trading fewer false positives against fewer tables.
Time-Space Trade-offs (space | query time | comment | reference):
- high space, low query time: [KOR'98, IM'98, Pan'06]
- medium space and time: [IM'98], [DIIM'04, AI'06]
- low space, high query time: [Ind'01, Pan'06]
- lower bounds: [AIP'06] (ω(1) memory lookups), [PTW'08, PTW'10] (ω(1) memory lookups), [MNP'06, OWZ'11] (1 memory lookup)
LSH is tight… leave the rest to cell-probe lower bounds?
Data-dependent Hashing! [A-Indyk-Nguyen-Razenshteyn'14]
A look at LSH lower bounds [O'Donnell-Wu-Zhou'11]
Why not an NNS lower bound?
Intuition
Nice Configuration: “sparsity”
Reduction to spherical LSH
Two-level algorithm
Details. Inside a bucket, need to ensure the “sparse” case:
1) drop all “far pairs”
2) find the minimum enclosing ball (MEB)
3) partition by “sparsity” (distance from the center)
1) Far points
2) Minimum Enclosing Ball
3) Partition by “sparsity”
Practice of NNS: data-dependent partitions.
- In practice: trees — kd-trees, quad-trees, ball-trees, rp-trees, PCA-trees, sp-trees… — often with no guarantees.
- In theory: assuming more about the data, PCA-like algorithms provably “work” [Abdullah-A-Kannan-Krauthgamer'14].
Finale
Open question: achieve
Pr[needle of length 1 is not cut] ≥ (Pr[needle of length c is not cut])^(1/c²).