1
Optimal Data-Dependent Hashing for Approximate Near Neighbors
Ilya Razenshteyn (CSAIL MIT), Alexandr Andoni (Simons Institute)
2
Approximate Near Neighbors (ANN)
Dataset: n points in d dimensions
Query: a point within distance 1 of some data point
Want: report a data point within distance 2 of the query
Regime: d ≈ log n
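To make the (1, 2) guarantee concrete, here is a minimal brute-force checker of the ANN contract. It is not from the talk: it assumes Euclidean distance, uses numpy, and all names are illustrative.

```python
# Hypothetical brute-force reference for the (1, 2)-ANN guarantee:
# if some data point lies within distance 1 of the query, the answer
# must be a data point within distance 2 of the query.
import numpy as np

def ann_answer_is_valid(data, query, answer):
    """Check the ANN contract for one query/answer pair."""
    dists = np.linalg.norm(data - query, axis=1)
    if dists.min() > 1.0:          # no near neighbor: any answer is acceptable
        return True
    return np.linalg.norm(answer - query) <= 2.0

# Illustrative example: n points in d ~ log n dimensions.
rng = np.random.default_rng(0)
n, d = 1024, 10                    # d ~ log2(n)
data = rng.normal(size=(n, d))
query = data[0] + rng.normal(scale=0.05, size=d)   # close to data[0]
print(ann_answer_is_valid(data, query, data[0]))    # True
```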
3
Locality-Sensitive Hashing (LSH)
Hash family on R^d: points at distance ≤ 1 collide more often (with probability ≥ p_1) than points at distance ≥ 2 (with probability ≤ p_2)
(Indyk, Motwani 1998): LSH can be used to solve ANN with space O(n^(1+ρ)) and query time O(n^ρ), where ρ depends on the gap between p_1 and p_2
Question: what is the best ρ?
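As a concrete instance of this reduction (a sketch, not taken from the slides), here is a toy version of the classic Indyk-Motwani scheme over the Hamming cube, using bit-sampling as the LSH family. The parameters k and L and all function names are illustrative; a real construction tunes them so that L ≈ n^ρ with ρ = ln(1/p_1) / ln(1/p_2).

```python
# Toy sketch of the LSH-to-ANN reduction over the Hamming cube,
# with bit-sampling as the hash family.  Parameters are illustrative.
import random
from collections import defaultdict

def build_tables(data, k, L, d, seed=0):
    rng = random.Random(seed)
    # Each of the L tables hashes by k randomly sampled coordinates.
    coords = [[rng.randrange(d) for _ in range(k)] for _ in range(L)]
    tables = [defaultdict(list) for _ in range(L)]
    for idx, x in enumerate(data):
        for t in range(L):
            key = tuple(x[i] for i in coords[t])
            tables[t][key].append(idx)
    return coords, tables

def query_ann(q, data, coords, tables, radius=2):
    # Scan the L buckets the query falls into; return any point within
    # the outer radius (here, Hamming distance 2).
    for coord, table in zip(coords, tables):
        key = tuple(q[i] for i in coord)
        for idx in table.get(key, []):
            if sum(a != b for a, b in zip(q, data[idx])) <= radius:
                return idx
    return None

# Illustrative usage on random bit vectors (parameters not tuned):
data = [[random.Random(i * 17 + j).randrange(2) for j in range(16)] for i in range(100)]
coords, tables = build_tables(data, k=4, L=8, d=16)
print(query_ann(data[0], data, coords, tables))   # index 0, or a point within distance 2
```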
4
Bounds on LSH
(Indyk, Motwani 1998): ρ = 1/2 (defined LSH and first constructions)
(Andoni, Indyk 2006): ρ = 1/4 (better LSH construction)
(O’Donnell, Wu, Zhou 2011): ρ = 1/4 is optimal!
Can one improve upon LSH for ANN?
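For intuition (an illustrative aside, not from the slides): for bit-sampling LSH on the Hamming cube with the distance-1-vs-2 gap, p_1 = 1 − 1/d and p_2 = 1 − 2/d, so ρ = ln(1/p_1)/ln(1/p_2) approaches 1/2 as d grows, matching the Indyk-Motwani bound above.

```python
# Numeric sanity check (illustrative): bit-sampling LSH gives
# rho = ln(1/p1)/ln(1/p2) -> 1/2 for the distance-1-vs-2 gap.
import math

for d in (10, 20, 40, 80):
    p1, p2 = 1 - 1 / d, 1 - 2 / d
    rho = math.log(1 / p1) / math.log(1 / p2)
    print(d, round(rho, 3))   # ratio approaches 0.5 as d grows
```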
5
Beyond LSH
(Andoni, Indyk, Nguyen, R 2014), (Andoni, R 2015): ρ = 1/7 for ANN (the best LSH gives only ρ = 1/4)
The main idea:
LSH is data-oblivious: good collision probabilities for every pair from R^d
For ANN, one of the points may be assumed to lie in the dataset
Data-dependent hashing: a hash family depends on the points
Achieve improvement for every dataset
(Andoni, R 2015): ρ = 1/7 is optimal for data-dependent hashing
See the arXiv preprint (intricate proof, but very simple algorithm!)
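The following toy snippet only illustrates the distinction between data-oblivious and data-dependent hashing (a data-dependent family may inspect the dataset before choosing its functions); it is emphatically not the Andoni-Razenshteyn construction, whose partitioning scheme is far more involved.

```python
# Toy contrast between data-oblivious and data-dependent hashing.
# NOT the construction from the talk; purely an illustration of the
# definition: a data-dependent family is chosen after seeing the data.
import numpy as np

def oblivious_hash_family(d, seed=0):
    # Random-hyperplane hash chosen independently of the dataset.
    g = np.random.default_rng(seed).normal(size=d)
    return lambda x: int(np.dot(g, x) >= 0)

def data_dependent_hash_family(data, seed=0):
    # The family is allowed to look at the points: here it centers the
    # data before projecting.  (Illustrative only.)
    center = data.mean(axis=0)
    g = np.random.default_rng(seed).normal(size=data.shape[1])
    return lambda x: int(np.dot(g, x - center) >= 0)
```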
6
Open problems
Dynamic case
Instance-optimal data-dependent hashing