1
Optimal Data-Dependent Hashing for Approximate Near Neighbors
Ilya Razenshteyn (CSAIL MIT), Alexandr Andoni (Simons Institute)
2
Approximate Near Neighbors (ANN)
Dataset: n points in d dimensions
Query: a point within distance 1 of some data point
Want: report a data point within distance 2 of the query
Regime: d ≈ log n
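To make the (1, 2) guarantee concrete, here is a minimal brute-force checker of the ANN contract. It is not from the talk: it assumes Euclidean distance, uses numpy, and all names are illustrative.

```python
# Hypothetical brute-force reference for the (1, 2)-ANN guarantee:
# if some data point lies within distance 1 of the query, the answer
# must be a data point within distance 2 of the query.
import numpy as np

def ann_answer_is_valid(data, query, answer):
    """Check the ANN contract for one query/answer pair."""
    dists = np.linalg.norm(data - query, axis=1)
    if dists.min() > 1.0:          # no near neighbor: any answer is acceptable
        return True
    return np.linalg.norm(answer - query) <= 2.0

# Illustrative example: n points in d ~ log n dimensions.
rng = np.random.default_rng(0)
n, d = 1024, 10                    # d ~ log2(n)
data = rng.normal(size=(n, d))
query = data[0] + rng.normal(scale=0.05, size=d)   # close to data[0]
print(ann_answer_is_valid(data, query, data[0]))    # True
```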
3
Locality-Sensitive Hashing (LSH)
Hash family on R^d: points at distance ≤ 1 collide more often (with probability ≥ p_1) than points at distance ≥ 2 (with probability ≤ p_2)
(Indyk, Motwani 1998): LSH can be used to solve ANN with space O(n^(1+ρ)) and query time O(n^ρ), where ρ depends on the gap between p_1 and p_2
Question: what is the best ρ?
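As a concrete instance of this reduction (a sketch, not taken from the slides), here is a toy version of the classic Indyk-Motwani scheme over the Hamming cube, using bit-sampling as the LSH family. The parameters k and L and all function names are illustrative; a real construction tunes them so that L ≈ n^ρ with ρ = ln(1/p_1) / ln(1/p_2).

```python
# Toy sketch of the LSH-to-ANN reduction over the Hamming cube,
# with bit-sampling as the hash family.  Parameters are illustrative.
import random
from collections import defaultdict

def build_tables(data, k, L, d, seed=0):
    rng = random.Random(seed)
    # Each of the L tables hashes by k randomly sampled coordinates.
    coords = [[rng.randrange(d) for _ in range(k)] for _ in range(L)]
    tables = [defaultdict(list) for _ in range(L)]
    for idx, x in enumerate(data):
        for t in range(L):
            key = tuple(x[i] for i in coords[t])
            tables[t][key].append(idx)
    return coords, tables

def query_ann(q, data, coords, tables, radius=2):
    # Scan the L buckets the query falls into; return any point within
    # the outer radius (here, Hamming distance 2).
    for coord, table in zip(coords, tables):
        key = tuple(q[i] for i in coord)
        for idx in table.get(key, []):
            if sum(a != b for a, b in zip(q, data[idx])) <= radius:
                return idx
    return None

# Illustrative usage on random bit vectors (parameters not tuned):
data = [[random.Random(i * 17 + j).randrange(2) for j in range(16)] for i in range(100)]
coords, tables = build_tables(data, k=4, L=8, d=16)
print(query_ann(data[0], data, coords, tables))   # index 0, or a point within distance 2
```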
4
Bounds on LSH
(Indyk, Motwani 1998): ρ = 1/2 (defined LSH and first constructions)
(Andoni, Indyk 2006): ρ = 1/4 (better LSH construction)
(O’Donnell, Wu, Zhou 2011): ρ = 1/4 is optimal!
Can one improve upon LSH for ANN?
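For intuition (an illustrative aside, not from the slides): for bit-sampling LSH on the Hamming cube with the distance-1-vs-2 gap, p_1 = 1 − 1/d and p_2 = 1 − 2/d, so ρ = ln(1/p_1)/ln(1/p_2) approaches 1/2 as d grows, matching the Indyk-Motwani bound above.

```python
# Numeric sanity check (illustrative): bit-sampling LSH gives
# rho = ln(1/p1)/ln(1/p2) -> 1/2 for the distance-1-vs-2 gap.
import math

for d in (10, 20, 40, 80):
    p1, p2 = 1 - 1 / d, 1 - 2 / d
    rho = math.log(1 / p1) / math.log(1 / p2)
    print(d, round(rho, 3))   # ratio approaches 0.5 as d grows
```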
5
Beyond LSH
(Andoni, Indyk, Nguyen, R 2014), (Andoni, R 2015): ρ = 1/7 for ANN (the best LSH gives only ρ = 1/4)
The main idea:
LSH is data-oblivious: good collision probabilities for every pair from R^d
For ANN, one of the points may be assumed to lie in the dataset
Data-dependent hashing: a hash family depends on the points
Achieve improvement for every dataset
(Andoni, R 2015): ρ = 1/7 is optimal for data-dependent hashing
See the arXiv preprint (intricate proof, but very simple algorithm!)
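The following toy snippet only illustrates the distinction between data-oblivious and data-dependent hashing (a data-dependent family may inspect the dataset before choosing its functions); it is emphatically not the Andoni-Razenshteyn construction, whose partitioning scheme is far more involved.

```python
# Toy contrast between data-oblivious and data-dependent hashing.
# NOT the construction from the talk; purely an illustration of the
# definition: a data-dependent family is chosen after seeing the data.
import numpy as np

def oblivious_hash_family(d, seed=0):
    # Random-hyperplane hash chosen independently of the dataset.
    g = np.random.default_rng(seed).normal(size=d)
    return lambda x: int(np.dot(g, x) >= 0)

def data_dependent_hash_family(data, seed=0):
    # The family is allowed to look at the points: here it centers the
    # data before projecting.  (Illustrative only.)
    center = data.mean(axis=0)
    g = np.random.default_rng(seed).normal(size=data.shape[1])
    return lambda x: int(np.dot(g, x - center) >= 0)
```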
6
Open problems
Dynamic case
Instance-optimal data-dependent hashing