President’s Day Lecture: Advanced Nearest Neighbor Search [Advanced Algorithms, Spring’17]

Announcements
Course evaluation is on CourseWorks. If you think the homework is too easy (or too hard), mark it under "appropriateness of workload".

Time-Space Trade-offs (Euclidean)
- Low space, high query time: space ≈ n, query time n^σ with σ = 2.09/c [Ind'01, Pan'06], improved to σ = O(1/c²) [AI'06].
- Medium space, medium query time: space n^(1+ρ), query time n^ρ with ρ = 1/c [IM'98, DIIM'04], improved to ρ = 1/c² [AI'06]; lower bound ρ ≥ 1/c² [MNP'06, OWZ'11]; with space n^(1+o(1/c²)), ω(1) memory lookups are needed [PTW'08, PTW'10].
- High space, low query time (1 memory lookup): space n^(4/ε²), query time O(d log n), for c = 1 + ε [KOR'98, IM'98, Pan'06]; with space n^(o(1/ε²)), ω(1) memory lookups are needed [AIP'06].

Near-linear Space for {0,1}^d [Indyk'01, Panigrahy'06]
Setting:
- close: r = d/(2c), so P1 = 1 − 1/(2c)
- far: cr = d/2, so P2 = 1/2
Algorithm (idea: sample a few buckets in the same hash table; a sketch is given below):
- use one hash table with k = log n / log(1/P2) = α·ln n sampled coordinates
- on query q: compute w = g(q) ∈ {0,1}^k
- repeat R = n^σ times: form w′ by flipping each bit w_j independently with probability 1 − P1, then look up bucket w′ and compute the distance from q to every point stored there
- stop as soon as an approximate near neighbor is found
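To make the bucket-sampling idea concrete, here is a minimal Python sketch of the scheme above in the Hamming setting; the function names, the brute-force distance check, and storing the raw points are illustrative choices, not part of the original slides.

```python
import random

def build_table(points, k):
    """One hash table: g samples k random coordinates; the bucket key is the projected bit-string."""
    d = len(points[0])
    coords = random.sample(range(d), k)                 # the hash function g (bit sampling)
    table = {}
    for idx, p in enumerate(points):
        table.setdefault(tuple(p[i] for i in coords), []).append(idx)
    return coords, table

def query(q, points, coords, table, c, r, R):
    """Probe R randomly perturbed buckets around g(q); return a point within cr of q, or None."""
    P1 = 1 - 1 / (2 * c)                                # per-coordinate collision prob. at distance r = d/(2c)
    w = [q[i] for i in coords]
    for _ in range(R):
        w2 = tuple(b ^ (random.random() < 1 - P1) for b in w)   # flip each bit w.p. 1 - P1
        for idx in table.get(w2, []):
            if sum(a != b for a, b in zip(q, points[idx])) <= c * r:
                return idx                              # approximate near neighbor found
    return None

# Illustrative parameter choice: with P2 = 1/2, k = log n / log(1/P2) is about log2(n),
# and R is about n**sigma probes with sigma = Theta(log(c)/c), as in the theorem below.
```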

Near-linear Space
Theorem: for σ = Θ(log c / c),
- Pr[find an approximate near neighbor] ≥ 0.1
- the expected runtime is O(n^σ)
Proof sketch: let p* be the near neighbor, i.e., ||q − p*|| ≤ r, and set w = g(q), t = ||w − g(p*)||_1.
- Claim 1: Pr_g[t ≤ k/c] ≥ 1/2
- Claim 2: for any far point p with ||q − p||_1 ≥ d/2, Pr_{g,w′}[w′ = g(p)] ≤ 1/n
- Claim 3: Pr[w′ = g(p*) | Claim 1] ≥ 2n^(−σ)
If w′ = g(p*) for at least one probed w′, we are guaranteed to output either p* or an approximate near neighbor.

Beyond LSH
All bounds below use space n^(1+ρ) and query time n^ρ.
- Hamming space, LSH: ρ = 1/c, i.e., ρ = 1/2 at c = 2 [IM'98]; lower bound ρ ≥ 1/c for any LSH [MNP'06, OWZ'11]
- Hamming space, beyond LSH: ρ ≈ 1/(2c − 1), i.e., ρ = 1/3 at c = 2 [AINR'14, AR'15]
- Euclidean space, LSH: ρ ≈ 1/c², i.e., ρ = 1/4 at c = 2 [AI'06]; lower bound ρ ≥ 1/c² for any LSH [MNP'06, OWZ'11]
- Euclidean space, beyond LSH: ρ ≈ 1/(2c² − 1), i.e., ρ = 1/7 at c = 2 [AINR'14, AR'15]

Data-dependent hashing
New approach: a random hash function that is chosen after seeing the given dataset, yet is still efficiently computable.
Example: the Voronoi diagram of the dataset is data-dependent, but the hash function is expensive to compute (see the toy illustration below).
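To see why the Voronoi-diagram example is data-dependent but expensive, here is a toy illustration (not from the slides; the function name and the brute-force scan are mine): the hash of a query is the index of its nearest dataset point, so evaluating the hash already costs a linear scan, i.e., as much as solving the original nearest-neighbor problem exactly.

```python
import numpy as np

def voronoi_diagram_hash(dataset, p):
    """Hash p to the Voronoi cell of the dataset it falls into, i.e., to its nearest dataset point."""
    return int(np.argmin(np.linalg.norm(dataset - p, axis=1)))   # exact scan: O(n*d) per evaluation
```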

Construction of the hash function [A.-Indyk-Nguyen-Razenshteyn'14, A.-Razenshteyn'15]
Two components:
- a nice geometric structure, for which a better LSH exists
- a reduction to that structure, which is the data-dependent part

Nice geometric structure
Points lie on a unit sphere, scaled so that cr ≈ √2, i.e., a far pair is (nearly) orthogonal; √2 is the typical distance if the dataset were random on the sphere.
Close pair: r = √2/c.
Query: at angle 45° from its near neighbor.

Alg 1: Hyperplane LSH [Charikar'02]
Sample a unit vector r uniformly and hash p to sgn⟨r, p⟩.
Then Pr[h(p) = h(q)] = 1 − α/π, where α is the angle between p and q.
In the setting above, P1 = 3/4 (close pair at 45°) and P2 = 1/2 (far pair at 90°), giving ρ ≈ 0.42. A short sketch follows below.
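For concreteness, a minimal NumPy sketch of hyperplane hashing; the function name and the batched formulation are illustrative, not from the slides.

```python
import numpy as np

def hyperplane_hash(points, k, rng=np.random.default_rng(0)):
    """Hash each row of `points` to k sign bits, one random hyperplane per bit."""
    d = points.shape[1]
    r = rng.standard_normal((k, d))               # Gaussian directions are uniform on the sphere
    return (points @ r.T > 0).astype(np.uint8)    # bit j of p is sgn<r_j, p>, encoded as 0/1

# Each bit collides for a pair at angle alpha with probability 1 - alpha/pi, so the exponent
# rho = log(1/P1)/log(1/P2) is about 0.42 for the 45-vs-90-degree setting above.
```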

Alg 2: Voronoi LSH [A.-Indyk-Nguyen-Razenshteyn'14], based on [Karger-Motwani-Sudan'94]
Sample T i.i.d. standard d-dimensional Gaussians g_1, g_2, ..., g_T.
Hash p to h(p) = argmax_{1 ≤ i ≤ T} ⟨p, g_i⟩.
T = 2 is simply hyperplane LSH. A short sketch follows below.
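Again a minimal NumPy sketch (illustrative names; not the original code):

```python
import numpy as np

def voronoi_lsh(points, T, rng=np.random.default_rng(0)):
    """Hash each row of `points` to the index of the Gaussian direction it correlates with most."""
    d = points.shape[1]
    g = rng.standard_normal((T, d))               # T i.i.d. standard d-dimensional Gaussians
    return np.argmax(points @ g.T, axis=1)        # h(p) = argmax_i <p, g_i>, a value in {0, ..., T-1}

# With T = 2 the argmax over {<p, g_1>, <p, g_2>} is decided by the sign of <p, g_1 - g_2>,
# i.e., by a single random hyperplane, recovering Alg 1.
```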

Hyperplane vs Voronoi
- Hyperplane hashing with k = 6 hyperplanes partitions space into 2^6 = 64 pieces and still has exponent ρ ≈ 0.42.
- Voronoi hashing with T = 2^k = 64 vectors achieves ρ ≈ 0.18.
Intuition: grids vs spheres. The small experiment below illustrates the gap.
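As an illustrative sanity check (not from the slides; the dimension, trial count, and helper names are arbitrary choices), the following Monte-Carlo experiment estimates ρ = log(1/P1)/log(1/P2) for both partitions, using a close pair at 45° and a far pair at 90°:

```python
import numpy as np

rng = np.random.default_rng(0)
d, trials = 64, 4000

def pair_at_angle(angle):
    """A random unit vector p and a unit vector q at the given angle from p."""
    p = rng.standard_normal(d); p /= np.linalg.norm(p)
    u = rng.standard_normal(d); u -= (u @ p) * p; u /= np.linalg.norm(u)
    return p, np.cos(angle) * p + np.sin(angle) * u

def same_hyperplane(p, q, k=6):
    r = rng.standard_normal((k, d))
    return np.all((r @ p > 0) == (r @ q > 0))       # all k sign bits agree

def same_voronoi(p, q, T=64):
    g = rng.standard_normal((T, d))
    return np.argmax(g @ p) == np.argmax(g @ q)     # same winning direction

for name, same in [("hyperplane, k=6", same_hyperplane), ("voronoi, T=64", same_voronoi)]:
    P1 = np.mean([same(*pair_at_angle(np.pi / 4)) for _ in range(trials)])  # close pair, 45 degrees
    P2 = np.mean([same(*pair_at_angle(np.pi / 2)) for _ in range(trials)])  # far pair, 90 degrees
    print(f"{name}: rho ~ {np.log(1 / P1) / np.log(1 / P2):.2f}")
```

The estimates should come out close to the numbers quoted above (roughly 0.42 for hyperplanes vs roughly 0.18 for Voronoi), up to Monte-Carlo noise.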

Reduction to nice structure (very high level)
Idea: iteratively decrease the radius of the minimum enclosing ball, or make the dataset more isotropic.
Algorithm (a simplified sketch follows below):
- find dense clusters: smaller radius, containing a large fraction of the points
- recurse on each dense cluster
- apply Voronoi LSH on the rest and recurse on each "cap" (dense clusters might reappear there)
Why is the rest ok? With no dense clusters it behaves like a "random dataset" of radius 100cr, which is even better.
Why are the clusters ok? Their radius has decreased, e.g. from 100cr to 99cr.
(*picture not to scale & dimension)
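To make the decomposition concrete, here is a simplified, runnable sketch; the greedy cluster finder, the thresholds, and the stopping rule are illustrative placeholders, not the actual [AR'15] construction.

```python
import numpy as np

def build_tree(points, idx, rng, eps=0.2, delta=0.5, T=16, depth=0, max_depth=8):
    """Recursively decompose points[idx]; return the leaf buckets (arrays of point indices)."""
    n0 = len(idx)
    if n0 <= 8 or depth >= max_depth:
        return [idx]
    P = points[idx]
    center = P.mean(axis=0)
    R = np.max(np.linalg.norm(P - center, axis=1)) + 1e-12      # current enclosing radius
    leaves, used = [], np.zeros(n0, dtype=bool)
    # Greedily carve out dense clusters: many points inside a ball of radius (sqrt(2) - eps) * R.
    for i in range(n0):
        if used[i]:
            continue
        member = (np.linalg.norm(P - P[i], axis=1) <= (np.sqrt(2) - eps) * R) & ~used
        if member.sum() >= n0 ** (1 - delta):
            used |= member                                       # dense cluster: recurse with smaller radius
            leaves += build_tree(points, idx[member], rng, eps, delta, T, depth + 1, max_depth)
    rest = idx[~used]
    if len(rest) == 0:
        return leaves
    # The remainder behaves like a (pseudo-)random spherical set: split it with one Voronoi-LSH node.
    g = rng.standard_normal((T, points.shape[1]))
    h = np.argmax((points[rest] - center) @ g.T, axis=1)
    for b in range(T):
        cap = rest[h == b]
        if len(cap):
            leaves += build_tree(points, cap, rng, eps, delta, T, depth + 1, max_depth)
    return leaves

# Usage: leaves = build_tree(X, np.arange(len(X)), np.random.default_rng(0)) for X of shape (n, d).
```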

Hash function
The resulting hash function is described by a tree (much like a hash table), rooted at the whole dataset of radius 100cr.
(*picture not to scale & dimension)

Dense clusters
Current dataset: radius R. A dense cluster:
- contains n^(1−δ) points
- lies in a ball of radius (√2 − ε)R, so it has a smaller enclosing radius: (1 − Ω(ε²))R
After we remove all dense clusters: for any point on the surface, at most n^(1−δ) points remain within distance (√2 − ε)R; the other points are essentially orthogonal to it.
When applying Cap Carving with parameters (P1, P2), the empirical number of far points colliding with the query is nP2 + n^(1−δ). As long as nP2 ≫ n^(1−δ), the "impurity" doesn't matter.
(ε and δ each give a trade-off.)

Tree recap
During a query:
- recurse into all dense clusters
- in a Voronoi-LSH node, follow just one bucket
- so the query will look into more than one leaf
How much branching? Claim: at most (n^δ + 1)^O(1/ε²):
- each branching step creates at most n^δ clusters (+1 for the remainder)
- a cluster reduces the radius by a factor (1 − Ω(ε²)), so the cluster-depth is at most 100/Ω(ε²)
Progress is made in two ways:
- cluster nodes reduce the radius
- Cap-Carving nodes reduce the number of far points (the empirical P2)
A single tree succeeds with probability ≥ n^(−1/(2c²−1) − o(1)).
(δ gives a trade-off.)

NNS: conclusion
1. Via sketches.
2. Locality Sensitive Hashing: random space partitions; better space bounds, even near-linear.
3. Data-dependent hashing: even better exponents; used a lot in practice these days.