Nearest Neighbor Search in high-dimensional spaces Alexandr Andoni (Microsoft Research)
Nearest Neighbor Search (NNS). Preprocess: a set D of points in R^d. Query: given a new point q, report a point p ∈ D with the smallest distance to q.
Motivation. Generic setup: points model objects (e.g. images); distance models a (dis)similarity measure. Application areas: machine learning, data mining, speech recognition, image/video/music clustering, bioinformatics, etc. The distance can be: Euclidean, Hamming, ℓ_∞, edit distance, Ulam, Earth-mover distance, etc. Primitive for other problems: find the closest pair in a set D, MST, clustering, …
Plan for today 1. NNS for “basic” distances: LSH 2. NNS for “advanced” distances: embeddings
2D case: compute the Voronoi diagram; given a query q, perform point location. Performance: space O(n), query time O(log n).
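In practice, the low-dimensional case is usually handled with an off-the-shelf structure; a minimal sketch using a k-d tree (a practical stand-in for Voronoi point location, not the same construction as above):

```python
# Exact 2D nearest neighbor via a k-d tree: a practical stand-in for
# "Voronoi diagram + point location" in low dimensions.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
D = rng.random((1000, 2))      # n = 1000 points in the plane
tree = cKDTree(D)              # preprocessing

q = rng.random(2)              # query point
dist, idx = tree.query(q)      # exact nearest neighbor of q in D
print(idx, dist, D[idx])
```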
High-dimensional case: all exact algorithms degrade rapidly with the dimension d. When d is high, the state of the art is unsatisfactory: even in practice, query time tends to be linear in n.
Full indexing: query time O(d·log n), space n^O(d) (Voronoi diagram size).
No indexing (linear scan): query time O(dn).
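The "no indexing" row is just brute force; a minimal sketch of the O(dn)-per-query baseline:

```python
import numpy as np

def linear_scan_nn(D, q):
    """Brute-force NNS: O(dn) time per query, O(dn) space (just the data)."""
    dists = np.linalg.norm(D - q, axis=1)   # distances to all n points
    i = int(np.argmin(dists))
    return i, float(dists[i])

rng = np.random.default_rng(1)
D = rng.standard_normal((10_000, 128))      # n = 10,000 points, d = 128
q = rng.standard_normal(128)
print(linear_scan_nn(D, q))
```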
Approximate NNS. r-near neighbor: given a new point q, report a point p ∈ D s.t. ||p−q|| ≤ r. c-approximate: report a point p ∈ D s.t. ||p−q|| ≤ cr, as long as there exists a point at distance ≤ r.
Approximation Algorithms for NNS. A vast literature: with exp(d) space or Ω(n) time: [Arya-Mount et al.], [Kleinberg'97], [Har-Peled'02], … With poly(n) space and o(n) time: [Kushilevitz-Ostrovsky-Rabani'98], [Indyk-Motwani'98], [Indyk'98, '01], [Gionis-Indyk-Motwani'99], [Charikar'02], [Datar-Immorlica-Indyk-Mirrokni'04], [Chakrabarti-Regev'04], [Panigrahy'06], [Ailon-Chazelle'06], [A-Indyk'06], …
The landscape: algorithms (space / time / comment / reference)
Space: poly(n). Query: logarithmic. Space n^{4/ε²}+nd, query O(d·log n), c = 1+ε [KOR'98, IM'98].
Space: small poly (close to linear). Query: poly (sublinear). Space n^{1+ρ}+nd, query dn^ρ, with ρ ≈ 1/c [IM'98, Cha'02, DIIM'04] and ρ = 1/c²+o(1) [AI'06].
Space: near-linear. Query: poly (sublinear). Space nd·log n, query dn^ρ, with ρ = 2.09/c [Ind'01, Pan'06] and ρ = O(1/c²) [AI'06].
Locality-Sensitive Hashing [Indyk-Motwani'98]. Random hash function g: R^d → Z s.t. for any points p, q: if ||p−q|| ≤ r, then Pr[g(p)=g(q)] is "high" (≥ P_1); if ||p−q|| > cr, then Pr[g(p)=g(q)] is "small" (≤ P_2). Use several hash tables: n^ρ of them, where ρ = log(1/P_1) / log(1/P_2).
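A minimal multi-table LSH index sketch under these definitions (the hash family is left abstract; `make_g` is a hypothetical constructor that returns one fresh hash function g, and the number of tables L plays the role of n^ρ):

```python
import collections
import numpy as np

class LSHIndex:
    """Toy LSH index: L hash tables, each keyed by one random g."""
    def __init__(self, points, make_g, L):
        self.points = np.asarray(points)
        self.gs = [make_g() for _ in range(L)]            # one g per table
        self.tables = []
        for g in self.gs:
            table = collections.defaultdict(list)
            for i, p in enumerate(self.points):
                table[g(p)].append(i)                      # bucket point i
            self.tables.append(table)

    def query(self, q):
        # Candidates = points colliding with q in at least one table.
        candidates = set()
        for g, table in zip(self.gs, self.tables):
            candidates.update(table.get(g(q), []))
        if not candidates:
            return None
        cand = list(candidates)
        dists = np.linalg.norm(self.points[cand] - q, axis=1)
        return cand[int(np.argmin(dists))]                 # closest candidate
```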
Example of hash functions: grids [Datar-Immorlica-Indyk-Mirrokni'04]. Pick a regular grid, shift and rotate it randomly. Hash function: g(p) = index of the cell of p. Gives ρ ≈ 1/c.
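A hedged sketch of one such hash function: a randomly shifted grid of cell side w (the random rotation is omitted for brevity); the returned g can be plugged into the `make_g` slot of the index sketch above:

```python
import numpy as np

def make_grid_hash(d, w, rng=None):
    """Randomly shifted axis-aligned grid in R^d with cell side w."""
    rng = rng or np.random.default_rng()
    shift = rng.uniform(0.0, w, size=d)        # random shift of the grid
    def g(p):
        # Index of the grid cell containing p, as a hashable tuple.
        return tuple(np.floor((np.asarray(p) + shift) / w).astype(int))
    return g

# Usage with the toy index above (illustrative parameters):
# index = LSHIndex(D, make_g=lambda: make_grid_hash(d=128, w=4.0), L=20)
```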
Near-Optimal LSH [A-Indyk'06]: regular grid → grid of balls. p can hit empty space, so take more such grids until p is in a ball. This needs (too) many grids of balls, so start by reducing the dimension to t. The analysis gives ρ = 1/c² + o(1). Choice of the reduced dimension t: a tradeoff between the number of hash tables, n^ρ, and the time to hash, t^O(t). Total query time: dn^{1/c²+o(1)}.
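The "reduce dimension to t" step is typically a random Johnson-Lindenstrauss-style projection; a minimal sketch under that assumption:

```python
import numpy as np

def random_projection(d, t, rng=None):
    """JL-style map R^d -> R^t that approximately preserves distances."""
    rng = rng or np.random.default_rng()
    A = rng.standard_normal((t, d)) / np.sqrt(t)
    return lambda p: A @ np.asarray(p)
```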
Proof idea. Claim: P(r) ≈ exp(−A·r²), where P(r) = probability of collision when ||p−q|| = r. Intuitive proof (ignoring the effects of reducing the dimension): P(r) = intersection / union ≈ probability that a random point u lies beyond the dashed line. The x-coordinate of u has a nearly Gaussian distribution, so P(r) ≈ exp(−A·r²).
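Assuming the claim P(r) ≈ exp(−A·r²), the exponent ρ works out to 1/c² up to lower-order terms:

```latex
\rho \;=\; \frac{\log\bigl(1/P(1)\bigr)}{\log\bigl(1/P(c)\bigr)}
     \;\approx\; \frac{A \cdot 1^2}{A \cdot c^2}
     \;=\; \frac{1}{c^2}.
```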
The landscape: lower bounds (space / time / comment / reference)
Space: poly(n). Query: logarithmic. Upper bound: space n^{4/ε²}+nd, query O(d·log n), c = 1+ε [KOR'98, IM'98]. Lower bound: with n^{o(1/ε²)} space, ω(1) memory lookups are needed [AIP'06].
Space: small poly (close to linear). Query: poly (sublinear). Upper bound: space n^{1+ρ}+nd, query dn^ρ, ρ ≈ 1/c [IM'98, Cha'02, DIIM'04], ρ = 1/c²+o(1) [AI'06]. Lower bound: ρ ≥ 1/c² [MNP'06, OWZ'10].
Space: near-linear. Query: poly (sublinear). Upper bound: space nd·log n, query dn^ρ, ρ = 2.09/c [Ind'01, Pan'06], ρ = O(1/c²) [AI'06]. Lower bound: with n^{1+o(1/c²)} space, ω(1) memory lookups are needed [PTW'08, PTW'10].
Open Question #1: Design a space partitioning of R^t that is efficient: point location in poly(t) time; and qualitative: regions are "sphere-like", i.e. [Prob. needle of length c is cut] / [Prob. needle of length 1 is cut] ≥ c².
LSH beyond NNS: approximating kernel spaces (obliviously). Problem: for x, y ∈ R^d, can define the inner product K(x,y) = e^{−||x−y||}. Implicitly, this means K(x,y) = ϕ(x)·ϕ(y). Can we obtain an explicit and efficient ϕ? Approximately? (Can't do it exactly.) Yes, for some kernels, via LSH: e.g., map ϕ'(x) = (r_1(g_1(x)), r_2(g_2(x)), …), where the g_i's are LSH functions on R^d and the r_i's map into random ±1. Get: ±ε approximation in O(ε^{−2}·log n) dimensions. Sketching (≈ dimensionality reduction in a computational space) [KOR'98, …]. [A-Indyk'07, Rahimi-Recht'07, A'09]
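A hedged sketch of that map: each coordinate of ϕ'(x) is a random ±1 sign applied to the bucket id of one LSH function, so E[ϕ'(x)·ϕ'(y)] equals the LSH collision probability, a kernel that decays with ||x−y|| (only a stand-in for e^{−||x−y||}; the function name and parameters below are illustrative):

```python
import numpy as np

def lsh_kernel_features(k, d, w, rng=None):
    """phi'(x) = (r_1(g_1(x)), ..., r_k(g_k(x))) / sqrt(k).

    g_i: 1D LSH bucket h_i(x) = floor((a_i . x + b_i) / w).
    r_i: pseudo-random +/-1 per bucket (derived from a per-coordinate seed).
    Then E[phi'(x) . phi'(y)] = Pr[g(x) = g(y)], which decays with ||x - y||.
    """
    rng = rng or np.random.default_rng()
    A = rng.standard_normal((k, d))
    B = rng.uniform(0.0, w, size=k)
    seeds = rng.integers(0, 2**31, size=k)

    def phi(x):
        buckets = np.floor((A @ np.asarray(x) + B) / w).astype(int)
        signs = np.array([1.0 if hash((int(b), int(s))) % 2 == 0 else -1.0
                          for b, s in zip(buckets, seeds)])
        return signs / np.sqrt(k)

    return phi

# phi = lsh_kernel_features(k=500, d=64, w=4.0)
# np.dot(phi(x), phi(y))  ~  Pr[g(x) = g(y)]
```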
Plan for today 1. NNS for basic distances 2. NNS for advanced distances: embeddings NNS beyond LSH
Distances, so far. LSH is good for Hamming and Euclidean:
Hamming (ℓ_1): space n^{1+ρ}+nd, query dn^ρ, ρ = 1/c [IM'98, Cha'02, DIIM'04]; lower bound ρ ≥ 1/c [MNP'06, OWZ'10, PTW'08-'10].
Euclidean (ℓ_2): space n^{1+ρ}+nd, query dn^ρ, ρ ≈ 1/c² [AI'06]; lower bound ρ ≥ 1/c² [MNP'06, OWZ'10, PTW'08-'10].
How about other distances (not ℓ_p's)?
Earth-Mover Distance (EMD). Given two sets A, B of points, EMD(A,B) = the min-cost bipartite matching between A and B. Points can be in the plane, in ℓ_2^d, … Applications: image search. Images courtesy of Kristen Grauman (UT Austin).
Reductions via embeddings. ℓ_1 = real space with distance ||x−y||_1 = ∑_i |x_i − y_i|. For each X ∈ M, associate a vector f(X), such that for all X, Y ∈ M, ||f(X) − f(Y)||_2 approximates the original distance between X and Y, up to some distortion (approximation). Then one can use NNS for Euclidean space! Can also consider other "easy" distances between f(x), f(y). Most popular host: ℓ_1 ≡ Hamming.
Earth-Mover Distance over 2D into ℓ_1 [Cha02, IT03]. Sets of size s in a [1…s]×[1…s] box. Embedding of a set A: impose a randomly-shifted grid; each grid cell c gives a coordinate: f(A)_c = # points in the cell c. Subpartition the grid recursively, and assign new coordinates for each new cell (on all levels). Distortion: O(log s).
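A hedged sketch of this embedding for planar point sets (the per-level scaling by the cell side is a standard choice not spelled out on the slide; the same random shift must be reused for every set being compared):

```python
import numpy as np

def make_emd_embedding(s, rng=None):
    """Embed s-sized point sets in the [0,s] x [0,s] box into sparse l_1."""
    rng = rng or np.random.default_rng()
    shift = rng.uniform(0.0, s, size=2)          # one shift, shared by all sets

    def f(A):
        coords = {}
        side = float(s)
        while side >= 1.0:                       # all levels of the recursive grid
            for p in A:
                cell = tuple(np.floor((np.asarray(p) + shift) / side).astype(int))
                key = (side, cell)
                coords[key] = coords.get(key, 0.0) + side   # side * (#points in cell)
            side /= 2.0
        return coords                            # sparse vector: (level, cell) -> value

    return f

def l1_dist(u, v):
    """l_1 distance between two sparse vectors represented as dicts."""
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in set(u) | set(v))

# f = make_emd_embedding(s=64)
# l1_dist(f(A), f(B)) approximates EMD(A, B) up to O(log s) distortion.
```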
Embeddings of various metrics into ℓ_1 (metric: upper bound; lower bound):
Earth-mover distance (s-sized sets in the 2D plane): O(log s) [Cha02, IT03]; Ω(√log s) [NS07]
Earth-mover distance (s-sized sets in {0,1}^d): O(log s · log d) [AIK08]; Ω(log s) [KN05]
Edit distance over {0,1}^d (= # indels to transform x into y): 2^{Õ(√log d)} [OR05]; Ω(log d) [KN05, KR06]
Ulam (edit distance between non-repetitive strings): O(log d) [CK06]; Ω̃(log d) [AK07]
Block edit distance: Õ(log d) [MS00, CM07]; 4/3 [Cor03]
Open Question #3: Improve the distortion of embedding EMD, W_2, edit distance into ℓ_1.
Really beyond LSH. NNS for ℓ_∞ via decision trees: space n^{1+ρ}, query O(d·log n), approximation c ≈ log_ρ log d [I'98]; one cannot do better via (deterministic) decision trees. NNS for mixed norms, e.g. ℓ_2(ℓ_1) [I'04, AIK'09, A'09]. Embedding into mixed norms [AIK'09]: Ulam O(1)-embeds into ℓ_2²(ℓ_∞(ℓ_1)) of small dimension, which yields NNS with O(log log d) approximation; it would be Ω̃(log d) if one embedded into each separate norm! Open Question #4: Embed EMD, edit distance into mixed norms?
Summary: high-d NNS
Locality-sensitive hashing: for Hamming and Euclidean spaces; provably (near-)optimal NNS in some regimes; applications beyond NNS: kernels, sketches.
Beyond LSH: non-normed distances via embeddings into ℓ_1; algorithms for ℓ_p and mixed norms (of ℓ_p's).
Some open questions: design a qualitative, efficient LSH / space partitioning (in Euclidean space); embed "harder" distances (like EMD, edit distance) into ℓ_1 or mixed norms (of ℓ_p's); is there an LSH for ℓ_∞? NNS for any norm, e.g. the trace norm?