Yair Bartal Lee-Ad Gottlieb Hebrew U. Ariel University

Approximate nearest neighbor for ℓp-spaces (2 < p < ∞) via embeddings

Nearest neighbor search
Problem definition: given a set S of n points, preprocess S so that the following queries can be answered efficiently.
Exact (NNS): given a query point q, what is the closest point to q in S?
Approximate (ANN): given a query point q, find a point x ∈ S whose distance from q is within some approximation factor of the distance from q to its closest point in S.
Goal: polynomial space, polylogarithmic query time, o(log n) approximation.
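As a baseline, exact NNS can always be answered by a linear scan over S, at a cost of n distance evaluations per query; the data structures discussed here aim to beat that. The Python sketch below shows the exact query and the condition an approximate answer must satisfy. The helper names `nearest_neighbor` and `is_c_ann` are our own, hypothetical choices.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]
Metric = Callable[[Point, Point], float]


def nearest_neighbor(S: Sequence[Point], q: Point, dist: Metric) -> int:
    """Exact NNS by linear scan: index of the point of S closest to q."""
    return min(range(len(S)), key=lambda i: dist(S[i], q))


def is_c_ann(S: Sequence[Point], q: Point, x: int, c: float, dist: Metric) -> bool:
    """True if S[x] is a valid c-ANN answer: d(q, S[x]) <= c * min_y d(q, y)."""
    best = dist(S[nearest_neighbor(S, q, dist)], q)
    return dist(S[x], q) <= c * best


# Usage with the Euclidean distance math.dist:
S = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
q = (0.9, 1.2)
print(nearest_neighbor(S, q, math.dist))   # 2
print(is_c_ann(S, q, 0, 10.0, math.dist))  # True, since 1.5 <= 10 * 0.2236...
```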

Approx. nearest neighbor search
Efficient ANN is not possible in general metrics. What about more restrictive spaces? Good news for Euclidean space and for normed spaces.

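ℓp normed spaces
Norms: recall that for d-dimensional vectors x, y,
$$\|x-y\|_p = \left(|x_1-y_1|^p + \dots + |x_d-y_d|^p\right)^{1/p}, \qquad \|x-y\|_\infty = \max_i |x_i-y_i|.$$
[Figure: unit balls of ℓ1, ℓ2 and ℓ∞]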
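The norm definition translates directly into code. This minimal sketch (the function name `lp_dist` is our own) treats p = ∞ as the max-norm; for a fixed pair of vectors the value shrinks toward that maximum as p grows.

```python
import math
from typing import Sequence


def lp_dist(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """||x - y||_p for 1 <= p < inf; pass p = math.inf for max_i |x_i - y_i|."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return float(max(diffs, default=0.0))
    return sum(t ** p for t in diffs) ** (1.0 / p)


x, y = (0.0, 0.0), (3.0, 4.0)
for p in (1, 2, 3, math.inf):
    print(p, lp_dist(x, y, p))
# 1 -> 7.0, 2 -> 5.0, 3 -> about 4.498, inf -> 4.0
```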

Approx. nearest neighbor search
An efficient ANN structure has polynomial space, polylogarithmic query time, and o(log n), o(d) approximation.
Efficient ANN structures exist for:
Euclidean space (p = 2): (1+ε)-ANN [IM'98, KOR'98]. Reduce the dimension via Johnson-Lindenstrauss (JL), then use brute force in the lower dimension.
ℓ∞: O(log log d)-ANN [Indyk'98].
What about other ℓp norms?
1 ≤ p < 2: (1+ε)-ANN, same as Euclidean.
2 < p < ∞: ? (previous work: Andoni). This is the subject of this paper.
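The Euclidean recipe "reduce dimension via JL, brute force in lower dimension" can be sketched as follows. This is a minimal illustration assuming a dense Gaussian projection with N(0, 1/k) entries; the class name `JLIndex` and the target dimension k are our own choices, and this is not the specific construction of [IM'98] or [KOR'98].

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def project(A: List[List[float]], v: Vector) -> List[float]:
    """Multiply the k x d matrix A by the d-vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


class JLIndex:
    """Brute-force NN search after a random Gaussian (Johnson-Lindenstrauss) projection."""

    def __init__(self, S: Sequence[Vector], k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        d = len(S[0])
        scale = 1.0 / math.sqrt(k)  # N(0, 1/k) entries preserve squared lengths in expectation
        self.A = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]
        self.S_low = [project(self.A, s) for s in S]  # preprocessing: project the data once

    def query(self, q: Vector) -> int:
        """Index of the nearest projected point (exact search in the k-dimensional space)."""
        q_low = project(self.A, q)
        return min(range(len(self.S_low)), key=lambda i: math.dist(self.S_low[i], q_low))
```

For example, `JLIndex(S, k=200).query(q)` projects 600-dimensional data to 200 dimensions once, and each query then costs one projection plus a scan in the smaller space.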

Summary of results
Combine two algorithms for ℓp (2 < p < ∞):
Andoni: $O\big(\log\log d \cdot (\log_d n)^{1/p}\big)$-ANN.
New result: $2^{O(p)}$-ANN.
Analysis: the two bounds are equal at $p = \log\log\log d + (\log\log_d n)^{1/2}$ (ignoring constants), which gives a worst-case approximation of $(\log\log d)\cdot\exp\big((\log\log_d n)^{1/2}\big)$. Andoni's algorithm is better for larger values of p, the new one for smaller values.
Improved bounds in metrics of low doubling dimension.
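To see the trade-off, one can evaluate the two bounds side by side. This sketch sets every hidden constant in the O(·) to 1 and uses base-2 logarithms for log log d; both are purely illustrative assumptions, so the numbers show the shape of the trade-off (the new bound wins for small p, Andoni's for large p), not actual approximation factors. The function names are hypothetical.

```python
import math


def andoni_bound(p: float, n: float, d: float) -> float:
    """log log d * (log_d n)^{1/p} with the hidden constant set to 1 (illustrative; needs d > 2)."""
    return math.log2(math.log2(d)) * (math.log(n) / math.log(d)) ** (1.0 / p)


def new_bound(p: float) -> float:
    """2^{O(p)} with the constant in the exponent set to 1 (illustrative)."""
    return 2.0 ** p


def better_algorithm(p: float, n: float, d: float) -> str:
    """Name of the algorithm with the smaller illustrative bound."""
    return "new" if new_bound(p) < andoni_bound(p, n, d) else "Andoni"


n, d = 2.0 ** 40, 1024.0
for p in (2.1, 3.0, 10.0):
    print(p, round(new_bound(p), 2), round(andoni_bound(p, n, d), 2), better_algorithm(p, n, d))
```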

Embeddings
An embedding of a set X into Y with distortion D is a mapping f : X → Y such that, for some scaling constant c > 0 and all distinct x, y ∈ X,
$$1 \le c \cdot \frac{d_Y(f(x), f(y))}{d_X(x, y)} \le D.$$
Relaxed version: one of the two sides need only hold with constant probability.
If an embedding is non-expansive and has small contraction, the nearest neighbor stays close, and far points remain relatively far.
If an embedding is non-contractive and has small expansion, the nearest neighbor is only a little farther away, and far points remain far.
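For a finite set the definition can be evaluated directly: the best scaling constant is c = 1/(smallest ratio), so the distortion equals the largest ratio divided by the smallest. The sketch below (function names are our own) assumes the points of X are distinct.

```python
import itertools
import math
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")
Q = TypeVar("Q")


def distortion(X: Sequence[P], f: Callable[[P], Q],
               d_X: Callable[[P, P], float], d_Y: Callable[[Q, Q], float]) -> float:
    """Smallest D with 1 <= c * d_Y(f(x), f(y)) / d_X(x, y) <= D for some c > 0 and all
    pairs of distinct points of X; equals max ratio / min ratio."""
    ratios = [d_Y(f(x), f(y)) / d_X(x, y) for x, y in itertools.combinations(X, 2)]
    if min(ratios) == 0.0:
        return math.inf  # f merges two distinct points: unbounded contraction
    return max(ratios) / min(ratios)


def l1(a, b):
    return sum(abs(s - t) for s, t in zip(a, b))


def linf(a, b):
    return max(abs(s - t) for s, t in zip(a, b))


X = [(0, 0), (1, 0), (0, 1)]
print(distortion(X, lambda v: v, l1, linf))  # 2.0: the pair (1,0), (0,1) shrinks from 2 to 1
```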

Andoni's algorithm
Basic idea [Andoni'09]: embed ℓp into ℓ∞, then run Indyk's ANN algorithm for ℓ∞. The embedding uses Fréchet random variables, which have a max-stable distribution.

Fréchet distribution
For a random variable X ∼ Fréchet (with parameter p),
$$\Pr[X < x] = e^{-x^{-p}} \quad \text{for } x > 0.$$
Max-stability: let X ∼ Fréchet, let $Z_1, \dots, Z_d$ be independent Fréchet random variables, and let $v = (v_1, \dots, v_d)$ be a vector with non-negative entries. Then the random variable $Y := \max_i v_i Z_i$ is distributed as $\|v\|_p \cdot X$ (written $Y \sim \|v\|_p \cdot X$).
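Fréchet samples are easy to generate by inverting the CDF: if U is uniform on (0, 1), then $X = (-\ln U)^{-1/p}$ satisfies $\Pr[X \le x] = e^{-x^{-p}}$. The Monte Carlo sketch below uses this to compare the empirical CDF of $\max_i v_i Z_i$ with the prediction of max-stability. Function names, the trial count and the seed are our own assumptions.

```python
import math
import random
from typing import Sequence, Tuple


def sample_frechet(p: float, rng: random.Random) -> float:
    """Draw X with Pr[X <= x] = exp(-x^{-p}) by inverting the CDF: X = (-ln U)^{-1/p}."""
    u = rng.random()
    while u == 0.0:  # random() lies in [0, 1); exclude 0 so that -ln(u) is finite
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / p)


def lp_norm(v: Sequence[float], p: float) -> float:
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def max_stability_check(v: Sequence[float], p: float, x: float,
                        trials: int = 20000, seed: int = 0) -> Tuple[float, float]:
    """(empirical Pr[max_i v_i Z_i <= x], predicted exp(-(||v||_p / x)^p)) for v_i >= 0."""
    rng = random.Random(seed)
    hits = sum(
        max(vi * sample_frechet(p, rng) for vi in v) <= x for _ in range(trials)
    )
    return hits / trials, math.exp(-((lp_norm(v, p) / x) ** p))


print(max_stability_check([1.0, 2.0, 0.5], p=3.0, x=2.5))  # two close numbers, both near 0.56
```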

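Fréchet distribution
Proof of max-stability. Recall $Y := \max_i v_i Z_i$. Using the independence of the $Z_i$ (for $v_i > 0$; a coordinate with $v_i = 0$ contributes a factor 1),
$$\Pr[Y \le x] = \Pr\big[\max_i v_i Z_i \le x\big] = \prod_i \Pr[v_i Z_i \le x] = \prod_i \Pr[Z_i \le x/v_i] = \prod_i e^{-(v_i/x)^p} = e^{-(\sum_i v_i^p)/x^p} = e^{-(\|v\|_p/x)^p}.$$
Similarly, $\Pr[\|v\|_p \cdot X \le x] = \Pr[X \le x/\|v\|_p] = e^{-(\|v\|_p/x)^p}$.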
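The chain of equalities can be checked numerically without sampling: the product of the per-coordinate CDF values must agree with the closed form for the scaled Fréchet variable. This sketch assumes non-negative v and x > 0; the function names are hypothetical.

```python
import math
from typing import Sequence


def cdf_of_max_via_product(v: Sequence[float], p: float, x: float) -> float:
    """prod_i Pr[Z_i <= x / v_i]; each factor is exp(-(x / v_i)^{-p}) = exp(-(v_i / x)^p),
    and a coordinate v_i = 0 contributes exp(0) = 1."""
    prob = 1.0
    for vi in v:
        prob *= math.exp(-((vi / x) ** p))
    return prob


def cdf_of_scaled_frechet(v: Sequence[float], p: float, x: float) -> float:
    """Pr[||v||_p * X <= x] = Pr[X <= x / ||v||_p] = exp(-(||v||_p / x)^p)."""
    norm = sum(vi ** p for vi in v) ** (1.0 / p)
    return math.exp(-((norm / x) ** p))


v, p, x = [0.3, 1.7, 0.0, 2.2], 5.0, 3.0
print(cdf_of_max_via_product(v, p, x), cdf_of_scaled_frechet(v, p, x))  # equal up to rounding
```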

Review of Andoni's embedding
Define the embedding $f_b : V \to \ell_\infty$ (for $b > 0$): draw independent Fréchet random variables $Z_1, \dots, Z_d$, and map $v = (v_1, \dots, v_d)$ to $f_b(v) = (v_1 b Z_1, \dots, v_d b Z_d)$. The resulting set is $V' \subset \ell_\infty$.
Theorem: set $b = (3 \ln n)^{1/p}$. Then $f_b$ satisfies:
Non-contraction (for all points simultaneously) with probability $> 1 - 1/n$.
Expansion: for any $u, w \in V$, with constant probability $\|f_b(u) - f_b(w)\|_\infty \le b\,\|u - w\|_p$.
The expansion guarantee is needed for only one inter-point distance: the one between the query point and its nearest neighbor.
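A minimal sketch of $f_b$, assuming the same draw of $Z_1, \dots, Z_d$ is shared by all points (so the map is linear). The function names, the point set and the seeds are our own illustrative choices; the test checks empirically that contraction of some pair is rare and that the expansion bound holds for a fixed pair with frequency close to $e^{-1}$.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def sample_frechet(p: float, rng: random.Random) -> float:
    """Fréchet(p) sample via the inverse CDF: Pr[X <= x] = exp(-x^{-p})."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / p)


def andoni_embedding(d: int, n: int, p: float, seed: int = 0
                     ) -> Tuple[Callable[[Sequence[float]], List[float]], float]:
    """Return (f_b, b) with b = (3 ln n)^{1/p} and f_b(v) = (v_1 b Z_1, ..., v_d b Z_d)."""
    rng = random.Random(seed)
    b = (3.0 * math.log(n)) ** (1.0 / p)
    Z = [sample_frechet(p, rng) for _ in range(d)]  # one draw, shared by every point

    def f_b(v: Sequence[float]) -> List[float]:
        return [vi * b * zi for vi, zi in zip(v, Z)]

    return f_b, b


def lp_norm(v: Sequence[float], p: float) -> float:
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def linf_norm(v: Sequence[float]) -> float:
    return max(abs(t) for t in v)


def diff(u: Sequence[float], w: Sequence[float]) -> List[float]:
    return [a - c for a, c in zip(u, w)]


# Usage: embed 20 random points of R^10 with p = 4 and report whether any pair contracted.
rng = random.Random(123)
points = [[rng.uniform(-1.0, 1.0) for _ in range(10)] for _ in range(20)]
f_b, b = andoni_embedding(10, 20, 4.0, seed=0)
images = [f_b(v) for v in points]
print(b, any(linf_norm(diff(images[i], images[j])) < lp_norm(diff(points[i], points[j]), 4.0)
             for i in range(20) for j in range(i + 1, 20)))
```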

Analysis of Andoni's embedding
Theorem: set $b = (3 \ln n)^{1/p}$. Then $f_b$ is non-contractive (for all points) with probability $> 1 - 1/n$, and for any $u, w \in V$, with constant probability $\|f_b(u) - f_b(w)\|_\infty \le b\,\|u - w\|_p$.
Proof of non-contraction: take v with $\|v\|_p = 1$. By max-stability (applied to the $|v_i|$, since every $Z_i > 0$), $\|f_b(v)\|_\infty \sim b\|v\|_p \cdot X = b \cdot X$. By the definition of the Fréchet distribution,
$$\Pr[\|f_b(v)\|_\infty < 1] = \Pr[b \cdot X < 1] = e^{-(1/b)^{-p}} = e^{-b^p} = e^{-3\ln n} = n^{-3}.$$
Since the embedding is linear, v may be taken to be the normalized difference vector $(u - w)/\|u - w\|_p$ of any two points of V. By a union bound, the probability that any of the fewer than $n^2$ inter-point distances decreases is less than $n^2 \cdot n^{-3} = 1/n$.
Proof of expansion: by the same approach,
$$\Pr[\|f_b(v)\|_\infty \le b] = \Pr[b \cdot X \le b] = \Pr[X \le 1] = e^{-1},$$
so with probability $e^{-1}$ the expansion is at most b.
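The probabilities in the proof can be reproduced with a few lines of arithmetic. This sketch (function names are our own) evaluates $e^{-(1/b)^{-p}}$ for $b = (3\ln n)^{1/p}$, the union bound over all pairs, and $\Pr[X \le 1]$.

```python
import math


def contraction_prob(n: int, p: float) -> float:
    """Pr[b X < 1] = exp(-(1/b)^{-p}) for b = (3 ln n)^{1/p}; this equals n^{-3}."""
    b = (3.0 * math.log(n)) ** (1.0 / p)
    return math.exp(-((1.0 / b) ** (-p)))


def union_bound(n: int, p: float) -> float:
    """Number of pairs times the per-pair contraction probability."""
    return n * (n - 1) / 2 * contraction_prob(n, p)


def expansion_prob(p: float) -> float:
    """Pr[X <= 1] = exp(-1^{-p}) = e^{-1}, independent of p."""
    return math.exp(-(1.0 ** (-p)))


for n in (10, 1000):
    print(n, contraction_prob(n, 3.0), n ** -3, union_bound(n, 3.0) < 1 / n)
print(expansion_prob(3.0))  # 0.3678... = 1/e
```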

Summary
Embed ℓp into ℓ∞: distortion $O(b) = O\big((\ln n)^{1/p}\big)$.
Run Indyk's ANN algorithm for ℓ∞: $O(\log\log d)$-ANN.
Final guarantee (the two factors multiply): $O\big(\log\log d \cdot (\ln n)^{1/p}\big)$-ANN.

An improvement
We can improve the guarantees of Andoni's algorithm by considering the doubling dimension of the space.
Doubling constant: the minimum number of half-radius balls needed to cover any ball of the space.
Doubling dimension: log2 of the doubling constant. For example, d-dimensional Euclidean space has doubling dimension Θ(d).
[Figure: a ball covered by numbered half-radius balls]
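For a finite point set one can get a rough feel for the doubling constant by greedily covering balls with half-radius balls. The sketch below is a heuristic, not an exact computation: it only examines balls centered at data points with radii taken from the inter-point distances, and it uses greedy (not minimum) covers. The function names are hypothetical, and the set must contain at least two points.

```python
import math
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")


def greedy_half_cover(points: Sequence[P], center: P, r: float,
                      dist: Callable[[P, P], float]) -> int:
    """Number of radius-r/2 balls, centered at data points, that a greedy pass uses
    to cover the data points inside the ball B(center, r)."""
    inside = [x for x in points if dist(x, center) <= r]
    centers: List[P] = []
    for x in inside:
        # x opens a new half-radius ball unless an existing one already covers it
        if all(dist(x, c) > r / 2 for c in centers):
            centers.append(x)
    return len(centers)


def estimate_doubling_dimension(points: Sequence[P],
                                dist: Callable[[P, P], float]) -> float:
    """Heuristic: log2 of the largest greedy cover over the examined balls."""
    radii = {dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]}
    radii.discard(0.0)
    worst = max(greedy_half_cover(points, c, r, dist) for c in points for r in radii)
    return math.log2(worst)


line = list(range(17))
print(greedy_half_cover(line, 8, 4, lambda a, b: abs(a - b)))          # 3
print(estimate_doubling_dimension(line, lambda a, b: abs(a - b)))      # small, as expected for a line
```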

Improvement outline
Nearest neighbor search can be reduced to a series of subproblems: searches on spaces with small aspect ratio. So we can take a net of each subspace and run Andoni's algorithm on the nets instead.
Size of net: $\mathrm{ddim}^{O(\mathrm{ddim})}$.
Approximation:
Andoni: $O\big(\log\log d \cdot (\log_d n)^{1/p}\big)$.
Improved: $O\big(\log\log d \cdot (\mathrm{ddim}\,\log_d \mathrm{ddim})^{1/p}\big)$, that is, n is replaced by the net size.
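Taking a net can be done with the textbook greedy construction: keep a point only if it is farther than r from every point kept so far. The result is an r-net (net points are pairwise more than r apart, and every point is within r of a net point). The scale r used for each subproblem is not specified here, so r is a free parameter in this sketch; the function names are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")


def greedy_net(points: Sequence[P], r: float, dist: Callable[[P, P], float]) -> List[P]:
    """Greedy r-net: net points are pairwise more than r apart (packing), and every
    input point lies within distance r of some net point (covering)."""
    net: List[P] = []
    for x in points:
        if all(dist(x, y) > r for y in net):
            net.append(x)
    return net


def snap_to_net(x: P, net: Sequence[P], dist: Callable[[P, P], float]) -> P:
    """Closest net point to x; for an input point this moves x by at most r."""
    return min(net, key=lambda y: dist(x, y))


line = list(range(11))
net = greedy_net(line, 2.5, lambda a, b: abs(a - b))
print(net)  # [0, 3, 6, 9]
```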

New algorithm
Basic idea: embed ℓp into ℓ2 using the Mazur map, then run an ANN algorithm for ℓ2.

Mazur map
The Mazur map is a mapping from ℓp to ℓq, for any 0 < p, q < ∞. It maps a vector v ∈ ℓp coordinate-wise to
$$M(v) = \big(\operatorname{sign}(v_0)|v_0|^{p/q}, \operatorname{sign}(v_1)|v_1|^{p/q}, \dots, \operatorname{sign}(v_{m-1})|v_{m-1}|^{p/q}\big).$$
For a set V, let C satisfy $C \ge \|v\|_p$ for all $v \in V$. Our embedding f is the Mazur map from ℓp to ℓ2, scaled down by a factor $(p/2)\,C^{p/2-1}$.
f is non-expansive.
Contraction: if $\|x - y\|_p = u$, then $\|f(x) - f(y)\|_2 \ge \frac{2}{p}(2C)^{1-p/2}u^{p/2}$ [Benyamini & Lindenstrauss '00].
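A minimal sketch of the scaled Mazur map, using the standard coordinate-wise form $t \mapsto \operatorname{sign}(t)|t|^{p/q}$. It checks the contraction inequality on random pairs of points; non-expansiveness is taken from the text and not checked here. Function names, dimensions and seeds are our own assumptions.

```python
import math
import random
from typing import List, Sequence


def mazur(v: Sequence[float], p: float, q: float) -> List[float]:
    """Mazur map l_p -> l_q, coordinate-wise t -> sign(t) |t|^{p/q}."""
    return [math.copysign(abs(t) ** (p / q), t) for t in v]


def lp_norm(v: Sequence[float], p: float) -> float:
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def f(v: Sequence[float], p: float, C: float) -> List[float]:
    """Mazur map l_p -> l_2 scaled down by (p/2) C^{p/2 - 1}; intended for ||v||_p <= C."""
    s = (p / 2.0) * C ** (p / 2.0 - 1.0)
    return [t / s for t in mazur(v, p, 2.0)]


def contraction_bound(u: float, p: float, C: float) -> float:
    """Lower bound (2/p) (2C)^{1 - p/2} u^{p/2} on ||f(x) - f(y)||_2 when ||x - y||_p = u."""
    return (2.0 / p) * (2.0 * C) ** (1.0 - p / 2.0) * u ** (p / 2.0)


def worst_contraction_slack(points: Sequence[Sequence[float]], p: float, C: float) -> float:
    """min over pairs of ||f(x) - f(y)||_2 / bound; a value >= 1 means the bound held everywhere."""
    images = [f(v, p, C) for v in points]
    slack = math.inf
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            u = lp_norm([a - b for a, b in zip(points[i], points[j])], p)
            slack = min(slack, math.dist(images[i], images[j]) / contraction_bound(u, p, C))
    return slack
```

Note that the Mazur map ℓp → ℓ2 satisfies $\|M(v)\|_2 = \|v\|_p^{p/2}$, since squaring each image coordinate gives back $|v_i|^p$.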

ANN via the Mazur map
The distortion of our embedding is large: it depends on the bound C on the norms of the points (which controls the diameter of the space), through the contraction bound $\frac{2}{p}(2C)^{1-p/2}u^{p/2}$. But we can show that this guarantee suffices to solve a special case of nearest neighbor search in ℓp: the c-bounded near neighbor problem.

c-bounded near neighbor
Define the c-bounded near neighbor problem, where $c \ge \|v\|_p$ for all $v \in V$:
If there is a point in V within distance 1 of the query q, return it or some point in V within distance c/9 of q. (This is a c/9-ANN.)
If there is no point in V within distance 1 of q, return null or some point in V within distance c/9 of q.

Approximately solve the c-bounded near neighbor problem in ℓp, for $c = p \cdot 18^{p/2}$:
Embed from ℓp into ℓ2 (with C = c).
Compute a 2-ANN in ℓ2.
Analysis: the Mazur map ensures that inter-point distances of c/9 or greater map to at least
$$\frac{2}{p}(2c)^{1-p/2}(c/9)^{p/2} = \frac{2}{p}\cdot 2c \cdot 18^{-p/2} = 4.$$
If q has a neighbor at distance 1 or less in the original space, that neighbor is at distance at most 1 in the embedded space (f is non-expansive), so the 2-ANN finds a point at distance at most 2 in the embedded space, and hence at distance less than c/9 in the original space.
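The query procedure above can be sketched as follows, assuming every data point and query has ℓp norm at most c and relying on the non-expansiveness stated in the text. An exact linear scan in ℓ2 stands in for the 2-ANN structure (an exact answer is in particular a 2-ANN). The class and function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def mazur_to_l2(v: Sequence[float], p: float, C: float) -> List[float]:
    """f(v): coordinate-wise sign(t)|t|^{p/2}, scaled down by (p/2) C^{p/2 - 1}."""
    s = (p / 2.0) * C ** (p / 2.0 - 1.0)
    return [math.copysign(abs(t) ** (p / 2.0), t) / s for t in v]


def separation(p: float) -> float:
    """Embedded-distance lower bound for original distance c/9, with c = p * 18^{p/2}."""
    c = p * 18.0 ** (p / 2.0)
    return (2.0 / p) * (2.0 * c) ** (1.0 - p / 2.0) * (c / 9.0) ** (p / 2.0)


class CBoundedNearNeighbor:
    """Sketch of the c-bounded near neighbor query in l_p (2 < p < inf), c = p * 18^{p/2}."""

    def __init__(self, V: Sequence[Sequence[float]], p: float) -> None:
        self.p = p
        self.c = p * 18.0 ** (p / 2.0)
        # Assumes ||v||_p <= c for all data points and queries, so C = c is a valid bound.
        self.images = [mazur_to_l2(v, p, self.c) for v in V]

    def query(self, q: Sequence[float]) -> Optional[int]:
        """Index of a point within distance c/9 of q, or None (null)."""
        q_img = mazur_to_l2(q, self.p, self.c)
        # Exact nearest neighbor in l_2 by linear scan (in particular a 2-ANN).
        best = min(range(len(self.images)), key=lambda i: math.dist(self.images[i], q_img))
        # An embedded distance <= 2 < 4 = separation(p) rules out original distance >= c/9.
        if math.dist(self.images[best], q_img) <= 2.0:
            return best
        return None


index = CBoundedNearNeighbor([[0.0, 0.0, 0.0], [100.0, 0.0, 0.0]], p=3.0)
print(separation(3.0), index.query([0.5, 0.0, 0.0]), index.query([50.0, 0.0, 0.0]))
```

Returning a point only when its embedded distance is at most 2 satisfies both cases of the problem: a neighbor within distance 1 guarantees such a point exists, and any returned point is within c/9 in ℓp.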

We can show that the c-bounded near neighbor problem can be used to obtain a c-ANN for the general (unbounded) problem. Since $c = p \cdot 18^{p/2} = 2^{O(p)}$, the final result is a $2^{O(p)}$-ANN.

Conclusion
Combine two algorithms for ℓp (2 < p < ∞):
Andoni: $O\big(\log\log d \cdot (\log_d n)^{1/p}\big)$-ANN.
New result: $2^{O(p)}$-ANN.
Worst-case approximation: $(\log\log d)\cdot\exp\big((\log\log_d n)^{1/2}\big)$.
Improved bounds in metrics of low doubling dimension.
