Sketching and Embedding are Equivalent for Norms
Alexandr Andoni (Simons Inst. / Columbia), Robert Krauthgamer (Weizmann Inst.), Ilya Razenshteyn (MIT, now at IBM Almaden)
Sketching
When is sketching possible?
Compress a massive object to a small sketch
Rich theories: high-dimensional vectors, matrices, graphs
Applications: similarity search, compressed sensing, numerical linear algebra
Dimension reduction (Johnson, Lindenstrauss 1984): a random projection onto a low-dimensional subspace preserves distances
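The Johnson-Lindenstrauss dimension reduction mentioned above can be sketched in a few lines; the dimensions below are illustrative choices, not from the slides.

```python
import numpy as np

# A minimal sketch of Johnson-Lindenstrauss dimension reduction: project
# d-dimensional vectors onto k random Gaussian directions. Pairwise l2
# distances are preserved up to a (1 +/- eps) factor with high probability
# once k is on the order of 1/eps^2 (times log of the number of points).
rng = np.random.default_rng(0)
d, k = 10_000, 400
G = rng.standard_normal((k, d)) / np.sqrt(k)  # scaled so norms are preserved in expectation

x = rng.standard_normal(d)
y = rng.standard_normal(d)
true_dist = np.linalg.norm(x - y)
sketch_dist = np.linalg.norm(G @ x - G @ y)
ratio = sketch_dist / true_dist  # concentrates around 1
```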
Motivation: similarity search
Model dis-similarity as a metric
Sketching may speed up computation and allow indexing
Interesting metrics:
Euclidean ℓ_2: d(x, y) = (∑_i |x_i − y_i|^2)^{1/2}
Manhattan / Hamming ℓ_1: d(x, y) = ∑_i |x_i − y_i|
ℓ_p distances: d(x, y) = (∑_i |x_i − y_i|^p)^{1/p} for p ≥ 1
Edit distance, Earth Mover's Distance, etc.
This talk: sketching metrics
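A small helper (hypothetical, not from the slides) computing the ℓ_p distances listed above:

```python
def lp_dist(x, y, p):
    """(sum_i |x_i - y_i|^p)^(1/p) for p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
manhattan = lp_dist(x, y, 1)  # l1: 3 + 4 + 0 = 7
euclidean = lp_dist(x, y, 2)  # l2: sqrt(9 + 16 + 0) = 5
```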
Sketching metrics
Alice and Bob each hold a point from a metric space (say x and y)
Both send s-bit sketches, sketch(x) and sketch(y), to Charlie
For r > 0 and D > 1, Charlie must distinguish:
d(x, y) ≤ r
d(x, y) ≥ Dr
Shared randomness; 1% probability of error allowed
Trade-off between s and D
Various variants: general protocols, etc.
[Figure: Alice holds x, Bob holds y; both send sketches to Charlie, who decides whether d(x, y) ≤ r or d(x, y) ≥ Dr]
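This model can be simulated for Hamming distance with a parity-subsampling sketch (in the spirit of Kushilevitz-Ostrovsky-Rabani); all parameters below, including the decision threshold 0.35, are illustrative assumptions for this toy instance.

```python
import random

# Toy simulation of the sketching model for Hamming distance: the shared
# randomness picks k random coordinate subsets (each coordinate kept with
# probability ~1/(2r)); Alice and Bob each send the k parity bits of their
# point, and Charlie counts how many bits disagree.
rng = random.Random(0)
n, r, D, k = 1000, 10, 4, 500
p = 1 / (2 * r)
subsets = [[i for i in range(n) if rng.random() < p] for _ in range(k)]

def sketch(x):
    return [sum(x[i] for i in S) % 2 for S in subsets]

def disagreement(x, y):
    return sum(a != b for a, b in zip(sketch(x), sketch(y))) / k

def flip(x, d):
    """A point at Hamming distance exactly d from x."""
    y = list(x)
    for i in rng.sample(range(n), d):
        y[i] ^= 1
    return y

x = [rng.randrange(2) for _ in range(n)]
close = disagreement(x, flip(x, r // 2))  # d <= r: expected disagreement ~0.20
far = disagreement(x, flip(x, D * r))     # d >= Dr: expected disagreement ~0.49
```

Charlie answers "close" when the disagreement fraction falls below a threshold between the two expectations (here, 0.35).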
The main question
Which metrics can we sketch efficiently? (Kanpur 2006)
Near Neighbor Search via sketches
Near Neighbor Search (NNS): given an n-point dataset P and a query q within r of some data point, return any data point within Dr of q
Sketches of size s imply NNS with space n^{O(s)} and a 1-probe query
Proof idea: amplify the probability of error to 1/n by increasing the sketch size to O(s log n); the sketch of q then determines the answer
For many metrics this is the only known approach
Sketching ℓ_p norms
(Indyk 2000): can sketch ℓ_p for 0 < p ≤ 2 via random projections using p-stable distributions
For D = 1 + ε one gets s = O(1/ε^2); tight by (Woodruff 2004)
For p > 2, sketching ℓ_p is substantially harder (Bar-Yossef, Jayram, Kumar, Sivakumar 2002), (Indyk, Woodruff 2005): to achieve D = O(1), one needs sketch size s = Θ̃(d^{1−2/p})
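The p-stable technique can be illustrated for p = 1 with Cauchy (1-stable) projections; the median estimator is the standard recipe, while the sizes below are illustrative.

```python
import numpy as np

# Sketching l1 with 1-stable (Cauchy) projections, in the spirit of
# Indyk's p-stable sketches: each coordinate of C @ z is Cauchy-distributed
# with scale ||z||_1, and the median of |Cauchy(scale=s)| is exactly s,
# so the sample median of |C @ (x - y)| estimates the l1 distance.
rng = np.random.default_rng(1)
d, k = 1000, 801
C = rng.standard_cauchy((k, d))

x = rng.standard_normal(d)
y = rng.standard_normal(d)
est = np.median(np.abs(C @ (x - y)))
true = np.sum(np.abs(x - y))
ratio = est / true  # concentrates around 1 as k grows
```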
The main question (quantitative)
Which metrics can we sketch with constant sketch size and approximation?
Beyond ℓ_p norms
A map f: X → Y is an embedding with distortion C if for all a, b in X:
d_X(a, b) / C ≤ d_Y(f(a), f(b)) ≤ d_X(a, b)
Embeddings give reductions for geometric problems: sketches of size s and approximation D for Y yield sketches of size s and approximation CD for X
[Figure: points a, b in X mapped by f to f(a), f(b) in Y]
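A tiny hypothetical helper makes the distortion definition concrete; as a worked example, the identity map from the Hamming cube with ℓ_1 distance into ℓ_2 has distortion √d.

```python
from itertools import combinations, product
import math

# Hypothetical helper: the distortion of f on a finite metric space is
# (max ratio) / (min ratio) of d_Y(f(a), f(b)) / d_X(a, b) over all pairs,
# i.e. the best C achievable after rescaling f.
def distortion(points, f, dX, dY):
    ratios = [dY(f(a), f(b)) / dX(a, b) for a, b in combinations(points, 2)]
    return max(ratios) / min(ratios)

# Identity map from ({0,1}^d, l1) into l2: on 0/1 vectors
# ||u - v||_2 = sqrt(||u - v||_1), so the distortion is sqrt(d).
d = 4
cube = list(product([0, 1], repeat=d))
l1 = lambda u, v: sum(abs(a - b) for a, b in zip(u, v))
l2 = lambda u, v: math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
c = distortion(cube, lambda u: u, l1, l2)  # sqrt(4) = 2
```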
Metrics with good sketches
Summary: a metric X admits sketches with s, D = O(1) if:
X = ℓ_p for p ≤ 2, or
X embeds into ℓ_p for p ≤ 2 with distortion O(1)
Are there any other metrics with efficient sketches? We don't know!
Either some new techniques are waiting to be discovered, or the above are the only "tractable" spaces
The main result
A normed space: R^d equipped with a norm (think ℓ_p or matrix norms)
If a normed space X admits sketches of size s and approximation D, then for every ε > 0 the space X embeds into ℓ_{1−ε} with distortion O(sD / ε)
Embedding into ℓ_p for p ≤ 2 gives efficient sketches (Kushilevitz, Ostrovsky, Rabani 1998), (Indyk 2000); for norms, this theorem provides the converse
Application: lower bounds for sketches
Converts non-embeddability into lower bounds for sketches in a black-box way:
no embedding with distortion O(1) into ℓ_{1−ε} ⇒ no sketches* of size and approximation O(1)
* in fact, this rules out any communication protocols
Example 1: the Earth Mover's Distance
For x ∈ R^{[Δ]×[Δ]} with zero average, ‖x‖_EMD is the cost of the best transportation of the positive part of x to the negative part
Initial motivation for this work! (Kanpur 2006)
Upper bounds: (Andoni, Do Ba, Indyk, Woodruff 2009), (Charikar 2002), (Indyk, Thaper 2003), (Naor, Schechtman 2005)
No embedding into ℓ_{1−ε} with distortion O(1) (Naor, Schechtman 2005) ⇒ no sketches with D = O(1) and s = O(1)
The lower bound also holds for the minimum-cost matching metric on subsets
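For intuition about the minimum-cost matching metric mentioned in the lower bound, here is a brute-force toy computation (exponential in the number of points, so illustration only).

```python
from itertools import permutations

def l1(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

# Minimum-cost perfect matching between two equal-size point sets in the
# plane, minimizing total l1 movement; brute force over all matchings.
def matching_cost(A, B):
    return min(sum(l1(a, b) for a, b in zip(A, perm))
               for perm in permutations(B))

A = [(0, 0), (2, 0)]
B = [(0, 1), (2, 1)]
cost = matching_cost(A, B)  # match each point to the one directly above: 1 + 1 = 2
```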
Example 2: the Trace Norm
For an n × n matrix A, define the Trace Norm (the Nuclear Norm) ‖A‖ to be the sum of its singular values
Previously: lower bounds only for linear sketches (Li, Nguyen, Woodruff 2014)
Any embedding into ℓ_1 requires distortion Ω(n^{1/2}) (Pisier 1978) ⇒ any sketch must satisfy sD = Ω(n^{1/2} / log n)
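The trace norm definition can be sanity-checked numerically; a minimal sketch using NumPy's SVD on an arbitrary example matrix:

```python
import numpy as np

# Trace (nuclear) norm: the sum of the singular values of A.
def trace_norm(A):
    return float(np.sum(np.linalg.svd(A, compute_uv=False)))

# For a diagonal matrix the singular values are the absolute diagonal
# entries, so the trace norm of diag(3, -4) is 3 + 4 = 7.
A = np.diag([3.0, -4.0])
tn = trace_norm(A)
```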
The sketch of the proof
Good sketches for X
⇒ good sketches for ℓ_∞(X), where ‖(x_1, x_2, …, x_k)‖ = max_i ‖x_i‖ (crucially uses that X is a norm)
⇒ absence of certain Poincaré-type inequalities on X, via a direct sum for information complexity [Andoni-Jayram-Pătraşcu 2010]
⇒ weak embedding of X into ℓ_2, via convex duality + compactness:
‖x_1 − x_2‖ ≤ 1 → ‖f(x_1) − f(x_2)‖ ≤ 1
‖x_1 − x_2‖ ≥ sD → ‖f(x_1) − f(x_2)‖ ≥ 10
([Andoni-Krauthgamer 2007] almost works, but gives a tiny gap instead of 1 vs 10)
⇒ uniform embedding of X into ℓ_2, via Lipschitz extension [Johnson-Randrianarivony 2006]
⇒ linear embedding of X into ℓ_{1−ε}, via Fourier analysis [Aharoni-Maurey-Mityagin 1985]
Open problems
Extend to as general a class of metrics as possible (edit distance?)
Can one strengthen our theorem to "sketches with O(1) size and approximation imply embedding into ℓ_1 with distortion O(1)"?
Equivalent to an old open problem from functional analysis [Kwapien 1969]
Keep in mind negative-type metrics that do not embed into ℓ_1 [Khot-Vishnoi 2005], [Cheeger-Kleiner-Naor 2009]
Are there spaces that require s = Ω(d) for D = O(1) besides ℓ_∞?
What about linear sketches with f(s) measurements and g(D) approximation?
Any questions?