Advances in Metric Embedding Theory
Yair Bartal, Hebrew University & Caltech
ICMS Geometry and Algorithms workshop; IPAM, UCLA
My (metric) Space
Metric Spaces
Metric space: (X,d), d: X²→R+ with
d(u,v) = d(v,u)
d(u,w) ≤ d(u,v) + d(v,w)
d(u,u) = 0
Data representation: pictures (e.g. faces), web pages, DNA sequences, …
Network: communication distance
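A minimal sketch of these axioms in Python, checking a finite distance function; the two example metrics (points on a line, and squared line distance) are illustrative, not from the slides:

```python
import itertools

def is_metric(d, points):
    """Check the metric axioms on a finite point set, where d(u, v)
    returns the distance between u and v."""
    for u in points:
        if d(u, u) != 0:
            return False
    for u, v in itertools.combinations(points, 2):
        if d(u, v) <= 0 or d(u, v) != d(v, u):
            return False
    # Triangle inequality: d(u, w) <= d(u, v) + d(v, w).
    for u, v, w in itertools.permutations(points, 3):
        if d(u, w) > d(u, v) + d(v, w):
            return False
    return True

# Points on the line with d(u, v) = |u - v| form a metric space.
line = lambda u, v: abs(u - v)
print(is_metric(line, [0, 1, 3, 7]))   # True
# Squared line distance is NOT a metric: d(0,3) = 9 > d(0,1) + d(1,3) = 5.
sq = lambda u, v: (u - v) ** 2
print(is_metric(sq, [0, 1, 3]))        # False
```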
Metric Embedding
Simple representation: translate metric data into an easy-to-analyze form and gain geometric structure, e.g. embed in low-dimensional Euclidean space.
Algorithmic application: apply algorithms for a “nice” space to solve problems on “problematic” metric spaces.
Embedding Metric Spaces
Metric spaces (X,dX), (Y,dY). An embedding is a function f: X→Y.
For an embedding f and u,v in X, let distf(u,v) = dY(f(u),f(v)) / dX(u,v).
Distortion: c = max{u≠v in X} distf(u,v) / min{u≠v in X} distf(u,v).
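This ratio-of-ratios definition can be sketched directly; the example (folding a 4-point cycle open onto the line) is illustrative:

```python
import itertools

def distortion(dX, dY, f, points):
    """Distortion of f: ratio of the largest to the smallest per-pair
    ratio dist_f(u,v) = dY(f(u), f(v)) / dX(u, v)."""
    ratios = [dY(f(u), f(v)) / dX(u, v)
              for u, v in itertools.combinations(points, 2)]
    return max(ratios) / min(ratios)

# Four equally spaced points of a cycle of circumference 4, mapped to
# the line by the identity: the pair (0, 3) is stretched from 1 to 3.
def d_cycle(u, v):
    return min(abs(u - v), 4 - abs(u - v))

d_line = lambda a, b: abs(a - b)
f = lambda x: x
print(distortion(d_cycle, d_line, f, [0, 1, 2, 3]))   # 3.0
```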
Special Metric Spaces
Euclidean space; lp metric in Rn: ||x−y||p = (Σi |xi−yi|^p)^(1/p)
Planar metrics
Tree metrics
Ultrametrics
Doubling metrics
Embedding in Normed Spaces
[Bourgain 85]: Any n-point metric space embeds in Lp with distortion Θ(log n)
[Johnson-Lindenstrauss 85]: Any n-point subset of Euclidean space embeds with distortion (1+ε) in dimension Θ(ε^−2 log n)
[ABN 06, B 06]: Dimension Θ(log n); in fact Θ*(log n / loglog n)
Average Distortion
A practical measure of the quality of an embedding: network embedding, multi-dimensional scaling, biology, vision, …
Given a non-contracting embedding f: (X,dX)→(Y,dY), the average distortion is the average of distf(u,v) over all pairs u≠v.
[ABN 06]: Every n-point metric space embeds into Lp with average distortion O(1), worst-case distortion Θ(log n), and dimension Θ(log n).
The lq-Distortion
lq-distortion: distq(f) = E[distf(u,v)^q]^(1/q), the expectation taken over uniformly chosen pairs u≠v.
[ABN 06]: The lq-distortion is bounded by Θ(q).
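A small sketch of the lq-distortion of a fixed embedding, reusing the 4-point cycle-to-line example (illustrative, not from the slides); q = 1 gives the average distortion, and q → ∞ recovers the worst-case distortion:

```python
import itertools

def lq_distortion(dX, dY, f, points, q):
    """(E[dist_f(u,v)^q])^(1/q) over uniformly random pairs u != v,
    for a non-contracting embedding f."""
    ratios = [dY(f(u), f(v)) / dX(u, v)
              for u, v in itertools.combinations(points, 2)]
    return (sum(r ** q for r in ratios) / len(ratios)) ** (1.0 / q)

d_cycle = lambda u, v: min(abs(u - v), 4 - abs(u - v))
d_line = lambda a, b: abs(a - b)
f = lambda x: x
avg = lq_distortion(d_cycle, d_line, f, [0, 1, 2, 3], 1)
print(avg)   # average distortion: 5 pairs preserved, 1 stretched by 3
```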
Metric Ramsey Problem
Given a metric space, what is the largest subspace with some special structure, e.g. close to Euclidean?
Graph theory: every graph on n vertices contains either a clique or an independent set of size Θ(log n).
Dvoretzky’s theorem…
[BFM 86]: Every n-point metric space contains a subspace of size Ω(cε log n) which embeds in Euclidean space with distortion (1+ε).
Basic Structures: Ultrametric, k-HST [B 96]
An ultrametric is the metric of a labelled tree: each vertex v carries a label Δ(v), labels decrease down the tree, Δ(leaf) = 0, and d(x,z) = Δ(lca(x,z)).
In a k-HST, labels decrease by a factor of at least k along every edge: if v is a child of u then Δ(v) ≤ Δ(u)/k.
An ultrametric k-embeds in a k-HST (moreover, this can be done so that labels are powers of k).
Hierarchically Well-Separated Trees (figure: a tree with root label Δ1, children with labels Δ2 ≤ Δ1/k, grandchildren with labels Δ3 ≤ Δ2/k)
Properties of Ultrametrics
An ultrametric is a tree metric.
Ultrametrics embed isometrically in l2.
[BM 04]: Any n-point ultrametric (1+ε)-embeds in lp^d, where d = O(ε^−2 log n).
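The labelled-tree definition can be sketched concretely; the small tree below (root v, internal vertices w and u, leaves x, y, z, with illustrative labels) is an assumption chosen to satisfy the 2-HST condition, and the code checks the strong triangle inequality that characterizes ultrametrics:

```python
import itertools

# A labelled tree as parent pointers and labels Delta(v); labels
# decrease down the tree and leaves have label 0.
parent = {"v": None, "w": "v", "u": "v", "x": "w", "y": "w", "z": "u"}
label  = {"v": 8.0, "w": 2.0, "u": 4.0, "x": 0.0, "y": 0.0, "z": 0.0}

def ancestors(a):
    path = []
    while a is not None:
        path.append(a)
        a = parent[a]
    return path

def ultra_dist(a, b):
    """d(a, b) = label of the least common ancestor of a and b."""
    if a == b:
        return 0.0
    anc = set(ancestors(a))
    for c in ancestors(b):
        if c in anc:
            return label[c]

leaves = ["x", "y", "z"]
# Strong (ultrametric) triangle inequality: d(a,c) <= max(d(a,b), d(b,c)).
ok = all(ultra_dist(a, c) <= max(ultra_dist(a, b), ultra_dist(b, c))
         for a, b, c in itertools.permutations(leaves, 3))
print(ultra_dist("x", "y"), ultra_dist("x", "z"), ok)   # 2.0 8.0 True
```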
A Metric Ramsey Phenomenon
Consider n equally spaced points on the line.
Choose a “Cantor-like” set of points, and construct a binary tree over them.
The resulting tree is a 3-HST, and the original subspace embeds in this tree with distortion 3.
Size of subspace: n^(log_3 2) ≈ n^0.63.
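A sketch of this construction for n = 3^5: keep the first and last third of the interval recursively, put a binary tree over the surviving points, label each tree node by the line diameter of its interval, and measure the resulting distortion (the labelling convention is an assumption; any non-contracting choice within a factor 3 works):

```python
def cantor(lo, length):
    """Keep the first and last third, recursively (length a power of 3)."""
    if length == 1:
        return [lo]
    t = length // 3
    return cantor(lo, t) + cantor(lo + 2 * t, t)

def tree_dist(a, b, lo, length):
    """Ultrametric on the Cantor points: the label of a node covering an
    interval of length L is L - 1, the interval's line diameter."""
    if a == b:
        return 0
    t = length // 3
    if a < lo + t and b >= lo + 2 * t:        # different children
        return length - 1
    if a < lo + t:                             # both in the left child
        return tree_dist(a, b, lo, t)
    return tree_dist(a, b, lo + 2 * t, t)      # both in the right child

n = 3 ** 5                                     # 243 points on the line
pts = cantor(0, n)                             # 2^5 = 32 survive
ratios = [tree_dist(a, b, 0, n) / (b - a)
          for i, a in enumerate(pts) for b in pts[i + 1:]]
print(len(pts), min(ratios), max(ratios))      # non-contracting, distortion < 3
```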
Metric Ramsey Phenomena
[BLMN 03, MN 06, B 06]: Any n-point metric space contains a subspace of size n^(1−ε) which embeds in an ultrametric with distortion Θ(1/ε).
[B 06]: Deterministic construction.
[B 06]: Any n-point metric space contains a subspace of linear size which embeds in an ultrametric with lq-distortion bounded by Õ(q).
Metric Ramsey Theorems Key Ingredient: Partitions!
Complete Representation via Ultrametrics?
Goal: Given an n-point metric space, embed it into a single ultrametric with low distortion.
Lower bound: Ω(n); in fact this holds even for embedding the n-cycle into arbitrary tree metrics [RR 95].
Probabilistic Embedding
[Karp 89]: The n-cycle probabilistically embeds into its n spanning paths (delete a uniformly random edge of the cycle C) with distortion 2.
If u,v are adjacent in C then
E[dL(u,v)] = (1/n)·(n−1) + ((n−1)/n)·1 = 2(n−1)/n < 2 = 2·dC(u,v).
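The expectation above can be checked exactly: delete each of the n cycle edges in turn and average the resulting path distance of an adjacent pair (the helper names are illustrative):

```python
from fractions import Fraction

n = 12
def line_dist_after_cut(u, v, e):
    """Distance on the path obtained by deleting cycle edge (e, e+1):
    walk around the cycle without crossing the deleted edge."""
    a, b = sorted((u, v))
    crosses = a <= e < b             # the short arc a..b uses edge (e, e+1)
    return (n - (b - a)) if crosses else (b - a)

# Expected path distance of an adjacent pair under a uniformly random cut.
u, v = 0, 1
exp = sum(Fraction(line_dist_after_cut(u, v, e)) for e in range(n)) / n
print(exp, 2 * Fraction(n - 1, n))   # both equal 2(n-1)/n = 11/6 < 2
```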
Probabilistic Embedding [B 96,98,04, FRT 03]: Any n-point metric space probabilistically embeds into an ultrametric with distortion Θ(log n) [ABN 05,06, CDGKS 05]: lq-distortion is Θ(q)
Probabilistic Embedding Key Ingredient: Probabilistic Partitions
Probabilistic Partitions
P = {S1, S2, …, St} is a partition of X; P(x) denotes the cluster containing x.
P is Δ-bounded if diam(Si) ≤ Δ for all i.
A probabilistic partition P is a distribution over a set of partitions.
P is (η,δ)-padded if Pr[B(x, η(x)·Δ) ⊆ P(x)] ≥ δ. Call P η-padded if δ = 1/2.
[B 96]: η = Θ(1/log n).
[CKR 01 + FRT 03]: η(x) = Ω(1/log ρ(x,Δ)), where ρ(x,Δ) is the local growth rate of x at scale Δ.
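A minimal sketch of a Δ-bounded random partition in the style of [CKR 01] (random radius, random order of centers; this is one simple variant, not the exact construction analyzed on the slides):

```python
import random

def ckr_partition(points, d, delta, rng=random):
    """Delta-bounded random partition: pick a radius r uniformly in
    [delta/4, delta/2) and a random order of centers; each point joins
    the first center within distance r (every point is its own
    candidate center, so everyone is assigned)."""
    r = (1 + rng.random()) * delta / 4.0
    centers = list(points)
    rng.shuffle(centers)
    cluster = {}
    for x in points:
        for c in centers:
            if d(x, c) <= r:
                cluster[x] = c
                break
    return cluster

# Points on a line: every cluster lies in a ball of radius r < delta/2,
# so its diameter is at most 2r < delta.
pts = list(range(30))
d = lambda a, b: abs(a - b)
part = ckr_partition(pts, d, delta=8.0, rng=random.Random(0))
diam = max(max(abs(a - b) for a in pts if part[a] == c
                          for b in pts if part[b] == c)
           for c in set(part.values()))
print(diam < 8.0)   # True: the partition is Delta-bounded
```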
Partitions and Embedding [B 96, Rao 99, …]
Let Δi = 4^i be the scales, up to the diameter Δ of X. For each scale i, create a probabilistic Δi-bounded partition Pi that is η-padded.
For each cluster choose σi(S) ~ Ber(½) i.i.d.; set fi(x) = σi(Pi(x))·d(x, X\Pi(x)).
Repeat O(log n) times.
Distortion: O(η^−1·log^(1/p) Δ). Dimension: O(log n·log Δ).
Uniform Probabilistic Partitions
In a uniform probabilistic partition, η: X→[0,1] assigns all points in a cluster the same padding parameter.
[ABN 06] Uniform partition lemma: there exists a uniform probabilistic Δ-bounded partition such that for any cluster, η(x) = 1/log ρ(v,Δ) for a representative point v of the cluster, where ρ(x,r), the local growth rate of x at radius r, is the ratio between the sizes of the balls B(x,r) and B(x,r/α) for a fixed constant α > 1.
Embedding into a Single Dimension
Let Δi = 4^i. For each scale i, create a uniformly padded probabilistic Δi-bounded partition Pi.
For each cluster choose σi(S) ~ Ber(½) i.i.d.; set fi(x) = σi(Pi(x))·ηi^−1(x)·d(x, X\Pi(x)).
Upper bound: |f(x)−f(y)| ≤ O(log n)·d(x,y).
Lower bound: E[|f(x)−f(y)|] ≥ Ω(d(x,y)).
Replicate D = Θ(log n) times to get high probability.
Upper Bound: |f(x)−f(y)| ≤ O(log n)·d(x,y)
For all x,y ∈ X:
– Pi(x) ≠ Pi(y) implies fi(x) ≤ ηi^−1(x)·d(x,y)
– Pi(x) = Pi(y) implies fi(x) − fi(y) ≤ ηi^−1(x)·d(x,y)
Uses the uniform padding within the cluster.
Lower Bound
Take a scale i such that Δi ≈ d(x,y)/4. It must be that Pi(x) ≠ Pi(y).
With probability ½: ηi^−1(x)·d(x, X\Pi(x)) ≥ Δi.
Lower bound: E[|f(x)−f(y)|] ≥ Ω(d(x,y))
Two cases (R = the contribution of the remaining scales):
R < Δi/2: with prob. ⅛, σi(Pi(x)) = 1 and σi(Pi(y)) = 0; then fi(x) ≥ Δi, fi(y) = 0, so |f(x)−f(y)| ≥ Δi/2 = Ω(d(x,y)).
R ≥ Δi/2: with prob. ¼, σi(Pi(x)) = 0 and σi(Pi(y)) = 0, so fi(x) = fi(y) = 0.
Partial Embedding & Scaling Distortion
Definition: A (1−ε)-partial embedding has distortion D(ε) if at least a 1−ε fraction of the pairs satisfy distf(u,v) ≤ D(ε).
Definition: An embedding has scaling distortion D(·) if it is a (1−ε)-partial embedding with distortion D(ε), for all ε > 0. [KSW 04]
[ABN 05, CDGKS 05]: Partial distortion and dimension Θ(log(1/ε)).
[ABN 06]: Scaling distortion Θ(log(1/ε)) for all metrics.
lq-Distortion vs. Scaling Distortion
Upper bound D(ε) = c·log(1/ε) on scaling distortion:
½ of the pairs have distortion ≤ c log 2 = c
+ ¼ of the pairs have distortion ≤ c log 4 = 2c
+ ⅛ of the pairs have distortion ≤ c log 8 = 3c
…
Average distortion = O(1)
Worst-case distortion = O(log n)
lq-distortion = O(min{q, log n})
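The geometric series behind the O(1) average can be checked numerically (c = 1 as an illustrative constant):

```python
# Average distortion under scaling distortion D(eps) = c*log2(1/eps):
# a 1/2^i fraction of the pairs has distortion at most c*i, so the
# average is at most sum_{i>=1} (1/2^i) * c * i = 2c = O(1).
c = 1.0
avg_bound = sum(c * i / 2 ** i for i in range(1, 200))
print(avg_bound)   # converges to 2c
```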
Coarse Scaling Embedding into Lp
Definition: For u ∈ X, rε(u) is the minimal radius such that |B(u, rε(u))| ≥ εn.
Coarse scaling embedding: for each u ∈ X, preserve distances to every v with d(u,v) ≥ rε(u).
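A direct sketch of rε(u) on a finite metric: sort the distances from u and take the smallest radius whose ball holds εn points (the function name is illustrative):

```python
def r_eps(u, points, d, eps):
    """Minimal radius r such that the ball B(u, r) contains at least
    eps * n points (u itself included, at distance 0)."""
    need = eps * len(points)
    count = 0
    for r in sorted(d(u, v) for v in points):
        count += 1
        if count >= need:
            return r

pts = list(range(10))                 # 10 equally spaced points on a line
d = lambda a, b: abs(a - b)
print(r_eps(0, pts, d, 0.5))          # 4: B(0,4) = {0,...,4} has 5 points
print(r_eps(5, pts, d, 0.5))          # 2: B(5,2) = {3,...,7} has 5 points
```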
Scaling Distortion
Claim: If d(x,y) ≥ rε(x) then 1 ≤ distf(x,y) ≤ O(log 1/ε).
Let l be the scale with d(x,y) ≤ Δl < 4·d(x,y).
Lower bound: E[|f(x)−f(y)|] ≥ Ω(d(x,y))
Upper bound for high-diameter terms
Upper bound for low-diameter terms
Replicate D = Θ(log n) times to get high probability.
Upper bound for high-diameter terms: |f(x)−f(y)| ≤ O(log 1/ε)·d(x,y)
Scale l such that rε(x) ≤ d(x,y) ≤ Δl < 4·d(x,y).
Upper bound for low-diameter terms: |f(x)−f(y)| ≤ O(1)·d(x,y)
Scale l such that d(x,y) ≤ Δl < 4·d(x,y); all lower scales i ≤ l are bounded by Δi.
Embedding into Trees with Constant Average Distortion
[ABN 07a]: An embedding of any n-point metric into a single ultrametric; an embedding of any graph on n vertices into a spanning tree of the graph.
Average distortion = O(1)
l2-distortion = O(√(log n))
lq-distortion = Θ(n^(1−2/q)) for 2 < q ≤ ∞
Embedding Metrics in their Intrinsic Dimension
Definition: A metric space X has doubling constant λ if any ball of radius r > 0 can be covered by λ balls of half the radius.
Doubling dimension: dim(X) = log λ.
[ABN 07b]: Any n-point metric space X embeds into Lp with distortion O(log^(1+θ) n) and dimension O(dim(X)).
Same embedding, using: nets; Lovász Local Lemma.
Distortion-dimension tradeoff.
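The doubling constant of a small finite metric can be bounded directly by greedy net covers; greedy gives an upper bound on the optimal cover size, so this is a sketch of the definition rather than an exact computation:

```python
import itertools, math

def doubling_constant(points, d):
    """Greedy upper bound on the doubling constant: for every center x
    and occurring radius r, cover B(x, r) by balls of radius r/2 around
    greedily chosen net points, and take the worst cover size."""
    lam = 1
    radii = {d(a, b) for a, b in itertools.combinations(points, 2)}
    for x in points:
        for r in radii:
            ball = [y for y in points if d(x, y) <= r]
            net = []                       # greedy r/2-net of the ball
            for y in ball:
                if all(d(y, z) > r / 2 for z in net):
                    net.append(y)
            lam = max(lam, len(net))
    return lam

pts = list(range(16))                      # a path: doubling dimension O(1)
d = lambda a, b: abs(a - b)
lam = doubling_constant(pts, d)
print(lam, math.log2(lam))                 # small constant; dim = log2(lam)
```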
Locality Preserving Embeddings
Def: A k-local embedding has distortion D(k) if for every pair of k-nearest neighbors x,y: distf(x,y) ≤ D(k).
[ABN 07c]: For fixed k, a k-local embedding into Lp with distortion Θ(log k) and dimension Θ(log k) (under a very weak growth-bound condition).
[ABN 07c]: A k-local embedding into Lp with distortion Õ(log k) on neighbors, for all k simultaneously, and dimension Θ(log n).
Same embedding, appropriately scaled down; Lovász Local Lemma.
Summary Unified framework for embedding finite metrics. Probabilistic embedding into ultrametrics. Metric Ramsey theorems. New measures of distortion. Embeddings with strong properties: Optimal scaling distortion. Constant average distortion. Tight distortion-dimension tradeoff. Embedding metrics in their intrinsic dimension. Embedding that strongly preserve locality.
Embedding Examples
Embedding in Normed Spaces
[Fréchet Embedding]: Any n-point metric space embeds isometrically in L∞, via f(x) = (d(x,w))w∈X.
Proof: For every coordinate w, |d(x,w) − d(y,w)| ≤ d(x,y) by the triangle inequality, and the coordinate w = y attains |d(x,y) − d(y,y)| = d(x,y).
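The Fréchet embedding is short enough to implement and verify exactly on a small example:

```python
import itertools

def frechet_embedding(points, d):
    """f(x) = (d(x, w))_{w in X}: one L-infinity coordinate per point."""
    return {x: [d(x, w) for w in points] for x in points}

def linf(a, b):
    return max(abs(s - t) for s, t in zip(a, b))

# Isometry: |d(x,w) - d(y,w)| <= d(x,y) in every coordinate w (triangle
# inequality), and coordinate w = y attains exactly d(x,y).
pts = [0, 1, 3, 7]
d = lambda a, b: abs(a - b)
f = frechet_embedding(pts, d)
iso = all(linf(f[x], f[y]) == d(x, y)
          for x, y in itertools.combinations(pts, 2))
print(iso)   # True
```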
Upper Bound
fi(x) = σi(Pi(x))·d(x, X\Pi(x))
For all x,y ∈ X:
Pi(x) ≠ Pi(y) implies d(x, X\Pi(x)) ≤ d(x,y)
Pi(x) = Pi(y) implies d(x,A) − d(y,A) ≤ d(x,y)
Lower bound:
Take a scale i such that Δi ≈ d(x,y)/4. It must be that Pi(x) ≠ Pi(y).
With probability ½: d(x, X\Pi(x)) ≥ ηΔi.
With probability ¼: σi(Pi(x)) = 1 and σi(Pi(y)) = 0.
Hence with probability ⅛, |fi(x) − fi(y)| ≥ ηΔi.
Final Remarks
Developing a mathematical theory of embeddings of finite metric spaces.
Fruitful interaction between computer science and pure/applied mathematics.
New concepts of embedding yield surprisingly strong properties.