Advances in Metric Embedding Theory
Yair Bartal, Hebrew University & Caltech
UCLA IPAM 07

Metric Spaces
• Metric space: (X,d), d: X^2 → R^+
  - d(u,v) = d(v,u)
  - d(v,w) ≤ d(v,u) + d(u,w)
  - d(u,u) = 0
• Data representation: pictures (e.g. faces), web pages, DNA sequences, …
• Network: communication distance

Metric Embedding
• Simple representation: translate metric data into an easy-to-analyze form and gain geometric structure, e.g. embed in low-dimensional Euclidean space.
• Algorithmic application: apply algorithms for a “nice” space to solve problems on “problematic” metric spaces.

Embedding Metric Spaces
• Metric spaces (X,d_X), (Y,d_Y)
• An embedding is a function f: X→Y
• For an embedding f and u,v in X, let dist_f(u,v) = d_Y(f(u),f(v)) / d_X(u,v)
• Distortion c = max_{u,v ∈ X} dist_f(u,v) / min_{u,v ∈ X} dist_f(u,v)
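
The distortion defined on this slide can be computed directly for a small finite space. The following is a minimal sketch; the 4-cycle, the map to the line, and all function names are illustrative assumptions, not taken from the talk.

```python
from itertools import combinations

def distortion(points, d_X, d_Y, f):
    """Return max dist_f / min dist_f over all pairs of distinct points."""
    ratios = [d_Y(f(u), f(v)) / d_X(u, v) for u, v in combinations(points, 2)]
    return max(ratios) / min(ratios)

# Hypothetical example: the 4-cycle (shortest-path metric) mapped onto the line.
cycle = [0, 1, 2, 3]
d_cycle = lambda u, v: min(abs(u - v), 4 - abs(u - v))
d_line = lambda a, b: abs(a - b)
to_line = lambda u: float(u)

c = distortion(cycle, d_cycle, d_line, to_line)
print(c)  # the pair (0, 3) is stretched from 1 to 3; no pair is contracted
```

The sketch assumes f is injective, so that the minimum ratio is positive.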

Special Metric Spaces
• Euclidean space
• l_p metric in R^n: ||x - y||_p = (Σ_i |x_i - y_i|^p)^{1/p}
• Planar metrics
• Tree metrics
• Ultrametrics
• Doubling

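Embedding in Normed Spaces
• [Fréchet Embedding]: Any n-point metric space embeds isometrically in L_∞
• Proof: map each x to the vector f(x) = (d(x,w))_{w ∈ X}. For any x, y and w, |d(x,w) - d(y,w)| ≤ d(x,y) by the triangle inequality, with equality at w = y.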
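
The Fréchet embedding sends each point to the vector of its distances to all points of the space. A minimal sketch that checks isometry in the max norm follows; the 5-cycle input and the function names are assumptions chosen for illustration.

```python
from itertools import combinations

def frechet_embedding(points, d):
    """Map each point x to the vector (d(x, w)) over all w in the space."""
    return {x: [d(x, w) for w in points] for x in points}

def linf(a, b):
    """Max-norm distance between two coordinate vectors."""
    return max(abs(s - t) for s, t in zip(a, b))

# Hypothetical input: shortest-path metric of the 5-cycle.
points = list(range(5))
d = lambda u, v: min(abs(u - v), 5 - abs(u - v))

emb = frechet_embedding(points, d)
is_isometric = all(linf(emb[u], emb[v]) == d(u, v)
                   for u, v in combinations(points, 2))
print(is_isometric)
```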

Embedding in Normed Spaces
• [Bourgain 85]: Any n-point metric space embeds in L_p with distortion Θ(log n)
• [Johnson-Lindenstrauss 85]: Any n-point subset of Euclidean space embeds with distortion (1+ε) in dimension Θ(ε^{-2} log n)
• [ABN 06, B 06]: Dimension Θ(log n). In fact: Θ*(log n / loglog n)
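
Dimension reduction of the Johnson-Lindenstrauss type is commonly realized with a scaled Gaussian random matrix. The sketch below uses that standard construction, which is not necessarily the one the talk refers to; the sizes (8 points, 200 to 100 dimensions) and the seed are arbitrary assumptions.

```python
import math
import random
from itertools import combinations

def random_projection(vectors, k, rng):
    """Project each vector to k dimensions with a Gaussian matrix scaled by 1/sqrt(k)."""
    dim = len(vectors[0])
    matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(dim)]
              for _ in range(k)]
    return [[sum(r * x for r, x in zip(row, v)) for row in matrix] for v in vectors]

def euclid(a, b):
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(a, b)))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
vectors = [[rng.uniform(-1, 1) for _ in range(200)] for _ in range(8)]
projected = random_projection(vectors, 100, rng)

# Ratio of projected to original distance for every pair; these concentrate near 1.
ratios = [euclid(projected[i], projected[j]) / euclid(vectors[i], vectors[j])
          for i, j in combinations(range(8), 2)]
print(min(ratios), max(ratios))
```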

Embedding Metrics in their Intrinsic Dimension
• Definition: A metric space X has doubling constant λ if any ball of radius r>0 can be covered by λ balls of half the radius.
• Doubling dimension: dim(X) = log λ
• [ABN 07b]: Any n-point metric space X can be embedded into L_p with distortion O(log^{1+θ} n) and dimension O(dim(X))
• Same embedding, using:
  - nets
  - Lovász Local Lemma
• Distortion-dimension tradeoff

Average Distortion
• Practical measure of the quality of an embedding
• Network embedding, multi-dimensional scaling, biology, vision, …
• Given a non-contracting embedding f: (X,d_X)→(Y,d_Y): avgdist(f) = (1 / (n choose 2)) · Σ_{u≠v ∈ X} dist_f(u,v)
• [ABN06]: Every n-point metric space embeds into L_p with average distortion O(1), worst-case distortion Θ(log n) and dimension Θ(log n).

The l_q-Distortion
• l_q-distortion: dist_q(f) = ( (1 / (n choose 2)) · Σ_{u≠v ∈ X} dist_f(u,v)^q )^{1/q}
• [ABN 06]: l_q-distortion is bounded by Θ(q)
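
Average distortion and l_q-distortion are the q-th power means of the pairwise distortions of a non-contracting embedding, taken over a uniformly random pair. A minimal sketch under that reading follows; the function names and the 4-cycle example are assumptions.

```python
from itertools import combinations

def pair_distortions(points, d_X, d_Y, f):
    """dist_f(u, v) = d_Y(f(u), f(v)) / d_X(u, v) for every pair."""
    return [d_Y(f(u), f(v)) / d_X(u, v) for u, v in combinations(points, 2)]

def lq_distortion(ratios, q):
    """q-th power mean of the pairwise distortions; q = inf gives the worst case."""
    if q == float("inf"):
        return max(ratios)
    return (sum(r ** q for r in ratios) / len(ratios)) ** (1.0 / q)

# Hypothetical example: the 4-cycle mapped onto the line (non-contracting).
cycle = [0, 1, 2, 3]
d_cycle = lambda u, v: min(abs(u - v), 4 - abs(u - v))
d_line = lambda a, b: abs(a - b)
ratios = pair_distortions(cycle, d_cycle, d_line, lambda u: float(u))

average = lq_distortion(ratios, 1)          # q = 1 is the average distortion
worst = lq_distortion(ratios, float("inf"))
print(average, lq_distortion(ratios, 2), worst)
```

Here five of the six pairs are preserved exactly and one is stretched by 3, so the average stays close to 1 while the worst case is 3.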

Dimension Reduction into Constant Dimension
• [B 07]: Any finite subset of Euclidean space embeds in dimension h with l_q-distortion e^{O(q/h)} ~ 1 + O(q/h)
• Corollary: Every finite metric space embeds into L_p in dimension h with l_q-distortion

Local Embeddings
• Def: A k-local embedding has distortion D(k) if for every pair of k-nearest neighbors x,y: dist_f(x,y) ≤ D(k)
• [ABN 07c]: For fixed k, k-local embedding into L_p with distortion O(log k) and dimension O(log k) (under a very weak growth bound condition)
• [ABN 07c]: k-local embedding into L_p with distortion Õ(log k) on neighbors, for all k simultaneously, and dimension O(log n)
• Same embedding method
• Lovász Local Lemma

Local Dimension Reduction
• [BRS 07]: For fixed k, any finite set of points in Euclidean space has a k-local embedding with distortion (1+ε) in dimension O(ε^{-2} log k) (under a very weak growth bound condition)
• New embedding ideas
• Lovász Local Lemma

Time for a …

Metric Ramsey Problem
• Given a metric space, what is the largest subspace which has some special structure, e.g. is close to Euclidean?
• Graph theory: Every graph of size n contains either a clique or an independent set of size Ω(log n)
• Dvoretzky’s theorem…
• [BFM 86]: Every n-point metric space contains a subspace of size Ω(c_ε log n) which embeds in Euclidean space with distortion (1+ε)

Basic Structures: Ultrametric, k-HST [B 96]
• d(x,z) = Δ(lca(x,z)) = Δ(v)
• 0 = Δ(z) ≤ Δ(w)/k ≤ Δ(v)/k^2 ≤ Δ(u)/k^3
• [Figure: labeled tree with internal nodes u, v, w and leaves x, z, where Δ(z) = 0]
• An ultrametric k-embeds in a k-HST (moreover, this can be done so that labels are powers of k).
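
In a labeled tree of this kind, the distance between two leaves is the label of their least common ancestor. The sketch below builds a small example and checks the ultrametric (strong triangle) inequality; the nested-tuple encoding, the example tree and all names are assumptions made for illustration.

```python
from itertools import permutations

# A node is (label, [children]); a leaf is a plain string (its label is 0).
def leaf_paths(tree, prefix=()):
    """Yield (leaf, path), where path lists (ancestor label, child index) from the root."""
    if isinstance(tree, str):
        yield tree, prefix
        return
    label, children = tree
    for i, child in enumerate(children):
        yield from leaf_paths(child, prefix + ((label, i),))

def hst_metric(tree):
    paths = dict(leaf_paths(tree))
    def dist(x, z):
        if x == z:
            return 0
        # The first ancestor at which the two root-to-leaf paths split is the lca.
        for (label, i), (_, j) in zip(paths[x], paths[z]):
            if i != j:
                return label
        raise ValueError("leaf names must be unique")
    return list(paths), dist

# Hypothetical 2-HST: labels drop by a factor of at least 2 at each level.
tree = (8, [(4, ["a", (2, ["b", "c"])]), (4, ["d", "e"])])
leaves, dist = hst_metric(tree)
is_ultrametric = all(dist(x, z) <= max(dist(x, y), dist(y, z))
                     for x, y, z in permutations(leaves, 3))
print(dist("b", "c"), dist("a", "c"), dist("c", "e"), is_ultrametric)
```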

Hierarchically Well-Separated Trees
• [Figure: tree whose nodes carry labels Δ_1, Δ_2, Δ_3 by level, with Δ_2 ≤ Δ_1/k and Δ_3 ≤ Δ_2/k]

Properties of Ultrametrics
• An ultrametric is a tree metric.
• Ultrametrics embed isometrically in l_2.
• [BM 04]: Any n-point ultrametric (1+ε)-embeds in l_p^d, where d = O(ε^{-2} log n).

A Metric Ramsey Phenomenon
• Consider n equally spaced points on the line.
• Choose a “Cantor-like” set of points, and construct a binary tree over them.
• The resulting tree is a 3-HST, and the original subspace embeds in this tree with distortion 3.
• Size of subspace:
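
One way to make this example concrete is the standard middle-third construction: among the points 0, …, 3^k - 1, keep those whose base-3 digits are all 0 or 2. This specific choice is an assumption, since the slide only says “Cantor like”. Under it the subset has 2^k = n^{log_3 2} points, and labeling each tree node by 3^(j+1), where j is the highest base-3 digit in which two points differ, gives a non-contracting tree metric that stretches distances by less than 3.

```python
from itertools import combinations

def cantor_subset(k):
    """Points of {0, ..., 3^k - 1} whose base-3 digits are all 0 or 2."""
    pts = [0]
    for j in range(k):
        pts = pts + [p + 2 * 3 ** j for p in pts]
    return sorted(pts)

def tree_distance(x, y):
    """Label of the lca: 3^(j+1), where j is the highest base-3 digit where x, y differ."""
    j, top = 0, -1
    while x or y:
        if x % 3 != y % 3:
            top = j
        x, y, j = x // 3, y // 3, j + 1
    return 3 ** (top + 1)

k = 4
subset = cantor_subset(k)  # 16 of the 81 points
ratios = [tree_distance(x, y) / abs(x - y) for x, y in combinations(subset, 2)]
print(len(subset), min(ratios), max(ratios))
```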

Metric Ramsey Phenomena
• [BLMN 03, MN 06, B 06]: Any n-point metric space contains a subspace of size n^{1-ε} which embeds in an ultrametric with distortion Θ(1/ε)
• [B 06]: Any n-point metric space contains a subspace of linear size which embeds in an ultrametric with l_q-distortion bounded by Õ(q)

Metric Ramsey Theorems
• Key ingredient: Partitions

Complete Representation via Ultrametrics?
• Goal: Given an n-point metric space, we would like to embed it into an ultrametric with low distortion.
• Lower bound: Ω(n); in fact this holds even for embedding the n-cycle into arbitrary tree metrics [RR 95]

Probabilistic Embedding
• [Karp 89]: The n-cycle probabilistically embeds in n line spaces with distortion 2
• If u,v are adjacent in the cycle C then E(d_L(u,v)) = (n-1)/n + (n-1)/n < 2 = 2·d_C(u,v)
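
The expectation for the n-cycle can be checked exactly by deleting each of the n edges in turn and averaging the resulting path distances. This is a minimal sketch; the function names are assumptions, and vertices are numbered 0, …, n-1 around the cycle.

```python
from fractions import Fraction
from itertools import combinations

def expected_line_distance(n, u, v):
    """E[d_L(u, v)] when a uniformly random edge of the n-cycle is deleted."""
    total = Fraction(0)
    for cut in range(n):  # the deleted edge is (cut, cut + 1 mod n)
        # Positions of u and v on the path that starts right after the cut.
        pu, pv = (u - cut - 1) % n, (v - cut - 1) % n
        total += abs(pu - pv)
    return total / n

def cycle_distance(n, u, v):
    return min(abs(u - v), n - abs(u - v))

n = 10
adjacent = expected_line_distance(n, 0, 1)  # (n-1)/n + (n-1)/n
print(adjacent)
```

Deleting an edge never shortens a distance, so each line embedding is non-contracting, and the expected stretch of every pair stays below 2.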

Probabilistic Embedding
• [B 96,98,04, FRT 03]: Any n-point metric space probabilistically embeds into an ultrametric with distortion Θ(log n)
• [ABN 05,06, CDGKS 05]: l_q-distortion is Θ(q)

Probabilistic Embedding
• Key ingredient: Probabilistic Partitions

Probabilistic Partitions
• P = {S_1, S_2, …, S_t} is a partition of X if the clusters S_i are pairwise disjoint and their union is X.
• P(x) is the cluster containing x.
• P is Δ-bounded if diam(S_i) ≤ Δ for all i.
• A probabilistic partition P is a distribution over a set of partitions.
• P is (η,δ)-padded if Pr[B(x,ηΔ) ⊆ P(x)] ≥ δ for every x.
• Call P η-padded if δ = ½.
• [Figure: points x1 and x2 with their padding η]
• [B 96]: η = Ω(1/log n)
• [CKR01+FRT03, ABN06]: η(x) = Ω(1/log ρ(x,Δ))
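
A Δ-bounded probabilistic partition can be sampled in the style of the [CKR01] construction: pick a random radius in [Δ/4, Δ/2] and a random order of the points, then assign each point to the first center in that order that lies within the radius. The sketch below follows that recipe under simplifying assumptions (points on a line, illustrative names) and does not reproduce the padding analysis.

```python
import random
from itertools import combinations

def random_partition(points, d, delta, rng):
    """Sample one delta-bounded partition: random radius, random order of centers."""
    radius = rng.uniform(delta / 4, delta / 2)
    order = points[:]
    rng.shuffle(order)
    clusters = {}
    for x in points:
        # x itself qualifies as a center, so a match always exists.
        center = next(c for c in order if d(c, x) <= radius)
        clusters.setdefault(center, []).append(x)
    return list(clusters.values())

# Hypothetical input: 20 equally spaced points on the line, scale delta = 6.
points = list(range(20))
d = lambda u, v: abs(u - v)
delta = 6
clusters = random_partition(points, d, delta, random.Random(1))
print(clusters)
```

Every cluster lies in a ball of radius at most Δ/2 around its center, so its diameter is at most Δ.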

Partitions and Embedding
• [B 96, Rao 99, …]
• Let Δ_i = 4^i be the scales.
• For each scale i, create a probabilistic Δ_i-bounded partition P_i that is η-padded.
• For each cluster choose σ_i(S) ~ Ber(½) i.i.d.
• f_i(x) = σ_i(P_i(x)) · d(x, X\P_i(x))
• Repeat O(log n) times.
• Distortion: O(η^{-1} · log^{1/p} Δ).
• Dimension: O(log n · log Δ).
• [Figure: scales Δ_i = 4, 16, … up to Δ = diameter of X; a point x and its distance d(x, X\P(x)) to the outside of its cluster]
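
A single coordinate f_i(x) = σ_i(P_i(x)) · d(x, X\P_i(x)) can be computed for one fixed partition as sketched below. The partition, the coin values and the names are illustrative assumptions; in particular, returning 0 when a cluster is the whole space is a convention chosen here, not something stated on the slide.

```python
from itertools import combinations

def coordinate(points, d, partition, sigma):
    """f(x) = sigma(cluster of x) * distance from x to the nearest point outside it."""
    f = {}
    for idx, cluster in enumerate(partition):
        outside = [y for y in points if y not in cluster]
        for x in cluster:
            # Assumed convention: distance to an empty complement counts as 0.
            dist_out = min((d(x, y) for y in outside), default=0)
            f[x] = sigma[idx] * dist_out
    return f

# Hypothetical input: 10 points on the line, three clusters, coins 1, 0, 1.
points = list(range(10))
d = lambda u, v: abs(u - v)
partition = [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
sigma = [1, 0, 1]

f = coordinate(points, d, partition, sigma)
is_lipschitz = all(abs(f[x] - f[y]) <= d(x, y) for x, y in combinations(points, 2))
print(f, is_lipschitz)
```

The coordinate never expands a distance: inside a cluster this follows from the triangle inequality, and across clusters each value is at most the distance to the other point.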

Time to …

Uniform Probabilistic Partitions
• In a uniform probabilistic partition, η: X→[0,1] and all points in a cluster have the same padding parameter.
• [ABN 06] Uniform partition lemma: There exists a uniform probabilistic Δ-bounded partition such that for any x, η(x) = log^{-1} ρ(v,Δ), where
• The local growth rate of x at radius r is:
• [Figure: clusters C_1 and C_2 containing points v_1, v_2, v_3, with cluster padding parameters η(C_1) and η(C_2)]

Embedding into a single dimension
• Let Δ_i = 4^i.
• For each scale i, create uniformly padded probabilistic Δ_i-bounded partitions P_i.
• For each cluster choose σ_i(S) ~ Ber(½) i.i.d., and set f_i(x) = σ_i(P_i(x)) · η_i^{-1}(x) · d(x, X\P_i(x)), with f(x) = Σ_i f_i(x).
1. Upper bound: |f(x)-f(y)| ≤ O(log n)·d(x,y).
2. Lower bound: E[|f(x)-f(y)|] ≥ Ω(d(x,y)).
3. Replicate D = Θ(log n) times to get high probability.

Upper Bound: |f(x)-f(y)| ≤ O(log n)·d(x,y)
• For all x,y ∈ X:
  - P_i(x) ≠ P_i(y) implies f_i(x) ≤ η_i^{-1}(x) · d(x,y)
  - P_i(x) = P_i(y) implies f_i(x) - f_i(y) ≤ η_i^{-1}(x) · d(x,y)
• Use uniform padding in cluster

Lower Bound:
• [Figure: points x and y]
• Take a scale i such that Δ_i ≈ d(x,y)/4.
• It must be that P_i(x) ≠ P_i(y).
• With probability ½: η_i^{-1}(x) · d(x, X\P_i(x)) ≥ Δ_i

Lower bound: E[|f(x)-f(y)|] ≥ d(x,y)
• Two cases:
1. R < Δ_i/2:
  - with prob. ⅛: σ_i(P_i(x)) = 1 and σ_i(P_i(y)) = 0
  - then f_i(x) ≥ Δ_i, f_i(y) = 0
  - |f(x)-f(y)| ≥ Δ_i/2 = Ω(d(x,y)).
2. R ≥ Δ_i/2:
  - with prob. ¼: σ_i(P_i(x)) = 0 and σ_i(P_i(y)) = 0
  - then f_i(x) = f_i(y) = 0
  - |f(x)-f(y)| ≥ Δ_i/2 = Ω(d(x,y)).

Partial Embedding & Scaling Distortion
• Definition: A (1-ε)-partial embedding has distortion D(ε) if at least a 1-ε fraction of the pairs satisfy dist_f(u,v) ≤ D(ε)
• Definition: An embedding has scaling distortion D(·) if it is a (1-ε)-partial embedding with distortion D(ε), for all ε>0
• [KSW 04]
• [ABN 05, CDGKS 05]: Partial distortion and dimension O(log(1/ε))
• [ABN06]: Scaling distortion O(log(1/ε)) for all metrics
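
For a given list of pairwise distortions, the best D(ε) of a (1-ε)-partial embedding is simply a quantile: the smallest value that at least a 1-ε fraction of the pairs do not exceed. A minimal sketch follows; the names and the sample distortions are assumptions.

```python
import math

def partial_distortion(ratios, eps):
    """Smallest D such that at least a (1 - eps) fraction of pairs have dist_f <= D."""
    ordered = sorted(ratios)
    need = max(1, math.ceil((1 - eps) * len(ordered)))
    return ordered[need - 1]

# Hypothetical pairwise distortions: five pairs preserved, one stretched by 3.
ratios = [1.0, 1.0, 3.0, 1.0, 1.0, 1.0]
print(partial_distortion(ratios, 0.5), partial_distortion(ratios, 0.1))
```

Evaluating this function over a range of ε traces out the scaling distortion profile of the embedding.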

l_q-Distortion vs. Scaling Distortion
• Upper bound D(ε) = c log(1/ε) on scaling distortion:
  - ½ of pairs have distortion ≤ c log 2 = c
  - + ¼ of pairs have distortion ≤ c log 4 = 2c
  - + ⅛ of pairs have distortion ≤ c log 8 = 3c
  - …
• Average distortion = O(1)
• Worst-case distortion = O(log n)
• l_q-distortion = O(min{q, log n})
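
The step from the scaling bound to constant average distortion is a geometric-type sum. A worked version, assuming logarithms are base 2 and that a 2^{-i} fraction of the pairs has distortion at most D(2^{-i}) = c·i, is:

```latex
\[
\mathrm{avgdist}(f)
  \;\le\; \sum_{i \ge 1} 2^{-i}\, D\!\left(2^{-i}\right)
  \;=\; c \sum_{i \ge 1} i\, 2^{-i}
  \;=\; 2c \;=\; O(1).
\]
\[
\text{Worst case: taking } \varepsilon < \binom{n}{2}^{-1} \text{ covers every pair, so }
\max_{u \ne v} \mathrm{dist}_f(u,v) \;\le\; c \log \binom{n}{2} \;=\; O(\log n).
\]
```

The same grouping of pairs, applied to the q-th powers of the distortions, is what yields the l_q bound stated on the slide.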

Coarse Scaling Embedding into L_p
• Definition: For u ∈ X, r_ε(u) is the minimal radius such that |B(u, r_ε(u))| ≥ εn.
• Coarse scaling embedding: For each u ∈ X, preserves distances to v s.t. d(u,v) ≥ r_ε(u).
• [Figure: points u, v, w with their radii r_ε(u), r_ε(v), r_ε(w)]
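
The radius r_ε(u) is the distance from u to its ⌈εn⌉-th closest point, counting u itself. A minimal sketch, assuming closed balls and illustrative names:

```python
import math

def r_eps(points, d, u, eps):
    """Minimal radius r such that the closed ball B(u, r) holds at least eps * n points."""
    dists = sorted(d(u, v) for v in points)  # includes d(u, u) = 0
    need = max(1, math.ceil(eps * len(points)))
    return dists[need - 1]

# Hypothetical input: 10 equally spaced points on the line.
points = list(range(10))
d = lambda a, b: abs(a - b)
print(r_eps(points, d, 0, 0.5), r_eps(points, d, 5, 0.5))
```

An endpoint needs a larger radius than an interior point to capture the same fraction of the space, which is why the guarantee is stated per point u.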

Scaling Distortion
• Claim: If d(x,y) ≥ r_ε(x) then 1 ≤ dist_f(x,y) ≤ O(log 1/ε)
• Let l be the scale with d(x,y) ≤ Δ_l < 4d(x,y)
1. Lower bound: E[|f(x)-f(y)|] ≥ d(x,y)
2. Upper bound for high diameter terms
3. Upper bound for low diameter terms
4. Replicate D = Θ(log n) times to get high probability.

Upper Bound for high diameter terms: |f(x)-f(y)| ≤ O(log 1/ε)·d(x,y)
• Scale l such that r_ε(x) ≤ d(x,y) ≤ Δ_l < 4d(x,y).

Upper Bound for low diameter terms: |f(u)-f(v)| = O(1)·d(u,v)
• Scale l such that d(x,y) ≤ Δ_l < 4d(x,y).
• All lower levels i ≤ l are bounded by Δ_i.

Embedding into Trees with Constant Average Distortion
• [ABN 07a]: An embedding of any n-point metric into a single ultrametric.
• An embedding of any graph on n vertices into a spanning tree of the graph.
  - Average distortion = O(1).
  - L_2-distortion = O(√(log n))
  - L_q-distortion = Θ(n^{1-2/q}), for 2<q≤∞

Conclusion
• Developing a mathematical theory of embedding of finite metric spaces
• Fruitful interaction between computer science and pure/applied mathematics
• New concepts of embedding yield surprisingly strong properties

Summary
• Unified framework for embedding finite metrics.
• Probabilistic embedding into ultrametrics.
• Metric Ramsey theorems.
• New measures of distortion.
• Embeddings with strong properties:
  - Optimal scaling distortion.
  - Constant average distortion.
  - Tight distortion-dimension tradeoff.
  - Embedding metrics in their intrinsic dimension.
  - Embeddings that strongly preserve locality.
