The Intrinsic Dimension of Metric Spaces

The Intrinsic Dimension of Metric Spaces Anupam Gupta Carnegie Mellon University 3ème cycle romand de Recherche Opérationnelle, 2010 Seminar, Lecture #4

Metric space M = (V, d): a set V of points with symmetric, non-negative distances d(x,y) satisfying the triangle inequality d(x,y) ≤ d(x,z) + d(z,y), and d(x,x) = 0.
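To make the definition concrete, here is a minimal Python sketch (my own, not from the slides) that checks the metric axioms on a finite distance matrix; the function name and representation are illustrative choices.

```python
import itertools

def is_metric(d):
    """Check the metric axioms on a finite distance matrix d[i][j]."""
    n = len(d)
    for i, j in itertools.product(range(n), repeat=2):
        if d[i][j] < 0 or d[i][j] != d[j][i]:    # non-negativity, symmetry
            return False
    if any(d[i][i] != 0 for i in range(n)):      # d(x,x) = 0
        return False
    # triangle inequality: d(x,y) <= d(x,z) + d(z,y)
    return all(d[i][j] <= d[i][k] + d[k][j]
               for i, j, k in itertools.product(range(n), repeat=3))
```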

in the previous two lectures… We saw: every metric is (almost) a (random) tree; every metric is (almost) a Euclidean metric; every Euclidean metric is (almost) a low-dimensional metric. Today: how to measure the complexity of a metric space, and how to do “better” dimension reduction.

“better” dimension reduction Can we do “better” than random projections?

tight bounds on dimension reduction Theorem [Johnson-Lindenstrauss 1984]: Given any n-point subset V of Euclidean space, one can map it into O(log n/ε²)-dimensional space while incurring distortion at most (1+ε). Theorem [Alon]: a lower bound of Ω(log n/(ε² log(1/ε))) dimensions, again for the “uniform/equidistant” metric U_n.
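The map behind the upper bound is a random linear projection; a minimal numpy sketch (the scaling and the constant 8 in the target dimension are illustrative choices, not the tight ones):

```python
import numpy as np

def jl_project(X, eps, seed=0):
    """Project n points in R^d (rows of X) into T = O(log n / eps^2) dimensions."""
    n, d = X.shape
    T = int(np.ceil(8 * np.log(n) / eps ** 2))     # illustrative constant
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((d, T)) / np.sqrt(T)   # scaled Gaussian matrix
    return X @ G   # w.h.p. all pairwise distances distorted by at most (1 + eps)
```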

so what do we do now? Note that these are uniform results. (“All n-point Euclidean metrics have property blah(n)…”) Can we give a better “per-instance” guarantee? (“For any n-point Euclidean metric M, we get property blah(M)…”) Note: if the metric contains a copy of the equidistant metric U_t, we know we need Ω(log t) dimensions if we want O(1) distortion. Similar lower bounds hold if our metric contains almost-equidistant metrics. But are these the only obstructions to low dimensionality?

more nuanced results To paraphrase the recently proven results: If a Euclidean metric embeds into R^k for some dimension k with distortion O(1), then we can find an embedding into R^{O(k)} with distortion O(log n) [Chan G. Talwar ’08, Abraham Bartal Neiman ’08]. (JL only says: we can find an embedding into R^{O(log n)}.)

here’s how we do it Come up with convenient ways of capturing the property that there are no large near-equidistant metrics. Formally: define a notion of “intrinsic dimension” of a point set in Euclidean space. In fact, our definitions will hold for any metric space, not just for point sets in Euclidean space, and we will make use of this generality to get good algorithms…

rest of this talk Give two definitions of metric dimension: a) pointwise dimension b) doubling dimension Develop good algorithms for metrics with low pointwise/doubling dimension. Also, get “better” dimension reduction for metrics with low doubling dimension

pointwise dimension The pointwise dimension of a metric is at most k if for all points x and all radii r: (number of points within distance 2r of x) ≤ 2^k · (number of points within distance r of x)
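Read as code, the definition measures the worst local growth rate; a brute-force sketch (my own, for a finite point set with a distance function; names illustrative):

```python
import math

def pointwise_dimension(points, dist):
    """Smallest k with |B(x, 2r)| <= 2^k |B(x, r)| for all points x and radii r."""
    k = 0.0
    for x in points:
        dists = [dist(x, y) for y in points]
        # ratios only change at distances and half-distances from x
        for r in sorted(set(dists) | {d / 2 for d in dists}):
            if r == 0:
                continue
            inner = sum(1 for y in points if dist(x, y) <= r)      # |B(x, r)|
            outer = sum(1 for y in points if dist(x, y) <= 2 * r)  # |B(x, 2r)|
            k = max(k, math.log2(outer / inner))
    return k
```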

pointwise dimension captures sets of regularly spaced points in R^k

using it for near-neighbor searching Given a metric space M = (X, d) and a subset S of X, preprocess it so that, given a query point q, we can return the point in S (approximately) closest to q.

using it for near-neighbor searching Query point q. Suppose I know a point x at distance d(q,x) = r. If I find a point y at distance ≤ r/2 from q, I have made progress. Allowed operations: we can sample points from “balls” around any point in the original point set S (i.e., sample uniformly from S ∩ Ball(point x, radius r)).

algorithm by picture Query point q. Suppose I know a point x at distance d(q,x) = r. Want to find a point y at distance ≤ r/2 from q.

algorithm by picture Query point q. Suppose I know a point x at distance d(q,x) = r. Want to find a point y at distance ≤ r/2 from q. Suppose we sample from B(x, 3r/2) to find such a point y. Bad case: most points in this ball lie close to x. Not possible: this would cause a high “growth rate”!

algorithm by picture |B(x, 3r/2)| ≤ |B(q, 5r/2)| ≤ |B(q, 4r)| ≤ 2^{3k} |B(q, r/2)| ⇒ probability of hitting a good point ≥ 1/2^{3k}. Near-neighbor algorithm: 1. Let x = closest point seen so far, with distance r = d(x,q). 2. Pick 2^{3k} samples from B(x, 5r/2). 3. If we see some point at distance ≤ r/2 from q, go to line 1; else output the closest point seen so far.
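The loop as a sketch: it assumes a sampling primitive sample_ball(S, dist, x, R, m) returning m uniform samples from S ∩ B(x, R). Here that primitive is simulated by brute force, which gives up exactly the efficiency it is meant to buy, and a single failed round makes the sketch stop rather than repeat to boost the success probability.

```python
import random

def sample_ball(S, dist, x, R, m):
    """Simulated sampling primitive: m uniform samples from S ∩ B(x, R)."""
    ball = [y for y in S if dist(x, y) <= R]     # always contains x itself
    return [random.choice(ball) for _ in range(m)]

def near_neighbor(S, dist, q, k, x0):
    """Sampling-based near-neighbor search in a metric of pointwise dimension k,
    starting from an arbitrary point x0 of S."""
    x, r = x0, dist(q, x0)
    while r > 0:
        for y in sample_ball(S, dist, x, 5 * r / 2, 2 ** (3 * k)):
            if dist(q, y) <= r / 2:              # progress: distance halved
                x, r = y, dist(q, y)
                break
        else:
            return x                             # w.h.p. near-optimal already
    return x
```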

pointwise dimension used widely Used to model “tractable/simple” networks by [Kamoun Kleinrock], [Kleinberg Tardos], [Plaxton Rajaraman Richa], [Karger Ruhl], [Beygelzimer Kakade Langford]…

but… Drawback: the definition is not closed under taking subsets. (A subset of a low-dimensional metric can be high-dimensional.) [figure: a high-dimensional subset of a 2-dimensional set]

and that’s not all Pointwise dimension is a somewhat restrictive notion (unclear if “real” metrics have low pointwise dimension) Would like something a bit more robust and general

new notion: doubling dimension The doubling dimension dim_D(M) is at most k if any set S of diameter D_S can be covered by 2^k sets of diameter ½·D_S.

doubling generalizes geometric dimension Take k-dimensional Euclidean space R^k. Claim: dim_D(R^k) = Θ(k). Easy to see for boxes; the argument for spheres is a bit more involved. [figure: 2^3 boxes cover a twice-larger box in R^3]

“doubling metrics” Dimension at most k if every set S with diameter D_S can be covered by 2^k sets of diameter ½·D_S. A family of metric spaces is called “doubling” if there exists a constant k such that the doubling dimension of every metric in the family is bounded by k.
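To estimate the dimension of a concrete finite metric, one can greedily cover balls by balls of half the radius; a brute-force sketch (my own), using the ball version of the definition, which agrees with the diameter version up to a constant factor in k:

```python
import math

def doubling_dimension_ub(points, dist):
    """log2 of the worst-case number of radius-R/2 balls that greedy covering
    uses for any ball B(x, R); an upper bound on the (ball-)doubling dimension."""
    worst = 1
    for x in points:
        for y in points:
            R = dist(x, y)
            if R == 0:
                continue
            ball = [z for z in points if dist(x, z) <= R]
            centers = []
            for z in ball:             # greedy: uncovered points become centers
                if all(dist(z, c) > R / 2 for c in centers):
                    centers.append(z)
            worst = max(worst, len(centers))
    return math.log2(worst)
```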

what is not a doubling metric? The equidistant metric U_t on t points has doubling dimension Ω(log t): any set of half the diameter contains at most one point, so covering U_t requires t sets. Hence low doubling dimension captures the fact that the metric does not contain large equidistant metrics.

doubling generalizes pointwise dim Fact: If a metric has pointwise dimension k, it has doubling dimension O(k)

the picture thus far…

rest of this talk Give two definitions of metric dimension: a) pointwise dimension b) doubling dimension Develop good algorithms for metrics with low pointwise/doubling dimension. Also get “better” dimension reduction for metrics with low doubling dimension

some useful properties of doubling dimension

small near-equidistant metrics (restated) Fact: Suppose a metric (X,d) has doubling dimension k. If a subset S ⊆ X has all inter-point distances lying between δ and Δ, then S contains at most (Δ/δ)^{O(k)} points. [figure: a 2-dimensional set with all distances between δ and Δ has ≈ (Δ/δ)² points]

the (simple) proof
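The counting behind the picture, spelled out (a standard argument; the constant in the exponent is loose):

```latex
\text{Halving } j \text{ times covers } S \text{ by } 2^{kj} \text{ sets of diameter } \Delta/2^{j}. \\
\text{Take } j = \lceil \log_2(\Delta/\delta) \rceil + 1: \text{ each set has diameter } < \delta, \\
\text{so it contains at most one point of } S \text{ (distances in } S \text{ are } \ge \delta\text{)}. \\
\text{Hence } |S| \le 2^{kj} \le 4^{k} (\Delta/\delta)^{k} = (\Delta/\delta)^{O(k)}.
```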

advantages of this fact Thm: Doubling metrics admit O(dim_D(M))-padded decompositions. Useful wherever padded decompositions are useful. E.g.: can prove that all doubling metrics embed into ℓ2 with distortion O(√(dim_D(M) · log n)).

btw, just to check Natural Q: Do all doubling metrics embed into ℓ2 with distortion O(1)? A: No. There are doubling metrics (e.g., the Laakso graphs) that require Ω(√(log n)) distortion, so the bound above is essentially tight.

a substantial generalization Many geometric algorithms can be extended to doubling spaces: near-neighbor search, compact routing, distance labeling, network triangulation, sensor placement, small-world networks, traveling salesman, sparse spanners, approximate inference, network design, clustering problems, well-separated pair decompositions, data structures, learnability.

example application Assign a label L(x) to each host x in a metric space so that, looking just at L(x) and L(y), we can infer the distance: f(L(x), L(y)) ≈ d(x,y). Result: labels of (O(1)/ε)^{dim} × log n bits give estimates within a (1 + ε) factor. Contrast with the lower bound of n-bit labels needed in general metrics for any factor < 2.

another example [Arora 95] showed that TSP on R^k is (1+ε)-approximable in polynomial time (for fixed k and ε). [Talwar 04] extended this result to metrics with doubling dimension k, in quasi-polynomial time.

example in action: sparse spanners for doubling metrics [Chan G. Maggs Zhou]

spanners Given a metric M = (V, d), a graph G = (V, E) is an (m, ε)-spanner if: 1) the number of edges in G is m; 2) d(x,y) ≤ d_G(x,y) ≤ (1 + ε)·d(x,y). A reasonable goal (?): ε = 0.1, m = O(n). Fact: For the equidistant metric U_n, if ε < 1 then G = K_n.

spanners for doubling metrics Theorem [Chan G. Maggs Zhou]: Given any metric M and any ε < ½, we can efficiently find an (m, ε)-spanner G with m = n·(1 + 1/ε)^{dim_D(M)}. Hence, for doubling metrics, linear-sized spanners! Independently proved by [Har-Peled Mendel], who give a better runtime. Generalizes a similar theorem for Euclidean metrics due to [Arya Das Narasimhan].

standard tool: nets A set of points N is an r-net of a set S if: d(u,v) ≥ r for any u, v ∈ N; and for every w ∈ S \ N, there is a u ∈ N with d(u,w) < r.

standard tool: nets A set of points N is an r-net of S if: d(u,v) ≥ r for any u, v ∈ N; and for every w ∈ S \ N, there is a u ∈ N with d(u,w) < r. Fact: If a metric has doubling dimension k and N is an r-net, then |B(x, 2r) ∩ N| ≤ O(1)^k.
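Greedy scanning produces an r-net of any finite set; a minimal sketch (function name mine), reused in the spanner sketch below:

```python
def r_net(S, dist, r):
    """Greedy r-net: net points are pairwise >= r apart, and every
    rejected point is within distance < r of some net point."""
    net = []
    for w in S:
        if all(dist(w, u) >= r for u in net):
            net.append(w)
    return net
```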

recursive nets Suppose all the points are at least unit distance apart, so you take a 2-net N_1 of these points. Now you can take a 4-net N_2 of this net. And so on… [figure: nets at scales 2, 4, 8, 16]

recursive nets N_0 = V; N_t is a 2^t-net of the set N_{t-1} ⇒ N_t is a 2^{t+1}-net of the set V (almost).

the spanner construction N_0 = V; N_t is a 2^t-net of the set N_{t-1} ⇒ N_t is (almost) a 2^{t+1}-net of V. At each level t, add an edge between every pair of net points u, v ∈ N_t with d(u,v) ≤ O(1/ε)·2^t; see the sketch below.
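Putting the pieces together, a sketch of the construction (assumptions mine: at least two points, inter-point distances at least 1, points hashable, and an illustrative edge threshold gamma = Θ(1/ε); the exact constant in the paper may differ). It reuses r_net from above.

```python
import math

def spanner(S, dist, eps):
    """Hierarchical-net spanner sketch: returns an edge set on S."""
    diam = max(dist(u, v) for u in S for v in S)
    levels = int(math.ceil(math.log2(diam))) + 1
    gamma = 4 + 16 / eps                         # illustrative Theta(1/eps)
    E, net = set(), list(S)                      # N_0 = V
    for t in range(levels + 1):
        if t > 0:
            net = r_net(net, dist, 2 ** t)       # N_t: a 2^t-net of N_{t-1}
        for i, u in enumerate(net):              # join nearby level-t net points
            for v in net[i + 1:]:
                if dist(u, v) <= gamma * 2 ** t:
                    E.add(frozenset((u, v)))
    return E
```

Since nets are nested, every parent link of the hierarchy (from a point in N_t to its covering point in N_{t+1}) is short relative to its level and gets added as a level-t edge; the stretch argument below relies on these links.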

the sparsity Each edge (u,v) added at level t has d(u,v) ≤ O(1/ε)·2^t, while points of N_t are ≥ 2^t apart, so by the packing fact each net point has at most (O(1/ε))^{O(dim_D(M))} level-t neighbors. Moreover, once the net scale exceeds d(u,v) the two endpoints cannot both survive, so each edge can be charged to an endpoint that leaves the net within O(log(1/ε)) levels. Totalling: m = n·(1 + 1/ε)^{O(dim_D(M))}.

the stretch To route between x and y, climb the parent links of the net hierarchy from both sides until the scale reaches ≈ ε·d(x,y); at that level the two ancestors are within O(1/ε) times the scale of each other, so they are joined by a direct edge. The parent links traversed have geometrically increasing lengths summing to O(ε)·d(x,y), so the path found has length at most (1 + O(ε))·d(x,y).

rest of this talk Give two definitions of metric dimension: a) pointwise dimension b) doubling dimension Develop good algorithms for metrics with low pointwise/doubling dimension. Also get “better” dimension reduction for metrics with low doubling dimension

want to improve on J-L Theorem [Johnson-Lindenstrauss 1984]: Given any n-point subset V of Euclidean space, one can map it into O(log n/ε²)-dimensional space while incurring distortion at most (1+ε). Want the map to depend on the metric space. (JL just computes a random projection.) Fact: we need a non-linear map.

back to dimensionality reduction To paraphrase the best currently-known results: If a Euclidean metric embeds into R^k for some dimension k with distortion O(1), then (a) the metric has doubling dimension O(k), and (b) we can find an embedding into R^{O(k)} with distortion O(log n).

new: “per-instance” bounds Theorem [Chan G. Talwar]: Any metric with doubling dimension k embeds into Euclidean space with T dimensions, for any T ∈ [Θ(k log log n), Θ(log n)], with distortion that decreases as T grows (reaching O(√(k · log n)) at T = Θ(log n)). Independently, [Abraham Bartal Neiman] obtained similar results.

special cases of interest Using O(log n) Euclidean dimensions: any metric embeds with distortion O(√(dim_D(M) · log n)). If the metric is doubling, this quantity is √(log n); in general it is never more than O(log n), which recovers the result from Lecture #2 that any metric embeds into Euclidean space with O(log n) distortion. For metrics that are already Euclidean, O(log n) dimensions give O(1) distortion: this is just the Johnson-Lindenstrauss lemma. Using only O(dim_D(M)) Euclidean dimensions: the distortion grows, again generalizing the previous result.

thank you!