Slide 1: Proximity algorithms for nearly-doubling spaces
Lee-Ad Gottlieb and Robert Krauthgamer, Weizmann Institute
Slide 2: Proximity problems
In an arbitrary metric space, some proximity problems are hard. For example, nearest neighbor search requires Θ(n) time in the worst case. The doubling dimension parameterizes these "bad" cases.
Slide 3: Doubling dimension
Definition: the ball B(x,r) is the set of all points within distance r of x. The doubling constant of a metric M is the minimum value λ such that every ball can be covered by λ balls of half the radius. First used by [Ass-83], and first used algorithmically by [Cla-97]. The doubling dimension is dim(M) = log₂ λ(M) [GKL-03]. A metric is doubling if its doubling dimension is constant.
Packing property of doubling spaces: a set with diameter D and minimum interpoint distance a contains at most (D/a)^O(log λ) points.
(In the slide's figure, the pictured ball is covered by λ ≤ 7 half-radius balls.)
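For a small finite metric given as a distance matrix, the doubling constant can be estimated directly from the definition. The sketch below is my own illustration (function names are hypothetical, and greedy covering only upper-bounds the optimal cover, so this is an estimate rather than the exact constant): it greedily covers each ball with half-radius balls and takes the worst case over centers and radii.

```python
def greedy_half_cover(dist, center, r):
    """Greedily cover the ball B(center, r) with balls of radius r/2.

    Returns the number of half-radius balls the greedy cover uses --
    an upper bound on the optimal cover size for this ball."""
    ball = [p for p in range(len(dist)) if dist[center][p] <= r]
    uncovered = set(ball)
    count = 0
    while uncovered:
        c = uncovered.pop()  # any still-uncovered point becomes a new center
        uncovered -= {p for p in ball if dist[c][p] <= r / 2}
        count += 1
    return count

def doubling_constant_upper(dist):
    """Estimate the doubling constant: the worst greedy cover size over all
    (center, radius) pairs, with radii drawn from the interpoint distances."""
    n = len(dist)
    radii = {dist[i][j] for i in range(n) for j in range(i + 1, n)}
    return max(greedy_half_cover(dist, c, r) for c in range(n) for r in radii)

# Four corners of the unit square under the L1 (Manhattan) metric.
pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
dist = [[abs(a - c) + abs(b - d) for (c, d) in pts] for (a, b) in pts]
print(doubling_constant_upper(dist))  # -> 3: the radius-1 ball around a corner needs 3 half-radius balls
```

The radius-1 ball around a corner contains three points at pairwise distance at least 1, so no half-radius (0.5) ball can cover two of them; that is where the estimate 3 comes from.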
Slide 4: Applications
In the past few years, many algorithmic tasks have been analyzed via the doubling dimension. For example, approximate nearest neighbor search can be executed in time λ^O(1) · log n. Other algorithms analyzed via the doubling dimension:
- Nearest neighbor search [KL-04, BKL-06, CG-06]
- Clustering [Tal-04, ABS-08, FM-10]
- Spanner construction [GGN-06, CG-06, DPP-06, GR-08]
- Routing [KSW-04, Sil-05, AGGM-06, KRXY-07, KRX-08]
- Traveling salesperson [Tal-04]
- Machine learning [BLL-09, GKK-10]
Message: this is an active line of research.
Slide 5: Problem
Most algorithms developed for doubling spaces are not robust: their guarantees do not hold for nearly-doubling spaces. If even a small fraction of the working set has high doubling dimension, algorithmic performance degrades. This motivates the following key task:
Given an n-point set S and a target dimension d*, remove from S the fewest points so that the remaining set has doubling dimension at most d*.
Slide 6: Two paradigms
How can removing a few "bad" points help? Two models:
1. Ignore the bad points.
- Outlier detection: [GHPT-05] cluster based on similarity, seeking a large subset with low intrinsic dimension.
- Algorithms with slack: throw the bad points into the slack. [KRXY-07] gave a routing algorithm with guarantees for most of the input points. [FM-10] gave a kinetic clustering algorithm for most of the input points. [GKK-10] gave a machine learning algorithm in which a small bad subset does not interfere with learning.
Slide 7: Two paradigms (continued)
2. Tailor a different algorithm for the bad points.
Example: spanner construction. A spanner is an edge subset of the full graph.
- Good points: low doubling dimension ⇒ a sparse spanner with nice properties (low stretch and degree).
- Bad points: take the full graph on them.
If the number of bad points is O(n^0.5), the full graph on them has only O(n) edges, so the combined spanner still has O(n) edges.
Slide 8: Results
Recall our key problem: given an n-point set S and a target dimension d*, remove from S the fewest points so that the remaining set has doubling dimension at most d*.
This problem is NP-hard; even determining the doubling dimension of a point set exactly is NP-hard (proof on the next slide). But the doubling dimension can be approximated within a constant factor.
Our contribution: a bicriteria approximation algorithm. In time 2^O(d*) · n³, we remove a number of points arbitrarily close to optimal, while achieving doubling dimension 4d* + O(1). We can also achieve near-linear runtime at the cost of a slightly higher dimension.
Slide 9: Warm-up
Lemma: it is NP-hard to determine the doubling dimension of a set S.
Reduction: from vertex cover on graphs with bounded degree Δ = n^(1/2); in such graphs the size of any vertex cover is at least n^(1/2).
Construction: a set S of n points corresponding to the vertex set V. Let d(u,v) = ½ if the corresponding vertices are connected by an edge, and d(u,v) = 1 if they are not.
Analysis: any subset of S contained in a ball of radius ½ has at most n^(1/2) points, by the degree bound of the original graph. S itself is a ball of radius 1, and the minimum covering of all of S by balls of radius ½ equals the minimum vertex cover of V.
Note: the reduction preserves hardness of approximation.
Corollary: it is NP-hard to determine whether removing k points from S can leave a set with doubling dimension d*, so our problem is hard as well.
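The reduction's two-distance metric is easy to materialize for a small graph. The toy sketch below is my own illustration, not part of the hardness argument (names are hypothetical, and the cover check is exponential-time brute force): it builds the metric and finds the minimum number of radius-½ balls needed to cover S.

```python
from itertools import combinations

def reduction_metric(n, edges):
    """Two-distance metric from a graph: d = 1/2 across an edge, d = 1 otherwise."""
    d = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        d[u][v] = d[v][u] = 0.5
    return d

def min_half_ball_cover(d):
    """Brute-force the fewest radius-1/2 balls (centered at points) covering all of S."""
    n = len(d)
    for k in range(1, n + 1):
        for centers in combinations(range(n), k):
            if all(any(d[c][p] <= 0.5 for c in centers) for p in range(n)):
                return k
    return n

d = reduction_metric(3, [(0, 1), (1, 2)])  # path on 3 vertices
print(min_half_ball_cover(d))              # -> 1
```

On the path 0–1–2, a single ½-ball around the middle vertex covers everything, matching the graph's minimum vertex cover {1}.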
Slide 10: Bicriteria algorithm
Recall that the doubling constant of a metric M is the minimum value λ such that every r-radius ball can be covered by λ balls of half the radius. Define the related notion of the density constant as the minimum value μ > 0 such that every r-radius ball contains at most μ points at mutual interpoint distance at least r/2.
Nice property: the density constant can only decrease under the removal of points, unlike the doubling constant.
We can show that:
- √μ(S) ≤ λ(S) ≤ μ(S)
- it is NP-hard to compute the density constant (via a ratio-preserving reduction from independent set).
(In the slide's figure, λ = 2 and μ = 3.)
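Although computing the density constant is NP-hard in general, for a toy point set it can be brute-forced directly from the definition. A sketch (my own naming; exponential-time, usable only for tiny inputs):

```python
from itertools import combinations

def density_constant(dist):
    """Brute-force the density constant mu: the largest number of points at
    mutual distance >= r/2 found inside any ball B(p, r).
    Exponential in the input size -- toy inputs only."""
    n = len(dist)
    radii = {dist[i][j] for i in range(n) for j in range(i + 1, n)}
    best = 1
    for p in range(n):
        for r in radii:
            ball = [q for q in range(n) if dist[p][q] <= r]
            for k in range(best + 1, len(ball) + 1):
                for sub in combinations(ball, k):
                    if all(dist[a][b] >= r / 2 for a, b in combinations(sub, 2)):
                        best = k
                        break
    return best

# Four corners of the unit square under the L1 (Manhattan) metric.
pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
dist = [[abs(a - c) + abs(b - d) for (c, d) in pts] for (a, b) in pts]
print(density_constant(dist))  # -> 4: a radius-2 ball holds all 4 points at mutual distance >= 1
```

One can check by hand that λ = 3 for this set, so the sandwich holds: √4 = 2 ≤ 3 ≤ 4.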
Slide 11: Bicriteria algorithm
We will give a bicriteria algorithm for the density constant.
Problem statement: given an n-point set S and a target density constant μ*, remove from S the fewest points so that the remaining set has density constant at most μ*.
A bicriteria algorithm for the density constant is itself a bicriteria algorithm for the doubling constant, within a quadratic factor.
Slide 12: Witness set
Given a set S, a subset S′ contained in some ball of radius r is a witness set for the density constant if all of its points are at interpoint distance at least r/2. Note that S′ is a concise proof that the density constant of S is at least |S′|.
Theorem: fix a value μ′ < μ(S). A witness set of S of size at least √μ′ can be found in time 2^O(μ*) · n³.
Proof outline: for each point p and radius r, consider the r-ball of p. Greedily cover all points in the r-ball with disjoint balls of radius r/2, then cover all points in each r/2-ball with disjoint balls of radius r/4. Since S contains a witness set of size μ(S), there exist some p and r so that either there are √μ(S) many r/2-balls, whose centers form a witness set, or one r/2-ball covers √μ(S) many r/4-balls, whose centers form a witness set.
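The first level of this greedy procedure already extracts witness sets in a toy setting: greedily chosen half-radius centers inside a ball are pairwise at distance at least r/2. The sketch below (names mine; single-level only, so it illustrates the first case of the proof rather than the full two-level subroutine) scans all center/radius pairs and keeps the best witness found.

```python
def witness_set(dist, p, r):
    """Greedy (r/2)-separated subset of the ball B(p, r): scan the ball and keep
    any point at distance >= r/2 from everything kept so far.  The result is a
    witness set -- it certifies that the density constant is >= its size."""
    kept = []
    for q in range(len(dist)):
        if dist[p][q] <= r and all(dist[q][s] >= r / 2 for s in kept):
            kept.append(q)
    return kept

def largest_witness(dist):
    """Best greedy witness set over all centers and interpoint-distance radii."""
    n = len(dist)
    radii = {dist[i][j] for i in range(n) for j in range(i + 1, n)}
    return max((witness_set(dist, p, r) for p in range(n) for r in radii), key=len)

# Four corners of the unit square under the L1 (Manhattan) metric.
pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
dist = [[abs(a - c) + abs(b - d) for (c, d) in pts] for (a, b) in pts]
print(largest_witness(dist))  # all four points: pairwise distances >= 1 inside a radius-2 ball
```

Here the greedy scan finds all four points at mutual distance at least 1 inside a radius-2 ball, certifying density constant at least 4 for this set.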
Slide 13: Bicriteria algorithm
Recall our problem: given an n-point set S and a target density constant μ*, remove from S the fewest points so that the remaining set has density constant at most μ*.
Our bicriteria solution: let k be the true optimum (the minimum number of points that must be removed). We remove at most k·c/(c−1) points, and the remaining set has density constant at most c²μ*².
Slide 14: Bicriteria algorithm
Algorithm: run the subroutine to identify a witness set of size at least c·μ*, remove it, and repeat until no such witness is found.
Analysis: the density constant of the resulting set is at most c²μ*², since we terminated without finding a witness set of size at least c·μ*. Every time our algorithm removes a witness set of size w > c·μ*, the optimal algorithm must remove at least w − μ* of its points, or else the true solution would have density constant greater than μ*. It follows that our algorithm removes at most k·w/(w − μ*) < k·c/(c−1) points.
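The removal loop can be sketched end to end on a toy "nearly-doubling" input. This is a simplified stand-in, not the paper's algorithm: the subroutine below is a single-level greedy scan rather than the 2^O(μ*)·n³ procedure, and all names and the example metric are my own.

```python
def witness_set(dist, p, r):
    """Greedy (r/2)-separated subset of the ball B(p, r)."""
    kept = []
    for q in range(len(dist)):
        if dist[p][q] <= r and all(dist[q][s] >= r / 2 for s in kept):
            kept.append(q)
    return kept

def largest_witness(dist):
    """Best greedy witness set over all centers and interpoint-distance radii."""
    n = len(dist)
    radii = {dist[i][j] for i in range(n) for j in range(i + 1, n)}
    return max((witness_set(dist, p, r) for p in range(n) for r in radii), key=len)

def bicriteria_remove(dist, mu_star, c=2):
    """Toy version of the removal loop: while the subroutine finds a witness set
    of size >= c * mu_star, delete it.  When none remains, the slide's argument
    bounds the surviving set's density constant by (c * mu_star) ** 2."""
    alive = list(range(len(dist)))
    removed = []
    while len(alive) >= 2:
        sub = [[dist[i][j] for j in alive] for i in alive]
        w = largest_witness(sub)
        if len(w) < c * mu_star:
            break
        removed += [alive[i] for i in w]
        alive = [alive[i] for i in range(len(alive)) if i not in set(w)]
    return alive, removed

def two_cluster_metric():
    """3 'good' points and 6 'bad' points, each cluster uniform at distance 1,
    clusters 100 apart -- the bad clique alone forces a large density constant."""
    n = 9
    return [[0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 100.0)
             for j in range(n)] for i in range(n)]

alive, removed = bicriteria_remove(two_cluster_metric(), mu_star=2, c=2)
print(alive, sorted(removed))  # -> [0, 1, 2] [3, 4, 5, 6, 7, 8]
```

With target μ* = 2 and c = 2, the loop deletes the six-point uniform clique (a witness of size 6 ≥ c·μ* = 4) and then stops, since the surviving three points admit no witness of size 4.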
Slide 15: Conclusion
We conclude that there exists a bicriteria algorithm for the density constant: we remove at most k·c/(c−1) points, and the remaining set has density constant at most c²μ*². It follows that there exists a bicriteria algorithm for the doubling constant: we remove at most k·c/(c−1) points, and the remaining set has doubling constant at most c⁴λ*⁴.