k-center Clustering under Perturbation Resilience

k-center Clustering under Perturbation Resilience. Colin White. Joint work with Nina Balcan and Nika Haghtalab.

Clustering is everywhere. Given a set of elements with pairwise distances, partition them into k clusters so that distances within each cluster are minimized. Objective functions: k-means, k-median, k-center.

k-center Clustering. Choose fire stations to minimize the maximum travel time to any site. For a set S of n points and a distance metric d: choose k centers from S and assign each point to its closest center. Goal: minimize the maximum radius, i.e. the largest distance between any point and its assigned center.
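
The objective above can be written down in a few lines. This is a minimal sketch, assuming the instance is given as a full distance matrix `d` where `d[c][p]` is the distance from `c` to `p`; the function name `k_center_cost` and the toy instance are illustrative and not from the talk.

```python
from typing import Sequence


def k_center_cost(d: Sequence[Sequence[float]], centers: Sequence[int]) -> float:
    """Largest distance from any point to its nearest chosen center."""
    n = len(d)
    return max(min(d[c][p] for c in centers) for p in range(n))


# Toy instance (assumption): four points on a line at 0, 1, 5, 6.
coords = [0, 1, 5, 6]
d = [[abs(a - b) for b in coords] for a in coords]

cost_spread = k_center_cost(d, [0, 2])   # one center per natural group
cost_clumped = k_center_cost(d, [0, 1])  # both centers in the left group
```

Measuring `d[c][p]` from center to point means the same function also covers the asymmetric variant.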

Asymmetric k-center (AkC). Relax the condition that d is symmetric: for example, a trip may take 10 min in one direction and 30 min in the other. The directed triangle inequality still holds. Minimize the distance from centers to points (order matters now).

Known approximation results. The optimal solution cannot be found efficiently unless P = NP. [G 1985]: 2-approximation algorithm for symmetric k-center. No efficient (2−ε)-approximation algorithm exists unless P = NP. [V 1996]: O(log* n)-approximation for AkC. [C et al. 2005]: matching lower bound. AkC is the first natural problem to have a tight approximation factor that is not constant.
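
One standard way to obtain a 2-approximation for symmetric k-center is greedy farthest-first traversal: repeatedly add the point farthest from the centers chosen so far. The sketch below assumes a symmetric distance matrix `d`; the name `farthest_first` and the `start` parameter are illustrative choices, not taken from the slides.

```python
def farthest_first(d, k, start=0):
    """Greedy farthest-first traversal; returns (centers, radius)."""
    n = len(d)
    centers = [start]
    # nearest[p] = distance from p to its closest center chosen so far
    nearest = list(d[start])
    while len(centers) < k:
        nxt = max(range(n), key=lambda p: nearest[p])
        centers.append(nxt)
        nearest = [min(nearest[p], d[nxt][p]) for p in range(n)]
    return centers, max(nearest)


# Toy instance (assumption): four points on a line at 0, 1, 5, 6.
coords = [0, 1, 5, 6]
d = [[abs(a - b) for b in coords] for a in coords]
centers, radius = farthest_first(d, k=2)
```

On this toy instance the greedy radius happens to equal the optimal radius; in general it is only guaranteed to be at most twice the optimum.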

Outline. (1) Define k-center and asymmetric k-center. (2) Previous work: 2-approximation algorithm for symmetric k-center. (3) Define perturbation resilience. (4) Results: symmetric k-center; asymmetric k-center; hardness of (2−ε)-perturbation resilience; robust versions of perturbation resilience.

Beyond the worst case. An O(log* n) guarantee is not desirable in practice. The NP-hard instances are often contrived and particular, so theory does not always match up with practice. Perturbation resilience: small changes do not affect the optimal clustering, which makes the solution more meaningful.

Perturbation Resilience. Given a clustering instance (S, d), an α-perturbation is an instance (S, d′) for a function d′ such that ∀ p, q ∈ S: d(p, q) ≤ d′(p, q) ≤ α·d(p, q). Bilu & Linial ’12: a clustering instance (S, d) is α-perturbation resilient if, for any α-perturbation (S, d′), the optimal clustering is the same as for (S, d). It is fine for the centers to change, but not the partition.
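
To make the definition concrete, the sketch below checks whether a candidate distance function is an α-perturbation, assuming the standard Bilu-Linial condition d(p, q) ≤ d′(p, q) ≤ α·d(p, q) for all pairs. The function name and the tolerance parameter `tol` are illustrative assumptions. Verifying resilience itself would require reasoning about every possible perturbation, which this check does not attempt.

```python
def is_alpha_perturbation(d, d_prime, alpha, tol=1e-12):
    """True if d <= d_prime <= alpha * d holds entrywise."""
    n = len(d)
    return all(
        d[p][q] - tol <= d_prime[p][q] <= alpha * d[p][q] + tol
        for p in range(n)
        for q in range(n)
    )


d = [[0, 1], [1, 0]]
stretched = [[0, 1.5], [1.5, 0]]   # within a factor of 2
too_far = [[0, 2.5], [2.5, 0]]     # exceeds a factor of 2
shrunk = [[0, 0.5], [0.5, 0]]      # distances may not decrease
```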

Perturbation Resilience. Bilu & Linial ’12: a clustering instance (S, d) is α-perturbation resilient if, for any α-perturbation (S, d′), the optimal clustering is the same as for (S, d). There is more structure as α increases. Given a clustering instance, we are promised that it satisfies α-perturbation resilience. Can we find an exact algorithm in polynomial time? How small can we make α?

Prior Work. [B L 2009]: exact algorithm for max cut under -PR. [A B S 2010]: exact algorithm for center-based clustering under 3-PR. [B L 2011]: exact algorithm for center-based clustering under -PR. [M M V 2014]: exact algorithm for min multiway cut under 4-PR and max cut under -PR. [M M 2016]: exact algorithm for center-based clustering under 2-PR. Our results: exact algorithm for symmetric and asymmetric k-center under 2-PR; no exact algorithm under (2−ε)-PR unless NP = RP.

Symmetric k-center under 2-PR. Theorem: any 2-approximation algorithm returns the optimal solution if the instance satisfies 2-PR. Proof: take a 2-PR instance with optimal radius r* and a 2-approximate solution C. Construct a 2-perturbation d′: multiply all distances by 2, then decrease the distances used by C (each point to its center in C) down to 2r*. Then C is optimal under d′: C has cost ≤ 2r*, and everything else has cost ≥ 2r*. By 2-PR, C is also optimal in the original instance.
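
The perturbation in the proof can be sketched as follows. This is one reading of the construction, under the assumption that "distances in C" means the distance between each point and its nearest center in C; the function name `proof_perturbation` and the toy instance are illustrative. The result need not satisfy the triangle inequality, and the sketch only shows that d′ is a valid 2-perturbation under which C costs at most 2r*.

```python
def proof_perturbation(d, centers, r_star):
    """Double all distances, then cap point-to-own-center distances at 2*r_star."""
    n = len(d)
    d_prime = [[2 * d[p][q] for q in range(n)] for p in range(n)]
    for p in range(n):
        c = min(centers, key=lambda c: d[c][p])  # center serving p in C
        # d[c][p] <= 2*r_star because C is a 2-approximation,
        # so the capped value never drops below the original distance.
        capped = min(2 * d[c][p], 2 * r_star)
        d_prime[c][p] = d_prime[p][c] = capped
    return d_prime


# Toy instance (assumption): points on a line; the optimum uses centers at 1 and 11.
coords = [0, 1, 2, 10, 11, 12]
d = [[abs(a - b) for b in coords] for a in coords]
r_star = 1
centers = [0, 5]  # a 2-approximate solution with radius 2 = 2 * r_star
d_prime = proof_perturbation(d, centers, r_star)

n = len(d)
valid = all(d[p][q] <= d_prime[p][q] <= 2 * d[p][q] for p in range(n) for q in range(n))
cost_under_d_prime = max(min(d_prime[c][p] for c in centers) for p in range(n))
```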

Asymmetric k-center under 2-PR. Theorem: there is a polynomial-time algorithm for AkC under 2-PR. Idea: "bad" points, which are close to another point in one direction but far from it in the other (for example 100r*), are hard to deal with. Can we find a subset of points that behave "symmetrically"? Throw out the bad points: p is "symmetric" if, for all q, d(q, p) ≤ r* implies d(p, q) ≤ r* as well. Let A be the set of symmetric points. But how do we know A is nonempty?
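
The symmetry test translates directly into code. The sketch below assumes an asymmetric distance matrix `d` with `d[q][p]` meaning the distance from q to p, and assumes r* is known; the name `symmetric_points` and the three-point instance are illustrative.

```python
def symmetric_points(d, r_star):
    """Points p such that d[q][p] <= r_star implies d[p][q] <= r_star for every q."""
    n = len(d)
    return [
        p
        for p in range(n)
        if all(d[p][q] <= r_star for q in range(n) if d[q][p] <= r_star)
    ]


# Toy asymmetric instance (assumption): point 2 is easy to reach but far from everything.
d = [
    [0, 1, 1],
    [1, 0, 1],
    [100, 100, 0],
]
A = symmetric_points(d, r_star=1)
```

Point 2 is dropped because points 0 and 1 reach it within r*, while it needs distance 100 to reach them.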

Facts about Set A. Fact 1: all centers are in A. Fact 2: each point outside A belongs to the same cluster as its closest point in A. With these facts, it suffices to cluster A only: the best clustering over the whole instance follows from the best clustering over A.
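
Fact 2 suggests a simple way to lift a clustering of A to the whole instance. The sketch below assumes "closest point in A" is measured by d(q, p) with q in A, and that a clustering of A is already available as a dictionary `labels_A`; both the function name and the toy matrix are illustrative.

```python
def extend_from_A(d, labels_A):
    """Give every point outside A the label of the q in A minimizing d[q][p]."""
    labels = dict(labels_A)
    for p in range(len(d)):
        if p not in labels_A:
            q = min(labels_A, key=lambda q: d[q][p])
            labels[p] = labels_A[q]
    return labels


# Toy instance (assumption): points 0 and 1 form A, points 2 and 3 lie outside A.
d = [
    [0, 9, 1, 7],
    [9, 0, 5, 2],
    [50, 50, 0, 50],
    [50, 50, 50, 0],
]
labels = extend_from_A(d, {0: "x", 1: "y"})
```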

Two Useful Properties. Notation: optimal clusters: centers: Property 1: For all and , Property 2: For all and , ,

Key Observation. Notation: G_c is the ball of radius r* around a point c. Margin: each point of G_c is closer to c than points outside G_c. and it satisfies margin! Non-center G_c: no margin unless

Algorithm. (1) Create the set A. (2) For all c ∈ A, construct G_c (the ball of radius r* around c). (3) If G_c does not satisfy margin, delete it. (4) If delete . (5) Add each point p outside A to the same set as the point q ∈ A with smallest d(q, p). (6) Return all remaining sets.

Proof Idea. After step 4, we have clustered the set A: the G_c of centers are not deleted in step 3. Non-center G_c are deleted in step 3 unless . All other non-center G_c are deleted in step 4. All points outside of A are then added to the correct clusters. Key Observation: and they satisfy margin. Non-center G_c: no margin unless

Lower Bounds. There is no polynomial-time algorithm for symmetric k-center under (2−ε)-perturbation resilience, unless NP = RP. Hardness: parsimonious reduction from dominating set. Along the α axis: from α = 1 to α = 1.99 there is no structure; at α = 2 there is an efficient algorithm; at α = 3 the clusters are very far apart.

Robust Stability Conditions. α-perturbation resilience: the optimal clustering does not change under α-perturbations. Robust version, (α, ε)-perturbation resilience [B L ’12]: for each α-perturbation, the optimal clustering changes by at most εn points.
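
One common way to count how many points a clustering "changes by" is to match the clusters of the two clusterings as well as possible and count the points left unmatched. The sketch below assumes both clusterings have the same number of clusters and tries every matching by brute force, so it is only meant for small k; the name `points_changed` is illustrative.

```python
from itertools import permutations


def points_changed(clust_a, clust_b):
    """Minimum number of points that differ, over all matchings of clusters."""
    k = len(clust_a)
    sets_a = [set(c) for c in clust_a]
    sets_b = [set(c) for c in clust_b]
    n = sum(len(s) for s in sets_a)
    best_overlap = max(
        sum(len(sets_a[i] & sets_b[perm[i]]) for i in range(k))
        for perm in permutations(range(k))
    )
    return n - best_overlap


before = [[0, 1, 2], [3, 4, 5]]
after = [[3, 4], [0, 1, 2, 5]]  # point 5 moved to the other cluster
moved = points_changed(before, after)
```

With n = 6 and ε = 0.2, the one moved point is within the εn = 1.2 budget.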

k-center under (3, ε)-PR. We need an Ω(εn) lower bound on the optimal cluster sizes. Single linkage returns the optimal clustering under (3, ε)-PR. Idea: if two points p, q from different clusters are close, then p can become the center for both clusters under a perturbation; p and a dummy center p′ can then replace c_i and c_j as optimal centers.
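
For reference, plain single linkage can be sketched with a union-find structure: merge the closest pair of components until k components remain. This is the textbook procedure under the assumption of a symmetric distance matrix; the exact variant analyzed in the talk may differ, and the name `single_linkage` is illustrative.

```python
def single_linkage(d, k):
    """Merge closest components (Kruskal-style) until exactly k remain."""
    n = len(d)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((d[p][q], p, q) for p in range(n) for q in range(p + 1, n))
    components = n
    for _, p, q in edges:
        if components == k:
            break
        rp, rq = find(p), find(q)
        if rp != rq:
            parent[rp] = rq
            components -= 1

    clusters = {}
    for p in range(n):
        clusters.setdefault(find(p), []).append(p)
    return sorted(clusters.values())


# Toy instance (assumption): four points on a line at 0, 1, 5, 6.
coords = [0, 1, 5, 6]
d = [[abs(a - b) for b in coords] for a in coords]
clusters = single_linkage(d, k=2)
```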

Conclusion. Polynomial-time algorithm for k-center and AkC under 2-PR, which is tight. Theoretical significance: this is the first time a problem with no constant-factor approximation has an exact algorithm when assuming just constant stability; these are the first tight results in this area; symmetric and asymmetric k-center become equally difficult. Practical significance: there is only a small window of values for which perturbation resilience is interesting.

Open Questions. Can we go below α = 2 for k-median and k-means? Can we apply the symmetrizing technique to other problems? Thanks!