Improved approximation for k-median Shi Li Department of Computer Science Princeton University Princeton, NJ, 08540 04/20/2013.

Presentation transcript:

Improved approximation for k-median. Shi Li, Department of Computer Science, Princeton University, Princeton, NJ 08540. 04/20/2013

[Figure: facility location example with maintenance costs ($100, $130) and transportation costs ($10, $20, $50, $30); minimize maintenance cost + transportation cost.]

Facility Location Problem
- BALINSKI, M. L. On finding integer solutions to linear programs. In Proceedings of the IBM Scientific Computing Symposium on Combinatorial Problems. IBM, New York, pp. 225–248.
- KUEHN, A. A., AND HAMBURGER, M. J. A heuristic program for locating warehouses.
- STOLLSTEIMER, J. F. The effect of technical change and output expansion on the optimum number, size and location of pear marketing facilities in a California pear producing region. Ph.D. thesis, Univ. California at Berkeley, Berkeley, Calif.
- STOLLSTEIMER, J. F. A working model for plant numbers and locations. J. Farm Econom. 45, 631–645.

Uncapacitated Facility Location (UFL)
F: potential facility locations
C: set of clients
f_i, i ∈ F: cost for opening i
d: metric over F ∪ C
find S ⊆ F, minimize facility cost + connection cost: Σ_{i ∈ S} f_i + Σ_{j ∈ C} d(j, S), where d(j, S) = min_{i ∈ S} d(j, i)
[Figure: facilities and clients, labeled $30, $100, $20, $100.]

Wal-mart Stores in New Jersey
Question: Suppose you have a budget for 50 stores. How will you select the 50 locations?

k-median
F: potential facility locations
C: set of clients
d: metric over F ∪ C
k: number of facilities to open
find S ⊆ F with |S| = k, minimize Σ_{j ∈ C} d(j, S)
Compared with UFL, the opening costs f_i, i ∈ F, are replaced by the constraint |S| = k.
[Figure: facilities and clients.]
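
To make the two objectives concrete, here is a minimal sketch that evaluates them on a toy instance. The instance (points on a line), the opening costs, and the function names are illustrative assumptions, not part of the talk, and the brute-force search only works for tiny inputs.

```python
from itertools import combinations

def connection_cost(dist, clients, open_facilities):
    """Sum over clients of the distance to the nearest open facility."""
    return sum(min(dist[j][i] for i in open_facilities) for j in clients)

def ufl_cost(dist, clients, open_facilities, opening_cost):
    """UFL objective: facility cost + connection cost."""
    facility_cost = sum(opening_cost[i] for i in open_facilities)
    return facility_cost + connection_cost(dist, clients, open_facilities)

def brute_force_k_median(dist, clients, facilities, k):
    """Try every set of exactly k facilities (exponential; toy sizes only)."""
    best = min(combinations(sorted(facilities), k),
               key=lambda S: connection_cost(dist, clients, S))
    return set(best), connection_cost(dist, clients, best)

# Hypothetical instance: facilities and clients are points on a line.
facility_pos = {"a": 0, "b": 5, "c": 10}
client_pos = {"u": 0, "v": 1, "w": 9, "x": 10}
dist = {j: {i: abs(pj - pi) for i, pi in facility_pos.items()}
        for j, pj in client_pos.items()}
opening_cost = {"a": 3, "b": 1, "c": 4}  # assumed values, used by UFL only

S, cost = brute_force_k_median(dist, client_pos, facility_pos, k=2)
```

On this instance the best pair is the two outer facilities; the same set can be passed to `ufl_cost` to see how opening costs change the objective.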

k-median clustering

Known Results: UFL
- O(log n)-approximation [Hoc82]
- constant approximations: 3.16 [STA98], 2.41 [GK99], 3 [JV99], [CG99], [CG99], 5+ε [Kor00], [MMSV01], [CS03], 1.61 [JMS02], [Svi02], 1.52 [MYZ02], 1.50 [Byr07], [Li11]
- hardness of approximation: [GK98]

4 Deterministic rounding of linear programs
  4.5 The uncapacitated facility location problem
5 Random sampling and randomized rounding of linear programs
  5.8 The uncapacitated facility location problem
7 The primal-dual method
  7.6 The uncapacitated facility location problem
9 Further uses of greedy and local search algorithms
  9.1 A local search algorithm for the uncapacitated facility location problem
  9.4 A greedy algorithm for the uncapacitated facility location problem
12 Further uses of random sampling and randomized rounding of linear programs
  12.1 The uncapacitated facility location problem

Known results: k-median
- pseudo-approximation
  - 1-approx. with O(k log n) facilities [Hoc82]
  - 2(1+ε)-approx. with (1+1/ε)k facilities [LV92]
- super-constant approximation
  - O(log n loglog n) [Bar96, Bar98]
  - O(log k loglog k) [CCGS98]

Known Results: k-median
- constant approximation (LP rounding, primal-dual, local search): [CGTS99], 6 [JV99], 4 [CG99], 4 [JMS03], 3.25 [CL12], 3+ε [AGK+01], 1+√3+ε [LS13]
- (1+2/e)-hardness of approximation [JMS03]

Lloyd Algorithm [Lloyd82]
- k-means clustering: minimize total squared distances
- k-means vs k-median
  - clustering: k-means is more often used
  - Walmart example: k-median is more appropriate
  - approximation: k-median is "easier"

Local Search
- Can we improve the solution by p swaps?
  - No: stop
  - Yes: swap and repeat
- Approximation:
  - k-median: 3+2/p [AGK+01]
  - k-means: (3+2/p)^2 [KMN+02]
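
The swap loop above can be sketched as follows for p = 1 (single swaps). This is an illustrative sketch, not the [AGK+01] procedure itself: it accepts any strictly improving swap, whereas a polynomial running time usually requires accepting only swaps that improve the cost by a sufficient factor. The toy instance and all names are assumptions.

```python
def connection_cost(dist, clients, open_facilities):
    """Sum over clients of the distance to the nearest open facility."""
    return sum(min(dist[j][i] for i in open_facilities) for j in clients)

def find_improving_swap(dist, clients, facilities, current, cost):
    """Return (new_set, new_cost) for the first improving single swap, or None."""
    for out in sorted(current):
        for inn in sorted(set(facilities) - current):
            candidate = (current - {out}) | {inn}
            c = connection_cost(dist, clients, candidate)
            if c < cost:
                return candidate, c
    return None

def local_search_k_median(dist, clients, facilities, k):
    """Start from an arbitrary set of k facilities; swap while it helps."""
    current = set(sorted(facilities)[:k])
    cost = connection_cost(dist, clients, current)
    while True:
        step = find_improving_swap(dist, clients, facilities, current, cost)
        if step is None:          # "No: stop"
            return current, cost
        current, cost = step      # "Yes: swap and repeat"

# Hypothetical instance: points on a line.
facility_pos = {"a": 0, "b": 5, "c": 10}
client_pos = {"u": 0, "v": 1, "w": 9, "x": 10}
dist = {j: {i: abs(pj - pi) for i, pi in facility_pos.items()}
        for j, pj in client_pos.items()}

solution, solution_cost = local_search_k_median(dist, client_pos, facility_pos, k=2)
```

The loop terminates because the cost strictly decreases at every accepted swap and there are finitely many sets of size k.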

LP for k-median
y_i: whether to open i
x_{i,j}: whether to connect j to i
constraints:
- open at most k facilities
- client j must be connected
- client j can only be connected to an open facility
integrality gap is at least 2
integrality gap is at most 3 (proof non-constructive)
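
The three constraints described above correspond to the standard LP relaxation of k-median. The formulas on the slide did not survive in the transcript, so the block below is a reconstruction from the verbal description, using the variables y_i and x_{i,j} defined above and F, C, d, k from the problem definition.

```latex
\begin{align*}
\min \quad & \sum_{i \in F} \sum_{j \in C} d(i,j)\, x_{i,j} \\
\text{s.t.} \quad
  & \sum_{i \in F} y_i \le k
    && \text{(open at most $k$ facilities)} \\
  & \sum_{i \in F} x_{i,j} = 1 \quad \forall j \in C
    && \text{(client $j$ must be connected)} \\
  & x_{i,j} \le y_i \quad \forall i \in F,\ j \in C
    && \text{(connect only to an open facility)} \\
  & x_{i,j} \ge 0,\ y_i \ge 0 \quad \forall i \in F,\ j \in C
\end{align*}
```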

(1+√3+ε)-approximation on k-median

k-median and UFL
f = cost of a facility
as f increases, the number of open facilities decreases
Given a black-box α-approximation A for UFL
Naïve try: find an f such that A opens k facilities
α-approximation for k-median? No. Proof: the α achievable for UFL is smaller than the lower bound on α for k-median.

k-median and UFL
Naïve try: find an f such that A opens k facilities
2 issues with naïve try:
1. need LMP α-approximation for UFL (a plain α-approximation is not enough)
LMP = Lagrangian Multiplier Preserving
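
The two definitions compared on this slide were lost in the transcript. The block below gives the usual statements as a sketch, for UFL with uniform facility cost f. The notation conn(S) for the connection cost of S, OPT_f for the optimal UFL cost, and OPT_k for the optimal k-median cost is an assumption introduced here.

```latex
% alpha-approximation for UFL with uniform facility cost f:
\[
  f\,|S| + \mathrm{conn}(S) \;\le\; \alpha \cdot \mathrm{OPT}_f
\]
% LMP alpha-approximation: the facility cost is also scaled by alpha:
\[
  \alpha f\,|S| + \mathrm{conn}(S) \;\le\; \alpha \cdot \mathrm{OPT}_f
\]
% Why LMP is what the naive try needs: if |S| = k, then
% OPT_f <= f k + OPT_k, since an optimal k-median solution is feasible for UFL, so
\[
  \mathrm{conn}(S) \;\le\; \alpha\,(f k + \mathrm{OPT}_k) - \alpha f k
  \;=\; \alpha \cdot \mathrm{OPT}_k .
\]
```

With a plain α-approximation the same calculation leaves an extra term (α − 1) f k, which cannot be bounded in terms of OPT_k.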

k-median and UFL
Naïve try: find an f such that A opens k facilities
2 issues with naïve try:
1. need LMP α-approximation for UFL
2. cannot find f s.t. A opens exactly k facilities
S_1: set of k_1 < k facilities
S_2: set of k_2 > k facilities
S_1 and S_2 together form a bi-point solution

k-median and UFL
2 issues with naïve try:
1. need LMP α-approximation for UFL
2. cannot find f s.t. A opens exactly k facilities
LMP approx. factor: [JV] 3, [JMS] 2, our result 2 (do not know how to improve)
bi-point → integral: [JV] ×2, [JMS] ×2 (this factor of 2 is tight!!)
final ratio for k-median: [JV] 6, [JMS] 4

bi-point solution
k_1 = |S_1| < k ≤ |S_2| = k_2
a, b: a k_1 + b k_2 = k, a + b = 1
bi-point solution: aS_1 + bS_2
cost(aS_1 + bS_2) = a cost(S_1) + b cost(S_2)
[Figure: S_1 and S_2.]
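
Solving the two equations for a and b gives b = (k − k_1)/(k_2 − k_1) and a = 1 − b. The following minimal sketch computes the weights and the bi-point cost with exact fractions; the function names and the sample numbers are illustrative assumptions.

```python
from fractions import Fraction

def bipoint_weights(k1, k2, k):
    """Return (a, b) with a + b = 1 and a*k1 + b*k2 = k."""
    if not (k1 < k <= k2):
        raise ValueError("need k1 < k <= k2")
    b = Fraction(k - k1, k2 - k1)
    a = 1 - b
    return a, b

def bipoint_cost(cost1, cost2, k1, k2, k):
    """cost(a*S1 + b*S2) = a*cost(S1) + b*cost(S2)."""
    a, b = bipoint_weights(k1, k2, k)
    return a * cost1 + b * cost2

# Hypothetical numbers: |S1| = 2, |S2| = 6, target k = 3.
a, b = bipoint_weights(2, 6, 3)
value = bipoint_cost(10, 2, 2, 6, 3)
```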

gap-2 instance
k_1 = 1, k_2 = k+1
cost(S_1) = k+1, cost(S_2) = 0
cost of integral solution = 2
[Figure: S_1 and S_2, with labels 1, 0 and k+1.]
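
Plugging these numbers into the bi-point definitions (a + b = 1, a k_1 + b k_2 = k) shows where the factor 2 comes from. This is a worked calculation from the values on the slide, not a formula taken from it.

```latex
\begin{align*}
  a \cdot 1 + (1 - a)(k + 1) = k
    \;&\Longrightarrow\; a = \tfrac{1}{k}, \quad b = 1 - \tfrac{1}{k}, \\
  \mathrm{cost}(aS_1 + bS_2)
    &= \tfrac{1}{k}\,(k + 1) + \bigl(1 - \tfrac{1}{k}\bigr) \cdot 0
     = 1 + \tfrac{1}{k}, \\
  \frac{\text{cost of integral solution}}{\text{cost of bi-point solution}}
    &= \frac{2}{1 + 1/k} = \frac{2k}{k + 1} \;\longrightarrow\; 2
    \quad (k \to \infty).
\end{align*}
```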

k-median and UFL
Main Lemma 1: it suffices to give an α-approximate solution with k+O(1) facilities
Main Lemma 2: bi-point solution of cost C → solution of cost (1+√3+ε)/2 · C with k+O(1/ε) facilities
LMP approx. factor: [JV] 3, [JMS] 2, our result 2
bi-point → integral: [JV] ×2, [JMS] ×2 (this factor of 2 is tight!!); our result: bi-point → pseudo-integral
final ratio for k-median: [JV] 6, [JMS] 4, our result 1+√3+ε

Main Lemma 1
A: black-box α-approximation with k+c open facilities
A': (α+ε)-approximation with k open facilities
A' calls A n^{O(c/ε)} times.
bad instance: with k+1 open facilities, cost = 0; with k open facilities, cost huge

Dense Facility
B_i: set of clients in a small ball around i
i is A-dense if the connection cost of B_i in OPT is ≥ A
this instance: i is A-dense for A ≈ opt
[Figure: facility i and the ball B_i.]

Dense Facility
Reduction component
- works directly if there are no opt/t-dense facilities, t = O(c/ε)
- can reduce to such an instance in n^{O(t)} time
[Figure: facility i and the ball B_i.]

Lemma 1 from [ABS]
[Awasthi-Blum-Sheffet]: ε, δ > 0 constants, OPT_{k-1} ≥ (1+δ) OPT_k ⇒ can find (1+ε)-approximation
- k-median clustering is easy in practice
- reason: there is a "meaningful" clustering
Main Lemma 1: it suffices to give an α-approximate solution with k+O(1) facilities

A: α-approximation algorithm for k-median with k+c medians
[ABS]: OPT_{k-1} ≥ (1+δ) OPT_k ⇒ (1+ε)-approximation
Algorithm
- Apply A to (k-c, F, C, d) ⇒ solution with k facilities of cost ≤ α OPT_{k-c}
- Apply [ABS] to each (k-i, F, C, d) for i = 0, 1, 2, …, c-1
- Output the best of the c+1 solutions
Proof
- If OPT_{k-c} ≤ (1+ε) OPT_k, then done.
- Otherwise, consider the smallest i s.t. OPT_{k-i-1} ≥ (1+ε)^{1/c} OPT_{k-i}
- [ABS] on (k-i, F, C, d) ⇒ solution of cost (1+ε) OPT_{k-i} ≤ (1+ε)^2 OPT_k
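
The three algorithm steps amount to a "best of c+1 candidates" wrapper. The sketch below treats both A and the [ABS] procedure as black boxes passed in as callables. The callable signatures, the `cost` helper, and the stub instance are all hypothetical and only illustrate the control flow.

```python
def best_k_median_solution(k, c, instance, pseudo_approx, abs_solver, cost):
    """Return the cheapest of the c+1 candidate solutions.

    pseudo_approx(k', instance): assumed to open at most k' + c facilities.
    abs_solver(k', instance): assumed to open at most k' facilities, or
        to return None if it produces nothing for this k'.
    cost(S, instance): connection cost of the facility set S.
    """
    # Step 1: run A with budget k - c, so it opens at most k facilities.
    candidates = [pseudo_approx(k - c, instance)]
    # Step 2: run the [ABS]-style solver for k, k-1, ..., k-c+1.
    for i in range(c):
        candidates.append(abs_solver(k - i, instance))
    # Step 3: keep feasible candidates and output the best one.
    feasible = [S for S in candidates if S is not None and len(S) <= k]
    return min(feasible, key=lambda S: cost(S, instance))

# Stub instance: a lookup table from facility sets to costs (made-up numbers).
toy_costs = {frozenset("abc"): 10, frozenset("abd"): 4, frozenset("ab"): 7}

def fake_pseudo_approx(k_reduced, instance):
    return {"a", "b", "c"}

def fake_abs_solver(k_prime, instance):
    return {"a", "b", "d"} if k_prime == 3 else {"a", "b"}

def table_cost(S, instance):
    return instance[frozenset(S)]

chosen = best_k_median_solution(3, 2, toy_costs, fake_pseudo_approx,
                                fake_abs_solver, table_cost)
```

A solution with fewer than k facilities is still feasible here, since it can be padded with arbitrary facilities without increasing the cost.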

Main Lemma 2: bi-point solution of cost C → solution of cost (1+√3+ε)/2 · C with k+O(1/ε) facilities
[JV]: bi-point solution of cost C → solution of cost 2C
based on improving the [JV] algorithm

JV algorithm
given: bi-point solution aS_1 + bS_2
τ_i = nearest facility of i
- select S'_2 ⊆ S_2, |S'_2| = |S_1| = k_1
- with prob. a, open S_1; with prob. b, open S'_2
- randomly open k−k_1 facilities in S_2 \ S'_2
guarantee: either i is open, or τ_i is open
[Figure: S_1 and S_2, with a facility i.]
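
The rounding steps can be sketched as follows. Two points are assumptions not spelled out on the slide: τ_i is taken to be the facility of S_2 nearest to i ∈ S_1, and S'_2 is built from the τ_i and padded with arbitrary facilities of S_2 up to size k_1. S_1 and S_2 are assumed disjoint, and the example points are made up.

```python
import random

def jv_round(S1, S2, k, dist, rng):
    """Round a bi-point solution a*S1 + b*S2 to exactly k open facilities."""
    k1, k2 = len(S1), len(S2)
    a = (k2 - k) / (k2 - k1)          # weight of S1; b = 1 - a
    # tau[i]: nearest facility in S2 for each i in S1 (assumed direction).
    tau = {i: min(S2, key=lambda i2: dist(i, i2)) for i in S1}
    # S2': the images tau[i], padded with other facilities of S2 to size k1.
    S2_prime = set(tau.values())
    for i2 in S2:
        if len(S2_prime) >= k1:
            break
        S2_prime.add(i2)
    # With prob. a open S1, with prob. b open S2'.
    opened = set(S1) if rng.random() < a else set(S2_prime)
    # Randomly open k - k1 of the remaining facilities of S2.
    rest = [i2 for i2 in S2 if i2 not in S2_prime]
    opened.update(rng.sample(rest, k - k1))
    return opened, tau

# Hypothetical instance: facilities are points on a line.
S1, S2 = [0, 10], [1, 2, 9, 12]
opened, tau = jv_round(S1, S2, 3, lambda x, y: abs(x - y), random.Random(0))
```

Whichever branch is taken, every i ∈ S_1 has either itself or τ_i open, which is the guarantee the analysis relies on.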

Analysis of JV algorithm
i_1 ∈ S_1, i_3 ∈ S'_2; either i_1 or i_3 is open
d(j, i_1) = d_1, d(j, i_2) = d_2, d(i_1, i_3) ≤ d_1 + d_2
If i_2 is open, connect j to i_2
Otherwise, if i_1 is open, connect j to i_1
Otherwise connect j to i_3
E[cost of j] ≤ 2 × [cost of j in aS_1 + bS_2]
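
The factor 2 can be checked with a short calculation. This is a sketch of the standard argument rather than the slide's own derivation. It assumes i_3 = τ_{i_1}, that j pays a d_1 + b d_2 in the bi-point solution, and that i_2 ∉ S'_2, so i_2 is open with probability b independently of the choice between S_1 and S'_2. If i_2 ∈ S'_2, the expected cost is exactly a d_1 + b d_2.

```latex
\begin{align*}
  d(j, i_3) &\le d(j, i_1) + d(i_1, i_3) \le d_1 + (d_1 + d_2) = 2d_1 + d_2, \\
  \mathbb{E}[\text{cost of } j]
    &\le b\,d_2 + (1 - b)\bigl(a\,d_1 + b\,(2d_1 + d_2)\bigr) \\
    &= b\,d_2 + a^2 d_1 + ab\,(2d_1 + d_2)
       && \text{(using } 1 - b = a\text{)} \\
    &= a\,d_1\,(a + 2b) + b\,d_2\,(1 + a) \\
    &= a\,d_1\,(1 + b) + b\,d_2\,(1 + a)
     \;\le\; 2\,(a\,d_1 + b\,d_2).
\end{align*}
```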

Our Algorithm
on average, d_1 >> d_2
If i_2 is open, connect j to i_2
Otherwise, if i_1 is open, connect j to i_1
Otherwise connect j to i_3
d(j, i_3) ≤ 2 d 1 +2 d 2 2d1+d2
E[cost of j] ≤ × [cost of j in aS_1 + bS_2]
[Figure: client j at distance d_1 from i_1 and d_2 from i_2; facility i_3; an edge labeled ≤ d_1 + d_2.]

Our Algorithm
need to guarantee: either i is open, or τ_i is open
for a star, either the center is open, or all leaves are open
first try: open each star independently? with prob. a, open the center; with prob. b, open the leaves
problem: cannot bound the number of open facilities
idea:
- big stars: always open the center, open each leaf with prob. ≈ b
- group small stars of the same size, dependent rounding
- for each group, open 3 more facilities than expected
[Figure: facility i and τ_i.]
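
The star structure can be built as sketched below. The slide does not define the stars explicitly, so this assumes each i ∈ S_2 is a leaf attached to its nearest facility τ_i ∈ S_1, which acts as the center. Function names and the example points are illustrative.

```python
def build_stars(S1, S2, dist):
    """Map each center in S1 to the list of S2 facilities nearest to it."""
    stars = {center: [] for center in S1}
    for leaf in S2:
        center = min(S1, key=lambda c: dist(leaf, c))  # tau of the leaf
        stars[center].append(leaf)
    return stars

def star_guarantee_holds(stars, opened):
    """For every star: the center is open, or all of its leaves are open."""
    return all(center in opened or all(leaf in opened for leaf in leaves)
               for center, leaves in stars.items())

# Hypothetical instance: facilities are points on a line.
stars = build_stars([0, 10], [1, 2, 9, 12], lambda x, y: abs(x - y))
```

Under this reading, the per-star condition is the same as the per-facility guarantee: a leaf i that is closed forces its center τ_i to be open.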

small stars
small star: star of size ≤ 2/(abε)
M_h: set of stars of size h, m = |M_h|
Roughly: for am stars, open the center; for bm stars, open the leaves
More accurately: permute the stars and the facilities; open top centers, open bottom leaves

big stars
size h > 2/(abε)
always open the center
randomly open leaves: ≈ bh for a big star
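
The split into big and small stars, and the grouping of small stars by size, can be sketched as follows. The threshold 2/(abε) is the one stated on the slides. The representation of stars as a dictionary from center to leaf list, and the sample values, are assumptions.

```python
from collections import defaultdict

def split_stars(stars, a, b, eps):
    """Separate big stars from small ones and group small stars by size.

    Returns (big, groups): big maps center -> leaves for stars with more
    than 2/(a*b*eps) leaves; groups maps size h -> list of centers (M_h).
    """
    threshold = 2 / (a * b * eps)
    big = {}
    groups = defaultdict(list)
    for center, leaves in stars.items():
        h = len(leaves)
        if h > threshold:
            big[center] = leaves
        else:
            groups[h].append(center)
    return big, dict(groups)

# Hypothetical stars; with a = b = 0.5 and eps = 1 the threshold is 8.
example_stars = {"c1": list(range(9)), "c2": [100, 101],
                 "c3": [200, 201], "c4": [300]}
big, groups = split_stars(example_stars, a=0.5, b=0.5, eps=1.0)
```

Because small stars have bounded size, the number of distinct groups M_h is bounded by the threshold, independent of the input size.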

Lemma: we open at most k + 6/(abε) facilities.
for a big star of size h: FRAC: a+bh; ALG:
for a group of m small stars of size h: FRAC: m(a+bh); ALG:
there are at most 2/(abε) groups

Summary
Main Lemma 1: it suffices to give an α-approximate solution with k+O(1) facilities
Main Lemma 2: bi-point solution of cost C → solution of cost (1+√3+ε)/2 · C with k+O(1/ε) facilities
LMP approx. factor: [JV] 3, [JMS] 2, our result 2
bi-point → integral: [JV] ×2, [JMS] ×2; our result: bi-point → pseudo-integral
final ratio for k-median: [JV] 6, [JMS] 4, our result 1+√3+ε

Open Problems
- gap between the integral solution with k+1 open facilities and the LP value (with k open facilities)?
- tight analysis?
- does the algorithm work for k-means?