A Dependent LP-Rounding Approach for the k-Median Problem
  Moses Charikar, Shi Li
  Department of Computer Science, Princeton University
  ICALP 2012, Warwick, UK
Outline
  Introduction
  Linear Programming Relaxation
  Simple Pseudo-Approx. for k-median
  Our Algorithm for k-median
k-Median as a Clustering Problem
  Given: a metric (X, d) and an integer k
  Partition X into k clusters
  Select a center for each cluster
  Minimize the sum of distances from the points to their cluster centers
  The objective quantifies how well a set can be divided into k clusters
  [Figure: example clustering with k = 4]
k-Median in Operations Research
  Given: a metric (F ∪ C, d) and an integer k
    o F: set of facilities
    o C: set of clients
  Open k facilities
  Connect each client to its nearest open facility
  Minimize the total connection cost
  [Figure: example instance with k = 4]
Related Problem: Facility Location Problem
  Given: a metric (F ∪ C, d) and facility costs {f_i ≥ 0 : i ∈ F}
    o F: set of facilities
    o C: set of clients
    o f_i: cost of opening facility i
  Open a set F' ⊆ F of facilities (there is no bound k on the number of open facilities)
  Connect each client to its nearest open facility
  Minimize the sum of the facility cost and the connection cost
Known Results
                      Approx.            Hardness of approx.
  facility location   1.488 [Li11]       1.463 [GK98, Sri02]
  k-median            3+ε * [AGK+01]     1+2/e+ε [JMS02]
  * Local search: if switching p facilities cannot improve a solution, then the solution is a (3+2/p)-approximation
  The integrality gap of the natural LP relaxation is between 2 and 3
    o the proof of the upper bound 3 is non-constructive
Our Results
  An LP-rounding approach for k-median
    o we prove a 3.25 approximation ratio
    o this gives a constructive proof that the integrality gap is at most 3.25
    o faster running time than the local search algorithm
    o potential to improve the 3+ε approximation
      the upper bound 3.25 is not tight
      our algorithm may already give an approximation ratio smaller than 3
Our Results
                        prev. best approx. ratio   our approx. ratio
  k-facility location   [Zha06]                    3.25
  matroid median        16 [KKN+11]                9
  knapsack median       ≥ 1000 [Kum12]             34
  k-facility location: the facility location problem with the constraint that at most k facilities can be open
  matroid median: the set of open facilities must be an independent set of a given matroid
  knapsack median: each facility has a cost, and the total cost of the open facilities cannot exceed a budget B
Natural LP Relaxation
  y_i ∈ {0,1}, i ∈ F: whether facility i is open
  x_{i,j} ∈ {0,1}, i ∈ F, j ∈ C: whether client j is connected to facility i
  Every client j must be connected to 1 facility
  Client j can only be connected to an open facility
  We can open at most k facilities
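Written out, the three constraints listed above give the following linear program. This is a reconstruction from those constraints, assuming the usual relaxation of the {0,1} variables to nonnegative reals and the connection-cost objective from the problem definition.

```latex
\begin{align*}
\min \quad & \sum_{i \in F} \sum_{j \in C} d(i,j)\, x_{i,j} \\
\text{s.t.} \quad & \sum_{i \in F} x_{i,j} = 1 && \forall j \in C \\
& x_{i,j} \le y_i && \forall i \in F,\ j \in C \\
& \sum_{i \in F} y_i \le k \\
& x_{i,j},\ y_i \ge 0 && \forall i \in F,\ j \in C
\end{align*}
```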
Canonical Instance
  km facilities
  every client j is connected to its nearest m facilities
  in the LP solution, y_i = 1/m and x_{i,j} ∈ {0, 1/m}
  [Figure: facilities and clients, with a client j]
Canonical Instance
  F_j: the set of m facilities that j is connected to
  d_av(j): average distance from j to F_j
  d_max(j): maximum distance from j to F_j
  LP value = Σ_{j ∈ C} d_av(j)
  [Figure: facilities and clients, with a client j]
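As a small illustration of these definitions, the sketch below computes F_j, d_av(j), d_max(j) and the resulting LP value for a toy distance table. The function name `canonical_stats` and the `dist[j][i]` layout are assumptions made for this example only.

```python
def canonical_stats(dist, m):
    """dist[j][i] = d(i, j) for client j and facility i (hypothetical layout).

    Returns {j: (F_j, d_av(j), d_max(j))}, taking F_j to be the m facilities
    nearest to j, as in the canonical instance.
    """
    stats = {}
    for j, row in enumerate(dist):
        # F_j: indices of the m facilities nearest to client j
        F_j = sorted(range(len(row)), key=lambda i: row[i])[:m]
        d_av = sum(row[i] for i in F_j) / m
        d_max = max(row[i] for i in F_j)
        stats[j] = (F_j, d_av, d_max)
    return stats


# Toy instance: 2 clients, 4 facilities, m = 2.
dist = [
    [1.0, 2.0, 9.0, 9.0],
    [8.0, 7.0, 1.0, 3.0],
]
stats = canonical_stats(dist, m=2)

# With x_{i,j} = 1/m on F_j, client j contributes d_av(j) to the LP objective.
lp_value = sum(d_av for _, d_av, _ in stats.values())
```

For this table, client 0 has F_0 = {0, 1} and client 1 has F_1 = {2, 3}, so the LP value is the sum of the two averages.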
Pseudo-Approximation
  An (α, c)-pseudo-approximation is a solution that opens at most αk facilities and whose connection cost is at most c times the optimal cost
  A warm-up: a (1 + ε, O(1/ε))-pseudo-approximation for k-median
Pseudo-Approximation
  Let m' = m / (1+ε), y'_i = (1+ε) y_i = 1/m'
  Every client only needs to connect to m' facilities
  We fractionally open km · (1/m') = (1+ε)k facilities
  Define F'_j, d'_av(j), d'_max(j) similarly
  [Figure: facilities and clients, with a client j]
Pseudo-Approximation
  Two clients j and j' conflict if F'_j ∩ F'_{j'} ≠ ∅
  Select a set C' of clients such that no two clients in C' conflict with each other
  [Figure: facilities and clients, with clients j and j']
Pseudo-Approximation
  Greedily construct C' ⊆ C with no conflicts
    o while C ≠ ∅
      select j ∈ C with the minimum d_av(j)
      add j to C'
      remove j and all clients that conflict with j from C
  [Figure: facilities and clients]
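The greedy construction above can be sketched as follows. This is a minimal sketch: the name `greedy_filter`, the dictionary inputs, and the tie-breaking by client id are assumptions made for the example. A conflict is taken to mean that two facility sets intersect, as defined for the pseudo-approximation.

```python
def greedy_filter(F, d_av):
    """F[j]: set of facilities client j connects to; d_av[j]: its average distance.

    Returns C' in the order in which clients were selected.
    """
    remaining = set(F)
    C_prime = []
    while remaining:
        # Select the remaining client with minimum d_av
        # (ties broken by client id, an assumption of this sketch).
        j = min(remaining, key=lambda c: (d_av[c], c))
        C_prime.append(j)
        # Remove j and every client whose facility set intersects F[j].
        remaining = {c for c in remaining if c != j and not (F[c] & F[j])}
    return C_prime


# Toy instance: "a" and "b" share facility 2, so they conflict.
F = {"a": {1, 2}, "b": {2, 3}, "c": {4, 5}}
d_av = {"a": 1.0, "b": 0.5, "c": 2.0}
C_prime = greedy_filter(F, d_av)
```

Here "b" is selected first because it has the smallest average distance, which removes "a"; "c" has no conflict and is selected next.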
Pseudo-Approximation
  Open facilities
    o For every j ∈ C', open 1 of the m' facilities in F'_j uniformly at random
    o For any facility i that is not inside ∪_{j ∈ C'} F'_j, open i with probability 1/m'
  Connect each client to its nearest open facility
  Fact: every facility is open with probability 1/m'
  [Figure: facilities and clients]
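A minimal sketch of this opening step, assuming m' is an integer and that the sets F'_j for j ∈ C' are pairwise disjoint (which the conflict-free selection guarantees). The function name `open_facilities` and its parameters are hypothetical.

```python
import random


def open_facilities(facilities, F, C_prime, m_prime, rng=random):
    """Open one facility per selected client, and each uncovered facility
    independently with probability 1/m'."""
    opened = set()
    covered = set()
    for j in C_prime:
        # Exactly one facility of F'_j is opened, chosen uniformly at random.
        opened.add(rng.choice(sorted(F[j])))
        covered |= F[j]
    for i in facilities:
        # Facilities outside every selected F'_j are opened independently.
        if i not in covered and rng.random() < 1.0 / m_prime:
            opened.add(i)
    return opened


# Toy instance: 6 facilities, m' = 2, two non-conflicting selected clients.
F = {"b": {1, 2}, "c": {4, 5}}
opened = open_facilities(range(6), F, ["b", "c"], m_prime=2, rng=random.Random(0))
```

Each facility in some F'_j is chosen with probability 1/m' because |F'_j| = m', which matches the fact stated on the slide.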
Pseudo-Approximation
  [Figure: facilities and clients; clients j and j' with their facility sets F'_j and F'_{j'}]
Barrier to Obtaining a True Approximation
  If ε = 0, then F'_j = F_j
  It may happen that d_max(j) >> d_av(j)
  With non-zero probability, j will be connected to facilities in F_{j'}
  The expected connection cost of j is unbounded compared to d_av(j)
  [Figure: clients j and j' with their facility sets F_j and F_{j'}]
Remove the Barrier
  Solution: j only “claims” close facilities in F_j
  Let U_j be the set of claimed facilities
  Use U_j to replace F_j in the algorithm
  New barrier: |U_j| < m might happen
    o we cannot guarantee that a facility in U_j is always open
  [Figure: client j with F_j and the claimed set U_j]
Remove the New Barrier
  We can guarantee |U_j| ≥ m/2
  |U_j ∪ U_{j'}| ≥ m if U_j and U_{j'} are disjoint
  Pair the clients in C'
  Always open 1 facility (possibly 2 facilities) in U_j ∪ U_{j'} for a matched pair (j, j')
  [Figure: matched clients j and j' with U_j and U_{j'}]
Remove the New Barrier
  How to open facilities for a matched pair?
  m boxes in a line
  Permute the facilities in U_j and put them in the leftmost |U_j| boxes
  Permute the facilities in U_{j'} and put them in the rightmost |U_{j'}| boxes
  Open the facilities in a randomly selected box
  [Figure: a line of m boxes holding U_j and U_{j'}]
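The box construction can be sketched as below, assuming one facility per box on each side, uniformly random permutations, and a uniformly random box. The function name `open_for_pair` is hypothetical. Because |U_j| + |U_{j'}| ≥ m and each set has at most m facilities, every box holds 1 or 2 facilities.

```python
import random


def open_for_pair(U_j, U_jp, m, rng=random):
    """Return the facilities opened for a matched pair (j, j'): 1 or 2 of them."""
    assert len(U_j) <= m and len(U_jp) <= m
    assert len(U_j) + len(U_jp) >= m  # so that no box is empty

    left = sorted(U_j)
    right = sorted(U_jp)
    rng.shuffle(left)   # random permutation of U_j
    rng.shuffle(right)  # random permutation of U_j'

    boxes = [[] for _ in range(m)]
    for pos, i in enumerate(left):
        boxes[pos].append(i)          # fill from the left end
    for pos, i in enumerate(right):
        boxes[m - 1 - pos].append(i)  # fill from the right end

    # Open whatever is in one uniformly random box.
    return boxes[rng.randrange(m)]


# Toy pair with m = 4: |U_j| = 3 and |U_j'| = 2, so the two sides overlap in one box.
opened = open_for_pair({1, 2, 3}, {4, 5}, m=4, rng=random.Random(1))
```

Each facility sits in exactly one of the m boxes, so it is opened with probability 1/m, and two facilities are opened only when the chosen box lies in the overlap of the two sides.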
The Algorithm
  Filtering
    o 2 clients j and j' conflict if d(j, j') ≤ 4 max{d_av(j), d_av(j')}
    o while C ≠ ∅
      select j ∈ C that minimizes d_av(j); add j to C'
      remove j and all clients that conflict with j from C
The Algorithm
  Filtering
  Claiming
    o For any j ∈ C', let 2R_j be the distance between j and its nearest neighbor in C'
    o A facility i is claimed by j if i ∈ F_j and d(i, j) ≤ R_j, i.e., U_j = F_j ∩ Ball(j, R_j)
  Fact: any client j ∈ C' claims at least m/2 and at most m facilities
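The filtering and claiming phases can be sketched together as follows. This is a minimal sketch on a toy line metric: the names `filter_clients` and `claim`, the tie-breaking by client id, and the handling of a lone client in C' are assumptions of the example. The toy facility sets are not taken from an actual LP solution, so the m/2 guarantee is not exercised here.

```python
def filter_clients(clients, d, d_av):
    """Filtering: j and j' conflict if d(j, j') <= 4 * max(d_av(j), d_av(j'))."""
    remaining = set(clients)
    C_prime = []
    while remaining:
        j = min(remaining, key=lambda c: (d_av[c], c))  # tie-break by id (assumption)
        C_prime.append(j)
        # Keep only clients that are not j and do not conflict with j.
        remaining = {c for c in remaining
                     if c != j and d(j, c) > 4 * max(d_av[j], d_av[c])}
    return C_prime


def claim(C_prime, F, d):
    """Claiming: U_j = F_j intersected with Ball(j, R_j), where 2 * R_j is the
    distance from j to its nearest neighbor in C'."""
    U = {}
    for j in C_prime:
        others = [d(j, jp) for jp in C_prime if jp != j]
        if not others:
            U[j] = set(F[j])  # lone client: claim all of F_j (assumption)
            continue
        R_j = min(others) / 2
        U[j] = {i for i in F[j] if d(i, j) <= R_j}
    return U


# Toy instance on a line; pos gives the coordinate of each client and facility.
pos = {"j1": 0.0, "j2": 1.0, "j3": 10.0,
       "f1": 0.5, "f2": 3.0, "f3": 10.0, "f4": 12.0}


def d(a, b):
    return abs(pos[a] - pos[b])


F = {"j1": {"f1", "f2"}, "j2": {"f1", "f2"}, "j3": {"f3", "f4"}}
d_av = {j: sum(d(i, j) for i in F[j]) / len(F[j]) for j in F}

C_prime = filter_clients(F.keys(), d, d_av)
U = claim(C_prime, F, d)
```

In this instance j3 is selected first, j2 second, and j1 is removed because it lies within 4 max{d_av(j2), d_av(j1)} of j2.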
The Algorithm
  Filtering
  Claiming
  Matching
    o while there are at least 2 unmatched clients in C'
      select 2 unmatched clients j and j' that minimize d(j, j')
      match j and j'
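The matching loop above is a greedy closest-pair matching. A minimal sketch, with the function name `greedy_match` and the quadratic-time pair search chosen for clarity rather than efficiency:

```python
from itertools import combinations


def greedy_match(C_prime, d):
    """Repeatedly match the two closest unmatched clients.

    Returns (pairs, unmatched); at most one client is left unmatched.
    """
    unmatched = set(C_prime)
    pairs = []
    while len(unmatched) >= 2:
        # Closest pair among the unmatched clients.
        j, jp = min(combinations(sorted(unmatched), 2),
                    key=lambda p: d(p[0], p[1]))
        pairs.append((j, jp))
        unmatched -= {j, jp}
    return pairs, unmatched


# Toy instance: five clients on a line.
pos = {"a": 0, "b": 1, "c": 5, "d": 7, "e": 20}
pairs, unmatched = greedy_match(pos.keys(), lambda u, v: abs(pos[u] - pos[v]))
```

With five clients, "a" and "b" are matched first, then "c" and "d", and "e" stays unmatched, which is the case the rounding step handles separately.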
The Algorithm
  Filtering
  Claiming
  Matching
  Rounding
    o For each matched pair (j, j'), open 1 or 2 facilities in U_j ∪ U_{j'}
    o If there is an unmatched client j, open 0 or 1 facility in U_j
    o For each facility i that is not inside any U_j, open i with probability 1/m
    o Connect each client to its nearest open facility
Proof of Constant Approx. Ratio
  [Figure: client j, its nearest neighbor j_1 in C', and j_2, which is matched with j_1. Labels: distances 2R_j and 2R_{j_1} ≤ 2R_j; radii R_j, R_{j_1}, R_{j_2}; claimed sets U_j, U_{j_1}, U_{j_2}]
Proof of 3.25 Approx. Ratio
  Complicated, details omitted
  Rough idea: for a client j ∉ C'
    o j_1 ∈ C' is the client that conflicts with j and removes j in the filtering phase
    o j_2 ∈ C' is the nearest neighbor of j_1 in C'
    o j_3 ∈ C' is the client matched with j_2
    o Consider the nearest open facility of j in F_j ∪ F_{j_1} ∪ U_{j_2} ∪ U_{j_3}
  Our algorithm opens k facilities in expectation
  It can be easily transformed so that it always opens k facilities
  The algorithm naturally extends to the k-FL problem
Ongoing Work
  Joint work with Svensson: improved the best known approximation ratio (3+ε) for k-median
Summary
  We introduced an LP-rounding algorithm for the k-median problem
    o proved a 3.25 approximation ratio for the problem
    o it has the potential to improve the decade-old 3+ε approximation
  Improved approximation algorithms for the following problems
    o k-facility location problem: 3.25
    o Matroid median problem: 9
    o Knapsack median problem: 34