Outline:
1. Online bidding
2. Cow-path
3. Incremental medians (size approximation)
4. Incremental medians (cost approximation)
5. List scheduling on related machines
6. Minimum latency tours
7. Incremental clustering
List Scheduling
Given a list of jobs (each with a specified processing time), assign them to processors to minimize the makespan (maximum load).
Online algorithm: the assignment of a job does not depend on future jobs.
Goal: a small competitive ratio.
[Figure: jobs being assigned to processors over time]
Greedy: assign each job to the machine with the lightest load.
[Figure: the greedy schedule, showing jobs, processors, and the resulting makespan]
[Figure: a better schedule for the same jobs, with a smaller makespan]
List Scheduling
Greedy is (2 - 1/m)-competitive [Graham '66].
Lower bound ≈ 1.88 [Rudin III, Chandrasekaran '03].
Best known ratio ≈ 1.92 [Albers '99] [Fleischer, Wahl '00].
Lots of work on randomized algorithms, preemptive scheduling, …
List Scheduling on Related Machines
Related machines: machines may have different speeds.
[Figure: processors with speeds 0.25, 0.5, 1 and a list of jobs]
Algorithm 2PACK(L): schedule each job on the slowest machine whose load will not exceed 2L.
(A little birdie tells us: "Hey, the opt makespan is at most L.")
[Figure: machines with speeds 0.25, 0.5, 1; each machine's load kept within the threshold 2L]
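Below is a minimal Python sketch of 2PACK (not from the slides; representing jobs as processing requirements, machines as speeds, and the helper name two_pack are assumptions). A machine's load is its completion time, i.e. assigned work divided by speed.

def two_pack(jobs, speeds, L):
    """Sketch of 2PACK(L): place each job on the slowest machine whose
    resulting load (completion time) stays within 2L.  Returns the chosen
    machine index per job, or None as soon as some job fits nowhere
    (i.e. 2PACK fails)."""
    order = sorted(range(len(speeds)), key=lambda i: speeds[i])  # slowest first
    work = [0.0] * len(speeds)          # work[i] = processing requirement on machine i
    assignment = []
    for p in jobs:                      # p = processing requirement of the job
        for i in order:                 # try machines from slowest to fastest
            if (work[i] + p) / speeds[i] <= 2 * L:
                work[i] += p
                assignment.append(i)
                break
        else:
            return None                 # the job fits on no machine: 2PACK fails
    return assignment

For example, two_pack([1, 1, 1], [0.25, 0.5, 1.0], L=2) returns [0, 1, 1]: the unit jobs are pushed onto the slower machines as long as their load stays within 2L = 4.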
Lemma: If the little birdie is right (opt makespan ≤ L), then 2PACK succeeds.
Proof: Number the processors from fastest (1) to slowest (m). Suppose 2PACK fails on some job h.
h's length on processor 1 is ≤ L (opt runs h somewhere within makespan L, and processor 1 is at least as fast), and h did not fit there, so the load of processor 1 is > L.
Let r = the first processor with load ≤ L (or m+1, if there is no such processor).
Claim: if opt executes a job k on a machine in {r, r+1, …, m}, then so does 2PACK.
[Figure: loads of processors 1, 2, …, m against the thresholds L and 2L, in the optimum and in 2PACK]
Proof of claim: Suppose opt executes k on a machine in {r, r+1, …, m}; then k's length there is ≤ L.
Processor r is at least as fast, so k's length on r is ≤ L as well; and processor r's load never exceeds L (that is how r was chosen), so k fits on r within 2L.
Since 2PACK prefers the slowest feasible machine, it places k on r or on an even slower machine, i.e. in {r, r+1, …, m}.
[Figure: job k on opt's machine and on 2PACK's machine, both in the range {r, …, m}]
In other words: if 2PACK executes a job on a machine in {1, 2, …, r-1}, then so does opt.
But every 2PACK processor in {1, 2, …, r-1} has load > L (by the choice of r), so opt's (speed-weighted) total load on processors {1, 2, …, r-1} is > (r-1)·L.
Hence some processor in the optimal schedule has load > L, a contradiction. ∎
[Figure: loads of processors 1, …, r-1 exceed L]
Algorithm:
1. Choose d_1 < d_2 < d_3 < … (makespan estimates).
   Let B_j = 2·(d_1 + d_2 + … + d_j); "bucket" j is the time interval [B_{j-1}, B_j].
2. j ← 1
   while there are unassigned jobs:
     apply 2PACK with L = d_j inside bucket j
     if 2PACK fails on job k, let j ← j+1 and continue (starting with job k)
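A Python sketch of this doubling framework (an illustration under assumptions, not the slides' exact pseudocode: d_j = 2^j is taken as the default estimate sequence, and 2PACK is inlined rather than called):

def doubling_schedule(jobs, speeds, d=lambda j: 2 ** j):
    """Sketch: run 2PACK with L = d(j) inside bucket j; when a job fails to
    fit, open bucket j+1 and retry it there.  Returns, for each job, the
    pair (bucket index, machine index)."""
    order = sorted(range(len(speeds)), key=lambda i: speeds[i])  # slowest first
    j = 1
    work = [0.0] * len(speeds)          # work assigned within the current bucket
    schedule = []
    for p in jobs:
        while True:
            placed = False
            for i in order:             # 2PACK inside bucket j, with L = d(j)
                if (work[i] + p) / speeds[i] <= 2 * d(j):
                    work[i] += p
                    schedule.append((j, i))
                    placed = True
                    break
            if placed:
                break
            j += 1                      # close bucket j; start bucket j+1 empty
            work = [0.0] * len(speeds)
    return schedule

A job placed in bucket j finishes by time B_j = 2·(d_1 + … + d_j), which is exactly what the analysis below bounds.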
[Figure: each processor's time axis divided into buckets ending at B_1, B_2, …, B_{j-1}, B_j, B_{j+1}; jobs k and k' are placed in bucket j]
Analysis:
Suppose the optimal makespan is u. Choose j such that d_{j-1} < u ≤ d_j.
Then 2PACK will succeed in the j-th bucket (with L = d_j), so the algorithm's makespan is ≤ 2·(d_1 + d_2 + … + d_j).
We get ratio 8 for d_j = 2^j: the competitive ratio is 2 × (bidding ratio).
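Spelling out the constant (a quick check using only facts stated above): for d_j = 2^j we have d_1 + d_2 + … + d_j < 2·d_j = 4·d_{j-1} < 4u, so the algorithm's makespan is ≤ 2·(d_1 + … + d_j) < 8u. In general the ratio is 2 times the bidding ratio of the chosen sequence d_1 < d_2 < …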
List Scheduling on Related Machines
Theorem: There is an 8-competitive online algorithm for list scheduling on related machines (to minimize makespan). With randomization the ratio can be improved to 2e. [Aspnes, Azar, Fiat, Plotkin, Waarts '06]
World records: upper bound ≈ … (4.311 randomized), lower bound ≈ … (2 randomized) [Berman, Charikar, Karpinski '97] [Epstein, Sgall '00]
Outline:
1. Online bidding
2. Cow-path
3. Incremental medians (size approximation)
4. Incremental medians (cost approximation)
5. List scheduling on related machines
6. Minimum latency tours
7. Incremental clustering
Minimum Latency Tour
X = metric space; P = v_1 v_2 … v_h : a path in X.
Latency of v_i on P: latency_P(v_i) = d(v_1, v_2) + … + d(v_{i-1}, v_i).
(Total) latency of P = Σ_i latency_P(v_i).
Minimum Latency Tour Problem: Given X, find a tour (a path visiting all vertices) of minimum total latency.
Goal: a polynomial-time approximation algorithm.
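As a concreteness check, a small Python sketch of these definitions (the distance function and the path representation are assumptions):

def total_latency(path, dist):
    """Total latency of a path v_1 v_2 ... v_h under a distance function
    dist(u, v): the sum over i of latency(v_i) = d(v_1,v_2) + ... + d(v_{i-1},v_i)."""
    latency, total = 0.0, 0.0
    for prev, cur in zip(path, path[1:]):
        latency += dist(prev, cur)      # latency of `cur` = distance travelled so far
        total += latency                # v_1 contributes 0 and is not counted
    return total

For instance, on a three-vertex path with d(v_1, v_2) = 2 and d(v_2, v_3) = 3, the latencies are 2 and 5, so total_latency returns 7.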
[Figure: an example tour on the vertices A, B, C, D, E, F; the individual vertex latencies sum to a total latency of 40]
Minimum k-Tour Problem: find a shortest k-tour (a path that starts and ends at v_1 and visits ≥ k different vertices).
[Figure: a 2-tour and a 4-tour on the vertices A, B, C, D, E, F]
Algorithm:
1. Choose d_1 < d_2 < d_3 < …
2. For each k, compute the optimal k-tour T_k.
3. Choose p(1) < … < p(m) = n such that length(T_p(i)) = d_i. (For simplicity, assume they exist.)
4. Output Q = T_p(1) T_p(2) … T_p(m) (concatenation).
Denote Q = q_1 q_2 … q_n (q_i = the first point on Q different from q_1, q_2, …, q_{i-1}).
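A rough Python sketch of steps 1-4 (the oracle names k_tour and tour_length are placeholders, not anything from the slides; and instead of assuming tours of length exactly d_j exist, this variant takes, for each budget d_j, the largest k whose optimal k-tour fits):

def latency_tour(n, k_tour, tour_length, d=lambda j: 2 ** j):
    """Sketch: concatenate k-tours of geometrically increasing length.
    k_tour(k) is assumed to return a shortest tour that starts and ends at
    v_1 and visits >= k vertices; tour_length(T) returns its length.
    Returns the concatenation Q of the chosen tours."""
    pieces, covered, j = [], 1, 1       # v_1 itself is already "covered"
    while covered < n:
        budget = d(j)
        # largest k whose optimal k-tour fits within the current budget
        k = max((kk for kk in range(covered + 1, n + 1)
                 if tour_length(k_tour(kk)) <= budget), default=covered)
        if k > covered:                 # some new vertices become reachable
            pieces.append(k_tour(k))
            covered = k
        j += 1
    return [q for piece in pieces for q in piece]   # Q = T_p(1) T_p(2) ... T_p(m)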
[Figure: Q formed by concatenating T_p(1), T_p(2), T_p(3), … around the start vertex v_1]
Lemma: Let S = s_1 s_2 … s_n be a tour with optimum latency. Then latency_S(s_k) ≥ (1/2)·length(T_k).
Proof: Walk along S from s_1 to s_k and back. This is a k-tour T (it starts and ends at s_1 = v_1 and visits k vertices), so 2·latency_S(s_k) = length(T) ≥ length(T_k).
[Figure: the prefix s_1 s_2 s_3 … s_k of S, traversed out and back as the k-tour T]
Analysis: For p(j-1) < k ≤ p(j):
  latency_S(s_k) ≥ (1/2)·length(T_k) ≥ (1/2)·length(T_p(j-1)) = d_{j-1}/2
  q_k is visited in T_p(j) (or earlier), so latency_Q(q_k) ≤ d_1 + d_2 + … + d_j
We get ratio 8 for d_j = 2^j, again 2 × (bidding ratio) … if we can compute k-tours efficiently!
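Spelling the calculation out (a quick check): with d_j = 2^j, latency_Q(q_k) ≤ d_1 + … + d_j < 2·d_j = 4·d_{j-1} ≤ 8·latency_S(s_k); summing over k gives total latency(Q) ≤ 8·(total latency of S).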
If X is a weighted tree, optimal k-tours can be computed in polynomial time…
Theorem: There is a polynomial-time 8-approximation algorithm for minimum latency tours on weighted trees. [Blum, Chalasani, Coppersmith, Pulleyblank, Raghavan, Sudan '94]
Can we do better?
Improvement: choose a random direction (clockwise or counter-clockwise) and traverse each T_p(j) in this direction.
For a point u first visited on T_p(j), the expected latency of u = d_1 + d_2 + … + d_{j-1} + d_j/2.
We get ratio 6 for d_j = 2^j.
[Figure: the tour T_p(j) around v_1, reaching u from either direction]
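Checking the constant (a quick calculation, with the same index matching as before): for d_j = 2^j we have d_1 + … + d_{j-1} < d_j, so the expected latency of u is < d_j + d_j/2 = 3·d_{j-1} ≤ 6·latency_S(s_k), i.e. ratio 6 in expectation.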
Can we do even better?
Instead of d_j = 2^j, choose d_j = c^{j+x}, where c is the constant from the Cow Path problem and x is random in [0, 1).
We don't really need randomization: choose the better direction (clockwise or counter-clockwise) instead of a random one, and since there are only O(n) values of x that matter, try them all.
Theorem: There is a polynomial-time approximation algorithm for minimum latency tours on weighted trees. [Goemans, Kleinberg '98]
This can be extended to arbitrary spaces, with ratio …. [Chaudhuri, Godfrey, Rao, Talwar '03]
Outline:
1. Online bidding
2. Cow-path
3. Incremental medians (size approximation)
4. Incremental medians (cost approximation)
5. List scheduling on related machines
6. Minimum latency tours
7. Incremental clustering
k-Clustering
X = metric space. For C ⊆ X, diameter(C) = the maximum distance between two points in C.
k-Clustering Problem: Given k, partition X into k disjoint clusters C_1, …, C_k to minimize the maximum diameter(C_j).
Offline: approximable with ratio 2 [Gonzalez '85] [Hochbaum, Shmoys '85]; lower bound of 2 for polynomial-time algorithms (unless P = NP) [Feder, Greene '88] [Bern, Eppstein '96].
[Figure: k = 3; a 3-clustering of the points A–H with maximum diameter 5]
[Figure: k = 3; a better 3-clustering of the same points with maximum diameter 3]
Incremental k-Clustering Problem: maintain a k-clustering as the points of X arrive online.
Allowed operations: add a point to a cluster, merge clusters, create a new singleton cluster.
Goal: an online competitive algorithm (polynomial-time).
[Figures: k = 3; points A, C, D, G, E, H arriving online, with the maintained clustering's maximum diameter growing from 0 to 2 to 3]
Notation and terminology:
The algorithm's clusters are C_1, C_2, …, C_{k'} with k' ≤ k.
In each C_i fix a center o_i; the radius of C_i = the maximum distance between a point x ∈ C_i and o_i.
Then diameter of C_i ≤ 2·(radius of C_i).
[Figure: a cluster C_i with center o_i and its radius]
Procedure CleanUp(z). Goal: merge some of the clusters C_1, C_2, …, C_{k'} so that afterwards all inter-center distances are > z.
1. Find a maximal set J of clusters whose inter-center distances are all > z.
2. For each cluster C_a ∉ J: choose C_b ∈ J with d(o_a, o_b) ≤ z and merge C_a into C_b (keeping center o_b).
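A Python sketch of CleanUp (the representation of a cluster as a (center, points) pair is an assumption made for this illustration):

def clean_up(clusters, z, dist):
    """Sketch of CleanUp(z): clusters is a list of (center, points) pairs.
    Greedily pick a maximal set J with pairwise inter-center distances > z,
    then merge every other cluster into a member of J whose center is within
    distance z (one exists by maximality of J).  Surviving centers are kept."""
    J, rest = [], []
    for center, points in clusters:
        if all(dist(center, c) > z for c, _ in J):
            J.append((center, list(points)))     # far from everything chosen so far
        else:
            rest.append((center, points))
    for center, points in rest:                  # merge into a nearby member of J
        b = min(range(len(J)), key=lambda i: dist(center, J[i][0]))
        J[b][1].extend(points)                   # keep the center o_b of J[b]
    return J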
Lemma: If the maximum radius before CleanUp is h, then after CleanUp it is ≤ h + z.
Proof: follows from the triangle inequality: a point v of a merged cluster C_a is within h of o_a, and o_a is within z of the new center o_b, so v is within h + z of o_b.
[Figure: a merged cluster; the distances z (between the centers) and h (the old radius) add up along the triangle]
Algorithm:
1. Choose d_1 < d_2 < d_3 < …
2. Initially C_1, C_2, …, C_k are singletons (the first k points). (Assume the minimum distance between these points is > 1.)
3. j ← 1
4. repeat: when a new point x arrives
     if d(x, o_i) ≤ d_j for some i: add x to C_i
     else if k' < k: k' ← k'+1; C_{k'} ← {x}
     else: create a temporary cluster C_{k+1} ← {x}
           while there are k+1 clusters:
             j ← j+1; do CleanUp with z = d_j (merge clusters)   ← "checkpoint j"
Invariant: after checkpoint j, all inter-center distances are > d_j.
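A condensed Python sketch of the whole algorithm, reusing the clean_up sketch above (again with d_j = 2^j as the doubling choice; the (center, points) cluster representation is an assumption):

def incremental_clustering(stream, k, dist, d=lambda j: 2 ** j):
    """Sketch of incremental k-clustering.  `stream` yields points; the first
    k points start as singleton clusters.  Each later point joins a cluster
    whose center is within d_j, or opens a new cluster; once there are k+1
    clusters, CleanUp is applied with larger and larger z until at most k
    clusters remain."""
    points = iter(stream)
    clusters = [(x, [x]) for _, x in zip(range(k), points)]  # first k singletons
    j = 1
    for x in points:
        for center, members in clusters:
            if dist(x, center) <= d(j):
                members.append(x)                 # add x to an existing cluster
                break
        else:
            clusters.append((x, [x]))             # new (possibly temporary) cluster
            while len(clusters) > k:              # checkpoint: advance j and merge
                j += 1
                clusters = clean_up(clusters, d(j), dist)
    return clusters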
Example: k = 4.
[Figures: an example run with k = 4; balls of radii d_{j+1} and d_{j+2} drawn around the cluster centers]
Analysis: at checkpoint j:
Before CleanUp there are k+1 clusters with inter-center distances > d_{j-1}, so the optimal diameter is > d_{j-1} (some optimal cluster must contain two of these k+1 centers).
After CleanUp the maximum radius is ≤ d_1 + d_2 + … + d_j (by the lemma, it grows by at most d_i at checkpoint i), so the maximum diameter is ≤ 2·(d_1 + d_2 + … + d_j).
We get ratio 8 for d_j = 2^j: again 2 × (bidding ratio).
Incremental Clustering
Theorem: There is an 8-competitive online algorithm for incremental clustering (ratio 2e with randomization). [Charikar, Chekuri, Feder, Motwani '06]
Other results: upper bound 6 (4.33 randomized), but not polynomial-time; lower bound … (2 randomized).
Doubling Method Applications:
1. Online bidding
2. Cow-path
3. Incremental medians (size approximation)
4. Incremental medians (cost approximation)
5. List scheduling on related machines
6. Minimum latency tours
7. Incremental clustering
8. Other scheduling problems: list scheduling, related machines with preemption; scheduling with min-sum criteria; non-clairvoyant scheduling
9. Hierarchical clustering
10. Load balancing
11. Online algorithms for set cover (combined with primal-dual)
…