Online Algorithms via Projections: set cover, paging, k-server

Niv Buchbinder (Tel-Aviv)   Anupam Gupta (CMU)   Marco Molinaro (PUC-Rio)   Seffi Naor (Technion)
K-Server problem

- Finite metric space (V, d) with |V| = n points
- k = number of "servers" that the algorithm controls
- Input: request sequence r_1, r_2, ..., r_t, ...
- Output: on seeing r_t, the algorithm needs to have a server at r_t
- Minimize: the total distance moved by the servers
- OPT = the optimal cost/solution in hindsight
- Goal: an online ALG such that E[cost(ALG)] ≤ α · cost(OPT) + c′
(very partial) history and results

- Deterministic: 2k−1 upper bound, k lower bound [Koutsoupias Papadimitriou 95]
- Randomized: Ω(log k) lower bound even when the metric is a star [folklore, 90s?]
- O(log k) for weighted stars [Bansal Buchbinder Naor 07]
- O(log³ n · log² k) [Bansal Buchbinder Madry Naor 11]
- O(log⁶ k) [Bubeck Cohen Lee Lee Madry 18] [Lee 18]:
  - O(log² k) fractional on HSTs
  - from HSTs to general metrics, loses O(log⁴ k)
  - rounding, loses O(1)
O(log² k) on HSTs

Technique: continuous-time online mirror descent
- A differential inclusion gives a trajectory x_t
- It can be discretized: x_t → x_{t+1}

Hopefully some progress on the role of regularization in online algorithms.
Our result

Very coarse discretization works! ≡ a projection-based algorithm works

Thm: A discrete*, projection-based algorithm gives an O(log² k) approximation for fractional k-server on HSTs.

Projection-based algorithms as a natural option for movement-based online problems? [Buchbinder Chen Naor 14]
projection-based algorithm

- A "base" polytope K, based on the metric (HST) and k
- At time t: the polytope K_t of feasible states in which both ALG and OPT need to be (i.e., K + some server at the requested position r_t)
- The distance is a Bregman divergence; we use variants of the KL divergence

Algorithm: x_{t+1} ← argmin_{x ∈ K_{t+1}} distance(x, x_t)
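In code, the scheme on this slide is just a projection loop. A minimal generic sketch (not the actual k-server implementation): `feasible_set` and `project` are hypothetical placeholders for the problem-specific polytope K_{t+1} and the Bregman projection.

```python
def projection_algorithm(x0, requests, feasible_set, project):
    """Generic skeleton of the projection-based scheme: on each request,
    project the current state onto the new feasible set under the chosen
    distance. `feasible_set(r)` describes K_{t+1} for request r, and
    `project(K, x)` returns argmin_{z in K} distance(z, x)."""
    x, trajectory = x0, [x0]
    for r in requests:
        x = project(feasible_set(r), x)
        trajectory.append(x)
    return trajectory

# Toy 1-D instance: K_t = [r_t, infinity) with Euclidean distance,
# so the projection is simply max(r_t, x).
traj = projection_algorithm(0.0, [0.5, 0.2, 0.9],
                            feasible_set=lambda r: r,
                            project=lambda K, x: max(K, x))
# traj == [0.0, 0.5, 0.5, 0.9]: the state only moves when a request is infeasible.
```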
rest of the talk

- Illustration on a simpler problem: online set cover
- Some words about the generalization to k-server
- Closing remarks
online set cover

- State x_{t-1} ∈ R_+^n
- At time t: a new constraint ⟨a_t, x⟩ ≥ 1 for some a_t ∈ {0,1}^n
  - Monotonically increase x_{t-1} → x_t
  - Satisfy all constraints seen so far
- Movement cost at time t: ‖x_t − x_{t-1}‖_1 = ⟨1, x_t − x_{t-1}⟩
- Minimize: the total movement cost Σ_t ⟨1, x_t − x_{t-1}⟩ = ⟨1, x_T⟩
the projection-based algorithm

Define the set of feasible states:
  K_t = all constraints up to time t = { x ≥ 0 : ⟨a_s, x⟩ ≥ 1 ∀ s ≤ t }
(we cannot add the monotonicity constraints x ≥ x_{t-1}: OPT does not satisfy them)

Algorithm:
  x_0 = (1/n, ..., 1/n)
  x_t ← argmin_{x ∈ K_t} D(x‖x_{t-1})
where D(p‖q) = Σ_i [ p_i log(p_i/q_i) − p_i + q_i ] is the unnormalized KL divergence.
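For 0/1 constraint rows this KL projection has a simple multiplicative closed form (an observation from the KKT conditions, not spelled out on the slide): if the new constraint ⟨a_t, x⟩ ≥ 1 is violated, scale the coordinates in the support of a_t by 1/⟨a_t, x⟩ > 1. Since x only increases, all earlier constraints remain satisfied, so projecting against the single new constraint suffices. A minimal Python sketch:

```python
def kl_project(x, a):
    """Project x onto {z >= 0 : <a, z> >= 1} under unnormalized KL.

    For a 0/1 row a, the KKT conditions give a closed form: scale the
    coordinates in the support of a by 1/<a, x> (only when the constraint
    is violated). The scale factor exceeds 1, so x only increases and
    earlier covering constraints stay satisfied."""
    dot = sum(ai * xi for ai, xi in zip(a, x))
    if dot >= 1:
        return x[:]                 # already feasible: the projection is x itself
    c = 1.0 / dot                   # multiplicative update factor, c > 1
    return [xi * c if ai else xi for ai, xi in zip(a, x)]

n = 4
x = [1.0 / n] * n                   # x_0 = (1/n, ..., 1/n)
cost = 0.0
for a in [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]:
    x_new = kl_project(x, a)
    cost += sum(xn - xo for xn, xo in zip(x_new, x))   # <1, x_t - x_{t-1}>
    x = x_new
# The third constraint is already satisfied, so the last step moves nothing.
```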
Guarantee of the projection-based algorithm

Obs: The algorithm is feasible: the iterates x_t are monotonically increasing.

Thm: ALG ≤ log n · OPT + 1

Not new: [Alon et al. 03], [Buchbinder Chen Naor 14], ...
We compare our fractional solution x_t to an integral OPT y_t.
Analysis: cost

Cost at time t: ⟨1, x_t − x_{t-1}⟩

Algo: x_0 = (1/n, ..., 1/n);  x_t ← argmin_{x ∈ K_t} D(x‖x_{t-1}),  where D(p‖q) = Σ_i [ p_i log(p_i/q_i) − p_i + q_i ]

Main property of the divergence: the reverse Pythagorean inequality. For x_proj the projection of x onto a convex set containing y:
  D(y‖x) ≥ D(y‖x_proj) + D(x_proj‖x)
Since D(x_proj‖x) ≥ 0, this gives D(y‖x_proj) − D(y‖x) ≤ 0.
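The reverse Pythagorean inequality is easy to check numerically for the unnormalized KL divergence; the point x, the single halfspace, and the feasible point y below are arbitrary choices for illustration.

```python
import math

def D(p, q):
    """Unnormalized KL divergence: sum_i p_i log(p_i/q_i) - p_i + q_i (0 log 0 = 0)."""
    return sum((pi * math.log(pi / qi) if pi > 0 else 0.0) - pi + qi
               for pi, qi in zip(p, q))

# Project x onto the halfspace {z : z_0 + z_1 >= 1} under D; for a 0/1
# constraint row, the minimizer scales the support coordinates by 1/<a, x>.
x = [0.2, 0.3, 0.4]
c = 1.0 / (x[0] + x[1])
x_proj = [x[0] * c, x[1] * c, x[2]]

y = [1.0, 0.0, 1.0]                 # a point satisfying the constraint (y_0 + y_1 >= 1)

lhs = D(y, x)
rhs = D(y, x_proj) + D(x_proj, x)
assert lhs >= rhs - 1e-12           # reverse Pythagorean inequality
```

Here y happens to lie on the bounding hyperplane, so the inequality is actually tight; for y strictly inside the halfspace it can be strict.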
Analysis: cost

Applying the reverse Pythagorean inequality at time t, with x_t the projection of x_{t-1} onto K_t and y_t ∈ K_t:
  D(y_t‖x_t) − D(y_t‖x_{t-1}) ≤ 0
Analysis: cost

Define the potential Φ(y‖x) = D(y‖x) + ⟨1, y⟩ − ⟨1, x⟩. Rewriting the projection inequality D(y_t‖x_t) − D(y_t‖x_{t-1}) ≤ 0 in terms of Φ:
  ⟨1, x_t − x_{t-1}⟩ + Φ(y_t‖x_t) − Φ(y_t‖x_{t-1}) ≤ 0
Analysis: cost

Adding and subtracting Φ(y_{t-1}‖x_{t-1}) in
  ⟨1, x_t − x_{t-1}⟩ + Φ(y_t‖x_t) − Φ(y_t‖x_{t-1}) ≤ 0
gives
  ⟨1, x_t − x_{t-1}⟩ + Φ(y_t‖x_t) − Φ(y_{t-1}‖x_{t-1}) ≤ Φ(y_t‖x_{t-1}) − Φ(y_{t-1}‖x_{t-1})
  (ALG's cost) + (change in potential) ≤ (≈ OPT's cost?)
Analysis: cost

  ⟨1, x_t − x_{t-1}⟩ + Φ(y_t‖x_t) − Φ(y_{t-1}‖x_{t-1}) ≤ Φ(y_t‖x_{t-1}) − Φ(y_{t-1}‖x_{t-1})

Lemma: RHS ≤ log n · (OPT's cost at time t)
Pf: If OPT increases a coordinate y_i^{t-1} = 0 → y_i^t = 1, then OPT's cost is +1 and
  ΔΦ = 1 · log(1/x_i^{t-1}) − 0 · log(0/x_i^{t-1}) ≤ log n
(because x_i^{t-1} ≥ x_i^0 ≥ 1/n).
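The potential jump in the lemma can be checked numerically (a small illustration; n = 8 and the worst case x_i = 1/n are arbitrary choices): raising y_i from 0 to 1 increases Φ by exactly log(1/x_i^{t-1}), which is at most log n.

```python
import math

def D(p, q):
    """Unnormalized KL divergence (0 log 0 = 0)."""
    return sum((pi * math.log(pi / qi) if pi > 0 else 0.0) - pi + qi
               for pi, qi in zip(p, q))

def Phi(y, x):
    """Potential Phi(y||x) = D(y||x) + <1, y> - <1, x>."""
    return D(y, x) + sum(y) - sum(x)

n = 8
x = [1.0 / n] * n                       # worst case: x_i still at its initial value 1/n
y_old = [0.0] * n
y_new = [1.0] + [0.0] * (n - 1)         # OPT raises y_0 from 0 to 1, paying 1

delta = Phi(y_new, x) - Phi(y_old, x)   # equals log(1/x_0) = log n here
```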
Analysis: cost

  ⟨1, x_t − x_{t-1}⟩ + Φ(y_t‖x_t) − Φ(y_{t-1}‖x_{t-1}) ≤ Φ(y_t‖x_{t-1}) − Φ(y_{t-1}‖x_{t-1}) ≤ log n · (OPT's cost at time t)

Adding over all times t, the potentials telescope:
  ALG's total cost ≤ log n · OPT's total cost + Φ(y_0‖x_0) ≤ log n · OPT's total cost + 1
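The guarantee can be exercised end to end on a toy instance (the constraint rows and the integral OPT below are hand-picked for illustration; the projection uses the multiplicative closed form valid for 0/1 rows): the algorithm's total movement cost stays below log n · OPT + 1.

```python
import math

def project(x, a):
    """KL projection onto {z >= 0 : <a, z> >= 1} for a 0/1 row a:
    scale the support of a violated constraint by 1/<a, x>."""
    dot = sum(ai * xi for ai, xi in zip(a, x))
    if dot >= 1:
        return x[:]
    return [xi / dot if ai else xi for ai, xi in zip(a, x)]

n = 4
constraints = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]

x = [1.0 / n] * n
alg_cost = 0.0
for a in constraints:
    x_new = project(x, a)
    alg_cost += sum(xn - xo for xn, xo in zip(x_new, x))   # <1, x_t - x_{t-1}>
    x = x_new

y = [1, 0, 1, 0]          # integral OPT: buying sets 0 and 2 covers every row
assert all(sum(ai * yi for ai, yi in zip(a, y)) >= 1 for a in constraints)
opt_cost = sum(y)

assert alg_cost <= math.log(n) * opt_cost + 1
```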
rest of the talk

- Illustration on a simpler problem: online set cover
- Some words about the generalization to k-server
- Closing remarks
k-server polytope and distance

Use the non-trivial LP from [Bubeck Cohen Lee Lee Madry 18] (~ a unary encoding of the number of servers + ...):

  K = { x : x_{u,j} ∈ [0,1],
        x_{root,j} ≤ 1[j ≤ k]  ∀ j ∈ [n],
        Σ_{j ≤ |S|} x_{p(S),j} ≥ Σ_{(v,ℓ) ∈ S} x_{v,ℓ}  ∀ S with a common parent }

  (actual polytope: the "anti-server" polytope)

  K_t = K ∩ { x_{r_t} ≥ 1 }

Also use the divergence from [Bubeck Cohen Lee Lee Madry 18]:

  D(p‖q) = Σ_u w_u Σ_j [ p_{u,j} log(p_{u,j}/q_{u,j}) − p_{u,j} + q_{u,j} ]
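The weighted divergence on this slide transcribes directly into code (a sketch; the flat indexing of the HST nodes and the choice of weights below are assumptions for illustration):

```python
import math

def hst_divergence(w, p, q):
    """Weighted unnormalized KL divergence over the HST nodes: each node u
    carries a weight w[u] and a vector (p[u][0], ..., p[u][k-1]) coming
    from the unary encoding of the server mass (0 log 0 = 0)."""
    return sum(w[u] * sum((p[u][j] * math.log(p[u][j] / q[u][j]) if p[u][j] > 0 else 0.0)
                          - p[u][j] + q[u][j]
                          for j in range(len(p[u])))
               for u in range(len(w)))

# Toy check on a 2-node HST with k = 2: the divergence vanishes iff the
# states agree, and is positive otherwise.
p = [[0.5, 0.5], [1.0, 0.0]]
q = [[0.6, 0.4], [0.9, 0.1]]
same = hst_divergence([1.0, 2.0], p, p)
diff = hst_divergence([1.0, 2.0], p, q)
```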
proof elements

Heavily inspired by the proof of [Bubeck Cohen Lee Lee Madry 18].

  K_t = { x : x_{u,j} ∈ [0,1],
          x_{root,j} ≥ 1[j > k]  ∀ j ∈ [n],
          Σ_{j ≤ |S|} x_{p(S),j} ≥ Σ_{(v,ℓ) ∈ S} x_{v,ℓ}  ∀ S with a common parent,
          x_{r_t,1} ≤ δ }

- Simplification and KKT
- Potential: D + linear terms
- The cost function is not linear anymore (‖·‖_1-type, but no monotonicity)
- A stronger version of the reverse Pythagorean inequality, related to the duals
- To avoid dependence on the height: an additional potential; delicate
closing remarks

Discrete projection-based algorithms for k-server (and paging):
- O(log² k)-competitive for fractional k-server on HSTs; matches the results of [Bubeck et al. 18]
- Show a tight O(log k) result

Open directions:
- k-server on HSTs, and on general graphs?
- Other problems?
- What are the crucial properties of the LP? The right divergence?
Thank you!