Submodular Function Maximization with Packing Constraints via MWU


1 Submodular Function Maximization with Packing Constraints via MWU
Chandra Chekuri Univ. of Illinois, Urbana-Champaign 11th Israeli CS Theory Day, Tel-Aviv, Dec 11, 2018

2 Max k-Cover U = {1, 2, …, m} X1, X2, …, Xn each a subset of U
Max k-Cover: choose k of the sets X1, X2, …, Xn so as to maximize the size of their union. (The slide shows the sets X1, …, X5 over the ground set U.)

3 Greedy for Max-k-Cover
Greedy:
For i = 1 to k do:
  add the largest-cardinality set in the residual instance to the solution
  remove the newly covered elements
This is a special case of a general result on submodular functions [Nemhauser-Wolsey-Fisher'78].
Theorem: Greedy yields a (1 - 1/e) approximation.
Theorem [Feige'98]: Unless P = NP, there is no (1 - 1/e + ε) approximation for Max-k-Cover.
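To make the greedy routine concrete, here is a minimal Python sketch; the instance at the bottom (the sets and k) is made up purely for illustration.

```python
def greedy_max_k_cover(sets, k):
    """Pick k sets, each time taking the one that covers the most uncovered elements."""
    covered = set()
    chosen = []
    for _ in range(k):
        # Largest-cardinality set in the residual instance.
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break  # nothing left to gain
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}, {2, 5}]
print(greedy_max_k_cover(sets, k=2))   # ([0, 2], {1, 2, 3, 4, 5, 6})
```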

4 Submodular Set Functions
A function f : 2^N → R is submodular if
f(A + j) − f(A) ≥ f(B + j) − f(B) for all A ⊆ B and j ∉ B.
Equivalently, f(A + j) − f(A) ≥ f(A + i + j) − f(A + i) for all A and i, j ∉ A.
Equivalently, f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) for all A, B ⊆ N.

5 Coverage in Set Systems
X1, X2, ..., Xn are subsets of a set U. Take N = [n] and f(A) = |∪_{i∈A} X_i|.
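As a quick sanity check, the Python sketch below evaluates a coverage function on a small hand-made instance and brute-forces the submodularity inequality from the previous slide; the sets X are illustrative only.

```python
from itertools import chain, combinations

X = [{1, 2, 3}, {3, 4}, {4, 5}, {1, 5}]          # X_1, ..., X_n (toy instance)
N = range(len(X))

def f(A):
    """Coverage function f(A) = |union of X_i over i in A|."""
    return len(set().union(*(X[i] for i in A))) if A else 0

def subsets(ground):
    s = list(ground)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# Check f(A + j) - f(A) >= f(B + j) - f(B) for all A ⊆ B and j ∉ B.
ok = all(
    f(set(A) | {j}) - f(set(A)) >= f(set(B) | {j}) - f(set(B))
    for B in subsets(N) for A in subsets(B) for j in N if j not in B
)
print("coverage is submodular on this instance:", ok)   # True
```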

6 Cut functions in graphs
G = (V, E) an undirected graph; f : 2^V → R_+ where f(S) = |δ(S)|, the number of edges crossing the cut (S, V \ S).

7 Submodular Set Functions
Monotone: f(A) ≤ f(B) for all A ⊆ B. Examples: coverage in set systems, matroid rank functions.
Non-negative: f(A) ≥ 0 for all A. Example: cut functions of directed graphs.
Symmetric: f(A) = f(N \ A) for all A. Example: cut functions of undirected graphs.

8 Constrained Submodular Function Maximization
max f(S) subject to |S| ≤ k
max f(S) subject to S independent in a matroid
max f(S) subject to S independent in the intersection of p matroids
max f(S) subject to A·1_S ≤ b (packing constraints)
max f(S) subject to S independent in a more general constraint family

9 Constrained Submodular Function Maximization
A classical topic from the late 70's [Cornuejols-Fisher-Nemhauser-Wolsey, …].
Huge progress in (approximation) algorithms and hardness in the last 15 years.
Many new applications: algorithmic game theory, machine learning, data summarization, and others.
Ongoing work …
Caveat: will not be able to mention many references.

10 Explicit Packing Constraints
max f(S) subject to A·1_S ≤ b, where A ∈ R_+^{m×n} and b ∈ R_+^m.
Captures:
Partition and laminar matroid constraints
Knapsack constraint(s)
Matchings in graphs and hypergraphs
Resource allocation problems
Intersections of the above

11 Mathematical programming approach
For submodular function maximization [Calinescu-C-Pal-Vondrak'07].
Initial goal: improve the 1/2 approximation for a matroid constraint to (1 - 1/e).
Necessary for handling packing constraints Ax ≤ b.

12 Math. Programming approach
Relax max f(S) s.t. A·1_S ≤ b to max F(x) s.t. Ax ≤ b, then round x.
Several general results follow via this approach.

13 Multilinear extension of f
[Calinescu-C-Pal-Vondrak'07], inspired by [Ageev-Sviridenko'04].
For f : 2^N → R_+ define F : [0,1]^N → R_+ as follows. Given x = (x_1, x_2, ..., x_n) ∈ [0,1]^N, let R be a random set that includes each i independently with probability x_i. Then
F(x) = E[f(R)] = Σ_{S ⊆ N} f(S) ∏_{i∈S} x_i ∏_{i∈N\S} (1 − x_i).
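A minimal sketch of estimating F(x) by Monte Carlo sampling, assuming a value oracle f (such as the coverage function above); the sample count is a placeholder, and the error shrinks roughly as 1/√samples.

```python
import random

def estimate_F(f, x, samples=1000, rng=random):
    """Estimate F(x) = E[f(R)], with R including each i independently w.p. x[i]."""
    n = len(x)
    total = 0.0
    for _ in range(samples):
        R = {i for i in range(n) if rng.random() < x[i]}
        total += f(R)
    return total / samples

# Example with the coverage function f from the earlier sketch:
# print(estimate_F(f, [0.5, 0.5, 0.5, 0.5]))
```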

14 Properties of F F is neither concave nor convex
F(x) and F'(x) can be estimated by random sampling.
F is a smooth submodular function: ∂²F/∂x_i∂x_j ≤ 0 for all i, j (recall f(A + j) − f(A) ≥ f(A + i + j) − f(A + i) for all A and i, j ∉ A).
∂²F/∂x_i² = 0 for all i (multilinearity).
F is concave along any non-negative direction vector.
If f is monotone: ∂F/∂x_i ≥ 0 for all i, and F(y) ≥ F(x) whenever y ≥ x.
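By multilinearity, ∂F/∂x_j = E[f(R ∪ {j}) − f(R \ {j})] with R sampled as before, so the gradient can also be estimated by sampling. The sketch below assumes the same value-oracle interface and a placeholder sample count.

```python
import random

def estimate_grad_j(f, x, j, samples=1000, rng=random):
    """Estimate the partial derivative of F at x with respect to coordinate j."""
    n = len(x)
    total = 0.0
    for _ in range(samples):
        R = {i for i in range(n) if rng.random() < x[i]}
        total += f(R | {j}) - f(R - {j})
    return total / samples
```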

15 Properties of F F is concave along any non-negative direction vector
For d ≥ 0, the function g(δ) = F(x + δd) is concave in δ.
Hence ⟨F'(x), d⟩ ≥ F(x + d) − F(x).

16 (Approximately) Maximizing F
Relax max f(S) s.t. A·1_S ≤ b to max F(x) s.t. Ax ≤ b, then round x.
Theorem [Vondrak'08]: For any monotone f, there is a (1 - 1/e) approximation for max { F(x) | x ∈ P } where P ⊆ [0,1]^N is any solvable polytope. Algorithm: Continuous-Greedy.
Many subsequent developments, especially for nonnegative functions [C-Vondrak-Zenklusen'11, Feldman-Naor-Schwartz'11, Ene-Nguyen'16, Buchbinder-Feldman'17].

17 Limitation Algorithms based on multilinear relaxation are slow
Need to evaluate F and its derivative F' via sampling.
The step size in discretized versions of continuous-greedy and related algorithms needs to be small.
Ω(n^5) calls to f.

18 This Talk
max f(S) s.t. A·1_S ≤ b, via the relaxation max F(x) s.t. Ax ≤ b.
Faster sequential and parallel algorithms (in vogue recently).
(1 - 1/e - ε) approximation for monotone functions.
Based on [C-Jayram-Vondrak'15] and [C-Quanrud'18].
Will ignore rounding: it is efficient in many cases.

19 Measure of performance
Restrict attention to monotone submodular functions.
Measures of performance:
# of calls to the value oracle for f
# of calls to the gradient oracle for F (which is sometimes available directly)
Depth of computation in the parallel setting

20 Single Packing Constraint
max F(x) subject to a_1 x_1 + a_2 x_2 + … + a_n x_n ≤ b.
x ← 0
While ⟨a, x⟩ < b do:
  j ← argmax_h F'_h(x) / a_h
  x ← x + δ·e_j
Output x
[Wolsey'82, Vondrak'08]

21 Analysis
Lemma: For monotone f, the output satisfies F(x) ≥ (1 - 1/e)·OPT.
Let x* be an optimum fractional solution; wlog ⟨a, x*⟩ = b. Let x be the current point. Since x ∨ x* − x is a non-negative direction,
⟨F'(x), x ∨ x* − x⟩ ≥ F(x ∨ x*) − F(x) ≥ OPT − F(x),
using monotonicity for the last inequality.

22 Analysis
⟨F'(x), x ∨ x* − x⟩ ≥ F(x ∨ x*) − F(x) ≥ OPT − F(x).
By monotonicity F'_j(x) ≥ 0 for all j, hence
Σ_j F'_j(x)·x*_j ≥ ⟨F'(x), x ∨ x* − x⟩ ≥ OPT − F(x).
Since Σ_j a_j x*_j = b, it follows that max_h F'_h(x)/a_h ≥ (OPT − F(x))/b.

23 Analysis
For the chosen coordinate j, F'_j(x)/a_j ≥ (OPT − F(x))/b, so
F(x + δ·e_j) − F(x) = F'_j(x)·δ ≥ δ·a_j·(OPT − F(x))/b.
Let t = Σ_j a_j x_j. Then dF(x)/dt ≥ (OPT − F(x))/b.
Solving this differential equation with initial condition F(0) = 0 gives, at t = b, F(x) ≥ (1 − 1/e)·OPT.

24 Single Packing Constraint
Basic algorithm:
x ← 0
While ⟨a, x⟩ < b do:
  j ← argmax_h F'_h(x) / a_h
  x ← x + δ·e_j
Output x
F is multilinear, so F'_j(x) does not change when only coordinate j of x changes, and all other gradient coordinates only get worse due to submodularity. Hence we can take maximal steps:
x ← 0
While ⟨a, x⟩ < b do:
  j ← argmax_h F'_h(x) / a_h
  δ ← min{ 1, (b − ⟨a, x⟩)/a_j }
  x ← x + δ·e_j
Output x
# of gradient calls: O(n).
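A sketch of this single-constraint algorithm with maximal steps, assuming a gradient oracle grad(x) returning (F'_1(x), ..., F'_n(x)) (e.g. a sampling estimator); the names and tolerances are illustrative. Each iteration saturates a coordinate or exhausts the budget, so at most O(n) gradient calls are made.

```python
def continuous_greedy_knapsack(grad, a, b, tol=1e-9):
    """Greedily fill one coordinate at a time, chosen by bang-per-buck F'_h(x)/a_h."""
    n = len(a)
    x = [0.0] * n
    load = 0.0                                   # current value of <a, x>
    while load < b - tol:
        g = grad(x)
        # Best bang-per-buck among coordinates that are not yet saturated.
        cand = [h for h in range(n) if x[h] < 1.0 - tol]
        if not cand:
            break
        j = max(cand, key=lambda h: g[h] / a[h])
        if g[j] <= 0:
            break
        delta = min(1.0 - x[j], (b - load) / a[j])   # maximal feasible step
        x[j] += delta
        load += a[j] * delta
    return x
```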

25 Parallel Algorithm? max f(S) |𝑆|≤𝑘
[Balkanski-Singer'18] initiated the study of adaptivity in submodular maximization. This is close to standard PRAM/NC models: assume the value oracle for f is parallelizable.

26 Parallel Algorithm? max f(S) |𝑆|≤𝑘
[Balkanski-Singer'18] initiated the study of adaptivity in submodular maximization: assume oracle access to f and count only the number of "rounds" of access to f.
[Balkanski-Rubinstein-Singer'18, Ene-Nguyen'18, Fahrbach-Mirrokni-Zadimoghaddam'18]
[C-Quanrud'18]: (1 - 1/e - ε) approximation in O(log n / ε²) rounds.

27 Parallel Algorithm
max f(S) s.t. |S| ≤ k, relaxed to max F(x) s.t. x_1 + x_2 + … + x_n ≤ k.
Can round x without loss for matroid constraint [CCPV’07]

28 Parallel Algorithm
max F(x) s.t. x_1 + x_2 + … + x_n ≤ k.
Simple Idea: Simultaneously increase all coordinates with best gradient as much as possible

29 Parallel Algorithm
x ← 0
While Σ_j x_j ≤ k do:
  λ ← max_h F'_h(x)   // current best gradient
  Repeat:
    S ← { j : F'_j(x) ≥ (1 − ε)λ }   // good coordinates
    x ← x + δ·1_S   // increase all of them uniformly
  Until S = ∅
Output x
Choose the greedy step δ as large as possible such that S remains good throughout the step, that is,
F(x + δ·1_S) − F(x) ≥ (1 − 2ε)·λ·δ·|S|.
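A sketch of this parallel greedy for the cardinality constraint, assuming oracles F and grad for the multilinear extension and its gradient (e.g. the sampling estimators above); the binary search, tolerances, and round cap are simplifications of the "largest good step" rule.

```python
def parallel_greedy_cardinality(F, grad, n, k, eps, tol=1e-9):
    x = [0.0] * n
    for _ in range(200):                       # crude cap; the analysis stops once lambda < OPT/n
        if sum(x) >= k - tol:
            break
        g = grad(x)
        active = [j for j in range(n) if x[j] < 1.0 - tol]
        if not active:
            break
        lam = max(g[j] for j in active)        # current best gradient
        if lam <= 0:
            break
        while True:
            g = grad(x)
            S = {j for j in active if g[j] >= (1 - eps) * lam and x[j] < 1.0 - tol}
            if not S:
                break
            # Binary search for the largest step delta keeping S (1 - 2*eps)-good,
            # while respecting x <= 1 and sum(x) <= k.
            hi = min(min(1.0 - x[j] for j in S), (k - sum(x)) / len(S))
            lo, base = 0.0, F(x)
            for _ in range(30):
                mid = (lo + hi) / 2
                y = [x[j] + mid if j in S else x[j] for j in range(n)]
                if F(y) - base >= (1 - 2 * eps) * lam * mid * len(S):
                    lo = mid
                else:
                    hi = mid
            if lo <= tol:
                return x                       # no measurable progress; stop
            for j in S:
                x[j] += lo
    return x
```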

30 Consequences of greedy step size
All coordinates that are simultaneously increased are good throughout the greedy step, so the approximation ratio is similar to that of the single-coordinate algorithm: (1 - 1/e - ε).
Main issue: # of iterations / depth.

31 Consequences of greedy step size
Claim: After a greedy step, |S| goes down by a (1 − ε) factor.
So the # of inner iterations is O((log n)/ε).
The # of outer iterations is O((log n)/ε): λ decreases by a (1 − ε) factor each time, starts at OPT, and we can stop when λ < OPT/n.
Thus the total # of iterations is O((log n)²/ε²), which can be improved to O((log n)/ε²).

32 Consequences of greedy step size
Claim: After a greedy step, |S| goes down by a (1 − ε) factor.
Let S = { j : F'_j(x) ≥ (1 − ε)λ }, y = x + δ·1_S, and S' = { j : F'_j(y) ≥ (1 − ε)λ }.
If |S'| > (1 − ε)|S|, then for some δ' > δ we would have
F(x + δ'·1_{S'}) − F(x) ≥ (1 − ε)λ·δ'·|S'| ≥ (1 − ε)²λ·δ'·|S| > (1 − 2ε)λ·δ'·|S|
by submodularity and monotonicity, contradicting the choice of δ.

33 Comments Algorithm is deterministic assuming 𝐹’ is available
The greedy step can be implemented via binary search.
The continuous algorithm can be converted into a randomized combinatorial algorithm relatively easily.

34 Multiple Constraints
max F(x) s.t. Ax ≤ b, normalized to max F(x) s.t. Ax ≤ 1, i.e.,
A_11 x_1 + A_12 x_2 + … + A_1n x_n ≤ 1
⋮
A_m1 x_1 + A_m2 x_2 + … + A_mn x_n ≤ 1

35 MWU framework for Submod
[C-Jayram-Vondrak ITCS 2015, Israel], inspired by [Azar-Gamzu'13].
Maintain positive weights w_1, w_2, …, w_m for the constraints.
"Solve the Lagrangean relaxation" obtained by collapsing the constraints into one: max F(x) s.t. w^T A x ≤ Σ_i w_i.
Add a small amount of solution to x.
Update the weights, maintaining them to be exponential in the load on the constraints: w = exp(η·Ax).
Iterate for "one unit of time".

36 MWU Based Algorithm
max F(x) s.t. Ax ≤ 1, via the collapsed relaxation max F(x) s.t. w^T A x ≤ Σ_i w_i.
x ← 0, w ← 1, time ← 0
While time ≤ 1 do:
  For each coordinate h: a_h ← (Σ_i w_i A_ih) / Σ_i w_i
  j ← argmax_h F'_h(x) / a_h
  v ← (1/a_j)·e_j
  x ← x + δ·v
  time ← time + δ
  For each constraint i: w_i ← w_i · exp(η·(A·δv)_i)
Output x

37 MWU Based Algorithm
(Same algorithm as on the previous slide.)
As δ → 0 the algorithm becomes continuous. For η = (ln m)/ε:
F(x) ≥ (1 − 1/e)·OPT and Ax ≤ (1 + ε)·1.
Discretize via the greedy step-size idea of [Garg-Könemann]: O(m log m / ε²) iterations, with O(n) gradient calculations per iteration.
Using bucketing tricks: O(n·polylog(m, n)/ε²) gradient evaluations in total, and O(n²·polylog(m, n)/ε⁴) calls to f.
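A sketch of the sequential MWU-based algorithm, assuming a gradient oracle grad(x) and A given as a list of m rows of length n; the fixed step size delta stands in for the Garg-Könemann-style adaptive step, and η = (ln m)/ε as on the slide.

```python
import math

def mwu_continuous_greedy(grad, A, eps, delta=0.01, tol=1e-12):
    m, n = len(A), len(A[0])
    eta = math.log(m) / eps
    x, w, t = [0.0] * n, [1.0] * m, 0.0
    while t < 1.0:
        W = sum(w)
        # Collapse the m packing constraints into one using the current weights.
        a = [sum(w[i] * A[i][h] for i in range(m)) / W for h in range(n)]
        g = grad(x)
        cand = [h for h in range(n) if x[h] < 1.0 - tol and a[h] > 0]
        if not cand:
            break
        j = max(cand, key=lambda h: g[h] / a[h])
        if g[j] <= 0:
            break
        # Move in direction v = e_j / a_j, so the collapsed constraint is used at unit rate.
        dt = min(delta, 1.0 - t, a[j] * (1.0 - x[j]))
        x[j] += dt / a[j]
        t += dt
        # Each weight grows exponentially in the load its constraint receives.
        for i in range(m):
            w[i] *= math.exp(eta * A[i][j] * dt / a[j])
    return x
```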

38 Parallel Algorithm
max F(x) s.t. Ax ≤ 1. Combine:
the idea for parallelizing the single-constraint (knapsack) algorithm: simultaneously increase all good coordinates;
the parallel version of MWU for packing LPs [Luby-Nisan'94, Young'00]: update the x_i multiplicatively.

39 Parallel MWU Algorithm
x ← 0, w ← 1, time ← 0
While time ≤ 1 do:
  W ← Σ_i w_i
  For each coordinate h: a_h ← (Σ_i w_i A_ih) / W
  λ ← max_h F'_h(x) / a_h
  S ← { j : F'_j(x)/a_j ≥ (1 − 3ε)λ }
  While S ≠ ∅ and Σ_i w_i ≤ (1 + ε)·W do:
    x_j ← x_j·(1 + δγ) for j ∈ S, for the maximum δ such that the gradients remain good and the weights remain stable throughout the step
    time ← time + δ
    Update w and S
Output x

40 Properties
(1 - 1/e - ε) approximation.
Depth is O(log²m · log n / ε⁴).
Total # of oracle calls to F' is O(n·poly(log(n)/ε)), and the algorithm is deterministic.
Total # of calls to f is O(n²·poly(log(n)/ε)) if F' is not directly available, and the algorithm is then randomized.

41 Recent Work on Parallel Algorithms
Better dependence on ε and non-monotone functions for packing constraints [Ene-Nguyen-Vladu'18].
Parallel algorithms for matroids and their intersections [Balkanski-Rubinstein-Singer'18, C-Quanrud'18, Ene-Nguyen-Vladu'18].
Parallel Double Greedy: (1/2 - ε) approximation for unconstrained maximization [Chen-Feldman-Karbasi'18, Ene-Nguyen-Vladu'18].

42 Conclusions
The continuous approach to submodular maximization is powerful in theory, and we now have faster algorithms for various settings.
Potentially useful in practice? Needs empirical evaluation.
Can we extend the continuous approach to relaxed notions of submodularity and to other classes of functions?

43 THANKS

44 A Combinatorial Algorithm
Q ← ∅
While |Q| ≤ k do:
  λ ← max_h f_Q(h)   // f_Q(h) = f(Q + h) − f(Q) is the marginal value
  Repeat:
    S ← { j : f_Q(j) ≥ (1 − ε)λ }
    Q ← Q ∪ R, where R is a random sample of about δ|S| elements of S
  Until S = ∅
Output Q
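A sketch of this combinatorial variant, assuming a value oracle f over a finite ground set; the way R "samples δS" is simplified here to a uniformly random subset of S of size about δ|S| for a fixed δ, a stand-in for the adaptive choice in the actual algorithm.

```python
import random

def combinatorial_parallel_greedy(f, ground, k, eps, delta=0.1, rng=random):
    Q = set()

    def marginal(j):                            # f_Q(j) = f(Q + j) - f(Q)
        return f(Q | {j}) - f(Q)

    while len(Q) < k:
        rest = [h for h in ground if h not in Q]
        if not rest:
            break
        lam = max(marginal(h) for h in rest)    # best current marginal value
        if lam <= 0:
            break
        while len(Q) < k:
            S = [j for j in ground if j not in Q and marginal(j) >= (1 - eps) * lam]
            if not S:
                break
            size = max(1, min(int(delta * len(S)), k - len(Q)))
            Q.update(rng.sample(S, size))       # random sample of ~delta*|S| elements
    return Q
```

For instance, this can be run with the coverage function f from the earlier sketch as combinatorial_parallel_greedy(f, range(len(X)), k=2, eps=0.1).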

