Randomized Composable Core-sets for Submodular Maximization
Morteza Zadimoghaddam and Vahab Mirrokni
Google Research, New York
Outline
– Core-sets and their applications
– Why submodular functions?
– A simple algorithm for monotone and non-monotone submodular functions
– Improving the approximation factor for monotone submodular functions
Processing Big Data
Input is too large to fit in one machine. Extract and process a compact representation:
– Sampling: focus only on a small subset of the data
– Sketching: compute a small summary of the data, e.g. mean, variance, …
– Mergeable summaries: multiple summaries that can be merged while preserving accuracy; see [Ahn, Guha, McGregor PODS, SODA 2012], [Agarwal et al. PODS 2012]
– Composable core-sets [Indyk et al. 2014]
Composable Core-sets
– Partition the input into several parts T_1, T_2, …, T_m
– In each part, select a subset S_i ⊆ T_i
– Take the union of the selected sets: S = S_1 ∪ S_2 ∪ … ∪ S_m
– Solve the problem on S
Evaluation: we want the set S to represent the original big input well, and to preserve the optimum solution approximately.
Core-set Applications
– Diversity and coverage maximization; Indyk, Mahabadi, Mahdian, Mirrokni [PODS 2014]
– k-means and k-median; Balcan et al. [NIPS 2013]
– Distributed balanced clustering; Bateni et al. [NIPS 2014]
Submodular Functions
A non-negative set function f defined on subsets of a ground set N, i.e. f: 2^N → R^+ ∪ {0}.
f is submodular iff for any two subsets A and B:
– f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B)
Alternative definition: f is submodular iff for any two subsets A ⊆ B and any element x ∉ B:
– f(A ∪ {x}) − f(A) ≥ f(B ∪ {x}) − f(B)
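For example, coverage functions are monotone submodular: f(A) counts the ground elements covered by the sets indexed by A. A minimal Python sketch (illustrative, not from the talk) checks the diminishing-returns definition on a toy instance:

```python
# Toy coverage function: f(A) = |union of the sets indexed by A|.
# Coverage functions are monotone submodular, so diminishing returns must hold.
SETS = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}

def f(A):
    """Coverage value of a collection A of set indices."""
    return len(set().union(*(SETS[i] for i in A)))

def marginal(x, A):
    """Delta(x, A) = f(A | {x}) - f(A)."""
    return f(A | {x}) - f(A)

A, B, x = {1}, {1, 2}, 3                  # A is a subset of B, x outside B
assert marginal(x, A) >= marginal(x, B)   # 3 >= 2: {c,d,e} helps A more than B
```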
Submodular Maximization with Cardinality Constraints
Given a cardinality constraint k, find a size-k subset of N with maximum f value.
Machine learning applications: exemplar-based clustering, active set selection, graph cuts, … [Mirzasoleiman, Karbasi, Sarkar, Krause NIPS 2013]
Applications for search engines: coverage utility functions.
Submodular Maximization via MapReduce
Fast greedy algorithms in MapReduce and streaming; Kumar et al. [SPAA 2013]. They achieve a 1/2 approximation factor with a super-constant number of rounds.
Searched Keywords with Multiple Meanings
Formal Definition of Composable Core-sets
Define f_k(T) := max_{S ⊆ T, |S| ≤ k} f(S); e.g. f_k(N) is the value of the optimum solution.
Define ALG(T) to be the output of algorithm ALG on input set T, and suppose |ALG(T)| ≤ k.
ALG is an α-approximate composable core-set iff for any collection of sets T_1, T_2, …, T_m we have:
f_k(ALG(T_1) ∪ ALG(T_2) ∪ … ∪ ALG(T_m)) ≥ α · f_k(T_1 ∪ T_2 ∪ … ∪ T_m)
Applications of Composable Core-sets
Distributed approximation:
– Distribute the input between m machines,
– ALG selects set S_i = ALG(T_i) on machine i, for 1 ≤ i ≤ m,
– Gather the union of selected items, S_1 ∪ S_2 ∪ … ∪ S_m, on a single machine, and select k elements.
Streaming models: partition the sequence of elements and simulate the above procedure.
A class of nearest neighbor search problems.
Bad News!
Theorem [Indyk et al. 2014]: There exists no better than Õ(1/√k)-approximate composable core-set for submodular maximization.
Randomization Comes to the Rescue
Instead of working with a worst-case collection of sets T_1, T_2, …, T_m, suppose we have a random partitioning of the input.
We say ALG is an α-approximate randomized composable core-set iff
E[f_k(ALG(T_1) ∪ ALG(T_2) ∪ … ∪ ALG(T_m))] ≥ α · f_k(N),
where the expectation is taken over the random choice of {T_1, T_2, …, T_m}.
General Framework
The input set N is randomly partitioned into T_1, T_2, …, T_m, one part per machine. Run ALG on each machine to select the elements S_i = ALG(T_i). Run ALG′ on the union S_1 ∪ S_2 ∪ … ∪ S_m to find the final size-k output set. (A code sketch of this framework follows.)
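A minimal Python sketch of the framework (the function names and signatures are my own; `alg` and `alg_final` stand for ALG and ALG′, e.g. the Greedy procedure defined two slides below):

```python
import random

def coreset_framework(items, m, k, alg, alg_final, f):
    """Randomly partition `items` into m parts, run `alg` on each part
    to select a core-set S_i, then run `alg_final` on the union."""
    parts = [[] for _ in range(m)]
    for x in items:
        parts[random.randrange(m)].append(x)   # each item to one machine u.a.r.
    union = set()
    for part in parts:                         # one call per machine ("map")
        union |= set(alg(part, k, f))
    return alg_final(union, k, f)              # final selection ("reduce")
```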
Good News!
Theorem [Mirrokni, Z]: There exists a class of O(1)-approximate randomized composable core-sets for monotone and non-monotone submodular maximization. In particular, algorithm Greedy is a 1/3-approximate randomized core-set for monotone f, and (1/3 − 1/(3m))-approximate for non-monotone f.
Family of β-nice Algorithms
ALG is β-nice if for any set T and element x ∈ T \ ALG(T) we have:
– ALG(T) = ALG(T \ {x})
– Δ(x, ALG(T)) is at most β·f(ALG(T))/k,
where Δ(x, A) is the marginal value of adding x to set A, i.e. Δ(x, A) = f(A ∪ {x}) − f(A).
Theorem: A β-nice algorithm is a (1/(2+β))-approximate randomized composable core-set for monotone f and ((1 − 1/m)/(2+β))-approximate for non-monotone f.
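A tiny Python checker for the second β-nice condition (a hypothetical helper, assuming set-valued inputs and a set function f):

```python
def beta_nice_marginal_bound(T, S, f, k, beta):
    """Check that every unselected x in T has marginal value
    Delta(x, S) = f(S | {x}) - f(S) at most beta * f(S) / k."""
    S = set(S)
    return all(f(S | {x}) - f(S) <= beta * f(S) / k for x in set(T) - S)
```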
Algorithm Greedy
Given input set T, Greedy returns a size-k output set S as follows:
– Start with an empty set S.
– For k iterations: find an item x ∈ T with maximum marginal value to S, Δ(x, S), and add x to S.
Remark: Greedy is a 1-nice algorithm. In the rest, we analyze algorithm Greedy for a monotone submodular function f.
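A direct Python rendering of Greedy (a sketch; it can be plugged in as both `alg` and `alg_final` in the framework sketch above):

```python
def greedy(T, k, f):
    """Repeatedly add the element of T with the largest marginal value
    Delta(x, S) = f(S | {x}) - f(S), until |S| = k or T is exhausted."""
    S, pool = set(), set(T)
    for _ in range(min(k, len(pool))):
        base = f(S)
        best = max(pool, key=lambda x: f(S | {x}) - base)
        S.add(best)
        pool.remove(best)
    return S
```

Each iteration scans every remaining candidate, so this uses O(|T|·k) evaluations of f.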
Analysis
Let OPT be the size-k subset with maximum value of f. Let OPT′ := OPT ∩ (S_1 ∪ S_2 ∪ … ∪ S_m), and OPT″ := OPT \ OPT′.
We prove that E[max{f(OPT′), f(S_1), f(S_2), …, f(S_m)}] ≥ f(OPT)/3.
Linearizing Marginal Contributions of Elements in OPT
Consider an arbitrary permutation π on the elements of OPT. For each x ∈ OPT, define OPT_x to be the elements of OPT that appear before x in π.
By the definition of the Δ values, we have: f(OPT) = Σ_{x∈OPT} Δ(x, OPT_x).
Lower Bounding f(OPT′)
f(OPT′) = Σ_{x∈OPT′} Δ(x, OPT_x ∩ OPT′). Using submodularity (OPT_x ∩ OPT′ ⊆ OPT_x), we have: Δ(x, OPT_x ∩ OPT′) ≥ Δ(x, OPT_x).
Therefore: f(OPT′) ≥ Σ_{x∈OPT′} Δ(x, OPT_x).
Since f(OPT) = Σ_{x∈OPT′} Δ(x, OPT_x) + Σ_{x∈OPT″} Δ(x, OPT_x), it suffices to upper bound Σ_{x∈OPT″} Δ(x, OPT_x).
Proof Scheme
Goal: lower bound max{f(OPT′), f(S_1), f(S_2), …, f(S_m)}. (Figure: OPT split into the selected part OPT′ and the unselected part OPT″.)
– f(OPT) = Σ_{x∈OPT} Δ(x, OPT_x) and f(OPT′) ≥ Σ_{x∈OPT′} Δ(x, OPT_x), so it suffices to upper bound Σ_{x∈OPT″} Δ(x, OPT_x).
– For each x ∈ T_i ∩ OPT″: Δ(x, S_i) ≤ f(S_i)/k, hence Σ_{1≤i≤m} Σ_{x∈OPT″∩T_i} Δ(x, S_i) ≤ max_i{f(S_i)}, since |OPT″| ≤ k.
– How large can Δ(x, OPT_x) − Δ(x, S_i) be?
Upper Bounding the Δ Reductions
By submodularity: Δ(x, OPT_x) − Δ(x, S_i) ≤ Δ(x, OPT_x) − Δ(x, OPT_x ∪ S_i).
Summing over OPT: Σ_{x∈OPT} [Δ(x, OPT_x) − Δ(x, OPT_x ∪ S_i)] = f(OPT) − (f(OPT ∪ S_i) − f(S_i)) ≤ f(S_i), using monotonicity.
In the worst case: Σ_{1≤i≤m} Σ_{x∈OPT″∩T_i} [Δ(x, OPT_x) − Δ(x, S_i)] ≤ Σ_{1≤i≤m} f(S_i).
In expectation (each x lands in T_i with probability 1/m): Σ_{1≤i≤m} Σ_{x∈OPT″∩T_i} [Δ(x, OPT_x) − Δ(x, S_i)] ≤ Σ_{1≤i≤m} f(S_i)/m.
Conclusion: E[f(OPT′)] ≥ f(OPT) − max_i{f(S_i)} − Average_i{f(S_i)}. Greedy is a 1/3-approximate randomized core-set.
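Spelling out the final step (my arithmetic, using the conclusion above and Average_i ≤ max_i):

```latex
\max\{f(\mathrm{OPT}'),\ \max_i f(S_i)\}
  \ \ge\ \tfrac{1}{3}\bigl(f(\mathrm{OPT}') + 2\max_i f(S_i)\bigr),
\]
so, taking expectations,
\[
\mathbb{E}\bigl[\max\{\cdot\}\bigr]
  \ \ge\ \tfrac{1}{3}\bigl(f(\mathrm{OPT}) + \mathbb{E}[\max_i f(S_i)]
        - \mathbb{E}[\mathrm{Avg}_i f(S_i)]\bigr)
  \ \ge\ \tfrac{1}{3}\, f(\mathrm{OPT}).
```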
Distributed Approximation Factor
Run Greedy on each machine to select S_1, S_2, …, S_m, then run Greedy on the union to find the final size-k output set with value ≥ (1 − 1/e)·f(OPT)/3: there exists a solution of value f(OPT)/3 in S_1 ∪ S_2 ∪ … ∪ S_m.
Take the maximum of max_i{f(S_i)} and Greedy(S_1 ∪ S_2 ∪ … ∪ S_m) to achieve a 0.27 approximation factor (a short balancing calculation follows).
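A short calculation, my reading of where 0.27 comes from: write max_i f(S_i) = x·f(OPT); by the analysis above the union contains a solution of value ≥ (1 − 2x)·f(OPT) in expectation, of which the second-stage Greedy recovers a (1 − 1/e) fraction. Balancing the two candidates:

```latex
\max\bigl\{\,x,\ (1 - 1/e)(1 - 2x)\,\bigr\}
\text{ is smallest when }
x = (1 - 1/e)(1 - 2x)
\;\Longrightarrow\;
x = \frac{1 - 1/e}{1 + 2(1 - 1/e)} \approx 0.279,
```

which yields the stated 0.27 factor.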
Improving Approximation Factors for Monotone Submodular Functions
Hardness result [Mirrokni, Z]: With output sizes |S_i| ≤ k, Greedy and locally optimal algorithms are no better than 1/2-approximate randomized core-sets.
Can we increase the output sizes and get better results?
0.585-approximate Randomized Core-set
Positive result [Mirrokni, Z]: If we increase the output sizes to 4k, Greedy will be a ((2 − √2) − o(1))-approximate (≈ 0.585) randomized core-set for a monotone submodular function.
Remark: In this result, we send each item to C random machines instead of one. As a result, the approximation factor is reduced by an O(ln(C)/C) term.
0.545 Distributed Approximation Factor
If we run Greedy again at the second stage, the distributed approximation factor will only be 0.585·(1 − 1/e) ≈ 0.37 < 0.545.
Positive distributed result [Mirrokni, Z]: We propose a new algorithm, PseudoGreedy, for the second stage, with distributed approximation factor 0.545. We still use Greedy in the first stage to return 4k elements from each machine.
Improved Distributed Approximation Factor
Run Greedy on each machine and return 4k items; gather the selected elements S_1, S_2, …, S_m; run PseudoGreedy on the union to find the final size-k output set with value ≥ 0.545·f(OPT).
There exists a size-k subset of the union with value ≥ 0.585·f(OPT).
Analysis: Greedy is a 0.585-approximate Core-set
Define OPT1 := {x ∈ OPT : x ∈ Greedy(T_1 ∪ {x})} with K_1 := |OPT1|, and OPT2 := {x ∈ OPT : x ∉ Greedy(T_1 ∪ {x})} with K_2 := |OPT2|.
Assume that OPT1 ⊆ OPT′, up to an O(ln(C)/C) error term.
Use OPT2 to lower bound the marginal gains of the items selected in the first machine, S_1.
Analysis (cont’d)
Greedy starts with an empty set and adds items y_1, y_2, …, y_{4k} to S_1. Define Z_i := {y_1, y_2, …, y_i}, so Z_1 = {y_1}, Z_2 = {y_1, y_2}, and S_1 = Z_{4k}.
Marginal gain of y_i: Δ(y_i, Z_{i−1}) ≥ Σ_{x∈OPT2} Δ(x, Z_{i−1})/K_2 ≥ Σ_{x∈OPT2} Δ(x, OPT_x ∪ Z_{i−1})/K_2, since every x ∈ OPT2 was available to Greedy but never beat its choices.
Adding y_i reduces the Δ values: the drops in Σ_{x∈OPT2} Δ(x, OPT_x ∪ Z_i) and Σ_{x∈OPT1} Δ(x, OPT_x ∪ Z_i) become the linear program variables a_i and b_i of the next slide.
Analysis (cont’d): Factor-Revealing LP
For each 1 ≤ i ≤ 4k, define:
– a_i := Σ_{x∈OPT2} Δ(x, OPT_x ∪ Z_{i−1}) − Σ_{x∈OPT2} Δ(x, OPT_x ∪ Z_i)
– b_i := Σ_{x∈OPT1} Δ(x, OPT_x ∪ Z_{i−1}) − Σ_{x∈OPT1} Δ(x, OPT_x ∪ Z_i)
Assume f(OPT) = 1, and let α := Σ_{x∈OPT2} Δ(x, OPT_x) and β := f_k(S_1 ∪ OPT1).
Linear program LP_{k,K_2}: minimize β subject to
– a_i + b_i ≥ (α − Σ_{1≤i′≤i} a_{i′})/K_2 for 1 ≤ i ≤ 4k
– 1 − α ≥ Σ_{1≤i≤4k} b_i
– β ≥ 1 − α + Σ_{j∈L} a_j for every set L ⊆ {1, 2, …, 4k} with |L| = K_2
Analysis: Wrapping Up
LP_{k,K_2} is lower bounded by β in the following mathematical program:
minimize β subject to 0 ≤ α, β, λ, r ≤ 1 and
– β ≥ 1 − α + (1 − e^{−r})·α + (1 − r)·λ
– 1 − α ≥ (e^{−r}·α − λ)²/(2λ)
The minimum occurs at: r = 0, λ = 1 − √2/2, α = √2/2, and β = 2 − √2 ≈ 0.586.
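A quick numerical sanity check (not from the talk; the values λ = 1 − √2/2 and α = √2/2 are inferred from β = 2 − √2) that this point is feasible and both constraints are tight:

```python
import math

r = 0.0
lam = 1 - math.sqrt(2) / 2        # lambda ~ 0.2929
alpha = math.sqrt(2) / 2          # alpha  ~ 0.7071

# first constraint, with equality, gives beta:
beta = 1 - alpha + (1 - math.exp(-r)) * alpha + (1 - r) * lam
print(beta, 2 - math.sqrt(2))     # both ~ 0.585786

# second constraint: 1 - alpha >= (e^{-r} * alpha - lambda)^2 / (2 * lambda)
lhs = 1 - alpha
rhs = (math.exp(-r) * alpha - lam) ** 2 / (2 * lam)
print(abs(lhs - rhs) < 1e-9)      # True: holds with equality
```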
Algorithm PseudoGreedy
For all 1 ≤ K_2 ≤ k:
– Set K′ := K_2/4 and K_1 := k − K_2
– Partition the first 8K′ items of S_1 into sets {A_1, …, A_8}
– For each L ⊆ {1, …, 8}:
  – Let S′ be the union of the A_i with i ∈ L
  – Among the selected items, greedily insert K_1 + (4 − |L|)·K′ items into S′
  – If f(S′) > f(S) then S := S′
Return S
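A Python sketch of PseudoGreedy as described above (a sketch under assumptions, not the paper's exact pseudocode: I assume `selected` is the union S_1 ∪ … ∪ S_m, restrict K_2 to multiples of 4 so that K′ is integral, and use a greedy top-up helper of my own):

```python
from itertools import combinations

def greedy_insert(seed, pool, t, f):
    """Greedily add t elements from `pool` to a copy of `seed`."""
    S, pool = set(seed), set(pool) - set(seed)
    for _ in range(min(t, len(pool))):
        base = f(S)
        best = max(pool, key=lambda x: f(S | {x}) - base)
        S.add(best)
        pool.remove(best)
    return S

def pseudo_greedy(S1, selected, k, f):
    """Try every split k = K1 + K2; partition the first 8K' greedy picks of
    machine 1 into blocks A_1..A_8; seed a candidate with each subset of
    blocks, top it up greedily to k items, and keep the best candidate."""
    S1 = list(S1)                        # machine 1's output, in greedy order
    best = set()
    for K2 in range(4, k + 1, 4):        # sketch: keep K' = K2/4 integral
        Kp, K1 = K2 // 4, k - K2
        blocks = [S1[i * Kp:(i + 1) * Kp] for i in range(8)]
        for size in range(9):
            for L in combinations(range(8), size):
                seed = set().union(*(blocks[i] for i in L)) if L else set()
                t = K1 + (4 - size) * Kp     # |seed| + t = k when blocks full
                if t < 0:                    # skip infeasible subsets
                    continue
                cand = greedy_insert(seed, selected, t, f)
                if f(cand) > f(best):
                    best = cand
    return best
```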
Small Size Core-sets
Each machine can only select k′ ≪ k items. For random partitions, both the distributed and core-set approximation factors are …; for worst-case partitions, they are …
Thank you! Questions?