Learning Submodular Functions


Learning Submodular Functions. Maria Florina Balcan, LGO, 11/16/2010.

Submodular functions. V = {1, 2, …, n}; set function f : 2^V → ℝ.
f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T), ∀ S, T ⊆ V.
Decreasing marginal returns: f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S), ∀ T ⊆ S ⊆ V, x ∉ S.
Examples:
Vector spaces: let V = {v1, …, vn}, each vi ∈ F^n. For each S ⊆ V, let f(S) = rank(V[S]), the rank of the set of vectors indexed by S.
Concave functions: let h : ℝ → ℝ be concave. For each S ⊆ V, let f(S) = h(|S|).

Submodular set functions. A set function f on V is called submodular if for all S, T ⊆ V: f(S) + f(T) ≥ f(S ∪ T) + f(S ∩ T).
Equivalent diminishing-returns characterization: for T ⊆ S and x ∉ S, f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S); adding x to the smaller set T yields a large improvement, while adding x to the larger set S yields only a small improvement.
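
The diminishing-returns inequality is easy to verify mechanically on small ground sets. Below is a minimal brute-force sketch in Python (our own illustration, not from the talk; the helper name is_submodular is made up):

```python
import math
from itertools import combinations

def is_submodular(f, ground):
    """Brute-force check of f(S) + f(T) >= f(S | T) + f(S & T) over all pairs.

    Exponential in |ground|: intended only for sanity checks on tiny examples.
    """
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    return all(f(S) + f(T) >= f(S | T) + f(S & T)
               for S in subsets for T in subsets)

# Concave-of-cardinality example from the slide: f(S) = h(|S|) with h = sqrt.
f = lambda S: math.sqrt(len(S))
print(is_submodular(f, range(4)))  # True
```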

Example: set cover. We want to cover a floorplan with discs by placing sensors in a building; V is the set of possible sensor locations. For S ⊆ V, f(S) = area (number of positions) covered by sensors placed at S, where each sensor covers the positions within some radius.
Formally: W is a finite set together with a collection of n subsets Wi ⊆ W. For S ⊆ V = {1, …, n}, define f(S) = |∪_{i∈S} Wi|.

Set cover is submodular. Take T = {W1, W2} and S = {W1, W2, W3, W4}, so T ⊆ S. A new set x covers at least as much new area relative to T as relative to S: f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S).
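
A toy Python sketch of this example (ours, not from the talk; the sets W below are made up), showing the two marginal gains:

```python
def coverage(sets):
    """Return f with f(S) = |union of sets[i] for i in S| (a coverage function)."""
    def f(S):
        covered = set()
        for i in S:
            covered |= sets[i]
        return len(covered)
    return f

W = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {6, 7}, {7, 8, 9}]
f = coverage(W)
T = frozenset({0, 1})              # T = {W1, W2}
S = frozenset({0, 1, 2, 3})        # S = {W1, W2, W3, W4}, so T is a subset of S
x = 4                              # the new set W5 = {7, 8, 9}
print(f(T | {x}) - f(T))           # 3: large improvement relative to T
print(f(S | {x}) - f(S))           # 2: small improvement (S already covers 7)
```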

Submodular functions. V = {1, 2, …, n}; set function f : 2^V → ℝ.
f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T), ∀ S, T ⊆ V.
Decreasing marginal returns: f(T ∪ {x}) − f(T) ≤ f(S ∪ {x}) − f(S), ∀ S ⊆ T ⊆ V, x ∉ T.
Examples:
Vector spaces: let V = {v1, …, vn}, each vi ∈ F^n. For each S ⊆ V, let f(S) = rank(V[S]).
Concave functions: let h : ℝ → ℝ be concave. For each S ⊆ V, let f(S) = h(|S|).

Submodular functions. V = {1, 2, …, n}; set function f : 2^V → ℝ.
f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T), ∀ S, T ⊆ V.
Monotone: f(S) ≤ f(T), ∀ S ⊆ T.
Non-negative: f(S) ≥ 0, ∀ S ⊆ V.

Submodular functions. There is a lot of work on optimization and submodularity: submodular functions can be minimized in polynomial time, and in algorithmic game theory they model decreasing marginal utilities. Substantial interest in the ML community recently: tutorials and workshops at ICML, NIPS, etc.; www.submodularity.org/ is owned by the ML community.

Learnability of submodular fns. It is important to also understand their learnability.
Previous work: exact learning with value queries, Goemans, Harvey, Iwata, Mirrokni, SODA 2009 [GHIM'09].
Model: there is an unknown submodular target function; the algorithm is allowed to (adaptively) pick sets and query the value of the target on those sets. Can we learn the target with a polynomial number of queries, in polynomial time? The output must be a function that approximates the target within a factor α on every single subset.

Exact learning with value queries (Goemans, Harvey, Iwata, Mirrokni, SODA 2009).
Theorem (general upper bound): there is an algorithm for learning a submodular function with an approximation factor O(n^{1/2}).
Theorem (general lower bound): any algorithm for learning a submodular function must have an approximation factor of Ω(n^{1/2}).

Problems with the GHIM model:
- The lower bound fails if our goal is to do well on most of the points.
- Many simple functions that are easy to learn in the PAC model (e.g., conjunctions) are impossible to recover exactly from a polynomial number of queries.
- It is well known that value queries are undesirable in some learning applications.
Is there a better model that gets around these problems?

Problems with the GHIM model:
- The lower bound fails if our goal is to do well on most of the points.
- Many simple functions that are easy to learn in the PAC model (e.g., conjunctions) are impossible to recover exactly from a polynomial number of queries.
- It is well known that value queries are undesirable in some learning applications.
Learning submodular fns in a distributional learning setting [BH10].

Our model: passive supervised learning. [Diagram] A data source draws examples from a distribution D on {0,1}^n; an expert/oracle labels them by f : {0,1}^n → ℝ+, yielding labeled examples (x1, f(x1)), …, (xk, f(xk)); the learning algorithm outputs g : {0,1}^n → ℝ+.

Our model: passive supervised learning. The algorithm sees (x1, f(x1)), …, (xk, f(xk)), with the xi drawn i.i.d. from D, and produces a hypothesis g (hopefully g ≈ f). Requirement:
Pr_{x1,…,xk}[ Pr_x[ g(x) ≤ f(x) ≤ α·g(x) ] ≥ 1 − ε ] ≥ 1 − δ.
"Probably Mostly Approximately Correct" (PMAC).

Main results.
Theorem (our general upper bound): there is an algorithm for PMAC-learning the class of non-negative, monotone, submodular fns (w.r.t. an arbitrary distribution) with an approximation factor O(n^{1/2}). Note: a much simpler algorithm compared to GHIM'09.
Theorem (our general lower bound): no algorithm can PMAC-learn the class of non-negative, monotone, submodular fns with an approximation factor Õ(n^{1/3}). Note: the GHIM'09 lower bound fails in our model.
Theorem (product distributions): matroid rank functions can be PMAC-learned with a constant approximation factor.

A general upper bound. Theorem: there is an algorithm for PMAC-learning the class of non-negative, monotone, submodular fns (w.r.t. an arbitrary distribution) with an approximation factor O(n^{1/2}).

Subadditive fns are approximately linear. Let f be non-negative, monotone, and subadditive.
Claim: f can be approximated to within a factor n by a linear function g.
Proof sketch: let g(S) = Σ_{s∈S} f({s}). Then f(S) ≤ g(S) by subadditivity, and g(S) ≤ |S|·f(S) ≤ n·f(S) by monotonicity.
Subadditive: f(S) + f(T) ≥ f(S ∪ T), ∀ S, T ⊆ V. Monotone: f(S) ≤ f(T), ∀ S ⊆ T. Non-negative: f(S) ≥ 0, ∀ S ⊆ V.

Subadditive fns are approximately linear: f(S) ≤ g(S) ≤ n·f(S). [Figure: g sandwiched between f and n·f, plotted over sets from ∅ to V.]
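
A minimal sketch of this sandwich in Python (our own illustration; the helper name linear_upper_bound is made up), using f(S) = √|S| as a toy non-negative, monotone, subadditive function:

```python
from math import sqrt

def linear_upper_bound(f, ground):
    """g(S) = sum over s in S of f({s}).

    For non-negative, monotone, subadditive f: f(S) <= g(S) <= n * f(S).
    """
    singleton = {s: f(frozenset({s})) for s in ground}
    return lambda S: sum(singleton[s] for s in S)

# Toy subadditive function: f(S) = sqrt(|S|).
n = 6
f = lambda S: sqrt(len(S))
g = linear_upper_bound(f, range(n))
S = frozenset({0, 2, 4})
assert f(S) <= g(S) <= n * f(S)    # sqrt(3) <= 3 <= 6 * sqrt(3)
```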

PMAC learning subadditive fns. A non-negative, monotone, subadditive f is approximated to within a factor n by a linear function g, g(S) = w · χ(S). Hence the labeled examples ((χ(S), f(S)), +) and ((χ(S), n·f(S)), −) are linearly separable in ℝ^{n+1}: the vector (w, −1) satisfies w·χ(S) − f(S) > 0 and w·χ(S) − (n+1)·f(S) < 0, and more generally a separator u = (w, −z) satisfies w·χ(S) − z·f(S) > 0 and w·χ(S) − z·(n+1)·f(S) < 0.
Idea: learn a linear separator, using standard sample complexity bounds. Problem: the constructed data are not i.i.d. Solution: create a related distribution. Sample S from D and flip a coin: if heads, add ((χ(S), f(S)), +); else add ((χ(S), n·f(S)), −).

PMAC learning subadditive fns. Algorithm (the set {S : f(S) = 0} is dealt with separately):
Input: (S1, f(S1)), …, (Sm, f(Sm)).
For each Si, flip a coin: if heads, add ((χ(Si), f(Si)), +); else add ((χ(Si), n·f(Si)), −).
Learn a linear separator u = (w, −z) in ℝ^{n+1}.
Output: g(S) = (1/(n+1)) · w · χ(S).
Theorem: for m = Θ(n/ε), g approximates f to within a factor n on a 1 − ε fraction of the distribution.
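
A self-contained sketch of this reduction (our own code, not the authors'; pmac_learn and the toy f below are made up). It uses a plain perceptron as the separator learner, which converges here because the constructed examples are linearly separable, and it normalizes by the learned z rather than assuming z = 1; the zero set {S : f(S) = 0} is assumed empty:

```python
import numpy as np

rng = np.random.default_rng(0)

def pmac_learn(samples, n):
    """PMAC sketch: (indicator vector, value) pairs -> linear separator -> g."""
    X, y = [], []
    for chi, val in samples:
        if rng.random() < 0.5:                 # heads: positive example
            X.append(np.append(chi, val)); y.append(1.0)
        else:                                  # tails: negative example
            X.append(np.append(chi, n * val)); y.append(-1.0)
    X, y = np.array(X), np.array(y)

    u = np.zeros(n + 1)                        # perceptron through the origin
    for _ in range(10000):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (u @ xi) <= 0:
                u += yi * xi
                mistakes += 1
        if mistakes == 0:                      # all examples separated
            break
    w, z = u[:n], -u[n]                        # u = (w, -z); any consistent separator has z > 0
    return lambda chi: (w @ chi) / (z * (n + 1))

# Toy run with f(S) = 1 + |S| (monotone, non-negative, subadditive, never 0):
n = 8
samples = [((chi := (rng.random(n) < 0.5).astype(float)), 1.0 + chi.sum())
           for _ in range(200)]
g = pmac_learn(samples, n)
```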

PMAC learning submodular fns. Algorithm (the set {S : f(S) = 0} is dealt with separately):
Input: (S1, f(S1)), …, (Sm, f(Sm)).
For each Si, flip a coin: if heads, add ((χ(Si), f(Si)²), +); else add ((χ(Si), n·f(Si)²), −).
Learn a linear separator u = (w, −z) in ℝ^{n+1}.
Output: g(S) = ((1/(n+1)) · w · χ(S))^{1/2}.
Theorem: for m = Θ(n/ε), g approximates f to within a factor √n on a 1 − ε fraction of the distribution.
Proof idea: a non-negative, monotone, submodular f is approximated to within a factor √n by the square root of a linear function [GHIM'09].
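
Under the same assumptions as the sketch above, the submodular case changes only the labels and the output: square the values going in, take a square root coming out.

```python
def pmac_learn_submodular(samples, n):
    """Reuses pmac_learn from the sketch above on squared values."""
    squared = [(chi, val ** 2) for chi, val in samples]
    g_sq = pmac_learn(squared, n)      # approximates f^2 by a linear function
    return lambda chi: max(g_sq(chi), 0.0) ** 0.5
```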

PMAC learning submodular fns. (Same algorithm as above.) It is much simpler than GHIM'09, and more robust to variations:
- the target only needs to be within a factor β of a submodular function;
- it suffices that there exists a submodular function agreeing with the target on all but an η fraction of the points (on the points where it disagrees, the target can be arbitrarily far); the algorithm is inefficient in this case.

A general lower bound. Theorem (our general lower bound): no algorithm can PMAC-learn the class of non-negative, monotone, submodular fns with an approximation factor Õ(n^{1/3}).
Plan: use the fact that any matroid rank function is submodular, and construct a hard family of matroid rank functions.
[Figure: sets A1, A2, A3, …, AL with L = n^{log log n}; each Ai is assigned rank either High = n^{1/3} or Low = log² n.]

Matroids. (V, Ind) is a matroid if: subsets of independent sets are independent; and if I, J are independent with |I| < |J|, then I ∪ {j} is independent for some j ∈ J \ I.
Rank(S) = max{|I| : I ∈ Ind, I ⊆ S}, for any S ⊆ V.
Examples:
Uniform matroids: V = {1, 2, …, n}, Ind = {I ⊆ V : |I| ≤ k}.
Graphical matroid: the elements are the edges of an undirected graph G = (V, E); a set of edges is independent if it does not contain a cycle.
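
For instance, the rank function of the graphical matroid can be computed greedily with a union-find structure; a small sketch (our own, with a made-up example graph):

```python
def graphic_rank(n_vertices, edges):
    """Rank of an edge set in the graphical matroid: size of a maximal forest.

    Union-find sketch: rank = number of edges joining two distinct components.
    """
    parent = list(range(n_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    rank = 0
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            rank += 1                       # edge is independent (creates no cycle)
    return rank

# Triangle plus a pendant edge: rank 3 (any spanning forest has 3 edges).
print(graphic_rank(4, [(0, 1), (1, 2), (2, 0), (2, 3)]))  # 3
```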

Partition matroids. A1, A2, …, Ak ⊆ V = {1, 2, …, n}, all disjoint; ui ≤ |Ai| − 1.
Ind = {I : |I ∩ Aj| ≤ uj, for all j}. Then (V, Ind) is a matroid.
If the sets Ai are not disjoint, then (V, Ind) might not be a matroid. E.g., n = 5, A1 = {1,2,3}, A2 = {3,4,5}, u1 = u2 = 2: {1,2,4,5} and {2,3,4} are both maximal sets in Ind but do not have the same cardinality.
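
The counterexample is easy to replay in code (a sketch; the helper name is ours):

```python
def is_independent(I, A, u):
    """Partition-style constraint check: |I ∩ A_j| <= u_j for every block."""
    return all(len(I & Aj) <= uj for Aj, uj in zip(A, u))

A = [{1, 2, 3}, {3, 4, 5}]                 # overlapping blocks: not a matroid
u = [2, 2]
V = {1, 2, 3, 4, 5}
for I in [{1, 2, 4, 5}, {2, 3, 4}]:
    maximal = is_independent(I, A, u) and all(
        not is_independent(I | {v}, A, u) for v in V - I)
    print(sorted(I), "maximal:", maximal)  # both maximal, yet sizes 4 vs 3
```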

Almost partition matroids. k = 2: A1, A2 ⊆ V (not necessarily disjoint); ui ≤ |Ai| − 1.
Ind = {I : |I ∩ Aj| ≤ uj for j = 1, 2, and |I ∩ (A1 ∪ A2)| ≤ u1 + u2 − |A1 ∩ A2|}.
Then (V, Ind) is a matroid.

Almost partition matroids, more generally. A1, A2, …, Ak ⊆ V = {1, 2, …, n}, ui ≤ |Ai| − 1; f : 2^[k] → ℤ,
f(J) = Σ_{j∈J} uj + |A(J)| − Σ_{j∈J} |Aj|, ∀ J ⊆ [k], where A(J) = ∪_{j∈J} Aj.
Ind = {I : |I ∩ A(J)| ≤ f(J), ∀ J ⊆ [k]}. Then (V, Ind) is a matroid (if nonempty).
Rewriting f: f(J) = |A(J)| − Σ_{j∈J} (|Aj| − uj), ∀ J ⊆ [k].

A generalization of partition matroids. More generally, f : 2^[k] → ℤ,
f(J) = |A(J)| − Σ_{j∈J} (|Aj| − uj), ∀ J ⊆ [k], and Ind = {I : |I ∩ A(J)| ≤ f(J), ∀ J ⊆ [k]}.
Then (V, Ind) is a matroid (if nonempty).
Proof technique: an uncrossing argument. For a set I, define T(I) to be the set of tight constraints, T(I) = {J ⊆ [k] : |I ∩ A(J)| = f(J)}. For all I ∈ Ind and J1, J2 ∈ T(I), either J1 ∪ J2 ∈ T(I) or J1 ∩ J2 = ∅; hence Ind is the family of independent sets of a matroid.

A generalization of almost partition matroids. f : 2^[k] → ℤ, f(J) = |A(J)| − Σ_{j∈J} (|Aj| − uj), ∀ J ⊆ [k]; ui ≤ |Ai| − 1.
Note: this requires k ≤ n (for k > n, f becomes negative), but we want k = n^{log log n}, so we apply a truncation to allow k ≫ n.
Say f is (μ, τ)-good if f(J) ≥ 0 for all J ⊆ [k] with |J| ≤ τ, and f(J) ≥ μ for all J ⊆ [k] with τ ≤ |J| ≤ 2τ − 2.
Define h(J) = f(J) if |J| ≤ τ, and h(J) = μ otherwise. Ind = {I : |I ∩ A(J)| ≤ h(J), ∀ J ⊆ [k]}. Then (V, Ind) is a matroid (if nonempty).
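
In code, the bound and its truncation are a few lines (a sketch; the names are ours):

```python
def f_value(J, A, u):
    """f(J) = |A(J)| - sum over j in J of (|A_j| - u_j), with A(J) = union of A_j."""
    AJ = set().union(*(A[j] for j in J)) if J else set()
    return len(AJ) - sum(len(A[j]) - u[j] for j in J)

def h_value(J, A, u, tau, mu):
    """Truncated bound: h(J) = f(J) if |J| <= tau, else the floor mu."""
    return f_value(J, A, u) if len(J) <= tau else mu
```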

A generalization of partition matroids. Let L = n^{log log n}, and let A1, A2, …, AL be random subsets of V (each Ai includes each element of V independently with probability n^{-2/3}). Let μ = n^{1/3} log² n, u = log² n, τ = n^{1/3}.
Each subset J ⊆ {1, 2, …, L} induces a matroid such that for any i ∉ J, Ai is independent in this matroid: Rank(Ai) for i ∉ J is roughly |Ai|, i.e., Θ(n^{1/3}), while the rank of each Aj with j ∈ J is u = log² n.
[Figure: A1, A2, A3, …, AL with L = n^{log log n}; ranks are High = n^{1/3} outside J and Low = log² n inside J.]

Product distributions, matroid rank fns. Talagrand's inequality implies concentration: let D be a product distribution on V and R = rank(X) for X drawn from D. If E[R] ≥ 4000, [concentration bound not captured in the transcript]; if E[R] ≤ 500 log(1/ε), [bound not captured in the transcript].
Related work: [Chekuri, Vondrak '09] and [Vondrak '10] prove a slightly more general result by two different techniques.

Product distributions, matroid rank fns. (Talagrand concentration as above.)
Algorithm: let μ = (1/m) Σ_{i=1}^m f(xi), and let g be the constant function with value μ. This achieves approximation factor O(log(1/ε)) on a 1 − ε fraction of points, with high probability.
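
The algorithm itself is one line; here is a sketch under the assumption that f is a matroid rank function and the samples come from a product distribution (our toy uses a uniform-matroid rank, and the helper name is made up):

```python
import random

def constant_hypothesis(sample_sets, f):
    """Output the constant function with value mu = empirical mean of f."""
    mu = sum(f(S) for S in sample_sets) / len(sample_sets)
    return lambda S: mu

# Toy: uniform-matroid rank f(S) = min(|S|, k) under a product distribution.
n, k = 100, 10
f = lambda S: min(len(S), k)
sample = [frozenset(i for i in range(n) if random.random() < 0.5)
          for _ in range(200)]
g = constant_hypothesis(sample, f)     # g is constant, roughly k here
```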

Conclusions and open questions. We analyze the intrinsic learnability of submodular fns; our analysis reveals interesting novel extremal and structural properties of submodular fns.
Open questions:
- Improve the Ω(n^{1/3}) lower bound to Ω(n^{1/2}).
- Non-monotone submodular functions.

Other interesting structural properties. Let h : ℝ → ℝ+ be concave and non-decreasing. For each S ⊆ V, let f(S) = h(|S|).
Claim: these functions f are submodular, monotone, and non-negative. [Figure: concave curve plotted over set sizes from ∅ to V.]

Theorem (informal): every submodular function looks like this (approximately, usually). [Figure: concave curve from ∅ to V.]

Theorem (informal): every submodular function looks like this (approximately, usually). [Figure: concave curve from ∅ to V.]
Theorem: let f be a non-negative, monotone, submodular, 1-Lipschitz function. For any ε > 0, there exists a concave function h : [0, n] → ℝ such that for every k ∈ [0, n], and for a 1 − ε fraction of the sets S ⊆ V with |S| = k, we have: h(k) ≤ f(S) ≤ O(log²(1/ε)) · h(k). In fact, h(k) is just E[f(S)], where S is uniform over sets of size k. Proof: based on Talagrand's inequality.
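
The concave profile h(k) = E[f(S)] is easy to estimate by sampling; a Monte Carlo sketch (ours, with a made-up random coverage function):

```python
import random

def estimate_h(f, ground, k, trials=2000):
    """Monte Carlo estimate of h(k) = E[f(S)] over uniform k-subsets."""
    ground = list(ground)
    total = 0.0
    for _ in range(trials):
        total += f(frozenset(random.sample(ground, k)))
    return total / trials

# Example: random coverage function; h(0), h(1), ... is roughly concave.
W = [set(random.sample(range(50), 8)) for _ in range(20)]
f = lambda S: len(set().union(*(W[i] for i in S))) if S else 0
print([round(estimate_h(f, range(20), k, 500), 1) for k in range(0, 21, 4)])
```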

Conclusions and open questions. We analyze the intrinsic learnability of submodular fns; our analysis reveals interesting novel extremal and structural properties of submodular fns.
Open questions:
- Improve the Ω(n^{1/3}) lower bound to Ω(n^{1/2}).
- Non-monotone submodular functions: any algorithm? A lower bound better than Ω(n^{1/3})?