Learning Submodular Functions Nick Harvey University of Waterloo Joint work with Nina Balcan, Georgia Tech

Submodular functions. V = {1,2,…,n}, f : 2^V → R.
Submodularity: f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T) for all S,T ⊆ V.
Equivalent: decreasing marginal values: f(S ∪ {x}) − f(S) ≥ f(T ∪ {x}) − f(T) for all S ⊆ T ⊆ V, x ∉ T.
Examples. Concave functions: let h : R → R be concave; for each S ⊆ V, let f(S) = h(|S|). Vector spaces: let V = {v_1,…,v_n}, each v_i ∈ R^n; for each S ⊆ V, let f(S) = rank(V[S]).

Submodular functions. V = {1,2,…,n}, f : 2^V → R.
Non-negative: f(S) ≥ 0 for all S ⊆ V.
Monotone: f(S) ≤ f(T) for all S ⊆ T.
Submodularity: f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T) for all S,T ⊆ V.
Equivalent: decreasing marginal values: f(S ∪ {x}) − f(S) ≥ f(T ∪ {x}) − f(T) for all S ⊆ T ⊆ V, x ∉ T.
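
The following small Python sketch (our illustration, not part of the original slides) brute-force checks non-negativity, monotonicity, and submodularity for an example function, here f(S) = min(|S|, 2) on a 5-element ground set:

    from itertools import chain, combinations

    V = range(5)
    subsets = [frozenset(S) for S in
               chain.from_iterable(combinations(V, r) for r in range(len(V) + 1))]

    def f(S):
        # Example function: the truncated-cardinality function min(|S|, 2).
        return min(len(S), 2)

    print(all(f(S) >= 0 for S in subsets))                                    # non-negative
    print(all(f(S) <= f(T) for S in subsets for T in subsets if S <= T))      # monotone
    print(all(f(S) + f(T) >= f(S & T) + f(S | T)
              for S in subsets for T in subsets))                             # submodular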

Submodular functions. There is a strong connection between optimization and submodularity, e.g. minimization [C'85, GLS'87, IFF'01, S'00, …] and maximization [NWF'78, V'07, …]. There has been much recent interest in the machine learning community: tutorials at major conferences (ICML, NIPS, etc.), and there is even a dedicated machine-learning website on the topic. Submodular utility functions also appear in algorithmic game theory. So it is interesting to understand their learnability.

Exact learning with value queries [Goemans, Harvey, Iwata, Mirrokni, SODA 2009]. The target is f : {0,1}^n → R. The algorithm adaptively queries points x_i and receives the values f(x_i), for i = 1,…,q, where q = poly(n). The algorithm then produces a "hypothesis" g : {0,1}^n → R (hopefully g ≈ f). Goal: g(x) ≤ f(x) ≤ α·g(x) for all x ∈ {0,1}^n, with α as small as possible.

Exact learning with value queries [Goemans, Harvey, Iwata, Mirrokni, SODA 2009] (model as above).
Theorem (upper bound): there is an algorithm for learning a submodular function with α = Õ(n^{1/2}).
Theorem (lower bound): any algorithm for learning a submodular function must have α = Ω̃(n^{1/2}).

Problems with this model. In learning theory, one usually only tries to predict the values of most points. The GHIM lower bound fails if the goal is only to do well on most of the points. To define "most", we need a distribution on {0,1}^n. Is there a distributional model for learning submodular functions?

Our model. There is a distribution D on {0,1}^n and a target f : {0,1}^n → R_+. The algorithm sees examples (x_1, f(x_1)), …, (x_q, f(x_q)), where the x_i's are i.i.d. from D. The algorithm produces a "hypothesis" g : {0,1}^n → R_+ (hopefully g ≈ f).

Our model, continued. The algorithm sees i.i.d. examples from D and outputs g : {0,1}^n → R_+; on a fresh x ~ D we ask whether f(x) ≈ g(x). The guarantee is
Pr_{x_1,…,x_q}[ Pr_x[ g(x) ≤ f(x) ≤ α·g(x) ] ≥ 1 − ε ] ≥ 1 − δ,
i.e. "Probably Mostly Approximately Correct" (PMAC).

Our model, "Probably Mostly Approximately Correct". Learning is impossible if f : {0,1}^n → R_+ is arbitrary and the number of training points is ≪ 2^n. It is possible if f is a non-negative, monotone, submodular function.
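
As a sketch of what the PMAC guarantee asks for (our own illustration; the function and distribution below are arbitrary toy choices), one can estimate the "mostly" part empirically by sampling fresh points from D:

    import random

    def pmac_error(f, g, alpha, sample_D, trials=10000):
        # Estimate Pr_x[ not (g(x) <= f(x) <= alpha * g(x)) ] for x drawn from D.
        bad = 0
        for _ in range(trials):
            x = sample_D()
            if not (g(x) <= f(x) <= alpha * g(x)):
                bad += 1
        return bad / trials

    # Toy example: f(S) = |S| on a 10-element ground set, g the constant 3, alpha = 3,
    # and D the uniform distribution on {0,1}^10 (points represented as frozensets).
    n = 10
    sample_D = lambda: frozenset(i for i in range(n) if random.random() < 0.5)
    print(pmac_error(lambda S: len(S), lambda S: 3, 3.0, sample_D))  # small, but nonzero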

Example: concave functions. Let h : R → R be concave. [Figure: plot of a concave function h.]

Example: concave functions. Let h : R → R be concave. For each S ⊆ V, let f(S) = h(|S|). Claim: f is submodular. We prove a partial converse. [Figure: h plotted against |S|, from ∅ to V.]

Theorem (informal): every submodular function "looks like this": on lots of sets, approximately, usually. [Figure: concave-looking profile from ∅ to V.]

Theorem: Let f be a non-negative, monotone, submodular, 1-Lipschitz function. There exists a concave function h : [0,n] → R such that, for any ε > 0, for every k ∈ {0,…,n}, and for a 1 − ε fraction of the sets S ⊆ V with |S| = k, we have
h(k) ≤ f(S) ≤ O(log²(1/ε)) · h(k).
In fact, h(k) is just E[f(S)], where S is uniform on sets of size k. Proof: based on Talagrand's inequality. [Figure: profile from ∅ to V; the example shown is a matroid rank function.]
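
A small numerical experiment (ours, not from the talk) that illustrates the statement: take a simple monotone, 1-Lipschitz, submodular function (a coverage function), sample many sets of a fixed size k, and compare the spread of f(S) to the estimate of h(k) = E[f(S)]:

    import random

    n, k, trials = 200, 50, 2000
    V = list(range(n))

    def f(S):
        # Coverage function: element i covers block i // 10; 20 blocks in total.
        # Coverage functions are monotone, submodular, and 1-Lipschitz.
        return len({i // 10 for i in S})

    vals = [f(random.sample(V, k)) for _ in range(trials)]
    h_k = sum(vals) / trials            # Monte Carlo estimate of h(k) = E[f(S)]
    print(h_k, min(vals), max(vals))    # the min and max typically stay close to h_k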

Learning submodular functions under any product distribution. There is a product distribution D on {0,1}^n and a target f : {0,1}^n → R_+; the algorithm sees i.i.d. examples (x_i, f(x_i)).
Algorithm: let μ = Σ_{i=1}^{q} f(x_i) / q, and let g be the constant function with value μ.
This achieves approximation factor O(log²(1/ε)) on a 1 − ε fraction of points, with high probability. Proof: essentially follows from the previous theorem.
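
The product-distribution algorithm on this slide is essentially a one-liner; here is a sketch (ours) of it as code:

    def learn_constant_hypothesis(samples):
        # samples: a list of (x_i, f(x_i)) pairs drawn i.i.d. from a product distribution D.
        # Returns the constant hypothesis g whose value is mu = (sum_i f(x_i)) / q.
        q = len(samples)
        mu = sum(value for _, value in samples) / q
        return lambda x: mu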

Learning submodular functions under an arbitrary distribution? The same argument no longer works: Talagrand's inequality requires a product distribution. Intuition: a non-uniform distribution focuses on fewer points, so the function is less concentrated on those points.

A general upper bound? Theorem (our upper bound): there is an algorithm for learning a submodular function with respect to an arbitrary distribution that has approximation factor O(n^{1/2}).

Computing linear separators. Given {+,–}-labeled points in R^n, find a hyperplane c^T x = b that separates the +s from the –s. Easily solved by linear programming. [Figure: + and – points on either side of a hyperplane.]
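
A sketch of the LP formulation (ours; it assumes NumPy and SciPy are available and that the two point sets are strictly separable): find c and b with c·x ≥ b + 1 on the + points and c·x ≤ b − 1 on the − points, which is a pure feasibility LP:

    import numpy as np
    from scipy.optimize import linprog

    def separating_hyperplane(pos, neg):
        # pos, neg: arrays of shape (m_+, n) and (m_-, n).
        # Returns (c, b) with c.x >= b + 1 on pos and c.x <= b - 1 on neg, or None if infeasible.
        n = pos.shape[1]
        # Variables z = (c_1, ..., c_n, b), constraints written as A_ub @ z <= b_ub.
        A_pos = np.hstack([-pos, np.ones((len(pos), 1))])    # -(c.x - b) <= -1
        A_neg = np.hstack([neg, -np.ones((len(neg), 1))])    #  (c.x - b) <= -1
        A_ub = np.vstack([A_pos, A_neg])
        b_ub = -np.ones(len(A_ub))
        res = linprog(c=np.zeros(n + 1), A_ub=A_ub, b_ub=b_ub,
                      bounds=[(None, None)] * (n + 1), method="highs")
        return (res.x[:n], res.x[n]) if res.success else None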

Learning linear separators. Given a random sample of {+,–}-labeled points in R^n, find a hyperplane c^T x = b that separates most of the +s from the –s. A classic machine learning problem. [Figure: labeled points, with one misclassified point marked "Error!".]

Learning linear separators. Classic theorem [Vapnik–Chervonenkis 1971]: Õ(n/ε²) samples suffice to get error ε.

Submodular functions are approximately linear. Let f be non-negative (f(S) ≥ 0 for all S ⊆ V), monotone (f(S) ≤ f(T) for all S ⊆ T), and submodular (f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T) for all S,T ⊆ V). Claim: f can be approximated to within a factor n by a linear function g. Proof sketch: let g(S) = Σ_{s∈S} f({s}). Then f(S) ≤ g(S) ≤ n·f(S): the lower bound follows from subadditivity (submodularity plus non-negativity gives f(S ∪ T) ≤ f(S) + f(T)), and the upper bound follows since monotonicity gives f({s}) ≤ f(S) for each s ∈ S and |S| ≤ n.
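
A quick numerical check (ours) of this claim on a small coverage function, with g(S) = Σ_{s∈S} f({s}):

    from itertools import chain, combinations

    # Small coverage function: ground-set element i covers the items in cover_sets[i].
    cover_sets = [{0, 1}, {1, 2}, {2, 3}, {3, 0}]
    n = len(cover_sets)

    def f(S):
        return len(set().union(*(cover_sets[i] for i in S))) if S else 0

    def g(S):
        return sum(f({i}) for i in S)

    all_S = [set(c) for c in
             chain.from_iterable(combinations(range(n), r) for r in range(n + 1))]
    print(all(f(S) <= g(S) <= n * f(S) for S in all_S))   # expected: True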

Submodular functions are approximately linear. [Figure: f, g, and n·f plotted over subsets from ∅ to V, with f ≤ g ≤ n·f.]

Randomly sample {S_1,…,S_q} from the distribution. Create a + point for f(S_i) and a – point for n·f(S_i). Now just learn a linear separator! [Figure: f, g, and n·f, with the labeled points lying between f and n·f.]

Theorem: g approximates f to within a factor n on a 1 − ε fraction of the distribution. This can be improved to a factor of O(n^{1/2}) via the GHIM lemma (ellipsoidal approximation of submodular functions). [Figure: f, g, and n·f.]
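
A rough sketch (ours) of the point construction described on these slides; turning the learned separator back into the function g follows the paper and is not shown here, and the sketch assumes f(S) > 0 on the sampled sets so that the + and − points are distinct:

    import numpy as np

    def build_separation_dataset(f, sample_set, n, q=500):
        # For each sampled set S, create a + point (indicator(S), f(S)) and
        # a - point (indicator(S), n * f(S)) in R^{n+1}, as described on the slide.
        pos, neg = [], []
        for _ in range(q):
            S = sample_set()
            chi = np.zeros(n)
            chi[list(S)] = 1.0
            pos.append(np.append(chi, f(S)))
            neg.append(np.append(chi, n * f(S)))
        return np.array(pos), np.array(neg)

    # The resulting (pos, neg) arrays can then be fed to a separator learner,
    # for example the LP-based separating_hyperplane sketch above.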

A lower bound? A non-uniform distribution focuses on fewer points, so the function is less concentrated on those points. Can we create a submodular function with lots of deep "bumps"? Yes! [Figure: profile from ∅ to V with several dips.]

A general lower bound.
Theorem (our general lower bound): no algorithm can PMAC-learn the class of non-negative, monotone, submodular functions with an approximation factor õ(n^{1/3}).
Plan: use the fact that matroid rank functions are submodular, and construct a hard family of matroids. Pick A_1,…,A_m ⊂ V with |A_i| = n^{1/3} and m = n^{log n}. [Figure: sets A_1, A_2, A_3, …; on each A_i the function value is either low = log² n or high = n^{1/3}.]

Matroids. Ground set V, family of independent sets I. Axioms:
– ∅ ∈ I ("nonempty");
– J ⊂ I ∈ I ⇒ J ∈ I ("downwards closed");
– J, I ∈ I and |J| < |I| ⇒ ∃ x ∈ I \ J such that J + x ∈ I ("maximum-size sets can be found greedily").
Rank function: r(S) = max { |I| : I ∈ I and I ⊆ S }.
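
Since maximum-size independent sets can be found greedily (the third axiom), the rank function can be computed from an independence oracle; a small sketch (ours):

    def rank(S, is_independent):
        # r(S) = max{ |I| : I independent, I subset of S }, computed greedily.
        # Greedy is correct assuming is_independent is the independence oracle of a matroid.
        I = set()
        for x in S:
            if is_independent(I | {x}):
                I.add(x)
        return len(I)

    # Example: the uniform matroid, whose independent sets are all sets of size <= 3.
    print(rank({0, 1, 2, 3, 4}, lambda I: len(I) <= 3))   # expected: 3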

f(S) = min{ |S|, k }, i.e. r(S) = |S| if |S| ≤ k, and r(S) = k otherwise. [Figure: this profile from ∅ to V.]

r(S) = |S| if |S| ≤ k; k − 1 if S = A; k otherwise. [Figure: the profile now dips to k − 1 at the single set A.]

Let A = {A_1,…,A_m} with |A_i| = k for all i. Define r(S) = k − 1 if S ∈ A; |S| if |S| ≤ k (and S ∉ A); k otherwise. Claim: r is submodular if |A_i ∩ A_j| ≤ k − 2 for all i ≠ j. In that case r is the rank function of a "paving matroid". [Figure: the profile dips at each of A_1, A_2, A_3, …, A_m.]
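
A concrete sketch (ours) of this construction on a tiny example, with a brute-force check of the submodularity claim; the S ∈ A case is tested first, since those sets also have |S| = k:

    from itertools import combinations

    V = set(range(6))
    k = 3
    A = [frozenset({0, 1, 2}), frozenset({3, 4, 5})]   # |A_i| = k, pairwise intersections of size 0 <= k - 2

    def r(S):
        S = frozenset(S)
        if S in A:
            return k - 1
        if len(S) <= k:
            return len(S)
        return k

    subsets = [frozenset(c) for m in range(len(V) + 1) for c in combinations(V, m)]
    print(all(r(S) + r(T) >= r(S & T) + r(S | T)
              for S in subsets for T in subsets))        # expected: True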

Now delete half of the bumps at random: r(S) = k − 1 if S ∈ A and A_i wasn't deleted; |S| if |S| ≤ k; k otherwise. If m is large, the algorithm cannot learn which bumps were deleted, so any algorithm to learn f has additive error 1. [Figure: if the algorithm sees only examples away from a surviving bump, then f can't be predicted there.]

Can we force a bigger error with bigger bumps? Yes! But we need to generalize paving matroids, and the family A needs to have very strong properties. [Figure: the sets A_1, A_2, A_3, …, A_m.]

The main question. Let V = A_1 ∪ ⋯ ∪ A_m and b_1,…,b_m ∈ N. Is there a matroid such that r(A_i) ≤ b_i for all i, while r(S) is "as large as possible" for the other sets S (this is not formal)? If the A_i's are disjoint, the solution is a partition matroid. If the A_i's are "almost disjoint", can we find a matroid that is "almost" a partition matroid? Next: formalize this.

Lossless expander graphs. Definition: G = (U ∪ V, E) is a (D, K, ε)-lossless expander if
– every u ∈ U has degree D, and
– |Γ(S)| ≥ (1 − ε)·D·|S| for all S ⊆ U with |S| ≤ K, where Γ(S) = { v ∈ V : ∃ u ∈ S s.t. {u,v} ∈ E }.
"Every small left-set has a nearly-maximal number of right-neighbors." [Figure: bipartite graph with left part U and right part V.]

Lossless expander graphs (same definition). Equivalently: "Neighborhoods of left-vertices are K-wise almost disjoint."
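
A brute-force check (ours) of this definition on a tiny bipartite graph, with the graph given as a dict from left-vertices to their neighborhoods:

    from itertools import combinations

    def is_lossless_expander(nbrs, D, K, eps):
        # nbrs: dict mapping each u in U to its set of neighbors in V.
        # Checks degree D and |Gamma(S)| >= (1 - eps) * D * |S| for all S with |S| <= K.
        U = list(nbrs)
        if any(len(nbrs[u]) != D for u in U):
            return False
        for size in range(1, K + 1):
            for S in combinations(U, size):
                gamma = set().union(*(nbrs[u] for u in S))
                if len(gamma) < (1 - eps) * D * size:
                    return False
        return True

    # Example: three left-vertices with slightly overlapping neighborhoods.
    nbrs = {0: {0, 1, 2, 3}, 1: {3, 4, 5, 6}, 2: {6, 7, 8, 0}}
    print(is_lossless_expander(nbrs, D=4, K=2, eps=0.25))  # expected: True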

Trivial case: disjoint neighborhoods. If the left-vertices have disjoint neighborhoods, this gives a lossless expander with ε = 0 and K = ∞.

Main theorem, trivial case. Suppose G = (U ∪ V, E) has disjoint left-neighborhoods. Let A = {A_1,…,A_m} be defined by A = { Γ(u) : u ∈ U }, and let b_1,…,b_m be non-negative integers. Theorem: I = { S : |S ∩ A_i| ≤ b_i for all i } is the family of independent sets of a matroid (a partition matroid). [Figure: U and V, with A_1 = Γ(u_1) capped at ≤ b_1 and A_2 = Γ(u_2) capped at ≤ b_2.]
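
The trivial case in code (our sketch): with disjoint neighborhoods, independence is exactly the capacity check |S ∩ A_i| ≤ b_i for every i:

    def in_partition_matroid(S, A, b):
        # Partition matroid independence test: |S ∩ A_i| <= b_i for every part A_i.
        return all(len(S & A_i) <= b_i for A_i, b_i in zip(A, b))

    A = [frozenset({0, 1, 2}), frozenset({3, 4})]   # disjoint neighborhoods Gamma(u_1), Gamma(u_2)
    b = [2, 1]
    print(in_partition_matroid({0, 1, 3}, A, b))    # True:  two elements from A_1, one from A_2
    print(in_partition_matroid({0, 1, 2}, A, b))    # False: three elements from A_1 > b_1 = 2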

Main theorem. Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander. Let A = {A_1,…,A_m} be defined by A = { Γ(u) : u ∈ U }. Let b_1,…,b_m satisfy b_i ≥ 4εD for all i. [Figure: the sets A_1, A_2 with capacities ≤ b_1, ≤ b_2.]

Main theorem, continued (setup as above). The "desired theorem" and the actual theorem both state: I is a matroid, where I is the family of sets defined by capacity constraints over the A_i's (the defining formula appeared as a figure on the slide and is not reproduced in this transcript). Trivial case: G has disjoint neighborhoods, i.e. K = ∞ and ε = 0, which recovers the partition matroid above.

Lower bound for learning submodular functions. How deep can we make the "valleys"? [Figure: profile from ∅ to V with valleys at A_1 and A_2; high value n^{1/3}, low value log² n.]

Lower bound for learning submodular functions. Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander with A_i = Γ(u_i), where
– |V| = n and |U| = n^{log n},
– D = K = n^{1/3},
– ε = log²(n) / n^{1/3}.
Such graphs exist by the probabilistic method.
Lower bound proof:
– Delete each node in U with probability ½, then use the main theorem to get a matroid.
– If u_i ∈ U was not deleted, then r(A_i) ≤ b_i = 4εD = O(log² n).
– Claim: if u_i was deleted, then A_i ∈ I (this needs a proof), so r(A_i) = |A_i| = D = n^{1/3}.
– Since the number of A_i's is |U| = n^{log n}, no algorithm can learn a significant fraction of the r(A_i) values in polynomial time.

Summary
– PMAC model for learning real-valued functions.
– Learning under arbitrary distributions: a factor O(n^{1/2}) algorithm, and a factor Ω̃(n^{1/3}) hardness result (information-theoretic).
– Learning under product distributions: a factor O(log(1/ε)) algorithm.
– A new general family of matroids, generalizing partition matroids to non-disjoint parts.

Open questions
– Improve the Ω̃(n^{1/3}) lower bound to Ω̃(n^{1/2}).
– Explicit construction of the expanders.
– Non-monotone submodular functions: is there any algorithm? Is there a lower bound better than Ω̃(n^{1/3})?
– For the algorithm under the uniform distribution, relax the 1-Lipschitz condition.