Carnegie Mellon Maximizing Submodular Functions and Applications in Machine Learning Andreas Krause, Carlos Guestrin Carnegie Mellon University

Tutorial Outline: 1. Introduction 2. Submodularity: properties and examples 3. Maximizing submodular functions 4. Applications in Machine Learning

3 Feature selection. Given random variables Y, X_1, …, X_n. Want to predict Y from a subset X_A = (X_{i_1}, …, X_{i_k}). Want the k most informative features: A* = argmax_A IG(X_A; Y) s.t. |A| ≤ k, where IG(X_A; Y) = H(Y) − H(Y | X_A), i.e., uncertainty about Y before knowing X_A minus uncertainty after knowing X_A. The problem is inherently combinatorial! (Naïve Bayes model: Y "Sick" with features X_1 "Fever", X_2 "Rash", X_3 "Male".)

4 Factoring distributions. Given random variables X_1, …, X_n. Partition the variables V into sets A and V \ A that are as independent as possible. Formally: want A* = argmin_A I(X_A; X_{V \ A}) s.t. 0 < |A| < n, where I(X_A; X_B) = H(X_B) − H(X_B | X_A). A fundamental building block in structure learning [Narasimhan & Bilmes, UAI '04]. The problem is inherently combinatorial!

5 Combinatorial problems in ML. Given a (finite) set V and a function F: 2^V → R, want A* = argmin_A F(A) s.t. some constraints on A. Solving combinatorial problems: Mixed integer programming? Often difficult to scale to large problems. Relaxations (e.g., L1 regularization, etc.)? Not clear when they work. This talk: fully combinatorial algorithms (spanning tree, matching, …).

6 Example: Greedy algorithm for feature selection. Given: finite set V of features, utility function F(A) = IG(X_A; Y). Want: A* ⊆ V maximizing F(A) subject to |A| ≤ k. NP-hard! Greedy algorithm: start with A = ∅; for i = 1 to k: s* := argmax_s F(A ∪ {s}); A := A ∪ {s*}. How well can this simple heuristic do?
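A minimal sketch of this greedy loop, assuming F is given as a black-box set function evaluated on Python sets (the information-gain oracle itself is not implemented here):

```python
def greedy_maximize(F, V, k):
    """Plain greedy: repeatedly add the element with the largest marginal gain.

    F: set function, called on a Python set, assumed monotonic submodular
    V: iterable of candidate elements
    k: cardinality budget
    """
    A = set()
    for _ in range(k):
        # Marginal gain F(A ∪ {s}) - F(A) of every remaining element.
        gains = {s: F(A | {s}) - F(A) for s in V if s not in A}
        if not gains:
            break
        A.add(max(gains, key=gains.get))
    return A
```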

7 Key property: Diminishing returns. Compare selection A = {} with selection B = {X_2, X_3}: adding the new feature X_1 to A helps a lot (large improvement), while adding X_1 to B doesn't help much (small improvement). Submodularity: for A ⊆ B, F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B). Theorem [Krause, Guestrin UAI '05]: The information gain F(A) in Naïve Bayes models is submodular!

8 Why is submodularity useful? Theorem [Nemhauser et al. '78]: The greedy maximization algorithm returns A_greedy with F(A_greedy) ≥ (1 − 1/e) max_{|A| ≤ k} F(A), i.e., ~63% of optimal. The greedy algorithm gives a near-optimal solution! (More details and exact statement later.) For info-gain: this guarantee is the best possible unless P = NP! [Krause, Guestrin UAI '05]

Carnegie Mellon Submodularity Properties and Examples

10 Set functions. Finite set V = {1, 2, …, n}; function F: 2^V → R. Will always assume F(∅) = 0 (w.l.o.g.). Assume a black box that can evaluate F for any input A; approximate (noisy) evaluation of F is ok (e.g., [37]). Example: F(A) = IG(X_A; Y) = H(Y) − H(Y | X_A) = Σ_{y, x_A} P(x_A, y) [log P(y | x_A) − log P(y)]. E.g., F({X_1, X_2}) = 0.9 but F({X_2, X_3}) = 0.5.

11 Submodular set functions. A set function F on V is called submodular if for all A, B ⊆ V: F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B). Equivalent diminishing-returns characterization: for all A ⊆ B and s ∉ B, F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B) (a large improvement when s is added to the small set A, a small improvement when s is added to the large set B).

12 Submodularity and supermodularity. A set function F on V is called submodular if 1) for all A, B ⊆ V: F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B), or equivalently 2) for all A ⊆ B and s ∉ B: F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B). F is called supermodular if −F is submodular. F is called modular if F is both sub- and supermodular; for modular ("additive") F, F(A) = Σ_{i ∈ A} w(i).

13 Example: Set cover. Place sensors in a building; V is the set of possible locations. Each node predicts values of positions within some radius; we want to cover the floorplan with discs. For A ⊆ V: F(A) = "area covered by sensors placed at A".
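A toy version of this coverage objective, assuming the floorplan is discretized into grid cells and each candidate location covers a fixed, known cell set (the names and values below are made up):

```python
def make_coverage_function(cover):
    """cover: dict mapping each candidate location to the set of grid cells it covers."""
    def F(A):
        covered = set()
        for s in A:
            covered |= cover[s]
        return len(covered)  # "area" = number of covered cells
    return F

# Two overlapping sensors: covering cell 3 twice yields no extra area.
F = make_coverage_function({"S1": {1, 2, 3}, "S2": {3, 4}})
print(F({"S1"}), F({"S2"}), F({"S1", "S2"}))  # 3 2 4
```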

14 Set cover is submodular. With A = {S_1, S_2} and B = {S_1, S_2, S_3, S_4}, a new sensor S' covers more additional area relative to A than relative to B: F(A ∪ {S'}) − F(A) ≥ F(B ∪ {S'}) − F(B).

15 Example: Influence in social networks [Kempe, Kleinberg, Tardos KDD '03]. Who should get free cell phones? V = {Dr. Zheng, Dr. Han, Khuong, Huy, Thanh, Nam}, with each edge labeled by the probability of influencing a neighbor. F(A) = expected number of people influenced when targeting A.

16 Influence in social networks is submodular [Kempe, Kleinberg, Tardos KDD '03]. Key idea: flip the coins c in advance → "live" edges. F_c(A) = number of people influenced under outcome c (a set cover function!). Hence F(A) = Σ_c P(c) F_c(A) is submodular as well!
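A small sketch of this live-edge view: estimate F(A) by Monte Carlo, sampling coin outcomes c and counting reachability (the graph, probabilities, and sample count are placeholder inputs):

```python
import random

def estimate_influence(A, edges, prob, n_samples=1000, seed=0):
    """Monte Carlo estimate of F(A) = Σ_c P(c) F_c(A) under independent edge coin flips.

    edges: list of directed (u, v) pairs; prob[(u, v)]: probability the edge is "live".
    Each sample fixes all coins in advance and counts the nodes reachable from A,
    which is exactly the coverage-style function F_c on that outcome.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        live = {(u, v) for (u, v) in edges if rng.random() < prob[(u, v)]}
        reached, frontier = set(A), list(A)
        while frontier:  # BFS over live edges from the targeted set A
            u = frontier.pop()
            for (a, b) in live:
                if a == u and b not in reached:
                    reached.add(b)
                    frontier.append(b)
        total += len(reached)
    return total / n_samples
```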

17 Submodularity and concavity. Suppose g: N → R and F(A) = g(|A|). Then F(A) is submodular if and only if g is concave! E.g., g could say "buying in bulk is cheaper".

18 Maximum of submodular functions. Suppose F_1(A) and F_2(A) are submodular. Is F(A) = max(F_1(A), F_2(A)) submodular? No: max(F_1, F_2) is not submodular in general!

19 Minimum of submodular functions. Well, maybe F(A) = min(F_1(A), F_2(A)) instead? Counterexample:
A        F_1(A)  F_2(A)  F(A)
∅        0       0       0
{a}      1       0       0
{b}      0       1       0
{a,b}    1       1       1
Then F({b}) − F(∅) = 0 < F({a,b}) − F({a}) = 1, so min(F_1, F_2) is not submodular in general!
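A throwaway check of this counterexample in code (purely illustrative, not from the slides):

```python
F1 = {(): 0, ("a",): 1, ("b",): 0, ("a", "b"): 1}
F2 = {(): 0, ("a",): 0, ("b",): 1, ("a", "b"): 1}
F = {A: min(F1[A], F2[A]) for A in F1}

# Diminishing returns would require F({b}) - F(∅) >= F({a,b}) - F({a}).
print(F[("b",)] - F[()], ">=?", F[("a", "b")] - F[("a",)])  # 0 >=? 1, so F is not submodular
```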

Carnegie Mellon Maximizing submodular functions

21 Maximizing submodular functions Minimizing convex functions: Polynomial time solvable! Minimizing submodular functions: Polynomial time solvable! Maximizing convex functions: NP hard! Maximizing submodular functions: NP hard! But can get approximation guarantees

22 Maximizing non-monotonic functions. Suppose we want, for a non-monotonic F, A* = argmax_A F(A) s.t. A ⊆ V. Example: F'(A) = F(A) − C(A), where F(A) is a submodular utility and C(A) is a linear cost function. E.g.: trading off utility and privacy in personalized search [Krause & Horvitz AAAI '08]. In general: NP-hard. Moreover, if F(A) can take negative values, it is as hard to approximate as maximum independent set (i.e., NP-hard to get an O(n^{1−ε}) approximation).

23 Maximizing positive submodular functions [Feige, Mirrokni, Vondrak FOCS '07]. Theorem: There is an efficient randomized local-search procedure that, given a positive submodular function F with F(∅) = 0, returns a set A_LS such that F(A_LS) ≥ (2/5) max_A F(A). Picking a random set gives a ¼ approximation; we cannot get better than a ¾ approximation unless P = NP.

24 Scalarization vs. constrained maximization. Given a monotonic utility F(A) and a cost C(A), optimize either: Option 1 ("scalarization"): max_A F(A) − C(A) s.t. A ⊆ V; can get the 2/5 approximation if F(A) − C(A) ≥ 0 for all A ⊆ V, but positiveness is a strong requirement. Option 2 ("constrained maximization"): max_A F(A) s.t. C(A) ≤ B; coming up…

25 Constrained maximization: Outline. Maximize a monotonic submodular utility F(A) of the selected set A subject to the selection cost C(A) staying within a budget B. First up: subset selection, where C(A) = |A|; later: complex constraints and robust optimization.

26 Monotonicity. A set function is called monotonic if A ⊆ B ⊆ V ⇒ F(A) ≤ F(B). Examples: influence in social networks [Kempe et al. KDD '03]; for discrete RVs, the entropy F(A) = H(X_A) is monotonic: suppose B = A ∪ C, then F(B) = H(X_A, X_C) = H(X_A) + H(X_C | X_A) ≥ H(X_A) = F(A); information gain F(A) = H(Y) − H(Y | X_A); set cover; matroid rank functions (dimension of vector spaces, …); …

27 Subset selection. Given: finite set V, monotonic submodular function F with F(∅) = 0. Want: A* ⊆ V maximizing F(A) subject to |A| ≤ k. NP-hard!

28 Exact maximization of monotonic submodular functions. 1) Mixed integer programming [Nemhauser et al. '81]: max η s.t. η ≤ F(B) + Σ_{s ∈ V \ B} α_s δ_s(B) for all B ⊆ V, Σ_s α_s ≤ k, α_s ∈ {0,1}, where δ_s(B) = F(B ∪ {s}) − F(B); solved using constraint generation. 2) Branch-and-bound: the "data-correcting algorithm" [Goldengorin et al. '99]. Both algorithms are worst-case exponential!

29 Approximate maximization. Given: finite set V, monotonic submodular function F(A). Want: A* ⊆ V maximizing F(A) subject to |A| ≤ k. NP-hard! Greedy algorithm: start with A_0 = ∅; for i = 1 to k: s_i := argmax_s F(A_{i−1} ∪ {s}) − F(A_{i−1}); A_i := A_{i−1} ∪ {s_i}.

30 Performance of greedy algorithm. Theorem [Nemhauser et al. '78]: Given a monotonic submodular function F with F(∅) = 0, the greedy maximization algorithm returns A_greedy with F(A_greedy) ≥ (1 − 1/e) max_{|A| ≤ k} F(A), i.e., ~63% of optimal.

31 Example: Submodularity of info-gain. Let Y_1, …, Y_m, X_1, …, X_n be discrete RVs and F(A) = IG(Y; X_A) = H(Y) − H(Y | X_A). F(A) is always monotonic, but NOT always submodular. Theorem [Krause & Guestrin UAI '05]: If the X_i are all conditionally independent given Y, then F(A) is submodular! Hence the greedy algorithm works; in fact, NO algorithm can do better than the (1 − 1/e) approximation!

32 Building a sensing chair [Mutlu, Krause, Forlizzi, Guestrin, Hodgins UIST '07]. People sit a lot: activity recognition in assistive technologies, seating pressure as a user interface. A chair equipped with 1 sensor per cm² achieves 82% accuracy on 10 postures (lean forward, slouch, lean left, …) [Tan et al.], but costs $16,000! Can we get similar accuracy with fewer, cheaper sensors?

33 How to place sensors on a chair? Model the sensor readings at locations V as random variables and predict the posture Y using a probabilistic model P(Y, V). Pick sensor locations A* ⊆ V to minimize the entropy. Placed sensors and ran a user study: before, 82% accuracy at $16,000; after, 79% accuracy at $100. Similar accuracy at <1% of the cost!

34 Monitoring water networks [Krause et al., J Wat Res Mgt 2008]. Contamination of drinking water could affect millions of people. Place sensors (Hach sensors, ~$14K each) to detect contaminations, using a simulator from the EPA. "Battle of the Water Sensor Networks" competition: where should we place sensors to quickly detect contamination?

35 Model-based sensing. The utility of placing sensors is based on a model of the world; for water networks, a water flow simulator from the EPA. The model predicts high-, medium-, and low-impact locations for a contamination; a sensor reduces impact through early detection. With V the set of all network junctions, F(A) = expected impact reduction from placing sensors at A (e.g., high impact reduction F(A) = 0.9 vs. low impact reduction F(A) = 0.01). Theorem [Krause et al., J Wat Res Mgt '08]: The impact reduction F(A) in water networks is submodular!

36 Battle of the Water Sensor Networks competition. Real metropolitan area network (12,527 nodes); water flow simulator provided by the EPA; 3.6 million contamination events; multiple objectives: detection time, affected population, … Place sensors that detect well "on average".

37 Bounds on the optimal solution [Krause et al., J Wat Res Mgt '08]. (Plot on the water networks data: population protected F(A), higher is better, vs. number of sensors placed, comparing the offline (Nemhauser) bound with the greedy solution.) The (1 − 1/e) bound is quite loose… can we get better bounds?

38 Data-dependent bounds [Minoux '78]. Suppose A is a candidate solution to argmax F(A) s.t. |A| ≤ k, and let A* = {s_1, …, s_k} be an optimal solution. Then F(A*) ≤ F(A ∪ A*) = F(A) + Σ_i [F(A ∪ {s_1, …, s_i}) − F(A ∪ {s_1, …, s_{i−1}})] ≤ F(A) + Σ_i (F(A ∪ {s_i}) − F(A)) = F(A) + Σ_i δ_{s_i}. For each s ∈ V \ A, let δ_s = F(A ∪ {s}) − F(A), and order the elements so that δ_1 ≥ δ_2 ≥ … ≥ δ_n. Then F(A*) ≤ F(A) + Σ_{i=1}^{k} δ_i.
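As a sketch, this bound is a one-liner given the same black-box set function F used in the greedy sketch earlier:

```python
def data_dependent_bound(F, V, A, k):
    """Minoux bound: F(A*) <= F(A) + sum of the k largest marginal gains w.r.t. A."""
    base = F(set(A))
    deltas = sorted((F(set(A) | {s}) - base for s in V if s not in A), reverse=True)
    return base + sum(deltas[:k])
```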

39 Bounds on the optimal solution [Krause et al., J Wat Res Mgt '08]. (Plot on the water networks data: sensing quality F(A), higher is better, vs. number of sensors placed, comparing the offline (Nemhauser) bound, the data-dependent bound, and the greedy solution.) Submodularity gives data-dependent bounds on the performance of any algorithm.

40 BWSN competition results [Ostfeld et al., J Wat Res Mgt 2008]. 13 participants; performance measured on 30 different criteria. (Bar chart of total score, higher is better: Krause et al., Berry et al., Dorini et al., Wu & Walski, Ostfeld & Salomons, Propato & Piller, Eliades & Polycarpou, Huang et al., Guan et al., Ghimire & Barkdoll, Trachtman, Gueli, Preis & Ostfeld; approach types: G = genetic algorithm, H = other heuristic, D = domain knowledge, E = "exact" method (MIP).) 24% better performance than the runner-up!

41 "Lazy" greedy algorithm [Minoux '78]. Lazy greedy algorithm: the first iteration is as usual; keep an ordered list of the marginal benefits Δ_i from the previous iteration; re-evaluate Δ_i only for the top element; if Δ_i stays on top, use it, otherwise re-sort. Note: very easy to compute online bounds, lazy evaluations, etc. [Leskovec et al. '07]
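A compact sketch of lazy greedy with a max-heap of stale marginal gains (same black-box F assumption as before). By submodularity a stale gain is always an upper bound on the true gain, so re-evaluating only the top element is safe:

```python
import heapq

def lazy_greedy(F, V, k):
    A, fA = set(), F(set())
    # Max-heap (via negation) of marginal gains, initialized exactly on the first pass.
    heap = [(-(F({s}) - fA), s) for s in V]
    heapq.heapify(heap)
    while len(A) < k and heap:
        _, s = heapq.heappop(heap)
        fresh = F(A | {s}) - fA                   # re-evaluate only the top element
        if not heap or fresh >= -heap[0][0]:
            A.add(s)                              # still on top: take it
            fA += fresh
        else:
            heapq.heappush(heap, (-fresh, s))     # otherwise re-insert with the updated gain
    return A
```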

42 Scaling up: submodularity to the rescue (result of lazy evaluation). Simulating all 3.6M contaminations took 2 weeks on 40 processors and produced 152 GB of data on disk (16 GB in main memory, compressed), giving a very accurate computation of F(A), but evaluation of F(A) was very slow: naive greedy needed 30 hours per 20 sensors, 6 weeks for all 30 settings. Using "lazy evaluations" (fast greedy): 1 hour per 20 sensors, done after 2 days! (Plot: running time in minutes, lower is better, vs. number of sensors selected, comparing exhaustive search over all subsets, naive greedy, and fast greedy.)

43 What about the worst case? [Krause et al., NIPS '07] A placement can detect well on "average-case" (accidental) contaminations, but knowing the sensor locations, an adversary contaminates where detection is poorest: two placements with very different average-case impact can have the same worst-case impact. Where should we place sensors to quickly detect in the worst case?

44 Constrained maximization: Outline. Maximize a utility function of the selected set subject to the selection cost staying within a budget. Subset selection done; next: robust optimization (then complex constraints).

45 Optimizing for the worst case. Use a separate utility function F_i for each contamination i: F_i(A) = impact reduction by sensors A for contamination i. Want to solve max_{|A| ≤ k} min_i F_i(A). A sensor set A may give a high F_s(A) for contamination at node s but a low F_r(A) for contamination at node r, while another set B gives high values for both. Each of the F_i is submodular; unfortunately, min_i F_i is not submodular! How can we solve this robust optimization problem?

46 How does the greedy algorithm do? Example (elements shown as icons on the slide): V has three elements and we can only buy k = 2; one element has F_1 = 1, F_2 = 0, another has F_1 = 0, F_2 = 1, and a third has F_1 = F_2 = ε. Greedy picks the ε element first, and whichever element it adds next, the other objective stays at ε, so greedy score: ε; optimal score: 1. Greedy does arbitrarily badly. Is there something better? Theorem [NIPS '07]: The problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation unless P = NP. Hence we can't find any approximation algorithm. Or can we?

47 Alternative formulation. If somebody told us the optimal value c, could we recover an optimal solution A*? Need to find A* with min_i F_i(A*) ≥ c and |A*| ≤ k. Is this any easier? Yes, if we relax the constraint |A| ≤ k.

48 Solving the alternative problem. Trick: for each F_i and c, define the truncation F'_{i,c}(A) = min(F_i(A), c), which remains submodular! Problem 1 (last slide): require min_i F_i(A) ≥ c; non-submodular, and we don't know how to solve it. Problem 2: require the average (1/m) Σ_i F'_{i,c}(A) = c; submodular, but c appears in a constraint. The two problems have the same optimal solutions, so solving one solves the other.

49 Maximization vs. coverage. Previously we wanted A* = argmax F(A) s.t. |A| ≤ k. Now we need to solve A* = argmin |A| s.t. F(A) ≥ Q. Greedy algorithm: start with A := ∅; while F(A) < Q and |A| < n: s* := argmax_s F(A ∪ {s}); A := A ∪ {s*}. Theorem [Wolsey et al.]: Greedy returns A_greedy with |A_greedy| ≤ (1 + log max_s F({s})) |A_opt|. For the bound, assume F is integral; if not, just round it.
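A sketch of this covering variant of greedy (same black-box F convention as the earlier sketches):

```python
def greedy_cover(F, V, Q):
    """Keep adding the best element until F(A) reaches the quota Q (or V is exhausted)."""
    V, A = list(V), set()
    while F(A) < Q and len(A) < len(V):
        A.add(max((s for s in V if s not in A), key=lambda s: F(A | {s}) - F(A)))
    return A
```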

50 Solving the alternative problem. Trick: for each F_i and c, define the truncation F'_{i,c}(A) = min(F_i(A), c). Problem 1 (last slide) is non-submodular and we don't know how to solve it; Problem 2, stated via the truncations, is submodular, so we can use the greedy (coverage) algorithm!

51 SATURATE algorithm [Krause et al., NIPS '07]. Given: set V, integer k, and monotonic submodular functions F_1, …, F_m. Initialize c_min = 0, c_max = min_i F_i(V). Do binary search: c = (c_min + c_max)/2; greedily find A_G such that F'_avg,c(A_G) = c, where F'_avg,c averages the functions truncated at threshold c; if |A_G| ≤ α k: increase c_min; if |A_G| > α k: decrease c_max; repeat until convergence.
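A condensed sketch of SATURATE assembled from the pieces above; greedy_cover is the coverage greedy sketched earlier, and the tolerance is an arbitrary choice for illustration:

```python
def saturate(Fs, V, k, alpha, tol=1e-3):
    """Binary search over the truncation threshold c; at each step greedily cover
    the averaged truncated objective F'_avg,c up to value c."""
    c_min, c_max = 0.0, min(F(set(V)) for F in Fs)
    best = set()
    while c_max - c_min > tol:
        c = (c_min + c_max) / 2
        F_avg = lambda A, c=c: sum(min(F(A), c) for F in Fs) / len(Fs)
        A_G = greedy_cover(F_avg, V, Q=c)     # A_G with F'_avg,c(A_G) = c (achievable since c <= min_i F_i(V))
        if len(A_G) <= alpha * k:
            best, c_min = A_G, c              # achievable with few elements: raise the threshold
        else:
            c_max = c                         # needs too many elements: lower the threshold
    return best
```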

52 Theoretical guarantees [Krause et al., NIPS '07]. Theorem: The problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation unless P = NP. Theorem: SATURATE finds a solution A_S such that min_i F_i(A_S) ≥ OPT_k and |A_S| ≤ α k, where OPT_k = max_{|A| ≤ k} min_i F_i(A) and α = 1 + log max_s Σ_i F_i({s}). Theorem: If there were a polynomial-time algorithm with a better factor β < α, then NP ⊆ DTIME(n^{log log n}).

53 Constrained maximization: Outline. Subset selection and robust optimization done; next: complex constraints on the selected set (utility function, selection cost, budget).

54 Other aspects: Complex constraints. We want max_A F(A) or max_A min_i F_i(A) subject to constraints; so far the constraint was |A| ≤ k. In practice, more complex constraints arise: different costs, C(A) ≤ B; locations need to be connected by paths [Chekuri & Pal, FOCS '05] [Singh et al., IJCAI '07] (lake monitoring); sensors need to communicate, i.e., form a routing tree (building monitoring).

55 Non-constant cost functions. For each s ∈ V, let c(s) > 0 be its cost (e.g., feature acquisition costs, …). The cost of a set is C(A) = Σ_{s ∈ A} c(s) (a modular function!). Want to solve A* = argmax F(A) s.t. C(A) ≤ B. Cost-benefit greedy algorithm: start with A := ∅; while there is an s ∈ V \ A with C(A ∪ {s}) ≤ B: s* := argmax over such s of [F(A ∪ {s}) − F(A)] / c(s); A := A ∪ {s*}.
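A sketch of the cost-benefit greedy rule (benefit-per-cost selection among still-affordable elements), again assuming a black-box F; the cost is passed as a plain dict:

```python
def cost_benefit_greedy(F, V, cost, B):
    """Greedy by marginal-gain-per-cost ratio, restricted to still-affordable elements."""
    A, spent = set(), 0.0
    while True:
        affordable = [s for s in V if s not in A and spent + cost[s] <= B]
        if not affordable:
            return A
        s_best = max(affordable, key=lambda s: (F(A | {s}) - F(A)) / cost[s])
        A.add(s_best)
        spent += cost[s_best]
```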

56 Performance of cost-benefit greedy. Want max_A F(A) s.t. C(A) ≤ 1, with two elements: F({a}) = 2ε, C({a}) = ε and F({b}) = 1, C({b}) = 1. Cost-benefit greedy picks a (ratio 2 vs. 1), and then cannot afford b! So cost-benefit greedy performs arbitrarily badly.

57 Cost-benefit optimization [Wolsey '82, Sviridenko '04, Leskovec et al. '07]. Theorem [Leskovec et al. KDD '07]: Let A_CB be the cost-benefit greedy solution and A_UC the unit-cost greedy solution (i.e., ignoring costs). Then max{F(A_CB), F(A_UC)} ≥ ½ (1 − 1/e) OPT. We can still compute online bounds and speed up using lazy evaluations. Note: one can also get a (1 − 1/e) approximation in time O(n^4) [Sviridenko '04], and slightly better than ½ (1 − 1/e) in O(n^2) [Wolsey '82].
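A small wrapper expressing the "run both, keep the better" rule; here unit-cost greedy means selecting by raw marginal gain while still respecting the real budget (reusing the cost_benefit_greedy sketch above):

```python
def unit_cost_greedy(F, V, cost, B):
    """Select by raw marginal gain, but only among elements that still fit the budget."""
    A, spent = set(), 0.0
    while True:
        affordable = [s for s in V if s not in A and spent + cost[s] <= B]
        if not affordable:
            return A
        s_best = max(affordable, key=lambda s: F(A | {s}) - F(A))
        A.add(s_best)
        spent += cost[s_best]

def best_of_both(F, V, cost, B):
    # max{F(A_CB), F(A_UC)} >= 1/2 (1 - 1/e) OPT  [Leskovec et al. KDD '07]
    candidates = [cost_benefit_greedy(F, V, cost, B), unit_cost_greedy(F, V, cost, B)]
    return max(candidates, key=lambda A: F(A))
```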

58 Example: Cascades in the blogosphere [Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, Glance '07]. An information cascade spreads through the blogosphere over time; blogs that learn about the story after us are too late. Which blogs should we read to learn about big cascades early?

59 Water vs. Web: placing sensors in water networks vs. selecting informative blogs. In both problems we are given a graph with nodes (junctions / blogs) and edges (pipes / links), and cascades spreading dynamically over the graph (contamination / citations); we want to pick nodes to detect big cascades early. In both applications, the utility functions are submodular [generalizes Kempe et al., KDD '03].

60 Performance on blog selection (~45k blogs). (Plots: cascades captured, higher is better, vs. number of blogs, comparing greedy, in-links, all out-links, # posts, and random; and running time in seconds, lower is better, vs. number of blogs selected, comparing exhaustive search over all subsets, naive greedy, and fast greedy.) Greedy outperforms the state-of-the-art heuristics, and there is a 700x speedup using submodularity!

61 Predicting the "hot" blogs. We want blogs that will be informative in the future, so split the data set: train on historic data, test on future data. Greedy on historic data, tested on the future, detects well on the training period but poorly afterwards: poor generalization! Compare to the "cheating" solution, greedy on the future tested on the future. Why is that? The blog selection "overfits" to the training data; we want blogs that continue to do well. (Plot: cascades captured vs. cost(A) = number of posts per day.) Let's see what goes wrong here.

62 Robust optimization for blog selection. Optimize the worst case over time intervals: F_i(A) = detections in interval i (on the slide, an "overfit" selection A scores F_1(A) = .5, F_2(A) = .8, F_3(A) = .6, F_4(A) = .01, F_5(A) = .02). Running SATURATE on these objectives yields a "robust" blog selection A*. Robust optimization acts like regularization!

63 Predicting the "hot" blogs. (Plot: cascades captured vs. cost(A) = number of posts per day, comparing greedy on historic data tested on the future, the robust solution tested on the future, and the "cheating" solution of greedy on the future tested on the future.) The robust solution gives 50% better generalization!