Efficiently handling discrete structure in machine learning
Stefanie Jegelka
MADALGO summer school
Overview
– discrete labeling problems (MAP inference)
– (structured) sparse variable selection
– finding informative / influential subsets
Recurrent questions: how to model prior knowledge / assumptions (structure)? how to optimize efficiently?
Recurrent themes: convexity, submodularity, polyhedra
Intuition: min vs max
Sensing
Place sensors to monitor temperature.
Sensing
Y_s: temperature at location s; X_s: sensor value at location s, with X_s = Y_s + noise.
[Figure: graphical model linking sensor values x_1, ..., x_6 to temperatures y_1, ..., y_6.]
Where to measure to maximize information about Y? This objective is a monotone submodular function!
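A minimal sketch of such an objective, assuming the temperatures Y are jointly Gaussian with covariance Sigma and each sensor adds independent Gaussian noise; the function name and setup below are illustrative, not from the talk:

    import numpy as np

    def info_gain(S, Sigma, noise_var=0.1):
        """Mutual information I(Y; X_S) for jointly Gaussian Y,
        observed as X_s = Y_s + independent N(0, noise_var) noise.
        Monotone submodular in S under this noise model."""
        S = sorted(S)
        if not S:
            return 0.0
        K = Sigma[np.ix_(S, S)] + noise_var * np.eye(len(S))
        # I(Y; X_S) = H(X_S) - H(X_S | Y)
        #           = 0.5 * (logdet(Sigma_SS + noise*I) - |S| * log(noise))
        return 0.5 * (np.linalg.slogdet(K)[1] - len(S) * np.log(noise_var))

Under this independent-noise assumption the mutual information is monotone and submodular in S, which is exactly the structure the greedy guarantees below exploit.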
Maximizing influence
Maximizing diffusion
Each node v has a monotone submodular activation function f_v over its neighbors and a random threshold θ_v; v becomes activated if f_v(active neighbors) ≥ θ_v.
Theorem (Mossel & Roch 07): σ(S) = E[# active after n steps, given S initially active] is submodular.
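A Monte-Carlo sketch of these dynamics under the general-threshold semantics described above; the graph representation and the names sigma_hat and f are assumptions for illustration:

    import random

    def sigma_hat(graph, f, seed_set, n_samples=1000):
        """Monte-Carlo estimate of sigma(S): expected number of active
        nodes when seed_set is activated initially.
        graph: dict mapping node -> list of neighbors
        f: f(v, active_neighbors) in [0, 1], monotone submodular in the
           set of active neighbors (the activation function above)."""
        total = 0
        for _ in range(n_samples):
            theta = {v: random.random() for v in graph}  # random thresholds
            active = set(seed_set)
            changed = True
            while changed:                               # run to a fixed point
                changed = False
                for v in graph:
                    if v in active:
                        continue
                    nbrs = frozenset(u for u in graph[v] if u in active)
                    if f(v, nbrs) >= theta[v]:
                        active.add(v)
                        changed = True
            total += len(active)
        return total / n_samples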
Diversity priors
Samples should “spread out”.
Determinantal point processes
Given a normalized similarity matrix K, sample Y with probability P(Y) ∝ det(K_Y): determinants encode repulsion between similar items. F(Y) = log det(K_Y) is submodular (not monotone).
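A small sketch of this diversity score, assuming K is a positive definite NumPy kernel matrix (the helper name is my own):

    import numpy as np

    def log_det_score(Y, K):
        """F(Y) = log det(K_Y), the log-probability (up to normalization)
        of sampling Y from a DPP with kernel K. Submodular but not
        monotone: adding a near-duplicate item shrinks the determinant."""
        Y = sorted(Y)
        if not Y:
            return 0.0                      # det of the empty matrix is 1
        _, logdet = np.linalg.slogdet(K[np.ix_(Y, Y)])
        return logdet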
Diversity priors (Kulesza & Taskar 10)
Summarization (Lin & Bilmes 11)
Objective = relevance + diversity.
Submodular maximization: NP-hard
– generic case (assuming F ≥ 0): bi-directional greedy (BFNS12), local search (FMV07)
– monotone function (constrained): greedy (NWF78), relaxation (CCPV11)
– exact methods (NW81, GSTT99, KNTB09)
Monotone maximization
Greedy algorithm: start with S = ∅ and repeatedly add the element with the largest marginal gain, argmax_e F(S ∪ {e}) − F(S), until |S| = k.
Monotone maximization
Theorem (NWF78): for monotone submodular F with F(∅) = 0 and a cardinality constraint, greedy achieves F(S_greedy) ≥ (1 − 1/e) · max_{|S| ≤ k} F(S).
[Figure: sensor placement, information gain of greedy vs. optimal; empirically, greedy is near-optimal.]
Speedup in practice: “lazy greedy” (Minoux, 78).
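A compact sketch of greedy with Minoux-style lazy evaluations, which exploit that marginal gains can only shrink as S grows; names are illustrative:

    import heapq

    def lazy_greedy(F, ground_set, k):
        """Maximize a monotone submodular F under |S| <= k.
        Lazy evaluations (Minoux '78): since marginal gains only shrink
        as S grows, stale gains in a max-heap are valid upper bounds."""
        S, value = [], F(set())
        heap = [(-(F({e}) - value), e) for e in ground_set]
        heapq.heapify(heap)
        while heap and len(S) < k:
            _, e = heapq.heappop(heap)
            gain = F(set(S) | {e}) - value         # refresh the stale bound
            if not heap or gain >= -heap[0][0]:    # still the best: take it
                S.append(e)
                value += gain
            else:
                heapq.heappush(heap, (-gain, e))   # reinsert with fresh gain
        return S

In practice most popped elements are accepted without touching the rest of the heap, which is the source of the empirical speedup the slide mentions.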
More complex constraints
Ground set: pairs (camera location, pointing direction); a configuration is a subset of these pairs, scored by a monotone submodular sensing-quality model. A configuration is feasible if no camera points in two directions at once.
Matroids
S is independent if …
… |S| ≤ k (uniform matroid)
… S contains at most one element from each group (partition matroid)
… S contains no cycles (graphic matroid)
Downward closed: S independent and T ⊆ S implies T also independent.
Exchange property: S, U independent with |S| > |U| implies some e ∈ S \ U can be added to U so that U ∪ {e} is independent.
Hence all maximal independent sets have the same size.
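As a concrete illustration, a partition-matroid independence oracle takes only a few lines; this is a sketch with hypothetical names:

    from collections import Counter

    class PartitionMatroid:
        """S is independent if it contains at most capacity[g]
        elements from each group g."""

        def __init__(self, group_of, capacity):
            self.group_of = group_of    # element -> its group
            self.capacity = capacity    # group -> allowed count

        def is_independent(self, S):
            counts = Counter(self.group_of[e] for e in S)
            return all(c <= self.capacity[g] for g, c in counts.items())

The uniform matroid is the special case of a single group with capacity k.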
More complex constraints
The camera configurations form a partition matroid: one group per camera location, and S is independent if it contains at most one direction per location, i.e., no camera points in two directions at once.
Maximization over matroids
Greedy algorithm: repeatedly add the element with the largest marginal gain among those that keep S independent.
Maximization over matroids
Theorem (FNW78): greedy achieves a 1/2 approximation for maximizing a monotone submodular function subject to one matroid constraint.
Better: relaxation (continuous greedy) achieves a (1 − 1/e) approximation factor (CCPV11).
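A sketch of the greedy from the previous slide for an arbitrary independence oracle (such as the partition matroid above); names are illustrative:

    def matroid_greedy(F, ground_set, is_independent):
        """Greedy for monotone submodular F subject to one matroid:
        repeatedly add the feasible element of largest marginal gain.
        1/2-approximation by FNW78."""
        S = set()
        candidates = set(ground_set)
        while candidates:
            best, best_gain = None, 0.0
            for e in list(candidates):
                if not is_independent(S | {e}):
                    candidates.discard(e)   # stays infeasible as S grows
                    continue
                gain = F(S | {e}) - F(S)
                if best is None or gain > best_gain:
                    best, best_gain = e, gain
            if best is None:
                break
            S.add(best)
            candidates.discard(best)
        return S

Discarding an infeasible element permanently is safe because independence is downward closed: once S ∪ {e} is dependent, it stays dependent as S grows.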
Multilinear relaxation vs. Lovász extension
Multilinear relaxation: concave in certain directions; approximated by sampling.
Lovász extension: convex; computable in O(n log n).
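The Lovász extension needs only a sort plus n evaluations of F; a sketch assuming F(∅) = 0 and ground set {0, ..., n−1}:

    import numpy as np

    def lovasz_extension(F, x):
        """Lovász extension of F (with F(empty set) = 0) at x in [0,1]^n.
        Sort coordinates decreasingly and telescope:
        f(x) = sum_i x_[i] * (F(S_i) - F(S_{i-1})), where S_i holds the
        i largest coordinates. Convex iff F is submodular."""
        x = np.asarray(x, dtype=float)
        value, prev, S = 0.0, 0.0, set()
        for i in np.argsort(-x):              # O(n log n) sort
            S.add(int(i))
            cur = F(S)
            value += x[i] * (cur - prev)
            prev = cur
        return value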
Submodular maximization: NP-hard
– generic case (assuming F ≥ 0): bi-directional greedy (BFNS12), local search (FMV07)
– monotone function (constrained): greedy (NWF78), relaxation (CCPV11)
– exact methods (NW81, GSTT99, KNTB09)
Non-monotone maximization
[Figure: bi-directional greedy maintains two sets, A growing from ∅ and B shrinking from the full ground set {a, b, c, d, e, f}, deciding element by element.]
Theorem (BFNS12): for non-negative (possibly non-monotone) submodular F, the randomized bi-directional greedy returns S with E[F(S)] ≥ (1/2) · max_T F(T).
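A sketch of the randomized bi-directional (double) greedy; when both truncated gains are zero the element is added, following the usual convention (names illustrative):

    import random

    def double_greedy(F, ground_set):
        """Randomized bi-directional greedy (BFNS12) for non-negative,
        possibly non-monotone submodular F: E[F(S)] >= max_T F(T) / 2."""
        X, Y = set(), set(ground_set)
        for e in ground_set:
            a = max(F(X | {e}) - F(X), 0.0)   # value of keeping e
            b = max(F(Y - {e}) - F(Y), 0.0)   # value of discarding e
            if a + b == 0 or random.random() < a / (a + b):
                X.add(e)
            else:
                Y.discard(e)
        return X                               # X == Y after the pass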
Summary
– submodular maximization: NP-hard; 1/2 approximation
– constrained maximization: NP-hard, mostly constant approximation factors
– submodular minimization: exploit convexity, poly-time
– constrained minimization? special cases poly-time; many cases have polynomial lower bounds
Constraints
Ground set: edges in a graph. Minimum cut, matching, path, spanning tree.
Recall: MAP and cuts
Pairwise random field: minimize Σ_s θ_s(y_s) + Σ_(s,t) θ_st(y_s, y_t); with submodular pairwise potentials, MAP inference reduces to a minimum cut. What's the problem? A minimum cut prefers a short cut, i.e., a short object boundary.
[Figure: segmentation aim vs. reality.]
MAP and cuts
Minimum cut: minimize a sum of edge weights; implicit criterion: a short cut = a short boundary.
Minimum cooperative cut: minimize a submodular function of the cut edges, not a sum of edge weights! New criterion: the boundary may be long if it is homogeneous.
Reward co-occurrence of edges
Sum of weights: use few edges. Submodular cost function: use few groups S_i of edges.
[Figure: a cut with 7 edges of 4 types vs. a cut with 25 edges of 1 type; the group-based cost prefers the homogeneous one.]
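One common way to instantiate such a cost applies a concave function (here a square root) to the weight each group contributes; a sketch, since the exact F used in the talk may differ:

    import math

    def coop_cost(C, groups, w):
        """F(C) = sum over groups S_i of sqrt(w(C & S_i)).
        The concave sqrt discounts reuse of a group, so a long but
        homogeneous boundary (many edges, one type) can be cheaper
        than a short heterogeneous one."""
        return sum(math.sqrt(sum(w[e] for e in C & S_i)) for S_i in groups)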
Results
[Figure: segmentation results, graph cut vs. cooperative cut.]
Constrained optimization
Minimum cut / matching / path / spanning tree with submodular costs: convex relaxation, or minimize a surrogate function (Goel et al. '09, Iwata & Nagano '09, Goemans et al. '09, Jegelka & Bilmes '11, Iyer et al. '13, Kohli et al. '13, ...). Approximate optimization with approximation bounds depending on F: polynomial, constant, FPTAS.
Efficient constrained optimization (JB11, IJB13)
Minimize a series of surrogate functions:
1. compute a linear (modular) upper bound on F,
2. solve the easy sum-of-weights problem, and repeat.
Efficient: we only need to solve sum-of-weights problems.
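A sketch of this majorize-minimize loop, assuming F(∅) = 0. The modular upper bound below is one standard choice (in the spirit of Iyer et al.), and solve_modular stands in for any sum-of-weights solver (shortest path, min cut, ...); all names are illustrative:

    def mm_minimize(F, ground_set, solve_modular, n_iters=10):
        """Majorize-minimize for constrained submodular minimization:
        repeatedly minimize a modular upper bound that is tight at the
        current solution S. solve_modular(weights) must return a feasible
        set (path, tree, cut, ...) minimizing the sum of edge weights."""
        S = set(solve_modular({e: F({e}) for e in ground_set}))
        best = F(S)
        for _ in range(n_iters):
            # modular bound tight at S:
            #   F(T) <= const + sum_{e in T} w_e, with
            #   w_e = F(S) - F(S - {e}) if e in S, else F({e})
            w = {e: (F(S) - F(S - {e})) if e in S else F({e})
                 for e in ground_set}
            T = set(solve_modular(w))
            if F(T) >= best:
                break                  # no improvement: stop
            S, best = T, F(T)
        return S

Because the bound is tight at S and only over-estimates F elsewhere, every accepted iterate strictly decreases the true cost F.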
Does it work?
[Figure: solution cost of the Goemans et al. 2009 surrogate, majorize-minimize after 1 iteration, and the optimal solution.]
Empirical results are much better than the theoretical worst-case bounds!?
Does it work?
[Figure: minimum cut solution vs. approximate and optimal cooperative-cut solutions (Jegelka & Bilmes 2011; Kohli, Osokin, Jegelka 2013).]
Theory and practice
Worst-case lower bounds vs. practice: for trees, matchings, cuts, approximation and learning bounds from (Goel et al. '09, Iwata & Nagano '09, Jegelka & Bilmes '11, Goemans et al. '09, Svitkina & Fleischer '08, Balcan & Harvey '12).
Good approximations in practice … BUT not in theory? Theory says no good approximations are possible (in general). What makes some (practical) problems easier than others?
Curvature
Curvature κ_F = 1 − min_e [F(V) − F(V \ {e})] / F({e}) compares the marginal cost of an element given everything else to its single-item cost: small κ_F means nearly modular (close to opt cost), large κ_F means worst-case behavior.
Theorems (IJB 2013): tightened upper & lower bounds, in terms of curvature and the size of the ground set, for constrained minimization, approximation, and learning; for submodular maximization see (Conforti & Cornuéjols '84, Vondrák '08).
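Total curvature is computable in one pass over the ground set; a sketch assuming F(∅) = 0 and F({e}) > 0 for every e:

    def curvature(F, ground_set):
        """Total curvature: kappa = 1 - min_e [F(V) - F(V - {e})] / F({e}).
        kappa = 0 for modular F; values near 1 mean some element's
        marginal cost collapses given the rest (the hard regime)."""
        V = set(ground_set)
        FV = F(V)
        return 1.0 - min((FV - F(V - {e})) / F({e}) for e in ground_set)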
Curvature and approximations
[Figure: approximation factors as a function of curvature; smaller is better.]
If there were more time…
Learning submodular functions, adaptive submodular maximization, online learning/optimization, distributed algorithms, many more applications… worst case vs. the average practical case.
Pointers and references: http://www.cs.berkeley.edu/~stefje/madalgo/literature_list.pdf
Slides: http://www.cs.berkeley.edu/~stefje/madalgo/
Summary
– discrete labeling problems (MAP inference)
– (structured) sparse variable selection
– finding informative / influential subsets
Recurrent questions: how to model prior knowledge / assumptions (structure)? how to optimize efficiently?
Recurrent themes: convexity, submodularity, polyhedra
Submodularity and machine learning
Distributions over labels and sets: tractability often comes via submodularity, e.g., “attractive” graphical models, determinantal point processes.
(Convex) regularization: submodularity as “discrete convexity”, e.g., combinatorial sparse estimation.
Submodularity is behind a lot of machine learning!