Minimizing general submodular functions CVPR 2015 Tutorial Stefanie Jegelka MIT
The set function view: F(S) = cost of buying the items in S together, or utility, or probability, … We will assume a black-box “oracle” to evaluate F.
Set functions and energy functions: any set function F with F(\emptyset) = 0 … is a function on binary vectors (a set A corresponds to its indicator vector \mathbf{1}_A): F: \{0,1\}^n \to \mathbb{R}. Binary labeling problems = subset selection problems!
Discrete labeling: sky, tree, house, grass.
Summarization
Influential subsets
Submodularity: extra cost of a drink. Added to a small order: one drink at full price; added to a large order: a free refill. Diminishing marginal costs.
The big picture: submodular functions connect graph theory, electrical networks (Narayanan 1997), game theory (Shapley 1970), combinatorial optimization (Frank 1993), matroid theory (Whitney, 1935), stochastic processes (Macchi 1975, Borodin 2009), and computer vision & machine learning. (G. Choquet, J. Edmonds, L. Lovász, L.S. Shapley)
Examples sensing: F(S) = information gained from locations S
Example: cover
Maximizing Influence Kempe, Kleinberg & Tardos 2003
Submodular set functions. Diminishing gains: F(A \cup \{e\}) - F(A) \;\geq\; F(B \cup \{e\}) - F(B) for all A \subseteq B \subseteq \mathcal{V} and e \notin B. Union-Intersection: F(A) + F(B) \;\geq\; F(A \cup B) + F(A \cap B) for all A, B \subseteq \mathcal{V}.
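The diminishing-gains definition can be checked numerically. A small sketch (the coverage function and the particular sets below are my own illustration, not from the slides):

```python
# Coverage functions F(S) = |union of the ground sets chosen by S| are a
# classic submodular example; check the diminishing-gains inequality.

def make_cover_fn(sets):
    """F(S) = size of the union of the ground sets indexed by S."""
    def F(S):
        covered = set()
        for i in S:
            covered |= sets[i]
        return len(covered)
    return F

sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}}
F = make_cover_fn(sets)

A = frozenset({0})
B = frozenset({0, 1})        # A is a subset of B
e = 2                        # e is not in B

gain_A = F(A | {e}) - F(A)   # marginal gain of e given the smaller set
gain_B = F(B | {e}) - F(B)   # marginal gain of e given the larger set
print(gain_A, gain_B)        # 3 2: adding e helps less once B is larger
```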
Submodularity: boolean & sets
Graph cuts. Cut for one edge: the cut of one edge is submodular! Large graph: a sum over edges. Useful property: a sum of submodular functions is submodular.
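This composition can be verified exhaustively on a tiny graph; a sketch (the 4-cycle is my own example):

```python
# The graph cut function is a sum of one-edge cut functions, each submodular,
# so the whole cut is submodular. Verify the union-intersection inequality
# F(A) + F(B) >= F(A | B) + F(A & B) on every pair of subsets of a 4-cycle.
from itertools import combinations

def cut(edges, S):
    """Number of edges with exactly one endpoint in S (a sum of per-edge cuts)."""
    S = set(S)
    return sum(1 for (u, v) in edges if (u in S) != (v in S))

edges = [(0, 1), (1, 2), (2, 3), (0, 3)]   # a 4-cycle, as a toy example
V = {0, 1, 2, 3}

subsets = [set(c) for r in range(len(V) + 1) for c in combinations(V, r)]
ok = all(cut(edges, A) + cut(edges, B) >= cut(edges, A | B) + cut(edges, A & B)
         for A in subsets for B in subsets)
print(ok)  # True
```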
Closedness properties: F submodular on \mathcal{V}. The following are submodular: Restriction: F'(S) = F(S \cap W). Conditioning: F'(S) = F(S \cup W). Reflection: F'(S) = F(\mathcal{V} \setminus S).
Submodular optimization: subset selection, min / max F(S). Minimizing submodular functions: next. Maximizing submodular functions: this afternoon. Convex … and concave aspects!
Minimizing submodular functions Why? energy minimization variational inference (marginals) structured sparse estimation … How? graph cuts – fast, not always possible convex relaxations – can be fast, always possible …
Submodularity & convexity: any set function F with F(\emptyset) = 0 … is a function on binary vectors, a pseudo-boolean function F: \{0,1\}^n \to \mathbb{R}.
Relaxation: idea
A relaxation (extension): have F: \{0,1\}^n \to \mathbb{R}; want an extension f: [0,1]^n \to \mathbb{R} with f(\mathbf{1}_S) = F(S). Write x as a combination of indicator vectors of a chain of sets, x = \sum_{i=1}^k\; \alpha_i\, \mathbf{1}_{S_i}; e.g. x = (1.0, 0.5, 0.2) has coefficients (1.0 - 0.5), (0.5 - 0.2), 0.2 on the chain \{1\} \subset \{1,2\} \subset \{1,2,3\}.
The Lovász extension: have F; want an extension f. Define f(x) = \sum_{i=1}^k \alpha_i\, F(S_i) for the chain decomposition x = \sum_i \alpha_i \mathbf{1}_{S_i} obtained by sorting the entries of x in decreasing order.
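Computing the Lovász extension takes one sort plus n function evaluations. A minimal sketch (the function names are my own):

```python
# Lovász extension via sorting: visit coordinates in decreasing order of x and
# charge each coordinate the marginal gain of F along the growing chain of sets.

def lovasz_extension(F, x):
    """f(x) = sum_i x_i * (F(S_i) - F(S_{i-1})) along the sorted chain."""
    order = sorted(range(len(x)), key=lambda i: -x[i])
    total, S, F_prev = 0.0, set(), F(set())
    for i in order:
        S.add(i)
        F_S = F(S)
        total += x[i] * (F_S - F_prev)
        F_prev = F_S
    return total

def F_edge(S):
    """Cut of a single edge: 1 on the two singletons, 0 on empty and full set."""
    return 1 if len(S) == 1 else 0

print(lovasz_extension(F_edge, [1.0, 0.5]))  # 0.5 = |x1 - x2|: total variation
```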
Examples: truncation; cut function. The one-edge cut F(S) = \begin{cases} 1 &\text{ if } S = \{1\}, \,\{2\}\\ 0 &\text{ if } S = \emptyset,\, \{1,2\} \end{cases} has Lovász extension f(x) = |x_1 - x_2|: “total variation”!
Alternative characterization. Theorem (Lovász, 1983): the Lovász extension f is convex if and only if F is submodular.
Submodular polyhedra. Submodular polyhedron: \mathcal{P}_F = \{ y\in \mathbb{R}^n \mid y(A) \leq F(A) \text{ for all } A \subseteq \mathcal{V}\}. Base polytope: \mathcal{B}_F = \{y \in \mathcal{P}_F \mid y(\mathcal{V}) = F(\mathcal{V})\}. Example: \begin{tabular}{c|r} $A$ & $F(A)$\\ \hline $\emptyset$ & $0$\\ $\{a\}$ & $-1$ \\ $\{b\}$ & $2$\\ $\{a,b\}$ & $0$ \end{tabular}
Base polytope: exponentially many constraints! Edmonds 1970 (“magic”): compute \arg\max_{y \in \mathcal{B}_F} y^\top x in O(n log n) by a greedy algorithm. The basis of (almost all) optimization! (separation oracle, subgradient)
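Edmonds' greedy step can be sketched in a few lines (the two-element F is the example from the table above; the function name is my own):

```python
# Edmonds' greedy algorithm: sort elements by decreasing x and assign each the
# marginal gain of F in that order. The result is a vertex of the base polytope
# maximizing <y, x>, and also a subgradient of the Lovász extension at x.

def greedy_vertex(F, x):
    order = sorted(range(len(x)), key=lambda i: -x[i])
    y, S, F_prev = [0.0] * len(x), set(), F(set())
    for i in order:
        S.add(i)
        F_S = F(S)
        y[i] = F_S - F_prev
        F_prev = F_S
    return y

# Example from the table: F(empty)=0, F({a})=-1, F({b})=2, F({a,b})=0 (a=0, b=1).
table = {frozenset(): 0, frozenset({0}): -1, frozenset({1}): 2, frozenset({0, 1}): 0}
F = lambda S: table[frozenset(S)]
print(greedy_vertex(F, [1.0, 0.2]))  # a vertex of B_F; note it sums to F(V) = 0
```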
Base polytopes Base polytope 2D (2 elements) 3D (3 elements)
Convex relaxation: \min_{S \subseteq \mathcal{V}}\, F(S) relaxes to \min_{x \in [0,1]^n}\; f(x), a (non-smooth) convex optimization. The relaxation is exact! Submodular minimization in polynomial time! (Grötschel, Lovász, Schrijver 1981)
Submodular minimization: minimize f by subgradient descent, by smoothing (special cases), or by solving the dual with combinatorial algorithms. Foundations: Edmonds, Cunningham. First poly-time algorithms: (Iwata-Fujishige-Fleischer 2001, Schrijver 2000); many more after that …
Minimum-norm-point algorithm (Fujishige ‘91, Fujishige & Isotani ‘11). Lovász extension: proximal problem \min_{x \in [0,1]^n} f(x) + \tfrac{1}{2}\|x\|^2; dual: minimum norm problem \min_{u \in B(F)} \tfrac{1}{2}\|u\|^2. Thresholding the solution, A^* = \{ i \mid u^*(i) \leq 0\}, minimizes F: A^* = \arg\min_{A \subseteq V} F(A).
Minimum-norm-point algorithm: 1. optimization: find u^* = \arg\min_{u \in B(F)} \|u\|^2; 2. rounding: threshold at zero, e.g. coordinates (-0.5, 0.8, 1.0, \dots) select only the nonpositive elements.
The bigger story: projection, proximal, parametric, thresholding; divide-and-conquer (Fujishige & Isotani 11, Nagano, Gallo-Grigoriadis-Tarjan 06, Hochbaum 01, Chambolle & Darbon 09, …)
Minimum-norm-point algorithm: how to solve it? 1. optimization: find u^*; 2. rounding. The polytope B(F) has exponentially many inequalities / faces, BUT we can do linear optimization over it: use the Frank-Wolfe or Fujishige-Wolfe algorithm.
Frank-Wolfe: main idea. At each step, minimize a linear approximation of the objective over the polytope (cheap via the greedy algorithm) and move toward the resulting vertex.
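A rough end-to-end sketch under stated assumptions: plain Frank-Wolfe with step size 2/(t+2), not the much faster Fujishige-Wolfe variant the slides mention; the toy F is the two-element example from the earlier table, and all function names are my own:

```python
# Frank-Wolfe for the min-norm point: the linear subproblem argmin_{s in B_F}
# <s, u> is solved by Edmonds' greedy with scores -u; then step toward that
# vertex. Finally threshold u* at 0 to recover a minimizer of F.

def greedy_vertex(F, x):
    order = sorted(range(len(x)), key=lambda i: -x[i])
    y, S, F_prev = [0.0] * len(x), set(), F(set())
    for i in order:
        S.add(i)
        F_S = F(S)
        y[i] = F_S - F_prev
        F_prev = F_S
    return y

def min_norm_point_fw(F, n, iters=500):
    u = greedy_vertex(F, [0.0] * n)              # start at some vertex of B_F
    for t in range(iters):
        s = greedy_vertex(F, [-ui for ui in u])  # linear minimization oracle
        gamma = 2.0 / (t + 2.0)                  # standard Frank-Wolfe step
        u = [(1 - gamma) * ui + gamma * si for ui, si in zip(u, s)]
    return u

table = {frozenset(): 0, frozenset({0}): -1, frozenset({1}): 2, frozenset({0, 1}): 0}
F = lambda S: table[frozenset(S)]
u = min_norm_point_fw(F, 2)
A_star = {i for i, ui in enumerate(u) if ui <= 0}
print(A_star)  # {0}: F({a}) = -1 is the minimum value of F
```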
Empirically: convergence of the relaxation vs. convergence of the discrete solution S for the min-norm point algorithm. (Figure from Bach, 2012)
Recap – links to convexity submodular function F(S) convex extension f(x) --- can compute it! submodular minimization as convex optimization -- can solve it! What can we do with it?
Links to convexity What can we do with it? MAP inference / energy minimization (out-of-the-box) variational inference (Djolonga & Krause 2014) structured sparsity (Bach 2010) decomposition & parallel algorithms
Structured sparsity and submodularity
Sparse reconstruction. Assumption: x is sparse. Subset selection, e.g. S = {1,3,4,7}: a discrete regularization on the support S of x; relax to the convex envelope \Omega(x) = f(|x|). But the sparsity pattern is often not random…
Structured sparsity Assumption: support of x has structure express by set function!
Preference for trees. Set function: F(T) < F(S) if T is a tree and S is not, with |S| = |T|. Use as a regularizer?
Sparsity: x sparse vs. x structured sparse. A submodular function F(S) as discrete regularization on the support S of x; relax to the convex envelope \Omega(x) = f(|x|) via the Lovász extension. Optimization: submodular minimization (min-norm). (Bach 2010)
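A quick sanity check of this construction (my own sketch): for the plain cardinality function, \Omega(x) = f(|x|) should recover the familiar \ell_1 norm, and submodular F generalize it to structured penalties.

```python
# Plugging F(S) = |S| into the Lovász extension of |x| recovers the l1 norm.

def lovasz_extension(F, x):
    order = sorted(range(len(x)), key=lambda i: -x[i])
    total, S, F_prev = 0.0, set(), F(set())
    for i in order:
        S.add(i)
        F_S = F(S)
        total += x[i] * (F_S - F_prev)
        F_prev = F_S
    return total

F_card = len                       # cardinality: F(S) = |S|
x = [0.3, -1.2, 0.0, 2.0]
omega = lovasz_extension(F_card, [abs(v) for v in x])
print(omega)  # approximately 3.5 = ||x||_1
```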
Special case minimize a sum of submodular functions “easy” combinatorial algorithms (Kolmogorov 12, Fix-Joachims-Park-Zabih 13, Fix-Wang-Zabih 14) convex relaxations
Relaxation convex Lovász extension: tight relaxation dual decomposition: parallel algorithms (Komodakis-Paragios-Tziritas 11, Savchynskyy-Schmidt-Kappes-Schnörr 11, J-Bach-Sra 13) \min_{S \subseteq \mathcal{V}}\; \sum\nolimits_{i} F_i(S) \;\; = \; \min_{x \in [0,1]^n}\; \sum\nolimits_i f_i(x)
Results: dual decomposition. Convergence plots (relaxation I, relaxation II, discrete problem): the smooth dual converges faster than the non-smooth dual; faster parallel algorithms. (Jegelka, Bach, Sra 2013; Nishihara, Jegelka, Jordan 2014)
Summary. Submodular functions: diminishing returns/costs. Convex relaxations: exact relaxation, structured norms, fast algorithms. More soon: constraints; maximization: diversity, information.