Minimizing general submodular functions

Similar presentations
Iterative Rounding and Iterative Relaxation

Primal-dual Algorithm for Convex Markov Random Fields Vladimir Kolmogorov University College London GDR (Optimisation Discrète, Graph Cuts et Analyse d'Images)
Algorithms for MAP estimation in Markov Random Fields Vladimir Kolmogorov University College London Tutorial at GDR (Optimisation Discrète, Graph Cuts.
1 LP, extended maxflow, TRW OR: How to understand Vladimirs most recent work Ramin Zabih Cornell University.
Beyond Convexity – Submodularity in Machine Learning
Primal Dual Combinatorial Algorithms Qihui Zhu May 11, 2009.
Linear Time Methods for Propagating Beliefs Min Convolution, Distance Transforms and Box Sums Daniel Huttenlocher Computer Science Department December,
Submodular Set Function Maximization via the Multilinear Relaxation & Dependent Rounding Chandra Chekuri Univ. of Illinois, Urbana-Champaign.
Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv: )
Factorial Mixture of Gaussians and the Marginal Independence Model Ricardo Silva Joint work-in-progress with Zoubin Ghahramani.
Graph Cut Algorithms for Computer Vision & Medical Imaging Ramin Zabih Computer Science & Radiology Cornell University Joint work with Y. Boykov, V. Kolmogorov,
Provable Submodular Minimization using Wolfe’s Algorithm Deeparnab Chakrabarty (Microsoft Research) Prateek Jain (Microsoft Research) Pravesh Kothari (U.
Parallel Double Greedy Submodular Maximization Xinghao Pan, Stefanie Jegelka, Joseph Gonzalez, Joseph Bradley, Michael I. Jordan.
ICCV 2007 tutorial Part III Message-passing algorithms for energy minimization Vladimir Kolmogorov University College London.
C&O 355 Mathematical Programming Fall 2010 Lecture 21 N. Harvey TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA A.
Combinatorial Algorithms for Market Equilibria Vijay V. Vazirani.
Dependent Randomized Rounding in Matroid Polytopes (& Related Results) Chandra Chekuri Jan VondrakRico Zenklusen Univ. of Illinois IBM ResearchMIT.
C. Olsson Higher-order and/or non-submodular optimization: Yuri Boykov jointly with Western University Canada O. Veksler Andrew Delong L. Gorelick C. NieuwenhuisE.
Variational Inference in Bayesian Submodular Models
1 s-t Graph Cuts for Binary Energy Minimization  Now that we have an energy function, the big question is how do we minimize it? n Exhaustive search is.
Learning with Inference for Discrete Graphical Models Nikos Komodakis Pawan Kumar Nikos Paragios Ramin Zabih (presenter)
1 Fast Primal-Dual Strategies for MRF Optimization (Fast PD) Robot Perception Lab Taha Hamedani Aug 2014.
Approximation Algoirthms: Semidefinite Programming Lecture 19: Mar 22.
1 Can this be generalized?  NP-hard for Potts model [K/BVZ 01]  Two main approaches 1. Exact solution [Ishikawa 03] Large graph, convex V (arbitrary.
Max-Min Fair Allocation of Indivisible Goods Amin Saberi Stanford University Joint work with Arash Asadpour TexPoint fonts used in EMF. Read the TexPoint.
The Submodular Welfare Problem Lecturer: Moran Feldman Based on “Optimal Approximation for the Submodular Welfare Problem in the Value Oracle Model” By.
Totally Unimodular Matrices Lecture 11: Feb 23 Simplex Algorithm Elliposid Algorithm.
Semidefinite Programming
1 Introduction to Linear and Integer Programming Lecture 9: Feb 14.
Introduction to Linear and Integer Programming Lecture 7: Feb 1.
2010/5/171 Overview of graph cuts. 2010/5/172 Outline Introduction S-t Graph cuts Extension to multi-label problems Compare simulated annealing and alpha-
Online Oblivious Routing Nikhil Bansal, Avrim Blum, Shuchi Chawla & Adam Meyerson Carnegie Mellon University 6/7/2003.
Lecture 10: Support Vector Machines
Graph-Cut Algorithm with Application to Computer Vision Presented by Yongsub Lim Applied Algorithm Laboratory.
Efficiently handling discrete structure in machine learning Stefanie Jegelka MADALGO summer school.
Extensions of submodularity and their application in computer vision
1 Spanning Tree Polytope x1 x2 x3 Lecture 11: Feb 21.
Probabilistic Inference Lecture 4 – Part 2 M. Pawan Kumar Slides available online
Submodularity in Machine Learning
CS774. Markov Random Field : Theory and Application Lecture 08 Kyomin Jung KAIST Sep
Approximation algorithms for sequential testing of Boolean functions Lisa Hellerstein Polytechnic Institute of NYU Joint work with Devorah Kletenik (Polytechnic.
CS774. Markov Random Field : Theory and Application Lecture 13 Kyomin Jung KAIST Oct
Martin Grötschel  Institute of Mathematics, Technische Universität Berlin (TUB)  DFG-Research Center “Mathematics for key technologies” (MATHEON) 
Chapter 1. Formulations 1. Integer Programming  Mixed Integer Optimization Problem (or (Linear) Mixed Integer Program, MIP) min c’x + d’y Ax +
Algorithms for MAP estimation in Markov Random Fields Vladimir Kolmogorov University College London.
Approximation Algorithms for Prize-Collecting Forest Problems with Submodular Penalty Functions Chaitanya Swamy University of Waterloo Joint work with.
5 Maximizing submodular functions Minimizing convex functions: Polynomial time solvable! Minimizing submodular functions: Polynomial time solvable!
2) Combinatorial Algorithms for Traditional Market Models Vijay V. Vazirani.
Tractable Higher Order Models in Computer Vision (Part II) Slides from Carsten Rother, Sebastian Nowozin, Pushmeet Kohli Microsoft Research Cambridge Presented.
A Unified Continuous Greedy Algorithm for Submodular Maximization Moran Feldman Roy SchwartzJoseph (Seffi) Naor Technion – Israel Institute of Technology.
Submodular set functions Set function z on V is called submodular if For all A,B µ V: z(A)+z(B) ¸ z(A[B)+z(AÅB) Equivalent diminishing returns characterization:
A global approach Finding correspondence between a pair of epipolar lines for all pixels simultaneously Local method: no guarantee we will have one to.
Maximizing Symmetric Submodular Functions Moran Feldman EPFL.
Algorithmic Game Theory and Internet Computing Vijay V. Vazirani 3) New Market Models, Resource Allocation Markets.
MAP Estimation in Binary MRFs using Bipartite Multi-Cuts Sashank J. Reddi Sunita Sarawagi Sundar Vishwanathan Indian Institute of Technology, Bombay TexPoint.
Submodularity Reading Group Submodular Function Minimization via Linear Programming M. Pawan Kumar
Energy minimization Another global approach to improve quality of correspondences Assumption: disparities vary (mostly) smoothly Minimize energy function:
Submodularity Reading Group Matroids, Submodular Functions M. Pawan Kumar
Lap Chi Lau we will only use slides 4 to 19
Topics in Algorithms Lap Chi Lau.
Non-additive Security Games
Joseph E. Gonzalez Postdoc, UC Berkeley AMPLab
Distributed Submodular Maximization in Massive Datasets
Coverage Approximation Algorithms
A Faster Algorithm for Computing the Principal Sequence of Partitions
Chapter 1. Formulations (BW)
Submodular Maximization Through the Lens of the Multilinear Relaxation
Chapter 1. Formulations.
Guess Free Maximization of Submodular and Linear Sums
Presentation transcript:

Minimizing general submodular functions CVPR 2015 Tutorial Stefanie Jegelka MIT

The set function view: F(S) = cost of buying the items in S together, or utility, or probability, … We will assume a black-box “oracle” to evaluate F.

Set functions and energy functions: any set function on subsets A of a ground set V … is a function on binary vectors, F: \{0,1\}^n \to \mathbb{R}, by identifying each set A with its indicator vector. Binary labeling problems = subset selection problems!
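This set ↔ binary-vector correspondence is easy to make concrete. A minimal sketch (the ground set and the toy function are hypothetical, not from the tutorial):

```python
# Sketch: a set function over V = {a, b, c, d} viewed as a function
# on binary indicator vectors (toy example).

V = ["a", "b", "c", "d"]

def F(S):
    """Toy set function: just the size of the subset."""
    return len(set(S))

def to_indicator(S):
    """Subset -> binary vector in {0,1}^n."""
    return [1 if v in S else 0 for v in V]

def F_on_vectors(x):
    """Evaluate F through its binary-vector representation."""
    S = {v for v, xi in zip(V, x) if xi == 1}
    return F(S)

# subset selection and binary labeling are the same problem:
assert F_on_vectors(to_indicator({"a", "c"})) == F({"a", "c"})
```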

Discrete labeling: sky, tree, house, grass; similar formulations arise in stereo and 3D segmentation.

Summarization

Influential subsets

Submodularity: extra cost of one drink vs. extra cost of a free refill → diminishing marginal costs.

The big picture: submodular functions connect combinatorial optimization with graph theory (Frank 1993), electrical networks (Narayanan 1997), game theory (Shapley 1970), matroid theory (Whitney, 1935), stochastic processes (Macchi 1975, Borodin 2009), and computer vision & machine learning. Key figures: G. Choquet, J. Edmonds, L. Lovász, L.S. Shapley.

Examples sensing: F(S) = information gained from locations S

Example: cover

Maximizing Influence Kempe, Kleinberg & Tardos 2003

Submodular set functions. Diminishing gains: for all A ⊆ B ⊆ V and e ∉ B, F(A ∪ {e}) − F(A) ≥ F(B ∪ {e}) − F(B). Union-Intersection: for all A, B ⊆ V, F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B).
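For intuition, the union-intersection condition can be brute-force checked on a small toy coverage function. A sketch (the sensor areas are made up for illustration):

```python
from itertools import chain, combinations

# Sketch: verify F(A) + F(B) >= F(A|B) + F(A&B) for a toy coverage function.
AREAS = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5, 6}}  # hypothetical sensors
V = list(AREAS)

def F(S):
    """Coverage: number of distinct points covered by the sensors in S."""
    covered = set()
    for v in S:
        covered |= AREAS[v]
    return len(covered)

def subsets(xs):
    """All subsets of xs."""
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def is_submodular(F, V):
    """Brute-force check of the union-intersection condition."""
    return all(F(A) + F(B) >= F(A | B) + F(A & B)
               for A in map(set, subsets(V)) for B in map(set, subsets(V)))

assert is_submodular(F, V)  # coverage functions are submodular
```

By contrast, a strictly supermodular function such as |S|² fails the same check.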

Submodularity: boolean & sets

Graph cuts. The cut of one edge is a submodular function; the cut of a large graph is a sum over its edges. Useful property: a sum of submodular functions is submodular.

Closedness properties: let F be submodular on V. The following are also submodular: Restriction: F′(S) = F(S ∩ W) for fixed W ⊆ V. Conditioning: F′(S) = F(S ∪ W) for fixed W ⊆ V. Reflection: F′(S) = F(V \ S).

Submodular optimization. Subset selection: min / max F(S). Minimizing submodular functions: next; maximizing submodular functions: this afternoon. Submodularity has convex … and concave aspects!

Minimizing submodular functions. Why? Energy minimization, variational inference (marginals), structured sparse estimation, … How? Graph cuts: fast, but not always possible; convex relaxations: can be fast, always possible; …

Submodularity & convexity. Any set function … is a function on binary vectors, F: \{0,1\}^n \to \mathbb{R}, i.e., a pseudo-boolean function.

Relaxation: idea

A relaxation (extension). Have: F(S) on the vertices of the cube; want: an extension f(x) to all x ∈ [0,1]^n. Write x as a combination of indicator vectors, x = \sum_{i=1}^k\; \alpha_i\, \mathbf{1}_{S_i}; e.g. for x with entries (1.0, 0.5, 0.2), the nested sets get weights (1.0 − 0.5), (0.5 − 0.2), and (0.2).

The Lovász extension. Have F(S); want an extension f(x). Use the unique decomposition with nested sets S_1 ⊃ S_2 ⊃ ⋯ ⊃ S_k: for x = \sum_i \alpha_i \mathbf{1}_{S_i}, define f(x) = \sum_i \alpha_i F(S_i).
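The nested-sets definition translates directly into code: sort the coordinates of x in decreasing order and accumulate marginal gains of F along the resulting chain. A sketch assuming F(∅) = 0 (function names are mine):

```python
def lovasz_extension(F, x, V):
    """Lovász extension: sort coordinates of x in decreasing order and
    accumulate marginal gains of F along the chain of threshold sets."""
    order = sorted(range(len(V)), key=lambda i: -x[i])
    f, S, prev = 0.0, set(), 0.0  # prev = F(empty set) = 0
    for i in order:
        S.add(V[i])
        val = F(frozenset(S))
        f += x[i] * (val - prev)
        prev = val
    return f

def cut(S):
    """One-edge cut function: F({1}) = F({2}) = 1, F(empty) = F({1,2}) = 0."""
    return 1 if len(S) == 1 else 0

# its Lovász extension is the total variation |x1 - x2|:
print(lovasz_extension(cut, [0.7, 0.3], [1, 2]))  # approx 0.4
```

For the one-edge cut this reproduces the "total variation" example from the slides: f(x) = |x₁ − x₂|.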

Examples: truncation; the cut function F(S) = \begin{cases} 1 &\text{ if } S = \{1\}, \,\{2\}\\ 0 &\text{ if } S = \emptyset,\, \{1,2\} \end{cases}, whose Lovász extension is |x_1 − x_2|: “total variation”!

Alternative characterization. Theorem (Lovász, 1983): the Lovász extension f is convex if and only if F is submodular.

Submodular polyhedra. Submodular polyhedron: \mathcal{P}_F = \{ y\in \mathbb{R}^n \mid y(A) \leq F(A) \text{ for all } A \subseteq \mathcal{V}\}. Base polytope: \mathcal{B}_F = \{y \in \mathcal{P}_F \mid y(\mathcal{V}) = F(\mathcal{V})\}. Example: F(\emptyset) = 0, F(\{a\}) = -1, F(\{b\}) = 2, F(\{a,b\}) = 0.

Base polytope: exponentially many constraints! Edmonds 1970 (“magic”): compute the argmax of a linear function over \mathcal{B}_F greedily in O(n log n); this is the basis of (almost all) submodular optimization: separation oracle, subgradient, …
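Edmonds' greedy algorithm is short enough to sketch: sort elements by decreasing weight and assign each the marginal gain of F along that order. A sketch assuming F(∅) = 0 (function names are mine; the example is the two-element table above):

```python
def greedy_vertex(F, w, V):
    """Edmonds' greedy algorithm: maximize <w, y> over the base polytope B_F.
    Sort elements by decreasing weight; y collects the marginal gains of F."""
    order = sorted(range(len(V)), key=lambda i: -w[i])
    y = [0.0] * len(V)
    S, prev = set(), 0.0  # F(empty set) = 0
    for i in order:
        S.add(V[i])
        val = F(frozenset(S))
        y[i] = val - prev
        prev = val
    return y

# example: F({}) = 0, F({a}) = -1, F({b}) = 2, F({a,b}) = 0
TABLE = {frozenset(): 0, frozenset("a"): -1,
         frozenset("b"): 2, frozenset("ab"): 0}
F = TABLE.__getitem__

print(greedy_vertex(F, [1.0, 0.5], ["a", "b"]))  # a vertex of B_F
```

Each distinct ordering of the weights picks out a vertex of \mathcal{B}_F, and the optimal value <w, y> equals the Lovász extension f(w) for w ≥ 0.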

Base polytopes: illustrations of \mathcal{B}_F in 2D (2 elements) and 3D (3 elements).

Convex relaxation: \min_{S \subseteq \mathcal{V}}\, F(S) \;=\; \min_{x \in [0,1]^n}\; f(x), a convex (non-smooth) optimization problem. The relaxation is exact, so submodular minimization is solvable in polynomial time! (Grötschel, Lovász, Schrijver 1981)

Submodular minimization: minimize f via subgradient descent, smoothing (special cases), or by solving the dual; combinatorial algorithms (foundations: Edmonds, Cunningham; first poly-time algorithms: Iwata-Fujishige-Fleischer 2001, Schrijver 2000; many more after that …).

Minimum-norm-point algorithm (Fujishige ’91, Fujishige & Isotani ’11). Proximal problem on the Lovász extension: \min_{x \in [0,1]^n} f(x) + \tfrac{1}{2}\|x\|^2, whose dual is the minimum-norm problem \min_{u \in B(F)} \tfrac{1}{2}\|u\|^2. Thresholding the solution, A^* = \{ i \mid u^*(i) \leq 0\}, gives A^* = \arg\min_{A \subseteq V} F(A): it minimizes F!

Minimum-norm-point algorithm. 1. Optimization: find u^* = \arg\min_{u \in B(F)} \tfrac{1}{2}\|u\|^2. 2. Rounding: A^* = \{i \mid u^*(i) \leq 0\}.

The bigger story: projection, proximal, parametric, and thresholding problems are all related via divide-and-conquer (Fujishige & Isotani 11, Nagano, Gallo-Grigoriadis-Tarjan 06, Hochbaum 01, Chambolle & Darbon 09, …).

Minimum-norm-point algorithm: how to solve it? 1. Optimization: find u^* = \arg\min_{u \in B(F)} \tfrac{1}{2}\|u\|^2. 2. Rounding: A^* = \{i \mid u^*(i) \leq 0\}. The polytope has exponentially many inequalities / faces, BUT we can do linear optimization over B(F) → Frank-Wolfe or Fujishige-Wolfe algorithm.

Frank-Wolfe: main idea
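Putting the pieces together, here is a sketch of plain Frank-Wolfe for the min-norm-point problem, using Edmonds' greedy algorithm as the linear minimization oracle (repeated here so the sketch is self-contained). The step-size rule and iteration count are standard but arbitrary choices, and the two-element example function is the one from the polyhedron slide:

```python
def greedy_vertex(F, w, V):
    """Linear optimization over B_F: maximize <w, y> (Edmonds' greedy)."""
    order = sorted(range(len(V)), key=lambda i: -w[i])
    y, S, prev = [0.0] * len(V), set(), 0.0  # F(empty set) = 0
    for i in order:
        S.add(V[i])
        val = F(frozenset(S))
        y[i] = val - prev
        prev = val
    return y

def min_norm_point_fw(F, V, iters=200):
    """Frank-Wolfe for min_{u in B(F)} ||u||^2. The LMO argmin <u, s>
    over B_F is the greedy vertex for weights -u."""
    u = greedy_vertex(F, [0.0] * len(V), V)  # start at any vertex
    for t in range(iters):
        s = greedy_vertex(F, [-ui for ui in u], V)  # linear minimization oracle
        gamma = 2.0 / (t + 2.0)                     # standard FW step size
        u = [(1 - gamma) * ui + gamma * si for ui, si in zip(u, s)]
    return u

# example: F({}) = 0, F({a}) = -1, F({b}) = 2, F({a,b}) = 0
TABLE = {frozenset(): 0, frozenset("a"): -1,
         frozenset("b"): 2, frozenset("ab"): 0}
u = min_norm_point_fw(TABLE.__getitem__, ["a", "b"])
A_star = {v for v, ui in zip(["a", "b"], u) if ui <= 0}  # rounding step
print(A_star)  # {'a'}, the minimizer of F
```

Fujishige-Wolfe replaces the fixed step size with Wolfe's active-set updates and converges much faster in practice; this sketch only illustrates the vertex-oracle idea.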

Empirically: convergence of the relaxation and of the discrete solution S for the min-norm-point algorithm (figure from Bach, 2012).

Recap: links to convexity. A submodular function F(S) has a convex extension f(x) that we can compute, and submodular minimization becomes convex optimization that we can solve. What can we do with it?

Links to convexity: what can we do with it? MAP inference / energy minimization (out of the box), variational inference (Djolonga & Krause 2014), structured sparsity (Bach 2010), decomposition & parallel algorithms.

Structured sparsity and submodularity

Sparse reconstruction. Assumption: x is sparse, so its support is a subset selection, e.g. S = \{1,3,4,7\}. Put a discrete regularizer on the support S of x, then relax to the convex envelope \Omega(x) = f(|x|). But the sparsity pattern is often not random …

Structured sparsity. Assumption: the support of x has structure; express it by a set function!

Preference for trees. Set function: F(T) < F(S) if T is a tree and S is not, with |S| = |T|. Use it as a regularizer?

Sparsity: from x sparse to x structured sparse via a submodular function. Discrete regularization on the support S of x, relaxed to the convex envelope \Omega(x) = f(|x|): the Lovász extension. Optimization: submodular minimization (min-norm point) (Bach 2010).

Special case: minimize a sum of submodular functions. “Easy” combinatorial algorithms (Kolmogorov 12, Fix-Joachims-Park-Zabih 13, Fix-Wang-Zabih 14) and convex relaxations.

Relaxation: the convex Lovász extension gives a tight relaxation, \min_{S \subseteq \mathcal{V}}\; \sum\nolimits_{i} F_i(S) \;\; = \; \min_{x \in [0,1]^n}\; \sum\nolimits_i f_i(x), and dual decomposition yields parallel algorithms (Komodakis-Paragios-Tziritas 11, Savchynskyy-Schmidt-Kappes-Schnörr 11, Jegelka-Bach-Sra 13).

Results: dual decomposition. Relaxation I (smooth dual) vs. relaxation II (non-smooth dual): convergence on the discrete problem; faster parallel algorithms (Jegelka, Bach, Sra 2013; Nishihara, Jegelka, Jordan 2014).

Summary. Submodular functions: diminishing returns/costs. Convex relaxations: exact relaxation, structured norms, fast algorithms. More soon: constraints; maximization: diversity, information.