Algorithms for MAP estimation in Markov Random Fields
Vladimir Kolmogorov, University College London

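Energy function
$E(x) = \sum_p \theta_p(x_p) + \sum_{(p,q)} \theta_{pq}(x_p, x_q)$
The first sum collects the unary terms (data); the second collects the pairwise terms (coherence).
- $x_p$ are discrete variables (for example, $x_p \in \{0,1\}$)
- $\theta_p(\cdot)$ are unary potentials
- $\theta_{pq}(\cdot,\cdot)$ are pairwise potentials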
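As a minimal sketch of the energy above, the following evaluates E(x) and finds its minimum by exhaustive search. The three-node chain, its potentials, and the names `energy` and `brute_force_min` are illustrative assumptions, not taken from the slides; exhaustive search is only feasible for tiny problems.

```python
from itertools import product

def energy(x, unary, pairwise):
    """E(x) = sum_p theta_p(x_p) + sum_{(p,q)} theta_pq(x_p, x_q)."""
    e = sum(unary[p][x[p]] for p in range(len(x)))
    e += sum(t[x[p]][x[q]] for (p, q), t in pairwise.items())
    return e

def brute_force_min(unary, pairwise, num_labels=2):
    """Try every labelling; returns (minimum energy, a minimising labelling)."""
    n = len(unary)
    return min((energy(x, unary, pairwise), x)
               for x in product(range(num_labels), repeat=n))

# Hypothetical 3-node chain: unary[p][j] is theta_p(j),
# pairwise[(p, q)][i][j] is theta_pq(i, j) (cost 1 when labels differ).
unary = [[0, 2], [1, 1], [3, 0]]
pairwise = {(0, 1): [[0, 1], [1, 0]], (1, 2): [[0, 1], [1, 0]]}

best_energy, best_x = brute_force_min(unary, pairwise)
print(best_energy, best_x)
```

The algorithms in the rest of the talk aim to reach (or bound) the same minimum without enumerating all labellings.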

Minimisation algorithms
Min Cut / Max Flow [Ford&Fulkerson ’56]
  [Greig, Porteous, Seheult ’89]: non-iterative (binary variables)
  [Boykov, Veksler, Zabih ’99]: iterative - alpha-expansion, alpha-beta swap, … (multi-valued variables)
  + If applicable, gives very accurate results
  – Can be applied only to a restricted class of functions
BP – Max-product Belief Propagation [Pearl ’86]
  + Can be applied to any energy function
  – In vision, results are usually worse than those of graph cuts
  – Does not always converge
TRW – Max-product Tree-reweighted Message Passing [Wainwright, Jaakkola, Willsky ’02], [Kolmogorov ’05]
  + Can be applied to any energy function
  + For stereo finds lower energy than graph cuts
  + Convergence guarantees for the algorithm in [Kolmogorov ’05]

Main idea: LP relaxation
Goal: Minimize energy E(x) under constraints $x_p \in \{0,1\}$
In general, NP-hard problem!
Relax discreteness constraints: allow $x_p \in [0,1]$
Results in a linear program. Can be solved in polynomial time!
[Figure: energy function with discrete variables next to its LP relaxation, for one case where the relaxation is tight and one where it is not tight]
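To illustrate a relaxation that is not tight, here is a small sketch on a hypothetical instance (not from the slides): a triangle in which every edge costs 1 when its endpoints take the same label. The linearisation with auxiliary variables `y[(p, q)]` standing in for the products x_p·x_q is an assumed, commonly used way to write the relaxed problem; the code only evaluates one feasible fractional point rather than solving the LP.

```python
from itertools import product

# Hypothetical instance: each edge costs 1 when its endpoints agree.
EDGES = [(0, 1), (1, 2), (0, 2)]

def linear_energy(x, y):
    """Edge cost [x_p == x_q] = 1 - x_p - x_q + 2*x_p*x_q,
    with y[(p, q)] used in place of the product x_p*x_q."""
    return sum(1 - x[p] - x[q] + 2 * y[(p, q)] for p, q in EDGES)

def is_feasible(x, y):
    """Relaxed constraints: x_p in [0, 1] plus linear bounds on y_pq."""
    if any(not 0 <= v <= 1 for v in x):
        return False
    return all(max(0, x[p] + x[q] - 1) <= y[(p, q)] <= min(x[p], x[q])
               for p, q in EDGES)

# Best integer labelling, by enumeration (y_pq is then exactly x_p*x_q).
integer_min = min(
    linear_energy(x, {(p, q): x[p] * x[q] for p, q in EDGES})
    for x in product((0, 1), repeat=3)
)

# A fractional point that satisfies the relaxed constraints.
x_frac = (0.5, 0.5, 0.5)
y_frac = {e: 0.0 for e in EDGES}
relaxed_value = linear_energy(x_frac, y_frac)

print(integer_min, relaxed_value, is_feasible(x_frac, y_frac))
```

Because a feasible fractional point scores below every integer labelling, the LP optimum for this instance lies strictly below the true minimum.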

Solving LP relaxation
Too large for general purpose LP solvers (e.g. interior point methods)
Solve dual problem instead of primal:
– Formulate lower bound on the energy
– Maximize this bound
– When done, solves primal problem (LP relaxation)
Two different ways to formulate lower bound
– Via posiforms: leads to maxflow algorithm
– Via convex combination of trees: leads to tree-reweighted message passing
[Figure: lower bound on the energy function E, shown against the energy function with discrete variables and its LP relaxation]

Notation and Preliminaries

Energy function - visualisation
[Figure: nodes p and q joined by edge (p,q), each node with label 0 and label 1; $\theta$ denotes the vector of all parameters]

Reparameterisation

Definition. $\theta'$ is a reparameterisation of $\theta$ if they define the same energy: $E(x \mid \theta') = E(x \mid \theta)$ for all x
Maxflow, BP and TRW perform reparameterisations
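A minimal sketch of one reparameterisation, under assumed names and toy potentials that are not from the slides: subtracting an amount `delta` from every pairwise entry that ends in a given label of node q, and adding the same amount to that label's unary term, changes the parameters but not the energy of any labelling.

```python
import copy
from itertools import product

def energy(x, unary, pairwise):
    """E(x) = sum_p theta_p(x_p) + sum_{(p,q)} theta_pq(x_p, x_q)."""
    e = sum(unary[p][x[p]] for p in range(len(x)))
    return e + sum(t[x[p]][x[q]] for (p, q), t in pairwise.items())

def move_to_node(unary, pairwise, edge, label, delta):
    """Shift `delta` from theta_pq(., label) into theta_q(label) for edge (p, q)."""
    p, q = edge
    new_unary = copy.deepcopy(unary)
    new_pairwise = copy.deepcopy(pairwise)
    for i in range(len(unary[p])):
        new_pairwise[edge][i][label] -= delta
    new_unary[q][label] += delta
    return new_unary, new_pairwise

# Hypothetical 3-node chain with binary labels.
unary = [[0, 2], [1, 1], [3, 0]]
pairwise = {(0, 1): [[0, 1], [1, 0]], (1, 2): [[0, 1], [1, 0]]}

new_unary, new_pairwise = move_to_node(unary, pairwise, (0, 1), 1, 5)
same_energy = all(
    energy(x, unary, pairwise) == energy(x, new_unary, new_pairwise)
    for x in product((0, 1), repeat=3)
)
print(same_energy)
```

The energy is preserved because any labelling with x_q equal to the chosen label uses exactly one of the reduced pairwise entries and the increased unary entry.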

Part I: Lower bound via posiforms (→ maxflow algorithm)

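Lower bound via posiforms [Hammer, Hansen, Simeone ’84]
$E(x \mid \theta') = \theta'_{const} + \sum_p \theta'_p(x_p) + \sum_{(p,q)} \theta'_{pq}(x_p, x_q)$, where $\theta'$ is a reparameterisation of $\theta$ whose terms $\theta'_p$ and $\theta'_{pq}$ are non-negative
$\theta'_{const}$ - lower bound on the energy: $\theta'_{const} \le E(x \mid \theta)$ for all x
Maximize $\theta'_{const}$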

Outline of part I
Maximisation algorithm?
– Consider functions of binary variables only
Maximising lower bound for submodular functions
– Definition of submodular functions
– Overview of min cut/max flow
– Reduction to max flow
– Global minimum of the energy
Maximising lower bound for non-submodular functions
– Reduction to max flow (more complicated graph)
– Part of optimal solution

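Submodular functions of binary variables
Definition: E is submodular if every pairwise term satisfies $\theta_{pq}(0,0) + \theta_{pq}(1,1) \le \theta_{pq}(0,1) + \theta_{pq}(1,0)$
Can be converted to “canonical form”
[Figure: canonical form, with zero-cost entries marked]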
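The submodularity condition for binary variables is a one-line test per pairwise term. The sketch below assumes each term is stored as a 2x2 table `t[i][j]` for the cost of labels (i, j); the function name and the example tables are illustrative, not from the slides.

```python
def is_submodular(pairwise):
    """True if every 2x2 pairwise table t satisfies
    t[0][0] + t[1][1] <= t[0][1] + t[1][0]."""
    return all(t[0][0] + t[1][1] <= t[0][1] + t[1][0]
               for t in pairwise.values())

# Penalising disagreement is submodular ...
smooth = {(0, 1): [[0, 1], [1, 0]]}
# ... while penalising agreement is not.
repulsive = {(0, 1): [[1, 0], [0, 1]]}

print(is_submodular(smooth), is_submodular(repulsive))
```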

Overview of min cut/max flow

Min Cut problem
Directed weighted graph with a source and a sink
Cut: S = {source, node 1}, T = {sink, node 2, node 3}
Cost(S,T) = 2
Task: Compute cut with minimum cost
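A small sketch of the problem statement: the cost of a cut (S, T) is the total weight of edges going from S to T, and the task is to minimise it. The edge weights of the slide's graph are not recoverable from the transcript, so the graph below is a hypothetical one chosen so that the best cut is also S = {source, node 1} with cost 2. Enumerating all cuts is only for illustration; maxflow solves the same problem in polynomial time.

```python
from itertools import combinations

def cut_cost(capacity, S):
    """Total weight of edges that leave S (edges from T back to S are free)."""
    return sum(c for (u, v), c in capacity.items() if u in S and v not in S)

def min_cut_brute_force(capacity, source, sink):
    """Enumerate every S that contains the source but not the sink."""
    inner = list({n for edge in capacity for n in edge} - {source, sink})
    best = None
    for k in range(len(inner) + 1):
        for subset in combinations(inner, k):
            S = {source, *subset}
            cost = cut_cost(capacity, S)
            if best is None or cost < best[0]:
                best = (cost, S)
    return best

# Hypothetical directed weighted graph: (tail, head) -> weight.
capacity = {
    ("source", 1): 3,
    (1, 2): 1,
    (1, 3): 1,
    (2, "sink"): 2,
    (3, "sink"): 2,
}

best_cost, best_S = min_cut_brute_force(capacity, "source", "sink")
print(best_cost, best_S)
```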

Maxflow algorithm
[Figure sequence: flow is pushed from source to sink step by step; value(flow) grows from 0 to 1 to 2]

Maximising lower bound for submodular functions: Reduction to maxflow

Maxflow algorithm and reparameterisation
[Figure sequence: the maxflow steps shown together with the corresponding reparameterisations of the energy, starting from value(flow)=0; the last frame reports the minimum of the energy]
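The reduction can be sketched end to end for a small submodular binary energy. Everything below is an illustrative assumption rather than the construction drawn on the slides: the graph layout (source side means label 0), the helper names `max_flow` and `minimise_submodular`, the plain augmenting-path maxflow (Edmonds-Karp), and the toy potentials. Each pairwise table is rewritten as a constant, two unary shifts and one non-negative edge weight, which is itself a reparameterisation.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp on {(u, v): capacity}. Returns (flow value, source side S)."""
    res, adj = {}, {}
    for (u, v), c in cap.items():
        res[(u, v)] = res.get((u, v), 0) + c
        res.setdefault((v, u), 0)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    value = 0
    while True:
        # Breadth-first search for a path with spare residual capacity.
        parent, queue = {s: None}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return value, set(parent)  # nodes still reachable from s
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[e] for e in path)
        for u, v in path:
            res[(u, v)] -= push
            res[(v, u)] += push
        value += push

def minimise_submodular(unary, pairwise):
    """Global minimum of a submodular binary energy via one maxflow call."""
    n = len(unary)
    u = [list(t) for t in unary]
    const, cap = 0, {}
    for (p, q), ((a, b), (c, d)) in pairwise.items():
        assert a + d <= b + c, "pairwise term must be submodular"
        # theta_pq = a + (c-a)[x_p=1] + (d-c)[x_q=1] + (b+c-a-d)[x_p=0, x_q=1]
        const += a
        u[p][1] += c - a
        u[q][1] += d - c
        cap[(p, q)] = cap.get((p, q), 0) + (b + c - a - d)
    for p in range(n):
        m = min(u[p])
        const += m
        cap[("s", p)] = u[p][1] - m  # paid when p ends on the sink side (label 1)
        cap[(p, "t")] = u[p][0] - m  # paid when p ends on the source side (label 0)
    value, source_side = max_flow(cap, "s", "t")
    labels = tuple(0 if p in source_side else 1 for p in range(n))
    return const + value, labels

# Hypothetical 3-node chain with binary labels.
unary = [[0, 2], [1, 1], [3, 0]]
pairwise = {(0, 1): [[0, 1], [1, 0]], (1, 2): [[0, 1], [1, 0]]}
min_energy, labels = minimise_submodular(unary, pairwise)
print(min_energy, labels)
```

On this toy instance the accumulated constant plus the flow value equals the energy of the returned labelling, which matches the minimum found by enumerating all eight labellings.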

Maximising lower bound for non-submodular functions

Arbitrary functions of binary variables
Maximize the lower bound, with all non-constant terms non-negative
Can be solved via maxflow [Boros, Hammer, Sun ’91]
– Specially constructed graph
Gives solution to LP relaxation: for each node $x_p \in \{0, 1/2, 1\}$
[Figure: energy E and its LP relaxation]

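Arbitrary functions of binary variables (cont’d)
[Figure: LP solution with node values 0, 1 and 1/2]
The nodes with integer values (0 or 1) form part of an optimal solution [Hammer, Hansen, Simeone ’84]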

Part II: Lower bound via convex combination of trees (→ tree-reweighted message passing)

Convex combination of trees [Wainwright, Jaakkola, Willsky ’02]
Goal: compute minimum of the energy for $\theta$: $\Phi(\theta) = \min_x E(x \mid \theta)$
In general, intractable!
Obtaining lower bound:
– Split $\theta$ into several components: $\theta = \theta^1 + \theta^2 + \dots$
– Compute minimum for each component: $\Phi(\theta^i) = \min_x E(x \mid \theta^i)$
– Combine $\Phi(\theta^1), \Phi(\theta^2), \dots$ to get a bound on $\Phi(\theta)$
Use trees!

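Convex combination of trees (cont’d)
[Figure: a graph and two of its trees, T and T’]
Write $\theta = \sum_T \rho^T \theta^T$ with weights $\rho^T \ge 0$ that sum to 1. Then $\Phi(\theta) \ge \sum_T \rho^T \Phi(\theta^T)$, where $\Phi(\theta) = \min_x E(x \mid \theta)$: the right-hand side is a lower bound on the energy
Maximize this lower bound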
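A minimal numerical sketch of the tree bound, on a hypothetical instance that is not from the slides: a triangle whose edges cost 1 when their endpoints agree, split into its three two-edge chains with uniform weights. The helper names are assumptions, and each tree's minimum is found by enumeration here for brevity, although on a tree it could be computed exactly by BP.

```python
from fractions import Fraction
from itertools import product

def energy(x, unary, pairwise):
    """E(x) = sum_p theta_p(x_p) + sum_{(p,q)} theta_pq(x_p, x_q)."""
    e = sum(unary[p][x[p]] for p in range(len(x)))
    return e + sum(t[x[p]][x[q]] for (p, q), t in pairwise.items())

def min_energy(unary, pairwise):
    """Minimum over all binary labellings (enumeration, tiny problems only)."""
    n = len(unary)
    return min(energy(x, unary, pairwise) for x in product((0, 1), repeat=n))

def tree_lower_bound(unary, pairwise, trees):
    """sum_T rho^T * min_x E(x | theta^T) with uniform rho.

    Each tree keeps the full unary terms; an edge shared by k trees is
    scaled by 1 / (rho * k) so that sum_T rho^T theta^T equals theta.
    """
    rho = Fraction(1, len(trees))
    count = {e: sum(e in t for t in trees) for e in pairwise}
    assert all(count.values()), "every edge must belong to some tree"
    bound = 0
    for t in trees:
        tree_pairwise = {
            e: [[Fraction(v) / (rho * count[e]) for v in row]
                for row in pairwise[e]]
            for e in t
        }
        bound += rho * min_energy(unary, tree_pairwise)
    return bound

# Hypothetical frustrated triangle: agreeing endpoints cost 1.
agree = [[1, 0], [0, 1]]
unary = [[0, 0], [0, 0], [0, 0]]
pairwise = {(0, 1): agree, (1, 2): agree, (0, 2): agree}
trees = [[(0, 1), (1, 2)], [(1, 2), (0, 2)], [(0, 1), (0, 2)]]

bound = tree_lower_bound(unary, pairwise, trees)
true_min = min_energy(unary, pairwise)
print(bound, true_min)
```

Each chain can alternate labels at zero cost, so the bound is 0 while the true minimum is 1: the bound is valid but not tight on this instance.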

TRW algorithms
Goal: find reparameterisation maximizing lower bound
Apply sequence of different reparameterisation operations:
– Node averaging
– Ordinary BP on trees
Order of operations?
– Affects performance dramatically
Algorithms:
– [Wainwright et al. ’02]: parallel schedule. May not converge
– [Kolmogorov ’05]: specific sequential schedule. Lower bound does not decrease, convergence guarantees

Node averaging

Belief propagation (BP) on trees
Send messages
– Equivalent to reparameterising node and edge parameters
Two passes (forward and backward)

Key property (Wainwright et al.): Upon termination $\theta_p$ gives min-marginals for node p: $\theta_p(j) = \min_{x : x_p = j} E(x \mid \theta) + const$
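A minimal sketch of two-pass min-sum BP on the simplest tree, a chain. The function name, the data layout and the toy potentials are assumptions for illustration. Combining the forward and backward messages at a node with its unary term gives that node's min-marginals, i.e. the lowest energy achievable with the node fixed to each label.

```python
def chain_min_marginals(unary, pairwise):
    """Min-sum BP on a chain 0 - 1 - ... - (n-1).

    unary[p][j] is theta_p(j); pairwise[p][i][j] is theta_{p,p+1}(i, j).
    Returns m with m[p][j] = min over labellings x with x_p = j of E(x).
    """
    n, k = len(unary), len(unary[0])
    fwd = [[0] * k for _ in range(n)]  # message reaching p from the left
    bwd = [[0] * k for _ in range(n)]  # message reaching p from the right
    # Forward pass: left to right.
    for p in range(1, n):
        for j in range(k):
            fwd[p][j] = min(fwd[p - 1][i] + unary[p - 1][i] + pairwise[p - 1][i][j]
                            for i in range(k))
    # Backward pass: right to left.
    for p in range(n - 2, -1, -1):
        for i in range(k):
            bwd[p][i] = min(bwd[p + 1][j] + unary[p + 1][j] + pairwise[p][i][j]
                            for j in range(k))
    return [[unary[p][j] + fwd[p][j] + bwd[p][j] for j in range(k)]
            for p in range(n)]

# Hypothetical 3-node chain with binary labels.
unary = [[0, 2], [1, 1], [3, 0]]
differ = [[0, 1], [1, 0]]  # cost 1 when neighbouring labels differ
min_marginals = chain_min_marginals(unary, [differ, differ])
print(min_marginals)
```

The smallest entry in any row is the global minimum of the energy, so on a tree the two passes solve the minimisation exactly.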

TRW algorithm of Wainwright et al. with tree-based updates (TRW-T)
Repeat:
– Run BP on all trees
– “Average” all nodes
If it converges, gives (local) maximum of lower bound
Not guaranteed to converge. Lower bound may go down.

Sequential TRW algorithm (TRW-S) [Kolmogorov ’05]
Repeat:
– Pick node p
– Run BP on all trees containing p
– “Average” node p

Main property of TRW-S Theorem: lower bound never decreases. Proof sketch:

TRW-S algorithm
Particular order of averaging and BP operations
Lower bound guaranteed not to decrease
There exists a limit point that satisfies the weak tree agreement condition
Efficiency?

Efficient implementation
Repeat:
– Pick node p
– Run BP on all trees containing p (inefficient?)
– “Average” node p

Efficient implementation
Key observation: Node averaging operation preserves messages oriented towards this node
Reuse previously passed messages!
Need a special choice of trees:
– Pick an ordering of nodes
– Trees: monotonic chains

Efficient implementation
Algorithm:
– Forward pass: process nodes in the increasing order; pass messages from lower neighbours
– Backward pass: do the same in reverse order
Linear running time of one iteration

Memory requirements
Additional advantage of TRW-S:
– Needs only half as much memory as standard message passing!
– Similar observation for bipartite graphs and parallel schedule was made in [Felzenszwalb&Huttenlocher ’04]
[Figure: messages stored by standard message passing vs. TRW-S]

Experimental results: binary segmentation (“GrabCut”)
[Plot: energy against time, average over 50 instances]

Experimental results: stereo
[Figure panels: left image, ground truth, BP, TRW-S]

Experimental results: stereo

Summary
MAP estimation algorithms are based on LP relaxation
– Maximize lower bound
Two ways to formulate lower bound
Via posiforms: leads to maxflow algorithm
– Polynomial time solution
– But: applicable for restricted energies (e.g. binary variables)
  Submodular functions: global minimum
  Non-submodular functions: part of optimal solution
Via convex combination of trees: leads to TRW algorithm
– Convergence in the limit (for TRW-S)
– Applicable to arbitrary energy function
Graph cuts vs. TRW:
– Accuracy: similar
– Generality: TRW is more general
– Speed: for stereo TRW is currently 2-5 times slower. But:
  3 vs. 50 years of research!
  More suitable for parallel implementation (GPU? Hardware?)

Discrete vs. continuous functionals
Continuous formulation (Geodesic active contours)
– Level sets: numerical stability?
– Geometrically motivated: invariant under rotation
Discrete formulation (Graph cuts)
– Maxflow algorithm: global minimum, polynomial-time
– Metrication artefacts?

Geo-cuts
Start from a continuous functional of the contour C
Construct graph such that, for smooth contours C, the cost of the corresponding cut approximates the functional
Class of continuous functionals? [Boykov&Kolmogorov ’03], [Kolmogorov&Boykov ’05]:
– Geometric length/area (e.g. Riemannian)
– Flux of a given vector field
– Regional term

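TRW formulation
Maximize $\sum_T \rho^T \min_x E(x \mid \theta^T)$ subject to $\sum_T \rho^T \theta^T$ being a reparameterisation of $\theta$, where
– $\{\theta^T\}$ is the collection of all parameter vectors, one per tree T
– $\rho$ is a fixed probability distribution on trees T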

Efficient implementation (cont’d)
[Figure sequence: forward and backward passes over the ordered nodes; legend: node being processed, valid messages]