Discrete Optimization in Computer Vision Nikos Komodakis Ecole des Ponts ParisTech, LIGM Traitement de l’information et vision artificielle
Message passing algorithms for energy minimization
Message-passing algorithms. Central concept: messages. These methods work by propagating messages across the MRF graph. They are widely used in many areas.
Message-passing algorithms. But how do messages relate to optimizing the energy? Let’s look at a simple example first: the case where the MRF graph is a chain.
Message-passing on chains: MRF graph (figure)
Message-passing on chains: corresponding lattice or trellis (figure)
Message-passing on chains: global minimum in linear time. Optimization proceeds in two passes: a forward pass (dynamic programming) and a backward pass.
Message-passing on chains (example on board) (algebraic derivation of messages)
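The board derivation is not part of these notes. As a rough sketch of how it might go, assume a chain p - q - r - s with unary potentials \theta_p(\cdot) and pairwise potentials \theta_{pq}(\cdot,\cdot). These symbols, and the message names M_{pq}, are assumptions introduced here for illustration, not notation taken from the slides. Pushing each minimization as far inward as possible, using min(a+b, a+c) = a + min(b,c), gives one message per edge:

```latex
\begin{align*}
E(x) &= \theta_p(x_p) + \theta_{pq}(x_p,x_q) + \theta_q(x_q) + \theta_{qr}(x_q,x_r)
        + \theta_r(x_r) + \theta_{rs}(x_r,x_s) + \theta_s(x_s) \\
M_{pq}(j) &= \min_{i}\bigl[\theta_p(i) + \theta_{pq}(i,j)\bigr] \\
M_{qr}(k) &= \min_{j}\bigl[\theta_q(j) + \theta_{qr}(j,k) + M_{pq}(j)\bigr] \\
M_{rs}(l) &= \min_{k}\bigl[\theta_r(k) + \theta_{rs}(k,l) + M_{qr}(k)\bigr] \\
\min_{x} E(x) &= \min_{l}\bigl[\theta_s(l) + M_{rs}(l)\bigr]
\end{align*}
```

Each message depends only on the previous one, which is why every partial minimization is computed once and then reused.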
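Message-passing on chains (figure: chain MRF with nodes p, q, r, s)
Forward pass (dynamic programming): messages are passed along the chain from p to q, from q to r, and from r to s.
Min-marginal for node s and label j: the unary potential of s at label j plus the message that s receives from r.
Backward pass: the optimal labels are recovered in the order x_s, x_r, x_q, x_p.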
Message-passing on chains. How can we compute the min-marginals for any node in the chain? How can we compute the min-marginals for all nodes efficiently? What is the running time of message-passing on chains?
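One possible answer to these questions, as a minimal Python sketch rather than the lecture's own implementation: run the forward messages once, run the mirror-image messages once in the opposite direction, and add both to the unary potential at each node. The function name `chain_min_sum`, the list-based data layout, and the toy potentials are all assumptions made for illustration.

```python
def chain_min_sum(unary, pairwise):
    """Min-sum message passing on a chain (illustrative sketch).

    unary[t][i]       : cost of giving label i to node t
    pairwise[t][i][j] : cost of labels (i, j) on the edge (t, t+1)
    Returns (optimal labels, minimum energy, min-marginals of every node).
    """
    n, L = len(unary), len(unary[0])

    # Forward pass: fwd[t][j] = cheapest cost of nodes 0..t-1 given x_t = j.
    fwd = [[0] * L for _ in range(n)]
    arg = [[0] * L for _ in range(n)]  # arg[t][j]: best label of node t-1 given x_t = j
    for t in range(1, n):
        for j in range(L):
            costs = [fwd[t - 1][i] + unary[t - 1][i] + pairwise[t - 1][i][j]
                     for i in range(L)]
            arg[t][j] = min(range(L), key=costs.__getitem__)
            fwd[t][j] = costs[arg[t][j]]

    # Messages in the opposite direction: bwd[t][i] = cheapest cost of
    # nodes t+1..n-1 given x_t = i. Needed for min-marginals of all nodes.
    bwd = [[0] * L for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for i in range(L):
            bwd[t][i] = min(bwd[t + 1][j] + unary[t + 1][j] + pairwise[t][i][j]
                            for j in range(L))

    # Min-marginal = unary potential + all incoming messages.
    min_marginals = [[unary[t][j] + fwd[t][j] + bwd[t][j] for j in range(L)]
                     for t in range(n)]

    # Backward pass: fix the last node, then follow the stored argmins.
    labels = [0] * n
    labels[-1] = min(range(L), key=min_marginals[-1].__getitem__)
    for t in range(n - 1, 0, -1):
        labels[t - 1] = arg[t][labels[t]]
    return labels, min(min_marginals[-1]), min_marginals


# Toy problem (made-up numbers): 4 nodes, 2 labels, Potts-like edge costs.
unary = [[0, 2], [1, 0], [3, 0], [0, 1]]
pairwise = [[[0, 1], [1, 0]] for _ in range(3)]
labels, best_energy, min_marginals = chain_min_sum(unary, pairwise)
```

In this sketch every edge is visited twice and each visit costs O(L^2) for L labels, so the total work is O(n L^2) for a chain of n nodes, i.e. linear in the chain length.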
Message-passing on trees. We can apply the same idea to tree-structured graphs, a slight generalization from chains. The resulting algorithm is called belief propagation (also known by many other names, e.g., max-product, min-sum; for chains, it is also often called the Viterbi algorithm).
Belief propagation (BP)
BP on a tree [Pearl’88] (figure: tree with nodes p, q, r, with the root and a leaf marked). Dynamic programming: global minimum in linear time. BP: an inward pass (dynamic programming) followed by an outward pass; gives min-marginals.
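Inward pass (dynamic programming) (figure: tree with nodes p, q, r): messages are passed from the leaves towards the root.
Outward pass: messages are passed from the root back towards the leaves.
BP on a tree: min-marginals. Min-marginal for node q and label j: the unary potential of q at label j plus the sum of the messages that q receives from its neighbours.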
Belief propagation: message-passing on trees
Min-marginals = sum of all incoming messages + unary potential
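Written out as formulas, this might look as follows. The notation is an assumption introduced here for illustration: \theta_q and \theta_{pq} are the unary and pairwise potentials, N(q) is the set of neighbours of q, M_{pq} is the message from p to q, and \mu_q is the min-marginal of q.

```latex
\begin{align*}
M_{pq}(j) &= \min_{i}\Bigl[\theta_p(i) + \theta_{pq}(i,j)
             + \sum_{o \in N(p)\setminus\{q\}} M_{op}(i)\Bigr] \\
\mu_q(j) &= \theta_q(j) + \sum_{p \in N(q)} M_{pq}(j)
          \;=\; \min_{x \,:\, x_q = j} E(x)
\end{align*}
```

The second equality holds on trees once both passes have finished, so that every node has received a message from each of its neighbours.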
What is the running time of message-passing for trees?
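To make the question concrete, here is a minimal Python sketch of min-sum BP on a tree, under assumed data structures (an edge list plus a dictionary of pairwise cost tables keyed by edge). The name `tree_min_marginals` and the toy numbers are hypothetical. It sends exactly one message per edge in each pass, 2(n-1) messages in total for n nodes.

```python
from collections import deque


def tree_min_marginals(unary, edges, pairwise, root=0):
    """Min-sum BP on a tree (illustrative sketch).

    unary[v][i]           : cost of label i at node v (nodes are 0..n-1)
    edges                 : list of (u, v) pairs forming a tree
    pairwise[(u, v)][i][j]: cost of x_u = i, x_v = j, for each (u, v) in edges
    """
    n, L = len(unary), len(unary[0])
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)

    def cost(u, v, i, j):
        # Look the table up in whichever orientation the edge was stored.
        if (u, v) in pairwise:
            return pairwise[(u, v)][i][j]
        return pairwise[(v, u)][j][i]

    # Breadth-first order from the root: every parent precedes its children.
    parent, order, queue = {root: None}, [], deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in nbrs[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)

    msg = {}  # msg[(u, v)][j]: message from u to v, evaluated at x_v = j

    def send(u, v):
        # Unary cost of u plus everything u has heard from neighbours other than v.
        local = [unary[u][i] + sum(msg[(w, u)][i] for w in nbrs[u] if w != v)
                 for i in range(L)]
        msg[(u, v)] = [min(local[i] + cost(u, v, i, j) for i in range(L))
                       for j in range(L)]

    for u in reversed(order[1:]):  # inward pass: leaves towards root
        send(u, parent[u])
    for u in order[1:]:            # outward pass: root towards leaves
        send(parent[u], u)

    # Min-marginal = unary potential + sum of all incoming messages.
    return [[unary[v][j] + sum(msg[(w, v)][j] for w in nbrs[v]) for j in range(L)]
            for v in range(n)]


# Toy tree (made-up numbers): node 0 is the root, 5 nodes, 2 labels.
unary = [[0, 2], [1, 0], [3, 0], [0, 1], [2, 0]]
edges = [(0, 1), (0, 2), (2, 3), (2, 4)]
pairwise = {e: [[0, 1], [1, 0]] for e in edges}
pairwise[(2, 3)] = [[0, 2], [1, 0]]  # a non-symmetric table
min_marginals = tree_min_marginals(unary, edges, pairwise)
```

Each message costs O(L^2) for the minimization over label pairs, which suggests O(n L^2) overall. This sketch additionally re-sums the incoming messages for every outgoing message, which adds overhead at high-degree nodes; caching those sums would avoid it.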
Message-passing on chains. Essentially, message passing on chains is dynamic programming. Dynamic programming means reuse of computations.
Generalizing belief propagation. Key property: min(a+b, a+c) = a + min(b,c). BP can be generalized to any pair of operators satisfying the above property. E.g., instead of (min,+), we could have: (max,*), where the resulting algorithm is called max-product (what does it compute?); or (+,*), where the resulting algorithm is called sum-product (what does it compute?).
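A small Python sketch can make the generalization tangible: the same elimination loop on a chain, with the two operators passed in as arguments. The function name `chain_eliminate` and the toy potentials are assumptions for illustration. With costs and (min,+) it returns the minimum energy; with factors exp(-cost) and (max,*) it returns the largest product of factors, i.e. the unnormalized probability of the best labeling; with (+,*) it returns the sum over all labelings, i.e. the normalization constant. Note that max(a*b, a*c) = a*max(b,c) requires a >= 0, which holds for such factors.

```python
import math
import operator


def chain_eliminate(unary, pairwise, combine, aggregate):
    """Eliminate the nodes of a chain from left to right (illustrative sketch).

    combine   plays the role of '+' in min-sum (e.g. operator.add, operator.mul)
    aggregate plays the role of 'min' in min-sum (e.g. min, max, sum)
    """
    n, L = len(unary), len(unary[0])
    local = list(unary[0])  # unary term combined with the incoming message
    for t in range(n - 1):
        # Message from node t to node t+1, one entry per label j of node t+1.
        msg = [aggregate(combine(local[i], pairwise[t][i][j]) for i in range(L))
               for j in range(L)]
        local = [combine(unary[t + 1][j], msg[j]) for j in range(L)]
    return aggregate(local)


# Toy chain (made-up numbers): 3 nodes, 2 labels.
cost_unary = [[0, 2], [1, 0], [3, 0]]
cost_pair = [[[0, 1], [1, 0]] for _ in range(2)]

# The same problem expressed as non-negative factors exp(-cost).
fac_unary = [[math.exp(-c) for c in row] for row in cost_unary]
fac_pair = [[[math.exp(-c) for c in row] for row in table] for table in cost_pair]

min_sum = chain_eliminate(cost_unary, cost_pair, operator.add, min)
max_prod = chain_eliminate(fac_unary, fac_pair, operator.mul, max)
sum_prod = chain_eliminate(fac_unary, fac_pair, operator.mul, sum)
```

Only the two operators change between the three calls; the message schedule is identical.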
Belief propagation as a distributive algorithm. BP works distributively (as a result, it can be parallelized). Essentially, BP is a decentralized algorithm: global results are obtained through local exchange of information. A simple example to illustrate this: counting soldiers.
Counting soldiers in a line. Can you think of a distributive algorithm for the commander to count the soldiers? (From David MacKay’s book “Information Theory, Inference, and Learning Algorithms”)
Counting soldiers in a line
Counting soldiers in a tree. Can we do the same for this case?
Counting soldiers in a tree
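One way the soldier-counting idea could be written down, as a hedged Python sketch: each soldier tells each neighbour "one plus whatever I heard from my other neighbours", and every soldier then adds one to the sum of the messages received. The function name `count_soldiers` and the edge-list input are assumptions for illustration. A line of soldiers is just a special case of a tree.

```python
from functools import lru_cache


def count_soldiers(edges):
    """Every soldier learns the total head count using only local messages.

    edges: list of (u, v) pairs describing who stands next to whom.
    The graph is assumed to be a tree; with a loop, the recursion below
    would never terminate.
    """
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)

    @lru_cache(maxsize=None)
    def message(u, v):
        # Number of soldiers on u's side of the edge (u, v), u included.
        return 1 + sum(message(w, u) for w in nbrs[u] if w != v)

    # Each soldier: myself plus the counts reported by all my neighbours.
    return {v: 1 + sum(message(w, v) for w in nbrs[v]) for v in nbrs}


# Toy tree with 7 soldiers (made-up layout).
tree = [("a", "b"), ("b", "c"), ("b", "d"), ("d", "e"), ("d", "f"), ("f", "g")]
totals = count_soldiers(tree)

# A line of 4 soldiers.
line_totals = count_soldiers([(0, 1), (1, 2), (2, 3)])
```

Every soldier, not only the commander, ends up with the same total, which mirrors how BP gives min-marginals at every node after both passes.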
Counting soldiers: a simple example to illustrate BP. The same idea can be used in cases which are seemingly more complex: counting paths through a point in a grid; probability of passing through a node in the grid. In general, we have used the same idea for minimizing MRFs (a much more general problem).
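The grid examples can be sketched in the same style. The setup below is an assumption chosen for illustration, not necessarily the one used in the lecture: paths start at the top-left cell, end at the bottom-right cell, and move only right or down. A forward sweep counts the paths reaching each cell, a backward sweep counts the paths leaving it, and their product is the number of paths through that cell.

```python
def paths_through(rows, cols):
    """Count right/down paths through every cell of a rows x cols grid."""
    # Forward messages: number of paths from the top-left corner to (r, c).
    fwd = [[0] * cols for _ in range(rows)]
    fwd[0][0] = 1
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                fwd[r][c] += fwd[r - 1][c]
            if c > 0:
                fwd[r][c] += fwd[r][c - 1]

    # Backward messages: number of paths from (r, c) to the bottom-right corner.
    bwd = [[0] * cols for _ in range(rows)]
    bwd[rows - 1][cols - 1] = 1
    for r in reversed(range(rows)):
        for c in reversed(range(cols)):
            if r < rows - 1:
                bwd[r][c] += bwd[r + 1][c]
            if c < cols - 1:
                bwd[r][c] += bwd[r][c + 1]

    total = fwd[rows - 1][cols - 1]
    through = [[fwd[r][c] * bwd[r][c] for c in range(cols)] for r in range(rows)]
    return through, total


through, total = paths_through(3, 3)
# Probability of passing through each cell if all paths are equally likely.
prob = [[count / total for count in row] for row in through]
```

As with the soldiers, no cell ever enumerates whole paths; each one only combines the counts handed over by its neighbours.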
Graphs with loops. How about counting these soldiers? Hmmm… overcounting?