Global Methods 1 Michael Bleyer LVA Stereo Vision
What happened last time? Local methods: pros and cons, adaptive windows, slanted surfaces, occlusion handling in local stereo.
Outline: Stereo as an Energy Minimization Problem. Dynamic programming (DP): Basic Algorithm. DP Algorithms: Scanline Optimization, Tree DP, Semi-global matching, Simple Tree Method.
Stereo as an Energy Minimization Problem
Stereo as an Energy Minimization Problem Define an energy/cost function to measure the quality of a disparity map: High energy means that the disparity map is bad. Low energy means it is good. The energy function typically has the form $E(D) = E_{data}(D) + E_{smooth}(D)$, where D is the disparity map of the left image, $E_{data}$ measures photo consistency, and $E_{smooth}$ measures smoothness. Global methods express the smoothness assumption in an explicit form (as a smoothness term). Let us take a closer look at $E_{data}$ and $E_{smooth}$.
The Data Term Measures the color dissimilarity for each pixel p of the left image I: $E_{data}(D) = \sum_{p \in I} m(p, d_p)$, where $d_p$ is the disparity of p in the disparity map D, and m() is a function computing the color dissimilarity between pixels of the left and right images.
The Smoothness Term The smoothness assumption states that neighbouring pixels should be assigned the same (or similar) disparities. Nodes correspond to pixels of the left image. Edges represent interactions between pixels. In our case, interactions occur between a pixel and its 4 spatial neighbours. We state that a pixel should have the same disparity as its neighbours.
The Smoothness Term Let us write this idea as a term: $E_{smooth}(D) = \sum_{(p,q) \in N} s(d_p, d_q)$, where N is the set of all pairs of spatially neighbouring pixels in the left image, and s() is a smoothness function that imposes a penalty if two disparities are different from each other. Let us use the following smoothness function: $s(d_p, d_q) = 0$ if $d_p = d_q$, and $s(d_p, d_q) = P$ otherwise. P is a user-defined penalty that balances data and smoothness terms. This particular smoothness function is called the Potts model.
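The two terms can be put together in a few lines of code. The following is a minimal sketch, not the lecture's implementation. It makes several assumptions: m() is the absolute intensity difference of grayscale images, a pixel (y, x) with disparity d matches pixel (y, x - d) in the right image, matches that fall outside the image are clamped to the border, and every edge of the 4-connected grid is counted once. The function name `potts_energy` is hypothetical.

```python
import numpy as np

def potts_energy(left, right, disp, P):
    """Energy E(D) = E_data(D) + E_smooth(D) of a disparity map.

    left, right: grayscale images (H x W); disp: integer disparity map of the
    left image (H x W); P: Potts penalty.
    Assumption: m() is the absolute intensity difference, and matches that
    fall outside the right image are clamped to the image border.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    disp = np.asarray(disp, dtype=int)
    h, w = left.shape
    rows, cols = np.indices((h, w))
    # Pixel (y, x) of the left image is matched with (y, x - d) in the right image.
    match_cols = np.clip(cols - disp, 0, w - 1)
    e_data = np.abs(left - right[rows, match_cols]).sum()
    # Potts term: count disparity changes across horizontal and vertical
    # neighbours, so each edge of the 4-connected grid is visited once.
    changes = (disp[:, 1:] != disp[:, :-1]).sum() + (disp[1:, :] != disp[:-1, :]).sum()
    return float(e_data + P * changes)
```

Evaluating this function for a few candidate disparity maps shows the role of P: a larger P makes every disparity change more expensive relative to the matching costs.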
Balancing Data and Smoothness Terms [Figure: disparity maps for P = 0, P = 5, P = 10, P = 30, P = 50 and P = 5000, generated by energy optimization via the alpha-expansion algorithm (no global optimum).]
The Smoothness Term Our smoothness term defines the following set of (smoothness) interactions: This is called the 4-connected grid.
Optimizing the 4-connected grid We are looking for a disparity map D that has the minimum energy E(D) among all possible disparity maps. In general, this is a very difficult problem: it is NP-hard in the general case => (Most likely) it is not possible to compute the optimal disparity map in reasonable time. Why is it difficult: A pixel has influence on all other pixels in the image, because every pixel is connected to every other pixel via a path of smoothness edges. Changing the disparity of the pixel in the top-left corner might change the disparity of the pixel in the bottom-left corner. We will spend two sessions on optimization algorithms. We will learn about the following algorithms: Dynamic Programming (this session), Belief Propagation (next session), Graph-Cuts (next session).
Application to Other CV Problems Our energy function measures the quality of an assignment of pixels to labels. In our case, labels correspond to disparities. However, labels can have a different meaning => We can use energy minimization approaches to solve a lot of other computer vision problems.
Optical Flow (Very Similar to Stereo) Input: 2 consecutive frames of a video Desired output: Map of 2D vectors representing the movement of each pixel Labels: All allowed 2D displacement vectors Data Term: Color dissimilarity between corresponding pixels Smoothness term: Penalty if neighbouring pixels have different 2D vectors
Image Denoising Input: Noisy image Desired output: Noise-free image Labels: 256 intensity values (0 to 255) Data Term: Dissimilarity between a pixel's intensity and its assigned intensity Smoothness term: Penalty if neighbouring pixels are assigned to different intensities.
Inpainting Input: Image with partially missing information (red rectangle) Desired output: Complete image Labels: 256 intensity values (0 to 255) Data Term: 0 for each label assignment Smoothness term: Penalty if neighbouring pixels are assigned to different intensities.
Interactive Image Segmentation Input: Color Image Foreground and background scribbles provided by user. Desired output: Binary map Label 0: Pixel belongs to background Label 1: Pixel belongs to foreground Data Term: Dissimilarity between a pixel’s color and the color models of fore-/background. Smoothness term: Penalty on 0/1 label transitions There are many more computer vision problems that can be modelled by our energy function.
Generality of energy functions Apart from smoothness, we can model other assumptions in the energy function. Some examples for stereo: Energy gives infinite costs if the uniqueness assumption is violated. Energy is lower if disparity borders coincide with intensity edges. In general, if you have a computer vision problem: Think about what a perfect solution should look like. Try to express the properties of this perfect solution as an energy function. Apply one of the many existing optimization algorithms to find the solution that minimizes your energy function. ~30% of vision papers work like this.
Limitations of Energy Minimization I have implemented an energy minimization approach, but it gives poor results. Why? There are 2 possible reasons: Energy modeling: Your energy represents a poor model of your problem. Ideally, the correct solution should have lower energy than all other possible solutions. If the correct disparity map has lower energy than all other possible disparity maps, you have done a good job in the energy modelling step. Energy minimization: Your optimization algorithm delivers a solution that is far off from the exact minimum of your energy. [Figure: results of applying two different optimization algorithms, ICM and Graph-Cuts, to the same energy function.] Problem: You usually do not know which of the two reasons is the problem in your approach. However, there are strong indications that energy modeling is the major problem at the current state of the art. We will spend this session and the next one on the energy optimization problem. We will then focus on the modelling component.
Dynamic Programming
Special Case of our Energy Function Let us come back to our energy function $E(D) = E_{data}(D) + E_{smooth}(D)$, where $E_{smooth}$ is implemented by the Potts model. Optimization of E is NP-hard: There is most likely no algorithm that can give you the exact minimum in reasonable time. However, there is a special case: If the smoothness interactions form a tree in the grid graph, the optimal solution can be efficiently computed using dynamic programming (DP).
Example of a tree A tree is a connected graph that does not contain cycles.
Dynamic Programming - Algorithm The exact energy optimum is $L(r) = \min_{d \in D} L(r, d)$, with the recursion $L(p, d) = m(p, d) + \sum_{c \in C_p} \min_{d' \in D} \big( s(d, d') + L(c, d') \big)$. r represents the root of the tree (can be chosen arbitrarily). D is the set of all allowed disparities. m(p,d) are the costs for matching pixel p at disparity d. s(d,d') gives a penalty if the disparities d and d' have different values (smoothness function). $C_p$ is the set of all children of p (those pixels that have p as a direct predecessor on the path to the root node). If the graph were not a tree, this recursion would never terminate.
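The recursion can be written down almost literally. The following is a minimal sketch under stated assumptions: the tree is given as a dictionary from each pixel to its children, results are cached so that every pair (p, d) is evaluated only once, and the function name `tree_dp_energy`, the tree shape and the matching costs are hypothetical illustration values rather than numbers from the lecture. For trees as deep as a whole image, an iterative bottom-up pass would be needed to stay within Python's recursion limit.

```python
from functools import lru_cache

def tree_dp_energy(children, root, m, labels, s):
    """Minimum energy of a tree-structured problem.

    children: dict mapping a pixel to the list of its children C_p
    m:        dict with m[p][d], the cost of matching pixel p at disparity d
    labels:   the allowed disparities D
    s:        smoothness function s(d, d')
    """
    @lru_cache(maxsize=None)
    def subtree_cost(p, d):
        # Best energy of the subtree rooted at p, given that p has disparity d.
        total = m[p][d]
        for c in children.get(p, ()):
            total += min(s(d, d2) + subtree_cost(c, d2) for d2 in labels)
        return total

    return min(subtree_cost(root, d) for d in labels)

# Hypothetical example: root r, child s, and s has the children t and u.
children = {"r": ["s"], "s": ["t", "u"]}
m = {"r": {1: 4, 2: 9}, "s": {1: 30, 2: 3}, "t": {1: 6, 2: 8}, "u": {1: 7, 2: 30}}
potts = lambda d1, d2: 0 if d1 == d2 else 10   # Potts model with P = 10
print(tree_dp_energy(children, "r", m, (1, 2), potts))  # prints 37
```

With caching, the work is proportional to the number of pixels times the square of the number of disparities, which is what makes DP on trees fast.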
Dynamic Programming - An Example We will use the Potts model to implement s() with P = 10. The tree consists of the pixels r, s, t and u. Matching costs m(p,d) are given by: d1: 5 20 10; d2: 15 30 25. The energy of the optimal disparity assignment is 55. What is the disparity assignment that has led to this optimal energy?
Dynamic Programming - An Example We can find the disparity sequence that has led to the optimum by back-tracking. We look at which disparity was chosen at each pixel and follow this path: $d_r = 1$, $d_s = 1$, $d_t = 1$, $d_u = 1$. Setting all pixels to disparity 1 represents the optimal disparity assignment in our example.
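Back-tracking only requires remembering, for every child and every disparity of its parent, which disparity gave the minimum. The following is a minimal sketch assuming an iterative bottom-up pass followed by a top-down pass. The function name `tree_dp_backtrack` and the matching costs are hypothetical and are not the values of the example above.

```python
def tree_dp_backtrack(children, root, m, labels, s):
    """Return (minimum energy, optimal disparity of every pixel) for a tree."""
    # Order the pixels so that every parent comes before its children.
    order, stack = [], [root]
    while stack:
        p = stack.pop()
        order.append(p)
        stack.extend(children.get(p, ()))

    cost = {}  # cost[(p, d)]: best energy of p's subtree if p has disparity d
    best = {}  # best[(c, d)]: best disparity of child c if its parent has d
    for p in reversed(order):              # bottom-up: children before parents
        for d in labels:
            total = m[p][d]
            for c in children.get(p, ()):
                d_c = min(labels, key=lambda d2: s(d, d2) + cost[(c, d2)])
                best[(c, d)] = d_c
                total += s(d, d_c) + cost[(c, d_c)]
            cost[(p, d)] = total

    # Top-down: fix the root, then follow the stored decisions.
    d_root = min(labels, key=lambda d: cost[(root, d)])
    assignment = {root: d_root}
    for p in order:
        for c in children.get(p, ()):
            assignment[c] = best[(c, assignment[p])]
    return cost[(root, d_root)], assignment

# Hypothetical costs (not the slide's numbers), Potts model with P = 10.
children = {"r": ["s"], "s": ["t", "u"]}
m = {"r": {1: 4, 2: 9}, "s": {1: 30, 2: 3}, "t": {1: 6, 2: 8}, "u": {1: 7, 2: 30}}
potts = lambda d1, d2: 0 if d1 == d2 else 10
print(tree_dp_backtrack(children, "r", m, (1, 2), potts))
# prints (37, {'r': 2, 's': 2, 'u': 1, 't': 2})
```

In this hypothetical example, pixel u pays the penalty P once because its matching cost at disparity 2 is too high, while all other pixels agree on disparity 2.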
DP on the 4-connected Grid Problem: The 4-connected grid is definitely not a tree! Idea: We can remove edges (smoothness interactions) so that the 4-connected grid becomes a tree. The following approaches differ only in the way they remove edges.
Scanline DP All vertical edges are deleted from the 4-connected grid, so each scanline becomes an independent chain. That is what the majority of DP-based approaches do. Oftentimes, these approaches implement the ordering assumption (I will skip this, because it is no longer state of the art). What will be the problem of this approach?
The Scanline Streaking Problem Deleting the vertical smoothness edges leads to horizontal streaks in the disparity maps. The problem is that smoothness between neighbouring scanlines is not enforced.
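A single scanline is a chain, which is a special case of a tree, so it can be optimized exactly and independently of all other scanlines. The following is a minimal sketch for one scanline under the Potts model. It assumes the matching costs are already available as an array `cost[x, d]`, and the function name `scanline_dp` is hypothetical. Because each scanline is solved in isolation, nothing ties the results of neighbouring rows together, which is where the streaks come from.

```python
import numpy as np

def scanline_dp(cost, P):
    """Exact minimizer of one scanline under the Potts model.

    cost: array of shape (W, D) with cost[x, d] = m(p, d) for the pixel in column x.
    Returns (minimum energy, list with one disparity per column).
    """
    cost = np.asarray(cost, dtype=float)
    w, n_disp = cost.shape
    acc = np.empty_like(cost)                 # acc[x, d]: best energy of columns 0..x with d_x = d
    back = np.zeros((w, n_disp), dtype=int)   # disparity chosen in column x - 1
    acc[0] = cost[0]
    for x in range(1, w):
        prev_best = int(np.argmin(acc[x - 1]))
        jump = acc[x - 1, prev_best] + P      # change the disparity and pay P
        stay = acc[x - 1]                     # keep the disparity for free
        take_jump = jump < stay
        acc[x] = cost[x] + np.where(take_jump, jump, stay)
        back[x] = np.where(take_jump, prev_best, np.arange(n_disp))
    # Back-track from the best disparity of the last column.
    d = int(np.argmin(acc[-1]))
    energy = float(acc[-1, d])
    path = [d]
    for x in range(w - 1, 0, -1):
        d = int(back[x, d])
        path.append(d)
    return energy, path[::-1]
```

For the Potts model the inner minimum over the previous disparity reduces to "stay" or "jump from the best predecessor", so each pixel costs time linear in the number of disparities.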
Tree DP by [Veksler, CVPR2005] We can obtain a tree structure in a smarter way. Observation: Disparity discontinuities are typically aligned with intensity edges, hence: Two neighbouring pixels of similar intensities are very likely to lie on the same disparity. Two neighbouring pixels of different intensities are less likely to lie on the same disparity. => The smoothness edges between neighbouring pixels of very different intensities are the least important ones. => We should remove those.
Tree DP by [Veksler, CVPR2005] Algorithm for obtaining the tree structure: For each smoothness edge between two pixels p and q: Compute a weight w(p,q) by $w(p,q) = |I(p) - I(q)|$, where I(p) denotes pixel p's intensity. Build the minimum spanning tree (MST) using the computed weights: The MST is the tree connecting all pixels whose sum of weights is minimum among all such trees. The MST can be computed in nearly linear time using standard graph algorithms (e.g., Kruskal's algorithm, with the integer weights sorted by counting sort).
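The tree construction can be sketched with Kruskal's algorithm and a union-find structure. This is an illustrative sketch and not the implementation of [Veksler, CVPR2005]: it assumes a grayscale image, uses a general-purpose sort instead of a counting sort, and the function name `grid_mst` and the pixel indexing `y * W + x` are hypothetical choices.

```python
import numpy as np

def grid_mst(image):
    """Edges of a minimum spanning tree of the 4-connected grid.

    Edge weights are w(p, q) = |I(p) - I(q)|; pixel (y, x) has index y * W + x.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    edges = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:   # horizontal smoothness edge
                edges.append((abs(img[y, x] - img[y, x + 1]), y * w + x, y * w + x + 1))
            if y + 1 < h:   # vertical smoothness edge
                edges.append((abs(img[y, x] - img[y + 1, x]), y * w + x, (y + 1) * w + x))
    edges.sort()            # for integer intensities a counting sort would also work

    parent = list(range(h * w))

    def find(a):
        # Representative of a's component (with path halving).
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for weight, p, q in edges:
        root_p, root_q = find(p), find(q)
        if root_p != root_q:        # the edge does not close a cycle: keep it
            parent[root_p] = root_q
            tree.append((p, q))
    return tree
```

On an image with a strong vertical intensity edge, the returned tree keeps only a single smoothness edge across that intensity edge, because one such edge is needed to keep the tree connected.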
Tree DP by [Veksler, CVPR2005] [Figure: pixels of high intensity difference have no smoothness edge between them; pixels of low intensity difference are connected by a smoothness edge.]
Tree DP by [Veksler, CVPR2005] Horizontal streaks are effectively reduced. However, vertical streaks are now present as well.
Semi-Global Matching by [Hirschmueller, CVPR2005] Disclaimer: The original paper is written from a completely different perspective (no tree DP). Construct an individual tree at each pixel p. Tree contains the vertical, horizontal and diagonal lines on which p resides (star shape).
Semi-Global Matching by [Hirschmueller, CVPR2005] No streaks, but isolated pixels
Semi-Global Matching by [Hirschmueller, CVPR2005] If the tree for pixel p does not capture any texture, the algorithm will fail.
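The star-shaped trees can be approximated in code by aggregating costs along straight paths that end in each pixel and summing the results. The following is a simplified sketch and not the original algorithm: it uses only the 4 horizontal and vertical directions and a single Potts penalty P, whereas the original method uses 8 or 16 directions and two penalties. The function names `aggregate` and `sgm_potts` are hypothetical, and the matching costs are assumed to be given as a volume `cost[y, x, d]`.

```python
import numpy as np

def aggregate(cost, P):
    """Potts path costs along axis 0 of a cost volume of shape (N, ..., D)."""
    acc = np.empty_like(cost)
    acc[0] = cost[0]
    for i in range(1, cost.shape[0]):
        prev = acc[i - 1]
        prev_min = prev.min(axis=-1, keepdims=True)
        # Keep the disparity for free, or switch from the best predecessor for P.
        # Subtracting prev_min keeps the values bounded; it shifts all
        # disparities of a pixel equally and so does not affect the winner.
        acc[i] = cost[i] + np.minimum(prev, prev_min + P) - prev_min
    return acc

def sgm_potts(cost, P):
    """cost: matching costs of shape (H, W, D). Returns an (H, W) disparity map."""
    cost = np.asarray(cost, dtype=float)
    by_col = cost.transpose(1, 0, 2)                                  # shape (W, H, D)
    total = np.zeros_like(cost)
    total += aggregate(cost, P)                                       # top to bottom
    total += aggregate(cost[::-1], P)[::-1]                           # bottom to top
    total += aggregate(by_col, P).transpose(1, 0, 2)                  # left to right
    total += aggregate(by_col[::-1], P)[::-1].transpose(1, 0, 2)      # right to left
    # Winner-takes-all on the summed path costs.
    return total.argmin(axis=-1)
```

Each pixel only receives support from the pixels on its own paths, which is consistent with the failure case described above: if none of these paths crosses texture, the summed costs carry no reliable information.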
Simple Tree by [Bleyer, VISAPP2008] Motivation: Overcome the problem of [Hirschmueller, CVPR2005] in untextured regions Idea: Also generate an individual tree at each pixel p. This tree contains all pixels of the reference view (=> The problem of missing texture is avoided) 2 tree structures: Horizontal Tree Vertical Tree
Simple Tree by [Bleyer, VISAPP2008] Texture cannot be missed by these trees.
Simple Tree by [Bleyer, VISAPP2008] No streaks, no problems in untextured regions.
Dynamic Programming - Pros and Cons Pros: DP algorithms are very fast (comparable to local methods). Good tradeoff between speed and accuracy. Cons: Can only be applied on tree structures. Erasing smoothness edges leads to performance degradations. Optimization algorithms that operate on the full 4-connected grid perform better (next session).
Summary Principle of global methods: Energy modeling Energy minimization Energy minimization for CV problems different from stereo Dynamic Programming: Scanline optimization Tree DP Semi-global matching Simple Tree
References
[Bleyer, VISAPP2008] M. Bleyer, M. Gelautz, Simple but Effective Tree Structures for Dynamic Programming-based Stereo Matching, VISAPP 2008.
[Hirschmueller, CVPR2005] H. Hirschmueller, Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information, CVPR 2005.
[Veksler, CVPR2005] O. Veksler, Stereo Correspondence by Dynamic Programming on a Tree, CVPR 2005.