Michael Bleyer LVA Stereo Vision

Presentation transcript:

1 Data Term
Michael Bleyer LVA Stereo Vision

2 What happened last time?
We have looked at our energy function. We have learned about an optimization algorithm that can minimize our energy: Belief Propagation. We have investigated the smoothness function s():
- First-order smoothness functions: linear, Potts model, truncated linear
- Second-order smoothness
- Edge/segment-sensitive smoothness functions

3 What is Going to Happen Today?
We will look at the data term

4 Data Term
Michael Bleyer LVA Stereo Vision

5 The Data Term
Measures the goodness of correspondences: the colors of corresponding pixels are compared. Formally defined as
E_data(D) = sum over all pixels p of m(p, q)
where
- m() computes the intensity/color dissimilarity between two pixels,
- p is a pixel of the left image,
- q is p's correspondence in the right image according to disparity map D.
We will now look at different methods for defining the pixel dissimilarity function m(). For the first match measures, I will assume that the photo-consistency assumption holds, i.e. corresponding pixels have identical intensities; I will relax this assumption later on. For now, we will just consider grey-scale images; I will speak about color matching later on.
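As a minimal sketch of this definition (not the lecturer's implementation), the data term is a sum of per-pixel dissimilarities; images and the disparity map are assumed to be row-major nested lists, and m is any pixel dissimilarity function:

```python
# Minimal sketch of the data term E_data(D): sum the pixel dissimilarity
# m(p, q) over all pixels p, where q = (x - D(p), y) is p's correspondence
# in the right image. "left", "right" and "disparity" are nested lists.

def data_term(left, right, disparity, m):
    total = 0.0
    height, width = len(left), len(left[0])
    for y in range(height):
        for x in range(width):
            xq = x - disparity[y][x]   # matching x-coordinate in the right image
            if 0 <= xq < width:        # skip pixels whose match falls outside
                total += m(left[y][x], right[y][xq])
    return total
```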

6 Absolute Intensity Difference
Pixel dissimilarity is computed as the absolute difference in intensity values:
m(p, q) = |Ip - Iq|
where Ip denotes the intensity of pixel p. This is quite a common choice in stereo matching.

7 Squared Intensity Difference
Pixel dissimilarity is computed as the squared difference in intensity values:
m(p, q) = (Ip - Iq)^2
This is probably not a good choice for stereo matching:
- Sensitive to outliers
- Typically performs slightly worse than the absolute difference.

8 Truncated Absolute Difference
Pixel dissimilarity is computed as the truncated absolute difference in intensity values:
m(p, q) = min(|Ip - Iq|, k)
where k is a user-defined value.
Advantage: robustness against outliers, and better performance in occluded regions:
- Since an occluded pixel does not have a matching point, it will have a high pixel dissimilarity.
- Truncation lowers the data costs for the "correct" disparity.
- There is still hope that the correct disparity can be propagated from surrounding non-occluded pixels.
Disadvantage: you have an additional parameter. An optimal value for k is difficult to find or might not even exist.
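The three dissimilarity functions discussed so far can be sketched in a few lines (the function names are mine; k is the user-defined truncation threshold):

```python
# Sketches of the three standard dissimilarity functions from the slides.

def abs_diff(i_p, i_q):
    """m(p, q) = |Ip - Iq| (absolute intensity difference)."""
    return abs(i_p - i_q)

def squared_diff(i_p, i_q):
    """m(p, q) = (Ip - Iq)^2 (sensitive to outliers)."""
    return (i_p - i_q) ** 2

def truncated_abs_diff(i_p, i_q, k):
    """m(p, q) = min(|Ip - Iq|, k); truncation caps the cost of
    outliers such as occluded pixels."""
    return min(abs(i_p - i_q), k)
```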

9 Sampling Insensitive Measure [Birchfield,PAMI98]
In the "real world", intensity is a continuous function of the x-coordinate. (Figure: continuous intensity curve, left image)

10 Sampling Insensitive Measure [Birchfield,PAMI98]
When we take a photo, we sample this continuous curve to derive discrete pixels. (Figure: sampled intensity curve, left image)

11 Sampling Insensitive Measure [Birchfield,PAMI98]
Problem: samples are typically taken at different curve positions in the left and right images. (Figures: intensity curves of the left and right images, sampled at different positions)

12 Sampling Insensitive Measure [Birchfield,PAMI98]
Due to this different sampling, corresponding pixels have different intensity values (and this will oftentimes lead to wrong matches). (Figures: p and p' are corresponding pixels, but have different intensity values)

13 Sampling Insensitive Measure [Birchfield,PAMI98]
Idea of [Birchfield,PAMI98]: we also look at p's horizontal neighbors q and r. (Figures: p with neighbors q and r in the left image, p' in the right image)

14 Sampling Insensitive Measure [Birchfield,PAMI98]
We interpolate the intensity of the point p- that lies in between p and q by
Ip- = (Ip + Iq) / 2
(Figure: interpolated point p- between p and q on the left intensity curve)

15 Sampling Insensitive Measure [Birchfield,PAMI98]
We also interpolate the intensity of the point p+ that lies in between p and r by
Ip+ = (Ip + Ir) / 2
(Figure: interpolated point p+ between p and r on the left intensity curve)

16 Sampling Insensitive Measure [Birchfield,PAMI98]
We compute the sampling insensitive matching cost as the distance of Ip' to the interval spanned by Ip-, Ip and Ip+:
m(p, p') = max(0, Ip' - max(Ip-, Ip, Ip+), min(Ip-, Ip, Ip+) - Ip')
(Figure: p', p-, p and p+ on the intensity curves)

17 Sampling Insensitive Measure [Birchfield,PAMI98]
Advantage: we have also included the "correctly sampled" pixel p- => low intensity dissimilarity and high chances for a correct match.

18 Sampling Insensitive Measure [Birchfield,PAMI98]
We should do this in a symmetric way: we interpolate p'- and p'+ around p' in the right image as well, and take the minimum of the two one-sided costs. (Figure: interpolated points p'- and p'+ on the right intensity curve)
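A sketch of the symmetric measure along these lines (a simplified reading of [Birchfield,PAMI98]; the scanlines are assumed to be lists of intensities, and x_l, x_r interior indices):

```python
# Simplified sketch of the symmetric sampling-insensitive measure of
# [Birchfield,PAMI98]: the one-sided cost is the distance of a pixel's
# intensity to the interval spanned by the half-pixel interpolations
# around its candidate match.

def bt_one_sided(i_p, i_minus, i_center, i_plus):
    # half-pixel interpolated intensities on either side of the candidate
    half_minus = 0.5 * (i_center + i_minus)
    half_plus = 0.5 * (i_center + i_plus)
    i_min = min(half_minus, half_plus, i_center)
    i_max = max(half_minus, half_plus, i_center)
    # zero if i_p falls inside the interval [i_min, i_max]
    return max(0.0, i_p - i_max, i_min - i_p)

def bt_dissimilarity(left, right, x_l, x_r):
    """Symmetric version: the minimum of the two one-sided costs."""
    d_lr = bt_one_sided(left[x_l], right[x_r - 1], right[x_r], right[x_r + 1])
    d_rl = bt_one_sided(right[x_r], left[x_l - 1], left[x_l], left[x_l + 1])
    return min(d_lr, d_rl)
```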

19 Violations of the Photo-Consistency Assumption
In real-world stereo images, the photo-consistency assumption is almost never perfectly fulfilled. We call a pixel radiometrically distorted if its intensity differs between the left and right images. There are various reasons for radiometric distortions, e.g.:
- Different illumination conditions in the images
- Different exposure times
- Different sensor characteristics
(Figures: left and right images)

20 Radiometrically Insensitive Match Measures
The treatment of radiometric distortions has a great impact on the quality of the results. Unfortunately, none of the above match measures is able to cope with radiometric distortions. We will now learn about three radiometrically insensitive measures:
- Mutual Information
- Zero-mean Normalized Cross-Correlation (ZNCC)
- Census

21 Mutual Information [Hirschmueller,PAMI08]
Advantage: Mutual Information is a pixel-based measure. In contrast to window-based measures, artifacts at disparity borders are avoided.
To compute Mutual Information matching scores, we need the disparity map: a chicken-and-egg problem. If we knew the disparity map, we would already be done. This dilemma is typically solved in an iterative fashion:
1. We compute an initial disparity map (e.g., using absolute differences as the dissimilarity function).
2. We compute Mutual Information scores using our current disparity map.
3. We compute a new disparity map using the Mutual Information scores.
4. Go to 2.

22 Mutual Information [Hirschmueller,PAMI08]
How can we compute the Mutual Information matching scores? Disclaimer: there is quite a lot of theory behind this. I will focus on the practical implementation as described in [Hirschmueller,PAMI08].

23 Computing Mutual Information Scores
- For each pixel p, we look up its matching point q in the right image using our current disparity map.
- We look up the intensity values Ip and Iq.
- We make an entry at <Ip,Iq> in the diagram below.
For each possible pair of intensity values <Ip,Iq>, the diagram stores how often this pair occurred in the disparity map. (Figure: entry at Ip=150, Iq=100 in the diagram of left-image intensity vs. right-image intensity)

24 Computing Mutual Information Scores
Let us assume that all corresponding pixels have identical intensity values, i.e. there is no radiometric distortion. (Figure: the entries form a 45-degree line in the diagram)

25 Computing Mutual Information Scores
Let us assume that the right image is darker than the left one. (Figure: the line's slope falls below 45 degrees)

26 Computing Mutual Information Scores
Let us assume that the left image is darker than the right one. (Figure: the line's slope rises above 45 degrees)

27 Computing Mutual Information Scores
This is what the diagram looks like for the Teddy test set: black means that the intensity pair occurred very frequently in the disparity map. Images taken from [Hirschmueller,PAMI08].

28 Computing Mutual Information Scores
Those intensity pairs that occurred frequently should be given low matching costs.

29 Computing Mutual Information Scores
We compute -log(P), where P is our (normalized) diagram. These are our Mutual Information scores, where white pixels mean low matching costs. Side note: for simplicity, I left out two steps where Gaussian smoothing is applied to P and -log(P). (Figures: P and -log(P))
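The diagram construction and the -log(P) step could be sketched as follows (my own simplification: the Gaussian smoothing steps mentioned above are omitted, and the inputs are assumed to be 8-bit grey-value NumPy arrays):

```python
import numpy as np

# Sketch of the Mutual Information score table: build the joint intensity
# diagram from the current disparity map, normalize it, and take -log.

def mi_scores(left, right, disparity, bins=256, eps=1e-8):
    height, width = left.shape
    hist = np.zeros((bins, bins), dtype=np.float64)
    for y in range(height):
        for x in range(width):
            xq = x - disparity[y, x]              # matching point in the right image
            if 0 <= xq < width:
                hist[left[y, x], right[y, xq]] += 1   # entry at <Ip, Iq>
    p = hist / max(hist.sum(), 1.0)               # normalize to a joint probability
    return -np.log(p + eps)                       # frequent pairs -> low matching cost
```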

30 Disadvantage of Mutual Information
Global model: Mutual Information can only model radiometric changes that are valid for the whole image, e.g. the whole left image being darker than the right one. It cannot model radiometric changes that occur only locally, e.g. the bottom-left part of the image being darker in the left view. Unfortunately, radiometric distortions are oftentimes local.

31 Zero-mean Normalized Cross-Correlation (ZNCC)
ZNCC is defined on windows => will lead to artifacts at disparity discontinuities. The pixel dissimilarity m() is derived from the correlation score
ZNCC(p, q) = sum (Ip' - mean(Ip)) (Iq' - mean(Iq)) / sqrt( sum (Ip' - mean(Ip))^2 * sum (Iq' - mean(Iq))^2 )
where Wp is the set of all pixels in the window centered at p, the sums run over corresponding pixels p' in Wp and q' in Wq, and mean(Ip) is the mean intensity computed over all pixels inside Wp. Subtraction of the mean value serves to normalize the intensity values (robustness against radiometric changes).
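A sketch of the ZNCC score for two windows (the windows are assumed to be already extracted as equally sized NumPy arrays; a dissimilarity m() can then be derived, e.g. as 1 - ZNCC):

```python
import numpy as np

def zncc(window_left, window_right, eps=1e-8):
    """Zero-mean Normalized Cross-Correlation of two equally sized windows.
    Returns a score in [-1, 1]; subtracting the window means makes the
    score invariant to additive intensity offsets between the images."""
    a = window_left.astype(np.float64) - window_left.mean()
    b = window_right.astype(np.float64) - window_right.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))
```

Note how a constant brightness offset between the two windows does not change the score, which is exactly the robustness property the slide describes.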

32 Census
We center a window on pixel p in the left image.

33 Census
Apply the following transformation: if a pixel has a smaller intensity than the window's center pixel, write 0; else write 1.

34 Census
Write the binary values as a bit string.

35 Census
Apply the same operations to the window centered on pixel q in the right image.

36 Census
Census matching costs are computed as the Hamming distance between the two bit strings: the number of positions at which the binary values differ.
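The steps above can be sketched as follows (a hypothetical minimal implementation with a square window of the given radius; real implementations pack the bits into integers for speed):

```python
import numpy as np

def census_bitstring(img, y, x, radius=1):
    """Census transform of the window centered at (y, x): write 0 for
    pixels darker than the center, 1 otherwise."""
    center = img[y, x]
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue                      # the center compares with itself
            bits.append(0 if img[y + dy, x + dx] < center else 1)
    return bits

def census_cost(left, right, yp, xp, yq, xq, radius=1):
    """Hamming distance between the two census bit strings."""
    bits_l = census_bitstring(left, yp, xp, radius)
    bits_r = census_bitstring(right, yq, xq, radius)
    return sum(a != b for a, b in zip(bits_l, bits_r))
```

Adding a constant offset to one image leaves all comparisons with the center unchanged, so the cost stays zero, illustrating the radiometric robustness discussed next.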

37 Census - Discussion
We do not directly match the intensities, but the local texture represented as a bit string. If one image is darker than the other, the bit strings should still agree: you can, for example, add a value of 10 to the intensity values in p's window on the previous slide and will still get the same bit string. There is also increased robustness if the window overlaps a disparity discontinuity.
Problems in untextured regions: if all pixels have very similar intensities, the values of the bit string largely depend on image noise. This leads to noisy results in untextured regions.

38 How Can We Incorporate Color Information?
Typically this is done in the simplest possible way:
- Compute the match measure individually for each color channel.
- Sum up the values over all color channels.
Let us now investigate the role of color.
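The channel-wise summation described above, sketched for the absolute difference (pixels are assumed to be tuples of channel values, e.g. RGB or LUV):

```python
def color_cost(pixel_left, pixel_right):
    """Sum of per-channel absolute differences over all color channels."""
    return sum(abs(cl - cr) for cl, cr in zip(pixel_left, pixel_right))
```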

39 Why Should Color Help?
We have a blue pixel in the left image and two candidate matches in the right image: a blue pixel and a yellow pixel. It is quite clear that the blue pixel is the correct match. (Figures: left and right images)

40 Why Should Color Help?
Let us now convert the color images into grey-scale images. In our example, blue and yellow map to the same grey value. It is no longer clear which of our two candidate pixels is the correct match => color information reduces ambiguity! (Figures: left and right images)

41 Why Should Color Help?
However, a lot of stereo algorithms do not use color information. Is this for a reason?

42 Evaluation of Color Matching [Bleyer,3DPVT10]
We evaluate the performance of 8 different color systems and grey-scale matching:
- The different color systems affect the data term of our energy.
- Truncated linear is used as the smoothness term.
- Energy optimization is accomplished using the Simple Tree dynamic programming method (see the session on dynamic programming).
- We use 30 ground-truth pairs from Middlebury as test data.
- The error is computed as the percentage of pixels having an absolute disparity error > 1 pixel in non-occluded regions.

43 Test Set

44 Absolute Intensity Difference
We compute the pixel dissimilarity as the absolute difference in intensity values. Results (see plot below):
- Grey-scale matching nearly always performs worst.
- The color system LUV seems to perform better than RGB.

45 Absolute Intensity Difference
So using color is a good thing? Well, this is not the whole story.

46 Example Results for Dolls Test Set
Let us have a closer look at the disparity maps. I show two disparity maps:
- One has been computed using the absolute difference of grey values as the match function.
- One has been computed using the summed-up absolute differences in LUV values.
(Figures: left image, grey-scale disparity, LUV disparity)

47 Example Results for Dolls Test Set
We should rather look at the error maps: black pixels are pixels whose disparity error is larger than 1 pixel in comparison against the ground truth. The errors are clearly smaller when using LUV. (Figures: left image, grey-scale errors, LUV errors)


49 Radiometric Problems in the Dolls Set
We have the ground-truth disparity map for the Dolls set => for each pixel p of the left image, we know its correct correspondence q in the right image. If there are no radiometric distortions, |Ip - Iq| should be equal to 0. In practice, we obtain the image shown on the right, where bright pixels have a large value of |Ip - Iq|. These bright pixels are the result of radiometric distortions. (Figure: data costs of the ground-truth solution)

50 Radiometric Problems in the Dolls Set
We can apply thresholding on the ground-truth data cost image. (Figures: data costs of the ground-truth solution; smoothed thresholding)

51 Radiometric Problems in the Dolls Set
There seems to be a large overlap between the errors in the grey-scale matching result and the radiometrically distorted regions. (Figures: smoothed thresholding; disparity error when using grey-scale matching)

52 Radiometric Problems in the Dolls Set
Errors in radiometrically distorted regions seem to be effectively reduced when using color matching. (Figure: disparity error when using LUV matching)

53 Radiometric Problems in the Dolls Set
=> Color might be of specific importance in radiometrically distorted image regions.

54 Color Helps in Radiometrically Distorted Regions
We have extracted the radiometrically distorted regions for all 30 test images. We now analyze the disparity error separately:
- in regions affected by radiometric distortions,
- in regions unaffected by radiometric distortions.
Average error percentage in regions affected by radiometric distortions: a large improvement can be observed when using color (e.g., LUV) instead of grey-scale matching (Grey).

55 Color Helps in Radiometrically Distorted Regions
In regions unaffected by radiometric distortions, there is almost no improvement.

56 Color Helps in Radiometrically Distorted Regions
Average error percentage in all regions: the overall improvement is largely due to the considerable improvement in radiometrically distorted regions.

57 Why Not Directly Use a Radiometrically Insensitive Measure?
If color only helps in radiometrically distorted regions, we can directly use radiometrically insensitive measures instead of color. Three radiometrically insensitive measures are tested, all applied to grey-scale images:
- Mutual Information (MI)
- Zero-mean Normalized Cross-Correlation (NCC)
- Census (CENSUS)

58 Why Not Directly Use a Radiometrically Insensitive Measure?
NCC and CENSUS seem to perform best; this result is consistent with [Hirschmueller,PAMI09]. NCC and CENSUS have the same effect as using color (AbsDif (LUV)): they improve performance in radiometrically distorted regions (blue line). NCC and CENSUS are considerably better than color in this respect.

59 Using Color with Radiometrically Insensitive Measures
It seems to be a bad idea to use color in conjunction with radiometrically insensitive measures: grey-scale matching performs better than all 8 color spaces. How can this happen?
- The increased robustness of color in radiometrically distorted regions is not important anymore (NCC and CENSUS do a better job).
- You practically do not lose texture when discarding color.
- Intensity is probably captured more robustly by today's cameras (less noise in the intensity channel).
(Figures: NCC and CENSUS, each used with grey scale and 8 color spaces)

60 Using Color with Radiometrically Insensitive Measures
My advice: you should not use color, but you should definitely use a radiometrically insensitive match measure.

61 Support Aggregation in Global Stereo Matching
In the session on local stereo methods, we learned about support aggregation: we do not match single pixels, but windows. Until recently, using windows in global stereo was considered a bad idea: you get artifacts at disparity discontinuities!

62 Support Aggregation in Global Stereo Matching
We have also learned about new segmentation-based aggregation schemes:
- They deliver excellent performance.
- In general, they will even improve the performance near depth discontinuities.
Apart from increased computational costs, there is relatively little that speaks against using these aggregation methods for implementing your pixel dissimilarity function m(). (Figures: standard support weights vs. geodesic support weights)

63 Summary
Data Term:
- Standard dissimilarity measures: absolute / squared intensity differences
- Sampling insensitive measures
- Radiometrically insensitive measures: Mutual Information, ZNCC, Census
- The role of color
- Segmentation-based aggregation schemes

64 References
[Birchfield,PAMI98] S. Birchfield, C. Tomasi, A Pixel Dissimilarity Measure That Is Insensitive to Image Sampling, PAMI, vol. 20, 1998.
[Bleyer,3DPVT10] M. Bleyer, S. Chambon, Does Color Really Help in Dense Stereo Matching?, 3DPVT, 2010.
[Hirschmueller,PAMI08] H. Hirschmueller, Stereo Processing by Semi-Global Matching and Mutual Information, PAMI, vol. 30, no. 2, 2008.
[Hirschmueller,PAMI09] H. Hirschmueller, D. Scharstein, Evaluation of Stereo Matching Costs on Images with Radiometric Differences, PAMI, vol. 31, no. 9, 2009.

