Stereo Vision ECE 847: Digital Image Processing Stan Birchfield Clemson University
Outline Stereo basics Binocular stereo matching Advanced stereo techniques
Modeling from multiple views [Figure: number of cameras versus time. Number of cameras: photograph, binocular stereo (human vision), trinocular stereo, ..., multi-baseline stereo, camera dome. Time: photograph, two frames, ..., camcorder.] στερεός (stereos): Greek for solid. S. Birchfield, Clemson Univ., ECE 847, http://www.ces.clemson.edu/~stb/ece847
Stereoscope: invented by Wheatstone in 1838
Modern version
Can you fuse these? No special instrument needed: just relax your eyes. [Images: left (L), right (R)]
Random dot stereogram invented by Bela Julesz in 1959 http://www.magiceye.com/faq.htm
Autostereogram Do you see the shark? http://en.wikipedia.org/wiki/Autostereogram
Can you cross-fuse these? [Images: right, left] Note: Cross-fusion is necessary if the distance between images is greater than the inter-ocular distance. Fusing them as L R is then impossible; instead, trick the brain by swapping them: R L. Tsukuba stereo images courtesy of Y. Ohta and Y. Nakamura at the University of Tsukuba
Human stereo geometry [Figure labels: angles αL and αR, fixation point, disparity, corresponding points] http://webvision.med.utah.edu/space_perception.html
Horopter: surface where disparity is zero. For a round retina, the theoretical horopter is a circle (Vieth-Müller circle) http://webvision.med.utah.edu/space_perception.html
Cyclopean image http://webvision.med.utah.edu/space_perception.html http://bearah718.tripod.com/sitebuildercontent/sitebuilderpictures/cyclops.jpg
Panum’s fusional area (volume): The human visual system is only capable of fusing the two images within a narrow range of disparities around the fixation point. This area (volume) is Panum’s fusional area. Outside this area we get double vision (diplopia) http://www.allaboutvision.com/conditions/double-vision.htm
Human visual pathway
Prey and predator Cheetah: More accurate depth estimation Antelope: larger field of view photos courtesy California Academy of Science
Example: Motion parallel to image scanlines Epipoles are at infinity Scanlines are the epipolar lines In this case, the images are said to be “rectified” Tsukuba stereo images courtesy of Y. Ohta and Y. Nakamura at the University of Tsukuba
Perspective projection X X x x M. Pollefeys, http://www.cs.unc.edu/Research/vision/comp256fall03/
Perspective projection [Figure: camera with focal length f]
Perspective projection: by similar triangles, x / f = X / Z, so x = f X / Z
Standard stereo geometry disparity is inversely proportional to depth stereo vision is less useful for distant objects M. Pollefeys, http://www.cs.unc.edu/Research/vision/comp256fall03/
Rectified geometry [Figure labels: IL, IR, left optical axis, right optical axis, world point]
Rectified geometry, two cameras overlapped (for display): d = x1 – x2 = f (X1 – X2) / Z = f b / Z, where d is the disparity, b is the baseline, and Z is the depth
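The relation d = f b / Z can be inverted to recover depth from a measured disparity. The following is a minimal sketch, assuming the focal length is expressed in pixels and the baseline in meters; the function name and the numeric values are hypothetical and chosen only for illustration.

```python
def depth_from_disparity(disparity, focal_length, baseline):
    """Return depth Z = f * b / d for a rectified stereo pair.

    disparity and focal_length are assumed to be in pixels, baseline in meters,
    so the result is in meters.
    """
    if disparity < 0:
        raise ValueError("negative disparity: point would lie behind the cameras")
    if disparity == 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_length * baseline / disparity


# Halving the disparity doubles the depth, so depth resolution degrades with distance.
for d in (50, 25, 10, 5):
    print(d, depth_from_disparity(d, focal_length=800.0, baseline=0.125))
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant points than for near ones.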
Matching space [Figure axes: xL, xR, y]
Matching space [Figure: xL versus xR grid with diagonals d=0, d=1, d=2, d=7; a possible match between pixel 7 in left scanline and pixel 4 in right scanline; region of impossible matches]
Outline Stereo basics Binocular stereo matching Advanced stereo techniques
Binocular rectified stereo: the epipolar constraint gives a 1D search: look for similar pixel in other image. [Figure: left, right, disparity map, depth discontinuities]
Disparity function [Figure: disparity versus pixel for left and right scanlines, showing lamp, wall, and occluded pixels] smaller slope = smaller disparity = farther from camera; higher slope = larger disparity = closer to camera
Occlusions disparity pixel left: right: object background occluded pixels object disparity background pixel
Matching a pixel: Pixel’s value is not unique: only 256 values but ~100,000 pixels! Also, noise affects value. Solution: use more than one pixel. Assume neighbors have similar disparity. Correlation window around pixel. Can use any similarity measure.
Block matching: compute best disparity for each pixel; store result in disparity map. [Figure: left image, disparity map]
Block matching (cont.) [Flowchart: window at (x, y) in left image and shifted window in right image; compare; value for this disparity; best so far? Yes: store it] Note: Window only moves left. Why?
Block matching: 5 nested for loops!!!!! [Figure labels: disparity, dissimilarity]
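The five loops can be sketched as follows. This is an illustrative version rather than the code shown on the slide: it assumes images stored as lists of rows of integers, uses SAD as the dissimilarity, and the names `block_match`, `max_disp`, and `half_win` are hypothetical.

```python
def block_match(left, right, max_disp, half_win):
    """Naive SAD block matching; returns a disparity map for the left image."""
    height, width = len(left), len(left[0])
    disp = [[0] * width for _ in range(height)]
    for y in range(half_win, height - half_win):                  # loop 1: rows
        for x in range(half_win, width - half_win):               # loop 2: columns
            best_cost, best_d = float("inf"), 0
            # The window in the right image only moves left: xR = xL - d, d >= 0.
            for d in range(min(max_disp, x - half_win) + 1):      # loop 3: disparities
                cost = 0
                for j in range(-half_win, half_win + 1):          # loop 4: window rows
                    for i in range(-half_win, half_win + 1):      # loop 5: window columns
                        cost += abs(left[y + j][x + i] - right[y + j][x + i - d])
                if cost < best_cost:                              # best so far? store it
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp
```

Border pixels whose window would fall outside the image are left at disparity 0 in this sketch.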
Eliminating redundant computations: for the same disparity, overlapping windows recompute the same dissimilarities for many pixels
Block matching: another view. Alternatively, precompute D(x,y,d) = dissim( IL(x,y), IR(x-d,y) ) for all x, y, d, then for each (x,y) select the best d. [Figure: volume with axes x, y, d]
More efficient block matching. Key idea: Summation over window is convolution with box filter, which is separable. Running sum improves efficiency even more (only 3 nested for loops!!!)
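A rough sketch of this idea follows, assuming SAD and list-of-rows images; `box_sum_1d` and `block_match_fast` are hypothetical names. For each disparity, the per-pixel dissimilarity D(x,y,d) is computed once, then summed over the window with a horizontal and a vertical running sum, so the window no longer needs its own pair of inner loops.

```python
def box_sum_1d(values, half_win):
    """Sliding-window sum via a running sum; positions without a full window stay 0."""
    n = len(values)
    out = [0] * n
    running = sum(values[: 2 * half_win + 1])
    for c in range(half_win, n - half_win):
        out[c] = running
        if c + half_win + 1 < n:
            # Slide the window one step: add the entering value, drop the leaving one.
            running += values[c + half_win + 1] - values[c - half_win]
    return out


def block_match_fast(left, right, max_disp, half_win):
    """SAD block matching using a separable box filter on D(x, y, d)."""
    height, width = len(left), len(left[0])
    best = [[float("inf")] * width for _ in range(height)]
    disp = [[0] * width for _ in range(height)]
    for d in range(max_disp + 1):
        # Per-pixel dissimilarity D(x, y, d); columns with x < d have no match (placeholder 0).
        per_pixel = [[abs(left[y][x] - right[y][x - d]) if x >= d else 0
                      for x in range(width)] for y in range(height)]
        horiz = [box_sum_1d(row, half_win) for row in per_pixel]      # horizontal pass
        summed = [box_sum_1d([horiz[y][x] for y in range(height)], half_win)
                  for x in range(width)]                              # vertical pass, summed[x][y]
        for y in range(half_win, height - half_win):
            # Start at half_win + d so the whole window has valid right-image pixels.
            for x in range(half_win + d, width - half_win):
                if summed[x][y] < best[y][x]:
                    best[y][x] = summed[x][y]
                    disp[y][x] = d
    return disp
```

The cost per pixel and disparity is now independent of the window size, which is what makes large aggregation windows affordable.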
Comparing image regions Compare intensities pixel-by-pixel I(x,y) I´(x,y) Dissimilarity measures Sum of Square Differences Note: SAD is fast approximation (replace square with absolute value) M. Pollefeys, http://www.cs.unc.edu/Research/vision/comp256fall03/
Comparing image regions Compare intensities pixel-by-pixel I(x,y) I´(x,y) Dissimilarity measures If energy does not change much, then minimizing SSD equals maximizing cross-correlation M. Pollefeys, http://www.cs.unc.edu/Research/vision/comp256fall03/
Comparing image regions Compare intensities pixel-by-pixel I(x,y) I´(x,y) Similarity measures Zero-mean Normalized Cross Correlation M. Pollefeys, http://www.cs.unc.edu/Research/vision/comp256fall03/
Dissimilarity measures Most common: Connection between SSD and cross correlation: Also normalized correlation, rank, census, sampling-insensitive ...
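For reference, the two most common window-based measures and the SSD expansion can be written out as below. The notation is an assumption made here for illustration: W is the correlation window, d the candidate disparity, and I_L, I_R the left and right images, matching the D(x,y,d) convention used for block matching.

```latex
\begin{align*}
\mathrm{SAD}(x,y,d) &= \sum_{(i,j)\in W} \bigl| I_L(x+i,\,y+j) - I_R(x+i-d,\,y+j) \bigr| \\
\mathrm{SSD}(x,y,d) &= \sum_{(i,j)\in W} \bigl( I_L(x+i,\,y+j) - I_R(x+i-d,\,y+j) \bigr)^2 \\
                    &= \sum_{W} I_L^2 \;+\; \sum_{W} I_R^2 \;-\; 2 \sum_{W} I_L\, I_R
\end{align*}
```

If the two energy terms (the sums of squares) change little with d, then minimizing SSD amounts to maximizing the cross-correlation term, the last sum.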
Comparing image regions: Compare intensities pixel-by-pixel I(x,y) I´(x,y). Similarity measures: Census [Figure: example window of intensities and its bit signature]. Only compare the bit signature, using XOR, SAD, or Hamming distance (all equivalent on bit strings). (Real-time chip from Tyzx based on Census) M. Pollefeys, http://www.cs.unc.edu/Research/vision/comp256fall03/
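A small sketch of the census idea, assuming a 3x3 window and the convention that a bit is set when the neighbor is darker than the center (other conventions exist); the function names are hypothetical.

```python
def census_3x3(image, x, y):
    """8-bit census signature of the 3x3 window centered at (x, y)."""
    center = image[y][x]
    bits = 0
    for j in (-1, 0, 1):
        for i in (-1, 0, 1):
            if i == 0 and j == 0:
                continue  # the center pixel is not compared with itself
            bits = (bits << 1) | (1 if image[y + j][x + i] < center else 0)
    return bits


def hamming(a, b):
    """Number of differing bits, i.e. the population count of a XOR b."""
    return bin(a ^ b).count("1")


patch = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
brighter = [[2 * v + 5 for v in row] for row in patch]  # gain and offset change
print(hamming(census_3x3(patch, 1, 1), census_3x3(brighter, 1, 1)))
```

Because the signature depends only on the ordering of intensities, it is unchanged by any monotonically increasing change in brightness, such as the gain and offset applied above.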
Sampling-Insensitive Pixel Dissimilarity [Figure: IL and IR sampled at xL and xR] Let d̄(xL,xR) be the minimum of |IL(xL) – ÎR(x)| over x in [xR – ½, xR + ½], where ÎR is the linearly interpolated right scanline, and define d̄(xR,xL) symmetrically. Our dissimilarity measure: d(xL,xR) = min{ d̄(xL,xR), d̄(xR,xL) } [Birchfield & Tomasi 1998]
Dissimilarity Measure Theorems. Given: An interval A such that [xL – ½ , xL + ½] ⊆ A and [xR – ½ , xR + ½] ⊆ A. Theorem 1: If | xL – xR | ≤ ½, then d(xL,xR) = 0 (when the intensity is convex or concave over A). Theorem 2: | xL – xR | ≤ ½ iff d(xL,xR) = 0 (when the intensity is linear over A). [Birchfield & Tomasi 1998]
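The measure can be computed in closed form from the half-sample interpolated values on either side of a pixel. The sketch below operates on single scanlines stored as lists; the function names are hypothetical, and clamping the neighbors at the scanline ends is an assumption made here for simplicity.

```python
def half_sample_dissim(a, xa, b, xb):
    """Distance from a[xa] to the linearly interpolated b over [xb - 1/2, xb + 1/2]."""
    center = b[xb]
    prev = b[max(xb - 1, 0)]              # clamp at the scanline ends (assumption)
    nxt = b[min(xb + 1, len(b) - 1)]
    lo_half = 0.5 * (center + prev)       # interpolated value at xb - 1/2
    hi_half = 0.5 * (center + nxt)        # interpolated value at xb + 1/2
    b_min = min(lo_half, hi_half, center)
    b_max = max(lo_half, hi_half, center)
    return max(0.0, a[xa] - b_max, b_min - a[xa])


def bt_dissimilarity(left, x_left, right, x_right):
    """Symmetric sampling-insensitive dissimilarity between two pixels."""
    return min(half_sample_dissim(left, x_left, right, x_right),
               half_sample_dissim(right, x_right, left, x_left))


# A linear ramp sampled with a half-pixel shift between the two scanlines.
left = [0, 2, 4, 6, 8]
right = [1, 3, 5, 7, 9]
print(abs(left[2] - right[2]), bt_dissimilarity(left, 2, right, 2))
```

On this ramp the plain absolute difference is nonzero even for the correct match, while the sampling-insensitive measure is zero, as the theorems predict for a linear intensity function.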
Aggregation window sizes. Small windows: disparities similar; more ambiguities; accurate when correct. Large windows: larger disp. variation; more discriminant; often more robust. Use shiftable windows to deal with discontinuities. (Illustration from Pascal Fua)
Occlusions [Figure: left, right] If pixel matches do not agree in both directions, then the match is unreliable
Left-right consistency check: Search left-to-right, then right-to-left. Retain disparity only if they agree. Conceptually: do the minima coincide? [Figure axes: xL, d]
Left-right consistency check: for pixel (x,y) in left image, choices are D(x,y,0), D(x,y,1), D(x,y,2), …, D(x,y,max_disp); for pixel (x,y) in right image, choices are D(x,y,0), D(x+1,y,1), D(x+2,y,2), …, D(x+max_disp,y,max_disp), because xL = xR + disparity. [Figure axes: xL, d]
Left-right consistency check d xL Actually,
With left-right check inefficient: more efficient:
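One way to perform the check on a single scanline, reusing the precomputed D so that nothing is matched twice, is sketched below. It is not the code from the slide: `D[x][d]` is assumed to hold the aggregated dissimilarity between left pixel x and right pixel x - d (infinite where x - d falls outside the image), and the function name is hypothetical.

```python
def left_right_check(D, max_disp):
    """Return the left disparities, with None where the two directions disagree."""
    width = len(D)
    # Left pixel x chooses among D[x][0], D[x][1], ..., D[x][max_disp].
    d_left = [min(range(max_disp + 1), key=lambda d: D[x][d]) for x in range(width)]
    # Right pixel x chooses among D[x][0], D[x+1][1], ..., because xL = xR + d.
    d_right = [min((d for d in range(max_disp + 1) if x + d < width),
                   key=lambda d: D[x + d][d]) for x in range(width)]
    result = []
    for x_left, d in enumerate(d_left):
        x_right = x_left - d
        agrees = x_right >= 0 and d_right[x_right] == d
        result.append(d if agrees else None)
    return result
```

Pixels marked None are typically occluded or ambiguous and can be filled in later from reliable neighbors.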
Results: correlation left disparity map with left-right consistency check
Constraints. Epipolar: match must lie on epipolar line; Piecewise constancy: neighboring pixels should usually have same disparity; Piecewise continuity: neighboring pixels should usually have similar disparity; Disparity: impose allowable range of disparities (Panum’s fusional area); Disparity gradient: restricts slope of disparity; Figural continuity: disparity of edges across scanlines; Uniqueness: each pixel has no more than one match (violated by windows and mirrors); Ordering: disparity function is monotonic (precludes thin poles)
Stereo constraints: cheirality, maximum disparity, uniqueness, ordering (monotonicity). When are these violated?
Forbidden zone [Figure: surface in the world, world point, left camera, right camera] No matches are possible in the hourglass-shaped region (if surface is continuous). (Related to ordering constraint)
Violation of ordering constraint [Figure: thin pole in front of a surface with labeled points a through f, as seen from the two cameras]
Disparity gradient. x1 in IL matches x’1 in IR: d1 = 7-3 = 4, cyclopean coordinate xc = (7+3)/2 = 5. x2 in IL matches x’2 in IR, three cases: d2 = 8-4 = 4, xc = (8+4)/2 = 6, d.g. = 0; d2 = 8-3 = 5, xc = (8+3)/2 = 5.5, d.g. = 2; d2 = 6-4 = 2, xc = (6+4)/2 = 5, d.g. = ∞
Disparity gradient constraint (human visual system imposes this) (same as ordering constraint)
Figural continuity constraint right left [University of Tsukuba]
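The worked numbers for the disparity gradient can be reproduced with a few lines. In this sketch a match is a hypothetical `(x_left, x_right)` pair, and the disparity gradient is taken as the difference in disparity divided by the difference in cyclopean coordinate, which is the definition consistent with the three cases listed for the match (7, 3).

```python
def disparity_gradient(match_a, match_b):
    """Disparity gradient between two matches, each given as (x_left, x_right)."""
    if match_a == match_b:
        raise ValueError("disparity gradient is undefined for identical matches")
    (xl_a, xr_a), (xl_b, xr_b) = match_a, match_b
    d_a, d_b = xl_a - xr_a, xl_b - xr_b                      # disparities
    xc_a, xc_b = (xl_a + xr_a) / 2, (xl_b + xr_b) / 2        # cyclopean coordinates
    separation = abs(xc_b - xc_a)
    if separation == 0:
        return float("inf")
    return abs(d_b - d_a) / separation


for second in ((8, 4), (8, 3), (6, 4)):
    print(second, disparity_gradient((7, 3), second))
```

The middle case, where two left pixels map to the same right pixel, sits exactly at a gradient of 2, and the last case, where the left-to-right order is reversed, gives an infinite gradient.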
Outline Stereo basics Binocular stereo matching Advanced stereo techniques
Dynamic Programming: 1D Search. String editing of "cat" into "cart", with penalties: mismatch = 1, insertion = 1, deletion = 1:
         c  a  r  t
      0  1  2  3  4
   c  1  0  1  2  3
   a  2  1  0  1  2
   t  3  2  1  1  1
Stereo matching: [Figure: LEFT versus RIGHT scanline, path with occlusion and depth discontinuity, resulting disparity map]
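The string-editing table can be filled in with the standard dynamic-programming recurrence. This sketch uses the penalties stated on the slide as defaults; the function name is hypothetical.

```python
def edit_distance_table(source, target, mismatch=1, insertion=1, deletion=1):
    """Fill the DP table; table[i][j] is the cost of editing source[:i] into target[:j]."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * deletion
    for j in range(1, cols):
        table[0][j] = j * insertion
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            table[i][j] = min(
                table[i - 1][j - 1] + (0 if same else mismatch),  # match or mismatch
                table[i - 1][j] + deletion,                       # skip a source symbol
                table[i][j - 1] + insertion,                      # skip a target symbol
            )
    return table


for row in edit_distance_table("cat", "cart"):
    print(row)
```

In the stereo analogy, the two strings are the left and right scanlines, a mismatch cost becomes the pixel dissimilarity, and insertions and deletions correspond to occluded pixels.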
Minimizing a 2D Cost Functional. Minimize a cost over the disparity d of every pixel, with a discontinuity penalty u_{p,q}(l) between neighboring pixels p and q. [Figure: disparity versus pixel in the 1D and 2D cases; LOCAL minimization (BAD) versus GLOBAL minimization (GOOD); the minimum cut is the disparity surface that solves the global problem]
Multiway-Cut: 2D Search [Figure: labels versus pixels] [Boykov, Veksler, Zabih 1998]
Multiway-Cut Algorithm [Figure: graph of pixels and labels with source label, sink label, and minimum cut] Minimizes the sum of (cost of assigning label to pixel) and (cost of label discontinuity)
Energy minimization (Slide from Pascal Fua)
Graph Cut (general formulation requires multi-way cut!) (Slide from Pascal Fua)
Simplified graph cut (Roy and Cox ICCV‘98) (Boykov et al ICCV‘99)
Correspondence as Segmentation. Problem: disparities (fronto-parallel): O(D); surfaces (slanted): O(D s2 n) => computationally intractable! Solution: iteratively determine which labels to use: label pixels with multiway-cut (Expectation); find affine parameters of regions with Newton-Raphson (Maximization)
Stereo Results (Dynamic Programming)
Stereo Results (Multiway-Cut)
Stereo Results on Middlebury Database [Columns: image; Birchfield-Tomasi 1999; Hong-Chen 2004]
Untextured regions remain a challenge Dynamic programming Multiway-cut
Results: dynamic programming left disparity map [Bobick & Intille]
Results: multiway cut left disparity map [Kolmogorov & Zabih]
Results: multiway cut (untextured) disparity map
Multi-camera configurations (illustration from Pascal Fua) Okutomi and Kanade M. Pollefeys, http://www.cs.unc.edu/Research/vision/comp256fall03/
Tsukuba dataset Example: Tsukuba
Real-time stereo on GPU (Yang and Pollefeys, CVPR2003): Computes Sum-of-Square-Differences (use pixel shader). Hardware mip-map generation for aggregation over window. Trade-off between small and large support window. 290M disparity hypotheses/sec (Radeon 9800 Pro), e.g. 512x512x36 disparities at 30Hz. GPU is great for vision too!
Stereo matching. Constraints: epipolar, ordering, uniqueness, disparity limit. Trade-off: matching cost (data) versus discontinuities (prior). Similarity measure (SSD or NCC). Optimal path (dynamic programming): consider all paths that satisfy the constraints; pick best using dynamic programming. M. Pollefeys, http://www.cs.unc.edu/Research/vision/comp256fall03/
Hierarchical stereo matching Allows faster computation Deals with large disparity ranges Downsampling (Gaussian pyramid) Disparity propagation M. Pollefeys, http://www.cs.unc.edu/Research/vision/comp256fall03/
Disparity map: image I(x,y), disparity map D(x,y), image I´(x´,y´), with (x´,y´)=(x+D(x,y),y) M. Pollefeys, http://www.cs.unc.edu/Research/vision/comp256fall03/
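The mapping (x´,y´)=(x+D(x,y),y) can be used to resample one image into the frame of the other. The sketch below assumes integer disparities and list-of-rows images; the function name and the `fill` parameter are hypothetical. Note that the sign convention here follows this slide (x´ = x + D), whereas the block-matching slides use xR = xL – d.

```python
def warp_with_disparity(other, disparity, fill=None):
    """Predict I(x, y) by sampling I'(x + D(x, y), y); out-of-range samples get `fill`."""
    height, width = len(disparity), len(disparity[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            x_src = x + disparity[y][x]
            if 0 <= x_src < width:
                out[y][x] = other[y][x_src]
    return out


print(warp_with_disparity([[1, 2, 3, 4]], [[1, 1, 1, 1]]))
```

Comparing the warped image against the actual image I is a simple way to judge the quality of a disparity map when no ground truth is available.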
Example: reconstruct image from neighboring images M. Pollefeys, http://www.cs.unc.edu/Research/vision/comp256fall03/
Stereo matching with general camera configuration M. Pollefeys, http://www.cs.unc.edu/Research/vision/comp256fall03/
Image pair rectification M. Pollefeys, http://www.cs.unc.edu/Research/vision/comp256fall03/
Planar rectification: Bring two views to standard stereo setup (moves epipole to ∞) (not possible when epipole is in/close to image). ~ image size (calibrated). Distortion minimization (uncalibrated). M. Pollefeys, http://www.cs.unc.edu/Research/vision/comp256fall03/
original image pair planar rectification polar rectification M. Pollefeys, http://www.cs.unc.edu/Research/vision/comp256fall03/
Stereo camera configurations (Slide from Pascal Fua)
More cameras Multi-baseline stereo [Okutomi & Kanade]
Applications www.bigstage.com Take 3 pictures, reconstruct 3D geometry