3D Vision and Graphics
The Geometry of Multiple Views Motion Field Stereo Epipolar Geometry The Essential Matrix The Fundamental Matrix Structure from Motion
Multiple View/Motion Research Motion Research Motion Segmentation Tracking Structure From Motion Infinitesimal SFM Multiview Methods Orthographic/Affine Perspective Uncalibrated Stereo
The Motion Field
Motion “When objects move at equal speed, those more remote seem to move more slowly.” - Euclid, 300 BC
The Motion Field Where in the image did a point move? Down and left
Motion Field Equation v_x, v_y: Components of image motion. T: Components of 3-D linear motion. ω: Angular velocity vector. (x, y): Image point coordinates. Z: depth. f: focal length.
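For reference, one standard form of the motion field equations written in exactly these symbols is sketched below. It assumes the sign convention of Trucco and Verri (a camera moving with translational velocity T and angular velocity ω through a static scene); other texts differ in sign.

```latex
\begin{aligned}
v_x &= \frac{T_z x - T_x f}{Z} - \omega_y f + \omega_z y
       + \frac{\omega_x x y}{f} - \frac{\omega_y x^2}{f},\\
v_y &= \frac{T_z y - T_y f}{Z} + \omega_x f - \omega_z x
       - \frac{\omega_y x y}{f} + \frac{\omega_x y^2}{f}.
\end{aligned}
```

The first term of each line is the translational component (scaled by 1/Z); the remaining terms are the rotational component, which does not involve Z.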
Pure Translation
Forward Translation & Focus of Expansion [Gibson, 1950]
Pure Rotation: T = 0 Independent of T_x, T_y, T_z. Independent of Z. Only a function of (x, y), f and ω.
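The two special cases above can be checked numerically. The following is a minimal sketch, assuming the Trucco and Verri sign convention for the motion field; the function name `motion_field` and the sample values are illustrative only.

```python
def motion_field(x, y, Z, T, omega, f=1.0):
    """Image velocity (v_x, v_y) at image point (x, y) with depth Z.

    Assumed convention (Trucco and Verri): the camera translates with
    velocity T and rotates with angular velocity omega in a static scene.
    """
    Tx, Ty, Tz = T
    wx, wy, wz = omega
    # Translational part: scaled by 1/Z, so it carries the depth information.
    vx_t = (Tz * x - Tx * f) / Z
    vy_t = (Tz * y - Ty * f) / Z
    # Rotational part: Z does not appear, so it is independent of depth.
    vx_r = -wy * f + wz * y + wx * x * y / f - wy * x * x / f
    vy_r = wx * f - wz * x - wy * x * y / f + wx * y * y / f
    return vx_t + vx_r, vy_t + vy_r


zero = (0.0, 0.0, 0.0)

# Pure rotation (T = 0): the same image point moves identically at any depth.
omega = (0.0, 0.1, 0.0)
near = motion_field(0.2, -0.1, 2.0, zero, omega)
far = motion_field(0.2, -0.1, 50.0, zero, omega)

# Pure translation with a forward component: the flow vanishes at the focus
# of expansion (f*Tx/Tz, f*Ty/Tz) and points away from it elsewhere.
T = (0.2, 0.0, 1.0)
foe = (T[0] / T[2], T[1] / T[2])  # f = 1
at_foe = motion_field(foe[0], foe[1], 5.0, T, zero)
right_of_foe = motion_field(0.4, 0.0, 5.0, T, zero)
```

Here `near` and `far` are equal, `at_foe` is zero, and `right_of_foe` has a positive horizontal component.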
Pure Rotation: Motion Field on Sphere
Binocular Stereo Some slides on stereo adapted from Kristian Kirk
Traditional Application: Mobile Robot Navigation The Stanford Cart, H. Moravec, The INRIA Mobile Robot, Courtesy O. Faugeras and H. Moravec.
What is binocular stereo vision? A way of getting depth (3-D) information about a scene from two 2-D views (images) of the scene. Used by humans and animals. Computational stereo vision: programming machines to do stereo vision. Studied extensively in the past 25 years. Difficult; still being researched.
Fundamentals of Stereo Vision A camera model: Models how 3-D scene points are transformed into 2-D image points The pinhole camera: a simple linear model for perspective projection
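As a minimal sketch of the pinhole model, assuming the optical centre is at the origin, the optical axis is +Z and the image plane lies at distance f (the function name `project_pinhole` is illustrative):

```python
def project_pinhole(point, f=1.0):
    """Perspective projection of a 3-D point (X, Y, Z) in camera coordinates.

    Assumes the optical centre is at the origin, the optical axis is +Z and
    the image plane is at distance f (illustrative conventions).
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)


# Two points on the same ray through the pinhole project to the same image
# point, which is why a single view cannot recover depth.
a = project_pinhole((1.0, 2.0, 4.0))
b = project_pinhole((2.0, 4.0, 8.0))
```

Both `a` and `b` are the image point (0.25, 0.5).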
Fundamentals of Stereo Vision The goal of stereo analysis: The inverse process: From 2-D image coordinates to 3-D scene coordinates Requires images from at least two views
Fundamentals of Stereo Vision 3-D reconstruction
Prerequisites Camera model parameters must be known: External parameters: Positions, orientations. Internal parameters: Focal length, image center, distortion, etc.
Prerequisites Camera calibration
Two Subproblems Matching (hardest) Finding corresponding elements in the two images Reconstruction Establishing 3-D coordinates from the 2-D image correspondences found during matching
The Matching Problem Which image entities should be matched? Two main approaches: Pixel/area-based (lower-level), Feature-based (higher-level)
Matching Challenges Scene elements do not always look the same in the two images. Camera-related problems: Image noise, differing gain, contrast, etc. Viewpoint-related problems: Perspective distortions, Occlusions, Specular reflections
Choice of Camera Setup Baseline distance between cameras (focal points) Trade-off Small baseline: Matching easier Large baseline: Depth precision better
Matching Clues Correspondence search is a 1-D problem: the matching point must lie on a line
Matching Clues Epipolar geometry
Rectification Simplifies the correspondence search. Makes all epipolar lines parallel and coincident. Corresponds to parallel camera configuration
Rectification All epipolar lines are parallel in the rectified image plane.
Reconstruction from Rectified Images Disparity: d = u’ - u. Depth: z = -B/d.
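A small numerical check of the disparity and depth relations, as a sketch under stated assumptions: u and u’ are normalized image coordinates (focal length 1), the second camera is displaced by the baseline B along the x-axis, and depth is measured along the forward-pointing z-axis. The function name and sample values are illustrative.

```python
def depth_from_disparity(u_left, u_right, baseline):
    """Depth from a rectified match, using d = u' - u and z = -B/d.

    Assumes normalized image coordinates and a second camera displaced by
    the baseline along the x-axis (illustrative conventions).
    """
    d = u_right - u_left
    if d == 0:
        raise ValueError("zero disparity: the point is at infinity")
    return -baseline / d


# Project a known point into both rectified cameras, then recover its depth.
X, Z, B = 1.0, 4.0, 0.5
u = X / Z             # first camera at the origin
u_prime = (X - B) / Z  # second camera displaced by B along x
z = depth_from_disparity(u, u_prime, B)
```

With these values the disparity is -0.125 and the recovered depth `z` equals the true depth 4.0. Halving the depth doubles the magnitude of the disparity.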
Goal: Disparity Map Disparity: The horizontal displacement between corresponding points Closely related to scene depth
Epipolar geometry example
image I(x,y) image I´(x´,y´) Disparity map D(x,y) (x´,y´)=(x+D(x,y),y)
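The relation (x´,y´)=(x+D(x,y),y) says that pixel (x, y) of I can be predicted by sampling I´ at the displaced column. A minimal sketch, assuming integer disparities and images stored as lists of rows (the function name `warp_with_disparity` is illustrative):

```python
def warp_with_disparity(image_prime, disparity):
    """Predict I(x, y) as I'(x + D(x, y), y) for integer disparities.

    Pixels whose match falls outside I' are marked None.
    """
    predicted = []
    for y, d_row in enumerate(disparity):
        row = []
        for x, d in enumerate(d_row):
            xs = x + d
            inside = 0 <= xs < len(image_prime[y])
            row.append(image_prime[y][xs] if inside else None)
        predicted.append(row)
    return predicted


image_prime = [[1, 2, 3, 4]]
disparity = [[1, 1, 1, 1]]
predicted = warp_with_disparity(image_prime, disparity)
```

Here `predicted` is `[[2, 3, 4, None]]`: every pixel is taken one column to the right, and the last one has no match inside the image.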
Human Stereopsis: Binocular Fusion How are the correspondences established? Julesz (1971): Is the mechanism for binocular fusion a monocular process or a binocular one? There is anecdotal evidence for the latter (camouflage). Random dot stereograms provide an objective answer
A Cooperative Model (Marr and Poggio, 1976) Excitatory connections: continuity. Inhibitory connections: uniqueness. Iterate: C = Σ_e C - w Σ_i C + C_0. Reprinted from Vision: A Computational Investigation into the Human Representation and Processing of Visual Information by David Marr. © 1982 by David Marr. Reprinted by permission of Henry Holt and Company, LLC.
More Matching Heuristics Always valid: (Epipolar line) Uniqueness Minimum/maximum disparity Sometimes valid: Ordering Local continuity (smoothness)
Area-based Matching Finding pixel-to-pixel correspondences For each pixel in the left image, search for the most similar pixel in the right image Using neighbourhood windows
Area-based Matching Similarity measures for two windows SAD (sum of absolute differences) SSD (sum of squared differences) CC (cross-correlation) …
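The three measures can be sketched as follows for windows flattened into lists of intensities. The cross-correlation shown is the zero-mean normalized variant, which is one common choice and an assumption here; the slide does not say which variant is meant.

```python
import math


def sad(w1, w2):
    """Sum of absolute differences: lower means more similar."""
    return sum(abs(a - b) for a, b in zip(w1, w2))


def ssd(w1, w2):
    """Sum of squared differences: lower means more similar."""
    return sum((a - b) ** 2 for a, b in zip(w1, w2))


def ncc(w1, w2):
    """Zero-mean normalized cross-correlation in [-1, 1]: higher is better."""
    m1 = sum(w1) / len(w1)
    m2 = sum(w2) / len(w2)
    d1 = [a - m1 for a in w1]
    d2 = [b - m2 for b in w2]
    denom = math.sqrt(sum(a * a for a in d1) * sum(b * b for b in d2))
    if denom == 0:
        return 0.0  # a constant window carries no texture to correlate
    return sum(a * b for a, b in zip(d1, d2)) / denom


window = [10, 20, 30, 40]
# The same patch seen with a different camera gain and offset.
brighter = [2 * v + 5 for v in window]

sad_value = sad(window, brighter)
ssd_value = ssd(window, brighter)
ncc_value = ncc(window, brighter)
```

SAD and SSD report a large difference (120 and 4100) even though the two windows show the same pattern, while the normalized correlation stays at 1 up to rounding. This is one reason correlation is often preferred when gain and contrast differ between the cameras.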
Area-based Matching Choice of window size Factors to consider: Ambiguity, Noise sensitivity, Sensitivity towards viewpoint-related distortions, Expected object sizes, Frequency of depth jumps
Correlation Methods (1970--) Slide the window along the epipolar line until w · w’ is maximized.
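The sliding-window search can be sketched for a single rectified scanline as below. This is a minimal illustration, assuming w and w’ are zero-mean, unit-normalized window vectors and that the disparity range is known; the function names and sample rows are illustrative.

```python
import math


def normalized_correlation(w1, w2):
    """Dot product of the zero-mean, unit-length versions of two windows."""
    m1 = sum(w1) / len(w1)
    m2 = sum(w2) / len(w2)
    d1 = [a - m1 for a in w1]
    d2 = [b - m2 for b in w2]
    denom = math.sqrt(sum(a * a for a in d1) * sum(b * b for b in d2))
    return 0.0 if denom == 0 else sum(a * b for a, b in zip(d1, d2)) / denom


def match_along_scanline(left_row, right_row, x, half, min_d, max_d):
    """Disparity d in [min_d, max_d] that maximizes the correlation between
    the window centred at x in left_row and at x + d in right_row."""
    ref = left_row[x - half:x + half + 1]
    best_d, best_score = None, -2.0
    for d in range(min_d, max_d + 1):
        lo, hi = x + d - half, x + d + half + 1
        if lo < 0 or hi > len(right_row):
            continue  # window would leave the image
        score = normalized_correlation(ref, right_row[lo:hi])
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


left_row = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
# The same scene shifted two pixels to the left in the other image.
right_row = [4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 7, 9]
best_d, best_score = match_along_scanline(left_row, right_row, 5, 1, -4, 0)
```

The search returns a disparity of -2 with a correlation of 1 up to rounding.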
Correlation Methods: Foreshortening Problems Solution: add a second pass using disparity estimates to warp the correlation windows, e.g. Devernay and Faugeras (1994). Reprinted from “Computing Differential Properties of 3D Shapes from Stereopsis without 3D Models,” by F. Devernay and O. Faugeras, Proc. IEEE Conf. on Computer Vision and Pattern Recognition (1994). © 1994 IEEE.
Dynamic Programming (Baker and Binford, 1981) Find the minimum-cost path going monotonically down and right from the top-left corner of the graph to its bottom-right corner. Nodes = matched feature points (e.g., edge points). Arcs = matched intervals along the epipolar lines. Arc cost = discrepancy between intervals.
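The minimum-cost monotone path can be sketched with a simplified cost model. This is not the Baker and Binford formulation: it matches individual pixel intensities instead of edge features and intervals, and it charges a fixed, assumed `occlusion_cost` for every unmatched pixel. All names and values are illustrative.

```python
def dp_scanline(left, right, occlusion_cost=2.0):
    """Minimum-cost monotone alignment of two scanlines.

    Returns (total_cost, matches), where matches lists (i, j) index pairs.
    """
    n, m = len(left), len(right)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # match left[i-1] with right[j-1]
                c = cost[i - 1][j - 1] + abs(left[i - 1] - right[j - 1])
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "match"
            if i > 0:  # left[i-1] is occluded in the right image
                c = cost[i - 1][j] + occlusion_cost
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "skip_left"
            if j > 0:  # right[j-1] is occluded in the left image
                c = cost[i][j - 1] + occlusion_cost
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "skip_right"
    # Walk back from the bottom-right corner to recover the path.
    matches, i, j = [], n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "skip_left":
            i -= 1
        else:
            j -= 1
    matches.reverse()
    return cost[n][m], matches


left = [20, 10, 50, 90, 30]
right = [10, 50, 90, 30, 40]
total_cost, matches = dp_scanline(left, right)
```

For these rows the path skips the first left pixel and the last right pixel and matches the four shared intensities, giving a total cost of 4.0 and the pairs (1, 0), (2, 1), (3, 2), (4, 3). Because the path only moves down and right, the ordering constraint is enforced by construction.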
Dynamic Programming (Ohta and Kanade, 1985) Reprinted from “Stereo by Intra- and Inter-Scanline Search,” by Y. Ohta and T. Kanade, IEEE Trans. on Pattern Analysis and Machine Intelligence, 7(2): (1985). © 1985 IEEE.
Hierarchical stereo matching Downsampling (Gaussian pyramid) Disparity propagation Allows faster computation Deals with large disparity ranges (Falkenhagen '97; Van Meerbergen, Vergauwen, Pollefeys, Van Gool, IJCV '02)
Disparity map image I(x,y) image I´(x´,y´) Disparity map D(x,y) (x´,y´)=(x+D(x,y),y)
Example: reconstruct image from neighboring images
Three or More Viewpoints More matching information Additional epipolar constraints More confident matches
Images I1, I2, I10. Reprinted from “A Multiple-Baseline Stereo System,” by M. Okutomi and T. Kanade, IEEE Trans. on Pattern Analysis and Machine Intelligence, 15(4): (1993). © 1993 IEEE.
Epipolar Geometry
Triangulation Nalwa Fig. 7.2
Epipolar Constraint Potential matches for p have to lie on the corresponding epipolar line l’. Potential matches for p’ have to lie on the corresponding epipolar line l.
Epipolar Geometry Epipolar Plane Epipoles Epipolar Lines Baseline
Epipolar Constraint: Calibrated Case Essential Matrix (Longuet-Higgins, 1981)
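In the calibrated case the constraint is usually written as below. This is a sketch under an assumed convention: p and p’ are normalized image coordinates, t is the position of the second optical centre in the first camera frame, and R maps coordinates of the second camera frame into the first.

```latex
p^{T} E \, p' = 0, \qquad E = [t_\times]\, R, \qquad
[t_\times] =
\begin{pmatrix}
0 & -t_z & t_y \\
t_z & 0 & -t_x \\
-t_y & t_x & 0
\end{pmatrix}
```

The constraint states that the three vectors p, t and Rp’ are coplanar: all of them lie in the epipolar plane.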
Properties of the Essential Matrix E p’ is the epipolar line associated with p’. E^T p is the epipolar line associated with p. E e’ = 0 and E^T e = 0. E is singular. E has two equal non-zero singular values (Huang and Faugeras, 1989).
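These properties can be checked on a synthetic example. The sketch below assumes E = [t×]R with t the position of the second optical centre in the first camera frame and R mapping second-frame coordinates into the first frame; the rotation, translation and scene point are hypothetical values chosen for illustration.

```python
import math


def cross_matrix(t):
    """Skew-symmetric matrix [t x] such that [t x] v = t x v."""
    return [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical relative pose: small rotation about the y-axis plus translation.
c, s = math.cos(0.1), math.sin(0.1)
R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
t = [1.0, 0.0, 0.2]
E = matmul(cross_matrix(t), R)

# A scene point in the first camera frame and the same point in the second:
# P = t + R P'  =>  P' = R^T (P - t).
P = [0.5, -0.3, 4.0]
P2 = matvec(transpose(R), [P[i] - t[i] for i in range(3)])
p = [v / P[2] for v in P]      # normalized image point in view 1
p2 = [v / P2[2] for v in P2]   # normalized image point in view 2

residual = dot(p, matvec(E, p2))       # epipolar constraint p^T E p'
left_null = matvec(transpose(E), t)    # E^T t, with t along the epipole e
```

Both `residual` and every entry of `left_null` vanish up to rounding, which illustrates the epipolar constraint and the singularity of E.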
Perspective Projection 8-point algorithm Uncalibrated cameras
Epipolar Constraint: Uncalibrated Case Fundamental Matrix (Faugeras and Luong, 1992)
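For uncalibrated cameras the same constraint is written on pixel coordinates. The sketch below assumes K and K’ are the 3 by 3 intrinsic calibration matrices of the two cameras, with pixel points p = K p̂ and p’ = K’ p̂’ obtained from the normalized points p̂ and p̂’; these symbol names are assumptions.

```latex
p^{T} F \, p' = 0, \qquad F = K^{-T} E \, K'^{-1}
```

Substituting p̂ = K⁻¹p and p̂’ = K’⁻¹p’ into the calibrated constraint p̂ᵀ E p̂’ = 0 gives this form directly.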
Properties of the Fundamental Matrix F p’ is the epipolar line associated with p’. F^T p is the epipolar line associated with p. F e’ = 0 and F^T e = 0. F is singular.
E vs. F The Essential Matrix E: Encodes information on the extrinsic parameters only. Has rank 2, since R is full rank and [T×] is skew-symmetric with rank 2. Its two non-zero singular values are equal. 5 degrees of freedom. The Fundamental Matrix F: Encodes information on both the intrinsic and extrinsic parameters. Also has rank 2, since E has rank 2. 7 degrees of freedom.
The Eight-Point Algorithm (Longuet-Higgins, 1981) Minimize the sum of squared epipolar residuals (p_i^T F p_i’)^2 over the n point correspondences, under the constraint |F|^2 = 1.
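A minimal NumPy sketch of the linear estimate follows. It assumes homogeneous points stored as rows, solves the constrained least-squares problem with a singular value decomposition, and adds a rank-2 projection step. Coordinate normalization, which is commonly applied to real pixel data, is deliberately left out, and the synthetic camera motion is hypothetical.

```python
import numpy as np


def eight_point(p, p_prime):
    """Linear estimate of F with p_i^T F p'_i = 0 from n >= 8 matches.

    p and p_prime are (n, 3) arrays of homogeneous image points.
    """
    # Each match contributes one linear equation in the nine entries of F.
    A = np.stack([np.outer(a, b).ravel() for a, b in zip(p, p_prime)])
    # The unit-norm vector minimizing |A f|^2 is the right singular vector
    # of A with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    F = vt[-1].reshape(3, 3)
    # Enforce singularity (rank 2) by zeroing the smallest singular value.
    u, s, vt_f = np.linalg.svd(F)
    s[2] = 0.0
    return u @ np.diag(s) @ vt_f


# Synthetic, noise-free check with a hypothetical camera motion (R, t).
rng = np.random.default_rng(0)
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.0, 0.2])

P = np.column_stack([rng.uniform(-1, 1, 12),
                     rng.uniform(-1, 1, 12),
                     rng.uniform(4, 8, 12)])
P2 = (P - t) @ R          # rows are R^T (P - t): coordinates in camera 2
p = P / P[:, 2:3]         # normalized homogeneous points, view 1
p2 = P2 / P2[:, 2:3]      # normalized homogeneous points, view 2

F = eight_point(p, p2)
residuals = np.abs(np.einsum("ij,jk,ik->i", p, F, p2))
```

With noise-free data the residuals are zero up to rounding. With noisy data the result is only a starting point, for instance for a non-linear refinement.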
Non-Linear Least-Squares Approach (Luong et al., 1993) Minimize with respect to the coefficients of F.
Recovering Motion Recall the epipolar constraint p^T E p’ = 0, where E = [t×] R. Solve for R & t.
Rectification Given a pair of images, transform both images so that epipolar lines are scan lines. Estimate a Homography to apply to each image, given at least four corresponding points
Structure from Motion: Uncalibrated cameras and projective ambiguity
Projective Ambiguity We see that the same set of corresponding image points could come from two different sets of real world points and therefore both sets satisfy:
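The ambiguity is commonly expressed as follows. This is a sketch with assumed notation: M_i is the 3 by 4 projection matrix of camera i, P_j is the homogeneous 4-vector of scene point j, p_ij is its image, and Q is any nonsingular 4 by 4 matrix.

```latex
p_{ij} = M_i P_j = (M_i Q)\,(Q^{-1} P_j)
```

The cameras M_i Q and the points Q⁻¹ P_j produce exactly the same images as M_i and P_j, so without calibration the reconstruction is determined only up to a projective transformation Q.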
Factorization Method: Orthographic Cameras C. Tomasi, T. Kanade, Shape and Motion from Image Streams: A Factorization Method, IJCV, 9(2), 1992,
Factorization Notation N: Number of frames. n: Number of points. P_j: j-th point in 3-D. R_i, T_i: Rotation & Translation of Camera i. (x_{i,j}, y_{i,j}): image of the j-th point measured in the i-th frame.
Origin of 3-D Points at 3-D Centroid Centroid Centered Data Points
Origin of image at image Centroid Centered Image Points Centroid of image points is projection of 3-D centroid
Data Matrix The centered coordinates x_{i,j} and y_{i,j} are the (i,j)-th elements of the N by n data matrices X and Y. Stacking X on top of Y gives the 2N by n registered data matrix W. Rank Theorem: Without noise, the rank of W is less than or equal to 3.
The Rank Theorem: Factorized The registered measurement matrix can be expressed in matrix form: W = R S, where R (2N by 3) represents the camera rotation and S (3 by n) is the shape matrix.
Factoring Given a data matrix W containing measured feature points, it can be factored using singular value decomposition as W = U D V^T, where D: n by n diagonal matrix with non-negative entries σ_i, called singular values. U: 2N by n with orthogonal columns. V^T: n by n with orthogonal columns. Without noise, σ_i = 0 for i > 3. With noise, set σ_i = 0 for i > 3.
Factoring: After setting σ_i = 0 for i > 3, W = U D’ V^T, where D’: 3 by 3 diagonal matrix with non-negative entries. U: 2N by 3 with orthogonal columns. V^T: 3 by n with orthogonal rows.
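The rank theorem and the rank-3 factorization can be illustrated on synthetic data. The NumPy sketch below assumes noise-free orthographic cameras with random orientations and points already centered on their centroid; all names and sizes are illustrative. It stops before the metric step, so the factors are recovered only up to an invertible 3 by 3 matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 5, 20  # number of frames, number of points

# Synthetic 3-D shape with its origin at the centroid.
S_true = rng.normal(size=(3, n))
S_true -= S_true.mean(axis=1, keepdims=True)

x_rows, y_rows = [], []
for _ in range(N):
    # Q has orthonormal rows; its first two rows serve as the image axes
    # of an orthographic camera.
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    x_rows.append(Q[0] @ S_true)
    y_rows.append(Q[1] @ S_true)

# 2N x n registered data matrix: X stacked on top of Y.
W = np.vstack(x_rows + y_rows)

U, d, Vt = np.linalg.svd(W, full_matrices=False)

# Keep the three largest singular values and split them between the factors.
sqrt_D3 = np.diag(np.sqrt(d[:3]))
R_hat = U[:, :3] @ sqrt_D3   # 2N x 3, camera factor up to a 3 x 3 matrix
S_hat = sqrt_D3 @ Vt[:3]     # 3 x n, shape factor up to the inverse matrix
W3 = R_hat @ S_hat
```

Without noise the singular values beyond the third are zero up to rounding and `W3` reproduces `W`. With noisy measurements `W3` is the closest rank-3 matrix to `W` in the least-squares sense.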
Ambiguity W = (R A)(A^-1 S) is true for any invertible 3 by 3 matrix A. So, find an A such that the rows of RA have unit length and the pairs of rows corresponding to the same image are orthogonal. RA: Estimated camera orientation. A^-1 S: Estimated 3-D structure.
Four of 150 input images
Tracked Corner Features
3-D Reconstruction
Building
Reconstruction Reconstruction after Triangulation and Texture Mapping