CSCE 641 Computer Graphics: Image-based Modeling Jinxiang Chai
Outline Stereo matching - Traditional stereo - Multi-baseline stereo - Active stereo Volumetric stereo - Visual hull - Voxel coloring - Space carving
Multi-baseline stereo reconstruction Need to choose reference frames! Does not fully utilize pixel information across all images
Volumetric stereo Scene Volume V Input Images (Calibrated) Goal: Determine occupancy, “color” of points in V
Discrete formulation: Voxel Coloring Discretized Scene Volume Input Images (Calibrated) Goal: Assign RGBA values to voxels in V photo-consistent with images
Complexity and computability Discretized Scene Volume: $N^3$ voxels, $C$ colors All Scenes ($C^{N^3}$) Photo-Consistent Scenes True Scene
Issues Theoretical Questions Identify class of all photo-consistent scenes Practical Questions How do we compute photo-consistent models?
Voxel coloring solutions 1. C=2 (shape from silhouettes): Volume intersection [Baumgart 1974]. For more info: R. Szeliski, Rapid octree construction from image sequences, CVGIP: Image Understanding, 58(1):23-32, July 1993; or W. Matusik, C. Buehler, R. Raskar, L. McMillan, and S. J. Gortler, Image-Based Visual Hulls, SIGGRAPH 2000. 2. C unconstrained, viewpoint constraints: Voxel coloring algorithm [Seitz & Dyer 97] 3. General Case: Space carving [Kutulakos & Seitz 98]
Why use silhouettes? Can be computed robustly Can be computed efficiently (background subtraction: [background + foreground] - [background] = [foreground])
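The background-subtraction idea on this slide can be sketched in a few lines. This is a minimal illustration, not the method used for the results in this lecture: it assumes grayscale images stored as nested lists, and the function name `silhouette` and the threshold of 30 are hypothetical choices.

```python
def silhouette(image, background, threshold=30):
    """Return a binary mask: 1 where the image differs from the background.

    image, background: 2D lists of grayscale intensities with equal size.
    threshold: assumed intensity difference that counts as foreground.
    """
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(img_row, bg_row)]
            for img_row, bg_row in zip(image, background)]


# Toy 2x3 example: the middle column contains the foreground object.
background = [[100, 100, 100], [100, 100, 100]]
image = [[100, 220, 101], [98, 30, 100]]
mask = silhouette(image, background)
```

Real footage usually needs a per-pixel background model and some cleanup of the mask, which this sketch leaves out.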
Reconstruction from silhouettes (C = 2) Binary Images How to construct 3D volume?
Reconstruction from silhouettes (C = 2) Binary Images Approach: Backproject each silhouette Intersect backprojected volumes
Volume intersection Reconstruction Contains the True Scene But is generally not the same In the limit (all views) get visual hull >Complement of all lines that don’t intersect S
Voxel algorithm for volume intersection Color voxel black if on silhouette in every image O( ? ) for M images, $N^3$ voxels Don’t have to search $2^{N^3}$ possible scenes!
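The voxel test above can be sketched as a double loop over voxels and images, which makes the cost visible: at most one silhouette lookup per voxel per image, so at most M·N^3 lookups. The sketch assumes each view is given as a pair of a projection function and a binary silhouette; the name `carve_visual_hull` and the toy orthographic projections are hypothetical, chosen only to keep the example self-contained.

```python
from itertools import product


def carve_visual_hull(n, views):
    """Return the voxels of an n x n x n grid that lie on every silhouette.

    views: list of (project, silhouette) pairs. project maps a voxel index
    (i, j, k) to a pixel (row, col); silhouette is a 2D list of 0/1 values.
    """
    hull = set()
    for voxel in product(range(n), repeat=3):          # N^3 voxels
        on_all = True
        for project, sil in views:                     # M images
            r, c = project(voxel)
            in_bounds = 0 <= r < len(sil) and 0 <= c < len(sil[0])
            if not (in_bounds and sil[r][c]):
                on_all = False                          # off one silhouette
                break
        if on_all:
            hull.add(voxel)
    return hull


# Toy example: two orthographic views of a 4x4x4 grid, each seeing a
# 2x2 square silhouette in the middle of the image.
n = 4
square = [[1 if 1 <= r <= 2 and 1 <= c <= 2 else 0 for c in range(n)]
          for r in range(n)]
views = [
    (lambda v: (v[1], v[0]), square),   # looking along the third axis
    (lambda v: (v[2], v[0]), square),   # looking along the second axis
]
hull = carve_visual_hull(n, views)
```

With calibrated perspective cameras, `project` would apply the camera matrix instead of dropping a coordinate.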
Visual Hull Results Download data and results from
Properties of volume intersection Pros Easy to implement, fast Accelerated via octrees [Szeliski 1993] or interval techniques [Matusik 2000] Cons No concavities Reconstruction is not photo-consistent Requires identification of silhouettes
Voxel coloring solutions 1. C=2 (silhouettes) Volume intersection [Baumgart 1974] 2. C unconstrained, viewpoint constraints Voxel coloring algorithm [Seitz & Dyer 97] 3. General Case Space carving [Kutulakos & Seitz 98]
Voxel coloring approach 1. Choose voxel 2. Project and correlate 3. Color if consistent (standard deviation of pixel colors below threshold) Visibility Problem: in which images is each voxel visible?
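Step 3 can be sketched as a small consistency test. This is only one plausible reading of "standard deviation of pixel colors below threshold": it assumes the largest per-channel standard deviation is compared against the threshold, and the function name `photo_consistency` and the default threshold of 10 are hypothetical.

```python
from statistics import pstdev


def photo_consistency(samples, threshold=10.0):
    """Decide whether a voxel is photo-consistent.

    samples: list of (r, g, b) colors, one per image in which the voxel
    is visible. Returns (consistent, mean_color).
    """
    if not samples:
        return False, None                      # voxel seen by no image
    channels = list(zip(*samples))              # [(r...), (g...), (b...)]
    mean = tuple(sum(ch) / len(ch) for ch in channels)
    spread = max(pstdev(ch) for ch in channels)  # assumed: worst channel
    return spread < threshold, mean


# Three views that agree on a reddish color, then two views that disagree.
ok, color = photo_consistency([(200, 10, 10), (202, 12, 9), (198, 11, 11)])
bad, _ = photo_consistency([(200, 10, 10), (20, 200, 30)])
```

The samples passed in must already respect visibility, which is the problem the next slides address.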
The visibility problem Which points are visible in which images? Known Scene: Forward Visibility - Z-buffer - Painter’s algorithm Unknown Scene (known images): Inverse Visibility?
Layers Depth ordering: visit occluders first! Scene Traversal Condition: depth order is the same for all input views
Panoramic depth ordering Cameras oriented in many different directions Planar depth ordering does not apply
Panoramic depth ordering Layers radiate outwards from cameras
Panoramic layering Layers radiate outwards from cameras
Compatible camera configurations Depth-Order Constraint Scene outside convex hull of camera centers Outward-Looking cameras inside scene Inward-Looking cameras above scene
Calibrated image acquisition Calibrated Turntable 360° rotation (21 images) Selected Dinosaur Images Selected Flower Images
Voxel coloring results Dinosaur Reconstruction 72 K voxels colored 7.6 M voxels tested 7 min. to compute on a 250MHz SGI Flower Reconstruction 70 K voxels colored 7.6 M voxels tested 7 min. to compute on a 250MHz SGI
Limitations of depth ordering A view-independent depth order may not exist Need more powerful general-case algorithms - Unconstrained camera positions - Unconstrained scene geometry/topology
Voxel coloring solutions 1. C=2 (silhouettes) Volume intersection [Baumgart 1974] 2. C unconstrained, viewpoint constraints Voxel coloring algorithm [Seitz & Dyer 97] 3. General Case Space carving [Kutulakos & Seitz 98]
Space carving algorithm (Image 1 … Image N) Initialize to a volume V containing the true scene Repeat until convergence: - Choose a voxel on the current surface - Project to visible input images - Carve if not photo-consistent
Which shape do you get? The Photo Hull is the UNION of all photo-consistent scenes in V It is a photo-consistent scene reconstruction Tightest possible bound on the true scene True Scene V Photo Hull V
Space carving algorithm The Basic Algorithm is Unwieldy - Complex update procedure Alternative: Multi-Pass Plane Sweep - Efficient, can use texture-mapping hardware - Converges quickly in practice - Easy to implement
Multi-pass plane sweep Sweep plane in each of 6 principal directions Consider cameras on only one side of plane Repeat until convergence (figure: True Scene, Reconstruction)
Space carving results: African violet Input Image (1 of 45) Reconstruction
Space carving results: hand Input Image (1 of 100) Views of Reconstruction
Properties of space carving Pros Voxel coloring version is easy to implement, fast Photo-consistent results No smoothness prior Cons Bulging No smoothness prior
Next lecture Images, user input, range scans Model Images Image-based modeling Image-based rendering Geometry + Images Geometry + Materials Images + Depth Light field Panorama Kinematics Dynamics Etc. Camera + geometry
Stereo reconstruction Given two or more images of the same scene or object, compute a representation of its shape (known camera viewpoints) How to estimate camera parameters? - where is the camera? - where is it pointing? - what are the internal parameters, e.g. focal length?
Spectrum of IBMR Images, user input, range scans Model Images Image-based modeling Image-based rendering Geometry + Images Geometry + Materials Images + Depth Light field Panorama Kinematics Dynamics Etc. Camera + geometry
How can we estimate the camera parameters?
Camera calibration Augmented pin-hole camera - focal point, orientation - focal length, aspect ratio, center, lens distortion Classical calibration (known 3D) - 3D-to-2D correspondence
Camera and calibration target
Classical camera calibration Known 3D coordinates and 2D coordinates - known 3D points on calibration targets - find corresponding 2D points in image using feature detection algorithm
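Camera parameters Known 3D coords and 2D coords:
$$\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} \sim \underbrace{\begin{pmatrix} s_x & a & u_0 \\ 0 & s_y & v_0 \\ 0 & 0 & 1 \end{pmatrix}}_{\text{Viewport proj.}} \underbrace{\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}}_{\text{Perspective proj.}} \underbrace{\begin{pmatrix} R & t \\ \mathbf{0}^T & 1 \end{pmatrix}}_{\text{View trans.}} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$
Intrinsic camera parameters (5 parameters): $s_x, s_y, a, u_0, v_0$ Extrinsic camera parameters (6 parameters): rotation $R$ (3) and translation $t$ (3)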
Camera matrix Fold intrinsic calibration matrix K and extrinsic pose parameters (R,t) together into a camera matrix M = K [R | t ] (put 1 in lower r.h. corner for 11 d.o.f.)
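A small numerical sketch of M = K [R | t] may help here. The intrinsic values, the pose, and the helper names `camera_matrix` and `project` are made up for illustration; NumPy is assumed to be available.

```python
import numpy as np


def camera_matrix(K, R, t):
    """Return the 3x4 camera matrix M = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])


def project(M, X):
    """Project a 3D point X (length 3) to pixel coordinates (u, v)."""
    x = M @ np.append(X, 1.0)       # homogeneous image point
    return x[:2] / x[2]             # divide by the third coordinate


# Hypothetical camera: 800-pixel focal length, principal point (320, 240),
# no rotation, scene origin 5 units in front of the camera.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

M = camera_matrix(K, R, t)
uv = project(M, np.array([0.5, -0.25, 0.0]))

# M is only defined up to scale; dividing by the lower right-hand entry
# puts a 1 there and leaves 11 free entries.
M_normalized = M / M[2, 3]
```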
Camera matrix calibration Directly estimate 11 unknowns in the M matrix using known 3D points $(X_i, Y_i, Z_i)$ and measured feature positions $(u_i, v_i)$
Camera matrix calibration Linear regression: Bring denominator over, solve set of (over-determined) linear equations. How? Least squares (pseudo-inverse) - 11 unknowns (up to scale) - 2 equations per point (homogeneous coordinates) - 6 points are sufficient
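The linear solve can be sketched as follows, assuming the lower right-hand entry of M is fixed to 1. Bringing the denominator over gives two linear equations per point in the remaining 11 entries, which are then solved by least squares. The function name `calibrate_linear` and the synthetic camera and points are hypothetical; NumPy is assumed.

```python
import numpy as np


def calibrate_linear(points_3d, points_2d):
    """Estimate the 3x4 camera matrix M (with M[2, 3] = 1) by least squares."""
    A, b = [], []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        # u * (m20 X + m21 Y + m22 Z + 1) = m00 X + m01 Y + m02 Z + m03
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        # v * (m20 X + m21 Y + m22 Z + 1) = m10 X + m11 Y + m12 Z + m13
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        b.extend([u, v])
    m, *_ = np.linalg.lstsq(np.array(A, dtype=float),
                            np.array(b, dtype=float), rcond=None)
    return np.append(m, 1.0).reshape(3, 4)


# Synthetic check: project known points with a made-up camera, recover it.
M_true = np.array([[160.0, 0.0, 64.0, 320.0],
                   [0.0, 160.0, 48.0, 240.0],
                   [0.0, 0.0, 0.2, 1.0]])
points_3d = np.array([[0.1, 0.2, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                      [0.3, 0.0, 1.0], [1.0, 1.0, 0.5], [0.5, 1.0, 1.0],
                      [1.0, 0.3, 1.0], [0.2, 0.7, 0.4]])
homog = np.hstack([points_3d, np.ones((len(points_3d), 1))])
proj = homog @ M_true.T
points_2d = proj[:, :2] / proj[:, 2:3]

M_est = calibrate_linear(points_3d, points_2d)
```

The calibration points must not all lie on one plane, otherwise the system is rank deficient. With noisy measurements this estimate is typically used only as a starting point.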
Nonlinear camera calibration Perspective projection: $\mathbf{p} \sim K\,[R \mid T]\,\mathbf{P}$ The 2D coordinates of a point are just a nonlinear function of its 3D coordinates and the camera parameters: $\mathbf{p} = f(K, R, T;\, \mathbf{P})$
Multiple calibration images Find camera parameters which satisfy the constraints from M images, N points: for j=1,…,M for i=1,…,N This can be formulated as a nonlinear optimization problem. Solve the optimization using nonlinear optimization techniques: - Gauss-Newton - Levenberg-Marquardt
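One common way to write this optimization is as a sum of squared reprojection errors. The notation is an assumption made for illustration: $\mathbf{p}_{ij}$ is the measured 2D position of point $i$ in image $j$, $\mathbf{P}_i$ is its known 3D position on the calibration target, and $f$ is the perspective projection with shared intrinsics $K$ and one pose $(R_j, T_j)$ per image.

```latex
\min_{K,\,\{R_j,\,T_j\}_{j=1}^{M}} \;
\sum_{j=1}^{M} \sum_{i=1}^{N}
\left\| \mathbf{p}_{ij} - f\!\left(K, R_j, T_j;\, \mathbf{P}_i\right) \right\|^2
```

Gauss-Newton and Levenberg-Marquardt both minimize this kind of objective by repeatedly linearizing $f$ around the current parameter estimate.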
Nonlinear approach Advantages: - can solve for more than one camera pose at a time - fewer degrees of freedom than linear approach - standard technique in photogrammetry, computer vision, computer graphics - [Tsai 87] also estimates lens distortions Disadvantages: - more complex update rules - need a good initialization (recover K [R | t] from M)
How can we estimate the camera parameters?
Application: camera calibration for sports video [Farin et al.] Images Court model
Calibration from 2D motion Structure from motion (SFM) - track points over a sequence of images - estimate 3D positions and camera positions - calibrate intrinsic camera parameters beforehand Self-calibration: - solve for both intrinsic and extrinsic camera parameters
SFM = Holy Grail of 3D Reconstruction Take movie of object Reconstruct 3D model Would be commercially highly viable
How to get feature correspondences Feature-based approach - good for images - feature detection (corners) - feature matching using RANSAC (epipolar line) Pixel-based approach - good for video sequences - patch-based registration with the Lucas-Kanade algorithm - register features across the entire sequence
Structure from motion Two Principal Solutions Bundle adjustment (nonlinear optimization) Factorization (SVD, through orthographic approximation, affine geometry)
Nonlinear approach for SFM What’s the difference between camera calibration and SFM? - camera calibration: known 3D and 2D - SFM: unknown 3D and known 2D - what’s the 3D-to-2D registration problem?
SFM: bundle adjustment SFM = Nonlinear Least Squares problem Minimize through - Gradient Descent - Conjugate Gradient - Gauss-Newton - Levenberg-Marquardt (common method) Prone to local minima
Count # Constraints vs # Unknowns M camera poses, N points 2MN point constraints 6M+3N unknowns (known intrinsic camera parameters) Suggests: need 2MN ≥ 6M + 3N But: Can we really recover all parameters??? Can’t recover origin, orientation (6 params) Can’t recover scale (1 param) Thus, we need 2MN ≥ 6M + 3N - 7
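The counting argument is easy to check numerically. The sketch below evaluates 2MN ≥ 6M + 3N - 7 and searches for the smallest N that satisfies it for a given M; the function names `sfm_counts` and `min_points` are hypothetical. Note that this is only a necessary counting condition and says nothing about degenerate configurations.

```python
def sfm_counts(m, n):
    """Return (constraints, unknowns) for m camera poses and n points."""
    constraints = 2 * m * n             # one (u, v) pair per point per view
    unknowns = 6 * m + 3 * n - 7        # minus origin, orientation, scale
    return constraints, unknowns


def min_points(m, max_n=1000):
    """Smallest n with 2mn >= 6m + 3n - 7, or None if none up to max_n."""
    if m < 2:
        raise ValueError("structure from motion needs at least two views")
    for n in range(1, max_n + 1):
        constraints, unknowns = sfm_counts(m, n)
        if constraints >= unknowns:
            return n
    return None


two_view = min_points(2)      # 5 points for two views
three_view = min_points(3)    # 4 points for three views
```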
Are we done? No, bundle adjustment has many local minima.
SFM using Factorization Assume an orthographic camera (Image, World) Subtract the mean
SFM using Factorization Stack all the features from the same frame: Stack all the features from all the images: W Factorize the matrix into two matrices using SVD: How to compute the matrix Q?
SFM using Factorization M is the stack of rotation matrices: Orthogonal constraints from rotation matrices
SFM using Factorization Orthogonal constraints from rotation matrices: $QQ^T$: symmetric 3 by 3 matrix How to compute $QQ^T$? Least-squares solution - 4F linear constraints, 9 unknowns (6 independent due to symmetric matrix) How to compute Q from $QQ^T$? SVD again:
SFM using Factorization M is the stack of rotation matrices: Orthogonal constraints from rotation matrices $QQ^T$: symmetric 3 by 3 matrix Computing $QQ^T$ is easy: - 3F linear equations - 6 independent unknowns
SFM using factorization 1. Form the measurement matrix W 2. Decompose W into two matrices (a motion factor and a shape factor) using SVD 3. Compute the matrix Q with least squares and SVD 4. Compute the rotation matrix M and the shape matrix by applying Q and its inverse to the two factors
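The four steps can be sketched with NumPy as below. This is a minimal illustration under stated assumptions, not the implementation behind the results in this lecture: noise-free orthographic data, complete tracks, and a measurement matrix whose first F rows hold x coordinates and last F rows hold y coordinates. The names `factorize`, `M_hat`, `S_hat` and the synthetic scene are hypothetical.

```python
import numpy as np


def factorize(W):
    """Orthographic factorization of a 2F x N measurement matrix W."""
    F = W.shape[0] // 2
    Wc = W - W.mean(axis=1, keepdims=True)           # subtract the mean

    # Step 2: rank-3 decomposition of the centered measurements.
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    root = np.sqrt(s[:3])
    M_hat = U[:, :3] * root                           # 2F x 3 motion factor
    S_hat = root[:, None] * Vt[:3]                    # 3 x N shape factor

    # Step 3: solve for the symmetric matrix L = Q Q^T (6 unknowns) from
    # 3F linear equations: |i_f| = 1, |j_f| = 1, i_f . j_f = 0.
    def coeffs(a, b):
        return [a[0] * b[0], a[0] * b[1] + a[1] * b[0],
                a[0] * b[2] + a[2] * b[0], a[1] * b[1],
                a[1] * b[2] + a[2] * b[1], a[2] * b[2]]

    A, rhs = [], []
    for f in range(F):
        i, j = M_hat[f], M_hat[F + f]
        A += [coeffs(i, i), coeffs(j, j), coeffs(i, j)]
        rhs += [1.0, 1.0, 0.0]
    l, *_ = np.linalg.lstsq(np.array(A), np.array(rhs), rcond=None)
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])

    # Q from Q Q^T via SVD (valid when L is positive definite).
    Uq, sq, _ = np.linalg.svd(L)
    Q = Uq * np.sqrt(sq)

    # Step 4: apply Q and its inverse to the two factors.
    return M_hat @ Q, np.linalg.inv(Q) @ S_hat


# Synthetic scene: 12 random points seen by 5 random orthographic cameras.
rng = np.random.default_rng(0)
F, N = 5, 12
P = rng.normal(size=(3, N))
rows_x, rows_y = [], []
for f in range(F):
    R = np.linalg.qr(rng.normal(size=(3, 3)))[0]      # orthonormal rows
    t = rng.normal(size=2)
    rows_x.append(R[0] @ P + t[0])
    rows_y.append(R[1] @ P + t[1])
W = np.vstack(rows_x + rows_y)

M, S = factorize(W)
```

The recovered motion and shape are defined only up to a global rotation (and reflection), so they are compared to the ground truth through quantities that do not depend on that choice.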
Weak-perspective Projection Factorization also works for weak-perspective projection (scaled orthographic projection) (figure labels: d, $z_0$)
SFM using factorization Bundle adjustment (nonlinear optimization) - works with perspective camera model - works with incomplete data - prone to local minima Factorization: - closed-form solution for weak-perspective camera - simple and efficient - usually needs complete data - becomes complicated for full-perspective camera model Software: Phil Torr’s structure from motion toolkit in MATLAB; Voodoo camera tracker
Results from SFM [Han and Kanade]
All together video - feature detection - feature matching (epipolar geometry) - structure from motion - stereo reconstruction - triangulation - texture mapping
Challenging objects Why are they so difficult to model? How to model them?