Lecture 23: Structure from motion and multi-view stereo
CS6670: Computer Vision, Noah Snavely
Readings: Szeliski, Chapters 7.1–7.4 and 11.6
Announcements:
Project 2b is due on Tuesday by 10:59pm.
Feedback on final project proposals is coming soon.
Structure from motion
[Figure: reconstruction shown from the side and from the top]
Input: images with points in correspondence, p_ij = (u_ij, v_ij)
Output:
structure: a 3D location x_i for each point p_i
motion: camera parameters R_j, t_j (and possibly K_j)
Objective function: minimize reprojection error
Structure from motion: minimize g(X, R, T)
[Figure: cameras (R1,t1), (R2,t2), (R3,t3) observing 3D points X1–X4, with observed projections p_{1,1}, p_{1,2}, p_{1,3}]
Structure from motion solves the following problem: given a set of images of a static scene with 2D points in correspondence (shown here as color-coded points), find a set of 3D points X and a rotation R and position t for each camera that explain the observed correspondences. In other words, when we project a point into any of the cameras, the reprojection error between the projected and observed 2D points should be low. This can be formulated as an optimization problem: find the rotations R, positions t, and 3D point locations X that minimize the sum of squared reprojection errors g. This is a non-linear least squares problem and can be solved with algorithms such as Levenberg-Marquardt. However, because the problem is non-linear, it is susceptible to local minima, so it is important to initialize the parameters of the system carefully. In addition, we need to be able to deal with erroneous correspondences.
Structure from motion
Minimize the sum of squared reprojection errors:

g(X, R, T) = sum over i,j of  w_ij * || P(x_i, R_j, t_j) - (u_ij, v_ij) ||^2

where P(x_i, R_j, t_j) is the predicted image location of point i in image j, (u_ij, v_ij) is the observed image location, and w_ij is an indicator variable: is point i visible in image j?
Minimizing this function is called bundle adjustment. It is optimized using non-linear least squares, e.g. Levenberg-Marquardt.
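To make the objective concrete, here is a minimal sketch of the reprojection residuals and how they could be fed to SciPy's least-squares solver. The parameterization (axis-angle rotations, a single shared intrinsic matrix K, no distortion) and every variable name are illustrative assumptions, not the lecture's implementation; the indicator w_ij is implicit because only visible observations appear in the observation arrays.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(params, n_cams, n_pts, cam_idx, pt_idx, obs, K):
    """Stacked residuals P(x_i, R_j, t_j) - (u_ij, v_ij) over all observations.

    params packs, per camera, an axis-angle rotation (3) and a translation (3),
    followed by the 3D points (3 each). cam_idx / pt_idx are integer arrays that
    give, for each observed 2D point in obs (M x 2), its camera and 3D point.
    """
    cams = params[:n_cams * 6].reshape(n_cams, 6)
    pts = params[n_cams * 6:].reshape(n_pts, 3)
    R = Rotation.from_rotvec(cams[cam_idx, :3])        # one rotation per observation
    x_cam = R.apply(pts[pt_idx]) + cams[cam_idx, 3:]   # points in camera coordinates
    proj = (K @ x_cam.T).T
    proj = proj[:, :2] / proj[:, 2:3]                  # perspective divide -> predicted pixels
    return (proj - obs).ravel()

# Bundle adjustment: jointly refine all camera poses and 3D points from an
# initial guess x0 (e.g. obtained by incremental SfM):
# result = least_squares(reprojection_residuals, x0, method="trf",
#                        args=(n_cams, n_pts, cam_idx, pt_idx, obs, K))
```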
Incremental structure from motion
To help get good initializations for all of the parameters of the system, we reconstruct the scene incrementally, starting from two photographs and the points they observe.
We then add several photos at a time to the reconstruction, refine the model, and repeat until no more photos match any points in the scene.
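A rough sketch of the two building blocks of this incremental loop, written with OpenCV primitives: bootstrapping from two views via an essential matrix, and registering an additional photo with PnP. The single shared calibration matrix K and the omission of outlier bookkeeping and repeated bundle adjustment are simplifications for illustration; this is not the system shown in the lecture.

```python
import numpy as np
import cv2

def initialize_two_view(pts1, pts2, K):
    """Bootstrap the reconstruction from two images with matched 2D points (N x 2)."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])     # first camera at the origin
    P2 = K @ np.hstack([R, t])
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)   # 4 x N homogeneous points
    return R, t, (X_h[:3] / X_h[3]).T                     # initial structure

def register_new_camera(pts3d, pts2d, K):
    """Add one more photo: estimate its pose from 2D-3D matches (then re-run bundle adjustment)."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec
```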
Photo Explorer
Demo
Why is this hard?
Extensions to SfM
Can also solve for intrinsic parameters (focal length, radial distortion, etc.)
Can use a more robust function than squared error, to avoid fitting to outliers (see the sketch below)
For more information, see: Triggs et al., "Bundle Adjustment – A Modern Synthesis", Vision Algorithms 2000.
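As a small illustration of the robust-error idea, SciPy's least_squares accepts a robust loss directly. The call below reuses the hypothetical reprojection_residuals and parameter names from the earlier bundle adjustment sketch and only shows where the robustifier plugs in.

```python
from scipy.optimize import least_squares

# Replace the plain squared error with a Huber-style robust loss so that a few
# bad correspondences (outliers) do not dominate the fit. f_scale sets the
# residual magnitude (in pixels) beyond which the loss grows only linearly.
# result = least_squares(reprojection_residuals, x0, method="trf",
#                        loss="huber", f_scale=2.0,
#                        args=(n_cams, n_pts, cam_idx, pt_idx, obs, K))
```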
Questions?
Can we reconstruct entire cities?
Gigantic matching problem
1,000,000 images means roughly 500,000,000,000 image pairs
Matching all of these on a 1,000-node cluster would take more than a year, even if we match 10,000 pairs every second, and involves TBs of data
The vast majority (>99%) of image pairs do not match
There are better ways of finding matching images (more on this later)
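A quick back-of-the-envelope check of those numbers, treating the 10,000 matches per second as the total cluster-wide throughput (an assumption about how the slide's figure is meant):

```python
n_images = 1_000_000
n_pairs = n_images * (n_images - 1) // 2          # ~5 x 10^11 unordered image pairs
pairs_per_second = 10_000                         # assumed cluster-wide matching rate
years = n_pairs / pairs_per_second / (3600 * 24 * 365)
print(f"{n_pairs:,} pairs -> {years:.1f} years")  # roughly 1.6 years
```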
Gigantic SfM problem
Largest problem size we've seen:
15,000 cameras, 4 million 3D points
more than 12 million parameters, more than 25 million equations
A huge optimization problem that requires sparse least squares techniques
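Sparse least squares works here because each 2D observation constrains only one camera and one 3D point, so the Jacobian of the reprojection residuals is mostly zeros. Below is a sketch of building that sparsity pattern for the SciPy-based bundle adjustment shown earlier; the 6-parameter camera model and the index arrays are assumptions carried over from that sketch.

```python
import numpy as np
from scipy.sparse import lil_matrix

def ba_jacobian_sparsity(n_cams, n_pts, cam_idx, pt_idx):
    """Each observation contributes 2 residual rows that depend only on its
    camera's 6 pose parameters and its 3D point's 3 coordinates.
    cam_idx and pt_idx are numpy integer arrays, one entry per observation."""
    m = 2 * len(cam_idx)                      # residual rows (u and v per observation)
    n = 6 * n_cams + 3 * n_pts                # unknowns: camera poses + points
    A = lil_matrix((m, n), dtype=int)
    obs = np.arange(len(cam_idx))
    for k in range(6):                        # camera-pose columns
        A[2 * obs, 6 * cam_idx + k] = 1
        A[2 * obs + 1, 6 * cam_idx + k] = 1
    for k in range(3):                        # 3D-point columns
        A[2 * obs, 6 * n_cams + 3 * pt_idx + k] = 1
        A[2 * obs + 1, 6 * n_cams + 3 * pt_idx + k] = 1
    return A  # pass as jac_sparsity= to scipy.optimize.least_squares(method="trf")
```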
Building Rome in a Day: St. Peter's Basilica, Colosseum, Trevi Fountain
Rome, Italy. Reconstructed from 150,000 images in 21 hours on 496 machines.
Dubrovnik, Croatia. 4,619 images (out of an initial 57,845).
Total reconstruction time: 23 hours. Number of cores: 352.
San Marco Square and environs, Venice. 14,079 photos (out of an initial 250,000).
Total reconstruction time: 3 days. Number of cores: 496.
Multi-view stereo
[Figure: two-view stereo vs. multi-view stereo]
Multi-view Stereo: Point Grey's Bumblebee XB3, Point Grey's ProFusion 25, CMU's 3D Room
Multi-view Stereo
Input: calibrated images from several viewpoints
Output: 3D object model
(Figures by Carlos Hernandez)
A timeline of multi-view stereo methods:
Fua, 1995
Seitz & Dyer, 1997
Narayanan, Rander & Kanade, 1998
Faugeras & Keriven, 1998
Hernandez & Schmitt, 2004
Pons, Keriven & Faugeras, 2005
Furukawa & Ponce, 2006
Goesele et al., 2007
Stereo: basic idea
[Plot: matching error as a function of depth along the viewing ray]
Choosing the stereo baseline
[Figure: small vs. large baseline; all of the marked points project to the same pair of pixels (within the width of a pixel)]
What's the optimal baseline?
Too small: large depth error (quantified in the sketch below)
Too large: difficult search problem
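The "large depth error" case can be quantified from the usual triangulation relation z = f*b/d: a disparity error of one pixel produces a depth error of roughly z^2 / (f*b), so shrinking the baseline b inflates the depth error proportionally. A tiny illustration with made-up numbers:

```python
def depth_error(z, f_px, baseline, disparity_error_px=1.0):
    """Approximate depth error implied by z = f*b/d:  dz ~= z^2 * dd / (f * b)."""
    return z ** 2 * disparity_error_px / (f_px * baseline)

f_px = 1000.0                      # focal length in pixels (assumed)
for b in (0.05, 0.5):              # small vs. large baseline, in meters (assumed)
    print(b, depth_error(z=10.0, f_px=f_px, baseline=b))
# at 10 m depth: 0.05 m baseline -> 2.0 m error, 0.5 m baseline -> 0.2 m error
```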
The Effect of Baseline on Depth Estimation
35
z width of a pixel pixel matching score width of a pixel z
37
Multibaseline Stereo
Basic approach:
Choose a reference view
Use your favorite stereo algorithm, BUT replace the two-view SSD with the SSSD (sum of SSDs) computed over all baselines (see the sketch below)
Limitations:
Only gives a depth map (not an "object model")
Won't work for widely distributed views: visibility!
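Here is a minimal plane-sweep sketch of that idea: for each candidate depth, warp every non-reference image into the reference view through the fronto-parallel plane at that depth, accumulate windowed SSDs into an SSSD cost volume, and take the per-pixel minimum. Grayscale images, a shared intrinsic matrix K, and poses given as reference-to-other (R, t) transforms are assumptions; this is not the specific algorithm referenced on the slide.

```python
import numpy as np
import cv2

def sssd_depth_map(ref, others, K, poses, depths, win=5):
    """Plane-sweep multibaseline stereo. depths is a 1D numpy array of candidate
    depths; poses[j] = (R, t) maps reference-frame coordinates into camera j."""
    h, w = ref.shape
    ref_f = ref.astype(np.float32)
    cost = np.zeros((len(depths), h, w), dtype=np.float32)
    n = np.array([[0.0, 0.0, 1.0]])                 # normal of the fronto-parallel plane
    K_inv = np.linalg.inv(K)
    for d_i, d in enumerate(depths):
        for img, (R, t) in zip(others, poses):
            # Homography induced by the plane z = d in the reference frame:
            # x_other ~ K (R + t n^T / d) K^{-1} x_ref
            H = K @ (R + t.reshape(3, 1) @ n / d) @ K_inv
            warped = cv2.warpPerspective(img, H, (w, h),
                                         flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
            diff = warped.astype(np.float32) - ref_f
            # Windowed SSD: box-filtered squared differences, summed over views (SSSD).
            cost[d_i] += cv2.boxFilter(diff * diff, -1, (win, win), normalize=False)
    return depths[np.argmin(cost, axis=0)]          # per-pixel SSSD-minimizing depth
```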
Problem: visibility
Some solutions:
Match only nearby photos [Narayanan 98]
Use NCC instead of SSD, and ignore views whose NCC score falls below a threshold [Hernandez & Schmitt 03]
Popular matching scores
SSD (sum of squared differences): C_SSD(W_1, W_2) = sum over i,j of (W_1(i,j) - W_2(i,j))^2
NCC (normalized cross-correlation): C_NCC(W_1, W_2) = sum over i,j of W~_1(i,j) * W~_2(i,j),
where each W~ is the window W with its mean subtracted and then scaled to unit norm.
What advantages might NCC have? (See the example below.)
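The key advantage is that subtracting the mean and normalizing makes NCC invariant to affine intensity changes (gain and bias) between views, e.g. exposure or lighting differences. A small sketch; the patch values are random and purely for illustration:

```python
import numpy as np

def ssd(w1, w2):
    """Sum of squared differences between two equally sized patches."""
    return np.sum((w1.astype(np.float64) - w2.astype(np.float64)) ** 2)

def ncc(w1, w2, eps=1e-8):
    """Normalized cross-correlation: dot product of mean-subtracted, unit-norm patches."""
    a = w1.astype(np.float64) - w1.mean()
    b = w2.astype(np.float64) - w2.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

w = np.random.rand(7, 7)
w_bright = 2.0 * w + 0.3          # the same patch under a gain/offset change
print(ssd(w, w_bright))           # large, even though the patches "match"
print(ncc(w, w_bright))           # ~1.0: NCC ignores the intensity change
```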
Questions?