Lecture 23: Structure from motion and multi-view stereo
CS6670: Computer Vision, Noah Snavely
Readings: Szeliski, Chapters 7.1–7.4 and 11.6
Announcements:
Project 2b is due on Tuesday by 10:59pm.
Feedback on final project proposals is coming soon.
Structure from motion
[Figure: reconstruction shown from the side and from the top]
Input: images with points in correspondence, p_ij = (u_ij, v_ij)
Output:
structure: a 3D location x_i for each point p_i
motion: camera parameters R_j, t_j (and possibly K_j)
Objective function: minimize reprojection error
Structure from motion: minimize g(X, R, T)
[Figure: cameras (R1,t1), (R2,t2), (R3,t3) observing 3D points X1–X4, with observed projections p_{1,1}, p_{1,2}, p_{1,3}]
Structure from motion solves the following problem: given a set of images of a static scene with 2D points in correspondence (shown here as color-coded points), find a set of 3D points X and a rotation R and position t for each camera that explain the observed correspondences. In other words, when we project a point into any of the cameras, the reprojection error between the projected and observed 2D points should be low. This can be formulated as an optimization problem: find the rotations R, positions t, and 3D point locations X that minimize the sum of squared reprojection errors g. This is a non-linear least squares problem and can be solved with algorithms such as Levenberg-Marquardt. However, because the problem is non-linear, it is susceptible to local minima, so it is important to initialize the parameters of the system carefully. In addition, we need to be able to deal with erroneous correspondences.
Structure from motion
Minimize the sum of squared reprojection errors:

g(X, R, T) = sum over i,j of  w_ij * || P(x_i, R_j, t_j) - (u_ij, v_ij) ||^2

where P(x_i, R_j, t_j) is the predicted image location of point i in image j, (u_ij, v_ij) is the observed image location, and w_ij is an indicator variable: is point i visible in image j?
Minimizing this function is called bundle adjustment. It is optimized using non-linear least squares, e.g. Levenberg-Marquardt.
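To make the objective concrete, here is a minimal sketch of the reprojection residuals and how they could be fed to SciPy's least-squares solver. The parameterization (axis-angle rotations, a single shared intrinsic matrix K, no distortion) and every variable name are illustrative assumptions, not the lecture's implementation; the indicator w_ij is implicit because only visible observations appear in the observation arrays.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(params, n_cams, n_pts, cam_idx, pt_idx, obs, K):
    """Stacked residuals P(x_i, R_j, t_j) - (u_ij, v_ij) over all observations.

    params packs, per camera, an axis-angle rotation (3) and a translation (3),
    followed by the 3D points (3 each). cam_idx / pt_idx are integer arrays that
    give, for each observed 2D point in obs (M x 2), its camera and 3D point.
    """
    cams = params[:n_cams * 6].reshape(n_cams, 6)
    pts = params[n_cams * 6:].reshape(n_pts, 3)
    R = Rotation.from_rotvec(cams[cam_idx, :3])        # one rotation per observation
    x_cam = R.apply(pts[pt_idx]) + cams[cam_idx, 3:]   # points in camera coordinates
    proj = (K @ x_cam.T).T
    proj = proj[:, :2] / proj[:, 2:3]                  # perspective divide -> predicted pixels
    return (proj - obs).ravel()

# Bundle adjustment: jointly refine all camera poses and 3D points from an
# initial guess x0 (e.g. obtained by incremental SfM):
# result = least_squares(reprojection_residuals, x0, method="trf",
#                        args=(n_cams, n_pts, cam_idx, pt_idx, obs, K))
```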
Incremental structure from motion
To help get good initializations for all of the parameters of the system, we reconstruct the scene incrementally, starting from two photographs and the points they observe.
We then add several photos at a time to the reconstruction, refine the model, and repeat until no more photos match any points in the scene.
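A rough sketch of the two building blocks of this incremental loop, written with OpenCV primitives: bootstrapping from two views via an essential matrix, and registering an additional photo with PnP. The single shared calibration matrix K and the omission of outlier bookkeeping and repeated bundle adjustment are simplifications for illustration; this is not the system shown in the lecture.

```python
import numpy as np
import cv2

def initialize_two_view(pts1, pts2, K):
    """Bootstrap the reconstruction from two images with matched 2D points (N x 2)."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])     # first camera at the origin
    P2 = K @ np.hstack([R, t])
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)   # 4 x N homogeneous points
    return R, t, (X_h[:3] / X_h[3]).T                     # initial structure

def register_new_camera(pts3d, pts2d, K):
    """Add one more photo: estimate its pose from 2D-3D matches (then re-run bundle adjustment)."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec
```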
Photo Explorer
Demo
Why is this hard?
Extensions to SfM
Can also solve for intrinsic parameters (focal length, radial distortion, etc.)
Can use a more robust function than squared error, to avoid fitting to outliers (see the sketch below)
For more information, see: Triggs et al., "Bundle Adjustment – A Modern Synthesis", Vision Algorithms 2000.
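As a small illustration of the robust-error idea, SciPy's least_squares accepts a robust loss directly. The call below reuses the hypothetical reprojection_residuals and parameter names from the earlier bundle adjustment sketch and only shows where the robustifier plugs in.

```python
from scipy.optimize import least_squares

# Replace the plain squared error with a Huber-style robust loss so that a few
# bad correspondences (outliers) do not dominate the fit. f_scale sets the
# residual magnitude (in pixels) beyond which the loss grows only linearly.
# result = least_squares(reprojection_residuals, x0, method="trf",
#                        loss="huber", f_scale=2.0,
#                        args=(n_cams, n_pts, cam_idx, pt_idx, obs, K))
```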
Questions?
Can we reconstruct entire cities?
Gigantic matching problem
1,000,000 images means roughly 500,000,000,000 image pairs
Matching all of these on a 1,000-node cluster would take more than a year, even if we match 10,000 pairs every second, and involves TBs of data
The vast majority (>99%) of image pairs do not match
There are better ways of finding matching images (more on this later)
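A quick back-of-the-envelope check of those numbers, treating the 10,000 matches per second as the total cluster-wide throughput (an assumption about how the slide's figure is meant):

```python
n_images = 1_000_000
n_pairs = n_images * (n_images - 1) // 2          # ~5 x 10^11 unordered image pairs
pairs_per_second = 10_000                         # assumed cluster-wide matching rate
years = n_pairs / pairs_per_second / (3600 * 24 * 365)
print(f"{n_pairs:,} pairs -> {years:.1f} years")  # roughly 1.6 years
```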
Gigantic SfM problem
Largest problem size we've seen:
15,000 cameras, 4 million 3D points
more than 12 million parameters, more than 25 million equations
A huge optimization problem that requires sparse least squares techniques
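Sparse least squares works here because each 2D observation constrains only one camera and one 3D point, so the Jacobian of the reprojection residuals is mostly zeros. Below is a sketch of building that sparsity pattern for the SciPy-based bundle adjustment shown earlier; the 6-parameter camera model and the index arrays are assumptions carried over from that sketch.

```python
import numpy as np
from scipy.sparse import lil_matrix

def ba_jacobian_sparsity(n_cams, n_pts, cam_idx, pt_idx):
    """Each observation contributes 2 residual rows that depend only on its
    camera's 6 pose parameters and its 3D point's 3 coordinates.
    cam_idx and pt_idx are numpy integer arrays, one entry per observation."""
    m = 2 * len(cam_idx)                      # residual rows (u and v per observation)
    n = 6 * n_cams + 3 * n_pts                # unknowns: camera poses + points
    A = lil_matrix((m, n), dtype=int)
    obs = np.arange(len(cam_idx))
    for k in range(6):                        # camera-pose columns
        A[2 * obs, 6 * cam_idx + k] = 1
        A[2 * obs + 1, 6 * cam_idx + k] = 1
    for k in range(3):                        # 3D-point columns
        A[2 * obs, 6 * n_cams + 3 * pt_idx + k] = 1
        A[2 * obs + 1, 6 * n_cams + 3 * pt_idx + k] = 1
    return A  # pass as jac_sparsity= to scipy.optimize.least_squares(method="trf")
```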
Building Rome in a Day: St. Peter's Basilica, Colosseum, Trevi Fountain
Rome, Italy. Reconstructed from 150,000 images in 21 hours on 496 machines.
Dubrovnik, Croatia. 4,619 images (out of an initial 57,845).
Total reconstruction time: 23 hours. Number of cores: 352.
San Marco Square and environs, Venice. 14,079 photos (out of an initial 250,000).
Total reconstruction time: 3 days. Number of cores: 496.
Multi-view stereo
[Figure: two-view stereo vs. multi-view stereo]
Multi-view Stereo: Point Grey's Bumblebee XB3, Point Grey's ProFusion 25, CMU's 3D Room
Multi-view Stereo
Input: calibrated images from several viewpoints
Output: 3D object model
(Figures by Carlos Hernandez)
A timeline of multi-view stereo methods:
Fua, 1995
Seitz & Dyer, 1997
Narayanan, Rander & Kanade, 1998
Faugeras & Keriven, 1998
Hernandez & Schmitt, 2004
Pons, Keriven & Faugeras, 2005
Furukawa & Ponce, 2006
Goesele et al., 2007
Stereo: basic idea
[Plot: matching error as a function of depth along the viewing ray]
Choosing the stereo baseline
[Figure: small vs. large baseline; all of the marked points project to the same pair of pixels (within the width of a pixel)]
What's the optimal baseline?
Too small: large depth error (quantified in the sketch below)
Too large: difficult search problem
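The "large depth error" case can be quantified from the usual triangulation relation z = f*b/d: a disparity error of one pixel produces a depth error of roughly z^2 / (f*b), so shrinking the baseline b inflates the depth error proportionally. A tiny illustration with made-up numbers:

```python
def depth_error(z, f_px, baseline, disparity_error_px=1.0):
    """Approximate depth error implied by z = f*b/d:  dz ~= z^2 * dd / (f * b)."""
    return z ** 2 * disparity_error_px / (f_px * baseline)

f_px = 1000.0                      # focal length in pixels (assumed)
for b in (0.05, 0.5):              # small vs. large baseline, in meters (assumed)
    print(b, depth_error(z=10.0, f_px=f_px, baseline=b))
# at 10 m depth: 0.05 m baseline -> 2.0 m error, 0.5 m baseline -> 0.2 m error
```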
The Effect of Baseline on Depth Estimation
35
z width of a pixel pixel matching score width of a pixel z
37
Multibaseline Stereo
Basic approach:
Choose a reference view
Use your favorite stereo algorithm, BUT replace the two-view SSD with the SSSD (sum of SSDs) computed over all baselines (see the sketch below)
Limitations:
Only gives a depth map (not an "object model")
Won't work for widely distributed views: visibility!
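Here is a minimal plane-sweep sketch of that idea: for each candidate depth, warp every non-reference image into the reference view through the fronto-parallel plane at that depth, accumulate windowed SSDs into an SSSD cost volume, and take the per-pixel minimum. Grayscale images, a shared intrinsic matrix K, and poses given as reference-to-other (R, t) transforms are assumptions; this is not the specific algorithm referenced on the slide.

```python
import numpy as np
import cv2

def sssd_depth_map(ref, others, K, poses, depths, win=5):
    """Plane-sweep multibaseline stereo. depths is a 1D numpy array of candidate
    depths; poses[j] = (R, t) maps reference-frame coordinates into camera j."""
    h, w = ref.shape
    ref_f = ref.astype(np.float32)
    cost = np.zeros((len(depths), h, w), dtype=np.float32)
    n = np.array([[0.0, 0.0, 1.0]])                 # normal of the fronto-parallel plane
    K_inv = np.linalg.inv(K)
    for d_i, d in enumerate(depths):
        for img, (R, t) in zip(others, poses):
            # Homography induced by the plane z = d in the reference frame:
            # x_other ~ K (R + t n^T / d) K^{-1} x_ref
            H = K @ (R + t.reshape(3, 1) @ n / d) @ K_inv
            warped = cv2.warpPerspective(img, H, (w, h),
                                         flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
            diff = warped.astype(np.float32) - ref_f
            # Windowed SSD: box-filtered squared differences, summed over views (SSSD).
            cost[d_i] += cv2.boxFilter(diff * diff, -1, (win, win), normalize=False)
    return depths[np.argmin(cost, axis=0)]          # per-pixel SSSD-minimizing depth
```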
Problem: visibility
Some solutions:
Match only nearby photos [Narayanan 98]
Use NCC instead of SSD, and ignore views whose NCC score falls below a threshold [Hernandez & Schmitt 03]
Popular matching scores
SSD (sum of squared differences): C_SSD(W_1, W_2) = sum over i,j of (W_1(i,j) - W_2(i,j))^2
NCC (normalized cross-correlation): C_NCC(W_1, W_2) = sum over i,j of W~_1(i,j) * W~_2(i,j),
where each W~ is the window W with its mean subtracted and then scaled to unit norm.
What advantages might NCC have? (See the example below.)
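The key advantage is that subtracting the mean and normalizing makes NCC invariant to affine intensity changes (gain and bias) between views, e.g. exposure or lighting differences. A small sketch; the patch values are random and purely for illustration:

```python
import numpy as np

def ssd(w1, w2):
    """Sum of squared differences between two equally sized patches."""
    return np.sum((w1.astype(np.float64) - w2.astype(np.float64)) ** 2)

def ncc(w1, w2, eps=1e-8):
    """Normalized cross-correlation: dot product of mean-subtracted, unit-norm patches."""
    a = w1.astype(np.float64) - w1.mean()
    b = w2.astype(np.float64) - w2.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

w = np.random.rand(7, 7)
w_bright = 2.0 * w + 0.3          # the same patch under a gain/offset change
print(ssd(w, w_bright))           # large, even though the patches "match"
print(ncc(w, w_bright))           # ~1.0: NCC ignores the intensity change
```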
Questions?