Discrete-Continuous Optimization for Large-scale Structure from Motion

Presentation transcript:

Discrete-Continuous Optimization for Large-scale Structure from Motion
David Crandall, School of Informatics and Computing, Indiana University
Andrew Owens, CSAIL, MIT
Noah Snavely and Dan Huttenlocher, Department of Computer Science, Cornell University

SfM from Internet Images
- Recent work has built 3D models from large, unstructured online image collections: [Snavely06], [Li08], [Agarwal09], [Frahm10], Microsoft’s PhotoSynth, …
- SfM is a key part of these reconstruction pipelines

Structure from Motion
[Figure: Cameras 1, 2 and 3 with poses (R1, t1), (R2, t2), (R3, t3) observing 3D points such as p1, p3, p4; objective: minimize f(R, T, P)]
Structure from motion solves the following problem: given a set of images of a static scene with 2D points in correspondence (shown in the figure as color-coded points), find a set of 3D points P and a rotation R and position t for each camera that explain the observed correspondences. In other words, when we project a point into any of the cameras, the reprojection error between the projected and observed 2D points is low. This can be formulated as an optimization problem: find the rotations R, positions t, and 3D point locations P that minimize the sum of squared reprojection errors f. This is a non-linear least squares problem and can be solved with algorithms such as Levenberg-Marquardt. However, because the problem is non-linear, it is susceptible to local minima, so it is important to initialize the parameters carefully. In addition, we need to be able to deal with erroneous correspondences.
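As a rough illustration of the objective f described above, the following sketch sums squared reprojection errors for a simple pinhole camera. The function names, the world-to-camera convention R(p - t), the unit focal length, and the observation layout are all assumptions made for this example; the slides do not specify them.

```python
def project(R, t, p, focal=1.0):
    """Project 3D point p into a pinhole camera.

    Assumed convention (not from the slides): R is a 3x3 world-to-camera
    rotation, t is the camera position, and the camera looks down +z.
    """
    d = [p[k] - t[k] for k in range(3)]
    # Camera-frame coordinates: R * (p - t)
    x, y, z = (sum(R[r][k] * d[k] for k in range(3)) for r in range(3))
    return (focal * x / z, focal * y / z)


def total_reprojection_error(cameras, points, observations):
    """Sum of squared reprojection errors f(R, T, P).

    cameras:      list of (R, t) pairs
    points:       list of 3D points
    observations: list of (camera_index, point_index, (u, v)) tuples
    """
    f = 0.0
    for i, j, (u, v) in observations:
        R, t = cameras[i]
        pu, pv = project(R, t, points[j])
        f += (pu - u) ** 2 + (pv - v) ** 2
    return f
```

A non-linear least squares solver would adjust the entries of `cameras` and `points` to drive this value down; the quality of the starting values determines which local minimum it reaches.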

Reconstruction pipeline
- Feature detection
- Feature matching: find scene points seen by multiple cameras
- Initialization: robustly estimate camera poses and/or scene points
  - Incremental Bundle Adjustment (IBA) [Snavely06], [Li08], [Agarwal09], [Frahm10], …: start with seed model, run bundle adjustment, remove outliers, add another image, repeat
- Bundle adjustment: refine camera poses R, T and scene structure P

Initializing bundle adjustment
- Incremental BA
  - Works very well for many scenes
  - Poor scalability: relies on many repeated runs of bundle adjustment
  - Poor results if a bad seed image set is chosen
  - Drift and bad local minima for some scenes
- Recent work: hierarchical approaches [Gherardi10], [Strecha10]

Batch initialization techniques
- Factorization [Tomasi92], [Sturm96], [Hartley03]
  - Problems with missing data, robustness, …
- Linear techniques [Govindu01], [Martinec07], [Sinha08], …
- L∞ norm approaches [Sim06], [Kahl07], …
  - Sensitive to outliers
- Photo metadata (e.g. geo-tags)
  - Too noisy to be used directly

Our approach
- View SfM as inference over a Markov Random Field, solving for all camera poses at once
  - Vertices are cameras (or points)
  - Both pairwise and unary constraints
- Inference problem: label each image with a camera pose, such that constraints are satisfied

Our approach
- View SfM as inference over a Markov Random Field, solving for all camera poses at once
- Combines discrete and continuous optimization:
  - Discrete optimization (loopy belief propagation) with a robust energy, used to find a good initialization
  - Continuous optimization (bundle adjustment), used to refine
- Related to FusionFlow [Lempitsky08]

Our approach
- View SfM as inference over a Markov Random Field, solving for all camera poses at once
  - Initializes all cameras at once
  - Can avoid local minima
  - Easily parallelized
  - Up to 6x speedups over incremental bundle adjustment for large problems

The MRF model
- Input: set of images with correspondence
- Binary constraints: pairwise camera transformations
- Unary constraints: pose estimates (e.g., GPS, heading info)
- 3D points can also be modeled
Note: tie to normal MRFs.

Constraints on camera pairs
- Compute relative pose between camera pairs using 2-frame SfM [Nister04]
  - Rij: relative 3D rotation
  - tij: relative translation direction

Constraints on camera pairs
- Rij: relative 3D rotation
- tij: relative translation direction
- Find absolute camera poses (Ri, ti) and (Rj, tj) that agree with these pairwise estimates:
  - rotation consistency
  - translation direction consistency
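The consistency equations themselves did not survive in this transcript. The sketch below shows one common way to express them, assuming the convention Rij = Ri^T Rj and tij parallel to Ri^T (tj - ti); that convention, and every function name here, is an assumption for illustration rather than the authors' stated definition.

```python
import math


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def matvec(A, v):
    return [sum(A[r][k] * v[k] for k in range(3)) for r in range(3)]


def predicted_relative_pose(Ri, ti, Rj, tj):
    """Relative rotation and unit translation direction implied by two
    absolute poses (assumed convention: Rij = Ri^T Rj).
    Camera positions must differ, otherwise the direction is undefined."""
    Rij = matmul(transpose(Ri), Rj)
    d = matvec(transpose(Ri), [tj[k] - ti[k] for k in range(3)])
    n = math.sqrt(sum(c * c for c in d))
    return Rij, [c / n for c in d]


def rotation_residual(R_meas, R_pred):
    """Angle in radians of the rotation that takes R_pred to R_meas."""
    D = matmul(transpose(R_pred), R_meas)
    c = (D[0][0] + D[1][1] + D[2][2] - 1.0) / 2.0
    return math.acos(max(-1.0, min(1.0, c)))


def direction_residual(t_meas, t_pred):
    """Angle in radians between two unit direction vectors."""
    c = sum(a * b for a, b in zip(t_meas, t_pred))
    return math.acos(max(-1.0, min(1.0, c)))
```

Both residuals are zero when the absolute poses agree exactly with the two-frame estimate, and they grow as the poses disagree.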

Constraints on camera pairs
- Rij: relative 3D rotation
- tij: relative translation direction
- Define robustified error functions to use as pairwise potentials:
  - rotation consistency
  - translation direction consistency
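The slide does not give the form of the robustified error functions. A truncated quadratic is one common choice and is used here purely as an assumed example; the function names and the threshold value are hypothetical.

```python
def truncated_quadratic(residual, threshold):
    """Quadratic cost for small residuals, capped at threshold**2.

    The cap bounds the influence of any single bad pairwise estimate.
    This particular form is an assumption, not taken from the slides.
    """
    return min(residual * residual, threshold * threshold)


def robust_total(residuals, threshold):
    """Total robust cost over a set of pairwise residuals."""
    return sum(truncated_quadratic(r, threshold) for r in residuals)
```

With a threshold of 0.5, a grossly wrong edge with residual 100 contributes only 0.25 to the total, the same as any residual at or beyond the threshold, so a few outlier matches cannot dominate the energy.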

Prior pose information
- Noisy absolute pose info for some cameras
  - 2D positions from geotags (GPS coordinates)
  - Orientations (tilt & twist angles) from vanishing point detection [Sinha10]

Overall optimization problem
- Given pairwise and unary pose constraints, solve for absolute camera poses simultaneously
- For n cameras, estimate R = (R1, R2, …, Rn) and T = (t1, t2, …, tn) so as to minimize total error over the entire graph:
  - pairwise rotation consistency
  - unary rotation consistency
  - pairwise translation consistency
  - unary translation consistency
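The energy equations are missing from the transcript. One way to write the four terms listed above is sketched below; the symbols (edge set E, vertex set V, robust distances rho, prior terms, weights alpha) and the convention Rij = Ri^T Rj are assumptions for illustration and may differ from the authors' exact formulation.

```latex
% Rotation energy: pairwise consistency plus unary prior (e.g. vanishing points)
E_R(\mathcal{R}) = \sum_{(i,j) \in E} \rho_R\!\left(R_{ij},\; R_i^{\top} R_j\right)
                 + \alpha_R \sum_{i \in V} \rho_R^{\mathrm{prior}}(R_i)

% Translation energy: pairwise direction consistency plus unary prior (e.g. geotags)
E_T(\mathcal{T}) = \sum_{(i,j) \in E} \rho_T\!\left(t_{ij},\; \frac{R_i^{\top}(t_j - t_i)}{\lVert t_j - t_i \rVert}\right)
                 + \alpha_T \sum_{i \in V} \rho_T^{\mathrm{prior}}(t_i)
```

Writing the rotation and translation terms as two separate energies reflects the fact that the translation term depends on the rotations, so the rotations are a natural thing to estimate first.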

Solving the MRF
- Use discrete loopy belief propagation [Pearl88]
  - Up to 1,000,000 nodes (cameras and points)
  - Up to 5,000,000 edges (constraints between cameras and points)
  - 6-dimensional label space for cameras (3-dimensional for points)
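For readers unfamiliar with loopy belief propagation, here is a minimal sketch of a single min-sum message update between two nodes that share a label set. It is a generic textbook form, not the authors' implementation; `minsum_message` and its arguments are hypothetical names.

```python
def minsum_message(unary_i, incoming, pairwise):
    """Min-sum BP message from node i to a neighbour j.

    unary_i:  unary_i[a] is the unary cost of label a at node i
    incoming: messages into i from its neighbours other than j,
              each a list indexed by label
    pairwise: pairwise(a, b) is the cost of assigning a to i and b to j
    """
    L = len(unary_i)
    # Cost of each label at i, excluding what j itself has told i.
    h = [unary_i[a] + sum(m[a] for m in incoming) for a in range(L)]
    # For each label b at j, take the cheapest compatible label a at i.
    return [min(h[a] + pairwise(a, b) for a in range(L)) for b in range(L)]
```

Computed this way, each message costs O(L^2) operations for L labels, which is what makes large label spaces expensive.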

Solving the MRF
- Reduce 6-dimensional label space by…
  - Solving for rotations & translations independently [Martinec07], [Sim06], [Sinha08]
  - Assuming camera twist angles are near 0
  - Initially solving for 2D camera positions
- Speed up BP by…
  - Using a parallel implementation on a cluster
  - Using distance transforms (aka min convolutions) to compute BP messages in O(L) time in the number of labels L (instead of O(L^2)) [Felzenszwalb04]
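To make the O(L) claim concrete, the sketch below computes a min convolution for a truncated linear pairwise cost on a 1D label line using the standard two-pass distance transform. The 1D setting, the linear cost, and the function name are simplifying assumptions for this example; the slides apply the idea to sphere and plane label grids.

```python
def min_convolution_linear(h, slope=1.0, trunc=float("inf")):
    """Return m with m[b] = min over a of h[a] + min(slope * |a - b|, trunc).

    Runs in O(L) for L labels instead of the O(L^2) brute-force minimum.
    """
    L = len(h)
    m = list(h)
    # Forward pass: propagate costs from lower-indexed labels.
    for b in range(1, L):
        m[b] = min(m[b], m[b - 1] + slope)
    # Backward pass: propagate costs from higher-indexed labels.
    for b in range(L - 2, -1, -1):
        m[b] = min(m[b], m[b + 1] + slope)
    # Truncation: no label can cost more than the global minimum plus trunc.
    cap = min(h) + trunc
    return [min(v, cap) for v in m]
```

For a label space of 90,000 entries, the brute-force message needs about 8.1 billion cost evaluations, while the two passes above touch each label only a constant number of times.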

Discrete BP: Rotations
- Parameterize viewing directions as points on unit sphere
- Discretize into 10x10x10 = 1,000 possible labels
- Measure rotational errors as robust Euclidean distances on sphere (to allow use of distance transform)
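The slide gives only the label count, not the grid layout. The sketch below assumes a uniform 10x10x10 grid over the cube [-1, 1]^3 that encloses the unit sphere, with each viewing direction assigned to the cell containing it; the layout and function names are assumptions for illustration.

```python
import math

BINS = 10  # 10 x 10 x 10 = 1,000 labels


def direction_to_label(v):
    """Map a nonzero 3D viewing direction to a grid label in [0, 1000)."""
    n = math.sqrt(sum(c * c for c in v))
    idx = []
    for c in v:
        u = (c / n + 1.0) / 2.0                  # map [-1, 1] to [0, 1]
        idx.append(min(int(u * BINS), BINS - 1))  # clamp the u == 1 edge case
    return (idx[0] * BINS + idx[1]) * BINS + idx[2]


def label_to_direction(label):
    """Return the unit vector through the centre of a grid cell."""
    ix, iy, iz = label // (BINS * BINS), (label // BINS) % BINS, label % BINS
    centre = [(i + 0.5) / BINS * 2.0 - 1.0 for i in (ix, iy, iz)]
    n = math.sqrt(sum(c * c for c in centre))
    return [c / n for c in centre]
```

Under this layout many cells never intersect the sphere, so only a subset of the 1,000 labels is ever a plausible viewing direction. Because the labels lie on a regular Euclidean grid, Euclidean distances between them are compatible with distance transforms.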

Discrete BP: Translations
- Parameterize positions as 2D points in plane
- Use approximation to error function (to allow use of distance transforms)
- Discretize into up to 300 x 300 = 90,000 labels

Our approach
Input: pairwise reconstructions, geotags, vanishing points (VPs)
1. Discrete BP optimization over viewing directions → initial rotations
2. NLLS refinement of rotations → refined rotations
3. Discrete BP optimization over 2D translations → initial translations, 2D points
4. NLLS refinement of 3D translations → initial 3D camera poses, 3D points

SfM on unstructured photo collections
- Feature point detection
- Image matching: find scene points seen by multiple cameras
- Initialization: robustly estimate camera poses and/or scene points
  - Discrete optimization over viewing directions
  - NLLS refinement of rotations
  - Discrete optimization over 2D translations
  - NLLS refinement of 3D translations
- Bundle adjustment: refine camera poses R, T and scene structure P

Experimental results
- Evaluated on several Flickr datasets
  - Downloaded photos (highest resolution available) & geotags (if available)
  - Removed images with missing EXIF focal lengths
  - Removed panoramic and high-twist images
- To build the image match graph
  - Used vocabulary tree and inverted files to find similar photos [Agarwal09]
  - Matched images with nearby geotags, closing triangles and performing successive rounds of query expansion [Frahm10]

Acropolis
- Reconstructed images: 454
- Edges in MRF: 65,097
- Median camera pose difference wrt IBA: 0.1m

Dubrovnik (Croatia)
- Reconstructed images: 6,532
- Edges in MRF: 1,835,488
- Median camera pose difference wrt IBA: 1.0m

Quad
- Reconstructed images: 5,233
- Edges in MRF: 995,734

Ground truth dataset
- Captured 348 photos of Quad from locations surveyed with differential GPS
- Allows for ground truth comparison of reconstructed camera positions
- Median error wrt ground truth: 1.16m for our approach (vs 1.01m for IBA)

Central Rome
- Reconstructed images: 14,754
- Edges in MRF: 2,258,416
- Median camera pose difference wrt IBA: 25.0m
[Figure panels: our result; Incremental Bundle Adjustment [Agarwal09]]

After discrete BP

After bundle adjustment

Running times
- Our results are about as good as or better than IBA, but at a fraction of the computational cost
  - Favorable asymptotic complexity
  - BP is easy to parallelize on a cluster, unlike BA

Limitations and future work
- Inference on large random fields with huge label spaces
  - Hierarchical state spaces?
  - Dynamic, non-uniform state space discretization?
  - Other inference techniques? (sampling, …)
- Solving for rotations and translations simultaneously
- Currently require geotags for some images
  - Using other prior information: orientation sensors, image shadows, etc.
- Larger image collections

Acknowledgements
- Jon Kleinberg
- The Lilly Foundation
- Indiana University Data to Insight Center
- NSF Grant 0964027
- Indiana University UITS supercomputing center
- Cornell Center for Advanced Computing
- Cornell Campus Planning Office
- MIT Lincoln Labs
- Quanta Computer
- Intel

Summary
- A new SfM method for global initialization of cameras and points
- Uses a combination of discrete and continuous optimization to initialize bundle adjustment
- More robust reconstructions for some scenes
- Up to 6x speedups over incremental bundle adjustment for large problems
http://vision.soic.indiana.edu/disco

Running times detail

Comparison with IBA

Dataset details

Quad results

Approximation Visualization Cost as function of tj