Discrete-Continuous Optimization for Large-scale Structure from Motion David Crandall, Andrew Owens, Noah Snavely, Dan Huttenlocher Presented by: Rahul.

Slides:



Advertisements
Similar presentations
Structure from motion.
Advertisements

Wheres Waldo: Matching People in Images of Crowds Rahul GargDeva RamananSteven M. Seitz Noah Snavely Problem Definition University of Washington University.
The fundamental matrix F
Lecture 11: Two-view geometry
For Internal Use Only. © CT T IN EM. All rights reserved. 3D Reconstruction Using Aerial Images A Dense Structure from Motion pipeline Ramakrishna Vedantam.
Building Rome in a Day Sameer Agarwal1 Noah Snavely2 Ian Simon1 Steven M. Seitz1 Richard Szeliski3 1University of Washington 2Cornell University 3Microsoft.
MASKS © 2004 Invitation to 3D vision Lecture 7 Step-by-Step Model Buidling.
Recent work in image-based rendering from unstructured image collections and remaining challenges Sudipta N. Sinha Microsoft Research, Redmond, USA.
Two-View Geometry CS Sastry and Yang
A Global Linear Method for Camera Pose Registration
Structure from motion.
Last Time Pinhole camera model, projection
Lecture 23: Structure from motion and multi-view stereo
Lecture 11: Structure from motion, part 2 CS6670: Computer Vision Noah Snavely.
Structure from motion. Multiple-view geometry questions Scene geometry (structure): Given 2D point matches in two or more images, where are the corresponding.
Automatic Panoramic Image Stitching using Local Features Matthew Brown and David Lowe, University of British Columbia.
Stanford CS223B Computer Vision, Winter 2006 Lecture 8 Structure From Motion Professor Sebastian Thrun CAs: Dan Maynes-Aminzade, Mitul Saha, Greg Corrado.
Lecture 11: Structure from motion CS6670: Computer Vision Noah Snavely.
Lecture 22: Structure from motion CS6670: Computer Vision Noah Snavely.
The plan for today Camera matrix
CS664 Lecture #19: Layers, RANSAC, panoramas, epipolar geometry Some material taken from:  David Lowe, UBC  Jiri Matas, CMP Prague
Lecture 19: Optical flow CS6670: Computer Vision Noah Snavely
Global Alignment and Structure from Motion Computer Vision CSE455, Winter 2008 Noah Snavely.
CSCE 641 Computer Graphics: Image-based Modeling (Cont.) Jinxiang Chai.
Lecture 12: Structure from motion CS6670: Computer Vision Noah Snavely.
Accurate, Dense and Robust Multi-View Stereopsis Yasutaka Furukawa and Jean Ponce Presented by Rahul Garg and Ryan Kaminsky.
Review: Binocular stereo If necessary, rectify the two stereo images to transform epipolar lines into scanlines For each pixel x in the first image Find.
Matthew Brown University of British Columbia (prev.) Microsoft Research [ Collaborators: † Simon Winder, *Gang Hua, † Rick Szeliski † =MS Research, *=MS.
Internet-scale Imagery for Graphics and Vision James Hays cs195g Computational Photography Brown University, Spring 2010.
Mosaics CSE 455, Winter 2010 February 8, 2010 Neel Joshi, CSE 455, Winter Announcements  The Midterm went out Friday  See to the class.
776 Computer Vision Jan-Michael Frahm, Enrique Dunn Spring 2013.
Automatic Camera Calibration
Linearizing (assuming small (u,v)): Brightness Constancy Equation: The Brightness Constraint Where:),(),(yxJyxII t  Each pixel provides 1 equation in.
The Brightness Constraint
Course 12 Calibration. 1.Introduction In theoretic discussions, we have assumed: Camera is located at the origin of coordinate system of scene.
ALIGNMENT OF 3D ARTICULATE SHAPES. Articulated registration Input: Two or more 3d point clouds (possibly with connectivity information) of an articulated.
Image stitching Digital Visual Effects Yung-Yu Chuang with slides by Richard Szeliski, Steve Seitz, Matthew Brown and Vaclav Hlavac.
CSCE 643 Computer Vision: Structure from Motion
Correspondence-Free Determination of the Affine Fundamental Matrix (Tue) Young Ki Baik, Computer Vision Lab.
Large Scale Discovery of Spatially Related Images Ondřej Chum and Jiří Matas Center for Machine Perception Czech Technical University Prague.
Problems in large-scale computer vision David Crandall School of Informatics and Computing Indiana University.
© 2005 Martin Bujňák, Martin Bujňák Supervisor : RNDr.
Peripheral drift illusion. Multiple views Hartley and Zisserman Lowe stereo vision structure from motion optical flow.
IIIT HYDERABAD Image-based walkthroughs from partial and incremental scene reconstructions Kumar Srijan Syed Ahsan Ishtiaque C. V. Jawahar Center for Visual.
Scene Reconstruction Seminar presented by Anton Jigalin Advanced Topics in Computer Vision ( )
776 Computer Vision Jan-Michael Frahm & Enrique Dunn Spring 2013.
Linearizing (assuming small (u,v)): Brightness Constancy Equation: The Brightness Constraint Where:),(),(yxJyxII t  Each pixel provides 1 equation in.
Announcements No midterm Project 3 will be done in pairs same partners as for project 2.
Internet-scale Imagery for Graphics and Vision James Hays cs129 Computational Photography Brown University, Spring 2011.
Tracking Hands with Distance Transforms Dave Bargeron Noah Snavely.
Lecture 22: Structure from motion CS6670: Computer Vision Noah Snavely.
CSE 185 Introduction to Computer Vision Stereo 2.
IIIT HYDERABAD Techniques for Organization and Visualization of Community Photo Collections Kumar Srijan Faculty Advisor : Dr. C.V. Jawahar.
Think-Pair-Share What visual or physiological cues help us to perceive 3D shape and depth?
Discrete-Continuous Optimization for Large-scale Structure from Motion
Line Fitting James Hayes.
Depth from disparity (x´,y´)=(x+D(x,y), y)
The Brightness Constraint
Modeling the world with photos
Structure from motion Input: Output: (Tomasi and Kanade)
The Brightness Constraint
The Brightness Constraint
Noah Snavely.
Lecture 23: Structure from motion 2
Multi-view geometry.
Structure from motion.
Automatic Panoramic Image Stitching using Invariant Features
Structure from motion Input: Output: (Tomasi and Kanade)
Lecture 15: Structure from motion
Presentation transcript:

Discrete-Continuous Optimization for Large-scale Structure from Motion David Crandall, Andrew Owens, Noah Snavely, Dan Huttenlocher Presented by: Rahul Garg

SfM from Internet Images Recent work has built 3D models from large, unstructured online image collections – [Snavely06], [Li08], [Agarwal09], [Frahm10], Microsoft’s PhotoSynth, … SfM is a key part of these reconstruction pipelines [Slide from authors]

Camera 1 Camera 2 Camera 3 R 1,t 1 R 2,t 2 R 3,t 3 p1p1 p4p4 p3p3 p2p2 p5p5 p6p6 p7p7 minimize f (R, T, P)f (R, T, P) Structure from Motion [Slide from authors]

Feature detection Bundle adjustment Refine camera poses R, T and scene structure P Bundle adjustment Refine camera poses R, T and scene structure P Feature matching Find scene points seen by multiple cameras Feature matching Find scene points seen by multiple cameras Incremental Bundle Adjustment (IBA) Reconstruction pipeline [Slide from authors]

How IBA Works Image graph with pairwise point correspondences

How IBA Works R 1, t 1 R 2, t 2 Start with a seed pair Constraints from correspondences

How IBA works R 1, t 1 R 2, t 2 Add new cameras to the seed R 3, t 3

How IBA works R 1, t 1 R 2, t 2 Add new cameras to the seed R 3, t 3 (x,y,z)

How IBA works R 1, t 1 R 2, t 2 Also calculate 3D positions of points that cameras see R 3, t 3 (x,y,z)

How IBA works R 1, t 1 R 2, t 2 Wiggle solution periodically to get a better solution R 3, t 3 (x,y,z)

How IBA works R 1, t 1 R 2, t 2 Wiggle solution periodically to get a better solution R 3, t 3 (x,y,z)

How IBA works R 1, t 1 R 2, t 2 Wiggle (Bundle adjust) solution periodically to get a better solution R 3, t 3 (x,y,z)

How IBA works R 1, t 1 R 2, t 2 Keep adding more cameras R 3, t 3 (x,y,z) R 4, t 4

Problems with IBA R 1, t 1 R 2, t 2 Incremental, prone to drifts and local minima R 3, t 3 (x,y,z) R 4, t 4

Problems with IBA R 1, t 1 R 2, t 2 Incremental, prone to drifts and local minima R 3, t 3 (x,y,z) R 4, t 4

Problems with IBA R 1, t 1 R 2, t 2 No easy way to incorporate priors (GPS info, etc.) R 3, t 3 (x,y,z) R 4, t 4 t 4 from GPS R 4 from Vanishing Points

Initializing bundle adjustment Incremental BA Works very well for many scenes Poor scalability, much use of bundle adjustment Poor results if a bad seed image set is chosen Drift and bad local minima for some scenes No way to add priors [Slide from authors]

Our Approach R 1, t 1 R 2, t 2 Solve for everything simultaneously R 3, t 3 (x,y,z) R 4, t 4

Our Approach R 1, t 1 R 2, t 2 R 3, t 3 (x,y,z) R 4, t 4 Pose the problem as an MRF Nodes – Cameras and 3D Points Labels – Discretized Pose and 3D locations

The MRF model binary constraints: pairwise camera transformations unary constraints: pose estimates (e.g., GPS, heading info) 3D points can also be modeled Input: set of images with correspondence [Slide from authors]

Making MRF Tractable Label Space: 3 unknowns for rotation 3 unknowns for translation Dimensionality of State space: 6

Making MRF Tractable First solve for rotation Dimensionality of State space: Label Space: 3 unknowns for rotation 3 unknowns for translation

Making MRF Tractable First solve for rotation Assume twist = 0 Dimensionality of State space: Label Space: 2 unknowns for rotation 3 unknowns for translation

Making MRF Tractable First solve for rotation Assume twist = 0 Assume cameras are on the ground Dimensionality of State space: Label Space: 2 unknowns for rotation 2 unknowns for translation

Making MRF Tractable First solve for rotation Assume twist = 0 Assume cameras are near ground Solve using parallelizable BP on a cluster, use distance transforms to speed up message passing Dimensionality of State space: Label Space: 2 unknowns for rotation 2 unknowns for translation

Refining the MRF Solution Use Non Linear Least Squares to refine the MRF solution (Discrete => Continuous space)

Overall Approach Discrete BP optimization over viewing directions NLLS refinement of rotations Discrete BP optimization over 2D translations NLLS refinement of 3D translations Pairwise reconstructions, geotags, VPs Initial rotations Refined rotations Initial translations, 2D points Initial 3D camera poses, 3D points [Slide from authors]

Feature point detection Bundle adjustment Refine camera poses R, T and scene structure P Bundle adjustment Refine camera poses R, T and scene structure P Image matching Find scene points seen by multiple cameras Image matching Find scene points seen by multiple cameras Initialization Robustly estimate camera poses and/or scene points Initialization Robustly estimate camera poses and/or scene points SfM on unstructured photo collections Discrete optimization over viewing directions NLLS refinement of rotations Discrete optimization over 2D translations NLLS refinement of 3D translations [Slide from authors]

Experimental results Evaluated on several Flickr datasets – Download photos (highest resolution available) & geotags (if available) – Removed images with missing EXIF focal lengths – Removed panoramic and high-twist images [Slide from authors]

Acropolis Reconstructed images: 454 Edges in MRF: 65,097 Median camera pose difference wrt IBA: 0.1m [Slide from authors]

Dubrovnik (Croatia) Reconstructed images: 6,532 Edges in MRF: 1,835,488 Median camera pose difference wrt IBA: 1.0m [Slide from authors]

Quad Total images: 6,514 Reconstructed images: 5,233 Edges in MRF: 995,734 Ground truth for 348 cameras Median error wrt ground truth: 1.16m (vs 1.01 for IBA) [Slide from authors]

Quad results [Slide from authors]

Central Rome Reconstructed images: 14,754 Edges in MRF: 2,258,416 [Slide from authors]

Central Rome Reconstructed images: 14,754 Edges in MRF: 2,258,416 Median camera pose difference wrt IBA: 25.0m Our result Incremental Bundle Adjustment [Agarwal09] [Slide from authors]

After discrete BP [Slide from authors]

After NNLS [Slide from authors]

After final BA [Slide from authors]

Running times Our results are about as good as or better than IBA, but at a fraction of the computational cost – Favorable asymptotic complexity – BP is easy to parallelize on a cluster, unlike BA

IBA vs Our Approach IBA Prone to drift, local minima Dependent upon seed set Not robust to outliers No easy way to add priors (GPS tags, etc.) Not Parallelizable Our Approach Simultaneously solve for global optima Objective function robust to outliers Easy to add priors Parallelizable

Discussion Discrete BP optimization over viewing directions NLLS refinement of rotations Discrete BP optimization over 2D translations NLLS refinement of 3D translations Pairwise reconstructions, geotags, VPs Initial rotations Refined rotations Initial translations, 2D points Initial 3D camera poses, 3D points Parallelizable! Easily incorporates noisy GPS info Non Parallelizable

Discussion Two step process for finding R and T. Is that why final BA is required? Final BA is still the bottleneck – not parallelizable

Thank You

References Photo Tourism: Exploring Image Collections in 3D. Noah Snavely, Steven M. Seitz, Richard Szeliski. SIGGRAPH 2006 Building Rome in a Day Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz and Richard Szeliski, ICCV 2009 Building Rome on a Cloudless Day Jan-Michael Frahm, Pierre Georgel, David Gallup, Tim Johnson, Rahul Raguram, Changchang Wu, Yi-Hung Jen, Enrique Dunn, Brian Clipp, Svetlana Lazebnik, Marc Pollefeys, ECCV 2010