Building Rome in a Day
Sameer Agarwal (1), Noah Snavely (2), Ian Simon (1), Steven M. Seitz (1), Richard Szeliski (3)
(1) University of Washington, (2) Cornell University, (3) Microsoft Research

Outline: 1. Introduction, 2. System Design, 3. Results, 4. Conclusion

Introduction: Entering the search term "Rome" on Flickr returns more than two million photographs. Existing 3D city models: Google Earth and Microsoft's Virtual Earth.

Exploring Photo Collections in 3D

Outline: 1. Introduction, 2. System Design (2.1 pre-processing and feature extraction, 2.2 matching, 2.3 geometric estimation), 3. Results, 4. Conclusion

Scene reconstruction: automatically estimate the position, orientation, and focal length of each camera, as well as the 3D positions of the feature points.

Feature detection: detect features using SIFT [Lowe, IJCV 2004].

Feature matching: match features between each pair of images using approximate nearest-neighbor matching.
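The nearest-neighbor step can be sketched as follows. This is a minimal illustration, not the system's implementation. It makes two assumptions. First, it uses exhaustive search as a stand-in for approximate nearest-neighbor search (a real system would use a structure such as a kd-tree). Second, it applies a distance-ratio test with an assumed threshold of 0.6 to reject ambiguous matches. The function name `match_features` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]

def match_features(desc_a: List[Descriptor], desc_b: List[Descriptor],
                   ratio: float = 0.6) -> List[Tuple[int, int]]:
    """Return (index in A, index in B) pairs for confident matches.

    Exhaustive search is an illustrative stand-in for approximate
    nearest-neighbor search; `ratio` is an assumed threshold.
    """
    matches: List[Tuple[int, int]] = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs a second-nearest neighbor
    for i, a in enumerate(desc_a):
        # Distances from descriptor a to every descriptor in image B, nearest first.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        (d1, j1), (d2, _) = dists[0], dists[1]
        # Keep the match only if the nearest neighbor is clearly closer
        # than the second nearest (ambiguous matches are discarded).
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches
```

For example, a descriptor exactly halfway between two candidates in image B is rejected, because its two nearest distances are equal.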

Feature matching: refine the matches using RANSAC [Fischler & Bolles 1987] to estimate a fundamental matrix between each pair of images.
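RANSAC repeatedly fits a model to a minimal random sample and keeps the hypothesis with the most inliers. The sketch below shows that hypothesize-and-verify loop on 2D line fitting. This is a deliberate simplification. For fundamental matrices, the two-point line fit would be replaced by an eight-point (or seven-point) solver, and the point-to-line distance by an epipolar error. The function names, the iteration count, and the inlier threshold are all illustrative assumptions.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Line = Tuple[float, float, float]  # a*x + b*y + c = 0 with a^2 + b^2 = 1

def fit_line(p: Point, q: Point) -> Optional[Line]:
    """Line through two points, normalized so residuals are Euclidean distances."""
    a, b = q[1] - p[1], p[0] - q[0]
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        return None  # degenerate sample: identical points
    a, b = a / norm, b / norm
    return a, b, -(a * p[0] + b * p[1])

def ransac_line(points: List[Point], iters: int = 200, thresh: float = 0.5,
                seed: int = 0) -> Tuple[Optional[Line], List[Point]]:
    rng = random.Random(seed)
    best_model: Optional[Line] = None
    best_inliers: List[Point] = []
    for _ in range(iters):
        # Hypothesize: fit a model to a minimal random sample (2 points for a line).
        p, q = rng.sample(points, 2)
        model = fit_line(p, q)
        if model is None:
            continue
        a, b, c = model
        # Verify: count points whose distance to the line is below the threshold.
        inliers = [pt for pt in points if abs(a * pt[0] + b * pt[1] + c) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

With 20 points on y = 2x + 1 plus a handful of distant outliers, the returned inlier set is exactly the 20 points on the line.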

Correspondence estimation: link up the pairwise matches to form connected components of matches across several images.
[Figure: matches linked across Image 1, Image 2, Image 3, and Image 4]
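Linking pairwise matches into connected components (often called tracks) is a natural fit for a union-find structure. In the sketch below, each feature is identified by an (image, feature index) pair. One step is an assumption added for illustration: dropping components that contain two features from the same image, treating them as inconsistent. The function name `build_tracks` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Feature = Tuple[int, int]  # (image index, feature index within that image)

def build_tracks(pairwise: Dict[Tuple[int, int], List[Tuple[int, int]]]) -> List[List[Feature]]:
    """pairwise[(img_a, img_b)] is a list of (feature in img_a, feature in img_b)."""
    parent: Dict[Feature, Feature] = {}

    def find(x: Feature) -> Feature:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: Feature, b: Feature) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Every pairwise match merges the two features into one component.
    for (img_a, img_b), matches in pairwise.items():
        for fa, fb in matches:
            union((img_a, fa), (img_b, fb))

    groups: Dict[Feature, List[Feature]] = defaultdict(list)
    for feat in list(parent):
        groups[find(feat)].append(feat)
    tracks = [sorted(g) for g in groups.values()]
    # Assumed consistency filter: at most one feature per image in a track.
    return [t for t in tracks if len({img for img, _ in t}) == len(t)]
```

For example, the matches 0:5 to 1:7, 1:7 to 2:3, and 2:3 to 3:9 chain into a single four-image track.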

Structure from motion: the automatic recovery of camera motion and scene structure from two or more images. It is a self-calibration technique, also called automatic camera tracking or match moving.
[Figure: unknown camera viewpoints]

Structure from motion: cameras 1, 2, and 3, with unknown rotations and translations $(R_1, t_1)$, $(R_2, t_2)$, $(R_3, t_3)$, observe the scene points $p_1, \dots, p_7$. The goal is to solve
$$\min_{R,\,T,\,P} f(R, T, P)$$
that is, to find the rotations R, positions t, and 3D point locations P that minimize the sum of squared reprojection errors f.
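Written out, the objective is $f(R, T, P) = \sum_{(i,j)} \lVert q_{ij} - \pi(R_i, t_i, p_j) \rVert^2$. Here $q_{ij}$ is the observed image position of point j in camera i, and $\pi$ is the camera projection. Both symbols are assumed notation, not taken from the slides. The sketch below only evaluates f; bundle adjustment would minimize it with a nonlinear least-squares solver. It assumes a simple pinhole camera with the principal point at the origin and no lens distortion, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]
Camera = Tuple[Mat3, Vec3, float]  # (rotation R, translation t, focal length)

def project(R: Mat3, t: Vec3, focal: float, X: Vec3) -> Tuple[float, float]:
    # Transform into camera coordinates: Xc = R X + t.
    Xc = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    # Assumed pinhole model: principal point at the origin, no distortion.
    return focal * Xc[0] / Xc[2], focal * Xc[1] / Xc[2]

def reprojection_error(cameras: List[Camera], points: List[Vec3],
                       observations: List[Tuple[int, int, Tuple[float, float]]]) -> float:
    """Sum of squared reprojection errors over (camera i, point j, observed (u, v))."""
    total = 0.0
    for i, j, (u, v) in observations:
        R, t, f = cameras[i]
        pu, pv = project(R, t, f, points[j])
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total
```

With an identity rotation, zero translation, and focal length 100, the point (1, 2, 2) projects to (50, 100). An observation at (53, 104) therefore contributes 3² + 4² = 25 to f.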

Incremental structure from motion

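1. Optimize the parameters of two initial cameras and their common points.
2. Find the new image with the most matches to existing 3D points.
3. Initialize the new camera using pose estimation.
4. Bundle adjust.
5. Add new points.
6. Bundle adjust again, then repeat from step 2 until no more images can be added.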
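Step 2 of the loop can be sketched on its own. Given tracks (lists of (image, feature) observations), the set of tracks that already have 3D points, and the images already registered, the sketch picks the unregistered image that observes the most reconstructed points. The tie-breaking rule (lowest image index) and the function name `next_image` are assumptions for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Feature = Tuple[int, int]  # (image index, feature index)

def next_image(tracks: List[List[Feature]], reconstructed_tracks: Set[int],
               registered_images: Set[int]) -> Optional[int]:
    """Return the unregistered image seeing the most reconstructed points, or None."""
    counts: Dict[int, int] = {}
    for track_id in reconstructed_tracks:
        for img, _ in tracks[track_id]:
            if img not in registered_images:
                counts[img] = counts.get(img, 0) + 1
    if not counts:
        return None  # nothing left that overlaps the current reconstruction
    # Most matches first; ties broken toward the lower image index (assumed rule).
    return max(counts, key=lambda img: (counts[img], -img))
```

In the incremental loop, the returned image would then be initialized by pose estimation (step 3) before the next bundle adjustment.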

Vocabulary trees (Nistér & Stewénius, 2006): for computational efficiency, a hierarchical k-means tree is used to quantize the feature descriptors into visual words.
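Quantizing a descriptor with a vocabulary tree means descending from the root. At each level the descriptor moves to the child with the nearest cluster center, so it costs about k distance computations per level instead of one per visual word. The sketch below assumes the tree has already been built offline by recursive k-means, which is not shown. The `Node` class and the `quantize` function are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence

@dataclass
class Node:
    center: Sequence[float]
    children: List["Node"] = field(default_factory=list)
    word_id: int = -1  # meaningful only on leaves (the visual words)

def quantize(root: Node, descriptor: Sequence[float]) -> int:
    """Map a descriptor to a visual word by greedy descent through the tree."""
    node = root
    while node.children:
        # Move to the child whose cluster center is nearest to the descriptor.
        node = min(node.children, key=lambda c: math.dist(c.center, descriptor))
    return node.word_id
```

The greedy descent may occasionally choose a different leaf than a flat nearest-neighbor search over all words would. That loss of exactness is the price paid for the speedup.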

TF-IDF (term frequency–inverse document frequency): consider a document containing 100 words in which the word "cow" appears 3 times. Then TF = 3 / 100 = 0.03. Assume we have 10 million documents and "cow" appears in one thousand of them. Then IDF = log10(10,000,000 / 1,000) = 4.

The TF-IDF score is the product of these quantities: 0.03 × 4 = 0.12. A word is important if its TF-IDF score is large. A high term frequency within a particular document, together with a low document frequency across the whole collection, yields a high TF-IDF weight. TF-IDF therefore tends to filter out common words and keep important ones.
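The worked example above can be reproduced directly. This sketch uses a base-10 logarithm to match the slide's arithmetic. Choosing a different base rescales every IDF weight by the same constant, so it does not change which words rank highest. The function name `tf_idf` is hypothetical.

```python
import math

def tf_idf(term_count: int, doc_length: int, num_docs: int, docs_with_term: int,
           log=math.log10) -> float:
    tf = term_count / doc_length          # how often the word occurs in this document
    idf = log(num_docs / docs_with_term)  # how rare the word is across the collection
    return tf * idf

print(round(tf_idf(3, 100, 10_000_000, 1_000), 2))  # 0.12
```

In image retrieval, "documents" are images and "words" are the visual words produced by the vocabulary tree.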

Query expansion / large-scale image matching: exhaustively matching every pair of images is too expensive at this scale. A better approach is to use the bag-of-words technique to find likely matches: for each image, find the top M scoring other images and do detailed SIFT matching only with those.
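For scale, exhaustive matching of 150,000 images would mean 150,000 × 149,999 / 2, or about 1.1 × 10^10 pairs. Keeping only the top M candidates per image caps the work at N × M pairs. The sketch below represents each image as a sparse TF-IDF vector of visual words and keeps each image's M highest-scoring partners. It assumes cosine similarity as the scoring function, and the function names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

BoW = Dict[int, float]  # visual word id -> TF-IDF weight

def cosine(a: BoW, b: BoW) -> float:
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def candidate_pairs(images: Dict[int, BoW], m: int) -> List[Tuple[int, int]]:
    """Image pairs selected for detailed matching: each image's top-m neighbors."""
    pairs = set()
    for i, vec in images.items():
        scores = [(cosine(vec, other), j) for j, other in images.items() if j != i]
        for _, j in heapq.nlargest(m, scores):
            pairs.add((min(i, j), max(i, j)))  # store each unordered pair once
    return sorted(pairs)
```

Only the returned pairs would go on to detailed SIFT matching and RANSAC verification.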

Matching and reconstruction statistics for the three datasets

Building Rome in a Day: Rome, Italy. 150,000 images processed in 21 hours on 496 cores. Reconstructions shown: the Colosseum, St. Peter's Basilica, and the Trevi Fountain.

Dubrovnik, Croatia. 4,619 images (out of an initial 57,845). Total reconstruction time: 23 hours. Number of cores: 352.

San Marco Square and environs, Venice. 14,079 photos (out of an initial 250,000). Total reconstruction time: 3 days. Number of cores: 496.

Conclusion: Our experimental results demonstrate that it is now possible to reconstruct cities consisting of 150K images in less than a day on a cluster with 500 compute cores.