Building Rome in a Day. Sameer Agarwal (1), Noah Snavely (2), Ian Simon (1), Steven M. Seitz (1), Richard Szeliski (3). (1) University of Washington, (2) Cornell University, (3) Microsoft Research.

Presentation transcript:

Building Rome in a Day. Sameer Agarwal (1), Noah Snavely (2), Ian Simon (1), Steven M. Seitz (1), Richard Szeliski (3). (1) University of Washington, (2) Cornell University, (3) Microsoft Research

Outline 1. Introduction 2. System Design 3. Result 4. Conclusion

Introduction: entering the search term “Rome” on Flickr returns more than two million photographs. Related work: 3D reconstruction in Google Earth and Microsoft’s Virtual Earth.

Exploring Photo Collections in 3D

Outline 1. Introduction 2. System Design (2.1 Pre-processing and feature extraction, 2.2 Matching, 2.3 Geometric estimation) 3. Result 4. Conclusion

Scene reconstruction: automatically estimate the position, orientation, and focal length of each camera, along with the 3D positions of the feature points.

Feature detection Detect features using SIFT [Lowe, IJCV 2004]


Feature matching: match features between each pair of images using approximate nearest neighbor matching.

Feature matching: refine the matches using RANSAC [Fischler & Bolles 1987] to estimate fundamental matrices between image pairs.
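The pairwise matching step can be illustrated with a minimal sketch. It swaps in an exact brute-force nearest-neighbor search for the approximate one named on the slide, and it assumes a ratio test (nearest versus second-nearest distance) as the acceptance rule, which the slides do not spell out. The function name `match_features` and the `ratio` threshold are hypothetical.

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_b[j] is the nearest neighbor of desc_a[i].

    Assumption: a match is kept only if the nearest distance is clearly
    smaller than the second-nearest one (ratio test). desc_b needs >= 2 rows.
    """
    # Pairwise squared Euclidean distances, shape (len(desc_a), len(desc_b)).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist2 = np.sum(diff * diff, axis=2)

    matches = []
    for i, row in enumerate(dist2):
        order = np.argsort(row)
        best, second = order[0], order[1]
        # Compare squared distances, so the ratio is squared as well.
        if row[best] < (ratio ** 2) * row[second]:
            matches.append((i, int(best)))
    return matches
```

Brute force costs time proportional to the product of the two descriptor counts, which is why a large-scale system would use an approximate index instead.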

Correspondence estimation: link up pairwise matches to form connected components of matches across several images. [Figure: one feature track linked across Image 1, Image 2, Image 3, and Image 4]

Structure from motion: automatic recovery of camera motion and scene structure from two or more images. It is a self-calibration technique, also called automatic camera tracking or match moving. [Figure: unknown camera viewpoints]

Structure from motion: three cameras with poses (R_1, t_1), (R_2, t_2), (R_3, t_3) observe the 3D points p_1, ..., p_7. Find the rotations R, positions T = {t_i}, and 3D point locations P that minimize f(R, T, P), the sum of squared reprojection errors.
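Linking pairwise matches into connected components can be sketched with a union-find structure. This is an illustration only: the slides do not say which data structure the system uses, and the `(image_id, feature_id)` tuple representation and the name `build_tracks` are assumptions.

```python
def build_tracks(pairwise_matches):
    """Group matched features into tracks (connected components).

    pairwise_matches: iterable of ((image_id, feature_id), (image_id, feature_id)).
    Returns a sorted list of tracks, each a sorted list of features.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairwise_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    tracks = {}
    for node in list(parent):
        tracks.setdefault(find(node), set()).add(node)
    return sorted(sorted(track) for track in tracks.values())
```

For example, matches (1,0)-(2,5) and (2,5)-(3,2) end up in a single three-image track even though images 1 and 3 were never matched directly.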
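The objective f can be written down in a few lines. The sketch below assumes a simple pinhole camera with a single focal length and no lens distortion, and it uses the convention x_cam = R X + t, so t here is a translation rather than a camera center. Neither choice is confirmed by the slides, and the function names are hypothetical.

```python
import numpy as np

def project(R, t, focal, point):
    """Pinhole projection of a 3D point (assumed convention: x_cam = R @ X + t)."""
    p_cam = R @ point + t
    return focal * p_cam[:2] / p_cam[2]

def reprojection_error(cameras, points, observations):
    """Sum of squared reprojection errors.

    cameras: list of (R, t, focal); points: list of 3D arrays;
    observations: list of (camera_index, point_index, observed_xy).
    """
    total = 0.0
    for cam_idx, pt_idx, observed_xy in observations:
        R, t, focal = cameras[cam_idx]
        residual = project(R, t, focal, points[pt_idx]) - observed_xy
        total += float(residual @ residual)
    return total
```

Bundle adjustment then amounts to minimizing this quantity over all camera parameters and point positions with a nonlinear least-squares solver.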

Incremental structure from motion

1. Optimize parameters for two cameras and their common points.
2. Find the new image with the most matches to existing points.
3. Initialize the new camera using pose estimation.
4. Bundle adjust.
5. Add new points.
6. Bundle adjust.


Vocabulary trees (Nister & Stewenius, 2006): for computational efficiency, a k-means tree is used to quantize the feature descriptors into visual words.
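Quantizing a descriptor with a k-means tree means descending from the root and picking the nearest cluster center at each level. The sketch below shows only that lookup, on a tree built by hand. Training the tree with hierarchical k-means is omitted, and the `Node` class and `quantize` function are hypothetical names, not taken from the system.

```python
import numpy as np

class Node:
    """One cluster in the tree; leaves carry a visual word id."""
    def __init__(self, center, children=None, word_id=None):
        self.center = np.asarray(center, dtype=float)
        self.children = children or []
        self.word_id = word_id

def quantize(root, descriptor):
    """Walk down the tree, choosing the closest child center at each level."""
    descriptor = np.asarray(descriptor, dtype=float)
    node = root
    while node.children:
        node = min(node.children,
                   key=lambda child: np.sum((child.center - descriptor) ** 2))
    return node.word_id

# A hand-built two-level tree over 2D "descriptors" (illustrative values only).
left = Node([0, 0], [Node([-1, 0], word_id=0), Node([1, 0], word_id=1)])
right = Node([10, 10], [Node([9, 10], word_id=2), Node([11, 10], word_id=3)])
root = Node([5, 5], [left, right])
```

With branching factor k and depth d, a lookup compares against k centers per level, about k·d comparisons, instead of all k^d leaf words.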

TF-IDF (term frequency–inverse document frequency): consider a document containing 100 words in which the word "cow" appears 3 times. The term frequency is TF = 3 / 100 = 0.03. Assume we have 10 million documents and "cow" appears in one thousand of these. The inverse document frequency is IDF = log10(10,000,000 / 1,000) = 4.

The TF-IDF score is the product of these quantities: 0.03 × 4 = 0.12. A word is important if its TF-IDF score is large. A high term frequency within a particular document, combined with a low document frequency across the whole collection, yields a high TF-IDF weight. TF-IDF therefore tends to filter out common words and keep important ones.

Query expansion and large-scale image matching. A better approach than matching every pair of images: use a bag-of-words technique to find likely matches. For each image, find the top M scoring other images and do detailed SIFT matching with only those.
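The worked example above can be reproduced in a few lines of Python. The base-10 logarithm is inferred from the result IDF = 4, and the function name `tf_idf` is hypothetical.

```python
import math

def tf_idf(term_count, doc_length, num_docs, docs_with_term):
    """TF-IDF weight of one term in one document (base-10 logarithm assumed)."""
    tf = term_count / doc_length
    idf = math.log10(num_docs / docs_with_term)
    return tf * idf

# "cow" appears 3 times in a 100-word document and in 1,000 of 10 million documents.
score = tf_idf(3, 100, 10_000_000, 1_000)
```

In image retrieval the "words" are quantized feature descriptors and the "documents" are images, but the weighting is the same.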
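Selecting the top M candidates per image can be sketched as follows. The sketch assumes each image is already summarized by a (TF-IDF weighted) bag-of-words vector and that cosine similarity is the score. The slides do not state the scoring function, and `top_m_candidates` is a hypothetical name.

```python
import numpy as np

def top_m_candidates(bow_vectors, m):
    """For each image, return the indices of the m most similar other images.

    bow_vectors: array of shape (num_images, vocabulary_size).
    Assumption: similarity is the cosine between bag-of-words vectors.
    """
    norms = np.linalg.norm(bow_vectors, axis=1, keepdims=True)
    unit = bow_vectors / np.maximum(norms, 1e-12)  # guard against empty images
    scores = unit @ unit.T
    np.fill_diagonal(scores, -np.inf)              # never propose an image to itself
    order = np.argsort(-scores, axis=1)            # best score first
    return order[:, :m]
```

Only the returned pairs go on to detailed SIFT matching, so the expensive step scales with the number of images times M rather than with the number of image pairs.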

Outline 1. Introduction 2. System Design 3. Result 4. Conclusion

Matching and reconstruction statistics for the three data sets

Building Rome in a Day: Rome, Italy. Reconstructed 150,000 images in 21 hours on 496 machines. Landmarks shown: Colosseum, St. Peter’s Basilica, Trevi Fountain.

Dubrovnik, Croatia. 4,619 images (out of an initial 57,845). Total reconstruction time: 23 hours. Number of cores: 352.

San Marco Square and environs, Venice. 14,079 photos (out of an initial 250,000). Total reconstruction time: 3 days. Number of cores: 496.

Outline 1. Introduction 2. System Design 3. Result 4. Conclusion

Conclusion: our experimental results demonstrate that it is now possible to reconstruct cities consisting of 150K images in less than a day on a cluster with 500 compute cores. Key components: large-scale image matching and 3D models.