Automatic Photo Pop-up
Derek Hoiem, Alexei A. Efros, Martial Hebert
Carnegie Mellon University


Automatic Photo Popup Derek Hoiem Alexei A. Efros Martial Hebert Carnegie Mellon University

What Is Automatic Photo Pop-up?

Introduction Creating 3D models from images is a complex process – time-consuming – requires special equipment – needs a large number of photographs. Easy methods are not widely accessible to the general public

Related Work Multiple Images – Manual Reconstruction: Façade [Debevec et al. 1996], REALVIZ Image Modeler, Photobuilder [Cipolla et al. 1999], [Ziegler et al. 2003], etc. – Automatic Reconstruction: Structure from Motion (e.g. [Nister 2001], [Pollefeys et al. 2004]) Single Image – Manual Reconstruction: Metric ([Liebowitz et al. 1999], [Criminisi et al. 1999]); Approximate: Tour into the Picture [Horry et al. 1997]; [Kang et al. 2001], [Oh et al. 2001], [Zhang et al. 2001], etc. – Automatic Reconstruction: this work

Our Goals Simple, piecewise planar models Outdoor scenes Doesn’t need to work all the time (~35%) Pop-up Book

The Problem Recovering 3D geometry from a single 2D projection – an infinite number of possible solutions! (figure from [Sinha and Adelson 1993])

Intuition From a single photograph, humans can fairly easily – Grasp the overall structure of a scene – Imagine reasonably well how the scene would appear from a different perspective How? – What is it about our world that allows us to do so?

Structure In Our World Gravity – Flat ground – Objects (trees, buildings) meet the ground at roughly right angles – Ground is below sky Materials correspond to 3D orientations – e.g. bark is usually on a vertical surface Structures exhibit a high degree of similarity – e.g. grass is usually green

Solving the Problem Exploit structure, eliminate unlikely geometrical interpretations Use training data to learn correspondences between image features and geometric classes …

Basic Idea Assign geometric labels to regions of the image Use the labeled regions to infer a 3D model (pipeline: Image → Learned Models → Labels → Cut and Fold → 3D Model)

Overview Image to Superpixels Superpixels to Multiple Constellations Multiple Constellations to Superpixel Labels Superpixel Labels to 3D Model

Superpixels Without knowledge of the scene’s structure, we can only compute simple features – Pixel colors – Filter responses Superpixels enable us to – Compute more complex statistics – Improve the efficiency of the algorithm – Locate regions with the same geometric label – Reduce noise
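The paper obtains superpixels with an off-the-shelf graph-based over-segmentation; as a minimal illustration of the point above (richer statistics per region than per pixel), here is a sketch that computes mean color and color variance for each superpixel, assuming the label map has already been computed by some segmenter:

```python
import numpy as np

def superpixel_stats(image, labels):
    """Per-superpixel mean color and color variance.

    image:  (H, W, 3) float array of pixel colors
    labels: (H, W) int array of superpixel ids in 0..K-1
    """
    n_superpixels = labels.max() + 1
    flat_lbl = labels.ravel()
    flat_img = image.reshape(-1, 3)
    means = np.zeros((n_superpixels, 3))
    variances = np.zeros((n_superpixels, 3))
    for k in range(n_superpixels):
        pix = flat_img[flat_lbl == k]
        means[k] = pix.mean(axis=0)       # statistic unavailable per pixel
        variances[k] = pix.var(axis=0)    # noise-reducing aggregate
    return means, variances
```

Aggregating over a superpixel both reduces pixel noise and makes statistics like variance meaningful, which is exactly the motivation listed on the slide.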

Superpixels to Multiple Constellations Superpixels do not provide enough spatial support Group together superpixels that are likely to be from the same surface – Call each group a constellation Ideally, each constellation corresponds to one physical object in the scene – Cannot guarantee that a constellation describes a single object – Contained superpixels may not have the same label Generate several overlapping sets of possible constellations – Use this to help determine the final labels for superpixels …

Forming Constellations Group superpixels that are likely to share a common geometric label We cannot be completely confident that these constellations are homogenous – Find multiple constellations for each superpixel Superpixel features allow us to determine whether a pair of superpixels have the same label – Not sufficient to determine the label for the superpixel

Forming Constellations Assign a superpixel randomly to each of Nc constellations Then iteratively assign each remaining superpixel to the constellation most likely to share its label Want to maximize the average pairwise log-likelihoods with other superpixels in the constellation (the estimated probability that two superpixels share the same label)

Forming Constellations Vary Nc from 3 to 25 to explore the trade-off between – variance from poor spatial support (many small constellations) – bias from the risk of mixed labels in a constellation (few large constellations)
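The greedy grouping described on these slides can be sketched as follows. The matrix `pair_loglik` is a hypothetical precomputed table of pairwise same-label log-likelihoods, standing in for the learned estimator; the function names and layout are assumptions, not the paper's code:

```python
import numpy as np

def form_constellations(pair_loglik, n_c, rng):
    """Greedily group superpixels into n_c constellations.

    pair_loglik[i, j]: estimated log-likelihood that superpixels i and j
    share the same geometric label (symmetric matrix).
    """
    n = pair_loglik.shape[0]
    order = rng.permutation(n)
    seeds, rest = order[:n_c], order[n_c:]
    groups = [[s] for s in seeds]  # one random seed superpixel per constellation
    for i in rest:
        # assign i to the group with highest average pairwise log-likelihood
        scores = [np.mean([pair_loglik[i, j] for j in g]) for g in groups]
        groups[int(np.argmax(scores))].append(i)
    return groups
```

Running this for several values of `n_c` (the slide's 3 to 25) yields the multiple overlapping constellation hypotheses used later.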

Features for Geometric Classes When looking at a region r, what characteristics can help us determine the label for r? – Color – Texture – Location and Shape – Perspective

Features: Color Color helps identify material – Sky is usually blue or white – Ground is usually brown or green RGB – blueness or greenness HSV – perceptual attributes (hue and “darkness”)
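A minimal sketch of the color cues above; the exact "blueness"/"greenness" definitions here are illustrative assumptions, not the paper's precise feature set:

```python
import colorsys

def color_features(pixels):
    """Simple color cues for a region, given [(r, g, b)] values in [0, 1]."""
    n = len(pixels)
    mr = sum(p[0] for p in pixels) / n
    mg = sum(p[1] for p in pixels) / n
    mb = sum(p[2] for p in pixels) / n
    hsv = [colorsys.rgb_to_hsv(*p) for p in pixels]
    mean_hue = sum(h for h, s, v in hsv) / n
    mean_value = sum(v for h, s, v in hsv) / n
    return {
        "mean_rgb": (mr, mg, mb),
        "blueness": mb - (mr + mg) / 2,   # sky regions tend to score high
        "greenness": mg - (mr + mb) / 2,  # grass regions tend to score high
        "mean_hue": mean_hue,             # HSV perceptual attribute
        "mean_value": mean_value,         # HSV "darkness" cue
    }
```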

Features: Texture Additional information about the material – Distinguish sky from water, green leaves from grass Derivative of oriented Gaussian filters provides texture cues Textons also provide texture cues

Features: Location Location provides cues for distinguishing between ground, vertical structures and sky (prior likelihood maps for Ground / Vertical / Sky)

Features: Perspective Perspective helps determine 3D orientation of surfaces – Use region boundaries to compute vanishing points

Many Weak Geometric Cues (diagram: Material + Image Location → 3D Geometry)

Multiple Constellations to Superpixel Labels The greatest challenge is to determine the geometric label of an image region – Machine learning approach – Model the appearance of geometric labels with training images For each constellation, we estimate – The likelihood of each of the three possible geometric labels – The confidence that all contained superpixels have the same label The superpixel’s label is inferred from the likelihoods of the constellations that contain that superpixel


Labeling the Image Feature descriptors are computed from – Superpixels – Constellations Geometric classification of regions Training – Training Data – Superpixel Same-Label Likelihoods – Constellation Label and Homogeneity Likelihoods

Geometric Classification Goal: obtain the most likely label for each superpixel in the image Label likelihood – confidence in each geometric label Homogeneity likelihood – do all superpixels in a constellation share the same label?

Geometric Classification Estimate the likelihood of a superpixel label, with s_i – the i-th superpixel C_k – the k-th constellation y_i – the label of s_i y_k – the label of C_k x_k – the feature vector for C_k The label likelihood P(y_k = v | x_k) is weighted by the homogeneity likelihood P(all superpixels in C_k share one label | x_k) and summed over the constellations that contain s_i
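The combination of label and homogeneity likelihoods can be sketched as follows, assuming the per-constellation likelihoods have already been estimated; the data layout is a hypothetical choice for illustration:

```python
def superpixel_label_likelihood(constellations, label_lik, homog_lik, i):
    """Combine constellation evidence into a label distribution for superpixel i.

    constellations: list of sets of superpixel indices (one per constellation)
    label_lik[k][v]: P(y_k = v | x_k) for v in {ground, vertical, sky}
    homog_lik[k]:    P(all superpixels in C_k share one label | x_k)
    """
    labels = ("ground", "vertical", "sky")
    scores = {v: 0.0 for v in labels}
    for k, c_k in enumerate(constellations):
        if i not in c_k:
            continue  # only constellations containing s_i contribute
        for v in labels:
            scores[v] += homog_lik[k] * label_lik[k][v]
    total = sum(scores.values()) or 1.0
    return {v: s / total for v, s in scores.items()}
```

A constellation that is likely inhomogeneous (low `homog_lik`) thus has little influence on the superpixels it contains, which is the intended behavior of the homogeneity term.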

Training Data: 82 images of outdoor scenes – Natural, urban and suburban scenes …

Superpixel Same-Label Likelihoods Sample 2500 superpixel pairs – Same-label and different-label Use kernel density estimation [Duda et al. 2000] and AdaBoost [Collins et al. 2002] to estimate the likelihood function

Learn From Training Images Prepare training images – Create multiple segmentations of training images – Get segment labels from ground truth Ground, vertical, sky or “mixed” Density estimation by decision trees – Improved accuracy with logistic regression form of AdaBoost [Collins et al. 2002] (yields the label likelihood and the homogeneity likelihood)

Superpixel Labels to 3D Model Once we have the geometric labels for each superpixel, we can estimate the position of each object – Estimate ground plane – Fit the boundary of the bottom of the vertical regions with the ground – Estimate horizon position from geometric features and ground labels Vertically labeled regions become planar billboards Remove regions labeled sky from model Texture-map image onto model
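Under the single-ground-plane assumption above, the depth of each ground pixel follows directly from the estimated horizon: a ground point imaged at row y below the horizon row y0 lies at depth proportional to 1/(y − y0). A sketch with illustrative camera parameters (the defaults are assumptions, not values from the paper):

```python
def ground_depth(y_pixel, y_horizon, cam_height=1.6, focal=800.0):
    """Depth of a ground-plane pixel from its image row.

    For a camera at height cam_height (world units) with focal length
    focal (pixels), a ground point at image row y_pixel below the
    horizon row y_horizon has depth z = focal * cam_height / (y - y0).
    """
    dy = y_pixel - y_horizon
    if dy <= 0:
        raise ValueError("pixel is at or above the horizon, not on the ground")
    return focal * cam_height / dy
```

This is also why the fit of the vertical regions' bottom boundary matters: each billboard is placed at the depth of the ground row where it meets the ground plane.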

Vanishing Points of Surfaces Compute intersections for long lines in a constellation – Video Compass [Kosecka and Zhang 2002]

Horizon Estimation Find the position that minimizes the L1/2 distance from all of the intersection points in the image – Robustness to outliers – Provides a reasonable estimate of the horizon in man-made images
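A grid-search sketch of this robust fit over candidate horizon rows; the exhaustive search over rows is an assumption for clarity, since the slide does not specify the optimizer:

```python
import numpy as np

def estimate_horizon(intersection_ys, image_height):
    """Horizon row minimizing the sum of L^(1/2) distances.

    The square-root penalty grows very slowly for distant points, so
    outlier line intersections barely influence the fit (more robust
    than L1 or L2).
    """
    candidates = np.arange(image_height)
    ys = np.asarray(intersection_ys, dtype=float)
    cost = np.abs(candidates[:, None] - ys[None, :]) ** 0.5
    return int(candidates[np.argmin(cost.sum(axis=1))])
```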

Creating the 3D Model Cutting and folding Camera parameters

Cutting And Folding Construct a simple 3D model – Single ground plane – Simple, piecewise planar objects for vertically labeled regions Partition vertical regions into a set of objects Determine where each object intersects the ground plane – There are difficulties, so be conservative

Partitioning Vertically Labeled Regions Divide vertically labeled pixels into loosely connected regions using the connected components algorithm For each region, fit a set of lines to the boundary using the Hough transform [Duda and Hart 1972] Form disjoint line segments into a set of polylines
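A minimal (theta, rho) Hough accumulator over boundary points, as a simplified stand-in for the iterative Hough fitting used here (single strongest line only; bin resolutions are illustrative):

```python
import numpy as np

def hough_strongest_line(points, n_theta=180, rho_res=1.0):
    """Vote boundary points into a (theta, rho) accumulator.

    Returns (theta, rho) of the strongest line in the parameterization
    x*cos(theta) + y*sin(theta) = rho.
    """
    pts = np.asarray(points, dtype=float)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    # rho of every point under every candidate orientation: (N, n_theta)
    rhos = pts[:, 0:1] * np.cos(thetas) + pts[:, 1:2] * np.sin(thetas)
    rho_max = np.abs(rhos).max() + rho_res
    n_rho = int(2 * rho_max / rho_res) + 1
    acc = np.zeros((n_theta, n_rho), dtype=int)
    idx = ((rhos + rho_max) / rho_res).astype(int)
    for t in range(n_theta):
        np.add.at(acc[t], idx[:, t], 1)  # accumulate votes per rho bin
    t_best, r_best = np.unravel_index(np.argmax(acc), acc.shape)
    return thetas[t_best], r_best * rho_res - rho_max
```

In the full method this would be applied iteratively: find the strongest line, remove its inlier points, and repeat to recover the set of boundary segments.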

Cutting and Folding Fit ground-vertical boundary – Iterative Hough transform

Cutting and Folding Form polylines from boundary segments – Join segments that intersect at slight angles – Remove small overlapping polylines Estimate horizon position from perspective cues

Cutting and Folding “Fold” along polylines and at corners “Cut” at ends of polylines and along the vertical-sky boundary

Cutting and Folding Construct 3D model Texture map

Results Automatic Photo Pop-up Input Image Cut and Fold

Results Automatic Photo Pop-up Input Image

Results Automatic Photo Pop-up Input Images

Results Automatic Photo Pop-up Input Image

Comparison with Manual Method Input Image Automatic Photo Pop-up (30 sec)! [Liebowitz et al. 1999]

Failures Labeling Errors

Failures Foreground Objects

The Music Video

Just the first step… Richer set of geometric classes (ICCV 05 paper) Segment foreground objects Cleaner segmentation More accurate labeling

Conclusion First system to automatically recover a 3D scene from a single image! Learns statistics of our world from training images

Quantitative Results (ICCV 05) Average: 86%

Questions?