Indoor Segmentation and Support Inference from RGBD Images Nathan Silberman, Derek Hoiem, Pushmeet Kohli, Rob Fergus.

Slides:

Advertisements

Similar presentations

Automatic Photo Pop-up Derek Hoiem Alexei A.Efros Martial Hebert Carnegie Mellon University.

Advertisements

3rd Workshop On Semantic Perception, Mapping and Exploration (SPME) Karlsruhe, Germany,2013 Semantic Parsing for Priming Object Detection in RGB-D Scenes.

From Interactive to Semantic Image Segmentation Varun Gulshan Supervisors: Prof. Andrew Blake Prof. Andrew Zisserman 20 Jan 2012.

Creating Consistent Scene Graphs Using a Probabilistic Grammar Tianqiang LiuSiddhartha ChaudhuriVladimir G. Kim Qi-Xing HuangNiloy J. MitraThomas Funkhouser.

Scene Labeling Using Beam Search Under Mutex Constraints ID: O-2B-6 Anirban Roy and Sinisa Todorovic Oregon State University 1.

SPONSORED BY SA2014.SIGGRAPH.ORG Annotating RGBD Images of Indoor Scenes Yu-Shiang Wong and Hung-Kuo Chu National Tsing Hua University CGV LAB.

Recovering Human Body Configurations: Combining Segmentation and Recognition Greg Mori, Xiaofeng Ren, and Jitentendra Malik (UC Berkeley) Alexei A. Efros.

Indoor Scene Segmentation using a Structured Light Sensor

Patch to the Future: Unsupervised Visual Prediction

Unfolding an Indoor Origami World David Fouhey, Abhinav Gupta, Martial Hebert 1.

Creating Consistent Scene Graphs Using a Probabilistic Grammar Tianqiang LiuSiddhartha ChaudhuriVladimir G. Kim Qi-Xing HuangNiloy J. MitraThomas Funkhouser.

A Framework for Photo-Quality Assessment and Enhancement based on Visual Aesthetics Subhabrata Bhattacharya Rahul Sukthankar Mubarak Shah.

Contextual Classification with Functional Max-Margin Markov Networks Dan MunozDrew Bagnell Nicolas VandapelMartial Hebert.

1. Introduction Humanising GrabCut: Learning to segment humans using the Kinect Varun Gulshan, Victor Lempitksy and Andrew Zisserman Dept. of Engineering.

Robust Higher Order Potentials For Enforcing Label Consistency

CVR05 University of California Berkeley 1 Familiar Configuration Enables Figure/Ground Assignment in Natural Scenes Xiaofeng Ren, Charless Fowlkes, Jitendra.

Visual Object Recognition Rob Fergus Courant Institute, New York University

Automatic Photo Popup Derek Hoiem Alexei A. Efros Martial Hebert Carnegie Mellon University.

A Novel 2D To 3D Image Technique Based On Object- Oriented Conversion.

Opportunities of Scale, Part 2 Computer Vision James Hays, Brown Many slides from James Hays, Alyosha Efros, and Derek Hoiem Graphic from Antonio Torralba.

What, Where & How Many? Combining Object Detectors and CRFs

3D Scene Models Object recognition and scene understanding Krista Ehinger.

Handwritten Character Recognition using Hidden Markov Models Quantifying the marginal benefit of exploiting correlations between adjacent characters and.

12/01/11 How the Kinect Works Computational Photography Derek Hoiem, University of Illinois Photo frame-grabbed from:

Yuping Lin and Gérard Medioni.  Introduction  Method  Register UAV streams to a global reference image ▪ Consecutive UAV image registration ▪ UAV to.

Computational Photography lecture 19 – How the Kinect 1 works? CS 590 Spring 2014 Prof. Alex Berg (Credits to many other folks on individual slides)

Unsupervised Learning of Categories from Sets of Partially Matching Image Features Kristen Grauman and Trevor Darrel CVPR 2006 Presented By Sovan Biswas.

Richard Socher Cliff Chiung-Yu Lin Andrew Y. Ng Christopher D. Manning

Image Recognition using Hierarchical Temporal Memory Radoslav Škoviera Ústav merania SAV Fakulta matematiky, fyziky a informatiky UK.

NATIONAL TECHNICAL UNIVERSITY OF ATHENS Image, Video And Multimedia Systems Laboratory Background

Object Stereo- Joint Stereo Matching and Object Segmentation Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on Michael Bleyer Vienna.

Recovering Surface Layout from a Single Image D. Hoiem, A.A. Efros, M. Hebert Robotics Institute, CMU Presenter: Derek Hoiem CS 598, Spring 2009 Jan 29,

#MOTION ESTIMATION AND OCCLUSION DETECTION #BLURRED VIDEO WITH LAYERS

INTRODUCTION Heesoo Myeong and Kyoung Mu Lee Department of ECE, ASRI, Seoul National University, Seoul, Korea Tensor-based High-order.

Associative Hierarchical CRFs for Object Class Image Segmentation Ľubor Ladický 1 1 Oxford Brookes University 2 Microsoft Research Cambridge Based on the.

Putting Context into Vision Derek Hoiem September 15, 2004.

IIIT Hyderabad Learning Semantic Interaction among Graspable Objects Swagatika Panda, A.H. Abdul Hafez, C.V. Jawahar Center for Visual Information Technology,

Acquiring 3D Indoor Environments with Variability and Repetition Young Min Kim Stanford University Niloy J. Mitra UCL/ KAUST Dong-Ming Yan KAUST Leonidas.

A New Method for Automatic Clothing Tagging Utilizing Image-Click-Ads Introduction Conclusion Can We Do Better to Reduce Workload?

Category Independent Region Proposals Ian Endres and Derek Hoiem University of Illinois at Urbana-Champaign.

11/05/15 How the Kinect Works Computational Photography Derek Hoiem, University of Illinois Photo frame-grabbed from:

Multi Scale CRF Based RGB-D Image Segmentation Using Inter Frames Potentials Taha Hamedani Robot Perception Lab Ferdowsi University of Mashhad The 2 nd.

Object Recognition by Integrating Multiple Image Segmentations Caroline Pantofaru, Cordelia Schmid, Martial Hebert ECCV 2008 E.

Learning video saliency from human gaze using candidate selection CVPR2013 Poster.

REU WEEK III Malcolm Collins-Sibley Mentor: Shervin Ardeshir.

Coherent Scene Understanding with 3D Geometric Reasoning Jiyan Pan 12/3/2012.

Photo-Quality Enhancement based on Visual Aesthetics S. Bhattacharya*, R. Sukthankar**, M.Shah* *University of Central Florida, **Intel labs & CMU.

DISCRIMINATIVELY TRAINED DENSE SURFACE NORMAL ESTIMATION ANDREW SHARP.

Automatic 3D modelling of Architecture Anthony Dick 1 Phil Torr 2 Roberto Cipolla 1 1 Department of Engineering 2 Microsoft Research, University of Cambridge.

Scene Parsing with Object Instances and Occlusion Ordering JOSEPH TIGHE, MARC NIETHAMMER, SVETLANA LAZEBNIK 2014 IEEE CONFERENCE ON COMPUTER VISION AND.

Parsing Natural Scenes and Natural Language with Recursive Neural Networks INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML 2011) RICHARD SOCHER CLIFF.

Machine learning & object recognition Cordelia Schmid Jakob Verbeek.

Segmentation of Building Facades using Procedural Shape Priors

A Plane-Based Approach to Mondrian Stereo Matching

Learning a Region-based Scene Segmentation Model

Intrinsic images and shape refinement

Krishna Kumar Singh, Yong Jae Lee University of California, Davis

Finding Things: Image Parsing with Regions and Per-Exemplar Detectors

Recognizing Deformable Shapes

Nonparametric Semantic Segmentation

How the Kinect Works Computational Photography

Unsupervised Learning and Autoencoders

Efficiently Selecting Regions for Scene Understanding

Rob Fergus Computer Vision

Data-driven Depth Inference from a Single Still Image

Weakly Supervised Action Recognition

“Traditional” image segmentation

Presented By: Harshul Gupta

Problem Image and Volume Segmentation:

Presentation transcript:

Indoor Segmentation and Support Inference from RGBD Images Nathan Silberman, Derek Hoiem, Pushmeet Kohli, Rob Fergus

Goal: Infer Support for Every Region Lamp Supported by Nightstand Nightstand Supported by Floor

Goal: Infer Support for Every Region

Why infer physical support? Interacting with objects may have physical consequences!

Why infer physical support: Recognition Object on top of desk Object hanging from file cabinet

Why infer physical support: Recognition

Working with RGB+Depth Captured with Microsoft Kinect Restricted to Indoor Scenes

NYU Depth Dataset Version 2.0 Collected new NYU Depth Dataset Much larger than NYU Depth 1.0 – 464 Scenes – 1449 Densely Labeled frames – Over 400,000 Unlabeled frames – Over 800 Semantic Classes – Full videos available Larger variation in scenes Dense Labels much higher quality

High Quality Semantic Labels Bed Pillow 1 Pillow 2 Headboard Nightstand Lamp Window Dresser Picture 1 Wall 1 Wall Picture 3 Doll 1 Doll 2 Floor Picture 2 Pillow 3

High Quality Support Labels Support from behind Support from belowSupport from hidden region

Segmentation Support Inference RGBD Image

Input Scene Parsing

Major Surfaces Surface Normals Aligned Point Cloud Major Surfaces Surface Normals Aligned Point Cloud Input Scene Parsing

Input Major Surfaces Surface Normals Aligned Point Cloud Major Surfaces Surface Normals Aligned Point Cloud Segmentation Scene Parsing

Segmentation Scheme similar to: Recovering Occlusion Boundaries from a Single Image D. Hoiem, A.N. Stein, A.A. Efros, and M. Hebert, ICCV Hierarchical Segmentation

Segmentation Scheme similar to: Recovering Occlusion Boundaries from a Single Image D. Hoiem, A.N. Stein, A.A. Efros, and M. Hebert, ICCV Hierarchical Segmentation

Segmentation Scheme similar to: Recovering Occlusion Boundaries from a Single Image D. Hoiem, A.N. Stein, A.A. Efros, and M. Hebert, ICCV Hierarchical Segmentation

Segmentation Scheme similar to: Recovering Occlusion Boundaries from a Single Image D. Hoiem, A.N. Stein, A.A. Efros, and M. Hebert, ICCV Hierarchical Segmentation

Segmentation Scheme similar to: Recovering Occlusion Boundaries from a Single Image D. Hoiem, A.N. Stein, A.A. Efros, and M. Hebert, ICCV Hierarchical Segmentation

Input Major Surfaces Surface Normals Aligned Point Cloud Major Surfaces Surface Normals Aligned Point Cloud Segmentation Scene Parsing

Input Major Surfaces Surface Normals Aligned Point Cloud Major Surfaces Surface Normals Aligned Point Cloud Segmentation Support Inference

Segmentation Support Inference RGBD Image

Modeling Choice #1 All objects supported by a single object except – Floor requires no support Floor 3. Wall 1. Chair 2. Picture (Inverted) Tree Representation Image Regions

Modeling Choice #2 All objects are either supported by another region in the image OR a hidden region.

Modeling Choice #2 All objects are either supported by another region in the image OR a hidden region. Deoderant supported by counter

Modeling Choice #2 All objects are either supported by another region in the image OR a hidden region. Cabinet supported by hidden region

Modeling Choice #3 Every object is either supported from below or from behind.

Modeling Choice #3 Every object is either supported from below or from behind. Deoderant supported from below

Modeling Choice #3 Every object is either supported from below or from behind. Mirror supported from behind

Modeling Support: Structure Classes ‘Structure Classes’ encode high level support prior knowledge (1) Ground (2) Furniture (3) Prop or (4) Structure

Modeling Support Goal: For each region in regions, infer: 1.Supporting region 2.Support Type 3.Structure class

Modeling Support Goal: For each region in regions, infer: 1.Supporting region 2.Support Type 3.Structure class The formal problem per image:

Local Support Local Structure ClassPrior - supporting region - support type - structure class Joint Energy Factorizes into three terms:

comes from logistic regressor trained on pairwise features Local Support Energy supporting region - support type - structure class

Local Structure Class Energy from logistic regressor trained on features from each individual region - supporting region - support type - structure class

A region’s structure class helps predict its support. Structure Furniture Structure Floor OR Prior (1/4): Transitions - supporting region - support type - structure class

Supporting regions should be nearby OR Prior (2/4): Support Consistency - supporting region - support type - structure class

A region requires no support if and only if its structure class is ‘floor’ Prior (3/4): Ground Consistency - supporting region - support type - structure class

A region is unlikely to be the floor if another floor region is lower than it OR Prior (4/4): Global Ground Consistency Floor Prop - supporting region - support type - structure class

Integer Program Formulation Relaxed to Linear Program

Experiments

Evaluating Support Accuracy = # of Correctly Labeled Support Relationships # of Total Labeled Support Relationships Evaluation with features extracted from: Regions from Ground Truth LabelsRegions from Segmentation

Baseline #1: Image Plane Rules Heuristic: look at neighboring regions for support

Baselines #2: Structure Class Rules Heuristic: Support is deterministic given Structure Classes Floor Furniture Prop Structure

Baselines #3: Support Classifier Use only the output of support classifier

Evaluating Support (Regions from Ground Truth Labels) Examples of Manually Labeled Regions

Evaluating Support (Regions from Ground Truth Labels) Examples of Manually Labeled Regions

Results Ground Truth Regions Floor Furniture Prop Structure

Results Ground Truth Regions Correct Prediction

Results Ground Truth Regions Correct Prediction Incorrect Prediction

Results Ground Truth Regions Correct Prediction Incorrect Prediction Support from below

Results Ground Truth Regions Correct Prediction Incorrect Prediction Support from behind Support from below

Results Ground Truth Regions Correct Prediction Incorrect Prediction Support from behind Support from below Support from hidden region

Results Ground Truth Regions Correct Prediction Incorrect Prediction Support from behind Support from below Support from hidden region

Results Ground Truth Regions Correct Prediction Incorrect Prediction Support from behind Support from below Support from hidden region

Evaluating Support (Regions from Segmentation) Examples of Regions from Segmentation

Results Automatically Segmented Regions Correct Prediction Incorrect Prediction Support from behind Support from below Support from hidden region

Results Automatically Segmented Regions Correct Prediction Incorrect Prediction Support from behind Support from below Support from hidden region

Conclusion Algorithm for inferring Physical Support Novel Integer Program Formulation 3D Cues for segmentation Dataset: – Code: – seg_sup.html