3rd Workshop on Semantic Perception, Mapping and Exploration (SPME), Karlsruhe, Germany, 2013. Semantic Parsing for Priming Object Detection in RGB-D Scenes.

Presentation transcript:

3rd Workshop on Semantic Perception, Mapping and Exploration (SPME), Karlsruhe, Germany, 2013
Semantic Parsing for Priming Object Detection in RGB-D Scenes
Cesar Cadena and Jana Kosecka

Motivation (5/5/2013)
• Long-term robotic operation.
• Semantic information about the surrounding environment is important for high-level robotic tasks.
• It is difficult to know a priori all the possible instances or classes of objects that the robot will find in real operation.
• Even if we know many of them, it is unreasonable and expensive to run all the specific object detectors at the same time.


Motivation
However:
• There are things we can assume to be present (almost) always.
• Generic "detachable" objects also share some characteristics.
• Urban: Ground, Buildings, Sky, Objects
• Indoors: Ground, Walls, Ceiling, Objects
• Today: Ground, Structure, Furniture, Props


Our Problem
Efficiently segment RGB+3D scenes into the general classes Ground, Structure, Furniture and Props, to be used as a prior for task-specific detectors.


NYU Depth v2
• 1449 labeled frames.
• 26 scene classes.
• Labeling spans 894 different classes.
N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, Indoor segmentation and support inference from RGBD images, in ECCV.
Thanks to N. Silberman for providing the mapping from 894 to 4 classes.

The System
[Diagram: Semantic Segmentation → MAP, Marginals]

Different approaches
• N. Silberman et al., ECCV 2012
• C. Couprie et al., CoRR 2013
• X. Ren et al., CVPR 2012
• D. Munoz et al., ECCV 2010
• I. Endres and D. Hoiem, ECCV 2010
Each of them has at least one of:
• Expensive over-segmentation
• Expensive features
• Expensive inference

Our approach
[Diagram: Preprocessing → Conditional Random Fields (Graph Structure, Potentials, Inference) → Semantic Segmentation (MAP, Marginals)]

Outline
(1) Preprocessing
(2) Graph Structure
(3) Potentials
(4) Inference
(5) Results
(6) Conclusions

Preprocessing: Over-segmentation
• SLIC superpixels
R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk, SLIC superpixels compared to state-of-the-art superpixel methods, PAMI.

Graph Structure
• Classical choice on images

Graph Structure: Our choice
• Minimum Spanning Tree over 3D
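A minimal sketch of how a minimum spanning tree over 3D could be built, using Kruskal's algorithm with a union-find structure. The slides do not say how nodes, candidate edges or weights are defined, so everything here is an assumption: each superpixel is summarized by a hypothetical 3D centroid, candidate edges link superpixels that are adjacent in the image, and the edge weight is the Euclidean distance between centroids.

```python
import math


def minimum_spanning_tree(centroids, candidate_edges):
    """Kruskal's algorithm over superpixel centroids in 3D.

    centroids: list of (x, y, z) tuples, one per superpixel (assumed input).
    candidate_edges: iterable of (i, j) index pairs, e.g. image-adjacent superpixels.
    Returns the list of (i, j) edges kept in the minimum spanning tree/forest.
    """
    parent = list(range(len(centroids)))

    def find(i):
        # Follow parent pointers to the set representative, halving paths on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumed edge weight: Euclidean distance between 3D centroids.
    weighted = sorted(
        (math.dist(centroids[i], centroids[j]), i, j) for i, j in candidate_edges
    )

    tree = []
    for _, i, j in weighted:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two components, so no cycle is created
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


# Toy scene: two superpixels at about 1 m depth and two at about 3 m depth.
centroids = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.2, 0.0, 3.0), (0.3, 0.0, 3.1)]
candidate_edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
mst_edges = minimum_spanning_tree(centroids, candidate_edges)
```

In the toy scene the two short edges inside each depth layer are kept first, and only one long edge crossing the depth discontinuity is needed to connect the tree.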


Potentials: Pairwise CRFs
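For reference, a pairwise CRF is commonly written in the form below. The notation is an assumption and is not taken from the slides: x are the superpixel labels, z the observations, N the graph nodes, E the graph edges, φ the unary potential and ψ the pairwise potential.

```latex
p(\mathbf{x} \mid \mathbf{z}) = \frac{1}{Z(\mathbf{z})}
  \exp\Big( -\sum_{i \in \mathcal{N}} \phi(x_i, \mathbf{z})
            -\sum_{(i,j) \in \mathcal{E}} \psi(x_i, x_j, \mathbf{z}) \Big)
\qquad
\mathbf{x}^{\mathrm{MAP}} = \arg\max_{\mathbf{x}} \, p(\mathbf{x} \mid \mathbf{z})
```

Under this form, the MAP output is the single most probable labeling, while the marginals are the per-node distributions p(x_i | z).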


Potentials: unary
• frequency of label j in a k-NN query
• frequency of label j in the database
• The database is a kd-tree of features from training data.
J. Tighe and S. Lazebnik, Superparsing: Scalable nonparametric image parsing with superpixels, ECCV.
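The two frequencies above can be combined into a per-label score as sketched below. This is an illustration under stated assumptions, not the authors' implementation: a brute-force nearest-neighbour search stands in for the kd-tree, the score is taken to be the ratio of the two frequencies, the potential is assumed to be its negative logarithm, and `eps` is a hypothetical smoothing constant.

```python
import math
from collections import Counter


def knn_label_ratios(query, database, labels, k, eps=1e-6):
    """Ratio of k-NN label frequency to database label frequency, per label.

    database: list of (feature_vector, label) pairs from training data.
    A linear scan replaces the kd-tree mentioned on the slide.
    """
    neighbours = sorted(database, key=lambda item: math.dist(query, item[0]))[:k]
    knn_counts = Counter(label for _, label in neighbours)
    db_counts = Counter(label for _, label in database)

    ratios = {}
    for label in labels:
        freq_knn = knn_counts[label] / len(neighbours)
        freq_db = db_counts[label] / len(database)
        # eps (assumed) keeps the ratio finite for labels that are rare or absent.
        ratios[label] = (freq_knn + eps) / (freq_db + eps)
    return ratios


def unary_potentials(ratios):
    """Assumed mapping from ratio to potential: lower value means more likely."""
    return {label: -math.log(ratio) for label, ratio in ratios.items()}


# Toy 2D features; the real system uses the 12D features listed on the next slide.
database = [
    ((0.0, 0.0), "ground"), ((0.1, 0.0), "ground"), ((0.0, 0.1), "ground"),
    ((1.0, 1.0), "props"), ((1.1, 1.0), "props"), ((0.9, 1.0), "structure"),
]
labels = ["ground", "structure", "furniture", "props"]
ratios = knn_label_ratios((0.05, 0.05), database, labels, k=3)
potentials = unary_potentials(ratios)
```

Dividing by the database frequency means a label is favoured only when it is over-represented among the neighbours relative to how common it is overall.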

Features (12D)
• From image:
  • mean of Lab color space (3D)
  • vertical pixel location (1D)
  • entropy from vanishing points (1D)
• From 3D:
  • height and depth (2D)
  • mean and std of differences on depth (2D)
  • local planarity (1D)
  • neighboring planarity (1D)
  • vertical orientation (1D)

Features
• From image:
  • entropy from vanishing points

Features
• From 3D:
  • mean and std of differences on depth


Features
• From 3D:
  • mean and std of differences on depth
  • local planarity
  • neighboring planarity
  • vertical orientation

Potentials: pairwise
• Lab color

Inference
• We use belief propagation:
  • Exact results for MAP and marginals
  • Efficient computation, in …
Thanks to our graph structure choice!
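The exactness claim can be illustrated with a small sum-product sketch: on a tree, one pass from the leaves to the root and one pass back give exact marginals. This is a generic illustration, not the authors' code. It assumes a connected tree rooted at node 0, non-negative scores (exponentiated negative potentials), and a single symmetric pairwise table shared by all edges. MAP would use the same two passes with max in place of sum.

```python
from itertools import product


def tree_marginals(n_nodes, n_labels, edges, unary, pairwise):
    """Sum-product belief propagation on a connected tree (exact marginals)."""
    neighbours = {i: [] for i in range(n_nodes)}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Depth-first order from root 0: every parent appears before its children.
    order, parent, stack = [], {0: None}, [0]
    while stack:
        node = stack.pop()
        order.append(node)
        for nb in neighbours[node]:
            if nb not in parent:
                parent[nb] = node
                stack.append(nb)

    messages = {}  # messages[(i, j)][b]: message from i to j about label b of j

    def send(i, j):
        msg = []
        for b in range(n_labels):
            total = 0.0
            for a in range(n_labels):
                incoming = 1.0
                for nb in neighbours[i]:
                    if nb != j:
                        incoming *= messages[(nb, i)][a]
                total += unary[i][a] * pairwise[a][b] * incoming
            msg.append(total)
        norm = sum(msg)
        messages[(i, j)] = [m / norm for m in msg]  # normalize for stability

    for node in reversed(order):  # upward pass: leaves to root
        if parent[node] is not None:
            send(node, parent[node])
    for node in order:  # downward pass: root to leaves
        if parent[node] is not None:
            send(parent[node], node)

    marginals = []
    for i in range(n_nodes):
        belief = list(unary[i])
        for nb in neighbours[i]:
            belief = [belief[a] * messages[(nb, i)][a] for a in range(n_labels)]
        norm = sum(belief)
        marginals.append([b / norm for b in belief])
    return marginals


def brute_force_marginals(n_nodes, n_labels, edges, unary, pairwise):
    """Reference marginals by enumerating every labeling (tiny problems only)."""
    totals = [[0.0] * n_labels for _ in range(n_nodes)]
    for labelling in product(range(n_labels), repeat=n_nodes):
        score = 1.0
        for i, a in enumerate(labelling):
            score *= unary[i][a]
        for i, j in edges:
            score *= pairwise[labelling[i]][labelling[j]]
        for i, a in enumerate(labelling):
            totals[i][a] += score
    return [[t / sum(row) for t in row] for row in totals]


# Toy tree with 4 nodes and 2 labels; the pairwise table favours equal labels.
edges = [(0, 1), (1, 2), (1, 3)]
unary = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.6, 0.4]]
pairwise = [[2.0, 1.0], [1.0, 2.0]]
bp = tree_marginals(4, 2, edges, unary, pairwise)
exact = brute_force_marginals(4, 2, edges, unary, pairwise)
```

Each edge carries exactly two messages, so the cost grows linearly with the number of nodes, whereas the brute-force check grows exponentially.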

Results: NYU-D v2 Dataset
[Figure columns: GT, MAP]

Results: NYU-D v2 Dataset
• Confusion matrix
• Comparisons


Results: NYU-D v2 Dataset
• Some failures
[Figure columns: GT, MAP]


Marginal probabilities
• Provide very useful information for specific tasks, e.g.:
  • Specific object detection
  • Support inference
[Figure panels: P(Ground), P(Structure), P(Furniture), P(Props)]
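One way the marginals could prime a specific detector is sketched below: only superpixels whose marginal for the relevant general class is high enough are passed on. The function name, the threshold and the toy numbers are hypothetical, and the slides do not describe how detectors actually consume the marginals.

```python
CLASSES = ("ground", "structure", "furniture", "props")


def primed_regions(marginals, target_class, threshold=0.5):
    """Return indices of superpixels worth handing to a detector.

    marginals: one list of four probabilities per superpixel, ordered as CLASSES.
    threshold: hypothetical cut-off on the marginal of target_class.
    """
    idx = CLASSES.index(target_class)
    return [sp for sp, probs in enumerate(marginals) if probs[idx] >= threshold]


# Toy marginals for three superpixels.
marginals = [
    [0.70, 0.10, 0.10, 0.10],  # most likely ground
    [0.05, 0.05, 0.20, 0.70],  # most likely props
    [0.10, 0.20, 0.60, 0.10],  # most likely furniture
]
props_candidates = primed_regions(marginals, "props")
furniture_candidates = primed_regions(marginals, "furniture", threshold=0.2)
```

Lowering the threshold trades detector run time for recall, which is something a hard MAP labeling alone cannot offer.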

Conclusions
• We have presented a computationally efficient approach to semantic segmentation for priming object detection in indoor scenes.
• Our approach effectively uses 3D and image cues.
  • Depth discontinuities are evidence for occlusions.
• The MST over 3D keeps intra-class components coherently connected.

Discussion
                    Silberman et al. 2012                    Couprie et al. 2013          Ours
Features:           Bunch of engineered features (>1000D)    Learned features (>1000D)    Selected meaningful features (12D)
Local classifier:   Logistic Regression                      Neural Networks              k-NN
Graph structure:    Dense connections on image               None                         MST over 3D

Thanks!!
Cesar Jana
Funded by the US Army Research Office Grant W911NF

Working on:
• People detection, by Shenghui Zhou

Multi-view and video:
