SPONSORED BY SA2014.SIGGRAPH.ORG Annotating RGBD Images of Indoor Scenes Yu-Shiang Wong and Hung-Kuo Chu National Tsing Hua University CGV LAB.

Presentation transcript:

Outline
- Motivation
- Related Work
- Annotation Procedure
- User Study

Motivation
Scene understanding is a popular topic.
RGBD datasets with high-quality semantic annotations are valuable for:
- Learning
- Evaluation
Two fundamental problems: data acquisition and annotation.

RGBD Indoor Datasets
- Cornell-RGBD ( ): 24 labeled office scenes
- NYU2 ( ): 1449 labeled indoor scenes; 408,000+ RGBD video frames (unlabeled)
- SUN3D (2013): 415+ fully captured rooms; 10+ rooms are fully labeled, and annotations are propagated through the video
- UZH & ETH 3D Scanned Point Datasets (2014): 42 fully captured rooms; high-quality point clouds (unlabeled)
- Object Detection and Classification from Large-Scale Cluttered Indoor Scans (EG 2014)
- …

Motivation
Data annotation is a painstaking, tedious, and time-consuming task, and there is a great deal of data to annotate.
We need a good tool: an interactive tool for annotating RGBD indoor scenes.
Leverage both the cognitive ability of humans and the computational power of machines.

RELATED WORK

Image Annotation
- LabelMe: a database and web-based tool for image annotation. Russell et al., IJCV 2007
- SUN3D: A Database of Big Spaces Reconstructed using SfM and Object Labels. Xiao et al., ICCV 2013
- Cheaper by the Dozen: Group Annotation of 3D Data. Boyko et al., UIST 2014

Scene Understanding using RGBD Data
Image-based:
- Indoor segmentation and support inference from RGBD images. Silberman et al., ECCV
- RGB-(D) scene labeling: Features and algorithms. Ren et al., CVPR
Proxy-based:
- Imagining the unseen: Stability-based cuboid arrangements for understanding cluttered indoor scenes. Shao et al., SIGGRAPH Asia 2014
- PanoContext: A whole-room 3D context model for panoramic scene understanding. Zhang et al., ECCV 2014
- Holistic scene understanding for 3D object detection with RGBD cameras. Lin et al., ICCV
- 3D-based reasoning with blocks, support, and stability. Xiao et al., CVPR 2013

Annotation Procedure: Overview
Input: RGB-D image
Output: segmentation, labels, box proxies, support structure
[Diagram labels: User, Machine, Input, Output]

Annotation Procedure: Overview
Input: RGB-D image
Stages: Extract Room, Draw Scribbles, Estimate Boxes, Annotate Label and Structure (split into user sessions and machine sessions)
Output: annotated 3D structure

Annotation Procedure: Preprocessing
- Estimate normals.
- Perform over-segmentation using both the color map and the normal map (efficient graph-based image segmentation [Felzenszwalb et al. 2004]).
- The coarser segmentation is used for room estimation.
- The finer segmentation is used for user-assisted object segmentation.
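The slides do not say how normals are estimated. As a minimal sketch of one common approach, assuming the depth map has already been back-projected into an organized H x W grid of 3D points in camera coordinates, a per-pixel normal can be taken as the cross product of the horizontal and vertical finite differences. The function name `estimate_normals` and the grid layout are illustrative assumptions, not the authors' implementation.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if length == 0.0:
        return (0.0, 0.0, 0.0)  # degenerate neighborhood: no normal
    return (v[0] / length, v[1] / length, v[2] / length)

def estimate_normals(points):
    """points: H x W grid (list of rows) of (x, y, z) camera-space tuples.
    Returns an H x W grid of unit normals oriented toward the camera."""
    h, w = len(points), len(points[0])
    normals = [[(0.0, 0.0, 0.0)] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Finite differences, clamped at the image border.
            dx = sub(points[r][min(c + 1, w - 1)], points[r][max(c - 1, 0)])
            dy = sub(points[min(r + 1, h - 1)][c], points[max(r - 1, 0)][c])
            n = normalize(cross(dx, dy))
            # The camera sits at the origin; flip normals that point away from it.
            p = points[r][c]
            if n[0] * p[0] + n[1] * p[1] + n[2] * p[2] > 0.0:
                n = (-n[0], -n[1], -n[2])
            normals[r][c] = n
    return normals
```

On real sensor data the differences would usually be smoothed or computed over a larger window, since raw depth is noisy.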

Annotation Procedure: Extracting Room Layout
[Silberman 2012]

Annotation Procedure: User Scribbles (user step)
- Check the floor and wall hypotheses.
- If the hypotheses fail, the user clicks segments to identify the floor and walls.
- The user draws scribbles to extract the object segments.

Annotation Procedure: Estimating Boxes
Box orientation = finding an orthogonal basis in 3D (3 unknown directions).
We assume one direction of the box is parallel to the floor normal, which leaves 1 unknown direction; the last one follows from a cross product.
Box fitting method:
1. Filter the point cloud by KNN.
2. Project the point cloud of a box onto the floor plane.
3. Fit a line in the 2D domain to extract a major direction.
4. Use the cross product to extract the last direction.
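Steps 2 to 4 of the box fitting method can be sketched as follows. This is an illustration under assumptions, not the authors' code: the KNN filtering of step 1 is omitted, the 2D line fit is done with the principal axis of the 2x2 covariance matrix (the slides do not name the fitting method), and the names `fit_box_axes` and `box_extents` are hypothetical.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(dot(v, v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (v[0] / length, v[1] / length, v[2] / length)

def fit_box_axes(points, floor_normal):
    """Return (major, minor, up): an orthonormal box basis whose 'up'
    axis is the floor normal. points: list of (x, y, z) tuples."""
    n = normalize(floor_normal)
    # Build two orthonormal vectors u, v spanning the floor plane.
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalize(cross(n, helper))
    v = cross(n, u)
    # Step 2: project the points onto the floor plane (2D coordinates).
    coords = [(dot(p, u), dot(p, v)) for p in points]
    mean_a = sum(a for a, _ in coords) / len(coords)
    mean_b = sum(b for _, b in coords) / len(coords)
    saa = sum((a - mean_a) ** 2 for a, _ in coords)
    sbb = sum((b - mean_b) ** 2 for _, b in coords)
    sab = sum((a - mean_a) * (b - mean_b) for a, b in coords)
    # Step 3: fit a line in 2D; the angle of the principal axis of the
    # covariance matrix gives the major direction.
    theta = 0.5 * math.atan2(2.0 * sab, saa - sbb)
    major = tuple(math.cos(theta) * u[i] + math.sin(theta) * v[i] for i in range(3))
    # Step 4: the last direction follows from a cross product.
    minor = cross(n, major)
    return major, minor, n

def box_extents(points, axes):
    """(min, max) of the points along each box axis."""
    return [(min(dot(p, a) for p in points), max(dot(p, a) for p in points))
            for a in axes]
```

Given the three axes, `box_extents` yields the box size along each direction, which is all that is needed to define an oriented box proxy.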

Annotation Procedure: Annotate Label and 3D Structure (user step)
User tasks:
1. Type in the object label.
2. Drag an arrow to specify the support relationships.
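The slides list the annotation outputs (segments, labels, box proxies, support structure) but not how they are stored. One plausible record layout is sketched below; every class and field name here is a hypothetical choice for illustration. Each object stores the index of the object that supports it, and a dragged arrow becomes a call to `add_support`, which rejects cyclic support chains.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class BoxProxy:
    center: Vec3
    axes: Tuple[Vec3, Vec3, Vec3]  # orthonormal box directions
    half_sizes: Vec3               # half extent along each axis

@dataclass
class ObjectAnnotation:
    label: str                          # typed in by the user
    pixel_indices: List[int]            # segment extracted from scribbles
    box: Optional[BoxProxy] = None
    supported_by: Optional[int] = None  # index of the supporter; None = room/floor

@dataclass
class SceneAnnotation:
    objects: List[ObjectAnnotation] = field(default_factory=list)

    def add_support(self, child: int, parent: int) -> None:
        """Record that objects[parent] supports objects[child]."""
        # Walk up from the parent; reaching the child would mean a cycle.
        node: Optional[int] = parent
        while node is not None:
            if node == child:
                raise ValueError("support relationships must not form a cycle")
            node = self.objects[node].supported_by
        self.objects[child].supported_by = parent
```

Storing a single supporter per object keeps the structure a forest rooted at the room, which is easy to validate after each user action.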

Annotation Procedure: Box Quality Refinement (Optional) (user step)
User tasks:
1. Adjust the orientation of boxes.
2. Adjust the size of boxes.

USER STUDY

User Study: Settings
- Select 50 scenes across 7 scene classes from NYU2.
- Recruit 2 users; each user is asked to annotate the 50 scenes.
- Target classes: 24 merged object classes (bed, chair, cabinet, dresser, television, night stand, table, sofa, picture, pillow, …).
- Each scene contains 3-6 objects.

User Study: Results
Annotation time (50 scenes) (accuracy = 64%):

Task type          Mean time per box   Mean time per scene   Total time
Check Room         --                  1.6 sec               1.3 min
Draw Scribbles     16 sec              1 min                 51 min
Type Labels        4 sec               17 sec                13 min
Drag Supports      2 sec               9 sec                 7.5 min
Boxes Adjustment   11 sec              35 sec                29 min
TOTAL                                                        101 min

System processing time (computing normals, fitting planes and boxes): < 3 sec [in C++]
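As a quick consistency check on the reported timings, the per-task totals can be summed and compared with the per-scene means scaled to 50 scenes. The numbers below are copied from the results slide; the variable names are illustrative. The per-task totals add up to slightly under 102 minutes, in line with the reported total of 101 minutes, and the scaled per-scene means land close to the reported per-task totals, with the small gaps presumably due to rounding on the slide.

```python
# Reported total time per task, in minutes (from the results slide).
total_min = {
    "Check Room": 1.3,
    "Draw Scribbles": 51.0,
    "Type Labels": 13.0,
    "Drag Supports": 7.5,
    "Boxes Adjustment": 29.0,
}

# Reported mean time per scene, in seconds (from the results slide).
per_scene_sec = {
    "Check Room": 1.6,
    "Draw Scribbles": 60.0,
    "Type Labels": 17.0,
    "Drag Supports": 9.0,
    "Boxes Adjustment": 35.0,
}

NUM_SCENES = 50

# Sum of the per-task totals, to compare against the reported overall total.
overall_min = sum(total_min.values())

# Per-scene means scaled to the whole study, converted to minutes.
scaled_min = {task: sec * NUM_SCENES / 60.0 for task, sec in per_scene_sec.items()}

for task in total_min:
    print(f"{task:18s} reported {total_min[task]:5.1f} min, "
          f"scaled from per-scene mean {scaled_min[task]:5.1f} min")
print(f"Sum of per-task totals: {overall_min:.1f} min")
```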

Demo

Conclusion
An interactive system to facilitate annotating RGBD indoor scenes.
Generating high-quality ground truth data with rich annotations:
- Object segments
- Object labels
- 3D geometry
- 3D structure

Ongoing Work
The major bottleneck lies in manual operations:
- Drawing scribbles
- Refining box proxies
- Typing labels
- Specifying structure
Incorporate inference algorithms and 3D structure analysis to reduce the manual burden on the user.

THANK YOU!