Acquiring 3D Indoor Environments with Variability and Repetition
Young Min Kim, Stanford University
Niloy J. Mitra, UCL / KAUST
Dong-Ming Yan, KAUST
Leonidas Guibas, Stanford University
Data Acquisition via Microsoft Kinect
Raw data:
– Noisy point clouds
– Unsegmented
– Occlusion issues
Our tool: Microsoft Kinect
– Real-time
– Provides depth and color
– Small and inexpensive
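To make the raw data concrete, here is a minimal sketch of back-projecting a Kinect depth frame into an unorganized point cloud; the pinhole intrinsics and depth scale below are common illustrative defaults, not calibration values from this work.

```python
import numpy as np

def depth_to_pointcloud(depth_mm, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Back-project a Kinect-style depth image (in millimeters) to 3D points.

    Assumes a pinhole camera model; the intrinsics are illustrative defaults,
    not calibration values from this work."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float64) / 1000.0   # depth in meters
    valid = z > 0                              # the sensor reports 0 where depth is missing
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=1)  # (N, 3) noisy, unsegmented points

# Usage with a synthetic 640x480 frame at 1.5 m:
# cloud = depth_to_pointcloud(np.full((480, 640), 1500, dtype=np.uint16))
```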
Dealing with Point Cloud Data
Object-level reconstruction [Chang and Zwicker 2011]
Scene-level reconstruction [Xiao et al. 2012]
Mapping Indoor Environments
Mapping outdoor environments:
– Roads to drive vehicles
– Flat surfaces
General indoor environments contain both objects and flat surfaces:
– Diversity of objects of interest
– Objects are often cluttered
– Objects deform and move
Solution: utilize semantic information
Nature of Indoor Environments
Man-made objects can often be well-approximated by simple building blocks:
– Geometric primitives
– Low-DOF joints
Many repeating elements:
– Chairs, desks, tables, etc.
Relations between objects give good recognition cues
Indoor Scene Understanding with Point Cloud Data
Patch-based approach [Silberman et al. 2012] [Koppula et al. 2011]
Object-level understanding [Shao et al. 2012] [Nan et al. 2012]
Comparisons
[1] An Interactive Approach to Semantic Modeling of Indoor Scenes with an RGBD Camera
[2] A Search-Classify Approach for Cluttered Indoor Scene Understanding

              [1]               [2]                 Ours
Prior model   3D database       3D database         Learned
Deformation   Scaling           Part-based scaling  Learned
Matching      Classifier        Classifier          Geometric
Segmentation  User-assisted     Iteration           Iteration
Data          Microsoft Kinect  Mantis Vision       Microsoft Kinect
Contributions
Novel approach based on a learning stage:
– The learning stage builds a model that is specific to the environment
Build an abstract model composed of simple parts and the relationships between them:
– Uniquely explains possible low-DOF deformations
Recognition stage can quickly acquire large-scale environments:
– About 200 ms per object
Approach
Learning: build a high-level model of the repeating elements
Recognition: use the model and relationships to recognize the objects
(figure: translational and rotational deformation modes)
Approach
Learning:
– Build a high-level model of the repeating elements
Output Model: Simple, Lightweight Abstraction
Primitives
– Observable faces
Connectivity
– Rigid
– Rotational
– Translational
– Attachment
Relationship
– Placement information
(figure: contact, translational, and rotational connections)
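One possible way to encode such an abstraction as plain data structures is sketched below; the class and field names are hypothetical and chosen for illustration, not the representation used in the system.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple
import numpy as np

class JointType(Enum):
    RIGID = "rigid"
    ROTATIONAL = "rotational"        # hinge, e.g., a laptop lid
    TRANSLATIONAL = "translational"  # slide, e.g., a drawer
    ATTACHMENT = "attachment"        # loosely placed on another part

@dataclass
class Primitive:
    """A box-like part described only by its observable faces."""
    center: np.ndarray               # (3,) position
    axes: np.ndarray                 # (3, 3) local frame
    extents: np.ndarray              # (3,) half-sizes along the axes

@dataclass
class Joint:
    parent: int                      # index of the parent primitive
    child: int                       # index of the child primitive
    kind: JointType
    axis: Optional[np.ndarray] = None             # rotation/translation axis for low-DOF joints
    limits: Optional[Tuple[float, float]] = None  # allowed deformation range

@dataclass
class ObjectModel:
    """Learned abstraction of one repeating object (e.g., an office chair)."""
    primitives: List[Primitive] = field(default_factory=list)
    joints: List[Joint] = field(default_factory=list)
    placement: Dict[str, str] = field(default_factory=dict)  # e.g., {"rests_on": "ground_plane"}
```

A chair, for instance, would reduce to a few planar primitives (seat, back, legs) connected by rigid or rotational joints plus a placement relation to the ground plane.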
Joint Matching and Fitting
Individual segmentation
– Group by similar normals
Initial matching
– Focus on large parts
– Use size, height, relative positions
– Keep consistent matches
Joint primitive fitting
– Add joints if necessary
– Incrementally complete the model
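The "group by similar normals" step can be approximated by greedy region growing over a k-nearest-neighbor graph, as in the sketch below; unit-length per-point normals are assumed precomputed, and the neighborhood size and angle threshold are illustrative values.

```python
import numpy as np
from scipy.spatial import cKDTree

def segment_by_normals(points, normals, k=16, angle_thresh_deg=15.0):
    """Greedy region growing: neighboring points whose normals agree within a
    threshold are grouped into the same (near-planar) segment. The neighborhood
    size and angle threshold are illustrative choices."""
    tree = cKDTree(points)
    cos_thresh = np.cos(np.radians(angle_thresh_deg))
    labels = np.full(len(points), -1, dtype=int)
    current = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        stack = [seed]
        labels[seed] = current
        while stack:
            i = stack.pop()
            _, nbrs = tree.query(points[i], k=k)
            for j in np.atleast_1d(nbrs):
                # abs() merges front- and back-facing normals of the same face
                if labels[j] == -1 and abs(np.dot(normals[i], normals[j])) > cos_thresh:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels  # one segment id per point
```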
Approach
Learning:
– Build a high-level model of the repeating elements
Recognition:
– Use the model and relationships to recognize the objects
Hierarchy
Ground plane and desk
Objects
– Isolated clusters
Parts
– Group by normals
The segmentation is approximate and is corrected later
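A minimal sketch of the object level of this hierarchy, assuming the ground plane and desk planes have already been detected: points left after removing those planes are split into isolated clusters by flood-filling a fixed-radius neighborhood graph. The plane threshold, radius, and minimum cluster size are illustrative values.

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_plane(points, normal, offset, dist_thresh=0.02):
    """Discard points within dist_thresh of the plane n·x + d = 0
    (e.g., a detected ground plane or desk top)."""
    dist = np.abs(points @ normal + offset)
    return points[dist > dist_thresh]

def isolated_clusters(points, radius=0.05, min_size=100):
    """Group the remaining points into isolated clusters (candidate objects)
    by flood-filling a fixed-radius neighborhood graph."""
    tree = cKDTree(points)
    labels = np.full(len(points), -1, dtype=int)
    visited = np.zeros(len(points), dtype=bool)
    cluster_id = 0
    for seed in range(len(points)):
        if visited[seed]:
            continue
        stack, members = [seed], []
        visited[seed] = True
        while stack:
            i = stack.pop()
            members.append(i)
            for j in tree.query_ball_point(points[i], r=radius):
                if not visited[j]:
                    visited[j] = True
                    stack.append(j)
        if len(members) >= min_size:           # keep only sizeable clusters
            labels[np.array(members)] = cluster_id
            cluster_id += 1
    return labels                               # -1 marks clutter / small fragments
```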
Bottom-Up Approach
Initial assignment: parts vs. primitives
– Simple comparison of height, normal, size
– Robust to deformation
– Low false-negative rate
Refined assignment: objects vs. models
– Iteratively solve for position, deformation, and segmentation
– Low false-positive rate
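The initial assignment could be scored roughly as below, comparing a part's height, dominant normal, and face size against a learned primitive node; the feature weights, the dict layout for the primitive, and the z-up ground assumption are illustrative, not taken from the paper.

```python
import numpy as np

def part_features(points, normals):
    """Summarize a segmented part by the cues used for the coarse match:
    height above the ground, dominant normal, and approximate face size.
    Assumes a z-up frame with the ground at z = 0 and unit normals."""
    height = points[:, 2].mean()
    normal = normals.mean(axis=0)
    normal /= np.linalg.norm(normal)
    extent = points.max(axis=0) - points.min(axis=0)
    size = np.sort(extent)[-2:].prod()        # area spanned by the two largest extents
    return height, normal, size

def assignment_cost(part, primitive, w_h=1.0, w_n=1.0, w_s=1.0):
    """Lower is better. `primitive` is a dict holding the learned height, normal,
    and size of one primitive node; the weights are illustrative. Keeping the
    comparison this coarse tolerates deformation (few false negatives)."""
    h, n, s = part
    return (w_h * abs(h - primitive["height"])
            + w_n * (1.0 - abs(float(np.dot(n, primitive["normal"]))))
            + w_s * abs(np.log((s + 1e-9) / (primitive["size"] + 1e-9))))

# Each part is tentatively assigned to its lowest-cost primitive node;
# the refined stage then confirms or rejects whole objects.
```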
Bottom-Up Approach
Initial assignment: parts vs. primitive nodes
Refined assignment: objects vs. models
(figure: input points → initial objects → models matched → refined objects)
Results
Data available:
– door/paper_docs/data_learning.zip
– door/paper_docs/data_recognition.zip
Synthetic Scene
Recognition speed: about 200 ms per object
Synthetic Scene
Synthetic Scene
(figures: synthetic-scene results on a different pair and a similar pair of objects)
Office 1
(figure: trash bin, 4 chairs, 2 monitors, 2 whiteboards)
Office 2
Office 3
Deformations
(figure: drawer deformations, monitor, laptop, chair; one missed monitor)
Auditorium 1
(figure: open table)
Auditorium 2
(figure: open table, open chairs)
Seminar Room 1
(figure: missed chairs)
Seminar Room 2
(figure: missed chairs)
Limitations
Missing data
– Occlusion, material, …
Errors in initial segmentation
– Cluttered objects are merged into a single segment
– The viewpoint sometimes separates a single object into pieces
Conclusion
We present a system that recognizes repeating objects in cluttered 3D indoor environments.
We use a purely geometric approach based on learned attributes and deformation modes.
The recognized objects provide high-level scene understanding and can be replaced with high-quality CAD models for visualization (as shown in the previous talks!).
Thank You
– Qualcomm Corporation
– Max Planck Center for Visual Computing and Communications
– NSF grants and a KAUST AEA grant
– Marie Curie Career Integration Grant
– Stanford Bio-X Travel Subsidy