Toward Object Discovery and Modeling via 3-D Scene Comparison
Evan Herbst, Peter Henry, Xiaofeng Ren, Dieter Fox
University of Washington; Intel Research Seattle

Overview
- Goal: learn about an environment by tracking changes in it over time
- Detect objects that occur in different places at different times
- Handle textureless objects
- Avoid appearance/shape priors
- Represent a map with static + dynamic parts

Algorithm Outline
- Input: two RGB-D videos
- Mapping & reconstruction of each video
- Interscene alignment
- Change detection
- Spatial regularization
- Outputs: reconstructed static background; segmented movable objects

Scene Reconstruction
- Mapping based on RGB-D Mapping [Henry et al., ISER '10]
- Visual odometry, loop-closure detection, pose-graph optimization, bundle adjustment

Scene Reconstruction
- Mapping based on RGB-D Mapping [Henry et al., ISER '10]
- Surface representation: surfels (a data-structure sketch follows)

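In a surfel map, each surface patch is a small oriented disc. Below is a minimal sketch of such a record in Python; the field names are illustrative, not taken from the RGB-D Mapping implementation.

```python
# Minimal surfel record (assumed fields; illustrative, not the authors' code).
from dataclasses import dataclass
import numpy as np

@dataclass
class Surfel:
    position: np.ndarray   # 3-D center of the disc in the map frame
    normal: np.ndarray     # unit surface normal
    color: np.ndarray      # RGB in [0, 1]
    radius: float          # disc radius covering the local surface patch
    confidence: float      # accumulated measurement support
```
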
Scene Differencing
- Given two scenes, find the parts that differ
- Surfaces in the two scenes are similar iff the object didn't move
- Comparison at each surface point

Scene Differencing
- Given two scenes, find the parts that differ
- Comparison at each surface point
- Start by globally aligning the scenes (an alignment sketch follows)
[Figure: scene alignment illustrated in 2-D and 3-D]

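As a stand-in for the global alignment step, the sketch below refines an initial guess with point-to-plane ICP from the Open3D library. This is an assumption for illustration, not the alignment pipeline used in RGB-D Mapping.

```python
# Refine a rough initial alignment between two scene point clouds with
# point-to-plane ICP (Open3D); returns a 4x4 rigid transform source -> target.
import numpy as np
import open3d as o3d

def align_scenes(source_pts, target_pts, init=np.eye(4), max_dist=0.05):
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(source_pts))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(target_pts))
    tgt.estimate_normals()  # point-to-plane ICP needs target normals
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, max_dist, init,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return result.transformation
```
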
Naïve Scene Differencing
- Easy algorithm: closest point within δ → same surface (a sketch of this baseline follows)
- Ignores color, surface orientation
- Ignores occlusions

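A minimal sketch of this baseline, assuming each scene is given as an N x 3 array of surface points; the function name and the threshold value are illustrative.

```python
# Naive differencing: a point "moved" iff the other scene has no surface
# point within delta of it. Ignores color, normals, and occlusion.
import numpy as np
from scipy.spatial import cKDTree

def naive_diff(points_a, points_b, delta=0.01):
    dists, _ = cKDTree(points_b).query(points_a)  # nearest neighbor in B
    return dists > delta  # True where a point of scene A is labeled "moved"
```
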
Scene Differencing
- Model the probability that a surface point moved
- Sensor readings z; expected measurement z*
- Move indicator m ∈ {0, 1}
[Figure: expected measurement z* and readings z0-z3 taken at frames 0, 10, 25, 49]

Sensor Models
- Model the probability that a surface point moved
- Sensor readings z; expected measurement z*
- By Bayes, p(m | z, z*) ∝ p(z | m, z*) p(m)
- Two sensor measurement models:
  - with no expected surface (point moved, m = 1)
  - with expected surface (point static, m = 0)

Sensor Models
- Two sensor measurement models
- With expected surface:
  - Depth: uniform + exponential + Gaussian [1]
  - Color: uniform + Gaussian
  - Orientation: uniform + Gaussian
[Figure: depth likelihood peaked at the expected depth z_d*]
[1] Thrun et al., Probabilistic Robotics, 2005

Sensor Models
- Two sensor measurement models
- With expected surface:
  - Depth: uniform + exponential + Gaussian [1]
  - Color: uniform + Gaussian
  - Orientation: uniform + Gaussian
- With no expected surface:
  - Depth: uniform + exponential
  - Color: uniform
  - Orientation: uniform
(a likelihood sketch follows)
[1] Thrun et al., Probabilistic Robotics, 2005

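Below is a sketch of the two depth models and the resulting Bayes update for the move indicator m; the mixture weights and noise parameters are placeholders, not values from the paper.

```python
import numpy as np

def p_depth_expected(z, z_star, sigma=0.01, lam=0.5, z_max=5.0,
                     w=(0.7, 0.2, 0.1)):
    """p(z | m=0, z*): Gaussian at the expected depth z* (hit),
    exponential for early returns (occlusion), uniform for noise."""
    w_hit, w_short, w_rand = w
    hit = np.exp(-0.5 * ((z - z_star) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    short = lam * np.exp(-lam * z) if z < z_star else 0.0
    return w_hit * hit + w_short * short + w_rand / z_max

def p_depth_no_surface(z, lam=0.5, z_max=5.0, w_short=0.5):
    """p(z | m=1): no expected surface, so exponential + uniform only."""
    return w_short * lam * np.exp(-lam * z) + (1.0 - w_short) / z_max

def p_moved(z, z_star, prior=0.5):
    """Posterior p(m=1 | z, z*) by Bayes' rule, as on the previous slide."""
    num = p_depth_no_surface(z) * prior
    den = num + p_depth_expected(z, z_star) * (1.0 - prior)
    return num / den
```
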
Example Result
[Figures: differencing results for Scene 1 and Scene 2]

Spatial Regularization
- Points treated independently so far
- MRF to label each surfel moved or not moved
- Data term given by the pointwise evidence
- Smoothness term: Potts, weighted by curvature
(an energy sketch follows)

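Schematically, the MRF minimizes E(x) = Σ_i -log p(m = x_i | z_i) + Σ_(i,j) w_ij · [x_i ≠ x_j] over labels x_i ∈ {0 (static), 1 (moved)}, with edge weights w_ij modulated by curvature. The sketch below minimizes this energy with simple ICM for brevity; the optimizer choice and the exact curvature weighting are assumptions, not the paper's method.

```python
# Iterated conditional modes on a pairwise Potts MRF.
# unary: (N, 2) costs -log p(m = x_i | z_i); edges: list of (i, j) index
# pairs; weights: per-edge Potts penalties w_ij (e.g. lower at high curvature).
import numpy as np

def icm(unary, edges, weights, iters=10):
    labels = np.argmin(unary, axis=1)
    nbrs = [[] for _ in range(len(unary))]
    for (i, j), w in zip(edges, weights):
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    for _ in range(iters):
        for i in range(len(unary)):
            cost = unary[i].copy()
            for j, w in nbrs[i]:
                cost[1 - labels[j]] += w  # Potts: pay w_ij if labels disagree
            labels[i] = np.argmin(cost)
    return labels
```
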
Spatial Regularization
- Points treated independently so far
- MRF to label each surfel moved or not moved
[Figures: Scene 1 and Scene 2, pointwise vs. regularized labelings]

Experiments
- Trained MRF on four scenes (1.4M surfels)
- Tested on twelve scene pairs (8.0M surfels)
- 70% error reduction wrt max-class baseline

                  Baseline          Ours
                  Count     %       Count     %
  Total surfels   8.0M      100     8.0M      100
  Moved surfels   250k      3       250k      3
  Errors          250k      3       55.5k     0.7
  False pos       0         0       4.5k      0.06
  False neg       250k      3       51.0k     0.64

Experiments
- Results: complex scene

Experiments
- Results: large object

Next Steps
- All scenes in one optimization
- Model completion from many scenes
- Train more-supervised object segmentation

Conclusion
- Segment movable objects in 3-D using scene changes over time
- Represent a map as static + dynamic parts
- Extensible sensor model for RGB-D sensors

Using More Than 2 Scenes
- Given our framework, it is straightforward to combine evidence from multiple scenes (see the log-odds sketch below):
  p(m | z^(1), ..., z^(S)) ∝ p(m) · Π_s p(z^(s) | m)^(w_scene)
- w_scene could be chosen to weight all scenes (rather than frames) equally, or to upweight scenes taken under good lighting
- Other ways to subsample frames: e.g., as in keyframe selection in mapping

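In log-odds form the weighted product above becomes a weighted sum, which is easy to sketch; the uniform-prior assumption and the function name are illustrative.

```python
# Fuse per-scene move posteriors for one surfel via weighted log-odds.
import numpy as np

def fuse_scenes(per_scene_probs, scene_weights=None):
    p = np.asarray(per_scene_probs, dtype=float)   # p(m=1 | scene s)
    w = np.ones_like(p) if scene_weights is None else np.asarray(scene_weights)
    log_odds = np.sum(w * (np.log(p) - np.log1p(-p)))
    return 1.0 / (1.0 + np.exp(-log_odds))         # combined p(m=1)
```
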
First Sensor Model: Surface Didn't Move
- Modeling sensor measurements:
  - Depth: uniform + exponential + Gaussian *
  - Color, normal: uniform + Gaussian; mixing controlled by the probability that the beam hit the expected surface
[Figure: depth likelihood peaked at the expected depth z_d*]
* Fox et al., "Markov Localization...", JAIR '99

Experiments
- Trained MRF on four scenes (2.7M surfels)
- Tested on twelve scene pairs (8.0M surfels)
- 250k moved surfels; we get 4.5k FP, 51k FN
- 65% error reduction wrt max-class baseline
- Extract foreground segments as "objects"

Overview
- Many visits to the same area over time
- Find objects by motion

(extra) Related Work
- Probabilistic sensor models
  - Depth only
  - Depth & color, with extra independence assumptions
- Static + dynamic maps
  - In 2-D
  - Usually not modeling objects

Depth-Dependent Color/Normal Model
- Modeling sensor measurements: color and normal mix uniform and Gaussian components according to the probability that the depth reading hit the expected surface
- Combine depth/color/normal (see the sketch below):
  p(z | m, z*) = p(z_d | m, z_d*) · p(z_c | z_d, m, z_c*) · p(z_n | z_d, m, z_n*)

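A sketch of this combination, with the depth term reduced to Gaussian + uniform for brevity; alpha is the responsibility of the depth model's "hit" component, and all parameter values are illustrative placeholders.

```python
import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def combined_likelihood(z_d, z_c, z_n, z_d_star, z_c_star, z_n_star,
                        sigma_d=0.01, sigma_c=0.1, sigma_n=0.2,
                        z_max=5.0, u_c=1.0, u_n=0.5, w_hit=0.8):
    # Depth: Gaussian "hit" on the expected surface + uniform otherwise.
    p_hit = w_hit * gaussian(z_d, z_d_star, sigma_d)
    p_miss = (1.0 - w_hit) / z_max
    p_depth = p_hit + p_miss
    # alpha = probability this depth reading came from the expected surface.
    alpha = p_hit / p_depth
    # Color/normal: Gaussian if the beam hit the surface, uniform if it didn't.
    p_color = alpha * gaussian(z_c, z_c_star, sigma_c) + (1.0 - alpha) * u_c
    p_normal = alpha * gaussian(z_n, z_n_star, sigma_n) + (1.0 - alpha) * u_n
    return p_depth * p_color * p_normal
```
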