Multi-Output Learning for Camera Relocalization
Abner Guzmán-Rivera (UIUC); Pushmeet Kohli, Ben Glocker, Jamie Shotton, Toby Sharp, Andrew Fitzgibbon, Shahram Izadi (Microsoft Research)
Camera Relocalization from RGB-D Images
Known: a 3D model of the world. Observed: a single RGB-D frame. Question: where is the camera? Output: the 6D camera pose H (rotation and translation).
Applications: large-scale 3D model reconstruction
Applications: localization of vehicles, robots, etc.
Applications: augmented reality
Other Approaches to Localization
Sparse key-point matching:
– Detectors: [Rosten et al. PAMI’10], [Holzer et al. ECCV’12]
– Descriptors: [Winder and Brown CVPR’07], [Calonder et al. ECCV’10], [Rublee et al. ICCV’11]
– Matching: [Lepetit and Fua PAMI’06], [Nistér and Stewénius CVPR’06], [Schindler et al. CVPR’07]
– Pose estimation: [Irschara et al. CVPR’09], [Dong et al. ICCV’09], [Yi et al. ECCV’10], [Baatz et al. IJCV’11], [Sattler et al. ICCV’11]
Whole key-frame matching: [Klein and Murray ECCV’08], [Gee and Mayol-Cuevas BMVC’12]
Epitomic location recognition: [Ni et al. PAMI’09]
Relocalization as an Inverse Problem
Find the pose H* that minimizes the error in a rendering of the model, compared against the input RGB-D frame.
[Diagram: 3D model of scene, view “renderer”, rendering error, input RGB-D frame]
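The inverse-problem statement can be written compactly as follows. The symbols are assumptions introduced here for illustration and do not appear on the slide: $I$ is the input RGB-D frame, $\mathcal{M}$ the 3D scene model, $R(H; \mathcal{M})$ the view rendered from pose $H$, and $E$ the rendering error.

```latex
H^{*} \;=\; \operatorname*{arg\,min}_{H \in SE(3)} \; E\bigl(I,\; R(H; \mathcal{M})\bigr)
```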
Inverse Problem: discriminative predictor
Inverse Problem
Single Predictor Not Powerful Enough
– Limited expressivity
– The mapping from input frame to pose is one-to-many
Approximate Inverse Problem, Stage 1: a portfolio of discriminative predictors. We want complementary, or “diverse”, predictions.
Approximate Inverse Problem, Stage 2
How do we train such a portfolio of complementary predictors?
Discriminative Predictor [Shotton et al. CVPR’13]
Scene Coordinate Regression Forests [Shotton et al. CVPR’13]
Regression tree: pixel-comparison features (depth and RGB) lead to a predicted (x, y, z) world coordinate
Regression forest: an ensemble of such trees
Scene Coordinate Regression Forests [Shotton et al. CVPR’13]
1. Sample pixels from the input RGB-D frame
2. The forest predicts their 3D world coordinates
3. RANSAC produces several pose hypotheses (H1, H2, H3, H4, H5, H6, ...), each with its own inliers
Learning a portfolio of predictors
We would like to train a set of predictors to output a set of hypotheses that:
1. Are relevant, i.e., approximate local minimizers
2. Summarize the output space well
Learning a portfolio: previous work
Multiple Choice Learning [Guzman-Rivera et al. NIPS’12, AISTATS’14]
The set min-loss (“oracle” loss) penalizes the portfolio for the error, under the standard task-loss, of the best prediction in its output:
– The portfolio is NOT penalized for being diverse
– Set min-loss applies to standard datasets
– Iterative training of a fixed-size portfolio
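The set min-loss idea can be illustrated with a minimal sketch. The function names, the scalar outputs, and the absolute-error task-loss are all hypothetical choices made for this example; they are not taken from the cited papers.

```python
from typing import Callable, List, Sequence, TypeVar

Y = TypeVar("Y")


def set_min_loss(
    predictions: Sequence[Y],
    target: Y,
    task_loss: Callable[[Y, Y], float],
) -> float:
    """Oracle loss: the task-loss of the best prediction in the set."""
    if not predictions:
        raise ValueError("the portfolio produced no predictions")
    return min(task_loss(p, target) for p in predictions)


def abs_error(pred: float, target: float) -> float:
    """Toy task-loss (an assumption for this example)."""
    return abs(pred - target)


# Two toy portfolios predicting a scalar; the true value is 9.0.
target = 9.0
diverse: List[float] = [1.0, 5.0, 9.0]      # spread over the output space
clustered: List[float] = [4.0, 4.5, 5.0]    # three near-identical guesses

loss_diverse = set_min_loss(diverse, target, abs_error)
loss_clustered = set_min_loss(clustered, target, abs_error)
```

Only the best member of each set is scored, so the spread-out portfolio pays nothing for its two poor guesses, while the clustered portfolio is charged for missing the target with every member.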
Learning a portfolio of predictors
We already have the objective to optimize, and we propose to approximate (1) by a portfolio of predictors, each one a CVPR’13 SCoRe Forest.
Multi-Output Loss (defined in terms of the standard task-loss)
– The portfolio is NOT penalized for being diverse
– The learning procedure can tune the portfolio to the reconstruction error that will be used at test time
– Next we describe one way to achieve diversity
Training Algorithm
Loss to Example Weights
The weight of example j is computed from the multi-output loss for example j and a diversity parameter (the “variance” of the weights).
Intuition: we want the next predictor to emphasize accuracy on the examples that have been difficult thus far.
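The weighting formula itself did not survive on this slide, so the sketch below only illustrates the stated intuition. The functional form `1 - exp(-loss / sigma**2)`, the normalization, and the names `loss_to_weights` and `sigma` are assumptions made for this example, not the authors' definition.

```python
import math
from typing import List, Sequence


def loss_to_weights(losses: Sequence[float], sigma: float) -> List[float]:
    """Map per-example losses (assumed non-negative) to normalized weights.

    ASSUMED form: w_j is proportional to 1 - exp(-loss_j / sigma^2), so
    examples that the portfolio handles poorly so far get larger weights.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if not losses:
        return []
    raw = [1.0 - math.exp(-loss / sigma**2) for loss in losses]
    total = sum(raw)
    if total == 0.0:
        # Every example is already solved: fall back to uniform weights.
        return [1.0 / len(losses)] * len(losses)
    return [r / total for r in raw]


# Example: one solved example, one moderately hard, one very hard.
weights = loss_to_weights([0.0, 1.0, 4.0], sigma=1.0)
```

Under this assumed form, a small `sigma` saturates quickly and treats all unsolved examples almost equally, while a large `sigma` makes the weights nearly proportional to the losses.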
Rendering Error
L1 Rendering Error, given an input frame:
1. Raycast a depth frame for some pose hypothesis
2. Evaluate the L1 distance between the input depth and the raycast depth
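Step 2 can be sketched as below, assuming the raycast depth frame from step 1 is already available as a 2D grid. The averaging over pixels, the convention that a depth of 0 marks an invalid pixel, and the name `l1_rendering_error` are assumptions made for this example rather than details given on the slide.

```python
from typing import Sequence

Depth = Sequence[Sequence[float]]  # rows of depth values


def l1_rendering_error(input_depth: Depth, raycast_depth: Depth) -> float:
    """Mean absolute depth difference over pixels valid in both frames.

    ASSUMPTIONS: depth <= 0 means "no measurement", and the L1 distance
    is averaged so that frames with different valid areas are comparable.
    """
    total = 0.0
    count = 0
    for row_in, row_rc in zip(input_depth, raycast_depth):
        for d_in, d_rc in zip(row_in, row_rc):
            if d_in > 0.0 and d_rc > 0.0:
                total += abs(d_in - d_rc)
                count += 1
    if count == 0:
        return float("inf")  # nothing to compare: treat as worst case
    return total / count


# Toy 2x2 frames; the bottom-left input pixel has no depth reading.
observed = [[1.0, 2.0], [0.0, 3.0]]
raycast_a = [[1.0, 2.5], [1.0, 3.0]]  # close to the observation
raycast_b = [[2.0, 3.0], [1.0, 4.0]]  # off by 1 everywhere

error_a = l1_rendering_error(observed, raycast_a)
error_b = l1_rendering_error(observed, raycast_b)
```

A hypothesis whose raycast agrees with the observed depth scores lower, so this error can rank competing pose hypotheses.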
Results
7-Scenes Dataset [Shotton et al. CVPR’13, Glocker et al. ISMAR’13]
Metric
Proportion Correct (single prediction)
– Correct if translational error ≤ 5 cm AND rotational error ≤ 5°
Competing Approaches
CVPR13: Scene Coordinate Regression Forests [Shotton et al. CVPR’13]
CVPR13 + M-Best
– Take the M best RANSAC hypotheses
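The correctness criterion can be made concrete with a small sketch. It assumes poses are given as a 3x3 rotation matrix plus a translation vector in metres, and it measures rotational error as the angle of the relative rotation; the function names and this pose representation are assumptions made for the example.

```python
import math
from typing import List, Sequence

Matrix3 = Sequence[Sequence[float]]
Vector3 = Sequence[float]


def translation_error(t_est: Vector3, t_gt: Vector3) -> float:
    """Euclidean distance between camera positions (metres assumed)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_est, t_gt)))


def rotation_error_deg(r_est: Matrix3, r_gt: Matrix3) -> float:
    """Angle of the relative rotation R_est^T R_gt, in degrees."""
    # trace(R_est^T R_gt) equals the sum of element-wise products.
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def is_correct(t_est: Vector3, r_est: Matrix3, t_gt: Vector3, r_gt: Matrix3,
               max_trans_m: float = 0.05, max_rot_deg: float = 5.0) -> bool:
    """Correct if translation <= 5 cm AND rotation <= 5 degrees."""
    return (translation_error(t_est, t_gt) <= max_trans_m
            and rotation_error_deg(r_est, r_gt) <= max_rot_deg)


def rot_z(deg: float) -> List[List[float]]:
    """Rotation about the z axis, used only to build test poses."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


identity = rot_z(0.0)
origin = [0.0, 0.0, 0.0]
flags = [
    is_correct([0.03, 0.0, 0.0], rot_z(3.0), origin, identity),   # both within
    is_correct([0.03, 0.0, 0.0], rot_z(10.0), origin, identity),  # rotation off
    is_correct([0.20, 0.0, 0.0], rot_z(3.0), origin, identity),   # translation off
]
proportion_correct = sum(flags) / len(flags)
```

Both thresholds must hold at once, so a frame with an accurate position but a poor orientation still counts as incorrect.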
Office [Figure: input frame and multiple predictions; ground truth in white, prediction in magenta]
Stairs [Figure: input frame and multiple predictions; ground truth in white, prediction in magenta]
All-Scene Average [Plot: proportion correct vs. size of portfolio]
All-Scene Average, using aggregation [Plot: proportion correct vs. size of portfolio]
Summary
– Camera relocalization as an inverse problem
– Portfolio of complementary discriminative predictors
– Method to learn such a portfolio
– State-of-the-art camera relocalization