Multi-Output Learning for Camera Relocalization Abner Guzmán-Rivera UIUC Pushmeet Kohli Ben Glocker Jamie Shotton Toby Sharp Andrew Fitzgibbon Shahram.

Presentation transcript:

Multi-Output Learning for Camera Relocalization
Abner Guzmán-Rivera (UIUC)
Pushmeet Kohli, Ben Glocker, Jamie Shotton, Toby Sharp, Andrew Fitzgibbon, Shahram Izadi (Microsoft Research)

Camera Relocalization from RGB-D images
• Known: 3D model of the world
• Observed: a single RGB-D frame
• Where is the camera? 6D camera pose H (rotation and translation)

Applications
• Large-scale 3D model reconstruction

Applications
• Vehicle, robot, etc. localization

Applications
• Augmented Reality

Other Approaches to Localization
• Sparse key-point matching:
– Detectors: [Rosten et al. PAMI’10], [Holzer et al. ECCV’12]
– Descriptors: [Winder and Brown CVPR’07], [Calonder et al. ECCV’10], [Rublee et al. ICCV’11]
– Matching: [Lepetit and Fua PAMI’06], [Nistér and Stewénius CVPR’06], [Schindler et al. CVPR’07]
– Pose estimation: [Irschara et al. CVPR’09], [Dong et al. ICCV’09], [Yi et al. ECCV’10], [Baatz et al. IJCV’11], [Sattler et al. ICCV’11]
• Whole key-frame matching: [Klein and Murray ECCV’08], [Gee and Mayol-Cuevas BMVC’12]
• Epitomic location recognition: [Ni et al. PAMI’09]

Relocalization as Inverse Problem
• Find the pose H* minimizing the error in a rendering of the model
[Diagram: 3D model of scene, view “renderer”, rendering error, input RGB-D frame]
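One way to write the inverse problem on this slide as a formula. The notation is assumed here and does not come from the slides: M is the 3D model, I is the input RGB-D frame, R is the view renderer, and E is the rendering error.

```latex
H^{*} \;=\; \operatorname*{arg\,min}_{H \in SE(3)} \; E\bigl(R(M, H),\, I\bigr)
```

SE(3) denotes the set of rigid 6D poses (rotation and translation), which matches the pose H described earlier in the talk.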

Inverse Problem
[Diagram: discriminative predictor]

Inverse Problem

Single Predictor Not Powerful Enough
• Limited expressivity
• The mapping is one-to-many
[Figure: input frame]

Approx. Inverse Problem: Stage 1
• Portfolio of discriminative predictors
• Want complementary or “diverse” predictions

Approx. Inverse Problem: Stage 2
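The transcript does not spell out Stage 2. The sketch below assumes it keeps the Stage 1 hypothesis with the lowest reconstruction error, which is consistent with the later slides on rendering error. The names `relocalize`, `portfolio`, and `reconstruction_error` are hypothetical, and the toy 1-D "pose" is only for illustration.

```python
from typing import Callable, Sequence, TypeVar

Frame = TypeVar("Frame")
Pose = TypeVar("Pose")


def relocalize(
    frame: Frame,
    portfolio: Sequence[Callable[[Frame], Pose]],
    reconstruction_error: Callable[[Pose, Frame], float],
) -> Pose:
    """Two-stage approximate inversion (illustrative sketch only)."""
    # Stage 1: every predictor in the portfolio proposes one pose hypothesis.
    hypotheses = [predict(frame) for predict in portfolio]
    # Stage 2 (assumed): keep the hypothesis with the lowest reconstruction error.
    return min(hypotheses, key=lambda pose: reconstruction_error(pose, frame))


# Toy usage: the "pose" is a single number and the error is its distance
# to the (equally toy) frame value.
toy_portfolio = [lambda f: 0.0, lambda f: 2.0, lambda f: 5.0]
best = relocalize(1.8, toy_portfolio, lambda pose, f: abs(pose - f))
```

Under this assumption the portfolio only needs one good hypothesis per frame, which is why complementary predictions matter more than each predictor being accurate everywhere.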

How to train such a portfolio of complementary predictors?

Discriminative Predictor [Shotton et al. CVPR’13]

Scene Coordinate Regression Forests [Shotton et al. CVPR’13]
• Pixel comparison features (depth and RGB)
• (x, y, z) world coordinate
[Figure: regression tree and regression forest]

Scene Coordinate Regression Forests [Shotton et al. CVPR’13]
• Sample pixels from the input RGB-D frame
• Forest predicts 3D world coordinates
• Inliers for several hypotheses from RANSAC: H1, H2, H3, H4, H5, H6, ...
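To make the inlier idea concrete, here is a minimal sketch of scoring one pose hypothesis against the forest's predictions. It assumes a pose is given as a rotation matrix and translation mapping camera-space points to world space, and that a pixel is an inlier when the mapped point lies within a distance threshold of its predicted world coordinate. The function names and the threshold value are hypothetical and not taken from the slides.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def transform(rotation: Mat3, translation: Vec3, point: Vec3) -> Vec3:
    """Apply the rigid transform R * p + t to a 3D point."""
    return tuple(
        sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
        for i in range(3)
    )


def count_inliers(
    rotation: Mat3,
    translation: Vec3,
    camera_points: Sequence[Vec3],
    predicted_world_points: Sequence[Vec3],
    threshold: float = 0.1,  # assumed inlier distance, in the scene's units
) -> int:
    """Count pixels whose mapped camera point agrees with the prediction."""
    inliers = 0
    for cam, world in zip(camera_points, predicted_world_points):
        if math.dist(transform(rotation, translation, cam), world) <= threshold:
            inliers += 1
    return inliers


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
camera_points = [(0.0, 0.0, 1.0), (1.0, 1.0, 2.0), (0.0, 2.0, 3.0)]
predicted_world = [(1.0, 0.0, 1.0), (2.0, 1.0, 2.0), (5.0, 5.0, 5.0)]

# Hypothesis A shifts the camera by +1 along x; hypothesis B is the identity.
score_a = count_inliers(identity, (1.0, 0.0, 0.0), camera_points, predicted_world)
score_b = count_inliers(identity, (0.0, 0.0, 0.0), camera_points, predicted_world)
```

In a RANSAC loop, many such hypotheses would be generated from small random subsets of pixels and ranked by a score of this kind. The third point here plays the role of a wrong forest prediction that no reasonable pose explains.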

Learning a portfolio of predictors
Would like to train a set of predictors to output a set of hypotheses that:
1. Are relevant, i.e., approximate local minimizers
2. Summarize the output space well

Learning a portfolio: previous work
• Multiple Choice Learning [Guzman-Rivera et al. NIPS’12, AISTATS’14]
– The “set min-loss” (oracle loss) penalizes the portfolio for the error of the best prediction in the output set, measured with a standard task-loss
– The portfolio is NOT penalized for being diverse
– Set min-loss applies to standard datasets
– Iterative training of a fixed-size portfolio
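The set min-loss described on this slide can be sketched in a few lines. The helper names (`oracle_loss`, `absolute_error`) and the scalar toy outputs are assumptions for illustration; the slides do not show the formula itself.

```python
from typing import Callable, Sequence, TypeVar

Y = TypeVar("Y")


def oracle_loss(
    predictions: Sequence[Y],
    target: Y,
    task_loss: Callable[[Y, Y], float],
) -> float:
    """Set min-loss: the task-loss of the best prediction in the set."""
    if not predictions:
        raise ValueError("the portfolio must output at least one prediction")
    return min(task_loss(target, prediction) for prediction in predictions)


def absolute_error(target: float, prediction: float) -> float:
    """A stand-in for the standard task-loss."""
    return abs(target - prediction)


# Two toy portfolios evaluated against the same ground truth of 9.0.
diverse = [0.0, 5.0, 10.0]
clustered = [4.0, 4.25, 4.5]
loss_diverse = oracle_loss(diverse, 9.0, absolute_error)
loss_clustered = oracle_loss(clustered, 9.0, absolute_error)
```

Only the closest prediction counts, so the spread-out portfolio scores better here even though two of its three outputs are far from the target. That is the sense in which the portfolio is not penalized for being diverse.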

Learning a portfolio of predictors
We already have the objective to optimize and propose to approximate (1) by
[Equation not preserved in the transcript; its labels read: portfolio of predictors, CVPR’13 SCoRe Forest]

Multi-Output Loss
[Equation label: standard task-loss]
– The portfolio is NOT penalized for being diverse
– The learning procedure is able to tune the portfolio to the reconstruction error that will be used at test time
– Next we describe one way to achieve diversity

Training Algorithm

Loss to Example Weights
[Equation labels: diversity parameter (“variance” of the weights); multi-output loss for example j]
• Intuition: want the next predictor to emphasize accuracy on the examples that have been difficult thus far
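The weighting formula itself is not preserved in the transcript. The sketch below uses an assumed form, w_j proportional to 1 - exp(-loss_j / sigma^2), chosen only because it matches the stated intuition: examples with a higher multi-output loss so far receive more weight, and sigma plays the role of the diversity parameter. The function name and the uniform fallback are hypothetical.

```python
import math
from typing import List, Sequence


def losses_to_weights(losses: Sequence[float], sigma: float) -> List[float]:
    """Map non-negative per-example losses to normalized training weights.

    ASSUMED form (not from the slides): w_j ~ 1 - exp(-loss_j / sigma**2).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not losses:
        return []
    raw = [1.0 - math.exp(-loss / sigma**2) for loss in losses]
    total = sum(raw)
    if total == 0.0:
        # Every example is already solved: fall back to uniform weights.
        return [1.0 / len(losses)] * len(losses)
    return [value / total for value in raw]


# Examples that the portfolio handles poorly so far get larger weights.
weights = losses_to_weights([0.0, 0.5, 2.0], sigma=1.0)
```

With this assumed form, a small sigma saturates quickly so that every unsolved example gets a similar weight, while a large sigma makes the weights roughly proportional to the losses.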

Rendering Error

L1 Rendering Error
[Figure: input frame]
1. Raycast a depth frame for some hypothesis
2. Evaluate the L1 distance between the input depth and the raycast depth
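Step 2 above can be sketched as follows, taking the raycast depth frame from step 1 as given (the raycaster is not implemented here). Two details are assumptions and not from the slides: missing depth readings are encoded as `None` and skipped, and the error is averaged over the valid pixels so that frames with different amounts of missing data stay comparable.

```python
from typing import Optional, Sequence

DepthImage = Sequence[Sequence[Optional[float]]]


def l1_rendering_error(input_depth: DepthImage, raycast_depth: DepthImage) -> float:
    """Mean absolute depth difference over pixels valid in both images."""
    total = 0.0
    valid = 0
    for input_row, raycast_row in zip(input_depth, raycast_depth):
        for d_input, d_raycast in zip(input_row, raycast_row):
            if d_input is None or d_raycast is None:
                continue  # assumed convention: None marks a missing depth value
            total += abs(d_input - d_raycast)
            valid += 1
    if valid == 0:
        return float("inf")  # nothing to compare, so treat as the worst case
    return total / valid


# Toy 2x2 depth frames in metres; one input pixel has no depth reading.
observed = [[1.0, 2.0], [None, 4.0]]
rendered = [[1.5, 2.0], [3.0, 3.0]]
error = l1_rendering_error(observed, rendered)
```

A lower value means the hypothesized pose renders a depth frame closer to what the camera actually observed.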

Results

7-Scenes Dataset [Shotton et al. CVPR’13, Glocker et al. ISMAR’13]

Metric
• Proportion Correct (single prediction)
– Correct if translational error ≤ 5 cm AND rotational error ≤ 5°

Competing Approaches
• CVPR13: Scene Coordinate Regression Forests [Shotton et al. CVPR’13]
• CVPR13 + M-Best
– Take the M-Best RANSAC hypotheses
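The Proportion Correct metric above can be sketched as below. The slides give only the two thresholds; the pose representation (a 3x3 rotation matrix plus a translation in metres) and all function names are assumptions. The rotational error is computed as the angle of the relative rotation, recovered from its trace.

```python
import math
from typing import List, Sequence, Tuple

Mat3 = Sequence[Sequence[float]]
Vec3 = Tuple[float, float, float]
Pose = Tuple[Mat3, Vec3]


def rotation_error_degrees(r_est: Mat3, r_gt: Mat3) -> float:
    """Angle of the relative rotation R_gt^T R_est, in degrees."""
    # trace(R_gt^T R_est) equals the element-wise product summed over entries.
    trace = sum(r_gt[i][j] * r_est[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


def is_correct(estimate: Pose, truth: Pose) -> bool:
    """Correct if translation error <= 5 cm AND rotation error <= 5 degrees."""
    translation_error = math.dist(estimate[1], truth[1])  # metres (assumed unit)
    return (
        translation_error <= 0.05
        and rotation_error_degrees(estimate[0], truth[0]) <= 5.0
    )


def proportion_correct(estimates: Sequence[Pose], truths: Sequence[Pose]) -> float:
    if not estimates:
        raise ValueError("need at least one frame to evaluate")
    return sum(is_correct(e, t) for e, t in zip(estimates, truths)) / len(estimates)


def rot_z(degrees: float) -> List[List[float]]:
    """Rotation about the z-axis, used only to build the toy example."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


truth = (rot_z(0.0), (0.0, 0.0, 0.0))
estimates = [
    (rot_z(3.0), (0.03, 0.0, 0.0)),   # within both thresholds
    (rot_z(10.0), (0.01, 0.0, 0.0)),  # rotation too far off
    (rot_z(0.0), (0.10, 0.0, 0.0)),   # translation too far off
]
score = proportion_correct(estimates, [truth] * 3)
```

Because both conditions must hold, a frame with a near-perfect position but a 10° rotation error still counts as incorrect.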

Office
[Figure: input frame and multiple predictions; ground truth in white, prediction in magenta]

Stairs
[Figure: input frame and multiple predictions; ground truth in white, prediction in magenta]

All Scene Average
[Plot: proportion correct vs. size of portfolio]

All Scene Average
[Plot: proportion correct vs. size of portfolio, using aggregation]

Summary
• Camera relocalization as an inverse problem
• Portfolio of complementary discriminative predictors
• Method to learn such a portfolio
• State-of-the-art camera relocalization