Landmark-Based Visual-Inertial Odometry and SLAM
Leo Koppel 11/07/2017
Landmark-based methods | Overview
- Basic visual odometry (Geiger 2011)
- Multi-State Constraint Kalman Filter (Mourikis 2007)
- MSCKF for spacecraft (Mourikis 2009)
- Keyframe-based VIO (Leutenegger 2015)
Landmark-based methods | The visual-inertial combination
This presentation focuses on a few indirect, visual-inertial methods.
(Diagram: of all methods, we narrow down to camera + IMU, then indirect methods, then a few selected works.)
Landmark-based methods | The visual-inertial combination
Important distinctions within this category:
- Monocular vs. stereo
- Egomotion only (VIO) vs. mapping (SLAM)
- Filtering vs. optimization
- Loosely-coupled vs. tightly-coupled
"Apart from the separation into batch and filtering, the visual-inertial fusion approaches found in the literature can be divided into two other categories: loosely-coupled systems independently estimate the pose by a vision only algorithm and fuse IMU measurements only in a separate estimation step, limiting computational complexity. Tightly-coupled approaches in contrast include both the measurements from the IMU and the camera into a common problem where all states are jointly estimated, thus considering all correlations amongst them."
– Leutenegger, Stefan et al. "Keyframe-Based Visual–inertial Odometry Using Nonlinear Optimization." The International Journal of Robotics Research 34.3 (2015): 314–334.
Landmark-based methods
(Figure from Cadena, Cesar et al. "Past, Present, and Future of Simultaneous Localization and Mapping: Towards the Robust-Perception Age." 2016)
Landmark-based methods
(Pipeline diagram: visual input → feature detection, feature tracking, outlier rejection; inertial input → IMU propagation; both feed the back-end.)
Landmark-based methods | Part 1: An example of visual odometry
Geiger 2011 (LibVISO2) demonstrates the basic steps of a visual odometry system, with fewer details than VIO / SLAM methods.
Geiger, Andreas, Julius Ziegler, and Christoph Stiller. "StereoScan: Dense 3d reconstruction in real-time." Intelligent Vehicles Symposium (IV), 2011.
Landmark-based methods | VISO2
1.a) Feature detection: blob and corner detector
(From Geiger, Andreas, Julius Ziegler, and Christoph Stiller, "StereoScan: Dense 3d Reconstruction in Real-Time" slides)
Landmark-based methods | VISO2
1.b) Feature description: Sobel operator responses at 16 points (an arbitrary, empirical choice)
(From Geiger, Andreas, Julius Ziegler, and Christoph Stiller, "StereoScan: Dense 3d Reconstruction in Real-Time" slides)
Landmark-based methods | VISO2
2. Feature matching: match current-to-last image, and left-to-right with epipolar constraints
(From Geiger, Andreas, Julius Ziegler, and Christoph Stiller, "StereoScan: Dense 3d Reconstruction in Real-Time" slides)
Landmark-based methods | VISO2
3. Outlier rejection: displacements are compared to those of neighbouring features
(From Geiger, Andreas, Julius Ziegler, and Christoph Stiller, "StereoScan: Dense 3d Reconstruction in Real-Time" slides)
Landmark-based methods | VISO2
3. Outlier rejection (continued): matches are also thinned using bucketing, as in the sketch below
(From Geiger, Andreas, Julius Ziegler, and Christoph Stiller, "StereoScan: Dense 3d Reconstruction in Real-Time" slides)
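Below is a minimal sketch (my own illustration, not LibVISO2 code) of bucketing: the image is divided into a grid and only a few matches are kept per cell, so features stay evenly distributed across the image. The 50 px cell size and the cap of 2 matches per cell are illustrative values.

```python
import numpy as np

def bucket_matches(matches_uv, bucket_px=50, max_per_bucket=2):
    """matches_uv: (N, 2) array of (u, v) pixel locations of matched features."""
    buckets = {}   # (cell_u, cell_v) -> indices kept in that cell
    kept = []
    for idx, (u, v) in enumerate(matches_uv):
        key = (int(u // bucket_px), int(v // bucket_px))
        cell = buckets.setdefault(key, [])
        if len(cell) < max_per_bucket:
            cell.append(idx)
            kept.append(idx)
    return kept

# Example: thin 200 random matches in a KITTI-sized (1241 x 376) image
rng = np.random.default_rng(0)
uv = rng.uniform([0, 0], [1241, 376], size=(200, 2))
print(len(bucket_matches(uv)))
```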
Landmark-based methods | VISO2
4. Motion estimation, and 5. Filtering (EKF)
(From Geiger, Andreas, Julius Ziegler, and Christoph Stiller, "StereoScan: Dense 3d Reconstruction in Real-Time" slides)
Landmark-based methods | VISO2 takeaway
- Goal is real-time speed
- Very empirical; main contributions are various optimizations
- Demonstrates the basic steps of a VO pipeline
- Only works with two images at a time; all else is forgotten
Geiger, Andreas, Julius Ziegler, and Christoph Stiller. "StereoScan: Dense 3d reconstruction in real-time." Intelligent Vehicles Symposium (IV), 2011.
Landmark-based methods | Part 2: Multi-State Constraint Kalman Filter (MSCKF)
A filtering method for VIO (Mourikis 2007):
- Filtering method (EKF)
- Goal is pose estimation only (not mapping)
- Keeps a window of camera poses
- Covariance form
Mourikis, A. I., and S. I. Roumeliotis. "A Multi-State Constraint Kalman Filter for Vision-Aided Inertial Navigation." Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), April 2007.
Landmark-based methods | MSCKF
Some content from my presentation for ME 780 Autonomous Mobile Robotics (2017) follows.
Landmark-based methods | MSCKF
In the original paper, SIFT is suggested.
Landmark-based methods | MSCKF
The feature detector gives feature measurements (with correspondence) in each image. We build a track for each feature until it no longer appears, as in the sketch below.
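A minimal sketch (my own illustration, not from the paper) of accumulating feature tracks: each track collects one feature's 2D measurements over the frames where it is observed, and is handed to the estimator only once the feature disappears.

```python
from collections import defaultdict

class TrackManager:
    def __init__(self):
        # feature id -> list of (frame id, (u, v)) observations
        self.tracks = defaultdict(list)

    def add_frame(self, frame_id, observations):
        """observations: dict mapping feature id -> (u, v) measured in this frame.
        Returns the tracks completed in this frame (features no longer seen)."""
        for fid, uv in observations.items():
            self.tracks[fid].append((frame_id, uv))
        finished = [fid for fid in self.tracks if fid not in observations]
        return [self.tracks.pop(fid) for fid in finished]
```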
Landmark-based methods | MSCKF
That's our measurement: the estimator gets no information about the feature until its track is complete.
Landmark-based methods | MSCKF
The core problem: the measurement model is a function of the state, but how do we connect feature measurements to the robot state? The MSCKF is a way to use feature measurements while keeping the robot poses, but not the feature positions, in the estimator.
(From Prof. Waslander's lecture slides)
Landmark-based methods | MSCKF
The core problem: we need to get residuals in the form r = H x̃ + n, where n is zero-mean white noise. The measurement model is a function of the state, but how do we connect feature measurements to the robot state? The MSCKF is a way to use feature measurements while keeping the robot poses, but not the feature positions, in the estimator.
Landmark-based methods | MSCKF
The state vector includes the current pose plus N past poses.
Landmark-based methods | MSCKF
(State vector equation: orientation, biases, and position, together with the corresponding covariance.)
Landmark-based methods | MSCKF
Whenever we get a new camera frame, we augment the state and the state covariance, using the Jacobian relating the camera pose to the IMU pose (a minimal sketch follows).
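A minimal sketch of the covariance augmentation, assuming an error-state covariance P and a Jacobian J of the new camera pose with respect to the current state (shapes are illustrative, not taken from the paper):

```python
import numpy as np

def augment_covariance(P, J):
    """P: (n, n) state covariance.  J: (6, n) Jacobian of the new camera pose."""
    top = np.hstack([P, P @ J.T])             # existing states vs. new camera pose
    bottom = np.hstack([J @ P, J @ P @ J.T])  # new camera pose vs. everything
    return np.vstack([top, bottom])           # augmented (n+6, n+6) covariance
```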
Landmark-based methods | MSCKF
Note: quaternion representations are not covered here. MSCKF and other works use minimal representations: store the rotation as a quaternion, but express perturbations as Euler angles or axis-angle (in the tangent space). See:
- Trawny, Nikolas, and Stergios I. Roumeliotis. "Indirect Kalman Filter for 3D Attitude Estimation." 2005.
- Barfoot, Timothy, James R. Forbes, and Paul T. Furgale. "Pose estimation using linearized rotations and quaternion algebra." Acta Astronautica 68.1 (2011).
Landmark-based methods | MSCKF
Landmark-based methods | MSCKF
The IMU motion model is driven by the rotation and velocity measurements (a regular old motion model, except for the quaternion math).
Landmark-based methods | MSCKF
Update the state, and the covariance blocks, where Φ is the state transition matrix (a minimal sketch of the covariance propagation follows).
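A minimal sketch of the covariance-block propagation, assuming the IMU block and the IMU-camera cross-covariance are stored separately; Phi is the state transition matrix over the IMU interval and Q the discrete process noise (bias and noise modelling omitted for brevity):

```python
import numpy as np

def propagate_covariance(P_II, P_IC, Phi, Q):
    """P_II: IMU-state covariance block; P_IC: IMU/camera-pose cross-covariance;
    Phi: state transition matrix; Q: discrete process noise covariance."""
    P_II_new = Phi @ P_II @ Phi.T + Q   # IMU block is propagated with added noise
    P_IC_new = Phi @ P_IC               # cross-covariance with stored camera poses
    return P_II_new, P_IC_new
```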
Landmark-based methods | MSCKF
Note: this was the prediction (propagation) step of the EKF. The IMU measurements are treated as "inputs."
Landmark-based methods | MSCKF
Landmark-based methods | MSCKF
We trigger an update when:
- A tracked feature leaves the image
- The maximum number of states (N_max) is reached
Landmark-based methods | MSCKF
Two-view geometry is used to get an initial estimate: if we know the two camera centres C and ray directions d, we solve a simple least-squares system for the distances λ along each ray (a minimal sketch follows).
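A minimal sketch of this two-view initialisation, assuming the camera centres and ray directions are expressed in a common frame (my own notation, not the paper's code):

```python
import numpy as np

def triangulate_two_view(C1, d1, C2, d2):
    """C1, C2: camera centres (3-vectors); d1, d2: ray directions (3-vectors)."""
    A = np.column_stack([d1, -d2])                  # 3x2 system: C1 + l1*d1 = C2 + l2*d2
    b = C2 - C1
    lambdas, *_ = np.linalg.lstsq(A, b, rcond=None) # least-squares distances along the rays
    p1 = C1 + lambdas[0] * d1                       # point along the first ray
    p2 = C2 + lambdas[1] * d2                       # point along the second ray
    return 0.5 * (p1 + p2)                          # midpoint of closest approach
```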
Landmark-based methods | MSCKF
Landmark-based methods | MSCKF
(Measurement model equation: the feature position in the camera frame as a function of the global feature position and the camera pose.)
Landmark-based methods | MSCKF
Find the feature parameters that minimize the squared reprojection error, using the Levenberg-Marquardt (or Gauss-Newton) method, with an inverse-depth parametrization (a minimal sketch follows).
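A minimal Gauss-Newton sketch of this refinement (the paper uses Levenberg-Marquardt; the numerical Jacobian here is only for brevity), with the inverse-depth parameters [alpha, beta, rho] = [X/Z, Y/Z, 1/Z] expressed in the first camera:

```python
import numpy as np

def refine_inverse_depth(theta0, observations, iters=10, eps=1e-6):
    """theta0: initial [alpha, beta, rho] in the first camera.
    observations: list of (R, t, z), with (R, t) the pose of camera i relative
    to the first camera and z the measured normalized image point (2-vector)."""
    def predict(theta, R, t):
        # Feature in camera i, up to the positive scale 1/rho (which cancels)
        g = R @ np.array([theta[0], theta[1], 1.0]) + theta[2] * t
        return g[:2] / g[2]

    theta = np.array(theta0, dtype=float)
    for _ in range(iters):
        H, b = np.zeros((3, 3)), np.zeros(3)
        for R, t, z in observations:
            h = predict(theta, R, t)
            r = z - h                          # reprojection residual
            J = np.zeros((2, 3))               # numerical Jacobian dh/dtheta
            for k in range(3):
                d = np.zeros(3); d[k] = eps
                J[:, k] = (predict(theta + d, R, t) - h) / eps
            H += J.T @ J
            b += J.T @ r
        theta += np.linalg.solve(H, b)          # Gauss-Newton step
    return theta
```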
Landmark-based methods | MSCKF
Landmark-based methods | MSCKF
But wait! The linearized residual contains terms in both the camera pose error and the feature position error.
Clement, Lee E., et al. "The Battle for Filter Supremacy: A Comparative Study of the Multi-State Constraint Kalman Filter and the Sliding Window Filter." 2015.
Landmark-based methods | MSCKF
De-correlate this by projecting the residual onto the left null space of the feature Jacobian: the dimension of r_i is 2 M_j, and the dimension of r_o is 2 M_j − 3 (a minimal sketch follows).
Clement, Lee E., et al. "The Battle for Filter Supremacy: A Comparative Study of the Multi-State Constraint Kalman Filter and the Sliding Window Filter." 2015.
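A minimal sketch (assumed shapes, not the paper's implementation) of that projection: A spans the left null space of the feature Jacobian H_f, so the projected residual A^T r no longer depends on the feature position error and loses 3 dimensions, as stated above.

```python
import numpy as np

def project_out_feature(r, H_x, H_f):
    """r: (2*Mj,) stacked residual; H_x: pose Jacobian; H_f: (2*Mj, 3) feature Jacobian."""
    U, s, _ = np.linalg.svd(H_f, full_matrices=True)
    rank = int(np.sum(s > 1e-10))
    A = U[:, rank:]                 # columns spanning the left null space of H_f
    return A.T @ r, A.T @ H_x       # projected residual and pose Jacobian
```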
Landmark-based methods | MSCKF
Stack the residuals from all feature tracks together. To reduce the dimension, use a QR decomposition (a minimal sketch follows).
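A minimal sketch (assumed shapes) of the QR-based compression: only the component of the stacked residual lying in the column space of the stacked Jacobian carries information about the state, so the effective measurement shrinks to the thin R factor.

```python
import numpy as np

def compress_measurement(r, H):
    """r: (m,) stacked residual; H: (m, n) stacked Jacobian, m >= n."""
    Q1, TH = np.linalg.qr(H)   # thin QR: H = Q1 @ TH
    r_n = Q1.T @ r             # compressed residual of dimension n
    return r_n, TH             # use (r_n, TH) in the EKF update
```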
Landmark-based methods | MSCKF
Landmark-based methods | MSCKF
Finally, the EKF update. Mourikis uses the Joseph-form covariance update, which, according to Shimkin, "may be more computationally expensive, but has the following advantages:
• Numerically, it is guaranteed to preserve positive-definiteness (Pk+ > 0).
• As mentioned, it holds for any gain K (not just the optimal) that may be used in the measurement update."
Shimkin, N. "Derivations of the Discrete-Time Kalman Filter: The Stochastic State-Space Model." 2009.
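A minimal sketch of the Joseph-form update quoted above (standard EKF algebra, not code from the paper):

```python
import numpy as np

def joseph_update(P, K, H, R):
    """Joseph-form covariance update: symmetric, PSD-preserving, valid for any gain K."""
    I = np.eye(P.shape[0])
    A = I - K @ H
    return A @ P @ A.T + K @ R @ K.T
```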
Landmark-based methods | MSCKF
The most expensive part is the QR decomposition, which is linear in the number of measurements.
Mourikis, A. I., and S. I. Roumeliotis. "A Multi-State Constraint Kalman Filter for Vision-Aided Inertial Navigation." Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), April 2007.
Landmark-based methods | MSCKF
Koppel, L. “Implementation of the Multi-State Constraint Kalman Filter” for ME 780, 2017
Landmark-based methods | MSCKF
Error goes down in steps (though it must gradually drift upward); this is related to the delayed linearization.
Koppel, L. "Implementation of the Multi-State Constraint Kalman Filter," for ME 780, 2017.
Landmark-based methods | Part 3: Vision-Aided Inertial Navigation for Spacecraft Entry, Descent, and Landing
An application of MSCKF (Mourikis 2009).
Mourikis, Anastasios I., et al. "Vision-aided inertial navigation for spacecraft entry, descent, and landing." IEEE Transactions on Robotics 25.2 (2009).
Landmark-based methods | Entry, Descent, and Landing
This was an experiment at JPL; they launched a sounding rocket to actually verify their system.
(From Mourikis, Anastasios I., et al. "Vision-aided inertial navigation for spacecraft entry, descent, and landing." IEEE Transactions on Robotics 25.2, 2009.)
Landmark-based methods | Entry, Descent, and Landing
Several interesting differences from road vehicles:
- The position error when images begin to be taken is large: "100 m for small bodies, and of 10 km for the Mars Science Laboratory (MSL) at the time of parachute deployment." (I don't think MSL used this system in 2011; it had a 20 km landing ellipse.)
- "Initial altitude and attitude are provided by altimeters and star-trackers, respectively… Thus, at least four DOFs are known fairly accurately."
- The IMU estimate diverged after parachute opening, due to jerky motion.
(From Mourikis, Anastasios I., et al. "Vision-aided inertial navigation for spacecraft entry, descent, and landing." IEEE Transactions on Robotics 25.2, 2009.)
Landmark-based methods | Entry, Descent, and Landing
Without landmark localization, the spacecraft can estimate:
- Altitude (altimeter)
- Attitude / orientation (star tracker, IMU)
- Velocity (Doppler radar, IMU)
- Position: altitude (vertical position) can be estimated pretty well; lateral position only poorly.
(From Mourikis, Anastasios I., et al. "Vision-aided inertial navigation for spacecraft entry, descent, and landing." IEEE Transactions on Robotics 25.2, 2009.)
Landmark-based methods | Entry, Descent, and Landing
The system uses two kinds of features:
- Mapped Landmarks (ML)
- Opportunistic Features (OF)
MLs are qualitatively different from OFs: since their global position is known, they "make the camera pose observable" (can be used to find absolute position) if at least three are seen. OFs only give an estimate of velocity.
Landmark-based methods | Entry, Descent, and Landing
(System diagram: feature tracking feeding the new MSCKF.)
(From Mourikis, Anastasios I., et al. "Vision-aided inertial navigation for spacecraft entry, descent, and landing." IEEE Transactions on Robotics 25.2, 2009.)
Landmark-based methods | Entry, Descent, and Landing
Harris corners are used for feature matching; the scale invariance of SIFT is not needed, since altitude is known.
- The update step uses all ML measurements in the image, in addition to OF tracks.
- OF measurements are only started when ML correspondences drop off.
I don't cover the feature matching algorithm here; see Mourikis, Anastasios I., et al. "Vision-aided inertial navigation for spacecraft entry, descent, and landing." IEEE Transactions on Robotics 25.2 (2009).
Landmark-based methods | Entry, Descent, and Landing
Matching mapped landmarks (ML): they achieved errors of 0.16 m/s in velocity and 6.4 m in position at touchdown, much better than without vision!
(From Mourikis, Anastasios I., et al. "Vision-aided inertial navigation for spacecraft entry, descent, and landing." IEEE Transactions on Robotics 25.2, 2009.)
Landmark-based methods | Part 4: Keyframe-Based Visual-Inertial SLAM Using Nonlinear Optimization
Leutenegger 2013 / 2015. While the MSCKF is a recursive filtering method, this is a batch method using nonlinear optimization.
Leutenegger, S., et al. "Keyframe-Based Visual-Inertial SLAM Using Nonlinear Optimization." Proc. of Robotics: Science and Systems (2013).
Landmark-based methods | Keyframe-based SLAM
The paper starts similarly, defining the state vector (position, orientation, speed, IMU biases), the rotation representation, and the Jacobians.
Leutenegger, Stefan et al. "Keyframe-Based Visual–inertial Odometry Using Nonlinear Optimization." The International Journal of Robotics Research 34.3 (2015).
Landmark-based methods | Keyframe-based SLAM
A batch method with a graph representation (like GraphSLAM). The many IMU measurements between each "pose" (camera frame) are combined into one factor.
Leutenegger, Stefan et al. "Keyframe-Based Visual–inertial Odometry Using Nonlinear Optimization." The International Journal of Robotics Research 34.3 (2015).
Landmark-based methods | Keyframe-based SLAM
The cost function to be minimized sums the reprojection errors (weighted by their information matrices) over all cameras i, all frames k, and all visible landmarks j, plus the IMU error terms (weighted by their information) over all frames:
J(x) = Σ_i Σ_k Σ_j (e_r^{i,j,k})^T W_r^{i,j,k} e_r^{i,j,k} + Σ_k (e_s^k)^T W_s^k e_s^k
The authors use Ceres.
Leutenegger, Stefan et al. "Keyframe-Based Visual–inertial Odometry Using Nonlinear Optimization." The International Journal of Robotics Research 34.3 (2015).
Landmark-based methods | Keyframe-based SLAM
IMU measurements are integrated between camera frames. The IMU model and integration method are as in MSCKF; the authors specifically cite Mourikis 2007. The Jacobians are not trivial, but are not included here.
Leutenegger, Stefan et al. "Keyframe-Based Visual–inertial Odometry Using Nonlinear Optimization." The International Journal of Robotics Research 34.3 (2015).
Landmark-based methods | Keyframe-based SLAM
Feature detection: Harris corner detector (custom implementation) with BRISK descriptors.
Feature matching: performed against the last keyframe.
- 3D-2D matching for landmarks already in the map
- 2D-2D matching for features without a known landmark
Landmark-based methods | Keyframe-based SLAM
Feature detection: Harris corner detector (custom implementation) with BRISK descriptors.
Feature matching:
- 3D-2D matching for landmarks already in the map: propagate the IMU to estimate the current pose, estimate positions of known landmarks, match descriptors (brute force), and reject outliers using Mahalanobis distance from the pose estimates and RANSAC.
- 2D-2D matching for features without a known landmark: initialize landmark positions via triangulation.
"Given the current pose prediction, all landmarks that should be visible are considered for brute-force descriptor matching. Outliers are only rejected afterwards. This scheme may seem illogical to the reader who might intuitively want to apply the inverse order in the sense of a guided matching strategy; however, owing to the super-fast matching of binary descriptors, it would actually be more expensive to first look at image-space consistency."
– Leutenegger, Stefan et al. "Keyframe-Based Visual–inertial Odometry Using Nonlinear Optimization." The International Journal of Robotics Research 34.3 (2015).
Landmark-based methods | Keyframe-based SLAM
(Figure: matches between the last keyframe and the current frame, colour-coded as 3D-2D match, 2D-2D match, L-R match only, or no match.)
Leutenegger, Stefan et al. "Keyframe-Based Visual–inertial Odometry Using Nonlinear Optimization." The International Journal of Robotics Research 34.3 (2015).
Landmark-based methods | Keyframe-based SLAM
When is a frame a keyframe? We want to make a new keyframe when the stored features are no longer prominent (but not too often).
- Leutenegger 2015 suggests: if the hull of matched landmarks covers <50% of the image, or if the ratio of matched to detected keypoints is <20% (a minimal sketch follows).
- Leutenegger 2013 suggests: if the ratio of hulls (matched : all detected) is <50 or 60%.
(Figure: examples of a small hull and a small ratio.)
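A minimal sketch of the 2015 heuristic, using the thresholds quoted above (the hull area and match counts are assumed to be computed elsewhere):

```python
def is_new_keyframe(matched_hull_area, image_area, n_matched, n_detected):
    """Make a new keyframe when matched landmarks cover too little of the image
    or too few of the detected keypoints were matched."""
    coverage = matched_hull_area / image_area
    match_ratio = n_matched / max(n_detected, 1)
    return coverage < 0.5 or match_ratio < 0.2
```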
Landmark-based methods | Keyframe-based SLAM
Recall marginalization from the Sliding Window Filter.
Sibley, Gabe, Larry Matthies, and Gaurav Sukhatme. "Sliding window filter with application to planetary landing." Journal of Field Robotics 27.5 (2010).
Landmark-based methods | Keyframe-based SLAM
Here we use the same thing: marginalize old states out of the linearized system of information matrix and error residuals, then solve for the correction to the state (see paper). A minimal sketch of the Schur-complement step follows.
Leutenegger, Stefan et al. "Keyframe-Based Visual–inertial Odometry Using Nonlinear Optimization." The International Journal of Robotics Research 34.3 (2015).
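A minimal sketch of Schur-complement marginalization (standard algebra, not the paper's implementation): states indexed by `drop` are removed from the linearized system H Δx = b, leaving an information matrix and residual over the kept states that summarise the dropped ones.

```python
import numpy as np

def marginalize(H, b, keep, drop):
    """H: symmetric information matrix; b: residual vector;
    keep, drop: index arrays partitioning the state."""
    H_ll = H[np.ix_(keep, keep)]
    H_lm = H[np.ix_(keep, drop)]
    H_mm_inv = np.linalg.inv(H[np.ix_(drop, drop)])
    H_marg = H_ll - H_lm @ H_mm_inv @ H_lm.T   # Schur complement
    b_marg = b[keep] - H_lm @ H_mm_inv @ b[drop]
    return H_marg, b_marg
```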
Landmark-based methods | Keyframe-based SLAM
They use two windows: M keyframes and S "temporal" frames.
Leutenegger, Stefan et al. "Keyframe-Based Visual–inertial Odometry Using Nonlinear Optimization." The International Journal of Robotics Research 34.3 (2015).
Landmark-based methods | Keyframe-based SLAM
A non-keyframe falls out of the temporal window (M=3, S=3): the IMU term is marginalized, but its landmark measurements are discarded.
Leutenegger, Stefan et al. "Keyframe-Based Visual–inertial Odometry Using Nonlinear Optimization." The International Journal of Robotics Research 34.3 (2015).
Landmark-based methods | Keyframe-based SLAM
A keyframe falls out of the window (M=3, S=3).
Leutenegger, Stefan et al. "Keyframe-Based Visual–inertial Odometry Using Nonlinear Optimization." The International Journal of Robotics Research 34.3 (2015).
Landmark-based methods | Keyframe-based SLAM
The authors used a helmet-mounted rig, and compared against an improved MSCKF (2012).
Leutenegger, Stefan et al. "Keyframe-Based Visual–inertial Odometry Using Nonlinear Optimization." The International Journal of Robotics Research 34.3 (2015).
Landmark-based methods | Keyframe-based SLAM
They did better.
Leutenegger, Stefan et al. "Keyframe-Based Visual–inertial Odometry Using Nonlinear Optimization." The International Journal of Robotics Research 34.3 (2015).
Landmark-based methods | Keyframe-based SLAM
The authors' open-source implementation is OKVIS: Open Keyframe-based Visual-Inertial SLAM.
Landmark-based methods | Other works
The current KITTI Odometry (Stereo) leader is a variant of SOFT (Cvišić 2015). Note that KITTI Odometry does not include an IMU!
(Table from Janai, Joel et al. "Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art." 2017.)
Landmark-based methods | SOFT
SOFT (Cvišić 2015):
- Feature detector and descriptor from LibVISO2
- Features are tracked for more than one frame
- More complicated feature selection, within each 50x50 px bucket
Cvišić, Igor, and Ivan Petrović. "Stereo odometry based on careful feature selection and tracking." Mobile Robots (ECMR), 2015 European Conference on. IEEE, 2015.
Landmark-based methods | SOFT
SOFT (Cvišić 2015), continued:
- Feature detector and descriptor from LibVISO2
- Features are tracked for more than one frame
- More complicated feature selection
- Different egomotion estimation (separate rotation and translation)
Cvišić, Igor, and Ivan Petrović. "Stereo odometry based on careful feature selection and tracking." Mobile Robots (ECMR), 2015 European Conference on. IEEE, 2015.
Landmark-based methods | SOFT
(Figure: VISO2 result.)
Landmark-based methods | SOFT
(Figure: SOFT result.)
Landmark-based methods | Keyframe-based SLAM
Thank you.
Landmark-based methods | SOFT takeaway / Discussion topics
- Algorithm choices often seem empirical
- Is there something to emulate here?
- Should we value KITTI benchmark results?