High-Precision Globally-Referenced Position and Attitude via a Fusion of Visual SLAM, Carrier-Phase-Based GPS, and Inertial Measurements
Daniel Shepard and Todd Humphreys
2014 IEEE/ION PLANS Conference, Monterey, CA | May 8, 2014
Overview
Globally-Referenced Visual SLAM
Motivating Application: Augmented Reality
Estimation Architecture
Bundle Adjustment (BA)
Simulation Results for BA
Stand-Alone Visual SLAM
Produces high-precision estimates of:
  Camera motion (with ambiguous scale for monocular SLAM)
  A map of the environment
Limited in application due to the lack of a global reference [1]
[1] G. Klein and D. Murray, "Parallel tracking and mapping for small AR workspaces," in 6th IEEE and ACM International Symposium on Mixed and Augmented Reality. IEEE, 2007, pp. 225–234.
Visual SLAM with Fiduciary Markers
Globally-referenced solution if the fiduciary markers are globally referenced
Requires substantial infrastructure and/or mapping effort
Example: Microsoft's augmented-reality maps (TED 2010 [2])
[2] B. A. y Arcas, "Blaise Aguera y Arcas demos augmented-reality maps," TED, Feb. 2010, http://www.ted.com/talks/blaise_aguera.html.
Can globally-referenced position and attitude (pose) be recovered by combining visual SLAM and GPS?
Observability of Visual SLAM + GPS
No GPS positions: translation, rotation, and scale all unobservable
1 GPS position: translation observable; rotation and scale unobservable
2 GPS positions: translation and scale observable; rotation only partially observable (~), since rotation about the antenna baseline remains ambiguous
3 (non-collinear) GPS positions: translation, rotation, and scale all observable
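A minimal numeric sketch of the partially-observable case, under the assumption that the SLAM solution can be rotated as a rigid body about the two known antenna positions; the function name and values below are purely illustrative, not from the paper:

```python
import numpy as np

def rot_about_axis(axis, angle):
    """Rodrigues' formula: rotation by `angle` about the unit vector along `axis`."""
    k = axis / np.linalg.norm(axis)
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

# Two globally-referenced antenna positions (illustrative values)
a1, a2 = np.array([0.0, 0.0, 0.0]), np.array([10.0, 0.0, 0.0])

# Rotating the whole SLAM solution about the baseline a2 - a1 (centered on a1)
# maps both antenna positions onto themselves, so two GPS fixes cannot
# distinguish this rotation: it is the unobservable degree of freedom above.
R = rot_about_axis(a2 - a1, 0.3)
print(np.allclose(a1 + R @ (a1 - a1), a1), np.allclose(a1 + R @ (a2 - a1), a2))  # True True
```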
Combined Visual SLAM and CDGPS
CDGPS anchors visual SLAM to a global reference frame
An IMU can be added to improve dynamic performance (not required!)
Can be made inexpensive
Requires little infrastructure
Very accurate!
Motivating Application: Augmented Reality
Augmenting a live view of the world with computer-generated sensory input to enhance one's current perception of reality [3]
Current applications are limited by the lack of accurate global pose
Potential uses in construction, real estate, gaming, and social media
[3] M. Graham, M. Zook, and A. Boulton, "Augmented reality in urban places: contested content and the duplicity of code," Transactions of the Institute of British Geographers, 2013.
Estimation Architecture Motivation
Sensors: camera, two GPS antennas (reference and mobile), IMU
How can the information from these sensors best be combined to estimate the camera pose and a map of the environment?
Considerations: real-time operation; computational burden vs. precision
Sensor Fusion Approach
Tighter coupling = higher precision, but increased computational burden
(Figure: four candidate architectures combining the IMU, visual SLAM, and CDGPS at progressively tighter levels of coupling)
The Optimal Estimator
IMU only for Pose Propagation
Tightly-Coupled Architecture
Loosely-Coupled Architecture
Hybrid Batch/Sequential Estimator
Only geographically diverse frames (keyframes) enter the batch estimator
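The slides do not spell out the keyframe criterion; a plausible sketch, assuming keyframes are declared whenever the camera has moved more than a minimum distance (the 0.25 m default mirrors the hallway simulation's keyframe spacing):

```python
import numpy as np

def select_keyframes(cam_positions, min_spacing=0.25):
    """Keep only geographically diverse frames; min_spacing is an assumed
    distance threshold in meters (0.25 m matches the simulation's spacing)."""
    keyframes = [0]
    for i in range(1, len(cam_positions)):
        if np.linalg.norm(cam_positions[i] - cam_positions[keyframes[-1]]) >= min_spacing:
            keyframes.append(i)
    return keyframes
```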
Bundle Adjustment State and Measurements
State vector:
$$\boldsymbol{X}_{BA} = \begin{bmatrix} \boldsymbol{c} \\ \boldsymbol{p} \end{bmatrix}, \qquad \boldsymbol{c} = \begin{bmatrix} \cdots & \boldsymbol{x}_G^{C_i\,T} & \boldsymbol{q}_G^{C_i\,T} & \cdots \end{bmatrix}^T, \qquad \boldsymbol{p} = \begin{bmatrix} \cdots & \boldsymbol{x}_G^{p_j\,T} & \cdots \end{bmatrix}^T$$
Measurement models:
CDGPS positions:
$$\boldsymbol{x}_G^{A_i} = \boldsymbol{h}_x\!\left(\boldsymbol{x}_G^{C_i}, \boldsymbol{q}_G^{C_i}\right) + \boldsymbol{w}_{x_i} = \boldsymbol{x}_G^{C_i} + R\!\left(\boldsymbol{q}_G^{C_i}\right)\boldsymbol{x}_C^{A} + \boldsymbol{w}_{x_i}$$
Image feature measurements:
$$\boldsymbol{s}_{I_i}^{p_j} = \boldsymbol{h}_s\!\left(\boldsymbol{x}_{C_i}^{p_j}\right) + \boldsymbol{w}_{I_i}^{p_j} = \begin{bmatrix} x_{C_i}^{p_j}/z_{C_i}^{p_j} & y_{C_i}^{p_j}/z_{C_i}^{p_j} \end{bmatrix}^T + \boldsymbol{w}_{I_i}^{p_j}$$
where
$$\boldsymbol{x}_{C_i}^{p_j} = \begin{bmatrix} x_{C_i}^{p_j} & y_{C_i}^{p_j} & z_{C_i}^{p_j} \end{bmatrix}^T = R\!\left(\boldsymbol{q}_G^{C_i}\right)^T \left(\boldsymbol{x}_G^{p_j} - \boldsymbol{x}_G^{C_i}\right)$$
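A minimal sketch of these two measurement models in Python/NumPy, assuming a scalar-first unit quaternion convention; the function names (quat_to_rot, h_x, h_s) are illustrative, not from the paper:

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix R(q) for a unit quaternion q = [w, x, y, z]."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def h_x(x_G_Ci, q_G_Ci, x_C_A):
    """CDGPS position model: camera position plus the antenna lever arm rotated into G."""
    return x_G_Ci + quat_to_rot(q_G_Ci) @ x_C_A

def h_s(x_G_Ci, q_G_Ci, x_G_pj):
    """Image feature model: express the point in the camera frame, then project (normalized pinhole)."""
    x_C = quat_to_rot(q_G_Ci).T @ (x_G_pj - x_G_Ci)
    return x_C[:2] / x_C[2]
```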
Bundle Adjustment Cost Minimization
Weighted least-squares cost function employing robust weight functions to handle outliers:
$$\underset{\boldsymbol{X}_{BA}}{\operatorname{argmin}}\; \frac{1}{2}\sum_{i=1}^{N}\left[\left\|\Delta\boldsymbol{x}_G^{A_i}\right\|^2 + \sum_{j=1}^{M} w_V\!\left(\left\|\Delta\boldsymbol{s}_{I_i}^{p_j}\right\|\right)\left\|\Delta\boldsymbol{s}_{I_i}^{p_j}\right\|^2\right]$$
$$\Delta\boldsymbol{x}_G^{A_i} = R_{\boldsymbol{x}_G^{A_i}}^{-1/2}\left(\bar{\boldsymbol{x}}_G^{A_i} - \boldsymbol{x}_G^{A_i}\right), \qquad \Delta\boldsymbol{s}_{I_i}^{p_j} = R_{\boldsymbol{s}_{I_i}^{p_j}}^{-1/2}\left(\bar{\boldsymbol{s}}_{I_i}^{p_j} - \boldsymbol{s}_{I_i}^{p_j}\right)$$
where the overbar denotes a measured quantity.
Solved with a sparse Levenberg-Marquardt algorithm
Computational complexity is linear in the number of point features, but cubic in the number of keyframes
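A hedged solver sketch, assuming SciPy is acceptable: scipy.optimize.least_squares with a Huber loss stands in for the robust weight w_V (applied here to all residuals for brevity, whereas the slide applies it only to the vision terms), and a supplied Jacobian sparsity pattern stands in for the paper's explicit sparse Levenberg-Marquardt implementation. h_x and h_s are the model functions sketched after the previous slide, and the unpacking of X_BA is left abstract:

```python
import numpy as np
from scipy.optimize import least_squares

def whiten(residual, R):
    """Apply R^{-1/2} via a Cholesky factor so the residual has unit covariance."""
    return np.linalg.solve(np.linalg.cholesky(R), residual)

def ba_residuals(X_BA, unpack, cdgps_meas, feature_meas, x_C_A, R_x, R_s):
    """Stack whitened CDGPS-position and image-feature residuals over all keyframes."""
    cams, points = unpack(X_BA)   # user-supplied: X_BA -> ({i: (x_G_Ci, q_G_Ci)}, {j: x_G_pj})
    res = []
    for i, x_G_Ai in cdgps_meas.items():
        x_G_Ci, q_G_Ci = cams[i]
        res.append(whiten(x_G_Ai - h_x(x_G_Ci, q_G_Ci, x_C_A), R_x))
    for (i, j), s_meas in feature_meas.items():
        x_G_Ci, q_G_Ci = cams[i]
        res.append(whiten(s_meas - h_s(x_G_Ci, q_G_Ci, points[j]), R_s))
    return np.concatenate(res)

# Example call (X0, unpack, measurements, and sparsity_pattern supplied by the user):
# sol = least_squares(ba_residuals, X0,
#                     args=(unpack, cdgps_meas, feature_meas, x_C_A, R_x, R_s),
#                     loss='huber', method='trf', jac_sparsity=sparsity_pattern)
```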
Bundle Adjustment Initialization
Initialize BA from the stand-alone visual SLAM solution and the CDGPS positions
Determine the similarity transform relating the two coordinate systems:
$$\underset{\boldsymbol{x}_G^V,\,\boldsymbol{q}_G^V,\,s}{\operatorname{argmin}}\; \frac{1}{2}\sum_{i=1}^{N}\left\|\boldsymbol{x}_G^{A_i} - \boldsymbol{x}_G^{V} - R\!\left(\boldsymbol{q}_G^{V}\right)\left[s\,\boldsymbol{x}_V^{C_i} + R\!\left(\boldsymbol{q}_V^{C_i}\right)\boldsymbol{x}_C^{A}\right]\right\|^2$$
Generalized form of Horn's transform [4]:
Rotation: the rotation that best aligns deviations from the mean camera position
Scale: a ratio of metrics describing the spread of camera positions
Translation: difference in mean antenna position
[4] B. K. Horn, "Closed-form solution of absolute orientation using unit quaternions," JOSA A, vol. 4, no. 4, pp. 629–642, 1987.
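A simplified sketch of this initialization under two stated assumptions: the antenna lever-arm term R(q_V^Ci) x_C_A is neglected, and the rotation is recovered via an SVD (Arun/Umeyama style) rather than Horn's quaternion form. The structure nevertheless mirrors the slide: rotation from deviations about the means, scale from a ratio of spreads, translation from the means.

```python
import numpy as np

def horn_similarity(x_G_A, x_V_C):
    """Estimate (x_G_V, R_G_V, s) aligning visual-frame camera positions x_V_C (N x 3)
    to globally-referenced antenna positions x_G_A (N x 3), lever arm neglected."""
    mu_A = x_G_A.mean(axis=0)
    mu_C = x_V_C.mean(axis=0)
    dA, dC = x_G_A - mu_A, x_V_C - mu_C
    # Rotation: best alignment of deviations from the mean positions
    U, S, Vt = np.linalg.svd(dA.T @ dC)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # guard against reflection
    R_G_V = U @ D @ Vt
    # Scale: ratio of metrics describing the spread of the two position sets
    s = np.sqrt((dA**2).sum() / (dC**2).sum())
    # Translation: difference of the means after rotating and scaling
    x_G_V = mu_A - s * (R_G_V @ mu_C)
    return x_G_V, R_G_V, s
```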
Simulation Scenario for BA
Simulations investigating estimability are included in the paper
Hallway simulation:
  Measurement errors: 2 cm std for CDGPS, 1 pixel std for vision
  Keyframes every 0.25 m
  242 keyframes
  1310 point features
Three scenarios: GPS available; GPS lost when the hallway is entered; GPS reacquired when the hallway is exited
(Figure: hallway trajectory with labeled waypoints A through D)
Simulation Results for BA
Summary
Hybrid batch/sequential estimator for loosely-coupled visual SLAM and CDGPS, with an IMU for state propagation
  Compared to the optimal estimator
Outlined the algorithm for BA (batch)
Presented a novel technique for initialization of BA
BA simulations:
  Demonstrated positioning accuracy of ~1 cm and attitude accuracy of ~0.1° in areas of GPS availability
  Attained slow drift during GPS unavailability (0.4% drift over 50 m)
Navigation Filter
State vector:
$$\boldsymbol{X}_F = \begin{bmatrix} \boldsymbol{x}_G^{C\,T} & \boldsymbol{v}_G^{C\,T} & \boldsymbol{b}_B^{f\,T} & \boldsymbol{q}_G^{C\,T} & \boldsymbol{b}_B^{\omega\,T} \end{bmatrix}^T$$
Propagation step:
Standard EKF propagation step using accelerometer and gyro measurements
Accelerometer and gyro biases modeled as first-order Gauss-Markov processes
More information in the paper
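A mean-propagation sketch (covariance propagation omitted) under assumed conventions: scalar-first quaternions with q_G^C mapping camera/body-frame vectors into G, quat_to_rot from the earlier sketch, and first-order Gauss-Markov biases decaying with assumed time constants tau_f and tau_w; none of the symbols beyond the slide's state variables come from the paper.

```python
import numpy as np

def quat_mult(a, b):
    """Hamilton product of scalar-first quaternions a ⊗ b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([aw*bw - ax*bx - ay*by - az*bz,
                     aw*bx + ax*bw + ay*bz - az*by,
                     aw*by - ax*bz + ay*bw + az*bx,
                     aw*bz + ax*by - ay*bx + az*bw])

def propagate(x, v, q, b_f, b_w, f_meas, w_meas, dt, tau_f=100.0, tau_w=100.0,
              g_G=np.array([0.0, 0.0, -9.81])):
    """Propagate position, velocity, attitude, and IMU biases over one IMU interval."""
    f_B = f_meas - b_f                      # bias-corrected specific force (body frame)
    w_B = w_meas - b_w                      # bias-corrected angular rate (body frame)
    a_G = quat_to_rot(q) @ f_B + g_G        # acceleration in the global frame
    x_new = x + v * dt + 0.5 * a_G * dt**2
    v_new = v + a_G * dt
    dq = np.concatenate(([1.0], 0.5 * w_B * dt))   # first-order attitude increment
    q_new = quat_mult(q, dq)
    q_new /= np.linalg.norm(q_new)
    b_f_new = np.exp(-dt / tau_f) * b_f     # first-order Gauss-Markov bias means decay
    b_w_new = np.exp(-dt / tau_w) * b_w
    return x_new, v_new, q_new, b_f_new, b_w_new
```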
Navigation Filter (cont.)
Measurement update step:
Image feature measurements from all non-keyframes
Temporarily augment the state with point feature positions
  Prior from the map produced by BA
  Must ignore cross-covariances ⇒ filter inconsistency
Similar block-diagonal structure in the normal equations as in BA:
$$\begin{bmatrix} U_F & W_F \\ W_F^T & V_F \end{bmatrix}\begin{bmatrix} \delta\boldsymbol{X}_F \\ \delta\boldsymbol{p} \end{bmatrix} = \begin{bmatrix} \boldsymbol{\epsilon}_F \\ \boldsymbol{\epsilon}_p \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} U_F - W_F V_F^{-1} W_F^T & 0 \\ W_F^T & V_F \end{bmatrix}\begin{bmatrix} \delta\boldsymbol{X}_F \\ \delta\boldsymbol{p} \end{bmatrix} = \begin{bmatrix} I & -W_F V_F^{-1} \\ 0 & I \end{bmatrix}\begin{bmatrix} \boldsymbol{\epsilon}_F \\ \boldsymbol{\epsilon}_p \end{bmatrix}$$
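A sketch of the block elimination above, assuming dense NumPy arrays for readability (in practice V_F is block-diagonal and would be inverted point-by-point); variable names follow the slide:

```python
import numpy as np

def schur_solve(U_F, W_F, V_F, eps_F, eps_p):
    """Solve the augmented normal equations by marginalizing the point-feature
    block first (Schur complement), mirroring the block elimination above."""
    V_inv = np.linalg.inv(V_F)                        # block-diagonal in practice
    S = U_F - W_F @ V_inv @ W_F.T                     # reduced filter-state system
    dX_F = np.linalg.solve(S, eps_F - W_F @ V_inv @ eps_p)
    dp = V_inv @ (eps_p - W_F.T @ dX_F)               # back-substitute for point corrections
    return dX_F, dp
```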
Simulation Results for BA (cont.)