Akash Bapat¹, Enrique Dunn¹,² & Jan-Michael Frahm¹


Towards kHz 6-DoF Visual Tracking Using an Egocentric Cluster of Rolling Shutter Cameras
Akash Bapat¹, Enrique Dunn¹,² & Jan-Michael Frahm¹
¹UNC Chapel Hill, USA; ²Stevens Institute of Technology, USA

AR/VR System Components
- Tracking
- Rendering
- Display
Image source: http://static3.gamespot.com/uploads/original/1179/11799911/2786939-hololens7.jpg

AR/VR System Components: Tracking
- Find the 6-DoF head pose: translation (Tx, Ty, Tz) and rotation¹
1. LaValle et al., Head Tracking for the Oculus Rift

Tracking Systems (frequency vs. complexity/cost)
- State of the art: NDI Optotrak Certus, tracking frequency ≈ 5 kHz (high cost)
- HiBall Self-Tracker
- Oculus Rift DK2: positional tracking via a 60 fps camera; orientation tracking via IMU at 1000 Hz¹
Image sources: http://certus.ndigital.com, http://cdn.pocketnow.com/wp-content/uploads/2011/04/gyro.jpg, http://vrshoot.ru/sites/default/files/oculus-rift-dk2-cam.jpg, http://i.imgur.com/64LA9Kv.jpg
1. LaValle et al., Head Tracking for the Oculus Rift

Rolling Shutter Capture
- Row-by-row acquisition of line-scan snapshots at slightly different times
- Produces a stream of row-images
Image: http://jasmcole.com/2014/10/12/rolling-shutters/

Stream of Rows
- Frequency of row-samples: F = FPS × Height
- e.g., 120 fps × 720 rows = 86,400 row-samples/s > 80 kHz
Image: http://www.diyphotography.net/everything-you-wanted-to-know-about-rolling-shutter/
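The row-rate arithmetic above can be sketched as a one-line helper; the function name is illustrative, not from the talk:

```python
def row_sample_frequency(fps, height):
    # Each frame exposes `height` rows sequentially, so row-samples
    # arrive at F = FPS * Height per second.
    return fps * height

# A 720p camera at 120 fps already exceeds the 80 kHz target:
print(row_sample_frequency(120, 720))  # 86400
```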

Our Tracker
- Enabling component for high-speed rendering: a >30 kHz tracker for a 30 kHz renderer (Zheng et al., 2014; Lincoln et al., 2016)
- Uses commodity cameras (e.g., GoPro¹)
- Tracking frequency up to 80 kHz: breaks the frame-rate barrier by processing each row of the image
1. http://shop.gopro.com/hero4/hero4-black/CHDHX-401.html

Related Work
- Removing rolling shutter (Forssén et al., CVPR 2010): track points using a KLT tracker and estimate rotation; parametrize intra-frame rotation as a spline
- Geometric problems, multi-view stereo (Saurer et al., ICCV 2013): adapt plane-sweep stereo for rolling shutter
- Velocity estimation (Ait-Aider et al., ICVS 2006): solve for pose and velocity using bundle adjustment, with 2D-3D correspondences as in camera calibration
Artifact visualisation: http://jasmcole.com/2014/10/12/rolling-shutters/

Approach Overview
- Input: stream of rows from a camera cluster
- Pipeline: Camera Cluster → Spatial Stereo → Temporal Stereo → Estimation Framework
- Output: high-frequency 6-DoF tracking

Our Cluster
- Rolling shutter as a dynamic sensor with periodic movement
- Small vertical FoV (e.g., GoPro¹)
Image sources: http://shop.gopro.com/hero4/hero4-black/CHDHX-401.html, http://users.erols.com/njastro/faas/image001/CCD2.jpg

Our Cluster
- N stereo pairs of rolling shutter cameras, N = 5
- Precalibrated intrinsics and extrinsics; temporally synchronized
- Known history of motion; Hi-Ball tracker for ground truth
- Known geometry of the cluster

Motion Model
A 3D point X in camera i's coordinates moves between row-times t and t+1 according to the camera's motion M_{cam_i}:
    X_{cam_i}^{t+1} = M_{cam_i} X_{cam_i}^{t}
Expressing each rolling shutter camera's motion through the shared cluster motion M_{cluster} and the fixed cluster-to-camera transform V_i:
    X_{cam_i}^{t+1} = V_i M_{cluster} V_i^{-1} X_{cam_i}^{t}

Linearized Motion Model
    X_{cam_i}^{t+1} = V_i M_{cluster} V_i^{-1} X_{cam_i}^{t},   M_{cluster} = [ R(θ_x, θ_y, θ_z)  T ; 0  1 ]
For small inter-row motion, M_{cluster} is approximated by the first-order matrix
    dM = [   1    -θ_z    θ_y   δT_x ]
         [  θ_z    1     -θ_x   δT_y ]
         [ -θ_y   θ_x     1     δT_z ]
         [   0     0      0      1   ]
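The small-angle matrix dM above can be written out directly; this is a minimal sketch with an illustrative function name:

```python
import numpy as np

def small_motion_matrix(d_theta, d_t):
    """First-order (small-angle) approximation dM of the cluster motion.

    d_theta: small rotations (theta_x, theta_y, theta_z) in radians
    d_t:     small translations (dT_x, dT_y, dT_z)
    """
    tx, ty, tz = d_theta
    return np.array([
        [1.0,  -tz,   ty, d_t[0]],
        [ tz,  1.0,  -tx, d_t[1]],
        [-ty,   tx,  1.0, d_t[2]],
        [0.0,  0.0,  0.0, 1.0],
    ])
```

Zero motion yields the identity, and the rotation part is skew-symmetric to first order, which is what makes the later estimation step a linear system.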

Motion Estimation
Spatial stereo gives the depth d of a point on the current row:
    X_{cam_i}^{t} = (0, y, d, 1)^T
    X_{cam_i}^{t+1} = V_i M_{cluster} V_i^{-1} X_{cam_i}^{t}

Motion Estimation
After the motion, the point has shifted by (Δx, Δy) in the image and Δd in depth:
    X_{cam_i}^{t+1} = (Δx, y+Δy, d+Δd, 1)^T
    X_{cam_i}^{t+1} = V_i M_{cluster} V_i^{-1} X_{cam_i}^{t}

Stereo Estimation
- Spatial stereo s_d: use a stereo pair of cameras; measure pixel disparity
- Temporal stereo s_t: measure small pixel shifts by comparing a row at time t with the row at time t-k

Binary Row Descriptor
- Filter the row with the second derivative of a Gaussian: d²G(x, 0, σ)/dx² ∗ row
- Take the sign of the response to obtain a binary descriptor
(Visualization of the 296th row-image.)
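The two steps of the descriptor (filter, then sign) can be sketched as follows; the kernel radius and σ are my assumptions, not values from the paper:

```python
import numpy as np

def gaussian_second_derivative(sigma, radius=None):
    # Sampled second derivative of a Gaussian, d^2 G(x, 0, sigma) / dx^2.
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return (x**2 / sigma**4 - 1.0 / sigma**2) * g

def binary_row_descriptor(row, sigma=2.0):
    # Filter the row, then keep only the sign of the response:
    # one bit per pixel, cheap to compare with Hamming distance.
    response = np.convolve(row, gaussian_second_derivative(sigma), mode='same')
    return (response > 0).astype(np.uint8)
```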

Spatial Stereo: Measure Disparity
- Compare the binary descriptors of corresponding rows of the stereo pair
- Fast matching with the Hamming cost
- depth = (focal length × baseline) / disparity
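A minimal sketch of the disparity search and the depth relation above; the brute-force scan and the normalization are my simplifications of whatever matcher the authors use:

```python
import numpy as np

def disparity_by_hamming(desc_left, desc_right, max_disparity):
    # Slide the binary descriptors against each other and pick the
    # shift with the lowest normalized Hamming cost.
    n = len(desc_left)
    best_d, best_cost = 0, float('inf')
    for d in range(max_disparity + 1):
        overlap = n - d
        cost = np.count_nonzero(desc_left[d:] != desc_right[:overlap]) / overlap
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

def depth_from_disparity(focal_px, baseline, disparity_px):
    # Standard pinhole stereo relation: depth = focal * baseline / disparity.
    return focal_px * baseline / disparity_px
```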

Temporal Stereo: Measure Shift s_t
- Different regions in space are captured at different timestamps (stereo in time)
- Need snapshots of the same region of space at different timestamps
- Reconstruct a reference row via a row-wise homography, using the known [R | t] per row
- Warp the current row to match the reference-time pose, then compare

Temporal Stereo: Measure Shift s_t
- Reconstruct the row using a per-row homography, sampled from the previous frame
- Match binary descriptors against the current frame using the Hamming distance

Adaptive Reference
Choosing the reference row t-k trades off two effects:
- The reference must be far enough back that the row-to-row motion rises above interpolation and pixel-measurement noise
- The reference must be close enough that the small-motion assumption remains valid
Estimation reliability therefore depends on the choice of k.

Confidence Scores
- Quality of the minimum, c_PKR: is the valley of the Hamming cost unique? Use the peak ratio (PKR)¹
- Temporal consistency, c_t: shifts should vary smoothly in time; penalize sudden changes
- Per-camera confidence: c_{i,t} = c_PKR · c_t
- C(t) = diag(c_{1,t}, ..., c_{n,t})
1. X. Hu and P. Mordohai, A quantitative evaluation of confidence measures for stereo vision, PAMI 2012
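A common form of the peak ratio, sketched below, is the second-best cost divided by the best cost; this exact formula is my assumption (the paper defers to Hu and Mordohai), and the temporal term c_t is omitted:

```python
def peak_ratio_confidence(costs):
    # PKR: second-lowest cost divided by lowest cost. A unique, deep
    # valley yields a large ratio; an ambiguous curve yields a ratio
    # near 1, signalling low confidence in the matched shift.
    ordered = sorted(costs)
    best, second = ordered[0], ordered[1]
    return second / max(best, 1e-9)
```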

Motion Estimation
Combining stereo in space and stereo in time:
    X_{cam_i}^{t} = (0, y, d, 1)^T            (stereo in space)
    X_{cam_i}^{t+1} = (Δx, y+Δy, d+Δd, 1)^T   (stereo in time)
    X_{cam_i}^{t+1} = V_i M_{cluster} V_i^{-1} X_{cam_i}^{t}

Weighted Linear System
    X_{cam_i}^{t+1} = V_i M_{cluster} V_i^{-1} X_{cam_i}^{t}
- One equation from each camera; using more cameras adds robustness
- Solve the confidence-weighted system:
    C A (δT_x, δT_y, δT_z, θ_x, θ_y, θ_z)^T = C B
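The weighted solve C A x = C B can be sketched with a plain least-squares call; the function name is illustrative, and how A and B are assembled from the stereo measurements is left abstract:

```python
import numpy as np

def solve_cluster_motion(A, b, confidences):
    """Least-squares solve of C A x = C B for the 6-DoF update
    x = (dT_x, dT_y, dT_z, theta_x, theta_y, theta_z), where the
    diagonal matrix C weights each camera's equation by its
    confidence score c_{i,t}."""
    C = np.diag(confidences)
    x, *_ = np.linalg.lstsq(C @ A, C @ b, rcond=None)
    return x
```

Low-confidence rows are downweighted rather than discarded, so a camera with a momentarily ambiguous match still contributes a little instead of destabilizing the solution.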

Approach Summary
Stream of rows (input) → Spatial Stereo + Temporal Stereo → Confidence Scores → Weighted Least Squares → high-frequency 6-DoF tracking (output)

Simulator
- Developed in OpenGL+Qt and Unity3D
- Support for Hi-Ball tracker data
- Motion blur

Experiments
- Camera specs: 640 × 480 pixels, FoV = 60°, 10 stereo pairs, 120 fps
- Display specs (HTC Vive): 120° FoV, 1080 × 1200 pixels per eye; evaluation point 1 m in front
- Errors quantified as display pixel errors
1. Mingsong Dou et al., Exploring high-level plane primitives for indoor 3D reconstruction with a hand-held RGB-D camera, ACCV 2012

Display Pixel Error
Error measured in terms of the pixels shown in the HMD.
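One way to convert a positional tracking error into display pixels, using the display specs from the experiments slide (120° FoV, 1080 horizontal pixels per eye, target point 1 m ahead); this conversion is my sketch, not necessarily the paper's exact metric:

```python
import math

def display_pixel_error(pos_error_m, distance_m=1.0,
                        fov_deg=120.0, width_px=1080):
    # Angle subtended by the positional error at the target distance,
    # converted to pixels assuming uniform pixel pitch across the FoV.
    angle_rad = math.atan2(pos_error_m, distance_m)
    px_per_rad = width_px / math.radians(fov_deg)
    return angle_rad * px_per_rad
```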

Experiments: Real Motion¹ in a Synthetic Room
f = 56.7 kHz
(Frame sequence: i, i+1, i+2, i+3, i+4, ...)
1. Hi-Ball tracker data

Experiments: Synthetic Large Motion
f = 56.7 kHz, v = 1.4 m/s, ω = 120 deg/s

Experiments: Synthetic Extreme Motion
f = 56.7 kHz, v = 1.4 m/s, ω = 500 deg/s; sampling not possible

Results for Real Imagery
f = 80.4 kHz

Conclusion
High-frequency visual tracker from rolling shutter cameras:
- Up to 80 kHz using off-the-shelf cameras
- Direct pixel comparison
- Redundant cameras
- Motion linearization

Thank you!
Contact: akash@cs.unc.edu

Camera Tracking: The Spectrum

Property     Global Shutter      Rolling Shutter as 1-D sensor   1-D camera
Distortion   No distortion       Distortion
Drift                            Drift correction                No drift correction
Cost         High cost           Cheap                           High cost
Frame rate   Low fps, ≈200 Hz    High fps, 56 kHz                High fps, up to 87 kHz
Noise                            Higher noise                    Low noise