Single Image Rolling Shutter Distortion Correction
Ph.D. Seminar Talk – 2
Vijay Rengarajan (EE11D035)
Guides: Prof. A. N. Rajagopalan and Prof. R. Aravind
March 22, 2017
Image Processing and Computer Vision Lab, Department of Electrical Engineering, IIT Madras
Camera Motion in Mobile Phones
Motion Causes Geometric Distortions
Sequential Exposure of Rolling Shutter
- Global shutter (CCD image sensor): all pixels expose at the same time, from exposure open to exposure close, over the exposure time te.
- Rolling shutter (CMOS image sensor): each row starts exposing sequentially, from the top row to the bottom row; the total line delay across the frame is Td.
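The two exposure schemes can be written compactly. The per-row line delay t_d and the start time t_0 below are notation introduced here for illustration, consistent with the slide's te (exposure time) and Td (total line delay):

```latex
% Global shutter: all N rows expose over the same interval
\text{row } r \in \{0,\dots,N-1\}:\quad [\,t_0,\; t_0 + t_e\,]

% Rolling shutter: row r starts only after r line delays
\text{row } r:\quad [\,t_0 + r\,t_d,\; t_0 + r\,t_d + t_e\,],
\qquad T_d = (N-1)\,t_d
```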
Rolling Shutter Causes Local Distortions
- Different rows see the scene at different poses of the moving camera.
- Even a short exposure causes distortions.
[Figure: scene vs. captured image for a horizontal camera translation tx and for an in-plane rotation rz varying over time]
Prior Works Use Multi-Source Information
Sources of extra information: (1) a known undistorted reference frame, (2) local blur kernels within the blurred image, (3) multi-image feature correspondences.
Applications: video rectification (Forssén and Ringaby, CVPR 2010; Grundmann et al., ICCP 2012), motion deblurring (Su and Heidrich, CVPR 2015), super-resolution (Punnappurath et al., ICCV 2015; Rengarajan et al., ICIP 2016), change detection (Rengarajan et al., ECCV 2014, TPAMI 2016).
Problem Definition – Single Image Correction
Goal: Correct the rolling shutter image as if it were captured by a global shutter camera.
Challenges:
- Only one distorted image is available.
- There is no motion blur to exploit.
- The distortions are purely geometric.
Geometry-based Correction
Pipeline: Feature Extraction → Motion Estimation → Distortion Correction → Corrected Image
- What image features get distorted by camera motion?
- Which camera motion will "correct" these features?
- How do we correct the distorted image?
Camera Motion is Embedded in Curvatures
- Straight scene structures appear as curves in the captured image; the curvature encodes the camera rotations and translations through the camera matrix.
- Forward mapping: clean image → distorted image. Inverse mapping: distorted image → clean image.
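A sketch of the row-wise imaging model that makes this concrete; the pinhole notation (intrinsics K, rotation R, translation T, scene point X, row time t_r) is an assumption added here rather than taken verbatim from the talk:

```latex
% A scene point X that lands on row r is imaged under the camera pose
% active at that row's exposure time t_r:
\mathbf{x}_r \;\sim\; K\,\bigl[\,R(t_r)\;\big|\;T(t_r)\,\bigr]\,\mathbf{X}

% Forward mapping: global-shutter (clean) pixels -> rolling-shutter (distorted) pixels.
% Inverse mapping: distorted pixels -> clean pixels, which is what correction needs.
% A straight scene line therefore appears as a curve whose shape encodes R(t), T(t).
```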
Curves as Distorted Features
- Use the Hough transform to detect small line segments.
- Join line segments based on spatial and angular proximity to detect curves.
- Group curves as vertical, horizontal, and slanted.
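A minimal sketch of this detection step using OpenCV's probabilistic Hough transform; the Canny/Hough thresholds and the angle tolerance are placeholder values, not the ones used in the work, and the chaining of nearby segments into longer curves is only indicated:

```python
import cv2
import numpy as np

def detect_and_group_segments(image_gray, angle_tol_deg=20):
    """Detect short line segments and bin them by orientation.

    Segments that are spatially and angularly close can then be
    chained into longer curves (that joining step is not shown here).
    """
    edges = cv2.Canny(image_gray, 50, 150)
    segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                               threshold=30, minLineLength=15, maxLineGap=5)
    vertical, horizontal, slanted = [], [], []
    if segments is None:
        return vertical, horizontal, slanted
    for x1, y1, x2, y2 in segments[:, 0]:
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1)) % 180.0
        if angle < angle_tol_deg or angle > 180.0 - angle_tol_deg:
            horizontal.append((x1, y1, x2, y2))
        elif abs(angle - 90.0) < angle_tol_deg:
            vertical.append((x1, y1, x2, y2))
        else:
            slanted.append((x1, y1, x2, y2))
    return vertical, horizontal, slanted
```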
Straightness is Perceptually Desirable
- Estimate the camera motion that would make the detected curves straight.
- With an independent 6-DoF pose for each of the M image rows, that means 6M unknowns!
Polynomial Camera Trajectory
- A short exposure warrants a simple motion model.
- Model the trajectory as a degree-n polynomial w.r.t. the row index.
- This reduces the problem to 6(n+1) unknowns.
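One way to write this model down; the coefficient symbols a_k are introduced here for illustration:

```latex
% 6-DoF camera pose as a degree-n polynomial in the row index r
\mathbf{p}(r) \;=\; \sum_{k=0}^{n} \mathbf{a}_k\, r^{k},
\qquad
\mathbf{p}(r) \;=\; \bigl(t_x, t_y, t_z,\; r_x, r_y, r_z\bigr)(r)

% Unknowns: (n+1) coefficient vectors of dimension 6, i.e. 6(n+1),
% instead of 6M when each of the M rows carries an independent pose.
```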
Straightness is Perceptually Desirable
- Estimate the camera motion that would make the curves straight.
- But multiple solutions along the row dimension preserve straightness: a global in-plane rotation, vertical shrinking, or horizontal shearing.
More Constraints to Zero In
Nonlinear least-squares optimization with three costs:
1. Make curves straight lines (L2 distance of curve points to a line).
2. Map curves in the vertical/horizontal group to vertical/horizontal lines.
3. Preserve the vertical length of curves after correction.
A sketch of this optimization is given below.
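A rough sketch of how such an optimization could be set up with SciPy; `warp_curve` is a hypothetical helper standing in for the inverse row-wise warp induced by the trajectory coefficients, and the residual definitions are simplified stand-ins for the three costs, not the talk's exact formulation:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(coeffs, curves, groups, warp_curve):
    """Stack the three desirability costs over all detected curves.

    coeffs     : polynomial trajectory coefficients, shape (6 * (n + 1),)
    curves     : list of (num_points, 2) arrays of (x, y) curve points
    groups     : 'v', 'h', or 's' label per curve
    warp_curve : hypothetical helper mapping curve points through the
                 inverse row-wise warp induced by `coeffs`
    """
    res = []
    for pts, grp in zip(curves, groups):
        w = warp_curve(pts, coeffs)                 # corrected curve points
        # Cost 1: straightness -- distance of points to their best-fit line
        line = np.polyfit(w[:, 1], w[:, 0], 1)      # x as a function of y
        res.append(w[:, 0] - np.polyval(line, w[:, 1]))
        # Cost 2: vertical/horizontal curves should become vertical/horizontal lines
        if grp == 'v':
            res.append(w[:, 0] - w[:, 0].mean())    # constant x
        elif grp == 'h':
            res.append(w[:, 1] - w[:, 1].mean())    # constant y
        # Cost 3: preserve the vertical extent of the curve after correction
        res.append(np.atleast_1d(np.ptp(w[:, 1]) - np.ptp(pts[:, 1])))
    return np.concatenate(res)

# coeffs0 = np.zeros(6 * (n + 1))
# sol = least_squares(residuals, coeffs0, args=(curves, groups, warp_curve))
```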
Results
Camera motion estimation based on the desirability costs.
[Figure: distorted image, curve detection, corrected image, and the correction over iterations]
More Results
[Figure: further distorted images with detected curves and the corresponding corrected images]
Vijay Rengarajan, A. N. Rajagopalan, and R. Aravind, "From Bows to Arrows: Rolling Shutter Rectification of Urban Scenes," IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, June 2016.
On Par with Video Methods
[Figure: captured image, corrections by video-based methods (Forssén and Ringaby, CVPR 2010; Grundmann et al., ICCP 2012), and our single-image correction]
Should All Curves Be Straight?
[Figure: distorted image and corrected image]
Let Machines Extract Desired Features
- Geometry pipeline: RS distorted image → feature extraction → motion estimation → distortion correction → corrected image.
- Learning pipeline: RS distorted image → CNN → motion fitting → distortion correction → corrected image.
CNN (Convolutional Neural Network):
- Non-linearly maps the distorted image to camera motion.
- Estimates motion values at a fixed number of rows.
- A polynomial trajectory is then fit to these values.
Learn Image-to-Motion as a Nonlinear Mapping
- Vanilla Convolutional Neural Network: a stack of convolution, max-pooling (/2), and fully connected layers.
- Predicts tx and rz at 15 rows; trained with mean squared error on the motion values.
- Activation choices compared: ReLU, Tanh, HardTanh.
[Figure: architecture diagram and motion mean squared error comparison]
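A minimal PyTorch sketch of a network in this spirit: a plain conv/maxpool stack followed by fully connected layers that regress tx and rz at 15 rows under an MSE loss. The channel widths, pooling/FC sizes, and the output Tanh are assumptions; only the general conv-pool-FC structure and the 2x15 motion output follow the slide:

```python
import torch
import torch.nn as nn

class VanillaCNN(nn.Module):
    """Plain convolutional regressor: image -> (tx, rz) at 15 rows."""
    def __init__(self, num_rows=15):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=11, padding=5), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 256), nn.ReLU(),
            nn.Linear(256, 2 * num_rows),   # tx and rz at num_rows rows
            nn.Tanh(),                      # bounded motion values
        )

    def forward(self, x):
        return self.regressor(self.features(x))

# Training objective on the motion values:
# loss = nn.MSELoss()(VanillaCNN()(images), motion_targets)
```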
Training the CNN
Stochastic gradient descent, per iteration:
- Pass an input image forward through the CNN to obtain its motion output.
- Calculate the output gradient and backpropagate it through all layers to obtain the weight gradients.
- Update the weights for this input; in practice, use a batch of images instead of a single input.
Training data generation: generate random camera motion and apply it to an undistorted image (a sketch follows below).
Datasets:
- Chessboard: 7k images
- Urban scenes: 300k images (Sun, Oxford, Zurich)
- Faces: 250k images (LFW)
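A sketch of the data-generation idea under simplifying assumptions: the motion is restricted to horizontal translation tx and in-plane rotation rz (the parameters the CNN regresses), each output row is taken from the clean image warped by the pose active at that row, and the random-trajectory ranges at the bottom are arbitrary placeholders:

```python
import numpy as np
import cv2

def distort_with_rolling_shutter(clean, tx, rz):
    """Apply a row-wise (tx, rz) camera motion to a clean (global-shutter) image.

    clean : HxW grayscale image
    tx    : per-row horizontal translation, length H (pixels)
    rz    : per-row in-plane rotation, length H (radians)
    """
    h, w = clean.shape
    cx, cy = w / 2.0, h / 2.0
    out = np.zeros_like(clean)
    for r in range(h):
        # Pose seen by row r: rotate about the image centre, then translate.
        M = cv2.getRotationMatrix2D((cx, cy), np.degrees(rz[r]), 1.0)
        M[0, 2] += tx[r]
        warped = cv2.warpAffine(clean, M, (w, h), flags=cv2.INTER_LINEAR)
        out[r] = warped[r]   # row r of the output comes from this pose
    return out               # (one full warp per row keeps the sketch simple)

# Random smooth trajectory, e.g. a low-order polynomial over the rows:
# rows = np.linspace(0, 1, clean.shape[0])
# tx = np.polyval(np.random.uniform(-20, 20, 3), rows)
# rz = np.polyval(np.random.uniform(-0.05, 0.05, 3), rows)
```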
Correction Results of VanillaCNN
[Figure: (1) translations only and (2) translations and rotations – distorted images and their corrections by VanillaCNN]
VanillaCNN Filters Cover a Smaller Field of View
- Convolution filter sizes per layer: 11x11, 7x7, 5x5, 3x3.
- Cumulative receptive field without maxpooling: 11x11 → 18x18 → 23x23 → 26x26.
- Cumulative receptive field with maxpooling (/2): 11x11 → 25x25 → 45x45 → 69x69.
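This comparison follows from the standard receptive-field recurrence. The helper below reproduces that arithmetic; since the exact conv/pool configuration behind the slide's figures is not recoverable here, the printed values illustrate the trend (pooling multiplies the effective stride, so the field of view grows much faster) rather than matching the slide's numbers exactly:

```python
def receptive_field(layers):
    """Cumulative receptive field for a stack of (kernel, stride) layers.

    Standard recurrence: rf += (k - 1) * jump; jump *= stride.
    """
    rf, jump = 1, 1
    out = []
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
        out.append(rf)
    return out

convs = [(11, 1), (7, 1), (5, 1), (3, 1)]
print(receptive_field(convs))                       # without maxpooling
with_pool = [(11, 1), (2, 2), (7, 1), (2, 2), (5, 1), (2, 2), (3, 1)]
print(receptive_field(with_pool))                   # with 2x2 maxpooling
```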
Modify the Filter Shape to Suit Rolling Shutter
- Square convolution filters capture information slowly, over many layers.
- Long convolution filters capture information along the row dimension early, and hence along the time dimension early.
RowColCNN Architecture
- Initial feature extraction with long row/column filters, followed by feature combination through a VanillaCNN-style stage.
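A minimal PyTorch sketch of the long-filter front end: one branch with a filter elongated along the row and one elongated across rows (i.e. across exposure times), whose responses are concatenated and passed to a VanillaCNN-style trunk. The kernel sizes and channel counts are placeholders, not the talk's architecture:

```python
import torch
import torch.nn as nn

class RowColFront(nn.Module):
    """Initial feature extraction with long (elongated) filters."""
    def __init__(self, in_ch=1, out_ch=16):
        super().__init__()
        # Long horizontal filter: wide along the row, short across rows.
        self.row = nn.Conv2d(in_ch, out_ch, kernel_size=(3, 31), padding=(1, 15))
        # Long vertical filter: spans many rows, i.e. many exposure times.
        self.col = nn.Conv2d(in_ch, out_ch, kernel_size=(31, 3), padding=(15, 1))

    def forward(self, x):
        # Concatenate the two elongated-filter responses for the trunk.
        return torch.cat([torch.relu(self.row(x)), torch.relu(self.col(x))], dim=1)

# trunk = VanillaCNN-style convolutions operating on the 2*out_ch feature maps
# model = nn.Sequential(RowColFront(), ...)
```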
Use Long Filters for RowColCNN
[Plot: motion mean squared error]
Correction Results of RowColCNN Vijay Rengarajan, Yogesh Balaji, and A.N. Rajagopalan, “Unrolling the Shutter: CNN to Correct Motion Distortions,” IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, July 2017.
Learning Wins in Challenging Conditions
[Figure: distorted inputs, geometry-based correction (Rengarajan et al., 2016), and learning-based correction; examples from Heflin et al., 2010]
Machines (Only) as Good as Training Data
Geometry-based correction:
- Human-tailored distorted-feature selection.
- Motion mapping based on a straightness measure.
Learning-based correction:
- Machines learn the feature extraction and the motion mapping.
- Dependent on the training data: trained on building datasets.
Vijay Rengarajan, A. N. Rajagopalan, and R. Aravind, "From Bows to Arrows: Rolling Shutter Rectification of Urban Scenes," IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, June 2016.
Vijay Rengarajan, Yogesh Balaji, and A. N. Rajagopalan, "Unrolling the Shutter: CNN to Correct Motion Distortions," IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, July 2017.
Demo: apvijay.github.io/rs_demo
Backup
Rolling Shutter Causes Local Distortions
- Different rows see different poses of the camera under motion.
- Even a short exposure causes distortions.
[Figure: scene vs. captured image for a horizontal camera translation tx and an in-plane rotation rz over time]
What to learn? Motion or Image?