Perceptual real-time 2D-to-3D conversion using cue fusion

Thomas Leimkühler, Petr Kellnhofer, Tobias Ritschel, Karol Myszkowski, Hans-Peter Seidel
Stereo 3D

- Stereo 3D has become a significant part of visual media production; 3D television has arrived.
- Problem 1: Viewing discomfort. Solution: careful content production and improved hardware technology.
- Problem 2: Increased production costs compared to 2D. Solution: 2D-to-3D conversion.

Image source: http://vr-zone.com/articles/sony-and-panasonic-to-use-lgs-3d-technology-in-their-tvs/15416.html

Leimkühler et al.: Perceptual real-time 2D-to-3D conversion using cue fusion. GI 2016, Victoria/Canada, June 1st – 3rd, 2016
2D-to-3D Conversion

- The least expensive method for producing stereo 3D.
- The only method that can handle 2D legacy content.
- Ideally runs in real time, i.e., on-the-fly conversion.

[Figure: mono image → disparity map → stereo image]
2D-to-3D Conversion

- Massive research effort targets accurate 3D reconstruction: a very hard problem, far from being solved; higher-quality systems take minutes per frame.
- Exact 3D reconstruction is not necessary for stereo 3D: we can exploit the limits of human stereo perception, which low-pass filters disparity in space and time, except at luminance discontinuities [Kane et al. 2014, Kellnhofer et al. 2015].
- This relaxes the problem.
Plausible 2D-to-3D Conversion

A mono input need not be converted with ground-truth depth: stereo generated from a suitably distorted depth map can be perceptually equivalent to stereo generated from the ground truth.

[Figure: mono input; ground-truth vs. distorted depth; stereo from ground-truth depth ≡ stereo from distorted depth]
Monocular Depth Cues

Aerial perspective, defocus, perspective, occlusion, motion parallax, among others.
The Idea

Use several monocular cues to produce binocular disparity.

Wish list:
- Real-time system: efficient inference, GPU processing
- Robust fusion: resolve contradicting cues
- Spatial and temporal coherence: long-range exchange of information
- Probabilistic model: per-pixel normal distributions of disparity; confidence-aware processing
Pipeline

1. Learning disparity priors (pre-process)
2. Cue extraction (runtime)
3. Cue fusion (runtime)
4. Stereo image generation (runtime)
Disparity Priors

- Acquired from our own, publicly available stereo database.
- Conditioned on:
  - Scene class: Close-up, Coast, Forest, Indoor, Inside City, Mountain, Open Country, Portrait, Street, Tall Buildings
  - Location in the image plane
  - Appearance
- An SVM is trained for scene classification (e.g., “Close-up”, “Forest”).
Disparity Priors

[Figure: appearance samples, mean disparity, and confidence for the “Open Country” and “Portrait” priors]
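The prior slides can be summarized as a lookup: given the SVM's scene class, the system retrieves a per-pixel mean-disparity map and a confidence map learned offline. A minimal sketch with toy data standing in for the learned maps (the map contents and the `lookup_prior` helper are illustrative, not the database's actual values):

```python
import numpy as np

# Hypothetical prior database: per scene class, a per-pixel mean-disparity
# map mu_0 and confidence map sigma_0^{-2}, learned offline from the
# stereo database (here filled with toy data at a tiny 4x6 resolution).
H, W = 4, 6
yy = np.linspace(0.0, 1.0, H)[:, None] * np.ones((1, W))
priors = {
    # open-country scenes: disparity grows toward the bottom of the frame
    "Open Country": (yy, np.full((H, W), 2.0)),
    # portraits: near subject in the image center, lower overall confidence
    "Portrait": (np.exp(-((np.indices((H, W))[1] - W / 2) ** 2) / 8.0),
                 np.full((H, W), 0.5)),
}

def lookup_prior(scene_class):
    """Return (mu_0, sigma_0^{-2}) for the classified scene; at runtime,
    scene_class would come from the trained SVM classifier."""
    return priors[scene_class]

mu0, conf0 = lookup_prior("Open Country")
print(mu0[0, 0], mu0[-1, 0])  # 0.0 at the top row, 1.0 at the bottom row
```

The prior then enters the fusion stage as just another Gaussian disparity estimate per pixel.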
Pipeline recap — next step: 2. Cue extraction (runtime).
Defocus

Local defocus blur, estimated from the input’s Laplacian pyramid, yields a per-pixel disparity mean and confidence.

[Figure: input, Laplacian pyramid, disparity mean, confidence]
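The defocus cue's idea can be sketched as follows: sharp pixels carry energy in fine Laplacian bands, blurred pixels only in coarse ones, so the energy-weighted mean band is a crude per-pixel blur measure. This is a simplified illustration of the principle, not the paper's exact estimator; the sharper-means-nearer mapping and the pooling choices are assumptions:

```python
import numpy as np

def blur(img):
    """Separable 3-tap binomial blur with edge replication."""
    p = np.pad(img, ((0, 0), (1, 1)), mode="edge")
    img = 0.25 * p[:, :-2] + 0.5 * p[:, 1:-1] + 0.25 * p[:, 2:]
    p = np.pad(img, ((1, 1), (0, 0)), mode="edge")
    return 0.25 * p[:-2] + 0.5 * p[1:-1] + 0.25 * p[2:]

def defocus_cue(img, levels=3):
    """Illustrative defocus cue from Laplacian band energies.

    Returns a disparity mean (sharper -> nearer, an assumed mapping) and
    a confidence that is low in textureless regions, where defocus is
    unobservable.
    """
    energies, cur = [], img
    for _ in range(levels):
        low = blur(cur)
        lap = cur - low                # Laplacian band at this scale
        energies.append(blur(np.abs(lap)))  # locally pooled band energy
        cur = low
    E = np.stack(energies)             # (levels, H, W)
    total = E.sum(0) + 1e-8
    lvl = (E * np.arange(levels)[:, None, None]).sum(0) / total
    disparity = 1.0 - lvl / (levels - 1)   # fine-band energy -> near
    return disparity, total            # confidence = total band energy

# Textured left half, flat right half: confidence is high only on the left.
img = np.zeros((16, 16))
img[:, :8] = np.indices((16, 8)).sum(0) % 2
d, c = defocus_cue(img)
```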
Aerial Perspective

Haze from atmospheric scattering indicates distance and yields a per-pixel disparity mean and confidence.

[Figure: input, disparity mean, confidence]
Vanishing Points

Lines are accumulated to locate vanishing points; perspective then yields a per-pixel disparity mean and confidence.

[Figure: input, line accumulation, disparity mean, confidence]
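A minimal sketch of the two parts of this cue, under simplifying assumptions: pairwise line intersections with a median vote stand in for the slides' line-accumulation step, and disparity is taken to grow with image-plane distance from the vanishing point. Both choices are illustrative, not the paper's exact algorithm:

```python
import numpy as np

def vanishing_point(lines):
    """Estimate a vanishing point from line segments (illustrative).

    lines: array (n, 4) of segments (x0, y0, x1, y1). Every pair of
    non-parallel lines votes with its intersection point; the per-axis
    median of the votes is a simple robust accumulator.
    """
    pts = []
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            x0, y0, x1, y1 = lines[i]
            x2, y2, x3, y3 = lines[j]
            d1 = np.array([x1 - x0, y1 - y0])
            d2 = np.array([x3 - x2, y3 - y2])
            denom = d1[0] * d2[1] - d1[1] * d2[0]
            if abs(denom) < 1e-9:
                continue               # parallel lines: no vote
            t = ((x2 - x0) * d2[1] - (y2 - y0) * d2[0]) / denom
            pts.append([x0 + t * d1[0], y0 + t * d1[1]])
    return np.median(np.array(pts), axis=0)

def perspective_disparity(shape, vp):
    """Assumed mapping: disparity grows with distance from the vanishing
    point in the image plane (farther from the VP = closer to the viewer)."""
    yy, xx = np.indices(shape, dtype=float)
    d = np.hypot(xx - vp[0], yy - vp[1])
    return d / d.max()

# Three synthetic segments, all passing through (50, 40).
lines = np.array([[0., 0., 50., 40.],
                  [100., 0., 50., 40.],
                  [50., 100., 50., 40.]])
vp = vanishing_point(lines)
print(vp)  # approximately [50. 40.]
```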
Occlusion

T-junctions, detected with a filter bank, indicate local depth ordering and yield a per-pixel disparity mean and confidence.

[Figure: input, T-junctions, filter bank, disparity mean, confidence]
Motion

Optical flow provides motion parallax, yielding a per-pixel disparity mean and confidence.

[Figure: input, optical flow, disparity mean, confidence]
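The motion-parallax cue can be sketched as follows, assuming the flow field is already given: under a laterally moving camera, larger apparent motion means a nearer point, so normalized flow magnitude serves as a disparity mean. The confidence term here (deviation from a locally averaged flow) is a crude stand-in for a real flow-reliability measure, not the paper's:

```python
import numpy as np

def motion_cue(flow):
    """Illustrative motion-parallax cue from an optical-flow field.

    flow: (H, W, 2) pixel displacements between consecutive frames.
    Returns a disparity mean (normalized flow magnitude) and a confidence
    that drops where the flow is locally inconsistent.
    """
    mag = np.hypot(flow[..., 0], flow[..., 1])
    disparity = mag / (mag.max() + 1e-8)
    # Local inconsistency: deviation from a 3x3 box-averaged flow field.
    pad = np.pad(flow, ((1, 1), (1, 1), (0, 0)), mode="edge")
    local = sum(pad[i:i + flow.shape[0], j:j + flow.shape[1]]
                for i in range(3) for j in range(3)) / 9.0
    err = np.hypot(*(flow - local).transpose(2, 0, 1))
    confidence = 1.0 / (1.0 + err)
    return disparity, confidence

# Left half moves fast (near), right half slowly (far).
flow = np.zeros((4, 8, 2))
flow[:, :4, 0] = 2.0
flow[:, 4:, 0] = 0.5
d, c = motion_cue(flow)
```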
Pipeline recap — next step: 3. Cue fusion (runtime).
Step 1: Maximum Likelihood Estimation

The per-pixel cue estimates are fused by precision weighting:

$$\mu_{\mathrm{MLE}}(\mathbf{x}) = \frac{1}{Z(\mathbf{x})} \sum_{i=1}^{n_c} \sigma_i^{-2}(\mathbf{x})\,\mu_i(\mathbf{x}),$$

where $\mu_i$ is the mean disparity of cue $i$, $\sigma_i^{-2}$ its confidence (inverse variance), and $Z(\mathbf{x})$ the normalization constant.
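Per pixel, this is the standard precision-weighted mean of independent Gaussian estimates, which vectorizes directly over the image:

```python
import numpy as np

def fuse_mle(means, confidences):
    """Per-pixel maximum-likelihood fusion of Gaussian disparity estimates.

    means, confidences: arrays of shape (n_cues, H, W); confidences hold
    the inverse variances sigma_i^{-2}(x). Returns the precision-weighted
    mean mu_MLE(x), with Z(x) the sum of the precisions.
    """
    z = confidences.sum(axis=0)                 # normalization Z(x)
    return (confidences * means).sum(axis=0) / z

# Two toy 1x1 "cues": disparities 2.0 and 6.0 with confidences 3 and 1.
means = np.array([[[2.0]], [[6.0]]])
conf = np.array([[[3.0]], [[1.0]]])
print(fuse_mle(means, conf))  # (3*2 + 1*6)/4 = 3.0
```

The more confident cue pulls the fused disparity toward its own estimate; the same structure scales to full-resolution maps on the GPU.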
Step 2: Maximum a Posteriori Estimation

The learned prior $(\mu_0, \sigma_0^{-2})$ is added to the cue evidence:

$$\mu_{\mathrm{MAP}}(\mathbf{x}) = \frac{1}{Z(\mathbf{x})} \left( \sigma_0^{-2}(\mathbf{x})\,\mu_0(\mathbf{x}) + \sum_{i=1}^{n_c} \sigma_i^{-2}(\mathbf{x})\,\mu_i(\mathbf{x}) \right).$$
Step 3: Robust Estimation

Plain MAP estimation is sensitive to contradicting cues; robust MAP estimation suppresses outlier cues.

[Figure: per-pixel disparity distributions (far to near) for the prior and the aerial-perspective, defocus, vanishing-point, occlusion, and motion cues; MAP vs. robust MAP estimation]
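The slides do not spell out the robust estimator, so the following is one generic way to realize the idea, not necessarily the paper's method: start from the precision-weighted mean and iteratively downweight cues whose estimates disagree with the current consensus, using a Geman-McClure-style weight on the residual:

```python
import numpy as np

def fuse_robust(means, precisions, iters=5, scale=1.0):
    """Illustrative robust fusion via iterative reweighting.

    At each iteration, every cue's precision is multiplied by
    1 / (1 + (r/scale)^2), where r is its residual against the current
    fused estimate, so contradicting cues lose influence. The scale and
    iteration count are arbitrary illustration choices.
    """
    w = precisions.copy()
    mu = (w * means).sum(axis=0) / w.sum(axis=0)
    for _ in range(iters):
        r = means - mu                           # per-cue residuals
        w = precisions / (1.0 + (r / scale) ** 2)
        mu = (w * means).sum(axis=0) / w.sum(axis=0)
    return mu

# Four cues agree near 1.0; one outlier at 10.0 gets suppressed.
means = np.array([0.9, 1.0, 1.1, 1.0, 10.0]).reshape(5, 1, 1)
prec = np.ones_like(means)
print(fuse_robust(means, prec)[0, 0])  # close to 1.0, not the plain mean 2.8
```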
Step 4: Pairwise Estimation

The robust MAP disparities are smoothed over the entire space-time domain $\Omega$ with a confidence-weighted joint-bilateral filter:

$$\mu(\mathbf{x}) = \frac{1}{Z(\mathbf{x})} \int_\Omega v(\mathbf{x},\mathbf{y})\, \sigma_{\mathrm{MAP}}^{-2}(\mathbf{y})\, \mu_{\mathrm{MAP}}(\mathbf{y})\, \mathrm{d}\mathbf{y},$$

where the pairwise weight $v(\mathbf{x},\mathbf{y}) = \mathcal{N}(\lVert\mathbf{x}-\mathbf{y}\rVert;\, \sigma_d)\; \mathcal{N}(\lVert I(\mathbf{x})-I(\mathbf{y})\rVert;\, \sigma_r)$ combines a spatial and a range kernel, with $\sigma_d$ and $\sigma_r$ related to perceptual findings.
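A brute-force sketch of this pairwise step on a single 1-D scanline (in the paper, $\Omega$ is the full space-time volume and the filter would be evaluated far more efficiently; the kernel widths here are arbitrary). The key behavior is visible even in 1-D: disparity is low-pass filtered except across luminance edges:

```python
import numpy as np

def bilateral_fuse(mu_map, prec_map, image, sigma_d=2.0, sigma_r=0.1):
    """Confidence-weighted joint-bilateral smoothing of fused disparity.

    Each output pixel averages mu_MAP over the whole domain, weighted by
    the MAP precision sigma_MAP^{-2}(y) and the pairwise kernel
    v(x, y) = N(||x-y||; sigma_d) * N(|I(x)-I(y)|; sigma_r).
    """
    n = len(mu_map)
    xs = np.arange(n)
    out = np.empty(n)
    for x in range(n):
        v = (np.exp(-0.5 * ((xs - x) / sigma_d) ** 2)
             * np.exp(-0.5 * ((image - image[x]) / sigma_r) ** 2))
        w = v * prec_map
        out[x] = (w * mu_map).sum() / w.sum()
    return out

# A luminance edge at index 4 keeps the two disparity plateaus separate.
image = np.array([0., 0., 0., 0., 1., 1., 1., 1.])
mu = np.array([1., 1.2, 0.9, 1., 5., 5.1, 4.9, 5.])
prec = np.ones(8)
print(bilateral_fuse(mu, prec, image))
```

Noise within each plateau is smoothed away, while the range kernel prevents the two sides of the edge from bleeding into each other.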
Pipeline recap — next step: 4. Stereo image generation (runtime).
Results

[Figure: result disparity maps with the dominant priors (“Tall Buildings”, “Mountain”, “Forest”) and cues (vanishing point, defocus, occlusion, aerial perspective) per scene]
Results

[Figure: result for a forest scene — “Forest” prior, aerial-perspective cue]
Results

[Figure: result for a street scene — “Street” prior; defocus, motion, and aerial-perspective cues]
Evaluation

Perceptual study:
- We outperform existing real-time 2D-to-3D conversion systems.
- We achieve similar (and sometimes better) user preference compared to offline methods.

Quantitative evaluation:
- Very similar results across the tested methods.
- No reliable quality metric for stereo 3D content exists.
Conclusion

Real-time 2D-to-3D conversion can be successful if
- we aim at reconstructing plausible (rather than exact) disparity,
- multiple sources of information are combined, and
- we use a simple yet expressive probabilistic model allowing for parallel inference.
Results Gallery & Prior Database
http://resources.mpi-inf.mpg.de/StereoCueFusion

Acknowledgments: Adam Laskowski, Dushyant Mehta, Elena Arabadzhiyska, Krzysztof Templin, and Waqar Khan

Contact: tleimkueh@mpi-inf.mpg.de

Thank you!