THE UNIVERSITY OF BRITISH COLUMBIA
Random Forests-Based 2D-to-3D Video Conversion
Presenter: Mahsa Pourazad
M. Pourazad, P. Nasiopoulos, and A. Bashashati
Outline
- Introduction to 3D TV & 3D content
- Motivation for 2D-to-3D video conversion
- Proposed 2D-to-3D video conversion scheme
- Conclusions
Introduction to 3D TV & 3D content:
- Stereoscopic dual camera -> stereo video
- 3D depth range camera -> 2D video + depth map -> image-based rendering technique -> stereo video
Motivation for 2D-to-3D video conversion:
- Industry is investing in 3D TV and broadcasting.
- Hollywood is already investing in 3D technology.
- Are we ready for this? No! One of the issues is the lack of content.
- Converting existing 2D content to 3D makes it possible to resell existing content (movies, TV series, etc.).
How it works - 3D perception
Sharpness, motion, occlusion, texture, perspective, and…
2D-to-3D video conversion:
2D video -> 2D-to-3D conversion -> depth map
- Monocular depth cues: motion parallax, sharpness, occlusion, and…
- Proper integration of more monocular depth cues results in a more accurate depth map estimate (imitating the human brain).
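Depth estimation for 2D video using motion:
- Stereoscopic cameras: the left view and the right view of the same time instant are related by a disparity vector.
- 2D video camera: consecutive frames over time are related by a motion vector.
- Idea: the motion vector resembles the disparity vector.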
Motion-based 2D-to-3D video conversion*:
- Principle: near objects move faster across the retina than farther objects do.
- Diagram blocks: 2D video; motion vectors (MVs); motion correction (camera motion correction, object-based motion correction); object-based motion estimation; non-linear transforming model; estimated depth map.
- Main issue: estimating depth information for static objects.
* Pourazad, M.T., Nasiopoulos, P. and Ward, R.K. (2009) An H.264-based scheme for 2D to 3D video conversion. IEEE Transactions on Consumer Electronics, vol. 55, no. 2.
Our suggested scheme (integrating multiple monocular depth cues):
2D video -> 4x4 blocks -> extracting features representing monocular depth cues (motion parallax)
- Motion parallax: implement a block-matching technique between consecutive frames.
Motion parallax depth cue:
- Disparity over time (motion vectors) is found with the Depth Estimation Reference Software (designed for multiview streams).
- Frame rows shown on the slide (F: 2D video frames):
  F0 F1 F2 F3 …
  F1 F2 F3 F4 …
  F3 F4 F5 F6 …
- Camera labels shown on the slide: virtual camera (left camera), 2D video camera (center camera), virtual camera (right camera).
- The estimated disparity over time for each 4x4 block represents motion parallax.
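The block-matching idea named on the previous slide can be illustrated with a minimal sketch. This is not the Depth Estimation Reference Software used in the talk; it is a plain full-search matcher that minimizes the sum of absolute differences (SAD). The function names, the search range, and the use of the vector magnitude as the motion parallax feature are assumptions made for illustration.

```python
import numpy as np

def block_motion_vectors(prev, curr, block=4, search=4):
    """Full-search block matching (illustrative, not the talk's software).

    For every block of `curr`, find the displacement (dy, dx) into `prev`
    that minimizes the sum of absolute differences (SAD).
    """
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    h, w = curr.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y0, x0 = by * block, bx * block
            target = curr[y0:y0 + block, x0:x0 + block]
            best_cost, best_mv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = y0 + dy, x0 + dx
                    # Skip candidates that fall outside the previous frame.
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    cost = np.abs(target - prev[y:y + block, x:x + block]).sum()
                    if cost < best_cost:
                        best_cost, best_mv = cost, (dy, dx)
            mvs[by, bx] = best_mv
    return mvs

def motion_parallax_feature(mvs):
    """Assumed feature: magnitude of each block's motion vector."""
    return np.hypot(mvs[..., 0], mvs[..., 1])
```

Full search is quadratic in the search range, so it is only practical here because the blocks and the range are small.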
Our suggested scheme (integrating multiple monocular depth cues):
2D video -> 4x4 blocks -> extracting features representing monocular depth cues (motion parallax, texture variation)
- Texture variation: the texture of a textured material is more apparent when the material is closer.
Texture variation depth cue:
- Laws' texture energy masks F are applied to the luma information I (Y) of each 4x4 block.
- Laws' texture energy masks: L3L3, L3E3, L3S3, E3L3, E3E3, E3S3, S3L3, S3E3, S3S3
- A feature set with 18 components represents the texture variation depth cue for each 4x4 block.
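A minimal sketch of how an 18-component texture feature could be computed. The slide's energy formula did not survive in this transcript, so the sketch assumes the common definition in which each of the nine mask responses is summed over the block both as an absolute value and as a square (9 masks x 2 = 18). The 3-tap vectors L3 = [1, 2, 1], E3 = [-1, 0, 1], and S3 = [-1, 2, -1] are the standard Laws vectors; the function names are illustrative.

```python
import numpy as np

L3 = np.array([1.0, 2.0, 1.0])    # local average
E3 = np.array([-1.0, 0.0, 1.0])   # edge
S3 = np.array([-1.0, 2.0, -1.0])  # spot

def filter3x3(img, mask):
    """Correlate `img` with a 3x3 mask using edge padding.

    Correlation differs from convolution only by a sign flip for these
    masks, which does not change the absolute or squared energies.
    """
    h, w = img.shape
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += mask[i, j] * padded[i:i + h, j:j + w]
    return out

def block_sum(x, block=4):
    """Sum `x` over non-overlapping block x block tiles."""
    hb, wb = x.shape[0] // block, x.shape[1] // block
    tiles = x[:hb * block, :wb * block].reshape(hb, block, wb, block)
    return tiles.sum(axis=(1, 3))

def laws_texture_features(luma, block=4):
    """Return an array of shape (blocks_y, blocks_x, 18)."""
    feats = []
    for a in (L3, E3, S3):
        for b in (L3, E3, S3):
            resp = filter3x3(luma, np.outer(a, b))
            feats.append(block_sum(np.abs(resp), block))  # assumed k = 1
            feats.append(block_sum(resp ** 2, block))     # assumed k = 2
    return np.stack(feats, axis=-1)
```

On a flat image only the L3L3 mask responds, because every mask built from E3 or S3 sums to zero.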
Our suggested scheme (integrating multiple monocular depth cues):
2D video -> 4x4 blocks -> extracting features representing monocular depth cues (motion parallax, texture variation, haze)
- Haze: distant objects appear less distinct and more bluish than nearby objects because of haze.
Haze depth cue:
- Haze is reflected in the low-frequency information of chroma (U & V).
- The L3L3 Laws' texture energy mask F (local averaging) is applied to the chroma information C (U & V) of each 4x4 block.
- A feature set with 4 components represents the haze depth cue for each 4x4 block.
Our suggested scheme (integrating multiple monocular depth cues):
2D video -> 4x4 blocks -> extracting features representing monocular depth cues (motion parallax, texture variation, haze, perspective)
- Perspective: the more the lines converge, the farther away they appear to be.
- The Radon transform is applied to the luma information of each block (θ ∈ {0°, 30°, 60°, 90°, 120°, 150°}).
- The amplitude and phase of the most dominant edge are selected.
- Feature set with 2 components.
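A rough sketch of the perspective feature, under stated assumptions. The slide does not define "amplitude" and "phase", so this sketch takes the amplitude to be the largest projection value over the six angles and the phase to be the angle at which it occurs. The discrete Radon transform below bins pixels by their rounded projected coordinate, which is a simplification chosen to keep the example dependency-free; all names are illustrative.

```python
import numpy as np

ANGLES_DEG = (0, 30, 60, 90, 120, 150)

def radon_projection(block, angle_deg):
    """Discrete Radon projection of `block` at one angle.

    Each pixel contributes to the bin given by its rounded coordinate
    s = x*cos(theta) + y*sin(theta), so theta = 0 integrates along columns.
    """
    h, w = block.shape
    y, x = np.mgrid[0:h, 0:w]
    theta = np.deg2rad(angle_deg)
    s = x * np.cos(theta) + y * np.sin(theta)
    bins = np.floor(s - s.min() + 0.5).astype(int)
    return np.bincount(bins.ravel(), weights=block.astype(np.float64).ravel())

def dominant_edge(block):
    """Return (amplitude, angle in degrees) of the strongest projection peak."""
    best_amp, best_angle = -np.inf, ANGLES_DEG[0]
    for angle in ANGLES_DEG:
        peak = np.abs(radon_projection(block, angle)).max()
        if peak > best_amp:
            best_amp, best_angle = peak, angle
    return best_amp, best_angle
```

With this convention a vertical line peaks at 0° and a horizontal line at 90°. A 4x4 block gives very coarse projections, so the sketch accepts any block size.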
Our suggested scheme (integrating multiple monocular depth cues):
2D video -> 4x4 blocks -> extracting features representing monocular depth cues (motion parallax, texture variation, haze, perspective, vertical coordinate)
- Vertical coordinate: in general, objects closer to the bottom border of the image are closer to the viewer.
- The feature set includes the vertical spatial coordinate of each 4x4 block (as a percentage of the frame's height).
Our suggested scheme (integrating multiple monocular depth cues):
2D video -> 4x4 blocks -> extracting features representing monocular depth cues (motion parallax, texture variation, haze, perspective, vertical coordinate, sharpness)
- Sharpness: closer objects appear sharper.
- The sharpness of each 4x4 block is measured with the diagonal Laplacian method*.
* A. Thelen, S. Frey, S. Hirsch, and P. Hering, "Improvements in shape-from-focus for holographic reconstructions with regard to focus operators, neighborhood-size, and height value interpolation", IEEE Trans. on Image Processing, vol. 18, no. 1, 2009.
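A minimal sketch of a diagonal Laplacian sharpness measure. It follows the form in which this operator is commonly described: the modified Laplacian (absolute horizontal and vertical second differences) plus the two diagonal second differences weighted by 1/sqrt(2). The exact variant and block aggregation used in the talk are not stated on the slide, so summing the response over each 4x4 block is an assumption.

```python
import numpy as np

def diagonal_laplacian(luma):
    """Per-pixel diagonal Laplacian response with edge padding."""
    p = np.pad(luma.astype(np.float64), 1, mode="edge")
    c = p[1:-1, 1:-1]
    horiz = np.abs(2 * c - p[1:-1, :-2] - p[1:-1, 2:])
    vert = np.abs(2 * c - p[:-2, 1:-1] - p[2:, 1:-1])
    # Diagonal neighbors are sqrt(2) farther away, hence the weight.
    diag1 = np.abs(2 * c - p[:-2, :-2] - p[2:, 2:]) / np.sqrt(2)
    diag2 = np.abs(2 * c - p[:-2, 2:] - p[2:, :-2]) / np.sqrt(2)
    return horiz + vert + diag1 + diag2

def block_sharpness(luma, block=4):
    """Assumed aggregation: sum the response over each block."""
    resp = diagonal_laplacian(luma)
    hb, wb = resp.shape[0] // block, resp.shape[1] // block
    tiles = resp[:hb * block, :wb * block].reshape(hb, block, wb, block)
    return tiles.sum(axis=(1, 3))
```

A flat region scores zero, while fine detail such as a checkerboard scores high, which matches the cue that closer objects appear sharper.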
Our suggested scheme (integrating multiple monocular depth cues):
2D video -> 4x4 blocks -> extracting features representing monocular depth cues (motion parallax, texture variation, haze, perspective, vertical coordinate, sharpness, occlusion)
- Occlusion: an object that overlaps or partly obscures our view of another object is closer.
- All feature sets are extracted for each 4x4 patch at three different image-resolution levels (1, 1/2, and 1/4).
- This captures occlusion and accounts for global features.
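A minimal sketch of extracting features at three resolution levels and concatenating them per block. How the talk aligns patches across levels is not stated, so this sketch assumes each full-resolution block borrows the features of the coarser block that covers it. `toy_block_features` is a hypothetical placeholder for the real cue extractors. If each level yields 27 components (for example 1 motion + 18 texture + 4 haze + 2 perspective + 1 vertical coordinate + 1 sharpness, a breakdown the slides do not spell out), three levels give the 81 dimensions quoted on the next slide.

```python
import numpy as np

def downsample2(img):
    """Halve the resolution by averaging 2x2 neighborhoods."""
    h, w = img.shape
    img = img[:h - h % 2, :w - w % 2]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def toy_block_features(img, block=4):
    """Hypothetical stand-in for the cue extractors: block mean and variance."""
    hb, wb = img.shape[0] // block, img.shape[1] // block
    tiles = img[:hb * block, :wb * block].reshape(hb, block, wb, block)
    return np.stack([tiles.mean(axis=(1, 3)), tiles.var(axis=(1, 3))], axis=-1)

def multiscale_features(img, extract=toy_block_features, block=4, levels=3):
    """Concatenate per-block features from resolutions 1, 1/2, 1/4, ...

    The frame must be large enough that the coarsest level still holds
    at least one block (at least 16x16 pixels for the defaults).
    """
    img = img.astype(np.float64)
    base = extract(img, block)
    by, bx = np.mgrid[0:base.shape[0], 0:base.shape[1]]
    per_level = [base]
    for level in range(1, levels):
        img = downsample2(img)
        coarse = extract(img, block)
        step = 2 ** level
        # Assumed alignment: the coarse block covering the fine block.
        iy = np.minimum(by // step, coarse.shape[0] - 1)
        ix = np.minimum(bx // step, coarse.shape[1] - 1)
        per_level.append(coarse[iy, ix])
    return np.concatenate(per_level, axis=-1)
```

Coarser levels summarize a wider neighborhood around each block, which is one way the same local features can carry more global context.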
Our suggested scheme (integrating multiple monocular depth cues):
2D video -> 4x4 blocks -> extracting features representing monocular depth cues (motion parallax, texture variation, haze, perspective, vertical coordinate, sharpness, occlusion) -> 81-dimensional feature vectors -> Random Forests (RF) machine learning -> depth-map model estimation
- RF: a classification and regression technique that is a collection of individual Decision Trees (DTs)*.
- Each DT is built from randomly selected input feature vectors.
- Application: cases where individual DTs do not perform well on unseen test data, but the combined output of the DTs does.
- Training set: the input is the feature vectors of 4x4 blocks of key frames whose pixels mostly belong to a common object; the output is the known depth values.
- Test set: 4x4 blocks of an unseen video.
* L. Breiman, and A. Cutler, "Random forest." Machine Learning, 45, pp. 5–32.
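A minimal sketch of the regression step, assuming scikit-learn's `RandomForestRegressor` as the implementation (the talk does not say which software was used). The training data here is synthetic: random 81-dimensional vectors whose toy depth depends on a single feature. The number of trees and the function names are illustrative choices, not values from the talk.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_depth_model(features, depths, n_trees=50, seed=0):
    """Fit a random forest mapping block feature vectors to depth values."""
    model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    model.fit(features, depths)
    return model

def make_toy_data(n_blocks, rng, n_features=81):
    """Synthetic stand-in for real training blocks with known depth."""
    X = rng.random((n_blocks, n_features))
    y = 255.0 * X[:, 0]  # toy rule: depth driven by one feature
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = make_toy_data(500, rng)  # blocks of key frames
X_test, y_test = make_toy_data(100, rng)    # blocks of an unseen video

model = train_depth_model(X_train, y_train)
predicted_depth = model.predict(X_test)     # one depth value per 4x4 block
mean_abs_error = np.abs(predicted_depth - y_test).mean()
```

Each tree is trained on a bootstrap sample of the training blocks, and the forest averages the trees' outputs, which is what lets it generalize better than any single tree.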
Our suggested scheme (integrating multiple monocular depth cues):
2D video -> 4x4 blocks -> extracting features representing monocular depth cues (motion parallax, texture variation, haze, perspective, vertical coordinate, sharpness, occlusion) -> 81-dimensional feature vectors -> Random Forests (RF) machine learning -> depth-map model estimation -> estimated depth map
- Mean shift image segmentation* provides object-based depth information.
- Estimated depth map + object-based depth information -> depth map
* D. Comaniciu, and P. Meer, "Mean Shift: A Robust Approach toward Feature Space Analysis," IEEE Trans. Pattern Analysis Machine Intell., vol. 24, no. 5.
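A minimal sketch of turning a block-level depth estimate into object-based depth. The segmentation itself is not implemented here: `labels` is assumed to be an integer label map produced by mean shift or any other segmenter. Assigning each segment the mean of its estimated depths is also an assumption, since the slide does not say how depth is aggregated per object.

```python
import numpy as np

def object_based_depth(block_depth, labels, block=4):
    """Give every pixel the mean estimated depth of its segment.

    block_depth: (H/block, W/block) depth estimates, one per block.
    labels:      (H, W) non-negative integer segment labels (assumed given).
    """
    # Upsample block depths to pixel resolution by repetition.
    pixel_depth = np.kron(block_depth, np.ones((block, block)))
    if pixel_depth.shape != labels.shape:
        raise ValueError("labels must match the upsampled depth map shape")
    flat = labels.ravel()
    sums = np.bincount(flat, weights=pixel_depth.ravel())
    counts = np.bincount(flat)
    means = sums / np.maximum(counts, 1)  # guard against unused label ids
    return means[labels]
```

For example, a segment covering two blocks with estimated depths 10 and 30 receives a uniform depth of 20, which removes block-to-block variation inside one object.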
Experiments:
- Training sequences:
- Test sequences:
Results:
- Panels: 2D video; available depth map; existing motion-based technique; our proposed technique
- Subjective test (ITU-R BT ): 18 people graded the stereo videos from 1 to 10.
Conclusions:
- A new and efficient 2D-to-3D video conversion method was presented.
- The method uses Random Forest regression to estimate the depth map model based on multiple depth cues.
- Performance evaluations show that our approach outperforms a state-of-the-art motion-based method.
- The subjective visual quality of the created 3D streams was also confirmed by watching them on a stereoscopic display.
- Our method runs in real time and can be implemented at the receiver side without placing any additional burden on the network.