An H.264-based Scheme for 2D to 3D Video Conversion Mahsa T. Pourazad Panos Nasiopoulos Rabab K. Ward IEEE Transactions on Consumer Electronics 2009
Outline Introduction to 3D television 2D-to-3D Conversion Scheme – Camera motion Correction – Correction of Displacement Estimates – Perceptual Depth Enhancement Performance Evaluation Conclusion
Introduction 3D television – Stereoscopic – Multi-view – 2D plus depth – 3D display
Introduction 2D to 3D video streams – 2D video stream + Depth map – Depth Image Based Rendering(DIBR) [1] – 2 different viewpoints (projected on left and right retinas) [1] L. Zhang, “Stereoscopic image generation based on depth images for 3D TV,” IEEE Trans. Broadcasting, vol. 51, no.2, pp , 2005.
Introduction Depth map estimation – Light, shade, relative size, motion parallax, partial occlusion, textural gradient, geometric perspective…… – Manual, semi automatic or automatic Machine learning Extract depth from blur Edge information Motion vector information – H.264/AVC standard – Can’t work on static objects
2D-to-3D Conversion Scheme Use abs(MV x ) for estimating the depth map Depth of point P can be easily obtained if the disparity d is known.
2D-to-3D Conversion Scheme H.264/AVC Motion vector estimation – variable block sizes – Quarter-pixel matching accuracy Correction – Moving camera – Object boundary Perceptual depth enhancement
Camera Motion Correction Camera panning – Recognize camera motion – Adjust “Skip Mode” – Adjust net motion Zoom in/out Check the tendency of the camera MVs are scaled accordingly [2] [2] D. Kim, D. Min, K. Sohn, “Stereoscopic video generation method using motion analysis,” 3DTV Conf. pp. 1-4, 2007.
Correction of Displacement Estimates Is this motion vector correct? – Readjust MVs by making it equal to the median MV Motion vector is very different from neighbors’ ? Object boundary pixels? MV=median of neighbors’ MV Yes No Check the variance of the corresponding block in residual frame
Perceptual Depth Enhancement Non-linear scaling model – The further the object is, the smaller the scaling factor. The enhanced disparity value (N uniformly spaced depth layer) Ex: Layer 0 (i=0, S(0)=Z far /Z near ) Layer N-1 (i=N-1, S(N-1)=1)
Performance Evaluation Video sequences – “Interview”, “Orbi” True Depth Maps – Captured by 3D-depth range camera (Zcam) – 0 to 255 (256 depth layers) JM12.2 version of the H.264/AVC standard Compare with [3] [3] I. Ideses, L. P. Yaroslavsky, and B. Fishbain, “Real-time 2D to 3D video conversion,” Journal of Real-Time Image Processing, vol. 2, no. 1, pp. 3-9, 2007.
Performance Evaluation Video sequence Recorded depth map OrbiInterview
Performance Evaluation Estimated depth map by [3] Estimated depth map by our approach
Performance Evaluation 15 people graded the videos from 1 to 10 of 3D perception and visual quality
Performance Evaluation
Badly matched pixels in the estimated depth (Th=1) InterviewOrbi Our method50%47% [3]34%27% Percentage of correctly matched pixels
Conclusion This paper present a efficient method that estimates the depth map of a 2D video sequence using its H.264/AVC estimated motion information. It can be implemented in real-time at the receiver-end, without increasing the transmission bandwidth requirement.