IMPLEMENTATION OF AN OUT-OF-THE-LOOP POST-PROCESSING TECHNIQUE FOR HEVC DECODED DEPTH-MAPS
Nayana Parashar, Multimedia Processing Lab, University of Texas at Arlington. Supervising Professor: Dr. K.R. Rao. November 25th, 2013.
CONTENTS
- Basic concepts
- Video compression
- 3D video compression
- Thesis work
- Results
- Conclusions
- Future work
- References
THESIS IN A NUTSHELL
Normal procedure: 3D video encoding (color sequence and corresponding depth map) -> 3D video decoding -> view rendering for display (stereoscopic or multi-view).
Thesis: the same pipeline, with one extra stage inserted before rendering: post-processing of the decoded depth map.
Motivation: compression-artifact removal and better perceptual quality of the rendered frames.
BASIC CONCEPTS
Image and video
Images and video make up the visual media. An image is characterized by pixels (pels), the smallest addressable elements of a display device; its properties are the number of pixels (height and width) and the color and brightness of each pixel. Video is a sequence of pictures (frames) taken at regular time (temporal) intervals.
Figure 1: 2D image with spatial samples (L) and video with N frames (R) [1]
3D video: the multi-view video plus depth format
The multi-view video plus depth (MVD) format [2] [3] is the most promising format for enhanced 3D visual experiences. For each viewpoint, this representation provides a texture (image sequence) and an associated depth-map sequence (fig. 2).
Figure 2: Color video frame (L) and associated depth-map frame (R) [4]
Depth-maps
Depth maps represent the per-pixel depth of a corresponding color image and carry the disparity information needed by the virtual (novel) view rendering system. For storage and transmission they are represented as a gray-scale image sequence in which each pixel conveys the relative distance from the camera to the object in 3D space. Their efficient compression and transmission to the decoder is important for view generation. They are never actually displayed; they are used for view-generation purposes only.
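To make the gray-scale convention concrete, here is a minimal sketch (not part of the thesis) of a common 8-bit mapping in which 255 denotes the nearest clipping plane and quantization is uniform in inverse depth; z_near and z_far are hypothetical example values.

```python
import numpy as np

def depth_value_to_metric_z(v, z_near, z_far):
    """Map an 8-bit depth value v (255 = nearest plane) to a metric
    distance, assuming quantization uniform in 1/Z."""
    v = np.asarray(v, dtype=np.float64)
    inv_z = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z

# Mid-gray (128) maps to ~1.81 m, well below the metric midpoint of
# 5.5 m, because the 8-bit codes are uniform in 1/Z rather than in Z.
print(depth_value_to_metric_z(np.array([0, 128, 255]), z_near=1.0, z_far=10.0))
```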
Depth Image Based Rendering (DIBR) [5]
DIBR is the process of synthesizing "virtual" views of a scene from still or moving images and associated per-pixel depth information. It is a two-step process (sketched below):
1) The original image points are reprojected into the 3D world, using the respective depth data.
2) The 3D space points are projected onto the image plane of a "virtual" camera located at the required viewing position.
Stereoscopic view generation: two views (left and right) are generated. Multiple view generation: more than two views are generated, each corresponding to the scene viewed from a different angle.
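A minimal numeric sketch of the two steps for a rectified set-up, where they collapse into a single disparity formula. The pinhole model, focal length and baseline below are illustrative assumptions, not parameters from the thesis.

```python
import numpy as np

# Step 1 (back-projection): pixel (x, y) with depth Z maps to the 3D
# point Z * K^-1 [x, y, 1]^T for intrinsic matrix K.
# Step 2 (re-projection): project that point into a virtual camera
# translated by `baseline` along the x-axis. In this rectified set-up
# the two steps collapse into a horizontal disparity d = f * baseline / Z.

def dibr_disparity(depth_z, f_pix, baseline):
    """Per-pixel horizontal shift (in pixels) of the virtual view."""
    return f_pix * baseline / np.asarray(depth_z, dtype=np.float64)

# Toy numbers: with a 1000-pixel focal length and a 6.5 cm baseline, a
# point 1 m away shifts 65 px while one 10 m away shifts only 6.5 px,
# i.e. near objects move more than far ones between the two views.
print(dibr_disparity([1.0, 10.0], f_pix=1000.0, baseline=0.065))
```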
Stereoscopic view rendering
A color image and its per-pixel depth map can be used to generate virtual stereoscopic views, as shown in fig. 3. In this process, the original image points at locations (x, y) are transferred to new locations (x_L, y) and (x_R, y) for the left and right views respectively. The view-generation equations are given in the backup slides ("Equations for stereoscopic view generation").
Figure 3: Virtual view generation in the Depth Image Based Rendering (DIBR) process [6]
VIDEO COMPRESSION
Introduction
Data compression is the science of representing information in a compact format. Common image/video compression techniques reduce the number of bits required to represent an image or video sequence, and can be lossy or lossless. Video compression strategies: spatial, temporal and bit-stream redundancies are exploited, and high-frequency components are removed. Many organizations have come up with video compression codecs over the years [1]. High Efficiency Video Coding (HEVC) is the most recent video compression standard.
HEVC overview [13] [14]
HEVC is the successor to the H.264/AVC video compression standard. It has multiple goals: improved coding efficiency, ease of transport-system integration, data-loss resilience, and implementability on parallel processing architectures. The complexity of some key modules such as transforms, intra prediction and motion compensation is higher in HEVC than in H.264/AVC, while the complexity of modules such as entropy coding and deblocking is lower [15].
HEVC encoder: block diagram
Legend: high-frequency content removal; spatial redundancy exploitation; temporal redundancy exploitation; bit-stream redundancy exploitation; sharp-edge smoothing.
Figure 4: HEVC encoder block diagram [13]
3D VIDEO COMPRESSION
The depth-map dilemma
Compression of depth maps is a challenge. The quantization process eliminates high spatial frequencies in individual frames, and the resulting compression artifacts have adverse consequences for the quality of the rendered views. Preserving the sharp depth discontinuities present in depth maps is highly important for high-quality virtual view generation. Two solutions to this dilemma exist.
The two approaches to 3D compression
Approach one: use novel video compression techniques designed for 3D video, with special features added to overcome the depth-map dilemma, e.g. 3D video coding in H.264/AVC [16] and the 3D video extension of HEVC [17] [18] [19]. Advantages: features specific to 3D video are exploited (inter-view prediction), and the codec has dedicated blocks for depth-map compression. Disadvantage: substantially more complex, both in general codec structure and in encoding time.
Approach two: use already existing codecs to encode and decode the sequences, then apply image-denoising techniques [20] to the decoded depth maps to remove compression artifacts. Advantages: far less complicated than approach one, and existing video codecs are used without any modification. Disadvantage: there is never one right denoising solution.
THESIS WORK
Scope and premises
This thesis falls under the second approach to 3D video compression. Little research has been done on applying image-denoising techniques to HEVC-decoded depth maps. A post-processing framework based on an analysis of how compression artifacts affect the generation of virtual views is used. The framework applies a spatial filtering technique, specifically depth discontinuity analysis followed by an edge-adaptive joint trilateral filter (EA-JTF) [6], to reduce compression artifacts. It effectively reduces the compression artifacts in HEVC-decoded depth maps and improves the perceptual quality of the rendered views without using a depth-map-specific video codec.
Algorithm: block diagram
The original depth map is compressed and decompressed by the encoder/decoder. (a) Depth discontinuity analysis, applied to the compressed depth map together with the corresponding color image, produces a binary mask. (b) The edge-adaptive joint trilateral filter uses this mask, the compressed depth map and the color image to produce the reconstructed depth map.
Figure 5: Block diagram of the algorithm used for depth-map enhancement
Step (a): depth discontinuity analysis [6]
The purpose is twofold: 1) identify the areas where edges in the color image and the corresponding depth map are aligned, since the filter kernels of the EA-JTF are adaptively selected based on this information; and 2) identify all depth discontinuities that are significant in terms of rendering.
Sub-steps: the depth map is convolved with a vertical Sobel filter to obtain the gradient G_x. An edge mask E_d, which marks the pixel locations of significant depth discontinuities, is derived as

    E_d(p, q) = 1 if |G_x(p, q)| ≥ Δm_max, and 0 otherwise,    (1.1)

where Δm_max is a theoretical threshold obtained by studying the effect of compression artifacts on view rendering:

    Δm_max = (2 · D · 255) / (x_B · N_pix · (k_near + k_far)).

Here x_B is the distance between the left and right virtual cameras, i.e. the eye separation (assumed to be 6 cm); D is the viewing distance (assumed to be 250 cm); k_near and k_far give the range of the depth information behind and in front of the picture respectively, relative to the screen width; and N_pix is the screen width measured in pixels. The constant 255 appears because 8-bit images are considered.
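As an illustration, a minimal Python sketch of this sub-step (the thesis itself used MATLAB). Depth frames are assumed to be 8-bit NumPy arrays; the k_far default is an arbitrary placeholder, not a thesis parameter.

```python
import numpy as np
from scipy.ndimage import convolve

# Vertical-edge Sobel kernel; convolving with it gives the horizontal
# depth gradient G_x.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)

def delta_m_max(x_b=6.0, d=250.0, n_pix=1366, k_near=44.0, k_far=12.0):
    """Rendering-significance threshold Δm_max; x_b, d, n_pix and k_near
    follow the slides, k_far is an arbitrary placeholder."""
    return (2.0 * d * 255.0) / (x_b * n_pix * (k_near + k_far))

def depth_edge_mask(depth, thr):
    """E_d of Eq. (1.1): 1 at significant depth discontinuities."""
    gx = convolve(depth.astype(np.float64), SOBEL_X, mode='nearest')
    return (np.abs(gx) >= thr).astype(np.uint8)
```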
Step (a) (contd.)
To identify the regions in which the color edges and depth discontinuities are aligned, an edge mask E_c of the color image is generated with the Canny edge-detection algorithm. Using E_d and E_c, the binary mask E_s marking the aligned edge areas is obtained as

    E_s = (E_d ⊕ S_1) ∩ (E_c ⊕ S_2),    (1.2)

where ⊕ denotes morphological dilation and S_1 and S_2 are flat square structuring elements of sizes 2 and 7 respectively. The different stages of step (a) are shown in figure 6.
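A corresponding sketch of Eq. (1.2), again assuming NumPy arrays; the "size 2" and "size 7" flat square structuring elements are interpreted here as 2x2 and 7x7 squares of ones.

```python
import numpy as np
from scipy.ndimage import binary_dilation
from skimage import feature

def aligned_edge_mask(e_d, color_gray):
    """E_s of Eq. (1.2): dilate E_d and the Canny mask E_c of the
    grayscale color image, then intersect the two dilated masks."""
    e_c = feature.canny(color_gray.astype(np.float64))
    s1 = np.ones((2, 2), dtype=bool)   # flat square, size 2
    s2 = np.ones((7, 7), dtype=bool)   # flat square, size 7
    return (binary_dilation(e_d.astype(bool), structure=s1)
            & binary_dilation(e_c, structure=s2))
```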
Figure 6: Illustration of depth discontinuity analysis
Step (b): edge-adaptive joint trilateral filter
The edge-adaptive joint trilateral filter [6] is based on the bilateral filter and the joint trilateral filter [7] [8] [9] [10] [11] [12]. For a pixel position p, the filtered result F is

    F = ( Σ_{q∈Ω} w_pq · I_q ) / ( Σ_{q∈Ω} w_pq ),    (2.1)

where I_q is the value at pixel position q in the kernel neighborhood Ω. The filter weight w_pq at pixel position q is

    w_pq = c(p, q) · s_t(p, q).    (2.2)

Both the closeness kernel c and the similarity kernel s are commonly implemented as Gaussians centered at p and I_p (the value at pixel position p) with standard deviations σ_c and σ_s respectively:

    c(p, q) = exp( −(1/2) ‖p − q‖² / σ_c² ),    (2.3)
    s(p, q) = exp( −(1/2) (I_p − I_q)² / σ_s² ).    (2.4)

The similarity kernel s_t of the joint trilateral filter is adaptively selected as given in Eq. (2.5). In areas where the edges of the color image and the corresponding depth map are aligned (i.e. E_s from Eq. (1.2) equals 1), two similarity kernels are used, one derived from the compressed depth map (s) and one from the color image (s_j). Elsewhere, only the similarity kernel derived from the compressed depth map is used:

    s_t(p, q) = s(p, q) · s_j(p, q) if E_s(p, q) = 1; s(p, q) if E_s(p, q) = 0.    (2.5)
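A minimal, unoptimized Python sketch of Eqs. (2.1)-(2.5). Depth and color are assumed to be grayscale float arrays normalized to [0, 1]; the σ_s and σ_j defaults are arbitrary placeholders, not the thesis values, and E_s is evaluated at the kernel center for simplicity.

```python
import numpy as np

def ea_jtf(depth, color, e_s, ksize=15, sigma_c=45.0,
           sigma_s=0.05, sigma_j=0.05):
    h, w = depth.shape
    r = ksize // 2
    # Spatial closeness kernel c(p, q), Eq. (2.3); identical at every pixel.
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    c = np.exp(-0.5 * (xx ** 2 + yy ** 2) / sigma_c ** 2)
    dp = np.pad(depth, r, mode='edge')
    cp = np.pad(color, r, mode='edge')
    out = np.empty_like(depth, dtype=np.float64)
    for y in range(h):
        for x in range(w):
            dwin = dp[y:y + ksize, x:x + ksize]
            # Depth-similarity kernel s(p, q), Eq. (2.4).
            s = np.exp(-0.5 * (dwin - depth[y, x]) ** 2 / sigma_s ** 2)
            if e_s[y, x]:
                # Aligned edges: also apply the color-similarity kernel
                # s_j, giving s_t = s * s_j as in Eq. (2.5).
                cwin = cp[y:y + ksize, x:x + ksize]
                s = s * np.exp(-0.5 * (cwin - color[y, x]) ** 2 / sigma_j ** 2)
            wgt = c * s                                   # Eq. (2.2)
            out[y, x] = np.sum(wgt * dwin) / np.sum(wgt)  # Eq. (2.1)
    return out
```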
Step (c): stereoscopic view rendering
The reconstructed depth map from step (b) is used to generate left-side and right-side views with the stereoscopic view rendering process [21] [22] [27]. Finally, the frames obtained using the uncompressed depth map, the HEVC-decoded depth map, and the HEVC-decoded depth map after post-processing are compared using PSNR, SSIM [24] and an approximation of the Mean Opinion Score (MOS) [25] for image quality.
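A small sketch of how the three renderings can be compared, assuming 8-bit grayscale frames as NumPy arrays; scikit-image's structural_similarity is used here as a stand-in for the exact evaluation code.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(ref, test, peak=255.0):
    """PSNR in dB between two 8-bit frames."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# ref        : view rendered from the uncompressed depth map
# compressed : view rendered from the HEVC-decoded depth map
# processed  : view rendered from the post-processed depth map
# For 8-bit grayscale frames:
#   psnr(ref, compressed)                 vs.  psnr(ref, processed)
#   structural_similarity(ref, compressed) vs. structural_similarity(ref, processed)
```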
RESULTS
Results: experimental set-up
To evaluate the performance of the EA-JTF [6] on HEVC-decoded depth maps, color sequences along with the corresponding depth maps were compressed using the HEVC reference software HM 9.2 [26]. MATLAB R2013a (student version) was used for filtering and rendering. For every sequence other than Ballet, a single-frame result was obtained at QP = 32; for Ballet, a 15-frame sequence at a frame rate of 3 frames/sec was used. Three different rendered images were obtained: 1) from the original image and the corresponding depth map (original); 2) from the HEVC-decoded image and the corresponding decoded depth map (compressed); 3) from the HEVC-decoded image and the post-processed depth map (post-processed). PSNR, SSIM [24] and an approximate Mean Opinion Score (MOS) [25] were used to evaluate the perceptual quality of the rendered views.
Results: input parameters
- Viewing distance (D): 250 cm (assumed)
- Eye separation (x_B): 6 cm (assumed)
- Screen width in pixels (N_pix): 1366 (the laptop used for experimentation)
- k_near and k_far: k_near = 44.00, k_far = (Break-dancer); k_near = 42.00, k_far = (Ballet); k_near = , k_far = (Balloons); k_near = , k_far = (Kendo)
- Resolution of the video sequences used: 1024 x 768
- EA-JTF: kernel size 15 x 15 pixels; standard deviation for the color similarity filter σ_s = (normalized range 0-1); standard deviation for the depth similarity filter σ_j = (normalized range 0-1); standard deviation for the closeness filter σ_c = 45
Results: Break-dancer sequence
- Original sequence obtained from Microsoft Research [23].
- An increase in both PSNR and SSIM is seen.
- Rendering quality is high because the original depth maps were generated using computer-vision algorithms.
- A grayscale version of the sequence was used for the approximate MOS calculation; here too the post-processed result was rated better than the compressed one.
SSIM: 0.9133 (decoded), 0.9139 (post-processed).
MOS rating (max = 3): original 2.6, decoded 1.5, processed 1.9.
Results: Ballet sequence
- Original sequence obtained from Microsoft Research [23].
- An increase in both PSNR and SSIM is seen.
- Rendering quality is high because the original depth maps were generated using computer-vision algorithms.
- Sequence not used for MOS calculation.
PSNR (dB): 42.787. SSIM: 0.9413 (decoded), 0.9444 (post-processed).
Results: Kendo sequence
- Original sequence obtained from [4].
- A very interesting sequence: it has little edge information, so the original, post-processed and compressed results are perceptually extremely similar.
- There is a slight decrease in PSNR, and the SSIM values turned out to be practically equal. In the MOS evaluation, however, the post-processed frame performed better than the compressed frame.
SSIM: 0.9887, 0.9877.
MOS rating (max = 3): original 2.2, decoded 1.7, processed 2.1.
Results: Balloons sequence
- Original sequence obtained from [4].
- The compressed result has better PSNR and SSIM than the processed one. This can be attributed to the fact that the views rendered from the original sequence are themselves not optimal, due to noise in the original depth map.
- The proposed solution nevertheless improves the perceptual quality to a great extent: in the MOS evaluation, the post-processed frame performed better than the compressed frame.
PSNR (dB): 43.209. SSIM: 0.981, 0.9798.
MOS rating (max = 3): original 2.4, decoded 1.0, processed 2.5.
CONCLUSIONS
Conclusions
The quality of views generated by stereoscopic rendering from HEVC-decoded depth maps was improved. Four multi-view plus depth sequences were used to carry out the experiments. Two sequences, Break-dancer and Ballet, showed an improvement in both PSNR and SSIM. For the Kendo sequence there was no improvement in PSNR while the SSIM remained constant (the sequence has little edge information), and for the Balloons sequence there was no improvement in either PSNR or SSIM. However, the main gain brought by this method is in the perceptual quality of the rendered views: an approximate MOS survey suggested that views rendered after post-processing were always perceptually better than those rendered without post-processing. In this regard, all four test sequences showed improved perceptual quality.
FUTURE WORK
Future work
- Improve the filter design to obtain more significant results.
- Move beyond stereoscopic rendering to multi-view rendering.
- The method can be made in-loop and merged into the HEVC compression codec.
- The current work used SSIM and an approximation of the Mean Opinion Score to assess perceptual quality; more research into perceptual quality assessment for depth maps and rendered views would be useful.
IMAGE DATABASE
Break-dancer sequence (images)
Break-dancer sequence, grayscale (used for MOS)
Ballet sequence (images)
Balloons sequence (images)
Balloons sequence, grayscale (used for MOS)
Kendo sequence (images)
Kendo sequence, grayscale (used for MOS)
REFERENCES
[1] K.R. Rao, D.N. Kim and J.J. Hwang, "Video coding standards: AVS China, H.264/MPEG4-Part 10, HEVC, VP6, DIRAC and VC-1," Springer.
[2] D.K. Shah, et al., "Evaluating multi-view plus depth coding solutions for 3D video scenarios," 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), pp. 1-4, Oct. 2012.
[3] Fraunhofer HHI, 3D video coding information: groups/image-video-coding/3d-hevc-extension.html
[4] Balloons and Kendo test sequences.
[5] C. Fehn, "A 3D-TV system based on video plus depth information," Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, vol. 2, 9-12 Nov. 2003.
[6] D.V.S. De Silva, et al., "A depth map post-processing framework for 3D-TV systems based on compression artifact analysis," IEEE Journal of Selected Topics in Signal Processing, vol. PP, no. 99, 2011.
[7] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," IEEE International Conference on Computer Vision, Washington DC, USA, 1998.
[8] E. Eisemann and F. Durand, "Flash photography enhancement via intrinsic relighting," ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 673-678, 2004.
[9] G. Petschnigg, et al., "Digital photography with flash and no-flash image pairs," ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 664-672, 2004.
[10] B. Zhang and J. Allebach, "Adaptive bilateral filter for sharpness enhancement and noise removal," IEEE Transactions on Image Processing, vol. 17, no. 5, pp. 664-678, 2008.
[11] P. Choudhury and J. Tumblin, "The trilateral filter for high contrast images and meshes," ACM SIGGRAPH 2005 Courses, 2005.
[12] S. Liu, P. Lai, D. Tian, C. Gomila, and C.W. Chen, "Joint trilateral filtering for depth map compression," Huangshan, China, 2010.
[13] G.J. Sullivan, J.-R. Ohm, W.-J. Han and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, Dec. 2012.
[14] HEVC text specification draft 10: sudparis.eu/jct/doc_end_user/current_document.php?id=7243
REFERENCES (contd.)
[15] F. Bossen, et al., "HEVC complexity and implementation analysis," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, Dec. 2012.
[16] 3DV for H.264.
[17] Fraunhofer HHI, 3D video coding information: processing/research-groups/image-video-coding/3d-hevc-extension.html
[18] P. Merkle, A. Smolic, K. Müller, and T. Wiegand, "Multi-view video plus depth data representation and coding," Picture Coding Symposium, 2007.
[19] "Test model under consideration for HEVC based 3D video coding," ISO/IEC JTC1/SC29/WG11 MPEG2011/N12559, San Jose, CA, USA, February 2012.
[20] M.C. Motwani, et al., "A survey of image denoising techniques," Proceedings of GSPx 2004, Santa Clara, CA.
[21] ISO/IEC JTC1/SC29/WG11, "Proposed experimental conditions for EE4 in MPEG 3DAV," WG 11 doc. m9016, Shanghai, Oct. 2002.
[22] C. Fehn, "Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV," Proceedings of the SPIE, vol. 5291, 2004.
[23] Break-dancer and Ballet sequences: Microsoft Research.
[24] Z. Wang, A.C. Bovik, H.R. Sheikh and E.P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, Apr. 2004.
[25] L. Ma, et al., "Image retargeting quality assessment: a study of subjective scores and objective metrics," IEEE Journal of Selected Topics in Signal Processing, vol. 6, no. 6, pp. 626-639, Oct. 2012.
[26] HEVC reference software (HM 9.2).
[27] MATLAB code for stereoscopic view rendering: depth-image-based-stereoscopic-view-rendering
THANK YOU! QUESTIONS?
The lighter gray regions represent near objects; the darker gray regions represent far objects.
EQUATIONS FOR STEREOSCOPIC VIEW GENERATION
The original image points at locations (x, y) are transferred to new locations (x_L, y) and (x_R, y) for the left and right views respectively. This process is defined by:

    x_R = x + p_pix / 2    (1)
    x_L = x − p_pix / 2    (2)
    p_pix = −(x_B · N_pix / D) · ( (m / 255) · (k_near + k_far) − k_far )    (3)

where p_pix is the pixel parallax; x_B is the distance between the left and right virtual cameras, i.e. the eye separation (assumed to be 6 cm); D is the viewing distance (assumed to be 250 cm); m is the depth value of each pixel in the reference view; k_near and k_far give the range of the depth information behind and in front of the picture respectively, relative to the screen width; and N_pix is the screen width measured in pixels. The constant 255 appears because 8-bit images are considered.
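A minimal sketch of Eqs. (1)-(3), assuming an 8-bit depth map as a NumPy array and ignoring occlusion ordering; the k_far default is again an arbitrary placeholder, not a thesis parameter.

```python
import numpy as np

def pixel_parallax(m, x_b=6.0, d=250.0, n_pix=1366, k_near=44.0, k_far=12.0):
    """Eq. (3); x_b, d, n_pix and k_near follow the slides, k_far is a
    placeholder value."""
    return -(x_b * n_pix / d) * ((m / 255.0) * (k_near + k_far) - k_far)

def render_stereo_pair(image, depth):
    """Eqs. (1)-(2): shift each pixel by +/- p_pix/2; y stays constant
    because the epipolar lines are horizontal."""
    h, w = depth.shape
    p = pixel_parallax(depth.astype(np.float64))
    ys, xs = np.mgrid[0:h, 0:w]
    views = []
    for sign in (-1.0, +1.0):          # x_L = x - p/2, then x_R = x + p/2
        xt = np.round(xs + sign * p / 2.0).astype(int)
        view = np.zeros_like(image)
        filled = np.zeros((h, w), dtype=bool)
        ok = (xt >= 0) & (xt < w)
        view[ys[ok], xt[ok]] = image[ys[ok], xs[ok]]
        filled[ys[ok], xt[ok]] = True
        views.append((view, filled))   # unfilled pixels are the holes
    return views                       # [(left, mask), (right, mask)]
```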
Stereo triangulation (contd.)
The virtual cameras are selected such that the epipolar lines are horizontal, so the y component stays constant. Equation (3) is in accordance with the MPEG informative recommendation. The dis-occluded regions (visual holes) are filled by a background pixel extrapolation technique (sketched below). If the depth map is corrupted by noise, the luminance values of its pixels, i.e. m in Eq. (3), are modified; this produces warping errors and thus distortions in the image rendered from the noisy depth map.
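A simplified sketch of the hole filling, consuming a (view, mask) pair such as the warp sketch above produces. True background-pixel extrapolation would pick the neighbor on the farther (larger-depth) side of each hole; this row-wise scan just copies the nearest valid pixel from the left as a stand-in.

```python
import numpy as np

def fill_holes_background(view, filled):
    """Fill dis-occluded pixels row by row with the nearest valid pixel
    to their left (simplified background extrapolation)."""
    out = view.copy()
    h, w = filled.shape
    for y in range(h):
        last = None
        for x in range(w):
            if filled[y, x]:
                last = np.array(out[y, x])   # copy; works for gray or RGB
            elif last is not None:
                out[y, x] = last             # extend background into hole
    return out
```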
Epipolar line
The line O_L–X is seen by the left camera as a point because it is directly in line with that camera's center of projection. The right camera sees this line as a line (e_R–x_R) in its image plane, called an epipolar line. Symmetrically, the line O_R–X, seen by the right camera as a point, is seen as the epipolar line e_L–x_L by the left camera. Any line that intersects the epipolar point is an epipolar line, since it can be derived from some 3D point X.
Video compression strategies (figure)
The chronology of different video compression standards (figure)
3D video coding in H.264/AVC
Multiview Video Coding (MVC) is an amendment to the H.264/MPEG-4 AVC video compression standard [3]. It enables efficient encoding of sequences captured simultaneously from multiple cameras in a single video stream, and is intended for stereoscopic (two-view) video as well as free-viewpoint television and multi-view 3D television. An MVC stream is backward compatible with H.264/AVC [3], which allows older devices and software to decode stereoscopic video streams while ignoring the additional information for the second view [6]. Combined temporal and inter-view prediction is the key to efficient MVC encoding: a frame from a certain camera can be predicted not only from temporally related frames of the same camera but also from frames of neighboring cameras, and these interdependencies are used for efficient prediction [6].
Figure: Multi-view coding structure with hierarchical B pictures for both temporal (black arrows) and inter-view prediction (red arrows)
3D extension of HEVC
Basic 3D video codec structure
Figure: Block diagram of a 3D video codec [4]
MVD codec: working
The basic structure of the 3D video codec is shown in the block diagram above. In principle, each component signal is coded using an HEVC-based codec. The resulting bit-stream packets, or more accurately the resulting Network Abstraction Layer (NAL) units, are multiplexed to form the 3D video bit stream. The base (independent) view is coded with an unmodified HEVC codec, so its sub-stream can be decoded directly by a conventional HEVC decoder. The dependent views and the depth data are coded with modified HEVC codecs that are extended with additional coding tools and inter-component prediction techniques, which employ already-coded data inside the same access unit. To allow depth data to be optionally discarded from the bit stream, e.g. to support decoding of a stereo video suitable for conventional stereo displays, the inter-component prediction can be configured so that video pictures can be decoded independently of the depth data.
MVD: coding algorithm
The video pictures and, when present, the depth maps are coded access unit by access unit. An access unit includes all video pictures and depth maps that correspond to the same time instant. NAL units containing camera parameters may additionally be associated with an access unit. The video pictures and depth maps corresponding to a particular camera position are indicated by a view identifier (viewId); all pictures and depth maps belonging to the same camera position share the same viewId. Inside an access unit, the video picture and, when present, the associated depth map with viewId equal to 0 are coded first, followed by those with viewId equal to 1, and so on. For ordering the reconstructed video pictures and depth maps after decoding, each viewId value is associated with a view order index (VOI), a signed integer that specifies the ordering of the coded views from left to right.
Figure: Access unit structure and coding order of view components [12]
COMPARISON: MVD AND HEVC CODECS
Coding of dependent views: additional tools have been integrated into the HEVC codec that employ already-coded data from other views to represent a dependent view efficiently. These tools include disparity-compensated prediction, view-synthesis-based inter-view prediction, post-processing in-loop filtering, inter-view motion prediction, depth-based motion parameter prediction, inter-view residual prediction, and adjustment of the texture QP based on depth data.
Coding of depth maps: some tools are added and some removed. The differences include: depth maps are coded in 4:0:0 format; a non-linear depth representation is used; z-near/z-far compensated weighted prediction; modified motion compensation and motion vector coding (no interpolation is used, i.e. for depth maps inter-picture prediction is always performed with full-sample accuracy); disabling of in-loop filtering (deblocking filter and SAO); depth modeling modes (four new intra-prediction modes); and motion parameter inheritance.