Optimization of the Deblocking Filter Algorithm in H.264 Codec for Real Time Implementation
By: Hitesh Yadav
Supervising Professor: Dr. K. R. Rao
Department of Electrical Engineering, The University of Texas at Arlington
Brief Information about H.264
- H.264 is the latest video coding standard.
- It addresses practical applications such as internet multimedia, wireless video, and video conferencing.
- Its compression efficiency improves on MPEG-2 by a factor of two.
- The increase in compression efficiency comes at the expense of complexity.
- The resulting complexity depends on which profile of the standard is implemented, which in turn depends on the application.
H.264 Encoder
H.264 Decoder
Why Deblocking Filter?
- At very low bit rates, visible coding artifacts appear in decoded frames.
- Prominent among them are blocking effects and ringing effects.
- A deblocking filter is used in both the H.264 encoder and decoder to remove the blocking effects from decoded frames.
Causes of Blocking Effects
- Block-based transform coding causes discontinuities between adjacent blocks.
- The severity of blocking effects depends on how coarsely the transform coefficients are quantized.
- Motion-compensated prediction also contributes to blocking effects, but mostly in mildly textured areas.
Regions Susceptible to Blocking Effects
- For intra-coded blocks, the effect is hidden in either the more spatially active areas or the smooth areas.
- For predictive-coded blocks, the effect mainly occurs in mildly textured areas.
- For predictive-coded blocks, the artifact known as a false edge typically occurs in macroblocks with smoothly textured content.
Loop filtering vs. post filtering
- Post filters offer maximum freedom for decoder implementation because they are not a normative part of the standard.
- Empirical tests have shown that loop filtering improves both the objective and subjective quality of video streams, with a significant reduction in decoder complexity compared to post filtering.
Desired Deblocking filter
- It should smooth artificial discontinuities between blocks.
- It should differentiate between image edges and artificial edges.
- Image edges should not be smoothed, because smoothing them degrades image quality.
- If needed, filters specific to image edges can be applied.
Desired deblocking filter (continued)
- It should remove the blocking effects without blurring the image.
- Its computational complexity should be low.
- It should be implementable in real-time systems.
Summary on Relative Complexity
Criterion | POCS-based algorithm | Weighted-sum-based algorithm | Adaptive algorithms
Algorithm flow | Iteratively projecting back and forth between two sets on the entire picture | Grading of blocks with a grading matrix, iterative on every pixel | Iteratively classifying and applying a filter on every block edge
Major operations | Low-pass filtering, DCT | Weighted sum of 4 pixels for each pixel | 3-tap or 5-tap filter on pixels across edges
Relative computation complexity | High | Medium | Low
Relative implementation complexity | High | Low | Medium
Visual quality: Best, Good
Algorithm used for deblocking filter in H.264 Standard
- Adaptive algorithms have the lowest relative computation complexity in the preceding comparison, so they are the first choice for real-time implementation.
- The deblocking filter in the H.264 standard uses an adaptive algorithm to remove the blocking effects.
Deblocking filter operation
- The deblocking filter is applied to all edges of each 4x4-pixel block in a macroblock, except edges on the boundary of a frame or a slice.
- Within each macroblock, vertical edges are filtered first, from left to right, and then horizontal edges are filtered from top to bottom.
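The edge ordering described on this slide can be sketched as follows. This is a minimal illustration, not the reference implementation: the function name `edges_to_filter` and the flags `left_available` and `top_available` are hypothetical, and the sketch assumes a 16x16 luma macroblock split into 4x4 blocks, where an outer edge is skipped when it lies on a frame or slice boundary.

```python
def edges_to_filter(left_available, top_available, mb_size=16, block_size=4):
    """Return (direction, offset) pairs for one macroblock in filtering order.

    left_available / top_available are hypothetical flags: False means the
    macroblock's left or top edge lies on a frame or slice boundary.
    """
    order = []
    # Vertical edges first, left to right.
    for x in range(0, mb_size, block_size):
        if x == 0 and not left_available:
            continue  # outer edge on a frame/slice boundary is not filtered
        order.append(("vertical", x))
    # Then horizontal edges, top to bottom.
    for y in range(0, mb_size, block_size):
        if y == 0 and not top_available:
            continue
        order.append(("horizontal", y))
    return order
```

For a macroblock in the interior of a slice, this yields four vertical edges followed by four horizontal edges; for a macroblock at the left frame boundary, the vertical edge at offset 0 is dropped.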
Main characteristics of deblocking filter
- At the slice level, the filtering strength can be adjusted to the individual characteristics of the video sequence.
- At the edge level, the filtering strength depends on the inter/intra decision, motion, and coded residuals.
- At the pixel level, quantizer-dependent thresholds can turn off filtering for each individual pixel.
Principle of deblocking filter
The filter-tap decision for each pixel is based on the following factors:
1. Boundary strength (bS)
2. Thresholds α and β
3. The content of the sample pixels
Decision flow of bS where P and Q denote adjacent blocks
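The flow diagram itself is not reproduced in this text. As a rough sketch, the bS rules as they are commonly summarized for frame-coded H.264 can be written as below. The `Block` class and its fields are hypothetical names introduced only for this illustration, and the sketch simplifies each block to a single reference frame and a single motion vector in quarter-sample units.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical per-block summary used only for this illustration."""
    is_intra: bool
    has_coded_coeffs: bool = False
    ref_frame: int = 0
    mv: tuple = (0, 0)  # motion vector in quarter-sample units (assumption)


def boundary_strength(p, q, on_macroblock_edge):
    """Return bS in 0..4 for the edge between adjacent blocks P and Q."""
    if p.is_intra or q.is_intra:
        # Strongest filtering when an intra block sits on a macroblock edge.
        return 4 if on_macroblock_edge else 3
    if p.has_coded_coeffs or q.has_coded_coeffs:
        return 2
    if p.ref_frame != q.ref_frame:
        return 1
    # A difference of 4 quarter-samples equals one full luma sample.
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1
    return 0  # bS = 0 means the edge is not filtered
```

A bS of 0 skips the edge entirely, which is why the sample-level test that follows starts with bS != 0.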
Decision flow of filter tap selection
bS != 0 AND |A0 - B0| < α AND |A1 - A0| < β AND |B1 - B0| < β
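The condition above can be sketched as a single predicate. This sketch assumes that A0 and B0 are the samples nearest the edge on either side and that A1 and B1 are their outer neighbours. The threshold values in the usage note are purely illustrative; in the standard, α and β are derived from the quantization parameter.

```python
def should_filter(bs, a1, a0, b0, b1, alpha, beta):
    """Return True when the samples across an edge should be filtered.

    a1, a0 lie on one side of the edge and b0, b1 on the other, with a0 and
    b0 adjacent to the edge (assumed layout: a1 a0 | b0 b1).
    """
    return (
        bs != 0
        and abs(a0 - b0) < alpha   # step across the edge is small enough
        and abs(a1 - a0) < beta    # each side is locally smooth
        and abs(b1 - b0) < beta
    )
```

With the illustrative thresholds α = 10 and β = 4, `should_filter(2, 100, 101, 105, 106, 10, 4)` holds because the small step across the edge looks like a blocking artifact, while `should_filter(2, 50, 52, 120, 121, 10, 4)` does not because the large step looks like a real image edge.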
Problems with the Deblocking Filter
- Analysis of run-time profiles of decoder sub-functions has shown that the deblocking filter process is the most computationally intensive part of the H.264 decoder.
- The deblocking filter took as much as one-third of the computational resources of the decoder.
Reasons for the Complexity
- The filter is highly adaptive, which requires conditional processing at the block-edge and pixel levels.
- As a result, conditional branches almost inevitably appear in the innermost loops of the algorithm.
- The small block size employed for residual coding also contributes to the high complexity.
- The code also exposes little parallelism.
How can complexity be reduced?
- Some of the branches are inherent to the algorithm itself, so they are hard to eliminate at the programming level.
- Nothing can be done about the small block size employed for residual coding.
- Complexity can be reduced by cutting down the conditional branches in the innermost loop of the algorithm and the number of memory accesses.
Why do loops add to complexity in real time?
- The program code includes extensive conditional branching, which makes it unsuitable for deeply pipelined processors.
- The little parallelism exhibited by the code also makes it unsuitable for VLIW processors.
- VLIW processors are otherwise well suited to video encoding and decoding applications.
Proposed Algorithm
Intra Frame Results for Main Profile
Columns: Test clip (QCIF) | QP | PSNR (dB): reconstruction with proposed method | PSNR (dB): reconstruction without loop filter, JM 9.2 (H.264 reference software) | PSNR (dB): reconstruction with loop filter, JM 9.2 (H.264 reference software)
Test clips (rows): Foreman, Car phone, Car phone, News, News, Silent, Container, Container, Bridge-close, Bridge-close
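For reference, the PSNR metric used in these results can be computed as in the sketch below. It assumes 8-bit samples (peak value 255) and frames given as flat sequences of luma values. The function name `psnr` and this data layout are choices made for the illustration, not details taken from the JM 9.2 software.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error over all samples.
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak * peak / mse)
```

A higher PSNR means the reconstructed frame is numerically closer to the original, so it is an objective measure and does not capture the subjective visibility of blocking artifacts.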
Blocking Artifacts
Reconstructed I frame without using a loop filter with QP=37
Reconstructed I frame with proposed method with QP=37
Reconstructed I frame without using a loop filter with QP=45
Reconstructed I frame with proposed method with QP=45
Reconstructed I frame without using a loop filter with QP=37
Reconstructed I frame with proposed method with QP=37
Reconstructed I frame without using a loop filter with QP=45
Reconstructed I frame with proposed method with QP=45
P Frame and B-frame results for Main profile
Columns: Test clip (QCIF) and type of frame | QP | PSNR (dB): reconstruction with proposed method | PSNR (dB): reconstruction without loop filter, JM 9.2 (H.264 software) | PSNR (dB): reconstruction with loop filter, JM 9.2 (H.264 software) | Total number of bits used: reconstruction with proposed method | Total number of bits used: reconstruction without loop filter, JM 9.2 (H.264 software) | Total number of bits used: reconstruction with loop filter, JM 9.2 (H.264 software)
Test clips (rows): Foreman-P, News-P, Car phone-P, Bridge close-P, Foreman-B, News-B, Car phone-B, Bridge close-B
Reconstructed P frame without using a loop filter with QP=39
Reconstructed P frame with proposed method with QP=39
Reconstructed B frame without using a loop filter with QP=39
Reconstructed B frame with proposed method with QP=39
Results for a GOP of size 10
Columns: Frame type and frame number (Foreman_qcif) | QP | PSNR (dB): reconstruction with proposed method | PSNR (dB): reconstruction without loop filter, JM 9.2 (H.264 software) | PSNR (dB): reconstruction with loop filter, JM 9.2 (H.264 software)
Frame types (rows): Intra, B, P, B, P
Advantages
- Memory is saved: the proposed loop filter code size is 11 KB, compared with 21 KB for the JM 9.2 (H.264 software) loop filter.
- JM 9.2 uses two tables of 52 bytes each and one table of 260 bytes to check the pixel filtering condition. The proposed method uses no such tables to check the filtering condition, which saves processor time.
- Conditional branches in the innermost loops of the algorithm are reduced compared with JM 9.2.
Further Research
- The proposed deblocking filter can be implemented on a DSP or VLIW processor.
- A deringing filter can also be incorporated to see its effect on the reconstructed video.
- Computationally realizable image recovery techniques can be explored in H.264.
- Transforms that do not produce blocking artifacts but still provide the benefits of the integer DCT can be explored.
Thank You !!