
1 H.264 Digital Video Technology Wen-Jyi Hwang (黃文吉), Department of Computer Science and Information Engineering, National Taiwan Normal University

2 Introduction H.264 is the newest video coding standard. It is also known as MPEG-4 Part 10, or MPEG-4 AVC (Advanced Video Coding). H.264 was developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG).

3 The basic goal of H.264 is to create a standard capable of supporting good video quality at lower bit rates than previous standards.

4 An additional goal is to provide enough flexibility so that the standard can be applied effectively to a wide range of applications with low and high bit rates, including digital video broadcast (DVB), DVD storage, and IPTV.

5 Structure of the H.264/AVC Video Coder
VCL (Video Coding Layer): designed to efficiently encode the video content.
NAL (Network Abstraction Layer): formats the VCL representation of the video and provides header information for conveyance by a variety of transport layers or storage media.

6 Video Coding Layer

7 Basic Structure of the VCL (block diagram): the input video signal is split into 16x16-pixel macroblocks; each macroblock is predicted either by intra-frame prediction or by motion-compensated inter prediction (driven by motion estimation); the prediction residual passes through transform, scaling and quantization; entropy coding of the quantized transform coefficients, control data and motion data produces the compressed video bits; a decoder loop (scaling and inverse transform, reconstruction, de-blocking filter) produces the decoded video used as the reference for prediction, all under the coder control.

8 Intra-frame Prediction (the VCL block diagram of slide 7, with the intra-frame prediction block highlighted).

9 Intra-frame encoding in H.264 supports Intra_4x4, Intra_16x16 and I_PCM. I_PCM allows the encoder to send the values of the encoded samples directly. Intra_4x4 and Intra_16x16 allow intra prediction.

10 Intra_16x16: 4 modes, used in flat areas. Intra_4x4: 9 modes, used in textured areas.

11 Four modes of Intra_16x16:
– Mode 0 (Vertical): extrapolation from the upper samples (H).
– Mode 1 (Horizontal): extrapolation from the left samples (V).
– Mode 2 (DC): mean of the upper and left-hand samples (H + V).
– Mode 3 (Plane): a linear "plane" function is fitted to the upper and left-hand samples H and V. This works well in areas of smoothly varying luminance.
(Figure: the four 16x16 prediction directions, with H denoting the row of upper samples and V the column of left samples.)

12 Example: Original image

13 Nine modes of Intra_4x4:
– The prediction block P (samples a–p) is calculated based on the previously decoded neighbouring samples labelled A–M (A–H above, I–L to the left, M at the top-left corner).
– The encoder may select, for each block, the prediction mode that minimizes the residual between P and the block to be encoded.
(Figure: the 4x4 block a–p with its neighbours A–M and the eight directional modes 0, 1, 3, 4, 5, 6, 7, 8; mode 2 is the DC mode.)

14 Mode 0 (Vertical)
Pred(a, e, i, m) = Pixel(A)
Pred(b, f, j, n) = Pixel(B)
Pred(c, g, k, o) = Pixel(C)
Pred(d, h, l, p) = Pixel(D)
Mode 1 (Horizontal)
Pred(a, b, c, d) = Pixel(I)
Pred(e, f, g, h) = Pixel(J)
Pred(i, j, k, l) = Pixel(K)
Pred(m, n, o, p) = Pixel(L)

15 Mode 2 (DC)
Pred(a, b, c, ..., p) = Pixel(A + B + C + D + I + J + K + L)/8, i.e. the mean of A..D and I..L.
Mode 3 (Diagonal Down-Left)
Pred(a) = Pixel(A + 2*B + C + 2)/4
Pred(b,e) = Pixel(B + 2*C + D + 2)/4
Pred(c,f,i) = Pixel(C + 2*D + E + 2)/4
Pred(d,g,j,m) = Pixel(D + 2*E + F + 2)/4
Pred(h,k,n) = Pixel(E + 2*F + G + 2)/4
Pred(l,o) = Pixel(F + 2*G + H + 2)/4
Pred(p) = Pixel(G + 3*H + 2)/4

16 Mode 4 (Diagonal Down-Right)
Pred(a,f,k,p) = Pixel(I + 2*M + A + 2)/4
Pred(b,g,l) = Pixel(M + 2*A + B + 2)/4
Pred(c,h) = Pixel(A + 2*B + C + 2)/4
Pred(d) = Pixel(B + 2*C + D + 2)/4
Pred(e,j,o) = Pixel(M + 2*I + J + 2)/4
Pred(i,n) = Pixel(I + 2*J + K + 2)/4
Pred(m) = Pixel(J + 2*K + L + 2)/4
Mode 5 (Vertical-Right)
Pred(a,j) = Pixel(M + A + 1)/2
Pred(b,k) = Pixel(A + B + 1)/2
Pred(c,l) = Pixel(B + C + 1)/2
Pred(d) = Pixel(C + D + 1)/2
Pred(e,n) = Pixel(I + 2*M + A + 2)/4
Pred(f,o) = Pixel(M + 2*A + B + 2)/4
Pred(g,p) = Pixel(A + 2*B + C + 2)/4
Pred(h) = Pixel(B + 2*C + D + 2)/4
Pred(i) = Pixel(M + 2*I + J + 2)/4
Pred(m) = Pixel(I + 2*J + K + 2)/4

17 Mode 6 (Horizontal-Down)
Pred(a,g) = Pixel(M + I + 1)/2
Pred(e,k) = Pixel(I + J + 1)/2
Pred(i,o) = Pixel(J + K + 1)/2
Pred(m) = Pixel(K + L + 1)/2
Pred(b,h) = Pixel(I + 2*M + A + 2)/4
Pred(c) = Pixel(M + 2*A + B + 2)/4
Pred(d) = Pixel(A + 2*B + C + 2)/4
Pred(f,l) = Pixel(M + 2*I + J + 2)/4
Pred(j,p) = Pixel(I + 2*J + K + 2)/4
Pred(n) = Pixel(J + 2*K + L + 2)/4
Mode 7 (Vertical-Left)
Pred(a) = Pixel(A + B + 1)/2
Pred(b,i) = Pixel(B + C + 1)/2
Pred(c,j) = Pixel(C + D + 1)/2
Pred(d,k) = Pixel(D + E + 1)/2
Pred(l) = Pixel(E + F + 1)/2
Pred(e) = Pixel(A + 2*B + C + 2)/4
Pred(f,m) = Pixel(B + 2*C + D + 2)/4
Pred(g,n) = Pixel(C + 2*D + E + 2)/4
Pred(h,o) = Pixel(D + 2*E + F + 2)/4
Pred(p) = Pixel(E + 2*F + G + 2)/4

18 Mode 8 (Horizontal-Up)
Pred(a) = Pixel(I + J + 1)/2
Pred(c,e) = Pixel(J + K + 1)/2
Pred(g,i) = Pixel(K + L + 1)/2
Pred(b) = Pixel(I + 2*J + K + 2)/4
Pred(d,f) = Pixel(J + 2*K + L + 2)/4
Pred(h,j) = Pixel(K + 3*L + 2)/4
Pred(k, l, m, n, o, p) = Pixel(L)
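The nine directional rules above map directly onto small array operations. Below is a minimal Python sketch (not from the original slides) of three of the Intra_4x4 modes — vertical, horizontal and DC — assuming 8-bit samples and that all neighbours A–D and I–L are available; the remaining directional modes follow the same averaging pattern.

import numpy as np

def intra4x4_vertical(top):
    """Mode 0: each column is a copy of the sample above it (A, B, C, D)."""
    return np.tile(top[:4], (4, 1))

def intra4x4_horizontal(left):
    """Mode 1: each row is a copy of the sample to its left (I, J, K, L)."""
    return np.tile(left[:4].reshape(4, 1), (1, 4))

def intra4x4_dc(top, left):
    """Mode 2: every sample is the rounded mean (A+..+D+I+..+L+4)//8."""
    dc = (int(top[:4].sum()) + int(left[:4].sum()) + 4) >> 3
    return np.full((4, 4), dc, dtype=np.int32)

# Hypothetical neighbour samples: top = [A, B, C, D], left = [I, J, K, L]
top = np.array([60, 64, 70, 75])
left = np.array([58, 59, 61, 62])
print(intra4x4_dc(top, left))   # 4x4 block filled with the rounded mean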

19

20 Example:

21 Motion Estimation/Compensation (the VCL block diagram of slide 7, with the motion estimation and motion compensation blocks highlighted).

22 Features of H.264 motion estimation:
– Various block sizes
– ¼-sample accuracy: 6-tap filtering to ½-sample accuracy, simplified filtering to ¼-sample accuracy
– Multiple reference pictures
– Generalized B-frames

23 Variable Block Size Block-Matching
– In H.264, a video frame is first split into fixed-size macroblocks.
– Each macroblock may then be divided into subblocks with different block sizes.
– A macroblock has a dimension of 16x16 pixels; the size of the smallest subblock is 4x4.
(Figure: macroblock partitions 16x16, 16x8, 8x16 and 8x8, and sub-macroblock (8x8) partitions 8x8, 8x4, 4x8 and 4x4.)

24 Example: This example shows the effectiveness of block-matching operations with smaller block sizes. (Figures: Frame 1 and Frame 2.)

25 Difference between Frame 1 and Frame 2

26 Block matching with size 16x16. (Figures: the difference between Frame 2 and its 16x16 block-matched prediction, and Frame 2.)

27 Block matching with size 8x8. (Figures: the difference between Frame 2 and its 8x8 block-matched prediction, and Frame 2.)

28 Block matching with size 4x4. (Figures: the difference between Frame 2 and its 4x4 block-matched prediction, and Frame 2.)

29 To use a subblock with a size smaller than 8x8, it is necessary to first split the macroblock into four 8x8 subblocks.

30 Example:

31 Encoding a motion vector for each subblock can cost a significant number of bits, especially if small block sizes are chosen. Motion vectors for neighboring subblocks are often highly correlated. Therefore, each motion vector can be effectively predicted from vectors of nearby, previously coded subblocks. The difference between the motion vector of the current block and its prediction is encoded and transmitted.

32 The method of forming the prediction depends on the block size and on the availability of nearby vectors. Let E be the current block, let A be the subblock immediately to the left of E, let B be the subblock immediately above E, and let C be the subblock above and to the right of E (D denotes the subblock above and to the left of E). It is not necessary that A, B, C and E have the same size. (Figure: D, B and C in the row above E; A to the left of E.)

33 There are two modes for the prediction of motion vectors:
– Median prediction: used for all block sizes except 16x8 and 8x16.
– Directional segmentation prediction: used for 16x8 and 8x16.

34 Median prediction:
If C does not exist, then C = D.
If B and C do not exist, then prediction = V_A.
If A and C do not exist, then prediction = V_B.
If A and B do not exist, then prediction = V_C.
Otherwise, prediction = median(V_A, V_B, V_C).

35 Directional segmentation prediction:
Vector block size 8x16 — left partition: prediction = V_A; right partition: prediction = V_C.
Vector block size 16x8 — upper partition: prediction = V_B; lower partition: prediction = V_A.
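A small Python sketch of the median rule (an illustration, not the slides' own code): motion vectors are (x, y) tuples, None marks a neighbour that is not available, and — as an assumption for the "otherwise" case — unavailable neighbours are treated as zero vectors.

def median_mv(mv_a, mv_b, mv_c, mv_d):
    """Median prediction from neighbours A (left), B (above), C (above-right),
    D (above-left), following the rules on slide 34."""
    if mv_c is None:                       # C not available: use D instead
        mv_c = mv_d
    avail = [v is not None for v in (mv_a, mv_b, mv_c)]
    if avail == [True, False, False]:      # only A available
        return mv_a
    if avail == [False, True, False]:      # only B available
        return mv_b
    if avail == [False, False, True]:      # only C available
        return mv_c
    # Otherwise: component-wise median (missing neighbours assumed to be (0, 0))
    a, b, c = [v if v is not None else (0, 0) for v in (mv_a, mv_b, mv_c)]
    med = lambda x, y, z: sorted((x, y, z))[1]
    return (med(a[0], b[0], c[0]), med(a[1], b[1], c[1]))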

36 Fractional Motion Estimation In H.264, the motion vectors between the current block and the candidate block have ¼-pel resolution. The samples at sub-pel positions do not exist in the reference frame, so it is necessary to create them by interpolation from nearby image samples.

37 Interpolation of ½-pel samples:
b = round( (E - 5F + 20G + 20H - 5I + J) / 32 )
h = round( (A - 5C + 20G + 20M - 5R + T) / 32 )
j = round( (aa - 5bb + 20b + 20s - 5gg + hh) / 32 )

38 Interpolation of ¼-pel samples:
a = round( (G + b) / 2 )
d = round( (G + h) / 2 )
e = round( (b + h) / 2 )
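As a sketch of the two interpolation stages (Python; `ref`, the padded reference frame, and the sample coordinates are hypothetical): the half-pel value comes from the 6-tap filter (1, -5, 20, 20, -5, 1)/32 of slide 37, and a quarter-pel value from the rounded average of two neighbouring samples as on slide 38.

import numpy as np

TAP = np.array([1, -5, 20, 20, -5, 1])

def half_pel_horizontal(ref, y, x):
    """Half-sample 'b' midway between G=(y,x) and H=(y,x+1): 6-tap filter over E..J."""
    window = ref[y, x - 2 : x + 4].astype(np.int64)     # E F G H I J
    return int(np.clip(np.round(window @ TAP / 32), 0, 255))

def half_pel_vertical(ref, y, x):
    """Half-sample 'h' midway between G=(y,x) and M=(y+1,x): 6-tap filter over A..T."""
    window = ref[y - 2 : y + 4, x].astype(np.int64)     # A C G M R T
    return int(np.clip(np.round(window @ TAP / 32), 0, 255))

def quarter_pel(p, q):
    """Quarter-sample: rounded average of two neighbouring samples, e.g. a = (G + b)/2."""
    return (p + q + 1) >> 1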

39 Multiple Reference Frames

40 (Figure: maximum compression cannot be reached because prediction from a single reference frame cannot recognize periodic motion.)

41 Motion estimation based on multiple reference frames provides opportunities for more precise inter prediction, as well as improved robustness to lost picture data. The drawback of multiple reference frames is that both the encoder and the decoder have to store the reference frames used for inter-frame prediction in a multi-frame buffer.

42 ~10%

43 Generalized B Frames Basic B-frames: The basic B-frames cannot be used as reference frames.

44 Generalized B-frames: The generalized B-frames can be used as reference frames.

45 ~30%

46 ~35%

47 Weighted Prediction Reference frames can be weighted for motion compensation. There are three types of weighted prediction:
– P-frame, explicit weighting (weights transmitted);
– B-frame, explicit weighting;
– B-frame, implicit weighting (weights derived at the decoder from the temporal distance).
The weighted prediction scheme may be effective for frames with fade transitions.

48 Transformation/Quantization (the VCL block diagram of slide 7, with the transform/scaling/quantization block highlighted).

49 Transformation The DCT operates on y, a block of N x N samples, and creates Y, an N x N block of coefficients. The forward DCT is Y = A y A^T. The inverse DCT is therefore y = A^T Y A. The DCT matrix A is orthogonal; that is, A A^T = A^T A = I.

50 The elements of A are A(i, j) = c_i cos[ (2j + 1) i π / (2N) ], where c_0 = sqrt(1/N) and c_i = sqrt(2/N) for i > 0. That is, Y(u,v) = c_u c_v Σ_i Σ_j y(i,j) cos[(2i+1)uπ/(2N)] cos[(2j+1)vπ/(2N)].

51 Example: The transform matrix A for a 4x4 DCT is:
A = [ a   a   a   a
      b   c  -c  -b
      a  -a  -a   a
      c  -b   b  -c ]

52 where a = 1/2, b = sqrt(1/2) cos(π/8) ≈ 0.6533 and c = sqrt(1/2) cos(3π/8) ≈ 0.2706. That is, numerically,
A ≈ [ 0.5     0.5     0.5     0.5
      0.6533  0.2706 -0.2706 -0.6533
      0.5    -0.5    -0.5     0.5
      0.2706 -0.6533  0.6533 -0.2706 ]

53 The H.264 transform is based on the 4×4 DCT with the following simplifications: 1.The transform is an integer transform. 2.The core part of the transform can be realized by only shifts and additions. 3.A scaling multiplication is integrated into the quantizer, reducing the total number of multiplications.

54 Recall that Y = A y A^T, where
A = [ a   a   a   a
      b   c  -c  -b
      a  -a  -a   a
      c  -b   b  -c ]
with a = 1/2, b = sqrt(1/2) cos(π/8) and c = sqrt(1/2) cos(3π/8).

55 Post-scaling We can rewrite Y as Y = (C y C^T) ⊗ E.

56 1. We call (C y C^T) the core 2D transform.
2. The matrix E is a matrix of scaling factors.
3. The symbol ⊗ indicates that each element of (C y C^T) is multiplied by the scaling factor in the same position in matrix E (i.e., ⊗ is scalar multiplication rather than matrix multiplication),
where
C = [ 1   1   1   1
      1   d  -d  -1
      1  -1  -1   1
      d  -1   1  -d ]
E = [ a^2  ab   a^2  ab
      ab   b^2  ab   b^2
      a^2  ab   a^2  ab
      ab   b^2  ab   b^2 ]
and d = c/b.

57 To simplify the implementation of the transform, d is approximated by 0.5. In order to ensure that the transform remains orthogonal, b also needs to be modified so that b = sqrt(2/5) ≈ 0.6325.

58 The final forward transform becomes Y = (C_f y C_f^T) ⊗ E_f, where
C_f = [ 1   1   1   1
        2   1  -1  -2
        1  -1  -1   1
        1  -2   2  -1 ]
E_f = [ a^2   ab/2   a^2   ab/2
        ab/2  b^2/4  ab/2  b^2/4
        a^2   ab/2   a^2   ab/2
        ab/2  b^2/4  ab/2  b^2/4 ]

59 Note that the modified core transform involves only shifts and additions.

60 Pre-Scaling The inverse transform is given by y' = C_i^T ( Y ⊗ E_i ) C_i, where
C_i^T = [ 1    1     1    1/2
          1    1/2  -1   -1
          1   -1/2  -1    1
          1   -1     1   -1/2 ]
E_i = [ a^2  ab   a^2  ab
        ab   b^2  ab   b^2
        a^2  ab   a^2  ab
        ab   b^2  ab   b^2 ]

61 Quantization H.264 uses scalar quantization. The quantization should satisfy the following requirements: (a) avoid division and/or floating-point arithmetic; (b) incorporate the post- and pre-scaling matrices E_f and E_i.

62 The basic forward quantizer operation is Z(u,v)= round( Y(u,v)/QStep ) where Y(u,v) is a transform coefficient, Z(u,v) is a quantized coefficient, and QStep is a quantizer step size.

63 There are 52 quantizers (i.e., the Quantization Parameter (QP) ranges from 0 to 51). An increase of 1 in QP means an increase of QStep by approximately 12%; an increase of 6 in QP means an increase of QStep by a factor of 2.
QP     0      1       2       3      4   5      6     7      8      9     10  11    12   ...
QStep  0.625  0.6875  0.8125  0.875  1   1.125  1.25  1.375  1.625  1.75  2   2.25  2.5  ...
QP     ...18  ...24  ...30  ...36  ...42  ...48  ...51
QStep  ...5   ...10  ...20  ...40  ...80  ...160 ...224
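Because QStep doubles every six steps of QP, the whole table can be generated from its first six entries; a short Python sketch using the values above:

QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]  # QStep for QP = 0..5

def qstep(qp):
    """QStep doubles for every increase of 6 in QP (0 <= QP <= 51)."""
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))

print(qstep(4), qstep(10), qstep(28), qstep(51))  # 1.0, 2.0, 16.0, 224.0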

64 The post-scaling factor (PF) (i.e., a^2, ab/2 or b^2/4) is incorporated into the forward quantizer in the following way:
1. The input block y is transformed to give a block of unscaled coefficients W = C_f y C_f^T.
2. Then, each coefficient in W is quantized and scaled in a single operation:
Z(u,v) = round( W(u,v) x PF / QStep )
where PF is a^2, ab/2 or b^2/4 depending on the position (u, v):
Position                        PF
(0,0), (2,0), (0,2), (2,2)      a^2
(1,1), (1,3), (3,1), (3,3)      b^2/4
Others                          ab/2

65 In order to simplify the arithmetic, the factor (PF/QStep) is implemented as a multiplication by a factor MF and a right shift, avoiding any division operations:
Z(u,v) = round( W(u,v) x MF / 2^qbits )
where MF / 2^qbits = PF / QStep and qbits = 15 + ⌊QP/6⌋.

66 Example: Suppose QP = 4 and location (u,v) = (0,0). Therefore QStep = 1.0, PF = a^2 = 0.25 and qbits = 15. From MF / 2^qbits = PF / QStep we have MF = 2^15 x 0.25 / 1.0 = 8192.

67 The MF values for QP ≤ 5 are shown below. For QP > 5, the factors MF remain unchanged, but qbits increases by 1 for each increment of six in QP. That is, qbits = 16 for 6 ≤ QP ≤ 11, qbits = 17 for 12 ≤ QP ≤ 17, and so on.
Table_for_MF
QP   (0,0),(2,0),(0,2),(2,2)   (1,1),(1,3),(3,1),(3,3)   Other positions
0    13107                     5243                      8066
1    11916                     4660                      7490
2    10082                     4194                      6554
3     9362                     3647                      5825
4     8192                     3355                      5243
5     7282                     2893                      4559

68 Example: Suppose QP=10. Find MF value in the positions (0,0), (2,0) and (3,1) using Table_for_MF. Sol. The MF value for QP=10 is the same as that for QP=4. Using Table_for_MF, we have MF value = 8192 for coefficient in location (0,0), MF value = 8192 for coefficient in location (2,0), and MF value = 3355 for coefficient in location (3,1).
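The same periodicity applies to MF and to the rescaling factor V introduced below: the tables only need to cover QP = 0...5. A Python sketch of the lookup for an arbitrary QP (the position-class test and the table layout follow the slides):

# MF for QP = 0..5, per position class: [a^2 positions, b^2/4 positions, ab/2 positions]
MF_TABLE = [
    [13107, 5243, 8066],
    [11916, 4660, 7490],
    [10082, 4194, 6554],
    [ 9362, 3647, 5825],
    [ 8192, 3355, 5243],
    [ 7282, 2893, 4559],
]
# V for QP = 0..5, same position classes (from Table_for_V on slide 71)
V_TABLE = [
    [10, 16, 13],
    [11, 18, 14],
    [13, 20, 16],
    [14, 23, 18],
    [16, 25, 20],
    [18, 29, 23],
]

def position_class(u, v):
    """0 for a^2 positions, 1 for b^2/4 positions, 2 for ab/2 positions."""
    if (u % 2 == 0) and (v % 2 == 0):
        return 0
    if (u % 2 == 1) and (v % 2 == 1):
        return 1
    return 2

def mf_and_qbits(qp, u, v):
    return MF_TABLE[qp % 6][position_class(u, v)], 15 + qp // 6

def v_scale(qp, u, v):
    return V_TABLE[qp % 6][position_class(u, v)] << (qp // 6)

print(mf_and_qbits(10, 0, 0))  # (8192, 16), as in the QP = 10 example
print(v_scale(10, 1, 1))       # 50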

69 Pre-Scaling The de-quantized coefficient is given by Y'(u,v) = Z(u,v) x QStep. The inverse transform involving pre-scaling operations proceeds in the following way:
1. The dequantized block is pre-scaled to a block W', with W'(u,v) = Z(u,v) x QStep x PF x 64, for the core 2D inverse transform.
2. The reconstructed block is then given by y' = round( C_i^T W' C_i / 64 ).

70 The pre-scaling factor (PF) (i.e., a^2, ab or b^2) is incorporated in the computation of W', together with a constant scaling factor of 64 to avoid rounding errors. The values at the output of the inverse transform are divided by 64 to remove the constant scaling factor.

71 The H.264 standard does not specify QStep or PF directly. Instead, the parameter V = QStep x PF x 64 is defined. The V values for QP ≤ 5 are shown below.
Table_for_V
QP   (0,0),(2,0),(0,2),(2,2)   (1,1),(1,3),(3,1),(3,3)   Other positions
0    10                        16                        13
1    11                        18                        14
2    13                        20                        16
3    14                        23                        18
4    16                        25                        20
5    18                        29                        23

72 For QP > 5, the V value increases by a factor of 2 for each increment of six in QP. That is, V(QP) = V(QP mod 6) x 2^⌊QP/6⌋, where V(QP mod 6) is taken from Table_for_V.

73 The Complete Transformation, Quantization, Rescaling and Inverse Transformation
Encoding:
1. Input 4x4 block: y
2. Forward core transform: W = C_f y C_f^T
3. Post-scaling and quantization: Z(u,v) = round( W(u,v) x MF / 2^qbits )
Decoding:
1. Pre-scaling: W'(u,v) = Z(u,v) x V(u,v)
2. Inverse core transform: y'' = C_i^T W' C_i
3. Re-scaling: y'(i,j) = round( y''(i,j) / 64 )

74 Example:
1. Suppose QP = 10, and the input block is
y = [  6   5   4   4
      12   6   6   3
       5   5   4  12
       8   8   8   6 ]
2. Forward core transform:
W = [ 102   14   10    2
      -21   33    1   14
       -4    4  -12    2
      -13  -61   13  -38 ]

75 3. Because QP = 10, we have MF = 8192, 3355 or 5243 (depending on the position) and qbits = 16:
Z = [ 13   1   1   0
      -2   2   0   1
      -1   0  -2   0
      -1  -3   1  -2 ]
4. V = 32, 50 or 40, because 2^⌊QP/6⌋ = 2:
W' = [ 416    40    32     0
       -80   100     0    50
       -32     0   -64     0
       -40  -150    40  -100 ]

76 5. The output of the inverse core transform, after division by 64, is
y' = [  5   5   4   3
       13   6   6   3
        6   6   4  13
        7   8   8   7 ]
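The whole round trip of slides 73–76 can be reproduced in a few lines of NumPy. This is an illustrative sketch, not reference code: rounding is done half-away-from-zero, which matches the numbers in the example, and the MF and V matrices are written out for QP mod 6 = 4.

import numpy as np

Cf = np.array([[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]])
# Inverse core transform matrix C_i^T (applied on the left; its transpose on the right)
CiT = np.array([[1, 1, 1, 0.5], [1, 0.5, -1, -1], [1, -0.5, -1, 1], [1, -1, 1, -0.5]])

def rnd(x):                        # round half away from zero
    return np.sign(x) * np.floor(np.abs(x) + 0.5)

y = np.array([[6, 5, 4, 4], [12, 6, 6, 3], [5, 5, 4, 12], [8, 8, 8, 6]])
QP = 10
MF = np.array([[8192, 5243, 8192, 5243],
               [5243, 3355, 5243, 3355],
               [8192, 5243, 8192, 5243],
               [5243, 3355, 5243, 3355]])      # MF for QP mod 6 = 4
V = np.array([[16, 20, 16, 20],
              [20, 25, 20, 25],
              [16, 20, 16, 20],
              [20, 25, 20, 25]]) << (QP // 6)  # V doubles every 6 QP steps
qbits = 15 + QP // 6

W = Cf @ y @ Cf.T                                   # forward core transform
Z = rnd(W * MF / 2 ** qbits).astype(int)            # post-scaling + quantization
Wd = Z * V                                          # pre-scaling (de-quantization)
out = rnd(CiT @ Wd @ CiT.T / 64).astype(int)        # inverse core transform, /64

print(W)    # [[102 14 10 2] [-21 33 1 14] [-4 4 -12 2] [-13 -61 13 -38]]
print(Z)    # [[13 1 1 0] [-2 2 0 1] [-1 0 -2 0] [-1 -3 1 -2]]
print(out)  # [[5 5 4 3] [13 6 6 3] [6 6 4 13] [7 8 8 7]]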

77 Entropy Coding (the VCL block diagram of slide 7, with the entropy coding block highlighted).

78 Here we present two basic variable length coding (VLC) techniques used by H.264: the Exp-Golomb code and context-adaptive VLC (CAVLC). The Exp-Golomb code is used universally for all symbols except transform coefficients. CAVLC is used for coding transform coefficients: there is no end-of-block code — instead the number of coefficients is decoded; coefficients are scanned backwards; and contexts are built depending on the transform coefficients.

79 Exp-Golomb codes are variable length codes with a regular construction. The first 9 codewords are:
code_num   Codeword
0          1
1          010
2          011
3          00100
4          00101
5          00110
6          00111
7          0001000
8          0001001
...

80 Each codeword of Exp-Golomb codes is constructed as follows: [M zeros][1][INFO] where INFO is an M-bit field carrying information. Therefore, the length of a codeword is 2M+1.

81 Given a code_num, the corresponding Exp-Golomb codeword can be obtained by the following procedure:
(a) M = ⌊ log2(code_num + 1) ⌋
(b) INFO = code_num + 1 - 2^M
Example: code_num = 6. M = ⌊ log2(7) ⌋ = 2, INFO = 6 + 1 - 2^2 = 3. The corresponding Exp-Golomb codeword = [M zeros][1][INFO] = 00111.

82 Given an Exp-Golomb codeword, its code_num can be found as follows:
(a) Read M leading zeros followed by 1.
(b) Read the M-bit INFO field.
(c) code_num = 2^M + INFO - 1
Example: Exp-Golomb codeword = 00111. (a) M = 2, (b) INFO = 3, (c) code_num = 2^2 + 3 - 1 = 6.

83 A parameter v to be encoded is mapped to code_num in one of 3 ways:
ue(v): Unsigned direct mapping, code_num = v. (Mainly used for macroblock type and reference frame index.)
se(v): Signed mapping, where v is mapped to code_num as follows: code_num = 2|v| for v ≤ 0, and code_num = 2v - 1 for v > 0. (Mainly used for motion vector difference and delta QP.)
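A Python sketch of the Exp-Golomb construction of slides 80–82 together with the ue(v) and se(v) mappings (illustrative only):

from math import floor, log2

def exp_golomb_encode(code_num):
    """Codeword = [M zeros][1][M-bit INFO], M = floor(log2(code_num+1)), INFO = code_num+1-2^M."""
    m = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)
    suffix = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + suffix

def exp_golomb_decode(bits):
    """Inverse: count M leading zeros, read M INFO bits, code_num = 2^M + INFO - 1."""
    m = bits.index("1")
    info = int(bits[m + 1 : 2 * m + 1] or "0", 2)
    return (1 << m) + info - 1

def ue(v):                 # unsigned direct mapping
    return v

def se(v):                 # signed mapping: 2|v| for v <= 0, 2v - 1 for v > 0
    return 2 * v - 1 if v > 0 else -2 * v

print(exp_golomb_encode(6))          # 00111
print(exp_golomb_decode("00111"))    # 6
print(exp_golomb_encode(se(-3)))     # code_num = 6 -> 00111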

84 me(v): Mapped symbols. Parameter v is mapped to code_num according to a table specified in the standard. This mapping is used for coded_block_pattern parameters. An example of such a mapping is shown below.
Coded_block_pattern (Inter prediction)        code_num
0  (no non-zero blocks)                       0
16 (chroma DC block non-zero)                 1
1  (top-left 8x8 luma block non-zero)         2
2  (top-right 8x8 luma block non-zero)        3

85 CAVLC This is the method used to encode the residual, zig-zag ordered blocks of 4x4 DCT coefficients.

86 CAVLC is designed to take advantage of several characteristics of quantized 4x4 blocks:
– After prediction, transformation and quantization, blocks are usually sparse (containing many zeros).
– The highest non-zero coefficients after the zig-zag ordering are often sequences of +/-1.
– The number of non-zero coefficients in adjacent blocks is correlated.
– The level (magnitude) of non-zero coefficients tends to be higher at the start of the zig-zag scan, and lower towards the high frequencies.

87 The procedure described below is based on the document entitled JVT Document JVT-C028, Gisle Bjøntegaard and Karl Lillevold, “Context-adaptive VLC (CVLC) coding of coefficients,” Fairfax, VA, May 2002. The H.264 CAVLC is an extension of this work.

88 The CAVLC encoding of a block of transform coefficients proceeds as follows:
1. Encode the number of coefficients and trailing ones.
2. Encode the sign of each trailing one.
3. Encode the levels of the remaining non-zero coefficients.
4. Encode the total number of zeros before the last coefficient.
5. Encode each run of zeros.

89 Encode the number of coefficients and trailing ones. The first step is to encode the number of coefficients (NumCoef) and trailing ones (T1s). The range of NumCoef is from 0 (no coefficient in the block) to 16 (16 non-zero coefficients). The range of the number of T1s is from 0 (no T1) to 3 (three or more T1s). If there are more than 3 T1s, only the last three are treated as "special cases" and the others are coded as normal coefficients.

90 Example: Consider a 4x4 block whose seven non-zero coefficients are -2, 4, 3, -1, -1, 1, 1. Then NumCoef = 7 and the number of T1s = 3.

91 Three tables can be used for the encoding of NumCoef and T1s: Num-VLC0, Num-VLC1 and Num-VLC2.
Num-VLC0 (codeword table indexed by NumCoef = 0...16 and the number of T1s = 0...3; the shortest codewords are assigned to small NumCoef values).

92 Num-VLC1 (codeword table with the same structure, indexed by NumCoef = 0...16 and T1s = 0...3; shorter codewords are assigned to moderate NumCoef values).

93 Num-VLC2 (codeword table with the same structure; shorter codewords are assigned to larger NumCoef values).

94 The selection of the table (Num-VLC0, Num-VLC1 or Num-VLC2) depends on the numbers of non-zero coefficients in the upper and left-hand previously coded blocks, N_U and N_L. A parameter N is then computed as follows:
If blocks U and L are both available (i.e., in the same coded slice), N = (N_U + N_L)/2.
If only block U is available, N = N_U.
If only block L is available, N = N_L.
If neither is available, N = 0.

95 The selection of the table is based on N in the following way:
N            Selected table
0, 1         Num-VLC0
2, 3         Num-VLC1
4, 5, 6, 7   Num-VLC2
8 or above   FLC
The FLC is of the form xxxxyy (i.e., 6 bits), where xxxx and yy represent NumCoeff and T1s, respectively.
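A sketch of the neighbour-based table selection (Python; None marks a neighbour outside the current slice, and rounding (N_U + N_L)/2 upwards is an assumption — the slide only writes the plain average):

def select_numcoef_table(n_u, n_l):
    """Choose Num-VLC0/1/2 or FLC from the neighbouring coefficient counts (slides 94-95)."""
    if n_u is not None and n_l is not None:
        n = (n_u + n_l + 1) // 2       # assumed rounding of (N_U + N_L)/2
    elif n_u is not None:
        n = n_u
    elif n_l is not None:
        n = n_l
    else:
        n = 0
    if n < 2:
        return "Num-VLC0"
    if n < 4:
        return "Num-VLC1"
    if n < 8:
        return "Num-VLC2"
    return "FLC"                       # fixed 6-bit code: 4 bits NumCoeff, 2 bits T1s

print(select_numcoef_table(3, 0))      # Num-VLC1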

96 Encode the sign of each trailing one. For each T1, a single bit encodes the sign (0 = +, 1 = -). The T1s are encoded backwards, beginning with the highest-frequency (i.e., last) T1.

97 Encode the levels of the remaining non-zero coefficients. The level (sign and magnitude) of each remaining non-zero coefficient in the block is encoded in reverse order. There are 5 VLC tables to choose from, Lev-VLC0 to Lev-VLC4. Lev-VLC0 is biased towards lower magnitudes; Lev-VLC1 is biased towards slightly higher magnitudes, and so on.

98 Structure of the Lev-VLC tables: in Lev-VLC0 the regular codewords are 1, 01, 001, ... (n zeros followed by a 1); in Lev-VLC1 they are 1x, 01x, 001x, ...; in Lev-VLC2 1xx, 01xx, 001xx, ...; in Lev-VLC3 1xxx, 01xxx, ...; and in Lev-VLC4 1xxxx, 01xxxx, ..., where the x suffix bits distinguish the sign and nearby magnitudes. Each table also has two escape codewords, of the form 0...01xxxx and 0...01xxxxxxxxxxxx, for levels beyond the regular range.

99 The Level' column (starting at +/-2) is used only when it is impossible for a coefficient to have the value +/-1; this happens (for the first coded level) when T1s < 3.
Lev-VLC0
Code no.   Code              Level   Level'
0          1                  1       2
1          01                -1      -2
2          001                2       3
3          0001              -2      -3
4          00001              3       4
...
13         0000000000000 1   -7      -8
(followed by the two escape codewords: a 4-bit suffix covering +/-8 to +/-15 (+/-9 to +/-16 for Level'), and a 12-bit suffix covering +/-16 and beyond (+/-17 and beyond for Level').)
Lev-VLC1
Code no.   Code    Level   Level'
0          10       1       2
1          11      -1      -2
2          010      2       3
3          011     -2      -3
4          0010     3       4
5          0011    -3      -4
...
(regular codewords continue down to +/-14 (+/-15 for Level'), followed by a 4-bit escape covering +/-15 to +/-22 (+/-16 to +/-23) and a 12-bit escape covering +/-23 and beyond (+/-24 and beyond).)

100 Lev-VLC2, Lev-VLC3 and Lev-VLC4 follow the same pattern with 2-, 3- and 4-bit suffixes:
Lev-VLC2: 100 -> 1, 101 -> -1, 110 -> 2, 111 -> -2, 0100 -> 3, 0101 -> -3, 0110 -> 4, 0111 -> -4, 00100 -> 5, ..., up to +/-28; escape codewords cover +/-29 to +/-36 and +/-37 and beyond.
Lev-VLC3: 1000 -> 1, 1001 -> -1, 1010 -> 2, 1011 -> -2, 1100 -> 3, 1101 -> -3, 1110 -> 4, 1111 -> -4, 01000 -> 5, ..., up to +/-56; escape codewords cover +/-57 to +/-64 and +/-65 and beyond.
Lev-VLC4: 10000 -> 1, 10001 -> -1, 10010 -> 2, 10011 -> -2, ..., 11110 -> 8, 11111 -> -8, 010000 -> 9, ..., up to +/-120; escape codewords cover +/-121 and beyond.

101 To improve coding efficiency, the tables are changed during the coding process according to the following procedure.
Inter, and Intra with QP >= 9:
  First coefficient with VLC0, next with VLC1.
  Increase the VLC number by one (up to 2) if |Level| > 3.
Intra with QP < 9:
  If the number of coefficients > 10: first coefficient with VLC1, next with VLC2;
  else: first coefficient with VLC0, next with VLC1.
  If VLC = VLC1, change to VLC2 if |Level| > 3.
  If VLC >= VLC2, increase the VLC number by one (up to 4) if |Level| > 5.

102 Encode the total number of zeros before the last coefficient (TotZeros). TotZeros is coded with a VLC table indexed by NumCoeff (1–15) and the TotZeros value (0–15); as NumCoeff grows, fewer zero patterns are possible and the codewords become shorter.

103 The decoder can use TotZeros and NumCoeff to determine the position of the last non-zero coefficient.

104 Encode each run of zeros. After TotZeros is known, we are ready to encode the number of zeros preceding each non-zero coefficient (called RunBefore). Let ZerosLeft indicate how many zeros are left to distribute during this encoding process. ZerosLeft is used for encoding RunBefore in CAVLC.

105 When encoding the RunBefore for the first non-zero coefficient, ZerosLeft begins at TotZeros. ZerosLeft then decreases as the RunBefore values of further non-zero coefficients are encoded. The encoding of each RunBefore depends on the current ZerosLeft value, as shown in the following table.

106 RunBefore is coded with a VLC table indexed by the current ZerosLeft (1, 2, ..., 6, >6) and the RunBefore value (0–14); when ZerosLeft is small only a few short codewords are needed, while for ZerosLeft > 6 the run may be as large as 14. Why is the maximum 14? A 4x4 block has 16 samples, and RunBefore is only coded when at least two of them are non-zero, so at most 14 zeros can precede a coefficient.

107 Example: Consider the following inter-frame residual 4x4 block:
 0   3  -1   0
 0  -1   1   0
 1   0   0   0
 0   0   0   0
The zig-zag re-ordering of the block is: 0, 3, 0, 1, -1, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0.
Therefore, NumCoeff = 5, TotZeros = 3, T1s = 3. Assume N = 0.
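Before any table lookup, the encoder needs NumCoeff, the number of trailing ones, and TotZeros. The Python sketch below (illustrative, not the standard's exact syntax-element derivation) extracts them from the example block via the 4x4 zig-zag scan and reproduces NumCoeff = 5, T1s = 3 and TotZeros = 3.

ZIGZAG = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2),
          (2, 1), (3, 0), (3, 1), (2, 2), (1, 3), (2, 3), (3, 2), (3, 3)]

def cavlc_parameters(block):
    scan = [block[r][c] for r, c in ZIGZAG]
    # drop trailing zeros after the last non-zero coefficient
    last = max(i for i, v in enumerate(scan) if v != 0)
    scan = scan[: last + 1]
    coeffs = [v for v in scan if v != 0]
    num_coeff = len(coeffs)
    # trailing ones: +/-1 values at the end of the scan, at most three of them
    t1s = 0
    for v in reversed(coeffs):
        if abs(v) == 1 and t1s < 3:
            t1s += 1
        else:
            break
    tot_zeros = scan.count(0)       # zeros before the last non-zero coefficient
    return num_coeff, t1s, tot_zeros

block = [[0, 3, -1, 0],
         [0, -1, 1, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(cavlc_parameters(block))      # (5, 3, 3)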

108 Encoding:
Value                             Code      Comments
NumCoeff = 5, T1s = 3             0001011   N = 0, so use Num-VLC0
Sign of T1 (+1)                   0         Starting at the highest frequency
Sign of T1 (-1)                   1
Sign of T1 (-1)                   1
Level = +1                        1         Inter frame, so use Lev-VLC0
Level = +3                        0010      Use Lev-VLC1
TotZeros = 3                      1110      Also depends on NumCoeff
ZerosLeft = 3; RunBefore = 1      00        RunBefore of the 1st coefficient
ZerosLeft = 2; RunBefore = 0      1         RunBefore of the 2nd coefficient
ZerosLeft = 2; RunBefore = 0      1         RunBefore of the 3rd coefficient
ZerosLeft = 2; RunBefore = 1      01        RunBefore of the 4th coefficient
ZerosLeft = 1; RunBefore = 1      -         No code required; last coefficient
The transmitted bitstream for this block is 0001011011100101110001101.

109 Decoding:
Code      Value                   Output array               Comments
0001011   NumCoeff = 5, T1s = 3   (empty)
0         +1                      1                          T1 sign
1         -1                      -1, 1                      T1 sign
1         -1                      -1, -1, 1                  T1 sign
1         +1                      1, -1, -1, 1               Level value
0010      +3                      3, 1, -1, -1, 1            Level value
1110      TotZeros = 3            3, 1, -1, -1, 1
00        RunBefore = 1           3, 1, -1, -1, 0, 1         RunBefore of the 1st coefficient
1         RunBefore = 0           3, 1, -1, -1, 0, 1         RunBefore of the 2nd coefficient
1         RunBefore = 0           3, 1, -1, -1, 0, 1         RunBefore of the 3rd coefficient
01        RunBefore = 1           3, 0, 1, -1, -1, 0, 1      RunBefore of the 4th coefficient
-         ZerosLeft = 1           0, 3, 0, 1, -1, -1, 0, 1   Remaining zero placed before the last coefficient

110 De-blocking Filter (the VCL block diagram of slide 7, with the de-blocking filter block highlighted).

111 The deblocking filter improves subjective visual quality. The filter is highly context adaptive. It operates on the boundaries of 4x4 blocks, on the samples p3, p2, p1, p0 | q0, q1, q2, q3 on either side of each vertical and horizontal block edge.

112 The choice of filtering outcome depends on the boundary strength and on the gradient of image samples across the boundary. Given two adjacent blocks p and q, the boundary-strength parameter Bs is selected according to the following rules:
Bs = 4 (strongest filtering): p or q is intra coded, and the boundary is a macroblock boundary.
Bs = 3: p or q is intra coded, and the boundary is not a macroblock boundary.
Bs = 2: neither p nor q is intra coded, and p or q contains coded coefficients.
Bs = 1: neither p nor q is intra coded, neither contains coded coefficients, and p and q have different reference frames, a different number of reference frames, or different motion vector values.
Bs = 0 (no filtering): neither p nor q is intra coded, neither contains coded coefficients, and p and q have the same reference frames and identical motion vectors.

113 A group of samples from the set (p2, p1, p0, q0, q1, q2) is filtered only if: (a) Bs > 0, and (b) |p0 - q0| < α and |p1 - p0| < β and |q1 - q0| < β, where α and β are thresholds defined in the standard. The threshold values increase with the average quantizer parameter QP of the two blocks p and q.
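A sketch of the filtering decision in Python (illustrative; the α and β threshold tables of the standard are assumed to be supplied by the caller):

def boundary_strength(p_intra, q_intra, mb_boundary, p_coded, q_coded,
                      same_refs, same_mvs):
    """Bs selection following slide 112."""
    if (p_intra or q_intra) and mb_boundary:
        return 4                        # strongest filtering
    if p_intra or q_intra:
        return 3
    if p_coded or q_coded:
        return 2
    if not (same_refs and same_mvs):
        return 1
    return 0                            # no filtering

def should_filter(bs, p0, p1, q0, q1, alpha, beta):
    """Samples across the edge are filtered only if Bs > 0 and the gradients are small."""
    return bs > 0 and abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta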

114 When QP is small, gradient across the boundary is likely to be due to image features that should be preserved. Therefore, the thresholds  and  are low for small QP. When QP is larger, blocking distortion is likely to be more significant and  and  are higher so that more boundary samples are filtered.

115 (Figures: without de-blocking filtering / with de-blocking filtering.)

116 Data Partitioning and Network Abstraction Layer

117 A video frame is coded as one or more slices. Each slice contains an integral number of macroblocks, from 1 up to the total number of macroblocks in a picture. The number of macroblocks per slice need not be constant within a picture.

118 There are five slice modes. The three commonly used modes are:
1. I-slice: a slice in which all macroblocks are coded using intra prediction.
2. P-slice: in addition to the coding types of the I-slice, some macroblocks of the P-slice can be coded using inter prediction (predicted from one reference picture buffer only).
3. B-slice: in addition to the coding types available in a P-slice, some macroblocks of the B-slice can be predicted from two reference picture buffers.

119 In H.264, all the slices in the same frame are not necessarily of the same mode. That is, a frame may contain I-slices, P-slices and B-slices.

120 In addition to I-, P- and B-slices, the other two slice modes defined by H.264 are the SP-slice and the SI-slice. Both are used for video streaming applications. The SP-slice may be beneficial for switching between bitstreams of the same video sequence (but with different bit rates). The SI-slice may be adopted for switching between bitstreams of different video sequences.

121 SP-Frame (Figure: two bitstreams — Bitstream 1 with frames P1,n-2 ... P1,n+2 and I1,n+3, and Bitstream 2 with frames P2,n-2 ... P2,n+2 and I2,n+3 — where an SP-frame SP12,n allows switching between the two bitstreams at time n.)

122 SI-Frame (Figure: the same two bitstreams, where an SI-frame SI2,n is used instead of an SP-frame to switch into Bitstream 2 at time n.)

123 Note that the coded data in a slice can be placed in three separate data partitions (A, B and C) for robust transmission. Partition A contains the slice header and the header data for each macroblock in the slice. Partition B contains the coded residual data for intra-coded macroblocks. Partition C contains the coded residual data for inter-coded macroblocks.

124 In H.264, the VCL data are mapped into NAL units prior to transmission or storage. Each NAL unit contains a Raw Byte Sequence Payload (RBSP), a set of data corresponding to coded video data or header information. The NAL units can be delivered over a packet-based network, over a bitstream transmission link, or stored in a file. (Figure: a sequence of NAL units, each consisting of a NAL header followed by an RBSP.)

125 RBSP types:
– Parameter Set: global parameters for a sequence, such as picture dimensions and video format.
– Supplemental Enhancement Information (SEI): side messages that are not essential for correct decoding of the video sequence.
– Picture Delimiter: boundary between pictures (optional); if not present, the decoder infers the boundary from the frame number contained in each slice header.
– Coded Slice: header and data for a slice; this RBSP contains the actual coded video data.
– Data Partition A, B or C: three units containing data-partitioned slice-layer data (useful for decoding in the presence of transmission errors).
– End of Sequence / End of Stream.
– Filler Data: contains 'dummy' data.

126 Example: The following shows an example of a sequence of RBSP elements: Sequence parameter set, SEI, Picture parameter set, I slice (coded slice), Picture delimiter, P slice (coded slice), P slice (coded slice), ...

127 Profiles Baseline: –For lower-cost applications with limited computing resources. This profile is used widely in mobile applications. Main –For broadcast and storage applications. This profile may be replaced by High profile for those applications.

128 Extended – For video streaming applications. This profile has a high compression ratio and error-resilience capabilities. High – For digital video broadcast and disc storage applications, particularly high-definition television applications.

129 (Figure: relationship between the profiles.)
– Baseline Profile: I and P slices, CAVLC, slice groups and ASO, redundant slices.
– Extended Profile: adds B slices, SP/SI slices, data partitioning, weighted prediction and interlace support to the Baseline tools.
– Main Profile: I, P and B slices, CAVLC and CABAC, weighted prediction, interlace support.
– High Profile: adds an 8x8 integer DCT to the Main Profile; further High profiles add 4:2:2 and 4:4:4 support.


Download ppt "H.264 數位影音技術 Wen-Jyi Hwang ( 黃文吉 ) Department of Computer Science and Information Engineering, National Taiwan Normal University."

Similar presentations


Ads by Google