Overview of the High Efficiency Video Coding (HEVC) Standard G.J. Sullivan, J.R. Ohm, W.J. Han, and T. Wiegand IEEE Trans. Circuits and Systems for Video Technology, vol. 22, no. 12, Dec., 2012 Gaewon Kim (Ph.D. course) and Prof. Changhoon Yim Department of Internet and Multimedia Engineering, Konkuk University
Typical HEVC video encoder
HEVC Video Coding Layer Coding tree unit (CTU) and coding tree block (CTB) A CTU consists of one luma CTB and two chroma CTBs. L×L luma CTB: L can be 16, 32, or 64. Coding unit (CU) and coding block (CB) The CTU is the root of the quadtree and is partitioned into CUs recursively. A CU consists of one luma CB and two chroma CBs. Each CU has an associated partitioning into prediction units (PUs) and a tree of transform units (TUs).
HEVC Video Coding Layer Prediction unit (PU) and prediction block (PB) A PU partitioning structure has its root at the CU level. PB size can range from 64×64 down to 4×4. Transform unit (TU) and transform block (TB) A TU tree structure has its root at the CU level. A luma CB may be identical to the luma TB or may be split into smaller luma TBs. TB size can be 4×4, 8×8, 16×16, or 32×32.
HEVC Video Coding Layer Motion compensation Quarter-sample precision is used for the MVs. 7-tap or 8-tap filters are used for interpolation of fractional-sample positions. Intrapicture prediction 33 directional modes, planar (surface fitting), DC (flat) Modes are encoded by deriving most probable modes (MPMs) based on those of previously decoded neighboring PBs.
HEVC Video Coding Layer Quantization control Uniform reconstruction quantization (URQ) Entropy coding Context adaptive binary arithmetic coding (CABAC) In-Loop deblocking filtering Similar to the one in H.264 More friendly to parallel processing Sample adaptive offset (SAO) Nonlinear amplitude mapping For better reconstruction of amplitude by histogram analysis
HEVC Video Coding Techniques HEVC: block-based hybrid video coding Interpicture prediction Temporal statistical dependences Intrapicture prediction Spatial statistical dependences Transform coding
Sampled Representation of Pictures HEVC uses YCbCr color space with 4:2:0 subsampling. Y component Luminance (luma). Represents brightness (gray level). Cb and Cr components Chrominance (chroma). Color difference from gray toward blue and red.
Coding Tree Unit (CTU) A picture is partitioned into CTUs. The CTU is the basic processing unit. It contains one luma CTB and two chroma CTBs. The luma CTB covers L × L samples, and each of the two chroma CTBs covers L/2 × L/2 samples. HEVC supports variable-size CTBs: the value of L may be equal to 16, 32, or 64, selected according to the needs of the encoder in terms of memory and computational requirements. A large CTB is beneficial when encoding high-resolution video content.
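As a worked example of the CTU grid, the following Python sketch counts how many CTUs cover a picture for each allowed CTB size, assuming 4:2:0 sampling so that each chroma CTB is L/2 × L/2. The function name `ctu_grid` and the 1920×1080 picture are illustrative choices, not taken from the source.

```python
import math

def ctu_grid(width, height, ctb_size):
    """Return (columns, rows, chroma CTB size) for a picture split into CTUs.

    Assumes 4:2:0 sampling, so each chroma CTB covers ctb_size/2 x ctb_size/2.
    CTUs at the right and bottom picture edges may extend past the picture.
    """
    if ctb_size not in (16, 32, 64):
        raise ValueError("HEVC luma CTB size L must be 16, 32, or 64")
    cols = math.ceil(width / ctb_size)
    rows = math.ceil(height / ctb_size)
    return cols, rows, ctb_size // 2

for L in (16, 32, 64):
    cols, rows, chroma = ctu_grid(1920, 1080, L)
    print(f"L={L}: {cols}x{rows} = {cols * rows} CTUs, chroma CTB {chroma}x{chroma}")
```

For a 1920×1080 picture this gives 8160 CTUs at L = 16 but only 510 at L = 64, which is one way to see why large CTBs suit high-resolution video.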
Division of the CTB into CBs. CTBs can be used as CBs or can be partitioned into multiple CBs using quadtree structures. The quadtree splitting process can be iterated until the size for a luma CB reaches a minimum allowed luma CB size (8 × 8 or larger).
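The recursive CTB-to-CB split can be sketched as a short Python function. The split decision `should_split` is a hypothetical stand-in for the encoder's choice (typically a rate-distortion decision, which is not modeled here); the recursion stops at the minimum luma CB size.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) in luma samples

def split_ctb(x: int, y: int, size: int, min_cb: int,
              should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively split a CTB into CBs with a quadtree."""
    if size > min_cb and should_split(x, y, size):
        half = size // 2
        blocks: List[Block] = []
        # Visit the four quadrants in z-order: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                blocks += split_ctb(x + dx, y + dy, half, min_cb, should_split)
        return blocks
    return [(x, y, size)]

# Hypothetical decision: split the 64x64 CTB, then split only its top-left 32x32 again.
def decide(x, y, size):
    return size == 64 or (size == 32 and x == 0 and y == 0)

print(split_ctb(0, 0, 64, 8, decide))
```

Passing a decision that always returns True splits a 64×64 CTB all the way down to sixty-four 8×8 CBs.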
PBs and PUs The prediction mode for the CU is signaled as being intra or inter. When it is signaled as intra, the PB (prediction block) size is the same as the CB size for all block sizes except the smallest CB size allowed in the bitstream. For the smallest CB size, a flag indicates whether the CB is split into four PB quadrants, each with its own intrapicture prediction mode. This allows distinct mode selections for blocks as small as 4 × 4.
PBs and PUs When the prediction mode is signaled as inter, it is specified whether the luma and chroma CBs are split into one, two, or four PBs. The splitting into four PBs is allowed only when the CB size is equal to the smallest CB size. Each interpicture-predicted PB is assigned one or two motion vectors and reference picture indices.
Asymmetric Motion Partitioning: PBs and PUs (figure: PB partitioning modes for intrapicture prediction, interpicture prediction, and asymmetric motion partitioning)
Tree-Structured Partitioning into Transform Blocks and Units For residual coding, a CB can be recursively partitioned into transform blocks. The partitioning is signaled by a residual quadtree.
Tree-Structured Partitioning into Transform Blocks and Units Subdivision of a CTB into CBs and TBs. Solid lines: CB boundaries, dotted lines: TB boundaries
Slices and Tiles A slice is a sequence of CTUs that are processed in raster-scan order. The main purpose of slices is resynchronization after data losses.
Slices and Tiles Slices are self-contained: a slice can be correctly decoded without using any data from other slices in the same picture. This means that prediction within the picture is not performed across slice boundaries. In-loop filtering, however, may still operate across slice boundaries.
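Slices and Tiles Each slice can be coded using different coding types. I slice: a slice in which all CUs are coded using only intrapicture prediction. P slice: in addition to the coding types of an I slice, some CUs can be coded using interpicture prediction with uniprediction (at most one motion-compensated prediction signal per PB). B slice: in addition to the coding types of a P slice, some CUs can be coded using interpicture prediction with biprediction (up to two motion-compensated prediction signals per PB).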
Slices and Tiles Tiles are self-contained and independently decodable. The main purpose of tiles is to enable the use of parallel processing architectures for encoding and decoding.
Slices and Tiles Wavefront parallel processing (WPP) When WPP is enabled, a slice is divided into rows of CTUs. This supports parallel processing of rows of CTUs by using several processing threads in the encoder or decoder.
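WPP can be modeled with a simple timing sketch. In WPP the entropy coder state of each CTU row is inherited from the row above with a two-CTU lag, so a CTU can start once its left neighbor and the CTU above and to the right are finished. The sketch below assumes one time step per CTU and one thread per CTU row; the function name is illustrative.

```python
def wpp_start_times(ctu_cols: int, ctu_rows: int):
    """Earliest start step of each CTU under wavefront parallel processing.

    Simplifying assumptions: every CTU takes one time step and there are as
    many threads as CTU rows.
    """
    start = [[0] * ctu_cols for _ in range(ctu_rows)]
    for r in range(ctu_rows):
        for c in range(ctu_cols):
            deps = []
            if c > 0:
                deps.append(start[r][c - 1] + 1)  # left CTU finished
            if r > 0:
                above_right = min(c + 1, ctu_cols - 1)
                deps.append(start[r - 1][above_right] + 1)  # above-right CTU finished
            start[r][c] = max(deps, default=0)
    return start

for row in wpp_start_times(6, 3):
    print(row)
```

Each row starts two steps after the one above, so a 6×3 CTU slice finishes in 10 steps instead of the 18 needed sequentially.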
Intrapicture Prediction Planar prediction (Intra_Planar) Amplitude surface with a horizontal and vertical slope derived from the boundaries DC prediction (Intra_DC) Flat surface with a value matching the mean value of the boundary samples Directional prediction (Intra_Angular) 33 different directional prediction modes are defined for square TB sizes from 4×4 up to 32×32
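The planar and DC rules can be written out directly. This Python sketch follows the integer planar and DC formulas of the HEVC specification, takes the boundary samples as plain lists, and omits the extra edge smoothing that HEVC applies to the first row and column of small luma DC blocks. Function and parameter names are illustrative.

```python
def planar_predict(top, left, top_right, bottom_left):
    """Intra_Planar for an N x N block (N a power of two).

    top[x] is the sample above column x, left[y] the sample left of row y;
    top_right and bottom_left are the samples just beyond the block corners.
    Returns pred[y][x].
    """
    n = len(top)
    shift = n.bit_length()  # equals log2(N) + 1 for power-of-two N
    return [[((n - 1 - x) * left[y] + (x + 1) * top_right
              + (n - 1 - y) * top[x] + (y + 1) * bottom_left + n) >> shift
             for x in range(n)] for y in range(n)]

def dc_predict(top, left):
    """Intra_DC: flat block equal to the rounded mean of the 2N boundary samples."""
    n = len(top)
    dc = (sum(top) + sum(left) + n) >> n.bit_length()  # divide by 2N with rounding
    return [[dc] * n for _ in range(n)]

print(dc_predict([10, 20, 30, 40], [50, 60, 70, 80]))
print(planar_predict([0] * 4, [0] * 4, 64, 64))
```

With zero above and left samples and 64 at both far corners, the planar surface ramps from 16 at the top-left toward 64 at the bottom-right.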
Intrapicture Prediction Fig. 6. Modes and directional orientations for intrapicture prediction
Intrapicture Prediction PB Partitioning When the CB size is larger than the minimum CB size, the PB size is equal to the CB size. When the CB size is equal to the minimum CB size, an intrapicture-predicted CB may have two types of PB partitions: PART_2N×2N (no split) and PART_N×N (split into four equal-sized PBs).
Intrapicture Prediction Intra_Angular Prediction 33 prediction directions, Intra_Angular[k], k = 2, ..., 34. Each TB is predicted directionally from previously reconstructed, spatially neighboring samples. For a TB of size N×N, a total of 4N+1 spatially neighboring samples may be used for prediction: left, above, above-right, and lower-left. To improve the intrapicture prediction accuracy, the projected reference sample location is computed with 1/32-sample accuracy.
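The 1/32-sample projection can be illustrated for the vertical modes with non-negative angles (k = 26 to 34). The angle table below is the `intraPredAngle` table from the HEVC specification. For simplicity, the sketch leaves out the negative-angle modes (which extend the reference array with projected left samples) and the boundary filter applied to mode 26; the function name is illustrative.

```python
# intraPredAngle for modes 2..34 (index = mode - 2), from the HEVC specification.
INTRA_PRED_ANGLE = [32, 26, 21, 17, 13, 9, 5, 2, 0, -2, -5, -9, -13, -17, -21, -26,
                    -32, -26, -21, -17, -13, -9, -5, -2, 0, 2, 5, 9, 13, 17, 21, 26, 32]

def angular_predict_vertical(ref, n, mode):
    """Intra_Angular[mode] sketch for vertical modes 26..34 (angles >= 0).

    ref[0] is the above-left corner sample; ref[1..2n] are the n above and
    n above-right reconstructed samples. Returns pred[y][x].
    """
    if not 26 <= mode <= 34:
        raise ValueError("this sketch only covers modes 26..34")
    angle = INTRA_PRED_ANGLE[mode - 2]
    pred = []
    for y in range(n):
        pos = (y + 1) * angle            # projected displacement in 1/32 samples
        idx, frac = pos >> 5, pos & 31   # integer part and 1/32 fraction
        row = []
        for x in range(n):
            if frac:
                # Two-tap linear interpolation between neighbouring reference samples.
                row.append(((32 - frac) * ref[x + idx + 1] + frac * ref[x + idx + 2] + 16) >> 5)
            else:
                row.append(ref[x + idx + 1])
        pred.append(row)
    return pred

print(angular_predict_vertical(list(range(9)), 4, 30))
```

Mode 26 copies the above row straight down, while mode 34 shifts it by one sample per row along the 45-degree diagonal.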
Intrapicture Prediction Reference Sample Smoothing Reference samples used for intrapicture prediction are sometimes filtered by a [1 2 1]/4 smoothing filter, depending on the block size and prediction direction. 4×4 block: the smoothing filter is not applied. 8×8 block: only for the diagonal directions, k = 2, 18, 34. 16×16 block: most directions, except near-horizontal and near-vertical (k = 9 to 11 and 25 to 27). 32×32 block: most directions, except exactly horizontal (k = 10) and exactly vertical (k = 26).
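The smoothing rule per block size can be expressed as a distance threshold from the exact horizontal (k = 10) and vertical (k = 26) modes, which is how the HEVC specification encodes it. This sketch also covers Intra_Planar (mode 0), which the specification filters for 8×8 and larger blocks, and leaves out the optional strong bilinear smoothing for 32×32 blocks. Function names are illustrative.

```python
def needs_smoothing(n, mode):
    """Whether the reference samples of an N x N luma TB are [1 2 1]-filtered.

    mode: 0 = Intra_Planar, 1 = Intra_DC, 2..34 = Intra_Angular[k].
    """
    if mode == 1 or n == 4:
        return False
    dist = min(abs(mode - 26), abs(mode - 10))  # distance from vertical / horizontal
    threshold = {8: 7, 16: 1, 32: 0}[n]
    return dist > threshold

def smooth_121(ref):
    """Apply the [1 2 1]/4 filter along the reference array; end samples are kept."""
    out = list(ref)
    for i in range(1, len(ref) - 1):
        out[i] = (ref[i - 1] + 2 * ref[i] + ref[i + 1] + 2) >> 2
    return out

print([k for k in range(2, 35) if needs_smoothing(8, k)])   # diagonal modes only
print([k for k in range(2, 35) if not needs_smoothing(16, k)])
```

With a threshold of 7, only modes 2, 18, and 34 clear the bar for 8×8 blocks; thresholds of 1 and 0 exclude k = 9 to 11 and 25 to 27 for 16×16 and only 10 and 26 for 32×32.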
Intrapicture Prediction Mode Coding HEVC considers three most probable modes (MPMs) when coding luma intrapicture prediction modes predictively. The first two MPMs are initialized by the prediction modes of the above and left PBs. Any unavailable prediction mode is considered to be Intra_DC. When the first two MPMs are not equal, the third MPM is set to Intra_Planar, Intra_DC, or Intra_Angular[26] (vertical), whichever of these, in this order, is not a duplicate of the first two. If the current luma prediction mode is one of the three MPMs, only the MPM index is transmitted. Otherwise, the index of the current luma prediction mode among the remaining 32 modes is transmitted by using a 5-bit fixed-length code.
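MPM derivation and mode coding can be sketched as follows. Besides the unequal-neighbor case above, the sketch includes the HEVC rule for equal neighbors: if both are Intra_Planar or Intra_DC, the MPMs are Intra_Planar, Intra_DC, and Intra_Angular[26]; if both are the same angular mode, its two adjacent angular modes are added. The 32 non-MPM modes are renumbered to 0..31 for the 5-bit code. Function names are illustrative.

```python
PLANAR, DC, VERTICAL = 0, 1, 26  # mode numbers; 2..34 are Intra_Angular[k]

def derive_mpms(left_mode, above_mode):
    """Three-MPM list; None marks an unavailable neighbour, treated as Intra_DC."""
    a = DC if left_mode is None else left_mode
    b = DC if above_mode is None else above_mode
    if a == b:
        if a < 2:  # both Intra_Planar or both Intra_DC
            return [PLANAR, DC, VERTICAL]
        # Same angular mode: add its two neighbouring directions, wrapping within 2..34.
        return [a, 2 + ((a + 29) % 32), 2 + ((a - 2 + 1) % 32)]
    # Different modes: the third MPM is the first of Planar, DC, Vertical not yet used.
    third = next(m for m in (PLANAR, DC, VERTICAL) if m not in (a, b))
    return [a, b, third]

def encode_mode(mode, mpms):
    """Return ('mpm', index) or ('rem', value 0..31 for the 5-bit code)."""
    if mode in mpms:
        return ("mpm", mpms.index(mode))
    return ("rem", mode - sum(1 for m in mpms if m < mode))  # skip over the MPMs

def decode_mode(kind, value, mpms):
    """Invert encode_mode by re-inserting the MPMs in ascending order."""
    if kind == "mpm":
        return mpms[value]
    mode = value
    for m in sorted(mpms):
        if mode >= m:
            mode += 1
    return mode

mpms = derive_mpms(10, 10)
print(mpms, encode_mode(20, mpms))
```

For two horizontal neighbors the list is [10, 9, 11], and mode 20 is sent as remaining-mode value 17.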
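Interpicture Prediction Partitioning modes PART_2N×2N The CB is not split. PART_2N×N The CB is split horizontally into two equal-size PBs (upper and lower halves). PART_N×2N The CB is split vertically into two equal-size PBs (left and right halves). PART_N×N The CB is split into four equal-size PBs. PART_2N×nU, PART_2N×nD, PART_nL×2N, and PART_nR×2N These types are known as asymmetric motion partitions (AMP): the CB is split into two PBs, one covering a quarter of the CB and the other the remaining three quarters.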
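The partitioning modes map to PB rectangles as follows. This sketch only computes the geometry for a CB of size 2N × 2N; it does not check which modes are allowed for a given CB size (for example, PART_N×N only at the smallest CB size). Mode names use an ASCII `x` in place of `×`.

```python
def pb_rects(mode, s):
    """PB rectangles (x, y, width, height) inside an s x s CB, where s = 2N."""
    h, q = s // 2, s // 4
    table = {
        "PART_2Nx2N": [(0, 0, s, s)],
        "PART_2NxN":  [(0, 0, s, h), (0, h, s, h)],          # upper / lower halves
        "PART_Nx2N":  [(0, 0, h, s), (h, 0, h, s)],          # left / right halves
        "PART_NxN":   [(0, 0, h, h), (h, 0, h, h), (0, h, h, h), (h, h, h, h)],
        "PART_2NxnU": [(0, 0, s, q), (0, q, s, s - q)],      # AMP: quarter on top
        "PART_2NxnD": [(0, 0, s, s - q), (0, s - q, s, q)],  # AMP: quarter at bottom
        "PART_nLx2N": [(0, 0, q, s), (q, 0, s - q, s)],      # AMP: quarter on the left
        "PART_nRx2N": [(0, 0, s - q, s), (s - q, 0, q, s)],  # AMP: quarter on the right
    }
    return table[mode]

print(pb_rects("PART_2NxnU", 32))
```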
Interpicture Prediction HEVC supports motion vectors with units of one quarter of the distance between luma samples. Fractional Sample Interpolation It is used to generate the prediction samples for noninteger sampling positions.
Interpicture Prediction Fractional Sample Interpolation For luma, HEVC uses an eight-tap filter for the half-sample positions and a seven-tap filter for the quarter-sample positions. HEVC uses a single, consistent separable interpolation process for all fractional positions, without intermediate rounding operations. This improves precision and simplifies the architecture.
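A 1-D version of the luma interpolation can be sketched with the HEVC luma filter coefficients. The sketch assumes 8-bit video, filtering along a single row, and uni-prediction, where the final (sum + 32) >> 6 rounding should match the specification; 2-D positions and higher bit depths keep additional intermediate precision and are not modeled. The function name is illustrative.

```python
# HEVC luma interpolation taps, applied to integer samples x-3 .. x+4.
# The quarter and three-quarter filters are 7-tap; a zero pads them to 8 entries.
LUMA_FILTERS = {
    1: [-1, 4, -10, 58, 17, -5, 1, 0],   # 1/4 position (7-tap)
    2: [-1, 4, -11, 40, 40, -11, 4, -1], # 1/2 position (8-tap)
    3: [0, 1, -5, 17, 58, -10, 4, -1],   # 3/4 position (7-tap)
}

def interp_luma_1d(row, x, frac):
    """8-bit sample at position x + frac/4 along one row of luma samples."""
    if frac == 0:
        return row[x]
    acc = 0
    for i, c in enumerate(LUMA_FILTERS[frac]):
        xi = min(max(x - 3 + i, 0), len(row) - 1)  # clamp coordinates at the picture edge
        acc += c * row[xi]
    return min(max((acc + 32) >> 6, 0), 255)      # taps sum to 64: round, then clip

ramp = [4 * i for i in range(16)]
print([interp_luma_1d(ramp, 5, f) for f in range(4)])
```

On a linear ramp the filters reproduce the expected in-between values, and because each filter's taps sum to 64, flat areas pass through unchanged.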
Interpicture Prediction
Transform, Scaling, and Quantization HEVC uses transform coding of the prediction error residual. The residual block is partitioned into multiple square TBs. The supported transform block sizes are 4×4, 8×8, 16×16, and 32×32.
Transform, Scaling, and Quantization Core Transform Two-dimensional transforms are computed by applying 1-D transforms in the horizontal and vertical directions. The elements of the core transform matrices were derived by approximating scaled DCT basis functions.
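The separable structure can be seen with the 4×4 core transform matrix. The sketch computes M·X·Mᵀ in exact integer arithmetic: multiplying by M transforms each column, and multiplying by Mᵀ transforms each row. HEVC's intermediate scaling shifts, which keep values within 16 bits, are deliberately left out; helper names are illustrative.

```python
# 4x4 HEVC core transform matrix (integer approximation of scaled DCT basis functions).
DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(r) for r in zip(*m)]

def forward_2d(block, m=DCT4):
    """Separable 2-D transform: 1-D transform of the columns, then of the rows."""
    return matmul(matmul(m, block), transpose(m))

print(forward_2d([[1] * 4 for _ in range(4)]))
print(matmul(DCT4, transpose(DCT4)))
```

For a constant block all energy lands in the DC coefficient, and M·Mᵀ is nearly a scaled identity (16384 or 16370 on the diagonal), reflecting how closely the integer basis approximates an orthogonal DCT.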
Transform, Scaling, and Quantization Alternative integer transform It is derived from a discrete sine transform (DST). It is applied only to 4×4 luma residual blocks of intrapicture-predicted regions. It is not much more computationally demanding than the 4×4 DCT-style transform, and it provides approximately 1% bit-rate reduction in intrapicture predictive coding.
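The choice between the two 4×4 transforms, and why the DST helps, can be illustrated with a residual that grows with distance from the prediction boundary, a pattern typical of intrapicture prediction residuals. The matrices are the HEVC 4×4 integer DST and DCT-style matrices; the function names are illustrative.

```python
# HEVC 4x4 integer transform matrices (rows are basis functions).
DST4 = [[29, 55, 74, 84], [74, 74, 0, -74], [84, -29, -74, 55], [55, -84, 74, -29]]
DCT4 = [[64, 64, 64, 64], [83, 36, -36, -83], [64, -64, -64, 64], [36, -83, 83, -36]]

def select_4x4_transform(is_luma, is_intra):
    """DST for 4x4 intrapicture-predicted luma residuals, DCT-style otherwise."""
    return DST4 if (is_luma and is_intra) else DCT4

def forward_1d(vec, m):
    """Apply one 1-D transform: each output is the dot product with a basis row."""
    return [sum(c * v for c, v in zip(row, vec)) for row in m]

ramp = [1, 2, 3, 4]  # residual growing with distance from the reference samples
print(forward_1d(ramp, DST4))  # [697, -74, 24, -7]
print(forward_1d(ramp, DCT4))  # [640, -285, 0, -25]
```

For this ramp the DST leaves much less energy outside the first coefficient than the DCT-style transform does.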
Transform, Scaling, and Quantization HEVC uses a uniform reconstruction quantization (URQ) scheme controlled by a quantization parameter (QP). The range of the QP values is defined from 0 to 51.
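In HEVC, as in H.264/MPEG-4 AVC, an increase of 6 in QP doubles the quantization step size, so the QP-to-step-size mapping is approximately logarithmic. The following Python sketch uses the common floating-point approximation Qstep ≈ 2^((QP - 4)/6) rather than HEVC's exact integer scaling tables, and assumes a plain rounding offset of 0.5 in the encoder-side quantizer. Function names are illustrative.

```python
def qstep(qp):
    """Approximate step size: doubles every 6 QP steps and equals 1 at QP 4."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return 2.0 ** ((qp - 4) / 6)

def quantize(coeff, qp, rounding=0.5):
    """Encoder-side scalar quantizer; the rounding offset is an encoder choice."""
    level = int(abs(coeff) / qstep(qp) + rounding)
    return -level if coeff < 0 else level

def dequantize(level, qp):
    """Uniform reconstruction: level k is reconstructed as k * Qstep."""
    return level * qstep(qp)

for qp in (22, 28, 34):
    lvl = quantize(100, qp)
    print(qp, round(qstep(qp), 2), lvl, round(dequantize(lvl, qp), 2))
```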
Entropy Coding HEVC uses only CABAC for entropy coding. Context modeling The number of contexts used in HEVC is substantially smaller than in H.264/MPEG-4 AVC, yet the entropy coding design actually provides better compression. Adaptive coefficient scanning Coefficient scanning is performed in 4×4 subblocks for all TB sizes. For 4×4 and 8×8 TBs in intrapicture-predicted regions, the coefficient scanning order is selected according to the directionality of the intrapicture prediction mode.
Entropy Coding Adaptive coefficient scanning The horizontal scan is used when the prediction direction is close to vertical. The vertical scan is used when the prediction direction is close to horizontal. For other prediction directions, the diagonal up-right scan is used.
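The scan selection and the three 4×4 scan patterns can be sketched as follows. The mode ranges (6 to 14 and 22 to 30) are those of the HEVC specification for luma TBs of size 4×4 and 8×8; larger TBs always use the diagonal scan. The sketch covers the scan within one 4×4 subblock; the same pattern orders the subblocks of a larger TB. Function names are illustrative.

```python
def scan_order(kind, n=4):
    """(x, y) positions of an n x n block in scan order; x = column, y = row."""
    if kind == "horizontal":
        return [(x, y) for y in range(n) for x in range(n)]
    if kind == "vertical":
        return [(x, y) for x in range(n) for y in range(n)]
    # Diagonal up-right: walk each anti-diagonal from bottom-left to top-right.
    order = []
    for d in range(2 * n - 1):
        for y in range(min(d, n - 1), max(0, d - n + 1) - 1, -1):
            order.append((d - y, y))
    return order

def select_scan(intra_mode, tb_size):
    """Mode-dependent scan choice for an intrapicture-predicted luma TB."""
    if tb_size in (4, 8):
        if 22 <= intra_mode <= 30:   # near-vertical prediction
            return "horizontal"
        if 6 <= intra_mode <= 14:    # near-horizontal prediction
            return "vertical"
    return "diagonal"

print(select_scan(26, 4), scan_order(select_scan(26, 4))[:5])
```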
Entropy Coding Coefficient coding HEVC transmits the position of the last nonzero transform coefficient, a significance map, and the sign bits and levels of the transform coefficients.
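The pieces listed above can be extracted from a 4×4 block of quantized levels with a short sketch. It scans the block in diagonal up-right order and reports the last significant position, then the significance flags, levels, and signs from that position back toward DC, which is the order in which HEVC codes them. Context modeling, level binarization, and sign data hiding are not modeled, and the output format is illustrative.

```python
def diag_scan4():
    """Diagonal up-right scan of a 4x4 subblock as (x, y) = (column, row) pairs."""
    return [(d - y, y) for d in range(7) for y in range(min(d, 3), max(0, d - 3) - 1, -1)]

def coefficient_syntax(block):
    """Split a 4x4 block of quantized levels (block[y][x]) into signalled parts."""
    scan = diag_scan4()
    values = [block[y][x] for (x, y) in scan]
    nonzero = [i for i, v in enumerate(values) if v != 0]
    if not nonzero:
        return None  # an all-zero block is signalled by a coded block flag instead
    last = nonzero[-1]
    # The flag at the last position is implied, so the map covers positions before it.
    sig_map = [int(values[i] != 0) for i in range(last - 1, -1, -1)]
    levels = [abs(values[i]) for i in range(last, -1, -1) if values[i] != 0]
    signs = [int(values[i] < 0) for i in range(last, -1, -1) if values[i] != 0]
    return {"last_pos": scan[last], "sig_map": sig_map, "levels": levels, "signs": signs}

block = [
    [5, -2, 0, 0],
    [3,  0, 0, 0],
    [0,  1, 0, 0],
    [0,  0, 0, 0],
]
print(coefficient_syntax(block))
```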
In-Loop Filters Two processing steps, a deblocking filter (DBF) followed by a sample adaptive offset (SAO) filter, are applied to the reconstructed samples. The DBF is intended to reduce the blocking artifacts caused by block-based coding and is applied only to samples located at block boundaries. The SAO filter is applied adaptively to all samples satisfying certain conditions (e.g., based on the local gradient).
In-Loop Filters Deblocking Filter It is applied to all samples adjacent to a PU or TU boundary, except when the boundary is also a picture boundary or when deblocking is disabled across slice or tile boundaries. HEVC applies the deblocking filter only to edges that are aligned on an 8×8 sample grid. This restriction reduces the worst-case computational complexity without noticeable degradation of visual quality, and it also improves parallel processing. The processing order is defined as horizontal filtering of vertical edges for the entire picture first, followed by vertical filtering of horizontal edges.
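The 8×8 grid restriction and the picture-level order can be illustrated as follows. The inputs are hypothetical lists of boundary positions, each boundary is treated as a full-length line across the picture, and the actual filter decisions are not modeled.

```python
def deblock_edges(width, height, boundaries_x, boundaries_y, grid=8):
    """Edges to deblock, in picture-level processing order.

    boundaries_x: x positions of vertical PU/TU boundaries.
    boundaries_y: y positions of horizontal PU/TU boundaries.
    Boundaries off the 8x8 grid and picture borders are skipped.
    """
    vertical = sorted(x for x in set(boundaries_x) if x % grid == 0 and 0 < x < width)
    horizontal = sorted(y for y in set(boundaries_y) if y % grid == 0 and 0 < y < height)
    # Pass 1: horizontal filtering across every vertical edge in the picture.
    # Pass 2: vertical filtering across every horizontal edge.
    return [("vertical_edge", x) for x in vertical] + [("horizontal_edge", y) for y in horizontal]

print(deblock_edges(64, 64, [0, 4, 8, 16, 20, 64], [0, 32, 36]))
```

The 4-sample-aligned boundaries at x = 4, 20 and y = 36 are dropped, as are the picture borders.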
In-Loop Filters Deblocking Filter The filter strength (boundary strength) takes only three values. Given two adjacent blocks P and Q with a common boundary on the 8×8 grid: A filter strength of 2 is assigned when one of the blocks is intrapicture predicted. A filter strength of 1 is assigned if any of the following conditions is satisfied: P or Q has at least one nonzero transform coefficient; the reference indices of P and Q are not equal; the numbers of motion vectors of P and Q are not equal; the difference between a motion vector component of P and Q is greater than or equal to one integer sample. A filter strength of 0 means that the deblocking process is not applied.
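The boundary-strength decision can be written as a small function. The `BlockInfo` fields are hypothetical, motion vectors are in quarter-sample units (so one integer sample is 4), and reference pictures are compared directly, which is how the HEVC specification states the reference condition. The sketch assumes the motion vectors of P and Q are listed in matching order and, unlike the specification, does not restrict the coefficient check to transform-block edges.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class BlockInfo:
    """Hypothetical per-block data needed for the strength decision."""
    is_intra: bool
    has_nonzero_coeff: bool = False
    ref_pics: Tuple[int, ...] = ()                             # referenced pictures, e.g. by POC
    mvs: List[Tuple[int, int]] = field(default_factory=list)   # quarter-sample units

def boundary_strength(p: BlockInfo, q: BlockInfo) -> int:
    if p.is_intra or q.is_intra:
        return 2
    if p.has_nonzero_coeff or q.has_nonzero_coeff:
        return 1
    if p.ref_pics != q.ref_pics or len(p.mvs) != len(q.mvs):
        return 1
    for (px, py), (qx, qy) in zip(p.mvs, q.mvs):
        if abs(px - qx) >= 4 or abs(py - qy) >= 4:  # at least one integer sample apart
            return 1
    return 0

p = BlockInfo(False, ref_pics=(0,), mvs=[(0, 0)])
q = BlockInfo(False, ref_pics=(0,), mvs=[(3, 0)])
print(boundary_strength(p, q))  # 0: less than one integer sample apart
```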
In-Loop Filters SAO (sample adaptive offset) It is a process that modifies the decoded samples by conditionally adding an offset value to each sample after the application of the deblocking filter, based on values in look-up tables transmitted by the encoder. It is performed on a region basis, with a filtering type selected per CTB. sao_type_idx 0: SAO is not applied to the CTB. sao_type_idx 1: band offset filtering. sao_type_idx 2: edge offset filtering.
In-Loop Filters SAO In the band offset mode, the selected offset value depends directly on the sample amplitude. The full sample amplitude range is uniformly split into 32 segments called bands. The sample values belonging to four of these bands (which are consecutive within the 32 bands) are modified by adding transmitted offset values. The main reason for using four consecutive bands is that in smooth areas, where banding artifacts can appear, the sample amplitudes in a CTB tend to be concentrated in only a few of the bands.
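Band offset can be sketched in a few lines. For a bit depth B, the band index is the sample value shifted right by B - 5 bits, giving 32 equal bands; in the HEVC specification the four consecutive bands wrap around from band 31 to band 0, which the sketch reproduces with a 5-bit mask. Names are illustrative.

```python
def sao_band_offset(samples, band_position, offsets, bit_depth=8):
    """Add offsets[k] to samples in band (band_position + k) mod 32, k = 0..3."""
    if len(offsets) != 4:
        raise ValueError("band offset uses exactly four offsets")
    shift = bit_depth - 5          # band index = top 5 bits of the sample value
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = ((s >> shift) - band_position) & 31  # position relative to the first band
        if k < 4:
            s = min(max(s + offsets[k], 0), max_val)
        out.append(s)
    return out

print(sao_band_offset([40, 7, 63, 64], band_position=5, offsets=[1, 2, 3, 4]))
```

With 8-bit samples each band is 8 values wide, so only samples 40 to 71 are touched in this example.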
In-Loop Filters SAO In the edge offset mode, one of four gradient directions (horizontal, vertical, or one of two diagonals) is used for the edge offset classification in the CTB. Each sample in the CTB is classified into one of five EdgeIdx categories by comparing it with its two neighbors along the selected direction.
In-Loop Filters SAO In the edge offset mode, an offset value that depends on the EdgeIdx category is added to the sample value. Because positive offsets are applied to local minima and negative offsets to local maxima, edge offset generally has a smoothing effect.
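Edge offset classification and correction can be sketched as follows. Each sample p is compared with its two neighbors a and b along the chosen direction, 2 + sign(p - a) + sign(p - b) is remapped to EdgeIdx categories 0 to 4, and the category's offset is added. For simplicity, the sketch leaves border samples unchanged, whereas HEVC uses samples of neighboring CTBs where they are available. Names are illustrative.

```python
def sign(v):
    return (v > 0) - (v < 0)

# Neighbour offsets (dx, dy) for the four edge offset classes.
EO_CLASSES = {0: ((-1, 0), (1, 0)),    # horizontal
              1: ((0, -1), (0, 1)),    # vertical
              2: ((-1, -1), (1, 1)),   # 135-degree diagonal
              3: ((1, -1), (-1, 1))}   # 45-degree diagonal

# 2 + sign(p - a) + sign(p - b) -> EdgeIdx (1: local min, 4: local max, 0: none).
TO_CATEGORY = {0: 1, 1: 2, 2: 0, 3: 3, 4: 4}

def sao_edge_offset(img, eo_class, offsets, bit_depth=8):
    """Apply offsets[c - 1] to samples of category c = 1..4 in a 2-D list."""
    (dx0, dy0), (dx1, dy1) = EO_CLASSES[eo_class]
    h, w = len(img), len(img[0])
    max_val = (1 << bit_depth) - 1
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            p = img[y][x]
            edge = 2 + sign(p - img[y + dy0][x + dx0]) + sign(p - img[y + dy1][x + dx1])
            cat = TO_CATEGORY[edge]
            if cat:
                out[y][x] = min(max(p + offsets[cat - 1], 0), max_val)
    return out

dip = [[20, 20, 20], [20, 10, 20], [20, 20, 20]]
print(sao_edge_offset(dip, 0, [3, 1, -1, -3]))  # local minimum raised toward neighbours
```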
Special Coding Modes I_PCM mode The prediction, transform, quantization and entropy coding are bypassed. The samples are directly represented by a pre-defined number of bits. Its main purpose is to avoid excessive consumption of bits when the signal characteristics are extremely unusual and cannot be properly handled by hybrid coding.
Special Coding Modes Lossless mode The transform, quantization, and other processing that affects the decoded picture are bypassed. The residual signal from inter- or intrapicture prediction is directly fed into the entropy coder. It allows mathematically lossless reconstruction. SAO and deblocking filtering are not applied to these regions.
Special Coding Modes Transform skipping mode Only the transform is bypassed. It improves compression for certain types of video content such as computer-generated images or graphics mixed with camera-view content. It can be applied to TBs of 4×4 size only.