1
HEVC High Level Syntax and Around
Prepared by Shevach Riabtsev All questions/suggestions pls. address to
2
HEVC Encoder
[Figure: HEVC encoder block diagram. Blocks: motion estimation, motion compensation, intra estimation, intra prediction, intra/inter decision, transform and quantization (T & Q), inverse quantization and inverse transform (Q-1 & T-1), CABAC, deblocking filter, SAO, SAO parameter estimation, filter control, DPB.]
Notes:
Compared with AVC/H.264, SAO and SAO parameter estimation are added. The "SAO Params Est." block can be executed either right after deblocking or right after reconstruction (with negligible penalty), as shown in the figure.
The similarity of HEVC to AVC/H.264 allows existing AVC/H.264 solutions to be upgraded to HEVC quickly.
3
Bitstream Structure
[Figure: bitstream layout. VPS, SPS, PPS, then for each picture (Picture #1 ... Picture #k) one or more pairs of slice header and slice data.]
As in H.264/AVC, HEVC also specifies a byte stream format in which each NAL unit is delimited by a start code (0x000001). Note that each stream must begin with at least the 4-byte start code (0x00000001). The 4-byte start code at the very beginning of a stream lets a decoder establish byte alignment and not skip over the first NAL unit (in case the decoder enters the stream at a bit-aligned position rather than a byte-aligned one).
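The start-code delimiting described above can be sketched as follows. This is a minimal illustration, not a complete Annex B parser: the function name split_nal_units is hypothetical, the byte values in the usage example are dummy data, and emulation prevention bytes are deliberately not handled.

```python
def split_nal_units(stream: bytes) -> list:
    """Split a byte stream into NAL units using the 3-byte start code 0x000001.

    Assumption: the input is byte-aligned. A 4-byte start code (0x00000001)
    is simply a zero byte followed by the 3-byte start code, so the leading
    zero is stripped from the end of the previous unit.
    """
    start_code = b"\x00\x00\x01"
    units = []
    pos = stream.find(start_code)
    while pos != -1:
        begin = pos + len(start_code)
        nxt = stream.find(start_code, begin)
        end = len(stream) if nxt == -1 else nxt
        # A NAL unit never ends with 0x00, so trailing zeros belong
        # to the next (4-byte) start code or to padding.
        units.append(stream[begin:end].rstrip(b"\x00"))
        pos = nxt
    return units


# Dummy stream: 4-byte start code, 3-byte start code, 4-byte start code.
demo = (b"\x00\x00\x00\x01\x40\x01\xAA"
        b"\x00\x00\x01\x42\x01\xBB"
        b"\x00\x00\x00\x01\x44\x01\xCC")
print([u.hex() for u in split_nal_units(demo)])
```

A decoder that starts in the middle of a stream can use the same search to resynchronize on the next start code.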
4
High-Level Syntax ( VPS/SPS)
VPS: conveys information that is common to multiple layers, i.e. each layer refers to the same VPS.
SPS: contains information that applies to all pictures of a video sequence and is fixed within the sequence:
- Profile, level, picture size, number of sub-layers
- Enabling flags
- Restrictions:
  - log2_min_luma_coding_block_size_minus3: minimal CU size
  - log2_diff_max_min_luma_coding_block_size: together with the minimal CU size specifies the maximal CU size
  - log2_min_transform_block_size: minimal transform block size
  - log2_diff_max_min_transform_block_size: together with the minimal TB size specifies the maximal TB size
- Temporal scalability control
- Video usability information (VUI)
Notes: Some information is duplicated between SPS and VPS (e.g. profile_idc).
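The relation between these SPS restrictions and the actual block sizes can be sketched as below. The function name and argument names are hypothetical, and the minimal transform block size is assumed to be passed directly as a log2 value (as written on the slide) rather than with any offset.

```python
def sps_block_sizes(log2_min_cb_minus3, log2_diff_max_min_cb,
                    log2_min_tb, log2_diff_max_min_tb):
    """Derive CU and TB size limits (in luma samples) from SPS-style fields.

    Assumption: log2_min_tb is the plain log2 of the minimal TB size.
    """
    min_cb_log2 = log2_min_cb_minus3 + 3              # minimal CU size
    ctb_log2 = min_cb_log2 + log2_diff_max_min_cb     # maximal CU (CTU) size
    max_tb_log2 = log2_min_tb + log2_diff_max_min_tb  # maximal TB size
    return {
        "min_cu": 1 << min_cb_log2,
        "max_cu": 1 << ctb_log2,
        "min_tb": 1 << log2_min_tb,
        "max_tb": 1 << max_tb_log2,
    }


# 8x8 minimal CU, 64x64 CTU, transforms from 4x4 up to 32x32.
print(sps_block_sizes(0, 3, 2, 3))
```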
5
Potential Usage of Some SPS Parameters
log2_min_luma_coding_block_size: specifies the minimal CU size. Potential usage: if it is known a priori that a video sequence is "flat" or "smooth", it is worth considering log2_min_luma_coding_block_size = 4 (16x16). Otherwise split flags at the 16x16 depth are signaled redundantly.
log2_diff_max_min_luma_coding_block_size: together with log2_min_luma_coding_block_size specifies the CTU size. There is no reason (except perhaps legacy compatibility with H.264/AVC) to set the CTU size smaller than 64x64. Moreover, according to [8], a 64x64 CTU brings nearly 12% bitrate reduction on average compared with a 16x16 CTU.
log2_min_transform_block_size: specifies the minimal transform block size. Potential usage: for a "flat" video sequence it is worth considering a minimal transform block size of 8x8.
log2_diff_max_min_transform_block_size: together with the minimal TB size specifies the maximal TB size. Large transform sizes can cause performance peaks, so it is worth considering avoiding 32x32 transforms by limiting the maximal transform size to 16x16.
6
High-Level Syntax ( PPS/Slice Header)
PPS: conveys information that can change from picture to picture:
- Reference list size
- Initial QP. Do not confuse QP with the quantizer step size: QP is a control parameter that determines the step size.
- Enabling flags
- Tiles/Wavefronts
Slice Header: conveys information that can change from slice to slice:
- POC, slice type
- Prediction weights
- Deblocking parameters
- Tile entry points
- Reference picture lists: the set of reference pictures in the DPB is explicitly signaled in the slice header (unlike AVC/H.264, where MMCO or sliding window mode is used). Pictures not mentioned in the list are marked as unused for reference and should be removed from the DPB accordingly.
Explicit signaling of the reference pictures improves error resilience: if a decoder detects that one of the listed pictures is not present in the DPB, it concludes that this picture was lost.
The maximal number of reference indexes is 15 (unlike 16 in AVC/H.264).
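To illustrate the difference between QP and the quantizer step size, here is a small sketch using the commonly quoted approximation carried over from H.264/AVC, in which the step size doubles for every increase of 6 in QP. The function name is hypothetical, and the formula is only an approximation: the real quantizer works with integer scaling tables and bit-depth dependent shifts.

```python
def approx_qstep(qp: int) -> float:
    """Approximate quantizer step size for a given QP.

    Assumption: Qstep ~ 2 ** ((QP - 4) / 6), i.e. Qstep = 1 at QP = 4
    and the step size doubles every 6 QP units.
    """
    return 2.0 ** ((qp - 4) / 6.0)


for qp in (4, 22, 28, 34):
    print(qp, round(approx_qstep(qp), 2))
```

So a change of QP by 6 is a change of the step size by a factor of two, not by six units.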
7
Selected Picture Types (IDR, CRA)
IDR: pictures following the IDR in decoding order cannot use pictures decoded prior to the IDR as reference.
CRA: pictures following the CRA in both decoding and presentation order cannot use pictures decoded prior to the CRA as reference.
[Figure: decoding order (CRA 1 2 3 4) versus presentation order of the pictures around a CRA, with the leading pictures marked.]
8
Selected Picture Types (RADL, RASL)
Leading pictures: pictures that follow the CRA in decoding order but precede it in presentation order. Leading pictures are divided into two types:
- RADL (random access decodable leading): can be correctly decoded if decoding starts at the current CRA.
- RASL (random access skipped leading): cannot be correctly decoded if decoding starts at the current CRA, and therefore should be skipped.
[Figure: decoding order (CRA 1 2 3 4) and presentation order of the same pictures, with the leading pictures marked as RASL and RADL.]
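The RADL/RASL distinction can be sketched as a simple classification over the pictures that follow a CRA in decoding order. This is only an illustration under stated assumptions: the function name and the (poc, reference POCs) input format are hypothetical, and every picture decoded before the CRA is assumed to have a smaller POC than the CRA.

```python
def classify_after_cra(cra_poc, pictures):
    """Label pictures that follow a CRA in decoding order.

    pictures: list of (poc, ref_pocs) tuples in decoding order.
    A picture with POC below the CRA is a leading picture. It is treated
    as RASL when it depends, directly or through another RASL picture,
    on something decoded before the CRA; otherwise it is RADL.
    """
    labels = {}
    for poc, ref_pocs in pictures:
        if poc > cra_poc:
            labels[poc] = "trailing"
            continue
        # References below the CRA POC are usable only if they are RADL.
        broken = any(r < cra_poc and labels.get(r) != "RADL" for r in ref_pocs)
        labels[poc] = "RASL" if broken else "RADL"
    return labels


# CRA has POC 8. Picture 6 references POC 4 (decoded before the CRA).
print(classify_after_cra(8, [(6, [4, 8]), (7, [8]), (9, [8]), (10, [9])]))
```

A decoder that starts at this CRA would drop the picture labeled RASL and output the rest.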
9
Picture Syntax (2)
Coding Tree Block (CTB): The picture is partitioned into square coding tree blocks (CTBs). The size N of the CTBs is chosen by the encoder (16x16, 32x32 or 64x64). A luma CTB covers a square picture area of N×N samples, and each of the corresponding chroma CTBs covers (N/2)×(N/2) samples (in 4:2:0 format).
Coding Tree Unit (CTU): The luma CTB and the two chroma CTBs, together with the associated syntax, form a coding tree unit (CTU). The CTU is the basic processing unit, similar to the MB in prior standards.
Coding Block (CB): Each CTB can be further partitioned into multiple coding blocks (CBs). The CB size ranges from the CTB size down to a minimum size (8×8).
Coding Unit (CU): The luma CB and the chroma CBs, together with the associated syntax, form a coding unit (CU). Each CU is either intra or inter predicted. The CU is the basic unit for compression.
10
CTU Syntax 64x64 CTU
11
CTU Syntax (2)
All CUs in a CTU are encoded (traversed) in Z-scan (depth-first) order. This order makes the top and left samples available (causal) in most cases.
[Figure: Z-scan order inside a 64x64 CTU.]
Figure taken from: Benjamin Bross, "Relax it's only HEVC", WBU-ISOG Forum, European Broadcast Union, Geneva, Switzerland, November 28, 2012.
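For equally sized blocks, the Z-scan order above is the order obtained by interleaving the bits of the block coordinates (Morton order). A minimal sketch, with a hypothetical function name and assuming the CTU is divided into a uniform grid of blocks:

```python
def z_scan_order(ctu_size, block_size):
    """Return (x, y) block indices of a uniform grid in Z-scan order."""
    n = ctu_size // block_size  # blocks per row/column

    def morton(x, y):
        # Interleave bits: x occupies even bit positions, y the odd ones.
        code = 0
        for bit in range(n.bit_length()):
            code |= ((x >> bit) & 1) << (2 * bit)
            code |= ((y >> bit) & 1) << (2 * bit + 1)
        return code

    coords = [(x, y) for y in range(n) for x in range(n)]
    return sorted(coords, key=lambda p: morton(p[0], p[1]))


# 64x64 CTU split into 16x16 blocks: the first eight blocks visited.
print(z_scan_order(64, 16)[:8])
```

Every block is visited after its left and top neighbours, which is why those samples are usually available for prediction.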
12
CTU Syntax (3)
Formally, a CTU specifies a quadtree traversed in depth-first order.
Note: unlike prior standards, where an MB consists of an MB header followed by MB data, in HEVC the headers are dispersed, i.e. interleaved with the data. This complicates the CTU pipeline, because CTU header parsing cannot be separated from CTU data decoding.
[Figure: CTU layout. CTU header followed by CU Hdr, CU Data, CU Hdr, CU Data, ...]
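The depth-first quadtree traversal can be sketched recursively. The names traverse_ctu and split_decision are hypothetical; in a real decoder the split decision comes from a flag parsed from the bitstream at each node, which is exactly why the parsing of CU headers is interleaved with the CU data.

```python
def traverse_ctu(x, y, size, min_size, split_decision, out):
    """Visit the CUs of one CTU in depth-first (Z-scan) order.

    split_decision(x, y, size) stands in for the split flag read from
    the bitstream at the current quadtree node.
    """
    if size > min_size and split_decision(x, y, size):
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                traverse_ctu(x + dx, y + dy, half, min_size, split_decision, out)
    else:
        # Leaf node: here the CU header and CU data would be processed.
        out.append((x, y, size))
    return out


# Split the 64x64 CTU once, then split only its top-left 32x32 quadrant.
def demo_split(x, y, size):
    return size == 64 or (size == 32 and x == 0 and y == 0)


print(traverse_ctu(0, 0, 64, 8, demo_split, []))
```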
13
CU Syntax (1)
Prediction Block (PB): Each CB is partitioned into 1, 2 or 4 prediction blocks (PBs).
Prediction Unit (PU): The luma PB and the chroma PBs, together with the associated syntax, form a prediction unit (PU).
Intra partitions: 2Nx2N, NxN (only if the CB size is the smallest CB size).
Inter partitions: 2Nx2N, 2NxN, Nx2N, NxN.
14
CU Syntax (2)
Inter asymmetric partitions (conditioned by amp_enabled_flag in SPS and disabled for the minimal CB size): nLx2N, nRx2N, 2NxnU, 2NxnD.
[Figure: examples where asymmetric partitions are beneficial, shown for 2NxnU, 2NxnD, nLx2N and nRx2N.]
Note that asymmetric partitions are disabled when the CU size is 8x8 (in order to reduce complexity). I think that asymmetric partitions could be disabled for 16x16 CUs too.
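The luma PB dimensions implied by each partition mode can be tabulated with a short sketch. The function name is hypothetical; the CU is taken as 2N x 2N, and the asymmetric modes split one dimension in a 1:3 or 3:1 ratio.

```python
def prediction_blocks(cu_size, mode):
    """Return the (width, height) of each luma PB for a partition mode."""
    s, h, q = cu_size, cu_size // 2, cu_size // 4
    table = {
        "2Nx2N": [(s, s)],
        "2NxN": [(s, h), (s, h)],
        "Nx2N": [(h, s), (h, s)],
        "NxN": [(h, h)] * 4,
        # Asymmetric modes: a quarter on one side, three quarters on the other.
        "2NxnU": [(s, q), (s, s - q)],
        "2NxnD": [(s, s - q), (s, q)],
        "nLx2N": [(q, s), (s - q, s)],
        "nRx2N": [(s - q, s), (q, s)],
    }
    return table[mode]


for mode in ("2NxN", "2NxnU", "nRx2N"):
    print(mode, prediction_blocks(32, mode))
```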
15
CU Syntax (3)
Notes:
The smallest luma PB size is 4×8 or 8×4 samples (4x8 and 8x4 are permitted only for uni-directional prediction; bi-prediction is not allowed for blocks smaller than 8x8).
Chroma PBs mimic the corresponding luma partition with a scaling factor of 1/2 for 4:2:0. Asymmetric splitting is also applied to chroma CBs.
Preprocessing (based on texture and/or a block complexity metric) can be used to speed up the PB size decision. If the complexity of the CTU is high (detailed, textured region), large PUs are filtered out; if the complexity is low (flat region), small PUs are filtered out.
An example method for fast PU size decision (for the intra case) is described in the paper "Content Adaptive Prediction Unit Size Decision Algorithm for HEVC Intra Coding", 2012 Picture Coding Symposium.
16
CU Syntax (4)
Transform Block (TB): Each luma CB can be quadtree-partitioned into one, four or more TBs. The number of transform levels is controlled by max_transform_hierarchy_depth_inter and max_transform_hierarchy_depth_intra.
Example: a CB divided into two TB levels (block #1 is split into four blocks: 1,0 1,1 1,2 1,3).
[Figure: the CB with block 1 split into 1,0 1,1 1,2 1,3, alongside blocks 2 and 3.]
17
CU Syntax (5)
[Shevach] Computational complexity of finding the best TU partition:
For transform block sizes from 8x8 to 32x32, the RD cost is evaluated 21 times: 1 {32x32} + 4 {16x16} + 16 {8x8} = 21
For transform block sizes from 4x4 to 32x32 (intra CU), the RD cost is evaluated 85 times: 1 {32x32} + 4 {16x16} + 16 {8x8} + 64 {4x4} = 85
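The count of RD evaluations for an exhaustive search is the number of nodes in the full transform quadtree, which can be checked with a few lines. The function name is hypothetical, and an exhaustive search over every quadtree node is assumed (no early termination).

```python
def rd_evaluations(max_tb, min_tb):
    """Count nodes of a full quadtree from max_tb down to min_tb."""
    count, blocks, size = 0, 1, max_tb
    while size >= min_tb:
        count += blocks   # one RD evaluation per block at this level
        blocks *= 4       # each block splits into four at the next level
        size //= 2
    return count


print(rd_evaluations(32, 8))  # 1 + 4 + 16
```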
18
CU Syntax (6)
Notes:
Unlike H.264/AVC, where TB ≤ PB, in HEVC prediction and transform partitioning are almost independent (i.e. a TB can contain several PBs and vice versa). However, TB > PB is allowed only for inter and not for intra (i.e. for intra, TB ≤ PB).
Some experts report that prediction discontinuities on PB boundaries inside a TB are smoothed by transform and quantization, whereas if PB and TB boundaries coincide the discontinuities become more pronounced.
2x2 TBs are disallowed (the minimal TB size is 4x4). How are chroma blocks handled in 4:2:0 format if the luma TB is 4x4?
[Figure: an 8x8 luma area split into four 4x4 luma TBs, with one 4x4 Cb TB and one 4x4 Cr TB for the chroma.]
19
Restrictions/Constraints
HEVC disallows 16x16 CTBs for level 5 and above (4K TV). Motivation: 16x16 CTBs add overhead for decoders targeting 4K TV:
- Up to 10% increase in worst-case decode time
- Additional storage for SAO parameters
The maximal coded size of a CTU, in bits, shall be less than or equal to 5*RawCtuBits/3. The variable RawCtuBits is derived as
RawCtuBits = CtbSizeY * CtbSizeY * BitDepthY + 2 * ( CtbWidthC * CtbHeightC ) * BitDepthC
Numeric example: take CtbSizeY = 16 (as in AVC/H.264) with 8-bit 4:2:0 video. Then RawCtuBits = 16*16*8 + 2*8*8*8 = 3072, and the maximal CTU bit-size is 5*3072/3 = 5120 bits (much more than the corresponding 3200-bit threshold in AVC/H.264).
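The numeric example can be reproduced for other CTB sizes with a small helper. The function name and the chroma_div parameter are hypothetical; chroma_div = 2 models 4:2:0, where each chroma CTB has half the luma width and height.

```python
def max_ctu_bits(ctb_size_y, bit_depth_y=8, bit_depth_c=8, chroma_div=2):
    """Return (RawCtuBits, 5 * RawCtuBits / 3) for a square luma CTB."""
    ctb_width_c = ctb_size_y // chroma_div
    ctb_height_c = ctb_size_y // chroma_div
    raw_ctu_bits = (ctb_size_y * ctb_size_y * bit_depth_y
                    + 2 * (ctb_width_c * ctb_height_c) * bit_depth_c)
    return raw_ctu_bits, 5 * raw_ctu_bits // 3


for size in (16, 32, 64):
    print(size, max_ctu_bits(size))
```

For CtbSizeY = 16 this gives 3072 raw bits and a 5120-bit limit, matching the example.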
20
Note on maximal CTU bit-size and worst-case CABAC performance
CABAC decoding (as well as encoding) contains a renormalization stage (due to finite-precision arithmetic). The renormalization procedure is time consuming, since it contains a while-loop with several if-else statements inside the loop. The number of renormalization steps for a CTU is at most the CTU bit-size (because each step reads at least one bit from the bitstream). Therefore, if the worst-case CTU bit-size is 5120 bits, the decoder has to perform the renormalization at most 5120 times in the worst case. From the point of view of CABAC hardware design, executing the renormalization 5120 times is a serious performance bottleneck.
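A minimal sketch of the decoder-side renormalization loop shows why the step count is bounded by the number of bits in the CTU. The function name and the read_bit callback are hypothetical; the sketch assumes a 9-bit range register that must stay at or above 256, and it leaves out the rest of the arithmetic decoder.

```python
def renorm_decoder(rng, offset, read_bit):
    """Renormalize the arithmetic decoder state.

    rng:      current range register
    offset:   current offset register
    read_bit: callable returning the next bit (0 or 1) of the bitstream
    Returns the new (rng, offset) and the number of bits consumed.
    """
    bits_read = 0
    while rng < 256:
        rng <<= 1
        offset = (offset << 1) | read_bit()  # exactly one bit per iteration
        bits_read += 1
    return rng, offset, bits_read


bits = iter([1, 0, 1])
print(renorm_decoder(64, 5, lambda: next(bits)))
```

Because every loop iteration consumes one bit, the total number of iterations over a CTU cannot exceed the CTU bit-size.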
21
Note on Interlace Coding
Unlike H.264/AVC, HEVC has no dedicated support for interlace coding:
- No mixed frame-field interaction (like PAFF in H.264/AVC)
- No interlace scanning of transform coefficients
- No correction of MVX[1] (the y-component of the MV) if the current and reference pictures have different polarity (top-bottom or bottom-top)
Field pictures are signaled by an SEI message (pic_timing) for every picture in the sequence. If progressive and interlaced streams are spliced together, a new sequence start must be inserted to switch from progressive coding to interlaced coding (or vice versa). In addition, a dedicated flag, general_interlaced_source_flag, is signaled in the VPS/SPS (within the profile_tier_level section).
In H.264/AVC, PAFF mode can be used to diminish I-frame bitrate peaks: the I-frame is divided into two field pictures, where the top field is coded as an I-picture and the bottom field as a P-picture. Consequently, the total number of bits produced by the I-P field pair is expected to be smaller than the bits generated by a single I-frame. Because H.265/HEVC does not support PAFF, this trick cannot be applied to cope with I-frame bitrate peaks.
22
Note on Picture Boundaries
As per the standard, the picture dimensions are defined in units of the minimum luma CB size (MinCbSizeY):
- pic_width_in_luma_samples shall not be equal to 0 and shall be an integer multiple of MinCbSizeY.
- pic_height_in_luma_samples shall not be equal to 0 and shall be an integer multiple of MinCbSizeY.
As a result, CTBs at the right and bottom edges of the picture may extend beyond the picture boundaries. Data outside the picture is not coded, so the quadtrees at the right and bottom edges are pruned accordingly. Please see the following slide (courtesy of John Funnel from Parabola) for an illustration:
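The constraint and its consequence at the picture edges can be sketched as follows. The function name is hypothetical; it checks the multiple-of-MinCbSizeY rule and reports how far the last CTB column and row extend beyond the picture.

```python
def ctb_grid(pic_width, pic_height, ctb_size, min_cb_size):
    """Return (ctb_cols, ctb_rows, overhang_x, overhang_y) in luma samples."""
    if pic_width <= 0 or pic_height <= 0:
        raise ValueError("picture dimensions shall not be 0")
    if pic_width % min_cb_size or pic_height % min_cb_size:
        raise ValueError("picture dimensions shall be multiples of MinCbSizeY")
    cols = -(-pic_width // ctb_size)   # ceiling division
    rows = -(-pic_height // ctb_size)
    # Samples of the last CTB column/row that fall outside the picture.
    return cols, rows, cols * ctb_size - pic_width, rows * ctb_size - pic_height


# 1920x1080 with 64x64 CTBs and 8x8 minimal CBs.
print(ctb_grid(1920, 1080, 64, 8))
```

For 1920x1080 the bottom CTB row extends 8 luma rows past the picture, so the quadtrees in that row are pruned to the part that lies inside.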