Overview of the Scalable Video Coding Extension of the H.264/AVC Standard
Kai-Chao Yang, NTHU, Taiwan
2007/8
Outline
  - Introduction
  - Problems
  - Definition
  - Functionality
  - Goal
  - Competition
  - Applications
  - Targets
  - History of SVC
  - Structure of SVC
  - Temporal Scalability
  - Spatial Scalability
  - Quality Scalability
  - Combined Scalability
  - Profiles of SVC
  - Conclusions
Introduction - problem
  - Non-scalable video streaming
  - Multiple video streams are needed for heterogeneous clients
  [Figure: separate streams at 8 Mb/s, 6 Mb/s, 4 Mb/s, 1 Mb/s, and 512 kb/s]
Introduction - definition
  - Scalable video stream
  - Scalability: removal of parts of the video bit-stream to adapt to the various needs of end users and to varying terminal capabilities or network conditions
  [Figure labels: sub-streams 1, 2, ..., n; sub-streams k1, k2, ..., ki; reconstruction; high quality; low quality]
Introduction - functionality
  - Functionality of SVC
    - Graceful degradation when the “right” parts of the bit-stream are lost
    - Bit-rate adaptation to match the channel throughput
    - Format adaptation for backward-compatible extension
    - Power adaptation for a trade-off between runtime and quality
Introduction - mode
  - Example scalability modes
    - Fidelity reduction (SNR scalability)
    - Picture size reduction (spatial scalability)
    - Frame rate reduction (temporal scalability)
    - Sharpness reduction (frequency scalability)
    - Selection of content (ROI or object-based scalability)
  [Figure labels: base layer; enhancement layer; residual; most significant bit; enhancement 1 to enhancement 5]
Structure of SVC
  [Figure labels: spatial decimation; temporal scalable coding; prediction; base layer coding; SNR scalable coding; multiplex]
Temporal Scalability
  - Hierarchical prediction structures
    - Hierarchical B pictures
    - Non-dyadic hierarchical prediction
    - Hierarchical prediction with zero delay
  [Figure label: GOP]
Temporal Scalability
  [Figure: prediction structures built from I, P, B0, B1, and B2 pictures for N = 1, N = 2, N = 4, and N = 8]
  - Video coding experiment with H.264/MPEG4-AVC
    - Foreman, CIF, 1320 kbps
    - Performance as a function of N
    - Cascaded QP assignment: QP(P) = QP(B0) - 3 = QP(B1) - 4 = QP(B2) - 5
  (This slide is copied from JVT-W132-Talk.)
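The P, B0, B1, B2 structure shown for N = 8 can be sketched as a mapping from picture position to temporal level. This is a minimal sketch, assuming a dyadic hierarchy whose GOP size N is a power of two. The function names are hypothetical, and the QP offsets read the slide's cascaded assignment as QP(P) = QP(B0) - 3 = QP(B1) - 4 = QP(B2) - 5, which is an interpretation of the slide rather than a confirmed rule.

```python
def temporal_level(poc: int, gop_size: int) -> int:
    """Temporal level of picture `poc` in a dyadic hierarchy (assumed structure).

    Level 0 is the I/P picture at the GOP boundary; levels 1, 2, 3
    correspond to B0, B1, B2 on the slide when gop_size is 8.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    depth = gop_size.bit_length() - 1          # log2(gop_size)
    pos = poc % gop_size
    if pos == 0:
        return 0
    trailing_zeros = (pos & -pos).bit_length() - 1
    return depth - trailing_zeros


def keep_picture(poc: int, gop_size: int, max_level: int) -> bool:
    """Temporal scalability: drop every picture above `max_level`."""
    return temporal_level(poc, gop_size) <= max_level


# Assumed reading of the slide: P +0, B0 +3, B1 +4, B2 +5.
QP_OFFSET = {0: 0, 1: 3, 2: 4, 3: 5}


def cascaded_qp(qp_p: int, level: int) -> int:
    """QP for a picture at `level`, given the QP of the P pictures."""
    return qp_p + QP_OFFSET[level]


if __name__ == "__main__":
    print([temporal_level(i, 8) for i in range(9)])
    print([i for i in range(9) if keep_picture(i, 8, 2)])
```

Dropping the highest level keeps every second picture, which halves the frame rate without touching the remaining pictures.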
Spatial Scalability
  [Figure: spatial scalable encoder. Labels: spatial decimation; H.264/AVC compatible coder (H.264/AVC MCP & intra-prediction, base layer coding) with H.264/AVC compatible base layer bit-stream; hierarchical MCP & intra-prediction; inter-layer prediction (intra, motion, residual); texture and motion outputs; multiplex; scalable bit-stream]
Spatial Scalability
  - Similar to MPEG-2, H.263, and MPEG-4
  - Arbitrary resolution ratio
  - The same coding order in all spatial layers
  - Combination with temporal scalability
  - Inter-layer prediction
  [Figure labels: intra; spatial 0, spatial 1; temporal 0, temporal 1, temporal 2]
Spatial Scalability
  - The prediction signals are formed by
    - MCP inside the enhancement layer (temporal) (small motion and high spatial detail)
    - Up-sampling from the lower layer (spatial)
    - Averaging the two predictions above (temporal + spatial)
  - Inter-layer prediction
    - Three kinds of inter-layer prediction
      - Inter-layer motion prediction
      - Inter-layer residual prediction
      - Inter-layer intra prediction
    - Base mode MB: only the residual is transmitted, with no additional side information
Spatial Scalability
  - Inter-layer motion prediction
    - base_mode_flag = 1
      - The reference layer is inter-coded
      - Data are derived from the reference layer: MB partitioning, reference indices, MVs
    - motion_pred_flag
      - 1: MV predictors are obtained from the reference layer
      - 0: MV predictors are obtained by conventional spatial predictors
  [Figure: positions (x1, y1) and (x2, y2) in the reference layer correspond to (2x1, 2y1) and (2x2, 2y2)]
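The (x, y) to (2x, 2y) mapping in the figure is the dyadic case of scaling reference-layer motion data to the enhancement-layer resolution. A minimal sketch follows, assuming MVs are plain integer pairs and that rounding the scaled value is acceptable; the normative derivation uses fixed-point arithmetic that is not reproduced here, and both function names are hypothetical.

```python
from fractions import Fraction


def scale_mv(mv, ref_size, enh_size):
    """Scale a reference-layer MV (x, y) by the layer resolution ratio.

    ref_size and enh_size are (width, height) tuples. For dyadic
    scalability this reduces to (2x, 2y), as in the figure.
    """
    sx = Fraction(enh_size[0], ref_size[0])
    sy = Fraction(enh_size[1], ref_size[1])
    return (round(mv[0] * sx), round(mv[1] * sy))


def mv_predictor(motion_pred_flag, ref_layer_mv, spatial_predictor,
                 ref_size, enh_size):
    """Pick the MV predictor according to motion_pred_flag.

    1: take the (scaled) MV of the reference layer.
    0: fall back to the conventional spatial predictor.
    """
    if motion_pred_flag == 1:
        return scale_mv(ref_layer_mv, ref_size, enh_size)
    return spatial_predictor


if __name__ == "__main__":
    qcif, cif = (176, 144), (352, 288)
    print(scale_mv((3, -5), qcif, cif))
    print(mv_predictor(0, (3, -5), (1, 1), qcif, cif))
```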
Spatial Scalability
  - Inter-layer residual prediction
    - residual_pred_flag = 1
    - Predictor
      - Block-wise up-sampling by a bi-linear filter from the corresponding 8x8 sub-MB in the reference layer
      - Transform block basis
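Block-wise bi-linear up-sampling can be illustrated with a generic 2x interpolator that never reads samples outside the block it is given. This is only a sketch under stated assumptions: it uses floating-point arithmetic and pixel-centre alignment, whereas the filter phases and integer rounding of the standard are not reproduced, and the function name is hypothetical.

```python
import math


def bilinear_upsample_2x(block):
    """Up-sample a 2-D block (list of rows) by 2 in each direction.

    Interpolation is confined to the block: positions that would fall
    outside are clamped to the nearest sample inside it.
    """
    h, w = len(block), len(block[0])

    def taps(i, n):
        # Source position of output index i with pixel-centre alignment.
        s = (i + 0.5) / 2 - 0.5
        lo = math.floor(s)
        frac = s - lo
        # Clamp both taps to the block so no outside sample is used.
        return min(max(lo, 0), n - 1), min(max(lo + 1, 0), n - 1), frac

    out = []
    for y in range(2 * h):
        y0, y1, fy = taps(y, h)
        row = []
        for x in range(2 * w):
            x0, x1, fx = taps(x, w)
            top = (1 - fx) * block[y0][x0] + fx * block[y0][x1]
            bot = (1 - fx) * block[y1][x0] + fx * block[y1][x1]
            row.append((1 - fy) * top + fy * bot)
        out.append(row)
    return out


if __name__ == "__main__":
    residual_8x8 = [[(x - y) for x in range(8)] for y in range(8)]
    predictor = bilinear_upsample_2x(residual_8x8)
    print(len(predictor), len(predictor[0]))
```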
Spatial Scalability
  - Inter-layer intra prediction
    - base_mode_flag = 1
      - The reference layer is intra-coded
    - Up-sampling from the reference layer
      - Luma: one-dimensional 4-tap FIR filter
      - Chroma: bi-linear filter
Spatial Scalability
  - Past spatial scalable video
    - Inter-layer intra prediction requires complete decoding of the base layer
    - Multiple motion compensation and deblocking filter operations are needed
    - Full decoding + inter-layer prediction: complexity > simulcast
  - Single-loop decoding
    - Inter-layer intra prediction is restricted to MBs for which the co-located base layer is intra-coded
Spatial Scalability
  - Single-loop vs. multi-loop decoding
  [Figure: single-loop vs. multi-loop decoding]
  (This slide is copied from Inter IBP.)
Spatial Scalability
  - Generalized spatial scalability in SVC
    - Arbitrary ratio
      - Neither the horizontal nor the vertical resolution can decrease from one layer to the next
    - Cropping
      - Containing new regions
      - Higher quality of interesting regions
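The arbitrary-ratio rule amounts to a simple check over the list of layer resolutions. A minimal sketch, assuming layers are given as (width, height) tuples ordered from the base layer upward; the function names are hypothetical.

```python
from fractions import Fraction


def layer_sizes_valid(sizes):
    """True if neither width nor height decreases from one layer to the next."""
    return all(w1 >= w0 and h1 >= h0
               for (w0, h0), (w1, h1) in zip(sizes, sizes[1:]))


def layer_ratios(sizes):
    """Horizontal and vertical resolution ratio between successive layers."""
    return [(Fraction(w1, w0), Fraction(h1, h0))
            for (w0, h0), (w1, h1) in zip(sizes, sizes[1:])]


if __name__ == "__main__":
    layers = [(176, 144), (320, 240), (640, 480)]
    print(layer_sizes_valid(layers))
    print(layer_ratios(layers))
```

The ratios need not be equal in both directions or between layer pairs, which is what distinguishes the generalized case from dyadic scalability.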
Spatial Scalability
  - Encoder control (JSVM)
    - Base layer: p0' is optimized for the base layer
    - Enhancement layer: p1' is optimized for the enhancement layer
    - Decisions of p1 depend on p0
    - Efficient base layer coding but inefficient enhancement layer coding
Spatial Scalability
  - Encoder control (optimization)
    - Base layer
      - Considers enhancement layer coding
      - Eliminates choices of p0 that disadvantage enhancement layer coding
    - Enhancement layer
      - No change
    - Weight w
      - w = 0: JSVM encoder control
      - w = 1: single-loop encoder control (base layer is not controlled)
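The role of the weight w can be illustrated with a joint Lagrangian cost for choosing the base layer parameters p0. The cost formula is not present in the slide text, so the form below is an assumption: a weighted sum of the base layer cost D0 + lambda0 * R0 and the enhancement layer cost D1 + lambda1 * (R0 + R1). All names and the candidate numbers are hypothetical.

```python
def joint_cost(d0, r0, d1, r1, lam0, lam1, w):
    """Weighted cost of one candidate for the base layer parameters p0.

    d0, r0: base layer distortion and rate for this candidate.
    d1, r1: enhancement layer distortion and rate that result from it.
    w = 0 scores the base layer alone; w = 1 scores only the
    enhancement layer (assumed form, see lead-in).
    """
    base_cost = d0 + lam0 * r0
    enh_cost = d1 + lam1 * (r0 + r1)
    return (1 - w) * base_cost + w * enh_cost


def choose_p0(candidates, lam0, lam1, w):
    """Return the candidate dict with the lowest joint cost."""
    return min(
        candidates,
        key=lambda c: joint_cost(c["d0"], c["r0"], c["d1"], c["r1"],
                                 lam0, lam1, w),
    )


if __name__ == "__main__":
    # Toy numbers: A is best for the base layer, B helps the enhancement layer.
    candidates = [
        {"name": "A", "d0": 10, "r0": 5, "d1": 20, "r1": 10},
        {"name": "B", "d0": 12, "r0": 5, "d1": 10, "r1": 8},
    ]
    for w in (0.0, 0.5, 1.0):
        print(w, choose_p0(candidates, 1.0, 1.0, w)["name"])
```

With w = 0 the choice ignores the enhancement layer entirely, which corresponds to the JSVM behaviour described on the previous slide.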
Quality Scalability
  - Coarse-grain quality scalability (CGS)
    - A special case of spatial scalability
      - Identical sizes for base and enhancement layers
      - Smaller quantization step sizes for higher enhancement residual layers
    - Designed for only several selected bit-rate points
      - Supported bit-rate points = number of layers
    - Switching can occur only at IDR access units
Quality Scalability
  - Medium-grain quality scalability (MGS)
    - More enhancement layers are supported
      - Refinement quality layers of the residual
    - Key pictures
      - Drift control
    - Switching can occur at any access unit
    - CGS + key pictures + refinement quality layers
Quality Scalability
  - Drift control
    - Drift: the effect caused by unsynchronized MCP at the encoder and decoder sides
    - Trade-off of MCP in quality-scalable SVC: coding efficiency versus drift
Quality Scalability
  - MPEG-4 quality scalability with FGS
    - The base layer is stored and used for MCP of following pictures
    - Drift: drift free
    - Complexity: low
    - Efficiency: efficient base layer but inefficient enhancement layer
      - Refinement data are not used for MCP
  [Figure labels: base layer; refinement (possibly lost or truncated)]
Quality Scalability
  - MPEG-2 quality scalability (without FGS)
    - Only 1 reference picture is stored and used for MCP of following pictures
    - Drift: both base layer and enhancement layer
      - Frequent intra updates are necessary
    - Complexity: low
    - Efficiency: efficient enhancement layer but inefficient base layer
  [Figure labels: base layer; refinement (possibly lost or truncated)]
Quality Scalability
  - 2-loop prediction
    - Several closed encoder loops run at different bit-rate points in a layered structure
    - Drift: enhancement layer
    - Complexity: high
    - Efficiency: efficient base layer and moderately efficient enhancement layer
  [Figure labels: base layer; refinement (possibly lost or truncated)]
Quality Scalability
  - SVC concepts
    - Key picture
      - Trade-off between coding efficiency and drift
      - MPEG-4 FGS: all key pictures
      - MPEG-2 quality scalability: no key pictures
  [Figure labels: base layer; refinement (possibly lost or truncated)]
Quality Scalability
  - Drift control with hierarchical prediction
    - Key pictures
      - The base layer is stored and used for the MCP of following pictures
    - Other pictures
      - The enhancement layer is stored and used for the MCP of following pictures
    - GOP size adjusts the trade-off between enhancement layer coding efficiency and drift
  [Figure: hierarchical structure of P, B1, and B2 pictures. Labels: base layer; refinement (possibly lost or truncated)]
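The key-picture rule can be written as a choice of which reconstruction a picture contributes as an MCP reference. This is a minimal sketch, assuming that key pictures are the pictures at GOP boundaries (positions that are multiples of the GOP size); the function names are hypothetical.

```python
def is_key_picture(poc: int, gop_size: int) -> bool:
    """Assumed rule: the picture at each GOP boundary is a key picture."""
    return poc % gop_size == 0


def stored_reference(poc: int, gop_size: int) -> str:
    """Which reconstruction of this picture is stored for later MCP.

    Key pictures store the base layer reconstruction, so a lost or
    truncated refinement cannot cause drift beyond them. All other
    pictures store the enhancement layer reconstruction.
    """
    return "base" if is_key_picture(poc, gop_size) else "enhancement"


def enhancement_reference_share(gop_size: int) -> float:
    """Fraction of pictures whose enhancement reconstruction is stored."""
    return (gop_size - 1) / gop_size


if __name__ == "__main__":
    print([stored_reference(i, 4) for i in range(5)])
    for n in (1, 2, 4, 8):
        print(n, enhancement_reference_share(n))
```

A GOP size of 1 makes every picture a key picture, which matches the MPEG-4 FGS case on the earlier slide; larger GOP sizes let more pictures use the enhancement layer for prediction at the price of drift inside each GOP.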
Combined Scalability
  - SVC encoder structure
  [Figure labels: dependency layer; the same motion/prediction information; temporal decomposition]
Combined Scalability
  - Dependency and quality refinement layers
  [Figure: dependency layers D = 0, D = 1, and D = 2, each with quality refinement layers Q = 0, Q = 1, and Q = 2; scalable bit-stream]
Combined Scalability
  [Figure labels: temporal levels T0, T1, T2; dependency layers D0, D1; quality layers Q0, Q1]
Combined Scalability
  - Bit-stream format
    - NAL unit header | NAL unit header extension | NAL unit payload
    - The NAL unit header extension carries P, T, D, and Q
      - P (priority_id): indicates the importance of a NAL unit
      - T (temporal_id): indicates the temporal level
      - D (dependency_id): indicates the spatial/CGS layer
      - Q (quality_id): indicates the MGS/FGS layer
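Because every NAL unit carries P, T, D, and Q, a sub-stream can be extracted by inspecting headers only. The sketch below assumes the three-byte extension layout of H.264 Annex G as I understand it (svc_extension_flag, idr_flag, priority_id; then no_inter_layer_pred_flag, dependency_id, quality_id; then temporal_id and further flags); it should be checked against the specification before use. The keep rule is a simplification that retains all quality layers of lower dependency layers, and all names are hypothetical.

```python
from typing import NamedTuple


class SvcHeader(NamedTuple):
    priority_id: int    # P
    temporal_id: int    # T
    dependency_id: int  # D
    quality_id: int     # Q


def parse_svc_extension(ext: bytes) -> SvcHeader:
    """Read P, T, D, Q from the 3 bytes after the 1-byte NAL unit header.

    Assumed bit layout, most significant bit first:
      byte 0: svc_extension_flag(1) idr_flag(1) priority_id(6)
      byte 1: no_inter_layer_pred_flag(1) dependency_id(3) quality_id(4)
      byte 2: temporal_id(3) and five further flag/reserved bits
    """
    if len(ext) < 3:
        raise ValueError("need 3 bytes of NAL unit header extension")
    b0, b1, b2 = ext[0], ext[1], ext[2]
    return SvcHeader(
        priority_id=b0 & 0x3F,
        temporal_id=(b2 >> 5) & 0x07,
        dependency_id=(b1 >> 4) & 0x07,
        quality_id=b1 & 0x0F,
    )


def keep_nal(h: SvcHeader, max_t: int, max_d: int, max_q: int) -> bool:
    """Simplified extraction rule for a target operating point.

    Keeps lower dependency layers in full, and the target dependency
    layer up to quality layer max_q.
    """
    if h.temporal_id > max_t or h.dependency_id > max_d:
        return False
    return h.dependency_id < max_d or h.quality_id <= max_q


if __name__ == "__main__":
    header = parse_svc_extension(bytes([0x85, 0x21, 0x67]))
    print(header)
    print(keep_nal(header, max_t=3, max_d=2, max_q=1))
```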
Combined Scalability
  - Bit-stream switching
    - Inside a dependency layer
      - Switching everywhere
    - Outside a dependency layer
      - Switching up only at IDR access units
      - Switching down everywhere if using multiple-loop decoding
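The switching rules can be collected into a single predicate. This is a minimal sketch with a hypothetical function name; the case of switching down without multiple-loop decoding is not stated on the slide, so the code conservatively assumes it is limited to IDR access units.

```python
def can_switch(from_d: int, to_d: int, at_idr: bool,
               multi_loop: bool = False) -> bool:
    """Whether a switch between dependency layers is allowed at this access unit.

    from_d, to_d: dependency_id before and after the switch.
    at_idr: the access unit is an IDR access unit.
    multi_loop: the decoder uses multiple-loop decoding.
    """
    if to_d == from_d:
        # Inside a dependency layer: switching everywhere.
        return True
    if to_d > from_d:
        # Switching up: only at IDR access units.
        return at_idr
    if multi_loop:
        # Switching down: everywhere with multiple-loop decoding.
        return True
    # Assumption (not on the slide): otherwise wait for an IDR access unit.
    return at_idr


if __name__ == "__main__":
    print(can_switch(1, 1, at_idr=False))
    print(can_switch(0, 1, at_idr=False))
    print(can_switch(1, 0, at_idr=False, multi_loop=True))
```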
Profiles of SVC
  - Scalable Baseline
    - For conversational and surveillance applications requiring low decoding complexity
    - Spatial scalability: fixed ratio (1, 1.5, or 2) and MB-aligned cropping
    - Temporal and quality scalability: arbitrary
    - No interlaced coding tools
    - B-slices, weighted prediction, CABAC, and 8x8 luma transform
    - The base layer conforms to the Baseline profile of H.264/AVC
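The spatial restriction of Scalable Baseline can be sketched as a check on the resolution ratio and the cropping offset. Assumptions, none of them confirmed by the slide: the ratio must be the same horizontally and vertically, the cropping offset is given in luma samples of the enhancement layer, and MB alignment means a multiple of 16 luma samples. The function name is hypothetical.

```python
from fractions import Fraction

# Ratios listed on the slide: 1, 1.5, or 2.
ALLOWED_RATIOS = {Fraction(1), Fraction(3, 2), Fraction(2)}
MB_SIZE = 16  # a macroblock covers 16x16 luma samples


def scalable_baseline_spatial_ok(ref_size, enh_size, crop_offset=(0, 0)):
    """Check the fixed-ratio and MB-aligned-cropping restrictions.

    ref_size, enh_size: (width, height) of reference and enhancement layer.
    crop_offset: (x, y) offset of the cropping window, in luma samples.
    """
    ratio_x = Fraction(enh_size[0], ref_size[0])
    ratio_y = Fraction(enh_size[1], ref_size[1])
    aligned = all(offset % MB_SIZE == 0 for offset in crop_offset)
    return ratio_x == ratio_y and ratio_x in ALLOWED_RATIOS and aligned


if __name__ == "__main__":
    print(scalable_baseline_spatial_ok((176, 144), (352, 288)))
    print(scalable_baseline_spatial_ok((320, 240), (480, 360)))
    print(scalable_baseline_spatial_ok((176, 144), (320, 240)))
```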
Profiles of SVC
  - Scalable High
    - For broadcast, streaming, and storage
    - Spatial, temporal, and quality scalability: arbitrary
    - The base layer conforms to the High profile of H.264/AVC
  - Scalable High Intra
    - Scalable High + all IDR pictures
References
  - H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” CSVT
  - T. Wiegand, “Scalable Video Coding,” Joint Video Team, doc. JVT-W132, San Jose, USA, April
  - T. Wiegand, “Scalable Video Coding,” Digital Image Communication, Course at Technical University of Berlin, (Available on
  - H. Schwarz, D. Marpe, and T. Wiegand, “Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability,” Proc. of ICIP’