Overview of the Scalable Video Coding Extension of the H.264/AVC Standard. Talk about SVC. Paper authors: Heiko Schwarz, Detlev Marpe, Member, IEEE, and Thomas Wiegand, Member, IEEE. Presentation by: Fred Scott, adapted from: Kianoosh Mokhtarian.

Motivation
High heterogeneity among receivers: connection quality, display resolution, processing power. Options for serving them: simulcasting, transcoding, scalability. Simulcasting: any number of streams of varying quality can be sent, but at the cost of a higher bit rate. Transcoding: bad magnification artifacts. Besides, spatial and quality scalability come at the cost of a significant loss in coding efficiency and a large increase in decoder complexity.

Overview
Background; temporal scalability; spatial scalability; quality scalability; conclusion. Background covers types of scalability, applications, and requirements.

Background
Types of scalability: temporal; spatial; quality (fidelity or SNR); object-based and region-of-interest (ROI); hybrid. Applications: encode once, decode at differing quality; unequal importance of bitstream parts combined with unequal error protection; player-sensitive delivery. Definition of scalability, roughly: parts of the video bitstream can be removed in a way that the resulting … Object-based and ROI scalability: substreams represent spatially contiguous regions. Unequal error protection: especially with unpredictable throughput variations and/or high loss rates, error protection is greatly improved. Player sensitivity: iPod vs. SDTV vs. HDTV; a low-delay stream first, then a later download for full quality on slower connections.

Background: requirements for a scalable video coding technique
Similar coding efficiency to single-layer coding; little increase in decoding complexity; support of temporal, spatial, and quality scalability; backward compatibility of the base layer; support of simple bitstream adaptations after encoding. Similar coding efficiency: "similar" means, for each substream, 10% up to 50%, depending on the specific needs of an application. Non-VCL NAL units: information that changes infrequently. Video sequence: an independently decodable part of a NAL unit bitstream. IDR is an access unit.

Temporal Scalability
Enabled by restricting motion-compensated prediction; already provided by H.264/AVC. Hierarchical prediction structure: pictures of temporal enhancement layers are typically B-pictures, organized in a group of pictures (GoP). Modifications to the standard temporal coding structure can lead to scalability: rather than minimizing the bitstream, the GoPs can be arranged around

Temporal Scalability: Hierarchical Prediction Structure
Dyadic temporal enhancement layers. Dyadic: frames are grouped in pairs, so each enhancement layer doubles the frame rate.
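The dyadic layering can be sketched in a few lines of Python. This is a minimal illustration, assuming pictures are indexed in display order and the GoP size is a power of two; the function name `temporal_layer` is hypothetical and is not taken from the standard or from the JSVM software.

```python
def temporal_layer(index: int, gop_size: int) -> int:
    """Temporal layer of a picture in a dyadic hierarchy (illustrative sketch).

    index is the display-order picture number; pictures at multiples of
    gop_size form the temporal base layer (layer 0).
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    levels = gop_size.bit_length() - 1      # log2(gop_size)
    pos = index % gop_size
    if pos == 0:
        return 0
    # The number of trailing zero bits says how coarse the position is:
    # more trailing zeros means a lower (more important) temporal layer.
    trailing_zeros = (pos & -pos).bit_length() - 1
    return levels - trailing_zeros


layers = [temporal_layer(i, 8) for i in range(9)]
print(layers)  # [0, 3, 2, 3, 1, 3, 2, 3, 0]
```

Dropping every picture whose layer exceeds k leaves a decodable substream at 1/2^(levels - k) of the full frame rate, which is what makes the structure temporally scalable.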

Non-dyadic case: the dyadic structure is just a special case. The example shows substreams at 1/3 and 1/9 of the full frame rate.
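The frame-rate fractions of such a non-dyadic hierarchy can be reproduced with a small sketch. It assumes a regular hierarchy in which every layer multiplies the frame rate by the same factor (3 here); the actual prediction structure in the slide's figure may be arranged differently, and both function names are hypothetical.

```python
from fractions import Fraction


def temporal_layer_radix(index: int, radix: int, levels: int) -> int:
    """Temporal layer in a regular hierarchy where each layer multiplies
    the frame rate by `radix` (illustrative assumption, not the standard)."""
    if radix < 2 or levels < 0:
        raise ValueError("radix must be >= 2 and levels >= 0")
    pos = index % (radix ** levels)
    for layer in range(levels + 1):
        step = radix ** (levels - layer)
        if pos % step == 0:
            return layer
    raise AssertionError("unreachable: a step of 1 divides every position")


def substream_rate(max_layer: int, radix: int, levels: int) -> Fraction:
    """Fraction of the full frame rate kept when layers 0..max_layer are decoded."""
    return Fraction(1, radix ** (levels - max_layer))


layers = [temporal_layer_radix(i, 3, 2) for i in range(10)]
rates = [substream_rate(t, 3, 2) for t in range(3)]
print(layers)  # [0, 2, 2, 1, 2, 2, 1, 2, 2, 0]
print(rates)   # [Fraction(1, 9), Fraction(1, 3), Fraction(1, 1)]
```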

Other flexibilities: the multiple reference picture concept of H.264/AVC can be used; a reference picture can be in the same layer as the target frame; the hierarchical prediction structure can be modified over time. Also, the prediction structure of the base layer can be arbitrarily modified, e.g., to increase coding efficiency.

Adjusting the structural delay: any bidirectional prediction can add delay. This example adjusts the structure so that the delay becomes zero, i.e., receiving order and decoding order become the same: every picture uses only reference pictures that precede it. This typically decreases coding efficiency.

Temporal Scalability: Coding Efficiency
Highly dependent on the quantization parameters (QPs); this dependence exists for both the scalable and the non-scalable case. Intuitively, the temporal base layer pictures should be coded with higher fidelity. How to choose the QPs: either an expensive rate-distortion analysis, or the simple rule QP_T = QP_0 + 3 + T, where QP_0 is the QP of the base layer and QP_T that of temporal layer T. This equation was tested for a wide range of sequences. It leads to high PSNR fluctuations inside a GoP, but the result was subjectively shown to be temporally smooth, i.e., without annoying temporal artifacts.
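The QP cascading rule from the overview paper, QP_T = QP_0 + 3 + T for temporal enhancement layers, can be written as a short helper. This is a sketch under two assumptions that are not stated on the slide: layer 0 keeps QP_0 unchanged, and the result is clamped to the 0 to 51 range that H.264/AVC uses for 8-bit video. The name `cascaded_qp` is hypothetical.

```python
QP_MIN, QP_MAX = 0, 51  # H.264/AVC QP range for 8-bit video


def cascaded_qp(qp_base: int, temporal_layer: int) -> int:
    """QP for a picture of the given temporal layer.

    Assumption: the base layer (T = 0) uses qp_base as is; enhancement
    layers (T > 0) use qp_base + 3 + T. Clamping is added for safety.
    """
    if temporal_layer < 0:
        raise ValueError("temporal_layer must be non-negative")
    qp = qp_base if temporal_layer == 0 else qp_base + 3 + temporal_layer
    return max(QP_MIN, min(QP_MAX, qp))


qps = [cascaded_qp(30, t) for t in range(4)]
print(qps)  # [30, 34, 35, 36]
```

The coarser quantization of the higher layers is what produces the PSNR fluctuations inside a GoP: pictures that nothing else depends on are the cheapest place to save bits.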

Dyadic hierarchical B-pictures, no delay constraint; Foreman, CIF, 30 Hz (chart comparing GoP arrangements). Coding efficiency improves continuously with increasing GoP size. Compared with the widely used IBBP coding structure, a PSNR gain of more than 1 dB can be obtained at medium bit rates.

High-delay test set: CIF, 30 Hz, 34 dB, compared to IPPP.

Low-delay test set (video conferencing sequences): 365x288, 25-30 Hz, 38 dB, delay constrained to zero, compared to IPPP. The coding efficiency gains are significantly smaller than on the previous slide.

Temporal Scalability: Conclusion
Typically no negative impact on coding efficiency; on the contrary, a significant improvement, especially when higher delays are tolerable. Minor losses in coding efficiency are possible when low delay is required. In short: temporal scalability has no significant negative impact on coding efficiency, except in a few cases requiring low delay.

Spatial Scalability
Multi-layer coding, with each layer corresponding to a specific spatial resolution. Motion-compensated prediction and intra-prediction are used in each spatial layer, as for single-layer coding. Inter-layer prediction is added for coding efficiency. The coding order is the same for all layers, to restrict memory requirements and decoder complexity. Access units: lower layer pictures do not need to be present in all access units, which allows spatial and temporal scalability to be combined.

Spatial Scalability: Inter-Layer Prediction
Previous standards: inter-layer prediction by upsampling the reconstructed samples of the lower layer signal. The prediction signal is formed by the upsampled lower layer signal, by temporal prediction inside the enhancement layer, or by averaging both. Lower layer samples are not necessarily the most suitable data for inter-layer prediction. Two additional inter-layer prediction concepts: prediction of macroblock modes and associated motion parameters, and prediction of the residual signal. … the temporal prediction signal mostly represents a better approximation of the original signal than the upsampled lower layer reconstruction.

A new macroblock type, signalled by the base mode flag: only a residual signal is transmitted, with no intra-prediction mode or motion parameters. If the corresponding block in the reference layer is intra-coded, inter-layer intra prediction is used: the reconstructed intra signal of the reference layer is upsampled as a predictor. If it is inter-coded, inter-layer motion prediction is used: partitioning data are upsampled, reference indexes are copied, and motion vectors are scaled up. The derivation is based on the corresponding data of the co-located 8x8 block in the reference layer. Upsampling or scaling applies to the partitioning and the motion vectors, not to the reference indexes.

Inter-layer motion prediction (for a 16x16, 16x8, 8x16, or 8x8 macroblock partition): reference indexes are copied, and scaled motion vectors are used as motion vector predictors. Inter-layer residual prediction: can be used for any inter-coded macroblock, regardless of its base mode flag or inter-layer motion prediction; the residual signal of the reference layer is upsampled as a predictor.
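Scaling a reference-layer motion vector to the enhancement-layer resolution can be sketched as follows. This is a simplified illustration: the standard defines exact fixed-point arithmetic and also accounts for cropping offsets, both of which are ignored here, and the function name `scale_motion_vector` is hypothetical. Vector components are assumed to be in quarter-sample units.

```python
from fractions import Fraction
from typing import Tuple


def scale_motion_vector(mv: Tuple[int, int],
                        ref_size: Tuple[int, int],
                        enh_size: Tuple[int, int]) -> Tuple[int, int]:
    """Scale a reference-layer motion vector by the resolution ratio.

    mv is (x, y) in quarter-sample units; sizes are (width, height).
    Simplified sketch: plain rational scaling with rounding to integers.
    """
    scale_x = Fraction(enh_size[0], ref_size[0])
    scale_y = Fraction(enh_size[1], ref_size[1])
    return (round(mv[0] * scale_x), round(mv[1] * scale_y))


# Dyadic case used in the experiments: 352x288 base, 704x576 enhancement.
predictor = scale_motion_vector((3, -5), (352, 288), (704, 576))
print(predictor)  # (6, -10)
```

With the base mode flag set, the scaled vector is used directly; with inter-layer motion prediction on an ordinary partition, it only serves as the motion vector predictor and a difference is still transmitted.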

A summary, for a 16x16 macroblock in an enhancement layer. Base mode flag = 1: either inter-layer intra prediction (sample values are predicted) or inter-layer motion prediction (partitioning data, ref. indexes, and motion vectors are derived). Base mode flag = 0: either inter-layer motion prediction (ref. indexes are derived, motion vectors are predicted) or no inter-layer motion prediction. For the inter-coded options, inter-layer residual prediction can be used or not.
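The decision logic of this summary can be written out as a small function. It is a sketch that mirrors only what the slides state; the flag names follow the slide wording (base mode flag, motion prediction, residual prediction), and the function itself, its string labels, and the assumption that a macroblock without the base mode flag is inter-coded are illustrative choices.

```python
from typing import List


def inter_layer_tools(base_mode_flag: bool,
                      ref_block_is_intra: bool,
                      motion_prediction_flag: bool = False,
                      residual_prediction_flag: bool = False) -> List[str]:
    """List the inter-layer prediction tools applied to one macroblock.

    Assumption: when base_mode_flag is False the enhancement-layer
    macroblock is inter-coded, so residual prediction may apply.
    """
    tools: List[str] = []
    if base_mode_flag:
        if ref_block_is_intra:
            # Sample values come from the upsampled reference-layer intra signal;
            # residual prediction is only defined for inter-coded macroblocks.
            return ["inter-layer intra prediction"]
        tools.append("inter-layer motion prediction (all motion data derived)")
    elif motion_prediction_flag:
        tools.append("inter-layer motion prediction (ref. indexes derived, vectors predicted)")
    if residual_prediction_flag:
        tools.append("inter-layer residual prediction")
    return tools


print(inter_layer_tools(True, False, residual_prediction_flag=True))
```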

Spatial Scalability: Generalizing
Like previous standards, SVC is not limited to the dyadic case. The enhancement layer may represent only a selected rectangular area of its reference layer picture. The enhancement layer may contain additional parts beyond the borders of its reference layer picture. Tools for spatial scalable coding of interlaced sources are included.

Spatial Scalability: Complexity Constraints
Inter-layer intra-prediction is restricted to macroblocks whose co-located reference layer blocks are intra-coded, although coding efficiency would be improved by generally allowing this prediction mode. As a result, each layer can be decoded with a single motion compensation loop, unlike in previous coding standards.

Spatial Scalability: Coding Efficiency
Comparison to single-layer coding and simulcast, with base/enhancement layer at 352x288 / 704x576. Only the first frame is intra-coded. Inter-layer prediction (ILP) tools: intra (I), motion (M), residual (R).

“City” was the worst performing case in the test set.

The effectiveness of a prediction tool or combination of tools strongly depends on the sequence characteristics.

It also strongly depends on the prediction structure: the rate-distortion performance of SVC relative to single-layer coding degrades when moving from a GoP size of 16 to IPPP coding.

Comparison of fully featured SVC, “single-loop ILP (I, M, R)”, to the scalable profiles of previous standards, “multi-loop ILP (I)”. The gain of multi-loop decoding is often minor, and it brings significant decoding complexity.

Spatial Scalability: Encoder Control
JSVM (Joint Scalable Video Model) software encoder control: base layer coding parameters are optimized for that layer only, so base layer performance equals single-layer H.264/AVC. This is not necessarily suitable for efficient enhancement layer coding. Improved multi-layer encoder control: optimized for both layers.

QP_enhancement layer = QP_base layer + 4; hierarchical B-pictures, GoP size = 16. The bit-rate increase relative to single-layer coding for the same quality is always less than or equal to 10% for both layers.

Quality Scalability
A special case of spatial scalability with identical picture sizes: no upsampling is needed for inter-layer prediction, and inter-layer intra- and residual-prediction are performed directly in the transform domain. Different qualities are achieved by decreasing the quantization step along the layers. Coarse-Grained Scalability (CGS): a few selected bit rates are supported in the scalable bitstream. Quality scalability becomes less efficient when the bit-rate difference between CGS layers gets smaller.
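How much the quantization step shrinks between two quality layers follows from the H.264/AVC property that the step size doubles for every increase of 6 in QP. The sketch below computes only the ratio between two layers; the QP values are arbitrary illustrations rather than settings from the talk, and `qstep_ratio` is a hypothetical helper name.

```python
def qstep_ratio(qp_lower_layer: int, qp_higher_layer: int) -> float:
    """Ratio of quantization step sizes (higher layer / lower layer).

    Uses the H.264/AVC rule that the step size doubles every 6 QP units;
    a ratio below 1 means the higher layer quantizes more finely.
    """
    return 2.0 ** ((qp_higher_layer - qp_lower_layer) / 6.0)


# Three quality layers with QPs 36, 30, and 24 (illustrative values).
ratios = [qstep_ratio(36, qp) for qp in (36, 30, 24)]
print(ratios)  # [1.0, 0.5, 0.25]
```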

Quality Scalability: MGS
Medium-Grained Scalability (MGS) improves three things. Flexibility of the stream: packet-level quality scalability. Error robustness: controlling drift propagation. Coding efficiency: use of more information for temporal prediction.

MGS: error robustness vs. coding efficiency. Figure: various concepts for trading off enhancement layer coding efficiency and drift for packet-based quality scalable coding. (a) Base layer only control. (b) Enhancement layer only control. (c) Two-loop control. (d) Key picture concept of SVC for hierarchical prediction structures, where key pictures are marked by the hatched boxes.

MGS: error robustness vs. coding efficiency. Pictures of the coarsest temporal layer are transmitted as key pictures. Only for them does the base layer picture need to be present in the decoding buffer; they serve as re-synchronization points for controlling drift propagation. All other pictures use the highest available quality picture of the reference frames for motion compensation, which gives high coding efficiency.
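The key picture rule amounts to a simple choice of which reconstruction of a reference frame is used for motion compensation. The following sketch assumes quality layers are numbered from 0 (base quality) upward and that key pictures are exactly the pictures of temporal layer 0; the function name and its parameters are hypothetical.

```python
from typing import Sequence


def reference_quality(temporal_layer: int,
                      available_quality_layers: Sequence[int]) -> int:
    """Quality layer of the reference reconstruction used for prediction.

    Key pictures (temporal layer 0) always predict from base quality, so
    missing enhancement packets cannot cause drift across key pictures.
    All other pictures predict from the best quality that is available.
    """
    if 0 not in available_quality_layers:
        raise ValueError("the base quality layer (0) must be present")
    if temporal_layer == 0:
        return 0
    return max(available_quality_layers)


print(reference_quality(0, [0, 1, 2]))  # 0
print(reference_quality(2, [0, 1]))     # 1
```

Any drift caused by dropped enhancement packets is therefore confined to the pictures between two key pictures.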

Quality Scalability: Encoding, Extracting
The encoder does not know what quality will be available at the decoder. It is better to use the highest quality references; this should not be mistaken for open-loop coding. Bitstream extraction is based on the priority identifier of NAL units, assigned by the encoder.
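Extraction by priority identifier can be sketched as a filter over NAL units. The `NalUnit` class, the byte sizes, and the budget-driven threshold search are illustrative assumptions; the sketch also assumes that a lower `priority_id` marks a more important NAL unit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NalUnit:
    priority_id: int   # assumption: lower value = more important
    size_bytes: int


def extract(nal_units: List[NalUnit], max_priority_id: int) -> List[NalUnit]:
    """Keep only NAL units at or below the given priority threshold."""
    return [n for n in nal_units if n.priority_id <= max_priority_id]


def pick_threshold(nal_units: List[NalUnit], byte_budget: int) -> Optional[int]:
    """Largest priority_id threshold whose substream fits the byte budget."""
    best = None
    for pid in sorted({n.priority_id for n in nal_units}):
        total = sum(n.size_bytes for n in extract(nal_units, pid))
        if total > byte_budget:
            break
        best = pid
    return best


stream = [NalUnit(0, 100), NalUnit(1, 50), NalUnit(2, 80),
          NalUnit(0, 100), NalUnit(2, 70)]
threshold = pick_threshold(stream, 260)
print(threshold)                        # 1
print(len(extract(stream, threshold)))  # 3
```

Because the encoder assigns the identifiers, the extractor needs no knowledge of the video content: it only compares one header field per NAL unit.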

Quality Scalability: Coding Efficiency
BL-/EL-only control: the motion compensation loop is closed at the base/enhancement layer. 2-loop control: one motion compensation loop in each layer. Adaptive BL/EL control: use of key pictures.

Quality Scalability: Coding Efficiency
MGS vs. CGS

SVC encoder structure example

Conclusion
SVC outperforms previous scalable video coding standards. Main ingredients: hierarchical structures, temporal and spatial; inter-layer and intra-layer prediction; Medium Grain Scalability (MGS).

References
H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the scalable video coding extension of the H.264/AVC standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1103–1120, September 2007.
T. Wiegand, G. Sullivan, J. Reichel, H. Schwarz, and M. Wien, "Joint Draft ITU-T Rec. H.264 | ISO/IEC / Amd.3 Scalable video coding," Joint Video Team, Doc. JVT-X201, July 2007.
H. Kirchhoffer, H. Schwarz, and T. Wiegand, "CE1: Simplified FGS," Joint Video Team, Doc. JVT-W090, April 2007.
