MPEG-2 Digital Video Coding Standard
Dihong Tian
dhtian@ece.gatech.edu
http://users.ece.gatech.edu/dhtian
ECE 8873 – Data Compression and Modeling
Georgia Institute of Technology

Outline
- Overview
- MPEG audio (brief introduction)
- MPEG-2 video coding
  - Basic building blocks
  - Profiles and Levels
  - Coding interlaced video
  - Scalable coding extensions
  - Other features
- Summary
March 31, 2004 D.-H. Tian @ ECE-8873

What does MPEG define?
- MPEG defines the protocol of the bitstream between the encoder and the decoder
- The decoder is defined by implication; the encoder is left largely to the designer

MPEG-2 – Why another standard?
MPEG-1
- Medium bandwidth (up to 1.5 Mbit/s)
  - 1.25 Mbit/s video: 352 x 240 at 30 Hz
  - 250 kbit/s audio (two channels)
- Non-interlaced video
- Optimized for CD-ROM
MPEG-2 (1990 - 1995)
- Multi-channel surround sound coding
- Higher bit rates (up to 80 Mbit/s)
- A larger number of applications (SDTV, HDTV, etc.)
- The encoding standard is a toolkit
  - Interlaced and non-interlaced frames
  - Different color subsampling modes, e.g., 4:2:2, 4:2:0, 4:4:4
  - Flexible quantization schemes that can be changed at picture level
  - Scalable bit-streams
  - Profiles and levels
- Backward compatible
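
To put the 1.25 Mbit/s video figure in perspective, the following sketch estimates the raw bit rate of a 352 x 240, 30 Hz source and the compression ratio it implies. The 8 bits per sample and the 4:2:0 chroma format are assumptions of this example; the slide does not state them.

```python
# Raw (uncompressed) bit rate of the 352 x 240 x 30 Hz source format.
# Assumptions (not stated on the slide): 8 bits per sample, 4:2:0 chroma.
WIDTH, HEIGHT, FRAME_RATE = 352, 240, 30
BITS_PER_SAMPLE = 8

# 4:2:0 carries one Cb and one Cr sample per 2x2 group of luma samples,
# so each pixel costs 1 + 1/4 + 1/4 = 1.5 samples on average.
SAMPLES_PER_PIXEL = 1.5

raw_bits_per_second = WIDTH * HEIGHT * FRAME_RATE * SAMPLES_PER_PIXEL * BITS_PER_SAMPLE
video_budget = 1.25e6  # bits/s, the video share quoted on the slide

compression_ratio = raw_bits_per_second / video_budget
print(f"raw: {raw_bits_per_second / 1e6:.1f} Mbit/s, ratio: {compression_ratio:.1f}:1")
```

Under these assumptions the source is roughly 30 Mbit/s before coding, so the video budget corresponds to a compression ratio of about 24:1.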

MPEG Audio
- The digital audio compression part of MPEG uses auditory masking techniques.
- MPEG-1 audio (ISO/IEC IS 11172-3) specifies mono or two-channel audio, which may be Dolby Surround coded, at bit rates from 32 kb/s to 384 kb/s.
- MPEG-2 audio (ISO/IEC IS 13818-3) specifies up to 7 channels (5 is more common) and rates up to 1 Mb/s, and supports variable bit-rate as well as constant bit-rate coding.
- MPEG-2 handles backward compatibility by encoding a two-channel MPEG-1 stream and then adding the 5- or 7-channel audio as an extension.

MPEG Video Coding - Fundamentals
- Source model
  - Inter-pel correlation
  - Spatial correlation
  - Temporal correlation
- Subsampling and interpolation
  - Spatial domain (YUV components: 4:2:0, 4:2:2, ...)
  - Temporal domain (frame rate)
- Basic building blocks
  - Temporal prediction (motion-compensated prediction)
  - Frequency domain decomposition (DCT)
  - Quantization
  - Variable-length coding
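
Two of the building blocks, the DCT and quantization, can be illustrated with a minimal sketch. It uses an orthonormal 8x8 type-II DCT and a single uniform step size; the function names and the uniform quantizer are assumptions made for illustration, since a real MPEG-2 encoder applies a weighting matrix and a quantizer scale instead.

```python
import math

N = 8  # MPEG blocks are 8x8 samples


def dct_2d(block):
    """Orthonormal type-II 2-D DCT of an 8x8 block given as a list of rows."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step):
    """Uniform quantizer: a simplification of the MPEG-2 weighting scheme."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A flat block has no spatial detail, so only the DC coefficient survives.
flat = [[128] * N for _ in range(N)]
levels = quantize(dct_2d(flat), step=16)
print(levels[0][0], sum(abs(l) for row in levels for l in row))
```

For the flat block, all energy lands in the DC term, which is why spatially correlated content compresses well after the transform.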

Video Stream Data Hierarchy
- Sequence layer
- GOP layer
- Picture layer: I-, P-, or B-type
- Slices
- Macroblock: 16 x 16 pixels; motion compensated prediction
- Block (8x8 pixels): frequency domain decomposition, quantization, variable length coding
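
The lower levels of the hierarchy can be made concrete by counting macroblocks and 8x8 blocks in a picture. This is a rough sketch with hypothetical helper names; the per-macroblock block counts follow from the chroma formats (4 luma blocks plus 2, 4, or 8 chroma blocks).

```python
# 8x8 blocks per 16x16 macroblock: 4 luma blocks plus the chroma blocks.
BLOCKS_PER_MACROBLOCK = {"4:2:0": 6, "4:2:2": 8, "4:4:4": 12}


def macroblock_grid(width, height):
    """Return (columns, rows) of 16x16 macroblocks, rounding up partial ones."""
    return (width + 15) // 16, (height + 15) // 16


def blocks_per_picture(width, height, chroma_format):
    """Total number of 8x8 blocks coded for one picture."""
    cols, rows = macroblock_grid(width, height)
    return cols * rows * BLOCKS_PER_MACROBLOCK[chroma_format]


cols, rows = macroblock_grid(720, 576)
print(cols, rows, cols * rows, blocks_per_picture(720, 576, "4:2:0"))
```

A 720 x 576 picture is a 45 x 36 grid of macroblocks, so 1620 macroblocks and 9720 blocks in 4:2:0.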

Recap
- Overview
- MPEG audio (brief introduction)
- MPEG-2 video coding
  - Basic building blocks
  - Profiles and Levels
  - Coding interlaced video
  - Scalable coding extensions
  - Other improvements

Profiles & Levels
- Profile: a collection of compression tools that make up a coding system
- Level: picture source format, ranging from about VCR quality to full HDTV

Profiles and their algorithms
- HIGH: supports all functionality provided by the Spatial Scalable Profile, plus the provision to support 3 layers with the SNR and Spatial scalable coding modes; 4:2:2 YUV-representation for improved quality requirements
- SPATIAL Scalable: supports all functionality provided by the SNR Scalable Profile, plus an algorithm for spatial scalable coding (2 layers allowed); 4:2:0 YUV-representation
- SNR Scalable: supports all functionality provided by the MAIN Profile, plus an algorithm for SNR scalable coding (2 layers allowed)
- MAIN: non-scalable coding algorithm supporting functionality for coding interlaced video, random access, and B-picture prediction modes
- SIMPLE: includes all functionality provided by the MAIN Profile but does not support B-picture prediction modes

Notes
- Scalable coding is available in the SNR Scalable, SPATIAL Scalable, and HIGH profiles
- The MAIN profile is a straightforward extension of MPEG-1 for coding interlaced video

Levels and their parameters
Level       Samples/line   Lines/frame   Frames/s   Bit rate
HIGH        1920           1152          60         80 Mbit/s
HIGH 1440   1440                                    60 Mbit/s
MAIN        720            576           30         15 Mbit/s
LOW         352            288                      4 Mbit/s
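
As a rough illustration of how level limits constrain a stream, the sketch below looks up the lowest level whose listed parameters accommodate a given source. The table holds only the limits given on the slide; limits the slide leaves out are simply not checked, and the function name `lowest_level` is hypothetical.

```python
# Upper bounds per level, ordered from lowest to highest.
# Only values that appear on the slide are included.
LEVELS = [
    ("LOW", {"samples_per_line": 352, "lines_per_frame": 288, "mbit_per_s": 4}),
    ("MAIN", {"samples_per_line": 720, "lines_per_frame": 576,
              "frames_per_s": 30, "mbit_per_s": 15}),
    ("HIGH 1440", {"samples_per_line": 1440, "mbit_per_s": 60}),
    ("HIGH", {"samples_per_line": 1920, "lines_per_frame": 1152,
              "frames_per_s": 60, "mbit_per_s": 80}),
]


def lowest_level(**stream):
    """Return the first level whose listed limits the stream satisfies, else None."""
    for name, limits in LEVELS:
        if all(stream.get(key, 0) <= bound for key, bound in limits.items()):
            return name
    return None


print(lowest_level(samples_per_line=720, lines_per_frame=576,
                   frames_per_s=25, mbit_per_s=6))
print(lowest_level(samples_per_line=1920, lines_per_frame=1080,
                   frames_per_s=30, mbit_per_s=20))
```

The first call describes a European standard-definition source and resolves to MAIN; the second exceeds the 1440 samples/line bound and resolves to HIGH.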

Coding Interlaced Video
- Interlaced video (TV) vs. progressive video
  - Each frame consists of two interlaced fields
    - First field: odd-numbered lines of the frame (top field)
    - Second field: even-numbered lines of the frame (bottom field)
  - The display device interlaces the fields to compose a frame
  - TV in Europe: frame rate 25 Hz => field rate 50 Hz
  - TV in America: frame rate 30 Hz => field rate 60 Hz
- MPEG-2 supports:
  - Two picture formats: frame-picture and field-picture
  - Field/frame DCT option per macroblock (MB) for frame pictures
  - New motion-compensated (MC) prediction modes for interlaced video
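
The relationship between a frame and its two fields can be sketched as follows. Lines are counted from 1 as on the slide, so the top field holds list indices 0, 2, 4, and so on; the function names are hypothetical.

```python
def split_fields(frame):
    """Split a frame (list of rows) into its top and bottom fields."""
    top = frame[0::2]     # lines 1, 3, 5, ... (odd-numbered, top field)
    bottom = frame[1::2]  # lines 2, 4, 6, ... (even-numbered, bottom field)
    return top, bottom


def interleave_fields(top, bottom):
    """Rebuild a frame by alternating lines from the two fields."""
    frame = []
    for top_row, bottom_row in zip(top, bottom):
        frame.append(top_row)
        frame.append(bottom_row)
    return frame


frame = [["line1"], ["line2"], ["line3"], ["line4"]]
top, bottom = split_fields(frame)
print(top, bottom)
```

A field picture codes `top` and `bottom` as separate pictures, whereas a frame picture codes the interleaved result.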

New Picture Types for Interlaced Video
- Frame pictures (I, P, or B type)
- Field pictures

Field/Frame DCT Option for Frame Pictures
- In a frame picture, MPEG-2 allows a field- or frame-DCT option for each macroblock
  - High motion: field-DCT
  - No motion, high spatial activity: frame-DCT
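
The standard leaves the field/frame decision to the encoder. One plausible heuristic, shown here as an assumption rather than a normative rule, compares how much adjacent lines differ when the macroblock is read in frame order versus within each field.

```python
def vertical_activity(rows):
    """Sum of absolute differences between vertically adjacent rows."""
    return sum(abs(a - b)
               for upper, lower in zip(rows, rows[1:])
               for a, b in zip(upper, lower))


def choose_dct_mode(macroblock):
    """Pick 'field' when lines within each field are smoother than frame lines.

    Hypothetical encoder-side heuristic, not taken from the standard.
    """
    frame_cost = vertical_activity(macroblock)
    field_cost = (vertical_activity(macroblock[0::2])
                  + vertical_activity(macroblock[1::2]))
    return "field" if field_cost < frame_cost else "frame"


# Motion between fields: the two fields show different content.
moving = [[0] * 16 if r % 2 == 0 else [100] * 16 for r in range(16)]
# Static scene with a vertical gradient: neighbouring frame lines are closest.
static = [[r * 10] * 16 for r in range(16)]
print(choose_dct_mode(moving), choose_dct_mode(static))
```

With motion, the fields were captured at different instants, so each field is internally smoother and field-DCT wins; with a static scene, adjacent frame lines correlate best and frame-DCT wins.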

Prediction Modes in MPEG-2
- Field prediction: made independently for each field from previously decoded field(s)
- Frame prediction: made from previously decoded frame(s)
- 16x8 motion compensation (only used in field pictures): 2 motion vectors used for each MB
- Dual-prime prediction (only used for P-pictures): made from 2 reference fields, which are averaged to form the final prediction
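
The averaging step of dual-prime prediction can be sketched in a few lines. The two inputs stand for predictions already fetched from the two reference fields; the round-half-up rule and the function name are assumptions of this example, and motion vector derivation is omitted.

```python
def average_predictions(pred_a, pred_b):
    """Average two equally sized predictions sample by sample.

    Rounding half up is an assumption of this sketch.
    """
    return [[(a + b + 1) // 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(pred_a, pred_b)]


from_same_parity_field = [[10, 20], [30, 40]]
from_opposite_parity_field = [[20, 21], [30, 43]]
print(average_predictions(from_same_parity_field, from_opposite_parity_field))
```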

Scalable Coding
- General philosophy: support applications beyond those addressed by the basic MAIN profile coding algorithm
- Scalable coding schemes: SNR scalability, spatial scalability, temporal scalability
- Up to three different scalable layers

SNR-Scalable Video Encoder
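
The idea behind SNR scalability can be sketched on a handful of DCT coefficients: the base layer quantizes coarsely, and the enhancement layer requantizes the remaining error with a finer step. This is a simplified illustration with hypothetical function names and uniform quantizers, not the encoder structure of the standard.

```python
def quantize(values, step):
    return [int(round(v / step)) for v in values]


def dequantize(levels, step):
    return [level * step for level in levels]


def snr_encode(coeffs, base_step, enh_step):
    """Return (base_levels, enhancement_levels) for a list of coefficients."""
    base_levels = quantize(coeffs, base_step)
    base_recon = dequantize(base_levels, base_step)
    # The enhancement layer codes what the coarse base layer lost.
    residual = [c - r for c, r in zip(coeffs, base_recon)]
    return base_levels, quantize(residual, enh_step)


def snr_decode(base_levels, base_step, enh_levels=None, enh_step=None):
    """Decode the base layer alone, or base plus enhancement when available."""
    recon = dequantize(base_levels, base_step)
    if enh_levels is not None:
        refinement = dequantize(enh_levels, enh_step)
        recon = [b + e for b, e in zip(recon, refinement)]
    return recon


coeffs = [100, -37, 12, 3]
base, enh = snr_encode(coeffs, base_step=16, enh_step=4)
base_only = snr_decode(base, 16)
refined = snr_decode(base, 16, enh, 4)
print(base_only, refined)
```

Both layers have the same spatial resolution; a decoder that receives only the base layer still gets a usable picture, and the enhancement layer reduces the quantization error.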

Spatial-Scalable Video Coder
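
Spatial scalability can be sketched as a two-layer pyramid: the base layer codes a downsampled picture, and the enhancement layer codes the difference between the full-resolution picture and the upsampled base. The 2x2 averaging, the pixel-replication upsampler, and the absence of quantization are simplifying assumptions of this example; a real coder also combines the upsampled base with temporal prediction.

```python
def downsample(img):
    """Halve both dimensions by averaging each 2x2 group (even sizes assumed)."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def upsample(img):
    """Double both dimensions by pixel replication."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def spatial_encode(img):
    """Return (base layer picture, enhancement layer residual)."""
    base = downsample(img)
    prediction = upsample(base)
    residual = [[a - b for a, b in zip(row, pred_row)]
                for row, pred_row in zip(img, prediction)]
    return base, residual


def spatial_decode(base, residual):
    """Full-resolution reconstruction from both layers."""
    prediction = upsample(base)
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, residual)]


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
base, residual = spatial_encode(img)
print(base, spatial_decode(base, residual) == img)
```

A low-resolution receiver displays `base` directly, while a full-resolution receiver adds the residual to the upsampled base.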

Temporal Scalability
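
In temporal scalability the layers differ in frame rate. A minimal sketch, assuming the simplest possible partition in which every other frame goes to the enhancement layer (the actual partition and prediction structure are encoder choices):

```python
def split_temporal_layers(frames):
    """Base layer keeps every other frame; the rest form the enhancement layer."""
    return frames[0::2], frames[1::2]


def merge_temporal_layers(base, enhancement):
    """Interleave the two layers back into full-rate display order."""
    merged = []
    for i, frame in enumerate(base):
        merged.append(frame)
        if i < len(enhancement):
            merged.append(enhancement[i])
    return merged


frames = list(range(7))  # stand-ins for decoded pictures
base, enhancement = split_temporal_layers(frames)
print(base, enhancement, merge_temporal_layers(base, enhancement))
```

A decoder that receives only the base layer plays the sequence at half the frame rate; adding the enhancement layer restores the full rate.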

Other Features in MPEG-2
- Alternate scan: fits interlaced video better
- Scan patterns: zig-zag scan, alternate scan
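
The zig-zag order can be generated by walking the anti-diagonals of the block and alternating direction, as in the sketch below. The alternate scan is defined by a fixed table in the standard and is not reproduced here; the function name is hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even anti-diagonals run from bottom-left up to top-right.
        if s % 2 == 0:
            diagonal.reverse()
        order.extend(diagonal)
    return order


order = zigzag_order()
print(order[:6])
```

Scanning in this order places low-frequency coefficients first, so the quantized coefficients tend to end in a long run of zeros that variable-length coding handles cheaply.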

Other Features in MPEG-2 (cont’d)
- Finer quantization of the DCT coefficients
  - Intra macroblock
    - DC coefficients: 11 bits (full) resolution vs. 8 bits in MPEG-1
    - AC coefficients: [-2048, 2047] vs. [-256, 255] in MPEG-1
  - Non-intra macroblock: [-2048, 2047] vs. [-256, 255] in MPEG-1
- Finer adjustment of the MQUANT scaling parameter
  - MPEG-1: integers (1-31)
  - MPEG-2: an optional set including real numbers from 0.5 to 56
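
The effect of the wider coefficient ranges can be checked with a small sketch that computes the signed word length each range needs and shows how a large quantized level saturates. The helper names are hypothetical.

```python
MPEG1_RANGE = (-256, 255)
MPEG2_RANGE = (-2048, 2047)


def signed_bits(lo, hi):
    """Smallest two's-complement width that holds every integer in [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


def clamp(level, lo, hi):
    """Saturate a quantized level to the allowed range."""
    return max(lo, min(hi, level))


print(signed_bits(*MPEG1_RANGE), signed_bits(*MPEG2_RANGE))
print(clamp(3000, *MPEG1_RANGE), clamp(3000, *MPEG2_RANGE))
```

The MPEG-2 range needs 12 bits instead of 9, which lets fine quantizer steps represent large coefficients without clipping.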

Summary
- MPEG-2 allows higher bit rates than MPEG-1
- MPEG-2 supports a larger number of applications
- MPEG-2 allows surround sound and alternative language channels
- MPEG-2 allows progressive sequences and interlaced sequences
- MPEG-2 allows alternative scan patterns other than the zig-zag pattern
- MPEG-2 has scalable coding extensions, including:
  - SNR scalability
  - Spatial scalability
  - Temporal scalability
Questions