MP3 and AAC
Trac D. Tran, ECE Department, The Johns Hopkins University, Baltimore, MD 21218
MP3
- MP3 = MPEG-1/2 Audio Layer III audio coding
- Transform: cascade of a 32-channel filter bank and a 6-channel or 18-channel MDCT
- Quantization: scalar quantizer with step sizes set by a psychoacoustic model
- Entropy coding: run-length + Huffman
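Transformation Stage in MP3
[Block diagram: x[n] enters a 32-channel, 512-tap cosine-modulated filter bank (CMFB); each subband is then followed by a 6-channel, 12-tap MLT/MDCT for transients or an 18-channel, 36-tap MLT/MDCT for steady-state segments]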
Masking
- Masking was discovered through psychoacoustic experiments
- The human auditory system is less sensitive to components at frequencies near a strong tonal signal
Masking: Original Signal
Masking Threshold
- Signal components below the masking threshold are deemed insignificant and can be quantized to zero
- Components are computed from overlapping 1024-sample Hann windows
Advanced Audio Coding (AAC)
- Successor of MP3
- Better audio quality than MP3 at most bit rates
- Perceptually lossless at 320 kbps for 5-channel surround sound (64 kbps/channel)
- Almost CD quality at 96 kbps for stereo (48 kbps/channel)
- AAC is part of the MPEG-4 standard
- Default audio format of Apple’s iPhone, iPod, iTunes; Sony PlayStation 3; Nintendo Wii
- Pipeline: MDCT -> Scalar Quantization -> Huffman Coding
Transformation Stage in AAC
[Block diagrams: x[n] into a 128-channel, 256-tap MDCT for transient signals; x[n] into a 1024-channel, 2048-tap MDCT for steady-state signals]
- AAC adaptively switches between
  - 8 blocks of 128-point MDCT with 256-point windows (transient signals)
  - 1 block of 1024-point MDCT with a 2048-point window (steady-state signals)
- All windows have 50% overlap
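The 50% overlap is what allows an MDCT to produce only M coefficients per 2M-sample window and still be invertible: the time-domain aliasing of each block cancels against its neighbour when the windowed outputs are overlap-added. The following is a minimal sketch of that mechanism, not AAC's actual filter bank. It assumes a sine window (one shape that satisfies the required condition w[n]^2 + w[n+M]^2 = 1) and a toy block size of M = 4 instead of 128 or 1024; all function names are illustrative.

```python
import math

def sine_window(length):
    """Sine window of even length; w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Map 2M (already windowed) samples to M coefficients."""
    m = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                for n in range(2 * m))
            for k in range(m)]

def imdct(coeffs):
    """Map M coefficients back to 2M time-aliased samples.

    The 2/M scaling is chosen so that windowed overlap-add reconstructs exactly.
    """
    m = len(coeffs)
    return [2.0 / m * sum(coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                          for k in range(m))
            for n in range(2 * m)]

def analyze_synthesize(signal, m):
    """Window, MDCT, IMDCT, window again, then overlap-add with hop size m."""
    w = sine_window(2 * m)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * m + 1, m):
        block = [signal[start + n] * w[n] for n in range(2 * m)]
        rec = imdct(mdct(block))
        for n in range(2 * m):
            out[start + n] += rec[n] * w[n]
    return out

M = 4  # toy size; AAC uses 128 (short blocks) or 1024 (long blocks)
signal = [math.sin(0.3 * n) + 0.1 * n for n in range(6 * M)]
reconstructed = analyze_synthesize(signal, M)

# Only samples covered by two overlapping blocks are recovered;
# the first and last M samples have no partner block in this sketch.
max_error = max(abs(a - b) for a, b in zip(signal[M:-M], reconstructed[M:-M]))
print(max_error)
```

Each block alone returns an aliased signal; the error only vanishes after the overlap-add, which is why the window switch between long and short blocks has to preserve the overlap condition.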
JPEG Still Image Coding Standard
Overall Structure of JPEG
[Block diagram: Color Converter -> Level Offset -> 8x8 DCT -> Uniform Quant.; DC path: DC Pred. -> DC VLC; AC path: Zigzag Scan -> Run-Level -> AC VLC]
- Color converter: RGB to YUV
- Level offset: subtract 2^(N-1), where N is the number of bits per pixel
- Quantization: different step sizes for different coefficients
- DC: predict from the DC of the previous block
- AC:
  - Zigzag scan to get 1-D data
  - Run-level: joint coding of each non-zero coefficient and the number of zeros before it
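The first two stages after color conversion can be sketched directly. The code below is an illustrative, unoptimized implementation that assumes 8-bit samples and the usual orthonormal 8x8 DCT-II definition; the function names are not taken from any library.

```python
import math

def level_offset(block, bits_per_pixel=8):
    """Subtract 2^(N-1) so that samples are centered around zero."""
    offset = 1 << (bits_per_pixel - 1)
    return [[pixel - offset for pixel in row] for row in block]

def dct_8x8(block):
    """Direct (slow) 2-D DCT-II of an 8x8 block."""
    def scale(u):
        return 1 / math.sqrt(2) if u == 0 else 1.0

    coeffs = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[u][v] = 0.25 * scale(u) * scale(v) * total
    return coeffs

# A flat block of value 138 becomes a block of 10s after the offset,
# so all its energy lands in the DC coefficient (8 times the block mean).
flat_block = [[138] * 8 for _ in range(8)]
coeffs = dct_8x8(level_offset(flat_block))
print(round(coeffs[0][0], 6))
```

With this normalization the DC coefficient equals 8 times the mean of the offset block, which is why neighbouring DC values track local brightness.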
JPEG Quantization
- Uniform mid-tread quantizer
- Larger step sizes for chroma components
- Different coefficients have different step sizes
  - Smaller steps for low-frequency coefficients (more bits)
  - Larger steps for high-frequency coefficients (fewer bits)
  - The human visual system is less sensitive to errors at high frequencies
- [Tables: Luma Quantization Table, Chroma Quantization Table]
- Actual step size: scale the basic table by a quality factor
Scaling of Quantization Table
- Actual Q table = scaling x Basic Q table
  - quality factor ≤ 50: scaling = 50/quality
  - quality factor > 50: scaling = 2 - quality/50
[Plot: scaling versus quality factor]
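The scaling rule translates into a few lines of code. This sketch makes two assumptions that the slide does not state: scaled steps are rounded to the nearest integer, and they are clamped to a minimum of 1 so that quality 100 does not produce a zero step size. The 2x2 table fragment is only an illustration, not a full quantization table.

```python
def scaling_factor(quality):
    """Scaling applied to the basic Q table for a quality factor in 1..100."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be in 1..100")
    return 50 / quality if quality <= 50 else 2 - quality / 50

def scale_table(basic_table, quality):
    """Scale every step size; rounding and the floor of 1 are assumptions."""
    s = scaling_factor(quality)
    return [[max(1, int(q * s + 0.5)) for q in row] for row in basic_table]

fragment = [[16, 11], [12, 12]]  # illustrative 2x2 fragment of a basic table
print(scale_table(fragment, 25))   # coarser steps, lower quality
print(scale_table(fragment, 75))   # finer steps, higher quality
```

Quality 50 leaves the basic table unchanged, lower values enlarge the steps without bound, and higher values shrink them linearly toward zero.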
DC Prediction
- DC coefficient: the (scaled) average of a block
- The DC values of neighboring blocks are still similar to each other: redundancy
- The redundancy can be removed by differential coding: e(n) = DC(n) - DC(n-1)
- Only the prediction error e(n) is encoded
[Figure: 8x8 DC coeffs of Lena]
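Differential coding of the DC values and its inverse can be sketched as follows. The initial prediction of 0 for the first block is an assumption of this sketch, and the DC values are made up for illustration.

```python
def dc_prediction_errors(dc_values, initial_prediction=0):
    """e(n) = DC(n) - DC(n-1), with an assumed prediction for the first block."""
    errors = []
    previous = initial_prediction
    for dc in dc_values:
        errors.append(dc - previous)
        previous = dc
    return errors

def dc_reconstruct(errors, initial_prediction=0):
    """Invert the differential coding by accumulating the errors."""
    values = []
    previous = initial_prediction
    for e in errors:
        previous += e
        values.append(previous)
    return values

dc_values = [5, 8, 8, 7]  # hypothetical quantized DC values of four blocks
errors = dc_prediction_errors(dc_values)
print(errors)
```

The errors cluster around zero even when the DC values themselves are large, which is what makes the small categories of the entropy coder the common case.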
Coefficient Category
- Divide coefficients into categories of exponentially increasing sizes
- Use a Huffman code to encode the category ID
- Use a fixed-length code within each category
- Similar to the Exponential Golomb code

Ranges                              Range Size   DC Cat. ID   AC Cat. ID
0                                   1            0            N/A
-1, 1                               2            1            1
-3, -2, 2, 3                        4            2            2
-7, -6, -5, -4, 4, 5, 6, 7          8            3            3
-15, …, -8, 8, …, 15                16           4            4
-31, …, -16, 16, …, 31              32           5            5
-63, …, -32, 32, …, 63              64           6            6
…                                   …            …            …
[-32767, -16384], [16384, 32767]    32768        15           15
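Coding of DC Coefficients
- Encode e(n) = DC(n) - DC(n-1)

DC Cat.   Prediction Errors               Base Codeword
0         0                               …
1         -1, 1                           …
2         -3, -2, 2, 3                    100
3         -7, -6, -5, -4, 4, 5, 6, 7      …
4         -15, …, -8, 8, …, 15            …
5         -31, …, -16, 16, …, 31          …
6         -63, …, -32, 32, …, 63          …
…         …                               …

- Our example: DC = 8; assume the last DC was 5
  - e = 8 - 5 = 3
  - Cat.: 2, index 3 (binary 11)
  - Bitstream: 10011 (base codeword 100 followed by the index bits 11)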
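The category and the fixed-length index can be computed without a lookup table. In this sketch the category is the bit length of |e|, and negative values are indexed from the bottom of the category (so -3, -2, 2, 3 get indices 0, 1, 2, 3), which matches the "index 3" of the example above. Only the base codeword for category 2 is filled in, since it is the one implied by the example bitstream 10011; the remaining Huffman codewords are deliberately left out.

```python
def category(value):
    """Category ID: number of bits needed for |value| (0 for value 0)."""
    return abs(value).bit_length()

def index_bits(value):
    """Fixed-length index of value within its category, as a bit string."""
    cat = category(value)
    if cat == 0:
        return ""  # category 0 holds a single value, so no index is needed
    index = value if value > 0 else value + (1 << cat) - 1
    return format(index, "0{}b".format(cat))

def encode_prediction_error(error, base_codewords):
    """Base (Huffman) codeword of the category followed by the index bits."""
    return base_codewords[category(error)] + index_bits(error)

# Partial table: only the entry implied by the worked example.
base_codewords = {2: "100"}
bitstream = encode_prediction_error(8 - 5, base_codewords)
print(bitstream)
```

Because the index is fixed-length within a category, the Huffman table only needs one entry per category rather than one per coefficient value.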
Coding of AC Coefficients
- Most non-zero coefficients are in the upper-left corner
- Zigzag scanning
- Example: [8x8 block of quantized coefficients]
- Zigzag scanning result (DC is coded separately): [coefficient sequence] EOB
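Zigzag scanning followed by run-level pairing can be sketched as below. The block contents are hypothetical, and the run-level stage is simplified: it always ends with an EOB marker and does not model the special handling of very long zero runs that the full standard defines.

```python
def zigzag_order(n=8):
    """(row, col) positions in zigzag order: anti-diagonals, alternating direction."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )

def zigzag_scan(block):
    """Flatten a square block into a 1-D list in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

def run_level_pairs(ac_coeffs):
    """(zeros before, non-zero value) pairs, terminated by an EOB marker."""
    pairs = []
    run = 0
    for coeff in ac_coeffs:
        if coeff == 0:
            run += 1
        else:
            pairs.append((run, coeff))
            run = 0
    pairs.append("EOB")  # trailing zeros are absorbed by the end-of-block marker
    return pairs

block = [[0] * 8 for _ in range(8)]  # hypothetical quantized block
block[0][0] = 12   # DC, coded separately
block[0][1] = 5
block[1][0] = -3
block[2][1] = 2

scanned = zigzag_scan(block)
pairs = run_level_pairs(scanned[1:])  # skip the DC coefficient
print(pairs)
```

Because the non-zero coefficients concentrate in the upper-left corner, the zigzag order pushes the long run of zeros to the end, where a single EOB replaces it.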
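A Complete Example
[Matrices: original data -> 2-D DCT -> quantized by basic table; zigzag scanning result ending in EOB]
Q table: … 12 … 14 …
Quantization of the first four coefficients (mid-tread: round to the nearest level):
floor(39.8/16 + 0.5) = 2
floor(6.5/11 + 0.5) = 1
-floor(102.4/12 + 0.5) = -9
floor(37.7/12 + 0.5) = 3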
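The four quantized values on this slide can be reproduced with a mid-tread (round-to-nearest) quantizer. This sketch assumes the step sizes 16, 11, 12 and 12, the first four entries of the standard JPEG luminance table in row order, and assumes rounding is done by adding 0.5 before the floor on the magnitude; under those assumptions the results match the slide's 2, 1, -9 and 3.

```python
import math

def quantize(coeff, step):
    """Mid-tread uniform quantizer: round |coeff| / step to the nearest integer."""
    level = math.floor(abs(coeff) / step + 0.5)
    return level if coeff >= 0 else -level

def dequantize(level, step):
    """Inverse quantization: reconstruct at the center of the bin."""
    return level * step

# (DCT coefficient, assumed step size) pairs for the first four coefficients.
examples = [(39.8, 16), (6.5, 11), (-102.4, 12), (37.7, 12)]
levels = [quantize(c, q) for c, q in examples]
reconstructed = [dequantize(l, q) for l, (_, q) in zip(levels, examples)]
print(levels)
print(reconstructed)
```

The gap between each coefficient and its reconstruction is at most half a step, and these gaps are the source of the reconstruction error reported for the decoded block.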
A Complete Example
[Matrices: zigzag scanning result ending in EOB -> inverse quantization -> reconstructed block]
MSE: 5.67
Progressive JPEG
- Baseline JPEG encodes the image block by block:
  - The decoder has to wait until the end to decode and display the entire image
- Progressive: code the DCT coefficients in multiple scans
  - The first scan generates a low-quality version of the entire image
  - Subsequent scans refine the entire image gradually
- Two procedures defined in JPEG:
  - Spectral selection:
    - Divide all DCT coefficients into several bands (low, middle, high frequency subbands…)
    - Bands are coded into separate scans
  - Successive approximation:
    - Send the MSBs of all coefficients first
    - Send the less significant bits in subsequent scans
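Spectral selection can be illustrated on a single block's zigzag-ordered coefficients. The band edges used here (DC alone, then coefficients 1 to 5, then the rest) are a hypothetical choice for illustration, and the sketch covers only the splitting and partial decoding, not the entropy coding of each scan.

```python
def spectral_selection_scans(zigzag_coeffs, band_edges):
    """Split zigzag-ordered coefficients into bands, one band per scan."""
    return [zigzag_coeffs[start:end]
            for start, end in zip(band_edges[:-1], band_edges[1:])]

def decode_after(scans, num_scans, total=64):
    """Coefficients known to the decoder after the first num_scans scans.

    Bands that have not arrived yet are treated as zero.
    """
    received = [c for scan in scans[:num_scans] for c in scan]
    return received + [0] * (total - len(received))

# Hypothetical quantized block in zigzag order.
coeffs = [12, 5, -3, 0, 0, 0, 0, 0, 2] + [0] * 55
band_edges = [0, 1, 6, 64]  # hypothetical bands: DC, low, remaining frequencies

scans = spectral_selection_scans(coeffs, band_edges)
coarse = decode_after(scans, 1)   # DC only: a blocky preview
full = decode_after(scans, 3)     # all bands received
print([len(s) for s in scans])
```

In a real progressive stream each scan carries its band for every block of the image, so the whole picture sharpens together rather than top to bottom.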
JPEG Coding Result for Lena
[Figures: Lena coded at quality factor QF 25 and QF 5, with the blocking artifact labeled]
Summary
- Transformation
  - Karhunen-Loeve Transform (KLT): optimal linear transform for decorrelation and energy compaction
  - Discrete Cosine Transform (DCT): for images & video
  - MDCT: overlapped transform with higher frequency resolution, for audio
  - Discrete Wavelet Transform (DWT): multi-resolution representation
- MP3 & AAC audio coding: FB/MDCT -> Quantization -> Huffman
- JPEG: first international compression standard for continuous-tone still images
  - DCT -> Quantization -> Run-length -> Huffman
- JPEG2000: newer, wavelet-based standard
  - Scalable, progressive coding with flexible functionality