III Digital Audio III.6 (Fr Oct 20) The MP3 algorithm with PAC
The MP3 encoder chain
1. Digital Datastream
2. FFT with Filter Bank
3. Psychoacoustical Model (Perceptual-Audio-Coding Model PAC)
4. Quantization
5. Huffman Compression
6. Frame Outputstream Formatting
[Block diagram with boxes: Audio Data; Filter Bank, 32 Subbands; Psychoacoustical Model; Quantization and Encoding (Check of Quantization loop); External Check; Encoding of Additional Information; Datastream Formatting to Frames etc.; Additional Data; Data Stream 2*16 to Line]
1. Digital Datastream
2 channels (stereo), each 768 kbit/s = 48 000 samples/s × 16 bit/sample
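The arithmetic of step 1 can be checked with a few lines of Python. This is only a sketch of the calculation; the function name `pcm_bitrate` is made up for illustration.

```python
# Uncompressed PCM bit rate, using the figures from step 1.
SAMPLE_RATE_HZ = 48_000   # samples per second, per channel
BITS_PER_SAMPLE = 16      # bits per sample
CHANNELS = 2              # stereo


def pcm_bitrate(sample_rate_hz, bits_per_sample, channels=1):
    """Return the raw PCM bit rate in bits per second (hypothetical helper)."""
    return sample_rate_hz * bits_per_sample * channels


per_channel = pcm_bitrate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
stereo = pcm_bitrate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)

print(per_channel // 1000, "kbit/s per channel")
print(stereo // 1000, "kbit/s for both channels")
```

So the encoder input is 768 kbit/s per channel, twice that for stereo.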
2. FFT with Filter Bank
Important: since the number of samples per second equals the number of Fourier coefficients per second, we speak of “Fourier samples per second”.
2.1 Cut the spectrum 0 – 20 kHz into 32 subbands of 625 Hz each (32 × 625 = 20 000) for windows of 1/40 sec.
2.2 Use the MDCT (Modified Discrete Cosine Transform, a variant of the FFT) to split each 625 Hz band into 18 subbands with variable widths, according to psychoacoustical criteria. This gives 576 = 18 × 32 “lines”.
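The band layout of 2.1 and the line count of 2.2 can be sketched as follows. This is a minimal illustration, not the encoder's actual filter bank: the helper names are hypothetical, the MDCT is the plain textbook definition computed directly (36 inputs give 18 coefficients), and it produces uniformly spaced lines, so windowing and the variable widths mentioned in 2.2 are left out.

```python
import math

NUM_SUBBANDS = 32        # from step 2.1
BAND_WIDTH_HZ = 625      # from step 2.1
LINES_PER_SUBBAND = 18   # from step 2.2


def subband_edges(index):
    """Return (low, high) in Hz of the 0-based subband `index`."""
    low = index * BAND_WIDTH_HZ
    return low, low + BAND_WIDTH_HZ


def subband_of(freq_hz):
    """Return the 0-based subband that contains freq_hz (0 <= f < 20 000)."""
    if not 0 <= freq_hz < NUM_SUBBANDS * BAND_WIDTH_HZ:
        raise ValueError("frequency outside 0 - 20 kHz")
    return int(freq_hz // BAND_WIDTH_HZ)


def mdct(block):
    """Direct O(N^2) MDCT: 2N input samples -> N coefficients (no window)."""
    two_n = len(block)
    if two_n == 0 or two_n % 2:
        raise ValueError("block length must be even and non-zero")
    n = two_n // 2
    return [
        sum(
            block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(two_n)
        )
        for k in range(n)
    ]


total_lines = NUM_SUBBANDS * LINES_PER_SUBBAND
print(total_lines, "lines")                 # 32 subbands x 18 lines each
print(subband_of(4000), subband_edges(6))   # 4 kHz lies in subband 6

# 36 samples of one subband signal give 18 MDCT coefficients.
coeffs = mdct([math.sin(0.3 * t) for t in range(2 * LINES_PER_SUBBAND)])
print(len(coeffs), "coefficients")
```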
4. Lossy Quantization
5. Huffman lossless Compression
Both already discussed: 40 % of compression.
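As a reminder of step 5, here is a minimal sketch of Huffman coding applied to a short list of quantized values. The input values are invented for illustration, and the code builds a generic Huffman tree; real MP3 encoders select from predefined Huffman tables, which this sketch does not reproduce.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from an iterable of symbols."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # two lightest subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the code words of all symbols."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Invert `encode` by reading the bit string one prefix at a time."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical quantized spectral values: mostly zeros, a few small numbers.
quantized = [0, 0, 0, 0, 0, 0, 1, 0, -1, 0, 0, 2, 0, 1, 0, 0]
code = huffman_code(quantized)
bits = encode(quantized, code)
print(len(bits), "bits instead of", 2 * len(quantized), "with 2 bits per value")
```

Frequent values (here 0) get short code words, so the stream shrinks without losing information: decoding returns exactly the quantized values.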
3. Psychoacoustical Model (Perceptual-Audio-Coding Model PAC)
The Psychoacoustical Model (Perceptual-Audio-Coding Model PAC) is the core feature of MP3; it accounts for 60 % of the MP3 compression.
The PAC model is based upon three limitations of human audio perception:
PAC 1: hearing thresholds
PAC 2: auditory masking
PAC 3: temporal masking
All three PAC components generate lossy compression.
PAC 1: hearing thresholds
You do not hear sinusoidal sounds below this threshold of loudness.
[Figure: hearing threshold curve; axes: loudness vs. frequency (kHz)]
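To make the threshold idea concrete, the sketch below uses Terhardt's well-known approximation of the threshold in quiet. The formula is not taken from these slides and is brought in here as an assumption; the function names are hypothetical.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate hearing threshold in dB SPL (Terhardt's formula, assumed)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (
        3.64 * f ** -0.8
        - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
        + 1e-3 * f ** 4
    )


def is_audible(freq_hz, level_db):
    """A lone sinusoid is audible only if it lies above the threshold."""
    return level_db > threshold_in_quiet_db(freq_hz)


for freq in (100, 1000, 3300, 15000):
    print(freq, "Hz:", round(threshold_in_quiet_db(freq), 1), "dB")

# A 10 dB tone is inaudible at 100 Hz but audible near 3.3 kHz,
# where the ear is most sensitive.
print(is_audible(100, 10), is_audible(3300, 10))
```

Components that fall below the curve need not be encoded at all.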
PAC 2: auditory masking
For every sinusoidal component of frequency f and loudness l there is a surrounding masking surface: other frequency/loudness components that fall inside it cannot be heard together with the given one. Example: the 4 kHz/40 dB component (red) masks the blue one.
[Figure: masking surface around a component; axes: loudness vs. frequency]
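A toy version of such a masking surface can be sketched as a triangle on the Bark (critical band) scale. Everything numeric here is an illustrative assumption and not part of the slides: the Hz-to-Bark formula is the common Zwicker/Terhardt approximation, and the 10 dB offset and the two slopes are rough placeholder values, not the parameters of the MP3 psychoacoustic model.

```python
import math

# Illustrative assumptions, not MP3 model parameters.
MASKER_OFFSET_DB = 10.0          # peak of the surface lies below the masker
LOWER_SLOPE_DB_PER_BARK = 25.0   # steep fall-off towards lower frequencies
UPPER_SLOPE_DB_PER_BARK = 10.0   # shallow fall-off towards higher frequencies


def hz_to_bark(freq_hz):
    """Zwicker/Terhardt approximation of the Bark scale (assumed)."""
    return (
        13.0 * math.atan(0.00076 * freq_hz)
        + 3.5 * math.atan((freq_hz / 7500.0) ** 2)
    )


def masking_threshold_db(masker_hz, masker_db, probe_hz):
    """Level below which a probe tone is hidden by the masker (toy model)."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    peak = masker_db - MASKER_OFFSET_DB
    if dz < 0:
        return peak - LOWER_SLOPE_DB_PER_BARK * (-dz)
    return peak - UPPER_SLOPE_DB_PER_BARK * dz


def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    """True if the probe lies inside the masker's masking surface."""
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)


# The slide's example masker: 4 kHz at 40 dB.
print(is_masked(4000, 40, 4200, 20))   # close neighbour, quieter
print(is_masked(4000, 40, 1000, 20))   # far away in frequency
```

Under these assumptions a quieter neighbour at 4.2 kHz is masked, while a tone at 1 kHz with the same level is not.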
PAC 3: temporal masking
For every sinusoidal component of frequency f and loudness l (red), a subsequent component (blue) cannot be heard if it lies below the given curve of loudness in time, because the ear needs some time to “recover” from perceiving the first component. This even holds for sounds shortly before the given one (red curve), because the perception needs time to build up.
[Figure: temporal masking curve around a component; axes: loudness vs. time]
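The time curve can be mimicked with a deliberately crude model: the masking level falls off linearly (in dB) on both sides of the masker, quickly before it and slowly after it. The window lengths of 20 ms and 150 ms and the linear shape are illustrative assumptions, not values from the slides.

```python
# Illustrative assumptions: rough orders of magnitude only.
PRE_MASKING_MS = 20.0     # masking reaches back a short time before the masker
POST_MASKING_MS = 150.0   # masking lasts longer after the masker


def temporal_threshold_db(masker_db, dt_ms):
    """Masking level at time offset dt_ms from the masker (toy model).

    dt_ms > 0: the probe comes after the masker (post-masking).
    dt_ms < 0: the probe comes before the masker (pre-masking).
    """
    span = POST_MASKING_MS if dt_ms >= 0 else PRE_MASKING_MS
    remaining = 1.0 - abs(dt_ms) / span
    return masker_db * max(0.0, remaining)


def is_temporally_masked(masker_db, dt_ms, probe_db):
    """True if the probe lies below the masking curve at its time offset."""
    return probe_db < temporal_threshold_db(masker_db, dt_ms)


# A 60 dB masker: probes 75 ms later and 10 ms earlier.
print(is_temporally_masked(60, 75, 20))    # after the masker, quiet
print(is_temporally_masked(60, 75, 40))    # after the masker, louder
print(is_temporally_masked(60, -10, 20))   # shortly before the masker
```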
6. Frame Outputstream Formatting