Published by Lillie Wamsley. Modified over 9 years ago.
1
MPEG-1 MUMT-614 Jan.23, 2002 Wes Hatch
2
Purpose of MPEG encoding
To decrease the data rate. How?
–two choices:
could decrease the sample rate, but this would reduce the available bandwidth
or could decrease the word size, but this introduces noise into the signal (lower S:N ratio)
Solution:
–perceptual coding: reduce the word size based on signal conditions
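The word-size trade-off can be checked numerically. The sketch below is not part of the original slides: it quantizes a near-full-scale sine with a plain uniform quantizer (an assumption, MPEG's quantizers are more elaborate) and measures the resulting S:N ratio, which should drop by roughly 6 dB for every bit removed.

```python
import math

def quantize(x, bits):
    """Round x (roughly in [-1, 1]) onto a uniform grid of 2**bits levels."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step

def measured_snr_db(bits, n=4096):
    """Measure the S:N ratio of a sine wave quantized to the given word size."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = 0.99 * math.sin(0.1234 * i)   # arbitrary test tone
        err = quantize(x, bits) - x        # quantization noise
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)

snr_16 = measured_snr_db(16)
snr_8 = measured_snr_db(8)
print(f"16-bit: {snr_16:.1f} dB, 8-bit: {snr_8:.1f} dB")
```

Halving the word size from 16 to 8 bits costs about 8 x 6 = 48 dB of S:N, which is why the reduction has to be steered by what the ear can actually hear.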
3
Quick Overview
MPEG removes “irrelevancy” & statistical redundancy
lossy (but not perceptibly so)
reduces 1.41 Mbps (CD audio) to between 64 and 448 kbps (a 95% to 68% reduction)
ratios of 4:1 and 6:1 can be transparent in advanced listening tests
supports 32, 44.1, and 48 kHz sampling rates
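The figures above follow from simple arithmetic. This short sketch assumes standard CD parameters (44.1 kHz, 16 bits, 2 channels); the function name is only illustrative.

```python
CD_SAMPLE_RATE = 44_100   # Hz
CD_WORD_SIZE = 16         # bits per sample
CD_CHANNELS = 2

cd_bps = CD_SAMPLE_RATE * CD_WORD_SIZE * CD_CHANNELS   # 1,411,200 bits/s

def reduction_percent(target_kbps):
    """Percentage of the CD data rate saved at the given target bitrate."""
    return 100.0 * (1.0 - target_kbps * 1000.0 / cd_bps)

for kbps in (64, 448):
    print(f"{kbps} kbps: {reduction_percent(kbps):.1f}% reduction, "
          f"{cd_bps / (kbps * 1000):.2f}:1")

# Bitrates corresponding to the 4:1 and 6:1 ratios mentioned above
for ratio in (4, 6):
    print(f"{ratio}:1 -> {cd_bps / ratio / 1000:.1f} kbps")
```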
4
MPEG-1 Types
3 layers: I, II, and III
–I is the simplest; III is the most complex
–a decoder for a given layer can also play encodings of the layers beneath it, e.g. a Layer III decoder can play I, II, and III; a Layer II decoder can play only I and II
5
Components
There are two “components” to MPEG-1: the encoder and the decoder.
–only the decoder is actually described by the specification; the encoder is not.
–improvements to the encoder have an immediate effect on quality without requiring corresponding changes to the decoder
6
Encoder vs. Decoder
Encoder
–does all the work
–forward adaptive encoding: all allocation of bits is performed by the encoder
–the psychoacoustic model used to determine “irrelevant” data is contained here
–improving the psychoacoustic model or otherwise changing the encoder doesn’t require changing the decoder
Decoder
–does less work
7
Encoder Details
audio (PCM) passes through a polyphase filter bank, splitting the signal into 32 bands
–the filter outputs one sample per band for every 32 input samples
layer I: after each band has 12 samples, the encoder determines the bit allocation for that band
layer II: operates on 12 x 3 = 36 samples per band (larger frame). Lower bands may receive 15 bits, middle bands 7 bits, and high bands 3 bits max
layer III is different…we’ll come back to it.
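Because the filter bank emits one sample per band for every 32 input samples, the per-band sample counts above fix the frame size. A small sketch of that arithmetic (the names are illustrative, not from the slides):

```python
NUM_BANDS = 32
SAMPLES_PER_BAND = {"I": 12, "II": 36}   # per frame, from the slide above

def frame_samples(layer):
    """PCM samples consumed per frame: one per band for every 32 inputs."""
    return NUM_BANDS * SAMPLES_PER_BAND[layer]

def frame_ms(layer, sample_rate_hz):
    """Frame duration in milliseconds at the given sampling rate."""
    return 1000.0 * frame_samples(layer) / sample_rate_hz

for layer in ("I", "II"):
    print(f"Layer {layer}: {frame_samples(layer)} samples, "
          f"{frame_ms(layer, 48_000):.1f} ms at 48 kHz")
```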
8
Encoder Details
An FFT is performed (w/ Hann window)
–512 point for layer I
–1024 point for layer II
a psychoacoustic model analyzes the FFT output to calculate masking thresholds, which are used to determine which components are audible (i.e. the signal-to-mask ratio, SMR)
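As a rough illustration of the analysis step, the sketch below applies a Hann window to 512 samples and computes the level of each frequency bin. It uses a naive DFT in pure Python rather than a real FFT, omits any normalization, and stops before the masking-threshold calculation, so it only shows the spectrum a psychoacoustic model would start from. All names are illustrative.

```python
import cmath
import math

def hann_window(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def windowed_spectrum_db(samples):
    """Level in dB (arbitrary reference) of each bin up to Nyquist."""
    n = len(samples)
    windowed = [s * w for s, w in zip(samples, hann_window(n))]
    # Precompute the complex exponentials used by the DFT
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    levels = []
    for k in range(n // 2):
        acc = sum(windowed[i] * twiddle[(k * i) % n] for i in range(n))
        levels.append(10 * math.log10(max(abs(acc) ** 2, 1e-20)))
    return levels

FFT_SIZE = 512   # layer I size from the slide above
tone = [math.sin(2 * math.pi * 40 * i / FFT_SIZE) for i in range(FFT_SIZE)]
levels = windowed_spectrum_db(tone)
peak_bin = levels.index(max(levels))
print("peak bin:", peak_bin, "=", peak_bin * 48_000 / FFT_SIZE, "Hz at 48 kHz")
```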
9
More details...
10
How bits are allocated
the data in each band is coded, NOT the FFT data.
the more “audible” components (i.e. those highest above the masking threshold) are assigned the most bits
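One way to picture this is a greedy loop that keeps giving a bit to the band whose quantization noise is currently furthest above the mask. The sketch below is a simplification under stated assumptions: each extra bit is taken to lower the noise by about 6.02 dB, every band holds 12 samples, and the standard's allocation tables are not reproduced. The function name and parameters are hypothetical.

```python
def allocate_bits(smr_db, bit_pool, samples_per_band=12, max_bits=15):
    """Greedily assign bits per sample to the bands with the worst noise-to-mask ratio."""
    bits = [0] * len(smr_db)
    remaining = bit_pool
    while remaining >= samples_per_band:
        # Noise-to-mask ratio of every band that can still take a bit
        candidates = [
            (smr - 6.02 * b, band)
            for band, (smr, b) in enumerate(zip(smr_db, bits))
            if b < max_bits
        ]
        if not candidates:
            break
        nmr, band = max(candidates)
        if nmr <= 0:
            break   # all remaining noise is already below the mask
        bits[band] += 1
        remaining -= samples_per_band
    return bits

# Four bands with SMRs of 20, 5, -3, and 12 dB and room for 6 extra bits per sample
print(allocate_bits([20, 5, -3, 12], bit_pool=72))
```

The band with a negative SMR is already masked and receives no bits at all, which is where the data-rate saving comes from.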
11
Encoder Details
A scale factor is calculated
–the largest sample value in the band for each frame is found, and each of the 12 samples in the band is divided by this factor
–layer II has 3 scale factors (for 3 groups of 12 samples), but one may suffice if the differences are small
Corresponds to the max. SPL in each band
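A minimal sketch of the scaling step, assuming floating-point samples. A real encoder picks the scale factor from a fixed table and signals which of the three layer II factors are transmitted; here the sharing rule is a hypothetical relative tolerance chosen only for illustration.

```python
def scale_factor(samples):
    """Largest absolute sample value in the group."""
    return max(abs(s) for s in samples)

def normalize(samples):
    """Divide every sample by the scale factor so the peak becomes 1.0."""
    scale = scale_factor(samples)
    if scale == 0:
        return 0.0, list(samples)
    return scale, [s / scale for s in samples]

def layer2_scale_factors(samples36, tolerance=0.05):
    """Three factors for 3 groups of 12, collapsed to one if they barely differ."""
    groups = [samples36[i:i + 12] for i in (0, 12, 24)]
    factors = [scale_factor(g) for g in groups]
    # 'tolerance' is an illustrative threshold, not a value from the standard
    if max(factors) - min(factors) <= tolerance * max(factors):
        return [max(factors)]
    return factors

steady = [0.5, -0.5] * 18
transient = [0.01] * 12 + [0.9] * 12 + [0.3] * 12
print(layer2_scale_factors(steady))      # one shared factor
print(layer2_scale_factors(transient))   # three separate factors
```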
12
Encoder Schematic
13
Encoder Details (layer III)
layer III:
–each band is transformed into 18 spectral coefficients with an MDCT (50% overlap)
gives 32 x 18 = 576 coefficients, each representing a BW of 41.67 Hz at 48 kHz, 24 ms
–the window size of the MDCT is variable, from a long window for steady-state signals (36 samples) to short windows for transients (12 samples)
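The numbers on this slide can be reproduced directly. The sketch below also includes a direct-form MDCT using the textbook definition (2N inputs give N coefficients) with no window applied, so it is an illustration of the transform sizes rather than the layer III implementation. The 24 ms figure is consistent with a 1152-sample frame at 48 kHz, assuming that is what the slide refers to.

```python
import math

def mdct(block):
    """Direct-form MDCT: 2N input samples -> N coefficients (no window applied)."""
    n_out = len(block) // 2
    return [
        sum(
            x * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
            for n, x in enumerate(block)
        )
        for k in range(n_out)
    ]

SUBBANDS = 32
COEFFS_PER_BAND = 18
total_coeffs = SUBBANDS * COEFFS_PER_BAND     # 576
nyquist_hz = 48_000 / 2
bandwidth_hz = nyquist_hz / total_coeffs      # about 41.67 Hz per coefficient

long_block = [math.sin(0.3 * n) for n in range(36)]
short_block = long_block[:12]
print(total_coeffs, round(bandwidth_hz, 2))
print(len(mdct(long_block)), "coefficients from the long window")
print(len(mdct(short_block)), "coefficients from a short window")
```

Because consecutive windows overlap by 50%, each 36-sample window contributes only 18 new coefficients per band.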
14
Encoder Details (layer III)
frame rate varies in layer III
can also draw on a bit reservoir when more accuracy is needed
Huffman encoding is employed
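To show why Huffman encoding saves bits, the sketch below builds code lengths from symbol counts so that frequent values get short codes. This illustrates the principle only: it builds a code from the data itself, whereas layer III selects from predefined tables, and the function name is illustrative.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from counts."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {sym: 1 for sym in counts}
    # Heap entries: (count, unique tie-breaker, symbols in this subtree)
    heap = [(c, i, [sym]) for i, (sym, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1   # merged symbols sit one level deeper in the tree
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths

# Quantized coefficients are mostly small values, so small values get short codes
data = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
lengths = huffman_code_lengths(data)
total_bits = sum(lengths[s] for s in data)
print(lengths, total_bits, "bits instead of", 2 * len(data))
```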
15
Encoder
a portion of the data stream is consumed by coding info:
–headers
–bit allocation info
–scale factors
–samples from each band
16
Other Features
joint stereo coding
–stereophonic irrelevance/redundancy is eliminated
–sum and difference signals (layer III)
–L/R high-frequency band samples are summed into one channel, but the scale factors remain independent
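The sum and difference idea can be sketched in a few lines. This version halves the sum and difference for simplicity, which is an assumption and not necessarily the scaling the standard uses. When the two channels are similar, the difference signal is close to zero and needs very few bits.

```python
def ms_encode(left, right):
    """Convert L/R samples to mid (sum) and side (difference) signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Recover L/R from the mid and side signals."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

left = [0.5, 0.25, -0.75]
right = [0.5, 0.2, -0.7]
mid, side = ms_encode(left, right)
decoded_left, decoded_right = ms_decode(mid, side)
print("side signal:", side)
```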
17
Decoder Details
Put the signal back together:
–decode the bit allocation info
–samples are multiplied by scale factors and run through an inverse filterbank
–delays typically range from 10 to 30 ms
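The rescaling step on the decoder side is small, which is part of why the decoder does less work. The sketch below assumes each coded sample is a signed integer spanning the range allowed by the band's bit allocation; that mapping is a simplification, and the inverse filterbank is omitted entirely. The function name is hypothetical.

```python
def requantize(codes, scale, bits):
    """Map signed integer codes back to sample values using the band's scale factor."""
    levels = 2 ** (bits - 1) - 1          # largest code magnitude for this word size
    return [code / levels * scale for code in codes]

# One band decoded with an 8-bit allocation and a scale factor of 0.4
band = requantize([127, -127, 64, 0], scale=0.4, bits=8)
print(band)
```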
18
Decoder schematic
19
Summary
Split the signal into 32 bands
determine max. SPL levels for each band
FFT to calculate masking thresholds
–determine the global masking curve
calculate the SMR for each band and assign bits accordingly