1 Audio Compression Techniques
MUMT 611, January 2005
Assignment 2
Paul Kolesnik
2 Introduction
Digital Audio Compression
- Removal of redundant or otherwise irrelevant information from an audio signal
- Audio compression algorithms are often referred to as “audio encoders”
Applications
- Reduces required storage space
- Reduces required transmission bandwidth
3 Audio Compression
Audio signal – overview
- Sampling rate (number of samples per second)
- Bit rate (number of bits per second); typically, an uncompressed 16-bit, 44.1 kHz stereo signal has a bit rate of about 1.4 Mbit/s (worked out below)
- Number of channels (mono / stereo / multichannel)
Reduction is achieved by lowering these values or by data compression / encoding
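A quick sanity check of the ~1.4 Mbit/s figure, as a minimal sketch using only the example values from the slide:

```python
# Uncompressed PCM bit rate = sampling rate * bits per sample * number of channels
sample_rate = 44_100       # samples per second
bits_per_sample = 16
channels = 2               # stereo

bit_rate = sample_rate * bits_per_sample * channels
print(bit_rate)            # 1411200 bits/s, i.e. about 1.4 Mbit/s
```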
4 Audio Data Compression
Redundant information
- Implicit in the remaining information
- Ex. an oversampled audio signal
Irrelevant information
- Perceptually insignificant
- Cannot be recovered from the remaining information
5 Audio Data Compression
Lossless Audio Compression
- Removes redundant data
- Resulting signal is the same as the original – perfect reconstruction
Lossy Audio Encoding
- Removes irrelevant data
- Resulting signal is similar to the original
6 Audio Data Compression
Audio vs. Speech Compression Techniques
- Speech compression uses a model of the human vocal tract to compress signals
- Audio compression does not use this technique because of the much larger variety of possible signals
7 Generic Audio Encoder
8 Psychoacoustic Model
- Psychoacoustics – the study of how sounds are perceived by humans
- Uses perceptual coding to eliminate information from the audio signal that is inaudible to the ear
- Detects conditions under which different audio signal components mask each other
9 Psychoacoustic Model
Signal Masking
- Threshold cut-off
- Spectral (frequency / simultaneous) masking
- Temporal masking
Threshold cut-off and spectral masking occur in the frequency domain; temporal masking occurs in the time domain
10 Signal Masking
Threshold cut-off
- The hearing threshold level is a function of frequency
- Any frequency components below the threshold will not be perceived by the human ear (see the sketch below)
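The slides do not give a formula for the threshold curve; the sketch below uses the Terhardt-style approximation of the threshold in quiet that is common in the perceptual coding literature, purely to illustrate dropping components that fall below the curve:

```python
import numpy as np

def threshold_in_quiet_db(f_hz):
    """Approximate absolute threshold of hearing (dB SPL) vs. frequency.
    Terhardt-style approximation; an assumption here, not from the slides."""
    f = np.asarray(f_hz, dtype=float) / 1000.0   # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# Components whose level falls below the curve are inaudible and can be dropped.
freqs = np.array([100.0, 1000.0, 4000.0, 16000.0])    # Hz
levels_db = np.array([20.0, 5.0, -5.0, 10.0])         # hypothetical component levels
print(levels_db > threshold_in_quiet_db(freqs))       # [False  True False False]
```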
11 Signal Masking
Spectral Masking
- A frequency component can be partly or fully masked by another component that is close to it in frequency
- This shifts the hearing threshold (illustrated below)
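As a toy illustration of how a masker raises the threshold around it, the sketch below spreads a single masker across the Bark (critical band) scale; the slopes of 27 dB/Bark below and 10 dB/Bark above the masker and the fixed -10 dB offset are simplified textbook values, not taken from the slides:

```python
import numpy as np

def masked_threshold_db(masker_bark, masker_level_db, component_bark):
    """Threshold contributed by one masker, on the Bark scale.
    Simplified triangular spreading: steeper toward lower frequencies."""
    dz = np.asarray(component_bark, dtype=float) - masker_bark
    slope = np.where(dz < 0, 27.0, -10.0)
    return masker_level_db - 10.0 + slope * dz

# A 70 dB masker at 10 Bark masks a 40 dB component one Bark above it (50 dB > 40 dB).
print(masked_threshold_db(10.0, 70.0, [8.0, 11.0, 14.0]))   # [ 6. 50. 20.]
```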
12 Signal Masking
Temporal Masking
- A quieter sound can be masked by a louder sound if they are temporally close
- Sounds that occur both shortly before and shortly after the louder sound can be masked
13 Spectral Analysis
Tasks of Spectral Analysis
- To derive masking thresholds that determine which signal components can be eliminated
- To generate a representation of the signal to which the masking thresholds can be applied
Spectral analysis is done with transforms or filter banks
14 Spectral Analysis
Transforms
- Fast Fourier Transform (FFT)
- Discrete Cosine Transform (DCT) – similar to the FFT but uses only cosine basis functions
- Modified Discrete Cosine Transform (MDCT) [used by MPEG-1 Layer III, MPEG-2 AAC, Dolby AC-3] – an overlapped and windowed variant of the DCT (see the sketch below)
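A direct, unoptimized sketch of the MDCT with a sine window, just to make the “overlapped and windowed” idea concrete; real coders use fast, lapped implementations, but the transform is the same:

```python
import numpy as np

def mdct(block):
    """MDCT of one block of 2N samples -> N coefficients.
    Consecutive blocks are taken with 50% overlap; this is the plain
    O(N^2) textbook form with a sine window."""
    two_n = len(block)
    n_half = two_n // 2
    n = np.arange(two_n)
    k = np.arange(n_half)
    window = np.sin(np.pi / two_n * (n + 0.5))                      # sine window
    phase = np.pi / n_half * np.outer(k + 0.5, n + 0.5 + n_half / 2)
    return np.cos(phase) @ (window * block)

coeffs = mdct(np.random.randn(1024))    # 1024 samples -> 512 coefficients
print(coeffs.shape)                     # (512,)
```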
15 Spectral Analysis
Filter Banks
- Blocks of time samples are passed through a set of bandpass filters
- Masking thresholds are applied to the resulting frequency subband signals
- Polyphase and wavelet banks are the most popular filter structures
16 Filter Bank Structures
Polyphase Filter Bank [used in all of the MPEG-1 encoders]
- The signal is separated into subbands whose widths are equal over the entire frequency range
- The resulting subband signals are downsampled to create shorter signals (which are later reconstructed during the decoding process), as in the sketch below
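The sketch below is not the efficient polyphase structure itself, only the equivalent “equal-width bandpass filters followed by downsampling” operation it computes; the band count and filter length are arbitrary choices:

```python
import numpy as np
from scipy.signal import firwin, lfilter

def subband_analysis(x, num_bands=32, taps=511):
    """Split a signal into equal-width subbands and decimate each one.
    MPEG-1 computes the same thing with an efficient polyphase/DCT form."""
    edges = np.linspace(0.0, 1.0, num_bands + 1)         # band edges, Nyquist = 1.0
    subbands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if lo == 0.0:
            h = firwin(taps, hi)                         # lowest band: lowpass
        elif hi == 1.0:
            h = firwin(taps, lo, pass_zero=False)        # highest band: highpass
        else:
            h = firwin(taps, [lo, hi], pass_zero=False)  # bandpass
        subbands.append(lfilter(h, 1.0, x)[::num_bands]) # filter, keep every Nth sample
    return subbands

bands = subband_analysis(np.random.randn(44100))
print(len(bands), bands[0].shape)       # 32 bands, each ~1/32 the original length
```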
17 Filter Bank Structures
Wavelet Filter Bank [used by the Enhanced Perceptual Audio Coder (EPAC) by Lucent]
- Unlike the polyphase bank, the subbands are not of equal width (they are wider at higher frequencies and narrower at lower frequencies)
- This gives better time resolution at high frequencies (ex. short attacks), but at the expense of frequency resolution
18 Noise Allocation
System task: derive the shifted hearing threshold and apply it to the input signal
- Anything below the threshold does not need to be transmitted
- Any noise below the threshold is irrelevant
Frequency component quantization
- Tradeoff between space and noise
- The encoder saves space by using just enough bits for each frequency component to keep quantization noise under the threshold – this is known as noise allocation (see the sketch below)
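A toy version of the space/noise tradeoff, using the standard rule of thumb that each additional bit of quantization lowers the noise floor by about 6 dB (the 6.02 dB/bit figure and the per-band values are assumptions for illustration, not from the slides):

```python
import math

def bits_needed(signal_db, threshold_db, db_per_bit=6.02):
    """Smallest number of bits that keeps quantization noise below the
    masking threshold for one frequency component."""
    smr = signal_db - threshold_db        # signal-to-mask ratio
    if smr <= 0:
        return 0                          # already below the threshold: drop it
    return math.ceil(smr / db_per_bit)

# One component level and one masked threshold (dB) per band.
levels = [60.0, 42.0, 18.0, 35.0]
thresholds = [20.0, 30.0, 25.0, 34.0]
print([bits_needed(s, t) for s, t in zip(levels, thresholds)])   # [7, 2, 0, 1]
```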
19 Noise Allocation
Pre-echo
- If a single audio block contains silence followed by a loud attack, a pre-echo error occurs: after decoding there is audible noise in the silent part of the block
- This is mitigated by monitoring the audio data at the encoding stage and splitting the audio into shorter blocks where pre-echo is likely
- This does not completely eliminate pre-echo, but can make it short enough to be masked by the attack (temporal masking)
20 Pre-echo Effect
21 Additional Encoding Techniques
Other encoding techniques are available (as alternatives or in combination)
- Predictive Coding
- Coupling / Delta Encoding
- Huffman Encoding
22 Additional Encoding Techniques
Predictive Coding
- Often used in speech and image compression
- Estimates the expected value of each sample from previous sample values
- Transmits/stores the difference between the expected and actual value
- The decoder generates an estimate for the next sample and then adjusts it by the difference stored for the current sample (see the sketch below)
- Used for additional compression in MPEG-2 AAC
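A first-order, integer-valued sketch of the predict-and-store-the-difference idea (MPEG-2 AAC actually uses adaptive, higher-order predictors; this shows only the principle):

```python
def dpcm_encode(samples):
    """Predict each sample as the previous one and store only the error."""
    residuals, prediction = [], 0
    for s in samples:
        residuals.append(s - prediction)   # usually a small number
        prediction = s                     # next prediction = current sample
    return residuals

def dpcm_decode(residuals):
    """Rebuild each sample as prediction + stored difference."""
    samples, prediction = [], 0
    for r in residuals:
        prediction += r
        samples.append(prediction)
    return samples

x = [100, 102, 104, 103, 101]
print(dpcm_encode(x))                      # [100, 2, 2, -1, -2]
assert dpcm_decode(dpcm_encode(x)) == x    # perfect reconstruction
```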
23 Additional Encoding Techniques
Coupling / Delta Encoding
- Used when the audio signal consists of two or more channels (stereo or surround sound)
- Similarities between the channels are exploited for compression
- The sum and difference of two channels are derived; the difference is usually close to zero and therefore requires less space to encode (see the sketch below)
- This is a lossless encoding process
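A minimal sum/difference (mid/side) sketch for one stereo pair; the sample values are made up for illustration:

```python
import numpy as np

def ms_encode(left, right):
    """Sum and difference of two channels; loses no information."""
    return left + right, left - right

def ms_decode(mid, side):
    return (mid + side) / 2, (mid - side) / 2

left = np.array([0.50, 0.52, 0.49])
right = np.array([0.48, 0.51, 0.50])
mid, side = ms_encode(left, right)
print(side)                                # difference stays close to zero
l2, r2 = ms_decode(mid, side)
assert np.allclose(l2, left) and np.allclose(r2, right)   # lossless round trip
```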
24 Additional Encoding Techniques
Huffman Coding
- An information-theory-based technique
- An element that recurs often in the signal is represented by a shorter symbol, and its value is stored in a look-up table
- Implemented with look-up tables in both the encoder and the decoder
- Provides substantial lossless compression, but requires high computational power and is therefore not very popular
- Used by MPEG-1 and MPEG-2 AAC (see the sketch below)
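A minimal Huffman table builder, only to show how frequent values end up with shorter codes; audio coders such as MPEG-1 Layer III and AAC use fixed, pre-computed tables rather than building one per signal:

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Map each symbol to a bit string; more frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate case: a single symbol
        return {sym: "0" for sym in freq}
    # Heap entries: (frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

quantized = [0, 0, 0, 0, 1, 1, -1, 2]        # e.g. quantized spectral values
table = huffman_codes(quantized)
encoded = "".join(table[v] for v in quantized)
print(table)                                 # 0 gets the shortest code
print(len(encoded), "bits vs", 2 * len(quantized), "for fixed 2-bit codes")
```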
25 Encoding - Final Stages
- Audio data is packed into frames
- Frames are stored or transmitted
26 Conclusion
- HTML Bibliography: http://www.music.mcgill.ca/~pkoles
- Questions