1 Audio Compression
Multimedia Systems (Module 4 Lesson 4)

Summary:
- Simple Audio Compression
  - Lossy: prediction based
- Psychoacoustic Model
- MPEG Audio
  - Layers I and II
  - MP3 (MPEG Layer III)

Sources:
- Dr. Ze-Nian Li's course material at: /365/li/
- MPEG Audio: tml

2 Simple Audio Compression Methods
- Silence compression: detect the "silence" and code it compactly, similar to run-length coding.
- Adaptive Differential Pulse Code Modulation (ADPCM), e.g., in CCITT G.721 -- 16 or 32 kbits/sec.
  - Encode the difference between two or more consecutive samples; the difference is then quantized --> hence the loss.
  - Adaptive quantization.
  - It is necessary to predict where the waveform is headed.
  - Apple has a proprietary scheme called ACE/MACE, a lossy scheme that tries to predict where the wave will go in the next sample. Gives about 2:1 compression.
- Linear Predictive Coding (LPC) fits the signal to a speech model and then transmits the parameters of the model. It sounds like a computer talking, at 2.4 kbits/sec.
- Code Excited Linear Prediction (CELP) does LPC, but also transmits the error term --> audio-conferencing quality at 4.8 kbits/sec.
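As a rough illustration of the predict-then-quantize-the-difference idea behind ADPCM (this is a toy sketch, not the G.721 algorithm; the step-size adaptation rule and parameters are assumptions):

```python
import numpy as np

def adpcm_like_encode(samples, bits=4):
    """Toy differential coder: predict each sample as the previous
    reconstruction, quantize the difference with an adaptive step size."""
    step = 1.0                      # adaptive quantizer step size
    prediction = 0.0                # simplest predictor: previous reconstructed sample
    levels = 2 ** (bits - 1)        # signed quantizer range
    codes, recon = [], []
    for x in samples:
        diff = x - prediction
        code = int(np.clip(round(diff / step), -levels, levels - 1))
        codes.append(code)
        prediction += code * step   # a decoder mirroring these updates recovers the same value
        recon.append(prediction)
        # crude adaptation: widen the step when the quantizer saturates,
        # shrink it when the difference is small
        step *= 1.5 if abs(code) >= levels - 1 else 0.9
        step = max(step, 1e-3)
    return codes, recon

if __name__ == "__main__":
    t = np.arange(0, 0.01, 1 / 8000.0)             # 10 ms at 8 kHz
    signal = 1000 * np.sin(2 * np.pi * 440 * t)    # 440 Hz test tone
    codes, recon = adpcm_like_encode(signal)
    err = np.sqrt(np.mean((np.array(recon) - signal) ** 2))
    print(f"RMS reconstruction error: {err:.1f}")
```

Each sample is sent as a small code (here 4 bits) rather than a full 16-bit value; the loss comes from quantizing the difference.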

3 Psychoacoustic Model
Human hearing and voice:
- Frequency range is about 20 Hz to 20 kHz; hearing is most sensitive at 1 to 5 kHz.
- Dynamic range (quietest to loudest) is about 96 dB.
- Normal voice range is about 500 Hz to 2 kHz.
  - Low frequencies carry vowels and bass.
  - High frequencies carry consonants.

How sensitive is human hearing? To answer this question we look at the following concepts:
- Threshold of hearing: describes the notion of "quietness".
- Frequency masking: a component (at a particular frequency) masks components at neighboring frequencies. Such masking may be partial.
- Temporal masking: when two tones (samples) are played close together in time, one can mask the other.

4 Threshold of Hearing
Experiment: put a person in a quiet room. Raise the level of a 1 kHz tone until it is just barely audible. Vary the frequency and plot the resulting threshold (dB) against frequency (kHz).
- The ear is most sensitive to frequencies between 1 and 5 kHz, where we can actually hear signals below 0 dB.
- Two tones of equal power but different frequencies will not be equally loud.
- Sensitivity decreases at low and high frequencies.
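Perceptual coders often approximate this measured curve with a closed-form expression; a commonly cited one is Terhardt's approximation, sketched below (this is a general approximation, not necessarily the exact curve shown on the slide):

```python
import numpy as np

def threshold_of_hearing_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL).
    f_hz may be a scalar or a NumPy array of frequencies in Hz."""
    f = np.asarray(f_hz, dtype=float) / 1000.0   # work in kHz
    return (3.64 * f ** -0.8
            - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

if __name__ == "__main__":
    for f in (100, 500, 1000, 4000, 10000):
        print(f"{f:>6} Hz: {float(threshold_of_hearing_db(f)):6.1f} dB SPL")
```

Evaluating it confirms the bullet points above: the threshold dips below 0 dB around 2-5 kHz and rises steeply at low and high frequencies.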

5 Frequency Masking
Experiment: play a 1 kHz tone (the masking tone) at a fixed level (60 dB). Play a test tone at a nearby frequency (e.g., 1.1 kHz) and raise its level until it is just distinguishable. Vary the frequency of the test tone and plot the threshold at which it becomes audible.

6 Frequency Masking (Contd.)
- Repeat the previous experiment for various frequencies of the masking tone. Each masking tone raises the audibility threshold in a neighborhood around its own frequency, and the affected region grows wider as the masking frequency increases.

7 Temporal Masking
- If we hear a loud sound and it then stops, it takes a little while until we can hear a soft tone nearby (in frequency).
- Experiment:
  - Play a 1 kHz masking tone at 60 dB, plus a test tone at 1.1 kHz at 40 dB. The test tone can't be heard (it's masked).
  - Stop the masking tone, then stop the test tone after a short delay.
  - Adjust the delay to the shortest time at which the test tone can be heard (e.g., 5 ms).
  - Repeat with different levels of the test tone and plot the results.

8 Net effect of masking: together, the threshold of hearing and the frequency and temporal masking curves define a time-varying masking threshold; signal components that fall below it are inaudible and need not be encoded.

9 MPEG Audio Facts
- The two most common advanced (beyond simple ADPCM) techniques for audio coding are:
  - Sub-Band Coding (SBC) based
  - Adaptive Transform Coding based
- MPEG audio coding comprises three independent layers. Each layer is a self-contained SBC coder with its own time-frequency mapping, psychoacoustic model, and quantizer.
  - Layer I: uses sub-band coding.
  - Layer II: uses sub-band coding (longer frames, more compression).
  - Layer III: uses both sub-band coding and transform coding.
- MPEG-1 Audio is intended to take a PCM audio signal sampled at 32, 44.1, or 48 kHz and encode it at a bit rate of 32 to 192 kbps per audio channel (depending on the layer).

10 More Facts
- MPEG-1: bit rate of 1.5 Mbits/sec for audio and video together; about 1.2 Mbits/sec for video and 0.3 Mbits/sec for audio.
  - (Uncompressed CD audio is 44,100 samples/sec * 16 bits/sample * 2 channels > 1.4 Mbits/sec.)
- Compression factor ranging from 2.7 to 24.
- With a compression rate of 6:1 (16-bit stereo sampled at 48 kHz is reduced to 256 kbits/sec):
  - Under optimal listening conditions, expert listeners could not distinguish between coded and original audio clips.
- Supports one or two audio channels in one of four modes:
  1. Monophonic -- a single audio channel.
  2. Dual-monophonic -- two independent channels, e.g., English and French.
  3. Stereo -- stereo channels that share bits, but without Joint-stereo coding.
  4. Joint-stereo -- takes advantage of the correlations between the stereo channels.
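The bit-rate figures above follow from simple arithmetic; a quick check (values taken from the slide):

```python
# Sanity check of the bit-rate figures quoted above.
cd_rate = 44_100 * 16 * 2            # CD audio: samples/sec * bits/sample * channels
print(cd_rate)                       # 1,411,200 bits/sec -> just over 1.4 Mbits/sec

studio_rate = 48_000 * 16 * 2        # 16-bit stereo sampled at 48 kHz
print(studio_rate / 6)               # 256,000 bits/sec at a 6:1 compression ratio
```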

11 MPEG Coding Algorithm
1. Use convolution filters to divide the audio signal (e.g., 48 kHz sound) into 32 frequency sub-bands (sub-band filtering).
2. Determine the amount of masking for each band caused by nearby bands, using the psychoacoustic model.
3. If the power in a band is below the masking threshold, don't encode it.
4. Otherwise, determine the number of bits needed to represent the coefficient such that the noise introduced by quantization is below the masking effect. (Recall that one fewer bit of quantization introduces about 6 dB of noise.)
5. Format the bitstream.

Block diagram: Input --> Filter into critical bands (sub-band filtering) --> Compute masking (psychoacoustic model) --> Allocate bits (quantization) --> Format bitstream --> Output.
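A highly simplified sketch of steps 2-4 follows. The masking thresholds are taken as given inputs (in a real coder they come from the psychoacoustic model), and the 6-dB-per-bit rule from step 4 is applied directly; this is an illustration, not the standard's bit-allocation procedure:

```python
import math

def allocate_bits(band_levels_db, masking_thresholds_db, max_bits=16):
    """For each sub-band, decide how many quantization bits to spend so that
    the quantization noise stays below the masking threshold.
    Assumes roughly 6 dB of signal-to-noise ratio per bit (step 4)."""
    allocations = []
    for level, threshold in zip(band_levels_db, masking_thresholds_db):
        if level < threshold:
            allocations.append(0)                      # masked band: don't encode it (step 3)
        else:
            bits = math.ceil((level - threshold) / 6)  # enough bits to push noise below the mask
            allocations.append(min(max(bits, 1), max_bits))
    return allocations

# Numbers from the example on the next slide: band 7 (10 dB, threshold 12 dB)
# is skipped; band 9 (35 dB, threshold 15 dB) gets 4 bits.
print(allocate_bits([10, 60, 35], [12, 0, 15]))        # -> [0, 10, 4]
```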

12 Masking and Quantization (Example)
- Say performing the sub-band filtering step on the input results in the following levels (for demonstration, we look only at the first 16 of the 32 bands; the three bands used below are shown, the rest of the slide's table is omitted):

    Band:       ...    7     8     9   ...
    Level (dB): ...   10    60    35   ...

- The 60 dB level of the 8th band gives a masking of 12 dB in the 7th band and 15 dB in the 9th (according to the psychoacoustic model).
- The level in the 7th band is 10 dB (< 12 dB), so ignore it.
- The level in the 9th band is 35 dB (> 15 dB), so send it.
  - We only send the amount above the masking level.
  - Therefore, instead of using 6 bits to encode it, we can use 4 bits -- a saving of 2 bits (= 12 dB).
- This is what step 4 means by "determine the number of bits needed to represent the coefficient such that the noise introduced by quantization is below the masking effect" [noise introduced = 12 dB; masking = 15 dB].

13 MPEG Coding Specifics
(Diagram) The audio samples pass through a bank of 32 sub-band filters (sub-band filter 0, 1, 2, ..., 31). Each filter contributes 12 samples to a Layer I frame (12 x 32 = 384 samples); a Layer II/III frame groups three such sets per sub-band (36 x 32 = 1152 samples).
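A minimal sketch of that grouping (the array shapes and helper name are illustrative, not from any MPEG reference code):

```python
import numpy as np

def group_into_frames(subband_samples, layer=1):
    """Group filterbank output into MPEG audio frames.
    subband_samples: array of shape (num_steps, 32) -- one sample per sub-band
    per filterbank step, where each step consumes 32 PCM samples.
    Layer I frames hold 12 steps (12 x 32 = 384 PCM samples);
    Layers II and III hold 36 steps (36 x 32 = 1152 PCM samples)."""
    steps_per_frame = 12 if layer == 1 else 36
    usable = (len(subband_samples) // steps_per_frame) * steps_per_frame
    return subband_samples[:usable].reshape(-1, steps_per_frame, 32)

dummy = np.zeros((480, 32))                      # 480 filterbank steps
print(group_into_frames(dummy, layer=1).shape)   # (40, 12, 32) -> 40 Layer I frames
print(group_into_frames(dummy, layer=2).shape)   # (13, 36, 32) -> 13 Layer II/III frames
```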

14 MPEG Coding Specifics
- MPEG Layer I
  - The filter is applied one frame (12 x 32 = 384 samples) at a time. At 48 kHz, each frame carries 8 ms of sound.
  - Uses a 512-point FFT to get detailed spectral information about the signal for the psychoacoustic model. The sub-band filter uses an equal frequency spread per band.
  - The psychoacoustic model uses only frequency masking.
  - Typical applications: digital recording on tapes, hard disks, or magneto-optical disks, which can tolerate the high bit rate.
  - Highest quality is achieved at a bit rate of 384 kbps.
- MPEG Layer II
  - Uses three frames in the filter (previous, current, and next; a total of 1152 samples). At 48 kHz, each frame carries 24 ms of sound.
  - Models a little of the temporal masking.
  - Uses a 1024-point FFT for greater frequency resolution. Equal frequency spread per band.
  - Highest quality is achieved at a bit rate of 256 kbps.
  - Typical applications: audio broadcasting, television, consumer and professional recording, and multimedia.
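A rough sketch of the spectral-analysis step that feeds the psychoacoustic model (the window choice and scaling here are assumptions for illustration, not the standard's exact analysis):

```python
import numpy as np

def frame_power_spectrum_db(frame, fft_size=512):
    """Estimate the power spectrum (in dB) of one audio frame, as input to a
    psychoacoustic model. Layer I uses a 512-point FFT, Layer II a 1024-point FFT."""
    windowed = frame * np.hanning(len(frame))     # reduce spectral leakage
    spectrum = np.fft.rfft(windowed, n=fft_size)  # zero-pads (or truncates) to fft_size
    power = np.abs(spectrum) ** 2 + 1e-12         # avoid log(0)
    return 10 * np.log10(power)

# One Layer I frame: 384 samples of a 1 kHz tone at 48 kHz
t = np.arange(384) / 48_000.0
frame = np.sin(2 * np.pi * 1000 * t)
spectrum_db = frame_power_spectrum_db(frame)
peak_bin = int(np.argmax(spectrum_db))
print(f"peak near {peak_bin * 48_000 / 512:.0f} Hz")   # roughly 1000 Hz
```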

15 MPEG Coding Specifics
- MPEG Layer III
  - A better critical-band filter is used.
  - Uses non-equal frequency bands.
  - The psychoacoustic model includes temporal masking effects, takes into account stereo redundancy, and uses a Huffman coder.
- Stereo redundancy coding:
  - Intensity stereo coding -- at upper-frequency sub-bands, encode the summed signal instead of independent signals from the left and right channels.
  - Middle/Side (MS) stereo coding -- encode the middle (sum of left and right) and side (difference of left and right) channels.
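Middle/Side coding is simple to sketch; the 0.5 scale factor is one common convention, assumed here for illustration:

```python
import numpy as np

def ms_encode(left, right):
    """Middle/Side stereo: mid carries what the channels share,
    side carries their (usually much smaller) difference."""
    mid = (left + right) / 2.0
    side = (left - right) / 2.0
    return mid, side

def ms_decode(mid, side):
    return mid + side, mid - side              # recovers left and right exactly

left = np.array([0.50, 0.52, 0.48, 0.51])
right = np.array([0.49, 0.53, 0.47, 0.50])
mid, side = ms_encode(left, right)
print(side)                                    # small values -> cheap to encode
print(np.allclose(ms_decode(mid, side), (left, right)))   # True
```

When the two channels are highly correlated, the side signal is close to zero and costs very few bits, which is where the Joint-stereo savings come from.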

16 Effectiveness of MPEG Audio

  Layer     | Target bit-rate | Ratio | Quality* at 64 kbps | Quality* at 128 kbps
  ----------|-----------------|-------|---------------------|---------------------
  Layer I   | 192 kbps        | 4:1   | --                  | --
  Layer II  | 128 kbps        | 6:1   | 2.1 to 2.6          | 4+
  Layer III | 64 kbps         | 12:1  | 3.6 to 3.8          | 4+

*Quality factor: 5 = perfect, 4 = just noticeable, 3 = slightly annoying, 2 = annoying, 1 = very annoying.