An Overview of Perceptual Audio Coding and MPEG AAC


Introduction Audio coding (audio compression) algorithms produce a compact digital representation of high-fidelity (wideband) audio signals for efficient transmission or storage. The central objective in audio coding is to represent the signal with the minimum number of bits while achieving transparent signal reproduction, i.e. generating output audio that cannot be distinguished from the original input even by a listener with "golden ears". The audio compression algorithm of the Moving Picture Experts Group (MPEG) is an International Organization for Standardization (ISO) standard for high-fidelity audio compression.

Continue … MPEG audio compression standards are lossy audio coding standards. They compress audio by removing perceptually irrelevant information and statistical redundancy. The basic task of a perceptual audio coding system is to compress the digital audio data so that:
- the compression ratio is as high as possible, and
- the reconstructed (decoded) audio sounds identical (or as close as possible) to the original audio before compression.

Audio Coding Techniques
- Parametric Coding
- Waveform Coding: Time Domain (PCM, DPCM, ADPCM etc.); Frequency Domain (Transform Coding, Subband Coding)
- Hybrid Coding

Perceptual Audio Coding Basics
- Human hearing is limited to frequencies below about 20 kHz in most cases.
- Human hearing is insensitive to quiet frequency components that accompany stronger frequency components.
- Stereo audio streams contain largely redundant information.
MPEG audio compression takes advantage of these facts to reduce the extent and detail of mostly inaudible frequency ranges.

Generic Perceptual Audio Coding Architecture

Psychoacoustic Principles High-precision engineering models for high-fidelity audio currently do not exist. So, audio coding algorithms rely upon generalized receiver models to optimize coding efficiency. In the case of audio, the receiver is ultimately the human ear and sound perception is affected by its masking properties. Perceptual audio coders achieve compression by exploiting the fact that “irrelevant” signal information is not detectable by even a well trained or sensitive listener.

Irrelevant signal information is identified during signal analysis by incorporating several psychoacoustic principles into the coder, including absolute hearing thresholds, critical band frequency analysis, simultaneous masking, the spread of masking along the basilar membrane, and temporal masking. Combining these yields a quantitative estimate of the fundamental limit of transparent audio signal compression, the Perceptual Entropy, for a given audio frame.

Perceptual entropy denotes the minimum number of bits that should be allocated to a given audio frame to represent 'perceptually lossless' audio.

Absolute Threshold of Hearing The absolute threshold of hearing characterizes the amount of energy needed in a pure tone such that it can be detected by a listener in a noiseless environment. It can be expressed with a non-linear function of the frequency f in Hz:
$T_q(f) = 3.64\,(f/1000)^{-0.8} - 6.5\,e^{-0.6\,(f/1000 - 3.3)^2} + 10^{-3}\,(f/1000)^4$ (dB SPL)

When applied to signal compression, the absolute threshold can be interpreted as the maximum allowable energy level for coding distortions introduced in the frequency domain. The coder therefore tries to keep the quantization noise below this threshold, so that the noise does not become audible.
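As a rough illustration, the threshold-in-quiet formula for Tq(f) can be evaluated directly. The sketch below is not taken from the slides; the function name `absolute_threshold_db_spl` is a hypothetical choice, and the formula is assumed to take the frequency in Hz and return dB SPL.

```python
import math


def absolute_threshold_db_spl(f_hz: float) -> float:
    """Approximate threshold in quiet (dB SPL) for a pure tone at f_hz.

    Hypothetical helper: evaluates Tq(f) with f converted to kHz.
    """
    khz = f_hz / 1000.0
    return (
        3.64 * khz ** -0.8                            # rise toward low frequencies
        - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)     # dip of best sensitivity near 3.3 kHz
        + 1e-3 * khz ** 4                             # steep rise toward high frequencies
    )


if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000, 16000):
        print(f"{f:>6} Hz: {absolute_threshold_db_spl(f):6.1f} dB SPL")
```

A coder could compare the quantization-noise level in each frequency region against this curve. In practice the curve is only a lower bound, because masking by the signal itself raises the effective threshold.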

However … The detection threshold for spectrally complex quantization noise is a modified version of the absolute threshold, with its shape determined by the stimuli present at any given time. Since stimuli are in general time-varying, the detection threshold is also a time-varying function of the input signal. A spreading function helps to determine the modified detection threshold in the presence of the stimuli in a given audio frame.

Critical Bands The human ear can be viewed as a discrete set of band-pass filters that covers the entire audible range up to about 20 kHz. The cochlea, in the inner ear, contains frequency-sensitive positions. A tone entering the cochlea travels along it until it reaches the position where it resonates, so the cochlea works like a spectrum analyzer. The "critical bandwidth" is a function of frequency that quantifies the cochlear filter passbands. (unit: Bark)

Spectral analysis of audio content is performed using critical bands. As the center frequency increases, the critical bandwidth also increases. The critical bandwidth at center frequency f (in Hz) is given by:
$BW_c(f) = 25 + 75\,[1 + 1.4\,(f/1000)^2]^{0.69}$ (Hz)
To convert a frequency in Hz to Bark:
$Z(f) = 13\arctan(0.00076 f) + 3.5\arctan[(f/7500)^2]$ (Bark)
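The two critical-band formulas can be sketched in a few lines. This example is an illustration rather than part of the original slides. The function names are hypothetical, and the bandwidth formula is used in its commonly cited form, with the frequency expressed in kHz (f/1000) inside the bracket.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Map a frequency in Hz to the Bark scale, Z(f)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_bandwidth_hz(f_hz: float) -> float:
    """Critical bandwidth BWc(f) in Hz around center frequency f_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


if __name__ == "__main__":
    for f in (100, 500, 1000, 4000, 10000, 20000):
        print(f"{f:>6} Hz -> {hz_to_bark(f):5.2f} Bark, "
              f"bandwidth {critical_bandwidth_hz(f):7.1f} Hz")
```

With these formulas the bandwidth stays near 100 Hz at low frequencies and grows to several kilohertz at the top of the audible range. This is why a perceptual coder can afford coarser frequency resolution at high frequencies.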

Figure: Idealized critical band filter bank

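Masking Masking refers to a process where one sound is rendered inaudible because of the presence of another sound.
- Simultaneous masking: masker and maskee are present at the same time. From a frequency-domain point of view, the relative shapes of the masker and maskee magnitude spectra determine the extent of masking; from a time-domain point of view, phase relationships between masker and maskee can also affect the masking outcome.
- Non-simultaneous (temporal) masking: the masker affects the audibility of a maskee that occurs shortly before or after it.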

Depending on whether the masker and the maskee are tonal or noise-like, the following cases are distinguished: Noise Masking Tone (NMT), Tone Masking Noise (TMN), and Noise Masking Noise (NMN).

Figure: Noise Masking Tone and Tone Masking Noise. The two cases show the asymmetry of masking power between noise and tonal maskers: noise maskers have significantly greater masking power than tonal maskers.

Difference between SMR, NMR and SNR

Spread of Masking A masker centered within one critical band has some predictable effect on detection thresholds in other critical bands. This effect, known as the spread of masking, is often modeled in coding applications by an approximately triangular spreading function.
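A minimal sketch of a triangular spreading function on the Bark scale is shown below. The slopes (+25 dB per Bark below the masker, -10 dB per Bark above it) are assumed, commonly quoted values and do not come from the slides, and both function names are hypothetical. Real coders use more refined, level-dependent spreading functions.

```python
def triangular_spread_db(dz_bark: float,
                         lower_slope: float = 25.0,
                         upper_slope: float = 10.0) -> float:
    """Attenuation (dB, <= 0) of a masker's effect at distance dz_bark.

    dz_bark is the maskee position minus the masker position, in Bark.
    The slopes are illustrative assumptions (dB per Bark).
    """
    if dz_bark < 0:
        return lower_slope * dz_bark   # steep fall-off toward lower bands
    return -upper_slope * dz_bark      # shallower fall-off toward higher bands


def spread_masker(masker_db: float, masker_bark: int, n_bands: int = 25) -> list:
    """Masking level contributed by one masker to each critical band."""
    return [masker_db + triangular_spread_db(z - masker_bark)
            for z in range(n_bands)]


if __name__ == "__main__":
    levels = spread_masker(70.0, masker_bark=10)
    for z in range(7, 14):
        print(f"band {z:2d}: {levels[z]:5.1f} dB")
```

The asymmetric slopes reflect the observation that masking spreads further toward higher frequencies than toward lower ones.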

Non-simultaneous Masking (Temporal Masking)

MPEG Audio Codec Family
- MPEG-1 (ISO/IEC 11172-3) Layer 2 (mp2)
- MPEG-1 Layer 3 (mp3)
- MPEG-2 AAC (ISO/IEC 13818-7)
- MPEG-4 AAC (ISO/IEC 14496-3)
- MPEG-4 HE AAC
- MPEG-4 HE AAC v2

MP3 Compression Flow Chart

Figure: QMF filter bank and MDCT filter bank. Layer 3 adds a two-stage (hybrid) filter bank, higher frequency resolution, and entropy (Huffman) coding to the basic perceptual coder principle.

Bit rates available: MPEG-1 Layer 3 supports bit rates of 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s, and sampling frequencies of 32, 44.1 and 48 kHz. 44.1 kHz is almost always used (it matches the sampling rate of compact discs), and 128 kbit/s has become the de facto "good enough" standard, although 192 kbit/s is becoming increasingly popular on peer-to-peer file sharing networks. MPEG-2 and the unofficial MPEG-2.5 extension add lower bit rates (8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144 and 160 kbit/s) together with lower sampling frequencies (8, 11.025, 12, 16, 22.05 and 24 kHz).
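To make these numbers concrete, the sketch below computes the size of one MPEG-1 Layer 3 frame and the compression ratio relative to CD audio. It relies on two facts that are not stated in the slides: a Layer 3 frame carries 1152 samples per channel, and the frame length in bytes is 144 × bit rate / sampling rate, plus one optional padding byte. The function and constant names are hypothetical.

```python
MPEG1_L3_BITRATES_KBPS = (32, 40, 48, 56, 64, 80, 96, 112,
                          128, 160, 192, 224, 256, 320)
MPEG1_SAMPLE_RATES_HZ = (32000, 44100, 48000)
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer 3 frame size in samples per channel


def frame_length_bytes(bitrate_kbps: int, sample_rate_hz: int,
                       padding: bool = False) -> int:
    """Length of one MPEG-1 Layer 3 frame in bytes (header included)."""
    if bitrate_kbps not in MPEG1_L3_BITRATES_KBPS:
        raise ValueError(f"unsupported bit rate: {bitrate_kbps} kbit/s")
    if sample_rate_hz not in MPEG1_SAMPLE_RATES_HZ:
        raise ValueError(f"unsupported sampling rate: {sample_rate_hz} Hz")
    base = (SAMPLES_PER_FRAME // 8) * bitrate_kbps * 1000 // sample_rate_hz
    return base + (1 if padding else 0)


def compression_ratio_vs_cd(bitrate_kbps: int) -> float:
    """Ratio of the CD audio bit rate (44.1 kHz, 16 bit, stereo) to the coded rate."""
    cd_bits_per_second = 44100 * 16 * 2
    return cd_bits_per_second / (bitrate_kbps * 1000)


if __name__ == "__main__":
    print(frame_length_bytes(128, 44100), "bytes per frame at 128 kbit/s, 44.1 kHz")
    print(f"{SAMPLES_PER_FRAME / 44100 * 1000:.2f} ms per frame")
    print(f"{compression_ratio_vs_cd(128):.1f}:1 compared with CD audio")
```

Because 144 × 128000 / 44100 is not an integer, an encoder at this setting alternates between padded and unpadded frames to hold the average bit rate.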

Design limitations of MP3 There are several limitations inherent to the MP3 format that cannot be overcome by using a better encoder. Newer audio compression formats such as Vorbis and AAC no longer have these limitations. In technical terms, MP3 is limited in the following ways:
- Bitrate is limited to a maximum of 320 kbit/s.
- Time resolution can be too low for highly transient signals, causing some smearing of percussive sounds.
- Frequency resolution is limited by the small long-block window size, decreasing coding efficiency.
- There is no scale factor band for frequencies above 15.5/15.8 kHz.
- Joint stereo is done on a frame-to-frame basis.
- Encoder/decoder overall delay is not defined, which means there is no official provision for gapless playback. However, some encoders such as LAME can attach additional metadata that allows players that are aware of it to deliver gapless playback.
Nevertheless, a well-tuned MP3 encoder can perform competitively even with these restrictions.

Advanced Audio Coding (AAC) AAC is a standardized, lossy digital audio compression scheme. It was developed with the cooperation and contributions of companies including Dolby, Fraunhofer (FhG), AT&T, Sony and Nokia, and was officially declared an international standard by the Moving Picture Experts Group in April 1997. It is not backward compatible with other MPEG audio standards (such as mp3).

AAC was promoted as the successor to MP3 for audio coding at medium to high bitrates. AAC follows the same basic coding paradigm as Layer-3 (high frequency resolution filterbank, non-uniform quantization, Huffman coding, iteration loop structure using analysis-by-synthesis), but improves on Layer-3 in many details and uses new coding tools for improved quality at low bit rates. Its popularity is currently sustained by its role as the default codec of iTunes, the media player that serves the iPod, the most popular digital audio player on the market. Furthermore, the iTunes Music Store, whose sales account for 85% of the market for legal online downloads, sells AAC-encoded songs (encapsulated with FairPlay Digital Rights Management).

AAC's improvements over MP3
- Sample frequencies from 8 kHz to 96 kHz (official MP3: 16 kHz to 48 kHz)
- Up to 48 channels
- Higher efficiency and simpler filterbank (hybrid → pure MDCT)
- Higher coding efficiency for stationary signals (blocksize: 576 → 1024 samples)
- Higher coding efficiency for transient signals (blocksize: 192 → 128 samples)
- Can use a Kaiser-Bessel derived window function to reduce spectral leakage at the expense of widening the main lobe
- Much better handling of frequencies above 16 kHz
- More flexible joint stereo (separate for every scale band)
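The Kaiser-Bessel derived (KBD) window mentioned in this list can be sketched with the standard library alone. This is an illustrative construction, not code from the AAC standard. The helper names are hypothetical, and the alpha values (4 for long blocks, 6 for short blocks) reflect common descriptions of AAC and should be treated as assumptions. The window is built from the running sum of a Kaiser window so that its squared halves add up to one, the condition an MDCT needs for perfect reconstruction.

```python
import math


def bessel_i0(x: float, terms: int = 50) -> float:
    """Zeroth-order modified Bessel function via its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kbd_window(n: int, alpha: float) -> list:
    """Kaiser-Bessel derived window of even length n.

    alpha controls the trade-off between main-lobe width and side-lobe level.
    """
    half = n // 2
    # Kaiser window of length half + 1 (the common normalization cancels out).
    kaiser = [
        bessel_i0(math.pi * alpha *
                  math.sqrt(max(0.0, 1.0 - (2.0 * j / half - 1.0) ** 2)))
        for j in range(half + 1)
    ]
    total = sum(kaiser)
    running = 0.0
    rising = []
    for j in range(half):
        running += kaiser[j]
        rising.append(math.sqrt(running / total))
    return rising + rising[::-1]  # mirror the rising half to complete the window


if __name__ == "__main__":
    w = kbd_window(256, alpha=6.0)  # assumed short-block setting
    half = len(w) // 2
    worst = max(abs(w[i] ** 2 + w[i + half] ** 2 - 1.0) for i in range(half))
    print(f"length {len(w)}, worst power-complementarity error {worst:.2e}")
```

A larger alpha gives stronger side-lobe suppression (less leakage into distant frequency lines) and a wider main lobe, which is the trade-off the list describes.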

Both mid/side coding and intensity coding are more flexible, so they can be applied more frequently to reduce the bit rate. An optional backward prediction, computed line by line, achieves better coding efficiency, especially for very tone-like signals. This feature is only available within the rarely used Main profile. Improved Huffman coding: in AAC, coding by quadruples of frequency lines is applied more often. In addition, the assignment of Huffman code tables to coder partitions can be much more flexible. AAC and HE-AAC are far better than MP3 at very low bitrates, but at medium to higher bitrates AAC and MP3 are more comparable.

Modular encoding AAC takes a modular approach to encoding. Depending on the complexity of the bitstream to be encoded, the desired performance and the acceptable output, implementers may create profiles to define which of a specific set of tools they want to use for a particular application. The standard offers four default profiles:
- Low Complexity (LC): the simplest and most widely used and supported;
- Main Profile (MAIN): like the LC profile, with the addition of backward prediction;
- Scalable Sampling Rate (SSR), a.k.a. MPEG-4 AAC-SSR;
- Long Term Prediction (LTP): added in the MPEG-4 standard, an improvement of the MAIN profile using a forward predictor with lower computational complexity.
Depending on the AAC profile and the MP3 encoder, 96 kbit/s AAC can give nearly the same or better perceptual quality than 128 kbit/s MP3.

MPEG-2 AAC Flowchart

MPEG AAC Family

Extensions and Improvements Some extensions have been added to the original AAC standard:
- MPEG-4 Scalable To Lossless (SLS);
- High Efficiency AAC (HE-AAC), a.k.a. aacPlus v1 or AAC+: the combination of SBR (Spectral Band Replication) and AAC, used for low bitrates;
- HE-AAC v2, a.k.a. aacPlus v2: the combination of Parametric Stereo (PS) and HE-AAC;
- Perceptual Noise Substitution (PNS);
- Long Term Predictor (LTP): added in MPEG-4 Part 3.

MPEG AAC Performance MPEG AAC provides excellent audio quality. Reaching perceptually transparent quality at only 64 kbit/s per channel, it fulfills the requirements for broadcast quality as defined by the European Broadcasting Union. With sampling rates ranging from 8 kHz up to 96 kHz and above, bit rates up to 256 kbit/s, and support for up to 48 channels, MPEG AAC is one of the most flexible audio codecs. The standard also supports mono, stereo, and all common multi-channel configurations (e.g. 5.1 or 7.1). Its low computational demands make AAC well suited to low bit rate, high-quality audio applications.

MPEG-HE AAC HE-AAC is the low bit rate codec in the AAC family and is a combination of the AAC LC (Advanced Audio Coding Low Complexity) audio coder and the SBR (Spectral Band Replication) bandwidth expansion tool. This combination achieves good stereo quality already at bit rates of 32 to 48 kbit/s. HE-AAC is also known as aacPlus and can be used in multi-channel operations.

MPEG-4 HE-AAC v2 Combined with parametric stereo, the HE-AAC codec provides good audio quality starting at bit rates around 16 to 24 kbit/s for stereo content. HE-AAC v2 is also known as aacPlus v2.

Rough work … Explain basic psychoacoustic principles – Absolute threshold of hearing, Critical bands, Phenomenon of masking – Simultaneous, Masking asymmetry, Spread of masking, Non-simultaneous, Perceptual Entropy MPEG audio codec family – mp3, mp2 AAC, mp4 AAC, advanced AAC plus version 1, advanced AAC plus version 2 (mention features present/absent in each)

Limitations of mp3 What is different in AAC ? Features in AAC Explain each feature in detail (mp2, mp4)