Introduction to MP3 and psychoacoustics Material from website by Mark S. Drew


Human hearing and voice
* Frequency range is about 20 Hz to 20 kHz; hearing is most sensitive at 2 to 4 kHz.
* Dynamic range (quietest to loudest) is about 96 dB.
* Normal voice range is about 500 Hz to 2 kHz.
* Low frequencies carry vowels and bass.
* High frequencies carry consonants.

Efficient coding
* Send what is audible; throw away what's not.
* Or: only send what is needed (e.g., telephone cut-off for speech).

Sensitivity of human hearing in relation to frequency
* Experiment: Put a person in a quiet room. Raise the level of a 1 kHz tone until it is just barely audible. Vary the frequency and plot the threshold.

Critical Bands
* The human auditory system has a limited, frequency-dependent resolution. A perceptually uniform measure of frequency can be expressed in terms of the width of the critical bands: less than 100 Hz at the lowest audible frequencies, and more than 4 kHz at the high end. Altogether, the audio frequency range can be partitioned into 25 critical bands.
* A new unit of frequency, the Bark (after Barkhausen), is introduced: 1 Bark = width of one critical band. For a frequency f below 500 Hz, it converts to f/100 Bark; for a frequency above 500 Hz, it is 9 + 4 log2(f/1000) Bark.
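The conversion from Hz to Bark can be sketched in a few lines. This assumes the piecewise approximation usually quoted with this material (f/100 below 500 Hz, 9 + 4 log2(f/1000) above); the function name `hz_to_bark` is illustrative only, and other Bark formulas exist.

```python
import math

def hz_to_bark(freq_hz):
    """Approximate critical-band number (Bark) for a frequency in Hz.

    Assumed piecewise rule: linear below 500 Hz, logarithmic above.
    The two branches meet at 500 Hz (both give 5 Bark).
    """
    if freq_hz < 500:
        return freq_hz / 100
    return 9 + 4 * math.log2(freq_hz / 1000)

for f in (100, 500, 1000, 4000, 16000):
    print(f, "Hz ->", hz_to_bark(f), "Bark")
```

Under this approximation 16 kHz maps to 25 Bark, which is consistent with partitioning the audio range into about 25 critical bands.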

Frequency Masking
Question: Do receptors interfere with each other?
* Experiment: Play a 1 kHz tone (the masking tone) at a fixed level (60 dB). Play a test tone at a different frequency (e.g., 1.1 kHz), and raise its level until it is just distinguishable.
* Vary the frequency of the test tone and plot the threshold at which it becomes audible:

Masking with various frequency masking tones.

Frequency Masking shown on critical band scale:

Temporal masking
* If we hear a loud sound and it then stops, it takes a little while until we can hear a soft tone at a nearby frequency.
* Experiment: Play a 1 kHz masking tone at 60 dB, plus a test tone at 1.1 kHz at 40 dB. The test tone can't be heard (it's masked). Stop the masking tone, then stop the test tone after a short delay. Adjust the delay to the shortest time at which the test tone can be heard (e.g., 5 ms). Repeat with different levels of the test tone and plot:
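To make the levels in this experiment concrete: a 20 dB gap between the masker (60 dB) and the test tone (40 dB) is an amplitude ratio of 10. The sketch below synthesizes the two tones with that ratio. The 48 kHz sample rate, the 0.5 masker amplitude, and the helper names are assumptions for illustration; mapping digital amplitude to an absolute sound level depends on the playback chain and is not modeled.

```python
import math

def db_to_amplitude_ratio(db):
    """Amplitude ratio corresponding to a level difference in dB."""
    return 10 ** (db / 20)

def sine(freq_hz, amplitude, duration_s, sample_rate=48000):
    """Samples of a sine tone as a plain list of floats."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

masker_amp = 0.5                                          # assumed digital level for "60 dB"
test_amp = masker_amp * db_to_amplitude_ratio(40 - 60)    # 20 dB quieter

masker = sine(1000, masker_amp, 0.1)   # 1 kHz masking tone
test = sine(1100, test_amp, 0.1)       # 1.1 kHz test tone
mix = [m + t for m, t in zip(masker, test)]
```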

Total effect of both frequency and temporal masking:

Steps in algorithm:
1. Use convolution filters to divide the audio signal (e.g., 48 kHz sound) into 32 frequency subbands (subband filtering).
2. Determine the amount of masking for each band caused by nearby bands, using the psychoacoustic model shown above.
3. If the power in a band is below the masking threshold, don't encode it.
4. Otherwise, determine the number of bits needed to represent the coefficient such that the noise introduced by quantization is below the masking effect (recall that one fewer bit of quantization introduces about 6 dB of noise).
5. Format the bitstream.
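Steps 3 and 4 reduce to a small calculation once a masking threshold is known for each band. The following is a minimal sketch, not the codec's actual bit allocation: it assumes equal-width subbands from 0 Hz to half the sample rate, takes the masking threshold as a given input (computing it is the psychoacoustic model's job), and uses hypothetical function names.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB of noise per bit removed

def subband_width_hz(sample_rate_hz, num_subbands=32):
    """Width of each subband, assuming equal-width bands up to Nyquist."""
    return (sample_rate_hz / 2) / num_subbands

def bits_for_band(level_db, mask_db):
    """Bits needed so quantization noise stays at or below the mask.

    Returns 0 when the band itself is below the masking threshold (step 3).
    """
    if level_db < mask_db:
        return 0
    return math.ceil((level_db - mask_db) / DB_PER_BIT)

print(subband_width_hz(48000))    # width of one of the 32 subbands
print(bits_for_band(10, 12))      # masked band
print(bits_for_band(35, 15))      # audible band
```

For 48 kHz audio this gives subbands 750 Hz wide; a real encoder also works within a frame-level bit budget, which this sketch ignores.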

Example of running algorithm
* After analysis, the levels of the first 16 of the 32 bands are these: [Band / Level (dB) table not preserved in this transcript]
* If the level of the 8th band is 60 dB, it gives a masking of 12 dB in the 7th band and 15 dB in the 9th.
* The level in the 7th band is 10 dB (< 12 dB), so ignore it.
* The level in the 9th band is 35 dB (> 15 dB), so send it. Only the amount above the masking level needs to be sent, so instead of using 6 bits to encode it, we can use 4 bits, a saving of 2 bits (= 12 dB).
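The arithmetic of this example can be checked directly. The sketch uses only the three band levels and two masking values stated in the text, and the slides' rule of thumb of 6 dB per bit; the dictionary layout and the `decide` helper are illustrative assumptions.

```python
import math

DB_PER_BIT = 6.0   # rule of thumb from the slides: about 6 dB per bit

# Only the levels the text states; the rest of the band table is not assumed.
levels_db = {7: 10, 8: 60, 9: 35}
# Masking that the 60 dB band 8 casts on its neighbours, as given in the text.
mask_db = {7: 12, 9: 15}

def decide(band):
    """Return the send/ignore decision and bit counts for one band."""
    level, mask = levels_db[band], mask_db[band]
    full_bits = math.ceil(level / DB_PER_BIT)          # bits without masking
    if level < mask:
        return {"action": "ignore", "bits": 0, "full_bits": full_bits}
    bits = math.ceil((level - mask) / DB_PER_BIT)      # only the part above the mask
    return {"action": "send", "bits": bits, "full_bits": full_bits}

for band in (7, 9):
    print(band, decide(band))
```

Band 7 comes out as ignored, and band 9 as sent with 4 bits instead of 6, matching the figures in the example.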