Multimedia Data Speech and Audio Dr Mike Spann Electronic, Electrical and Computer Engineering.


Content
• Speech and sound signals
  – Speech production
  – Sampling speech signals
  – What do signals look and sound like?
• Time/frequency components
  – SFS demo
  – Compression methods
• Audio coding
  – MP3 (perceptual coding)

Speech Production

Sampling and Quantizing
[Figure: a 5 ms speech signal sampled at 8 kHz]
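
The slide's numbers fix the sample count: 5 ms at 8 kHz is 0.005 × 8000 = 40 samples. The minimal Python sketch below samples a 440 Hz test tone (an assumed stand-in for real speech) over that window, quantizes it uniformly at several bit depths and measures the signal-to-quantization-noise ratio. The function names are illustrative, not taken from any library.

```python
import math

def sample_tone(f_hz, fs_hz, duration_s):
    """Sample a unit-amplitude sine wave at fs_hz for duration_s seconds."""
    n_samples = round(fs_hz * duration_s)
    return [math.sin(2 * math.pi * f_hz * n / fs_hz) for n in range(n_samples)]

def quantize(x, bits):
    """Uniform quantizer: map values in [-1, 1] to signed integer levels and back."""
    levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 32767 for 16 bits
    return [round(v * levels) / levels for v in x]

def snr_db(x, xq):
    """Signal-to-quantization-noise ratio in dB."""
    signal = sum(v * v for v in x)
    noise = sum((a - b) ** 2 for a, b in zip(x, xq))
    return 10 * math.log10(signal / noise)

x = sample_tone(440, 8000, 0.005)  # hypothetical 440 Hz tone in place of speech
print(len(x))                      # 40 samples in 5 ms at 8 kHz
for bits in (16, 8, 4):
    print(bits, "bits:", round(snr_db(x, quantize(x, bits)), 1), "dB")
```

For a full-scale sine the SNR grows by roughly 6 dB per extra bit, which is why the 8-bit demo sounds noticeably noisier than the 16-bit one.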

Sound Facts
• The human ear hears sounds up to about 20 kHz.
• The Nyquist theorem states that we must sample at no less than twice the highest frequency present, so we need to sample at 40 kHz or better.
• 8 kHz sampling is used for telephone speech and 44.1 kHz for CD audio; Digital Audio Tape (DAT) samples at 48 kHz (44.1 kHz and 32 kHz are also supported) using 16-bit samples.
• Demo: sampling rates of 44 kHz, 22 kHz, 16 kHz, 8 kHz and 4 kHz; 16-bit and 8-bit samples.
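
To illustrate why sampling below twice the highest frequency is a problem, here is a small sketch of aliasing, assuming an ideal sampler with no anti-aliasing filter (real converters low-pass filter the signal first). A tone above fs/2 folds back to a lower frequency: at 8 kHz sampling nothing above 4 kHz is represented correctly, and in the 4 kHz demo only content below 2 kHz is. The helper names are hypothetical.

```python
import math

def nyquist_rate(f_max_hz):
    """Minimum sampling rate (Hz) for content up to f_max_hz."""
    return 2 * f_max_hz

def alias_frequency(f_hz, fs_hz):
    """Frequency (Hz) at which a pure tone f_hz appears after sampling at fs_hz.

    Sampling folds all frequencies into the range 0..fs_hz/2.
    """
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

print(nyquist_rate(20_000))           # 40000 Hz for the full range of hearing
print(alias_frequency(5_000, 8_000))  # a 5 kHz tone sampled at 8 kHz shows up at 3000 Hz

# The samples really are indistinguishable: cosines at 5 kHz and 3 kHz, both sampled at 8 kHz
fs = 8_000
a = [math.cos(2 * math.pi * 5_000 * n / fs) for n in range(16)]
b = [math.cos(2 * math.pi * 3_000 * n / fs) for n in range(16)]
print(all(math.isclose(p, q, abs_tol=1e-9) for p, q in zip(a, b)))  # True
```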

Examples of Speech Sounds
Examples of speech sounds are plosive, voiced and fricative.
• Plosive: a speech sound generated by a sudden release of air in the vocal tract. Plosive sounds cannot be sustained: once the air is released, the sound has ended.
• Voiced: a speech sound generated with vibrating vocal cords. Unvoiced speech sounds are generated without vibration of the vocal cords.
• Fricative: a speech sound generated by turbulent airflow through a constriction, e.g. “shy”, “high”, “zoo”, “thy”. Fricatives can be voiced or unvoiced.
• Examples: [p] in “pale” (plosive), [ee] in “seem” (voiced) and [f] in “face” (fricative).
• Words can contain mixtures, e.g. “sap” or “puff”.
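
Voiced and unvoiced sounds can often be told apart with a simple heuristic that is not part of the slide: the zero-crossing rate. Vibrating vocal cords give a periodic, low-frequency waveform that crosses zero rarely, whereas the turbulent airflow of an unvoiced fricative is noise-like and crosses zero very often. The sketch below uses synthetic stand-ins (a 150 Hz sine for a voiced segment, seeded white noise for [f]); these signals are assumptions, not measured speech.

```python
import math
import random

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(x, x[1:]))
    return crossings / (len(x) - 1)

def short_time_energy(x):
    """Mean squared amplitude of a segment."""
    return sum(v * v for v in x) / len(x)

fs = 8000
# Hypothetical stand-ins: a 150 Hz periodic wave for a voiced sound,
# low-level white noise for the turbulent airflow of an unvoiced fricative.
voiced = [math.sin(2 * math.pi * 150 * n / fs) for n in range(fs // 10)]
rng = random.Random(0)
fricative = [rng.uniform(-0.3, 0.3) for _ in range(fs // 10)]

for name, seg in (("voiced", voiced), ("fricative", fricative)):
    print(name, round(zero_crossing_rate(seg), 3), round(short_time_energy(seg), 4))
```

Low zero-crossing rate with high energy suggests a voiced segment; a high rate with low energy suggests an unvoiced fricative.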

Speech Signals (SFS)
• SFS demo (available on the course web page)
  – Speech Filing System (SFS) from Mark Huckvale at UCL
  – demo.sfs: “BOX...AGO...BOX...AGO”
[Figures: time variation of signal amplitude; spectrogram]

Spectrograms
• A 2D plot showing the time/frequency distribution of a signal
• It is essentially a ‘windowed’ frequency analysis
  – The window ‘slides’ along the time axis
• Very common in speech analysis
• The spectrogram of a sinusoid is a horizontal line
• More interestingly, the spectrogram of a sinusoidally frequency-modulated (FM) signal traces out a sinusoid!
[Figures: spectrograms of an FM signal and a violin]
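
A spectrogram can be sketched in a few lines of Python: cut the signal into overlapping frames, apply a window, take a DFT of each frame and keep the magnitudes. The sketch below uses a naive DFT (for clarity, not speed), a Hann window, and a hypothetical 64-sample frame with a 32-sample hop at 8 kHz; it then checks the two claims above by tracking the strongest bin in each frame.

```python
import cmath
import math

def hann(n):
    """Hann window: tapers each frame to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def spectrogram(x, frame_len=64, hop=32):
    """Magnitude spectra of overlapping windowed frames (naive DFT, bins 0..N/2)."""
    w = hann(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):  # the window slides along time
        seg = [x[start + i] * w[i] for i in range(frame_len)]
        mags = [abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len)))
                for k in range(frame_len // 2 + 1)]
        frames.append(mags)
    return frames

def peak_bins(frames):
    """Index of the strongest frequency bin in each frame."""
    return [max(range(len(m)), key=m.__getitem__) for m in frames]

fs = 8000  # bin spacing = fs / frame_len = 125 Hz
t = [n / fs for n in range(1024)]
tone = [math.sin(2 * math.pi * 1000 * ti) for ti in t]
# Sinusoidal FM: instantaneous frequency = 2000 + 1000*cos(2*pi*10*t) Hz
fm = [math.sin(2 * math.pi * 2000 * ti + 100 * math.sin(2 * math.pi * 10 * ti)) for ti in t]

print(set(peak_bins(spectrogram(tone))))  # {8}: 1000 Hz / 125 Hz, the same bin in every frame
print(peak_bins(spectrogram(fm)))         # the peak bin rises and falls over time
```

A practical analysis would use an FFT and display the magnitudes in dB as an image; the frame length trades time resolution against frequency resolution.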

SFS Demonstration
• The demonstration will show that spoken words can contain silences.
• It will provide spectrogram examples, which show the frequencies present in the speech signal.
• We will see how much of the intelligibility lies in the high-frequency components.
• The low-pass filter example will provide a very simple simulation of sound after passing through a wall.
[Figures: the sample waveform; its spectrogram (the frequency map of the signal above)]
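
The wall simulation can be approximated by any low-pass filter. As a minimal sketch (not the filter SFS actually uses), a first-order IIR low-pass with an assumed 500 Hz cutoff passes a 200 Hz tone almost unchanged but strongly attenuates a 3 kHz tone, removing the high-frequency components the demonstration asks about. The function names and the cutoff are hypothetical.

```python
import math

def one_pole_lowpass(x, cutoff_hz, fs_hz):
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1 - math.exp(-2 * math.pi * cutoff_hz / fs_hz)
    y, prev = [], 0.0
    for v in x:
        prev += a * (v - prev)
        y.append(prev)
    return y

def gain(f_hz, cutoff_hz=500, fs_hz=8000, n=4000):
    """Measured peak output amplitude for a unit sine, ignoring the start-up transient."""
    x = [math.sin(2 * math.pi * f_hz * k / fs_hz) for k in range(n)]
    y = one_pole_lowpass(x, cutoff_hz, fs_hz)
    return max(abs(v) for v in y[n // 2:])

for f in (200, 1000, 3000):
    print(f, "Hz ->", round(gain(f), 2))
```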

Compressing Speech
Waveform Coding
• Attempts to reproduce the original waveform.
• 64 kbit/s – 16 kbit/s
Vocoding
• Produces a synthesised version of the signal.
• 1.2 kbit/s – 2.4 kbit/s (and as low as … bit/s)
Hybrid Coding
• Attempts to fill the gap between waveform coding and vocoding, using a combination of analysis and error minimisation.
• 4.8 kbit/s – 9.6 kbit/s
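
The top of the waveform-coding range follows from simple arithmetic: telephone speech sampled at 8 kHz with 8 bits per sample gives 8000 × 8 = 64 kbit/s, as in ITU-T G.711. Eight bits are enough for telephone quality because G.711 compands the signal logarithmically before quantizing. The sketch below uses the continuous μ-law curve (μ = 255) rather than G.711's bit-exact segmented approximation, so treat it as an illustration only; it compares the round-trip error of plain 8-bit uniform quantization with 8-bit μ-law for quiet and loud samples.

```python
import math

MU = 255  # mu value used in North American and Japanese telephony

def pcm_bit_rate(fs_hz, bits_per_sample):
    """Bit rate (bit/s) of uncompressed PCM."""
    return fs_hz * bits_per_sample

def mu_law_compress(x):
    """Continuous mu-law curve for x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def quantize(v, bits=8):
    """Uniform quantizer for a single value in [-1, 1]."""
    levels = 2 ** (bits - 1) - 1
    return round(v * levels) / levels

def uniform_roundtrip(x):
    return quantize(x)

def mu_law_roundtrip(x):
    # Compress, quantize to 8 bits, then expand back
    return mu_law_expand(quantize(mu_law_compress(x)))

print(pcm_bit_rate(8000, 8))  # 64000 bit/s, the top of the waveform-coding range
for x in (0.01, 0.1, 0.9):
    print(x, "uniform error:", abs(uniform_roundtrip(x) - x),
          "mu-law error:", abs(mu_law_roundtrip(x) - x))
```

Companding spends more quantization levels on quiet samples, where speech spends much of its time, at the cost of coarser steps for loud ones.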

Compressing Speech
• There is a good (but rather advanced) summary of speech compression using hybrid coders at compression.com/speech.html, which also includes a demo.

Audio Coding (MP3)
• ‘MP3’ has almost become synonymous with the name of a player, but it is actually a standard for audio compression
  – MP3 is actually MPEG-1 Layer III
• The German company Fraunhofer-Gesellschaft developed MP3 technology and licensed the patent rights to the audio compression technology: United States Patent 5,579,430 for a "digital encoding process".
• The inventors named on the MP3 patent are Bernhard Grill, Karl-Heinz Brandenburg, Thomas Sporer, Bernd Kurten and Ernst Eberlein.

Audio Coding (MP3)
• The MPEG committee chose to recommend three audio compression methods of increasing complexity and demands on processing power.
• They are able to maintain excellent sound quality at greatly reduced file sizes.
• The compression reduces an audio file to roughly one-tenth of its original size
  – e.g. a 40 MB file → 3.5 MB
• MP3 is actually MPEG-1 Layer III
  – There are three layers, referred to as Audio Layer I, II and III
• Layer I is the simplest: a sub-band coder with a psychoacoustic model
• Layer II adds more advanced bit allocation techniques and greater accuracy. It is used for digital radio (DAB, Digital Audio Broadcasting)
• Layer III (MP3) adds a hybrid filterbank and non-uniform quantization, plus advanced features such as Huffman coding, 18 times higher frequency resolution (each of the 32 subbands is split into 18 frequency lines) and the bit reservoir technique
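
The figures on this slide can be checked with a little arithmetic. At the CD sampling rate of 44.1 kHz, the 32-band filterbank of Layers I and II gives bands about 689 Hz wide, while Layer III's hybrid filterbank, which splits each subband into 18 lines (576 lines for long blocks), gives lines about 38 Hz wide: the factor of 18 in frequency resolution. The variable names are illustrative.

```python
fs = 44_100            # CD sampling rate (Hz)
nyquist = fs / 2       # highest representable frequency (Hz)

subbands = 32          # polyphase filterbank bands (Layers I and II)
lines = subbands * 18  # Layer III: 18 MDCT lines per subband (long blocks)

print(nyquist / subbands)  # 689.0625 Hz per subband
print(nyquist / lines)     # 38.28125 Hz per Layer III frequency line
print(40 / 3.5)            # ratio of the 40 MB -> 3.5 MB example, about 11.4:1
```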

Audio Coding (MP3)
• The standards require downward compatibility: for example, a valid Layer III decoder must be able to decode any Layer I, II or III MPEG Audio stream. Similarly, a Layer II decoder should be able to decode Layer I and Layer II streams.
• MPEG audio uses psychoacoustic models (perceptual coding), i.e. models of the way the human brain perceives sound.
  – Music consists of many different components, not all of which are audible in the same way. For example, a soft flute may be hidden from the listener if a trumpet is played at the same time. The flute is still present, of course, but the listener is simply unable to perceive it: the flute is masked by the trumpet.
  – An MP3 encoder represents the trumpet with great precision and the flute more coarsely. This flexible representation reduces the amount of information to be transmitted or stored, helping to minimize overall file size.
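
Representing the trumpet precisely and the flute coarsely can be sketched as a greedy bit allocation, loosely modelled on the iterative allocation of Layers I and II. Each band has a signal-to-mask ratio (SMR); every extra bit lowers quantization noise by about 6.02 dB, and each bit goes to whichever band's noise is currently most audible. The SMR values below are hypothetical, and a real encoder also accounts for scale factors, the number of samples per band and the frame's bit budget.

```python
def allocate_bits(smr_db, total_bits, max_bits=15):
    """Greedy bit allocation driven by signal-to-mask ratios (SMR, in dB).

    Each extra bit of resolution lowers quantization noise by about 6.02 dB,
    so a band's mask-to-noise ratio is MNR = 6.02 * bits - SMR. A negative
    MNR means the quantization noise in that band would be audible.
    """
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        open_bands = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not open_bands:
            break
        # Spend the next bit where the noise is most audible (lowest MNR)
        worst = min(open_bands, key=lambda i: 6.02 * bits[i] - smr_db[i])
        if 6.02 * bits[worst] - smr_db[worst] >= 0:
            break  # noise is already masked in every band: stop spending bits
        bits[worst] += 1
    return bits

# Hypothetical bands: a loud trumpet, a masked flute (SMR < 0) and a third instrument
print(allocate_bits([30, -5, 12], total_bits=20))  # [5, 0, 2]
```

The masked flute band receives no bits at all, and the allocation stops early once every band's noise is below its mask, leaving the remaining bits unspent.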

Simple Masking Example (from …)
• The figure shows the threshold of hearing curve and a single tone (sine wave) with a frequency of 1 kHz.
• The red curve (A) is the normal hearing threshold.
• The green curve (B) is the masking curve due to the tone (C). The band of noise in yellow (D) at 1.5 kHz lies below this curve, so it cannot be perceived by the human ear because of the masking effect of the tone at 1 kHz.
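
The masking figure can be approximated numerically. The sketch below uses Zwicker's Bark-scale formula and Terhardt's approximation of the threshold in quiet, both common in the perceptual-coding literature, together with a simplified triangular spreading function. The masker level (80 dB), the 10 dB offset and the slopes (25 dB/Bark below the masker, 10 dB/Bark above) are illustrative assumptions, not values from the slide or from the MPEG psychoacoustic models.

```python
import math

def bark(f_hz):
    """Zwicker's approximation of the critical-band (Bark) scale."""
    return 13 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500) ** 2)

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    k = f_hz / 1000
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4

def masking_threshold_db(f_hz, masker_hz, masker_db,
                         offset_db=10, slope_below=25, slope_above=10):
    """Hypothetical triangular spreading function in the Bark domain."""
    dz = bark(f_hz) - bark(masker_hz)
    slope = slope_above if dz >= 0 else slope_below  # masking spreads further upwards
    spread = masker_db - offset_db - slope * abs(dz)
    # Combine with the threshold in quiet by taking the larger (a simplification)
    return max(spread, threshold_in_quiet_db(f_hz))

def is_audible(f_hz, level_db, masker_hz=1000, masker_db=80):
    return level_db > masking_threshold_db(f_hz, masker_hz, masker_db)

print(round(masking_threshold_db(1500, 1000, 80), 1))  # masked threshold near 1.5 kHz (dB)
print(is_audible(1500, 35), is_audible(1500, 50))      # False True
```

With these assumed numbers, a 35 dB component at 1.5 kHz is hidden by the 1 kHz tone, as in region D of the figure, while a 50 dB component is not.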

Audio Coding (MP3)… continued
• Including a psychoacoustic model means that masked tones can be removed from the bitstream to improve compression performance.
• The encoder computes the masking effects and then allocates bits iteratively, adjusting the quantization until the quantization noise is kept below the masking threshold or the available bits are used up.
• File sizes
  – As we would expect, quality descriptors are difficult to match to file sizes or compression ratios. Different users, applications and codecs will all have different expectations, requirements and results.
  – But as a very rough guide: higher-quality bit rates would be from … kbps (closer to CD quality); lower-quality bit rates from 96 kbps and below.
• Uncompressed audio as stored on an audio CD has a bit rate of 1,411.2 kbit/s
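
The 1,411.2 kbit/s figure is 44,100 samples/s × 16 bits × 2 channels. A quick sketch of the resulting file sizes and of the compression ratio at the 96 kbit/s boundary, with sizes in decimal megabytes and container overhead ignored (the helper names are illustrative):

```python
def bit_rate_bps(fs_hz, bits, channels):
    """Bit rate of uncompressed PCM audio."""
    return fs_hz * bits * channels

def megabytes(bit_rate, seconds):
    """Size in decimal megabytes of a stream at bit_rate for the given duration."""
    return bit_rate * seconds / 8 / 1_000_000

cd = bit_rate_bps(44_100, 16, 2)
print(cd)                     # 1411200 bit/s = 1,411.2 kbit/s
print(megabytes(cd, 60))      # 10.584 MB for one minute of CD audio
print(megabytes(96_000, 60))  # 0.72 MB for one minute at 96 kbit/s
print(cd / 96_000)            # 14.7, the compression ratio at 96 kbit/s
```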

Audio Coding (MP3) demo
• LAME is a high-quality MP3 encoder/decoder
• RazorLame is a user-friendly GUI for LAME that allows MP3 demonstrations
• We can create MP3 files at different compression ratios
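
As a sketch of how such a demo could be scripted without the GUI, the helper below builds LAME command lines for constant-bit-rate encoding with LAME's -b option (bit rate in kbit/s) and runs them if a lame executable is on the PATH. The input file name, the output naming scheme and the list of bit rates are hypothetical choices.

```python
import shutil
import subprocess
from pathlib import Path

def lame_command(wav_path, bitrate_kbps):
    """Build a LAME command line for constant-bit-rate encoding at bitrate_kbps."""
    wav = Path(wav_path)
    mp3 = wav.with_name(f"{wav.stem}_{bitrate_kbps}k.mp3")  # hypothetical naming scheme
    return ["lame", "-b", str(bitrate_kbps), str(wav), str(mp3)]

def encode_at_bitrates(wav_path, bitrates=(320, 128, 96, 32)):
    """Encode one WAV file at several bit rates for a listening comparison."""
    if shutil.which("lame") is None:
        raise RuntimeError("lame executable not found on PATH")
    for kbps in bitrates:
        subprocess.run(lame_command(wav_path, kbps), check=True)

print(lame_command("speech.wav", 96))
```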

Summary
• Speech and sound signals
  – Speech production
  – Sampling and quantisation
  – What signals look and sound like (SFS demo): spectrograms
  – Compression approaches
• Audio coding
  – MP3 (perceptual coding)
  – MP3 demonstrations

• This concludes our introduction to speech and audio.
• You can find course information, including slides and supporting resources, online on the course web page.
Thank You