Speech & Audio Coding TSBK01 Image Coding and Data Compression Lecture 11, 2003 Jörgen Ahlberg.



Outline
Part I – Speech
– Speech
– History of speech synthesis & coding
– Speech coding methods
Part II – Audio
– Psychoacoustic models
– MPEG-4 Audio

Speech Production
The human vocal apparatus consists of:
– lungs
– trachea (windpipe)
– larynx, which contains two folds of skin called vocal cords that blow apart and flap together as air is forced through
– oral tract
– nasal tract

The Speech Signal

The Speech Signal
Elements of the speech signal:
– spectral resonance (formants, moving)
– periodic excitation (voicing, pitched) + pitch contour
– noise excitation (fricatives, unvoiced, no pitch)
– transients (stop-release bursts)
– amplitude modulation (nasals, approximants)
– timing

The Speech Signal
Vowels
– Characterised by formants; generally voiced.
– Tongue & lips: effect of rounding.
– Examples of vowels: a, e, i, o, u, a, ah, oh.
– Vibration of vocal cords: male Hz, female up to 500 Hz.
– Vowels have on average a much longer duration than consonants.
– Most of the acoustic energy of a speech signal is carried by vowels.
Figures: F1-F2 chart; formant positions.

History of Speech Coding
– Channel vocoder: first analysis-by-synthesis system, developed by Homer Dudley of AT&T labs (VODER).
– PCM: first conceived by Paul M. Rainey and independently by Alec Reeves (ITT Paris); deployed in US PSTN in 1962.
Figure: VODER, the architecture.


OVE formant synthesis (Gunnar Fant, KTH), 1953

History of Speech Coding
– Channel vocoder: first analysis-by-synthesis system, Homer Dudley of AT&T labs (VODER).
– PCM: first conceived by Paul M. Rainey and independently by Alec Reeves (ITT Paris); deployed in US PSTN in 1962.
– μ-law encoding proposed (standardised for the telephone network in 1972 as G.711).
– Delta modulation proposed, differential PCM invented.
– ADPCM developed.
– CELP vocoder proposed (the majority of coding standards for speech signals today use a variation on CELP).

Source-filter Model of Speech Production
– The signal from a source is filtered by a time-varying filter with resonant properties similar to those of the vocal tract.
– The gain controls A_v and A_N determine the intensity of voiced and unvoiced excitation.
– The higher formants are attenuated by 12 dB/octave (due to the nature of our speech organs).
– This is an oversimplified model of speech production. However, it is very often adequate for understanding the basic principles.

Speech Coding Strategies
1. PCM
– Invented 1926, deployed 1962.
– The speech signal is sampled at 8 kHz.
– Uniform quantization requires >10 bits/sample.
– Non-uniform quantization (G.711, 1972).
– Quantizing y to 8 bits -> 64 kbit/s.
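The non-uniform quantization step can be illustrated with a continuous μ-law compander followed by a uniform 8-bit quantizer. This is only a sketch: the value μ = 255 and all function names are assumptions made for illustration, and the actual G.711 codec uses a segmented (piecewise-linear) approximation rather than the formula below.

```python
import math

MU = 255  # assumed companding constant (not stated on the slide)

def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to y in [-1, 1] using logarithmic companding."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantize_uniform(y, bits=8):
    """Uniform quantizer on [-1, 1]; returns an index in [0, 2**bits - 1]."""
    levels = 2 ** bits
    return min(int((y + 1.0) / 2.0 * levels), levels - 1)

def dequantize_uniform(index, bits=8):
    """Reconstruct the midpoint of the quantization cell."""
    levels = 2 ** bits
    return (index + 0.5) / levels * 2.0 - 1.0

def encode(x):
    return quantize_uniform(mu_law_compress(x))

def decode(index):
    return mu_law_expand(dequantize_uniform(index))

# Bit rate of the resulting stream: 8000 samples/s times 8 bits/sample.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
bitrate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
```

Because the compander spends more quantization levels on small amplitudes, a quiet sample such as 0.01 is reconstructed with a much smaller error than a plain uniform 8-bit quantizer would give.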

Speech Coding Strategies
2. Adaptive DPCM
– Example: G.726 (1974).
– Adaptive predictor based on six previous differences.
– Gain-adaptive quantizer with 15 levels -> 32 kbit/s.

Speech Coding Strategies
3. Model-based Speech Coding
Advanced speech coders are based on models of how speech is produced:
– Excitation source
– Vocal tract

An Excitation Source
Diagram: noise generator; pulse generator controlled by pitch.

Vocal Tract Filter 1: A Fixed Filter Bank
Diagram: bank of band-pass (BP) filters with gains g1, g2, ..., gn.

Vocal Tract Filter 2: A Controllable Filter

Linear Predictive Coding (LPC)
– The controllable filter is modelled as $y_n = \sum_i a_i y_{n-i} + G \varepsilon_n$, where $\varepsilon_n$ is the input signal and $y_n$ is the output.
– We need to estimate the vocal tract parameters ($a_i$ and $G$) and the excitation parameters (pitch, v/uv).
– Typically the source signal is divided into short segments and the parameters are estimated for each segment.
– Example: The speech signal is sampled at 8 kHz and divided into segments of 180 samples (22.5 ms/segment).
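As a rough sketch of the filter model and the segmentation described above, the recursion can be run directly on an excitation sequence. The function names, the zero initial filter state, and the toy coefficient values are assumptions chosen for illustration only.

```python
def lpc_synthesize(coeffs, gain, excitation):
    """Run y[n] = sum_i a_i * y[n-i] + G * e[n], assuming zero initial state."""
    out = []
    for n, e in enumerate(excitation):
        y = gain * e
        for i, a in enumerate(coeffs, start=1):
            if n - i >= 0:
                y += a * out[n - i]
        out.append(y)
    return out

def split_into_frames(samples, frame_len=180):
    """Cut a signal into consecutive non-overlapping segments; a short tail is dropped."""
    last_start = len(samples) - frame_len
    return [samples[k:k + frame_len] for k in range(0, last_start + 1, frame_len)]

# Segment duration for 180 samples at 8 kHz, in milliseconds.
frame_ms = 180 / 8000 * 1000

# Impulse response of a first-order filter (a_1 = 0.5, G = 2.0): each output halves.
response = lpc_synthesize([0.5], 2.0, [1.0, 0.0, 0.0, 0.0, 0.0])
```

In a coder, `lpc_synthesize` would be driven once per segment with the parameters estimated for that segment.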

Typical Scheme of an LPC Coder
Diagram: noise generator and pulse generator (pitch), v/uv switch, gain, vocal tract filter (filter coeffs).

Estimating the Parameters
v/uv estimation
– Based on energy and frequency spectrum.
Pitch-period estimation
– Look for periodicity, either via the a.c.f. or some other measure, for example one that gives a minimum value when p equals the pitch period.
– Typical pitch-periods: samples.
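One measure with the property described above (a minimum when p equals the pitch period) is the average magnitude difference function (AMDF). The slide's own formula did not survive, so treating it as an AMDF is an assumption, as are the function names and the search range of 20 to 80 samples used in this sketch.

```python
import math

def amdf(frame, lag):
    """Average magnitude difference between the frame and a copy shifted by `lag`."""
    count = len(frame) - lag
    return sum(abs(frame[k + lag] - frame[k]) for k in range(count)) / count

def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag p in [min_lag, max_lag] that minimises the AMDF."""
    return min(range(min_lag, max_lag + 1), key=lambda p: amdf(frame, p))

# Synthetic "voiced" segment: a fundamental with period 50 samples plus one harmonic.
frame = [
    math.sin(2 * math.pi * n / 50) + 0.5 * math.sin(4 * math.pi * n / 50)
    for n in range(180)
]

# Hypothetical search range; the upper bound is kept below twice the true period.
period = estimate_pitch_period(frame, 20, 80)
```

On real speech the minimum is less sharp, and multiples of the true period can give similarly low values, so practical estimators add smoothing and range checks.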

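Estimating the Parameters
Vocal tract filter estimation
– Find the filter coefficients that minimize the error $\sigma^2 = E\left[\left(y_n - \sum_i a_i y_{n-i}\right)^2\right]$, i.e. the energy of the prediction residual (the scaled input signal $G \varepsilon_n$).
– Compare to the computation of optimal predictors (Lecture 7).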

Estimating the Parameters
Assuming a stationary signal: $\mathbf{a} = \mathbf{R}^{-1}\mathbf{p}$, where $\mathbf{R}$ and $\mathbf{p}$ contain acf values. This is called the autocorrelation method.
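A minimal sketch of the autocorrelation method, assuming the usual normal equations R a = p with R[i][j] = r(|i-j|) and p[i] = r(i+1). Because R is a Toeplitz matrix, the system can be solved with the Levinson-Durbin recursion instead of a general matrix inverse. The function names and the toy acf values are illustrative assumptions.

```python
def autocorrelation(frame, max_lag):
    """Unnormalised acf values r(0..max_lag) of one segment."""
    return [
        sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
        for k in range(max_lag + 1)
    ]

def levinson_durbin(r, order):
    """Solve R a = p for the predictor coefficients a_1..a_order.

    Returns (coefficients, residual_energy).
    """
    a = [0.0] * order
    err = r[0]
    for m in range(order):
        # Reflection coefficient for model order m + 1.
        acc = r[m + 1] - sum(a[j] * r[m - j] for j in range(m))
        k = acc / err
        new_a = a[:]
        new_a[m] = k
        for j in range(m):
            new_a[j] = a[j] - k * a[m - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err

# Toy acf r(k) = 0.5**k, which a single coefficient a_1 = 0.5 predicts exactly.
coeffs, residual_energy = levinson_durbin([1.0, 0.5, 0.25], 2)
```

The intermediate values `k` are reflection coefficients, so the same recursion also yields an alternative parameter set for the filter as a by-product.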

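Estimating the Parameters
Alternatively, in the case of a non-stationary signal, the correlation values are computed directly from the samples of the current segment instead of from acf values. This is called the autocovariance method.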

Example
Coding of parameters using LPC10 (1984):
Parameter          Bits
v/uv               1
Pitch              6
Voiced filter      46
Unvoiced filter    46
Synchronization    1
Sum (only one of the two filter codings is sent per segment): 54 bits -> 2.4 kbit/s
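The arithmetic behind the 2.4 kbit/s figure can be checked with a few lines. This sketch assumes the segment length of 180 samples at 8 kHz from the earlier LPC example, and that each segment carries either the voiced or the unvoiced filter coding, never both; the dictionary keys are illustrative names.

```python
# Bits per segment, as listed in the table (one filter coding per segment).
LPC10_BITS = {
    "v/uv": 1,
    "pitch": 6,
    "filter": 46,
    "synchronization": 1,
}

SAMPLE_RATE_HZ = 8000
FRAME_SAMPLES = 180  # assumed segment length, i.e. 22.5 ms

bits_per_frame = sum(LPC10_BITS.values())
# bits/segment * segments/second, written so the division is exact.
bitrate_bps = bits_per_frame * SAMPLE_RATE_HZ / FRAME_SAMPLES
```

Compared with 64 kbit/s PCM, this is a reduction by a factor of roughly 27.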

The Vocal Tract Filter
Different representations:
– LPC parameters
– PARCOR (Partial Correlation Coefficients)
– LSF (Line Spectrum Frequencies)

Code Excited Linear Prediction Coding (CELP)
Encoding:
– LPC analysis -> V(z).
– Define a perceptual weighting filter. This permits more noise at formant frequencies, where it will be masked by the speech.
– Synthesise speech using each codebook entry in turn as the input to V(z).
– Calculate the optimum gain to minimise the perceptually weighted error energy in the speech frame.
– Select the codebook entry that gives the lowest error.
– Transmit LPC parameters and codebook index.
Decoding:
– Receive LPC parameters and codebook index.
– Re-synthesise speech using V(z) and the codebook entry.
Performance:
– 16 kbit/s: MOS=4.2, Delay=1.5 ms, 19 MIPS
– 8 kbit/s: MOS=4.1, Delay=35 ms, 25 MIPS
– 2.4 kbit/s: MOS=3.3, Delay=45 ms, 20 MIPS
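The codebook search at the heart of the encoder can be sketched as an exhaustive loop: synthesise each entry through V(z), compute the gain that best matches the target, and keep the entry with the lowest error. To stay short, this sketch omits the perceptual weighting filter and uses the plain squared error; the function names, the all-pole form of V(z), and the tiny codebook are assumptions for illustration.

```python
def synthesize(coeffs, excitation):
    """All-pole filter V(z): y[n] = e[n] + sum_i a_i * y[n-i], zero initial state."""
    out = []
    for n, e in enumerate(excitation):
        y = e + sum(a * out[n - i] for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        out.append(y)
    return out

def celp_search(target, coeffs, codebook):
    """Return (index, gain, error) of the codebook entry that best matches target."""
    best = None
    for index, code in enumerate(codebook):
        synth = synthesize(coeffs, code)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # an all-zero entry cannot match anything
        # Least-squares optimal gain for this entry.
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best

# Hypothetical three-entry codebook and a first-order filter.
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, -1.0], [1.0, 1.0, 1.0, 1.0]]
coeffs = [0.5]
# Target built from entry 1 with gain 2, so the search should recover exactly that.
target = [2.0 * s for s in synthesize(coeffs, codebook[1])]
best_index, best_gain, best_error = celp_search(target, coeffs, codebook)
```

Only `best_index`, the quantised gain, and the LPC parameters would need to be transmitted; the decoder repeats the `synthesize` step with the same codebook.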

Examples
G.728
– V(z) is chosen as a large FIR filter (M ≈ 50).
– The gain and FIR parameters are estimated recursively from previously received samples.
– The code book contains 127 sequences.
GSM
– The code book contains regular pulse trains with variable frequency and amplitudes.
MELP
– Mixed excitation linear prediction.
– The code book is combined with a noise generator.

Other Variations
SELP – Self Excited Linear Prediction
MPLP – Multi-Pulse Excited Linear Prediction
MBE – Multi-Band Excitation Coding

Quality Levels
Quality level             Bandwidth        Bitrate
Broadcast quality         10 kHz           >64 kbit/s
Network (toll) quality    300 – 3400 Hz    16 – 64 kbit/s
Communication quality                      4 – 16 kbit/s
Synthetic quality                          <4 kbit/s

Subjective Assessment
– MOS (Mean Opinion Score): the result of averaging opinion scores from a set of 20 – 60 untrained subjects. They rate the quality from 1 to 5 (1-bad, 2-poor, 3-fair, 4-good, 5-excellent).
– A MOS of 4 or higher defines good or toll quality (network quality): the reconstructed signal is generally indistinguishable from the original.
– A MOS between 3.5 and 4.0 defines communication quality (telephone communications).
– A MOS between 2.5 and 3.5 implies synthetic quality.
– In digital communications, speech quality is classified into four general categories: broadcast, network or toll, communications, and synthetic.
– Broadcast wideband speech is high-quality "commentary" speech, generally achieved at rates above 64 kbit/s.
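The MOS computation and the mapping to quality categories can be written down directly. The thresholds come from the ranges above; how the shared boundary values (exactly 3.5 or 4.0) are assigned is an assumption of this sketch, as are the function names and the label for scores below 2.5.

```python
def mean_opinion_score(ratings):
    """Average a list of listener ratings, each on the 1 (bad) to 5 (excellent) scale."""
    if not ratings or not all(1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be a non-empty list of values from 1 to 5")
    return sum(ratings) / len(ratings)

def quality_category(mos):
    """Map a MOS to a quality category; boundary values go to the higher category."""
    if mos >= 4.0:
        return "network (toll) quality"
    if mos >= 3.5:
        return "communication quality"
    if mos >= 2.5:
        return "synthetic quality"
    return "below synthetic quality"

# Example: four listeners rate the same coded utterance.
mos = mean_opinion_score([4, 5, 4, 3])
category = quality_category(mos)
```

Applied to the CELP figures quoted earlier, a MOS of 4.1 at 8 kbit/s lands in toll quality, while 3.3 at 2.4 kbit/s lands in synthetic quality.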

Subjective Assessment
– DRT (Diagnostic Rhyme Test): listeners should recognise one of the two possible words in a set of rhyming pairs (e.g. meat/heat).
– DAM (Diagnostic Acceptability Measure): trained listeners judge various factors, e.g. muffledness, buzziness, intelligibility.
Figure: quality versus data rate (8 kHz sampling rate).