Speech Processing

References: L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, 1978. Lawrence Rabiner and Biing-Hwang Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993. James H. McClellan et al., Computer-Based Exercises for Signal Processing Using MATLAB 5, Prentice-Hall, 1998.

The sound of spoken words is divided into phonemes. European languages have about forty phonemes. Phonemes fall into two groups: voiced sounds and unvoiced sounds. Voiced sounds are “vowel-like” sounds whose source is in the throat. Unvoiced phonemes are “consonant-like” sounds produced by compressed air blown through the mouth. While unvoiced phonemes are “consonant-like,” not all consonants are unvoiced: a phoneme like “s” is unvoiced, but a phoneme like “z” is voiced.
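The voiced/unvoiced distinction above can be illustrated numerically. A minimal sketch (my own illustration, not part of the lecture): voiced, vowel-like frames tend to have high energy and a low zero-crossing rate, while noise-like unvoiced frames show the opposite.

```python
import numpy as np

def voicing_cues(frame):
    """Crude voiced/unvoiced cues: frame energy and zero-crossing rate."""
    energy = np.mean(frame ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    return energy, zcr

fs = 8000
t = np.arange(0, 0.03, 1 / fs)
voiced = np.sin(2 * np.pi * 150 * t)                       # vowel-like tone
unvoiced = 0.3 * np.random.default_rng(0).standard_normal(t.size)  # noise

e_v, z_v = voicing_cues(voiced)
e_u, z_u = voicing_cues(unvoiced)
# The tone crosses zero far less often than broadband noise.
```

Real classifiers combine several such cues per analysis frame; this only shows the basic contrast.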

Speech production may be modeled by the following block diagram: for voiced sounds, a pulse train excites the glottis block; for unvoiced sounds, random noise serves as the excitation. Either excitation then passes through the vocal tract block and the lip radiation block. (See Figure 10.5 in Computer-Based Exercises for Signal Processing.)

The glottis (in the throat) produces “quasi-periodic” signals (like singing a long note). These signals are modeled as the output of the glottis block. They then pass into the vocal tract block, which models the mouth, nose, and teeth. Finally, the lip radiation block models the lips. Unvoiced sounds have no glottal pulse component and can be modeled with the vocal tract and lip radiation blocks alone. To obtain any kind of sound in that case, the input to the vocal tract and lip radiation blocks cannot simply be a unit step; instead it must be a random process.
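The pipeline described above (excitation → glottis → vocal tract → lips) can be sketched end to end. This is an illustrative Python translation, not the lecture's voice.m: the filter coefficients below are placeholder assumptions (a two-pole glottal low-pass, a single ~700 Hz resonance for the tract, and a first difference for the lips).

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000
n_samples = 4000

# Voiced excitation e[n]: impulse train at a 100 Hz pitch.
e = np.zeros(n_samples)
e[::fs // 100] = 1.0

a = 0.95
deng = np.convolve([1, -a], [1, -a])          # assumed glottal model G(z)
r, theta = 0.98, 2 * np.pi * 700 / fs
denv = [1, -2 * r * np.cos(theta), r ** 2]    # assumed vocal-tract pole V(z)
numr = [1.0, -1.0]                            # lip radiation R(z) = 1 - z^-1

ug = lfilter([1.0], deng, e)    # glottal flow u_G[n]
ul = lfilter([1.0], denv, ug)   # vocal-tract output u_L[n]
pl = lfilter(numr, [1.0], ul)   # lip output p_L[n]
```

Chaining three `lfilter` calls mirrors the cascade of blocks in the diagram; the actual models used in the lecture are developed on the following slides.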

Referring again to the block diagram (pulse train or random noise → glottis → vocal tract → lip radiation), let us attach symbols to these signals and systems: e[n], G(z), V(z), R(z), u_G[n], u_L[n], and p_L[n].

e[n] is a periodic pulse train.
G(z) is the transfer function of the glottis.
u_G[n] is the glottis output.
V(z) is the transfer function of the vocal tract.
R(z) is the transfer function of the lips.
u_L[n] is the output of the vocal tract.
p_L[n] is the output of the lips.

The glottal transfer function G(z) is represented by an exponential model. The symbol e denotes the base of natural logarithms, and the parameter a is a value less than one that corresponds to the natural frequency of the glottis (which varies from speaker to speaker: man to woman, child to adult, etc.).

The frequency response of G(z) for various values of a is shown on the following slide. (Graph printed using glottal.m.)
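The glottal.m script itself is not reproduced here, but its plot can be approximated. The sketch below assumes the common two-pole exponential glottal model G(z) = 1 / (1 − a·z⁻¹)², with a < 1; this specific form is my assumption, since the slide's equation is not in the transcript.

```python
import numpy as np
from scipy.signal import freqz

def glottal_response(a, n_points=512):
    """Magnitude response of the assumed model G(z) = 1 / (1 - a z^-1)^2."""
    den = np.convolve([1, -a], [1, -a])   # (1 - a z^-1)^2 in coefficient form
    w, h = freqz([1.0], den, worN=n_points)
    return w, np.abs(h)

for a in (0.90, 0.95, 0.99):
    w, mag = glottal_response(a)
    # Larger a pushes the double pole toward z = 1, steepening the
    # low-pass roll-off, which is the behavior the slide's plot shows.
```

Plotting `20*np.log10(mag)` against `w` for each `a` reproduces a family of low-pass curves like the slide's figure.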

The vocal tract V(z) can be modeled as a sequence of “lossless tubes” carrying the flow from u_G[n] to u_L[n]. Each “tube” k has a cross-sectional area A_k.

The vocal tract transfer function V(z) will be represented by the model V(z) = G / D(z). The parameters r_k, which correspond to reflection coefficients along the vocal tract, are found from the tube areas: r_k = (A_{k+1} − A_k) / (A_{k+1} + A_k).

The denominator D(z) is found from the recursive relationship D_k(z) = D_{k−1}(z) + r_k z^{−k} D_{k−1}(z^{−1}), starting with D_0(z) = 1 and ending with D(z) = D_N(z). Here the A_k (k = 1, …, N) are parameters corresponding to cross-sectional areas of the vocal tract. (These values are given for a particular phoneme.)

The numerator G of V(z) is found by G = ∏ (1 + r_k), taking the product over the reflection coefficients. Finally, the lip radiation transfer function is modeled as a first difference, R(z) = 1 − z^{−1}.
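The area-to-filter conversion can be sketched compactly. This is a Python rendering of the computation the slides assign to AtoV(), with R(z) = 1 − z⁻¹ folded into the numerator; the two-tube areas in the example are made up purely for illustration.

```python
import numpy as np

def area_to_filter(A):
    """Turn tube areas A[0..N] into numerator/denominator coefficients
    for V(z)R(z) = G (1 - z^-1) / D(z)."""
    # Reflection coefficients at each tube junction.
    r = [(A[k + 1] - A[k]) / (A[k + 1] + A[k]) for k in range(len(A) - 1)]
    D = np.array([1.0])                     # D_0(z) = 1
    G = 1.0
    for rk in r:
        # D_k(z) = D_{k-1}(z) + r_k z^-k D_{k-1}(z^-1), in coefficient form.
        D = np.append(D, 0.0) + rk * np.append(0.0, D[::-1])
        G *= 1.0 + rk
    numv = G * np.array([1.0, -1.0])        # fold in R(z) = 1 - z^-1
    return numv, D

# Two-tube toy example (areas chosen arbitrarily):
numv, denv = area_to_filter([1.0, 3.0])
```

For the toy areas [1, 3], the single reflection coefficient is 0.5, so the recursion gives D(z) = 1 + 0.5 z⁻¹ and G = 1.5.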

The previous voice model was implemented in MATLAB in a script file called voice.m. The vocal tract transfer function V(z) parameters are computed by a MATLAB function called AtoV(). The glottal transfer function G(z) coefficients are assigned to arrays numg and deng. The vocal tract/lip radiation transfer function V(z)R(z) coefficients are assigned to arrays numv and denv.

In voice.m, the blocks of the diagram map onto MATLAB variables as follows: the pulse train or random-noise excitation e[n] drives the glottal filter G(z), whose coefficients are stored in numg and deng, and AtoV() produces the coefficients numv and denv for the combined V(z)R(z).

Inside AtoV(), the reflection coefficients are accumulated from the tube areas:

r = [];
for k = 1:N-1
    r = [r (A(k+1)-A(k))/(A(k+1)+A(k))];
end

The denominator coefficients D and the numerator gain G are then built recursively:

D = 1;
G = 1;
for k = 1:length(r)
    D = [D 0] + r(k).*[0 fliplr(D)];
    G = G*(1+r(k));
end

Voiced speech (the array p is a pulse train):

ug = 0.1*filter(numg,deng,p);
pl = filter(numv,denv,ug);

Unvoiced speech:

ug = 0.01*randn(1,10000);
pl = filter(numv,denv,ug);
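The same voiced/unvoiced synthesis can be written in Python with scipy.signal.lfilter, which plays the role of MATLAB's filter(). The coefficient arrays below are stand-ins for the script's numg/deng and numv/denv, not the lecture's actual values.

```python
import numpy as np
from scipy.signal import lfilter

fs, f0 = 8000, 100
rng = np.random.default_rng(0)

# Placeholder coefficients standing in for numg/deng and numv/denv.
numg, deng = [1.0], np.convolve([1, -0.95], [1, -0.95])
numv, denv = [1.0, -1.0], [1, -1.2, 0.81]   # poles at radius 0.9 (stable)

# Voiced: pulse train -> glottis -> vocal tract + lips.
p = np.zeros(10000)
p[::fs // f0] = 1.0
ug_voiced = 0.1 * lfilter(numg, deng, p)
pl_voiced = lfilter(numv, denv, ug_voiced)

# Unvoiced: white noise drives the vocal tract / lip filter directly.
ug_unvoiced = 0.01 * rng.standard_normal(10000)
pl_unvoiced = lfilter(numv, denv, ug_unvoiced)
```

Note that only the excitation changes between the two cases; the vocal tract and lip filters are identical, exactly as in the MATLAB snippet above.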

Given the vocal tract areas A_k for a given vowel, we can synthesize that vowel. In the following demonstration, we synthesize the phonemes AA and IY. The phoneme AA is like a short a (ă); the phoneme IY is like a long e (ē).

AA voiced (aav.wav)
AA unvoiced (aau.wav)
IY voiced (iyv.wav)
IY unvoiced (iyu.wav)
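The demonstration files above are ordinary WAV files. A small sketch of how such output could be written from Python using only the standard library's wave module; the 200 Hz placeholder tone stands in for the synthesized phoneme signal, and only the filename aav.wav comes from the slide.

```python
import wave
import numpy as np

def write_wav(path, signal, fs=8000):
    """Write a float signal (roughly in [-1, 1]) as 16-bit mono PCM."""
    pcm = (np.clip(signal, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)       # mono
        w.setsampwidth(2)       # 16-bit samples
        w.setframerate(fs)
        w.writeframes(pcm.tobytes())

# Placeholder signal; in the demo this would be the synthesized phoneme.
tone = 0.5 * np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)
write_wav("aav.wav", tone)
```

MATLAB's wavwrite/audiowrite performs the equivalent step in the original voice.m workflow.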