Speech Recognition Acoustic Theory of Speech Production.

Slides:



Advertisements
Similar presentations
Acoustic Characteristics of Consonants
Advertisements

Uniform plane wave.
Types, characteristics, properties
EEE 498/598 Overview of Electrical Engineering
Waves and Sound Honors Physics. What is a wave A WAVE is a vibration or disturbance in space. A MEDIUM is the substance that all SOUND WAVES travel through.
Chapter-3-1CS331- Fakhry Khellah Term 081 Chapter 3 Data and Signals.
TRANSMISSION FUNDAMENTALS Review
Sound Chapter 15.
Comments, Quiz # 1. So far: Historical overview of speech technology - basic components/goals for systems Quick overview of pattern recognition basics.
Basic Spectrogram Lab 8. Spectrograms §Spectrograph: Produces visible patterns of acoustic energy called spectrograms §Spectrographic Analysis: l Acoustic.
ECE 598: The Speech Chain Lecture 8: Formant Transitions; Vocal Tract Transfer Function.
ACOUSTICAL THEORY OF SPEECH PRODUCTION
The Human Voice Chapters 15 and 17. Main Vocal Organs Lungs Reservoir and energy source Larynx Vocal folds Cavities: pharynx, nasal, oral Air exits through.
PH 105 Dr. Cecilia Vogel Lecture 14. OUTLINE  consonants  vowels  vocal folds as sound source  formants  speech spectrograms  singing.
Complete Discrete Time Model Complete model covers periodic, noise and impulsive inputs. For periodic input 1) R(z): Radiation impedance. It has been shown.
Experiment with the Slinky
It was assumed that the pressureat the lips is zero and the volume velocity source is ideal  no energy loss at the input and output. For radiation impedance:
1 Fall 2004 Physics 3 Tu-Th Section Claudio Campagnari Lecture 3: 30 Sep Web page:
Physics of Sound Wave equation: Part. diff. equation relating pressure and velocity as a function of time and space Nonlinear contributions are not considered.
Anatomic Aspects Larynx: Sytem of muscles, cartileges and ligaments.
Ultrasound Imaging Atam Dhawan.
7/5/20141FCI. Prof. Nabila M. Hassan Faculty of Computer and Information Fayoum University 2013/2014 7/5/20142FCI.
Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology Mark Hasegawa-Johnson
9.6 Other Heat Conduction Problems
Vibrations, Waves, & Sound
Waves and Sound AP Physics 1. What is a wave A WAVE is a vibration or disturbance in space. A MEDIUM is the substance that all SOUND WAVES travel through.
Chapter 12 Preview Objectives The Production of Sound Waves
Speech Production1 Articulation and Resonance Vocal tract as resonating body and sound source. Acoustic theory of vowel production.
Acoustic Phonetics 3/9/00. Acoustic Theory of Speech Production Modeling the vocal tract –Modeling= the construction of some replica of the actual physical.
Introduction to Vibrations and Waves
Sound Waves Sound waves are divided into three categories that cover different frequency ranges Audible waves lie within the range of sensitivity of the.
Speech Science Fall 2009 Oct 28, Outline Acoustical characteristics of Nasal Speech Sounds Stop Consonants Fricatives Affricates.
Waves and Sound Level 1 Physics.
Chapter 17 Sound Waves: part one. Introduction to Sound Waves Sound waves are longitudinal waves They travel through any material medium The speed of.
♥♥♥♥ 1. Intro. 2. VTS Var.. 3. Method 4. Results 5. Concl. ♠♠ ◄◄ ►► 1/181. Intro.2. VTS Var..3. Method4. Results5. Concl ♠♠◄◄►► IIT Bombay NCC 2011 : 17.
Holt Physics Chapter 12 Sound.
Chapter 14 Sound. Sound waves Sound – longitudinal waves in a substance (air, water, metal, etc.) with frequencies detectable by human ears (between ~
Fundamentals of Audio Production. Chapter 1 1 Fundamentals of Audio Production Chapter One: The Nature of Sound.
Formants, Resonance, and Deriving Schwa March 10, 2009.
Structure of Spoken Language
© Houghton Mifflin Harcourt Publishing Company Preview Objectives The Production of Sound Waves Frequency of Sound Waves The Doppler Effect Chapter 12.
WAVES Wave motion allows a transfer of energy without a transfer of matter.
Speech Science VI Resonances WS Resonances Reading: Borden, Harris & Raphael, p Kentp Pompino-Marschallp Reetzp
Resonance October 23, 2014 Leading Off… Don’t forget: Korean stops homework is due on Tuesday! Also new: mystery spectrograms! Today: Resonance Before.
Lecture 21-22: Sound Waves in Fluids Sound in ideal fluid Sound in real fluid. Attenuation of the sound waves 1.
Chapters 16, 17 Waves.
Slide 17-1 Lecture Outline Chapter 17 Waves in Two and Three Dimensions © 2015 Pearson Education, Inc.
1 Linear Wave Equation The maximum values of the transverse speed and transverse acceleration are v y, max =  A a y, max =  2 A The transverse speed.
EE Audio Signals and Systems Wave Basics Kevin D. Donohue Electrical and Computer Engineering University of Kentucky.
Chapter 12 Preview Objectives The Production of Sound Waves
Acoustic Tube Modeling (I) 虞台文. Content Introduction Wave Equations for Lossless Tube Uniform Lossless Tube Lips-Radiation Model Glottis Model One-Tube.
ARCHITECTURAL ACOUSTICS
Physics Mrs. Dimler SOUND.  Every sound wave begins with a vibrating object, such as the vibrating prong of a tuning fork. Tuning fork and air molecules.
Lecture #28: Waves and Sound AP Physics B. wave direction What is a wave? A wave is a traveling disturbance. A wave carries energy from place to place.
Physics 1 What is a wave? A wave is: an energy-transferring disturbance moves through a material medium or a vacuum.
AN ANALOG INTEGRATED- CIRCUIT VOCAL TRACT PRESENTED BY: NIEL V JOSEPH S7 AEI ROLL NO-46 GUIDED BY: MR.SANTHOSHKUMAR.S ASST.PROFESSOR E&C DEPARTMENT.
Sound.
Waves and Sound AP Physics B.
Chapter 11. The uniform plane wave
Hearing Biomechanics Standing waves.
Waves and Sound.
Waves and Sound AP Physics B.
11-3: PROPERTIES OF WAVES.
11-3: PROPERTIES OF WAVES.
Waves and Sound Honors Physics.
Waves and Sound AP Physics B.
Waves and Sound AP Physics 1.
Waves and Sound.
Waves and Sound AP Physics B.
Waves and Sound AP Physics 1.
Presentation transcript:

Speech Recognition Acoustic Theory of Speech Production

16 May 2015Veton Këpuska2 Acoustic Theory of Speech Production  Overview  Sound sources  Vocal tract transfer function Wave equations Sound propagation in a uniform acoustic tube  Representing the vocal tract with simple acoustic tubes  Estimating natural frequencies from area functions  Representing the vocal tract with multiple uniform tubes

16 May 2015Veton Këpuska3 Anatomical Structures for Speech Production

16 May 2015Veton Këpuska4 Places of Articulation for Speech Sounds

16 May 2015Veton Këpuska5 Phonemes in American English

SPHINX Phone Set PhonemeExampleTranslation 1AAoddAA D 2AEatAE T 3AHhutHH AH T 4AOoughtAO T 5AWcowK AW 6AYhideHH AY D 7BbeB IY 8CHdeeD IY 9DHtheeDH IY 10EHEdEH D 11ERhurtHH ER T 12EYateEY T 13FfeeF IY 14GgreenG R IY N 15HHheHH IY 16IHitIH T 17IYeatIY T 18JHgeeJH IY 16 May 2015Veton Këpuska6 PhonemeExampleTranslation 19KkeyK IY 20LleeL IY 21MmeM IY 22NkneeN IY 23NGpingP IH NG 24OWoatOW T 25OYtoyT OY 26PpeeP IY 27RreadR IY D 28SseaS IY 29SHsheSH IY 30TteaT IY 31THthetaTH EY T AH 32UHtwoT UW 33VveeV IY 34YyieldY IY L D 35ZzeeZ IY 36ZHseizureS IY ZH ER

16 May 2015Veton Këpuska7 Speech Waveform: An Example

16 May 2015Veton Këpuska8 A Wideband Spectrogram

16 May 2015Veton Këpuska9

A Narrowband Spectrogram 16 May 2015Veton Këpuska10

16 May 2015Veton Këpuska11 Physics of Sound  Sound Generation: Vibration of particles in a medium (e.g., air, water).  Speech Production: Perturbation of air particles near the lips.  Speech Communication: Propagation of particle vibrations/perturbations as chain reaction through free space (e.g., a medium like air) from the source (i.e., lips of a speaker) to the destination (i.e., ear of a listener). Listener’s ear ear-drum caused vibrations trigger series of transductions initiated by this mechanical motion leading to neural firing ultimately perceived by the brain.

16 May 2015Veton Këpuska12 Physics of Sound  A sound wave is the propagation of a disturbance of particles through an air medium (or more generally any conducting medium) without the permanent displacement of the particles themselves.  Alternating compression and rarefaction phases create a traveling wave.  Associated with disturbance are local changes in particle: Pressure Displacement Velocity

16 May 2015Veton Këpuska13 Physics of Sound  Sound wave: Wavelength, : distance between two consecutive peak compressions (or rarefactions) in space (not in time).  Wavelength,,is also the distance the wave travels in one cycle of the vibration of air particles. Frequency, f: is the number of cycles of compression (or rarefaction) of air particle vibration per second.  Wave travels a distance of f wavelengths in one second. Velocity of sound, c: is thus given by c = f.  At sea level and temperature of 70 o F, c=344 m/s. Wavenumber, k:  Radian frequency: =2f  /c=2/=k

16 May 2015Veton Këpuska14 Traveling Wave

16 May 2015Veton Këpuska15 Physics of Sound  Suppose the frequency of a sound wave is f = 50 Hz, 1000 Hz, and Hz. Also assume that the velocity of sound at sea level is c = 344 m/s.  The wavelength of sound wave is respectively: = 6.88 m, m and m.  Speech sounds have wide range of wavelengths values:  Audio range: f min = 30 Hz ⇒=11.5 m f max = 20 kHz ⇒= m  In audible range a propagation of sound wave is considered to be an adiabatic process, that is, heat generated by particle collision during pressure fluctuations, has not time to dissipate away and therefore temperature changes occur locally in the medium.

16 May 2015Veton Këpuska16 Acoustic Theory of Speech Production  The acoustic characteristics of speech are usually modeled as a sequence of source, vocal tract filter, and radiation characteristics P r (jΩ) = S(jΩ) T (jΩ) R(jΩ)  For vowel production: S(jΩ) = U G (jΩ) T (jΩ) = U L (jΩ) /U G (jΩ) R(jΩ) = P r (jΩ) /U L (jΩ)

16 May 2015Veton Këpuska17 Sound Source: Vocal Fold Vibration  Modeled as a volume velocity source at glottis, U G (jΩ)

16 May 2015Veton Këpuska18 Sound Source: Turbulence Noise  Turbulence noise is produced at a constriction in the vocal tract Aspiration noise is produced at glottis Frication noise is produced above the glottis  Modeled as series pressure source at constriction, P S (jΩ)

16 May 2015Veton Këpuska19 Vocal Tract Wave Equations  Define: u(x,t) ⇒ particle velocity U(x,t) ⇒ volume velocity (U = uA) p(x,t) ⇒ sound pressure variation (P = P O + p) ρ ⇒ density of air c ⇒ velocity of sound  Assuming plane wave propagation (for across dimension ≪ λ), and a one-dimensional wave motion, it can be shown that:

16 May 2015Veton Këpuska20 The Plane Wave Equation  First form of Wave Equation:  Second form is obtained by differentiating equations above with respect to x and t respectively:

16 May 2015Veton Këpuska21 Solution of Wave Equations

16 May 2015Veton Këpuska22 Propagation of Sound in a Uniform Tube  The vocal tract transfer function of volume velocities is

16 May 2015Veton Këpuska23 Analogy with Electrical Circuit Transmission Line

16 May 2015Veton Këpuska24 Propagation of Sound in a Uniform Tube  Using the boundary conditions U (0,s)=U G (s) and P(- l,s)=0  The poles of the transfer function T (jΩ) are where cos(Ω l /c)=0

16 May 2015Veton Këpuska25 Propagation of Sound in a Uniform Tube (con’t)  For c =34,000 cm/sec, l =17 cm, the natural frequencies (also called the formants) are at 500 Hz, 1500 Hz, 2500 Hz, …  The transfer function of a tube with no side branches, excited at one end and response measured at another, only has poles  The formant frequencies will have finite bandwidth when vocal tract losses are considered (e.g., radiation, walls, viscosity, heat)  The length of the vocal tract, l, corresponds to 1/4λ 1, 3/4λ 2, 5/4λ 3, …, where λ i is the wavelength of the ith natural frequency

16 May 2015Veton Këpuska26 Uniform Tube Model  Example Consider a uniform tube of length l =35 cm. If speed of sound is 350 m/s calculate its resonances in Hz. Compare its resonances with a tube of length l = 17.5 cm. f=/2 ⇒

16 May 2015Veton Këpuska27 Uniform Tube Model  For 17.5 cm tube:

16 May 2015Veton Këpuska28 Standing Wave Patterns in a Uniform Tube  A uniform tube closed at one end and open at the other is often referred to as a quarter wavelength resonator

16 May 2015Veton Këpuska29 Natural Frequencies of Simple Acoustic Tubes

16 May 2015Veton Këpuska30 Approximating Vocal Tract Shapes

16 May 2015Veton Këpuska l Estimating Natural Resonance Frequencies  Resonance frequencies occur where impendence (or admittance) function equals natural (e.g., open circuit) boundary conditions  For a two tube approximation it is easiest to solve for Y 1 + Y 2 = 0.

16 May 2015Veton Këpuska32 Decoupling Simple Tube Approximations  If A1 ≫ A2,or A1 ≪ A2, the tubes can be decoupled and natural frequencies of each tube can be computed independently  For the vowel /i y /, the formant frequencies are obtained from:  At low frequencies:  This low resonance frequency is called the Helmholtz resonance.

16 May 2015Veton Këpuska33 Vowel Production Example FormantActualEstimatedFormantActualEstimated F F F F F F32917 ………………

16 May 2015Veton Këpuska34 Example of Vowel Spectrograms

16 May 2015Veton Këpuska35 Estimating Anti-Resonance Frequencies (Zeros) Zeros occur at frequencies where there is no measurable output  For nasal consonants, zeros in U N occur where Y o = ∞  For fricatives or stop consonants, zeros in U L occur where the impedance behind source is infinite (i.e., a hard wall at source)

16 May 2015Veton Këpuska36 Estimating Anti-Resonance Frequencies (Zeros)  Zeros occur when measurements are made in vocal tract interior:

16 May 2015Veton Këpuska37 Consonant Production

16 May 2015Veton Këpuska38 Example of Consonant Spectrograms

16 May 2015Veton Këpuska39 Perturbation Theory  Consider a uniform tube, closed at one end and open at the other.  Reducing the area of a small piece of the tube near the opening (where U is max) has the same effect as keeping the area fixed and lengthening the tube.  Since lengthening the tube lowers the resonant frequencies, narrowing the tube near points where U (x) is maximum in the standing wave pattern for a given formant decreases the value of that formant.

16 May 2015Veton Këpuska40 Perturbation Theory (Con’t)  Reducing the area of a small piece of the tube near the closure (where p is max) has the same effect as keeping the area fixed and shortening the tube  Since shortening the tube will increase the values of the formants, narrowing the tube near points where p(x) is maximum in the standing wave pattern for a given formant will increase the value of that formant

16 May 2015Veton Këpuska41 Summary of Perturbation Theory Results

16 May 2015Veton Këpuska42 Illustration of Perturbation Theory

16 May 2015Veton Këpuska43 Illustration of Perturbation Theory

16 May 2015Veton Këpuska44 Multi-Tube Approximation of the Vocal Tract  We can represent the vocal tract as a concatenation of N lossless tubes with constant area {Ak}.and equal length Δx = l /N  The wave propagation time through each tube is  =Δx/c = l /Nc

16 May 2015Veton Këpuska45 A Discrete-Time Model Based on Tube Concatenation  Frequency response of vocal tract, V a ()=U(l,)/U g (), is easy to obtain due to linearity of the model. Radiation impedance can be modified to match observed formant bandwidths. Concatenated tube model leads to resulting all-pole model which in turn leads to linear prediction speech analysis. Draw back of this technique is that although frequency response predicted from concatenated tube model can be made to approximately match spectral measurements, the concatenated tube model is less accurate in representing the physics of sound propagation than the coupled partial differential equation models.  The contributions of energy loss from: Vibrating walls Viscosity, Thermal conduction, as well as Nonlinear coupling between the glottal and vocal tract airflow, Are not represented in the lossless concatenated tube model.

16 May 2015Veton Këpuska46 Sound Propagation in the Concatenated Tube Model  Consider an N-tube model of the previous figure. Each tube has length l k and cross sectional area of A k.  Assume: No losses Planar wave propagation  The wave equations for section k: 0≤x≤l k are of the form: where x is measured from the left-hand side (0 ≤x ≤Δx)

16 May 2015Veton Këpuska47 Sound Propagation in the Concatenated Tube Model  Boundary conditions:  Physical principle of continuity: Pressure and volume velocity must be continuous both in time and in space everywhere in the system:  At k’th/(k+1)’st junction we have:

16 May 2015Veton Këpuska48 Update Expression at Tube Boundaries  We can solve update expressions using continuity constraints at tube boundaries e.g., p k (Δx,t)= p k+1 (0,t),and U k (Δx,t)= U k+1 (0,t)

16 May 2015Veton Këpuska49 Digital Model of Multi-Tube Vocal Tract  Updates at tube boundaries occur synchronously every 2  If excitation is band-limited, inputs can be sampled every T =2  Each tube section has a delay of z -1/2.  The choice of N depends on the sampling rate T

16 May 2015Veton Këpuska50 Acoustic Theory of Speech Production  References Stevens, Acoustic Phonetics, MIT Press, Rabiner & Schafer, Digital Processing of Speech Signals, Prentice-Hall, Quatieri, Discrete-time Speech Signal Processing Principles and Practice, Prentice-Hall, 2002