ECE 598: The Speech Chain Lecture 8: Formant Transitions; Vocal Tract Transfer Function.

Slides:

Advertisements

Similar presentations

Perturbation Theory, part 2 November 4, 2014 Before I forget Course project report #3 is due! I have course project report #4 guidelines to hand out.

Advertisements

SPPA 403 Speech Science1 Unit 3 outline The Vocal Tract (VT) Source-Filter Theory of Speech Production Capturing Speech Dynamics The Vowels The Diphthongs.

Basic Spectrogram & Clinical Application: Consonants

Acoustic Characteristics of Consonants

Vowel Formants in a Spectogram Nural Akbayir, Kim Brodziak, Sabuha Erdogan.

1 CS 551/651: Structure of Spoken Language Lecture 4: Characteristics of Manner of Articulation John-Paul Hosom Fall 2008.

From Resonance to Vowels March 8, 2013 Friday Frivolity Some project reports to hand back… Mystery spectrogram reading exercise: solved! We need to plan.

Perturbation Theory March 11, 2013 Just So You Know The Fourier Analysis/Vocal Tract exercise is due on Wednesday. Please note: don’t make too much out.

1 CS 551/651: Structure of Spoken Language Spectrogram Reading: Stops John-Paul Hosom Fall 2010.

Vowels and Tubes (again) March 22, 2011 Today’s Plan Perception experiment! Discuss vowel theory #2: tubes! Then: some thoughts on music. First: let’s.

SPPA 403 Speech Science1 Unit 3 outline The Vocal Tract (VT) Source-Filter Theory of Speech Production Capturing Speech Dynamics The Vowels The Diphthongs.

ACOUSTICS OF SPEECH AND SINGING MUSICAL ACOUSTICS Science of Sound, Chapters 15, 17 P. Denes & E. Pinson, The Speech Chain (1963, 1993) J. Sundberg, The.

Speech Science XII Speech Perception (acoustic cues) Version

Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology Mark Hasegawa-Johnson

Basic Spectrogram Lab 8. Spectrograms §Spectrograph: Produces visible patterns of acoustic energy called spectrograms §Spectrographic Analysis: l Acoustic.

The Human Voice. I. Speech production 1. The vocal organs

ACOUSTICAL THEORY OF SPEECH PRODUCTION

The Human Voice Chapters 15 and 17. Main Vocal Organs Lungs Reservoir and energy source Larynx Vocal folds Cavities: pharynx, nasal, oral Air exits through.

Structure of Spoken Language

PH 105 Dr. Cecilia Vogel Lecture 14. OUTLINE  consonants  vowels  vocal folds as sound source  formants  speech spectrograms  singing.

Eva Björkner Helsinki University of Technology Laboratory of Acoustics and Audio Signal Processing HUT, Helsinki, Finland KTH – Royal Institute of Technology.

Phonetics (Part 1) Dr. Ansa Hameed.

Sonorant Acoustics March 23, 2010 Announcements and Such Collect course reports Give back extra credits Hand out new course project guidelines Also:

It was assumed that the pressureat the lips is zero and the volume velocity source is ideal  no energy loss at the input and output. For radiation impedance:

Vowels Vowels: Articulatory Description (Ferrand, 2001) Tongue Position.

Physics of Sound Wave equation: Part. diff. equation relating pressure and velocity as a function of time and space Nonlinear contributions are not considered.

Anatomic Aspects Larynx: Sytem of muscles, cartileges and ligaments.

1 Lab Preparation Initial focus on Speaker Verification –Tools –Expertise –Good example “Biometric technologies are automated methods of verifying or recognising.

Representing Acoustic Information

Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology Mark Hasegawa-Johnson

Source/Filter Theory and Vowels February 4, 2010.

Laterals + Nasals December 2, 2009 Administrative Stuff Friday: some notes on audition (hearing) Also: in-class USRIs Production Exercise #4 is due on.

Speech Production1 Articulation and Resonance Vocal tract as resonating body and sound source. Acoustic theory of vowel production.

Resonance, Revisited March 4, 2013 Leading Off… Project report #3 is due! Course Project #4 guidelines to hand out. Today: Resonance Before we get into.

Acoustic Phonetics 3/9/00. Acoustic Theory of Speech Production Modeling the vocal tract –Modeling= the construction of some replica of the actual physical.

MUSIC 318 MINI-COURSE ON SPEECH AND SINGING

Speech Science Fall 2009 Oct 26, Consonants Resonant Consonants They are produced in a similar way as vowels i.e., filtering the complex wave produced.

Speech Science VII Acoustic Structure of Speech Sounds WS

Vowel Acoustics November 2, 2012 Some Announcements Mid-terms will be back on Monday… Today: more resonance + the acoustics of vowels Also on Monday:

ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Speech Science Fall 2009 Oct 28, Outline Acoustical characteristics of Nasal Speech Sounds Stop Consonants Fricatives Affricates.

Voice Quality + Stop Acoustics

Say “blink” For each segment (phoneme) write a script using terms of the basic articulators that will say “blink.” Consider breathing, voicing, and controlling.

Transitions + Perception March 27, 2012 Tidbits First: Guidelines for the final project report So far, I have two people who want to present their projects.

Sonorant Acoustics March 24, 2009 Announcements and Such Collect course reports Give back homeworks Hand out new course project guidelines.

Speech Science VI Resonances WS Resonances Reading: Borden, Harris & Raphael, p Kentp Pompino-Marschallp Reetzp

Properties of Sound Making Waves. Sound Waves ■Sound is created by vibrations.

Resonance October 23, 2014 Leading Off… Don’t forget: Korean stops homework is due on Tuesday! Also new: mystery spectrograms! Today: Resonance Before.

Vowel Acoustics March 10, 2014 Some Announcements Today and Wednesday: more resonance + the acoustics of vowels On Friday: identifying vowels from spectrograms.

From Resonance to Vowels March 13, 2012 Fun Stuff (= tracheotomy) Peter Ladefoged: “To record the pressure of the air associated with stressed as opposed.

EEL 6586: AUTOMATIC SPEECH PROCESSING Speech Features Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 27,

Vocal Tract & Lip Shape Estimation By MS Shah & Vikash Sethia Supervisor: Prof. PC Pandey EE Dept, IIT Bombay AIM-2003, EE Dept, IIT Bombay, 27 th June,

Sonorant Acoustics + Place Transitions

More On Linear Predictive Analysis

Stop Acoustics and Glides December 2, 2013 Where Do We Go From Here? The Final Exam has been scheduled! Wednesday, December 18 th 8-10 am (!) Kinesiology.

Stop + Approximant Acoustics

P105 Lecture #27 visuals 20 March 2013.

Acoustic Phonetics 3/14/00.

EEL 6586: AUTOMATIC SPEECH PROCESSING Speech Features Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 20,

Phonetics: A lecture Raung-fu Chung Southern Taiwan University

Sonorant Acoustics March 4, 2010.

The Human Voice. 1. The vocal organs

Structure of Spoken Language

Structure of Spoken Language

The Human Voice. 1. The vocal organs

The Production of Speech

Resonances of the Vocal Tract

ECE 598: The Speech Chain Lecture 6: Vowels.

Speech Perception (acoustic cues)

Presentation transcript:

ECE 598: The Speech Chain Lecture 8: Formant Transitions; Vocal Tract Transfer Function

Today Perturbation Theory: Perturbation Theory: A different way to estimate vocal tract resonant frequencies, useful for consonant transitions A different way to estimate vocal tract resonant frequencies, useful for consonant transitions Syllable-Final Consonants: Formant Transitions Syllable-Final Consonants: Formant Transitions Vocal Tract Transfer Function Vocal Tract Transfer Function Uniform Tube (Quarter-Wave Resonator) Uniform Tube (Quarter-Wave Resonator) During Vowels: All-Pole Spectrum During Vowels: All-Pole Spectrum Q Bandwidth Bandwidth Nasal Vowels: Sum of two transfer functions gives spectral zeros Nasal Vowels: Sum of two transfer functions gives spectral zeros

Topic #1: Perturbation Theory

Perturbation Theory (Chiba and Kajiyama, The Vowel, 1940) A(x) is constant everywhere, except for one small perturbation. Method: 1. Compute formants of the “unperturbed” vocal tract. 2. Perturb the formant frequencies to match the area perturbation.

Conservation of Energy Under Perturbation

“ Sensitivity ” Functions

Sensitivity Functions for the Quarter- Wave Resonator (Lips Open)  L /AA//ER/ /IY//W/ Note: low F3 of /er/ is caused in part by a side branch under the tongue – perturbation alone is not enough to explain it.

Sensitivity Functions for the Half- Wave Resonator (Lips Rounded)  L /L,OW//UW/ Note: high F3 of /l/ is caused in part by a side branch above the tongue – perturbation alone is not enough to explain it.

Formant Frequencies of Vowels From Peterson & Barney, 1952

Topic #2: Formant Transitions, Syllable-Final Consonant

Events in the Closure of a Nasal Consonant Vowel Nasalization Formant Transitions Nasal Murmur

Formant Transitions: A Perturbation Theory Model

Formant Transitions: Labial Consonants “the mom” “the bug”

Formant Transitions: Alveolar Consonants “the tug” “the supper”

Formant Transitions: Post-alveolar Consonants “the shoe” “the zsazsa”

Formant Transitions: Velar Consonants “the gut” “sing a song”

Topic #3: Vocal Tract Transfer Functions

Transfer Function “Transfer Function” T(  )=Output(  )/Input(  ) “Transfer Function” T(  )=Output(  )/Input(  ) In speech, it’s convenient to write T(  )=U L (  )/U G (  ) In speech, it’s convenient to write T(  )=U L (  )/U G (  ) U L (  ) = volume velocity at the lips U L (  ) = volume velocity at the lips U G (  ) = volume velocity at the glottis U G (  ) = volume velocity at the glottis T(0) = 1 T(0) = 1 Speech recorded at a microphone = pressure Speech recorded at a microphone = pressure P R (  ) = R(  )T(  )U G (  ) P R (  ) = R(  )T(  )U G (  ) R(  ) = j  f/r = “radiation characteristic” R(  ) = j  f/r = “radiation characteristic”  = density of air  = density of air r = distance to the microphone r = distance to the microphone f = frequency in Hertz f = frequency in Hertz

Transfer Function of an Ideal Uniform Tube Ideal Terminations: Ideal Terminations: Reflection coefficient at glottis: zero velocity,  =1 Reflection coefficient at glottis: zero velocity,  =1 Reflection coefficient at lips: zero pressure,  =  1 Reflection coefficient at lips: zero pressure,  =  1 Obviously, this is an approximation, but it gives… Obviously, this is an approximation, but it gives… T(  ) = 1/cos(  L/c) = …              …  n = n  c/L –  c/2L F n = nc/2L – c/4L 122232…122232…

Transfer Function of an Ideal Uniform Tube Peaks are actually infinite in height (figure is clipped to fit the display)

Transfer Function of a Non-Ideal Uniform Tube Almost ideal terminations: Almost ideal terminations: At glottis: velocity almost zero,  ≈1 At glottis: velocity almost zero,  ≈1 At lips: pressure almost zero,  ≈  1 At lips: pressure almost zero,  ≈  1 T(  ) = 1/(j/Q +cos(  L/c)) … at F n =nc/2L – c/4L,… T(2  F n ) =  jQ 20log 10 |T(2  F n )| = 20log 10 Q

Transfer Function of a Non-Ideal Uniform Tube

Transfer Function of a Vowel: Height of First Peak is Q 1 =F 1 /B 1 T(  ) =  (j  +j2  F n +  B n )(j  j2  F n +  B n ) T(2  F 1 ) ≈ (2  F 1 ) 2 /(j4  F 1  B 1 ) =  jF 1 /B 1 Call Q n = F n /B n T(2  F 1 ) ≈  jQ 1 20log10|T(2  F 1 )| ≈ 20log10Q 1 (2  F n ) 2 +(  B n ) 2 n=1 ∞

Transfer Function of a Vowel: Bandwidth of a Peak is B n T(  ) =  (j  +j2  F n +  B n )(j  j2  F n +  B n ) T(2  F 1 +  B 1 ) ≈ (2  F 1 ) 2 /((j4  F 1 )(  B 1 +  B 1 )) =  jQ 1 /2 At f=F B n, |T(  )|=0.5Q n 20log 10 |T(  )| = 20log 10 Q 1 – 3dB (2  F n ) 2 +(  B n ) 2 n=1 ∞

Amplitudes of Higher Formants: Include the Rolloff T(  ) =  (j  +j2  F n +  B n )(j  j2  F n +  B n ) At f above F 1 T(2  f) ≈ (F 1 /f) T(2  F 2 ) ≈ (  jF 2 /B 2 )(F 1 /F 2 ) 20log10|T(2  F 2 )| ≈ 20log 10 Q 2 – 20log 10 (F 2 /F 1 ) 1/f Rolloff: 6 dB per octave (per doubling of frequency) (2  F n ) 2 +(  B n ) 2 n=1 ∞

Vowel Transfer Function: Synthetic Example L 1 = 20log 10 (500/80)=16dB L 2 = 20log 10 (1500/240) – 20log 10 (F 2 /F 1 ) = 16dB – 9.5dB L 3 = 20log 10 (2500/600) – 20log 10 (F 3 /F 1 ) – 20log 10 (F 3 /F 2 ) B 2 = 240Hz B 1 = 80Hz B 3 = 600Hz? (hard to measure because rolloff from F 1, F 2 turns the F 3 peak into a plateau) F 4 peak completely swamped by rolloff from lower formants

Shorthand Notation for the Spectrum of a Vowel T(s) =  (s  s n )(s  s n *) s = j  s n =  B n +j2  F n s n * =  B n  j2  F n s n s n * = |s n | 2 = (2  F n ) 2 +(  B n ) 2 T(0) = 1 20log 10 |T(0)| = 0dB snsn*snsn* n=1 ∞

Another Shorthand Notation for the Spectrum of a Vowel T(s) =  (1-s/s n )(1-s/s n *) 1 n=1 ∞

Topic #4: Nasalized Vowels

Vowel Nasalization Nasalized Vowel Nasal Consonant

Nasalized Vowel P R (  ) = R(  )(U L (  )+U N (  )) U N (  ) = Volume Velocity from Nostrils P R (  ) = R(  )(T L (  )+T N (  ))U G (  ) = R(  )T(  )U G (  ) T(  ) = T L (  ) + T N (  )

Nasalized Vowel T(s) = T L (s)+T N (s) = (1-s/s Ln )(1-s/s Ln *) + (1-s/s Nn )(1-s/s Nn *) = (1-s/s Ln )(1-s/s Ln *)(1-s/s Nn )(1-s/s Nn *) 1/s Zn = ½(1/s Ln +1/s Nn ) s Zn = n th spectral zero T(s) = 0 if s=s Zn 11 2(1-s/s Zn )(1-s/s Zn *)

The “Pole-Zero Pair” 20log 10 T(  ) = 20log 10 (1/(1-s/s Ln )(1-s/s Ln *)) + 20log 10 ((1-s/s Zn )(1-s/s Zn *)/(1-s/s Nn )(1-s/s Nn *)) = original vowel log spectrum + log spectrum of a pole-zero pair

Additive Terms in the Log Spectrum

Transfer Function of a Nasalized Vowel

Pole-Zero Pairs in the Spectrogram Nasal Pole Zero Oral Pole

Summary Perturbation Theory: Perturbation Theory: Squeeze near a velocity peak: formant goes down Squeeze near a velocity peak: formant goes down Squeeze near a pressure peak: formant goes up Squeeze near a pressure peak: formant goes up Formant Transitions Formant Transitions Labial closure: loci near 250, 1000, 2000 Hz Labial closure: loci near 250, 1000, 2000 Hz Alveolar closure: loci near 250, 1700, 3000 Hz Alveolar closure: loci near 250, 1700, 3000 Hz Velar closure: F2 and F3 come together (“velar pinch”) Velar closure: F2 and F3 come together (“velar pinch”) Vocal Tract Transfer Function Vocal Tract Transfer Function T(s) =  s n s n */(s-s n )(s-s n *) T(s) =  s n s n */(s-s n )(s-s n *) T(  =2  Fn) = Q n = F n /B n T(  =2  Fn) = Q n = F n /B n 3dB bandwidth = B n Hertz 3dB bandwidth = B n Hertz T(0) = 1 T(0) = 1 Nasal Vowels: Nasal Vowels: Sum of two transfer functions gives a spectral zero between the oral and nasal poles Sum of two transfer functions gives a spectral zero between the oral and nasal poles Pole-zero pair is a local perturbation of the spectrum Pole-zero pair is a local perturbation of the spectrum