ECE 598: The Speech Chain Lecture 8: Formant Transitions; Vocal Tract Transfer Function
Today Perturbation Theory: Perturbation Theory: A different way to estimate vocal tract resonant frequencies, useful for consonant transitions A different way to estimate vocal tract resonant frequencies, useful for consonant transitions Syllable-Final Consonants: Formant Transitions Syllable-Final Consonants: Formant Transitions Vocal Tract Transfer Function Vocal Tract Transfer Function Uniform Tube (Quarter-Wave Resonator) Uniform Tube (Quarter-Wave Resonator) During Vowels: All-Pole Spectrum During Vowels: All-Pole Spectrum Q Bandwidth Bandwidth Nasal Vowels: Sum of two transfer functions gives spectral zeros Nasal Vowels: Sum of two transfer functions gives spectral zeros
Topic #1: Perturbation Theory
Perturbation Theory (Chiba and Kajiyama, The Vowel, 1940) A(x) is constant everywhere, except for one small perturbation. Method: 1. Compute formants of the “unperturbed” vocal tract. 2. Perturb the formant frequencies to match the area perturbation.
Conservation of Energy Under Perturbation
“ Sensitivity ” Functions
Sensitivity Functions for the Quarter- Wave Resonator (Lips Open) L /AA//ER/ /IY//W/ Note: low F3 of /er/ is caused in part by a side branch under the tongue – perturbation alone is not enough to explain it.
Sensitivity Functions for the Half- Wave Resonator (Lips Rounded) L /L,OW//UW/ Note: high F3 of /l/ is caused in part by a side branch above the tongue – perturbation alone is not enough to explain it.
Formant Frequencies of Vowels From Peterson & Barney, 1952
Topic #2: Formant Transitions, Syllable-Final Consonant
Events in the Closure of a Nasal Consonant Vowel Nasalization Formant Transitions Nasal Murmur
Formant Transitions: A Perturbation Theory Model
Formant Transitions: Labial Consonants “the mom” “the bug”
Formant Transitions: Alveolar Consonants “the tug” “the supper”
Formant Transitions: Post-alveolar Consonants “the shoe” “the zsazsa”
Formant Transitions: Velar Consonants “the gut” “sing a song”
Topic #3: Vocal Tract Transfer Functions
Transfer Function “Transfer Function” T( )=Output( )/Input( ) “Transfer Function” T( )=Output( )/Input( ) In speech, it’s convenient to write T( )=U L ( )/U G ( ) In speech, it’s convenient to write T( )=U L ( )/U G ( ) U L ( ) = volume velocity at the lips U L ( ) = volume velocity at the lips U G ( ) = volume velocity at the glottis U G ( ) = volume velocity at the glottis T(0) = 1 T(0) = 1 Speech recorded at a microphone = pressure Speech recorded at a microphone = pressure P R ( ) = R( )T( )U G ( ) P R ( ) = R( )T( )U G ( ) R( ) = j f/r = “radiation characteristic” R( ) = j f/r = “radiation characteristic” = density of air = density of air r = distance to the microphone r = distance to the microphone f = frequency in Hertz f = frequency in Hertz
Transfer Function of an Ideal Uniform Tube Ideal Terminations: Ideal Terminations: Reflection coefficient at glottis: zero velocity, =1 Reflection coefficient at glottis: zero velocity, =1 Reflection coefficient at lips: zero pressure, = 1 Reflection coefficient at lips: zero pressure, = 1 Obviously, this is an approximation, but it gives… Obviously, this is an approximation, but it gives… T( ) = 1/cos( L/c) = … … n = n c/L – c/2L F n = nc/2L – c/4L 122232…122232…
Transfer Function of an Ideal Uniform Tube Peaks are actually infinite in height (figure is clipped to fit the display)
Transfer Function of a Non-Ideal Uniform Tube Almost ideal terminations: Almost ideal terminations: At glottis: velocity almost zero, ≈1 At glottis: velocity almost zero, ≈1 At lips: pressure almost zero, ≈ 1 At lips: pressure almost zero, ≈ 1 T( ) = 1/(j/Q +cos( L/c)) … at F n =nc/2L – c/4L,… T(2 F n ) = jQ 20log 10 |T(2 F n )| = 20log 10 Q
Transfer Function of a Non-Ideal Uniform Tube
Transfer Function of a Vowel: Height of First Peak is Q 1 =F 1 /B 1 T( ) = (j +j2 F n + B n )(j j2 F n + B n ) T(2 F 1 ) ≈ (2 F 1 ) 2 /(j4 F 1 B 1 ) = jF 1 /B 1 Call Q n = F n /B n T(2 F 1 ) ≈ jQ 1 20log10|T(2 F 1 )| ≈ 20log10Q 1 (2 F n ) 2 +( B n ) 2 n=1 ∞
Transfer Function of a Vowel: Bandwidth of a Peak is B n T( ) = (j +j2 F n + B n )(j j2 F n + B n ) T(2 F 1 + B 1 ) ≈ (2 F 1 ) 2 /((j4 F 1 )( B 1 + B 1 )) = jQ 1 /2 At f=F B n, |T( )|=0.5Q n 20log 10 |T( )| = 20log 10 Q 1 – 3dB (2 F n ) 2 +( B n ) 2 n=1 ∞
Amplitudes of Higher Formants: Include the Rolloff T( ) = (j +j2 F n + B n )(j j2 F n + B n ) At f above F 1 T(2 f) ≈ (F 1 /f) T(2 F 2 ) ≈ ( jF 2 /B 2 )(F 1 /F 2 ) 20log10|T(2 F 2 )| ≈ 20log 10 Q 2 – 20log 10 (F 2 /F 1 ) 1/f Rolloff: 6 dB per octave (per doubling of frequency) (2 F n ) 2 +( B n ) 2 n=1 ∞
Vowel Transfer Function: Synthetic Example L 1 = 20log 10 (500/80)=16dB L 2 = 20log 10 (1500/240) – 20log 10 (F 2 /F 1 ) = 16dB – 9.5dB L 3 = 20log 10 (2500/600) – 20log 10 (F 3 /F 1 ) – 20log 10 (F 3 /F 2 ) B 2 = 240Hz B 1 = 80Hz B 3 = 600Hz? (hard to measure because rolloff from F 1, F 2 turns the F 3 peak into a plateau) F 4 peak completely swamped by rolloff from lower formants
Shorthand Notation for the Spectrum of a Vowel T(s) = (s s n )(s s n *) s = j s n = B n +j2 F n s n * = B n j2 F n s n s n * = |s n | 2 = (2 F n ) 2 +( B n ) 2 T(0) = 1 20log 10 |T(0)| = 0dB snsn*snsn* n=1 ∞
Another Shorthand Notation for the Spectrum of a Vowel T(s) = (1-s/s n )(1-s/s n *) 1 n=1 ∞
Topic #4: Nasalized Vowels
Vowel Nasalization Nasalized Vowel Nasal Consonant
Nasalized Vowel P R ( ) = R( )(U L ( )+U N ( )) U N ( ) = Volume Velocity from Nostrils P R ( ) = R( )(T L ( )+T N ( ))U G ( ) = R( )T( )U G ( ) T( ) = T L ( ) + T N ( )
Nasalized Vowel T(s) = T L (s)+T N (s) = (1-s/s Ln )(1-s/s Ln *) + (1-s/s Nn )(1-s/s Nn *) = (1-s/s Ln )(1-s/s Ln *)(1-s/s Nn )(1-s/s Nn *) 1/s Zn = ½(1/s Ln +1/s Nn ) s Zn = n th spectral zero T(s) = 0 if s=s Zn 11 2(1-s/s Zn )(1-s/s Zn *)
The “Pole-Zero Pair” 20log 10 T( ) = 20log 10 (1/(1-s/s Ln )(1-s/s Ln *)) + 20log 10 ((1-s/s Zn )(1-s/s Zn *)/(1-s/s Nn )(1-s/s Nn *)) = original vowel log spectrum + log spectrum of a pole-zero pair
Additive Terms in the Log Spectrum
Transfer Function of a Nasalized Vowel
Pole-Zero Pairs in the Spectrogram Nasal Pole Zero Oral Pole
Summary Perturbation Theory: Perturbation Theory: Squeeze near a velocity peak: formant goes down Squeeze near a velocity peak: formant goes down Squeeze near a pressure peak: formant goes up Squeeze near a pressure peak: formant goes up Formant Transitions Formant Transitions Labial closure: loci near 250, 1000, 2000 Hz Labial closure: loci near 250, 1000, 2000 Hz Alveolar closure: loci near 250, 1700, 3000 Hz Alveolar closure: loci near 250, 1700, 3000 Hz Velar closure: F2 and F3 come together (“velar pinch”) Velar closure: F2 and F3 come together (“velar pinch”) Vocal Tract Transfer Function Vocal Tract Transfer Function T(s) = s n s n */(s-s n )(s-s n *) T(s) = s n s n */(s-s n )(s-s n *) T( =2 Fn) = Q n = F n /B n T( =2 Fn) = Q n = F n /B n 3dB bandwidth = B n Hertz 3dB bandwidth = B n Hertz T(0) = 1 T(0) = 1 Nasal Vowels: Nasal Vowels: Sum of two transfer functions gives a spectral zero between the oral and nasal poles Sum of two transfer functions gives a spectral zero between the oral and nasal poles Pole-zero pair is a local perturbation of the spectrum Pole-zero pair is a local perturbation of the spectrum