Published by Geoffrey Flynn. Modified over 9 years ago.
1
Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology Mark Hasegawa-Johnson jhasegaw@uiuc.edu University of Illinois at Urbana-Champaign, USA
2
Lecture 3: Spectral Dynamics and the Production of Consonants
International Phonetic Alphabet
Events in the Closure of a Nasal Consonant
–Formant transitions: a perturbation model
–Nasalized vowel
–Nasal murmur
Events in the Release of a Stop Consonant
–Pre-voicing (voiced stops in carefully read English)
–Transient (stops and affricates)
–Frication (stops, affricates, and fricatives)
–Aspiration (aspirated stops and /h/)
–Formant Transitions (any consonant-vowel transition)
Formant Tracking
–Does it help Speech Recognition?
–Methods for Vowels, and for Aspiration & Nasals
Reminder – lab 1 due Monday!
3
International Phonetic Alphabet: Purpose and Brief History
Purpose of the alphabet: to provide a universal notation for the sounds of the world’s languages
–“Universal” = If any language on Earth distinguishes two phonemes, IPA must also distinguish them
–“Distinguish” = Meaning of a word changes when the phoneme changes, e.g. “cat” vs. “bat.”
Very Brief History:
–1867: Alexander Melville Bell publishes a distinctive-feature-based phonetic notation in “Visible Speech: The Science of Universal Alphabetics.” His notation is rejected as being too expensive to print
–1886: International Phonetic Association founded in Paris by phoneticians from across Europe
–1991: Unicode provides a standard method for including IPA notation in computer documents
4
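International Phonetic Alphabet: Vowels
[Vowel chart; the IPA symbols did not survive extraction. Surviving Pinyin and ARPABET (approx.) annotations, in their original order:]
Pinyin, ARPABET (Approx.): i /u (xu) IY / UX EY EH a (zhang) AE a (ma)
Pinyin, ARPABET (Approx.): / u (zhu) / UW o UH / oa / OW / o AH / AO a (ma) AA
Pinyin: e, ARPABET: AX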
5
IPA: Regular Consonants
[Consonant chart; surviving annotations follow.]
ARPABET labels on the chart: NG, DX, R, HH/HV, Q, Y
Articulator labels: Tongue Blade, Tongue Body
ARPABET: F/V (labiodental), TH/DH (dental), S/Z (alveolar), SH/ZH (postalveolar or palatal)
Pinyin: s (alveolar), x (postalveolar), sh/r (retroflex)
6
Affricates and Doubly-Articulated Consonants
Affricates in English and Chinese:
Place           Pinyin   ARPABET   IPA
Alveolar        c/z      (none)    ts/dz
Post-alveolar   q/j      CH/JH     tʃ/dʒ
Retroflex       ch/zh    (none)    ʈʂ/ɖʐ
Doubly-articulated consonants (ARPABET): WH, W
7
Non-Pulmonic Consonants
8
Events in the Closure of a Syllable-Final Nasal Consonant
9
Events in the Closure of a Nasal Consonant Vowel Nasalization Formant Transitions Nasal Murmur
10
Formant Transitions: A Perturbation Theory Model
11
Formant Transitions: Labial Consonants “the mom” “the bug”
12
Formant Transitions: Alveolar Consonants “the tug” “the supper”
13
Formant Transitions: Post-alveolar Consonants “the shoe” “the zsazsa”
14
Formant Transitions: Velar Consonants “the gut” “sing a song”
15
Formant Transitions: A Perceptual Study The study: (1) Synthesize speech with different formant patterns, (2) record subject responses. Delattre, Liberman and Cooper, J. Acoust. Soc. Am. 1955.
16
Perception of Formant Transitions: Conclusions
17
Vowel Nasalization
19
Additive Terms in the Log Spectrum
20
Transfer Function of a Nasalized Vowel
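The lecture summary describes vowel nasalization as a sum of transfer functions. The sketch below is a toy illustration of why such a sum introduces zeros: adding two all-pole responses, g_o/A_o(z) + g_n/A_n(z), gives (g_o A_n + g_n A_o)/(A_o A_n), whose numerator has roots. The resonance frequencies, bandwidths, gains, and sampling rate are hypothetical values chosen for illustration, and treating the oral and nasal paths as independent all-pole filters is a simplifying assumption, not the model from the slide.

```python
import numpy as np

def resonator(freq_hz, bw_hz, fs):
    """Coefficients of 1 - 2 r cos(theta) z^-1 + r^2 z^-2 for one resonance."""
    r = np.exp(-np.pi * bw_hz / fs)
    theta = 2.0 * np.pi * freq_hz / fs
    return np.array([1.0, -2.0 * r * np.cos(theta), r * r])

def all_pole_denominator(resonances, fs):
    """Multiply second-order sections into one denominator polynomial in z^-1."""
    a = np.array([1.0])
    for freq_hz, bw_hz in resonances:
        a = np.polymul(a, resonator(freq_hz, bw_hz, fs))
    return a

def response(b, a, freqs_hz, fs):
    """Evaluate B(z)/A(z) on the unit circle at the given frequencies."""
    z_inv = np.exp(-2j * np.pi * np.asarray(freqs_hz) / fs)
    num = np.polyval(np.asarray(b)[::-1], z_inv)
    den = np.polyval(np.asarray(a)[::-1], z_inv)
    return num / den

fs = 10000                      # hypothetical sampling rate, Hz
g_oral, g_nasal = 1.0, 0.5      # hypothetical path gains

# Hypothetical (frequency, bandwidth) pairs in Hz for each path.
a_oral = all_pole_denominator([(700, 80), (1200, 90)], fs)
a_nasal = all_pole_denominator([(300, 150), (1000, 200)], fs)

# Sum of the two paths: common denominator, and a numerator that now has roots.
# Plain addition is valid here because both denominators have the same order.
numerator = g_oral * a_nasal + g_nasal * a_oral
denominator = np.polymul(a_oral, a_nasal)

zeros = np.roots(numerator)
upper = zeros[np.imag(zeros) > 0]
print("zero frequencies (Hz):", np.sort(np.angle(upper) * fs / (2 * np.pi)))
```

The point of the sketch is only the algebra: neither path has zeros on its own, but their sum does.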
21
Nasal Murmur
“the mug” “the nut” “sing a song”
Observations:
–Low-frequency resonance (about 300 Hz) always present
–Low-frequency resonance has wide bandwidth (about 150 Hz)
–Energy of low-frequency resonance is very constant
–Most high-frequency resonances cancelled by zeros
–Different places of articulation have different high-frequency spectra
–High-frequency spectrum is talker-dependent and variable
22
Resonances of a Nasal Consonant Reference: Fujimura, JASA 1962
23
Anti-Resonances of a Nasal Consonant
24
Events in the Release of a Stop (Plosive) Consonant
25
Events in the Release of a Stop “Burst” = transient + frication (the part of the spectrogram whose transfer function has poles only at the front cavity resonance frequencies, not at the back cavity resonances).
26
Events in the Release of a Stop
Unaspirated (/b/) Aspirated (/t/)
Transient, Frication, Aspiration, Voicing
27
Pre-voicing during Closure
To make a voiced stop in most European languages: the tongue root is relaxed, allowing it to expand, so that the vocal folds can continue vibrating for a little while after oral closure. The result is a low-frequency “voice bar” that may continue well into closure. In English, closure voicing is typical of read speech, but not casual speech.
“the bug”
28
Transient: The Release of Pressure
29
Transfer Function During Transient and Frication: Poles
Front cavity resonance frequency: $F_R = c/(4L_f)$, with $c$ the speed of sound and $L_f$ the front cavity length
Turbulence striking an obstacle makes noise
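As a worked instance of the quarter-wavelength formula F_R = c/(4 L_f), the sketch below evaluates it for two front-cavity lengths. The speed of sound (354 m/s, a value often used for warm, humid air) and the two lengths are assumptions chosen for illustration; they are not taken from the lecture.

```python
def front_cavity_resonance(front_cavity_length_m, speed_of_sound=354.0):
    """Quarter-wavelength resonance F_R = c / (4 * L_f), returned in Hz."""
    if front_cavity_length_m <= 0:
        raise ValueError("front cavity length must be positive")
    return speed_of_sound / (4.0 * front_cavity_length_m)

# Hypothetical front-cavity lengths in metres (illustration only).
examples = [("shorter front cavity", 0.02), ("longer front cavity", 0.04)]
for label, length_m in examples:
    f_r = front_cavity_resonance(length_m)
    print(f"{label}: L_f = {length_m * 100:.1f} cm, F_R = {f_r:.0f} Hz")
```

Halving the front-cavity length doubles the resonance, which is why a constriction farther forward in the mouth yields a burst or frication spectrum with higher-frequency peaks.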
30
Transfer Function During Frication: An Important Zero
32
Transfer Function During Aspiration
33
Are Formant Frequencies Useful for Speech Recognition?
Kopec and Bush (1992): WER(formants alone) > WER(cepstrum alone) > WER(formants and cepstrum together)
How should we track formants?
–In vowels: Autoregressive (AR) modeling (also known as LPC)
–In aspiration, nasals: Autoregressive Moving Average (ARMA) modeling. Problem: no closed-form solution
–In aspiration, nasals: Exponentially Weighted Autoregressive (EWAR; Zheng and Hasegawa-Johnson, ICASSP 2004)
34
Formant Tracking for Vowels: Autoregressive Model (LPC)
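A minimal sketch of LPC-based formant estimation, assuming the standard autocorrelation method: fit an all-pole (AR) model to one frame, then read formant candidates from the angles of the model's poles. The function names, the bandwidth threshold, and the synthetic test signal (three known resonances excited by white noise) are all assumptions made for illustration; the lecture does not specify an implementation.

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """AR coefficients [1, a1, ..., ap] from the autocorrelation method."""
    w = frame * np.hamming(len(frame))
    n = len(w)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(w[: n - k], w[k:]) for k in range(order + 1)])
    # Toeplitz normal equations: sum_i a_i r[|k - i|] = -r[k], k = 1..order.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))

def formants_from_lpc(a, fs, max_bandwidth_hz=400.0, min_freq_hz=90.0):
    """Pole angles give frequencies; pole radii give bandwidths."""
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0]          # one pole per conjugate pair
    freqs = np.angle(poles) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(poles)) * fs / np.pi
    keep = (freqs > min_freq_hz) & (bandwidths < max_bandwidth_hz)
    return np.sort(freqs[keep])

def synthesize_ar(a, excitation):
    """Run the excitation through the all-pole filter 1/A(z)."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * out[n - k]
        out[n] = acc
    return out

# Synthetic "vowel": hypothetical resonances at 500, 1500, 2500 Hz, 80 Hz wide.
fs = 8000
true_formants = [500.0, 1500.0, 2500.0]
radius = np.exp(-np.pi * 80.0 / fs)
true_poles = []
for f in true_formants:
    p = radius * np.exp(2j * np.pi * f / fs)
    true_poles += [p, np.conj(p)]
a_true = np.real(np.poly(true_poles))

rng = np.random.default_rng(0)
signal = synthesize_ar(a_true, rng.standard_normal(4096))

estimated = formants_from_lpc(lpc_autocorrelation(signal, order=6), fs)
print("estimated formants (Hz):", np.round(estimated))
```

The model order is set to twice the number of expected resonances, which works for this synthetic signal. Real vowels usually need a higher order plus pre-emphasis, and the pole-picking thresholds would need tuning.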
35
Formant Tracking for Aspiration: “Auto-Regressive Moving Average” Model (ARMA)
36
Formant Tracking for Aspiration: “Exponentially Weighted Auto-Regressive” Model (EWAR) (Zheng and Hasegawa-Johnson, ICSLP 2004)
37
Solving the EWAR Model
38
Results: Stop Classification, MFCC alone vs. MFCC+formants
40
Summary
International Phonetic Alphabet:
–Useful on any computer with Unicode
–International encoding for all sounds of the world’s languages
Events in a nasal closure:
–Formant transitions (perturbation model)
–Vowel nasalization (sum of TFs)
–Nasal murmur (impedance match at juncture)
Events in release of a stop:
–Pre-voicing in English voiced stops (read speech)
–Transient (dp/dt ~ dA/dt)
–Frication ((zero at f=0)/(front cavity resonances))
–Aspiration ((zero at f=0)/(same poles as the vowel))
Formant tracking:
–In a vowel: use LPC
–In aspiration, frication, or nasal murmur: ARMA is theoretically optimum, but computationally expensive
–Aspiration etc.: EWAR can be a good approximation to ARMA