Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology, Mark Hasegawa-Johnson, University of Illinois at Urbana-Champaign


Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology
Mark Hasegawa-Johnson
University of Illinois at Urbana-Champaign, USA
Assistant Professor, Electrical and Computer Engineering Department
Assistant Professor, Beckman Institute for Advanced Science and Technology
Adjunct Professor, Speech and Hearing Sciences Department

Lecture 1: Introduction to Spectrogram Reading
Review
– Laplace and Fourier transforms
– Short-time Fourier transform (STFT) and windowing
– White noise
– Periodic signals
Spectrogram reading: Pitch
– Wideband and narrowband spectrograms
Spectrogram reading: Manner
– Speech physiology
– Manner classification of phonemes
Spectrogram reading: Formants
– Log-linear form of a rational filter

Laplace and Fourier Transforms

Transform Properties

Transforms worth knowing: Impulses

Transforms worth knowing: Filters

Rectangular Window

Hamming & Hanning Windows
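The slide images for the window functions did not survive. As a rough sketch, the three windows named in these slides can be written out as follows, assuming the common symmetric textbook definitions (the function names are illustrative, not from the lecture):

```python
import math


def rectangular(n_points: int) -> list[float]:
    """Rectangular window: every sample weighted equally."""
    return [1.0] * n_points


def hamming(n_points: int) -> list[float]:
    """Symmetric Hamming window: w[n] = 0.54 - 0.46 cos(2*pi*n / (N - 1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def hann(n_points: int) -> list[float]:
    """Symmetric Hann ("Hanning") window: w[n] = 0.5 - 0.5 cos(2*pi*n / (N - 1))."""
    if n_points == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


# Compare the end points and the centre of a short 5-point window.
print([round(w, 2) for w in hamming(5)])  # -> [0.08, 0.54, 1.0, 0.54, 0.08]
print([round(w, 2) for w in hann(5)])     # -> [0.0, 0.5, 1.0, 0.5, 0.0]
```

Both tapered windows fall off toward the frame edges, which reduces spectral leakage compared with the rectangular window. The Hamming window stops at 0.08 instead of reaching zero.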

Periodic Signals

Random Signals (Noise)

The Short-Time Fourier Transform

The Spectrogram
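The summary defines the spectrogram as the log magnitude of the STFT. A minimal sketch of that definition follows. It assumes a naive DFT for readability (a real implementation would use an FFT), and the parameter choices (8 kHz sampling, 64-point Hann window, hop of 32) are illustrative only:

```python
import cmath
import math


def stft(x, window, hop):
    """Short-time Fourier transform: slide the window along x in steps of `hop`
    samples and take a DFT of each windowed frame (bins 0..N/2 only).
    A naive O(N^2) DFT is used for clarity."""
    n = len(window)
    frames = []
    for start in range(0, len(x) - n + 1, hop):
        segment = [x[start + m] * window[m] for m in range(n)]
        frames.append([
            sum(segment[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n // 2 + 1)
        ])
    return frames


def spectrogram_db(x, window, hop, floor=1e-10):
    """Spectrogram = log magnitude of the STFT, in dB (floored to avoid log(0))."""
    return [[20 * math.log10(max(abs(b), floor)) for b in frame]
            for frame in stft(x, window, hop)]


# A 1000 Hz tone sampled at 8000 Hz, analysed with a 64-point Hann window:
# the bin spacing is 8000/64 = 125 Hz, so the tone falls in bin 8.
fs, n = 8000, 64
hann = [0.5 - 0.5 * math.cos(2 * math.pi * m / n) for m in range(n)]
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(512)]
spec = spectrogram_db(tone, hann, hop=32)
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
print(peak_bins)  # every frame peaks at bin 8
```

Each row of `spec` is one time frame and each column is one frequency bin. Plotting the rows side by side as grey levels gives the familiar spectrogram image.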

Narrowband Spectrogram: N > 2T0

Wideband Spectrogram: N < T0

Fundamental Frequency
[Figure labels: 4T0, 10F0]
Fundamental frequency (pitch): F0 = 1/T0
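The narrowband and wideband rules compare the analysis window length N with the pitch period T0 = 1/F0. A small sketch of those rules of thumb follows. The "intermediate" label for T0 ≤ N ≤ 2T0 is a hypothetical addition, because the slides do not name that case:

```python
def pitch_period_s(f0_hz: float) -> float:
    """Pitch period T0 = 1 / F0, in seconds."""
    return 1.0 / f0_hz


def spectrogram_type(window_s: float, f0_hz: float) -> str:
    """Label an analysis window using the slides' rules of thumb."""
    t0 = pitch_period_s(f0_hz)
    if window_s > 2 * t0:
        return "narrowband"    # spans several periods: harmonics k*F0 are resolved
    if window_s < t0:
        return "wideband"      # shorter than one period: individual pulses are resolved
    return "intermediate"      # T0 <= N <= 2*T0: neither rule applies (hypothetical label)


def harmonics_hz(f0_hz: float, count: int) -> list[float]:
    """First `count` harmonics of F0; they are spaced F0 apart in a narrowband spectrogram."""
    return [k * f0_hz for k in range(1, count + 1)]


# A talker with F0 = 100 Hz has T0 = 10 ms.
print(spectrogram_type(0.030, 100.0))  # 30 ms window -> narrowband
print(spectrogram_type(0.005, 100.0))  # 5 ms window  -> wideband
```

Because a higher-pitched talker has a shorter T0, the same window can be narrowband for one voice and only intermediate for another.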

On to New Material: Manner Features, Speech Production, and Landmarks

Anatomy of Speech Production
[Figure: vocal tract diagram labeled hard palate, lips, oral cavity, tongue blade, nasal cavity, soft palate (open), epiglottis, vocal folds, pharynx, tongue body, jaw, tongue root]

Speech sources: Voicing, Turbulence, and Transients
The vocal folds:
– A nonlinear, high-impedance oscillator
– Excitation is like a periodic impulse train
Turbulence:
– Vortices striking an obstacle produce white noise
– Excitation is like white noise
Transient:
– High pressure, suddenly released
– Excitation is like a single loud impulse, δ(t)
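The three source types can be sketched as discrete-time excitation signals. This is a simplified model under stated assumptions: unit-amplitude pulses, Gaussian samples for the noise (any zero-mean white sequence would do), and illustrative function names:

```python
import random


def impulse_train(n_samples: int, period_samples: int) -> list[float]:
    """Voicing: one unit pulse every T0 samples (a periodic impulse train)."""
    return [1.0 if n % period_samples == 0 else 0.0 for n in range(n_samples)]


def white_noise(n_samples: int, seed: int = 0) -> list[float]:
    """Turbulence: zero-mean white noise (Gaussian samples are an assumption here)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def transient(n_samples: int, at: int = 0) -> list[float]:
    """Transient: a single unit impulse at sample `at`, a discrete stand-in for delta(t)."""
    return [1.0 if n == at else 0.0 for n in range(n_samples)]


# At fs = 8000 Hz, a 100 Hz voice has T0 = 80 samples.
voicing = impulse_train(240, 80)
print([n for n, v in enumerate(voicing) if v])  # -> [0, 80, 160]
```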

The vocal folds: A nonlinear, high-impedance oscillator
The vocal tract “rings” like a bell, shaping the sound produced by the vocal folds (cross-sectional area of the vocal tract: … cm²). The larynx (the opening between the vocal folds) has an open area of only 0.03 cm². To get through, air from the lungs must speed up into a high-speed jet. The vocal folds flap back and forth, driven by the jet, at a rate of … pulses/second.

Turbulence: Vortices striking an obstacle produce white noise
In a fricative, the area of the tongue constriction is about 0.2 cm². To get through, the air speeds up into a turbulent jet, which strikes downstream obstacles such as the teeth. The jet contains vortices of many different radii, between 0 mm and 0.2 cm; therefore the resulting sound contains noise at all frequencies above about 700 Hz.

Transient: High pressure, suddenly released
While the tongue tip is closed, air pressure builds up behind the constriction. When the constriction is released, there is a sudden change in airflow through the constriction (from zero to nonzero). This sudden change in airflow is heard as a “pop.”

The Source-Filter Model of Speech Production
Corresponds to S(s) = H(s)E(s), where:
S(s) = recorded speech spectrum
E(s) = source spectrum
H(s) = transfer function = filtering by the vocal tract
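A minimal discrete-time sketch of S = H·E follows. It swaps the slide's Laplace-domain H(s) for a z-domain two-pole resonator, and it assumes a single hypothetical formant (500 Hz, 60 Hz bandwidth) and a 100 Hz impulse-train source. Real vocal-tract filters have several formants.

```python
import cmath
import math


def resonator(formant_hz: float, bandwidth_hz: float, fs: float) -> tuple[float, float]:
    """Coefficients (a1, a2) of a two-pole resonator H(z) = 1 / (1 + a1 z^-1 + a2 z^-2)
    whose poles sit at radius exp(-pi*B/fs) and angle +/- 2*pi*F/fs."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2 * math.pi * formant_hz / fs
    return -2 * r * math.cos(theta), r * r


def apply_filter(source: list[float], a1: float, a2: float) -> list[float]:
    """Difference equation s[n] = e[n] - a1 s[n-1] - a2 s[n-2], i.e. S(z) = H(z) E(z)."""
    s: list[float] = []
    for n, e_n in enumerate(source):
        s1 = s[n - 1] if n >= 1 else 0.0
        s2 = s[n - 2] if n >= 2 else 0.0
        s.append(e_n - a1 * s1 - a2 * s2)
    return s


def magnitude(a1: float, a2: float, f_hz: float, fs: float) -> float:
    """|H| on the unit circle at frequency f_hz."""
    z_inv = cmath.exp(-2j * math.pi * f_hz / fs)
    return 1.0 / abs(1 + a1 * z_inv + a2 * z_inv ** 2)


fs = 8000
a1, a2 = resonator(500, 60, fs)                             # hypothetical formant
source = [1.0 if n % 80 == 0 else 0.0 for n in range(800)]  # E: 100 Hz impulse train
speech = apply_filter(source, a1, a2)                       # S = H E
peak_hz = max(range(fs // 2 + 1), key=lambda f: magnitude(a1, a2, f, fs))
print(peak_hz)  # close to 500
```

Each glottal pulse makes the resonator "ring" at about 500 Hz, and the ringing repeats every T0. This is the bell analogy from the vocal-fold slide.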

Manner Classification of Phonemes: [continuant]
[-continuant] = lips or tongue close COMPLETELY on the midline of the vocal tract:
– stops (p, b, t, d, k, g)
– nasals (m, n, ng)
– affricates (q, j, ch, zh)
– syllable-initial lateral (l, e.g., “lake”)
[+continuant] = no complete closure:
– fricatives (f, v, s, z, sh, x, Chinese h)
– glides (w, y, r, English h)
– vowels (a, e, i, o, u)
– diphthongs (in “buy,” “boy,” “bow”)

Manner Classification of Phonemes: [sonorant]
[+sonorant] = “a sound you can sing” (Latin):
– nasals (m, n, ng)
– lateral (l)
– glides (w, y, r)
– vowels (a, e, i, o, u)
– diphthongs (buy, boy, bow)
[-sonorant] = air pressure builds up behind the constriction, so voicing amplitude drops (also called an “obstruent consonant”):
– stops (p, b, t, d, k, g)
– affricates (q, j, ch, zh)
– fricatives (f, v, s, z, sh, x)
Special status of “sonorant” in Chinese:
– the “initial” must be all-sonorant (“liang”) or all-obstruent (“qing”)
– the “final” must be all-sonorant

Sonorant Consonants: Glide, Lateral, Nasal
“layya ton”: /l/, /y/, /t/, /n/ (the /y/ is [+continuant]; the others are [-continuant])
“ame”: /m/ [-continuant]

Obstruent Consonants: Fricatives, Affricates, and Stops
sa [+continuant], shi [+continuant]
qe [-continuant], iji [-continuant]
ba [-continuant], ita [-continuant]

Place of Primary Articulation
Labial (lips): p, b, f, v, m, w, u, o
Alveolar (blade): t, d, s, z, n, l
Palatal (blade): q, j, sh, y, i
Retroflex (blade): ch, zh, x, r, er
Pharyngeal (body): a, ae
Velar (body): k, g, ng, w, u
Uvular (body): h, o
Dental (blade): th, dh
Laryngeal: h

Features of Secondary Articulators: [lateral], [nasal], [affricated], [aspirated]
[+sonorant, +continuant]: vowels, glides
[+sonorant, -continuant]:
– [+nasal] = soft palate is open; air escapes through the nose
– [+lateral] = tongue is open on the sides; air can escape around the edges of the tongue
[-sonorant, +continuant]: fricatives
[-sonorant, -continuant]:
– [+affricated]: tongue stays nearly closed after release, causing frication (q, j, ch, zh)
– [+aspirated]: larynx stays open after release, causing aspiration (p, t, k)
– [-affricated, -aspirated]: nothing special happens after release; the vowel starts immediately (b, d, g)
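The feature hierarchy above works like a small decision procedure: [sonorant] and [continuant] choose the manner class, and the secondary features refine it. The sketch below encodes the slides' table. The class labels and the function name are illustrative, not from the lecture.

```python
# Manner classes from the slides, keyed by ([sonorant], [continuant]).
MANNER = {
    (True, True): "vowel/glide/diphthong",
    (False, True): "fricative",
    (True, False): "nasal/lateral",
    (False, False): "stop/affricate",
}


def manner_class(sonorant: bool, continuant: bool, *, nasal: bool = False,
                 lateral: bool = False, affricated: bool = False,
                 aspirated: bool = False) -> str:
    """Refine the ([sonorant], [continuant]) class with the secondary features."""
    if sonorant and not continuant:
        if nasal:
            return "nasal"            # soft palate open
        if lateral:
            return "lateral"          # air escapes around the sides of the tongue
    if not sonorant and not continuant:
        if affricated:
            return "affricate"        # q, j, ch, zh
        if aspirated:
            return "aspirated stop"   # p, t, k
        return "unaspirated stop"     # b, d, g
    return MANNER[(sonorant, continuant)]


print(manner_class(False, False, aspirated=True))  # /t/ -> aspirated stop
print(manner_class(True, False, nasal=True))       # /m/ -> nasal
```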


Waveforms and Spectrograms: Aspirated and Unaspirated Stops
Unaspirated: /b/; aspirated: /t/

Phonetic Subsegments in the Release of an Aspirated Stop

Waveforms and Spectrograms: Fricatives and Affricates
sa, shi, qe, iji

Landmarks: Changes in the features [continuant], [sonorant]
[Figure: spectrogram with landmarks labeled /l/ release, /t/ closure, /t/ release, /v/ closure, /v/ release, /m/ closure, /m/ release, /n/ closure, /n/ release, /k/]
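Because a landmark is defined as a change in [continuant] or [sonorant], a frame-by-frame feature track can be turned into landmarks by looking for value changes. The sketch below assumes hypothetical, already-correct binary feature labels per frame. A real detector would first have to estimate these from the acoustics, for example with the classifiers named in the course title.

```python
def find_landmarks(frames: list[tuple[bool, bool]]) -> list[tuple[int, str, str]]:
    """frames[i] = ([sonorant], [continuant]) for analysis frame i.
    A landmark is reported at frame i whenever either feature differs from frame i-1,
    as (frame index, feature name, new value '+' or '-')."""
    landmarks = []
    for i in range(1, len(frames)):
        for idx, name in enumerate(("sonorant", "continuant")):
            if frames[i][idx] != frames[i - 1][idx]:
                landmarks.append((i, name, "+" if frames[i][idx] else "-"))
    return landmarks


# Hypothetical, idealized labels: vowel -> stop closure -> release into a vowel.
V, STOP = (True, True), (False, False)
print(find_landmarks([V, V, STOP, STOP, V]))
# -> [(2, 'sonorant', '-'), (2, 'continuant', '-'), (4, 'sonorant', '+'), (4, 'continuant', '+')]

# An /l/ release ([+sonorant, -continuant] -> vowel) changes only [continuant].
print(find_landmarks([(True, False), V]))  # -> [(1, 'continuant', '+')]
```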

The Vocal Tract Transfer Function

Log-Spectral Separation of Source and Filter
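Taking the log magnitude of the source-filter relation S(s) = H(s)E(s) turns the product into a sum. This is why source and filter contributions appear additively in a log-magnitude spectrogram. The second line applies the same step to a rational filter, giving the log-linear form listed in the lecture outline. The gain G, zeros z_k and poles p_k are symbols introduced here for illustration.

```latex
S(s) = H(s)\,E(s)
\;\Longrightarrow\;
20\log_{10}\lvert S(j\Omega)\rvert = 20\log_{10}\lvert H(j\Omega)\rvert + 20\log_{10}\lvert E(j\Omega)\rvert

H(s) = G\,\frac{\prod_{k}(s - z_k)}{\prod_{k}(s - p_k)}
\;\Longrightarrow\;
20\log_{10}\lvert H(j\Omega)\rvert = 20\log_{10}\lvert G\rvert
  + \sum_{k} 20\log_{10}\lvert j\Omega - z_k\rvert
  - \sum_{k} 20\log_{10}\lvert j\Omega - p_k\rvert
```

Each pole near the jΩ axis adds a peak (a formant) to the log spectrum, and each zero subtracts a dip. A zero placed close to a pole can therefore cancel that resonance.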

Formant Frequencies = Resonant Frequencies of the Vocal Tract

Formant Frequencies of a Vowel
From Peterson and Barney, “Control Methods Used in a Study of the Vowels,” Journal of the Acoustical Society of America, 1952

Classifying Vowels
F2 = 1200 Hz, F1 = 800 Hz: therefore the vowel is /AH/.
F2 starts at 1200 Hz and rises to 2000 Hz; F1 starts at 800 Hz and falls to 300 Hz: therefore the diphthong is /AY/.
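The reasoning on this slide can be sketched as nearest-template matching on (F1, F2) measured at the start and end of the vowel. The two templates below are hypothetical and built only from this slide's two examples. A monophthong such as /AH/ has the same target at both points, while /AY/ moves. A usable classifier would need templates for every vowel, estimated from data such as Peterson and Barney's.

```python
import math

# Hypothetical templates from the slide: (F1_start, F2_start, F1_end, F2_end) in Hz.
TEMPLATES = {
    "AH": (800.0, 1200.0, 800.0, 1200.0),  # monophthong: same target at both points
    "AY": (800.0, 1200.0, 300.0, 2000.0),  # diphthong: F1 falls, F2 rises
}


def classify(start: tuple[float, float], end: tuple[float, float]) -> str:
    """start, end: (F1, F2) measured near the beginning and end of the vowel.
    Returns the label of the nearest template (Euclidean distance in Hz)."""
    token = (*start, *end)
    return min(TEMPLATES, key=lambda label: math.dist(token, TEMPLATES[label]))


print(classify((820, 1180), (790, 1230)))  # steady formants -> AH
print(classify((780, 1250), (350, 1900)))  # F1 falls, F2 rises -> AY
```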

Rational Filters: Obstruents

Example: Front Cavity Resonance of /ch/ (q) is near F3 of Following Vowel
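A common first approximation treats the cavity in front of an obstruent constriction as a uniform tube that is closed at the constriction and open at the lips. Its lowest resonance is then the quarter-wavelength frequency c/(4L). The sketch below assumes that model, a speed of sound of 35,000 cm/s, and hypothetical cavity lengths. It shows how a front cavity a few centimetres long lands in the F3 range.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid air in the vocal tract


def front_cavity_resonance_hz(length_cm: float, c: float = SPEED_OF_SOUND_CM_S) -> float:
    """Quarter-wave resonance of a tube closed at the constriction, open at the lips: c / (4L)."""
    return c / (4.0 * length_cm)


def front_cavity_length_cm(resonance_hz: float, c: float = SPEED_OF_SOUND_CM_S) -> float:
    """Invert the quarter-wave formula: L = c / (4F)."""
    return c / (4.0 * resonance_hz)


print(front_cavity_resonance_hz(3.5))   # 3.5 cm front cavity -> 2500.0 Hz
print(front_cavity_length_cm(2500.0))   # 2500 Hz resonance -> 3.5 cm
```

Moving the constriction forward shortens the front cavity and raises this resonance, which is one way to read place of articulation from the frication spectrum.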

Rational Filters: Nasal Consonants

Examples: Nasal Consonants
/m/: This talker makes /m/ with the resonances at 1000 Hz and 1800 Hz uncancelled, but with the resonance at 300 Hz cancelled by zeros.
/ng/: This talker makes /ng/ with the resonances at 300 Hz and 1000 Hz uncancelled, but with the resonance at 1800 Hz cancelled by zeros.
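The cancellation described for /m/ can be illustrated with the log-linear form of a rational filter: the log magnitude is a sum of zero terms minus a sum of pole terms. The sketch below uses a discrete-time (z-plane) version with unit gain. The pole and zero frequencies and bandwidths are hypothetical values chosen to mimic the /m/ example, not measurements from the talker.

```python
import cmath
import math


def conjugate_pair(freq_hz: float, bandwidth_hz: float, fs: float) -> list[complex]:
    """Complex-conjugate pair at radius exp(-pi*B/fs), angle +/- 2*pi*F/fs.
    Used here for both poles (resonances) and zeros (anti-resonances)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    w = 2 * math.pi * freq_hz / fs
    return [r * cmath.exp(1j * w), r * cmath.exp(-1j * w)]


def log_mag_db(f_hz: float, poles: list[complex], zeros: list[complex], fs: float) -> float:
    """Log-linear form of a unit-gain rational filter at z = exp(j*2*pi*f/fs):
    20log10|H| = sum_k 20log10|z - zero_k| - sum_k 20log10|z - pole_k|."""
    z = cmath.exp(2j * math.pi * f_hz / fs)
    return (sum(20 * math.log10(abs(z - q)) for q in zeros)
            - sum(20 * math.log10(abs(z - p)) for p in poles))


fs = 8000
# Hypothetical /m/-like system: resonances at 300, 1000 and 1800 Hz ...
poles = conjugate_pair(300, 60, fs) + conjugate_pair(1000, 80, fs) + conjugate_pair(1800, 100, fs)
# ... plus a zero pair close to the 300 Hz resonance, which largely cancels it.
zeros = conjugate_pair(320, 80, fs)

for f in (300, 1000, 1800):
    print(f, round(log_mag_db(f, poles, [], fs), 1), round(log_mag_db(f, poles, zeros, fs), 1))
```

With the zeros present, the level at 300 Hz drops by roughly 30 dB, while the 1000 Hz and 1800 Hz peaks remain. This is the "uncancelled" pattern described for this talker's /m/.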

Summary
– The spectrogram is the log magnitude of the STFT.
– Wideband spectrogram: N < T0; pitch shows up in the time domain.
– Narrowband spectrogram: N > 2T0; pitch shows up in the frequency domain.
– Landmarks occur at changes in the values of the distinctive features [continuant] and [sonorant]:
  – [+continuant, +sonorant]: vowels, glides, diphthongs
  – [+continuant, -sonorant]: fricatives
  – [-continuant, +sonorant]: nasals, laterals
  – [-continuant, -sonorant]: stops, affricates
– Recognition of vowels and glides: F1 and F2 are usually enough.
– Recognition of diphthongs: F1 and F2 at two separate points in time (the beginning and end of the vowel).
– Obstruent consonants: back-cavity formants are cancelled by zeros, leaving only the front-cavity formants (e.g., F3 for /sh/, /q/).
– Nasal consonants: resonances of the mouth-nose system are often cancelled by zeros, leaving primarily low-frequency energy.