ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Slides:



Advertisements
Similar presentations
Acoustic/Prosodic Features
Advertisements

Tom Lentz (slides Ivana Brasileiro)
SPPA 403 Speech Science1 Unit 3 outline The Vocal Tract (VT) Source-Filter Theory of Speech Production Capturing Speech Dynamics The Vowels The Diphthongs.
Basic Spectrogram & Clinical Application: Consonants
Acoustic Characteristics of Consonants
1 CS 551/651: Structure of Spoken Language Lecture 4: Characteristics of Manner of Articulation John-Paul Hosom Fall 2008.
1 CS 551/651: Structure of Spoken Language Spectrogram Reading: Stops John-Paul Hosom Fall 2010.
Nasal Stops.
ACOUSTICS OF SPEECH AND SINGING MUSICAL ACOUSTICS Science of Sound, Chapters 15, 17 P. Denes & E. Pinson, The Speech Chain (1963, 1993) J. Sundberg, The.
Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology Mark Hasegawa-Johnson
Basic Spectrogram Lab 8. Spectrograms §Spectrograph: Produces visible patterns of acoustic energy called spectrograms §Spectrographic Analysis: l Acoustic.
ECE 598: The Speech Chain Lecture 8: Formant Transitions; Vocal Tract Transfer Function.
The Human Voice. I. Speech production 1. The vocal organs
ACOUSTICAL THEORY OF SPEECH PRODUCTION
The Human Voice Chapters 15 and 17. Main Vocal Organs Lungs Reservoir and energy source Larynx Vocal folds Cavities: pharynx, nasal, oral Air exits through.
Speech Sound Production: Recognition Using Recurrent Neural Networks Abstract: In this paper I present a study of speech sound production and methods for.
PH 105 Dr. Cecilia Vogel Lecture 14. OUTLINE  consonants  vowels  vocal folds as sound source  formants  speech spectrograms  singing.
Vowel Acoustics, part 2 November 14, 2012 The Master Plan Acoustics Homeworks are due! Today: Source/Filter Theory On Friday: Transcription of Quantity/More.
Unit 4 Articulation I.The Stops II.The Fricatives III.The Affricates IV.The Nasals.
Introduction to Speech Synthesis ● Key terms and definitions ● Key processes in sythetic speech production ● Text-To-Phones ● Phones to Synthesizer parameters.
Anatomic Aspects Larynx: Sytem of muscles, cartileges and ligaments.
Voice Transformations Challenges: Signal processing techniques have advanced faster than our understanding of the physics Examples: – Rate of articulation.
Representing Acoustic Information
Chapter 25 Nonsinusoidal Waveforms. 2 Waveforms Used in electronics except for sinusoidal Any periodic waveform may be expressed as –Sum of a series of.
Physics 1251 The Science and Technology of Musical Sound Unit 3 Session 31 MWF The Fundamentals of the Human Voice Unit 3 Session 31 MWF The Fundamentals.
CS 551/651: Structure of Spoken Language Lecture 1: Visualization of the Speech Signal, Introductory Phonetics John-Paul Hosom Fall 2010.
Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology Mark Hasegawa-Johnson
Source/Filter Theory and Vowels February 4, 2010.
Where we’re going Speed, Storage Issues Frequency Space.
Speech Production1 Articulation and Resonance Vocal tract as resonating body and sound source. Acoustic theory of vowel production.
Resonance, Revisited March 4, 2013 Leading Off… Project report #3 is due! Course Project #4 guidelines to hand out. Today: Resonance Before we get into.
Vowels, part 4 March 19, 2014 Just So You Know Today: Source-Filter Theory For Friday: vowel transcription! Turkish, British English and New Zealand.
Acoustic Phonetics 3/9/00. Acoustic Theory of Speech Production Modeling the vocal tract –Modeling= the construction of some replica of the actual physical.
MUSIC 318 MINI-COURSE ON SPEECH AND SINGING
Speech Science Fall 2009 Oct 26, Consonants Resonant Consonants They are produced in a similar way as vowels i.e., filtering the complex wave produced.
Speech Science VII Acoustic Structure of Speech Sounds WS
LING 001 Introduction to Linguistics Fall 2010 Sound Structure I: Phonetics Acoustic phonetics Jan. 27.
Speech Science Fall 2009 Oct 28, Outline Acoustical characteristics of Nasal Speech Sounds Stop Consonants Fricatives Affricates.
Voice Quality + Stop Acoustics
Say “blink” For each segment (phoneme) write a script using terms of the basic articulators that will say “blink.” Consider breathing, voicing, and controlling.
The end of vowels + The beginning of fricatives November 19, 2012.
1 Prof. Nizamettin AYDIN Digital Signal Processing.
09/19/2002 (C) University of Wisconsin 2002, CS 559 Last Time Color Quantization Dithering.
Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology Mark Hasegawa-Johnson University.
Structure of Spoken Language
Speech Science VI Resonances WS Resonances Reading: Borden, Harris & Raphael, p Kentp Pompino-Marschallp Reetzp
Resonance October 23, 2014 Leading Off… Don’t forget: Korean stops homework is due on Tuesday! Also new: mystery spectrograms! Today: Resonance Before.
Statistical NLP Spring 2011
Frequency, Pitch, Tone and Length February 12, 2014 Thanks to Chilin Shih for making some of these lecture materials available.
1 Conditions for Distortionless Transmission Transmission is said to be distortion less if the input and output have identical wave shapes within a multiplicative.
Stop Acoustics and Glides December 2, 2013 Where Do We Go From Here? The Final Exam has been scheduled! Wednesday, December 18 th 8-10 am (!) Kinesiology.
Stop + Approximant Acoustics
Vowels, part 4 November 16, 2015 Just So You Know Today: Vowel remnants + Source-Filter Theory For Wednesday: vowel transcription! Turkish and British.
1 Acoustic Phonetics 3/28/00. 2 Nasal Consonants Produced with nasal radiation of acoustic energy Sound energy is transmitted through the nasal cavity.
The Speech Chain (Denes & Pinson, 1993)
P105 Lecture #27 visuals 20 March 2013.
Acoustic Phonetics 3/14/00.
1 EE2003 Circuit Theory Chapter 17 The Fourier Series Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Chapter 17 The Fourier Series
Lecture 6 Periodic Signals, Harmonics & Time-Varying Sinusoids
The Human Voice. 1. The vocal organs
Vocoders.
How speech sounds are physically represented Chapters 12 and 13
Structure of Spoken Language
The Human Voice. 1. The vocal organs
The Vocal Pedagogy Workshop Session III – Articulation
Remember me? The number of times this happens in 1 second determines the frequency of the sound wave.
The Production of Speech
Speech Perception (acoustic cues)
Mark Hasegawa-Johnson 10/2/2018
Presentation transcript:

ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters

Today Fourier Series (Discrete Fourier Transform) Fourier Series (Discrete Fourier Transform) Sawtooth wave example Sawtooth wave example Finding the coefficients Finding the coefficients Spectrogram Spectrogram Frequency Response Frequency Response Speech Source-Filter Theory Speech Source-Filter Theory Speech Sources Speech Sources Transient Transient Frication Frication Aspiration Aspiration Voicing Voicing Formant Transitions Formant Transitions

Topic #1: Discrete Fourier Transform

Sawtooth Wave Sawtooth first harmonic: Sawtooth first harmonic: Amplitude: 0.64 Amplitude: 0.64 Phase: 0.52  radians Phase: 0.52  radians

Sawtooth Second Harmonic Sawtooth second harmonic: Sawtooth second harmonic: Amplitude: 0.32 Amplitude: 0.32 Phase: 0.53  radians Phase: 0.53  radians

Sawtooth Third Harmonic Sawtooth third harmonic: Sawtooth third harmonic: Amplitude: 0.21 Amplitude: 0.21 Phase: 0.55  radians Phase: 0.55  radians

Sawtooth: Calculating the Fourier Transform Average over one full period of (125Hz sawtooth * 125Hz cosine) = Average over one full period of (125Hz sawtooth * 125Hz cosine) = -0.03

Sawtooth: Calculating the Fourier Transform Average over one full period of (125Hz sawtooth * 125Hz sine) = Average over one full period of (125Hz sawtooth * 125Hz sine) = 0.636

Sawtooth: Amplitude and Phase of the First Harmonic First Fourier Coefficient of the Sawtooth Wave: First Fourier Coefficient of the Sawtooth Wave: j j | j| = 0.64 | j| = 0.64 phase( j) = 0.52  phase( j) = 0.52  X 1 = j = 0.64 e j0.52  X 1 = j = 0.64 e j0.52  First Harmonic: First Harmonic: h 1 (t) = 0.64 e j(250  t+0.52  ) h 1 (t) = 0.64 e j(250  t+0.52  ) Re{h1(t)} = 0.64 cos(250  t+0.52  ) Re{h1(t)} = 0.64 cos(250  t+0.52  )

Sawtooth: Calculating the Fourier Transform Average over one full period of (125Hz sawtooth * 250Hz cosine) = Average over one full period of (125Hz sawtooth * 250Hz cosine) = -0.03

Sawtooth: Calculating the Fourier Transform Average over one full period of (125Hz sawtooth * 250Hz sine) = Average over one full period of (125Hz sawtooth * 250Hz sine) =

Sawtooth: Amplitude and Phase of the Second Harmonic Second Fourier Coefficient of the Sawtooth Wave: Second Fourier Coefficient of the Sawtooth Wave: j j | j| = 0.32 | j| = 0.32 phase( j) = 0.53  phase( j) = 0.53  X 1 = j = 0.32 e j0.53  X 1 = j = 0.32 e j0.53  First Harmonic: First Harmonic: h 2 (t) = 0.32 e j(500  t+0.53  ) h 2 (t) = 0.32 e j(500  t+0.53  ) Re{h 2 (t)} = 0.64 cos(250  t+0.53  ) Re{h 2 (t)} = 0.64 cos(250  t+0.53  )

How to Reconstruct a Sawtooth from its Harmonics At sampling frequency of FS=8000Hz (8000 samples/second), the period of a 125Hz sawtooth is At sampling frequency of FS=8000Hz (8000 samples/second), the period of a 125Hz sawtooth is (8000 samples/second)/(125 cycles/second) = 64 samples (8000 samples/second)/(125 cycles/second) = 64 samples The “0”th harmonic is DC (0*125Hz = 0 Hz) The “0”th harmonic is DC (0*125Hz = 0 Hz) So we need to add up 63 harmonics: So we need to add up 63 harmonics: x(t) = h 1 (t) + h 2 (t) + … + h 63 (t) x(t) = h 1 (t) + h 2 (t) + … + h 63 (t)

Why Does This Work??? It’s a property of sines and cosines It’s a property of sines and cosines They are a BASIS SET, which means that… They are a BASIS SET, which means that… Let <> mean “average over one full period:” Let <> mean “average over one full period:” = 0.5 for any integer m = 0.5 for any integer m = 0 if m ≠ n = 0 if m ≠ n above two properties also hold for sine = 0 = 0  0 = 2  F 0 = 2  /T 0 is the “fundamental frequency” in radians/second  0 = 2  F 0 = 2  /T 0 is the “fundamental frequency” in radians/second

Topic #2: Spectrogram

Spectrogram Time (seconds) Frequency (Hz)

How to Create a Spectrogram Divide the waveform into overlapping window Divide the waveform into overlapping window Example: window length = 6ms Example: window length = 6ms Example: window spacing = one per 2ms Example: window spacing = one per 2ms Result: neighboring windows overlap by 4ms Result: neighboring windows overlap by 4ms Fourier transform each frame to create the “short-time Fourier transform” X k (t) Fourier transform each frame to create the “short-time Fourier transform” X k (t) Read: kth frequency bin, taken from the frame at time t seconds Read: kth frequency bin, taken from the frame at time t seconds Could also write X(t,f) = Fourier coefficient from the frame at time=t seconds, from the frequency bin centered at f Hz. Could also write X(t,f) = Fourier coefficient from the frame at time=t seconds, from the frequency bin centered at f Hz. In pixel (k,t), plot 20*log10(abs(X k (t)) In pixel (k,t), plot 20*log10(abs(X k (t)) The Fourier transform expressed in decibels! The Fourier transform expressed in decibels!

Reading a Spectrogram Time (seconds) Frequency (Hz) Vowel Turbulence (Frication or Aspiration) Stop Glide (Extreme Formant Frequencies) Each vertical stripe is a glottal closure instant (BANG – high energy at all frequencies) Inter-bang period = pitch period ≈ 10 ms Nasals

Topic #3: Frequency Response and Speech Source-Filter Theory

Frequency Response Any sound can be windowed Any sound can be windowed Call the window length “N” Call the window length “N” Its windows can be Fourier transformed: Its windows can be Fourier transformed: x(t) = h 1 (t) + h 2 (t) + … x(t) = X 1 e j  0 t + X 2 e 2j  0 t + … What happens when you filter a cosine? What happens when you filter a cosine? Input = e j  0 t --- Output = H(  0 )e j  0 t Input = e 2j  0 t --- Output = H(2  0 )e 2j  0 t …

Frequency Response If x(t) goes through any filter: If x(t) goes through any filter: x(t) → ANY FILTER → x(t) Then we can find y(t) using freq response! Then we can find y(t) using freq response! x(t) = X 1 e j  0 t + X 2 e 2j  0 t + … y(t) = H(  0 )X 1 e j  0 t + H(2  0 )X 2 e 2j  0 t + …

Frequency Response In general, we can write In general, we can write Y(  ) = H(  )X(  ) Y(  ) = H(  )X(  )

Examples of Filters Low-Pass Filter: Low-Pass Filter: Eliminates all frequency components above some cutoff frequency Eliminates all frequency components above some cutoff frequency Result sounds muffled Result sounds muffled High-Pass Filter: High-Pass Filter: Eliminates all frequency components below some cutoff frequency Eliminates all frequency components below some cutoff frequency Result sounds fizzy Result sounds fizzy Band-Pass Filter: Band-Pass Filter: Eliminates all components except those between f 1 and f 2 Eliminates all components except those between f 1 and f 2 Result: sounds like an old radio broadcast Result: sounds like an old radio broadcast Band-Stop Filter: Band-Stop Filter: Eliminates only the frequency components between f1 and f2 Eliminates only the frequency components between f1 and f2 Result: usually hard to hear that anything’s happened! Result: usually hard to hear that anything’s happened!

Examples of Filters A Room: A Room: Adds echoes to the signal Adds echoes to the signal A Quarter-Wave Resonator: A Quarter-Wave Resonator: Emphasizes frequency components at f=(c/4L)+(nc/2L), for all integer n Emphasizes frequency components at f=(c/4L)+(nc/2L), for all integer n Components at other frequencies are passed without any emphasis Components at other frequencies are passed without any emphasis A Vocal Tract: A Vocal Tract: Emphasizes any part of the input that is near a formant frequency Emphasizes any part of the input that is near a formant frequency Any energy at other frequencies is passed without emphasis Any energy at other frequencies is passed without emphasis

The Source-Filter Model of Speech Production Input: Weak Sources Input: Weak Sources Glottal closure Glottal closure Turbulence (frication, aspiration) Turbulence (frication, aspiration) Filter: The Vocal Tract Filter: The Vocal Tract Energy near formant frequency is emphasized by a factor of 10 or a factor of 100 Energy near formant frequency is emphasized by a factor of 10 or a factor of 100 Energy at other frequencies is passed without emphasis Energy at other frequencies is passed without emphasis Speech = (Vocal Tract Transfer Function) X (Excitation) Speech = (Vocal Tract Transfer Function) X (Excitation) S(  ) = T(  ) E(  ) S(  ) = T(  ) E(  )

Topic #4: Speech Sources. Presented as: Events in the Release of a Stop Consonant

Events in the Release of a Stop “Burst” = transient + frication (the part of the spectrogram whose transfer function has poles only at the front cavity resonance frequencies, not at the back cavity resonances).

Pre-voicing during Closure To make a voiced stop in most European languages: Tongue root is relaxed, allowing it to expandm so that vocal folds can continue to vibrating for a little while after oral closure. Result is a low- frequency “voice bar” that may continue well into closure. In English, closure voicing is typical of read speech, but not casual speech. “the bug”

Burst = Transient and Frication Frication = Turbulence at a Constriction Front cavity resonance frequency: F R = c/4L f Turbulence striking an obstacle makes noise The fricative “sh” (this ship)

Aspiration = Turbulence at the Glottis (like /h/). Transfer Function During Aspiration…

Formant Transitions: Labial Consonants “the mom” “the bug”

Formant Transitions: Alveolar Consonants “the tug” “the supper”

Formant Transitions: Post-alveolar Consonants “the shoe” “the zsazsa”

Formant Transitions: Velar Consonants “the gut” “sing a song”

Formant Transitions: A Perceptual Study The study: (1) Synthesize speech with different formant patterns, (2) record subject responses. Delattre, Liberman and Cooper, J. Acoust. Soc. Am

Perception of Formant Transitions