SPEECH RECOGNITION 1 DAY 14 – SEPT 27, 2013 Brain & Language LING 4110-4890-5110-7960 NSCI 4110-4891-6110 Harry Howard Tulane University.

SPEECH RECOGNITION 1 DAY 14 – SEPT 27, 2013 Brain & Language LING NSCI Harry Howard Tulane University

Course organization The syllabus, these slides and my recordings are available at If you want to learn more about EEG and neurolinguistics, you are welcome to participate in my lab. This is also a good way to get started on an honors thesis. The grades are posted to Blackboard. 9/27/13 Brain & Language, Harry Howard, Tulane University 2

REVIEW Modularity

Coltheart’s grouping & my explanation
1. Specific to a domain: by definition.
2. Information is encapsulated: by definition.
3. Fixed neural structure: in order to keep out all the other stuff.
4. Matures in a specific way: in order to build the fixed structure.
5. Fails in a specific way: because it was built in a specific way.
6. Limits central access: in order to keep out other stuff.
7. Operates mandatorily: since there is no external access, it can’t be turned on or off.
8. Acts quickly: because there is no other stuff to get in the way of optimizing speed.
9. Analyzes ‘shallowly’: because other stuff is necessary to analyze deeply.

SPEECH RECOGNITION Ingram §5

Three systems involved in speech production Respiratory Laryngeal Supralaryngeal

Vocal folds and their location in the larynx

Phonation Phonation, or speech sound, is created by turbulent oscillation between phases in which the passage of air through the larynx is unconstricted (the expiratory airflow has pushed the vocal folds apart) and phases in which the passage of air is blocked (the vocal folds snap back to their semi-closed position).

Turbulent oscillation of vocal air The following figure depicts such a transition, in which increasing darkness symbolizes increasing compression of the airflow. The heavy line represents the pressure of the airflow through the vocal folds as a single quantity between a minimum and a maximum. As the vocal folds close, the outflow of air is compressed and its pressure rises; as they open, the outflow of air is rarefied and its pressure falls. A single cycle of closing and opening is defined by the distance between two peaks, marked by dotted white lines.

Graph of turbulent oscillation of vocal air

An example: "phonetician" fonətɪʃən

Frequency This cycling of airflow has a certain frequency. The frequency of a phenomenon refers to the number of units that occur during some fixed extent of measurement. The basic unit of frequency, the hertz (Hz), is defined as one cycle per second.

Two sine functions with different frequencies A simple illustration can be found in the next diagram. It consists of the graphs of two sine functions. The one marked with o’s, like beads on a necklace, completes an entire cycle in about 0.63 s, which gives it a frequency of 1.59 Hz. The other wave, marked with x’s so that it looks like barbed wire, completes two cycles in this period. Thus, its frequency is twice as high, 3.18 Hz.
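
The arithmetic behind these two frequencies can be checked with a minimal sketch. It assumes, purely for illustration, that the slower wave is sin(10t), whose period is 2π/10 s; that choice reproduces the 1.59 Hz on the slide but is not stated in the source. The helper name `frequency_hz` is hypothetical.

```python
import math

def frequency_hz(cycles: float, seconds: float) -> float:
    """Frequency is the number of cycles completed per second."""
    return cycles / seconds

# Assumption: the slower wave is sin(10 * t), so one cycle lasts 2*pi/10 s.
window_s = 2 * math.pi / 10

slow_hz = frequency_hz(1, window_s)  # one cycle in the window
fast_hz = frequency_hz(2, window_s)  # two cycles in the same window

print(round(window_s, 2), round(slow_hz, 2), round(fast_hz, 2))  # 0.63 1.59 3.18
```

Doubling the number of cycles that fit in the same window doubles the frequency, which is all the diagram is meant to show.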

Graph of two sine functions with different frequencies

Fundamental frequency The pitch of the human voice corresponds to the frequency of vocal fold oscillation, called fundamental frequency or F0. Fundamental frequency & gender The fundamental frequency of a man’s voice averages 125 Hz; the fundamental frequency of a woman’s voice averages 200 Hz. This 60% increase in the pitch of a woman’s voice can be accounted for entirely by the fact that a man's vocal folds are on average 60% longer than a woman’s.
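
The 60% figure follows directly from the two averages. The sketch below works it out and then, under an added assumption that is not in the slides (the vocal folds behave like an ideal string whose frequency is inversely proportional to its length, with tension and mass per unit length held equal), shows the length ratio that the pitch ratio would imply. The constant names are hypothetical.

```python
MALE_F0_HZ = 125.0    # average from the slide
FEMALE_F0_HZ = 200.0  # average from the slide

pitch_ratio = FEMALE_F0_HZ / MALE_F0_HZ      # 1.6
percent_increase = (pitch_ratio - 1) * 100   # about 60

# Assumption (ideal string): frequency is proportional to 1 / length,
# so male length / female length equals female F0 / male F0.
implied_length_ratio = pitch_ratio

print(f"pitch ratio: {pitch_ratio:.2f}")
print(f"increase: {percent_increase:.0f}%")
print(f"implied male/female fold length ratio: {implied_length_ratio:.2f}")
```

Under that idealization, folds that are 60% longer give a fundamental that is 1/1.6 of the shorter folds' fundamental, which matches 125 Hz against 200 Hz.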

An example: "phonetician"

The fundamental & higher frequencies This brief introduction to the pitch of the human voice leads one to believe that the vocal folds vibrate at a single frequency, that of their fundamental frequency, much as the schematic string on the left side is shown vibrating at its fundamental frequency.

Higher frequencies However, this is an idealization that simplifies a rather complex subject. In reality, the vocal folds vibrate at a variety of frequencies that are multiples of the fundamental. The diagram depicts how this is possible: a string can vibrate at a frequency higher than its fundamental because smaller lengths of the string complete a cycle in a shorter period of time. In the particular case of the central diagram, each half of the string completes a cycle in half the time.
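
As a small illustration of "multiples of the fundamental", the sketch below lists the first few whole-number multiples of the two average fundamentals quoted earlier (125 Hz and 200 Hz). The function name `harmonics` is a hypothetical helper, not something from the course materials.

```python
def harmonics(f0_hz: float, count: int) -> list[float]:
    """Return the first `count` whole-number multiples of the fundamental."""
    return [n * f0_hz for n in range(1, count + 1)]

print(harmonics(125.0, 4))  # [125.0, 250.0, 375.0, 500.0]
print(harmonics(200.0, 4))  # [200.0, 400.0, 600.0, 800.0]
```

The second entry in each list corresponds to the central diagram: each half of the string vibrates at twice the fundamental.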

Superposition of frequencies This figure displays the outcome of superimposing both frequencies on the string and the waveform. The result is that a pulse of vibration created by the vocal folds projects an abundance of different frequencies in whole-number multiples of the fundamental. If we could hear just this pulse, it would sound, as Loritz (1999:93) says, “more like a quick, dull thud than a ringing bell”.
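
Superposition can be sketched numerically by adding a sine wave at the fundamental to one at twice the fundamental. The amplitudes (1.0 and 0.5) and the 125 Hz fundamental are assumptions chosen for illustration, and `superpose` is a hypothetical helper; this is not a model of a real glottal pulse.

```python
import math

def superpose(t: float, f0_hz: float, amplitudes: list[float]) -> float:
    """Sum sine waves at whole-number multiples of f0_hz.

    amplitudes[n - 1] is the weight of the n-th multiple.
    """
    return sum(
        a * math.sin(2 * math.pi * n * f0_hz * t)
        for n, a in enumerate(amplitudes, start=1)
    )

F0_HZ = 125.0            # assumed fundamental
AMPLITUDES = [1.0, 0.5]  # assumed weights: fundamental and its double
period_s = 1 / F0_HZ

# Sample one full cycle of the fundamental at nine evenly spaced points.
samples = [superpose(k * period_s / 8, F0_HZ, AMPLITUDES) for k in range(9)]
for k, value in enumerate(samples):
    print(f"t = {k}/8 period: {value:+.3f}")
```

The combined waveform is no longer a plain sine curve, yet it still repeats once per period of the fundamental, which is why the perceived pitch stays at the fundamental.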

An example: the spectrogram of "phonetician" f o n ə t ɪʃ ən

Cavities & resonance But the human voice does not sound like a quick, dull thud; it sounds, well, it sounds like a human voice. This is because the human vocal tract sits on top of the larynx, and the vocal tract enhances the glottal pulse just like a trumpet enhances the shrill tweet of its reed, as illustrated previously. In particular, the buccal and nasal cavities resonate at certain frequencies, thereby exaggerating some harmonics while muting others. The oral cavity itself sits in a channel between two smaller cavities whose size varies according to the position of the tongue and lips. The next diagram zooms in on the buccal cavity to distinguish the other two. Counting from the back, there is 1. a pharyngeal cavity, 2. an oral cavity properly speaking, and 3. a labiodental cavity, between the teeth and the lips. Notice how the difference in tongue position for [i], the vowel in seed, and [a], the vowel in sod, changes the size of the oral and pharyngeal cavities.
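
The idea of a cavity "exaggerating some harmonics while muting others" can be illustrated with a toy calculation. Everything specific here is an assumption made for the sketch: the bell-shaped gain formula, the 500 Hz resonant frequency, the 100 Hz bandwidth, and the helper name `resonance_gain`. It is not a model of a real vocal tract.

```python
def resonance_gain(freq_hz: float, resonant_hz: float, bandwidth_hz: float) -> float:
    """Toy bell-shaped gain: 1.0 at the resonant frequency, smaller on either side."""
    detune = (freq_hz - resonant_hz) / (bandwidth_hz / 2)
    return 1.0 / (1.0 + detune ** 2) ** 0.5

F0_HZ = 125.0         # average male fundamental from the slides
RESONANT_HZ = 500.0   # hypothetical cavity resonance
BANDWIDTH_HZ = 100.0  # hypothetical width of the resonance

harmonic_freqs = [n * F0_HZ for n in range(1, 9)]
shaped = {f: resonance_gain(f, RESONANT_HZ, BANDWIDTH_HZ) for f in harmonic_freqs}
strongest = max(shaped, key=shaped.get)

for f in harmonic_freqs:
    print(f"{f:6.0f} Hz  gain {shaped[f]:.2f}")
print("most enhanced harmonic:", strongest)
```

Harmonics near the assumed resonance pass almost unchanged while those farther away are attenuated; changing `RESONANT_HZ` shifts which harmonics stand out, in the way that changing the size of a cavity does.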

The three buccal cavities, articulating [i] and [a]

Formants This difference produces a marked contrast in the frequencies that resonate in these cavities, as shown by the schematic plots of frequency over time in the next figure. Such enhanced frequencies, known as formants, carry the acoustic information that allows us to distinguish [i] from [a], as well as most other speech sounds. Roughly speaking, the resonance of all three cavities together produces the lowest or first formant, the resonance of the pharyngeal & oral cavities produces the second formant, and the resonance of the labiodental cavity produces the third formant (Loritz 1999:96). We hedge with “roughly” because the pharyngeal cavity can take on special resonance properties, and the labiodental cavity can combine with the oral cavity; see Ladefoged (1996:123ff) for more detailed discussion.

Schematic spectrograms of the lowest three resonant frequencies (formants) of [i] and [a]

What it really looks like

NEXT TIME Q4 Finish Ingram §5 & start §6. ☞ Go over questions at end of chapter.