The Acoustics and Perception of American English Vowels

The Acoustics and Perception of American English Vowels Hillenbrand: Vowels

Vowel Symbols
[i] heed: small i
[ɪ] hid: cap i, or small cap i
[e] hayed, bait: small e
[ɛ] head: epsilon
[æ] had: ash
[ɑ] hod, pod: script a (note the difference between [ɑ] and [a])
[ɔ] hawed, caught: open o
[o] hoed, boat: small o
[ʊ] hood: upsilon
[u] who’d, boot: small u
[ʌ] hud, but: caret, wedge, or turned v
[ɚ] heard: schwar (you may have learned [ɝ])
[ə] about, mantra: schwa

Major Dimensions of Vowel Articulation
Tongue height, e.g., [i] (“beet”) vs. [æ] (“bat”)
Frontness or advancement, e.g., [æ] (“pat”) vs. [ɑ] (“pot”)
Lip rounding, e.g., [u] vs. [ɑ]
(There are many secondary dimensions as well.)

Vowel Quadrilateral for English

If you are not familiar with the vowel quadrilateral (i.e., which vowels are high, low, and mid; which are front, back, and central; which are rounded and which are retracted), you will need to review. If you need help finding material, let me know.

Formant Patterns for the “Non-central” (i.e., omitting /ʌ/ and /ɚ/) Monophthongal Vowels of American English (based on Peterson & Barney averages)

Another Way to Visualize Formant Data for Vowels: The “Standard” F1-F2 Plot

Formant Data for Men: “Standard” F1-F2 Plot

Notice that the formant values for women for a given vowel are shifted up and to the right relative to those for men, indicating higher values for both F1 and F2. This is due to the shorter vocal tracts of women vs. men. The same holds for the formant values of children relative to women, and for the same reason: children have shorter vocal tracts than women.
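To make the vocal-tract-length argument concrete, here is a minimal sketch using the textbook quarter-wavelength approximation, which treats the vocal tract as a uniform tube closed at the glottis and open at the lips. The tube model, the speed of sound, and the three tract lengths are assumptions chosen for illustration; none of them come from these slides.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed round value for warm, moist air


def tube_resonances(length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end:
    F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


# Hypothetical tract lengths (cm), picked only to show the trend.
ASSUMED_LENGTHS_CM = {"man": 17.5, "woman": 14.5, "child": 11.5}

for talker, length_cm in ASSUMED_LENGTHS_CM.items():
    # A shorter tube gives higher resonances for every formant.
    print(talker, [round(f) for f in tube_resonances(length_cm)])
```

Every resonance scales with 1/L in this model, so shortening the tube raises all formants together. That matches the direction of the shift seen across men, women, and children, although real vocal tracts are not uniform tubes.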

Q: Is the upward shift in formants (M vs. W vs. C) also due to differences in the length and mass of the vocal folds across the three talker groups?

One More (Apparently Screwy) Way to Visualize Vowel Formant Data: The Acoustic Vowel Diagram
Note that in the Acoustic Vowel Diagram (1) the axes are reversed and (2) the numbers go backwards. Why would anyone do such a screwy thing?

Conventional F1-F2 Plot vs. Acoustic Vowel Diagram

Formant data are being plotted, but the result strongly resembles an articulatory vowel diagram, with the x axis corresponding to tongue advancement (i.e., front vs. back) and the y axis corresponding to tongue height. This gives us a convenient way to interpret formant data in articulatory terms.
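The axis swap and reversal can be sketched in a few lines. The function name `to_vowel_diagram` is hypothetical, and the formant values are approximate figures for adult men in the neighborhood of commonly cited averages; treat them as illustrative assumptions rather than data from these slides.

```python
# Approximate (F1, F2) values in Hz for adult men; illustrative assumptions.
MEN_F1_F2 = {
    "i": (270, 2290),
    "æ": (660, 1720),
    "ɑ": (730, 1090),
    "u": (300, 870),
}


def to_vowel_diagram(f1_hz, f2_hz):
    """Map (F1, F2) to ordinary (x, y) plotting coordinates.

    F2 goes on the x axis and F1 on the y axis, and both are negated so
    that front vowels land on the left and high vowels land at the top.
    """
    return (-f2_hz, -f1_hz)


points = {v: to_vowel_diagram(f1, f2) for v, (f1, f2) in MEN_F1_F2.items()}

front_most = min(points, key=lambda v: points[v][0])  # farthest left
back_most = max(points, key=lambda v: points[v][0])   # farthest right
highest = max(points, key=lambda v: points[v][1])     # nearest the top
lowest = min(points, key=lambda v: points[v][1])      # nearest the bottom

print(front_most, back_most, highest, lowest)
```

With these values the front-most and highest vowel is [i], the back-most is [u], and the lowest is [ɑ], which is the same arrangement as the articulatory vowel quadrilateral.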

What is the articulatory explanation for the differences in formant frequencies? What effect might this have on the intelligibility of the vowels spoken by the deaf talker? The data shown above are hypothetical, but this is exactly the sort of thing that has been observed in the speech of deaf talkers. For example, Monsen (1978) showed that (a) the formant values of deaf talkers tend to be centralized relative to those of normal-hearing (NH) talkers, and (b) the degree of centralization is a good predictor of speech intelligibility.
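One possible way to put a number on "centralization" is the average distance of a talker's vowels from the center of that talker's own F1-F2 space. This is only a sketch: the measure, the function name `vowel_space_spread`, and both sets of formant values are assumptions made up for illustration, and this is not necessarily the measure Monsen used.

```python
import math


def vowel_space_spread(formants):
    """Mean distance (Hz) of each vowel from the centroid of the
    talker's (F1, F2) points. Smaller values mean a more centralized
    vowel space. F1 and F2 are weighted equally, which is a
    simplification."""
    points = list(formants.values())
    center_f1 = sum(f1 for f1, _ in points) / len(points)
    center_f2 = sum(f2 for _, f2 in points) / len(points)
    distances = [math.hypot(f1 - center_f1, f2 - center_f2) for f1, f2 in points]
    return sum(distances) / len(distances)


# Hypothetical values (Hz), invented only to illustrate the computation.
TYPICAL = {"i": (270, 2290), "æ": (660, 1720), "ɑ": (730, 1090), "u": (300, 870)}
CENTRALIZED = {"i": (400, 1800), "æ": (560, 1600), "ɑ": (600, 1300), "u": (420, 1200)}

print(round(vowel_space_spread(TYPICAL)), round(vowel_space_spread(CENTRALIZED)))
```

The centralized talker gets the smaller spread. If vowels crowd toward the middle of the space, neighboring vowels become harder to tell apart, which is one way to think about the link to intelligibility.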

Peterson & Barney (1952)
Study conducted at Bell Labs. The first big acoustic study carried out with the (then) recently invented sound spectrograph machine.
1. Recordings: 10 vowels (i, ɪ, ɛ, æ, ɑ, ɔ, ʊ, u, ʌ, ɚ) in /hVd/ context (heed, hid, head, had, etc.); 76 talkers (33 men, 28 women, 15 children)
2. Measurements: f0, F1-F3
3. Listening study: 70 listeners asked to identify each test signal as one of ten words (heed, hid, head, had, etc.)

Listening Test Results
Simple: the signals were highly intelligible (94.5%).
Error rate varied some across vowels. For example:
Error rate very low for: [i] (0.1%), [ɚ] (0.4%), [u] (0.8%)
Higher for: [ɑ] (13.0%, confused with [ɔ]), [ɔ] (12.1%, confused with [ɑ])
Details aside, the simple message is that vowel identity was transmitted quite accurately to the listeners. What information do listeners use to recognize vowels? To answer this, we need to start by looking at the acoustic data.

Peterson & Barney (1952): General American English Vowel Formant Data
Most striking: lots of overlap among adjacent vowels

It is mostly the case that the men occupy the lower left portion of each ellipse, the children occupy the upper right portion, and the women cluster toward the center. This is mainly due to differences in vocal-tract length. There is quite a bit of variability across individual talkers, though. (Data from Peterson & Barney, 1952.)

Same Data as Previous Figure, but Plotted on a Single Graph

Hillenbrand, Getty, Clark & Wheeler (1995): Michigan (Northern Cities) Vowel Formant Data
1. Lots of overlap among adjacent vowels
2. [æ] and [ɛ] almost on top of one another, and out of order relative to Peterson & Barney (1952)

Peterson & Barney (Mostly Mid-Atlantic) vs. Hillenbrand et al. (Upper Midwest/Northern Cities)
1. [æ] is raised and fronted in the Northern Cities data
2. Back vowels (e.g., [ɑ, ɔ]) are fronted and lower in the Northern Cities data
3. High vowels ([i ɪ u ʊ]) are not quite as high in the Northern Cities data

Question: How well can vowels be separated based on F1 and F2 alone? This is the kind of question that can be answered with a statistical pattern recognition algorithm. This is a much simpler idea than you might be thinking.

How a Pattern Recognizer Works: Training and Testing
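The training and testing steps can be sketched with about the simplest recognizer there is, a nearest-centroid classifier. This is a stand-in chosen for brevity, not the method used in the study cited on the next slide, and the training tokens below are hypothetical values invented for illustration.

```python
import math


def train(labeled_tokens):
    """Training: average the (F1, F2) measurements for each vowel label."""
    sums = {}
    for label, f1, f2 in labeled_tokens:
        total = sums.setdefault(label, [0.0, 0.0, 0])
        total[0] += f1
        total[1] += f2
        total[2] += 1
    return {label: (s1 / n, s2 / n) for label, (s1, s2, n) in sums.items()}


def classify(centroids, f1, f2):
    """Testing: choose the vowel whose average is closest to the new token."""
    return min(
        centroids,
        key=lambda label: math.hypot(f1 - centroids[label][0], f2 - centroids[label][1]),
    )


# Hypothetical training tokens: (label, F1 in Hz, F2 in Hz).
TRAINING = [
    ("i", 270, 2290),
    ("i", 310, 2790),
    ("ɑ", 730, 1090),
    ("ɑ", 850, 1220),
]

model = train(TRAINING)
print(classify(model, 300, 2500))  # a token near the [i] average
print(classify(model, 800, 1150))  # a token near the [ɑ] average
```

To score a recognizer like this, classify a set of tokens whose labels are known and count the percentage it gets right. That percentage is what gets compared with the human listeners.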

Q: So, how well can vowels be separated based on F1 and F2 alone?
A: Pretty well, but not nearly well enough to explain human listener data.
Pattern classification results from Hillenbrand-Gayvert (1993):
                             Automatic Classification    Human Listeners
Peterson & Barney vowels:    74.9%                       94.4%
Hillenbrand et al. vowels:   68.2%                       95.4%
Listeners must be using information other than F1 and F2 to recognize vowels. Like what?

So, listeners must be using some information to recognize vowels other than F1 and F2. What information?
F3: It helps some (especially for /ɚ/), but not enough. Automatic classification improves to about 80-85%: better, but still well below human listeners.
f0: Ditto. It helps some, but not enough. Automatic classification improves to about 80-85%: better, but still well below human listeners.
F3 and f0: Better still (~89-90%), but still below human listeners.

What does this mean? It appears as though listeners are recognizing vowels based on information other than f0 and F1-F3. What are the possibilities? Two candidates:
1. Duration
2. Patterns of spectral change over time

Do Listeners Use Duration in Vowel Identification?
American English vowels have different typical durations: /i/ > /ɪ/, /u/ > /ʊ/, /æ/ > /ɛ/, /ɑ/ > /ʌ/, /ɔ/ > /ɑ/

[hæd]: Original Duration, Short Duration, Long Duration
Utterances were presented at their original durations, or they were artificially shortened or lengthened while keeping everything else the same.

Logic: If duration plays no role in vowel recognition, the three signal types ought to be equally intelligible; i.e., artificially modifying duration will not affect what vowel is heard. On the other hand, if duration plays a role in vowel perception, the original-duration (OD) signals ought to be more intelligible than any of the duration-modified signals. Also, there are specific kinds of changes in vowel identity that we would expect. For example:
Shortened [i] ought to be heard as [ɪ]
Lengthened [ɪ] ought to be heard as [i]
Shortened [ɑ] ought to be heard as [ʌ]
Lengthened [ʌ] ought to be heard as [ɑ]
Shortened [u] ought to be heard as [ʊ]
Lengthened [ʊ] ought to be heard as [u]
Shortened [æ] ought to be heard as [ɛ]
Lengthened [ɛ] ought to be heard as [æ]

RESULTS
Original Duration: 96.0%
Short Duration: 91.4%
Long Duration: 90.9%

Effects of Duration on Vowel Perception: Original Duration, Long Duration, Short Duration

CONCLUSIONS
1. Duration has a measurable but fairly small overall effect on vowel perception.
2. Vowel shortening (-2 SDs): ~5% drop in overall intelligibility
3. Vowel lengthening (+2 SDs): ~5% drop in overall intelligibility
4. Vowels most affected: [ɑ]-[ɔ]-[ʌ], [æ]-[ɛ]
5. Vowels not affected: [i]-[ɪ], [u]-[ʊ]
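The "~5% drop" figures follow directly from the overall percentages on the results slide. A quick check, where the helper name `drop_from_original` is hypothetical:

```python
# Overall percent correct for each signal type, from the results slide.
RESULTS = {"original": 96.0, "short": 91.4, "long": 90.9}


def drop_from_original(results, condition):
    """Percentage-point drop relative to the original-duration signals."""
    return round(results["original"] - results[condition], 1)


print(drop_from_original(RESULTS, "short"))
print(drop_from_original(RESULTS, "long"))
```

Both drops come out close to five percentage points, so shortening and lengthening hurt overall intelligibility by about the same small amount.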

The Role of Spectral Change in Vowel Perception
Notice that some vowels, especially [æ] and [ɪ], show a fair amount of change in formant frequencies throughout the vowel. Is it possible that these formant movements are perceptually significant?

More examples. Note especially the rise in F2 for [ʊ] and [ʌ].

Another way to visualize patterns of formant frequency change in vowels: This figure shows formant frequencies measured once at the beginning of the vowel and a second time at the end of the vowel. (The phonetic symbol is plotted at the second measurement.) Note that some vowels (e.g., [i] and [u]) are pretty steady over time, but others have formants that change quite a bit throughout the course of the vowel (e.g., [e, o, ʌ, ʊ, æ, ɪ]).
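The two-measurement idea lends itself to a simple number: how far the vowel travels in F1-F2 space between onset and offset. The sketch below assumes that measure and the function name `formant_movement`. The onset and offset values are hypothetical, invented only to show a steady vowel next to a moving one.

```python
import math


def formant_movement(onset, offset):
    """Length (Hz) of the straight line from the onset (F1, F2)
    measurement to the offset (F1, F2) measurement."""
    return math.hypot(offset[0] - onset[0], offset[1] - onset[1])


# Hypothetical (onset, offset) pairs of (F1, F2) in Hz; not measured data.
TOKENS = {
    "i": ((280, 2300), (290, 2320)),  # nearly steady
    "æ": ((600, 2000), (700, 1750)),  # substantial movement
}

for vowel, (onset, offset) in TOKENS.items():
    print(vowel, round(formant_movement(onset, offset)))
```

A value near zero corresponds to a short arrow in the figure and a large value to a long one. Note that a single length discards the direction of movement, which may itself be informative.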

[hæd] in three versions:
NAT: Naturally spoken
OF: Synthesized, preserving original formant contours
FF: Synthesized with flattened formants
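What "flattened formants" means can be sketched as replacing a time-varying formant contour with a constant. The slides do not say which value the flattened formants were held at, so taking the value at the middle of the vowel is an assumption, as are the function name `flatten_track` and the example contour.

```python
def flatten_track(track_hz):
    """Replace a formant contour with a constant equal to its value
    at the middle frame (an assumed choice of steady-state point)."""
    steady_value = track_hz[len(track_hz) // 2]
    return [steady_value] * len(track_hz)


# Hypothetical F2 contour (Hz) sampled at equal time steps.
f2_original = [2000, 1950, 1880, 1800, 1750]
f2_flat = flatten_track(f2_original)

print(f2_original)  # OF-style contour: the movement is preserved
print(f2_flat)      # FF-style contour: same length, no movement
```

The flattened track keeps the vowel's duration and its general location in formant space and removes only the movement, which is the property the OF vs. FF comparison is meant to isolate.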

Key comparison is OF vs. FF: If the formant movements don’t matter, the recognition rates for OF and FF should be very similar. On the other hand, if the formant movements are important, the FF signals will be less intelligible than the OF signals.
Conclusion: Spectral change patterns do matter, quite a bit.

What can we conclude from all this about how listeners recognize which vowel was spoken?
1. Primary cues: F1 and F2. Relationships among the formants matter, not absolute formant frequencies.
2. Cues that are of secondary importance, but definitely play a role in vowel perception:
f0
F3 (especially for [ɚ])
Spectral change patterns
Vowel duration

Implications for 2nd Language Learning
Is any of this information (e.g., the role played by vowel duration and spectral change in vowel perception) useful? The findings we just reviewed are not universal facts about vowels; they are facts about English vowels. Other languages will behave the same way only by accident.

English, for example, has a pretty large (and therefore crowded) vowel system. Only 12 vowels are plotted below, and it’s pretty crowded. Including diphthongs, English has 15 vowel phonemes. This almost certainly explains why duration and spectral change are important: these features give speakers two more ways to differentiate one vowel from another.

Example: Notice how close [æ] and [ɛ] are to one another. How do speakers distinguish these two vowels? How do listeners figure out which is which? (The same question posed from two points of view.)

Why does any of this matter? Many languages have much smaller vowel systems than English. Examples: Spanish (5), Italian (7), Japanese (5), … (Figure: Spanish vowels)

The simple point is that a speaker of a language like Spanish has some work to do, both as a speaker (learning many brand-new vowels) and as a listener (learning what native English speakers learned as children, e.g., that features like duration and spectral change now matter). We’ll be talking about some closely related aspects of 2nd-language learning a little later.