1
Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 {steveng, hmcarvey, leahh, shawnc}@icsi.berkeley.edu
2
Acknowledgements and Thanks Research Funding U.S. Department of Defense U.S. National Science Foundation
3
For Further Information Consult the web site: www.icsi.berkeley.edu/~steveng
4
OVERTURE The Central Challenge for Models of Speech Recognition
5
The Serial Frame Perspective on Speech Traditional models of speech recognition assume the identity of a phonetic segment is derived from a detailed spectral profile of the acoustic signal computed for each time interval (frame) of speech
6
Phonemic Beads on a String Illustrated In traditional models of speech recognition words are represented as mere sequences of phonetic segments (“phones”) ….
7
Phonemic Beads on a String Illustrated In traditional models of speech recognition words are represented as mere sequences of phonetic segments (“phones”) …. Strung together like “beads on a string”
8
Phonemic Beads on a String Illustrated In traditional models of speech recognition words are conceptualized as mere sequences of phonetic segments (“phones”) …. Strung together like “beads on a string” No quarter is provided for stress accent or other syllabic properties
9
Language - The Traditional Perspective The “classical” view of spoken language posits a quasi-arbitrary relation between the lower and higher tiers of linguistic organization Cat = [k] + [ae] + [t] Cat = /k/ + /ae/ + /t/ ASR systems focus on decoding words from sequences of phones
10
A Challenge for the “Phonemic Beads on a String” Approach to Speech Recognition Pronunciation Variability
11
Pronunciation Variability of Real Speech Pronunciation patterns encountered in everyday life are extremely diverse
12
Pronunciation Variability of Real Speech Pronunciation patterns encountered in everyday life are extremely diverse There are literally dozens of ways in which common words are pronounced
13
Pronunciation Variability of Real Speech Pronunciation patterns encountered in everyday life are extremely diverse There are literally dozens of ways in which common words are pronounced (as the following two slides illustrate for the word “AND” based on manual phonetic annotation of a corpus comprising telephone dialogues)
14
How Many Pronunciations of “and”? [Table columns: N, Pronunciation, N; legend: Canonical pronunciation]
15
How Many Pronunciations of “and”? [Table columns: N, Pronunciation, N]
16
Pronunciation Variability of Real Speech There are literally dozens of ways in which common words are pronounced And as the following slide illustrates for the 20 most frequent words from the same corpus (Switchboard)
17
Pronunciation Variability of Real Speech There are literally dozens of ways in which common words are pronounced And as the following slide illustrates for the 20 most frequent words from the same corpus (Switchboard) (which together account for 35% of the word tokens in the corpus)
18
How Many Different Pronunciations? [Table columns: Rank, Word, N, #Pron, Most Common Pronunciation, MCP %Total] The 20 most frequent words account for 35% of the tokens
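The column headings above (N, #Pron, most common pronunciation and its share) can be derived from any phonetically annotated token list. The following is a minimal sketch of that bookkeeping, not the procedure actually used for Switchboard; the function name, the field names and the toy tokens are all invented for illustration.

```python
from collections import Counter, defaultdict

def pronunciation_stats(tokens):
    """tokens: iterable of (word, phone_string) pairs from a phonetic transcript."""
    by_word = defaultdict(Counter)
    for word, pron in tokens:
        by_word[word.lower()][pron] += 1
    stats = {}
    for word, prons in by_word.items():
        n = sum(prons.values())                   # N: number of tokens of this word
        mcp, mcp_count = prons.most_common(1)[0]  # most common pronunciation
        stats[word] = {
            "n": n,
            "n_pron": len(prons),                 # number of distinct pronunciations
            "mcp": mcp,
            "mcp_share": mcp_count / n,           # fraction of tokens using the MCP
        }
    return stats

# Toy tokens, invented for illustration (not taken from Switchboard).
toy_tokens = [
    ("and", "ae n d"), ("and", "ae n"), ("and", "ax n"), ("and", "ae n"),
    ("the", "dh ax"), ("the", "dh iy"),
]
stats = pronunciation_stats(toy_tokens)
```

Sorting the resulting dictionary by "n" would give a frequency-ranked listing of the kind the slide title refers to.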
19
QUESTION How do listeners decode the speech signal given the large amount of pronunciation variation?
20
PART ONE Anatomy of a Syllable
21
Language - A Syllable-Centric Perspective A more empirically grounded perspective of spoken language focuses on the SYLLABLE as the interface between “sound” and “meaning” Within this framework the relationship between the syllable and the higher and lower tiers is non-arbitrary and statistically systematic
22
The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure
23
The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure In order to highlight patterns germane to variation in segmental duration it is necessary to partition the data in terms of syllable position
24
The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure In order to highlight patterns germane to variation in segmental duration it is necessary to partition the data in terms of syllable position (as well as stress accent level)
25
The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure In order to highlight patterns germane to variation in segmental duration it is necessary to partition the data in terms of syllable position (as well as stress accent level) As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns
26
The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure In order to highlight patterns germane to variation in segmental duration it is necessary to partition the data in terms of syllable position (as well as stress accent level) As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns What is an onset?
27
The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure In order to highlight patterns germane to variation in segmental duration it is necessary to partition the data in terms of syllable position (as well as stress accent level) As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns What is an onset? What is a nucleus?
28
The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure In order to highlight patterns germane to variation in segmental duration it is necessary to partition the data in terms of syllable position (as well as stress accent level) As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns What is an onset? What is a nucleus? What is a coda?
29
The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure In order to highlight patterns germane to variation in segmental duration it is necessary to partition the data in terms of syllable position (as well as stress accent level) As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns What is an onset? What is a nucleus? What is a coda? The following slides provide a brief (and gentle) introduction to syllable structure
30
Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA “J” = JUNCTURE; OGI Numbers95 corpus
31
Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition) “J” = JUNCTURE; OGI Numbers95 corpus
32
Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition) Most (but not all) syllables also contain an ONSET (usually a CONSONANT) “J” = JUNCTURE; OGI Numbers95 corpus
33
Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition) Most (but not all) syllables also contain an ONSET (usually a CONSONANT) Many syllables contain a CODA (also typically a CONSONANT) “J” = JUNCTURE; OGI Numbers95 corpus
34
Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition) Most (but not all) syllables also contain an ONSET (usually a CONSONANT) Many syllables contain a CODA (also typically a CONSONANT) The most common syllable form in English is Onset + Nucleus + Coda (“Nine”) “J” = JUNCTURE; OGI Numbers95 corpus
35
Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition) Most (but not all) syllables also contain an ONSET (usually a CONSONANT) Many syllables contain a CODA (also typically a CONSONANT) The most common syllable form in English is Onset + Nucleus + Coda (“Nine”) Followed in popularity by Onset + Nucleus (“Two”) “J” = JUNCTURE; OGI Numbers95 corpus
36
Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition) Most (but not all) syllables also contain an ONSET (usually a CONSONANT) Many syllables contain a CODA (also typically a CONSONANT) The most common syllable form in English is Onset + Nucleus + Coda (“Nine”) Followed in popularity by Onset + Nucleus (“Two”) Onset segments often differ in significant ways from coda segments “J” = JUNCTURE; OGI Numbers95 corpus
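The onset / nucleus / coda partition described above can be sketched in a few lines. This is an illustrative sketch only: the vocalic inventory is an assumed Arpabet-style symbol set, the phone strings for “Nine” and “Two” are assumed transcriptions, and the function names are hypothetical.

```python
# Assumed inventory of vocalic symbols (Arpabet-style vowels plus syllabic consonants).
VOCALIC = {
    "aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er", "ey",
    "ih", "ix", "iy", "ow", "oy", "uh", "uw", "el", "em", "en",
}

def split_syllable(phones):
    """Split one syllable's phone list into onset, nucleus and coda.

    The first vocalic segment is taken as the nucleus; everything before it
    is the onset and everything after it is the coda.
    """
    for i, phone in enumerate(phones):
        if phone in VOCALIC:
            return {"onset": phones[:i], "nucleus": phone, "coda": phones[i + 1:]}
    raise ValueError("no vocalic nucleus found")

def shape(parts):
    """Describe the syllable form, e.g. 'Onset + Nucleus + Coda'."""
    text = "Nucleus"
    if parts["onset"]:
        text = "Onset + " + text
    if parts["coda"]:
        text = text + " + Coda"
    return text

nine = split_syllable(["n", "ay", "n"])  # assumed transcription of "Nine"
two = split_syllable(["t", "uw"])        # assumed transcription of "Two"
```

With these assumed transcriptions, “Nine” comes out as Onset + Nucleus + Coda and “Two” as Onset + Nucleus, matching the two forms named on the slide.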
37
PART TWO Spectro-Temporal Profiles
38
The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation
39
The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation STRESS ACCENT and JUNCTURE are two such properties
40
The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation Stress Accent and Juncture are two such properties A different representation, based on the log, critical-band energy profile across frequency and time, can provide the requisite detail
41
The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation Stress Accent and Juncture are two such properties A different representation, based on the log, critical-band energy profile across frequency and time, can provide the requisite detail As shown in “miniature” below …..
42
The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation Stress Accent and Juncture are two such properties A different representation, based on the log, critical-band energy profile across frequency and time, can provide the requisite detail As shown in “miniature” below ….. STePs are derived from averages of hundreds of individual instances
43
The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation Stress Accent and Juncture are two such properties A different representation, based on the log, critical-band energy profile across frequency and time, can provide the requisite detail As shown in “miniature” below …. (and as shown in expanded form on the following slides) STePs are derived from averages of hundreds of individual instances
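One way to picture the averaging step is sketched below. The slides state only that a STeP is an average over hundreds of instances of a log, critical-band energy profile; the details here are assumptions: each instance is taken to be a [band][frame] matrix of linear energies already time-aligned to a common length, energies are converted to dB before averaging, and the function name and floor value are hypothetical.

```python
import math

def step_average(instances, floor=1e-10):
    """Average log-energy profiles over instances.

    instances: list of equally sized [band][frame] matrices of linear energy.
    Returns a [band][frame] matrix of mean energy in dB.
    """
    n_bands = len(instances[0])
    n_frames = len(instances[0][0])
    total = [[0.0] * n_frames for _ in range(n_bands)]
    for inst in instances:
        for b in range(n_bands):
            for t in range(n_frames):
                # Floor the energy to avoid log10(0), then convert to dB.
                total[b][t] += 10.0 * math.log10(max(inst[b][t], floor))
    count = len(instances)
    return [[value / count for value in row] for row in total]

# Two toy instances with one band and two frames (invented values).
profile = step_average([[[1.0, 10.0]], [[100.0, 10.0]]])
```

Whether the original analysis averaged in the log domain or took the log of an average is not stated in the slides; the sketch picks the former.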
44
Spectro-Temporal Profile - Disyllabic Word [Figure: STeP of “Seven” ([s] [eh] [vx] [en]), full-spectrum perspective, OGI Numbers95; labels mark the accented syllable, the juncture, the unaccented syllable and mean duration]
45
Spectro-Temporal Profile - Disyllabic Word [Figure: STeP of “Seven” ([s] [eh] [vx] [en]), high-frequency perspective, OGI Numbers95; labels mark the accented syllable, the juncture, the unaccented syllable and mean duration]
46
PART THREE Scientific Approach to Speech Recognition
47
Ascertain the contribution of …. A Scientific Approach to Speech Recognition
48
Ascertain the contribution of …. (1) phonetic segment (and feature) classification A Scientific Approach to Speech Recognition
49
Ascertain the contribution of …. (1) phonetic segment (and feature) classification (2) phonetic segmentation A Scientific Approach to Speech Recognition
50
Ascertain the contribution of …. (1) phonetic segment (and feature) classification (2) phonetic segmentation (3) stress accent, and A Scientific Approach to Speech Recognition
51
Ascertain the contribution of …. (1) phonetic segment (and feature) classification (2) phonetic segmentation (3) stress accent, and (4) syllable position A Scientific Approach to Speech Recognition
52
Ascertain the contribution of …. (1) phonetic segment (and feature) classification (2) phonetic segmentation (3) stress accent, and (4) syllable position to ASR performance A Scientific Approach to Speech Recognition
53
Ascertain the contribution of …. (1) phonetic segment (and feature) classification (2) phonetic segmentation (3) stress accent, and (4) syllable position to ASR performance Using the OGI Numbers95 Corpus as a controlled (limited vocabulary) corpus A Scientific Approach to Speech Recognition
54
Ascertain the contribution of …. (1) phonetic segment (and feature) classification (2) phonetic segmentation (3) stress accent, and (4) syllable position to ASR performance Using the OGI Numbers95 Corpus as a controlled (limited vocabulary) corpus And a relatively transparent recognition engine utilizing the following variety of articulatory-based features: manner and place of articulation, voicing, vowel height, lip-rounding, spectral dynamics, segment length A Scientific Approach to Speech Recognition
55
Ascertain the contribution of …. (1) phonetic segment (and feature) classification (2) phonetic segmentation (3) stress accent, and (4) syllable position to ASR performance Using the OGI Numbers95 Corpus as a controlled (limited vocabulary) corpus And a relatively transparent recognition engine utilizing the following variety of articulatory-based features: manner and place of articulation, voicing, vowel height, lip-rounding, spectral dynamics, segment length That are explicitly tied to syllable position (i.e., onset, nucleus and coda) and stress-accent level A Scientific Approach to Speech Recognition
56
Ascertain the contribution of …. (1) phonetic segment (and feature) classification (2) phonetic segmentation (3) stress accent, and (4) syllable position to ASR performance Using the OGI Numbers95 Corpus as a controlled (limited vocabulary) corpus And a relatively transparent recognition engine utilizing the following variety of articulatory-based features: manner and place of articulation, voicing, vowel height, lip-rounding, spectral dynamics, segment length That are explicitly tied to syllable position (i.e., onset, nucleus and coda) and stress-accent level We will be comparing the “baseline” system (entirely automatic recognition) with an entirely “fabricated” set of input data (derived from hand-labeled phonetic annotation + autoSAL) as well as a “half-way house” system that is partially automatic and partially not (manually derived phonetic segmentation, as well as whether each segment is vocalic or not) A Scientific Approach to Speech Recognition
57
Entirely Stress-Accent Dependent Results Word Error Rate: Fabricated 1.3%, Half-way House 2.0%, Baseline 5.6%. Numbers95 Recognition – Stress Accent Impact
58
Entirely Stress-Accent Dependent Results Word Error Rate: Fabricated 1.3%, Half-way House 2.0%, Baseline 5.6%. The half-way house system is much closer in performance to the fabricated data version than to the baseline system, suggesting that …. Numbers95 Recognition – Stress Accent Impact
59
Entirely Stress-Accent Dependent Results Word Error Rate: Fabricated 1.3%, Half-way House 2.0%, Baseline 5.6%. The half-way house system is much closer in performance to the fabricated data version than to the baseline system, suggesting that …. Accurate phonetic segmentation is extremely important for enhanced ASR performance, as is knowledge of the location of the syllabic nucleus Numbers95 Recognition – Stress Accent Impact
60
Entirely Stress-Accent Dependent Results Word Error Rate: Fabricated 1.3%, Half-way House 2.0%, Baseline 5.6%. The half-way house system is much closer in performance to the fabricated data version than to the baseline system, suggesting that …. Accurate phonetic segmentation is extremely important for enhanced ASR performance, as is knowledge of the location of the syllabic nucleus Stress-accent information is most important for the vocalic nucleus – without it WER increases by 10-20% Numbers95 Recognition – Stress Accent Impact
61
Entirely Stress-Accent Dependent Results Word Error Rate: Fabricated 1.3%, Half-way House 2.0%, Baseline 5.6%. The half-way house system is much closer in performance to the fabricated data version than to the baseline system, suggesting that …. Accurate phonetic segmentation is extremely important for enhanced ASR performance, as is knowledge of the location of the syllabic nucleus Stress-accent information is most important for the vocalic nucleus – without it WER increases by 10-20% It is also important for the coda – WER increases by 7-15% Numbers95 Recognition – Stress Accent Impact
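The claim that the half-way house system sits much closer to the fabricated-data system than to the baseline can be checked with simple arithmetic on the three word error rates quoted above. The sketch below is illustrative; the function name and dictionary keys are hypothetical, and only the three WER values come from the slide.

```python
# Word error rates (%) as quoted on the slide.
wer = {"fabricated": 1.3, "half_way_house": 2.0, "baseline": 5.6}

def gap_closed(rates):
    """Fraction of the baseline-to-fabricated WER gap recovered by the half-way house system."""
    total_gap = rates["baseline"] - rates["fabricated"]
    recovered = rates["baseline"] - rates["half_way_house"]
    return recovered / total_gap

fraction = gap_closed(wer)  # (5.6 - 2.0) / (5.6 - 1.3)
```

On these figures, manual segmentation plus a vocalic/non-vocalic decision recovers roughly 84% of the gap between the automatic baseline and the fully hand-labeled input.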
62
Effect of pronunciation variation as a function of syllable position, where the “canonical” pronunciation is potentially fixed for each syllable position separately (or “All” together). “Standard” refers to the regular recognition system.
Word Error Rate (%)
                 Standard  Onset  Nucleus  Coda   All
Fabricated         1.29    1.33    1.61    1.63   1.76
Half-way House     1.97    2.16    2.21    2.55   2.81
Baseline           5.59    5.91    5.91    6.70   7.03
Numbers95 Recognition – Pronunciation Impact
63
Effect of pronunciation variation as a function of syllable position, where the “canonical” pronunciation is potentially fixed for each syllable position separately (or “All” together). “Standard” refers to the regular recognition system.
Word Error Rate (%)
                 Standard  Onset  Nucleus  Coda   All
Fabricated         1.29    1.33    1.61    1.63   1.76
Half-way House     1.97    2.16    2.21    2.55   2.81
Baseline           5.59    5.91    5.91    6.70   7.03
Conclusions: Onset segments are most canonical
Numbers95 Recognition – Pronunciation Impact
64
Effect of pronunciation variation as a function of syllable position, where the “canonical” pronunciation is potentially fixed for each syllable position separately (or “All” together). “Standard” refers to the regular recognition system.
Word Error Rate (%)
                 Standard  Onset  Nucleus  Coda   All
Fabricated         1.29    1.33    1.61    1.63   1.76
Half-way House     1.97    2.16    2.21    2.55   2.81
Baseline           5.59    5.91    5.91    6.70   7.03
Conclusions: Onset segments are most canonical
Coda segments are least canonical
Numbers95 Recognition – Pronunciation Impact
65
Effect of pronunciation variation as a function of syllable position, where the “canonical” pronunciation is potentially fixed for each syllable position separately (or “All” together). “Standard” refers to the regular recognition system.
Word Error Rate (%)
                 Standard  Onset  Nucleus  Coda   All
Fabricated         1.29    1.33    1.61    1.63   1.76
Half-way House     1.97    2.16    2.21    2.55   2.81
Baseline           5.59    5.91    5.91    6.70   7.03
Conclusions: Onset segments are most canonical
Coda segments are least canonical
Therefore, it is important to provide for pronunciation variation in ASR systems
Numbers95 Recognition – Pronunciation Impact
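The “most canonical / least canonical” conclusions follow from how much the word error rate rises, relative to the Standard system, when the pronunciation is fixed at each syllable position. A small sketch of that comparison is given below; the WER values are those on the slide, while the names WER_FIXED and relative_increase are hypothetical.

```python
# WER (%) from the slide: canonical pronunciation fixed per syllable position.
WER_FIXED = {
    "Fabricated":     {"Standard": 1.29, "Onset": 1.33, "Nucleus": 1.61, "Coda": 1.63, "All": 1.76},
    "Half-way House": {"Standard": 1.97, "Onset": 2.16, "Nucleus": 2.21, "Coda": 2.55, "All": 2.81},
    "Baseline":       {"Standard": 5.59, "Onset": 5.91, "Nucleus": 5.91, "Coda": 6.70, "All": 7.03},
}

def relative_increase(row):
    """Percentage rise in WER over the Standard condition for each fixed position."""
    base = row["Standard"]
    return {cond: 100.0 * (value - base) / base
            for cond, value in row.items() if cond != "Standard"}

increases = {system: relative_increase(row) for system, row in WER_FIXED.items()}
```

For every system the rise is smallest when only onsets are held canonical and larger when codas are, which is the pattern the conclusions summarize.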
66
Effect of pronunciation variation as a function of syllable position, where each syllabic constituent is “neutralized” with respect to lexical matching (i.e., each element is factored out of the decoding process separately). “Standard” refers to the regular recognition system.
Word Error Rate (%)
                 Standard  Onset  Nucleus  Coda
Fabricated         1.29     9.70    5.95    3.92
Half-way House     1.97    11.27   13.28    6.60
Baseline           5.59    15.70   20.22   10.13
Numbers95 – Syllable Position Importance
67
Effect of pronunciation variation as a function of syllable position, where each syllabic constituent is “neutralized” with respect to lexical matching (i.e., each element is factored out of the decoding process separately). “Standard” refers to the regular recognition system.
Word Error Rate (%)
                 Standard  Onset  Nucleus  Coda
Fabricated         1.29     9.70    5.95    3.92
Half-way House     1.97    11.27   13.28    6.60
Baseline           5.59    15.70   20.22   10.13
Neutralizing the onset or the nucleus has a greater impact on ASR performance than neutralizing the coda
Numbers95 – Syllable Position Importance
68
Effect of pronunciation variation as a function of syllable position, where each syllabic constituent is “neutralized” with respect to lexical matching (i.e., each element is factored out of the decoding process separately). “Standard” refers to the regular recognition system.
Word Error Rate (%)
                 Standard  Onset  Nucleus  Coda
Fabricated         1.29     9.70    5.95    3.92
Half-way House     1.97    11.27   13.28    6.60
Baseline           5.59    15.70   20.22   10.13
Neutralizing the onset or the nucleus has a greater impact on ASR performance than neutralizing the coda
Conclusion: Onsets and nuclei are most important for lexical access in an ASR system (at least for the Numbers95 corpus)
Numbers95 – Syllable Position Importance
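The ordering behind this conclusion can be read off by ranking the syllable positions by how far the WER rises above Standard when each is neutralized. The following sketch does that; the WER values are those on the slide, and the names WER_NEUTRALIZED and rank_positions are hypothetical.

```python
# WER (%) from the slide: one syllabic constituent neutralized at a time.
WER_NEUTRALIZED = {
    "Fabricated":     {"Standard": 1.29, "Onset": 9.70,  "Nucleus": 5.95,  "Coda": 3.92},
    "Half-way House": {"Standard": 1.97, "Onset": 11.27, "Nucleus": 13.28, "Coda": 6.60},
    "Baseline":       {"Standard": 5.59, "Onset": 15.70, "Nucleus": 20.22, "Coda": 10.13},
}

def rank_positions(row):
    """Order positions from most to least damaging when neutralized."""
    positions = [name for name in row if name != "Standard"]
    return sorted(positions, key=lambda name: row[name] - row["Standard"], reverse=True)

rankings = {system: rank_positions(row) for system, row in WER_NEUTRALIZED.items()}
```

The coda ranks last for all three systems; whether the onset or the nucleus ranks first depends on the system, so the two are best read together as the more important constituents.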
69
PART FOUR Being Phonetically and Prosodically Annotated
70
Phonetic Transcription of Spontaneous English Telephone dialogues of 5-10 minutes duration, from the SWITCHBOARD corpus, have been phonetically annotated (labeled and segmented)
71
Phonetic Transcription of Spontaneous English Telephone dialogues of 5-10 minutes duration, from the SWITCHBOARD corpus, have been phonetically transcribed (labeled and segmented) Most of this material has been annotated manually
72
Phonetic Transcription of Spontaneous English Telephone Dialogues of 5-10 minutes duration, from the SWITCHBOARD corpus, have been phonetically transcribed (labeled and segmented) Most of this material has been annotated manually 4 hours labeled at the phone level and segmented at the syllabic level
73
Phonetic Transcription of Spontaneous English Telephone Dialogues of 5-10 minutes duration, from the SWITCHBOARD corpus, have been phonetically transcribed (labeled and segmented) Most of this material has been annotated manually 4 hours labeled at the phone level and segmented at the syllabic level 1 hour labeled and segmented at the phonetic-segment level
74
Phonetic Transcription of Spontaneous English Telephone Dialogues of 5-10 minutes duration, from the SWITCHBOARD corpus, have been phonetically transcribed (labeled and segmented) Most of this material has been annotated manually 4 hours labeled at the phone level and segmented at the syllabic level 1 hour labeled and segmented at the phonetic-segment level The remaining material segmented at the phonetic-segment level using automatic methods
75
Phonetic Transcription of Spontaneous English Telephone Dialogues of 5-10 minutes duration, from the SWITCHBOARD corpus, have been phonetically transcribed (labeled and segmented) Most of this material has been annotated manually 4 hours labeled at the phone level and segmented at the syllabic level 1 hour labeled and segmented at the phonetic-segment level The remaining material segmented at the phonetic-segment level using automatic methods 45 minutes of hand-labeled stress-accent material
76
Phonetic Transcription of Spontaneous English Telephone Dialogues of 5-10 minutes duration, from the SWITCHBOARD corpus, have been phonetically transcribed (labeled and segmented) Most of this material has been annotated manually 4 hours labeled at the phone level and segmented at the syllabic level 1 hour labeled and segmented at the phonetic-segment level The remaining material segmented at the phonetic-segment level using automatic methods 45 minutes of hand-labeled stress-accent material An additional four hours of stress-accent material automatically labeled (though unused in the current analysis)
77
Phonetic Transcription of Spontaneous English Telephone Dialogues of 5-10 minutes duration, from the SWITCHBOARD corpus, have been phonetically transcribed (labeled and segmented) Most of this material has been annotated manually 4 hours labeled at the phone level and segmented at the syllabic level 1 hour labeled and segmented at the phonetic-segment level The remaining material segmented at the phonetic-segment level using automatic methods 45 minutes of hand-labeled stress-accent material An additional four hours of stress-accent material automatically labeled (though unused in the current analysis) There is a Lot of Diversity in the Material Transcribed
78
Phonetic Transcription of Spontaneous English Telephone Dialogues of 5-10 minutes duration, from the SWITCHBOARD corpus, have been phonetically transcribed (labeled and segmented) Most of this material has been annotated manually 4 hours labeled at the phone level and segmented at the syllabic level 1 hour labeled and segmented at the phonetic-segment level The remaining material segmented at the phonetic-segment level using automatic methods 45 minutes of hand-labeled stress-accent material An additional four hours of stress-accent material automatically labeled (though unused in the current analysis) There is a Lot of Diversity in the Material Transcribed Spans speech of both genders (ca. 50/50%), reflecting a wide range of American dialectal variation, speaking rate and voice quality
79
Phonetic Transcription of Spontaneous English Telephone Dialogues of 5-10 minutes duration, from the SWITCHBOARD corpus, have been phonetically transcribed (labeled and segmented) Most of this material has been annotated manually 4 hours labeled at the phone level and segmented at the syllabic level 1 hour labeled and segmented at the phonetic-segment level The remaining material segmented at the phonetic-segment level using automatic methods 45 minutes of hand-labeled stress-accent material An additional four hours of stress-accent material automatically labeled (though unused in the current analysis) There is a Lot of Diversity in the Material Transcribed Spans speech of both genders (ca. 50/50%), reflecting a wide range of American dialectal variation, speaking rate and voice quality Transcription System A variant of Arpabet, with phonetic diacritics such as: _gl, _cr, _fr, _n, _vl, _vd
80
Phonetic Transcription of Spontaneous English The Data are Available at ….
81
Phonetic Transcription of Spontaneous English The Data are Available at …. http://www.icsi.berkeley.edu/real/stp
82
Annotation of Stress Accent Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent
83
Annotation of Stress Accent Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent Three levels of accent were distinguished:
84
Annotation of Stress Accent Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent Three levels of accent were distinguished: Heavy
85
Annotation of Stress Accent Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent Three levels of accent were distinguished: Heavy, Light
86
Annotation of Stress Accent Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent Three levels of accent were distinguished: Heavy, Light, None
88
Annotation of Stress Accent Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent Three levels of accent were distinguished: Heavy, Light, None (In actuality, labelers assigned a “1” to fully accented syllables, a “null” to completely unaccented syllables, and a “0.5” to all others)
89
Annotation of Stress Accent Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent Three levels of accent were distinguished: Heavy, Light, None (In actuality, labelers assigned a “1” to fully accented syllables, a “null” to completely unaccented syllables, and a “0.5” to all others) An example of the annotation (attached to the vocalic nucleus) is shown below (where the accent levels could not be derived from a dictionary)
90
Annotation of Stress Accent Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent Three levels of accent were distinguished: Heavy, Light, None (In actuality, labelers assigned a “1” to fully accented syllables, a “null” to completely unaccented syllables, and a “0.5” to all others) An example of the annotation (attached to the vocalic nucleus) is shown below (where the accent levels could not be derived from a dictionary) In this example most of the syllables are unaccented, with two labeled as lightly accented (0.5)
91
Annotation of Stress Accent Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent Three levels of accent were distinguished: Heavy, Light, None (In actuality, labelers assigned a “1” to fully accented syllables, a “null” to completely unaccented syllables, and a “0.5” to all others) An example of the annotation (attached to the vocalic nucleus) is shown below (where the accent levels could not be derived from a dictionary) In this example most of the syllables are unaccented, with two labeled as lightly accented (0.5) (and one other labeled as very lightly accented (0.25))
92
The data are available at …. Annotation of Stress Accent
93
The data are available at …. http://www.icsi.berkeley.edu/~steveng/prosody Annotation of Stress Accent
94
Automatic Labeling of Stress Accent This forty-five minutes of hand-labeled phonetic and prosodic annotation from the Switchboard corpus was used as training data for development of an Automatic Stress Accent Labeling System (AutoSAL)
95
How Good is AutoSAL? There is a 79% concordance between human and machine accent labels when the tolerance level is a quarter-step
96
How Good is AutoSAL? There is a 79% concordance between human and machine accent labels when the tolerance level is a quarter-step There is 97.5% concordance when the tolerance level is half a step
97
How Good is AutoSAL? There is a 79% concordance between human and machine accent labels when the tolerance level is a quarter-step There is 97.5% concordance when the tolerance level is half a step This degree of concordance is as high as that exhibited by two highly trained (human) transcribers
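Concordance within a tolerance can be made concrete as follows. This is a sketch under stated assumptions, not the AutoSAL evaluation code: accent labels are taken to lie on a 0 to 1 scale with “null” stored as 0.0, a quarter-step is taken to mean 0.25 and a half-step 0.5, and the function name and toy labels are invented.

```python
def concordance(human, machine, tolerance):
    """Fraction of syllables whose human and machine accent labels differ by at most `tolerance`."""
    if len(human) != len(machine) or not human:
        raise ValueError("label sequences must be non-empty and of equal length")
    # A tiny epsilon guards against floating-point rounding at the boundary.
    agree = sum(1 for h, m in zip(human, machine) if abs(h - m) <= tolerance + 1e-9)
    return agree / len(human)

# Toy labels, invented for illustration (null accent stored as 0.0).
human_labels = [1.0, 0.5, 0.0, 0.5, 0.0]
machine_labels = [0.75, 0.5, 0.5, 0.0, 0.0]

quarter_step = concordance(human_labels, machine_labels, 0.25)
half_step = concordance(human_labels, machine_labels, 0.5)
```

Widening the tolerance can only raise the score, which is why the half-step figure on the slide is higher than the quarter-step figure.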
98
PART FIVE Stress Accent and Syllable Position
99
The Importance of Syllable Structure Before going into the details of durational variation at the segmental level we briefly examine some general patterns of pronunciation variation that are conditioned by syllable position and stress accent
100
The Importance of Syllable Structure Before going into the details of durational variation at the segmental level we briefly examine some general patterns of pronunciation variation that are conditioned by syllable position and stress accent These data serve to illustrate the sort of variation observed that is conditioned by position within the syllable
101
Pronunciation Variation – Syllable and Accent Pronunciation variation is systematic at the level of the syllable [Figure: deletions, insertions and substitutions for all segments, shown across ONSET, NUCLEUS and CODA territory]
102
Pronunciation Variation – Syllable and Accent Pronunciation variation is systematic at the level of the syllable Particularly when stress accent is also taken into account [Figure: deletions, insertions and substitutions for all segments, shown across ONSET, NUCLEUS and CODA territory]
103
Pronunciation Variation – Syllable and Accent Pronunciation variation is systematic at the level of the syllable Particularly when stress accent is also taken into account BOTH syllable structure and accent level are required for a full accounting [Figure: deletions, insertions and substitutions for all segments, shown across ONSET, NUCLEUS and CODA territory]
104
PART SIX Durational Properties of Pronunciation Variation
105
Analysis of Durational Properties of Speech The following analyses are conditioned on stress accent level and (for the most part) syllable position
106
Analysis of Durational Properties of Speech The following analyses are conditioned on stress accent level and (for the most part) syllable position We’ll begin with analyses illustrating the patterns associated with three levels of stress accent (heavy, light and none) to show the graded nature of the durational properties pertaining to syllable and segment duration
107
Analysis of Durational Properties of Speech The following analyses are conditioned on stress accent level and (for the most part) syllable position We’ll begin with analyses illustrating the patterns associated with three levels of stress accent (heavy, light and none) to show the graded nature of the durational properties pertaining to syllable and segment duration However, for purposes of illustrative clarity, many of the slides will show only two levels of accent (heavy and none) in order to delineate the differences in duration associated with stress accent level
108
Analysis of Durational Properties of Speech The following analyses are conditioned on stress accent level and (for the most part) syllable position We’ll begin with analyses illustrating the patterns associated with three levels of stress accent (heavy, light and none) to show the graded nature of the durational properties pertaining to syllable and segment duration However, for purposes of illustrative clarity, many of the slides will show only two levels of accent (heavy and none) in order to delineate the differences in duration associated with stress accent level Under such conditions, the durational properties associated with light accent are generally intermediate between heavy accent and none
109
Syllable Duration - Across Syllable Forms There is a broad range of syllable structures observed in spoken English
110
Syllable Duration - Across Syllable Forms There is a broad range of syllable structures observed in spoken English The CV and CVC forms cover ca. 60% of the syllables V = Vowel C = Consonant
111
Syllable Duration - Across Syllable Forms There is a broad range of syllable structures observed in spoken English The CV and CVC forms cover ca. 60% of the syllables Together, the V, VC, CV and CVC forms account for 85% of syllables V = Vowel C = Consonant
112
Syllable Duration - Across Syllable Forms There is a broad range of syllable structures observed in spoken English The CV and CVC forms cover ca. 60% of the syllables Together, the V, VC, CV and CVC forms account for 85% of syllables The CVCC and CCVC (complex syllable) forms account for another 10% V = Vowel C = Consonant
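The coverage figures above come from tallying syllables by their consonant/vowel shape. A minimal sketch of that tally follows, assuming syllables are given as lists of ARPAbet-style phone symbols. The vowel inventory and the toy syllables are assumptions for illustration and are not the corpus used in the slides.

```python
from collections import Counter

# Assumed ARPAbet-style vowel inventory; anything else is treated as a consonant.
VOWELS = {"iy", "ih", "ey", "eh", "ae", "aa", "ao", "ow", "uh", "uw",
          "ah", "ax", "ay", "aw", "oy", "er"}


def syllable_form(phones):
    """Map a phone sequence to its C/V shape, e.g. ['k', 'ae', 't'] -> 'CVC'."""
    return "".join("V" if p in VOWELS else "C" for p in phones)


def form_shares(syllables):
    """Return the proportion of syllables falling under each C/V shape."""
    counts = Counter(syllable_form(s) for s in syllables)
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}


# Toy syllables, invented for the example.
toy = [["k", "ae", "t"], ["dh", "ax"], ["ay"], ["ih", "n"],
       ["s", "t", "aa", "p"], ["g", "ow"]]
shares = form_shares(toy)
print(shares)
```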
113
Syllable Duration - Across Syllable Forms It is unsurprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below) Canonical Syllable Forms V = Vowel C = Consonant
114
Syllable Duration - Across Syllable Forms It is unsurprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below) Note the systematic lengthening of the syllable for each form as the accent level increases from “NONE” to “LIGHT” to “HEAVY” Canonical Syllable Forms V = Vowel C = Consonant
115
Syllable Duration - Across Syllable Forms It is unsurprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below) Note the systematic lengthening of the syllable for each form as the accent level increases from “NONE” to “LIGHT” to “HEAVY” This pattern is representative of accent’s impact on duration Canonical Syllable Forms V = Vowel C = Consonant
116
Syllable Duration - Across Syllable Forms It is unsurprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below) Note the systematic lengthening of the syllable for each form as the accent level increases from “NONE” to “LIGHT” to “HEAVY” This pattern is representative of accent’s impact on duration (as we’ll see) Canonical Syllable Forms V = Vowel C = Consonant
117
Syllable Duration - Accent Level/Syllable Form Canonical Syllable Forms This graph shows the same data as the previous slides, but from the perspective of just two accent levels (“HEAVY” and “NONE”) V = Vowel C = Consonant
118
Syllable Duration - Accent Level/Syllable Form Canonical Syllable Forms This graph shows the same data as the previous slides, but from the perspective of just two accent levels (“HEAVY” and “NONE”) The heavily accented syllables are generally 60-100% longer than their unaccented counterparts V = Vowel C = Consonant
119
Syllable Duration - Accent Level/Syllable Form Canonical Syllable Forms This graph shows the same data as the previous slides, but from the perspective of just two accent levels (“HEAVY” and “NONE”) The heavily accented syllables are generally 60-100% longer than their unaccented counterparts The disparity in duration is most pronounced for syllable forms with one or no consonants (i.e., V, VC, CV) V = Vowel C = Consonant
120
Syllable Duration - Accent Level/Syllable Form Canonical Syllable Forms This graph shows the same data as the previous slides, but from the perspective of just two accent levels (“HEAVY” and “NONE”) The heavily accented syllables are generally 60-100% longer than their unaccented counterparts The disparity in duration is most pronounced for syllable forms with one or no consonants (i.e., V, VC, CV) This pattern implies that accent has the greatest impact on vocalic duration V = Vowel C = Consonant
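The "60-100% longer" comparison is a relative lengthening of the heavily accented mean over the unaccented mean. A minimal sketch of the arithmetic follows. The durations in the dictionary are made-up placeholder values, not measurements from the slides, and the function name is hypothetical.

```python
def percent_longer(heavy_ms: float, none_ms: float) -> float:
    """Relative lengthening of accented over unaccented duration, in percent."""
    if none_ms <= 0:
        raise ValueError("unaccented duration must be positive")
    return 100.0 * (heavy_ms - none_ms) / none_ms


# Placeholder mean durations in ms as (heavy, none); NOT values from the slides.
toy_means = {"V": (160.0, 80.0), "CV": (250.0, 140.0), "CVC": (330.0, 200.0)}

lengthening = {form: percent_longer(h, n) for form, (h, n) in toy_means.items()}
for form, pct in lengthening.items():
    print(f"{form}: heavy is {pct:.0f}% longer than none")
```

A value of 100% means the accented syllable is twice as long as its unaccented counterpart.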
121
Canonical Syllable Forms Nucleus Duration - Accent Level/Syllable Form The hypothesis delineated on the previous slide (that accent has the most profound impact on vocalic duration) is confirmed in the graph below
122
Canonical Syllable Forms Nucleus Duration - Accent Level/Syllable Form The hypothesis delineated on the previous slide (that accent has the most profound impact on vocalic duration) is confirmed in the graph below Vowels in accented syllables (of all forms) are at least twice as long as their unaccented counterparts
123
Canonical Syllable Forms Nucleus Duration - Accent Level/Syllable Form The hypothesis delineated on the previous slide (that accent has the most profound impact on vocalic duration) is confirmed in the graph below Vowels in accented syllables (of all forms) are at least twice as long as their unaccented counterparts This pattern implies that the syllable nucleus absorbs a major component of accent’s impact (at least as far as duration is concerned)
124
PART SEVEN Stress Accent and the Vocalic Nucleus
125
Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable form it is likely that the specific structure of the syllable has relatively little impact on vocalic duration Stress Accent’s Impact on the Vocalic Nucleus
126
Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable form it is likely that the specific structure of the syllable has relatively little impact on vocalic duration As a consequence, the remaining analyses pertaining to accent’s impact on vocalic duration collapse the data across syllable form Stress Accent’s Impact on the Vocalic Nucleus
127
Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable form it is likely that the specific structure of the syllable has relatively little impact on vocalic duration As a consequence, the remaining analyses pertaining to accent’s impact on vocalic duration collapse the data across syllable form We now examine vocalic duration in somewhat greater detail and illustrate how duration, stress accent and vocalic identity interact Stress Accent’s Impact on the Vocalic Nucleus
128
The Spatial Patterning of Duration in Vocalic Nuclei
129
Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue A Brief Primer on Vocalic Acoustics
130
Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance A Brief Primer on Vocalic Acoustics
131
Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance The height parameter is closely linked to the frequency of F1 A Brief Primer on Vocalic Acoustics
132
Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance The height parameter is closely linked to the frequency of F1 In the classic vowel “triangle,” segments are positioned in terms of the tongue positions associated with their production, as follows: A Brief Primer on Vocalic Acoustics
133
Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance The height parameter is closely linked to the frequency of F1 In the classic vowel “triangle,” segments are positioned in terms of the tongue positions associated with their production, as follows: A Brief Primer on Vocalic Acoustics
134
In the following slides duration is plotted on a 2-D grid, where the x-axis represents the (hypothetical) front-back tongue position Spatial Patterning of Duration et al.
135
In the following slides duration is plotted on a 2-D grid, where the x-axis represents the (hypothetical) front-back tongue position (and hence remains a constant throughout the plots to follow) Spatial Patterning of Duration et al.
136
In the following slides duration is plotted on a 2-D grid, where the x-axis represents the (hypothetical) front-back tongue position (and hence remains a constant throughout the plots to follow) The y-axis serves as the dependent measure, expressed in terms of either duration or the proportion of fully stressed (or unstressed) nuclei Spatial Patterning of Duration et al.
137
Vocalic Duration and Vowel Height The spatial patterning of vocalic segments is systematic with respect to duration
138
Vocalic Duration and Vowel Height The spatial patterning of vocalic segments is systematic with respect to duration Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels
139
Vocalic Duration and Vowel Height All nuclei Diphthongs Monophthongs The spatial patterning of vocalic segments is systematic with respect to duration Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels
140
Vocalic Duration and Vowel Height All nuclei Diphthongs Monophthongs The spatial patterning of vocalic segments is systematic with respect to duration Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels Thus, duration appears to be highly correlated with vowel height
141
Vocalic Duration and Vowel Height All nuclei Diphthongs Monophthongs The spatial patterning of vocalic segments is systematic with respect to duration Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels Thus, duration appears to be highly correlated with vowel height But … the situation is a little more complicated than first appearances would suggest
142
Durational Differences - Stressed/Unstressed There is a large dynamic range in duration between accented and unaccented vocalic nuclei Canonical Syllable Forms
143
Durational Differences - Stressed/Unstressed There is a large dynamic range in duration between accented and unaccented vocalic nuclei Moreover, diphthongs and tense, low monophthongs tend to exhibit a larger dynamic range than the lax monophthongs Canonical Syllable Forms
144
Durational Differences - Stressed/Unstressed There is a large dynamic range in duration between accented and unaccented vocalic nuclei Moreover, diphthongs and tense, low monophthongs tend to exhibit a larger dynamic range than the lax monophthongs Canonical Syllable Forms Lax monophthongs
145
Vocalic Identity Among Unstressed Nuclei The high, lax monophthongs are almost always unstressed
146
Vocalic Identity Among Unstressed Nuclei The high, lax monophthongs are almost always unstressed The low vowels, be they monophthongs or diphthongs, are rarely unstressed
147
Vocalic Identity Among Unstressed Nuclei The high, lax monophthongs are almost always unstressed The low vowels, be they monophthongs or diphthongs, are rarely unstressed The high diphthongs and high/mid, tense monophthongs occupy an intermediate position
148
The high vowels are rarely fully stressed Vocalic Identity Among Fully Stressed Nuclei
149
The high vowels are rarely fully stressed The low vowels, be they monophthongs or diphthongs, are far more likely to be fully stressed Vocalic Identity Among Fully Stressed Nuclei
150
The high vowels are rarely fully stressed The low vowels, be they monophthongs or diphthongs, are far more likely to be fully stressed An intermediate degree of stress accounts for the other vocalic instances Vocalic Identity Among Fully Stressed Nuclei
151
The high vowels are rarely fully stressed The low vowels, be they monophthongs or diphthongs, are far more likely to be fully stressed An intermediate degree of stress accounts for the other vocalic instances (but will not be addressed here) Vocalic Identity Among Fully Stressed Nuclei
152
The vowels of heavily accented syllables are (mostly) pronounced canonically Canonical Pronunciations Non-Canonical Pronunciations Vocalic Variation – Importance of Stress Accent
153
The vowels of heavily accented syllables are (mostly) pronounced canonically Low vowels are largely the province of accented syllables Canonical Pronunciations Non-Canonical Pronunciations Vocalic Variation – Importance of Stress Accent
154
The vowels of heavily accented syllables are (mostly) pronounced canonically Low vowels are largely the province of accented syllables, and High vowels the province of unaccented syllables Vocalic Variation – Importance of Stress Accent Canonical Pronunciations Non-Canonical Pronunciations
155
The vowels of heavily accented syllables are (mostly) pronounced canonically Low vowels are largely the province of accented syllables, and High vowels the province of unaccented syllables Moreover, there’s a lexical bias towards high vowels for unaccented forms Canonical Pronunciations Non-Canonical Pronunciations Vocalic Variation – Importance of Stress Accent
156
The vowels of heavily accented syllables are (mostly) pronounced canonically Low vowels are largely the province of accented syllables, and High vowels the province of unaccented syllables Moreover, there’s a lexical bias towards high vowels for unaccented forms That’s reinforced in patterns of deviation from canonical pronunciation Canonical Pronunciations Non-Canonical Pronunciations Vocalic Variation – Importance of Stress Accent
157
Vocalic Height Deviation from Canonical Amount of Change Direction of Change Vowels are more likely to RISE in height than to descend when unaccented
158
Vocalic Height Deviation from Canonical Amount of Change Direction of Change Vowels are more likely to RISE in height than to descend when unaccented Vocalic lowering of height is rare
159
Vocalic Height Deviation from Canonical Amount of Change Direction of Change Vowels are more likely to RISE in height than to descend when unaccented Vocalic lowering of height is rare Most deviations from the canonical maintain vowel height
160
Vocalic Height Deviation from Canonical Amount of Change Direction of Change Vowels are more likely to RISE in height than to descend when unaccented Vocalic lowering of height is rare Most deviations from the canonical maintain vowel height More than a single height step deviation is uncommon
161
Vocalic Height Deviation from Canonical Amount of Change Direction of Change Vowels are more likely to RISE in height than to descend when unaccented Vocalic lowering of height is rare Most deviations from the canonical maintain vowel height More than a single height step deviation is uncommon Virtually all 2-step height deviations occur in unaccented syllables
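The "amount" and "direction" of height change can be tallied by comparing the height class of the canonical vowel with that of the transcribed vowel. The sketch below assumes a three-level height scale (low = 0, mid = 1, high = 2) and an ARPAbet-style assignment of vowels to those levels. Both are illustrative assumptions and may differ from the classification used for the slides. The vowel pairs are invented.

```python
from collections import Counter

# Assumed three-level height classes: 0 = low, 1 = mid, 2 = high.
HEIGHT = {
    "iy": 2, "ih": 2, "ix": 2, "uw": 2, "uh": 2,
    "ey": 1, "eh": 1, "ax": 1, "ah": 1, "ow": 1,
    "ae": 0, "aa": 0, "ao": 0, "ay": 0, "aw": 0,
}


def height_change(canonical: str, transcribed: str) -> int:
    """Signed height steps from canonical to transcribed (positive = raised)."""
    return HEIGHT[transcribed] - HEIGHT[canonical]


def summarize(pairs):
    """Count (direction, number of steps) over (canonical, transcribed) pairs."""
    counts = Counter()
    for can, trans in pairs:
        step = height_change(can, trans)
        direction = "raised" if step > 0 else "lowered" if step < 0 else "same"
        counts[(direction, abs(step))] += 1
    return counts


# Invented (canonical, transcribed) vowel pairs.
pairs = [("ae", "eh"), ("eh", "ih"), ("aa", "ix"),
         ("iy", "iy"), ("ih", "ax"), ("ah", "ax")]
summary = summarize(pairs)
print(summary)
```

Note that a substitution such as [ah] to [ax] counts as "same" here: it deviates from the canonical form while maintaining vowel height.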
162
The Vowel Space Under (Full) Stress (Accent) In heavily accented nuclei there is a relatively even distribution of segments across the vowel space, with a slight bias towards the front and central vowels Canonical Vowels Only
163
In unaccented syllables vowels are confined largely to the high-front and high-central sectors of the articulatory space The Vowel Space Without (Stress) Accent Canonical Vowels Only
164
In unaccented syllables vowels are confined largely to the high-front and high-central sectors of the articulatory space The low and mid vowels “get creamed” The Vowel Space Without (Stress) Accent Canonical Vowels Only
165
Stress accent exerts a profound effect on the character of the vowel space The Vowel Spaces Compared Heavily Accented Unaccented Canonical Vowels Only
166
Stress accent exerts a profound effect on the character of the vowel space High vowels are largely associated with unaccented syllables The Vowel Spaces Compared Heavily Accented Unaccented Canonical Vowels Only
167
Stress accent exerts a profound effect on the character of the vowel space High vowels are largely associated with unaccented syllables Low vowels are mostly associated with accented forms The Vowel Spaces Compared Heavily Accented Unaccented Canonical Vowels Only
168
Stress accent exerts a profound effect on the character of the vowel space High vowels are largely associated with unaccented syllables Low vowels are mostly associated with accented forms This distinction between accented and unaccented syllables is of profound importance for understanding (and modeling) pronunciation variation The Vowel Spaces Compared Heavily Accented Unaccented Canonical Vowels Only
169
Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse Is It Stress? Vocalic Identity? Or What?
170
Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) Is It Stress? Vocalic Identity? Or What?
171
Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Is It Stress? Vocalic Identity? Or What?
172
Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Low vowels tend to be much longer in duration than high vowels Is It Stress? Vocalic Identity? Or What?
173
Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Low vowels tend to be much longer in duration than high vowels This is the case even for diphthongs Is It Stress? Vocalic Identity? Or What?
174
Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Low vowels tend to be much longer in duration than high vowels This is the case even for diphthongs Low vowels are rarely without some measure of stress accent Is It Stress? Vocalic Identity? Or What?
175
Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Low vowels tend to be much longer in duration than high vowels This is the case even for diphthongs Low vowels are rarely without some measure of stress accent This is true for monophthongs as well as diphthongs Is It Stress? Vocalic Identity? Or What?
176
Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Low vowels tend to be much longer in duration than high vowels This is the case even for diphthongs Low vowels are rarely without some measure of stress accent This is true for monophthongs as well as diphthongs High vowels are RARELY fully stressed Is It Stress? Vocalic Identity? Or What?
177
Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Low vowels tend to be much longer in duration than high vowels This is the case even for diphthongs Low vowels are rarely without some measure of stress accent This is true for monophthongs as well as diphthongs High vowels are RARELY fully stressed This is particularly so for monophthongs, but also applies to diphthongs Is It Stress? Vocalic Identity? Or What?
178
Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Low vowels tend to be much longer in duration than high vowels This is the case even for diphthongs Low vowels are rarely without some measure of stress accent This is true for monophthongs as well as diphthongs High vowels are RARELY fully stressed This is particularly so for monophthongs, but also applies to diphthongs Thus, stress accent appears to be intricately involved with vocalic identity Is It Stress? Vocalic Identity? Or What?
179
PART EIGHT Stress Accent’s Impact on Syllable Onsets
180
Stress Accent and Syllable Onsets The onset is often cited as the key syllabic constituent with respect to “lexical access”
181
Stress Accent and Syllable Onsets The onset is often cited as the key syllabic constituent with respect to “lexical access” It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level
182
Stress Accent and Syllable Onsets The onset is often cited as the key syllabic constituent with respect to “lexical access” It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level Because of the onset’s key role in lexical access one might assume that its duration would be relatively stable across accent level
183
Stress Accent and Syllable Onsets The onset is often cited as the key syllabic constituent with respect to “lexical access” It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level Because of the onset’s key role in lexical access one might assume that its duration would be relatively stable across accent level The following slides suggest that this assumption is INCORRECT
184
Stress Accent and Syllable Onsets The onset is often cited as the key syllabic constituent with respect to “lexical access” It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level Because of the onset’s key role in lexical access one might assume that its duration would be relatively stable across accent level The following slides suggest that this assumption is INCORRECT, and that the structure of the onset is more complex (and more interesting) than initial intuition would suggest
185
Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents)
186
Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents) Onset duration is similar across syllable form
187
Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents) Onset duration is similar across syllable form (except that segments comprising complex onsets [i.e., CCVC] are slightly shorter)
188
Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents) Onset duration is similar across syllable form (except that segments comprising complex onsets [i.e., CCVC] are slightly shorter) The duration of unaccented onsets is similar across syllable forms
189
Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form Onsets of accented syllables are generally 50-60% longer than their unaccented counterparts
190
Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form Onsets of accented syllables are generally 50-60% longer than their unaccented counterparts Although this durational difference is not quite as large as observed for vocalic nuclei, it is still substantial (and mostly consistent across forms)
191
Place of Articulation – A Brief Primer The tongue contacts (or nearly so) the roof of the mouth in producing many of the consonantal sounds in English
Anterior: Labial [p] [b] [m]; Labio-dental [f] [v]; Inter-dental [th] [dh]
Central: Alveolar [t] [d] [n] [s] [z]
Posterior: Palatal [sh] [zh]; Velar [k] [g] [ng]
From Daniloff (1973)
192
Segmental Identity and Stress Accent It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not …)
193
Segmental Identity and Stress Accent It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not …) Usually, non-canonical realizations are manifest as segmental deletions
194
Segmental Identity and Stress Accent It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not …) Usually, non-canonical realizations are manifest as segmental deletions The pattern of segmental realization bears some correspondence to durational variation as a function of accent level
195
Segmental Identity and Stress Accent It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not …) Usually, non-canonical realizations are manifest as segmental deletions The pattern of segmental realization bears some correspondence to durational variation as a function of accent level But also exhibits some interesting differences
196
Segmental Identity and Stress Accent It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not …) Usually, non-canonical realizations are manifest as segmental deletions The pattern of segmental realization bears some correspondence to durational variation as a function of accent level But also exhibits some interesting differences (which are potentially significant for models of phonetic organization)
197
Segmental Identity and Stress Accent It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not …) Usually, non-canonical realizations are manifest as segmental deletions The pattern of segmental realization bears some correspondence to durational variation as a function of accent level But also exhibits some interesting differences (which are potentially significant for models of phonetic organization) Before we examine the segmental patterns in detail, a brief primer on the interpretation of these data is presented
198
Road Map - How to Interpret the Data Compare the numbers in the YELLOW and ORANGE columns Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
199
Road Map - How to Interpret the Data Compare the numbers in the YELLOW and ORANGE columns Most numbers in the YELLOW / ORANGE columns will be similar Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
200
Road Map - How to Interpret the Data Compare the numbers in the YELLOW and ORANGE columns Most numbers in the YELLOW / ORANGE columns will be similar Indicating that the phonetic realization of the segment is the canonical form Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
201
Road Map - How to Interpret the Data Compare the numbers in the YELLOW and ORANGE columns Most numbers in the YELLOW / ORANGE columns will be similar Indicating that the phonetic realization of the segment is the canonical form A large disparity between columns is marked with a blue box Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
202
Road Map - How to Interpret the Data Compare the numbers in the YELLOW and ORANGE columns Most numbers in the YELLOW / ORANGE columns will be similar Indicating that the phonetic realization of the segment is the canonical form A large disparity between columns is marked with a blue box READY? Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
203
Road Map - How to Interpret the Data Compare the numbers in the YELLOW and ORANGE columns Most numbers in the YELLOW / ORANGE columns will be similar Indicating that the phonetic realization of the segment is the canonical form A large disparity between columns is marked with a blue box READY? OK, Let’s go! Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
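The comparison described in the road map (canonical counts against transcribed counts, with large disparities marked) can be sketched as a simple screening step. The slides do not state what counts as a "large" disparity, so the 25% threshold below is a hypothetical choice, as are the function name and the toy counts.

```python
def flag_disparities(can, trans, min_ratio=0.25):
    """Return segments whose canonical and transcribed counts differ markedly.

    `can` and `trans` map segment labels to counts. A segment is flagged when
    the absolute difference is at least `min_ratio` of the larger count.
    The threshold is an assumption made for illustration.
    """
    flagged = {}
    for seg in sorted(set(can) | set(trans)):
        c, t = can.get(seg, 0), trans.get(seg, 0)
        base = max(c, t)
        if base and abs(c - t) / base >= min_ratio:
            flagged[seg] = (c, t)
    return flagged


# Toy counts, invented for the example (Can = canonical, Trans = transcribed).
toy_can = {"dh": 100, "m": 80, "dx": 0}
toy_trans = {"dh": 60, "m": 78, "dx": 25}

flagged = flag_disparities(toy_can, toy_trans)
print(flagged)
```

A transcribed count well below the canonical count suggests deletion or substitution of that segment, while a transcribed count well above it suggests insertion.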
204
Syllable Onset Statistics – ANTERIOR Place Stress accent has relatively little impact on anterior onset segments Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
205
Syllable Onset Statistics – ANTERIOR Place Stress accent has relatively little impact on anterior onset segments EXCEPT for [dh] and [y] Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
206
Syllable Onset Statistics – ANTERIOR Place Stress accent has relatively little impact on anterior onset segments EXCEPT for [dh] and [y] [dh] (as in “the” and “them”) tends to delete in unaccented syllables, as does [y] (although to a lesser extent) Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
207
Syllable Onset Statistics – CENTRAL Place Central segments tend to “disappear” under (absence of) stress (accent) Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
208
Syllable Onset Statistics – CENTRAL Place Central segments tend to “disappear” under (absence of) stress (accent) There is also a tendency for flaps ([dx] and [nx]) to insert under similar conditions Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
209
Syllable Onset Statistics – CENTRAL Place Central segments tend to “disappear” under (absence of) stress (accent) There is also a tendency for flaps ([dx] and [nx]) to insert under similar conditions In heavily accented syllables, central segments maintain their canonical identity Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
210
Syllable Onset Statistics – Posterior Place Posterior segments are remarkably stable in onset position Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
211
Syllable Onset Statistics – Posterior Place Posterior segments are remarkably stable in onset position The only significant “deviation” from canonical realization is the intrusion of the glottal stop [q], which lacks phonemic status in English Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
212
Syllable Onset Statistics – Place Chameleons “Chameleons” assimilate their place of articulation to the following vowel Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
213
Syllable Onset Statistics – Place Chameleons “Chameleons” assimilate their place of articulation to the following vowel They are relatively stable at syllable onset, except in unaccented forms Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
214
Syllable Onset Statistics – Place Chameleons “Chameleons” assimilate their place of articulation to the following vowel They are relatively stable at syllable onset, except in unaccented forms The reduced form of [l] is [lg], a glide-like element – it tends to assume the functional status of [l] in unaccented syllables Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
215
PART NINE Stress Accent’s Impact on Syllable Codas
216
Stress Accent and Syllable Codas Stress accent’s impact on syllable codas differs from that of onsets
217
Stress Accent and Syllable Codas Stress accent’s impact on syllable codas differs from that of onsets The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are omitted from consideration)
218
Stress Accent and Syllable Codas Stress accent’s impact on syllable codas differs from that of onsets The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are omitted from consideration) There is a far greater probability of segmental deletion in coda constituents
219
Stress Accent and Syllable Codas Stress accent’s impact on syllable codas differs from that of onsets The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are omitted from consideration) There is a far greater probability of segmental deletion in coda constituents Accent level exerts a powerful influence on segmental deletion and on segmental duration
220
Stress Accent and Syllable Codas Stress accent’s impact on syllable codas differs from that of onsets The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are omitted from consideration) There is a far greater probability of segmental deletion in coda constituents Accent level exerts a powerful influence on segmental deletion and on segmental duration To a certain degree segmental deletion and duration interact (or are flip sides of the same phonetic coin)
221
Coda Duration - Accent Level/Syllable Form Coda duration (on average) is similar across syllable structure, both for accented and unaccented forms Canonical Syllable Forms
222
Coda Duration - Accent Level/Syllable Form Coda duration (on average) is similar across syllable structure, both for accented and unaccented forms There is a relatively small dynamic range in duration between accented and unaccented codas (relative to onsets and nuclei) Canonical Syllable Forms
223
Coda Duration - Accent Level/Syllable Form Coda duration (on average) is similar across syllable structure, both for accented and unaccented forms There is a relatively small dynamic range in duration between accented and unaccented codas (relative to onsets and nuclei) Moreover, the duration of certain coda constituents is virtually identical in accented and unaccented syllables Canonical Syllable Forms
224
Syllable Coda Statistics – Anterior Place Anterior coda segments are relatively stable under stress (accent) Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
225
Syllable Coda Statistics – Anterior Place Anterior coda segments are relatively stable under stress (accent) The segments [m] and [v] are exceptions Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
226
Syllable Coda Statistics – Anterior Place Anterior coda segments are relatively stable under stress (accent) The segments [m] and [v] are exceptions – they often function as “flaps” in this context, and Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
227
Syllable Coda Statistics – Anterior Place Anterior coda segments are relatively stable under stress (accent) The segments [m] and [v] are exceptions – they often function as “flaps” in this context, and They tend to delete in unaccented syllables Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
228
Syllable Coda Statistics – Central Place Central coda segments are extremely unstable under stress (accent) Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
229
Syllable Coda Statistics – Central Place Central coda segments are extremely unstable under stress (accent) (except for the fricatives [s] and [z]) Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
230
Syllable Coda Statistics – Central Place Central coda segments are extremely unstable under stress (accent) (except for the fricatives [s] and [z]) The segments [t], [d] and [n] tend to delete in coda position, even in heavily accented syllables Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
231
Syllable Coda Statistics – Central Place Central coda segments are extremely unstable under stress (accent) (except for the fricatives [s] and [z]) The segments [t], [d] and [n] tend to delete in coda position, even in heavily accented syllables The major effect of stress accent is on the probability of segmental deletion (which is appreciably higher in unaccented forms) Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
232
Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties
233
Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties Many of the coda segments do not exhibit a difference in duration between accented and unaccented forms
234
Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties Many of the coda segments do not exhibit a difference in duration between accented and unaccented forms Many of the unaccented central codas are short in duration, in contrast to:
235
Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties Many of the coda segments do not exhibit a difference in duration between accented and unaccented forms Many of the unaccented central codas are short in duration, in contrast to: (1) central onsets
236
Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties Many of the coda segments do not exhibit a difference in duration between accented and unaccented forms Many of the unaccented central codas are short in duration, in contrast to: (1) central onsets, (2) anterior codas
237
Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties Many of the coda segments do not exhibit a difference in duration between accented and unaccented forms Many of the unaccented central codas are short in duration, in contrast to: (1) central onsets, (2) anterior codas, (3) posterior codas
238
Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties Many of the coda segments do not exhibit a difference in duration between accented and unaccented forms Many of the unaccented central codas are short in duration, in contrast to: (1) central onsets, (2) anterior codas, (3) posterior codas, (4) chameleon codas
240
Syllable Coda Duration - CENTRAL Place ALL Syllable Forms Because of the high probability of deletions for central coda consonants, the mean durations are quite low relative to other conditions
241
Syllable Coda Duration - CENTRAL Place ALL Syllable Forms Because of the high probability of deletions for central coda consonants, the mean durations are quite low relative to other conditions In some sense the default duration for central codas is very short
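One way to see how deletions pull the mean down is the small Python sketch below. It assumes that a deleted segment contributes a duration of zero when all tokens are averaged; the slides do not state how deletions were scored, so that convention, the `mean_duration` helper and the token durations are all hypothetical.

```python
from statistics import mean


def mean_duration(durations_ms, include_deletions=True):
    """Mean segment duration in ms.

    durations_ms holds one entry per canonical token; None marks a deleted segment.
    With include_deletions=True a deletion counts as 0 ms (an assumed convention);
    with include_deletions=False deleted tokens are left out of the average.
    """
    if include_deletions:
        values = [0.0 if d is None else d for d in durations_ms]
    else:
        values = [d for d in durations_ms if d is not None]
    return mean(values) if values else 0.0


# Invented coda tokens: half of them are deleted.
tokens = [60.0, None, 40.0, None, None, 50.0]

print(mean_duration(tokens))                           # deletions counted as 0 ms
print(mean_duration(tokens, include_deletions=False))  # deletions omitted
```

With half the tokens deleted, the first mean is half the second, which is the sense in which deletion and duration can be read as two sides of the same measurement.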
242
Syllable Coda Statistics – Posterior Place Posterior coda segments are relatively stable under stress (accent) Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
243
Syllable Coda Statistics – Posterior Place Posterior coda segments are relatively stable under stress (accent) The primary exception is [ng], which tends to delete in unaccented syllables Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
244
Syllable Coda Statistics – POSTERIOR Place Posterior coda segments are relatively stable under stress (accent) The primary exception is [ng], which tends to delete in unaccented syllables The “infamous” glottal stop [q] tends to insert in this context Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
245
Syllable Coda Statistics – Place Chameleons Chameleon segments are unstable under stress (accent) Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
246
Syllable Coda Statistics – Place Chameleons Chameleon segments are unstable under stress (accent) This is particularly true for [l] (for all levels of accent), where many canonical segments transmute into [lg], particularly in accented forms Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
247
Syllable Coda Statistics – Place Chameleons Chameleon segments are unstable under stress (accent) This is particularly true for [l] (for all levels of accent), where many canonical segments transmute into [lg], particularly in accented forms The segment [r] tends to delete in unaccented syllables, but not otherwise Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
248
PART TEN What’s Going on in Pronunciation?
249
What’s Going On? (in pronunciation) With respect to onset and coda segments, there are two basic forms …
250
What’s Going On? (in pronunciation) With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and
251
What’s Going On? (in pronunciation) With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not
252
What’s Going On? (in pronunciation) With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior
253
What’s Going On? (in pronunciation) With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables
254
What’s Going On? (in pronunciation) With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position
255
What’s Going On? (in pronunciation) With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups –
256
What’s Going On? (in pronunciation) With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented
257
What’s Going On? (in pronunciation) With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented and (2) unaccented
258
What’s Going On? (in pronunciation) With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented and (2) unaccented The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space
259
What’s Going On? (in pronunciation) With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented and (2) unaccented The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space
260
What’s Going On? (in pronunciation) With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented and (2) unaccented The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space Certain segments are actually junctures – e.g., the flaps and the glottal stop
261
What’s Going On? (in pronunciation) With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented and (2) unaccented The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space Certain segments are actually junctures – e.g., the flaps and the glottal stop Several other so-called segments are junctures as well
262
What’s Going On? (in pronunciation) With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented and (2) unaccented The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space Certain segments are actually junctures – e.g., the flaps and the glottal stop Several other so-called segments are junctures as well (as they function like flaps); the most noteworthy examples are [dh] and [v]
263
What’s Going On? (in pronunciation) With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented and (2) unaccented The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space Certain segments are actually junctures – e.g., the flaps and the glottal stop Several other so-called segments are junctures as well (as they function like flaps); the most noteworthy examples are [dh] and [v] None of these properties is consistent with a segmental model of language
264
Synopsis The Rationale for a Juncture-Accent Model of Spoken Language
265
Take Home Messages Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard) the following conclusions are drawn:
266
Take Home Messages Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard) the following conclusions are drawn: The pattern of pronunciation variation observed is inconsistent with a segmental model of spoken language
267
Take Home Messages Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard) the following conclusions are drawn: The pattern of pronunciation variation observed is inconsistent with a segmental model of spoken language The pronunciation patterns observed cut across segment and articulatory-feature classes
268
Take Home Messages Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard) the following conclusions are drawn: The pattern of pronunciation variation observed is inconsistent with a segmental model of spoken language The pronunciation patterns observed cut across segment and articulatory-feature classes The patterns observed display systematic variation when syllable structure and stress accent are taken into account
269
Take Home Messages Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard) the following conclusions are drawn: The pattern of pronunciation variation observed is inconsistent with a segmental model of spoken language The pronunciation patterns observed cut across segment and articulatory-feature classes The patterns observed display systematic variation when syllable structure and stress accent are taken into account Therefore, future-generation speech recognition systems need to build syllable structure and stress-accent information into pronunciation models and lexical representations
270
Take Home Messages Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard) the following conclusions are drawn: The pattern of pronunciation variation observed is inconsistent with a segmental model of spoken language The pronunciation patterns observed cut across segment and articulatory-feature classes The patterns observed display systematic variation when syllable structure and stress accent are taken into account Therefore, future-generation speech recognition systems need to build syllable structure and stress-accent information into pronunciation models and lexical representations A preliminary juncture-accent model provides a potential starting point for developing more realistic (and robust) lexical representations
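As a rough illustration of what a lexical entry carrying syllable structure and stress accent might look like, consider the Python sketch below. It is not the juncture-accent model itself: the `Syllable` type, the `realize` function and the single deterministic rule (central coda [t], [d], [n] delete in unaccented syllables) are hypothetical simplifications of tendencies that the slides describe only as probabilistic.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Central coda segments reported above as prone to deletion.
CENTRAL_UNSTABLE = {"t", "d", "n"}


@dataclass(frozen=True)
class Syllable:
    onset: Tuple[str, ...]  # canonical onset phones
    nucleus: str            # canonical vowel
    coda: Tuple[str, ...]   # canonical coda phones
    accented: bool          # stress accent, reduced to two levels for this sketch


def realize(syl: Syllable) -> List[str]:
    """Toy pronunciation rule: drop unstable central codas when unaccented."""
    coda = syl.coda
    if not syl.accented:
        coda = tuple(p for p in coda if p not in CENTRAL_UNSTABLE)
    return [*syl.onset, syl.nucleus, *coda]


# The word "that" in its canonical form [dh ae t], with and without accent.
that_accented = Syllable(onset=("dh",), nucleus="ae", coda=("t",), accented=True)
that_unaccented = Syllable(onset=("dh",), nucleus="ae", coda=("t",), accented=False)

print(realize(that_accented))
print(realize(that_unaccented))
```

A realistic version would replace the hard rule with deletion probabilities conditioned on syllable position, place of articulation and accent level, estimated from an annotated corpus.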
271
That’s All, Folks Many Thanks for Your Time and Attention