Temporal Properties of Spoken Language Steven Greenberg In Collaboration with Hannah Carvey,

2 Temporal Properties of Spoken Language Steven Greenberg http://www.icsi.berkeley.edu/~steveng steveng@icsi.berkeley.edu In Collaboration with Hannah Carvey, Leah Hitchcock and Shawn Chang

3 Acknowledgements and Thanks Statistical Analysis and Automatic Classification: Hannah Carvey, Shawn Chang, Leah Hitchcock. Research Funding: U.S. National Science Foundation, U.S. Department of Defense.

4 For Further Information Consult the web site: www.icsi.berkeley.edu/~steveng

5 Intelligibility and the Modulation Spectrum Significant attenuation (or distortion) of the modulation spectrum results in an appreciable decline in the ability to understand spoken language Why should this be so? Greenberg and Arai (1998)
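For readers unfamiliar with the term, the following is a minimal sketch of what a modulation spectrum measures: the spectrum of the slow amplitude envelope of a signal rather than of the waveform itself. The envelope here is synthetic (a hypothetical 5 Hz fluctuation), and the function name, sample rate and plain DFT are assumptions made for illustration, not the analysis procedure used in the talk.

```python
import cmath
import math

def modulation_spectrum(envelope, sample_rate_hz):
    """Return (frequencies_hz, magnitudes) of the envelope's DFT after removing its mean."""
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [x - mean for x in envelope]  # drop the DC component
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        freqs.append(k * sample_rate_hz / n)
        mags.append(abs(coeff) / n)
    return freqs, mags

# Hypothetical envelope: a 5 Hz amplitude fluctuation, sampled at 100 Hz for 2 s.
RATE_HZ = 100
envelope = [1.0 + 0.8 * math.cos(2 * math.pi * 5 * i / RATE_HZ) for i in range(200)]

freqs, mags = modulation_spectrum(envelope, RATE_HZ)
peak_hz = freqs[mags.index(max(mags))]
print(peak_hz)  # frequency of the largest envelope component
```

With real speech the envelope would first have to be extracted from the waveform (for example by rectification and low-pass filtering), a step this sketch leaves out.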

6 Syllable Duration & the Modulation Spectrum

7 Spectral Slit Paradigm Can listeners decode spoken sentences (“TIMIT”) using just four narrow (1/3 octave) channels (“slits”) distributed across the spectrum? The edge of each slit was separated from its nearest neighbor by an octave. The modulation pattern for each slit differs from that of the others. The four-slit compound waveform looks very similar to the full-band signal.
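The slit geometry described on this slide can be sketched as follows: each slit is 1/3 octave wide, and the upper edge of one slit lies one octave below the lower edge of the next. The lowest edge frequency (300 Hz) is a hypothetical value chosen for illustration; the slide does not state the actual frequencies used.

```python
def slit_edges(lowest_edge_hz, n_slits=4, width_octaves=1 / 3, gap_octaves=1.0):
    """Return (lower_hz, upper_hz) pairs for slits separated edge to edge by a fixed gap."""
    edges = []
    lower = lowest_edge_hz
    for _ in range(n_slits):
        upper = lower * 2 ** width_octaves   # a 1/3-octave-wide channel
        edges.append((lower, upper))
        lower = upper * 2 ** gap_octaves     # next slit starts one octave above this edge
    return edges

# 300 Hz is an assumed starting point, not a value taken from the presentation.
for lower, upper in slit_edges(300.0):
    print(f"{lower:8.1f} Hz to {upper:8.1f} Hz")
```

Under these assumptions the four slits together span four octaves, of which only 4/3 octave actually carries signal.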

8 Syllable Duration & the Modulation Spectrum

12 Anatomy of the Modulation Spectrum Why is the modulation spectrum’s integrity so crucial for intelligibility? What does it reflect linguistically? Why is the bandwidth of the modulation spectrum associated with (intelligible) speech so broad? [Figure: modulation spectrum of 40 TIMIT sentences (computed across a 6-kHz bandwidth).]

14 Anatomy of the Modulation Spectrum Why is the modulation spectrum’s integrity so crucial for intelligibility? What does it reflect linguistically? Why is the bandwidth of the modulation spectrum associated with (intelligible) speech so broad? Does the modulation spectrum reflect a unitary property of the speech signal? Or something more complex?

18 The Modulation Spectrum Reflects Syllables The peak in the modulation spectrum (for speech) is ca. 5 Hz (200 ms). The distribution associated with SYLLABLE DURATION is similar to the pattern of the MODULATION SPECTRUM, suggesting that the latter reflects SYLLABLES. [Figure: syllable duration distribution (in terms of equivalent modulation frequency) for a 30-minute subset of Switchboard, compared with the modulation spectrum of a short excerpt from the Switchboard corpus.]
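The equivalence between 5 Hz and 200 ms quoted above is simply the reciprocal relation between a period and a frequency. A minimal sketch of the conversion used to plot syllable durations on a modulation-frequency axis (the function name and the sample durations are illustrative assumptions):

```python
def duration_ms_to_modulation_hz(duration_ms):
    """Convert a duration in milliseconds to its equivalent modulation frequency in Hz."""
    return 1000.0 / duration_ms

# Hypothetical syllable durations in ms, not corpus data.
for duration_ms in (100, 200, 500):
    print(duration_ms, "ms ->", duration_ms_to_modulation_hz(duration_ms), "Hz")
```

Because the mapping is reciprocal, short syllables land at high modulation frequencies and long syllables at low ones.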

23 The Trouble with Syllables … The question thus arises: if the modulation spectrum truly reflects syllables in the speech signal, why is the distribution of syllable duration so broad? And does this variability in syllable duration reflect something significant? [Figure: modulation spectrum of 15 minutes of spontaneous Japanese speech (OGI-TS corpus) compared with the syllable duration distribution (expressed as modulation frequency) for the same material (Arai and Greenberg, 1997).]

24 PART ONE What Underlies Variation in Word Duration?

26 Word Duration Most words (81%) in the Switchboard corpus are monosyllabic, and most of the remainder are disyllabic (together comprising 95% of the words). The distribution of word duration therefore largely parallels that of syllables (plotted in units of duration [ms] on a logarithmic scale). [Figure: duration distribution for all words.]

31 What Underlies Word Duration Variability? Is this distribution of lexical duration of a uniform nature (and source)? Or does it reflect a more complex set of phenomena? It has been observed for WRITTEN text that the more frequent words tend to be shorter and the less common words longer (i.e., Zipf’s law). Does such a relationship hold for spoken language? Let’s find out!

36 Is Word Duration Related to Word Frequency? Word duration (derived from the phonetically annotated portion of the Switchboard corpus) can be plotted relative to frequency of occurrence. Such an exercise shows that there is a WEAK relationship (r = –0.42) between lexical (unigram) frequency and word duration. There is a lot of variability in word duration for any given frequency range, suggesting that lexical frequency, alone, is unlikely to account for variation in word duration. [Figure: word duration versus frequency of occurrence, r = –0.42; words with fewer than 5 instances omitted from graph.]
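One way to see why r = –0.42 counts as weak is to square it: the squared correlation gives the share of variance in one variable accounted for by a linear relation with the other. The sketch below computes a Pearson correlation on made-up data and then squares the reported value. The data points, the use of log frequency, and the function name are assumptions for illustration only; the slide does not say how frequency was scaled.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (log10 word frequency, mean word duration in ms) pairs, not corpus data.
log_freq = [0.7, 1.0, 1.5, 2.0, 2.5, 3.0]
duration_ms = [420, 250, 380, 180, 300, 150]
r_toy = pearson_r(log_freq, duration_ms)

# The value reported on the slide, and the variance it accounts for.
r_reported = -0.42
variance_explained = r_reported ** 2
print(round(variance_explained, 4))
```

A squared correlation of roughly 0.18 leaves more than four-fifths of the variance in word duration unaccounted for by frequency, which is consistent with the slide's conclusion.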

44 If Not (entirely) Word Frequency, Then What? One parameter that might be more directly related to word duration (and other durational properties of speech) is STRESS ACCENT. Stress accent is related to the emphasis (or prominence) associated with individual syllables within a word. Although dictionaries list the stress patterns associated with words, this information is but a rough guide to the actual patterns observed (as is the phonetic pronunciation provided in the dictionary). In order to obtain empirical data pertaining to stress accent, it is necessary to manually annotate a corpus (syllable by syllable). This manual annotation has been performed for a 45-minute subset of the Switchboard corpus, which has also been labeled with respect to phonetic segments, syllables and words. It is thus possible to ascertain the relationship between stress accent and duration at the level of the word, syllable and phonetic segment. The remainder of this presentation focuses on the statistical relationship between stress accent and duration at these different linguistic tiers, in order to highlight the importance of “information” in guiding the phonetic realization (and hence articulation) of spoken language.

46 If Not (entirely) Word Frequency, Then What? Before examining these data, let’s briefly consider the nature of the annotated material (this is important for evaluating the reliability of the results obtained)

47 PART TWO Being Phonetically and Prosodically Annotated

58 Phonetic Transcription of Spontaneous English Telephone dialogues of 5-10 minutes duration, from the SWITCHBOARD corpus, have been phonetically transcribed (labeled and segmented). There is a lot of diversity in the material transcribed: it spans speech of both genders (ca. 50/50%), reflecting a wide range of American dialectal variation, speaking rate and voice quality. This material has been MANUALLY annotated: 1 hour LABELED and SEGMENTED at the phonetic-segment level; 4 hours LABELED at the phone level and SEGMENTED with respect to syllable boundaries (the latter material SEGMENTED into PHONES using AUTOMATIC methods); 45 minutes of HAND-LABELED, STRESS-ACCENT material. An additional four hours of stress-accent material was automatically labeled (though NOT used in the current analysis). Transcription system: a variant of Arpabet, a fairly broad phonetic transcription orthography.

60 Phonetic Transcription of Spontaneous English The Data are Available at …. http://www.icsi.berkeley.edu/real/stp

70 Annotation of Stress Accent Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent (prominence). Three levels of accent were distinguished: Heavy (1), Light (0.5) and None (0). (In actuality, labelers assigned a “1” to fully accented syllables, a “null” to completely unaccented syllables, and a “0.5” to all others.) An example of the annotation (attached to the vocalic nucleus) is shown below (where the accent levels could not be derived from a dictionary). In this example most of the syllables are unaccented, with two labeled as lightly accented (0.5) and one other labeled as very lightly accented (0.25).

73 Annotation of Stress Accent The data are available at http://www.icsi.berkeley.edu/~steveng/prosody, and the methods used for automatically labeling this material with respect to stress accent will be described in further detail in my talk on July 2.

74 PART THREE The Relation between Stress Accent and Word Duration

84 Back to Stress Accent and Word Duration… Stress accent is supposed to bear some systematic relation to three principal acoustic parameters of the speech signal: fundamental frequency, amplitude and duration. In previous studies my colleagues and I have shown that f0-related cues play a relatively small role in stress accent assignment (at least for spontaneous American English material). Amplitude and duration appear to play a far more important role than f0. Therefore, it is not unreasonable to assume that the stress accent patterns associated with words bear some tangible relation to lexical duration. So … let’s find out!

88 Word Duration and Stress Accent Level Let’s first examine the durational properties of heavily accented words (these are words containing at least one heavily accented syllable). The mean duration of this subset (36%) is 378 ms (s.d. = 168 ms). Most of the heavily accented words are longer than 200 ms. [Figure: duration distribution for heavily accented words.]
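The grouping of words by accent level can be sketched from the syllable-level labels. The rule for heavily accented words (at least one syllable labeled 1) comes from the slide; the rules for lightly accented and unaccented words shown here (peak label between 0 and 1, and all labels 0, respectively) are assumptions, as are the function name and the use of 0 in place of the labelers' “null”.

```python
def word_accent_class(syllable_accents):
    """Classify a word from its per-syllable accent labels (1, 0.5, 0.25 or 0)."""
    peak = max(syllable_accents)
    if peak >= 1.0:
        return "heavy"        # at least one heavily accented syllable (per the slide)
    if peak > 0.0:
        return "light"        # assumed: some accent, but none heavy
    return "unaccented"       # assumed: every syllable labeled 0 (null)

# Hypothetical words given as lists of syllable accent labels.
print(word_accent_class([0, 1]))      # heavy
print(word_accent_class([0.5, 0]))    # light
print(word_accent_class([0]))         # unaccented
```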

91 Word Duration and Stress Accent Level Let’s now compare the duration of the heavily accented words with that of their lightly accented counterparts (25% of the total). The mean duration of this subset is 255 ms (s.d. = 116 ms). In many respects the durational properties of these two subsets are similar. [Figure: duration distributions for heavily accented and lightly accented words.]

95 Word Duration and Stress Accent Level Let’s now compare the duration of unaccented words with that of their accented counterparts. The mean duration of the unaccented subset (39%) is 149 ms (s.d. = 78 ms). The unaccented words are generally shorter than 200 ms and constitute a very different distributional form than their accented counterparts. [Figure: duration distributions for heavily accented, lightly accented and unaccented words.]

99 Heavily Accented Lightly Accented Unaccented All Words Word Duration and Stress Accent Level Let’s now compare the durational properties of ALL WORDS in the corpus with those pertaining to words of varying accent levels When we do so, we notice that the left-hand branch of the lexical distribution largely reflects unaccented words, while the right-hand branch reflects mostly accented words (with the peak reflecting both)

100 Heavily Accented Lightly Accented Unaccented All Words Word Duration and Stress Accent Level Therefore, it appears that the broad distribution of word duration (and, in turn, syllable duration) largely reflects the co-existence of accented and unaccented words within spontaneous speech

101 Heavily Accented Lightly Accented Unaccented All Words Word Duration and Stress Accent Level Therefore, it appears that the broad distribution of word duration (and, in turn, syllable duration) largely reflects the co-existence of accented and unaccented words within spontaneous speech What are the implications of this insight?
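
The claim that a broad duration distribution can arise from the co-existence of subsets with different means can be checked with a little arithmetic. The sketch below is illustrative only: it pools just the two subsets whose statistics are quoted on the preceding slides (lightly accented and unaccented), renormalises their shares, and treats the pooled result as a simple two-component mixture. The function name `pool` is an assumption made for this example, not part of the original analysis.

```python
import math

# Subset statistics quoted on the slides: (share of corpus, mean in ms, s.d. in ms).
# The heavily accented subset is left out because its mean is not restated here.
SUBSETS = {
    "light": (0.25, 255.0, 116.0),
    "none": (0.39, 149.0, 78.0),
}


def pool(subsets):
    """Return (mean, within_sd, total_sd) for the mixture of the given subsets."""
    total_share = sum(share for share, _, _ in subsets.values())
    weights = {name: share / total_share for name, (share, _, _) in subsets.items()}
    mean = sum(weights[name] * m for name, (_, m, _) in subsets.items())
    # Spread inside each subset, averaged over the subsets.
    within_var = sum(weights[name] * sd ** 2 for name, (_, _, sd) in subsets.items())
    # Extra spread caused by the subset means differing from the pooled mean.
    between_var = sum(
        weights[name] * (m - mean) ** 2 for name, (_, m, _) in subsets.items()
    )
    return mean, math.sqrt(within_var), math.sqrt(within_var + between_var)


mean, within_sd, total_sd = pool(SUBSETS)
print(f"pooled mean: {mean:.1f} ms")
print(f"within-subset s.d.: {within_sd:.1f} ms")
print(f"pooled s.d.: {total_sd:.1f} ms")
```

The pooled standard deviation exceeds the within-subset value because the difference between the subset means adds a between-subset term to the variance.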

102 Breadth of the Modulation Spectrum The broad bandwidth of the modulation spectrum, therefore, appears to reflect the heterogeneity in syllabic and lexical duration associated with variation in stress accent level

103 Breadth of the Modulation Spectrum The broad bandwidth of the modulation spectrum, therefore, appears to reflect the heterogeneity in syllabic and lexical duration associated with variation in stress accent level Modulation spectrum of 40 TIMIT sentences (computed across a 6-kHz bandwidth) Unaccented Heavily Accented All Accents (Convergence)

104 Breadth of the Modulation Spectrum The broad bandwidth of the modulation spectrum, therefore, appears to reflect the heterogeneity in syllabic and lexical duration associated with variation in stress accent level Does this insight have implications for the lower tiers of spoken language? Modulation spectrum of 40 TIMIT sentences (computed across a 6-kHz bandwidth) Unaccented Heavily Accented All Accents (Convergence)

105 Breadth of the Modulation Spectrum The broad bandwidth of the modulation spectrum, therefore, appears to reflect the heterogeneity in syllabic and lexical duration associated with variation in stress accent level Does this insight have implications for the lower tiers of spoken language? (e.g., the phonetic and phonological levels) Modulation spectrum of 40 TIMIT sentences (computed across a 6-kHz bandwidth) Unaccented Heavily Accented All Accents (Convergence)

106 Breadth of the Modulation Spectrum The broad bandwidth of the modulation spectrum, therefore, appears to reflect the heterogeneity in syllabic and lexical duration associated with variation in stress accent level Does this insight have implications for the lower tiers of spoken language? (e.g., the phonetic and phonological levels) Let’s find out! Modulation spectrum of 40 TIMIT sentences (computed across a 6-kHz bandwidth) Unaccented Heavily Accented All Accents (Convergence)
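
For readers unfamiliar with the term, a modulation spectrum describes how quickly the amplitude envelope of a signal fluctuates. The following is a minimal sketch of the idea, not the procedure used to produce the TIMIT plot: it assumes a single broadband envelope (rectification followed by 10-ms frame averaging) and a plain discrete Fourier transform, and it runs on a synthetic tone rather than speech. All names and parameter values (`envelope`, `modulation_spectrum`, the 200-Hz carrier, the 4-Hz modulator) are choices made for this illustration.

```python
import cmath
import math


def envelope(signal, frame_len):
    """Rectify the waveform and average it over non-overlapping frames."""
    n_frames = len(signal) // frame_len
    return [
        sum(abs(x) for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def modulation_spectrum(env, env_rate):
    """Return (frequency in Hz, magnitude) pairs for the mean-removed envelope."""
    n = len(env)
    mean = sum(env) / n
    centered = [e - mean for e in env]
    spectrum = []
    for k in range(1, n // 2):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centered)
        )
        spectrum.append((k * env_rate / n, abs(coeff) / n))
    return spectrum


# Synthetic signal: a 200 Hz carrier whose amplitude is modulated at 4 Hz.
SAMPLE_RATE = 2000
FRAME_LEN = 20  # 10 ms frames, giving a 100 Hz envelope sampling rate
signal = [
    (1.0 + 0.5 * math.cos(2 * math.pi * 4 * n / SAMPLE_RATE))
    * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    for n in range(2 * SAMPLE_RATE)  # two seconds of signal
]

env = envelope(signal, FRAME_LEN)
spec = modulation_spectrum(env, SAMPLE_RATE / FRAME_LEN)
peak_freq, peak_mag = max(spec, key=lambda pair: pair[1])
print(f"peak modulation frequency: {peak_freq:.1f} Hz")
```

With real speech the envelope contains many fluctuation rates at once, which is why the resulting spectrum is broad rather than a single peak.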

107 PART FOUR Anatomy of a Syllable

108 The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure

109 The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure In order to highlight patterns germane to variation in segmental duration it is necessary to partition the data in terms of syllable position

110 The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure In order to highlight patterns germane to variation in segmental identity and duration it is necessary to partition the data in terms of syllable position (as well as stress accent level)

111 The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure In order to highlight patterns germane to variation in segmental identity and duration it is necessary to partition the data in terms of syllable position (as well as stress accent level) As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns

112 The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure In order to highlight patterns germane to variation in segmental identity and duration it is necessary to partition the data in terms of syllable position (as well as stress accent level) As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns What is an onset?

113 The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure In order to highlight patterns germane to variation in segmental identity and duration it is necessary to partition the data in terms of syllable position (as well as stress accent level) As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns What is an onset? What is a nucleus?

114 The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure In order to highlight patterns germane to variation in segmental identity and duration it is necessary to partition the data in terms of syllable position (as well as stress accent level) As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns What is an onset? What is a nucleus? What is a coda?

115 The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure In order to highlight patterns germane to variation in segmental identity and duration it is necessary to partition the data in terms of syllable position (as well as stress accent level) As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns What is an onset? What is a nucleus? What is a coda? The following slides provide a brief (and hopefully gentle) introduction to syllable structure

116 “J” = JUNCTURE OGI Numbers95 corpus Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents

117 “J” = JUNCTURE OGI Numbers95 corpus Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET

118 “J” = JUNCTURE OGI Numbers95 corpus Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS

119 “J” = JUNCTURE OGI Numbers95 corpus Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA

120 “J” = JUNCTURE OGI Numbers95 corpus Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition)

121 “J” = JUNCTURE OGI Numbers95 corpus Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition) Most (but not all) syllables also contain an ONSET (usually a CONSONANT)

122 “J” = JUNCTURE OGI Numbers95 corpus Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition) Most (but not all) syllables also contain an ONSET (usually a CONSONANT) Many syllables contain a CODA (also typically a CONSONANT)

123 “J” = JUNCTURE OGI Numbers95 corpus Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition) Most (but not all) syllables also contain an ONSET (usually a CONSONANT) Many syllables contain a CODA (also typically a CONSONANT) The most common syllable form in English is Onset + Nucleus + Coda (“Nine”)

124 “J” = JUNCTURE OGI Numbers95 corpus Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition) Most (but not all) syllables also contain an ONSET (usually a CONSONANT) Many syllables contain a CODA (also typically a CONSONANT) The most common syllable form in English is Onset + Nucleus + Coda (“Nine”) Followed in popularity by Onset + Nucleus (“Two”)

125 “J” = JUNCTURE OGI Numbers95 corpus Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition) Most (but not all) syllables also contain an ONSET (usually a CONSONANT) Many syllables contain a CODA (also typically a CONSONANT) The most common syllable form in English is Onset + Nucleus + Coda (“Nine”) Followed in popularity by Onset + Nucleus (“Two”) Note that onset segments often differ in significant ways from coda segments

126 “J” = JUNCTURE OGI Numbers95 corpus Syllable and Phonetic Segment Illustrated Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition) Most (but not all) syllables also contain an ONSET (usually a CONSONANT) Many syllables contain a CODA (also typically a CONSONANT) The most common syllable form in English is Onset + Nucleus + Coda (“Nine”) Followed in popularity by Onset + Nucleus (“Two”) Note that onset segments often differ in significant ways from coda segments And that certain phones are actually “junctures” not segments
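
The three-way division into onset, nucleus and coda can be sketched in a few lines. This is an illustration under simplifying assumptions: each syllable is given as a list of phone labels, the first vocalic label is taken as the nucleus, and the small `VOCALIC` set and the transcriptions of "nine" and "two" are hypothetical choices made for this example rather than the corpus's actual labels.

```python
# Hypothetical set of vocalic labels, used only for this illustration.
VOCALIC = {"ay", "uw", "eh", "en"}


def split_syllable(phones, vocalic=VOCALIC):
    """Split one syllable's phone labels into (onset, nucleus, coda)."""
    for index, phone in enumerate(phones):
        if phone in vocalic:
            # Everything before the nucleus is onset, everything after is coda.
            return phones[:index], [phone], phones[index + 1:]
    raise ValueError("no vocalic nucleus found")


nine = split_syllable(["n", "ay", "n"])  # Onset + Nucleus + Coda
two = split_syllable(["t", "uw"])        # Onset + Nucleus, empty coda
print(nine)
print(two)
```

A syllable with no onset, such as a bare vowel, simply returns an empty onset list.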

127 PART FIVE Spectro-Temporal Profiles

128 The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation

129 The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation STRESS ACCENT and JUNCTURE are two such properties

130 The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation STRESS ACCENT and JUNCTURE are two such properties A different representation, based on the log, critical-band energy profile across frequency and time, can provide the requisite detail

131 The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation STRESS ACCENT and JUNCTURE are two such properties A different representation, based on the log, critical-band energy profile across frequency and time, can provide the requisite detail As shown in “miniature” below ….. STePs are derived from averages of hundreds of individual instances

132 The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation STRESS ACCENT and JUNCTURE are two such properties A different representation, based on the log, critical-band energy profile across frequency and time, can provide the requisite detail As shown in “miniature” below …. (and as shown in expanded form on a subsequent slide) STePs are derived from averages of hundreds of individual instances
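
The averaging step behind a STeP can be sketched as follows. This is a toy illustration, not the actual STeP pipeline: it assumes the critical-band analysis has already been done, that every instance has been time-aligned to the same number of frames, and that energies are converted to decibels before averaging. The function names and the tiny two-frame, two-band grids are invented for the example.

```python
import math


def to_log_energy(grid, floor=1e-10):
    """Convert a [time][band] grid of linear energies to decibels."""
    return [[10.0 * math.log10(max(e, floor)) for e in frame] for frame in grid]


def average_profile(instances):
    """Average equally sized [time][band] grids cell by cell."""
    n = len(instances)
    n_frames = len(instances[0])
    n_bands = len(instances[0][0])
    return [
        [sum(inst[t][b] for inst in instances) / n for b in range(n_bands)]
        for t in range(n_frames)
    ]


# Two made-up instances, each with 2 time frames and 2 frequency bands.
instance_a = to_log_energy([[1.0, 10.0], [100.0, 10.0]])
instance_b = to_log_energy([[100.0, 10.0], [1.0, 10.0]])
profile = average_profile([instance_a, instance_b])
print(profile)
```

Averaging over hundreds of instances in this way suppresses token-to-token variation and leaves the pattern common to the class.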

133 The Spectro-Temporal Profile (STeP) The onset, nucleus and coda exhibit differential compression (durational) characteristics as a function of stress-accent level

134 The Spectro-Temporal Profile (STeP) The onset, nucleus and coda exhibit differential compression (durational) characteristics as a function of stress-accent level In some sense the speech signal can be thought of as a mountain topography, where the “foothills” (i.e., onsets) behave quite differently than the “cliffs” (i.e., codas)

135 The Spectro-Temporal Profile (STeP) The onset, nucleus and coda exhibit differential compression (durational) characteristics as a function of stress-accent level In some sense the speech signal can be thought of as a mountain topography, where the “foothills” (i.e., onsets) behave quite differently than the “cliffs” (i.e., codas) The controlling factor appears to be accent level

136 The Spectro-Temporal Profile (STeP) The onset, nucleus and coda exhibit differential compression (durational) characteristics as a function of stress-accent level In some sense the speech signal can be thought of as a mountain topography, where the “foothills” (i.e., onsets) behave quite differently than the “cliffs” (i.e., codas) The controlling factor appears to be accent level (i.e., the height of the mountain peak)

137 Spectro-Temporal Profile - DiSyllabic Word [Figure: full-spectrum STeP of “Seven” (OGI Numbers95), plotted over mean duration, showing an accented syllable followed by an unaccented syllable. Segment labels: [s] [eh] [vx] [en]. Constituent labels: Onset, Nucleus, Ambi-syllabic, Nucleus, Juncture]

138 PART SIX Durational Properties of the Syllable (and its relation to pronunciation variation)

139 Analysis of Durational Properties of Speech The following analyses are conditioned on stress accent level and (for the most part) syllable position

140 Analysis of Durational Properties of Speech The following analyses are conditioned on stress accent level and (for the most part) syllable position We’ll begin with analyses illustrating the patterns associated with three levels of stress accent (heavy, light and none) to show the graded nature of the durational properties pertaining to syllable and segment duration

141 Analysis of Durational Properties of Speech The following analyses are conditioned on stress accent level and (for the most part) syllable position We’ll begin with analyses illustrating the patterns associated with three levels of stress accent (heavy, light and none) to show the graded nature of the durational properties pertaining to syllable and segment duration However, for purposes of illustrative clarity, many of the slides will show only two levels of accent (heavy and none) in order to delineate the differences in duration associated with stress accent level

142 Analysis of Durational Properties of Speech The following analyses are conditioned on stress accent level and (for the most part) syllable position We’ll begin with analyses illustrating the patterns associated with three levels of stress accent (heavy, light and none) to show the graded nature of the durational properties pertaining to syllable and segment duration However, for purposes of illustrative clarity, many of the slides will show only two levels of accent (heavy and none) in order to delineate the differences in duration associated with stress accent level Under such conditions, the durational properties associated with light accent are generally intermediate between heavy accent and none
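
"Conditioning" an analysis on stress accent level and syllable position amounts to grouping the segment durations by those two factors before summarising them. The sketch below shows one way to do that. The records are made-up numbers chosen only to make the example run; they are not measurements from the corpus, and the function name `mean_duration_by` is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (accent level, syllable position, duration in ms).
RECORDS = [
    ("heavy", "onset", 90.0), ("heavy", "onset", 110.0),
    ("heavy", "nucleus", 150.0), ("heavy", "nucleus", 170.0),
    ("light", "nucleus", 110.0), ("light", "nucleus", 120.0),
    ("none", "onset", 60.0), ("none", "onset", 70.0),
    ("none", "nucleus", 70.0), ("none", "nucleus", 80.0),
]


def mean_duration_by(records):
    """Return the mean duration for each (accent level, position) pair."""
    groups = defaultdict(list)
    for accent, position, duration_ms in records:
        groups[(accent, position)].append(duration_ms)
    return {key: mean(values) for key, values in groups.items()}


table = mean_duration_by(RECORDS)
for (accent, position), value in sorted(table.items()):
    print(f"{accent:>5} {position:<8} {value:6.1f} ms")
```

Dropping the "light" rows from such a table gives the two-level (heavy versus none) view used on many of the slides.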

143 Syllable Duration - Across Syllable Forms There is a broad range of syllable structures observed in spoken English

144 Syllable Duration - Across Syllable Forms There is a broad range of syllable structures observed in spoken English The CV and CVC forms cover ca. 60% of the syllables V = Vowel C = Consonant

145 Syllable Duration - Across Syllable Forms There is a broad range of syllable structures observed in spoken English The CV and CVC forms cover ca. 60% of the syllables Together, the V, VC, CV and CVC forms account for 85% of syllables V = Vowel C = Consonant

146 Syllable Duration - Across Syllable Forms There is a broad range of syllable structures observed in spoken English The CV and CVC forms cover ca. 60% of the syllables Together, the V, VC, CV and CVC forms account for 85% of syllables The CVCC and CCVC (complex syllable) forms account for another 10% V = Vowel C = Consonant
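
Coverage figures of this kind come from reducing each syllable to its consonant-vowel skeleton and counting the skeletons. A minimal sketch follows. The six toy syllables, their transcriptions and the `VOCALIC` set are hypothetical and serve only to show the mechanics, so the shares printed here bear no relation to the percentages quoted on the slide.

```python
from collections import Counter

# Hypothetical vocalic labels and toy transcriptions, for illustration only.
VOCALIC = {"ay", "uw", "ey", "ow", "ih", "iy"}

SYLLABLES = [
    ["n", "ay", "n"],       # "nine"  -> CVC
    ["t", "uw"],            # "two"   -> CV
    ["ey", "t"],            # "eight" -> VC
    ["ow"],                 # "oh"    -> V
    ["s", "ih", "k", "s"],  # "six"   -> CVCC
    ["th", "r", "iy"],      # "three" -> CCV
]


def cv_form(phones, vocalic=VOCALIC):
    """Map a list of phone labels to its consonant-vowel skeleton."""
    return "".join("V" if phone in vocalic else "C" for phone in phones)


def coverage(syllables, forms):
    """Return the fraction of syllables whose skeleton is in `forms`."""
    counts = Counter(cv_form(s) for s in syllables)
    return sum(counts[form] for form in forms) / len(syllables)


simple_share = coverage(SYLLABLES, {"CV", "CVC"})
basic_share = coverage(SYLLABLES, {"V", "VC", "CV", "CVC"})
print(f"CV + CVC: {simple_share:.0%}")
print(f"V + VC + CV + CVC: {basic_share:.0%}")
```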

147 Syllable Duration - Across Syllable Forms It is unsurprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below) Canonical Syllable Forms V = Vowel C = Consonant

148 Syllable Duration - Across Syllable Forms It is unsurprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below) Note the systematic lengthening of the syllable for each form as the accent level increases from “NONE” to “LIGHT” to “HEAVY” Canonical Syllable Forms V = Vowel C = Consonant

149 Syllable Duration - Across Syllable Forms It is unsurprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below) Note the systematic lengthening of the syllable for each form as the accent level increases from “NONE” to “LIGHT” to “HEAVY” This pattern is representative of accent’s impact on duration Canonical Syllable Forms V = Vowel C = Consonant

150 Syllable Duration - Across Syllable Forms It is unsurprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below) Note the systematic lengthening of the syllable for each form as the accent level increases from “NONE” to “LIGHT” to “HEAVY” This pattern is representative of accent’s impact on duration (as we’ll see) Canonical Syllable Forms V = Vowel C = Consonant

151 Syllable Duration - Accent Level/Syllable Form Canonical Syllable Forms This graph shows the same data as the previous slides, but from the perspective of just two accent levels (“HEAVY” and “NONE”) V = Vowel C = Consonant

152 Syllable Duration - Accent Level/Syllable Form Canonical Syllable Forms This graph shows the same data as the previous slides, but from the perspective of just two accent levels (“HEAVY” and “NONE”) The heavily accented syllables are generally 60-100% longer than their unaccented counterparts V = Vowel C = Consonant

153 Syllable Duration - Accent Level/Syllable Form Canonical Syllable Forms This graph shows the same data as the previous slides, but from the perspective of just two accent levels (“HEAVY” and “NONE”) The heavily accented syllables are generally 60-100% longer than their unaccented counterparts The disparity in duration is most pronounced for syllable forms with one or no consonants (i.e., V, VC, CV) V = Vowel C = Consonant

154 Syllable Duration - Accent Level/Syllable Form Canonical Syllable Forms This graph shows the same data as the previous slides, but from the perspective of just two accent levels (“HEAVY” and “NONE”) The heavily accented syllables are generally 60-100% longer than their unaccented counterparts The disparity in duration is most pronounced for syllable forms with one or no consonants (i.e., V, VC, CV) This pattern implies that accent has the greatest impact on vocalic duration V = Vowel C = Consonant
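
A statement such as "60-100% longer" is a relative difference with the unaccented duration as the baseline. The short sketch below spells out that arithmetic. The durations passed in are hypothetical round numbers, and the function names are invented for this illustration.

```python
def percent_longer(accented_ms, unaccented_ms):
    """Return how much longer the accented duration is, in percent."""
    if unaccented_ms <= 0:
        raise ValueError("unaccented duration must be positive")
    return 100.0 * (accented_ms - unaccented_ms) / unaccented_ms


def duration_ratio(percent):
    """Convert a 'percent longer' figure into an accented/unaccented ratio."""
    return 1.0 + percent / 100.0


# Hypothetical durations in milliseconds.
low_end = percent_longer(160.0, 100.0)
high_end = percent_longer(200.0, 100.0)
print(low_end, high_end)
print(duration_ratio(low_end), duration_ratio(high_end))
```

On this scale, "100% longer" and "twice as long" describe the same ratio of 2.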

155 Canonical Syllable Forms Nucleus Duration - Accent Level/Syllable Form The hypothesis delineated on the previous slide (that accent has the most profound impact on vocalic duration) is confirmed in the graph below

156 Canonical Syllable Forms Nucleus Duration - Accent Level/Syllable Form The hypothesis delineated on the previous slide (that accent has the most profound impact on vocalic duration) is confirmed in the graph below Vowels in accented syllables (of all forms) are at least twice as long as their unaccented counterparts

157 Canonical Syllable Forms Nucleus Duration - Accent Level/Syllable Form The hypothesis delineated on the previous slide (that accent has the most profound impact on vocalic duration) is confirmed in the graph below Vowels in accented syllables (of all forms) are at least twice as long as their unaccented counterparts This pattern implies that the syllabic nucleus absorbs a major component of accent’s impact (at least as far as duration is concerned)

158 PART SEVEN Stress Accent and the Vocalic Nucleus

159 Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable form it is likely that the specific structure of the syllable has relatively little impact on vocalic duration Stress Accent’s Impact on the Vocalic Nucleus

160 Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable form it is likely that the specific structure of the syllable has relatively little impact on vocalic duration As a consequence, the remaining analyses pertaining to accent’s impact on vocalic duration collapse the data across syllable form Stress Accent’s Impact on the Vocalic Nucleus

161 Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable form it is likely that the specific structure of the syllable has relatively little impact on vocalic duration As a consequence, the remaining analyses pertaining to accent’s impact on vocalic duration collapse the data across syllable form We now examine vocalic duration in somewhat greater detail and illustrate how duration, stress accent and vocalic identity interact Stress Accent’s Impact on the Vocalic Nucleus

162 The Spatial Patterning of Duration in Vocalic Nuclei

163 Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue A Brief Primer on Vocalic Acoustics

164 Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance A Brief Primer on Vocalic Acoustics

165 Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance The height parameter is closely linked to the frequency of F1 A Brief Primer on Vocalic Acoustics

166 Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance The height parameter is closely linked to the frequency of F1 In the classic vowel “triangle,” segments are positioned in terms of the tongue positions associated with their production, as follows: A Brief Primer on Vocalic Acoustics

167 Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance The height parameter is closely linked to the frequency of F1 In the classic vowel “triangle,” segments are positioned in terms of the tongue positions associated with their production, as follows: A Brief Primer on Vocalic Acoustics
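
The two acoustic correlates named above can be turned into rough plotting coordinates: F2 - F1 as a proxy for the front-back dimension and F1 as a proxy for height (a higher F1 corresponding to a lower vowel). The sketch below does this for three corner vowels. The formant values are approximate textbook-style figures for adult male speakers, included only for illustration, and the labels and function name are assumptions of this example.

```python
# Approximate (F1, F2) values in Hz, for illustration only.
FORMANTS = {
    "iy": (270, 2290),  # high front vowel
    "aa": (730, 1090),  # low back vowel
    "uw": (300, 870),   # high back vowel
}


def vowel_coordinates(f1, f2):
    """Return (frontness proxy, height proxy) as (F2 - F1, F1) in Hz."""
    return f2 - f1, f1


coords = {label: vowel_coordinates(f1, f2) for label, (f1, f2) in FORMANTS.items()}
for label, (frontness, f1) in coords.items():
    print(f"{label}: F2 - F1 = {frontness} Hz, F1 = {f1} Hz")
```

Plotting these pairs with both axes reversed reproduces the familiar orientation of the vowel triangle, with high front vowels at the upper left.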

168 In the following slides duration is plotted on a 2-D grid, where the x-axis represents the (hypothetical) front-back tongue position Spatial Patterning of Duration et al.

169 In the following slides duration is plotted on a 2-D grid, where the x-axis represents the (hypothetical) front-back tongue position (and hence remains a constant throughout the plots to follow) Spatial Patterning of Duration et al.

170 In the following slides duration is plotted on a 2-D grid, where the x-axis represents the (hypothetical) front-back tongue position (and hence remains a constant throughout the plots to follow) The y-axis serves as the dependent measure, expressed in terms of either duration or the proportion of fully stressed (or unstressed) nuclei Spatial Patterning of Duration et al.

171 Vocalic Duration and Vowel Height The spatial patterning of vocalic segments is systematic with respect to duration

172 Vocalic Duration and Vowel Height The spatial patterning of vocalic segments is systematic with respect to duration Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels

173 Vocalic Duration and Vowel Height All nuclei Diphthongs Monophthongs The spatial patterning of vocalic segments is systematic with respect to duration Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels

174 Vocalic Duration and Vowel Height All nuclei Diphthongs Monophthongs The spatial patterning of vocalic segments is systematic with respect to duration Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels Thus, duration appears to be highly correlated with vowel height

175 Vocalic Duration and Vowel Height All nuclei Diphthongs Monophthongs The spatial patterning of vocalic segments is systematic with respect to duration Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels Thus, duration appears to be highly correlated with vowel height But … the situation is a little more complicated than first appearances would suggest

176 Durational Differences - Stressed/Unstressed There is a large dynamic range in duration between accented and unaccented vocalic nuclei Canonical Syllable Forms

177 Durational Differences - Stressed/Unstressed There is a large dynamic range in duration between accented and unaccented vocalic nuclei Moreover, diphthongs and tense, low monophthongs tend to exhibit a larger dynamic range than the lax monophthongs Canonical Syllable Forms Lax monophthongs

178 QUESTION What are the implications of this durational variation among vowels for vocalic identity?

179 The Vowel Space Under (Full) Stress (Accent) In accented nuclei there is a relatively even distribution of segments across the vowel space, with a slight bias towards the front and central vowels Canonical Vowels Only

180 In unaccented syllables vowels are confined largely to the high-front and high-central sectors of the articulatory space The Vowel Space Without (Stress) Accent Canonical Vowels Only

181 In unaccented syllables vowels are confined largely to the high-front and high-central sectors of the articulatory space The low and mid vowels “get creamed” The Vowel Space Without (Stress) Accent Canonical Vowels Only

182 Stress accent exerts a profound effect on the character of the vowel space The Vowel Spaces Compared Canonical Vowels Only Heavily Accented Unaccented

183 Stress accent exerts a profound effect on the character of the vowel space High vowels are largely associated with unaccented syllables The Vowel Spaces Compared Canonical Vowels Only Heavily Accented Unaccented

184 Stress accent exerts a profound effect on the character of the vowel space High vowels are largely associated with unaccented syllables Low vowels are mostly associated with accented forms The Vowel Spaces Compared Canonical Vowels Only Heavily Accented Unaccented

185 Stress accent exerts a profound effect on the character of the vowel space High vowels are largely associated with unaccented syllables Low vowels are mostly associated with accented forms This distinction between accented and unaccented syllables is of profound importance for understanding (and modeling) pronunciation variation The Vowel Spaces Compared Canonical Vowels Only Heavily Accented Unaccented

186 Stress accent exerts a profound effect on the character of the vowel space High vowels are largely associated with unaccented syllables Low vowels are mostly associated with accented forms This distinction between accented and unaccented syllables is of profound importance for understanding (and modeling) pronunciation variation And will be addressed further in the “Beyond the Phoneme” presentation The Vowel Spaces Compared Canonical Vowels Only Heavily Accented Unaccented

187 PART EIGHT Stress Accent’s Impact on Syllable Onsets

188 Stress Accent and Syllable Onsets The onset is often cited as the key syllabic constituent with respect to “lexical access”

189 Stress Accent and Syllable Onsets The onset is often cited as the key syllabic constituent with respect to “lexical access” It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level

190 Stress Accent and Syllable Onsets The onset is often cited as the key syllabic constituent with respect to “lexical access” It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level Because of the onset’s key role in lexical access one might assume that its duration would be relatively stable across accent level

191 Stress Accent and Syllable Onsets The onset is often cited as the key syllabic constituent with respect to “lexical access” It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level Because of the onset’s key role in lexical access one might assume that its duration would be relatively stable across accent level The following slides suggest that this assumption is INCORRECT,

192 Stress Accent and Syllable Onsets The onset is often cited as the key syllabic constituent with respect to “lexical access” It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level Because of the onset’s key role in lexical access one might assume that its duration would be relatively stable across accent level The following slides suggest that this assumption is INCORRECT, And that the structure of the onset is more complex (and more interesting) than initial intuition would suggest

193 Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents)

194 Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents) Onset duration is similar across syllable form

195 Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents) Onset duration is similar across syllable form (except that segments comprising complex onsets [i.e., CCVC] are slightly shorter)

196 Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents) Onset duration is similar across syllable form (except that segments comprising complex onsets [i.e., CCVC] are slightly shorter) The duration of unaccented onsets is similar across syllable forms

197 Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form Onsets of accented syllables are generally 50-60% longer than their unaccented counterparts

198 Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form Onsets of accented syllables are generally 50-60% longer than their unaccented counterparts Although this durational difference is not quite as large as observed for vocalic nuclei, it is still substantial (and mostly consistent across forms)

199 Onset Duration and Place of Articulation It is of interest to examine accent’s impact on duration of onset (and coda) constituents in somewhat greater detail

200 Onset Duration and Place of Articulation It is of interest to examine accent’s impact on duration of onset (and coda) constituents in somewhat greater detail A convenient means to do so is to partition the data with respect to place of maximum articulatory constriction in order to highlight certain patterns

201 Onset Duration and Place of Articulation It is of interest to examine accent’s impact on duration of onset (and coda) constituents in somewhat greater detail A convenient means to do so is to partition the data with respect to place of maximum articulatory constriction in order to highlight certain patterns What is place of articulation?

202 Onset Duration and Place of Articulation It is of interest to examine accent’s impact on duration of onset (and coda) constituents in somewhat greater detail A convenient means to do so is to partition the data with respect to place of maximum articulatory constriction in order to highlight certain patterns What is place of articulation? Let’s find out!

203 Place of Articulation – A Brief Primer
The tongue contacts (or nearly contacts) the roof of the mouth in producing many of the consonantal sounds in English
Anterior: Labial [p] [b] [m]; Labio-dental [f] [v]; Inter-dental [th] [dh]
Central: Alveolar [t] [d] [n] [s] [z]
Posterior: Palatal [sh] [zh]; Velar [k] [g] [ng]
Chameleon: Rhoticized [r]; Lateral [l]; Approximant [hh]
From Daniloff (1973)
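The place classes on the primer slide can be sketched as a lookup table for partitioning duration data. This is a minimal Python sketch, not the analysis code used for the slides: the names `PLACE_OF_ARTICULATION`, `SEGMENT_TO_PLACE` and `partition_by_place` are hypothetical, and only the symbols listed on the primer slide are included (segments discussed on later slides, such as the glides and flaps, would fall under "other" here).

```python
from collections import defaultdict

# Place classes and segment symbols exactly as listed on the primer slide.
PLACE_OF_ARTICULATION = {
    "anterior": ["p", "b", "m", "f", "v", "th", "dh"],
    "central": ["t", "d", "n", "s", "z"],
    "posterior": ["sh", "zh", "k", "g", "ng"],
    "chameleon": ["r", "l", "hh"],
}

# Invert the table: segment symbol -> place class.
SEGMENT_TO_PLACE = {
    segment: place
    for place, segments in PLACE_OF_ARTICULATION.items()
    for segment in segments
}

def partition_by_place(tokens):
    """Group (segment, duration_ms) pairs by place class.

    Segments missing from the table are collected under "other".
    """
    groups = defaultdict(list)
    for segment, duration_ms in tokens:
        groups[SEGMENT_TO_PLACE.get(segment, "other")].append(duration_ms)
    return dict(groups)
```

Calling `partition_by_place` on a list of labeled tokens yields one duration list per place class, which is the kind of partition the following slides examine.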

204 Onset Duration and Place of Articulation We will examine accent’s impact on the duration of onset (and coda) constituents on the basis of articulatory place

205 Onset Duration and Place of Articulation We will examine accent’s impact on the duration of onset (and coda) constituents on the basis of articulatory place First, we will examine the anterior consonants, followed by the central and posterior onsets

206 Onset Duration and Place of Articulation We will examine accent’s impact on the duration of onset (and coda) constituents on the basis of articulatory place First, we will examine the anterior consonants, followed by the central and posterior onsets Finally, we will examine those segments whose place of articulation assimilates to that of the following vocalic segment (“place chameleons”)

207 Onset Duration and Place of Articulation We will examine accent’s impact on the duration of onset (and coda) constituents on the basis of articulatory place First, we will examine the anterior consonants, followed by the central and posterior onsets Finally, we will examine those segments whose place of articulation assimilates to that of the following vocalic segment (“place chameleons”) Although the heavily accented onsets are (on average) ca. 50% longer than their unaccented counterparts …

208 Onset Duration and Place of Articulation We will examine accent’s impact on the duration of onset (and coda) constituents on the basis of articulatory place First, we will examine the anterior consonants, followed by the central and posterior onsets Finally, we will examine those segments whose place of articulation assimilates to that of the following vocalic segment (“place chameleons”) Although the heavily accented onsets are (on average) ca. 50% longer than their unaccented counterparts … There is a large disparity in durational differences as a function of accent level

209 Onset Duration and Place of Articulation We will examine accent’s impact on the duration of onset (and coda) constituents on the basis of articulatory place First, we will examine the anterior consonants, followed by the central and posterior onsets Finally, we will examine those segments whose place of articulation assimilates to that of the following vocalic segment (“place chameleons”) Although the heavily accented onsets are (on average) ca. 50% longer than their unaccented counterparts … There is a large disparity in durational differences as a function of accent level We will now examine the specific durational patterns as a function of articulatory place...

210 Onset Duration and Place of Articulation We will examine accent’s impact on the duration of onset (and coda) constituents on the basis of articulatory place First, we will examine the anterior consonants, followed by the central and posterior onsets Finally, we will examine those segments whose place of articulation assimilates to that of the following vocalic segment (“place chameleons”) Although the heavily accented onsets are (on average) ca. 50% longer than their unaccented counterparts … There is a large disparity in durational differences as a function of accent level We will now examine the specific durational patterns as a function of articulatory place... The patterns are revealing

211 Syllable Onset Duration - ANTERIOR Place Canonical Syllable Forms The VOICELESS consonants ([p] and [f]) are longer than the other segments

212 Syllable Onset Duration - ANTERIOR Place Canonical Syllable Forms The VOICELESS consonants ([p] and [f]) are longer than the other segments The largest durational disparity (as a function of accent level) is exhibited in the glide [y] – in many respects this segment is VOCALIC in nature

213 Syllable Onset Duration - ANTERIOR Place Canonical Syllable Forms The VOICELESS consonants ([p] and [f]) are longer than the other segments The largest durational disparity (as a function of accent level) is exhibited in the glide [y] – in many respects this segment is VOCALIC in nature The smallest durational disparity is manifest in the voiced fricative [dh] – this segment is actually a juncture with a high rate of deletion

214 Syllable Onset Duration - ANTERIOR Place Canonical Syllable Forms The VOICELESS consonants ([p] and [f]) are longer than the other segments The largest durational disparity (as a function of accent level) is exhibited in the glide [y] – in many respects this segment is VOCALIC in nature The smallest durational disparity is manifest in the voiced fricative [dh] – this segment is actually a juncture with a high rate of deletion Other segments exhibit intermediate patterns

215 Syllable Onset Duration - CENTRAL Place Canonical Syllable Forms The VOICELESS consonants ([t] and [s]) are longer than the other segments

216 Syllable Onset Duration - CENTRAL Place Canonical Syllable Forms The VOICELESS consonants ([t] and [s]) are longer than the other segments The alveolar flap [dx] and nasal flap [nx] are the shortest segments and don’t exhibit a durational disparity as a function of accent level

217 Syllable Onset Duration - POSTERIOR Place CANONICAL Syllable Forms The VOICELESS consonants ([k], [sh], [ch]) are longer than the other segments

218 Syllable Onset Duration - POSTERIOR Place CANONICAL Syllable Forms The VOICELESS consonants ([k], [sh], [ch]) are longer than the other segments Most of the segments exhibit a durational disparity between accented and unaccented forms

219 Syllable Onset Duration - POSTERIOR Place CANONICAL Syllable Forms The VOICELESS consonants ([k], [sh], [ch]) are longer than the other segments Most of the segments exhibit a durational disparity between accented and unaccented forms The duration of the voiced segments in unaccented syllables is ca. 50-60 ms

220 Syllable Onset Duration - POSTERIOR Place CANONICAL Syllable Forms The VOICELESS consonants ([k], [sh], [ch]) are longer than the other segments Most of the segments exhibit a durational disparity between accented and unaccented forms The duration of the voiced segments in unaccented syllables is ca. 50-60 ms The glide [w] exhibits a significant disparity between accented and unaccented forms

221 Syllable Onset Duration - Place Chameleons CANONICAL Syllable Forms Place chameleon segments exhibit a consistent durational disparity between accented and unaccented forms

222 Syllable Onset Duration - Place Chameleons CANONICAL Syllable Forms Place chameleon segments exhibit a consistent durational disparity between accented and unaccented forms In many respects the chameleons behave like VOWELS (and thus their duration is more sensitive to accent level than most other onsets)

223 Syllable Onset Duration - Place Chameleons CANONICAL Syllable Forms Place chameleon segments exhibit a consistent durational disparity between accented and unaccented forms In many respects the chameleons behave like VOWELS (and thus their duration is more sensitive to accent level than most other onsets) In unaccented syllables the duration of chameleons is ca. 50-60 ms

224 Segmental Identity and Stress Accent It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not …)

225 Segmental Identity and Stress Accent It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not...) The pattern of segmental realization bears some correspondence to durational variation as a function of accent level

226 Segmental Identity and Stress Accent It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not...) The pattern of segmental realization bears some correspondence to durational variation as a function of accent level We’ll reserve discussion of this (most interesting) topic until tomorrow (“Beyond the Phoneme”)

227 PART NINE Stress Accent’s Impact on Syllable Codas

228 Stress Accent and Syllable Codas Stress accent’s impact on syllable codas differs from that of onsets

229 Stress Accent and Syllable Codas Stress accent’s impact on syllable codas differs from that of onsets The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are omitted from consideration)

230 Stress Accent and Syllable Codas Stress accent’s impact on syllable codas differs from that of onsets The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are omitted from consideration) There is a far greater probability of segmental deletion in coda constituents

231 Stress Accent and Syllable Codas Stress accent’s impact on syllable codas differs from that of onsets The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are omitted from consideration) There is a far greater probability of segmental deletion in coda constituents Accent level exerts a powerful influence on both segmental deletion and segmental duration

232 Stress Accent and Syllable Codas Stress accent’s impact on syllable codas differs from that of onsets The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are omitted from consideration) There is a far greater probability of segmental deletion in coda constituents Accent level exerts a powerful influence on both segmental deletion and segmental duration To a certain degree segmental deletion and duration interact (or are flip sides of the same phonetic coin – but will not be discussed here)

233 Coda Duration - Accent Level/Syllable Form Coda duration (on average) is similar across syllable structure, both for accented and unaccented forms Canonical Syllable Forms

234 Coda Duration - Accent Level/Syllable Form Coda duration (on average) is similar across syllable structure, both for accented and unaccented forms There is a relatively small dynamic range in duration between accented and unaccented codas (relative to onsets and nuclei) Canonical Syllable Forms

235 Coda Duration - Accent Level/Syllable Form Coda duration (on average) is similar across syllable structure, both for accented and unaccented forms There is a relatively small dynamic range in duration between accented and unaccented codas (relative to onsets and nuclei) Moreover, the duration of certain coda constituents is virtually identical in accented and unaccented syllables Canonical Syllable Forms

236 Syllable Coda Duration - ANTERIOR Place CANONICAL Syllable Forms The durational disparity between accented and unaccented forms is smaller for codas than for onsets

237 Syllable Coda Duration - ANTERIOR Place CANONICAL Syllable Forms The durational disparity between accented and unaccented forms is smaller for codas than for onsets Certain segments exhibit little if any difference in duration as a function of accent (e.g., [b], [m], [v])

238 Syllable Coda Duration - ANTERIOR Place CANONICAL Syllable Forms The durational disparity between accented and unaccented forms is smaller for codas than for onsets Certain segments exhibit little if any difference in duration as a function of accent (e.g., [b], [m], [v]) Such segments manifest certain properties of FLAPS (pure junctures)

239 Syllable Coda Duration - ANTERIOR Place ALL Syllable Forms Because of the significant number of deletions in coda constituents, particularly in unaccented syllables, the durational disparity between accented and unaccented syllables is preserved when duration is computed across ALL syllable forms (including those with deletions)

240 Syllable Coda Duration - ANTERIOR Place ALL Syllable Forms Because of the significant number of deletions in coda constituents, particularly in unaccented syllables, the durational disparity between accented and unaccented syllables is preserved when duration is computed across ALL syllable forms (including those with deletions) Those segments exhibiting flap-like properties (e.g., [b], [m], [v]) tend to delete the most in unaccented codas

241 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties

242 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties The duration of many of the coda segments does not differ between accented and unaccented forms

243 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties The duration of many of the coda segments does not differ between accented and unaccented forms Many of the unaccented central codas are short in duration, in contrast to:

244 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties The duration of many of the coda segments does not differ between accented and unaccented forms Many of the accented central codas are short in duration, in contrast to: (1) central onsets

245 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties The duration of many of the coda segments does not differ between accented and unaccented forms Many of the accented central codas are short in duration, in contrast to: (1) central onsets, (2) anterior codas

246 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties The duration of many of the coda segments does not differ between accented and unaccented forms Many of the accented central codas are short in duration, in contrast to: (1) central onsets, (2) anterior codas, (3) posterior codas

247 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties The duration of many of the coda segments does not differ between accented and unaccented forms Many of the accented central codas are short in duration, in contrast to: (1) central onsets, (2) anterior codas, (3) posterior codas, (4) chameleon codas

248 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties The duration of many of the coda segments does not differ between accented and unaccented forms Many of the accented central codas are short in duration, in contrast to: (1) central onsets, (2) anterior codas, (3) posterior codas, (4) chameleon codas

249 Syllable Coda Duration - CENTRAL Place ALL Syllable Forms Because of the high probability of deletions for central coda consonants the mean durations are quite low relative to other conditions

250 Syllable Coda Duration - CENTRAL Place ALL Syllable Forms Because of the high probability of deletions for central coda consonants the mean durations are quite low relative to other conditions In some sense the default duration for central codas is very short

251 Syllable Coda Duration - POSTERIOR Place CANONICAL Syllable Forms Many coda consonants are short in duration

252 Syllable Coda Duration - POSTERIOR Place CANONICAL Syllable Forms Many coda consonants are short in duration Most segments exhibit relatively little sensitivity to accent level

253 Syllable Coda Duration - POSTERIOR Place ALL Syllable Forms There are relatively few deletions in coda segments, hence the durational patterns are similar for ALL syllable forms relative to the canonical syllable forms

254 Syllable Coda Duration - Place Chameleons CANONICAL Syllable Forms In coda position, chameleons function essentially as vowels

255 Syllable Coda Duration - Place Chameleons CANONICAL Syllable Forms In coda position, chameleons function essentially as vowels Hence, the large durational disparity between accented and unaccented chameleon segments

256 Syllable Coda Duration - Place Chameleons ALL Syllable Forms There are a lot of deletions of coda chameleons in unaccented syllables

257 Syllable Coda Duration - Place Chameleons ALL Syllable Forms There are a lot of deletions of coda chameleons in unaccented syllables Hence the mean duration of these segments in unaccented forms is short (when deletions are counted as “0” in terms of duration)
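The difference between the CANONICAL and ALL syllable-form means can be made concrete with a small Python sketch. It assumes a deleted segment is marked as `None` and, for the ALL-forms mean, contributes a duration of 0, as the slide states; the function name and the sample values are hypothetical and purely illustrative.

```python
def mean_duration(durations_ms, count_deletions_as_zero):
    """Mean segment duration in ms; None marks a deleted segment.

    count_deletions_as_zero=True  -> "ALL syllable forms" (a deletion counts as 0 ms)
    count_deletions_as_zero=False -> realized segments only (deletions excluded)
    """
    if count_deletions_as_zero:
        values = [0.0 if d is None else d for d in durations_ms]
    else:
        values = [d for d in durations_ms if d is not None]
    return sum(values) / len(values) if values else 0.0

# Hypothetical unaccented coda tokens: two realized, two deleted.
tokens = [60.0, None, 40.0, None]
print(mean_duration(tokens, count_deletions_as_zero=False))  # realized tokens only
print(mean_duration(tokens, count_deletions_as_zero=True))   # deletions count as 0 ms
```

In this toy example half the tokens are deleted, so the mean over all forms is half the mean over realized tokens; a high deletion rate pulls the ALL-forms mean down in the same way.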

258 PART TEN What’s Going on in Duration?

259 Durational Properties of Phonetic Segments The durational properties of phonetic segments vary as a function of BOTH syllable position and stress-accent level

260 Durational Properties of Phonetic Segments The durational properties of phonetic segments vary as a function of BOTH syllable position and stress-accent level Vocalic nuclei (and their quasi-consonantal counterparts, the glides and place chameleons) exhibit the greatest durational sensitivity to stress-accent level

261 Durational Properties of Phonetic Segments The durational properties of phonetic segments vary as a function of BOTH syllable position and stress-accent level Vocalic nuclei (and their quasi-consonantal counterparts, the glides and place chameleons) exhibit the greatest durational sensitivity to stress-accent level – such segments absorb much of accent’s impact on duration

262 Durational Properties of Phonetic Segments The durational properties of phonetic segments vary as a function of BOTH syllable position and stress-accent level Vocalic nuclei (and their quasi-consonantal counterparts, the glides and place chameleons) exhibit the greatest durational sensitivity to stress-accent level – such segments absorb much of accent’s impact on duration The durational properties of coda consonants are relatively insensitive to stress accent …

263 Durational Properties of Phonetic Segments The durational properties of phonetic segments vary as a function of BOTH syllable position and stress-accent level Vocalic nuclei (and their quasi-consonantal counterparts, the glides and place chameleons) exhibit the greatest durational sensitivity to stress-accent level – such segments absorb much of accent’s impact on duration The durational properties of coda consonants are relatively insensitive to stress accent … Unless they are vocalic in nature (such as the place chameleons, [r] and [l])

264 Durational Properties of Phonetic Segments The durational properties of phonetic segments vary as a function of BOTH syllable position and stress-accent level Vocalic nuclei (and their quasi-consonantal counterparts, the glides and place chameleons) exhibit the greatest durational sensitivity to stress-accent level – such segments absorb much of accent’s impact on duration The durational properties of coda consonants are relatively insensitive to stress accent … Unless they are vocalic in nature (such as the place chameleons, [r] and [l]) The durational properties of onset consonants exhibit a sensitivity to stress accent somewhere in between nuclei and codas

265 Durational Properties of Phonetic Segments The durational properties of phonetic segments vary as a function of BOTH syllable position and stress-accent level Vocalic nuclei (and their quasi-consonantal counterparts, the glides and place chameleons) exhibit the greatest durational sensitivity to stress-accent level – such segments absorb much of accent’s impact on duration The durational properties of coda consonants are relatively insensitive to stress accent … Unless they are vocalic in nature (such as the place chameleons, [r] and [l]) The durational properties of onset consonants exhibit a sensitivity to stress accent somewhere in between nuclei and codas Most of the fricated consonants (fricatives and affricates) are relatively long, and their duration is relatively insensitive to stress accent

266 Durational Properties of Phonetic Segments The durational properties of phonetic segments vary as a function of BOTH syllable position and stress-accent level Vocalic nuclei (and their quasi-consonantal counterparts, the glides and place chameleons) exhibit the greatest durational sensitivity to stress-accent level – such segments absorb much of accent’s impact on duration The durational properties of coda consonants are relatively insensitive to stress accent … Unless they are vocalic in nature (such as the place chameleons, [r] and [l]) The durational properties of onset consonants exhibit a sensitivity to stress accent somewhere in between nuclei and codas Most of the fricated consonants (fricatives and affricates) are relatively long, and their duration is relatively insensitive to stress accent The “pure” junctures (flaps and glottal stop) are short, and their duration is insensitive to stress accent

267 Durational Properties of Phonetic Segments The durational properties of phonetic segments vary as a function of BOTH syllable position and stress-accent level Vocalic nuclei (and their quasi-consonantal counterparts, the glides and place chameleons) exhibit the greatest durational sensitivity to stress-accent level – such segments absorb much of accent’s impact on duration The durational properties of coda consonants are relatively insensitive to stress accent … Unless they are vocalic in nature (such as the place chameleons, [r] and [l]) The durational properties of onset consonants exhibit a sensitivity to stress accent somewhere in between nuclei and codas Most of the fricated consonants (fricatives and affricates) are relatively long, and their duration is relatively insensitive to stress accent The “pure” junctures (flaps and glottal stop) are short, and their duration is insensitive to stress accent Such durational properties are incommensurate with traditional phonetic-segment models of spoken language

268 Durational Properties of Phonetic Segments The durational properties of phonetic segments vary as a function of BOTH syllable position and stress-accent level Vocalic nuclei (and their quasi-consonantal counterparts, the glides and place chameleons) exhibit the greatest durational sensitivity to stress-accent level – such segments absorb much of accent’s impact on duration The durational properties of coda consonants are relatively insensitive to stress accent … Unless they are vocalic in nature (such as the place chameleons, [r] and [l]) The durational properties of onset consonants exhibit a sensitivity to stress accent somewhere in between nuclei and codas Most of the fricated consonants (fricatives and affricates) are relatively long, and their duration is relatively insensitive to stress accent The “pure” junctures (flaps and glottal stop) are short, and their duration is insensitive to stress accent Such durational properties are incommensurate with traditional phonetic-segment models of spoken language A more appropriate representation for understanding durational variation is a juncture-accent model (to be discussed in more detail tomorrow)

269 Spectro-Temporal Profile - DiSyllabic Word [Figure: full-spectrum perspective of “Seven” ([s] [eh] [vx] [en]), OGI Numbers95, with mean duration marked; annotations: accented syllable, unaccented syllable, juncture; segment roles: Onset, Nucleus, Ambi-syllabic Juncture, Nucleus]

270 That’s All Many Thanks for Your Time and Attention

271 PART TEN Spectro-Temporal Profiles Revisited

272 The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation

273 The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation STRESS ACCENT and JUNCTURE are two such properties

274 The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation STRESS ACCENT and JUNCTURE are two such properties A different representation, based on the log, critical-band energy profile across frequency and time, can provide the requisite detail

275 The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation STRESS ACCENT and JUNCTURE are two such properties A different representation, based on the log, critical-band energy profile across frequency and time, can provide the requisite detail As shown in “miniature” below ….. STePs are derived from averages of hundreds of individual instances

276 The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation STRESS ACCENT and JUNCTURE are two such properties A different representation, based on the log, critical-band energy profile across frequency and time, can provide the requisite detail As shown in “miniature” below …. (and as shown in expanded form on the following slides) STePs are derived from averages of hundreds of individual instances
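The statement that STePs are derived from averages of hundreds of individual instances can be sketched as follows. The slides do not say how instances of different durations are aligned before averaging, so this Python sketch ASSUMES linear time normalization to a fixed number of frames; that choice, and the names `resample` and `average_profile`, are hypothetical and not taken from the original analysis.

```python
def resample(track, n_frames):
    """Linearly interpolate a 1-D log-energy track to n_frames points."""
    if len(track) == 1 or n_frames == 1:
        return [track[0]] * n_frames
    out = []
    for i in range(n_frames):
        pos = i * (len(track) - 1) / (n_frames - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, len(track) - 1)
        frac = pos - lo
        out.append(track[lo] * (1 - frac) + track[hi] * frac)
    return out

def average_profile(instances, n_frames=5):
    """Average several (bands x frames) log-energy matrices.

    Each instance is a list of per-band tracks; tracks may differ in length
    across instances and are time-normalized (an assumption) before averaging.
    """
    n_bands = len(instances[0])
    profile = []
    for band in range(n_bands):
        tracks = [resample(inst[band], n_frames) for inst in instances]
        profile.append([sum(column) / len(tracks) for column in zip(*tracks)])
    return profile
```

Averaging over many tokens in this way smooths out token-specific detail and leaves the energy pattern that is common to the word or syllable.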

277 Spectro-Temporal Profile - DiSyllabic Word [Figure: full-spectrum perspective of “Seven” ([s] [eh] [vx] [en]), OGI Numbers95, with mean duration marked; annotations: accented syllable, unaccented syllable, juncture; segment roles: Onset, Nucleus, Ambi-syllabic Juncture, Nucleus]

278 Spectro-Temporal Profile - DiSyllabic Word [Figure: high-frequency perspective of “Seven” ([s] [eh] [vx] [en]), OGI Numbers95, with mean duration marked; annotations: accented syllable, unaccented syllable, juncture; segment roles: Onset, Nucleus, Juncture, Nucleus]

279 PART EIGHT A Preliminary Juncture-Accent Model

280 Road Map to the Juncture-Accent Model A means of visualizing important properties of the acoustic signal

281 Road Map to the Juncture-Accent Model A means of visualizing important properties of the acoustic signal The juncture-accent representation is based on log, critical-band energy across time and frequency

282 Road Map to the Juncture-Accent Model A means of visualizing important properties of the acoustic signal The juncture-accent representation is based on log, critical-band energy across time and frequency Although it is not intended as an auditory representation, it does represent spectro-temporal properties of the signal in a manner consistent with auditory principles

283 Road Map to the Juncture-Accent Model A means of visualizing important properties of the acoustic signal The juncture-accent representation is based on log, critical-band energy across time and frequency Although it is not intended as an auditory representation, it does represent spectro-temporal properties of the signal in a manner consistent with auditory principles Let’s take a look at some illustrations – Spectro-Temporal Profiles or “STePs”
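One way to picture "log, critical-band energy" for a single analysis frame is sketched below in Python. The slides do not specify the critical-band scale or band edges, so this sketch ASSUMES the Zwicker and Terhardt (1980) Hz-to-Bark approximation and groups spectral bins by integer Bark band; the function names and the sample spectrum are hypothetical.

```python
import math

def hz_to_bark(freq_hz):
    """Zwicker & Terhardt approximation of critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def log_band_energies(freqs_hz, power, floor=1e-10):
    """Sum spectral power within each integer Bark band and convert to dB.

    freqs_hz and power are parallel lists for one analysis frame; the floor
    avoids taking the logarithm of zero.
    """
    bands = {}
    for freq, p in zip(freqs_hz, power):
        band = int(hz_to_bark(freq))
        bands[band] = bands.get(band, 0.0) + p
    return {band: 10.0 * math.log10(max(energy, floor))
            for band, energy in sorted(bands.items())}

# Hypothetical three-bin power spectrum (illustrative values only).
print(log_band_energies([50.0, 80.0, 1000.0], [5.0, 5.0, 100.0]))
```

Repeating this for successive frames gives a band-by-time matrix of log energies, which is the kind of representation the road map describes.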

284 Anatomy of a Spectro-Temporal Profile [Figure: full-spectrum perspective of “Seven” ([s] [eh] [vx] [en]), OGI Numbers95, with mean duration marked; annotations: accented syllable, unaccented syllable, juncture]

285 Anatomy of a Spectro-Temporal Profile [Figure: high-frequency perspective of “Seven” ([s] [eh] [vx] [en]), OGI Numbers95, with mean duration marked; annotations: accented syllable, unaccented syllable, juncture]

286 Anatomy of a Spectro-Temporal Profile [Figure: full-spectrum perspective of “Zero” ([z] [ih] [r] [ax]), OGI Numbers95, with mean duration marked; annotations: accented syllable, unaccented syllable, juncture]

287 Spectro-Temporal Profile [Figure: high-frequency perspective of “Zero” ([z] [ih] [r] [ax]), OGI Numbers95, with mean duration marked; annotations: accented syllable, unaccented syllable, juncture]

288 Spectro-Temporal Profile [Figure: full-spectrum perspective of “Three” ([th] [r] [iy]), OGI Numbers95, with mean duration marked; annotation: accented syllable]

289 Spectro-Temporal Profile [Figure: high-frequency perspective of “Three” ([th] [r] [iy]), OGI Numbers95, with mean duration marked; annotation: accented syllable]

290 What’s Going On? (in pronunciation)
With respect to onset and coda segments (i.e., consonants) there are two basic forms – (1) those that are relatively stable across accent level, and (2) those that are not
Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulatory constriction is either anterior or posterior
The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables
The place chameleons (i.e., the approximants) are not very stable in either onset or coda position
The vowels are divisible into two main groups – accented and unaccented
The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space
The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space
Certain segments are actually junctures – e.g., the flaps and the glottal stop
Many so-called segments are actually junctures (as they are flaps); the most noteworthy examples are [dh] and [v]

291 What’s Going On? (in pronunciation)
With respect to onset and coda segments (i.e., consonants) there are two basic forms – (1) those that are relatively stable across accent level, and (2) those that are not
Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulatory constriction is either anterior or posterior
The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables
The place chameleons (i.e., the approximants) are not very stable in either onset or coda position
The vowels are divisible into two main groups – accented and unaccented
The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space
The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space
Certain segments are actually junctures – e.g., the flaps and the glottal stop
Many so-called segments are actually junctures (as they are flaps); the most noteworthy examples are [dh] and [v]
None of these properties is consistent with a segmental model of language

292 Syllable Duration and Number of Segments For syllables containing more than a single segment there is relatively little difference in duration as the number of segments (within a syllable) increases. Canonical Syllable Forms

293 Syllable Duration and Number of Segments For syllables containing more than a single segment there is relatively little difference in duration as the number of segments (within a syllable) increases, suggesting that syllable duration is largely controlled by processes independent of segmental production. Canonical Syllable Forms

294 OVERTURE The Central Challenge for Models of Speech Recognition

295 Phonemic Beads on a String Illustrated In traditional models of speech recognition (by machine) words are represented as mere sequences of phonetic segments (“phones”) …

296 Phonemic Beads on a String Illustrated In traditional models of speech recognition (by machine) words are represented as mere sequences of phonetic segments (“phones”) … Strung together like “beads on a string”.

297 Phonemic Beads on a String Illustrated In traditional models of speech recognition (by machine) words are represented as mere sequences of phonetic segments (“phones”) … Strung together like “beads on a string”. Analogous (in some measure) to the orthographic representation in a dictionary.

298 Phonemic Beads on a String Illustrated In traditional models of speech recognition (by machine) words are represented as mere sequences of phonetic segments (“phones”) … Strung together like “beads on a string”. Analogous (in some measure) to the orthographic representation in a dictionary. Little quarter is provided for prosody or other (extra)syllabic properties.

299 Phonemic Beads on a String Illustrated Is this an accurate characterization of spoken language?

300 Phonemic Beads on a String Illustrated Is this an accurate characterization of spoken language? If it were, then current speech recognition systems, which are predicated on such a perspective, would experience little difficulty decoding speech.

301 Phonemic Beads on a String Illustrated Is this an accurate characterization of spoken language? If it were, then current speech recognition systems, which are predicated on such a perspective, would experience little difficulty decoding speech. In fact, such ASR systems require extensive, time-consuming “training” on material similar to that in the task at hand in order to function well.

302 Phonemic Beads on a String Illustrated Is this an accurate characterization of spoken language? If it were, then current speech recognition systems, which are predicated on such a perspective, would experience little difficulty decoding speech. In fact, such ASR systems require extensive, time-consuming “training” on material similar to that in the task at hand in order to function well. Moreover, ASR systems require detailed statistical knowledge of the WORDS spoken in the task in order to do well.

303 Phonemic Beads on a String Illustrated Is this an accurate characterization of spoken language? If it were, then current speech recognition systems, which are predicated on such a perspective, would experience little difficulty decoding speech. In fact, such ASR systems require extensive, time-consuming “training” on material similar to that in the task at hand in order to function well. Moreover, ASR systems require detailed statistical knowledge of the WORDS spoken in the task in order to do well, i.e., phonetic decoding is insufficient (by itself) to recognize speech.

304 Phonemic Beads on a String Illustrated Is this an accurate characterization of spoken language? If it were, then current speech recognition systems, which are predicated on such a perspective, would experience little difficulty decoding speech. In fact, such ASR systems require extensive, time-consuming “training” on material similar to that in the task at hand in order to function well. Moreover, ASR systems require detailed statistical knowledge of the WORDS spoken in the task in order to do well, i.e., phonetic decoding is insufficient (by itself) to recognize speech – why?

305 Challenge Number One Acoustic Variability

306 Effects of Reverberation on the Speech Signal Reflections from walls and other surfaces routinely modify the spectro-temporal structure of the speech signal under everyday conditions.

307 Effects of Reverberation on the Speech Signal Reflections from walls and other surfaces routinely modify the spectro-temporal structure of the speech signal under everyday conditions. Yet, the intelligibility of speech is remarkably stable (unless the amount of reverberation or background noise is truly extreme).

308 Effects of Reverberation on the Speech Signal Reflections from walls and other surfaces routinely modify the spectro-temporal structure of the speech signal under everyday conditions. Yet, the intelligibility of speech is remarkably stable (unless the amount of reverberation or background noise is truly extreme). How can this be so?

309 QUESTION ONE How DO listeners decode the speech signal given the large variation in the acoustic background?

310 QUESTION TWO Is there some acoustic property that provides a basis for perceptual stability of the speech signal?

311 An Invariant Property of the Speech Signal? Low-frequency energy fluctuations of the pressure waveform are largely preserved under many acoustic-interference conditions. [Figure: Modulation Spectrum, based on an illustration by Hynek Hermansky]

312 An Invariant Property of the Speech Signal? Low-frequency energy fluctuations of the pressure waveform are largely preserved under many acoustic-interference conditions. In reverberant environments the MODULATION SPECTRUM’S peak is attenuated and shifted down to ca. 2 Hz (but is largely preserved). [Figure: Modulation Spectrum, based on an illustration by Hynek Hermansky]

313 An Invariant Property of the Speech Signal? Low-frequency energy fluctuations of the pressure waveform are largely preserved under many acoustic-interference conditions. In reverberant environments the modulation spectrum’s peak is attenuated and shifted down to ca. 2 Hz (but is largely preserved). (“What is the modulation spectrum?” you ask) [Figure: Modulation Spectrum, based on an illustration by Hynek Hermansky]

314 Modulation Spectrum Computation
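
The transcript preserves only the title of this slide, so the following is a minimal sketch of one common way to estimate a modulation spectrum, not the procedure used in the presentation. It assumes a single full-band energy envelope (no critical-band filterbank), a 10 ms RMS frame, and a Hann window. The function name `modulation_spectrum` and all parameter values are illustrative assumptions.

```python
import numpy as np

def modulation_spectrum(signal, sample_rate, frame_ms=10.0):
    """Estimate the spectrum of a signal's slow energy fluctuations.

    Simplified, single-band sketch (an assumption, not the slide's method):
    1. cut the waveform into short frames and take the RMS of each frame,
       which yields a coarse energy envelope;
    2. remove the envelope's mean so the DC term does not dominate;
    3. take the magnitude of the envelope's Fourier transform.
    """
    frame_len = int(sample_rate * frame_ms / 1000.0)
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))
    envelope = np.sqrt(np.mean(frames ** 2, axis=1))  # RMS energy per frame
    envelope = envelope - np.mean(envelope)
    envelope_rate = sample_rate / frame_len  # envelope samples per second
    magnitude = np.abs(np.fft.rfft(envelope * np.hanning(n_frames)))
    freqs = np.fft.rfftfreq(n_frames, d=1.0 / envelope_rate)
    return freqs, magnitude

# Synthetic check: a 1 kHz tone whose amplitude fluctuates at 4 Hz,
# a rate chosen to fall inside the 3-6 Hz range discussed in the slides.
sample_rate = 16000
t = np.arange(0, 4.0, 1.0 / sample_rate)
test_signal = (1.0 + 0.8 * np.sin(2 * np.pi * 4.0 * t)) * np.sin(2 * np.pi * 1000.0 * t)
freqs, magnitude = modulation_spectrum(test_signal, sample_rate)
peak_hz = freqs[np.argmax(magnitude)]
print(f"modulation peak near {peak_hz:.2f} Hz")
```

For real speech, the same envelope analysis would normally be run separately on each of several band-pass channels; the full-band version here only illustrates the idea of measuring low-frequency energy fluctuations.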

315 Intelligibility and the Modulation Spectrum Significant attenuation (or distortion) of the modulation spectrum results in an appreciable decline in the ability to understand spoken language (Greenberg and Arai, 1998).

316 Effects of Reverberation on the Speech Signal Reflections from walls and other surfaces routinely modify the spectro-temporal structure of the speech signal under everyday conditions.

317 Effects of Reverberation on the Speech Signal Reflections from walls and other surfaces routinely modify the spectro-temporal structure of the speech signal under everyday conditions. Yet, the intelligibility of speech is remarkably stable (unless the amount of reverberation or background noise is truly extreme).

318 Intelligibility and Spectral Asynchrony Intelligibility is relatively unaffected by spectral asynchrony as great as 140 ms – 75% of words are correct when mean asynchrony = mean phone duration.

319 Intelligibility and Spectral Asynchrony Intelligibility is relatively unaffected by spectral asynchrony as great as 140 ms – 75% of words are correct when mean asynchrony = mean phone duration. Speech intelligibility does appear to be roughly correlated with the energy in the modulation spectrum between 3 and 6 Hz.

