Published by Doris Long. Modified over 9 years ago.
1 CS 551/651: Structure of Spoken Language
Lecture 7: Syllable Structure, Vowel Neutralization, and Coarticulation
John-Paul Hosom
Fall 2010
2 Syllables
Words are composed of phonetic clusters: syllables.
Each syllable has a nucleus; typically the nucleus is a vowel or diphthong, sometimes a syllabic nasal or lateral (button, bottle) or a rhotacized (r-colored) vowel (bird).
The nucleus is a syllabic nasal or lateral only when it follows an alveolar consonant in the previous syllable of the word.
Syllable boundaries are sometimes ambiguous:
  “beefeater”: beef/eater or bee/feater (Hunt, ICSLP04)
  “dolphin”: dol/phin or dolph/in (Wells, 1990)
  “tender”: ten/der or tend/er (Wells, 1990)
A syllable can be broken into components:
  syllable contains {onset, rhyme}
  rhyme contains {nucleus, coda}
  onset and coda are consonants; the nucleus is a vowel, syllabic nasal, or syllabic lateral
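The {onset, rhyme {nucleus, coda}} decomposition can be sketched as a small data structure. This is a minimal illustration, not part of the lecture: the `NUCLEI` set, the `Syllable` class, and `split_syllable` are hypothetical names, the phone labels follow the ones used on the slides, and syllabic nasals and laterals are not handled because they cannot be told apart from ordinary /n/ and /l/ by label alone.

```python
from dataclasses import dataclass
from typing import List

# Assumed nucleus inventory (vowels and diphthongs, slide-style labels).
NUCLEI = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw",
          "er", "ay", "aw", "ow", "oy", "ey"}


@dataclass
class Syllable:
    onset: List[str]   # consonants before the nucleus
    nucleus: str       # the vowel or diphthong
    coda: List[str]    # consonants after the nucleus

    @property
    def rhyme(self) -> List[str]:
        # rhyme = nucleus + coda
        return [self.nucleus] + self.coda


def split_syllable(phones: List[str]) -> Syllable:
    """Split one syllable's phones at its first nucleus."""
    for i, p in enumerate(phones):
        if p in NUCLEI:
            return Syllable(list(phones[:i]), p, list(phones[i + 1:]))
    raise ValueError("no nucleus found")


streak = split_syllable(["s", "t", "r", "iy", "k"])
print(streak.onset, streak.nucleus, streak.coda, streak.rhyme)
```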
3 Syllables
Limitations on consonant clusters: not all CCC combinations are possible in syllable-initial position. Of those that are possible, almost half are very rare.
[Figure: graphic from http://www.arts.uwa.edu.au/LingWWW/LIN101-102]
  /s k l/: very few English words or roots, e.g. “sclerosis”
  /s p y/: possibly only one word in English, “spew”
  /s t y/: only a few English words, (optionally) pronounced with this cluster: “Stewart”, “steward”, “stew”
  /s k y/: very few English words: “skew”, “askew”, “obscure”
4 Syllables
Sonority corresponds roughly to the degree of constriction along the vocal and/or nasal tract.
Ordering of sonority, from most to least sonorous: vowels, glides (/w/, /y/), liquids (/l/, /r/), nasals, fricatives, affricates, plosives.
In a binary classification (sonorant/non-sonorant), the sonorants are all vowels, glides, liquids, and nasals. Fricatives, affricates, and plosives may be grouped into one category, “obstruents,” for purposes of sonority.
Syllabification can be done according to the “sonority principle”: sonority must rise and then fall within a syllable.
There is also the Maximal Onset Principle: “Put a consonant in the onset rather than the coda when possible.”
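As an illustration of the Maximal Onset Principle, the sketch below assigns each intervocalic consonant cluster's longest legal suffix to the following onset. It is only a sketch under stated assumptions: `VOWELS` and especially `LEGAL_ONSETS` are tiny hypothetical inventories chosen for the examples, not a full description of English, and `syllabify` is an illustrative name.

```python
# Assumed vowel inventory (slide-style labels).
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw",
          "er", "ay", "aw", "ow", "oy", "ey"}

# Toy set of legal onsets (assumption); the empty onset is always legal.
LEGAL_ONSETS = {(), ("t",), ("d",), ("f",), ("n",), ("s",), ("r",), ("b",),
                ("s", "t"), ("t", "r"), ("s", "t", "r")}


def syllabify(phones):
    """Split a word into syllables, maximizing each non-initial onset."""
    nuclei = [i for i, p in enumerate(phones) if p in VOWELS]
    if not nuclei:
        return [list(phones)]
    boundaries = [0]
    for left, right in zip(nuclei, nuclei[1:]):
        cluster = phones[left + 1:right]
        # Try the longest suffix of the cluster first; shorter ones follow.
        for k in range(len(cluster) + 1):
            if tuple(cluster[k:]) in LEGAL_ONSETS:
                boundaries.append(left + 1 + k)
                break
    boundaries.append(len(phones))
    return [list(phones[a:b]) for a, b in zip(boundaries, boundaries[1:])]


print(syllabify(["t", "eh", "n", "d", "er"]))        # "tender"
print(syllabify(["b", "iy", "f", "iy", "t", "er"]))  # "beefeater"
```

With this toy inventory, "tender" comes out as ten/der because /n d/ is not listed as a legal onset while /d/ is; a different onset inventory would give different boundaries.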
5 Syllables
Because sonority rises and falls within a syllable, the following restrictions apply:
  (a) a glide (/w/, /y/) must be immediately adjacent to a vowel,
  (b) /r/ is the next closest consonant to the vowel,
  (c) /l/ is the next closest consonant to the vowel,
  (d) a nasal is the next closest,
  (e) an obstruent is farthest from the vowel (but there may be more than one obstruent in the onset or coda).
Obstruents in a cluster must have the same voicing.
In a series of obstruents between two vowels, voicing can change only once, at the syllable boundary.
English allows up to 3 consonants in syllable-initial position and up to 4 consonants in syllable-final position.
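Restrictions (a) through (e) and the voicing constraint can be checked mechanically. The sketch below is an illustration under assumptions: the numeric ranks, the phone sets, and the function names are hypothetical, obstruents are treated as a single sonority class (so several may be adjacent), and the check covers sonority and voicing only, not every English phonotactic restriction.

```python
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw",
          "er", "ay", "aw", "ow", "oy", "ey"}
GLIDES = {"w", "y"}
NASALS = {"m", "n", "ng"}
VOICED_OBSTRUENTS = {"b", "d", "g", "v", "dh", "z", "zh", "jh"}
VOICELESS_OBSTRUENTS = {"p", "t", "k", "f", "th", "s", "sh", "ch", "h"}


def rank(phone):
    """Assumed sonority rank: higher means closer to the vowel."""
    if phone in VOWELS:
        return 6
    if phone in GLIDES:
        return 5
    if phone == "r":
        return 4
    if phone == "l":
        return 3
    if phone in NASALS:
        return 2
    return 1  # all obstruents share one class


def sonority_ok(syllable):
    """True if sonority rises to the nucleus and falls afterwards."""
    r = [rank(p) for p in syllable]
    peak = r.index(max(r))

    def step_ok(far, near):
        # 'far' is farther from the nucleus; obstruents may repeat.
        return far < near or (far == near == 1)

    onset = all(step_ok(r[i], r[i + 1]) for i in range(peak))
    coda = all(step_ok(r[i + 1], r[i]) for i in range(peak, len(r) - 1))
    return onset and coda


def voicing_ok(cluster):
    """True if all obstruents in the cluster agree in voicing."""
    voiced = any(p in VOICED_OBSTRUENTS for p in cluster)
    voiceless = any(p in VOICELESS_OBSTRUENTS for p in cluster)
    return not (voiced and voiceless)


print(sonority_ok(["s", "t", "r", "iy", "k"]))  # streak
print(sonority_ok(["k", "aa", "l", "r"]))       # /l/ closer to vowel than /r/
print(voicing_ok(["k", "s", "t", "s"]))         # coda of "texts"
```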
6 Syllables
Examples: sphere /s f iy r/, streak /s t r iy k/, texts /t eh k s t s/, helms /h eh l m z/; but not /s t l iy/, /s p w iy/, /z b r ay/, etc.
The ordering of glides and liquids doesn’t matter for our purposes (syllabification), because glides and liquids cannot occur sequentially within the same syllable in English. (However, two liquids in the same syllable are possible, e.g. “Carl”, as long as /r/ is closer to the vowel than /l/.)
In English, some burst-fricative pairs are represented as single phonemes (/ch/, /jh/), although in other cases the burst and the fricative remain distinct phonemes (e.g. “tsunami,” “bishops”, “six”).
It’s also possible to have two or more adjacent fricatives: “eleven twelfths” (note the 4 consonants after the final vowel).
7 Vowel Neutralization
When speech is uttered very quickly (or is not well enunciated), the formants tend to shift toward those of a neutral vowel.
[Figures: from Daniloff, p. 320, and from van Bergem 1993, p. 8]
8 Vowel Neutralization
Target undershoot: /m ih pc ph ih eh/
9 Vowel Neutralization
Target undershoot: /ih/ extracted and concatenated from “mip”:
10 Vowel Neutralization
However, neutralization is not always so simple; sometimes vowel formants shift away from the neutral position, depending on their context, and vowels tend toward slightly different neutral “targets”.
Neutralization is to some extent an artifact of averaging over speakers and contexts (van Bergem 1993).
[Figure: vowels from one speaker in different phonetic contexts, and in “reduced” and “isolated” speaking conditions]
11 Coarticulation
Coarticulation is the “blending” of adjacent speech sounds, due to gradual movement of the articulators.
Coarticulation makes automatic speech recognition and text-to-speech synthesis difficult, but humans use coarticulation to conserve effort while speaking and to provide robustness during recognition.
There is Right-to-Left (RL) or “anticipatory” coarticulation and Left-to-Right (LR) or “carry-over” coarticulation.
Models of coarticulation and syllabification:
  Locus Theory
  Modified Locus Theory (Klatt)
  Öhman’s Theory
  Kozhevnikov-Chistovich (KC) Theory
  Wickelgren’s Theory, etc.
12 Coarticulation
RL coarticulation occurs due to high-level planning of phonetic sequences:
  “spoon”:                 [ s   p   uw  n ]
  rounding in isolation:     –   –   +   –
  rounding in context:       +   +   +   +
It is more observable if neighboring sounds are not specified with respect to the potentially coarticulated feature; e.g. /s/, /p/, /n/ are not specified with respect to lip rounding.
(from Daniloff, pp. 323-324)
13 Coarticulation: Locus Theory
Locus Theory (Delattre, Liberman, and Cooper, 1955):
“there are, for each consonant, characteristic frequency positions, or loci, at which the formant transitions begin, or to which they may be assumed to point. On this basis, the transitions may be regarded simply as movements of the formants from their respective loci to the frequency levels appropriate for the next phone … The spectrographic patterns …, which produce /d/ before /iy/, /aa/, and /ow/, show how … these transitions seem to be pointing to a [F2] locus in the vicinity of 1800 [Hz].”
Each consonant has “target frequencies” independent of the neighboring vowels.
Formants transition from these target frequencies to the vowel target frequencies.
14 Coarticulation: Locus Theory
Locus Theory:
Consonants and vowels both have “targets” of articulator positions and therefore of formant frequency locations.
Given sufficient duration of a syllable, all phonemes reach their targets.
The slope of the formants during a transition from a consonant to a vowel is relatively constant until the target is reached.
If the syllable duration doesn’t allow enough time for the formants to reach their targets, “target undershoot” occurs and the formants change direction before fully realizing the intended vowel.
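The relation between constant transition slope, duration, and target undershoot can be made concrete with a small sketch. Everything numeric here is an assumption for illustration only: the 1800 Hz locus echoes the /d/ example quoted from Delattre et al., while the vowel target, the slope, and the function name `formant_at_vowel` are hypothetical.

```python
def formant_at_vowel(locus_hz, target_hz, slope_hz_per_ms, duration_ms):
    """Formant value reached after moving from the consonant locus toward
    the vowel target at a constant slope for the available duration."""
    distance = target_hz - locus_hz
    # The formant cannot travel farther than the target itself.
    travelled = min(abs(distance), slope_hz_per_ms * duration_ms)
    return locus_hz + travelled if distance >= 0 else locus_hz - travelled


# Assumed values: F2 locus 1800 Hz, vowel F2 target 2300 Hz, slope 10 Hz/ms.
long_syllable = formant_at_vowel(1800, 2300, 10, 100)   # enough time
short_syllable = formant_at_vowel(1800, 2300, 10, 20)   # too little time
print(long_syllable, short_syllable)
print("undershoot (Hz):", 2300 - short_syllable)
```

In the short case the formant stops short of the vowel target, which is the undershoot described above; the sketch ignores the subsequent change of direction toward the next consonant.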
15 Coarticulation: Locus Theory
[Figure: Locus Theory (from Klatt 1987, p. 753)]
16 Coarticulation: Modified Locus Theory
Problems with Locus Theory:
A transition may have both rapid and slow components: rapid release of the obstruction via the tongue tip, followed by slow movement of the tongue body.
The preceding vowel can influence the F2 onset of a CV transition (Öhman, 1966).
F2 may be insensitive to oral constrictions (obstruents) if the tongue position is toward the front of the mouth, as in /iy/ (as reported by Fant 1973, Klatt 1987).
17 Coarticulation: Modified Locus Theory
Modified Locus Theory:
Klatt hypothesized that the main effects of the vowel on the articulation of consonants are front/back position and lip rounding.
Vowels are divided into three sets: {+front}, {+round}, {–front, –round} (because there are no rounded front vowels in English, sets 1 and 2 are mutually exclusive):
  {+front}            /iy ih eh ae/
  {+round}            /uw ao ow er/
  {–front, –round}    /uh ah aa aw/
F_onset is predicted from F_target for these 3 classes (locus theory).
Achieved 95% intelligibility for CVC nonsense syllables.
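The three-way vowel classification and the idea of predicting a formant onset from the vowel target can be sketched as follows. The class membership is taken from the slide; the linear form F_onset = F_locus + k (F_target − F_locus) is one common way to write a locus equation and is an assumption here, as are the function names and the example numbers, which are placeholders rather than Klatt's published values.

```python
# Vowel sets as listed on the slide.
VOWEL_CLASS = {}
for v in ("iy", "ih", "eh", "ae"):
    VOWEL_CLASS[v] = "+front"
for v in ("uw", "ao", "ow", "er"):
    VOWEL_CLASS[v] = "+round"
for v in ("uh", "ah", "aa", "aw"):
    VOWEL_CLASS[v] = "-front,-round"


def vowel_class(vowel):
    """Return the vowel's coarticulation class."""
    return VOWEL_CLASS[vowel]


def predict_onset(locus_hz, target_hz, k):
    """Assumed linear locus equation: k = 0 gives a fixed locus,
    k = 1 gives an onset equal to the vowel target."""
    return locus_hz + k * (target_hz - locus_hz)


# Placeholder (locus, k) pair per class for one consonant; not real data.
params = {"+front": (1800.0, 0.5),
          "+round": (1700.0, 0.4),
          "-front,-round": (1700.0, 0.3)}

locus, k = params[vowel_class("iy")]
print(predict_onset(locus, 2300.0, k))
```

The point of the per-class parameters is that a single consonant no longer needs one locus for all vowels; each vowel class selects its own locus and slope.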
18 Coarticulation: Locus Theory
Modified Locus Theory:
[Figure from Klatt 1987, p. 754. Legend: × = –front, –round; ° = +front; remaining marker = +round]
19 Coarticulation: Öhman’s Theory
Öhman (1966) found that the loci of consonants are NOT independent of neighboring vowels, and that for /g/ more than one locus is required.
Conclusion: consonant “gestures” are superimposed on vowel “gestures” that are present during the consonant; even while the consonant is being uttered in a VCV sequence, both vowels have an effect on the consonant.
20 Coarticulation: Öhman’s Theory
Öhman (1967) proposed a model of coarticulation based on vocal-tract shape evolving over time. It assumes that vocal-tract shapes can be mapped to formant frequencies.
For VCV utterances:
  s(x,t) = v(x,t) + k(t) [c(x) − v(x,t)] w_c(x)
where s(x,t) is the vocal tract shape at position x and time t, v(x,t) is the vocal tract shape at position x for a given vowel as it varies over time from vowel 1 to vowel 2, c(x) is the vocal tract shape of the consonant, k(t) is an interpolation value (from 0 to 1), and w_c(x) describes the degree to which c(x) “resists” coarticulation.
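A numeric sketch may help show how the terms interact. It assumes the usual statement of Öhman's model, s(x,t) = v(x,t) + k(t) [c(x) − v(x,t)] w_c(x), evaluated at a single time instant over a few sampled positions x. The shape values, the linear vowel interpolation, and the function names are hypothetical.

```python
def interpolate_vowels(v1, v2, a):
    """Vowel shape part-way from vowel 1 (a = 0) to vowel 2 (a = 1);
    linear interpolation is an assumption made for this sketch."""
    return [(1 - a) * x1 + a * x2 for x1, x2 in zip(v1, v2)]


def ohman_shape(v, c, w_c, k):
    """s(x) = v(x) + k * (c(x) - v(x)) * w_c(x) at one time instant."""
    return [vi + k * (ci - vi) * wi for vi, ci, wi in zip(v, c, w_c)]


# Hypothetical shape values at three positions along the vocal tract.
v = [1.0, 2.0, 3.0]      # current vowel shape
c = [0.0, 0.0, 0.0]      # consonant target shape
w_c = [1.0, 0.5, 0.0]    # 1 = consonant fully imposed, 0 = vowel shows through

print(ohman_shape(v, c, w_c, 0.0))  # k = 0: pure vowel shape
print(ohman_shape(v, c, w_c, 1.0))  # k = 1: consonant where w_c is high
```

Where w_c(x) is 0 the vowel shape persists even at full consonant closure (k = 1), which is how the model lets the surrounding vowels show through the consonant.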
21 Coarticulation: Kozhevnikov-Chistovich (KC) Theory
(1) Syllabification using the C^n V pattern: CV, CCV, CCCV, …
  phrase “give true answers”:
    /g ih/   /v t r uw/   /ae/   /n s er/   /z/
      S1         S2        S3       S4       S5
(2) Measured relative durations of words, “syllables”, and vowels:
  relative duration of vowel    = D_vow / D_syll
  relative duration of syllable = D_syll / D_word
  relative duration of word     = D_word / D_phrase
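The C^n V grouping is simple enough to sketch directly: consonants accumulate until a vowel closes the unit, and any trailing consonants form a final unit of their own. The vowel inventory and the function names below are assumptions; the durations in the usage lines are made-up numbers for illustration.

```python
# Assumed vowel inventory (slide-style labels).
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw",
          "er", "ay", "aw", "ow", "oy", "ey"}


def cnv_units(phones):
    """Group a phone sequence into C^n V units, ignoring word boundaries."""
    units, current = [], []
    for p in phones:
        current.append(p)
        if p in VOWELS:          # a vowel closes the current unit
            units.append(current)
            current = []
    if current:                  # trailing consonants with no vowel
        units.append(current)
    return units


def relative_duration(part, whole):
    """Ratio such as D_vow / D_syll, D_syll / D_word, or D_word / D_phrase."""
    return part / whole


print(cnv_units("g ih v t r uw ae n s er z".split()))
print(relative_duration(50, 200))  # hypothetical 50 ms vowel in 200 ms unit
```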
22 Coarticulation: Kozhevnikov-Chistovich (KC) Theory
They measured articulatory effects of the vowel on consonants.
They found coarticulation within a syllable but not across syllables: C1 V1 C2 C3 V2
Articulatory gestures for the consonant(s) and vowel begin nearly simultaneously with the onset of the initial consonant in the syllable.
Example: lip rounding for /uw/ begins with /v/ in “give true answers”, but nasalization of /ae/ does not occur.
Focused only on RL (anticipatory) coarticulation, the effect of V on the previous C.
Assumes motor programming of speech is discontinuous at the VC boundary.
Counter-examples show LR coarticulation (Moll and Daniloff 1971; Kent, Carney, and Severeid 1974; Öhman 1966).
23 Coarticulation: Wickelgren’s Theory
Speech units are mentally coded as context-sensitive units: in the phonetic string /X Y Z/, Y is encoded as xYz (Y subscripted with its left context X and its right context Z).
“By assuming (context-sensitive) allophones to be the basic unit of articulation, … it is trivial to account for how the ‘same phoneme’ in different phonemic environments can be … different in some respects at all levels of the speech process” (Wickelgren 1969, p. 11)
However, coarticulation can spread over more than one phone (up to a distance of seven phones).
Other criticisms: MacNeilage 1970, Whitaker 1970, Halwes and Jenkins 1971; “Allophonic richness may only beget strategic poverty” (Kent and Minifie 1977)
However, Wickelgren’s is the only model currently used in ASR and concatenative text-to-speech (exceptions: Wouters 2001, Wrede 2001).
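Context-sensitive coding of this kind corresponds to what ASR systems usually call triphones. A minimal sketch, assuming a "#" symbol for utterance boundaries and a tuple representation (left, phone, right), both of which are choices made for this illustration:

```python
def context_sensitive_units(phones, boundary="#"):
    """Encode each phone Y in /X Y Z/ as the triple (X, Y, Z)."""
    padded = [boundary] + list(phones) + [boundary]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]


for unit in context_sensitive_units(["s", "p", "uw", "n"]):  # "spoon"
    print(unit)
```

Each unit sees only one neighbor on each side, which illustrates the criticism above: coarticulation that spreads across several phones (such as the lip rounding in "spoon" reaching /s/) is not captured by a one-phone context window.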
24 Coarticulation: Gay’s Theory
Gay, 1977: The syllabic unit of motor organization is the CV unit.
Based on X-ray motion pictures of VCV utterances:
  anticipatory tongue movements for V2 in a V1 C V2 sequence don’t begin until closure of C has been attained
  movement toward V2 occurs during closure of C, having a large effect on the position and shape of the tongue during release of the closure
  V1 has little effect on the position of the tongue at the moment of closure
Supports KC theory; conflicts with Öhman’s findings.
25 Coarticulation
Other models: MacNeilage, Henke, Benguerel and Cowan, Moll and Daniloff, Liberman, Tatham, etc.
Some are “feature based” in that each phonetic segment is assigned distinctive features which can then be modified in regular ways.
Some are “hierarchical models”, with several levels of organization and complex interaction between levels.
However, “coarticulatory patterns are not explained adequately by any … theories or models” (Kent and Minifie, 1977)
Conflicting evidence (Öhman and Kent & Moll vs. KC and Gay)