Structure of Spoken Language

CSE 551: Structure of Spoken Language
Lecture 9: Syllable Structure, Vowel Neutralization, and Coarticulation
John-Paul Hosom, Fall 2004

NOTE There’s a tutorial on the web that allows you to hear the effect of different formant values: You can enter start time, end time, amplitude, and formant values for beginning, middle and end of a “syllable”, then generate a waveform and hear the result.

Syllables
Words are composed of phonetic clusters called syllables.
Each syllable has a nucleus; typically the nucleus is a vowel or diphthong, sometimes a syllabic nasal or lateral (button, bottle) or retroflex (bird).
The nucleus is a syllabic nasal or lateral only when it follows an alveolar consonant in the previous syllable of a word.
Syllable boundaries are sometimes ambiguous:
  “tasty”: tas/ty, tast/y, ta/sty
  “bottling”: bott/l/ing, bott/ling
A syllable can be broken into components:
  syllable contains {onset, rhyme}
  rhyme contains {nucleus, coda}
  onset and coda are consonants; the nucleus is typically a vowel.

Syllables
Limitations on consonant clusters: not all CCC combinations are possible in syllable-initial position.
[Graphic omitted]

Syllables
Sonority corresponds roughly to the degree of constriction along the vocal and/or nasal tract.
Ordering of sonority, from most to least sonorous: vowels, glides (/w/, /y/), liquids (/l/, /r/), nasals, fricatives, affricates, stops.
Fricatives, affricates, and stops may be clustered into one category, “obstruents,” for purposes of sonority.
Syllabification can be done according to the “sonority principle”: the sonority must rise and then fall within a syllable.
There is also the Maximal Onset Principle: “Put a consonant in the onset rather than the coda when possible.”
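The sonority principle can be sketched as a simple check on a candidate syllable. This is a minimal illustration, not part of the lecture: the numeric ranks, the phone inventory, and the function name `obeys_sonority_principle` are assumptions chosen to match the ordering above, with all obstruents merged into one class. It tests only the rise and fall of sonority, so it is a necessary condition rather than a full model of English phonotactics.

```python
# Sonority ranks follow the ordering in the lecture; the numbers themselves
# are an assumption made for this sketch. Obstruents share one rank.
SONORITY = {}
for rank, phones in [
    (4, "iy ih eh ae aa ah ao uh uw ow er ay aw oy ey"),  # vowels
    (3, "w y"),                                            # glides
    (2, "l r"),                                            # liquids
    (1, "m n ng"),                                         # nasals
    (0, "p b t d k g f v s z sh zh th dh ch jh h"),        # obstruents
]:
    for phone in phones.split():
        SONORITY[phone] = rank


def obeys_sonority_principle(phones):
    """Return True if sonority never falls before the peak or rises after it."""
    ranks = [SONORITY[p] for p in phones]
    peak = ranks.index(max(ranks))
    # Non-strict comparisons allow clusters of obstruents such as /s t/.
    rising = all(a <= b for a, b in zip(ranks[:peak], ranks[1:peak + 1]))
    falling = all(a >= b for a, b in zip(ranks[peak:], ranks[peak + 1:]))
    return rising and falling


print(obeys_sonority_principle("s t r iy k".split()))  # "streak"
print(obeys_sonority_principle("r t iy k".split()))    # liquid before stop in onset
```

A sequence can pass this check and still be disallowed in English for other reasons, so the function should be read as a filter, not as a syllabifier.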

Syllables
Because of the rise and fall of sonority in syllables, the following restrictions occur:
  (a) a glide (/w/, /y/) must be immediately adjacent to a vowel,
  (b) /r/ and then /l/ are the next closest consonant(s) to the vowel,
  (c) a nasal is next closest,
  (d) an obstruent is farthest from the vowel (but there may be more than one obstruent in the onset or coda).
Obstruents in a cluster must have the same voicing.
In a series of obstruents between two vowels, voicing can change only once, at the syllable boundary.
English allows up to 3 consonants in syllable-initial position and 4 consonants in syllable-final position.
Examples: sphere /s f iy r/, streak /s t r iy k/, texts /t eh k s t s/, helms /h eh l m z/, but not /t l iy/ or /p w iy/.

Syllables
The ordering of glides and liquids doesn’t matter for our purposes (applying to syllabification), because glides and liquids cannot occur sequentially within the same syllable.
Fricatives and stops can occur in any order within a syllable, e.g. “Senator Paul Tsongas”, “tsunami”, “stone” (although in English most burst-fricative pairs are represented as distinct phonemes (/ch/, /jh/)).

Vowel Neutralization
When speech is uttered very quickly (or is not well enunciated), the formants tend to shift toward those of a neutral vowel.
[Figures omitted (from Daniloff, p. 320; from van Bergem 1993, p. 8)]

Vowel Neutralization
Target undershoot: /m ih pc ph ih eh/ [figure omitted]
Target undershoot: /ih/ extracted and concatenated from “mip” [figure omitted]

Vowel Neutralization
However, neutralization is not always so simple; sometimes vowel formants shift away from the neutral position, depending on their context, and vowels tend toward slightly different neutral “targets”.
Neutralization is to some extent an artifact of averaging over speakers and contexts (van Bergem 1993).

Coarticulation
Coarticulation is the “blending” of adjacent speech sounds, due to gradual movement of the articulators.
Coarticulation makes automatic speech recognition and text-to-speech synthesis difficult, but humans use coarticulation to conserve effort while speaking and to provide robustness during recognition.
There is Right-to-Left (RL) or “anticipatory” coarticulation and Left-to-Right (LR) or “carry-over” coarticulation.
Models of coarticulation and syllabification:
  Locus Theory
  Modified Locus Theory (Klatt)
  Öhman’s Theory
  Kozhevnikov-Chistovich (KC) Theory
  Wickelgren’s Theory, etc.

RL coarticulation occurs due to high-level planning of phonetic sequences:
  “spoon”:                [s  p  uw  n]
  rounding in isolation:   –  –  +   –
  rounding in context:
RL coarticulation is more observable if neighboring sounds are not specified with respect to the potentially coarticulated feature; e.g. /s/, /p/, /n/ are not specified with respect to lip rounding.
(from Daniloff, pp )

Coarticulation: Locus Theory
Locus Theory (Delattre, Liberman, and Cooper, 1955):
“there are, for each consonant, characteristic frequency positions, or loci, at which the formant transitions begin, or to which they may be assumed to point. On this basis, the transitions may be regarded simply as movements of the formants from their respective loci to the frequency levels appropriate for the next phone … The spectrographic patterns …, which produce /d/ before /iy/, /aa/, and /ow/, show how … these transitions seem to be pointing to a [F2] locus in the vicinity of 1800 [Hz].”
  Each consonant has “target frequencies” independent of the neighboring vowels.
  Formants transition from these target frequencies to the vowel target frequencies.

Coarticulation: Locus Theory
Consonants and vowels both have “targets” of articulator positions and therefore formant frequency locations.
Given sufficient duration of a syllable, all phonemes reach their targets.
The slope of the formants during a transition from a consonant to a vowel is relatively constant until reaching the target.
If the syllable duration doesn’t allow enough time for the formants to reach their targets, “target undershoot” occurs and the formants change direction before fully realizing the intended vowel.
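Target undershoot under a constant transition slope can be illustrated numerically. The sketch below is an assumption-laden toy, not the lecture’s model: the function name `formant_reached`, the slope, and the durations are hypothetical, and only the 1800 Hz F2 locus for /d/ comes from the quotation above. It computes how far a formant gets from the consonant locus toward the vowel target in the time available.

```python
def formant_reached(locus_hz, target_hz, slope_hz_per_ms, duration_ms):
    """Formant value reached when moving from locus toward target.

    The formant moves at a constant slope and stops at the target if there
    is enough time; otherwise it falls short (target undershoot).
    """
    distance = target_hz - locus_hz
    max_travel = slope_hz_per_ms * duration_ms
    if max_travel >= abs(distance):
        return float(target_hz)          # target reached
    step = max_travel if distance > 0 else -max_travel
    return float(locus_hz + step)        # undershoot


# Hypothetical numbers: F2 locus 1800 Hz, vowel target 2200 Hz, 5 Hz/ms slope.
print(formant_reached(1800, 2200, 5, 100))  # long transition
print(formant_reached(1800, 2200, 5, 40))   # short transition
```

With the long transition the target is reached; with the short one the formant stops partway, which is the undershoot case described above.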

Coarticulation: Locus Theory
[Figure omitted: Locus Theory (from Klatt 1987, p. 753)]

Coarticulation: Modified Locus Theory
Problems with Locus Theory:
  A transition may have both rapid and slow components: rapid release of the obstruction via the tongue tip, followed by slow movement of the tongue body.
  The preceding vowel can influence the F2 onset of a CV transition (Öhman, 1966).
  F2 may be insensitive to oral constrictions (obstruents) if the tongue position is toward the front of the mouth, as in /iy/ (as reported by Fant 1973, Klatt 1987).

Coarticulation: Modified Locus Theory
Klatt hypothesized that the main effects of the vowel on the articulation of consonants are front/back position and lip rounding.
Vowels are divided into three sets: {+front}, {+round}, {–front, –round} (because there are no rounded front vowels in English, sets 1 and 2 are mutually exclusive):
  {+front}          /iy ih eh ae/
  {+round}          /uw ao ow er/
  {–front, –round}  /uh ah aa aw/
Predicted Fonset from Ftarget for these 3 classes (locus theory).
Achieved 95% intelligibility for CVC nonsense syllables.
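One way to picture a per-class prediction of Fonset from Ftarget is a straight line for each vowel class. The sketch below assumes the linear form Fonset = Flocus + k (Ftarget − Flocus); that form, the name `predict_f2_onset`, and every number in `F2_LINE` are assumptions for illustration and are not taken from the slide or from Klatt’s tables. Only the three vowel sets come from the lecture.

```python
# Vowel classes as listed in the lecture.
VOWEL_CLASS = {}
for name, vowels in [
    ("front", "iy ih eh ae"),
    ("round", "uw ao ow er"),
    ("neither", "uh ah aa aw"),
]:
    for v in vowels.split():
        VOWEL_CLASS[v] = name

# HYPOTHETICAL (locus_hz, k) pairs for F2 after a single consonant.
# These are placeholders, not measured or published values.
F2_LINE = {
    "front": (2000.0, 0.5),
    "round": (1500.0, 0.5),
    "neither": (1800.0, 0.5),
}


def predict_f2_onset(vowel, f2_target_hz, lines=F2_LINE):
    """Predict F2 at consonant release from the vowel's F2 target.

    Assumed linear form: onset = locus + k * (target - locus), with one
    (locus, k) pair per vowel class.
    """
    locus, k = lines[VOWEL_CLASS[vowel]]
    return locus + k * (f2_target_hz - locus)


print(predict_f2_onset("iy", 2200.0))
print(predict_f2_onset("aa", 1200.0))
```

Keeping a separate (locus, k) pair per class lets the same consonant start its transition from a different place depending on whether the vowel is front, rounded, or neither.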

Coarticulation: Modified Locus Theory
[Figure omitted: Modified Locus Theory. Symbols: × = –front, –round; ° = +front; • = +round (from Klatt 1987, p. 754)]

Coarticulation: Öhman’s Theory
Öhman (1965) found that the loci of consonants are NOT independent of neighboring vowels, and that for /g/ more than one locus is required.
Conclusion: consonant “gestures” are superimposed on vowel “gestures” that are present during the consonant; even while the consonant is being uttered in a VCV sequence, both vowels have an effect on C.

Coarticulation: Öhman’s Theory
Öhman (1966) proposed a model of coarticulation based on vocal-tract shape evolving over time. The model assumes that vocal-tract shapes can be mapped to formant frequencies.
For VCV utterances:
  s(x,t) = v(x) + k(t) [c(x) − v(x)] w_c(x)
where s(x,t) is the vocal tract shape at position x and time t, v(x) is the vocal tract shape at position x for a given vowel, c(x) is the vocal tract shape of the consonant, k(t) is an interpolation value (from 0 to 1), and w_c(x) describes the degree to which c(x) “resists” coarticulation.
v(x) describes the shape of the vocal tract, which may be a combination of two vowels if V1 ≠ V2 (v(x) will then vary over time from V1 to V2).
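The interpolation can be tried on a toy vocal tract. The sketch assumes the usual statement of Öhman’s model, s(x,t) = v(x) + k(t) [c(x) − v(x)] w_c(x), evaluated at a single instant; the function name `ohman_shape` and all shape values are hypothetical, and each list index stands for one position x along the tract.

```python
def ohman_shape(v, c, w_c, k):
    """Vocal-tract shape at one instant, position by position.

    v   : vowel shape v(x), one value per position x
    c   : consonant shape c(x)
    w_c : coarticulation resistance w_c(x), each in [0, 1]
    k   : interpolation value k(t) in [0, 1] at this instant
    """
    if not (len(v) == len(c) == len(w_c)):
        raise ValueError("v, c and w_c must have the same length")
    return [vi + k * wi * (ci - vi) for vi, ci, wi in zip(v, c, w_c)]


# Hypothetical three-point tract: the consonant constricts the front only.
vowel = [1.0, 2.0, 3.0]
consonant = [0.0, 0.5, 3.0]
resistance = [1.0, 0.5, 0.0]   # front fully controlled by C, back left to V

print(ohman_shape(vowel, consonant, resistance, 0.0))  # pure vowel
print(ohman_shape(vowel, consonant, resistance, 1.0))  # consonant at its peak
```

Where w_c(x) is 0 the vowel shape shows through unchanged even at the peak of the consonant, which is how the model expresses a vowel gesture continuing during the consonant.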

Coarticulation: Kozhevnikov-Chistovich (KC) Theory
(1) Syllabification using the CnV pattern: CV, CCV, CCCV, …
  phrase “give true answers”:
    g ih | v t r uw | ae | n s er | z
     S1  |    S2    | S3 |   S4   | S5
(2) Measured relative durations of words, “syllables”, and vowels:
  relative duration of vowel = Dvow / Dsyll
  relative duration of syllable = Dsyll / Dword
  relative duration of word = Dword / Dphrase
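The CnV segmentation can be sketched as a single pass that closes a unit at each vowel. This is an illustrative sketch only: the function name `cnv_syllables` and the vowel inventory are assumptions, and trailing consonants with no following vowel are kept as a final unit, as with /z/ in the example phrase.

```python
# Assumed vowel inventory (including the retroflex vowel /er/).
VOWELS = set("iy ih eh ae aa ah ao uh uw ow er ay aw oy ey".split())


def cnv_syllables(phones):
    """Split a phone sequence into CnV units: zero or more consonants + one vowel."""
    units, current = [], []
    for phone in phones:
        current.append(phone)
        if phone in VOWELS:
            units.append(current)   # a vowel closes the current unit
            current = []
    if current:                     # leftover consonants at the end of the phrase
        units.append(current)
    return units


for unit in cnv_syllables("g ih v t r uw ae n s er z".split()):
    print(" ".join(unit))
```

Note that these units cut across word boundaries (the /v/ of “give” joins “true”), which is what distinguishes them from conventional syllables.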

Coarticulation: Kozhevnikov-Chistovich (KC) Theory
Found coarticulation within a syllable but not across syllables: C1 V1 C2 C3 V2
Articulatory gestures for the consonant(s) and vowel begin nearly simultaneously with the onset of the initial consonant in the syllable.
Example: lip rounding for /uw/ begins with /v/ in “give true answers”, but nasalization of /ae/ does not occur.
The theory assumes little or no LR coarticulation.
It assumes that motor programming of speech is discontinuous at the VC boundary.
Counter-examples showing LR coarticulation: Moll and Daniloff 1971; Kent, Carney, and Severeid 1974; Öhman 1966.

Coarticulation: Wickelgren’s Theory
Speech units are mentally coded as context-sensitive units: in the phonetic string /X Y Z/, Y is encoded as xYz, that is, Y together with its left neighbor X and right neighbor Z.
“By assuming (context-sensitive) allophones to be the basic unit of articulation, … it is trivial to account for how the ‘same phoneme’ in different phonemic environments can be … different in some respects at all levels of the speech process” (Wickelgren 1969, p. 11)
However, coarticulation can spread over more than one phone (up to a distance of seven phones).
Other criticisms: MacNeilage 1970, Whitaker 1970, Halwes and Jenkins 1971; “Allophonic richness may only beget strategic poverty” (Kent and Minifie 1977)
However, Wickelgren’s is the only model currently used in ASR and concatenative text-to-speech (exceptions: Wouters 2001, Wrede 2001).
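Context-sensitive coding of a phone string can be sketched as follows. The function name `context_sensitive_units` and the `#` boundary symbol are assumptions made for this example; the idea of labeling each phone with its immediate left and right neighbors is the one described above.

```python
def context_sensitive_units(phones, boundary="#"):
    """Return (left, phone, right) triples for every phone in the sequence.

    The boundary symbol stands in for the missing neighbor at the start
    and end of the utterance (an assumption of this sketch).
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


for left, phone, right in context_sensitive_units("s p uw n".split()):
    print(f"{left}-{phone}+{right}")
```

Each unit sees only one phone on either side, so effects that spread across several phones, such as the rounding in “spoon” reaching back to /s/, fall outside what a single unit encodes.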

Coarticulation: Gay’s Theory
Gay, 1977: The syllabic unit of motor organization is the CV unit.
Based on X-ray motion pictures of VCV utterances:
  anticipatory tongue movements for V2 in a V1CV2 sequence don’t begin until closure of C has been attained
  movement toward V2 occurs during closure of C, having a large effect on the position and shape of the tongue during release of the closure
  V1 has little effect on the position of the tongue at the moment of closure
Supports KC theory; conflicts with Öhman’s findings.

Coarticulation
Other models: MacNeilage, Henke, Benguerel and Cowan, Moll and Daniloff, Liberman, Tatham, etc.
Some are “feature based” in that each phonetic segment is assigned distinctive features which can then be modified in regular ways.
Some are “hierarchical models”, with several levels of organization and complex interaction between levels.
However, “coarticulatory patterns are not explained adequately by any … theories or models” (Kent and Minifie, 1977).
Conflicting evidence (Öhman and Kent & Moll vs. KC and Gay).
