1
Morphological information and acoustic salience in Dutch compounds
Victor Kuperman, IWTS, Radboud University Nijmegen
2
Introduction
Kuperman, Pluymaekers, Ernestus, Baayen (in preparation)
Goal: Role of morphological structure in modulating fine phonetic detail in speech production.
Object: Interfixes in Dutch compounds.
3
Background: Theoretical Framework
Economy of articulatory effort versus discriminability of the speech signal (H&H Theory, Lindblom 1990).
The distribution of acoustic salience over an utterance depends on the distribution of information:
Less predictable (more informative) elements are more salient;
More predictable (less informative) elements are more reduced.
4
Background: Theoretical Framework
Information transmission is optimal when information is distributed equally (per time unit) throughout the signal.
Important elements need longer or more careful transmission, so that they are less likely to be lost to noise.
Acoustic duration smooths the amount of information in the signal over time (Aylett and Turk, 2004).
5
Background: Theoretical Framework
Research on reduction in a large variety of language domains: syntactic, discourse-related, phonological and prosodic, and lexical.
Attested types of reduction:
mostly, durational shortening of phonemes and syllables;
deletion of phonemes and syllables (Ernestus, 2000; Johnson, 2004; Jurafsky et al., 2001);
decrease in the spectral centre of gravity (Van Son and Pols, 2003);
decrease in the mean amplitude (Shields and Balota, 1991);
lesser degree of centralization of vowels (Wright, 1997);
higher degree of coarticulation (Scarborough, 2004).
6
Aims
We test whether the acoustic duration of interfixes contributes to smoothing of morphological information over the signal.
The information-theoretical approach to acoustic salience was tested against two datasets with interfixed compounds.
Control variables range from morphological to phonological to lexical tiers.
7
Background: Morphological Predictability
Interfixes in Dutch compounds:
Interfix -s-: oorlog-s-verklaring
Interfix -e(n)-: dier-en-arts
No interfix (zero): oog-arts
Selection of the interfix is not predictable by deterministic rules, but depends on the morphological families of the left and right constituents of the compound (Krott et al., 2001).
8
Background: Morphological Predictability
Left/Right Constituent Families: sets of compounds that share the left/right constituent with the target.
Left constituent family of the compound "appartement-en-complex":
appartement-en-complex
appartement-en-gebouw
appartement-s-gebouw
9
Background: Morphological Predictability
Selection of an interfix in a compound is biased towards:
the most frequent interfix in the Left constituent family;
the most frequent interfix in the Right constituent family (to a lesser extent).
10
Methodology: Acoustic Materials
Two datasets collected in the Read Speech component of the Spoken Dutch Corpus:
1155 tokens with the interfix -s-. Excluded environments: [s], [z], [ʃ].
742 tokens with the interfix -e(n)-. Excluded environments: [n], [m].
Interfixes were manually transcribed by two phoneticians.
Acoustic durations for each segment in the datasets were obtained with the help of an HMM-based ASR system built on the HTK software package.
11
Methodology: Variables
Dependent variable: (log-transformed) acoustic duration of the interfix.
Independent variable: the bias of the left constituent family towards -s- (SBias) or -en- (EnBias), for the respective datasets.
Example left constituent family:
appartement-s-gebouw
appartement-en-gebouw
appartement-en-complex
SBias in this left constituent family is 1/(1+2) = 0.33.
EnBias in this left constituent family is 2/(1+2) = 0.67.
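The bias measure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `interfix_bias` and the (compound, interfix) data layout are hypothetical, and the sketch counts compound types, as the slide's worked example does. Whether the study weights family members by token frequency is not stated here.

```python
from collections import Counter

def interfix_bias(family, interfix):
    """Share of compound types in a constituent family that take `interfix`.

    `family` is a list of (compound, interfix) pairs; this layout is an
    assumption made for illustration, not the authors' data format.
    """
    counts = Counter(used for _, used in family)
    return counts[interfix] / len(family)

# Left constituent family of "appartement", as listed on the slide.
family = [
    ("appartement-s-gebouw", "s"),
    ("appartement-en-gebouw", "en"),
    ("appartement-en-complex", "en"),
]

s_bias = interfix_bias(family, "s")    # 1 of 3 types
en_bias = interfix_bias(family, "en")  # 2 of 3 types
print(round(s_bias, 2), round(en_bias, 2))
```

By construction the two biases sum to 1 in a family that contains only -s- and -en- compounds: SBias is 1/3 (about 0.33) and EnBias is 2/3 (about 0.67).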
12
Methodology: Control Variables
Morphological variables:
Positional entropy of the constituent families:
Number of members in the family;
Average information load of the family.
13
Methodology: Control Variables
Compound word frequency; constituent frequencies
Frequency of word co-occurrence with its neighbors
Segmental lexical information (Van Son and Pols, 2003)
Speech rate
Number of segments after the interfix
Position in the utterance (initial/final)
Presence of [n] in the interfix (for -e(n)- dataset)
Phoneme identity: [s] vs. [z] (for -s- dataset)
FollowedByStop (for -e(n)- dataset)
Stress on the interfix syllable (for -s- dataset)
Stress clash
Speaker's sex, language, age.
14
Results: /s/-dataset
Predictor                  Estimate
SBias                       0.345***
RightPositionalEntropy      0.068***
SBias*RightPosEntropy      -0.069***
WordFrequency               0.010*
Lexical_Information         0.117***
PhonemeZ                   -0.156***
SpeechRate                 -0.511***
Stress                     -0.087***
R² = 0.104
Unique contribution of morpholexical factors = 2.0%
15
Discussion: /s/-dataset
Predictor                  Estimate
SBias                       0.345***
RightPositionalEntropy      0.068***
SBias*RightPosEntropy      -0.069***
WordFrequency               0.010*
The direction of these effects runs counter to the predictions of information theory.
High values of these predictors imply a high likelihood of the interfix, so the acoustic duration of the interfix should be reduced, not lengthened.
16
Results: /en/-dataset
Predictor                  Estimate
EnBias                      0.140***
RightPositionalEntropy      0.082***
Lexical_Information         0.070***
PresenceN                   0.707***
SpeechRate                 -0.036***
FollowedByStop              0.234***
R² = 0.720
Unique contribution of morpholexical factors = 2.3%
Again, higher probability (less information) is associated here with acoustic lengthening rather than reduction.
17
General Discussion
Fine phonetic detail is governed by two orthogonal dimensions: syntagmatic and paradigmatic.
Syntagmatic perspective:
Information-theoretical approaches treat information as the probability of an element in the context of its phonetic, lexical or syntactic neighbors.
The syntagmatic measures presume that the elements and their sequence in a produced unit (syllable, word, clause) are known with certainty.
18
General Discussion: Syntagmatic Perspective
Example: Segmental lexical information load (Van Son and Pols, 2003): the contribution of a segment to word disambiguation given the preceding word fragment.
Target word: boo-k
$\frac{\text{Frequency}(\text{boo-k}\ldots)}{\text{Frequency}(\text{boo-k},\ \text{boo-t},\ \text{boo-ze}\ldots)}$
Less probable = better disambiguation = longer realization.
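A minimal Python sketch of this ratio follows. The word counts in `toy_lexicon` are invented purely for illustration and come from no corpus. Taking the negative base-2 logarithm of the ratio is an assumption (the usual surprisal form); the slide itself shows only the frequency ratio. The function name `segment_information` is likewise hypothetical.

```python
import math

def segment_information(freqs, prefix, segment):
    """Surprisal (in bits) of `segment` given the preceding fragment `prefix`.

    Numerator: summed frequency of words starting with prefix + segment.
    Denominator: summed frequency of all words starting with prefix.
    The -log2 transform is an assumption; the slide gives only the ratio.
    """
    with_prefix = sum(f for w, f in freqs.items() if w.startswith(prefix))
    with_segment = sum(f for w, f in freqs.items() if w.startswith(prefix + segment))
    return -math.log2(with_segment / with_prefix)

# Hypothetical toy counts, for illustration only.
toy_lexicon = {"book": 60, "boot": 30, "booze": 10}

info_k = segment_information(toy_lexicon, "boo", "k")  # frequent continuation
info_z = segment_information(toy_lexicon, "boo", "z")  # rare continuation
print(info_k < info_z)
```

With these toy counts the rarer continuation carries more information, which on the syntagmatic account predicts a longer realization.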
19
General Discussion: Paradigmatic Perspective
Selection of the interfix is a pocket of indeterminacy: choice is probabilistic, not deterministic.
The indeterminacy is resolved by paradigmatic support in constituent families.
Morpholexical variables determine the strength of the support for the alternatives available to the speaker.
Greater paradigmatic support implies a more confident selection and a more salient acoustic realization.
Lack of support leads to acoustic reduction.
20
Conclusions
Morphological information does affect the acoustic duration of interfixes, but in a direction not predicted by information theory.
Interfixes in Dutch compounds form pockets of indeterminacy where the selection is driven by the strength of paradigmatic support.
The most likely alternative (the one with the most support) is realized with greater acoustic salience and is not reduced.
22
Methodology: Control Variables
Morphological variables:
Positional entropy of the constituent families:
Number of members in the family;
Average information load of the family.
$H = -\sum_{x} p(x) \log_2 p(x)$
The left family frequency of appartement is 8 + 5 = 13.
The relative frequencies of family members in this family are: 8/13 = 0.62 for appartementsgebouw, 5/13 = 0.38 for appartementengebouw.
The left positional entropy of appartementengebouw equals $-(0.62 \log_2 0.62 + 0.38 \log_2 0.38) = 0.96$ bit.
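The entropy calculation above can be checked with a short Python sketch. The function name `positional_entropy` is hypothetical; the frequencies 8 and 5 are the ones given on the slide for appartementsgebouw and appartementengebouw.

```python
import math

def positional_entropy(frequencies):
    """Shannon entropy (in bits) of the relative frequencies of family members."""
    total = sum(frequencies)
    return -sum((f / total) * math.log2(f / total) for f in frequencies if f > 0)

# Frequencies from the slide: appartementsgebouw = 8, appartementengebouw = 5.
h = positional_entropy([8, 5])
print(round(h, 2))  # 0.96
```

A family whose members are equally frequent reaches the maximum entropy for its size (1 bit for two members), while a family dominated by one member approaches 0.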