Morphological information and acoustic salience in Dutch compounds. Victor Kuperman, IWTS, Radboud University Nijmegen

Introduction Kuperman, Pluymaekers, Ernestus & Baayen (in preparation). Goal: the role of morphological structure in modulating fine phonetic detail in speech production. Object: interfixes in Dutch compounds.

Background: Theoretical Framework Economy of articulatory effort versus discriminability of the speech signal (H&H Theory, Lindblom 1990); Distribution of acoustic salience over an utterance depends on the distribution of information; Less predictable (more informative) elements are more salient; More predictable (less informative) elements are more reduced.

Background: Theoretical Framework Information transmission is optimal when information is distributed equally (per time unit) throughout the signal. Important elements need longer or more careful transmission: less likely to be lost to noise. Acoustic duration smoothes the amount of information in the signal over time (Aylett and Turk, 2004).

Background: Theoretical Framework Research on reduction spans a large variety of language domains: syntactic, discourse-related, phonological and prosodic, and lexical. Attested types of reduction: mostly durational shortening of phonemes and syllables; deletion of phonemes and syllables (Ernestus, 2000; Johnson, 2004; Jurafsky et al., 2001); decrease in the spectral centre of gravity (Van Son and Pols, 2003); decrease in the mean amplitude (Shields and Balota, 1991); lesser degree of centralization of vowels (Wright, 1997); and higher degree of coarticulation (Scarborough, 2004).

Aims We test whether the acoustic duration of interfixes contributes to smoothing morphological information over the signal. The information-theoretical approach to acoustic salience is validated against two datasets of interfixed compounds. Control variables range over the morphological, phonological, and lexical tiers.

Background: Morphological Predictability Interfixes in Dutch compounds: interfix -s-: oorlog-s-verklaring; interfix -e(n)-: dier-en-arts; no interfix (zero): oog-arts. Selection of the interfix is not predictable by deterministic rules, but depends on morphological families of the left/right constituents of the compound (Krott et al., 2001).

Background: Morphological Predictability Left/Right Constituent Families: sets of compounds that share the left/right constituent with the target. Left constituent family of the compound "appartement-en-complex": appartement-en-complex, appartement-en-gebouw, appartement-s-gebouw.

Background: Morphological Predictability Selection of an interfix in a compound is biased towards the most frequent interfix in the left constituent family and, to a lesser extent, the most frequent interfix in the right constituent family.

Methodology: Acoustic Materials Two datasets collected in the Read Speech component of the Spoken Dutch Corpus: 1155 tokens with the interfix -s- (excluded environments: [s], [z], [ʃ]) and 742 tokens with the interfix -e(n)- (excluded environments: [n], [m]). Interfixes were manually transcribed by two phoneticians. Acoustic durations for each segment in the datasets were obtained with an HMM-based automatic speech recognizer built with the HTK software package.
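
Purely as an illustration (not the authors' pipeline), the exclusion of ambiguous environments and the log-transformation of segment durations could be sketched as below; the token records and field names are invented, and durations are assumed to come from the forced alignment described above.

```python
from math import log

# Invented token records; in the study, durations came from HTK-based forced alignment.
tokens = [
    {"compound": "oorlogsverklaring", "interfix": "s", "adjacent_phone": "v", "duration_ms": 92.0},
    {"compound": "staatssecretaris",  "interfix": "s", "adjacent_phone": "s", "duration_ms": 130.0},
]

# Segments adjacent to the interfix that the slide lists as excluded environments ("S" = SAMPA for [ʃ]).
EXCLUDED = {"s": {"s", "z", "S"}, "en": {"n", "m"}}

def keep_token(tok):
    """Drop tokens whose neighboring segment makes the interfix hard to delimit acoustically."""
    return tok["adjacent_phone"] not in EXCLUDED[tok["interfix"]]

kept = [dict(tok, log_duration=log(tok["duration_ms"])) for tok in tokens if keep_token(tok)]
print([t["compound"] for t in kept])  # ['oorlogsverklaring']: the second token abuts [s] and is excluded
```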

Methodology: Variables Dependent variable: (log-transformed) acoustic duration of the interfix. Independent variable: the bias of the left constituent family towards -s- (SBias) or -e(n)- (EnBias), for the respective datasets. Example family: appartement-s-gebouw, appartement-en-gebouw, appartement-en-complex. SBias in this left constituent family is 1/(1+2) = 0.33; EnBias is 2/(1+2) = 0.66.
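
As a minimal sketch (not the authors' code), the bias measures can be computed from the counts of interfixes in a constituent family; the three-member family above is used for illustration.

```python
from collections import Counter

def interfix_bias(family_interfixes, target_interfix):
    """Proportion of compounds in a constituent family that take the target interfix."""
    counts = Counter(family_interfixes)
    return counts[target_interfix] / sum(counts.values())

# Left constituent family of "appartement" from the slide above (one label per family member):
family = ["s", "en", "en"]  # appartement-s-gebouw, appartement-en-gebouw, appartement-en-complex

print(interfix_bias(family, "s"))   # SBias  = 1/3 ≈ 0.33
print(interfix_bias(family, "en"))  # EnBias = 2/3 ≈ 0.67
```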

Methodology: Control Variables Morphological variables: Positional entropy of the constituent families: Number of members in the family; Average information load of the family

Methodology: Control Variables Compound word frequency; constituent frequencies; frequency of co-occurrence of the word with its neighbors; segmental lexical information (Van Son and Pols, 2003); speech rate; number of segments after the interfix; position in the utterance (initial/final); presence of [n] in the interfix (for the -e(n)- dataset); phoneme identity, [s] vs. [z] (for the -s- dataset); FollowedByStop (for the -e(n)- dataset); stress on the interfix syllable (for the -s- dataset); stress clash; speaker's sex, language, and age.

Results: /s/-dataset
SBias: 0.345***
RightPositionalEntropy: 0.068***
SBias * RightPosEntropy: -0.069***
WordFrequency: 0.010*
Lexical_Information: 0.117***
PhonemeZ: -0.156***
SpeechRate: -0.511***
Stress: -0.087***
R² =
Unique contribution of morpholexical factors = 2.0%
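
The slides do not name the statistical software used; as a hedged sketch only, a comparable linear model with the same interaction term could be fit in Python with statsmodels (the file name and column names are assumptions, not the original data).

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical table with one row per -s- token; column names are invented for illustration.
df = pd.read_csv("s_dataset.csv")

# "SBias * RightPosEntropy" expands to both main effects plus their interaction,
# mirroring the three morpholexical terms in the coefficient table above.
model = smf.ols(
    "log_duration ~ SBias * RightPosEntropy + WordFrequency"
    " + LexicalInformation + PhonemeZ + SpeechRate + Stress",
    data=df,
).fit()
print(model.summary())
```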

Discussion: /s/-dataset SBias: 0.345***; RightPositionalEntropy: 0.068***; SBias * RightPosEntropy: -0.069***; WordFrequency: 0.010*. The direction of these effects runs counter to the predictions of the information-theoretical account: high values of these predictors imply a high likelihood of the interfix, so the acoustic duration of the interfix should be reduced, not lengthened.

Results: /en/-dataset
EnBias: 0.140***
RightPositionalEntropy: 0.082***
Lexical_Information: 0.070***
PresenceN: 0.707***
SpeechRate: -0.036***
FollowedByStop: 0.234***
R² =
Unique contribution of morpholexical factors = 2.3%
Again, higher probability (less information) is associated here with acoustic lengthening rather than reduction.

General Discussion Fine phonetic detail is governed by two orthogonal dimensions: syntagmatic and paradigmatic. Syntagmatic perspective: information-theoretical approaches consider information as the probability of an element in the context of its phonetic, lexical, or syntactic neighbors. The syntagmatic measures presume that the elements and their sequence in a produced unit (syllable, word, clause) are known with certainty.

General Discussion: Syntagmatic Perspective Example: segmental lexical information load (Van Son and Pols, 2003), the contribution of a segment to word disambiguation given the preceding word fragment. Target word: boo-k. The measure is the ratio Frequency(boo-k…) / Frequency(boo-k, boo-t, boo-ze…). Less probable = better disambiguation = longer realization.
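
A toy sketch of this ratio follows; the word list, its frequencies, and the use of orthographic rather than phonemic fragments are all illustrative assumptions, and taking -log2 of the ratio (a standard convention, not stated on the slide) expresses the contribution in bits.

```python
from math import log2

def segmental_information(lexicon, fragment, segment):
    """Bits of information contributed by `segment` given the preceding `fragment`.
    `lexicon` maps word -> corpus frequency."""
    cohort = sum(f for w, f in lexicon.items() if w.startswith(fragment))
    continuation = sum(f for w, f in lexicon.items() if w.startswith(fragment + segment))
    return -log2(continuation / cohort)

# Toy lexicon with invented frequencies:
lexicon = {"book": 500, "boot": 200, "booze": 50}
print(segmental_information(lexicon, "boo", "k"))  # less probable continuation = more bits = longer realization
```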

General Discussion: Paradigmatic Perspective Selection of the interfix is a pocket of indeterminacy: the choice is probabilistic, not deterministic. The indeterminacy is resolved by paradigmatic support in the constituent families. Morpholexical variables determine the strength of the support for the alternatives available to the speaker. Greater paradigmatic support implies a more confident selection and a more salient acoustic realization; lack of support leads to acoustic reduction.

Conclusions Morphological information does affect the acoustic duration of interfixes, but in a direction not predicted by information theory. Interfixes in Dutch compounds form pockets of indeterminacy where selection is driven by the strength of paradigmatic support. The maximally likely alternative (the one with the most support) is realized with greater acoustic salience and is not reduced.

Methodology: Control Variables Morphological variables: positional entropy of the constituent families: number of members in the family; average information load of the family. H = -Σ p(x) * log2 p(x). The left family frequency of appartement is 8 + 5 = 13. The relative frequencies of family members in this family are 8/13 = 0.62 for appartementsgebouw and 5/13 = 0.38 for appartementengebouw. The left positional entropy of appartementengebouw equals -(0.62 * log2 0.62 + 0.38 * log2 0.38) = 0.96 bit.
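
A minimal sketch reproducing this worked example (the two member counts are taken from the slide).

```python
from math import log2

def positional_entropy(frequencies):
    """Shannon entropy (in bits) over the relative frequencies of a constituent family."""
    total = sum(frequencies)
    return -sum((f / total) * log2(f / total) for f in frequencies if f > 0)

# Left constituent family of "appartement": appartementsgebouw (8), appartementengebouw (5)
print(positional_entropy([8, 5]))  # ~0.96 bit, as in the worked example above
```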