Vowel alternation in the pronunciation of THE in American English: A corpus study using logistic regression
Yao



BACKGROUND
How do you say the word THE?
  [dh ah], with a schwa
  [dh iy], with a high front tense vowel
What is the rule for vowel alternation?
  Canonical rule: [dh iy] / _ [+vowel]; [dh ah] / otherwise
  Other stories?
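The canonical rule can be written as a one-line decision. This is a minimal sketch: the function name and the "V"/"C" labels for the following segment are assumptions made for illustration, not part of the original slides.

```python
def canonical_the(following_segment: str) -> str:
    """Pronunciation of 'the' predicted by the canonical rule alone.

    following_segment: "V" if the next word starts with a vowel,
    anything else otherwise (hypothetical labels).
    """
    # [dh iy] / _ [+vowel]; [dh ah] / otherwise
    return "dh iy" if following_segment == "V" else "dh ah"


print(canonical_the("V"))  # next word starts with a vowel, e.g. "the apple"
print(canonical_the("C"))  # next word starts with a consonant, e.g. "the book"
```

The rule has only two outcomes, so it cannot by itself produce a third vowel variant.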

BACKGROUND
Age (Keating et al., 1994)
  TIMIT corpus of read speech in English
  Age-dependent pronunciation:
  Younger speakers have a higher probability of using vowels other than [iy] in "the" before a vowel.
  No speakers above 50 years of age use vowels other than [iy] before vowels.

BACKGROUND
Disfluency (Fox Tree & Clark, 1997)
  More [dh iy] (81%) than [dh ah] (7%) before a suspension of speech.
Ongoing sound change
  Age
  Gender? Social class? Dialect?
Online speech production
  Planning problem
  Speech rate?

DATA
Buckeye corpus
  40 speakers, all residents of Columbus, Ohio
  Balanced in age and gender
  1-hr interview
  Transcribed at word and phone level
Dataset
  All tokens of "the" from all speakers

PRELIMINARY COUNTS
8132 instances of "the"
172 different phonetic transcriptions
The 10 most common pronunciations cover 84.19% of the tokens
Most common syllable structures: CV (N=7003); V (N=913); C (N=164)
Most common vowels: [ah] (N=4426); [ih] (N=1808); [iy] (N=1130)
At least three vowel variants, instead of two!
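The counts above can be turned into shares of the 8132 tokens. The sketch below only does that arithmetic; the variable and function names are illustrative.

```python
TOTAL_TOKENS = 8132  # all instances of "the" reported on the slide

syllable_counts = {"CV": 7003, "V": 913, "C": 164}
vowel_counts = {"ah": 4426, "ih": 1808, "iy": 1130}


def shares(counts, total):
    """Convert raw counts to percentages of the total token count."""
    return {label: 100.0 * n / total for label, n in counts.items()}


syllable_shares = shares(syllable_counts, TOTAL_TOKENS)
vowel_shares = shares(vowel_counts, TOTAL_TOKENS)

for label, pct in vowel_shares.items():
    print(f"[{label}]: {pct:.2f}% of all tokens")
```

The three listed vowels sum to 7364 tokens, so the remaining tokens carry some other vowel or none.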

PRELIMINARY ANALYSIS
[Figure: vowel name and duration, for [ə], [ɪ], [i]]

PRELIMINARY ANALYSIS
[Figure: general vowel alternation pattern by following segment]

STUDY DESIGN
Use logistic regression to model the alternation among the three vowels ([ah], [ih], [iy]).
Predictor variables include
  phonological factor: following segment
  speaker characteristics: age, gender
  contextual features: disfluency, speech rate
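For one pairwise contrast, for example [iy] against [ah], a binary logistic regression with these predictors would take the form below. The symbol names are assumptions introduced here, and only main effects are shown; interaction terms would enter as products of the predictors.

```latex
\log \frac{P(\text{iy} \mid x)}{P(\text{ah} \mid x)}
  = \beta_0
  + \beta_{\text{seg}}\, x_{\text{seg}}
  + \beta_{\text{age}}\, x_{\text{age}}
  + \beta_{\text{gen}}\, x_{\text{gen}}
  + \beta_{\text{dis}}\, x_{\text{dis}}
  + \beta_{\text{spd}}\, x_{\text{spd}}
```

Here x_seg, x_age, x_gen and x_dis are indicator variables for following segment, age group, gender and disfluency, and x_spd is speech rate in syllables per second.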

CODING VARIABLES
Vowel variant (outcome variable)
  ah: [ə]
  ih: [ɪ]
  iy: [i]
Following segment
  C: Consonant
  V: Vowel
  U: Non-linguistic
Age
  Y: Young (<40 yr)
  O: Old (>=40 yr)

CODING VARIABLES (CONT'D)
Gender
  F: Female
  M: Male
Following disfluency
  D: Disfluent
    Pause
    Filled pause (um, uh, you know)
    Repetition (the)
    Hesitation, cutoff, extended pronunciation
  F: Fluent otherwise

CODING VARIABLES (CONT'D)
Preceding disfluency
  D: Disfluent (similar to following disfluency)
  F: Fluent
Speed
  Average speed of the pause-bounded stretch (in number of syllables per second)
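The coding scheme can be sketched as a small record type plus helper functions. All names below (`TheToken`, `code_age`, `code_following_segment`, `speech_rate`) are hypothetical; only the category labels and the 40-year cutoff come from the slides.

```python
from dataclasses import dataclass


def code_age(age_years: float) -> str:
    """Y: young (<40 yr); O: old (>=40 yr)."""
    return "Y" if age_years < 40 else "O"


def code_following_segment(kind: str) -> str:
    """C: consonant; V: vowel; U: non-linguistic sound."""
    return {"consonant": "C", "vowel": "V", "nonlinguistic": "U"}[kind]


def speech_rate(n_syllables: int, duration_s: float) -> float:
    """Average speed of a pause-bounded stretch, in syllables per second."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return n_syllables / duration_s


@dataclass
class TheToken:
    vowel: str                 # outcome: "ah", "ih" or "iy"
    following_segment: str     # "C", "V" or "U"
    age: str                   # "Y" or "O"
    gender: str                # "F" or "M"
    following_disfluency: str  # "D" or "F"
    preceding_disfluency: str  # "D" or "F"
    speed: float               # syllables per second


# Made-up example token: a 35-year-old woman, fluent context,
# 12 syllables in a 2.5-second pause-bounded stretch.
token = TheToken(
    vowel="ih",
    following_segment=code_following_segment("consonant"),
    age=code_age(35),
    gender="F",
    following_disfluency="F",
    preceding_disfluency="F",
    speed=speech_rate(12, 2.5),
)
print(token)
```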

SIMPLEST MODEL
[ah] vs. [iy]
Exclude cases followed by non-linguistic sounds; [...] cases remain.
Predictor variables
  Block 1: following segment
  Block 2: age, gender, and their interaction with following segment
  Block 3: speed, presence of disfluency, and their interaction with other variables
Method: forward stepwise (conditional)

SIMPLEST MODEL (CONT'D)
Results
  Following segment is the most significant predictor. Percentage of correct predictions: 80.3% → 90.6%
  Following disfluency is also significant.
  No other factor or interaction appears significant.
Temporary conclusion
  Old and young, male and female speakers respect the canonical phonological rule equally well.
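To illustrate how a "percentage of correct predictions" can rise once following segment enters the model, here is a minimal sketch. It rests on several assumptions: the counts are invented toy data (not Buckeye data), the baseline is taken to be a majority-class guess, and the model is fitted by plain gradient ascent rather than the forward stepwise procedure named on the slide.

```python
import math

# Hypothetical toy counts: (following_is_vowel, vowel_is_iy, n_tokens)
cells = [(1, 1, 16), (1, 0, 4), (0, 1, 8), (0, 0, 72)]
data = [(x, y) for x, y, n in cells for _ in range(n)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit(data, lr=0.5, steps=5000):
    """Fit P(iy) = sigmoid(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(data)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in data:
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def percent_correct(data, predict) -> float:
    hits = sum(1 for x, y in data if predict(x) == y)
    return 100.0 * hits / len(data)


b0, b1 = fit(data)

# Baseline: always guess the more frequent outcome, ignoring the predictor.
majority = 1 if 2 * sum(y for _, y in data) > len(data) else 0
baseline = percent_correct(data, lambda x: majority)

# Model: classify as [iy] when the fitted probability exceeds 0.5.
model = percent_correct(data, lambda x: 1 if sigmoid(b0 + b1 * x) > 0.5 else 0)

print(f"baseline: {baseline:.1f}%  with following segment: {model:.1f}%")
```

A positive `b1` means a following vowel raises the odds of [iy], which is the direction the canonical rule predicts.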

ABOUT [IH]
Some basic facts
  Women produce [ih] more often than men (28.2% vs. 21.3%)
  Young people produce [ih] more often than older people (23.3% vs. 26.1%)
  The majority are followed by consonants (84.5%).
Are these also the factors that would favor [ih] over [ah] or [iy]?

A TAD MORE COMPLICATED: [IH] VS. [IY]
Exclude cases followed by non-linguistic sounds; [...] cases remain.
Same independent variables as the previous model
Results
  Following segment is the most significant condition (correct predictions: 62.8% → 80.7%)
  Following disfluency is also significant (80.7% → 81.4%)
  Other significant factors: gender, gender X following segment, speed X following segment

[IH] VS. [AH]
Exclude cases followed by non-linguistic sounds; [...] cases remain.
Same independent variables as the previous model
Results
  Following segment is still significant, but its effect is reduced (correct predictions: 70.8% → 71.5%)
  Other significant factors: gender X following segment, age, age X gender, following disfluency

TEMPORARY CONCLUSIONS
The most important factor is following segment, but its effect is weakest in the ah/ih model.
The presence of following disfluency also affects vowel alternation consistently, and the effect is strongest in the iy/ih alternation.

EFFECT OF FOLLOWING DISFLUENCY IN IH/IY COMPARISON
Speaker characteristics (age, gender) and speech rate fail to enter the model for the ah/iy distinction, but do show up in the other two models, which involve the [ih] vowel. In particular, the interaction of gender and following segment appears in both models.

MOVING ON TO CASES FOLLOWED BY NON-LINGUISTIC SOUNDS
[ah] vs. [iy]
Same model, but with all cases (N=5556)
Significant factors
  Block 1: Following segment (79.7% → 89.0%)
  Block 2: Age X following segment, age, age X gender.
  Block 3: Following disfluency, speed and their interaction. Speed X following segment. (89.0% → 89.3%)

MOVING ON TO CASES FOLLOWED BY NON-LINGUISTIC SOUNDS
[ih] vs. [iy]
Same model, but with all cases (N=2938)
Significant factors
  Block 1: Following segment (61.5% → 78.1%)
  Block 2: Age, gender, age X following segment, gender X following segment. (78.1% → 79.1%)
  Block 3: Following disfluency, speed and their interaction. (79.1% → 80.7%)

MOVING ON TO CASES FOLLOWED BY NON-LINGUISTIC SOUNDS
[ah] vs. [ih]
Same model, but with all cases (N=6234)
Significant factors
  Block 1: Following segment (71.0% → 71.6%)
  Block 2: Age, gender, age X gender. (71.6% → 71.7%)
  Block 3: Following disfluency X speed.

TEMPORARY CONCLUSIONS
When all cases are included (followed by consonant, vowel, or non-linguistic sounds):
  Speaker characteristics enter the models, even the one for the ah/iy distinction.
  Following disfluency and speed continue to contribute in all models.
  The ah/ih distinction is still the hardest to model.
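The claim that ah/ih is the hardest distinction to model can be checked against the Block 1 figures reported for the all-cases models. The sketch below simply subtracts the percentages given on the slides; the variable names are illustrative.

```python
# Percentage of correct predictions before and after Block 1
# (following segment), all-cases models, as reported on the slides.
block1 = {
    "ah/iy": (79.7, 89.0),
    "ih/iy": (61.5, 78.1),
    "ah/ih": (71.0, 71.6),
}

# Gain in percentage points contributed by following segment.
gains = {pair: round(after - before, 1) for pair, (before, after) in block1.items()}
weakest = min(gains, key=gains.get)

for pair, gain in gains.items():
    print(f"{pair}: +{gain} points")
print("smallest gain:", weakest)
```

Following segment improves prediction by well under one percentage point for ah/ih, compared with several points for the two contrasts involving [iy].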

EFFECT OF GENDER

EFFECT OF AGE

GENERAL DISCUSSION
Ongoing sound change? Yes…
  The new pronunciation [dh ih]: a variant form of [dh ah]? Speaker characteristics at play?
  What about elongated [dh ah]? A variant form of [dh iy]? Vowel alternation → duration alternation?
Disfluency and speech rate affecting the pronunciation? Yes…
  Following (un)filled pauses and repetition
  Preceding disfluency has no effect

NEXT STEP
Examine the phonetic makeup of the vowels
  Move from modeling the vowel name distinction to modeling continuous variables, such as formants and durations
Include more speaker variables
  More specific age variable
  Social class?
Include more contextual measures
  More types of disfluency
  Contextual predictability?

THANKS!
Questions and comments are more than welcome…

REFERENCES
Fox Tree, J.E., Clark, H.H. (1997). Pronouncing "the" as "thee" to signal problems in speaking. Cognition, 62.
Keating, P., MacEachern, M., Shryock, A., Dominguez, S. (1994). A manual for phonetic transcription: Segmentation and labeling of words in spontaneous speech. Manual written for the Linguistic Data Consortium, UCLA Working Papers in Phonetics 88.
Pitt, M.A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume, E., Fosler-Lussier, E. (2007). Buckeye Corpus of Conversational Speech (2nd release). Columbus, OH: Department of Psychology, Ohio State University (Distributor).