Turn-taking in Mandarin Dialogue: Interactions of Tone and Intonation Gina-Anne Levow University of Chicago October 14, 2005.


Roadmap
  Motivation
    –Enabling fluent conversation
  Data Collection and Processing
  Acoustic Analysis of Turn-taking
    –Tone and Intonation
  Recognizing Boundaries and Interruptions
  Conclusions and Future Work

Turn-taking in Dialogue
  Goal: Enable fluent conversation
    –Turn-taking is collaborative (Duncan 1974)
      Requires producing and understanding cues
    –Crucial for dialogue agents and understanding
      End-pointing in spoken dialogue systems
      Confusion of barge-in and backchannel

Challenges
  Silence not sufficient or necessary
    –Dialogue involves overlap
      Overlaps are not arbitrary (Ward et al., 2000)
  Proposed cues:
    –Multimodal: Gesture, Gaze
      Not always available
    –Prosodic
      Attested in English, Japanese
      Tone languages?

Approach
  Identify significant differences in
    –Pitch, intensity between initial/final positions
    –Intensity for different transition types
    –Pitch, intensity of interruptions vs smooth
    –Assess interaction of tone and intonation
  Exploit contrasts for recognition of
    –Turn unit boundaries: ~93%
    –Interruptions: 62%

Data Collection
  Taiwanese Putonghua Corpus
    –5 spontaneous dialogues, ~20 minutes each
    –7 female, 3 male speakers
    –Manually transcribed and word segmented
    –Turn beginnings and overlaps
      Manually labelled and time-stamped

Data Processing
  Automatic forced alignment
    –CU Sonic (Pellom et al.) language porting
      Dictionary-based, manual pinyin-ARPABET mapping
    –Yields phone, syllable, word, silence duration, position
  Acoustic analysis
    –Pitch, Intensity: Praat (Boersma, 2001)
    –Per-side log-scaled z-score normalized
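
The "per-side log-scaled z-score" normalization can be sketched roughly as follows. This is only an illustration under assumptions: the slides do not say whether the log is applied to both pitch and intensity, so the example uses pitch only, and the function name, the data layout, and the numeric values are hypothetical.

```python
import math
from statistics import mean, pstdev

def normalize_per_side(values_by_speaker):
    """Log-scale, then z-score, each speaker's values independently.

    Non-positive values (e.g. unvoiced frames reported as 0 Hz) are skipped,
    which is an assumption made for this sketch.
    """
    normalized = {}
    for speaker, values in values_by_speaker.items():
        logs = [math.log(v) for v in values if v > 0]
        mu = mean(logs)
        sigma = pstdev(logs)
        # Guard against a constant track, where the spread is zero.
        normalized[speaker] = [(x - mu) / sigma if sigma > 0 else 0.0 for x in logs]
    return normalized

# Hypothetical pitch values in Hz for the two sides of one dialogue.
pitch = {"S1": [180.0, 220.0, 200.0], "S2": [110.0, 130.0, 95.0]}
for speaker, scores in normalize_per_side(pitch).items():
    print(speaker, [round(s, 2) for s in scores])
```

Normalizing each side separately puts speakers with different pitch ranges on a comparable scale before any contrasts are computed.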

Turn Unit Types
  “Smooth”
    –Turn unit not ended by overlap & speaker change
  “Rough”
    –Turn unit ended by overlap & speaker change
  “Inter”
    –New speaker takes floor with overlap
  [Diagram with speaker labels S1, S2]

Turn Unit Types: counts by start and end position
                 Start: -Overlap or -Spkr Change   Start: +Overlap, +Spkr Change
  End: -Overlap  Smooth: 1413                      Intersmooth: 289
  End: +Overlap  Rough: 407                        Interrough: 68
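
The four-way labelling in this table can be expressed as a small function. This is a sketch only: the function and argument names are hypothetical, and the two boolean flags are assumed to come from the manual overlap and speaker-change annotations.

```python
def turn_unit_type(starts_with_overlap, ends_with_overlap):
    """Map the start/end conditions of a turn unit to its label.

    Each flag is assumed to mean "overlap plus speaker change" at that edge.
    """
    if starts_with_overlap and ends_with_overlap:
        return "Interrough"
    if starts_with_overlap:
        return "Intersmooth"
    if ends_with_overlap:
        return "Rough"
    return "Smooth"

# Counts as reported in the table.
COUNTS = {"Smooth": 1413, "Intersmooth": 289, "Rough": 407, "Interrough": 68}
total = sum(COUNTS.values())
for label, n in COUNTS.items():
    print(f"{label}: {n} ({n / total:.1%})")
```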

Turn Unit Initial-Final Contrasts

Turn Unit Boundary Contrasts
  Unit initial versus final syllables
    –Pitch significantly lower in final than initial
    –Intensity significantly lower in final than initial
      Across all transition types
  Rough versus smooth transitions
    –Final syllables
      Intensity significantly higher in rough

Characterizing Interruptions
  Contrast first syllable of “inter” vs “smooth”
    –Pitch significantly higher in interruptions
    –Intensity significantly higher in interruptions
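
The slides do not state which significance test was used for these contrasts. As one possible way to compare two groups of normalized values, the sketch below computes Welch's t statistic with the standard library. The choice of test, the function name, and the sample values are all assumptions made for illustration, not the reported data.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean A
    vb = variance(sample_b) / nb  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Invented z-scored pitch values for first syllables (illustration only).
interruption_pitch = [0.9, 0.4, 1.1, 0.6, 0.8]
smooth_pitch = [0.1, -0.2, 0.3, 0.0, 0.2]
t, df = welch_t(interruption_pitch, smooth_pitch)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A positive t here would correspond to higher pitch in interruptions; a p-value would still need to be looked up from the t distribution with the returned degrees of freedom.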

Interactions of Tone and Intonation
  Clear intonational cues in tone language
  What effect on tones?
    –Contrast tones in final vs non-final position
      Mean pitch lowered in each tone
    –Relative height largely preserved
      Contour lowered but largely preserved
      Distinguishing tone characteristics retained

Interactions of Tone and Intonation
  [Figures: mean pitch across tones; tone contour changes]

Recognizing Turn Unit Boundaries and Turn Types
  Classifier: BoosTexter (Schapire 2000); 10-fold cross-validation
    –Comparable results for C4.5, SVMs
  Prosodic features:
    –Local: Pitch, Intensity: Mean, Max; Duration
    –Word, syllable
    –Contextual: Difference between current and following word: pitch, intensity
      Silence
  Text features:
    –N-grams within preceding, following 5 syllables
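
The contextual and text features listed here could be computed along the following lines. This is a hedged sketch: the `Word` record, the function names, the feature keys, and the n-gram order (`max_n=2`) are assumptions, since the slides give only the feature descriptions.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    mean_pitch: float         # per-speaker normalized pitch (assumed)
    mean_intensity: float     # per-speaker normalized intensity (assumed)
    following_silence: float  # seconds of silence after the word

def contextual_features(words, i):
    """Differences between word i and the following word, plus silence."""
    current = words[i]
    if i + 1 < len(words):
        following = words[i + 1]
        pitch_diff = current.mean_pitch - following.mean_pitch
        intensity_diff = current.mean_intensity - following.mean_intensity
    else:
        # No following word: fall back to zero differences (an assumption).
        pitch_diff = intensity_diff = 0.0
    return {
        "pitch_diff": pitch_diff,
        "intensity_diff": intensity_diff,
        "silence": current.following_silence,
    }

def window_ngrams(syllables, boundary, window=5, max_n=2):
    """N-grams from up to `window` syllables before and after `boundary`."""
    features = set()
    spans = (
        ("prev", syllables[max(0, boundary - window):boundary]),
        ("next", syllables[boundary:boundary + window]),
    )
    for side, span in spans:
        for n in range(1, max_n + 1):
            for k in range(len(span) - n + 1):
                features.add(f"{side}:{' '.join(span[k:k + n])}")
    return features

# Hypothetical two-word example.
words = [Word("ta", 0.5, 0.2, 0.0), Word("dui", -0.5, -0.3, 0.4)]
print(contextual_features(words, 0))
print(sorted(window_ngrams(["ta", "shuo", "dui", "a"], 2)))
```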

Recognizing Turn Unit Boundaries
  Word: Boundary/non-boundary
    –3200 instances; down-sampled, balanced set
    –Key features: Silence, max intensity
      Lexical features: preceding ‘ta’, following ‘dui’
    –Prosodic features more robust without silence
                    Prosody Only   Text    Prosody+Text
  With Silence      93.7%          93.5%   93.1%
  Without Silence   66.5%          59.5%   69%
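
Down-sampling to a balanced set, as mentioned above, might look like the sketch below. The function name, the seed, and the toy data are hypothetical, and the slides do not describe the exact sampling procedure. On a balanced two-class set the chance baseline is 50%, which is a useful reference point for the accuracies in the table.

```python
import random

def downsample_balanced(instances, labels, seed=0):
    """Randomly reduce every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_label = {}
    for instance, label in zip(instances, labels):
        by_label.setdefault(label, []).append(instance)
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for label, group in by_label.items():
        for instance in rng.sample(group, smallest):
            balanced.append((instance, label))
    rng.shuffle(balanced)
    return balanced

# Toy data: many non-boundary words, few boundary words.
toy_instances = list(range(10))
toy_labels = ["non-boundary"] * 8 + ["boundary"] * 2
balanced = downsample_balanced(toy_instances, toy_labels)
print(len(balanced), "instances after down-sampling")
```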

Recognizing Interruptions
  Initial words: Interruption/smooth start
    –More than 400 instances; down-sampled, balanced set
    –Contextual features:
      Difference of current word pitch, intensity with previous
      Preceding silence
    –Best results: 62%, all feature sets
      Key feature: silence
      Without silence drops to chance

Discussion
  Turn-taking in Mandarin Dialogue
    –Significant intonational, prosodic cues
      Initiation/Finality: Lower final pitch, intensity
      Turn transition types:
        –Rough vs smooth: higher final intensity
        –Interruptions vs smooth: higher pitch, intensity
    –Tones globally lowered; shape, relative height largely preserved
  Exploit cues for boundary, interruption recognition
    –93%, 62% respectively, with silence

Conclusions & Future Work
  Intonational cues to turn-taking in Mandarin
    –Pitch jointly encodes lexical, dialogue meaning
      Basic tone contrasts largely preserved
    –Prosodic information supports dialogue flow
      Silence important, but other cues co-signal
  Integrate dialogue information for tone recognition
    –Turn-taking, topic structure, etc.