Turn-taking in Mandarin Dialogue: Interactions of Tone and Intonation
  Gina-Anne Levow
  University of Chicago
  October 14, 2005
Roadmap
  Motivation
    – Enabling fluent conversation
  Data Collection and Processing
  Acoustic Analysis of Turn-taking
    – Tone and Intonation
  Recognizing Boundaries and Interruptions
  Conclusions and Future Work
Turn-taking in Dialogue
  Goal: Enable fluent conversation
    – Turn-taking is collaborative (Duncan 1974)
  Requires producing and understanding cues
    – Crucial for dialogue agents and for dialogue understanding
      End-pointing in spoken dialogue systems
      Confusion of barge-in and backchannel
Challenges
  Silence is neither sufficient nor necessary
    – Dialogue involves overlap
      Overlaps are not arbitrary (Ward et al., 2000)
  Proposed cues:
    – Multimodal: gesture, gaze
      Not always available
    – Prosodic
      Attested in English, Japanese
      Tone languages?
Approach
  Identify significant differences in
    – Pitch, intensity between turn-unit initial and final positions
    – Intensity for different transition types
    – Pitch, intensity of interruptions vs. smooth transitions
    – Assess interaction of tone and intonation
  Exploit contrasts for recognition of
    – Turn unit boundaries: ~93%
    – Interruptions: 62%
Data Collection
  Taiwanese Putonghua Corpus
    – 5 spontaneous dialogues, ~20 minutes each
    – 7 female, 3 male speakers
    – Manually transcribed and word-segmented
    – Turn beginnings and overlaps
      Manually labelled and time-stamped
Data Processing
  Automatic forced alignment
    – CU Sonic (Pellom et al.), ported to Mandarin
      Dictionary-based, with a manual pinyin-to-ARPABET mapping
    – Yields phone, syllable, and word durations, silence durations, and positions
  Acoustic analysis
    – Pitch, intensity: Praat (Boersma, 2001)
    – Normalized per conversation side: log-scaled, then z-score normalized
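The per-side normalization step can be sketched as follows. This is a minimal illustration, not the original pipeline: it assumes natural-log scaling, a population standard deviation, and strictly positive inputs such as F0 in Hz. The function name `log_zscore_by_side` is hypothetical and not part of Praat or Sonic.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev
from typing import List, Sequence


def log_zscore_by_side(values: Sequence[float], sides: Sequence[str]) -> List[float]:
    """Log-scale each value, then z-score it against all values from the same side.

    Assumptions (not stated in the slides): natural log, population std. dev.,
    strictly positive inputs (e.g. F0 in Hz).
    """
    logged = [math.log(v) for v in values]

    # Collect the log values belonging to each conversation side (speaker).
    by_side = defaultdict(list)
    for lv, side in zip(logged, sides):
        by_side[side].append(lv)

    # Per-side mean and standard deviation of the log values.
    stats = {side: (mean(vs), pstdev(vs)) for side, vs in by_side.items()}

    normalized = []
    for lv, side in zip(logged, sides):
        mu, sd = stats[side]
        # Guard against a side whose values are all identical.
        normalized.append((lv - mu) / sd if sd > 0 else 0.0)
    return normalized


# Toy input: speaker B talks at twice speaker A's pitch, yet after
# per-side normalization both sides come out near [-1, 1].
print(log_zscore_by_side([100, 200, 200, 400], ["A", "A", "B", "B"]))
```

Normalizing within each side removes speaker-level differences in pitch range, so that "high" and "low" are judged relative to each speaker's own habits.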
Turn Unit Types
  “Smooth”
    – Turn unit not ended by overlap & speaker change
  “Rough”
    – Turn unit ended by overlap & speaker change
  “Inter”
    – New speaker takes the floor with overlap
Turn Unit Types
  Position              Start: -Overlap or      Start: +Overlap,
                        -Speaker Change         +Speaker Change
  End: -Overlap         Smooth: 1413            Intersmooth: 289
  End: +Overlap         Rough: 407              Interrough: 68
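The four cells of this table follow from two yes/no questions per turn unit: did it start with overlap and a speaker change, and did it end with one? A minimal sketch, assuming each unit carries hypothetical boolean flags for overlap and speaker change at its start and end (the `TurnUnit` fields are illustrative, not the corpus's actual label format). The end condition follows the "Rough" definition (overlap and speaker change).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnUnit:
    # Hypothetical per-unit flags derived from the manual overlap/turn labels.
    start_overlap: bool
    start_speaker_change: bool
    end_overlap: bool
    end_speaker_change: bool


def turn_unit_type(u: TurnUnit) -> str:
    """Map a turn unit to one of the four cells of the Turn Unit Types table."""
    # "Inter": the new speaker takes the floor with overlap.
    inter_start = u.start_overlap and u.start_speaker_change
    # "Rough": the unit is ended by overlap and speaker change.
    rough_end = u.end_overlap and u.end_speaker_change
    if inter_start:
        return "Interrough" if rough_end else "Intersmooth"
    return "Rough" if rough_end else "Smooth"


def tally(units) -> Counter:
    """Count how many units fall into each cell."""
    return Counter(turn_unit_type(u) for u in units)


# Cell counts as reported in the table.
REPORTED = Counter(Smooth=1413, Intersmooth=289, Rough=407, Interrough=68)
print(sum(REPORTED.values()))  # 2177
```

Smooth units dominate, with roughly two thirds of all units in the reported counts, which is one reason the recognition experiments later down-sample to balanced sets.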
Turn Unit Initial-Final Contrasts
Turn Unit Boundary Contrasts
  Unit-initial versus unit-final syllables
    – Pitch significantly lower in final than initial syllables
    – Intensity significantly lower in final than initial syllables
    – Holds across all transition types
  Rough versus smooth transitions
    – Final syllables: intensity significantly higher in rough transitions
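The slides report significant initial/final differences without naming the statistical test. Purely as an illustrative assumption, a Welch two-sample t statistic comparing normalized pitch of unit-initial versus unit-final syllables could be computed as below. The function name `welch_t` and the toy values are hypothetical, not data from the corpus.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    Positive t means the mean of `a` exceeds the mean of `b`.
    Requires at least two observations in each sample.
    """
    na, nb = len(a), len(b)
    # Squared standard errors of each sample mean (sample variance / n).
    sa, sb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


# Toy z-scored pitch values, NOT measurements from the corpus.
initial = [0.4, 0.6, 0.5, 0.7]
final = [-0.3, -0.1, -0.2, 0.0]
t, df = welch_t(initial, final)
print(round(t, 2), round(df, 2))  # 7.67 6.0
```

For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test. A paired test per turn unit would be an equally reasonable design; which test the original analysis used is not stated.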
Characterizing Interruptions
  Contrast first syllable of “inter” vs. “smooth” turn units
    – Pitch significantly higher in interruptions
    – Intensity significantly higher in interruptions
Interactions of Tone and Intonation
  Clear intonational cues in a tone language
  What effect on tones?
    – Contrast tones in final vs. non-final position
  Mean pitch lowered in each tone
    – Relative height largely preserved
  Contour lowered but largely preserved
    – Distinguishing tone characteristics retained
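The final vs. non-final tone comparison can be sketched as a group-by over syllables. This is a minimal illustration, assuming each syllable is a hypothetical `(tone, is_final, pitch)` tuple with pitch already normalized. It checks two of the properties the slide describes (each tone's mean lowered, relative ordering of tones preserved) but not contour shape.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# (tone, is_final, normalized_pitch) for one syllable; layout is hypothetical.
Syllable = Tuple[int, bool, float]


def tone_means(syllables: Iterable[Syllable]) -> Dict[Tuple[int, bool], float]:
    """Mean pitch for every (tone, position) group."""
    groups = defaultdict(list)
    for tone, is_final, pitch in syllables:
        groups[(tone, is_final)].append(pitch)
    return {key: mean(vals) for key, vals in groups.items()}


def compare_positions(syllables: Iterable[Syllable]) -> Tuple[Dict[int, bool], bool]:
    """Report, per tone, whether final-position mean pitch is lower than
    non-final, and whether the ranking of tones by mean pitch is the same
    in both positions (i.e. relative height preserved)."""
    means = tone_means(syllables)
    # Only compare tones observed in both positions.
    tones = sorted({t for t, _ in means if (t, True) in means and (t, False) in means})
    lowered = {t: means[(t, True)] < means[(t, False)] for t in tones}
    rank_nonfinal = sorted(tones, key=lambda t: means[(t, False)], reverse=True)
    rank_final = sorted(tones, key=lambda t: means[(t, True)], reverse=True)
    return lowered, rank_nonfinal == rank_final


# Toy values for Mandarin tones 1-4, NOT measurements from the corpus:
# every tone is shifted down by 0.5 in final position.
toy = [(1, False, 1.0), (2, False, 0.2), (3, False, -0.6), (4, False, 0.5),
       (1, True, 0.5), (2, True, -0.3), (3, True, -1.1), (4, True, 0.0)]
print(compare_positions(toy))  # ({1: True, 2: True, 3: True, 4: True}, True)
```

A uniform downward shift like the toy data lowers every tone while leaving their ranking intact, which is the pattern the slide summarizes as "relative height largely preserved".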
Interactions of Tone and Intonation
  [Figures: mean pitch across tones; tone contour changes]
Recognizing Turn Unit Boundaries and Turn Types
  Classifier: BoosTexter (Schapire 2000); 10-fold cross-validation
    – Comparable results for C4.5, SVMs
  Prosodic features:
    – Local: pitch, intensity (mean, max); duration
      Word, syllable
    – Contextual: difference between current and following word in pitch, intensity
      Silence
  Text features:
    – N-grams within the preceding and following 5 syllables
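One way the feature sets listed above could be assembled for a single candidate boundary is sketched below. The word-record layout, the key names, the use of unigrams and bigrams, the use of *following* silence, and the inclusion of the current word's syllables in the "preceding" window are all assumptions for illustration. The actual encoding fed to BoosTexter is not given in the slides.

```python
from typing import Dict, List

# Hypothetical per-word record keys for local prosodic features.
LOCAL_KEYS = ("pitch_mean", "pitch_max", "int_mean", "int_max", "duration")


def boundary_features(words: List[dict], i: int, window: int = 5,
                      max_n: int = 2) -> Dict[str, float]:
    """Features for deciding whether a turn-unit boundary follows word i."""
    w = words[i]
    # Local: pitch, intensity (mean, max) and duration of the current word.
    feats: Dict[str, float] = {k: w[k] for k in LOCAL_KEYS}
    feats["following_silence"] = w["following_silence"]

    # Contextual: current minus following word, when a following word exists.
    if i + 1 < len(words):
        nxt = words[i + 1]
        feats["pitch_diff_next"] = w["pitch_mean"] - nxt["pitch_mean"]
        feats["int_diff_next"] = w["int_mean"] - nxt["int_mean"]

    # Text: syllable n-grams within the preceding and following `window`
    # syllables. Assumption: "preceding" includes word i's own syllables.
    before = [s for word in words[: i + 1] for s in word["syllables"]][-window:]
    after = [s for word in words[i + 1:] for s in word["syllables"]][:window]
    for side, syls in (("prev", before), ("next", after)):
        for n in range(1, max_n + 1):
            for j in range(len(syls) - n + 1):
                feats[f"{side}:{' '.join(syls[j:j + n])}"] = 1.0
    return feats


def word(syllables, pitch, intensity, silence=0.0):
    """Build a toy word record (hypothetical layout, invented values)."""
    return {"pitch_mean": pitch, "pitch_max": pitch, "int_mean": intensity,
            "int_max": intensity, "duration": 0.2 * len(syllables),
            "following_silence": silence, "syllables": syllables}


words = [word(["wo", "jue", "de"], 0.3, 0.1),
         word(["ta"], -0.4, -0.6, silence=0.5),
         word(["dui"], 0.8, 0.9)]
f = boundary_features(words, 1)
print(f["prev:ta"], f["next:dui"], round(f["pitch_diff_next"], 2))  # 1.0 1.0 -1.2
```

Encoding n-grams as binary indicator features keeps text and prosody in one flat feature dictionary, which suits learners that test one feature at a time.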
Recognizing Turn Unit Boundaries
  Word: boundary / non-boundary
    – 3200 instances; down-sampled, balanced set
    – Key features: silence, max intensity
      Lexical features: preceding ‘ta’, following ‘dui’
    – Prosodic features more robust without silence

                    Prosody Only   Text    Prosody+Text
  With Silence      93.7%          93.5%   93.1%
  Without Silence   66.5%          59.5%   69%
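Both recognition experiments use a down-sampled, balanced set, which puts the chance baseline for a binary decision at 50%. A minimal sketch of per-class random down-sampling, assuming simple random sampling without replacement (the slides do not say how instances were selected; `downsample_balanced` is a hypothetical helper):

```python
import random
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple


def downsample_balanced(instances: Sequence, labels: Sequence[str],
                        seed: int = 0) -> List[Tuple[object, str]]:
    """Randomly down-sample every class to the size of the smallest class."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_label = defaultdict(list)
    for x, y in zip(instances, labels):
        by_label[y].append(x)
    n = min(len(xs) for xs in by_label.values())
    # Draw n instances per class without replacement, then mix the classes.
    balanced = [(x, y) for y, xs in by_label.items() for x in rng.sample(xs, n)]
    rng.shuffle(balanced)
    return balanced


# Toy example: far more non-boundary words than boundary words.
labels = ["non-boundary"] * 10 + ["boundary"] * 3
data = downsample_balanced(list(range(13)), labels)
print(Counter(y for _, y in data))  # 3 of each label
```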
Recognizing Interruptions
  Initial words: interruption / smooth start
    – >400 instances; down-sampled, balanced set
    – Contextual features: difference of current word pitch, intensity with previous word
      Preceding silence
    – Best results: 62% (all feature sets)
      Key feature: silence
      Without silence, drops to chance
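Because preceding silence is the key feature for interruptions, a single-feature threshold rule (a decision stump, the kind of weak learner that boosting methods combine) is a natural baseline. The sketch below picks the threshold and direction that maximize training accuracy on one feature. It is an illustration, not the BoosTexter model from the slides, and the toy silence values are invented.

```python
from typing import List, Tuple


def predict(x: float, threshold: float, polarity: int) -> int:
    """Predict `polarity` above the threshold, the other label at or below it."""
    return polarity if x > threshold else 1 - polarity


def accuracy(xs: List[float], ys: List[int], threshold: float, polarity: int) -> float:
    return sum(predict(x, threshold, polarity) == y for x, y in zip(xs, ys)) / len(ys)


def fit_stump(xs: List[float], ys: List[int]) -> Tuple[float, int, float]:
    """Return (threshold, polarity, training accuracy) of the best stump."""
    values = sorted(set(xs))
    # Candidates: below every value, plus midpoints between distinct values.
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in thresholds:
        for pol in (0, 1):
            acc = accuracy(xs, ys, t, pol)
            if best is None or acc > best[2]:
                best = (t, pol, acc)
    return best


# Toy preceding-silence durations (s); label 1 = smooth start, 0 = interruption.
silence = [0.0, 0.125, 0.25, 0.75, 1.0, 1.5]
labels = [0, 0, 0, 1, 1, 1]
print(fit_stump(silence, labels))  # (0.5, 1, 1.0)
```

Training accuracy of a stump is optimistic; a fair comparison with the 62% figure would need held-out evaluation, such as the 10-fold cross-validation described for the classifier.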
Discussion
  Turn-taking in Mandarin dialogue
    – Significant intonational and prosodic cues
  Initiation/finality: lower final pitch, intensity
  Turn transition types:
    – Rough vs. smooth: higher final intensity
    – Interruptions vs. smooth: higher pitch, intensity
  Tones globally lowered in final position; shape and relative height largely preserved
  Cues exploited for boundary and interruption recognition
    – 93% and 62% respectively, with silence
Conclusions & Future Work
  Intonational cues to turn-taking in Mandarin
    – Pitch jointly encodes lexical and dialogue meaning
      Basic tone contrasts largely preserved
    – Prosodic information supports dialogue flow
      Silence important, but other cues co-signal
  Integrate dialogue information for tone recognition
    – Turn-taking, topic structure, etc.