Published by Meryl Hubbard. Modified over 8 years ago.
1
Turn-taking
Discourse and Dialogue
CS 359
November 6, 2001
2
Agenda
Motivation
  – Silence in Human-Computer Dialogue
Turn-taking in human-human dialogue
  – Turn-change signals
  – Back-channel acknowledgments
  – Maintaining contact
Exploiting turn-taking cues to improve HCC
  – Automatic identification of disfluencies, jump-in points, and jump-ins
3
Turn-taking in HCI
Human turn end:
  – Detected by 250 ms of silence
System turn end:
  – Signaled by end of speech
  – Indicated by any human sound (barge-in)
Continued attention:
  – No signal
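The silence-based rule for detecting the end of a human turn can be sketched as follows. This is a minimal illustration, not the behavior of any particular system: the per-frame speech/non-speech flags, the 10 ms frame size, and the function name `find_turn_end` are all assumptions; only the 250 ms threshold comes from the slide.

```python
from typing import List, Optional


def find_turn_end(is_speech: List[bool],
                  frame_ms: int = 10,       # assumed frame size
                  silence_ms: int = 250) -> Optional[int]:
    """Return the index of the frame at which silence after speech
    first reaches silence_ms, or None if that never happens."""
    needed = silence_ms // frame_ms   # consecutive silent frames required
    run = 0                           # current run of silent frames
    heard_speech = False              # ignore silence before the user speaks
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            run = 0
        elif heard_speech:
            run += 1
            if run >= needed:
                return i
    return None
```

A rule like this treats any sufficiently long pause as the end of the turn, so a speaker who hesitates mid-utterance for 250 ms or more is cut off, while a shorter pause is ignored.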
4
Missed turn example
5
Gesture, Gaze & Voice
Range of gestural signals:
  – Head (nod, shake), shoulder, hand, leg, foot movements; facial expressions; postures; artifacts
  – Align with syllables
Units: phonemic clause + change
Study with recorded exchanges
6
Yielding the Floor
Turn change signal
  – Offer floor to auditor/hearer
Cues: pitch fall, lengthening, “but uh”, end gesture, amplitude drop + ‘uh’, end clause
Likelihood of change increases with more cues
Negated by any gesticulation
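The two rules on this slide (more cues make a turn change more likely; any gesticulation cancels the signal) can be written as a small scoring function. This is only a sketch: the cue identifiers and the function name `yield_cue_count` are made up for illustration, and the count is an ordinal score, not a probability estimated from data.

```python
# Hypothetical identifiers for the six cues listed on the slide.
YIELD_CUES = {
    "pitch_fall", "lengthening", "but_uh",
    "end_gesture", "amplitude_drop_uh", "end_clause",
}


def yield_cue_count(observed, gesticulating: bool) -> int:
    """Number of turn-yielding cues displayed; 0 if the speaker is
    gesticulating, since gesticulation negates the signal."""
    if gesticulating:
        return 0
    return len(YIELD_CUES & set(observed))
```

A higher count means the hearer is more likely to take the floor; how steeply the likelihood rises with each additional cue is not stated on the slide.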
7
Taking the Floor
Speaker-state signal
  – Indicate becoming speaker
Occurs at beginning of turns
Cues:
  – Shift in head direction AND/OR
  – Start of gesture
8
Retaining the Floor
Within-turn signal
  – Still speaker: look at hearer as clause ends
Continuation signal
  – Still speaker: look away after within-turn signal/back-channel
Back-channel:
  – ‘mmhm’, okay, etc.; nods; sentence completion; clarification request; restatement
  – NOT a turn: signals attention, agreement, confusion
9
Segmenting Turns
Speaker alone:
  – Within-turn signal -> end of one unit
  – Continuation signal -> beginning of next unit
Joint signal:
  – Speaker turn signal (end); auditor -> speaker; speaker -> auditor
  – Within-turn + back-channel + continuation
Back-channels signal understanding
  – Early back-channel + continuation
10
Regaining Attention
Gaze & Disfluency
  – Disfluency: “perturbation” in speech
      Silent pause, filled pause, restart
  – Gaze: conversants don’t stare at each other constantly
      However, speaker expects to meet hearer’s gaze
      – Confirms hearer’s attention
Disfluency occurs when speaker realizes hearer is NOT attending
  – Pause until hearer begins gazing, or to request attention
11
Improving Human-Computer Turn-taking
Identifying cues to turn change and turn start
Meeting conversations:
  – Recorded, natural research meetings
  – Multi-party
  – Overlapping speech
  – Units = “spurts”: stretches of speech bounded by 500 ms silences
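Segmenting one speaker's channel into spurts can be sketched as merging speech intervals whose gaps are shorter than the silence threshold. This is an illustrative sketch only: the input format (start and end times in milliseconds for one speaker) and the name `segment_spurts` are assumptions, and treating a gap of exactly 500 ms as a boundary is a choice made here, not something the slide specifies.

```python
from typing import List, Tuple


def segment_spurts(intervals: List[Tuple[int, int]],
                   min_gap_ms: int = 500) -> List[Tuple[int, int]]:
    """Merge (start_ms, end_ms) speech intervals from ONE speaker into
    spurts separated by at least min_gap_ms of silence."""
    spurts: List[List[int]] = []
    for start, end in sorted(intervals):
        if spurts and start - spurts[-1][1] < min_gap_ms:
            # Gap too short to count as a boundary: extend current spurt.
            spurts[-1][1] = max(spurts[-1][1], end)
        else:
            spurts.append([start, end])
    return [(s, e) for s, e in spurts]
```

Because meetings are multi-party with overlapping speech, a function like this would be run per speaker, so that one participant's talk does not fill another's silences.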
12
Text + Prosody
Text sequence:
  – Modeled as n-gram language model
  – Implemented as HMM
Prosody:
  – Duration, pitch, pause, energy
  – Decision trees: classify + probability
Integrate LM + DT
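One way to picture the LM + DT integration is an HMM whose hidden states are the boundary events, decoded with Viterbi. The sketch below is a deliberate simplification and not the model from the slides: it assumes the decision tree supplies a probability for each event at each inter-word position, and it replaces the word n-gram model with a plain event-to-event transition table. The names `EVENTS`, `viterbi`, `trans`, and `prior` are hypothetical.

```python
import math
from typing import Dict, List

EVENTS = ["none", "disfluency", "sentence_end"]


def viterbi(dt_probs: List[Dict[str, float]],
            trans: Dict[str, Dict[str, float]],
            prior: Dict[str, float]) -> List[str]:
    """Most likely event sequence given per-position decision-tree
    probabilities and sequence-model transitions (all nonzero)."""
    # Log-score of the best path ending in each event at position 0.
    best = {e: math.log(prior[e]) + math.log(dt_probs[0][e]) for e in EVENTS}
    back = []  # back-pointers for positions 1..n-1
    for probs in dt_probs[1:]:
        new_best, pointers = {}, {}
        for e in EVENTS:
            prev = max(EVENTS, key=lambda p: best[p] + math.log(trans[p][e]))
            new_best[e] = best[prev] + math.log(trans[prev][e]) + math.log(probs[e])
            pointers[e] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the best final event.
    path = [max(EVENTS, key=lambda e: best[e])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

With uniform transitions the decoder simply picks the most probable event at each position; the sequence model matters when it overrides a locally preferred label, for example by making two sentence ends in a row very unlikely.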
13
Decision Trees
[Figure: decision tree with nodes A–G; branch tests X=t / X=f, Y>1 / Y<=1, Y>2 / Y<=2; leaf labels Disfluency, Sentence End, None]
14
Interpreting Breaks
For each inter-word position:
  – Is it a disfluency, sentence end, or continuation?
Key features:
  – Pause duration, vowel duration
62% accuracy vs. 50% chance baseline
  – ~90% overall
Best results combine LM & DT
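A decision tree that both classifies an inter-word position and reports a probability can be sketched as a nested structure of threshold tests with class distributions at the leaves. Everything concrete here is a placeholder: the thresholds, the leaf probabilities, and the names `Leaf`, `Split`, `classify`, and `TOY_TREE` are invented for illustration and are not the trained trees or results from the study. Only the two features (pause duration, vowel duration) and the three classes come from the slide.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class Leaf:
    probs: Dict[str, float]      # class distribution at this leaf


@dataclass
class Split:
    feature: str                 # feature tested at this node
    threshold: float
    above: "Node"                # subtree if feature > threshold
    below: "Node"                # subtree otherwise


Node = Union[Leaf, Split]


def classify(node: Node, features: Dict[str, float]) -> Tuple[str, float]:
    """Walk the tree and return (most likely class, its probability)."""
    while isinstance(node, Split):
        node = node.above if features[node.feature] > node.threshold else node.below
    label = max(node.probs, key=node.probs.get)
    return label, node.probs[label]


# Placeholder tree: thresholds and probabilities are NOT from the study.
TOY_TREE = Split(
    "pause_ms", 300.0,
    above=Split(
        "vowel_ms", 120.0,
        above=Leaf({"sentence_end": 0.8, "disfluency": 0.1, "none": 0.1}),
        below=Leaf({"sentence_end": 0.3, "disfluency": 0.6, "none": 0.1}),
    ),
    below=Leaf({"sentence_end": 0.05, "disfluency": 0.15, "none": 0.8}),
)
```

Returning the leaf distribution rather than a bare label is what allows the prosodic classifier's output to be combined with a language model score.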
15
Jump-in Points
(Used) Possible turn changes
  – Points WITHIN spurt where new speaker starts
Key features:
  – Pause duration, low energy, pitch fall
Accuracy: 65% vs. 50% baseline
Performance depends only on preceding prosodic features
16
Jump-in Features
Do people speak differently when they jump in?
  – Do jump-ins differ from regular turn starts?
Examine only first words of turns
  – No LM
Key features:
  – Raised pitch, raised amplitude
Accuracy: 77% vs. 50% baseline
  – Prosody only
17
Summary
Prosodic features signal conversational moves
  – Pause and vowel duration distinguish sentence end, disfluency, and fluent continuation
  – Jump-ins occur at locations that sound like sentence ends
  – Speakers raise their voice when they jump in