
1 Turn-Yielding Cues in Task-Oriented Dialogue
Agustín Gravano (1,2), Julia Hirschberg (1)
(1) Columbia University, New York, USA
(2) Universidad de Buenos Aires, Argentina

2 Introduction: Interactive Voice Response Systems
Agustín Gravano, SIGdial 2009
IVR systems are spreading quickly, but are often perceived as “uncomfortable” or “awkward”.
ASR and TTS account for most IVR problems.
Other problems have been revealed in the coordination of system-user exchanges: long pauses after user turns, and interruptions.
Modeling turn-taking behavior should lead to improved system-user coordination.

3 Introduction: Goal
Learn when the speaker is likely to end her/his conversational turn.
Find turn-yielding cues: cues displayed by the speaker when approaching a potential turn boundary.
This should improve the coordination of IVRs:
  Speech understanding: detect the end of the user’s turn.
  Speech generation: display cues signalling the end of the system’s turn.

4 Talk Outline
Previous work. Material. Method. Results. Conclusions.

5 Previous Work on Turn-Taking
Duncan 1972, 1973, 1974, inter alia:
  Hypothesized 6 turn-yielding cues in face-to-face dialogue.
  Conjectured a linear relation between the number of displayed cues and the likelihood of a turn-taking attempt.
Later studies formalized and verified some of Duncan’s hypotheses [For&Tho96; Wen&Sie03; Cut&Pea86; Wic&Cas01].
Implementations of turn-boundary detection:
  Simulations [Fer&al.02,03; Edl&al.05; Sch06; Att&al.08; Bau08].
  Actual systems: Let’s Go! [Rau&Esk08].
  Exploiting turn-yielding cues improves performance.

6 Material: Columbia Games Corpus
12 task-oriented spontaneous dialogues in Standard American English.
13 subjects: 6 female, 7 male.
Series of collaborative computer games. No eye contact. No speech restrictions.
9 hours of dialogue.
Manual orthographic transcription and alignment.
Manual prosodic annotations (ToBI).

7 Material: Columbia Games Corpus
[Figure: Player 1: Describer; Player 2: Follower.]

8 Turn-Yielding Cues
Cues displayed by the speaker when approaching a potential turn boundary.

9 Turn-Yielding Cues: Method
IPU (Inter-Pausal Unit): maximal sequence of words from the same speaker surrounded by silence ≥ 50 ms.
Smooth switch: Speaker A finishes her utterance; Speaker B takes the turn with no overlapping speech.
Trained annotators distinguished smooth switches from interruptions and backchannels using a scheme based on Ferguson 1977 and Beattie 1982.
[Figure: timeline for Speaker A and Speaker B showing IPU1, IPU2 and IPU3, with the transitions between them labeled Hold and Smooth switch.]
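
The IPU definition above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Word` record, the `split_into_ipus` helper and the example timings are hypothetical and not taken from the talk; only the 50 ms silence threshold comes from the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def split_into_ipus(words: List[Word], min_silence: float = 0.050) -> List[List[Word]]:
    """Group one speaker's time-aligned words into inter-pausal units.

    A new IPU starts whenever the silence before a word is at least
    `min_silence` seconds (50 ms on the slide).
    """
    ipus: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.start):
        if ipus and word.start - ipus[-1][-1].end < min_silence:
            ipus[-1].append(word)   # short gap: same IPU
        else:
            ipus.append([word])     # first word, or silence >= threshold
    return ipus


# Hypothetical word alignment for a single speaker.
words = [
    Word("put", 0.00, 0.20),
    Word("it", 0.21, 0.30),
    Word("there", 0.32, 0.60),
    Word("okay", 1.10, 1.40),
]
ipus = split_into_ipus(words)
ipu_texts = [[w.text for w in ipu] for ipu in ipus]
print(ipu_texts)  # [['put', 'it', 'there'], ['okay']]
```

The function handles one speaker at a time, since an IPU is defined per speaker; a two-party dialogue would call it once per channel.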

10 Turn-Yielding Cues: Method
To find turn-yielding cues, we compare IPUs preceding Holds with IPUs preceding Smooth switches.
~200 features: acoustic, prosodic, lexical, syntactic.
[Figure: timeline for Speaker A and Speaker B showing IPU1, IPU2 and IPU3, with the transitions between them labeled Hold and Smooth switch.]
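
To make the comparison concrete, each IPU can be tagged with the transition that follows it. The sketch below is a rough automatic approximation and rests on assumptions: in the study, trained annotators made these distinctions by hand (separating smooth switches from interruptions and backchannels), whereas this hypothetical `label_following_transition` helper looks only at speaker identity and timing.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class IPU:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def label_following_transition(ipus: List[IPU]) -> List[Tuple[IPU, Optional[str]]]:
    """Pair each IPU with the transition that follows it.

    "H"  : hold, the same speaker produces the next IPU.
    "S"  : the other speaker starts after this IPU ends (no overlap).
    None : overlapping speech, or no following IPU.
    Timing alone cannot tell backchannels from real turns, so "S" is
    only an approximation of a manually annotated smooth switch.
    """
    ordered = sorted(ipus, key=lambda u: u.start)
    labeled: List[Tuple[IPU, Optional[str]]] = []
    for cur, nxt in zip(ordered, ordered[1:]):
        if nxt.speaker == cur.speaker:
            label: Optional[str] = "H"
        elif nxt.start >= cur.end:
            label = "S"
        else:
            label = None
        labeled.append((cur, label))
    if ordered:
        labeled.append((ordered[-1], None))  # last IPU has no successor
    return labeled


# Hypothetical dialogue: two IPUs from A, then one from B.
dialogue = [IPU("A", 0.0, 1.0), IPU("A", 1.5, 2.5), IPU("B", 3.0, 4.0)]
labels = [label for _, label in label_following_transition(dialogue)]
print(labels)  # ['H', 'S', None]
```

Features can then be computed separately over the IPUs labeled "H" and those labeled "S" and compared across the two groups.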

11 Turn-Yielding Cues: Individual Cues
1. Final intonation: falling (L-L%) or high-rising (H-H%).
2. Faster speaking rate. Reduction of final lengthening.
3. Lower intensity level.
4. Lower pitch level.
5. Higher jitter, shimmer, NHR. Related to perception of voice quality.
6. Longer IPU duration (seconds and number of words).

12 Turn-Yielding Cues: Individual Cues
7. Textual completion (independent of intonation).
(1) Manually annotated a portion of the data. Labelers read up to the end of a target IPU (no right context) and judged whether it could constitute a ‘complete’ utterance. 400 tokens. K = 0.81.
(2) Trained an SVM classifier. 19 lexical + syntactic features. Accuracy: 80%. Majority-class baseline: 55%. Human agreement: 91%.
(3) Labeled all IPUs in the corpus with the SVM model.
                          Incomplete   Complete
Before smooth switches:      18%         82%
Before holds:                47%         53%
(χ² test, p ~ 0)

13 Turn-Yielding Cues: Individual Cues
1. Final intonation: L-L% or H-H%.
2. Faster speaking rate.
3. Lower intensity level.
4. Lower pitch level.
5. Higher jitter, shimmer, NHR.
6. Longer IPU duration.
7. Textual completion.

14 Turn-Yielding Cues: Defining Presence of a Cue
2-3 representative features for each cue:
  Final intonation: absolute pitch slope over final 200 ms, 300 ms.
  Speaking rate: syllables/sec, phonemes/sec over IPU.
  Intensity level: mean intensity over final 500 ms, 1000 ms.
  Pitch level: mean pitch over final 500 ms, 1000 ms.
  Voice quality: jitter, shimmer, NHR over final 500 ms.
  IPU duration: duration in ms, and in number of words.
  Textual completion: complete vs. incomplete (binary).
Define presence/absence of a cue based on whether the value is closer to the mean before smooth switches (S) or before holds (H).
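
The presence/absence rule can be illustrated with a minimal Python sketch. The function name `cue_present` and the intensity values are hypothetical; the slide does not say how the 2-3 features of one cue are combined or whether values are normalized, so the sketch handles a single numeric feature only.

```python
from statistics import mean
from typing import Sequence


def cue_present(value: float,
                values_before_s: Sequence[float],
                values_before_h: Sequence[float]) -> bool:
    """Return True if `value` is closer to the mean observed before
    smooth switches (S) than to the mean observed before holds (H)."""
    mean_s = mean(values_before_s)
    mean_h = mean(values_before_h)
    return abs(value - mean_s) < abs(value - mean_h)


# Hypothetical mean-intensity values (dB) over the final 500 ms of an IPU.
intensity_before_s = [58.0, 60.0, 62.0]  # mean 60.0
intensity_before_h = [64.0, 66.0, 68.0]  # mean 66.0

print(cue_present(61.0, intensity_before_s, intensity_before_h))  # True
print(cue_present(65.0, intensity_before_s, intensity_before_h))  # False
```

A value exactly halfway between the two means counts as absent here; that tie-breaking choice is an assumption of the sketch.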

15 Top Frequencies of Complex Cues
Turn-yielding cues: 1: Final intonation. 2: Speaking rate. 3: Intensity level. 4: Pitch level. 5: IPU duration. 6: Voice quality. 7: Completion.
Notation: a digit means the cue is present; a dot means the cue is absent.

16 Turn-Yielding Cues: Combined Cues
[Figure: percentage of turn-taking attempts plotted against the number of cues conjointly displayed; r² = 0.969.]
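
A plot of this kind could be computed roughly as follows. This is a sketch under assumptions: the helper names are hypothetical and the observations are synthetic, built to be perfectly linear purely for illustration; they are not the corpus data and do not reproduce the reported r².

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def attempt_rate_by_cue_count(
        observations: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Percentage of IPUs followed by a turn-taking attempt,
    grouped by the number of cues displayed in the IPU."""
    totals: Dict[int, int] = defaultdict(int)
    attempts: Dict[int, int] = defaultdict(int)
    for n_cues, attempted in observations:
        totals[n_cues] += 1
        attempts[n_cues] += int(attempted)
    return {n: 100.0 * attempts[n] / totals[n] for n in sorted(totals)}


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Pearson correlation between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Synthetic observations: (number of cues, followed by a turn-taking attempt?)
observations = []
for n_cues in range(4):
    observations += [(n_cues, True)] * n_cues + [(n_cues, False)] * (4 - n_cues)

rates = attempt_rate_by_cue_count(observations)
r2 = r_squared(list(rates.keys()), list(rates.values()))
print(rates)  # {0: 0.0, 1: 25.0, 2: 50.0, 3: 75.0}
print(r2)
```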

17 Turn-Yielding Cues: IVR Systems
After each IPU from the user:
  if estimated likelihood > threshold then take the turn
To signal the end of a system’s turn: include as many cues as possible in the system’s final IPU.
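
The decision rule above might be sketched in Python as follows. The slide does not specify how the likelihood is estimated, so this sketch assumes, purely for illustration, a linear function of the number of cues present; the slope, intercept and threshold values are hypothetical placeholders, not figures from the study.

```python
def estimated_likelihood(n_cues: int,
                         slope: float = 0.1,
                         intercept: float = 0.05) -> float:
    """Assumed linear estimate of the likelihood that the user is
    yielding the turn, clipped to [0, 1]. Slope and intercept are
    hypothetical placeholders."""
    return min(1.0, max(0.0, intercept + slope * n_cues))


def should_take_turn(n_cues: int, threshold: float = 0.5) -> bool:
    """Apply the rule from the slide: take the turn if the estimated
    likelihood exceeds the threshold."""
    return estimated_likelihood(n_cues) > threshold


# Called after each user IPU with the number of cues detected in it.
print(should_take_turn(7))  # True
print(should_take_turn(2))  # False
```

In a real system the threshold would trade off long pauses (threshold too high) against interruptions (threshold too low), the two coordination problems named in the introduction.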

18 Summary
Study of turn-yielding cues: objective, automatically computable.
Combined cues: improve turn-taking decisions of IVR systems.
Results drawn from task-oriented dialogues: not necessarily generalizable, but suitable for most IVR domains.
Interspeech 2009: study of backchannel-inviting cues.

19 Special thanks to…
Julia Hirschberg
Thesis Committee Members: Maxine Eskenazi, Kathy McKeown, Becky Passonneau, Amanda Stent.
Speech Lab at Columbia University: Stefan Benus, Fadi Biadsy, Sasha Caskey, Bob Coyne, Frank Enos, Martin Jansche, Jackson Liscombe, Sameer Maskey, Andrew Rosenberg.
Collaborators: Gregory Ward and Elisa Sneed German (Northwestern U); Ani Nenkova (UPenn); Héctor Chávez, David Elson, Michel Galley, Enrique Henestroza, Hanae Koiso, Shira Mitchell, Michael Mulley, Kristen Parton, Ilia Vovsha, Lauren Wilcox.

