Backchannel-Inviting Cues in Task-Oriented Dialogue. Agustín Gravano (1,2), Julia Hirschberg (1). (1) Columbia University, New York, USA; (2) Universidad de Buenos Aires, Argentina.

Presentation transcript:

Backchannel-Inviting Cues in Task-Oriented Dialogue
Agustín Gravano (1,2), Julia Hirschberg (1)
(1) Columbia University, New York, USA
(2) Universidad de Buenos Aires, Argentina

Agustín Gravano, Interspeech
Introduction: Interactive Voice Response (IVR) Systems
Quickly spreading.
Mostly simple functionality.
“Uncomfortable”, “awkward”.
ASR+TTS account for most IVR problems.
As ASR and TTS improve, other problems are revealed:
  Coordination of system-user exchanges.
  Backchannels.

Introduction: Backchannels
Short expressions uttered by listeners to:
  Convey that they are paying attention.
  Encourage the speaker to continue.
Examples: okay, uh-huh, mm-hm, alright.
Very frequent in task-oriented dialogue.
Thus, modeling human usage of backchannels (BC) should lead to improved system-user coordination.

Introduction: Goal
Learn when backchannels are likely to occur.
Find “backchannel-inviting” cues:
  Cues displayed by the speaker “inviting” the listener to produce a backchannel response.
This could improve the coordination of IVRs:
  Speech understanding: Detect points in the user’s turn where a backchannel would be welcome.
  Speech generation: Display cues inviting the user to produce a backchannel.

Talk Outline
Previous work
Material
Method
Results
Conclusions

Backchannel-Inviting Cues: Previous Work
Duncan 1972, 1973, 1974, inter alia.
  Hypothesized six turn-yielding cues in face-to-face dialogue.
  Several studies continued this line of research, but always excluded backchannels.
Ward & Tsukahara
  Region of low pitch lasting 110 ms or more.
Cathcart et al.
  Language model based on pause duration and part-of-speech tags to predict the location of BC.

Material: Columbia Games Corpus
12 task-oriented spontaneous dialogues.
Standard American English.
13 subjects: 6 female, 7 male.
Series of collaborative computer games.
No eye contact. No speech restrictions.
9 hours of dialogue.
Manual orthographic transcription, alignment.
Manual prosodic annotations (ToBI).

Material: Columbia Games Corpus
Player 1: Describer
Player 2: Follower

Backchannel-Inviting Cues
Cues displayed by the speaker “inviting” the listener to produce a backchannel response.

Backchannel-Inviting Cues: Method
3 trained annotators identified backchannels using a labeling scheme described in [Gravano et al. 2007].
To find BC-inviting cues, we compare:
  IPUs preceding Holds (the same speaker continues after the silence),
  IPUs preceding Backchannels.
IPU (Inter-Pausal Unit): Maximal sequence of words from the same speaker surrounded by silence ≥ 50 ms.
[Figure: timeline of Speaker A and Speaker B containing IPU1, IPU2, IPU3 and IPU4, with one transition labeled Hold and one labeled Backchannel.]
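The IPU definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Word` and `IPU` classes, the `segment_ipus` function and the word timings are hypothetical names and values, not part of the corpus tooling, and the input is assumed to be the time-aligned words of a single speaker.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class IPU:
    words: List[Word]

    @property
    def start(self) -> float:
        return self.words[0].start

    @property
    def end(self) -> float:
        return self.words[-1].end


def segment_ipus(words: List[Word], min_silence: float = 0.050) -> List[IPU]:
    """Group one speaker's words into IPUs.

    A new IPU starts whenever the silence between two consecutive
    words is at least `min_silence` seconds (50 ms on the slide).
    """
    ipus: List[IPU] = []
    current: List[Word] = []
    for word in sorted(words, key=lambda w: w.start):
        if current and word.start - current[-1].end >= min_silence:
            ipus.append(IPU(current))
            current = []
        current.append(word)
    if current:
        ipus.append(IPU(current))
    return ipus


# Hypothetical timings: a 20 ms gap (same IPU), then a 400 ms gap (new IPU).
example = [Word("the", 0.00, 0.20), Word("blue", 0.22, 0.50), Word("lion", 0.90, 1.20)]
ipus = segment_ipus(example)
```

Each resulting IPU would then be labeled by what follows it (a Hold or a Backchannel), which is the comparison the method relies on.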

Backchannel-Inviting Cues: Individual Cues
1. Final rising intonation: 81% of IPUs before BC end in H-H% or L-H%.
2. Higher pitch level.
3. Higher intensity level.
4. Lower NHR (noise-to-harmonics ratio; voice quality).
5. Longer IPU duration (seconds, #words).
6. Final POS bigram: 72% of IPUs before BC end in DT NN, JJ NN, or NN NN.
[Brace annotation on the slide: entire IPU, final 1.0 sec, final 0.5 sec]

Backchannel-Inviting Cues: Defining Presence of a Cue
2 representative features for each cue:

Cue                Representative features
Final intonation   Pitch slope over final 200 ms, 300 ms.
Intensity level    Mean intensity over final 500 ms, 1000 ms.
Pitch level        Mean pitch over final 500 ms, 1000 ms.
Voice quality      NHR over final 500 ms, 1000 ms.
IPU duration       Duration in ms, and in number of words.
Final POS bigram   {‘DT NN’, ‘JJ NN’, ‘NN NN’} vs. Rest (binary).

Define presence/absence based on whether the value is closer to the mean before BC or before Hold (H).
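A minimal sketch of the presence/absence rule, assuming the per-feature means before BC and before Hold have already been computed. The slide does not say how the two representative features of a cue are combined; the `any(...)` below (the cue counts as present if either feature is closer to the BC mean) is an assumption made for illustration, as are all function names, feature names and numbers.

```python
from typing import Mapping, Sequence

# The three final POS bigrams listed on the slide.
BC_INVITING_BIGRAMS = {("DT", "NN"), ("JJ", "NN"), ("NN", "NN")}


def closer_to_bc(value: float, mean_bc: float, mean_hold: float) -> bool:
    """True if `value` is nearer to the mean before BC than to the mean before Hold."""
    return abs(value - mean_bc) < abs(value - mean_hold)


def numeric_cue_present(
    values: Mapping[str, float],
    means_bc: Mapping[str, float],
    means_hold: Mapping[str, float],
) -> bool:
    """Presence of a numeric cue from its representative features.

    ASSUMPTION: the cue is present if ANY of its features is closer
    to the BC mean; the slide does not state the combination rule.
    """
    return any(closer_to_bc(values[f], means_bc[f], means_hold[f]) for f in values)


def pos_bigram_cue_present(pos_tags: Sequence[str]) -> bool:
    """Binary cue: does the IPU end in DT NN, JJ NN, or NN NN?"""
    return tuple(pos_tags[-2:]) in BC_INVITING_BIGRAMS


# Hypothetical intensity values in dB, for illustration only.
intensity = {"final_500ms": 62.0, "final_1000ms": 58.0}
present = numeric_cue_present(
    intensity,
    means_bc={"final_500ms": 63.0, "final_1000ms": 61.0},
    means_hold={"final_500ms": 59.0, "final_1000ms": 58.0},
)
```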

Top Frequencies of Complex Cues
BC-inviting cues:
  1: Final intonation
  2: Intensity level
  3: Pitch level
  4: IPU duration
  5: Voice quality
  6: Final POS bigram
Notation: digit == cue present, dot == cue absent.
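The digit/dot notation can be illustrated as a small encoder that turns the set of cues present on an IPU into a six-character pattern, which can then be tallied to get pattern frequencies. The cue identifiers, function names and sample IPUs below are hypothetical; only the numbering and the digit/dot convention come from the slide.

```python
from collections import Counter
from typing import Iterable, Set

# Cue order as numbered on the slide (1 to 6).
CUE_ORDER = [
    "final_intonation",
    "intensity_level",
    "pitch_level",
    "ipu_duration",
    "voice_quality",
    "final_pos_bigram",
]


def encode_complex_cue(present: Set[str]) -> str:
    """Digit at position i if cue i is present, dot if it is absent."""
    return "".join(
        str(i) if name in present else "."
        for i, name in enumerate(CUE_ORDER, start=1)
    )


def pattern_frequencies(ipus: Iterable[Set[str]]) -> Counter:
    """Count how often each complex-cue pattern occurs."""
    return Counter(encode_complex_cue(p) for p in ipus)


# Hypothetical IPUs, each given as the set of cues it displays.
sample = [
    {"final_intonation", "intensity_level", "pitch_level", "final_pos_bigram"},
    {"final_intonation", "intensity_level", "pitch_level", "final_pos_bigram"},
    set(),
]
freqs = pattern_frequencies(sample)
```

Under this encoding an IPU showing cues 1, 2, 3 and 6 is written `123..6`, and an IPU with no cues is written `......`.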

Backchannel-Inviting Cues: Combined Cues
[Figure: percentage of IPUs followed by a BC (y-axis) against number of cues conjointly displayed (x-axis); r² = 0.993.]
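The quantity plotted on this slide can be tabulated as follows. This is a sketch only: the function name and the toy input are hypothetical, no corpus figures are reproduced, and the model behind the reported r² is not stated in the transcript, so no curve is fitted here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bc_rate_by_cue_count(ipus: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Percentage of IPUs followed by a BC, grouped by number of cues displayed.

    Each item is (number_of_cues_present, followed_by_backchannel).
    """
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for n_cues, followed_by_bc in ipus:
        totals[n_cues] += 1
        if followed_by_bc:
            hits[n_cues] += 1
    return {n: 100.0 * hits[n] / totals[n] for n in sorted(totals)}


# Toy data, for illustration only.
toy = [(0, False), (0, False), (1, True), (1, False), (2, True)]
rates = bc_rate_by_cue_count(toy)
```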

Backchannel-Inviting Cues: IVR Systems
After each IPU from the user:
  if estimated likelihood > threshold
  then produce a backchannel
To elicit a backchannel from the user, if desired:
  Include as many cues as possible in the system’s final IPU.
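The decision rule above might look like this in Python. The slide does not specify how the likelihood is estimated, so the estimator is passed in as a function; `toy_estimate` (the fraction of the six cues present) and the 0.5 threshold are placeholders for illustration, not the authors' model.

```python
from typing import Callable, List, Sequence, Set


def backchannel_points(
    user_ipus: Sequence[Set[int]],
    estimate_likelihood: Callable[[Set[int]], float],
    threshold: float,
) -> List[int]:
    """Indices of user IPUs after which the system would produce a backchannel.

    Implements: if estimated likelihood > threshold then produce a backchannel.
    """
    return [
        i for i, cues in enumerate(user_ipus)
        if estimate_likelihood(cues) > threshold
    ]


def toy_estimate(cues: Set[int]) -> float:
    """PLACEHOLDER estimator: fraction of the six cues that are present."""
    return len(cues) / 6


# Each user IPU is represented by the set of cue numbers (1 to 6) it displays.
user_turn = [{1, 2, 3, 6}, {1}, {1, 2, 3}]
points = backchannel_points(user_turn, toy_estimate, threshold=0.5)
```

With these placeholder values only the first IPU exceeds the threshold; the third sits exactly at 0.5 and is rejected because the comparison is strict, as on the slide.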

Summary
Study of backchannel-inviting cues.
  Objective, automatically computable.
  Combined cues.
Improve turn-taking decisions of IVR systems.
Results drawn from task-oriented dialogues.
  Not necessarily generalizable.
  Suitable for most IVR domains.
SIGdial 2009: Study of turn-yielding cues.

Special thanks to…
My advisor: Julia Hirschberg.
Thesis Committee Members: Maxine Eskenazi, Kathy McKeown, Becky Passonneau, Amanda Stent.
Speech Lab at Columbia University: Stefan Benus, Fadi Biadsy, Sasha Caskey, Bob Coyne, Frank Enos, Martin Jansche, Jackson Liscombe, Sameer Maskey, Andrew Rosenberg.
Collaborators: Gregory Ward and Elisa Sneed German (Northwestern U); Ani Nenkova (UPenn); Héctor Chávez, David Elson, Michel Galley, Enrique Henestroza, Hanae Koiso, Shira Mitchell, Michael Mulley, Kristen Parton, Ilia Vovsha, Lauren Wilcox.