Agustín Gravano (1,2), Julia Hirschberg (1)

Presentation transcript:

Turn-Yielding Cues in Task-Oriented Dialogue. Agustín Gravano (1,2), Julia Hirschberg (1). (1) Columbia University, New York, USA; (2) Universidad de Buenos Aires, Argentina.

Introduction: Interactive Voice Response (IVR) Systems. Quickly spreading, but perceived as “uncomfortable” and “awkward”. ASR+TTS account for most IVR problems. Other problems revealed: coordination of system-user exchanges, with long pauses after user turns and interruptions. Modeling turn-taking behavior should lead to improved system-user coordination. Agustín Gravano SIGdial 2009

Introduction: Goal. Learn when the speaker is likely to end her/his conversational turn. Find turn-yielding cues: cues displayed by the speaker when approaching a potential turn boundary. This should improve the coordination of IVRs. Speech understanding: detect the end of the user’s turn. Speech generation: display cues signalling the end of the system’s turn.

Talk Outline: Previous work. Material. Method. Results. Conclusions.

Previous Work on Turn-Taking. Duncan 1972, 1973, 1974, inter alia: hypothesized 6 turn-yielding cues in face-to-face dialogue; conjectured a linear relation between the number of displayed cues and the likelihood of a turn-taking attempt. Studies formalized and verified some of Duncan’s hypotheses [For&Tho96; Wen&Sie03; Cut&Pea86; Wic&Cas01]. Implementations of turn-boundary detection: simulations [Fer&al.02,03; Edl&al.05; Sch06; Att&al.08; Bau08]; actual systems: Let’s Go! [Rau&Esk08]. Exploiting turn-yielding cues improves performance.

Material: Columbia Games Corpus. 12 task-oriented spontaneous dialogues. Standard American English. 13 subjects: 6 female, 7 male. Series of collaborative computer games. No eye contact. No speech restrictions. 9 hours of dialogue. Manual orthographic transcription, alignment. Manual prosodic annotations (ToBI).

Material: Columbia Games Corpus. Player 1: Describer. Player 2: Follower. In an Objects game, each player saw a board with 5-7 objects. The boards were almost identical, with one object misplaced. One of the players had to describe the position of the target object to the other player, who had to move it to the correct position.

Turn-Yielding Cues: cues displayed by the speaker when approaching a potential turn boundary.

Method: Turn-Yielding Cues. IPU (Inter-Pausal Unit): maximal sequence of words from the same speaker surrounded by silence ≥ 50 ms. [Diagram: timelines for Speaker A and Speaker B with IPU1, IPU2 and IPU3; transitions labeled Hold and Smooth switch.] Smooth switch: Speaker A finishes her utterance; Speaker B takes the turn with no overlapping speech. Trained annotators distinguished Smooth switches from Interruptions and Backchannels using a scheme based on Ferguson 1977, Beattie 1982.
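The IPU definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Word` record, its field names, and the example timestamps are hypothetical and are not taken from the corpus or from the authors' tools. The only detail taken from the slide is the 50 ms silence threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical record for one time-aligned word of a single speaker.
    text: str
    start: float  # seconds
    end: float    # seconds


def segment_ipus(words: List[Word], min_silence: float = 0.050) -> List[List[Word]]:
    """Group one speaker's words into IPUs.

    A new IPU starts whenever the silence between two consecutive
    words is at least `min_silence` seconds (50 ms on the slide).
    """
    ipus: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.start):
        if ipus and word.start - ipus[-1][-1].end < min_silence:
            ipus[-1].append(word)   # gap too short: same IPU
        else:
            ipus.append([word])     # first word, or a long enough silence
    return ipus


# Made-up example: three words in quick succession, then a long pause.
words = [
    Word("move", 0.00, 0.30),
    Word("the", 0.32, 0.40),
    Word("lion", 0.41, 0.80),
    Word("okay", 1.50, 1.90),
]
ipu_texts = [" ".join(w.text for w in ipu) for ipu in segment_ipus(words)]
```

With these made-up timestamps the first three words fall into one IPU and the last word forms a second one. Classifying the transition after each IPU (Hold, Smooth switch, Interruption, Backchannel) was done by trained annotators and is not attempted here.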

Method: Turn-Yielding Cues. [Diagram: timelines for Speaker A and Speaker B with IPU1, IPU2 and IPU3; transitions labeled Hold and Smooth switch.] To find turn-yielding cues, we compare IPUs preceding Holds with IPUs preceding Smooth switches. ~200 features: acoustic, prosodic, lexical, syntactic.

Turn-Yielding Cues: Individual Cues. Final intonation: falling (L-L%) or high-rising (H-H%). Faster speaking rate; reduction of final lengthening. Lower intensity level. Lower pitch level. Higher jitter, shimmer, NHR; related to perception of voice quality. Longer IPU duration (seconds and #words).

Turn-Yielding Cues: Individual Cues. Textual completion (independent of intonation).
(1) Manually annotated a portion of the data. Labelers read up to the end of a target IPU (no right context) and judged whether it could constitute a ‘complete’ utterance. 400 tokens. K=0.81.
(2) Trained an SVM classifier. 19 lexical + syntactic features. Accuracy: 80%. Maj-class baseline: 55%. Human agreement: 91%.
(3) Labeled all IPUs in the corpus with the SVM model.
Before smooth switches: 18% incomplete, 82% complete. Before holds: 47% incomplete, 53% complete. (χ² test, p ≈ 0)
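As a rough illustration of the agreement figure reported in step (1), the sketch below computes a chance-corrected agreement score for two labelers. It assumes that K denotes Cohen's kappa between two labelers, which the slide does not state. The label sequences are made up and are not corpus data.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two labelers over the same items."""
    n = len(labels_a)
    # Proportion of items on which the two labelers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each labeler's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up judgments: "C" = complete, "I" = incomplete.
labeler_1 = ["C", "C", "I", "I"]
labeler_2 = ["C", "C", "I", "C"]
kappa = cohens_kappa(labeler_1, labeler_2)
```

In this toy case the labelers agree on 3 of 4 items, chance agreement is 0.5, and kappa comes out at 0.5.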

Turn-Yielding Cues: Individual Cues. Final intonation: L-L% or H-H%. Faster speaking rate. Lower intensity level. Lower pitch level. Higher jitter, shimmer, NHR. Longer IPU duration. Textual completion.

Turn-Yielding Cues: Defining Presence of a Cue. 2-3 representative features for each cue:
Final intonation: abs. pitch slope over final 200 ms, 300 ms.
Speaking rate: syllables/sec, phonemes/sec over IPU.
Intensity level: mean intensity over final 500 ms, 1000 ms.
Pitch level: mean pitch over final 500 ms, 1000 ms.
Voice quality: jitter, shimmer, NHR over final 500 ms.
IPU duration: duration in ms, and in number of words.
Textual completion: complete vs. incomplete (binary).
Define presence/absence of a cue based on whether the feature value is closer to the mean before smooth switches (S) or to the mean before holds (H).
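The presence/absence rule above can be sketched for a single feature as follows. The function name, the tie-breaking choice, and the example numbers are assumptions made for illustration; the slide does not say how ties are handled or how the 2-3 features of one cue are combined, so neither is modeled here.

```python
def cue_present(value: float, mean_before_switch: float, mean_before_hold: float) -> bool:
    """Return True if `value` is closer to the mean observed before
    smooth switches (S) than to the mean observed before holds (H).

    Ties count as "absent" here; that is an arbitrary assumption.
    """
    return abs(value - mean_before_switch) < abs(value - mean_before_hold)


# Hypothetical means for mean intensity over the final 500 ms (dB).
MEAN_S = 60.0  # made-up mean before smooth switches
MEAN_H = 64.0  # made-up mean before holds

low_intensity_ipu = cue_present(61.0, MEAN_S, MEAN_H)   # closer to S
high_intensity_ipu = cue_present(63.5, MEAN_S, MEAN_H)  # closer to H
```

For a binary feature such as textual completion, the cue can be read off directly from the label instead of comparing against means.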

Top Frequencies of Complex Cues. Digit == cue present; dot == cue absent. Turn-yielding cues: 1: Final intonation; 2: Speaking rate; 3: Intensity level; 4: Pitch level; 5: IPU duration; 6: Voice quality; 7: Completion.
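The digit/dot notation above can be reproduced with a short helper. This is a sketch: the function names and the example set of cues are hypothetical, and only the cue numbering is taken from the slide.

```python
from typing import List, Set

# Cue order follows the numbering on the slide (1 to 7).
CUES: List[str] = [
    "final intonation",
    "speaking rate",
    "intensity level",
    "pitch level",
    "IPU duration",
    "voice quality",
    "completion",
]


def encode_cues(present: Set[str]) -> str:
    """Write a cue combination as one character per cue:
    the cue's digit if it is present, a dot if it is absent."""
    return "".join(
        str(number) if name in present else "."
        for number, name in enumerate(CUES, start=1)
    )


def count_cues(pattern: str) -> int:
    """Number of cues conjointly displayed in an encoded pattern."""
    return sum(ch != "." for ch in pattern)


# Made-up example: an IPU showing cues 1, 2 and 7.
pattern = encode_cues({"final intonation", "speaking rate", "completion"})
```

Here `pattern` is the string `12....7`, and `count_cues(pattern)` gives 3.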

Turn-Yielding Cues: Combined Cues. [Plot: percentage of turn-taking attempts against number of cues conjointly displayed; r² = 0.969.]
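To make the r² figure concrete, the sketch below fits a least-squares line and reports the coefficient of determination. The data points are invented for illustration and are not the values behind the plot; only the kind of relation (percentage of turn-taking attempts against number of cues) comes from the slide.

```python
from typing import Sequence


def linear_r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """r^2 of an ordinary least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Invented data: number of cues displayed vs. percentage of attempts.
num_cues = [0, 1, 2, 3, 4, 5, 6, 7]
attempt_pct = [5.0, 12.0, 20.0, 31.0, 38.0, 52.0, 60.0, 71.0]
r2 = linear_r_squared(num_cues, attempt_pct)
```

An r² close to 1 means that a straight line explains almost all of the variation in the percentage of turn-taking attempts.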

Turn-Yielding Cues: IVR Systems. After each IPU from the user: if estimated likelihood > threshold, then take the turn. To signal the end of a system’s turn: include as many cues as possible in the system’s final IPU.
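The decision rule above can be sketched as follows. The slide does not say how the likelihood is estimated, so `estimate_likelihood` is a hypothetical placeholder (the share of the seven cues that are present), and the default threshold of 0.5 is likewise an arbitrary assumption.

```python
TOTAL_CUES = 7  # number of turn-yielding cues listed in the talk


def estimate_likelihood(num_cues_present: int) -> float:
    """Hypothetical estimate: the fraction of cues present.
    A real system would fit this mapping to data."""
    return num_cues_present / TOTAL_CUES


def should_take_turn(num_cues_present: int, threshold: float = 0.5) -> bool:
    """Rule from the slide: take the turn only if the estimated
    likelihood exceeds the threshold."""
    return estimate_likelihood(num_cues_present) > threshold


# After each user IPU, the system would check the rule.
take_after_six_cues = should_take_turn(6)  # many cues: take the turn
take_after_one_cue = should_take_turn(1)   # few cues: keep waiting
```

Raising the threshold trades fewer interruptions for longer pauses after user turns, and lowering it does the opposite.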

Summary. Study of turn-yielding cues: objective, automatically computable. Combined cues: improve turn-taking decisions of IVR systems. Results drawn from task-oriented dialogues: not necessarily generalizable; suitable for most IVR domains. Interspeech 2009: study of backchannel-inviting cues.

Special thanks to… Julia Hirschberg. Thesis Committee Members: Maxine Eskenazi, Kathy McKeown, Becky Passonneau, Amanda Stent. Speech Lab at Columbia University: Stefan Benus, Fadi Biadsy, Sasha Caskey, Bob Coyne, Frank Enos, Martin Jansche, Jackson Liscombe, Sameer Maskey, Andrew Rosenberg. Collaborators: Gregory Ward and Elisa Sneed German (Northwestern U); Ani Nenkova (UPenn); Héctor Chávez, David Elson, Michel Galley, Enrique Henestroza, Hanae Koiso, Shira Mitchell, Michael Mulley, Kristen Parton, Ilia Vovsha, Lauren Wilcox.