Turn-Taking and Affirmative Cue Words in Task-Oriented Dialogue

Presentation transcript:

Turn-Taking and Affirmative Cue Words in Task-Oriented Dialogue Agustín Gravano Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences Columbia University

Agustín Gravano - Thesis Defense - Jan 28, 2009. Special thanks to: Julia Hirschberg. Committee Members: Maxine Eskenazi, Kathy McKeown, Becky Passonneau, Amanda Stent. The Speech Lab: Stefan Benus, Fadi Biadsy, Sasha Caskey, Bob Coyne, Frank Enos, Martin Jansche, Jackson Liscombe, Sameer Maskey, Andrew Rosenberg. Collaborators: Gregory Ward and Elisa Sneed German (Northwestern U); Ani Nenkova (UPenn); Héctor Chávez, David Elson, Michel Galley, Enrique Henestroza, Hanae Koiso, Shira Mitchell, Michael Mulley, Kristen Parton, Ilia Vovsha, and Lauren Wilcox. Family and friends.

Introduction: Interactive Voice Response Systems. Quickly spreading. Mostly simple functionality. Examples of IVR systems: Let’s Go! (bus scheduling information, CMU); GOOG-411 (local information, Google). Most visible components of IVR systems: Automatic Speech Recognition (ASR) and Text-To-Speech (TTS).

Introduction: Interactive Voice Response Systems. ASR and TTS account for most IVR problems. ASR: up to 60% word error rate. TTS: described as ‘odd’ or ‘mechanical’. As ASR and TTS improve, other problems begin to show: coordination of system-user exchanges; frequent words overloaded with multiple functions.

Introduction: Coordination of Exchanges. Let’s Go! demo (http://www.speech.cs.cmu.edu/letsgo/):
S: Thank you for calling the CMU Let's Go! Bus Information System. […] What can I do for you?
U: I would like to go to the airport tomorrow morning. [silence]
S: To the airport. When do you want to arrive?
U: I'd like to arrive at 10:30. [silence]
S: Arriving at around 10:30 AM. Where do you want to leave from?
U: I'd like to leave from Carnegie Mellon. [silence]
S: From Carnegie Mellon. There is a 28X leaving Forbes Avenue […]
Turn boundary detection is currently based on silence detection. Problems: latencies and false positives.
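To make the latency and false-positive problem concrete, here is a minimal Python sketch of silence-based end-of-turn detection. It is not the Let's Go! implementation: the function name, the per-frame speech/non-speech input, and the threshold values are all assumptions chosen for illustration.

```python
from typing import List, Optional


def end_of_turn_frame(is_speech: List[bool], frame_ms: int, silence_ms: int) -> Optional[int]:
    """Return the index of the frame at which a silence-threshold detector
    would declare the end of the turn, or None if it never fires.

    is_speech  : hypothetical per-frame voice-activity decisions
    frame_ms   : duration of one frame in milliseconds
    silence_ms : how much trailing silence is required before firing
    """
    needed = -(-silence_ms // frame_ms)  # ceiling division: frames of silence required
    heard_speech = False
    run = 0  # length of the current run of silent frames
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            run = 0
        elif heard_speech:
            run += 1
            if run >= needed:
                return i
    return None


# Toy input (invented): speech, a 30 ms mid-turn pause, more speech, then silence.
frames = [True] * 5 + [False] * 3 + [True] * 4 + [False] * 10
short = end_of_turn_frame(frames, frame_ms=10, silence_ms=20)  # fires inside the pause
long_ = end_of_turn_frame(frames, frame_ms=10, silence_ms=50)  # waits past the real end
```

With the short threshold the detector fires during the mid-turn pause (a false positive); with the long threshold it fires only after the real end of the turn, but 50 ms late. That is the trade-off the slide names.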

Introduction: Overloaded Cue Words. Cue words: expressions such as ‘by the way’, ‘however’, ‘after all’. Frequent in dialogue, used for structuring discourse and shaping conversation. Affirmative cue words: ‘okay’, ‘alright’, etc. Convey acknowledgment, start a new topic, display continued attention, inter alia. Frequent in task-oriented dialogue. IVR systems: understanding and generation.

Introduction: Motivation. Understand and incorporate these and other phenomena into IVR systems, aiming at gradually approaching human-like behavior. Descriptions of associations between observed phenomena (e.g. turn exchange types) and measurable events (e.g. variations in acoustic features). No strong claims about the degree of awareness of speakers and listeners.

Outline: (1) Columbia Games Corpus (2) Study of Turn-Taking (3) Study of Affirmative Cue Words

Columbia Games Corpus. Task-oriented spontaneous dialogues. Two subjects, each with a laptop computer. Series of collaborative computer games. Soundproof booth; head-mounted mics. No eye contact; only verbal communication. No restrictions; subjects could speak freely.

Columbia Games Corpus: Cards Game, Part 1. Player 1: Describer. Player 2: Searcher. In the first part of the Cards game, each player’s screen displayed a pile of 10 cards. One of the players was asked to describe the cards in their pile, one by one. The other player was asked to search through the cards in their own pile.

Columbia Games Corpus: Cards Game, Part 2. Player 1: Describer. Player 2: Searcher. The second part of the Cards game is a matching game. Each player saw a board of cards, and the players had to describe the cards to each other as they were turned face up. Their common goal was to match as many cards as possible that had at least one image in common.

Columbia Games Corpus: Objects Game. Player 1: Describer. Player 2: Follower. In an Objects game, each player saw a board with 5-7 objects. The boards were almost identical, with one object misplaced. One of the players had to describe the position of the target object to the other player, who had to move it to the correct position.

Columbia Games Corpus. 12 sessions, 13 subjects (6 female, 7 male). 9 hours of dialogue. Orthographic transcription and alignment: 70K words, 2K unique words; non-word vocalizations (laughs, coughs, etc.). Prosodic transcription (ToBI conventions). Automatically generated session logs.

Outline: (1) Columbia Games Corpus (2) Study of Turn-Taking (3) Study of Affirmative Cue Words

Turn-Taking: Goals. Speech understanding: detection of the end of the user’s turn; detection of points in the user’s turn where a backchannel response would be welcome. Speech generation: display of cues signalling the end of the system’s turn; display of cues inviting the user to produce a backchannel response.

Turn-Taking: Previous Work. Sacks, Schegloff & Jefferson 1974: general characterization of turn-taking in conversation between two or more persons. Transition-relevance place: the current speaker may either yield the turn or continue speaking. Duncan 1972, 1973, 1974, inter alia: six turn-yielding cues in face-to-face dialogue; linear relation between the number of displayed cues and the likelihood of a turn-taking attempt.

Turn-Taking: Previous Work. Corpus and perception studies: formalized and verified some of the turn-yielding cues hypothesized by Duncan (Ford & Thompson 1996; Wennerstrom & Siegel 2003; Cutler & Pearson 1986; Wichmann & Caspers 2001). Implementations of turn-boundary detection: simulations (Ferrer et al. 2002, 2003; Edlund et al. 2005; Schlangen 2006; Atterer et al. 2008; Baumann 2008); actual systems (Raux & Eskenazi 2008, on Let’s Go!). Exploiting turn-yielding cues improves performance.

Turn-Taking: Turn-Yielding Cues. Cues displayed by the speaker when approaching a potential turn boundary.

Turn-Yielding Cues: Method. IPU (Inter-Pausal Unit): maximal sequence of words from the same speaker surrounded by silence ≥ 50 ms. [Diagram: timeline for Speakers A and B showing IPU1, IPU2 and IPU3, a Hold, and a Smooth switch.] Smooth switch: Speaker A finishes her utterance; Speaker B takes the turn with no overlapping speech. Trained annotators distinguished Smooth switches from Interruptions and Backchannels using a scheme based on Ferguson 1977 and Beattie 1982.
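The IPU definition above can be sketched in a few lines of Python. This is only an illustration that assumes time-aligned words for a single speaker are available as (token, start, end) tuples; the function name and the data layout are hypothetical and not taken from the thesis.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (token, start_sec, end_sec); layout is an assumption


def split_into_ipus(words: List[Word], min_silence: float = 0.050) -> List[List[Word]]:
    """Group one speaker's time-aligned words into inter-pausal units.

    A new IPU starts whenever the silence between the end of the previous
    word and the start of the next one is at least `min_silence` seconds
    (50 ms in the definition above).
    """
    ipus: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w[1]):
        previous_end = ipus[-1][-1][2] if ipus else None
        if previous_end is not None and word[1] - previous_end < min_silence:
            ipus[-1].append(word)   # gap too short: same IPU
        else:
            ipus.append([word])     # first word, or a long enough pause
    return ipus


# Toy alignment (invented values, for illustration only).
toy_words = [("okay", 0.00, 0.30), ("so", 0.32, 0.50),
             ("the", 0.90, 1.00), ("lion", 1.01, 1.40)]
toy_ipus = split_into_ipus(toy_words)
```

Here the 20 ms and 10 ms gaps stay inside an IPU, while the 400 ms pause between "so" and "the" starts a new one, giving two IPUs.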

Turn-Yielding Cues: Method. [Diagram: timeline for Speakers A and B showing IPU1, IPU2 and IPU3, a Hold, and a Smooth switch.] Compare IPUs preceding Holds and IPUs preceding Smooth switches. Assumption: cues are more likely to occur before Smooth switches than before Holds.

Individual Turn-Yielding Cues: (1) Final intonation, (2) Speaking rate, (3) Intensity level, (4) Pitch level, (5) Textual completion, (6) Voice quality, (7) IPU duration.

Individual Turn-Yielding Cues: 1. Final Intonation.
                    Smooth switch   Hold
H-H%                22.1%           9.1%
[!]H-L%             13.2%           29.9%
L-H%                14.1%           11.5%
L-L%                47.2%           24.7%
No boundary tone    0.7%            22.4%
Other               2.6%            2.4%
Total               100%            100%
(χ² test: p ≈ 0)
Falling, high-rising: turn-final. Plateau: turn-medial. Examination of final pitch slope shows the same results.

Individual Turn-Yielding Cues: 2. Speaking Rate. [Bar chart: speaking rate (z-score) over the entire IPU and over the final word, Smooth switch vs. Hold; (*) ANOVA: p < 0.01.] Reduced final lengthening before turn boundaries.

Individual Turn-Yielding Cues: 3/4. Intensity and Pitch Levels. [Bar chart: intensity and pitch (z-score), Smooth switch vs. Hold; (*) ANOVA: p < 0.01.] Lower intensity and pitch levels before turn boundaries.
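The charts on these slides report z-scores rather than raw measurements. The slides do not say how the normalization was done; the sketch below assumes it is computed separately for each speaker, which is a common way to factor out differences such as pitch range. The function name and input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import List, Tuple


def zscores_by_speaker(values: List[Tuple[str, float]]) -> List[float]:
    """Convert raw feature values to z-scores using each speaker's own
    mean and (population) standard deviation.

    values: (speaker_id, raw_value) pairs, e.g. mean pitch of an IPU in Hz.
    Returns the z-scores in the same order as the input.
    """
    by_speaker = defaultdict(list)
    for speaker, value in values:
        by_speaker[speaker].append(value)
    stats = {s: (mean(vs), pstdev(vs)) for s, vs in by_speaker.items()}

    normalized = []
    for speaker, value in values:
        mu, sigma = stats[speaker]
        # A speaker with no variation gets 0.0 instead of a division by zero.
        normalized.append((value - mu) / sigma if sigma > 0 else 0.0)
    return normalized


# Toy pitch values in Hz (invented): speaker A is much higher-pitched than B.
toy = [("A", 100.0), ("A", 200.0), ("B", 10.0), ("B", 30.0)]
toy_z = zscores_by_speaker(toy)
```

After normalization both speakers' values land on the same scale, so "lower pitch before turn boundaries" can be compared across speakers with very different voices.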

Individual Turn-Yielding Cues: 5. Textual Completion. Syntactic/semantic/pragmatic completion, independent of intonation and gesticulation. Automatic computation of textual completion: (1) Manually annotated a portion of the data: 3 labelers; 400 IPUs; Fleiss’ κ = 0.814. (2) Trained an SVM classifier: 80% accuracy; baseline: 55%; human: 91%.
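For readers unfamiliar with the agreement figure quoted above, Fleiss' κ can be computed from a table of per-item label counts. The sketch below uses the standard formula with invented toy counts; it does not reproduce the thesis annotation data.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a fixed number of raters per item.

    counts[i][j] is the number of raters who assigned item i to category j;
    every row must sum to the same number of raters n (n >= 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Proportion of all assignments that went to each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_categories)]
    # Observed agreement on each item.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example (invented): 4 IPUs, 3 labelers, categories [complete, incomplete].
toy_counts = [[3, 0], [2, 1], [0, 3], [1, 2]]
toy_kappa = fleiss_kappa(toy_counts)
```

The value is 1 when all raters agree on every item and close to 0 when agreement is no better than chance.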

Individual Turn-Yielding Cues: 5. Textual Completion. Labeled all IPUs in the corpus with the SVM model.
              Smooth switch   Hold
Complete      82%             53%
Incomplete    18%             47%
(χ² test: p ≈ 0)
Textual completion seems to be almost a necessary condition before switches, but not before holds.

Individual Turn-Yielding Cues: 6. Voice Quality. [Bar chart: jitter, shimmer and NHR (z-score), Smooth switch vs. Hold; (*) ANOVA: p < 0.01.] Higher jitter, shimmer and NHR before turn boundaries.

Individual Turn-Yielding Cues: 7. IPU Duration. [Bar chart: IPU duration (z-score), Smooth switch vs. Hold; (*) ANOVA: p < 0.01.] Longer IPUs before turn boundaries.

Turn-Yielding Cues: Individual Cues. Final intonation; speaking rate; intensity level; pitch level; textual completion; voice quality; IPU duration.

Turn-Yielding Cues: Combined Cues. [Plot: percentage of turn-taking attempts (y-axis) against number of cues conjointly displayed (x-axis).]
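The quantity plotted on this slide can be tabulated as follows. This is a sketch under stated assumptions: each IPU is represented by a list of booleans (one per cue, True if the cue is displayed) and a flag saying whether a turn-taking attempt followed. The representation and the toy data are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def attempt_rate_by_cue_count(ipus: Iterable[Tuple[List[bool], bool]]) -> Dict[int, float]:
    """For each number of cues displayed together, return the percentage
    of IPUs that were followed by a turn-taking attempt.

    ipus: (cues_present, followed_by_attempt) pairs.
    """
    totals = defaultdict(int)
    attempts = defaultdict(int)
    for cues, attempted in ipus:
        n_cues = sum(cues)           # True counts as 1
        totals[n_cues] += 1
        attempts[n_cues] += int(attempted)
    return {n: 100.0 * attempts[n] / totals[n] for n in sorted(totals)}


# Toy data (invented): three cues per IPU.
toy_ipus = [
    ([False, False, False], False),
    ([True, False, False], False),
    ([True, False, False], True),
    ([True, True, True], True),
]
toy_rates = attempt_rate_by_cue_count(toy_ipus)
```

Plotting the returned percentages against the cue count gives a curve of the kind shown on the slide.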

Turn-Taking: Backchannel-Inviting Cues. Cues displayed by the speaker inviting the listener to produce a backchannel response.

Backchannel-Inviting Cues: Method. [Diagram: timeline for Speakers A and B showing IPU1, IPU2, IPU3 and IPU4, a Hold, and a Backchannel.] Compare IPUs preceding Holds and IPUs preceding Backchannels. Assumption: cues are more likely to occur before Backchannels than before Holds.

Backchannel-Inviting Cues: Individual Cues. Final rising intonation: H-H% or L-H%. Higher intensity level. Higher pitch level. Longer IPU duration. Lower NHR. Final POS bigram: DT NN, JJ NN, or NN NN.
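The two categorical cues in this list (final intonation and final POS bigram) are easy to check mechanically. The sketch below assumes each IPU already carries a ToBI boundary-tone label and Penn Treebank-style POS tags; the function name and the input format are hypothetical. The remaining cues are continuous and would need thresholds that the slide does not give.

```python
from typing import Dict, List

RISING_TONES = {"H-H%", "L-H%"}                                  # from the slide
BC_POS_BIGRAMS = {("DT", "NN"), ("JJ", "NN"), ("NN", "NN")}      # from the slide


def categorical_bc_cues(final_tone: str, pos_tags: List[str]) -> Dict[str, bool]:
    """Report which of the two categorical backchannel-inviting cues
    an IPU displays.

    final_tone: boundary-tone label of the IPU, e.g. "L-H%"
    pos_tags  : POS tags of the IPU's words, in order
    """
    return {
        "rising_intonation": final_tone in RISING_TONES,
        "final_pos_bigram": tuple(pos_tags[-2:]) in BC_POS_BIGRAMS,
    }


# Toy IPU (invented), tagged IN DT JJ NN, ending in a low-rise.
toy_cues = categorical_bc_cues("L-H%", ["IN", "DT", "JJ", "NN"])
```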

Backchannel-Inviting Cues: Combined Cues. [Plot: percentage of IPUs followed by a BC (y-axis) against number of cues conjointly displayed (x-axis); annotated with r² = 0.812 and r² = 0.993.]
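The transcript does not say which fitted models the two r² values belong to. As a reminder of what the statistic measures, here is a small sketch that computes r² for an ordinary least-squares straight line; the function name is hypothetical and the data points are invented, not read off the slide.

```python
from typing import Sequence


def linear_r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Coefficient of determination of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Invented points: number of cues vs. percentage of IPUs followed by a backchannel.
toy_cue_counts = [0, 1, 2, 3]
toy_percentages = [1.0, 3.0, 5.0, 7.0]   # perfectly linear toy data
toy_r2 = linear_r_squared(toy_cue_counts, toy_percentages)
```

A value near 1 means the fitted curve explains almost all of the variation in the percentages; a clearly lower value for a straight line than for a curved fit would suggest the relation is not linear.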

Turn-Taking: Overlapping Speech. [Diagram: timeline for Speakers A and B showing intermediate phrases ip1, ip2 and ip3, a Hold, and an Overlap.] 95% of overlaps start during the turn-final intermediate phrase (ip). We look for turn-yielding cues in the second-to-last intermediate phrase (e.g., ip2).

Turn-Taking: Overlapping Speech. Cues found in second-to-last ips: higher speaking rate; lower intensity; higher jitter, shimmer, NHR. All cues match the corresponding cues found in (non-overlapping) smooth switches. Cues seem to extend further back in the turn, becoming more prominent toward turn endings. Future research: generalize the model of discrete turn-yielding cues.

Outline: (1) Columbia Games Corpus (2) Study of Turn-Taking (3) Study of Affirmative Cue Words

Affirmative Cue Words. 8% of the words in the Columbia Games Corpus: okay, right, yeah, mm-hm, alright, uh-huh, gotcha, huh, yep, yes, yup. 10 discourse/pragmatic functions: Acknowledgment/agreement, Literal modifier, Backchannel, Cue beginning/ending discourse segment, Check with the interlocutor, Stall/Filler, Back from a task, Pivot beginning/ending (Ack+Cue). Labeled by 3 trained annotators. Fleiss’ κ = 0.69: ‘substantial’ agreement.

Affirmative Cue Words: Examples.
Literal modifier: “that’s pretty much okay”
Backchannel: Speaker 1: “between the yellow mermaid and the whale” / Speaker 2: “okay” / Speaker 1: “and it is”
Cue beginning discourse segment: “okay we’re gonna be placing the blue moon”

Affirmative Cue Words: Interactive Voice Response Systems. Speech understanding: must interpret the user’s input correctly. Speech generation: need to convey potentially ambiguous terms with the appropriate parameters for the intended meaning.

Affirmative Cue Words: Previous Work. Disambiguation of single-word cue phrases (well, now, say, so, like, really, …): discourse vs. sentential senses. Hirschberg & Litman 1987, 1993; Litman 1994, 1996; Zufferey & Popescu-Belis 2004; Lai 2008. Affirmative cue words: Hockey 1991, 1992; Kowtko 1997: intonational differences across discourse/pragmatic functions. Jurafsky et al. 1998: lexical identity is a strong cue to word function.

Affirmative Cue Words: Descriptive Statistics. Large contextual differences. Backchannels always occur as separate turns. Cue beginnings occur mostly in turn-initial position. Modifier instances of ‘right’ occur in all positions within the turn, but rarely as separate turns. Acknowledgments occur in turn-initial, medial and final positions, and also as separate turns.

Affirmative Cue Words: Descriptive Statistics. Final intonation: Backchannel: rising (H-H%, L-H%); Cue beginning: falling (L-L%); Check: high-rising (H-H%). Intensity: Backchannel: high; Cue beginning: high; Cue ending: low.

Affirmative Cue Words: Perception Study of ‘okay’. ‘Okay’ is the most frequent affirmative cue word (ACW) in the corpus. How do hearers disambiguate its meaning? Acoustic/prosodic/phonetic vs. contextual information? 20 subjects classified 54 tokens of ‘okay’ into {Ack, BC, CueBeg} in two conditions: (1) no context available: only the word ‘okay’; (2) context available: 2 full speaker turns. [Diagram: a contextualized ‘okay’ within turns by Speaker A and Speaker B.]

Affirmative Cue Words: Perception Study of ‘okay’. No context available: very low inter-subject agreement; correlations of word function with acoustic/prosodic/phonetic features. Context available: higher inter-subject agreement; contextual features trump acoustic/prosodic/phonetic features of ‘okay’. Exception: final intonation of ‘okay’.

Affirmative Cue Words: Automatic Classification. Identify automatically the function of ACWs. Classification into discourse vs. sentential function is insufficient for ACWs: ‘right’: 15% discourse, 85% sentential; all other ACWs: 99% discourse, 1% sentential. New classification tasks: (1) detection of an acknowledgment function: Acknowledgment vs. No acknowledgment; (2) detection of a discourse segment boundary function: SegBeg vs. SegEnd vs. None.

Affirmative Cue Words: Automatic Classification. Lexical features: lexical id, POS tags, n-grams. Discourse features: position of target word in IPU, turn, conversation. Timing features: duration of word, IPU, turn; amount of overlaps; latencies. Acoustic features: pitch, intensity, pitch slope, voice quality. Phonetic features: id, duration of each phone.

Affirmative Cue Words: Automatic Classification. Error rate:
                                    Discourse Boundary   Acknowledgment
Baseline (1)                        18.6%                15.3%
SVM: Word-only                      14.4%                15.0%
SVM: Online (up to current IPU)     10.1%                6.7%
SVM: Full model                     6.9%                 4.5%
Human labelers                      5.7%                 3.3%
(1) Discourse Boundary: majority class = no boundary. Acknowledgment: {right, huh} → no ACK; all others → ACK.
(*) Significantly different (Wilcoxon signed rank sum test; p < 0.05). [Brackets on the slide mark the row pairs compared.]
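The Acknowledgment baseline in footnote (1) is a pure word-identity rule, which can be written down directly. The sketch below implements that rule and an error-rate helper; the function names and the toy tokens are invented, and the numbers it produces are unrelated to the 15.3% reported on the slide.

```python
from typing import List, Tuple

NO_ACK_WORDS = {"right", "huh"}   # per footnote (1): these are labeled "no ACK"


def baseline_is_ack(word: str) -> bool:
    """Baseline rule: every affirmative cue word conveys acknowledgment
    unless it is 'right' or 'huh'."""
    return word.lower() not in NO_ACK_WORDS


def error_rate(tokens: List[Tuple[str, bool]]) -> float:
    """Percentage of tokens the baseline misclassifies.

    tokens: (word, gold_is_ack) pairs, where gold_is_ack is the label
    assigned by human annotators.
    """
    wrong = sum(baseline_is_ack(word) != gold for word, gold in tokens)
    return 100.0 * wrong / len(tokens)


# Toy annotated tokens (invented): one 'right' is actually an acknowledgment.
toy_tokens = [("okay", True), ("right", False), ("right", True), ("yeah", True)]
toy_error = error_rate(toy_tokens)
```

The baseline errs exactly when a word is used in its less typical function, which is why context and prosody are needed to close the gap to the human labelers.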

Affirmative Cue Words: Speaker Entrainment. In conversation, people adapt the way they speak to match their partner: referring expressions (Brennan 1996); syntactic constructions (Reitter et al. 2006); intensity (Coulston et al. 2002, Ward & Litman 2007). Entrainment at different levels (lexical, syntactic, semantic): key for both production and understanding, and facilitates interaction (Pickering & Garrod 2004, Goleman 2006); predictor of task success (MapTask; Reitter & Moore 2007).

Affirmative Cue Words: Speaker Entrainment. Two novel measures of entrainment based on usage of high-frequency words (HFW), including ACWs. Entrainment of HFW correlates with: (+) game score → task success; (+) proportion of overlaps, (–) proportion of interruptions, (–) latency of smooth switches → dialogue coordination. Future work: establish causality relation; impact on IVR system design and/or evaluation.

Outline: (1) Columbia Games Corpus (2) Study of Turn-Taking (3) Study of Affirmative Cue Words

Contributions. Columbia Games Corpus: valuable dataset for studying spontaneous task-oriented dialogue. Study of Turn-Taking: turn-yielding cues; backchannel-inviting cues; objective, automatically computable; combined cues; improve turn-taking decisions of IVR systems.

Contributions. Study of Affirmative Cue Words: descriptive statistics and perceptual results; automatic classification; speaker entrainment; understanding and generation in IVR systems. Results drawn from task-oriented dialogues, thus not necessarily generalizable, but suitable for most IVR domains. Necessary steps towards the ambitious, long-term goal of human-like speech systems.

Future Work. Additional turn-taking cues. Voice quality? Novel ways to combine cues. Weights? Study cues that extend over entire turns, increasing near potential turn boundaries. Characterize interruptions. Speaker entrainment: affirmative cue words; turn-taking behavior; acoustic/prosodic variation.
