Published by Solomon Borman. Modified over 10 years ago.
1
Turn-Taking and Affirmative Cue Words in Task-Oriented Dialogue
Agustín Gravano
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences
Columbia University
2
Agustín Gravano - Thesis Defense - Jan 28, 2009
Special thanks to:
Julia Hirschberg
Committee Members: Maxine Eskenazi, Kathy McKeown, Becky Passonneau, Amanda Stent.
The Speech Lab: Stefan Benus, Fadi Biadsy, Sasha Caskey, Bob Coyne, Frank Enos, Martin Jansche, Jackson Liscombe, Sameer Maskey, Andrew Rosenberg.
Collaborators: Gregory Ward and Elisa Sneed German (Northwestern U); Ani Nenkova (UPenn); Héctor Chávez, David Elson, Michel Galley, Enrique Henestroza, Hanae Koiso, Shira Mitchell, Michael Mulley, Kristen Parton, Ilia Vovsha, and Lauren Wilcox.
Family and friends
3
Introduction: Interactive Voice Response Systems
Quickly spreading.
Mostly simple functionality.
Examples of IVR systems:
  Let’s Go!: Bus scheduling information (CMU).
  GOOG-411: Local information (Google).
Most visible components of IVR systems:
  Automatic Speech Recognition (ASR)
  Text-To-Speech (TTS)
4
Introduction: Interactive Voice Response Systems
ASR and TTS account for most IVR problems.
  ASR: Up to 60% word error rate.
  TTS: Described as ‘odd’ or ‘mechanical’.
As ASR and TTS improve, other problems begin to show:
  Coordination of system-user exchanges.
  Frequent words overloaded with multiple functions.
5
Introduction: Coordination of Exchanges
Let’s Go! demo:
  S: Thank you for calling the CMU Let's Go! Bus Information System. […] What can I do for you?
  U: I would like to go to the airport tomorrow morning. [silence]
  S: To the airport. When do you want to arrive?
  U: I'd like to arrive at 10:30. [silence]
  S: Arriving at around 10:30 AM. Where do you want to leave from?
  U: I'd like to leave from Carnegie Mellon. [silence]
  S: From Carnegie Mellon. There is a 28X leaving Forbes Avenue […]
Turn boundary detection is currently based on silence detection.
Problems: latencies and false positives.
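The latency and false-positive problems of silence-based turn boundary detection can be illustrated with a minimal Python sketch. This is not the Let's Go! implementation: the function name, the 10 ms frame size and the 700 ms silence threshold are assumptions chosen only for illustration.

```python
from typing import List, Optional


def detect_end_of_turn(
    frames_are_speech: List[bool],
    frame_ms: int = 10,               # hypothetical frame size
    silence_threshold_ms: int = 700,  # hypothetical threshold
) -> Optional[int]:
    """Return the time (ms) at which the end of turn is declared, or None.

    The turn is declared finished once the user has spoken and then stayed
    silent for at least silence_threshold_ms.
    """
    silence_ms = 0
    heard_speech = False
    for i, is_speech in enumerate(frames_are_speech):
        if is_speech:
            heard_speech = True
            silence_ms = 0
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= silence_threshold_ms:
                return (i + 1) * frame_ms
    return None
```

Under these assumed settings, the system reacts only 700 ms after the user stops speaking (latency), and any mid-turn pause of 700 ms or more is mistaken for a turn boundary (false positive). Lowering the threshold reduces the first problem and worsens the second.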
6
Introduction: Overloaded Cue Words
Cue words: expressions such as by the way, however, after all.
  Frequent in dialogue, used for structuring discourse and shaping conversation.
Affirmative cue words: okay, alright, etc.
  Convey acknowledgment, start a new topic, display continued attention, inter alia.
  Frequent in task-oriented dialogue.
IVR systems: understanding and generation.
7
Introduction: Motivation
Understand and incorporate these and other phenomena into IVR systems, aiming at gradually approaching human-like behavior.
Descriptions of associations between observed phenomena (e.g. turn exchange types) and measurable events (e.g. variations in acoustic features).
No strong claims about the degree of awareness of speakers and listeners.
8
(1) Columbia Games Corpus
(2) Study of Turn-Taking
(3) Study of Affirmative Cue Words
9
Columbia Games Corpus
Task-oriented spontaneous dialogues.
Two subjects, each with a laptop computer.
Series of collaborative computer games.
Soundproof booth; head-mounted mics.
No eye contact; only verbal communication.
No restrictions; subjects could speak freely.
10
Columbia Games Corpus: Cards Game, Part 1
Player 1: Describer
Player 2: Searcher
In the first part of the Cards game, each player’s screen displayed a pile of 10 cards.
One of the players was asked to describe the cards in their pile, one by one.
The other player was asked to search through the cards in their own pile.
11
Columbia Games Corpus: Cards Game, Part 2
Player 1: Describer
Player 2: Searcher
The second Cards game is a matching game.
Each player saw a board of cards, and the players had to describe the cards to each other as they were turned face up.
Their common goal was to match as many cards as possible that had at least one image in common.
12
Columbia Games Corpus: Objects Game
Player 1: Describer
Player 2: Follower
In an Objects game, each player saw a board with 5-7 objects.
The boards were almost identical, with one object misplaced.
One of the players had to describe the position of the target object to the other player, who had to move it to the correct position.
13
Columbia Games Corpus
12 sessions, 13 subjects (6 female, 7 male).
9 hours of dialogue.
Orthographic transcription and alignment.
  70K words, 2K unique words.
  Non-word vocalizations (laughs, coughs, etc.).
Prosodic transcription (ToBI conventions).
Automatically generated session logs.
14
(1) Columbia Games Corpus
(2) Study of Turn-Taking
(3) Study of Affirmative Cue Words
15
Turn-Taking: Goals
Speech understanding:
  Detection of the end of the user’s turn.
  Detection of points in the user’s turn where a backchannel response would be welcome.
Speech generation:
  Display of cues signalling the end of the system’s turn.
  Display of cues inviting the user to produce a backchannel response.
16
Turn-Taking: Previous Work
Sacks, Schegloff & Jefferson 1974.
  General characterization of turn-taking in conversation between two or more persons.
  Transition-relevance place: The current speaker may either yield the turn, or continue speaking.
Duncan 1972, 1973, 1974, inter alia.
  Six turn-yielding cues in face-to-face dialogue.
  Linear relation between the number of displayed cues and the likelihood of a turn-taking attempt.
17
Turn-Taking: Previous Work
Corpus and perception studies.
  Formalized and verified some of the turn-yielding cues hypothesized by Duncan.
  Ford & Thompson 1996; Wennerstrom & Siegel 2003; Cutler & Pearson 1986; Wichmann & Caspers 2001.
Implementations of turn-boundary detection.
  Simulations (Ferrer et al. 2002, 2003; Edlund et al. 2005; Schlangen 2006; Atterer et al. 2008; Baumann 2008).
  Actual systems (Raux & Eskenazi 2008, on Let’s Go!).
  Exploiting turn-yielding cues improves performance.
18
Turn-Taking: Turn-Yielding Cues
Cues displayed by the speaker when approaching a potential turn boundary.
19
Turn-Yielding Cues: Method
IPU (Inter-Pausal Unit): Maximal sequence of words from the same speaker surrounded by silence ≥ 50 ms.
[Diagram: timelines for Speaker A and Speaker B showing IPU1, IPU2 and IPU3, with a Hold and a Smooth switch marked between IPUs.]
Smooth switch: Speaker A finishes her utterance; speaker B takes the turn with no overlapping speech.
Trained annotators distinguished Smooth switches from Interruptions and Backchannels using a scheme based on Ferguson 1977, Beattie 1982.
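The IPU definition above can be sketched in Python for a single speaker's time-aligned words. The `Word` tuple layout and the function name are assumptions made for this sketch; only the 50 ms silence threshold comes from the slide.

```python
from typing import List, Tuple

# (token, start time in seconds, end time in seconds); hypothetical layout
Word = Tuple[str, float, float]


def segment_ipus(words: List[Word], min_silence_s: float = 0.050) -> List[List[Word]]:
    """Group one speaker's words into inter-pausal units (IPUs).

    A new IPU starts whenever the silence since the previous word is at
    least min_silence_s (50 ms, as on the slide).
    """
    ipus: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w[1]):
        previous_end = ipus[-1][-1][2] if ipus else None
        if previous_end is not None and word[1] - previous_end < min_silence_s:
            ipus[-1].append(word)   # pause too short: same IPU
        else:
            ipus.append([word])     # first word, or pause long enough
    return ipus
```

The function has to be applied to each speaker separately, since an IPU is defined over words from the same speaker.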
20
Turn-Yielding Cues: Method
[Diagram: timelines for Speaker A and Speaker B showing IPU1, IPU2 and IPU3, with a Hold and a Smooth switch marked between IPUs.]
Compare IPUs preceding Holds and IPUs preceding Smooth switches.
Assumption: Cues are more likely to occur before Smooth switches than before Holds.
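The comparison above needs each IPU to be tagged by what follows it. The sketch below is a deliberately simplified, automatic stand-in for that step, not the labeling used in the study, which relied on trained annotators to separate Smooth switches from Interruptions and Backchannels. The `IPU` tuple layout, the label names, and the reading of a Hold as "the same speaker continues" are assumptions.

```python
from typing import List, Tuple

# (speaker, start time in seconds, end time in seconds); hypothetical layout
IPU = Tuple[str, float, float]


def label_transitions(ipus: List[IPU]) -> List[str]:
    """Label what follows each IPU except the last one.

    'hold'    : the next IPU is by the same speaker
    'switch'  : the next IPU is by the other speaker and does not overlap
    'overlap' : the next IPU is by the other speaker and starts early
    """
    ordered = sorted(ipus, key=lambda u: u[1])
    labels = []
    for current, following in zip(ordered, ordered[1:]):
        if following[0] == current[0]:
            labels.append("hold")
        elif following[1] >= current[2]:
            labels.append("switch")
        else:
            labels.append("overlap")
    return labels
```

Features measured on the IPUs labeled `hold` can then be compared against those labeled `switch`.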
21
Individual Turn-Yielding Cues
1. Final intonation
2. Speaking rate
3. Intensity level
4. Pitch level
5. Textual completion
6. Voice quality
7. IPU duration
22
Individual Turn-Yielding Cues: 1. Final Intonation
Final boundary tone: Smooth switch / Hold
  H-H%: 22.1% / 9.1%
  [!]H-L%: 13.2% / 29.9%
  L-H%: 14.1% / 11.5%
  L-L%: 47.2% / 24.7%
  No boundary tone: 0.7% / 22.4%
  Other: 2.6% / 2.4%
  Total: 100%
(χ² test: p ≈ 0)
Falling, high-rising: turn-final.
Plateau: turn-medial.
Examination of final pitch slope shows same results.
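The test reported above compares how final boundary tones are distributed before Smooth switches and before Holds. The slide gives percentages only, so the sketch below just shows how a Pearson χ² statistic is computed from a table of raw counts. The function name is hypothetical, and any counts passed to it would have to come from the corpus itself.

```python
from typing import List, Tuple


def chi_square_statistic(table: List[List[float]]) -> Tuple[float, int]:
    """Return (Pearson chi-square statistic, degrees of freedom).

    table[i][j] is the observed count for row category i (e.g. a boundary
    tone) and column category j (e.g. Smooth switch or Hold).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof
```

Turning the statistic into a p-value needs the χ² distribution with `dof` degrees of freedom, which a statistics library would provide.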
23
Individual Turn-Yielding Cues: 2. Speaking Rate
[Figure: speaking rate (z-score) before Smooth switches vs. Holds, measured over the entire IPU and over the final word. (*) ANOVA: p < 0.01.]
Reduced final lengthening before turn boundaries.
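The plots for the acoustic cues report z-scores. One common way to obtain such values is to normalize each feature separately for each speaker, so that speakers with different habitual rates or levels become comparable. The slides do not spell out the normalization, so the per-speaker choice below, like the function name, is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def zscore_by_speaker(values: List[Tuple[str, float]]) -> List[float]:
    """Z-score each (speaker, value) pair against that speaker's own data."""
    by_speaker: Dict[str, List[float]] = defaultdict(list)
    for speaker, value in values:
        by_speaker[speaker].append(value)
    # Population mean and standard deviation for each speaker.
    stats = {s: (mean(v), pstdev(v)) for s, v in by_speaker.items()}
    normalized = []
    for speaker, value in values:
        mu, sigma = stats[speaker]
        # A speaker with no variation gets 0.0 instead of a division by zero.
        normalized.append((value - mu) / sigma if sigma > 0 else 0.0)
    return normalized
```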
24
Individual Turn-Yielding Cues: 3/4. Intensity and Pitch Levels
[Figure: intensity and pitch levels (z-score) before Smooth switches vs. Holds. (*) ANOVA: p < 0.01.]
Lower intensity, pitch levels before turn boundaries.
25
Individual Turn-Yielding Cues: 5. Textual Completion
Syntactic/semantic/pragmatic completion independent of intonation and gesticulation.
Automatic computation of textual completion.
  (1) Manually annotated a portion of the data.
      3 labelers; 400 IPUs; Fleiss’ κ = […]
  (2) Trained an SVM classifier.
      80% accuracy; baseline: 55%; human: 91%.
26
Individual Turn-Yielding Cues: 5. Textual Completion
Labeled all IPUs in the corpus with the SVM model.
  Smooth switch: 82% complete, 18% incomplete.
  Hold: 53% complete, 47% incomplete.
  (χ² test, p ≈ 0)
Textual completion seems to be almost a necessary condition before switches, but not before holds.
27
Individual Turn-Yielding Cues: 6. Voice Quality
[Figure: jitter, shimmer and NHR (z-score) before Smooth switches vs. Holds. (*) ANOVA: p < 0.01.]
Higher jitter, shimmer, NHR before turn boundaries.
28
Individual Turn-Yielding Cues: 7. IPU Duration
[Figure: IPU duration (z-score) before Smooth switches vs. Holds. (*) ANOVA: p < 0.01.]
Longer IPUs before turn boundaries.
29
Turn-Yielding Cues: Individual Cues
Final intonation
Speaking rate
Intensity level
Pitch level
Textual completion
Voice quality
IPU duration
30
Turn-Yielding Cues: Combined Cues
[Figure: percentage of turn-taking attempts as a function of the number of cues conjointly displayed.]
31
Turn-Taking: Backchannel-Inviting Cues
Cues displayed by the speaker inviting the listener to produce a backchannel response.
32
Backchannel-Inviting Cues: Method
[Diagram: timelines for Speaker A and Speaker B showing IPU1 to IPU4, with a Hold and a Backchannel marked between IPUs.]
Compare IPUs preceding Holds and IPUs preceding Backchannels.
Assumption: Cues are more likely to occur before Backchannels than before Holds.
33
Backchannel-Inviting Cues: Individual Cues
Final rising intonation: H-H% or L-H%.
Higher intensity level.
Higher pitch level.
Longer IPU duration.
Lower NHR.
Final POS bigram: DT NN, JJ NN, or NN NN.
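The six cues listed above can be counted for one IPU with a short Python sketch. The slide says only "higher", "longer" and "lower", so treating a cue as present when its z-score falls on the relevant side of zero is an assumption made here, as are the function and parameter names.

```python
from typing import Sequence

RISING_TONES = {"H-H%", "L-H%"}
INVITING_BIGRAMS = {("DT", "NN"), ("JJ", "NN"), ("NN", "NN")}


def count_bc_inviting_cues(
    final_tone: str,
    pos_tags: Sequence[str],
    intensity_z: float,
    pitch_z: float,
    duration_z: float,
    nhr_z: float,
) -> int:
    """Count how many backchannel-inviting cues an IPU displays (0 to 6)."""
    cues = [
        final_tone in RISING_TONES,                # final rising intonation
        intensity_z > 0,                           # higher intensity (assumed cut-off)
        pitch_z > 0,                               # higher pitch (assumed cut-off)
        duration_z > 0,                            # longer IPU (assumed cut-off)
        nhr_z < 0,                                 # lower NHR (assumed cut-off)
        tuple(pos_tags[-2:]) in INVITING_BIGRAMS,  # final POS bigram
    ]
    return sum(cues)
```

Counting cues in this way gives the "number of cues conjointly displayed" that the combined-cues analysis relates to how often a backchannel follows.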
34
Backchannel-Inviting Cues: Combined Cues
[Figure: percentage of IPUs followed by a BC as a function of the number of cues conjointly displayed; fits with r² = 0.812 and r² = 0.993.]
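The figure reports two r² values, but this transcript does not say which fitted model each belongs to. As a reminder of what the statistic measures, the sketch below computes r² for an ordinary least-squares straight line. The function name is hypothetical, and the data points would have to be read off the original figure.

```python
from typing import Sequence


def linear_r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Coefficient of determination of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

Here `xs` would be the number of cues conjointly displayed and `ys` the percentage of IPUs followed by a backchannel. The inputs need at least two distinct x values and some variation in y.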
35
Turn-Taking: Overlapping Speech
[Diagram: timelines for Speaker A and Speaker B showing intermediate phrases ip1, ip2 and ip3, with a Hold and an Overlap marked.]
95% of overlaps start during the turn-final intermediate phrase (ip).
We look for turn-yielding cues in the second-to-last intermediate phrase (e.g., ip2).
36
Turn-Taking: Overlapping Speech
Cues found in second-to-last ips:
  Higher speaking rate.
  Lower intensity.
  Higher jitter, shimmer, NHR.
All cues match the corresponding cues found in (non-overlapping) smooth switches.
Cues seem to extend further back in the turn, becoming more prominent toward turn endings.
Future research: Generalize the model of discrete turn-yielding cues.
37
(1) Columbia Games Corpus
(2) Study of Turn-Taking
(3) Study of Affirmative Cue Words
38
Affirmative Cue Words
8% of the words in the Columbia Games Corpus: okay, right, yeah, mm-hm, alright, uh-huh, gotcha, huh, yep, yes, yup.
10 discourse/pragmatic functions: Acknowledgment/agreement, Literal modifier, Backchannel, Cue beginning/ending discourse segment, Check with the interlocutor, Stall/Filler, Back from a task, Pivot beginning/ending (Ack+Cue).
Labeled by 3 trained annotators.
Fleiss’ κ = 0.69: ‘Substantial’ agreement.
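The agreement statistic reported above for multiple annotators is conventionally Fleiss' kappa. A minimal Python sketch of the standard formula follows. The function name and input layout are assumptions, and the sketch requires every token to be labeled by the same number of annotators.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa; ratings[i] holds the labels all raters gave to item i."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    category_totals: Counter = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    observed = agreement_sum / n_items
    # Agreement expected by chance from the overall category proportions.
    expected = sum((t / (n_items * n_raters)) ** 2 for t in category_totals.values())
    if expected == 1.0:  # every rating falls in one category
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

Each inner sequence would hold the three annotators' function labels for one affirmative cue word token.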
39
Affirmative Cue Words: Examples
Literal modifier:
  that’s pretty much okay
Backchannel:
  Speaker 1: between the yellow mermaid and the whale
  Speaker 2: okay
  Speaker 1: and it is
Cue beginning discourse segment:
  okay we’re gonna be placing the blue moon
40
Affirmative Cue Words: Interactive Voice Response Systems
Speech understanding: Must interpret the user’s input correctly.
Speech generation: Need to convey potentially ambiguous terms with the appropriate parameters for the intended meaning.
41
Affirmative Cue Words: Previous Work
Disambiguation of single-word cue phrases.
  well, now, say, so, like, really, …
  Discourse vs. sentential senses.
  Hirschberg & Litman 1987, 1993; Litman 1994, 1996; Zufferey & Popescu-Belis 2004; Lai 2008.
Affirmative cue words.
  Hockey 1991, 1992; Kowtko 1997: Intonational differences across discourse/pragmatic functions.
  Jurafsky et al. 1998: Lexical identity is a strong cue to word function.
42
Affirmative Cue Words: Descriptive statistics
Large contextual differences:
  Backchannels always occur as separate turns.
  Cue beginnings occur mostly in turn-initial position.
  Modifier instances of right occur in all positions within the turn, but rarely as separate turns.
  Acknowledgments occur in turn-initial, medial and final positions, and also as separate turns.
43
Affirmative Cue Words: Descriptive statistics
Final intonation
  Backchannel: Rising (H-H%, L-H%)
  Cue beginning: Falling (L-L%)
  Check: High-rising (H-H%)
Intensity
  Backchannel: High
  Cue beginning: High
  Cue ending: Low
44
Affirmative Cue Words: Perception study of okay
Okay is the most frequent ACW in the corpus.
How do hearers disambiguate its meaning?
  Acoustic/prosodic/phonetic vs. contextual info?
20 subjects classified 54 tokens of okay into {Ack, BC, CueBeg} in two conditions:
  No context available: only the word okay.
  Context available: 2 full speaker turns.
[Diagram: a contextualized ‘okay’ shown within the turns of Speaker A and Speaker B.]
45
Affirmative Cue Words: Perception study of okay
No context available
  Very low inter-subject agreement.
  Correlations of word function with acoustic/prosodic/phonetic features.
Context available
  Higher inter-subject agreement.
  Contextual features trump the acoustic/prosodic/phonetic features of okay.
  Exception: Final intonation of okay.
46
Affirmative Cue Words: Automatic Classification
Identify automatically the function of ACWs.
Classification into discourse vs. sentential function is insufficient for ACWs.
  right: 15% discourse, 85% sentential.
  All other ACWs: 99% discourse, 1% sentential.
New classification tasks:
  Detection of an acknowledgment function: Acknowledgment vs. No acknowledgment.
  Detection of a discourse segment boundary function: SegBeg vs. SegEnd vs. None.
47
Affirmative Cue Words: Automatic Classification
Lexical features: Lexical id, POS tags, n-grams.
Discourse features: Position of target word in IPU, turn, conversation.
Timing features: Duration of word, IPU, turn; amount of overlaps; latencies.
Acoustic features: Pitch, intensity, pitch slope, voice quality.
Phonetic features: Id, duration of each phone.
48
Affirmative Cue Words: Automatic Classification
Error rate: Discourse Boundary / Acknowledgment
  Baseline (1): 18.6% / 15.3%
  SVM: Word-only: 14.4% / 15.0%
  SVM: Online (up to current IPU): 10.1% / 6.7%
  SVM: Full model: 6.9% / 4.5%
  Human labelers: 5.7% / 3.3%
(1) Discourse Boundary: majority class == no boundary. Acknowledgment: {right, huh} no ACK; all others ACK.
(*) Significantly different (Wilcoxon signed rank sum test; p < 0.05)
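The acknowledgment baseline in footnote (1) is a word-identity rule, which can be sketched in a few lines of Python. The function names and the `(word, gold_label)` input layout are assumptions made for this sketch.

```python
from typing import List, Tuple

# Footnote (1): 'right' and 'huh' are labeled as no acknowledgment.
NON_ACK_WORDS = {"right", "huh"}


def baseline_is_ack(word: str) -> bool:
    """Baseline rule: every affirmative cue word except right/huh is an ACK."""
    return word.lower() not in NON_ACK_WORDS


def error_rate(tokens: List[Tuple[str, bool]]) -> float:
    """Fraction of (word, gold_is_ack) tokens the baseline gets wrong."""
    errors = sum(baseline_is_ack(word) != gold for word, gold in tokens)
    return errors / len(tokens)
```

The baseline looks only at the word itself, which is why classifiers that also use context can do better.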
49
Affirmative Cue Words: Speaker Entrainment
In conversation, people adapt the way they speak to match their partner.
  Referring expressions (Brennan 1996).
  Syntactic constructions (Reitter et al. 2006).
  Intensity (Coulston et al. 2002, Ward & Litman 2007).
Entrainment at different levels (lex, syn, sem):
  Key for both production and understanding, and facilitates interaction (Pickering & Garrod 2004, Goleman 2006).
  Predictor of task success (MapTask; Reitter & Moore 2007).
50
Affirmative Cue Words: Speaker Entrainment
Two novel measures of entrainment based on usage of high-frequency words (HFW), including ACW.
Entrainment of HFW correlates with:
  Task success:
    (+) Game score
  Dialogue coordination:
    (+) Proportion of overlaps
    (-) Proportion of interruptions
    (-) Latency of smooth switches
Future work:
  Establish causality relation.
  Impact on IVR system design and/or evaluation.
51
(1) Columbia Games Corpus
(2) Study of Turn-Taking
(3) Study of Affirmative Cue Words
52
Contributions
Columbia Games Corpus
  Valuable dataset for studying spontaneous task-oriented dialogue.
Study of Turn-Taking
  Turn-yielding cues.
  Backchannel-inviting cues.
  Objective, automatically computable.
  Combined cues.
  Improve turn-taking decisions of IVR systems.
53
Contributions
Study of Affirmative Cue Words
  Descriptive statistics and perceptual results.
  Automatic classification.
  Speaker entrainment.
  Understanding and generation in IVR systems.
Results drawn from task-oriented dialogues, thus not necessarily generalizable, but suitable for most IVR domains.
Necessary steps towards the ambitious, long-term goal of human-like speech systems.
54
Future Work
Additional turn-taking cues. Voice quality?
Novel ways to combine cues. Weights?
Study cues that extend over entire turns, increasing near potential turn boundaries.
Characterize interruptions.
Speaker entrainment
Affirmative cue words.
Turn-taking behavior.
Acoustic/prosodic variation.