Challenges in Dialogue
Discourse and Dialogue, CMSC 35900-1
October 27, 2006

Roadmap
  Issues in Dialogue
    – Dialogue vs General Discourse
    – Dialogue Acts
        Modeling
        Recognition and Interpretation
    – Dialogue Management for Computational Agents

Dialogue vs General Discourse
  Key contrast: Two or more speakers
    – Primary focus on speech
  Issues in multi-party spoken dialogue
    – Turn-taking: who speaks next, when?
    – Collaboration: clarification, feedback, …
    – Disfluencies
    – Adjacency pairs, dialogue acts

Turn-Taking
  Multi-party discourse
    – Need to trade off speaker/hearer roles
        Interpret reference from sequential utterances
  When?
    – End of sentence? No: multi-utterance turns
    – Silence? No: little silence in smooth dialogue: < 250 ms
    – When other starts speaking? No: relatively little overlap face-to-face: ~5%

Turn-taking: When
  Rule-governed behavior
    – Possibly multiple legal turn change times
        Aka transition-relevance places (TRP)
  Generally at utterance boundaries
    – Utterance not necessarily sentence
    – In fact, utterance/sentence boundaries not obvious in speech
      » Don’t necessarily pause between sentences
  Automatic utterance boundary detection
    – Cue words (okay, so, ...); POS sequences; prosody

Turn-taking: Who & How
  At each TRP in each turn (Sacks et al., 1974)
    – If speaker has selected A to speak, A must take floor
    – If speaker has selected no one to speak, anyone can
    – If no one else takes the turn, the speaker can
  Selecting speaker A:
    – By explicit/implicit mention: What about it, Bob?
    – By gaze, function
  Selecting others: questions, greetings, closing
    – (Traum et al., 2003)
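The three allocation rules above can be read as a small decision procedure applied at each TRP. The following is a minimal sketch under stated assumptions: the function name, its arguments, and the tie-break in rule 2 (the first participant to start speaking gets the floor) are illustrative choices, not something the slide specifies.

```python
from typing import Optional, Sequence


def next_speaker(current: str,
                 selected: Optional[str],
                 self_selectors: Sequence[str]) -> str:
    """Apply the three turn-allocation rules at one transition-relevance place.

    current:        participant who holds the floor
    selected:       participant the current speaker addressed, if any
    self_selectors: other participants who start speaking, in order of onset
                    (hypothetical input; a real system would have to detect this)
    """
    # Rule 1: a participant selected by the current speaker must take the floor.
    if selected is not None and selected != current:
        return selected
    # Rule 2: if no one was selected, anyone may take the turn.
    # Assumption: the earliest self-selector wins.
    for person in self_selectors:
        if person != current:
            return person
    # Rule 3: if no one else takes the turn, the current speaker may continue.
    return current
```

For example, `next_speaker("Ann", "Bob", ["Cy"])` gives the floor to Bob even though Cy also started talking, because explicit selection takes priority.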

Turn-taking in HCI
  Human turn end:
    – Detected by 250ms silence
  System turn end:
    – Signaled by end of speech
    – Indicated by any human sound
        Barge-in
  Continued attention:
    – No signal
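A silence-threshold end-of-turn detector of the kind described above might look like the sketch below. It assumes a hypothetical upstream voice-activity detector that yields one boolean per fixed-length frame; the 10 ms frame size and all names are assumptions, and only the 250 ms threshold comes from the slide.

```python
from typing import Optional, Sequence


def detect_turn_end(is_speech: Sequence[bool],
                    frame_ms: int = 10,
                    silence_ms: int = 250) -> Optional[int]:
    """Return the frame index at which the user's turn is declared finished.

    The turn ends once `silence_ms` of uninterrupted silence follows some speech.
    Returns None if that never happens in the given frames.
    """
    needed = silence_ms // frame_ms   # consecutive silent frames required
    heard_speech = False
    silent_run = 0
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silent_run = 0            # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None
```

Because short within-turn pauses reset nothing but also never reach the threshold, a 100 ms hesitation does not end the turn, while the first 250 ms gap does. This is also why such detectors cut users off at longer pauses inside multi-utterance turns.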

Gesture, Gaze & Voice
  Range of gestural signals:
    – head (nod, shake), shoulder, hand, leg, foot movements; facial expressions; postures; artifacts
    – Align with syllables
        Units: phonemic clause + change
  Study with recorded exchanges

Yielding the Floor
  Turn change signal
    – Offer floor to auditor/hearer
  Cues: pitch fall, lengthening, “but uh”, end gesture, amplitude drop + ‘uh’, end clause
  Likelihood of change increases with more cues
  Negated by any gesticulation
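The two regularities above (more cues make a turn change more likely; any gesticulation cancels the signal) can be captured in a very small scoring function. This is only a sketch: the cue identifiers are made-up labels for the cues listed on the slide, and using a raw cue count as the score is an assumption, since the slide gives no formula.

```python
# Hypothetical identifiers for the six turn-yielding cues listed above.
TURN_YIELD_CUES = frozenset({
    "pitch_fall", "lengthening", "but_uh", "end_gesture",
    "amplitude_drop_uh", "end_clause",
})


def turn_yield_score(observed, gesticulating: bool) -> int:
    """Count the turn-yielding cues present; zero if the speaker is gesticulating."""
    if gesticulating:
        # Any gesticulation negates the turn change signal.
        return 0
    # Cues outside the known set are ignored.
    return len(TURN_YIELD_CUES & set(observed))
```

A higher score would then be read as a higher likelihood that the speaker is offering the floor.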

Taking the Floor
  Speaker-state signal
    – Indicate becoming speaker
  Occurs at beginning of turns
  Cues:
    – Shift in head direction AND/OR
    – Start of gesture

Retaining the Floor
  Within-turn signal
    – Still speaker: Look at hearer as end clause
  Continuation signal
    – Still speaker: Look away after within-turn/back
  Back-channel:
    – ‘mmhm’/okay/etc; nods, sentence completion. Clarification request; restate
    – NOT a turn: signal attention, agreement, confusion

Segmenting Turns
  Speaker alone:
    – Within-turn signal -> end of one unit
    – Continuation signal -> beginning of next unit
  Joint signal:
    – Speaker turn signal (end); auditor -> speaker; speaker -> auditor
    – Within-turn + back-channel + continuation
        Back-channels signal understanding
    – Early back-channel + continuation

Regaining Attention
  Gaze & Disfluency
    – Disfluency: “perturbation” in speech
        Silent pause, filled pause, restart
    – Gaze:
        Conversants don’t stare at each other constantly
        However, speaker expects to meet hearer’s gaze
          – Confirm hearer’s attention
  Disfluency occurs when speaker realizes hearer is NOT attending
    – Pause until hearer begins gazing, or to request attention

Improving Human-Computer Turn-taking
  Identifying cues to turn change and turn start
  Meeting conversations:
    – Recorded, natural research meetings
    – Multi-party
    – Overlapping speech
    – Units = “Spurts” between 500ms silence
  Can predict, on-line, likely turn end
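Segmenting one speaker's speech into spurts is straightforward once word timings are available. The sketch below assumes time-aligned words for a single speaker, sorted by start time; the `Word` tuple layout and function name are hypothetical, and only the 500 ms gap comes from the slide.

```python
from typing import List, Tuple

# (token, start time in seconds, end time in seconds); layout is an assumption.
Word = Tuple[str, float, float]


def split_into_spurts(words: List[Word], gap_sec: float = 0.5) -> List[List[Word]]:
    """Group one speaker's words into spurts separated by >= gap_sec of silence."""
    spurts: List[List[Word]] = []
    for word in words:
        # Silence between the previous word's end and this word's start.
        if spurts and word[1] - spurts[-1][-1][2] < gap_sec:
            spurts[-1].append(word)     # short gap: same spurt
        else:
            spurts.append([word])       # first word, or gap long enough: new spurt
    return spurts
```

Running this per speaker keeps overlapping speech from different participants in separate spurt streams, which is what makes the unit usable for multi-party meetings.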

Text + Prosody
  Text sequence:
    – Modeled as n-gram language model
    – Implement as HMM
  Prosody:
    – Duration, Pitch, Pause, Energy
    – Decision trees: classify + probability
  Integrate LM + DT
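One simple way to integrate the two knowledge sources is to interpolate their per-boundary posteriors in the log domain. Note that this is a deliberately simplified stand-in, not the HMM integration the slide refers to: it treats each inter-word position independently, and the label names, the weight, and both function names are assumptions.

```python
import math
from typing import Dict

LABELS = ("disfluency", "sentence_end", "none")


def combine(lm_probs: Dict[str, float],
            dt_probs: Dict[str, float],
            lm_weight: float = 0.5) -> Dict[str, float]:
    """Log-linear interpolation of LM and decision-tree posteriors.

    Both inputs must assign a strictly positive probability to every label.
    """
    scores = {
        label: lm_weight * math.log(lm_probs[label])
        + (1.0 - lm_weight) * math.log(dt_probs[label])
        for label in LABELS
    }
    # Renormalize so the combined scores form a probability distribution.
    total = sum(math.exp(s) for s in scores.values())
    return {label: math.exp(s) / total for label, s in scores.items()}


def best_label(lm_probs: Dict[str, float],
               dt_probs: Dict[str, float],
               lm_weight: float = 0.5) -> str:
    """Pick the boundary label with the highest combined probability."""
    combined = combine(lm_probs, dt_probs, lm_weight)
    return max(combined, key=combined.get)
```

With `lm_weight=1.0` the prosodic model is ignored entirely, and with `0.0` the language model is; in practice the weight would be tuned on held-out data.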

Decision Trees
  [Figure: decision tree with nodes A to G; branch tests X=t / X=f, Y>1 / Y<=1, Y>2 / Y<=2; leaf classes Disfluency, Sentence End, None]

Interpreting Breaks
  For each inter-word position:
    – Is it a disfluency, sentence end, or continuation?
  Key features:
    – Pause duration, vowel duration
  62% accuracy wrt 50% chance baseline
    – ~90% overall
  Best combines LM & DT
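To make the "classify + probability" idea concrete, here is a hand-built toy tree over the two key features named above. Every threshold and leaf probability in it is invented purely for illustration; a real tree would be learned from labeled data, and `vowel_ratio` (vowel duration relative to that vowel's typical duration) is an assumed normalization.

```python
from typing import Dict


def boundary_posteriors(pause_sec: float, vowel_ratio: float) -> Dict[str, float]:
    """Toy decision tree: walk two tests, return the leaf's class distribution.

    All thresholds and leaf values are made up for this sketch.
    """
    if pause_sec > 0.3:
        if vowel_ratio > 1.2:
            # Long pause after a lengthened vowel.
            return {"sentence_end": 0.7, "disfluency": 0.2, "none": 0.1}
        # Long pause without lengthening.
        return {"sentence_end": 0.3, "disfluency": 0.5, "none": 0.2}
    if vowel_ratio > 1.5:
        # Little or no pause, but a strongly stretched vowel.
        return {"sentence_end": 0.1, "disfluency": 0.4, "none": 0.5}
    # Little or no pause and normal vowel duration.
    return {"sentence_end": 0.05, "disfluency": 0.05, "none": 0.9}


def classify_boundary(pause_sec: float, vowel_ratio: float) -> str:
    """Return the most probable label at the leaf reached."""
    leaf = boundary_posteriors(pause_sec, vowel_ratio)
    return max(leaf, key=leaf.get)
```

Returning the whole leaf distribution rather than a single label is what lets the tree's output be combined with a language model score.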

Jump-in Points
  (Used) Possible turn changes
    – Points WITHIN spurt where new speaker starts
  Key features:
    – Pause duration, low energy, pitch fall
  Accuracy: 65% wrt 50% baseline
  Performance depends only on preceding prosodic features

Jump-in Features
  Do people speak differently when they jump in?
    – Differ from regular turn starts?
  Examine only first words of turns
    – No LM
  Key features:
    – Raised pitch, raised amplitude
  Accuracy: 77% wrt 50% baseline
    – Prosody only

Collaborative Communication
  Speaker tries to establish and add to “common ground”, i.e. “mutual belief”
    – Presumed a joint, collaborative activity
  Make sure both “mutually believe” the same thing
    – Hearer can acknowledge/accept/disagree
      » Clark & Schaefer: Degrees of grounding
          Display, Demonstrate/Reformulate, Acknowledgement, Next relevant contribution, Continued attention

Computational Models
  (Traum et al) revised for computation
    – Involves both speaker and hearer
        Initiate, Continue, Acknowledge, Repair, Request Repair, etc
    – Common phenomena
        “Back-Channel”: “uh-huh”, “okay”, etc
          – Allows hearer to signal continued attention, ack
            » WITHOUT taking the turn
        Requests for repair: common in human-human
          – Even more common in human-computer dialogue

Implicature & Grice’s Maxims
  Inferences licensed by utterances
  Grice’s Maxims
    – Quantity: Be as informative as required
        “There are two classes per week”: not 1, or 5
    – Quality: Be truthful; don’t lie
    – Relevance: Be relevant
    – Manner: “Be perspicuous”
        Don’t be obscure, ambiguous, prolix, or disorderly
  “Flouting” maxims: Consciously violate for effect
    – Humor, emphasis

Speech & Dialogue Acts
  Speech Acts (Austin, Searle)
    – “Doing things with words”
        E.g. performatives: “I dub thee Sir Lancelot”
    – Illocutionary acts: act of asking, answering, promising, etc in saying an utterance
        Include:
          Assertives: “I propose to..”
          Directives: “Stop that”
          Commissives: “I promise”
          Expressives: “Thank you”
          Declarations: “You’re fired”

Dialogue Acts
  (aka Conversational moves)
    – Enriched set of speech acts
        Capture full range of conversational functions
    – Adjacency pairs: Many two-part structures
        E.g. Question-Answer, Greeting-Greeting, Request-Grant, etc.
        Paired for speaker-hearer dyads
    – Contrast with rhetorical relations in monologue
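Adjacency pairs lend themselves to a simple lookup: each first-pair part sets up an expectation for what should follow. The sketch below uses the three example pairs from the slide and flags first parts that are left unanswered. It assumes the second part follows immediately, which ignores insertion sequences found in real conversation; the act labels and function name are illustrative.

```python
from typing import Dict, List

# First-pair part -> expected second-pair part (the slide's three examples).
EXPECTED_SECOND: Dict[str, str] = {
    "question": "answer",
    "greeting": "greeting",
    "request": "grant",
}


def open_first_parts(acts: List[str]) -> List[int]:
    """Return indices of first-pair parts not immediately followed by their second part."""
    open_idx: List[int] = []
    i = 0
    while i < len(acts):
        expected = EXPECTED_SECOND.get(acts[i])
        if expected is None:
            i += 1                      # not a first-pair part
        elif i + 1 < len(acts) and acts[i + 1] == expected:
            i += 2                      # pair completed; skip both parts
        else:
            open_idx.append(i)          # expectation left unmet
            i += 1
    return open_idx
```

An unmet expectation is itself informative: in a dialogue system it can trigger a re-prompt or a clarification.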

DAMSL
  Dialogue Act Tagging framework
    – Adjacency pairs + grounding + repair
  Forward looking functions
    – Statement, info-request, commit, closing, etc
  Backward looking functions
    – Focus on link to prior speaker utterance
        Agreement, answer, accept, etc.

Tagged Dialogue
  [assert] C1: ... I need to travel in May.
  [info-req,ack] A1: And, what day in May did you want to travel?
  [assert,answer] C2: OK uh I need to be there for a meeting that’s from the 12th to the 15th.
  [info-req,ack] A2: And you’re flying into what city?
  [assert,answer] C3: Seattle.
  [info-req,ack] A3: And what time would you like to leave Pittsburgh?
  [check,hold] C4: Uh hmm I don’t think there’s many options for nonstop.
  [accept,ack] A4: Right. [assert] There’s three non-stops today.
  [info-req] C5: What are they?
  [assert,open-option] A5: The first one departs PGH at 10:00am arrives Seattle at 12:05 their time. The second flight departs PGH at 5:55pm, arrives Seattle at 8pm. And the last flight departs PGH at 8:15pm arrives Seattle at 10:28pm.
  [accept,ack] C6: OK I’ll take the 5ish flight on the night before on the 11th.
  [check,ack] A6: On the 11th? [assert,ack] OK. Departing at 5:55pm arrives Seattle at 8pm, U.S. Air flight 115.
  [ack] C7: OK.
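For processing, each tagged turn can be turned into a small record of tags, speaker, and text. The following sketch parses one turn in the bracketed format shown above, for example `[assert,answer] C3: Seattle.`. The record type, regular expression, and function name are assumptions, and the parser deliberately handles only the turn-initial tag set, so a second bracketed tag in the middle of a turn stays part of the text.

```python
import re
from typing import List, NamedTuple


class TaggedUtterance(NamedTuple):
    tags: List[str]   # dialogue act tags, e.g. ["assert", "answer"]
    speaker: str      # turn label, e.g. "C3"
    text: str         # everything after the speaker label


# "[tag,tag] SPEAKER: text", with optional whitespace after the bracket.
_TURN = re.compile(r"^\[([^\]]+)\]\s*(\w+):\s*(.*)$")


def parse_tagged(line: str) -> TaggedUtterance:
    """Parse one tagged turn; raise ValueError if the line has another shape."""
    match = _TURN.match(line.strip())
    if match is None:
        raise ValueError(f"not a tagged utterance: {line!r}")
    tags = [tag.strip() for tag in match.group(1).split(",")]
    return TaggedUtterance(tags, match.group(2), match.group(3))
```

Once turns are in this form, the first tag in each list can be compared with the last tag of the previous turn to check, for instance, that an info request is followed by an answer.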

Dialogue Act Recognition
  Goal: Identify dialogue act tag(s) from surface form
  Challenge: Surface form can be ambiguous
    – “Can you X?”: yes/no question, or info-request
    – “Flying on the 11th, at what time?”: check, statement
  Requires interpretation by hearer
    – Strategies: Plan inference, cue recognition

Plan-inference-based
  Classic AI (BDI) planning framework
    – Model Belief, Knowledge, Desire
        Formal definition with predicate calculus
    – Axiomatization of plans and actions as well
    – STRIPS-style: Preconditions, Effects, Body
    – Rules for plan inference
  Elegant, but..
    – Labor-intensive rule, KB, heuristic development
    – Effectively AI-complete
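A STRIPS-style operator can be represented as sets of propositions that must hold before the action and that change after it. The sketch below is a minimal illustration under stated assumptions: propositions are plain strings rather than predicate calculus, the operator's body (its decomposition into sub-actions) is omitted, and the `inform` operator with its proposition strings is a hypothetical example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: FrozenSet[str]                 # must all hold before acting
    add_effects: FrozenSet[str]                   # become true afterwards
    delete_effects: FrozenSet[str] = frozenset()  # become false afterwards


def applicable(op: Operator, state: FrozenSet[str]) -> bool:
    """An operator applies when every precondition is in the current state."""
    return op.preconditions <= state


def apply_operator(op: Operator, state: FrozenSet[str]) -> FrozenSet[str]:
    """Return the successor state; refuse operators whose preconditions fail."""
    if not applicable(op, state):
        raise ValueError(f"preconditions of {op.name} not satisfied")
    return (state - op.delete_effects) | op.add_effects


# Hypothetical speech-act operator: the speaker informs the hearer of P.
inform = Operator(
    name="inform(S, H, P)",
    preconditions=frozenset({"knows(S, P)"}),
    add_effects=frozenset({"knows(H, P)"}),
)
```

Plan inference runs this reasoning backwards: on observing an `inform`, the hearer infers that the speaker wanted its effect to hold and asks what larger plan that effect serves. Writing such rules by hand is where the labor-intensive development noted above comes from.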

Cue-based Interpretation
  Employs sets of features to identify
    – Words and collocations: Please -> request
    – Prosody: Rising pitch -> yes/no question
    – Conversational structure: prior act
  Example: Check
    – Syntax: tag question “, right?”
    – Syntax + prosody: Fragment with rise
    – N-gram: argmax_d P(d) P(W|d)
        So you, sounds like, etc
  Details later ….
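The N-gram rule, choosing the dialogue act d that maximizes P(d) P(W|d), can be sketched with the simplest possible word model. The code below assumes a unigram model (n = 1) with add-one smoothing, which is a simplification of the N-gram models the slide mentions; the function names are hypothetical and any training examples are invented toy data, not drawn from a corpus.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train(examples: List[Tuple[str, List[str]]]) -> Model:
    """Collect act counts, per-act word counts, and the vocabulary."""
    act_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for act, words in examples:
        act_counts[act] += 1
        word_counts[act].update(words)
        vocab.update(words)
    return act_counts, word_counts, vocab


def classify(words: List[str], model: Model) -> str:
    """Return argmax_d P(d) * P(W|d), computed in log space."""
    act_counts, word_counts, vocab = model
    total_acts = sum(act_counts.values())
    best_act, best_score = "", -math.inf
    for act, count in act_counts.items():
        score = math.log(count / total_acts)            # log P(d)
        # Add-one smoothing, with one extra slot for unseen words.
        denom = sum(word_counts[act].values()) + len(vocab) + 1
        for word in words:                              # log P(W|d), unigram
            score += math.log((word_counts[act][word] + 1) / denom)
        if score > best_score:
            best_act, best_score = act, score
    return best_act
```

Trained on a handful of labeled turns, cue phrases such as "sounds like" pull an utterance toward the act they co-occurred with, which is the intuition behind treating "so you" and "sounds like" as cues for checks.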

From Human to Computer
  Conversational agents
    – Systems that (try to) participate in dialogues
    – Examples: Directory assistance, travel info, weather, restaurant and navigation info
  Issues:
    – Limited understanding: ASR errors, interpretation
    – Computational costs: broader coverage -> slower, less accurate

Dialogue Manager Tradeoffs
  Flexibility vs Simplicity/Predictability
    – System vs User vs Mixed Initiative
    – Order of dialogue interaction
    – Conversational “naturalness” vs Accuracy
    – Cost of model construction, generalization, learning, etc
  Models: FST, Frame-based, HMM, BDI
  Evaluation frameworks
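Of the model families listed, the frame-based one is the easiest to sketch: the system keeps a frame of slots and asks about whichever slot is still empty. The example below is a minimal illustration only; the slot names, prompts, and function names are hypothetical, and it assumes some upstream component has already mapped the user's utterance to slot values.

```python
from typing import Dict, Optional

# Slots in the order the system asks for them, with made-up prompts.
SLOT_PROMPTS: Dict[str, str] = {
    "origin": "What city are you leaving from?",
    "destination": "What city are you flying into?",
    "date": "What day did you want to travel?",
}


def update(frame: Dict[str, str], parsed: Dict[str, str]) -> Dict[str, str]:
    """Return a new frame with any recognized, non-empty slot values filled in."""
    merged = dict(frame)
    for slot, value in parsed.items():
        if slot in SLOT_PROMPTS and value:
            merged[slot] = value
    return merged


def next_prompt(frame: Dict[str, str]) -> Optional[str]:
    """Ask about the first unfilled slot; None means the frame is complete."""
    for slot, prompt in SLOT_PROMPTS.items():
        if not frame.get(slot):
            return prompt
    return None
```

Because `update` accepts several slots at once, a user who volunteers the destination and date in one utterance skips those prompts. That is the extra flexibility a frame gives over a fixed-order finite-state script, at the cost of less predictable dialogue order.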