Challenges in Dialogue


Challenges in Dialogue Natural Language Processing CMSC 35100-1 February 19, 2008

Roadmap Issues in Dialogue Dialogue vs General Discourse Dialogue Acts Modeling Recognition and Interpretation Dialogue Management for Computational Agents

Dialogue vs General Discourse Key contrast: Two or more speakers Primary focus on speech Issues in multi-party spoken dialogue Turn-taking – who speaks next, when? Collaboration – clarification, feedback,… Disfluencies Adjacency pairs, dialogue acts

Turn-Taking Multi-party discourse: Need to trade off speaker/hearer roles; interpret reference from sequential utterances. When? End of sentence? No: multi-utterance turns. Silence? No: little silence in smooth dialogue (< 250ms). When other starts speaking? No: relatively little overlap face-to-face (~5%).

Turn-taking: When Rule-governed behavior Possibly multiple legal turn change times Aka transition-relevance places (TRP) Generally at utterance boundaries Utterance not necessarily sentence In fact, utterance/sentence boundaries not obvious in speech Don’t necessarily pause between sentences Automatic utterance boundary detection Cue words (okay, so,..); POS sequences; prosody

Turn-taking: Who & How At each TRP in each turn (Sacks et al., 1974): If speaker has selected A to speak, A must take floor. If speaker has selected no one to speak, anyone can. If no one else takes the turn, the speaker can. Selecting speaker A: By explicit/implicit mention (“What about it, Bob?”); by gaze, function. Selecting others: questions, greetings, closing (Traum et al., 2003).
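The three allocation rules can be read as a small decision procedure applied at each TRP. The following is a minimal sketch only: the function name, the argument names, and the "first self-selector wins" tie-break are assumptions made for illustration, not part of the original slides.

```python
from typing import Optional, Sequence


def next_speaker(current: str,
                 selected: Optional[str],
                 self_selectors: Sequence[str]) -> str:
    """Apply the three turn-allocation rules at one transition-relevance place.

    current        -- participant who holds the floor
    selected       -- participant the current speaker selected, or None
    self_selectors -- others who try to start, in the order they start
                      (hypothetical input; the ordering is an assumption)
    """
    # Rule 1: the current speaker selected someone, who must take the floor.
    if selected is not None:
        return selected
    # Rule 2: nobody was selected, so anyone else may self-select.
    # Assumption of this sketch: the first one to start gets the turn.
    for candidate in self_selectors:
        if candidate != current:
            return candidate
    # Rule 3: nobody else took the turn, so the current speaker may continue.
    return current
```

For example, `next_speaker("Ann", "Bob", ["Cat"])` gives the floor to Bob because explicit selection takes priority over self-selection.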

Turn-taking in HCI Human turn end: Detected by 250ms silence. System turn end: Signaled by end of speech; indicated by any human sound (barge-in). Continued attention: No signal.
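A silence-threshold endpointer of the kind described here can be sketched as follows. The frame-level speech/non-speech flags are assumed to come from some voice activity detector that is not shown, and the 10ms frame size and all names are illustrative assumptions; only the 250ms threshold comes from the slide.

```python
from typing import Optional, Sequence


def detect_turn_end(is_speech: Sequence[bool],
                    frame_ms: int = 10,
                    silence_ms: int = 250) -> Optional[int]:
    """Return the time (ms) at which the human turn is declared over.

    The turn ends once speech has been heard and is followed by at least
    `silence_ms` of continuous silence. Returns None if that never happens.
    """
    needed = -(-silence_ms // frame_ms)  # ceiling division: frames of silence
    heard_speech = False
    silent_run = 0
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silent_run = 0          # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1         # leading silence before speech is ignored
            if silent_run >= needed:
                return (i + 1) * frame_ms
    return None
```

The sketch also shows why this strategy is brittle: a pause of 250ms inside a turn is indistinguishable from the end of the turn.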

Gesture, Gaze & Voice Range of gestural signals: head (nod,shake), shoulder, hand, leg, foot movements; facial expressions; postures; artifacts Align with syllables Units: phonemic clause + change Study with recorded exchanges

Yielding the Floor Turn change signal Offer floor to auditor/hearer Cues: pitch fall, lengthening, “but uh”, end gesture, amplitude drop+’uh’, end clause Likelihood of change increases with more cues Negated by any gesticulation

Taking the Floor Speaker-state signal: Indicate becoming speaker. Occurs at beginning of turns. Cues: Shift in head direction AND/OR start of gesture.

Retaining the Floor Within-turn signal: Still speaker: Look at hearer as end clause. Continuation signal: Still speaker: Look away after within-turn/back-channel. Back-channel: ‘mmhm’/okay/etc; nods; sentence completion; clarification request; restate. NOT a turn: signal attention, agreement, confusion.

Segmenting Turns Speaker alone: Within-turn signal -> end of one unit; continuation signal -> beginning of next unit. Joint signal: Speaker turn signal (end); auditor -> speaker; speaker -> auditor. Within-turn + back-channel + continuation: Back-channels signal understanding. Early back-channel + continuation.

Regaining Attention Gaze & Disfluency Disfluency: “perturbation” in speech Silent pause, filled pause, restart Gaze: Conversants don’t stare at each other constantly However, speaker expects to meet hearer’s gaze Confirm hearer’s attention Disfluency occurs when realize hearer NOT attending Pause until begin gazing, or to request attention

Improving Human-Computer Turn-taking Identifying cues to turn change and turn start Meeting conversations: Recorded, natural research meetings Multi-party Overlapping speech Units = “Spurts” between 500ms silence Can predict – on-line – likely turn end
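The "spurt" unit can be illustrated by merging one speaker's stretches of speech whenever the silence between them is shorter than 500ms. This is a sketch under assumptions: the input format (start and end times in milliseconds for a single speaker) and the function name are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms) of one stretch of speech


def segment_spurts(intervals: List[Interval],
                   min_gap_ms: int = 500) -> List[Interval]:
    """Group one speaker's speech intervals into spurts.

    Two intervals belong to the same spurt when the silence between them
    is shorter than `min_gap_ms`; a longer silence starts a new spurt.
    """
    spurts: List[Interval] = []
    for start, end in sorted(intervals):
        if spurts and start - spurts[-1][1] < min_gap_ms:
            # Short pause: extend the current spurt.
            prev_start, prev_end = spurts[-1]
            spurts[-1] = (prev_start, max(prev_end, end))
        else:
            # First interval, or a silence of at least min_gap_ms.
            spurts.append((start, end))
    return spurts
```

Because spurts are defined per speaker, spurts from different speakers may overlap in time, which is what makes jump-in points inside a spurt observable.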

Text + Prosody Text sequence: Modeled as n-gram language model; implement as HMM. Prosody: Duration, Pitch, Pause, Energy; decision trees: classify + probability. Integrate LM + DT.

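Decision Trees [Figure: example decision tree. Root node A branches on X (X=t, X=f) to nodes B and C, which branch on Y thresholds (Y>1, Y<=1; Y<=2, Y>2) to leaves D, E, F, G labeled Sentence End, None, Sentence End, Disfluency.]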

Interpreting Breaks For each inter-word position: Is it a disfluency, sentence end, or continuation? Key features: Pause duration, vowel duration. 62% accuracy wrt 50% chance baseline; ~90% overall. Best combines LM & DT.

Jump-in Points (Used) Possible turn changes: Points WITHIN spurt where new speaker starts. Key features: Pause duration, low energy, pitch fall. Accuracy: 65% wrt 50% baseline. Performance depends only on preceding prosodic features.

Jump-in Features Do people speak differently when jump-in? Differ from regular turn starts? Examine only first words of turns No LM Key features: Raised pitch, raised amplitude Accuracy: 77% wrt 50% baseline Prosody only

Collaborative Communication Speaker tries to establish and add to “common ground” – “mutual belief”. Presumed a joint, collaborative activity: Make sure “mutually believe” the same thing. Hearer can acknowledge/accept/disagree. Clark & Schaefer: Degrees of grounding: Display, Demonstrate/Reformulate, Acknowledgement, Next relevant contribution, Continued attention.

Computational Models (Traum et al) revised for computation Involves both speaker and hearer Initiate, Continue, Acknowledge, Repair, Request Repair, etc Common phenomena “Back-Channel” – “uh-huh”, “okay”, etc Allows hearer to signal continued attention, ack WITHOUT taking the turn Requests for repair – common in human-human Even more common in human-computer dialogue

Implicature & Grice’s Maxims Inferences licensed by utterances. Grice’s Maxims: Quantity: Be as informative as required (“There are two classes per week” – not 1, or 5). Quality: Be truthful – don’t lie. Relevance: Be relevant. Manner: “Be perspicuous”: Don’t be obscure, ambiguous, prolix, or disorderly. “Flouting” maxims: Consciously violate for effect (humor, emphasis).

Speech & Dialogue Acts Speech Acts (Austin, Searle) “Doing things with words” E.g. performatives: “I dub thee Sir Lancelot” Illocutionary acts: act of asking, answering, promising, etc in saying an utterance Include: Assertives: “I propose to..” , Directives: “Stop that”, Commissives: “I promise”, Expressives: “Thank you”, Declarations: “You’re fired”

Dialogue Acts (aka Conversational moves) Enriched set of speech acts Capture full range of conversational functions Adjacency pairs: Many two-part structures E.g. Question-Answer, Greeting-Greeting, Request-Grant, etc… Paired for speaker-hearer dyads Contrast with rhetorical relations in monologue
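Adjacency pairs can be tracked with a simple table from first pair parts to the expected second parts. The sketch below uses only the three pairs named on the slide; the act labels, the function name, and the "most recent open first part" matching policy are assumptions for illustration.

```python
from typing import List, Tuple

# First pair part -> expected second pair part (pairs named on the slide).
ADJACENCY_PAIRS = {
    "question": "answer",
    "greeting": "greeting",
    "request": "grant",
}


def open_pairs(acts: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (speaker, act) first parts still awaiting a second part.

    `acts` is a list of (speaker, act) in dialogue order. A second part
    closes the most recent open first part produced by a different speaker.
    """
    pending: List[Tuple[str, str]] = []
    for speaker, act in acts:
        for i in range(len(pending) - 1, -1, -1):
            first_speaker, first_act = pending[i]
            if first_speaker != speaker and ADJACENCY_PAIRS[first_act] == act:
                del pending[i]      # this act completes an open pair
                break
        else:
            # Not a second part of anything open: it may open a new pair.
            if act in ADJACENCY_PAIRS:
                pending.append((speaker, act))
    return pending
```

Matching against the most recent open first part lets the sketch handle an embedded question-answer sequence: the inner pair is closed first and the outer question stays open.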

DAMSL Dialogue Act Tagging framework: Adjacency pairs + grounding + repair. Forward looking functions: Statement, info-request, commit, closing, etc. Backward looking functions: Focus on link to prior speaker utterance; agreement, answer, accept, etc.

Tagged Dialogue
[assert] C1: . . . I need to travel in May.
[info-req,ack] A1: And, what day in May did you want to travel?
[assert,answer] C2: OK uh I need to be there for a meeting that’s from the 12th to the 15th.
[info-req,ack] A2: And you’re flying into what city?
[assert,answer] C3: Seattle.
[info-req,ack] A3: And what time would you like to leave Pittsburgh?
[check,hold] C4: Uh hmm I don’t think there’s many options for nonstop.
[accept,ack] A4: Right. [assert] There’s three non-stops today.
[info-req] C5: What are they?
[assert,open-option] A5: The first one departs PGH at 10:00am arrives Seattle at 12:05 their time. The second flight departs PGH at 5:55pm, arrives Seattle at 8pm. And the last flight departs PGH at 8:15pm arrives Seattle at 10:28pm.
[accept,ack] C6: OK I’ll take the 5ish flight on the night before on the 11th.
[check,ack] A6: On the 11th? [assert,ack] OK. Departing at 5:55pm arrives Seattle at 8pm, U.S. Air flight 115.
[ack] C7: OK.

Dialogue Act Recognition Goal: Identify dialogue act tag(s) from surface form Challenge: Surface form can be ambiguous “Can you X?” – yes/no question, or info-request “Flying on the 11th, at what time?” – check, statement Requires interpretation by hearer Strategies: Plan inference, cue recognition

Plan-inference-based Classic AI (BDI) planning framework Model Belief, Knowledge, Desire Formal definition with predicate calculus Axiomatization of plans and actions as well STRIPS-style: Preconditions, Effects, Body Rules for plan inference Elegant, but.. Labor-intensive rule, KB, heuristic development Effectively AI-complete
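A STRIPS-style operator can be sketched as a set of preconditions plus add and delete effects over a state of ground facts. This is a toy illustration under assumptions: the class and field names, the fact strings, and the example "inform" operator are invented here, and the operator body (decomposition into sub-actions) is left out.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Operator:
    """A STRIPS-style operator over states made of ground fact strings."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    delete_effects: FrozenSet[str] = frozenset()

    def applicable(self, state: FrozenSet[str]) -> bool:
        # All preconditions must already hold in the state.
        return self.preconditions <= state

    def apply(self, state: FrozenSet[str]) -> FrozenSet[str]:
        if not self.applicable(state):
            raise ValueError(f"preconditions of {self.name} not satisfied")
        return (state - self.delete_effects) | self.add_effects


# Hypothetical speech-act operator: agent A informs client C of a flight time.
inform = Operator(
    name="inform(A, C, flight_time)",
    preconditions=frozenset({"know(A, flight_time)",
                             "want(A, inform(A, C, flight_time))"}),
    add_effects=frozenset({"know(C, flight_time)"}),
)
```

Plan inference runs this machinery in reverse: from an observed act, the hearer reasons about which goals would make that act's preconditions and effects sensible. That backward reasoning is where the rule and knowledge-base development cost arises.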

Cue-based Interpretation Employs sets of features to identify dialogue acts. Words and collocations: Please -> request. Prosody: Rising pitch -> yes/no question. Conversational structure: prior act. Example: Check: Syntax: tag question “,right?”; Syntax + prosody: Fragment with rise; N-gram: argmax_d P(d)P(W|d), with cues such as “So you”, “sounds like”, etc. Details later…
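The decision rule argmax_d P(d)P(W|d) can be illustrated with a very small classifier. This sketch makes several simplifying assumptions that are not in the slides: P(W|d) is approximated by a unigram model with add-one smoothing rather than a higher-order n-gram, and the training utterances and tag names are invented toy data.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count dialogue-act priors and per-act word frequencies."""
    prior = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for act, utterance in examples:
        words = utterance.lower().split()
        prior[act] += 1
        word_counts[act].update(words)
        vocab.update(words)
    return prior, word_counts, vocab


def classify(utterance, model):
    """Return argmax_d P(d) * P(W|d), computed in log space."""
    prior, word_counts, vocab = model
    total = sum(prior.values())
    v = len(vocab) + 1  # one extra slot reserves mass for unseen words
    best_act, best_score = None, -math.inf
    for act in prior:
        score = math.log(prior[act] / total)           # log P(d)
        n = sum(word_counts[act].values())
        for w in utterance.lower().split():            # log P(W|d), unigram
            score += math.log((word_counts[act][w] + 1) / (n + v))
        if score > best_score:
            best_act, best_score = act, score
    return best_act


# Toy training data, invented for this sketch.
EXAMPLES = [
    ("check", "so you leave on the 11th right"),
    ("check", "sounds like you want the morning flight"),
    ("statement", "i need to travel in may"),
    ("statement", "i need to be there for a meeting"),
    ("info-request", "what day did you want to travel"),
    ("info-request", "what time would you like to leave"),
]
MODEL = train(EXAMPLES)
```

With this toy model, `classify("so you want the 5ish flight right", MODEL)` selects `check`, driven mostly by the cue words "so", "you" and "right". A real system would add prosodic and conversational-structure features alongside the word model.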

From Human to Computer Conversational agents: Systems that (try to) participate in dialogues. Examples: Directory assistance, travel info, weather, restaurant and navigation info. Issues: Limited understanding: ASR errors, interpretation. Computational costs: broader coverage -> slower, less accurate.

Dialogue Manager Tradeoffs Flexibility vs Simplicity/Predictability System vs User vs Mixed Initiative Order of dialogue interaction Conversational “naturalness” vs Accuracy Cost of model construction, generalization, learning, etc Models: FST, Frame-based, HMM, BDI Evaluation frameworks
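Of the models listed, the frame-based one is the easiest to sketch: the manager keeps a frame of slots and asks about the first slot that is still empty. The slot names, prompts, and function names below are assumptions chosen to echo the travel example, not a description of any particular system.

```python
from typing import Dict, Optional

# Slots in the order the system asks about them (hypothetical travel frame).
SLOT_PROMPTS = {
    "origin": "What city are you leaving from?",
    "destination": "And you're flying into what city?",
    "date": "What day did you want to travel?",
    "time": "What time would you like to leave?",
}


def update(frame: Dict[str, str], parsed: Dict[str, str]) -> Dict[str, str]:
    """Return a new frame with every recognized slot in `parsed` filled in.

    Accepting slots the system did not ask about is what allows the user
    to take some initiative (for example giving date and city together).
    """
    new_frame = dict(frame)
    for slot, value in parsed.items():
        if slot in SLOT_PROMPTS:
            new_frame[slot] = value
    return new_frame


def next_prompt(frame: Dict[str, str]) -> Optional[str]:
    """Return the prompt for the first unfilled slot, or None when done."""
    for slot, prompt in SLOT_PROMPTS.items():
        if frame.get(slot) is None:
            return prompt
    return None
```

The tradeoff from the slide is visible even here: the fixed slot order makes the system predictable and simple to build, while the permissive `update` step buys a little flexibility without a full planning model.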