Download presentation
Presentation is loading. Please wait.
1
Challenges in Dialogue
Natural Language Processing CMSC February 19, 2008
2
Roadmap Issues in Dialogue Dialogue vs General Discourse Dialogue Acts
Modeling Recognition and Interpretation Dialogue Management for Computational Agents
3
Dialogue vs General Discourse
Key contrast: Two or more speakers Primary focus on speech Issues in multi-party spoken dialogue Turn-taking – who speaks next, when? Collaboration – clarification, feedback,… Disfluencies Adjacency pairs, dialogue acts
4
Turn-Taking Multi-party discourse When?
Need to trade off speaker/hearer roles Interpret reference from sequential utterances When? End of sentence? No: multi-utterance turns Silence? No: little silence in smooth dialogue:< 250ms When other starts speaking? No: relatively little overlap face-to-face: ~5%
5
Turn-taking: When Rule-governed behavior
Possibly multiple legal turn change times Aka transition-relevance places (TRP) Generally at utterance boundaries Utterance not necessarily sentence In fact, utterance/sentence boundaries not obvious in speech Don’t necessarily pause between sentences Automatic utterance boundary detection Cue words (okay, so,..); POS sequences; prosody
6
Turn-taking: Who & How At each TRP in each turn (Sacks 1974)
If speaker has selected A to speak, A must take floor If speaker has selected no one to speak, anyone can If no one else takes the turn, the speaker can Selecting speaker A: By explicit/implicit mention: What about it, Bob? By gaze, function Selecting others: questions, greetings, closing (Traum et al., 2003)
7
Turn-taking in HCI Human turn end: System turn end:
Detected by 250ms silence System turn end: Signaled by end of speech Indicated by any human sound Barge-in Continued attention: No signal
8
Gesture, Gaze & Voice Range of gestural signals:
head (nod,shake), shoulder, hand, leg, foot movements; facial expressions; postures; artifacts Align with syllables Units: phonemic clause + change Study with recorded exchanges
9
Yielding the Floor Turn change signal
Offer floor to auditor/hearer Cues: pitch fall, lengthening, “but uh”, end gesture, amplitude drop+’uh’, end clause Likelihood of change increases with more cues Negated by any gesticulation
10
Taking the Floor Speaker-state signal Occurs at beginning of turns
Indicate becoming speaker Occurs at beginning of turns Cues: Shift in head direction AND/OR Start of gesture
11
Retaining the Floor Within-turn signal Continuation signal
Still speaker: Look at hearer as end clause Continuation signal Still speaker: Look away after within-turn/back Back-channel: ‘mmhm’/okay/etc; nods, sentence completion. Clarification request; restate NOT a turn: signal attention, agreement, confusion
12
Segmenting Turns Speaker alone: Joint signal:
Within-turn signal->end of one unit; Continuation signal -. Beginning of next unit Joint signal: Speaker turn signal (end); auditor ->speaker; speaker->auditor Within-turn + back-channel + continuation Back-channels signal understanding Early back-channel + continuation
13
Regaining Attention Gaze & Disfluency
Disfluency: “perturbation” in speech Silent pause, filled pause, restart Gaze: Conversants don’t stare at each other constantly However, speaker expects to meet hearer’s gaze Confirm hearer’s attention Disfluency occurs when realize hearer NOT attending Pause until begin gazing, or to request attention
14
Improving Human-Computer Turn-taking
Identifying cues to turn change and turn start Meeting conversations: Recorded, natural research meetings Multi-party Overlapping speech Units = “Spurts” between 500ms silence Can predict – on-line – likely turn end
15
Text + Prosody Text sequence: Prosody: Integrate LM + DT
Modeled as n-gram language model Implement as HMM Prosody: Duration, Pitch, Pause, Energy Decision trees: classify + probability Integrate LM + DT
16
Decision Trees A X=t X=f B C Y>1 Y<=1 Y<=2 Y>2 D E F G
Sentence End None Sentence End Disfluency
17
Interpreting Breaks For each inter-word position: Key features:
Is it a disfluency, sentence end, or continuation? Key features: Pause duration, vowel duration 62% accuracy wrt 50% chance baseline ~90% overall Best combines LM & DT
18
Jump-in Points (Used) Possible turn changes Key features:
Points WITHIN spurt where new speaker starts Key features: Pause duration, low energy, pitch fall Accuracy: 65% wrt 50% baseline Performance depends only on preceding prosodic features
19
Jump-in Features Do people speak differently when jump-in?
Differ from regular turn starts? Examine only first words of turns No LM Key features: Raised pitch, raised amplitude Accuracy: 77% wrt 50% baseline Prosody only
20
Collaborative Communication
Speaker tries to establish and add to “common ground” – “mutual belief” Presumed a joint, collaborative activity Make sure “mutually believe” the same thing Hearer can acknowledge/accept/disagree Clark & Schaeffer: Degrees of grounding Display, Demonstrate/Reformulate, Acknowledgement, Next relevant contribution, Continued attention
21
Computational Models (Traum et al) revised for computation
Involves both speaker and hearer Initiate, Continue, Acknowledge, Repair, Request Repair, etc Common phenomena “Back-Channel” – “uh-huh”, “okay”, etc Allows hearer to signal continued attention, ack WITHOUT taking the turn Requests for repair – common in human-human Even more common in human-computer dialogue
22
Implicature & Grice’s Maxims
Inferences licensed by utterances Grice’s Maxims Quantity: Be as informative as required “There are two classes per week” – not 1, or 5 Quality: Be truthful – don’t lie, Relevance: Be relevant Manner: “Be perspicuous” Don’t be obscure, ambiguous, prolix, or disorderly “Flouting” maxims: Consciously violate for effect Humor, emphasis,
23
Speech & Dialogue Acts Speech Acts (Austin, Searle)
“Doing things with words” E.g. performatives: “I dub thee Sir Lancelot” Illocutionary acts: act of asking, answering, promising, etc in saying an utterance Include: Assertives: “I propose to..” , Directives: “Stop that”, Commissives: “I promise”, Expressives: “Thank you”, Declarations: “You’re fired”
24
Dialogue Acts (aka Conversational moves) Enriched set of speech acts
Capture full range of conversational functions Adjacency pairs: Many two-part structures E.g. Question-Answer, Greeting-Greeting, Request-Grant, etc… Paired for speaker-hearer dyads Contrast with rhetorical relations in monologue
25
DAMSL Dialogue Act Tagging framework Forward looking functions
Adjacency pairs+grounding+repair Forward looking functions Statement, info-request, commit, closing, etc Backward looking functions Focus on link to prior speaker utterance Agreement, answer, accept, etc..
26
Tagged Dialogue [assert] C1: . . . I need to travel in May.
[inforeq,ack] A1: And, what day in May did you want to travel? [assert,answer] C2: OK uh I need to be there for a meeting that’s from the 12th to the 15th. [inforeq,ack] A2: And you’re flying into what city? [assert,answer]C3: Seattle. [inforeq,ack] A3: And what time would you like to leave Pittsburgh? [check,hold] C4: Uh hmm I dont think theres many options for nonstop. [accept,ack] A4: Right. [assert] There’s three non-stops today. [info-req] C5: What are they? [assert,open-option] A5: The first one departs PGH at 10:00am arrives Seattle at 12:05 their time. The second flight departs PGH at 5:55pm, arrives Seattle at 8pm. And the last flight departs PGH at 8:15pm arrives Seattle at 10:28pm. [accept,ack] C6: OK Ill take the 5ish flight on the night before on the11th. [check,ack] A6: On the 11th? [assert,ack] OK. Departing at 5:55pm arrives Seattle at 8pm, U.S. Air flight 115. [ack] C7: OK. Tagged Dialogue
27
Dialogue Act Recognition
Goal: Identify dialogue act tag(s) from surface form Challenge: Surface form can be ambiguous “Can you X?” – yes/no question, or info-request “Flying on the 11th, at what time?” – check, statement Requires interpretation by hearer Strategies: Plan inference, cue recognition
28
Plan-inference-based
Classic AI (BDI) planning framework Model Belief, Knowledge, Desire Formal definition with predicate calculus Axiomatization of plans and actions as well STRIPS-style: Preconditions, Effects, Body Rules for plan inference Elegant, but.. Labor-intensive rule, KB, heuristic development Effectively AI-complete
29
Cue-based Interpretation
Employs sets of features to identify Words and collocations: Please -> request Prosody: Rising pitch -> yes/no question Conversational structure: prior act Example: Check: Syntax: tag question “,right?” Syntax + prosody: Fragment with rise N-gram: argmax d P(d)P(W|d) So you, sounds like, etc Details later ….
30
From Human to Computer Conversational agents Issues:
Systems that (try to) participate in dialogues Examples: Directory assistance, travel info, weather, restaurant and navigation info Issues: Limited understanding: ASR errors, interpretation Computational costs: broader coverage -> slower, less accurate
31
Dialogue Manager Tradeoffs
Flexibility vs Simplicity/Predictability System vs User vs Mixed Initiative Order of dialogue interaction Conversational “naturalness” vs Accuracy Cost of model construction, generalization, learning, etc Models: FST, Frame-based, HMM, BDI Evaluation frameworks
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.