1 Incremental Dialogue Understanding and Feedback for Multiparty, Multimodal Conversation
David Traum, David DeVault, Jina Lee, Zhiyang Wang, and Stacy Marsella
04.04.2013, presented by Zhichao Hu
2 Requirements for Human-like Feedback
- Provided in real time, as the speaker is speaking
- Often specific, so the feedback mechanism requires interpretation and attempted understanding of what the speaker is saying
- Expressive, indicating aspects of the current mental state of the feedback giver, including beliefs, goals, emotions, and how the developing utterance relates to them
- Sometimes evocative, trying to create an impression on, or a response behavior from, the observers of the feedback (including regulation of the main speaker's behavior)
3 Listener Feedback - Example
- Scenario from USC ICT's SASO project for negotiation skill training
- Characters: Ranger, Ranger's Deputy, Utah (bartender), Harmony (bar owner)
- The Ranger tries to convince Utah to take his place as the new sheriff
- Utah: surprised at the offer, but in favor of it
- Harmony: switches from overhearer to eavesdropper; dislikes the offer; emotional reactions
4 Demo
5 Listener Feedback - Example
"Utah (1), it's time for me to move on (2) and the (3) town will need a strong leader (4) like yourself (5) to (6) maintain law and order (7)."
- (1) In: "Utah"; Utah's role shifts to addressee, Harmony's role shifts to eavesdropper. Out: U: Attend; H: Attend
- (2) In: partial understanding. Out: U: Understand (nod); H: Attend (avoid gaze)
- (3)-(4) In: affect - surprise for Utah and fear for Harmony. Out: U: Show emotion (surprise); H: Show emotion (fear)
- (5) In: attitude - like, for Utah. Out: U: Show like (smile); H: Attend (glances)
- (6) In: attitude - like and agree for Utah; dislike and anger for Harmony. Out: U: Show like; H: Show anger
- (7) In: partial understanding. Out: U: Understand, agree; H: Avoid mutual gaze
6 Implications for System Capabilities
- Incremental perception and understanding
  - Computing hypotheses and predictions of partial and ultimate meaning while listening
  - Revising hypotheses
- Access to pragmatic reasoning
  - Dialogue acts
  - Reference resolution
- Access to goals, plans, intention, emotion
7 System Architecture: Functional Requirements
- A speech recognizer that can produce incremental, word-by-word results, ideally with confidence scores
- A natural language understanding (NLU) component that produces semantic representations and predictions of final meaning when given a speech recognition output
- A meta-NLU component that computes confidence estimates given (partial) speech recognition and NLU outputs
- A vision component that can recognize speaker behaviors such as gaze direction
- A domain reasoning component that can model beliefs, tasks, plans, and attitudes toward particular topics
- A dialogue manager that can compute the pragmatic effects of communication as recognized by the above input components, update state, and calculate communicative intentions
- A feedback generator that can produce communicative behaviors given the function specifications from the dialogue manager
8 SASO Vhuman Architecture (diagram: an intelligent cognitive agent in a rendered game-engine environment)
- Perception: speech recognition, NLU & meta-NLU, vision recognition
- Mind: dialog manager, emotion model, task planner, knowledge management (domain-specific and domain-independent knowledge, world state, protocol)
- Body: natural language generation, speech synthesis, non-verbal behavior generator, body and affective state management, Smartbody procedural animation planner
9 System Architecture
- Input speech -> speech recognizer -> utterance text -> NLU and meta-NLU -> understood utterances -> dialogue manager
- Input behavior -> vision component -> recognized behaviors -> dialogue manager
- Dialogue manager <-> domain reasoning component (causal interpretation, resolved references)
- Dialogue manager -> feedback generator -> output behavior
10 Function Markup Language (FML) Feedback Message
- Message content: (minimum 2) [string]
- Confidence-indicator element:
  <indicators Correct="[boolean]" High="[boolean]" Incorrect="[boolean]" Low="[boolean]" MAXF="[boolean]" PF1="[boolean]" PF2="[boolean]" PF3="[boolean]" WillBeCorrect="[boolean]" WillBeHigh="[boolean]" WillBeIncorrect="[boolean]" WillBeLow="[boolean]"/>
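An indicators element like the one above can be assembled mechanically from boolean flags. A minimal sketch using Python's standard XML library follows; the `build_indicators` helper and its flag-dictionary input are illustrative assumptions, not part of the actual FML toolchain:

```python
import xml.etree.ElementTree as ET

# Attribute names taken from the FML indicators element on the slide.
INDICATOR_ATTRS = [
    "Correct", "High", "Incorrect", "Low", "MAXF", "PF1", "PF2", "PF3",
    "WillBeCorrect", "WillBeHigh", "WillBeIncorrect", "WillBeLow",
]

def build_indicators(flags):
    """Build an <indicators .../> element; unset flags default to false."""
    elem = ET.Element("indicators")
    for name in INDICATOR_ATTRS:
        elem.set(name, "true" if flags.get(name, False) else "false")
    return elem

xml_text = ET.tostring(build_indicators({"Correct": True, "High": True}),
                       encoding="unicode")
```

A message producer would embed this element alongside the string payload; defaulting absent flags to false keeps the element schema-complete.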
11 Automatic Speech Recognition (ASR)
- ASR provides incremental hypotheses while the user is speaking
- Currently uses PocketSphinx, emitting a partial result every 200 msec
- Each partial message provides: speaker id, utterance id ([id]), progress (how many partials so far, an integer), the recognized text ([string]), and a complete flag ([boolean])
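A downstream consumer of these partials mainly needs to keep the latest hypothesis per utterance id and notice when the complete flag arrives. A small sketch, with a message type whose field names mirror the slide (the class and function names are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class AsrPartial:
    """One incremental ASR message: speaker, utterance id, progress counter,
    hypothesis text, and whether the utterance is complete."""
    speaker: str
    utterance: str
    progress: int
    text: str
    complete: bool

def merge_partials(partials):
    """Keep only the most recent hypothesis for each utterance id."""
    latest = {}
    for p in partials:
        latest[p.utterance] = p
    return latest

# Simulated 200-msec partial stream for one utterance.
stream = [
    AsrPartial("ranger", "u1", 1, "utah", False),
    AsrPartial("ranger", "u1", 2, "utah it's time", False),
    AsrPartial("ranger", "u1", 3, "utah it's time for me to move on", True),
]
final = merge_partials(stream)["u1"]
```

In a live system each new partial would instead trigger NLU and feedback immediately rather than being buffered, but the latest-wins bookkeeping is the same.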
12 Natural Language Understanding (NLU)
- For each partial utterance, the NLU produces two semantic frames:
  - a prediction of the final meaning
  - an explicit subframe (what has been directly expressed so far)
- plus confidence indicators:
  - F_t = estimated F-score of the current hypothesis
  - F_L = estimated F-score of the final hypothesis
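The F-score between a hypothesized frame and a reference frame can be computed once frames are compared element-wise. A sketch under the assumption (mine, for illustration) that a frame is a set of attribute-value pairs:

```python
def frame_fscore(hyp, ref):
    """F1 between two frames represented as sets of (attribute, value) pairs."""
    if not hyp or not ref:
        return 0.0
    true_pos = len(hyp & ref)          # pairs the hypothesis got right
    precision = true_pos / len(hyp)
    recall = true_pos / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy frames loosely inspired by the sheriff scenario.
hyp = {("speech-act", "offer"), ("addressee", "utah"), ("topic", "sheriff")}
ref = {("speech-act", "offer"), ("addressee", "utah"), ("topic", "law-and-order")}
score = frame_fscore(hyp, ref)
```

The meta-NLU's F_t and F_L are estimates of scores like this, predicted without access to the reference frame; the indicator booleans in the FML message (High, Low, WillBeCorrect, ...) then threshold those estimates.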
13 Semantic Interpretation: example
- Predicted & explicit sub-frame
- Probability distribution across hypotheses
- Estimated F-score & confidence
14 Semantic Interpretation: example
15 Dialogue Management (DM) / Pragmatics
- DM messages annotate each participant's role (e.g., addressee, bystander)
- Reference resolution:
  - indexicals ("I", "we", "you", "here", ...)
  - anaphors ("he", "she", "it", ...)
  - descriptions (e.g., "the money")
  - task and state descriptions
- Dialogue act recognition
- Role updates for each conversational participant
16 Reference Resolution Steps
1. Look up the word in the lexicon
2. Extract semantic features
3. Find candidates in the domain representation that match the features (prioritized partial unification)
4. Compare candidates to the dialogue context/history (prefer recency)
5. If zero or more than one candidate remains, adopt a goal to clarify
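The five steps above can be sketched end to end. The lexicon, domain entities, and feature names below are all toy data invented for illustration; only the control flow follows the slide:

```python
# Step 1-2 source: a toy lexicon mapping words to semantic features.
LEXICON = {
    "money":   {"type": "object", "sort": "currency"},
    "sheriff": {"type": "person", "role": "sheriff"},
}

# Step 3 source: a toy domain representation.
DOMAIN = [
    {"id": "payment1", "type": "object", "sort": "currency"},
    {"id": "ranger",   "type": "person", "role": "sheriff"},
    {"id": "utah",     "type": "person", "role": "bartender"},
]

def resolve(word, history):
    features = LEXICON.get(word, {})                        # steps 1-2
    candidates = [e for e in DOMAIN                          # step 3: match features
                  if all(e.get(k) == v for k, v in features.items())]
    recent = [e for e in candidates if e["id"] in history]   # step 4: prefer recency
    candidates = recent or candidates
    if len(candidates) == 1:
        return candidates[0]["id"]
    return "clarify"                                         # step 5: 0 or >1 left

referent = resolve("sheriff", history=["ranger"])
```

A real implementation would use graded partial unification rather than exact feature equality, and "clarify" would become a dialogue goal rather than a return value.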
17 Information State / Dialogue Acts Model (Traum & Rickel 2002)
- Contact - info state: participant contact; acts: make-contact, break-contact
- Attention - info state: participant focus; acts: show, request, accept
- Conversation - info state: conversation, topic, participants; acts: start-conversation, end-conversation, confirm-start, deny-start, identify-topic, join, leave
- Turn-taking - info state: conversation turn; acts: take-turn, keep-turn, hold-turn, release-turn, assign-turn
- Initiative - info state: conversation initiative; acts: take-initiative, release-initiative
- Grounding - info state: conversation CGUs; acts: initiate, continue, acknowledge, repair, cancel, request-repair
- Core - info state: social state (obligations, commitments, trust); conversation QUD, negotiation, CGU contents; acts:
  - forward: assert, info-req, order, request, thank, greeting, closing, express, check, suggest, promise, offer, apology, encourage, accuse, intro-topic, avoid
  - backward: accept, reject, address, answer, divert, counterpropose, hold, check, clarify-parameter, redirect
18 Participant Structure
- Conversational participant roles:
  - active participant (recent speaker or addressee)
  - overhearer (passive role)
- Utterance participation roles:
  - speaker, addressee, side-participant, overhearer, eavesdropper
19 Computing Utterance Participation Roles
- Speaker: given by ASR/microphone
- Addressee: calculated using text, gaze, & context
- Side-participant: active participant who is not the speaker or addressee
- Overhearer: other ratified participant or bystander, not participating
- Eavesdropper: bystander covertly listening
20 Addressee Recognition Algorithm
1. If the utterance specifies an addressee (e.g., a vocative, or an utterance of just a name when not expecting a short answer or a clarification of type person), then addressee = specified addressee
2. else if the speaker is facing/gazing at someone, then addressee = faced participant
3. else if the speaker of the current utterance is the same as the speaker of the immediately previous utterance, then addressee = previous addressee
4. else if the previous speaker is different from the current speaker, then addressee = previous speaker
5. else if there is a unique other conversational participant, then addressee = that participant
6. else addressee unknown
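The six-step cascade above translates directly into code. A minimal sketch; the field names on the utterance and context dictionaries are invented for illustration:

```python
def recognize_addressee(utt, context):
    """Cascade of addressee cues, most reliable first."""
    if utt.get("vocative"):                      # step 1: explicit addressee
        return utt["vocative"]
    if utt.get("gaze_target"):                   # step 2: speaker gaze/facing
        return utt["gaze_target"]
    prev = context.get("previous_utterance")
    if prev:
        if prev["speaker"] == utt["speaker"]:    # step 3: same speaker continues
            return prev["addressee"]
        return prev["speaker"]                   # step 4: reply to previous speaker
    others = [p for p in context.get("participants", [])
              if p != utt["speaker"]]
    if len(others) == 1:                         # step 5: only one other participant
        return others[0]
    return None                                  # step 6: unknown

addressee = recognize_addressee(
    {"speaker": "ranger", "gaze_target": "utah"},
    {"participants": ["ranger", "utah", "harmony"]},
)
```

The ordering matters: explicit vocatives and gaze override the turn-taking heuristics, which only apply when no perceptual cue is available.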
21 Conversational Goals
- Positive comprehension goal: the agent will expend cognitive resources to listen to and understand the utterance
- Negative comprehension goal: the agent will focus attention on other matters, such as planning next actions or utterances, emotional reasoning, or task execution
- Positive participation goal: the agent will look for opportunities to further the conversation with active conversational behavior
- Negative participation goal: the agent tends to disengage, perhaps moving further away and out of contact
22 Default/Normative Conversational Goals (based on utterance roles)
- Speaker: comprehension goal yes, participation goal yes
- Addressee: comprehension goal yes, participation goal yes
- Side-participant: comprehension goal yes, participation goal yes
- Overhearer: comprehension goal no, participation goal no
- Eavesdropper: comprehension goal yes, participation goal no
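The default-goal table is naturally a lookup keyed by role. A sketch; note that the overhearer row is reconstructed from the slide's merged table cells, so treat those values as an assumption:

```python
# Default comprehension/participation goals per utterance role.
DEFAULT_GOALS = {
    "speaker":          {"comprehension": True,  "participation": True},
    "addressee":        {"comprehension": True,  "participation": True},
    "side-participant": {"comprehension": True,  "participation": True},
    "overhearer":       {"comprehension": False, "participation": False},
    "eavesdropper":     {"comprehension": True,  "participation": False},
}

def default_goals(role):
    """Fall back to no goals for unrecognized roles."""
    return DEFAULT_GOALS.get(role, {"comprehension": False,
                                    "participation": False})

goals = default_goals("eavesdropper")
```

These are only the normative defaults; the later slide on overriding goal norms describes when an agent deviates from them.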
23 Domain and Emotion Reasoning: EMA (Emotion and Adaptation)
- Task model: relates states and actions to goals
- Derived utility calculation
- Appraisal: computing relevant values
- Emotion mapping
- Coping
24 Appraisal Variables
- Desirability: value of the proposition to the agent (e.g., does it causally advance or inhibit a state of utility for the agent)
- Likelihood: how likely states are to occur
- Causal attribution: who deserves credit/blame
- Controllability: can the outcome be altered by actions under the control of the agent
- Changeability: can the outcome be altered by some other causal agent
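A toy mapping in the spirit of appraisal theory shows how two of these variables combine into an emotion label. This is a simplification for illustration, not the actual EMA rule set:

```python
def appraise(desirability, likelihood):
    """Toy appraisal: sign of desirability crossed with certainty.

    desirability in [-1, 1]; likelihood in [0, 1], where 1.0 means
    the state has (certainly) occurred.
    """
    if desirability > 0:
        return "joy" if likelihood >= 1.0 else "hope"
    if desirability < 0:
        return "distress" if likelihood >= 1.0 else "fear"
    return "neutral"

# Harmony appraising the sheriff offer: undesirable and not yet certain.
emotion = appraise(desirability=-0.8, likelihood=0.6)
```

EMA additionally uses causal attribution, controllability, and changeability, which distinguish e.g. anger (blame on another agent) from distress and drive the choice of coping strategy.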
25 Computing Attitude & Affect
- Attitude: "felt" type and intensity derived from desirability
  - Stance from the coping strategy:
    - leaked -> use the "felt" values
    - intended -> values based on the interpersonal goal
- Affect:
  - leaked: use the emotion associated with the referenced focus
  - intended: portray an emotion based on the coping strategy
26 Overriding Conversational Goal Norms
Using coping strategies and higher-level conversational goals to try to change participant status:
- Want to say something: add participation goals, even if currently an eavesdropper or overhearer
- Want to leave the conversation: drop the participation goal
- (Locally) divided attention: drop comprehension goals while keeping the participation goal
27 Sources of Message Aspects
- ASR
- NLU
- DM
- EMA
28 Real-time Calculation
- NLU
- Meta-NLU
- Dialogue
- Attitude
29 Behavior Generation
- Addressee role behaviors:
  - Attention: gaze/mutual gaze
  - Cognitive load: avert gaze
  - Social comparison: glance at others
  - Grounding: nod, head-tilt, frown
  - Attitude/affect: facial display (smile, frown, brow flash)
- Side-participants: similar to addressee but less committed/engaged
- Eavesdroppers: avoid reactive behavior; furtive glances
- Role-changing behavior: adopt the behaviors of the desired role
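Role-conditioned behavior selection can be sketched as a lookup from (role, feedback function) to a nonverbal behavior. The table entries and function names below are invented for illustration; only the role-sensitivity reflects the slide:

```python
# Hypothetical mapping from (listener role, feedback function) to behavior.
BEHAVIORS = {
    ("addressee", "understand"):        "nod",
    ("addressee", "show-like"):         "smile",
    ("side-participant", "understand"): "slight nod",
    ("eavesdropper", "understand"):     "furtive glance",
}

def select_behavior(role, function):
    """Pick a nonverbal behavior; eavesdroppers suppress overt reactions."""
    default = "none" if role == "eavesdropper" else "attend"
    return BEHAVIORS.get((role, function), default)

behavior = select_behavior("addressee", "understand")
```

The fallbacks encode the slide's contrast: non-eavesdroppers default to attending behavior, while eavesdroppers default to doing nothing observable.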
30 Demo (reprise)
31 Current & Future Work
- Evaluation in real multi-party interaction (so far just developer tests)
- Verbal feedback:
  - verbal backchannels / explicit acknowledgement
  - completions (collaborative & strategic)
  - responses
- Incremental grounding (Visser et al., workshop presentation on Saturday)
- Full integration of the affect component
- Strategic (intended) behaviors