Incremental Dialogue Understanding and Feedback for Multiparty, Multimodal Conversation
David Traum, David DeVault, Jina Lee, Zhiyang Wang, and Stacy Marsella


1 Incremental Dialogue Understanding and Feedback for Multiparty, Multimodal Conversation
David Traum, David DeVault, Jina Lee, Zhiyang Wang, and Stacy Marsella
Presented by Zhichao Hu, 04.04.2013

2 Requirements for Human-like Feedback
- Provided in real time, as the speaker is speaking
- Often specific, so the feedback mechanism requires interpretation and attempted understanding of what the speaker is saying
- Expressive, indicating aspects of the feedback giver's current mental state, including beliefs, goals, emotions, and how they relate to the developing utterance
- Sometimes evocative, trying to create an impression on, or elicit a response from, observers of the feedback (including regulation of the main speaker's behavior)

3 Listener Feedback: Example (USC ICT's SASO project for negotiation skill training)
- Ranger tries to convince Utah to take his place as the new sheriff
- Utah: surprised at the offer, but in favor of it
- Harmony: switches from overhearer to eavesdropper, dislikes the offer, shows emotional reactions
[Scene labels: Ranger, Ranger's Deputy, Utah (bartender), Harmony (bar owner)]

4 Demo

5 Listener Feedback: Example
"Utah (1), it's time for me to move on (2) and the (3) town will need a strong leader (4) like yourself (5) to (6) maintain law and order (7)."
- In: Utah's role shifts to addressee; Harmony's role shifts to eavesdropper. Out: U: attend; H: attend
- In: partial understanding. Out: U: understand (nod); H: attend (avoid gaze)
- In: affect: surprise for Utah, fear for Harmony. Out: U: show emotion (surprise); H: show emotion (fear)
- In: attitude: like, for Utah. Out: U: show like (smile); H: attend (glances)
- In: attitude: like, agree for Utah; dislike, anger for Harmony. Out: U: show like; H: show anger
- In: partial understanding. Out: U: understand, agree; H: avoid mutual gaze

6 Implications for System Capabilities
- Incremental perception and understanding
  - Computing hypotheses and predictions of partial and ultimate meaning while listening
  - Revising hypotheses
- Access to pragmatic reasoning
  - Dialogue acts
  - Reference resolution
- Access to goals, plans, intentions, emotions

7 System Architecture: Functional Requirements
- A speech recognizer that can produce incremental, word-by-word results, ideally with confidence scores
- A natural language understanding (NLU) component that produces semantic representations and predictions of final meaning from speech recognition output
- A meta-NLU component that computes confidence estimates given (partial) speech recognition and NLU outputs
- A vision component that can recognize speaker behaviors such as gaze direction
- A domain reasoning component that can model beliefs, tasks, plans, and attitudes toward particular topics
- A dialogue manager that can compute the pragmatic effects of communication as recognized by the above input components, update state, and calculate communicative intentions
- A feedback generator that can produce communicative behaviors given the function specifications from the dialogue manager

8 SASO Vhuman Architecture
[Architecture diagram of the intelligent cognitive agent. Mind: NLU & meta-NLU, dialog manager, emotion model, task planner, natural language generation, body and affective state management; knowledge management with domain-specific knowledge, domain-independent knowledge, and a world state protocol. Body: speech recognition, vision recognition (perception), Smartbody procedural animation planner, non-verbal behavior generator, speech synthesis, rendered game engine. The agent perceives and acts in the real environment.]

9 System Architecture
- Input speech → speech recognizer → utterance text → NLU and meta-NLU → understood utterances → dialogue manager
- Input behavior → vision component → recognized behaviors → dialogue manager
- Dialogue manager ↔ domain reasoning component (causal interpretation, resolved references)
- Dialogue manager → feedback generator → output behavior

10 Function Markup Language (FML) Feedback Message
<indicators Correct="[boolean]" High="[boolean]" Incorrect="[boolean]" Low="[boolean]" MAXF="[boolean]" PF1="[boolean]" PF2="[boolean]" PF3="[boolean]" WillBeCorrect="[boolean]" WillBeHigh="[boolean]" WillBeIncorrect="[boolean]" WillBeLow="[boolean]"/>

11 Automatic Speech Recognition (ASR)
- ASR provides incremental hypotheses while the user is speaking
- Currently uses PocketSphinx, producing a new partial every 200 msec
- Provided info:
  - Speaker
  - Id
  - Progress (how many partials)
  - Text
  - Complete?
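A minimal sketch of how such incremental partials might be represented and consumed downstream. The class and field names mirror the attributes listed on the slide (speaker, id, progress, text, complete), but they are illustrative assumptions, not the system's actual message schema:

```python
from dataclasses import dataclass

@dataclass
class AsrPartial:
    """One incremental ASR hypothesis, emitted roughly every 200 ms.

    Field names follow the slide's attribute list; the exact message
    format is an assumption.
    """
    speaker: str      # who is talking (from the microphone channel)
    utterance: str    # id of the utterance this partial belongs to
    progress: int     # how many partials have been produced so far
    text: str         # current best word-by-word hypothesis
    complete: bool    # True only on the final result

def latest_hypothesis(partials):
    """Return the text of the most recent partial for an utterance."""
    return max(partials, key=lambda p: p.progress).text
```

Downstream components (NLU, dialogue manager) would re-run on each new partial rather than waiting for `complete` to become true.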

12 Natural Language Understanding (NLU)
- For each partial utterance:
  - Two semantic frames
    - Prediction
    - Explicit subframe
  - Confidence indicators
    - F_t = estimated F-score of the current hypothesis
    - F_L = estimated F-score of the final hypothesis
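The F_t and F_L indicators estimate how well a semantic frame matches the (eventual) correct one. A sketch of the underlying F-score computation, assuming frames are represented as sets of attribute-value pairs (the set representation is an assumption; the system's actual frame format may differ):

```python
def frame_f_score(hypothesis, reference):
    """Harmonic mean of precision and recall between two semantic
    frames, each given as a set of (attribute, value) pairs."""
    if not hypothesis or not reference:
        return 0.0
    overlap = len(hypothesis & reference)
    precision = overlap / len(hypothesis)
    recall = overlap / len(reference)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

At run time the reference frame is unknown, so the meta-NLU predicts this score from features of the partial ASR and NLU output rather than computing it directly.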

13 Semantic Interpretation: Example
- Predicted & explicit sub-frame
- Probability distribution across hypotheses
- Estimated F-score & confidence

14 Semantic Interpretation: Example

15 Dialogue Management (DM) / Pragmatics
- Reference resolution
  - Indexicals ("I", "we", "you", "here", …)
  - Anaphors ("he", "she", "it", …)
  - Descriptions (e.g., "the money")
  - Task and state descriptions
- Dialogue act recognition
- Role updates for each conversational participant

16 Reference Resolution Steps
1. Look up the word in the lexicon
2. Extract semantic features
3. Find candidates in the domain representation that match the features (prioritized partial unification)
4. Compare candidates to the dialogue context/history (prefer recency)
5. Raise a goal to clarify if zero or more than one candidate remains
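The steps above can be sketched as a small resolution function. All data structures here (lexicon as word → feature dict, domain as entity → feature dict, history as most-recent-first entity list) are illustrative assumptions, and the exact feature match is a crude stand-in for prioritized partial unification:

```python
def resolve_reference(word, lexicon, domain, history):
    """Resolve a referring word to a domain entity, or signal a
    clarification goal. Returns (entity_or_None, goal_or_None)."""
    # 1-2. Look up the word and extract its semantic features
    features = lexicon.get(word, {})
    # 3. Find domain candidates whose features match the word's
    candidates = [e for e, f in domain.items()
                  if all(f.get(k) == v for k, v in features.items())]
    # 4. Prefer the most recently mentioned candidate
    for entity in history:
        if entity in candidates:
            return entity, None
    # 5. Exactly one candidate resolves; zero or several triggers
    #    a goal to clarify
    if len(candidates) == 1:
        return candidates[0], None
    return None, "clarify"
```
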

17 Information State / Dialogue Acts Model (Traum & Rickel 2002)

Layer | Info State Components | Dialogue Acts
Contact | Participant contact | Make-contact, break-contact
Attention | Participant focus | Show, request, accept
Conversation | Conversation, topic, participants | Start-conversation, end-conversation, confirm-start, deny-start, identify-topic, join, leave
Turn-taking | Conversation turn | Take-turn, keep-turn, hold-turn, release-turn, assign-turn
Initiative | Conversation initiative | Take-initiative, release-initiative
Grounding | Conversation CGUs | Initiate, continue, acknowledge, repair, cancel, request-repair
Core | Social state (obligations, commitments, trust); conversation QUD, negotiation, CGU contents | Forward: assert, info-req, order, request, thank, greeting, closing, express, check, suggest, promise, offer, apology, encourage, accuse, intro-topic, avoid. Backward: accept, reject, address, answer, divert, counterpropose, hold, check, clarify-parameter, redirect

18 Participant Structure
- Conversational participant roles
  - Active participant (recent speaker or addressee)
  - Overhearer (passive role)
- Utterance participation roles
  - Speaker
  - Addressee
  - Side-participant
  - Overhearer
  - Eavesdropper

19 Computing Utterance Participation Roles
- Speaker: given by ASR/microphone
- Addressee: calculated using text, gaze, & context
- Side-participant: active participant who is not speaker or addressee
- Overhearer: other ratified participant or bystander, not participating
- Eavesdropper: bystander covertly listening

20 Addressee Recognition Algorithm
1. If the utterance specifies an addressee (e.g., a vocative, or an utterance of just a name when not expecting a short answer or a clarification of type person), then Addressee = specified addressee
2. Else if the speaker is facing/gazing at someone, then Addressee = faced participant
3. Else if the speaker of the current utterance is the same as the speaker of the immediately previous utterance, then Addressee = previous addressee
4. Else if the previous speaker is different from the current speaker, then Addressee = previous speaker
5. Else if there is a unique other conversational participant, then Addressee = that participant
6. Else Addressee unknown
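The six-step cascade above can be sketched as a single decision function. The argument names and the dict shape of `utterance` (with an optional `"vocative"` key) are assumptions for illustration, not the system's actual interfaces:

```python
def recognize_addressee(utterance, speaker, gaze_target,
                        prev_speaker, prev_addressee, participants):
    """Return the inferred addressee, or None if unknown."""
    # 1. An explicit vocative names the addressee directly
    if utterance.get("vocative"):
        return utterance["vocative"]
    # 2. The speaker's gaze picks out a participant
    if gaze_target is not None:
        return gaze_target
    # 3. The same speaker continuing keeps the same addressee
    if speaker == prev_speaker and prev_addressee is not None:
        return prev_addressee
    # 4. A new speaker is taken to be responding to the previous one
    if prev_speaker is not None and speaker != prev_speaker:
        return prev_speaker
    # 5. Only one other conversational participant remains
    others = [p for p in participants if p != speaker]
    if len(others) == 1:
        return others[0]
    # 6. Otherwise the addressee is unknown
    return None
```
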

21 Conversational Goals
- Positive comprehension goal: the agent will expend cognitive resources to listen to and understand the utterance
- Negative comprehension goal: the agent will focus attention on other matters, such as planning next actions or utterances, emotional reasoning, or task execution
- Positive participation goal: the agent will look for opportunities to further the conversation with active conversational behavior
- Negative participation goal: the agent tends to disengage, perhaps moving further away and out of contact

22 Default/Normative Conversational Goals: based on utterance roles

Role | Comprehension Goal | Participation Goal
Speaker | yes | yes
Addressee | yes | yes
Side-participant | yes | yes
Overhearer | no | no
Eavesdropper | yes | no
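The defaults above are a straightforward role-to-goals lookup; an eavesdropper, for instance, tries to comprehend without participating. A sketch as a table in code (the transcript's single values per row are read as applying to both columns, which is an assumption):

```python
# Default comprehension/participation goals per utterance role.
DEFAULT_GOALS = {
    "speaker":          {"comprehension": True,  "participation": True},
    "addressee":        {"comprehension": True,  "participation": True},
    "side-participant": {"comprehension": True,  "participation": True},
    "overhearer":       {"comprehension": False, "participation": False},
    "eavesdropper":     {"comprehension": True,  "participation": False},
}

def default_goals(role):
    """Return the normative goals for an utterance participation role."""
    return DEFAULT_GOALS[role.lower()]
```
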

23 Domain and Emotion Reasoning: EMA (Emotion and Adaptation)
- Task model: relates states and actions to goals
- Derived utility calculation
- Appraisal: computing relevant values
- Emotion mapping
- Coping

24 Appraisal Variables
- Desirability: value of the proposition to the agent (e.g., does it causally advance or inhibit a state of utility for the agent)
- Likelihood: how likely states are to occur
- Causal attribution: who deserves credit/blame
- Controllability: can the outcome be altered by actions under the agent's control
- Changeability: can the outcome be altered by some other causal agent
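The appraisal variables feed EMA's emotion mapping. A toy illustration of how two of them might map to emotion labels; the thresholds and labels here are illustrative assumptions in the spirit of appraisal theory, not EMA's actual rules:

```python
def appraise(desirability, likelihood):
    """Map desirability (negative to positive) and likelihood
    (0.0 to 1.0) of a state to a coarse emotion label."""
    if likelihood >= 1.0:
        # The state has come about: react to the fact
        return "joy" if desirability > 0 else "distress"
    # The state is merely anticipated: react to the prospect
    return "hope" if desirability > 0 else "fear"
```

Harmony's fear in the example scene fits this pattern: an undesirable state (Utah becoming sheriff) that is likely but not yet certain.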

25 Computing Attitude & Affect
- Attitude:
  - "Felt" type and intensity from desirability
  - Stance from coping strategy:
    - Leaked → use "felt" values
    - Intended → values based on interpersonal goal
- Affect:
  - Leaked: use emotion associated with referenced focus
  - Intended: portray emotion based on coping strategy

26 Overriding Conversational Goal Norms
- Using coping strategies and higher-level conversational goals to try to change participant status
- Want to say something → add participation goals, even if currently an eavesdropper or overhearer
- Want to leave the conversation → drop participation goal
- (Locally) divided attention → drop comprehension goals while keeping participation goal

27 Sources of Message Aspects: ASR, NLU, DM, EMA

28 Real-time Calculation
- NLU
- Meta-NLU
- Dialogue
- Attitude

29 Behavior Generation
- Addressee role behaviors
  - Attention: gaze/mutual gaze
  - Cognitive load: avert gaze
  - Social comparison: glance at others
  - Grounding: nod, head-tilt, frown
  - Attitude/affect: facial display (smile, frown, brow flash)
- Side-participants: similar to addressee but less committed/engaged
- Eavesdroppers: avoid reactive behavior; furtive glances
- Role-changing behavior: adopt behaviors of the desired role

30 Demo (reprise)

31 Current & Future Work
- Evaluation in real multi-party interaction (so far, just developer tests)
- Verbal feedback
  - Verbal backchannels / explicit acknowledgement
  - Completions (collaborative & strategic)
  - Responses
- Incremental grounding
  - Visser et al. workshop presentation on Saturday
- Full integration of the affect component
- Strategic (intended) behaviors

