1 Bob Marinier Oral Defense University of Michigan, CSE June 17, 2008 A Computational Unification of Cognitive Control, Emotion, and Learning

2 Introduction
- The link between core cognitive functions and emotion has not been fully explored
- Existing computational models are largely pragmatic
- We integrate the PEACTIDM theory of cognitive control with appraisal theories of emotion: PEACTIDM supplies the process, appraisal theories supply the data
- We use emotion-driven reinforcement learning to demonstrate improved functionality: automatically generating rewards and setting parameters

3 Cognitive Control: PEACTIDM
- Perceive: obtain raw perception
- Encode: create a domain-independent representation
- Attend: choose a stimulus to process
- Comprehend: generate structures that relate the stimulus to tasks and can be used to inform behavior
- Task: perform task maintenance
- Intend: choose an action, create a prediction
- Decode: decompose the action into motor commands
- Motor: execute the motor commands
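The eight steps above can be sketched as a processing loop over toy data. Every function name and data structure here is a hypothetical illustration, not the actual Soar implementation:

```python
def encode(raw):
    # Encode: turn raw percepts into domain-independent stimuli with relevance
    return [{"id": s, "relevance": r} for s, r in raw]

def attend(stimuli):
    # Attend: choose the most relevant stimulus to process
    return max(stimuli, key=lambda s: s["relevance"])

def comprehend(stimulus, prediction):
    # Comprehend: relate the stimulus to the task, comparing it to the prior prediction
    return {"stimulus": stimulus["id"], "expected": stimulus["id"] == prediction}

def intend(assessment):
    # Intend: choose an action and create a prediction of the next stimulus
    action = "press-button" if assessment["stimulus"] == "light-on" else "wait"
    prediction = "light-off" if action == "press-button" else "light-on"
    return action, prediction

def decode(action):
    # Decode: decompose the action into motor commands (trivially, here)
    return [action]

def cycle(raw, prediction):
    # Perceive has already supplied `raw`; Motor would execute the returned
    # commands. The Task step (task maintenance) is omitted in this toy sketch.
    stimuli = encode(raw)
    stimulus = attend(stimuli)
    assessment = comprehend(stimulus, prediction)
    action, new_prediction = intend(assessment)
    return decode(action), new_prediction

commands, prediction = cycle([("light-on", 0.9), ("noise", 0.2)], "light-on")
print(commands, prediction)  # ['press-button'] light-off
```

The prediction produced by Intend is carried into the next cycle's Comprehend, which is what later makes appraisals like Discrepancy from Expectation computable.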

4 PEACTIDM Cycle
[Diagram: raw perceptual information enters Perceive; Encode yields stimulus relevance; Attend selects the stimulus chosen for processing; Comprehend produces the current situation assessment; Intend selects an action and creates a prediction that feeds back into Comprehend; Decode produces motor commands; Motor causes environmental change]
What is this information?

5 Appraisal Theories of Emotion
- A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals: novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc.
- The result of the appraisals influences emotion
- The emotion can then be coped with (via internal or external actions)
[Diagram: Situation + Goals → Appraisals → Emotion → Coping]

6 Appraisals to Emotions (Scherer 2001)

Appraisal dimension            Joy          Fear                Anger
Suddenness                     High/medium  High                -
Unpredictability               -            High                -
Intrinsic pleasantness         -            Low                 -
Goal/need relevance            High         -                   -
Cause: agent                   -            Other/nature        Other
Cause: motive                  -            Chance/intentional  Intentional
Outcome probability            Very high    High                Very high
Discrepancy from expectation   -            -                   High
Conduciveness                  Very high    Low                 -
Control                        -            -                   High
Power                          -            Very low            High

Why these dimensions? What is the functional purpose of emotion?
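The table above can be read as prototype appraisal profiles. A minimal sketch of matching an appraisal frame against such profiles, assuming a simplified numeric rendering (high ≈ 1.0, medium ≈ 0.5, low ≈ 0.0) and an illustrative nearest-profile rule; the exact profile values here (including anger's low conduciveness) are assumptions, not Scherer's numbers:

```python
# Prototype profiles as dimension -> approximate value; unspecified dimensions
# are simply omitted (Scherer marks them "open").
PROFILES = {
    "joy":   {"suddenness": 0.75, "goal_relevance": 1.0,
              "outcome_probability": 1.0, "conduciveness": 1.0},
    "fear":  {"suddenness": 1.0, "unpredictability": 1.0, "outcome_probability": 0.9,
              "conduciveness": 0.0, "power": 0.0},
    "anger": {"outcome_probability": 1.0, "discrepancy": 1.0,
              "conduciveness": 0.0, "control": 1.0, "power": 1.0},
}

def closest_emotion(frame):
    # Compare an appraisal frame to each profile on their shared dimensions only
    def dist(profile):
        shared = set(profile) & set(frame)
        return sum((profile[d] - frame[d]) ** 2 for d in shared) / max(len(shared), 1)
    return min(PROFILES, key=lambda e: dist(PROFILES[e]))

frame = {"suddenness": 1.0, "unpredictability": 1.0, "conduciveness": 0.1,
         "power": 0.0, "outcome_probability": 0.9}
print(closest_emotion(frame))  # fear
```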

7 Unification of PEACTIDM and Appraisal Theories
[Diagram: the PEACTIDM cycle with appraisals attached to the steps that generate them — Encode: suddenness, unpredictability, goal relevance, intrinsic pleasantness; Comprehend: causal agent/motive, discrepancy from expectation, conduciveness, control/power; Intend: the prediction and outcome probability]

8 Example: Simple Choice Response Task

9 PEACTIDM in the Button Task
Appraisal frame: Suddenness = 1, Goal Relevance = 1, Conduciveness = 1, Discrepancy = 0, Outcome Probability = 1
(Discrepancy and Outcome Probability together form the "Surprise Factor")

10 PEACTIDM in the Button Task
Appraisal frame: Suddenness = 1, Goal Relevance = 1, Conduciveness = 1, Discrepancy = 0, Outcome Probability = 1
[Animation step: Discrepancy becomes 1, which changes Conduciveness]

11 Summary of Evaluation
1. Cognitively generated emotions: emotions arise from appraisals
2. Fast primary emotions: some appraisals are generated and activated early
3. Emotional experience: cognitive access to emotional state, but no physiology
4. Body-mind interactions: emotions can influence behavior
5. Emotional behavior:
   a. The model works and produces useful, purposeful behavior
   b. Different environments lead to different time courses and different feeling profiles
   c. Choices impact emotions and success

12 Primary Contributions
1. Appraisals are functionally required by cognition: they specify the data used by certain steps in PEACTIDM
2. Appraisals provide a task-independent language for control knowledge: they influence choices such as Attend and Intend
3. PEACTIDM implies a partial ordering of appraisal generation: data dependencies imply that some appraisals can't be generated until after others
4. Circumplex models can be synthesized from appraisal models: emotion intensity and valence can be derived from appraisals
5. Emotion intensity is largely determined by expectations: the "Surprise Factor" is determined by Outcome Probability and Discrepancy from Expectation
6. Some appraisals may require an arbitrary amount of inference: Comprehend can theoretically require arbitrary processing
7. Internal and external stimuli are treated identically: tasking options can be Attended and Intended just like external stimuli

13 Additional Exploration
- Functionality: what is emotion good for? Emotion-driven reinforcement learning
- Scale: does it work in non-trivial domains? Continuous time/space environment; more complex appraisal generation
- Understanding: how do appraisals influence performance? Try subsets of appraisals

14 Intrinsically Motivated Reinforcement Learning (Sutton & Barto 1998; Singh et al. 2004)
[Diagram: in standard RL, a critic in the external environment sends states and rewards to the agent, which sends back actions; in intrinsically motivated RL, the critic moves inside the agent — sensations from the external environment feed an internal appraisal process whose emotion intensity and valence (+/-) generate the reward for the agent's decisions: Reward = Intensity * Valence]
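The intrinsic reward rule above (Reward = Intensity * Valence) in code; the example values are illustrative. Intensity is in [0, 1] and valence in [-1, 1], so the reward lies in [-1, 1]:

```python
def intrinsic_reward(intensity, valence):
    # A strong negative feeling yields punishment; a strong positive one, reward
    return intensity * valence

print(intrinsic_reward(0.8, -0.5))  # -0.4
print(intrinsic_reward(0.8, 0.5))   # 0.4
```

Because the reward is generated internally from the agent's feeling, no task-specific reward function needs to be hand-coded into the environment.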

15 Clean House Domain
[Map: rooms connected by gateways, containing blocks, the agent, and a storage room]

16 Stimuli in the Environment
- External stimuli: gateway to room 73, gateway to room 78, gateway to room 93, the current room, block 1
- Tasking options: create subtask "go to room 73", "go to room 78", "go to room 93", or "clean current room"

17 Learning
- In this domain, the agent is only learning what to Attend to (including Tasking); it is not learning what action to take
- Goal: what is the impact of various appraisals? Most were disabled and a few developed: Conduciveness; Discrepancy from Expectation and Outcome Probability; Goal Relevance; Intrinsic Pleasantness
- Method: SARSA, epsilon-greedy, fixed exploration and learning rates; 50 trials, 15 episodes per trial
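The learning setup described above — SARSA with epsilon-greedy selection over Attend choices and fixed exploration/learning rates — might look like the following sketch. The state names, stimuli, parameter values, and reward are toy stand-ins, not the thesis's actual representation:

```python
import random
from collections import defaultdict

Q = defaultdict(float)                 # Q[(state, stimulus-to-attend-to)]
EPSILON, ALPHA, GAMMA = 0.1, 0.2, 0.9  # fixed exploration/learning rates (illustrative)

def choose_attend(state, stimuli):
    # Epsilon-greedy selection among the proposed Attend targets
    if random.random() < EPSILON:
        return random.choice(stimuli)
    return max(stimuli, key=lambda s: Q[(state, s)])

def sarsa_update(s, a, reward, s2, a2):
    # On-policy TD update toward the reward plus the value of the chosen next pair
    Q[(s, a)] += ALPHA * (reward + GAMMA * Q[(s2, a2)] - Q[(s, a)])

# One illustrative step: attending to a stimulus produced a positive feeling (reward 0.4)
a = choose_attend("room-90", ["block-1", "gateway-73"])
sarsa_update("room-90", a, 0.4, "room-91", "gateway-78")
print(round(Q[("room-90", a)], 3))  # 0.08
```

Here the reward would come from the agent's feeling rather than the environment, which is what makes the appraisal comparisons in the following slides possible.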

18 Conduciveness
- Measures how good or bad a stimulus is
- Influences emotion intensity and valence
- Sufficient to generate a reward
- Value based on "progress" and "path": progress asks whether the agent is getting closer to the goal over time; path asks whether acting on the stimulus will get the agent closer to the goal
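One way to render the progress/path computation in code. The slide only states that the value is based on these two signals, so the encoding below (each signal as ±1, combined by a simple average) is an assumption:

```python
def conduciveness(prev_distance, curr_distance, stimulus_moves_closer):
    # Progress: is the agent getting closer to the goal over time?
    progress = 1.0 if curr_distance < prev_distance else -1.0
    # Path: will acting on this stimulus get the agent closer to the goal?
    path = 1.0 if stimulus_moves_closer else -1.0
    return (progress + path) / 2.0  # good/bad value in [-1, 1]

print(conduciveness(5, 4, True))   # 1.0  (making progress, good path)
print(conduciveness(5, 6, True))   # 0.0  (losing ground, but good path)
print(conduciveness(5, 6, False))  # -1.0 (losing ground, bad path)
```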

19 Conduciveness
Total Failures: 7.6%   Trial Failures: 24%   Final Episode Failures: 6%

20 Outcome Probability and Discrepancy from Expectation
- Measure how likely a prediction is and how accurate the prediction turned out to be
- Influence emotion intensity via the "surprise factor" (unvalenced)
- Predictions and Outcome Probability are generated via a learned task model, resulting in a non-stationary reward
- Discrepancy is generated via comparison to the prediction
- These appraisals were added on top of Conduciveness

21 Outcome Probability and Discrepancy from Expectation
Total Failures: 0%   Trial Failures: 0%   Final Episode Failures: 0%

22 Goal Relevance
- Measures how important a stimulus is for the goal
- Influences emotion intensity (unvalenced)
- Value based on "path" knowledge; the agent actually had too much path knowledge, so some was removed
- The value of Goal Relevance for a stimulus is used to "boost" the Q-value of the Attend operator for that stimulus
- This appraisal was added on top of Conduciveness, Outcome Probability, and Discrepancy

23 Goal Relevance Knowledge Reduction Results

24 Intrinsic Pleasantness
- Measures how attracted the agent is to a stimulus, independent of the current goal
- Influences emotion intensity and valence
- Blocks were made intrinsically pleasant: good because blocks need to be Attended to get cleaned up, bad because the agent may be distracted by blocks that have already been cleaned up
- This appraisal replaced Goal Relevance

25 Intrinsic Pleasantness Results

26 Dynamic Exploration Rate
- Dynamically adjust the exploration rate based on the current emotion
- If Valence < 0, then things could probably be better: ER = |Intensity * Valence|
- If Valence > 0, then things are ok: ER = 0
- Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only
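The rule above, directly in code. The slide leaves Valence = 0 open; treating it as "ok" here is an assumption:

```python
def exploration_rate(intensity, valence):
    # Valence < 0: things could probably be better, so explore
    if valence < 0:
        return abs(intensity * valence)
    # Valence >= 0: things are ok, so exploit
    return 0.0

print(exploration_rate(0.8, -0.5))  # 0.4
print(exploration_rate(0.8, 0.5))   # 0.0
```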

27 Dynamic Exploration Rate
Total Failures: 9.2%   Trial Failures: 38%   Final Episode Failures: 8.0%

28 Dynamic Learning Rate
- Dynamically adjust the learning rate based on the current emotion
- If the reward magnitude is large, then there may be something to learn: LR = |Intensity * Valence|
- Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only, with the Dynamic Exploration Rate enabled
Total Failures: 0.5%   Trial Failures: 8.0%   Final Episode Failures: 0.0%
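The learning-rate rule above in code; the example values are illustrative. Unlike the exploration rate, this one is unconditional on the sign of valence:

```python
def learning_rate(intensity, valence):
    # A large reward magnitude suggests there may be something to learn
    return abs(intensity * valence)

print(learning_rate(0.5, -0.5))  # 0.25
print(learning_rate(0.1, 0.1))   # near zero: little to learn from weak feelings
```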

29 Dynamic Exploration and Learning Rates
- Dynamically adjust both the exploration and learning rates based on the current emotion
- If Valence < 0, then things could probably be better: ER = |Intensity * Valence|; if Valence > 0, then things are ok: ER = 0
- If the reward magnitude is large, then there may be something to learn: LR = |Intensity * Valence|
- Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only
- Results: tighter convergences, better prediction accuracy, a small number of failures

30 Learning Summary
- Conduciveness: the foundation of learning. The agent learns to perform the task better over time.
- Outcome Probability, Discrepancy from Expectation: introduced a learned task model for generating predictions as the basis for these appraisals' values. The agent learns to predict better over time, and failure rates improve markedly.
- Goal Relevance: used to "boost" the Q-values of proposed Attend operators. The agent does extremely well (except for failures), to the point where it almost isn't learning, raising questions about the value of the other appraisals. Reducing the Goal Relevance knowledge led to more learning.
- Intrinsic Pleasantness: used to provide a task-independent bias on valence and intensity. Results are mixed, as expected, but the agent generally learns to overcome the problems.
- Dynamic Exploration and Learning Rates: emotion used to regulate part of the architecture. Resulted in tighter convergences and better prediction accuracy, with slightly more failures.

31 Secondary Contributions
1. Reinforcement learning can be driven by intrinsically generated rewards based on the agent's feeling
2. Reinforcement learning parameters can be influenced by the current emotional state, resulting in improved performance
3. Each appraisal contributes to the agent's performance
4. The system scales to continuous time and space environments
5. Mood averages reward over time, allowing states with no reward-invoking stimulus to still have a reward associated with them

32 Future Work
- Cognition: scalability, validation
- Physiology: action tendencies, non-verbal communication, basic drives
- Integration with other architectural mechanisms: learning (appraisal values, Intend, etc.)
- Human data: believability, sociocultural interactions
- More appraisals (social, perceptual, etc.)
- Physiological measures, behavior, functionality, decision making

33 Backup Slides

34 Benefits of Soar
- Parallel rule firing allows for parallel Encoding, parallel appraisal generation, and (theoretically) parallel Decoding
- Impasses provide architectural support for PEACTIDM-related subgoals (Intend, and theoretically Comprehend), as well as support for fast and extended inference and for transitioning from extended to fast via chunking; Intend in the button task starts out extended and becomes fast
- Reinforcement learning allows fast learning from emotion feedback
- Future benefits: new modules (episodic/semantic memories, visual imagery, etc.) may assist in appraisal generation

35 Architectural Requirements: Soar vs. ACT-R

36 PEACTIDM and GOMS
- In general, these are complementary techniques
- GOMS: focused on HCI and on motor actions (e.g., keypresses), with less focus on cognitive aspects (more abstract)
- PEACTIDM: focused on required cognitive functions, and allows for a mapping with appraisals
- PEACTIDM could be implemented with GOMS, but would lack the labels that allow for the mapping

37 Relating Emotion to Intrinsically Motivated RL
Emotion intensity and valence are used to:
- Generate intrinsic rewards: various appraisals contribute to the reward signal with varying success; frequent reward signals allow the agent to learn faster, but can also introduce infinite reward cycles, which task modeling helps address
- Automatically adjust parameters (the learning and exploration rates), which helps reduce unnecessary exploration and bad learning

38 Button Task Timing: Before and After Learning

39 Learning the Task Model
[Diagram: perception/encoding feeds a task memory that stores learned transition probabilities among stimuli; a generic prediction yields Outcome Probability 0.5 and Discrepancy 0.0 with a medium surprise factor, and as transitions are learned the predictions sharpen, lowering the surprise factor, intensity, and reward for expected outcomes]
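A minimal sketch of such a task model: count observed transitions between stimuli, predict the most likely successor, and derive Outcome Probability and Discrepancy from that prediction. The counting scheme and the binary discrepancy are assumptions; the slide only shows learned transition probabilities feeding predictions:

```python
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))  # counts[prev][next] = observations

def observe(prev, curr):
    counts[prev][curr] += 1

def predict(prev):
    successors = counts[prev]
    if not successors:
        return None, 0.5  # generic prediction: uninformative Outcome Probability
    total = sum(successors.values())
    best = max(successors, key=successors.get)
    return best, successors[best] / total  # prediction and its Outcome Probability

def discrepancy(prediction, actual):
    # 0 if the prediction was confirmed, 1 if it was disconfirmed
    return 0.0 if prediction == actual else 1.0

for nxt in ["stim2", "stim2", "stim3"]:
    observe("stim1", nxt)

pred, outcome_probability = predict("stim1")
print(pred, outcome_probability, discrepancy(pred, "stim3"))  # stim2 0.6666666666666666 1.0
```

Because the model keeps changing as the agent learns, the reward derived from these appraisals is non-stationary, as noted on slide 20.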

40 Extending Soar with Emotion (Marinier & Laird 2007)
- Soar is a cognitive architecture
- A cognitive architecture is a set of task-independent mechanisms that interact to give rise to behavior
- Cognitive architectures are general agent frameworks
[Diagram: the Soar architecture — symbolic long-term memories (procedural) with chunking, reinforcement learning, semantic learning, and episodic learning; a short-term memory holding the situation and goals; a decision procedure; perception, action, and visual imagery connecting to the body; plus a new feeling generation module]

41 Extending Soar with Emotion (Marinier & Laird 2007)
[Diagram: appraisals (knowledge) produce an emotion vector (e.g., .5, .7, 0, -.4, .3, ...); the architecture maintains a mood vector (e.g., .7, -.2, .8, .3, .6, ...); emotion and mood combine into a feeling vector (e.g., .9, .6, .5, -.1, .8, ...), whose intensity and valence (+/-) drive reinforcement learning]

42 Appraisal Value Ranges

43 Computing Feeling from Emotion and Mood
- Assumption: appraisal dimensions are independent
- Limited range: inputs and outputs are in [0,1] or [-1,1]
- Distinguishability: very different inputs should lead to very different outputs
- Non-linear: linearity would violate limited range and distinguishability
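One possible combination function satisfying all four constraints above. This tanh-based rule is an illustrative stand-in, not the thesis's actual function:

```python
import math

def combine(emotion, mood, k=2.0):
    # Independence: each dimension is combined separately.
    # tanh keeps each output in [-1, 1] (limited range) and is non-linear,
    # so very different inputs remain distinguishable after combination;
    # the divisor rescales so that the extreme input (e + m = 2) maps to 1.
    return [math.tanh(k * (e + m)) / math.tanh(2 * k) for e, m in zip(emotion, mood)]

feeling = combine([0.5, 0.7, 0.0, -0.4], [0.7, -0.2, 0.8, 0.3])
print([round(f, 2) for f in feeling])
```

Note how a strong emotion on a dimension dominates a weak opposing mood, while agreeing emotion and mood saturate toward the bound rather than summing past it.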

44 Example

45 Maze Tasks
Variants: no distractions, distractions, single subgoal, multiple subgoals, impossible

46 Time Course and Impact of Feelings

47 Feeling Dynamics Results (very easy task)

48 Computing Feeling Intensity
- Motivation: intensity gives a summary of how important (i.e., how good or bad) the situation is
- Limited range: should map onto [0,1]
- No dominant appraisal: no single value should drown out all the others; we can't just multiply the values, because if any one is 0, the intensity is 0
- Realization principle: expected events should be less intense than unexpected events
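A sketch satisfying the constraints above: averaging the appraisal magnitudes (rather than multiplying) avoids both a dominant appraisal and the zero-product problem, and a surprise term implements the realization principle. The exact surprise formula below is an assumption, not the thesis's definition:

```python
def intensity(appraisals, outcome_probability, discrepancy):
    # Average magnitude stays in [0, 1]; no single appraisal can dominate,
    # and a single zero does not zero out the whole intensity.
    base = sum(abs(a) for a in appraisals) / len(appraisals)
    # Assumed surprise factor: high when a confident prediction is disconfirmed
    # (OP and D both high), low when a confident prediction is confirmed.
    surprise = ((1 - outcome_probability) * (1 - discrepancy)
                + outcome_probability * discrepancy)
    return base * surprise

# A fully expected outcome (OP = 1, D = 0) yields zero intensity:
print(intensity([0.5, -0.8, 0.0], 1.0, 0.0))  # 0.0
# The same appraisals with a confidently disconfirmed prediction (OP = 1, D = 1):
print(intensity([0.5, -0.8, 0.0], 1.0, 1.0))
```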

49 Example

50 Learning Task
[Map: start and goal locations, the optimal path, and subtasks]

51 Learning Results

52 Circumplex Models
Emotions can be described in terms of intensity and valence, as in a circumplex model:
[Figure, adapted from Feldman Barrett & Russell (1998): a circle with valence on the horizontal axis (negative to positive) and intensity on the vertical axis (low to high) — upset, stressed, nervous, tense (high intensity, negative valence); sad, depressed, lethargic, fatigued (low intensity, negative valence); alert, excited, elated, happy (high intensity, positive valence); contented, serene, relaxed, calm (low intensity, positive valence)]

53 Full Knowledge Goal Relevance Results

54 Related Work

