Emotion-Driven Reinforcement Learning Bob Marinier & John Laird University of Michigan, Computer Science and Engineering CogSci’08.


1 Emotion-Driven Reinforcement Learning Bob Marinier & John Laird University of Michigan, Computer Science and Engineering CogSci’08

2 Introduction
Interested in the functional benefits of emotion for a cognitive agent
▫ Appraisal theories of emotion
▫ PEACTIDM theory of cognitive control
Use emotion as a reward signal to a reinforcement learning agent
▫ Demonstrates a functional benefit of emotion
▫ Provides a theory of the origin of intrinsic reward

3 Outline
Background
▫ Integration of emotion and cognition
▫ Integration of emotion and reinforcement learning
▫ Implementation in Soar
Learning task
Results

4 Appraisal Theories of Emotion
A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals
▫ Novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc.
Appraisals influence emotion
Emotion can then be coped with (via internal or external actions)
(Diagram: the Situation and Goals feed Appraisals, which produce Emotion, which drives Coping)

5 Appraisals to Emotions (Scherer 2001)

Appraisal                       Joy            Fear                  Anger
Suddenness                      High/medium    High                  –
Unpredictability                –              High                  –
Intrinsic pleasantness          –              Low                   –
Goal/need relevance             High           –                     –
Cause: agent                    –              Other/nature          Other
Cause: motive                   –              Chance/intentional    Intentional
Outcome probability             Very high      High                  Very high
Discrepancy from expectation    –              –                     High
Conduciveness                   Very high      Low                   –
Control                         –              –                     High
Power                           –              Very low              High
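Read computationally, the table is a set of prototypes: an appraisal frame can be labeled with the emotion whose predicted values it best matches. A minimal sketch of that idea, where the numeric encodings and prototype values are illustrative assumptions rather than Scherer's actual numbers:

```python
# Label an appraisal frame with the nearest emotion prototype.
# Values are illustrative encodings on [0, 1]; open table cells are omitted.

PROTOTYPES = {
    "joy":   {"suddenness": 0.7, "conduciveness": 1.0, "outcome_probability": 1.0},
    "fear":  {"suddenness": 0.9, "conduciveness": 0.1, "power": 0.0},
    "anger": {"conduciveness": 0.1, "control": 0.9, "power": 0.9},
}

def label(frame):
    """Return the emotion whose prototype is closest to the frame.

    Only dimensions a prototype specifies contribute, so the theory's
    open ("don't care") cells are simply skipped.
    """
    def dist(proto):
        shared = [d for d in proto if d in frame]
        if not shared:
            return float("inf")
        return sum((frame[d] - proto[d]) ** 2 for d in shared) / len(shared)
    return min(PROTOTYPES, key=lambda e: dist(PROTOTYPES[e]))

frightening = {"suddenness": 0.9, "conduciveness": 0.2, "power": 0.1}
print(label(frightening))  # fear
```

Matching only on shared dimensions keeps the lookup robust when an appraisal frame is incomplete, which is the normal case mid-cycle.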

6 Cognitive Control: PEACTIDM (Newell 1990)
Perceive: Obtain raw perception
Encode: Create domain-independent representation
Attend: Choose stimulus to process
Comprehend: Generate structures that relate stimulus to tasks and can be used to inform behavior
Task: Perform task maintenance
Intend: Choose an action, create prediction
Decode: Decompose action into motor commands
Motor: Execute motor commands
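The eight steps can be sketched as a single processing cycle; the environment, stimulus structures, and helper functions below are toy stand-ins, not Soar's actual mechanisms:

```python
# Schematic PEACTIDM cycle over a trivial environment.
# Every structure here is an illustrative stand-in.

class Env:
    def __init__(self):
        self.log = []
    def perceive(self):
        return {"north": "wall", "south": "open"}
    def execute(self, commands):
        self.log.extend(commands)

def encode(raw):          # Encode: domain-independent representation
    return [{"dir": d, "passable": v == "open"} for d, v in raw.items()]

def attend(stimuli):      # Attend: choose one stimulus to process
    return next(s for s in stimuli if s["passable"])

def comprehend(stim):     # Comprehend: relate the stimulus to the task
    return {"move": stim["dir"]}

def intend(assessment):   # Intend: choose an action, create a prediction
    return assessment["move"], "closer-to-goal"

def decode(action):       # Decode: decompose into motor commands
    return [f"step-{action}"]

def cycle(env):
    raw = env.perceive()               # Perceive
    stimuli = encode(raw)              # Encode
    stimulus = attend(stimuli)         # Attend
    assessment = comprehend(stimulus)  # Comprehend
    # Task: task maintenance omitted in this sketch
    action, prediction = intend(assessment)  # Intend
    env.execute(decode(action))        # Decode + Motor
    return prediction

env = Env()
print(cycle(env), env.log)  # closer-to-goal ['step-south']
```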

7 Unification of PEACTIDM and Appraisal Theories
(Diagram: the PEACTIDM cycle Perceive → Encode → Attend → Comprehend → Intend → Decode → Motor, with the data flowing between steps (raw perceptual information, stimulus relevance, the stimulus chosen for processing, the current situation assessment, action, motor commands, environmental change, and the prediction) and the appraisals generated along the way: Suddenness, Unpredictability, Goal Relevance, Intrinsic Pleasantness, Causal Agent/Motive, Discrepancy, Conduciveness, Control/Power, and Outcome Probability)

8 Distinction between emotion, mood, and feeling (Marinier & Laird 2007)
Emotion: result of appraisals
▫ Is about the current situation
Mood: “average” over recent emotions
▫ Provides historical context
Feeling: emotion “+” mood
▫ What the agent actually perceives

9 Emotion, mood, and feeling
(Diagram: active appraisals from Cognition produce Emotion; Mood decays over time while being pulled toward the current Emotion; a combination function merges Emotion and Mood into the Feeling that Cognition perceives)
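The decay and pull dynamics can be sketched per vector component; the rate constants and the clamped addition for feeling are illustrative assumptions:

```python
# Mood decays toward zero while being pulled toward the current emotion;
# feeling is their clamped combination. Rates are illustrative.

def update_mood(mood, emotion, decay=0.1, pull=0.1):
    """One step of mood dynamics for each appraisal dimension."""
    return [m * (1 - decay) + (e - m) * pull for m, e in zip(mood, emotion)]

def feeling(emotion, mood):
    """Feeling = emotion "+" mood, clamped to stay in [-1, 1]."""
    return [max(-1.0, min(1.0, e + m)) for e, m in zip(emotion, mood)]

mood = [0.0, 0.0]
emotion = [0.8, -0.4]
for _ in range(5):            # mood drifts toward a persistent emotion
    mood = update_mood(mood, emotion)
print([round(m, 2) for m in mood])                     # [0.27, -0.13]
print([round(f, 2) for f in feeling(emotion, mood)])   # [1.0, -0.53]
```

The clamp is one simple way to satisfy the limited-range requirement discussed on slide 25.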

10 Intrinsically Motivated Reinforcement Learning (Sutton & Barto 1998; Singh et al. 2004)
(Diagram: in the standard view, an agent exchanges actions, states, and rewards with an environment containing the critic. In the intrinsically motivated view, the environment splits into an external environment and an internal environment inside the “organism”: sensations feed an appraisal process, and the critic turns the resulting feeling’s valence (+/-) and intensity into the reward that drives the agent’s decisions, with Reward = Intensity * Valence)
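The internal critic can be sketched as a feeling-derived reward driving an ordinary Q-learning update; the particular intensity and valence computations here are placeholders, not the model's:

```python
# A feeling-derived reward inside a tabular Q-learning update.
# The intensity and valence computations are placeholders.

def reward(feeling):
    """Internal reward = intensity * valence of the current feeling."""
    intensity = max(abs(v) for v in feeling)      # placeholder: peak magnitude
    valence = 1.0 if sum(feeling) >= 0 else -1.0  # placeholder: net sign
    return intensity * valence

def td_update(q, s, a, s2, r, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step driven by the internal (emotional) reward."""
    best_next = max(q.get((s2, a2), 0.0) for a2 in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)

q = {}
r = reward([0.6, -0.2])   # positive-valence feeling, intensity 0.6
td_update(q, "s0", "attend-south", "s1", r, ["attend-south", "task"])
print(round(q[("s0", "attend-south")], 3))  # 0.06
```

Because feelings arrive every cycle, this reward is far denser than a goal-only reward, which is the learning-speed advantage reported in the results.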

11 Extending Soar with Emotion (Marinier & Laird 2007)
(Diagram: the Soar architecture, with symbolic long-term memories (procedural, semantic, episodic) and their learning mechanisms (chunking, reinforcement learning, semantic learning, episodic learning) connected to short-term memory (situation, goals), the decision procedure, perception, action, and visual imagery, all grounded in the body, plus a new appraisal detector)

12 Extending Soar with Emotion (Marinier & Laird 2007)
(Diagram: the same architecture annotated with data flow. Appraisals generated from the situation and goals in short-term memory feed the appraisal detector, producing an emotion vector (.5, .7, 0, -.4, .3, …); this is combined with a mood vector (.7, -.2, .8, .3, .6, …) to yield a feeling vector (.9, .6, .5, -.1, .8, …), whose valence (+/-) and intensity feed reinforcement learning. The diagram also labels the Knowledge and Architecture levels)

13 Learning task
(Figure: grid-world maze with Start and Goal locations)

14 Learning task: Encoding

Direction    Passable    On path    Progress
South        true        true       true
East         false       true       true
West         false       false      true
North        false       false      true

15 Learning task: Encoding & Appraisal

Direction    Intrinsic Pleasantness    Goal Relevance    Unpredictability
South        Neutral                   High              Low
East         Low                       High              High
West         Low                       Low               High
North        Low                       Low               High

16 Learning task: Attending, Comprehending & Appraisal

South
▫ Intrinsic Pleasantness: Neutral
▫ Goal Relevance: High
▫ Unpredictability: Low
▫ Conduciveness: High
▫ Control: High
▫ …
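The step from encoding (slide 14) to appraisal (slides 15 and 16) can be sketched as simple rules over the encoded features; the rules below are illustrative assumptions, not the agent's actual appraisal knowledge:

```python
# Map an encoded direction (slide 14) to appraisals (slides 15-16).
# The rules are illustrative assumptions, not the agent's knowledge.

def appraise(stimulus, prediction=None):
    """Appraise one encoded direction relative to goals and expectations."""
    return {
        # Blocked directions are intrinsically unpleasant.
        "intrinsic_pleasantness": "neutral" if stimulus["passable"] else "low",
        # Directions on the path to the goal are goal-relevant.
        "goal_relevance": "high" if stimulus["on_path"] else "low",
        # A stimulus matching the agent's prediction is unsurprising.
        "unpredictability": "low" if stimulus == prediction else "high",
    }

south = {"dir": "south", "passable": True, "on_path": True}
print(appraise(south, prediction=south))
```

With `prediction=south`, the output reproduces the South row of slide 15: neutral pleasantness, high goal relevance, low unpredictability.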

17 Learning task: Tasking

18 Learning task: Tasking
(Figure: the maze divided into optimal subtasks)

19 What is being learned?
When to Attend vs. Task
If Attending, what to Attend to
If Tasking, which subtask to create
When to Intend vs. Ignore

20 Learning Results

21 Results: With and without mood

22 Discussion
The agent learns both internal (tasking) and external (movement) actions
Emotion provides more frequent rewards, so the agent learns faster than with standard RL
Mood “fills in the gaps” between emotional rewards, allowing even faster learning with less variability

23 Conclusion & Future Work
Demonstrated a computational model that integrates emotion and cognitive control
Confirmed that emotion can drive reinforcement learning
Have already demonstrated similar learning in a more complex domain
Would like to explore multi-agent scenarios

24 Circumplex models
Emotions can be described in terms of intensity and valence, as in a circumplex model:
▫ High intensity, negative valence: upset, stressed, nervous, tense
▫ Low intensity, negative valence: sad, depressed, lethargic, fatigued
▫ High intensity, positive valence: alert, excited, elated, happy
▫ Low intensity, positive valence: contented, serene, relaxed, calm
Adapted from Feldman Barrett & Russell (1998)
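A circumplex placement can be sketched as a lookup on the sign of valence and the level of intensity; the 0.5 cutoff and the label pairs below are illustrative choices:

```python
# Place a feeling in the circumplex by valence sign and intensity level.
# The 0.5 threshold and quadrant labels are illustrative.

def quadrant(valence, intensity):
    """Return a circumplex quadrant label for a (valence, intensity) pair."""
    if valence >= 0:
        return "alert/excited" if intensity >= 0.5 else "calm/contented"
    return "tense/stressed" if intensity >= 0.5 else "sad/lethargic"

print(quadrant(0.7, 0.9))   # alert/excited
print(quadrant(-0.4, 0.2))  # sad/lethargic
```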

25 Computing Feeling from Emotion and Mood
Assumption: appraisal dimensions are independent
Limited range: inputs and outputs are in [0,1] or [-1,1]
Distinguishability: very different inputs should lead to very different outputs
Non-linear: linearity would violate limited range and distinguishability
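One function shape consistent with all four constraints (an illustrative choice, not necessarily the model's actual combination function):

```python
# Combine one emotion value and one mood value into a feeling value.
# This particular tanh/atanh form is an illustrative choice.
import math

def combine(e, m):
    """Non-linear, range-limited combination of emotion and mood.

    Mapping each input through atanh lets strong values dominate,
    and the final tanh keeps the output strictly inside (-1, 1).
    """
    def strength(x):
        x = max(-0.999, min(0.999, x))  # avoid infinities at +/-1
        return math.atanh(x)
    return math.tanh(strength(e) + strength(m))

print(round(combine(0.5, 0.5), 3))   # 0.8  (stronger than either input)
print(round(combine(0.9, -0.9), 3))  # 0.0  (opposite signals cancel)
```

Per-dimension application of such a function also respects the independence assumption, since no dimension influences another.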

26 Computing Feeling Intensity
Motivation: intensity gives a summary of how important (i.e., how good or bad) the situation is
Limited range: should map onto [0,1]
No dominant appraisal: no single value should drown out all the others
▫ Can’t just multiply values, because if any are 0, then intensity is 0
Realization principle: expected events should be less intense than unexpected events
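One intensity function consistent with these constraints (an illustrative sketch, not the model's actual definition):

```python
# Summarize appraisal magnitudes as one intensity in [0, 1].
# The averaging and the surprise weighting are illustrative choices.

def intensity(appraisals, outcome_probability):
    """Intensity of a feeling from its appraisal values in [-1, 1].

    Averaging magnitudes (rather than multiplying) keeps one zero
    appraisal from zeroing the whole intensity; weighting by
    (1 - outcome_probability) makes unexpected events more intense,
    satisfying the realization principle.
    """
    avg = sum(abs(a) for a in appraisals) / len(appraisals)
    surprise = 1.0 - outcome_probability
    return avg * (0.5 + 0.5 * surprise)  # expected events dampened, not erased

expected = intensity([0.8, 0.0, 0.6], outcome_probability=0.9)
unexpected = intensity([0.8, 0.0, 0.6], outcome_probability=0.1)
print(round(expected, 3), round(unexpected, 3))
```

Note the zero appraisal in the example: with a product it would force intensity to 0, but the average keeps the other values in play.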

