A Computational Unification of Cognitive Control, Emotion, and Learning
Bob Marinier, Oral Defense
University of Michigan, CSE
June 17, 2008

Presentation transcript:


Introduction
The link between core cognitive functions and emotion has not been fully explored
Existing computational models are largely pragmatic
We integrate the PEACTIDM theory of cognitive control and appraisal theories of emotion
  PEACTIDM supplies process, appraisal theories supply data
We use emotion-driven reinforcement learning to demonstrate improved functionality
  Automatically generate rewards, set parameters

Cognitive Control: PEACTIDM
Perceive: Obtain raw perception
Encode: Create domain-independent representation
Attend: Choose stimulus to process
Comprehend: Generate structures that relate stimulus to tasks and can be used to inform behavior
Task: Perform task maintenance
Intend: Choose an action, create prediction
Decode: Decompose action into motor commands
Motor: Execute motor commands

PEACTIDM Cycle
[Diagram: the steps Perceive, Encode, Attend, Comprehend, Intend, Decode, and Motor arranged in a cycle. Labels on the links between steps: Raw Perceptual Information, Stimulus Relevance, Stimulus chosen for processing, Current Situation Assessment, Action, Motor Commands, Environmental Change, Prediction.]
What is this information?

Appraisal Theories of Emotion
A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals
  Novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc.
Result of appraisals influences emotion
Emotion can then be coped with (via internal or external actions)
[Diagram: Situation, Goals, Appraisals, Emotion, Coping]

Appraisals to Emotions (Scherer 2001)
Emotions compared: Joy, Fear, Anger
Suddenness: High/medium; High
Unpredictability: High
Intrinsic pleasantness: Low
Goal/need relevance: High
Cause (agent): Other/nature; Other
Cause (motive): Chance/intentional; Intentional
Outcome probability: Very high; High; Very high
Discrepancy from expectation: High
Conduciveness: Very high; Low
Control: High
Power: Very low; High
Why these dimensions? What is the functional purpose of emotion?

Unification of PEACTIDM and Appraisal Theories
[Diagram: the PEACTIDM cycle (Perceive, Encode, Attend, Comprehend, Intend, Decode, Motor) with link labels Raw Perceptual Information, Stimulus Relevance, Stimulus chosen for processing, Current Situation Assessment, Action, Motor Commands, Environmental Change, and Prediction, annotated with the appraisals Suddenness, Unpredictability, Goal Relevance, Intrinsic Pleasantness, Causal Agent/Motive, Discrepancy, Conduciveness, Control/Power, and Outcome Probability.]

Example: Simple Choice Response Task

PEACTIDM in the Button Task
Appraisal Frame
  Suddenness: 1
  Goal Relevance: 1
  Conduciveness: 1
  Discrepancy: 0
  Outcome Probability: 1
“Surprise Factor”

PEACTIDM in the Button Task
Appraisal Frame
  Suddenness: 1
  Goal Relevance: 1
  Conduciveness: 1
  Discrepancy: 0
  Outcome Probability: 1
Conduciveness
Discrepancy: 1

Summary of Evaluation
1. Cognitively generated emotions
   Emotions arise from appraisals
2. Fast primary emotions
   Some appraisals generated and activated early
3. Emotional experience
   Cognitive access to emotional state, but no physiology
4. Body-mind interactions
   Emotions can influence behavior
5. Emotional behavior
   a. Model works and produces useful, purposeful behavior
   b. Different environments lead to:
      i. Different time courses
      ii. Different feeling profiles
   c. Choices impact emotions and success

Primary Contributions
1. Appraisals are functionally required by cognition
   They specify the data used by certain steps in PEACTIDM
2. Appraisals provide a task-independent language for control knowledge
   They influence choices such as Attend and Intend
3. PEACTIDM implies a partial ordering of appraisal generation
   Data dependencies imply that some appraisals can’t be generated until after others
4. Circumplex models can be synthesized from appraisal models
   Emotion intensity and valence can be derived from appraisals
5. Emotion intensity is largely determined by expectations
   “Surprise Factor” is determined by Outcome Probability and Discrepancy from Expectation
6. Some appraisals may require an arbitrary amount of inference
   Comprehend can theoretically require arbitrary processing
7. Internal and external stimuli are treated identically
   Tasking options can be Attended and Intended just like external stimuli

Additional Exploration
Functionality: What is emotion good for?
  Emotion-driven reinforcement learning
Scale: Does it work in non-trivial domains?
  Continuous time/space environment
  More complex appraisal generation
Understanding: How do appraisals influence performance?
  Try subsets of appraisals

Intrinsically Motivated Reinforcement Learning (Sutton & Barto 1998; Singh et al. 2004)
[Diagram, standard loop: Environment, Critic, Agent; links labeled Actions, States, Rewards.]
[Diagram, intrinsically motivated loop: External Environment, Internal Environment, Agent, Critic; links labeled Actions, States, Rewards, Sensations, Decisions; an Appraisal Process producing +/- Emotion Intensity; the enclosing box is labeled “Organism”.]
Reward = Intensity * Valence
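The reward rule on this slide can be sketched in a few lines. This is a minimal illustration, not the defense's implementation: the function name and the range checks are assumptions, with intensity taken to lie in [0, 1] and valence in [-1, 1], in line with the ranges mentioned on the backup slides.

```python
def intrinsic_reward(intensity: float, valence: float) -> float:
    """Reward = Intensity * Valence, as stated on the slide.

    Assumed ranges (not enforced by the slide itself):
    intensity in [0, 1], valence in [-1, 1].
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity is assumed to lie in [0, 1]")
    if not -1.0 <= valence <= 1.0:
        raise ValueError("valence is assumed to lie in [-1, 1]")
    # The sign of the reward comes from valence, the size from both terms.
    return intensity * valence
```

Under these assumed ranges the reward always falls in [-1, 1], and a feeling with zero intensity yields no reward regardless of valence.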

Clean House Domain
[Figure: map of the domain with labels Blocks, Agent, Rooms, Storage Room, Gateways]

Stimuli in the Environment
[Figure: stimuli labeled Gateway to 73, Gateway to 78, Gateway to 93, Current room, Block 1]
Create subtask go to room 73
Create subtask go to room 78
Create subtask go to room 93
Create subtask clean current room

Learning
In this domain, the agent is only learning what to Attend to (including Tasking)
  Not learning what action to take
Goal: What is the impact of various appraisals?
  Disabled most and developed a few:
    Conduciveness
    Discrepancy from Expectation and Outcome Probability
    Goal Relevance
    Intrinsic Pleasantness
Method: SARSA, epsilon-greedy, fixed exploration rate (ER) and learning rate (LR)
  50 trials, 15 episodes per trial
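For readers unfamiliar with the method named above, the following is a generic tabular sketch of SARSA with epsilon-greedy selection and fixed rates. It is not the Soar implementation used in the defense: the Q-table layout, the function names, and the discount parameter are all assumptions made for illustration. In the domain described here, the "actions" being chosen would be Attend operators for candidate stimuli.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, exploration_rate, rng=random):
    """Pick a random action with probability exploration_rate, else the best-valued one."""
    if rng.random() < exploration_rate:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, state, action, reward, next_state, next_action,
                 learning_rate, discount):
    """On-policy update: the target uses the action actually chosen next."""
    td_target = reward + discount * q[(next_state, next_action)]
    q[(state, action)] += learning_rate * (td_target - q[(state, action)])


# Hypothetical usage: q maps (state, action) pairs to values, defaulting to 0.
q = defaultdict(float)
sarsa_update(q, "room", "attend-block", 1.0, "room", "attend-gateway",
             learning_rate=0.5, discount=0.9)
```

Because SARSA is on-policy, the exploration rate affects not only which stimulus is attended to but also the values being learned, which is relevant to the later slides that vary ER and LR dynamically.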

Conduciveness
Measures how good or bad a stimulus is
Influences emotion intensity and valence
Sufficient to generate a reward
Value based on “progress” and “path”
  Progress: Is agent getting closer to goal over time?
  Path: Will acting on stimulus get agent closer to goal?

Conduciveness
Total Failures: 7.6%
Trial Failures: 24%
Final Episode Failures: 6%

Outcome Probability and Discrepancy from Expectation
Measures how likely a prediction is and how accurate the prediction is
Influences emotion intensity via “surprise factor” (unvalenced)
Predictions and Outcome Probability generated via learned task model
  Results in non-stationary reward
Discrepancy generated via comparison to prediction
Added these appraisals on top of Conduciveness

Outcome Probability and Discrepancy from Expectation
Total Failures, Trial Failures, Final Episode Failures
0%

Goal Relevance
Measures how important a stimulus is for the goal
Influences emotion intensity (unvalenced)
Value based on “path” knowledge
  Agent actually had too much path knowledge, so removed some
The value of Goal Relevance for some stimulus is used to “boost” the Q-value of the Attend operator for that stimulus
Added this appraisal on top of Conduciveness, Outcome Probability, and Discrepancy

Goal Relevance (GR) Knowledge Reduction Results

Intrinsic Pleasantness
Measures how attracted the agent is to a stimulus independent of the current goal
Influences emotion intensity and valence
Made blocks intrinsically pleasant
  This is good because blocks need to be Attended to get cleaned up
  This is bad because agent may be distracted by blocks that have already been cleaned up
Replaced Goal Relevance with this appraisal

Intrinsic Pleasantness Results

Dynamic Exploration Rate
Dynamically adjust exploration rate based on current emotion
If Valence < 0, then things could probably be better
  ER = |Intensity * Valence|
If Valence > 0, then things are ok
  ER = 0
Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only
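The exploration-rate rule above translates directly into a small function. This is a sketch under stated assumptions: the function name is hypothetical, and the slide does not say what happens at exactly zero valence, so that case is treated here like the positive case (both branches give 0 there anyway).

```python
def dynamic_exploration_rate(intensity: float, valence: float) -> float:
    """ER = |Intensity * Valence| when valence is negative, otherwise 0."""
    if valence < 0:
        # Things could probably be better: explore in proportion to how bad it feels.
        return abs(intensity * valence)
    # Things are ok: exploit what has been learned.
    return 0.0
```

With intensity in [0, 1] and valence in [-1, 1] (assumed ranges), the result is a valid probability, so it can be used directly as the epsilon of an epsilon-greedy policy.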

Dynamic Exploration Rate
Total Failures: 9.2%
Trial Failures: 38%
Final Episode Failures: 8.0%

Dynamic Learning Rate
Dynamically adjust learning rate based on current emotion
If reward magnitude is large, then there may be something to learn
  LR = |Intensity*Valence|
Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only, Dynamic Exploration Rate enabled
Total Failures: 0.5%
Trial Failures: 8.0%
Final Episode Failures: 0.0%
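Since the reward is also Intensity * Valence, this rule makes the learning rate equal to the magnitude of the reward. The sketch below shows one value update under that reading. The function name and the discount parameter are assumptions, and using the same feeling for both the reward and the learning rate within a single update is an interpretation of the slide, not something it states outright.

```python
def emotion_scaled_update(q_current: float, q_next: float,
                          intensity: float, valence: float,
                          discount: float = 0.9) -> float:
    """Return the updated value for one step, with LR = |Intensity * Valence|."""
    reward = intensity * valence          # intrinsic reward from the feeling
    learning_rate = abs(reward)           # large reward magnitude, large step
    td_error = reward + discount * q_next - q_current
    return q_current + learning_rate * td_error
```

A consequence worth noting: when the feeling is neutral (zero intensity or zero valence), the learning rate is 0 and the value does not change at all on that step.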

Dynamic Exploration and Learning Rates
Dynamically adjust exploration and learning rates based on current emotion
If Valence < 0, then things could probably be better
  ER = |Intensity * Valence|
If Valence > 0, then things are ok
  ER = 0
If reward magnitude is large, then there may be something to learn
  LR = |Intensity*Valence|
Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only
Results: Tighter convergences, better prediction accuracy, small number of failures

Learning Summary
Conduciveness: Foundation to learning. Agent learns to perform the task better over time.
Outcome Probability, Discrepancy from Expectation: Introduced learned task model for generating predictions as basis for generating values for these appraisals. Agent learns to predict better over time. Also results in much improved failure rates.
Goal Relevance: Used to “boost” Q-values of proposed Attend operators. Agent does extremely well (except for failures), to the point where it almost isn’t learning, raising questions about the value of other appraisals. Knowledge about Goal Relevance was reduced, leading to more learning.
Intrinsic Pleasantness: Used to provide a task-independent bias on valence and intensity. Results are mixed, as expected, but agent generally learns to overcome problems.
Dynamic Exploration and Learning Rates: Emotion used to regulate part of the architecture. Resulted in tighter convergences and prediction accuracy. Slightly more failures.

Secondary Contributions
1. Reinforcement learning can be driven by intrinsically generated rewards based on the agent’s feeling
2. Reinforcement learning parameters can be influenced by the current emotional state, resulting in improved performance
3. Each appraisal contributes to the agent’s performance
4. The system scales to continuous time and space environments
5. Mood averages reward over time, allowing states with no reward-invoking stimulus to still have a reward associated with them

Future Work
[Diagram labels: Cognition; Scalability; Validation; Physiology; Action tendencies; Non-verbal communication; Basic drives; Integration with other architectural mechanisms; Learning (appraisal values, intend, etc.); Human data; Believability; Sociocultural interactions; More appraisals (social, perceptual, etc.); Physiological measures; Behavior; Functionality; Decision making]

Backup Slides

Benefits of Soar
Parallel rule firing allows for:
  Parallel Encoding
  Parallel appraisal generation
  Parallel Decoding (theoretically)
Impasses provide:
  Architectural support for PEACTIDM-related subgoals
    Intend
    Comprehend (theoretically)
  Support for fast and extended inference, and transitioning from extended to fast (chunking)
    Intend in button task starts out extended and becomes fast
Reinforcement learning allows fast learning from emotion feedback
Future benefits:
  New modules may assist in appraisal generation
    Episodic/semantic memories, visual imagery, etc.

Architectural Requirements: Soar vs. ACT-R

PEACTIDM and GOMS
In general, these are complementary techniques
GOMS
  Focused on HCI
  Focused on motor actions (e.g. keypresses)
  Less focus on cognitive aspects (more abstract)
PEACTIDM
  Focused on required cognitive functions
  Allows for a mapping with appraisals
Could implement PEACTIDM with GOMS, but would lack the proper labels that allow for the mapping

Relating Emotion to Intrinsically Motivated RL
Emotion intensity and valence used to:
  Generate intrinsic rewards
    Various appraisals contribute to the reward signal with varying success
    Frequent reward signals allow agent to learn faster, but can also introduce infinite reward cycles
    Task modeling helps address cycles
  Automatically adjust parameters
    Learning and exploration rates
    Helps reduce unnecessary exploration, bad learning

Button Task Timing: Before and After Learning

Learning the Task Model
[Diagram labels: Perception/Encoding; Stimulus 1, Stimulus 2, Stimulus 3 (Stim1, Stim2, Stim3); Task Memory; Prediction (generic); Outcome Probability 0.5; Discrepancy 0.0; Surprise Factor; Intensity; Reward; further values shown: 0.5, Medium, Lower]

Extending Soar with Emotion (Marinier & Laird 2007)
[Diagram labels: Body; Symbolic Long-Term Memories; Procedural; Short-Term Memory (Situation, Goals); Decision Procedure; Chunking; Reinforcement Learning; Semantic Learning; Episodic Learning; Perception; Action; Visual Imagery; Feeling Generation]
Soar is a cognitive architecture
  A cognitive architecture is a set of task-independent mechanisms that interact to give rise to behavior
  Cognitive architectures are general agent frameworks

Extending Soar with Emotion (Marinier & Laird 2007)
[Diagram labels: Body; Decision Procedure; Perception; Action; Appraisals; Feelings; Short-Term Memory (Situation, Goals); Knowledge; Architecture; Symbolic Long-Term Memories; Procedural; Chunking; Semantic Learning; Episodic Learning; Reinforcement Learning; Feeling Generation; +/- Intensity; Visual Imagery]
[Example vectors shown: Emotion .5,.7,0,-.4,.3,…; Mood .7,-.2,.8,.3,.6,…; Feeling .9,.6,.5,-.1,.8,…]

Appraisal Value Ranges

Computing Feeling from Emotion and Mood
Assumption: Appraisal dimensions are independent
Limited Range: Inputs and outputs are in [0,1] or [-1,1]
Distinguishability: Very different inputs should lead to very different outputs
Non-linear: Linearity would violate limited range and distinguishability

Example

Maze Tasks
[Figure labels: no distractions; distractions; single subgoal; multiple subgoals; impossible]

Time Course and Impact of Feelings

Feeling Dynamics Results
[Figure label: very easy]

Computing Feeling Intensity
Motivation: Intensity gives a summary of how important (i.e., how good or bad) the situation is
Limited range: Should map onto [0,1]
No dominant appraisal: No single value should drown out all the others
  Can’t just multiply values, because if any are 0, then intensity is 0
Realization principle: Expected events should be less intense than unexpected events
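The slide lists requirements but not a formula, so the following is only one possible function that meets them, offered as an illustration. Everything here is an assumption: the function names, the specific form of the surprise factor (built from Outcome Probability and Discrepancy, as the contributions slide says it should be), and the choice to average the magnitudes of the remaining appraisals instead of multiplying them.

```python
from typing import Sequence


def surprise_factor(outcome_probability: float, discrepancy: float) -> float:
    """Assumed form, with both inputs in [0, 1].

    Near 1 when a confident prediction fails or an unlikely one comes true;
    near 0 when the outcome matches a confident prediction.
    """
    op, de = outcome_probability, discrepancy
    return (1.0 - op) * (1.0 - de) + op * de


def feeling_intensity(outcome_probability: float, discrepancy: float,
                      other_appraisals: Sequence[float]) -> float:
    """Assumed form: surprise factor times the mean magnitude of the other appraisals.

    Averaging (instead of multiplying) means a single zero-valued appraisal
    cannot zero out the result, and the output stays in [0, 1] when each
    appraisal lies in [0, 1] or [-1, 1].
    """
    if not other_appraisals:
        raise ValueError("at least one appraisal value is required")
    mean_magnitude = sum(abs(v) for v in other_appraisals) / len(other_appraisals)
    return surprise_factor(outcome_probability, discrepancy) * mean_magnitude
```

In this sketch only the surprise factor acts as a multiplier, which is how the realization principle is expressed: a fully expected event (high probability, no discrepancy) produces an intensity near 0 no matter how large the other appraisals are.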

Example

Learning Task
[Figure labels: Start; Goal; Optimal Subtasks]

Learning Results

Circumplex Models
Emotions can be described in terms of intensity and valence, as in a circumplex model:
[Figure: circumplex with axes running from negative valence to positive valence and from low intensity to high intensity. Emotion words around the circle: upset, stressed, nervous, tense, sad, depressed, lethargic, fatigued, alert, excited, elated, happy, contented, serene, relaxed, calm.]
Adapted from Feldman Barrett & Russell (1998)

Full Knowledge Goal Relevance Results

Related Work