Presentation transcript:

Slide 1: How a Modeler's Conception of Rewards Influences a Model's Behavior: Investigating ACT-R 6's Utility Learning Mechanism
Christian P. Janssen › Wayne D. Gray › Michael J. Schoelles
Cognitive Modeling / Artificial Intelligence, University of Groningen › CogWorks Laboratories, Cognitive Science, Rensselaer

Slide 2: Temporal difference learning & ACT-R
› Temporal difference learning has recently been introduced as ACT-R's new utility learning mechanism (e.g., Fu & Anderson, 2004; Anderson, 2006, 2007; Bothell, 2005)
› Utility learning adjusts behavior so as to maximize the rewards that the model receives
› A model can:
  - receive rewards at different moments in time
  - receive rewards of different magnitudes
› There are no guidelines for choosing when a reward should be given and what its magnitude should be

Slide 3: New issues for ACT-R
› We studied two aspects of TD learning:
  - When is the reward given?
  - What is the magnitude of the reward?
› This is a new issue for ACT-R:
  - When the reward is given could already be varied in ACT-R 5
  - The magnitude of the reward could not be varied in ACT-R 5
› As we will show, the modeler's conception of rewards has a big influence on a model's behavior
› Case study: Blocks World task (Gray et al., 2006)

Slide 4: Why the Blocks World task?
› Previous work indicates that the utility learning mechanism is crucial for this task:
  - ACT-R 5 models (Gray, Schoelles, & Sims, 2005): regular ACT-R 5 cannot provide a good fit to the human data, because rewards in ACT-R 5 are binary (i.e., successes and failures) rather than scalar
  - Ideal Performer Model (Gray et al., 2006): a model outside of ACT-R that uses temporal difference learning provided a very good fit

Slide 5: Blocks World task
› So what's the task?

Slide 6: Blocks World task
Task: "Copy pattern in target window by moving blocks from resource window to workspace window"

Slides 7-10: Blocks World task
Windows are covered with gray rectangles: accessing information requires interaction with the interface


Slide 11: Blocks World task
› Information in the target window is only available after waiting through a lockout time of 0, 400, or 3200 milliseconds (between subjects)

Slide 12: Blocks World task: human data (Gray et al., 2006)
› The size of the lockout time influences human behavior
[Figure: number of blocks placed after the 1st visit to the target window, plotted against lockout time (s)]

Slide 13: Blocks World task: modeling strategies
› Strategy: how many blocks do you plan to place after a visit to the target window?
› 8 encode-x production rules ("study x blocks"): encode-1 through encode-8
› The model learns a utility value for each production rule using ACT-R's temporal difference learning algorithm (selection among these rules is sketched below)
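A minimal Python sketch of how a model like this might select among the eight encode-x rules. This is not the talk's model code: the noise-scale value and all names are assumptions. In ACT-R 6, logistic noise is added to each rule's utility and the rule with the highest noisy utility fires.

```python
import math
import random

S = 0.5  # utility noise scale; an assumed value (ACT-R's :egs parameter)

def logistic_noise(s):
    """Sample logistic noise via the inverse CDF: x = s * ln(u / (1 - u))."""
    u = random.random() or 1e-12  # guard against the measure-zero draw u == 0.0
    return s * math.log(u / (1.0 - u))

def choose_strategy(utilities):
    """Fire the production whose noise-perturbed utility is highest."""
    noisy = {rule: u + logistic_noise(S) for rule, u in utilities.items()}
    return max(noisy, key=noisy.get)

# All eight strategies start with equal utility, so early choices are random.
utilities = {f"encode-{x}": 0.0 for x in range(1, 9)}
print(choose_strategy(utilities))  # e.g., "encode-3"
```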

Slide 14: Utility learning
› Utility learning requires the incorporation of rewards
› Two choices are crucial:
  - When is the reward given?
  - What is the magnitude of the reward?
› After some experience, the utility U_i of a production rule i approximates (Anderson, 2007):

  U_i ≈ r − t_i

  where r is the magnitude of the reward, and t_i is the time between the firing of production i and the moment the reward is given
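To make the approximation concrete, here is a hedged Python sketch of the underlying difference-learning update; the function names are assumptions, and the learning-rate value is illustrative (it corresponds to ACT-R's :alpha parameter).

```python
ALPHA = 0.2  # learning rate; an assumed value (ACT-R's :alpha parameter)

def effective_reward(magnitude, fire_time, reward_time):
    """The reward credited to a production: the reward's magnitude minus
    the time elapsed between the production firing and the reward arriving."""
    return magnitude - (reward_time - fire_time)

def update_utility(utility, magnitude, fire_time, reward_time, alpha=ALPHA):
    """Move the rule's utility a fraction alpha toward the effective reward,
    so that with experience the utility approximates r - t."""
    r = effective_reward(magnitude, fire_time, reward_time)
    return utility + alpha * (r - utility)

# Example: a reward of magnitude 10 arrives 3 s after the rule fired.
u = update_utility(utility=0.0, magnitude=10.0, fire_time=2.0, reward_time=5.0)
print(u)  # 0.2 * (10 - 3) = 1.4
```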

Slide 15: Utility learning
› Choice 1: when is the reward given?
› Important because the utility value has a linear relationship with the time at which the reward is given
› Choice in Blocks World (see the skeleton below):
  - Once model: update once, at the end of the trial
  - Each model: update each time a part of the task is completed, i.e., a (set of) block(s) has been placed and the model either returns to the target window to study more blocks or finishes the trial
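As an illustration of where the two schemes diverge, the following sketch runs a trial with a stub model; all class and method names are illustrative assumptions, not the actual task code.

```python
class StubModel:
    """Stand-in for the ACT-R model; every name here is illustrative."""
    def __init__(self, visits_per_trial=3):
        self.visits_left = visits_per_trial
        self.reward_events = []

    def trial_done(self):
        return self.visits_left == 0

    def study_and_place_blocks(self):
        self.visits_left -= 1  # one visit to the target window, then placement

    def reward(self, label):
        self.reward_events.append(label)

def run_trial(model, scheme="once"):
    while not model.trial_done():
        model.study_and_place_blocks()
        if scheme == "each":
            model.reward("after sub-task")  # 'each': after every placement
    if scheme == "once":
        model.reward("at trial end")        # 'once': single update at trial end

model = StubModel()
run_trial(model, scheme="each")
print(model.reward_events)  # ['after sub-task', 'after sub-task', 'after sub-task']
```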

Slide 16: Utility learning
› Choice 2: what is the magnitude of the reward?
› Important because the utility value has a linear relationship with the magnitude of the reward
› But how to set this value?
  - Experimental tweaking? -> unfavorable
  - A fixed range of values (e.g., between 0 and 1)? -> difficult
  - Relate it to neurological data? -> not available for most models

Slide 17: Utility learning
› Choice 2: magnitude of the reward
› Choice in Blocks World: relate the reward to what might be important in the task
  - Accuracy: the accuracy with which the task is performed. Options:
    - Success: # blocks placed (once)
    - Success: # blocks placed (each)
    - Success & failure: # blocks placed − # blocks forgotten (each)
  - Time: how much time does (part of) the task take? Options:
    - Time spent on the task: −1 × time spent (once)
    - Time spent waiting for a specific aspect of the task: −1 × lockout size × number of visits to the target window (once)
    - Number of blocks placed per second (each)
(These six definitions are sketched as functions below.)
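The six reward definitions above can be written out as plain functions. This sketch follows the slide's formulas directly; the function and parameter names are assumptions, and the two success options share one formula, differing only in when the reward is applied (once vs. each).

```python
def success(blocks_placed):
    """Accuracy, success only: # blocks placed (applied once or each)."""
    return blocks_placed

def success_and_failure(blocks_placed, blocks_forgotten):
    """Accuracy, success & failure: # blocks placed - # blocks forgotten (each)."""
    return blocks_placed - blocks_forgotten

def time_on_task(time_spent):
    """Time: -1 * time spent on the task (once)."""
    return -1.0 * time_spent

def time_locked_out(lockout, target_visits):
    """Time: -1 * lockout size * number of visits to the target window (once)."""
    return -1.0 * lockout * target_visits

def placement_rate(blocks_placed, elapsed_time):
    """Time: number of blocks placed per second (each)."""
    return blocks_placed / elapsed_time
```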

Slide 18: Blocks World task: modeling strategies
› 6 models were developed, one for each reward definition above
› Each model was run 6 times in each of the 3 experimental conditions: 0, 400, and 3200 milliseconds lockout
› Models interact with the same interface as the human participants

Slide 19: Blocks World task: general results
› Each model produces unique results

Slide 20: Blocks World task: general results
› What is the impact of:
  - when the reward is given (once/each)?
  - the concept of the reward (related to accuracy/time)?
› Results are averaged over 3 models

Slide 21: Utility learning: impact of when the reward is given

Slide 22: Utility learning: impact of the concept of the reward

Slide 23: Comparison with ACT-R 5 (Gray, Schoelles, & Sims, 2005)

Slide 24: Conclusion
› Rewards can be given at different times during a trial and according to different concepts
› There are no guidelines for what the best choices are
› Blocks World suggests that rewards should:
  - be given once, so the model can optimize behavior over the entire task
  - relate to the concept of time, because different strategy choices then have a big impact on reward size
› Models of other tasks should show whether this finding is consistent

Slide 25: Conclusion
› This is not just a Blocks World issue:
  - It is a general computer science / AI issue: representing a task in the right way is crucial (e.g., Russell & Norvig, 1995; Sutton & Barto, 1998)
  - Many experiments involve manipulations and measurements of the accuracy and speed of performance
› This is a new issue for ACT-R:
  - When the reward is given could already be varied in ACT-R 5
  - The magnitude of the reward could not be varied in ACT-R 5

Slide 26: Thank you for your attention
› Questions?
› More information: poster at CogSci 2008, Thursday, July 24th: "Cognitive Models of Strategy Shifts in Interactive Behavior" (session: "Attention and Implicit Learning")

Slide 27: References
› Anderson, J. R. (2006). A new utility learning mechanism. Paper presented at the 2006 ACT-R Workshop.
› Anderson, J. R. (2007). How can the human mind occur in the physical universe? New York: Oxford University Press.
› Bothell, D. (2005). ACT-R 6 official release. Proceedings of the 12th ACT-R Workshop.
› Fu, W.-T., & Anderson, J. R. (2004). Extending the computational abilities of the procedural learning mechanism in ACT-R. Proceedings of the 26th Annual Meeting of the Cognitive Science Society.
› Gray, W. D., Schoelles, M. J., & Sims, C. R. (2005). Adapting to the task environment: Explorations in expected value. Cognitive Systems Research, 6(1).
› Gray, W. D., Sims, C. R., Fu, W.-T., & Schoelles, M. J. (2006). The soft constraints hypothesis: A rational analysis approach to resource allocation for interactive behavior. Psychological Review, 113(3).
› Russell, S. J., & Norvig, P. (1995). Artificial intelligence: A modern approach. Upper Saddle River, NJ: Prentice-Hall.
› Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.