Published by Shannon Cole. Modified over 9 years ago.
1
Cognitive Modeling / University of Groningen / Artificial Intelligence | RENSSELAER | Cognitive Science, CogWorks Laboratories
How a Modeler’s Conception of Rewards Influences a Model’s Behavior
Investigating ACT-R 6’s utility learning mechanism
›Christian P. Janssen
›Wayne D. Gray
›Michael J. Schoelles
2
Temporal difference learning & ACT-R
›Temporal difference learning has recently been introduced as ACT-R’s new utility learning mechanism (e.g., Fu & Anderson, 2004; Anderson, 2006, 2007; Bothell, 2005)
›Utility learning optimizes behavior so as to maximize the rewards that the model receives
›A model can:
  ›Receive rewards at different moments in time
  ›Receive rewards of different magnitudes
›There are no guidelines for choosing when a reward should be given and what its magnitude should be
3
New issues for ACT-R
›We studied two aspects of TD learning:
  ›When the reward is given
  ›Magnitude of the reward
›This is a new issue for ACT-R
  ›When the reward is given: could be varied in ACT-R 5
  ›Magnitude of the reward: could not be varied in ACT-R 5
›As we will show, the modeler’s conception of rewards has a big influence on a model’s behavior
›Case study: Blocks World task (Gray et al., 2006)
4
Why the Blocks World task?
›Previous work indicates that the utility learning mechanism is crucial for this task
  ›ACT-R 5 models (Gray, Schoelles, & Sims, 2005)
    ›Regular ACT-R 5 cannot provide a good fit to the human data
    ›Because rewards in ACT-R 5 are binary (i.e., successes and failures) and not scalar
  ›Ideal Performer Model (Gray et al., 2006)
    ›A model outside of ACT-R that uses temporal difference learning provided a very good fit (Gray et al., 2006)
5
Blocks World task
›So what’s the task?
6
Blocks World task
Task: “Copy pattern in target window by moving blocks from resource window to workspace window”
7
Blocks World task
Windows are covered with gray rectangles: accessing information requires interaction with the interface
11
Blocks World task
›Blocks World task: information in the target window is only available after waiting for a lockout time of 0, 400, or 3200 milliseconds (varied between subjects)
12
Blocks World task: human data (Gray et al., 2006)
›Size of lockout time influences human behavior:
[Figure: number of blocks placed after 1st visit to target window, plotted against lockout time [s]]
13
Blocks World task: Modeling Strategies
›Strategy: how many blocks do you plan to place after a visit to the target window?
›8 encode-x production rules (“study x blocks”)
  ›encode-1 through encode-8
›Model learns the utility value of each production rule using ACT-R’s temporal difference learning algorithm
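A minimal sketch of how a model might pick one of the eight encode-x rules from their learned utilities. It assumes ACT-R-style conflict resolution as commonly documented (logistic noise with scale s added to each utility, highest noisy utility wins); the function names, the noise scale, and the example utilities are hypothetical and not taken from the slides.

```python
import math
import random


def logistic_noise(s, rng):
    """Sample zero-mean logistic noise with scale s (assumed noise model)."""
    # Clamp to keep the log finite if the generator returns exactly 0.0.
    p = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return s * math.log(p / (1.0 - p))


def choose_production(utilities, s, rng):
    """Return the rule name whose utility plus noise is highest."""
    noisy = {name: u + logistic_noise(s, rng) for name, u in utilities.items()}
    return max(noisy, key=noisy.get)


# Hypothetical utilities: all rules start equal, encode-3 has been rewarded more.
rng = random.Random(0)
utilities = {f"encode-{x}": 0.0 for x in range(1, 9)}
utilities["encode-3"] = 5.0

# With no noise the highest-utility rule is always selected.
best = choose_production(utilities, s=0.0, rng=rng)
# With noise, lower-utility rules are occasionally explored.
explored = choose_production(utilities, s=1.0, rng=rng)
```

The noise term is what lets the model keep sampling the other encode-x rules, so their utilities can still be updated when rewards arrive.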
14
Utility learning
›Utility learning requires the incorporation of rewards
›Two choices are crucial:
  ›When is the reward given?
  ›What is the magnitude of the reward?
›After some experience, the utility of a production rule approximates (Anderson, 2007):
[Equation not preserved in this transcript; its terms were annotated “Magnitude” and “When is reward given”]
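The equation this slide refers to is not legible in the transcript. As a hedged reconstruction, the difference-learning rule usually documented for ACT-R 6 utilities is given below. The symbols are labels chosen here, not taken from the slide: $U_i(n)$ is the utility of production $i$ after its $n$-th update, $\alpha$ is the learning rate, $R$ is the reward magnitude, and $\Delta t_i$ is the time between the firing of production $i$ and the delivery of the reward.

```latex
U_i(n) = U_i(n-1) + \alpha \left[ R_i(n) - U_i(n-1) \right],
\qquad R_i(n) = R - \Delta t_i

% With repeated experience the utility settles near the average effective reward:
U_i \approx \mathbb{E}\left[ R - \Delta t_i \right]
```

Under this reading, the “Magnitude” annotation would correspond to $R$ and the “When is reward given” annotation to $\Delta t_i$.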
15
Utility learning
›Choice 1: When is the reward given?
›Important because:
  ›Utility value has a linear relationship with the time at which the reward is given
›Choice in Blocks World
  ›Once model: update once, at the end of the trial
  ›Each model: update each time that part of the task is completed, i.e., a (set of) block(s) has been placed and the model either returns to the target window to study more blocks or finishes the trial
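The effect of reward timing can be illustrated with a small sketch. It assumes the commonly documented ACT-R 6 update, in which the effective reward for a rule is the reward magnitude minus the time since that rule fired; the learning rate, reward value, and delays below are hypothetical numbers, not values from the models in this talk.

```python
def update_utility(utility, reward, seconds_since_firing, alpha=0.2):
    """One difference-learning step toward (reward - delay)."""
    effective_reward = reward - seconds_since_firing
    return utility + alpha * (effective_reward - utility)


def learn(reward, seconds_since_firing, trials=200, alpha=0.2):
    """Repeat the same experience and return the learned utility."""
    utility = 0.0
    for _ in range(trials):
        utility = update_utility(utility, reward, seconds_since_firing, alpha)
    return utility


# Hypothetical timings for the same encode rule and the same reward of 10:
# "once" delivers the reward at the end of the trial, 20 s after the rule fired;
# "each" delivers it after the blocks are placed, 4 s after the rule fired.
utility_once = learn(reward=10.0, seconds_since_firing=20.0)
utility_each = learn(reward=10.0, seconds_since_firing=4.0)
```

In this sketch the learned utility converges to the reward minus the delay, so moving the reward 16 seconds later lowers the utility by 16: the linear relationship the slide describes.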
16
Utility learning
›Choice 2: magnitude of the reward
›Important because:
  ›Utility value has a linear relationship with the magnitude of the reward
›But how to set this value?
  ›Experimental tweaking? -> unfavorable
  ›Fixed range of values? (e.g., between 0 and 1) -> difficult
  ›Relate to neurological data? -> not available for most models
17
Utility learning
›Choice 2: magnitude of the reward
›Choice in Blocks World: relate the reward to what might be important in the task
  ›Accuracy: accuracy with which the task is performed. Options:
    ›Success: # blocks placed (once)
    ›Success: # blocks placed (each)
    ›Success & Failure: # blocks placed - # blocks forgotten (each)
  ›Time: how much time does (part of) the task take? Options:
    ›Time spent on the task: -1 * time spent (once)
    ›Time spent waiting for a specific aspect of the task: -1 * lockout size * number of visits to target window (once)
    ›Number of blocks placed per second (each)
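The reward concepts listed on this slide can be written down as simple functions. This is an illustrative sketch only: the `TrialSegment` record, its field names, and the example numbers are hypothetical, and the units (seconds) are an assumption; only the formulas themselves follow the slide.

```python
from dataclasses import dataclass


@dataclass
class TrialSegment:
    """Hypothetical summary of a trial, or of one part of a trial."""
    blocks_placed: int
    blocks_forgotten: int
    seconds_spent: float
    lockout_seconds: float
    target_window_visits: int


def reward_success(seg):
    """Accuracy: number of blocks placed."""
    return float(seg.blocks_placed)


def reward_success_and_failure(seg):
    """Accuracy: blocks placed minus blocks forgotten."""
    return float(seg.blocks_placed - seg.blocks_forgotten)


def reward_time_spent(seg):
    """Time: -1 * time spent on the task."""
    return -1.0 * seg.seconds_spent


def reward_lockout_wait(seg):
    """Time: -1 * lockout size * number of visits to the target window."""
    return -1.0 * seg.lockout_seconds * seg.target_window_visits


def reward_blocks_per_second(seg):
    """Time: number of blocks placed per second."""
    return seg.blocks_placed / seg.seconds_spent


# Hypothetical trial in the 3200 ms lockout condition.
example = TrialSegment(blocks_placed=8, blocks_forgotten=2,
                       seconds_spent=40.0, lockout_seconds=3.2,
                       target_window_visits=4)
rewards = {
    "success": reward_success(example),
    "success_and_failure": reward_success_and_failure(example),
    "time_spent": reward_time_spent(example),
    "lockout_wait": reward_lockout_wait(example),
    "blocks_per_second": reward_blocks_per_second(example),
}
```

Note how differently the concepts scale for the same trial: the accuracy rewards are small positive counts, while the time rewards are negative and grow with trial length or lockout size.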
18
Blocks World task: Modeling Strategies
›6 models were developed
›Each model is run 6 times for each of 3 experimental conditions: 0, 400, and 3200 milliseconds
›Models interact with the same interface as human participants
19
Blocks World task: general results
›Each model has unique results
20
Blocks World task: general results
›What is the impact of:
  ›When the reward is given (once/each)
  ›The concept of the reward (related to accuracy/time)
›Results averaged over 3 models
21
Utility learning: impact of when reward is given
22
Utility learning: impact of concept of reward
23
Comparison with ACT-R 5 (Gray, Schoelles, & Sims, 2005)
24
Conclusion
›Rewards can be given at different times during a trial and according to different concepts
›There are no guidelines on what the best choices are
›Blocks World suggests that rewards should:
  ›Be given once: the model can then optimize behavior over the entire task
  ›Relate to the concept of time: because different strategy choices have a big impact on reward size
›Models of other tasks should show whether this finding is consistent
25
Conclusion
›This is not just a Blocks World issue
  ›General Computer Science / AI issue: representing a task in the right way is crucial (e.g., Russell & Norvig, 1995; Sutton & Barto, 1998)
  ›Many experiments involve manipulations and measurements of accuracy and speed of performance
›This is a new issue for ACT-R
  ›When the reward is given: could be varied in ACT-R 5
  ›Magnitude of the reward: could not be varied in ACT-R 5
26
Thank you for your attention
›Questions?
›More information:
  ›cjanssen@ai.rug.nl
  ›www.ai.rug.nl/~cjanssen
  ›www.cogsci.rpi.edu/cogworks
  ›Poster Session @ CogSci 2008, Thursday, July 24th: “Cognitive Models of Strategy Shifts in Interactive Behavior” (session: “Attention and Implicit Learning”)
27
References
›Anderson, J. R. (2006). A new utility learning mechanism. Paper presented at the 2006 ACT-R workshop.
›Anderson, J. R. (2007). How can the human mind occur in the physical universe? New York: Oxford University Press.
›Bothell, D. (2005). ACT-R 6 Official Release. Proceedings of the 12th ACT-R Workshop.
›Fu, W. T., & Anderson, J. R. (2004). Extending the computational abilities of the procedural learning mechanism in ACT-R. Proceedings of the 26th annual meeting of the Cognitive Science Society, 416-421.
›Gray, W. D., Schoelles, M. J., & Sims, C. R. (2005). Adapting to the task environment: Explorations in expected value. Cognitive Systems Research, 6(1), 27-40.
›Gray, W. D., Sims, C. R., Fu, W. T., & Schoelles, M. J. (2006). The soft constraints hypothesis: A rational analysis approach to resource allocation for interactive behavior. Psychological Review, 113(3), 461-482.
›Russell, S. J., & Norvig, P. (1995). Artificial intelligence: A modern approach. Upper Saddle River, NJ: Prentice-Hall, Inc.
›Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.