Simulation of the effort-related T-maze choice task by a reinforcement-learning model incorporating the decay of learned values.

A, Self-paced navigation in the T-maze was simulated as a series of selections between Go, moving to the next state (indicated by the straight arrows), and Stay, remaining at the same state (indicated by the round arrows). The physical barrier placed in the HD arm in Conditions 1 and 3 of the experiments was represented as an extra state preceding the rewarded state in the HD arm, i.e., State 5 preceding State 7.

B, Magnification of the T-maze near the T-junction, illustrating a situation in which the rat takes Go from State 3 to State 4 (denoted Go3→4). At the next time step, the rat arrives at State 4 and selects Go4→5 (enter the HD arm), Stay4→4, or Go4→6 (enter the LD arm) depending on the values of these actions, with the ratio of choice probabilities shown on the right. The TD-RPE is calculated and the value of Go3→4 is updated according to the TD-RPE; in addition, the value of every action decays, as shown at the bottom. In the formulas, α, β, and φ are the parameters representing the learning rate, the inverse temperature (which determines the degree of exploitation over exploration in choice), and the decay rate, respectively; they were set to 0.5, 5, and 0.01 in the simulations. D in the formula of the TD-RPE is the parameter for DA depletion: it was set to 1 before depletion (trials 1–500) and 0.25 after depletion (trials 501–1000).

Kenji Morita and Ayaka Kato, eNeuro 2018;5:ENEURO.0021-18.2018. ©2018 by Society for Neuroscience
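The update rule described in the legend can be sketched in code. The following is a minimal, hedged sketch, not the authors' implementation: it assumes a SARSA-style TD-RPE, assumes the DA-depletion factor D scales the TD-RPE itself, and assumes no time discounting (γ = 1, which the legend does not state); the function names are illustrative, and only the parameter values α = 0.5, β = 5, φ = 0.01, and D ∈ {1, 0.25} come from the legend.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameter values taken from the figure legend
ALPHA = 0.5   # learning rate
BETA = 5.0    # inverse temperature
PHI = 0.01    # decay rate of learned values
GAMMA = 1.0   # time discount factor (assumed; not given in the legend)

def softmax_choice(q_values, beta=BETA, rng=rng):
    """Pick an action index with probability proportional to exp(beta * Q)."""
    p = np.exp(beta * (q_values - q_values.max()))  # shift for stability
    p /= p.sum()
    return rng.choice(len(q_values), p=p)

def update(q, chosen, reward, q_next_chosen, d=1.0,
           alpha=ALPHA, phi=PHI, gamma=GAMMA):
    """One time step of the decay model (hedged sketch).

    d scales the TD-RPE, modeling DA depletion (d = 1 intact,
    d = 0.25 after depletion, as in the legend). After the TD
    update, every learned value decays toward zero at rate phi.
    """
    delta = d * (reward + gamma * q_next_chosen - q[chosen])  # TD-RPE
    q = q.copy()
    q[chosen] += alpha * delta   # update the value of the taken action
    q *= (1.0 - phi)             # decay of all learned values
    return q, delta
```

At the T-junction state, `softmax_choice` would be applied to the three action values (Go4→5, Stay4→4, Go4→6), reproducing the probability ratio described in panel B; the blanket `q *= (1.0 - phi)` line is what distinguishes this model from a standard TD learner.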