Simulation of the effort-related T-maze choice task by a reinforcement-learning model incorporating the decay of learned values. A, Self-paced navigation in the T-maze was simulated as a series of selections between Go, move to the next state (indicated by the straight arrows), and Stay, remain in the same state (indicated by the round arrows). The physical barrier placed in the HD arm in Conditions 1 and 3 of the experiments was represented by an extra state preceding the rewarded state in the HD arm, i.e., State 5 preceding State 7. B, Magnification of the T-maze near the T-junction, illustrating a situation in which the rat takes Go from State 3 to State 4 (denoted Go3→4). At the next time step, the rat arrives at State 4 and selects Go4→5 (go to the HD arm), Stay4→4, or Go4→6 (go to the LD arm) depending on the values of these actions, with the choice probabilities in the ratio shown on the right. TD-RPE is then calculated, the value of Go3→4 is updated according to the TD-RPE, and in addition the value of every action decays, as shown at the bottom. In the formulas, α, β, and φ are the parameters representing the learning rate, the inverse temperature (which determines the degree of exploitation over exploration in choice), and the decay rate, respectively; they were set to 0.5, 5, and 0.01 in the simulations. D in the TD-RPE formula is the parameter for DA depletion: it was set to 1 before depletion (trials 1–500) and 0.25 after depletion (trials 501–1000).

Kenji Morita and Ayaka Kato, eNeuro 2018;5:ENEURO. ©2018 by Society for Neuroscience.
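As a rough illustration of the update scheme described in the caption, the Python sketch below implements softmax action selection with inverse temperature β, a TD-RPE-based update of the previously taken action's value with learning rate α, decay of all learned values at rate φ, and a DA-depletion factor D that switches from 1 to 0.25 after trial 500. The exact TD-RPE formula (including where D enters and whether a time-discount factor is used) is given only in the figure, so those details are assumptions here; this is a minimal sketch, not the authors' original simulation code.

```python
import numpy as np

# Minimal sketch of the value-learning rule described in the caption.
# Parameter values follow the caption; the precise TD-RPE formula and the
# state/action bookkeeping are simplifying assumptions for illustration.

alpha = 0.5   # learning rate
beta  = 5.0   # inverse temperature (degree of exploitation over exploration)
phi   = 0.01  # decay rate of learned values
gamma = 1.0   # time-discount factor (assumed; not stated in the caption text)

def softmax_choice(q_values, rng):
    """Choose among candidate actions (e.g., Go4->5, Stay4->4, Go4->6)
    with probabilities proportional to exp(beta * Q)."""
    p = np.exp(beta * np.asarray(q_values, dtype=float))
    p /= p.sum()
    return rng.choice(len(q_values), p=p)

def update_values(Q, prev_action, reward, next_action, trial):
    """Update the value of the previously taken action (e.g., Go3->4) from the
    TD-RPE, then let the value of every action decay at rate phi."""
    D = 1.0 if trial <= 500 else 0.25  # DA depletion after trial 500
    # Assumed placement of D: the caption only states that D appears
    # in the TD-RPE formula.
    rpe = D * (reward + gamma * Q[next_action] - Q[prev_action])
    Q[prev_action] += alpha * rpe
    for a in Q:                         # decay of all learned values
        Q[a] *= (1.0 - phi)
    return Q

# Example usage with hypothetical action labels:
rng = np.random.default_rng(0)
Q = {"Go3->4": 0.2, "Go4->5": 0.1, "Stay4->4": 0.0, "Go4->6": 0.3}
choice_idx = softmax_choice([Q["Go4->5"], Q["Stay4->4"], Q["Go4->6"]], rng)
Q = update_values(Q, "Go3->4", reward=0.0, next_action="Go4->6", trial=1)
```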