Chapter 16. Basal Ganglia Models for Autonomous Behavior Learning
in Creating Brain-Like Intelligence, Sendhoff et al.
Course: Robots Learning from Humans
Kim, Jung Ah
College of Natural Sciences, Interdisciplinary Program in Brain Science, Seoul National University
Contents
1. Introduction
2. Related Works
– Reinforcement Learning Research
– Neuroscience Findings
– Open Questions
3. Basal Ganglia Models
– Basal Ganglia System Model
– Basal Ganglia Spiking Neural Network Model
4. Discussion
5. Conclusion
Introduction
Research objective:
– To develop an autonomous behavior learning system for machines
Why?
– People will expect machines to select the best option by assessing the situation
What to investigate:
– The learning mechanism underlying behavior selection in the animal basal ganglia (BG)
Introduction
Animal learning: basal ganglia (BG)
– Classical (Pavlovian) conditioning
– Instrumental conditioning
Machine learning: reinforcement learning (RL)
– Temporal difference (TD) learning
Phasic activity of dopamine neurons in the BG
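TD learning and phasic dopamine activity are usually related through the TD error, delta = r + gamma * V(s') - V(s). The following is a minimal sketch of that quantity, assuming a hypothetical three-step cue, delay, reward episode; the state names, reward size, ALPHA, and GAMMA are illustrative choices and are not taken from the chapter.

```python
# Toy TD(0) value learning on a fixed cue -> delay -> reward sequence.
# States, reward size, ALPHA and GAMMA are illustrative assumptions.
ALPHA = 0.5   # learning rate
GAMMA = 1.0   # discount factor (no discounting in this short episode)
STATES = ["cue", "delay", "reward_time"]
REWARDS = [0.0, 0.0, 1.0]  # reward received on leaving each state


def run_trial(values):
    """Run one episode, update values in place, return the TD error per step."""
    deltas = []
    for i, state in enumerate(STATES):
        # The value after the final state is defined as 0 (episode ends).
        next_value = values[STATES[i + 1]] if i + 1 < len(STATES) else 0.0
        delta = REWARDS[i] + GAMMA * next_value - values[state]
        values[state] += ALPHA * delta
        deltas.append(delta)
    return deltas


def train(n_trials=30):
    """Repeat the episode and keep the TD errors of every trial."""
    values = {s: 0.0 for s in STATES}
    history = [run_trial(values) for _ in range(n_trials)]
    return values, history


if __name__ == "__main__":
    values, history = train()
    print("first trial TD errors:", history[0])
    print("last trial TD errors: ", [round(d, 3) for d in history[-1]])
    print("learned values:", {s: round(v, 3) for s, v in values.items()})
```

In this sketch the TD error at the time of reward starts at 1 and shrinks toward 0 as the reward becomes predicted, which is the pattern commonly compared with phasic dopamine responses.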
Related Works (1) – What is RL?
Agent-environment loop: action a_t, reward r_t, next state s_{t+1}
In reinforcement learning:
– The agent interacts with its environment
– Perceptions (state), actions, and rewards repeat in a cycle
– The task is to choose actions that maximize rewards
– Complete background knowledge is unavailable
Reinforcement learning (RL) is concerned with “how an agent ought to take actions in an environment so as to maximize some notion of long-term reward.”
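The interaction cycle described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: ToyEnv, its reward rule, and the two policies are hypothetical and exist solely to show the state, action, reward loop and a discounted long-term return.

```python
import random


class ToyEnv:
    """Hypothetical environment with two states and two actions.

    Action 1 taken in state 1 yields reward 1; everything else yields 0.
    """

    def __init__(self):
        self.state = 0

    def step(self, action):
        """Apply action a_t; return (reward r_t, next state s_{t+1})."""
        reward = 1.0 if (self.state == 1 and action == 1) else 0.0
        self.state = 1 - self.state  # states alternate regardless of action
        return reward, self.state


def run_episode(policy, n_steps=10, gamma=0.9):
    """Interact for n_steps and return the discounted return."""
    env = ToyEnv()
    state = env.state
    discounted_return = 0.0
    for t in range(n_steps):
        action = policy(state)            # agent chooses a_t from s_t
        reward, state = env.step(action)  # environment returns r_t, s_{t+1}
        discounted_return += (gamma ** t) * reward
    return discounted_return


def random_policy(state):
    """Ignore the state and pick an action uniformly at random."""
    return random.choice([0, 1])


def always_one(state):
    """Always pick action 1, which is optimal in this toy environment."""
    return 1


if __name__ == "__main__":
    print("random policy return:", run_episode(random_policy))
    print("always-1 policy return:", run_episode(always_one))
```

The agent never sees the reward rule directly; it only observes states and rewards, which is the sense in which complete background knowledge is unavailable.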
Related Works (2) – BG
Figure labels: cerebral cortex (frontal, prefrontal, and parietal areas); BG (striatum); cerebral cortex (motor area)
The basal ganglia (BG) are buried deep within the telencephalon.
Related Works (2) – Why BG?
– Information from the cortex flows through the direct and indirect pathways in parallel.
– The outputs of both pathways ultimately regulate the motor thalamus.
– The direct pathway helps to select certain motor actions, while the indirect pathway simultaneously suppresses competing, inappropriate motor programs.
Related Works
Machine learning (reinforcement learning, RL)
– Model-based RL: uses experiences to construct an internal model of state transitions; can solve complex structured tasks; examples are Dyna and real-time dynamic programming
– Model-free RL: uses experiences to directly learn one or two simpler quantities, which can then achieve the optimal behavior without learning a world model; can be used to solve unknown tasks; examples are temporal difference (TD) learning and Q-learning
Neuroscience (basal ganglia, BG)
– Model-based RL: dorsomedial striatum, prelimbic prefrontal cortex, orbitofrontal cortex, medial prefrontal cortex, parts of the amygdala
– Model-free RL (neuromodulatory system): dorsolateral striatum, amygdala
These brain areas are interconnected by parallel loops. The BG run model-free RL independently and also run model-based RL by receiving modeled, or structured, input from the cortex.
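To make the model-free side concrete, here is a minimal tabular Q-learning sketch. The five-state chain, its reward rule, and the hyperparameters are hypothetical choices for illustration; the point is only that the update uses sampled transitions and never stores or consults a transition model.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the rewarded terminal state
ACTIONS = [-1, +1]    # move left or right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # illustrative hyperparameters


def step(state, action):
    """Deterministic chain: reward 1 only on reaching the last state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return reward, next_state


def q_learning(n_episodes=300, max_steps=100, seed=0):
    """Learn action values from sampled transitions only."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(n_episodes):
        state = 0
        for _ in range(max_steps):
            if state == N_STATES - 1:
                break
            # Epsilon-greedy choice with random tie-breaking.
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            reward, next_state = step(state, ACTIONS[a])
            # Model-free update: no transition model is stored or consulted.
            target = reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per non-terminal state (ties resolved toward 'right')."""
    return [ACTIONS[0] if qs[0] > qs[1] else ACTIONS[1] for qs in q[:-1]]


if __name__ == "__main__":
    q = q_learning()
    print("greedy policy:", greedy_policy(q))
```

A model-based learner would instead record the observed transitions and rewards in a table and plan over that table, for example with dynamic programming.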
Basal Ganglia (BG) Models (I) – BG System Model
Two types of test environments:
1) MDP (Markov Decision Process)
2) HOMDP (High Order Markov Decision Process)
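One common way to relate the two environment types is to fold recent history into the state, so that a higher-order process becomes first-order Markov again. The sketch below assumes a hypothetical second-order reward rule and a hypothetical HistoryState helper; it is not the chapter's test environment or its internal state representation.

```python
from collections import deque


class HistoryState:
    """Augment each observation with the previous (order - 1) observations.

    If the environment is Markov of the given order in the raw observations,
    the resulting tuples can serve as ordinary first-order MDP states.
    """

    def __init__(self, order, pad=None):
        self.buffer = deque([pad] * order, maxlen=order)

    def update(self, observation):
        """Push the newest observation and return the augmented state."""
        self.buffer.append(observation)
        return tuple(self.buffer)


def second_order_reward(prev_obs, obs):
    """Hypothetical rule: reward only when 'B' directly follows 'A'."""
    return 1.0 if (prev_obs, obs) == ("A", "B") else 0.0


def rewards_along(observations):
    """Compute rewards that depend on the last two observations."""
    tracker = HistoryState(order=2)
    rewards = []
    for obs in observations:
        prev_obs, current = tracker.update(obs)
        rewards.append(second_order_reward(prev_obs, current))
    return rewards


if __name__ == "__main__":
    print(rewards_along(["A", "B", "B", "A", "B"]))
```

Seen through the current observation alone, the reward for 'B' looks unpredictable; seen through the augmented state it is fully determined.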
Basal Ganglia (BG) Models (II) – BG Spiking Neural Network Model
– Can select and initiate an action for trial and error in the presence of noisy, ambiguous input streams, and then adaptively tune the selection probability and timing of actions
– The indirect pathway selects an action and the direct pathway initiates the selected action
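Probabilistic selection with adjustable timing can be illustrated without spiking neurons by a race between noisy accumulators. The sketch below is a deliberately simplified stand-in and not the chapter's spiking network: each hypothetical action channel integrates a constant drive plus Gaussian noise, and the first channel to cross a threshold is selected. The drive, noise, threshold, and time-step values are assumptions.

```python
import random


def race_trial(drives, threshold=1.0, noise=0.3, dt=0.01,
               rng=random, max_steps=100_000):
    """Return (selected channel index, decision time) for one trial."""
    levels = [0.0] * len(drives)
    for step in range(1, max_steps + 1):
        for i, drive in enumerate(drives):
            # Euler step of a noisy integrator for channel i.
            levels[i] += drive * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        best = max(range(len(levels)), key=levels.__getitem__)
        if levels[best] >= threshold:
            return best, step * dt
    return None, max_steps * dt  # no channel reached the threshold


def summarize(drives, n_trials=500, seed=0, **kwargs):
    """Estimate selection probabilities and the mean decision time."""
    rng = random.Random(seed)
    counts = [0] * len(drives)
    total_time = 0.0
    for _ in range(n_trials):
        choice, t = race_trial(drives, rng=rng, **kwargs)
        if choice is not None:
            counts[choice] += 1
        total_time += t
    probs = [c / n_trials for c in counts]
    return probs, total_time / n_trials


if __name__ == "__main__":
    print("baseline:       ", summarize([1.0, 0.5]))
    print("lower threshold:", summarize([1.0, 0.5], threshold=0.5))
```

In this toy setting, changing the relative drives shifts the selection probabilities, while lowering the threshold shortens the mean decision time.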
Discussion
The BG system model focuses on the problems associated with relating model-based RL and model-free RL in a single system.
Open questions:
– Architecture
– Role of Neuromodulators
– Neural Mechanisms of Timing
Perspective:
– Associative Interacting Intelligence
Conclusion
– The BG system model illustrates the effectiveness of internal state representation and internal reward for achieving a goal in shorter trials.
– The BG spiking neural circuit model has the capacity for probabilistic selection of actions and also shows that selection probability and execution timing can be modulated.