Neural Networks, Chapter 7
Joost N. Kok, Universiteit Leiden
Recurrent Networks
Learning time sequences, three kinds of tasks:
- Sequence recognition
- Sequence reproduction
- Temporal association
Recurrent Networks
Tapped delay lines: keep several old input values in a buffer and present the whole buffer to the network together with the current value, as in the sketch below.
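A minimal Python sketch of a tapped delay line; the buffer depth D and the sine input are illustrative choices, not from the slides. The buffer holds the D most recent samples, and the whole buffer becomes one input vector for an ordinary feedforward network.

    from collections import deque
    import numpy as np

    D = 5  # buffer length: must be chosen in advance (see drawback below)

    class TappedDelayLine:
        """Keeps the D most recent samples; the oldest is dropped first."""
        def __init__(self, depth):
            self.buffer = deque([0.0] * depth, maxlen=depth)

        def push(self, x):
            self.buffer.append(x)
            # The whole buffer is the input vector for a feedforward net.
            return np.array(self.buffer)

    tdl = TappedDelayLine(D)
    for x_t in np.sin(0.3 * np.arange(20)):  # toy input signal
        window = tdl.push(x_t)               # shape (D,): x_{t-D+1} ... x_t
        # window would now be fed into an ordinary feedforward network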
Recurrent Networks
Drawbacks of tapped delay lines:
- The buffer length must be chosen in advance
- It leads to a large number of input units
- It requires a large number of training patterns
Remedy: replace the fixed time delays by filters.
Recurrent Networks
Partially recurrent networks
[Figure: feedforward network of input, hidden, and output nodes, extended with context nodes]
Recurrent Networks
Jordan network: the context units receive a copy of the output activations (plus their own decayed activation) and feed back into the hidden layer.
Recurrent Networks
Elman network: the context units store a copy of the previous hidden-layer activations and feed them back into the hidden layer.
[Figure: input, hidden, output, and context nodes]
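A minimal numpy sketch of one Elman forward step, assuming tanh hidden units and a linear output layer; all layer sizes and weight names are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 3, 5, 2

    W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))   # input   -> hidden
    W_ch = rng.normal(scale=0.1, size=(n_hid, n_hid))  # context -> hidden
    W_hy = rng.normal(scale=0.1, size=(n_out, n_hid))  # hidden  -> output

    def elman_step(x, context):
        # The hidden layer sees the current input plus last step's hidden state.
        h = np.tanh(W_xh @ x + W_ch @ context)
        y = W_hy @ h
        return y, h  # the new hidden state becomes the next context

    context = np.zeros(n_hid)                 # context units start at zero
    for x in rng.normal(size=(4, n_in)):      # a short input sequence
        y, context = elman_step(x, context)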
Recurrent Networks
Expanded hierarchical Elman network
[Figure: input layer, hidden layer, context layer, and output units]
Recurrent Networks
Back-propagation through time: unroll the recurrent network into one copy per time step and apply ordinary back-propagation to the unrolled feedforward network, as sketched below.
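A compact numpy sketch of back-propagation through time for a plain tanh RNN with a squared-error loss; the sizes, toy data, and learning rate are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    n_in, n_hid, n_out, T = 2, 4, 1, 6

    W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))
    W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))
    W_hy = rng.normal(scale=0.1, size=(n_out, n_hid))

    xs = rng.normal(size=(T, n_in))    # toy input sequence
    ds = rng.normal(size=(T, n_out))   # toy target sequence

    # Forward pass: unroll the network over T time steps.
    hs = np.zeros((T + 1, n_hid))      # hs[0] is the initial state
    ys = np.zeros((T, n_out))
    for t in range(T):
        hs[t + 1] = np.tanh(W_xh @ xs[t] + W_hh @ hs[t])
        ys[t] = W_hy @ hs[t + 1]

    # Backward pass: ordinary backprop through the unrolled network.
    dW_xh = np.zeros_like(W_xh)
    dW_hh = np.zeros_like(W_hh)
    dW_hy = np.zeros_like(W_hy)
    dh_next = np.zeros(n_hid)
    for t in reversed(range(T)):
        dy = ys[t] - ds[t]                   # squared-error gradient
        dW_hy += np.outer(dy, hs[t + 1])
        dh = W_hy.T @ dy + dh_next           # gradient flows back in time
        da = dh * (1.0 - hs[t + 1] ** 2)     # through the tanh nonlinearity
        dW_xh += np.outer(da, xs[t])
        dW_hh += np.outer(da, hs[t])
        dh_next = W_hh.T @ da

    for W, dW in ((W_xh, dW_xh), (W_hh, dW_hh), (W_hy, dW_hy)):
        W -= 0.1 * dW                        # one gradient step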
Reinforcement Learning
Supervised learning with only limited feedback: a scalar reinforcement signal instead of explicit targets.
Reinforcement-learning problems:
- Class I: the reinforcement signal is always the same for a given input-output pair
- Class II: stochastic environment, but a fixed reinforcement probability for each input-output pair
- Class III: reinforcement and input patterns depend on the past history of the network output
Associative Reward-Penalty
[Figure: network with stochastic output units; the reinforcement signal replaces the usual target/error feedback]
Associative Reward-Penalty Learning Rule
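The slide's formula did not survive extraction. A common statement of the A_{R-P} rule (after Barto and Anandan), assuming stochastic output units S_i = ±1 with mean ⟨S_i⟩ and input pattern ξ_k, is:

    \Delta w_{ik} =
    \begin{cases}
      \eta^{+}\big(S_i - \langle S_i \rangle\big)\,\xi_k & \text{if } r = +1 \text{ (reward)} \\
      \eta^{-}\big(-S_i - \langle S_i \rangle\big)\,\xi_k & \text{if } r = -1 \text{ (penalty)}
    \end{cases}

with a much smaller learning rate for the penalty case, η⁻ ≪ η⁺.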
Models and Critics
[Figure: a model/critic module placed between the network and the environment]
Reinforcement Comparison
[Figure: a critic compares the received reinforcement against a prediction; the environment closes the loop]
Reinforcement Learning
The reinforcement-learning model:
- The agent receives an input I, some indication of the current state s of the environment
- The agent then chooses an action a
- The action changes the state of the environment, and the value of this change is communicated through a scalar reinforcement signal r
Reinforcement Learning
Environment: You are in state 65. You have four possible actions.
Agent: I'll take action 2.
Environment: You received a reinforcement of 7 units. You are now in state 15. You have two possible actions.
Agent: I'll take action 1.
Environment: You received a reinforcement of -4 units. You are now in state 12. You have two possible actions.
…
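A minimal Python sketch of this interaction loop; the Environment interface (reset/step) and the random policy are hypothetical, chosen only to make the loop concrete.

    import random

    def run_episode(env, policy, max_steps=100):
        """Agent-environment loop: observe state, act, receive reinforcement."""
        state, actions = env.reset()           # e.g. state 65, four actions
        total_reinforcement = 0.0
        for _ in range(max_steps):
            action = policy(state, actions)    # the agent chooses an action
            state, actions, r, done = env.step(action)
            total_reinforcement += r           # scalar reinforcement signal
            if done:
                break
        return total_reinforcement

    def random_policy(state, actions):
        return random.choice(actions)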
Reinforcement Learning
- The environment is non-deterministic: the same action in the same state may result in different next states and different reinforcements
- The environment is stationary: the probabilities of making state transitions or receiving specific reinforcement signals do not change over time
Reinforcement Learning
Two types of learning:
- Model-free learning
- Model-based learning
Typical application areas: robots, mazes, games, …
Reinforcement Learning
Paper: "A Short Introduction to Reinforcement Learning" (Stephan ten Hagen and Ben Kröse)
Reinforcement Learning
The environment is a Markov decision process: the next state and reinforcement depend only on the current state and the chosen action.
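In symbols, the Markov property assumed here is:

    P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots) = P(s_{t+1} \mid s_t, a_t)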
Reinforcement Learning
- Optimize the interaction with the environment, i.e. optimize the action selection mechanism
- Temporal credit assignment problem: which earlier actions deserve credit for the present reinforcement?
- Policy: the action selection mechanism
- Value function: the expected discounted future reinforcement from a state (definition below)
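The slide's formula was lost in extraction; the standard definition of the value function of a policy π, assuming a discount factor 0 ≤ γ < 1, is:

    V^{\pi}(s) = E_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \;\middle|\; s_t = s\right]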
Reinforcement Learning
Optimal value function, based on the optimal policy (equation below):
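A standard reconstruction of the missing formula, written with the transition probabilities P and rewards R of the MDP:

    V^{*}(s) = \max_{\pi} V^{\pi}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\big[R(s, a, s') + \gamma\, V^{*}(s')\big]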
Reinforcement Learning
- Policy evaluation: approximate the value function of a given policy (backup below)
- Policy iteration: start with an arbitrary policy and repeatedly improve it
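The usual iterative policy-evaluation backup, given as a standard reconstruction since the slide's equation is missing:

    V_{k+1}(s) = \sum_{a} \pi(s, a) \sum_{s'} P(s' \mid s, a)\,\big[R(s, a, s') + \gamma\, V_{k}(s')\big]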
Reinforcement Learning
Improve the policy by acting greedily with respect to the current value function (equation below):
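The standard greedy improvement step (the slide's own equation is missing):

    \pi'(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a)\,\big[R(s, a, s') + \gamma\, V^{\pi}(s')\big]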
Reinforcement Learning
Value iteration: combine the policy evaluation and policy improvement steps into a single backup (below):
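The standard value-iteration backup:

    V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\big[R(s, a, s') + \gamma\, V_{k}(s')\big]

A small Python sketch of the same computation; the array layout (P[a][s][s'] for transition probabilities, R[a][s][s'] for rewards) is an illustrative choice.

    import numpy as np

    def value_iteration(P, R, gamma=0.9, tol=1e-6):
        """P[a][s][s'] transition probabilities, R[a][s][s'] rewards."""
        n_actions, n_states, _ = P.shape
        V = np.zeros(n_states)
        while True:
            # One backup: evaluation and improvement in a single max-step.
            Q = np.einsum('ast,ast->as', P, R + gamma * V[None, None, :])
            V_new = Q.max(axis=0)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=0)  # values and greedy policy
            V = V_new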
Reinforcement Learning
Monte Carlo: use when the transition probabilities P and the rewards R are not known
- Given a policy, several complete episodes of interaction are performed and the observed returns are averaged
- Exploration/exploitation dilemma: extracting information about the environment vs. optimizing the interaction
Reinforcement Learning
Temporal difference (TD) learning:
- Part of the update can already be calculated during the interaction
- Information from previous interactions is reused
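The slides omit the update rule; the standard TD(0) update with learning rate α is:

    V(s_t) \leftarrow V(s_t) + \alpha\,\big[r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t)\big]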
Reinforcement Learning
TD(λ) learning: the trace-decay parameter λ controls how far updates reach back in time: the longer ago a state was visited, the less it is affected by the present update.
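One common way to write TD(λ) uses an eligibility trace e_t(s) per state, decayed by γλ and bumped when the state is visited:

    e_t(s) =
    \begin{cases}
      \gamma\lambda\, e_{t-1}(s) + 1 & \text{if } s = s_t \\
      \gamma\lambda\, e_{t-1}(s) & \text{otherwise}
    \end{cases}
    \qquad
    V(s) \leftarrow V(s) + \alpha\,\delta_t\, e_t(s), \quad \delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)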
Reinforcement Learning
Q-learning: combine the actor and the critic in a single action-value function Q(s, a):
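The standard one-step Q-learning update (a reconstruction, since the slide's equation is missing):

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,\big[r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\big]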
Reinforcement Learning
Use temporal difference learning to update the Q-values, as in the sketch below.
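A tabular Q-learning sketch with ε-greedy exploration; the env interface (reset/step returning the next state, the reinforcement, and a done flag) is hypothetical.

    import numpy as np

    def q_learning(env, n_states, n_actions, episodes=500,
                   alpha=0.1, gamma=0.9, epsilon=0.1):
        """Tabular Q-learning with an epsilon-greedy exploration policy."""
        rng = np.random.default_rng(0)
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                # Exploration/exploitation: mostly greedy, sometimes random.
                if rng.random() < epsilon:
                    a = int(rng.integers(n_actions))
                else:
                    a = int(Q[s].argmax())
                s_next, r, done = env.step(a)
                # TD update toward the one-step lookahead target.
                target = r + gamma * Q[s_next].max() * (not done)
                Q[s, a] += alpha * (target - Q[s, a])
                s = s_next
        return Q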
Reinforcement Learning
Q(λ) learning: Q-learning combined with eligibility traces:
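A reconstruction of the missing update, in Watkins's variant: the TD error drives updates of all recently visited state-action pairs through traces e_t(s, a), decayed by γλ and cut after exploratory actions:

    Q(s, a) \leftarrow Q(s, a) + \alpha\,\delta_t\, e_t(s, a), \qquad \delta_t = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)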
Reinforcement Learning
Feedforward neural networks are used to estimate V(s) and Q(s, a) when the state/action spaces are too large for tables (sketch below).
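A minimal sketch of such an approximator: a one-hidden-layer network mapping a state vector to Q-values for all actions, trained with a semi-gradient TD update. All sizes and learning rates are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(2)
    n_state, n_hid, n_actions = 4, 16, 2
    W1 = rng.normal(scale=0.1, size=(n_hid, n_state))
    W2 = rng.normal(scale=0.1, size=(n_actions, n_hid))

    def q_values(s):
        h = np.tanh(W1 @ s)
        return W2 @ h, h                   # Q-values for all actions

    def td_update(s, a, r, s_next, done, alpha=0.01, gamma=0.9):
        global W1, W2
        q, h = q_values(s)
        q_next, _ = q_values(s_next)
        target = r + gamma * q_next.max() * (not done)
        delta = target - q[a]              # TD error for the taken action
        # Backpropagate the TD error through the two layers.
        dq = np.zeros(n_actions)
        dq[a] = delta
        dh = (W2.T @ dq) * (1 - h ** 2)    # through the tanh hidden layer
        W2 += alpha * np.outer(dq, h)
        W1 += alpha * np.outer(dh, s)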