1
Neural Networks Chapter 7
Joost N. Kok Universiteit Leiden
2
Recurrent Networks Learning Time Sequences: Sequence Recognition
Sequence Reproduction Temporal Association
3
Recurrent Networks Tapped Delay Lines:
Keep several old values in a buffer
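The buffer idea can be sketched with a fixed-length window (a hypothetical illustration; the buffer length D and the sample values are assumptions, not from the slides):

```python
from collections import deque

# A tapped delay line: keep the last D input values in a buffer and
# present them together as one input vector to the network.
D = 3  # buffer length, must be chosen in advance

buffer = deque([0.0] * D, maxlen=D)  # initially filled with zeros

def step(x):
    """Push a new sample and return the current window of D past values."""
    buffer.appendleft(x)
    return list(buffer)  # [x(t), x(t-1), ..., x(t-D+1)]

for x in [1.0, 2.0, 3.0, 4.0]:
    window = step(x)
# after the loop, window holds the three most recent samples: [4.0, 3.0, 2.0]
```

Because `maxlen=D`, the oldest value is discarded automatically on every push.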
4
Recurrent Networks Drawbacks of tapped delay lines:
The buffer length must be chosen in advance, which leads to a large number of input units, a large number of training patterns, etc. Remedy: replace the fixed time delays by filters.
5
Recurrent Networks Partially recurrent networks
[Diagram: input nodes and context nodes feed hidden nodes, which feed output nodes]
6
Recurrent Networks Jordan Network
7
Recurrent Networks Elman Network
[Diagram: input nodes and context nodes feed hidden nodes, which feed output nodes; the context nodes hold a copy of the previous hidden-node activations]
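The defining feature of the Elman network, context nodes that copy back the previous hidden activations, can be sketched as follows (the weights and layer sizes are illustrative assumptions):

```python
import math

# Minimal Elman step: hidden activations depend on the current input
# and on the context (= hidden activations from the previous time step).
W_in = [[0.5], [-0.5]]        # input -> hidden weights (1 input, 2 hidden)
W_ctx = [[0.1, 0.0],          # context -> hidden weights (2 x 2)
         [0.0, 0.1]]
context = [0.0, 0.0]          # context nodes start at zero

def elman_step(x, context):
    hidden = []
    for i in range(2):
        net = W_in[i][0] * x + sum(W_ctx[i][j] * context[j] for j in range(2))
        hidden.append(math.tanh(net))
    return hidden  # the caller copies this back into the context nodes

h1 = elman_step(1.0, context)
context = h1                  # copy hidden activations into the context nodes
h2 = elman_step(1.0, context) # same input, different output: the network has state
```

The same input produces different hidden activations on the second step, which is exactly what the context nodes buy: the network's output depends on the sequence, not just the current input.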
8
Recurrent Networks Expanded Hierarchical Elman Network
[Diagram: input layer, hidden layer with an associated context layer, output units]
9
Recurrent Networks
11
Recurrent Networks Back-Propagation Through Time
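The slide's figure is lost in this transcript; in standard notation (the symbols below are an assumption, since only a diagram was shown), back-propagation through time unrolls the recurrent network over T time steps and applies ordinary back-propagation to the unrolled network, summing the gradient contributions of all copies of the shared weights:

```latex
h_t = f\left(W_{xh}\, x_t + W_{hh}\, h_{t-1}\right), \qquad
\frac{\partial E}{\partial W} = \sum_{t=1}^{T} \left.\frac{\partial E}{\partial W}\right|_{t}
```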
12
Reinforcement Learning
Supervised learning with some feedback. Reinforcement learning problems:
Class I: the reinforcement signal is always the same for a given input-output pair
Class II: stochastic environment, with a fixed probability for each input-output pair
Class III: reinforcement and input patterns depend on the past history of the network output
13
Associative Reward-Penalty
[Diagram: stochastic output units, reinforcement signal, target, error]
14
Associative Reward-Penalty
Learning Rule
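The rule itself appears only as an image in the original slides. A common statement of the associative reward-penalty rule (notation following Hertz, Krogh & Palmer's presentation, so the exact symbols here are an assumption) is:

```latex
\Delta w_{ik} = \rho(r)\,\bigl(\sigma_i - \langle S_i \rangle\bigr)\,\xi_k,
\qquad
\sigma_i =
\begin{cases}
S_i & \text{if } r = +1 \text{ (reward)} \\
-S_i & \text{if } r = -1 \text{ (penalty)}
\end{cases}
```

where S_i is the stochastic output, ⟨S_i⟩ its mean, ξ_k the input, and the learning rate ρ(r) is taken much smaller for penalty than for reward.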
15
Models and Critics Environment
16
Reinforcement Comparison
Critic Environment
17
Reinforcement Learning
Reinforcement-Learning Model: the agent receives an input I, which is some indication of the current state s of the environment. The agent then chooses an action a. The action changes the state of the environment, and the value of this change is communicated through a scalar reinforcement signal r.
18
Reinforcement Learning
Environment: You are in state 65. You have four possible actions.
Agent: I'll take action 2.
Environment: You received a reinforcement of 7 units. You are now in state 15. You have two possible actions.
Agent: I'll take action 1.
Environment: You received a reinforcement of -4 units. You are now in state 12. You have two possible actions.
…
19
Reinforcement Learning
The environment is non-deterministic: the same action in the same state may result in different states and different reinforcements. The environment is stationary: the probabilities of making state transitions or receiving specific reinforcement signals do not change over time.
20
Reinforcement Learning
Two types of learning: model-free learning and model-based learning. Typical application areas: robots, mazes, games, …
21
Reinforcement Learning
Paper: A short introduction to Reinforcement Learning (Stephan ten Hagen and Ben Krose)
22
Reinforcement Learning
Environment is a Markov Decision Process (MDP)
23
Reinforcement Learning
Optimize the interaction with the environment by optimizing the action selection mechanism. This raises the temporal credit assignment problem. Policy: the action selection mechanism. Value function: the expected discounted future reinforcement from a state.
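The value function formula is missing from the transcript; its standard definition (symbols assumed) for the value of state s under policy π is the expected discounted return:

```latex
V^{\pi}(s) = E_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_0 = s\right],
\qquad 0 \le \gamma < 1
```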
24
Reinforcement Learning
Optimal Value function based on optimal policy:
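In the same (assumed) notation, the optimal value function is the value of the best policy, and it satisfies the Bellman optimality equation:

```latex
V^{*}(s) = \max_{\pi} V^{\pi}(s)
= \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V^{*}(s')\bigr]
```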
25
Reinforcement Learning
Policy evaluation: approximate the value function for a given policy. Policy iteration: start with an arbitrary policy and improve it.
26
Reinforcement Learning
Improve Policy:
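The improvement step shown on the slide is, in standard (assumed) notation, the greedy policy with respect to the current value function:

```latex
\pi'(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V^{\pi}(s')\bigr]
```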
27
Reinforcement Learning
Value Iteration: combine policy evaluation and policy improvement steps:
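Value iteration can be sketched on a small example MDP (the three-state environment below is a hypothetical illustration, not from the slides; transition probabilities are taken deterministic for brevity):

```python
# Value iteration on a tiny 3-state MDP: states 0 and 1 are ordinary,
# state 2 is terminal. P and R are assumed known.
GAMMA = 0.9

# transitions[s][a] = (next_state, reward); deterministic for simplicity
transitions = {
    0: {0: (1, 0.0), 1: (0, 0.0)},
    1: {0: (2, 1.0), 1: (0, 0.0)},
}

def value_iteration(transitions, gamma, n_states=3, tol=1e-8):
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # combined evaluation + improvement: back up with a max over actions
            best = max(r + gamma * V[s2] for s2, r in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration(transitions, GAMMA)
# V converges to [0.9, 1.0, 0.0]: from state 1 the agent earns 1 immediately,
# from state 0 it earns 1 one step later, discounted by gamma.
```

The max over actions inside the sweep is exactly how value iteration folds the policy improvement step into each evaluation backup.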
28
Reinforcement Learning
Monte Carlo: use if the transition probabilities and rewards are not known. Given a policy, several complete episodes of interaction are performed. Exploration/exploitation dilemma: extract information about the environment vs. optimize the interaction.
29
Reinforcement Learning
Temporal Difference (TD) Learning: during interaction, part of the update can already be calculated; information from previous interactions is used.
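The update formula is not in the transcript; the standard TD(0) update (learning rate η assumed) after observing reward r_t and next state s_{t+1} is:

```latex
V(s_t) \leftarrow V(s_t) + \eta\,\bigl[r_t + \gamma\, V(s_{t+1}) - V(s_t)\bigr]
```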
30
Reinforcement Learning
TD(λ) learning: decay parameter λ: the longer ago a state was visited, the less it is affected by the present update.
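With eligibility traces (a standard formulation, assumed here since the slide's formula is lost), every recently visited state receives a share of the TD error, decayed by γλ per step:

```latex
e_t(s) = \gamma \lambda\, e_{t-1}(s) + \mathbf{1}[s = s_t], \qquad
V(s) \leftarrow V(s) + \eta\, \delta_t\, e_t(s), \qquad
\delta_t = r_t + \gamma\, V(s_{t+1}) - V(s_t)
```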
31
Reinforcement Learning
Q-learning: combine actor and critic:
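Tabular Q-learning can be sketched on the same kind of tiny deterministic MDP used above (a hypothetical example; the states, rewards, and parameter values are assumptions, not from the slides):

```python
import random

random.seed(0)  # deterministic run for reproducibility

GAMMA, ETA, EPS = 0.9, 0.5, 0.1
transitions = {                      # transitions[s][a] = (next_state, reward)
    0: {0: (1, 0.0), 1: (0, 0.0)},
    1: {0: (2, 1.0), 1: (0, 0.0)},   # state 2 is terminal
}
Q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}

for _ in range(2000):
    s = 0
    while s in transitions:          # run one episode until the terminal state
        # epsilon-greedy action selection (exploration vs. exploitation)
        if random.random() < EPS:
            a = random.choice(list(Q[s]))
        else:
            a = max(Q[s], key=Q[s].get)
        s2, r = transitions[s][a]
        # Q-learning update: bootstrap from the best action in the next state
        best_next = max(Q[s2].values()) if s2 in Q else 0.0
        Q[s][a] += ETA * (r + GAMMA * best_next - Q[s][a])
        s = s2
```

Because the update takes the max over next-state actions regardless of the action actually chosen, the same table serves as both actor (via the greedy policy) and critic (via the Q-values).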
32
Reinforcement Learning
Use temporal difference learning
33
Reinforcement Learning
Q(λ) learning:
34
Reinforcement Learning
Feedforward neural networks are used to estimate V(s) and Q(s,a) when the state/action spaces are large.