1
Neural Networks Chapter 7
Joost N. Kok Universiteit Leiden
2
Recurrent Networks Learning Time Sequences: Sequence Recognition
Sequence Reproduction Temporal Association
3
Recurrent Networks Tapped Delay Lines:
Keep several old values in a buffer
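The buffer idea can be sketched with a fixed-length window (a hypothetical illustration; the buffer length D and the sample values are assumptions, not from the slides):

```python
from collections import deque

# A tapped delay line: keep the last D input values in a buffer and
# present them together as one input vector to the network.
D = 3  # buffer length, must be chosen in advance

buffer = deque([0.0] * D, maxlen=D)  # initially filled with zeros

def step(x):
    """Push a new sample and return the current window of D past values."""
    buffer.appendleft(x)
    return list(buffer)  # [x(t), x(t-1), ..., x(t-D+1)]

for x in [1.0, 2.0, 3.0, 4.0]:
    window = step(x)
# after the loop, window holds the three most recent samples: [4.0, 3.0, 2.0]
```

Because `maxlen=D`, the oldest value is discarded automatically on every push.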
4
Recurrent Networks Drawbacks of tapped delay lines:
The buffer length must be chosen in advance, which leads to a large number of input units, a large number of training patterns, etc. Remedy: replace the fixed time delays by filters.
5
Recurrent Networks Partially recurrent networks
[Diagram: input nodes and context nodes feed hidden nodes, which feed output nodes]
6
Recurrent Networks Jordan Network
7
Recurrent Networks Elman Network
[Diagram: input nodes and context nodes feed hidden nodes, which feed output nodes; the context nodes hold a copy of the previous hidden-node activations]
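The defining feature of the Elman network, context nodes that copy back the previous hidden activations, can be sketched as follows (the weights and layer sizes are illustrative assumptions):

```python
import math

# Minimal Elman step: hidden activations depend on the current input
# and on the context (= hidden activations from the previous time step).
W_in = [[0.5], [-0.5]]        # input -> hidden weights (1 input, 2 hidden)
W_ctx = [[0.1, 0.0],          # context -> hidden weights (2 x 2)
         [0.0, 0.1]]
context = [0.0, 0.0]          # context nodes start at zero

def elman_step(x, context):
    hidden = []
    for i in range(2):
        net = W_in[i][0] * x + sum(W_ctx[i][j] * context[j] for j in range(2))
        hidden.append(math.tanh(net))
    return hidden  # the caller copies this back into the context nodes

h1 = elman_step(1.0, context)
context = h1                  # copy hidden activations into the context nodes
h2 = elman_step(1.0, context) # same input, different output: the network has state
```

The same input produces different hidden activations on the second step, which is exactly what the context nodes buy: the network's output depends on the sequence, not just the current input.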
8
Recurrent Networks Expanded Hierarchical Elman Network
[Diagram: input layer, hidden layer with an associated context layer, output units]
9
Recurrent Networks
11
Recurrent Networks Back-Propagation Through Time
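The slide's figure is lost in this transcript; in standard notation (the symbols below are an assumption, since only a diagram was shown), back-propagation through time unrolls the recurrent network over T time steps and applies ordinary back-propagation to the unrolled network, summing the gradient contributions of all copies of the shared weights:

```latex
h_t = f\left(W_{xh}\, x_t + W_{hh}\, h_{t-1}\right), \qquad
\frac{\partial E}{\partial W} = \sum_{t=1}^{T} \left.\frac{\partial E}{\partial W}\right|_{t}
```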
12
Reinforcement Learning
Supervised learning with some feedback. Reinforcement learning problems:
Class I: the reinforcement signal is always the same for a given input-output pair
Class II: stochastic environment, with a fixed probability for each input-output pair
Class III: reinforcement and input patterns depend on the past history of the network output
13
Associative Reward-Penalty
[Diagram: stochastic output units, reinforcement signal, target, error]
14
Associative Reward-Penalty
Learning Rule
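The rule itself appears only as an image in the original slides. A common statement of the associative reward-penalty rule (notation following Hertz, Krogh & Palmer's presentation, so the exact symbols here are an assumption) is:

```latex
\Delta w_{ik} = \rho(r)\,\bigl(\sigma_i - \langle S_i \rangle\bigr)\,\xi_k,
\qquad
\sigma_i =
\begin{cases}
S_i & \text{if } r = +1 \text{ (reward)} \\
-S_i & \text{if } r = -1 \text{ (penalty)}
\end{cases}
```

where S_i is the stochastic output, ⟨S_i⟩ its mean, ξ_k the input, and the learning rate ρ(r) is taken much smaller for penalty than for reward.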
15
Models and Critics Environment
16
Reinforcement Comparison
Critic Environment
17
Reinforcement Learning
Reinforcement-Learning Model: the agent receives an input I, which is some indication of the current state s of the environment. The agent then chooses an action a. The action changes the state of the environment, and the value of this change is communicated through a scalar reinforcement signal r.
18
Reinforcement Learning
Environment: You are in state 65. You have four possible actions.
Agent: I'll take action 2.
Environment: You received a reinforcement of 7 units. You are now in state 15. You have two possible actions.
Agent: I'll take action 1.
Environment: You received a reinforcement of -4 units. You are now in state 12. You have two possible actions.
…
19
Reinforcement Learning
The environment is non-deterministic: the same action in the same state may result in different states and different reinforcements. The environment is stationary: the probabilities of making state transitions or receiving specific reinforcement signals do not change over time.
20
Reinforcement Learning
Two types of learning: model-free learning and model-based learning. Typical application areas: robots, mazes, games, …
21
Reinforcement Learning
Paper: A short introduction to Reinforcement Learning (Stephan ten Hagen and Ben Krose)
22
Reinforcement Learning
Environment is a Markov Decision Process (MDP)
23
Reinforcement Learning
Optimize the interaction with the environment by optimizing the action selection mechanism. This raises the temporal credit assignment problem. Policy: the action selection mechanism. Value function: the expected discounted future reinforcement from a state.
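The value function formula is missing from the transcript; its standard definition (symbols assumed) for the value of state s under policy π is the expected discounted return:

```latex
V^{\pi}(s) = E_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_0 = s\right],
\qquad 0 \le \gamma < 1
```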
24
Reinforcement Learning
Optimal Value function based on optimal policy:
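In the same (assumed) notation, the optimal value function is the value of the best policy, and it satisfies the Bellman optimality equation:

```latex
V^{*}(s) = \max_{\pi} V^{\pi}(s)
= \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V^{*}(s')\bigr]
```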
25
Reinforcement Learning
Policy evaluation: approximate the value function for a given policy. Policy iteration: start with an arbitrary policy and improve it.
26
Reinforcement Learning
Improve Policy:
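The improvement step shown on the slide is, in standard (assumed) notation, the greedy policy with respect to the current value function:

```latex
\pi'(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V^{\pi}(s')\bigr]
```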
27
Reinforcement Learning
Value Iteration: combine policy evaluation and policy improvement steps:
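Value iteration can be sketched on a small example MDP (the three-state environment below is a hypothetical illustration, not from the slides; transition probabilities are taken deterministic for brevity):

```python
# Value iteration on a tiny 3-state MDP: states 0 and 1 are ordinary,
# state 2 is terminal. P and R are assumed known.
GAMMA = 0.9

# transitions[s][a] = (next_state, reward); deterministic for simplicity
transitions = {
    0: {0: (1, 0.0), 1: (0, 0.0)},
    1: {0: (2, 1.0), 1: (0, 0.0)},
}

def value_iteration(transitions, gamma, n_states=3, tol=1e-8):
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # combined evaluation + improvement: back up with a max over actions
            best = max(r + gamma * V[s2] for s2, r in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration(transitions, GAMMA)
# V converges to [0.9, 1.0, 0.0]: from state 1 the agent earns 1 immediately,
# from state 0 it earns 1 one step later, discounted by gamma.
```

The max over actions inside the sweep is exactly how value iteration folds the policy improvement step into each evaluation backup.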
28
Reinforcement Learning
Monte Carlo: use if the transition probabilities and rewards are not known. Given a policy, several complete episodes of interaction are performed. Exploration/exploitation dilemma: extract information about the environment vs. optimize the interaction.
29
Reinforcement Learning
Temporal Difference (TD) Learning: during interaction, part of the update can already be calculated; information from previous interactions is used.
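The update formula is not in the transcript; the standard TD(0) update (learning rate η assumed) after observing reward r_t and next state s_{t+1} is:

```latex
V(s_t) \leftarrow V(s_t) + \eta\,\bigl[r_t + \gamma\, V(s_{t+1}) - V(s_t)\bigr]
```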
30
Reinforcement Learning
TD(λ) learning: decay parameter λ: the longer ago a state was visited, the less it is affected by the present update.
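With eligibility traces (a standard formulation, assumed here since the slide's formula is lost), every recently visited state receives a share of the TD error, decayed by γλ per step:

```latex
e_t(s) = \gamma \lambda\, e_{t-1}(s) + \mathbf{1}[s = s_t], \qquad
V(s) \leftarrow V(s) + \eta\, \delta_t\, e_t(s), \qquad
\delta_t = r_t + \gamma\, V(s_{t+1}) - V(s_t)
```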
31
Reinforcement Learning
Q-learning: combine actor and critic:
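Tabular Q-learning can be sketched on the same kind of tiny deterministic MDP used above (a hypothetical example; the states, rewards, and parameter values are assumptions, not from the slides):

```python
import random

random.seed(0)  # deterministic run for reproducibility

GAMMA, ETA, EPS = 0.9, 0.5, 0.1
transitions = {                      # transitions[s][a] = (next_state, reward)
    0: {0: (1, 0.0), 1: (0, 0.0)},
    1: {0: (2, 1.0), 1: (0, 0.0)},   # state 2 is terminal
}
Q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}

for _ in range(2000):
    s = 0
    while s in transitions:          # run one episode until the terminal state
        # epsilon-greedy action selection (exploration vs. exploitation)
        if random.random() < EPS:
            a = random.choice(list(Q[s]))
        else:
            a = max(Q[s], key=Q[s].get)
        s2, r = transitions[s][a]
        # Q-learning update: bootstrap from the best action in the next state
        best_next = max(Q[s2].values()) if s2 in Q else 0.0
        Q[s][a] += ETA * (r + GAMMA * best_next - Q[s][a])
        s = s2
```

Because the update takes the max over next-state actions regardless of the action actually chosen, the same table serves as both actor (via the greedy policy) and critic (via the Q-values).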
32
Reinforcement Learning
Use temporal difference learning
33
Reinforcement Learning
Q(λ) learning:
34
Reinforcement Learning
Feedforward neural networks are used to estimate V(s) and Q(s,a) when the state/action spaces are large.