Neural Networks Chapter 7


1 Neural Networks Chapter 7
Joost N. Kok Universiteit Leiden

2 Recurrent Networks
Learning time sequences:
- Sequence recognition
- Sequence reproduction
- Temporal association

3 Recurrent Networks Tapped Delay Lines:
Keep several old values in a buffer
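A minimal sketch of the tapped-delay-line idea: a buffer of old values turns a time series into fixed-length input vectors for an ordinary feedforward network. The function name, buffer length and zero padding at the start are illustrative assumptions, not taken from the slides.

```python
from collections import deque

import numpy as np

def tapped_delay_inputs(signal, delay=2):
    """Turn a scalar series into input vectors [x(t), x(t-1), ..., x(t-delay)]."""
    buffer = deque([0.0] * (delay + 1), maxlen=delay + 1)  # holds the old values
    patterns = []
    for x in signal:
        buffer.appendleft(x)            # newest value in front, oldest drops out
        patterns.append(list(buffer))   # one snapshot of the buffer = one input pattern
    return np.array(patterns)

# Each row can be presented to a standard feedforward network.
print(tapped_delay_inputs([0.1, 0.5, 0.9, 0.3]))
```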

4 Recurrent Networks
Drawbacks of tapped delay lines: the length must be chosen in advance, which leads to a large number of input units, a large number of training patterns, etc.
Alternative: replace the fixed time delays by filters.

5 Recurrent Networks
Partially recurrent networks. [Diagram with input nodes, context nodes, hidden nodes and output nodes.]

6 Recurrent Networks Jordan Network

7 Recurrent Networks
Elman network. [Diagram with input nodes, context nodes, hidden nodes and output nodes; the context nodes hold a copy of the previous hidden-layer activations.]
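A minimal sketch of one Elman-network time step, assuming tanh units and random weights purely for illustration (the function name, layer sizes and activation choice are not from the slides):

```python
import numpy as np

def elman_step(x, context, W_in, W_ctx, W_out):
    """One forward step: the hidden layer sees the input plus the context units,
    and the new hidden activations become the context for the next step."""
    hidden = np.tanh(W_in @ x + W_ctx @ context)
    output = np.tanh(W_out @ hidden)
    return output, hidden

# Example with 3 inputs, 5 hidden/context units and 2 outputs (sizes are assumptions).
rng = np.random.default_rng(0)
W_in, W_ctx, W_out = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), rng.normal(size=(2, 5))
context = np.zeros(5)
for x in np.eye(3):                     # a short input sequence
    y, context = elman_step(x, context, W_in, W_ctx, W_out)
```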

8 Recurrent Networks
Expanded hierarchical Elman network. [Diagram with an input layer, hidden layer, context layer and output units.]

9 Recurrent Networks

10

11 Recurrent Networks Back-Propagation Through Time

12 Reinforcement Learning
Like supervised learning, but with only limited feedback.
Reinforcement-learning problems fall into three classes:
- Class I: the reinforcement signal is always the same for a given input-output pair
- Class II: stochastic environment, a fixed reinforcement probability for each input-output pair
- Class III: reinforcement and input patterns depend on the past history of the network's output

13 Associative Reward-Penalty
- Stochastic output units
- Reinforcement signal
- Target
- Error

14 Associative Reward-Penalty
Learning Rule
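The formula on the slide did not survive extraction. A common statement of the associative reward-penalty (A_R-P) rule for ±1 stochastic output units S_i with inputs ξ_k and reinforcement r = ±1 is given below; this is a hedged reconstruction of the standard rule, not a verbatim copy of the slide.

\[
\Delta w_{ik} =
\begin{cases}
\eta\,\bigl(S_i - \langle S_i\rangle\bigr)\,\xi_k & \text{if } r = +1 \ \text{(reward)},\\
\lambda\eta\,\bigl(-S_i - \langle S_i\rangle\bigr)\,\xi_k & \text{if } r = -1 \ \text{(penalty)},
\end{cases}
\qquad
\langle S_i\rangle = \tanh\Bigl(\textstyle\sum_k w_{ik}\,\xi_k\Bigr).
\]

The target is the actual output when rewarded and its opposite when penalized, and λ ≪ 1 makes penalty updates smaller than reward updates.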

15 Models and Critics
[Diagram involving the environment.]

16 Reinforcement Comparison
[Diagram involving a critic and the environment.]

17 Reinforcement Learning
The reinforcement-learning model:
- The agent receives an input I, which is some indication of the current state s of the environment.
- The agent then chooses an action a.
- The action changes the state of the environment, and the value of this change is communicated to the agent through a scalar reinforcement signal r.
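A sketch of this interaction loop in code; env and agent are hypothetical objects and their method names (observe, step, choose, learn) are placeholders, not an API from the slides.

```python
def run_interaction(env, agent, steps=10):
    """Agent-environment loop: observe, act, receive a scalar reinforcement."""
    for _ in range(steps):
        I = env.observe()         # indication of the current state s
        a = agent.choose(I)       # the agent picks an action
        r, I_next = env.step(a)   # the environment changes state and returns r
        agent.learn(I, a, r, I_next)
```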

18 Reinforcement Learning
Environment: You are in state 65. You have four possible actions.
Agent: I'll take action 2.
Environment: You received a reinforcement of 7 units. You are now in state 15. You have two possible actions.
Agent: I'll take action 1.
Environment: You received a reinforcement of -4 units. You are now in state 12. You have two possible actions.

19 Reinforcement Learning
- The environment is non-deterministic: taking the same action in the same state may result in different next states and different reinforcements.
- The environment is stationary: the probabilities of making state transitions or receiving specific reinforcement signals do not change over time.

20 Reinforcement Learning
Two types of learning:
- Model-free learning
- Model-based learning
Typical application areas: robots, mazes, games.

21 Reinforcement Learning
Paper: A short introduction to Reinforcement Learning (Stephan ten Hagen and Ben Krose)

22 Reinforcement Learning
The environment is a Markov decision process (MDP).
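For reference, a standard way to write the MDP, assumed here rather than spelled out on the slide, is

\[
\langle S, A, P, R\rangle, \qquad
P(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s,\ a_t = a), \qquad
R(s, a) = \mathbb{E}\,[\,r_t \mid s_t = s,\ a_t = a\,],
\]

so the next state and the reinforcement depend only on the current state and action (the Markov property).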

23 Reinforcement Learning
- Optimize the interaction with the environment, i.e. optimize the action selection mechanism.
- Temporal credit assignment problem.
- Policy: the action selection mechanism.
- Value function (see the formula below).
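The value-function formula itself is missing from the transcript; the standard discounted definition, stated here as an assumption about what the slide showed, is

\[
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\Bigl[\,\sum_{t=0}^{\infty} \gamma^{t} r_t \;\Big|\; s_0 = s \Bigr], \qquad 0 \le \gamma < 1,
\]

the expected discounted sum of future reinforcements when starting in state s and following policy π.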

24 Reinforcement Learning
Optimal Value function based on optimal policy:
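Again the formula is missing; the usual definition, together with the Bellman optimality form it satisfies, is (assumed reconstruction):

\[
V^{*}(s) \;=\; \max_{\pi} V^{\pi}(s)
\;=\; \max_{a}\Bigl[\, R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s') \,\Bigr].
\]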

25 Reinforcement Learning
- Policy evaluation: approximate the value function of a given policy.
- Policy iteration: start with an arbitrary policy and improve it.
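In standard dynamic-programming form (assumed, not copied from the slide), policy evaluation repeats the backup below for all states until the values converge, and policy iteration alternates this with the improvement step of the next slide:

\[
V_{k+1}(s) \;\leftarrow\; R\bigl(s, \pi(s)\bigr) + \gamma \sum_{s'} P\bigl(s' \mid s, \pi(s)\bigr)\, V_{k}(s').
\]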

26 Reinforcement Learning
Improve Policy:
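The improvement step (standard form, assumed to match the missing formula) makes the policy greedy with respect to the current value estimates:

\[
\pi'(s) \;=\; \arg\max_{a} \Bigl[\, R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{\pi}(s') \,\Bigr].
\]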

27 Reinforcement Learning
Value Iteration: combine policy evaluation and policy improvement steps:
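A minimal value-iteration sketch, assuming the transition probabilities and expected reinforcements are available as arrays; the function name, array layout and tolerance are illustrative choices, not from the slides.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """P[a] is the (n_states x n_states) transition matrix for action a,
    R[s, a] the expected reinforcement for taking a in s (layout assumed)."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        # One backup per state: evaluation and greedy improvement in a single step.
        Q = np.stack([R[:, a] + gamma * P[a] @ V for a in range(n_actions)], axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)   # state values and the greedy policy
        V = V_new
```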

28 Reinforcement Learning
- Monte Carlo methods: used when the transition probabilities and expected reinforcements are not known.
- Given a policy, several complete episodes of interaction are performed.
- Exploration/exploitation dilemma: extracting information versus optimizing the interaction.
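A sketch of (every-visit) Monte Carlo evaluation of a given policy; run_episode is a hypothetical helper that returns one complete episode as a list of (state, reinforcement) pairs, and all names here are illustrative.

```python
from collections import defaultdict

def mc_evaluate(run_episode, policy, gamma=0.9, n_episodes=1000):
    """Average the discounted returns observed after each state visit."""
    total, count = defaultdict(float), defaultdict(int)
    for _ in range(n_episodes):
        G = 0.0
        # Walk the episode backwards so the return can be accumulated incrementally.
        for state, r in reversed(run_episode(policy)):
            G = r + gamma * G
            total[state] += G
            count[state] += 1
    return {s: total[s] / count[s] for s in total}
```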

29 Reinforcement Learning
Temporal difference (TD) learning:
- Part of the update can already be calculated during the interaction.
- Information from previous interactions is used.
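The slide gives no formula; the standard TD(0) update after a transition from s to s' with reinforcement r is (assumed form)

\[
V(s) \;\leftarrow\; V(s) + \alpha \bigl[\, r + \gamma V(s') - V(s) \,\bigr],
\]

where the bracketed term is the temporal-difference error.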

30 Reinforcement Learning
TD(λ) learning, with trace-decay factor λ: the longer ago a state was visited, the less it is affected by the present update.
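One standard way to make this precise (accumulating eligibility traces; an assumed reconstruction rather than the slide's own notation): keep a trace e(s) per state and update every state in proportion to its trace,

\[
e(s) \;\leftarrow\; \gamma\lambda\, e(s) + \mathbf{1}[s = s_t], \qquad
V(s) \;\leftarrow\; V(s) + \alpha \bigl[\, r_t + \gamma V(s_{t+1}) - V(s_t) \,\bigr]\, e(s) \quad \text{for all } s.
\]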

31 Reinforcement Learning
Q-learning: combine actor and critic:
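The standard Q-learning update, assumed here to correspond to the missing formula, uses the learned action values both for selecting actions (actor) and for evaluating them (critic):

\[
Q(s, a) \;\leftarrow\; Q(s, a) + \alpha \bigl[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,\bigr].
\]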

32 Reinforcement Learning
Use temporal difference learning

33 Reinforcement Learning
Q(λ) learning: Q-learning combined with eligibility traces, analogous to TD(λ).

34 Reinforcement Learning
Feedforward neural networks are used to estimate V(s) and Q(s,a) when the state/action spaces are too large to store the estimates explicitly.
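A minimal sketch of such a function approximator: a one-hidden-layer network whose outputs estimate Q(s, a) for each action and are nudged toward TD targets. The class name, layer sizes, activation and learning rate are illustrative assumptions.

```python
import numpy as np

class QNetwork:
    """Tiny one-hidden-layer approximator for Q(s, a); a sketch only."""

    def __init__(self, n_inputs, n_actions, n_hidden=16, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_inputs))
        self.W2 = rng.normal(scale=0.1, size=(n_actions, n_hidden))
        self.lr = lr

    def q_values(self, s):
        self.h = np.tanh(self.W1 @ s)   # hidden activations, cached for the update
        return self.W2 @ self.h         # one Q-estimate per action

    def td_update(self, s, a, target):
        """Move Q(s, a) toward a TD target such as r + gamma * max_a' Q(s', a')."""
        delta = target - self.q_values(s)[a]          # TD error for the taken action
        grad_hidden = self.W2[a] * (1 - self.h ** 2)  # backprop through tanh
        self.W2[a] += self.lr * delta * self.h
        self.W1 += self.lr * delta * np.outer(grad_hidden, s)
```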

