Published by Avery Stanley. Modified over 11 years ago.
Reinforcement Learning
Prof. Dr. Hans Kleine Büning
RG Knowledge Based Systems, University Paderborn
07 January 2009
Outline
– Motivation
– Applications
– Markov Decision Processes
– Q-learning
– Examples
Reinforcement Learning: The Idea
– A way of programming agents by reward and punishment without specifying how the task is to be achieved
Learning to Ride a Bicycle
[Figure: diagram labelled Environment, state, action]
Learning to Ride a Bicycle
States:
– Angle of handle bars
– Angular velocity of handle bars
– Angle of bicycle to vertical
– Angular velocity of bicycle to vertical
– Acceleration of angle of bicycle to vertical
Learning to Ride a Bicycle
Actions:
– Torque to be applied to the handle bars
– Displacement of the center of mass from the bicycle's plane (in cm)
Learning to Ride a Bicycle
Reward:
– Angle of bicycle to vertical is greater than 12°: reward = -1
– Otherwise: reward = 0
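The reward rule for the bicycle can be written as a one-line function. This is only a sketch: the name `bicycle_reward`, the parameter names, and the use of the absolute value (so that leaning to either side counts) are assumptions, not something the slide states.

```python
def bicycle_reward(tilt_deg: float, limit_deg: float = 12.0) -> float:
    """Return -1 once the bicycle leans past the limit, otherwise 0."""
    # Assumption: the tilt is signed (left/right), so its magnitude is compared.
    return -1.0 if abs(tilt_deg) > limit_deg else 0.0


print(bicycle_reward(3.0))    # still upright enough: no punishment
print(bicycle_reward(-15.0))  # leaning more than 12 degrees: punished
```

The agent never receives a positive reward here; it only learns what to avoid, which is the "punishment" half of the idea stated earlier.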
Learning to Ride a Bicycle
[Figure labelled "Reinforcement Learning"]
Reinforcement Learning: Applications
– Board games
  – The TD-Gammon program, based on reinforcement learning, has become a world-class backgammon player
– Mobile robot control
  – Learning to ride a bicycle
  – Navigation
  – Pole balancing
  – Acrobot
– Sequential process control
  – Elevator dispatching
History of Reinforcement Learning
– Trial-and-error learning in the psychology of animal learning
– Optimal control and dynamic programming
– Temporal-difference methods
Key Features of Reinforcement Learning
– Learner is not told which actions to take
– Trial-and-error search
– Possibility of delayed reward:
  – Sacrifice of short-term gains for greater long-term gains
– Explore/exploit trade-off
– Considers the whole problem of a goal-directed agent interacting with an uncertain environment
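One common way to handle the explore/exploit trade-off is epsilon-greedy action selection. The slides do not name a specific strategy, so the sketch below is an illustration of the trade-off rather than the lecture's method; `epsilon_greedy` and its parameters are hypothetical names.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: random with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit


# With epsilon = 0 the choice is purely greedy, so index 1 (value 0.7) is selected.
print(epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.0))  # 1
```

A larger `epsilon` tries more actions and may find better long-term options; a smaller one exploits what is already known.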
The Agent-Environment Interaction
Agent and environment interact at discrete time steps: $t = 0, 1, 2, \ldots$
– Agent observes state at step $t$: $s_t \in S$
– produces action at step $t$: $a_t \in A$
– gets resulting reward: $r_{t+1} \in \mathbb{R}$
– and resulting next state: $s_{t+1} \in S$
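The interaction described on this slide can be sketched as a simple loop. The function `run_episode`, the `env_step` callback signature, and the toy environment are all assumptions made for illustration; the slides do not prescribe an interface.

```python
def run_episode(env_step, policy, start_state, max_steps=100):
    """Roll out one episode and return the (s_t, a_t, r_{t+1}) triples."""
    trajectory = []
    state = start_state
    for _ in range(max_steps):
        action = policy(state)                              # agent picks a_t from s_t
        next_state, reward, done = env_step(state, action)  # environment returns s_{t+1}, r_{t+1}
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


# Hypothetical toy environment: positions 0..3 on a line, reward 1 on reaching 3.
def toy_step(state, action):
    next_state = min(3, max(0, state + action))
    done = next_state == 3
    return next_state, (1.0 if done else 0.0), done


print(run_episode(toy_step, lambda s: 1, start_state=0))
# [(0, 1, 0.0), (1, 1, 0.0), (2, 1, 1.0)]
```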
The Agent's Goal
– Coarsely, the agent's goal is to get as much reward as it can over the long run
– A policy $\pi$ is a mapping from states to actions: $\pi(s) = a$
– Reinforcement learning methods specify how the agent changes its policy as a result of experience
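For a small, discrete problem a policy can be stored as a plain table from states to actions. The sketch below derives such a table greedily from action-value estimates and shows it changing after an estimate is revised; the state names, action names, and numbers are made up for illustration.

```python
def greedy_policy(q):
    """Map each state to the action with the highest estimated value."""
    return {s: max(actions, key=actions.get) for s, actions in q.items()}


# Hypothetical value estimates for two states and two actions.
q = {"s0": {"left": 0.0, "right": 0.5},
     "s1": {"left": 0.2, "right": 0.1}}
pi = greedy_policy(q)
print(pi["s0"], pi["s1"])  # right left

q["s1"]["right"] = 0.9     # experience raises the estimate for "right" in s1
pi = greedy_policy(q)
print(pi["s1"])            # right
```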
Deterministic Markov Decision Process
Example
Example: Corresponding MDP
Example: Policy
Value of Policy and Rewards
Value of Policy and Agent's Task
Nondeterministic Markov Decision Process
[Figure labels: P = 0.8, P = 0.1]
Example with South-Eastern Wind
Methods
– Model (reward function and transition probabilities) is known
  – Discrete states: Dynamic Programming
  – Continuous states: Value Function Approximation + Dynamic Programming
– Model (reward function or transition probabilities) is unknown
  – Discrete states: Reinforcement Learning (Q-learning, Monte Carlo methods)
  – Continuous states: Value Function Approximation + Reinforcement Learning
Q-learning Algorithm
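The algorithm slides themselves are not preserved in this transcript. As a stand-in, here is a minimal sketch of the standard tabular Q-learning update, $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. The learning rate `alpha`, the discount `gamma`, and the example transitions are assumptions; with `alpha = 1` the rule reduces to the form often used for deterministic problems.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: move Q[s, a] toward r + gamma * max_a' Q[s_next, a']."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


Q = defaultdict(float)         # every entry starts at 0
actions = ["left", "right"]

# Hypothetical transition: from state 2, "right" reaches the goal state 3 with reward 1.
print(q_update(Q, 2, "right", 1.0, 3, actions))  # 0.5

# The value then propagates backward: from state 1, "right" reaches state 2 with reward 0.
print(q_update(Q, 1, "right", 0.0, 2, actions))
```

The second call yields a positive value even though its own reward is 0, because the estimate for state 2 is already nonzero. This is how delayed rewards reach earlier states.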
Example
Example: Q-table Initialization
Example: Episode 1
Example: Q-table
Example: Episode 1
Example: Q-table
Example: Episode 2
Example: Q-table after Convergence
Example: Value Function after Convergence
Example: Optimal Policy
Q-learning
Convergence of Q-learning
Blackjack
Standard rules of blackjack hold
State space:
– element[0] - current value of player's hand (4-21)
– element[1] - value of dealer's face-up card (2-11)
– element[2] - player does not have a usable ace (0/1)
Starting states:
– player has any 2 cards (uniformly distributed), dealer has any 1 card (uniformly distributed)
Actions:
– HIT
– STICK
Rewards:
– -1 for a loss
– 0 for a draw
– +1 for a win
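The blackjack state space is small enough to enumerate. The sketch below encodes the three state elements as a named tuple and reads the rewards as -1 for a loss, 0 for a draw, and +1 for a win; the class name, field names, and outcome labels are assumptions chosen for illustration.

```python
from itertools import product
from typing import NamedTuple


class BlackjackState(NamedTuple):
    player_sum: int   # element[0]: current value of player's hand, 4..21
    dealer_card: int  # element[1]: dealer's face-up card, 2..11
    ace_flag: int     # element[2]: usable-ace flag, 0 or 1


ACTIONS = ("HIT", "STICK")
REWARDS = {"loss": -1, "draw": 0, "win": 1}   # assumed reading of the reward list


def all_states():
    """Enumerate every combination of the three state elements."""
    return [BlackjackState(p, d, a)
            for p, d, a in product(range(4, 22), range(2, 12), (0, 1))]


states = all_states()
print(len(states))  # 18 * 10 * 2 = 360
```

With 360 states and 2 actions, a Q-table for this problem has only 720 entries, which is why tabular methods are practical here.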
Blackjack: Optimal Policy
Reinforcement Learning: Example
States:
– Grid cells
Actions:
– Left
– Up
– Right
– Down
Rewards:
– Bonus: 20
– Food: 1
– Predator: -10
– Empty cell: -0.1
Transition probabilities:
– 0.80: the agent goes where it intends to go
– 0.20: the agent goes to any other adjacent cell or remains where it was (if it is on the border of the grid world, it goes to the other side)
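The transition rule of this grid world can be sketched as a function that returns a probability for each possible next cell. Several details are assumptions, not statements from the slide: the remaining 0.20 is split evenly (0.05 each) over the three other directions and staying put, `Up` decreases the row index, and the grid wraps around at its edges.

```python
import random

REWARDS = {"bonus": 20, "food": 1, "predator": -10, "empty": -0.1}
MOVES = {"Left": (-1, 0), "Up": (0, -1), "Right": (1, 0), "Down": (0, 1)}


def transition_probs(pos, action, width, height):
    """Return {next_cell: probability} for one move on a wrap-around grid."""
    x, y = pos
    outcomes = [(MOVES[action], 0.80)]                       # intended move
    outcomes += [(d, 0.05) for name, d in MOVES.items() if name != action]
    outcomes.append(((0, 0), 0.05))                          # remain in place
    probs = {}
    for (dx, dy), p in outcomes:
        nxt = ((x + dx) % width, (y + dy) % height)          # leaving one edge re-enters opposite
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs


def sample_next(pos, action, width, height, rng=random):
    """Draw one next cell according to transition_probs."""
    probs = transition_probs(pos, action, width, height)
    cells = list(probs)
    return rng.choices(cells, weights=[probs[c] for c in cells])[0]


# From the top-left corner, moving Left wraps to the right-hand column.
print(transition_probs((0, 0), "Left", width=4, height=4))
```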