1
Reinforcement Learning
Michael Roberts. With material from: Reinforcement Learning: An Introduction, Sutton & Barto (1998).
2
What is RL?
Trial & error learning
Structure: without a model / with a model
3
RL vs. Supervised Learning
Evaluative vs. instructional feedback
Role of exploration
On-line performance
4
K-armed Bandit Problem
[Figure: the agent selects among k actions; each action's observed rewards determine its average, e.g. rewards 0, 0, 5, 10, 35 average to 10, and rewards 5, 10, −15, −15, −10 average to −5.]
5
K-armed Bandit Cont.
Exploration strategies: greedy, ε-greedy, softmax
Average reward, incremental formula: Q_{k+1}(a) = Q_k(a) + α [ r_{k+1} − Q_k(a) ], where α = 1 / (k + 1)
Softmax probability of choosing action a: P(a) = e^{Q(a)/τ} / Σ_b e^{Q(b)/τ}
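A minimal runnable sketch in Python of the ε-greedy rule with the incremental average update, plus a softmax chooser; the Gaussian reward model, the ε and τ values, and the function names are illustrative assumptions, not from the slides:

import math
import random

def run_bandit(k=10, steps=1000, epsilon=0.1):
    """Epsilon-greedy play on a k-armed bandit with Gaussian arm rewards."""
    true_means = [random.gauss(0, 1) for _ in range(k)]   # hidden reward means
    Q = [0.0] * k    # estimated average reward per action
    n = [0] * k      # pull count per action
    total = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(k)                   # explore uniformly
        else:
            a = max(range(k), key=lambda i: Q[i])     # exploit greedily
        r = random.gauss(true_means[a], 1)            # sample a reward
        n[a] += 1
        Q[a] += (r - Q[a]) / n[a]   # incremental average; the slide's α = 1/(k+1)
        total += r
    return total / steps

def softmax_action(Q, tau=0.5):
    """Choose action a with probability proportional to exp(Q(a) / tau)."""
    weights = [math.exp(q / tau) for q in Q]
    return random.choices(range(len(Q)), weights=weights)[0]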
6
More General Problems
More than one state
Delayed rewards
7
Markov Decision Process (MDP)
Set of states
Set of actions
Reward function
State transition function
Table or function approximation
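A minimal sketch of these components in Python, assuming the tabular (table-based) representation; the function-approximation case is not shown, and the type and field names are illustrative:

from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = str
Action = str

@dataclass
class MDP:
    states: List[State]
    actions: List[Action]
    # state transition function as a table: P[s][a] = [(prob, next_state), ...]
    P: Dict[State, Dict[Action, List[Tuple[float, State]]]]
    # reward function: R(s, a, s') -> expected immediate reward
    R: Callable[[State, Action, State], float]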
8
Example: Recycling Robot
9
Recycling Robot: Transition Graph
10
Dynamic Programming
11
Backup Diagram
[Figure: backup diagram; branches carry transition probabilities (.25/.25/.25, .4/.6, .7/.3, .5/.5) and leaf rewards (10, 5, 200, 200, −10, 1000).]
12
Dynamic Programming: Optimal Policy
13
Backup for Optimal Policy
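The backup for the optimal policy is the Bellman optimality update (standard form, Sutton & Barto, 1998):

V*(s) = max_a Σ_{s'} P(s' | s, a) [ R(s, a, s') + γ V*(s') ]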
14
Performance Metrics
Eventual convergence to optimality
Speed of convergence to optimality
Regret (Kaelbling, Littman, & Moore, 1996)
15
Gridworld Example
16
Initialize V(s) arbitrarily, e.g. V(s) = 0, for all s ∈ S
Repeat
  Δ ← 0
  For each s ∈ S:
    v ← V(s)
    V(s) ← max_a Σ_{s'} P(s' | s, a) [ R(s, a, s') + γ V(s') ]
    Δ ← max(Δ, |v − V(s)|)
until Δ < θ (a small positive number)
Output a deterministic policy π such that:
π(s) = argmax_a Σ_{s'} P(s' | s, a) [ R(s, a, s') + γ V(s') ]
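A runnable Python sketch of this procedure, reusing the tabular MDP encoding sketched earlier; gamma is the discount factor and theta the stopping threshold (the names are illustrative):

def value_iteration(mdp, gamma=0.9, theta=1e-6):
    """Value iteration over the tabular MDP sketched earlier."""
    V = {s: 0.0 for s in mdp.states}   # initialize V arbitrarily (here: 0)

    def backup(s, a):
        # expected one-step return of taking a in s, then following V
        return sum(p * (mdp.R(s, a, s2) + gamma * V[s2])
                   for p, s2 in mdp.P[s][a])

    while True:
        delta = 0.0
        for s in mdp.states:
            v = V[s]
            V[s] = max(backup(s, a) for a in mdp.P[s])   # Bellman optimality backup
            delta = max(delta, abs(v - V[s]))
        if delta < theta:          # converged: no state moved more than theta
            break

    # greedy (deterministic) policy with respect to the converged values
    policy = {s: max(mdp.P[s], key=lambda a: backup(s, a)) for s in mdp.states}
    return V, policy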
17
Temporal Difference Learning
RL without a model
Raises the issue of temporal credit assignment
Bootstraps like DP
TD(0): V(s) ← V(s) + α [ r + γ V(s') − V(s) ]
18
TD Learning
Again, TD(0): V(s) ← V(s) + α [ r + γ V(s') − V(s) ]
TD(λ): V(x) ← V(x) + α [ r + γ V(s') − V(s) ] e(x), for every state x,
where e is called an eligibility trace; e(x) is incremented when x is visited and decays by γλ at each step
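A tabular Python sketch of TD(λ) with accumulating eligibility traces; with lam = 0 it reduces to the TD(0) rule above. The episode format, step size, and names are illustrative assumptions:

def td_lambda_episode(V, episode, alpha=0.1, gamma=0.9, lam=0.8):
    """Apply TD(lambda) with accumulating traces to one recorded episode.

    V: dict mapping states to value estimates (updated in place).
    episode: list of (state, reward, next_state) transitions under some policy.
    """
    e = {s: 0.0 for s in V}   # eligibility trace per state
    for s, r, s_next in episode:
        delta = r + gamma * V[s_next] - V[s]   # TD error
        e[s] += 1.0                            # bump the trace for the visited state
        for x in V:
            V[x] += alpha * delta * e[x]       # credit flows back along the trace
            e[x] *= gamma * lam                # traces decay between steps
    return V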
19
Backup Diagram for TD(λ)
20
TD-Gammon (Tesauro)
21
Additional Work
POMDPs
Macros
Multi-agent RL
Multiple reward structures