1
Reinforcement Learning
2
Overview
Tabular Methods
Approximate Methods
Deep Reinforcement Learning
3
Tabular Methods
4
Model: mathematical model of the environment's dynamics and reward
Policy: function mapping the agent's states to actions
Value function: expected future rewards from being in a state and/or taking an action when following a particular policy
5
MDP
6
Markov Reward Process
7
Markov Reward Process
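The value of each state in a finite Markov reward process satisfies the Bellman equation V = R + gamma * P V. The sketch below solves it by repeated substitution. The names `mrp_value`, `P`, `R`, and `gamma`, and the two-state chain, are illustrative assumptions and do not come from the slides.

```python
def mrp_value(P, R, gamma, tol=1e-10, max_iters=100_000):
    """Approximate V for a finite MRP by iterating V <- R + gamma * P V.

    P[s][t] is the probability of moving from state s to state t,
    R[s] is the expected immediate reward in state s (assumed layout).
    """
    n = len(R)
    V = [0.0] * n
    for _ in range(max_iters):
        V_new = [
            R[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
            for s in range(n)
        ]
        # Stop once successive estimates agree to within tol.
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical two-state chain: state 0 pays reward 1 and stays put with
# probability 0.5; state 1 is absorbing and pays nothing.
P = [[0.5, 0.5],
     [0.0, 1.0]]
R = [1.0, 0.0]
V = mrp_value(P, R, gamma=0.9)
# V[0] solves V0 = 1 + 0.9 * 0.5 * V0, i.e. V0 = 1 / 0.55
```

For gamma < 1 the update is a contraction, so the loop converges from any starting guess; for small state spaces the same system can also be solved directly as a linear equation.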
10
MDP = MRP + Action
11
MDP + Policy
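Fixing a policy in an MDP yields a Markov reward process: the transition and reward functions are averaged over the actions the policy chooses. A minimal sketch of that reduction follows, assuming the array layouts `P[s][a][t]`, `R[s][a]`, and `pi[s][a]`; the function name `induced_mrp` and the toy numbers are hypothetical.

```python
def induced_mrp(P, R, pi):
    """Return (P_pi, R_pi), the MRP obtained by following policy pi in an MDP.

    P[s][a][t]: probability of reaching state t from state s under action a
    R[s][a]:    expected immediate reward for action a in state s
    pi[s][a]:   probability that the policy picks action a in state s
    """
    n_states = len(P)
    n_actions = len(P[0])
    P_pi = [
        [sum(pi[s][a] * P[s][a][t] for a in range(n_actions))
         for t in range(n_states)]
        for s in range(n_states)
    ]
    R_pi = [
        sum(pi[s][a] * R[s][a] for a in range(n_actions))
        for s in range(n_states)
    ]
    return P_pi, R_pi


# Hypothetical MDP: action 0 stays in the current state, action 1 switches
# to the other state; every action taken in state 1 earns reward 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 1.0]]
# Policy: choose uniformly at random in state 0, always stay in state 1.
pi = [[0.5, 0.5],
      [1.0, 0.0]]
P_pi, R_pi = induced_mrp(P, R, pi)
```

Once the MDP has been reduced this way, any method for evaluating an MRP can be used to evaluate the policy.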
12
Compare
13
How to Control?
14
Policy Search
16
State-Action Value Q
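The state-action value of a policy can be read off from its state values: take action a now, then follow the policy, so that Q(s, a) = R(s, a) + gamma * sum over s' of P(s' | s, a) * V(s'). The sketch below computes this one-step lookahead; the name `q_from_v`, the array layouts, and the example numbers are assumptions made for illustration.

```python
def q_from_v(P, R, V, gamma):
    """One-step lookahead: Q[s][a] = R[s][a] + gamma * sum_t P[s][a][t] * V[t]."""
    n_states = len(R)
    n_actions = len(R[0])
    return [
        [R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))
         for a in range(n_actions)]
        for s in range(n_states)
    ]


# Hypothetical MDP: action 0 stays, action 1 switches state;
# every action taken in state 1 earns reward 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 1.0]]
# State values of the policy "switch in state 0, stay in state 1" at gamma = 0.9.
V = [9.0, 10.0]
Q = q_from_v(P, R, V, gamma=0.9)
# Q[0] is about [8.1, 9.0] and Q[1] is about [10.0, 9.1]
```

Acting greedily with respect to Q (picking the largest entry in each row) is the improvement step that policy iteration relies on.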
17
Policy Iteration
21
In the worst case, policy iteration can take at most |A|^|S| iterations* (the number of deterministic policies)
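Policy iteration alternates between evaluating the current deterministic policy and improving it greedily, and it stops when the policy no longer changes. The following is a minimal sketch under assumed array layouts `P[s][a][t]` and `R[s][a]`; the function name, the tolerances, and the toy MDP are hypothetical and not taken from the slides.

```python
def policy_iteration(P, R, gamma, eval_tol=1e-10, improve_eps=1e-9):
    """Return (policy, V, iterations) for a finite MDP with gamma < 1."""
    n_states, n_actions = len(R), len(R[0])
    policy = [0] * n_states          # start from "always take action 0"
    iterations = 0
    while True:
        iterations += 1
        # Policy evaluation: iterate the Bellman equation for the fixed policy.
        V = [0.0] * n_states
        while True:
            V_new = [
                R[s][policy[s]]
                + gamma * sum(P[s][policy[s]][t] * V[t] for t in range(n_states))
                for s in range(n_states)
            ]
            delta = max(abs(a - b) for a, b in zip(V, V_new))
            V = V_new
            if delta < eval_tol:
                break
        # Policy improvement: switch only to a strictly better action, so that
        # ties and rounding noise cannot make the policy oscillate.
        new_policy = []
        for s in range(n_states):
            q = [
                R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            best = policy[s]
            for a in range(n_actions):
                if q[a] > q[best] + improve_eps:
                    best = a
            new_policy.append(best)
        if new_policy == policy:
            return policy, V, iterations
        policy = new_policy


# Hypothetical MDP: action 0 stays, action 1 switches state;
# every action taken in state 1 earns reward 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 1.0]]
policy, V, iterations = policy_iteration(P, R, gamma=0.9)
```

On this toy problem the loop settles on "switch in state 0, stay in state 1" well within the |A|^|S| = 4 bound, since each improvement step that changes the policy produces a strictly better one.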
22
Value Iteration
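Value iteration skips the explicit policy and repeatedly applies the Bellman optimality backup V(s) <- max over a of [R(s, a) + gamma * sum over s' of P(s' | s, a) * V(s')], extracting a greedy policy at the end. The sketch below assumes the array layouts `P[s][a][t]` and `R[s][a]`; the function name, the stopping tolerance, and the toy MDP are illustrative assumptions.

```python
def value_iteration(P, R, gamma, tol=1e-10):
    """Return (V, policy) for a finite MDP with gamma < 1."""
    n_states, n_actions = len(R), len(R[0])

    def q_values(V):
        # One-step lookahead values for every state-action pair.
        return [
            [R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))
             for a in range(n_actions)]
            for s in range(n_states)
        ]

    V = [0.0] * n_states
    while True:
        V_new = [max(row) for row in q_values(V)]   # Bellman optimality backup
        delta = max(abs(a - b) for a, b in zip(V, V_new))
        V = V_new
        if delta < tol:
            break
    # Greedy policy with respect to the final value estimate.
    Q = q_values(V)
    policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
    return V, policy


# Hypothetical MDP: action 0 stays, action 1 switches state;
# every action taken in state 1 earns reward 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 1.0]]
V, policy = value_iteration(P, R, gamma=0.9)
# V is approximately [9.0, 10.0]; policy is [1, 0]
```

Each backup costs one sweep over all state-action pairs, which is cheaper per step than a full policy evaluation but usually needs more sweeps to converge.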