CHAPTER 9 Hidden Markov Models (cont.) Markov Decision Processes
Markov Models
Conditional Independence
Weather Example
Mini-Forward Algorithm
Example
Stationary Distributions
If we simulate the chain long enough, what happens? Uncertainty accumulates; eventually, we have no idea what the state is!
For most chains, the distribution we end up in is independent of the initial distribution. This is called the stationary distribution of the chain.
Usually, we can only predict a short time out.
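The long-run behavior described above can be sketched with a hypothetical two-state weather chain (the transition matrix here is made up for illustration): repeatedly applying the transition matrix drives any initial distribution to the same stationary distribution.

```python
import numpy as np

# Hypothetical 2-state weather chain: state 0 = sun, state 1 = rain.
# T[i, j] = P(next state = j | current state = i).
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])

def simulate_distribution(T, steps=1000):
    """Push a distribution through the chain for many steps."""
    p = np.array([1.0, 0.0])  # start certain it is sunny
    for _ in range(steps):
        p = p @ T             # one step of the chain
    return p

# For this chain the result is (approximately) [0.75, 0.25],
# and the same limit is reached from any starting distribution.
pi = simulate_distribution(T)
```

Starting instead from `p = [0.0, 1.0]` gives the same limit, which is exactly the "independent of the initial distribution" property on the slide.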
Example: Web Link Analysis
Mini-Viterbi Algorithm
Hidden Markov Models
HMM Applications
Filtering: Forward Algorithm
Filtering Example
MLE: Viterbi Algorithm
Viterbi Properties
Markov Decision Processes
MDP Solutions
Example Optimal Policies
Stationarity
How (Not) to Solve an MDP
The inefficient way: enumerate all policies; for each one, calculate the expected utility (discounted rewards) starting from the start state, e.g. by simulating a bunch of runs; then choose the best policy.
We'll return to a (better) idea like this later.
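A minimal sketch of this brute-force method, using a made-up two-state, two-action MDP (all transitions, rewards, and names below are illustrative assumptions, not from the slides):

```python
import random

# Hypothetical toy MDP for illustration.
# mdp[state][action] -> list of (probability, next_state, reward)
mdp = {
    0: {'a': [(0.8, 0, 1.0), (0.2, 1, 0.0)],
        'b': [(1.0, 1, 0.0)]},
    1: {'a': [(1.0, 1, 2.0)],
        'b': [(0.5, 0, 0.0), (0.5, 1, 2.0)]},
}

def simulate_return(policy, start=0, gamma=0.9, horizon=50):
    """One sampled run: accumulate discounted rewards along a trajectory."""
    state, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        r = random.random()
        for p, next_state, reward in mdp[state][policy[state]]:
            r -= p
            if r <= 0:                    # sample an outcome by its probability
                total += discount * reward
                state = next_state
                break
        discount *= gamma
    return total

def estimate_utility(policy, runs=2000):
    """Average many simulated runs, as the slide's brute-force method does."""
    return sum(simulate_return(policy) for _ in range(runs)) / runs

# Enumerate every deterministic policy and keep the best estimate.
policies = [{0: a0, 1: a1} for a0 in 'ab' for a1 in 'ab']
best = max(policies, key=estimate_utility)
```

The cost is what makes this inefficient: the number of deterministic policies grows as |A|^|S|, and each one needs many simulated runs to estimate its utility.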
Utilities of States
Infinite Utilities?
The Bellman Equation
Example: Bellman Equations
Value Iteration
Policy Iteration
Alternate approach:
Policy evaluation: calculate utilities for a fixed policy
Policy improvement: update the policy based on the resulting utilities
Repeat until convergence. This is policy iteration; it can converge faster under some conditions.
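The evaluate/improve loop can be sketched as follows on a hypothetical two-state, two-action MDP (transition matrices and rewards are made up for illustration). Because the policy is fixed during evaluation, the utilities solve a linear system rather than a nonlinear Bellman equation:

```python
import numpy as np

# Hypothetical MDP: T[a][s, s2] = transition probability under action a,
# R[s] = reward for being in state s. All values are illustrative.
T = {0: np.array([[0.9, 0.1], [0.4, 0.6]]),
     1: np.array([[0.2, 0.8], [0.1, 0.9]])}
R = np.array([1.0, -0.5])
gamma = 0.9
n_states = 2

def evaluate(policy):
    """Policy evaluation: solve U = R + gamma * T_pi @ U for fixed policy."""
    T_pi = np.array([T[policy[s]][s] for s in range(n_states)])
    return np.linalg.solve(np.eye(n_states) - gamma * T_pi, R)

def improve(U):
    """Policy improvement: act greedily with respect to current utilities."""
    return [max(T, key=lambda a: T[a][s] @ U) for s in range(n_states)]

policy = [1, 1]                    # arbitrary initial policy
while True:
    U = evaluate(policy)           # evaluation step
    new_policy = improve(U)        # improvement step
    if new_policy == policy:       # repeat until convergence
        break
    policy = new_policy
```

With finitely many deterministic policies and a strict improvement at every step, this loop must terminate.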
Comparison
In value iteration: every pass (or "backup") updates both the policy (based on current utilities) and the utilities (based on the current policy).
In policy iteration: several passes update utilities; occasional passes update the policy.
Hybrid approaches (asynchronous policy iteration): any sequence of partial updates to either policy entries or utilities will converge if every state is visited infinitely often.
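For contrast with the policy-iteration loop, the value-iteration backup described above can be sketched on a hypothetical two-state, two-action MDP (the transitions and rewards below are made up for illustration); each pass maxes over actions, so it implicitly updates both the utilities and the greedy policy at once:

```python
import numpy as np

# Hypothetical MDP: T[a][s, s2] = transition probability under action a,
# R[s] = reward for being in state s. All values are illustrative.
T = {0: np.array([[0.9, 0.1], [0.4, 0.6]]),
     1: np.array([[0.2, 0.8], [0.1, 0.9]])}
R = np.array([1.0, -0.5])
gamma = 0.9

U = np.zeros(2)
for _ in range(200):
    # One backup: each state takes its best one-step lookahead value.
    U = R + gamma * np.array([max(T[a][s] @ U for a in T)
                              for s in range(2)])

# The greedy policy can be read off the converged utilities.
policy = [max(T, key=lambda a: T[a][s] @ U) for s in range(2)]
```

The contraction factor gamma bounds the error after each backup, so the fixed iteration count here is just a simple convergence criterion for the sketch.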