CHAPTER 9: Hidden Markov Models (cont.), Markov Decision Processes
Markov Models
Conditional Independence
Weather Example
Mini-Forward Algorithm
Example
Stationary Distributions
- If we simulate the chain long enough, what happens? Uncertainty accumulates, and eventually we have no idea what the state is!
- Stationary distributions: for most chains, the distribution we end up in is independent of the initial distribution; it is called the stationary distribution of the chain.
- Usually, we can only predict a short time out.
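A minimal sketch of this convergence, using an assumed two-state sun/rain weather chain (the transition probabilities are made up for illustration): repeatedly pushing an initial distribution through the chain settles onto the same stationary distribution regardless of where we start.

```python
# Minimal sketch: repeatedly apply the mini-forward update
#   P(X_{t+1} = x') = sum_x P(x' | x) P(X_t = x)
# to a two-state weather chain until the distribution stops changing.
# The transition probabilities below are illustrative assumptions.

# transition[x][x'] = P(X_{t+1} = x' | X_t = x)
transition = {
    "sun":  {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def step(dist):
    """One mini-forward step: push the current distribution through the chain."""
    return {x2: sum(dist[x1] * transition[x1][x2] for x1 in dist)
            for x2 in transition["sun"]}

dist = {"sun": 1.0, "rain": 0.0}   # start fully certain it is sunny
for _ in range(100):
    new = step(dist)
    done = all(abs(new[x] - dist[x]) < 1e-9 for x in dist)
    dist = new
    if done:
        break

print(dist)   # converges to roughly {'sun': 0.75, 'rain': 0.25} for these numbers
```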
Example: Web Link Analysis
Mini-Viterbi Algorithm
Hidden Markov Models
HMM Applications
Filtering: Forward Algorithm
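A minimal sketch of one filtering step, assuming a standard rain/umbrella HMM (the transition and emission probabilities below are illustrative, not taken from the slides): predict with the transition model, weight by the evidence likelihood, and normalize.

```python
# Sketch of the forward (filtering) algorithm for a discrete HMM:
# maintain a belief B(x) = P(X_t | e_1:t); for each new evidence e:
#   1. time update (predict):  B'(x') = sum_x P(x' | x) B(x)
#   2. observation update:     B(x')  = P(e | x') * B'(x'), then normalize.
# The rain/umbrella model below is an illustrative assumption.

transition = {             # P(X_{t+1} | X_t)
    "rain":    {"rain": 0.7, "no_rain": 0.3},
    "no_rain": {"rain": 0.3, "no_rain": 0.7},
}
emission = {               # P(E_t | X_t): was an umbrella observed or not?
    "rain":    {"umbrella": 0.9, "no_umbrella": 0.1},
    "no_rain": {"umbrella": 0.2, "no_umbrella": 0.8},
}
states = list(transition)

def forward(belief, evidence):
    """One filtering step: predict, weight by the evidence, normalize."""
    predicted = {x2: sum(belief[x1] * transition[x1][x2] for x1 in states)
                 for x2 in states}
    weighted = {x: emission[x][evidence] * predicted[x] for x in states}
    z = sum(weighted.values())
    return {x: w / z for x, w in weighted.items()}

belief = {"rain": 0.5, "no_rain": 0.5}
for e in ["umbrella", "umbrella", "no_umbrella"]:
    belief = forward(belief, e)
    print(e, belief)
```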
Filtering Example
MLE: Viterbi Algorithm
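A sketch of Viterbi under the same assumptions as the filtering example: the same recursion, but with a max over predecessor states instead of a sum, plus backpointers to recover the most likely state sequence.

```python
# Sketch of the Viterbi algorithm: like the forward algorithm, but with a max
# over predecessor states instead of a sum, plus backpointers to recover the
# most likely state sequence.

def viterbi(evidence_seq, transition, emission, prior):
    states = list(transition)
    # m[t][x] = probability of the best path ending in state x at time t
    m = [{x: prior[x] * emission[x][evidence_seq[0]] for x in states}]
    back = []
    for e in evidence_seq[1:]:
        scores, pointers = {}, {}
        for x2 in states:
            best_x1 = max(states, key=lambda x1: m[-1][x1] * transition[x1][x2])
            pointers[x2] = best_x1
            scores[x2] = m[-1][best_x1] * transition[best_x1][x2] * emission[x2][e]
        m.append(scores)
        back.append(pointers)
    # Follow backpointers from the best final state.
    last = max(states, key=lambda x: m[-1][x])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))
```

For example, viterbi(["umbrella", "umbrella", "no_umbrella"], transition, emission, {"rain": 0.5, "no_rain": 0.5}) runs it on the assumed model from the filtering sketch.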
Viterbi Properties
Markov Decision Processes
MDP Solutions
Example Optimal Policies
Stationarity
How (Not) to Solve an MDP
- The inefficient way: enumerate all policies.
- For each policy, calculate the expected utility (discounted rewards) starting from the start state, e.g. by simulating a bunch of runs (see the sketch after this list).
- Choose the best policy.
- We'll return to a (better) idea like this later.
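A sketch of that brute-force idea on a made-up two-state, two-action MDP (the states, actions, dynamics, and rewards are all assumptions for illustration): enumerate the |A|^|S| policies and estimate each one's expected discounted return by simulation.

```python
import itertools, random

# The inefficient approach from the slide: enumerate every policy of a tiny
# (assumed) MDP, estimate each policy's expected discounted return from the
# start state by simulating runs, and keep the best.  There are exponentially
# many policies, so this only works for toy problems.

states = ["a", "b"]
actions = ["stay", "go"]
gamma = 0.9

def sample_next(s, a):
    """Illustrative toy dynamics: 'go' usually switches state, 'stay' keeps it."""
    if a == "go":
        return ("b" if s == "a" else "a") if random.random() < 0.8 else s
    return s if random.random() < 0.9 else ("b" if s == "a" else "a")

def reward(s, a):
    return 1.0 if (s, a) == ("b", "stay") else 0.0

def simulate(policy, start="a", horizon=50):
    s, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        a = policy[s]
        total += discount * reward(s, a)
        s = sample_next(s, a)
        discount *= gamma
    return total

def estimate(policy, runs=500):
    return sum(simulate(policy) for _ in range(runs)) / runs

# |A|^|S| policies: here just 2^2 = 4 of them.
policies = [dict(zip(states, choice))
            for choice in itertools.product(actions, repeat=len(states))]
best = max(policies, key=estimate)
print(best)
```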
Utilities of States
Infinite Utilities?
The Bellman Equation
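One common way to write it, in the convention where rewards R(s,a,s') are attached to transitions and T(s,a,s') = P(s' | s, a) is the transition model (texts that attach the reward to the state alone only shift where R appears):

```latex
V^*(s) \;=\; \max_{a} \sum_{s'} T(s,a,s')\,\bigl[\,R(s,a,s') + \gamma\, V^*(s')\,\bigr]
```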
Example: Bellman Equations
Value Iteration
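A sketch of value iteration on an assumed two-state MDP (the dynamics and rewards are invented for illustration): apply the Bellman update to every state until the values stop changing, then read off a greedy policy.

```python
# Sketch of value iteration: repeatedly apply the Bellman update
#   V(s) <- max_a sum_{s'} T(s,a,s') [ R(s,a,s') + gamma * V(s') ]
# until the values stop changing.  The tiny two-state MDP is an assumption
# made up for illustration; mdp[s][a] is a list of (prob, s', reward) triples.

gamma = 0.9
mdp = {
    "a": {"stay": [(1.0, "a", 0.0)],
          "go":   [(0.8, "b", 0.0), (0.2, "a", 0.0)]},
    "b": {"stay": [(1.0, "b", 1.0)],
          "go":   [(0.8, "a", 0.0), (0.2, "b", 1.0)]},
}

def q_value(V, s, a):
    return sum(p * (r + gamma * V[s2]) for p, s2, r in mdp[s][a])

V = {s: 0.0 for s in mdp}
while True:
    new_V = {s: max(q_value(V, s, a) for a in mdp[s]) for s in mdp}
    if max(abs(new_V[s] - V[s]) for s in mdp) < 1e-6:
        V = new_V
        break
    V = new_V

# Greedy policy extracted from the converged values.
policy = {s: max(mdp[s], key=lambda a: q_value(V, s, a)) for s in mdp}
print(V, policy)
```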
Policy Iteration
- Alternative approach: alternate two steps.
- Policy evaluation: calculate utilities for a fixed policy.
- Policy improvement: update the policy based on the resulting utilities.
- Repeat until convergence. This is policy iteration (see the sketch after this list).
- It can converge faster than value iteration under some conditions.
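A sketch of policy iteration, reusing the toy mdp, gamma, and q_value helper from the value-iteration sketch above; here policy evaluation is done by iterative sweeps rather than by solving the linear system exactly.

```python
# Sketch of policy iteration on the same toy MDP as above: alternate
#   policy evaluation  - compute V^pi for the current fixed policy, and
#   policy improvement - make the policy greedy with respect to those values,
# until the policy stops changing.

def evaluate(policy, sweeps=100):
    """Approximate V^pi by repeated sweeps of the fixed-policy Bellman update."""
    V = {s: 0.0 for s in mdp}
    for _ in range(sweeps):
        V = {s: q_value(V, s, policy[s]) for s in mdp}
    return V

def policy_iteration():
    policy = {s: next(iter(mdp[s])) for s in mdp}     # arbitrary initial policy
    while True:
        V = evaluate(policy)                          # policy evaluation
        improved = {s: max(mdp[s], key=lambda a: q_value(V, s, a))
                    for s in mdp}                     # policy improvement
        if improved == policy:
            return policy, V
        policy = improved

print(policy_iteration())
```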
Comparison
- In value iteration: every pass (or "backup") updates both the policy (based on current utilities) and the utilities (based on the current policy).
- In policy iteration: several passes update the utilities, with occasional passes to update the policy.
- Hybrid approaches (asynchronous policy iteration): any sequence of partial updates to either policy entries or utilities will converge if every state is visited infinitely often.