1
Stochastic Dynamic Programming with Factored Representations
(Boutilier, Dearden, Goldszmidt 2000)
Presentation by Dafna Shahaf
2
The Problem
Standard MDP algorithms require explicit enumeration of the state space, which runs into the curse of dimensionality.
Need: a compact representation (intuition: STRIPS-style action descriptions).
Need: versions of the standard dynamic programming algorithms that work directly on that representation.
3
A Glimpse of the Future: an example policy tree and value tree (figures).
4
A Glimpse of the Future: Some Experimental Results
5
Roadmap
MDPs - Reminder
Structured Representation for MDPs: Bayesian Nets, Decision Trees
Algorithms for Structured Representation
Experimental Results
Extensions
6
MDPs - Reminder
An MDP consists of states, actions, transition probabilities, and rewards.
Objective: discounted infinite-horizon expected reward.
Stationary policies π (an action to take at each state s).
Value functions: V^k_π is the k-stage-to-go value function for π.
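For reference, a standard way to write the k-stage-to-go value function (my restatement of the textbook definition; γ is the discount factor and R the reward function):

```latex
V^{0}_{\pi}(s) = R(s), \qquad
V^{k}_{\pi}(s) = R(s) + \gamma \sum_{s'} \Pr\!\big(s' \mid s, \pi(s)\big)\, V^{k-1}_{\pi}(s')
```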
7
Roadmap
MDPs - Reminder
Structured Representation for MDPs: Bayesian Nets, Decision Trees
Algorithms for Structured Representation
Experimental Results
Extensions
8
Representing MDPs as Bayesian Networks: Coffee World
State variables:
O: robot is in the office
W: robot is wet
U: robot has umbrella
R: it is raining
HCR: robot has coffee
HCO: owner has coffee
Actions:
Go: switch location
BuyC: buy coffee
DelC: deliver coffee
GetU: get umbrella
The effects of the actions may be noisy, so each effect must be given a probability distribution.
9
Representing Actions: DelC
(Figure: the two-slice network for DelC with tree-structured CPTs; the leaf probabilities shown include 0, 0.3, 0, 0.)
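To make this concrete, here is a minimal sketch (my own illustration, not the paper's code) of how a tree-structured CPT for one post-action variable could be encoded: internal nodes test pre-action variables and leaves hold the probability that the variable is true after the action. The variable names follow the coffee world; the leaf probabilities below are placeholders, not the values in the slide's figure.

```python
from typing import Dict, Union

# Tree-structured CPT for one post-action variable: an internal node is a
# tuple (pre-action variable, subtree if true, subtree if false); a leaf is
# the probability that the post-action variable is true.
CPTree = Union[float, tuple]

def prob_true(tree: CPTree, state: Dict[str, bool]) -> float:
    """Walk the CPT tree using a full assignment of pre-action variables."""
    while isinstance(tree, tuple):
        var, if_true, if_false = tree
        tree = if_true if state[var] else if_false
    return tree

# Illustrative CPT for HCO after DelC: the owner ends up with coffee only if
# the robot is in the office holding coffee (placeholder probabilities).
cpt_HCO_after_DelC = ("HCO", 1.0,
                      ("O", ("HCR", 0.8, 0.0), 0.0))

print(prob_true(cpt_HCO_after_DelC, {"HCO": False, "O": True, "HCR": True}))  # 0.8
```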
10
Representing Actions: Interesting Points
No need to provide a marginal distribution over the pre-action variables.
Markov property: only the previous state is needed.
For now, no synchronic arcs.
The frame problem?
A single network vs. one network per action.
Why decision trees?
11
Representing Reward
The reward is generally determined by only a subset of the features, so it too can be represented compactly as a tree.
12
Policies and Value Functions
(Figure: a policy tree and a value tree. Internal nodes test features such as HCR = T / HCR = F; the policy tree's leaves are labelled with actions, the value tree's leaves with values.)
The optimal choice may depend on only certain variables (given the values of some others).
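A minimal sketch (my own illustration) of how such trees are consulted: a policy tree and a value tree share the same structure, differing only in whether leaves carry actions or values. The specific trees below are hypothetical.

```python
from typing import Dict, Union

# A feature tree: an internal node is (variable, subtree_if_true, subtree_if_false);
# a leaf is an action label (policy tree) or a numeric value (value tree).
Tree = Union[str, float, tuple]

def lookup(tree: Tree, state: Dict[str, bool]):
    """Follow the branch selected by the current state down to a leaf."""
    while isinstance(tree, tuple):
        var, if_true, if_false = tree
        tree = if_true if state[var] else if_false
    return tree

# Hypothetical coffee-world trees (structure and numbers are illustrative only).
policy_tree = ("HCR", ("O", "DelC", "Go"), "BuyC")    # leaves: actions
value_tree  = ("HCO", 10.0, ("HCR", 6.0, 2.5))        # leaves: values

state = {"HCR": True, "O": False, "HCO": False}
print(lookup(policy_tree, state))   # -> Go
print(lookup(value_tree, state))    # -> 6.0
```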
13
Roadmap
MDPs - Reminder
Structured Representation for MDPs: Bayesian Nets, Decision Trees
Algorithms for Structured Representation
Experimental Results
Extensions
14
Value Iteration - Reminder
Bellman backup. Q-function: the value of performing action a in state s, given the value function v.
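Spelled out (the standard value-iteration equations that the structured algorithm will mirror tree by tree; γ is the discount factor):

```latex
Q^{k+1}_{a}(s) = R(s) + \gamma \sum_{s'} \Pr(s' \mid s, a)\, V^{k}(s'),
\qquad
V^{k+1}(s) = \max_{a} Q^{k+1}_{a}(s)
```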
15
Structured Value Iteration - Overview
Input: Tree(R). Output: Tree(V).
1. Set Tree(V^0) = Tree(R).
2. Repeat
(a) Compute Tree(Q_a^k) = Regress(Tree(V^{k-1}), a) for each action a.
(b) Merge (via maximization) the trees Tree(Q_a^k) to obtain Tree(V^k).
Until the termination criterion holds. Return Tree(V^k).
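A minimal sketch of this outer loop (my own illustration; the helper names regress, merge_max, and converged are assumptions, with fuller sketches of the first two appearing after the later slides):

```python
def structured_value_iteration(reward_tree, actions, regress, merge_max, converged):
    """Outer loop of structured value iteration over decision trees.

    reward_tree : Tree(R), the tree-structured reward function
    actions     : the available actions
    regress     : regress(value_tree, a) -> Tree(Q_a), as on the following slides
    merge_max   : merges the per-action Q-trees by maximization into Tree(V)
    converged   : termination test comparing successive value trees
    """
    v_tree = reward_tree                                      # Tree(V^0) = Tree(R)
    while True:
        q_trees = {a: regress(v_tree, a) for a in actions}    # step 2(a)
        new_v_tree = merge_max(q_trees)                       # step 2(b)
        if converged(v_tree, new_v_tree):
            return new_v_tree
        v_tree = new_v_tree
```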
16
Example World
17
Step 2a: Calculating Q-Functions
1. Compute the expected future value.
2. Discount the future value.
3. Add the immediate reward.
How to use the structure of the trees? Tree(Q_a) needs to distinguish only conditions under which action a makes a branch of Tree(V) true with different odds.
18
Calculating Tree(Q_a^1):
Tree(V^0), PTree(Q_a^1): finding the conditions under which action a leads to branches of V^0 with distinct expected value.
FVTree(Q_a^1): the undiscounted expected future value of performing action a with one stage to go.
Tree(Q_a^1): discounting FVTree (by 0.9) and adding the immediate reward function.
Example leaf Z: expected future value = 1·10 + 0·0 = 10.
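The computation at leaf Z spelled out (10 and 0 are the leaf values of Tree(V^0) on that branch; R_Z stands for the immediate reward on the branch, whose value comes from the example's reward tree):

```latex
\mathrm{FV}_Z = 1 \cdot 10 + 0 \cdot 0 = 10,
\qquad
Q^{1}_{a}(Z) = R_Z + 0.9 \cdot \mathrm{FV}_Z = R_Z + 9
```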
19
An Alternative View:
20
(A more complicated example.) Figure panels: Tree(V^1), partial PTree(Q_a), unsimplified PTree(Q_a), FVTree(Q_a), Tree(Q_a).
21
The Algorithm: Regress
Input: Tree(V), action a. Output: Tree(Q_a).
1. PTree(Q_a) = PRegress(Tree(V), a) (simplified).
22
The Algorithm: Regress
Input: Tree(V), action a. Output: Tree(Q_a).
1. PTree(Q_a) = PRegress(Tree(V), a) (simplified).
2. Construct FVTree(Q_a): for each branch b of PTree, with leaf node l(b):
(a) Pr_b = the product of the individual distributions at l(b).
(b) v_b = Σ_{b' ∈ Tree(V)} Pr_b(b') · V_{b'} (the expected value of Tree(V) under Pr_b).
(c) Re-label leaf l(b) with v_b.
23
The Algorithm: Regress
Input: Tree(V), action a. Output: Tree(Q_a).
1. PTree(Q_a) = PRegress(Tree(V), a) (simplified).
2. Construct FVTree(Q_a): for each branch b of PTree, with leaf node l(b):
(a) Pr_b = the product of the individual distributions at l(b).
(b) v_b = Σ_{b' ∈ Tree(V)} Pr_b(b') · V_{b'} (the expected value of Tree(V) under Pr_b).
(c) Re-label leaf l(b) with v_b.
3. Discount FVTree(Q_a) by γ, append Tree(R).
4. Return FVTree(Q_a).
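A minimal sketch of Regress in this spirit (my own illustration, not the paper's code). It assumes the simple tuple tree representation used in the earlier sketches, that PTree leaves are dicts mapping each relevant post-action variable to Pr(var = true), and that a pregress function (sketched after the PRegress slides) and a cpts table for the action's CPT trees are supplied; tree simplification is omitted.

```python
def map_leaves(tree, f):
    """Apply f to every leaf of a (var, true_subtree, false_subtree) tree."""
    if not isinstance(tree, tuple):
        return f(tree)
    var, t, f_ = tree
    return (var, map_leaves(t, f), map_leaves(f_, f))

def expectation(value_tree, marginals):
    """Expected value of Tree(V) under independent marginals {var: Pr(var=true)}."""
    if not isinstance(value_tree, tuple):
        return value_tree
    var, t, f = value_tree
    p = marginals[var]
    ev = p * expectation(t, marginals) if p > 0 else 0.0
    ev += (1 - p) * expectation(f, marginals) if p < 1 else 0.0
    return ev

def append_tree(t1, t2, combine):
    """Append t2 below each leaf of t1, combining leaf values (unsimplified)."""
    if not isinstance(t1, tuple):
        if not isinstance(t2, tuple):
            return combine(t1, t2)
        var, t, f = t2
        return (var, append_tree(t1, t, combine), append_tree(t1, f, combine))
    var, t, f = t1
    return (var, append_tree(t, t2, combine), append_tree(f, t2, combine))

def regress(value_tree, action, gamma, reward_tree, pregress, cpts):
    ptree = pregress(value_tree, cpts[action])                        # step 1
    fvtree = map_leaves(ptree, lambda m: expectation(value_tree, m))  # step 2
    fvtree = map_leaves(fvtree, lambda v: gamma * v)                  # step 3: discount
    return append_tree(fvtree, reward_tree, lambda v, r: v + r)       # step 3: add Tree(R)
```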
24
The Algorithm: PRegress
Input: Tree(V), action a. Output: PTree(Q_a).
1. If Tree(V) is a single node, return the empty tree.
2. X = the variable at the root of Tree(V); T_X = the tree for CPT_a(X) (label its leaves with the distribution over X).
25
The Algorithm: PRegress
Input: Tree(V), action a. Output: PTree(Q_a).
1. If Tree(V) is a single node, return the empty tree.
2. X = the variable at the root of Tree(V); T_X = the tree for CPT_a(X) (label its leaves with the distribution over X).
3. Tree(V_{X=t}), Tree(V_{X=f}) = the subtrees of Tree(V) for X = t and X = f.
4. PTree_{X=t}, PTree_{X=f} = the result of calling PRegress on each subtree.
26
The Algorithm: PRegress
Input: Tree(V), action a. Output: PTree(Q_a).
1. If Tree(V) is a single node, return the empty tree.
2. X = the variable at the root of Tree(V); T_X = the tree for CPT_a(X) (label its leaves with the distribution over X).
3. Tree(V_{X=t}), Tree(V_{X=f}) = the subtrees of Tree(V) for X = t and X = f.
4. PTree_{X=t}, PTree_{X=f} = the result of calling PRegress on each subtree.
5. For each leaf l of T_X, append PTree_{X=t}, PTree_{X=f}, or both (according to the distribution over X at l; use union to combine the leaf labels).
6. Return the resulting tree.
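A minimal sketch of PRegress under the same simplified representation (my own illustration; the names and the cpts structure are assumptions, and tree simplification is omitted). CPT trees test pre-action variables with leaves holding Pr(X = true); PTree leaves collect marginals for the post-action variables that matter.

```python
def pregress(value_tree, cpts):
    """Sketch of PRegress for one action.

    value_tree : leaf value, or (var, subtree_if_true, subtree_if_false)
    cpts       : cpts[var] = CPT tree for var under the action, leaves = Pr(var=true)
    Returns a tree over pre-action variables whose leaves are dicts
    {post-action var: Pr(var = true)}.
    """
    if not isinstance(value_tree, tuple):
        return {}                                  # step 1: single node -> empty label
    x, v_true, v_false = value_tree                # step 2: X = root of Tree(V)
    pt_true = pregress(v_true, cpts)               # step 4: recurse on the
    pt_false = pregress(v_false, cpts)             #         X=t and X=f subtrees

    def attach(cpt_node):                          # step 5: walk the CPT tree for X
        if isinstance(cpt_node, tuple):
            var, t, f = cpt_node
            return (var, attach(t), attach(f))
        p = cpt_node                               # Pr(X = true) at this CPT leaf
        if p == 1.0:
            sub = pt_true                          # only the X=t subtree is reachable
        elif p == 0.0:
            sub = pt_false                         # only the X=f subtree is reachable
        else:
            sub = union_merge(pt_true, pt_false)   # both outcomes possible
        return add_marginal(sub, x, p)
    return attach(cpts[x])                         # step 6

def union_merge(t1, t2):
    """Append t2 below each leaf of t1, unioning the leaf labels (unsimplified)."""
    if not isinstance(t1, tuple):
        if not isinstance(t2, tuple):
            return {**t1, **t2}
        var, t, f = t2
        return (var, union_merge(t1, t), union_merge(t1, f))
    var, t, f = t1
    return (var, union_merge(t, t2), union_merge(f, t2))

def add_marginal(tree, x, p):
    """Record Pr(X = true) = p in every leaf label."""
    if not isinstance(tree, tuple):
        return {**tree, x: p}
    var, t, f = tree
    return (var, add_marginal(t, x, p), add_marginal(f, x, p))
```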
27
Step 2b: Maximization - merge the trees Tree(Q_a) via maximization to obtain Tree(V). Value iteration step complete.
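A minimal sketch of this maximization step (my own illustration; tree simplification and recording the maximizing action for policy extraction are omitted):

```python
from functools import reduce

def max_merge(t1, t2):
    """Combine two Q-trees into one whose leaves hold the larger value (unsimplified)."""
    if not isinstance(t1, tuple):
        if not isinstance(t2, tuple):
            return max(t1, t2)
        var, t, f = t2
        return (var, max_merge(t1, t), max_merge(t1, f))
    var, t, f = t1
    return (var, max_merge(t, t2), max_merge(f, t2))

def merge_max(q_trees):
    """Step 2b: maximize over the per-action Q-trees to obtain Tree(V)."""
    return reduce(max_merge, q_trees.values())
```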
28
Roadmap
MDPs - Reminder
Structured Representation for MDPs: Bayesian Nets, Decision Trees
Algorithms for Structured Representation
Experimental Results
Extensions
29
Experimental Results (figures): worst case and best case for the structured algorithm.
30
Roadmap
MDPs - Reminder
Structured Representation for MDPs: Bayesian Nets, Decision Trees
Algorithms for Structured Representation
Experimental Results
Extensions
31
Extensions: synchronic edges, POMDPs, rewards, approximation.
32
Questions?
33
Backup slides Here be dragons.
34
Regression through a Policy
35
Improving Policies: Example
36
Maximization Step, Improved Policy