Max-norm Projections for Factored MDPs
Carlos Guestrin, Daphne Koller (Stanford University); Ronald Parr (Duke University)
Motivation
MDPs plan over atomic system states; a policy specifies an action at every state; polytime algorithms exist for finding the optimal policy. But the number of states is exponential in the number of state variables.
Motivation: BNs meet MDPs
Real-world MDPs have hundreds of variables and googols of states. Can we exploit problem-specific structure, both for representation and for planning? Goal: merge BNs and MDPs for efficient computation.
Factored MDPs [Boutilier et al.]
The transition model is a dynamic Bayesian network over state variables X, Y, Z at time t and X', Y', Z' at time t+1 (figure). The total reward adds sub-rewards, R = R1 + R2, each depending on only a few variables; actions only change small parts of the model. The value function gives the value of the policy starting at state s.
Exploiting Structure
Model structure may imply a structured value function. Structured value-function approach [Boutilier et al. '95]: collapse the value function using a tree; works well only when many states have the same value.
Decomposable Value Functions
[Bellman et al. '63] [Tsitsiklis & Van Roy '96] [Koller & Parr '99, '00]
Approximate the value function by a linear combination of restricted-domain basis functions: each h_i depends only on the status of some small part(s) of a complex system, e.g. the status of a machine or the inventory of a store. Stacking the basis functions as columns gives a 2^n x k matrix A, with A[s, i] = h_i(s) and k basis functions over 2^n states, so the approximation is V ≈ Aw.
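The matrix view above can be sketched in a few lines. This is a hypothetical 3-machine example with made-up basis functions and weights, only meant to show the shapes involved (A is 2^n x k, w is k x 1):

```python
import itertools
import numpy as np

# Hypothetical 3-machine system; a state is a tuple of machine statuses (0/1).
n = 3
states = list(itertools.product([0, 1], repeat=n))  # all 2^n atomic states

# Restricted-domain basis functions: each h_i looks at one machine only.
def h(i):
    return lambda s: float(s[i])  # status of machine i

basis = [lambda s: 1.0] + [h(i) for i in range(n)]  # constant + per-machine features

# A is 2^n x k, but each column is defined by a small-scope function.
A = np.array([[b(s) for b in basis] for s in states])

w = np.array([1.0, 0.5, 0.5, 0.5])  # illustrative weights
V = A @ w                           # factored value function Aw
```

The point of the factored form is that A is never materialized in the actual algorithm; all operations work on the small-scope h_i directly.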
Our Approach
Embed structure into the value-function space a priori: project into a structured vector space of factored value functions (linear combinations of structured features); efficiently find the closest approximation to the "true" value.
Policy Iteration
Guess V; repeat: pi = greedy(V); V = value of acting on pi. Value determination solves V_pi = R_pi + gamma * P_pi * V_pi (value = reward + discounted expected value), where P_pi is the 2^n x 2^n transition matrix under pi and R_pi, V_pi are 2^n x 1 vectors.
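The value-determination step above is a linear system. A minimal sketch on a toy 2-state MDP (the numbers are illustrative, not from the talk):

```python
import numpy as np

# Exact value determination for a fixed policy pi on a toy 2-state MDP.
gamma = 0.9
P_pi = np.array([[0.8, 0.2],
                 [0.3, 0.7]])   # transition matrix under pi (2^n x 2^n in general)
R_pi = np.array([1.0, 0.0])     # reward under pi

# V_pi = R_pi + gamma * P_pi V_pi  =>  (I - gamma * P_pi) V_pi = R_pi
V_pi = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)
```

Solving this exactly costs O((2^n)^3) in the atomic representation, which is exactly the cost the factored approach avoids.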
Approximate Policy Iteration
Guess w0; repeat: pi_t = greedy(A w_t); A w_{t+1} ≈ value of acting on pi_t. The second step is approximate value determination.
Approximate Value Determination
Need a projection of the value function into the space spanned by the basis functions: w* = argmin_w ||Aw − V_pi||_d (an L_d projection). Previous work uses L2 and weighted-L2 projections. [Koller & Parr '99, '00]
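For contrast with the max-norm projection introduced next, the L2 projection used in prior work is just least squares. A toy sketch with made-up numbers:

```python
import numpy as np

# L2 projection of a target value vector v onto the span of the basis columns,
# as in the earlier (L2-based) approaches; toy basis and target.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
v = np.array([0.0, 1.0, 3.0])

w, *_ = np.linalg.lstsq(A, v, rcond=None)
residual = A @ w - v   # orthogonal to the basis columns at the optimum
```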
Analysis of Approx. PI
Theorem: the loss of the policies produced by approximate policy iteration is bounded in terms of the worst-case, i.e. max-norm, projection error ||A w_t − V_{pi_t}||_∞. Conclusion: we should be doing projections in max-norm!
Approximate PI: Revisited
Guess w0; repeat: pi_t = greedy(A w_t); A w_{t+1} ≈ value of acting on pi_t. Contributions: analysis motivating projections in max-norm; an efficient algorithm for max-norm projection.
Efficient Max-norm Projection (outline)
Computing the max-norm for fixed weights; cost networks; efficient max-norm projection.
Max over Large State Spaces
For fixed weights w, computing the max-norm means maximizing over all states: ||Aw − v||_∞ = max_s |Σ_i w_i h_i(s) − v(s)|. However, if the basis and target are functions of only a few variables each, we can do it efficiently: cost networks can maximize over large state spaces when the function is factored.
Cost Networks [Bertele & Brioschi '72]
Can use variable elimination to maximize over the state space, just as in Bayes nets: for a function over variables A, B, C, D factored into small terms (figure), eliminating one variable at a time needs only 16 instead of 64 operations. Maximization is exponential only in the size of the largest factor.
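The elimination scheme above can be sketched concretely. The factor tables below are made-up; the structure (a chain f1(a,b) + f2(b,c) + f3(c,d), eliminated one variable at a time and checked against brute force) is the point:

```python
import itertools

# Variable elimination with max over a factored function
# f(a,b,c,d) = f1(a,b) + f2(b,c) + f3(c,d)  (illustrative tables).
f1 = {(a, b): 2 * a - b for a in (0, 1) for b in (0, 1)}
f2 = {(b, c): b * c + c for b in (0, 1) for c in (0, 1)}
f3 = {(c, d): d - c     for c in (0, 1) for d in (0, 1)}

# Eliminate d: g1(c) = max_d f3(c,d)
g1 = {c: max(f3[(c, d)] for d in (0, 1)) for c in (0, 1)}
# Eliminate c: g2(b) = max_c [f2(b,c) + g1(c)]
g2 = {b: max(f2[(b, c)] + g1[c] for c in (0, 1)) for b in (0, 1)}
# Eliminate b and a: best = max_{a,b} [f1(a,b) + g2(b)]
best = max(f1[(a, b)] + g2[b] for a in (0, 1) for b in (0, 1))

# Brute force over all 16 assignments, for comparison.
brute = max(f1[(a, b)] + f2[(b, c)] + f3[(c, d)]
            for a, b, c, d in itertools.product((0, 1), repeat=4))
```

Each intermediate table g_i ranges over only the variables still coupled to the eliminated one, which is why the cost is exponential only in the largest factor.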
Max-norm Projection
Algorithm for finding w* = argmin_w ||Aw − v||_∞. Solve by linear programming [Cheney '82]: minimize φ subject to φ ≥ (Aw)(s) − v(s) and φ ≥ v(s) − (Aw)(s) for all states s.
Representing the Constraints
The explicit representation has one constraint pair per state, which is exponential (|S| = 2^n). If the basis and target are factored, we can instead use cost networks to represent the constraints compactly.
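For intuition, here is the Cheney-style LP written out explicitly on a toy problem with scipy's `linprog` (an assumed tool choice; the talk uses CPLEX). Note this sketch enumerates all states, i.e. it is the exponential formulation that the cost-network construction avoids:

```python
import numpy as np
from scipy.optimize import linprog

# Max-norm projection min_w ||Aw - v||_inf as an LP:
# minimize phi s.t. -phi <= (Aw)(s) - v(s) <= phi for every state s.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])   # toy basis matrix (2^n x k in general)
v = np.array([0.0, 2.0, 2.0])

m, k = A.shape
c = np.zeros(k + 1)
c[-1] = 1.0                                    # objective: minimize phi
A_ub = np.block([[ A, -np.ones((m, 1))],       #  Aw - phi <= v
                 [-A, -np.ones((m, 1))]])      # -Aw - phi <= -v
b_ub = np.concatenate([v, -v])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * k + [(0, None)])
w, phi = res.x[:k], res.x[-1]
```

At the optimum, phi equals the max-norm residual ||Aw − v||_∞.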
Approximate Policy Iteration: Policy Improvement
Guess w0; repeat: pi_t = greedy(A w_t); A w_{t+1} ≈ value of acting on pi_t. How do we represent the policy? How do we update it efficiently?
What about the Policy?
Contextual action model: a default DBN plus a DBN per action (Action 1, Action 2, ...), where each action changes only a small part of the model (figures). Theorem [Koller & Parr '00]: factored value functions and a contextual action model yield a compact policy description; the greedy policy forms a decision list: if context 1 then action 1, else if context 2 then action 2, else if context 3 then action 1, and so on.
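A decision-list policy of the kind described above can be sketched as rules tried in order; the contexts and action names below are hypothetical:

```python
# Decision-list policy: each rule is (context, action); the first rule whose
# context matches the state fires. A context is a partial assignment over
# state variables (hypothetical example).
def matches(context, state):
    return all(state.get(var) == val for var, val in context.items())

def decision_list_policy(rules, default, state):
    for context, action in rules:
        if matches(context, state):
            return action
    return default

rules = [({"X": 1}, "action1"),
         ({"Y": 1, "Z": 0}, "action2")]
a = decision_list_policy(rules, "action1", {"X": 0, "Y": 1, "Z": 0})
```

Because each context mentions only a few variables, the list stays compact even though it defines an action for all 2^n states.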
Factored Policy Iteration: Summary
Guess V; repeat: pi = greedy(V); V = value of acting on pi. Structure induces a decision-list policy; the key operations are isomorphic to Bayesian-network inference. Time per iteration is reduced from O((2^n)^3) to O(poly(k, n, C)), where C is the size of the largest factor in the cost network (a function of the structure), k is the number of basis functions (k << 2^n), and poly is the complexity of the LP solver, in practice close to linear.
Network Management Problem
Computers are connected in a network; each computer can fail with some probability; if a computer fails, it increases the probability that its neighbors will fail; at every time step, the sys-admin must decide which computer to fix. Topologies studied (figures): bidirectional ring, ring and star, star, three legs, ring of rings, with a server where applicable.
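The failure dynamics described above can be sketched as one simulation step on a unidirectional ring; the failure probabilities p and q are hypothetical parameters, not values from the talk:

```python
import random

# One step of a SysAdmin-style process on a unidirectional ring:
# a working machine fails with base probability p, plus extra probability q
# if its upstream neighbor is down; the fixed machine is restored to working.
def step(status, fix, p=0.1, q=0.3, rng=random):
    n = len(status)
    new = []
    for i in range(n):
        if i == fix:
            new.append(1)                  # sys-admin repairs machine i
        elif status[i] == 0:
            new.append(0)                  # a down machine stays down unless fixed
        else:
            fail_prob = p + (q if status[(i - 1) % n] == 0 else 0.0)
            new.append(0 if rng.random() < fail_prob else 1)
    return new

s = step([1, 0, 1, 1], fix=1, p=0.0, q=0.0)
```

Each machine's next status depends only on itself, one neighbor, and the action, which is exactly the small-scope structure the factored DBN model exploits.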
Comparing Projections: L2 vs. L∞
Plots compare L2 and L∞ projections with single-variable and pair bases (figure). The max-norm projection is also much more efficient: a single cost network rather than many BN inferences, and use of a very efficient LP package (CPLEX).
Results on Larger Problems: Running Time
Runs in time O(n^3), not O((2^n)^3).
Results on Larger Problems: Error Bounds
The error remains bounded as the problem size grows.
Conclusions
Max-norm projection directly minimizes the error bounds; the closed-form projection operation provides an exponential complexity reduction; exploiting structure reduces computation costs and lets us solve very large MDPs efficiently.
Future Work
POMDPs (IJCAI '01 workshop paper); additional structure: factored actions, relational representations, context-specific independence (CSI); multi-agent systems; linear-programming solution for MDPs.