Solving Factored POMDPs with Linear Value Functions
Carlos Guestrin, Daphne Koller (Stanford University), Ronald Parr (Duke University)
Policy Iteration for POMDPs [Hansen '98]
[Figure: a finite-state controller with nodes 1:a1 and 2:a2 connected by observation edges O1 and O2, next to the value vectors alpha_1 and alpha_2 plotted over belief space b; the DP step produces a new vector alpha_3]
The loop has three stages: Value Determination, DP Step, Policy Improvement.
Policy Iteration for POMDPs [Hansen '98] (continued)
[Figure: after policy improvement, the controller gains a new node 3:a1, again with observation edges O1 and O2, and the value vectors alpha_1, alpha_2, alpha_3 are shown over belief space b]
Same loop: Value Determination, DP Step, Policy Improvement.
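A rough skeleton of this loop, in my own words rather than from the talk (the three callables are placeholders for the stages named on the slide, not a working solver):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Node:
    """One node of the finite-state controller: an action plus, for each
    observation, the index of the successor node."""
    action: int
    successors: Dict[int, int]

def policy_iteration(controller: List[Node],
                     value_determination: Callable[[List[Node]], list],
                     dp_step: Callable[[list], list],
                     policy_improvement: Callable[[List[Node], list], Tuple[List[Node], bool]],
                     max_iters: int = 50) -> List[Node]:
    """Hansen-style policy iteration loop; the three helpers stand in for the
    stages named on the slide and are not implemented here."""
    for _ in range(max_iters):
        values = value_determination(controller)       # one value vector per controller node
        new_vectors = dp_step(values)                   # e.g. incremental pruning
        controller, changed = policy_improvement(controller, new_vectors)
        if not changed:                                 # no node added or replaced: converged
            break
    return controller
```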
POMDP Complexity
POMDPs have multiple sources of complexity:
- The number of vectors can grow exponentially:
  - Avoid generating unneeded facets: Witness, IP (incremental pruning), etc.;
  - Approximate by discarding similar vectors, etc.
- Each vector has a large representation:
  - One dimension for each state: 2^n dimensions for n state variables;
  - Can try structured representations of the vectors [Boutilier & Poole '96], [Hansen & Feng '00].
Factored POMDPs
[Figure: dynamic Bayesian network over state variables X, Y, Z at time t and X', Y', Z' at time t+1, with action variables A_X, A_Z, observation variables O_X', O_Z', and sub-reward nodes R1, R2]
- Total reward is a sum of sub-rewards: R = R1 + R2;
- Only a subset of the variables is observed;
- Actions only change small parts of the model.
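One standard way to write such a factored model (the exact parent sets are whatever the DBN in the figure specifies; the notation here is mine, not taken from the slide):

\[
P(x', o \mid x, a) \;=\; \prod_i P\bigl(x'_i \mid \mathrm{pa}(x'_i), a\bigr)\, \prod_j P\bigl(o_j \mid \mathrm{pa}(o_j), a\bigr),
\qquad
R(x) \;=\; \sum_k R_k\bigl(x_{C_k}\bigr),
\]

where each sub-reward R_k depends only on a small subset C_k of the state variables.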
Exploiting Structure
Structure in the model might imply structure in the vectors.
Structured vectors approach [Boutilier & Poole '96], [Hansen & Feng '00]:
- Within a vector, many dimensions may be equivalent;
- Collapse them using a tree;
- Works well if the DBN structure leads to a clean decomposition;
- Doesn't always hold up, even in MDPs.
[Figure: value vectors alpha_1 and alpha_2 over belief space b = P(X,Y,Z), and a tree-structured vector splitting on X and Z]
Our Approach
- Not all structured POMDPs have structured alpha-vectors;
- Embed structure into the value function space a priori:
  - Project alpha-vectors into a structured vector space;
  - Efficiently find the closest approximation to the "true" alpha-vectors;
- Key representation: a linear combination of structured features.
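A hedged sketch of this representation (notation mine, but consistent with the slides): each alpha-vector is restricted to the span of a fixed set of basis functions h_1, ..., h_k, each defined over a small subset D_j of the state variables:

\[
\alpha(x) \;\approx\; \hat{\alpha}(x) \;=\; \sum_{j=1}^{k} w_j\, h_j\bigl(x_{D_j}\bigr),
\]

so an alpha-vector is stored as k weights rather than one value per state.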
Exploiting Structure in PI and Incremental Pruning
[Figure: the policy iteration loop (Value Determination, DP Step, Policy Improvement) over the controller and value vectors, with the basic vector operations annotated]
Basic operations to factorize: Best alpha, Pointwise Dominates, Value Determination; this slide highlights Best alpha.
Factored Best alpha
[Figure: value vectors alpha_1, alpha_2, alpha_3 over belief space, with the best vector at belief state b highlighted]
- Want to find the vector with the highest value for a given belief state;
- The factorization decomposes the dot product.
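A hedged reconstruction of the decomposition (the basis functions and their domains are as defined above): the best vector at belief state b is

\[
\alpha^{*} \;=\; \arg\max_{i}\; \hat{\alpha}_i \cdot b,
\qquad
\hat{\alpha}_i \cdot b \;=\; \sum_j w^{(i)}_j \sum_{x} h_j\bigl(x_{D_j}\bigr)\, b(x)
\;=\; \sum_j w^{(i)}_j \sum_{x_{D_j}} h_j\bigl(x_{D_j}\bigr)\, b\bigl(x_{D_j}\bigr),
\]

so each summand only needs the marginal b(x_{D_j}) over the small domain of h_j.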
Factored Best alpha: Example
- Assume 4 state variables and 3 basis functions;
- Decomposition of the dot product, as in the general form above;
- The summands depend only on marginal probabilities.
Factored Best alpha: Properties
- Avoids the exponential blowup in the belief state representation;
- Exponential only in the size of the basis function domains;
- Suggests a belief state decomposition:
  - Factored Best alpha only requires the marginals;
  - Useful at execution time;
- Monitoring the belief state:
  - Can represent the belief state as a product of marginals [Boyen & Koller '98];
  - Analyze the policy loss from the belief state approximation [McAllester & Singh '99], [Poupart & Boutilier '01].
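A minimal sketch (names and data layout are my own illustration, not from the talk) of selecting the best alpha-vector from a belief state represented only by its marginals over the basis function domains:

```python
import numpy as np

def factored_dot(weights, basis, marginals):
    """alpha . b for alpha = sum_j weights[j] * h_j, where the belief b is
    given only through its marginals over each basis function's domain."""
    total = 0.0
    for w, (domain, table) in zip(weights, basis):
        marg = marginals[domain]                 # marginal b(x_domain), same shape as table
        total += w * float(np.sum(table * marg))
    return total

def best_alpha(alphas, basis, marginals):
    """Index of the alpha-vector with the highest value at this belief state."""
    values = [factored_dot(w, basis, marginals) for w in alphas]
    return int(np.argmax(values))

# Tiny example: 3 binary variables, 2 basis functions over variables {0} and {1, 2}.
basis = [((0,), np.array([0.0, 1.0])),
         ((1, 2), np.array([[0.0, 0.5],
                            [0.5, 1.0]]))]
marginals = {(0,): np.array([0.3, 0.7]),
             (1, 2): np.array([[0.1, 0.2],
                               [0.3, 0.4]])}
alphas = [np.array([1.0, 2.0]),     # weights of the first alpha-vector
          np.array([2.0, 0.5])]     # weights of the second alpha-vector
print(best_alpha(alphas, basis, marginals))   # -> 0
```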
Exploiting Structure in PI and Incremental Pruning
[Figure: the same policy iteration loop, with the basic vector operations annotated]
Basic operations to factorize: Best alpha, Pointwise Dominates, Value Determination; this slide highlights Pointwise Dominates.
Pointwise Domination
[Figure: value vectors alpha_1 through alpha_4 over belief space b, with alpha_4 lying below alpha_2 everywhere]
- Does alpha_2 dominate alpha_4 pointwise? Check whether the minimum over states of alpha_2(s) - alpha_4(s) is >= 0;
- With factored value functions, this is a minimization over an exponential state space!
- Minimization over a factored function is efficient with cost networks [Bertele & Brioschi '72], [Dechter '99].
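A rough sketch of the cost-network idea (my own code, not the authors'): the minimum of a sum of local tables over binary variables can be computed by eliminating variables one at a time instead of enumerating all joint states.

```python
import numpy as np

def min_of_factored_function(factors, order):
    """Minimize sum_j f_j(x_scope_j) over all assignments to binary variables
    using variable elimination (a cost network), instead of enumerating the
    exponentially many joint states.

    factors: list of (scope, table) pairs, table has shape (2,) * len(scope).
    order:   elimination order over the variables appearing in the scopes."""
    factors = [(tuple(scope), np.asarray(table, dtype=float)) for scope, table in factors]
    for var in order:
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        if not touching:
            continue
        # Combine all factors mentioning `var` into one table over their joint scope.
        joint_scope = sorted(set().union(*[set(s) for s, _ in touching]))
        joint = np.zeros((2,) * len(joint_scope))
        for scope, table in touching:
            axes = [joint_scope.index(v) for v in scope]
            expanded = np.moveaxis(
                table.reshape(table.shape + (1,) * (len(joint_scope) - len(scope))),
                list(range(len(scope))), axes)
            joint = joint + expanded               # broadcasts over the missing variables
        # Minimize out `var`, leaving a new factor over the remaining variables.
        new_scope = tuple(v for v in joint_scope if v != var)
        new_table = joint.min(axis=joint_scope.index(var))
        factors = rest + [(new_scope, new_table)]
    # All variables eliminated: only scalar factors remain.
    return sum(float(t) for _, t in factors)

# Example: does alpha_2 dominate alpha_4 pointwise? Build the local differences
# of their factored representations and check that the minimum is >= 0.
diff_factors = [((0,), np.array([0.2, 0.1])),
                ((1, 2), np.array([[0.0, 0.3],
                                   [0.1, 0.4]]))]
print(min_of_factored_function(diff_factors, order=[0, 1, 2]) >= 0)   # -> True
```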
Exploiting Structure in PI and Incremental Pruning
[Figure: the same policy iteration loop, with the basic vector operations annotated]
Basic operations to factorize: Best alpha, Pointwise Dominates, Value Determination; this slide highlights Value Determination.
Value Determination
[Figure: the controller with nodes 1:a1, 3:a1, 2:a2 and observation edges O1, O2, next to the value vectors over belief space]
- Compute the value of the policy, starting from node 1;
- The equation on the slide splits this value into the immediate value plus the expected future reward, with one term for the case where O1 is observed and one for O2.
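A hedged reconstruction of the value-determination equations for a controller node i (the standard form for Hansen-style policy iteration; the exact notation on the slide is not preserved in this transcript):

\[
V_i(s) \;=\; R(s, a_i) \;+\; \gamma \sum_{s'} P(s' \mid s, a_i) \sum_{o} P(o \mid s', a_i)\, V_{\sigma(i, o)}(s'),
\]

where a_i is the action at node i and \(\sigma(i, o)\) is the successor node reached after observing o. There is one such equation per node and per state, which is the source of the exponentially many equations mentioned on the next slide.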
Approximate Value Determination
- Exact value determination requires an exponential number of equations;
- A factored approximation is efficient:
  - Find the best approximation in max-norm;
  - The algorithm exploits the factored model;
- Analogous to the factored MDP case (see the max-norm projection IJCAI talk on Thursday).
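As a hedged sketch of what the max-norm approximation looks like (the weights and basis are as above; this mirrors the factored-MDP formulation referenced on the slide): for each node i, find weights minimizing the worst-case error

\[
w^{(i)*} \;=\; \arg\min_{w}\; \max_{s}\; \Bigl|\, \sum_j w_j h_j(s) \;-\; T_i[\hat{V}](s) \,\Bigr|,
\]

where \(T_i[\hat{V}]\) denotes the right-hand side of the value-determination equation above with the current approximate node values plugged in. This can be posed as a linear program, and the max over the exponential state space can be handled with the same cost-network machinery used for pointwise domination.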
Exploiting Structure: Summary
[Figure: the policy iteration loop (Value Determination, DP Step, Policy Improvement) over the controller with nodes 1:a1, 2:a2 and value vectors alpha_1, alpha_2, alpha_3 over belief space b, with all three basic operations now factored]
Conclusions
- Factored POMDPs can represent complex systems;
- Factorization in the model doesn't always imply factorization in the solution:
  - Linear approximation reduces the dimensionality of the problem;
  - We can efficiently find the closest linear approximation;
- Standard POMDP algorithms can be modified to use factored linear value functions efficiently;
- Complexity is a function of the DBN and basis structure.
Our Approach
[Figure: the value function V over a space with one dimension for each state (axes b(s1), b(s2)), projected onto a space with one dimension for each feature (axes h_1(s), h_2(s))]
Projection: from one dimension per state to one dimension per feature (<< number of states).
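A minimal sketch (my own illustration, not the authors' code) of projecting a target vector onto the span of basis features under the max-norm, written as a small linear program. Note that this version enumerates states explicitly, so it only illustrates the projection itself; in the actual algorithm the exponentially large max is handled with cost networks rather than by enumeration.

```python
import numpy as np
from scipy.optimize import linprog

def max_norm_projection(H, target):
    """Find weights w minimizing ||H @ w - target||_inf via an LP.

    H:      (num_states, num_features) matrix of basis function values.
    target: (num_states,) vector to approximate (e.g. an exact alpha-vector)."""
    n_states, n_feats = H.shape
    # Variables: [w_1 .. w_k, phi]; minimize phi.
    c = np.zeros(n_feats + 1)
    c[-1] = 1.0
    # H @ w - target <= phi  and  target - H @ w <= phi.
    A_ub = np.vstack([np.hstack([H, -np.ones((n_states, 1))]),
                      np.hstack([-H, -np.ones((n_states, 1))])])
    b_ub = np.concatenate([target, -target])
    bounds = [(None, None)] * n_feats + [(0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n_feats], res.x[-1]   # weights and achieved max-norm error

# Tiny example: 4 states, 2 features.
H = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 2.0]])
target = np.array([1.0, 2.5, 1.5, 4.0])
w, err = max_norm_projection(H, target)
print(w, err)
```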