Two Models of Evaluating Probabilistic Planning
IPPC (International Probabilistic Planning Competition)
– How often did you reach the goal under the given time constraints?
– FF-HOP, FF-Replan
Evaluate on the quality of the policy
– Converging to the optimal policy faster
– LRTDP, mGPT, Kolobov’s approach

Heuristics for Stochastic Planning
Heuristics come from relaxation
We can relax along two separate dimensions:
– Relax –ve interactions: consider +ve interactions alone using relaxed planning graphs
– Relax uncertainty: consider determinizations
– Or a combination of both!

Determinizations
Most-likely outcome determinization
– Inadmissible
– e.g. if the only path to the goal relies on a less likely outcome of an action
All outcomes determinization
– Admissible, but not very informed
– e.g. a very unlikely outcome of an action leads you straight to the goal

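The two determinizations above can be sketched in a few lines. This is a minimal illustration, not any particular planner's implementation: the action representation (a name mapped to a list of `(probability, effect)` outcomes), the function names, and the toy `jump`/`walk` domain are all assumptions made for this example.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: each probabilistic action maps to a list of
# (probability, effect) outcomes, where an effect is a set of added facts.
Outcome = Tuple[float, FrozenSet[str]]
ProbActions = Dict[str, List[Outcome]]
DetActions = Dict[str, FrozenSet[str]]


def most_likely_determinization(actions: ProbActions) -> DetActions:
    """Keep only the highest-probability outcome of each action."""
    return {name: max(outcomes, key=lambda o: o[0])[1]
            for name, outcomes in actions.items()}


def all_outcomes_determinization(actions: ProbActions) -> DetActions:
    """Turn every outcome into its own deterministic action."""
    det: DetActions = {}
    for name, outcomes in actions.items():
        for i, (_prob, effect) in enumerate(outcomes):
            det[f"{name}_o{i}"] = effect
    return det


# Toy domain: the goal is only reachable through the unlikely outcome of "jump".
actions: ProbActions = {
    "jump": [(0.9, frozenset({"fell"})), (0.1, frozenset({"at_goal"}))],
    "walk": [(1.0, frozenset({"closer"}))],
}
ml = most_likely_determinization(actions)
ao = all_outcomes_determinization(actions)
```

In this toy domain `ml` contains no action that adds `at_goal`, so a planner working on it would find no plan at all, while `ao` contains `jump_o1`, which reaches the goal in one step regardless of how unlikely that outcome is.
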
Solving Determinizations
If we relax –ve interactions
– Then compute a relaxed plan
– Admissible if the optimal relaxed plan is computed
– Inadmissible otherwise
If we keep –ve interactions
– Then use a deterministic planner (e.g. FF/LPG)
– Inadmissible unless the underlying planner is optimal

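As a rough sketch of what relaxing –ve interactions means on a determinized problem, the code below builds relaxed planning graph layers by ignoring delete effects and counts the layers needed until every goal fact appears. Note that this is the simpler level-count estimate, not the relaxed plan extraction the slide refers to; the `Action` tuple, the function name, and the pick/place domain are assumptions for illustration.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]  # kept for completeness; the relaxation ignores it


def relaxed_level(state: FrozenSet[str], goal: FrozenSet[str],
                  actions: List[Action]) -> Optional[int]:
    """Count relaxed planning graph layers until all goal facts appear.

    Returns None if the goal is unreachable even without delete effects.
    """
    facts = set(state)
    level = 0
    while not goal <= facts:
        new_facts = set(facts)
        for a in actions:
            if a.pre <= facts:        # applicable in the current layer
                new_facts |= a.add    # delete effects are dropped
        if new_facts == facts:        # fixpoint reached without the goal
            return None
        facts = new_facts
        level += 1
    return level


# Hypothetical two-step domain: pick something up, then place it.
acts = [
    Action("pick", frozenset({"hand_empty"}), frozenset({"holding"}),
           frozenset({"hand_empty"})),
    Action("place", frozenset({"holding"}),
           frozenset({"placed", "hand_empty"}), frozenset({"holding"})),
]
h = relaxed_level(frozenset({"hand_empty"}), frozenset({"placed"}), acts)
unreachable = relaxed_level(frozenset({"hand_empty"}), frozenset({"flying"}), acts)
```

The layer count never exceeds the length of a real plan, since a real plan must also pass through those layers; a relaxed plan extracted greedily from the same graph is usually more informed but loses that guarantee.
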
Dimensions of Relaxation
[Figure: techniques placed on two axes, Uncertainty and Negative Interactions, with consideration increasing along each axis. Points shown: Relaxed Plan Heuristic, McLUG, FF/LPG.]
Reducing Uncertainty
– Bound the number of stochastic outcomes: stochastic “width”
– Limited width stochastic planning?

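Dimensions of Relaxation
[Table: columns give the uncertainty considered (None, Some, Full); rows give the –ve interactions considered (None, Some, Full). Entries per row, in column order:]
– None: Relaxed Plan, McLUG
– Some: (no entries)
– Full: FF/LPG, Limited width Stoch Planning
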
Expressiveness v. Cost
[Figure: Node Expansions v. Heuristic Computation Cost. Axes: Nodes Expanded, Computation Cost. Points shown: h = 0, McLUG, FF-Replan, FF, Limited width stochastic planning.]

Reducing Heuristic Computation Cost
Exploit overlapping structure of heuristics for different states
– E.g. SAG idea for McLUG
– E.g. Triangle tables idea for plans (cf. Kolobov)

Triangle Table Memoization
Use triangle tables / memoization
[Figure: a problem involving A, B, and C; below it, a problem involving only A and B.]
If the above problem is solved, then we don’t need to call FF again for the one below.

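The reuse idea can be sketched as a cache of plan suffixes: once a plan is found, every state along it already has a known plan to the goal, so the planner need not be called again for those states. This is a simplified stand-in that matches on exact states, whereas a triangle table records the conditions each plan suffix needs and so generalizes further. The `SuffixCache` class, the `planner` and `apply_action` callbacks, and the toy `put_*` domain are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

State = FrozenSet[str]
Plan = List[str]


class SuffixCache:
    """Memoize plan suffixes so solved sub-problems skip the planner."""

    def __init__(self, planner: Callable[[State, State], Plan],
                 apply_action: Callable[[State, str], State]) -> None:
        self.planner = planner            # assumed to return a list of action names
        self.apply_action = apply_action  # deterministic successor function
        self.cache: Dict[Tuple[State, State], Plan] = {}
        self.planner_calls = 0

    def plan(self, state: State, goal: State) -> Plan:
        key = (state, goal)
        if key in self.cache:
            return self.cache[key]
        self.planner_calls += 1
        plan = self.planner(state, goal)
        # Record the remaining plan for every state visited along the way.
        s = state
        for i, act in enumerate(plan):
            self.cache.setdefault((s, goal), plan[i:])
            s = self.apply_action(s, act)
        self.cache.setdefault((s, goal), [])
        return plan


# Toy stand-in for a deterministic planner: achieve missing facts in sorted order.
def toy_planner(state: State, goal: State) -> Plan:
    return [f"put_{f}" for f in sorted(goal - state)]


def toy_apply(state: State, act: str) -> State:
    return state | {act[len("put_"):]}


cache = SuffixCache(toy_planner, toy_apply)
goal = frozenset({"a", "b", "c"})
full = cache.plan(frozenset(), goal)        # planner is called once
sub = cache.plan(frozenset({"a"}), goal)    # answered from the cache
calls_after_sub = cache.planner_calls
other = cache.plan(frozenset({"b"}), goal)  # not on the stored plan: a miss
```

Here the second query is a state that the first plan passes through, so it costs no planner call; the third query is off that plan and triggers a fresh call.
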