Heuristic Search Value Iteration for POMDPs Presenter: Hui Li January 12, 2007
Outline Value Approximation Heuristic Search Results Conclusions
Value Approximation in HSVI
The optimal value function $V_n(b)$ of a POMDP with horizon $n$ is piecewise linear and convex:
$V_n(b) = \max_k \alpha_n^k \cdot b$
where $\alpha_n^k$ is the gradient vector of $V_n(b)$ in the k-th polyhedral belief region.
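As a small illustration of the piecewise linear and convex form, the sketch below evaluates a value function as the maximum of inner products between the belief and a set of gradient vectors. The function name `value` and the three example vectors are hypothetical and chosen only for illustration.

```python
def value(belief, alpha_vectors):
    """Evaluate a piecewise linear convex value function at a belief.

    Each alpha vector is one linear piece; the value is the largest
    inner product alpha . b over all pieces.
    """
    return max(
        sum(a_s * b_s for a_s, b_s in zip(alpha, belief))
        for alpha in alpha_vectors
    )


# Hypothetical two-state example with three linear pieces.
alphas = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]

print(value([0.5, 0.5], alphas))  # the flat vector dominates near the middle
print(value([0.9, 0.1], alphas))  # the first vector dominates near state 0
```

Which vector attains the maximum changes from one belief region to the next, which is what makes the function piecewise linear.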
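Value Approximation in HSVI
[Figure: the two bounds and $V^*(b)$ plotted against the belief $b$, with points $b_1$ and $b_2$ marked.]
$\bar{V}(b)$ is the upper bound, $V^*(b)$ is the exact value function, and $\underline{V}(b)$ is the lower bound.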
Value Approximation in HSVI
[Figure: $V^*(b)$ and $V(b)$ plotted against the belief $b$, with points $b_1$ and $b_2$ marked.]
Locally updating at $b$
Value Approximation in HSVI
Vector set representation for the lower bound: $\underline{V}(b) = \max_{\alpha \in \Gamma} \alpha \cdot b$
Initialization: [equation missing]
Updating using: [equation missing]
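A minimal sketch of a vector-set lower bound with a local update at one belief is given below. It is an illustration under stated assumptions, not the equations from the slide: the initialization (one constant vector built from the best worst-case reward) and the update (a standard point-based backup) are assumed, and the model arrays `T`, `Z`, `R` and all function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def initial_lower_bound(R, gamma, n_states):
    """Assumed initialization: one constant vector equal to the best
    worst-case reward, max_a min_s R[a][s], collected forever."""
    worst = max(min(R[a]) for a in range(len(R)))
    return [[worst / (1.0 - gamma)] * n_states]


def point_based_backup(b, Gamma, T, Z, R, gamma):
    """Return the new alpha vector produced by a local backup at belief b."""
    n_s, n_o = len(b), len(Z[0][0])
    best = None
    for a in range(len(R)):
        beta = list(R[a])  # start from the immediate reward R[a][s]
        for o in range(n_o):
            # Back-project every alpha through (a, o):
            # g[s] = sum_s' T[a][s][s'] * Z[a][s'][o] * alpha[s']
            projected = [
                [sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2] for s2 in range(n_s))
                 for s in range(n_s)]
                for alpha in Gamma
            ]
            g = max(projected, key=lambda v: dot(v, b))  # best piece at b
            beta = [beta[s] + gamma * g[s] for s in range(n_s)]
        if best is None or dot(beta, b) > dot(best, b):
            best = beta
    return best


# Hypothetical 2-state, 2-action, 2-observation model.
T = [[[0.9, 0.1], [0.1, 0.9]]] * 2  # T[a][s][s']
Z = [[[0.8, 0.2], [0.2, 0.8]]] * 2  # Z[a][s'][o]
R = [[1.0, 0.0], [0.0, 1.0]]        # R[a][s]
gamma = 0.9

Gamma = initial_lower_bound(R, gamma, n_states=2)
b = [0.7, 0.3]
before = max(dot(alpha, b) for alpha in Gamma)
Gamma.append(point_based_backup(b, Gamma, T, Z, R, gamma))
after = max(dot(alpha, b) for alpha in Gamma)
print(before, after)
```

Because the update only adds a vector, the lower bound at every belief can only stay the same or increase.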
Value Approximation in HSVI
Point set representation for the upper bound $\bar{V}(b)$: the upper bound is the convex hull formed by a finite set of belief/value points.
Initialization: the MDP solution is used as the initial value.
Updating using: [equation missing], where the value at $b'$ is the projection of $b'$ onto the convex hull, which can be computed by solving a linear program.
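To make the convex hull projection concrete, the sketch below handles only the two-state case, where a belief reduces to the single number p = b(s0). In that case an optimal solution of the linear program needs at most two hull points, so checking every bracketing pair and interpolating linearly gives the same answer without an LP solver. The function name and the point set are hypothetical; a general implementation would solve the linear program instead.

```python
def upper_bound_2state(q, points):
    """Project belief q = b(s0) onto the lower convex hull of (p, v) points.

    Tries every pair of points that brackets q and keeps the smallest
    linearly interpolated value. Valid for two-state problems only.
    """
    best = float("inf")
    for p_lo, v_lo in points:
        for p_hi, v_hi in points:
            if p_lo <= q <= p_hi:
                if p_hi == p_lo:
                    cand = min(v_lo, v_hi)  # both points sit exactly at q
                else:
                    w = (q - p_lo) / (p_hi - p_lo)
                    cand = (1.0 - w) * v_lo + w * v_hi
                best = min(best, cand)
    return best


# Hypothetical point set: the two corners of the simplex plus one
# interior point whose value has already been lowered by an update.
points = [(0.0, 10.0), (1.0, 10.0), (0.5, 6.0)]

print(upper_bound_2state(0.25, points))  # interpolates between p=0 and p=0.5
print(upper_bound_2state(0.5, points))   # hits the stored interior point
```

The corner beliefs must always be present in the point set, otherwise some beliefs would have no bracketing pair and the projection would be undefined.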
Value Approximation in HSVI
It can be proved that the lower bound $\underline{V}(b)$ and the upper bound $\bar{V}(b)$ are uniformly improvable and converge to the true value function $V^*(b)$.
[Figure: successive bounds $V_0(b)$, $V_1(b)$, ..., $V_n(b)$, with the upper bound and lower bound labeled.]
Heuristic Search in HSVI Adding one belief point at each update iteration
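Heuristic Search in HSVI
Interval function: $\hat{V}(b) = [\underline{V}(b), \bar{V}(b)]$
Width of the interval function: $\mathrm{width}(\hat{V}(b)) = \bar{V}(b) - \underline{V}(b)$
The width measures the uncertainty at $b$.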
Heuristic Search in HSVI
How to select the next belief point b:
Selection of the action a*: it turns out that convergence can be guaranteed only by choosing the action with the greatest upper bound.
Selection of the observation o*: select the o* that maximizes the weighted uncertainty.
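The two selection rules can be sketched as follows. This is an illustration under assumptions: "weighted uncertainty" is read here as the observation probability times the bound width at the successor belief, and the model arrays, the toy bound functions `upper` and `lower`, and all function names are hypothetical.

```python
def belief_update(b, a, o, T, Z):
    """Return (successor belief, P(o | b, a)) after action a and observation o."""
    n = len(b)
    unnorm = [
        Z[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    p_o = sum(unnorm)
    if p_o == 0.0:
        return None, 0.0  # observation cannot occur from this belief
    return [x / p_o for x in unnorm], p_o


def select_action(b, upper, T, Z, R, gamma):
    """Pick the action whose one-step lookahead on the upper bound is greatest."""
    def q_upper(a):
        total = sum(R[a][s] * b[s] for s in range(len(b)))
        for o in range(len(Z[a][0])):
            b2, p_o = belief_update(b, a, o, T, Z)
            if p_o > 0.0:
                total += gamma * p_o * upper(b2)
        return total
    return max(range(len(R)), key=q_upper)


def select_observation(b, a, upper, lower, T, Z):
    """Pick the observation with the largest probability-weighted bound width."""
    def weighted_width(o):
        b2, p_o = belief_update(b, a, o, T, Z)
        return p_o * (upper(b2) - lower(b2)) if p_o > 0.0 else 0.0
    return max(range(len(Z[a][0])), key=weighted_width)


# Hypothetical model: static state, noisy observations.
T = [[[1.0, 0.0], [0.0, 1.0]]] * 2  # T[a][s][s']
Z = [[[0.8, 0.2], [0.2, 0.8]]] * 2  # Z[a][s'][o]
R = [[1.0, 0.0], [0.0, 1.0]]        # R[a][s]
gamma = 0.9

# Toy bounds: the gap is widest where the belief is least certain.
upper = lambda belief: 10.0
lower = lambda belief: 10.0 * max(belief)

b = [0.7, 0.3]
a_star = select_action(b, upper, T, Z, R, gamma)
o_star = select_observation(b, a_star, upper, lower, T, Z)
print(a_star, o_star)
```

In this toy model the less likely observation is selected, because the successor belief it leads to has a much wider gap between the bounds; weighting by probability trades off how likely a successor is against how uncertain its value still is.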
Results of HSVI on Benchmark Problems
Results of HSVI on Benchmark Problems Comparison between PBVI and HSVI
Conclusions
HSVI uses both an upper bound and a lower bound to approximate the value function.
The heuristic search for the next belief point gives HSVI faster convergence.