Heuristic Search Value Iteration


1 Heuristic Search Value Iteration for POMDPs
Presenter: Hui Li. January 12, 2007

2 Outline
Value Approximation; Heuristic Search; Results; Conclusions

3 Value Approximation in HSVI
The optimal value function $V_n(b)$ of a POMDP for a horizon of length $n$ is piecewise linear and convex: $V_n(b) = \max_k \alpha_n^k \cdot b$, where $\alpha_n^k$ is the gradient vector of $V_n(b)$ in the $k$-th polyhedral belief region.
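As an illustration of the piecewise linear and convex form, the following minimal sketch evaluates a value function stored as a set of gradient vectors by taking the maximum dot product with the belief. The function name `value` and the two-state vectors are hypothetical and not taken from the slides.

```python
def value(alpha_vectors, belief):
    """Evaluate a piecewise linear convex value function at a belief.

    alpha_vectors: list of vectors, one entry per state (hypothetical data).
    belief: probability distribution over states.
    """
    return max(
        sum(a * p for a, p in zip(alpha, belief))
        for alpha in alpha_vectors
    )


# Hypothetical two-state example: three vectors, each optimal in one region.
alphas = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
v_mid = value(alphas, [0.5, 0.5])   # the flat vector wins near the middle
v_edge = value(alphas, [0.9, 0.1])  # the first vector wins near state 0
```

Each vector is the best one over its own region of the belief simplex, which is what makes the maximum piecewise linear and convex.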

4 Value Approximation in HSVI
[Figure: the upper bound $\overline{V}(b)$, the exact value function $V^*(b)$, and the lower bound $\underline{V}(b)$ plotted over the belief $b$, with belief points $b_1$ and $b_2$ marked.]

5 Value Approximation in HSVI
[Figure: the bounds around $V^*(b)$ with belief points $b_1$ and $b_2$ marked; the bounds are updated locally at $b$.]

6 Value Approximation in HSVI
Vector-set representation for the lower bound $\underline{V}(b)$: an initialization step and a local update rule (the slide's formulas are missing from this transcript).
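The slide's formulas are not recoverable here, so the following is only a sketch of one common way to maintain a vector-set lower bound: start from the value of always repeating the action with the best worst-case reward, then add one vector per update with a point-based backup at the chosen belief. The array layouts `T[a][s][s2]`, `O[a][s2][o]`, `R[a][s]` and the toy numbers are assumptions made for illustration.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def init_lower(R, gamma):
    """Single constant vector: value of repeating the action whose
    worst-case reward is best (an assumed, simple valid lower bound)."""
    n_states = len(R[0])
    worst_case = max(min(r_a) for r_a in R) / (1.0 - gamma)
    return [[worst_case] * n_states]


def backup(Gamma, b, T, O, R, gamma):
    """Point-based backup of the vector set Gamma at belief b.
    Returns the new vector that is best at b."""
    n_states = len(b)
    n_obs = len(O[0][0])
    best = None
    for a in range(len(R)):
        beta_a = list(R[a])
        for o in range(n_obs):
            # Project every vector back through action a and observation o.
            projected = [
                [sum(T[a][s][s2] * O[a][s2][o] * alpha[s2]
                     for s2 in range(n_states))
                 for s in range(n_states)]
                for alpha in Gamma
            ]
            g = max(projected, key=lambda v: dot(v, b))
            beta_a = [beta_a[s] + gamma * g[s] for s in range(n_states)]
        if best is None or dot(beta_a, b) > dot(best, b):
            best = beta_a
    return best


# Hypothetical two-state, two-action, two-observation model.
T = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
O = [[[0.85, 0.15], [0.15, 0.85]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[-1.0, -1.0], [10.0, -100.0]]
gamma = 0.95

Gamma = init_lower(R, gamma)
b = [1.0, 0.0]
new_vector = backup(Gamma, b, T, O, R, gamma)
Gamma.append(new_vector)
```

Because only the single vector that is best at `b` is added, the bound improves locally around `b` and the set grows by at most one vector per update.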

7 Value Approximation in HSVI
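Point-set representation for the upper bound $\overline{V}(b)$. The upper bound is the convex hull formed by a finite set of belief/value points. Initialization: the solution of the underlying MDP is used as the initial value. Updating: the update rule (formula missing from this transcript) evaluates the bound at each successor belief $b'$ by projecting $b'$ onto the convex hull, which can be computed with a linear program.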
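To make the projection onto the convex hull concrete, here is a small sketch restricted to a two-state problem, where a belief is a single number $p = b(s_0)$. In that special case an optimal solution of the linear program uses at most two points, so it is enough to interpolate between every pair of stored points that brackets $p$ and keep the smallest result. This shortcut replaces the general LP and is an assumption of the sketch, as are the function name and the sample points.

```python
def upper_bound_2state(points, p):
    """Project belief p = b(s0) onto the convex hull of (belief, value) points.

    points: list of (p_i, v_i) pairs; assumed to contain the corner
            beliefs p = 0.0 and p = 1.0 so every p in [0, 1] is bracketed.
    """
    best = float("inf")
    for pl, vl in points:
        for pr, vr in points:
            if pl <= p <= pr:
                if pr == pl:
                    candidate = min(vl, vr)  # both points sit exactly at p
                else:
                    w = (p - pl) / (pr - pl)
                    candidate = (1.0 - w) * vl + w * vr
                best = min(best, candidate)
    return best


# Hypothetical point set: corner values (e.g. from an MDP solution)
# plus one interior point added by a later update.
points = [(0.0, 10.0), (1.0, 10.0), (0.5, 4.0)]
ub_quarter = upper_bound_2state(points, 0.25)
ub_half = upper_bound_2state(points, 0.5)
```

With more than two states the pairwise search no longer applies, and a general LP solver is needed for the projection.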

8 Value Approximation in HSVI
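It can be proved that the lower bound $\underline{V}(b)$ and the upper bound $\overline{V}(b)$ are uniformly improvable and converge to the true value function $V^*(b)$. Upper bound: $\overline{V}_0(b) \ge \overline{V}_1(b) \ge \dots \ge \overline{V}_n(b) \ge V^*(b)$. Lower bound: $\underline{V}_0(b) \le \underline{V}_1(b) \le \dots \le \underline{V}_n(b) \le V^*(b)$.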

9 Heuristic Search in HSVI
Adding one belief point at each update iteration

10 Heuristic Search in HSVI
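Interval function: $\hat{V}(b) = [\underline{V}(b), \overline{V}(b)]$. Width of the interval function: $\mathrm{width}(\hat{V}(b)) = \overline{V}(b) - \underline{V}(b)$, which measures the uncertainty at $b$.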

11 Heuristic Search in HSVI
How to select the next belief point $b$. Selection of the action $a^*$: convergence can be guaranteed only by choosing the action with the greatest upper bound. Selection of the observation $o^*$: choose the observation that maximizes the weighted uncertainty, that is, the uncertainty at the successor belief weighted by the probability of the observation.
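The two selection rules can be sketched as follows, assuming the upper and lower bounds are available as plain functions of a belief. The model layout (`T[a][s][s2]`, `O[a][s2][o]`, `R[a][s]`), the toy bound functions, and the use of the raw interval width as the uncertainty measure are all assumptions of this sketch rather than details taken from the slides.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def belief_update(b, a, o, T, O):
    """Return (successor belief, probability of o) after action a."""
    n = len(b)
    unnorm = [O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
              for s2 in range(n)]
    prob = sum(unnorm)
    if prob == 0.0:
        return None, 0.0
    return [x / prob for x in unnorm], prob


def select_action_observation(b, T, O, R, gamma, upper, lower):
    n_obs = len(O[0][0])

    def q_upper(a):
        # One-step lookahead using the upper bound at successor beliefs.
        q = dot(R[a], b)
        for o in range(n_obs):
            b2, prob = belief_update(b, a, o, T, O)
            if prob > 0.0:
                q += gamma * prob * upper(b2)
        return q

    a_star = max(range(len(R)), key=q_upper)

    def weighted_width(o):
        # Uncertainty at the successor, weighted by observation probability.
        b2, prob = belief_update(b, a_star, o, T, O)
        return prob * (upper(b2) - lower(b2)) if prob > 0.0 else 0.0

    o_star = max(range(n_obs), key=weighted_width)
    return a_star, o_star


# Hypothetical model and bounds, for illustration only.
T = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
O = [[[0.85, 0.15], [0.15, 0.85]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[-1.0, -1.0], [10.0, -100.0]]
upper = lambda b: 10.0
lower = lambda b: -20.0 + 25.0 * abs(b[0] - b[1])

choice = select_action_observation([0.7, 0.3], T, O, R, 0.95, upper, lower)
```

In this toy setup the less likely observation is selected, because the successor belief it leads to has a much wider gap between the bounds.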

12 Results of HSVI on Benchmark Problems

13 Results of HSVI on Benchmark Problems
Comparison between PBVI and HSVI

14 Results of HSVI on Benchmark Problems

15 Conclusions
HSVI uses an upper bound and a lower bound to approximate the value function; it uses heuristic search to select the next belief point; HSVI brings faster convergence.

