Heuristic Search Value Iteration


Heuristic Search Value Iteration for POMDPs. Presenter: Hui Li, January 12, 2007.

Outline
- Value Approximation
- Heuristic Search
- Results
- Conclusions

Value Approximation in HSVI

The optimal value function $V_n(b)$ of a POMDP for a horizon of length $n$ is piecewise linear and convex:

$$V_n(b) = \max_k \sum_s \alpha^k(s)\, b(s) = \max_k \alpha^k \cdot b,$$

where $\alpha^k$ is the gradient vector of $V_n(b)$ in the $k$-th polyhedral belief region.
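As a minimal sketch of this representation (NumPy, with made-up $\alpha$-vectors for a hypothetical 2-state POMDP), evaluating the piecewise linear convex value function is just a max over dot products:

```python
import numpy as np

# Each row is one gradient vector alpha^k over the states;
# V_n(b) = max_k alpha^k . b, one linear piece per belief region.
Gamma = np.array([
    [1.0, 0.0],   # invented alpha vectors for illustration
    [0.0, 1.0],
    [0.6, 0.6],
])

def V(b):
    """Evaluate the piecewise linear convex value function at belief b."""
    return float(np.max(Gamma @ b))

print(V(np.array([0.3, 0.7])))  # -> 0.7, the best linear piece at this belief
```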

Value Approximation in HSVI

[Figure: the two bounds bracketing the value function over the belief simplex.] $\bar{V}(b)$ is the upper bound, $V^*(b)$ is the exact true value function, and $\underline{V}(b)$ is the lower bound, so $\underline{V}(b) \le V^*(b) \le \bar{V}(b)$ for all beliefs $b$.

Value Approximation in HSVI

[Figure: locally updating both bounds at a single belief point $b$ tightens them around $V^*(b)$.]

Value Approximation in HSVI

Vector-set representation for the lower bound $\underline{V}(b)$: a finite set $\Gamma$ of $\alpha$-vectors, with $\underline{V}(b) = \max_{\alpha \in \Gamma} \alpha \cdot b$.
- Initialization: a single conservative vector (e.g., the value of repeating the best fixed action forever).
- Updating: add the $\alpha$-vector produced by a point-based Bellman backup at $b$, as sketched below.
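A hedged sketch of the point-based backup that generates a new vector for $\Gamma$, assuming a tabular model with hypothetical arrays T, O, R and discount gamma (these names are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical tabular model: T[a, s, s'] = P(s'|s,a),
# O[a, s', o] = P(o|s',a), R[s, a] = reward, gamma = discount.
nS, nA, nO, gamma = 2, 2, 2, 0.95
rng = np.random.default_rng(0)
T = rng.random((nA, nS, nS)); T /= T.sum(axis=2, keepdims=True)
O = rng.random((nA, nS, nO)); O /= O.sum(axis=2, keepdims=True)
R = rng.random((nS, nA))

def backup(Gamma, b):
    """Point-based Bellman backup at belief b: returns one new alpha
    vector that can be added to the lower-bound set Gamma."""
    candidates = []
    for a in range(nA):
        alpha_a = R[:, a].copy()
        for o in range(nO):
            M = T[a] * O[a][:, o]          # M[s,s'] = P(s'|s,a) P(o|s',a)
            # Existing vector that scores best at the (unnormalized) successor belief:
            best = Gamma[np.argmax(Gamma @ (b @ M))]
            alpha_a = alpha_a + gamma * (M @ best)
        candidates.append(alpha_a)
    return max(candidates, key=lambda alpha: float(alpha @ b))

Gamma = np.zeros((1, nS))                  # trivial initial lower bound
Gamma = np.vstack([Gamma, backup(Gamma, np.array([0.5, 0.5]))])
```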

Value Approximation in HSVI

Point-set representation for the upper bound $\bar{V}(b)$: the upper bound is the convex hull formed by a finite set of belief/value points $\{(b_i, v_i)\}$.
- Initialization: use the underlying MDP solution as the initial value.
- Updating: $\bar{V}(b')$ is obtained by projecting $b'$ onto the convex hull, which can be solved by a linear program (sketched below).
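A small sketch of that projection as a linear program via SciPy's `linprog`; the three stored belief/value points are invented for illustration:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical point set: corner beliefs plus one interior point.
# Rows of B are beliefs, v their stored upper-bound values.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
v = np.array([10.0, 8.0, 6.0])

def upper_bound(b):
    """Evaluate the upper bound at b: cheapest convex combination of
    stored beliefs reproducing b, i.e.
        minimize  c . v   s.t.  B^T c = b,  c >= 0.
    (sum(c) = 1 is implied because every row of B sums to 1.)"""
    res = linprog(c=v, A_eq=B.T, b_eq=b, bounds=[(0, None)] * len(v))
    return float(res.fun)

print(upper_bound(np.array([0.25, 0.75])))  # -> 7.0: mixes the interior point in
```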

Value Approximation in HSVI

It can be proved that the lower bound $\underline{V}(b)$ and the upper bound $\bar{V}(b)$ are uniformly improvable and converge to the true value function $V^*(b)$:

$$\underline{V}_0(b) \le \underline{V}_1(b) \le \cdots \le V^*(b) \le \cdots \le \bar{V}_1(b) \le \bar{V}_0(b).$$

Heuristic Search in HSVI

One new belief point is added at each update iteration.

Heuristic Search in HSVI

Interval function: $\hat{V}(b) = [\underline{V}(b), \bar{V}(b)]$. Its width, $\mathrm{width}(b) = \bar{V}(b) - \underline{V}(b)$, measures the remaining uncertainty at $b$.

Heuristic Search in HSVI

How to select the next belief point:
- Selection of the action $a^*$: it turns out convergence can be guaranteed only by choosing the action with the greatest upper-bound value, $a^* = \arg\max_a \bar{Q}(b, a)$.
- Selection of the observation $o^*$: choose the $o^*$ that maximizes the probability-weighted uncertainty of the successor belief, $o^* = \arg\max_o P(o \mid b, a^*)\,\mathrm{width}(\tau(b, a^*, o))$, where $\tau$ is the belief update. Both rules are sketched below.
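Putting the two selection rules together, a structural sketch of one forward-search step; `Qbar`, `tau`, `P_obs`, `Vbar`, and `Vlow` are hypothetical hooks standing for the quantities named on this slide, not a fixed API:

```python
def width(b, Vbar, Vlow):
    """Uncertainty at belief b: gap between the upper and lower bounds."""
    return Vbar(b) - Vlow(b)

def next_belief(b, actions, observations, Qbar, tau, P_obs, Vbar, Vlow):
    """One step of HSVI's forward search from belief b."""
    # a*: the action with the greatest upper-bound value
    # (the choice that guarantees convergence).
    a_star = max(actions, key=lambda a: Qbar(b, a))
    # o*: the observation with the largest probability-weighted
    # uncertainty at the successor belief tau(b, a*, o).
    o_star = max(observations,
                 key=lambda o: P_obs(b, a_star, o)
                               * width(tau(b, a_star, o), Vbar, Vlow))
    return tau(b, a_star, o_star)
```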

Results of HSVI on Benchmark Problems

Results of HSVI on Benchmark Problems

Comparison between PBVI and HSVI.

Results of HSVI on Benchmark Problems

Conclusions

HSVI uses both an upper bound and a lower bound to approximate the value function; its heuristic search for the next belief point gives HSVI faster convergence.