Finding Approximate POMDP Solutions through Belief Compression
Based on slides by Nicholas Roy, MIT.


Reliable Navigation
Conventional trajectories may not be robust to localisation error.
[Figure: estimated robot position, robot position distribution, true robot position, goal position]

Perception and Control
World state → Perception → Control algorithms

Perception and Control
Assume full observability: world state → probabilistic perception model P(x) → argmax P(x) → control. Brittle.
Exact POMDP planning: world state → probabilistic perception model P(x) → control over P(x). Intractable.

Perception and Control
Assuming full observability is brittle; exact POMDP planning is intractable.
Instead: world state → probabilistic perception model P(x) → compressed P(x) → control.

Main Insight
World state → probabilistic perception model P(x) → low-dimensional P(x) → control
Good policies for real-world POMDPs can be found by planning over low-dimensional representations of the belief space.

Belief Space Structure
The controller may be globally uncertain, but not usually.

Coastal Navigation
Represent beliefs compactly, then discretise into a low-dimensional belief-space MDP.

Coastal Navigation

A Hard Navigation Problem
[Plot: average distance to goal vs. distance in m]

Dimensionality Reduction
Principal Components Analysis: original beliefs ≈ weights × characteristic beliefs.

Principal Components Analysis
Given a belief b ∈ ℝⁿ, we want b̃ ∈ ℝᵐ with m ≪ n.
[Plot: a collection of beliefs drawn from a 200-state problem; probability of being in each state vs. state]
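
As a concrete illustration (not part of the original slides), a minimal PCA compression of a set of belief vectors might look as follows; the toy 200-state beliefs, the helper names, and the choice of m = 9 are all assumptions made for the example.

```python
# Minimal sketch: PCA compression of a set of belief vectors (one per row).
import numpy as np

def pca_compress(B, m):
    """Compress beliefs B (k x n) to m weights each."""
    mean = B.mean(axis=0)
    # Characteristic beliefs = top-m right singular vectors of the centred data.
    _, _, Vt = np.linalg.svd(B - mean, full_matrices=False)
    bases = Vt[:m]                      # (m, n) characteristic beliefs
    weights = (B - mean) @ bases.T      # (k, m) low-dimensional representation
    return mean, bases, weights

def pca_reconstruct(mean, bases, weights):
    return mean + weights @ bases       # (k, n) approximate beliefs

# Toy data: beliefs over a 200-state problem, each a normalised Gaussian bump.
rng = np.random.default_rng(0)
states = np.arange(200)
B = np.stack([np.exp(-0.5 * ((states - c) / 5.0) ** 2)
              for c in rng.uniform(20, 180, 500)])
B /= B.sum(axis=1, keepdims=True)

mean, bases, W = pca_compress(B, m=9)
B_hat = pca_reconstruct(mean, bases, W)
print("mean reconstruction error:", np.abs(B - B_hat).mean())
```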

Principal Components Analysis
Given a belief b ∈ ℝⁿ, we want b̃ ∈ ℝᵐ with m ≪ n.
[Plot: one sample distribution and its reconstruction with m = 9 bases; probability of being in each state vs. state]

Principal Components Analysis
Many real-world POMDP distributions are characterised by large regions of low probability.
Idea: use a fitting criterion that is (exponentially) stronger in low-probability regions (E-PCA).
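
A minimal sketch of the E-PCA fitting criterion (not from the original slides): beliefs are approximated as exp(U·V), and the per-entry loss exp(z) − b·z is much stronger than squared error where the target probability b is small. The plain gradient-descent fit and all names here are illustrative assumptions; the actual method uses a more careful optimisation.

```python
# Sketch of E-PCA with an exponential link: B >= 0 is approximated by exp(U @ V).
import numpy as np

def epca_fit(B, m, steps=2000, lr=1e-2, seed=0):
    """Fit U (k x m) and V (m x n) so that exp(U @ V) approximates B.

    Per-entry loss: exp(z) - b * z with z = (U @ V)[i, j]; its gradient
    w.r.t. z is exp(z) - b, which punishes overestimating low-probability
    entries far more than a squared-error criterion would.
    """
    rng = np.random.default_rng(seed)
    k, n = B.shape
    U = 0.01 * rng.standard_normal((k, m))
    V = 0.01 * rng.standard_normal((m, n))
    for _ in range(steps):
        R = np.exp(U @ V) - B        # gradient of the loss w.r.t. U @ V
        U -= lr * (R @ V.T) / n
        V -= lr * (U.T @ R) / k
    return U, V

def epca_reconstruct(U, V):
    B_hat = np.exp(U @ V)
    return B_hat / B_hat.sum(axis=1, keepdims=True)   # renormalise to beliefs
```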

Example E-PCA
[Plot: reconstructions with 1, 2, 3, and 4 bases; probability of being in each state vs. state]

Example Reduction

Finding Dimensionality
E-PCA will indicate the appropriate number of bases, depending on the beliefs encountered.
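
One hedged way to read "E-PCA will indicate the number of bases" in code: grow m until the reconstruction error of the training beliefs falls below a tolerance. The fit/reconstruct arguments (for example the epca_fit and epca_reconstruct sketches above) and the threshold are assumptions, not the thesis procedure.

```python
import numpy as np

def choose_num_bases(B, fit, reconstruct, max_m=10, tol=1e-3):
    """Increase m until the mean absolute reconstruction error drops below tol."""
    for m in range(1, max_m + 1):
        U, V = fit(B, m)
        err = np.abs(B - reconstruct(U, V)).mean()
        if err < tol:
            return m, err
    return max_m, err
```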

Planning
Original POMDP (states s₁, s₂, s₃) → E-PCA → low-dimensional belief space B̃ → discretise → discrete belief-space MDP

Model Parameters
Reward function R(b̃): back-project b̃ to the high-dimensional belief b over the states s₁, s₂, s₃, then compute the expected reward from the belief: R(b̃) = Σₛ b(s) R(s).

Model Parameters
Transition function, moving between the low-dimensional and full-dimensional spaces:
1. For each belief b̃ᵢ and action a
2. Recover the full belief bᵢ
3. Propagate according to the action
4. Propagate according to the observation to obtain bⱼ
5. Recover b̃ⱼ
6. Set T(b̃ᵢ, a, b̃ⱼ) to the probability of the observation
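
A rough sketch of how the reward and transition parameters of the discrete belief MDP could be assembled from the six steps above. Everything passed in — recover, compress, the POMDP transition matrices T[a], the observation likelihood vectors O[a][z], the state reward vector R_s, and the nearest grid-point lookup — is an assumed interface for illustration, not code from the thesis.

```python
import numpy as np

def belief_reward(b_low, recover, R_s):
    """R(b~): back-project to the full belief and take the expected reward."""
    b = recover(b_low)
    return float(b @ R_s)

def belief_transitions(b_low_grid, action, recover, compress, T, O, nearest):
    """Build T(b~_i, a, b~_j) for one action over a grid of low-dim beliefs."""
    k = len(b_low_grid)
    Trans = np.zeros((k, k))
    for i, b_low in enumerate(b_low_grid):
        b = recover(b_low)                       # 2. recover the full belief b_i
        b_pred = T[action].T @ b                 # 3. propagate through the action
        for z in range(len(O[action])):
            unnorm = O[action][z] * b_pred       # 4. weight by observation likelihoods
            p_z = unnorm.sum()                   #    probability of seeing z
            if p_z <= 0.0:
                continue
            b_next_low = compress(unnorm / p_z)  # 5. recover b~_j
            j = nearest(b_next_low, b_low_grid)  #    snap to the nearest grid belief
            Trans[i, j] += p_z                   # 6. T(b~_i, a, b~_j) += P(z)
    return Trans
```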

Robot Navigation Example
[Figure: true (hidden) robot position, goal position, goal state, initial distribution]

Robot Navigation Example
[Figure: true robot position, goal position]

Policy Comparison
[Plot: average distance to goal vs. distance in m; E-PCA policy with 6 bases]

People Finding

People Finding as a POMDP
The robot's position is fully observable; the person's position is unknown.
[Figure: robot position, true person position]

Finding and Tracking People
[Figure: robot position, true person position]

People Finding as a POMDP
Factored belief space:
2 dimensions: fully-observable robot position
6 dimensions: distribution over person positions
A regular grid over this space gives a prohibitively large number of states.

Variable Resolution
Non-regular grid using sampled beliefs b̃₁, …, b̃₅, with transitions such as T(b̃₁, a₁, b̃₂) and T(b̃₁, a₂, b̃₅).
Compute model parameters using nearest-neighbour.
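
The nearest-neighbour lookup assumed in the transition sketch above could be as simple as a Euclidean snap to the closest sampled grid belief (illustrative only):

```python
import numpy as np

def nearest(b_low, b_low_grid):
    """Index of the grid belief closest to b_low (Euclidean distance)."""
    dists = np.linalg.norm(np.asarray(b_low_grid) - np.asarray(b_low), axis=1)
    return int(np.argmin(dists))
```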

Refining the Grid
Sample beliefs b̃′ according to the current policy, construct a new model, and keep a new belief b̃′₁ if V(b̃′₁) > V(b̃₁).
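
One possible reading of this refinement rule, sketched with hypothetical helpers (sample_beliefs, build_model, solve): keep a belief visited under the current policy only if, after re-solving the refined model, its value exceeds the value of the existing grid point it is nearest to. This is an interpretation for illustration, not the thesis algorithm verbatim.

```python
def refine_grid(grid, policy, sample_beliefs, build_model, solve):
    """Grow the belief grid with policy-visited beliefs that improve the value."""
    for b_new in sample_beliefs(policy):          # beliefs visited while executing the policy
        i = nearest(b_new, grid)                  # existing grid belief b~_1 near b~'_1
        v_old = solve(build_model(grid))[i]       # V(b~_1) under the current model
        new_grid = grid + [b_new]
        v_new = solve(build_model(new_grid))[-1]  # V(b~'_1) under the refined model
        if v_new > v_old:                         # keep b~'_1 only if it improves the value
            grid = new_grid
    return grid
```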

The Optimal Policy
[Figure: original distribution and its reconstruction using E-PCA with 6 bases; robot position, true person position]

E-PCA Policy Comparison
[Plot: average number of actions to find the person, comparing E-PCA (72 states), refined E-PCA (260 states), and the fully observable MDP]

Nick’s Thesis Contributions
Good policies for real-world POMDPs can be found by planning over a low-dimensional representation of the belief space, using E-PCA.
POMDPs can scale to bigger, more complicated real-world problems.
POMDPs can be used on real, deployed robots.