1 Partially Observable Markov Decision Processes (Chapters 15 & 16). José Luis Peralta, TKK Automation Technology Laboratory.

2 Contents: POMDP; Example POMDP; Finite World POMDP algorithm; Practical Considerations; Approximate POMDP Techniques.

3 Partially Observable Markov Decision Processes (POMDP). POMDPs handle uncertainty both in the measurements of the state and in the effects of the controls. We adapt the previous value iteration algorithm (VIA) to this setting.

4 POMDP: the world cannot be sensed directly; measurements are incomplete, noisy, etc. (partial observability). The robot therefore has to estimate a posterior distribution over possible world states.

5 POMDP: an algorithm exists to find the optimal control policy for a FINITE world, i.e. when the state space, the action space, the space of observations, and the planning horizon are all finite. The computation is complex; for the continuous case there are approximations.

6 The algorithms we are going to study are all based on value iteration (VI), as before, but now the state is not observable: the robot has to make decisions in the BELIEF STATE, its internal knowledge about the state of the environment, i.e. the space of posterior distributions over states.

7 So the control policy now maps beliefs to controls.

8 Belief bel. Each value in a POMDP is a function of the entire probability distribution. Problems: a finite state space gives a continuous belief space, and a continuous state space gives an infinite-dimensional belief continuum. There is also complexity in calculating the value function, because of the integral over the whole distribution.

9 In the end, an optimal solution exists for an interesting special case, the finite world: state space, action space, space of observations, and planning horizon all finite. The solutions of the value function are then piecewise linear functions over the belief space. This arises because expectation is a linear operation, combined with the ability to select different controls in different parts of the belief space.
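To make the piecewise-linear representation concrete, here is a minimal sketch in Python; the alpha-vector values are invented for illustration and are not the slides' numbers:

```python
import numpy as np

# A minimal sketch of the piecewise-linear convex value function for a
# finite world: V(b) is the maximum over a finite set of linear functions
# ("alpha vectors") of the belief.
alpha_vectors = np.array([
    [-100.0, 100.0],   # linear payoff of one action over (b(x1), b(x2))
    [ 100.0,  -50.0],  # another action
    [  -1.0,   -1.0],  # e.g. a sensing action with a small fixed cost
])

def value(belief):
    """V(b) = max_k alpha_k . b, a piecewise linear convex function."""
    return float(np.max(alpha_vectors @ belief))

print(value(np.array([0.3, 0.7])))   # evaluate V at the belief b(x1) = 0.3
```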

10 Example POMDP. 2 states; 3 control actions.

11 Example POMDP. Executing an action yields a payoff. Dilemma: the payoffs are opposite in each state, so knowledge of the state translates directly into payoff.

12 Example POMDP. To acquire knowledge, the robot has a control that affects the state of the world in a non-deterministic manner (with a cost of waiting, a cost of sensing, etc.).

13 Example POMDP. Benefit: before each control decision, the robot can sense. By sensing, the robot gains knowledge about the state, makes better control decisions, and raises its payoff expectation. With this control action, the robot senses without taking a terminal action.

14 Example POMDP. The measurement model is governed by the following probability distribution:

15 Example POMDP. With only 2 states, this example is easy to graph over the belief space.

16 Example POMDP. Control policy: a function that maps the unit interval [0,1] to the space of all actions. Example:

17 Example POMDP – Control Choice. Control choice (when to execute which control?): first consider the immediate payoff. The payoff is now a function of the belief state, so the expected payoff is the belief-weighted average of the per-state payoffs.
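Written out for the two-state case (this is the standard POMDP identity the slide refers to, with p1 denoting b(x1)):

```latex
% Expected immediate payoff as a linear function of the belief p_1 = b(x_1):
r(b, u) \;=\; E_x\!\left[ r(x, u) \right]
        \;=\; \sum_x b(x)\, r(x, u)
        \;=\; p_1\, r(x_1, u) + (1 - p_1)\, r(x_2, u)
```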

18 Example POMDP – Control Choice.

19 Example POMDP – Control Choice.

20 Example POMDP – Control Choice.

21–22 Example POMDP – Control Choice. First we calculate the expected payoff of each action; the robot simply selects the action with the highest expected payoff. The result is a piecewise linear, convex function: the maximum of the individual payoff functions.

23 Example POMDP – Control Choice. First we calculate the expected payoff; the robot selects the action with the highest expected payoff. A transition in the optimal policy occurs at the belief value where the maximizing payoff function changes.

24 Example POMDP – Sensing. Now we add perception: what if the robot can sense before it chooses a control, and how does that affect the optimal value function? Sensing yields information about the state and enables a better choice of control action. In the previous example we computed the expected payoff; how much better will it be after sensing?

25 Example POMDP – Control Choice. The belief after sensing, as a function of the belief before sensing, is given by Bayes rule.
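A small sketch of that update in Python; the 0.7/0.3 measurement model is an assumed placeholder, not necessarily the slides' values:

```python
import numpy as np

# Bayes-rule belief update: belief after sensing measurement z, as a
# function of the belief before sensing.
p_z_given_x = {
    "z1": np.array([0.7, 0.3]),   # p(z1 | x1), p(z1 | x2)  (assumed)
    "z2": np.array([0.3, 0.7]),
}

def bayes_update(belief, z):
    unnormalized = p_z_given_x[z] * belief   # p(z | x) * b(x)
    p_z = unnormalized.sum()                 # normalizer p(z | b)
    return unnormalized / p_z, p_z

b_after, p_z = bayes_update(np.array([0.5, 0.5]), "z1")
print(b_after, p_z)                          # [0.7, 0.3], 0.5
```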

26 Example POMDP – Control Choice. How does this affect the value function?

27 Example POMDP – Control Choice. Mathematically, this is just replacing the belief by the Bayes-updated belief in the value function.

28 Example POMDP – Control Choice. However, our interest is the complete expected value function after sensing, which also accounts for the probability of sensing each possible measurement. It is given by the expectation, over measurements, of the value at the updated belief.
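As a sketch, that expectation over measurements can be computed like this (toy measurement model and alpha vectors assumed):

```python
import numpy as np

# Expected value after sensing: E_z[V(b')] = sum_z p(z | b) * V(b'_z).
p_z_given_x = np.array([[0.7, 0.3],    # p(z1 | x1), p(z1 | x2)
                        [0.3, 0.7]])   # p(z2 | x1), p(z2 | x2)
alpha_vectors = np.array([[-100.0, 100.0],
                          [ 100.0,  -50.0],
                          [  -1.0,   -1.0]])

def V(belief):
    return np.max(alpha_vectors @ belief)

def expected_value_after_sensing(belief):
    total = 0.0
    for z_row in p_z_given_x:
        unnorm = z_row * belief
        p_z = unnorm.sum()              # probability of this measurement
        total += p_z * V(unnorm / p_z)  # value at the Bayes-updated belief
    return total

print(expected_value_after_sensing(np.array([0.5, 0.5])))
```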

29 Example POMDP – Control Choice. And this results in:

30 Example POMDP – Control Choice. Mathematically:

31 Example POMDP – Prediction. To plan at a horizon larger than one, we have to take the state transition into consideration and project our value function accordingly, according to our transition probability model. In between, the expectation is linear.
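A sketch of that projection; the 0.8/0.2 transition matrix below is an assumed placeholder for the state-changing control of the example:

```python
import numpy as np

# Belief prediction through the transition model:
# b'(x') = sum_x p(x' | u, x) b(x).
T_u3 = np.array([[0.2, 0.8],    # p(x1' | x1, u3), p(x1' | x2, u3)
                 [0.8, 0.2]])   # p(x2' | x1, u3), p(x2' | x2, u3)

def predict(belief, T):
    # The projection is linear in the belief, which is why the projected
    # value function remains piecewise linear.
    return T @ belief

print(predict(np.array([0.9, 0.1]), T_u3))   # -> [0.26, 0.74]
```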

32 Example POMDP – Prediction. And this results in:

33 Example POMDP – Prediction. Adding the remaining payoff terms (formulas on the slide), we have:

34 Example POMDP – Prediction. Mathematically, with the cost term included:

35 Example POMDP – Pruning. A full backup is impractical: the number of linear functions explodes with the planning horizon. Pruning, and efficient approximate POMDP techniques, are needed.
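A cheap sketch of one pruning ingredient: dropping pointwise-dominated linear functions. Exact pruning also removes vectors that are jointly dominated everywhere on the simplex, which needs a linear program and is omitted here:

```python
import numpy as np

# Discard any linear function that is pointwise dominated by another one,
# so the value function keeps only pieces that can ever attain the max.
def prune_pointwise(alphas):
    kept = []
    for i, a in enumerate(alphas):
        dominated = any(np.all(b >= a) and np.any(b > a)
                        for j, b in enumerate(alphas) if j != i)
        if not dominated:
            kept.append(a)
    return np.array(kept)

alphas = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
print(prune_pointwise(alphas))   # the third vector is dominated and dropped
```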

36–37 Finite World POMDP algorithm. To understand this, read the mathematical derivation of POMDPs, pp. 531–536 in [1].

38 Example POMDP – Practical Considerations. It looks easy, so let's try something more "real": the probabilistic robot "RoboProb".

39 Example POMDP – Practical Considerations. RoboProb: 11 states; 5 control actions, one of which senses without moving. Transition model: the figure's probabilities 0.8 and 0.1 give the chances of the intended motion and of the off-course alternatives.

40 Example POMDP – Practical Considerations. RoboProb's "reward" is the payoff; the same set is used for all control actions. Example:

41 Example POMDP – Practical Considerations. It's getting kind of hard... RoboProb transition probability example (figure; probabilities 0.8 and 0.1).

42 Example POMDP – Practical Considerations. RoboProb transition probability example, continued (figure).

43 Example POMDP – Practical Considerations. RoboProb measurement probability (figure).

44 Example POMDP – Practical Considerations. RoboProb belief states: with 11 states the belief simplex is 10-dimensional, impossible to graph!

45 Example POMDP – Practical Considerations. Each linear function results from executing a control, observing a measurement, and then executing another control.

46 Example POMDP – Practical Considerations. Implementation steps: defining the measurement probability; defining the "reward" (payoff); defining the transition probability; merging the transition (control) probabilities.

47 Example POMDP – Practical Considerations. Setting beliefs; executing sensing; executing controls.

48 Example POMDP – Practical Considerations. Now what? When calculating the value function, the real problem is to compute the expectation over the next belief.

49 Example POMDP – Practical Considerations. Given a belief and a control action, the outcome is a distribution over distributions, because the next belief also depends on the next measurement, and the measurement itself is generated stochastically. The key factor in this update is the conditional probability of the next belief given the control and the current belief, which specifies a distribution over probability distributions.

50 Example POMDP – Practical Considerations. Conditioned on a specific measurement, however, the next belief is deterministic, so the distribution over next beliefs contains only one non-zero term per measurement.

51 Example POMDP – Practical Considerations. Arriving at: we just integrate over measurements instead of over next beliefs. Because our space is finite, the integral becomes a sum over measurements, weighted by p(z | u, b).
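As a sketch, the finite-world measurement probability p(z | u, b) reduces to two matrix-vector products; the model matrices below are illustrative placeholders:

```python
import numpy as np

# p(z | u, b) = sum_{x'} p(z | x') sum_x p(x' | u, x) b(x)
T = np.array([[0.8, 0.1],    # p(x1' | x1, u), p(x1' | x2, u)
              [0.2, 0.9]])   # p(x2' | x1, u), p(x2' | x2, u)
Z = np.array([[0.7, 0.3],    # p(z1 | x1'), p(z1 | x2')
              [0.3, 0.7]])   # p(z2 | x1'), p(z2 | x2')

def measurement_probabilities(belief, T, Z):
    b_pred = T @ belief      # predicted belief after control u
    return Z @ b_pred        # one probability per measurement z

print(measurement_probabilities(np.array([0.5, 0.5]), T, Z))
```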

52 Example POMDP – Practical Considerations. In the end we have a complete update, but this VIA is far from practical: for any reasonable number of distinct states, measurements, and controls, the complexity of the value function is prohibitive, even for relatively benign planning horizons. Hence the need for approximations.

53 Approximate POMDP Techniques. Here we have three approximate probabilistic planning and control algorithms: QMDP, AMDP, and MC-POMDP, with varying degrees of practical applicability. All three rely on approximations of the POMDP value function; they differ in the nature of their approximations.

54 Approximate POMDP Techniques – QMDP. The QMDP framework considers uncertainty only for a single action choice: it assumes that after the immediate next control action, the state of the world suddenly becomes observable. Full observability makes it possible to use the MDP-optimal value function. QMDP generalizes the MDP value function to belief spaces through the mathematical expectation operator. Planning in QMDP is as efficient as in MDPs, but the value function generally overestimates the true value of a belief state.
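A minimal QMDP sketch on an invented two-state, two-action model (all numbers are assumptions for illustration):

```python
import numpy as np

# QMDP: solve the fully observable MDP for Q(x, u), then score actions in
# belief space by the expectation sum_x b(x) Q(x, u).
gamma = 0.9
R = np.array([[ 1.0, -1.0],    # R[x, u]: immediate payoff
              [-1.0,  1.0]])
T = np.array([[[0.9, 0.1],     # T[u, x, x'] = p(x' | x, u)
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.5, 0.5]]])

Q = np.zeros((2, 2))
for _ in range(200):                      # MDP value iteration
    V = Q.max(axis=1)                     # V(x) = max_u Q(x, u)
    Q = R + gamma * np.einsum('uxy,y->xu', T, V)

def qmdp_action(belief):
    # Expected Q under the belief; this ignores future uncertainty, which
    # is exactly why QMDP overestimates the value of uncertain beliefs.
    return int(np.argmax(belief @ Q))

print(qmdp_action(np.array([0.8, 0.2])))
```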

55 Approximate POMDP Techniques – QMDP Algorithm. The QMDP framework considers uncertainty only for a single action choice.

56 Approximate POMDP Techniques – AMDP. The augmented MDP (AMDP) maps the belief into a lower-dimensional representation, over which it then performs exact value iteration. The "classical" representation consists of the most likely state under a belief, along with the belief entropy. AMDPs are like MDPs with one added dimension in the state representation that measures the global degree of uncertainty. To implement an AMDP, it is necessary to learn the state transition model and the reward function in the low-dimensional belief space, as sketched below.
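A sketch of the belief-compression step, assuming a histogram belief; the entropy discretization and the bin count are arbitrary choices of this sketch, not the book's:

```python
import numpy as np

# AMDP belief compression: map the full belief to the augmented state
# (most likely state, discretized belief entropy), over which exact
# value iteration is then run.
def amdp_state(belief, entropy_bins=8):
    most_likely = int(np.argmax(belief))
    p = belief[belief > 0]
    entropy = float(-(p * np.log(p)).sum())
    # Discretize entropy so the augmented state space stays finite.
    h_max = np.log(len(belief))
    h_bin = min(int(entropy / h_max * entropy_bins), entropy_bins - 1)
    return most_likely, h_bin

print(amdp_state(np.array([0.7, 0.2, 0.1])))   # -> (0, 5)
```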

57 Approximate POMDP Techniques – AMDP. The "classical" representation consists of the most likely state under a belief, along with the belief entropy.

58 Approximate POMDP Techniques – AMDP.

59 Approximate POMDP Techniques – AMDP. The application of AMDPs to mobile robot navigation is called coastal navigation. It anticipates uncertainty and selects motions that trade off overall path length against the uncertainty accrued along the path. The resulting trajectories differ significantly from any non-probabilistic solution: being temporarily lost is acceptable if the robot can later re-localize with sufficiently high probability.

60 Approximate POMDP Techniques – AMDP Algorithm.

61 Approximate POMDP Techniques – AMDP.

62 Approximate POMDP Techniques – MC-POMDP. The Monte Carlo POMDP (MC-POMDP) is the particle-filter version of POMDPs: it calculates a value function defined over sets of particles. MC-POMDP uses a local learning technique, a locally weighted learning rule combined with a proximity test based on KL divergence, and then applies Monte Carlo sampling to implement an approximate value backup. The resulting algorithm is a full-fledged POMDP algorithm whose computational complexity and accuracy are both functions of the parameters of the learning algorithm.
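Two of these ingredients can be sketched as follows; representing the particle belief as a smoothed histogram for the KL test is an assumption of this sketch, not the book's exact procedure:

```python
import numpy as np

# MC-POMDP ingredients: a particle-set belief and the KL-divergence
# proximity test used by the locally weighted learning rule.
def particles_to_histogram(particles, n_states, eps=1e-3):
    hist = np.bincount(particles, minlength=n_states).astype(float)
    hist += eps                       # smooth so the KL stays finite
    return hist / hist.sum()

def kl_divergence(p, q):
    return float(np.sum(p * np.log(p / q)))

b1 = particles_to_histogram(np.array([0, 0, 1, 1, 1]), n_states=3)
b2 = particles_to_histogram(np.array([0, 1, 1, 2, 2]), n_states=3)
print(kl_divergence(b1, b2))   # small value => beliefs count as "nearby"
```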

63 Approximate POMDP Techniques – MC-POMDP. A particle set represents the belief; the value function is defined over such particle sets (figure).

64 Approximate POMDP Techniques – MC-POMDP Algorithm.

65 Approximate POMDP Techniques – MC-POMDP.

66 References and Links.
References:
[1] S. Thrun, W. Burgard, D. Fox. Probabilistic Robotics. MIT Press, 2005.
Links:
http://en.wikipedia.org/wiki/Partially_observable_Markov_decision_process
http://www.cs.cmu.edu/~trey/zmdp/
http://www.cassandra.org/pomdp/index.shtml
http://www.cs.duke.edu/~mlittman/topics/pomdp-page.html

67 Exercise. Exercise 1 in [1], Chapter 15. A person faces two doors. Behind one is a tiger; behind the other, a reward of +10. The person can either listen or open one of the doors. Opening the door with the tiger means being eaten, with an associated payoff of -20; listening has a payoff of -1. When listening, the person hears a roaring noise that indicates the presence of the tiger, but only with probability 0.85 will the person localize the noise correctly; with probability 0.15 the noise will appear to come from the door hiding the reward. Your questions: (a) Provide the formal model of the POMDP, in which you define the state, action, and measurement spaces, the payoff function, and the associated probability functions. (b) What is the expected cumulative payoff of the open-loop action sequence "listen, listen, open door 1"? Explain your calculation. (c) What is the expected cumulative payoff of the open-loop action sequence "listen, then open the door for which we did not hear a noise"? Again, explain your calculation.
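As a starting point for part (a), one possible formalization of the model in Python; the payoff sign conventions follow the exercise text, and the calculations for (b) and (c) are deliberately left to the reader:

```python
import numpy as np

# Tiger POMDP model definition (exercise 1, chapter 15 of [1]).
states = ["tiger_left", "tiger_right"]
actions = ["listen", "open_left", "open_right"]
measurements = ["hear_left", "hear_right"]

def reward(x, u):
    if u == "listen":
        return -1.0
    # Opening the tiger's door costs -20; the other door pays +10.
    opened_tiger = (u == "open_left") == (x == "tiger_left")
    return -20.0 if opened_tiger else 10.0

# p(z | x) while listening: correct localization with probability 0.85.
p_z_given_x = {("hear_left",  "tiger_left"):  0.85,
               ("hear_right", "tiger_left"):  0.15,
               ("hear_left",  "tiger_right"): 0.15,
               ("hear_right", "tiger_right"): 0.85}

print(reward("tiger_left", "open_right"))   # -> 10.0
```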

