Deciding Under Probabilistic Uncertainty
Russell and Norvig: Sect. 17.1–3, Chap. 17
CS121 – Winter 2003
Non-deterministic vs. Probabilistic Uncertainty
Non-deterministic model: an action may lead to any outcome in {a, b, c}; choose the decision that is best for the worst case (~ adversarial search).
Probabilistic model: the outcomes are {a(pa), b(pb), c(pc)}; choose the decision that maximizes the expected utility value.
One State / One Action Example
The single action from s0 leads to three possible states (s1, s2, s3), with outcome probabilities 0.2, 0.7, and 0.1 and corresponding state utilities 100, 50, and 70.
U(s0) = 100 × 0.2 + 50 × 0.7 + 70 × 0.1 = 20 + 35 + 7 = 62
One State / Two Actions Example
From s0, action A1 leads to s1, s2, s3 as before. Action A2 leads to a new state s4 (utility 80) with probability 0.8 and to the utility-50 state with probability 0.2.
U1(s0) = 62,  U2(s0) = 0.2 × 50 + 0.8 × 80 = 74
U(s0) = max{U1(s0), U2(s0)} = 74
Introducing Action Costs
If A1 costs 5 and A2 costs 25:
U1(s0) = 62 - 5 = 57,  U2(s0) = 74 - 25 = 49
U(s0) = max{U1(s0), U2(s0)} = 57
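A minimal numeric sketch of this computation (the probabilities, utilities, and costs are read off the example figures; the helper name expected_utility is illustrative):

```python
# Expected utility of an action = sum of (probability x outcome utility) minus the action cost.
def expected_utility(outcomes, cost=0.0):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes) - cost

# Action A1: outcome probabilities 0.2, 0.7, 0.1 with utilities 100, 50, 70; cost 5
u1 = expected_utility([(0.2, 100), (0.7, 50), (0.1, 70)], cost=5)   # 62 - 5 = 57
# Action A2: outcome probabilities 0.2, 0.8 with utilities 50, 80; cost 25
u2 = expected_utility([(0.2, 50), (0.8, 80)], cost=25)              # 74 - 25 = 49

print(round(max(u1, u2), 2))   # U(s0) = 57.0
```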
Example: Finding Juliet
A robot, Romeo, is in Charles' office and must deliver a letter to Juliet.
Juliet is either in her office or in the conference room. Without other prior knowledge, each possibility has probability 0.5.
The robot's goal is to minimize the time spent in transit.
(Map: Charles' office, Juliet's office, and the conference room, with travel times of 10 min and 5 min between locations.)
Example: Finding Juliet
States:
S0: Romeo in Charles' office
S1: Romeo in Juliet's office and Juliet here
S2: Romeo in Juliet's office and Juliet not here
S3: Romeo in conference room and Juliet here
S4: Romeo in conference room and Juliet not here
Actions:
GJO (go to Juliet's office)
GCR (go to conference room)
Utility Computation
(Decision tree from Charles' office: actions GJO and GCR lead to outcome states 1–4; travel times of 5 min and 10 min between locations; the leaf values shown are -10, -10, -10, and -15.)
n-Step Decision Process
There is a single initial state.
States reached after i steps are all different from those reached after j ≠ i steps.
Each state i has a reward R(i).
Each state reached after n steps is terminal.
The goal is to maximize the sum of rewards.
n-Step Decision Process
Utility of state i: U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)
Example with two possible actions from i, a1 and a2:
a1 leads to k11 with probability P11 or to k12 with probability P12, where P11 = P(k11 | a1.i) and P12 = P(k12 | a1.i).
a2 leads to k21 with probability P21 or to k22 with probability P22.
Then U(i) = R(i) + max{P11 U(k11) + P12 U(k12), P21 U(k21) + P22 U(k22)}.
n-Step Decision Process
Utility of state i: U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)
Best choice of action at state i: Π*(i) = arg max_a Σ_k P(k | a.i) U(k)
Optimal policy:
For j = n-1, n-2, …, 0 do:
  For every state si attained after step j:
    Compute the utility of si.
    Label that state with the corresponding best action.
n-Step Decision Process … with costs on actions instead of rewards on states
Utility of state i: U(i) = max_a (Σ_k P(k | a.i) U(k) - C_a)
Best choice of action at state i: Π*(i) = arg max_a (Σ_k P(k | a.i) U(k) - C_a)
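The backward computation on these slides can be written compactly. Below is a minimal sketch assuming the process is given as a tree of state dictionaries; the data layout is illustrative, and the example numbers mirror the earlier action-cost figure:

```python
# Backward induction over an n-step decision process with action costs:
# U(i) = max_a ( sum_k P(k | a.i) U(k) - C_a ), computed from the leaves up.
def compute_utilities(state):
    if not state['actions']:                     # terminal state: utility is given
        return state['utility']
    best_value, best_action = float('-inf'), None
    for action, (cost, outcomes) in state['actions'].items():
        value = sum(p * compute_utilities(nxt) for p, nxt in outcomes) - cost
        if value > best_value:
            best_value, best_action = value, action
    state['utility'], state['best_action'] = best_value, best_action   # label the state
    return best_value

def leaf(u):
    return {'actions': {}, 'utility': u}

# One-step example mirroring the action-cost slide (A1 costs 5, A2 costs 25):
s0 = {'actions': {'A1': (5,  [(0.2, leaf(100)), (0.7, leaf(50)), (0.1, leaf(70))]),
                  'A2': (25, [(0.2, leaf(50)), (0.8, leaf(80))])}}
print(round(compute_utilities(s0), 2), s0['best_action'])   # 57.0 A1
```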
(Figure: the Finding-Juliet decision tree, with actions GJO and GCR from the initial state leading to outcome states 1–4.)
(Figure: the same decision tree with the best action, GJO or GCR, labeled at each decision node, i.e., the optimal policy.)
Target Tracking
The robot must keep the target in view.
The target's trajectory is not known in advance.
The environment may or may not be known.
States Are Indexed by Time
State = (robot-position, target-position, time), e.g. ([i,j], [u,v], t).
Actions: stop, up, down, right, left.
The target also moves in one of 5 ways, each with probability 0.2, so an action has 5 possible outcome states. For example, after the robot moves right, the successors of ([i,j], [u,v], t) are ([i+1,j], [u,v], t+1), ([i+1,j], [u-1,v], t+1), ([i+1,j], [u+1,v], t+1), ([i+1,j], [u,v-1], t+1), and ([i+1,j], [u,v+1], t+1).
With 5 actions and 5 target moves, each state has 25 successors.
h-Step Planning Process
Planning horizon h.
"Terminal" states: states where the target is not visible, and states at depth h.
Reward function: +1 if the target is visible, 0 if it is not, discounted with depth: R(state) = a^t, where 0 < a < 1.
Maximizing the sum of rewards ~ maximizing the escape time.
h-Step Planning Process
Planning horizon h.
The planner computes the optimal policy over this tree of states, but only the first step of the policy is executed. Then the whole computation is repeated (sliding horizon).
h-Step Planning Process
Planning horizon h.
h is chosen so that the optimal policy over the tree can be computed within one increment of time.
Example With No Planner
Example With Planner
Other Example
h-Step Planning Process
Planning horizon h.
The optimal policy over this tree is not the optimal policy that would have been computed if a prior model of the environment had been available, along with an arbitrarily fast computer.
Simple Robot Navigation Problem
In each state, the possible actions are U, D, R, and L.
Probabilistic Transition Model
In each state, the possible actions are U, D, R, and L.
The effect of U is as follows (transition model):
With probability 0.8 the robot moves up one square (if the robot is already in the top row, it does not move).
With probability 0.1 the robot moves right one square (if the robot is already in the rightmost column, it does not move).
With probability 0.1 the robot moves left one square (if the robot is already in the leftmost column, it does not move).
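A minimal sketch of this transition model on a grid. The 4x3 grid size and the obstacle cell are assumptions carried over from the navigation example used later (and from the classic Russell & Norvig 4x3 world); the helper names are illustrative:

```python
# Transition model for action U: 0.8 up, 0.1 right, 0.1 left.
# If the intended move would leave the grid or hit the obstacle, the robot stays put.
WIDTH, HEIGHT = 4, 3
WALLS = {(2, 2)}                # assumed obstacle cell, as in the R&N 4x3 world

def move(pos, delta):
    x, y = pos[0] + delta[0], pos[1] + delta[1]
    if 1 <= x <= WIDTH and 1 <= y <= HEIGHT and (x, y) not in WALLS:
        return (x, y)
    return pos                  # blocked: the robot does not move

def transition_up(pos):
    """Return {successor: probability} for action U taken in state pos."""
    outcomes = {}
    for prob, delta in [(0.8, (0, 1)), (0.1, (1, 0)), (0.1, (-1, 0))]:
        nxt = move(pos, delta)
        outcomes[nxt] = outcomes.get(nxt, 0.0) + prob
    return outcomes

print(transition_up((3, 2)))    # {(3, 3): 0.8, (4, 2): 0.1, (3, 2): 0.1}
```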
Markov Property
The transition probabilities depend only on the current state, not on the previous history (i.e., not on how that state was reached).
Sequence of Actions
Planned sequence of actions: (U, R). U is executed.
Sequence of Actions
Planned sequence of actions: (U, R). U is executed, then R is executed.
Sequence of Actions
The robot is at [3,2]. Planned sequence of actions: (U, R).
Sequence of Actions
Planned sequence of actions: (U, R). After U is executed, the robot may be in [3,3], [4,2], or [3,2].
Histories
Planned sequence of actions: (U, R). U has been executed; R is executed.
Starting from [3,2], the possible final states are [3,1], [3,2], [3,3], [4,1], [4,2], and [4,3].
There are 9 possible sequences of states – called histories – and 6 possible final states for the robot!
Probability of Reaching the Goal
P([4,3] | (U,R).[3,2]) = P([4,3] | R.[3,3]) × P([3,3] | U.[3,2]) + P([4,3] | R.[4,2]) × P([4,2] | U.[3,2])
With P([4,3] | R.[3,3]) = 0.8, P([4,3] | R.[4,2]) = 0.1, P([3,3] | U.[3,2]) = 0.8, and P([4,2] | U.[3,2]) = 0.1:
P([4,3] | (U,R).[3,2]) = 0.8 × 0.8 + 0.1 × 0.1 = 0.65
Note the importance of the Markov property in this derivation.
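A quick check of this chaining, written out explicitly. The zero entry for [3,2] is an added observation: from [3,2], action R cannot reach [4,3] in one step, so that term contributes nothing:

```python
# P([4,3] | (U,R).[3,2]) by chaining one-step probabilities over the intermediate states.
# Markov property: P([4,3] | R.k) depends only on k, not on how k was reached.
p_after_U      = {(3, 3): 0.8, (4, 2): 0.1, (3, 2): 0.1}   # P(k | U.[3,2])
p_goal_after_R = {(3, 3): 0.8, (4, 2): 0.1, (3, 2): 0.0}   # P([4,3] | R.k)

p_goal = sum(p_after_U[k] * p_goal_after_R[k] for k in p_after_U)
print(round(p_goal, 2))   # 0.8*0.8 + 0.1*0.1 + 0.1*0.0 = 0.65
```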
Utility Function
The robot needs to recharge its batteries.
[4,3] provides a power supply; [4,2] is a sand area from which the robot cannot escape.
[4,3] and [4,2] are terminal states, with rewards +1 and -1 respectively.
Reward of a non-terminal state: -1/25.
Utility of a history: sum of the rewards of the traversed states.
Goal: maximize the utility of the history.
Histories are potentially unbounded, and the same state can be reached many times.
Utility of an Action Sequence
Consider the action sequence (U, R) from [3,2].
Utility of an Action Sequence
Consider the action sequence (U, R) from [3,2].
A run produces one among 7 possible histories, each with some probability (compared with the 9 histories counted earlier: [4,2] is now terminal, so the histories reaching it after the first step stop there).
Utility of an Action Sequence
Consider the action sequence (U, R) from [3,2].
A run produces one among 7 possible histories, each with some probability.
The utility of the sequence is the expected utility of the histories: U = Σ_h U_h P(h).
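A minimal sketch that enumerates these histories and computes U = Σ_h U_h P(h) for the blindly executed sequence (U, R); the grid layout, wall cell, and helper names are the same illustrative assumptions as in the earlier transition-model sketch:

```python
# Expected utility of executing the fixed sequence (U, R) from [3,2].
# Rewards: +1 at [4,3], -1 at [4,2] (both terminal), -1/25 for every other visited state.
WIDTH, HEIGHT, WALLS = 4, 3, {(2, 2)}
TERMINAL = {(4, 3): +1.0, (4, 2): -1.0}
STEP_REWARD = -1.0 / 25
MOVES = {'U': (0, 1), 'D': (0, -1), 'R': (1, 0), 'L': (-1, 0)}
SIDEWAYS = {'U': ('R', 'L'), 'D': ('R', 'L'), 'R': ('U', 'D'), 'L': ('U', 'D')}

def move(pos, d):
    x, y = pos[0] + d[0], pos[1] + d[1]
    return (x, y) if 1 <= x <= WIDTH and 1 <= y <= HEIGHT and (x, y) not in WALLS else pos

def outcomes(pos, a):
    b, c = SIDEWAYS[a]
    return [(0.8, move(pos, MOVES[a])), (0.1, move(pos, MOVES[b])), (0.1, move(pos, MOVES[c]))]

def sequence_utility(pos, actions):
    """Expected sum of rewards of the traversed states, averaging over all histories."""
    if pos in TERMINAL:
        return TERMINAL[pos]
    if not actions:
        return STEP_REWARD
    return STEP_REWARD + sum(p * sequence_utility(nxt, actions[1:])
                             for p, nxt in outcomes(pos, actions[0]))

print(round(sequence_utility((3, 2), ['U', 'R']), 3))
```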
Optimal Action Sequence
Consider the action sequence (U, R) from [3,2].
A run produces one among 7 possible histories, each with some probability.
The utility of the sequence is the expected utility of the histories.
The optimal sequence is the one with maximal utility.
Optimal Action Sequence
Consider the action sequence (U, R) from [3,2].
A run produces one among 7 possible histories, each with some probability.
The utility of the sequence is the expected utility of the histories.
The optimal sequence is the one with maximal utility.
But is the optimal action sequence what we want to compute? No! It is the right object only if the sequence is executed blindly (open-loop strategy).
Reactive Agent Algorithm (assumes an observable state)
Repeat:
  s ← sensed state
  If s is terminal then exit
  a ← choose action (given s)
  Perform a
Recall: Utility of state i: U(i) = max_a (Σ_k P(k | a.i) U(k) - C_a)
Best choice of action at state i: Π*(i) = arg max_a (Σ_k P(k | a.i) U(k) - C_a)
Policy (Reactive/Closed-Loop Strategy)
A policy Π is a complete mapping from states to actions.
Reactive Agent Algorithm
Repeat:
  s ← sensed state
  If s is terminal then exit
  a ← Π(s)
  Perform a
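A runnable sketch of this loop, simulating the sensed state with the same illustrative grid model; the policy used here is a deliberately naive placeholder, not the optimal policy discussed next:

```python
import random

# Reactive agent: sense the state, look up the policy, act, repeat until a terminal state.
WIDTH, HEIGHT, WALLS = 4, 3, {(2, 2)}
TERMINAL = {(4, 3): +1.0, (4, 2): -1.0}
MOVES = {'U': (0, 1), 'D': (0, -1), 'R': (1, 0), 'L': (-1, 0)}
SIDEWAYS = {'U': ('R', 'L'), 'D': ('R', 'L'), 'R': ('U', 'D'), 'L': ('U', 'D')}

def move(pos, d):
    x, y = pos[0] + d[0], pos[1] + d[1]
    return (x, y) if 1 <= x <= WIDTH and 1 <= y <= HEIGHT and (x, y) not in WALLS else pos

def sample_next(pos, a):
    b, c = SIDEWAYS[a]
    succ = [move(pos, MOVES[a]), move(pos, MOVES[b]), move(pos, MOVES[c])]
    return random.choices(succ, weights=[0.8, 0.1, 0.1])[0]

def policy(s):                        # placeholder policy: go up, then right along the top row
    return 'U' if s[1] < HEIGHT else 'R'

s = (1, 1)
while s not in TERMINAL:              # Repeat
    a = policy(s)                     #   a <- Pi(s)
    s = sample_next(s, a)             #   perform a; then sense the new state
print("Reached terminal state", s, "with reward", TERMINAL[s])
```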
Optimal Policy
A policy Π is a complete mapping from states to actions.
The optimal policy Π* is the one that always yields a history (ending at a terminal state) with maximal expected utility.
This makes sense because of the Markov property.
Note that [3,2] is a "dangerous" state that the optimal policy tries to avoid.
Optimal Policy
A policy Π is a complete mapping from states to actions.
The optimal policy Π* is the one that always yields a history with maximal expected utility.
This problem is called a Markov Decision Problem (MDP).
How to compute Π*?
Recall: Utility of state i: U(i) = max_a (Σ_k P(k | a.i) U(k) - C_a)
Best choice of action at state i: Π*(i) = arg max_a (Σ_k P(k | a.i) U(k) - C_a)
The trick used in target tracking (indexing states by time) could be applied here … but it would yield a very large tree and a sub-optimal policy.
First-Step Analysis
Simulate one step:
U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)
Π*(i) = arg max_a Σ_k P(k | a.i) U(k)   (Principle of Maximum Expected Utility)
What is the Difference?
In both cases: U(i) = R(i) + max_a Σ_k P(k | a.i) U(k) and Π*(i) = arg max_a Σ_k P(k | a.i) U(k).
But in the n-step process the utilities can be computed backward from the terminal states in a single pass, whereas here the state-transition graph has loops and histories are unbounded, so the same equations must be solved as a fixed point (value iteration and policy iteration below).
Value Iteration
Initialize the utility of each non-terminal state si to U_0(i) = 0.
For t = 0, 1, 2, …, do:
  U_{t+1}(i) ← R(i) + max_a Σ_k P(k | a.i) U_t(k)
Value Iteration
Initialize the utility of each non-terminal state si to U_0(i) = 0.
For t = 0, 1, 2, …, do:
  U_{t+1}(i) ← R(i) + max_a Σ_k P(k | a.i) U_t(k)
Note the importance of the terminal states and of the connectivity of the state-transition graph. This is not very different from indexing states by time.
(Plot: U_t([3,1]) as a function of t; the value converges to about 0.611 within roughly 30 iterations.)
Converged utilities:
Row 3:  0.812   0.868      0.918   +1
Row 2:  0.762   (obstacle) 0.660   -1
Row 1:  0.705   0.655      0.611   0.388
        Col 1   Col 2      Col 3   Col 4
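A minimal value-iteration sketch for this grid, under the same illustrative assumptions (4x3 grid, obstacle at (2,2), step reward -1/25, terminal rewards +1 and -1); it reproduces the ~0.611 value shown above for [3,1]:

```python
# Value iteration: U_{t+1}(i) <- R(i) + max_a sum_k P(k | a.i) U_t(k)
WIDTH, HEIGHT, WALLS = 4, 3, {(2, 2)}
TERMINAL = {(4, 3): +1.0, (4, 2): -1.0}
R_STEP = -1.0 / 25
MOVES = {'U': (0, 1), 'D': (0, -1), 'R': (1, 0), 'L': (-1, 0)}
SIDEWAYS = {'U': ('R', 'L'), 'D': ('R', 'L'), 'R': ('U', 'D'), 'L': ('U', 'D')}
STATES = [(x, y) for x in range(1, WIDTH + 1) for y in range(1, HEIGHT + 1)
          if (x, y) not in WALLS]

def move(s, d):
    x, y = s[0] + d[0], s[1] + d[1]
    return (x, y) if 1 <= x <= WIDTH and 1 <= y <= HEIGHT and (x, y) not in WALLS else s

def outcomes(s, a):
    b, c = SIDEWAYS[a]
    return [(0.8, move(s, MOVES[a])), (0.1, move(s, MOVES[b])), (0.1, move(s, MOVES[c]))]

U = {s: 0.0 for s in STATES}
U.update(TERMINAL)                         # terminal utilities stay fixed at +1 / -1
for _ in range(100):                       # enough sweeps to converge on this grid
    U_new = dict(U)
    for s in STATES:
        if s in TERMINAL:
            continue
        U_new[s] = R_STEP + max(sum(p * U[k] for p, k in outcomes(s, a)) for a in MOVES)
    U = U_new

print(round(U[(3, 1)], 3))                 # ~0.611, as in the figure above
```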
Policy Iteration
Pick a policy Π at random.
Repeat:
  Compute the utility of each state for Π:
    U(i) = R(i) + Σ_k P(k | Π(i).i) U(k)
  This is a set of linear equations (often a sparse system).
Policy Iteration
Pick a policy Π at random.
Repeat:
  Compute the utility of each state for Π:
    U(i) = R(i) + Σ_k P(k | Π(i).i) U(k)
  Compute the policy Π' given these utilities:
    Π'(i) = arg max_a Σ_k P(k | a.i) U(k)
  If Π' = Π then return Π, else Π ← Π'.
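A corresponding policy-iteration sketch under the same illustrative grid assumptions; policy evaluation solves the linear system U(i) = R(i) + Σ_k P(k | Π(i).i) U(k) exactly with numpy (a real implementation might use a sparse solver instead):

```python
import numpy as np

WIDTH, HEIGHT, WALLS = 4, 3, {(2, 2)}
TERMINAL = {(4, 3): +1.0, (4, 2): -1.0}
R_STEP = -1.0 / 25
MOVES = {'U': (0, 1), 'D': (0, -1), 'R': (1, 0), 'L': (-1, 0)}
SIDEWAYS = {'U': ('R', 'L'), 'D': ('R', 'L'), 'R': ('U', 'D'), 'L': ('U', 'D')}
STATES = [(x, y) for x in range(1, WIDTH + 1) for y in range(1, HEIGHT + 1)
          if (x, y) not in WALLS]
IDX = {s: i for i, s in enumerate(STATES)}

def move(s, d):
    x, y = s[0] + d[0], s[1] + d[1]
    return (x, y) if 1 <= x <= WIDTH and 1 <= y <= HEIGHT and (x, y) not in WALLS else s

def outcomes(s, a):
    b, c = SIDEWAYS[a]
    return [(0.8, move(s, MOVES[a])), (0.1, move(s, MOVES[b])), (0.1, move(s, MOVES[c]))]

def evaluate(policy):
    """Solve U(i) = R(i) + sum_k P(k | policy(i).i) U(k) as a linear system."""
    n = len(STATES)
    A, b = np.eye(n), np.zeros(n)
    for s in STATES:
        i = IDX[s]
        if s in TERMINAL:
            b[i] = TERMINAL[s]              # U fixed at the terminal reward
            continue
        b[i] = R_STEP
        for p, k in outcomes(s, policy[s]):
            A[i, IDX[k]] -= p
    return dict(zip(STATES, np.linalg.solve(A, b)))

policy = {s: 'U' for s in STATES}           # arbitrary initial policy
while True:
    U = evaluate(policy)
    improved = {s: policy[s] if s in TERMINAL else
                   max(MOVES, key=lambda a: sum(p * U[k] for p, k in outcomes(s, a)))
                for s in STATES}
    if improved == policy:
        break
    policy = improved

print(policy[(3, 1)], round(U[(3, 1)], 3))   # best action and utility at [3,1]
```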
Application of First-Step Analysis: Computing the Probability of Folding, pfold, of a Protein
(Figure: HIV integrase; from a given conformation, the molecule reaches the folded state with probability pfold and the unfolded state with probability 1 - pfold.)
Computation Through Simulation
pfold is an ensemble property (a global, many-paths property): estimating it requires 10K to 30K independent simulations.
Capture the stochastic nature of molecular motion by a probabilistic roadmap.
(Figure: roadmap nodes vi and vj connected by an edge with transition probability Pij.)
Edge Probabilities
The edge probabilities Pij follow the Metropolis criterion.
Self-transition probability: Pii, the probability mass left at vi so that the outgoing probabilities sum to 1.
First-Step Analysis
U: unfolded set; F: folded set. Let fi = pfold(i).
After one step, for a node i with neighbors j, k, l, m: fi = Pii fi + Pij fj + Pik fk + Pil fl + Pim fm (with Pii + Pij + Pik + Pil + Pim = 1).
One linear equation per node; the solution gives pfold for all nodes.
No explicit simulation run; all pathways are taken into account.
Sparse linear system.
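A minimal sketch of this linear-system computation on a tiny, made-up roadmap (the 5-node transition matrix is illustrative; a real roadmap would be large and sparse, solved e.g. with scipy.sparse.linalg.spsolve):

```python
import numpy as np

# First-step analysis for pfold on a small illustrative roadmap.
# P[i][j] = transition probability from node i to node j (each row sums to 1).
# Node 0 is in the unfolded set U (pfold = 0); node 4 is in the folded set F (pfold = 1).
P = np.array([
    [0.7, 0.3, 0.0, 0.0, 0.0],
    [0.2, 0.5, 0.3, 0.0, 0.0],
    [0.0, 0.2, 0.5, 0.3, 0.0],
    [0.0, 0.0, 0.2, 0.5, 0.3],
    [0.0, 0.0, 0.0, 0.3, 0.7],
])
unfolded, folded = {0}, {4}

# Build A f = b from  f_i = sum_j P_ij f_j  (one equation per interior node),
# with boundary conditions f_i = 0 on U and f_i = 1 on F.
n = P.shape[0]
A, b = np.eye(n), np.zeros(n)
for i in range(n):
    if i in folded:
        b[i] = 1.0
    elif i not in unfolded:
        A[i, :] -= P[i, :]          # f_i - sum_j P_ij f_j = 0

f = np.linalg.solve(A, b)
print(f.round(3))                   # pfold for every node, increasing toward the folded set
```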
Partially Observable Markov Decision Problem (POMDP)
Uncertainty in sensing: a sensing operation returns multiple possible states, with a probability distribution over them.
Example: Target Tracking
There is uncertainty in the robot's and the target's positions, and this uncertainty grows with further motion.
There is a risk that the target escapes behind the corner, requiring the robot to move appropriately.
But there is a positioning landmark nearby. Should the robot try to reduce its position uncertainty?
Summary
Probabilistic uncertainty
Utility function
Optimal policy
Maximal expected utility
Value iteration
Policy iteration