Decision Theory: Sequential Decisions Computer Science cpsc322, Lecture 34 (Textbook Chpt 9.3) April, 1, 2009.

Slides:



Advertisements
Similar presentations
Decision Theory: Sequential Decisions Computer Science cpsc322, Lecture 34 (Textbook Chpt 9.3) Nov, 28, 2012.
Advertisements

CPSC 322, Lecture 4Slide 1 Search: Intro Computer Science cpsc322, Lecture 4 (Textbook Chpt ) Sept, 11, 2013.
Making Simple Decisions
Department of Computer Science Undergraduate Events More
CPSC 322, Lecture 5Slide 1 Uninformed Search Computer Science cpsc322, Lecture 5 (Textbook Chpt 3.4) January, 14, 2009.
CPSC 322, Lecture 2Slide 1 Representational Dimensions Computer Science cpsc322, Lecture 2 (Textbook Chpt1) Sept, 6, 2013.
Reasoning Under Uncertainty: Bayesian networks intro Jim Little Uncertainty 4 November 7, 2014 Textbook §6.3, 6.3.1, 6.5, 6.5.1,
Decision Making Under Uncertainty Jim Little Decision 1 Nov
CPSC 502, Lecture 11Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 11 Oct, 18, 2011.
Planning under Uncertainty
CPSC 322, Lecture 16Slide 1 Stochastic Local Search Variants Computer Science cpsc322, Lecture 16 (Textbook Chpt 4.8) February, 9, 2009.
CPSC 322, Lecture 26Slide 1 Reasoning Under Uncertainty: Belief Networks Computer Science cpsc322, Lecture 27 (Textbook Chpt 6.3) March, 16, 2009.
CPSC 322, Lecture 35Slide 1 Finish VE for Sequential Decisions & Value of Information and Control Computer Science cpsc322, Lecture 35 (Textbook Chpt 9.4)
CPSC 322, Lecture 19Slide 1 Propositional Logic Intro, Syntax Computer Science cpsc322, Lecture 19 (Textbook Chpt ) February, 23, 2009.
Decision Theory: Single Stage Decisions Computer Science cpsc322, Lecture 33 (Textbook Chpt 9.2) March, 30, 2009.
CPSC 322, Lecture 37Slide 1 Finish Markov Decision Processes Last Class Computer Science cpsc322, Lecture 37 (Textbook Chpt 9.5) April, 8, 2009.
CPSC 322, Lecture 4Slide 1 Search: Intro Computer Science cpsc322, Lecture 4 (Textbook Chpt ) January, 12, 2009.
CPSC 322, Lecture 18Slide 1 Planning: Heuristics and CSP Planning Computer Science cpsc322, Lecture 18 (Textbook Chpt 8) February, 12, 2010.
CPSC 322, Lecture 30Slide 1 Reasoning Under Uncertainty: Variable elimination Computer Science cpsc322, Lecture 30 (Textbook Chpt 6.4) March, 23, 2009.
CPSC 322, Lecture 11Slide 1 Constraint Satisfaction Problems (CSPs) Introduction Computer Science cpsc322, Lecture 11 (Textbook Chpt 4.0 – 4.2) January,
CPSC 322, Lecture 12Slide 1 CSPs: Search and Arc Consistency Computer Science cpsc322, Lecture 12 (Textbook Chpt ) January, 29, 2010.
Decision Theory: Sequential Decisions Computer Science cpsc322, Lecture 34 (Textbook Chpt 9.3) April, 12, 2010.
CPSC 322, Lecture 17Slide 1 Planning: Representation and Forward Search Computer Science cpsc322, Lecture 17 (Textbook Chpt 8.1 (Skip )- 8.2) February,
CPSC 322, Lecture 32Slide 1 Probability and Time: Hidden Markov Models (HMMs) Computer Science cpsc322, Lecture 32 (Textbook Chpt 6.5) March, 27, 2009.
Department of Computer Science Undergraduate Events More
CPSC 322, Lecture 29Slide 1 Reasoning Under Uncertainty: Bnet Inference (Variable elimination) Computer Science cpsc322, Lecture 29 (Textbook Chpt 6.4)
CPSC 322, Lecture 35Slide 1 Value of Information and Control Computer Science cpsc322, Lecture 35 (Textbook Chpt 9.4) April, 14, 2010.
CPSC 322, Lecture 24Slide 1 Reasoning under Uncertainty: Intro to Probability Computer Science cpsc322, Lecture 24 (Textbook Chpt 6.1, 6.1.1) March, 15,
Decision Theory: Optimal Policies for Sequential Decisions CPSC 322 – Decision Theory 3 Textbook §9.3 April 4, 2011.
Slide 1 Constraint Satisfaction Problems (CSPs) Introduction Jim Little UBC CS 322 – CSP 1 September 27, 2014 Textbook §
Reasoning Under Uncertainty: Bayesian networks intro CPSC 322 – Uncertainty 4 Textbook §6.3 – March 23, 2011.
Department of Computer Science Undergraduate Events More
Constraint Satisfaction Problems (CSPs) CPSC 322 – CSP 1 Poole & Mackworth textbook: Sections § Lecturer: Alan Mackworth September 28, 2012.
CPSC 322, Lecture 32Slide 1 Probability and Time: Hidden Markov Models (HMMs) Computer Science cpsc322, Lecture 32 (Textbook Chpt 6.5.2) Nov, 25, 2013.
Department of Computer Science Undergraduate Events More
Lecture 33, Bayesian Networks Wrap Up Intro to Decision Theory Slide 1.
CPSC 322, Lecture 19Slide 1 (finish Planning) Propositional Logic Intro, Syntax Computer Science cpsc322, Lecture 19 (Textbook Chpt – 5.2) Oct,
1 CSC 384 Lecture Slides (c) , C. Boutilier and P. Poupart CSC384: Lecture 25  Last time Decision trees and decision networks  Today wrap up.
Department of Computer Science Undergraduate Events More
Reasoning Under Uncertainty: Independence and Inference CPSC 322 – Uncertainty 5 Textbook §6.3.1 (and for HMMs) March 25, 2011.
CPSC 322, Lecture 2Slide 1 Representational Dimensions Computer Science cpsc322, Lecture 2 (Textbook Chpt1) Sept, 7, 2012.
Decision Theory: Single & Sequential Decisions. VE for Decision Networks. CPSC 322 – Decision Theory 2 Textbook §9.2 April 1, 2011.
CPSC 322, Lecture 26Slide 1 Reasoning Under Uncertainty: Belief Networks Computer Science cpsc322, Lecture 27 (Textbook Chpt 6.3) Nov, 13, 2013.
Example: Localization for “Pushed around” Robot
Please Also Complete Teaching Evaluations
Reasoning Under Uncertainty: Belief Networks
CSPs: Search and Arc Consistency Computer Science cpsc322, Lecture 12
Computer Science cpsc322, Lecture 20
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 3
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 2
CSPs: Search and Arc Consistency Computer Science cpsc322, Lecture 12
Chapter 10 (part 3): Using Uncertain Knowledge
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 3
Planning: Representation and Forward Search
Planning Under Uncertainty Computer Science cpsc322, Lecture 11
& Decision Theory: Intro
Decision Theory: Single Stage Decisions
CSPs: Search and Arc Consistency Computer Science cpsc322, Lecture 12
Planning: Representation and Forward Search
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 2
Decision Theory: Single & Sequential Decisions.
Decision Theory: Sequential Decisions
Reasoning Under Uncertainty: Bnet Inference
Decision Theory: Single Stage Decisions
Reasoning Under Uncertainty: Bnet Inference
Please Also Complete Teaching Evaluations
VE for Decision Networks,
Planning: Representation and Forward Search
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 3
Presentation transcript:

Decision Theory: Sequential Decisions Computer Science cpsc322, Lecture 34 (Textbook Chpt 9.3) April, 1, 2009

Lecture Overview Recap (Example One-off decision, single stage decision network) Sequential Decisions Finding Optimal Policies

“Single” Action vs. Sequence of Actions Set of primitive decisions that can be treated as a single macro decision to be made before acting Agent makes observations Decides on an action Carries out the action

Recap: One-off decision example Delivery Robot Example Robot needs to reach a certain room Going through stairs may cause an accident. It can go the short way through long stairs, or the long way through short stairs (that reduces the chance of an accident but takes more time) The Robot can choose to wear pads to protect itself or not (to protect itself in case of an accident) but pads slow it down If there is an accident the Robot does not get to the room

Lecture Overview Recap (Example One-off decision, single stage decision network) Sequential Decisions Representation Policies Finding Optimal Policies

Sequential decision problems A sequential decision problem consists of a sequence of decision variables D 1,….., D n. Each D i has an information set of variables pD i, whose value will be known at the time decision D i is made.

Sequential decisions : Simplest possible Only one decision! (but different from one-off decisions) Early in the morning. Shall I take my umbrella today? (I’ll have to go for a long walk at noon) Relevant Random Variables?

Policies for Sequential Decision Problem: Intro A policy specifies what an agent should do under each circumstance (for each decision, consider the parents of the decision node) In the Umbrella “degenerate” case: D1D1 pD 1 How many policies? One possible Policy

Sequential decision problems: “complete” Example A sequential decision problem consists of a sequence of decision variables D 1,….., D n. Each D i has an information set of variables pD i, whose value will be known at the time decision D i is made. No-forgetting decision network: decisions are totally ordered if a decision D b comes before D a,then D b is a parent of D a any parent of D b is a parent of D a

Policies for Sequential Decision Problems A policy is a sequence of δ 1,….., δ n decision functions δ i : dom(pD i ) → dom(D i ) This policy means that when the agent has observed O  dom(pD i ), it will do δ i (O) Example: ReportCheck Smoke Report CheckSmoke SeeSmokeCall true true true true true false true false true true false false false true true false true false false false true false false false true false true false true false How many policies?

Lecture Overview Recap Sequential Decisions Finding Optimal Policies

When does a possible world satisfy a policy? A possible world specifies a value for each random variable and each decision variable. Possible world w satisfies policy δ, written w ╞ δ if the value of each decision variable is the value selected by its decision function in the policy (when applied in w). ReportCheck Smoke true false true false Report CheckSmoke SeeSmoke Call true true true true true false true false true true false false false true true false true false false false true false false false true false true false true false VARs Fire Tampering Alarm Leaving Report Smoke SeeSmoke CheckSmoke Call true false true falsetrue true Decision function for…

When does a possible world satisfy a policy? Possible world w satisfies policy δ, written w ╞ δ if the value of each decision variable is the value selected by its decision function in the policy (when applied in w). ReportCheck Smoke true false true false Report CheckSmoke SeeSmoke Call true true true true true false true false true true false false false true true false true false false false true false false false true false true false true false Decision function for… VARs Fire Tampering Alarm Leaving Report Smoke SeeSmoke CheckSmoke Call true false true true

Expected Value of a Policy Each possible world w has a probability P( w ) and a utility U( w ) The expected utility of policy δ is The optimal policy is one with the expected utility.

Lecture Overview Recap Sequential Decisions Finding Optimal Policies (Efficiently)

Complexity of finding the optimal policy: how many policies? If a decision D has k binary parents, how many assignments of values to the parents are there? If there are b possible actions, how many different decision functions are there? If there are d decisions, each with k binary parents and b possible actions, how many policies are there? How many assignments to parents? How many decision functions? (binary decisions) How many policies?

Finding the optimal policy more efficiently: VE 1.Remove all variables that are not ancestors of the utility node 2.Create a factor for each conditional probability table and a factor for the utility. 3.Sum out random variables that are not parents of a decision node. 4.Eliminate (aka sum out) the decision variables 5.Sum out the remaining random variables. 6.Multiply the factors: this is the expected utility of the optimal policy.

Eliminate the decision Variables: details Select a variable D that corresponds to the latest decision to be made this variable will appear in only one factor with its parents Eliminate D by maximizing. This returns: The optimal decision function for D, arg max D f A new factor to use in VE, max D f Repeat till there are no more decision nodes. Report CheckSmoke Value true true false false true false Example: Eliminate CheckSmoke Report CheckSmoke true false Report Value true false New factor Decision Function

VE elimination reduces complexity of finding the optimal policy We have seen that, if a decision D has k binary parents, there are b possible actions, If there are d decisions, Then there are: ( b 2 k ) d policies Doing variable elimination lets us find the optimal policy after considering only d.b 2 k policies (we eliminate one decision at a time) VE is much more efficient than searching through policy space. However, this complexity is still doubly-exponential we'll only be able to handle relatively small problems.

CPSC 322, Lecture 4Slide 20 Learning Goals for today’s class You can: Represent sequential decision problems as decision networks. And explain the non forgetting property Verify whether a possible world satisfies a policy and define the expected value of a policy Compute the number of policies for a decision problem Compute the optimal policy by Variable Elimination

Next class Value of Information and control – textbook sect 9.4 (needed for last question Assign. #4) More examples of decision networks Return Assign3 Need to talk to student