Slide 1: CSC384: Lecture 24
CSC 384 Lecture Slides (c) 2002-2003, C. Boutilier and P. Poupart
• Last time: variable elimination, decision trees
• Today: decision trees, decision networks
• Readings: Today: 10.4 (decision trees, decision networks); Next week: wrap up
• Announcements: none

Slide 2: Evaluating a Decision Tree
• U(n3) = 0.9·5 + 0.1·2 = 4.7
• U(n4) = 0.8·3 + 0.2·4 = 3.2
• U(s2) = max{U(n3), U(n4)}; decision: a or b (whichever is max)
• U(n1) = 0.3·U(s2) + 0.7·U(s3)
• U(s1) = max{U(n1), U(n2)}; decision: max of a, b
[Figure: two-level decision tree with choice nodes s1, s2 and chance nodes n1–n4, branches labelled with decisions a and b]
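A minimal Python sketch of this bottom-up evaluation (the numbers come from the slide; the tree encoding and function names are illustrative choices, not from the slides):

```python
# Evaluate a decision tree bottom-up: chance nodes take expectations,
# choice nodes take the max (and record the maximizing decision).

def chance(outcomes):
    """Expected utility of a chance node: probability-weighted sum."""
    return sum(p * u for p, u in outcomes)

def choice(children):
    """Utility of a choice node: max over (decision, value) children."""
    return max(children, key=lambda kv: kv[1])  # returns (decision, value)

u_n3 = chance([(0.9, 5), (0.1, 2)])            # 4.7
u_n4 = chance([(0.8, 3), (0.2, 4)])            # 3.2
decision, u_s2 = choice([("a", u_n3), ("b", u_n4)])
print(decision, u_s2)                          # a 4.7
```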

Slide 3: Decision Tree Policies
• Note that we don't just compute values, but policies for the tree
• A policy assigns a decision to each choice node in the tree
[Figure: the same decision tree, with a decision (a or b) marked at each choice node s1–s4]

Slide 4: Large State Spaces (Variables)
• Tree size grows exponentially with depth
• Specification: if each state is an assignment to variables, the number of distinct states can be huge
• To represent outcomes of actions or decisions, we need to specify distributions:
  - Pr(s|d): probability of outcome s given decision d
  - Pr(s|a,s'): probability of state s given that action a is performed in state s'
• But the state space is exponential in the number of variables, so spelling out these distributions explicitly is intractable

Slide 5: Large State Spaces (Variables)
• Bayes nets can be used to represent actions (action distributions) Pr(s|a,s')
• If s, s' are specified as assignments to a set of state variables, we have:
  Pr(s|a,s') = Pr(X1, X2, …, Xn | A, X'1, X'2, …, X'n)
• This is just a joint distribution over variables, conditioned on the action/decision and the previous state. We have already seen that Bayes nets can represent such distributions compactly.

Slide 6: Example Action using Dynamic BN
[Figure: two-slice dynamic Bayes net for the Deliver Coffee action. Each time-t variable (Mt, Tt, Lt, Ct, Rt) connects to its time t+1 counterpart; CPT factors shown include fR(Lt, Rt, Ct, Ct+1) and fJ(Tt, Tt+1).]
Legend: M – mail waiting; C – Craig has coffee; T – lab tidy; R – robot has coffee; L – robot located in Craig's office

Slide 7: Dynamic BN Action Representation
• Dynamic Bayesian networks (DBNs): a way to use BNs to represent specific actions
  - list all state variables for time t (pre-action)
  - list all state variables for time t+1 (post-action)
  - indicate the parents of all t+1 variables (these can include time t and time t+1 variables, though the network must be acyclic)
  - specify a CPT for each time t+1 variable
• Note: generally no prior is given for the time t variables
  - we're (generally) interested in the conditional distribution over post-action states given the pre-action state
  - so the time t variables are instantiated as "evidence" when using a DBN (generally)

Slide 8: Example of Dependence within Slice
[Figure: DBN for the "throw rock at window" action, with arcs Broken_t → Broken_t+1 and Alarm_t → Alarm_t+1, plus an intra-slice arc Broken_t+1 → Alarm_t+1]
• P(broken_t+1 | broken_t) = 1
• P(broken_t+1 | ~broken_t) = 0.6
• P(al_t+1 | al_t, br_t) = 1
• P(al_t+1 | ~al_t, ~br_t+1) = 0
• P(al_t+1 | ~al_t, br_t+1) = 0.95
• Throwing the rock has a certain probability of breaking the window and setting off the alarm; but whether the alarm is triggered depends on whether the rock actually broke the window.
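A small worked computation with the slide's numbers: starting from an intact window and a silent alarm, the probability that the alarm rings after the throw is obtained by summing over the intra-slice variable Broken_t+1 (the dictionary encoding is an illustrative choice):

```python
# P(broken_t+1 = true | broken_t) and P(al_t+1 = true | al_t, br_t+1),
# taken directly from the slide (only the ~al_t rows are needed here).
p_broken_next = {True: 1.0, False: 0.6}
p_alarm_next = {(False, True): 0.95,   # ~al_t, br_t+1
                (False, False): 0.0}   # ~al_t, ~br_t+1

broken_t, alarm_t = False, False       # intact window, silent alarm
p_al = sum(
    (p_broken_next[broken_t] if br_next else 1 - p_broken_next[broken_t])
    * p_alarm_next[(alarm_t, br_next)]
    for br_next in (True, False)
)
print(p_al)  # 0.6 * 0.95 + 0.4 * 0.0 = 0.57
```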

Slide 9: Use of BN Action Representation
• DBNs: actions concisely and naturally specified
  - these look a bit like STRIPS and the situation calculus, but allow for probabilistic effects
• How to use:
  - generate an "expectimax" search tree to solve decision problems
  - use directly in stochastic decision-making algorithms
• The first use doesn't buy us much computationally when solving decision problems. But the second allows us to compute expected utilities without enumerating the outcome space (tree); we'll see something like this with decision networks

Slide 10: Decision Networks
• Decision networks (more commonly known as influence diagrams) provide a way of representing sequential decision problems
• Basic idea:
  - represent the variables in the problem as you would in a BN
  - add decision variables – variables that you "control"
  - add utility variables – how good different states are

Slide 11: Sample Decision Network
[Figure: decision network with chance nodes Disease, Chills, Fever, TstResult; decision nodes BloodTst and Drug; and value node U. One arc is marked "optional".]

Slide 12: Decision Networks: Chance Nodes
• Chance nodes
  - random variables, denoted by circles
  - as in a BN, probabilistic dependence on parents
• Example: Disease → Fever
  Pr(flu) = 0.3; Pr(mal) = 0.1; Pr(none) = 0.6
  Pr(f|flu) = 0.5; Pr(f|mal) = 0.3; Pr(f|none) = 0.05
• Example: (Disease, BloodTst) → TstResult
  Pr(pos|flu,bt) = 0.2; Pr(neg|flu,bt) = 0.8; Pr(null|flu,bt) = 0
  Pr(pos|mal,bt) = 0.9; Pr(neg|mal,bt) = 0.1; Pr(null|mal,bt) = 0
  Pr(pos|no,bt) = 0.1; Pr(neg|no,bt) = 0.9; Pr(null|no,bt) = 0
  Pr(pos|D,~bt) = 0; Pr(neg|D,~bt) = 0; Pr(null|D,~bt) = 1

Slide 13: Decision Networks: Decision Nodes
• Decision nodes
  - variables the decision maker sets, denoted by squares
  - parents reflect the information available at the time the decision is to be made
• In the example decision node (Chills, Fever → BloodTst, with BT ∊ {bt, ~bt}):
  - the actual values of Ch and Fev will be observed before the decision to take the test must be made
  - the agent can make different decisions for each instantiation of the parents (i.e., policies)

Slide 14: Decision Networks: Value Node
• Value node
  - specifies the utility of a state, denoted by a diamond
  - utility depends only on the state of the parents of the value node
  - generally only one value node in a decision network
• In the example, utility depends only on disease and drug:
  U(fludrug, flu) = 20; U(fludrug, mal) = -300; U(fludrug, none) = -5
  U(maldrug, flu) = -30; U(maldrug, mal) = 10; U(maldrug, none) = -20
  U(no drug, flu) = -10; U(no drug, mal) = -285; U(no drug, none) = 30

Slide 15: Decision Networks: Assumptions
• Decision nodes are totally ordered
  - decision variables D1, D2, …, Dn
  - decisions are made in sequence
  - e.g., BloodTst (yes, no) is decided before Drug (fd, md, no)
• No-forgetting property
  - any information available when decision Di is made is available when decision Dj is made (for i < j)
  - thus all parents of Di, as well as Di itself, would be parents of Dj
[Figure: Chills and Fever feed BloodTst, which feeds Drug; dashed arcs ensure the no-forgetting property]

Slide 16: Policies
• Let Par(Di) be the parents of decision node Di; Dom(Par(Di)) is the set of assignments to those parents
• A policy δ is a set of mappings δi, one for each decision node Di
  - δi : Dom(Par(Di)) → Dom(Di)
  - δi associates a decision with each parent assignment for Di
• For example, a policy for BT might be:
  δBT(c,f) = bt; δBT(c,~f) = ~bt; δBT(~c,f) = bt; δBT(~c,~f) = ~bt
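A minimal sketch of the example policy as an explicit table from Dom(Par(D)) to Dom(D); the dict encoding (booleans for chills/fever, strings for decisions) is an illustrative choice, not from the slides:

```python
# δBT maps each (chills, fever) parent assignment to a decision.
delta_bt = {
    (True, True): "bt",     # δBT(c, f)   = bt
    (True, False): "~bt",   # δBT(c, ~f)  = ~bt
    (False, True): "bt",    # δBT(~c, f)  = bt
    (False, False): "~bt",  # δBT(~c, ~f) = ~bt
}

# A full policy δ is one such table per decision node.
policy = {"BloodTst": delta_bt}
print(policy["BloodTst"][(True, False)])  # ~bt
```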

Slide 17: Value of a Policy
• The value of a policy δ is the expected utility given that the decision nodes are executed according to δ
• Given an assignment x to the set X of all chance variables, let δ(x) denote the assignment to decision variables dictated by δ
  - e.g., the assignment to D1 is determined by its parents' assignment in x
  - e.g., the assignment to D2 is determined by its parents' assignment in x, along with whatever was assigned to D1; etc.
• Value of δ: EU(δ) = ΣX P(X, δ(X)) U(X, δ(X))
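A minimal sketch of this formula computed by brute-force enumeration; the function interface (chance_vars, joint_prob, utility, delta) is hypothetical, chosen only to mirror the equation above:

```python
from itertools import product

def expected_utility(chance_vars, joint_prob, utility, delta):
    """EU(δ) = Σ_x P(x, δ(x)) U(x, δ(x)), enumerating all assignments x.

    chance_vars: list of (name, domain) pairs for the chance variables X
    joint_prob(x, d): P of chance assignment x together with decisions d
    utility(x, d): utility of that joint assignment
    delta(x): dict of decisions dictated by the policy for assignment x
    """
    eu = 0.0
    for values in product(*(dom for _, dom in chance_vars)):
        x = dict(zip((name for name, _ in chance_vars), values))
        d = delta(x)
        eu += joint_prob(x, d) * utility(x, d)
    return eu
```

This enumerates the full outcome space, so it is exponential in the number of chance variables; the point of the next slides is that variable elimination avoids exactly this enumeration.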

Slide 18: Value of a Policy
• Note: once δ has been fixed, the decision nodes become just like ordinary probability nodes in a Bayes net
• Thus we can compute a distribution over their values using variable elimination
• E.g., the policy δBT(c,f) = bt; δBT(c,~f) = ~bt; δBT(~c,f) = bt; δBT(~c,~f) = ~bt becomes the CPT:
  Pr(bt|c,f) = 1.0; Pr(bt|c,~f) = 0.0; Pr(bt|~c,f) = 1.0; Pr(bt|~c,~f) = 0.0

Slide 19: Value of a Policy
• With all nodes in the influence diagram "converted" to ordinary Bayes net nodes by the fixed δ, we can compute the posterior distribution over the values of the utility node, again by variable elimination
• From this posterior we can compute the expected value:
  Pr(fludrug, flu) = 0.1; Pr(fludrug, mal) = 0.05; Pr(fludrug, none) = 0.05
  Pr(maldrug, flu) = 0.1; Pr(maldrug, mal) = 0.1; Pr(maldrug, none) = 0.2
  Pr(no drug, flu) = 0.1; Pr(no drug, mal) = 0.1; Pr(no drug, none) = 0.2
  U(fludrug, flu) = 20; U(fludrug, mal) = -300; U(fludrug, none) = -5
  U(maldrug, flu) = -30; U(maldrug, mal) = 10; U(maldrug, none) = -20
  U(no drug, flu) = -10; U(no drug, mal) = -285; U(no drug, none) = 30
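A worked check of this expected value, using exactly the joint probabilities and utilities from the slide:

```python
# Joint posterior over (Drug, Disease) and the utility table from the slide.
pr = {("fludrug", "flu"): 0.1, ("fludrug", "mal"): 0.05, ("fludrug", "none"): 0.05,
      ("maldrug", "flu"): 0.1, ("maldrug", "mal"): 0.1,  ("maldrug", "none"): 0.2,
      ("no drug", "flu"): 0.1, ("no drug", "mal"): 0.1,  ("no drug", "none"): 0.2}

u = {("fludrug", "flu"): 20,  ("fludrug", "mal"): -300, ("fludrug", "none"): -5,
     ("maldrug", "flu"): -30, ("maldrug", "mal"): 10,   ("maldrug", "none"): -20,
     ("no drug", "flu"): -10, ("no drug", "mal"): -285, ("no drug", "none"): 30}

# Expected utility of the fixed policy: probability-weighted sum.
eu = sum(pr[k] * u[k] for k in pr)
print(eu)  # -42.75
```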

Slide 20: Optimal Policies
• An optimal policy is a policy δ* such that EU(δ*) ≥ EU(δ) for all policies δ
• We can use the dynamic programming principle yet again to avoid enumerating all policies, using variable elimination to do the computation

Slide 21: Computing the Best Policy
• We can work backwards as follows
• First compute the optimal policy for Drug (the last decision):
  - for each assignment to its parents (C, F, BT, TR) and for each decision value (D = md, fd, none), compute the expected value of choosing that value of D using VE
  - set the policy choice for each value of the parents to be the value of D that has max value
  - e.g., δD(c,f,bt,pos) = md

Slide 22: Computing the Best Policy
• Next compute the policy for BT, given the policy δD(C,F,BT,TR) just determined for Drug
  - once δD(C,F,BT,TR) is fixed, we can treat Drug as a normal random variable with deterministic probabilities
  - i.e., for any instantiation of its parents, the value of Drug is fixed by the policy δD
  - this means we can solve for the optimal policy for BT just as before; the only uninstantiated variables are random variables (once we fix BT's parents)

Slide 23: Computing the Best Policy
• Computing the expected values with VE: suppose we have an assignment to the parents of Drug and we want to compute the EU of deciding to set Drug = md; we run variable elimination
• Treat C, F, BT, TR, Dr as evidence
  - this reduces the factors (e.g., U restricted to bt, md depends only on Dis)
  - eliminate the remaining variables (e.g., only Disease is left)
  - we are left with the factor: U() = ΣDis P(Dis | c,f,bt,pos,md) U(Dis)
• We now know the EU of doing Dr = md when c, f, bt, pos are true
• We can do the same for fd, no to decide which is best

Slide 24: Computing Expected Utilities
• The preceding illustrates a general phenomenon: computing expected utilities with BNs is quite easy
• Utility nodes are just factors that can be dealt with using variable elimination:
  EU = ΣA,B,C P(A,B,C) U(B,C) = ΣA,B,C P(C|B) P(B|A) P(A) U(B,C)
• Just eliminate variables in the usual way
[Figure: chain network A → B → C with utility node U depending on B and C]
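A minimal sketch of this elimination for the pictured chain A → B → C with U(B,C); the variables are assumed binary and all numbers below are illustrative placeholders, not from the slides:

```python
p_a = {True: 0.3, False: 0.7}                            # P(A)
p_b_given_a = {(True, True): 0.9, (True, False): 0.1,    # P(B|A), keys (a, b)
               (False, True): 0.2, (False, False): 0.8}
p_c_given_b = {(True, True): 0.7, (True, False): 0.3,    # P(C|B), keys (b, c)
               (False, True): 0.1, (False, False): 0.9}
u = {(True, True): 10, (True, False): 0,                 # U(B,C), keys (b, c)
     (False, True): 2, (False, False): -5}

# Step 1: eliminate A, producing a factor over B: P(B) = Σ_A P(A) P(B|A)
p_b = {b: sum(p_a[a] * p_b_given_a[(a, b)] for a in (True, False))
       for b in (True, False)}

# Step 2: sum out B and C against the utility factor:
# EU = Σ_{B,C} P(B) P(C|B) U(B,C)
eu = sum(p_b[b] * p_c_given_b[(b, c)] * u[(b, c)]
         for b in (True, False) for c in (True, False))
print(eu)
```

The utility table participates in the computation exactly like any other factor, which is the point of the slide.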

Slide 25: Optimizing Policies: Key Points
• If a decision node D has no decisions that follow it, we can find its policy by instantiating each of its parents and computing the expected utility of each decision for each parent instantiation
  - no-forgetting means that all other decisions are instantiated (they must be parents)
  - it's easy to compute the expected utility using VE
  - the number of computations is quite large: we run expected utility calculations (VE) for each parent instantiation together with each possible decision D might allow
  - policy: choose the max decision for each parent instantiation

Slide 26: Optimizing Policies: Key Points
• When a decision node D is optimized, it can be treated as a random variable
  - for each instantiation of its parents, we now know what value the decision should take
  - just treat the policy as a new CPT: for a given parent instantiation x, D gets δ(x) with probability 1 (all other decisions get probability zero)
• If we optimize from the last decision to the first, at each point we can optimize a specific decision by (a bunch of) simple VE calculations
  - its successor decisions (already optimized) are just normal nodes in the BN (with CPTs)

Slide 27: Decision Network Notes
• Decision networks are commonly used by decision analysts to help structure decision problems
• Much work has been put into computationally effective techniques for solving them
  - common trick: replace the decision nodes with random variables at the outset and solve a plain Bayes net (a subtle but useful transformation)
• Complexity is much greater than BN inference
  - we need to solve a number of BN inference problems: one BN problem for each setting of the decision node's parents and the decision node's value