Slide 1: CSC384: Lecture 24
CSC 384 Lecture Slides (c) 2002-2003, C. Boutilier and P. Poupart

Last time: variable elimination, decision trees
Today: decision trees, decision networks
Readings: today, 10.4 (decision trees, decision networks); next week, wrap-up
Announcements: none
Slide 2: Evaluating a Decision Tree

U(n3) = .9*5 + .1*2
U(n4) = .8*3 + .2*4
U(s2) = max{U(n3), U(n4)}; decision: a or b, whichever is max
U(n1) = .3 U(s2) + .7 U(s3)
U(s1) = max{U(n1), U(n2)}; decision: max of a, b

(Diagram: decision node s1 with choices a and b leading to chance nodes n1 and n2; n1 reaches s2 and s3 with probabilities .3 and .7; at decision node s2, choice a leads to chance node n3 with outcomes 5 (p = .9) and 2 (p = .1), and choice b leads to n4 with outcomes 3 (p = .8) and 4 (p = .2).)
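The bottom-up evaluation above can be sketched as a short recursion: chance nodes return expectations, decision nodes return the max over children. The sketch below evaluates only the subtree rooted at s2, since the slide leaves U(n2) and U(s3) unspecified; the dictionary node encoding is an illustrative choice, not from the lecture.

```python
# Sketch: evaluating the fully specified subtree rooted at s2 from the slide.
# Decision nodes take the max over children; chance nodes take the expectation.

def evaluate(node):
    """Return (value, best_choice) for a node; best_choice is None unless decision."""
    kind = node["kind"]
    if kind == "leaf":
        return node["utility"], None
    if kind == "chance":
        value = sum(p * evaluate(child)[0] for p, child in node["children"])
        return value, None
    # decision node: pick the child (labeled action) with maximal value
    best_label, best_val = None, float("-inf")
    for label, child in node["children"]:
        v, _ = evaluate(child)
        if v > best_val:
            best_label, best_val = label, v
    return best_val, best_label

leaf = lambda u: {"kind": "leaf", "utility": u}
n3 = {"kind": "chance", "children": [(0.9, leaf(5)), (0.1, leaf(2))]}
n4 = {"kind": "chance", "children": [(0.8, leaf(3)), (0.2, leaf(4))]}
s2 = {"kind": "decision", "children": [("a", n3), ("b", n4)]}

value, choice = evaluate(s2)
print(value, choice)  # U(n3) = 4.7 beats U(n4) = 3.2, so U(s2) = 4.7 via decision a
```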
Slide 3: Decision Tree Policies

Note that we don't just compute values, but policies for the tree. A policy assigns a decision to each choice node in the tree.

(Diagram: a tree with decision nodes s1 through s4 and chance nodes n1 through n4; each decision node offers choices a and b, and n1's outcomes occur with probabilities .3 and .7.)
Slide 4: Large State Spaces (Variables)

Tree size grows exponentially with depth.
Specification: if each state is an assignment to variables, then the number of distinct states can be huge.
To represent outcomes of actions or decisions, we need to specify distributions:
  Pr(s|d): probability of outcome s given decision d
  Pr(s|a,s'): probability of state s given that action a is performed in state s'
But the state space is exponential in the number of variables, so spelling out these distributions explicitly is intractable.
Slide 5: Large State Spaces (Variables)

Bayes nets can be used to represent actions (action distributions) Pr(s|a,s'). If s and s' are specified as assignments to a set of state variables, we have:
  Pr(s|a,s') = Pr(X1, X2, ..., Xn | A, X'1, X'2, ..., X'n)
This is just a joint distribution over variables, conditioned on the action/decision and the previous state. We have already seen that Bayes nets can represent such distributions compactly.
Slide 6: Example Action Using a Dynamic BN

Deliver Coffee action. State variables: M – mail waiting; C – Craig has coffee; T – lab tidy; R – robot has coffee; L – robot located in Craig's office. Each time-t variable (M_t, T_t, L_t, C_t, R_t) has a time-(t+1) counterpart.

CPT f_R(L_t, R_t, C_t; C_{t+1}):

  L_t  R_t  C_t | C_{t+1}=T  C_{t+1}=F
  T    T    T   | 1.0        0.0
  F    T    T   | 1.0        0.0
  T    F    T   | 1.0        0.0
  F    F    T   | 1.0        0.0
  T    T    F   | 0.8        0.2
  F    T    F   | 0.0        1.0
  T    F    F   | 0.0        1.0
  F    F    F   | 0.0        1.0

CPT f_T(T_t; T_{t+1}):

  T_t | T_{t+1}=T  T_{t+1}=F
  T   | 1.0        0.0
  F   | 0.0        1.0
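The deliver-coffee CPT above is small enough to sketch directly. Below is one minimal way to store and query it; the dictionary representation is an illustrative choice, not from the lecture.

```python
# Sketch: the slide's f_R CPT for C_{t+1}, stored as a dictionary mapping a
# parent assignment (L_t, R_t, C_t) to P(C_{t+1} = True | parents).
p_coffee_next = {
    (True,  True,  True):  1.0,
    (False, True,  True):  1.0,
    (True,  False, True):  1.0,
    (False, False, True):  1.0,
    (True,  True,  False): 0.8,  # robot in office, holding coffee: delivery succeeds w.p. 0.8
    (False, True,  False): 0.0,
    (True,  False, False): 0.0,
    (False, False, False): 0.0,
}

def pr_c_next(c_next, l, r, c):
    """P(C_{t+1} = c_next | L_t = l, R_t = r, C_t = c)."""
    p_true = p_coffee_next[(l, r, c)]
    return p_true if c_next else 1.0 - p_true

print(pr_c_next(True, True, True, False))  # 0.8
```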
Slide 7: Dynamic BN Action Representation

Dynamic Bayesian networks (DBNs): a way to use BNs to represent specific actions.
- List all state variables for time t (pre-action).
- List all state variables for time t+1 (post-action).
- Indicate the parents of all t+1 variables. These can include time t and time t+1 variables (the network must be acyclic, though).
- Specify a CPT for each time t+1 variable.
Note: generally no prior is given for the time t variables. We're (generally) interested in the conditional distribution over post-action states given the pre-action state, so the time t variables are instantiated as "evidence" when using a DBN (generally).
Slide 8: Example of Dependence within a Slice

Throw Rock at Window action. (Diagram: Broken_t is a parent of Broken_{t+1}; Alarm_t and Broken_{t+1} are parents of Alarm_{t+1}.)

  P(broken_{t+1} | broken_t) = 1
  P(broken_{t+1} | ~broken_t) = .6
  P(al_{t+1} | al_t, br_t) = 1
  P(al_{t+1} | ~al_t, ~br_{t+1}) = 0
  P(al_{t+1} | ~al_t, br_{t+1}) = .95

Throwing the rock has a certain probability of breaking the window and setting off the alarm; but whether the alarm is triggered depends on whether the rock actually broke the window.
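The intra-slice arc matters when we marginalize. For instance, starting from an intact window and a quiet alarm, P(Alarm_{t+1}) is obtained by summing over the two values of Broken_{t+1}, as this small sketch shows:

```python
# Sketch: marginalizing over Broken_{t+1}, using the slide's numbers,
# from the state ~broken_t, ~al_t (intact window, quiet alarm).
p_break = 0.6            # P(br_{t+1} | ~broken_t) under the throw action
p_alarm_if_break = 0.95  # P(al_{t+1} | ~al_t, br_{t+1})
p_alarm_if_intact = 0.0  # P(al_{t+1} | ~al_t, ~br_{t+1})

p_alarm = p_break * p_alarm_if_break + (1 - p_break) * p_alarm_if_intact
print(p_alarm)  # ≈ 0.57
```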
Slide 9: Use of the BN Action Representation

DBNs specify actions concisely and naturally. They look a bit like STRIPS and the situation calculus, but allow for probabilistic effects.
How to use them:
- to generate an "expectimax" search tree for solving decision problems;
- directly in stochastic decision-making algorithms.
The first use doesn't buy us much computationally when solving decision problems. But the second allows us to compute expected utilities without enumerating the outcome space (tree); we'll see something like this with decision networks.
Slide 10: Decision Networks

Decision networks (more commonly known as influence diagrams) provide a way of representing sequential decision problems. The basic idea:
- represent the chance variables in the problem as you would in a BN;
- add decision variables – variables that you "control";
- add utility variables – how good different states are.
Slide 11: Sample Decision Network

(Diagram: chance nodes Disease, TstResult, Chills, Fever; decision nodes BloodTst and Drug; value node U; one information arc is marked "optional".)
Slide 12: Decision Networks: Chance Nodes

Chance nodes are random variables, denoted by circles; as in a BN, they depend probabilistically on their parents.

  Disease: Pr(flu) = .3, Pr(mal) = .1, Pr(none) = .6
  Fever (parent Disease): Pr(f|flu) = .5, Pr(f|mal) = .3, Pr(f|none) = .05
  TstResult (parents Disease, BloodTst):
    Pr(pos|flu,bt) = .2,  Pr(neg|flu,bt) = .8,  Pr(null|flu,bt) = 0
    Pr(pos|mal,bt) = .9,  Pr(neg|mal,bt) = .1,  Pr(null|mal,bt) = 0
    Pr(pos|none,bt) = .1, Pr(neg|none,bt) = .9, Pr(null|none,bt) = 0
    Pr(pos|D,~bt) = 0,    Pr(neg|D,~bt) = 0,    Pr(null|D,~bt) = 1
Slide 13: Decision Networks: Decision Nodes

Decision nodes are variables the decision maker sets, denoted by squares; their parents reflect the information available at the time the decision is to be made.
In the example, the decision node BloodTst ∊ {bt, ~bt} has parents Chills and Fever: the actual values of Ch and Fev will be observed before the decision to take the test must be made. The agent can make different decisions for each instantiation of the parents (i.e., policies).
Slide 14: Decision Networks: Value Node

A value node specifies the utility of a state and is denoted by a diamond; utility depends only on the state of the parents of the value node. Generally there is only one value node in a decision network.
In the example, utility depends only on Disease and Drug:

  U(fludrug, flu) = 20    U(fludrug, mal) = -300   U(fludrug, none) = -5
  U(maldrug, flu) = -30   U(maldrug, mal) = 10     U(maldrug, none) = -20
  U(no drug, flu) = -10   U(no drug, mal) = -285   U(no drug, none) = 30
Slide 15: Decision Networks: Assumptions

Decision nodes are totally ordered: decision variables D1, D2, ..., Dn are decided in sequence; e.g., BloodTst (yes, no) is decided before Drug (fd, md, no).
No-forgetting property: any information available when decision Di is made is available when decision Dj is made (for i < j); thus all parents of Di, as well as Di itself, are parents of Dj. (In the example, dashed arcs from Chills, Fever, and BloodTst into Drug ensure the no-forgetting property.)
Slide 16: Policies

Let Par(Di) be the parents of decision node Di, and Dom(Par(Di)) the set of assignments to those parents. A policy δ is a set of mappings δi, one for each decision node Di:
  δi : Dom(Par(Di)) → Dom(Di)
δi associates a decision with each parent assignment for Di. For example, a policy for BT might be:
  δBT(c,f) = bt
  δBT(c,~f) = ~bt
  δBT(~c,f) = bt
  δBT(~c,~f) = ~bt
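A policy for a single decision node is just a lookup table. The slide's δ_BT might be sketched as a dictionary keyed by the parent assignment (Chills, Fever):

```python
# Sketch: the slide's policy δ_BT as a mapping from (Chills, Fever) to a decision.
policy_bt = {
    (True,  True):  "bt",
    (True,  False): "~bt",
    (False, True):  "bt",
    (False, False): "~bt",
}

# This particular policy tests blood exactly when Fever is present.
print(policy_bt[(False, True)])  # bt
```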
Slide 17: Value of a Policy

The value of a policy δ is the expected utility given that the decision nodes are executed according to δ. Given an assignment x to the set X of all chance variables, let δ(x) denote the assignment to the decision variables dictated by δ:
- the assignment to D1 is determined by its parents' assignment in x;
- the assignment to D2 is determined by its parents' assignment in x, along with whatever was assigned to D1;
- etc.
Value of δ: EU(δ) = Σ_X P(X, δ(X)) U(X, δ(X))
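The sum EU(δ) = Σ_X P(X, δ(X)) U(X, δ(X)) can be computed by brute-force enumeration on a tiny network. The sketch below uses a toy model with one chance variable X and one decision D whose parent is X; all numbers and names are illustrative, not from the lecture's medical example.

```python
# Sketch: EU(δ) by enumeration for a toy network with one chance variable X
# (observed before deciding) and one decision D. Illustrative numbers only.
p_x = {"x0": 0.4, "x1": 0.6}                   # P(X)
utility = {("x0", "a"): 10, ("x0", "b"): 0,    # U(X, D)
           ("x1", "a"): -5, ("x1", "b"): 8}

def eu(policy):
    """Expected utility of policy δ: Dom(X) -> Dom(D)."""
    return sum(p * utility[(x, policy[x])] for x, p in p_x.items())

delta = {"x0": "a", "x1": "b"}                 # one particular policy
print(eu(delta))  # 0.4*10 + 0.6*8 = 8.8
```

Here the decision does not influence any chance variable; in general (e.g., BloodTst influencing TstResult), δ(x) must also be plugged into the probability term.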
Slide 18: Value of a Policy

Note: once δ has been fixed, the decision nodes become just like ordinary probability nodes in a Bayes net, so we can compute a distribution over their values using variable elimination. For example, the policy
  δBT(c,f) = bt, δBT(c,~f) = ~bt, δBT(~c,f) = bt, δBT(~c,~f) = ~bt
becomes the deterministic CPT:
  Pr(bt|c,f) = 1.0
  Pr(bt|c,~f) = 0.0
  Pr(bt|~c,f) = 1.0
  Pr(bt|~c,~f) = 0.0
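The policy-to-CPT conversion on the slide is mechanical: the chosen decision gets probability 1, all others 0. A minimal sketch:

```python
# Sketch: turning the slide's fixed policy δ_BT into a deterministic CPT
# P(D = d | parents), so BloodTst can be treated as an ordinary BN node.
delta_bt = {
    (True,  True):  "bt",
    (True,  False): "~bt",
    (False, True):  "bt",
    (False, False): "~bt",
}

def policy_to_cpt(delta, domain=("bt", "~bt")):
    """Map each (parent assignment, decision value) to probability 1 or 0."""
    return {(parents, d): (1.0 if choice == d else 0.0)
            for parents, choice in delta.items() for d in domain}

cpt = policy_to_cpt(delta_bt)
print(cpt[((True, True), "bt")])   # 1.0, i.e., Pr(bt|c,f) = 1.0
print(cpt[((True, False), "bt")])  # 0.0, i.e., Pr(bt|c,~f) = 0.0
```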
Slide 19: Value of a Policy

With all nodes in the influence diagram "converted" to ordinary Bayes net nodes by the fixed δ, we can compute the posterior distribution over the values of the utility node, again by variable elimination. From this posterior we can compute the expected value. For example, with the utilities from the value-node slide:

  Pr(fludrug, flu) = .1   Pr(fludrug, mal) = .05  Pr(fludrug, none) = .05
  Pr(maldrug, flu) = .1   Pr(maldrug, mal) = .1   Pr(maldrug, none) = .2
  Pr(no drug, flu) = .1   Pr(no drug, mal) = .1   Pr(no drug, none) = .2
Slide 20: Optimal Policies

An optimal policy is a policy δ* such that EU(δ*) ≥ EU(δ) for all policies δ. We can use the dynamic programming principle yet again to avoid enumerating all policies, using variable elimination to do the computations.
Slide 21: Computing the Best Policy

We can work backwards, as follows. First compute the optimal policy for Drug (the last decision):
- for each assignment to its parents (C, F, BT, TR) and for each decision value (D = md, fd, none), compute the expected value of choosing that value of D using VE;
- set the policy choice for each value of the parents to be the value of D that has max value, e.g., δD(c,f,bt,pos) = md.
Slide 22: Computing the Best Policy

Next compute the policy for BT, given the policy δD(C,F,BT,TR) just determined for Drug. Once δD(C,F,BT,TR) is fixed, we can treat Drug as a normal random variable with deterministic probabilities: for any instantiation of its parents, the value of Drug is fixed by the policy δD. This means we can solve for the optimal policy for BT just as before; once we fix BT's parents, the only uninstantiated variables are random variables.
Slide 23: Computing the Best Policy

Computing the expected values with VE: suppose we have an assignment to the parents of Drug and want to compute the EU of deciding to set Drug = md. We run variable elimination, treating C, F, BT, TR, Dr as evidence:
- this reduces the factors (e.g., U restricted to bt, md depends only on Dis);
- eliminate the remaining variables (here, only Disease is left);
- we are left with a factor: U() = Σ_Dis P(Dis|c,f,bt,pos,md) U(Dis).
We now know the EU of doing Dr = md when c, f, bt, pos hold. We can do the same for fd and no to decide which is best.
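The last step above reduces to a one-variable expectation. The sketch below computes EU(Drug = d) = Σ_Dis P(Dis|evidence) U(d, Dis) and picks the best drug; the utilities are the slide's, but the posterior over Disease is a hypothetical stand-in for what VE would actually return given c, f, bt, pos.

```python
# Sketch: choosing Drug once its parents are instantiated as evidence.
# Utilities U(Drug, Disease) are from the value-node slide; the posterior
# over Disease is hypothetical (a placeholder for a VE result).
utility = {
    ("fd", "flu"): 20,  ("fd", "mal"): -300, ("fd", "none"): -5,
    ("md", "flu"): -30, ("md", "mal"): 10,   ("md", "none"): -20,
    ("no", "flu"): -10, ("no", "mal"): -285, ("no", "none"): 30,
}
posterior = {"flu": 0.1, "mal": 0.8, "none": 0.1}  # hypothetical P(Dis | c,f,bt,pos)

def expected_utility(drug):
    """EU of setting Drug = drug under the posterior over Disease."""
    return sum(p * utility[(drug, dis)] for dis, p in posterior.items())

best = max(("fd", "md", "no"), key=expected_utility)
print(best)  # with this posterior, the malaria drug wins: prints md
```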
Slide 24: Computing Expected Utilities

The preceding illustrates a general phenomenon: computing expected utilities with BNs is quite easy. Utility nodes are just factors that can be dealt with using variable elimination. For the chain A → B → C with utility node U(B,C):
  EU = Σ_{A,B,C} P(A,B,C) U(B,C) = Σ_{A,B,C} P(C|B) P(B|A) P(A) U(B,C)
Just eliminate variables in the usual way.
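The chain computation above can be checked numerically: summing the full joint and eliminating A first (folding P(A) and P(B|A) into a factor f(B)) must agree. The numbers below are illustrative, not from the lecture.

```python
from itertools import product

# Sketch: EU = Σ_{A,B,C} P(A) P(B|A) P(C|B) U(B,C) for the slide's chain,
# with made-up numbers; all variables are binary (0/1).
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}  # key (b, a)
p_c_given_b = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.7}  # key (c, b)
u = {(0, 0): 0.0, (0, 1): 5.0, (1, 0): 2.0, (1, 1): 10.0}           # U(b, c)

# Brute force over the full joint:
eu = sum(p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)] * u[(b, c)]
         for a, b, c in product((0, 1), repeat=3))

# Eliminating A first gives the same answer with smaller factors:
f_b = {b: sum(p_a[a] * p_b_given_a[(b, a)] for a in (0, 1)) for b in (0, 1)}
eu_ve = sum(f_b[b] * p_c_given_b[(c, b)] * u[(b, c)]
            for b, c in product((0, 1), repeat=2))

print(eu, eu_ve)  # the two sums agree
```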
Slide 25: Optimizing Policies: Key Points

If a decision node D has no decisions that follow it, we can find its policy by instantiating each of its parents and computing the expected utility of each decision for each parent instantiation:
- no-forgetting means that all other decisions are instantiated (they must be parents);
- it's easy to compute the expected utility using VE;
- the number of computations is quite large: we run an expected-utility calculation (VE) for each parent instantiation together with each possible decision D might allow;
- policy: choose the max decision for each parent instantiation.
Slide 26: Optimizing Policies: Key Points

Once a decision node D is optimized, it can be treated as a random variable: for each instantiation of its parents we now know what value the decision should take. Just treat the policy as a new CPT: for a given parent instantiation x, D gets δ(x) with probability 1 (all other decisions get probability zero).
If we optimize from the last decision to the first, at each point we can optimize a specific decision by (a bunch of) simple VE calculations: its successor decisions (already optimized) are just normal nodes in the BN (with CPTs).
Slide 27: Decision Network Notes

Decision networks are commonly used by decision analysts to help structure decision problems. Much work has been put into computationally effective techniques to solve them; a common trick is to replace the decision nodes with random variables at the outset and solve a plain Bayes net (a subtle but useful transformation).
Complexity is much greater than BN inference: we need to solve a number of BN inference problems, one for each setting of the decision node's parents and decision node value.