Slide 1: CSC384: Lecture 24
CSC 384 Lecture Slides (c) 2002-2003, C. Boutilier and P. Poupart

Last time: variable elimination, decision trees
Today: decision trees, decision networks
Readings: today, 10.4 (decision trees, decision networks); next week, wrap-up
Announcements: none
Slide 2: Evaluating a Decision Tree

U(n3) = .9*5 + .1*2
U(n4) = .8*3 + .2*4
U(s2) = max{U(n3), U(n4)}; decision: a or b, whichever is max
U(n1) = .3 U(s2) + .7 U(s3)
U(s1) = max{U(n1), U(n2)}; decision: max of a, b

(Diagram: decision node s1 with choices a and b leading to chance nodes n1 and n2; n1 reaches s2 and s3 with probabilities .3 and .7; at decision node s2, choice a leads to chance node n3 with outcomes 5 (p = .9) and 2 (p = .1), and choice b leads to n4 with outcomes 3 (p = .8) and 4 (p = .2).)
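The bottom-up evaluation above can be sketched as a short recursion: chance nodes return expectations, decision nodes return the max over children. The sketch below evaluates only the subtree rooted at s2, since the slide leaves U(n2) and U(s3) unspecified; the dictionary node encoding is an illustrative choice, not from the lecture.

```python
# Sketch: evaluating the fully specified subtree rooted at s2 from the slide.
# Decision nodes take the max over children; chance nodes take the expectation.

def evaluate(node):
    """Return (value, best_choice) for a node; best_choice is None unless decision."""
    kind = node["kind"]
    if kind == "leaf":
        return node["utility"], None
    if kind == "chance":
        value = sum(p * evaluate(child)[0] for p, child in node["children"])
        return value, None
    # decision node: pick the child (labeled action) with maximal value
    best_label, best_val = None, float("-inf")
    for label, child in node["children"]:
        v, _ = evaluate(child)
        if v > best_val:
            best_label, best_val = label, v
    return best_val, best_label

leaf = lambda u: {"kind": "leaf", "utility": u}
n3 = {"kind": "chance", "children": [(0.9, leaf(5)), (0.1, leaf(2))]}
n4 = {"kind": "chance", "children": [(0.8, leaf(3)), (0.2, leaf(4))]}
s2 = {"kind": "decision", "children": [("a", n3), ("b", n4)]}

value, choice = evaluate(s2)
print(value, choice)  # U(n3) = 4.7 beats U(n4) = 3.2, so U(s2) = 4.7 via decision a
```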
Slide 3: Decision Tree Policies

Note that we don't just compute values, but policies for the tree. A policy assigns a decision to each choice node in the tree.

(Diagram: a tree with decision nodes s1 through s4 and chance nodes n1 through n4; each decision node offers choices a and b, and n1's outcomes occur with probabilities .3 and .7.)
Slide 4: Large State Spaces (Variables)

Tree size grows exponentially with depth.
Specification: if each state is an assignment to variables, then the number of distinct states can be huge.
To represent outcomes of actions or decisions, we need to specify distributions:
  Pr(s|d): probability of outcome s given decision d
  Pr(s|a,s'): probability of state s given that action a is performed in state s'
But the state space is exponential in the number of variables, so spelling out these distributions explicitly is intractable.
Slide 5: Large State Spaces (Variables)

Bayes nets can be used to represent actions (action distributions) Pr(s|a,s'). If s and s' are specified as assignments to a set of state variables, we have:
  Pr(s|a,s') = Pr(X1, X2, ..., Xn | A, X'1, X'2, ..., X'n)
This is just a joint distribution over variables, conditioned on the action/decision and the previous state. We have already seen that Bayes nets can represent such distributions compactly.
Slide 6: Example Action Using a Dynamic BN

Deliver Coffee action. State variables: M – mail waiting; C – Craig has coffee; T – lab tidy; R – robot has coffee; L – robot located in Craig's office. Each time-t variable (M_t, T_t, L_t, C_t, R_t) has a time-(t+1) counterpart.

CPT f_R(L_t, R_t, C_t; C_{t+1}):

  L_t  R_t  C_t | C_{t+1}=T  C_{t+1}=F
  T    T    T   | 1.0        0.0
  F    T    T   | 1.0        0.0
  T    F    T   | 1.0        0.0
  F    F    T   | 1.0        0.0
  T    T    F   | 0.8        0.2
  F    T    F   | 0.0        1.0
  T    F    F   | 0.0        1.0
  F    F    F   | 0.0        1.0

CPT f_T(T_t; T_{t+1}):

  T_t | T_{t+1}=T  T_{t+1}=F
  T   | 1.0        0.0
  F   | 0.0        1.0
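The deliver-coffee CPT above is small enough to sketch directly. Below is one minimal way to store and query it; the dictionary representation is an illustrative choice, not from the lecture.

```python
# Sketch: the slide's f_R CPT for C_{t+1}, stored as a dictionary mapping a
# parent assignment (L_t, R_t, C_t) to P(C_{t+1} = True | parents).
p_coffee_next = {
    (True,  True,  True):  1.0,
    (False, True,  True):  1.0,
    (True,  False, True):  1.0,
    (False, False, True):  1.0,
    (True,  True,  False): 0.8,  # robot in office, holding coffee: delivery succeeds w.p. 0.8
    (False, True,  False): 0.0,
    (True,  False, False): 0.0,
    (False, False, False): 0.0,
}

def pr_c_next(c_next, l, r, c):
    """P(C_{t+1} = c_next | L_t = l, R_t = r, C_t = c)."""
    p_true = p_coffee_next[(l, r, c)]
    return p_true if c_next else 1.0 - p_true

print(pr_c_next(True, True, True, False))  # 0.8
```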
Slide 7: Dynamic BN Action Representation

Dynamic Bayesian networks (DBNs): a way to use BNs to represent specific actions.
- List all state variables for time t (pre-action).
- List all state variables for time t+1 (post-action).
- Indicate the parents of all t+1 variables. These can include time t and time t+1 variables (the network must be acyclic, though).
- Specify a CPT for each time t+1 variable.
Note: generally no prior is given for the time t variables. We're (generally) interested in the conditional distribution over post-action states given the pre-action state, so the time t variables are instantiated as "evidence" when using a DBN (generally).
Slide 8: Example of Dependence within a Slice

Throw Rock at Window action. (Diagram: Broken_t is a parent of Broken_{t+1}; Alarm_t and Broken_{t+1} are parents of Alarm_{t+1}.)

  P(broken_{t+1} | broken_t) = 1
  P(broken_{t+1} | ~broken_t) = .6
  P(al_{t+1} | al_t, br_t) = 1
  P(al_{t+1} | ~al_t, ~br_{t+1}) = 0
  P(al_{t+1} | ~al_t, br_{t+1}) = .95

Throwing the rock has a certain probability of breaking the window and setting off the alarm; but whether the alarm is triggered depends on whether the rock actually broke the window.
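The intra-slice arc matters when we marginalize. For instance, starting from an intact window and a quiet alarm, P(Alarm_{t+1}) is obtained by summing over the two values of Broken_{t+1}, as this small sketch shows:

```python
# Sketch: marginalizing over Broken_{t+1}, using the slide's numbers,
# from the state ~broken_t, ~al_t (intact window, quiet alarm).
p_break = 0.6            # P(br_{t+1} | ~broken_t) under the throw action
p_alarm_if_break = 0.95  # P(al_{t+1} | ~al_t, br_{t+1})
p_alarm_if_intact = 0.0  # P(al_{t+1} | ~al_t, ~br_{t+1})

p_alarm = p_break * p_alarm_if_break + (1 - p_break) * p_alarm_if_intact
print(p_alarm)  # ≈ 0.57
```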
Slide 9: Use of the BN Action Representation

DBNs specify actions concisely and naturally. They look a bit like STRIPS and the situation calculus, but allow for probabilistic effects.
How to use them:
- to generate an "expectimax" search tree for solving decision problems;
- directly in stochastic decision-making algorithms.
The first use doesn't buy us much computationally when solving decision problems. But the second allows us to compute expected utilities without enumerating the outcome space (tree); we'll see something like this with decision networks.
Slide 10: Decision Networks

Decision networks (more commonly known as influence diagrams) provide a way of representing sequential decision problems. The basic idea:
- represent the chance variables in the problem as you would in a BN;
- add decision variables – variables that you "control";
- add utility variables – how good different states are.
Slide 11: Sample Decision Network

(Diagram: chance nodes Disease, TstResult, Chills, Fever; decision nodes BloodTst and Drug; value node U; one information arc is marked "optional".)
Slide 12: Decision Networks: Chance Nodes

Chance nodes are random variables, denoted by circles; as in a BN, they depend probabilistically on their parents.

  Disease: Pr(flu) = .3, Pr(mal) = .1, Pr(none) = .6
  Fever (parent Disease): Pr(f|flu) = .5, Pr(f|mal) = .3, Pr(f|none) = .05
  TstResult (parents Disease, BloodTst):
    Pr(pos|flu,bt) = .2,  Pr(neg|flu,bt) = .8,  Pr(null|flu,bt) = 0
    Pr(pos|mal,bt) = .9,  Pr(neg|mal,bt) = .1,  Pr(null|mal,bt) = 0
    Pr(pos|none,bt) = .1, Pr(neg|none,bt) = .9, Pr(null|none,bt) = 0
    Pr(pos|D,~bt) = 0,    Pr(neg|D,~bt) = 0,    Pr(null|D,~bt) = 1
Slide 13: Decision Networks: Decision Nodes

Decision nodes are variables the decision maker sets, denoted by squares; their parents reflect the information available at the time the decision is to be made.
In the example, the decision node BloodTst ∊ {bt, ~bt} has parents Chills and Fever: the actual values of Ch and Fev will be observed before the decision to take the test must be made. The agent can make different decisions for each instantiation of the parents (i.e., policies).
Slide 14: Decision Networks: Value Node

A value node specifies the utility of a state and is denoted by a diamond; utility depends only on the state of the parents of the value node. Generally there is only one value node in a decision network.
In the example, utility depends only on Disease and Drug:

  U(fludrug, flu) = 20    U(fludrug, mal) = -300   U(fludrug, none) = -5
  U(maldrug, flu) = -30   U(maldrug, mal) = 10     U(maldrug, none) = -20
  U(no drug, flu) = -10   U(no drug, mal) = -285   U(no drug, none) = 30
Slide 15: Decision Networks: Assumptions

Decision nodes are totally ordered: decision variables D1, D2, ..., Dn are decided in sequence; e.g., BloodTst (yes, no) is decided before Drug (fd, md, no).
No-forgetting property: any information available when decision Di is made is available when decision Dj is made (for i < j); thus all parents of Di, as well as Di itself, are parents of Dj. (In the example, dashed arcs from Chills, Fever, and BloodTst into Drug ensure the no-forgetting property.)
Slide 16: Policies

Let Par(Di) be the parents of decision node Di, and Dom(Par(Di)) the set of assignments to those parents. A policy δ is a set of mappings δi, one for each decision node Di:
  δi : Dom(Par(Di)) → Dom(Di)
δi associates a decision with each parent assignment for Di. For example, a policy for BT might be:
  δBT(c,f) = bt
  δBT(c,~f) = ~bt
  δBT(~c,f) = bt
  δBT(~c,~f) = ~bt
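A policy for a single decision node is just a lookup table. The slide's δ_BT might be sketched as a dictionary keyed by the parent assignment (Chills, Fever):

```python
# Sketch: the slide's policy δ_BT as a mapping from (Chills, Fever) to a decision.
policy_bt = {
    (True,  True):  "bt",
    (True,  False): "~bt",
    (False, True):  "bt",
    (False, False): "~bt",
}

# This particular policy tests blood exactly when Fever is present.
print(policy_bt[(False, True)])  # bt
```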
Slide 17: Value of a Policy

The value of a policy δ is the expected utility given that the decision nodes are executed according to δ. Given an assignment x to the set X of all chance variables, let δ(x) denote the assignment to the decision variables dictated by δ:
- the assignment to D1 is determined by its parents' assignment in x;
- the assignment to D2 is determined by its parents' assignment in x, along with whatever was assigned to D1;
- etc.
Value of δ: EU(δ) = Σ_X P(X, δ(X)) U(X, δ(X))
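The sum EU(δ) = Σ_X P(X, δ(X)) U(X, δ(X)) can be computed by brute-force enumeration on a tiny network. The sketch below uses a toy model with one chance variable X and one decision D whose parent is X; all numbers and names are illustrative, not from the lecture's medical example.

```python
# Sketch: EU(δ) by enumeration for a toy network with one chance variable X
# (observed before deciding) and one decision D. Illustrative numbers only.
p_x = {"x0": 0.4, "x1": 0.6}                   # P(X)
utility = {("x0", "a"): 10, ("x0", "b"): 0,    # U(X, D)
           ("x1", "a"): -5, ("x1", "b"): 8}

def eu(policy):
    """Expected utility of policy δ: Dom(X) -> Dom(D)."""
    return sum(p * utility[(x, policy[x])] for x, p in p_x.items())

delta = {"x0": "a", "x1": "b"}                 # one particular policy
print(eu(delta))  # 0.4*10 + 0.6*8 = 8.8
```

Here the decision does not influence any chance variable; in general (e.g., BloodTst influencing TstResult), δ(x) must also be plugged into the probability term.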
Slide 18: Value of a Policy

Note: once δ has been fixed, the decision nodes become just like ordinary probability nodes in a Bayes net, so we can compute a distribution over their values using variable elimination. For example, the policy
  δBT(c,f) = bt, δBT(c,~f) = ~bt, δBT(~c,f) = bt, δBT(~c,~f) = ~bt
becomes the deterministic CPT:
  Pr(bt|c,f) = 1.0
  Pr(bt|c,~f) = 0.0
  Pr(bt|~c,f) = 1.0
  Pr(bt|~c,~f) = 0.0
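The policy-to-CPT conversion on the slide is mechanical: the chosen decision gets probability 1, all others 0. A minimal sketch:

```python
# Sketch: turning the slide's fixed policy δ_BT into a deterministic CPT
# P(D = d | parents), so BloodTst can be treated as an ordinary BN node.
delta_bt = {
    (True,  True):  "bt",
    (True,  False): "~bt",
    (False, True):  "bt",
    (False, False): "~bt",
}

def policy_to_cpt(delta, domain=("bt", "~bt")):
    """Map each (parent assignment, decision value) to probability 1 or 0."""
    return {(parents, d): (1.0 if choice == d else 0.0)
            for parents, choice in delta.items() for d in domain}

cpt = policy_to_cpt(delta_bt)
print(cpt[((True, True), "bt")])   # 1.0, i.e., Pr(bt|c,f) = 1.0
print(cpt[((True, False), "bt")])  # 0.0, i.e., Pr(bt|c,~f) = 0.0
```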
Slide 19: Value of a Policy

With all nodes in the influence diagram "converted" to ordinary Bayes net nodes by the fixed δ, we can compute the posterior distribution over the values of the utility node, again by variable elimination. From this posterior we can compute the expected value. For example, with the utilities from the value-node slide:

  Pr(fludrug, flu) = .1   Pr(fludrug, mal) = .05  Pr(fludrug, none) = .05
  Pr(maldrug, flu) = .1   Pr(maldrug, mal) = .1   Pr(maldrug, none) = .2
  Pr(no drug, flu) = .1   Pr(no drug, mal) = .1   Pr(no drug, none) = .2
Slide 20: Optimal Policies

An optimal policy is a policy δ* such that EU(δ*) ≥ EU(δ) for all policies δ. We can use the dynamic programming principle yet again to avoid enumerating all policies, using variable elimination to do the computations.
Slide 21: Computing the Best Policy

We can work backwards, as follows. First compute the optimal policy for Drug (the last decision):
- for each assignment to its parents (C, F, BT, TR) and for each decision value (D = md, fd, none), compute the expected value of choosing that value of D using VE;
- set the policy choice for each value of the parents to be the value of D that has max value, e.g., δD(c,f,bt,pos) = md.
Slide 22: Computing the Best Policy

Next compute the policy for BT, given the policy δD(C,F,BT,TR) just determined for Drug. Once δD(C,F,BT,TR) is fixed, we can treat Drug as a normal random variable with deterministic probabilities: for any instantiation of its parents, the value of Drug is fixed by the policy δD. This means we can solve for the optimal policy for BT just as before; once we fix BT's parents, the only uninstantiated variables are random variables.
Slide 23: Computing the Best Policy

Computing the expected values with VE: suppose we have an assignment to the parents of Drug and want to compute the EU of deciding to set Drug = md. We run variable elimination, treating C, F, BT, TR, Dr as evidence:
- this reduces the factors (e.g., U restricted to bt, md depends only on Dis);
- eliminate the remaining variables (here, only Disease is left);
- we are left with a factor: U() = Σ_Dis P(Dis|c,f,bt,pos,md) U(Dis).
We now know the EU of doing Dr = md when c, f, bt, pos hold. We can do the same for fd and no to decide which is best.
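The last step above reduces to a one-variable expectation. The sketch below computes EU(Drug = d) = Σ_Dis P(Dis|evidence) U(d, Dis) and picks the best drug; the utilities are the slide's, but the posterior over Disease is a hypothetical stand-in for what VE would actually return given c, f, bt, pos.

```python
# Sketch: choosing Drug once its parents are instantiated as evidence.
# Utilities U(Drug, Disease) are from the value-node slide; the posterior
# over Disease is hypothetical (a placeholder for a VE result).
utility = {
    ("fd", "flu"): 20,  ("fd", "mal"): -300, ("fd", "none"): -5,
    ("md", "flu"): -30, ("md", "mal"): 10,   ("md", "none"): -20,
    ("no", "flu"): -10, ("no", "mal"): -285, ("no", "none"): 30,
}
posterior = {"flu": 0.1, "mal": 0.8, "none": 0.1}  # hypothetical P(Dis | c,f,bt,pos)

def expected_utility(drug):
    """EU of setting Drug = drug under the posterior over Disease."""
    return sum(p * utility[(drug, dis)] for dis, p in posterior.items())

best = max(("fd", "md", "no"), key=expected_utility)
print(best)  # with this posterior, the malaria drug wins: prints md
```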
Slide 24: Computing Expected Utilities

The preceding illustrates a general phenomenon: computing expected utilities with BNs is quite easy. Utility nodes are just factors that can be dealt with using variable elimination. For the chain A → B → C with utility node U(B,C):
  EU = Σ_{A,B,C} P(A,B,C) U(B,C) = Σ_{A,B,C} P(C|B) P(B|A) P(A) U(B,C)
Just eliminate variables in the usual way.
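The chain computation above can be checked numerically: summing the full joint and eliminating A first (folding P(A) and P(B|A) into a factor f(B)) must agree. The numbers below are illustrative, not from the lecture.

```python
from itertools import product

# Sketch: EU = Σ_{A,B,C} P(A) P(B|A) P(C|B) U(B,C) for the slide's chain,
# with made-up numbers; all variables are binary (0/1).
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}  # key (b, a)
p_c_given_b = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.7}  # key (c, b)
u = {(0, 0): 0.0, (0, 1): 5.0, (1, 0): 2.0, (1, 1): 10.0}           # U(b, c)

# Brute force over the full joint:
eu = sum(p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)] * u[(b, c)]
         for a, b, c in product((0, 1), repeat=3))

# Eliminating A first gives the same answer with smaller factors:
f_b = {b: sum(p_a[a] * p_b_given_a[(b, a)] for a in (0, 1)) for b in (0, 1)}
eu_ve = sum(f_b[b] * p_c_given_b[(c, b)] * u[(b, c)]
            for b, c in product((0, 1), repeat=2))

print(eu, eu_ve)  # the two sums agree
```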
Slide 25: Optimizing Policies: Key Points

If a decision node D has no decisions that follow it, we can find its policy by instantiating each of its parents and computing the expected utility of each decision for each parent instantiation:
- no-forgetting means that all other decisions are instantiated (they must be parents);
- it's easy to compute the expected utility using VE;
- the number of computations is quite large: we run an expected-utility calculation (VE) for each parent instantiation together with each possible decision D might allow;
- policy: choose the max decision for each parent instantiation.
Slide 26: Optimizing Policies: Key Points

Once a decision node D is optimized, it can be treated as a random variable: for each instantiation of its parents we now know what value the decision should take. Just treat the policy as a new CPT: for a given parent instantiation x, D gets δ(x) with probability 1 (all other decisions get probability zero).
If we optimize from the last decision to the first, at each point we can optimize a specific decision by (a bunch of) simple VE calculations: its successor decisions (already optimized) are just normal nodes in the BN (with CPTs).
Slide 27: Decision Network Notes

Decision networks are commonly used by decision analysts to help structure decision problems. Much work has been put into computationally effective techniques to solve them; a common trick is to replace the decision nodes with random variables at the outset and solve a plain Bayes net (a subtle but useful transformation).
Complexity is much greater than BN inference: we need to solve a number of BN inference problems, one for each setting of the decision node's parents and decision node value.