CS 188: Artificial Intelligence, Fall 2009
Lecture 18: Decision Diagrams
10/29/2009
Dan Klein – UC Berkeley
Announcements
- Mid-semester evaluations: link is in your email
- Assignments: W3 out later this week; P4 out next week
- Contest!
Decision Networks
- MEU: choose the action which maximizes the expected utility given the evidence
- Can directly operationalize this with decision networks: Bayes nets with nodes for utility and actions; lets us calculate the expected utility for each action
- New node types:
  - Chance nodes (just like BNs)
  - Action nodes (rectangles, cannot have parents, act as observed evidence)
  - Utility node (diamond, depends on action and chance nodes)
[Diagram: Weather → Forecast; Umbrella and Weather feed the utility node U]
[DEMO: Ghostbusters]
Decision Networks
Action selection:
- Instantiate all evidence
- Set the action node(s) each possible way
- Calculate the posterior over the parents of the utility node, given the evidence
- Calculate the expected utility for each action
- Choose the maximizing action (a minimal sketch of this loop follows)
[Diagram: Weather → Forecast; Umbrella and Weather feed U]
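A minimal Python sketch of that loop. Here `posterior` is a hypothetical helper assumed to wrap ordinary Bayes-net inference (e.g. variable elimination); all names are illustrative, not part of the lecture:

```python
def select_action(actions, utility_parents, utility_fn, posterior, evidence):
    """Return the MEU action in a decision network.

    posterior(vars, evidence) is a hypothetical helper returning a dict
    from assignments of `vars` to their probability given the evidence.
    """
    best_action, best_eu = None, float("-inf")
    for a in actions:                                  # set the action node each way
        eu = 0.0
        for s, p in posterior(utility_parents, evidence).items():
            eu += p * utility_fn(a, s)                 # expected utility of action a
        if eu > best_eu:
            best_action, best_eu = a, eu               # keep the maximizer
    return best_action, best_eu
```

On the umbrella network, `actions` would be {take, leave}, the utility node's only chance parent is Weather, and `posterior` returns P(W | evidence).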
Example: Decision Networks
P(W): sun 0.7, rain 0.3
U(A, W): leave/sun 100, leave/rain 0, take/sun 20, take/rain 70
- EU(leave) = 0.7 · 100 + 0.3 · 0 = 70
- EU(take) = 0.7 · 20 + 0.3 · 70 = 35
- Optimal decision = leave
Decisions as Outcome Trees
- Almost exactly like expectimax / MDPs
- What's changed?
[Tree: from the root {}, a max node chooses take or leave; each branch passes through a Weather chance node (sun / rain); the leaves are U(t,s), U(t,r), U(l,s), U(l,r)]
Evidence in Decision Networks
- Find P(W | F=bad)
- We have P(W): sun 0.7, rain 0.3, and the forecast CPTs: P(F | sun): good 0.8, bad 0.2; P(F | rain): good 0.1, bad 0.9
- Select for the evidence: P(F=bad | W): sun 0.2, rain 0.9
- First we join P(W) and P(F=bad | W): P(W, F=bad): sun 0.14, rain 0.27
- Then we normalize: P(W | F=bad): sun 0.34, rain 0.66 (a minimal sketch of this computation follows)
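The same numbers as a minimal join-then-normalize computation:

```python
prior = {"sun": 0.7, "rain": 0.3}        # P(W)
likelihood = {"sun": 0.2, "rain": 0.9}   # P(F=bad | W), selected for the evidence

joint = {w: prior[w] * likelihood[w] for w in prior}   # join: P(W, F=bad)
z = sum(joint.values())                                # P(F=bad) = 0.41
posterior = {w: joint[w] / z for w in joint}           # normalize: P(W | F=bad)
print(posterior)   # {'sun': 0.341..., 'rain': 0.658...}
```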
Example: Decision Networks
Evidence: Forecast = bad
P(W | F=bad): sun 0.34, rain 0.66
U(A, W): leave/sun 100, leave/rain 0, take/sun 20, take/rain 70
- EU(leave | F=bad) = 0.34 · 100 + 0.66 · 0 = 34
- EU(take | F=bad) = 0.34 · 20 + 0.66 · 70 = 53
- Optimal decision = take
Decisions as Outcome Trees
[Tree: as before, but the root is now {b} (forecast = bad); the chance nodes are W | {b} (sun / rain), and the leaves are U(t,s), U(t,r), U(l,s), U(l,r)]
Conditioning on Action Nodes
- An action node can be a parent of a chance node
- The chance node then conditions on the outcome of the action
- Action nodes are like observed variables in a Bayes' net, except that we max over their values
[Diagram: S and action A are parents of S', with transition model T(s, a, s'); the utility node U depends on R(s, a, s')]
Value of Information
- Idea: compute the value of acquiring evidence; can be done directly from the decision network
- Example: buying oil drilling rights
  - Two blocks, A and B; exactly one has oil, worth k
  - You can drill in one location
  - Prior probabilities 0.5 each, mutually exclusive: P(O): a 1/2, b 1/2
  - Payoffs U(D, O): drill a / oil a: k; drill a / oil b: 0; drill b / oil a: 0; drill b / oil b: k
  - Drilling in either A or B has MEU = k/2
- Question: what's the value of information?
  - Value of knowing which of A or B has oil
  - Value is the expected gain in MEU from the new info
  - The survey may say "oil in A" or "oil in B", with probability 0.5 each
  - If we know OilLoc, MEU is k (either way)
  - Gain in MEU from knowing OilLoc: VPI(OilLoc) = k - k/2 = k/2
  - Fair price of the information: k/2 (a quick numeric check follows)
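A quick check of the oil numbers in Python; the value of k is arbitrary (say 1000):

```python
k = 1000
p_oil = {"a": 0.5, "b": 0.5}                       # P(OilLoc)
u = {("a", "a"): k, ("a", "b"): 0,
     ("b", "a"): 0, ("b", "b"): k}                 # U(DrillLoc, OilLoc)

# Act now: commit to one drill site before learning anything.
meu_now = max(sum(p_oil[o] * u[(d, o)] for o in p_oil) for d in "ab")       # k/2

# Learn OilLoc first, then drill the right site either way.
meu_informed = sum(p_oil[o] * max(u[(d, o)] for d in "ab") for o in p_oil)  # k

print(meu_informed - meu_now)   # VPI(OilLoc) = k/2 = 500
```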
Value of Information
- Assume we have evidence E = e. Value if we act now:
  $\text{MEU}(e) = \max_a \sum_s P(s \mid e)\, U(s, a)$
- Assume we see that E' = e'. Value if we act then:
  $\text{MEU}(e, e') = \max_a \sum_s P(s \mid e, e')\, U(s, a)$
- BUT E' is a random variable whose value is unknown, so we don't know what e' will be
- Expected value if E' is revealed and then we act:
  $\text{MEU}(e, E') = \sum_{e'} P(e' \mid e)\, \text{MEU}(e, e')$
- Value of information: how much MEU goes up by revealing E' first:
  $\text{VPI}(E' \mid e) = \text{MEU}(e, E') - \text{MEU}(e)$
VPI Example: Weather
U(A, W): leave/sun 100, leave/rain 0, take/sun 20, take/rain 70
- MEU with no evidence: 70 (leave)
- MEU if forecast is bad: 53 (take)
- MEU if forecast is good: ≈ 95 (leave)
- Forecast distribution P(F): good 0.59, bad 0.41
(A worked computation follows.)
[Demo]
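A minimal end-to-end computation of VPI(Forecast) for this network, implementing the formulas from the previous slide:

```python
p_w = {"sun": 0.7, "rain": 0.3}                          # P(W)
p_f_given_w = {"sun": {"good": 0.8, "bad": 0.2},         # P(F | W)
               "rain": {"good": 0.1, "bad": 0.9}}
u = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20, ("take", "rain"): 70}          # U(A, W)

def meu(p):
    """Max over actions of the expected utility under a belief p over W."""
    return max(sum(p[w] * u[(a, w)] for w in p) for a in ("leave", "take"))

p_f = {f: sum(p_w[w] * p_f_given_w[w][f] for w in p_w)
       for f in ("good", "bad")}                         # P(F): good 0.59, bad 0.41

vpi = sum(p_f[f] * meu({w: p_w[w] * p_f_given_w[w][f] / p_f[f] for w in p_w})
          for f in p_f) - meu(p_w)
print(round(vpi, 2))   # ~7.7: the fair price of seeing the forecast
```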
VPI Properties
- Nonnegative: $\text{VPI}(E' \mid e) \ge 0$ for any $E'$ and $e$
- Nonadditive: consider, e.g., obtaining $E_j$ twice; in general $\text{VPI}(E_j, E_k \mid e) \ne \text{VPI}(E_j \mid e) + \text{VPI}(E_k \mid e)$
- Order-independent: $\text{VPI}(E_j, E_k \mid e) = \text{VPI}(E_j \mid e) + \text{VPI}(E_k \mid e, E_j) = \text{VPI}(E_k \mid e) + \text{VPI}(E_j \mid e, E_k)$
Quick VPI Questions
- The soup of the day is either clam chowder or split pea, but you wouldn't order either one. What's the value of knowing which it is?
- There are two kinds of plastic forks at a picnic. It must be that one is slightly better. What's the value of knowing which?
- You have $10 to bet double-or-nothing, and there is a 75% chance that Berkeley will beat Stanford. What's the value of knowing the outcome in advance?
- Now suppose you must bet on Cal either way. What's the value of the information then?
Reasoning over Time
- Often, we want to reason about a sequence of observations: speech recognition, robot localization, user attention, medical monitoring
- Need to introduce time into our models
- Basic approach: hidden Markov models (HMMs)
- More general: dynamic Bayes' nets
Markov Models
- A Markov model is a chain-structured BN: X_1 → X_2 → X_3 → X_4 → ...
- Each node is identically distributed (stationarity)
- The value of X at a given time is called the state
- Parameters: called transition probabilities or dynamics; they specify how the state evolves over time (also, the initial probabilities)
[DEMO: Ghostbusters]
Conditional Independence
- Basic conditional independence: past and future are independent given the present
- Each time step only depends on the previous one
- This is called the (first-order) Markov property
- Note that the chain is just a (growing) BN: we can always use generic BN reasoning on it if we truncate the chain at a fixed length
Example: Markov Chain
- Weather states: X ∈ {rain, sun}
- Transitions: stay in the same state with probability 0.9, switch with probability 0.1 (this is a CPT, not a BN!)
- Initial distribution: 1.0 sun
- What's the probability distribution after one step? P(X_2 = sun) = 0.9, P(X_2 = rain) = 0.1
Mini-Forward Algorithm
- Question: what's the probability of being in state x at time t?
- Slow answer: enumerate all sequences of length t which end in x, and add up their probabilities:
  $P(X_t = x) = \sum_{x_1, \ldots, x_{t-1}} P(x_1)\, P(x_2 \mid x_1) \cdots P(x \mid x_{t-1})$
Mini-Forward Algorithm
- Better way: cached incremental belief updates (an instance of variable elimination!)
- Forward simulation: $P(x_t) = \sum_{x_{t-1}} P(x_t \mid x_{t-1})\, P(x_{t-1})$ (see the sketch below)
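A minimal mini-forward simulation in Python. The symmetric 0.9/0.1 transition model is an assumption, read off the example chain above where the original diagram was lost:

```python
# P(X_t | X_{t-1}): stay with probability 0.9, switch with 0.1 (assumed
# symmetric, matching the example chain above).
T = {"sun": {"sun": 0.9, "rain": 0.1},
     "rain": {"sun": 0.1, "rain": 0.9}}

def forward_step(belief):
    """One cached belief update: P(x_t) = sum over x' of P(x_t | x') P(x')."""
    return {x: sum(T[xp][x] * belief[xp] for xp in belief) for x in T}

belief = {"sun": 1.0, "rain": 0.0}   # initial distribution: 1.0 sun
for t in range(3):
    belief = forward_step(belief)
    print(t + 1, belief)             # (0.9, 0.1), (0.82, 0.18), (0.756, 0.244)
```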
Example
- From an initial observation of sun: P(X_1), P(X_2), P(X_3), ..., P(X_∞)
- From an initial observation of rain: P(X_1), P(X_2), P(X_3), ..., P(X_∞)
[The numeric belief tables were lost in extraction; both runs converge to the same P(X_∞)]
Stationary Distributions
- If we simulate the chain long enough, what happens? Uncertainty accumulates; eventually, we have no idea what the state is!
- Stationary distributions: for most chains, the distribution we end up in is independent of the initial distribution; it is called the stationary distribution of the chain (see the sketch below)
- Usually, we can only predict a short time out
[DEMO: Ghostbusters]
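Continuing the mini-forward sketch, iterating the belief update to convergence gives the stationary distribution; for the assumed symmetric 0.9/0.1 chain it is uniform, regardless of the start:

```python
T = {"sun": {"sun": 0.9, "rain": 0.1},       # assumed symmetric chain, as above
     "rain": {"sun": 0.1, "rain": 0.9}}

belief = {"sun": 1.0, "rain": 0.0}           # any start converges the same way
for _ in range(1000):                        # plenty of steps to converge
    belief = {x: sum(T[xp][x] * belief[xp] for xp in belief) for x in T}
print(belief)                                # ~{'sun': 0.5, 'rain': 0.5}
```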
Web Link Analysis
- PageRank over a web graph: each web page is a state
- Initial distribution: uniform over pages
- Transitions: with probability c, jump uniformly to a random page (dotted lines); with probability 1 - c, follow a random outlink (solid lines)
- Stationary distribution: will spend more time on highly reachable pages (e.g., there are many ways to get to the Acrobat Reader download page); somewhat robust to link spam
- Google 1.0 returned the set of pages containing all your keywords, ordered by decreasing rank; now all search engines use link analysis along with many other factors (see the sketch below)
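A minimal PageRank power iteration matching the random-surfer transitions above; the tiny link graph and the value of c are made up for illustration:

```python
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}  # toy web graph
pages = list(links)
c = 0.15                                            # random-jump probability

rank = {p: 1.0 / len(pages) for p in pages}         # uniform initial distribution
for _ in range(100):                                # power iteration
    new = {p: c / len(pages) for p in pages}        # uniform-jump mass
    for p in pages:
        for q in links[p]:                          # follow a random outlink
            new[q] += (1 - c) * rank[p] / len(links[p])
    rank = new
print({p: round(r, 3) for p, r in rank.items()})    # 'c', the most linked-to page, ranks highest
```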
Value of (Perfect) Information
- Current evidence E = e; utility depends on S = s
- Potential new evidence E': suppose we knew E' = e'
- BUT E' is a random variable whose value is currently unknown, so we must compute the expected gain over all possible values e'
- (VPI = value of perfect information)
VPI Example: Ghostbusters
- Reminder: the ghost is hidden, sensors are noisy
- T: top square is red; B: bottom square is red; G: ghost is in the top
- Sensor model: P(+t | +g) = 0.8, P(+t | -g) = 0.4, P(+b | +g) = 0.4, P(+b | -g) = 0.8
- Joint distribution P(T, B, G):
  +t +b +g 0.16
  +t +b -g 0.16
  +t -b +g 0.24
  +t -b -g 0.04
  -t +b +g 0.04
  -t +b -g 0.24
  -t -b +g 0.06
  -t -b -g 0.06
[Demo]
VPI Example: Ghostbusters (continued)
- Utility of a bust is 2, no bust is 0
- Q1: What's the value of knowing T if I know nothing? Q1': $\text{VPI}(T) = \mathbb{E}_{P(t)}[\text{MEU}(t)] - \text{MEU}(\varnothing)$
- Q2: What's the value of knowing B if I already know that T is true (red)? Q2': $\text{VPI}(B \mid t) = \mathbb{E}_{P(b \mid t)}[\text{MEU}(t, b)] - \text{MEU}(t)$
- How low can the value of information ever be?
- Joint distribution P(T, B, G): as on the previous slide (see the sketch below)
[Demo]
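A hedged sketch of Q1, assuming (this is not stated on the slide) that the action is choosing which square to bust and that the utility is 2 when you bust the ghost's square and 0 otherwise:

```python
# Joint P(T, B, G) from the slide; +1 / -1 encode +t / -t, and so on.
joint = {(+1, +1, +1): 0.16, (+1, +1, -1): 0.16,
         (+1, -1, +1): 0.24, (+1, -1, -1): 0.04,
         (-1, +1, +1): 0.04, (-1, +1, -1): 0.24,
         (-1, -1, +1): 0.06, (-1, -1, -1): 0.06}

def meu(t=None):
    """MEU of busting top vs. bottom, optionally given evidence T = t.
    Assumes utility 2 for busting the ghost's square, 0 otherwise."""
    consistent = {k: p for k, p in joint.items() if t is None or k[0] == t}
    z = sum(consistent.values())
    p_top = sum(p for (_, _, g), p in consistent.items() if g == +1) / z
    return max(2 * p_top, 2 * (1 - p_top))

p_t = sum(p for (t, _, _), p in joint.items() if t == +1)   # P(+t) = 0.6
vpi_T = p_t * meu(+1) + (1 - p_t) * meu(-1) - meu()
print(round(vpi_T, 3))   # 0.4 under these assumptions
```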