Making Simple Decisions
Chapter 16
Some material borrowed from Jean-Claude Latombe and Daphne Koller by way of Marie desJardins
Topics
Decision making under uncertainty
–Utility theory and rationality
–Expected utility
–Utility functions
–Multiattribute utility functions
–Preference structures
–Decision networks
–Value of information
Uncertain Outcomes of Actions
Some actions may have uncertain outcomes
–Action: spend $10 to buy a lottery ticket that pays $1000 to the winner
–Outcome: {win, not-win}
Each outcome is associated with some merit (utility)
–Win: gain $990
–Not-win: lose $10
There is a probability distribution associated with the outcomes of this action: (0.0001, 0.9999).
Should I take this action?
Expected Utility
Random variable X with n values x_1,…,x_n and distribution (p_1,…,p_n)
–X is the outcome of performing action A (i.e., the state reached after A is taken)
Function U of X
–U is a mapping from states to numerical utilities (values)
The expected utility of performing action A is
EU[A] = ∑_{i=1,…,n} p(x_i | A) U(x_i)
–p(x_i | A): probability of each outcome
–U(x_i): utility of each outcome
Expected utility of lottery: 0.0001*990 – 0.9999*10 = –9.9
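The lottery calculation can be checked with a few lines of Python. This is a minimal sketch: the helper name `expected_utility` is illustrative, the not-win probability is assumed to be the complement 0.9999, and assigning utility 0 to not buying a ticket is an assumption that does not appear on the slide.

```python
def expected_utility(outcomes):
    """Return the sum of p * u over (probability, utility) pairs."""
    total_p = sum(p for p, _ in outcomes)
    assert abs(total_p - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * u for p, u in outcomes)

# Lottery from the slide: win with probability 0.0001 (gain $990),
# otherwise (assumed probability 0.9999) lose the $10 ticket price.
lottery = [(0.0001, 990.0), (0.9999, -10.0)]
eu_buy = expected_utility(lottery)

# Assumption: not buying leaves the agent where it started (utility 0).
eu_skip = expected_utility([(1.0, 0.0)])

print(round(eu_buy, 4))   # -9.9
print(eu_skip > eu_buy)   # True
```

Under these assumptions, and with utility equal to money, buying the ticket has lower expected utility than not buying it.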
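One State/One Action Example
[Figure: action A1 taken in state s0 leads to one of the successor states s1, s2, s3; the utility of s0 under A1 is the probability-weighted sum of the successor utilities.]
U(S0|A1) = 100 x … + … x … + … x 0.1 = 62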
One State/Two Actions Example
[Figure: two actions, A1 and A2, are available in state s0; the successor states shown are s1, s2, s3 and s4.]
U1(S0|A1) = 62
U2(S0|A2) = 74
U(S0) = max{U1(S0|A1), U2(S0|A2)} = 74
Introducing Action Costs
[Figure: the same two-action example, with a cost attached to each action (5 for A1, 25 for A2).]
U1(S0|A1) = 62 – 5 = 57
U2(S0|A2) = 74 – 25 = 49
U(S0) = max{U1(S0|A1), U2(S0|A2)} = 57
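The effect of action costs on the choice can be sketched as follows. The expected utilities (62, 74) and costs (5, 25) are taken from the slides; the dictionary names are illustrative only.

```python
# Expected utilities before costs, and the cost of each action (from the slides).
expected_utility = {"A1": 62, "A2": 74}
cost = {"A1": 5, "A2": 25}

# Net utility of each action is its expected utility minus its cost.
net = {a: expected_utility[a] - cost[a] for a in expected_utility}

# The rational choice maximizes net expected utility.
best_action = max(net, key=net.get)

print(net)          # {'A1': 57, 'A2': 49}
print(best_action)  # A1
```

Without costs A2 is preferred (74 > 62); once costs are subtracted the preference flips to A1.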
MEU Principle
Decision theory: A rational agent should choose the action that maximizes the agent's expected utility
Maximizing expected utility (MEU) is a normative criterion for rational choices of actions
Requires a complete model of:
–Actions
–States
–Utilities
Even with a complete model, finding the action that maximizes expected utility may be computationally intractable
Comparing outcomes
Which is better:
A = Being rich and sunbathing where it's warm
B = Being rich and sunbathing where it's cool
C = Being poor and sunbathing where it's warm
D = Being poor and sunbathing where it's cool
Multiattribute utility theory
–A clearly dominates B: A > B. Likewise A > C, C > D, and A > D. What about B vs. C?
–Simplest case: additive value function (just add the individual attribute utilities)
–Alternatively, use a weighted utility, with weights based on the relative importance of the attributes
–Or learn the combined utility function directly (similar to a joint probability table)
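The B vs. C question can be illustrated with a weighted additive value function. This is only a sketch: the 0/1 attribute values and both sets of weights are hypothetical and do not appear in the slides. They are chosen to show that dominance fixes the position of A and D, while the order of B and C depends on the weights.

```python
# Hypothetical attribute values on a 0..1 scale (not from the slides).
WEALTH = {"rich": 1.0, "poor": 0.0}
CLIMATE = {"warm": 1.0, "cool": 0.0}

OUTCOMES = {
    "A": ("rich", "warm"),
    "B": ("rich", "cool"),
    "C": ("poor", "warm"),
    "D": ("poor", "cool"),
}

def weighted_value(outcome, w_wealth, w_climate):
    """Weighted sum of the two attribute values for one outcome."""
    wealth, climate = OUTCOMES[outcome]
    return w_wealth * WEALTH[wealth] + w_climate * CLIMATE[climate]

def ranking(w_wealth, w_climate):
    """Outcomes sorted from most to least preferred under the given weights."""
    return sorted(OUTCOMES,
                  key=lambda o: weighted_value(o, w_wealth, w_climate),
                  reverse=True)

print(ranking(0.7, 0.3))  # ['A', 'B', 'C', 'D']  wealth matters more
print(ranking(0.3, 0.7))  # ['A', 'C', 'B', 'D']  climate matters more
```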
Axioms of Utility Theory
Orderability
–(A > B) ∨ (A < B) ∨ (A ~ B)
Transitivity
–(A > B) ∧ (B > C) ⇒ (A > C)
Continuity
–A > B > C ⇒ ∃p [p,A; 1-p,C] ~ B
Substitutability
–A ~ B ⇒ [p,A; 1-p,C] ~ [p,B; 1-p,C]
Monotonicity
–A > B ⇒ (p ≥ q ⇔ [p,A; 1-p,B] ≿ [q,A; 1-q,B])
Decomposability
–[p,A; 1-p, [q,B; 1-q, C]] ~ [p,A; (1-p)q, B; (1-p)(1-q), C]
Money Versus Utility
Money <> Utility
–More money is better, but utility is not always linear in the amount of money
Expected Monetary Value (EMV)
–For a lottery L, S_EMV(L) is the state of receiving the expected monetary value of L for certain
–Risk-averse: U(L) < U(S_EMV(L))
–Risk-seeking: U(L) > U(S_EMV(L))
–Risk-neutral: U(L) = U(S_EMV(L))
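The three risk attitudes can be made concrete by comparing the utility of a lottery with the utility of its expected monetary value received for certain. In this sketch the 50/50 lottery over $0 and $100 and the three utility functions (square root, identity, square) are hypothetical choices, not taken from the slides.

```python
import math

# Hypothetical lottery: (probability, monetary payoff) pairs.
LOTTERY = [(0.5, 0.0), (0.5, 100.0)]

def expected(lottery, f=lambda x: x):
    """Expected value of f(payoff) under the lottery."""
    return sum(p * f(x) for p, x in lottery)

EMV = expected(LOTTERY)  # expected monetary value: 50.0

def attitude(u):
    """Classify a utility function u by comparing U(L) with U(S_EMV(L))."""
    u_lottery = expected(LOTTERY, u)  # U(L): expected utility of playing
    u_sure = u(EMV)                   # U(S_EMV(L)): utility of the sure amount
    if math.isclose(u_lottery, u_sure):
        return "risk-neutral"
    return "risk-averse" if u_lottery < u_sure else "risk-seeking"

print(attitude(math.sqrt))         # risk-averse   (concave utility)
print(attitude(lambda x: x))       # risk-neutral  (linear utility)
print(attitude(lambda x: x ** 2))  # risk-seeking  (convex utility)
```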
Value Function
Provides a ranking of alternatives, but not a meaningful metric scale
Also known as an “ordinal utility function”
Sometimes, only relative judgments (value functions) are necessary
At other times, absolute judgments (utility functions) are required
Multiattribute Utility Theory
A given state may have multiple utilities
–...because of multiple evaluation criteria
–...because of multiple agents (interested parties) with different utility functions
Decision networks
Extend Bayesian nets to handle actions and utilities
–a.k.a. influence diagrams
Make use of Bayesian net inference
Useful application: Value of Information
Decision network representation
Chance nodes: random variables, as in Bayesian nets
Decision nodes: actions that the decision maker can take
Utility/value nodes: the utility of the outcome state
R&N example
Evaluating decision networks
Set the evidence variables for the current state.
For each possible value of the decision node (assuming there is just one decision node):
–Set the decision node to that value.
–Calculate the posterior probabilities for the parent nodes of the utility node, using BN inference.
–Calculate the resulting expected utility for the action.
Return the action with the highest expected utility.
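The evaluation loop above can be sketched in Python as follows. This is not a full decision-network implementation: the `outcome_posterior` callback is a hypothetical stand-in for Bayesian-network inference with the evidence already set, and the toy problem (actions `safe` and `risky`, with their probabilities and utilities) is invented for illustration.

```python
def evaluate_decision(actions, outcome_posterior, utility):
    """Return (best_action, best_expected_utility) for a single decision node.

    outcome_posterior(action) -> dict mapping outcome -> probability
        (stand-in for BN inference over the utility node's parents)
    utility(outcome) -> numeric utility of that outcome
    """
    best_action, best_eu = None, float("-inf")
    for action in actions:                     # set the decision node
        posterior = outcome_posterior(action)  # posterior over outcomes
        eu = sum(p * utility(o) for o, p in posterior.items())
        if eu > best_eu:                       # keep the highest expected utility
            best_action, best_eu = action, eu
    return best_action, best_eu

# Hypothetical toy problem (not from the slides).
POSTERIORS = {
    "safe": {"ok": 1.0},
    "risky": {"good": 0.5, "bad": 0.5},
}
UTILITIES = {"ok": 10.0, "good": 30.0, "bad": -20.0}

action, eu = evaluate_decision(
    POSTERIORS,
    lambda a: POSTERIORS[a],
    lambda o: UTILITIES[o],
)
print(action, eu)  # safe 10.0
```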
Exercise: Umbrella network
Nodes: Weather (chance), Forecast (chance), Umbrella (decision: take / don't take), Lug umbrella (chance), Happiness (utility)

f      w        p(f|w)
sunny  rain     0.3
rainy  rain     0.7
sunny  no rain  0.8
rainy  no rain  0.2

P(rain) = 0.4
P(lug|take) = 1.0
P(~lug|~take) = 1.0

U(lug, rain) = -25
U(lug, ~rain) = 0
U(~lug, rain) = -100
U(~lug, ~rain) = 100

EU(take) = U(lug, rain)*P(lug|take)*P(rain) + U(lug, ~rain)*P(lug|take)*P(~rain) = -25*0.4 + 0*0.6 = -10
EU(~take) = U(~lug, rain)*P(~lug|~take)*P(rain) + U(~lug, ~rain)*P(~lug|~take)*P(~rain) = -100*0.4 + 100*0.6 = 20
Without a forecast, the best decision is therefore ~take.
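The two expected utilities can be reproduced by summing over the Lug umbrella and Weather nodes. The numbers below come from the slide; the variable names and the encoding of the decision as the strings `"take"` and `"dont_take"` are illustrative choices.

```python
# Prior over the weather: True means rain.
P_RAIN = {True: 0.4, False: 0.6}

# P(lug | decision): taking the umbrella means lugging it, and vice versa.
P_LUG = {
    "take": {True: 1.0, False: 0.0},
    "dont_take": {True: 0.0, False: 1.0},
}

# Utility table U(lug, rain).
U = {
    (True, True): -25, (True, False): 0,
    (False, True): -100, (False, False): 100,
}

def expected_utility(decision):
    """Sum U(lug, rain) * P(lug | decision) * P(rain) over both chance nodes."""
    return sum(P_LUG[decision][lug] * P_RAIN[rain] * U[(lug, rain)]
               for lug in (True, False)
               for rain in (True, False))

eu = {d: expected_utility(d) for d in P_LUG}
best = max(eu, key=eu.get)

print(round(eu["take"], 6))       # -10.0
print(round(eu["dont_take"], 6))  # 20.0
print(best)                       # dont_take
```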
Umbrella network
(Same network, table p(f|w), prior P(rain) = 0.4, and utilities as in the umbrella exercise.)
The decision may be helped by the forecast (additional information):
–D(F=sunny) = Not_Take
–D(F=rainy) = Take
Value of Perfect Information (VPI)
How much is it worth to observe (with certainty) a random variable X?
Suppose the agent's current knowledge is E. The value of the current best action α is:
EU(α | E) = max_A ∑_i U(Result_i(A)) p(Result_i(A) | E, Do(A))
The value of the new best action α' after observing the value of X is:
EU(α' | E, X) = max_A ∑_i U(Result_i(A)) p(Result_i(A) | E, X, Do(A))
…But we don't know the value of X yet, so we have to sum over its possible values
The value of perfect information for X is therefore:
VPI(X) = ( ∑_k p(x_k | E) EU(α_xk | x_k, E) ) – EU(α | E)
–p(x_k | E): probability of each value of X
–EU(α_xk | x_k, E): expected utility of the best action given that value of X
–EU(α | E): expected utility of the best action if we don't know X (i.e., currently)
Value of Perfect Information (VPI)
Show that VPI(X) = ( ∑_k p(x_k | E) EU(α_xk | x_k, E) ) – EU(α | E) ≥ 0.
By the law of total probability (summing over the values of X, and assuming the choice of action does not influence X) we have:
EU(α | E) = ∑_i U(Result_i(α)) p(Result_i(α) | E, α)
= ∑_i U(Result_i(α)) ∑_k p(Result_i(α) | E, x_k, α) p(x_k | E)
= ∑_k p(x_k | E) EU(α | x_k, E)
Then VPI(X) can be re-written as
VPI(X) = ∑_k p(x_k | E) ( EU(α_xk | x_k, E) – EU(α | x_k, E) )
But α_xk is the action with the highest EU for the given x_k and E, so EU(α_xk | x_k, E) – EU(α | x_k, E) ≥ 0 for all x_k, and thus VPI(X) ≥ 0.
Observing X is worthwhile only if VPI(X) > Cost of observing X.
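The non-negativity result can be checked numerically on randomly generated problems. This sketch simplifies the slide's setting: it assumes a hidden state whose distribution depends on X but not on the action, and a utility table indexed by action and state. The function names, problem sizes, and random seed are all illustrative assumptions.

```python
import random

def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def vpi(p_x, p_s_given_x, utility):
    """Value of perfect information about X.

    p_x[k]            = p(x_k | E)
    p_s_given_x[k][i] = p(s_i | x_k, E)   (assumed independent of the action)
    utility[a][i]     = U of taking action a in state s_i
    """
    n_states = len(utility[0])

    def best_eu(p_s):
        # Expected utility of the best action under state distribution p_s.
        return max(sum(p * u for p, u in zip(p_s, row)) for row in utility)

    # Marginal state distribution: p(s_i | E) = sum_k p(s_i | x_k, E) p(x_k | E).
    p_s = [sum(p_x[k] * p_s_given_x[k][i] for k in range(len(p_x)))
           for i in range(n_states)]
    informed = sum(p_x[k] * best_eu(p_s_given_x[k]) for k in range(len(p_x)))
    return informed - best_eu(p_s)

rng = random.Random(0)  # fixed seed so the check is repeatable
worst = min(
    vpi(normalize([rng.random() for _ in range(3)]),
        [normalize([rng.random() for _ in range(4)]) for _ in range(3)],
        [[rng.uniform(-100, 100) for _ in range(4)] for _ in range(2)])
    for _ in range(1000)
)
print(worst >= -1e-9)  # True: VPI is never negative (up to rounding error)
```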
VPI exercise: Umbrella network
(Same network, table p(f|w), prior P(rain) = 0.4, and utilities as in the umbrella exercise.)
What's the value of knowing the weather forecast before leaving home?
Exercise: Umbrella network (solution)
Posterior over the weather given the forecast: p(W|F) = α*p(F|W)*P(W), with P(rain) = 0.4
Forecast sunny:
–p(sunny|rain)*P(rain) = 0.3*0.4 = 0.12
–p(sunny|~rain)*P(~rain) = 0.8*0.6 = 0.48
–α = 1/(0.12 + 0.48) = 5/3, so P(f=sunny) = 0.6
–p(rain|sunny) = 0.12*5/3 = 0.2
–p(~rain|sunny) = 0.48*5/3 = 0.8
Forecast rainy (similarly):
–p(rainy|rain)*P(rain) = 0.7*0.4 = 0.28
–p(rainy|~rain)*P(~rain) = 0.2*0.6 = 0.12
–α = 1/(0.28 + 0.12) = 2.5, so P(f=rainy) = 0.4
–p(rain|rainy) = 0.28*2.5 = 0.7
–p(~rain|rainy) = 0.12*2.5 = 0.3
Expected utilities given the forecast:
EU(take|f=sunny) = -25*P(rain|sunny) + 0*P(~rain|sunny) = -25*0.2 = -5
EU(~take|f=sunny) = -100*0.2 + 100*0.8 = 60
a1 = ~take
EU(take|f=rainy) = -25*P(rain|rainy) + 0*P(~rain|rainy) = -25*0.7 = -17.5
EU(~take|f=rainy) = -100*0.7 + 100*0.3 = -40
a2 = take
VPI(F) = 60*P(f=sunny) – 17.5*P(f=rainy) – 20 = 60*0.6 – 17.5*0.4 – 20 = 9
(The final 20 is EU(~take), the expected utility of the best action without the forecast.)
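The whole VPI calculation for the forecast can be reproduced in a short script. The probabilities and utilities are those of the umbrella network; the names are illustrative, and the Lug umbrella node is folded into the decision because P(lug|take) = P(~lug|~take) = 1.

```python
# Prior over the weather: True means rain.
P_W = {True: 0.4, False: 0.6}

# Forecast model p(f | w), keyed by (forecast, rain).
P_F_GIVEN_W = {
    ("sunny", True): 0.3, ("rainy", True): 0.7,
    ("sunny", False): 0.8, ("rainy", False): 0.2,
}

# Utility keyed by (decision, rain); taking the umbrella implies lugging it.
U = {
    ("take", True): -25, ("take", False): 0,
    ("dont_take", True): -100, ("dont_take", False): 100,
}

def best_eu(p_rain):
    """Expected utility of the best decision when P(rain) = p_rain."""
    return max(p_rain * U[(d, True)] + (1 - p_rain) * U[(d, False)]
               for d in ("take", "dont_take"))

def forecast_prob(f):
    """P(f) = sum over w of p(f | w) P(w)."""
    return sum(P_F_GIVEN_W[(f, w)] * P_W[w] for w in (True, False))

def posterior_rain(f):
    """P(rain | f) by Bayes' rule."""
    return P_F_GIVEN_W[(f, True)] * P_W[True] / forecast_prob(f)

informed = sum(forecast_prob(f) * best_eu(posterior_rain(f))
               for f in ("sunny", "rainy"))
vpi_forecast = informed - best_eu(P_W[True])

print(round(posterior_rain("sunny"), 6))  # 0.2
print(round(posterior_rain("rainy"), 6))  # 0.7
print(round(vpi_forecast, 6))             # 9.0
```

Under this model, the forecast is worth consulting only if obtaining it costs less than 9 utility units.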