1
Decision Making Under Uncertainty Russell and Norvig: ch 16 CMSC421 – Fall 2006
2
Utility-Based Agent [diagram: an agent interacting with its environment through sensors and actuators]
3
Non-deterministic vs. Probabilistic Uncertainty. Non-deterministic model: an action's result is some state in {a, b, c}; choose the decision that is best for the worst case (~ adversarial search). Probabilistic model: the result is {a(p_a), b(p_b), c(p_c)}; choose the decision that maximizes expected utility.
4
Expected Utility. Random variable X with n values x_1, …, x_n and distribution (p_1, …, p_n). E.g., x_i is Result_i(A) | Do(A), E: the state reached after doing action A, given E, what we know about the current state. Function U of X; e.g., U is the utility of a state. The expected utility of A is EU[A|E] = Σ_{i=1..n} P(x_i | A, E) U(x_i) = Σ_{i=1..n} P(Result_i(A) | Do(A), E) U(Result_i(A)).
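To make the formula concrete, here is a minimal Python sketch (the function name and data layout are my own, not from the slides); it simply sums probability-weighted utilities over the possible result states, using the numbers from the next slide's example.

```python
def expected_utility(outcomes):
    """EU[A|E] = sum_i P(Result_i(A) | Do(A), E) * U(Result_i(A)).

    `outcomes` is a list of (probability, utility) pairs,
    one per possible result state of the action."""
    return sum(p * u for p, u in outcomes)

# The one-action example from the next slide: three result states.
print(expected_utility([(0.2, 100), (0.7, 50), (0.1, 70)]))  # ≈ 62
```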
5
One State/One Action Example. From state S0, action A1 leads to states S1, S2, S3 with probabilities 0.2, 0.7, 0.1 and utilities 100, 50, 70 respectively. U(S0) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1 = 20 + 35 + 7 = 62.
6
One State/Two Actions Example. Action A1 is as before, so U1(S0) = 62. A second action A2 leads to the U = 50 state with probability 0.2 and to a new state S4 (U = 80) with probability 0.8, so U2(S0) = 0.2 x 50 + 0.8 x 80 = 74. U(S0) = max{U1(S0), U2(S0)} = 74.
7
Introducing Action Costs. Suppose A1 costs 5 and A2 costs 25. Then U1(S0) = 62 - 5 = 57 and U2(S0) = 74 - 25 = 49, so U(S0) = max{U1(S0), U2(S0)} = 57 and A1 becomes the best action.
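A short sketch of the MEU choice over these two actions, with and without the action costs, using the numbers from the last three slides (the names and data layout are my own):

```python
def expected_utility(outcomes):
    # outcomes: list of (probability, utility-of-result-state) pairs
    return sum(p * u for p, u in outcomes)

actions = {
    "A1": {"outcomes": [(0.2, 100), (0.7, 50), (0.1, 70)], "cost": 5},
    "A2": {"outcomes": [(0.2, 50), (0.8, 80)], "cost": 25},
}

# Without costs: U1(S0) = 62 and U2(S0) = 74, so A2 is best.
print({a: expected_utility(d["outcomes"]) for a, d in actions.items()})

# With costs: 62 - 5 = 57 vs. 74 - 25 = 49, so A1 becomes the best action.
best = max(actions,
           key=lambda a: expected_utility(actions[a]["outcomes"]) - actions[a]["cost"])
print(best)  # A1
```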
8
MEU Principle. A rational agent should choose the action that maximizes the agent's expected utility. This is the basis of the field of decision theory, and the normative criterion for rational choice of action.
9
Not quite… You must have a complete model of actions, utilities, and states. Even if you have a complete model, computing the best action will generally be computationally intractable. In fact, a truly rational agent takes into account the utility of reasoning as well (bounded rationality). Nevertheless, great progress has been made in this area recently, and we are able to solve much more complex decision-theoretic problems than ever before.
10
We'll look at decision-theoretic reasoning: simple decision making (ch. 16) and sequential decision making (ch. 17).
11
Preferences. An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations with uncertain prizes. Lottery L = [p, A; (1 - p), B]. Notation: A > B means A is preferred to B; A ~ B means indifference between A and B; A ≥ B means B is not preferred to A.
12
Rational Preferences. Idea: the preferences of a rational agent must obey constraints, the axioms of utility theory: 1. Orderability: (A > B) ∨ (B > A) ∨ (A ~ B). 2. Transitivity: (A > B) ∧ (B > C) ⇒ (A > C). 3. Continuity: A > B > C ⇒ ∃p [p, A; 1-p, C] ~ B. 4. Substitutability: A ~ B ⇒ [p, A; 1-p, C] ~ [p, B; 1-p, C]. 5. Monotonicity: A > B ⇒ (p ≥ q ⇔ [p, A; 1-p, B] ≥ [q, A; 1-q, B]).
13
Rational Preferences. Violating the constraints leads to irrational behavior. E.g., an agent with intransitive preferences can be induced to give away all its money: if B > C, then an agent who has C would pay some amount, say $1, to get B; if A > B, then an agent who has B would pay, say, $1 to get A; if C > A, then an agent who has A would pay, say, $1 to get C… uh oh!
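The "money pump" can be written as a tiny simulation. The $1 trade price and the A > B > C > A cycle follow the slide's story; the starting prize and bankroll here are illustrative assumptions of my own:

```python
# Cyclic (intransitive) preferences: A preferred to B, B to C, C to A.
# Maps "what the agent holds" -> "what it would pay $1 to swap to".
prefers = {"C": "B", "B": "A", "A": "C"}

holding, money = "C", 10
while money > 0:
    holding = prefers[holding]  # trade up to the momentarily preferred prize...
    money -= 1                  # ...paying $1 each time
print(holding, money)  # the agent goes broke without ever being better off
```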
14
Rational Preferences. Utility Theorem (Ramsey 1931; von Neumann and Morgenstern 1944): given preferences satisfying the constraints, there exists a real-valued function U such that U(A) ≥ U(B) ⇔ A ≥ B, and U([p_1, S_1; …; p_n, S_n]) = Σ_i p_i U(S_i). MEU principle: choose the action that maximizes expected utility.
15
Utility Assessment. Standard approach to the assessment of human utilities: compare a given state A to a standard lottery L_p that has the best possible prize (e.g., continue as before) with probability p and the worst possible catastrophe (e.g., instant death) with probability (1 - p), and adjust the lottery probability p until A ~ L_p.
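The adjust-until-indifferent procedure is essentially a bisection search. Below is a minimal sketch that assumes a hypothetical prefers_state_to_lottery(p) callback standing in for the person being assessed; that callback is my own device, not something from the slides.

```python
def assess_utility(prefers_state_to_lottery, tol=1e-3):
    """Standard-gamble assessment: find p such that A ~ L_p, where L_p gives
    the best prize with probability p and the worst with 1 - p.
    `prefers_state_to_lottery(p)` returns True if the person prefers the
    sure state A to the lottery L_p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_state_to_lottery(p):
            lo = p   # lottery not attractive enough yet: raise p
        else:
            hi = p   # lottery already preferred: lower p
    return (lo + hi) / 2  # on a 0-1 scale, U(A) is this indifference point

# Simulated respondent whose true utility for A is 0.7:
print(assess_utility(lambda p: p < 0.7))  # ≈ 0.7
```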
16
Aside: Money Utility Function. Given a lottery L with expected monetary value EMV(L), usually U(L) < U(EMV(L)), i.e., people are risk-averse. Would you rather have $1,000,000 for sure, or a lottery [0.5, $0; 0.5, $3,000,000]?
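A quick numeric illustration of risk aversion; the logarithmic utility of money used here is purely an illustrative assumption of mine, not something from the slides.

```python
import math

def u(dollars):
    # An illustrative concave (risk-averse) utility of money.
    return math.log(1 + dollars)

emv_lottery = 0.5 * 0 + 0.5 * 3_000_000        # $1.5M expected monetary value
eu_lottery = 0.5 * u(0) + 0.5 * u(3_000_000)   # ≈ 7.46
eu_sure = u(1_000_000)                         # ≈ 13.82
print(emv_lottery, eu_lottery, eu_sure)        # the sure million wins on expected utility
```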
17
Decision Networks. Extend BNs to handle actions and utilities. Also called influence diagrams. Make use of BN inference. Can do Value of Information calculations.
18
Decision Networks (cont.). Chance nodes: random variables, as in BNs. Decision nodes: actions the decision maker can take. Utility/value nodes: the utility of the outcome state.
19
R&N example
20
Prenatal Testing Example
21
Umbrella Network. Decision node: Take Umbrella (take / don't take). Chance nodes: umbrella, rain. Utility node: happiness. P(rain) = 0.4. P(umb | take) = 1.0, P(~umb | ~take) = 1.0. Utilities: U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25.
22
Evaluating Decision Networks. Set the evidence variables for the current state. For each possible value of the decision node: set the decision node to that value; calculate the posterior probability of the parent nodes of the utility node, using BN inference; calculate the resulting expected utility for the action. Return the action with the highest expected utility.
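A minimal sketch of this evaluation loop (the representation and names are my own, not a particular BN library; the BN inference step is abstracted behind a callback rather than implemented):

```python
def evaluate_decision_network(decision_values, posterior_given, utility):
    """Return (best_action, expected_utility).

    decision_values : the possible values of the decision node
    posterior_given : posterior_given(d) -> dict mapping each assignment of the
                      utility node's parents to its probability, given decision d
                      (this is where BN inference would run)
    utility         : utility(assignment) -> utility of that outcome
    """
    best = None
    for d in decision_values:
        eu = sum(p * utility(x) for x, p in posterior_given(d).items())
        if best is None or eu > best[1]:
            best = (d, eu)
    return best

# Toy usage: one outcome variable whose distribution depends on the decision.
print(evaluate_decision_network(
    ["act", "wait"],
    lambda d: {"good": 0.8, "bad": 0.2} if d == "act" else {"good": 0.5, "bad": 0.5},
    lambda x: {"good": 10, "bad": -10}[x],
))  # ('act', 6.0)
```

The umbrella slides that follow work through exactly this loop by hand.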
23
Umbrella Network. Decision node: Take Umbrella (take / don't take). Chance nodes: umbrella, rain. Utility node: happiness. P(rain) = 0.4. P(umb | take) = 1.0, P(umb | ~take) = 0. Utilities: U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25.
24
Umbrella Network. P(rain) = 0.4; P(umb | take) = 0.8, P(umb | ~take) = 0.1; U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25. Table #1, P(umb, rain | take): P(~umb, ~rain) = 0.2 x 0.6 = 0.12; P(~umb, rain) = 0.2 x 0.4 = 0.08; P(umb, ~rain) = 0.8 x 0.6 = 0.48; P(umb, rain) = 0.8 x 0.4 = 0.32. #1: EU(take) = 100 x 0.12 + (-100) x 0.08 + 0 x 0.48 + (-25) x 0.32 = ???
25
Umbrella Network. Same parameters: P(rain) = 0.4; P(umb | take) = 0.8, P(umb | ~take) = 0.1; U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25. Table #2, P(umb, rain | ~take): P(~umb, ~rain) = 0.9 x 0.6 = 0.54; P(~umb, rain) = 0.9 x 0.4 = 0.36; P(umb, ~rain) = 0.1 x 0.6 = 0.06; P(umb, rain) = 0.1 x 0.4 = 0.04. #2: EU(~take) = 100 x 0.54 + (-100) x 0.36 + 0 x 0.06 + (-25) x 0.04 = ??? So, in this case I would…?
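Filling in the blanks with my own arithmetic from the numbers above: EU(take) = 100 x 0.12 - 100 x 0.08 + 0 x 0.48 - 25 x 0.32 = -4 and EU(~take) = 100 x 0.54 - 100 x 0.36 + 0 x 0.06 - 25 x 0.04 = 17, so with no forecast the MEU choice is to leave the umbrella at home.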
26
Value of Information. Idea: compute the expected value of acquiring possible evidence. Example: buying oil drilling rights. Two blocks, A and B; exactly one of them has oil, worth k. Prior probability 0.5 for each; the current price of each block is k/2. What is the value of getting a survey of A done? The survey will say "oil in A" or "no oil in A", each with probability 0.5. Compute the expected value of information (VOI): the expected value of the best action given the information, minus the expected value of the best action without the information. VOI(Survey) = [0.5 x value of buying A given oil in A] + [0.5 x value of buying B given no oil in A] - 0 = ??
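Working this through under the slide's assumptions (the survey is perfectly accurate and the asking price stays k/2): with the survey you always buy the block that actually has oil, for an expected profit of 0.5 x (k - k/2) + 0.5 x (k - k/2) = k/2; without it, buying either block has expected profit 0.5 x k - k/2 = 0, so VOI(Survey) = k/2.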
27
Value of Information (VOI). Suppose the agent's current knowledge is E. The value of the current best action α is EU[α | E] = max_A Σ_i P(Result_i(A) | Do(A), E) U(Result_i(A)). The value of the new best action α_E', after new evidence E' is obtained, is EU[α_E' | E, E'] = max_A Σ_i P(Result_i(A) | Do(A), E, E') U(Result_i(A)). The value of information for E' is VOI_E(E') = Σ_e' P(E' = e' | E) EU[α_e' | E, E' = e'] - EU[α | E].
28
Umbrella Network with a forecast node. Decision node: Take Umbrella (take / don't take). Chance nodes: rain, forecast, umbrella. Utility node: happiness. P(rain) = 0.4; P(umb | take) = 0.8, P(umb | ~take) = 0.1; U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25. Forecast model: P(F = rainy | R = ~rain) = 0.2, P(F = rainy | R = rain) = 0.7.
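(A consistency check, applying Bayes' rule to these numbers: P(F = rainy) = 0.4 x 0.7 + 0.6 x 0.2 = 0.4, P(rain | F = rainy) = 0.28 / 0.4 = 0.7, and P(rain | F = ~rainy) = 0.12 / 0.6 = 0.2, which matches the forecast-first parameterization used on the later slides.)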
29
VOI. VOI(forecast) = P(rainy) EU(α_rainy | rainy) + P(~rainy) EU(α_~rainy | ~rainy) - EU(α), where α_rainy and α_~rainy are the best actions given each forecast and α is the best action with no forecast.
30
Worksheets: fill in P(umb, rain | take, rainy), P(umb, rain | ~take, rainy), P(umb, rain | take, ~rainy), and P(umb, rain | ~take, ~rainy) over the four (umb, rain) combinations (00, 01, 10, 11) to compute #1: EU(take | rainy), #2: EU(~take | rainy), #3: EU(take | ~rainy), and #4: EU(~take | ~rainy).
31
Umbrella Network (forecast version). P(F = rainy) = 0.4; P(R = rain | F = ~rainy) = 0.2, P(R = rain | F = rainy) = 0.7; P(umb | take) = 0.8, P(umb | ~take) = 0.1; U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25.
32
Worksheets (repeated): fill in P(umb, rain | take, rainy), P(umb, rain | ~take, rainy), P(umb, rain | take, ~rainy), and P(umb, rain | ~take, ~rainy) over the four (umb, rain) combinations (00, 01, 10, 11) to compute #1: EU(take | rainy), #2: EU(~take | rainy), #3: EU(take | ~rainy), and #4: EU(~take | ~rainy).
33
VOI. VOI(forecast) = P(rainy) EU(α_rainy | rainy) + P(~rainy) EU(α_~rainy | ~rainy) - EU(α).
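Putting the exercise together, here is a short Python sketch (the representation is my own; the numbers are those on the preceding slides) that fills in the four tables and then evaluates the VOI formula above.

```python
# Model parameters taken from the slides.
P_F_RAINY = 0.4                                     # P(forecast = rainy)
P_RAIN_GIVEN_F = {"rainy": 0.7, "not_rainy": 0.2}   # P(rain | forecast)
P_UMB_GIVEN_ACT = {"take": 0.8, "dont_take": 0.1}   # P(have umbrella | action)
UTIL = {(0, 0): 100, (0, 1): -100, (1, 0): 0, (1, 1): -25}  # U[(umb, rain)]

def eu(action, p_rain):
    """Expected utility of an action for a given P(rain)."""
    p_umb = P_UMB_GIVEN_ACT[action]
    return sum(
        (p_umb if umb else 1 - p_umb) * (p_rain if rain else 1 - p_rain) * UTIL[(umb, rain)]
        for umb in (0, 1) for rain in (0, 1)
    )

# Tables #1-#4: EU(take | rainy) = -22, EU(~take | rainy) = -37.75,
#               EU(take | ~rainy) = 8,  EU(~take | ~rainy) = 53.5.
best_given_forecast = {f: max(eu(a, p) for a in P_UMB_GIVEN_ACT)
                       for f, p in P_RAIN_GIVEN_F.items()}

# Without the forecast, P(rain) = 0.4: EU(take) = -4, EU(~take) = 17.
best_without = max(eu(a, 0.4) for a in P_UMB_GIVEN_ACT)

voi = (P_F_RAINY * best_given_forecast["rainy"]
       + (1 - P_F_RAINY) * best_given_forecast["not_rainy"]
       - best_without)
print(voi)  # ≈ 6.3
```

Under these numbers the best policy is to take the umbrella when the forecast is rainy and leave it at home otherwise, and the forecast itself is worth about 6.3 utility units to this agent.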
34
Summary: Simple Decision Making. Decision theory = probability theory + utility theory. A rational agent operates by MEU. Decision networks. Value of information.