Summary So Far Extremes in classes of games: –Nonadversarial, perfect information, deterministic –Adversarial, imperfect information, chance In between: adversarial, perfect information, deterministic –Minimax trees: optimal playing strategy if the tree is finite, but requires generating the whole tree –Workarounds: cut-off and evaluation functions, α-β pruning
Chance (Non-game) Applications: Negotiation Auctions Military planning
Example: Backgammon Game overview: –The goal is to move all of one's pieces off the board. –This requires all pieces to be in the home board. –White moves clockwise toward 25, and Black counterclockwise toward 0. –A piece can move to any point except one occupied by two or more opponent pieces. –Dice are rolled at the beginning of a player's turn to determine the legal moves.
Example Configuration [Figure: example board position] Legal moves: 5-10, …, 11-16
Detour: Probability Suppose that I flip a "fair" coin: what is the probability that it comes up heads? 0.5. Expected gain if you bet $X on heads: $X/2. Suppose that I flip a "totally unfair" coin (it always comes up heads): what is the probability that it comes up heads? 1. Expected gain if you bet $X on heads: $X. Maximum Expected Utility principle: MEU([p1,S1; p2,S2; …; pn,Sn]) = Σi pi U(Si)
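The MEU formula above is a one-line computation; a minimal sketch, representing a lottery as a list of (probability, utility) pairs (the coin-bet numbers are the slide's, with $X set to 100 for illustration):

```python
def meu(lottery):
    """Maximum Expected Utility of a lottery given as (probability, utility) pairs."""
    return sum(p * u for p, u in lottery)

# Fair coin, betting $X = 100 on heads (win $100 with p = 0.5, else $0):
fair = meu([(0.5, 100), (0.5, 0)])      # 50.0, i.e. $X/2
# Totally unfair coin (always comes up heads):
unfair = meu([(1.0, 100), (0.0, 0)])    # 100.0, i.e. $X
```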
Example Suppose that you are on a TV show and have already earned 1' so far. Now the presenter offers you a gamble: he will flip a coin. If the coin comes up heads you will earn 3', but if it comes up tails you will lose the 1'. What do you decide? First shot: U(winning $X) = X MEU([0.5, 0; 0.5, 3']) = 1.5' This utility is called the expected monetary value
Example (II) If we use the expected monetary value of the lottery, should you take the bet? Yes!, because: MEU([0.5, 0; 0.5, 3']) = 1.5' > MEU([1, 1'; 0, 3']) = 1' But is this really what you would do? Not me!
Example (III) Second shot: let S = "my current wealth" S' = "my current wealth" + 1' S'' = "my current wealth" + 3' MEU(Accept) = 0.5 U(S) + 0.5 U(S'') MEU(Decline) = U(S') If U(S) = 5, U(S') = 8, U(S'') = 10, would you accept the bet? No!, because MEU(Accept) = 7.5 < MEU(Decline) = 8 [Figure: utility U as a function of money $]
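The accept/decline comparison is a direct application of the MEU formula; a short sketch using the slide's illustrative utilities U(S)=5, U(S')=8, U(S'')=10:

```python
def meu(lottery):
    """Expected utility of a lottery of (probability, utility) pairs."""
    return sum(p * u for p, u in lottery)

# Illustrative utilities from the slide (a concave utility of wealth):
U_S, U_S1, U_S3 = 5, 8, 10   # current wealth, wealth + 1', wealth + 3'

accept = meu([(0.5, U_S), (0.5, U_S3)])   # 0.5*5 + 0.5*10 = 7.5
decline = U_S1                            # keep the sure 1': utility 8

# decline > accept, so the rational choice here is to refuse the gamble.
```

The design point: even though the gamble has a higher expected *monetary* value, a concave utility function makes the sure outcome preferable.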
Human Judgment and Utility Decision theory is a normative theory: it describes how agents should act. Experimental evidence suggests that people violate the axioms of utility. Tversky and Kahneman (1982) and Allais (1953): experiments with people. A choice was given between A and B, and then between C and D: A: 80% chance of $4000 B: 100% chance of $3000 C: 20% chance of $4000 D: 25% chance of $3000
Human Judgment and Utility (II) The majority choose B over A and C over D. If U($0) = 0: MEU([0.8, $4000; 0.2, $0]) = 0.8 U($4000) < MEU([1, $3000]) = U($3000), thus 0.8 U($4000) < U($3000) MEU([0.2, $4000; 0.8, $0]) = 0.2 U($4000) > MEU([0.25, $3000; 0.75, $0]) = 0.25 U($3000), thus 0.2 U($4000) > 0.25 U($3000) But multiplying the second inequality by 4 gives 0.8 U($4000) > U($3000), so there is no utility function consistent with these choices
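The inconsistency can also be checked mechanically; a brute-force sketch that scans a grid of candidate values for U($4000) and U($3000) (with U($0) = 0, as on the slide) and confirms that no pair satisfies both observed preferences at once:

```python
# Search for utility values consistent with both majority choices:
#   B over A requires 0.8 * U($4000) < U($3000)
#   C over D requires 0.2 * U($4000) > 0.25 * U($3000)
# The grid range is an arbitrary illustration; the contradiction is algebraic,
# so no range can contain a solution.
found = False
for u4000 in range(1, 201):
    for u3000 in range(1, 201):
        prefers_B = 0.8 * u4000 < 1.0 * u3000
        prefers_C = 0.2 * u4000 > 0.25 * u3000
        if prefers_B and prefers_C:
            found = True

print(found)  # False: the two preferences are jointly unsatisfiable
```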
Human Judgment and Utility (III) The point is that it is very hard to model an automatic agent that behaves like a human (back to the Turing test). However, utility theory does give a formal way of modeling decisions, and as such it is used to generate consistent decisions
Extending Minimax Trees: Expectiminimax Chance nodes denote possible dice rolls. Each branch from a chance node is labeled with the probability that the branch will be taken; if the distribution is uniform, the probability is 1/n, where n is the number of choices. Each position has an expected utility
Expected Utility: Expectimax If node n is terminal, EU(n) = utility(n). If n is a nonterminal node, apply expectimax or expectimin. [Figure: tree with MAX, dice (chance) C, and terminal layers; rolls P(1,1) … P(6,6)] Expectimax(C) = Σi p(di) max{s ∈ S(C,di)} utility(s), where S(C,di) is the set of positions reachable from chance node C under dice roll di
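The expectimax formula above can be sketched directly; here a chance node is encoded (as an illustrative assumption) as a list of (probability, terminal utilities reachable under that roll) pairs:

```python
def expectimax(chance_node):
    """Sum over rolls d_i of p(d_i) times the max utility reachable under d_i."""
    return sum(p * max(utilities) for p, utilities in chance_node)

# Hypothetical chance node: two equally likely rolls, each leading to terminals.
node = [(0.5, [3, 7]), (0.5, [2, 4])]
value = expectimax(node)   # 0.5*7 + 0.5*4 = 5.5
```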
Expected Utility: MIN, MAX MIN(M) = min{C ∈ children(M)} Expectimax(C) (that is, apply the standard minimax-value formula)
Expected Utility: Expectimin Expectimin(C') = Σi p(di) min{s ∈ S(C',di)} utility(s), where S(C',di) is the set of positions reachable from chance node C' under dice roll di
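The three cases (max, min, chance) combine into one recursion over the tree; a minimal sketch over a hand-built tree, where the node encoding is an assumption for illustration:

```python
# Node encodings (illustrative, not from the slides):
#   ("leaf", utility)
#   ("max", [children]) / ("min", [children])
#   ("chance", [(probability, child), ...])
def expectiminimax(node):
    kind, data = node
    if kind == "leaf":
        return data
    if kind == "max":
        return max(expectiminimax(c) for c in data)
    if kind == "min":
        return min(expectiminimax(c) for c in data)
    # chance node: probability-weighted average of children
    return sum(p * expectiminimax(c) for p, c in data)

tree = ("max", [
    ("chance", [(0.5, ("leaf", 3)), (0.5, ("leaf", 7))]),   # expected value 5.0
    ("chance", [(0.9, ("leaf", 6)), (0.1, ("leaf", 0))]),   # expected value 5.4
])
root_value = expectiminimax(tree)   # MAX picks the second chance node: 5.4
```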
Closing Notes These trees can be very large, therefore: –Cut-off and evaluation functions. Evaluation functions have to be linear functions: EF(state) = w1 f1(state) + w2 f2(state) + … + wn fn(state) Complexity: –Minimax (i.e., without chance): O(b^m) –Expectiminimax: O(b^m n^m), where n is the number of distinct dice rolls