Notes adapted from lecture notes for CMSC 421 by B.J. Dorr

1 Game Playing

2 Outline
- What are games?
- Optimal decisions in games
- Which strategy leads to success? Alpha-beta pruning
- Games of imperfect information
- Games that include an element of chance

3 What are and why study games?
- Games are a form of multi-agent environment
  - What do other agents do, and how do they affect our success?
  - Cooperative vs. competitive multi-agent environments
  - Competitive multi-agent environments give rise to adversarial problems, a.k.a. games
- Why study games?
  - Fun; historically entertaining
  - Interesting subject of study because they are hard
  - Easy to represent, and agents are restricted to a small number of actions

4 Relation of Games to Search
Search – no adversary
- Solution is a (heuristic) method for finding a goal
- Heuristics and CSP techniques can find the optimal solution
- Evaluation function: estimate of the cost from start to goal through a given node
- Examples: path planning, scheduling activities
Games – adversary
- Solution is a strategy (a strategy specifies a move for every possible opponent reply)
- Time limits force an approximate solution
- Evaluation function: evaluates the "goodness" of a game position
- Examples: chess, checkers, Othello, backgammon

5 Types of Games

6 Game setup
Two players: MAX and MIN. MAX moves first and they take turns until the game is over. The winner gets an award, the loser gets a penalty.
Games as search:
- Initial state: e.g., the board configuration of chess
- Successor function: a list of (move, state) pairs specifying legal moves
- Terminal test: is the game finished?
- Utility function: gives a numerical value for terminal states, e.g., win (+1), lose (-1), and draw (0) in tic-tac-toe (next)
MAX uses the search tree to determine its next move.
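The four components above can be sketched directly in Python for tic-tac-toe. The state encoding (a 9-tuple of 'X' for MAX, 'O' for MIN, or None for empty) and the function names are illustrative, not from the slides:

```python
# Sketch of the "games as search" formulation for tic-tac-toe.
# State encoding (illustrative): a 9-tuple of 'X' (MAX), 'O' (MIN), or None.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),       # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),       # columns
         (0, 4, 8), (2, 4, 6)]                  # diagonals

def player_to_move(state):
    # MAX ('X') moves first, so it is X's turn whenever the counts are equal.
    return 'X' if state.count('X') == state.count('O') else 'O'

def successors(state):
    # Successor function: list of (move, state) pairs for every legal move.
    p = player_to_move(state)
    return [(i, state[:i] + (p,) + state[i + 1:])
            for i, cell in enumerate(state) if cell is None]

def winner(state):
    for a, b, c in LINES:
        if state[a] is not None and state[a] == state[b] == state[c]:
            return state[a]
    return None

def terminal_test(state):
    # The game is over on a win or when the board is full.
    return winner(state) is not None or None not in state

def utility(state):
    # Utility of a terminal state for MAX: win (+1), lose (-1), draw (0).
    return {'X': 1, 'O': -1, None: 0}[winner(state)]
```

For example, the empty board `(None,) * 9` is non-terminal, has nine successors, and it is X's turn to move.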

7 Partial Game Tree for Tic-Tac-Toe

8 Optimal strategies
Find the contingent strategy for MAX assuming an infallible MIN opponent.
Assumption: both players play optimally!
Given a game tree, the optimal strategy can be determined by using the minimax value of each node:

MINIMAX-VALUE(n) =
  UTILITY(n)                                   if n is a terminal node
  max_{s ∈ successors(n)} MINIMAX-VALUE(s)     if n is a MAX node
  min_{s ∈ successors(n)} MINIMAX-VALUE(s)     if n is a MIN node

9 Two-Ply Game Tree

10 Two-Ply Game Tree
- MAX nodes and MIN nodes
- Terminal nodes: utility values for MAX
- Other nodes: minimax values
- MAX's best move at the root, a1, leads to the successor with the highest minimax value
- MIN's best reply, b1, leads to the successor with the lowest minimax value

11 What if MIN does not play optimally?
The definition of optimal play for MAX assumes MIN plays optimally: it maximizes the worst-case outcome for MAX. But if MIN does not play optimally, MAX will do at least as well, and possibly better. [This can be proved.]

12 Minimax Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
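The pseudocode can be turned into a short runnable sketch. Here the game tree is represented as nested lists whose leaves are utility values for MAX; this encoding, and the function names, are illustrative assumptions:

```python
# Runnable sketch of MINIMAX-DECISION. The game tree is given as nested
# lists whose leaves are utility values for MAX (an illustrative encoding).

def minimax_value(node, maximizing):
    if not isinstance(node, list):      # terminal test: a bare number is a leaf
        return node                     # utility for MAX
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def minimax_decision(root_children):
    # MAX at the root: each child is a MIN node. Return (move index, value).
    values = [minimax_value(c, maximizing=False) for c in root_children]
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]
```

On the tree `[[3, 12, 8], [2, 4, 6], [14, 5, 2]]` (the leaf values commonly used in the two-ply example), `minimax_decision` returns `(0, 3)`: the first move leads to the successor with the highest minimax value.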

13 Properties of Minimax
Criterion   Minimax
Complete?   Yes (if the tree is finite)
Optimal?    Yes (against an optimal opponent)
Time        O(b^m)
Space       O(bm) (depth-first exploration)

14 Tic-Tac-Toe
- Depth bound: 2
- Breadth-first search until all nodes at level 2 are generated
- Apply the evaluation function to the positions at these nodes

15 Tic-Tac-Toe Evaluation function e(p) of a position p
- If p is not a winning position for either player:
  e(p) = (number of complete rows, columns, or diagonals that are still open for MAX) − (number of complete rows, columns, or diagonals that are still open for MIN)
- If p is a win for MAX: e(p) = +∞
- If p is a win for MIN: e(p) = −∞
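The definition above can be sketched as follows; the board encoding (a 9-tuple of 'X', 'O', or None) and the function name are illustrative:

```python
import math

# Sketch of the evaluation function e(p) for tic-tac-toe.
# Board encoding (illustrative): a 9-tuple of 'X' (MAX), 'O' (MIN), or None.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),       # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),       # columns
         (0, 4, 8), (2, 4, 6)]                  # diagonals

def evaluate(p):
    def wins(player):
        return any(all(p[i] == player for i in line) for line in LINES)
    if wins('X'):
        return math.inf                 # win for MAX
    if wins('O'):
        return -math.inf                # win for MIN
    # A line is still "open" for a player if the opponent has no mark on it.
    open_for_max = sum(all(p[i] != 'O' for i in line) for line in LINES)
    open_for_min = sum(all(p[i] != 'X' for i in line) for line in LINES)
    return open_for_max - open_for_min
```

On an empty board all 8 lines are open for both players, so e = 8 − 8 = 0; with only an X in the center, e = 8 − 4 = 4.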

16 Tic-Tac-Toe - First stage
[Figure: the first-stage game tree. Each depth-2 position is labeled with its evaluation, e.g. 6-5=1, 5-6=-1, 4-6=-2; the backed-up values of the three symmetry-distinct first moves are -2, -1, and 1, so MAX's move is the one with backed-up value 1.]

17 Tic-Tac-Toe - Second stage
[Figure: the second-stage game tree, grown from the position reached after MAX's first move and MIN's reply. Depth-2 positions again carry evaluations such as 4-2=2, 3-3=0, and 4-3=1; the backed-up value of MAX's best second move is 1.]

18 Multiplayer games
- Games may involve more than two players
- Scalar minimax values become vectors of utilities, one per player

19 Problem of minimax search
- The number of game states is exponential in the number of moves
- Solution: do not examine every node ==> alpha-beta pruning
- Alpha = value of the best choice found so far at any choice point along the path for MAX
- Beta = value of the best choice found so far at any choice point along the path for MIN
- Revisit example …

20 Tic-Tac-Toe - First stage
[Figure: the first-stage tree revisited for pruning. Node A and all its successors have been evaluated, giving the start node an alpha value of -1; node B's first successor C has static value -1, giving node B a beta value of -1, so the search below B can be cut off.]

21 Alpha-beta pruning
- Search progresses in a depth-first manner
- Whenever a tip node is generated, its static evaluation is computed
- Whenever a position can be given a backed-up value, this value is computed
- Node A and all its successors have been generated; its backed-up value is -1
- Node B and its successors have not yet been generated
- Now we know that the backed-up value of the start node is bounded from below by -1

22 Alpha-beta pruning
Depending on the backed-up values of the other successors of the start node, the final backed-up value of the start node may be greater than -1, but it cannot be less. This lower bound is the alpha value of the start node.

23 Alpha-beta pruning
- Depth-first search proceeds: node B and its first successor, node C, are generated
- Node C is given a static value of -1
- The backed-up value of node B is therefore bounded from above by -1
- This upper bound on node B is its beta value
- Therefore the search below node B can be discontinued: node B cannot turn out to be preferable to node A

24 Alpha-beta pruning
- The reduction in search effort is achieved by keeping track of bounds on backed-up values
- As successors of a node are given backed-up values, these bounds can be revised
- Alpha values of MAX nodes can never decrease
- Beta values of MIN nodes can never increase

25 Alpha-beta pruning
Therefore search can be discontinued:
- Below any MIN node having a beta value less than or equal to the alpha value of any of its MAX-node ancestors. The final backed-up value of this MIN node can then be set to its beta value.
- Below any MAX node having an alpha value greater than or equal to the beta value of any of its MIN-node ancestors. The final backed-up value of this MAX node can then be set to its alpha value.

26 Alpha-beta pruning
During search, alpha and beta values are computed as follows:
- The alpha value of a MAX node is set equal to the current largest final backed-up value of its successors
- The beta value of a MIN node is set equal to the current smallest final backed-up value of its successors

27 Alpha-Beta Example
Do depth-first search until the first leaf is reached. The range of possible values at each node is initially [-∞, +∞].

28 Alpha-Beta Example (continued)
[-∞,+∞] [-∞,3]

29 Alpha-Beta Example (continued)
[-∞,+∞] [-∞,3]

30 Alpha-Beta Example (continued)
[3,+∞] [3,3]

31 Alpha-Beta Example (continued)
[3,+∞] This node is worse for MAX [3,3] [-∞,2]

32 Alpha-Beta Example (continued)
[3,14] [3,3] [-∞,2] [-∞,14]

33 Alpha-Beta Example (continued)
[3,5] [3,3] [−∞,2] [-∞,5]

34 Alpha-Beta Example (continued)
[3,3] [3,3] [−∞,2] [2,2]

35 Alpha-Beta Example (continued)
[3,3] [3,3] [-∞,2] [2,2]

36 Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, -∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v

37 Alpha-Beta Algorithm

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
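The pseudocode on slides 36 and 37 amounts to the following runnable sketch, again on nested-list trees whose leaves are utilities for MAX; the `visited` list is an addition for counting leaf evaluations, not part of the algorithm:

```python
import math

# Runnable sketch of alpha-beta search on a nested-list game tree
# (leaves are utilities for MAX). `visited` records evaluated leaves
# so the savings over plain minimax can be observed.

def alphabeta(node, alpha, beta, maximizing, visited):
    if not isinstance(node, list):       # terminal: a leaf
        visited.append(node)
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:                # a MIN ancestor already guarantees ≤ beta
                return v
            alpha = max(alpha, v)
        return v
    else:
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True, visited))
            if v <= alpha:               # a MAX ancestor already guarantees ≥ alpha
                return v
            beta = min(beta, v)
        return v
```

On the tree `[[3, 12, 8], [2, 4, 6], [14, 5, 2]]` the value is still 3, but only 7 of the 9 leaves are evaluated: the second MIN node is cut off after its first leaf.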

38 General alpha-beta pruning
Consider a node n somewhere in the tree. If a player has a better choice at the parent node of n, or at any choice point further up, then n will never be reached in actual play. Hence, once enough is known about n, it can be pruned.

39 Final Comments about Alpha-Beta Pruning
- Pruning does not affect the final result
- Entire subtrees can be pruned
- Good move ordering improves the effectiveness of pruning
- With good ordering, alpha-beta can look ahead roughly twice as far as minimax in the same amount of time
- Repeated states are again possible; store them in memory (a transposition table)
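The move-ordering point can be checked on a small example: the same set of leaf values, arranged so that each MIN node's strongest reply comes first, is pruned far more aggressively. The trees and counts below are illustrative:

```python
import math

# Count evaluated leaves under alpha-beta for two orderings of the
# same leaf values (illustrative example, not from the slides).

def alphabeta(node, alpha, beta, maximizing, visited):
    if not isinstance(node, list):
        visited.append(node)
        return node
    v = -math.inf if maximizing else math.inf
    for child in node:
        w = alphabeta(child, alpha, beta, not maximizing, visited)
        if maximizing:
            v = max(v, w)
            if v >= beta:
                return v
            alpha = max(alpha, v)
        else:
            v = min(v, w)
            if v <= alpha:
                return v
            beta = min(beta, v)
    return v

def leaves_visited(tree):
    visited = []
    alphabeta(tree, -math.inf, math.inf, True, visited)
    return len(visited)
```

With MIN's best replies listed first, `[[3, 12, 8], [2, 4, 6], [2, 5, 14]]` needs only 5 leaf evaluations; listing them last, `[[3, 12, 8], [6, 4, 2], [14, 5, 2]]` needs all 9, even though both trees have minimax value 3.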

40 Games that include chance
Backgammon example – possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16) and (5-11,11-16)

41 Games that include chance
Chance nodes represent the dice rolls. The doubles [1,1] and [6,6] each have probability 1/36; every other roll has probability 1/18.

42 Games that include chance
Because the dice rolls are random, we cannot calculate a definite minimax value, only an expected value.

43 Expected minimax value

EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n)                                               if n is a terminal node
  max_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)        if n is a MAX node
  min_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)        if n is a MIN node
  Σ_{s ∈ successors(n)} P(s) · EXPECTED-MINIMAX-VALUE(s)   if n is a chance node

These equations can be backed up recursively all the way to the root of the game tree.
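A minimal sketch of these equations, using tagged tuples for MAX, MIN, and chance nodes; the node representation is an illustrative assumption:

```python
# Sketch of EXPECTED-MINIMAX-VALUE. Nodes are tagged tuples:
# ('max', children), ('min', children), ('chance', [(prob, child), ...]),
# or a bare number for a terminal utility (illustrative encoding).

def expected_minimax(node):
    if not isinstance(node, tuple):                 # terminal node
        return node
    kind, children = node
    if kind == 'max':
        return max(expected_minimax(c) for c in children)
    if kind == 'min':
        return min(expected_minimax(c) for c in children)
    # chance node: probability-weighted average of successor values
    return sum(p * expected_minimax(c) for p, c in children)
```

For example, a MAX root choosing between a sure 2.5 and a chance node worth 0.5 · 2 + 0.5 · 4 = 3 has expected minimax value 3.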

44 Position evaluation with chance nodes
- Left: A1 wins; right: A2 wins
- The outcome of the evaluation function may change when the values are scaled differently
- Behavior is preserved only by a positive linear transformation of EVAL

45 Discussion
- Examine the section on state-of-the-art game programs yourself
- Minimax assumes the right tree is better than the left, yet …
- One alternative: return a probability distribution over possible values
- But that is an expensive calculation

46 Discussion
- Utility of node expansion: only expand those nodes which lead to significantly better moves
- Both suggestions require meta-reasoning

47 Summary
- Games are fun (and dangerous)
- They illustrate several important points about AI:
  - Perfection is unattainable -> approximate
  - It is a good idea to think about what to think about
  - Uncertainty constrains the assignment of values to states
- Games are to AI as Grand Prix racing is to automobile design

