1
Games. Tamara Berg, CS 560 Artificial Intelligence. Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, and Luke Zettlemoyer.
2
Announcements I sent out an email to the class mailing list on Thurs (making the medium and big cheese mazes extra credit). HW2 will be released later this week.
3
Adversarial search (Chapter 5). World Champion chess player Garry Kasparov is defeated by IBM's Deep Blue chess-playing computer in a six-game match in May 1997. (Photo © Telegraph Group Unlimited 1997)
4
Why study games?
– Games are a traditional hallmark of intelligence
– Games are easy to formalize
– Games can be a good model of real-world competitive or cooperative activities: military confrontations, negotiation, auctions, etc.
5
Types of game environments
– Deterministic, perfect information (fully observable): chess, checkers, Go
– Stochastic, perfect information: backgammon, Monopoly
– Deterministic, imperfect information (partially observable): Battleship
– Stochastic, imperfect information: Scrabble, poker, bridge
6
Alternating two-player zero-sum games
– Players take turns
– Each game outcome or terminal state has a utility for each player (e.g., 1 for a win, 0 for a loss)
– The sum of both players' utilities is a constant
7
Games vs. single-agent search
– We don't know how the opponent will act. The solution is not a fixed sequence of actions from start state to goal state, but a strategy or policy (a mapping from state to the best move in that state)
– Efficiency is critical to playing well: the time to make a move is limited, and the branching factor, search depth, and number of terminal configurations are huge. In chess, the branching factor is about 35 and the depth about 100, giving a search tree of roughly 10^154 nodes, far more than the approximately 10^80 atoms in the observable universe. This rules out searching all the way to the end of the game.
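As a quick back-of-the-envelope check on those numbers (just arithmetic, not part of the slide), the tree-size estimate is b^m with b ≈ 35 and m ≈ 100:

    import math

    b, m = 35, 100                        # chess: branching factor and search depth (plies)
    digits = m * math.log10(b)            # log10(35**100) is roughly 154.4
    print(f"35^100 is about 10^{digits:.0f}")   # -> 35^100 is about 10^154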
8
Search problem components
– Initial state
– Actions
– Transition model: what state results from performing a given action in a given state?
– Goal state
– Path cost: assume that it is a sum of nonnegative step costs
The optimal solution is the sequence of actions that gives the lowest path cost for reaching the goal.
9
Deterministic games: many possible formulations, e.g.:
– States: S (start at the initial state)
– Players: P = {1…N} (usually take turns)
– Actions: A (may depend on player/state)
– Transition model: S × A → S
– Terminal test: S → {t, f}
– Terminal utilities: S × P → R
– A solution for a player is a policy: S → A
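One way to make this formulation concrete is as an abstract interface; the following is a minimal sketch in Python, with all class and method names being illustrative choices rather than anything from the course code:

    from abc import ABC, abstractmethod
    from typing import Iterable

    class Game(ABC):
        """Deterministic game, mirroring the formulation above."""

        @abstractmethod
        def initial_state(self): ...                      # initial state in S

        @abstractmethod
        def player(self, state) -> int: ...               # whose turn it is (a member of P)

        @abstractmethod
        def actions(self, state) -> Iterable: ...         # legal actions A (may depend on player/state)

        @abstractmethod
        def result(self, state, action): ...              # transition model S x A -> S

        @abstractmethod
        def is_terminal(self, state) -> bool: ...         # terminal test S -> {t, f}

        @abstractmethod
        def utility(self, state, player: int) -> float: ...  # terminal utilities S x P -> R

A policy for a player would then be any function mapping states to actions, as in the last bullet above.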
10
Deterministic single-player games
– Deterministic, single player, perfect information (e.g., the 8-puzzle or Rubik's cube): you know the rules, you know what actions do, and you know when you win, so it's just search!
– Slight reinterpretation: each node stores a value, the best outcome it can reach, which is the maximal value of its children (a sketch of this computation follows below)
– Note we don't have path sums as before; utilities are at the end
– After the search, pick the moves that lead to the best outcome
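As a rough illustration of that reinterpretation, here is a sketch of the single-player value computation, assuming a generic game interface like the hypothetical one above (not the course's actual code):

    def best_outcome(game, state):
        """Value of a state in a deterministic single-player game:
        the best terminal utility reachable from it (the max over its children)."""
        if game.is_terminal(state):
            return game.utility(state, player=0)
        return max(best_outcome(game, game.result(state, a))
                   for a in game.actions(state))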
13
Game tree A game of tic-tac-toe between two players, “MAX” (X) and “MIN” (O)
15
A more abstract game tree: a two-ply game, with the terminal utilities (for MAX) shown at the leaves. (Figure: game tree with values 3, 2, 2 at the MIN nodes and 3 at the root.)
16
A more abstract game tree
– Minimax value of a node: the utility (for MAX) of being in the corresponding state, assuming perfect play on both sides
– Minimax strategy: choose the move that gives the best worst-case payoff
(Figure: the same tree, with minimax values 3, 2, 2 at the MIN nodes and 3 at the root.)
17
Computing the minimax value of a node:
Minimax(node) =
– Utility(node), if node is terminal
– max over actions of Minimax(Succ(node, action)), if player = MAX
– min over actions of Minimax(Succ(node, action)), if player = MIN
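This recurrence translates directly into code; a minimal sketch, assuming the hypothetical Game interface above (game.result plays the role of Succ, and max_player is an illustrative parameter naming which player is MAX):

    def minimax(game, state, max_player=0):
        """Minimax value of a state for MAX, following the recurrence above."""
        if game.is_terminal(state):
            return game.utility(state, player=max_player)
        values = [minimax(game, game.result(state, a), max_player)
                  for a in game.actions(state)]
        if game.player(state) == max_player:
            return max(values)   # MAX picks the child with the highest minimax value
        return min(values)       # MIN picks the child with the lowest minimax value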
18
Optimality of minimax
– The minimax strategy is optimal against an optimal opponent
– What if your opponent is suboptimal? Your utility can only be higher than if you were playing an optimal opponent! A different strategy may work better for a suboptimal opponent, but it will necessarily be worse against an optimal opponent
Example from D. Klein and P. Abbeel
19
More general games
– More than two players, non-zero-sum
– Utilities are now tuples
– Each player maximizes their own utility at their node
– Utilities get propagated (backed up) from children to parents
(Figure: three-player game tree with utility tuples such as (4,3,2), (7,4,1), (1,5,2), and (7,7,1).)
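The same backup idea extends to utility tuples; a minimal sketch (this generalization is often called the max^n algorithm), again assuming the hypothetical Game interface above, with num_players as an assumed attribute:

    def maxn(game, state):
        """Backed-up utility tuple of a state: at their own nodes,
        each player maximizes their own component of the tuple."""
        if game.is_terminal(state):
            # num_players is an assumed attribute of the hypothetical Game interface
            return tuple(game.utility(state, p) for p in range(game.num_players))
        p = game.player(state)
        children = (maxn(game, game.result(state, a)) for a in game.actions(state))
        return max(children, key=lambda utils: utils[p])  # player p maximizes utils[p]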
21
Alpha-beta pruning It is possible to compute the exact minimax decision without expanding every node in the game tree
22
Alpha-beta pruning (continued): the first (leftmost) MIN node is fully evaluated and has value 3, so MAX is guaranteed at least 3 at the root.
23
Alpha-beta pruning (continued): the second MIN node sees a child of value 2, so its value is at most 2; MAX already has a move worth 3, so this node's remaining children need not be examined.
24
Alpha-beta pruning (continued): the third MIN node's first child has value 14, so its value is at most 14.
25
Alpha-beta pruning (continued): its next child has value 5, tightening the bound to at most 5.
26
Alpha-beta pruning (continued): its last child has value 2, so the third MIN node's value is 2 and the root's minimax value is 3, achieved by the first move.
27
Alpha-beta pruning, tracking each node's range of possible values: initially the range for the root and for its first MIN child is (−∞, +∞).
28
Alpha-beta pruning (continued): after its first child (value 3), the first MIN node's value is at most 3, so its range is (−∞, 3]; the root's range is still (−∞, +∞).
29
Alpha-beta pruning (continued): the next child does not lower the bound; the node's range stays (−∞, 3] and the root's range stays (−∞, +∞).
30
Alpha-beta pruning (continued): the first MIN node is now fully evaluated and its value is exactly 3 (range [3, 3]), so the root's range becomes [3, +∞).
31
Alpha-beta pruning (continued): the second MIN node's first child gives it range (−∞, 2]. This node is worse for MAX than the move already found (root range [3, +∞)), so its remaining children are pruned.
32
Alpha-beta pruning (continued): the third MIN node's first child gives it range (−∞, 14], so the root's range narrows to [3, 14].
33
Alpha-beta pruning (continued): its next child tightens the node's range to (−∞, 5] and the root's range to [3, 5].
34
Alpha-beta pruning (continued): its last child fixes the third MIN node's value at 2 (range [2, 2]), so the root's value is exactly 3 (range [3, 3]).
35
Alpha-beta pruning (continued): final state of the search. The root's minimax value is 3, and it was computed without expanding every node in the game tree.
36
General alpha-beta pruning: consider a node n in the tree. If the player has a better choice at the parent node of n, or at any choice point further up, then n will never be reached in play. Hence, as soon as that much is known about n, it can be pruned.
37
Alpha-beta algorithm
– Depth-first search: only considers nodes along a single path from the root at any time
– α = highest-value choice found at any choice point of the path for MAX (initially, α = −∞)
– β = lowest-value choice found at any choice point of the path for MIN (initially, β = +∞)
– Pass the current values of α and β down to child nodes during the search
– Update the values of α and β during the search: MAX updates α at MAX nodes, MIN updates β at MIN nodes
– Prune the remaining branches at a node when α ≥ β
38
When to prune: prune whenever α ≥ β.
– Prune below a MAX node whose alpha value becomes greater than or equal to the beta value of its ancestors. MAX nodes update alpha based on their children's returned values.
– Prune below a MIN node whose beta value becomes less than or equal to the alpha value of its ancestors. MIN nodes update beta based on their children's returned values.
39
Alpha-Beta Example Revisited: α, β initial values. Do depth-first search until the first leaf. The root starts with α=−∞, β=+∞, and these α, β values are passed to its kids, so the first MIN node also starts with α=−∞, β=+∞.
40
Alpha-Beta Example (continued): MIN updates β based on its kids. The first MIN node now has α=−∞, β=3; the root still has α=−∞, β=+∞.
41
Alpha-Beta Example (continued): MIN updates β based on its kids; no change (α=−∞, β=3). The root still has α=−∞, β=+∞.
42
Alpha-Beta Example (continued): MIN updates β based on its last kid; again no change (α=−∞, β=3).
43
Alpha-Beta Example (continued): 3 is returned as the node value, and MAX updates α based on its kids: the root now has α=3, β=+∞.
44
Alpha-Beta Example (continued): the root's α=3, β=+∞ are passed to its kids, so the second MIN node starts with α=3, β=+∞.
45
Alpha-Beta Example (continued): MIN updates β based on its kids: the second MIN node now has α=3, β=2.
46
Alpha-Beta Example (continued): α ≥ β (3 ≥ 2), so prune the remaining children of this MIN node. The root still has α=3, β=+∞.
47
Alpha-Beta Example (continued): 2 is returned as the node value. MAX updates α based on its kids; no change (the root keeps α=3, β=+∞).
48
Alpha-Beta Example (continued): the root's α=3, β=+∞ are passed to its kids, so the third MIN node starts with α=3, β=+∞.
49
Alpha-Beta Example (continued): MIN updates β based on its kids: the third MIN node now has α=3, β=14.
50
Alpha-Beta Example (continued): MIN updates β based on its kids: α=3, β=5.
51
Alpha-Beta Example (continued): MIN updates β based on its kids: α=3, β=2. The root still has α=3, β=+∞.
52
Alpha-Beta Example (continued): 2 is returned as the node value.
53
Alpha-Beta Example (continued) 2
54
Alpha-beta pruning: α is the value of the best (highest-value) choice for the MAX player found so far at any choice point above node n. We want to compute the MIN-value at n. As we loop over n's children, the MIN-value decreases; if it drops below α, MAX will never choose n, so we can ignore n's remaining children.
55
Alpha-beta pruning: β is the value of the best (lowest-value) choice for the MIN player found so far at any choice point above node n. We want to compute the MAX-value at n. As we loop over n's children, the MAX-value increases; if it rises above β, MIN will never choose n, so we can ignore n's remaining children.
56
Alpha-beta pruning
α: best alternative available to the Max player; β: best alternative available to the Min player

Function action = Alpha-Beta-Search(node)
    v = Min-Value(node, −∞, +∞)
    return the action from node with value v

Function v = Min-Value(node, α, β)
    if Terminal(node) return Utility(node)
    v = +∞
    for each action from node
        v = Min(v, Max-Value(Succ(node, action), α, β))
        if v ≤ α return v
        β = Min(β, v)
    end for
    return v
57
Alpha-beta pruning
α: best alternative available to the Max player; β: best alternative available to the Min player

Function action = Alpha-Beta-Search(node)
    v = Max-Value(node, −∞, +∞)
    return the action from node with value v

Function v = Max-Value(node, α, β)
    if Terminal(node) return Utility(node)
    v = −∞
    for each action from node
        v = Max(v, Min-Value(Succ(node, action), α, β))
        if v ≥ β return v
        α = Max(α, v)
    end for
    return v
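Putting the two functions together in runnable form, here is a sketch of the same algorithm in Python, assuming the hypothetical Game interface from earlier (an illustration, not the course's reference implementation):

    import math

    def alpha_beta_search(game, state, max_player=0):
        """Return (value, action) for MAX at the root, pruning with alpha-beta."""

        def max_value(state, alpha, beta):
            if game.is_terminal(state):
                return game.utility(state, player=max_player), None
            v, best = -math.inf, None
            for a in game.actions(state):
                v2, _ = min_value(game.result(state, a), alpha, beta)
                if v2 > v:
                    v, best = v2, a
                if v >= beta:               # MIN above will never allow this node; prune
                    return v, best
                alpha = max(alpha, v)       # MAX updates alpha at MAX nodes
            return v, best

        def min_value(state, alpha, beta):
            if game.is_terminal(state):
                return game.utility(state, player=max_player), None
            v, best = math.inf, None
            for a in game.actions(state):
                v2, _ = max_value(game.result(state, a), alpha, beta)
                if v2 < v:
                    v, best = v2, a
                if v <= alpha:              # MAX above already has something better; prune
                    return v, best
                beta = min(beta, v)         # MIN updates beta at MIN nodes
            return v, best

        return max_value(state, -math.inf, math.inf)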
58
Alpha-beta pruning
– Pruning does not affect the final result
– The amount of pruning depends on move ordering
59
Example: a game tree with leaf values 3, 4, 1, 2, 7, 8, 5, 6 (left to right). Which nodes can be pruned?
60
Answer to Example: NONE! The most favorable nodes for both players are explored last (i.e., in the diagram, they are on the right-hand side). (Levels from the root: Max, Min, Max; leaf values 3, 4, 1, 2, 7, 8, 5, 6.)
61
Second Example (the exact mirror image of the first example): leaf values 6, 5, 8, 7, 2, 1, 3, 4. Which nodes can be pruned?
62
Answer to Second Example (the exact mirror image of the first example): LOTS! The most favorable nodes for both players are explored first (i.e., in the diagram, they are on the left-hand side). A small demonstration appears below.
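To make the contrast concrete, here is a small self-contained check, under the assumption that both examples are binary trees with a MAX root, MIN at the second level, and MAX at the third (the leaf values are the ones listed on the slides):

    import math

    def ab(node, alpha, beta, maximizing, counter):
        """Alpha-beta over a nested-list tree; counts how many leaves get evaluated."""
        if not isinstance(node, list):      # a leaf: just return its utility
            counter[0] += 1
            return node
        v = -math.inf if maximizing else math.inf
        for child in node:
            cv = ab(child, alpha, beta, not maximizing, counter)
            if maximizing:
                v = max(v, cv)
                alpha = max(alpha, v)
            else:
                v = min(v, cv)
                beta = min(beta, v)
            if alpha >= beta:               # prune the remaining children
                break
        return v

    t1 = [[[3, 4], [1, 2]], [[7, 8], [5, 6]]]   # first example: nothing gets pruned
    t2 = [[[6, 5], [8, 7]], [[2, 1], [3, 4]]]   # mirror image: lots gets pruned
    for t in (t1, t2):
        counter = [0]
        value = ab(t, -math.inf, math.inf, True, counter)
        print(f"root value {value}, leaves evaluated {counter[0]}")  # 8 of 8, then 5 of 8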
63
Alpha-beta pruning
– Pruning does not affect the final result
– The amount of pruning depends on move ordering: we should start with the "best" moves (highest-value for MAX or lowest-value for MIN). For chess, we can try captures first, then threats, then forward moves, then backward moves. We can also try to remember "killer moves" from other branches of the tree
– With perfect ordering, the time to find the best move is reduced to O(b^(m/2)) from O(b^m), so the depth of search you can afford is effectively doubled
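One simple way to act on the move-ordering advice in code is to sort the children by a cheap heuristic before recursing. A sketch, where score_move is a hypothetical cheap evaluator (e.g., one that favors captures in chess):

    def ordered_actions(game, state, score_move, maximizing):
        """Try the most promising moves first so alpha-beta can prune more."""
        return sorted(game.actions(state),
                      key=lambda a: score_move(state, a),
                      reverse=maximizing)   # highest-scoring first for MAX, lowest first for MIN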