Game Playing COSC 4550/5550 Prof. D. Spears
Game Theory
In AI, games are considered a form of adversarial search. Mathematical game theory, however, has its origins in the field of economics, where any multiagent environment is viewed as a game if the agents have significant impacts on each other.

Von Neumann, J. and Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press.
Nash, J. (1950). Equilibrium points in N-person games. Proceedings of the National Academy of Sciences 36: 48-49.

Game theory topics include:
- Static/dynamic games of complete/incomplete information
- Nash equilibria
- Dominance
- Multi-stage games
- Repeated games
- Bayesian games and Bayesian equilibria
- Mechanism design
Classical AI Adversarial Game Playing (Deterministic two-agent search)
The form of games studied most in AI is deterministic, turn-taking, two-player, zero-sum games of perfect information. This means deterministic, fully observable environments in which there are two agents whose actions must alternate and in which the utility values at the end of the game are always equal and opposite (winner, loser).

Game playing is a classical search problem:
- A state is a board configuration.
- Initial state (the initial board, and which agent's turn it is).
- Set of actions/operators (legal moves), along with the successor function δ(s, a), which gives the next state.
- Terminal (goal) test (when the game is over), and the terminal states.
- Cost/utility/payoff/evaluation function (evaluates the game state).
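These components can be sketched concretely for tic-tac-toe. The encoding below is a minimal illustrative one (all names are ours, not from the slides):

```python
# Minimal tic-tac-toe instance of the game-as-search formulation above.
# A state is (board, player-to-move); the board is a 9-tuple (row-major)
# of 'x', 'o', or None. 'x' plays the role of MAX and moves first.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def initial_state():
    return (None,) * 9, 'x'

def actions(state):
    """Legal moves: indices of the empty squares."""
    board, _ = state
    return [i for i in range(9) if board[i] is None]

def result(state, move):
    """Successor function delta(s, a): apply the move, switch players."""
    board, player = state
    new = list(board)
    new[move] = player
    return tuple(new), ('o' if player == 'x' else 'x')

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def terminal_test(state):
    """The game is over on a win or a full board."""
    board, _ = state
    return winner(board) is not None or all(c is not None for c in board)

def utility(state):
    """Zero-sum payoff from MAX's point of view: +1 win, -1 loss, 0 tie."""
    return {'x': 1, 'o': -1, None: 0}[winner(state[0])]
```

Any two-player, turn-taking, perfect-information game fits this same five-part interface; only the board encoding and the line/win test change.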
Game Playing (cont’d)
Games cannot be solved using single-agent search algorithms, because the presence of an opponent introduces uncertainty even for games of perfect information (chess, checkers, tic-tac-toe): you never really know what your opponent will do, and you have to be prepared for any action by the opponent. Standard assumption: assume a perfect opponent in order to generate the search tree. In other words, assume the opponent will make the most damaging move (from your point of view). Definition: a "ply" is one move by a single player.
Game trees for 2-player, zero-sum games
MAX and MIN are the two players. MAX moves first and we typically view the game from MAX’s perspective. Zero-sum: A win for MAX is a loss for MIN and vice-versa. [E.g., Score: +1 for win, -1 for loss, 0 for tie. The sum of payoffs for both players is 0 at the end of the game.] The initial state and legal moves for each side define a game tree.
Generating game trees
[Diagram: starting from a partially filled board, one branch is generated for each possible move for x, and from each resulting board, one branch for each possible move for o.]
Minimax Algorithm
[Game tree: MAX at the root, three MIN nodes below it, and leaf values 4 10 5 / 3 9 17 / 7 2 1.]
Maximizes the utility of MAX under the assumption that MIN will play perfectly to minimize it.
Minimax Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for each s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
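The pseudocode above translates directly into runnable Python. As an illustration, we represent a game tree as nested lists (an internal node is a list of children; a leaf is its utility value), which stands in for TERMINAL-TEST, UTILITY, and SUCCESSORS:

```python
import math

def max_value(node):
    if not isinstance(node, list):   # TERMINAL-TEST: a leaf...
        return node                  # ...so return its UTILITY
    v = -math.inf
    for child in node:               # SUCCESSORS
        v = max(v, min_value(child))
    return v

def min_value(node):
    if not isinstance(node, list):
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child))
    return v

def minimax_decision(tree):
    """Index of the root move whose MIN-value equals the minimax value."""
    values = [min_value(child) for child in tree]
    return values.index(max(values))
```

On the tree with leaf groups 4 10 5 / 3 9 17 / 7 2 1 from the previous slide, this computes root value 4 and selects the first move.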
Minimax Algorithm
[Same game tree with computed values: MAX root = 4; MIN nodes = 4, 3, 1; leaves 4 10 5 / 3 9 17 / 7 2 1.]
The marked path is called the principal variation. All nodes on it have the same value as the root.
Performance of Minimax
We assume a depth-first search of the tree:
- Time complexity is O(b^m), where b is the branching factor (width) and m is the maximum depth of the tree.
- Space complexity is O(bm) if all successors are generated at once, or O(m) if successors are generated one at a time.
For real games, the time cost of exhaustive search is impractical. For example, the search tree for chess has roughly 35^100 (on the order of 10^154) nodes. So, how can we improve search?
- We can evaluate nodes that aren't terminal (leaf) nodes using an evaluation (heuristic) function.
- We can make use of symmetries in the search space.
- We can try to prune parts of the search space that we don't need to examine.
Depth-Limited Search
We may not have the computational resources to expand the game tree all the way to the end of the game, where loss/win/draw is clear. One solution is a depth-limited search with an evaluation function.
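A sketch of this idea in Python (the game interface is supplied by the caller; all parameter names are illustrative):

```python
def depth_limited_minimax(state, depth, maximizing,
                          successors, terminal, utility, evaluate):
    """Minimax cut off at `depth`: terminal states get their true utility,
    while non-terminal states at the depth limit get the static evaluation."""
    if terminal(state):
        return utility(state)
    if depth == 0:
        return evaluate(state)       # static evaluation at the cutoff
    values = [depth_limited_minimax(s, depth - 1, not maximizing,
                                    successors, terminal, utility, evaluate)
              for s in successors(state)]
    return max(values) if maximizing else min(values)
```

With a depth limit deep enough to reach all terminals, this reduces to plain minimax; with depth 0 it simply returns the static evaluation of the current state.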
Evaluation Functions
With a static board evaluation function, we don't need to expand the entire search tree. A static board evaluation function estimates the goodness of a board configuration with respect to a player (typically, MAX). The quality of the moves selected by depth-limited minimax is a function of the quality of the static board evaluator. A good evaluation function:
- Accurately reflects the probability of winning for a given node in the search tree.
- Is efficient to compute.
Symmetries
Taking advantage of symmetries is also useful: by generating only one representative of each set of symmetric positions and ignoring the symmetric equivalents, we automatically reduce the size of the search space.
An Example using Tic-Tac-Toe
An example evaluation function for a board position p:
Eval(p) = (the number of complete rows, columns, or diagonals that are still open for MAX) − (the number of complete rows, columns, or diagonals that are still open for MIN).
Eval(p) = +∞ if MAX wins.
Eval(p) = −∞ if MIN wins.
[Example board: x in the center, o on a side square. Eval(p) = 6 − 4 = 2.]
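This Eval can be written down directly. Below, a board is a 9-character string ('x', 'o', or '.' for empty, row-major); the helper names are ours:

```python
# The eight complete lines of the 3x3 board, as index triples.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def open_lines_for(board, opponent):
    """Lines still open for a player: those containing no opponent piece."""
    return sum(all(board[i] != opponent for i in line) for line in LINES)

def evaluate(board):
    """Eval(p) = open lines for MAX ('x') minus open lines for MIN ('o')."""
    return open_lines_for(board, 'o') - open_lines_for(board, 'x')
```

For the example position (x in the center, o on a side square), the o blocks two lines and the x blocks four, giving Eval = 6 − 4 = 2.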
Symmetries in Tic-Tac-Toe
There are various symmetries that one can exploit in Tic-Tac-Toe. Consider the following 4 board positions:
[Four board positions, each with one x and one o, that are rotations/reflections of one another.]
Once you've searched the game tree for the left-most board position, there is no need to examine the other board positions.
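One way to exploit this in a program is to map each board to a canonical representative of its symmetry class, so that only one member of each class is ever searched. A sketch (boards as 9-character strings, row-major; the function names are ours):

```python
def rotate(b):
    """Rotate the 3x3 board 90 degrees clockwise: new (r, c) = old (2-c, r)."""
    return ''.join(b[6 - 3 * c + r] for r in range(3) for c in range(3))

def reflect(b):
    """Mirror the board left-to-right."""
    return ''.join(b[3 * r + 2 - c] for r in range(3) for c in range(3))

def canonical(b):
    """Lexicographically smallest of the 8 symmetric variants of the board."""
    variants = []
    for v in (b, reflect(b)):       # identity and mirror...
        for _ in range(4):          # ...each under 4 rotations
            variants.append(v)
            v = rotate(v)
    return min(variants)
```

During search, positions with equal canonical forms need be expanded only once, e.g. via a transposition table keyed on canonical(b).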
An Example of a Search Tree That Captures Symmetries
[Depth-2 tic-tac-toe search tree in which symmetric positions are generated only once. The MAX root has value 1; the MIN nodes have values 1, −1, and −2; the leaves are scored by the evaluation function, e.g. 5−4=1, 6−4=2, 6−5=1, 5−5=0, 4−5=−1, 5−6=−1, 6−6=0, 4−6=−2.]
Note the use of an evaluation function, symmetries, and the minimax algorithm.
Pruning the Search Space
Using symmetries is not the only way to prune the search space. Alpha-beta pruning: The value of a node is relevant only if there is some possibility we will get to it during play. If we can prove that rational players will never let that node be reached, regardless of its value, then we don’t have to examine or even generate it.
Alpha-Beta Pruning: Motivation
[Game tree with leaf values 4 10 5 / 3 9 17 / 7 2 1; the left MIN node has been evaluated to 4, and the MAX root currently shows 4.]
After evaluating the left subtree, we know MAX has a move with value 4. After expanding the first leaf node in the middle subtree, what can we conclude?
Alpha-Beta Pruning (cont’d)
[Same tree: the middle MIN node is now marked ≤ 3.]
The value for MAX from the middle subtree is less than or equal to 3. Since MAX already has a move with a higher value, the decision at the root will be unchanged regardless of the values of the other leaves in the middle subtree.
Alpha-Beta Pruning (cont’d)
[Same tree: the right MIN node is marked ≤ 2 after its first two leaves are examined.]
The first leaf of the right subtree promises a value of 7 or lower, so we continue on to the second leaf, which brings the value of the subtree down to 2. At this point, MAX can stop looking at this subtree, because its value will be less than or equal to 2, and MAX already has a move with value 4.
Alpha and Beta
Alpha, or α, is the value of the best (i.e., highest-value) choice we have found so far at any choice point along the path for MAX. Beta, or β, is the value of the best (i.e., lowest-value) choice we have found so far at any choice point along the path for MIN. Alpha and beta values are updated during search. We compare these two values to do pruning.
Pruning principle
Every node carries two values, [alpha, beta].
[Two diagrams: on the left, a MAX node with value alpha whose MIN child n′ has beta′ ≤ alpha; on the right, a MIN node with value beta whose MAX child n′ has alpha′ ≥ beta.]
Think about this. On the left, n′ will have a value ≤ beta′ (due to the MIN function), so MAX can safely ignore the pruned part of the tree. On the right, n′ will have a value ≥ alpha′ (due to the MAX function), so MIN can safely ignore the pruned part of the tree. Search can be discontinued below a MIN node whose beta′ ≤ the alpha of its MAX parent. It can also be discontinued below a MAX node whose alpha′ ≥ the beta of its MIN parent.
Alpha-Beta Search Algorithm
This is just like MINIMAX, but with alpha and beta added.

function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, −∞, +∞)
  return the action in SUCCESSORS(state) with value v
Alpha-Beta Search Algorithm (cont’d)

function MAX-VALUE(state, α, β) returns a utility value
  inputs: state, current state in game
          α, the best alternative for MAX along the path to state
          β, the best alternative for MIN along the path to state
  if TERMINAL-TEST(state) then return UTILITY(state)   ; for depth-limited search, this is the static evaluation
  v ← −∞
  for each s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v   ; pruning: v is passed up to replace β in the MIN node above
    α ← MAX(α, v)
  return v
Alpha-Beta Search Algorithm (cont’d)

function MIN-VALUE(state, α, β) returns a utility value
  inputs: state, current state in game
          α, the best alternative for MAX along the path to state
          β, the best alternative for MIN along the path to state
  if TERMINAL-TEST(state) then return UTILITY(state)   ; for depth-limited search, this is the static evaluation
  v ← +∞
  for each s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v   ; pruning: v is passed up to replace α in the MAX node above
    β ← MIN(β, v)
  return v

NOTE: ALPHA-BETA search is the same algorithm as MINIMAX, except for the lines in MAX-VALUE and MIN-VALUE that test and maintain alpha and beta, and the bookkeeping to pass these parameters along.
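In runnable form, using the nested-list tree representation from earlier (a leaf is a utility value, an internal node a list of children); this is the fail-soft variant, which returns v directly at a cutoff:

```python
import math

def max_value(node, alpha, beta):
    if not isinstance(node, list):
        return node                      # terminal: utility value
    v = -math.inf
    for child in node:
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:                    # beta cutoff: MIN above won't allow this
            return v
        alpha = max(alpha, v)
    return v

def min_value(node, alpha, beta):
    if not isinstance(node, list):
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child, alpha, beta))
        if v <= alpha:                   # alpha cutoff: MAX above won't allow this
            return v
        beta = min(beta, v)
    return v

def alpha_beta_search(tree):
    """Same value as plain minimax, usually with fewer nodes examined."""
    return max_value(tree, -math.inf, math.inf)
```

The cutoff tests are the only additions relative to the minimax code; everything else is identical.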
Alpha-Beta in Lisp

(defun max-value (tree alpha beta)
  (format t "max-value ~S ~S ~S~%" tree alpha beta)
  (cond ((atom tree) (setq alpha tree))
        (t (dolist (subtree tree)
             (setq alpha (apply #'max `(,alpha ,(min-value subtree alpha beta))))
             (if (>= alpha beta) (return beta)))))
  alpha)

(defun min-value (tree alpha beta)
  (format t "min-value ~S ~S ~S~%" tree alpha beta)
  (cond ((atom tree) (setq beta tree))
        (t (dolist (subtree tree)
             (setq beta (apply #'min `(,beta ,(max-value subtree alpha beta))))
             (if (>= alpha beta) (return alpha)))))
  beta)

;; Top-level call; the initial bounds play the role of -infinity and +infinity.
(defun alpha-beta (tree)
  (max-value tree most-negative-fixnum most-positive-fixnum))
Example in Detail
The numbers in brackets are [alpha, beta]; every node starts at [−Inf, Inf].
[Diagram: the root, its first child, and the path down to the lower-left MIN node are all initialized to [−Inf, Inf]. The first leaf value is 1 < β = Inf, so the lower-left node is updated to [−Inf, 1], and β is returned to replace α above.]
Example in Detail
The previously updated beta is passed up to change alpha, and we continue back down.
[Diagram: intervals [−Inf, Inf], [−Inf, Inf], [1, Inf], [−Inf, Inf], [1, Inf]; the finished lower-left branch shows [−Inf, 1], then [1, 5], then [1, 2]; β is returned to replace α above.]
Example in Detail
The previously updated beta is passed up to change alpha. This alpha is then passed up to change the beta higher up, because it is less than that beta's previous value of Inf. We go back down.
[Diagram: intervals [−Inf, Inf], [−Inf, 2], [2, Inf], [−Inf, 2], [1, Inf]; finished branch [−Inf, 1], [1, 5], [1, 2]; β is returned to replace α above.]
Example in Detail
The previously updated beta is passed up to change alpha. This causes pruning. Then beta (not alpha) is passed up to the higher beta, and that beta is passed up to the root alpha. Then we go back down.
This pruning makes sense: there is no point in continuing, because MIN would never send us down this way, given that it can't get under 2.
[Diagram: intervals [2, Inf], [−Inf, 2], [2, Inf], [2, 2], [1, Inf], [2, Inf]; the pruned branch is crossed out; β is returned to replace α above.]
Example in Detail
Alpha (not beta) is passed up to change alpha. This would have involved pruning if it had occurred before the last leaf. We go back down.
[Diagram: intervals [2, Inf], [−Inf, 2], [2, Inf], [2, Inf], [2, 2], [2, Inf]; the right side now shows [2, 9] and [2, 3], with leaves [2, 3] and [2, 2].]
Example in Detail
The updated beta is passed up to change alpha, and the updated alpha is passed up to change beta. Then back down. This causes pruning.
This pruning makes sense: there is no point in continuing, because there is already a path that will give 2 (on the left side of the tree). If the pruned branch is worth more than 2, MIN would never take it; if it is worth less than 2, MIN would take it, but MAX never would.
[Diagram: intervals [2, Inf], [−Inf, 2], [2, 3], [2, Inf], [2, 2], [3, Inf], [2, 3]; the pruned branch is crossed out; β is returned to replace α above.]
Example in Detail
The updated alpha (not beta) is passed up to change alpha. Then we go back down.
[Diagram: same intervals as before, with the right-hand MIN node now at [2, 3].]
Example in Detail
Alpha is passed up to replace alpha; pruning would have been done here if the order had been different. The values are returned to the root, and we are done.
Question: Why isn't the last subtree pruned as well? Because it could have had a value of, say, 2.5, so we needed to check it.
[Diagram: final intervals, with the final path (the principal variation) highlighted; α is used to replace α above.]
Trace of Lisp Alpha Beta Code

max-value ((((1 4 1) (5 9 2)) ((6 5 3) (8 9 7))) (((9 3 2) (3 8 4)) ((6 2 6) (4 3 1))))
min-value (((1 4 1) (5 9 2)) ((6 5 3) (8 9 7)))
max-value ((1 4 1) (5 9 2))
min-value (1 4 1)
max-value
max-value
max-value
min-value (5 9 2)
max-value
max-value 9 1 5
max-value 2 1 5
max-value ((6 5 3) (8 9 7))
min-value (6 5 3)
max-value
max-value
max-value
;; NOTE: pruning has happened
min-value (((9 3 2) (3 8 4)) ((6 2 6) (4 3 1)))
max-value ((9 3 2) (3 8 4))
min-value (9 3 2)
max-value
max-value 3 2 9
max-value 2 2 3
min-value (3 8 4)
max-value
max-value 8 2 3
max-value 4 2 3
max-value ((6 2 6) (4 3 1)) 2 3
min-value (6 2 6) 2 3
max-value 6 2 3
max-value
;; NOTE: pruning has happened
min-value (4 3 1) 2 3
max-value 3 2 3
max-value 1 2 3
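The trace can be cross-checked with an instrumented Python equivalent of the Lisp code that counts how many of the tree's 24 leaves are actually evaluated (illustrative; the tree is the one from the trace):

```python
import math

def alpha_beta(tree):
    """Alpha-beta over a nested-list tree; returns (value, leaves_examined)."""
    seen = [0]

    def max_value(node, alpha, beta):
        if not isinstance(node, list):
            seen[0] += 1                 # count each leaf evaluation
            return node
        v = -math.inf
        for child in node:
            v = max(v, min_value(child, alpha, beta))
            if v >= beta:                # pruning, as in the trace
                return v
            alpha = max(alpha, v)
        return v

    def min_value(node, alpha, beta):
        if not isinstance(node, list):
            seen[0] += 1
            return node
        v = math.inf
        for child in node:
            v = min(v, max_value(child, alpha, beta))
            if v <= alpha:               # pruning, as in the trace
                return v
            beta = min(beta, v)
        return v

    return max_value(tree, -math.inf, math.inf), seen[0]
```

On the trace's tree, the root value is 2 and strictly fewer than all 24 leaves are examined, confirming that pruning occurred.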
Qualification
The effectiveness of alpha-beta pruning is highly dependent on the order in which the successors are examined. If you are able to order the successors in a desirable way for pruning, then alpha-beta can look ahead roughly twice as far as minimax in the same amount of time.
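The ordering effect can be demonstrated by counting leaf evaluations on the earlier example tree under two successor orderings (a sketch using the nested-list representation):

```python
import math

def search_counting(tree):
    """Alpha-beta value for MAX, plus the number of leaves examined."""
    seen = [0]

    def search(node, alpha, beta, maximizing):
        if not isinstance(node, list):
            seen[0] += 1
            return node
        v = -math.inf if maximizing else math.inf
        for child in node:
            cv = search(child, alpha, beta, not maximizing)
            if maximizing:
                v = max(v, cv)
                if v >= beta:            # beta cutoff
                    return v
                alpha = max(alpha, v)
            else:
                v = min(v, cv)
                if v <= alpha:           # alpha cutoff
                    return v
                beta = min(beta, v)
        return v

    return search(tree, -math.inf, math.inf, True), seen[0]
```

Searching the best root move first raises alpha early, so later subtrees are cut off sooner and fewer leaves are touched; the value returned is the same either way.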
Order #1
[Diagram: the example tree searched with one successor ordering; [alpha, beta] intervals are shown at each node, and many leaves are examined.]
Order #2
[Diagram: the same tree searched with the successors reordered; more branches are pruned and fewer leaves are examined.]
More on Evaluation Functions
An evaluation function returns an estimate of the expected utility of the game from a given position.
- The evaluation function must agree with the utility function on terminal states.
- An evaluation function should be relatively quick to compute.
- An evaluation function should accurately reflect the actual chances of winning.
Example: a weighted linear function w1f1 + w2f2 + … + wnfn, where the w's are weights and the f's are features of the particular position.
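The weighted linear form is a one-liner. The numbers in the test are illustrative: weights 3 and −2 with open-line counts 6 and 4 reproduce the weighted tic-tac-toe Eval of 10 from the following slide.

```python
def weighted_linear_eval(weights, features):
    """w1*f1 + w2*f2 + ... + wn*fn for a position's feature vector."""
    return sum(w * f for w, f in zip(weights, features))
```

In practice the features are game-specific measurements (material, mobility, etc.) and the weights are tuned by hand or by learning.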
An Example using Tic-Tac-Toe
An example evaluation function for a board position p:
Eval(p) = 3 × (the number of complete rows, columns, or diagonals that are still open for MAX) − 2 × (the number of complete rows, columns, or diagonals that are still open for MIN).
Eval(p) = +∞ if MAX wins.
Eval(p) = −∞ if MIN wins.
[Example board: x in the center, o on a side square. Eval(p) = 3 × 6 − 2 × 4 = 10.]
Search Enhancements
Other search enhancements for game-playing programs:
- Game-specific ordering of successors, e.g., try captures first, then threats, then forward moves, then backward moves.
- Quiescence search: if search reaches a leaf node and there is evidence to suggest that the evaluation may be unstable (e.g., you are in the middle of a piece exchange), continue searching until a stable (quiescent) stage is reached. As a result, some moves are explored deeper than others.
History: Game-Playing Program Success: Deep Blue
In May 1997, IBM's Deep Blue supercomputer played a match with reigning World Chess Champion Garry Kasparov. In a shocking finale, Kasparov resigned 19 moves into Game 6, handing a historic victory to Deep Blue. It was the first time a current world champion lost a match to a computer opponent under tournament conditions. Deep Blue used a weighted combination of material advantage, position, king safety, and tempo (rate of development of board control) for its evaluation function. 700,000 grandmaster games were kept in its database. It had special-purpose hardware to generate the game tree quickly: 480 chess-specific processors, each capable of examining 2 million positions per second.