6. Fully Observable Game Playing 2012/03/28
Games vs. search problems
Game Theory
Studied by mathematicians, economists, and finance researchers.
In AI, we limit games to the deterministic, turn-taking, two-player, zero-sum, perfect-information case.
This means deterministic, fully observable environments in which two agents must alternate actions and in which the utility values at the end of the game are always equal and opposite.
Types of Games
                        deterministic                   chance
perfect information     Chess, Checkers, Go, Othello    Backgammon
imperfect information   (none listed)                   Bridge, Poker
Game playing was one of the first tasks undertaken in AI.
Machines have surpassed humans at checkers and Othello, and have defeated human champions in chess and backgammon.
In Go, computers perform at the amateur level.
Checkers
Game as Search Problems
Games offer pure, abstract competition.
A chess-playing computer would be an existence proof of a machine doing something generally thought to require intelligence.
Games are idealizations of worlds in which:
the world state is fully accessible;
the (small number of) actions are well defined;
uncertainty arises from the opponent's moves and from the complexity of the game.
Game as Search Problems (cont.-1)
Games are usually much too hard to solve.
Example, chess:
average branching factor = 35
average moves per player = 50
total number of nodes in search tree = $35^{100} \approx 10^{154}$
total number of different legal positions = $10^{40}$
Time limits constrain how long the program can spend on each decision.
The program is unlikely to reach a goal (terminal) state, so it must approximate.
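The size estimate for the chess search tree can be checked with a few lines of arithmetic. This is only a sanity check of the numbers on the slide; the variable names are illustrative, and 100 plies is taken to mean 50 moves for each of the two players.

```python
import math

b = 35        # average branching factor quoted on the slide
plies = 100   # assumption: 50 moves per player, two players

# Exponent x such that b**plies is about 10**x
exponent = plies * math.log10(b)
print(f"35^100 is about 10^{exponent:.0f}")

# Python integers are exact, so the digit count can be read off directly
digits = len(str(b ** plies))
print(f"35^100 has {digits} decimal digits")
```

Both lines agree with the slide's figure of roughly $10^{154}$ nodes.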
Game as Search Problems (cont.-2)
Initial State: how does the game start?
Successor Function: a list of legal (move, state) pairs for each state.
Terminal Test: determines when the game is over.
Utility Function: provides a numeric value for every terminal state, e.g., +1, -1, 0 for win, lose, draw.
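The four components can be written down directly as functions. The sketch below uses a hypothetical take-away game that is not from the slides (players alternately remove 1 or 2 sticks, and whoever takes the last stick wins), chosen only because its rules fit in a few lines. All names are illustrative.

```python
from typing import Iterator, NamedTuple, Tuple

class State(NamedTuple):
    sticks: int    # sticks left on the table
    to_move: str   # "MAX" or "MIN"

def initial_state(sticks: int = 5) -> State:
    """Initial state: MAX moves first (an assumption of this toy game)."""
    return State(sticks, "MAX")

def successors(state: State) -> Iterator[Tuple[int, State]]:
    """Successor function: yields legal (move, state) pairs."""
    next_player = "MIN" if state.to_move == "MAX" else "MAX"
    for take in (1, 2):
        if take <= state.sticks:
            yield take, State(state.sticks - take, next_player)

def terminal_test(state: State) -> bool:
    """Terminal test: the game is over when no sticks remain."""
    return state.sticks == 0

def utility(state: State) -> int:
    """Utility from MAX's point of view at a terminal state.

    The player who is NOT to move took the last stick and therefore won.
    """
    return -1 if state.to_move == "MAX" else +1

print(list(successors(initial_state(2))))
```

With this interface in place, a search procedure only needs these four functions and never has to know the rules of the particular game.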
Game Tree (2-player, deterministic, turns)
Game tree complexity: $9! = 362880$
Game board complexity: $3^9 = 19683$
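Minimax Strategy
Assumption: both players are knowledgeable and play the best possible move.
$$\text{MinimaxValue}(n) = \begin{cases} \text{Utility}(n) & \text{if } n \text{ is a terminal state} \\ \max_{s \in \text{Successors}(n)} \text{MinimaxValue}(s) & \text{if } n \text{ is a MAX node} \\ \min_{s \in \text{Successors}(n)} \text{MinimaxValue}(s) & \text{if } n \text{ is a MIN node} \end{cases}$$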
Minimax Strategy (cont.)
Minimax is an optimal strategy: it leads to outcomes at least as good as any other strategy when playing an infallible opponent.
Pick the option that most minimizes the damage your opponent can do; that is, maximize the worst-case outcome, because a skillful opponent will certainly find the most damaging move.
Minimax Perfect play for deterministic, perfect information games Idea: choose moves to a position with highest minimax value = best achievable payoff against best play
Minimax - Animated Example
[Figure: animated minimax tree with alternating Max, Min, and Max levels.]
The computer can obtain 6 by choosing the right-hand edge from the first node.
Minimax Algorithm
function MINIMAX-DECISION(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state)
    return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← -∞
    for a, s in SUCCESSORS(state) do
        v ← MAX(v, MIN-VALUE(s))
    return v

function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do
        v ← MIN(v, MAX-VALUE(s))
    return v
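A minimal Python sketch of the same recursion may make the pseudocode easier to follow. It assumes the game tree is given explicitly as nested lists, where an integer is a terminal utility and a list holds a node's successors; that representation and the example tree are illustrative choices, not part of the slides.

```python
def max_value(node):
    """Value of a MAX node: the best outcome MAX can force from here."""
    if isinstance(node, int):      # terminal state: the leaf is its own utility
        return node
    return max(min_value(child) for child in node)

def min_value(node):
    """Value of a MIN node: the worst outcome (for MAX) that MIN can force."""
    if isinstance(node, int):
        return node
    return min(max_value(child) for child in node)

def minimax_decision(root):
    """Index of the root move whose resulting MIN node has the highest value."""
    return max(range(len(root)), key=lambda i: min_value(root[i]))

# Assumed example: MAX at the root, three MIN nodes, three leaves each.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_decision(tree), max_value(tree))  # prints: 0 3
```

The three MIN nodes evaluate to 3, 2, and 2, so MAX picks the first move and the root value is 3.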
Optimal Decisions in Multiplayer Games Extend the minimax idea to multiplayer games Replace the single value for each node with a vector of values
Minimax Algorithm (cont.)
Generate the whole game tree.
Apply the utility function to each terminal state.
Propagate the utilities of terminal states up one level at a time: Utility(n) = max / min (n.1, n.2, …, n.b), using max at MAX nodes and min at MIN nodes.
At the root, MAX chooses the move leading to the highest utility value.
Analysis of Minimax
Complete? Yes, but only if the tree is finite.
Optimal? Yes, against an optimal opponent.
Time? $O(b^m)$; it is a complete depth-first search (m: maximum depth, b: number of legal moves).
Space? $O(bm)$ if all successors are generated at once, or $O(m)$ if successors are generated one at a time.
For chess, $b \approx 35$ and $m \approx 100$ for reasonable games, so an exact solution is completely infeasible.
Complex Games
What happens if minimax is applied to large, complex games? What happens to the search space?
Example, chess:
a decent amateur program examines 1000 moves / second
150 seconds / move (tournament play)
so it looks at approx. 150,000 moves per turn
chess has a branching factor of 35
so it generates trees that are only 3-4 ply deep
Resultant play: pure amateur.
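The 3-4 ply figure follows from the other numbers on the slide: a full tree of depth d with branching factor 35 has about $35^d$ nodes, so the reachable depth is roughly $\log_{35}$ of the move budget. A short check, with illustrative variable names:

```python
import math

moves_per_second = 1000
seconds_per_move = 150
branching_factor = 35

budget = moves_per_second * seconds_per_move   # moves examined per turn
depth = math.log(budget, branching_factor)     # d such that 35**d == budget

print(budget)           # 150000
print(round(depth, 2))  # about 3.35 plies
```

A budget of 150,000 moves is enough to finish 3 plies ($35^3 = 42{,}875$) but not 4 ($35^4 \approx 1.5$ million), which matches the slide.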
α-β Pruning
The problem of minimax search: the number of states to examine is exponential in the number of moves.
α-β pruning returns the same move as minimax would, but prunes away branches that cannot possibly influence the final decision.
α: lower bound on a MAX node, never decreasing; the value of the best (highest) choice found so far in the search for MAX.
β: upper bound on a MIN node, never increasing; the value of the best (lowest) choice found so far in the search for MIN.
α-β Pruning Example - 1
α-β Pruning Example - 1 (2nd Ed.)
[Figure: game tree with leaf values 2, 5, 14, an unexplored node "?", and the interval [-∞, 2].]
α-β Pruning (cont.)
α cut-off: search is discontinued below any MIN node with min-value v ≤ α.
β cut-off: search is discontinued below any MAX node with max-value v ≥ β.
The order in which successors are considered matters (look at step f in the previous slide).
If possible, consider the best successors first.
α-β Pruning (cont.)
[Figure: tree with alternating max and min levels.]
If n is worse than α, MAX will avoid it, so prune the branch.
If m is better than n for Player, we will never get to n in play, so just prune it.
α-β Pruning Example - 2
[Figure: animated α-β trace on a tree with nodes A through M, showing the α and β values at each node; leaf values 6, 5, 8, 6, 2, 1, 5, 4.]
α-β Pruning Example - 3
[Figure: tree with alternating MAX and MIN levels, nodes a through g; leaf values 5, 1, 2, 3, 4, 7, 6 appear in the figure.]
Trace of the bounds at each node, in search order:
Node  Alpha  Beta
a     -∞     ∞
b     -∞     ∞
d     -∞     ∞
d     1      ∞
d     2      ∞
d     3      ∞
b     -∞     3
e     -∞     3
e     4      3     CUT-OFF
a     3      ∞
c     3      ∞
f     3      ∞
c     3      3     CUT-OFF
Completed
Key: -∞ = negative infinity; +∞ = positive infinity
Function  Node  α  β  V  Return
Max A -∞, 3 +∞ 3, 2
Min B -∞ +∞, 3 3, 4
D -∞,1,2,3 1,2,3 1 2 3
E -∞, 4 4 Cutoff 5 & 7
C +∞, 6 6
F 3,4,5,6 -∞,4,5,6 4,5,6 5
G -∞, 6 Cutoff 1 & 5
The last value in a cell is the final value assigned to that variable, i.e., at the end of the search node A's α = 3.
α-β Algorithm
function ALPHA-BETA-SEARCH(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state, -∞, +∞)
    return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
    inputs: α, the value of the best alternative for MAX along the path to state
            β, the value of the best alternative for MIN along the path to state
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← -∞
    for a, s in SUCCESSORS(state) do
        v ← MAX(v, MIN-VALUE(s, α, β))
        if v ≥ β then return v    // fail-high
        α ← MAX(α, v)
    return v
α-β Algorithm (cont.)
function MIN-VALUE(state, α, β) returns a utility value
    inputs: state, current state in game
            α, the value of the best alternative for MAX along the path to state
            β, the value of the best alternative for MIN along the path to state
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do
        v ← MIN(v, MAX-VALUE(s, α, β))
        if v ≤ α then return v    // fail-low
        β ← MIN(β, v)
    return v
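The MAX-VALUE / MIN-VALUE pair can be sketched in Python as a single recursive function. As an assumption for illustration, the game tree is an explicit nested list (numbers are terminal utilities, lists are internal nodes), and a `visited` list records which leaves are actually evaluated so the pruning is visible. None of these names come from the slides.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Return the minimax value of node, skipping branches that cannot matter."""
    if visited is None:
        visited = []
    if isinstance(node, (int, float)):   # terminal state
        visited.append(node)
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:                # fail-high: MIN above will never allow this
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:                   # fail-low: MAX above already has better
            return v
        beta = min(beta, v)
    return v

# Assumed example tree: MAX root, three MIN nodes, three leaves each.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
value = alphabeta(tree, visited=visited)
print(value)    # 3
print(visited)  # [3, 12, 8, 2, 14, 5, 2]
```

On this tree the leaves 4 and 6 are never evaluated: once the second MIN node sees 2, it cannot beat the 3 that MAX already has, so 7 of the 9 leaves are examined and the result equals the plain minimax value.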
α-β Pruning Example - 4
[Figure: exercise tree with alternating MAX and MIN levels, nodes a through n; leaf values appearing in the figure: 5, 8, 7, 4, 2, 1, 3.]
α-β Pruning Example - 5
[Figure: exercise tree with alternating MAX and MIN levels, nodes a through o; leaf values appearing in the figure: -1, 5, 1, 2, -5, 3, 4, -4, -3.]
Analysis of α-β Search
Pruning does not affect the final result.
The effectiveness of α-β pruning is highly dependent on the order in which the successors are examined.
It is worthwhile to try to examine first the successors that are likely to be best.
e.g., Example 1 (e, f): if the successors of D are ordered 2, 5, 14 (instead of 14, 5, 2), then 5 and 14 can be pruned.
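The effect of ordering at a single MIN node can be reproduced in a few lines. The sketch assumes that MAX already has an alternative worth 3 when node D is reached (so α = 3); that value is an assumption for illustration, since the figure for Example 1 is not reproduced here. The function name is hypothetical.

```python
def min_node(leaves, alpha):
    """Evaluate a MIN node whose successors are leaves, under MAX's bound alpha.

    Returns (value, number of leaves examined).
    """
    v = float("inf")
    examined = 0
    for leaf in leaves:
        examined += 1
        v = min(v, leaf)
        if v <= alpha:   # α cut-off: MAX will not choose this node anyway
            break
    return v, examined

alpha = 3  # assumed best value MAX has found elsewhere

print(min_node([14, 5, 2], alpha))  # (2, 3): every leaf must be examined
print(min_node([2, 5, 14], alpha))  # (2, 1): 5 and 14 are pruned
```

Both orderings return the same value, but examining the successor that is best for MIN first triggers the cut-off immediately.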
Analysis of α-β Search (cont.)
With the best move first (perfect ordering), the total number of nodes examined is $O(b^{m/2})$.
The effective branching factor is $b^{1/2}$; for chess, about 6 instead of 35.
i.e., α-β can look ahead roughly twice as far as minimax in the same amount of time.
With random ordering, the total number of nodes examined is $O(b^{3m/4})$ for moderate b.
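These growth rates are easy to compare numerically. The snippet below plugs the chess branching factor into the three expressions; the depth of 8 plies is an arbitrary choice for illustration, and the results are orders of magnitude rather than exact node counts.

```python
import math

b = 35   # chess branching factor from the slides
m = 8    # assumed search depth in plies, for illustration only

effective_b = math.sqrt(b)              # b^(1/2), perfect ordering
minimax_nodes = b ** m                  # O(b^m), no pruning
best_case_nodes = b ** (m / 2)          # O(b^(m/2)), perfect ordering
random_case_nodes = b ** (3 * m / 4)    # O(b^(3m/4)), random ordering

print(f"effective branching factor: {effective_b:.2f}")   # about 5.92
print(f"minimax:          {minimax_nodes:.2e}")
print(f"random ordering:  {random_case_nodes:.2e}")
print(f"perfect ordering: {best_case_nodes:.2e}")
```

With perfect ordering, an 8-ply α-β search costs about as much as a 4-ply minimax search, which is the "twice as far" claim above.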
Imperfect, Real-Time Decisions
It is not practical to assume the program has time to search all the way to terminal states.
Since moves must be made in a reasonable amount of time, alter minimax or α-β in two ways:
Evaluation Function (instead of utility function): an estimate of the expected utility of the game from a given position.
Cutoff Test (instead of terminal test): decides when to apply Eval, e.g., a depth limit (perhaps with quiescence search).
Evaluation Functions
The heuristic that estimates expected utility.
It should order terminal states in the same way as the true utility function; otherwise it can cause bad decision making.
Computation cannot take too long.
For nonterminal states, it should be strongly correlated with the actual chances of winning.
Define features of the game state that assist in evaluation. What are features of chess? e.g., number of pawns possessed, etc.
Weighted Linear Function: $\text{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$
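A weighted linear function is just a dot product between weights and feature values. The sketch below assumes each feature is a material difference (own pieces minus the opponent's) and uses the commonly quoted piece values as weights; both the feature set and the weights are illustrative assumptions, not values given in the slides.

```python
# Assumed weights: conventional material values, for illustration only.
WEIGHTS = {"pawn": 1.0, "knight": 3.0, "bishop": 3.0, "rook": 5.0, "queen": 9.0}

def eval_linear(features, weights=WEIGHTS):
    """Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical position: up a knight but down two pawns.
features = {"pawn": -2, "knight": 1, "bishop": 0, "rook": 0, "queen": 0}
score = eval_linear(features)
print(score)  # 1.0
```

A positive score means the position is estimated to favor the player whose point of view the features are computed from; a real program would add positional features alongside material.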
Evaluation Functions (cont.-1)
(a) Black has an advantage of a knight and two pawns and will win the game.
(b) Black will lose after White captures the queen.
Evaluation Functions (cont.-2)
Digression: exact values don't matter.
Behavior is preserved under any monotonic transformation of Eval.
Only the order matters: payoff in deterministic games acts as an ordinal utility function.
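This invariance can be checked on a small example. The sketch runs minimax twice on an assumed nested-list tree, once with the raw leaf values and once after applying a strictly increasing function to every leaf, and compares the chosen move. The tree and the transformation are arbitrary illustrative choices.

```python
def minimax(node, maximizing, f):
    """Minimax value of node, with f applied to every leaf value."""
    if isinstance(node, (int, float)):
        return f(node)
    values = [minimax(child, not maximizing, f) for child in node]
    return max(values) if maximizing else min(values)

def best_move(tree, f=lambda x: x):
    """Index of the root move MAX would choose."""
    return max(range(len(tree)), key=lambda i: minimax(tree[i], False, f))

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

raw_choice = best_move(tree)
transformed_choice = best_move(tree, lambda x: x ** 3 + 7)  # strictly increasing
print(raw_choice, transformed_choice)  # 0 0
```

The backed-up values change (3 becomes 34), but because max and min depend only on ordering, the selected move does not.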
Cutting off Search
When do you use evaluation functions?
if Cutoff-Test(state, depth) then return Eval(state)
The simplest way to control the amount of search is to set a fixed depth limit d: Cutoff-Test(state, depth) returns 1 or 0, and when 1 is returned for every depth greater than d, the evaluation function is used.
Options: cut off beyond a certain depth; cut off if the state is stable (more predictable); cut off moves you know are bad (forward pruning).
Cutting off can have a disastrous effect if the evaluation function is not sophisticated enough.
The search should continue until a quiescent position is found.
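A depth-limited minimax with a cutoff test might look like the following sketch. To keep it self-contained, each node carries a precomputed heuristic estimate standing in for Eval(state), and the cutoff fires once the depth limit is reached; the `Node` type, the limit parameter, and the example values are all assumptions made for illustration.

```python
from typing import NamedTuple

class Node(NamedTuple):
    estimate: float        # stands in for Eval(state)
    children: tuple = ()   # empty for terminal states

def cutoff_test(node, depth, limit):
    """Stop at the depth limit or at a terminal state."""
    return depth >= limit or not node.children

def h_minimax(node, depth, limit, maximizing=True):
    if cutoff_test(node, depth, limit):
        return node.estimate
    values = [h_minimax(c, depth + 1, limit, not maximizing) for c in node.children]
    return max(values) if maximizing else min(values)

# Hypothetical tree: the estimate at A is optimistic compared with its leaves.
a = Node(5, (Node(1), Node(9)))
b = Node(4, (Node(6), Node(7)))
root = Node(0, (a, b))

print(h_minimax(root, 0, limit=1))  # 5: trusts the estimates at A and B
print(h_minimax(root, 0, limit=2))  # 6: searching deeper prefers B
```

The shallow search picks A on the strength of an estimate that one more ply shows to be wrong, which is the kind of failure a poor evaluation function or a too-early cutoff can produce.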
Cutting off Search (cont.)
Does it work in practice?
$b^m = 10^6$, $b = 35$ $\Rightarrow$ $m \approx 4$
A 4-ply lookahead is a hopeless chess player.
4-ply ≈ human novice
8-ply ≈ typical PC, human master
12-ply ≈ Deep Blue, Kasparov
Horizon Effect
A series of checks by the black rook forces the inevitable queening move by White "over the horizon" and makes the position look like a win for Black, when it is really a win for White.
The horizon effect arises when the program is facing a move by the opponent that causes serious damage and is ultimately unavoidable.
At present, no general solution has been found for the horizon problem.
Suggestion
Improve the evaluation function, e.g., so that it knows that the bishop is trapped.
Make the search deeper.
Make the search depth more flexible: the program searches deeper in the line in which a pawn is being given away, and less deep in other lines.
HW2, Deadline 4/12 Design the Evaluation Functions for Chinese chess and Chinese Dark chess.