6. Fully Observable Game Playing

1 6. Fully Observable Game Playing
2012/03/28

2 Games vs. search problems

3 Game Theory
Studied by mathematicians, economists, and researchers in finance
In AI, we limit games to deterministic, turn-taking, two-player, zero-sum games of perfect information. This means deterministic, fully observable environments in which there are two agents whose actions alternate and in which the utility values at the end of the game are always equal and opposite.

4 Types of Games
Deterministic, perfect information: Chess, Checkers, Go, Othello
Chance, perfect information: Backgammon
Imperfect information: Bridge, Poker
Game playing was one of the first tasks undertaken in AI. Machines have surpassed humans at checkers and Othello and have defeated human champions in chess and backgammon. In Go, computers perform at the amateur level.

5 Checkers

6 Game as Search Problems
Games offer pure, abstract competition. A chess-playing computer would be an existence proof of a machine doing something generally thought to require intelligence.
Games are idealizations of worlds in which:
the world state is fully accessible
the (small number of) actions are well-defined
uncertainty arises from the moves of the opponent and from the complexity of the game

7 Game as Search Problems (cont.-1)
Games are usually much too hard to solve. Example, chess:
average branching factor ≈ 35
average moves per player ≈ 50
total number of nodes in the search tree = 35^100, or about 10^154
total number of different legal positions ≈ 10^40
Time limits force decisions to be made quickly
Unlikely to reach the goal, so we must approximate
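As a quick check on these magnitudes, the snippet below (an illustrative Python calculation, not part of the original slides) computes the order of magnitude of 35^100:

import math

# Rough size of the full chess game tree: branching factor ~35,
# ~50 moves per player => ~100 plies.
branching_factor = 35
plies = 100

log10_nodes = plies * math.log10(branching_factor)
print(f"35^100 ~= 10^{log10_nodes:.0f}")   # about 10^154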

8 Game as Search Problems (cont.-2)
Initial State: how does the game start?
Successor Function: a list of legal (move, state) pairs for each state
Terminal Test: determines when the game is over
Utility Function: provides a numeric value for all terminal states, e.g., win, lose, draw with +1, -1, 0
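For concreteness, here is a minimal sketch (not from the slides) of how this formulation might look in Python; the class and method names, such as Game and successors, are illustrative assumptions used again in later sketches.

from abc import ABC, abstractmethod

class Game(ABC):
    """Abstract game formulation: initial state, successors, terminal test, utility."""

    @abstractmethod
    def initial_state(self):
        """Return the state in which the game starts."""

    @abstractmethod
    def successors(self, state):
        """Return a list of legal (move, state) pairs reachable from state."""

    @abstractmethod
    def terminal_test(self, state):
        """Return True when the game is over in this state."""

    @abstractmethod
    def utility(self, state, player):
        """Return the numeric value of a terminal state for player, e.g. +1 win, -1 loss, 0 draw."""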

9 Game Tree (2-player, deterministic, turns)
Tic-tac-toe: game tree complexity 9! = 362,880; game board complexity 3^9 = 19,683

10 Minimax Strategy
Assumption: both players are knowledgeable and play the best possible move
MinimaxValue(n) =
  Utility(n)                                    if n is a terminal state
  max over s ∈ Successors(n) of MinimaxValue(s)  if n is a MAX node
  min over s ∈ Successors(n) of MinimaxValue(s)  if n is a MIN node

11 Minimax Strategy (cont.)
Minimax is an optimal strategy: it leads to outcomes at least as good as any other strategy when playing against an infallible opponent.
Pick the option that minimizes the damage your opponent can do, i.e., maximize the worst-case outcome, because your skillful opponent will certainly find the most damaging move.

12 Minimax
Perfect play for deterministic, perfect-information games
Idea: choose the move leading to the position with the highest minimax value, i.e., the best achievable payoff against best play

13 Minimax – Animated Example
(Figure: three-level Max/Min/Max tree; the computer can obtain 6 by choosing the right-hand edge from the first node.)

14 Minimax Algorithm
function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
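The same algorithm in runnable form: a minimal Python sketch (not from the slides), assuming the illustrative Game interface shown after slide 8 and a utility defined from MAX's point of view.

import math

def minimax_decision(game, state):
    """Return the move with the highest minimax value (MAX to move)."""
    return max(game.successors(state),
               key=lambda pair: min_value(game, pair[1]))[0]

def max_value(game, state):
    if game.terminal_test(state):
        return game.utility(state, "MAX")
    v = -math.inf
    for _, s in game.successors(state):
        v = max(v, min_value(game, s))
    return v

def min_value(game, state):
    if game.terminal_test(state):
        return game.utility(state, "MAX")
    v = math.inf
    for _, s in game.successors(state):
        v = min(v, max_value(game, s))
    return v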

15 Optimal Decisions in Multiplayer Games
Extend the minimax idea to multiplayer games: replace the single value for each node with a vector of values, one per player

16 Minimax Algorithm (cont.)
Generate the whole game tree
Apply the utility function to each terminal state
Propagate the utility of terminal states up one level at a time: Utility(n) = max / min (n.1, n.2, …, n.b)
At the root, MAX chooses the move leading to the highest utility value

17 Analysis of Minimax
Complete? Yes, if the tree is finite
Optimal? Yes, against an optimal opponent
Time? O(b^m) — it performs a complete depth-first search (m: maximum depth, b: number of legal moves)
Space? O(bm) if all successors are generated at once, or O(m) if successors are generated one at a time
For chess, b ≈ 35, m ≈ 100 for reasonable games ⇒ exact solution completely infeasible

18 Complex Games
What happens if minimax is applied to large, complex games? What happens to the search space?
Example, chess:
a decent amateur program examines ≈ 1,000 moves / second
150 seconds / move (tournament play)
⇒ looks at approximately 150,000 moves
with a chess branching factor of 35, this generates trees that are only 3–4 ply deep
resultant play: pure amateur

19 - Pruning The problem of minimax search - pruning
# of states to examine: exponential in number of moves - pruning return the same move as minimax would, but prune away branches that cannot possibly influence the final decision  – lower bound on MAX node, never decreasing value of the best (highest) choice so far in search of MAX  – upper bound on MAX node, never increasing value of the best (lowest) choice so far in search of MIN

20 - Pruning Example - 1

21 - Pruning Example - 1 (2nd Ed.)
? [-, 2]

22 - Pruning (cont.)  cut-off  cut-off
Search is discontinued below any MIN node with min-value     cut-off Search is discontinued below any MAX node with max-value    Order of considering successors matters (look at step f in previous slide) If possible, consider best successors first

23 - Pruning (cont.) max min If n is worse than , max will avoid it  prune the branch If m is better than n for player, we will never get to n in play and just prune it

24 - Pruning Example - 2 A D E F G  = - 6  =   = -  =  6 8 2 5
 = 6  =  6     = 6  =  B C  = - 6  =   = -  =   = -  = 6  = - 8  = 6     = 6  =  D E F G H I J K L M L M

25 - Pruning Example - 3 MAX MIN Completed 5 1 2 3 4 7 6 a c b d e f g
6 a c b d e f g Node Alpha Beta a 3 ∞ Node Alpha Beta a -∞ ∞ b -∞ 3 b -∞ ∞ d 1 ∞ d -∞ ∞ d 3 ∞ d 2 ∞ e CUT-OFF e -∞ 3 c CUT-OFF c 3 ∞ f 3 ∞ Completed

26 α-β trace for the previous example
(Table: for each call — Function, Node, α, β, V, Return — the successive values taken at nodes A–G, including the cut-offs at E and G.)
Key: -∞ = negative infinity; +∞ = positive infinity
The last value in a square is the final value assigned to that variable at the end of the search, e.g., node A's α = 3.

27 - Algorithm function ALPHA-BETA-SEARCH ( state ) returns an action
inputs: state, current state in game v  MAX-VALUE ( state, – ,   ) return the action in SUCCESSORS ( state ) with value v function MAX-VALUE ( state, ,  ) returns a utility value , the value of the best alternative for MAX along the path to state , the value of the best alternative for MIN along the path to state if TERMINAL-TEST ( state ) then return UTILITY ( state ) v  –  for a, s in SUCCESSORS ( state ) do v  MAX ( v, MIN-VALUE ( s, ,  )) if v   then return v // fail-high   MAX ( , v ) return v

28 - Algorithm (cont.) function MIN-VALUE ( state, ,  ) returns a utility value inputs: state, current state in game , the value of the best alternative for MAX along the path to state , the value of the best alternative for MIN along the path to state if TERMINAL-TEST ( state ) then return UTILITY ( state ) v    for a, s in SUCCESSORS ( state ) do v  MIN ( v, MAX-VALUE ( s, ,  )) if v   then return v // fail low   MIN ( , v ) return v

29 - Pruning Example - 4 MAX MIN 5 8 7 a c b d e f g 4 2 1 3 h i j k l
3 h i j k l m n

30 - Pruning Example - 5 MAX MIN -1 5 a c b d e f g 1 2 -5 3 4 h i j k
-1 5 a c b d e f g 1 2 -5 3 4 h i j k l n o -4 -3 m

31 Analysis of α-β Search
Pruning does not affect the final result
The effectiveness of α-β pruning is highly dependent on the order in which the successors are examined ⇒ it is worthwhile to try to examine first the successors that are likely to be best, e.g., steps (e, f) in Example 1
If the successors of D are examined in the order 2, 5, 14 (instead of 14, 5, 2), then 5 and 14 can be pruned

32 Analysis of α-β Search (cont.)
With the best move examined first (perfect ordering), the total number of nodes examined = O(b^(m/2))
effective branching factor = b^(1/2); for chess, about 6 instead of 35
i.e., α-β can look ahead roughly twice as far as minimax in the same amount of time
With random ordering, the total number of nodes examined = O(b^(3m/4)) for moderate b
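A quick back-of-the-envelope illustration of what this buys you (assumed values, not from the slides): compare node counts for a chess-like b = 35 at a fixed depth.

import math

b, m = 35, 8   # branching factor and search depth (illustrative values)

minimax_nodes = b ** m                 # O(b^m)
alphabeta_nodes = b ** (m // 2)        # O(b^(m/2)) with perfect move ordering

print(f"minimax:    ~10^{math.log10(minimax_nodes):.1f} nodes")
print(f"alpha-beta: ~10^{math.log10(alphabeta_nodes):.1f} nodes")
# With the same node budget, alpha-beta reaches roughly twice the depth.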

33 Imperfect, Real-Time Decisions
It is not practical to assume that the program has time to search all the way to terminal states
Since moves must be made in a reasonable amount of time, alter minimax or α-β in two ways:
Evaluation Function (instead of the utility function): an estimate of the expected utility of the game from a given position
Cutoff Test (instead of the terminal test): decides when to apply Eval, e.g., a depth limit (perhaps with quiescence search added)

34 Evaluation Functions The heuristic that estimates expected utility
It must preserve the ordering among terminal states in the same way as the true utility function; otherwise it can cause bad decision making
Its computation must not take too long
For nonterminal states, it should be strongly correlated with the actual chances of winning
Define features of the game state that assist in evaluation; what are the features of chess? e.g., the number of pawns possessed, etc.
Weighted Linear Function: Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)
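As an illustration (a hypothetical sketch, not the slides' own function), a weighted linear evaluation over simple material features for chess might look like this; the state object, its methods, the feature choice, and the weights are all assumptions:

# Hypothetical material-count features for a chess position; `state.count`
# and `state.opponent` are assumed helpers, used only for illustration.
PIECE_WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_features(state, player):
    """f_i(s): difference in piece counts between `player` and the opponent."""
    return [state.count(piece, player) - state.count(piece, state.opponent(player))
            for piece in PIECE_WEIGHTS]

def eval_fn(state, player):
    """Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
    weights = list(PIECE_WEIGHTS.values())
    return sum(w * f for w, f in zip(weights, material_features(state, player)))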

35 Evaluation Functions (cont.-1)
(a) Black has an advantage of a knight and two pawns and will win the game
(b) Black will lose after White captures the queen

36 Evaluation Functions (cont.-2)
Digression: exact values don't matter
Behavior is preserved under any monotonic transformation of Eval
Only the order matters: the payoff in deterministic games acts as an ordinal utility function

37 Cutting off Search When do you use evaluation functions?
if Cutoff-Test(state, depth) then return Eval(state)
One way of controlling the amount of search is to set a fixed depth limit d: Cutoff-Test(state, depth) returns 1 or 0, returning 1 for all depths greater than d, at which point the evaluation function is applied
cut off beyond a certain depth
cut off if the state is stable (more predictable)
cut off moves you know are bad (forward pruning)
This can have a disastrous effect if the evaluation function is not sophisticated enough; the search should continue until a quiescent position is found
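One way this might be wired into the α-β sketch above (again an assumption-based illustration; the game, eval_fn, and depth-limit parameter d are placeholders):

import math

def ab_cutoff_value(game, state, alpha, beta, depth, d, eval_fn, maximizing=True):
    """α-β value with a fixed depth limit d: apply Eval instead of Utility at the cutoff."""
    if game.terminal_test(state):
        return game.utility(state, "MAX")
    if depth >= d:                          # Cutoff-Test: fixed depth limit reached
        return eval_fn(state, "MAX")        # estimated utility of the position
    if maximizing:
        v = -math.inf
        for _, s in game.successors(state):
            v = max(v, ab_cutoff_value(game, s, alpha, beta, depth + 1, d, eval_fn, False))
            if v >= beta:
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for _, s in game.successors(state):
            v = min(v, ab_cutoff_value(game, s, alpha, beta, depth + 1, d, eval_fn, True))
            if v <= alpha:
                return v
            beta = min(beta, v)
    return v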

38 Cutting off Search (cont.)
Does it work in practice? b^m = 10^6, b = 35 ⇒ m ≈ 4
4-ply lookahead is a hopeless chess player:
4-ply ≈ human novice
8-ply ≈ typical PC, human master
12-ply ≈ Deep Blue, Kasparov

39 Horizon Effect
A series of checks by the black rook forces the inevitable queening move by White "over the horizon" and makes the position look like a win for Black, when it is really a win for White
The horizon effect arises when the program is facing a move by the opponent that causes serious damage and is ultimately unavoidable
At present, no general solution has been found for the horizon problem

40 Suggestions
Improve the evaluation function: know that the bishop is trapped
Make the search deeper
Make the search depth more flexible: the program searches deeper in lines where a pawn is being given away, and less deep in other lines

41 HW2, Deadline 4/12 Design the Evaluation Functions for Chinese chess and Chinese Dark chess.

