P. Dunne Adapted from slides created by © Franz Kurfess Games
Single-Person Game: a conventional search problem: identify a sequence of moves that leads to a winning state. Examples: Solitaire, Dungeons and Dragons, Rubik’s cube. Such games have received little attention in AI, although some can be quite challenging (for example, some versions of solitaire); a heuristic for Rubik’s cube was found by the Absolver program.
Two-Person Game: games with two opposing players, often called MAX and MIN. Usually MAX moves first, then MIN. In game terminology, a move comprises one step, or ply, by each player. Typically you are MAX. MAX wants a strategy that reaches a winning state no matter what MIN does; MAX must assume that MIN does the same, or at least tries to prevent MAX from winning.
Perfect Decisions: the optimal strategy for MAX is to traverse all relevant parts of the search tree, including the possible moves by MIN, and identify a path that leads MAX to a winning state. So MAX must do some work to enumerate all the possible moves (to a certain depth) from the current position and plan the best way forward to a win. This is often impractical because of time and space limitations.
Nodes are discovered using depth-first search (DFS). Once leaf nodes are discovered, they are scored. Here MAX is building a tree of possibilities: which way should he play when the tree is finished? [Figure: game tree with alternating levels of MAX’s possible moves and MIN’s possible moves.]
Max-Min Example: the values of terminal nodes are calculated from the utility function.
MiniMax Example: the values of the other nodes are calculated via the minimax algorithm. Here the green (MIN) nodes pick the minimum value from the nodes underneath.
MiniMax Example: the values of the other nodes are calculated via the minimax algorithm. Here the red (MAX) nodes pick the maximum value from the nodes underneath.
MiniMax Example: the remaining node values are calculated via the minimax algorithm, level by level; the resulting tree shows moves by MAX and countermoves by MIN. [Figure: minimax tree with alternating MAX and MIN levels.]
MiniMax Observations: the values of some of the leaf nodes are irrelevant for decisions at the next level, and this also holds for decisions at higher levels. As a consequence, under certain circumstances, some parts of the tree can be disregarded: it is still possible to make an optimal decision without considering those parts.
What is pruning? You don’t have to look at every node in the tree: pruning discards parts of the search tree that are guaranteed not to contain good moves. This results in substantial time and space savings, so longer sequences of moves can be explored. The leftover part of the task may still be exponential, however.
Alpha-Beta Pruning: an extension of the minimax approach that results in the same move as minimax, but with less overhead. It prunes uninteresting parts of the search tree: certain moves are not considered because they won’t result in a better evaluation value than a move further up in the tree (they would lead to a less desirable outcome). This applies to moves by both players. Alpha (α) indicates the best choice for MAX so far and never decreases; beta (β) indicates the best choice for MIN so far and never increases.
Note: for the following example, remember that nodes are found with DFS. As a terminal or leaf node (or a temporary terminal node at a certain depth) is found, a utility function gives it a score. So do we need to evaluate F and G once we find E? Can we prune the tree? [Figure: leaf nodes E, F and G; the value 5 is shown in the tree.]
Alpha-Beta Example 1: we expand the tree a little, assuming a depth-first, left-to-right search as the basic strategy. The range of possible values for each node is indicated; initially it is [-∞, +∞]. The local values reflect the values of the sub-trees below that node from MAX’s or MIN’s perspective; since we haven’t expanded anything yet, they are unbounded. The global values α and β are the best overall choices so far for MAX and MIN; both are still unknown. [Figure: MAX root and first MIN node, both labelled [-∞, +∞].]
Alpha-Beta Example 2: we evaluate a node. MIN obtains the first value (7) from a successor node, so its range becomes [-∞, 7]; the MAX root remains [-∞, +∞] and the best choice for MAX is still unknown. Pruning test at a MIN node: if the value of the node < the value of the parent, then abandon. Here: no, because the parent has no value yet.
Alpha-Beta Example 3: MIN obtains the second value (6) from a successor node, so its range becomes [-∞, 6]; the MAX root remains [-∞, +∞] and the best choice for MAX is still unknown. Pruning test at the MIN node (abandon if value of node < value of parent): no.
Alpha-Beta Example 4: MIN obtains the third value (5) from a successor node. This is the last value from this sub-tree, so the exact value is known and MIN is finished on this branch. MAX now has a value for its first successor node, but hopes that something better might still come: its range is [5, +∞], the best choice for MAX is 5 and the best choice for MIN is 5. Pruning test at the MIN node: there are no more nodes to check.
Alpha-Beta Example 5: MIN continues with the next sub-tree and gets a better value (3) from its perspective. MAX has a better choice from its perspective (the ‘5’) and should not consider a move into the sub-tree currently being explored by MIN. MAX root: [5, +∞]; current MIN node: [-∞, 3]; best choice for MAX is 5, best choice for MIN is 3. Pruning test at the MIN node (abandon if value of node < value of parent): yes!
Alpha-Beta Example 6: MAX won’t consider a move to this sub-tree, because it would choose 5 first, so expansion is abandoned. This is a case of pruning (marked in the figure). Pruning means that we won’t do a DFS into these nodes: there is no need to use the utility function to calculate the nodes’ values, and no need to explore that part of the tree further. MAX root: [5, +∞]; pruned MIN node: [-∞, 3]; best choice for MAX is 5, best choice for MIN is 3.
Alpha-Beta Example 7: MIN explores the next sub-tree and finds a value (6) that is worse, from its perspective, than the values of the other nodes at this level. MIN knows this ‘6’ is good for MAX and will go on to evaluate the remaining leaf nodes looking for a lower value (if it exists); if MIN is not able to find something lower, then MAX will choose this branch. MAX root: [5, +∞]; MIN nodes: 5, [-∞, 3] and [-∞, 6]; best choice for MAX is 5, best choice for MIN is 3. Pruning test at the MIN node (abandon if value of node < value of parent): no.
Alpha-Beta Example 8: MIN is lucky and finds a value (5) that is the same as the current worst value at this level. MAX can choose this branch, or the other branch with the same value. MAX root: [5, +∞]; MIN nodes: 5, [-∞, 3] and [-∞, 5]; best choice for MAX is 5, best choice for MIN is 3. Pruning test at the MIN node: yes!
Alpha-Beta Example 9: MIN could continue searching this sub-tree to see if there is a value that is less than the current worst alternative, in order to give MAX as few choices as possible; this depends on the specific implementation. MAX now knows the best value for its sub-tree: 5. MIN nodes: 5, [-∞, 3] and [-∞, 5]; best choice for MAX is 5, best choice for MIN is 3.
Alpha-Beta Example Overview: some branches can be pruned because they would never be considered. After looking at one branch, MAX already knows that they will not be of interest, since MIN would choose a value that is less than what MAX already has at its disposal. Final values: MAX root 5; MIN nodes 5, <= 3 and <= 5; best choice for MAX is 5, best choice for MIN is 3.
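The walkthrough above can be replayed with a short alpha-beta sketch in Python. It is a minimal illustration under stated assumptions: the nested-list encoding and the names `alphabeta` and `visited` are hypothetical, the leaves 7, 6, 5, 3, 6 and 5 follow the example, and the two leaves behind the 3 (here 4 and 2) are made up, since the slides never evaluate them.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of `node` with alpha-beta pruning.

    alpha: best value MAX can already guarantee (never decreases).
    beta:  best value MIN can already guarantee (never increases).
    `visited` optionally collects the leaves that were actually scored.
    """
    if isinstance(node, (int, float)):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # MIN above would never allow this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if beta <= alpha:
            break  # MAX above already has something at least as good
    return value


# Assumed tree: leaves 4 and 2 are invented stand-ins for the pruned nodes.
tree = [[7, 6, 5], [3, 4, 2], [6, 5]]
visited = []
value = alphabeta(tree, True, visited=visited)
print(value, visited)
```

On this tree the search returns 5 and scores only the leaves 7, 6, 5, 3, 6 and 5: the second MIN node is abandoned as soon as it sees the 3, matching steps 5 and 6 of the example.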
Properties of Alpha-Beta Pruning: in the ideal case, the best successor node is examined first, and alpha-beta can then look ahead twice as far as minimax for the same effort. This assumes an idealized tree model (uniform branching factor and path length, random distribution of leaf evaluation values). Good players require additional information: game-specific background knowledge and empirical data.
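The "twice as far" claim can be checked with a little arithmetic. The sketch below assumes the idealized model (uniform branching factor b, fixed depth d, best successor always examined first) and uses the standard best-case leaf count for alpha-beta, b^ceil(d/2) + b^floor(d/2) - 1; the function names and the sample numbers are hypothetical.

```python
def minimax_leaves(b, d):
    """Leaves scored by plain minimax on a uniform tree: b^d."""
    return b ** d


def alphabeta_best_case_leaves(b, d):
    """Leaves scored by alpha-beta when the best successor always comes first."""
    return b ** ((d + 1) // 2) + b ** (d // 2) - 1


# With an assumed branching factor of 10, depth 8 under ideal ordering costs
# about as much as depth 4 under plain minimax (within a factor of two).
print(minimax_leaves(10, 4))              # 10000
print(alphabeta_best_case_leaves(10, 8))  # 19999
```

With a poor move ordering the saving shrinks, and in the worst case alpha-beta scores as many leaves as minimax.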
Imperfect Decisions: does this mean I have to expand the tree all the way? Complete search is impractical for most games. Alternative: search the tree only to a certain depth, which requires a cutoff test to determine where to stop.
Evaluation Function: if I stop part of the way down, how do I score these nodes (which are not terminal nodes)? Use an evaluation function. It must be consistent with the utility function: values for terminal nodes (or at least their order) must be the same. There is a tradeoff between accuracy and time cost; without time limits, minimax could be used.
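A cutoff test and an evaluation function slot into minimax as sketched below. Everything here is an assumption made for illustration: the `Node` class, the field names `estimate` and `utility`, the depth-based `cutoff_test` and the numbers in the toy tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    estimate: float                  # heuristic evaluation E(s) of this position
    utility: Optional[float] = None  # exact utility, set only for terminal states
    children: List["Node"] = field(default_factory=list)


def cutoff_test(node, depth, max_depth):
    """Stop at the depth limit or when there is nothing left to expand."""
    return depth >= max_depth or not node.children


def evaluate(node):
    """Agree with the utility function on terminal nodes, estimate elsewhere."""
    return node.utility if node.utility is not None else node.estimate


def depth_limited_minimax(node, maximizing, max_depth, depth=0):
    if cutoff_test(node, depth, max_depth):
        return evaluate(node)
    values = [depth_limited_minimax(c, not maximizing, max_depth, depth + 1)
              for c in node.children]
    return max(values) if maximizing else min(values)


# Toy tree: the estimates at depth 1 are deliberately imperfect.
a = Node(estimate=4, children=[Node(0, utility=1), Node(0, utility=8)])
b = Node(estimate=3, children=[Node(0, utility=3), Node(0, utility=5)])
root = Node(estimate=0, children=[a, b])

print(depth_limited_minimax(root, True, max_depth=1))  # uses estimates: 4
print(depth_limited_minimax(root, True, max_depth=2))  # uses utilities: 3
```

The shallow search prefers the first branch on the strength of its estimate, while the deeper search finds that MIN can hold it to 1 and so prefers the second branch: the accuracy versus time tradeoff in miniature.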
Example: Tic-Tac-Toe: simple evaluation function E(s) = (r_x + c_x + d_x) - (r_o + c_o + d_o), where r, c and d are the numbers of rows, columns and diagonals still available for a win for X (or O) in that state; x and o are the pieces of the two players. 1-ply lookahead: start at the top of the tree, evaluate all 9 choices for player 1, and pick the maximum E-value. 2-ply lookahead: also look at the opponent’s possible moves, assuming that the opponent picks the minimum E-value.
Tic-Tac-Toe 1-Ply: E(s1:1) denotes E of depth-level ‘1’, node number ‘1’. Values for the nine first moves by X: E(s1:1) = 3, E(s1:2) = 2, E(s1:3) = 3, E(s1:4) = 2, E(s1:5) = 4, E(s1:6) = 2, E(s1:7) = 3, E(s1:8) = 2, E(s1:9) = 3. E(s0) = Max{E(s1:1), ..., E(s1:n)} = Max{2, 3, 4} = 4. Worked example with the simple evaluation function E(s1:1) = (r_x + c_x + d_x) - (r_o + c_o + d_o): X has (3 rows + 3 columns + 2 diagonals) = 8 and O has (2 rows + 2 columns + 1 diagonal) = 5, so E(s1:1) = 8 - 5 = 3. [Figure: the nine boards after X’s first move.]
Tic-Tac-Toe 2-Ply: first-level values: E(s1:1) = 3, E(s1:2) = 2, E(s1:3) = 3, E(s1:4) = 2, E(s1:5) = 4, E(s1:6) = 2, E(s1:7) = 3, E(s1:8) = 2, E(s1:9) = 3; E(s0) = Max{E(s1:1), ..., E(s1:n)} = Max{2, 3, 4} = 4. Second-level values: E(s2:1) = 1, E(s2:2) = 0, E(s2:3) = 1, E(s2:4) = -1, E(s2:5) = 1, E(s2:6) = 0, E(s2:7) = 1, E(s2:8) = 0; E(s2:9) to E(s2:16) = -1 each (for example, E(s2:10) = 5 - 6 = -1); E(s2:41) = 1, E(s2:42) = 2, E(s2:43) = 1, E(s2:44) = 2, E(s2:45) = 2, E(s2:46) = 1, E(s2:47) = 2, E(s2:48) = 1. Note: for 2-ply we must expand all nodes in the first level, but as the scores 2 and 3 repeat a few times we only expand one of each. [Figure: board positions for each node.]
Tic-Tac-Toe 2-Ply: it seems that the centre X (i.e. the move to s1:5) is the best, because its successor nodes have a value of at least 1 (as opposed to 0 or, worse, -1 among the successors of the other moves).
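The evaluation function and both lookaheads are small enough to run directly. The sketch below is one possible reading of E(s), counting a line as available to a player when the opponent has no piece on it; the board encoding (a tuple of nine cells, indexed 0 to 8 row by row) and all function names are assumptions for illustration.

```python
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def open_lines(board, player):
    """Number of rows, columns and diagonals with no opposing piece on them."""
    opponent = "O" if player == "X" else "X"
    return sum(1 for line in LINES if all(board[i] != opponent for i in line))


def evaluate(board):
    """E(s) = lines still available to X minus lines still available to O."""
    return open_lines(board, "X") - open_lines(board, "O")


def place(board, index, player):
    return board[:index] + (player,) + board[index + 1:]


def empty_cells(board):
    return [i for i, cell in enumerate(board) if cell is None]


def one_ply(board):
    """E-value of every X move, looking one ply ahead."""
    return {i: evaluate(place(board, i, "X")) for i in empty_cells(board)}


def two_ply(board):
    """Value of every X move, assuming O replies with the minimum E-value."""
    result = {}
    for i in empty_cells(board):
        after_x = place(board, i, "X")
        replies = (evaluate(place(after_x, j, "O")) for j in empty_cells(after_x))
        result[i] = min(replies, default=evaluate(after_x))
    return result


start = (None,) * 9
print(one_ply(start))
scores = two_ply(start)
print(max(scores, key=scores.get))  # index of the preferred first move
```

On the empty board the 1-ply values are 4 for the centre, 3 for a corner and 2 for an edge, and the 2-ply lookahead again prefers the centre (index 4), whose worst-case reply leaves a value of 1.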
Checkers Case Study: initial board configuration: Black has a single on 20, a single on 21 and a king on 31; Red has a single on 23 and a king on 22. Evaluation function: E(s) = (5 x1 + x2) - (5 r1 + r2), where x1 = black king advantage, x2 = black single advantage, r1 = red king advantage, r2 = red single advantage.
Part 1: MiniMax using DFS. Typically you expand DFS-style to a certain depth and then evaluate the node. For the move that takes the red king: E(s) = (5 x1 + x2) - (5 r1 + r2) = (5 + 2) - (0 + 1) = 6. [Figure: checkers game tree with MAX moves and MIN moves.]
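The checkers evaluation function is simple enough to write down directly. In this sketch the four terms are read as piece counts (kings and singles per side), which is an interpretation that matches the worked value of 6 on the slide; the function and parameter names are hypothetical.

```python
def evaluate(black_kings, black_singles, red_kings, red_singles):
    """E(s) = (5*x1 + x2) - (5*r1 + r2): a king is weighted five times a single."""
    return (5 * black_kings + black_singles) - (5 * red_kings + red_singles)


# Initial configuration: Black has one king and two singles,
# Red has one king and one single.
print(evaluate(1, 2, 1, 1))

# After Black takes the red king: (5 + 2) - (0 + 1).
print(evaluate(1, 2, 0, 1))
```

Capturing the red king moves the score from 1 to 6, which is why that leaf looks attractive to MAX.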
We fast-forward and see the full tree to a certain depth. [Figure: checkers game tree with MAX and MIN levels.]
Then we score the leaf nodes. [Figure: the same tree with leaf scores.]
Then we propagate the Min-Max scores upwards. [Figure: the same tree with propagated scores.]
Could we make this a little easier? This tree will expand to a considerable size. What if we introduce a little pruning? OK then, but first we have to go back to the start.
Checkers Min-Max with pruning: temporarily propagate values up (here, the value 6). [Figure: checkers game tree with MAX moves and MIN moves.]
Checkers Alpha-Beta Example: pruning test at a MAX node: if the value of the node > the value of the parent, then abandon. Here the answer is no for every successor, so alas there is no pruning below this MAX node. The next node is expanded.
Checkers Alpha-Beta Example: notice the new value propagated. Pruning test at the MAX node (abandon if value of node > value of parent): no, so it is in the interest of the MAX node to see if it can do better; the answer stays no for the following successors as well.
Checkers Alpha-Beta Example: it’s not in MAX’s interest to bother expanding these nodes, as they have the same value as 16 -> 11, so we prune them.
Checkers Alpha-Beta Example [Figure: the remaining steps of the pruned checkers game tree, with MAX and MIN levels.]
Search Limits: search must be cut off because of time or space limitations. Strategies like depth-limited or iterative deepening search can be used, but they don’t take advantage of knowledge about the problem. More refined strategies apply background knowledge: quiescent search cuts off only parts of the search space that don’t exhibit big changes in the evaluation function.
Games and Computers: state of the art for some game programs: Chess, Checkers, Othello, Backgammon, Go.
Chess: Deep Blue, a special-purpose parallel computer, defeated the world champion Garry Kasparov in 1997. The human player didn’t show his best game, and some claim that the circumstances were questionable. Deep Blue used a massive database with games from the literature. Fritz, a program running on an ordinary PC, held the world champion Vladimir Kramnik to a draw in an eight-game match in 2002. Top programs and top human players are roughly equal.
Checkers: Arthur Samuel developed a checkers program in the 1950s that learned its own evaluation function and reached expert level in the 1960s. Chinook became world champion in 1994, when its human opponent, Dr. Marion Tinsley, withdrew for health reasons; Tinsley had been the world champion for 40 years. Chinook uses off-the-shelf hardware, alpha-beta search and an end-game database for six-piece positions.
Othello: Logistello defeated the human world champion in 1997. Many programs play far better than humans. The search space is smaller than in chess, and little evaluation expertise is available.
Backgammon: TD-Gammon, a neural-network-based program, ranks among the best players in the world; it improves its own evaluation function through learning techniques. Search-based methods are practically hopeless because of the chance elements and the branching factor.