Slide 1: Lecture 6: Adversarial Search & Games
Rutgers CS440, Fall 2003
Reading: Ch. 6, AIMA
Slide 2: Adversarial search
So far: single-agent search, with no opponents or collaborators.
Multi-agent search:
– Playing a game against an opponent: adversarial search
– Economies: even more complex, with societies of cooperative and non-cooperative agents
Game playing and AI:
– Games can be complex and (arguably) require human intelligence
– Players must act in "real time"
– Well-defined problems
– Limited scope
Slide 3: Games and AI

                  | Deterministic                 | Chance
Perfect info      | Checkers, Chess, Go, Othello  | Backgammon, Monopoly
Imperfect info    |                               | Bridge, Poker, Scrabble
Slide 4: Games and search
Traditional search: a single agent searches for its own well-being, unobstructed.
Games: search against an opponent.
Consider a two-player board game:
– e.g., chess, checkers, tic-tac-toe
– board configuration: a unique arrangement of "pieces"
Representing a board game as a search problem (a sketch follows below):
– states: board configurations
– operators: legal moves
– initial state: current board configuration
– goal state: winning/terminal board configuration
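As a concrete illustration (not from the original slides), here is a minimal Python sketch of this representation for tic-tac-toe; the names and encoding are assumptions made for the example:

# State: a 9-character string ('.' = empty); operators: placing a mark.
WIN_LINES = [(0,1,2), (3,4,5), (6,7,8),      # rows
             (0,3,6), (1,4,7), (2,5,8),      # columns
             (0,4,8), (2,4,6)]               # diagonals

def legal_moves(board):
    """Operators: indices of empty squares."""
    return [i for i, c in enumerate(board) if c == '.']

def apply_move(board, i, player):
    """Successor state: place player's mark ('X' or 'O') at square i."""
    return board[:i] + player + board[i+1:]

def winner(board):
    """Terminal test: return 'X', 'O', or None."""
    for a, b, c in WIN_LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

initial_state = '.' * 9   # empty board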
Slide 5: Wrong representation
We want to optimize our agent's goal, so we might build a search tree over only our own possible moves/actions.
Problem: this discounts the opponent.
[Figure: a tic-tac-toe tree that expands only X's (the agent's) moves, ignoring O's replies.]
Slide 6: Better representation: game search tree
Include the opponent's actions as well. Levels alternate: agent move, opponent move, agent move; an agent move plus the opponent's reply makes one full move. Utilities are assigned to the goal (terminal) nodes.
[Figure: a tic-tac-toe game tree whose levels alternate between the agent's (X) and the opponent's (O) moves, with utilities at the leaves.]
Slide 7: Game search trees
What is the size of a game search tree?
– O(b^d) for branching factor b and depth d
– Tic-tac-toe: 9! leaves (max depth = 9)
– Chess: ~35 legal moves per position, average game "depth" ~100, so b^d ~ 35^100 ~ 10^154 states; "only" ~10^40 of them are legal
Too deep for exhaustive search!
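As a quick sanity check on these magnitudes (an illustration, not part of the slides), the exponent can be recomputed directly:

import math

b, d = 35, 100                       # chess: branching factor, typical depth
digits = d * math.log10(b)           # log10(35^100)
print(f"35^100 ~ 10^{digits:.0f}")   # -> 35^100 ~ 10^154
print(math.factorial(9))             # tic-tac-toe leaves: 362880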
Slide 8: Utilities in search trees
Assign a utility to each (terminal) state, describing how much that state is worth to the agent:
– high utility: good for the agent
– low utility: good for the opponent
[Figure: a game tree with the computer's possible moves at root A, the opponent's possible moves below, and terminal states carrying board evaluations from the agent's perspective (e.g. F = -7, G = -5, H = 3, I = 9, J = -6, K = 0, L = 2, M = 1, N = 3, O = 2).]
Slide 9: Search strategy
Worst-case scenario: assume the opponent will always make its best move (i.e., the worst move for us).
Minimax search: maximize the utility for our agent while expecting the opponent to play its best moves:
1. High utility favors the agent => choose the move with maximal utility.
2. Low utility favors the opponent => assume the opponent makes the move with the lowest utility.
[Figure: the tree from the previous slide with minimax values propagated up: B = -7, C = -6, D = 0, E = 1, so the computer's best move from root A is worth 1.]
Slide 10: Minimax algorithm
1. Start with the utilities of the terminal nodes.
2. Propagate them back to the root node by choosing the minimax strategy: take the min at the opponent's levels and the max at the agent's levels.
[Figure: three snapshots of the same tree. Terminal utilities F = -7, G = -5, H = 3, I = 9, J = -6, K = 0, L = 2, M = 1, N = 3, O = 2; the min step gives B = -7, C = -6, D = 0, E = 1; the max step gives root A = 1.]
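Below is a minimal recursive sketch of this backup in Python (illustrative; the slides contain no code). It reuses the tic-tac-toe helpers from slide 4, scoring a win for X as +1, a win for O as -1, and a draw as 0:

def minimax(board, player):
    """Minimax value of `board`, where 'X' maximizes and 'O' minimizes.
    Terminal utilities: +1 if X wins, -1 if O wins, 0 for a draw."""
    w = winner(board)
    if w is not None:
        return 1 if w == 'X' else -1
    moves = legal_moves(board)
    if not moves:                  # board full: draw
        return 0
    nxt = 'O' if player == 'X' else 'X'
    values = [minimax(apply_move(board, m, player), nxt) for m in moves]
    # Step 2: max at the agent's level, min at the opponent's level.
    return max(values) if player == 'X' else min(values)

# With perfect play from the empty board, tic-tac-toe is a draw:
# print(minimax(initial_state, 'X'))   # -> 0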
Slide 11: Complexity of minimax algorithm
Utilities propagate up in a recursive fashion: the algorithm is a depth-first search (DFS) of the game tree.
Space complexity: O(bd)
Time complexity: O(b^d)
Problem: the time complexity. It's a game, and there is only finite time to make a move.
Slide 12: Reducing complexity of minimax (1)
Don't search to the full depth d: terminate early and prune bad paths.
Problem: we don't have utilities for non-terminal nodes.
Estimate the utility of non-terminal nodes instead:
– a static board evaluation function (SBE) is a heuristic that assigns a utility to a non-terminal node
– it reflects the computer's chances of winning from that node
– it must be easy to calculate from the board configuration
For example, in chess:
SBE = α * materialBalance + β * centerControl + γ * ...
material balance = value of white pieces - value of black pieces, with pawn = 1, rook = 5, queen = 9, etc.
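A toy evaluator in this weighted-sum form, as a Python sketch (the weights, the knight/bishop values, and the centerControl feature are illustrative assumptions, not values from the lecture):

# Piece values for the material-balance term: pawn = 1, rook = 5, queen = 9;
# knight and bishop = 3 are a common convention, added here as an assumption.
PIECE_VALUE = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

def material_balance(pieces):
    """pieces: iterable of (letter, color) pairs, e.g. ('Q', 'white')."""
    total = 0
    for letter, color in pieces:
        v = PIECE_VALUE.get(letter, 0)           # king carries no material value
        total += v if color == 'white' else -v
    return total

def sbe(pieces, center_control, alpha=1.0, beta=0.5):
    """SBE = alpha * materialBalance + beta * centerControl + ...
    (alpha and beta are made-up weights for illustration)."""
    return alpha * material_balance(pieces) + beta * center_control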
Slide 13: Minimax with Evaluation Functions
Same as general minimax, except that it:
– only searches to depth m
– estimates the values at depth m using the SBE function
How would this algorithm perform at chess?
– looking ahead ~4 pairs of moves (i.e., 8 ply), it would be consistently beaten by average players
– looking ahead ~8 pairs, as a typical PC can, it is as good as a human master
Slide 14: Reducing complexity of minimax (2)
Some branches of the tree will not be taken if the opponent plays cleverly. Can we detect them ahead of time and prune off paths that do not need to be explored? This is alpha-beta pruning (a code sketch follows below).
While doing DFS of the game tree, keep track of:
– at maximizing levels: alpha, the highest value seen so far, a lower bound on the node's evaluation/score
– at minimizing levels: beta, the lowest value seen so far, an upper bound on the node's evaluation/score
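A depth-limited alpha-beta sketch in Python (illustrative; it reuses the tic-tac-toe helpers from slide 4, and in a real game the depth-limit branch would call the SBE instead of returning 0):

def alphabeta(board, player, depth,
              alpha=float('-inf'), beta=float('inf')):
    """Alpha-beta value of `board`, where 'X' maximizes and 'O' minimizes.
    alpha: highest value seen so far at max levels (lower bound).
    beta:  lowest value seen so far at min levels (upper bound)."""
    w = winner(board)
    if w is not None:
        return 1 if w == 'X' else -1
    moves = legal_moves(board)
    if not moves or depth == 0:
        return 0                       # draw, or depth limit (SBE would go here)
    nxt = 'O' if player == 'X' else 'X'
    if player == 'X':                  # maximizing level: raise alpha
        best = float('-inf')
        for m in moves:
            best = max(best, alphabeta(apply_move(board, m, 'X'), nxt,
                                       depth - 1, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:          # cut-off: min player will avoid this node
                break
        return best
    else:                              # minimizing level: lower beta
        best = float('inf')
        for m in moves:
            best = min(best, alphabeta(apply_move(board, m, 'O'), nxt,
                                       depth - 1, alpha, beta))
            beta = min(beta, best)
            if alpha >= beta:          # cut-off: max player will avoid this node
                break
        return best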
Slide 15: Alpha-Beta Example
Call minimax(A,0,4): the root A is a max node; initialize A's alpha. Call stack: A.
[Figure: the example game tree used through slide 26. Max and min levels alternate below root A; terminal leaves seen in later steps include N = 4, W = -3, X = -5, G = -5, D = 0, H = 3, I = 8, L = 2.]
Slide 16: Alpha-Beta Example
Call minimax(B,1,4): B is a min node; initialize B's beta. Call stack: A B.
[Figure: same tree.]
Slide 17: Alpha-Beta Example
Call minimax(F,2,4): F is a max node; initialize F's alpha. Call stack: A B F.
[Figure: same tree.]
Slide 18: Alpha-Beta Example
Call minimax(N,3,4): N = 4 is a terminal state. Call stack: A B F N.
[Figure: same tree; blue marks terminal states.]
Slide 19: Alpha-Beta Example
minimax(F,2,4): N's value is returned to F, so F's alpha = 4, the maximum seen so far. Call stack: A B F.
[Figure: same tree with F's alpha = 4.]
Slide 20: Alpha-Beta Example
Call minimax(O,3,4): O is a min node; initialize O's beta. Call stack: A B F O.
[Figure: same tree.]
Slide 21: Alpha-Beta Example
Call minimax(W,4,4): W = -3 is terminal (the depth limit is reached). Call stack: A B F O W.
[Figure: same tree.]
Slide 22: Alpha-Beta Example
minimax(O,3,4): W's value is returned to O, so O's beta = -3, the minimum seen so far. Call stack: A B F O.
[Figure: same tree with O's beta = -3.]
Slide 23: Alpha-Beta Example
O's beta (-3) <= F's alpha (4): stop expanding O (alpha cut-off); the remaining child X = -5 is never examined. Call stack: A B F O.
[Figure: same tree with the branch below O pruned.]
Slide 24: Alpha-Beta Example
Why? A smart opponent at O will choose W or worse, so O's upper bound is -3. The computer therefore shouldn't choose O (worth at most -3), since N (worth 4) is better. Call stack: A B F O.
Slide 25: Alpha-Beta Example
minimax(F,2,4): O's value is returned to F, but F's alpha is not changed (F is maximizing, and -3 < 4). Call stack: A B F.
Slide 26: Alpha-Beta Example
minimax(B,1,4): F's value 4 is returned to B, so B's beta = 4, the minimum seen so far. Call stack: A B.
Slide 27: Effectiveness of Alpha-Beta Search
Effectiveness depends on the order in which successors are examined; the search is more effective if the best successors are examined first.
Worst case:
– successors are ordered so that no pruning takes place
– no improvement over exhaustive search
Best case:
– each player's best move is evaluated first (left-most)
In practice, performance is closer to the best case than to the worst case.
Slide 28: Effectiveness of Alpha-Beta Search
In practice we often get O(b^(d/2)) rather than O(b^d):
– the same as having a branching factor of √b, since (√b)^d = b^(d/2)
For example, chess:
– goes from b ~ 35 to b ~ 6
– permits much deeper search in the same time
– makes computer chess competitive with humans
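Move ordering is what pushes real programs toward the best case. A small Python sketch of the idea (illustrative; the cheap one-ply score used here is an assumption, not something the slides prescribe):

def ordered_moves(board, player):
    """Examine the most promising successors first: sort moves by a cheap
    one-ply score (+1 if the move wins immediately, 0 otherwise)."""
    def quick_score(m):
        return 1 if winner(apply_move(board, m, player)) == player else 0
    return sorted(legal_moves(board), key=quick_score, reverse=True)

In a chess program the quick score would more typically come from the SBE, or from trying captures first.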
Slide 29: Dealing with Limited Time
In real games there is usually a time limit T on making a move. How do we take this into account?
– we cannot stop alpha-beta midway and expect to use its results with any confidence
– so we could set a conservative depth limit that guarantees we will find a move in time < T
– but then the search may finish early, and the opportunity to search deeper is wasted
Slide 30: Dealing with Limited Time
In practice, iterative deepening search (IDS) is used (a sketch follows below):
– run alpha-beta search with an increasing depth limit
– when the clock runs out, use the solution found by the last completed alpha-beta search (i.e., the deepest search that was completed)
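An illustrative Python driver for this loop, layered on the alpha-beta sketch from slide 14 (the deadline handling and per-depth scoring are assumptions made for the example):

import time

def ids_best_move(board, player, time_limit):
    """Iterative deepening: search depth 1, 2, 3, ... and keep the move
    chosen by the last depth that completed before the deadline."""
    deadline = time.monotonic() + time_limit
    nxt = 'O' if player == 'X' else 'X'
    best_move = None
    for depth in range(1, len(legal_moves(board)) + 1):
        scored = []
        for m in legal_moves(board):
            if time.monotonic() >= deadline:
                return best_move        # clock ran out: use last completed depth
            scored.append((alphabeta(apply_move(board, m, player),
                                     nxt, depth - 1), m))
        best_move = max(scored)[1] if player == 'X' else min(scored)[1]
    return best_move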
Slide 31: The Horizon Effect
Sometimes disaster lurks just beyond the search depth:
– the computer captures a queen, but a few moves later the opponent checkmates (i.e., wins)
The computer has a limited horizon: it cannot see that this significant event could happen.
How do you avoid catastrophic losses due to such "short-sightedness"?
– quiescence search
– secondary search
Slide 32: The Horizon Effect
Quiescence search (sketched below):
– when the evaluation is changing frequently, look deeper than the depth limit
– look for a point where the game "quiets down"
Secondary search:
1. find the best move looking to depth d
2. look k steps beyond to verify that it still looks good
3. if it doesn't, repeat step 2 for the next best move
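A Python sketch of the quiescence idea (illustrative; is_noisy and evaluate are hypothetical stand-ins for game-specific tests such as pending captures and the SBE):

def quiescence_value(board, player, depth, is_noisy, evaluate):
    """Past the depth limit, keep searching while the position is 'noisy'
    (evaluation still changing); evaluate only once the game quiets down."""
    w = winner(board)
    if w is not None:
        return 1 if w == 'X' else -1
    moves = legal_moves(board)
    if not moves:
        return 0
    if depth <= 0 and not is_noisy(board):
        return evaluate(board)          # quiet position: trust the SBE
    nxt = 'O' if player == 'X' else 'X'
    values = [quiescence_value(apply_move(board, m, player), nxt,
                               depth - 1, is_noisy, evaluate)
              for m in moves]
    return max(values) if player == 'X' else min(values)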
Slide 33: Book Moves
Build a database of opening moves, end games, and studied configurations.
If the current state is in the database, use the database:
– to determine the next move
– to evaluate the board
Otherwise, do alpha-beta search.
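The lookup-then-search logic as a minimal Python sketch, building on the IDS driver above (the book contents are placeholders, not real opening theory):

# Hypothetical opening book: position -> known best move.
OPENING_BOOK = {
    '.' * 9: 4,      # e.g., open in the center square of tic-tac-toe
}

def choose_move(board, player, time_limit=1.0):
    """Use the book when the position is known; otherwise search."""
    if board in OPENING_BOOK:
        return OPENING_BOOK[board]
    return ids_best_move(board, player, time_limit)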
Slide 34: Examples of Algorithms which Learn to Play Well
Checkers: A. L. Samuel, "Some Studies in Machine Learning Using the Game of Checkers," IBM Journal of Research and Development, 3(3):210-229, 1959.
– Learned by playing a copy of itself thousands of times.
– Used only an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz.
– Successful enough to compete well at human tournaments.
Slide 35: Examples of Algorithms which Learn to Play Well
Backgammon: G. Tesauro and T. J. Sejnowski, "A Parallel Network that Learns to Play Backgammon," Artificial Intelligence, 39(3):357-390, 1989.
– Also learns by playing copies of itself.
– Uses a non-linear evaluation function: a neural network.
– Rated one of the top three players in the world.
Slide 36: Non-deterministic Games
Some games involve chance, for example:
– roll of dice
– spin of a game wheel
– deal of cards from a shuffled deck
How can we handle games with random elements? The game tree representation is extended to include chance nodes:
1. agent moves
2. chance nodes
3. opponent moves
Slide 37: Non-deterministic Games
The game tree representation is extended with chance nodes between the max and min levels.
[Figure: max node A at the root; two 50/50 chance nodes below it; below those, min nodes B (children 7, 2; beta = 2), C (children 9, 6; beta = 6), D (children 5, 0; beta = 0), and E (children 8, -4; beta = -4).]
Slide 38: Non-deterministic Games
Weight the score of each outcome by the probability that it occurs, and use the expected value of each move: the probability-weighted sum over the possible random outcomes.
[Figure: the same tree; the 50/50 chance node over B = 2 and C = 6 has expected value 0.5 * 2 + 0.5 * 6 = 4, and the one over D = 0 and E = -4 has 0.5 * 0 + 0.5 * (-4) = -2.]
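This backup (often called expectiminimax) is easy to state in code. A Python sketch hard-coding the small tree from the figure (the leaf values come from the slides; the tuple representation is an illustrative choice):

def expecti_value(node):
    """Back up values through max, min, and chance nodes.
    A node is a number (terminal), ('max', [children]), ('min', [children]),
    or ('chance', [(probability, child), ...])."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expecti_value(c) for c in children)
    if kind == 'min':
        return min(expecti_value(c) for c in children)
    return sum(p * expecti_value(c) for p, c in children)   # chance node

# The tree from the figure: root A is a max over two 50/50 chance nodes.
tree = ('max', [
    ('chance', [(0.5, ('min', [7, 2])),       # B -> 2
                (0.5, ('min', [9, 6]))]),     # C -> 6; expected value 4
    ('chance', [(0.5, ('min', [5, 0])),       # D -> 0
                (0.5, ('min', [8, -4]))]),    # E -> -4; expected value -2
])
print(expecti_value(tree))                    # -> 4.0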
Slide 39: Non-deterministic Games
Choose the move with the highest expected value.
[Figure: the root A takes the max of the two chance-node values, 4 and -2, so A = 4.]
Slide 40: Non-deterministic Games
Non-determinism increases the branching factor:
– 21 distinct rolls with 2 dice
The value of lookahead diminishes: as depth increases, the probability of reaching a given node decreases, and alpha-beta pruning is less effective.
TD-Gammon:
– depth-2 search
– very good heuristic
– plays at world-champion level
Slide 41: Computers can play Grandmaster Chess
"Deep Blue" (IBM):
– parallel processor with 32 nodes
– each node has 8 dedicated VLSI "chess chips"
– can search 200 million configurations per second
– uses minimax, alpha-beta, and sophisticated heuristics
– can currently search to 14 ply (i.e., 7 pairs of moves)
– can avoid the horizon effect by searching as deep as 40 ply
– uses book moves
Slide 42: Computers can play Grandmaster Chess
Kasparov vs. Deep Blue, May 1997:
– a 6-game full-regulation chess match sponsored by ACM
– Kasparov lost the match 2.5 to 3.5: one win and three draws against Deep Blue's two wins and three draws
– this was a historic achievement for computer chess: the first time a computer became the best chess player on the planet
Note that Deep Blue plays by "brute force" (i.e., raw power from computer speed and memory); it uses relatively little that resembles human intuition and cleverness.
Slide 43: Status of Computers in Other Deterministic Games
Checkers/draughts:
– the current world champion is Chinook
– can beat any human (beat Tinsley in 1994)
– uses alpha-beta search and book moves (an endgame database of over 443 billion positions)
Othello:
– computers can easily beat the world experts
Go:
– branching factor b ~ 360 (very large!)
– $2 million prize for any system that can beat a world expert