
Adversarial Search

Game playing Perfect play The minimax algorithm alpha-beta pruning Resource limitations Elements of chance Imperfect information

Game Playing State-of-the-Art
Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. Checkers is now solved!
Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second and used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.
Othello: Human champions refuse to compete against computers, which are too good.
Go: Human champions are beginning to be challenged by machines, though the best humans still beat the best machines. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves, along with aggressive pruning.
Pacman: unknown

What kind of games?
Abstraction: to describe a game we must capture every relevant aspect of the game (e.g., chess, tic-tac-toe, …).
Accessible environments: such games are characterized by perfect information.
Search: game playing then consists of a search through possible game positions.
Unpredictable opponent: introduces uncertainty, so game playing must deal with contingency problems.

Types of games

Game Playing Many different kinds of games! Axes: Deterministic or stochastic? One, two, or more players? Perfect information (can you see the state)? Turn taking or simultaneous action? Want algorithms for calculating a strategy (policy) which recommends a move in each state

Deterministic Games Deterministic, single player, perfect information: Know the rules Know what actions do Know when you win E.g. Freecell, 8-Puzzle, Rubik’s cube … it’s just search! Slight reinterpretation: Each node stores a value: the best outcome it can reach This is the maximal outcome of its children (the max value) Note that we don’t have path sums as before (utilities at end) After search, can pick move that leads to best node

Deterministic Two-Player
E.g. tic-tac-toe, chess, checkers
Zero-sum games: one player maximizes the result, the other minimizes it.
Minimax search:
A state-space search tree
Players alternate turns
Each layer, or ply, consists of a round of moves
Choose the move to the position with the highest minimax value = best achievable utility against best play

Games vs. search problems
"Unpredictable" opponent → a solution is a strategy specifying a move for every possible opponent reply
Time limits → unlikely to find the goal, must approximate
Plan of attack:
◦ Computer considers possible lines of play (Babbage, 1846)
◦ Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)
◦ Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948; Shannon, 1950)
◦ First chess program (Turing, 1951)
◦ Machine learning to improve evaluation accuracy (Samuel, 1952–57)
◦ Pruning to allow deeper search (McCarthy, 1956)

Searching for the next move
Complexity: many games have a huge search space. Chess: b ≈ 35, m ≈ 100, so the full tree has about 35^100 ≈ 10^154 nodes; even if each node took only about 1 ns to explore, each move would take on the order of 10^135 millennia to calculate.
Resource (e.g., time, memory) limits: the optimal solution is not feasible, so we must approximate:
1. Pruning: makes the search more efficient by discarding portions of the search tree that cannot improve the result.
2. Evaluation functions: heuristics to estimate the utility of a state without exhaustive search.
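A quick back-of-the-envelope check of the chess numbers above (b ≈ 35, m ≈ 100, 1 ns per node; the constants come from the slide, the script is just an illustration):

```python
# Rough size of a full chess game tree: b^m with b = 35, m = 100.
b, m = 35, 100
nodes = b ** m                       # about 2.5 * 10^154 positions

# At one nanosecond per node, how long would exhaustive search take?
seconds = nodes * 1e-9
years = seconds / (3600 * 24 * 365)
millennia = years / 1000

print(f"nodes: {nodes:.2e}")
print(f"time:  {millennia:.2e} millennia")
```

This is why exhaustive minimax is hopeless for chess, and why pruning and evaluation functions are needed.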

Two-player games
A game formulated as a search problem:
◦ Initial state: the board position and whose turn it is
◦ Operators: the definition of legal moves
◦ Terminal states: conditions for when the game is over
◦ Utility function: a numeric value that describes the outcome of the game, e.g., -1, 0, +1 for loss, draw, win (also known as a payoff function)

Example: Tic-Tac-Toe (CS561 lecture, Macskassy, Spring 2011)

The minimax algorithm
Perfect play for deterministic environments with perfect information.
Basic idea: choose the move with the highest minimax value = best achievable payoff against best play.
Algorithm:
1. Generate the game tree completely
2. Determine the utility of each terminal state
3. Propagate the utility values upward in the tree by applying MIN and MAX operators to the nodes in the current level
4. At the root node, use the minimax decision to select the move with the max (of the min) utility value
Steps 2 and 3 assume that the opponent will play perfectly.

Generate Game Tree

[Figure: the first plies of the tic-tac-toe game tree; 1 ply = 1 move]

[Figure: a subtree of the tic-tac-toe game tree, with terminal states labeled win, lose, or draw. What is a good move?]

MiniMax Example
[Figure: a minimax tree with alternating MAX and MIN levels]

MiniMax: Recursive Implementation
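The recursive implementation can be sketched as follows. This is a minimal version over an explicit game tree; the tuple-based tree representation and the `utility`/`successors` parameter names are my own illustration, not from the slides.

```python
def minimax(state, maximizing, utility, successors):
    """Return the minimax value of `state`.

    `utility(state)` gives the payoff of a terminal state (None otherwise);
    `successors(state)` yields the child states.
    """
    u = utility(state)
    if u is not None:                 # terminal state: step 2 of the algorithm
        return u
    values = [minimax(s, not maximizing, utility, successors)
              for s in successors(state)]
    # Step 3: propagate values upward with MAX or MIN at each level.
    return max(values) if maximizing else min(values)

# Tiny example tree: leaves hold utilities, internal nodes are tuples of children.
tree = ((3, 12, 8), (2, 4, 6), (14, 5, 2))
value = minimax(
    tree,
    maximizing=True,
    utility=lambda s: s if isinstance(s, int) else None,
    successors=lambda s: s,
)
print(value)  # 3: the MIN nodes evaluate to 3, 2, 2, and MAX picks the best
```
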

Minimax Properties
Optimal against a perfect player. Otherwise?
Time complexity? O(b^m)
Space complexity? O(bm)
For chess, b ≈ 35, m ≈ 100: an exact solution is completely infeasible. But do we need to explore the whole tree?

Resource Limits
Cannot search to the leaves.
Depth-limited search: instead, search a limited depth of the tree and replace terminal utilities with an eval function for non-terminal positions. The guarantee of optimal play is gone, but more plies make a BIG difference.
Example: suppose we have 100 seconds and can explore 10K nodes/sec, so we can check 1M nodes per move. α-β reaches about depth 8, giving a decent chess program.

Evaluation Functions
A function which scores non-terminal states.
Ideal function: returns the utility of the position.
In practice: typically a weighted linear sum of features, e.g. f1(s) = (number of white queens − number of black queens), etc.
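A weighted linear evaluation as described above might look like the sketch below. The feature set, weights, and state representation are illustrative assumptions, not taken from the slides.

```python
def evaluate(state, features, weights):
    # Eval(s) = w1*f1(s) + w2*f2(s) + ...  (weighted linear sum of features)
    return sum(w * f(state) for w, f in zip(weights, features))

# Hypothetical chess-like state: piece counts per side.
state = {"white_queens": 1, "black_queens": 0,
         "white_pawns": 6, "black_pawns": 8}

features = [
    lambda s: s["white_queens"] - s["black_queens"],  # f1: queen balance
    lambda s: s["white_pawns"] - s["black_pawns"],    # f2: pawn balance
]
weights = [9.0, 1.0]  # classic material values: queen = 9, pawn = 1

print(evaluate(state, features, weights))  # 9*1 + 1*(-2) = 7.0
```
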


Why Pacman starves
He knows his score will go up by eating the dot now.
He knows his score will go up just as much by eating the dot later on.
There are no point-scoring opportunities after eating the dot.
Therefore, waiting seems just as good as eating.

α-β pruning: general principle
Consider a node n somewhere in the tree, with value bounded by v, reachable through an Opponent node. If Player (MAX) has a better choice m, either at the same level or at any choice point higher up, then n will never be reached in actual play, so the tree under n can be pruned. The reasoning is symmetric for MIN.

α-β pruning: example 1
[Figure: step-by-step trace. The root's interval starts at [-∞,+∞] and tightens to [3,+∞] after the first MIN subtree (leaves 3 and 12). The second MIN subtree is pruned after its first leaf, 2, since 2 ≤ α = 3. The third MIN subtree (leaves 14, 5, 2) evaluates to 2, so the move to the first MIN node, worth 3, is selected.]

α-β pruning: example 2
[Figure: step-by-step trace on a second tree (leaf values include 5, 2, 14, 1, 8, 12, 3). The first MIN subtree yields 2, giving the root the bound [2,+∞]; subsequent subtrees tighten or are cut off according to the α and β bounds, the root value settles at 3 or better, and the corresponding move is selected.]

α-β pruning: example 3
[Figure: step-by-step trace on a third tree. One branch is already known to be worth at least 6 ([6,∞]); the other MIN branch, after leaves 14, 5, 9, 1, and 4, is bounded above by 5. Since 5 < 6, that branch cannot beat the first one, and the move worth at least 6 is selected.]

α-β pruning: example 4
[Figure: step-by-step trace on a fourth tree, similar to example 3 but with leaves 14, 7, and 9 on the second branch, which is bounded by [-∞,7]. Pruning leaves that branch's exact value uncertain, and the figure shows which move is selected given the remaining bounds.]


The α – β algorithm
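The algorithm can be sketched as follows, again over an explicit tuple-based tree used purely for illustration:

```python
import math

def alphabeta(state, alpha, beta, maximizing):
    """α-β search: leaves are ints; internal nodes are tuples of children."""
    if isinstance(state, int):
        return state
    if maximizing:
        value = -math.inf
        for child in state:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:   # β cutoff: MIN will never allow this line
                break
        return value
    else:
        value = math.inf
        for child in state:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:   # α cutoff: MAX already has a better option
                break
        return value

tree = ((3, 12, 8), (2, 4, 6), (14, 5, 2))
print(alphabeta(tree, -math.inf, math.inf, True))  # 3, same as plain minimax
```

On this tree the second MIN node is cut off after its first leaf (2 ≤ α = 3), exactly the pruning shown in example 1 above; the returned value is identical to full minimax.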

Properties of the α-β algorithm
Pruning does not affect the final result. Good move ordering improves its effectiveness: with perfect ordering, time complexity is O(b^(m/2)), which doubles the depth that can be searched in the same time.

Resource limits
Standard approach:
◦ Use CUTOFF-TEST instead of TERMINAL-TEST, e.g., a depth limit (perhaps with quiescence search added)
◦ Use EVAL instead of UTILITY, i.e., an evaluation function that estimates the desirability of a position
Suppose we have 100 seconds and can explore 10^4 nodes/second → 10^6 nodes per move ≈ 35^(8/2), so α-β reaches depth 8: a pretty good chess program.


Digression: Exact values don't matter Behavior is preserved under any monotonic transformation of Eval Only the order matters: payoff in deterministic games acts as an ordinal utility function
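A quick check of this claim on a small illustrative tree (the tree and the cubing transform are my own example): applying a monotonic, order-preserving transform to every leaf leaves the minimax decision unchanged.

```python
def minimax(state, maximizing=True):
    # Leaves are numbers; internal nodes are tuples of children.
    if isinstance(state, (int, float)):
        return state
    children = [minimax(c, not maximizing) for c in state]
    return max(children) if maximizing else min(children)

def best_move(tree):
    # Index of the child MAX should move to.
    vals = [minimax(c, maximizing=False) for c in tree]
    return vals.index(max(vals))

def transform(state, f):
    # Apply f to every leaf, preserving the tree structure.
    if isinstance(state, (int, float)):
        return f(state)
    return tuple(transform(c, f) for c in state)

tree = ((3, 12, 8), (2, 4, 6), (14, 5, 2))

# x -> x**3 is monotonic, so only the ORDER of payoffs survives, and that
# is all minimax uses: the chosen move is the same.
assert best_move(tree) == best_move(transform(tree, lambda x: x ** 3))
print("decision preserved under monotonic transform")
```
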

Stochastic Games
Dice rolls increase b: 21 possible rolls with 2 dice.
◦ Backgammon has ≈ 20 legal moves, so depth 4 = 20 × (21 × 20)^3 ≈ 1.5 × 10^9 nodes.
As depth increases, the probability of reaching a given node shrinks:
◦ so the value of lookahead is diminished
◦ so limiting depth is less damaging
◦ but α-β pruning is much less effective
TD-Gammon uses depth-2 search + a very good EVAL function + reinforcement learning:
◦ world-champion-level play

Nondeterministic games in general In nondeterministic games, chance introduced by dice, card-shuffling Simplified example with coin-flipping:

Algorithm for nondeterministic games Expectiminimax gives perfect play Just like Minimax, except we must also handle chance nodes: … if state is a Max node then return the highest ExpectiMinimax-Value of Successors(state) if state is a Min node then return the lowest ExpectiMinimax-Value of Successors(state) if state is a chance node then return average of ExpectiMinimax-Value of Successors(state) …
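The three cases above can be sketched directly. The tagged-tuple node representation and the coin-flip example game are my own illustration:

```python
def expectiminimax(node):
    """Nodes are (kind, children) pairs with kind in {'max', 'min', 'chance'}.
    A chance node's children are (probability, subtree) pairs; leaves are numbers."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # Chance node: probability-weighted average of the successor values.
    return sum(p * expectiminimax(c) for p, c in children)

# Coin-flip example: MAX picks between two gambles.
game = ("max", [
    ("chance", [(0.5, 2), (0.5, 4)]),   # expected value 3.0
    ("chance", [(0.5, 0), (0.5, 5)]),   # expected value 2.5
])
print(expectiminimax(game))  # 3.0: MAX takes the first gamble
```
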

Expectiminimax
[Figure: an expectiminimax tree, with chance nodes between the MAX and MIN levels]

Digression: Exact values DO matter Behavior is preserved only by positive linear transformation of Eval Hence Eval should be proportional to the expected payoff

Games of imperfect information E.g., card games, where opponent's initial cards are unknown Typically we can calculate a probability for each possible deal Seems just like having one big dice roll at the beginning of the game Idea: compute the minimax value of each action in each deal, then choose the action with highest expected value over all deals Special case: if an action is optimal for all deals, it's optimal. GIB, current best bridge program, approximates this idea by 1) generating 100 deals consistent with bidding information 2) picking the action that wins most tricks on average

Example Four-card bridge/whist/hearts hand, Max to play first

Commonsense example

Proper analysis
The intuition that the value of an action is the average of its values in all actual states is WRONG. With partial observability, the value of an action depends on the information state or belief state the agent is in. We can generate and search a tree of information states. This leads to rational behaviors such as:
◦ Acting to obtain information
◦ Signalling to one's partner
◦ Acting randomly to minimize information disclosure