ICS-171:Notes 6: 1  Notes 6: Two-Player Games and Search
ICS 171, Winter 2001

ICS-171:Notes 6: 2  Outline
- Computer programs which play 2-player games
  – game-playing as search
  – with the complication of an opponent
- General principles of game-playing and search
  – evaluation functions, the minimax principle
  – alpha-beta pruning, heuristic techniques
- Status of game-playing systems
  – in chess, checkers, backgammon, Othello, etc., computers routinely defeat leading world players
- Applications?
  – think of "nature" as an opponent: economics, medicine, etc.
- Reading: Nilsson Ch. II-12; R.&N. Ch. 5

ICS-171:Notes 6: 3  Chess Rating Scale
[Figure: chess rating scale comparing Garry Kasparov (then World Champion), Deep Blue, and Deep Thought.]

ICS-171:Notes 6: 4  Game-Playing and AI
Game-playing is a good problem for AI research:
- all the information is available
  – i.e., human and computer have equal information
- game-playing is non-trivial
  – need to display "human-like" intelligence
  – some games (such as chess) are very complex
  – requires decision-making within a time limit
  – more realistic than other search problems
- games are played in a controlled environment
  – can do experiments, repeat games, etc.: good for evaluating research systems
- can compare humans and computers directly
  – can evaluate percentage of wins/losses to quantify performance

ICS-171:Notes 6: 5  Search and Game Playing
Consider a board game:
- e.g., chess, checkers, tic-tac-toe
- a configuration of the board = a unique arrangement of "pieces"
- each possible configuration = a state in the search space
Statement of a game as a search problem:
- States = board configurations
- Operators = legal moves
- Initial State = current configuration
- Terminal State (Goal) = winning configuration
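
To make the formulation concrete, here is a minimal sketch of tic-tac-toe cast as a search problem. All names (legal_moves, apply_move, winner, is_terminal) are illustrative helpers invented for this sketch, not part of the course materials:

```python
# Tic-tac-toe as a search problem (illustrative sketch).
# A state is a tuple of 9 cells: 'X', 'O', or None.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

INITIAL_STATE = (None,) * 9

def legal_moves(state):
    """Operators: indices of the empty squares."""
    return [i for i, cell in enumerate(state) if cell is None]

def apply_move(state, move, player):
    """Transition function: place player's mark at the given square."""
    board = list(state)
    board[move] = player
    return tuple(board)

def winner(state):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if state[a] is not None and state[a] == state[b] == state[c]:
            return state[a]
    return None

def is_terminal(state):
    """Terminal test: a win for either player, or a full board (draw)."""
    return winner(state) is not None or all(cell is not None for cell in state)
```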

ICS-171:Notes 6: 6  Game Tree Representation
New aspect to the search problem:
- there's an opponent we cannot control
- how can we handle this?
[Figure: a game tree from start state S, alternating computer-move and opponent-move levels, with a possible goal state G (a winning situation for the computer) lower in the tree.]

ICS-171:Notes 6: 7  Complexity of Game Playing
Imagine we could predict the opponent's moves given each computer move. How complex would search be in this case?
- worst case, it will be O(b^d)
- Chess:
  – b ~ 35 (average branching factor)
  – d ~ 100 (depth of game tree for a typical game)
  – b^d ~ 35^100 ~ 10^154 nodes!! ("only" about 10^40 legal states)
- Tic-Tac-Toe:
  – ~5 legal moves per turn, total of 9 moves
  – 5^9 = 1,953,125
  – 9! = 362,880 (computer goes first)
  – 8! = 40,320 (computer goes second)
- well-known games can produce enormous search trees
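
These counts are easy to verify; a quick check (the chess figures are rough order-of-magnitude estimates, not exact counts):

```python
import math

# Chess: branching factor ~35, game length ~100 plies.
chess_tree = 35 ** 100
print(f"chess game tree: ~10^{len(str(chess_tree)) - 1} nodes")  # ~10^154

# Tic-tac-toe: crude upper bounds on the tree size.
print(5 ** 9)             # 1,953,125 (~5 legal moves per ply, 9 plies)
print(math.factorial(9))  # 362,880 move sequences if the computer goes first
print(math.factorial(8))  # 40,320 if the computer goes second
```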

ICS-171:Notes 6: 8  Utility Functions
Utility Function:
- defined for each terminal state in a game
- assigns a numeric value to each terminal state
- these numbers represent how "valuable" the state is for the computer
  – positive for winning
  – negative for losing
  – zero for a draw
- typical values run from -infinity (lost) to +infinity (won), or [-1, +1]

ICS-171:Notes 6: 9  Greedy Search with Utilities
A greedy search strategy using utility functions:
- expand the search tree as far as the terminal states on each branch
- compute the utility of each resulting board configuration
- make the initial move that leads to the board configuration with the maximum value
- but this ignores what the opponent might do!
  – i.e., the opponent's moves are interleaved with the computer's

ICS-171:Notes 6: 10  Minimax Principle
"Assume the worst":
- say we are two moves (plies) away in the tree from the terminal states
- high numbers favor the computer
  – so we want to choose moves which maximize utility
- low numbers favor the opponent
  – so the opponent will choose moves which minimize utility
Minimax Principle:
- you (the computer) assume that the opponent will choose the minimizing move next (after your move)
- so you now choose the best move under this assumption
  – i.e., the maximum (highest-value) option considering both your move and the opponent's optimal move
- we can generalize this argument to more than 2 moves ahead: we can search ahead as far as we can afford

ICS-171:Notes 6: 11  Minimax Search Example
- explore the tree as far as the terminal states
- calculate utilities for the resulting board configurations
- the computer makes the move such that when the opponent makes his best move, the board configuration is in the best possible position for the computer

ICS-171:Notes 6: 12  Propagating Minimax Values up the Game Tree
Starting from the leaves, assign a value to each parent node as follows:
- children are the opponent's moves: minimum of all immediate children
- children are the computer's moves: maximum of all immediate children

ICS-171:Notes 6: 13  Deeper Game Trees
Minimax can be generalized to game trees deeper than just 2 levels: "propagate" utility values upwards in the tree.

ICS-171:Notes 6: 14  General Minimax Algorithm on a Game Tree
For each move by the computer:
1. perform depth-first search as far as the terminal states
2. assign utilities at each terminal state
3. propagate the minimax choices upwards
   – if the parent is a minimizer (opponent), propagate up the minimum value of the children
   – if the parent is a maximizer (computer), propagate up the maximum value of the children
4. choose the move (the child of the current node) corresponding to the maximum of the minimax values of the children
Note:
- minimax values are gradually propagated upwards as the depth-first search proceeds, i.e., minimax values propagate up the tree in a "left-to-right" fashion
- minimax values for each sub-tree are propagated upwards "as we go", so only O(bd) nodes need to be kept in memory at any time
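
A minimal Python version of this algorithm, written against the hypothetical tic-tac-toe helpers sketched earlier (legal_moves, apply_move, winner, is_terminal), with the computer playing 'X':

```python
def utility(state):
    """Utility for the computer ('X'): +1 win, -1 loss, 0 draw."""
    w = winner(state)
    return 0 if w is None else (1 if w == 'X' else -1)

def minimax_value(state, player):
    """Depth-first minimax: the value of `state` with `player` to move."""
    if is_terminal(state):
        return utility(state)
    other = 'O' if player == 'X' else 'X'
    values = [minimax_value(apply_move(state, m, player), other)
              for m in legal_moves(state)]
    # The computer ('X') maximizes; the opponent ('O') minimizes.
    return max(values) if player == 'X' else min(values)

def best_move(state, player='X'):
    """Step 4: pick the child with the maximum minimax value."""
    other = 'O' if player == 'X' else 'X'
    return max(legal_moves(state),
               key=lambda m: minimax_value(apply_move(state, m, player), other))
```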

ICS-171:Notes 6: 15  Complexity of the Minimax Algorithm
Assume that all terminal states are at depth d.
- Space complexity: depth-first search, so O(bd)
- Time complexity: branching factor b, thus O(b^d)
The O(b^d) time complexity is a major problem, since the computer typically has only a finite amount of time to make a move
- e.g., in chess: a 2-minute time limit, but b ~ 35, d ~ 50!
So the direct minimax algorithm is impractical:
- instead, what about depth-limited search to depth m?
- but utilities are defined only at terminal states
- => we need to know the "utility" of non-terminal states
  – these estimates are called heuristic/static evaluation functions

ICS-171:Notes 6: 16  Idea of Static Evaluation Functions
An evaluation function:
- provides an estimate of how good the current board configuration is for the computer
  – think of it as the expected utility averaged over all possible game outcomes from that position
  – it reflects the computer's chances of winning from that node
- but it must be easy to calculate, i.e., it cannot involve searching the tree below the depth-limit node
- thus, we use various easily calculated heuristics, e.g., in chess: value of all white pieces - value of all black pieces
  – typically, one figures how good the position is for the computer and how good it is for the opponent, and subtracts the opponent's score from the computer's
- it must agree with the utility function when calculated at terminal nodes
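
As a concrete sketch of the material heuristic mentioned above, using the conventional piece values and an invented board encoding:

```python
# Conventional material values; kings are excluded (both are always present).
PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

def material_eval(board):
    """board: iterable of piece codes, e.g. 'wP' (white pawn), 'bQ' (black queen).
    Returns (white material) - (black material), from white's point of view."""
    score = 0
    for piece in board:
        value = PIECE_VALUES.get(piece[1], 0)
        score += value if piece[0] == 'w' else -value
    return score

# Example: white is up a knight.
print(material_eval(['wP', 'wP', 'wN', 'bP', 'bP']))  # 3
```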

ICS-171:Notes 6: 17  Minimax with Evaluation Functions
The same as general minimax, except that it:
- only goes to depth m
- uses evaluation functions (estimates of utility) at the depth limit
How would this algorithm perform at chess?
- it could look ahead only about 4 plies (2 moves by each player)
- it would be consistently beaten even by average chess players!
For each move by the computer:
1. perform depth-first search with depth limit m
2. assign an evaluation-function value to each state at depth m
3. propagate the minimax choices upwards
   – if the parent is a minimizer (opponent), propagate up the minimum value of the children
   – if the parent is a maximizer (computer), propagate up the maximum value of the children
4. choose the move (the child of the current node) corresponding to the maximum of the minimax values of the children
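
Only the recursion changes relative to plain minimax: cut off at depth m and substitute the evaluation function for the true utility. A sketch, reusing the hypothetical helpers from the earlier slides (`evaluate` can be any static evaluation function, such as the material counter above):

```python
def minimax_value_limited(state, player, depth, evaluate):
    """Minimax with a depth limit: at the cutoff, use `evaluate` as a
    stand-in utility instead of searching to the terminal states."""
    if is_terminal(state):
        return utility(state)
    if depth == 0:
        return evaluate(state)  # heuristic estimate, not a true utility
    other = 'O' if player == 'X' else 'X'
    values = [minimax_value_limited(apply_move(state, m, player),
                                    other, depth - 1, evaluate)
              for m in legal_moves(state)]
    return max(values) if player == 'X' else min(values)
```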

ICS-171:Notes 6: 18  The Alpha-Beta Principle: Marry Search-Tree Generation with Position Evaluations
[Figure: a small game tree in which positions are evaluated as the tree is generated; the opponent's level takes the minimum of the evaluations, and positions marked 'X' are never evaluated.]

ICS-171:Notes 6: 19  Alpha-Beta Procedure
Depth-first search of the game tree, keeping track of:
- Alpha: highest value seen so far on a maximizing level
- Beta: lowest value seen so far on a minimizing level
Pruning:
- when minimizing, do not expand any more sibling nodes once a node has been seen whose value is less than or equal to Alpha (the maximizer above already has a move at least that good elsewhere)
- when maximizing, do not expand any more sibling nodes once a node has been seen whose value is greater than or equal to Beta (the minimizer above will never let the game reach this node)
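
A sketch of this procedure in Python, again over the illustrative tic-tac-toe helpers from earlier slides; the early `return` statements inside the two loops are the pruning steps described above:

```python
def alphabeta(state, player, alpha=float('-inf'), beta=float('inf')):
    """Minimax value of `state`, pruning branches that cannot
    affect the final decision."""
    if is_terminal(state):
        return utility(state)
    other = 'O' if player == 'X' else 'X'
    if player == 'X':                      # maximizing level
        value = float('-inf')
        for m in legal_moves(state):
            value = max(value, alphabeta(apply_move(state, m, player),
                                         other, alpha, beta))
            if value >= beta:              # minimizer above will never allow this
                return value               # prune the remaining siblings
            alpha = max(alpha, value)
        return value
    else:                                  # minimizing level
        value = float('inf')
        for m in legal_moves(state):
            value = min(value, alphabeta(apply_move(state, m, player),
                                         other, alpha, beta))
            if value <= alpha:             # maximizer above has a better option
                return value               # prune the remaining siblings
            beta = min(beta, value)
        return value
```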

ICS-171:Notes 6: 20  The Alpha-Beta Principle: 2nd Example
[Figure: a second worked example of alpha-beta pruning on a game tree.]

ICS-171:Notes 6: 21  Effectiveness of Alpha-Beta Search
Worst case:
- the branches are ordered so that no pruning takes place
- alpha-beta gives no improvement over exhaustive search
Best case:
- each player's best move is the left-most alternative (i.e., evaluated first)
- in practice, performance is closer to the best case than to the worst case
In practice we often get O(b^(d/2)) rather than O(b^d):
- this is the same as having a branching factor of sqrt(b), since (sqrt(b))^d = b^(d/2)
  – i.e., we have effectively gone from b to the square root of b
- e.g., in chess we go from b ~ 35 to b ~ 6
  – this permits much deeper search in the same amount of time
  – it makes computer chess competitive with humans!

ICS-171:Notes 6: 22  Iterative (Progressive) Deepening
In real games there is usually a time limit T on making a move. How do we take this into account?
- with alpha-beta we cannot use "partial" results with any confidence unless the full breadth of the tree has been searched
- so we could be conservative and set a depth limit which guarantees that we will find a move in time < T
  – the disadvantage is that we may finish early and could have done more search
In practice, iterative deepening search (IDS) is used:
- IDS runs alpha-beta search with an increasing depth limit
  – note: the "inner" loop is a full alpha-beta search with a specified depth limit m
- when the clock runs out, we use the solution found at the previous depth limit
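
A sketch of the outer IDS loop under a time limit; `search_to_depth` is an assumed stand-in for a full depth-limited alpha-beta search, and the time handling is deliberately simplified:

```python
import time

def iterative_deepening_move(state, time_limit_seconds, search_to_depth):
    """Run complete depth-limited searches with increasing depth limits and,
    when the clock runs out, return the move found at the last completed depth.
    `search_to_depth(state, m)` is assumed to run a full alpha-beta search
    to depth m and return the best move it finds."""
    deadline = time.monotonic() + time_limit_seconds
    best_move_so_far = None
    depth = 1
    # Note: a real engine would also abort the in-progress search when the
    # deadline passes; this sketch only checks the clock between depths.
    while best_move_so_far is None or time.monotonic() < deadline:
        best_move_so_far = search_to_depth(state, depth)
        depth += 1
    return best_move_so_far
```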

ICS-171:Notes 6: 23  Heuristics and Game Tree Search
The Horizon Effect:
- sometimes there is a major "effect" (such as a piece being captured) which lies just "below" the depth to which the tree has been expanded
- the computer cannot see that this major event could happen: it has a "limited horizon"
- there are heuristics that try to follow certain branches more deeply to detect such important events, i.e., "focusing attention"
- this helps to avoid catastrophic losses due to "short-sightedness"
Heuristics for tree exploration:
- it may be better to explore some branches more deeply in the allotted time
- various heuristics exist to identify "promising" branches

ICS-171:Notes 6: 24  More on Evaluation Functions
Features are numeric characteristics of a board position, e.g.:
- feature 1 = f1 = number of white pieces
- feature 2 = f2 = number of black pieces
- feature 3 = f3 = f1/f2
- feature 4 = f4 = number of white bishops
- feature 5 = f5 = estimate of "threat" to the white king
An evaluation function is an estimate of how useful a board position is:
- it is a function of the pieces on the board
- it is like the heuristic function in general search
- generally, evaluation function = function(f1, f2, f3, ...), i.e., it is a function of the individual features

ICS-171:Notes 6: 25  Linear Evaluation Functions
A linear function of f1, f2, f3, ... is a weighted sum of f1, f2, f3, ...
A linear evaluation function is w1*f1 + w2*f2 + w3*f3 + ... + wn*fn
- where f1, f2, f3, ... are the features
- and w1, w2, w3, ... are the weights
- idea: more important features get more weight
The quality of play depends directly on the quality of the evaluation function. To build an evaluation function we have to:
1. construct good features (using expert knowledge, heuristics)
2. pick good weights (these can be learned)
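
A direct sketch of such a weighted sum; the feature extractors and weights below are purely illustrative:

```python
def linear_eval(state, features, weights):
    """Weighted sum w1*f1(state) + w2*f2(state) + ... + wn*fn(state)."""
    return sum(w * f(state) for w, f in zip(weights, features))

# Illustrative features for some board game (assumes a state object
# with these hypothetical attributes):
features = [
    lambda s: s.num_white_pieces,
    lambda s: s.num_black_pieces,
    lambda s: s.num_white_pieces / max(s.num_black_pieces, 1),
]
weights = [1.0, -1.0, 0.5]   # more important features get larger weights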

ICS-171:Notes 6: 26  Learning the Weights in a Linear Evaluation Function
A linear evaluation function is w1*f1 + w2*f2 + w3*f3 + ... + wn*fn. How could we "learn" these weights?
- basic idea: play lots of games against an opponent
- for every move (or game), look at the error = true outcome - evaluation function
  – if the error is positive (underestimating), adjust the weights to increase the evaluation function
  – if the error is zero, do nothing
  – if the error is negative (overestimating), adjust the weights to decrease the evaluation function
- the details of learning the weights will be taught later in the quarter, when we discuss learning algorithms
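
One common way to realize this update is an error-driven (delta-rule-style) adjustment; the rule below is an illustrative sketch, not the specific method the course will present later:

```python
def update_weights(weights, feature_values, true_outcome, learning_rate=0.01):
    """One error-driven step: nudge each weight in the direction that
    reduces (true outcome - current evaluation)."""
    evaluation = sum(w * f for w, f in zip(weights, feature_values))
    error = true_outcome - evaluation   # >0: underestimating, <0: overestimating
    return [w + learning_rate * error * f
            for w, f in zip(weights, feature_values)]
```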

ICS-171:Notes 6: 27  Examples of Algorithms which Learn to Play Well
Checkers:
- A. L. Samuel, "Some studies in machine learning using the game of checkers," IBM Journal of Research and Development, 11(6):601-617, 1967.
- learned by playing a copy of itself thousands of times
- using only an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz
- successful enough to compete well at human tournaments
- a computer performing full, flawless search is expected soon
Backgammon:
- G. Tesauro and T. J. Sejnowski, "A parallel network that learns to play backgammon," Artificial Intelligence, 39(3):357-390, 1989.
- also learns by playing copies of itself
- uses a non-linear evaluation function (a neural network)
- rates among the top 3 players in the world

ICS-171:Notes 6: 28  Computers Can Play Grandmaster Chess
"Deep Blue" (IBM):
- parallel processor, 32 nodes
- each node has 8 dedicated VLSI "chess chips"
- each chip can search 200 million configurations/second
- uses minimax, alpha-beta, and heuristics: can search to depth 14
- memorizes openings and end-games
- its power is based on speed and memory: no common sense
Kasparov v. Deep Blue, May 1997:
- 6-game full-regulation chess match (sponsored by ACM)
- Kasparov lost the match (2.5 to 3.5)
- a historic achievement for computer chess: the first time a computer was the best chess player on the planet
Note that Deep Blue plays by "brute force": there is relatively little in it that resembles human intuition and cleverness.

ICS-171:Notes 6: 29  Status of Computers in Other Games
Checkers/Draughts:
- the current world champion is Chinook; it can beat any human
- uses alpha-beta search
Othello:
- computers can easily beat the world experts
Backgammon:
- a system which learns is ranked in the top 3 in the world
- uses neural networks to learn from playing many, many games against itself
Go:
- branching factor b ~ 360: very large!
- a $2 million prize for any system which can beat a world expert

ICS-171:Notes 6: 30  Web Site of the Day
Describes the Kasparov v. Deep Blue match of May 1997:
- descriptions of the games
- quotes and background information on the people involved
- expert commentary and analysis
- detailed information about Deep Blue
- essays about computer chess in general

ICS-171:Notes 6: 31  Summary
- Game playing is best modeled as a search problem.
- Game trees represent alternating computer/opponent moves.
- Evaluation functions estimate the quality of a given board configuration for each player.
- Minimax is a procedure which chooses moves by assuming that the opponent will always choose the move which is best for them.
- Alpha-beta is a procedure which can prune large parts of the search tree, allowing the search to go deeper.
- For many well-known games, computer algorithms based on heuristic search match or outperform human world experts.
- Reading: Nilsson Ch. II-12; R.&N. Ch. 5
- Do the alpha-beta pruning exercise on page 206 of Nilsson.