
Game Playing 2

This Lecture
- Alpha-beta pruning
- Games with chance
- Partially observable games

Nondeterminism
Uncertainty is caused by the actions of another agent (MIN), who competes with our agent (MAX).
(Figure: a game tree alternating MAX's play and MIN's play; MAX cannot tell what move MIN will play.)

Nondeterminism
Uncertainty is caused by the actions of another agent (MIN), who competes with our agent (MAX).
MAX must decide what to play for BOTH of MIN's possible replies: instead of a single path, the agent must construct an entire plan.

Minimax Backup
(Figure: terminal values such as +1 and 0 are backed up through alternating MIN-turn and MAX-turn levels of the tree.)

Depth-First Minimax Algorithm

MAX-Value(S)
  1. If Terminal?(S), return Result(S)
  2. Return max over S′ ∈ SUCC(S) of MIN-Value(S′)

MIN-Value(S)
  1. If Terminal?(S), return Result(S)
  2. Return min over S′ ∈ SUCC(S) of MAX-Value(S′)

MINIMAX-Decision(S)
  Return the action leading to the state S′ ∈ SUCC(S) that maximizes MIN-Value(S′)
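The following is a minimal runnable sketch of this recursion in Python. The game interface (is_terminal, result, successors, actions, transition) is a hypothetical placeholder, not something defined in the slides:

def max_value(state, game):
    # Terminal test first, then back up the largest MIN value among successors
    if game.is_terminal(state):
        return game.result(state)
    return max(min_value(s, game) for s in game.successors(state))

def min_value(state, game):
    # Symmetric: back up the smallest MAX value among successors
    if game.is_terminal(state):
        return game.result(state)
    return min(max_value(s, game) for s in game.successors(state))

def minimax_decision(state, game):
    # Pick the action whose resulting state has the best backed-up value for MAX
    return max(game.actions(state),
               key=lambda a: min_value(game.transition(state, a), game))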

Real-Time Game Playing with an Evaluation Function
e(s): a function giving an estimate of how favorable state s is to MAX.
Keep track of the search depth, and add the line
  If depth(s) = cutoff, return e(s)
immediately after the terminal test.
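As a sketch of this modification, here is max_value again with the cutoff test added right after the terminal test; depth, cutoff, and the evaluation function e are assumed to be supplied by the caller:

def max_value(state, game, depth, cutoff, e):
    if game.is_terminal(state):
        return game.result(state)
    if depth == cutoff:
        return e(state)  # back up the heuristic estimate instead of searching deeper
    return max(min_value(s, game, depth + 1, cutoff, e)
               for s in game.successors(state))
# min_value is modified in exactly the same way.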

Can We Do Better?
Yes! Much better!
Pruning (figure): part of the tree can't have any effect on the value that will be backed up to the root, so it need not be searched at all.

Example
β = 2: the beta value of a MIN node is an upper bound on the final backed-up value; it can never increase.
After a child with value 1 is seen, β drops to 1.
α = 1: the alpha value of a MAX node is a lower bound on the final backed-up value; it can never decrease.
(Figure: a sibling MIN node acquires β = -1, which is below the ancestor's α = 1.)
Search can be discontinued below any MIN node whose beta value is less than or equal to the alpha value of one of its MAX ancestors.

Alpha-Beta Pruning
- Explore the game tree to depth h in depth-first manner
- Back up alpha and beta values whenever possible
- Prune branches that can't lead to changing the final decision

Alpha-Beta Algorithm
- Update the alpha/beta value of the parent of a node N when the search below N has been completed or discontinued
- Discontinue the search below a MAX node N if its alpha value is ≥ the beta value of a MIN ancestor of N
- Discontinue the search below a MIN node N if its beta value is ≤ the alpha value of a MAX ancestor of N

Example
(Figure: a sequence of slides stepping alpha-beta through a six-level game tree with alternating MAX and MIN levels, pruning subtrees as alpha and beta values are backed up.)

How Much Do We Gain?
Consider these two cases (figure): one in which a MIN node immediately backs up a low value (β = -1) and its remaining children are pruned, and one in which it backs up a higher value (β = 4) and nothing can be pruned.

How Much Do We Gain?
- Assume a game tree of uniform branching factor b
- Minimax examines O(b^h) nodes; so does alpha-beta in the worst case
- The gain for alpha-beta is maximal when:
  - the children of a MAX node are ordered in decreasing backed-up values
  - the children of a MIN node are ordered in increasing backed-up values
- Then alpha-beta examines O(b^{h/2}) nodes [Knuth and Moore, 1975]
- But this requires an oracle (if we knew how to order nodes perfectly, we would not need to search the game tree)
- If nodes are ordered at random, the average number of nodes examined by alpha-beta is ~O(b^{3h/4})
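The effect of move ordering is easy to check empirically. The experiment below (an illustration, not from the slides) builds a random uniform game tree and counts the nodes alpha-beta visits with the tree's natural (random) child order versus oracle ordering, where children are sorted by their exact backed-up values:

import math
import random

def make_tree(b, h, rng):
    # Full tree of branching factor b and height h with random leaf utilities
    if h == 0:
        return rng.random()
    return [make_tree(b, h - 1, rng) for _ in range(b)]

def exact_value(node, maximize):
    if not isinstance(node, list):
        return node
    values = [exact_value(c, not maximize) for c in node]
    return max(values) if maximize else min(values)

def alphabeta(node, maximize, alpha, beta, oracle, visited):
    visited[0] += 1
    if not isinstance(node, list):
        return node
    children = node
    if oracle:
        # Perfect ordering: best child first (this is the "oracle" -- it already
        # knows the exact values the search is supposed to discover)
        children = sorted(node, key=lambda c: exact_value(c, not maximize),
                          reverse=maximize)
    value = -math.inf if maximize else math.inf
    for c in children:
        v = alphabeta(c, not maximize, alpha, beta, oracle, visited)
        if maximize:
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if alpha >= beta:
            break  # prune the remaining children
    return value

tree = make_tree(3, 6, random.Random(0))
for oracle in (False, True):
    visited = [0]
    alphabeta(tree, True, -math.inf, math.inf, oracle, visited)
    print("oracle" if oracle else "random", "ordering:", visited[0], "nodes")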

Alpha-Beta Implementation

MAX-Value(S, α, β)
  1. If Terminal?(S), return Result(S)
  2. For all S′ ∈ SUCC(S)
  3.   α ← max(α, MIN-Value(S′, α, β))
  4.   If α ≥ β, then return β
  5. Return α

MIN-Value(S, α, β)
  1. If Terminal?(S), return Result(S)
  2. For all S′ ∈ SUCC(S)
  3.   β ← min(β, MAX-Value(S′, α, β))
  4.   If α ≥ β, then return α
  5. Return β

Alpha-Beta-Decision(S)
  Return the action leading to the state S′ ∈ SUCC(S) that maximizes MIN-Value(S′, -∞, +∞)
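A direct Python transcription of this pseudocode, using the same hypothetical game interface as the minimax sketch above:

import math

def ab_max_value(state, game, alpha, beta):
    if game.is_terminal(state):
        return game.result(state)
    for s in game.successors(state):
        alpha = max(alpha, ab_min_value(s, game, alpha, beta))
        if alpha >= beta:
            return beta  # cut off: this node can no longer affect the root value
    return alpha

def ab_min_value(state, game, alpha, beta):
    if game.is_terminal(state):
        return game.result(state)
    for s in game.successors(state):
        beta = min(beta, ab_max_value(s, game, alpha, beta))
        if alpha >= beta:
            return alpha  # cut off
    return beta

def alphabeta_decision(state, game):
    return max(game.actions(state),
               key=lambda a: ab_min_value(game.transition(state, a), game,
                                          -math.inf, math.inf))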

Heuristic Ordering of Nodes
Order the nodes below the root according to the values backed up at the previous iteration.

Other Improvements
- Adaptive horizon + iterative deepening
- Extended search: retain k > 1 best paths, instead of just one, and extend the tree at greater depth below their leaf nodes (to help deal with the "horizon effect")
- Singular extension: if a move is obviously better than the others in a node at horizon h, then expand this node along this move
- Use transposition tables to deal with repeated states
- Null-move search

Games of Chance

- Dice games: backgammon, Yahtzee, craps, …
- Card games: poker, blackjack, …
Is there a fundamental difference between the nondeterminism in chess-playing and the nondeterminism in a dice roll?

(Figure: a game tree whose levels alternate MAX, CHANCE, MIN, CHANCE, MAX.)

Expected Values
The utility of a MAX/MIN node in the game tree is the max/min of the utility values of its successors.
The expected utility of a CHANCE node in the game tree is the probability-weighted average of the utility values of its successors:

  ExpectedValue(s) = Σ_{s′ ∈ SUCC(s)} P(s′) · ExpectedValue(s′)   (CHANCE nodes)

Compare to:

  MinimaxValue(s) = max_{s′ ∈ SUCC(s)} MinimaxValue(s′)   (MAX nodes)
  MinimaxValue(s) = min_{s′ ∈ SUCC(s)} MinimaxValue(s′)   (MIN nodes)

Adversarial Games of Chance
E.g., backgammon: MAX nodes, MIN nodes, CHANCE nodes.
Expectiminimax search. Backup step:
- MAX = maximum of children
- MIN = minimum of children
- CHANCE = probability-weighted average of children
Four levels of the game tree separate each of MAX's turns!
Evaluation function? Pruning?
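A compact sketch of the expectiminimax backup over a hypothetical tree encoding (a node is either a terminal utility or a tagged tuple; this representation is invented here for illustration):

def expectiminimax(node):
    # node is a number (terminal utility), ('max', [children]),
    # ('min', [children]), or ('chance', [(prob, child), ...])
    if not isinstance(node, tuple):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average of the children
    return sum(p * expectiminimax(c) for p, c in children)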

Generalizing Minimax Values
- Utilities can be continuous numerical values, rather than +1, 0, -1
- This allows maximizing the amount of "points" (e.g., $) rewarded, instead of just achieving a win
- Rewards are associated with terminal states
- Costs can be associated with certain decisions at non-terminal states (e.g., placing a bet)

Roulette
The "game tree" only has depth 2: place a bet, then observe the roulette wheel.
(Figure: a MAX node with two actions. "No bet" ends the game. "Bet: Red, $5" leads to a chance node with outcomes Red (probability 18/38, value +10) and Not red (probability 20/38, value 0).)

Chance Node Backup
Expected value: for k children with backed-up values v_1, …, v_k and probabilities p_1, …, p_k:

  chance node value = p_1·v_1 + p_2·v_2 + … + p_k·v_k

For the "Bet: Red, $5" node: value = 18/38 · 10 + 20/38 · 0 ≈ 4.74

MAX/Chance Nodes
(Figure: a MAX node choosing between two bets, each leading to a chance node.)
- Bet: Red, $5 → 18/38 · 10 + 20/38 · 0 ≈ 4.74
- Bet: 17, $5 → 1/38 · 150 + 37/38 · 0 = 150/38 ≈ 3.95
MAX should pick the action leading to the node with the highest value.
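Checking these backups with the expectiminimax sketch above:

bet_red = ('chance', [(18/38, 10.0), (20/38, 0.0)])
bet_17 = ('chance', [(1/38, 150.0), (37/38, 0.0)])
print(expectiminimax(bet_red))                      # 4.7368...
print(expectiminimax(bet_17))                       # 3.9473...
print(expectiminimax(('max', [bet_red, bet_17])))   # MAX picks red: 4.7368...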

A Slightly More Complex Example
- Two fair coins
- Pay $1 to start, at which point both are flipped
- Can flip up to two coins again, at a cost of $1 each
- Payout: $5 for HH, $1 for HT or TH, $0 for TT
(Figure: a sequence of slides backing values up the tree of Done / Flip H / Flip T decisions and 1/2-probability chance nodes; terminal net values include 5-1 = 4, 1-1 = 0, 5-2 = 3, 1-2 = -1, and -2.)
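A small solver for this game, as a sketch: it assumes the re-flips are taken one coin at a time, with the new outcome observed before deciding whether to pay for another flip:

import itertools

PAYOUT = {('H', 'H'): 5, ('H', 'T'): 1, ('T', 'H'): 1, ('T', 'T'): 0}

def best_value(coins, flips_left):
    # Best expected future payout minus future flip costs from this position
    value = PAYOUT[coins]  # option: stop now ("Done")
    if flips_left > 0:
        for i in (0, 1):   # option: pay $1 to re-flip coin i
            outcomes = []
            for face in ('H', 'T'):
                nxt = list(coins)
                nxt[i] = face
                outcomes.append(best_value(tuple(nxt), flips_left - 1))
            value = max(value, sum(outcomes) / 2 - 1)
    return value

# Root: pay $1 to enter, both coins land uniformly at random, then play optimally
root = sum(best_value(c, 2) for c in itertools.product('HT', repeat=2)) / 4 - 1
print(root)  # expected net value of optimal play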

Card Games
- Blackjack (6-deck), video poker: similar to the coin-flipping game
- But in many card games, the state must keep track of the history of dealt cards, because it affects future probabilities:
  - one-deck blackjack
  - bridge
  - poker
(We won't even get started on betting strategies.)

Partially Observable Games
Partial observability: the player doesn't see the entire state (e.g., other players' hands); "fog of war".
Examples: Kriegspiel (see R&N), Battleship, Stratego.

Partially Observable Card Games
One possible strategy:
- Consider all possible deals
- Solve each deal as a fully observable problem
- Choose the move that has the best average minimax value
"Averaging over clairvoyance." [Why doesn't this always work?]

Observation of the Real World
(Figure: the real world is in some state; percepts are interpreted in the representation language, e.g. On(A,B), On(B,Table), Handempty.)
Percepts can be user's inputs, sensory data (e.g., image pixels), information received from other agents, …

Second Source of Uncertainty: Imperfect Observation of the World
Observation of the world can be:
- Partial, e.g., a vision sensor can't see through obstacles (lack of percepts)
(Figure: two rooms R1 and R2; the robot may not know whether there is dust in room R2.)

Second Source of Uncertainty: Imperfect Observation of the World
Observation of the world can be:
- Partial, e.g., a vision sensor can't see through obstacles
- Ambiguous, e.g., percepts have multiple possible interpretations
(Figure: blocks A, B, C seen from an angle where the percept is consistent with On(A,B) ∨ On(A,C).)

Second Source of Uncertainty: Imperfect Observation of the World
Observation of the world can be:
- Partial, e.g., a vision sensor can't see through obstacles
- Ambiguous, e.g., percepts have multiple possible interpretations
- Incorrect

Example: Belief State
In the presence of nondeterministic sensing uncertainty, an agent's belief state represents all the states of the world that it thinks are possible at a given time or at a given stage of reasoning.
In the probabilistic model of uncertainty, a probability is associated with each state to measure its likelihood of being the actual state.

Belief State
A belief state is the set of all states that an agent thinks are possible at any given time or at any stage of planning a course of actions.
To plan a course of actions, the agent searches a space of belief states, instead of a space of states.

Sensor Model
State space S. The sensor model is a function SENSE: S → 2^S that maps each state s ∈ S to a belief state (the set of all states that the agent would think possible if it were actually observing state s).
Example: assume our vacuum robot can perfectly sense the room it is in and whether there is dust in it, but it can't sense whether there is dust in the other room.
(Figure: SENSE applied to a state returns the set of states consistent with what the robot observes.)

Vacuum Robot Action Model
- Right either moves the robot right, or does nothing
- Left always moves the robot to the left, but it may occasionally deposit dust in the right room
- Suck picks up the dirt in the room, if any, and always does the right thing
- The robot perfectly senses the room it is in and whether there is dust in it
- But it can't sense whether there is dust in the other room

Transitions Between Belief States
Suppose the robot is initially in some state. After sensing this state, its belief state is a singleton. Just after executing Left, its belief state grows to include every possible outcome of the action. After sensing the new state, its belief state shrinks again to one of two sets, depending on whether or not it senses dust in R1.
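These two steps (predicting through a nondeterministic action, then filtering by the observation) can be written generically; transitions and sense below are placeholders for a concrete model such as the vacuum robot's:

def predict(belief, action, transitions):
    # All states reachable from some state in the belief via the action
    return frozenset(s2 for s in belief for s2 in transitions(s, action))

def update(belief, percept, sense):
    # Keep only the states whose sensor reading matches the actual percept
    return frozenset(s for s in belief if sense(s) == percept)

def successors(belief, action, transitions, sense):
    # One planning step: act, then branch on each percept that could arrive
    predicted = predict(belief, action, transitions)
    return {p: update(predicted, p, sense)
            for p in {sense(s) for s in predicted}}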

Transitions Between Belief States
Playing a "game against nature": after executing Left, nature decides the observation, Clean(R1) or ¬Clean(R1). After receiving it, the robot will have one of these two belief states.

AND/OR Tree of Belief States
An action (e.g., Left, Suck, Right) is applicable to a belief state B if its preconditions are achieved in all states in B.
A goal belief state is one in which all states are goal states.
(Figure: an AND/OR tree in which action branches lead through observation branches to goal belief states, or back to a previously seen belief state: a loop.)
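A sketch of searching such a tree, in the style of R&N's AND-OR-GRAPH-SEARCH and built on the successors helper above (actions(belief) is assumed to return the actions applicable in every state of the belief):

def and_or_search(belief, actions, transitions, sense, is_goal, path=()):
    # Returns a conditional plan [action, {percept: subplan, ...}], or None
    if all(is_goal(s) for s in belief):
        return []  # goal belief state: empty plan
    if belief in path:
        return None  # loop back to a belief state already on this path
    for action in actions(belief):  # OR node: one action must work
        plan = {}
        for percept, next_belief in successors(belief, action,
                                               transitions, sense).items():
            # AND node: every possible observation needs its own subplan
            subplan = and_or_search(next_belief, actions, transitions, sense,
                                    is_goal, path + (belief,))
            if subplan is None:
                break
            plan[percept] = subplan
        else:
            return [action, plan]
    return None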

Recap
- Alpha-beta pruning: reduces the complexity of minimax from O(b^h) to O(b^{h/2}) in the ideal case
- Games with chance: expected values, averaging over probabilities
- Partial observability: reason about sets of states, i.e., belief states
Much more on the latter two topics later.

Project Proposal (Optional)
Mandatory: instructor's advance approval. (Out of town 9/24-10/1; we can discuss via e-mail.)
- Project title, team members
- 1/2 to 1 page description:
  - specific topic (the problem you are trying to solve, topic of survey, etc.)
  - why did you choose this topic?
  - methods (researched in advance, sources of references)
  - expected results
Send to me by 10/2.

Homework
Reading: R&N 4.3-4.4, 13.1-13.2