Extensive-form games and how to solve them


Extensive-form games and how to solve them. CMU 15-381 and 15-681, Fall 2017. Teachers: Noam Brown and Tuomas Sandholm

What are Games? Requirements for a game:
- At least 2 players
- Actions available to each player
- Payoffs to each player for those actions
Examples?

What is Game Theory? Assumptions in game theory:
1) All players want to maximize their utility
2) All players are rational
3) It is common knowledge that all players are rational

Two-Player Zero-Sum Games
For now, we will focus on two-player zero-sum games:
- Only two players
- Whatever one player wins, the other player loses
- Common in recreational games (rock-paper-scissors, chess, Go, poker)

[Payoff table for rock-paper-scissors; entries are (Player 1, Player 2)]
           Rock     Paper    Scissors
Rock       0, 0     -1, 1    1, -1
Paper      1, -1    0, 0     -1, 1
Scissors   -1, 1    1, -1    0, 0

Extensive-Form Games
- Moves are made sequentially (although "information sets", discussed later, enable modeling simultaneous moves too)
- There can be an extra player called chance (aka nature) that plays stochastically, not strategically
- The game forms a tree
- Payoffs are shown for P1, then P2

[Figure: game tree with P1 at the root, two P2 nodes below it, a further P1 node, and leaf payoffs -4,4; -3,3; -2,2; 1,-1; 0,0]

Backward Induction
- Start at the bottom of the tree and solve assuming we reached that point
- Replace the decision point with the value of the decision
- Move up, and repeat

[Figure: animation over the game tree above. The lowest P1 node, with leaves 1,-1 and 0,0, resolves to 1,-1; the two P2 nodes then resolve to -4,4 and -2,2 (each P2 picks its higher payoff); finally P1 at the root chooses the branch worth -2,2.]
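
The backward-induction procedure above is easy to sketch in code. The dict-based tree encoding below ("player"/"children"/"payoff" keys) is a hypothetical representation chosen for illustration; the example tree mirrors the one in the figure.

```python
# Minimal backward induction for a two-player perfect-information game tree.
# Internal nodes name the player to move (0 = P1, 1 = P2); leaves hold
# (P1 payoff, P2 payoff).

def backward_induction(node):
    """Return the payoff profile reached under optimal play from this node."""
    if "payoff" in node:                       # leaf: nothing left to decide
        return node["payoff"]
    player = node["player"]
    # Solve every child first, then let the mover pick its best child.
    child_values = [backward_induction(c) for c in node["children"]]
    return max(child_values, key=lambda v: v[player])

# The example tree from the slides: P1 at the root, two P2 nodes, one P1 node.
game = {"player": 0, "children": [
    {"player": 1, "children": [{"payoff": (-4, 4)}, {"payoff": (-3, 3)}]},
    {"player": 1, "children": [
        {"payoff": (-2, 2)},
        {"player": 0, "children": [{"payoff": (1, -1)}, {"payoff": (0, 0)}]},
    ]},
]}
print(backward_induction(game))   # (-2, 2)
```

Running this reproduces the figure's result: the lower P1 node resolves to (1, -1), both P2 nodes pick their higher second coordinate, and the root P1 chooses the branch worth (-2, 2).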

Converting to Normal Form
- Every extensive-form game can be converted to a normal-form game
- A normal-form strategy must specify an action for every situation
- Problem: the conversion is exponential in the size of the game

[Figure: P1 chooses A or B; after A, P2 chooses L (-2,2) or R (-3,3); after B, P2 chooses ℓ (0,0) or r (-1,1)]

       L/ℓ     L/r     R/ℓ     R/r
A      -2,2    -2,2    -3,3    -3,3
B      0,0     -1,1    0,0     -1,1
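
A quick way to feel the blowup: a pure normal-form strategy must fix an action at every one of a player's decision points, so the strategy count is the product of the per-point action counts (exponential in the number of decision points). A sketch, with made-up decision-point labels:

```python
# Enumerate P2's pure strategies for the small tree above: one action per
# decision point, 2 * 2 = 4 combinations in total.
from itertools import product

p2_decision_points = {"after A": ["L", "R"], "after B": ["l", "r"]}
p2_strategies = list(product(*p2_decision_points.values()))
print(p2_strategies)   # [('L', 'l'), ('L', 'r'), ('R', 'l'), ('R', 'r')]
```

With k binary decision points the count is 2^k, which is why solving the extensive form directly is preferable for large games.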

Solving Perfect-Information Games
Checkers is solved: optimal play by both sides leads to a draw [Schaeffer et al., "Checkers is solved," Science (2007)].
At about 10^20 states, checkers is the largest game ever solved. Chess has about 10^40 states; Go has about 10^170.

Imperfect-Information Games
- What about games like poker?
- Backward induction doesn't work in imperfect-information games!
- An information set is the set of all possible states (nodes in the game tree) that the player may be in, given the information available to him.
- A strategy must be identical for all nodes in an information set.

[Figure: coin-toss tree. A coin lands Heads or Tails; P1 may Sell (payoff 0.5 after Heads, -0.5 after Tails) or Play; if P1 plays, P2 guesses Heads or Tails without seeing the flip, with P1 payoffs ±1.]

Half-Street Kuhn Poker
- 3 cards: K, Q, and J. The highest card wins.
- Each player is dealt one of the cards.
- Both players ante 1 chip.
- P1 may bet or check.
- If P1 checks, the higher card wins 1 chip.
- If P1 bets, P2 may call or fold.
  - If P2 folds, P1 wins 1 chip.
  - If P2 calls, the higher card wins 2 chips.
Poll: What should P1 do with a Queen?
1. Always bet
2. Sometimes bet, sometimes check
3. Always check
4. No idea

[Figure: P1 bets or checks (showdown for ±1); after a bet, P2 folds (P1 wins 1) or calls (showdown for ±2)]

Imperfect-Information Games
[Figure: the game tree for half-street Kuhn poker. Chance (C) deals one of six hands: P1: K, P2: Q; P1: K, P2: J; P1: Q, P2: J; P1: Q, P2: K; P1: J, P2: K; P1: J, P2: Q. P1 then bets or checks and P2 calls or folds, with payoffs ±1 and ±2. The animation prunes the tree step by step.]
Try iterative removal of dominated strategies!

Imperfect-Information Games
Two questions remain:
- How often should P1 bet with a Jack (bluff)?
- How often should P2 call with a Queen when P1 bets?

Imperfect-Information Games
The remaining game is solved by making the opponent indifferent. Let J = P(Bet | Jack) and Q = P(Call | Queen).

P2's calling frequency must make P1 indifferent between betting and checking with a Jack (checking a Jack always loses 1):
  -1 = -2 * (1 + Q)/2 + 1 * (1 - Q)/2   =>   Q = 1/3

P1's bluffing frequency must make P2 indifferent between calling and folding with a Queen when facing a bet (folding loses 1):
  -1 = 2 * J/(J + 1) - 2 * 1/(J + 1)   =>   J = 1/3
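
These indifference conditions are easy to check mechanically. A minimal verification sketch using exact rational arithmetic (variable names follow the slide's J and Q):

```python
# Verify the half-street Kuhn poker indifference equations exactly.
from fractions import Fraction

Q = Fraction(1, 3)   # P(P2 calls | Queen)
J = Fraction(1, 3)   # P(P1 bets  | Jack)

# P1 bets a Jack: P2 holds K (always calls) or Q (calls with prob Q).
# A call loses 2 chips for P1; a fold wins 1. Checking a Jack always loses 1.
ev_bet_jack = -2 * (1 + Q) / 2 + 1 * (1 - Q) / 2
print(ev_bet_jack)   # -1: P1 is indifferent between betting and checking

# P2 holds a Queen and faces a bet: it came from a K with prob 1/(1+J)
# (lose 2 by calling) or a bluffed J with prob J/(1+J) (win 2). Folding loses 1.
ev_call_queen = 2 * (J / (1 + J)) - 2 * (1 / (1 + J))
print(ev_call_queen)   # -1: P2 is indifferent between calling and folding
```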

Solving two-player zero-sum extensive-form games

AlphaGo AlphaGo techniques extend, in principle, to other perfect-information games

Example Game: Coin Toss
- Chance (C) flips a coin: Heads or Tails, each with probability 0.5.
- If it lands Heads, the coin is considered lucky and P1 can sell it (EV = 0.5).
- If it lands Tails, the coin is unlucky and P1 can pay someone to take it (EV = -0.5).
- Or, P1 can play, and P2 can try to guess how the coin landed (P1 payoffs ±1).

[Figure: the tree with P1's information sets after Heads/Tails and P2's single information set spanning both Play branches]

Imperfect-Information Games: Coin Toss
What is the optimal strategy for P2, looking only at the imperfect-information subgame where P1 plays? Clearly, if we knew the exact state, we could search just like in chess. But we only know the information set.
- We can see that P1 would have sold the coin if it landed Heads, so it seems P2 should always guess Tails.
- If P2 always guesses Heads, P1's best response is to sell after Heads (EV = 0.5) and play after Tails (EV = 1.0): average 0.75.
- If P2 always guesses Tails, P1's best response is to play after Heads (EV = 1.0) and sell after Tails (EV = -0.5): average 0.25.
- If P2 guesses Heads with probability 0.25 and Tails with probability 0.75, P1 is indifferent everywhere (Heads EV = 0.5, Tails EV = -0.5): average 0.0, and P1 cannot gain by deviating.
- If the Sell payoffs are flipped (Heads EV = -0.5, Tails EV = 0.5), the identical-looking subgame instead requires P2 to guess Heads with probability 0.75 and Tails with probability 0.25. The optimal strategy within a subgame depends on play outside the subgame.

[Figure: animation stepping through these P2 strategies on the Coin Toss tree]

Increasing Scalability: Abstraction
Idea:
- Create a smaller game that is similar to the big game
- Run an equilibrium-finding algorithm on the smaller game
- Map the computed equilibrium back to a strategy profile in the big game

Abstraction [Gilpin & Sandholm EC-06, J. of the ACM 2007, ...]
Original game (~10^161 nodes) -> automated abstraction -> abstracted game (compact representation) -> equilibrium-finding algorithm -> ε-equilibrium -> reverse mapping -> ~equilibrium in the original game
Key idea: Have a really good abstraction (ideally, no abstraction) for the first two rounds (which are a tiny fraction of the whole game). We only play according to the abstract strategy for those two rounds.
Foreshadowed by Shi & Littman 01, Billings et al. IJCAI-03

Lossless abstraction
[Figure: two chance branches (probability 0.5 each) lead to P1/P2 subtrees with identical payoffs (all 10s); the branches are merged into one without changing the game's value]

Lossy abstraction
[Figure: the same merge applied to subtrees whose payoffs differ slightly (8 and 9 instead of 10), so the abstraction loses information]

ε-Nash Equilibrium
Standard Nash equilibrium: each player has no incentive to deviate. In other words, no agent can gain utility by switching to another strategy.
ε-Nash equilibrium: each player has at most ε incentive to deviate. In other words, no agent can gain more than ε utility by switching to another strategy.
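
The ε condition can be verified directly: compute each player's current utility and compare it with the best pure-strategy deviation (a profitable deviation, if one exists, exists among pure strategies). A sketch for two-player matrix games; the function name and matrices are illustrative, not from the lecture:

```python
# Check whether (x, y) is an ε-Nash equilibrium of the bimatrix game (A, B).
# A[i][j] is P1's payoff and B[i][j] is P2's payoff for row i, column j.

def is_eps_nash(A, B, x, y, eps):
    m, n = len(x), len(y)
    u1 = sum(x[i] * A[i][j] * y[j] for i in range(m) for j in range(n))
    u2 = sum(x[i] * B[i][j] * y[j] for i in range(m) for j in range(n))
    # Best pure-strategy deviation for each player, holding the other fixed.
    best1 = max(sum(A[i][j] * y[j] for j in range(n)) for i in range(m))
    best2 = max(sum(x[i] * B[i][j] for i in range(m)) for j in range(n))
    return best1 - u1 <= eps and best2 - u2 <= eps

# Uniform play in matching pennies is an exact (ε = 0) Nash equilibrium.
A = [[1, -1], [-1, 1]]
B = [[-a for a in row] for row in A]
print(is_eps_nash(A, B, [0.5, 0.5], [0.5, 0.5], 0.0))   # True
```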

Increasing Scalability: Sparse Algorithms
- We can solve games exactly using a linear program, but this requires O(N^2) memory, where N is the size of the game
- Idea: design algorithms that trade time/quality for memory
- Accept that the algorithm outputs an approximately correct answer: an ε-Nash equilibrium (this ε is different from the ε error introduced by abstraction)
- Ideally, ε goes to 0 as the algorithm's runtime goes to infinity
- Memory usage is linear in the size of the game
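
For a tiny matrix game, the linear program mentioned above can be written out concretely: maximize the game value v subject to the row player's mixed strategy guaranteeing at least v against every opponent column. A sketch assuming SciPy is available (not the lecture's implementation):

```python
# Solve a zero-sum matrix game by LP: maximize v subject to x >= 0,
# sum(x) = 1, and (A'x)_j >= v for every column j. Variables are [x, v].
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, -1.0], [-1.0, 1.0]])     # matching pennies (row payoffs)
m, n = A.shape
c = np.zeros(m + 1)
c[-1] = -1.0                                  # maximize v == minimize -v
A_ub = np.hstack([-A.T, np.ones((n, 1))])     # v - (A'x)_j <= 0 for all j
b_ub = np.zeros(n)
A_eq = np.zeros((1, m + 1))
A_eq[0, :m] = 1.0                             # probabilities sum to one
b_eq = np.array([1.0])
bounds = [(0, None)] * m + [(None, None)]     # x >= 0, v free
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:m], res.x[-1])                   # strategy near [0.5, 0.5], value near 0
```

The constraint matrix stores the whole payoff matrix, which is where the quadratic memory cost the slide mentions comes from.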

Plan for the Next Part of the Presentation
We will introduce counterfactual regret minimization (CFR)-based equilibrium finding in two-player zero-sum games:
- The most common algorithm
- Relatively easy to understand and implement
- We start from first principles and proceed step by step to the most advanced algorithms in this family
- It used to be the best in practice, but there is now a gradient-based algorithm that is better in practice [Kroer et al. EC-17]; that algorithm, however, is much more difficult to understand
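
As a preview, regret matching, the per-information-set building block of CFR, can be sketched in self-play on rock-paper-scissors: each player acts proportionally to positive cumulative regret, and the average strategies approach the uniform equilibrium. This is an illustrative sketch under those assumptions, not the algorithm exactly as presented in the course.

```python
# Regret matching in self-play on rock-paper-scissors (0=R, 1=P, 2=S).
import random
random.seed(0)

PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]   # row player's payoff

def u(pl, a, o):
    """Payoff to player pl for action a when the opponent played o."""
    return PAYOFF[a][o] if pl == 0 else -PAYOFF[o][a]

def strategy(regret):
    """Play proportionally to positive cumulative regret (else uniform)."""
    pos = [max(r, 0.0) for r in regret]
    s = sum(pos)
    return [p / s for p in pos] if s > 0 else [1 / 3] * 3

def sample(p):
    r, acc = random.random(), 0.0
    for i, pi in enumerate(p):
        acc += pi
        if r < acc:
            return i
    return len(p) - 1

T = 20000
regret = [[0.0] * 3 for _ in range(2)]
avg = [[0.0] * 3 for _ in range(2)]
for _ in range(T):
    strats = [strategy(regret[p]) for p in range(2)]
    acts = [sample(strats[p]) for p in range(2)]
    for pl in range(2):
        opp = acts[1 - pl]
        for a in range(3):        # regret of not having played a instead
            regret[pl][a] += u(pl, a, opp) - u(pl, acts[pl], opp)
        for a in range(3):        # it is the *average* strategy that converges
            avg[pl][a] += strats[pl][a]

norm = [[x / T for x in avg[p]] for p in range(2)]
print(norm[0])   # each entry close to 1/3
```

The current strategies can cycle; only the time-averaged strategies converge to equilibrium, which is exactly the guarantee CFR inherits.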