Finding equilibria in large sequential games of imperfect information

Presentation transcript:

Finding equilibria in large sequential games of imperfect information Andrew Gilpin and Tuomas Sandholm Carnegie Mellon University Computer Science Department

Motivation: Poker
Poker is a wildly popular card game. This year’s World Series of Poker prize pool surpassed $103 million, including $56 million for the World Championship event, and ESPN is broadcasting parts of the tournament. Poker presents several challenges for AI: imperfect information, risk assessment and management, deception (bluffing, slow-playing), and counter-deception (calling a bluff).

Rhode Island Hold’em poker: The Deal

Rhode Island Hold’em poker: Round 1

Rhode Island Hold’em poker: Round 2

Rhode Island Hold’em poker: Round 3

Rhode Island Hold’em poker: Showdown

Sneak preview of results: Solving Rhode Island Hold’em poker
Rhode Island Hold’em poker was invented as a testbed for AI research [Shi & Littman 2001]. Its game tree has more than 3.1 billion nodes. Previously, the best techniques did not scale to games this large. Using our algorithm, we have computed optimal strategies for this game. This is the largest poker game solved to date by over four orders of magnitude.

Outline of this talk
Game-theoretic foundations: Equilibrium
Model: Ordered games
Abstraction mechanism: Information filters
Strategic equivalence: Game isomorphisms
Algorithm: GameShrink
Solving Rhode Island Hold’em

Game Theory
In multi-agent systems, an agent’s outcome depends on the actions of the other agents. Consequently, an agent’s optimal action depends on the actions of the other agents. Game theory is the mathematical study of such environments, and it provides guidance as to how an agent should act. A game-theoretic equilibrium specifies a strategy for each agent such that no agent wishes to deviate. Such an equilibrium always exists [Nash 1950].
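The "no agent wishes to deviate" condition can be checked mechanically for a small two-player game. The following is a minimal sketch, assuming a game given as two payoff tables and restricted to pure strategies. The existence result cited above concerns mixed strategies, so a pure equilibrium need not exist. The function name and the Prisoner's Dilemma payoffs are illustrative and do not come from the talk.

```python
from typing import List

def is_pure_nash(row_payoff: List[List[float]],
                 col_payoff: List[List[float]],
                 i: int, j: int) -> bool:
    """Return True if (row i, column j) is a pure-strategy Nash equilibrium."""
    # Row player must not gain by switching rows while column j is fixed.
    row_ok = all(row_payoff[k][j] <= row_payoff[i][j]
                 for k in range(len(row_payoff)))
    # Column player must not gain by switching columns while row i is fixed.
    col_ok = all(col_payoff[i][k] <= col_payoff[i][j]
                 for k in range(len(col_payoff[i])))
    return row_ok and col_ok

# Illustrative Prisoner's Dilemma payoffs (action 0 = cooperate, 1 = defect).
PD_ROW = [[-1, -3], [0, -2]]
PD_COL = [[-1, 0], [-3, -2]]

if __name__ == "__main__":
    print(is_pure_nash(PD_ROW, PD_COL, 1, 1))  # mutual defection
    print(is_pure_nash(PD_ROW, PD_COL, 0, 0))  # mutual cooperation
```

In this example only mutual defection passes the check, because either player gains by deviating from mutual cooperation.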

Complexity of computing equilibria
Finding a Nash equilibrium is “A most fundamental computational problem whose complexity is wide open [and] together with factoring … the most important concrete open question on the boundary of P today” [Papadimitriou 2001], even for games with only two players. There are algorithms (requiring exponential time in the worst case) for computing Nash equilibria. Good news: two-person zero-sum matrix games can be solved in polynomial time using linear programming.
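For reference, the linear program behind the zero-sum result can be written down directly. The notation here is an assumption, not taken from the slides: A is the m × n matrix of payoffs to the row player, x is the row player's mixed strategy, and v is the payoff the row player can guarantee.

```latex
\begin{aligned}
\max_{x,\,v}\quad & v \\
\text{s.t.}\quad & \sum_{i=1}^{m} x_i A_{ij} \ge v, \qquad j = 1,\dots,n, \\
& \sum_{i=1}^{m} x_i = 1, \\
& x_i \ge 0, \qquad i = 1,\dots,m.
\end{aligned}
```

Each constraint says that x earns at least v against one pure strategy of the column player, so maximizing v yields a max-min strategy. The program has m + 1 variables and n + 1 constraints besides nonnegativity, which is why its size is polynomial in the size of the matrix.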

What about sequential games?
Sequential games involve turn-taking, moves of chance, and imperfect information. Every sequential game can be converted into a simultaneous-move game. Basic idea: each strategy in the simultaneous-move game specifies an action for every possible situation in the sequential game. This approach leads to an exponential blowup in the number of strategies.
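The blowup is easy to quantify: if a pure strategy must pick one action in every situation, the number of pure strategies is the product of the action counts. A small sketch, with a hypothetical function name and made-up situation counts:

```python
import math
from typing import Sequence

def count_pure_strategies(actions_per_situation: Sequence[int]) -> int:
    """Number of pure strategies when one action is chosen per situation."""
    # One independent choice per situation, so the counts multiply.
    return math.prod(actions_per_situation)

if __name__ == "__main__":
    # Hypothetical player with 3 available actions in each of 20 situations.
    print(count_pure_strategies([3] * 20))
```

Even this toy player, with only 20 decision situations, already has 3^20 pure strategies, while the sequential description of the same choices has just 60 (situation, action) pairs.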

Sequence form representation
The sequence form is an alternative representation that is more compact [Koller, Megiddo, von Stengel, Romanovskii]. Using the sequence form, two-player zero-sum games with perfect recall can be solved in time polynomial in the size of the game tree. But Texas Hold’em has 10^18 nodes.
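As a sketch of why the sequence form helps, the equilibrium computation can be stated as a single linear program over realization plans. The notation below follows a common textbook presentation and is an assumption here, not taken from the slides: x and y are the players' realization plans over their own action sequences, E x = e and F y = f are the linear constraints that make them valid plans, A holds player 1's expected payoffs for pairs of sequences, and q is a vector of dual variables for player 2's constraints.

```latex
\begin{aligned}
\max_{x,\,q}\quad & f^{\top} q \\
\text{s.t.}\quad & F^{\top} q - A^{\top} x \le 0, \\
& E x = e, \\
& x \ge 0.
\end{aligned}
```

This comes from taking the dual of player 2's best-response problem, \(\min_{y} \{ x^{\top} A y : F y = f,\ y \ge 0 \}\), for a fixed x. The number of variables and constraints grows with the number of sequences and information sets, which is linear in the size of the game tree rather than exponential.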

Our approach
Instead of developing an equilibrium-finding algorithm per se, we introduce an automated abstraction technique that results in a smaller, equivalent game. We prove that a Nash equilibrium in the smaller game corresponds to a Nash equilibrium in the original game. Our technique applies to n-player sequential games with observed actions and ordered signals.

Game with ordered signals (a.k.a. ordered game)
Players I = {1,…,n}
Stage games G = G_1,…,G_r
Player label L
Game-ending nodes ω
Signal alphabet Θ
Signal quantities κ = κ_1,…,κ_r and γ = γ_1,…,γ_r
Signal probability distribution p
Partial ordering ≥ of subsets of Θ
Utility function u (increasing in private signals)
Example: κ = (0,1,1), γ = (1,0,0), Θ = {2♠,…,A♦}, p is uniform, and the ordering is by hand rank.

Information filters
Observation: we can make games smaller by filtering the information a player receives. Instead of observing a specific signal exactly, a player observes a filtered set of signals, e.g. receiving the signal {A♠,A♣,A♥,A♦} instead of A♠. Combining an ordered game and a valid information filter yields a filtered ordered game. Proposition: a filtered ordered game is a finite sequential game with perfect recall. Corollary: if the filtered ordered game is two-person zero-sum, we can solve it in polynomial time using linear programming. Information filters isolate signal history from action history.
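An information filter can be pictured as a function from an exact signal to the set of signals the player can no longer tell apart. A minimal sketch, using a hypothetical filter that hides suits; this is an illustration of the idea only, since whether such a filter is valid or preserves equilibria depends on the game:

```python
from typing import FrozenSet, List

RANKS = "23456789TJQKA"
SUITS = "♠♣♥♦"
DECK: List[str] = [r + s for r in RANKS for s in SUITS]

def rank_filter(card: str) -> FrozenSet[str]:
    """Map a card to the set of all cards that share its rank."""
    # The player observes only this set, not the specific card.
    return frozenset(c for c in DECK if c[0] == card[0])

if __name__ == "__main__":
    print(sorted(rank_filter("A♠")))
    # Number of distinct filtered signals the player can observe.
    print(len({rank_filter(c) for c in DECK}))
```

Under this filter the 52 exact signals collapse to 13 filtered signals, which is the sense in which filtering makes the game smaller.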

Filtered signal trees
Every filtered ordered game has a corresponding filtered signal tree. Each edge corresponds to the revelation of some signal, and each path corresponds to the revelation of a set of signals. Our algorithms operate directly on the filtered signal tree. We never load the full game representation into memory.

Filtered signal tree: Example

Ordered game isomorphic relation
The ordered game isomorphic relation captures the notion of strategic symmetry between nodes. We define the relationship recursively. Two leaves are ordered game isomorphic if the payoffs to all players are the same at each leaf, for all action histories. Two internal nodes are ordered game isomorphic if they are siblings and there is a bijection between their children such that only ordered game isomorphic nodes are matched. We can compute this relationship efficiently using dynamic programming and perfect matching computations in a bipartite graph.
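The recursive definition can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes a hypothetical tree encoding (a leaf is a tuple of payoffs, an internal node is a list of children), ignores the sibling requirement and the dependence on action histories, and recomputes subproblems instead of caching them as a dynamic program would. The bijection test uses a standard augmenting-path bipartite matching.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: leaf = tuple of payoffs, internal node = list of children.
Node = Union[Tuple[float, ...], list]

def has_perfect_matching(compatible: List[List[bool]]) -> bool:
    """Augmenting-path bipartite matching on a square compatibility matrix."""
    n = len(compatible)
    match_of_right = [-1] * n  # right vertex -> matched left vertex, or -1

    def try_assign(left: int, seen: List[bool]) -> bool:
        for right in range(n):
            if compatible[left][right] and not seen[right]:
                seen[right] = True
                # Take a free right vertex, or re-route its current partner.
                if (match_of_right[right] == -1
                        or try_assign(match_of_right[right], seen)):
                    match_of_right[right] = left
                    return True
        return False

    return all(try_assign(left, [False] * n) for left in range(n))

def isomorphic(a: Node, b: Node) -> bool:
    """Simplified recursive check in the spirit of the relation above."""
    if isinstance(a, tuple) and isinstance(b, tuple):
        return a == b  # leaves: payoffs must agree
    if isinstance(a, tuple) or isinstance(b, tuple):
        return False   # a leaf never matches an internal node
    if len(a) != len(b):
        return False   # no bijection between children is possible
    compatible = [[isomorphic(x, y) for y in b] for x in a]
    return has_perfect_matching(compatible)

if __name__ == "__main__":
    left = [[(1, -1), (0, 0)], [(2, -2)]]
    right = [[(2, -2)], [(0, 0), (1, -1)]]
    print(isomorphic(left, right))
```

The two example trees differ only in the order of their children, so a bijection that matches isomorphic subtrees exists at every level.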

Ordered game isomorphic abstraction transformation
This operation transforms an existing information filter into a new filter that merges two ordered game isomorphic nodes. The new filter yields a smaller, abstracted game. Theorem: if a strategy profile is a Nash equilibrium in the smaller, abstracted game, then it is a Nash equilibrium in the original game.

Applying the ordered game isomorphic abstraction transformation

GameShrink: Efficiently computing ordered game isomorphic abstraction transformations
Recall: we have a dynamic program for determining if two nodes of the filtered signal tree are ordered game isomorphic. Algorithm: starting from the top of the filtered signal tree, perform the transformation where applicable. Approximation algorithm: instead of requiring a perfect matching, require a matching with a penalty below some threshold.

GameShrink: Efficiently computing ordered game isomorphic abstraction transformations
The Union-Find data structure provides an efficient representation of the information filter, with linear memory and almost linear time. We can eliminate certain perfect matching computations by using easy-to-check necessary conditions. Compact histogram databases for storing win/loss frequencies speed up the checks.
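A Union-Find structure keeps track of which signals have been merged into the same filtered signal. The following is a generic textbook sketch with path compression and union by size, not the GameShrink code; the class and method names and the card labels are illustrative.

```python
from typing import Dict, Hashable

class UnionFind:
    """Disjoint sets with path compression and union by size."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.size: Dict[Hashable, int] = {}

    def find(self, x: Hashable) -> Hashable:
        """Return the representative of x's set, adding x if it is new."""
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        """Merge the sets containing a and b."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

if __name__ == "__main__":
    merged = UnionFind()
    merged.union("A♠", "A♣")
    merged.union("A♥", "A♦")
    merged.union("A♠", "A♦")
    print(merged.find("A♣") == merged.find("A♥"))
    print(merged.find("K♠") == merged.find("A♠"))
```

Storing one parent pointer and one size per signal gives the linear memory mentioned above, and the two heuristics together give the near-linear total running time.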

Solving Rhode Island Hold’em poker
GameShrink computes all ordered game isomorphic abstraction transformations in under one second. Without abstraction, the linear program has 91,224,226 rows and columns. After applying GameShrink, the linear program has only 1,237,238 rows and columns. By solving the resulting linear program, we are able to compute optimal min-max strategies for this game. The CPLEX barrier method takes 7 days, 17 hours and 25 GB of RAM to solve it. This is the largest poker game solved to date by over four orders of magnitude.

Comparison to previous research
Rule-based: limited success in even small poker games.
Simulation/learning: do not take the multi-agent aspect into account.
Game-theoretic: manual abstraction (“Approximating Game-Theoretic Optimal Strategies for Full-scale Poker”, Billings, Burch, Davidson, Holte, Schaeffer, Schauenberg, Szafron, IJCAI-03, Distinguished Paper Award) and automated abstraction.

Directions for future work
Computing strategies for larger games (requires approximation of solutions)
Tournament poker
More than two players
Other types of abstraction

Summary
Introduced an automatic method for performing abstractions in a broad class of games. Introduced information filters as a technique for working with games with imperfect information. Developed an equilibrium-preserving abstraction transformation, along with an efficient algorithm. Described a simple extension that yields an approximation algorithm for tackling even larger games. Solved the largest poker game to date, playable online at http://www.cs.cmu.edu/~gilpin/gsi.html
Thank you very much for your interest.