Better automated abstraction techniques for imperfect information games, with application to Texas Hold’em poker
Andrew Gilpin and Tuomas Sandholm, CMU

Presentation transcript:

Better automated abstraction techniques for imperfect information games, with application to Texas Hold’em poker*
Andrew Gilpin and Tuomas Sandholm, CMU, CSD
*This material is based upon work supported by the National Science Foundation under ITR grant IIS

Games and information
- Perfect information games: agents have complete knowledge of the world’s state (e.g., chess, Go)
- Imperfect information games: agents are only partially informed about the world’s state. For example:
  - A robot facing adversaries in an uncertain, stochastic environment
  - Most economic situations where agents have private information
  - Most card games where the opponents’ cards are hidden

High-level view of GS2, our Texas Hold’em poker player
- The game tree has ~10^18 leaves
- This is too much to consider at once (even for GameShrink)
- We split the 4 betting rounds into 2 phases:
  - We solve the first phase (3 rounds) offline
  - We solve the second phase (2 rounds, i.e., rounds 3 and 4) in a real-time equilibrium computation, using updated beliefs from the first 2 rounds

References
[1] D. Billings, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron. Approximating game-theoretic optimal strategies for full-scale poker. In IJCAI.
[2] D. Billings, M. Bowling, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron. Game tree search with adaptation in stochastic imperfect information games. In Computers and Games. Springer-Verlag.
[3] A. Gilpin and T. Sandholm. Finding equilibria in large sequential games of imperfect information. In ACM-EC.
[4] A. Gilpin and T. Sandholm. A competitive Texas Hold’em poker player via automated abstraction and real-time equilibrium computation. In AAAI.
[5] P.B. Miltersen and T.B. Sørensen. A near-optimal strategy for a heads-up no-limit Texas Hold'em poker tournament. In AAMAS.

GameShrink [3]
- Automated abstraction technique that yields a smaller, equivalent game
- Nash equilibria in the smaller game correspond to Nash equilibria in the original game
- The smaller, abstracted game can be solved using standard techniques
- This method was used to solve Rhode Island Hold’em poker
- For even larger games, GameShrink can be used as an approximation algorithm
  - Used to construct GS1 [4]

Experimental results
- Sparbot: game theory-based player, manual abstraction [1]
- Vexbot: opponent modeling, miximax search with statistical sampling [2]

Opponent                                                               Series won by GS2   Win rate (small bets per 100)
GS1                                                                    38 of …             …
Sparbot                                                                28 of …             …
Vexbot                                                                 32 of …             …
GS2 without improved abstraction and without estimated payoffs         48 of …             …
GS2 without improved abstraction                                       35 of …             …
GS2 without estimated payoffs                                          44 of …             …

Challenges for AI
- Imperfect information
- Risk assessment and management
- Speculation and counter-speculation
- Signaling and interpreting signals (misrepresentation, bluffing, etc.)

Ongoing research
- Provable bounds on approximation
- Improved equilibrium-finding algorithms
  - Non-smooth minimization techniques
  - Interior-point method tailored for the sequence form LP
- Incorporating opponent modeling in the game-theoretic framework
- Tournament poker (e.g., [5])
- No-limit Texas Hold’em
- Games with more than 2 players
  - May need to use an alternative solution concept

Game theory
- In multi-agent systems, an agent’s outcome depends on the actions of the other agents
- Consequently, an agent’s optimal action depends on the actions of the other agents
- Game theory provides guidance as to how an agent should act
- A game-theoretic equilibrium specifies a strategy for each agent such that no agent wishes to deviate
- Computing an equilibrium of a game is hard, but:
  - Two-person zero-sum games can be solved in polynomial time using the sequence form and linear programming

Optimized approximate abstraction
- The original version of GameShrink yielded lopsided abstractions when used as an approximation algorithm
- Now we instead find the abstractions via clustering and integer programming. For each betting round of the game:
  - For each group of hands in that round:
    - Use k-means clustering to determine the best clustering for all possible values of k (using win probability as the metric)
    - For each value of k, compute the expected error
  - Solve an integer program (IP) to allocate the buckets such that the overall expected error is minimized
    (Solving these IPs is easy in practice)

Mitigating the effect of having multiple phases (round-based abstraction)
- For the leaves of Phase I, GS1 and Sparbot assumed rollout with no betting for the final round
- We can do better by estimating the betting that occurs in later rounds
  - Incorporate this information in the LP for Phase I
  - For each possible hand strength and for each possible betting situation, we store the probability of each action
  - This data is mined from hundreds of thousands of hands played
- We use these estimates to compute the payoffs at the Phase I leaves in the LP
- Example of betting in the 4th round: Player 1 has bet; Player 2 to fold, call, or raise
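
The clustering and bucket-allocation loop described under "Optimized approximate abstraction" can be illustrated with a small sketch. It is not the GS2 implementation: the function names, the toy win probabilities, and the use of summed squared distance as the "expected error" are all assumptions made for illustration, and a small dynamic program stands in for the integer program mentioned in the transcript.

```python
from typing import Dict, List, Sequence, Tuple


def kmeans_1d_error(values: Sequence[float], k: int, iters: int = 50) -> float:
    """Lloyd's k-means on scalars; returns summed squared distance to centroids.

    Assumes 1 <= k <= len(values).
    """
    pts = sorted(values)
    n = len(pts)
    # Deterministic start: spread the k centroids across the sorted values.
    centroids = [pts[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Empty clusters keep their previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sum(min((p - c) ** 2 for c in centroids) for p in pts)


def error_table(values: Sequence[float], max_k: int) -> Dict[int, float]:
    """Clustering error of one group of hands for every k from 1 to max_k."""
    return {k: kmeans_1d_error(values, k) for k in range(1, max_k + 1)}


def allocate_buckets(errors: List[Dict[int, float]], total: int) -> List[int]:
    """Pick one k per group, using at most `total` buckets, minimizing summed error.

    Hypothetical stand-in for the IP: a dynamic program over buckets used so far.
    """
    # best[b] = (lowest error, chosen k per group) using exactly b buckets so far.
    best: Dict[int, Tuple[float, List[int]]] = {0: (0.0, [])}
    for group in errors:
        nxt: Dict[int, Tuple[float, List[int]]] = {}
        for used, (err, picks) in best.items():
            for k, e in group.items():
                b = used + k
                if b > total:
                    continue
                if b not in nxt or err + e < nxt[b][0]:
                    nxt[b] = (err + e, picks + [k])
        best = nxt
    if not best:
        raise ValueError("bucket budget too small for the number of groups")
    return min(best.values(), key=lambda t: t[0])[1]


# Toy win probabilities (made up): one spread-out group, one tight group.
group_a = [0.10, 0.12, 0.50, 0.52, 0.90, 0.92]
group_b = [0.40, 0.41, 0.42, 0.43]
tables = [error_table(group_a, 3), error_table(group_b, 3)]
allocation = allocate_buckets(tables, total=4)
print(allocation)
```

On this toy data the budget of four buckets goes mostly to the spread-out group, since splitting the tight group barely reduces its error. A real abstraction would weight each group by how often it is reached and would use an IP solver when further constraints are needed.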