Better automated abstraction techniques for imperfect information games, with application to Texas Hold’em poker*

Andrew Gilpin and Tuomas Sandholm, CMU, CSD

*This material is based upon work supported by the National Science Foundation under ITR grant IIS

Games and information
- Perfect information games: agents have complete knowledge of the world’s state (e.g., chess, Go)
- Imperfect information games: agents are partially informed about the world’s state. For example:
  - Robot facing adversaries in an uncertain, stochastic environment
  - Most economic situations where agents have private information
  - Most card games where the opponents’ cards are hidden

High-level view of GS2, our Texas Hold’em poker player
- Game tree has ~10^18 leaves
- This is too much to consider at once (even for GameShrink)
- We split the 4 betting rounds into 2 phases
  - We solve the first phase (3 rounds) offline
  - We solve the second phase (2 rounds) in a real-time equilibrium computation, using updated beliefs from the first 2 rounds

References
1. D. Billings, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron. Approximating game-theoretic optimal strategies for full-scale poker. In IJCAI.
2. D. Billings, M. Bowling, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron. Game tree search with adaptation in stochastic imperfect information games. In Computers and Games. Springer-Verlag.
3. A. Gilpin and T. Sandholm. Finding equilibria in large sequential games of imperfect information. In ACM-EC.
4. A. Gilpin and T. Sandholm. A competitive Texas Hold’em poker player via automated abstraction and real-time equilibrium computation. In AAAI.
5. P.B. Miltersen and T.B. Sørensen. A near-optimal strategy for a heads-up no-limit Texas Hold’em poker tournament.
   In AAMAS.

GameShrink [3]
- Automated abstraction technique that yields smaller, equivalent game
- Nash equilibria in the smaller game correspond to Nash equilibria in the original game
- Smaller, abstracted game can be solved using standard techniques
- This method was used to solve Rhode Island Hold’em poker
- For even larger games, GameShrink can be used as an approximation algorithm
  - Used to construct GS1 [4]

Experimental results
- Sparbot: Game theory-based player, manual abstraction [1]
- Vexbot: Opponent modeling, miximax search with statistical sampling [2]

Challenges for AI
- Imperfect information
- Risk assessment and management
- Speculation & counter-speculation
- Signaling and interpreting signals (misrepresentation, bluffing, etc.)

Ongoing research
- Provable bounds on approximation
- Improved equilibrium-finding algorithms
  - Non-smooth minimization techniques
  - Interior-point method tailored for the sequence form LP
- Incorporating opponent modeling in game-theoretic framework
- Tournament poker (e.g., [5])
- No-limit Texas Hold’em
- Games with more than 2 players
  - May need to use alternative solution concept

Game theory
- In multi-agent systems, an agent’s outcome depends on the actions of the other agents
- Consequently, an agent’s optimal action depends on the actions of the other agents
- Game theory provides guidance as to how an agent should act
- A game-theoretic equilibrium specifies a strategy for each agent such that no agent wishes to deviate
- Computing an equilibrium of a game is hard, but:
  - Two-person zero-sum games can be solved in poly-time using the sequence form and linear programming

Optimized approximate abstraction
- Original version of GameShrink yielded lopsided abstractions when used as an abstraction algorithm
- Now we instead find the abstractions via clustering and integer programming:
  - For each betting round of the game:
    - For each group of hands in that round:
      - Use k-means clustering to determine best clustering for all possible values of k (using win probability as metric)
      - For
        each value of k, compute the expected error
    - Solve an integer program (IP) to allocate the buckets such that the overall expected error is minimized (solving these IPs is easy in practice)

Mitigating the effect of having multiple phases (round-based abstraction)
- For the leaves of Phase I, GS1 and Sparbot assumed rollout with no betting for the final round
- Can do better by estimating the betting that occurs in later rounds
  - Incorporate this information in the LP for Phase I
- For each possible hand strength and for each possible betting situation, we store the probability of each action
  - This data is mined from hundreds of thousands of hands played
- We use these estimated payoffs as the payoffs in the LP for Phase I

Example of betting in 4th round
- Player 1 has bet. Player 2 to fold, call, or raise

Opponent                                                          Series won by GS2   Win rate (small bets per 100)
GS1                                                               38 of
Sparbot                                                           28 of
Vexbot                                                            32 of
GS2 without improved abstraction and without estimated payoffs    48 of
GS2 without improved abstraction                                  35 of
GS2 without estimated payoffs                                     44 of
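
The clustering and bucket-allocation steps listed under "Optimized approximate abstraction" can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kmeans_1d_error`, `error_table`, `allocate_buckets`) are hypothetical, squared distance in win probability is assumed as the error measure, and a small dynamic program stands in for the integer program. The substitution works here only because the toy objective is a sum of independent per-group errors under a single bucket budget.

```python
from typing import List, Sequence


def kmeans_1d_error(values: Sequence[float], k: int, iters: int = 50) -> float:
    """Run Lloyd's k-means on scalar win probabilities; return total squared error."""
    pts = sorted(values)
    k = min(k, len(pts))
    # Assumed initialisation: centers at evenly spaced quantiles of the sorted data.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            clusters[nearest].append(p)
        # Empty clusters keep their previous center.
        updated = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sum(min((p - c) ** 2 for c in centers) for p in pts)


def error_table(values: Sequence[float], max_k: int) -> List[float]:
    """Entry k-1 holds the clustering error of one group of hands when given k buckets."""
    return [kmeans_1d_error(values, k) for k in range(1, max_k + 1)]


def allocate_buckets(errors: List[List[float]], budget: int) -> List[int]:
    """Choose one bucket count per group (at least 1 each, at most `budget` in total)
    so that the summed error is minimal. errors[g][k-1] is group g's error with k buckets."""
    inf = float("inf")
    best = [0.0] + [inf] * budget  # best[b]: minimal error using exactly b buckets so far
    choices: List[List[int]] = []
    for errs in errors:
        new = [inf] * (budget + 1)
        pick = [0] * (budget + 1)
        for used in range(budget + 1):
            if best[used] == inf:
                continue
            for k, err in enumerate(errs, start=1):
                total = used + k
                if total <= budget and best[used] + err < new[total]:
                    new[total] = best[used] + err
                    pick[total] = k
        best = new
        choices.append(pick)
    b = min(range(budget + 1), key=lambda i: best[i])
    if best[b] == inf:
        raise ValueError("budget is too small to give every group a bucket")
    allocation: List[int] = []
    for pick in reversed(choices):  # walk back through the recorded choices
        allocation.append(pick[b])
        b -= pick[b]
    return allocation[::-1]
```

For instance, with per-group error tables `[[10.0, 4.0, 1.0], [3.0, 2.5, 2.4]]` and a budget of 4 buckets, `allocate_buckets` gives the first group three buckets and the second group one, since the first group's error drops much faster as buckets are added.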
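
The stored action probabilities described under "Mitigating the effect of having multiple phases" could be tabulated along the following lines. This is only a sketch: the record layout (hand-strength bucket, betting situation, action), the situation label `"facing_bet"`, and the function name `action_frequencies` are assumptions made for illustration. The poster does not specify how the hand histories are represented or how the resulting probabilities are turned into Phase I leaf payoffs.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (hand-strength bucket, betting situation, observed action)
Observation = Tuple[int, str, str]


def action_frequencies(
    observations: Iterable[Observation],
) -> Dict[Tuple[int, str], Dict[str, float]]:
    """Estimate P(action | hand strength, betting situation) from observed play."""
    counts: Dict[Tuple[int, str], Counter] = defaultdict(Counter)
    for strength, situation, action in observations:
        counts[(strength, situation)][action] += 1
    table: Dict[Tuple[int, str], Dict[str, float]] = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        # Normalise raw counts into empirical probabilities.
        table[key] = {action: n / total for action, n in counter.items()}
    return table


# Toy data mirroring the poster's example: Player 1 has bet,
# Player 2 may fold, call, or raise.
sample = [
    (3, "facing_bet", "call"),
    (3, "facing_bet", "call"),
    (3, "facing_bet", "fold"),
    (3, "facing_bet", "raise"),
]
probabilities = action_frequencies(sample)
```

A table of this kind only records how often each action was observed; combining it with pot sizes and showdown outcomes to obtain the estimated payoffs is a separate step that this sketch leaves out.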