Extensive-Form Game Abstraction with Bounds


Outline
- Motivation
- Abstractions of extensive-form games
- Theoretical guarantees on abstraction quality
- Computing abstractions: hardness and algorithms
- Experiments

ε-Nash equilibrium
Pipeline: real game → (automated) abstraction → abstracted game → equilibrium-finding algorithm → Nash equilibrium of the abstraction → map back to the real game → ε-Nash equilibrium of the real game.

Why is game abstraction difficult?
- Abstraction pathologies: sometimes refining an abstraction makes the result worse, and every equilibrium of the refinement can be worse.
- Tuomas presented numerous papers on game abstraction without solution-quality bounds; lossless abstractions are too large.
- Extensive-form games are particularly difficult to analyze: information sets cut across subtrees, so a player might be best responding to nodes in different parts of the game tree.

Extensive-form games: definition
- Game tree; branches denote actions.
- Information sets.
- Payoffs at the leaves.
- Perfect recall: players remember their past actions. Imperfect recall: players may forget past actions.
[Figure: game tree with chance node C, player nodes P1/P2, and leaf payoffs 1.5, -3, 8, -6, -6]

Counterfactual value
- Defined for each information set I: the expected value from the information set, assuming the player plays to reach I, rescaled by the probability of reaching I.
- Example (uniform distribution everywhere, bottom information set): reach probability 1/6; conditional node distribution (1/2, 1/2).
- V(I) = (1/2)·(1/2)·8 − (1/2)·(1/2)·6 = 0.5
[Figure: game tree with chance node C, player nodes P1/P2, and leaf payoffs 1.5, -3, 6, 8, -6]
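The computation above can be sketched in code. This is a minimal sketch; the node layout and leaf payoffs are assumptions chosen only to reproduce the slide's worked example, not the actual tree from the talk.

```python
# Counterfactual value of an information set I: weight each node s in I by
# its conditional reach probability P(s | I), then take the expected payoff
# under the player's strategy, assuming the player plays to reach I.

def counterfactual_value(node_probs, node_payoffs, strategy):
    """node_probs[s]   : P(s | I) for each node s in the info set.
    node_payoffs[s][a] : leaf payoff of taking action a at node s.
    strategy[a]        : probability the player assigns to action a."""
    return sum(
        p * sum(strategy[a] * u for a, u in enumerate(payoffs))
        for p, payoffs in zip(node_probs, node_payoffs)
    )

# Two nodes in I, reached with conditional probability 1/2 each; a uniform
# strategy over two actions. Payoffs are hypothetical placeholders.
v = counterfactual_value(
    node_probs=[0.5, 0.5],
    node_payoffs=[[8, 0], [0, -6]],
    strategy=[0.5, 0.5],
)
print(v)  # 0.5, matching the slide's V(I)
```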

Regret
- Defined for each action a: the change in expected value when taking a, holding everything else constant.
- Example (uniform distribution everywhere, bottom information set): V(I) = (1/2)·(1/2)·8 − (1/2)·(1/2)·6 = 0.5.
- To get r(e), set σ₁′(e) = 1: r(e) = V_e(I) − V(I) = 3.5.
[Figure: game tree with chance node C, player nodes P1/P2, and leaf payoffs 1.5, -3, 6, 8, -6]
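The deviation can be made concrete as follows. This is a self-contained sketch; as before, the node probabilities and payoffs are assumptions chosen to reproduce the slide's numbers (V(I) = 0.5 and r(e) = 3.5), not the actual tree.

```python
# Regret of an action a at information set I: the change in counterfactual
# value when the player deviates to play a with probability 1, holding
# everything else fixed.

def info_set_value(node_probs, node_payoffs, strategy):
    return sum(
        p * sum(strategy[a] * u for a, u in enumerate(payoffs))
        for p, payoffs in zip(node_probs, node_payoffs)
    )

def regret(node_probs, node_payoffs, strategy, action):
    # Deviation strategy: play `action` with probability 1.
    deviate = [1.0 if a == action else 0.0 for a in range(len(strategy))]
    return (info_set_value(node_probs, node_payoffs, deviate)
            - info_set_value(node_probs, node_payoffs, strategy))

probs, payoffs = [0.5, 0.5], [[8, 0], [0, -6]]
print(regret(probs, payoffs, [0.5, 0.5], action=0))  # 3.5, as on the slide
```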

Abstraction
- Goal: reduce the number of decision variables while maintaining low regret.
- Method: merge information sets and/or remove actions.
- Constraints: must define a bijection between nodes; nodes in the bijection must have the same ancestor sequences over the other players, and the same descendant sequences over all players.
- Merging might create imperfect recall, which leads to worse bounds.
[Figure: abstracted game tree with chance node C, player nodes P1/P2, and leaf payoffs 1.5, 8, -6, 9, -7]

Abstraction payoff error
- Quantifies the error in an abstraction: measures similarity of the abstraction and the full game, based on the bijection.
- Maximum difference between mapped leaf nodes: first mapping 1, second mapping 2.
- Formally:
  Leaf node: ε_R(s) = |u(s) − u(s′)|
  Player node: ε_R(s) = max_c ε_R(c)
  Nature node: ε_R(s) = Σ_c p(c)·ε_R(c)
[Figure: game tree with leaf payoffs 1.5, 8, -6 mapped to abstract leaf payoffs 9, -8]
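The recursive definition above translates directly into code. The tree encoding here is an assumption: nodes are tuples `('leaf', u)`, `('player', [children])`, or `('nature', [(p, child), ...])`, and the bijection is represented implicitly by passing the real tree and its abstract counterpart in the same shape.

```python
# Recursive payoff error of an abstraction, following the slide's rules:
# leaf nodes take |u(s) - u(s')|, player nodes take the max over children,
# and nature nodes take the probability-weighted sum over children.

def payoff_error(real, abstract):
    kind = real[0]
    if kind == 'leaf':
        return abs(real[1] - abstract[1])          # |u(s) - u(s')|
    if kind == 'player':
        return max(payoff_error(c, c2)
                   for c, c2 in zip(real[1], abstract[1]))
    # nature node: probability-weighted sum over children
    return sum(p * payoff_error(c, c2)
               for (p, c), (_, c2) in zip(real[1], abstract[1]))

# Hypothetical two-leaf example: a nature node mixing 50/50 over leaves
# whose abstract utilities differ by 1 and 2 respectively.
real = ('nature', [(0.5, ('leaf', 8)), (0.5, ('leaf', -6))])
abst = ('nature', [(0.5, ('leaf', 9)), (0.5, ('leaf', -8))])
print(payoff_error(real, abst))  # 0.5*1 + 0.5*2 = 1.5
```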

Abstraction chance node error
- Quantifies the error in an abstraction, based on the bijection.
- Measures how much the outcome probabilities of mapped nature nodes differ.
- Formally:
  Player node: ε_0(s) = max_c ε_0(c)
  Nature node: ε_0(s) = Σ_c |p(c) − p(c′)|
[Figure: game tree with chance nodes whose outcome probabilities are 1/3, 2/3 and 2/3, 1/3, and leaf payoffs 1.5, 8, -6, 9, -8]

Abstraction chance distribution error
- Quantifies the error in an abstraction, based on the bijection.
- Measures how much the conditional node distributions of mapped information sets differ.
- Formally:
  Infoset node: ε_0(s) = |p(s|I) − p(s′|I′)|
  Infoset: ε_0(I) = Σ_{s∈I} ε_0(s)
[Figure: game tree with chance nodes whose outcome probabilities are 1/3, 2/3 and 2/3, 1/3, and leaf payoffs 1.5, 8, -6, 9, -8]

Bounds on abstraction quality
Given:
- the original perfect-recall game,
- an abstraction that satisfies our constraints, and
- an abstraction strategy with bounded regret on each action,
we get an ε-Nash equilibrium in the full game.
Perfect-recall abstraction error for player i:
  2·ε_i^R + Σ_{j∈H_0} ε_j^0·W + Σ_{j∈H_i} 2·ε_j^0·W + r_i
Imperfect-recall abstraction error: the same as for perfect recall, but |H_i| times larger, i.e. linearly worse in game depth.
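The arithmetic of the perfect-recall bound can be illustrated as below. The variable names are placeholders, not the paper's notation, and W is assumed to be the payoff-scaling constant from the theorem; this only shows how the error terms combine, not the derivation.

```python
# Sketch of the perfect-recall error bound for player i:
#   2*eps_R_i                          (payoff error term)
#   + sum over chance infosets j of eps0_j * W
#   + sum over player-i infosets j of 2 * eps0_j * W
#   + r_i                              (regret of the abstract strategy)

def perfect_recall_bound(eps_R_i, chance_eps0, player_eps0, W, r_i):
    return (2 * eps_R_i
            + sum(e * W for e in chance_eps0)
            + sum(2 * e * W for e in player_eps0)
            + r_i)

# Hypothetical numbers, just to show how the terms combine:
print(perfect_recall_bound(eps_R_i=1.5, chance_eps0=[0.1],
                           player_eps0=[0.2], W=10, r_i=0.05))  # 8.05
```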

Complexity and structure
- It is NP-hard to minimize our bound (both for perfect and imperfect recall).
- Determining whether two trees are topologically mappable is graph-isomorphism complete.
- Decomposition: level-by-level abstraction is, in general, impossible; there might be no legal abstractions identifiable through single-level abstraction alone.

Algorithms
Single-level abstraction:
- Assumes a set of legal-to-merge information sets.
- Equivalent to clustering with an unusual objective function that forms a metric space, which immediately yields a 2-approximation algorithm for chance-free abstraction.
- Chance-only abstraction gives a new objective function not considered in the clustering literature: a weighted sum over elements, with each element taking the maximum intra-cluster distance.
Integer programming for the whole tree:
- Variables represent merging nodes and/or information sets.
- The number of variables is quadratic in the tree size.
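The slides do not name the 2-approximation algorithm; farthest-first traversal (Gonzalez's algorithm) is the classic 2-approximation for metric k-center and is sketched here as one plausible instantiation. `dist` stands in for an assumed abstraction-distance metric between legal-to-merge information sets.

```python
# Farthest-first traversal: pick an arbitrary first center, then repeatedly
# add the point farthest from its nearest chosen center. For any metric
# distance this is a 2-approximation to the k-center objective.

def farthest_first_centers(points, k, dist):
    centers = [points[0]]                      # arbitrary first center
    while len(centers) < k:
        # point whose distance to its nearest current center is largest
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(nxt)
    return centers

# Toy 1-D example with |a - b| as the metric:
pts = [0.0, 1.0, 5.0, 6.0, 10.0]
print(farthest_first_centers(pts, 3, lambda a, b: abs(a - b)))
# [0.0, 10.0, 5.0]
```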

Perfect-recall IP experiments
Game: a limit hold'em variant played with 5 cards (2 kings, 2 jacks, 1 queen).
- 2 players; 1 private card dealt to each; 1 public card dealt.
- Betting after the cards are dealt in each round; 2 raises allowed per round.

Signal tree
- A tree representing the nature actions that are independent of player actions.
- The actions available to the players must be independent of these signals.
- An abstraction of the signal tree leads to a valid abstraction of the full game tree.

Experiments that minimize tree size

Experiments that minimize bound

Imperfect-recall single-level experiments
Game: die-roll poker, a poker-like game that uses dice, with correlated die rolls (e.g., if P1 rolls a 3, then P2 is more likely to roll a number close to 3).
Game order:
- Each player rolls a private 4-sided die.
- A round of betting.
- Each player rolls a second private 4-sided die.
- Another round of betting.
This models games where players get individually noisy and/or shared imperfect signals.

Experimental setup
- Compute a bound-minimizing abstraction of the second round of die rolls, using the integer-programming formulation.
- Apply the counterfactual regret minimization (CFR) algorithm, which gives a solution with bounded regret on each action.
- Compute the actual regret in the full game and compare it to the bound from our theoretical result.
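The per-infoset update at the heart of CFR is regret matching: play each action in proportion to its positive cumulative regret. A minimal self-contained sketch of that update rule follows; it is not the full CFR implementation used in the experiments.

```python
# Regret matching: given cumulative regrets for each action at an infoset,
# return the next strategy. Negative regrets are clipped to zero; if no
# action has positive regret, fall back to the uniform strategy.

def regret_matching(cum_regrets):
    positives = [max(r, 0.0) for r in cum_regrets]
    total = sum(positives)
    if total == 0:
        return [1.0 / len(cum_regrets)] * len(cum_regrets)
    return [p / total for p in positives]

print(regret_matching([3.0, -1.0, 1.0]))  # [0.75, 0.0, 0.25]
```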

Imperfect-recall experiments

Comparison to prior results
Lanctot, Gibson, Burch, Zinkevich, and Bowling (ICML-12):
- Bounds also for imperfect-recall abstractions, but only for the CFR algorithm.
- Allow only utility error; their utility error is exponentially worse (O(b^h) vs. O(h)).
- Do not take chance weights into account.
- Very nice experiments for the utility-error-only case.
Kroer and Sandholm (EC-14):
- Bounds only for perfect-recall abstractions.
- Do not have a linear dependence on height.
Our imperfect-recall work builds on both papers: the model of abstraction extends the ICML-12 paper, and the analysis uses techniques from the EC-14 paper. Our experiments cover the utility-plus-chance-outcome error case.