Extensive-Form Game Abstraction with Bounds


1 Extensive-Form Game Abstraction with Bounds

2 Outline
Motivation. Abstractions of extensive-form games. Theoretical guarantees on abstraction quality. Computing abstractions: hardness and algorithms. Experiments.

3 Diagram: real game → (automated) abstraction → abstracted game → equilibrium-finding algorithm → Nash equilibrium (of the abstracted game) → map to real game → ε-Nash equilibrium (of the real game).

4 Why is game abstraction difficult?
Abstraction pathologies: sometimes refining an abstraction can be worse; every equilibrium can be worse.
Tuomas presented numerous papers on game abstraction without solution-quality bounds.
Lossless abstractions are too large.
Abstraction is particularly difficult to analyze in extensive-form games: information sets cut across subtrees, so a player might be best responding to nodes in different parts of the game tree.

5 Extensive-form games - definition
Game tree. Branches denote actions. Information sets. Payoffs at leaves. Perfect recall: players remember past actions. Imperfect recall: players might forget past actions.
[Figure: example game tree with chance node C, decision nodes for P1 and P2, and leaf payoffs 1.5, -3, 8, -6, -6.]

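6 Counterfactual value
Defined for each information set $I$. Expected value from the information set. Assumes that the player plays to reach $I$. Rescales by the probability of reaching $I$.
Example: uniform distribution everywhere, bottom information set. Reach probability: (value missing). Conditional node distribution: (value missing). $V(I) = \frac{1}{2} \cdot \frac{1}{2} \cdot 8 - \frac{1}{2} \cdot \frac{1}{2} \cdot 6 = 0.5$.
[Figure: example game tree with chance node C, decision nodes for P1 and P2, and leaf payoffs 1.5, -3, 6, 8, -6.]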

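7 Regret
Defined for each action $a$. Change in expected value when taking $a$. Holds everything else constant.
Example: uniform distribution everywhere, bottom information set. $V(I) = \frac{1}{2} \cdot \frac{1}{2} \cdot 8 - \frac{1}{2} \cdot \frac{1}{2} \cdot 6 = 0.5$. To get $r(e)$, set $\sigma_1'(e) = 1$. Then $r(e) = V_e(I) - V(I) = 3.5$.
[Figure: example game tree with chance node C, decision nodes for P1 and P2, and leaf payoffs 1.5, -3, 6, 8, -6.]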
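The two example computations above can be reproduced with a small sketch. The game tree itself is not recoverable from the transcript, so the information set below is hypothetical: two nodes, two actions (`e` and a made-up `f`), and per-node payoffs chosen only so that the results match the slide's $V(I) = 0.5$ and $r(e) = 3.5$. The function names are illustrative as well.

```python
from fractions import Fraction

# Hypothetical bottom information set: two nodes, reached with equal
# conditional probability. Payoffs are assumptions, not the slide's tree.
node_dist = [Fraction(1, 2), Fraction(1, 2)]
payoffs = [{"e": 8, "f": 0}, {"e": 0, "f": -6}]
uniform = {"e": Fraction(1, 2), "f": Fraction(1, 2)}


def info_set_value(node_dist, payoffs, strategy):
    """Expected value at the information set under `strategy`."""
    return sum(
        p * sum(strategy[a] * u[a] for a in strategy)
        for p, u in zip(node_dist, payoffs)
    )


def regret(action, node_dist, payoffs, strategy):
    """Value change from playing `action` with probability 1, all else fixed."""
    pure = {a: Fraction(int(a == action)) for a in strategy}
    return (info_set_value(node_dist, payoffs, pure)
            - info_set_value(node_dist, payoffs, strategy))


v = info_set_value(node_dist, payoffs, uniform)
r_e = regret("e", node_dist, payoffs, uniform)
print(v, r_e)  # 1/2 7/2
```

Exact fractions are used so the values compare cleanly against the 0.5 and 3.5 in the example.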

8 Abstraction
Goal: reduce the number of decision variables while maintaining low regret.
Method: merge information sets or remove actions.
Constraints: must define a bijection between nodes; nodes in the bijection must have the same ancestor sequences over other players and the same descendant sequences over all players.
Might create imperfect recall, which gives worse bounds.
[Figure: example game tree with chance node C, decision nodes for P1 and P2, and leaf payoffs 1.5, 8, -6, 9, -7.]

9 Abstraction payoff error
Quantify error in abstraction. Measure similarity of abstraction and full game. Based on bijection.
Maximum difference between leaf nodes: first mapping 1, second mapping 2.
Formally:
Leaf node: $\epsilon^R(s) = |u(s) - u(s')|$
Player node: $\epsilon^R(s) = \max_c \epsilon^R(c)$
Nature node: $\epsilon^R(s) = \sum_c p(c)\, \epsilon^R(c)$
[Figure: example game tree with chance node C, decision nodes for P1 and P2, and leaf payoffs 1.5, 8, -6, 9, -8.]
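The recursive definition of the payoff error can be sketched directly. The dictionary-based tree encoding and the name `payoff_error` are assumptions made for this illustration; each leaf stores its own payoff `u` and the payoff `u_mapped` of the leaf it is mapped to under the bijection. The leaf pairings (8 with 9, -6 with -7 or -8) are a guess at how the figure's payoffs pair up, and the 50/50 nature node is hypothetical.

```python
def payoff_error(node):
    """Payoff error of a node, following the leaf/player/nature cases."""
    kind = node["type"]
    if kind == "leaf":
        # |u(s) - u(s')| for a leaf and its mapped leaf.
        return abs(node["u"] - node["u_mapped"])
    if kind == "player":
        # Worst case over the children of a player node.
        return max(payoff_error(c) for c in node["children"])
    if kind == "nature":
        # Probability-weighted sum over the children of a nature node.
        return sum(p * payoff_error(c) for p, c in node["children"])
    raise ValueError(f"unknown node type: {kind!r}")


first_mapping = {"type": "player", "children": [
    {"type": "leaf", "u": 8, "u_mapped": 9},
    {"type": "leaf", "u": -6, "u_mapped": -7},
]}
second_mapping = {"type": "player", "children": [
    {"type": "leaf", "u": 8, "u_mapped": 9},
    {"type": "leaf", "u": -6, "u_mapped": -8},
]}
# Hypothetical nature node mixing the two subtrees with equal probability.
mixed = {"type": "nature", "children": [(0.5, first_mapping), (0.5, second_mapping)]}

print(payoff_error(first_mapping), payoff_error(second_mapping), payoff_error(mixed))
# 1 2 1.5
```

Under these assumed pairings, the player-node results agree with the slide's "first mapping 1, second mapping 2".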

10 Abstraction chance node error
Quantify error in abstraction. Measure similarity of abstraction and full game. Based on bijection.
Maximum difference between leaf nodes: first mapping 1, second mapping 2.
Formally:
Player node: $\epsilon^0(s) = \max_c \epsilon^0(c)$
Nature node: $\epsilon^0(s) = \sum_c |p(c) - p(c')|$
[Figure: example game tree with P1 and P2 decision nodes, two chance nodes C with branch probabilities 1/3, 2/3 and 2/3, 1/3, and leaf payoffs 1.5, 8, -6, 9, -8.]

11 Abstraction chance distribution error
Quantify error in abstraction. Measure similarity of abstraction and full game. Based on bijection.
Maximum difference between leaf nodes: first mapping 1, second mapping 2.
Formally:
Infoset node: $\epsilon^0(s) = |p(s \mid I) - p(s' \mid I')|$
Infoset: $\epsilon^0(I) = \sum_{s \in I} \epsilon^0(s)$
[Figure: example game tree with P1 and P2 decision nodes, two chance nodes C with branch probabilities 1/3, 2/3 and 2/3, 1/3, and leaf payoffs 1.5, 8, -6, 9, -8.]
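Both chance-related error terms, the nature-node error and the infoset distribution error, are sums of absolute probability differences over elements paired by the bijection. A minimal sketch follows; the helper name `l1_distance` is illustrative, and pairing the figure's two chance nodes (1/3, 2/3 against 2/3, 1/3) with each other is an assumption.

```python
from fractions import Fraction


def l1_distance(p, q):
    """Sum of |p_k - q_k| over entries paired by position (the bijection)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of outcomes")
    return sum(abs(a - b) for a, b in zip(p, q))


# Nature node error: branch probabilities of a chance node and its mapped node.
original = [Fraction(1, 3), Fraction(2, 3)]
mapped = [Fraction(2, 3), Fraction(1, 3)]
nature_error = l1_distance(original, mapped)

# Infoset distribution error: conditional node distributions p(s | I) and
# p(s' | I') of two information sets (hypothetical values).
infoset_error = l1_distance(
    [Fraction(1, 2), Fraction(1, 2)],
    [Fraction(1, 4), Fraction(3, 4)],
)

print(nature_error, infoset_error)  # 2/3 1/2
```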

12 Bounds on abstraction quality
Given: an original perfect-recall game; an abstraction that satisfies our constraints; an abstraction strategy with bounded regret on each action.
We get: an $\epsilon$-Nash equilibrium in the full game.
Perfect-recall abstraction error for player $i$: $2\epsilon_i^R + \sum_{j \in \mathcal{H}_0} \epsilon_j^0 W + \sum_{j \in \mathcal{H}_i} 2\epsilon_j^0 W + r_i$
Imperfect-recall abstraction error: same as for perfect recall, but $|\mathcal{H}_i|$ times larger. Linearly worse in game depth.
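To make the perfect-recall error expression concrete, the sketch below evaluates it for given error terms. The function name, argument names, and numeric inputs are hypothetical; `eps0_nature` holds the $\epsilon_j^0$ values indexed by $\mathcal{H}_0$ and `eps0_player` those indexed by $\mathcal{H}_i$. The transcript does not define $W$, so it is treated here simply as a given constant. Only the perfect-recall expression is implemented.

```python
from fractions import Fraction


def perfect_recall_bound(eps_r, eps0_nature, eps0_player, W, r):
    """Evaluate 2*eps_r + sum_j eps0_nature[j]*W + sum_j 2*eps0_player[j]*W + r."""
    nature_term = sum(e * W for e in eps0_nature)
    player_term = sum(2 * e * W for e in eps0_player)
    return 2 * eps_r + nature_term + player_term + r


# Hypothetical inputs, chosen only to exercise the formula.
bound = perfect_recall_bound(
    eps_r=Fraction(1),
    eps0_nature=[Fraction(1, 10)],
    eps0_player=[Fraction(1, 20), Fraction(1, 20)],
    W=10,
    r=Fraction(1, 2),
)
print(bound)  # 11/2
```

With these inputs the four terms are 2, 1, 2, and 1/2, which shows how the chance-error terms scale with $W$ while the payoff error and regret enter directly.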

13 Complexity and structure
NP-hard to minimize our bound (both for perfect and imperfect recall).
Determining whether two trees are topologically mappable is graph-isomorphism-complete.
Decomposition: level-by-level abstraction is, in general, impossible. There might be no legal abstractions identifiable through only single-level abstraction.

14 Algorithms
Single-level abstraction: assumes a set of legal-to-merge information sets. Equivalent to clustering with a weird objective function. Forms a metric space, which immediately yields a 2-approximation algorithm for chance-free abstraction. Chance-only abstractions give a new objective function not considered in the clustering literature: a weighted sum over elements, with each taking the maximum intra-cluster distance.
Integer programming for the whole tree: variables represent merging nodes and/or information sets. The number of variables is quadratic in tree size.
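The slides do not say which 2-approximation algorithm is meant. One standard algorithm with that guarantee for minimizing the maximum cluster radius or diameter in a metric space is Gonzalez's farthest-first traversal, sketched below as a swapped-in illustration. Representing each information set by a payoff vector and using the max-norm distance are assumptions of this sketch, as are all names and the sample data.

```python
def linf(a, b):
    """Max-norm distance between two equal-length payoff vectors (a metric)."""
    return max(abs(x - y) for x, y in zip(a, b))


def farthest_first(points, k, dist=linf):
    """Greedy k-center clustering: returns (centers, assignment, radius)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [0]  # start from an arbitrary point
    nearest = [dist(points[0], p) for p in points]
    while len(centers) < k:
        # Next center: the point farthest from all current centers.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        centers.append(nxt)
        nearest = [min(d, dist(points[nxt], p)) for d, p in zip(nearest, points)]
    # Assign every point to its closest center.
    assignment = [min(centers, key=lambda c: dist(points[c], p)) for p in points]
    return centers, assignment, max(nearest)


# Hypothetical payoff vectors for four information sets, merged into two.
vectors = [(8, -6), (9, -7), (1.5, -3), (2, -3)]
centers, assignment, radius = farthest_first(vectors, 2)
print(centers, assignment, radius)  # [0, 2] [0, 0, 2, 2] 1
```

The returned radius is the largest distance from any point to its center, which plays the role of the merged sets' payoff error in this simplified setting.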

15 Perfect recall IP experiments
5 cards: 2 kings, 2 jacks, 1 queen. Limit hold'em. 2 players. 1 private card dealt to each. 1 public card dealt. Betting after cards are dealt in each round. 2 raises per round.

16 Signal tree
Tree representing nature actions that are independent of player actions. Actions available to players must be independent of these. Abstraction of the signal tree leads to a valid abstraction of the full game tree.

17 Experiments that minimize tree size

18 Experiments that minimize bound

19 Imperfect-recall single-level experiments
Game: die-roll poker. Poker-like game that uses dice. Correlated die rolls (e.g. if P1 rolls a 3, then P2 is more likely to roll a number close to 3). Game order: each player rolls a private 4-sided die. Betting happens. Each player rolls a second private 4-sided die. Another round of betting. Models games where players get individual noisy and/or shared imperfect signals.

20 Experimental setup
Abstraction: compute a bound-minimizing abstraction of the second round of die rolls. Relies on the integer-programming formulation.
Apply the counterfactual regret minimization (CFR) algorithm. Gives a solution with bounded regret on each action.
Compute actual regret in the full game. Compare to the bound from our theoretical result.

21 Imperfect-recall experiments

22 Comparison to prior results
Lanctot, Gibson, Burch, Zinkevich, and Bowling (ICML12): bounds also for imperfect-recall abstractions. Only for the CFR algorithm. Allow only utility error. Utility error exponentially worse ($O(b^h)$ vs. $O(h)$). Do not take chance weights into account. Very nice experiments for the utility-error-only case.
Kroer and Sandholm (EC14): bounds only for perfect-recall abstractions. Do not have linear dependence on height.
Imperfect-recall work builds on both papers: the model of abstraction is an extension of the ICML12 paper, and the analysis uses techniques from the EC14 paper. Our experiments are for the utility + chance outcome error case.

