Strategy Grafting in Extensive Games

1 Strategy Grafting in Extensive Games
Kevin Waugh, Nolan Bard and Michael Bowling. NIPS 2009 (Dec) / Presented by Boyoung Kim (GT)

2 Introduction Extensive games are often used to model the interaction of multiple agents within an environment. Trend: the size of extensive games that can be feasibly solved keeps increasing. E.g. 2-player limit Texas Hold'em has approximately 10^18 states. The classic linear programming technique handles approximately 10^7 states; more recent techniques handle approximately 10^12 states. (Andrew Gilpin, Samid Hoda, Javier Peña, and Tuomas Sandholm. Gradient-based Algorithms for Finding Nash Equilibria in Extensive Form Games. In Proceedings of the Eighteenth International Conference on Game Theory; Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret Minimization in Games with Incomplete Information. In Advances in Neural Information Processing Systems Twenty, pages 1729–1736. A longer version is available as a University of Alberta Technical Report, TR07-14.)

3 Introduction Abstraction technique: reduce the original game to an abstract game → solve the abstract game → play the resulting strategy in the original game.

4 Background Def 1 (Extensive Game) A finite extensive game w/ imperfect information is denoted Γ and has the following components:
A finite set N of players.
A finite set H of sequences, the possible histories of actions. The empty sequence is in H and every prefix of a sequence in H is also in H.
Terminal histories: Z ⊆ H s.t. no sequence in Z is a strict prefix of any other sequence in H.
A(h) = {a : (h,a) ∈ H} are the actions available after a non-terminal history h ∈ H\Z.
A player function P that assigns to each non-terminal history a member of N ∪ {c}, where c represents chance. P(h) is the player who takes an action after the history h. Hi is the set of histories where player i chooses the next action.
Example: H={Φ,A,B,AC,AD,BE,BF,BFG,BFH}, Z={AC,AD,BE,BFG,BFH}, A(BF)={G,H}, P(A)=2, P(BF)=1, H1={Φ,BF}, H2={A,B}
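A minimal sketch (my own illustration, not from the paper or the slides) of how these components can be encoded for the toy example above, with histories written as strings of actions:

```python
# Toy game from the slide, with the empty string standing in for the empty history Φ.
HISTORIES = {"", "A", "B", "AC", "AD", "BE", "BF", "BFG", "BFH"}
TERMINALS = {"AC", "AD", "BE", "BFG", "BFH"}   # Z: strict prefixes of no other history

def actions(h):
    """A(h) = {a : (h, a) in H} for a non-terminal history h."""
    return {g[len(h)] for g in HISTORIES if g.startswith(h) and len(g) == len(h) + 1}

# Player function P: who acts after each non-terminal history (matches the slide).
PLAYER = {"": 1, "A": 2, "B": 2, "BF": 1}

assert actions("BF") == {"G", "H"}
assert [h for h, p in PLAYER.items() if p == 1] == ["", "BF"]   # H1 = {Φ, BF}
```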

5 Background A function fc that associates w/ every history h ∈ Hc a probability distribution fc( · | h ) on A(h). fc( a | h ) is the prob. that a occurs given h.
For each player i ∈ N, a utility function ui that assigns each terminal history a real value. ui(z) is rewarded to player i for reaching terminal history z. If N={1,2} and for all z ∈ Z, u1(z) = -u2(z), the extensive game is said to be zero-sum.
For each player i ∈ N, a partition Ii of Hi w/ the property that A(h) = A(h') whenever h and h' are in the same member of the partition. Ii is the information partition of player i; a set I ∈ Ii is an information set of player i □
In this paper we only focus on 2-player zero-sum games w/ perfect recall. Perfect recall is a restriction on the information partitions that excludes unrealistic situations where a player is forced to forget his own past information or decisions.
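A minimal sketch (my own illustration) continuing the toy game: an arbitrary zero-sum utility assignment and a check of the information-partition constraint A(h) = A(h'):

```python
HISTORIES = {"", "A", "B", "AC", "AD", "BE", "BF", "BFG", "BFH"}
TERMINALS = {"AC", "AD", "BE", "BFG", "BFH"}

# Utilities for player 1 are chosen arbitrarily for illustration; zero-sum means
# u2(z) = -u1(z) at every terminal history z.
u1 = {"AC": 1, "AD": -1, "BE": 0, "BFG": 2, "BFH": -2}
u2 = {z: -v for z, v in u1.items()}
assert all(u1[z] + u2[z] == 0 for z in TERMINALS)

def actions(h):
    return {g[len(h)] for g in HISTORIES if g.startswith(h) and len(g) == len(h) + 1}

# Player 2's information partition: here each history is its own information set.
# Merging {"A", "B"} into one set would be illegal in this particular game because
# A(A) = {C, D} differs from A(B) = {E, F}.
I2 = [{"A"}, {"B"}]
assert all(len({frozenset(actions(h)) for h in I}) == 1 for I in I2)
```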

6 Background Def 2 (Strategy) A strategy for player i, σi, is a function that assigns a probability distribution over A(h) to each h ∈ Hi. This function is constrained so that σi(h) = σi(h') whenever h and h' are in the same information set. A strategy is pure if no randomization is required. Σi is the set of all strategies for player i □
Def 3 (Strategy Profile) A strategy profile in extensive game Γ is a set of strategies, σ = {σ1, σ2, …, σN}, that contains one strategy for each player. σ-i is the set of strategies for all players except player i. Σ is the set of all strategy profiles □
ui(σ): the expected utility of player i when all players play according to a strategy profile σ. ui(σi, σ-i): the expected utility of player i when all other players play according to σ-i and player i plays according to σi.
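A minimal sketch (my own illustration, with hypothetical helper names) of how ui(σ) can be computed by recursing over histories, weighting each branch by the acting player's probability of choosing that action:

```python
def expected_utility(h, profile, player, terminals, utility):
    """Expected utility for the subtree rooted at history h; `utility` maps terminal
    histories to player i's payoff, `profile[p](h)` gives player p's action
    distribution at h."""
    if h in terminals:
        return utility[h]
    dist = profile[player[h]](h)           # acting player's distribution over A(h)
    return sum(prob * expected_utility(h + a, profile, player, terminals, utility)
               for a, prob in dist.items())

# Toy usage on the game of slide 4: player 1 mixes at the root, both play pure elsewhere.
TERMINALS = {"AC", "AD", "BE", "BFG", "BFH"}
U1 = {"AC": 1, "AD": -1, "BE": 0, "BFG": 2, "BFH": -2}
PLAYER = {"": 1, "A": 2, "B": 2, "BF": 1}
PROFILE = {1: lambda h: {"A": 0.5, "B": 0.5} if h == "" else {"G": 1.0},
           2: lambda h: {"C": 1.0} if h == "A" else {"F": 1.0}}
print(expected_utility("", PROFILE, PLAYER, TERMINALS, U1))   # -> 1.5
```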

7 Background Def 4 (Nash Equilibrium) A Nash equilibrium is a strategy profile σ where for all i ∈ N and for all σi' ∈ Σi, ui(σi, σ-i) ≥ ui(σi', σ-i). An approximation of a Nash equilibrium, or ε-Nash equilibrium, is a strategy profile σ where for all i ∈ N and for all σi' ∈ Σi, ui(σi, σ-i) + ε ≥ ui(σi', σ-i) □
A Nash equilibrium exists in all (finite) extensive games. Recall (Essentials of Game Theory, Thm 2.3.1) (Nash, 1951): Every game w/ a finite number of players and action profiles has at least one Nash equilibrium.
In a zero-sum game, we say it is optimal to play any strategy belonging to an equilibrium because this guarantees the equilibrium player the highest expected utility in the worst case. In this sense, we call computing an equilibrium in a zero-sum game solving the game.
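A minimal sketch (my own illustration, not the paper's method) of what ε means operationally, using a zero-sum matrix game: ε is the most either player could gain by deviating from the profile.

```python
import numpy as np

def epsilon(A, x, y):
    """A[i, j] = payoff to player 1; x and y are the players' mixed strategies.
    Returns the smallest eps for which (x, y) is an eps-Nash equilibrium."""
    value = x @ A @ y
    eps1 = np.max(A @ y) - value      # player 1's gain from the best pure deviation
    eps2 = value - np.min(x @ A)      # player 2 minimizes u1 in a zero-sum game
    return max(eps1, eps2)

# Matching pennies: the uniform strategies form the exact equilibrium, so eps = 0.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(epsilon(A, np.array([0.5, 0.5]), np.array([0.5, 0.5])))   # -> 0.0
```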

8 Background Many games are too large to solve directly → abstraction (reduce the game to one of manageable size). The abstract game is solved → the resulting strategy is presumed to be strong in the original game. Abstraction can be achieved by: 1) merging information sets together, 2) restricting the actions a player can take from a given history, 3) or a combination of both.
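A minimal sketch (my own illustration; the bucket boundaries and actions are hypothetical) of abstraction by merging information sets: original information sets are mapped onto coarser abstract ones, the abstract game is solved, and the abstract strategy is looked up through that mapping when playing the original game.

```python
def bucket(card_rank):
    """Hypothetical merge: collapse card ranks 2..14 into three abstract buckets."""
    return "low" if card_rank <= 6 else "mid" if card_rank <= 10 else "high"

# A strategy computed in the abstract game: one action distribution per bucket.
abstract_strategy = {"low":  {"fold": 0.8, "call": 0.2},
                     "mid":  {"call": 0.7, "raise": 0.3},
                     "high": {"raise": 1.0}}

def play(card_rank):
    """Play the abstract strategy in the original game via the abstraction map."""
    return abstract_strategy[bucket(card_rank)]

print(play(13))   # original information set -> bucket "high" -> {"raise": 1.0}
```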

9 Background Strategies for abstract games are defined in the same manner.

10 Strategy grafting There is no guarantee that optimal strategies in abstract games are strong in the original game ([1] Kevin Waugh, David Schnizlein, Michael Bowling, and Duane Szafron. Abstraction Pathologies in Extensive Games. In Proceedings of the Eighth International Joint Conference on Autonomous Agents and Multi-Agent Systems, pages 781–788). …ⓐ Yet these strategies empirically perform well.
Easy to show: a strategy space must include strategies at least as good as (if not better than) those of a smaller space that it refines [1]. …ⓑ
ⓑ would seem to imply that a larger abstraction is always better? → It depends on the method of selecting a strategy from that space.

11 Strategy grafting Def 6 (Dominated Strategy) A dominated strategy for player i is a pure strategy, σi, s.t. there exists another strategy, σi', where for all opponent strategies σ-i, ui(σi', σ-i) ≥ ui(σi, σ-i), and the inequality holds strictly for at least one opponent strategy □
Abstraction does not (necessarily) preserve strategy domination: when abstracting, one can merge a dominated strategy in w/ a non-dominated strategy. In the abstract game, this combined strategy might become part of an equilibrium → the abstract strategy makes mistakes in the original game. A finer abstraction may better preserve domination.
Decomposition: a natural approach for using larger strategy spaces w/o additional computational cost. In extensive games w/ imperfect information, straightforward decomposition can cause trouble – the opponent might be able to determine which subgame is being played.
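A minimal sketch (my own illustration) of the slide's notion of domination for pure strategies in a payoff matrix, where A[r, c] is player 1's payoff for pure strategy r against opponent pure strategy c — restricted here to domination by another pure strategy:

```python
import numpy as np

def dominated_rows(A):
    """Rows r for which some other pure row k has A[k] >= A[r] against every
    opponent column, with strict inequality for at least one column."""
    out = []
    for r in range(A.shape[0]):
        for k in range(A.shape[0]):
            if k != r and np.all(A[k] >= A[r]) and np.any(A[k] > A[r]):
                out.append(r)
                break
    return out

A = np.array([[3, 3],
              [2, 3],
              [1, 1]])
print(dominated_rows(A))   # -> [1, 2]: both rows are dominated by row 0
```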

12 Strategy grafting The solutions to these sub-games are the grafts.

13 Strategy grafting Start out w/ a base strategy for the player – the same base strategy is used for all grafts → it is the only information shared between them. Only a portion of the game is allowed to vary – the remaining parts are played by the base strategy → that portion is a block of the grafting partition. We are not interested in the resulting pair of strategies as a whole: when we construct a graft, our opponent must learn a strategy for the entire game, but only the graft itself is kept.
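A minimal sketch (my own illustration; solving for each graft is left abstract) of how the grafts are stitched back together: the grafted strategy answers from the graft whose block of the grafting partition contains the current information set, and falls back to the base strategy elsewhere.

```python
def grafted_strategy(base, grafts, partition):
    """base: information set -> action distribution;
    grafts: one strategy (information set -> distribution) per block of `partition`;
    partition: list of sets of information sets (the grafting partition)."""
    def sigma(info_set):
        for block, graft in zip(partition, grafts):
            if info_set in block:
                return graft(info_set)   # inside block j: play graft j
        return base(info_set)            # outside every block: play the base strategy
    return sigma
```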

14 Strategy grafting The quality of the grafted strategies.

