Nesterov’s excessive gap technique and poker Andrew Gilpin CMU Theory Lunch Feb 28, 2007 Joint work with: Samid Hoda, Javier Peña, Troels Sørensen, Tuomas Sandholm

Outline
– Two-person zero-sum sequential games
– First-order methods for convex optimization
– Nesterov’s excessive gap technique (EGT)
– EGT for sequential games
– Heuristics for EGT
– Application to Texas Hold’em poker

We want to solve the saddle-point problem
min_{x∈Q1} max_{y∈Q2} x^T A y = max_{y∈Q2} min_{x∈Q1} x^T A y
If Q1 and Q2 are simplices, this is the Nash equilibrium problem for two-person zero-sum matrix games. If Q1 and Q2 are complexes, this is the Nash equilibrium problem for two-person zero-sum sequential games.

What’s a complex? It’s just like a simplex, but more complex. Each player’s complex encodes her set of realization plans in the game. In particular, player 1’s complex is
Q1 = { x ≥ 0 : Ex = e }
where E and e depend on the game…
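For concreteness, here is a minimal hypothetical example (not from the talk) of the constraint matrix E and vector e for a tiny game: player 1 chooses L or R at the root, and after L chooses l or r at a second information set.
```python
import numpy as np

# Hypothetical tiny game (for illustration only). Realization-plan variables:
#   x = (x_empty, x_L, x_R, x_l, x_r)
# Sequence-form constraints Ex = e:
#   x_empty = 1,  x_L + x_R = x_empty,  x_l + x_r = x_L
E = np.array([
    [ 1,  0,  0,  0,  0],
    [-1,  1,  1,  0,  0],
    [ 0, -1,  0,  1,  1],
])
e = np.array([1, 0, 0])

# A behavioral strategy ("L with prob 0.6, then l with prob 0.25") induces a
# point of the complex {x >= 0 : Ex = e}:
x = np.array([1.0, 0.6, 0.4, 0.15, 0.45])
assert np.allclose(E @ x, e) and (x >= 0).all()
```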

[Figure: example game tree with nodes labeled A–H]

Recall our problem:
min_{x∈Q1} max_{y∈Q2} x^T A y
where Q1 and Q2 are complexes. Since Q1 and Q2 have a linear description, this problem can be solved as an LP. However, current LP solution methods do not scale.

(Un)scalability of LP solvers
Rhode Island poker [Shi & Littman 01]
– LP has 91 million rows and columns
– Applying the GameShrink automated abstraction algorithm yields an LP with only 1.2 million rows and columns, and 50 million non-zeros [G. & Sandholm 06a]
– Solution requires 25 GB RAM and over a week of CPU time
Texas Hold’em poker
– ~10^18 nodes in the game tree
– Lossy abstractions need to be performed
– Limitations of current solver technology are the primary obstacle to achieving expert-level strategies [G. & Sandholm 06b, 07a]
Instead of standard LP solvers, what about a first-order method?

Convex optimization
Suppose we want to solve min_{x∈Q} f(x), where f is convex.
For general f, convergence requires O(1/ε^2) iterations (e.g., for subgradient methods).
For smooth, strongly convex f with Lipschitz-continuous gradient, it can be done in O(1/ε^(1/2)) iterations.
Note that this formulation captures ALL convex optimization problems (the feasible set can be modeled using an indicator function).
The analysis is based on a black-box oracle access model. Can we do better by looking inside the box?
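As a point of reference for that O(1/ε^2) baseline, here is a minimal, self-contained sketch (not from the talk) of a projected subgradient method on a toy non-smooth problem; the 1/√k step size is the standard choice behind that rate.
```python
import numpy as np

# Toy non-smooth convex problem: minimize f(x) = max_i (a_i·x + b_i) over the
# box Q = [-1, 1]^n with a projected subgradient method.
rng = np.random.default_rng(0)
n, m = 5, 8
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

def f(x):
    return np.max(A @ x + b)

def subgradient(x):
    # a subgradient of a pointwise max is the gradient of any active piece
    return A[np.argmax(A @ x + b)]

x = np.zeros(n)
best = f(x)
for k in range(1, 20001):
    x = np.clip(x - subgradient(x) / np.sqrt(k), -1.0, 1.0)  # step + projection
    best = min(best, f(x))
# `best` approaches the optimum only at the slow O(1/eps^2) rate discussed above.
```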

Strong convexity
A function d is strongly convex if there exists σ > 0 such that for all x, z ∈ Q and all α ∈ [0, 1]:
d(αx + (1−α)z) ≤ α d(x) + (1−α) d(z) − (σ/2) α(1−α) ‖x − z‖^2
σ is the strong convexity parameter of d

Recall our problem:
min_{x∈Q1} max_{y∈Q2} x^T A y
where Q1 and Q2 are complexes. Equivalently:
min_{x∈Q1} Φ(x) = max_{y∈Q2} f(y)
where Φ(x) = max_{y∈Q2} x^T A y and f(y) = min_{x∈Q1} x^T A y

Φ(x) = max_{y∈Q2} { x^T A y },  f(y) = min_{x∈Q1} { x^T A y }
Unfortunately, Φ and f are non-smooth. Fortunately, they have a special structure.
Let d1, d2 be smooth and strongly convex on Q1, Q2. These are called prox-functions.
Now let μ > 0 and consider:
Φ_μ(x) = max_{y∈Q2} { x^T A y − μ d2(y) }
f_μ(y) = min_{x∈Q1} { x^T A y + μ d1(x) }
These are well-defined smooth functions.
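To make the smoothing concrete, here is a small sketch (not from the talk) for the matrix-game case where Q1 and Q2 are simplices and d1, d2 are the entropy prox functions introduced a few slides below; there Φ_μ and f_μ have log-sum-exp closed forms and visibly approach Φ and f as μ → 0.
```python
import numpy as np

def logsumexp(v):
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

def Phi(x, A):              # Phi(x) = max over the simplex of (A^T x)·y
    return (A.T @ x).max()

def f(y, A):                # f(y) = min over the simplex of x·(A y)
    return (A @ y).min()

def Phi_mu(x, A, mu):       # entropy-smoothed: mu·ln( (1/n)·sum_j exp((A^T x)_j / mu) )
    return mu * (logsumexp((A.T @ x) / mu) - np.log(A.shape[1]))

def f_mu(y, A, mu):         # entropy-smoothed: -mu·ln( (1/m)·sum_i exp(-(A y)_i / mu) )
    return -mu * (logsumexp(-(A @ y) / mu) - np.log(A.shape[0]))

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
x = np.full(3, 1 / 3)
y = np.full(4, 1 / 4)
for mu in [1.0, 0.1, 0.01]:
    print(mu, Phi(x, A) - Phi_mu(x, A, mu), f_mu(y, A, mu) - f(y, A))
# Both differences are nonnegative and shrink to 0 as mu decreases: the smooth
# functions Phi_mu and f_mu approach the non-smooth Phi and f.
```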

Excessive gap condition
From weak duality, we have that f(y) ≤ Φ(x).
The excessive gap condition requires the reverse inequality for the smoothed functions:
Φ_μ(x) ≤ f_μ(y) (EGC)
The algorithm maintains (EGC), and gradually decreases μ.
As μ decreases, the smoothed functions approach the non-smooth functions, and thus iterates satisfying (EGC) converge to optimal solutions.

Nesterov’s main theorem
Theorem [Nesterov 05]: There exists an algorithm such that after at most N iterations, the iterates (x, y) have duality gap at most
(4‖A‖ / (N+1)) · sqrt(D1 D2 / (σ1 σ2))
where Di = max_{Qi} di and σi is the strong convexity parameter of di.
Furthermore, each iteration only requires solving three problems of the form
sargmax_d(g) := argmax_{x∈Q} { g^T x − d(x) }
and performing three matrix-vector products with A.

Nice prox functions
A prox function d for Q is nice if:
1. d is strongly convex and continuous everywhere in Q, and differentiable in the relative interior of Q
2. The minimum of d over Q is 0
3. The map sargmax_d(g) = argmax_{x∈Q} { g^T x − d(x) } is easily computable

Nice simplex prox function 1: Entropy
d(x) = ln n + Σ_i x_i ln x_i
Its sargmax has a closed form: sargmax_d(g)_i = e^{g_i} / Σ_j e^{g_j}
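A small sketch in code of this prox function and its sargmax map (the sargmax is just the softmax of g, computed in a numerically stable way):
```python
import numpy as np

def entropy_prox(x):
    # d(x) = ln n + sum_i x_i ln x_i  (with 0·ln 0 = 0); its minimum over the
    # simplex is 0, attained at the uniform distribution.
    xs = x[x > 0]
    return np.log(len(x)) + np.sum(xs * np.log(xs))

def entropy_sargmax(g):
    # argmax_{x in simplex} { g·x - d(x) } = softmax(g), computed stably.
    z = np.exp(g - g.max())
    return z / z.sum()

g = np.array([1.0, 2.0, 0.5])
x = entropy_sargmax(g)
print(x, entropy_prox(x))
```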

Nice simplex prox function 2: Euclidean
d(x) = (1/2) ‖x − x̄‖^2, where x̄ = (1/n, …, 1/n) is the center of the simplex
Its sargmax is a Euclidean projection onto the simplex and can be computed in O(n log n) time
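A sketch of that O(n log n) computation (the standard sort-based projection onto the simplex); for this d, sargmax_d(g) is just the projection of x̄ + g onto the simplex.
```python
import numpy as np

def project_simplex(v):
    # Euclidean projection of v onto {x >= 0 : sum(x) = 1}; O(n log n) via sorting.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def euclidean_sargmax(g):
    # argmax_{x in simplex} { g·x - 0.5·||x - center||^2 } = proj(center + g)
    center = np.full(len(g), 1.0 / len(g))
    return project_simplex(center + g)

print(euclidean_sargmax(np.array([1.0, 2.0, 0.5])))
```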

From the simplex to the complex
Theorem [Hoda, G., Peña 06]: A nice prox function for the complex can be constructed via a recursive application of any nice prox function for the simplex
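A minimal sketch of the dilation idea behind that recursion, applied to the hypothetical two-level complex from the earlier E, e example: the simplex prox function is evaluated on the local (behavioral) probabilities at each decision point and weighted by the probability of reaching it. The construction in the paper adds scaling to guarantee the strong-convexity and niceness properties; this shows only the shape of the recursion.
```python
import numpy as np

def simplex_prox(p):
    # entropy prox on the simplex (previous slides)
    ps = p[p > 0]
    return np.log(len(p)) + np.sum(ps * np.log(ps))

def complex_prox(x):
    # x = (x_empty, x_L, x_R, x_l, x_r) from the earlier hypothetical example.
    d = simplex_prox(np.array([x[1], x[2]]) / x[0])        # root decision {L, R}
    if x[1] > 0:                                           # info set reached via L
        d += x[1] * simplex_prox(np.array([x[3], x[4]]) / x[1])
    return d

print(complex_prox(np.array([1.0, 0.6, 0.4, 0.15, 0.45])))
```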

Prox function example
Let d be any nice simplex prox function. [The slide shows an example constraint matrix E and the prox function obtained for it by the recursive construction.]

Solving
[The slide lists the steps of the EGT iteration: initialization and update steps (a) and b(i)–(vii).]

[Similar to steps b(i)–(vii) of the previous slide.]

Heuristics [G., Hoda, Peña, Sandholm 07]
Heuristic 1: Aggressive μ reduction (a rough sketch follows below)
– The μ given in the previous algorithm is a conservative choice guaranteeing convergence
– In practice, we can do much better by aggressively pushing μ, while checking that the excessive gap condition is satisfied
Heuristic 2: Balanced μ reduction
– To prevent one μ from dominating the other, we also perform periodic adjustments to keep them within a small factor of one another
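The sketch below (not the authors' code) shows one way Heuristic 1 could be organized; egt_step and excessive_gap_holds are hypothetical helpers standing in for one iteration of the EGT update and for a numerical check of (EGC).
```python
# Rough sketch only. These helpers are hypothetical stand-ins, not the talk's code:
#   egt_step(x, y, mu1, mu2, shrink)     -> next iterate after one EGT update
#                                           that multiplies mu1 by `shrink`
#   excessive_gap_holds(x, y, mu1, mu2)  -> True if Phi_mu1(x) <= f_mu2(y)
def egt_with_aggressive_mu(x, y, mu1, mu2, n_iters,
                           aggressive=0.5, conservative=0.9):
    for _ in range(n_iters):
        # Heuristic 1: first try shrinking mu1 much faster than the theory
        # prescribes, and keep the step only if (EGC) still holds.
        x_try, y_try = egt_step(x, y, mu1, mu2, shrink=aggressive)
        if excessive_gap_holds(x_try, y_try, aggressive * mu1, mu2):
            x, y, mu1 = x_try, y_try, aggressive * mu1
        else:
            x, y = egt_step(x, y, mu1, mu2, shrink=conservative)
            mu1 *= conservative
        # Heuristic 2 (not shown): interleave the symmetric step that shrinks
        # mu2 so that mu1 and mu2 stay within a small factor of one another.
    return x, y
```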

Matrix-vector multiplication in poker [G., Hoda, Peña, Sandholm 07]
– The main time and space bottleneck of the algorithm is the matrix-vector product with A
– Instead of storing the entire matrix, we can represent it as a composition of Kronecker products (see the sketch below)
– We can also effectively take advantage of parallelization in the matrix-vector product to achieve near-linear speedup
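The poker-specific composition of A is not spelled out here, but the identity being exploited is standard: (B ⊗ C)v can be applied without ever forming the Kronecker product. A minimal sketch:
```python
import numpy as np

def kron_matvec(B, C, v):
    # Computes (B ⊗ C) @ v without materializing the Kronecker product:
    # memory O(pq + mn) instead of O(pm·qn).
    p, q = B.shape
    m, n = C.shape
    V = v.reshape(q, n)          # row-major blocks of v
    return (B @ V @ C.T).reshape(p * m)

rng = np.random.default_rng(2)
B, C = rng.normal(size=(3, 4)), rng.normal(size=(5, 2))
v = rng.normal(size=4 * 2)
assert np.allclose(kron_matvec(B, C, v), np.kron(B, C) @ v)
```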

Memory usage comparison

Instance   CPLEX IPM   CPLEX Simplex   EGT
10k        0.082 GB    >0.051 GB       0.012 GB
160k       2.25 GB     >0.664 GB       0.035 GB
RI         25.2 GB     >3.45 GB        0.15 GB
Texas      >458 GB                     2.49 GB

Poker
Poker is a recognized challenge problem in AI because (among other reasons):
– the other players’ cards are hidden;
– bluffing and other deceptive strategies are needed in a good player;
– there is uncertainty about future events.
Texas Hold’em: most popular variant of poker. Two-player game tree has ~10^18 nodes.

Potential-aware automated abstraction [G., Sandholm, Sørensen 07]
– Most prior automated abstraction algorithms employ a myopic expected value computation as a similarity metric. This ignores hands like flush draws where, although the probability of winning is small, the payoff could be high.
– Our newest algorithm considers higher-dimensional spaces consisting of histograms over abstracted classes of states from later stages of the game (see the toy sketch below).
– This enables our bottom-up abstraction algorithm to automatically take into account positive and negative potential.
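A toy sketch of the histogram idea (not the authors' algorithm or data): each current-round hand is represented by a histogram over hypothetical next-round abstract classes, and hands are clustered by distance between histograms rather than by a single expected-value number.
```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    # Minimal k-means over the rows of X.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Hypothetical data: 1000 current-round hands, each described by a histogram
# over 20 abstract classes of next-round states (rows sum to 1).
rng = np.random.default_rng(3)
histograms = rng.dirichlet(np.ones(20), size=1000)
buckets = kmeans(histograms, k=8)
# Hands with similar *distributions* of future strength land in the same bucket,
# even when a single expected-value number would not distinguish them.
```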

Solving the four-round model
Computed abstraction with
– 20 first-round buckets
– 800 second-round buckets
– 4800 third-round buckets
– 28800 fourth-round buckets
Algorithm using 30 GB RAM
– Simply representing as an LP requires 32 TB
– Outputs new, improved solution every 2.5 days

[Experimental results — G., Sandholm, Sørensen 07]

Future research
– Customizing second-order methods (e.g., interior-point methods) for the equilibrium problem
– Additional heuristics for improving the practical performance of the EGT algorithm
– Techniques for finding an optimal solution from an ε-solution

Thank you ☺