Games, Optimization, and Online Algorithms. Martin Zinkevich, University of Alberta. INFORMS 2006, Pittsburgh, November 8, 2006. © 2006 M. A. Zinkevich, AICML.


Slide 1: Games, Optimization, and Online Algorithms. Martin Zinkevich, University of Alberta. November 8th, 2006.

Slide 2: Question. Does pop culture have anything to offer advanced research projects?

Slide 3: Fun and Games for Scientists. A "fun" problem (in scientist-ese): (1) a problem with a wide base of players at a variety of levels; (2) a problem with aspects that provide interesting challenges for the human mind.

Slide 4: Fun and Games for Scientists. A "game" problem (in scientist-ese): (1) a problem with a formal structure (rules) and a variety of parameter settings (opponents); (2) a problem where the world IS out to get you.

Slide 5: Fun and Games. "Fun" can capture aspects of difficulty that are orthogonal to the size of the state space or the algorithmic complexity of the problems involved. "Games" are environments where issues such as learning-to-learn can be studied amongst a variety of opponents, and non-stationarity can be studied in the presence of other learning agents.

Slide 6: Two Objectives of This Talk. (1) Finding Nash equilibria. (2) Developing "experts" a priori in games.

Slide 7: Main Point. Algorithms that learn in self-play can be used to generate both an equilibrium and experts. Constraint/column generation is among these algorithms.

Slide 8: Question in This Talk. What are interesting unbalanced strategies to consider?

Slide 9: Outline. Introduction; Iterated Best Response; Iterated Generalized Best Response; Other Applications; Conclusion.

Slide 10: Iterated Best Response (Broken Version). One broken idea: 1. INIT: start with an arbitrary strategy. 2. RESPONSE: compute the best response. 3. REPEAT: step 2 until satisfied.
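The broken loop can be sketched on a tiny zero-sum game; the 2x2 hide-and-seek payoff matrix below is my own illustrative choice, not from the talk. It shows exactly why the idea is broken: the pure best responses chase each other in a cycle and never settle.

```python
import numpy as np

# Payoff matrix for the hider (row player): the hider wins 1 when the
# seeker looks in the wrong spot, loses 1 when caught.
A = np.array([[-1.0, 1.0],
              [1.0, -1.0]])

hide = 0  # INIT: an arbitrary pure strategy
history = []
for _ in range(6):
    seek = int(np.argmin(A[hide]))      # seeker best-responds to the hider
    hide = int(np.argmax(A[:, seek]))   # hider best-responds to the seeker
    history.append((hide, seek))

print(history)  # [(1, 0), (0, 1), (1, 0), (0, 1), (1, 0), (0, 1)]
```

The loop alternates between two pure strategy pairs forever; no amount of "REPEAT until satisfied" reaches an equilibrium, which motivates the balanced version on slide 16.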

Slide 11: Hide and Seek. [Figure: hide-and-seek grid. Hide actions in blue; seek actions in red.]

Slide 12: Hide and Seek. [Figure: hide-and-seek grid. Hide actions in blue; seek actions in red.]

Slide 13: Problem: No Balance. In some games there is no one killer strategy. Without adding some balance, there is no way to fully explore the space.

Slide 14: What Games Require Balance? Simultaneous-move games; imperfect-information games (games with private information).

Slide 15: Balancing Existing Strategies. [Figure: hide-and-seek grid with a restricted Nash equilibrium, a 50/50 mixture over the existing strategies. Hide actions in blue; seek actions in red.]

Slide 16: Iterated Balanced Best Response. 1. INIT: start with strategies S for player 1 and T for player 2. 2. BALANCE: make a bimatrix game and solve for equilibrium. 3. RESPONSE: add the best responses to the equilibrium of that game to S and T. 4. REPEAT: steps 2 and 3 until satisfied.
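For zero-sum matrix games, the INIT/BALANCE/RESPONSE loop above can be sketched as a double-oracle-style procedure. This is a minimal illustration under my own assumptions: scipy is available, the game is zero-sum (so the restricted "bimatrix" equilibrium is a maximin LP), and `solve_zero_sum` is a helper name I introduce here.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(M):
    """Maximin mixture for the row player of zero-sum payoff matrix M
    (row player maximizes M[i, j]). Returns (mixture, game value)."""
    m, n = M.shape
    c = np.zeros(m + 1); c[-1] = -1.0               # maximize v (linprog minimizes)
    A_ub = np.hstack([-M.T, np.ones((n, 1))])       # v <= (p @ M)_j for every column j
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0  # the mixture p sums to 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.x[-1]

def iterated_balanced_best_response(M, iters=50):
    """INIT/BALANCE/RESPONSE/REPEAT on the full matrix M, where only the
    small restricted game over the generated strategy sets is ever solved."""
    S, T = [0], [0]                                 # INIT: arbitrary strategies
    for _ in range(iters):
        sub = M[np.ix_(S, T)]
        p, _ = solve_zero_sum(sub)                  # BALANCE: restricted equilibrium
        q, _ = solve_zero_sum(-sub.T)
        row_mix = np.zeros(M.shape[0]); row_mix[S] = p
        col_mix = np.zeros(M.shape[1]); col_mix[T] = q
        br_row = int(np.argmax(M @ col_mix))        # RESPONSE: best responses
        br_col = int(np.argmin(row_mix @ M))        #   to the restricted equilibrium
        if br_row in S and br_col in T:
            break                                   # nothing new to add: done
        if br_row not in S: S.append(br_row)
        if br_col not in T: T.append(br_col)
    return S, T

# Rock-paper-scissors: starting from a single strategy each, the loop
# discovers all three actions for both players before it stabilizes.
rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
S, T = iterated_balanced_best_response(rps)
```

The design point of the slide survives in the sketch: the expensive object (the full game `M`) is only ever touched through cheap best-response queries, while equilibrium solving happens on the small restricted game.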

Slide 17: What's the Point? In general, equilibrium computations are significantly harder than best responses. In practice, it is easier to compute an approximate best response than an approximate Nash equilibrium.

Slide 18: Pure Poker. Players 1 and 2 each receive a "card" in [0,1] (a real number). Then player 1 bets or checks. If player 1 bets, player 2 calls or folds.

Slide 19: Strategies. [Figure: each player's strategy as probability mass over cards in [0,1]; player 1's regions labeled Bet/Check, player 2's labeled Call/Fold.]

Slide 20: Pure Poker. Continuous state space. Given a strategy that splits [0,1] into a finite number of intervals and plays a fixed distribution in each interval, the best response is also of this form. [Figure: an interval strategy over [0,1].]
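The interval-best-response claim can be checked numerically for player 2. The stakes (ante 1 each, bet size 1) and player 1's betting region ([0, 0.1) as bluffs, (0.8, 1] as value bets) are assumptions of mine; the slide fixes neither.

```python
import numpy as np

def call_ev(c, n=200_000, seed=0):
    """Player 2's expected profit from CALLING with card c, conditioned on
    player 1 having bet (Monte Carlo over player 1's uniform card)."""
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    x = x[(x < 0.1) | (x > 0.8)]   # keep only the hands player 1 bets (assumed region)
    # Win or lose the ante plus the bet, +/-2 under the assumed stakes.
    return float(np.mean(np.where(c > x, 2.0, -2.0)))

# Folding always loses the ante (-1). Since call_ev is nondecreasing in the
# card c, "call exactly when c is above some threshold" is a best response:
# the best response is itself an interval strategy.
evs = [call_ev(c) for c in np.linspace(0.0, 1.0, 11)]
```

The particular threshold depends on the assumed stakes; the point is only the interval structure, which holds because the call payoff is monotone in the card.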

Slide 21: Pure Poker. [Figure: interval strategies over [0,1] for player 1 (Bet/Check regions) and player 2 (Call/Fold regions).]

Slide 22: Real Poker. In one abstraction we are currently working with, each player has 625 private states, and there are about 16,000 betting sequences, for several BILLION states in total. While it is possible to iterate over all such states in a short period of time, you can't really perform complex operations on a problem of this size.

Slide 23: Positive Results. In under a hundred iterations, this technique can approximately solve simple variants of poker, such as Kuhn and Leduc poker.

Slide 24: Outline. Introduction; Iterated Best Response; Iterated Generalized Best Response; Other Applications; Conclusion.

Slide 25: Practical Problem. Although the balance-response technique above works, it can generate many strategies before an equilibrium is reached. Is there a way to cut down on this?

Slide 26: Robustness. How do you develop a strategy that is robust under the assumption that your opponent will play a strategy you have already seen?

Slide 27: Robustness: Generalized Best Response. Maximize the MINIMUM against a set of opponents. [Table: payoffs of candidate strategies against opponents a, b, and c, with a Min column.]

Slide 28: Robustness: Generalized Best Response. Maximize the MINIMUM against a set of opponents. The set of possible actions could be INFINITE. [Table: payoffs of candidate strategies against opponents a, b, and c, with a Min column.]

Slide 29: Iterated Generalized Best Response. Start with strategies S and T. Add to T a generalized best response to S. Add to S a generalized best response to T. Repeat until satisfied.

Slide 30: Hide and Seek. [Figure: hide-and-seek grid. Hide actions in blue; seek actions in red.]

Slide 31: How to Compute a Generalized Best Response? 1. Use a linear program (could be slow; could be arbitrarily high precision). 2. Use iterated best response: (a) start with the set of strategies S and a possibly empty T; (b) compute a Nash equilibrium between S and T; (c) find a best response to the mixture over S; (d) add it to T.
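Step (c) is cheap in matrix form: once the restricted game hands you an equilibrium mixture over S, a best response to that mixture is an ordinary argmax against the average opponent strategy. A tiny sketch (the rock-paper-scissors matrix and the 50/50 restricted-equilibrium mixture are hypothetical inputs):

```python
import numpy as np

M = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])            # row player's payoffs in rock-paper-scissors
mixture = np.array([0.5, 0.5, 0.0])      # assumed equilibrium of the restricted game
expected = M @ mixture                   # my expected payoff for each pure action
best_response = int(np.argmax(expected)) # index 1 (paper): beats rock, ties paper
```

This is why the iterated variant can outrun the direct LP: each step only averages and scans, and the restricted equilibrium it needs is over a much smaller game.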

Slide 32: Results in Poker. Using this technique (iterated GBR), we solved a four-round game of Texas Hold'em. We beat Opti4 (Sparbot)! By 0.01 small bets/hand.

Slide 33: Other Applications. Economics (non-zero-sum). Counterstrike/RTS games (where the best response is not easy to compute).

Slide 34: Extensions. Non-zero-sum games. Approximate best-response operations (through reinforcement learning). Learning the abstraction while learning the strategy.

Slide 35: Conclusions. Algorithms that learn in self-play (such as iterated generalized best response) yield a wealth of useful strategies, including an approximate Nash equilibrium.

Slide 36: How Hard Is a Game? For a game to be hard, it has to be at least POSSIBLE to play it badly: otherwise, regardless of how complex it is, it is still easy. The depth of human skill in a particular game indicates its complexity.

Slide 37: How Hard Is a Game?

Slide 38: Formalism. If the complexity of a game is at least k, then there exist people 1 to k such that, for any two people in the list with i > j, person i can beat person j with probability at least 2/3.

Slide 39: How Hard Is a Game?

Slide 40: Important Property: Transitivity in the List.

Slide 41: Formalism. If the complexity of a game is at least k, then there exist people 1 to k such that, for any two people in the list with i > j, person i can beat person j with probability at least 2/3.

Slide 42: Why People?

Slide 43: Why People? Choose a number between 1 and 100. The highest number wins a dollar; no money is exchanged on a tie.

Slide 44: Formalism. If the complexity of a game is at least k, then there exist strategies 1 to k such that, for any two strategies in the list with i > j, strategy i can beat strategy j with probability at least 2/3.

Slide 45: Formalism. The epsilon-complexity of a game is at least k if there exist strategies 1 to k such that, for any two strategies with i > j, EV[i playing against j] > epsilon.
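Certifying the lower bound in this definition amounts to exhibiting a chain of strategies in which each later one beats every earlier one by more than epsilon. A sketch, assuming a payoff matrix M with M[i, j] the expected value of strategy i against strategy j; `epsilon_chain` is a hypothetical helper of mine, and its greedy scan need not find the longest such chain in general.

```python
import numpy as np

def epsilon_chain(M, eps):
    """Greedily build a list of strategy indices in which every later
    strategy beats EVERY earlier one by more than eps."""
    chain = []
    for i in range(M.shape[0]):
        if all(M[i, j] > eps for j in chain):
            chain.append(i)
    return chain

# Three strategies where each beats all of its predecessors outright,
# so the 0.5-complexity of this little game is at least 3.
M = np.array([[0., -1., -1.],
              [1., 0., -1.],
              [1., 1., 0.]])
chain = epsilon_chain(M, 0.5)
```

On an intransitive game like rock-paper-scissors the same scan stalls after two strategies, matching the intuition that cyclic games do not have deep skill chains of this kind.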

Slide 46: Make It a Linear Program? The linear program (sequence form) has a number of constraints and variables roughly proportional to the size of the game tree. The coefficient matrix is big, which makes inversion difficult. There are also numerical instabilities.

Slide 47: A Theoretical Guarantee? (No!) [Figure: hide-and-seek grid. Hide actions in blue; seek actions in red.]

Slide 48: The Theoretical Problem. Each new bot is a best response to a particular mixture of the previous bots. There could be a different mixture over those bots which would do BETTER against the new bot: in fact, it could even beat the new bot!

Slide 49: A Theoretical Guarantee? (No!) [Figure: hide-and-seek grid. Hide actions in blue; seek actions in red.]

Slide 50: A Theoretical Guarantee? (No!) [Figure: hide-and-seek grid. Hide actions in blue; seek actions in red.]

Slide 51: A Theoretical Guarantee? (Yes!)

Slide 52: Relation to Complexity. Theorem: if the epsilon-complexity is k, then after running iterated generalized best response for k steps, there is no strategy that can beat every strategy of yours by epsilon.

Slide 53: An Epsilon Equilibrium. At the end of iterated generalized best response, you have a guarantee of how hard it is to beat the sequence, not necessarily any individual from the sequence. However, using a similar trick, you can easily compute a mixture over these strategies against which no strategy can gain more than epsilon.
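That final mixing step can be sketched as one small LP: mix over just the k generated row strategies so as to maximize the worst case over every opponent action. The sketch assumes scipy and a hypothetical example in which only two of rock-paper-scissors' three strategies were generated.

```python
import numpy as np
from scipy.optimize import linprog

M_k = np.array([[0., -1., 1.],
                [1., 0., -1.]])   # generated rows (rock, paper) vs ALL opponent actions
k, n = M_k.shape

# Variables (p_1..p_k, v): maximize v subject to v <= (p @ M_k)_j for
# every opponent column j, p >= 0, sum(p) = 1. linprog minimizes, so
# the objective is -v.
c = np.zeros(k + 1); c[-1] = -1.0
A_ub = np.hstack([-M_k.T, np.ones((n, 1))])
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
              A_eq=[[1.0] * k + [0.0]], b_eq=[1.0],
              bounds=[(0, None)] * k + [(None, None)])
p, v = res.x[:k], res.x[-1]
# No opponent action gains more than -v against the mixture p; with rock
# and paper only, the optimal mix is (1/3, 2/3) with worst case -1/3.
```

The LP has only k + 1 variables, so it stays cheap even when the underlying game is huge: that is the "similar trick."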

Slide 54: Two Techniques for Finding Nash Equilibria of Extensive-Form Games. (1) Take ALL strategies of the full game and make a bimatrix game where each is an action (Thomson; Kuhn); the bimatrix game is exponential in the size of the tree. (2) Form a linear program using the sequence form; the LP is linear in the size of the tree (Koller et al.).