Game Theory-Based Opponent Modeling in Large Imperfect-Information Games
Tuomas Sandholm
Carnegie Mellon University, Computer Science Department
Joint work with Sam Ganzfried

Traditionally two approaches
- Game theory approach (abstraction + equilibrium finding)
  - Safe in 2-person 0-sum games
  - Doesn't maximally exploit weaknesses in the opponent(s)
- Opponent modeling
  - Get-taught-and-exploited problem [Sandholm AIJ-07]
  - Needs prohibitively many repetitions to learn in large games (loses too much during learning)
  - Crushed by the game theory approach in Texas Hold'em, even with just 2 players and limit betting
  - The same tends to be true of no-regret learning algorithms
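To make this safety-vs-exploitation tradeoff concrete, here is a minimal rock-paper-scissors sketch (a toy stand-in for poker; the payoff matrix, the biased opponent, and the `value` helper are illustrative, not from the talk). The uniform equilibrium guarantees zero against any opponent but wins nothing extra from a biased one, while a best response profits from the bias yet can itself be badly exploited.

```python
import numpy as np

# Rock-paper-scissors payoff matrix for the row player;
# rows and columns are ordered (rock, paper, scissors).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

equilibrium = np.array([1/3, 1/3, 1/3])    # uniform mixed equilibrium
biased_opp  = np.array([0.5, 0.25, 0.25])  # opponent over-plays rock

def value(row_strategy, col_strategy):
    """Expected payoff to the row player."""
    return row_strategy @ A @ col_strategy

# Equilibrium is safe but leaves profit on the table:
print(value(equilibrium, biased_opp))             # 0.0

# The best response to the bias (always play paper) exploits it fully...
best_response = np.array([0.0, 1.0, 0.0])
print(value(best_response, biased_opp))           # 0.25

# ...but is maximally exploitable if the opponent adapts and plays scissors:
print(value(best_response, np.array([0.0, 0.0, 1.0])))  # -1.0
```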

Let's hybridize the two approaches
- Start playing based on the game theory approach
- As we learn that the opponent(s) deviate from equilibrium, start adjusting our strategy to exploit their weaknesses
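One way such a hybrid could look in a normal-form setting is sketched below. This is a heuristic illustration, not the algorithm from the talk: it assumes a symmetric game so the same equilibrium vector describes both players, and the `confidence_scale` knob and L1-based mixing weight are invented for the example.

```python
import numpy as np

def hybrid_strategy(equilibrium, opponent_model, payoff, n_obs,
                    confidence_scale=50.0):
    """Blend equilibrium play with a best response to the modeled opponent.

    The mixing weight grows with both the number of observations and the
    size of the opponent's estimated deviation from equilibrium, so play
    starts at equilibrium and shifts toward exploitation as evidence
    accumulates.
    """
    # Best response to the current opponent model (a pure strategy suffices).
    br = np.zeros(len(equilibrium))
    br[np.argmax(payoff @ opponent_model)] = 1.0

    # Estimated deviation of the opponent from equilibrium (L1 distance),
    # discounted until we have enough observations to trust the model.
    deviation = np.abs(opponent_model - equilibrium).sum()
    trust = n_obs / (n_obs + confidence_scale)

    w = trust * min(1.0, deviation)
    return (1 - w) * equilibrium + w * br
```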

The dream of safe exploitation
- Wish: avoid the get-taught-and-exploited problem by exploiting only to an extent that risks what we have won so far
- Proposition. It is impossible to exploit to any extent (beyond what the best equilibrium strategy would exploit) while preserving the safety guarantee of equilibrium play
- So we give up some worst-case safety ...
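The "risk what we have won" wish can be phrased as a tiny bankroll-gated policy, sketched below. This is a hypothetical illustration of the wish, not a safe algorithm; as the Proposition says, any genuine extra exploitation gives up some worst-case safety. The per-hand `exploit_risk` bound is an assumed input.

```python
def choose_strategy(bankroll_won, exploit_risk, equilibrium, exploit):
    """Play the exploitative strategy only when accumulated winnings
    cover its worst-case extra loss relative to equilibrium play.

    bankroll_won: profit accumulated so far, relative to the start
    exploit_risk: worst-case amount (per hand) the exploitative strategy
                  can lose beyond the equilibrium guarantee
    """
    if bankroll_won >= exploit_risk:
        return exploit      # risk only money we have already won
    return equilibrium      # otherwise fall back to the safe strategy
```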

Deviation-Based Best Response (DBBR) algorithm
(can be generalized to multi-player non-zero-sum)
- Model the opponent's action probabilities with a Dirichlet prior
- Many ways to determine the opponent's "best" strategy that is consistent with the observations:
  - L1 or L2 distance to the equilibrium strategy
  - Custom weight-shifting algorithm
  - ...
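A minimal sketch of the Dirichlet-prior ingredient, assuming per-information-set action counts (the `prior_strength` knob and the example numbers are made up; DBBR's full pipeline, including the weight-shifting variant, is not reproduced here):

```python
import numpy as np

def posterior_action_probs(equilibrium, counts, prior_strength=10.0):
    """Posterior mean of the opponent's action distribution at one
    information set, under a Dirichlet prior centered on the equilibrium.

    equilibrium:    equilibrium action probabilities at this infoset
    counts:         observed opponent action counts at this infoset
    prior_strength: total pseudo-count mass of the prior (assumed knob)
    """
    alpha = prior_strength * np.asarray(equilibrium, dtype=float)
    posterior = alpha + np.asarray(counts, dtype=float)
    return posterior / posterior.sum()

# Example: equilibrium plays (fold, call, raise) = (0.1, 0.6, 0.3),
# but the opponent has raised 8 of the 10 hands observed here.
print(posterior_action_probs([0.1, 0.6, 0.3], [1, 1, 8]))
# -> [0.1, 0.35, 0.55]: near equilibrium with few observations,
#    drifting toward the observed raise-heavy frequencies as counts grow.
```

From such a model one could then pick, among opponent strategies consistent with the observations, the one closest in L1 or L2 distance to the equilibrium, matching the slide's second bullet.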

Experiments
- Performs significantly better in 2-player Limit Texas Hold'em, against both trivial opponents and weak opponents from the AAAI computer poker competitions, than the game-theory-based base strategy
- Can be turned on only against weak opponents
- Examples of winrate evolution: [winrate plots not reproduced in the transcript]