POKER AGENTS
LD Miller & Adam Eck
May 3, 2011

Motivation

- Classic environment properties of MAS
  - Stochastic behavior (agents and environment)
  - Incomplete information
  - Uncertainty
- Application examples
  - Robotics
  - Intelligent user interfaces
  - Decision support systems

Overview

- Background
- Methodology (updated)
- Results (updated)
- Conclusions (updated)

Background | Texas Hold'em Poker

- Games consist of four steps: (1) pre-flop, (2) flop, (3) turn, (4) river
- Actions: bet (check, raise, call) and fold
- Bets can be limited or unlimited

[Figure: private cards are dealt pre-flop; community cards arrive on the flop, turn, and river]
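For concreteness, a minimal Python sketch of the round and action structure described above (the names and representation are ours, not from the slides):

```python
from enum import Enum, auto

class Round(Enum):
    """The four steps of a Texas Hold'em hand."""
    PRE_FLOP = auto()  # two private (hole) cards per player
    FLOP = auto()      # first three community cards revealed
    TURN = auto()      # fourth community card
    RIVER = auto()     # fifth community card

class Action(Enum):
    """Available actions; check, call, and raise are forms of betting."""
    FOLD = auto()
    CHECK = auto()
    CALL = auto()
    RAISE = auto()
```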

Background | Texas Hold'em Poker

- Significant worldwide popularity and revenue
  - The World Series of Poker (WSOP) attracted 63,706 players in 2010 (WSOP, 2010)
  - Online sites generated an estimated $20 billion in 2007 (Economist, 2007)
- Has a fortuitous mix of strategy and luck
  - Community cards allow for more accurate modeling
  - Still many "outs": remaining community cards that can defeat strong hands

Background | Texas Hold'em Poker

- Strategy depends on hand strength, which changes from step to step!
- Hands that were strong early in the game may get weaker (and vice-versa) as cards are dealt

[Figure: the same private cards prompting "raise!" early, but "check?" or "fold?" after more community cards are revealed]
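Hand strength of this kind is often estimated by Monte Carlo rollouts against random opponent holdings. The slides use a third-party evaluator (Poker Prophesier), so the sketch below is only an illustrative stand-in; in particular, `crude_rank` is a deliberately naive high-card proxy for a real hand evaluator:

```python
import random
from itertools import product

RANKS = "23456789TJQKA"
DECK = [r + s for r, s in product(RANKS, "cdhs")]

def crude_rank(cards):
    # Placeholder evaluator: a real one would rank the best 5-card hand.
    return max(RANKS.index(c[0]) for c in cards)

def hand_strength(private, community, trials=1000):
    """Estimated probability of beating one random opponent at showdown."""
    seen = set(private) | set(community)
    deck = [c for c in DECK if c not in seen]
    wins = 0.0
    for _ in range(trials):
        drawn = random.sample(deck, 2 + (5 - len(community)))
        opp, board = drawn[:2], community + drawn[2:]
        ours, theirs = crude_rank(private + board), crude_rank(opp + board)
        wins += 1.0 if ours > theirs else 0.5 if ours == theirs else 0.0
    return wins / trials
```

Re-running the estimate after each step shows exactly the effect on the slide: a hand that scored well pre-flop can drop sharply once the flop or turn arrives.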

Background | Texas Hold'em Poker

- Strategy also depends on betting behavior
- Three different player types (Smith, 2009):
  - Aggressive players, who often bet/raise to force folds
  - Optimistic players, who often call to stay in hands
  - Conservative or "tight" players, who often fold unless they have really strong hands

Methodology | Strategies

- Solution 2: Probability distributions
- Hand strength measured using Poker Prophesier

Behavior table (hand-strength intervals selecting each tactic):

  Behavior      Weak       Medium      Strong
  Aggressive    [0…0.2)    [0.2…0.6)   [0.6…1)
  Optimistic    [0…0.5)    [0.5…0.9)   [0.9…1)
  Conservative  [0…0.3)    [0.3…0.8)   [0.8…1)

Tactic table (roll intervals selecting each action):

  Tactic   Fold        Call          Raise
  Weak     [0…0.7)     [0.7…0.95)    [0.95…1)
  Medium   [0…0.3)     [0.3…0.7)     [0.7…1)
  Strong   [0…0.05)    [0.05…0.3)    [0.3…1)

Action selection (see the sketch below):
(1) Check hand strength against the behavior's intervals to get a tactic
(2) "Roll" a uniform random number against the tactic's intervals to get an action
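Read together, the two tables define a two-stage sampling procedure. A direct Python translation, with the interval boundaries copied from the tables above:

```python
import random

# Behavior -> hand-strength thresholds separating the Weak/Medium/Strong tactics.
BEHAVIOR = {
    "aggressive":   (0.2, 0.6),
    "optimistic":   (0.5, 0.9),
    "conservative": (0.3, 0.8),
}

# Tactic -> roll thresholds separating Fold/Call/Raise.
TACTIC = {
    "weak":   (0.7, 0.95),
    "medium": (0.3, 0.7),
    "strong": (0.05, 0.3),
}

def choose_action(behavior, hand_strength):
    # (1) Check hand strength against the behavior's intervals for a tactic.
    weak_hi, medium_hi = BEHAVIOR[behavior]
    tactic = ("weak" if hand_strength < weak_hi
              else "medium" if hand_strength < medium_hi else "strong")
    # (2) "Roll" a uniform random number on the tactic's intervals for an action.
    fold_hi, call_hi = TACTIC[tactic]
    roll = random.random()
    return "fold" if roll < fold_hi else "call" if roll < call_hi else "raise"
```

Note how the behaviors differ only in their thresholds: a conservative agent needs hand strength of at least 0.8 to play the Strong tactic, while an aggressive agent needs only 0.6.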

Methodology | Deceptive Agent

- Problem 1: Agents don't explicitly deceive
  - They reveal their strategy with every action
  - Easy to model
- Solution: alternate strategies periodically
  - Conservative to aggressive, and vice-versa
  - Breaks opponent modeling (concept shift)
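A sketch of that periodic switch, reusing `choose_action` from the strategy sketch above (the switch interval is an assumption; the slides do not give a schedule):

```python
class DeceptiveAgent:
    def __init__(self, switch_every=100):
        self.behavior = "conservative"
        self.switch_every = switch_every  # hands between switches (assumed)
        self.hands = 0

    def act(self, hand_strength):
        self.hands += 1
        if self.hands % self.switch_every == 0:
            # Flip conservative <-> aggressive to induce concept shift
            # in any opponent model being built against us.
            self.behavior = ("aggressive" if self.behavior == "conservative"
                             else "conservative")
        return choose_action(self.behavior, hand_strength)
```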

Methodology | Explore/Exploit

- Problem 2: Basic agents don't adapt
  - They ignore opponent behavior
  - Static strategies
- Solution: use reinforcement learning (RL)
  - Implicitly model opponents
  - Revise action probabilities
  - Explore the space of strategies, then exploit success
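One minimal realization of this idea, assuming bucketed hand strength as the state, a simple running-average value update, and decaying ε-greedy exploration (none of these details are specified on the slide):

```python
import random
from collections import defaultdict

ACTIONS = ["fold", "call", "raise"]

class RLAgent:
    def __init__(self, epsilon=0.3, decay=0.999, alpha=0.1):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.epsilon, self.decay, self.alpha = epsilon, decay, alpha

    def _state(self, hand_strength):
        return round(hand_strength, 1)  # coarse hand-strength bucket

    def act(self, hand_strength):
        self.epsilon *= self.decay  # explore early, exploit later
        state = self._state(hand_strength)
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)                       # explore
        return max(ACTIONS, key=lambda a: self.q[(state, a)])   # exploit

    def learn(self, hand_strength, action, reward):
        state = self._state(hand_strength)
        self.q[(state, action)] += self.alpha * (reward - self.q[(state, action)])
```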

Methodology | Active Sensing

- Opponent model = knowledge
  - Refined through observations: betting history, opponent's cards
- Actions produce observations
  - Information is not free
- Tradeoff in action selection
  - Current vs. future hand winnings/losses
  - Sacrifice vs. gain
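The tradeoff can be framed as a value-of-information comparison. The sketch below is purely illustrative; in particular `model_gain`, the expected future value of the observations a call would buy, is our assumption rather than a quantity defined on the slides:

```python
def sense_or_fold(call_cost, win_prob, pot, model_gain):
    """Return 'call' when paying to observe is worth it.

    ev_fold: folding costs nothing now but yields no observation.
    ev_call: immediate expected winnings plus the (assumed) future
    value of the observations produced by staying in the hand.
    """
    ev_fold = 0.0
    ev_call = win_prob * pot - (1 - win_prob) * call_cost + model_gain
    return "call" if ev_call > ev_fold else "fold"
```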

Methodology | Active Sensing

- Knowledge representation
  - Set of Dirichlet probability distributions (a frequency-counting approach)
  - Opponent state s_o = their estimated hand strength
  - Observed opponent action a_o
- Opponent state
  - Calculated at the end of the hand (if cards are revealed)
  - Otherwise estimated as 1 - s, considering all possible opponent hands
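A frequency-counting sketch of this representation: one Dirichlet distribution over opponent actions per opponent state, updated by incrementing pseudo-counts (the uniform prior and the exact state bucketing are our assumptions):

```python
from collections import defaultdict

ACTIONS = ["fold", "call", "raise"]

class DirichletOpponentModel:
    def __init__(self, prior=1.0):
        # One Dirichlet per opponent state; values are pseudo-counts.
        self.counts = defaultdict(lambda: dict.fromkeys(ACTIONS, prior))

    def observe(self, opp_state, opp_action):
        # Frequency counting: each observation bumps one parameter.
        self.counts[opp_state][opp_action] += 1

    def predict(self, opp_state):
        # Posterior mean of the Dirichlet = normalized pseudo-counts.
        counts = self.counts[opp_state]
        total = sum(counts.values())
        return {a: n / total for a, n in counts.items()}
```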

Methodology | BoU

- Problem: Different strategies may only be effective against certain opponents
  - Example: Doyle Brunson won two WSOP titles with 10-2, a notoriously weak starting hand
  - Example: An aggressive strategy is detrimental when the opponent knows you are aggressive
- Solution: Choose the "correct" strategy based on previous sessions

Methodology | BoU

- Approach: Find the Boundary of Use (BoU) for the strategies based on previously collected sessions
- The BoU partitions sessions into three types of regions (successful, unsuccessful, mixed) based on session outcome
  - Session outcome is complex and independent of strategy
- Choose the correct strategy for new hands based on region membership

Methodology | BoU

- BoU example
- Ideal: all sessions inside the BoU

[Figure: example BoU with regions labeled "Strategy Correct", "Strategy Incorrect", and unknown ("?????")]

Methodology | BoU

- BoU implementation
  - k-Medoids semi-supervised clustering (sketched below)
    - Similarity metric modified to incorporate action sequences AND missing values
    - Number of clusters found automatically by balancing cluster purity and coverage
  - Session outcome
    - Uses hand strength to compute the correct decision
  - Model updates
    - Adjust intervals for tactics based on sessions found in mixed regions
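A bare-bones k-Medoids sketch with a distance that tolerates missing values. The slide's actual metric also weights action sequences, and the number of clusters is chosen by balancing purity and coverage; both refinements are simplified away here, and `session_distance` is our stand-in:

```python
import random

def session_distance(a, b):
    # Mismatch rate over aligned positions, skipping missing values (None).
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else 1.0

def _assign(sessions, medoids):
    clusters = [[] for _ in medoids]
    for s in sessions:
        nearest = min(range(len(medoids)),
                      key=lambda i: session_distance(s, medoids[i]))
        clusters[nearest].append(s)
    return clusters

def k_medoids(sessions, k, iters=20):
    medoids = random.sample(sessions, k)
    for _ in range(iters):
        clusters = _assign(sessions, medoids)
        # Each medoid moves to the member minimizing total in-cluster distance.
        medoids = [min(c, key=lambda m: sum(session_distance(m, s) for s in c))
                   for c in clusters if c]
    return medoids, _assign(sessions, medoids)
```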

Results | Overview

- Validation (presented previously)
  - Basic agent vs. other basic agents
  - RL agent vs. basic agents
  - Deceptive agent vs. RL agent
- Investigation
  - AS agent vs. RL/Deceptive agents
  - BoU agent vs. RL/Deceptive agents
  - AS agent vs. BoU agent: the ultimate showdown

Results | Overview

Hypotheses (research and operational):

  Hypo.  Summary                                                          Validated?  Section
  R1     AS agents will outperform non-AS...                              ???         5.2.1
  R2     Changing the rate of exploration in AS will...                   ???         5.2.1
  R3     Using the BoU to choose the correct strategy...                  ???         5.2.3
  O1     None of the basic strategies dominates                           ???         5.1.1
  O2     RL approach will outperform basic... and Deceptive will be
         somewhere in the middle...                                       ???
  O3     AS and BoU will outperform RL                                    ???
  O4     Deceptive will lead for the first part of games...               ???
  O5     AS will outperform BoU when BoU does not have any data on AS     ???         5.2.3

Results | RL Validation

- Matchup 1: RL vs. Aggressive

[Table: learned fold/call/raise probabilities by hand strength (HS)]

Results | RL Validation

- Matchup 2: RL vs. Optimistic

[Table: learned fold/call/raise probabilities by hand strength (HS)]

Results | RL Validation

- Matchup 3: RL vs. Conservative

[Table: learned fold/call/raise probabilities by hand strength (HS)]

Results | RL Validation

- Matchup 4: RL vs. Deceptive

[Table: learned fold/call/raise probabilities by hand strength (HS)]

Results | AS Results

- All opponent modeling approaches defeat RL
  - Explicit modeling better than implicit
- AS with ε = 0.2 improves over non-AS due to additional sensing
- AS with ε = 0.4 senses too much, resulting in too many lost hands

Results | AS Results

- All opponent modeling approaches defeat Deceptive
  - They can handle concept shift
- AS with ε = 0.2 similar to non-AS
  - Little benefit from the extra sensing
- Again, AS with ε = 0.4 senses too much

Results | AS Results

- AS with ε = 0.2 defeats non-AS
  - Active sensing provides a better opponent model
  - Overcomes the additional costs
- Again, AS with ε = 0.4 senses too much

Results | AS Results

- Conclusions
  - Mixed results for Hypothesis R1
    - AS with ε = 0.2 better than non-AS against RL and heads-up
    - AS with ε = 0.4 always worse than non-AS
  - Hypothesis R2 confirmed
    - ε = 0.4 results in too much sensing, which leads to more losses in hands where the agent should have folded
    - Not enough extra sensing benefit to offset the costs

Results | BoU Results

- BoU is crushed by RL
  - BoU constantly lowers the interval for Aggressive
  - RL learns to be super-tight

Results | BoU Results

- BoU very close to Deceptive
  - Both use aggressive strategies
  - BoU's aggressive play is much more reckless after model updates

Results | BoU Results

- Conclusions
  - Hypotheses R3 and O3 do not hold
    - BoU does not outperform Deceptive/RL
  - Model update method
    - Updates the Aggressive strategy to "fix" mixed regions
    - Results in emergent behavior: reckless bluffing
    - Bluffing is very bad against a super-tight player

[Table: fold/call/raise probabilities by hand strength (HS)]

Results | Ultimate Showdown

- And the winner is… active sensing (booo)

[Table: fold/call/raise probabilities by hand strength (HS)]

Conclusion | Summary

- AS > RL > Aggressive > Deceptive >= BoU > Optimistic > Conservative

  Hypo.  Summary                                                          Validated?  Section
  R1     AS agents will outperform non-AS...                              Yes         5.2.1
  R2     Changing the rate of exploration in AS will...                   Yes         5.2.1
  R3     Using the BoU to choose the correct strategy...                  No          5.2.3
  O1     None of the basic strategies dominates                           No          5.1.1
  O2     RL approach will outperform basic... and Deceptive will be
         somewhere in the middle...                                       Yes
  O3     AS and BoU will outperform RL                                    No
  O4     Deceptive will lead for the first part of games...               No
  O5     AS will outperform BoU when BoU does not have any data on AS     Yes         5.2.3

Questions?

References

- (Daw et al., 2006) Daw, N.D., et al. Cortical substrates for exploratory decisions in humans. Nature, 441.
- (Economist, 2007) Poker: A big deal. The Economist. Retrieved January 11, 2011.
- (Smith, 2009) Smith, G., Levere, M., and Kurtzman, R. Poker player behavior after big wins and big losses. Management Science, 2009.
- (WSOP, 2010) 2010 World Series of Poker shatters attendance records. Retrieved January 11, 2011, from …SERIES-OF-POKER-SHATTERS-ATTENDANCE-RECORD.html