Poker
John Urbanic, Parallel Computing Scientist, Pittsburgh Supercomputing Center
(with a lot of slides from Tuomas Sandholm)
Copyright 2015

Recognized challenge problem in AI since 1992 [Billings, Schaeffer, …]
– Hidden information (other players' cards)
– Uncertainty about future events
– Deceptive strategies needed in a good player
– Very large game trees
[Photo: NBC National Heads-Up Poker Championship 2013]

Heads-Up Limit Texas Hold'em
Bots surpassed pros in 2008 [U. Alberta Poker Research Group]
"Essentially solved" in 2015 [Bowling et al.]
[Images labeled 2008 and AAAI-07]

Heads-Up No-Limit Texas Hold'em
Annual Computer Poker Competition
[Images: Claudico, Tartanian7]

Heads-up no-limit Texas Hold’em Thanks Microsoft!

Texas Hold'em poker
– 2-player Limit has ~10^18 nodes
– 2-player No-Limit has ~10^165 nodes
– Losslessly abstracted game too big to solve => abstract more => lossy
[Figure: betting rounds interleaved with chance moves: Nature deals 2 cards to each player, then 3 shared cards, then 1 shared card, then 1 more shared card, with a round of betting after each deal]

[Game tree figure: chance and action nodes through the PreFlop, Flop, Turn, and River rounds; P2 action nodes (Fold / Call / Raise / Bet); a chance node with 22,100 possible outcomes; payoff at any leaf]
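The "22,100 possible" in the figure is consistent with counting unordered 3-card combinations from a full 52-card deck, C(52,3) = 22,100; that reading of the label is an assumption on our part. A quick combinatorial check in C:

#include <stdio.h>

/* n choose k with exact integer arithmetic (the division is exact at each step) */
long long choose(int n, int k) {
    long long r = 1;
    for (int i = 1; i <= k; i++)
        r = r * (n - k + i) / i;
    return r;
}

int main(void) {
    printf("3-card boards from a 52-card deck: C(52,3) = %lld\n", choose(52, 3)); /* 22100 */
    printf("2-card deals to one player:        C(52,2) = %lld\n", choose(52, 2)); /* 1326  */
    return 0;
}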

Our approach [Gilpin & Sandholm EC-06, J. of the ACM 2007, …]
Now used by essentially all competitive Texas Hold'em programs
[Diagram: Original game -> (automated abstraction) -> Abstracted game -> (custom equilibrium-finding algorithm) -> Nash equilibrium -> (reverse model) -> strategy for the original game]
Foreshadowed by Shi & Littman 01 and Billings et al. IJCAI-03

Compute Strategy
Nash Equilibrium
– A "defensive" strategy (doesn't try to learn or to exploit opponent flaws)
– No worse than a tie (on average over many hands): neither player can improve their expected utility through a unilateral strategy change
Too hard to solve exactly here, so we use an approximation that converges to this:
Counterfactual Regret Minimization (CFR)
– Built on regret matching, invented in 2000 by Hart and Mas-Colell!
– The predominant approach since ~2006
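The core update inside CFR is Hart and Mas-Colell's regret matching: play each action with probability proportional to its positive cumulative regret, and let the average strategy over many iterations approach equilibrium. A minimal self-contained sketch for rock-paper-scissors (a toy stand-in, not the poker code):

#include <stdio.h>
#include <stdlib.h>

#define ACTIONS 3                 /* rock, paper, scissors */
#define ITERS   1000000

/* payoff to the row player; RPS is symmetric, so it serves both players */
static const int payoff[ACTIONS][ACTIONS] = {
    { 0, -1,  1 },                /* rock     vs rock, paper, scissors */
    { 1,  0, -1 },                /* paper    */
    {-1,  1,  0 },                /* scissors */
};

/* regret matching: probability proportional to positive cumulative regret;
   fall back to uniform when no regret is positive */
static void get_strategy(const double regret[ACTIONS], double strat[ACTIONS]) {
    double sum = 0.0;
    for (int a = 0; a < ACTIONS; a++)
        sum += regret[a] > 0 ? regret[a] : 0.0;
    for (int a = 0; a < ACTIONS; a++)
        strat[a] = sum > 0 ? (regret[a] > 0 ? regret[a] / sum : 0.0)
                           : 1.0 / ACTIONS;
}

static int sample(const double strat[ACTIONS]) {
    double r = (double)rand() / RAND_MAX, c = 0.0;
    for (int a = 0; a < ACTIONS; a++) {
        c += strat[a];
        if (r < c) return a;
    }
    return ACTIONS - 1;
}

int main(void) {
    double regret[2][ACTIONS] = {{0}}, avg[2][ACTIONS] = {{0}}, strat[2][ACTIONS];
    srand(42);
    for (int t = 0; t < ITERS; t++) {
        int act[2];
        for (int p = 0; p < 2; p++) {
            get_strategy(regret[p], strat[p]);
            for (int a = 0; a < ACTIONS; a++) avg[p][a] += strat[p][a];
            act[p] = sample(strat[p]);
        }
        /* regret of action a = what a would have earned minus what we earned */
        for (int p = 0; p < 2; p++) {
            int opp = act[1 - p];
            for (int a = 0; a < ACTIONS; a++)
                regret[p][a] += payoff[a][opp] - payoff[act[p]][opp];
        }
    }
    /* the AVERAGE strategy is what converges toward equilibrium */
    for (int a = 0; a < ACTIONS; a++)
        printf("P0 plays action %d with prob %.3f\n", a, avg[0][a] / ITERS);
    return 0;
}

The average strategy printed at the end approaches the uniform (1/3, 1/3, 1/3) equilibrium even though the per-iteration strategies keep oscillating.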

Abstraction
Mostly about how to bin similar situations: a spade 4-flush is kind of like a heart 4-flush.
Clustering into "buckets" (k-means in this case, though not at all the only choice):
– 169 pre-flop buckets (counted in the sketch below)
– 60 public flop buckets
– 500 private buckets for turn and river
Down to 5.5^15 nodes
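The 169 is the number of strategically distinct starting hands once suits are treated as interchangeable: 13 pocket pairs, 78 suited non-pairs, and 78 offsuit non-pairs. A one-line sanity check:

#include <stdio.h>

int main(void) {
    int ranks = 13;
    int pairs   = ranks;                     /* AA, KK, ..., 22 */
    int suited  = ranks * (ranks - 1) / 2;   /* AKs, AQs, ...   */
    int offsuit = ranks * (ranks - 1) / 2;   /* AKo, AQo, ...   */
    printf("%d + %d + %d = %d\n", pairs, suited, offsuit,
           pairs + suited + offsuit);        /* 13 + 78 + 78 = 169 */
    return 0;
}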

More pruning
– Sampling (Monte Carlo): estimation error shrinks like 1/sqrt(N), demonstrated below
– Imperfect recall
  – No longer conforms to the original convergence criteria
  – Not obvious that this is a big win, but empirical results show that it is
– Indexing scheme that accounts for suit isomorphisms
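The sqrt(N) note is the usual Monte Carlo scaling: the standard error of a sampled average falls like 1/sqrt(N), so 100x more samples buy only 10x more accuracy. A toy demonstration (a die-roll mean, standing in for the sampled game values):

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Estimate E[X] for a fair die (true mean 3.5) at increasing sample counts;
   the error shrinks roughly like 1/sqrt(N). */
int main(void) {
    srand(12345);
    for (long n = 1000; n <= 10000000; n *= 10) {
        double sum = 0.0;
        for (long i = 0; i < n; i++)
            sum += 1 + rand() % 6;
        printf("N = %8ld  estimate = %.4f  |error| = %.4f\n",
               n, sum / n, fabs(sum / n - 3.5));
    }
    return 0;
}

Each tenfold increase in N cuts the typical error by only about a factor of 3, which is why sampling buys scale but not precision.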

CFR
Regret: how much better we could have done with some other action instead of the one we took.
Break the overall regret down into regret at each step (actually at each information set):
– An information set is a set of game states that the controlling player cannot distinguish, so the player must choose actions over all such states with the same distribution.
– For example, the first player to act does not know which cards the other players were dealt, so all game states immediately following the deal in which the first player holds the same cards are in the same information set.
Weight this regret by the (iteratively recalculated) probability of the opponent reaching the information set.
Average overall regret is at most the sum of this immediate counterfactual regret, so if we minimize the immediate regrets, we approach a Nash equilibrium.
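In the standard notation of Zinkevich et al. (our notation, not the slide's): \pi_{-i}^{\sigma}(h) is the probability that chance and the opponents reach state h under strategy profile \sigma, and the counterfactual value of information set I is

    v_i(\sigma, I) = \sum_{h \in I} \pi_{-i}^{\sigma}(h) \sum_{z \in Z} \pi^{\sigma}(h, z) \, u_i(z).

Immediate counterfactual regret compares each action a against the strategy actually played,

    R_{i,\mathrm{imm}}^{T}(I) = \frac{1}{T} \max_{a \in A(I)} \sum_{t=1}^{T} \left( v_i(\sigma^t|_{I \to a}, I) - v_i(\sigma^t, I) \right),

and the average overall regret is bounded by the sum of the positive parts,

    R_i^{T} \le \sum_{I \in \mathcal{I}_i} \max\!\left( R_{i,\mathrm{imm}}^{T}(I), 0 \right),

which is why driving the immediate regrets down drives the strategy profile toward equilibrium.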

[Game tree figure repeated from above: PreFlop, Flop, Turn, and River rounds with Fold / Call / Raise / Bet action nodes]

Serial?
So far we have been describing a serial algorithm. Do we need to parallelize to
– scale up the tree?
– iterate to an accurate solution?
History would suggest yes…

Scalability of (near-)equilibrium finding in 2-player 0-sum games
– AAAI poker competition announced
– Koller & Pfeffer: sequence form & LP (simplex)
– Billings et al.: LP (CPLEX interior point method)
– Gilpin & Sandholm: LP (CPLEX interior point method)
– Gilpin, Hoda, Peña & Sandholm: scalable EGT
– Gilpin, Sandholm & Sørensen: scalable EGT
– Zinkevich et al.: counterfactual regret

Blacklight
Obvious starting point:
– Fairly easy threaded code (OpenMP)
– World's largest shared-memory machine (NUMA)
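A shared-memory first cut, as the figure on the next slide suggests: fan the top-of-tree actions out across OpenMP threads and recurse serially below. The node type and values here are hypothetical placeholders, not the actual MCCFR data structures:

#include <omp.h>
#include <stdio.h>

#define NUM_ACTIONS 3   /* e.g., fold / call / raise */

typedef struct Node Node;
struct Node {
    Node  *child[NUM_ACTIONS];
    int    is_leaf;
    double value;
};

/* serial recursive evaluation of one subtree (placeholder computation) */
double walk(Node *n) {
    if (n == NULL) return 0.0;
    if (n->is_leaf) return n->value;
    double v = 0.0;
    for (int a = 0; a < NUM_ACTIONS; a++)
        v += walk(n->child[a]);
    return v;
}

/* fan the root's actions out across OpenMP threads */
double walk_root(Node *root) {
    double v[NUM_ACTIONS];
    #pragma omp parallel for
    for (int a = 0; a < NUM_ACTIONS; a++)
        v[a] = walk(root->child[a]);
    double total = 0.0;
    for (int a = 0; a < NUM_ACTIONS; a++)
        total += v[a];
    return total;
}

int main(void) {
    Node leaf = { {0}, 1, 1.0 };
    Node root = { { &leaf, &leaf, &leaf }, 0, 0.0 };
    printf("root value = %f\n", walk_root(&root));  /* 3.0 for this toy tree */
    return 0;
}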

[Game tree figure: the top (PreFlop) action nodes, Fold / Call / Raise / Bet, spanned by an OpenMP annotation]

[Game tree figure: the same tree with each subtree labeled MPI (OpenMP), i.e., subtrees distributed across nodes with threading within each node]

Hybrid Algorithm
1. Start on the head node for the pre-flop.
2. Send the current state to each child blade.
3. Each child blade then samples from its bucket and continues the iteration of MCCFR. Within each child blade we use multiple cores: whenever a child cluster is reached, each core is given the same inputs but uses a different random number seed to select which sample to work on.
4. Once all the child blades complete their part of the iteration, their calculated values are returned to the head blade.
5. The head blade calculates a weighted average of these values, weighting them by the number of choices (see the sketch below).
6. The head node then continues its iteration of MCCFR, repeating the process whenever the sample exits the top part, until the iteration is complete.
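Steps 4 and 5 map naturally onto a gather followed by a weighted average on the head blade. A hedged sketch of just that exchange (rank 0 standing in for the head blade; the per-child values and weights are placeholders for the MCCFR quantities):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double my_value  = 1.0 + rank;          /* placeholder sampled value       */
    double my_weight = 1.0 + rank % 3;      /* placeholder: number of choices  */

    double *values = NULL, *weights = NULL;
    if (rank == 0) {
        values  = malloc(size * sizeof(double));
        weights = malloc(size * sizeof(double));
    }
    MPI_Gather(&my_value,  1, MPI_DOUBLE, values,  1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Gather(&my_weight, 1, MPI_DOUBLE, weights, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {                        /* head blade: weighted average */
        double num = 0.0, den = 0.0;
        for (int i = 0; i < size; i++) {
            num += weights[i] * values[i];
            den += weights[i];
        }
        printf("weighted average = %f\n", num / den);
        free(values);
        free(weights);
    }
    MPI_Finalize();
    return 0;
}

MPI_Gather keeps the head-blade logic simple; an MPI_Reduce over the weight-value products and the weights would also work and scales better with many children.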

Hybrid Programming (most "complex" version: MPI_THREAD_MULTIPLE)

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

/* Last thread of PE 0 sends its number to PE 1 */
int main(int argc, char* argv[]) {
    int provided, myPE, thread, last_thread, data = 0, tag = 0;
    MPI_Status status;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &myPE);

    #pragma omp parallel firstprivate(thread, data, tag, status)
    {
        thread = omp_get_thread_num();
        last_thread = omp_get_num_threads() - 1;

        if (thread == last_thread && myPE == 0)
            MPI_Send(&thread, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
        else if (thread == last_thread && myPE == 1)
            MPI_Recv(&data, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);

        printf("PE %d, Thread %d, Data %d\n", myPE, thread, data);
    }
    MPI_Finalize();
    return 0;
}

Output for 4 threads run on 3 PEs:

% export OMP_NUM_THREADS=4
% ibrun -np 3 a.out
PE 0, Thread 0, Data 0
PE 1, Thread 0, Data 0
PE 2, Thread 0, Data 0
PE 2, Thread 3, Data 0
PE 0, Thread 3, Data 0
PE 1, Thread 3, Data 3
PE 0, Thread 2, Data 0
PE 2, Thread 2, Data 0
PE 1, Thread 2, Data 0
PE 0, Thread 1, Data 0
PE 1, Thread 1, Data 0
PE 2, Thread 1, Data 0

Hybrid Tradeoffs
Easier:
– Scaling
– More asynchronous
– Flexible redistribution
Harder:
– Load balancing (a "Fold" action truncates its subtree early, for example)
– MPI_THREAD_MULTIPLE has potential performance penalties
– Debugging

Hybrid Performance
Comm/Comp:
– About 1 ms communication time per iteration
– About 15 ms total per iteration (so communication is under 10% of each iteration)
Time:
– 960 cores for ~2M core-hours for the man-machine competition
– ~1M core-hours to win the last (2014) machine tournament*

*Last July, a predecessor of Claudico, Tartanian7, won a Heads-up No-limit Texas Hold'em contest against other AI bots at the Association for the Advancement of Artificial Intelligence's 2014 Computer Poker Competition.

[Table, shown on two consecutive slides: Brains vs. AI per-session results, Day 1-A through Day 13 plus Total, with columns Hands, Total Hands, Doug, Dong, Bjorn, Jason, Total Per Session, and Cumulative]

Short form…
The competition ran 80,000 hands of poker with an enormous $170 million wagered in chips during play, and the humans won $732,713. In the top spot was Bjorn Li, who finished up $529,033. Doug Polk was second at +$213,671, Dong Kim finished up $70,491, and Jason Les finished down $80,482; the four results sum to the humans' collective $732,713. However, no real money was wagered; the actual prize purse of $100,000 was sponsored by Rivers Casino and Microsoft.

How human?
– Claudico is Latin for "I limp"!
– Claudico donk-bets (from "donkey").
– It will commit heavily against a little pot.
– It had the pros convinced it was learning from them every day!
"Limping is for Losers. This is the most important fundamental in poker--for every game, for every tournament, every stake: If you are the first player to voluntarily commit chips to the pot, open for a raise. Limping is inevitably a losing play. If you see a person at the table limping, you can be fairly sure he is a bad player. Bottom line: If your hand is worth playing, it is worth raising."

(Near) Future Work
– k-means on GPU
– Much bigger I/O
– Win the next machine vs. machine competition (late 2015) so we can…
– Win the next Brains vs. AI!