John Urbanic Parallel Computing Scientist Pittsburgh Supercomputing Center with a lot of slides from Tuomas Sandholm Copyright 2015 Poker.


1 Poker. John Urbanic, Parallel Computing Scientist, Pittsburgh Supercomputing Center, with a lot of slides from Tuomas Sandholm. Copyright 2015.

2 Recognized challenge problem in AI since 1992 [Billings, Schaeffer, …]
– Hidden information (other players’ cards)
– Uncertainty about future events
– Deceptive strategies needed in a good player
– Very large game trees
[Image: NBC National Heads-Up Poker Championship 2013]

3 Heads-Up Limit Texas Hold’em
– Bots surpassed pros in 2008 [U. Alberta Poker Research Group]
– “Essentially solved” in 2015 [Bowling et al.]
[Image labels: 2008, AAAI-07]

4 Heads-Up No-Limit Texas Hold’em
– Annual Computer Poker Competition
– Claudico
– Tartanian7

5 Heads-up no-limit Texas Hold’em Thanks Microsoft!

6 Texas Hold’em poker
– 2-player Limit has ~10^18 nodes
– 2-player No-Limit has ~10^165 nodes
– Losslessly abstracted game too big to solve => abstract more => lossy
[Diagram labels: Nature deals 2 cards to each player; Nature deals 3 shared cards; Nature deals 1 shared card; Round of betting]

7 [Figure: Game Tree. Rounds: PreFlop, Flop, Turn, River. Actions at P2 nodes: Bet, Call, Fold, Raise. Annotation: "22,100 possible". Payoff at any leaf.]
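
The "22,100 possible" label on the tree is not explained on the slide. It matches the number of distinct three-card combinations from a 52-card deck, C(52, 3), so it presumably counts possible flops; that reading is an assumption. A minimal sketch of the count (the function name `choose` is illustrative):

```c
#include <stdint.h>

/* Number of ways to choose k items from n. After step i the running value
   equals C(n - k + i, i), so every division is exact in integer arithmetic. */
uint64_t choose(unsigned n, unsigned k)
{
    if (k > n)
        return 0;
    if (k > n - k)
        k = n - k;                 /* use the smaller of k and n - k */
    uint64_t result = 1;
    for (unsigned i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;
    return result;
}
```

Here `choose(52, 3)` gives 22,100 and `choose(52, 2)` gives 1,326, the number of two-card starting hands.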

8 Our approach [Gilpin & Sandholm EC-06, J. of the ACM 2007…]
Now used basically by all competitive Texas Hold’em programs
Foreshadowed by Shi & Littman 01, Billings et al. IJCAI-03
[Diagram labels: Original game; Automated abstraction; Abstracted game; Custom equilibrium-finding algorithm; Nash equilibrium; Reverse model; 10^161]

9 Compute Strategy
Nash Equilibrium
– “Defensive” strategy (doesn’t try to learn or to exploit opponent flaws)
– No worse than a tie (on average over many hands)
– Neither player can hope to improve their expected utility through a unilateral strategy change
Too hard to solve completely here, so we use an approximation that will converge to this…
Counterfactual Regret Minimization (CFR)
– Builds on regret matching, introduced in 2000 (Hart and Mas-Colell); CFR itself is due to Zinkevich et al.
– Predominant approach since ~2006
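
To make "no worse than a tie" concrete, here is a small sketch that uses rock-paper-scissors purely as a stand-in for poker (the game, the payoff table and the function names are all assumptions for illustration). It computes the worst-case expected payoff of a mixed strategy by trying each pure reply by the opponent.

```c
/* Row player's payoff in rock-paper-scissors: PAYOFF[a][b] is the payoff
   for playing a against b (0 = rock, 1 = paper, 2 = scissors). */
static const double PAYOFF[3][3] = {
    { 0.0, -1.0,  1.0},
    { 1.0,  0.0, -1.0},
    {-1.0,  1.0,  0.0},
};

/* Expected payoff to the row player when both players draw their actions
   independently from the mixed strategies `row` and `col`. */
double expected_payoff(const double row[3], const double col[3])
{
    double value = 0.0;
    for (int a = 0; a < 3; ++a)
        for (int b = 0; b < 3; ++b)
            value += row[a] * col[b] * PAYOFF[a][b];
    return value;
}

/* Worst case for `row`: the opponent's best reply. A best reply can always
   be found among the pure strategies, so checking those three is enough. */
double worst_case_payoff(const double row[3])
{
    double worst = 1.0;            /* no payoff in this game exceeds 1 */
    for (int b = 0; b < 3; ++b) {
        double col[3] = {0.0, 0.0, 0.0};
        col[b] = 1.0;
        double v = expected_payoff(row, col);
        if (v < worst)
            worst = v;
    }
    return worst;
}
```

The uniform strategy (1/3, 1/3, 1/3) has a worst case of 0, a tie on average, whatever the opponent does. A lopsided strategy such as (0.5, 0.25, 0.25) can be exploited: always answering with paper costs it 0.25 per game on average.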

10 Abstraction
– Mostly about how to bin similar situations
– A spade 4-flush is kind of like a heart 4-flush
– Clustering into “buckets” (k-means in this case, though not at all the only choice)
– 169 pre-flop buckets
– 60 public flop buckets
– 500 private buckets for turn and river
– Down to 5.5 × 10^15 nodes
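
As an illustration of bucketing by k-means, here is a minimal one-dimensional sketch of Lloyd's algorithm. It assumes each hand has already been reduced to a single scalar feature (for example an estimated win probability). That feature, the function name and the interface are hypothetical, and this is not the feature set actually used in the abstraction.

```c
#include <stddef.h>

/* One-dimensional k-means (Lloyd's algorithm), k >= 1.
   values[n]  : one scalar feature per hand (hypothetical feature)
   centers[k] : in: initial centers; out: final centers
   bucket[n]  : out: index of the nearest center for each value
   Returns the number of iterations performed. */
int kmeans_1d(const double *values, size_t n,
              double *centers, size_t k,
              int *bucket, int max_iters)
{
    int iters = 0;
    for (size_t i = 0; i < n; ++i)
        bucket[i] = -1;                        /* nothing assigned yet */

    while (iters < max_iters) {
        ++iters;
        int changed = 0;

        /* Assignment step: move each value to its nearest center. */
        for (size_t i = 0; i < n; ++i) {
            size_t best = 0;
            double best_dist = (values[i] - centers[0]) * (values[i] - centers[0]);
            for (size_t c = 1; c < k; ++c) {
                double d = (values[i] - centers[c]) * (values[i] - centers[c]);
                if (d < best_dist) { best_dist = d; best = c; }
            }
            if (bucket[i] != (int)best) { bucket[i] = (int)best; changed = 1; }
        }
        if (!changed)
            break;                             /* assignments are stable */

        /* Update step: each center becomes the mean of its bucket.
           An empty bucket keeps its previous center. */
        for (size_t c = 0; c < k; ++c) {
            double sum = 0.0;
            size_t count = 0;
            for (size_t i = 0; i < n; ++i)
                if (bucket[i] == (int)c) { sum += values[i]; ++count; }
            if (count > 0)
                centers[c] = sum / (double)count;
        }
    }
    return iters;
}
```

With values {0.10, 0.12, 0.15, 0.80, 0.85, 0.90} and initial centers {0.0, 1.0}, the three weak hands land in bucket 0 and the three strong hands in bucket 1. The strategy is then computed per bucket rather than per hand.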

11 More pruning
Sampling
– Monte Carlo: sqrt(N)
Imperfect recall
– No longer conforms to original convergence criteria
– Not obvious that this is a big win
– Empirical results show that it is
Indexing scheme
– Accounts for suit isomorphisms
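
The simplest case of a suit isomorphism is the pre-flop deal: before any shared cards appear, only the two ranks and whether the cards share a suit matter, so the 1,326 two-card hands collapse into 169 classes (the same count as the 169 pre-flop buckets on the abstraction slide). The sketch below is one possible canonical index, not the deck's actual indexing scheme; the 0..51 card encoding and the function names are assumptions.

```c
/* Cards are integers 0..51 with rank = card / 4 (0 = deuce ... 12 = ace)
   and suit = card % 4. This encoding is an assumption of the sketch. */
int preflop_class(int card_a, int card_b)
{
    int ra = card_a / 4, rb = card_b / 4;
    int hi = ra > rb ? ra : rb;
    int lo = ra > rb ? rb : ra;

    if (hi == lo)
        return hi;                             /* 13 pocket pairs: 0..12 */

    int pair_index = hi * (hi - 1) / 2 + lo;   /* 0..77 over all hi > lo */
    int suited = (card_a % 4) == (card_b % 4);
    return (suited ? 13 : 13 + 78) + pair_index;  /* 78 suited, then 78 offsuit */
}

/* Enumerate all two-card hands and count the distinct classes reached. */
int count_preflop_classes(void)
{
    int seen[169] = {0};
    int distinct = 0;
    for (int a = 0; a < 52; ++a)
        for (int b = a + 1; b < 52; ++b) {
            int c = preflop_class(a, b);
            if (!seen[c]) { seen[c] = 1; ++distinct; }
        }
    return distinct;
}
```

`count_preflop_classes()` returns 169 (13 pairs + 78 suited + 78 offsuit). Ace-king in one suit maps to the same index as ace-king in any other suit, so a strategy only needs to be stored once per class.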

12 CFR
– Regret: how much better could we have done with some other action instead of this one
– Break down overall regret into regret at each step (actually each information set)
  – An information set is a set of game states that the controlling player cannot distinguish, and so must choose actions for all such states with the same distribution
  – For example, the first player to act does not know which cards the other players were dealt, so all game states immediately following the deal where the first player holds the same cards are in the same information set
– Weight this regret by the (iteratively recalculated) probability of the opponent reaching this set
– Average overall regret is less than the sum of this Immediate Counterfactual Regret
– So, if we minimize the immediate regret, we approach a Nash equilibrium
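
The per-information-set update can be sketched as two small routines. This is a minimal illustration of regret matching with a counterfactual weight, not the deck's implementation; the function names, and the assumption that the caller supplies each action's value and the opponent's reach probability, are mine.

```c
#include <stddef.h>

/* Regret matching: play each action with probability proportional to its
   positive cumulative regret; if no action has positive regret, play
   uniformly at random. */
void regret_matching(const double *cum_regret, size_t n, double *strategy)
{
    double positive_sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        if (cum_regret[i] > 0.0)
            positive_sum += cum_regret[i];

    for (size_t i = 0; i < n; ++i) {
        if (positive_sum > 0.0)
            strategy[i] = cum_regret[i] > 0.0 ? cum_regret[i] / positive_sum : 0.0;
        else
            strategy[i] = 1.0 / (double)n;
    }
}

/* Add one visit's regret at an information set: how much better each action
   would have done than the current strategy's expected value, weighted by
   the probability that the opponent's play reaches this set. */
void accumulate_regret(double *cum_regret, const double *action_value,
                       const double *strategy, size_t n, double opp_reach)
{
    double node_value = 0.0;
    for (size_t i = 0; i < n; ++i)
        node_value += strategy[i] * action_value[i];

    for (size_t i = 0; i < n; ++i)
        cum_regret[i] += opp_reach * (action_value[i] - node_value);
}
```

For example, cumulative regrets {1, 3, -2} give the strategy {0.25, 0.75, 0}. An iteration alternates the two calls at every information set it visits, and the average of the strategies over iterations is the one that approaches equilibrium.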

13 [Figure: Game Tree, as on slide 7. Rounds: PreFlop, Flop, Turn, River. Actions at P2 nodes: Bet, Call, Fold, Raise. Annotation: "22,100 possible". Payoff at any leaf.]

14 Serial?
So far we are describing a serial algorithm.
Do we need to parallelize to
– scale up tree?
– iterate to accurate solution?
History would suggest yes…

15 Scalability of (near-)equilibrium finding in 2-player 0-sum games
[Chart labels:]
– AAAI poker competition announced
– Koller & Pfeffer: using sequence form & LP (simplex)
– Billings et al.: LP (CPLEX interior point method)
– Gilpin & Sandholm: LP (CPLEX interior point method)
– Gilpin, Hoda, Peña & Sandholm: Scalable EGT
– Gilpin, Sandholm & Sørensen: Scalable EGT
– Zinkevich et al.: Counterfactual regret

16 Blacklight
Obvious starting point
– Fairly easy threaded code (OpenMP)
– World’s largest shared memory machine
– NUMA

17 [Figure: top of the game tree (PreFlop; actions Bet, Call, Fold, Raise at P2 nodes; annotation "22,100 possible"), labeled "OpenMP"]

18 [Figure: top of the game tree (PreFlop; actions Bet, Call, Fold, Raise at P2 nodes; annotation "22,100 possible"), with six subtrees each labeled "MPI (OpenMP)"]

19 Hybrid Algorithm
1. Start on head node for pre-flop
2. Send the current state to each child blade
3. Each child blade then samples from its bucket and continues the iteration of MCCFR. Within each child blade we use multiple cores. Whenever a child cluster is reached, each core is given the same inputs but uses a different random number seed to select which sample to work on
4. Once all the child blades complete their part of the iteration, their calculated values are returned to the head blade
5. The head blade calculates a weighted average of these values, weighting them by the number of choices
6. The head node then continues its iteration of MCCFR, repeating the process whenever the sample exits the top part, until the iteration is complete
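
Step 5 can be sketched as a plain weighted mean on the head blade. The slide does not define "number of choices" precisely, so the sketch treats it as an opaque non-negative weight per child blade; the function name and signature are hypothetical, and the communication that gathers the values is left out.

```c
#include <stddef.h>

/* Combine the values returned by the child blades: the mean of
   child_value[i] weighted by choices[i]. Returns 0.0 if every weight is
   zero, so the caller never divides by zero. */
double weighted_average(const double *child_value,
                        const unsigned long *choices, size_t n)
{
    double weighted_sum = 0.0;
    double total_weight = 0.0;
    for (size_t i = 0; i < n; ++i) {
        weighted_sum += (double)choices[i] * child_value[i];
        total_weight += (double)choices[i];
    }
    return total_weight > 0.0 ? weighted_sum / total_weight : 0.0;
}
```

For two blades returning 1.0 and 3.0 with weights 1 and 3, the head blade would use (1·1.0 + 3·3.0) / 4 = 2.5.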

20 Hybrid Programming (Most “complex” version: MPI_THREAD_MULTIPLE)

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    // Last thread of PE 0 sends its number to PE 1
    int main(int argc, char* argv[]){
        int provided, myPE, thread, last_thread, data=0, tag=0;
        MPI_Status status;

        // Request full thread support: any thread may make MPI calls
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &myPE);

        // thread, last_thread and status are per-thread scratch variables;
        // data and tag keep their initial values in each thread's copy
        #pragma omp parallel private(thread, last_thread, status) firstprivate(data, tag)
        {
            thread      = omp_get_thread_num();
            last_thread = omp_get_num_threads()-1;

            if ( thread==last_thread && myPE==0 )
                MPI_Send(&thread, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
            else if ( thread==last_thread && myPE==1 )
                MPI_Recv(&data, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);

            printf("PE %d, Thread %d, Data %d\n", myPE, thread, data);
        }

        MPI_Finalize();
        return 0;
    }

Output for 4 threads run on 3 PEs:

    % export OMP_NUM_THREADS=4
    % ibrun -np 3 a.out
    PE 0, Thread 0, Data 0
    PE 1, Thread 0, Data 0
    PE 2, Thread 0, Data 0
    PE 2, Thread 3, Data 0
    PE 0, Thread 3, Data 0
    PE 1, Thread 3, Data 3
    PE 0, Thread 2, Data 0
    PE 2, Thread 2, Data 0
    PE 1, Thread 2, Data 0
    PE 0, Thread 1, Data 0
    PE 1, Thread 1, Data 0
    PE 2, Thread 1, Data 0

21 Hybrid Tradeoffs
Easier
– Scaling
– More asynchronous
– Flexible redistribution
Harder
– Load balancing (a “Fold” action truncates earlier, for example)
– MPI_THREAD_MULTIPLE has potential performance penalties
– Debugging

22 Hybrid Performance
Comm/Comp
– About 1 ms comm time per iteration
– About 15 ms per iteration
Time
– 960 cores for ~2M core hours for man-machine
– ~1M to win last (2014) machine tournament*
*Last July, a predecessor of Claudico, Tartanian7, won a Heads-up No-limit Texas Hold'em contest against other AI bots at the Association for the Advancement of Artificial Intelligence's 2014 Computer Poker Competition.

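23 Results by session (the human players' results; negative means a loss to Claudico)

| Session    | Hands  | Total Hands | Doug     | Dong    | Bjorn   | Jason    | Total Per Session | Cumulative |
|------------|--------|-------------|----------|---------|---------|----------|-------------------|------------|
| Day 1 - A  | 750    | 3,000       | 60,202   | 40,949  | 13,888  | -17,502  | 97,537            | 97,537     |
| Day 1 - B  | 750    | 6,000       | 55,709   | 30,767  | 18,602  | -53,911  | 51,167            | 148,704    |
| Day 2 - A  | 750    | 9,000       | -1,919   | -26,134 | 63,329  | -42,219  | -6,943            | 141,761    |
| Day 2 - B  | 750    | 12,000      | -44,951  | -12,870 | -26,969 | 2,795    | -81,995           | 59,766     |
| Day 3 - A  | 375    | 13,500      | 45,281   | -73,850 | 42,673  | -6,791   | 7,313             | 67,079     |
| Day 3 - B  | 750    | 16,500      | 30,741   | -564    | 69,380  | -70      | 99,487            | 166,566    |
| Day 4 - A  | 800    | 19,700      | 44,415   | -42,114 | 46,889  | -30,037  | 19,153            | 185,719    |
| Day 4 - B  | 800    | 22,900      | 110,779  | 12,812  | 48,376  | -29,379  | 142,588           | 328,307    |
| Day 5 - A  | 800    | 26,100      | -50,576  | 42,611  | -94,384 | -23,635  | -125,984          | 202,323    |
| Day 5 - B  | 800    | 29,300      | 14,254   | 18,611  | 2,805   | 30,282   | 65,952            | 268,275    |
| Day 6 - A  | 800    | 32,500      | 111,298  | 16,245  | 36,609  | -62,914  | 101,238           | 369,513    |
| Day 6 - B  | 800    | 35,700      | 25,135   | 7,658   | 51,281  | 5,315    | 89,389            | 458,902    |
| Day 7 - A  | 800    | 38,900      | 53,791   | 18,539  | 101,868 | 12,160   | 186,358           | 645,260    |
| Day 7 - B  | 800    | 42,100      | -97,407  | 52,058  | -29,115 | -2,134   | -76,598           | 568,662    |
| Day 8 - A  | 800    | 45,300      | -94,263  | 75,683  | -40,318 | 93,903   | 35,005            | 603,667    |
| Day 8 - B  | 800    | 48,500      | 12,401   | -46,700 | 258     | 17,605   | -16,436           | 587,231    |
| Day 9 - A  | 800    | 51,700      | -27,579  | 76,124  | 15,050  | 35,124   | 98,719            | 685,950    |
| Day 9 - B  | 800    | 54,900      | 75,144   | -50,834 | -372    | -118,880 | -94,942           | 591,008    |
| Day 10 - A | 800    | 58,100      | 25,100   | -39,753 | 66,286  | -77,217  | -25,584           | 565,424    |
| Day 10 - B | 800    | 61,300      | -19,013  | 62,850  | 53,049  | 19,515   | 116,401           | 681,825    |
| Day 11 - A | 800    | 64,500      | -37,971  | -54,307 | 101,999 | 54,347   | 64,068            | 745,893    |
| Day 11 - B | 800    | 67,700      | -106,029 | 21,492  | -74,711 | 87,269   | -71,979           | 673,914    |
| Day 12 - A | 800    | 70,900      | -82,014  | -5,597  | 69,412  | 48,321   | 30,122            | 704,036    |
| Day 12 - B | 800    | 74,100      | 118,201  | -18,553 | -76,060 | 9,600    | 33,188            | 737,224    |
| Day 13 - A | 1,275  | 79,200      | -30,448  | -46,081 | 73,651  | -9,781   | -12,659           | 724,565    |
| Day 14     | 200    | 80,000      | 23,390   | 11,449  | -4,443  | -22,248  | 8,148             | 732,713    |
| Total      | 20,000 | 80,000      | 213,671  | 70,491  | 529,033 | -80,482  | 732,713           | 732,713    |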

24 [Results table repeated from slide 23]

25 Short form…
The competition ran for 80,000 hands of poker, with $170 million wagered during play, and the humans won by $732,713 overall. Bjorn Li finished on top, up $529,033. Doug Polk was second, up $213,671, and Dong Kim was up $70,491, while Jason Les finished down $80,482. However, no real money was involved in the poker competition. The actual prize money was $100,000, sponsored by Rivers Casino and Microsoft.

26 How human?
– Claudico is Latin for “I limp”!
– Claudico makes donk (from “donkey”) bets.
– Will commit heavily against a small pot.
– Had pros convinced it was learning from them every day!
"Limping is for Losers. This is the most important fundamental in poker, for every game, for every tournament, every stake: If you are the first player to voluntarily commit chips to the pot, open for a raise. Limping is inevitably a losing play. If you see a person at the table limping, you can be fairly sure he is a bad player. Bottom line: If your hand is worth playing, it is worth raising."

27 (Near) Future Work
– k-means on GPU
– Much bigger IO
– Win next machine-machine comp (late 2015) so we can…
– Win next Brains vs. AI!

