Published by Prosper Cummings. Modified over 9 years ago.
1
Poker and AI
How the most “stable” creature on earth got used to that good old game from the west!
2
A game of (p)luck!
Cards:
– 2 blind (face-down) hole cards per player
– Flop: 3 community cards
– Turn: 1 more community card
– River: 1 last community card
Betting rounds after every card deal/flip: Fold OR Call (Check) OR Raise (Bet)
Showdown, if you get there
3
Poker as a non-trivial act of intelligence
“Phil used my knowledge of Phil against me.” (Mike Matusow, on Phil Hellmuth)
4
Ain’t this an AI seminar?
Games have always held an allure for AI theoreticians.
Poker is a game of incomplete information.
Several successful implementations: BluffBot (Teppo Salonen), Polaris (Univ. of Alberta), Poki, Casper… we will see some.
AAAI Annual Poker Competition: http://www.cs.ualberta.ca/~pokert/
5
The essence of Poker
Hand Strength & Hand Potential: assess the strength of the current hand, taking into account:
– Cards in game
– Number of players in the game
– Position of the player
– History
– Draws
– Risks
6
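The essence of Poker
Pot Odds
– Pot odds are the ratio of the bet to the total pot; they are compared with the odds of winning.
– Example: If the cards in hand are A(H)-A(D) and the cards on board are A(C)-2-3-7-?, then 10 of the 46 unseen cards (the last ace, or any 2, 3 or 7) give a very strong hand (full house or better) on the river: odds of 10:36, i.e. 5:18.
– The pot odds for a $10 bet on a $40 pot are 1:4, while on a $10 pot they are 1:1.
– The first is favorable, the second is not: a win probability of 10/46 ≈ 0.22 beats the break-even 1/5 of the first bet but not the 1/2 of the second.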
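The pot-odds comparison can be sketched in a few lines. This is a minimal illustration, not the presenters' code: the function names are made up here, and the win probability of 1/4 in the usage lines is a hypothetical value chosen only to show one favorable and one unfavorable call.

```python
from fractions import Fraction


def break_even_probability(bet, pot):
    """Win probability at which calling `bet` into `pot` neither gains nor loses.

    Pot odds of bet:pot (e.g. 1:4) correspond to bet / (pot + bet).
    """
    return Fraction(bet, pot + bet)


def call_is_favorable(win_probability, bet, pot):
    """A call is favorable when we win more often than the break-even rate."""
    return win_probability > break_even_probability(bet, pot)


# Hypothetical win probability, used only for illustration.
p_win = Fraction(1, 4)
print(break_even_probability(10, 40))    # pot odds 1:4 -> 1/5
print(break_even_probability(10, 10))    # pot odds 1:1 -> 1/2
print(call_is_favorable(p_win, 10, 40))  # True
print(call_is_favorable(p_win, 10, 10))  # False
```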
7
The essence of Poker
Bluffing & Unpredictability
– Different strategies in similar situations
– An element of non-determinism
Opponent Modeling
– Used to guess the opponents’ cards based on history
8
LOKI & POKI A look at how The Experts do it!
9
Encoding the Problem
Probability triples – simplicity itself: $Pr := (f, c, r)$
“Marvin thinks for an eon and comes up with the three magic numbers to make tea!”
The output of all analysis at any game point is the probability with which Poki folds, calls or raises.
The final decision is non-deterministic, which adds natural noise.
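Turning a probability triple into a move is a single weighted random draw. The sketch below assumes the triple is ordered (fold, call, raise) as on the slide; the function name and the validation step are illustrative additions, not part of Poki.

```python
import random

ACTIONS = ("fold", "call", "raise")


def choose_action(triple, rng=random):
    """Draw one action at random, weighted by the (f, c, r) triple."""
    f, c, r = triple
    if min(triple) < 0 or abs(f + c + r - 1.0) > 1e-9:
        raise ValueError("triple must be non-negative and sum to 1")
    return rng.choices(ACTIONS, weights=triple, k=1)[0]


# The same triple can yield different moves on different calls,
# which is the "natural noise" mentioned above.
rng = random.Random(42)
sample = [choose_action((0.1, 0.6, 0.3), rng) for _ in range(1000)]
print({action: sample.count(action) for action in ACTIONS})
```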
10
Building the system
Pre-flop strategies: an almost zero-information guess!
How do humans start? Sklansky’s rankings:
– Starting hands are collected into 8 groups of similar cards (as far as poker is concerned), in decreasing order of strength
– Tuned for 10-player games, without considering opponent characteristics
A rule-based system is built on this information.
11
Man as a hand-wavy standard
Moving away from external information:
– Eliminate the use of human knowledge whenever possible
– Calculated information can be quantitative rather than qualitative
– The algorithmic approach can be applied to many different specific situations (such as having exactly six players in the game)
12
Rebuilding the system
Roll-out simulations
– Pre-flop, the blinds are called by all players, who then check until the showdown. The probability of winning with a given pair of hole cards then gives its income rate.
– This estimate is coarse.
Iterated roll-out simulations
– The income rates from the first simulation decide whether a player calls or folds pre-flop.
– Over repeated iterations the value stabilizes.
13
[Diagram: Hand Strength, Hand Potential → Effective Hand Strength → (Think!) → Probability Triple → Random Number Generator]
14
Hand Strength
Hand Strength is the probability that a given hand is better than that of an active opponent.
– How? Enumerate all possible opponent hands given the cards we can see, and count those that are better than, equal to, or worse than ours.
Extrapolate to n opponents by raising the single-opponent probability to the power n: $HS_n = (HS_1)^n$
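Given the better / equal / worse counts, hand strength reduces to simple arithmetic. The sketch below takes the counts as inputs (producing them needs a hand evaluator, which is omitted). Counting a tie as half a win is an assumption made here; the slide does not say how ties are scored.

```python
def hand_strength(ahead, tied, behind):
    """HS against one opponent from enumeration counts.

    ahead/tied/behind: number of opponent holdings we beat, tie with, lose to.
    Ties are scored as half a win (an assumption, see lead-in).
    """
    total = ahead + tied + behind
    if total == 0:
        raise ValueError("no opponent holdings were enumerated")
    return (ahead + tied / 2) / total


def hand_strength_vs(n_opponents, hs_one):
    """HS_n = (HS_1)^n: treat the n opponents as independent."""
    return hs_one ** n_opponents


hs1 = hand_strength(ahead=600, tied=100, behind=300)
print(hs1)                       # 0.65
print(hand_strength_vs(3, hs1))  # roughly 0.27 against three opponents
```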
15
Hand Potential
Positive Potential: of all possible games with the current hand, the fraction of scenarios where Poki is behind but ends up winning.
Negative Potential: of all possible games with the current hand, the fraction of scenarios where Poki is ahead but ends up losing.
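Both potentials can be read off a table of transition counts. In this sketch the table maps (state now, state at showdown) to a count, a hypothetical data layout chosen for illustration. Ties are ignored for simplicity, so it follows the two definitions above rather than any particular published formula.

```python
def hand_potential(transitions):
    """Return (positive_potential, negative_potential).

    transitions: dict mapping (now, at_showdown) -> count, where each state
    is one of "ahead", "tied", "behind".
    """
    def total(now):
        return sum(n for (state, _), n in transitions.items() if state == now)

    behind, ahead = total("behind"), total("ahead")
    # Fraction of "currently behind" scenarios that end up ahead.
    ppot = transitions.get(("behind", "ahead"), 0) / behind if behind else 0.0
    # Fraction of "currently ahead" scenarios that end up behind.
    npot = transitions.get(("ahead", "behind"), 0) / ahead if ahead else 0.0
    return ppot, npot


counts = {
    ("behind", "ahead"): 20, ("behind", "behind"): 80,
    ("ahead", "ahead"): 90, ("ahead", "behind"): 10,
}
print(hand_potential(counts))  # (0.2, 0.1)
```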
16
[Diagram: Hand Strength, Hand Potential → Effective Hand Strength → Probability Triple → Random Number Generator]
17
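Effective Hand Strength
$$\begin{aligned}
\Pr(\text{win}) &= \Pr(\text{ahead}) \times \Pr(\text{opponent does not improve}) + \Pr(\text{behind}) \times \Pr(\text{we improve}) \\
&= HS \times (1 - N_{Pot}) + (1 - HS) \times P_{Pot}
\end{aligned}$$
Taking $N_{Pot} = 0$ gives the effective hand strength:
$$EHS = HS + (1 - HS) \times P_{Pot}$$
$$EHS = HS_n + (1 - HS_n) \times P_{Pot} \quad \text{(multiple opponents)}$$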
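The formulas on this slide translate directly into code. The sketch keeps both forms: the full win probability with the negative-potential term, and the effective hand strength, which drops that term. Function and parameter names are illustrative.

```python
def win_probability(hs, ppot, npot):
    """Pr(win) = HS * (1 - NPot) + (1 - HS) * PPot."""
    return hs * (1 - npot) + (1 - hs) * ppot


def effective_hand_strength(hs, ppot, n_opponents=1):
    """EHS = HS_n + (1 - HS_n) * PPot, with HS_n = HS ** n_opponents."""
    hs_n = hs ** n_opponents
    return hs_n + (1 - hs_n) * ppot


print(effective_hand_strength(0.6, 0.2))                 # about 0.68
print(effective_hand_strength(0.6, 0.2, n_opponents=3))  # lower: more opponents
```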
18
[Diagram: Hand Strength, Hand Potential → Effective Hand Strength → Probability Triple → Random Number Generator]
19
Adding Sophistication
Not all card pairs are equally likely at a given point in time.
Maintain, for each opponent, a weighting table that stores the probability of each card pair the opponent may be holding at the given point in the game, depending on history.
Re-weighting: update this table on every move.
$EHS_i = HS_i + (1 - HS_i) \times P_{Pot,i}$
20
Adding Sophistication
Pre-flop: use the ranking developed earlier to guess initial weights for each player.
Post-flop:
– The number of possible games is too large
– Weight the EHS values instead
– Estimate the mean threshold required for the player to call/raise
– Estimate the variance (uncertainty) in this value
Use these values to re-weight the table.
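A weighting table can be sketched as a dictionary from opponent holdings to weights. The multiplicative update and the `action_likelihood` input below are assumptions made for illustration; the slides do not spell out how Poki derives the re-weighting factors from the mean and variance.

```python
def reweight(weights, action_likelihood):
    """Scale each holding's weight by how likely the observed action is
    for an opponent with that holding (hypothetical multiplicative update)."""
    return {h: w * action_likelihood.get(h, 1.0) for h, w in weights.items()}


def weighted_hand_strength(weights, outcome):
    """Hand strength where each opponent holding counts by its weight.

    outcome[h] is 1.0 if we beat holding h, 0.5 for a tie, 0.0 if we lose.
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all holdings have zero weight")
    return sum(w * outcome[h] for h, w in weights.items()) / total


# Toy table with two possible opponent holdings, initially equally likely.
weights = {"AA": 1.0, "72": 1.0}
outcome = {"AA": 0.0, "72": 1.0}  # we lose to AA, beat 72
print(weighted_hand_strength(weights, outcome))  # 0.5

# The opponent raises; a raise is far more likely with AA than with 72.
weights = reweight(weights, {"AA": 0.9, "72": 0.1})
print(weighted_hand_strength(weights, outcome))  # about 0.1
```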
21
“No poker strategy is complete without a good opponent modeling system”
A neural net trained for each opponent is fed 19 game characteristics as inputs and outputs a probability triple (fold, call, bet) for the opponent’s next action.
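The shape of such a network can be sketched as a forward pass. Only the 19 inputs and the 3 outputs come from the slide. The single hidden layer of 4 tanh units, the softmax output and the random, untrained weights are all assumptions for illustration.

```python
import math
import random


def softmax(zs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]


def predict_triple(features, w_hidden, w_out):
    """Forward pass: inputs -> tanh hidden layer -> softmax (fold, call, bet)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)))
              for row in w_hidden]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return softmax(logits)


rng = random.Random(0)
N_INPUTS, N_HIDDEN, N_OUTPUTS = 19, 4, 3  # hidden size is hypothetical
w_hidden = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
w_out = [[rng.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in range(N_OUTPUTS)]
features = [rng.random() for _ in range(N_INPUTS)]  # stand-in game characteristics

triple = predict_triple(features, w_hidden, w_out)
print(triple)  # three probabilities for fold, call, bet
```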
22
There are other ways to make money
23
CASe based Poker playER
– Stores a large case base obtained through the simulation of other bots (Loki/Poki)
– For a particular situation, calculates a similarity value for each case and sorts the cases (quick sort)
– Takes the cases above a similarity threshold of 97%, or the top 20 (whichever is applicable)
– Finds the probability triple (f, c, r), i.e. the frequency of the various decisions taken in these cases
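The retrieval step might look like the sketch below. The similarity measure (mean per-feature closeness on features scaled to [0, 1]) is a hypothetical stand-in. Reading “whichever is applicable” as “use the top 20 when no case reaches 97%” is also an assumption.

```python
from collections import Counter

ACTIONS = ("fold", "call", "raise")


def similarity(a, b):
    """Hypothetical similarity in [0, 1] for feature tuples scaled to [0, 1]."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def retrieve(query, case_base, threshold=0.97, top_k=20):
    """Rank cases by similarity; keep those above the threshold, else the top k."""
    scored = sorted(((similarity(query, feats), action) for feats, action in case_base),
                    key=lambda item: item[0], reverse=True)
    close = [item for item in scored if item[0] >= threshold]
    return close if close else scored[:top_k]


def decision_triple(cases):
    """Frequency of fold / call / raise among the retrieved cases."""
    if not cases:
        raise ValueError("no cases retrieved")
    counts = Counter(action for _, action in cases)
    return tuple(counts[a] / len(cases) for a in ACTIONS)


case_base = [((0.9, 0.5), "raise"), ((0.9, 0.5), "raise"), ((0.1, 0.1), "fold")]
print(decision_triple(retrieve((0.9, 0.5), case_base)))  # (0.0, 0.0, 1.0)
```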
24
CASe based Poker playER
Performs well against other bots and against real opponents in play-money games.
Testing in real-money games was expensive!!
Reasons given for this:
– Insufficient real-money cases
– Different strategies adopted by people when real money is at stake
25
Evolving Adaptive Play
[Diagram: playing styles on two axes, Loose–Tight and Passive–Aggressive, with the point where evolution starts marked]
A particular human trait is represented by a matrix that stores information such as the probability tuple for various game situations.
26
Evolution
– Matrices for the new generation are formed by randomizing or swapping some values in the current matrices.
– The most promising matrices are selected through repeated game play.
– The final set of matrices corresponds to the best solution in the current playing environment.
– The process can adapt to any change in the strategy of the other players.
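The mutate-and-select loop can be sketched as follows. Each candidate is a small matrix with one (fold, call, raise) row per game situation. The toy fitness (closeness to a made-up target matrix) is purely a stand-in for the winnings measured over many hands, and the population size and mutation step are arbitrary choices.

```python
import random


def normalize(row):
    total = sum(row)
    return [x / total for x in row]


def mutate(matrix, rng, step=0.1):
    """Either swap two rows or nudge one entry, then renormalize that row."""
    child = [row[:] for row in matrix]
    if len(child) > 1 and rng.random() < 0.5:
        i, j = rng.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    else:
        i, k = rng.randrange(len(child)), rng.randrange(3)
        child[i][k] = max(1e-6, child[i][k] + rng.uniform(-step, step))
        child[i] = normalize(child[i])
    return child


def evolve(population, fitness, generations, rng, survivors=4):
    """Keep the fittest matrices each generation and refill with mutants."""
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[:survivors]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(len(population) - survivors)]
        population = parents + children
    return max(population, key=fitness)


# Toy stand-in for "performance over many games" (hypothetical target).
TARGET = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]


def toy_fitness(matrix):
    return -sum((a - b) ** 2 for row, t in zip(matrix, TARGET) for a, b in zip(row, t))


rng = random.Random(1)
initial = [[normalize([rng.random() for _ in range(3)]) for _ in range(2)]
           for _ in range(8)]
best = evolve(initial, toy_fitness, generations=200, rng=rng)
print(toy_fitness(best))
```

Because the parents survive unchanged into the next generation, the best fitness never decreases from one generation to the next.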
27
Evolution: Martians can’t exist on Earth
$W_{\text{tight}}(A_{\text{tight}}) > W_{\text{tight}}(A)$
$W_{\text{loose}}(A_{\text{loose}}) > W_{\text{loose}}(A)$
$W_{\text{tight}}(A_{\text{tight}}) > W_{\text{tight}}(A_{\text{loose}})$
$W_{\text{loose}}(A_{\text{loose}}) > W_{\text{loose}}(A_{\text{tight}})$
$W_x$: performance in environment ‘x’
$A_y$: program developed in environment ‘y’
Human traits are generally not fixed, and their domain is not so small.
28
Stereotypes
– People play with certain “prejudiced” strategies.
– Extensive statistics are collected to identify possible stereotypes.
– Early in a game, lack of data hampers effective opponent modeling: use stereotypes.
– Extend the idea to the whole game.
Stereotypes are the game-play styles adopted by different people, recorded by watching a large number of games.
29
A Façade
– A façade is used to match the decisions taken by the player at each betting round.
– The stereotype with the least mean square deviation is chosen as the match.
– The matched stereotype is then used to predict the player’s future actions.
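Matching by least mean square deviation can be sketched as below. The stereotype names and their per-round (fold, call, raise) frequencies are invented for illustration. Representing a player as one frequency triple per betting round is also an assumption.

```python
def mean_square_deviation(observed, stereotype):
    """Mean squared difference between two lists of per-round triples."""
    diffs = [(o - s) ** 2
             for obs_round, st_round in zip(observed, stereotype)
             for o, s in zip(obs_round, st_round)]
    return sum(diffs) / len(diffs)


def match_stereotype(observed, stereotypes):
    """Return the name of the stereotype closest to the observed play."""
    return min(stereotypes,
               key=lambda name: mean_square_deviation(observed, stereotypes[name]))


# Hypothetical stereotypes: one (fold, call, raise) triple per betting round
# (pre-flop, flop, turn, river).
STEREOTYPES = {
    "tight-passive": [(0.7, 0.25, 0.05)] * 4,
    "loose-aggressive": [(0.2, 0.3, 0.5)] * 4,
}

observed = [(0.6, 0.3, 0.1), (0.7, 0.2, 0.1), (0.8, 0.2, 0.0), (0.65, 0.3, 0.05)]
print(match_stereotype(observed, STEREOTYPES))  # tight-passive
```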
30
Poker and Game Theory
How do we find the “optimal” strategy in a game of imperfect information such as poker?
31
Applications of Game Theory
Game theory mathematically captures behavior in strategic situations, in which an individual's success in making choices depends on the choices of others.
In an equilibrium, each player of the game has adopted a strategy that they are unlikely to change.
Example: Nash Equilibrium applied to Climate Change Models
32
A One Card Poker
Players: Opener, Dealer
Cards: Ace (lowest), Deuce, Trey (highest)
How is the game played?
33
A One Card Poker (Opener vs. Dealer)
1. The dealer deals one card to each player.
2. Each player puts $100 into the pot as an ante.
3. Each player checks or bets depending on how the other player plays!!
34
One card poker – decision tree
The tree goes to a maximum depth of 3. The opener has the first choice.
Opener bets
  Dealer calls => Showdown
  Dealer folds => Opener wins
Opener checks
  Dealer checks => Showdown
  Dealer bets
    Opener calls => Showdown
    Opener folds => Dealer wins
35
A One Card Poker – typical situation
[Diagram: one player holds the deuce and hears the other say “I bet!!”. What to do? Is he bluffing?]
36
Assumption: Obvious Plays and Stupid Mistakes
Both players are assumed never to make these stupid mistakes:
1. Folding the trey (3)
2. Calling with the ace
3. Checking with the trey “in position”
4. Betting with the deuce
37
Strategic Plays and Expected Value
Consider the following variables:
– p1 = probability the opener bluffs with the ace
– p2 = probability the opener calls with the deuce
– p3 = probability the opener bets with the trey
– q1 = probability the dealer bluffs with the ace
– q2 = probability the dealer calls with the deuce
38
Opener’s post-ante expected value
There are three possible non-zero post-ante results for the opener: he loses $100, wins $200, or wins $300. We begin by computing the probability of each outcome.
Case 1: The opener has the ace, the dealer has the deuce (2)
  P(-$100) = p1q2, P($200) = p1(1 - q2), P($300) = 0
Case 2: The opener has the ace, the dealer has the trey (3)
  P(-$100) = p1, P($200) = P($300) = 0
39
Opener’s post-ante expected value
Case 3: The opener has the deuce (2), the dealer has the ace
  P(-$100) = 0, P($200) = 1 - q1, P($300) = q1p2
Case 4: The opener has the deuce (2), the dealer has the trey (3)
  P(-$100) = p2, P($200) = P($300) = 0
Case 5: The opener has the trey (3), the dealer has the ace
  P(-$100) = 0, P($200) = 1 - (1 - p3)q1, P($300) = (1 - p3)q1
Case 6: The opener has the trey (3), the dealer has the deuce (2)
  P(-$100) = 0, P($200) = 1 - p3q2, P($300) = p3q2
40
Game Theoretic Analysis
The opener’s total expected value for the entire hand, in units of the $100 ante, is:
EV = [q1(3p2 − p3 − 1) + q2(p3 − 3p1) + (p1 − p2)] / 6
If q1 = q2 = 1/3, then EV = −1/18, and this does not depend on the opener’s choices of p1, p2 and p3.
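The closed form can be checked against the six cases with exact fractions. The sketch below works in units of the $100 ante and assumes the six deals are equally likely. It averages the post-ante results and subtracts the one-unit ante, which reproduces the expression on this slide. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def opener_ev(p1, p2, p3, q1, q2):
    """Opener's EV for the whole hand, built from the six equally likely deals."""
    cases = [  # (P(lose 1), P(win 2), P(win 3)) in units of the ante
        (p1 * q2, p1 * (1 - q2), 0),              # 1: ace vs deuce
        (p1, 0, 0),                               # 2: ace vs trey
        (0, 1 - q1, q1 * p2),                     # 3: deuce vs ace
        (p2, 0, 0),                               # 4: deuce vs trey
        (0, 1 - (1 - p3) * q1, (1 - p3) * q1),    # 5: trey vs ace
        (0, 1 - p3 * q2, p3 * q2),                # 6: trey vs deuce
    ]
    total = sum(-lose + 2 * win2 + 3 * win3 for lose, win2, win3 in cases)
    post_ante = Fraction(total) / 6
    return post_ante - 1  # subtract the ante already paid into the pot


def closed_form(p1, p2, p3, q1, q2):
    """The expression from the slide."""
    return (q1 * (3 * p2 - p3 - 1) + q2 * (p3 - 3 * p1) + (p1 - p2)) / 6


third = Fraction(1, 3)
grid = [Fraction(0), Fraction(1, 2), Fraction(1)]
for p1, p2, p3 in product(grid, repeat=3):
    # With q1 = q2 = 1/3 the opener's choices make no difference.
    assert opener_ev(p1, p2, p3, third, third) == Fraction(-1, 18)
print(opener_ev(Fraction(1, 2), Fraction(1, 2), Fraction(1, 2), third, third))  # -1/18
```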
41
Optimal strategy: Game Theoretic Analysis
The dealer has an advantage in the game: the opener’s expected value is negative.
The only way for the dealer to prevent the opener from being able to seize back some of this advantage is to play the indifferent strategy, q1 = q2 = 1/3.
It is for this reason that the indifferent strategy is more commonly referred to as the “optimal” strategy.
42
Game Theory – How to win? You cannot win with the optimal strategy, but you can make sure you don’t lose.
43
Game Theory – How to win? So the object of the game is not to play optimally. It is to spot the times when your opponent is not playing optimally, or even to induce him not to play optimally, to recognize the way in which he is deviating from optimality, and then to choose a non-optimal strategy for yourself which capitalizes on his mistakes. You must play non-optimally in order to win. To capitalize on your opponent’s mistakes, you must play in a way that leaves you vulnerable.
44
Game Theory – to the other games
            | Perfect Information   | Imperfect Information
No chance   | Chess, Go             | Inspection Game, Battleships
Chance      | Backgammon, Monopoly  | Poker
Interesting finds in Game Theoretical Poker Research:
– Gautam Rao, a poker expert, said about PsOpti: “You have a very strong program. Once you add opponent modeling to it, it will kill everyone.”
– In poker, knowing the basic approach of the opponent is essential, since it will dictate how to properly handle many situations that arise.
– Some players wrongly attributed intelligence where none was present.
45
References
– Billings, Davidson, Schaeffer, Szafron; The challenge of poker, 2002
– Billings, Davidson, Schaeffer, Szafron; Opponent modeling in poker, 1998
– Luigi Barone, Lyndon While; Evolving adaptive play for simplified poker, 1998
– Watson and Rubin; Case Based Poker Bot, 2008
– Layton, Vamplew, Turville; Using stereotypes to improve early match poker play, 2008
– Jason Swanson; Game Theory and Poker, 2005
– D. Billings, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron; Approximating Game-Theoretic Optimal Strategies for Full-scale Poker