Intelligence for Games and Puzzles

Poker: Opponent Modelling

Early AI work on poker used simplified variants of the game. More recently, attention has focused mainly on “Limit Texas Hold’em”, in both its “heads-up” form (only two players) and its many-player form (often 10).

Texas Hold’em is a popular form of poker in the USA. As in all forms of poker, betting is an essential element. Texas Hold’em offers four opportunities per hand for a round of betting.

In Limit Texas Hold’em there are two sizes of bet increment:
- the small bet, say $2
- the large bet, say $4

In No-Limit Texas Hold’em, players may bet any amount up to all the chips they have on the table. (A cap at the current size of the pot is the rule in the Pot-Limit variant.)
The structure of a hand of Texas Hold’em

A hand (if played to the bitter end) proceeds through nine stages:
1. Dealer gives each player two cards, the “hole cards”, face-down; each player may see only their own cards
2. “Preflop”: round of (small) betting, started by the “blind”
3. “Flop”: dealer lays three cards face-up
4. Round of (small) betting
5. “Turn”: dealer lays a fourth card face-up
6. Round of (big) betting
7. “River”: dealer lays a fifth card face-up
8. Round of (big) betting
9. “Showdown”: players still in the game show their cards to determine the winner

The winner is the player who makes the best 5-card poker hand using a combination of their hole cards and the community cards (the board).
Decisions to be made

In a round of betting, players have to choose one of five actions repeatedly, starting with the player to the dealer’s left and proceeding clockwise:
- Bet: If nobody has yet bet in the current round, a player may add the appropriate-size (small or large) bet to the pot.
- Check: If nobody has yet bet in the current round, a player may do nothing.
- Call: If someone has put more into the pot in the current round, a player may add just enough to make their own contribution equal.
- Raise: If someone has put more into the pot in the current round, a player may add enough to make their own contribution equal and then add the appropriate (small or large) bet on top. In the Limit version, there is a maximum of 3 raises per round.
- Fold: A player may withdraw from the hand, forfeiting any bets and raises already put in the pot, and excluding themselves from further betting.

Limit games require no decision about the amount of bets and raises. No-limit games require more complex reasoning because bets may vary in size.
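The rules for which of the five actions are available can be sketched as a small function. This is a minimal illustration only: the names `Action` and `legal_actions` are made up for this example, and the sketch ignores the blinds (for instance, the big blind's option to raise preflop).

```python
from __future__ import annotations
from enum import Enum


class Action(Enum):
    BET = "bet"
    CHECK = "check"
    CALL = "call"
    RAISE = "raise"
    FOLD = "fold"


def legal_actions(amount_to_call: int, raises_this_round: int,
                  max_raises: int = 3) -> set[Action]:
    """Return the actions open to a player in a Limit betting round.

    amount_to_call: how much more the player must add to match the
        largest contribution in this round (0 if nobody has bet yet).
    raises_this_round: raises already made in this round.
    max_raises: the Limit cap on raises (3, as stated on the slide).
    """
    actions = {Action.FOLD}  # folding is always possible
    if amount_to_call == 0:
        # Nobody has bet yet: the player may bet or check.
        actions |= {Action.BET, Action.CHECK}
    else:
        # Someone has put more in: the player may call, and may raise
        # only while the cap has not been reached.
        actions.add(Action.CALL)
        if raises_this_round < max_raises:
            actions.add(Action.RAISE)
    return actions
```

For example, `legal_actions(2, 3)` offers only call and fold, because the raise cap has been reached.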
Betting based on probabilities

One way to play poker is to use probabilities:

Given your own known hole cards, and the community cards that are on show, for each possible combination of community cards yet to appear, how likely is your hand to be better than any other player’s hand?

Compare this to the pot odds, the ratio

    (the cost of making a bet/call) / (the size of the pot)

- If the comparison is very favourable, bet or raise
- if merely favourable, check or call
- if not, check if possible, otherwise fold
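A worked sketch of this comparison follows. It assumes the pot size is counted before your own bet or call is added, so a call breaks even when your win probability equals cost / (pot + cost); this is the same as your odds of winning, p / (1 - p), equalling the pot-odds ratio cost / pot. The `margin` that separates “very favourable” from “merely favourable” is an arbitrary illustrative value, not something given in the slides.

```python
def pot_odds(cost: float, pot: float) -> float:
    """Pot odds as defined on the slide: cost of the bet/call over pot size."""
    return cost / pot


def decide(win_prob: float, cost: float, pot: float,
           can_check: bool, margin: float = 0.15) -> str:
    """Pick an action by comparing win probability with the pot odds.

    margin is a hypothetical threshold for "very favourable".
    """
    # Win probability at which paying `cost` to win `pot` breaks even.
    break_even = cost / (pot + cost)
    if win_prob >= break_even + margin:
        return "bet or raise"
    if win_prob >= break_even:
        return "check or call"
    return "check" if can_check else "fold"


# Pot of $10, $2 to call: pot odds are 0.2, break-even is 2/12.
print(pot_odds(2, 10))
print(decide(0.50, 2, 10, can_check=False))
print(decide(0.20, 2, 10, can_check=False))
print(decide(0.10, 2, 10, can_check=False))
```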
Predictability is bad

Basing your behaviour purely on the probabilities like this is a poor strategy, because it makes you predictable. Other players will observe the cards you reveal at the showdowns, learn about your conservative style of play, learn about your assessment of winning chances, interpret your betting behaviour as indicative of the strength of your hand, and use this to beat you over the course of many hands.

Good poker players
1. observe the decisions of their opponents and gather what evidence they can
2. base their decisions on models of their opponents, exploiting any weaknesses they detect
3. strive to frustrate the formation of accurate models of themselves, by bluffing, and by consciously, deliberately changing their own style
Poker as AI Testbed domain

Poker, like several other games, features
- Competing agents
- Chance
- Finite set of choices
- Large game tree

In addition, Poker has
- Risk assessment
- Deception

In many other games, there is little to be gained by modelling opponents. Rudimentary models, like “contempt factor”, or no model at all, are common.

In poker, modelling opponents, and being aware that they are trying to model you, is essential to good play.
Bayesian Network approach to modelling

By training over many self-played hands, a CPT (conditional probability table) can be built up. Then in real play, knowing all influences upon “opponent action” except “opponent current hand”, one can draw conclusions about “opponent current hand”.

But a CPT of ~200k entries cannot reasonably be modified over a game of ~100 hands. (Boulton)
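The inference step can be illustrated with Bayes' rule on a toy table. This is a sketch under strong simplifying assumptions: the only influence on the action is the opponent's hand (the network described above has several), hands are collapsed into three invented classes, and all numbers in `prior` and `cpt` are made up for illustration.

```python
def posterior_over_hands(prior, cpt, action):
    """P(hand | action) is proportional to P(action | hand) * P(hand)."""
    unnormalised = {hand: prior[hand] * cpt[hand][action] for hand in prior}
    total = sum(unnormalised.values())
    return {hand: value / total for hand, value in unnormalised.items()}


# Hypothetical prior belief about the opponent's hand class.
prior = {"strong": 0.2, "medium": 0.3, "weak": 0.5}

# Hypothetical CPT: P(action | hand class), each row sums to 1.
cpt = {
    "strong": {"raise": 0.6, "call": 0.3, "fold": 0.1},
    "medium": {"raise": 0.2, "call": 0.5, "fold": 0.3},
    "weak":   {"raise": 0.1, "call": 0.2, "fold": 0.7},
}

# After seeing a raise, belief shifts towards the strong class.
posterior = posterior_over_hands(prior, cpt, "raise")
print(posterior)
```

With these numbers, the probability of a strong hand rises from 0.2 to 0.12 / 0.23, a little over one half.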
Classification of hands

At the outset, the two hole cards of an opponent player may be any two of the 50 cards you don’t have: 50x49/2 = 1225 combinations if you enumerate them.

It is sufficient to distinguish 169 qualitatively different hands:
- 13 possible pairs - AA KK QQ …
- 78 pairings of cards of the same suit - AKs, AQs, AJs, A10s, A9s, … 42s, 32s
- 78 pairings of cards of different suits - AK, AQ, AJ, … 32

Collapsing still further, to 25 or so classes, loses some information but facilitates learning of statistics.

Classification of boards and of pot sizes can proceed similarly.
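Both counts can be checked by brute-force enumeration. The sketch below uses its own card notation as an assumption: ranks are written with `T` for ten, and a trailing `s` or `o` marks suited or offsuit hands (the slide writes `A10s` and leaves offsuit hands unmarked).

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]


def hand_class(card_a: str, card_b: str) -> str:
    """Map two hole cards to one of the 169 classes, e.g. 'AKs' or 'QQ'."""
    # Put the higher rank first.
    high, low = sorted((card_a[0], card_b[0]), key=RANKS.index, reverse=True)
    if high == low:
        return high + low  # a pair
    suited = card_a[1] == card_b[1]
    return high + low + ("s" if suited else "o")


# Opponent holdings consistent with our own two (example) hole cards.
my_hole = ["As", "Kd"]
unseen = [card for card in DECK if card not in my_hole]
opponent_combos = list(combinations(unseen, 2))

# Distinct classes over every two-card holding from the full deck.
classes = {hand_class(a, b) for a, b in combinations(DECK, 2)}

print(len(opponent_combos))  # 1225
print(len(classes))          # 169
```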
The Loki program

Loki, from the University of Alberta, used a probabilistic approach, with one initial model (set of weights) for all players, then updating weights for individual players on the basis of their observed actions:
- Assess the probability of the opponent holding each class of hand, given own cards and board
- Modify the probability estimates in light of each action, e.g. on a “raise”, increase strong-hand probabilities and decrease weak-hand probabilities
- Adjust weights from estimated hands to better predict the observed action

This showed improved performance compared to (i) programs with no modelling and (ii) programs with only static modelling.
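The idea of starting every player from one shared model and then specialising it per player can be sketched as follows. This is not Loki's actual update rule: the class `OpponentModels`, the pseudo-counts in `DEFAULT_MODEL`, and the assumption that the opponent's hand class is known when an action is recorded (for example, after a showdown) are all simplifications chosen for illustration.

```python
import copy

# Hypothetical shared starting model: pseudo-counts of actions per hand class.
DEFAULT_MODEL = {
    "strong": {"raise": 6, "call": 3, "fold": 1},
    "weak":   {"raise": 1, "call": 2, "fold": 7},
}


class OpponentModels:
    """One copy of the default model per opponent, updated from observations."""

    def __init__(self, default_model):
        self._default = default_model
        self._per_player = {}

    def _model(self, player):
        # Each player's model starts as a private copy of the shared default.
        if player not in self._per_player:
            self._per_player[player] = copy.deepcopy(self._default)
        return self._per_player[player]

    def observe(self, player, hand_class, action):
        """Record that `player` took `action` while holding `hand_class`."""
        self._model(player)[hand_class][action] += 1

    def action_prob(self, player, hand_class, action):
        """Estimated P(action | hand_class) for this particular player."""
        counts = self._model(player)[hand_class]
        return counts[action] / sum(counts.values())


models = OpponentModels(DEFAULT_MODEL)
# "alice" is seen raising twice with weak hands; "bob" is never observed.
for _ in range(2):
    models.observe("alice", "weak", "raise")

print(models.action_prob("alice", "weak", "raise"))  # 3 of 12 counts
print(models.action_prob("bob", "weak", "raise"))    # still the default, 1 of 10
```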
Bluffing behaviour

Being able to model others is only part of the solution. Good players find it easy to model opponents who never bluff.

Bluffing purely at random (say on 5% of hands) has problems:
- In some cases opponents can know for certain that you cannot win; avoid bluffing at such a time.
- Raising persistently while bluffing is not typical of how you behave when you truly do have a good hand; good opponents will detect the difference.

Follow a plan: proceed as if your chance of losing was, say, 50% of your true estimate of that chance. This will lead to consistent and realistic behaviour that cannot be easily diagnosed as bluffing.
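The plan above amounts to a simple adjustment of the win estimate that is fed into the betting decision. In this sketch the function name, the `loss_scale` parameter, and the boolean flag for “opponents can know you cannot win” are illustrative assumptions.

```python
def bluffing_win_estimate(true_win_prob: float,
                          loss_scale: float = 0.5,
                          opponents_know_we_cannot_win: bool = False) -> float:
    """Win probability to act on while bluffing.

    The chance of losing is scaled by `loss_scale` (0.5 means "act as if
    the chance of losing were half the true estimate").
    """
    if opponents_know_we_cannot_win:
        # Do not bluff when the bluff cannot be believed.
        return true_win_prob
    true_loss_prob = 1.0 - true_win_prob
    return 1.0 - loss_scale * true_loss_prob


# A hand that truly wins 30% of the time is played as if it won 65%.
print(bluffing_win_estimate(0.30))
```

Because the adjusted estimate goes through the same decision procedure as an honest one, the resulting betting pattern resembles that of a genuinely stronger hand.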
The Poki program

Poki is a rewrite and enhancement of Loki.

It features a neural-network opponent modelling mechanism; inputs include
- estimated hand strength
- estimated hand potential
- previous action of opponent
- position of player clockwise from dealer (first, last, neither)
- predictions from “expert predictors”

Opponent modelling is viewed as machine learning: predict the opponent’s action
- Backpropagation within the neural network
- Plug-in “Expert Predictor(s)” (ensemble) may be machine-learning systems too

Poki also features game-tree search, to 5 ply, using “miximax” to handle the problem of imperfect knowledge.
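A minimal sketch of a miximax-style evaluation, assuming the usual reading of the term: at our own decision nodes we take the maximum over actions, while at opponent nodes we take an expectation weighted by the opponent model's predicted action probabilities. The tree encoding with dictionaries and every number below are invented for illustration; this is not Poki's code.

```python
def miximax(node) -> float:
    """Expected value of a node for the player labelled "me"."""
    if "value" in node:
        return node["value"]  # leaf: payoff to "me"
    if node["player"] == "me":
        # Our own choice: pick the best action.
        return max(miximax(child) for child in node["children"].values())
    # Opponent's choice: mix over actions using predicted probabilities.
    return sum(prob * miximax(child)
               for prob, child in node["children"].values())


# Hypothetical two-ply tree: we fold or call, then the opponent folds or bets.
tree = {
    "player": "me",
    "children": {
        "fold": {"value": -2.0},
        "call": {
            "player": "opponent",
            "children": {
                "fold": (0.3, {"value": 6.0}),
                "bet":  (0.7, {"value": -4.0}),
            },
        },
    },
}

# Calling is worth 0.3 * 6 + 0.7 * (-4) = -1.0, which beats folding at -2.0.
print(miximax(tree))
```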
References

Quoted in Aaron Davidson’s 2002 MSc thesis at the U.Alberta site: