DNA Starts to Learn Poker David Harlan Wood 4 * Hong Bi 1 Steven O. Kimbrough 2 Dongjun Wu 3 Junghuei Chen 1* Departments of 1 Chemistry & Biochemistry.

Presentation transcript:

DNA Starts to Learn Poker
David Harlan Wood (4*), Hong Bi (1), Steven O. Kimbrough (2), Dongjun Wu (3), Junghuei Chen (1*)
Departments of (1) Chemistry & Biochemistry and (4) Computer & Information Sciences, University of Delaware
(2) The Wharton School, University of Pennsylvania
(3) Bennett S. LeBow College of Business, Drexel University

Player dealt an Ace (game tree): Deal: Ace. Player: Say Ace (adds $1). Dealer: Call (adds $1), loses $2; or Fold, loses $1.

Player dealt a 2 (game tree): Deal: 2. Player: Say 2, loses $1; or Say Ace (adds $1). After Say Ace, Dealer: Call (adds $1), wins $2; or Fold, loses $1.

Full game tree:
Player dealt an Ace: Player says Ace (adds $1); Dealer calls (adds $1) and loses $2, or folds and loses $1.
Player dealt a 2: Player says 2 and loses $1, or says Ace (adds $1); Dealer then calls (adds $1) and wins $2, or folds and loses $1.

OBJECTIVE: To Obtain Probabilistic Strategies
Each player wants to obtain a strategy for the game. A strategy prescribes an action in every possible situation. That is, at each node, it gives the decision to raise as a function of the hand dealt.
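The game tree and objective above can be sketched in code. This is a minimal sketch, not the authors' method: it assumes a zero-sum game scored from the player's side, assumes the two deals are equally likely, and reads the "loses $1" label on the Say 2 branch as the player's loss. The names `PAYOFF`, `expected_payoff`, `p_bluff`, `p_call`, and `p_ace` are illustrative only.

```python
# Payoffs to the player (assumed zero-sum: the dealer receives the negative).
# Keys: (card dealt, player's announcement, dealer's response).
PAYOFF = {
    ("A", "say A", "call"): +2,  # dealer calls a real Ace and loses $2
    ("A", "say A", "fold"): +1,  # dealer folds and loses $1
    ("2", "say 2", None): -1,    # player announces the 2 and loses $1 (assumed reading)
    ("2", "say A", "call"): -2,  # bluff is called; dealer wins $2
    ("2", "say A", "fold"): +1,  # bluff succeeds; dealer loses $1
}


def expected_payoff(p_bluff, p_call, p_ace=0.5):
    """Player's expected payoff per deal under two probabilistic strategies.

    p_bluff: probability the player says Ace when dealt a 2.
    p_call:  probability the dealer calls after hearing "Ace".
    p_ace:   probability of an Ace deal (0.5 is an assumption).
    """
    def after_say_a(card):
        # Average over the dealer's random choice between call and fold.
        return (p_call * PAYOFF[(card, "say A", "call")]
                + (1 - p_call) * PAYOFF[(card, "say A", "fold")])

    with_ace = after_say_a("A")
    with_two = (p_bluff * after_say_a("2")
                + (1 - p_bluff) * PAYOFF[("2", "say 2", None)])
    return p_ace * with_ace + (1 - p_ace) * with_two
```

Each strategy here is a single probability at one node, which is the sense in which a strategy "prescribes an action in every possible situation" for this small tree.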

Poker (flowchart boxes): Deals; New Player Strategies; New Dealer Strategies; Assemble; Play New Game.

Learning (flowchart boxes): Recover & Cut Play Histories for Player’s & Dealer’s Strategies; Separate by Payoffs; Recover & Distribute Strategies.
Player’s side: Player’s Strategies; Programmable Selection of Recovered Player Strategies; Player’s Adaptation (Amplify, Crossover, Mutate).
Dealer’s side: Dealer’s Strategies; Programmable Selection of Recovered Dealer Strategies; Dealer’s Adaptation (Amplify, Crossover, Mutate).

Learning Poker (full cycle, flowchart boxes)
Poker: Deals; New Player Strategies; New Dealer Strategies; Assemble; Play New Game.
Learning: Recover & Cut Play Histories for Player’s & Dealer’s Strategies; Separate by Payoffs; Recover & Distribute Strategies.
Player’s side: Player’s Strategies; Programmable Selection of Recovered Player Strategies; Player’s Adaptation (Amplify, Crossover, Mutate).
Dealer’s side: Dealer’s Strategies; Programmable Selection of Recovered Dealer Strategies; Dealer’s Adaptation (Amplify, Crossover, Mutate).

Encoding of strategies and deals (diagram, shown next to the full game tree)
Strands: Dealer’s Strategies; Player’s Strategies; Deals (Dealt A, Dealt 2).
Labels in the diagram: R.E. 1, R.E. 2, Stopper, Say A’, FOLD’, Call’, Fold’, 2’, Say 2’, Error, SAY 2’, A’, A, 2.
Sequences from: Sakamoto et al., DNA4 (1997)

Dealer’s Strategies

Player’s Strategies

Deals

Two Strategies and a Deal Define a Game
[Diagram: a Player’s Strategy strand, a Dealer’s Strategy strand, and a Deal (Ace, Dealt A), marked with R.E. 1 and R.E. 2. Region labels: 2’, Say 2’, Fold’, Error, SAY 2’, Say A’, A’, FOLD’, Call’, A.]

Cut with R.E.1 & R.E.2 and Assemble A Game
[Diagram: the Player’s Strategy, Dealer’s Strategy, and Deal strands before and after cutting with R.E. 1 and R.E. 2 and assembly. Region labels: 2’, Say 2’, Fold’, Error, Say A’, A’, Call’, A, SAY 2’, FOLD’.]

Assembled game strand: Player’s Strategy, Dealer’s Strategy, Deal (region labels: 2’, Say 2’, Fold’, Error, Say A’, A’, Call’, Fold’, A, SAY 2’, FOLD’)
Component strands: S1 (74-mer), S2 (57-mer), S3 (48-mer), S4 (53-mer)
L1 (25-mer), L2 (28-mer), L3 (28-mer)
Gel lanes: S1, S2, S3, S4, R1, R2, M
R1: Ligation Reaction
R2: Purified Ligation Product
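As a quick consistency check on the strand lengths listed above, the four S strands can be summed. This is only a sketch under an assumption not stated on the slide: that the ligation product contains S1 through S4 joined end to end, and that L1 through L3 contribute no length to it. The variable names are illustrative.

```python
# Strand lengths in nucleotides, as labelled on the slide.
strands = {"S1": 74, "S2": 57, "S3": 48, "S4": 53}

# Assumption: the ligated product is S1 + S2 + S3 + S4 with nothing added.
product_length = sum(strands.values())
print(product_length)  # 232
```

Under that assumption the total agrees with the 232-mer label that appears on a later slide.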

Player dealt an Ace (play annotated on the game tree): Player Says A. Dealer Folds (loses $1). Dealer MIGHT Change to Call (adds $1; loses $2).

Player Dealt an Ace (strand extension diagram)
Assembled strand: Player’s Strategy, Dealer’s Strategy, Deal (A)
Player Says Ace: Extend (Say A)
Dealer Folds: Extend (Fold)
Dealer MIGHT Change to Call: Extend (Call)
Other labels: Fold Preventer, Error

Player Says Ace: Extend (Say A)
Dealer Folds: Extend (Fold)
Dealer MIGHT Change to Call: Extend (Call)
Other labels: Fold Preventer
Strand lengths shown: 232-mer, 247-mer, 262-mer, 282-mer

Player dealt a 2 (play annotated on the game tree): Player Says 2 (Block Say 2). Player Changes to Say A. Dealer Folds. Dealer Changes to Call.

Player Dealt a 2 (strand extension diagram)
Assembled strand: Player’s Strategy, Dealer’s Strategy, Deal (2)
Player Says 2: Extend (Say 2)
Player MIGHT Change to Say Ace: Extend (Say A)
Dealer Folds: Extend (Fold)
Dealer MIGHT Change to Call: Extend (Call)
Other labels: Fold Preventer, Error

Full game tree with play annotations
Player dealt an Ace: Player Says A; Dealer Folds; Dealer MIGHT Change to Call.
Player dealt a 2: Player Says 2; Player MIGHT Change to Say Ace; Dealer Folds; Dealer MIGHT Change to Call.

Learning: Dealer’s Adaptation
Strategies are returned grouped by outcomes: -$2, -$1, +$1, +$2.
Select the Dealer’s own preferred mix of strategies to be bred.
Breed by using PCR to restore the population size, with a variable mutation rate.
Crossover by pairwise recombination of “change your mind” regions.
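The breeding steps above have a straightforward software analogue. The sketch below is an in-silico illustration only, not the laboratory protocol: it assumes each strategy is a tuple of 0/1 flags standing in for “change your mind” regions, and the function `adapt` and its parameters (`preference`, `mutation_rate`, `rng`) are hypothetical names.

```python
import random
from collections import defaultdict


def adapt(population, outcomes, preference, mutation_rate, rng):
    """One round of adaptation for one side (dealer or player).

    population:    list of strategies, each a tuple of 0/1 region flags.
    outcomes:      payoff each strategy earned (-2, -1, +1 or +2).
    preference:    dict payoff -> relative weight for breeding from that
                   group; at least one occupied group needs a positive weight.
    mutation_rate: per-flag flip probability during copying.
    """
    # Separate by payoffs: group strategies by the outcome they produced.
    groups = defaultdict(list)
    for strategy, payoff in zip(population, outcomes):
        groups[payoff].append(strategy)

    # Programmable selection: each group's total weight is its preference.
    parents, weights = [], []
    for payoff, members in groups.items():
        for strategy in members:
            parents.append(strategy)
            weights.append(preference.get(payoff, 0.0) / len(members))

    # Amplify: copy parents until the population size is restored,
    # flipping each flag with probability mutation_rate (PCR analogue).
    size = len(population)
    copies = rng.choices(parents, weights=weights, k=size)
    mutated = [tuple(flag ^ (rng.random() < mutation_rate) for flag in s)
               for s in copies]

    # Crossover: recombine the regions of neighbouring copies pairwise.
    children = []
    for a, b in zip(mutated[0::2], mutated[1::2]):
        cut = rng.randrange(len(a) + 1)
        children.append(a[:cut] + b[cut:])
        children.append(b[:cut] + a[cut:])
    if size % 2:
        children.append(mutated[-1])  # odd one out is kept unpaired
    return children
```

A call such as `adapt(pop, outcomes, {2: 1.0, 1: 0.5}, 0.01, random.Random(0))` would breed mostly from the +$2 group; the weights are where a side expresses its own preferred mix.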

Complexity
Our complexity is linear in the number of nodes in the tree.
# nodes in tree = 2 players + betting rounds
At each node, we need a probability distribution giving “level of bet” as a function of “dealt hand”.
For us, the probability distribution is replaced by probabilistic hybridization of the DNA-encoded “dealt hand” to an adapting “change your mind about folding” region of the strategy. The output (if generated) is an adapting “level of bet” region of the strategy.
[Diagram labels: hand, hand’, hand evaluator, bet, bet’, bet generator, next, next’, Extend]
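One way to picture a probability distribution being replaced by a physical mixture is to let a population of strategy copies stand in for the distribution. The sketch below is a software analogy only and makes no claim about hybridization chemistry: each copy is assumed to be a plain lookup from dealt hand to bet, and `choose_bet` and `bet_distribution` are hypothetical helper names.

```python
import random
from collections import Counter


def choose_bet(population, hand, rng):
    """Draw one strategy copy at random and read off its bet for `hand`.

    population: list of dicts mapping a dealt hand to a bet level.
    The mix of copies plays the role of the probability distribution.
    """
    copy = rng.choice(population)
    return copy[hand]


def bet_distribution(population, hand):
    """Fraction of copies that choose each bet level for `hand`."""
    counts = Counter(copy[hand] for copy in population)
    return {bet: n / len(population) for bet, n in counts.items()}


# Example mixture: 7 copies that are honest with a 2, 3 copies that bluff.
honest = {"A": "say A", "2": "say 2"}
bluffer = {"A": "say A", "2": "say A"}
population = [honest] * 7 + [bluffer] * 3
```

With this mixture, `bet_distribution(population, "2")` gives a 0.7 share for "say 2" and a 0.3 share for "say A", and repeated calls to `choose_bet` sample from that distribution without it ever being stored explicitly.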

Comparison
Koller and Pfeffer derive equilibrium mixed strategies with complexity polynomial in # nodes * # possible deals * 2 betting levels
“Representations and Solutions for Game-Theoretic Problems,” Artificial Intelligence (1997)
Two-player games only
Does not exploit weaknesses of the opponent
No dynamics, only equilibrium

Poker: All Possible Deals, Course of Play
[Game tree diagram for players P1, P2, P3. Moves: Pass or Bet $a, followed by F or C.]
C: Call (add $b)
F: Fold

Learning Poker (full cycle, flowchart boxes)
Poker: Deals; New Player Strategies; New Dealer Strategies; Assemble; Play New Game.
Learning: Recover Dealer’s & Player’s Strategies; Separate by Payoffs.
Player’s side: Programmable Selection of Recovered Player Strategies; Player’s Adaptation (Amplify, Crossover, Mutate).
Dealer’s side: Programmable Selection of Recovered Dealer Strategies; Dealer’s Adaptation (Amplify, Crossover, Mutate).
