Outline
– Dudo Rules
– Regret and Counterfactual Regret
– Imperfect Recall Abstraction
– Counterfactual Regret Minimization (CFR) Difficulties
– Fixed-Strategy Iteration Counterfactual Regret Minimization (FSICFR)
– Results

Dudo
– Bluffing dice game
– Origin generally believed to be 15th c. Inca
– Many variants/names (Liar's Dice, Perudo, Bluff, Call My Bluff, Cacho, Cachito)
– Internationally popular: BoardGameGeek.com rank 272/55074 (~top ½%) as of November 10th, 2011

Dudo Overview
Bluffing dice game for 2+ players:
– Each player rolls 5 dice concealed under a cup
– Players make successively bolder claims about all dice rolled until a player challenges
– Loser of the challenge loses dice
– Last player with dice wins

Dudo Claims

Dudo Rules
– Players each roll and privately view 5 dice concealed under a cup. 1s are wild.
– Players make successively greater claims until one challenges the previous claim with "Dudo!" (Sp. "I doubt it!"). All reveal dice, and (see the sketch below):
  – More/fewer than claimed? The challenger/claimant loses dice according to the difference.
  – Claim exactly correct? All others lose 1 die.
– The next round begins with the challenge winner.
– The last player with dice wins.
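A minimal sketch of challenge resolution under the rules above. The (count, rank) claim representation and function names are illustrative assumptions, not the presentation's code; 1s count as wild toward claims of any rank other than 1.

```python
def matching_dice(all_dice, rank):
    """Count dice showing `rank`, with 1s wild for ranks other than 1."""
    return sum(1 for d in all_dice if d == rank or (d == 1 and rank != 1))

def resolve_challenge(all_dice, count, rank):
    """Return (loser, dice lost) after 'Dudo!' is called on claim (count, rank)."""
    actual = matching_dice(all_dice, rank)
    if actual == count:
        return "all others", 1               # claim exactly correct
    if actual > count:
        return "challenger", actual - count  # claim was safe by the difference
    return "claimant", count - actual        # claim overshot by the difference

# Example: ten dice total, claim "four 5s", then a challenge.
dice = [5, 1, 2, 3, 6, 5, 5, 4, 1, 2]        # three 5s plus two wild 1s: five matches
print(resolve_challenge(dice, 4, 5))         # ('challenger', 1)
```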

Regret
– Rock, Paper, Scissors (a.k.a. Roshambo): +1 win, 0 tie, -1 loss.
– A losing choice yields regrets of 1 and 2 for not having chosen the tying and winning plays, respectively.
– Hart & Mas-Colell regret matching (2000): depart from current play with probabilities proportional to cumulative past regrets.
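A minimal sketch of regret matching for Rock, Paper, Scissors. The rock-heavy opponent distribution is an assumption chosen so the learner has something to exploit; against it, regret matching drifts toward always playing paper.

```python
import random

# Actions: rock=0, paper=1, scissors=2.
def utility(a, b):
    """+1 win, 0 tie, -1 loss for action a against action b."""
    return [0, 1, -1][(a - b) % 3]  # e.g. paper(1) vs rock(0) -> +1

def regret_matching(cum_regret):
    """Play proportionally to positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0:
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]

cum_regret = [0.0, 0.0, 0.0]
for _ in range(100_000):
    strategy = regret_matching(cum_regret)
    me = random.choices(range(3), weights=strategy)[0]
    opp = random.choices(range(3), weights=[0.5, 0.25, 0.25])[0]  # rock-heavy
    for a in range(3):  # regret: what each alternative would have gained
        cum_regret[a] += utility(a, opp) - utility(me, opp)

print([round(p, 3) for p in regret_matching(cum_regret)])  # mostly paper
```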

Counterfactual Regret Example
Input: realization weights p1, p2 at a Player 1 node with actions a1, a2, a3 leading to values v1, v2, v3.
1. Compute the node strategy from normalized positive cumulative regret.
2. Update the average output strategy, weighted by the player's realization weight.
3. Recursively evaluate the strategy to compute action values and the node value.
4. Compute counterfactual regrets.
5. Update cumulative regrets, weighted by the opponent's realization weight.
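A sketch of the node update just described, for a Player 1 node in a two-player game. The `Node` class is an illustrative stand-in: full CFR would recurse into each child with realization weights p1 × strategy[a], p2, whereas here the child values are simply looked up.

```python
from dataclasses import dataclass

@dataclass
class Node:
    child_value: list        # each action's subtree value (stand-in for recursion)
    cum_regret: list = None
    cum_strategy: list = None

    def __post_init__(self):
        n = len(self.child_value)
        self.cum_regret = self.cum_regret or [0.0] * n
        self.cum_strategy = self.cum_strategy or [0.0] * n

def cfr_update(node, p1, p2):
    """One CFR update at a Player 1 node with realization weights p1, p2."""
    n = len(node.child_value)
    # Step 1: node strategy = normalized positive cumulative regret.
    positive = [max(r, 0.0) for r in node.cum_regret]
    total = sum(positive)
    strategy = [r / total if total > 0 else 1.0 / n for r in positive]
    # Step 2: accumulate the average output strategy, weighted by p1.
    for a in range(n):
        node.cum_strategy[a] += p1 * strategy[a]
    # Step 3: action values and the strategy-weighted node value.
    action_value = node.child_value
    node_value = sum(s * v for s, v in zip(strategy, action_value))
    # Steps 4-5: counterfactual regrets, weighted by the opponent's p2,
    # added into the cumulative regrets.
    for a in range(n):
        node.cum_regret[a] += p2 * (action_value[a] - node_value)
    return node_value
```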

Counterfactual Regret Example
Applying the steps above at a Player 1 node with realization weights p1', p2' = 0.25:

                                   a1     a2     a3
  Positive cumulative regret       20      0     30
  Strategy (normalized)           0.4      0    0.6
  Action value                     40     -8     20
  Action regret (value - 28)       12    -36     -8
  Counterfactual regret (× p2')     3     -9     -2

Node value = 0.4 × 40 + 0 × (-8) + 0.6 × 20 = 28.
Cumulative strategy += p1' × strategy; new cumulative regret = old cumulative regret + counterfactual regret.
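The sketch from the previous slide reproduces this table's arithmetic. The example does not fix p1', so the 0.5 below is an arbitrary placeholder; only p2' = 0.25 enters the regret update.

```python
node = Node(child_value=[40, -8, 20], cum_regret=[20.0, 0.0, 30.0])
print(cfr_update(node, p1=0.5, p2=0.25))  # node value: 28.0
print(node.cum_regret)                    # [23.0, -9.0, 28.0]
```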

Dudo Information Sets
– Information set: all game states consistent with what you know. In Dudo: your dice, the number of opponent dice, and the claim history.
– 2-player Dudo: 294,021,177,291,188,232,192 (~2.9 × 10^20) information sets.
– Imperfect recall abstraction: remember only up to the last m claims (sketched below).
– m = 3 → 21,828,536 abstract information sets.
– Abstraction + CFR can lead to pathological behavior, but apparently not in this case.
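A sketch of the abstraction, assuming claims are (count, rank) pairs and an information-set key built from the player's own roll, the opponent's dice count, and the last m claims. The key representation is illustrative, not the published code.

```python
M = 3  # remember up to the last 3 claims

def infoset_key(my_dice, opp_dice_count, claim_history, m=M):
    """Abstract information set under imperfect recall of the last m claims."""
    return (tuple(sorted(my_dice)),     # own concealed roll (order irrelevant)
            opp_dice_count,
            tuple(claim_history[-m:]))  # older claims are forgotten

# Histories differing only in claims older than m map to one abstract set:
h1 = [(1, 2), (2, 2), (2, 5), (3, 5)]   # (count, rank) claims
h2 = [(1, 6), (2, 2), (2, 5), (3, 5)]
assert infoset_key([2, 3, 5, 5, 1], 5, h1) == infoset_key([2, 3, 5, 5, 1], 5, h2)
```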

Problems in Applying CFR to Dudo
Even with imperfect recall, the number of paths through possible states grows exponentially with depth, making direct application of CFR infeasible.

Insight
With CFR on abstracted information sets:
– node visits grow exponentially with depth, and
– strategies and regrets change on each node visit.
We have restructured the algorithm into a single dynamic-programming-style computation (sketched below) with:
– a forward pass computing strategies and accumulating realization weights, and
– a backward pass computing utilities and accumulating regrets.
By fixing strategies throughout the forward pass, we can use the same regret-matching approach, yet reduce exponential complexity to linear.
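A compact sketch of this restructuring under stated assumptions: the abstracted game is a DAG held in topological order, values are zero-sum payoffs to player 0, and chance (dice-roll) handling plus other bookkeeping of the full algorithm are omitted. The `Node` class and its fields are illustrative, not the published implementation.

```python
class Node:
    """Illustrative abstract-game node (not the published implementation)."""
    def __init__(self, player, children=None, value=0.0):
        self.player = player            # acting player, 0 or 1
        self.children = children or []  # one successor node per action
        self.value = value              # terminal payoff to player 0 if a leaf
        n = len(self.children)
        self.strategy = [0.0] * n
        self.cum_regret = [0.0] * n
        self.cum_strategy = [0.0] * n
        self.reach = [0.0, 0.0]         # realization weights for players 0, 1

def fsicfr_iteration(nodes):
    """One training iteration; `nodes` in topological order, nodes[0] = root."""
    nodes[0].reach = [1.0, 1.0]

    # Forward pass: fix each node's strategy by regret matching, then push
    # realization weights once to every successor.
    for node in nodes:
        n = len(node.children)
        if n == 0:
            continue
        positive = [max(r, 0.0) for r in node.cum_regret]
        total = sum(positive)
        node.strategy = [r / total if total > 0 else 1.0 / n for r in positive]
        me, opp = node.player, 1 - node.player
        for a, child in enumerate(node.children):
            child.reach[me] += node.reach[me] * node.strategy[a]
            child.reach[opp] += node.reach[opp]
        for a in range(n):  # average strategy, weighted by own realization weight
            node.cum_strategy[a] += node.reach[me] * node.strategy[a]

    # Backward pass: values flow back from successors; regrets accumulate,
    # weighted by the opponent realization weights gathered going forward.
    for node in reversed(nodes):
        n = len(node.children)
        if n > 0:
            sign = 1 if node.player == 0 else -1  # value from the actor's view
            vals = [child.value for child in node.children]
            node.value = sum(s * v for s, v in zip(node.strategy, vals))
            opp = 1 - node.player
            for a in range(n):
                node.cum_regret[a] += node.reach[opp] * sign * (vals[a] - node.value)
        node.reach = [0.0, 0.0]                   # reset for the next iteration
```

Each node is visited exactly once per pass, so an iteration's cost is linear in the number of abstract nodes rather than in the number of paths reaching them.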

2-versus-2 Dice Training

Time (ms) per Training Iteration

Varying Imperfect Recall

Summary
– Imperfect recall abstraction brings Dudo information sets from ~2.9 × 10^20 down to ~2.2 × 10^7.
– Even so, CFR's exponential growth through the depth of claim sequences proves infeasible.
– FSICFR: holding strategies fixed through a dynamic-programming-style forward and backward pass allows efficient CFR training.
– Any extensive game tree with significantly many paths converging on the same (abstracted) information sets benefits.