Playing Dice with Soar Soar Workshop June 2011 John E. Laird Nate Derbinsky, Miller Tinkerhess University of Michigan 1.

Slides:



Advertisements
Similar presentations
Soar and StarCraft By Alex Turner. What is StarCraft: Brood War? A Real-Time Strategy (RTS) computer game released in A sci-fi war simulation Continually.
Advertisements

Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.
AI Pathfinding Representing the Search Space
Overview Fundamentals
On the Genetic Evolution of a Perfect Tic-Tac-Toe Strategy
Adversarial Search Chapter 5.
Adversarial Search CSE 473 University of Washington.
Artificial Intelligence for Games Game playing Patrick Olivier
Games CPSC 386 Artificial Intelligence Ellen Walker Hiram College.
Rational Agents for the Card Game “BS” Jeff Reilly, Brian Munce, Stephen Tanguis, Alex Keeler.
BLACKJACK SHARKS: Aabida Mayet, Omar Bacerra, Eser Kaptan March 15, 2010.
Created by Charles Jenkins
Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. 15 Chances, Probabilities, and Odds 15.1Random Experiments and.
Artificial Intelligence in Game Design
VK Dice By: Kenny Gutierrez, Vyvy Pham Mentors: Sarah Eichhorn, Robert Campbell.
Game Design Serious Games Miikka Junnila.
EC: Lecture 17: Classifier Systems Ata Kaban University of Birmingham.
1 Chunking with Confidence John Laird University of Michigan June 17, th Soar Workshop
Probability And Expected Value ————————————
A Soar’s Eye View of ACT-R John Laird 24 th Soar Workshop June 11, 2004.
Using Probabilistic Knowledge And Simulation To Play Poker (Darse Billings …) Presented by Brett Borghetti 7 Jan 2007.
Reinforcement Learning (1)
Dice Games Properly Presented Todd W. Neller. What’s Up With the Title? My favorite book on dice games: – Many games – Diverse games – Well-categorized.
Check it out! : Simple Random Sampling. Players of a dice game roll five dice and earn points according to the combinations of numbers they roll.
1 Adversary Search Ref: Chapter 5. 2 Games & A.I. Easy to measure success Easy to represent states Small number of operators Comparison against humans.
L-3 PROPRIETARY 1 Real-time IO-oriented Soar Agent for Base Station-Mobile Control David Daniel, Scott Hastings L-3 Communications.
Game Trees: MiniMax strategy, Tree Evaluation, Pruning, Utility evaluation Adapted from slides of Yoonsuck Choe.
Welcome to NEWBURY BRIDGE CLUB Doubles. Most of you are used to doubling for take-out in the 2 nd seat.(i.e. immediately after the opener) Much of today’s.
Minimax Trees: Utility Evaluation, Tree Evaluation, Pruning CPSC 315 – Programming Studio Spring 2008 Project 2, Lecture 2 Adapted from slides of Yoonsuck.
Reinforcement Learning
1 Today Random testing again Some background (Hamlet) Why not always use random testing? More Dominion & project? CUTE: “concolic” testing.
Art 315 Lecture 6 Dr. J. Parker. Variables Variables are one of a few key concepts in programming that must be understood. Many engineering/cs students.
Integrating Background Knowledge and Reinforcement Learning for Action Selection John E. Laird Nate Derbinsky Miller Tinkerhess.
Game Playing. Towards Intelligence? Many researchers attacked “intelligent behavior” by looking to strategy games involving deep thought. Many researchers.
1.3 Simulations and Experimental Probability (Textbook Section 4.1)
Games. Adversaries Consider the process of reasoning when an adversary is trying to defeat our efforts In game playing situations one searches down the.
Games 1 Alpha-Beta Example [-∞, +∞] Range of possible values Do DF-search until first leaf.
Soar Technology, Inc. Proprietary 6/4/ Using Soar to Teach Probabilistic Reasoning Jim Thomas 6/5/2013.
CHECKERS: TD(Λ) LEARNING APPLIED FOR DETERMINISTIC GAME Presented By: Presented To: Amna Khan Mis Saleha Raza.
The challenge of poker NDHU CSIE AI Lab 羅仲耘. 2004/11/04the challenge of poker2 Outline Introduction Texas Hold’em rules Poki’s architecture Betting Strategy.
Poker as a Testbed for Machine Intelligence Research By Darse Billings, Dennis Papp, Jonathan Schaeffer, Duane Szafron Presented By:- Debraj Manna Gada.
Probability Evaluation 11/12 th Grade Statistics Fair Games Random Number Generator Probable Outcomes Resources Why Fair Games? Probable Outcome Examples.
Today’s Topics Playing Deterministic (no Dice, etc) Games –Mini-max –  -  pruning –ML and games? 1997: Computer Chess Player (IBM’s Deep Blue) Beat Human.
Software Development. Software Development Loop Design  Programmers need a solid foundation before they start coding anything  Understand the task.
Machine Learning in Practice Lecture 5 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
GamblingGambling What are the odds? Jessica Judd.
ARTIFICIAL INTELLIGENCE (CS 461D) Princess Nora University Faculty of Computer & Information Systems.
Beyond Chunking: Learning in Soar March 22, 2003 John E. Laird Shelley Nason, Andrew Nuxoll and a cast of many others University of Michigan.
Sampling Design and Analysis MTH 494 Lecture-21 Ossam Chohan Assistant Professor CIIT Abbottabad.
Goals Your organization has three metrics to measure mission effectiveness Count your successes in each category for each roll 1.Total of exactly thirteen.
Copyright © Cengage Learning. All rights reserved. 5 Joint Probability Distributions and Random Samples.
More on Logic Today we look at the for loop and then put all of this together to look at some more complex forms of logic that a program will need The.
Programming Games of Chance
Competence-Preserving Retention of Learned Knowledge in Soar’s Working and Procedural Memories Nate Derbinsky, John E. Laird University of Michigan.
The expected value The value of a variable one would “expect” to get. It is also called the (mathematical) expectation, or the mean.
The normal approximation for probability histograms.
Artificial Intelligence in Game Design Board Games and the MinMax Algorithm.
1 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Chapter 9 Understanding Randomness.
Understanding AI of 2 Player Games. Motivation Not much experience in AI (first AI project) and no specific interests/passion that I wanted to explore.
© 2009, Tom McKendree The Systems Engineering Game (Basic Game) 1.0 Introduction This is a game about the systems engineering process, as represented by.
Search: Games & Adversarial Search Artificial Intelligence CMSC January 28, 2003.
Dice Game Introduction
Software Development.
Key Rule: Each Number has only one valid combination of Prime Factors.
Student Activity 1: Fair trials with two dice
Probability of casino games
The Fair Unfair Polarization
John E. Laird 32nd Soar Workshop
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
Truth tables: Ways to organize results of Boolean expressions.
Presentation transcript:

Playing Dice with Soar Soar Workshop June 2011 John E. Laird Nate Derbinsky, Miller Tinkerhess University of Michigan 1

Supporting Software Freedice.net [Nate Derbinsky] – Supports correspondence dice games for humans and bots through web interface – Used for original development Dice World [Miller Tinkerhess] – Java based dice game – Many times faster than Freedice – Used for generating results Both give same information available to all players QnA [Nate Derbinsky] – Allows easy attachment of external software device where there is an immediate response. – Similar to I/O but with single access for input & output Dice Probability Calculator [Jon Voigt & Nate Derbinsky] – Computes simple probabilities: how likely is it for there to be 6 dice of the same face if there are a total of 20 dice. Page 2

Dice Game Rules Roll dice in cup – Everyone starts with five dice Actions – Bid: number of dice of a given rank: six twos – Bid and reroll: push out a subset of dice and reroll – Exact: exactly the number of dice bid in play – Challenge: < number of dice bid in play – Pass: all dice are the same (must have >= 2 dice). Lose die if lose a challenge Last one with remaining dice wins Page 3

Key Strategic Considerations 1.If you raise too high, you might be challenged. 2.If you don’t raise high enough, it will come around again for another turn. 3.Can only lose a die if you challenged or are challenged. 4.Pushing dice increases information for other players and reduces your flexibility. Page 4

Research Issues in Dice Game Multi-faceted uncertainty – What dice are under other cups? – What did that bid indicate? – What will the next player do after my bid? – How combine different sources of uncertainty? Non-trivial cognitive challenge for humans – Actions take seconds to minutes Human performance correlated with experience – A fair amount to learn beyond the rules – Potential for opponent modeling – Potential for learning Page 5

Decision making under multi-faceted uncertainty in Soar? Agent 1: Expected Values – Compute expected value for the bid based on known and unknown dice. If there are 6 unknown dice, and I have 2 2’s, then there are most likely 6/6 = = 3 2’s. Compare bids to that expected value and classify as certain, safe, risky, very risky. Similar to what we think humans do. Agent 2: Probability Values – Compute probability of bids being successful. Uses external software calculator. Additional heuristics contribute to final bid: – Don’t pass or challenge if have another good bid. – Don’t bid and push if bid alone is a good bid. –…–… Page 6

Overall Approach 1.Propose operators for all legal actions – Raises, raises with pushes, exact, pass, challenge 2.Tie impasse arises between operators 3.Evaluate all of the operators in selection space Create preferences based on evaluations 1.symbolic evaluations → symbolic preferences 2.probability-based evaluations → numeric preferences 4.Decision procedure picks the best operator. Page 7

Dice Problem Space Operators Raise bid – Next legal bid given last bid for each die rank (1-6) – Special processing to determine first bid and lowest reasonable bid Raise, push, and roll – Next legal bid given last bid for each die rank where there is a possible push – Only consider pushing all relevant dice (no Valerie Bids) Challenge – Previous bid, or two back if previous is a pass Exact (if available) Pass A few more operators for initialization and counting up available dice on each turn. Total dice showing for each rank Total dice under other players’ cups Totals for my dice under my cup 8

One-step Look-ahead Using Selection Problem Space Page 9 (on A Table) (on B Table) (on C A) AB C move(C, B) move(B, C) move(C, Table) Evaluate(move(C, Table)) (on A Table) (on B Table) (on C A) move(C, Table) Tie Impasse (on A Table) (on B Table) (on C Table) Evaluation = 1 Evaluate(move(C, B)) Evaluate(move(B,C)) copy Evaluation = 0 Evaluation = 1 Prefer move(C, Table) AB C A B C Goal

Replace Look-Ahead Evaluation with Expected-Value Calculation (Agent 1) Page 10 compute-bid- difference Tie Impasse Evaluation = risky Evaluate(challenge) Evaluate(pass) Evaluation = safeEvaluation = very risky Evaluation = risky Challenge > Bid Challenge > Pass Bid > Pass Bid: 6[4] Challenge Pass Evaluate(bid: 6[4]) compute-bid- likelihood Possible Evaluations: lose, very risky, risky, safe, certain [Exposed 4’s] + [in my cup 4’s] + [unknown]/6 – [# of dice bid: 6] = Bid difference Conversion of evaluations to preferences is done by a bunch of rules with many special cases.

Approach #1 Operators for Evaluation Simple Bid (Raises) – Compute-bid-difference [-N to +M] – Evaluate-bid-likelihood [lose, very risky, risky, safe, certain] Bid, Push, and Reroll – Compute-bid-push-difference – Evaluate-bid-likelihood Challenge – Compute-challenge-bid-difference – Evaluate-challenge-bid-likelihood – Compute-challenge-pass-likelihood Exact – Compute-exact-difference – Evaluate-bid-likelihood Pass – Compute-pass-likelihood 11

Replace Look-Ahead Evaluation with Probability Calculation (Agent 2) Page 12 compute-bid- probability [using QnA] Tie Impasse Evaluation =.3 Evaluate(challenge) Evaluate(pass) Evaluation =.9 Evaluation = very risky.08 Evaluation =.3 Challenge =.9 Bid: 6[4] =.3 Pass =.08 Bid: 6[4] Challenge Pass Evaluate(bid: 6[4]).3 Conversion of evaluations to preferences is done by a bunch of rules with many special cases.

Approach #2 Evaluation Operators Simple Bids (Raises) – Compute-bid-probability Bid, Push, and Reroll – Compute-bid-push-probability Challenge – Compute-challenge-probability – Compute-challenge-pass-likelihood Exact – Compute-exact-probability Pass – Compute-pass-likelihood 13

Observations Plays a good game! – Doesn’t make stupid bids and is a bit unpredictable – Tends to be conservative on bids – Tends to be aggressive on challenges – Has beaten human “experts” Bluffs when it doesn’t have a good bid. – But doesn’t explicitly decide to bluff Does not always take the safest bid – Sometimes from randomness – Sometimes because of selection knowledge Don’t take certain pass when have another safe bid Page 14

Model Previous Player Agent tends to challenge too often. A player usually makes a specific bid for a reason! Approach: Add selection space operator that computes likely dice under previous player’s cup based on most recent bid. – Selected only if no certain bid using known dice. Analysis of previous player’s bid Iterate through possible number of dice bid starting with 1 until 1.find a number that would make the bid reasonable or 2.reach a number that is very unlikely Use result to recalculate expected value/probabilities of possible bids. Page 15

Results for 1000 two player games Page 16 No Model – ProbabilityNo Model – Expectation Model – Probability761/239703/297 Model – Expectation681/319607/393 Expectation – No ModelExpectation – Model Probability – No Model451/549319/681 Probability – Model703/297571/429 Player against self is ~480/520 Without model, expectation-based is better. With model, probability-based is better. Model is more important for the probability-based agent.

Three & Four Player Games with Models Page 17 Expectation414 Expectation289 Probability297 Expectation384 Probability284 Probability332 Probability346 Probability331 Probability323 Expectation246 Probability231 Expectation267 Probability256 Expectation381 Expectation230 Probability146 Probability243

Future Work: More Opponent Modeling Extend evaluation calculations – Worst case analysis for next player for my best bids “No matter what he has, he won’t challenge that bid.” Allows agent to bluff more. – Possibly biased by next player’s last bid. Episodic memory: – History of how next player responded to similar bids. – History of what player had when made similar bid. Model other players to have better estimates of hidden dice. Page 18

Future Work: Chunking and RL Use chunking to learn (lots of) RL rules. – Hard to write these by hand – too many cases – Chunking works with the current agents to learn rules that test features relevant to created preference. – Numeric preferences are initial values for RL. Use RL to “tune” rules learned by chunking. – Need lots of experience. Interesting proposal for the source of RL rules. Initial results are not promising… Page 19

Future Work: More Baselines and Experiments Pure probability and expectation-based without heuristics. RL agents with different value functions. – Fair number of state-action pairs: – Number of possible prior bids (>200) * possible configurations of dice (10 6 ) * number of next bids (~15) = 3 Billion Human players (from the web?). Create (large) set of cases that can be used for direct comparison: – Find states where different agents make different bids. Page 20

Nuggets and Coal Nuggets: – Two different approaches for reasoning with probabilities in Soar that fits in “naturally”. – Example of using opponent model. – Leads to a competent dice player. Coal – Don’t understand strengths and weaknesses. – Has promise for generating and using RL rules but haven’t achieved it. Page 21