Presentation transcript:

ChooseMove = 16→19, V = 30 points

LegalMoves = 16→19, or 11→15, or …; SimulateMove = 16→19

For all legal moves:
- simulate that move
- check the value of the result
Then pick the best move.
LegalMoves = 12→16, or 11→15, or …; SimulateMove = 12→16 → V = 30 points
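As a rough illustration, here is a minimal Python sketch of that move-selection loop. The helper names legal_moves, simulate and v are invented for this example and are not from the slides:

```python
# Minimal sketch of "simulate every legal move, score the result, pick the best".
# legal_moves, simulate and v are hypothetical stand-ins, not from the slides.

def choose_move(board, legal_moves, simulate, v):
    """Return the legal move whose simulated successor board scores highest under v."""
    best_move, best_score = None, float("-inf")
    for move in legal_moves(board):
        successor = simulate(board, move)   # board after making this move
        score = v(successor)                # value the scoring function assigns to it
        if score > best_score:
            best_move, best_score = move, score
    return best_move

# Toy usage: boards are plain numbers, moves add to them, v is the identity.
if __name__ == "__main__":
    print(choose_move(0,
                      legal_moves=lambda b: [1, 5, 3],
                      simulate=lambda b, m: b + m,
                      v=lambda b: b))   # picks the move 5
```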

We'll make our computer program learn the function V:
- when it sees a board, it will assign a score to the board
- it has to learn how to assign the correct score to a board
First, we have to define how our function V is supposed to behave…

…but what about boards in the middle of a game?
b → successor(b) → … → final board b′ with V(b′) = 100

…but what about boards in the middle of a game?
b → successor(b) → … → final board b′ with V(b′) = 100
so define V(b) = V(b′) = 100

b → successor(b) → … → final board b′
- if b′ is a win: V(b′) = 100, so V(b) = V(b′) = 100
- if b′ is a loss: V(b′) = −100, so V(b) = V(b′) = −100
- if b′ is a draw: V(b) = V(b′) = 0
NB: if this board state occurs in 50% wins and 50% losses, then V(b) will eventually be 0.

b → successor(b) → … → final board b′
- if b′ is a win: V(b′) = 100, so V(b) = V(b′) = 100
- if b′ is a loss: V(b′) = −100, so V(b) = V(b′) = −100
- if b′ is a draw: V(b) = V(b′) = 0
NB: if this board state occurs in 75% wins and 25% losses, then V(b) will eventually be 50 (0.75·100 + 0.25·(−100) = 50).

b → successor(b) → … → final board b′, with V(b′) = 100, −100 or 0 and V(b) = V(b′)
So each board's score will eventually reflect the mix of games that you can win, lose or draw from that board (assuming an ideal scoring function… in fact we often only approximate it).

…but what about boards in the middle of a game?
b → successor(b) → … → final board b′ with V(b′) = 100, so V(b) = V(b′) = 100
That's our definition of V – this is how our computer program's V should learn to behave…
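To make the definition concrete, here is a conceptual Python sketch of this target V. The helpers is_final, outcome and successors are hypothetical stand-ins for a real game engine, and the optimal-play assumption is spelled out as alternating max/min; exhaustively expanding every game like this is exactly what the next slide points out is too slow:

```python
# Conceptual sketch of the target function V from the slides: a final board is
# worth 100 (win), -100 (loss) or 0 (draw); any other board is worth the value
# of the best final board reachable from it, assuming both sides play optimally.
# is_final, outcome and successors are hypothetical stand-ins for a real game.

def true_v(board, is_final, outcome, successors, our_turn=True):
    if is_final(board):
        return outcome(board)                     # +100 win, -100 loss, 0 draw
    values = [true_v(s, is_final, outcome, successors, not our_turn)
              for s in successors(board)]
    # On our turn we take the best successor; on the opponent's turn, optimal
    # play means they take the one that is worst for us.
    return max(values) if our_turn else min(values)
```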

But in the middle of a game, how can you calculate the path to the final board?
- Generate all possibilities? Feasible for tic-tac-toe, but for checkers or chess it takes far too long!
- So we approximate V and call the approximation V' (V̂, "V hat"): V'(b) ≈ V(b) …we'll learn an approximation to V.

Now we'll choose our representation of V' (what is a board anyway…?)
1) A big table? One row per board: input = board, output = score …

Now we'll choose our representation of V' (what is a board anyway…?)
2) CLIPS rules?
IF my piece is near a side edge THEN score = 80
IF my piece is near the opponent's edge THEN score = 90
…

Now we'll choose our representation of V' (what is a board anyway…?)
3) A polynomial function? e.g. we define some variables (features of the board):
x1 = number of white pieces on the board (e.g. x1 = 12)
x2 = number of red pieces (e.g. x2 = 11)
x3 = number of white kings (e.g. x3 = 0)
x4 = number of red kings (e.g. x4 = 0)
x5 = number of white pieces threatened by red, i.e. can be captured on red's next turn (e.g. x5 = 1)
x6 = number of red pieces threatened by white (e.g. x6 = 0)
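A small Python sketch of computing these six features. The board encoding (a list of piece records) and the is_threatened helper are assumptions made up for illustration, not part of the slides:

```python
# Sketch of extracting the six board features x1..x6 described above.
# The board encoding and is_threatened are invented for this example.

def board_features(pieces, is_threatened):
    """pieces: list of dicts like {"colour": "white", "king": False, ...}."""
    x1 = sum(1 for p in pieces if p["colour"] == "white")
    x2 = sum(1 for p in pieces if p["colour"] == "red")
    x3 = sum(1 for p in pieces if p["colour"] == "white" and p["king"])
    x4 = sum(1 for p in pieces if p["colour"] == "red" and p["king"])
    x5 = sum(1 for p in pieces if p["colour"] == "white" and is_threatened(p, pieces))
    x6 = sum(1 for p in pieces if p["colour"] == "red" and is_threatened(p, pieces))
    return [x1, x2, x3, x4, x5, x6]
```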

Now we'll choose our representation of V' (what is a board anyway…?)
3) A polynomial function? Take the features x1 … x6 defined above (piece counts, kings, threatened pieces) and arrange them as a linear combination:
V'(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6

Now we'll choose our representation of V' (what is a board anyway…?)
3) A polynomial function, arranged as a linear combination of the features:
V'(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6
The computer program will change the values of the weights – it will learn what the weights should be in order to give the correct score for each board.
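A minimal sketch of this linear scoring function in Python. The weight values are made up; the feature values are the ones from the earlier example slide:

```python
# The linear scoring function from the slide: V'(b) = w0 + w1*x1 + ... + w6*x6.
# Weights and features are plain lists; extracting features from a board
# (e.g. with board_features above) is assumed to have happened already.

def v_hat(weights, features):
    """weights = [w0, w1, ..., w6]; features = [x1, ..., x6]."""
    w0, ws = weights[0], weights[1:]
    return w0 + sum(w * x for w, x in zip(ws, features))

# Example with made-up weights and the feature values from the earlier slide.
print(v_hat([23, 5, -5, 10, -10, -3, 3], [12, 11, 0, 0, 1, 0]))  # 23 + 60 - 55 - 3 = 25
```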

Learning process – playing and learning at the same time:
- the computer will play against itself
- initialise V' with random weights (w0 = 23 etc.), so initially V'(b) = 23 + w1·x1 + …

Learning process – playing and learning at the same time:
- the computer will play against itself
- initialise V' with random weights (w0 = 23 etc.)
- start a new game; for each board:
  (a) calculate V' on all possible legal moves
For example, from board 1 (= b) the legal moves are 12→16, or 11→15, or …; one of the boards resulting from a legal move might be the one reached by 12→16, with V'(that board) = w0 + w1·x1 + … = 30 points.

Learning process – playing and learning at the same time:
- the computer will play against itself
- initialise V' with random weights (w0 = 23 etc.)
- start a new game; for each board:
  (a) calculate V' on all possible legal moves
For example, LegalMoves = 12→16, or 11→15, or …; SimulateMove = 12→16 → V' = 30 points.
The first time round, V' is essentially random (because we set the weights at random) – as it learns, V' should pick better successors.

Learning process – playing and learning at the same time:
- the computer will play against itself
- initialise V' with random weights (w0 = 23 etc.)
- start a new game; for each board:
  (a) calculate V' on all possible legal moves
  (b) pick the successor with the highest score
  (c) evaluate the error
board 1, red plays 12→16, giving board 2
At this point: board 1 = b, board 2 = successor(b).

Learning process – playing and learning at the same time:
- the computer will play against itself
- initialise V' with random weights (w0 = 23 etc.)
- start a new game; for each board:
  (a) calculate V' on all possible legal moves
  (b) pick the successor with the highest score
  (c) evaluate the error
  (d) modify each weight to correct the error
board 1, red plays 12→16, giving board 2

Learning process – playing and learning at the same time:
- the computer will play against itself
- initialise V' with random weights (w0 = 23 etc.)
- start a new game; for each board:
  (a) calculate V' on all possible legal moves
  (b) pick the successor with the highest score
  (c) evaluate the error
  (d) modify each weight to correct the error:
      wi ← wi + η · (Vtrain(b) − V'(b)) · xi
      (wi = the weight to update, η = the learning rate, Vtrain(b) − V'(b) = the error; multiplying by xi modifies each weight in proportion to the size of its attribute value)
board 1, red plays 12→16, giving board 2
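A sketch of one application of this weight-update rule in Python. The function and parameter names are invented for the example, and eta stands in for the learning rate η:

```python
# One step of the weight-update rule above: each weight moves a little in the
# direction that reduces the error, scaled by the learning rate and by the
# size of its attribute value. v_train_b is the training value for board b.

def update_weights(weights, features, v_train_b, eta=0.1):
    """weights = [w0, w1, ...]; features = [x1, x2, ...] for board b."""
    w0, ws = weights[0], weights[1:]
    v_hat_b = w0 + sum(w * x for w, x in zip(ws, features))
    error = v_train_b - v_hat_b
    new_w0 = w0 + eta * error            # w0's "attribute" is the constant 1
    new_ws = [w + eta * error * x for w, x in zip(ws, features)]
    return [new_w0] + new_ws
```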

Learning process – playing and learning at the same time:
- the computer will play against itself
- initialise V' with random weights (w0 = 23 etc.)
- start a new game; for each board:
  (a) calculate V' on all possible legal moves
  (b) pick the successor with the highest score
  (c) evaluate the error
  (d) modify each weight to correct the error: wi ← wi + η · (Vtrain(b) − V'(b)) · xi
At this point the change probably won't be in a useful direction, because the weights are all still random.
board 1, red plays 12→16, giving board 2

Learning process – playing and learning at the same time:
- the computer will play against itself
- initialise V' with random weights (w0 = 23 etc.)
- start a new game; for each board:
  (a) calculate V' on all possible legal moves
  (b) pick the successor with the highest score
  (c) evaluate the error
  (d) modify each weight to correct the error
  (e) repeat until end-of-game
board 1, red plays 12→16, giving board 2, … until red wins

Learning process – playing and learning at the same time:
- the computer will play against itself
- initialise V' with random weights (w0 = 23 etc.)
- start a new game; for each board:
  (a) calculate V' on all possible legal moves
  (b) pick the successor with the highest score
  (c) evaluate the error
  (d) modify each weight to correct the error
  (e) repeat until end-of-game
board 1, red plays 12→16, giving board 2, … until red wins
When b is a final board state, Vtrain(b) = 100 …because we won.
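The slides only state the training value for a final board. In Mitchell's version of this checkers example the training value of an intermediate board is taken from the program's own estimate of its successor, Vtrain(b) ← V'(successor(b)); the following sketch assumes that rule and uses invented names:

```python
# Sketch of assigning training values to the boards of one finished game.
# Final board: +100 for a win, -100 for a loss, 0 for a draw (as in the slides).
# Intermediate boards: Mitchell's rule V_train(b) = V'(successor(b)) is assumed
# here, since the slides only show the final-board case explicitly.

def training_values(boards, v_hat, final_result):
    """boards: board states at the program's own turns; final_result: +100/-100/0."""
    targets = []
    for i, b in enumerate(boards):
        if i == len(boards) - 1:
            targets.append(final_result)          # e.g. 100 because we won
        else:
            targets.append(v_hat(boards[i + 1]))  # estimate of the successor board
    return targets
```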

Learning process – playing and learning at the same time:
- the computer will play against itself
- initialise V' with random weights (w0 = 23 etc.)
- start a new game; for each board:
  (a) calculate V' on all possible legal moves
  (b) pick the successor with the highest score
  (c) evaluate the error
  (d) modify each weight to correct the error
  (e) repeat until end-of-game
board 1, red plays 12→16, giving board 2, … until red wins
Now we're shifting our V' towards something useful (a little bit).

Learning process – playing and learning at the same time:
- the computer will play against itself
- initialise V' with random weights (w0 = 23 etc.)
- start a new game; for each board:
  (a) calculate V' on all possible legal moves
  (b) pick the successor with the highest score
  (c) evaluate the error
  (d) modify each weight to correct the error
  (e) repeat until end-of-game
board 1, red plays 12→16, giving board 2, … until red wins
A million games later, our V' might now predict a useful score for any board that we might see.
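Finally, a toy end-to-end sketch of the whole self-play loop. A real checkers engine is replaced here by an invented stand-in (a "board" is just its feature vector, a "move" nudges one feature, and a hidden rule decides the winner), so only the structure of the learning loop mirrors the slides, not the game itself:

```python
import random

# Toy self-play loop: play a game greedily with V', then update the weights
# from that game's boards, exactly in the spirit of steps (a)-(e) above.
# All game-specific details below are made up purely for illustration.

ETA = 0.001  # learning rate

def v_hat(w, x):
    # V'(b) = w0 + w1*x1 + ... for a board represented directly by its features x.
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def update(w, x, target):
    # One LMS step: nudge every weight to reduce (target - V'(b)).
    error = target - v_hat(w, x)
    return [w[0] + ETA * error] + [wi + ETA * error * xi for wi, xi in zip(w[1:], x)]

def successors(x):
    # Fake "legal moves": each move bumps one feature up or down by 1.
    return [x[:i] + [x[i] + d] + x[i + 1:] for i in range(len(x)) for d in (-1, 1)]

def play_and_learn(w, n_moves=10):
    board = [random.randint(0, 3) for _ in range(6)]               # fake start board
    history = [board]
    for _ in range(n_moves):
        board = max(successors(board), key=lambda s: v_hat(w, s))  # greedy choice
        history.append(board)
    result = 100 if sum(board) > sum(history[0]) else -100         # hidden "win" rule
    targets = [v_hat(w, history[i + 1]) for i in range(len(history) - 1)] + [result]
    for x, target in zip(history, targets):                        # learn from the game
        w = update(w, x, target)
    return w

if __name__ == "__main__":
    weights = [random.uniform(-1, 1) for _ in range(7)]            # w0..w6, random start
    for _ in range(1000):
        weights = play_and_learn(weights)
    print("learned weights:", [round(wi, 2) for wi in weights])
```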