Slide 1
ChooseMove returns the move 16→19; the resulting board scores V = 30 points.
Slide 2
LegalMoves = 16→19, or 11→15, or …; SimulateMove applied to 16→19 gives the resulting board.
Slide 3
For all legal moves:
- simulate that move
- check the value of the result
Pick the best move.
e.g. LegalMoves = 12→16, or 11→15, or …; SimulateMove(12→16) gives a board with V = 30 points.
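As a Python sketch of this move-selection loop (the helper names legal_moves, simulate and the scoring function v are hypothetical; the slides don't name them):

```python
def choose_move(board, legal_moves, simulate, v):
    """Return the legal move whose simulated result scores highest under v.

    legal_moves(board) -> iterable of moves, e.g. [(12, 16), (11, 15), ...]
    simulate(board, move) -> the board that results from making the move
    v(board) -> numeric score of a board (30 points in the slide's example)
    """
    return max(legal_moves(board), key=lambda m: v(simulate(board, m)))
```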
Slide 4
We'll make our computer program learn the function V: when it sees a board, it will assign a score to that board. It has to learn how to assign the correct score to a board. First, we have to define how our function V is supposed to behave…
Slide 5
…but what about boards in the middle of a game? From a mid-game board b, a chain of successor(b) boards …. leads to a final board b' with V(b') = 100.
Slide 6
…but what about boards in the middle of a game? If the chain of successors from b ends in a final board b' with V(b') = 100, then we define V(b) = V(b') = 100.
Slide 7
From b, one chain of successors may end in a final board with V(b') = 100, giving V(b) = 100; another may end in one with V(b') = -100. NB: if this board state occurs in 50% wins and 50% losses, then V(b) will eventually be 0.
Slide 8
Same diagram: if this board state occurs in 75% wins and 25% losses, then V(b) will eventually be 50.
Slide 9
So each board's score will eventually be a mix of the number of games that you can win, lose or draw from that board (assuming an ideal scoring function… in fact we often only approximate this).
Slide 10
…but what about boards in the middle of a game? V(b) = V(b') for the final board b' reached from b: that's our definition of V. This is how our computer program's V should learn to behave…
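Collecting slides 5-10 into one definition (this deck follows Tom Mitchell's classic checkers example, so the piecewise form below is written in his style; the 100 / -100 / 0 values are the ones in the diagrams):

```latex
V(b) =
\begin{cases}
100   & \text{if } b \text{ is a final board state that is a win} \\
-100  & \text{if } b \text{ is a final board state that is a loss} \\
0     & \text{if } b \text{ is a final board state that is a draw} \\
V(b') & \text{otherwise, where } b' \text{ is the best final board state} \\
      & \text{reachable from } b \text{ playing optimally to the end of the game}
\end{cases}
```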
Slide 11
But in the middle of a game, how can you calculate the path to the final board?
- Generate all possibilities? Feasible for tic-tac-toe, but for chess it takes far too long!
- Instead, approximate V and call it V' ("V hat"): V'(b) ≈ V(b). So we'll learn an approximation to V.
Slide 12
Now we'll choose our representation of V' (what is a board, anyway…?)
1) A big table? Input: board; output: score (one row per possible board, e.g. scores of 50, 90, …).
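As a literal Python sketch, option 1 is just a dictionary from boards to scores; checkers has on the order of 10^20 reachable positions, which is why this representation is a non-starter:

```python
# Representation 1: an explicit lookup table from board to score.
# Checkers has on the order of 10^20 reachable positions, so this
# table could never be filled in or stored in practice.
score_table = {
    "board_state_1": 50,   # illustrative keys only; a real key would
    "board_state_2": 90,   # encode the full piece layout of the board
}
```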
Slide 13
Now we'll choose our representation of V' (what is a board, anyway…?)
2) CLIPS rules?
IF my piece is near a side edge THEN score = 80
IF my piece is near the opponent's edge THEN score = 90
….
Slide 14
Now we'll choose our representation of V' (what is a board, anyway…?)
3) A polynomial function? e.g. we define some variables:
x1 = number of white pieces on the board (here, 12)
x2 = number of red pieces (11)
x3 = number of white kings (0)
x4 = number of red kings (0)
x5 = number of white pieces threatened by red, i.e. can be captured on red's next turn (1)
x6 = number of red pieces threatened by white (0)
Slide 15
3) A polynomial function? Arrange the variables x1…x6 as a linear combination:
V'(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6
Slide 16
The computer program will change the values of the weights w0…w6: it will learn what the weights should be to give the correct score for each board.
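A minimal Python sketch of this representation; the names (v_hat, the weight list w) are my own, and a real implementation would compute x1…x6 from an actual board object:

```python
def v_hat(x, w):
    """Linear evaluation function: V'(b) = w0 + w1*x1 + ... + w6*x6.

    x -- feature vector [x1, ..., x6] for a board (piece counts,
         king counts, threatened-piece counts, as on slide 14)
    w -- weights [w0, w1, ..., w6], the values the program will learn
    """
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

# The example board from slide 14:
x_example = [12, 11, 0, 0, 1, 0]
```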
Slide 17
Learning process: playing and learning at the same time. The computer will play against itself. Initialise V' with random weights (w0 = 23, etc.), so initially e.g. V'(b) = 23 + 0.5·x1 + …
Slide 18
Start a new game. For each board:
(a) calculate V' on all possible legal moves.
For example, with LegalMoves = 12→16, or 11→15, or …, one of the boards b resulting from a legal move (say 12→16) might score V'(b) = 23 + 0.5·x1 + … = 30 points.
Slide 19
First time round, V' is essentially random (because we set the weights randomly); as it learns, V' should pick better successors.
Slide 20
(b) pick the successor with the highest score: red plays 12→16, moving from board 1 to board 2.
(c) evaluate the error.
At this point, board 1 = b and board 2 = successor(b).
Slide 21
(d) modify each weight to correct the error.
Slide 22
The weight update in step (d) is:
w_i ← w_i + η · (V_train(b) − V'(b)) · x_i
Here (V_train(b) − V'(b)) is the error, w_i is the weight to update, η is the learning rate, and the factor x_i modifies the weight in proportion to the size of the attribute value.
Slide 23
At this point the change probably won't be in a useful direction, because the weights are all still random.
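The update rule from slide 22 as a Python sketch, reusing v_hat from above (the name update_weights and the value eta = 0.1 are mine; the slides don't give a learning rate):

```python
def update_weights(w, x, v_train, eta=0.1):
    """One LMS step: w_i <- w_i + eta * (V_train(b) - V'(b)) * x_i."""
    error = v_train - v_hat(x, w)       # how far off the current estimate is
    w[0] += eta * error                 # constant term (its "x0" is 1)
    for i, xi in enumerate(x):
        w[i + 1] += eta * error * xi    # in proportion to the attribute value
    return w
```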
Slide 24
(e) repeat until end-of-game: board 1 → board 2 → … → red wins.
Slide 25
When the game ends, b is a final board state, so V_train(b) = 100 …because we won.
Slide 26
Now we're shifting our V' towards something useful (a little bit).
Slide 27
A million games later, our V' might now predict a useful score for any board that we might see.
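Putting slides 17-27 together, here is a sketch of the whole self-play loop. One detail the slides leave implicit, which comes from Mitchell's formulation of this example: for a mid-game board b, the training value is the learner's own estimate of the successor, V_train(b) ← V'(successor(b)); only the final board is trained against the true outcome (100, -100 or 0). The helpers legal_moves, simulate, initial_board, outcome and features are all hypothetical stand-ins for a real checkers engine:

```python
import random

def train(num_games, legal_moves, simulate, initial_board, outcome,
          features, eta=0.1):
    """Self-play training of V' (slides 17-27).

    outcome(board) -> 100 / -100 / 0 if board is final, else None.
    features(board) -> [x1, ..., x6] as defined on slide 14.
    Returns the learned weight vector [w0, ..., w6].
    """
    w = [random.uniform(-10, 10) for _ in range(7)]  # random init (slide 17)
    for _ in range(num_games):                       # "a million games later..."
        board = initial_board()
        history = []                                 # features of boards played
        while outcome(board) is None:
            # (a) score every legal successor, (b) pick the highest
            board = max((simulate(board, m) for m in legal_moves(board)),
                        key=lambda b: v_hat(features(b), w))
            history.append(features(board))
        # (c)+(d): each board trains towards V' of the board that followed it;
        # the final board trains towards the true game outcome (slide 25).
        for x, x_next in zip(history, history[1:]):
            w = update_weights(w, x, v_hat(x_next, w), eta)
        w = update_weights(w, history[-1], outcome(board), eta)
    return w
```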