Well Posed Learning Problems

Must identify the following three features:
– Learning task: the thing you want to learn.
– Performance measure: you must be able to tell when the learner does well and when it does badly. Often called the critic.
– Training experience: you must know how you will obtain the data used to train the system.
Checkers

– T: play checkers
– P: percentage of games won
– What experience?
– What exactly should be learned?
– How should it be represented?
– What specific algorithm should be used to learn it?
Type of Training Experience

Direct or indirect?
– Direct: board state → correct move
– Indirect: outcome of a complete game
– Indirect experience raises the credit assignment problem: which moves in the sequence deserve credit or blame for the final result?

Teacher or not?
– Teacher selects board states
– Learner can select board states
– Randomly selected board states?

Is the training experience representative of the performance goal?
– Training: playing against itself
– Performance: evaluated playing against the world champion
Choose Target Function

ChooseMove : B → M (board state → move)
– Maps a legal board state to a legal move.

Evaluate : B → V (board state → board value)
– Assigns a numerical score to any given board state, such that better board states obtain a higher score.
– Select the best move by evaluating all successor states of legal moves and picking the one with the maximal score.
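A minimal sketch of this move-selection rule. Here legal_moves(board) and result(board, move) are hypothetical helpers (they are not part of any particular checkers library), and V is the evaluation function:

```python
# Pick the legal move whose successor board scores highest under V.
# legal_moves() and result() are hypothetical helpers.

def choose_move(board, V, legal_moves, result):
    return max(legal_moves(board), key=lambda m: V(result(board, m)))
```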
Possible Definition of Target Function

– If b is a final board state that is won, then V(b) = 100
– If b is a final board state that is lost, then V(b) = -100
– If b is a final board state that is drawn, then V(b) = 0
– If b is not a final board state, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game.

Gives correct values but is not operational: computing V(b) this way requires searching every line of play to the end of the game.
State Space Search

From board state b, the legal moves m_1 : b → b_1, m_2 : b → b_2, m_3 : b → b_3 lead to the successor states b_1, b_2, b_3. On our turn we pick the best one:

V(b) = max_i V(b_i)
State Space Search

From b_1 it is the opponent's turn; the moves m_4 : b_1 → b_4, m_5 : b_1 → b_5, m_6 : b_1 → b_6 lead to b_4, b_5, b_6, and the opponent will pick the successor that is worst for us:

V(b_1) = min_i V(b_i)
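Taken together, these two slides describe minimax backup. A minimal depth-limited sketch, using the same hypothetical legal_moves and result helpers as the move-selection sketch above:

```python
# Depth-limited minimax. legal_moves(board) and result(board, move) are
# hypothetical helpers; V scores leaf boards.

def minimax_value(board, depth, V, legal_moves, result, maximizing=True):
    """Back up V through the game tree: max on our turns, min on the opponent's."""
    moves = legal_moves(board)
    if depth == 0 or not moves:  # leaf or game over: use the evaluation function
        return V(board)
    child_values = [
        minimax_value(result(board, m), depth - 1, V, legal_moves, result,
                      not maximizing)
        for m in moves
    ]
    return max(child_values) if maximizing else min(child_values)
```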
Final Board States

– Black wins: V(b) = -100
– Red wins: V(b) = 100
– Draw: V(b) = 0
Number of Board States

Tic-Tac-Toe:
#board states < 9!/(5!·4!) + 9!/(1!·4!·4!) + … + 9!/(2!·4!·3!) + … + 9 = 6045

4x4 checkers (2 pieces each): #board states = ?
#board states < 8·7·6·5·2²/(2!·2!) = 1680

Regular checkers (8x8 board, 8 pieces each):
#board states < 32!·2¹⁶/(8!·8!·16!) = 5.07·10¹⁷
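The checkers bounds are easy to verify directly; a quick sketch using only the standard library:

```python
from math import factorial as f

# Upper bounds from the slide, computed exactly as written.
checkers_4x4 = 8 * 7 * 6 * 5 * 2**2 // (f(2) * f(2))
checkers_8x8 = f(32) * 2**16 // (f(8) * f(8) * f(16))

print(checkers_4x4)           # 1680
print(f"{checkers_8x8:.2e}")  # 5.07e+17
```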
Choose Representation of Target Function

– Table look-up
– Collection of rules
– Neural network
– Polynomial function of board features

Trade-offs in choosing an expressive representation:
– Approximation accuracy
– Number of training examples needed to learn the target function
Representation of Target Function

V(b) = w_0 + w_1·bp(b) + w_2·rp(b) + w_3·bk(b) + w_4·rk(b) + w_5·bt(b) + w_6·rt(b)

– bp(b): number of black pieces
– rp(b): number of red pieces
– bk(b): number of black kings
– rk(b): number of red kings
– bt(b): number of red pieces threatened by black
– rt(b): number of black pieces threatened by red
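A sketch of this linear form. The features(board) helper is hypothetical; it is assumed to return the six counts (bp, rp, bk, rk, bt, rt) in order:

```python
# Linear evaluation V(b) = w_0 + sum_i w_i * f_i(b).
# features(board) is a hypothetical helper returning (bp, rp, bk, rk, bt, rt).

def evaluate(board, w, features):
    vals = features(board)
    return w[0] + sum(wi * fi for wi, fi in zip(w[1:], vals))
```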
Obtaining Training Examples

– V(b): true target function
– V'(b): learned target function
– V_train(b): training value (estimate)

Rule for estimating training values:

V_train(b) ← V'(Successor(b))

(Successor(b) is the next board state after b at which it is again the program's turn to move.)
Choose Weight Training Rule

LMS weight update rule. Select a training example b at random, then:
1. Compute the error: error(b) = V_train(b) - V'(b)
2. For each board feature f_i, update the weight: w_i ← w_i + η · f_i · error(b)

η is a small learning rate; try η = 0.1.
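A minimal sketch of this update, assuming the board features are passed in as a tuple (e.g. (rp(b), bp(b)) for the 4x4 example that follows):

```python
# One LMS step. The constant bias feature f_0 = 1 is prepended so that
# w[0] is updated like every other weight.

def lms_update(w, features_b, v_train, eta=0.1):
    """w_i <- w_i + eta * f_i * error(b), where error(b) = V_train(b) - V'(b)."""
    f = (1.0,) + tuple(features_b)
    error = v_train - sum(wi * fi for wi, fi in zip(w, f))  # V_train(b) - V'(b)
    return [wi + eta * fi * error for wi, fi in zip(w, f)]
```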
Example: 4x4 checkers

V(b) = w_0 + w_1·rp(b) + w_2·bp(b)

Initial weights: w_0 = -10, w_1 = 75, w_2 = -60

V(b_0) = w_0 + w_1·2 + w_2·2 = -10 + 150 - 120 = 20

The legal moves m_1 : b_0 → b_1, m_2 : b_0 → b_2, m_3 : b_0 → b_3 all lead to states with V(b_1) = V(b_2) = V(b_3) = 20.
Example: 4x4 checkers

With V(b_0) = 20 and V(b_1) = 20:

1. Compute the error: error(b_0) = V_train(b) - V(b_0) = V(b_1) - V(b_0) = 0
2. For each board feature f_i, update the weight: w_i ← w_i + η · f_i · error(b)
– w_0 ← w_0 + 0.1 · 1 · 0
– w_1 ← w_1 + 0.1 · 2 · 0
– w_2 ← w_2 + 0.1 · 2 · 0

The error is zero, so all weights stay the same.
Example: 4x4 checkers

[Board diagrams: V(b_0) = V(b_1) = V(b_2) = V(b_3) = 20]
Example: 4x4 checkers

[Board diagrams: from b_3 with V(b_3) = 20, two possible successors score V(b_4a) = 20 and V(b_4b) = -55]
Example: 4x4 checkers

With V(b_3) = 20 and V(b_4) = -55:

1. Compute the error: error(b_3) = V_train(b) - V(b_3) = V(b_4) - V(b_3) = -75
2. For each board feature f_i, update the weight (starting from w_0 = -10, w_1 = 75, w_2 = -60): w_i ← w_i + η · f_i · error(b)
– w_0 ← w_0 - 0.1 · 1 · 75, giving w_0 = -17.5
– w_1 ← w_1 - 0.1 · 2 · 75, giving w_1 = 60
– w_2 ← w_2 - 0.1 · 2 · 75, giving w_2 = -75
Example: 4x4 checkers

[Board diagrams: re-evaluating with the updated weights w_0 = -17.5, w_1 = 60, w_2 = -75 gives V(b_4) = -107.5 and V(b_5) = -107.5]
Example: 4x4 checkers

With V(b_5) = -107.5 and V(b_6) = -167.5:

error(b_5) = V_train(b) - V(b_5) = V(b_6) - V(b_5) = -60

For each board feature f_i, update the weight (starting from w_0 = -17.5, w_1 = 60, w_2 = -75): w_i ← w_i + η · f_i · error(b)
– w_0 ← w_0 - 0.1 · 1 · 60, giving w_0 = -23.5
– w_1 ← w_1 - 0.1 · 1 · 60, giving w_1 = 54
– w_2 ← w_2 - 0.1 · 2 · 60, giving w_2 = -87
Example: 4x4 checkers

The game ends: the final board state b_6 is a win for black, so V_f(b_6) = -100. Under the current weights, V(b_6) = -197.5.

error(b_6) = V_train(b) - V(b_6) = V_f(b_6) - V(b_6) = 97.5

For each board feature f_i, update the weight (starting from w_0 = -23.5, w_1 = 54, w_2 = -87): w_i ← w_i + η · f_i · error(b)
– w_0 ← w_0 + 0.1 · 1 · 97.5, giving w_0 = -13.75
– w_1 ← w_1 + 0.1 · 0 · 97.5, giving w_1 = 54
– w_2 ← w_2 + 0.1 · 2 · 97.5, giving w_2 = -67.5
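As a sanity check, the whole weight trajectory can be replayed with the lms_update sketch from the training-rule slide; the (rp, bp) feature tuples below are read off the example boards:

```python
# Replay of the example's updates, reusing lms_update defined earlier.
w = [-10.0, 75.0, -60.0]                   # initial weights w_0, w_1, w_2
w = lms_update(w, (2, 2), v_train=20.0)    # b_0: error 0, weights unchanged
w = lms_update(w, (2, 2), v_train=-55.0)   # b_3: error -75 -> [-17.5, 60.0, -75.0]
w = lms_update(w, (1, 2), v_train=-167.5)  # b_5: error -60 -> [-23.5, 54.0, -87.0]
w = lms_update(w, (0, 2), v_train=-100.0)  # b_6: error 97.5 -> [-13.75, 54.0, -67.5]
print(w)                                   # ≈ [-13.75, 54.0, -67.5]
```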
Questions

We have constrained the evaluation function to a linear sum of features.
– Can the true evaluation function be represented by these features in this way?
– Even if so, will the learning technique be able to learn the correct weights? Practically?
– If it can't, how bad can it get? How bad is it likely to get?
Optical Character Recognition

– Task?
– Performance measure?
– Training experience?
– Type of knowledge to be learned?
– Target function and its representation?
– How to get training data?
– Learning algorithm?