Published by Darlene Green. Modified over 8 years ago.
1
Supervised Learning: Introduction
2
What Is a Learning Problem?
Learning = improving with experience at some task
– Improve over task T,
– with respect to performance measure P,
– based on experience E.
Example: learn to play checkers
– T: play checkers
– P: % of games won in world tournament
– E: opportunity to play against self
3
Learning to Play Checkers
– T: play checkers
– P: percent of games won in world tournament
– What experience?
– What exactly should be learned?
– How shall it be represented?
– Is the training distribution the same as the testing distribution?
– What specific algorithm should learn it?
4
Choose the Target Function
ChooseMove: Board → Move
– Input: any board state from the set of legal board states
– Output: some move from the set of legal moves
Reduce the problem of improving performance P at task T to the problem of learning some target function such as ChooseMove.
5
Choose the Target Function
ChooseMove
– Straightforward to translate the task into this function
– Difficult to learn given indirect training experience
Function V
– V: Board → ℝ assigns a numerical score to each board state
– Easy to select the best move: pick the legal move whose resulting board state has the highest score
6
Possible Definition for Target Function V
– If b is a final board state that is won, then V(b) = 100
– If b is a final board state that is lost, then V(b) = -100
– If b is a final board state that is drawn, then V(b) = 0
– If b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game
This gives correct values, but is not operational.
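The recursive definition of V can be illustrated on a tiny game tree. The following is only a sketch under stated assumptions: the tree encoding (a terminal outcome string, or a list of successor states) and the names FINAL_VALUES and ideal_v are hypothetical, and the tree is a toy, not real checkers. It assumes "playing optimally" means both sides play optimally, so the opponent picks the successor that is worst for us.

```python
# Hypothetical toy game tree, not real checkers: a state is either a
# terminal outcome string or a list of successor states.
FINAL_VALUES = {"won": 100, "lost": -100, "drawn": 0}

def ideal_v(state, our_turn=True):
    """Value of `state` under optimal play by both sides (exhaustive search)."""
    if isinstance(state, str):          # final board state: value is known
        return FINAL_VALUES[state]
    values = [ideal_v(s, not our_turn) for s in state]
    # We pick the successor best for us; the opponent picks the worst for us.
    return max(values) if our_turn else min(values)

# Root is our move; each inner list is a position where the opponent moves.
tree = [["won", "lost"], ["drawn", "drawn"]]
print(ideal_v(tree))  # 0
```

The function has to visit every line of play down to a final state, which is feasible for this toy tree but not for checkers. That is the sense in which the definition is correct but not operational.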
7
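Choose the Target Function
– The definition of V is recursive
– Not efficiently computable: evaluating a non-final state requires searching ahead to the end of the game
– Hence a nonoperational definition
– The ideal target function is difficult to learn exactly
– Solution: learn an approximation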
8
Choose the Target Function
Use a linear combination of board features to represent the function
– x1, the number of black pieces on the board
– x2, the number of red pieces on the board
– x3, the number of black kings on the board
– x4, the number of red kings on the board
– x5, the number of black pieces threatened by red
– x6, the number of red pieces threatened by black
9
Choose the Target Function
Target function representation
V(b) = w0 + w1x1 + w2x2 + … + w6x6
Reduce the problem of learning a checkers strategy to the problem of learning values for the coefficients w0 through w6.
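As a minimal sketch, the linear representation can be written as a short Python function. The name evaluate and the numeric weights and feature values below are illustrative assumptions, not values from the slides.

```python
from typing import Sequence

def evaluate(weights: Sequence[float], features: Sequence[float]) -> float:
    """Return w0 + w1*x1 + ... + wn*xn for one board's feature vector."""
    if len(weights) != len(features) + 1:
        raise ValueError("expected one more weight than features (w0 is the bias)")
    w0, rest = weights[0], weights[1:]
    return w0 + sum(w * x for w, x in zip(rest, features))

# Illustrative numbers only (assumed, not from the slides).
weights = [0.5, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]   # w0..w6
features = [3, 0, 1, 0, 0, 0]                      # x1..x6
print(evaluate(weights, features))  # 5.5
```

With this representation, a board is summarized entirely by its six feature counts, so two different boards with the same counts receive the same score.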
10
An Approximated Function
Each training example is an ordered pair of the form <b, Vtrain(b)>
– b is a board state, Vtrain(b) is its training value
– E.g., <b, 100> for a final board state b that is won
Training process
– Assign specific scores to specific board states.
– Find the weights wi that best match the training examples.
11
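An Approximated Function
Rule for estimating training values
– Vtrain(b) = V'(Successor(b)), where V' is the learner's current approximation to V and Successor(b) is the next board state after b at which it is again the program's turn to move
Find the best weights
– Use a learning algorithm to choose the weights wi that best fit the set of training examples.
– LMS: Least Mean Squares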
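The rule for estimating training values might be applied to one recorded game roughly as follows. This is a sketch under assumptions: the function name training_examples, the representation of a game as a list of boards seen on the learner's turns, and the two-feature stand-in for V' are all hypothetical. The final board is assumed to take its known outcome value (100, -100 or 0) rather than an estimate.

```python
def training_examples(trace, final_value, v_hat):
    """Pair each board in `trace` with an estimated training value.

    trace:       boards seen on the learner's turns, in order
    final_value: 100, -100 or 0 for the board that ended the game
    v_hat:       current approximation V', a function board -> number
    """
    examples = []
    for i, board in enumerate(trace):
        if i + 1 < len(trace):
            target = v_hat(trace[i + 1])   # Vtrain(b) = V'(Successor(b))
        else:
            target = final_value           # final board: use the known outcome
        examples.append((board, target))
    return examples

# Hypothetical stand-in for V': black pieces minus red pieces.
simple_v = lambda board: board[0] - board[1]
trace = [(3, 3), (3, 2), (3, 0)]           # (black pieces, red pieces)
print(training_examples(trace, 100, simple_v))
# [((3, 3), 1), ((3, 2), 3), ((3, 0), 100)]
```

Each intermediate board borrows its target from the learner's own estimate of the board that follows it, so the targets improve as V' improves.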
12
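LMS Weight Update Rule
For each training example <b, Vtrain(b)>
– Use the current weights to calculate V'(b)
– For each weight wi, update it as wi ← wi + η (Vtrain(b) − V'(b)) xi, where η is a small constant that moderates the size of the update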
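A single LMS step could be sketched as below, using the standard form of the rule, wi ← wi + η (Vtrain(b) − V'(b)) xi. The function name lms_update, the default learning rate eta=0.1, and the convention x0 = 1 for the bias weight w0 are assumptions made for this example.

```python
def lms_update(weights, features, v_train, eta=0.1):
    """One LMS step: w_i <- w_i + eta * (Vtrain(b) - V'(b)) * x_i."""
    xs = [1.0, *features]                      # x0 = 1 pairs with the bias w0
    v_hat = sum(w * x for w, x in zip(weights, xs))   # current estimate V'(b)
    error = v_train - v_hat
    return [w + eta * error * x for w, x in zip(weights, xs)]

# Two features only, to keep the arithmetic easy to follow.
new_weights = lms_update([0.0, 0.0, 0.0], [1.0, 2.0], v_train=10.0)
print(new_weights)
```

When the error is zero the weights are left unchanged, and a feature whose value is zero on this board does not change its weight, because each update is scaled by both the error and xi.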