Supervised Learning Introduction
What Is the Learning Problem?
Learning = improving with experience at some task:
– Improve over task T,
– with respect to performance measure P,
– based on experience E.
Example: learn to play checkers
– T: play checkers
– P: % of games won in the world tournament
– E: opportunity to play games against itself
Learning to Play Checkers
T: play checkers
P: percent of games won in the world tournament
E: what experience?
Design questions:
– What exactly should be learned?
– How shall it be represented?
– Is the training distribution the same as the testing distribution?
– What specific algorithm should learn it?
Choose the Target Function
ChooseMove: Board → Move
– Input: a legal board state
– Output: some move from the set of legal moves
This reduces the problem of improving performance P at task T to the problem of learning some target function such as ChooseMove.
Choose the Target Function
ChooseMove
– Straightforward to transform the problem into learning this function
– Difficult to learn given only indirect training experience
Function V: Board → score
– Assigns a score to each board state
– Makes it easy to select the best move, as the sketch below shows
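To see why a scoring function makes move selection easy, here is a minimal Python sketch; `legal_moves`, `apply_move`, and the scoring function `V` are hypothetical helpers passed in as arguments, not part of the original slides:

```python
def choose_move(board, V, legal_moves, apply_move):
    # With a score for every board state, the best move is simply the
    # one whose resulting state scores highest under V.
    # legal_moves/apply_move are assumed game-engine helpers.
    return max(legal_moves(board), key=lambda m: V(apply_move(board, m)))
```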
Possible Definition for Target Function V
– If b is a final board state that is won, then V(b) = 100
– If b is a final board state that is lost, then V(b) = -100
– If b is a final board state that is drawn, then V(b) = 0
– If b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game.
This gives correct values, but is not operational.
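Transcribed into code, the nonoperational character of this definition is plain: scoring a single mid-game state forces a search of the entire game tree. A sketch, assuming hypothetical helpers `is_final`, `is_won`, `is_lost`, `legal_moves`, and `apply_move`, and reading "playing optimally" as minimax over both players:

```python
def V(board, our_turn, is_final, is_won, is_lost, legal_moves, apply_move):
    # Direct transcription of the definition above.
    if is_final(board):
        if is_won(board):
            return 100
        if is_lost(board):
            return -100
        return 0  # drawn
    # Optimal play by both sides: maximize over our moves, minimize over
    # the opponent's. This recursion touches every final state reachable
    # from `board`, hence it is correct but not efficiently computable.
    children = (V(apply_move(board, m), not our_turn,
                  is_final, is_won, is_lost, legal_moves, apply_move)
                for m in legal_moves(board))
    return max(children) if our_turn else min(children)
```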
Choose the Target Function
– The definition of V is recursive and not efficiently computable: it is nonoperational.
– The ideal target function is therefore difficult to learn exactly.
– Solution: learn an approximation of V instead.
Choose the Target Function
Use a linear function of the following board features to represent the target function:
– x1: the number of black pieces on the board
– x2: the number of red pieces on the board
– x3: the number of black kings on the board
– x4: the number of red kings on the board
– x5: the number of black pieces threatened by red
– x6: the number of red pieces threatened by black
Choose the Target Function
Target function representation:
V(b) = w0 + w1x1 + w2x2 + … + w6x6
This reduces the problem of learning a checkers strategy to the problem of learning values for the coefficients w0 through w6.
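As a concrete illustration, the linear representation is a one-line computation once the six feature counts are available; `extract_features` is an assumed helper returning [x1, …, x6] for a board:

```python
def v_hat(board, w, extract_features):
    # w = [w0, w1, ..., w6]; extract_features(board) = [x1, ..., x6].
    x = extract_features(board)
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
```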
An Approximated Function
Each training example is an ordered pair of the form <b, Vtrain(b)>
– b is a board state, Vtrain(b) is its training value
– E.g., <b, 100> for a final board state b that is won
Training process:
– Assign specific scores to specific board states.
– Find the best weights wi to match the training examples.
An Approximated Function
Rule for estimating training values:
– Vtrain(b) ← V'(Successor(b)), where V' is the current learned approximation and Successor(b) is the next board state at which it is again the program's turn to move (see the sketch below).
Finding the best weights:
– A learning algorithm chooses the weights wi to best fit the set of training examples.
– Here: LMS (Least Mean Squares).
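A sketch of this bootstrapping rule: each position the program faced is scored by the current estimate of the position that followed it, and the final position is scored directly from the game's outcome. `game_states` is an assumed sequence of boards at which it was the program's turn to move, and `v_hat` is the current estimate V' (e.g., a closure over the current weights):

```python
def make_training_examples(game_states, v_hat, final_value):
    # Vtrain(b) <- V'(Successor(b)): score each state by the current
    # estimate of its successor; final_value is 100 / -100 / 0
    # according to the game's result.
    examples = [(b, v_hat(b_next))
                for b, b_next in zip(game_states, game_states[1:])]
    examples.append((game_states[-1], final_value))
    return examples
```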
LMS Weight Update Rule
For each training example <b, Vtrain(b)>:
– Use the current weights to calculate V'(b)
– For each weight wi, update it as
wi ← wi + η (Vtrain(b) − V'(b)) xi
where η is a small constant (e.g., 0.1) that moderates the size of the weight update.
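A minimal sketch of one LMS update, using the same assumed `extract_features` helper as above; η = 0.1 is an illustrative choice:

```python
def lms_update(w, board, v_train, extract_features, eta=0.1):
    # Prepend x0 = 1 so that w0 is updated like the other weights.
    x = [1.0] + list(extract_features(board))
    v_hat = sum(wi * xi for wi, xi in zip(w, x))  # V'(b) under current w
    error = v_train - v_hat
    # wi <- wi + eta * (Vtrain(b) - V'(b)) * xi
    return [wi + eta * error * xi for wi, xi in zip(w, x)]
```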