Supervised Learning Introduction


1 Supervised Learning Introduction

2 What Is a Learning Problem?
Learning = improving with experience at some task:
– improve over task T,
– with respect to performance measure P,
– based on experience E.
Example: learn to play checkers
– T: play checkers
– P: % of games won in a world tournament
– E: opportunities to play games against itself

3 Learning to Play Checkers
T: play checkers
P: percent of games won in a world tournament
Design questions:
– What experience?
– What exactly should be learned?
– How shall it be represented?
– Does the training distribution match the testing distribution?
– What specific algorithm should learn it?

4 Choose the Target Function
ChooseMove: Board → Move
– Input: a board state from the set of legal board states
– Output: a move chosen from the legal moves
This reduces the problem of improving performance P at task T to the problem of learning some target function such as ChooseMove.

5 Choose the Target Function
ChooseMove
– Straightforward to cast the task as learning this function
– Difficult to learn given indirect training experience
Function V: Board → ℝ
– Assigns a numerical score to each board state
– Makes it easy to select the best move: pick the move whose resulting board state scores highest
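Given a scoring function V, choosing a move reduces to a one-step lookahead: apply each legal move and keep the one whose resulting board scores highest. A minimal sketch, assuming hypothetical `legal_moves` and `apply_move` hooks supplied by some game engine (they are not part of the slides); the toy "game" in the usage line is invented purely for illustration.

```python
from typing import Callable, Iterable, TypeVar

Board = TypeVar("Board")
Move = TypeVar("Move")


def choose_move(
    board: Board,
    legal_moves: Callable[[Board], Iterable[Move]],
    apply_move: Callable[[Board, Move], Board],
    V: Callable[[Board], float],
) -> Move:
    """Return the legal move whose resulting board has the highest score V.

    legal_moves and apply_move are hypothetical hooks into a game engine
    (assumptions for this sketch). Assumes at least one legal move exists;
    max() raises ValueError on an empty sequence.
    """
    return max(legal_moves(board), key=lambda move: V(apply_move(board, move)))


# Toy usage (not checkers): a "board" is an integer, a move adds 1, 2 or 3,
# and V prefers boards close to 10.
best = choose_move(8, lambda b: [1, 2, 3], lambda b, m: b + m, lambda b: -abs(b - 10))
print(best)  # prints 2
```

This is why learning V is attractive: all the game knowledge lives in V, and move selection stays a trivial argmax.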

6 Possible Definition for Target Function V
– If b is a final board state that is won, then V(b) = 100.
– If b is a final board state that is lost, then V(b) = -100.
– If b is a final board state that is drawn, then V(b) = 0.
– If b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game (assuming the opponent also plays optimally).
This gives correct values, but it is not operational.
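Read with both players playing optimally, the recursive definition is the minimax value of the game tree below b. A minimal sketch on a tiny hand-built tree; the tree encoding and the labels `us`/`them` are assumptions for illustration, not a checkers implementation.

```python
from typing import List, Tuple, Union

# Values of final board states, as in the definition above.
TERMINAL_VALUES = {"won": 100, "lost": -100, "drawn": 0}

# Hypothetical toy game-tree encoding: a node is either a final outcome
# ("won", "lost", "drawn") or a pair (whose_turn, children), where
# whose_turn is "us" (the learner) or "them" (the opponent).
Node = Union[str, Tuple[str, List["Node"]]]


def V(node: Node) -> int:
    """Ideal value of a board: exhaustive search assuming optimal play by both sides."""
    if isinstance(node, str):
        return TERMINAL_VALUES[node]
    whose_turn, children = node
    child_values = [V(child) for child in children]
    # We pick the best outcome for us; the opponent picks the worst for us.
    return max(child_values) if whose_turn == "us" else min(child_values)


tree = ("us", [
    ("them", ["won", "lost"]),   # opponent can force a loss: -100
    ("them", ["drawn", "won"]),  # opponent can force a draw: 0
])
print(V(tree))  # prints 0
```

The function returns correct values, but it visits every node below b; for checkers that tree is astronomically large, which is what "not operational" means here.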

7 Choose the Target Function
– The definition is recursive and not efficiently computable: it is a nonoperational definition.
– Learning this ideal target function exactly is difficult.
– Solution: learn an approximation of it.

8 Choose the Target Function
Use a linear function of the following board features to represent the target function:
– x1, the number of black pieces on the board
– x2, the number of red pieces on the board
– x3, the number of black kings on the board
– x4, the number of red kings on the board
– x5, the number of black pieces threatened by red (i.e., that red can capture on its next turn)
– x6, the number of red pieces threatened by black

9 Choose the Target Function
Target function representation (V' denotes the learned approximation of V):
V'(b) = w0 + w1x1 + w2x2 + … + w6x6
This reduces the problem of learning a checkers strategy to the problem of learning values for the coefficients w0 through w6.
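Evaluating the linear representation w0 + w1x1 + … + w6x6 is a bias plus a dot product. A minimal sketch, assuming the six features have already been computed from the board; the weights and feature values below are hand-picked for illustration, not learned.

```python
from typing import Sequence


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation w0 + w1*x1 + ... + w6*x6.

    weights  -- [w0, w1, ..., w6]; w0 is the constant (bias) term
    features -- [x1, ..., x6] computed from board b
    """
    if len(weights) != len(features) + 1:
        raise ValueError("need exactly one more weight than features (w0 is the bias)")
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


# Hand-picked illustrative weights and features, not learned values.
weights = [0.0, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5]
features = [12, 11, 1, 0, 2, 1]  # x1..x6 for some board
print(v_hat(weights, features))  # prints 3.5
```

Keeping w0 separate from the feature weights mirrors the formula; an equivalent choice is to prepend a constant feature x0 = 1.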

10 An Approximated Function
Each training example is an ordered pair of the form ⟨b, Vtrain(b)⟩
– b is a board state; Vtrain(b) is its training value
– E.g. ⟨b, +100⟩ for a final board state b that is won
Training process
– Assign specific scores to specific board states.
– Find the weights wi that best fit the training examples.
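"Best fit" is commonly made precise as minimizing the squared error E = Σ (Vtrain(b) − V'(b))² over the training examples, which is the criterion LMS targets. A minimal sketch for a linear V'; the feature vectors and weights below are invented for illustration.

```python
from typing import List, Sequence, Tuple

# A training example pairs the feature vector (x1..x6) of board b with Vtrain(b).
Example = Tuple[Sequence[float], float]


def squared_error(weights: Sequence[float], examples: List[Example]) -> float:
    """Sum over examples of (Vtrain(b) - V'(b))**2, with V' linear in the features."""
    total = 0.0
    for features, v_train in examples:
        v_hat = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
        total += (v_train - v_hat) ** 2
    return total


# Invented examples: a won board (+100) and a lost board (-100).
examples = [((2, 0, 1, 0, 0, 0), 100.0), ((0, 2, 0, 1, 0, 0), -100.0)]
print(squared_error([0.0] * 7, examples))                            # prints 20000.0
print(squared_error([0.0, 50.0, -50.0, 0.0, 0.0, 0.0, 0.0], examples))  # prints 0.0
```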

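11 An Approximated Function
Rule for estimating training values:
– Vtrain(b) ← V'(Successor(b)), where V' is the current approximation and Successor(b) is the next board state following b at which it is again the program's turn to move
Find the best weights:
– A learning algorithm chooses the weights wi to best fit the set of training examples.
– LMS: Least Mean Squares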
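The rule turns one played game into training examples: each board is labeled with the current estimate of the board that follows it, and the final board gets its true value. A minimal sketch, assuming a hypothetical `trace` list of feature vectors for the boards at which the program was to move, ending with the final board, and an arbitrary evaluation function `v_hat`.

```python
from typing import Callable, List, Sequence, Tuple


def training_values(
    trace: List[Sequence[float]],
    final_value: float,
    v_hat: Callable[[Sequence[float]], float],
) -> List[Tuple[Sequence[float], float]]:
    """Build (features, Vtrain) pairs from one game.

    trace[i] is the feature vector of the i-th board at which it was the
    program's turn to move (an assumed input format), so trace[i + 1] plays
    the role of Successor(b). The last board is assumed final and receives
    its true value (100, -100 or 0).
    """
    examples = []
    for i, features in enumerate(trace):
        if i + 1 < len(trace):
            v_train = v_hat(trace[i + 1])  # Vtrain(b) <- V'(Successor(b))
        else:
            v_train = final_value          # final board: true outcome
        examples.append((features, v_train))
    return examples


# Toy usage with a stand-in evaluator that just sums the features.
print(training_values([(1, 2), (3, 4), (5, 0)], 100.0, lambda f: float(sum(f))))
# prints [((1, 2), 7.0), ((3, 4), 5.0), ((5, 0), 100.0)]
```

Labels near the end of the game are anchored by the true outcome, and repeated training propagates that information back toward earlier boards.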

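12 LMS Weight Update Rule
For each training example ⟨b, Vtrain(b)⟩:
– Use the current weights to calculate V'(b).
– For each weight wi, update it as
  wi ← wi + η (Vtrain(b) − V'(b)) xi
where η is a small constant (the learning rate) and x0 = 1 for the constant weight w0.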
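The update rule wi ← wi + η (Vtrain(b) − V'(b)) xi can be sketched directly. This is a minimal illustration, assuming x0 = 1 for the constant weight and an assumed default η = 0.1; the synthetic data in the usage line only checks that the update recovers a known linear target and is not checkers data.

```python
from typing import List, Sequence, Tuple


def lms_update(weights: List[float], features: Sequence[float],
               v_train: float, eta: float = 0.1) -> List[float]:
    """One LMS step: wi <- wi + eta * (Vtrain(b) - V'(b)) * xi."""
    x = [1.0, *features]  # x0 = 1 so that w0 is updated like the other weights
    v_hat = sum(w * xi for w, xi in zip(weights, x))  # current estimate V'(b)
    error = v_train - v_hat
    return [w + eta * error * xi for w, xi in zip(weights, x)]


def train(examples: List[Tuple[Sequence[float], float]], n_features: int,
          eta: float = 0.1, epochs: int = 100) -> List[float]:
    """Apply lms_update to every example, repeatedly, starting from zero weights."""
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        for features, v_train in examples:
            weights = lms_update(weights, features, v_train, eta)
    return weights


# Synthetic check (not checkers data): targets follow 3 + 2*x exactly.
data = [([x], 3.0 + 2.0 * x) for x in (0.0, 1.0, 2.0, 3.0)]
print(train(data, n_features=1, eta=0.05, epochs=1000))  # close to [3.0, 2.0]
```

Each step moves the weights in proportion to the error and to the feature value, so weights of features that are zero on a board are left unchanged; η must be small enough relative to the feature magnitudes for the updates to converge.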
