CONTENTS
1. Introduction
2. The Basic Checker-playing Program
3. Rote Learning and Its Variants
4. Learning Procedure Involving Generalizations
5. Rote Learning vs. Generalization
INTRODUCTION
General methods of approach
Choice of problem: checkers, because the game
a. calls for heuristic procedures;
b. has a definite goal (the final goal) and at least one intermediate goal (criterion);
c. has definite rules of activity;
d. allows the learning process to be tested;
e. is familiar and understandable.
The Basic Checker-playing Program
General method from Shannon (1950), as applied to chess:
1. Alternatives: which alternative moves are to be considered?
2. Analysis
a. Which continuations are to be explored, and to what depth?
b. How are positions to be evaluated in terms of their patterns?
c. How are the evaluations to be integrated into a single value for an alternative?
3. Final choice procedure: what procedure is to be used to select the final preferred move?
The Basic Checker-playing Program (Cont’d)
[Figure: a game tree explored to ply level 3. Ply 1: proposed moves by the machine (scores +20, +3, -70, +15); ply 2: anticipated replies by the opponent (+100, +20, +4, +3, -10, -70, +7, +15); ply 3: proposed moves by the machine (+100, +50, +20, -7, +4, -3, +3, -10, -20, -70, -100, +3, +7, +15, -5). Ply-3 positions are evaluated with the scoring polynomial, and the alternative is selected by the ‘minimax’ procedure, giving a backed-up score of +20.]
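The minimax back-up described above can be sketched as follows. The leaf values are the ply-3 scores from the figure, but how they are grouped under their parent moves is an assumption: it is one arrangement that reproduces the backed-up scores shown at plies 1 and 2. The names `minimax` and `best_move` are illustrative.

```python
from typing import Union

# A search tree: either a leaf score (int) or a list of subtrees.
Tree = Union[int, list]


def minimax(node: Tree, maximizing: bool) -> int:
    """Back up scores: the machine maximizes, the opponent minimizes."""
    if isinstance(node, int):  # leaf: value from the scoring polynomial
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)


def best_move(root: list) -> tuple:
    """Return (index of the preferred ply-1 move, its backed-up score)."""
    # After each ply-1 move it is the opponent's turn, so the opponent minimizes.
    scores = [minimax(child, maximizing=False) for child in root]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


# Ply-3 leaf values from the figure, under an assumed grouping.
tree = [
    [[100, 50], [20, -7]],              # ply-1 move backed up to +20
    [[4, -3], [3]],                     # +3
    [[-10, -20], [-70, -100], [3, 7]],  # -70
    [[15, -5]],                         # +15
]
print(best_move(tree))  # (0, 20)
```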
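The Basic Checker-playing Program (Cont’d)
Ply limitations depend on the board conditions:
a. The program always looks ahead a set minimum distance.
b. Beyond that, it continues looking ahead whenever the next move is a jump, the last move was a jump, or an exchange offer is possible.
This gives the desired results, since positions are evaluated only after pending jumps and exchanges have been played out.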
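The two ply-limit rules can be sketched as a single predicate. The parameter names and the `max_ply` safety cap are hypothetical; the slide states only the minimum distance and the three continuation conditions.

```python
def keep_looking_ahead(ply: int, min_ply: int, next_is_jump: bool,
                       last_was_jump: bool, exchange_offered: bool,
                       max_ply: int = 20) -> bool:
    """Return True if the search should go one ply deeper from this position."""
    if ply >= max_ply:   # hypothetical hard cap so the search always ends
        return False
    if ply < min_ply:    # rule a: always explore the minimum distance
        return True
    # Rule b: beyond the minimum, continue only while the position is unsettled.
    return next_is_jump or last_was_jump or exchange_offered


print(keep_looking_ahead(3, 3, False, False, False))  # False: quiet position
print(keep_looking_ahead(3, 3, True, False, False))   # True: a jump is pending
```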
The Basic Checker-playing Program (Cont’d)
Other modes of play
a. Have the program play both sides of the game.
b. Follow book games: the book move is compared with the move proposed by the machine, and their agreement is measured by a correlation coefficient.
c. Have the program play several simultaneous games against different opponents.
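The slide does not say how agreement between the book move and the machine's preference is computed. One simple measure, assumed here for illustration, counts the alternative legal moves the machine rates below (L) and above (H) the book move and returns (L - H)/(L + H): +1 when the book move is the machine's top choice and -1 when it is its last. The move labels and scores are hypothetical.

```python
def book_agreement(move_scores: dict, book_move: str) -> float:
    """(L - H) / (L + H): L alternatives rated below the book move, H above it."""
    book = move_scores[book_move]
    others = [s for m, s in move_scores.items() if m != book_move]
    lower = sum(1 for s in others if s < book)
    higher = sum(1 for s in others if s > book)
    if lower + higher == 0:
        return 0.0  # every alternative ties with the book move
    return (lower - higher) / (lower + higher)


# Hypothetical scores the machine assigns to four legal moves.
scores = {"11-15": 20, "9-13": 5, "10-14": -3, "12-16": 30}
print(book_agreement(scores, "11-15"))  # 0.3333333333333333
```

Averaging this value over many book positions would give a running measure of how closely the machine follows book play.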
The Basic Checker-playing Program (Cont’d)
Scoring polynomial
a. Measures the intermediate goals.
b. Linear polynomial: a sum of terms, each multiplied by a coefficient,
$f(x,c) = c_1 g_1(x) + c_2 g_2(x) + \cdots + c_j g_j(x)$
where the $g_i(x)$ are terms selected from a list of 38 parameters and the $c_i$ are the coefficients that multiply these parameters.
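Evaluating the linear polynomial is a weighted sum over the selected terms. In this sketch a position is summarized as a hypothetical mapping of parameter values, and the two term functions are illustrative, not entries from the list of 38 parameters.

```python
from typing import Callable, Mapping, Sequence

Position = Mapping[str, float]      # hypothetical position summary
Term = Callable[[Position], float]  # one parameter g_i(x)


def scoring_polynomial(x: Position, terms: Sequence[Term],
                       c: Sequence[float]) -> float:
    """Evaluate f(x, c) = c_1*g_1(x) + c_2*g_2(x) + ... + c_j*g_j(x)."""
    if len(terms) != len(c):
        raise ValueError("need exactly one coefficient per term")
    return sum(c_i * g_i(x) for g_i, c_i in zip(terms, c))


# Two hypothetical terms.
terms = [lambda x: x["mobility"], lambda x: x["centre_control"]]
print(scoring_polynomial({"mobility": 2, "centre_control": -1}, terms, [3.0, 5.0]))  # 1.0
```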
The Basic Checker-playing Program (Cont’d)
Scoring polynomial (Cont’d)
c. Each term expresses the relative standing of the two sides with respect to the parameter in question, i.e. the difference between the ratings for the individual sides.
d. Dominant parameters: inability to move, and relative piece advantage.
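Points c and d can be sketched as follows: each term is the machine's rating minus the opponent's rating for one parameter, inability to move decides the game outright, and piece advantage is made to dominate by giving it a much larger coefficient than the other terms. The ratings, parameter names and coefficient values are hypothetical, and using a large coefficient to express dominance is an assumption of this sketch.

```python
def term(ratings: dict, param: str) -> float:
    """Relative standing for one parameter: machine's rating minus opponent's."""
    own, opp = ratings[param]
    return own - opp


def evaluate(ratings: dict, coeffs: dict,
             own_can_move: bool, opp_can_move: bool) -> float:
    # Inability to move loses the game, so it overrides every other term.
    if not own_can_move:
        return float("-inf")
    if not opp_can_move:
        return float("inf")
    return sum(c * term(ratings, p) for p, c in coeffs.items())


# (machine, opponent) ratings for a hypothetical position.
ratings = {"pieces": (9, 8), "mobility": (7, 9)}
# Piece advantage dominates through its much larger (assumed) coefficient.
coeffs = {"pieces": 1000.0, "mobility": 1.0}
print(evaluate(ratings, coeffs, True, True))  # 998.0
```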
The Basic Checker-playing Program (Cont’d)
[Figure: search tree over plies 1 to 3, with backed-up score +20.]
Selection of the best next move depends on the evaluation process. Learning involves improving the evaluation as a result of ‘experiences’.
Rote Learning and Its Variants
Storage scheme: simply save all of the board positions encountered during play, together with their computed scores. When a saved position is reached again, reference is made to this memory record instead of recomputing its score.
Improvements: reduced computing time; looking much farther in advance; a sense of direction.
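Rote Learning and Its Variants (Cont’d)
[Figure: a memory table of board positions and their scores (+20, +15, …). A stored score (+20) used at the end of a search carries its own look-ahead, so the effective search reaches ply level 6. Learning: improvement.]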
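The storage scheme and its use at the end of a search can be sketched with a dictionary keyed by board position. Boards are assumed to be hashable strings, and `evaluate` stands for the scoring polynomial; both are hypothetical stand-ins. Because a stored score was itself backed up from a look-ahead, reusing it at a search leaf is what extends the effective depth.

```python
from typing import Callable, Dict, Optional


class RoteMemory:
    """Saves every board position reached, together with its computed score."""

    def __init__(self) -> None:
        self.table: Dict[str, float] = {}

    def store(self, board: str, score: float) -> None:
        self.table[board] = score

    def lookup(self, board: str) -> Optional[float]:
        return self.table.get(board)


def leaf_score(board: str, memory: RoteMemory,
               evaluate: Callable[[str], float]) -> float:
    """At the search frontier, reuse a stored backed-up score when one exists."""
    stored = memory.lookup(board)
    return stored if stored is not None else evaluate(board)


memory = RoteMemory()
memory.store("position-A", 20.0)  # score backed up in an earlier game
print(leaf_score("position-A", memory, lambda b: 0.0))  # 20.0 (from memory)
print(leaf_score("position-B", memory, lambda b: 0.0))  # 0.0 (fresh evaluation)
```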
Rote Learning and Its Variants (Cont’d)
Cataloging and culling stored information
Problems: the limit on the number of boards that can be saved, and long search times.
a. Catalog the boards that are saved (standardizing and grouping).
b. Delete redundancies.
c. Discard board positions, either by a method based on frequency of use (refreshing and forgetting) or by a method based on ply (culling the lowest-ply board positions).
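A sketch of cataloging combined with refreshing and forgetting. Grouping boards by piece count, halving a board's age whenever it is referenced, and forgetting boards whose age exceeds `max_age` are assumptions chosen to illustrate the idea; the slide names the methods but not their parameters. Searching a single catalog group is what shortens the look-up.

```python
from collections import defaultdict
from typing import Dict, Optional


class CatalogedMemory:
    """Rote memory split into catalog groups, with refreshing and forgetting."""

    def __init__(self, max_age: int = 8) -> None:
        self.max_age = max_age  # hypothetical forgetting threshold
        # catalog key -> {board: [score, age]}
        self.groups: Dict[int, Dict[str, list]] = defaultdict(dict)

    @staticmethod
    def key(board: str) -> int:
        """Assumed grouping: catalog boards by number of pieces ('.' = empty)."""
        return sum(1 for square in board if square != ".")

    def store(self, board: str, score: float) -> None:
        self.groups[self.key(board)][board] = [score, 0]

    def lookup(self, board: str) -> Optional[float]:
        entry = self.groups[self.key(board)].get(board)  # search one group only
        if entry is None:
            return None
        entry[1] //= 2  # refreshing: a board in use grows "younger"
        return entry[0]

    def age_all(self) -> None:
        """Called periodically: every board ages; those past max_age are forgotten."""
        for group in self.groups.values():
            for board in list(group):
                group[board][1] += 1
                if group[board][1] > self.max_age:
                    del group[board]


memory = CatalogedMemory(max_age=2)
memory.store("bw..", 20.0)
memory.age_all()
memory.age_all()               # age 2: still kept
print(memory.lookup("bw.."))   # 20.0 (age refreshed to 1)
```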
Rote Learning and Its Variants (Cont’d)
Rote-learning tests
Conclusions:
a. Rote learning calls for a sense of direction and a refined system of cataloging and storing information.
b. Its efficiency depends on the data-handling capacity of the computer.
c. More information must be stored to improve midgame play.
d. The game is a suitable vehicle for use during the development of learning techniques.
Learning Procedure Involving Generalizations
An obvious way to decrease the amount of storage needed to utilize past experience is to generalize on the basis of experience and to save only the generalizations. The program generalizes on experience after each move by adjusting the coefficients in the evaluation polynomial and by replacing terms that appear to be unimportant with new parameters drawn from a reserve list.
Learning Procedure Involving Generalizations (Cont’d)
A scoring system $Y = f(X)$, where $X$ is the current board position and $Y$ is an estimate of its backed-up score.
[Figure: backed-up scores (+20) from a search to ply level 6; learning improves the evaluation.]
Learning Procedure Involving Generalizations (Cont’d)
Backed-up scores from ply level 3 are paired with their board positions (e.g. +20, +15, …); the function (scoring system) $f(x,c)$, a linear polynomial, maps a board position to an estimate of its backed-up score.
Learning Procedure Involving Generalizations (Cont’d)
Scoring polynomial for generalization:
$f(x,c) = c_1 g_1(x) + c_2 g_2(x) + \cdots + c_j g_j(x)$
where the $g_i(x)$ are terms selected from a list of 38 parameters and the $c_i$ are the coefficients that multiply these parameters.
After each move, the learning procedure adjusts the coefficients and replaces terms that appear to be unimportant with new parameters.
Learning Procedure Involving Generalizations (Cont’d)
Training
Alpha (with learning) plays against Beta (without learning), to determine the relative ability of Alpha.
Manual intervention: an arbitrary change in the scoring polynomial.
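The Alpha/Beta arrangement can be sketched as a training loop. `play_game` and `perturb` are hypothetical callbacks, and the rules that Beta adopts Alpha's polynomial after an Alpha win and that `max_losses` consecutive Alpha defeats trigger an arbitrary change are assumptions about how relative ability and manual intervention are put to use.

```python
import copy
from typing import Callable


def train(alpha: dict, beta: dict,
          play_game: Callable[[dict, dict], bool],
          perturb: Callable[[dict], dict],
          games: int, max_losses: int = 3) -> dict:
    """Alpha learns while playing; Beta keeps its polynomial fixed during a game.

    play_game(alpha, beta) plays one game, may adjust alpha's coefficients
    in place, and returns True if Alpha won.
    """
    losses = 0
    for _ in range(games):
        if play_game(alpha, beta):
            beta = copy.deepcopy(alpha)  # Beta adopts the improved polynomial
            losses = 0
        else:
            losses += 1
            if losses >= max_losses:
                alpha = perturb(alpha)   # stands in for the manual intervention
                losses = 0
    return beta


def improving_game(a: dict, b: dict) -> bool:
    a["c1"] += 1               # toy "learning" step
    return a["c1"] > b["c1"]   # toy outcome: Alpha wins if ahead


print(train({"c1": 0}, {"c1": 0}, improving_game, lambda a: a, games=3))  # {'c1': 3}
```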
Learning Procedure Involving Generalizations (Cont’d)
Polynomial Modification Procedure
Initial scoring polynomial: $f(x,c) = c_1 g_1(x) + c_2 g_2(x) + \cdots + c_j g_j(x)$
At a given board position $x_k$:
a. compute the value of the scoring polynomial, $f(x_k,c)$, and save it;
b. compute the backed-up score $y_k$, using the look-ahead procedure.
Learning Procedure Involving Generalizations (Cont’d)
Polynomial Modification Procedure (Cont’d)
$\Delta = y_k - f(x_k,c)$
Delta is the indicator of change: it is used to check the scoring polynomial and to adjust the weight (coefficient) of each term in the polynomial.
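Steps a and b and the delta computation can be sketched as below. The backed-up score `y_k` is passed in as if it came from the look-ahead, and the term functions and numbers are hypothetical. The individual term contributions are returned too, since the coefficient adjustment works from them; a positive delta means the polynomial under-rated the position compared with the deeper search.

```python
from typing import Callable, List, Mapping, Sequence, Tuple

Term = Callable[[Mapping[str, float]], float]


def delta(x_k: Mapping[str, float], y_k: float,
          terms: Sequence[Term], c: Sequence[float]) -> Tuple[float, List[float]]:
    """Return (delta, contributions), where delta = y_k - f(x_k, c)."""
    contributions = [c_i * g_i(x_k) for g_i, c_i in zip(terms, c)]  # saved in step a
    return y_k - sum(contributions), contributions


terms = [lambda x: x["a"], lambda x: x["b"]]
d, parts = delta({"a": 2, "b": 1}, y_k=20, terms=terms, c=[3, -4])
print(d, parts)  # 18 [6, -4]
```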
Learning Procedure Involving Generalizations (Cont’d)
Polynomial Modification Procedure (Cont’d)
Adjustment of the coefficient values
a. Find the correlation between the signs of the individual term contributions in the initial polynomial and the sign of delta.
b. Take into account the number of times each term has been used and has had a nonzero value. If delta is positive, terms that contributed positively should have been given more weight, while those that contributed negatively should have been given less weight.
c. The coefficient for the term with the largest correlation coefficient is set at a prescribed maximum value, with proportionate values determined for all of the remaining coefficients.
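A sketch of steps a to c: for each term, count +1 when its sign agrees with the sign of delta and -1 when it does not, divide by the number of times the term was used with a nonzero value, then rescale so the largest correlation receives the prescribed maximum. Taking the sign from the term value $g_i(x_k)$ (the same as the contribution's sign when the coefficient is positive, and it lets the rescaled coefficient carry its own sign), ranking correlations by absolute value, and the name `c_max` are assumptions of this sketch.

```python
from typing import List


class TermStats:
    """Running sign agreement between one term and delta."""

    def __init__(self) -> None:
        self.agreement = 0   # +1 per agreeing sign, -1 per disagreeing sign
        self.times_used = 0  # times the term was used with a nonzero value

    def update(self, term_value: float, delta: float) -> None:
        if term_value == 0 or delta == 0:
            return
        self.times_used += 1
        self.agreement += 1 if (term_value > 0) == (delta > 0) else -1

    @property
    def correlation(self) -> float:
        return self.agreement / self.times_used if self.times_used else 0.0


def rescale(stats: List[TermStats], c_max: float) -> List[float]:
    """Largest |correlation| gets c_max; the rest are proportionate (signs kept)."""
    peak = max(abs(s.correlation) for s in stats)
    if peak == 0:
        return [0.0] * len(stats)
    return [c_max * s.correlation / peak for s in stats]


s1, s2, s3 = TermStats(), TermStats(), TermStats()
s1.update(1, 5)
s1.update(2, 3)    # agrees twice
s2.update(1, 5)
s2.update(1, -5)   # agrees once, disagrees once
s3.update(-2, 5)   # disagrees
print(rescale([s1, s2, s3], c_max=10.0))  # [10.0, 0.0, -10.0]
```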
Learning Procedure Involving Generalizations (Cont’d)
Instabilities
a. Stabilizing against minor variations in the delta values: set an arbitrary minimum value of delta, below which no adjustment is made, fixed at the average value of the coefficients for the terms in the currently existing evaluation polynomial.
b. Stabilizing against violent fluctuations when a new term is introduced: replace its times-used number by an arbitrary number until the actual usage does, in fact, equal this number.
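Both stabilizers can be sketched in a few lines. Averaging the coefficient magnitudes (rather than signed values) for the delta threshold, and the substitute times-used number of 16, are hypothetical choices; `agreement` is the running sum of +1/-1 sign agreements between a term and delta.

```python
from typing import Sequence


def delta_is_significant(delta: float, coeffs: Sequence[float]) -> bool:
    """Skip adjustment for minor deltas; threshold = mean coefficient magnitude."""
    threshold = sum(abs(c) for c in coeffs) / len(coeffs)
    return abs(delta) >= threshold


def stabilized_correlation(agreement: int, times_used: int, floor: int = 16) -> float:
    """Divide by at least `floor` uses, so a new term cannot swing wildly."""
    return agreement / max(times_used, floor)


print(delta_is_significant(3.0, [2.0, 4.0, -6.0]))  # False (threshold 4.0)
print(stabilized_correlation(1, 1))                  # 0.0625 rather than 1.0
```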
Learning Procedure Involving Generalizations (Cont’d)
Term Replacement
A low-term tally is kept against the term with the lowest correlation coefficient, and terms that accumulate a high tally are replaced. Is this a satisfactory scheme for selecting terms for the evaluation polynomial?
Binary connective terms: combinational, nonlinear terms.
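A sketch of the low-term tally, term replacement from the reserve list, and one binary connective term. The tally limit of 8, returning a retired term to the back of the reserve list, and the logical AND used to combine two parameters are hypothetical choices for illustration.

```python
from typing import Callable, Dict, List


def record_low_term(tallies: Dict[str, int], correlations: Dict[str, float]) -> str:
    """Add one to the low-term tally of the term with the lowest correlation."""
    worst = min(correlations, key=lambda name: correlations[name])
    tallies[worst] = tallies.get(worst, 0) + 1
    return worst


def replace_due_terms(active: List[str], reserve: List[str],
                      tallies: Dict[str, int], limit: int = 8) -> None:
    """Swap any term whose tally reached `limit` for the next reserve parameter."""
    for i, name in enumerate(active):
        if tallies.get(name, 0) >= limit and reserve:
            active[i] = reserve.pop(0)  # draw from the reserve list
            reserve.append(name)        # retired term rejoins the reserve (assumed)
            tallies.pop(name, None)


def binary_connective(g_a: Callable, g_b: Callable) -> Callable:
    """A nonlinear term: 1 when both parameters favour the machine, else 0."""
    return lambda x: 1 if g_a(x) > 0 and g_b(x) > 0 else 0


active, reserve, tallies = ["a", "b", "c"], ["d", "e"], {}
for _ in range(8):
    record_low_term(tallies, {"a": 0.5, "b": -0.2, "c": 0.1})
replace_due_terms(active, reserve, tallies)
print(active, reserve)  # ['a', 'd', 'c'] ['e', 'b']
```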
Learning Procedure Involving Generalizations (Cont’d)
Preliminary learning-by-generalization tests
The learning procedure did work, and the learning rate was high; however, learning was quite erratic and none too stable.
Learning Procedure Involving Generalizations (Cont’d)
Second series of tests: four modifications for improving stability.
Conclusions:
a. Learning by generalization is an effective learning device for problems amenable to tree-searching procedures.
b. It has modest memory requirements and a reasonable operating time.
c. Instability can be dealt with by straightforward procedures.
d. The machine can learn to play a better-than-average game of checkers.
Rote Learning vs. Generalization
Rote learning: improvement is made by increasing data storage. Good opening play and end-game play; poor middle game.
Learning-by-generalization: generalization on experience by adjusting a scoring system. Good opening play and end-game play; poor middle game.
[Figure: board square numbering: the 32 playable squares are numbered 1 to 35, with 9, 18 and 27 omitted.]
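With the numbering in this figure (1 to 35, with 9, 18 and 27 left out), every diagonal step changes the square number by 4 or 5, and a step off the board lands on an omitted number or outside 1 to 35. The sketch below checks that property against board geometry; the orientation in `coords` (rows of four playable squares, even rows using odd columns) is an assumption, and the function names are illustrative.

```python
GHOSTS = {9, 18, 27}                                    # numbers skipped in the figure
SQUARES = [n for n in range(1, 36) if n not in GHOSTS]  # the 32 playable squares


def coords(n: int) -> tuple:
    """(row, col) of square n on an 8x8 board, four playable squares per row."""
    row, k = divmod(SQUARES.index(n), 4)
    col = 2 * k + 1 if row % 2 == 0 else 2 * k  # assumed orientation
    return row, col


def neighbours(n: int) -> list:
    """Diagonal neighbours by arithmetic alone: n +/- 4 and n +/- 5."""
    return sorted(n + d for d in (-5, -4, 4, 5) if n + d in SQUARES)


def neighbours_by_geometry(n: int) -> list:
    """Reference answer: squares whose row and column both differ by 1."""
    r, c = coords(n)
    return sorted(m for m in SQUARES
                  if abs(coords(m)[0] - r) == 1 and abs(coords(m)[1] - c) == 1)


assert all(neighbours(n) == neighbours_by_geometry(n) for n in SQUARES)
print(neighbours(1), neighbours(5), neighbours(13))  # [5, 6] [1, 10] [8, 17]
```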