
CONTENTS
1. Introduction
2. The Basic Checker-playing Program
3. Rote Learning and Its Variants
4. Learning Procedure Involving Generalizations
5. Rote Learning vs. Generalization

INTRODUCTION
General Methods of Approach
Choice of Problem: 'Checkers'
Heuristic procedures are studied on a problem that offers:
- a definite goal (final goal) and at least one intermediate goal (criterion)
- definite rules of activity
- a learning process that can be tested
- a familiar and understandable task

The Basic Checker-playing Program
General method from Shannon (1950), as applied to chess:
1. Alternatives: which alternative moves are to be considered?
2. Analysis:
   a. Which continuations are to be explored, and to what depth?
   b. How are positions to be evaluated in terms of their patterns?
   c. How are the evaluations to be integrated into a single value for an alternative?
3. Final choice procedure: what procedure is to be used to select the final preferred move?

The Basic Checker-playing Program (Cont’d)
[Figure: a game tree of scored positions explored to ply level 3. Ply 1: proposed move by the machine. Ply 2: anticipated reply by the opponent. Ply 3: proposed move by the machine.]
- Exploration to ply level 3
- Evaluation with the scoring polynomial
- Selection of the alternative by the 'minimax' procedure
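As a concrete sketch of the search just described, here is a minimal minimax routine that explores to a fixed ply and backs the leaf scores up the tree. The helpers legal_moves, apply_move, and the static evaluation score are hypothetical placeholders, not Samuel's actual routines.

```python
# Minimal minimax to a fixed ply; legal_moves, apply_move, and score are
# hypothetical placeholders (score is the scoring polynomial, sketched below).
def minimax(board, ply, maximizing):
    """Back up the score of `board` from a `ply`-level look-ahead."""
    moves = legal_moves(board, maximizing)
    if ply == 0 or not moves:          # depth limit or no moves: evaluate
        return score(board)
    values = (minimax(apply_move(board, m), ply - 1, not maximizing)
              for m in moves)
    return max(values) if maximizing else min(values)

def best_move(board, ply=3):
    """Select the machine's move by the minimax procedure at ply level 3."""
    return max(legal_moves(board, True),
               key=lambda m: minimax(apply_move(board, m), ply - 1, False))
```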

The Basic Checker-playing Program (Cont’d)
Ply Limitations: the look-ahead depth depends on the board conditions.
a. Set a minimum search distance.
b. Beyond it, the program continues looking ahead whenever the next move is a jump, the last move was a jump, or an exchange offer is possible, so that positions are evaluated only when they are stable enough to give the desired results.

The Basic Checker-playing Program (Cont’d)
Other Modes of Play
- Have the program play both sides of the game.
- Follow book games, comparing the evaluation of the book move with the move proposed by the machine (correlation coefficient).
- Have the program play several simultaneous games against different opponents.

The Basic Checker-playing Program (Cont’d)
Scoring polynomial
a. A measure of the intermediate goals.
b. Linear polynomial: a sum of terms multiplied by coefficients,
   f(x,c) = c1g1(x) + c2g2(x) + … + cjgj(x)
   g(x): terms selected from a list of 38 parameters
   c: coefficients which multiply these parameters

The Basic Checker-playing Program (Cont’d)
Scoring polynomial (Cont’d)
c. Each term measures the relative standing of the two sides with respect to the parameter in question, i.e., the difference between the ratings for the individual sides.
d. Dominant parameters: inability to move, relative piece advantage.
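A minimal sketch of such a scoring polynomial, with two illustrative terms standing in for Samuel's list of 38 parameters; each term is the difference between the two sides' ratings, as in (c) above. The board attributes used here are assumptions for illustration.

```python
# Illustrative parameter terms g_i(x); each measures the machine's standing
# relative to the opponent's. The board attributes are hypothetical.
def piece_advantage(board):
    return board.my_pieces - board.their_pieces

def mobility(board):
    return board.my_moves - board.their_moves

TERMS = [piece_advantage, mobility]   # g1 .. gj, drawn from the parameter list
coefficients = [2.0, 1.0]             # c1 .. cj, the weights that learning adjusts

def score(board):
    """Evaluate f(x,c): the weighted sum of the parameter terms."""
    return sum(c * g(board) for c, g in zip(coefficients, TERMS))
```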

The Basic Checker-playing Program (Cont’d)
[Figure: the backed-up score (+20) from ply 3 becomes the score of the proposed move at ply 1.]
Selection of the best next move depends on the evaluation process.
Learning involves improving the evaluation as a result of 'experiences'.

Rote Learning and Its Variants
Storage scheme
- Simply save all of the board positions encountered during play, together with their computed scores.
- Reference is made to this memory record during later play.
Improvement
- Reduced computing time
- Looking much farther in advance
- A sense of direction
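A minimal sketch of this storage scheme, assuming a hypothetical canonical function that standardizes equivalent board positions into one catalog key, and reusing the minimax routine sketched earlier:

```python
memory = {}   # canonical board position -> previously computed score

def lookup_or_search(board, ply):
    """Consult the rote-learning memory before searching afresh."""
    key = canonical(board)
    if key in memory:
        # Past experience: the stored score already embeds the earlier
        # look-ahead, which is what extends the effective search depth.
        return memory[key]
    value = minimax(board, ply, True)
    memory[key] = value               # save every position encountered
    return value
```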

Rote Learning and Its Variants (Cont’d)
[Figure: a board position whose stored score (+20) was itself backed up from a ply-3 search; consulting memory at ply 3 therefore gives the effect of searching to ply level 6. Learning shows up as an improvement in effective search depth.]

Rote Learning and Its Variants (Cont’d)
Cataloging & Culling Stored Information
The number of boards that can be saved is limited, and search times grow long, so:
a. Catalog the boards that are saved: standardizing & grouping
b. Delete redundancies
c. Discard board positions
   - Method based on frequency of use: refreshing & forgetting
   - Method based on ply: cull the lowest-ply board positions
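A sketch of the 'refreshing and forgetting' method, under the assumption that each stored position carries an age counter: using a position refreshes it, every move ages all entries, and the oldest entries are culled when storage is full. The capacity and the halving rule are illustrative, not Samuel's exact settings.

```python
CAPACITY = 10_000     # illustrative storage limit
ages = {}             # canonical board position -> age counter

def refresh(key):
    ages[key] = ages.get(key, 0) // 2   # refreshing: used boards stay young

def end_of_move():
    for key in ages:
        ages[key] += 1                  # every stored position ages each move
    while len(ages) > CAPACITY:
        oldest = max(ages, key=ages.get)
        del ages[oldest]                # forgetting: cull the least-used board
        memory.pop(oldest, None)        # drop its stored score as well
```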

Rote Learning and Its Variants (Cont’d)
Rote-learning Tests. Conclusions:
a. A sense of direction and a refined system for cataloging and storing information are essential.
b. Efficiency depends on the data-handling capacity of the computer.
c. More information must be stored to improve midgame play.
d. The game is a suitable vehicle for use during the development of learning techniques.

Learning Procedure Involving Generalizations
An obvious way to decrease the amount of storage needed to utilize past experience is to generalize on the basis of experience and to save only the generalizations.
The program generalizes on its experience after each move by adjusting the coefficients in the evaluation polynomial and by replacing terms which appear to be unimportant with new parameters drawn from a reserve list.

Learning Procedure Involving Generalizations (Cont’d)
A Scoring System: y = f(x)
x: current board position
y: an estimate of the backed-up score
[Figure: learning improves the evaluation f so that the static score of a position approaches the score backed up from deeper search (effective ply level 6).]

Learning Procedure Involving Generalizations (Cont’d)
[Figure: scores backed up from ply level 3 are paired with their board positions; each (board position, backed-up score) pair trains the scoring system.]
Function (scoring system) f(x,c): a linear polynomial

Learning Procedure Involving Generalizations (Cont’d)
Scoring polynomial for generalization:
f(x,c) = c1g1(x) + c2g2(x) + … + cjgj(x)
g(x): terms selected from a list of 38 parameters
c: coefficients which multiply these parameters
The learning procedure involves, after each move:
- adjusting the coefficients
- replacing terms which appear to be unimportant with new parameters

Learning Procedure Involving Generalizations (Cont’d)
Training
- Alpha (with learning) plays against the Beta program (without learning).
- The games determine the relative ability of Alpha.
- Manual intervention: an arbitrary change in the scoring polynomial.
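A sketch of this training regime. It follows Samuel's account in having Alpha adjust its polynomial during play while Beta's stays fixed, with Beta adopting Alpha's polynomial when Alpha proves stronger; play_game, copy_polynomial, and perturb are hypothetical helpers, and the loss-streak trigger is an automated stand-in for the manual intervention mentioned above.

```python
def training_run(n_games, alpha, beta):
    losses_in_a_row = 0
    for _ in range(n_games):
        winner = play_game(alpha, beta, learner=alpha)  # Alpha learns each move
        if winner is alpha:
            beta = copy_polynomial(alpha)   # Beta inherits the improved polynomial
            losses_in_a_row = 0
        else:
            losses_in_a_row += 1
            if losses_in_a_row >= 3:
                alpha = perturb(alpha)      # arbitrary change to the scoring
                losses_in_a_row = 0         # polynomial, normally done by hand
    return alpha
```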

Learning Procedure Involving Generalizations (Cont’d)
Polynomial Modification Procedure
Initial scoring polynomial: f(x,c) = c1g1(x) + c2g2(x) + … + cjgj(x)
At a given board position xk:
a. compute the scoring polynomial f(xk,c) and save this value
b. compute the backed-up score yk, using the look-ahead procedure

Learning Procedure Involving Generalizations (Cont’d)
Polynomial Modification Procedure (Cont’d)
Delta = yk - f(xk,c)
Delta is an indicator of change: it is used to check the scoring polynomial and to adjust the weight (coefficient) of each term in the polynomial.

Learning Procedure Involving Generalizations (Cont’d)
Polynomial Modification Procedure (Cont’d)
Adjustment of the coefficient values:
a. Correlate the signs of the individual term contributions in the initial polynomial with the sign of delta.
b. Weight the adjustment by the number of times each term has been used and has had a nonzero value. If delta is positive, terms which contributed positively should have been given more weight, while those that contributed negatively should have been given less weight.
c. The coefficient for the term with the largest correlation coefficient is set at a prescribed maximum value, with proportionate values determined for all of the remaining coefficients.
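A sketch of this modification step, reusing TERMS, coefficients, and score from the scoring-polynomial sketch above. The running correlation bookkeeping is simplified and illustrative, and the prescribed maximum is an assumed constant.

```python
uses = [0] * len(TERMS)    # times each term has been used with nonzero value
agree = [0] * len(TERMS)   # times the term's sign agreed with delta's sign

MAX_COEFF = 16.0           # prescribed maximum value (assumed)

def adjust_coefficients(board, backed_up_score):
    delta = backed_up_score - score(board)      # delta = yk - f(xk,c)
    if delta == 0:
        return
    for i, g in enumerate(TERMS):
        contribution = coefficients[i] * g(board)
        if contribution != 0:
            uses[i] += 1
            agree[i] += (contribution > 0) == (delta > 0)
    # correlation in [-1, +1] between each term's sign and delta's sign
    corr = [(2 * a - u) / u if u else 0.0 for a, u in zip(agree, uses)]
    top = max(range(len(TERMS)), key=lambda i: abs(corr[i]))
    if corr[top]:
        scale = MAX_COEFF / abs(corr[top])
        for i in range(len(TERMS)):             # largest correlation gets the
            coefficients[i] = scale * corr[i]   # maximum; the rest proportionate
```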

Learning Procedure Involving Generalizations (Cont’d)
Instabilities
- Stabilizing against minor variations in the delta values: set an arbitrary minimum value of delta, fixed at the average value of the coefficients for the terms in the currently existing evaluation polynomial.
- Stabilizing against violent fluctuations when a new term is introduced: replace the times-used number by an arbitrary number, until the usage does, in fact, equal this number.

Learning Procedure Involving Generalizations (Cont’d)
Term Replacement
- A low-term tally is kept against the term with the lowest correlation coefficient, and terms with high tallies are replaced.
- Is this a satisfactory scheme for selecting terms for the evaluation polynomial?
Binary Connective Terms
- Combinational, nonlinear terms
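One possible reading of 'binary connective terms', sketched here as an assumption rather than Samuel's exact construction: nonlinear features formed by combining the signs of two existing parameters with logical connectives, which gives the linear polynomial access to combinational effects.

```python
from itertools import combinations

def binary_connective_terms(terms):
    """Form combinational, nonlinear terms from pairs of existing terms."""
    new_terms = []
    for g1, g2 in combinations(terms, 2):
        # logical AND of the two parameters' signs
        new_terms.append(lambda b, g1=g1, g2=g2: int(g1(b) > 0 and g2(b) > 0))
        # exclusive OR of the two parameters' signs
        new_terms.append(lambda b, g1=g1, g2=g2: int((g1(b) > 0) != (g2(b) > 0)))
    return new_terms
```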

Learning Procedure Involving Generalizations (Cont’d)
Preliminary Learning-by-generalization Tests
- The learning procedure did work, and the learning rate was high.
- Learning was, however, quite erratic and none too stable.

Learning Procedure Involving Generalizations (Cont’d)
Second Series of Tests
Four modifications were made to improve stability. Conclusions:
a. The procedure is an effective learning device for problems amenable to tree-searching procedures.
b. It has modest memory requirements and a reasonable operating time.
c. Instability can be dealt with by straightforward procedures.
d. The machine can learn to play a better-than-average game of checkers.

Rote Learning vs. Generalization
Rote learning: improvement is made by increasing data storage.
- Good opening and end-game play, but poor middle-game play.
Learning-by-generalization: generalization on experience by adjusting the scoring system.
- Good middle-game play, but weaker opening and end-game play, which depend on memorized positions.
