Evaluation Function in Game Playing Programs M1 Yasubumi Nozawa Chikayama & Taura Lab.


Outline: 1. Introduction; 2. Parameter tuning (2.1 Supervised learning, 2.2 Comparison training, 2.3 Reinforcement learning); 3. Conclusion

Introduction

Game playing programs: a simple model of real-world problems - if one player wins, the other must lose. Very large search spaces - we cannot search the whole game tree in the limited time available.

Game tree search: the root node is the current position, each node is a position, and each branch is a legal move. The tree is searched with minimax search.
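
As a rough sketch of the idea (the game interface here - legal_moves, apply, is_terminal, evaluate - is hypothetical, not from any particular program):

```python
def minimax(pos, depth, maximizing, game):
    """Depth-limited minimax over the game tree rooted at `pos`.
    At the depth limit (or at a terminal position) the static
    evaluation function is used instead of searching further."""
    if depth == 0 or game.is_terminal(pos):
        return game.evaluate(pos)            # static evaluation function
    values = []
    for move in game.legal_moves(pos):       # one branch per legal move
        child = game.apply(pos, move)
        values.append(minimax(child, depth - 1, not maximizing, game))
    return max(values) if maximizing else min(values)
```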

Requirements for an evaluation function: accuracy and efficiency. Positions at the frontier of the search are evaluated by a static evaluation function.

Features and weights. Feature f: e.g. the number of pieces of each side. Weight w: an important feature must receive a large weight. The combination can be linear (a weighted sum) or non-linear (a neural network, etc.).
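
For the linear case, a minimal sketch (the features named here are illustrative, not the ones used by any particular program):

```python
def extract_features(position):
    # Illustrative hand-crafted features; a real program would use many more
    # (material balance, mobility, king safety, ...).
    return [position.my_piece_count - position.opp_piece_count,
            position.my_mobility - position.opp_mobility]

def linear_eval(position, weights):
    # Linear evaluation: the dot product of the feature vector f and the weight vector w.
    return sum(w * f for w, f in zip(weights, extract_features(position)))
```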

Parameter tuning

Machine learning in games: in simple games like Othello and backgammon, parameter tuning by machine learning has been successful. In complex games like Shogi, hand-crafted evaluation functions are still better, and machine learning is used only in limited domains (e.g. the values of material).

Outline: 1. Introduction; 2. Parameter tuning (2.1 Supervised learning, 2.2 Comparison training, 2.3 Reinforcement learning); 3. Conclusion

Supervised learning. Training sample: (position, score). The evaluation function is tuned to minimize its error on these labelled positions.
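
A minimal sketch of this setting, assuming a linear evaluation function over numeric feature vectors and plain gradient descent on the squared error (the training procedure is illustrative, not any specific program's):

```python
import numpy as np

def train_supervised(samples, n_features, lr=0.01, epochs=100):
    """samples: list of (feature_vector, score) pairs, one per labelled position.
    Fits a linear evaluation by minimizing the squared error of its predictions."""
    w = np.zeros(n_features)
    for _ in range(epochs):
        for features, score in samples:
            f = np.asarray(features, dtype=float)
            error = score - w.dot(f)   # target score minus current evaluation
            w += lr * error * f        # gradient step on the squared error
    return w
```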

Supervised learning (1): a backgammon program [Tesauro 1989]. Scores were given by human experts and the network was trained with standard back-propagation; the result was far from human expert level. (Diagram: a neural network whose input is a position and move, encoded as 459 hand-crafted Boolean features, and whose output is the score of the move.)

Supervised learning: difficulties in having experts supply the training data. Creating a database consumes a great deal of expert time, and human experts do not think in terms of absolute scores.

Supervised learning (2): Bayesian learning [Lee et al. 1988]. Each training position is labeled win or lose, and the mean feature vector and covariance matrix of each label are estimated from the training data. (Diagram: feature vectors x_1 ... x_4 clustered around the class means μ_win and μ_lose, with a test sample compared against both classes.)
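
A sketch of the underlying idea, assuming numeric feature vectors and one Gaussian model per label (a generic formulation, not necessarily the exact method of Lee et al.):

```python
import numpy as np

def fit_gaussian(feature_vectors):
    """Estimate the mean vector and covariance matrix for one label (win or lose)."""
    X = np.asarray(feature_vectors, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # small regularizer
    return mean, cov

def log_likelihood(x, mean, cov):
    # Gaussian log-likelihood up to a constant shared by both classes.
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet)

def classify(position_features, win_params, lose_params):
    """Label a test position with the class under which it is more likely."""
    x = np.asarray(position_features, dtype=float)
    return "win" if log_likelihood(x, *win_params) > log_likelihood(x, *lose_params) else "lose"
```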

Supervised learning (3): LOGISTELLO [Buro 1998]. Different classifiers are built for different stages of the game (Othello has a fixed, finite number of plies), working from the last stage back toward the middle and opening stages, because scores from the last stage are more reliable than those from earlier stages.
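
A sketch of stage-dependent evaluation, assuming positions are simply bucketed by ply count with one linear evaluator per bucket (LOGISTELLO's actual features and training are more elaborate):

```python
import numpy as np

N_STAGES = 4       # e.g. opening / middle game / late middle game / endgame
MAX_PLIES = 60     # Othello lasts at most 60 plies

def stage_of(ply):
    # Map a ply count to one of the stage buckets.
    return min(ply * N_STAGES // MAX_PLIES, N_STAGES - 1)

class StagedEvaluator:
    def __init__(self, n_features):
        # One independent weight vector per stage of the game.
        self.weights = np.zeros((N_STAGES, n_features))

    def evaluate(self, features, ply):
        return self.weights[stage_of(ply)].dot(np.asarray(features, dtype=float))
```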

Outline: 1. Introduction; 2. Parameter tuning (2.1 Supervised learning, 2.2 Comparison training, 2.3 Reinforcement learning); 3. Conclusion

Comparison training. Training sample: (Position_1, Position_2, which is preferable). The evaluation function learns to satisfy the constraints given by these training samples: the expert's move is preferred over all other moves.
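
A sketch of how such a pairwise constraint can drive learning with a linear evaluation: whenever the position reached by the expert's move does not score higher than an alternative, the weights are nudged in that direction (a perceptron-style update chosen for illustration, not Tesauro's exact procedure):

```python
import numpy as np

def comparison_update(w, expert_features, other_features, lr=0.01, margin=1.0):
    """One step on a sample (Position_1, Position_2, which is preferable):
    enforce eval(expert's position) > eval(other position) by at least `margin`."""
    fe = np.asarray(expert_features, dtype=float)
    fo = np.asarray(other_features, dtype=float)
    if w.dot(fe) - w.dot(fo) < margin:   # preference constraint violated
        w = w + lr * (fe - fo)           # push the two scores apart
    return w
```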

Comparison training: a backgammon program [Tesauro 1989]. A symmetric network compares two final positions (a) and (b) with tied weights (W_1 = W_2, W_3 = -W_4), which guarantees consistency and transitivity of the preferences. Trained with standard back-propagation, it was simpler and stronger than the preceding supervised-learning versions. (Diagram: the two candidate final positions feed the shared-weight network, which outputs which is preferable.)

Problems of comparison training: is the assumption that the human expert's move is the best one correct? A program trained on experts' games will imitate a human playing style, which makes it harder for the program to surprise a human opponent.

Outline: 1. Introduction; 2. Parameter tuning (2.1 Supervised learning, 2.2 Comparison training, 2.3 Reinforcement learning); 3. Conclusion

Reinforcement learning: no training information from a domain expert. The program explores different actions and receives feedback (a reward) from the environment, e.g. win or lose, or the margin by which the program won or lost. (Diagram: the program, the learner, sends actions to the environment and receives back positions and rewards.)

TD(λ): temporal-difference learning. The weights are updated after each move according to

w_{t+1} = w_t + α ( F(x_{t+1}, w) - F(x_t, w) ) Σ_{k=1}^{t} λ^{t-k} ∇_w F(x_k, w)

where w_t is the weight vector at time t, F is the evaluation function (a function of the weight vector w and a position x), x_t is the position at time t, α is the learning rate, and λ controls how strongly the current evaluation influences the weight updates for previous moves.

Temporal-difference learning. (Diagram: how the temporal-difference error between F(x_{t+1}) and F(x_t) is fed back to earlier positions F(x_{t-1}), F(x_{t-2}), F(x_{t-3}). With λ = 0 only the most recent position is updated; with λ = 0.5 the influence decays geometrically; with λ = 1 all previous positions are weighted equally.)
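
A sketch of the update for a single game, assuming a linear evaluation function F(x) = w·f(x) (so ∇_w F(x) is just the feature vector f(x)) and a terminal reward at the end of the game:

```python
import numpy as np

def td_lambda_episode(position_features, final_reward, w, alpha=0.01, lam=0.7):
    """position_features[t] is the feature vector f(x_t) of the position at time t."""
    trace = np.zeros_like(w)                 # eligibility trace: sum of lam^(t-k) * grad F(x_k)
    for t in range(len(position_features) - 1):
        f_t = np.asarray(position_features[t], dtype=float)
        f_next = np.asarray(position_features[t + 1], dtype=float)
        trace = lam * trace + f_t            # decay older gradients, add the current one
        delta = w.dot(f_next) - w.dot(f_t)   # temporal-difference error F(x_{t+1}) - F(x_t)
        w = w + alpha * delta * trace
    # Final step: the game outcome (reward) replaces the next evaluation.
    f_last = np.asarray(position_features[-1], dtype=float)
    trace = lam * trace + f_last
    w = w + alpha * (final_reward - w.dot(f_last)) * trace
    return w
```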

Temporal-difference learning (1): TD-Gammon [Tesauro 1992-]. A neural network whose input is the raw board information, trained with TD(λ) by self-play (300,000 games), reached human expert level.

Self-play in other games: none of TD-Gammon's successors achieved performance as impressive as TD-Gammon's. In the case of backgammon, the dice roll before each move ensured sufficient variety in the positions encountered; other games must deal with the exploration-exploitation dilemma explicitly.

Temporal-difference learning (2): KnightCap [Baxter et al. 1998]. It learned while playing on an Internet chess server, using 1468 linearly combined features × 4 game stages and TDLeaf(λ), a variant that applies the TD update to the evaluation of the leaf of the principal variation found by search rather than to the root position.
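
A rough sketch of the difference from plain TD(λ), reusing the td_lambda_episode sketch above; the game/search interface here is hypothetical:

```python
def play_and_learn_tdleaf(game, w, depth, alpha=0.01, lam=0.7):
    """Self-play one game. For each position actually reached, record the feature
    vector of the principal-variation leaf found by search, then apply the
    TD(lambda) update to those leaf evaluations instead of the root evaluations."""
    pos = game.initial_position()
    leaf_features = []
    while not game.is_terminal(pos):
        move, pv_leaf = game.search(pos, depth, w)   # hypothetical: best move and PV leaf
        leaf_features.append(game.features(pv_leaf))
        pos = game.apply(pos, move)
    return td_lambda_episode(leaf_features, game.outcome(pos), w, alpha, lam)
```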

KnightCap's rating: all parameters except the material values were initially set to zero. After about 1000 games on the server its rating had improved to exceed 2150, an improvement from average amateur to strong expert.

Conclusion

Machine learning in games: successful in simple games; used only in limited domains in complex games such as Shogi. Reinforcement learning is successful in stochastic games such as backgammon.