IGB GO —— A self-learning GO program
Lin WU
Information & Computer Science, University of California, Irvine

Outline
Background:
– What is GO?
– Existing GO programs
IGB GO
Past work:
– Three past scenarios
– Present scenario
Discussion
Conclusion
Demo

What is GO
Black and White play alternately; Black plays first.
Basic concepts:
– Liberty
– Eye
– Territory
– Unconditional life
– Position

What is GO (cont.)
Rules:
– Stones are captured when their liberties drop to 0.
– Captured stones are removed from the board.
– The winner is determined by counting territory.
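As an aside, the capture rule above translates directly into code. Below is a minimal sketch (not part of IGB GO; the board encoding and function name are assumptions): a group's liberties are the empty points adjacent to it, found by flood fill, and the group is captured when that set is empty.

```python
# Minimal sketch of the capture rule (illustrative only, not IGB GO code).
# board[i][j] is 'B' (black), 'W' (white), or '.' (empty).
from typing import List, Set, Tuple

def liberties(board: List[List[str]], x: int, y: int) -> Set[Tuple[int, int]]:
    """Return the liberties of the group containing the stone at (x, y)."""
    n = len(board)
    color = board[x][y]
    group, libs, stack = {(x, y)}, set(), [(x, y)]
    while stack:  # flood fill over same-colored neighbors
        cx, cy = stack.pop()
        for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
            if 0 <= nx < n and 0 <= ny < n:
                if board[nx][ny] == '.':
                    libs.add((nx, ny))
                elif board[nx][ny] == color and (nx, ny) not in group:
                    group.add((nx, ny))
                    stack.append((nx, ny))
    return libs  # the group is captured when this set is empty
```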

Existing GO programs
There are many existing GO programs:
– KCC Igo
– HARUKA
– Go++
– Goemate
– Handtalk
– The Many Faces of Go
– GNU GO
– NeuroGo
– etc.
None of them can beat average amateur players.

Conceptual Architecture
Pattern libraries:
– Library for the opening
– Library for corners
– Library for the internal part of the board
– Libraries for attack, defense, connection, etc.
Engine: match the board position against the libraries.
Evaluation: determine the best move if there are multiple hits.
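As a toy illustration of this architecture (not the actual library format of any program named here), a pattern library can be as simple as a map from local configurations to suggested replies, with the engine scanning the board for matches:

```python
# Toy pattern library: a 3x3 neighborhood, written row by row with
# 'B'/'W'/'.', mapped to a suggested reply relative to its top-left
# corner. The pattern shown is a hypothetical example.
PATTERNS = {
    "BW."
    ".B."
    "...": (1, 2),  # hypothetical reply
}

def suggest(board, x, y):
    """Return a suggested move if the 3x3 area centered at (x, y) matches."""
    n = len(board)
    if not (1 <= x < n - 1 and 1 <= y < n - 1):
        return None
    key = "".join(board[i][j]
                  for i in range(x - 1, x + 2)
                  for j in range(y - 1, y + 2))
    move = PATTERNS.get(key)
    return (x - 1 + move[0], y - 1 + move[1]) if move else None
```

An evaluation step would then rank the hits; real systems store tens of thousands of such patterns, as the next two slides show.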

Architecture I
The Many Faces of Go (1981–now):
– Knowledge Representation in The Many Faces of Go, David Fotland, February 27, 1993
– Joseki database of standard corner patterns (36,000 moves)
– A pattern database of 8x8 patterns (4,000 moves)
– A rule-based expert system with about 200 rules that suggests plausible moves for full-board evaluation

Architecture II
GNU GO ( – now):
– GNU GO documentation
– Pattern libraries:
  General: patterns.db, patterns2.db
  Fuseki (opening): fuseki.db
  Eyes: eyes.db
  Connection: conn.db
  Influence: influence.db, barriers.db
  Etc.
– GNU GO engine: calculates states at different levels, pattern matching, move reasoning

Why a pattern-based system?
Simple rules don't mean a simple game:
– Simple rules mean an extremely huge search space.
Board evaluation is hard, especially in the middle of the game:
– The representation space is extremely huge.
– The evaluation function is sensitive to small differences in its input.
– Result: to get reliable evaluation results, the search depth has to be very high.
Pattern-based system:
– Avoids search through pattern matching.

Complexity —— Search time

Search level   5x5        7x7        9x9        19x19
2              625        2,401      6,561      130,321
3              15,625     117,649    531,441    4.7E7
4              390,625    5.8E6      4.3E7      1.7E10
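Each cell is simply the number of board points raised to the search depth, so the table can be reproduced in a few lines:

```python
# Reproduce the search-time table: positions examined by exhaustive
# search to a given depth is (number of board points) ** depth.
for n in (5, 7, 9, 19):
    for depth in (2, 3, 4):
        print(f"{n}x{n}, depth {depth}: {(n * n) ** depth:.1e}")
```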

Problems of a pattern-based system
Everything is manual work.
As the system becomes larger, it is harder to improve the pattern database.
As the database becomes larger, it is more likely to be inconsistent.
Result:
– Performance improves more slowly as the program gets better.

Outline
Background:
– What is GO?
– Existing GO programs
IGB GO
Past work:
– Three past scenarios
– Present scenario
Discussion
Conclusion
Demo

IGB GO
A GO program that can improve its performance automatically.
How?
– Use artificial neural networks to learn the evaluation function.
– Improve the quality of the neural networks by improving the quality of the training data.

Architecture of the neural networks
6 planes:
– 1 input plane
– 1 output plane
– 4 transmission planes
Use a recurrent neural network to learn two functions.
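The slide gives only the plane structure, so the following is a rough sketch built on stated assumptions (local 3x3 connectivity, tanh transmission units, a sigmoid output, and a fixed number of recurrent steps are all mine, not the talk's): an input plane feeds four recurrently updated transmission planes, which are combined into a per-intersection output plane.

```python
# Rough sketch of a plane-structured recurrent evaluator; every shape
# and the update rule are assumptions beyond "1 input, 4 transmission,
# 1 output plane, recurrent".
import numpy as np
from scipy.signal import convolve2d

N, PLANES, STEPS = 9, 4, 5  # board size, transmission planes, recurrent steps
rng = np.random.default_rng(0)
k_in = rng.normal(0, 0.1, (PLANES, 3, 3))            # input -> transmission
k_rec = rng.normal(0, 0.1, (PLANES, PLANES, 3, 3))   # transmission -> transmission
k_out = rng.normal(0, 0.1, (PLANES,))                # transmission -> output

def evaluate(board: np.ndarray) -> np.ndarray:
    """board: N x N array with +1 (black), -1 (white), 0 (empty).
    Returns an N x N plane of per-intersection values in (0, 1)."""
    h = np.zeros((PLANES, N, N))
    for _ in range(STEPS):  # recurrent propagation across the board
        new_h = np.zeros_like(h)
        for p in range(PLANES):
            new_h[p] = convolve2d(board, k_in[p], mode='same')
            for q in range(PLANES):
                new_h[p] += convolve2d(h[q], k_rec[p, q], mode='same')
        h = np.tanh(new_h)
    out = np.tensordot(k_out, h, axes=1)  # weighted sum of planes per point
    return 1.0 / (1.0 + np.exp(-out))     # sigmoid output plane
```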

How to improve the training data
1. Initialize a group of neural networks.
2. Let the neural networks play against each other.
3. Identify the set of good moves.
4. Train the neural networks on those good moves.
5. Repeat from step 2.
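In code form, the loop might look like the skeleton below; the three helpers are placeholders whose concrete definitions are exactly what the scenarios on the following slides vary.

```python
# Skeleton of the self-improvement loop; play_games, identify_good_moves,
# and train are placeholders, defined differently in each scenario.
def improve(networks, iterations):
    for _ in range(iterations):
        games = play_games(networks)             # step 2: mutual play
        good_moves = identify_good_moves(games)  # step 3: label training data
        train(networks, good_moves)              # step 4: update the networks
    return networks
```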

Two key issues of this system
Given the neural networks, how to identify “the good moves”.
Given the good moves, how to improve the neural networks' performance efficiently.

Outline
Background:
– What is GO?
– Existing GO programs
IGB GO
Past work:
– Three past scenarios
– Present scenario
Discussion
Conclusion
Demo

Play against itself
1. Randomly initialize a neural network.
2. The neural network plays against itself over a set of initial setups.
3. If Black (or White) wins, learn Black's (or White's) moves.
4. Update the weights; repeat from step 2.

Play against itself — Good move identification
Winner: the color that gets the larger territory.
Good moves: all the moves played by the winning color.

Play against itself — Results
Results:
– First, it improves.
– Then, it begins to get worse.
– Last, it learns a very deterministic and bad pattern.
Improvement: no guarantee.

Group playing
1. Initialize a group of neural networks (18).
2. Randomly assign each neural network to another as a pair.
3. The members of each pair play against each other.
4. Identify the set of good moves.
5. Train the losing neural networks on those good moves.
6. Repeat from step 2.

Group playing — Good move identification
Each pair has two players (A and B).
Game 1: A plays Black, B plays White; the result is R1.
Game 2: B plays Black, A plays White; the result is R2.
If R1 > R2, then A is the better player and B is the loser, so B learns all the moves played by A.
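In code, the comparison might look like this sketch (`play` is a placeholder assumed to return Black's territory margin):

```python
# Sketch of the color-swapped pair comparison; play(black, white) is a
# placeholder assumed to return Black's territory margin.
def find_loser(a, b):
    r1 = play(black=a, white=b)  # game 1: A takes Black
    r2 = play(black=b, white=a)  # game 2: B takes Black
    return b if r1 > r2 else a   # the loser learns the winner's moves
```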

Group playing — Results
Results:
– Improves at the beginning.
– If one player dominates, the whole system degrades as in “play against itself”.
– No indication of convergence so far (9 machines, 1 month, on a 9x9 board).
Improvement: no guarantee.

ABC scenario
1. Initialize a group of neural networks.
2. Randomly pick three different neural networks (A, B, C) as a group.
3. Let A and B play against each other.
4. Identify the set of good moves.
5. Train the neural networks on those good moves.
6. Repeat from step 2.

ABC scenario — Good move identification
For a given pair with players A and B:
– Suppose B is the loser. Randomly assign a teacher C.
– For every one of B's turns, C tells B what move C would make:
  C's suggested move is the same as B's, or
  C's suggested move is different from B's.
– Based on C's suggested moves, A plays against B again; each suggestion can make the result:
  Better: an understandable good move
  The same
  Worse
The set of good moves is all the understandable good moves.

ABC scenario — Results
Results:
– It took 1 week to get a best player from 3 randomly initialized players.
– That best player was then beaten by another randomly initialized player.
– The speed of improvement became slower as performance increased.
Improvement: guaranteed.
Training speed: unacceptably slow.

Present scenario
Output representation:
– Two papers:
  Temporal Difference Learning of Position Evaluation in the Game of Go, Nicol N. Schraudolph, Peter Dayan, and Terrence J. Sejnowski, Advances in Neural Information Processing 6, 1994
  Learning to evaluate GO positions via temporal difference methods, Nicol N. Schraudolph, Peter Dayan, and Terrence J. Sejnowski, Soft Computing Techniques in Game Playing, 2000
– Each intersection has an output: a real number in [0, 1].
– Its meaning changes from the likelihood of making a move to the likelihood of securing that intersection as black territory at the end of the game.
– Reinforcement learning.
Good move identification: reinforcement learning identifies good moves automatically.
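A minimal TD(0) sketch of this idea (in the spirit of the Schraudolph et al. papers; the `net` interface and learning rate are assumptions, not the talk's actual code): each per-intersection prediction is nudged toward the prediction at the next position, and the final position is anchored to the true territory outcome.

```python
# Minimal TD(0) sketch for per-intersection territory prediction.
# `net.predict` / `net.update` and ALPHA are assumed interfaces.
import numpy as np

ALPHA = 0.1  # learning rate (assumed)

def td0_episode(net, positions, final_territory):
    """positions: successive board arrays of one game.
    final_territory: N x N array, 1 where Black owns the point, else 0."""
    values = [net.predict(p) for p in positions]   # N x N predictions in [0, 1]
    values.append(final_territory.astype(float))   # terminal target is the truth
    for t, pos in enumerate(positions):
        td_error = values[t + 1] - values[t]       # bootstrap from the next position
        net.update(pos, ALPHA * td_error)          # move prediction toward target
```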

Present scenario — Results
Improvement: guaranteed.
Training speed: better than the ABC scenario, but still slow.
Results:
– 5x5: beats a random player 100% of the time after hours of training; comparable to GNUGO after weeks of training. Prediction accuracy is >90% once the board is >50% occupied.
– 7x7: after 1 month of training, GNUGO beats it without any difficulty.

Outline
Background:
– What is GO?
– Existing GO programs
IGB GO
Past work:
– Three past scenarios
– Present scenario
Discussion
Conclusion
Demo

Why better results
Old architecture:
– Target is inconsistent.
– Target is harder to learn: spatial complexity 3^25/8 (≈1.1E11, dividing by 8 for board symmetries) for 5x5.
– Quality of training data is bad.
New architecture:
– Target is consistent, and at the end of the game it is the true target.
– Target correlates mainly with local information, so the complexity should be much less than 3^25/8.
– Quality of training data is determined by the neural network itself.

Is present arch. enough —— search time complexity

Board size   2 (25%)    3 (25%)    2 (50%)    3 (50%)
5x5          64         729        4,096      531,441
7x7          4,096      531,441    1.6E7      2.8E11
9x9          1E6        3.5E9      1E12       1.2E19
19x19        1.2E27     8.7E42     1.5E54     7.6E85

(Each entry is 2 or 3 raised to 25% or 50% of the number of board points.)

Known Problems
Intrinsically hard problems:
– No complexity bound on the number of iterations needed to get a better player.
– The representation space is extremely huge.

Known Problems — Technical
Temporary technical problems:
– Lack of a position-level evaluation method.
– Unable to respond to some unusual cases correctly; unable to AUTOMATICALLY identify the unusual cases, which will cause problems.
– Time complexity per iteration:
  Play a match: O(n^6 W)
  Learn a match: O(n^4 W) for TD(0), O(n^6 W) for Q-learning
  (19/5)^6 ≈ 3,011

Bounds for iterations
Maybe exponential.
Observation:
– Human beings: the complexity increases as the level of the player increases.
– Present implementation: same as above.
Important to know:
– How fast does the complexity increase as the level of the player increases?

The complexity could be exponential:
– Suppose one player, or a small group of players, dominates the whole system.
– How much time is needed to obtain a better new player or a better group?
– Repeating the experiment with the same amount of time gives a 50% chance of getting a better one, due to symmetry.
– So the time needed is at least exponential with base 2.

Position-level performance evaluation
With it:
– Study the iteration bounds empirically.
– The evaluation results can be used to find a good tradeoff between performance and search space.
Without it:
– Every method is trial and error, and there are infinitely many potential methods to try.

Time complexity per iteration
Separate “play” and “learn”:
– A database of training data.
– Training data from:
  The best players playing against each other
  An online server
  Manually found ways to beat the best player
– All players learn from the generated training data.

Unusual move identification
Difficulty:
– The search space is huge, so unusual moves are hard to identify automatically.
Possible solution:
– Use a database to record all such moves once they appear.
– It can be implemented in the same way as the training database.

Why it’s so hard
No method touches the tough problems explicitly.
– Key problems:
  An extremely huge search space
  Positions that are hard to evaluate
The present strategy is to reduce the search space by improving the evaluation function.

Why it’s so hard (cont.)
Reinforcement learning may not be enough:
– Nicol N. Schraudolph: 6 years without any observable progress.
– Arthur Samuel: “no progress has been made in overcoming [this defect]” (11 years; see Blondie24).
The neural network may not learn:
– Why? The representation space is huge even for the last move: 90% occupied, on a 9x9 board, with equal numbers of black and white stones.
– Solution:
  Generalization ability
  Automatically identified features

Lesson I
Ability to improve ==? the best
– The speed of improvement says otherwise:
  5x5: hours of training to beat a random player; weeks to be comparable to GNUGO.
  7x7: after 1 month of training, GNUGO is still able to win.

Lesson II
A deterministic function between input and output ==? a function a neural network can learn without any difficulty
No:
– The intrinsic complexity of the function matters.
– A neural network can only learn the correlation between the input and the output, as a result of hill climbing.

Conclusion
A self-learning GO program is possible, but several technically difficult problems remain:
– Automatic feature discovery
– Automatic learning from failure
– Position-level performance evaluation

Demo

Thanks for coming