Computer Go : A Go player Rohit Gurjar CS365 Project Presentation, IIT Kanpur Guided By – Prof. Amitabha Mukerjee.


1 Computer Go : A Go player Rohit Gurjar CS365 Project Presentation, IIT Kanpur rgurjar@iitk.ac.in Guided By – Prof. Amitabha Mukerjee

2 Introduction to the game. A two-player board game, around 3000 years old. Players have white and black stones. A stone (or a group) is captured if it is completely surrounded by enemy stones. The goal is to surround territory (empty points or enemy stones) with one's own stones. Image ref: http://www.sente.ch/software/goban/BoardBig.jpg

3 Approach. Board size: 9x9. Aim: to learn an evaluation function f : state -> (-1,1). At any state, the next move is chosen such that the resulting state has the maximum value. Problem: the reward arrives only at the end of the game. Implementation: temporal difference (TD) learning, with a neural network doing the learning. Ref: A machine learning approach to computer Go (2007) – Jeffrey Bagdis
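
The move-selection rule on this slide can be sketched as a one-ply greedy search. This is a minimal illustration, not the project's code: `choose_move`, the `State` encoding, and the toy 4-point "board" with its fixed weights are all hypothetical stand-ins, and no real Go rules are implemented.

```python
import math
from typing import Callable, Iterable, Optional, Tuple

State = Tuple[int, ...]  # hypothetical flat board encoding: -1, 0 or 1 per point


def choose_move(
    state: State,
    legal_moves: Callable[[State], Iterable[int]],
    apply_move: Callable[[State, int], State],
    evaluate: Callable[[State], float],
) -> Optional[int]:
    """Return the move whose resulting state has the highest value (None if no move)."""
    best_move, best_value = None, float("-inf")
    for move in legal_moves(state):
        value = evaluate(apply_move(state, move))
        if value > best_value:
            best_move, best_value = move, value
    return best_move


# Toy stand-ins (not real Go rules): a 4-point board and a fixed linear evaluator.
TOY_WEIGHTS = (0.1, 0.4, -0.2, 0.3)


def toy_legal_moves(state: State) -> Iterable[int]:
    # Every empty point counts as a legal move in this toy setting.
    return [i for i, p in enumerate(state) if p == 0]


def toy_apply_move(state: State, move: int) -> State:
    # Place one of our own stones (encoded as 1) on the chosen point.
    return state[:move] + (1,) + state[move + 1:]


def toy_evaluate(state: State) -> float:
    # tanh keeps the value inside (-1, 1), matching f : state -> (-1,1).
    return math.tanh(sum(w * p for w, p in zip(TOY_WEIGHTS, state)))
```

In the real player, `evaluate` would be the trained network and `legal_moves` / `apply_move` would implement the rules of Go, including captures.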

4 Neural Networks: Perceptron. The perceptron is a simplified computational model of a biological neuron. Perceptron: input vector -> (-1,1). It is trained to produce a desired output (target) for a given input by adjusting its input weights appropriately. Perceptrons can be connected together to form multi-layer networks. Ref: A machine learning approach to computer Go (2007) – Jeffrey Bagdis
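
As a rough sketch of the unit described on this slide, the following code maps an input vector into (-1, 1) and nudges its weights toward a target. The `tanh` squashing function, the gradient (delta-rule) update, and the names `Perceptron`, `output`, `train` and `rate` are assumptions made for illustration; the slides do not give the exact formulas used.

```python
import math
from typing import List, Sequence


class Perceptron:
    """Single unit: weighted sum of inputs squashed into (-1, 1) by tanh (assumed)."""

    def __init__(self, n_inputs: int, rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_inputs
        self.bias = 0.0
        self.rate = rate  # learning parameter (step size)

    def output(self, x: Sequence[float]) -> float:
        s = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return math.tanh(s)

    def train(self, x: Sequence[float], target: float) -> float:
        """One gradient step on squared error; returns the error before the update."""
        y = self.output(x)
        error = target - y
        delta = error * (1.0 - y * y)  # derivative of tanh is 1 - y^2
        self.weights = [w + self.rate * delta * xi for w, xi in zip(self.weights, x)]
        self.bias += self.rate * delta
        return error


if __name__ == "__main__":
    unit = Perceptron(n_inputs=3)
    sample = [1.0, -1.0, 0.0]
    for _ in range(200):
        unit.train(sample, target=0.5)
    print(unit.output(sample))  # approaches the target of 0.5
```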

5 Multilayer Network Ref : A machine learning approach to computer Go (2007) – Jeffrey Bagdis

6 Learning. Two phases. Initial learning: randomly generated symmetric states, with the target value set to zero for all states. Actual learning: play a game, map the final score to a target value, and set the target values for all the intermediate states according to the final result.

7 TD learning. Target value for all the states: the final reward. Network used: a 3-layer neural network. Board representation: 83 inputs (81 board points with values in {-1,0,1}, one for score(), one for gameLength). Ref: Learning to predict by temporal difference methods – R.S. Sutton, 1988
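
The 83-element input described on this slide could be assembled as follows. This is a sketch under assumptions: the function name `encode_state`, the row-major board layout, and passing the score and game length unscaled are all choices made here, since the slides do not say how those two extra inputs were normalised.

```python
from typing import List, Sequence

BOARD_POINTS = 9 * 9  # 81 intersections on a 9x9 board


def encode_state(board: Sequence[int], score: float, game_length: int) -> List[float]:
    """Build the 83-element input: 81 board points, the score, the game length."""
    if len(board) != BOARD_POINTS:
        raise ValueError(f"expected {BOARD_POINTS} board points, got {len(board)}")
    if any(p not in (-1, 0, 1) for p in board):
        raise ValueError("each board point must be -1, 0 or 1")
    # Assumption: score and game length are appended as raw, unscaled values.
    return [float(p) for p in board] + [float(score), float(game_length)]


if __name__ == "__main__":
    empty_board = [0] * BOARD_POINTS
    vector = encode_state(empty_board, score=0.0, game_length=0)
    print(len(vector))  # 83
```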

8 Network Architecture. Input layer: 83 nodes. Hidden layer: twice as many nodes as the input layer. Output layer: one node. The network takes a target value for a state and modifies its weights to correct the evaluation function. Initialization: random weights. With random weights it lost all its games against a random player.
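
A network with the layer sizes on this slide (83 inputs, 166 hidden nodes, one output) might look like the sketch below. Only the layer sizes and the random initialization come from the slide. The `tanh` activations, bias terms, weight range, plain backpropagation step, and the names `EvalNet`, `evaluate` and `train` are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


class EvalNet:
    """83 -> 166 -> 1 network with tanh units, trained by one backprop step per target."""

    def __init__(self, n_in: int = 83, rate: float = 0.05, seed: int = 0) -> None:
        rng = random.Random(seed)
        n_hidden = 2 * n_in  # hidden layer is twice the input layer
        # Each row holds the weights of one hidden node; the last entry is its bias.
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        # Output node weights; the last entry is its bias.
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)]
        self.rate = rate

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], List[float], float]:
        xb = list(x) + [1.0]  # append constant bias input
        hidden = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = hidden + [1.0]
        out = math.tanh(sum(w * v for w, v in zip(self.w2, hb)))
        return xb, hb, out

    def evaluate(self, x: Sequence[float]) -> float:
        """Value of a state, always inside (-1, 1)."""
        return self._forward(x)[2]

    def train(self, x: Sequence[float], target: float) -> float:
        """Move the output toward `target`; returns the output before the update."""
        xb, hb, out = self._forward(x)
        d_out = (target - out) * (1.0 - out * out)
        # Hidden deltas are computed with the output weights from before the update.
        d_hid = [d_out * self.w2[j] * (1.0 - hb[j] * hb[j])
                 for j in range(len(self.w1))]
        self.w2 = [w + self.rate * d_out * h for w, h in zip(self.w2, hb)]
        for j, row in enumerate(self.w1):
            step = self.rate * d_hid[j]
            self.w1[j] = [w + step * v for w, v in zip(row, xb)]
        return out
```

Repeated calls to `train` with the same state and target should shrink the gap between `evaluate(state)` and the target, which is the "correct the evaluation function" step the slide describes.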

9 Results. Initial learning: not much improvement; won 6-7 games out of 50 against a random player. Actual learning: after 100 games of learning by self-play, won 87 games out of 100; after the next 1000 games of learning, results were still good but worse than before.

10 Reason: exponential decay factor. Effect of the final score on intermediate states: the target value decays exponentially, going backward through the intermediate states. Initially the factor was (0.9)^(T-t) for the state at time t, where T is the total number of steps. While learning, very small scores were obtained (-10..10). Played 1000 learning games again with (0.99)^(T-t). This worked well: won 94 games out of 100.
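
The backward decay of the target value can be sketched as below. The function name `decayed_targets` and the convention that states are indexed t = 0..T are assumptions. The 60-move game length in the usage example is also hypothetical, chosen only to show how differently the two decay factors treat early states.

```python
from typing import List


def decayed_targets(final_target: float, total_steps: int, decay: float) -> List[float]:
    """Target for the state at each time t = 0..T: final_target * decay ** (T - t)."""
    if total_steps < 0:
        raise ValueError("total_steps must be non-negative")
    return [final_target * decay ** (total_steps - t) for t in range(total_steps + 1)]


if __name__ == "__main__":
    # Hypothetical 60-move game that ends with a target of 1.0.
    for decay in (0.9, 0.99):
        targets = decayed_targets(1.0, 60, decay)
        print(decay, round(targets[0], 4), targets[-1])
```

With a decay of 0.9 the opening position of such a game receives a target of roughly 0.002, so early states get almost no signal from the result; with 0.99 the same position receives roughly 0.55.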

11 Important features (next 1000 games). Learning parameter: initially 0.1; after 1000 games learning had become slow; new value: 1. Sigmoid function: initially a = 1; with a = 1, y has a value close to 1, which gives a very slow rate of change of the weights; after 100 games, new value: 0.1.
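
The effect of the sigmoid parameter a can be illustrated numerically. The slides do not give the exact sigmoid, so this sketch assumes y = tanh(a * s), where s is the weighted input sum; the names `squash` and `squash_slope` are hypothetical. The weight update is proportional to the slope dy/ds, so a saturated output (y close to 1) means very small weight changes.

```python
import math


def squash(s: float, a: float) -> float:
    """Assumed sigmoid: y = tanh(a * s), with output in (-1, 1)."""
    return math.tanh(a * s)


def squash_slope(s: float, a: float) -> float:
    """dy/ds = a * (1 - y^2); weight updates scale with this factor."""
    y = squash(s, a)
    return a * (1.0 - y * y)


if __name__ == "__main__":
    s = 5.0  # hypothetical large weighted sum
    for a in (1.0, 0.1):
        print(a, squash(s, a), squash_slope(s, a))
```

For the hypothetical input sum of 5, a = 1 puts the output above 0.999 with a slope near zero, while a = 0.1 keeps the output near 0.46 with a much larger slope, consistent with the slow learning reported for a = 1.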

12 Important features. Not much improvement in the next 2000 games. Also experimented with the map from final score to target value: score in (-81, 81), target in (-1, 1). This map does not affect the learning much.
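
One simple map from the score range to the target range is a linear rescaling, sketched below. The slides do not say which maps were tried, so the linear form and the name `score_to_target` are assumptions.

```python
MAX_SCORE = 81  # number of points on a 9x9 board


def score_to_target(score: float) -> float:
    """Linearly rescale a final score from (-81, 81) to a target in (-1, 1)."""
    if not -MAX_SCORE <= score <= MAX_SCORE:
        raise ValueError(f"score must lie within [-{MAX_SCORE}, {MAX_SCORE}]")
    return score / MAX_SCORE


if __name__ == "__main__":
    for score in (-10, 0, 10):
        print(score, score_to_target(score))
```

Under this map, the small scores of -10..10 seen during learning correspond to targets of only about -0.12..0.12.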

13 Future Work. Perhaps the learning parameter has to be increased as the number of learning games increases. An appropriately chosen exponential decay factor will improve learning. In scoring: removal of dead stones. The player still has to be tested against GNU Go or a human player.

14 References. Learning to predict by temporal difference methods (1988) – R.S. Sutton. A machine learning approach to computer Go (2007) – Jeffrey Bagdis.

