Mastering the game of Go with deep neural networks and tree search


Mastering the game of Go with deep neural networks and tree search David Silver, Aja Huang, et al.

The Game of Go (围棋, 囲碁, 바둑) Invented in China more than 2,000 years ago. Two simple rules: Players alternately place black and white stones on the intersections of the board; the player controlling the most intersections at the end wins. If all liberties of a stone, or of a set of connected stones, are taken by stones of the other color, it/they are removed from the board. A huge search space: the board has 361 (19×19) intersections, giving approximately 250^150 possible sequences of moves.

Monte Carlo Tree Search (MCTS)
Selection: start from the root R and select successive child nodes down to a leaf node L. Choosing child nodes so that the game tree expands towards the most promising moves is the essence of Monte Carlo tree search (a minimal sketch follows below).
Expansion: unless L ends the game with a win or loss for either player, create one or more child nodes and choose a node C from them.
Simulation: play a random playout from node C.
Backpropagation: use the result of the playout to update the information in the nodes on the path from C back to R.
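The four phases above can be captured in a short, generic sketch. The snippet below is an illustration only, not AlphaGo's actual search: the Node/game interface (is_terminal, legal_moves, play, random_playout) is hypothetical, perspective handling is simplified, and UCB1 is used as one common way to bias selection towards promising moves.

    # A minimal, generic MCTS sketch illustrating the four phases; the game
    # interface used here is a hypothetical placeholder, not AlphaGo's search.
    import math, random

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children = []             # expanded child nodes
            self.wins, self.visits = 0.0, 0

        def ucb1(self, c=1.4):
            # Upper Confidence Bound used during the selection phase
            if self.visits == 0:
                return float("inf")
            return self.wins / self.visits + c * math.sqrt(
                math.log(self.parent.visits) / self.visits)

    def mcts(root, n_iter=1000):
        for _ in range(n_iter):
            # 1. Selection: descend from the root R to a leaf L via UCB1
            node = root
            while node.children:
                node = max(node.children, key=Node.ucb1)
            # 2. Expansion: unless the game is decided, add children and pick C
            if not node.state.is_terminal():
                node.children = [Node(node.state.play(m), parent=node)
                                 for m in node.state.legal_moves()]
                node = random.choice(node.children)
            # 3. Simulation: random playout from C (+1 win / -1 loss for root player)
            result = node.state.random_playout()
            # 4. Backpropagation: update statistics on the path from C back to R
            while node is not None:
                node.visits += 1
                node.wins += (result + 1) / 2   # convert ±1 into a win count
                node = node.parent
        return max(root.children, key=lambda n: n.visits)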

MCTS

Supervised Learning of Policy Network The SL policy network pσ(a|s) alternates between convolutional layers with weights σ and rectifier nonlinearities. A final softmax layer outputs a probability distribution over all legal moves a. The input s to the policy network is a simple representation of the board state. The policy network is trained on randomly sampled state-action pairs (s, a), using stochastic gradient ascent to maximize the likelihood of the human move a selected in state s.
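As a concrete illustration, the sketch below shows one supervised training step, assuming PyTorch; the network body, the 48 input feature planes and the learning rate are simplified placeholders, not the exact 13-layer AlphaGo architecture. Maximizing the log-likelihood of the human move is implemented as minimizing cross-entropy.

    # One supervised training step for the policy network (hedged sketch).
    import torch
    import torch.nn as nn

    policy = nn.Sequential(                          # conv layers + rectifier nonlinearities
        nn.Conv2d(48, 192, 5, padding=2), nn.ReLU(),
        nn.Conv2d(192, 192, 3, padding=1), nn.ReLU(),
        nn.Conv2d(192, 1, 1), nn.Flatten())          # one score per board point (19*19 = 361)
    opt = torch.optim.SGD(policy.parameters(), lr=0.003)

    def sl_step(boards, human_moves):
        # boards: (N, 48, 19, 19) feature planes; human_moves: (N,) move indices
        logits = policy(boards)
        # maximizing the log-likelihood of the human move a in state s is the
        # same as minimizing cross-entropy (the softmax is applied internally)
        loss = nn.functional.cross_entropy(logits, human_moves)
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()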

Reinforcement Learning of Policy Network Games are played between the current policy network pρ and a randomly selected previous iteration of the policy network. Randomizing from a pool of opponents in this way stabilizes training by preventing overfitting to the current policy. A reward function r(s) is used that is zero for all non-terminal time steps t < T. The outcome zt = ±r(sT) is the terminal reward at the end of the game from the perspective of the current player at time step t: +1 for winning and −1 for losing.
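A sketch of this self-play setup, under the same assumptions as above: play_game and the snapshot schedule are hypothetical placeholders, but the structure (random opponent drawn from a pool, terminal reward credited to every time step) follows the description above.

    # Self-play against a pool of earlier policy snapshots (hedged sketch).
    # play_game is a hypothetical helper returning the current player's
    # (state, action) pairs and the terminal result z (+1 win / -1 loss).
    import copy, random

    opponent_pool = []                     # frozen earlier iterations of the policy

    def add_snapshot(policy):
        # periodically freeze the current weights into the opponent pool
        opponent_pool.append(copy.deepcopy(policy))

    def self_play_game(policy, play_game):
        opponent = random.choice(opponent_pool) if opponent_pool else copy.deepcopy(policy)
        trajectory, z = play_game(policy, opponent)
        # r(s_t) = 0 for all t < T, so each step is credited with the final outcome z
        return [(s, a, z) for (s, a) in trajectory]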

Reinforcement Learning of Policy Network Weights ρ are then updated at each time step t by stochastic gradient ascent in the direction that maximizes the expected outcome. To evaluate the performance of the RL policy network in game play, each move is sampled from its output probability distribution over actions.
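Concretely, the weights move in the direction ∂log pρ(at|st)/∂ρ · zt, a REINFORCE-style policy gradient. The sketch below assumes PyTorch and the simplified policy network and optimizer from the earlier snippet; it is illustrative, not the paper's exact optimizer settings.

    # One REINFORCE-style policy-gradient update (hedged sketch).
    import torch

    def rl_step(policy, opt, boards, actions, z):
        # boards: states s_t, actions: moves a_t sampled from the policy,
        # z: terminal outcomes (+1 / -1) broadcast to every time step of the game
        log_probs = torch.log_softmax(policy(boards), dim=1)
        chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)   # log p_rho(a_t|s_t)
        loss = -(chosen * z).mean()        # minimizing this ascends the expected outcome
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()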

Reinforcement Learning of Value Network Estimate a value function vp(s) that predicts the outcome from position s of games played by using policy p for both players. Approximate the value function using a value network vθ(s) with weights θ. Train the weights of the value network by regression on state-outcome pairs (s, z), using stochastic gradient descent to minimize the mean squared error (MSE) between the predicted value vθ(s) and the corresponding outcome z.
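A sketch of this regression step, again assuming PyTorch; the value-network body below is a simplified placeholder, and the (s, z) pairs are assumed to come from self-play games as described above.

    # Value-network regression on state-outcome pairs (hedged sketch).
    import torch
    import torch.nn as nn

    value_net = nn.Sequential(
        nn.Conv2d(48, 192, 5, padding=2), nn.ReLU(),
        nn.Flatten(), nn.Linear(192 * 19 * 19, 256), nn.ReLU(),
        nn.Linear(256, 1), nn.Tanh())              # scalar v_theta(s) in [-1, 1]
    opt = torch.optim.SGD(value_net.parameters(), lr=0.003)

    def value_step(boards, outcomes):
        # boards: positions s as (N, 48, 19, 19) tensors; outcomes: results z as floats
        pred = value_net(boards).squeeze(1)
        loss = nn.functional.mse_loss(pred, outcomes)   # MSE between v_theta(s) and z
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()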

Neural network training pipeline and architecture. D Silver et al. Nature 529, 484–489 (2016), doi:10.1038/nature16961

MCTS in AlphaGo

Thank You