Understanding AlphaGo

Go Overview
- Originated in ancient China about 2,500 years ago
- A two-player game
- Goal: surround more territory than the opponent
- Played on a 19×19 grid board with playing pieces called "stones"
- A turn = place a stone or pass
- The game ends when both players pass

Go Overview: only two basic rules
1. Capture rule: stones that have no liberties are captured and removed from the board
2. Ko rule: a player is not allowed to make a move that returns the game to the previous position

Go Overview: a final position. Who won? White score: 12, Black score: 13, so Black wins by one point.

Go in a reinforcement learning set-up
- Environment states: S = board configurations
- Actions: A = legal moves (place a stone or pass)
- Transitions between states: deterministic, given by the move played
- Reinforcement function: r(s) = 0 if s is not a terminal state, ±1 (win/loss) otherwise
- Goal: find a policy that maximizes the expected total payoff
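As a toy illustration of this sparse reward structure, here is a minimal Python sketch; the `state` object with `is_terminal()` and `winner()` methods is hypothetical, not an AlphaGo API.

```python
# Minimal sketch of Go's sparse reward: 0 at every non-terminal state,
# +1/-1 only at the end of the game. `state` is a hypothetical object
# with is_terminal() and winner() methods (an assumption for this sketch).
def reward(state, player):
    if not state.is_terminal():
        return 0
    return +1 if state.winner() == player else -1
```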

Why is it hard for computers to play Go?
- The number of possible board configurations is extremely high (~10^700), so brute-force exhaustive search is impossible
- Chess: b ≈ 35, d ≈ 80; Go: b ≈ 250, d ≈ 150
- Main challenges: the branching factor and the value function (evaluating a position)
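A quick back-of-the-envelope check of the slide's Shannon-style estimates (branching factor b, typical game depth d), just to show the scale involved:

```python
# Rough game-tree sizes from the slide's estimates: tree size ~ b**d,
# so its number of decimal digits is about d * log10(b).
import math

for game, b, d in [("chess", 35, 80), ("go", 250, 150)]:
    digits = d * math.log10(b)
    print(f"{game}: b**d ~ 10**{digits:.0f}")
# chess: b**d ~ 10**124
# go:    b**d ~ 10**360
```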

Training the Deep Neural Networks
[Pipeline diagram: human expert (state, action) pairs train the policy network; self-play (state, win/loss) pairs train the value network; both feed into Monte Carlo Tree Search]

Training the Deep Neural Networks: two networks are trained, a policy network and a value network.

The SL policy network
- Trained on ~30 million human expert (state, action) pairs
- Goal: maximize the log likelihood of the expert's action
- Input: 19×19×48 (48 feature planes)
- Architecture: 12 convolutional + rectifier layers, then a softmax producing a 19×19 probability map over moves
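A minimal PyTorch sketch of such a network. The 192 filters, the 5×5 first convolution, and the 3×3 convolutions afterwards are assumptions (paper-style choices), not stated on the slide:

```python
# Sketch of an AlphaGo-style SL policy network: 48 input planes on a
# 19x19 board, a stack of convolution + ReLU layers, softmax over moves.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, planes=48, filters=192, layers=12):
        super().__init__()
        blocks = [nn.Conv2d(planes, filters, 5, padding=2), nn.ReLU()]
        for _ in range(layers - 1):
            blocks += [nn.Conv2d(filters, filters, 3, padding=1), nn.ReLU()]
        blocks += [nn.Conv2d(filters, 1, 1)]   # 1x1 conv down to a single plane
        self.body = nn.Sequential(*blocks)

    def forward(self, x):                      # x: (batch, 48, 19, 19)
        logits = self.body(x).flatten(1)       # (batch, 361) = one logit per move
        return torch.log_softmax(logits, dim=1)

# Training maximizes the log likelihood of the expert move, i.e. minimizes
# the negative log likelihood of the recorded action index.
net = PolicyNet()
loss_fn = nn.NLLLoss()
```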

Bigger networks are better but slower. Move-prediction accuracy:
- AlphaGo (all input features): 57.0%
- AlphaGo (only raw board position): 55.7%
- Previous state of the art: 44.4%

A much shallower rollout policy (same softmax probability-map output as the 12-layer network) trades accuracy for speed:
- Forwarding time 3 milliseconds: 55.4% accuracy (full policy network)
- Forwarding time 2 microseconds: 24.2% accuracy (fast rollout policy)

The RL policy network
- Same architecture as the SL policy (19×19×48 input, 12 convolutional + rectifier layers, softmax probability map)
- Trained by stochastic gradient ascent (SGA) through self-play
- Plays against randomly selected previous iterations of itself, preventing overfitting
- Won more than 80% of the games against the SL policy
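A hedged sketch of this REINFORCE-style self-play update. The exact loss shape and optimizer details are assumptions; `net` is the policy network sketched above, returning log-probabilities:

```python
# REINFORCE-style update: after a self-play game with outcome z (+1 win,
# -1 loss), push the log-probabilities of the moves actually played
# toward z, i.e. gradient ascent on z * log pi(a|s).
import torch

def reinforce_update(net, optimizer, states, actions, z):
    """states: (T, 48, 19, 19); actions: (T,) move indices; z: +1 or -1."""
    log_probs = net(states)                                  # (T, 361)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = -(z * chosen).mean()    # minimizing this = ascent on z * log pi
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```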

Training the Deep Neural Networks
[Pipeline diagram: ~30 million human expert positions and ~30 million self-play position states feed the networks and Monte Carlo Tree Search]

The value network: position evaluation
- Approximates the optimal value function
- Input: a state; output: the probability of winning
- Goal: minimize the MSE between the predicted value and the game outcome
- Overfitting risk: positions within a game are strongly correlated, so training positions are sampled from distinct self-play games
- Architecture: 19×19×48 input, convolutional + rectifier layers, a fully connected layer, and a single scalar output
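A minimal sketch of this objective, assuming a `value_net` that maps board tensors to a scalar prediction in [-1, 1] (the output squashing is an assumption):

```python
# Value-network objective: regress the predicted value v(s) onto the
# actual game outcome z, minimizing the mean squared error.
import torch

def value_loss(value_net, states, outcomes):
    """states: (batch, 48, 19, 19); outcomes: (batch,) results in {-1, +1}."""
    v = value_net(states).squeeze(-1)        # predicted value per position
    return torch.mean((v - outcomes) ** 2)   # MSE between prediction and outcome
```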

Monte Carlo methods use repeated random sampling to obtain numerical results. Monte Carlo Tree Search (MCTS) applies this idea as a search method for making optimal decisions in artificial intelligence problems. Before AlphaGo, the strongest Go AIs (Fuego, Pachi, Zen, and Crazy Stone) all relied on MCTS.
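A toy example of the idea, unrelated to Go: estimating a probability by repeated random sampling.

```python
# Estimate the probability that a fair die beats a roll of 3 by
# repeated random sampling; the exact answer is 1/2.
import random

trials = 100_000
wins = sum(random.randint(1, 6) > 3 for _ in range(trials))
print(wins / trials)  # ~0.5
```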

Monte Carlo Tree Search: each round consists of four steps
1. Selection
2. Expansion
3. Simulation
4. Backpropagation

MCTS – Upper Confidence Bounds for Trees (UCT): handles the exploration/exploitation tradeoff. Kocsis, L. & Szepesvári, C., "Bandit based Monte-Carlo planning" (2006) showed convergence to the optimal solution. A node i is scored as

    W_i / n_i + C * sqrt(ln t / n_i)

where the first term drives exploitation and the second exploration:
- W_i = number of wins after visiting node i
- n_i = number of times node i has been visited
- t = number of times node i's parent has been visited
- C = exploration parameter
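A compact sketch putting the four steps and the UCB1 rule together. The Game API (`legal_moves`, `play`, `is_terminal`, `random_playout`) is hypothetical, and for brevity the sketch ignores the per-ply sign flipping a real two-player implementation needs:

```python
import math, random

C = 1.4  # exploration parameter

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.W, self.N = [], 0, 0

def ucb1(node):
    # W/n exploitation term + C * sqrt(ln t / n) exploration term
    return node.W / node.N + C * math.sqrt(math.log(node.parent.N) / node.N)

def mcts(root, game, iterations=1000):
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while every child has been visited
        while node.children and all(c.N > 0 for c in node.children):
            node = max(node.children, key=ucb1)
        # 2. Expansion: add children for the legal moves
        if not node.children and not game.is_terminal(node.state):
            node.children = [Node(game.play(node.state, m), node)
                             for m in game.legal_moves(node.state)]
        if node.children:
            unvisited = [c for c in node.children if c.N == 0]
            node = random.choice(unvisited or node.children)
        # 3. Simulation: random rollout to a terminal state, result +1/-1
        z = game.random_playout(node.state)
        # 4. Backpropagation: update win totals and visit counts up to the root
        while node:
            node.N += 1
            node.W += z   # NOTE: a two-player version must negate z per ply
            node = node.parent
    return max(root.children, key=lambda c: c.N)  # most-visited move
```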

AlphaGo's MCTS (selection, expansion, evaluation, backpropagation). Each edge (s, a) stores:
- Q(s, a) – action value (average value of the subtree)
- N(s, a) – visit count
- P(s, a) – prior probability from the policy network
Why not use the RL policy for the priors? The SL policy worked better in search, presumably because humans select a diverse beam of promising moves, whereas the RL policy concentrates on a single best move.
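A small sketch of that selection rule: pick the action maximizing Q(s, a) + u(s, a), with a PUCT-style bonus u(s, a) ∝ P(s, a) · sqrt(Σ_b N(s, b)) / (1 + N(s, a)). The constant `c_puct` and the exact bonus form are assumptions for this sketch:

```python
import math

def select_edge(edges, c_puct=5.0):
    """edges: objects with fields Q, N, P for the actions from one node."""
    total_n = sum(e.N for e in edges)
    def score(e):
        u = c_puct * e.P * math.sqrt(total_n) / (1 + e.N)  # prior-weighted bonus
        return e.Q + u   # exploit via Q, explore via u (decays as N grows)
    return max(edges, key=score)
```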

AlphaGo's MCTS: leaf evaluation combines two signals
1. The value network's prediction
2. A random rollout played until the terminal state
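A one-line sketch of the blend; λ = 0.5 (equal weighting) is the value reported to work best in the paper, but treat the default here as an assumption:

```python
def leaf_value(v_network, z_rollout, lam=0.5):
    """Mix the value network's estimate with the rollout outcome."""
    return (1 - lam) * v_network + lam * z_rollout
```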

AlphaGo's MCTS: how is the next move chosen? By maximum visit count, which is less sensitive to outliers than the maximum action value.

AlphaGo

AlphaGo vs. experts: 5:0 against Fan Hui and 4:1 against Lee Sedol.

Take home: a modular system; reinforcement learning combined with deep learning; generic rather than game-specific methods.

Critical differences between AlphaGo and Deep Blue
- AlphaGo used general-purpose algorithms, not a set of handcrafted rules
- A modular system combining planning and pattern recognition – closer to how a human thinks
Match notes: Lee Sedol (South Korea), among the world's best players – a 5-game match. Fan Hui, European champion – AlphaGo won 5/5.