
1 Understanding AlphaGo

2

3 Go Overview: originated in ancient China ~2,500 years ago. A two-player game; the goal is to surround more territory than the opponent. Played on a 19×19 grid board with playing pieces called "stones". A turn = place a stone or pass. The game ends when both players pass.

4 Go Overview: only two basic rules. 1. Capture rule: stones that have no liberties are captured and removed from the board. 2. Ko rule: a player is not allowed to make a move that returns the game to the previous position.

5 Go Overview: a final position. Who won? White score: 12, Black score: 13.

6 Go in a reinforcement learning set-up: environment states S (board positions), actions A (legal moves), transitions between states, and a reinforcement function r(s) = 0 if s is not a terminal state, 1 otherwise (the game outcome). Goal: find a policy that maximizes the expected total payoff.
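
A minimal sketch of this reward structure in Python, assuming a hypothetical `game` object with `is_terminal` and `outcome` helpers; note the slide writes the terminal reward as 1, while the AlphaGo paper uses +1 for a win and -1 for a loss:

```python
# Minimal sketch of the reward described above, for a hypothetical `game`
# object exposing is_terminal(state) and outcome(state) helpers.
def reward(game, state, player):
    """0 for non-terminal states; the game outcome at the end
    (+1 win / -1 loss from `player`'s perspective, as in the AlphaGo paper)."""
    if not game.is_terminal(state):
        return 0
    return 1 if game.outcome(state) == player else -1
```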

7 Why is it hard for computers to play Go? The space of possible games is extremely large (~10^700), so brute-force exhaustive search is impossible: chess has b≈35, d≈80, while Go has b≈250, d≈150. Main challenges: the branching factor and the value function (evaluating positions).
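
A quick back-of-the-envelope comparison of the b^d game-tree estimates quoted above, on a log10 scale so the numbers stay readable; note this b^d estimate (~10^360 for Go) measures the game tree, a different quantity from the ~10^700 count of possible games:

```python
import math

# Game-tree size estimate b**d, expressed as a power of ten.
def log10_tree_size(branching, depth):
    return depth * math.log10(branching)

print(f"Chess: ~10^{log10_tree_size(35, 80):.0f}")    # roughly 10^124
print(f"Go:    ~10^{log10_tree_size(250, 150):.0f}")  # roughly 10^360
```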

8 https://googleblog.blogspot.co.il/2016/01/alphago-machine-learning-game-go.html

9 Training the Deep Neural Networks: the pipeline starts from human expert (state, action) pairs, produces (state, win/loss) training data via self-play, and feeds the resulting networks into Monte Carlo Tree Search.

10 Training the Deep Neural Networks: two networks are trained, a policy network and a value network.

11 SL policy network: trained on ~30 million (state, action) pairs from human expert games. Goal: maximize the log likelihood of the expert's action. Input: 48 feature planes (19×19×48). Architecture: 12 convolutional + rectifier layers followed by a softmax. Output: a probability map over moves.
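
A hedged PyTorch sketch of a policy network with this overall shape; the filter count, kernel sizes, and padding below are illustrative guesses, not the exact AlphaGo configuration:

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Illustrative SL policy network: 48 input feature planes on a 19x19
    board, 12 convolution + ReLU layers, one logit per board point."""
    def __init__(self, planes=48, width=192, n_conv=12):
        super().__init__()
        layers = [nn.Conv2d(planes, width, kernel_size=5, padding=2), nn.ReLU()]
        for _ in range(n_conv - 2):
            layers += [nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU()]
        layers.append(nn.Conv2d(width, 1, kernel_size=1))
        self.body = nn.Sequential(*layers)

    def forward(self, x):                # x: (batch, 48, 19, 19)
        return self.body(x).flatten(1)   # logits over the 361 board points

net = PolicyNet()
loss_fn = nn.CrossEntropyLoss()          # = maximizing log likelihood of the expert move

states = torch.randn(8, 48, 19, 19)      # dummy batch of feature planes
expert_moves = torch.randint(0, 361, (8,))
loss = loss_fn(net(states), expert_moves)
probability_map = torch.softmax(net(states), dim=1)  # softmax probability map at play time
```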

12 Bigger -> better and slower. Move-prediction accuracy: AlphaGo (all input features) 57.0%; AlphaGo (only raw board position) 55.7%; previous state of the art 44.4%.

13

14

15

16 Policy network vs. fast rollout policy. The policy network (12 convolutional + rectifier layers, softmax probability map): forwarding time ~3 milliseconds, 55.4% accuracy. The fast rollout policy: ~2 microseconds, 24.2% accuracy.

17

18 RL policy network: trained with stochastic gradient ascent (SGA). Same architecture as the SL policy: 19×19×48 input, 12 convolutional + rectifier layers, softmax probability map. Playing against randomly selected earlier iterations of the policy prevents overfitting. The RL policy won more than 80% of its games against the SL policy.
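
A rough sketch of the REINFORCE-style stochastic gradient ascent step this slide refers to, reusing the hypothetical PolicyNet from the earlier sketch; the self-play loop that generates games against randomly chosen earlier policy snapshots is elided:

```python
import torch

def reinforce_step(net, optimizer, states, actions, outcomes):
    """One policy-gradient step: raise the log-probability of moves played in
    games the current policy won, lower it for games it lost.
    states: (N, 48, 19, 19); actions: (N,); outcomes: (N,) with values +1/-1."""
    log_probs = torch.log_softmax(net(states), dim=1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = -(outcomes * chosen).mean()   # minimizing this is gradient ascent on expected reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```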

19 Training the Deep Neural Networks (recap): human expert (state, action) pairs, self-play (state, win/loss) data, and Monte Carlo Tree Search.

20 Training the Deep Neural Networks: ~30 million position states for the value network, alongside the ~30 million human expert positions; both feed into Monte Carlo Tree Search.

21 Value network for position evaluation: approximates the optimal value function. Input: a state (19×19×48 feature planes); output: the probability of winning from that state. Goal: minimize the mean squared error (MSE) against the game outcome. Overfitting risk: positions within the same game are strongly correlated. Architecture: convolutional + rectifier layers, a fully connected layer, and a scalar output.
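
A similarly hedged sketch of a value network along these lines: convolutional body, fully connected layer, scalar output, trained with an MSE loss against the final game outcome; layer widths are illustrative:

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Illustrative value network: conv + ReLU body, a fully connected
    layer, and a single scalar output estimating the chance of winning."""
    def __init__(self, planes=48, width=192, n_conv=12):
        super().__init__()
        convs = [nn.Conv2d(planes, width, kernel_size=5, padding=2), nn.ReLU()]
        for _ in range(n_conv - 2):
            convs += [nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU()]
        convs += [nn.Conv2d(width, 1, kernel_size=1), nn.ReLU()]
        self.body = nn.Sequential(*convs)
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(19 * 19, 256),
                                  nn.ReLU(), nn.Linear(256, 1), nn.Tanh())

    def forward(self, x):                        # x: (batch, 48, 19, 19)
        return self.head(self.body(x)).squeeze(1)

value_net = ValueNet()
mse = nn.MSELoss()   # minimized against the observed outcome z (e.g. +1 win / -1 loss)
```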

22

23 Training the Deep Neural Networks (recap): ~30 million human expert (state, action) pairs, (state, win/loss) training data, and Monte Carlo Tree Search.

24 Monte Carlo: repeated random sampling to obtain numerical results. Tree search: a method for making optimal decisions in artificial intelligence (AI) problems. The strongest Go AIs (Fuego, Pachi, Zen, and Crazy Stone) all rely on MCTS.

25 Monte Carlo Tree Search: each round consists of four steps. 1. Selection, 2. Expansion, 3. Simulation, 4. Backpropagation.
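
A bare-bones sketch of one MCTS round showing the four steps; `select_child`, `expand`, and the `game`/node interfaces are hypothetical stand-ins rather than any particular library:

```python
import random

def mcts_round(root, game):
    """One round of Monte Carlo tree search: selection, expansion,
    simulation, backpropagation (all helpers are illustrative stubs)."""
    # 1. Selection: descend the tree with the tree policy (e.g. UCT).
    node = root
    while node.children and not game.is_terminal(node.state):
        node = select_child(node)
    # 2. Expansion: add a child for an untried move, if the game is not over.
    if not game.is_terminal(node.state):
        node = expand(node, game)
    # 3. Simulation: play out random moves until a terminal state.
    state = node.state
    while not game.is_terminal(state):
        state = game.play(state, random.choice(game.legal_moves(state)))
    winner = game.winner(state)
    # 4. Backpropagation: update win/visit statistics along the path to the root.
    while node is not None:
        node.visits += 1
        if winner == node.player_just_moved:   # credit wins to the player who moved into the node
            node.wins += 1
        node = node.parent
```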

26 MCTS – Upper Confidence Bounds for Trees (UCT): balances the exploration/exploitation tradeoff and converges to the optimal solution (Kocsis, L. & Szepesvári, C., "Bandit based Monte-Carlo planning", 2006). A child node i is scored as UCT(i) = W_i / n_i + C * sqrt(ln t / n_i), where the first term is exploitation and the second exploration. W_i = number of wins after visiting node i, n_i = number of times node i has been visited, C = exploration parameter, t = number of times the parent of node i has been visited.
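
The UCT score above as a small helper; C ≈ 1.4 (roughly √2) is a common default, and the exact constant is a tuning choice:

```python
import math

def uct_score(wins, visits, parent_visits, c=1.4):
    """Exploitation (win rate) plus an exploration bonus that shrinks
    as the node accumulates visits."""
    if visits == 0:
        return float("inf")   # unvisited children are always tried first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
```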

27 AlphaGo MCTS: Selection, Expansion, Evaluation, Backpropagation. Each edge (s, a) stores: Q(s, a) - action value (average value of the subtree), N(s, a) - visit count, P(s, a) - prior probability from the SL policy. Why not use the RL policy? (In the paper the SL policy worked better as a prior, presumably because human experts select a more diverse beam of promising moves.)
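
A sketch of these per-edge statistics and a PUCT-style selection rule in the spirit of AlphaGo's; the bonus term and the constant below are simplified assumptions, not the paper's exact formula:

```python
import math
from dataclasses import dataclass

@dataclass
class Edge:
    prior: float            # P(s, a): prior probability from the SL policy network
    visits: int = 0         # N(s, a): visit count
    total_value: float = 0.0

    @property
    def q(self):            # Q(s, a): average value of the subtree below this edge
        return self.total_value / self.visits if self.visits else 0.0

def select_action(edges, c_puct=5.0):
    """Pick the action maximizing Q(s, a) + u(s, a), where the bonus u is
    proportional to the prior and decays with the edge's visit count."""
    total_visits = sum(e.visits for e in edges.values())
    def score(item):
        _, e = item
        u = c_puct * e.prior * math.sqrt(total_visits) / (1 + e.visits)
        return e.q + u
    return max(edges.items(), key=score)[0]
```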

28 AlphaGo MCTS: Selection, Expansion, Evaluation, Backpropagation.

29 AlphaGo MCTS: Selection, Expansion, Evaluation, Backpropagation. Leaf evaluation combines two signals: 1. the value network, and 2. a rollout with the fast rollout policy played until a terminal state.
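
In the paper the two evaluations are mixed with a weight λ; a tiny sketch (λ = 0.5 is the value reported there, and the argument names are mine):

```python
def evaluate_leaf(value_net_estimate, rollout_outcome, lam=0.5):
    """V(leaf) = (1 - lam) * value_network(leaf) + lam * rollout_result."""
    return (1 - lam) * value_net_estimate + lam * rollout_outcome
```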

30 AlphaGo MCTS: Selection, Expansion, Evaluation, Backpropagation. How is the next move chosen? By maximum visit count, which is less sensitive to outliers than the maximum action value.
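
And the final move choice as a one-liner over the root's edges, reusing the hypothetical Edge record from the earlier sketch:

```python
def choose_move(root_edges):
    """Pick the most-visited action at the root; visit counts are a more
    robust statistic than the noisier maximum action value."""
    return max(root_edges.items(), key=lambda kv: kv[1].visits)[0]
```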

31 AlphaGo

32 AlphaGo vs. experts: 5:0 against Fan Hui, 4:1 against Lee Sedol.

33 Take home: a modular system, combining reinforcement learning and deep learning, built from generic methods rather than handcrafted rules.

34

35 Critical differences between AlphaGo and Deep Blue: AlphaGo uses general-purpose algorithms, not a set of handcrafted rules; it is a modular system combining planning and pattern recognition; it "thinks" more like a human. Lee Sedol - one of the world's best players, from South Korea; AlphaGo won their five-game match 4:1. Fan Hui - European champion; AlphaGo won 5:0.

