Status Report on Machine Learning

1 Status Report on Machine Learning
Hsu-Wen Chiang LeCosPA, NTU

2 Artificial Intelligence
Navigation, Sensation, Communication, Manipulation, Intelligence, Perception, Problem Solving, Learning, Recognition

3 Imitation Game If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

4 Other Turing Tests
Navigation, Sensation, Communication, Manipulation, Intelligence, Perception, Problem Solving, Learning, Recognition
*Brewing test, college graduation test, employment test, judge test

5 Go! Perfect testing ground for AI improvement
Complicated ($>10^{80}$ states in the early game[1], and far more possible states over a whole game). Perfect information, a clear goal, and deterministic. Large gap between amateur and professional → easy to evaluate AI progress. The last safe house for humans[2]. → Perfect testing ground for AI improvement
[1] Early game = first 40 moves
[2] Until AlphaGo came out
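To put "complicated" in perspective, here is a quick count that is an illustration rather than something from the slides: each of the 361 points can be empty, black, or white, which gives a crude upper bound on the number of board configurations. The number of legal positions is smaller, but still astronomically large.

```python
import math

BOARD_POINTS = 19 * 19  # intersections on a standard Go board

# Every point is empty, black, or white, so 3**361 is a crude upper bound on
# the number of board configurations (most of them are not legal positions).
naive_upper_bound = 3 ** BOARD_POINTS
print(f"3^{BOARD_POINTS} ~ 10^{math.log10(naive_upper_bound):.1f}")  # 3^361 ~ 10^172.2
```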

6 Basic Knowledge about Go
Position: the state of the game. Goal: occupy more area than the opponent. Translating between “dan” ranks and Elo[1] ratings. AlphaGo performance (Oct. 2015): pro 2 dan on a single machine (48 CPUs + 8 GPUs); pro 4 dan on 40 machines with 1 GPU disabled (this is the version used when playing against humans).
[1] A 400-point Elo difference means a <10% winning rate for the weaker player; the average Elo is 1000.
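The footnote's "<10%" follows from the standard Elo expected-score formula, $E_A = 1/(1 + 10^{(R_B - R_A)/400})$. A minimal sketch (the function name is hypothetical); with komi 7.5 a Go game essentially cannot be drawn, so the expected score is the winning probability.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# A player rated 400 points lower is expected to score 1/11, about 9%.
weaker = expected_score(1000, 1400)
print(f"{weaker:.3f}")  # 0.091
```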

7 What has been tried: Tree Search (too slow)
Value of State (winning probability $P_W$): $P_W = 1$? [Figure: search tree whose node values are still unknown (??)] Works if and only if a good score-estimation system exists
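Exhaustive tree search is hopeless for Go, but on a tiny game it shows what an exact value of state looks like. A sketch on a hypothetical take-away game (remove 1 to 3 stones; whoever takes the last stone wins), used only as a stand-in for Go: because the game is deterministic with perfect information, every node's $P_W$ is exactly 0 or 1.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # hypothetical toy game: remove 1-3 stones, last stone wins


@lru_cache(maxsize=None)
def win_probability(stones: int) -> float:
    """Exact P_W for the player to move, found by exhaustive tree search."""
    if stones == 0:
        return 0.0  # the opponent has just taken the last stone
    # A position is won if some move leaves the opponent in a lost position.
    return max(1.0 - win_probability(stones - m) for m in MOVES if m <= stones)


print([win_probability(n) for n in range(9)])
```

Even here the search visits every reachable position; on a 19×19 board that is what makes plain tree search far too slow.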

8 What has been tried: Tree Search, Value of State
Policy of Search (pattern matching), Monte Carlo Rollout (MC Rollout)
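Monte Carlo rollout replaces the unknown value of state with the average result of playouts to the end of the game. A minimal sketch with uniformly random playouts on a hypothetical take-away game (1 to 3 stones per move, last stone wins); the names `rollout` and `mc_value` are illustrative.

```python
import random

MOVES = (1, 2, 3)  # hypothetical toy game: remove 1-3 stones, last stone wins


def rollout(stones: int, rng: random.Random) -> float:
    """Play uniformly random moves to the end; 1.0 if the root player wins."""
    root_to_move = True
    while True:
        stones -= rng.choice([m for m in MOVES if m <= stones])
        if stones == 0:
            return 1.0 if root_to_move else 0.0
        root_to_move = not root_to_move


def mc_value(stones: int, n_rollouts: int = 10_000, seed: int = 0) -> float:
    """Estimate P_W for the player to move (requires stones >= 1)."""
    rng = random.Random(seed)
    return sum(rollout(stones, rng) for _ in range(n_rollouts)) / n_rollouts


print(mc_value(4))
```

With 4 stones the player to move is actually lost, yet uniformly random playouts give roughly 1/3, which is why the quality of the policy used during the rollout matters.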

9 What has been tried: Tree Search, Value of State, Policy of Search
Monte Carlo Rollout: $P_W = 1$!

10 What has been tried: Tree Search, Value of State, Policy of Search
Monte Carlo Rollout, Supervised Linear Classifier (handcrafted by scientists → learned from the masters)
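A supervised linear classifier in this sense scores each legal move with a weighted sum of handcrafted features and turns the scores into move probabilities with a softmax; the weights are learned from the masters' moves by maximising the log-likelihood of the move actually played. A minimal sketch; the two features and all names are hypothetical.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def move_probs(weights, move_features):
    """p(move | position) from a linear score w . phi(move) per legal move."""
    scores = [sum(w * f for w, f in zip(weights, phi)) for phi in move_features]
    return softmax(scores)


def sgd_step(weights, move_features, expert_index, lr=0.1):
    """One gradient-ascent step on log p(expert move | position)."""
    probs = move_probs(weights, move_features)
    expert_phi = move_features[expert_index]
    new_weights = []
    for j, w in enumerate(weights):
        expected = sum(p * phi[j] for p, phi in zip(probs, move_features))
        new_weights.append(w + lr * (expert_phi[j] - expected))
    return new_weights


# Hypothetical position with 3 legal moves and 2 binary handcrafted features
# per move (say, "captures a stone" and "is adjacent to the last move").
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
weights = [0.0, 0.0]
for _ in range(50):
    weights = sgd_step(weights, features, expert_index=0)
print(move_probs(weights, features))
```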

11 What has been tried: Tree Search, Value of State, Policy of Search
Monte Carlo Rollout, Supervised Linear Classifier, Deep Neural Network, Reinforcement Learning

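12 Neuron
Neuron: N inputs, 1 output, $y = f\left(\sum_{i=1}^{N} w_i x_i + b\right)$, e.g. $f(x) = \max(0, x)$ (ReLU). The decision boundary $\sum_{i} w_i x_i + b = 0$ is just a hyperplane (linear classifier). N neurons → universal function approximator (Riemann sum)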
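The "Riemann sum" remark can be made concrete in one dimension: a weighted sum of N shifted ReLU neurons reproduces the piecewise-linear interpolant of a function on N+1 knots, and the error shrinks as N grows. This is a constructive illustration under that 1-D assumption, not a proof of the universal approximation theorem; `relu_interpolant` is a hypothetical helper.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def relu_interpolant(f, a, b, n):
    """Sum of n ReLU neurons equal to the linear interpolant of f on n+1 knots."""
    knots = [a + (b - a) * k / n for k in range(n + 1)]
    slopes = [(f(knots[k + 1]) - f(knots[k])) / (knots[k + 1] - knots[k])
              for k in range(n)]
    # Neuron k switches on at knots[k] and changes the slope by slopes[k] - slopes[k-1].
    coeffs = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]
    offset = f(a)

    def approx(x: float) -> float:
        return offset + sum(c * relu(x - t) for c, t in zip(coeffs, knots[:-1]))

    return approx


g = relu_interpolant(math.sin, 0.0, math.pi, 50)
grid = [math.pi * i / 997 for i in range(998)]
max_err = max(abs(g(x) - math.sin(x)) for x in grid)
print(f"max |g - sin| with 50 ReLU neurons: {max_err:.1e}")
```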

13 Neural Network (NN): back-propagation learning (non-convex)
Needs many neurons and LOTS of synapses → SLOW!!
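Back-propagation is the chain rule applied layer by layer to get the gradient of the loss with respect to every synapse. A minimal pure-Python sketch under an assumed setup (a 2-4-1 network with tanh hidden units and a sigmoid output, trained on XOR with full-batch gradient descent); because the loss is non-convex, a different `seed` can end in a different minimum.

```python
import math
import random


def train_xor(hidden=4, lr=0.5, epochs=2000, seed=1):
    """Back-propagation with full-batch gradient descent on a 2-hidden-1 network."""
    rng = random.Random(seed)
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    w1 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        gw1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        for (x0, x1), y in data:
            # Forward pass: tanh hidden layer, sigmoid output unit.
            h = [math.tanh(w[0] * x0 + w[1] * x1 + b) for w, b in zip(w1, b1)]
            logit = sum(v * hj for v, hj in zip(w2, h)) + b2
            p = 1.0 / (1.0 + math.exp(-logit))
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            # Backward pass: for sigmoid + cross-entropy, dLoss/dlogit = p - y.
            d_logit = p - y
            gb2 += d_logit
            for j in range(hidden):
                gw2[j] += d_logit * h[j]
                d_pre = d_logit * w2[j] * (1.0 - h[j] ** 2)  # chain rule through tanh
                gw1[j][0] += d_pre * x0
                gw1[j][1] += d_pre * x1
                gb1[j] += d_pre
        n = len(data)
        losses.append(loss / n)
        # Gradient-descent update with the gradient averaged over the 4 examples.
        b2 -= lr * gb2 / n
        for j in range(hidden):
            w2[j] -= lr * gw2[j] / n
            b1[j] -= lr * gb1[j] / n
            w1[j][0] -= lr * gw1[j][0] / n
            w1[j][1] -= lr * gw1[j][1] / n
    return losses


losses = train_xor()
print(f"mean cross-entropy: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Even this toy network has 17 weights updated on every pass; scaled up to a Go board, the number of synapses is what makes plain fully connected networks slow.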

14 Convolutional Neural Network (CNN)
*Wavelet
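A convolutional layer slides one small filter over every point of the input, so the same few weights are shared across the whole board instead of wiring every input to every neuron. A minimal sketch of a zero-padded 3×3 cross-correlation (what CNN libraries call convolution) on a 19×19 input plane; the board, the filter, and `conv2d_same` are hypothetical.

```python
def conv2d_same(board, kernel):
    """'Same'-size 2-D cross-correlation with zero padding, as in a CNN layer."""
    n_rows, n_cols = len(board), len(board[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:  # zero padding
                        acc += kernel[i][j] * board[rr][cc]
            out[r][c] = acc
    return out


# Hypothetical input plane: 1.0 where a black stone sits on a 19x19 board.
board = [[0.0] * 19 for _ in range(19)]
board[3][3] = 1.0
board[3][4] = 1.0
# Hypothetical 3x3 filter that counts black stones around each point.
kernel = [[1.0] * 3 for _ in range(3)]
feature_map = conv2d_same(board, kernel)
print(feature_map[3][3], feature_map[3][4], feature_map[0][0])  # 2.0 2.0 0.0
```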

15 Deep vs. Shallow *Renormalization

16 From Learning to Belief
Supervised Learning (SL)

17 From Learning to Belief
Supervised Learning (SL) Reinforcement Learning (RL)
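The difference between the two kinds of learning shows up in the update rule: SL nudges a softmax policy toward the move a human expert played, while RL (here a REINFORCE-style policy gradient) nudges it toward the move the policy itself played, scaled by the game outcome $z = \pm 1$. A minimal sketch that assumes one logit per legal move in a single position instead of a network; all names are hypothetical.

```python
import math


def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(logits, move):
    """Gradient of log softmax(logits)[move] with respect to the logits."""
    probs = softmax(logits)
    return [(1.0 if a == move else 0.0) - p for a, p in enumerate(probs)]


def supervised_update(logits, expert_move, lr=0.1):
    # SL: raise the probability of the move the human expert played.
    g = grad_log_prob(logits, expert_move)
    return [x + lr * gi for x, gi in zip(logits, g)]


def reinforce_update(logits, own_move, outcome, lr=0.1):
    # RL: reinforce the policy's own move by the game result z = +1 (win) or -1 (loss).
    g = grad_log_prob(logits, own_move)
    return [x + lr * outcome * gi for x, gi in zip(logits, g)]


logits = [0.0, 0.0, 0.0]
print(softmax(supervised_update(logits, expert_move=2)))
print(softmax(reinforce_update(logits, own_move=1, outcome=-1.0)))
```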

18 Previous Deep Belief Result

19 Putting Everything Together
Value Network: RL, 15-layer CNN. Policy Network: SL, 13-layer CNN (~5 ms), 48 features, pattern filters. Rollout: SL (learns from the previous move predicted by the policy network); linear classifier using 3×3 patterns around the current move + 5×5 diamond patterns around the last move (~2 μs/step)
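The fast rollout policy can be pictured as a linear classifier over local patterns. A minimal sketch of one possible encoding (an assumption, not AlphaGo's actual one): the 8 neighbours of a candidate point are packed into a base-4 id (empty, black, white, off-board), and each id has one learned weight. Only the 3×3 part is shown; in practice the scores of all legal moves would go through a softmax to give move probabilities.

```python
EMPTY, BLACK, WHITE, OFF_BOARD = 0, 1, 2, 3


def pattern_3x3(board, row, col):
    """Encode the 8 neighbours of (row, col) as one base-4 integer pattern id."""
    size = len(board)
    code = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the candidate point itself
            r, c = row + dr, col + dc
            inside = 0 <= r < size and 0 <= c < size
            code = code * 4 + (board[r][c] if inside else OFF_BOARD)
    return code


def rollout_score(weights, board, row, col):
    """Linear score of a candidate move: one weight per pattern id (0 if unseen)."""
    return weights.get(pattern_3x3(board, row, col), 0.0)


empty = [[EMPTY] * 19 for _ in range(19)]
print(pattern_3x3(empty, 9, 9), pattern_3x3(empty, 0, 0))  # 0 65328
```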

20 How AlphaGo is trained
Pattern Recognition (3 weeks): look at 160K games (29.4M positions) played by KGS amateur 6 to 9 dan human players. SL Policy Network (1 day): learn from 128 “games”. RL Policy & Value Network (7 days): 50M self-plays from 32 “best positions” (~1 sec/play!!)

21 AlphaGo Algorithm
a. Pick the move with the maximum Q + u(P). Repeat.
b. (When a single move has been visited >40 times) Calculate P from the policy network. Return to a.
c. Compute Q by averaging over the value network AND the rollout.
d. (Out of time) The most visited move is chosen.
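Steps a to d can be sketched as a compact Monte Carlo tree search. Everything game-specific is a stand-in: a hypothetical take-away game (1 to 3 stones per move, last stone wins), a uniform prior in place of the policy network, a constant 0 in place of the value network, and an assumed exploration constant. The bonus $u = c_{\text{puct}} P \sqrt{N_{\text{parent}}}/(1 + N_{\text{child}})$ follows the form usually quoted for AlphaGo; details such as virtual loss and asynchronous GPU evaluation are omitted.

```python
import math
import random

MOVES = (1, 2, 3)       # hypothetical toy game: take 1-3 stones, last stone wins
C_PUCT = 5.0            # exploration constant (assumed value)
EXPAND_THRESHOLD = 40   # step b: expand a leaf after >40 visits
LAMBDA = 0.5            # step c: equal-weight average of value net and rollout


class Node:
    def __init__(self, stones: int, prior: float):
        self.stones, self.prior = stones, prior
        self.visits, self.value_sum = 0, 0.0  # value from the parent's viewpoint
        self.children = {}

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def legal_moves(stones):
    return [m for m in MOVES if m <= stones]


def policy_network(stones):
    # Stand-in for the policy network: a uniform prior P over legal moves.
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}


def value_network(stones):
    # Stand-in for the value network: an uninformative estimate.
    return 0.0


def rollout(stones, rng):
    # Random playout: +1 if the player to move now takes the last stone, else -1.
    sign = 1.0
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return sign
        sign = -sign


def expand(node):
    for move, p in policy_network(node.stones).items():
        node.children[move] = Node(node.stones - move, p)


def evaluate(stones, rng):
    # Leaf value for the player to move; with no stones left, that player has lost.
    if stones == 0:
        return -1.0
    return (1 - LAMBDA) * value_network(stones) + LAMBDA * rollout(stones, rng)


def select_child(node):
    # Step a: pick the child maximising Q + u(P).
    sqrt_n = math.sqrt(node.visits)

    def score(child):
        return child.q() + C_PUCT * child.prior * sqrt_n / (1 + child.visits)

    return max(node.children.items(), key=lambda item: score(item[1]))


def search(stones, n_simulations=3000, seed=0):
    rng = random.Random(seed)
    root = Node(stones, 1.0)
    expand(root)
    for _ in range(n_simulations):
        node, path = root, [root]
        while node.children:
            _, node = select_child(node)
            path.append(node)
        if node.stones > 0 and node.visits > EXPAND_THRESHOLD:
            expand(node)  # step b: query the policy network for P
            _, node = select_child(node)
            path.append(node)
        value = evaluate(node.stones, rng)  # step c
        for visited in reversed(path):  # back up, flipping sign at every ply
            value = -value
            visited.visits += 1
            visited.value_sum += value
    # Step d: once out of budget ("time"), play the most visited move.
    return max(root.children.items(), key=lambda item: item[1].visits)[0]


print(search(5))
```

With 5 stones the only winning move is to take 1 (leaving a multiple of 4 for the opponent), and this is where the search is expected to concentrate its visits.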

22 First Blood
Played against European champion Fan Hui (pro 2 dan) in Oct. 2015, under NDA until Jan. 27. Komi 7.5, Chinese (area) rules. 5:0 when playing slow (1 hour + 30 seconds); 3:2 when playing fast (30 seconds). AlphaGo had been trained for 1 month.

25 Game 1
AlphaGo kept playing against itself and learning more positions and games for 5 months!! Pro 9 dan Lee Sedol plays first (komi 7.5, Chinese rules). AlphaGo WINS by 5 points after compensation

26 Welcome to the future Game 2 AlphaGo Wins by 7 points

27 Rise of the Machine Game 3 AlphaGo Wins by 11 points

28 Sorry couldn’t resist :D
Game 4 Lee Wins

29 What makes the 5 dan difference?
No 5-second timeout limit. Feature filters increased from 192 to 256?? Compressing data through the 8-fold symmetry of Go? Total: 2 dan difference (~10x slowdown). Learning from Fan Hui? More training? Higher-quality self-play?
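The 8-fold symmetry is the dihedral group of the square: 4 rotations, each with or without a mirror image. A minimal sketch that generates all 8 variants of a board, which could be used either to augment training data or to map symmetric positions to a single canonical form; the helper names are hypothetical.

```python
def rotate(board):
    """Rotate a square board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]


def reflect(board):
    """Mirror a board left to right."""
    return [row[::-1] for row in board]


def dihedral_variants(board):
    """All 8 rotations and reflections of a square board (the group D4)."""
    variants = []
    current = board
    for _ in range(4):
        variants.append(current)
        variants.append(reflect(current))
        current = rotate(current)
    return variants


# Hypothetical 3x3 position with no symmetry of its own: all 8 variants differ.
board = [[1, 0, 0],
         [2, 0, 0],
         [0, 0, 0]]
distinct = {tuple(map(tuple, b)) for b in dihedral_variants(board)}
print(len(distinct))  # 8
```

A symmetric position (for example an empty board) collapses to fewer distinct variants, which is where the "compression" comes from.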
