1
Status Report on Machine Learning
Hsu-Wen Chiang, LeCosPA, NTU
2
Artificial Intelligence
[Figure: facets of intelligence: navigation, sensation, communication, manipulation, perception, problem solving, learning, recognition]
3
Imitation Game If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.
4
Other Turing Tests*
[Same figure: navigation, sensation, communication, manipulation, perception, problem solving, learning, recognition]
*The brewing test, college graduation test, employment test, and judge test
5
Go! Perfect testing ground for AI improvement
Complicated: >10^80 states in the early game[1] alone, ~10^170 possible states in total
No loss of information, a clear goal, and fully deterministic
Large gap between amateur and professional play
Easy to evaluate AI progress
The last safe haven for humans[2]
[1] Early game = first 40 moves
[2] Until AlphaGo came out
6
Basic Knowledge about Go
Position: the state of the game
Goal: occupy more area than the opponent
"dan" ranks translate to Elo[1] ratings
AlphaGo performance (Oct. 2015):
  Pro 2 dan using a single machine (48 CPUs + 8 GPUs)
  Pro 4 dan using 40 machines w/ 1 GPU disabled (the version used when playing against humans)
[1] A 400-point Elo difference = <10% winning rate for the weaker player; the average player is ~1000 Elo
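A quick check of footnote [1] with the standard Elo expected-score formula (a minimal Python sketch; the function name is mine):

    # Expected score of a player rated `delta` Elo points below the opponent:
    # E = 1 / (1 + 10**(delta / 400))
    def elo_expected_score(delta):
        return 1.0 / (1.0 + 10.0 ** (delta / 400.0))

    print(elo_expected_score(400))   # ~0.091: <10% winning rate, as footnoted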
7
What has been tried
Tree search (too slow!)
Value of a state = its winning probability P_W: known exactly (P_W = 1) only at the end of the game, unknown (??) for mid-game positions
Works if and only if a good score-estimation system exists
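If exact outcomes were known at the leaves, the value of any state would follow by plain minimax backup, alternating our move (maximize P_W) and the opponent's (minimize it). A toy sketch, with a hypothetical nested-list tree format where floats are terminal outcomes:

    # Toy minimax backup over a nested-list game tree (hypothetical format):
    # a float is a terminal outcome (P_W); a list holds the child positions.
    def value(node, our_turn=True):
        if isinstance(node, float):                  # terminal: outcome known
            return node
        vals = [value(child, not our_turn) for child in node]
        return max(vals) if our_turn else min(vals)  # we max P_W, they min it

    print(value([[1.0, 0.0], [0.0]]))  # 0.0: the opponent steers away from our win

This is exactly what is infeasible for Go: the backup must reach terminal states, and there are ~10^170 of them.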
8
What has been tried
Tree search, value of state
Policy of search (pattern matching)
Monte Carlo rollout (MC rollout)
9
What has been tried
Tree search, value of state, policy of search
Monte Carlo rollout: play the game out to the end, where the winner is known exactly (P_W = 1!)
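In code, a rollout estimate of P_W looks roughly like the sketch below; `legal_moves`, `play`, and `winner` are hypothetical stand-ins for a real Go engine's API:

    import random

    # Estimate P_W of `state` by averaging many random playouts.
    def mc_rollout_value(state, n_playouts=100):
        wins = 0
        for _ in range(n_playouts):
            s = state
            while legal_moves(s):                    # random moves to the end
                s = play(s, random.choice(legal_moves(s)))
            wins += winner(s)                        # 1 if we won, else 0
        return wins / n_playouts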
10
What has been tried
Tree search, value of state, policy of search
Monte Carlo rollout
Supervised linear classifier (features handcrafted by scientists; weights learned from the masters' games)
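A supervised linear move classifier of this kind can be sketched as follows: one row of handcrafted features per legal move, a weight vector fit to master games, and a softmax over the scores (names and shapes are my assumptions):

    import numpy as np

    # Score each legal move with a linear model over handcrafted features,
    # then normalize the scores into move probabilities.
    def move_probabilities(feature_matrix, weights):
        scores = feature_matrix @ weights            # one score per legal move
        exp = np.exp(scores - scores.max())          # numerically stable softmax
        return exp / exp.sum()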
11
What has been tried
Tree search, value of state, policy of search
Monte Carlo rollout, supervised linear classifier
Deep neural networks + reinforcement learning
12
Neuron
A neuron has N inputs and 1 output: y = σ(Σ_i w_i x_i + b), with e.g. σ(z) = max(0, z) (ReLU)
A single neuron is just a hyperplane (a linear classifier)
N neurons together give a universal function approximator (cf. a Riemann sum)
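The neuron above translates directly into a few lines of numpy (a minimal sketch):

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    # One neuron: N inputs -> 1 output. w.x + b defines a hyperplane;
    # the ReLU turns it into a simple nonlinear unit.
    def neuron(x, w, b):
        return relu(np.dot(w, x) + b)

A layer of N such neurons, summed by the next layer, can approximate any reasonable function, in the same spirit as a Riemann sum approximating an integral.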
13
Neural Network (NN)
Back-propagation learning (a non-convex optimization)
Needs many neurons and LOTS of synapses → SLOW!!
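A minimal back-propagation step for one hidden layer, to make the cost concrete: every synapse (weight) needs its own gradient, which is what makes big networks slow. Squared loss and all names are my assumptions:

    import numpy as np

    # One gradient step for a 1-hidden-layer net with squared loss.
    # x: input vector, y: target vector, W1/W2: weight matrices.
    def train_step(x, y, W1, W2, lr=0.01):
        # Forward pass.
        h = np.maximum(0.0, W1 @ x)        # hidden ReLU activations
        y_hat = W2 @ h                     # linear output layer
        # Backward pass (chain rule), output layer -> hidden layer.
        d_out = y_hat - y                  # grad of 0.5 * ||y_hat - y||^2
        dW2 = np.outer(d_out, h)
        d_h = (W2.T @ d_out) * (h > 0)     # ReLU gates the gradient
        dW1 = np.outer(d_h, x)
        # Plain gradient descent on a non-convex objective.
        return W1 - lr * dW1, W2 - lr * dW2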
14
Convolutional Neural Network (CNN)
*cf. wavelets
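The wavelet analogy: a convolution slides one small, weight-shared filter over the whole board instead of wiring every input to every neuron. A minimal sketch of a "valid" 2D convolution:

    import numpy as np

    # Slide one small filter over the input plane; weight sharing is what
    # makes CNNs far cheaper than fully connected networks.
    def conv2d(image, kernel):
        H, W = image.shape
        kh, kw = kernel.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
        return out

    print(conv2d(np.ones((19, 19)), np.ones((3, 3))).shape)  # (17, 17) on a Go board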
15
Deep vs. Shallow
*cf. renormalization
16
From Learning to Belief
Supervised Learning (SL)
17
From Learning to Belief
Supervised Learning (SL) Reinforcement Learning (RL)
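The two regimes can share the same gradient machinery: SL pushes the policy toward the expert's labeled move, while RL (REINFORCE-style, as used for AlphaGo's policy network) reinforces the move actually played, scaled by the final outcome z = ±1. A schematic sketch for a linear softmax policy (all names are mine):

    import numpy as np

    def softmax(scores):
        e = np.exp(scores - scores.max())
        return e / e.sum()

    # One policy-gradient step for a linear softmax policy over legal moves.
    # SL: `move` is the expert's move and z = +1 (pure imitation).
    # RL (REINFORCE): `move` is the move actually played and z = +1 / -1
    # for a won / lost game.
    def policy_update(weights, features, move, z, lr=0.1):
        p = softmax(features @ weights)              # prob. of each legal move
        one_hot = np.eye(len(p))[move]
        grad_log_p = features.T @ (one_hot - p)      # d log p(move) / d weights
        return weights + lr * z * grad_log_p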
18
Previous Deep Belief Result
19
Putting Everything Together
Value network: RL-trained 15-layer CNN
Policy network: SL-trained 13-layer CNN, 48 feature planes of pattern filters (~5 ms per evaluation)
Rollout policy: SL linear classifier (learns the moves predicted by the policy network), using 3×3 patterns around the candidate move + 5×5 diamond patterns around the last move (~2 μs/step)
20
How AlphaGo is trained
Pattern recognition (3 weeks): study 160K games (29.4M positions) played by KGS amateur 6~9 dan human players
SL policy network (1 day): learn from 128 "games"
RL policy & value networks (7 days): 50M self-play games from 32 "best positions" (~1 sec/play!!)
21
AlphaGo Algorithm
a. Pick the move with maximal Q + u(P); repeat down the tree.
b. (Once a single move's visit count exceeds 40) Calculate its prior P from the policy network; return to a.
c. Compute Q by averaging over the value network AND rollout results.
d. (Out of time) The most-visited move is played.
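Steps a and c can be sketched as follows: the exploration bonus u(P) is proportional to the policy prior and decays with visit count, and the leaf value mixes the value network with the rollout outcome (the AlphaGo paper weights them equally, λ = 0.5). The per-node data layout here is my assumption:

    import math

    # Step a: pick the child maximizing Q + u(P), with u proportional to
    # the prior P and shrinking as the move accumulates visits N.
    def select_move(children, c_puct=5.0):
        n_parent = sum(ch["N"] for ch in children.values())
        def score(ch):
            u = c_puct * ch["P"] * math.sqrt(n_parent) / (1 + ch["N"])
            return ch["Q"] + u
        return max(children, key=lambda move: score(children[move]))

    # Step c: mix value-network output and rollout outcome.
    def leaf_value(v_network, z_rollout, lam=0.5):
        return (1 - lam) * v_network + lam * z_rollout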
22
First Blood
Played the European champion, Pro 2 dan Fan Hui, in Oct. 2015 (NDA until Jan. 27)
Komi 7.5, Chinese (area) rules
5:0 when playing slow (1 hour + 30 seconds per move)
3:2 when playing fast (30 seconds per move)
AlphaGo had been trained for 1 month
25
Game 1
AlphaGo spent the intervening 5 months playing against itself, learning from ever more positions and games!!
First game vs. Pro 9 dan Lee Sedol (komi 7.5, Chinese rules)
AlphaGo WINS by 5 points after compensation
26
Welcome to the future
Game 2: AlphaGo wins by 7 points
27
Rise of the Machine
Game 3: AlphaGo wins by 11 points
28
Sorry, couldn't resist :D
Game 4: Lee wins
29
What makes the 5-dan difference?
No 5-second timeout limit
More feature filters: 192 → 256??
Compressing data through the 8-fold symmetry of Go?
Total of the above: ~2 dan of difference (at ~10× slowdown)
Learning from the Fan Hui games?
More training? Higher-quality self-play?