Status Report on Machine Learning Hsu-Wen Chiang LeCosPA, NTU
Artificial Intelligence: navigation, sensation, communication, manipulation, perception, problem solving, learning, recognition
Imitation Game: If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.
Other Turing Tests: brewing test, college graduation test, employment test, judge test
Go!
Perfect testing ground for AI improvement:
Complicated: >10^80 states in the early game[1], ~10^164 possible states overall
No loss of information, deterministic, and with a clear goal
Large gap between amateur and professional, so AI progress is easy to evaluate
The last safe haven for humans[2]
[1] Early game = the first 40 moves
[2] Until AlphaGo came out
Basic Knowledge about Go
Position: the state of the game
Goal: occupy more area than the opponent
"dan" and Elo[1] rating translation
AlphaGo performance (Oct. 2015):
Pro 2 dan using a single machine (48 CPUs + 8 GPUs)
Pro 4 dan using 40 machines with 1 GPU disabled (this is the version used when playing against humans)
[1] A 400-point Elo difference means a <10% winning rate for the weaker player; the average Elo is 1000
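As a side note on the Elo footnote above, here is a minimal sketch of the standard Elo expected-score formula (the function name is mine; the 400-point scale is the usual Elo convention):

```python
def elo_win_probability(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A 400-point gap leaves the weaker player with under ~10% expected score,
# matching the footnote above.
print(elo_win_probability(1000, 1400))  # ~0.09
```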
What has been tried
Plain tree search: too slow
Value of state (winning probability P_W): works if and only if a good score-estimation system exists
Policy of search: pattern matching to narrow the moves worth searching
Monte Carlo rollout (MC rollout): play random games to the end to estimate P_W
Supervised linear classifier: handcrafted by scientists, learning from master games
Deep neural network + reinforcement learning
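To make the Monte Carlo rollout idea concrete, here is a minimal sketch (not AlphaGo's implementation): estimate a position's value by playing many uniformly random games to the end and averaging the outcomes. The state API (`to_move`, `copy`, `legal_moves`, `play`, `is_terminal`, `winner`) is a hypothetical placeholder for some game implementation.

```python
import random

def mc_rollout_value(state, n_rollouts=100):
    """Estimate P(win) for the player to move by averaging random playouts."""
    player = state.to_move
    wins = 0
    for _ in range(n_rollouts):
        s = state.copy()
        while not s.is_terminal():
            s = s.play(random.choice(s.legal_moves()))  # uniform random policy
        if s.winner() == player:
            wins += 1
    return wins / n_rollouts
```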
Neuron
A neuron has N inputs and 1 output: y = f(w·x + b), e.g. f(z) = max(0, z) (ReLU)
This is just a hyperplane (a linear classifier)
N neurons together give a universal function approximator (in the spirit of a Riemann sum)
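A small numerical sketch of the claim above, assuming an illustrative 1-D target: one ReLU neuron is a hyperplane, and a sum of N such neurons reproduces the piecewise-linear (Riemann-sum-like) approximation of a smooth function.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Approximate sin(x) on [0, pi] with a sum of ReLU neurons (one kink per neuron).
knots = np.linspace(0.0, np.pi, 21)        # kink locations
vals = np.sin(knots)                       # target values at the kinks
slopes = np.diff(vals) / np.diff(knots)    # slope on each interval
coeffs = np.diff(slopes, prepend=0.0)      # change of slope at each kink

def approx(x):
    # vals[0] + sum_i coeffs[i] * relu(x - knots[i]) is the piecewise-linear interpolant
    return vals[0] + sum(c * relu(x - t) for c, t in zip(coeffs, knots[:-1]))

xs = np.linspace(0.0, np.pi, 200)
print(np.abs(approx(xs) - np.sin(xs)).max())  # small error, shrinking as N grows
```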
Neural Network (NN)
Back-propagation learning (a non-convex optimization problem)
Needs many neurons and LOTS of synapses, so training is SLOW!!
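To show back-propagation with nothing hidden inside a framework, here is a minimal sketch of a one-hidden-layer network trained by gradient descent on a toy regression problem (layer sizes, learning rate, and the sin target are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(256, 1))
y = np.sin(3 * X)                               # toy regression target

W1 = rng.normal(0, 0.5, size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, size=(16, 1)); b2 = np.zeros(1)
lr = 0.1

for step in range(2000):
    # forward pass
    h = np.maximum(0.0, X @ W1 + b1)            # ReLU hidden layer
    pred = h @ W2 + b2
    err = pred - y
    # backward pass (chain rule) for mean-squared-error loss
    g_pred = 2 * err / len(X)
    g_W2 = h.T @ g_pred; g_b2 = g_pred.sum(0)
    g_h = g_pred @ W2.T
    g_h[h <= 0] = 0                             # ReLU gradient
    g_W1 = X.T @ g_h; g_b1 = g_h.sum(0)
    # gradient-descent update
    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2

print(float((err ** 2).mean()))                 # loss shrinks over training
```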
Convolutional Neural Network (CNN) *Wavelet
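A minimal sketch of the convolution operation that gives the CNN its name: one small filter with shared weights slides across the input (the Sobel-like edge filter and the random 19x19 input plane are illustrative choices):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution (really cross-correlation, as in most CNN code)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

edge_filter = np.array([[1, 0, -1],
                        [2, 0, -2],
                        [1, 0, -1]])        # Sobel-like vertical-edge detector
plane = np.random.rand(19, 19)              # e.g. one 19x19 Go-board feature plane
print(conv2d(plane, edge_filter).shape)     # (17, 17)
```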
Deep vs. Shallow *Renormalization
From Learning to Belief: Supervised Learning (SL), then Reinforcement Learning (RL)
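To contrast the two modes named above: supervised learning pushes the policy toward the human expert's move, while reinforcement learning pushes it toward moves that led to a win. A schematic sketch with a softmax policy over a few toy actions (not AlphaGo's actual update code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.1

# Supervised learning: increase the log-probability of the expert's action.
def sl_update(theta, expert_action):
    grad = -softmax(theta)
    grad[expert_action] += 1.0              # grad of log p(expert_action | theta)
    return theta + lr * grad

# Reinforcement learning (REINFORCE-style): same gradient, weighted by the game outcome z = +1 or -1.
def rl_update(theta, sampled_action, outcome):
    grad = -softmax(theta)
    grad[sampled_action] += 1.0
    return theta + lr * outcome * grad

theta = np.zeros(3)                         # logits for 3 toy actions
theta = sl_update(theta, expert_action=1)
theta = rl_update(theta, sampled_action=2, outcome=-1.0)
```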
Previous Deep Belief Result https://youtu.be/iqXKQf2BOSE?t=1m23s
Putting Everything Together
Value network: RL-trained 15-layer CNN
Policy network: SL-trained 13-layer CNN (~5 ms per evaluation), 48 input features + 192 pattern filters
Rollout: SL linear classifier (learning from the previous move predicted by the policy network), using 3x3 patterns around the current move + 5x5 diamond patterns around the last move (~2 μs/step)
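When a search leaf is evaluated, the value-network estimate and the rollout outcome are mixed; here is a sketch of that combination (`value_net` and `rollout_value` are placeholder callables; the mixing weight is tunable, with the Nature paper reporting λ = 0.5 working best):

```python
def leaf_value(state, value_net, rollout_value, lam=0.5):
    """Mix the value-network estimate with a fast-rollout result:
    V(s) = (1 - lam) * v(s) + lam * z_rollout
    """
    return (1.0 - lam) * value_net(state) + lam * rollout_value(state)
```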
How AlphaGo Is Trained
Pattern recognition (3 weeks): study 160K games (29.4M positions) played by KGS amateur 6~9 dan human players
SL policy network (1 day): learn from 128 "games"
RL policy & value networks (7 days): 50M self-play games from 32 "best positions" (~1 sec per play!!)
AlphaGo Algorithm
a. Pick the move with the maximum Q + u(P). Repeat.
b. (When a single move's visit count exceeds 40) Expand it: calculate P from the policy network. Return to a.
c. Compute Q by averaging over the value network AND the rollout result.
d. (When out of time) The most visited move is chosen.
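Step (a) above picks the child maximising Q + u(P), where the exploration bonus u grows with the prior P and shrinks with the visit count. A sketch of that selection rule (the dictionary-based node layout and the value of `c_puct` are illustrative assumptions):

```python
import math

def select_move(node, c_puct=5.0):
    """Pick the child move maximising Q + u(P), as in step (a)."""
    total_visits = sum(child["N"] for child in node["children"].values())
    best_move, best_score = None, -float("inf")
    for move, child in node["children"].items():
        q = child["W"] / child["N"] if child["N"] > 0 else 0.0          # mean value Q
        u = c_puct * child["P"] * math.sqrt(total_visits) / (1 + child["N"])
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move
```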
First Blood
Played against the European champion, Pro 2 dan Fan Hui, during Oct. 5-9, 2015 (under NDA until Jan. 27)
Komi 7.5, Chinese (area) rules
5:0 when playing slow (1 hour + 30 seconds)
3:2 when playing fast (30 seconds per move)
AlphaGo had been trained for 1 month
Game 1
AlphaGo has now been playing against itself, learning more positions and games, for 5 months!!
Pro 9 dan Lee Sedol plays first (komi 7.5, Chinese rules)
AlphaGo WINS by 5 points after compensation
Welcome to the future Game 2 AlphaGo Wins by 7 points
Rise of the Machine Game 3 AlphaGo Wins by 11 points
Sorry couldn’t resist :D Game 4 Lee Wins
What Makes the 5 Dan Difference?
No 5-second timeout limit
Feature filters increased from 192 to 256??
Compressing data through the 8-fold symmetry of Go?
Total from the above: ~2 dan difference (at ~10x slowdown)
Learning from the Fan Hui games?
More training? Higher-quality self-play?
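One bullet above refers to the 8-fold symmetry of the Go board (4 rotations x 2 reflections); here is a sketch of generating the 8 equivalent boards, e.g. for data augmentation or averaged evaluation (numpy-based, illustrative):

```python
import numpy as np

def eight_fold_symmetries(board):
    """Return the 8 boards equivalent under rotations and reflections of the square."""
    variants = []
    for k in range(4):                       # 4 rotations
        rotated = np.rot90(board, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # plus a reflection of each
    return variants

board = np.zeros((19, 19)); board[3, 15] = 1.0
print(len(eight_fold_symmetries(board)))     # 8
```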