Status Report on Machine Learning

Hsu-Wen Chiang LeCosPA, NTU

Artificial Intelligence
Navigation Sensation Communication Manipulation Intelligence Perception Problem Solving Learning Recognition

Imitation Game If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

Other Turing Tests
*Brewing test, college graduation test, employment test, judge test

Go! Perfect testing ground for AI improvement
Complicated (>10^80 states for the early game[1]; the count of all possible states is larger still)
No loss of information, with a clear goal; also deterministic
Large gap between amateur and professional → easy to evaluate AI progress
The last safe house for humans[2]
[1] Early game = first 40 moves
[2] Until AlphaGo came out

Basic Knowledge about Go
Position: state of the game
Goal: occupying more area
“dan” and Elo[1] rating translation
AlphaGo performance (Oct. 2015):
Pro 2 dan using a single machine (48 CPUs + 8 GPUs)
Pro 4 dan using 40 machines with 1 GPU disabled (this is the version used when playing against humans)
[1] A 400-point Elo deficit = <10% winning rate; average Elo = 1000
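The Elo footnote can be checked with the standard Elo expected-score formula. This is a minimal sketch; the function name `elo_expected_score` is an illustrative choice, and treating the expected score as a winning rate ignores draws (a reasonable assumption for Go with komi 7.5).

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# A player rated 400 points below the opponent wins about 1 game in 11.
p = elo_expected_score(1000, 1400)
print(f"{p:.3f}")  # 0.091
```

A 400-point deficit gives 1 / (1 + 10) ≈ 9.1%, consistent with the "<10% winning rate" quoted on the slide.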

What has been tried
Tree Search (too slow)
Value of State (Winning Probability): P_W = 1?
Works if and only if a good score estimation system exists

What has been tried
Tree Search
Value of State
Policy of Search (pattern matching)
Monte Carlo Rollout (MC Rollout)

What has been tried
Tree Search
Value of State
Policy of Search
Monte Carlo Rollout: P_W = 1!
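The idea of a Monte Carlo rollout (estimate the winning probability of a state by playing many random games to the end) can be sketched on a toy game. Everything here is an assumption made for illustration: the game is a simple take-away game (remove 1 to 3 stones, whoever takes the last stone wins), not Go, and the rollout policy is uniformly random rather than a learned one.

```python
import random


def random_rollout(stones: int, rng: random.Random) -> int:
    """Play uniformly random moves until the pile is empty.

    Returns 1 if the player to move at the start takes the last stone, else 0.
    """
    player = 0  # 0 = the player to move in the starting state
    while True:
        take = rng.randint(1, min(3, stones))
        stones -= take
        if stones == 0:
            return 1 if player == 0 else 0
        player = 1 - player


def rollout_value(stones: int, n_rollouts: int = 1000, seed: int = 0) -> float:
    """Estimate the winning probability of a state (stones >= 1) by averaging rollouts."""
    rng = random.Random(seed)
    wins = sum(random_rollout(stones, rng) for _ in range(n_rollouts))
    return wins / n_rollouts


print(rollout_value(1))  # 1.0: the only legal move takes the last stone
```

With purely random play the estimate is only as good as the rollout policy, which is why the following slides replace it with a learned classifier.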

Monte Carlo Rollout
Supervised Linear Classifier (handcrafted by scientists → learn from the master)

Monte Carlo Rollout Supervised Linear Classifier Deep Neural Network Reinforcement Learning

Neuron
Neuron: N inputs, 1 output
Out = f(w_1 x_1 + ... + w_N x_N + b), e.g. f(z) = max(0, z) (ReLU)
This is just a hyperplane (linear classifier).
N neurons → universal function approximator (Riemann sum)
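A minimal sketch of both claims, assuming the usual weighted-sum-plus-bias neuron with a ReLU output. The helper names (`neuron`, `hat`, `approximate`) are hypothetical, and the construction below (triangular bumps built from three ReLUs each, weighted by samples of the target function) is one simple way to realize the "Riemann sum" picture, not necessarily the one shown on the slide.

```python
def relu(z: float) -> float:
    return max(0.0, z)


def neuron(inputs, weights, bias):
    """One neuron: N inputs, 1 output, ReLU applied to a weighted sum."""
    return relu(sum(w * x for w, x in zip(weights, inputs)) + bias)


def hat(x: float, center: float, width: float) -> float:
    """Triangular bump of height 1 at `center`, built from three ReLU neurons."""
    return (relu(x - (center - width))
            - 2.0 * relu(x - center)
            + relu(x - (center + width))) / width


def approximate(f, x: float, lo: float = 0.0, hi: float = 1.0, n: int = 50) -> float:
    """Approximate f(x) on [lo, hi] as a sum of bumps weighted by samples of f."""
    width = (hi - lo) / n
    return sum(f(lo + i * width) * hat(x, lo + i * width, width)
               for i in range(n + 1))


# Compare the approximation with the exact value of x^2 at x = 0.33.
print(approximate(lambda t: t * t, 0.33), 0.33 ** 2)
```

Increasing `n` adds more neurons and tightens the approximation, which is the sense in which enough neurons approximate any reasonable function.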

Neural Network (NN)
Back-propagation learning (non-convex)
Needs neurons and LOTS of synapses: SLOW!!

Convolutional Neural Network (CNN)
*Wavelet
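The core operation of a convolutional layer is sliding one small filter over the whole board and reusing the same weights at every location. The sketch below is an illustration only: the 4×4 board, the all-ones 3×3 filter, and the function name `conv2d` are assumptions, and there is no padding, stride, or nonlinearity.

```python
def conv2d(board, kernel):
    """Slide `kernel` over `board` (no padding) and return the filter responses."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(board) - kh + 1
    out_w = len(board[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * board[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy board: 1 marks a stone. The filter counts stones in each 3x3 window.
board = [
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
count_stones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(conv2d(board, count_stones))  # [[3, 2], [2, 2]]
```

Because the nine filter weights are shared across all positions, the number of synapses to learn is far smaller than in a fully connected layer.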

Deep vs. Shallow *Renormalization

From Learning to Belief
Supervised Learning (SL) Reinforcement Learning (RL)

Previous Deep Belief Result

Putting Everything Together
Value Network: RL, 15-layer CNN
Policy Network: SL, 13-layer CNN (~5 ms), 48 features pattern filters
Rollout: SL (learn from previous move predicted by policy network); linear classifier using 3×3 patterns around the current move + 5×5 diamond patterns around the last move (~2 μs/step)

How AlphaGo is trained
Pattern Recognition (3 weeks): look at 160K games (29.4M positions) played by KGS amateur 6~9 dan human players
SL Policy Network (1 day): learn from 128 “games”
RL Policy & Value Network (7 days): 50M self-plays from 32 “best positions” (~1 sec/play!!)

AlphaGo Algorithm
a. Pick the move with max Q + u(P). Repeat.
b. (Single-move access count > 40) Calculate P from the policy network. Return to a.
c. Compute Q by averaging over the value network AND the rollout.
d. (Out of time) The most visited move is chosen.
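Steps a, c, and d above can be sketched as follows. The slide only says "max Q + u(P)"; the specific bonus used here, u = c_puct · P · sqrt(total visits) / (1 + N), follows the form given in the published AlphaGo description and should be read as an assumption, as should the constant `c_puct`, the mixing weight `mix`, and all class and function names. Step b (expanding a node once it has been visited more than 40 times) is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    prior: float            # P: move probability from the policy network
    visits: int = 0         # N: how often this move was tried
    value_sum: float = 0.0  # sum of backed-up leaf evaluations

    @property
    def q(self) -> float:
        """Q: mean evaluation of this move so far (0 if never visited)."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(edges: dict, c_puct: float = 5.0):
    """Step a: pick the move maximizing Q + u(P)."""
    total = sum(e.visits for e in edges.values())

    def score(e: Edge) -> float:
        u = c_puct * e.prior * math.sqrt(total) / (1 + e.visits)
        return e.q + u

    return max(edges, key=lambda move: score(edges[move]))


def backup(edge: Edge, value_net: float, rollout: float, mix: float = 0.5) -> None:
    """Step c: average the value network estimate and the rollout result."""
    edge.visits += 1
    edge.value_sum += (1.0 - mix) * value_net + mix * rollout


def final_move(edges: dict):
    """Step d: when time runs out, choose the most visited move."""
    return max(edges, key=lambda move: edges[move].visits)


edges = {"a": Edge(prior=0.6, visits=10, value_sum=5.0), "b": Edge(prior=0.4)}
print(select_move(edges))  # b: the unvisited move gets a large exploration bonus
print(final_move(edges))   # a: the most visited move so far
```

The bonus shrinks as a move accumulates visits, so the search gradually shifts from following the policy network's prior to trusting the measured Q.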

First Blood
Playing against European champion Fan Hui (pro 2 dan) in Oct.; under NDA until Jan. 27
Komi 7.5, Chinese (area) rules
5:0 when playing slow (1 hour + 30 seconds)
3:2 when playing fast (30 seconds)
AlphaGo was trained for 1 month

Game 1
Playing against itself and learning more positions and games for 5 months!!
Pro 9 dan Lee Sedol plays first (komi 7.5, Chinese rules)
AlphaGo WINS by 5 points after compensation

Welcome to the future Game 2 AlphaGo Wins by 7 points

Rise of the Machine Game 3 AlphaGo Wins by 11 points

Sorry couldn’t resist :D
Game 4 Lee Wins

What makes 5 dan difference?
No 5-second timeout limit
Increase feature filters from 192 to 256??
Compressing data through the 8-fold symmetry of Go?
Total: 2 dan difference (~10x slowdown)
Learning from Fan Hui?
More training?
Higher quality of self-play?
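The 8-fold symmetry mentioned above is the symmetry group of the square board: four rotations, each with and without a mirror flip. A minimal sketch that enumerates the eight equivalent views of a position follows; the 3×3 board and the function names are illustrative assumptions, and how AlphaGo actually exploits the symmetry is not stated on the slide.

```python
def rotate90(board):
    """Rotate a square board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]


def mirror(board):
    """Flip a board left to right."""
    return [row[::-1] for row in board]


def symmetries(board):
    """Return the 8 rotated and mirrored views of a square board."""
    views = []
    current = [list(row) for row in board]
    for _ in range(4):
        views.append(current)
        views.append(mirror(current))
        current = rotate90(current)
    return views


# Toy position: 1 and 2 stand for stones of the two colors.
position = [
    [1, 2, 0],
    [0, 0, 0],
    [0, 0, 0],
]
views = symmetries(position)
print(len({tuple(map(tuple, v)) for v in views}))  # 8
```

A position without any symmetry of its own yields eight distinct views, so one stored or evaluated position can stand in for eight.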
