Status Report on Machine Learning

Status Report on Machine Learning
Hsu-Wen Chiang, LeCosPA, NTU

Artificial Intelligence
Facets of intelligence: Navigation, Sensation, Communication, Manipulation, Perception, Problem Solving, Learning, Recognition

Imitation Game If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

Other Turing Tests
Each facet (navigation, sensation, communication, manipulation, perception, problem solving, learning, recognition) suggests its own test*
*Brewing test, college graduation test, employment test, judge test

Go!
Perfect testing ground for AI improvement:
Complicated (>10^80 states in the early game[1], ~10^164 possible states)
Perfect information, deterministic, with a clear goal
Large gap between amateurs and professionals → easy to evaluate AI progress
The last safe haven for humans[2]
[1] Early game = first 40 moves
[2] Until AlphaGo came out

Basic Knowledge about Go
Position: the state of the game
Goal: occupy more area than the opponent
“dan” ranks translate to Elo ratings[1]
AlphaGo performance (Oct. 2015):
Pro 2 dan using a single machine (48 CPUs + 8 GPUs)
Pro 4 dan using 40 machines with 1 GPU disabled (this is the version that played the human matches)
[1] A 400-point Elo gap means a <10% win rate for the weaker player; the average player is rated 1000 Elo
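
Footnote [1] follows from the standard Elo expected-score formula; a minimal sketch to check the number (the helper name is ours, not from the slides):

```python
def elo_win_prob(delta):
    """Expected score of a player rated `delta` Elo points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))

# A player 400 Elo below the opponent is expected to win
# 1 / (1 + 10^1) ~= 9.1% of games, matching the "<10%" in footnote [1].
print(elo_win_prob(-400))  # ~0.0909
```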

What Go AI Has Been Through
Tree search (too slow)
Value of a state = winning probability P_W
(Diagram: game tree with unknown interior values and a terminal value P_W = 1)
Works if and only if a good score-estimation system exists

What Go AI Has Been Through
Tree search + value of a state
Policy of search (pattern matching)
Monte Carlo rollout (MC rollout)
(Diagram: candidate moves marked good/bad)

What Go AI Has Been Through
Tree search + value of a state
Policy of search + Monte Carlo rollout
(Diagram: a rollout reaches a terminal position with P_W = 1)

What Go AI Has Been Through
Tree search + value of a state
Policy of search + Monte Carlo rollout
Supervised linear classifier (handcrafted by scientists → learned from master games)

What Go AI Has Been Through
Tree search + value of a state
Policy of search + Monte Carlo rollout
Supervised linear classifier
Deep neural networks + reinforcement learning

Neuron
A neuron has N inputs and 1 output, e.g. y = max(0, Σ_i w_i x_i + b) (ReLU)
This is just a hyperplane (a linear classifier)
N neurons → a universal function approximator (cf. a Riemann sum)
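
A minimal NumPy sketch of the neuron just described (all names ours, not from the slides): one ReLU unit computing max(0, w·x + b), i.e. a thresholded hyperplane.

```python
import numpy as np

def relu_neuron(x, w, b):
    """One neuron: N inputs -> 1 output, y = max(0, w.x + b)."""
    return np.maximum(0.0, np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # N = 3 inputs
w = np.array([0.1, 0.4, -0.2])   # weights: the hyperplane's normal vector
b = 0.05                         # bias: shifts the hyperplane
print(relu_neuron(x, w, b))      # 0.0 here: the input falls on the "off" side
```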

Neural Network (NN)
Back-propagation learning (a non-convex optimization)
Needs many neurons and LOTS of synapses → SLOW!!
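
To make back-propagation concrete, here is a hedged sketch of a one-hidden-layer ReLU network trained with hand-derived gradients on XOR (all names and hyperparameters ours). Because the loss is non-convex in the weights, training can stall in bad local solutions, which is exactly the slide's caveat.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network: 2 inputs -> 4 hidden ReLU units -> 1 linear output.
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    h_pre = W1 @ x + b1              # hidden pre-activations
    h = np.maximum(0.0, h_pre)       # ReLU
    y = W2 @ h + b2                  # scalar output
    return h_pre, h, y

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([0., 1., 1., 0.])       # XOR targets

lr = 0.1
for _ in range(2000):
    for x, t in zip(X, T):
        h_pre, h, y = forward(x)
        dy = 2.0 * (y - t)                    # dL/dy for L = (y - t)^2
        dW2, db2 = np.outer(dy, h), dy        # gradients for the output layer
        dh = W2.T @ dy                        # propagate the error backwards
        dh_pre = dh * (h_pre > 0)             # ReLU gate: gradient passes only where active
        dW1, db1 = np.outer(dh_pre, x), dh_pre
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

print([round(float(forward(x)[2][0]), 2) for x in X])  # aim: [0, 1, 1, 0]
```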

Convolutional Neural Network (CNN)
*Wavelet
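
A minimal sketch of the convolution that gives a CNN its name (function and variable names ours): one small filter slides over the whole input, so the same local pattern detector is reused at every position — small localized filters, in the spirit of the slide's wavelet footnote.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the filter over the image."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A tiny edge filter fires wherever the pattern occurs, regardless of
# position -- the weight sharing that makes CNNs cheap on board images.
image = np.zeros((6, 6)); image[:, 3:] = 1.0
edge = np.array([[-1., 1.]])
print(conv2d(image, edge))   # a column of 1s marks the 0->1 edge
```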

Deep vs. Shallow *Renormalization

From Learning to Belief
Supervised Learning (SL)

From Learning to Belief
Supervised Learning (SL)
Reinforcement Learning (RL)
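
A toy contrast between the two learning signals (all names ours, not from the slides): the SL update pushes probability toward the master's move, while the RL update (REINFORCE, the policy-gradient rule AlphaGo's RL policy network uses) applies the same gradient weighted by the game outcome z.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

theta = np.zeros(3)              # logits of a 3-move toy policy

def sl_grad(a_star):
    """Supervised: gradient of log p(a*), pushing toward the master's move."""
    p = softmax(theta)
    g = -p; g[a_star] += 1.0
    return g

def rl_grad(a_played, z):
    """REINFORCE: same gradient for the move actually played,
    weighted by the game outcome z (+1 win / -1 loss)."""
    p = softmax(theta)
    g = -p; g[a_played] += 1.0
    return z * g

theta += 0.5 * sl_grad(0)        # imitate the master: move 0 goes up
theta += 0.5 * rl_grad(1, -1.0)  # move 1 led to a loss: push it down
print(softmax(theta))
```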

Previous Deep Belief Result https://youtu.be/iqXKQf2BOSE?t=1m23s

Putting Everything Together
Value network: RL, 15-layer CNN
Policy network: SL, 13-layer CNN (~5 ms); 48 features + 192 pattern filters
Rollout policy: SL (learns from the previous move predicted by the policy network); linear classifier using 3×3 patterns around the current move + 5×5 diamond patterns around the last move (~2 μs/step)
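
One detail worth writing down: per the Nature paper (not these slides), the search mixes the two evaluators at each leaf as V(s) = (1 − λ)·v_θ(s) + λ·z, with λ = 0.5, where v_θ is the value network's estimate and z the rollout outcome. A one-line sketch (names ours):

```python
def leaf_value(value_net_estimate, rollout_outcome, lam=0.5):
    """Mix the value network with the rollout result, as in Silver et al. 2016."""
    return (1.0 - lam) * value_net_estimate + lam * rollout_outcome

print(leaf_value(0.62, 1.0))  # 0.81: the net says 62%, the rollout won
```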

How AlphaGo Was Trained
Pattern recognition (3 weeks): study 160K games (29.4M positions) played by KGS amateur 6–9 dan human players
SL policy network (1 day): learn from 128 “games”
RL policy & value networks (7 days): 50M self-play games from 32 “best positions” (~1 sec/play!!)

AlphaGo Algorithm
a. Pick the move with max Q + u(P). Repeat.
b. (When a single move's visit count exceeds 40) Expand it and calculate P from the policy network. Return to a.
c. Compute Q by averaging over the value network AND rollouts.
d. (Out of time) Play the most-visited move.
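
A minimal sketch of step (a)'s selection rule (data structure and names ours): per the Nature paper, the exploration bonus is u(s,a) = c_puct · P(s,a) · √(Σ_b N(s,b)) / (1 + N(s,a)), with c_puct = 5, so the prior P dominates early and the averaged Q dominates as visits accumulate.

```python
import math
from dataclasses import dataclass

@dataclass
class Edge:
    P: float   # prior probability from the policy network
    N: int     # visit count
    Q: float   # mean value from value network + rollouts

def select(edges, c_puct=5.0):
    """Step (a): argmax over Q + u(P)."""
    total_n = sum(e.N for e in edges)
    return max(edges, key=lambda e:
               e.Q + c_puct * e.P * math.sqrt(total_n) / (1 + e.N))

edges = [Edge(P=0.5, N=100, Q=0.52),   # well-explored favourite
         Edge(P=0.3, N=5,   Q=0.40)]   # barely explored alternative
print(select(edges))  # the u(P) bonus favours the under-visited move here
```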

First Blood
Played against European champion (pro 2 dan) Fan Hui during Oct. 5–9, 2015; NDA until Jan. 27
Komi 7.5, Chinese (area) rules
5:0 in the slow games (1 hour + 30-second byo-yomi)
3:2 in the fast games (30 seconds)
AlphaGo had been trained for 1 month

Game 1
AlphaGo had been playing against itself, learning more positions and games, for 5 months!!
Pro 9 dan Lee Sedol moved first (komi 7.5, Chinese rules)
AlphaGo WINS by 5 points after compensation
*Actually a win by resignation (the score estimate may be off)

Welcome to the Future
Game 2: AlphaGo wins by 7 points
*Actually a win by resignation

Rise of the Machine
Game 3: AlphaGo wins by 11 points
*Actually a win by resignation

Sorry, couldn’t resist :D
Game 4: Lee wins
*Win by resignation

What Makes the 7-dan Difference?
Increasing feature filters from 192 to 256??
Compressing data through the 8-fold symmetry of Go?
Total: a 2-dan difference (~10× slowdown)
Learning from Fan Hui?
More training? Higher-quality self-play?
Let’s wait for another paper from DeepMind