Status Report on Machine Learning


Status Report on Machine Learning
Hsu-Wen Chiang, LeCosPA, NTU

Artificial Intelligence
[Diagram: facets of intelligence: Navigation, Sensation, Communication, Manipulation, Perception, Problem Solving, Learning, Recognition]

Imitation Game
If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

Other Turing Tests
[Same diagram of the facets of intelligence as before]
*Brewing test, college graduation test, employment test, judge test

Go!
The perfect testing ground for AI improvement:
- Complicated: >10^80 states for the early game alone[1], 10^164 possible states in total
- No loss of information, deterministic, and with a clear goal
- Large gap between amateurs and professionals → easy to evaluate AI progress
- The last safe haven for humans[2]
[1] Early game = first 40 moves
[2] Until AlphaGo came out

Basic Knowledge about Go
- Position: the state of the game
- Goal: occupy more area than the opponent
- "dan" and Elo[1] ratings can be translated into each other
AlphaGo performance (Oct. 2015):
- Pro 2 dan using a single machine (48 CPUs + 8 GPUs)
- Pro 4 dan using 40 machines with 1 GPU disabled (this is the version used when playing against humans)
[1] A 400-point Elo difference means a <10% winning rate for the weaker player; the average player is around Elo 1000
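
As a sanity check on footnote [1], the standard logistic Elo formula (not specific to this talk) gives the weaker player's expected score; a minimal sketch in Python:

    def elo_expected_score(delta: float) -> float:
        """Expected score of a player rated `delta` points below the opponent,
        under the standard logistic Elo model."""
        return 1.0 / (1.0 + 10.0 ** (delta / 400.0))

    print(elo_expected_score(400))  # ~0.091, i.e. <10%, matching the footnote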

What has been tried: Tree Search (too slow)
Value of State = winning probability (is P_W = 1 here?)
[Diagram: game tree with unknown values P_W = ? at the unexplored nodes]
Works if and only if a good score-estimation system exists
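
A minimal sketch of the exhaustive tree search this slide refers to, in Python. The game interface (is_terminal, score, legal_moves, play) is hypothetical; the point of the slide is that this is hopeless at Go's scale:

    def negamax(state) -> float:
        """Exact value of `state`: +1 if the player to move wins.
        Cost is exponential in depth, infeasible for >10^80 states."""
        if state.is_terminal():
            return state.score()  # +1 win / -1 loss for the player to move
        return max(-negamax(state.play(m)) for m in state.legal_moves())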

What has been tried: Tree Search
- Value of State
- Policy of Search (pattern matching)
- Monte Carlo Rollout (MC Rollout)
[Diagram: candidate moves marked good (✓) or bad (✗) by pattern matching]

What has been tried: Tree Search
- Value of State
- Policy of Search
- Monte Carlo Rollout: random playouts reach the end of the game, where P_W = 1!
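
A minimal Monte Carlo rollout sketch, reusing the hypothetical game interface from the negamax sketch; it estimates P_W by averaging uniformly random playouts instead of searching exhaustively:

    import random

    def rollout_value(root, n_playouts: int = 100) -> float:
        """Estimate P_W for the player to move at `root`: play random
        moves to the end of the game and average the results.
        (`player` and `winner` are hypothetical interface names.)"""
        me, wins = root.player, 0
        for _ in range(n_playouts):
            s = root
            while not s.is_terminal():
                s = s.play(random.choice(s.legal_moves()))
            wins += (s.winner() == me)
        return wins / n_playouts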

What has been tried: Tree Search
- Value of State
- Policy of Search
- Monte Carlo Rollout
- Supervised Linear Classifier (handcrafted by scientists → learn from the masters)

What has been tried: Tree Search
- Value of State
- Policy of Search
- Monte Carlo Rollout
- Supervised Linear Classifier
- Deep Neural Network + Reinforcement Learning

Neuron
A neuron has N inputs and 1 output: out = f(w · x + b), e.g. f(z) = max(0, z) (ReLU).
This is just a hyperplane (a linear classifier).
N neurons → a universal function approximator (cf. a Riemann sum)
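
A one-neuron sketch with NumPy; the weights and inputs are made up for illustration:

    import numpy as np

    def neuron(x: np.ndarray, w: np.ndarray, b: float) -> float:
        """Single neuron: a hyperplane (w · x + b) followed by a ReLU."""
        return max(0.0, float(w @ x + b))

    x = np.array([0.5, -1.2, 3.0])  # N = 3 inputs
    w = np.array([0.1, 0.4, -0.2])  # illustrative weights
    print(neuron(x, w, b=0.05))     # 0.0: the ReLU clips the negative sum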

Neural Network (NN)
Back-propagation learning (a non-convex optimization)
Needs many neurons and LOTS of synapses → SLOW!!
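
A minimal back-propagation sketch: gradient descent on a single ReLU neuron with a squared loss (all numbers illustrative):

    import numpy as np

    x, t = np.array([1.0, 2.0]), 1.5          # one example and its target
    w, b, lr = np.array([0.1, 0.1]), 0.1, 0.1

    for _ in range(50):
        z = w @ x + b                          # forward: pre-activation
        y = max(0.0, z)                        # forward: ReLU
        dL_dz = 2.0 * (y - t) * (1.0 if z > 0 else 0.0)  # backward: chain rule
        w -= lr * dL_dz * x                    # update weights
        b -= lr * dL_dz                        # update bias
    print(w @ x + b)                           # ≈ 1.5: the neuron fits the target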

Convolutional Neural Network (CNN)
*Wavelet
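
The core CNN operation is a small filter slid over the whole board; a minimal sketch with SciPy (the 3×3 kernel is illustrative):

    import numpy as np
    from scipy.signal import convolve2d

    board = np.random.randint(0, 2, size=(19, 19))  # toy 19x19 binary plane
    kernel = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]])               # illustrative 3x3 filter
    feature_map = convolve2d(board, kernel, mode="same")
    print(feature_map.shape)                        # (19, 19): one plane per filter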

Deep vs. Shallow *Renormalization

From Learning to Belief Supervised Learning (SL)

From Learning to Belief Supervised Learning (SL) Reinforcement Learning (RL)

Previous Deep Belief Result https://youtu.be/iqXKQf2BOSE?t=1m23s

Putting Everything Together
- Value Network: RL, 15-layer CNN
- Policy Network: SL, 13-layer CNN (~5 ms per evaluation), 48 input features + 192 pattern filters
- Rollout: SL (learns from the previous move predicted by the policy network); a linear classifier using 3×3 patterns around the current move + 5×5 diamond patterns around the last move (~2 μs/step)
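
A rough Keras-style sketch of the policy network's shape: the 13 layers, 48 input features, and 192 filters come from the slide; kernel sizes and padding are assumptions based on the AlphaGo paper's description:

    from tensorflow import keras
    from tensorflow.keras import layers

    def policy_network() -> keras.Model:
        """13-layer CNN: a 19x19 board with 48 feature planes in,
        a probability distribution over the 361 points out."""
        inp = layers.Input(shape=(19, 19, 48))
        x = layers.Conv2D(192, 5, padding="same", activation="relu")(inp)
        for _ in range(11):  # layers 2-12: 3x3 convolutions
            x = layers.Conv2D(192, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(1, 1, padding="same")(x)  # layer 13: 1x1 convolution
        return keras.Model(inp, layers.Softmax()(layers.Flatten()(x)))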

How AlphaGo is trained
- Pattern recognition (3 weeks): look at 160K games (29.4M positions) played by KGS amateur 6-9 dan human players
- SL Policy Network (1 day): learns from mini-batches of 128 games
- RL Policy & Value Network (7 days): 50M mini-batches of 32 positions sampled from self-play (~1 sec/play!!)
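
One SL training step for the policy network might look as follows; it reuses the policy_network() sketch above, with random stand-ins for the real KGS data:

    import numpy as np
    from tensorflow import keras

    model = policy_network()
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.003),
                  loss="sparse_categorical_crossentropy")

    positions = np.random.rand(128, 19, 19, 48)         # a mini-batch of 128 positions
    expert_moves = np.random.randint(0, 361, size=128)  # the experts' answers
    model.train_on_batch(positions, expert_moves)       # maximize log P(expert move)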

AlphaGo Algorithm (one search iteration)
a. Pick the move with max Q + u(P). Repeat down the tree.
b. (Once a single move's access count > 40) Expand it: calculate P from the policy network. Return to a.
c. Compute Q by averaging over the value network AND the rollout.
d. (Out of time) The most-visited move is chosen.
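
A minimal sketch of step a's selection rule. The exploration bonus u(P) follows the PUCT-style formula from the AlphaGo paper; `Edge`, `c_puct`, and `select_move` are assumed names:

    import math
    from dataclasses import dataclass

    @dataclass
    class Edge:
        P: float        # prior from the policy network
        N: int = 0      # visit count
        W: float = 0.0  # sum of leaf evaluations (value net + rollout, mixed 50/50)

        def Q(self) -> float:
            return self.W / self.N if self.N else 0.0

    def select_move(edges: dict, c_puct: float = 5.0):
        """Step a: argmax of Q + u(P), where
        u(P) = c_puct * P * sqrt(total visits) / (1 + N)."""
        total = sum(e.N for e in edges.values())
        return max(edges, key=lambda m: edges[m].Q()
                   + c_puct * edges[m].P * math.sqrt(total) / (1 + edges[m].N))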

First Blood
- Played the European champion, Pro 2 dan Fan Hui, during Oct. 5-9, 2015 (under NDA until Jan. 27)
- komi 7.5, Chinese (area) rules
- 5:0 when playing slow (1 hour + 30 seconds byo-yomi)
- 3:2 when playing fast (30 seconds per move)
- AlphaGo had been trained for 1 month

Game 1
- AlphaGo spent the next 5 months playing against itself, learning more positions and games!!
- Pro 9 dan Lee Sedol plays first (komi 7.5, Chinese rules)
- AlphaGo WINS by 5 points after compensation

Welcome to the future
Game 2: AlphaGo wins by 7 points

Rise of the Machine
Game 3: AlphaGo wins by 11 points

Sorry, couldn't resist :D
Game 4: Lee wins

What makes the 5-dan difference?
- No 5-second timeout limit
- Feature filters increased from 192 to 256??
- Compressing data through the 8-fold symmetry of Go?
- Total so far: a 2-dan difference (~10x slowdown)
- Learning from Fan Hui?
- More training? Higher quality of self-play?