Human-level control through deep reinforcement learning


Human-level control through deep reinforcement learning Mnih et al., DeepMind

Motivation
Reinforcement Learning - finds optimal actions when states/features are well defined.
Deep Learning - learns features directly from high-dimensional data.
Reinforcement Learning + Deep Learning = AI (?) - David Silver

Motivation Continued... Deep Reinforcement Learning: a deep Q-network (DQN) agent whose input is only the raw screen pixels and the game score. It achieved a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters for every game.

Recap

Action-Value (Q, or Quality) Function The goal is to approximate the optimal action-value function: the maximum expected sum of rewards r_t, discounted by γ at each time step t, achievable by any behaviour policy, after observing state s and taking action a.
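In symbols, the quantity described above is the standard optimal action-value function from the paper:

```latex
Q^{*}(s,a) \;=\; \max_{\pi}\; \mathbb{E}\!\left[\, r_t + \gamma\, r_{t+1} + \gamma^{2} r_{t+2} + \cdots \,\middle|\, s_t = s,\; a_t = a,\; \pi \,\right]
```

Here π ranges over behaviour policies and γ ∈ [0, 1) is the discount factor.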

Implementation: Preprocessing Raw input - 210 × 160 pixels with a 128-colour palette (demanding in terms of computation and memory requirements). Extract the Y (luminance) channel from the RGB frame and rescale it to 84 × 84. A function φ applies this preprocessing to the m most recent frames and stacks them to produce the input to the Q-function, with m = 4. Actual network input: 84 × 84 × 4.

Model Architecture
Input: 84 × 84 × 4 image produced by the preprocessing
First hidden layer: 32 convolutional filters of 8 × 8 with stride 4
Second hidden layer: 64 convolutional filters of 4 × 4 with stride 2
Third hidden layer: 64 convolutional filters of 3 × 3 with stride 1
Final fully connected hidden layer: 512 rectifier units
Output layer: fully connected linear layer with a single output for each valid action
The number of valid actions varies between 4 and 18 depending on the game
Each convolutional layer uses a rectified nonlinearity (ReLU)
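The layer sizes above chain together via the standard valid-convolution formula. A quick plain-Python sketch (shapes only, no learned weights) checking that the filters and strides are mutually consistent:

```python
def conv_out(size, kernel, stride):
    # Output side length of a valid convolution: floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1

h = conv_out(84, 8, 4)   # conv1: 8x8, stride 4 -> 20x20 feature maps
h = conv_out(h, 4, 2)    # conv2: 4x4, stride 2 -> 9x9 feature maps
h = conv_out(h, 3, 1)    # conv3: 3x3, stride 1 -> 7x7 feature maps
flat = h * h * 64        # 7 * 7 * 64 = 3136 inputs to the 512-unit FC layer
```

One linear output per action (rather than feeding the action in as an input) lets a single forward pass produce Q-values for every action at once.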

Scores are reported as human-normalized performance: 100 × (DQN score − random play score) / (human score − random play score), so 0% corresponds to random play and 100% to the professional human tester.
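The normalization is a one-liner; a small sketch (function name is illustrative):

```python
def normalized_score(dqn, human, random):
    # 0% = random play, 100% = professional human tester.
    return 100.0 * (dqn - random) / (human - random)
```

For example, an agent scoring twice the human baseline (with a random baseline of zero) comes out at 200%.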

Visualization: let the trained DQN agent play for 2 h of real game time and run the t-SNE algorithm on the last hidden layer's representations (here, Space Invaders). The DQN predicts high state values for both full (top right screenshots) and nearly complete screens (bottom left screenshots), because it has learned that completing a screen leads to a new screen full of enemy ships. Partially completed screens (bottom screenshots) are assigned lower state values because less immediate reward is available. The screens shown on the bottom right and on the top left and middle are less perceptually similar, yet are still mapped to nearby representations and similar values, because the orange bunkers do not carry great significance near the end of a level.

States generated by a combination of human and DQN play show a similar visualization.

References
Sutton, R. & Barto, A. Reinforcement Learning: An Introduction (MIT Press, 1998)
Levine, S. Deep Reinforcement Learning, Fall 2017, UC Berkeley
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529-533 (2015)