Human-level control through deep reinforcement learning


Human-level control through deep reinforcement learning Mnih et al., DeepMind

Motivation
Reinforcement Learning - finds optimal actions when states/features are well defined.
Deep Learning - learns features directly from high-dimensional data.
Reinforcement Learning + Deep Learning = AI (?) - David Silver

Motivation Continued... Deep Reinforcement Learning: a deep Q-network (DQN) agent whose input is only the raw screen pixels and the game score. It achieved a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters for every game.

Recap

Action-Value (Q, or Quality) Function The goal is to approximate the optimal action-value function: the maximum expected sum of rewards r_t, discounted by γ at each time step t, achievable by any behaviour policy, after observing state s and taking action a.
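In symbols, the quantity described above is the standard optimal action-value function from the paper:

```latex
Q^{*}(s,a) \;=\; \max_{\pi}\; \mathbb{E}\!\left[\, r_t + \gamma\, r_{t+1} + \gamma^{2} r_{t+2} + \cdots \,\middle|\, s_t = s,\; a_t = a,\; \pi \,\right]
```

Here π ranges over behaviour policies and γ ∈ [0, 1) is the discount factor.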

Implementation: Preprocessing Raw input - 210 × 160 pixels with a 128-colour palette (demanding in terms of computation and memory requirements). Extract the Y (luminance) channel from the RGB frame and rescale it to 84 × 84. A function φ applies this preprocessing to the m most recent frames and stacks them to produce the input to the Q-function, with m = 4. Actual network input: 84 × 84 × 4.

Model Architecture
Input: 84 × 84 × 4 image produced by the preprocessing
First hidden layer: 32 convolutional filters of 8 × 8 with stride 4
Second hidden layer: 64 convolutional filters of 4 × 4 with stride 2
Third hidden layer: 64 convolutional filters of 3 × 3 with stride 1
Final fully connected hidden layer: 512 rectifier units
Output layer: fully connected linear layer with a single output for each valid action
The number of valid actions varies between 4 and 18 depending on the game
Each convolutional layer uses a rectified nonlinearity (ReLU)
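The layer sizes above chain together via the standard valid-convolution formula. A quick plain-Python sketch (shapes only, no learned weights) checking that the filters and strides are mutually consistent:

```python
def conv_out(size, kernel, stride):
    # Output side length of a valid convolution: floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1

h = conv_out(84, 8, 4)   # conv1: 8x8, stride 4 -> 20x20 feature maps
h = conv_out(h, 4, 2)    # conv2: 4x4, stride 2 -> 9x9 feature maps
h = conv_out(h, 3, 1)    # conv3: 3x3, stride 1 -> 7x7 feature maps
flat = h * h * 64        # 7 * 7 * 64 = 3136 inputs to the 512-unit FC layer
```

One linear output per action (rather than feeding the action in as an input) lets a single forward pass produce Q-values for every action at once.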

Scores are reported as human-normalized performance: 100 × (DQN score − random play score) / (human score − random play score), so 0% corresponds to random play and 100% to the professional human tester.
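The normalization is a one-liner; a small sketch (function name is illustrative):

```python
def normalized_score(dqn, human, random):
    # 0% = random play, 100% = professional human tester.
    return 100.0 * (dqn - random) / (human - random)
```

For example, an agent scoring twice the human baseline (with a random baseline of zero) comes out at 200%.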

Visualization: let the trained DQN agent play for 2 h of real game time and run the t-SNE algorithm on the last hidden layer's representations (here, Space Invaders). The DQN predicts high state values for both full (top right screenshots) and nearly complete screens (bottom left screenshots), because it has learned that completing a screen leads to a new screen full of enemy ships. Partially completed screens (bottom screenshots) are assigned lower state values because less immediate reward is available. The screens shown on the bottom right and on the top left and middle are less perceptually similar, yet are still mapped to nearby representations and similar values, because the orange bunkers do not carry great significance near the end of a level.

States generated by a combination of human and DQN play show a similar visualization.

References
Sutton, R. & Barto, A. Reinforcement Learning: An Introduction (MIT Press, 1998)
Levine, S. Deep Reinforcement Learning, Fall 2017, UC Berkeley
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529-533 (2015)