Deep Reinforcement Learning

Presentation transcript:

Deep Reinforcement Learning Ph.D. student Wangyu (王宇)

Deep Q-Network Published in Nature. A CNN trained with a variant of Q-learning. Uses Atari games as a testbed. Uses raw pixels as input; not provided with any game-specific information or hand-designed visual features. V. Mnih, K. Kavukcuoglu, D. Silver, et al., Human-level control through deep reinforcement learning, Nature, 518(7540):529–533, 2015.

Contribution This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent capable of learning to excel at a diverse array of challenging tasks. It develops a novel artificial agent, termed a deep Q-network (DQN), that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. V. Mnih, K. Kavukcuoglu, D. Silver, et al., Human-level control through deep reinforcement learning, Nature, 518(7540):529–533, 2015.

What Hinton, Bengio & LeCun said about deep RL "We expect much of the future progress in vision to come from systems that are trained end-to-end and combine ConvNets with RNNs that use reinforcement learning to decide where to look." Yann LeCun, Yoshua Bengio & Geoffrey Hinton, Deep Learning, Nature, doi:10.1038/nature14539, 2015.

What is RL? Credit: Deep RL tutorial by David Silver, Google DeepMind. Policy: a deterministic mapping $a = \pi(s)$ or a stochastic one $\pi(a \mid s)$. RL's task: find the optimal policy, i.e. the one that maximizes the expected cumulative reward.

Markov Decision Process Ordered sequence of states, actions and rewards: $s_0, a_0, r_1, s_1, a_1, r_2, \dots$. Markov property: the next state depends only on the current state and action, so every action the agent takes affects the subsequent state of the environment in a well-defined way. MDP: the tuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$ of states, actions, transition probabilities, rewards and discount factor.

Return, Value Function and Bellman Equation Return: $G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$. Value function: $V^\pi(s) = \mathbb{E}\left[ G_t \mid s_t = s, \pi \right]$. Bellman equation: $V^\pi(s) = \mathbb{E}\left[ r_{t+1} + \gamma V^\pi(s_{t+1}) \mid s_t = s, \pi \right]$.

Action-value function $Q^\pi(s, a) = \mathbb{E}\left[ G_t \mid s_t = s, a_t = a, \pi \right]$. Optimal action-value function: $Q^*(s, a) = \max_\pi Q^\pi(s, a)$, which obeys the Bellman optimality equation $Q^*(s, a) = \mathbb{E}\left[ r + \gamma \max_{a'} Q^*(s', a') \mid s, a \right]$. Iterative update: $Q_{i+1}(s, a) = \mathbb{E}\left[ r + \gamma \max_{a'} Q_i(s', a') \mid s, a \right]$, which converges to $Q^*$ as $i \to \infty$.

Why this basic approach is impractical 1. The action-value function is estimated separately for each sequence, without any generalization. 2. Computing the updated value $Q_{i+1}$ requires the previous estimate $Q_i$ at every successor state, so the whole table must be stored and swept repeatedly. 3. Curse of dimensionality: the table is intractably large for raw sensory inputs. Solution: use a function approximator!

How to approximate 1. By a linear function approximator: $Q(s, a; \theta) \approx Q^*(s, a)$ with $\theta$ a weight vector over hand-designed features. 2. By a nonlinear function approximator such as a neural network (Q-network).
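
A minimal sketch (not from the slides) contrasting the two options; the feature map phi_sa and the tiny two-layer network are hypothetical illustrations, not the architecture used in the paper:

```python
import numpy as np

# Option 1: linear approximator  Q(s, a; theta) ~= theta . phi(s, a)
def q_linear(theta, phi_sa):
    """theta: weight vector, phi_sa: hand-designed feature vector for the pair (s, a)."""
    return np.dot(theta, phi_sa)

# Option 2: nonlinear approximator (a tiny Q-network): the state goes in and one
# Q-value per action comes out, so a single forward pass scores every action.
def q_network(state, W1, b1, W2, b2):
    hidden = np.maximum(0.0, W1 @ state + b1)   # ReLU hidden layer
    return W2 @ hidden + b2                     # vector of Q(s, a) for all actions a
```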

The Arcade Learning Environment (ALE) 1. Visual input (210 x 160 RGB video at 60 Hz). 2. A diverse and interesting set of tasks that were designed to be difficult for human players. 3. Our goal is to create a single neural network agent that is able to successfully learn to play as many of the games as possible.

Schematic illustration

The structure of the CNN. Actual code from https://github.com/kuz/DeepMind-Atari-Deep-Q-Learner.
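
The linked repository implements the network in Lua/Torch; the following is a sketch of the same architecture described in the Nature paper, written here in PyTorch purely for illustration:

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Nature-DQN architecture: 4 stacked 84x84 frames in, one Q-value per action out."""
    def __init__(self, n_actions):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),   # -> 32 x 20 x 20
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # -> 64 x 9 x 9
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # -> 64 x 7 x 7
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),            # linear output layer: Q(s, a) for each action
        )

    def forward(self, x):
        return self.head(self.conv(x / 255.0))   # scale raw pixel values to [0, 1]
```

The three convolutional layers shrink the 84x84x4 input to a 7x7x64 feature map, and the final linear layer outputs one Q-value per valid action, so a single forward pass evaluates every action for a given state.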

Q-learning Approximate target values: $y_i = r + \gamma \max_{a'} Q(s', a'; \theta_{i-1})$. A Q-network can be trained by minimising a sequence of loss functions $L_i(\theta_i)$ that changes at each iteration $i$: $L_i(\theta_i) = \mathbb{E}\left[ \left( y_i - Q(s, a; \theta_i) \right)^2 \right]$. Differentiating the loss function with respect to the weights, we arrive at the following gradient: $\nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}\left[ \left( y_i - Q(s, a; \theta_i) \right) \nabla_{\theta_i} Q(s, a; \theta_i) \right]$.
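
A sketch of how this objective might look in code, assuming the DQN module sketched above and a batch of transition tensors (s, a, r, s_next, done) already stacked from replay memory; the mean squared error plus autograd reproduce the gradient written above:

```python
import torch
import torch.nn.functional as F

def dqn_loss(online_net, target_net, batch, gamma=0.99):
    """One step of the DQN objective L_i(theta_i) = E[(y_i - Q(s, a; theta_i))^2]."""
    s, a, r, s_next, done = batch            # a: long tensor of actions, done: 0/1 float tensor
    q_sa = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)       # Q(s, a; theta_i)
    with torch.no_grad():                    # targets are computed with frozen parameters
        y = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    return F.mse_loss(q_sa, y)               # autograd supplies the gradient above
```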

Key Points 1. Two networks with the same structure: the weights of the target network are updated (copied) from the online network periodically. 2. The algorithm is model-free: it does not explicitly estimate the reward and transition dynamics. 3. The algorithm is off-policy: it learns about the greedy policy while following an epsilon-greedy behaviour policy for exploration.
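
The epsilon-greedy behaviour policy is simple enough to state as a short sketch; the tensor shape assumes the network above, and `state` here is a 4x84x84 float tensor:

```python
import random
import torch

def epsilon_greedy(online_net, state, n_actions, epsilon):
    """Explore with probability epsilon, otherwise act greedily w.r.t. the online Q-network."""
    if random.random() < epsilon:
        return random.randrange(n_actions)                          # random exploratory action
    with torch.no_grad():
        return int(online_net(state.unsqueeze(0)).argmax(dim=1))    # greedy action
```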

Training algorithm for deep Q-networks (1/2) Preprocessing: (1) to encode a single frame, take the maximum value for each pixel colour over the frame being encoded and the previous frame; (2) extract the Y (luminance) channel and rescale from 210x160 to 84x84; (3) stack the 4 most recent frames (84x84x4) as the input to the Q-function; (4) clip the game reward to +1, 0 and -1. Experience replay: (1) greater data efficiency; (2) breaks the correlations between consecutive samples and therefore reduces the variance of the updates; (3) smooths out learning and avoids oscillations or divergence in the parameters.
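
A minimal sketch of these two steps, assuming OpenCV for the colour conversion and resizing; the class and function names are illustrative, not taken from the original code:

```python
import random
from collections import deque
import numpy as np
import cv2  # used here only for colour conversion and resizing

def preprocess(frame, prev_frame):
    """Max over two consecutive RGB frames, take the luminance (Y) channel, rescale to 84x84."""
    maxed = np.maximum(frame, prev_frame)                       # removes Atari sprite flicker
    y = cv2.cvtColor(maxed, cv2.COLOR_RGB2YUV)[:, :, 0]         # 210x160 luminance channel
    return cv2.resize(y, (84, 84), interpolation=cv2.INTER_AREA)

class ReplayMemory:
    """Uniform experience replay: store transitions, sample decorrelated minibatches."""
    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, np.sign(r), s_next, done))    # reward clipped to {-1, 0, +1}

    def sample(self, batch_size=32):
        return random.sample(self.buffer, batch_size)
```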

Training algorithm for deep Q-networks (2/2) 3. Use two networks with the same structure: the delay between the time an update to Q is made and the time the update affects the targets makes divergence or oscillations much more unlikely. 4. Error clipping: clip the error term from the update to be between -1 and 1, which further improves the stability of the algorithm.
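
Error clipping can be realised by replacing the squared loss with the Huber (smooth L1) loss; this is a sketch of that common equivalence, not the paper's original implementation:

```python
import torch.nn.functional as F

# Clipping the error term (y - Q(s, a; theta)) to [-1, 1] amounts to a squared loss inside
# that interval and an absolute-value loss outside it, i.e. the Huber / smooth-L1 loss,
# which keeps the gradient magnitude bounded.
def clipped_dqn_loss(q_sa, y):
    return F.smooth_l1_loss(q_sa, y)
```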

Algorithm: deep Q-learning with experience replay.
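
The full training loop can be sketched by combining the pieces above. This assumes the hypothetical DQN, ReplayMemory, epsilon_greedy, dqn_loss and collate helpers sketched earlier, plus a Gym-style environment that already returns stacked 4x84x84 (channel-first) observations; hyperparameters and API details are illustrative only:

```python
import copy
import torch

def train_dqn(env, online_net, memory, n_actions,
              steps=10_000_000, gamma=0.99, batch_size=32,
              target_update_every=10_000, learn_every=4):
    """Sketch of deep Q-learning with experience replay."""
    target_net = copy.deepcopy(online_net)            # second network with the same structure
    optimizer = torch.optim.RMSprop(online_net.parameters(), lr=2.5e-4)

    state, _ = env.reset()
    for t in range(steps):
        eps = max(0.1, 1.0 - t / 1_000_000)           # anneal epsilon from 1.0 down to 0.1
        action = epsilon_greedy(online_net,
                                torch.as_tensor(state, dtype=torch.float32),
                                n_actions, eps)
        next_state, reward, done, truncated, _ = env.step(action)
        memory.push(state, action, reward, next_state, done)
        state = next_state if not (done or truncated) else env.reset()[0]

        if t % learn_every == 0 and len(memory.buffer) >= batch_size:
            batch = memory.sample(batch_size)         # decorrelated minibatch of transitions
            loss = dqn_loss(online_net, target_net, collate(batch), gamma)  # collate: stack to tensors (hypothetical helper)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if t % target_update_every == 0:              # periodically copy weights to the target network
            target_net.load_state_dict(online_net.state_dict())
```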

Comparison

Visualization of learned value functions (1/2)

Visualization of learned value functions (2/2)

Comparison of performance with and without experience replay and the target Q-network

Some improvements of DQN 1. Calculation of the target Q value: Hado van Hasselt, Arthur Guez, David Silver, Deep Reinforcement Learning with Double Q-learning, 22 Sep 2015. 2. Prioritized experience replay: Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, Prioritized Experience Replay, 18 Nov 2015. 3. Dueling network architectures: Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas, Dueling Network Architectures for Deep Reinforcement Learning, 20 Nov 2015 (ICML best paper).

Double Q-learning The max operator in the standard DQN target uses the same values both to select and to evaluate an action, which tends to overestimate action values. Double Q-learning decouples selection from evaluation: the online network chooses the action, $a^* = \arg\max_{a'} Q(s', a'; \theta_i)$, and the target network evaluates it, giving the target $y_i = r + \gamma\, Q(s', a^*; \theta^-)$.
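
A sketch of the Double DQN target computation under the same assumptions as the earlier snippets (online and target networks with the architecture sketched above):

```python
import torch

def double_dqn_target(online_net, target_net, r, s_next, done, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    with torch.no_grad():
        a_star = online_net(s_next).argmax(dim=1, keepdim=True)     # argmax_a' Q(s', a'; theta)
        q_eval = target_net(s_next).gather(1, a_star).squeeze(1)    # Q(s', a*; theta^-)
        return r + gamma * (1 - done) * q_eval
```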

End & Thanks