Deep Reinforcement Learning

Presentation transcript:

Deep Reinforcement Learning Ph.D. student Wangyu (王宇)

Deep Q-Network Published in Nature. A CNN trained with a variant of Q-learning. Uses Atari games as a testbed. Uses raw pixels as input; not provided with any game-specific information or hand-designed visual features. V. Mnih, K. Kavukcuoglu, D. Silver, et al., Human-level control through deep reinforcement learning, Nature, 518(7540):529–533, 2015.

Contribution This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent capable of learning to excel at a diverse array of challenging tasks. It develops a novel artificial agent, termed a deep Q-network (DQN), that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. V. Mnih, K. Kavukcuoglu, D. Silver, et al., Human-level control through deep reinforcement learning, Nature, 518(7540):529–533, 2015.

What Hinton, Bengio & LeCun said about deep RL "We expect much of the future progress in vision to come from systems that are trained end-to-end and combine ConvNets with RNNs that use reinforcement learning to decide where to look." Yann LeCun, Yoshua Bengio & Geoffrey Hinton, Deep Learning, Nature, doi:10.1038/nature14539, 2015.

What is RL? Credit: Deep RL tutorial by David Silver, Google DeepMind. Policy: a deterministic mapping $a = \pi(s)$ or a stochastic one $\pi(a \mid s)$. RL's task: find the optimal policy, i.e. the one that maximizes the expected cumulative reward.

Markov Decision Process Ordered sequence of states, actions and rewards: $s_0, a_0, r_1, s_1, a_1, r_2, \dots$. Markov property: the next state depends only on the current state and action, so every action the agent takes affects the subsequent state of the environment in a well-defined way. MDP: the tuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$ of states, actions, transition probabilities, rewards and discount factor.

Return, Value Function and Bellman Equation Return: $G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$. Value function: $V^\pi(s) = \mathbb{E}\left[ G_t \mid s_t = s, \pi \right]$. Bellman equation: $V^\pi(s) = \mathbb{E}\left[ r_{t+1} + \gamma V^\pi(s_{t+1}) \mid s_t = s, \pi \right]$.

Action-value function $Q^\pi(s, a) = \mathbb{E}\left[ G_t \mid s_t = s, a_t = a, \pi \right]$. Optimal action-value function: $Q^*(s, a) = \max_\pi Q^\pi(s, a)$, which obeys the Bellman optimality equation $Q^*(s, a) = \mathbb{E}\left[ r + \gamma \max_{a'} Q^*(s', a') \mid s, a \right]$. Iterative update: $Q_{i+1}(s, a) = \mathbb{E}\left[ r + \gamma \max_{a'} Q_i(s', a') \mid s, a \right]$, which converges to $Q^*$ as $i \to \infty$.

Why this basic approach is impractical 1. The action-value function is estimated separately for each sequence, without any generalization. 2. Computing the updated value $Q_{i+1}$ requires the previous estimate $Q_i$ at every successor state, so the whole table must be stored and swept repeatedly. 3. Curse of dimensionality: the table is intractably large for raw sensory inputs. Solution: use a function approximator!

How to approximate 1. By a linear function approximator: $Q(s, a; \theta) \approx Q^*(s, a)$ with $\theta$ a weight vector over hand-designed features. 2. By a nonlinear function approximator such as a neural network (Q-network).
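
A minimal sketch (not from the slides) contrasting the two options; the feature map phi_sa and the tiny two-layer network are hypothetical illustrations, not the architecture used in the paper:

```python
import numpy as np

# Option 1: linear approximator  Q(s, a; theta) ~= theta . phi(s, a)
def q_linear(theta, phi_sa):
    """theta: weight vector, phi_sa: hand-designed feature vector for the pair (s, a)."""
    return np.dot(theta, phi_sa)

# Option 2: nonlinear approximator (a tiny Q-network): the state goes in and one
# Q-value per action comes out, so a single forward pass scores every action.
def q_network(state, W1, b1, W2, b2):
    hidden = np.maximum(0.0, W1 @ state + b1)   # ReLU hidden layer
    return W2 @ hidden + b2                     # vector of Q(s, a) for all actions a
```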

The Arcade Learning Environment (ALE) 1. Visual input (210 x 160 RGB video at 60 Hz). 2. A diverse and interesting set of tasks that were designed to be difficult for human players. 3. Our goal is to create a single neural network agent that is able to successfully learn to play as many of the games as possible.

Schematic illustration

The structure of the CNN. Actual code from https://github.com/kuz/DeepMind-Atari-Deep-Q-Learner.
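
The linked repository implements the network in Lua/Torch; the following is a sketch of the same architecture described in the Nature paper, written here in PyTorch purely for illustration:

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Nature-DQN architecture: 4 stacked 84x84 frames in, one Q-value per action out."""
    def __init__(self, n_actions):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),   # -> 32 x 20 x 20
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # -> 64 x 9 x 9
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # -> 64 x 7 x 7
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),            # linear output layer: Q(s, a) for each action
        )

    def forward(self, x):
        return self.head(self.conv(x / 255.0))   # scale raw pixel values to [0, 1]
```

The three convolutional layers shrink the 84x84x4 input to a 7x7x64 feature map, and the final linear layer outputs one Q-value per valid action, so a single forward pass evaluates every action for a given state.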

Q-learning Approximate target values: $y_i = r + \gamma \max_{a'} Q(s', a'; \theta_{i-1})$. A Q-network can be trained by minimising a sequence of loss functions $L_i(\theta_i)$ that changes at each iteration $i$: $L_i(\theta_i) = \mathbb{E}\left[ \left( y_i - Q(s, a; \theta_i) \right)^2 \right]$. Differentiating the loss function with respect to the weights, we arrive at the following gradient: $\nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}\left[ \left( y_i - Q(s, a; \theta_i) \right) \nabla_{\theta_i} Q(s, a; \theta_i) \right]$.
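
A sketch of how this objective might look in code, assuming the DQN module sketched above and a batch of transition tensors (s, a, r, s_next, done) already stacked from replay memory; the mean squared error plus autograd reproduce the gradient written above:

```python
import torch
import torch.nn.functional as F

def dqn_loss(online_net, target_net, batch, gamma=0.99):
    """One step of the DQN objective L_i(theta_i) = E[(y_i - Q(s, a; theta_i))^2]."""
    s, a, r, s_next, done = batch            # a: long tensor of actions, done: 0/1 float tensor
    q_sa = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)       # Q(s, a; theta_i)
    with torch.no_grad():                    # targets are computed with frozen parameters
        y = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    return F.mse_loss(q_sa, y)               # autograd supplies the gradient above
```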

Key Points 1. Two networks with the same structure: the weights of the target network are updated (copied) from the online network periodically. 2. The algorithm is model-free: it does not explicitly estimate the reward and transition dynamics. 3. The algorithm is off-policy: it learns about the greedy policy while following an epsilon-greedy behaviour policy for exploration.
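
The epsilon-greedy behaviour policy is simple enough to state as a short sketch; the tensor shape assumes the network above, and `state` here is a 4x84x84 float tensor:

```python
import random
import torch

def epsilon_greedy(online_net, state, n_actions, epsilon):
    """Explore with probability epsilon, otherwise act greedily w.r.t. the online Q-network."""
    if random.random() < epsilon:
        return random.randrange(n_actions)                          # random exploratory action
    with torch.no_grad():
        return int(online_net(state.unsqueeze(0)).argmax(dim=1))    # greedy action
```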

Training algorithm for deep Q-networks (1/2) Preprocessing: (1) to encode a single frame, take the maximum value for each pixel colour over the frame being encoded and the previous frame; (2) extract the Y (luminance) channel and rescale from 210x160 to 84x84; (3) stack the 4 most recent frames (84x84x4) as the input to the Q-function; (4) clip the game reward to +1, 0 and -1. Experience replay: (1) greater data efficiency; (2) breaks the correlations between consecutive samples and therefore reduces the variance of the updates; (3) smooths out learning and avoids oscillations or divergence in the parameters.
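
A minimal sketch of these two steps, assuming OpenCV for the colour conversion and resizing; the class and function names are illustrative, not taken from the original code:

```python
import random
from collections import deque
import numpy as np
import cv2  # used here only for colour conversion and resizing

def preprocess(frame, prev_frame):
    """Max over two consecutive RGB frames, take the luminance (Y) channel, rescale to 84x84."""
    maxed = np.maximum(frame, prev_frame)                       # removes Atari sprite flicker
    y = cv2.cvtColor(maxed, cv2.COLOR_RGB2YUV)[:, :, 0]         # 210x160 luminance channel
    return cv2.resize(y, (84, 84), interpolation=cv2.INTER_AREA)

class ReplayMemory:
    """Uniform experience replay: store transitions, sample decorrelated minibatches."""
    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, np.sign(r), s_next, done))    # reward clipped to {-1, 0, +1}

    def sample(self, batch_size=32):
        return random.sample(self.buffer, batch_size)
```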

Training algorithm for deep Q-networks (2/2) 3. Use two networks with the same structure: the delay between the time an update to Q is made and the time the update affects the targets makes divergence or oscillations much more unlikely. 4. Error clipping: clip the error term from the update to be between -1 and 1, which further improves the stability of the algorithm.
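
Error clipping can be realised by replacing the squared loss with the Huber (smooth L1) loss; this is a sketch of that common equivalence, not the paper's original implementation:

```python
import torch.nn.functional as F

# Clipping the error term (y - Q(s, a; theta)) to [-1, 1] amounts to a squared loss inside
# that interval and an absolute-value loss outside it, i.e. the Huber / smooth-L1 loss,
# which keeps the gradient magnitude bounded.
def clipped_dqn_loss(q_sa, y):
    return F.smooth_l1_loss(q_sa, y)
```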

Algorithm: deep Q-learning with experience replay.
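
The full training loop can be sketched by combining the pieces above. This assumes the hypothetical DQN, ReplayMemory, epsilon_greedy, dqn_loss and collate helpers sketched earlier, plus a Gym-style environment that already returns stacked 4x84x84 (channel-first) observations; hyperparameters and API details are illustrative only:

```python
import copy
import torch

def train_dqn(env, online_net, memory, n_actions,
              steps=10_000_000, gamma=0.99, batch_size=32,
              target_update_every=10_000, learn_every=4):
    """Sketch of deep Q-learning with experience replay."""
    target_net = copy.deepcopy(online_net)            # second network with the same structure
    optimizer = torch.optim.RMSprop(online_net.parameters(), lr=2.5e-4)

    state, _ = env.reset()
    for t in range(steps):
        eps = max(0.1, 1.0 - t / 1_000_000)           # anneal epsilon from 1.0 down to 0.1
        action = epsilon_greedy(online_net,
                                torch.as_tensor(state, dtype=torch.float32),
                                n_actions, eps)
        next_state, reward, done, truncated, _ = env.step(action)
        memory.push(state, action, reward, next_state, done)
        state = next_state if not (done or truncated) else env.reset()[0]

        if t % learn_every == 0 and len(memory.buffer) >= batch_size:
            batch = memory.sample(batch_size)         # decorrelated minibatch of transitions
            loss = dqn_loss(online_net, target_net, collate(batch), gamma)  # collate: stack to tensors (hypothetical helper)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if t % target_update_every == 0:              # periodically copy weights to the target network
            target_net.load_state_dict(online_net.state_dict())
```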

Comparison

Visualization of learned value functions (1/2)

Visualization of learned value functions (2/2)

Comparison of performance with and without experience replay and the target Q-network

Some improvements of DQN 1. Calculation of the target Q value: Hado van Hasselt, Arthur Guez, David Silver, Deep Reinforcement Learning with Double Q-learning, 22 Sep 2015. 2. Prioritized experience replay: Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, Prioritized Experience Replay, 18 Nov 2015. 3. Dueling network architectures: Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas, Dueling Network Architectures for Deep Reinforcement Learning, 20 Nov 2015 (ICML best paper).

Double Q-learning The max operator in the standard DQN target uses the same values both to select and to evaluate an action, which tends to overestimate action values. Double Q-learning decouples selection from evaluation: the online network chooses the action, $a^* = \arg\max_{a'} Q(s', a'; \theta_i)$, and the target network evaluates it, giving the target $y_i = r + \gamma\, Q(s', a^*; \theta^-)$.
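
A sketch of the Double DQN target computation under the same assumptions as the earlier snippets (online and target networks with the architecture sketched above):

```python
import torch

def double_dqn_target(online_net, target_net, r, s_next, done, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    with torch.no_grad():
        a_star = online_net(s_next).argmax(dim=1, keepdim=True)     # argmax_a' Q(s', a'; theta)
        q_eval = target_net(s_next).gather(1, a_star).squeeze(1)    # Q(s', a*; theta^-)
        return r + gamma * (1 - done) * q_eval
```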

End & Thanks