Deep Reinforcement Learning

Presentation transcript:

Deep Reinforcement Learning
1. Learning to play Atari arcade games
2. Learning to play Go

Learning to play Atari video games with Deep Reinforcement Learning (Mnih et al., DeepMind Technologies, 2013)
Deep learning:
- Requires large amounts of hand-labeled data
- Assumes data samples are i.i.d., with a stationary distribution
Reinforcement learning:
- Must learn from a sparse, noisy, delayed reward signal
- "Samples" are not independent
- Data distribution can change as the system learns "online"

Learning to play Atari video games with Deep Reinforcement Learning (Mnih et al., DeepMind Technologies, 2013)
https://www.youtube.com/watch?v=Up-a5x3coC0
Uses a convolutional neural network:
- Input is the raw pixels of recent video frames (~ the "state")
- Output is the estimated Q(s, a) for each possible action

Their system learns to play Atari 2600 games: 210x160 RGB video at 60 Hz, designed to be difficult for human players.
"Our goal is to create a single neural network agent that is able to successfully learn to play as many of the games as possible."
- No game-specific information provided
- No hand-designed visual features
- Learns exclusively from the video input, the reward, and terminal signals
- Network architecture and hyperparameters kept constant across all games

CNN: input is the set of the 4 most recent video frames; the output vector gives the Q-value of each possible action for that input state.
At each time step t, the agent selects an action a_t from the set of legal actions A = {1, …, K}.
Action a_t is passed to the emulator; the game state and current score are updated.
The agent observes image x_t from the emulator (a vector of raw pixels representing the current screen).
The agent receives reward r_t representing the change in game score (usually 0).
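A minimal sketch of such a Q-network in PyTorch, assuming the 2013 architecture (two convolutional layers plus one fully connected hidden layer) and 84x84 preprocessed grayscale frames; the class and variable names are illustrative, not from the paper.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a stack of 4 preprocessed frames to one Q-value per action."""
    def __init__(self, n_actions):
        super().__init__()
        # 4 input channels = the 4 most recent 84x84 frames
        self.conv1 = nn.Conv2d(4, 16, kernel_size=8, stride=4)   # -> 16 x 20 x 20
        self.conv2 = nn.Conv2d(16, 32, kernel_size=4, stride=2)  # -> 32 x 9 x 9
        self.fc1 = nn.Linear(32 * 9 * 9, 256)
        self.out = nn.Linear(256, n_actions)  # one Q(s, a) per legal action

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = torch.relu(self.fc1(x.flatten(start_dim=1)))
        return self.out(x)  # shape: (batch, n_actions)
```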

Loss function: the Q-network is trained by minimizing the squared error between its prediction Q(s, a; θ) and the one-step Q-learning target,
L(θ) = E[ ( r + γ max_a' Q(s', a'; θ) − Q(s, a; θ) )^2 ],
where (s, a, r, s') is an observed transition and γ is the discount factor.
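A minimal sketch of computing this loss for a minibatch of transitions, assuming the batch is already given as PyTorch tensors (states, actions, rewards, next_states, dones); the function and argument names are illustrative.

```python
import torch

def q_learning_loss(q_net, batch, gamma=0.99):
    """Squared TD error for a minibatch of (s, a, r, s', done) transitions."""
    states, actions, rewards, next_states, dones = batch
    # Q(s, a; theta) for the actions actually taken
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # One-step target r + gamma * max_a' Q(s', a'; theta); zero future value at terminal states
    with torch.no_grad():
        targets = rewards + gamma * (1 - dones) * q_net(next_states).max(dim=1).values
    return ((targets - q_sa) ** 2).mean()
```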

Three possible sources of training instability / non-convergence:
1. Temporally adjacent "experiences" (s, a, r, s') are clearly not independent or identically distributed.
2. Small updates to Q can significantly change the policy.
3. Correlations between the action values Q and the "target" values.
Proposed methods to address these:
- Use "experience replay".
- Use two networks: one for action values, Q(s, a; θ), and one for targets, Q(s, a; θ−), and only update the target network periodically.

Experience Replay: During training, to alleviate the problem of correlated data and non-stationary distributions, use an "experience replay" mechanism, which randomly samples previous transitions and "thereby smooths the training distribution over many past behaviors." "During learning, we apply Q-learning updates, on samples (or minibatches) of experience (s, a, r, s') drawn uniformly at random from the pool of stored samples."
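A minimal sketch of such a replay buffer, assuming transitions are stored as tuples and sampled uniformly at random; the class name, capacity, and batch size are illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions (s, a, r, s', done) and samples minibatches uniformly at random."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are discarded first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)
        # Regroup into one tuple per field: (states, actions, rewards, next_states, dones)
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.buffer)
```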

Separating the action-value and target networks. Use two networks:
- Q(s, a; θ) is the network that gives the current estimated value of taking action a in state s.
- Q(s', a'; θ−) is the network used for estimating the target.
New loss function:
L(θ) = E[ ( r + γ max_a' Q(s', a'; θ−) − Q(s, a; θ) )^2 ].

Updating the networks: Every time step, Q(s, a; θ_t) is updated using the target computed with Q(s', a'; θ_t−); that is, the loss function is minimized via stochastic gradient descent on the weights θ_t. Every C time steps, θ− is replaced by θ.
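A minimal sketch of one training step with a separate target network, in the same spirit as the sketches above; the batch is assumed to be tensors, and the function name, C, and optimizer handling are illustrative rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def dqn_train_step(q_net, target_net, optimizer, batch, step, gamma=0.99, C=10_000):
    """One gradient step on theta; every C steps, theta- is replaced by theta."""
    states, actions, rewards, next_states, dones = batch
    # Current estimates Q(s, a; theta) for the actions actually taken
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Targets computed with the frozen network Q(s', a'; theta-)
    with torch.no_grad():
        targets = rewards + gamma * (1 - dones) * target_net(next_states).max(dim=1).values
    loss = F.mse_loss(q_sa, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Every C time steps, copy the online weights theta into the target network
    if step % C == 0:
        target_net.load_state_dict(q_net.state_dict())
    return loss.item()
```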

[Figure: output of the CNN]

Experiments: Used the same CNN architecture, learning algorithm, and parameters across a suite of Atari games.

Video https://www.youtube.com/watch?v=Ih8EfvOzBOY

More detailed results [Figures: learning curves for Space Invaders and Seaquest]. Epsilon-greedy action selection, with epsilon = 0.05.
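A minimal sketch of epsilon-greedy action selection, assuming a Q-network like the one sketched earlier: with probability epsilon pick a random legal action, otherwise pick the action with the highest estimated Q-value. The function name and arguments are illustrative.

```python
import random
import torch

def epsilon_greedy(q_net, state, n_actions, epsilon=0.05):
    """Pick a random action with probability epsilon, else the greedy action."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        q_values = q_net(state.unsqueeze(0))  # add a batch dimension
    return int(q_values.argmax(dim=1).item())
```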

"The performance of DQN is normalized with respect to a professional human games tester (that is, 100% level) and random play (that is, 0% level). Note that the normalized performance of DQN, expressed as a percentage, is calculated as: 100 × (DQN score − random play score) / (human score − random play score)."
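This normalization is straightforward to compute; a tiny sketch (function name, arguments, and the example numbers are illustrative, not results from the paper):

```python
def normalized_score(dqn_score, random_score, human_score):
    """Normalized DQN performance: 0% = random play, 100% = professional human tester."""
    return 100.0 * (dqn_score - random_score) / (human_score - random_score)

# Hypothetical example: dqn=4000, random=150, human=7000 -> 100 * 3850 / 6850 ≈ 56.2%
```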

Mastering the game of Go with deep neural networks and tree search (Silver et al., 2016)
https://www.youtube.com/watch?v=53YLZBSS0cc