Deep Reinforcement Learning

Presentation transcript:

Deep Reinforcement Learning
1. Learning to play Atari arcade games
2. Learning to play Go

Learning to play Atari video games with Deep Reinforcement Learning (Mnih et al., DeepMind Technologies, 2013)
Deep learning:
- Requires large amounts of hand-labeled data
- Assumes data samples are i.i.d., with a stationary distribution
Reinforcement learning:
- Must learn from a sparse, noisy, delayed reward signal
- "Samples" are not independent
- Data distribution can change as the system learns "online"

Learning to play Atari video games with Deep Reinforcement Learning (Mnih et al., DeepMind Technologies, 2013)
https://www.youtube.com/watch?v=Up-a5x3coC0
Uses a convolutional neural network:
- Input is the raw pixels of recent video frames (~ the "state")
- Output is the estimated Q(s, a) for each possible action

Their system learns to play Atari 2600 games: 210x160 RGB video at 60 Hz, designed to be difficult for human players.
"Our goal is to create a single neural network agent that is able to successfully learn to play as many of the games as possible."
- No game-specific information provided
- No hand-designed visual features
- Learns exclusively from the video input, the reward, and terminal signals
- Network architecture and hyperparameters kept constant across all games

CNN: input is the set of the 4 most recent video frames; the output vector gives the Q-value of each possible action for that input state.
At each time step t, the agent selects an action a_t from the set of legal actions A = {1, …, K}.
Action a_t is passed to the emulator; the game state and current score are updated.
The agent observes image x_t from the emulator (a vector of raw pixels representing the current screen).
The agent receives reward r_t representing the change in game score (usually 0).
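A minimal sketch of such a Q-network in PyTorch, assuming the 2013 architecture (two convolutional layers plus one fully connected hidden layer) and 84x84 preprocessed grayscale frames; the class and variable names are illustrative, not from the paper.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a stack of 4 preprocessed frames to one Q-value per action."""
    def __init__(self, n_actions):
        super().__init__()
        # 4 input channels = the 4 most recent 84x84 frames
        self.conv1 = nn.Conv2d(4, 16, kernel_size=8, stride=4)   # -> 16 x 20 x 20
        self.conv2 = nn.Conv2d(16, 32, kernel_size=4, stride=2)  # -> 32 x 9 x 9
        self.fc1 = nn.Linear(32 * 9 * 9, 256)
        self.out = nn.Linear(256, n_actions)  # one Q(s, a) per legal action

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = torch.relu(self.fc1(x.flatten(start_dim=1)))
        return self.out(x)  # shape: (batch, n_actions)
```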

Loss function: the Q-network is trained by minimizing the squared error between its prediction Q(s, a; θ) and the one-step Q-learning target,
L(θ) = E[ ( r + γ max_a' Q(s', a'; θ) − Q(s, a; θ) )^2 ],
where (s, a, r, s') is an observed transition and γ is the discount factor.
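A minimal sketch of computing this loss for a minibatch of transitions, assuming the batch is already given as PyTorch tensors (states, actions, rewards, next_states, dones); the function and argument names are illustrative.

```python
import torch

def q_learning_loss(q_net, batch, gamma=0.99):
    """Squared TD error for a minibatch of (s, a, r, s', done) transitions."""
    states, actions, rewards, next_states, dones = batch
    # Q(s, a; theta) for the actions actually taken
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # One-step target r + gamma * max_a' Q(s', a'; theta); zero future value at terminal states
    with torch.no_grad():
        targets = rewards + gamma * (1 - dones) * q_net(next_states).max(dim=1).values
    return ((targets - q_sa) ** 2).mean()
```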

Three possible sources of training instability / non-convergence:
1. Temporally adjacent "experiences" (s, a, r, s') are clearly not independent or identically distributed.
2. Small updates to Q can significantly change the policy.
3. Correlations between the action values Q and the "target" values.
Proposed methods to address these:
- Use "experience replay".
- Use two networks: one for action values, Q(s, a; θ), and one for targets, Q(s, a; θ−), and only update the target network periodically.

Experience Replay: During training, to alleviate the problem of correlated data and non-stationary distributions, use an "experience replay" mechanism, which randomly samples previous transitions and "thereby smooths the training distribution over many past behaviors." "During learning, we apply Q-learning updates, on samples (or minibatches) of experience (s, a, r, s') drawn uniformly at random from the pool of stored samples."
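A minimal sketch of such a replay buffer, assuming transitions are stored as tuples and sampled uniformly at random; the class name, capacity, and batch size are illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions (s, a, r, s', done) and samples minibatches uniformly at random."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are discarded first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)
        # Regroup into one tuple per field: (states, actions, rewards, next_states, dones)
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.buffer)
```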

Separating the action-value and target networks. Use two networks:
- Q(s, a; θ) is the network that gives the current estimated value of taking action a in state s.
- Q(s', a'; θ−) is the network used for estimating the target.
New loss function:
L(θ) = E[ ( r + γ max_a' Q(s', a'; θ−) − Q(s, a; θ) )^2 ].

Updating the networks: Every time step, Q(s, a; θ_t) is updated using the target computed with Q(s', a'; θ_t−); that is, the loss function is minimized via stochastic gradient descent on the weights θ_t. Every C time steps, θ− is replaced by θ.
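A minimal sketch of one training step with a separate target network, in the same spirit as the sketches above; the batch is assumed to be tensors, and the function name, C, and optimizer handling are illustrative rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def dqn_train_step(q_net, target_net, optimizer, batch, step, gamma=0.99, C=10_000):
    """One gradient step on theta; every C steps, theta- is replaced by theta."""
    states, actions, rewards, next_states, dones = batch
    # Current estimates Q(s, a; theta) for the actions actually taken
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Targets computed with the frozen network Q(s', a'; theta-)
    with torch.no_grad():
        targets = rewards + gamma * (1 - dones) * target_net(next_states).max(dim=1).values
    loss = F.mse_loss(q_sa, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Every C time steps, copy the online weights theta into the target network
    if step % C == 0:
        target_net.load_state_dict(q_net.state_dict())
    return loss.item()
```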

[Figure: output of the CNN]

Experiments: Used the same CNN architecture, learning algorithm, and parameters across a suite of Atari games.

Video https://www.youtube.com/watch?v=Ih8EfvOzBOY

More detailed results [Figures: learning curves for Space Invaders and Seaquest]. Epsilon-greedy action selection, with epsilon = 0.05.
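A minimal sketch of epsilon-greedy action selection, assuming a Q-network like the one sketched earlier: with probability epsilon pick a random legal action, otherwise pick the action with the highest estimated Q-value. The function name and arguments are illustrative.

```python
import random
import torch

def epsilon_greedy(q_net, state, n_actions, epsilon=0.05):
    """Pick a random action with probability epsilon, else the greedy action."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        q_values = q_net(state.unsqueeze(0))  # add a batch dimension
    return int(q_values.argmax(dim=1).item())
```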

"The performance of DQN is normalized with respect to a professional human games tester (that is, 100% level) and random play (that is, 0% level). Note that the normalized performance of DQN, expressed as a percentage, is calculated as: 100 × (DQN score − random play score) / (human score − random play score)."
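This normalization is straightforward to compute; a tiny sketch (function name, arguments, and the example numbers are illustrative, not results from the paper):

```python
def normalized_score(dqn_score, random_score, human_score):
    """Normalized DQN performance: 0% = random play, 100% = professional human tester."""
    return 100.0 * (dqn_score - random_score) / (human_score - random_score)

# Hypothetical example: dqn=4000, random=150, human=7000 -> 100 * 3850 / 6850 ≈ 56.2%
```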

Mastering the game of Go with deep neural networks and tree search (Silver et al., 2016)
https://www.youtube.com/watch?v=53YLZBSS0cc