Chapter 8: Generalization and Function Approximation
Slides adapted from R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction.

Objectives of this chapter:
- Look at how experience with a limited part of the state set can be used to produce good behavior over a much larger part.
- Give an overview of function approximation (FA) methods and how they can be adapted to RL.

Value Prediction with FA

As usual, we start with policy evaluation (the prediction problem): for a given policy π, compute the state-value function v_π. In earlier chapters, value functions were stored in lookup tables; here the estimate V_t is a parameterized function of a weight vector θ_t, with many fewer parameters than there are states.

Adapt Supervised Learning Algorithms

[Diagram: a supervised learning system maps inputs to outputs and is trained with the desired (target) outputs.]
- Training example = {input, target output}
- Error = (target output – actual output)

Backups as Training Examples

A backup can be treated as a training example: the input is a description of the backed-up state, and the target output is the backed-up value estimate.
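As a concrete case (the standard TD(0) form from Sutton & Barto, Ch. 6, reconstructed here rather than quoted from the slide), the training pair is:

\[
\text{input: a description of } s_t,
\qquad
\text{target output: } r_{t+1} + \gamma\,V_t(s_{t+1}).
\]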

Any Function Approximation Method?

- In principle, yes: artificial neural networks, decision trees, multivariate regression methods, etc.
- But RL has some special requirements:
  - we usually want to learn while interacting (on-line, incrementally)
  - the method must handle nonstationarity (the targets change as the policy and value estimates change)
  - others?

Gradient Descent Methods

Assume V_t is a smooth, differentiable function of a column parameter vector θ_t = (θ_t(1), θ_t(2), …, θ_t(n))ᵀ (the ᵀ denotes transpose), and that only this vector is updated as learning proceeds.

Performance Measures

- Many are applicable, but a common and simple one is the mean-squared error (MSE) over a distribution P that weights the errors in different states (see the formula after this list).
- Why P? With fewer parameters than states we cannot drive the error to zero everywhere, so P specifies which states' errors we care most about.
- Why minimize MSE? (Our real goal is a good policy; MSE is the standard, tractable surrogate.)
- We assume that P is always the distribution of states at which backups are done.
- The on-policy distribution: the distribution created while following the policy being evaluated. Stronger results are available for this distribution.
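In the book's notation, the MSE objective referred to above is (a reconstruction of the standard formula):

\[
\mathrm{MSE}(\vec\theta_t) \;=\; \sum_{s \in S} P(s)\,\bigl[\,v_\pi(s) - V_t(s)\,\bigr]^2 .
\]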

Gradient Descent

Iteratively move a small step down the gradient of the objective function, i.e., in the direction that most rapidly decreases it.

Gradient Descent (cont.)

For the MSE given above, applying the chain rule gives the exact gradient-descent update shown below.
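The resulting update, in the book's notation, steps the parameters a fraction α down the exact MSE gradient:

\[
\vec\theta_{t+1}
= \vec\theta_t - \tfrac{1}{2}\,\alpha\,\nabla_{\vec\theta_t}\mathrm{MSE}(\vec\theta_t)
= \vec\theta_t + \alpha \sum_{s \in S} P(s)\,\bigl[v_\pi(s) - V_t(s)\bigr]\,\nabla_{\vec\theta_t} V_t(s).
\]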

Gradient Descent (cont.)

Assume that states appear in the training examples with distribution P, and use just the sample gradient instead of the full gradient. Since each sample gradient is an unbiased estimate of the true gradient, this converges to a local minimum of the MSE if the step size α decreases appropriately with t.
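The per-sample form of the same update (again a standard reconstruction):

\[
\vec\theta_{t+1} = \vec\theta_t + \alpha\,\bigl[v_\pi(s_t) - V_t(s_t)\bigr]\,\nabla_{\vec\theta_t} V_t(s_t).
\]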

But We Don't Have These Targets

In practice the true value v_π(s_t) is unknown, so we substitute an estimate of it as the training target.
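The usual substitute targets, following Sutton & Barto (the Monte Carlo target is an unbiased sample of v_π(s_t); the TD and DP targets are biased because they bootstrap from the current estimate):

\[
\begin{aligned}
\text{Monte Carlo target:}\quad & R_t \ \ (\text{the actual return})\\
\text{TD(0) target:}\quad & r_{t+1} + \gamma\,V_t(s_{t+1})\\
\text{DP target:}\quad & \mathbb{E}_\pi\!\bigl\{\, r_{t+1} + \gamma\,V_t(s_{t+1}) \mid s_t \,\bigr\}
\end{aligned}
\]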

What About TD(λ) Targets?

The λ-return can also be used as the training target; for λ < 1 it bootstraps from the current estimate and is therefore biased.
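For reference, the λ-return target mixes n-step returns (the standard definition from Sutton & Barto, Ch. 7):

\[
R_t^{\lambda} = (1-\lambda)\sum_{n=1}^{\infty} \lambda^{\,n-1} R_t^{(n)},
\qquad
R_t^{(n)} = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{\,n-1} r_{t+n} + \gamma^{\,n}\,V_t(s_{t+n}).
\]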

On-Line Gradient-Descent TD(λ)
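As a concrete illustration of what this algorithm amounts to, here is a minimal sketch assuming a linear value function V(s) = θ·φ(s), accumulating eligibility traces, and an illustrative env.reset()/env.step() interface (the function and parameter names are mine, not from the slides):

```python
import numpy as np

def gradient_td_lambda(env, features, policy, n_params,
                       alpha=0.1, gamma=1.0, lam=0.9, n_episodes=100):
    """On-line gradient-descent TD(lambda) with linear function approximation.

    features(s) -> np.ndarray of length n_params (the feature vector phi(s))
    policy(s)   -> an action
    env.step(a) -> (next_state, reward, done)   # assumed interface
    """
    theta = np.zeros(n_params)
    for _ in range(n_episodes):
        e = np.zeros(n_params)            # eligibility-trace vector
        s = env.reset()
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            phi = features(s)
            v = theta @ phi
            v_next = 0.0 if done else theta @ features(s_next)
            delta = r + gamma * v_next - v    # TD error
            e = gamma * lam * e + phi         # accumulate trace (grad of V is phi)
            theta += alpha * delta * e        # gradient-descent TD(lambda) update
            s = s_next
    return theta
```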

Linear Methods

The value estimate is linear in the parameters: V_t(s) = θ_t · φ_s, where φ_s is a feature vector with one component per parameter.

Nice Properties of Linear FA Methods

- The gradient is very simple: ∇_θ V_t(s) = φ_s.
- For MSE, the error surface is simple: a quadratic surface with a single minimum.
- Linear gradient-descent TD(λ) converges, provided that:
  - the step size decreases appropriately, and
  - states are sampled on-line from the on-policy distribution.
- It converges to a parameter vector whose error is within a bounded factor of the error of the best parameter vector (Tsitsiklis & Van Roy, 1997); see the bound below.
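The convergence result cited (Tsitsiklis & Van Roy, 1997) bounds the asymptotic error relative to the best achievable linear approximation; in the book's statement:

\[
\mathrm{MSE}(\vec\theta_\infty) \;\le\; \frac{1-\gamma\lambda}{1-\gamma}\,\mathrm{MSE}(\vec\theta^{\,*}),
\]

where θ* is the minimum-MSE parameter vector.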

Coarse Coding

Represent a state by many overlapping ("coarse") binary features, e.g., circles in state space: a feature is 1 if the state lies inside its circle and 0 otherwise. The sizes and shapes of these receptive fields determine how learning generalizes across states.

Learning and Coarse Coding

[Figure: example of learning a one-dimensional function with coarse coding; the width of the features strongly affects generalization early in learning but has little effect on the final, asymptotic solution.]

Tile Coding

- The state space is covered by several overlapping tilings; each tile is a binary feature.
- The number of features present (active) at any one time is constant: one per tiling.
- Binary features make the weighted sum easy to compute: just add up the weights of the active tiles.
- It is easy to compute the indices of the features present (see the sketch below).
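A minimal sketch of a grid-style tile coder for a 2-D state, assuming uniformly offset tilings; the function name, offset scheme, and parameters are illustrative choices, not the CMAC/hashing variants discussed on the next slide:

```python
import numpy as np

def active_tiles(x, y, n_tilings=8, tiles_per_dim=10,
                 x_range=(0.0, 1.0), y_range=(0.0, 1.0)):
    """Return one active tile index per tiling for a 2-D point (x, y).

    Each tiling is a tiles_per_dim x tiles_per_dim grid, offset by a
    different fraction of a tile width, so the binary feature vector
    has exactly n_tilings ones.
    """
    indices = []
    x_scaled = (x - x_range[0]) / (x_range[1] - x_range[0]) * tiles_per_dim
    y_scaled = (y - y_range[0]) / (y_range[1] - y_range[0]) * tiles_per_dim
    for t in range(n_tilings):
        offset = t / n_tilings        # each tiling shifted by a fraction of a tile
        col = int(np.clip(x_scaled + offset, 0, tiles_per_dim - 1))
        row = int(np.clip(y_scaled + offset, 0, tiles_per_dim - 1))
        # flatten (tiling, row, col) into a single feature index
        indices.append(t * tiles_per_dim**2 + row * tiles_per_dim + col)
    return indices

# The approximate value is then just the sum of the weights of the active tiles:
#   V(s) = sum(theta[i] for i in active_tiles(*s))
```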

Tile Coding (cont.)

- Irregular tilings are also possible.
- Hashing can pseudo-randomly collapse a huge tile space into a much smaller set of features.
- Tile coding is also known as CMAC, the "Cerebellar Model Arithmetic Computer" (Albus, 1971).

Radial Basis Functions (RBFs)

Features with graded responses in [0, 1] rather than binary ones, e.g., Gaussians centered on prototype states.
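The typical Gaussian form of such a feature (standard formula, with c_i the feature's center and σ_i its width) is:

\[
\phi_s(i) \;=\; \exp\!\left(-\,\frac{\lVert s - c_i \rVert^{2}}{2\,\sigma_i^{2}}\right).
\]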

Can You Beat the "Curse of Dimensionality"?

- Can you keep the number of features from going up exponentially with the dimension?
- Function complexity, not dimensionality, is the real problem.
- Kanerva coding (sketched below):
  - select a set of binary prototype states
  - use Hamming distance as the distance measure
  - dimensionality is no longer a problem, only complexity
- "Lazy learning" schemes:
  - remember all the data
  - to get a new value, find the nearest neighbors and interpolate
  - e.g., locally weighted regression
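A minimal sketch of Kanerva coding under the assumptions above: binary state vectors, randomly chosen binary prototypes, and a feature that is active when the state is within some Hamming-distance threshold of its prototype (the prototype count and threshold are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_prototypes(n_prototypes, n_bits):
    """Random binary prototype states; the number needed depends on the
    complexity of the target function, not on n_bits (the dimensionality)."""
    return rng.integers(0, 2, size=(n_prototypes, n_bits))

def kanerva_features(state, prototypes, threshold):
    """Binary feature vector: feature i is 1 iff the Hamming distance
    between the state and prototype i is at most the threshold."""
    hamming = np.sum(prototypes != state, axis=1)
    return (hamming <= threshold).astype(float)

# Example usage with a linear value estimate V(s) = theta . phi(s):
prototypes = make_prototypes(n_prototypes=500, n_bits=100)
theta = np.zeros(len(prototypes))
state = rng.integers(0, 2, size=100)
phi = kanerva_features(state, prototypes, threshold=40)
value = theta @ phi
```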

Control with Function Approximation

- We now learn state-action values, so training examples take the form of a (state, action) description paired with a target value.
- The general gradient-descent rule, and its gradient-descent Sarsa(λ) specialization (backward view), are given below.
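In the book's notation (a standard reconstruction), with v_t denoting the training target, the general rule and the Sarsa(λ) backward view are:

\[
\begin{aligned}
\vec\theta_{t+1} &= \vec\theta_t + \alpha\,\bigl[\,v_t - Q_t(s_t,a_t)\bigr]\,\nabla_{\vec\theta_t} Q_t(s_t,a_t),\\[4pt]
\vec\theta_{t+1} &= \vec\theta_t + \alpha\,\delta_t\,\vec e_t,\qquad
\delta_t = r_{t+1} + \gamma\,Q_t(s_{t+1},a_{t+1}) - Q_t(s_t,a_t),\\
\vec e_t &= \gamma\lambda\,\vec e_{t-1} + \nabla_{\vec\theta_t} Q_t(s_t,a_t).
\end{aligned}
\]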

GPI with Linear Gradient-Descent Sarsa(λ)
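A minimal sketch of the algorithm behind this slide title, assuming binary (e.g., tile-coded) features per (state, action) pair, ε-greedy action selection, accumulating traces, and an illustrative env.reset()/env.step() interface; all names are placeholders rather than the book's pseudocode:

```python
import numpy as np

def linear_sarsa_lambda(env, active_features, n_params, actions,
                        alpha=0.1, gamma=1.0, lam=0.9, epsilon=0.1,
                        n_episodes=500):
    """Episodic Sarsa(lambda) with linear FA over binary features.

    active_features(s, a) -> list of feature indices that are 1 for (s, a)
    """
    rng = np.random.default_rng(0)
    theta = np.zeros(n_params)

    def q(s, a):
        return theta[active_features(s, a)].sum()

    def eps_greedy(s):
        if rng.random() < epsilon:
            return rng.choice(actions)
        return max(actions, key=lambda act: q(s, act))

    for _ in range(n_episodes):
        e = np.zeros(n_params)                   # eligibility traces
        s = env.reset()
        a = eps_greedy(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            delta = r - q(s, a)
            e[active_features(s, a)] += 1.0      # accumulating traces
            if done:
                theta += alpha * delta * e
                break
            a_next = eps_greedy(s_next)
            delta += gamma * q(s_next, a_next)
            theta += alpha * delta * e
            e *= gamma * lam
            s, a = s_next, a_next
    return theta
```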

GPI with Linear Gradient-Descent Watkins' Q(λ)

Mountain-Car Task

Drive an underpowered car up a steep mountain road. Gravity is stronger than the car's engine, so the car must first back up the opposite slope to gain enough momentum to reach the goal at the top.
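For concreteness, a sketch of the standard mountain-car dynamics as described in Sutton & Barto (three actions: full throttle forward, full throttle reverse, and zero throttle; reward of -1 per step until the goal is reached):

```python
import math

# Standard mountain-car dynamics from Sutton & Barto.
POS_MIN, POS_MAX = -1.2, 0.5      # goal is at POS_MAX
VEL_MIN, VEL_MAX = -0.07, 0.07

def step(position, velocity, action):
    """action in {-1, 0, +1}: reverse, coast, forward throttle."""
    velocity += 0.001 * action - 0.0025 * math.cos(3 * position)
    velocity = max(VEL_MIN, min(VEL_MAX, velocity))
    position += velocity
    position = max(POS_MIN, min(POS_MAX, position))
    if position == POS_MIN:       # hit the left wall: velocity reset to zero
        velocity = 0.0
    done = position >= POS_MAX    # reached the goal
    reward = -1.0                 # -1 per time step until the goal is reached
    return position, velocity, reward, done
```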

Mountain-Car Results

Baird's Counterexample

The reward is always zero, so the true value is zero for every state. The approximate (linear) state-value function is shown at each state in the diagram. Updating the parameters with full (DP-style) backups, with states updated uniformly rather than according to the on-policy distribution, is unstable from some initial conditions: the parameters can diverge.

Baird's Counterexample (cont.)

Should We Bootstrap?

Summary

- Generalization
- Adapting supervised-learning function approximation methods
- Gradient-descent methods
- Linear gradient-descent methods:
  - radial basis functions
  - tile coding
  - Kanerva coding
- Nonlinear gradient-descent methods? Backpropagation?
- Subtleties involving function approximation, bootstrapping, and the on-policy/off-policy distinction