Slide 1: Chapter 8: Generalization and Function Approximation
(R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction)
Objectives of this chapter:
- Look at how experience with a limited part of the state set can be used to produce good behavior over a much larger part.
- Overview of function approximation (FA) methods and how they can be adapted to RL.
Slide 2: Any FA Method?
- In principle, yes:
  - artificial neural networks
  - decision trees
  - multivariate regression methods
  - etc.
- But RL has some special requirements:
  - usually want to learn while interacting
  - ability to handle nonstationarity
  - other?
Slide 3: Gradient Descent Methods
Approximate the value function with a differentiable function of a parameter vector.
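The equations on this slide are images in the original and did not survive extraction. The standard setup from the book, which this slide presumably shows, defines the parameter (weight) vector as a column vector (hence the stray word "transpose" above) and asks the approximate value function to track the true one:

```latex
\vec{\theta}_t = \bigl(\theta_t(1),\ \theta_t(2),\ \dots,\ \theta_t(n)\bigr)^{\top},
\qquad
V_t(s) \approx V^{\pi}(s) \ \text{for all}\ s \in S .
```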
Slide 4: Performance Measures
- Many are applicable, but a common and simple one is the mean-squared error (MSE) over a distribution P.
- Why P? Why minimize MSE?
- Let us assume that P is always the distribution of states at which backups are done.
- The on-policy distribution: the distribution created while following the policy being evaluated. Stronger results are available for this distribution.
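The MSE formula itself is an image on the original slide; the standard definition from the book, weighting each state's squared value error by P, is presumably what it shows:

```latex
\mathrm{MSE}(\vec{\theta}_t) \;=\; \sum_{s \in S} P(s)\,\bigl[V^{\pi}(s) - V_t(s)\bigr]^{2}
```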
Slide 5: Gradient Descent
Iteratively move down the gradient:
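The update rule on this slide is an image; the generic gradient-descent step it presumably illustrates, for some differentiable objective f of the parameter vector and step size α, is:

```latex
\vec{\theta}_{t+1} \;=\; \vec{\theta}_t \;-\; \alpha\,\nabla_{\vec{\theta}}\, f(\vec{\theta}_t)
```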
Slide 6: On-Line Gradient-Descent TD(λ)
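The algorithm box on this slide is an image. Below is a minimal Python sketch of on-line gradient-descent TD(λ) for prediction with a linear approximator and accumulating eligibility traces; `env` (with `reset`/`step`), `policy`, and the feature function `phi` are assumed placeholders, not anything defined on the slide:

```python
import numpy as np

def td_lambda(env, policy, phi, n_features, alpha=0.01, gamma=1.0, lam=0.9, episodes=100):
    """Gradient-descent TD(lambda) prediction with linear function approximation."""
    theta = np.zeros(n_features)              # parameter vector
    for _ in range(episodes):
        s = env.reset()
        e = np.zeros(n_features)               # eligibility trace vector
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            v = theta @ phi(s)                  # current estimate V(s)
            v_next = 0.0 if done else theta @ phi(s_next)
            delta = r + gamma * v_next - v      # TD error
            e = gamma * lam * e + phi(s)        # for linear FA, grad V(s) = phi(s)
            theta += alpha * delta * e          # gradient-descent update
            s = s_next
    return theta
```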
Slide 7: Linear Methods
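The defining equation is an image on the original slide; for linear methods the approximate value is the inner product of the parameter vector with a feature vector for the state:

```latex
V_t(s) \;=\; \vec{\theta}_t^{\top}\,\vec{\phi}_s \;=\; \sum_{i=1}^{n} \theta_t(i)\,\phi_s(i)
```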
Slide 8: Nice Properties of Linear FA Methods
- The gradient is very simple: for a linear approximator it is just the feature vector φ_s.
- For MSE, the error surface is simple: a quadratic surface with a single minimum.
- Linear gradient-descent TD(λ) converges:
  - step size decreases appropriately
  - on-line sampling (states sampled from the on-policy distribution)
  - converges to a parameter vector whose error is within a factor (1 − γλ)/(1 − γ) of that of the best parameter vector (Tsitsiklis & Van Roy, 1997)
Slide 9: Coarse Coding

Slide 10: Shaping Generalization in Coarse Coding

Slide 11: Learning and Coarse Coding
Slide 12: Tile Coding
- Binary feature for each tile
- Number of features present at any one time is constant
- Binary features mean the weighted sum is easy to compute
- Easy to compute the indices of the features present (see the sketch below)
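The tiling diagrams on this slide are images. A minimal Python sketch of the idea, assuming a 2-D continuous state and uniform grid tilings offset from one another (the names and constants here are illustrative, not from the slide):

```python
def active_tiles(x, y, n_tilings=8, tiles_per_dim=10, low=0.0, high=1.0):
    """Indices of the one active tile in each of n_tilings offset grid tilings.

    For a 2-D state (x, y) in [low, high]^2, each tiling is a uniform grid
    shifted by a fraction of a tile width, so exactly n_tilings binary
    features are active for any state.
    """
    width = (high - low) / tiles_per_dim            # width of one tile
    indices = []
    for t in range(n_tilings):
        offset = t * width / n_tilings              # each tiling shifted slightly
        col = min(int((x - low + offset) / width), tiles_per_dim)
        row = min(int((y - low + offset) / width), tiles_per_dim)
        tiles_per_tiling = (tiles_per_dim + 1) ** 2  # shifted grids need one extra row/col
        indices.append(t * tiles_per_tiling + row * (tiles_per_dim + 1) + col)
    return indices

# With binary features, the approximate value is just the sum of the weights
# of the active tiles: V(s) = sum(theta[i] for i in active_tiles(x, y)).
```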
Slide 13: Tile Coding, Cont.
- Irregular tilings
- Hashing
- CMAC: "Cerebellar Model Arithmetic Computer" (Albus, 1971)
Slide 14: Radial Basis Functions (RBFs)
e.g., Gaussians
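The feature definition is an image on the original slide; the Gaussian RBF form from the book, with prototype center c_i and width σ_i, is presumably what it shows:

```latex
\phi_s(i) \;=\; \exp\!\left( -\,\frac{\lVert s - c_i \rVert^{2}}{2\,\sigma_i^{2}} \right)
```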
Slide 15: Can You Beat the "Curse of Dimensionality"?
- Can you keep the number of features from going up exponentially with the dimension?
- Function complexity, not dimensionality, is the problem.
- Kanerva coding (see the sketch below):
  - select a set of binary prototypes
  - use Hamming distance as the distance measure
  - dimensionality is no longer a problem, only complexity
- "Lazy learning" schemes:
  - remember all the data
  - to get a new value, find the nearest neighbors and interpolate
  - e.g., locally weighted regression
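A minimal Python sketch of Kanerva coding under the assumption that states are binary vectors: a feature is active when the state lies within a Hamming-distance threshold of its prototype. The prototype count, bit width, and threshold are illustrative choices, not values from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_prototypes(n_prototypes, n_bits):
    """Random binary prototype vectors."""
    return rng.integers(0, 2, size=(n_prototypes, n_bits))

def kanerva_features(state_bits, prototypes, threshold):
    """Binary feature vector: feature i is 1 iff the state is within
    `threshold` Hamming distance of prototype i."""
    hamming = np.sum(prototypes != state_bits, axis=1)
    return (hamming <= threshold).astype(float)

# Example: 1000 prototypes over a 50-bit state description.
protos = make_prototypes(1000, 50)
s = rng.integers(0, 2, size=50)
phi = kanerva_features(s, protos, threshold=12)
# The approximate value is then a linear combination: V(s) = theta @ phi.
```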
Slide 16: Control with FA
- Learning state-action values
- Training examples of the form: {description of (s_t, a_t), v_t}
- The general gradient-descent rule carries over with Q in place of V
- Gradient-descent Sarsa(λ), backward view (see the sketch below)
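The algorithm box is an image on the original slide. Below is a minimal Python sketch of gradient-descent (linear) Sarsa(λ) with binary features, replacing traces, and ε-greedy action selection; `active_features(s, a)` (returning the indices of active features for a state-action pair) and `env` with `reset`/`step` are assumed interfaces, not defined on the slide:

```python
import numpy as np

def sarsa_lambda(env, active_features, n_features, n_actions,
                 alpha=0.1, gamma=1.0, lam=0.9, epsilon=0.1, episodes=200):
    """Gradient-descent Sarsa(lambda) with linear FA and replacing traces."""
    theta = np.zeros(n_features)

    def q(s, a):
        return theta[active_features(s, a)].sum()

    def choose(s):
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax([q(s, a) for a in range(n_actions)]))

    for _ in range(episodes):
        e = np.zeros(n_features)
        s = env.reset()
        a = choose(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            delta = r - q(s, a)                      # TD error, part 1
            e[active_features(s, a)] = 1.0           # replacing traces
            if not done:
                a_next = choose(s_next)
                delta += gamma * q(s_next, a_next)   # TD error, part 2
            theta += alpha * delta * e               # gradient-descent update
            e *= gamma * lam                         # decay traces
            if not done:
                s, a = s_next, a_next
    return theta
```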
Slide 17: GPI with Linear Gradient-Descent Watkins' Q(λ)

Slide 18: GPI with Linear Gradient-Descent Sarsa(λ)
Slide 19: Mountain-Car Task
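The task description and figure are images on the original slide. A minimal Python sketch of the mountain-car dynamics as commonly specified in the book (reward −1 per step, actions are full throttle forward, full throttle reverse, or zero throttle); the exact constants on the slide may differ from the ones assumed here:

```python
import math

X_MIN, X_MAX = -1.2, 0.5           # position bounds; the goal is reaching X_MAX
V_MIN, V_MAX = -0.07, 0.07         # velocity bounds

def mountain_car_step(x, v, action):
    """One step of the mountain-car dynamics.

    action is -1 (reverse), 0 (coast), or +1 (forward).
    Returns (x, v, reward, done); reward is -1 on every step until the goal.
    """
    v = v + 0.001 * action - 0.0025 * math.cos(3 * x)   # throttle plus gravity along the slope
    v = max(V_MIN, min(V_MAX, v))
    x = x + v
    if x < X_MIN:                   # hit the left wall: stop the car
        x, v = X_MIN, 0.0
    done = x >= X_MAX
    return x, v, -1.0, done
```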
Slide 20: Mountain Car with Radial Basis Functions

Slide 21: Mountain-Car Results

Slide 22: Baird's Counterexample

Slide 23: Baird's Counterexample, Cont.

Slide 24: Should We Bootstrap?
Slide 25: Summary
- Generalization
- Adapting supervised-learning function approximation methods
- Gradient-descent methods
- Linear gradient-descent methods:
  - radial basis functions
  - tile coding
  - Kanerva coding
- Nonlinear gradient-descent methods? Backpropagation?
- Subtleties involving function approximation, bootstrapping, and the on-policy/off-policy distinction
Slide 26: Value Prediction with FA
- As usual, policy evaluation (the prediction problem): for a given policy π, compute the state-value function V^π.
- In earlier chapters, value functions were stored in lookup tables.
Slide 27: Adapt Supervised Learning Algorithms
- A supervised learning system maps inputs to outputs.
- Training info = desired (target) outputs
- Error = (target output − actual output)
- Training example = {input, target output}
Slide 28: Backups as Training Examples
Each backup can be treated as a training example: the input is a description of the state, and the target output is the backed-up value.
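The example on the slide is an image; for a one-step TD backup it presumably reads:

```latex
\underbrace{\text{description of } s_t}_{\text{input}}
\;\longmapsto\;
\underbrace{r_{t+1} + \gamma\, V_t(s_{t+1})}_{\text{target output}}
```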
Slide 29: Gradient Descent, Cont.
For the MSE given above and using the chain rule:
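The derivation is an image on the original slide; reconstructed from the MSE definition on slide 4, the chain-rule step gives:

```latex
\vec{\theta}_{t+1}
\;=\; \vec{\theta}_t \;-\; \tfrac{1}{2}\,\alpha\,\nabla_{\vec{\theta}_t}
  \sum_{s \in S} P(s)\,\bigl[V^{\pi}(s) - V_t(s)\bigr]^{2}
\;=\; \vec{\theta}_t \;+\; \alpha \sum_{s \in S} P(s)\,\bigl[V^{\pi}(s) - V_t(s)\bigr]\,\nabla_{\vec{\theta}_t} V_t(s)
```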
Slide 30: Gradient Descent, Cont.
- Use just the sample gradient instead.
- Since each sample gradient is an unbiased estimate of the true gradient, this converges to a local minimum of the MSE if the step size α decreases appropriately with t.
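The sample-gradient update itself is an image; for the state s_t visited at time t it presumably reads:

```latex
\vec{\theta}_{t+1}
\;=\; \vec{\theta}_t \;-\; \tfrac{1}{2}\,\alpha\,\nabla_{\vec{\theta}_t}\bigl[V^{\pi}(s_t) - V_t(s_t)\bigr]^{2}
\;=\; \vec{\theta}_t \;+\; \alpha\,\bigl[V^{\pi}(s_t) - V_t(s_t)\bigr]\,\nabla_{\vec{\theta}_t} V_t(s_t)
```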
Slide 31: But We Don't Have These Targets
Slide 32: What About TD(λ) Targets?
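The target definition is an image on the original slide; the λ-return used as the TD(λ) target in the book, built from n-step returns, is presumably what it shows (notation follows the first edition):

```latex
R_t^{\lambda} \;=\; (1-\lambda) \sum_{n=1}^{\infty} \lambda^{\,n-1} R_t^{(n)},
\qquad
R_t^{(n)} \;=\; r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{\,n-1} r_{t+n} + \gamma^{\,n}\, V_t(s_{t+n})
```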