Chapter 8: Generalization and Function Approximation


Chapter 8: Generalization and Function Approximation
Based on R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction.
Objectives of this chapter:
- Look at how experience with a limited part of the state set can be used to produce good behavior over a much larger part.
- Give an overview of function approximation (FA) methods and how they can be adapted to RL.

Value Prediction with FA
As usual, we start with policy evaluation (the prediction problem): for a given policy π, compute the state-value function V^π. In earlier chapters, value functions were stored in lookup tables; here they are represented by a parameterized function approximator.
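The equation on this slide was an image; the standard setup from Sutton and Barto is that the estimated value function V_t is a parameterized functional form with parameter vector θ_t:

$$ V_t \approx V^\pi, \qquad \vec\theta_t = \big(\theta_t(1), \theta_t(2), \ldots, \theta_t(n)\big)^{\top}, \qquad n \ll |S| $$

Changing one component of θ_t changes the estimated value of many states at once; this is what makes generalization possible.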

Adapt Supervised Learning Algorithms
Inputs flow into a supervised learning system, which produces outputs; the training info consists of the desired (target) outputs.
Training example = {input, target output}
Error = (target output − actual output)

Backups as Training Examples
Each backup can be treated as a training example: the input is (a description of) the state, and the target output is the backed-up value for that state.
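For example, a TD(0) backup (the equation image is missing from the transcript; this is the standard form) yields the training pair:

$$ \text{input} = s_t, \qquad \text{target output} = r_{t+1} + \gamma\, V_t(s_{t+1}) $$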

Any FA Method?
In principle, yes:
- artificial neural networks
- decision trees
- multivariate regression methods
- etc.
But RL has some special requirements:
- we usually want to learn while interacting (on-line, incremental updates)
- the method must handle nonstationarity (the targets change as the policy and the value estimates change)
- others?

Gradient Descent Methods
Assume V_t is a (sufficiently smooth) differentiable function of a parameter vector, for all s ∈ S:

$$ \vec\theta_t = \big(\theta_t(1), \theta_t(2), \ldots, \theta_t(n)\big)^{\top} $$

(a column vector; the superscript T denotes transpose). Assume, for now, training examples of the form (description of s_t, V^π(s_t)).

Performance Measures
Many performance measures are applicable, but a common and simple one is the mean-squared error (MSE) over a distribution P.
Why P? Why minimize MSE? Let us assume that P is always the distribution of states at which backups are done.
The on-policy distribution is the distribution created while following the policy being evaluated; stronger results are available for this distribution.
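The MSE formula on the slide was an image; the standard definition from Sutton and Barto is:

$$ \mathrm{MSE}(\vec\theta_t) = \sum_{s \in S} P(s)\,\big[ V^\pi(s) - V_t(s) \big]^2 $$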

Gradient Descent
Iteratively move down the gradient:
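The update shown on the slide, reconstructed in the standard notation, takes a small step against the gradient of the performance measure f being minimized:

$$ \vec\theta_{t+1} = \vec\theta_t - \alpha\, \nabla_{\vec\theta_t} f(\vec\theta_t), \qquad \nabla_{\vec\theta_t} f(\vec\theta_t) = \left( \frac{\partial f(\vec\theta_t)}{\partial \theta_t(1)}, \ldots, \frac{\partial f(\vec\theta_t)}{\partial \theta_t(n)} \right)^{\!\top} $$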

Gradient Descent, Cont.
For the MSE given above and using the chain rule:
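Reconstructing the missing equation: taking a step of size −(1/2)α along the gradient of the MSE gives

$$ \vec\theta_{t+1} = \vec\theta_t - \tfrac{1}{2}\alpha\, \nabla_{\vec\theta_t} \mathrm{MSE}(\vec\theta_t) = \vec\theta_t + \alpha \sum_{s \in S} P(s)\,\big[ V^\pi(s) - V_t(s) \big]\, \nabla_{\vec\theta_t} V_t(s) $$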

Gradient Descent, Cont.
Use just the sample gradient instead. Since each sample gradient is an unbiased estimate of the true gradient, this converges to a local minimum of the MSE if the step size α decreases appropriately with t.
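The per-sample update, reconstructed in the standard notation, drops the expectation over P and uses only the state actually visited:

$$ \vec\theta_{t+1} = \vec\theta_t + \alpha\, \big[ V^\pi(s_t) - V_t(s_t) \big]\, \nabla_{\vec\theta_t} V_t(s_t) $$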

But We Don't Have These Targets
In RL we do not know V^π(s_t), so we substitute an estimate v_t as the target. If v_t is an unbiased estimate of V^π(s_t), as the Monte Carlo return v_t = R_t is, then θ_t still converges to a local minimum under the usual step-size conditions.

What About TD(λ) Targets?
For λ < 1, the λ-return R_t^λ is not an unbiased estimate of V^π(s_t), because bootstrapping makes the target depend on the current (still inaccurate) parameter values. Such updates are therefore not true gradient descent, but they work well in practice.
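The slide's update, reconstructed in the standard forward-view notation, uses the λ-return as the target:

$$ \vec\theta_{t+1} = \vec\theta_t + \alpha\, \big[ R_t^{\lambda} - V_t(s_t) \big]\, \nabla_{\vec\theta_t} V_t(s_t) $$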

On-Line Gradient-Descent TD(λ)
(The algorithm is sketched below.)
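The algorithm box on this slide was an image; the following is a minimal Python sketch of the backward-view algorithm it describes. The environment interface (env.reset, env.step) and the feature function phi are assumptions for illustration, not part of the original slide.

```python
import numpy as np

def gradient_td_lambda(env, phi, n_features, policy,
                       alpha=0.01, gamma=0.99, lam=0.8, n_episodes=100):
    """On-line gradient-descent TD(lambda), backward view.

    phi(s) returns an n_features-long feature vector; for this linear
    approximator, grad_theta V(s) = phi(s).
    """
    theta = np.zeros(n_features)              # parameter vector
    for _ in range(n_episodes):
        s = env.reset()
        e = np.zeros(n_features)              # eligibility traces, one per parameter
        done = False
        while not done:
            s_next, r, done = env.step(policy(s))
            v = theta @ phi(s)
            v_next = 0.0 if done else theta @ phi(s_next)
            delta = r + gamma * v_next - v    # TD error
            e = gamma * lam * e + phi(s)      # decay traces, add current gradient
            theta += alpha * delta * e        # gradient-descent step
            s = s_next
    return theta
```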

Linear Methods
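The equations on this slide were images; in the standard notation, linear methods represent the value as an inner product between the parameter vector and a feature vector φ_s for each state:

$$ V_t(s) = \vec\theta_t^{\,\top} \vec\phi_s = \sum_{i=1}^{n} \theta_t(i)\, \phi_s(i) $$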

Nice Properties of Linear FA Methods
- The gradient is very simple: ∇_θ V_t(s) = φ_s.
- For MSE, the error surface is simple: a quadratic surface with a single minimum.
- Linear gradient-descent TD(λ) converges if the step size decreases appropriately and states are sampled on-line from the on-policy distribution.
- It converges to a parameter vector θ_∞ close to the best possible one (Tsitsiklis and Van Roy, 1997); see the bound below.
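The error bound from the cited result, reconstructed from the missing equation:

$$ \mathrm{MSE}(\vec\theta_\infty) \le \frac{1 - \gamma\lambda}{1 - \gamma}\, \mathrm{MSE}(\vec\theta^{\,*}) $$

where θ* is the best parameter vector, i.e., the minimizer of the MSE.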

Coarse Coding
Represent a state by a set of overlapping receptive fields (e.g., circles in state space). A state activates every binary feature whose field contains it, and the sizes and shapes of the fields determine how learning generalizes across states.

Learning and Coarse Coding
The width of the receptive fields strongly affects generalization early in learning: broad fields give broad generalization. The asymptotic quality of the learned function, however, is determined mostly by the number and density of the features, not their width.

Tile Coding
- One binary feature for each tile
- The number of features present at any one time is constant (one per tiling)
- Binary features make the weighted sum easy to compute
- It is easy to compute the indices of the features present
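A minimal Python sketch of tile coding for a two-dimensional state, assuming uniformly offset grid tilings; the tiling counts, ranges, and offset scheme are illustrative choices, not values from the slide.

```python
import numpy as np

def tile_features(x, y, n_tilings=8, tiles_per_dim=10,
                  x_range=(0.0, 1.0), y_range=(0.0, 1.0)):
    """Return the indices of the active tiles for the point (x, y).

    Each tiling is a uniform grid shifted by a fraction of a tile width,
    so exactly n_tilings binary features are active for any input.
    """
    active = []
    x_scale = tiles_per_dim / (x_range[1] - x_range[0])
    y_scale = tiles_per_dim / (y_range[1] - y_range[0])
    for t in range(n_tilings):
        offset = t / n_tilings                      # per-tiling grid shift
        xi = int((x - x_range[0]) * x_scale + offset) % tiles_per_dim
        yi = int((y - y_range[0]) * y_scale + offset) % tiles_per_dim
        active.append(t * tiles_per_dim ** 2 + yi * tiles_per_dim + xi)
    return active                                   # one index per tiling

# The approximate value is then just a sum of the active weights:
#   V(s) = sum(theta[i] for i in tile_features(*s))
```

The modulo wraps indices at the upper boundary; a production implementation would pad each tiling instead.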

Tile Coding, Cont.
- Irregular tilings are also possible (stripes, diagonal stripes, arbitrary partitions)
- Hashing can collapse a large tiling into a much smaller feature set, exploiting the fact that only a small fraction of the states is ever visited
- Tile coding is also known as CMAC, the “cerebellar model arithmetic computer” (Albus, 1971)

Radial Basis Functions (RBFs)
RBF features are real-valued rather than binary: each feature's value varies smoothly with the distance between the state and the feature's prototypical center, e.g., Gaussians.
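The Gaussian form the slide refers to, reconstructed in the standard notation, with center c_i and width σ_i:

$$ \phi_s(i) = \exp\!\left( - \frac{\lVert s - c_i \rVert^2}{2\,\sigma_i^2} \right) $$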

Can You Beat the “Curse of Dimensionality”?
Can you keep the number of features from going up exponentially with the dimension? Function complexity, not dimensionality, is the real problem.
Kanerva coding:
- select a set of binary prototypes
- use Hamming distance as the distance measure
- dimensionality is then no longer a problem, only complexity
“Lazy learning” schemes:
- remember all the data
- to get a new value, find the nearest neighbors and interpolate
- e.g., locally weighted regression
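A minimal sketch of Kanerva coding, under these assumptions: states are binary vectors, and a feature is active when the state lies within a fixed Hamming radius of its prototype. The prototype count and radius below are illustrative, not values from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_prototypes(n_prototypes, n_bits):
    """Random binary prototypes; each one defines a single feature."""
    return rng.integers(0, 2, size=(n_prototypes, n_bits))

def kanerva_features(state, prototypes, radius):
    """Binary feature vector: feature i is 1 iff the Hamming distance
    from state to prototype i is at most radius."""
    dists = np.sum(prototypes != state, axis=1)   # Hamming distances
    return (dists <= radius).astype(float)

# The feature count equals n_prototypes, independent of n_bits, so it
# does not grow exponentially with the dimension of the state.
prototypes = make_prototypes(n_prototypes=50, n_bits=100)
state = rng.integers(0, 2, size=100)
phi = kanerva_features(state, prototypes, radius=45)
```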

Control with FA
We now learn state-action values, from training examples of the form (description of (s_t, a_t), target v_t). The general gradient-descent rule and gradient-descent Sarsa(λ) (backward view) follow the same pattern as for state values; the reconstructed updates appear below.
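The update equations on this slide were images; in the standard notation they are, for the general rule,

$$ \vec\theta_{t+1} = \vec\theta_t + \alpha\, \big[ v_t - Q_t(s_t, a_t) \big]\, \nabla_{\vec\theta_t} Q_t(s_t, a_t) $$

and, for the backward view of gradient-descent Sarsa(λ),

$$ \delta_t = r_{t+1} + \gamma\, Q_t(s_{t+1}, a_{t+1}) - Q_t(s_t, a_t), \qquad \vec e_t = \gamma\lambda\, \vec e_{t-1} + \nabla_{\vec\theta_t} Q_t(s_t, a_t), \qquad \vec\theta_{t+1} = \vec\theta_t + \alpha\, \delta_t\, \vec e_t $$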

GPI with Linear Gradient-Descent Sarsa(λ)
(The algorithm is sketched below.)
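The algorithm box on this slide was an image; here is a minimal Python sketch of GPI with linear gradient-descent Sarsa(λ), using sparse binary features (e.g., from tile coding) and an ε-greedy policy. The feature function features(s, a) and the environment interface are assumptions for illustration.

```python
import numpy as np

def linear_sarsa_lambda(env, features, n_features, actions,
                        alpha=0.1, gamma=1.0, lam=0.9, epsilon=0.1,
                        n_episodes=500):
    """GPI with linear gradient-descent Sarsa(lambda), accumulating traces.

    features(s, a) returns the indices of the active binary features, so
    Q(s, a) is a sum of weights and the gradient is the binary feature
    vector itself.
    """
    theta = np.zeros(n_features)

    def q(s, a):
        return theta[features(s, a)].sum()

    def eps_greedy(s):
        if np.random.random() < epsilon:
            return np.random.choice(actions)
        return max(actions, key=lambda a: q(s, a))

    for _ in range(n_episodes):
        e = np.zeros(n_features)        # eligibility traces
        s = env.reset()
        a = eps_greedy(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            delta = r - q(s, a)
            e[features(s, a)] += 1.0    # accumulate traces for active features
            if done:
                theta += alpha * delta * e
                break
            a_next = eps_greedy(s_next)
            delta += gamma * q(s_next, a_next)
            theta += alpha * delta * e
            e *= gamma * lam            # decay all traces
            s, a = s_next, a_next
        # Policy improvement is implicit: eps_greedy always uses the
        # latest theta, so evaluation and improvement are interleaved.
    return theta
```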

GPI with Linear Gradient-Descent Watkins' Q(λ)
Same structure as Sarsa(λ), except that the eligibility traces are cut to zero whenever an exploratory (non-greedy) action is taken, because the return being estimated is that of the greedy policy.

Mountain-Car Task
An underpowered car must drive up a steep mountain road. Gravity is stronger than the car's engine, so the car must first back away from the goal to gain enough momentum. The reward is −1 per time step until the car reaches the goal; the state is (position, velocity), and the three actions are full throttle forward, full throttle reverse, and zero throttle.

Mountain-Car Results
The plots on this slide showed the learned cost-to-go function, −max_a Q_t(s, a), over the (position, velocity) space at various stages of learning with tile coding and Sarsa(λ).

Baird's Counterexample
A small Markov process showing that combining bootstrapping, function approximation, and a mismatched (off-policy) distribution of updates can make the parameters diverge to infinity, even though the true value function is exactly representable by the linear approximator.

Baird's Counterexample, Cont.
Divergence occurs even with full DP-style expected backups, so sampling error is not the cause; the problem is that the distribution of updates does not match the on-policy distribution.

Should We Bootstrap?
Empirically, intermediate amounts of bootstrapping (λ between 0 and 1) usually perform much better than pure Monte Carlo methods (λ = 1), even though the theoretical guarantees for bootstrapping with function approximation are weaker.

Summary
- Generalization: adapting supervised-learning function approximation methods
- Gradient-descent methods
- Linear gradient-descent methods: radial basis functions, tile coding, Kanerva coding
- Nonlinear gradient-descent methods? Backpropagation?
- Subtleties involving function approximation, bootstrapping, and the on-policy/off-policy distinction