Slides from: Doug Gray, David Poole

Insight: Steal from Existing Supervised Learning Methods! Training = {X, Y}; Error = target output − actual output.

TD Backups as Training Examples
Recall the TD(0) backup: V(s_t) ← V(s_t) + α [r_{t+1} + γ V(s_{t+1}) − V(s_t)]
As a training example:
– Input = features of s_t
– Target output = r_{t+1} + γ V(s_{t+1})
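As a concrete illustration, here is a minimal sketch (not the course's Pac-Man code; the class name, discount factor, feature values, and numbers are all made up) of turning one observed transition into a supervised (input, target) pair:

```java
import java.util.Arrays;

// Minimal sketch: turn one TD(0) backup into a supervised training example
// that any regressor could consume. All names and numbers are illustrative.
public class TdTrainingExample {
    static final double GAMMA = 0.9;   // assumed discount factor

    // Target output for state s_t: r_{t+1} + gamma * V(s_{t+1})
    static double tdTarget(double reward, double valueOfNextState) {
        return reward + GAMMA * valueOfNextState;
    }

    public static void main(String[] args) {
        // One observed transition (s_t, r_{t+1}, s_{t+1}); the numbers are made up.
        double[] featuresOfCurrentState = {1.0, 0.5, -2.0};  // input x = features of s_t
        double reward = 1.0;                                  // r_{t+1}
        double valueOfNextState = 4.0;                        // current estimate V(s_{t+1})

        double target = tdTarget(reward, valueOfNextState);  // y = 1.0 + 0.9 * 4.0 = 4.6
        System.out.printf("x = %s, y = %.2f%n",
                Arrays.toString(featuresOfCurrentState), target);
        // The pair (x, y) can now be handed to any supervised regression method.
    }
}
```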

What FA methods can we use? In principle, anything!
– Neural networks
– Decision trees
– Multivariate regression
– Support Vector Machines
– Gaussian Processes
– Etc.
But, we normally want to:
– Learn while interacting
– Handle nonstationarity
– Not take “too long” or use “too much” memory
– Etc.

Perceptron
Binary, linear classifier (Rosenblatt, 1957). The eventual failure of the perceptron to do “everything” shifted the field of AI towards symbolic representations.
Sum = w_1 x_1 + w_2 x_2 + … + w_n x_n
Output is +1 if Sum > 0, −1 otherwise
Update rule: w_j = w_j + (target − output) x_j
Also, can use x_0 = 1, so that w_0 acts as a bias.
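A minimal sketch of the perceptron as described on the slide, using exactly the slide's update rule (no separate learning rate) with x_0 = 1 as the bias input; the tiny AND dataset is an illustrative assumption:

```java
// Minimal perceptron sketch: output = sign(w . x), update w_j += (target - output) * x_j.
// The four-example AND dataset below is made up for illustration.
public class Perceptron {
    static int output(double[] w, double[] x) {
        double sum = 0.0;
        for (int j = 0; j < w.length; j++) sum += w[j] * x[j];
        return sum > 0 ? +1 : -1;
    }

    public static void main(String[] args) {
        // Inputs with x[0] = 1 (bias), labels in {-1, +1} (logical AND).
        double[][] xs = {{1, 0, 0}, {1, 0, 1}, {1, 1, 0}, {1, 1, 1}};
        int[] targets = {-1, -1, -1, +1};
        double[] w = new double[3];                  // weights start at zero

        for (int epoch = 0; epoch < 20; epoch++) {
            for (int i = 0; i < xs.length; i++) {
                int out = output(w, xs[i]);
                for (int j = 0; j < w.length; j++) {
                    w[j] += (targets[i] - out) * xs[i][j];   // slide's update rule
                }
            }
        }
        for (int i = 0; i < xs.length; i++) {
            System.out.printf("x=(%.0f,%.0f) -> %d%n", xs[i][1], xs[i][2], output(w, xs[i]));
        }
    }
}
```

Since AND is linearly separable, the loop settles on weights that classify all four points correctly; on a non-separable problem (e.g. XOR) it would cycle forever, which is the limitation the slide alludes to.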

Perceptron
Consider a perceptron with 3 weights: x, y, and a bias.
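For a concrete (hypothetical) choice of weights, say $w_x = 1$, $w_y = 1$, and bias $w_0 = -1$, the perceptron outputs +1 exactly when

$$w_x x + w_y y + w_0 > 0 \quad\Longleftrightarrow\quad x + y > 1,$$

so the decision boundary is the line $x + y = 1$ in the plane: points above the line are classified +1, points below it −1.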

Spatial-based Perceptron Weights

Gradient Descent
w = (w_1, w_2, …, w_n)^T
Assume V_t(s) is a sufficiently smooth, differentiable function of w for all states s in S.
Also, assume that training examples are of the form: features of s_t → V^π(s_t)
Goal: minimize error on the observed samples.

w_{t+1} = w_t + α [V^π(s_t) − V_t(s_t)] ∇_{w_t} V_t(s_t)
Here ∇_{w_t} V_t(s_t) is the vector of partial derivatives of V_t(s_t) with respect to the weights.
Recall… what is the gradient?
– 1D: the derivative
– Field (function defined on a multi-dimensional domain): gradient of a scalar field, gradient of a vector field, divergence of a vector field, or curl of a vector field

[Figure: a scalar field and its gradient]

Let J(w) be any function of the weight space. The gradient at any point w_t in this space is:
∇_w J(w_t) = ( ∂J(w_t)/∂w_1, ∂J(w_t)/∂w_2, …, ∂J(w_t)/∂w_n )^T
Then, to iteratively move down the gradient:
w_{t+1} = w_t − α ∇_w J(w_t)
Why are we still doing this iteratively? If you could just eliminate the error in one step, why could that be a bad idea?
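A minimal sketch of the iterative scheme on an assumed toy objective (the quadratic J, its closed-form gradient, and the step size below are all made up for the example):

```java
// Minimal gradient-descent sketch on an assumed objective
// J(w) = (w1 - 3)^2 + (w2 + 1)^2, whose gradient is known in closed form.
public class GradientDescent {
    static double[] gradient(double[] w) {                  // grad J(w)
        return new double[]{2 * (w[0] - 3), 2 * (w[1] + 1)};
    }

    public static void main(String[] args) {
        double[] w = {0.0, 0.0};
        double alpha = 0.1;                                  // step size
        for (int t = 0; t < 100; t++) {
            double[] g = gradient(w);
            for (int j = 0; j < w.length; j++) {
                w[j] -= alpha * g[j];                        // w_{t+1} = w_t - alpha * grad J(w_t)
            }
        }
        System.out.printf("w = (%.3f, %.3f), minimum is at (3, -1)%n", w[0], w[1]);
    }
}
```

One common answer to the slide's question: each sample target is noisy, so jumping all the way to zero error on a single example chases that noise; small iterative steps spread the correction over many samples.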

A common goal is to minimize the mean-squared error (MSE) over a distribution d:
MSE(w_t) = Σ_{s∈S} d(s) [V^π(s) − V_t(s)]^2
Why does this make any sense? d is the distribution of states receiving backups (the on- or off-policy distribution).
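A minimal sketch of that objective for a small finite state set; the states, the distribution d, the true values V^π, and the estimates V_t are all made-up numbers:

```java
// Compute the MSE weighted by a state distribution d, as on the slide.
public class WeightedMse {
    static double mse(double[] d, double[] vPi, double[] vt) {
        double err = 0.0;
        for (int s = 0; s < d.length; s++) {
            double diff = vPi[s] - vt[s];
            err += d[s] * diff * diff;           // d(s) * [V^pi(s) - V_t(s)]^2
        }
        return err;
    }

    public static void main(String[] args) {
        double[] d   = {0.5, 0.3, 0.2};          // distribution of states receiving backups
        double[] vPi = {1.0, 2.0, 4.0};          // true values under the policy
        double[] vt  = {0.8, 2.5, 3.0};          // current approximations
        System.out.printf("MSE = %.3f%n", mse(d, vPi, vt));
        // = 0.5*0.04 + 0.3*0.25 + 0.2*1.0 = 0.295
    }
}
```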

Gradient Descent
Each sample gradient is an unbiased estimate of the true gradient.
This will converge to a local minimum of the MSE if α decreases “appropriately” over time.
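Here "appropriately" is usually taken to mean the standard stochastic-approximation (Robbins–Monro) conditions on the step-size sequence:

$$\sum_{t=1}^{\infty} \alpha_t = \infty, \qquad \sum_{t=1}^{\infty} \alpha_t^2 < \infty,$$

satisfied, for example, by $\alpha_t = 1/t$.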

Unfortunately, we don't actually have V^π(s). Instead, we just have an estimate of the target, V_t. If V_t is an unbiased estimate of V^π(s_t) (for example, a Monte Carlo return), then we'll still converge to a local minimum (again, with the α caveat).

The TD(λ) update with function approximation is θ_{t+1} = θ_t + α δ_t e_t, where:
– δ_t is our normal TD error
– e_t is the vector of eligibility traces (e_t = γλ e_{t−1} + ∇_θ V_t(s_t))
– θ is the weight vector
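A minimal sketch of this update for linear features with accumulating traces; the 3-state chain MDP, the one-hot features, the rewards, and the constants are all illustrative assumptions (this is not the course's Pac-Man code):

```java
// Linear TD(lambda) policy evaluation sketch: theta += alpha * delta * e,
// with accumulating traces e = gamma*lambda*e + phi(s). Chain MDP is made up.
public class LinearTdLambda {
    public static void main(String[] args) {
        double alpha = 0.1, gamma = 0.9, lambda = 0.8;
        // One-hot features for states 0, 1, 2; state 2 is terminal.
        double[][] phi = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
        double[] theta = new double[3];                    // weight vector

        for (int episode = 0; episode < 500; episode++) {
            double[] e = new double[3];                    // traces reset each episode
            int s = 0;
            while (s != 2) {
                int sNext = s + 1;                         // fixed policy: always move right
                double reward = (sNext == 2) ? 1.0 : 0.0;
                double vS = dot(theta, phi[s]);
                double vNext = (sNext == 2) ? 0.0 : dot(theta, phi[sNext]);
                double delta = reward + gamma * vNext - vS;     // TD error
                for (int j = 0; j < 3; j++) {
                    e[j] = gamma * lambda * e[j] + phi[s][j];   // gradient of linear V is phi
                    theta[j] += alpha * delta * e[j];           // weight update
                }
                s = sNext;
            }
        }
        // True values under gamma = 0.9: V(0) = 0.9, V(1) = 1.0
        System.out.printf("V(0)=%.2f V(1)=%.2f%n", dot(theta, phi[0]), dot(theta, phi[1]));
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }
}
```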

Discussion: Pac-Man (SarsaPacMan.java, QFunction.java)

Note that TD(λ) targets are biased: they bootstrap from our current, imperfect estimate V_t. But… we do it anyway.

Linear Methods
Why are these a particularly important type of function approximation?
– Parameter vector θ_t
– Column vector of features φ_s for every state (same number of components)
– V_t(s) = θ_t^T φ_s = Σ_i θ_t(i) φ_s(i)

Linear Methods
The gradient is simple: ∇_{θ_t} V_t(s) = φ_s
The error surface for MSE is simple (quadratic, with a single minimum)
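A minimal sketch of why the linear case is so convenient: V(s) = θ · φ(s), so the per-sample gradient step is just θ += α (target − V(s)) φ(s). The features, targets, and step size below are made up, chosen so the data is exactly fit by θ = (1, 1):

```java
// Linear value function sketch: gradient w.r.t. theta is the feature vector itself.
public class LinearValueFunction {
    static double v(double[] theta, double[] phi) {
        double s = 0;
        for (int i = 0; i < theta.length; i++) s += theta[i] * phi[i];
        return s;
    }

    public static void main(String[] args) {
        double[] theta = new double[2];
        double alpha = 0.05;
        // Hypothetical training pairs: (features of s, target value of s).
        double[][] phis = {{1.0, 0.0}, {1.0, 1.0}, {1.0, 2.0}};
        double[] targets = {1.0, 2.0, 3.0};

        for (int sweep = 0; sweep < 2000; sweep++) {
            for (int k = 0; k < phis.length; k++) {
                double error = targets[k] - v(theta, phis[k]);
                for (int i = 0; i < theta.length; i++) {
                    theta[i] += alpha * error * phis[k][i];   // gradient of V is phi
                }
            }
        }
        System.out.printf("theta = (%.2f, %.2f)%n", theta[0], theta[1]);  // ~ (1.00, 1.00)
    }
}
```

Because the objective is quadratic in θ, there is only one minimum for these steps to find, which is the point the slide is making.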

Neural Networks How do we get around only linear solutions?

Neural Networks
A multi-layer network of linear perceptrons is still linear.
Instead, use non-linear (differentiable) units, e.g. the logistic function σ(x) = 1/(1 + e^{−x}) or tanh.
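A small numerical illustration of the first point: composing two linear layers gives the same map as a single collapsed linear layer, whereas inserting a tanh between them does not. The weight matrices and input are made-up numbers:

```java
// Stacking linear layers y = W2 * (W1 * x) equals a single layer (W2 * W1) * x;
// adding a tanh hidden layer breaks that collapse. All numbers are illustrative.
public class WhyNonlinearity {
    static double[] matVec(double[][] w, double[] x) {
        double[] y = new double[w.length];
        for (int i = 0; i < w.length; i++)
            for (int j = 0; j < x.length; j++) y[i] += w[i][j] * x[j];
        return y;
    }

    static double[] tanh(double[] x) {
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) y[i] = Math.tanh(x[i]);
        return y;
    }

    public static void main(String[] args) {
        double[][] w1 = {{0.5, -1.0}, {2.0, 0.3}};
        double[][] w2 = {{1.0, 0.7}};
        double[] x = {0.2, -0.4};

        double[] twoLinearLayers = matVec(w2, matVec(w1, x));
        double[][] collapsed = {{w2[0][0] * w1[0][0] + w2[0][1] * w1[1][0],
                                 w2[0][0] * w1[0][1] + w2[0][1] * w1[1][1]}};  // W2 * W1
        double[] oneLinearLayer = matVec(collapsed, x);
        double[] withTanh = matVec(w2, tanh(matVec(w1, x)));

        System.out.printf("two linear layers:      %.4f%n", twoLinearLayers[0]);
        System.out.printf("collapsed single layer: %.4f%n", oneLinearLayer[0]);
        System.out.printf("with tanh hidden layer: %.4f%n", withTanh[0]);
    }
}
```

The first two outputs agree exactly (0.6960); the tanh version differs, which is why non-linear units are needed to escape linear solutions.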