Intro to Reinforcement Learning Learning how to get what you want...

Administrivia
HW 2 due today. You've all turned it in, right?
Remember: midterm Thu, Oct 21
Office hours cancelled Oct 20
Reading 2 due Oct 26
Class survey today (anonymous). Help me make the class better!

Reminder: Bayesian parameter estimation
Produces a distribution over params, not a single value
Use Bayes' rule to get there:

p(θ | D) = p(D | θ) p(θ) / p(D)

where:
p(D | θ): generative distribution
p(θ): parameter prior
p(D): model-free data distribution (prior)
p(θ | D): posterior parameter distribution
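The rule above can be sketched numerically. A minimal grid-based example (a hypothetical coin-flip model, not from the lecture, chosen because the likelihood is easy to write down):

```python
import numpy as np

# Hypothetical example: estimate a coin's heads probability theta from
# data D, over a discrete grid of candidate theta values.
thetas = np.linspace(0.01, 0.99, 99)        # candidate parameter values
prior = np.ones_like(thetas) / len(thetas)  # flat parameter prior p(theta)

data = [1, 1, 0, 1]                         # observed flips (1 = heads)
heads, tails = sum(data), len(data) - sum(data)

likelihood = thetas**heads * (1 - thetas)**tails  # p(D | theta) at each grid point
posterior = likelihood * prior
posterior /= posterior.sum()                      # normalize by p(D)

# The result is a distribution over theta, not a single point estimate.
print(thetas[np.argmax(posterior)])  # posterior mode ~ 0.75
```

With a flat prior the posterior mode coincides with the maximum-likelihood estimate (3 heads out of 4 flips); a non-flat prior would pull it away from the data.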

Exercise
Suppose you want to estimate the average airspeed of an unladen (African) swallow.
Let's say that airspeeds of individual swallows, x, are Gaussianly distributed with mean μ and variance 1:
x ~ N(μ, 1)
Let's say, also, that we think the mean is "around" 50 kph, but we're not sure exactly what it is; our uncertainty (variance) is 10 kph²:
μ ~ N(50, 10)
Derive the posterior estimate of the mean airspeed.

Pop Quiz
What happens to the posterior estimate of μ as N → ∞?
Just kidding! Not a real pop quiz for credit this time.
But a question like this could be on the exam.
Be sure you understand how ML and Bayesian parameter estimation work!

Posterior Gaussian Airspeed
By Bayes' rule:
p(μ | x_1, ..., x_N) = p(x_1, ..., x_N | μ) p(μ) / p(x_1, ..., x_N)

Let's look just at the numerator for a second...
p(x_1, ..., x_N | μ) p(μ) ∝ exp( -(1/2) Σ_i (x_i - μ)² ) · exp( -(μ - μ0)² / (2τ²) )
(with μ0 = 50 and τ² = 10 from the prior)

Posterior Gaussian Airspeed
After a little* rearrangement (completing the square)...
*OK, a lot...
...our posterior can be written as a Gaussian itself, with a special mean and variance:
μ | x_1, ..., x_N ~ N(μ_N, σ_N²), where σ_N² = 1 / (N + 1/τ²)

Posterior Gaussian Airspeed
μ_N = [N τ² / (N τ² + 1)] · x̄  +  [1 / (N τ² + 1)] · μ0
where x̄ is the average of the data (data mean) and μ0 is the prior mean:
the posterior mean is a linear combination (weighted avg.) of the two.
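The closed-form posterior can be checked numerically. A small sketch using the exercise's numbers (likelihood variance 1, prior mean 50, prior variance 10); the "true" airspeed of 48 kph is an invented value for illustration:

```python
import numpy as np

def posterior_mean_var(x, mu0=50.0, tau2=10.0, sigma2=1.0):
    """Posterior N(mean, var) for a Gaussian mean with known likelihood
    variance sigma2 and Gaussian prior N(mu0, tau2)."""
    n, xbar = len(x), np.mean(x)
    prec = n / sigma2 + 1.0 / tau2                  # posterior precision
    mean = (n * xbar / sigma2 + mu0 / tau2) / prec  # weighted avg of data mean and prior mean
    return mean, 1.0 / prec

rng = np.random.default_rng(0)
for n in (1, 10, 10_000):
    x = rng.normal(48.0, 1.0, size=n)  # simulated airspeeds, true mean 48
    m, v = posterior_mean_var(x)
    print(n, round(m, 3), round(v, 5))
# As N grows, the posterior mean moves from the prior (50) toward the
# data mean, and the posterior variance shrinks toward 0 -- which also
# answers the pop quiz about N -> infinity.
```

With one data point the prior still pulls the estimate noticeably toward 50; by N = 10,000 the data dominates almost completely.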

Intro to Reinforcement Learning

Meet Mack the Mouse*
Mack lives a hard life as a psychology test subject
Has to run around mazes all day, finding food and avoiding electric shocks
Needs to know how to find cheese quickly, while getting shocked as little as possible
Q: How can Mack learn to find his way around?
* Mickey is still copyrighted

Start with an easy case
V. simple maze:
Whenever Mack goes left, he gets cheese
Whenever he goes right, he gets shocked
After reward/punishment, he's reset back to the start of the maze
Q: How can Mack learn to act well in this world?

Learning in the easy case
Say there are two labels: "cheese" and "shock"
Mack tries a bunch of trials in the world; that generates a bunch of experiences
Now what?
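One simple answer, sketched in code (the experience tuples below are hypothetical, but consistent with the maze just described): estimate P(outcome | action) from the trials by counting. In this deterministic maze the counts are all-or-nothing, but the same estimator would handle a noisy maze too.

```python
from collections import Counter, defaultdict

# Experiences collected from trials: (action taken, outcome observed).
experiences = [("left", "cheese"), ("right", "shock"),
               ("left", "cheese"), ("right", "shock")]

counts = defaultdict(Counter)  # counts[action][outcome]
for action, outcome in experiences:
    counts[action][outcome] += 1

def p_outcome(action, outcome):
    """Empirical estimate of P(outcome | action)."""
    total = sum(counts[action].values())
    return counts[action][outcome] / total

print(p_outcome("left", "cheese"))   # 1.0
print(p_outcome("right", "cheese"))  # 0.0
```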

But what to do?
So we know that Mack can learn a mapping from actions to outcomes
But what should Mack do in any given situation? What action should he take at any given time?
Suppose Mack is the subject of a psychotropic drug study and has actually come to like shocks and hate cheese: how does he act now?

Reward functions
In general, we think of a reward function: R() tells us whether Mack thinks a particular outcome is good or bad
Mack before drugs: R(cheese) = +1, R(shock) = -1
Mack after drugs: R(cheese) = -1, R(shock) = +1
Behavior always depends on rewards (utilities)

Maximizing reward
So Mack wants to get the maximum possible reward
(Whatever that means to him)
For the one-shot case like this, this is fairly easy
Now what about a harder case?
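The one-shot case can be sketched directly (the outcome probabilities below are hypothetical values matching the simple maze): pick the action with the highest expected reward under the learned outcome model and the current reward function.

```python
# Learned P(outcome | action) for the simple maze (assumed deterministic).
p = {"left":  {"cheese": 1.0, "shock": 0.0},
     "right": {"cheese": 0.0, "shock": 1.0}}

def best_action(R):
    """Return the action maximizing expected reward sum_o P(o|a) * R(o)."""
    expected = {a: sum(prob * R[o] for o, prob in outcomes.items())
                for a, outcomes in p.items()}
    return max(expected, key=expected.get)

R_before = {"cheese": +1, "shock": -1}  # Mack before drugs
R_after  = {"cheese": -1, "shock": +1}  # Mack after the drug study

print(best_action(R_before))  # left
print(best_action(R_after))   # right
```

Note that the learned model p is unchanged between the two cases; only the reward function flips, and the behavior flips with it.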