Statistical learning and optimal control: A framework for biological learning and motor control. Lecture 2: Models of biological learning and sensory-motor integration.

Presentation transcript:

Statistical learning and optimal control: A framework for biological learning and motor control. Lecture 2: Models of biological learning and sensory-motor integration. Reza Shadmehr, Johns Hopkins School of Medicine

Various forms of classical conditioning in animal psychology. (Table from Peter Dayan.) Not explained by LMS, but predicted by the Kalman filter.

Kalman filter as a model of animal learning
Suppose that x represents inputs from the environment: a light and a tone. Suppose that y represents rewards, like a food pellet.
[Equations, in order: the animal's model of the experimental setup; the animal's expectation on trial n; the animal's learning from trial n.]
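
The slide's own equations did not survive in this transcript. As a hedged sketch, the three labeled pieces can be written in the standard Kalman-filter form for a linear reward model. The symbols w (weights), P (weight uncertainty), k (Kalman gain), Q (state noise) and sigma^2 (reward noise) are assumed notation, chosen to match the legends on the later slides, and the random-walk form of the weight dynamics is also an assumption.

```latex
\begin{align*}
\text{model of the setup:}\quad
  & \mathbf{w}^{(n+1)} = \mathbf{w}^{(n)} + \boldsymbol{\varepsilon}_w,
    \qquad \boldsymbol{\varepsilon}_w \sim N(\mathbf{0}, Q) \\
  & y^{(n)} = \mathbf{x}^{(n)T}\mathbf{w}^{(n)} + \varepsilon_y,
    \qquad \varepsilon_y \sim N(0, \sigma^2) \\
\text{expectation on trial } n\text{:}\quad
  & \hat{y}^{(n)} = \mathbf{x}^{(n)T}\hat{\mathbf{w}}^{(n|n-1)} \\
\text{learning from trial } n\text{:}\quad
  & \mathbf{k}^{(n)} = \frac{P^{(n|n-1)}\mathbf{x}^{(n)}}
      {\mathbf{x}^{(n)T}P^{(n|n-1)}\mathbf{x}^{(n)} + \sigma^2} \\
  & \hat{\mathbf{w}}^{(n|n)} = \hat{\mathbf{w}}^{(n|n-1)}
      + \mathbf{k}^{(n)}\left(y^{(n)} - \hat{y}^{(n)}\right) \\
  & P^{(n|n)} = \left(I - \mathbf{k}^{(n)}\mathbf{x}^{(n)T}\right)P^{(n|n-1)} \\
\text{projection to trial } n+1\text{:}\quad
  & \hat{\mathbf{w}}^{(n+1|n)} = \hat{\mathbf{w}}^{(n|n)},
    \qquad P^{(n+1|n)} = P^{(n|n)} + Q
\end{align*}
```

For comparison, LMS replaces the gain k with a fixed learning rate times x, so it carries no uncertainty matrix P from trial to trial.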

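Sharing paradigm
Train: {x1,x2} -> 1. Test: x1 -> ?, x2 -> ?
Result: x1 -> 0.5, x2 -> [value missing]
[Figure: two simulations, "Learning with Kalman gain" and "LMS". Traces: inputs x1, x2 and outcome y*; y and yhat; weights w1, w2. The Kalman panel also shows uncertainties P11, P22 and gains k1, k2.]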

Blocking
Kamin trained an animal to continuously press a lever to receive food. He then paired a light (conditioned stimulus) with a mild electric shock to the foot of the rat (unconditioned stimulus). In response to the shock, the animal would reduce the lever-press activity. Soon the animal learned that the light predicted the shock, and therefore reduced lever pressing in response to the light. He then paired the light with a tone when giving the electric shock. After this second stage of training, he observed that when the tone was given alone, the animal did not reduce its lever pressing. The animal had not learned anything about the tone.
Kamin (1968) Attention-like processes in classical conditioning. In: Miami symposium on the prediction of behavior: aversive stimulation (ed. MR Jones), pp Univ. of Miami Press.

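Blocking paradigm
Train: x1 -> 1, {x1,x2} -> 1. Test: x2 -> ?, x1 -> ?
Result: x2 -> 0, x1 -> [value missing]
[Figure: two simulations, "Learning with Kalman gain" and "LMS". Traces: inputs x1, x2 and outcome y*; y and yhat; weights w1, w2. The Kalman panel also shows uncertainties P11, P22 and gains k1, k2.]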

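Backwards blocking paradigm
Train: {x1,x2} -> 1, x1 -> 1. Test: x2 -> ?
Result: x2 -> [value missing]
[Figure: two simulations, "Learning with Kalman gain" and "LMS". Traces: inputs x1, x2 and outcome y*; y and yhat; weights w1, w2. The Kalman panel also shows uncertainties P11, P22 and gains k1, k2.]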
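
The contrast between the two learners in backwards blocking can be sketched in a few lines of pure Python. This is an illustrative simulation, not the code behind the slide: the function names, the initial uncertainty (identity), the reward noise `sigma2`, the state noise `q`, and the LMS rate `eta` are all assumed values. The point it shows is that the Kalman filter builds a negative covariance between w1 and w2 in stage 1, so stage 2 (x1 alone) pulls w2 down, whereas LMS never changes w2 when x2 is absent.

```python
def kalman_step(w, P, x, y, sigma2=0.1, q=0.0):
    """One trial of Kalman-filter learning for y = x . w + noise (two weights)."""
    # Prior uncertainty grows by state noise q (random-walk weights; q is assumed).
    P = [[P[0][0] + q, P[0][1]], [P[1][0], P[1][1] + q]]
    Px = [P[0][0] * x[0] + P[0][1] * x[1], P[1][0] * x[0] + P[1][1] * x[1]]
    s = x[0] * Px[0] + x[1] * Px[1] + sigma2   # variance of the predicted outcome
    k = [Px[0] / s, Px[1] / s]                 # Kalman gain
    err = y - (x[0] * w[0] + x[1] * w[1])      # prediction error
    w = [w[0] + k[0] * err, w[1] + k[1] * err]
    # Posterior uncertainty: P - k (P x)^T
    P = [[P[i][j] - k[i] * Px[j] for j in range(2)] for i in range(2)]
    return w, P


def lms_step(w, x, y, eta=0.2):
    """One trial of LMS: fixed learning rate, no uncertainty tracking."""
    err = y - (x[0] * w[0] + x[1] * w[1])
    return [w[0] + eta * x[0] * err, w[1] + eta * x[1] * err]


def backwards_blocking(n_trials=20):
    """Stage 1: {x1,x2} -> 1. Stage 2: x1 -> 1. Returns w2 after each stage."""
    w_kf, P = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    w_lms = [0.0, 0.0]
    for _ in range(n_trials):
        w_kf, P = kalman_step(w_kf, P, [1.0, 1.0], 1.0)
        w_lms = lms_step(w_lms, [1.0, 1.0], 1.0)
    w2_kf_mid, w2_lms_mid = w_kf[1], w_lms[1]
    for _ in range(n_trials):
        w_kf, P = kalman_step(w_kf, P, [1.0, 0.0], 1.0)
        w_lms = lms_step(w_lms, [1.0, 0.0], 1.0)
    return w2_kf_mid, w_kf[1], w2_lms_mid, w_lms[1]


if __name__ == "__main__":
    kf_mid, kf_end, lms_mid, lms_end = backwards_blocking()
    print("Kalman w2 after stage 1 and stage 2:", kf_mid, kf_end)
    print("LMS    w2 after stage 1 and stage 2:", lms_mid, lms_end)
```

With these assumed settings both learners share the prediction between the two stimuli after stage 1; only the Kalman learner then revises its estimate for the absent stimulus x2.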

Different output models
Suppose that x represents inputs from the environment: a light and a tone. Suppose that y represents a reward, like a food pellet.
Case 1: the animal assumes an additive model. If each stimulus predicts one reward, then if the two are present together, they predict two rewards.
Case 2: the animal assumes a weighted average model. If each stimulus predicts one reward, then if the two are present together, they still predict one reward, but with higher confidence. The weights a1 and a2 should be set to the inverse of the variance (uncertainty) with which each stimulus x1 and x2 predicts the reward.

General case of the Kalman filter
[Equations, in order:]
1. A priori estimate of the mean and variance of the hidden variable, before I observe the first data point.
2. Update of the estimate of the hidden variable after I have observed the data point.
3. Forward projection of the estimate to the next trial.
[Dimension labels on the equations: n x 1 and m x 1.]
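
The equations on this slide were lost in extraction. A hedged reconstruction in the textbook form follows. A and Q appear elsewhere in this lecture; the names C (output matrix), R (measurement noise covariance) and K (gain), and the assignment of the n x 1 label to the hidden state and m x 1 to the observation, are assumptions.

```latex
\begin{align*}
\text{generative model:}\quad
  & \mathbf{x}^{(n+1)} = A\,\mathbf{x}^{(n)} + \boldsymbol{\varepsilon}_x,
    \qquad \boldsymbol{\varepsilon}_x \sim N(\mathbf{0}, Q) \\
  & \mathbf{y}^{(n)} = C\,\mathbf{x}^{(n)} + \boldsymbol{\varepsilon}_y,
    \qquad \boldsymbol{\varepsilon}_y \sim N(\mathbf{0}, R) \\
\text{a priori estimate:}\quad
  & \hat{\mathbf{x}}^{(1|0)}, \qquad P^{(1|0)} \\
\text{update after observing } \mathbf{y}^{(n)}\text{:}\quad
  & K^{(n)} = P^{(n|n-1)}C^{T}\left(C\,P^{(n|n-1)}C^{T} + R\right)^{-1} \\
  & \hat{\mathbf{x}}^{(n|n)} = \hat{\mathbf{x}}^{(n|n-1)}
      + K^{(n)}\left(\mathbf{y}^{(n)} - C\,\hat{\mathbf{x}}^{(n|n-1)}\right) \\
  & P^{(n|n)} = \left(I - K^{(n)}C\right)P^{(n|n-1)} \\
\text{forward projection:}\quad
  & \hat{\mathbf{x}}^{(n+1|n)} = A\,\hat{\mathbf{x}}^{(n|n)},
    \qquad P^{(n+1|n)} = A\,P^{(n|n)}A^{T} + Q
\end{align*}
```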

Application of Kalman filter to problems in sensorimotor control
[Diagram labels: motor command; state of our body; sensory measurement.]
DM Wolpert et al. (1995) Science 269:1880

When we move our arm in darkness, we may estimate the position of our hand based on three sources of information:
1. Proprioceptive feedback.
2. A forward model of how the motor commands have moved our arm.
3. A combination of our prediction from the forward model with actual proprioceptive feedback.
Experimental procedures: The subject holds a robotic arm in total darkness. The hand is briefly illuminated. An arrow is displayed to the left or right, showing which way to move the hand. In some cases, the robot produces a constant force that assists or resists the movement. The subject slowly moves the hand until a tone is sounded. They then use the other hand to move a mouse cursor to show where they think their hand is located.
DM Wolpert et al. (1995) Science 269:1880

[Diagram labels: motor command; state of the body; sensory measurement; A, B, C.]
[Equations, in order: the generative model, describing the actual dynamics of the limb; the model for estimation of sensory state from sensory feedback.]
For whatever reason, the brain has an incorrect model of the arm. It overestimates the effect of motor commands on changes in limb position.

[Equations, in order:]
1. Initial conditions: the subject can see the hand and has no uncertainty regarding its position and velocity.
2. Forward model of state change and feedback.
3. Actual observation.
4. Estimate of state, which incorporates the prior and the observation.
5. Forward model to establish the prior and the uncertainty for the next state.

[Figure, panel "A single movement": motor command u, actual and estimated position (cm), and Kalman gain vs. time (sec), with the time of the "beep" marked. Panel "For movements of various length": bias at end of movement (cm) and variance at end of movement (cm^2) vs. total movement time (sec).]

Puzzling results: Savings and memory despite "washout"
Gain = eye displacement divided by target displacement.
Result 1: After changes in gain, monkeys exhibit recall despite behavioral evidence for washout.
Kojima et al. (2004) Memory of learning facilitates saccade adaptation in the monkey. J Neurosci 24:7531.

Puzzling results: Improvements in performance without error feedback
Result 2: Following changes in gain and a long period of washout, monkeys exhibit no recall.
Result 3: Following changes in gain and a period of darkness, monkeys exhibit a "jump" in memory.
Kojima et al. (2004) J Neurosci 24:7531.

The learner's hypothesis about the structure of the world
1. The world has many hidden states. What I observe is a linear combination of these states.
2. The hidden states change from trial to trial. Some change slowly, others change fast.
3. The states that change fast have larger noise than states that change slowly.
[Equations: state transition equation, with matrix A and a slow system and a fast system; output equation.]

Simulations for savings
[Figure: inputs x1, x2 and outcome y*; y and yhat; weights w1, w2.]
The critical assumption is that in the fast system, there is much more noise than in the slow system. This produces a larger learning rate in the fast system.

Simulations for spontaneous recovery despite zero error feedback
[Figure: inputs x1, x2 and outcome y*; y and yhat; weights w1, w2; the error clamp period is marked.]
In the error clamp period, estimates are made, yet the weight update equation does not see any error. Therefore, the effect of Kalman gain in the error-clamp period is zero. Nevertheless, weights continue to change because of the state update equations. The fast weights rapidly rebound to zero, while the slow weights slowly decline. The sum of these two changes produces a "spontaneous recovery" after washout.
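
The argument above can be checked with a small two-state Kalman filter in pure Python. This is a sketch under assumed parameters, not the lecture's simulation: the retention factors `A = (0.5, 0.99)`, state noises `Q = (0.05, 0.001)`, measurement noise `r = 0.05`, the trial counts, and the function names are all hypothetical. The error clamp is modeled as a trial whose prediction error is zero.

```python
def step(x, P, y, A, Q, r):
    """One trial of a two-state (fast, slow) Kalman filter with output x[0] + x[1].

    y is the observed outcome, or None for an error-clamp trial (error forced to 0).
    A and Q hold the diagonal entries of the state transition and state noise.
    """
    yhat = x[0] + x[1]
    u = [P[0][0] + P[0][1], P[1][0] + P[1][1]]   # P c, with c = [1, 1]
    s = u[0] + u[1] + r                          # variance of the predicted outcome
    k = [u[0] / s, u[1] / s]                     # Kalman gain
    err = 0.0 if y is None else y - yhat
    x = [x[0] + k[0] * err, x[1] + k[1] * err]
    P = [[P[i][j] - u[i] * u[j] / s for j in range(2)] for i in range(2)]
    # Forward projection to the next trial: x <- A x, P <- A P A^T + Q.
    x = [A[0] * x[0], A[1] * x[1]]
    P = [[A[i] * A[j] * P[i][j] + (Q[i] if i == j else 0.0) for j in range(2)]
         for i in range(2)]
    return x, P, yhat


def simulate(n_adapt=150, n_clamp=60, max_washout=50):
    """Adapt to y=+1, wash out with y=-1 until yhat <= 0, then clamp the error."""
    A, Q, r = (0.5, 0.99), (0.05, 0.001), 0.05   # assumed values: fast state is noisier
    x = [0.0, 0.0]
    P = [[Q[0], 0.0], [0.0, Q[1]]]
    for _ in range(n_adapt):
        x, P, _ = step(x, P, 1.0, A, Q, r)
    washout = 0
    while x[0] + x[1] > 0.0 and washout < max_washout:
        x, P, _ = step(x, P, -1.0, A, Q, r)
        washout += 1
    slow_at_clamp = x[1]
    clamp = []
    for _ in range(n_clamp):
        x, P, yhat = step(x, P, None, A, Q, r)
        clamp.append(yhat)
    return washout, slow_at_clamp, clamp


if __name__ == "__main__":
    washout, slow_at_clamp, clamp = simulate()
    print("washout trials:", washout)
    print("slow state at start of clamp:", slow_at_clamp)
    print("yhat at start, peak, end of clamp:", clamp[0], max(clamp), clamp[-1])
```

At the end of washout the output is back near zero only because a negative fast state cancels a still-positive slow state. During the clamp the fast state decays quickly, so the output rebounds toward the originally adapted direction and then slowly declines.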

Changes in representation without error feedback
[Figure panels: target extinguished during recovery; target visible during recovery.]
Reported values: in one condition, mean gain at start of recovery = 0.86, mean gain at end of recovery = 0.87, % gain change = 1.2%. In the other condition, mean gain at start of recovery = 0.83, mean gain at end of recovery = 0.95, % gain change = 14.4%.
Seeberger et al. (2002) Brain Research 956:

Massed vs. spaced training: effect of changing the inter-trial interval
[Figure panels: learning reaching in a force field; discrimination performance (sec) for ITI = 8 min and ITI = 1 min.]
Rats were trained on an operant conditional discrimination in which an ambiguous stimulus (X) indicated both the occasions on which responding in the presence of a second cue (A) would be reinforced and the occasions on which responding in the presence of a third cue (B) would not be reinforced (X --> A+, A-, X --> B-, B+). Both rats with lesions of the hippocampus and control rats learned this discrimination more rapidly when the training trials were widely spaced (intertrial interval of 8 min) than when they were massed (intertrial interval of 1 min). With spaced practice, lesioned and control rats learned this discrimination equally well. But when the training trials were massed, lesioned rats learned more rapidly than controls.
Han, J.S., Gallagher, M. & Holland, P. Hippocampus 8: (1998)

Performance in a water maze
[Figure: escape latency (s) vs. training trial (bin size = 4); conditions: 4 trials per day for 4 days, and 16 trials in one day.]
Commins, S., Cunningham, L., Harvey, D. & Walsh, D. (2003) Behav Brain Res 139:
Cue-dependent saccade gain adaptation
When eyes are looking up, increase saccade gain; when eyes are looking down, decrease gain. (Break period: 1 min.)
Aboukhalil, A., Shelhamer, M. & Clendaniel, R. (2004) Neurosci Lett 369:

The learner's hypothesis about the structure of the world
1. The world has many hidden states. What I observe is a linear combination of these states.
2. The hidden states change from trial to trial. Some change slowly, others change fast.
3. The states that change fast have larger noise than states that change slowly.
4. The state changes can occur more frequently than I can make observations.
[Diagram: repeated state transitions A between observations; the gap between observations is labeled "inter-trial interval".]

[Figure: y and yhat across trials; panels ITI=2 and ITI=20.]

When there is an observation, the uncertainty for each hidden variable decreases in proportion to its Kalman gain. When there are no observations, the uncertainty decreases in proportion to A squared, but increases in proportion to the state noise Q.
[Figure: P22 and P11; panels: uncertainty for the slow state, uncertainty for the fast state; ITI=20.]
Beyond a minimum ITI, increased ITI continues to increase the uncertainty of the slow state but has little effect on the fast state uncertainty. The longer ITI increases the total learning by increasing the slow state's sensitivity to error.
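
How uncertainty evolves between observations can be illustrated with the forward projection alone. In this sketch each state is treated as a scalar with p <- a^2 p + q per time step, which matches the diagonal entries of A P A^T + Q when A and Q are diagonal. The parameter values, the starting uncertainty of 0.01, and the function name `propagate` are assumptions made for illustration.

```python
def propagate(p, a, q, steps):
    """Uncertainty of one hidden state after `steps` time steps with no observation.

    Each step shrinks the variance by a**2 (retention) and adds state noise q.
    """
    for _ in range(steps):
        p = a * a * p + q
    return p


if __name__ == "__main__":
    p0 = 0.01                           # assumed uncertainty right after an observation
    fast = dict(a=0.5, q=0.05)          # assumed: fast state forgets quickly, is noisy
    slow = dict(a=0.99, q=0.001)        # assumed: slow state retains well, low noise
    for iti in (2, 20):
        print("ITI =", iti,
              "fast:", propagate(p0, steps=iti, **fast),
              "slow:", propagate(p0, steps=iti, **slow))
```

With these assumed values the fast state's uncertainty is already close to its ceiling q / (1 - a^2) after two steps, so a longer interval adds little. The slow state's uncertainty keeps growing over the longer interval, which is what raises its Kalman gain on the next observation.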

[Figure: y with yhat (spaced, massed); weights w1 and w2 (massed, spaced); Kalman gains k1 and k2 (massed, spaced); x-axis: observation number.]
Performance in spaced training depends largely on the slow state. Therefore, spaced training produces memories that decay little with passage of time.

Spaced training results in better retention in learning a second language
[Figure panels: performance during training (ITI=2, ITI=14, ITI=98); testing at 1 day or 1 week, averaged together (ITI=2, ITI=14, ITI=98).]
On Day 1, subjects learned to translate written Japanese words into English. They were given a Japanese word (written phonetically), and then given the English translation. This "study trial" was repeated twice. Afterwards, they were given the Japanese word and had to write the translation. If their translation was incorrect, the correct translation was given. The ITI between word repetitions was either 2, 14, or 98 trials. Performance during training was better when the ITI was short. However, retention was much better for words that were observed with longer ITI. (The retention test involved two groups, one tested at 1 day and the other at 7 days. Performance was slightly better for the 1 day group, but the results were averaged in this figure.)
Pavlik, P. I. and Anderson, J. R. (2005). Practice and forgetting effects on vocabulary memory: An activation-based model of the spacing effect. Cognitive Science, 29,

Data fusion
Suppose that we have two sensors that independently measure something. We would like to combine their measures to form a better estimate. What should the weights be? Suppose that we know that sensor 1 gives us measurement y1 and has Gaussian noise with a known variance, and similarly that sensor 2 gives us measurement y2 and has Gaussian noise with a known variance. A good idea is to weight each sensor in inverse proportion to its noise variance.
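
A minimal sketch of the inverse-variance weighting described above. The function name `fuse` and the argument names are hypothetical, and the formula for the combined variance assumes independent Gaussian noise in the two sensors.

```python
def fuse(y1, var1, y2, var2):
    """Combine two independent noisy measurements of the same quantity.

    Each measurement is weighted by its precision (1 / variance), and the
    weights are normalized to sum to one.
    """
    precision1, precision2 = 1.0 / var1, 1.0 / var2
    w1 = precision1 / (precision1 + precision2)
    w2 = 1.0 - w1
    mean = w1 * y1 + w2 * y2
    var = 1.0 / (precision1 + precision2)   # never larger than either sensor's variance
    return mean, var


if __name__ == "__main__":
    print(fuse(1.0, 1.0, 3.0, 1.0))   # equally noisy sensors
    print(fuse(0.0, 1.0, 4.0, 3.0))   # sensor 2 is three times noisier
```

With equally noisy sensors the estimate is the plain average and the variance halves. With unequal noise the estimate sits closer to the more reliable sensor.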

Data fusion via Kalman filter
To see why this makes sense, let's put forth a generative model that describes our hypothesis about how the data that we are observing is generated.
[Equations label the hidden variable and the observed variables.]

[Equations, in order: the priors; our first observation; the variance of our posterior estimate. See homework for this.]

Notice that after we make our first observation, the variance of our posterior is smaller than the variance of either sensor.
[Figure labels: what our sensors tell us; the real world.]

[Figure: probability distributions for sensor 1, sensor 2, and the combined estimate, showing the mean of the posterior and its variance. Panels: combining equally noisy sensors; combining sensors with unequal noise.]

What we sense depends on what we predicted
[Diagram labels: motor commands; muscles; force; body part; state change; sensory system (proprioception, vision, audition); measured sensory consequences; forward model; predicted sensory consequences; integration; belief.]

The brain predicts the sensory consequences of motor commands
Duhamel et al. Science 255, (1992)

Combining sensory predictions with sensory measurements should produce a better spatial estimate of the visual world
Vaziri, Diedrichsen, Shadmehr (2006) Journal of Neuroscience

Vaziri et al. (2006) J Neurosci

How to set the initial var-cov matrix
In homework, we will show that in general:
Now if we have absolutely no prior information on w, then before we see the first data point P(1|0) is infinity, and therefore its inverse is zero. After we see the first data point, we will be using the above equation to update our estimate. The updated estimate will become:
A reasonable and conservative estimate of the initial value of P would be to set it to the above value. That is, set:
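
The three equations referred to on this slide are missing from the transcript. One standard identity that fits the description is the information (inverse-covariance) form of the Kalman update, sketched here for the scalar-output model y = x^T w + noise with noise variance sigma^2. Treat this as an assumed reconstruction, not the slide's own text.

```latex
\begin{align*}
\text{in general:}\quad
  & \left(P^{(n|n)}\right)^{-1} = \left(P^{(n|n-1)}\right)^{-1}
      + \frac{\mathbf{x}^{(n)}\mathbf{x}^{(n)T}}{\sigma^2} \\
\text{no prior information, } \left(P^{(1|0)}\right)^{-1} = 0\text{:}\quad
  & \left(P^{(1|1)}\right)^{-1} = \frac{\mathbf{x}^{(1)}\mathbf{x}^{(1)T}}{\sigma^2} \\
\text{for a single input } x\text{:}\quad
  & P^{(1|1)} = \frac{\sigma^2}{\left(x^{(1)}\right)^2}
\end{align*}
```

The first line follows from the covariance update P^{(n|n)} = P^{(n|n-1)} - P^{(n|n-1)} x x^T P^{(n|n-1)} / (x^T P^{(n|n-1)} x + sigma^2) by the Sherman-Morrison identity. For a vector input, the matrix x x^T has rank one, so a single data point does not determine an invertible P; the closed form in the last line applies to the single-input case.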