Slide 1
Learning Theory
Reza Shadmehr
Topics: sensitivity to error; generalization properties of online learning; examples from perceptual learning and motor learning; estimating the generalization function.
Slide 2
Sensitivity to error
In the last lecture we saw evidence for the idea that in human learning the loss function has the shape $J(e) = |e|^q$ with $q < 2$. What is the implication of this loss function for how we learn from error? Marko et al. (2012), Journal of Neurophysiology, considered this question.
Slide 3
Sensitivity to error for various loss functions
Measured data from human volunteers (Marko et al., 2012): sensitivity to error decreases as error size increases. This is consistent with a sub-quadratic loss function, $J(e) = |e|^q$ with $q < 2$, for which the correction per unit error shrinks as errors grow.
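To see why a sub-quadratic loss implies declining sensitivity, note that gradient descent on $J(e) = |e|^q$ yields a correction proportional to $q|e|^{q-1}\,\mathrm{sign}(e)$, so the correction per unit error scales as $q|e|^{q-2}$. A minimal sketch in Python (the exponents and error values are illustrative choices, not values from the data):

```python
import numpy as np

# Gradient descent on the loss J(e) = |e|**q produces a correction
# proportional to q * |e|**(q-1) * sign(e); dividing by the error gives the
# sensitivity q * |e|**(q-2): constant for the quadratic loss (q = 2),
# decreasing with error size when q < 2.
def error_sensitivity(e, q):
    return q * np.abs(e) ** (q - 2)

errors = np.array([0.1, 0.5, 1.0, 2.0, 4.0])
for q in (2.0, 1.5, 1.0):
    print(f"q = {q}:", np.round(error_sensitivity(errors, q), 2))
```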
Slide 4
Main question: the brain seems to be able to control how much it is willing to learn from error. How is this done?
Slide 5
We learn more from errors that arise in stable environments
Learning from error: Herzfeld et al. (2014), Science.
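One way to capture this finding in a model (a sketch of the idea only, not the published model of Herzfeld et al.) is to let the learner's error sensitivity itself adapt: it grows when consecutive errors share a sign, as in a stable environment, and shrinks when they alternate:

```python
import numpy as np

rng = np.random.default_rng(0)

def adapt(perturbation, eta0=0.05, d_eta=0.02):
    """Single-state learner whose error sensitivity eta is itself adapted:
    it increases after two same-signed errors (a stable world) and decreases
    after a sign flip (an unstable world)."""
    w, eta, prev_e = 0.0, eta0, 0.0
    for p in perturbation:
        e = p - w                                  # error on this trial
        w += eta * e                               # learn from the error
        eta = np.clip(eta + (d_eta if e * prev_e > 0 else -d_eta), 0.01, 1.0)
        prev_e = e
    return eta

stable = np.ones(100)                              # constant perturbation
unstable = rng.choice([-1.0, 1.0], size=100)       # perturbation flips sign
print("error sensitivity after a stable world:  ", round(float(adapt(stable)), 3))
print("error sensitivity after an unstable world:", round(float(adapt(unstable)), 3))
```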
Slide 6
Are changes in error sensitivity global to all errors?
Herzfeld et al. (2014), Science.
Slide 7
A memory of errors
Herzfeld et al. (2014), Science.
Slide 8
Review of LMS
Basis set: the output is a weighted sum of basis functions, $\hat{y}^{(n)} = \sum_i w_i^{(n)} g_i(x^{(n)}) = \mathbf{w}^{(n)\top}\mathbf{g}(x^{(n)})$. The error is $\tilde{y}^{(n)} = y^{(n)} - \hat{y}^{(n)}$, and the LMS rule updates the weights in proportion to the error: $\mathbf{w}^{(n+1)} = \mathbf{w}^{(n)} + \eta\,\tilde{y}^{(n)}\,\mathbf{g}(x^{(n)})$.
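A minimal sketch of LMS in Python (the basis set, learning rate, and target function are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# LMS with a fixed basis set g(x): on each trial, predict with the current
# weights, observe the error, and move each weight in proportion to the
# error times that basis function's activity: w <- w + eta * error * g(x).
centers = np.linspace(-1, 1, 10)           # Gaussian basis centers
eta = 0.3                                  # learning rate

def g(x):
    return np.exp(-(x - centers) ** 2 / (2 * 0.3 ** 2))

def f_true(x):
    return np.sin(2.5 * x)                 # the function being learned

w = np.zeros_like(centers)
for n in range(500):
    x = rng.uniform(-1, 1)
    y_tilde = f_true(x) - w @ g(x)         # error on trial n
    w += eta * y_tilde * g(x)              # LMS update

test = np.linspace(-1, 1, 5)
print("residuals:", np.round([w @ g(x) - f_true(x) for x in test], 3))
```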
Slide 9
Generalization in online learning (LMS)
While on a given trial error is experienced at only one input, the effect of that error is "broadcast" to all weights. The changed weights then affect the output at potentially every possible input. This broadcast is a generalization of error, and it is quantified by a generalization function. For LMS, the change in output at any x caused by the error at $x^{(n)}$ is $b(x, x^{(n)})\,\tilde{y}^{(n)}$, with generalization function $b(x, x^{(n)}) = \eta\,\mathbf{g}(x)^\top \mathbf{g}(x^{(n)})$.
Slide 10
Example: Generalization in online learning
Gaussian bases. [Figure: the learned function after the 10th and after the 11th training point.] Error on the 11th trial has a local effect on the output; the neighborhood it affects is around the training point.
Slide 11
Example: Generalization in online learning
A normalized Gaussian basis. Generalization function: how does the error that was experienced at $x^{(n)}$ affect any other state x? [Figure: the normalized basis functions on $x \in [-1, 1]$.]
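A sketch computing the generalization function for a normalized Gaussian basis (the centers, width, and learning rate are illustrative choices): $b(x, x^{(n)})$ peaks at the training point and decays away from it.

```python
import numpy as np

# Normalized Gaussian basis on x in [-1, 1].
centers = np.linspace(-1, 1, 20)
sigma = 0.15
eta = 0.5

def g(x):
    u = np.exp(-(x - centers) ** 2 / (2 * sigma ** 2))
    return u / u.sum()                  # normalization step

x_n = 0.3                               # input queried on trial n
# Generalization function b(x, x_n) = eta * g(x).T @ g(x_n): the change in
# output at x, per unit error experienced at x_n.
for x in np.linspace(-1, 1, 9):
    print(f"b({x:+.2f}, {x_n:+.2f}) = {eta * g(x) @ g(x_n):.4f}")
```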
Slide 12
Example: Generalization in online learning
Polynomial bases. [Figure: the learned function after the 10th and after the 11th training point.] Because polynomial bases are nonzero almost everywhere, the error on the 11th trial changes the output globally, not just near the training point.
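The same computation with polynomial bases (the degree and learning rate are illustrative choices) gives a generalization function that does not vanish far from the training point:

```python
import numpy as np

# Same computation as before, but with polynomial bases g_i(x) = x**i.
# Each basis is nonzero almost everywhere, so an error at one input
# changes the output at every input: generalization is global.
eta = 0.1

def g(x):
    return np.array([x ** i for i in range(6)])

x_n = 0.3                               # input queried on trial n
for x in np.linspace(-1, 1, 9):
    print(f"b({x:+.2f}, {x_n:+.2f}) = {eta * g(x) @ g(x_n):+.4f}")
```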
Slide 13
Example: Perceptual learning of visual cues
In many visual spatial discrimination tasks, such as determining the sign of the offset in a vernier stimulus, the human visual system exhibits hyperacuity, evaluating spatial relations with a precision of a fraction of a photoreceptor's diameter. Poggio et al. (1992) proposed that this impressive performance comes about because of fast learning in the early stages of visual processing in the cortex. [Stimulus: vernier offset of 20 arc sec; bars 20 arc min long and 2 arc min wide; viewed at a distance of 2.5 m.] Poggio et al. (1992) Science 256:1018.
Slide 14
Psychophysical results
[Figure: percentage correct over training for vertical and horizontal verniers; n = 12 subjects, bin = 40 trials.]
Modeling results
[Figure: model performance for vertical and horizontal verniers; bin = 40 trials.] Input space: 180 x 360 pixels, each representing 1 arc sec. Output: +1/-1 depending on whether the offset is to the right or left. Poggio et al. (1992) Science 256:1018.
Slide 15
Identification of an adaptive system
The system we want to understand has some knowledge about the world. On trial n we assess its knowledge at a point $x^{(n)}$: the system evaluates its "store of knowledge" and produces an output $\hat{y}^{(n)}$. We then present the system with an error $\tilde{y}^{(n)}$, and the system learns from that error by generalizing it. We are interested in quantifying this generalization function. We query the system for N trials, so our data set looks like $\{x^{(n)}, \hat{y}^{(n)}, \tilde{y}^{(n)}\}_{n=1}^{N}$, and our goal is to estimate the generalization function b.
Slide 16
Representing the "knowledge" contained in the adaptive system
We represent the knowledge contained in the adaptive system as a function $f^{(n)}(x)$ that can change from trial to trial. On each trial the function is evaluated at some x: $\hat{y}^{(n)} = f^{(n)}(x^{(n)})$. We do not know the basis functions, but we know that the error on that trial changes the system's knowledge: $f^{(n+1)}(x) = f^{(n)}(x) + b(x, x^{(n)})\,\tilde{y}^{(n)}$. The interesting idea is that on a given trial we can measure the system's knowledge at only one point; as soon as we measure it, the system acquires further information (the error), and that changes its knowledge at all points. Our problem is to use the trial-by-trial data to estimate how this knowledge changes, and from that to recover the generalization function.
Slide 17
Estimating the generalization function
Take the input space x and divide it into p equal segments, and represent the generalization function as a p x p matrix B: the first argument x indexes the row, and the second argument $x^{(n)}$ indexes the column, so generalization from $x^{(n)}$ to x is $b(x, x^{(n)}) = B_{ij}$, with i the index specified by x and j the index specified by $x^{(n)}$. Let $\mathbf{s}^{(n)}$ be a row-vector selector with a 1 at the index of $x^{(n)}$ and 0 elsewhere, and let $\mathbf{f}^{(n)}$ be the vector of knowledge at the p segments. The system equations are:
$\hat{y}^{(n)} = \mathbf{s}^{(n)} \mathbf{f}^{(n)}$
$\mathbf{f}^{(n+1)} = \mathbf{f}^{(n)} + B\,\mathbf{s}^{(n)\top}\,\tilde{y}^{(n)}$
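A sketch of the discretized system (the particular B, target, and trial schedule are illustrative choices): on each trial one segment is queried, and the whole knowledge vector is updated through the corresponding column of B.

```python
import numpy as np

rng = np.random.default_rng(1)

# Discretized adaptive system: knowledge vector f over p segments and a
# p x p generalization matrix B (here a local, banded B -- an illustrative
# choice). On trial n, segment j is queried and all of f is updated
# through column j of B.
p = 25
idx = np.arange(p)
B = 0.2 * np.exp(-(idx[:, None] - idx[None, :]) ** 2 / (2 * 2.0 ** 2))

target = np.sin(np.linspace(0, np.pi, p))   # the world being learned
f = np.zeros(p)
errors = []
for n in range(300):
    j = rng.integers(p)                     # index selected by x^(n)
    y_hat = f[j]                            # s^(n) f^(n)
    y_tilde = target[j] - y_hat             # error on this trial
    f = f + B[:, j] * y_tilde               # f^(n+1) = f^(n) + B s^T ytilde
    errors.append(abs(y_tilde))

print("mean |error|, first 50 trials:", round(np.mean(errors[:50]), 3))
print("mean |error|, last 50 trials: ", round(np.mean(errors[-50:]), 3))
```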
Slide 18
Estimating the generalization function: linear solution
Suppose the same input is visited on trial n and on trial n+m: $x^{(n+m)} = x^{(n)}$. Tracking that part of the output as it evolves,
$\hat{y}^{(n+m)} - \hat{y}^{(n)} = \sum_{k=n}^{n+m-1} b(x^{(n)}, x^{(k)})\,\tilde{y}^{(k)}$,
so for every pair of trials that query the same input we get a linear equation in some of the parameters of our generalization function. Stacking these equations across all such pairs gives a linear system that can be solved for the entries of B.
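A self-contained sketch of this estimator (the data are re-simulated; the true B and the drifting target are illustrative choices): each revisit of a segment contributes one exact linear equation, and least squares recovers B.

```python
import numpy as np

rng = np.random.default_rng(2)

# Recover B from the trial record alone. For every pair of trials (n, n+m)
# that query the same segment j,
#   y_hat[n+m] - y_hat[n] = sum_{k=n}^{n+m-1} B[j, j_k] * y_tilde[k],
# one exact linear equation in the entries of B; stack them and solve.
p = 5
idx = np.arange(p)
B_true = 0.2 * np.exp(-(idx[:, None] - idx[None, :]) ** 2 / 2.0)
target = np.zeros(p)

f = np.zeros(p)
js, yhats, ytildes = [], [], []
for n in range(2000):
    target += 0.05 * rng.standard_normal(p)   # drifting world keeps errors alive
    j = rng.integers(p)
    y_hat = f[j]
    y_tilde = target[j] - y_hat
    f = f + B_true[:, j] * y_tilde
    js.append(j); yhats.append(y_hat); ytildes.append(y_tilde)

rows, rhs = [], []
for n in range(len(js) - 1):
    for m in range(n + 1, len(js)):
        if js[m] == js[n]:                    # same input revisited on trial m
            a = np.zeros(p * p)               # coefficients for vec(B)
            for k in range(n, m):
                a[js[n] * p + js[k]] += ytildes[k]
            rows.append(a); rhs.append(yhats[m] - yhats[n])
            break

vecB, *_ = np.linalg.lstsq(np.vstack(rows), np.array(rhs), rcond=None)
print("max |B_est - B_true| =", np.abs(vecB.reshape(p, p) - B_true).max())
```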
Slide 19
Example: estimating the generalization function from the record of errors
[Figure: the estimated matrix B, two of its columns, and the estimated generalization function plotted against B row number.] The fit between the model and the data is exact.