The General Linear Model (GLM): the marriage between linear systems and stats
fMRI Data Processing Stream: raw scanner data -> preprocessing -> identify task-related activity
The General Linear Model (GLM)
Jargon/terms to remember: General Linear Model (GLM); HRF (HIRF): hemodynamic (impulse) response function; design matrix; predictors; betas (β); residual error; deconvolution.
How can we relate the hemodynamic response to a single event to the hemodynamic response during blocked stimulation? Will the rise time be the same? Will the duration of the response be the same? We need a model. We will spend a lecture explaining the current model, the linear systems approach, which provides a means to predict the expected BOLD response from the experimental paradigm. We will discuss both the appeal and the constraints of this model. But first we will give some intuition.
How do I predict the BOLD response to any stimulus? Convolution
According to the linear systems approach:
We need to know the impulse response function.
The predicted response is the convolution between the input and the impulse response function.
Examples of model hemodynamic impulse response functions (HRFs) estimated from human fMRI measurements
[Figure: example HRFs. Panel annotations, in extraction order: no undershoot; no undershoot; derived empirically in V1, shortest stimulus is 3 s; mathematical model based on Turner balloon model of BOLD responses; derived empirically in V1, shortest stimulus is 1 s, used deconvolution]
How do I predict the BOLD response to any stimulus? Convolution
[Figure: stimulus and predicted BOLD response]
Turn the stimulus into a series of impulses.
Sum up the time-shifted impulse responses.
In other words, convolve your stimulus with the HRF.
Convolve stimulus with HRF
[Figure: stimulus and predicted BOLD]
We do this numerically in MATLAB with the conv function (Convolution_tutorial.m).
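A minimal sketch of the two steps above, written in Python with NumPy rather than MATLAB (np.convolve plays the role of conv here). The function toy_hrf, its parameters, the 1 s sampling step, and the stimulus timing are all assumptions made up for illustration; this is not the HRF or the tutorial used in the lecture. The point is only that summing time-shifted impulse responses and convolving give the same prediction.

```python
import math
import numpy as np

def toy_hrf(t, n=3, tau=1.5):
    """Illustrative gamma-shaped impulse response (assumed, not the lecture's HRF)."""
    t = np.asarray(t, dtype=float)
    return (t / tau) ** (n - 1) * np.exp(-t / tau) / (tau * math.factorial(n - 1))

dt = 1.0                                   # sampling step in seconds (assumed)
hrf = toy_hrf(np.arange(0, 20, dt))        # impulse response sampled over 20 s

# Stimulus as a series of impulses: 1 where a stimulus is shown, 0 elsewhere.
stimulus = np.zeros(40)
stimulus[[2, 3, 4, 15]] = 1.0

# Method 1: sum up the time-shifted (and scaled) impulse responses.
summed = np.zeros(len(stimulus) + len(hrf) - 1)
for onset, amplitude in enumerate(stimulus):
    summed[onset:onset + len(hrf)] += amplitude * hrf

# Method 2: convolve the stimulus with the HRF.
predicted = np.convolve(stimulus, hrf)

print("methods agree:", np.allclose(summed, predicted))
```

Both arrays have length len(stimulus) + len(hrf) - 1, because the response to the last impulse continues after the stimulus ends.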
Awesome fMRI experiment
Hypothesis: there are neurons in the brain that respond more (increase firing) to moving than to still visual stimuli.
Scan subjects as they view 6 alternating blocks of moving vs. stationary stimuli.
[Figure: stimulus time course; y-axis: moving dots (1) vs. still dots; x-axis: time]
Prediction: the BOLD response in regions containing motion-sensitive neurons will be stronger during blocks of moving dots than during blocks of still dots.
Note that I generated a vector of 1s and 0s indicating the stimulus at each time point: 0 = still; 1 = moving. This is called the design matrix.
Based on our linear model, we will generate a prediction by convolving the design matrix with an HRF.
[Figure: design matrix * HRF = predictor g(t)]
How well does my prediction match the data?
[Figure: time series from an example voxel (time course data)]
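The prediction step can be sketched as follows. Everything numeric here is an assumption for illustration: the toy gamma-shaped HRF, 2 s per sample, 8 samples per block, and reading "6 alternating blocks" as still, moving, still, moving, still, moving.

```python
import math
import numpy as np

def toy_hrf(t, n=3, tau=1.5):
    """Illustrative gamma-shaped impulse response (assumed, not the lecture's HRF)."""
    t = np.asarray(t, dtype=float)
    return (t / tau) ** (n - 1) * np.exp(-t / tau) / (tau * math.factorial(n - 1))

tr = 2.0          # seconds per sample (assumed)
block_len = 8     # samples per block (assumed)

# Design vector: 0 = still dots, 1 = moving dots, 6 alternating blocks.
design = np.tile(np.r_[np.zeros(block_len), np.ones(block_len)], 3)

# Predictor g(t): design convolved with the HRF, trimmed to the scan length.
hrf = toy_hrf(np.arange(0, 24, tr))
predictor = np.convolve(design, hrf)[:design.size]

print("mean prediction, moving blocks:", predictor[design == 1].mean())
print("mean prediction, still blocks: ", predictor[design == 0].mean())
```

The predictor is a delayed, smoothed version of the boxcar, so it is still rising at the start of each moving block and still decaying at the start of the following still block.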
I want to write a mathematical notation relating the prediction to the data for 2 reasons: (1) to estimate the model parameters and (2) to test how good the fit is.
[Figure: time course data and predictor g(t)]
Because this is a linear model, I predict that my time course y(t) is a scaled version of the predictor g(t), plus an offset term b0:
$y(t) = b_1\, g(t) + b_0 + e(t)$
The model has 2 parameters: b1 scales the predictor and b0 shifts it from baseline. The error term e(t) is the residual error: what the linear model doesn't explain in the data.
We can solve this with a simple linear regression!
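As a rough sketch of that regression, the block below simulates a voxel time course from known b1 and b0 and recovers them with the textbook closed-form estimates. The toy HRF, the block timing, the "true" parameter values, and the noise level are invented for illustration; real data would replace the simulated y.

```python
import math
import numpy as np

def toy_hrf(t, n=3, tau=1.5):
    """Illustrative gamma-shaped impulse response (assumed, not the lecture's HRF)."""
    t = np.asarray(t, dtype=float)
    return (t / tau) ** (n - 1) * np.exp(-t / tau) / (tau * math.factorial(n - 1))

# Predictor g(t): alternating still/moving blocks convolved with the HRF.
design = np.tile(np.r_[np.zeros(8), np.ones(8)], 3)
g = np.convolve(design, toy_hrf(np.arange(0, 24, 2.0)))[:design.size]

# Simulated time course y(t) = b1 * g(t) + b0 + e(t) with made-up parameters.
rng = np.random.default_rng(0)
true_b1, true_b0 = 2.0, 0.5
y = true_b1 * g + true_b0 + rng.normal(0.0, 0.05, size=g.size)

# Simple linear regression: slope = cov(g, y) / var(g), then the intercept.
g_centered = g - g.mean()
b1 = np.sum(g_centered * (y - y.mean())) / np.sum(g_centered ** 2)
b0 = y.mean() - b1 * g.mean()

print("estimated b1:", b1, "estimated b0:", b0)
```

np.polyfit(g, y, 1) returns the same slope and intercept, which is a convenient cross-check.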
[Figure: design matrix (stimulus, used to generate the predictor) * HRF = predictor g(t); offset term (b0)]
Solution (b1): how much the predictor is scaled to match the time course.
Here the signal is around zero, so b0 is negligible. In any case, researchers rarely report b0 because they are interested in the effect of the stimulus and not in the baseline brain signal.
The residual error is the difference between the data and the model's prediction.
[Figure: e(t) = residual error]
The error term can be used to estimate how much variance in the data is explained by the model.
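One common way to turn the residual into "variance explained" is 1 minus the ratio of residual variance to data variance. The sketch below assumes that definition, along with a simulated voxel, a toy HRF, and made-up parameters; the lecture may compute the figure differently.

```python
import math
import numpy as np

def toy_hrf(t, n=3, tau=1.5):
    """Illustrative gamma-shaped impulse response (assumed, not the lecture's HRF)."""
    t = np.asarray(t, dtype=float)
    return (t / tau) ** (n - 1) * np.exp(-t / tau) / (tau * math.factorial(n - 1))

# Predictor and a simulated time course (parameters and noise are made up).
design = np.tile(np.r_[np.zeros(8), np.ones(8)], 3)
g = np.convolve(design, toy_hrf(np.arange(0, 24, 2.0)))[:design.size]
rng = np.random.default_rng(1)
y = 1.5 * g + 0.2 + rng.normal(0.0, 0.1, size=g.size)

# Fit y(t) = b1 * g(t) + b0; np.polyfit returns [slope, intercept] for degree 1.
b1, b0 = np.polyfit(g, y, 1)

# Residual error: data minus the model's prediction.
prediction = b1 * g + b0
residual = y - prediction

# Fraction of the time course variance explained by the model.
variance_explained = 1.0 - residual.var() / y.var()
print("variance explained: %.1f%%" % (100 * variance_explained))
```

With a single predictor plus an offset, this quantity equals the squared correlation between g(t) and y(t).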
How does the HRF affect results?
[Figure: three convolution (*) examples]
The General Linear Model: expand the regression model to have more than one predictor
$y(t) = b_0 + \sum_i b_i\, g_i(t) + e(t)$
y(t): time course of voxel
gi(t): i-th factor
bi: coefficient of i-th factor
b0: shift from baseline
e(t): additive noise
General Linear Model: Matrix Notation
$Y = G\,b + e$
Y: time course column vector (n x 1); n: number of time samples
G: matrix of concatenated predictors (n x p); p: number of predictors (number of experimental conditions)
b: vector of factor coefficients (p x 1)
e: additive noise
Least squares solution: $\hat{b} = (G^{T} G)^{-1} G^{T} Y$
As an experimenter, you generate the design matrix, which then gets convolved with the HRF to generate the predictors gi.
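A small sketch of the matrix form, under stated assumptions: two made-up conditions, a toy HRF, simulated data, and a constant column appended to G to carry the offset b0. It solves the normal equations directly and also calls np.linalg.lstsq, which is usually the numerically safer route.

```python
import math
import numpy as np

def toy_hrf(t, n=3, tau=1.5):
    """Illustrative gamma-shaped impulse response (assumed, not the lecture's HRF)."""
    t = np.asarray(t, dtype=float)
    return (t / tau) ** (n - 1) * np.exp(-t / tau) / (tau * math.factorial(n - 1))

n_scans = 96
cond_a = np.zeros(n_scans)           # 1 while condition A is shown
cond_b = np.zeros(n_scans)           # 1 while condition B is shown
for start in (8, 40, 72):
    cond_a[start:start + 8] = 1.0
for start in (24, 56, 88):
    cond_b[start:start + 8] = 1.0

# Convolve each condition with the HRF to get the predictors g1 and g2.
hrf = toy_hrf(np.arange(0, 24, 2.0))
g1 = np.convolve(cond_a, hrf)[:n_scans]
g2 = np.convolve(cond_b, hrf)[:n_scans]

# G: predictors as columns, plus a constant column for the baseline shift b0.
G = np.column_stack([g1, g2, np.ones(n_scans)])

# Simulated voxel Y = G b + e with made-up coefficients and noise.
true_b = np.array([2.0, 1.0, 0.5])
rng = np.random.default_rng(2)
Y = G @ true_b + rng.normal(0.0, 0.05, size=n_scans)

# Least squares solution, two equivalent ways.
b_normal = np.linalg.solve(G.T @ G, G.T @ Y)      # (G'G)^-1 G'Y
b_lstsq = np.linalg.lstsq(G, Y, rcond=None)[0]

print("estimated b:", b_lstsq)
```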
Example GLM with 4 predictors and 2 baselines for run 1 and run 2
[Figure: predictors scaled by the betas]
[Figure: GLM explains 70.5% of the time course variance]
[Figure: GLM explains 46.7% of the time course variance]
Correlated Predictors
Avoid predictors that are correlated with one another.
This is why we NEVER include a baseline predictor: a baseline predictor is almost completely correlated (r = -1) with the sum of the other existing predictors.
If we included a baseline predictor, the model would have problems assigning variance to stimulus predictors vs. baseline predictors. For example, the model could not distinguish between two possibilities (e.g., Beta1=1, Beta2=0 vs. Beta1=0, Beta2=-1).
[Figure: stimulus predictor and baseline predictor]
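The ambiguity can be made concrete with a toy design. For simplicity this sketch uses unconvolved boxcars (so the correlation is exactly -1 rather than almost -1) and assumes the model also carries a constant offset column; both choices are illustrative assumptions.

```python
import numpy as np

# Stimulus predictor: 1 during stimulation blocks, 0 during baseline.
stim = np.tile(np.r_[np.zeros(8), np.ones(8)], 3)
# Baseline predictor: 1 during baseline, 0 during stimulation.
base = 1.0 - stim
const = np.ones_like(stim)

# The two predictors are perfectly anti-correlated.
r = np.corrcoef(stim, base)[0, 1]

# With both predictors and a constant, the design matrix loses a rank:
# stim + base equals the constant column.
G = np.column_stack([stim, base, const])
rank = np.linalg.matrix_rank(G)

# Two different coefficient sets produce the same predicted time course.
# Columns of G: [Beta1 (stimulus), Beta2 (baseline), offset].
pred_a = G @ np.array([1.0, 0.0, 0.0])    # Beta1 = 1, Beta2 = 0
pred_b = G @ np.array([0.0, -1.0, 1.0])   # Beta1 = 0, Beta2 = -1, offset shifted by 1

print("r =", r, "| rank =", rank, "of", G.shape[1], "columns")
print("same prediction:", np.allclose(pred_a, pred_b))
```

Because the predictions are identical, no amount of data can tell the two solutions apart.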
Deconvolution Can you solve the inverse problem? If I measure the summed response of several impulses, can I recover the hemodynamic response to a single event?
Series of identical single events that are closely spaced in time. Can I recover the hemodynamic response of a single event?
[Figure: overlapping single-event responses h1 ... h7 summing to the measured samples y1 ... y8]
Deconvolution: compute the hemodynamic response hi in a time window by solving the GLM at each time point rather than assuming an HRF.
Not if they are presented at a fixed interval.
Yes, if they are displaced (jittered) in time.
Estimating βs using the standard GLM
Deconvolution: estimating βs in a time window without assuming a specific HRF
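A minimal sketch of deconvolution as a GLM, assuming one predictor per time point in the window after each event (often called a finite impulse response design). The jittered onsets, the 8-sample window, the toy response used to simulate data, and the absence of noise are all assumptions chosen to keep the example small.

```python
import math
import numpy as np

def toy_hrf(t, n=3, tau=1.5):
    """Illustrative response used only to simulate data (assumed shape)."""
    t = np.asarray(t, dtype=float)
    return (t / tau) ** (n - 1) * np.exp(-t / tau) / (tau * math.factorial(n - 1))

n_scans = 60
window = 8                                   # number of time points to estimate
# Jittered event onsets in samples (made up); responses overlap in time.
onsets = np.array([0, 3, 8, 10, 15, 21, 24, 30, 33, 39, 44, 47])

# Design matrix: column k is 1 at k samples after every event onset,
# so beta k is the response amplitude k samples after an event.
X = np.zeros((n_scans, window))
for k in range(window):
    X[onsets + k, k] = 1.0

# Simulated measurement: sum of overlapping single-event responses (noise-free).
h_true = toy_hrf(np.arange(window) * 2.0)    # 2 s per sample (assumed)
y = X @ h_true

# Solve the GLM: one beta per time point, with no HRF shape assumed in the fit.
h_est = np.linalg.lstsq(X, y, rcond=None)[0]

print("recovered single-event response:", np.round(h_est, 3))
```

With noisy data the estimates would scatter around the true response, and their precision would depend on how the events are spaced.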
Back to nonlinearities: re-evaluating the GLM when it fails (Birn et al., NeuroImage 2001)
The linear model fails for brief and rapid stimuli:
(1) Responses are nonlinear for durations shorter than 2 s.
(2) The model substantially underestimates responses for brief stimuli.
Problem: the GLM relates stimulus to BOLD, not neural responses to BOLD.
[Figure: block diagram with labels: stimulus, neural response, hemodynamics, MRI scanner, + Gaussian noise, fMRI response, black box]
If the nonlinearities have a neural origin, then incorporating a model of neural responses that accounts for these nonlinearities may predict BOLD responses better than the standard GLM.
We sought to test nonlinearities by measuring brain responses to combinations of sustained and transient visual stimuli and comparing the predictions of the GLM to a new encoding model that takes into account nonlinear neural responses
Predicted V1 responses from the standard GLM (Stigliani, Jeska & Grill-Spector, bioRxiv 2017)
V1 responses to transient stimuli differ from the predictions of the standard GLM (Stigliani, Jeska & Grill-Spector, bioRxiv 2017)
We developed a 2-temporal channel encoding model of neural responses in the visual system and tested whether it predicts BOLD responses better than the standard GLM (Stigliani, Jeska & Grill-Spector, bioRxiv 2017)
The 2-temporal channel model predicts V1 responses to time-varying stimuli better than the standard GLM (Stigliani, Jeska & Grill-Spector, bioRxiv 2017)
The 2-temporal channel model also explains other data, such as the data from the Birn paper
Supplementary Figure S4: The 2 temporal-channel model explains response nonlinearities for briefly presented stimuli. (a) Figure adapted from Birn et al. (y-axis values are unreported in the original version). Top left: measured V1 responses to brief (250–2000 ms) presentations of a checkerboard stimulus that was contrast-inverted at 8 Hz in all trial durations. Top right: predicted V1 responses based on a standard linear model solved using responses to longer presentations of the checkerboard stimulus. Bottom: same data as above, except that the measured and predicted fMRI responses are superimposed for each trial duration. (b) Simulated V1 responses to the stimuli used by Birn et al., derived with the weights solved using models fit to V1 data from Experiments 1 and 2 of the present study. Left: predictions of the 2 temporal-channel model for each trial duration. Right: predictions of the standard model for each trial duration. Bottom: same data as above, except that the predictions of the two models are superimposed for each trial duration. The simulations show that the standard model replicates Birn et al.'s linear model and underestimates responses. In contrast, the 2 temporal-channel model better explains the measured responses (a-left) and predicts higher responses than the standard model in each duration (b-bottom).