Overview of SPM [Figure: SPM processing pipeline. Labels: image time-series, realignment, smoothing (kernel), normalisation (template), general linear model (design matrix), parameter estimates, statistical inference (Gaussian field theory), statistical parametric map (SPM), p < 0.05]
The General Linear Model (GLM). Frederike Petzschner, Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & ETH Zurich. With many thanks for slides & images to: FIL Methods group, Virginia Flanagin and Klaas Enno Stephan. This talk is concerned with the General Linear Model, which is the core of SPM. So far you have been concerned with preprocessing the data, so that all the functional images are aligned and registered to the high-resolution anatomical images, are nicely smoothed, and have their timing adjusted. Now we can use these data to actually ask some questions. That is, we go back to our original experiment.
Imagine a very simple experiment… One session. 7 cycles of rest and listening. Blocks of 6 scans with 7 sec TR. The toy example that we are going to use is one of the first fMRI studies, which explains why the TR is so long. It tested listening to words versus rest, so the experimental manipulation was a block design: blocks of listening versus blocks of rest. [Figure: block design over time]
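The block design on this slide can be written down as a simple 0/1 time course. This is a minimal sketch using only the numbers on the slide (7 cycles, blocks of 6 scans, TR of 7 s); starting each cycle with rest and the names `boxcar`, `x1` and `session_seconds` are assumptions made for illustration.

```python
# Block design from the slide: 7 cycles of rest and listening,
# 6 scans per block, TR = 7 s.
# Assumption: each cycle starts with the rest block.
N_CYCLES = 7
SCANS_PER_BLOCK = 6
TR = 7.0  # seconds


def boxcar(n_cycles=N_CYCLES, block_len=SCANS_PER_BLOCK):
    """Return a 0/1 list: 0 during rest scans, 1 during listening scans."""
    one_cycle = [0.0] * block_len + [1.0] * block_len
    return one_cycle * n_cycles


x1 = boxcar()
n_scans = len(x1)               # 7 cycles * 2 blocks * 6 scans
session_seconds = n_scans * TR  # total scanning time in seconds
```

Under these assumptions the session has 84 scans, half of them acquired during listening.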
Imagine a very simple experiment… [Figure: "What we measure" (single voxel time series over time) and "What we know" (over time)] Now we have acquired a series of images over the course of this manipulation. In fact we have not acquired images but small volumes, so three-dimensional data, plus a fourth dimension, which is time. In the GLM we do a voxel-wise analysis, that is, we look at the time course of one particular voxel. For instance, if we take a voxel in the rear part of the brain, we could measure a time series like this over the course of our experimental manipulation. What we are interested in is whether there is a significant change in the BOLD response due to our experimental manipulation, that is, between listening and rest. For the particular voxel we are looking at, this is, very roughly speaking, not the case. Question: Is there a change in the BOLD response between listening and rest?
Imagine a very simple experiment… [Figure: "What we measure" (single voxel time series over time) and "What we know" (over time)] But if we look at another voxel, e.g. in the auditory cortex, we might actually get something like this. This looks much more like a change driven by listening and rest. Question: Is there a change in the BOLD response between listening and rest?
You need a model of your data… [Figure: linear model, effects estimate, error estimate, statistic] In order to determine whether there is a significant difference, we need to model the data based on our knowledge about them and see whether there is a change that is caused by our experimental manipulation. This model will then give us an effects estimate (beta values) and an estimate of the error, with which we can perform all kinds of analyses. Question: Is there a change in the BOLD response between listening and rest?
Explain your data… as a combination of experimental manipulation, confounds and errors. Single voxel regression model: $y = \beta_1 x_1 + \beta_2 x_2 + e$, where $y$ is the BOLD signal over time, $x_1$ and $x_2$ are regressors, and $e$ is the error. The error is the part that is not explained; the regressors carry the informative part. This is easier in matrix form.
Explain your data… as a combination of experimental manipulation, confounds and errors. Single voxel regression model in matrix form: $y = X\beta + e$, with $X = [x_1 \; x_2]$ and $\beta = (\beta_1, \beta_2)^T$.
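The black and white version in SPM: $y = X\beta + e$, with data $y$ ($n \times 1$), design matrix $X$ ($n \times p$), parameters $\beta$ ($p \times 1$) and error $e$ ($n \times 1$). n: number of scans, p: number of regressors.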
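To make the dimensions concrete, here is a minimal sketch that builds a design matrix for the listening experiment and generates one voxel time series from the model $y = X\beta + e$. The second regressor being a constant, the beta values, the noise level and the function names are all assumptions chosen for illustration, not values from the study.

```python
import random


def make_design_matrix(n_cycles=7, block_len=6):
    """Return X as a list of n rows, each [x1, x2] (so p = 2 regressors)."""
    # x1: rest = 0, listening = 1 (assumed to start with rest)
    x1 = ([0.0] * block_len + [1.0] * block_len) * n_cycles
    # x2: constant term modelling the mean signal (an assumption)
    return [[v, 1.0] for v in x1]


def simulate_voxel(X, beta, sigma, seed=0):
    """Generate y = X beta + e with i.i.d. Gaussian error e ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return [sum(x * b for x, b in zip(row, beta)) + rng.gauss(0.0, sigma)
            for row in X]


X = make_design_matrix()
n, p = len(X), len(X[0])  # n: number of scans, p: number of regressors
y = simulate_voxel(X, beta=[2.0, 100.0], sigma=1.0)
```

Setting `sigma=0.0` returns the noise-free prediction $X\beta$, which is handy for checking that a fitting routine recovers the betas.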
Model assumptions. Design matrix: The design matrix embodies all available knowledge about experimentally controlled factors and potential confounds. (Talk: Experimental Design, Wed 9:45 – 10:45.) Error: You want to estimate your parameters such that you minimize $\sum_{t=1}^{n} e_t^2$. This can be done using ordinary least squares (OLS) estimation, assuming an i.i.d. error: $e \sim N(0, \sigma^2 I)$. Defining the design matrix is not enough: you also need an assumption about your error term. We assume that the noise is independent and identically distributed (independent of all other noise and identical across the time series). The design matrix should also account for potential confounds. If the error term becomes too big, this harms you in terms of statistics.
The GLM assumes independent and identically distributed errors. i.i.d. = error covariance is a scalar multiple of the identity matrix: $\mathrm{Cov}(e) = \sigma^2 I$. [Figure: examples of non-identity and non-independence, axes t1 and t2]
How to fit the model and estimate the parameters? $y = X\beta + e$. "Option 1": by hand.
How to fit the model and estimate the parameters? $y = X\beta + e$. OLS (Ordinary Least Squares): $\hat{y} = X\hat{\beta}$ is the data predicted by our model, and $e = y - \hat{y}$ is the error between predicted and actual data. The goal is to determine the betas such that we minimize the quadratic error $e^T e$.
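The quadratic error can be evaluated for any candidate set of betas, which is all that fitting "by hand" amounts to. The sketch below uses noise-free toy data with assumed betas (2 and 100) and a coarse, hypothetical grid of candidates; it illustrates the criterion, not a practical fitting method.

```python
def sse(X, y, beta):
    """Sum of squared errors e'e for a candidate parameter vector beta."""
    total = 0.0
    for row, yt in zip(X, y):
        predicted = sum(x * b for x, b in zip(row, beta))
        total += (yt - predicted) ** 2
    return total


# Toy data: boxcar regressor plus constant, noise-free for clarity (assumed values).
x1 = ([0.0] * 6 + [1.0] * 6) * 7
X = [[v, 1.0] for v in x1]
y = [2.0 * v + 100.0 for v in x1]

# "By hand": try a coarse grid of candidate betas and keep the best one.
candidates = [(b1 / 2.0, float(b2))
              for b1 in range(0, 9)        # beta_1 in 0.0, 0.5, ..., 4.0
              for b2 in range(95, 106)]    # beta_2 in 95, 96, ..., 105
best = min(candidates, key=lambda b: sse(X, y, b))
```

A grid only finds the minimum if the true values happen to lie on it, and it scales badly with the number of regressors, which is why a closed-form solution is preferable.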
OLS (Ordinary Least Squares) The goal is to minimize the quadratic error between data and model:
$e^T e = (y - X\beta)^T (y - X\beta) = y^T y - y^T X\beta - \beta^T X^T y + \beta^T X^T X\beta$
The term $y^T X\beta$ is a scalar, and the transpose of a scalar is a scalar, so $y^T X\beta = \beta^T X^T y$ and
$e^T e = y^T y - 2\beta^T X^T y + \beta^T X^T X\beta$
You find the extremum of a function by taking its derivative and setting it to zero:
$\partial (e^T e) / \partial \beta = -2 X^T y + 2 X^T X\beta = 0$
SOLUTION: OLS estimate of the parameters:
$\hat{\beta} = (X^T X)^{-1} X^T y$
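The OLS solution $\hat{\beta} = (X^T X)^{-1} X^T y$ can be sketched in a few lines. To stay self-contained, the example uses only the Python standard library and solves the normal equations $X^T X \beta = X^T y$ by Gaussian elimination rather than forming the inverse; the helper names and the toy data (boxcar plus constant, betas 2 and 100, unit noise) are assumptions for illustration.

```python
import random


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    z = [0.0] * n
    for k in reversed(range(n)):  # back substitution
        tail = sum(M[k][c] * z[c] for c in range(k + 1, n))
        z[k] = (M[k][n] - tail) / M[k][k]
    return z


def ols(X, y):
    """Return beta_hat solving the normal equations X'X beta = X'y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Toy voxel: boxcar regressor plus constant, assumed betas and noise level.
rng = random.Random(1)
x1 = ([0.0] * 6 + [1.0] * 6) * 7
X = [[v, 1.0] for v in x1]
y = [2.0 * v + 100.0 + rng.gauss(0.0, 1.0) for v in x1]

beta_hat = ols(X, y)
residuals = [yt - sum(x * b for x, b in zip(row, beta_hat))
             for row, yt in zip(X, y)]
```

In practice one would use a numerical library's least-squares routine; the hand-written solver is here only so that each step of the formula stays visible.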
A geometric perspective on the GLM. [Figure: data vector y, regressors x1 and x2 spanning the design space defined by X, residual e, OLS estimates] You can see the regressors as two vectors (7 cycles, 8-dimensional space). The only signal that can be explained is a linear combination of the two regressors, so only the gray plane: the design space defined by X.
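The geometric picture can be checked numerically: the fitted data $\hat{y}$ lie in the plane spanned by the regressors, and the residual $e$ is orthogonal to that plane. This is a minimal sketch with two regressors and a closed-form 2 x 2 solution; the toy data and the function names `dot` and `ols2` are assumptions for illustration.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ols2(x1, x2, y):
    """OLS for two regressors via the closed-form inverse of the 2x2 matrix X'X."""
    a, b, d = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    u, v = dot(x1, y), dot(x2, y)
    det = a * d - b * b
    return (d * u - b * v) / det, (a * v - b * u) / det


rng = random.Random(2)
x1 = ([0.0] * 6 + [1.0] * 6) * 7          # boxcar regressor
x2 = [1.0] * len(x1)                      # constant regressor (assumed)
y = [2.0 * a + 100.0 + rng.gauss(0.0, 1.0) for a in x1]

b1, b2 = ols2(x1, x2, y)
y_hat = [b1 * a + b2 * c for a, c in zip(x1, x2)]   # projection onto the design space
e = [yt - yh for yt, yh in zip(y, y_hat)]           # residual, outside the plane
```

Because $e$ is orthogonal to the plane, the squared lengths add up as in Pythagoras' theorem: $\|y\|^2 = \|\hat{y}\|^2 + \|e\|^2$.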
Correlated and orthogonal regressors. [Figure: y, design space defined by X, x1, x2 and the orthogonalized x2*] The same geometric picture applies: the data lie outside the plane, and we want to estimate the projection of our data onto the plane. The problem is that the two regressors are correlated. Correlated regressors = explained variance is shared between regressors. You can orthogonalize the vectors to get more statistical power, but not both parameter estimates change: just one will increase. When x2 is orthogonalized with regard to x1, only the parameter estimate for x1 changes, not that for x2!
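The statement about orthogonalization can be verified on simulated data. The sketch below fits a model with two correlated regressors, orthogonalizes x2 with respect to x1, and refits; the regressors, betas and noise level are made-up values, and `ols2` and `orthogonalize` are hypothetical helper names.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ols2(x1, x2, y):
    """OLS for two regressors via the closed-form inverse of the 2x2 matrix X'X."""
    a, b, d = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    u, v = dot(x1, y), dot(x2, y)
    det = a * d - b * b
    return (d * u - b * v) / det, (a * v - b * u) / det


def orthogonalize(x2, x1):
    """Remove from x2 its projection onto x1; also return the projection weight c."""
    c = dot(x1, x2) / dot(x1, x1)
    return [b - c * a for a, b in zip(x1, x2)], c


rng = random.Random(0)
n = 200
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [0.6 * a + rng.gauss(0.0, 1.0) for a in x1]   # correlated with x1
y = [1.0 * a + 2.0 * b + rng.gauss(0.0, 0.5) for a, b in zip(x1, x2)]

b1, b2 = ols2(x1, x2, y)                  # original model
x2_star, c = orthogonalize(x2, x1)
b1_star, b2_star = ols2(x1, x2_star, y)   # model with orthogonalized x2
```

Since $x_2 = x_2^* + c\,x_1$, the fit is unchanged, the estimate for x2 stays the same, and the estimate for x1 becomes $\hat{\beta}_1 + c\,\hat{\beta}_2$: the variance that was shared between the regressors is now attributed to x1.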
We are nearly there… [Figure: linear model $y = X\beta + e$, effects estimate, error estimate, statistic] Not going to talk about the statistics.
What are the problems? [Figure labels: design, error] 1. BOLD responses have a delayed and dispersed form. 2. The BOLD signal includes substantial amounts of low-frequency noise. 3. The data are serially correlated (temporally autocorrelated); this violates the assumptions of the noise model in the GLM. What are the problems with the model? The BOLD signal needs up to 5 seconds to reach its peak and about 20 seconds to decay. There is additional signal from the scanner: scanner drift, e.g. from heating up because of the movement of the coils. Subjects are alive, and the data are acquired at a slow pace compared to their physiological activity, which comes on top of the BOLD signal. Now the error term is correlated.
Problem 1: Shape of BOLD response. The response of a linear time-invariant (LTI) system is the convolution of the input with the system's response to an impulse (delta function). If you convolve the input function with the impulse response function, you get the expected BOLD response.
Solution: Convolution model of the BOLD response. expected BOLD response = input function $\otimes$ impulse response function (HRF). [Figure: HRF] Now the predicted response is slightly delayed and also not completely flat. Legend: blue = data; green = predicted response, convolved with the HRF; red = predicted response, NOT taking into account the HRF.
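The convolution model can be sketched as follows. The HRF here is a simple difference of two gamma densities (shapes 6 and 16, undershoot ratio 1/6), which is an assumed stand-in for a canonical HRF and not necessarily the exact function SPM uses; convolving on a 1 s grid and then sampling once per TR is likewise a simplification, and all names are hypothetical.

```python
import math


def hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """Difference of two gamma densities (assumed parameters), t in seconds."""
    if t <= 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    under = t ** (under_shape - 1) * math.exp(-t) / math.gamma(under_shape)
    return peak - ratio * under


def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i in range(len(signal)):
        for j, k in enumerate(kernel):
            if i - j < 0:
                break
            out[i] += signal[i - j] * k
    return out


TR, DT = 7, 1                                        # seconds
boxcar_scans = ([0.0] * 6 + [1.0] * 6) * 7           # input function, one value per scan
stimulus = [v for v in boxcar_scans for _ in range(TR // DT)]   # 1 s resolution
kernel = [hrf(k * DT) * DT for k in range(33)]       # HRF sampled over 0..32 s
predicted_fine = convolve(stimulus, kernel)
predicted = predicted_fine[::TR // DT]               # expected BOLD response per scan
```

With these assumed parameters the kernel peaks at about 5 s. The predicted response is still zero at the first listening scan and stays elevated at the first scan of the following rest block, which is the delay and dispersion the slide refers to.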
Problem 2: Low frequency noise. [Figure: linear model] You have signals at low frequencies that you are not interested in. There is a built-in tool that allows you to capture these very slow oscillations. The part of the signal we are not interested in is modelled in the design. Legend: blue = data; black = mean + low-frequency drift; green = predicted response, taking into account low-frequency drift; red = predicted response, NOT taking into account low-frequency drift.
Solution 2: High-pass filtering with a discrete cosine transform (DCT) set. If you have already used SPM, you know that you have to decide on the cut-off. By default it is 128 seconds (1/128 Hz); the gray area gets modelled out. If you design something that …
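A DCT set for high-pass filtering can be sketched as a handful of slow cosine regressors. The rule used below for how many cosines fall under the cut-off and the scaling of the basis functions follow the standard DCT-II convention and are assumptions; the function name is hypothetical, and SPM's own implementation may differ in detail.

```python
import math


def dct_highpass_set(n_scans, tr, cutoff=128.0):
    """Cosine regressors with periods longer than `cutoff` seconds (constant excluded)."""
    # Number of basis functions up to the cut-off, counting the constant term.
    n_basis = int(2.0 * n_scans * tr / cutoff + 1.0)
    columns = []
    for k in range(1, n_basis):  # k = 0 would be the constant; skip it
        col = [math.sqrt(2.0 / n_scans)
               * math.cos(math.pi * (2 * t + 1) * k / (2 * n_scans))
               for t in range(n_scans)]
        columns.append(col)
    return columns


# Listening experiment: 84 scans, TR = 7 s, cut-off 128 s.
drift_regressors = dct_highpass_set(n_scans=84, tr=7.0, cutoff=128.0)
n_drift = len(drift_regressors)
```

For this session the sketch yields 9 drift regressors. Note that one rest-plus-listening cycle lasts 12 scans x 7 s = 84 s, which is shorter than the 128 s cut-off, so the experimental effect itself is not modelled out.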
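Problem 3: Serial correlations. [Figure: n × n error covariance matrices for i.i.d. errors, non-independence and non-identity, axes t1 and t2] If the error is i.i.d., the covariance matrix has entries only on the diagonal. Solution: use an autocorrelated model (AR(1) plus white noise). n: number of scans.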
Problem 3: Serial correlations. 1st order autoregressive process, AR(1): $e_t = a\,e_{t-1} + \varepsilon_t$ with $\varepsilon_t \sim N(0, \sigma^2)$. [Figure: autocovariance function; n × n error covariance matrix, n: number of scans]
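An AR(1) process and its correlation structure can be simulated in a few lines. This is a sketch under the assumption of a stationary process with $|a| < 1$; the coefficient 0.4, the series length and the function names are arbitrary choices for illustration.

```python
import random


def simulate_ar1(n, a, sigma=1.0, seed=0):
    """Simulate e_t = a * e_{t-1} + eps_t with eps_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    # Draw the first sample from the stationary distribution of the process.
    e = [rng.gauss(0.0, sigma / (1.0 - a * a) ** 0.5)]
    for _ in range(n - 1):
        e.append(a * e[-1] + rng.gauss(0.0, sigma))
    return e


def autocorr(e, lag):
    """Empirical autocorrelation of the series e at the given lag."""
    m = sum(e) / len(e)
    num = sum((e[t] - m) * (e[t + lag] - m) for t in range(len(e) - lag))
    den = sum((v - m) ** 2 for v in e)
    return num / den


def ar1_correlation_matrix(n, a):
    """n x n matrix with entries a^|i-j|: ones on the diagonal, decaying off it."""
    return [[a ** abs(i - j) for j in range(n)] for i in range(n)]


a = 0.4
e = simulate_ar1(20000, a)
V = ar1_correlation_matrix(4, 0.5)
```

For a stationary AR(1) process the autocovariance at lag $k$ is $\sigma^2 a^{|k|} / (1 - a^2)$, so the correlation falls off geometrically with the distance between scans; the off-diagonal entries of `V` are what distinguish it from the i.i.d. case.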
Problem 3: Serial correlations. Pre-whitening: 1. Use an enhanced noise model with multiple error covariance components, i.e. $e \sim N(0, \sigma^2 V)$ instead of $e \sim N(0, \sigma^2 I)$. 2. Use the estimated serial correlation to specify a filter matrix $W$ for whitening the data: $Wy = WX\beta + We$. This ($We$) is i.i.d.
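How do we define W? Enhanced noise model: $e \sim N(0, \sigma^2 V)$. Remember the linear transform for Gaussians: if $x \sim N(\mu, \sigma^2)$ and $y = ax$, then $y \sim N(a\mu, a^2\sigma^2)$. Choose W such that the error covariance becomes spherical: $We \sim N(0, \sigma^2 W V W^T)$, so we need $W V W^T = I$, which holds for $W = V^{-1/2}$. Conclusion: W is a simple function of V, so how do we estimate V? Something that sometimes worries users: the design matrix changes after the estimation, because you are then looking at the weighted design matrix $WX$.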
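Whitening can be illustrated for the special case where V is a pure AR(1) correlation matrix with known coefficient. Instead of the symmetric inverse square root $V^{-1/2}$, this sketch swaps in a closed-form lower-triangular W that also satisfies $W V W^T = I$; the coefficient, the tiny design matrix and all function names are assumptions, and in practice V has to be estimated from the data.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def ar1_correlation_matrix(n, a):
    """V with entries a^|i-j| (assumed known here, not estimated)."""
    return [[a ** abs(i - j) for j in range(n)] for i in range(n)]


def ar1_whitening_matrix(n, a):
    """Lower-triangular W with W V W' = I for the AR(1) correlation matrix V."""
    s = (1.0 - a * a) ** 0.5
    W = [[0.0] * n for _ in range(n)]
    W[0][0] = 1.0                    # first sample already has unit variance
    for t in range(1, n):
        W[t][t - 1] = -a / s         # subtract the predictable part a * e_{t-1}
        W[t][t] = 1.0 / s            # rescale the innovation to unit variance
    return W


n, a = 6, 0.4
V = ar1_correlation_matrix(n, a)
W = ar1_whitening_matrix(n, a)
WVWt = matmul(matmul(W, V), transpose(W))   # should be the identity matrix

# The same W is applied to the design matrix (and to the data).
x1 = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
X = [[v, 1.0] for v in x1]
WX = matmul(W, X)                           # weighted design matrix
```

Any W with $W V W^T = I$ makes the transformed error spherical; the weighted design matrix `WX` no longer looks like the original boxcar, which is the change users notice after estimation.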
Find V: Multiple covariance components. Enhanced noise model: $e \sim N(0, \sigma^2 V)$ with $V = \lambda_1 Q_1 + \lambda_2 Q_2$, where the $Q_i$ are error covariance components and the $\lambda_i$ are hyperparameters. The variance is modelled by a linear combination of covariance matrices. Hyperparameters: parameters on the covariance. Estimation of the hyperparameters with EM (expectation maximisation) or ReML (restricted maximum likelihood).
We are there… [Figure: linear model $y = X\beta + e$, effects estimate, error estimate, statistic] The GLM includes all known experimental effects and confounds; convolution with a canonical HRF; high-pass filtering to account for low-frequency drifts; estimation of multiple variance components (e.g. to account for serial correlations). Not going to talk about the statistics.
We are there… [Figure: linear model $y = X\beta + e$, effects estimate, error estimate, statistic, null hypothesis] Not going to talk about the statistics here. Talk: Statistical Inference and design efficiency (next talk).
We are there… Mass-univariate approach: GLM applied to > 100,000 voxels. [Figure: single voxel time series over time] With a threshold of p < 0.05, more than 5000 voxels are significant by chance (100,000 × 0.05 = 5000)! Massive problem with multiple comparisons! Solution: Gaussian random field theory. Not going to talk about the statistics.
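The size of the multiple comparisons problem can be illustrated with a small simulation. Under the null hypothesis a p-value is uniformly distributed between 0 and 1, so the sketch simply draws one uniform number per voxel; treating the voxels as independent is an idealisation made only for this example (real voxels are spatially correlated, which is what random field theory accounts for).

```python
import random

N_VOXELS = 100_000
ALPHA = 0.05

# Under the null hypothesis (no effect anywhere) each voxel's p-value is
# uniform on [0, 1]. Independence between voxels is assumed for simplicity.
rng = random.Random(0)
p_values = [rng.random() for _ in range(N_VOXELS)]

false_positives = sum(p < ALPHA for p in p_values)   # "significant" by chance
expected = N_VOXELS * ALPHA                          # expected count under the null
```

Even though there is no effect in any of the simulated voxels, roughly 5000 of them fall below the uncorrected threshold.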
Outlook: further challenges
correction for multiple comparisons (Talk: Multiple Comparisons, Wed 8:30 – 9:30)
variability in the HRF across voxels (Talk: Experimental Design, Wed 9:45 – 10:45)
limitations of frequentist statistics (Talk: entire Friday)
GLM ignores interactions among voxels (Talk: Multivariate Analysis, Thu 12:30 – 13:30)
Thank you!
Read me:
Friston, Ashburner, Kiebel, Nichols, Penny (2007) Statistical Parametric Mapping: The Analysis of Functional Brain Images. Elsevier.
Christensen R (1996) Plane Answers to Complex Questions: The Theory of Linear Models. Springer.
Friston KJ et al. (1995) Statistical parametric maps in functional imaging: a general linear approach. Human Brain Mapping 2: 189-210.
Notes: talk title on the last slide; derivative with respect to beta; possibly the entire derivation; check animations; possibly include the HGF derivation.