Computational Biology VI

Name: Computational Biology VI
Uploaded: 2017-08-09T23:05:54+00:00
Duration: PTM30S18
Channel: Myles Williamson
Description: Computational Biology VI

Computational Biology VI
Jianfeng Feng Warwick University

Our module: Data Processing Pipeline
[Figure: pipeline diagram. Labels: Experimental Design; Raw Data Acquisition; Reconstruction; Preprocessing (Slice-time Correction; Motion Correction, Co-registration & Normalization; Spatial Smoothing); Data Analysis (Localizing Brain Activity; Connectivity; Linear Model; Granger causality); Applications: Clinical.]

Outline
- GLM
- Model building
- Noise models
- Inference
- Multiple-comparison correction (after next week)

1: The General Linear Model

Statistical Analysis There are multiple goals in the statistical analysis or machine learning side of fMRI data. They include: localizing brain areas activated by the task; determining networks corresponding to brain function; and making predictions about psychological or disease states.

Human Brain Mapping The most common use of fMRI to date has been to localize areas of the brain that activate in response to a certain task. These types of human brain mapping studies are necessary for the development of biomarkers and increasing our understanding of brain function.

Massive Univariate Approach
Typically, analysis is performed by constructing a separate model at each voxel: the 'massive univariate approach'. This assumes an improbable independence between voxel pairs. Dependencies between voxels are typically dealt with later using random field theory, which makes assumptions about the spatial dependencies between voxels.

General Linear Model
The general linear model (GLM) approach treats the data as a linear combination of model functions (predictors) plus noise (error):

$$y(t) = b_0 + b_1 f_1(t) + \dots + b_p f_p(t) + e(t)$$

The model functions $f_i(t)$ are assumed to have known shapes, but their amplitudes $b_i$ are unknown and need to be estimated. The GLM framework encompasses many of the commonly used techniques in fMRI data analysis (and big data analysis more generally).

Illustration Consider an experiment of alternating blocks of finger-tapping and rest. Construct a model to study data from a single voxel for a single subject. We seek to determine whether activation is higher during finger-tapping compared with rest.

[Figure: fMRI data (BOLD signal) = design matrix X × model parameters ($\beta_0$ for the intercept, $\beta_1$ for the predicted task response) + residuals.]

In simple formula, with $b_0, b_1$ playing the roles of $\beta_0, \beta_1$:

$$y(t) = b_0 + b_1 f_1(t) + e(t)$$

Null hypothesis: $H_0: \beta_1 = 0$

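GLM
A standard GLM can be written:

$$y(t) = b_0 + b_1 f_1(t) + \dots + b_p f_p(t) + e(t)$$

or, in matrix form,

$$Y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, V)$$

where

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_T \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & \cdots & X_{1p} \\ 1 & X_{21} & \cdots & X_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & X_{T1} & \cdots & X_{Tp} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{bmatrix}$$

Here $Y$ is the fMRI data, $X$ the design matrix, $\beta$ the regression coefficients, and $\varepsilon$ the noise. V is the covariance matrix whose format depends on the noise model. The quality of the model depends on our choice of X and V.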

In matrix format (lecture 4, time series)
For k dimensional case where the recorded data (Y(1), Y(2), Y(3), …, Y(T)), denoting

Problem Formulation
Assume the model:

$$Y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, I\sigma^2)$$

The matrices X and Y are assumed to be known, and the noise is considered to be uncorrelated. Our goal is to find the value of $\beta$ that minimizes (with $y$ denoting the data vector $Y$):

$$(y - X\beta)^T (y - X\beta)$$

OLS Solution
Ordinary least squares solution:

$$\hat\beta = (X^T X)^{-1} X^T y$$

Properties:
- Maximum likelihood estimate
- $E(\hat\beta) = \beta$
- $\mathrm{Var}(\hat\beta) = \sigma^2 (X^T X)^{-1}$

Can we do better?
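As a minimal sketch of the OLS formulas, the finger-tapping illustration can be simulated for a single voxel. The block lengths, the true values b0 = 100 and b1 = 2, and the noise level are assumptions made up for this example, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical block design: 10 scans rest, 10 scans tapping, repeated 4 times.
T = 80
f1 = np.tile(np.r_[np.zeros(10), np.ones(10)], 4)   # predictor f1(t)
X = np.column_stack([np.ones(T), f1])               # columns: intercept, task

# Simulated single-voxel data with assumed true values b0 = 100, b1 = 2.
beta_true = np.array([100.0, 2.0])
sigma = 1.0                                          # assumed known noise s.d.
y = X @ beta_true + sigma * rng.standard_normal(T)

# OLS estimate: solve the normal equations (X^T X) beta = X^T y.
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)

# Var(beta_hat) = sigma^2 (X^T X)^{-1} under the i.i.d. noise assumption.
cov_beta = sigma**2 * np.linalg.inv(XtX)

print("beta_hat:", beta_hat)
print("standard errors:", np.sqrt(np.diag(cov_beta)))
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse for the estimate itself; the inverse is only computed where the covariance formula asks for it.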

Gauss Markov Theorem
The Gauss-Markov Theorem states that any other linear unbiased estimator of $\beta$ will have a variance at least as large as that of the OLS solution. Assume $\tilde\beta$ is a linear unbiased estimator of $\beta$. Then, according to the G-M Theorem,

$$\mathrm{Var}(\tilde\beta) \ge \mathrm{Var}(\hat\beta)$$

so $\hat\beta$ is the best linear unbiased estimator (BLUE) of $\beta$.

Estimation
If $\varepsilon$ is i.i.d., then the ordinary least squares (OLS) estimate is optimal:

model: $Y = X\beta + \varepsilon$
estimate: $\hat\beta = (X^T X)^{-1} X^T Y$

If $\mathrm{Var}(\varepsilon) = V\sigma^2 \ne I\sigma^2$, then the generalized least squares (GLS) estimate is optimal:

model: $Y = X\beta + \varepsilon$
estimate: $\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} Y$
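A short worked step, added here as a sketch, shows why the GLS estimate is just OLS applied to whitened data. It assumes V is symmetric positive definite, so that a factor $K$ with $V = K K^T$ exists; $K$ is a symbol introduced only for this illustration and does not appear in the slides.

```latex
\begin{aligned}
V &= K K^T \\
K^{-1} Y &= K^{-1} X \beta + K^{-1}\varepsilon,
\qquad \mathrm{Var}(K^{-1}\varepsilon) = \sigma^2 K^{-1} V K^{-T} = \sigma^2 I \\
\hat\beta &= \big((K^{-1}X)^T (K^{-1}X)\big)^{-1} (K^{-1}X)^T K^{-1} Y
          = (X^T V^{-1} X)^{-1} X^T V^{-1} Y
\end{aligned}
```

The last equality uses $K^{-T} K^{-1} = (K K^T)^{-1} = V^{-1}$.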

Summary
model: $Y = X\beta + \varepsilon$
estimate: $\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} Y$
fitted values: $\hat Y = X\hat\beta$
residuals: $r = Y - \hat Y = (I - X (X^T V^{-1} X)^{-1} X^T V^{-1})\, Y = RY$

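Estimating the Variance
Even if we assume $\varepsilon$ is i.i.d., we still need to estimate the residual variance, $\sigma^2$. For OLS:

$$\hat\sigma^2 = \frac{r^T r}{N - p}$$

where $N$ is the number of observations and $p$ the number of estimated parameters. Estimating $V \ne I$ is more difficult and uses iterative methods.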

Model Refinement This model has a number of shortcomings.
We want to use our understanding of the signal and noise properties of BOLD fMRI to aid us in constructing appropriate models. This includes deciding on an appropriate design matrix, as well as an appropriate noise model.

Issues
1. BOLD responses have a delayed and dispersed form.
2. The fMRI signal includes substantial amounts of low-frequency noise.
3. The data are serially correlated, which needs to be considered in the model.

MATLAB sketch of a delayed, dispersed response (issue 1):

    clear all; close all;
    for i = 1:400
        h = 0.1;                          % time step
        x(i) = i*h;
        y(i) = x(i)^4 * exp(-x(i));       % gamma-shaped response
    end
    for i = 41:350
        m(i) = y(i-40);                   % curve delayed by 40 samples
        y(i) = y(i) - m(i)*0.2;           % subtract a scaled delayed copy
    end                                   % (this 'end' was missing in the slide)
    plot(x(1:300), y(1:300)/max(y))       % scale by the maximum value

2: Model Building

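General Linear Model
A standard GLM can be written:

$$Y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, V)$$

where

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_T \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & \cdots & X_{1p} \\ 1 & X_{21} & \cdots & X_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & X_{T1} & \cdots & X_{Tp} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{bmatrix}$$

Here $Y$ is the fMRI data, $X$ the design matrix, $\beta$ the model parameters, and $\varepsilon$ the noise. V is the covariance matrix whose format depends on the noise model. The quality of the model depends on our choice of X and V.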

Model Building
Proper construction of the design matrix is critical for effective use of the GLM. This process can be complicated by the following properties of the BOLD response:
- It includes low-frequency noise and artifacts related to head movement and cardiopulmonary-induced brain movement.
- The neural response shape may not be known.
- The hemodynamic response varies in shape across the brain.

BOLD Response Predict the shape of the BOLD response to a given stimulus pattern. Assume the shape is known and the amplitude is unknown. The relationship between stimuli and the BOLD response is typically modeled using a linear time invariant (LTI) system. In an LTI system an impulse (i.e., neuronal activity) is convolved with an impulse response function (i.e., HRF).

Hemodynamic Response Function
[Figure: convolution examples for a block design. Panels: experimental stimulus function, hemodynamic response function, predicted response.]
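The LTI prediction can be sketched numerically by convolving a block-design stimulus function with an HRF. The HRF below is a difference of two gamma densities; its shape parameters (6 and 16), the 1/6 undershoot ratio, the 1 s sampling interval, and the block lengths are all assumptions chosen for illustration, not values given in the lecture.

```python
import math
import numpy as np

def hrf(t, a1=6.0, a2=16.0, ratio=1.0 / 6.0):
    """Difference of two unit-scale gamma densities (parameter values assumed)."""
    t = np.asarray(t, dtype=float)
    g1 = t ** (a1 - 1) * np.exp(-t) / math.gamma(a1)   # main response
    g2 = t ** (a2 - 1) * np.exp(-t) / math.gamma(a2)   # late undershoot
    return g1 - ratio * g2

dt = 1.0                              # assumed sampling interval in seconds
t_hrf = np.arange(0, 32, dt)          # HRF support: 0 to 31 s
h = hrf(t_hrf)

# Block-design stimulus function s(t): 20 s off, 20 s on, repeated 3 times.
s = np.tile(np.r_[np.zeros(20), np.ones(20)], 3)

# Predicted response x(t) = (s * h)(t), truncated to the length of s.
x = np.convolve(s, h)[: len(s)] * dt

print("HRF peaks at t =", t_hrf[np.argmax(h)], "s")
print("predicted response at first stimulus sample:", x[20])
```

The predicted response is still zero at the first stimulus sample and only rises afterwards, which is the delayed and dispersed form the model is meant to capture.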

HRF Models
Often a fixed canonical HRF is used to model the response to neuronal activity:
- Linear combination of 2 gamma functions.
- Optimal if correct.
- If wrong, leads to bias and power loss.
- Unlikely that the same HRF is valid for all voxels.
- True response may be faster/slower.
- True response may have smaller/bigger undershoot.

[Figure: BOLD signal = $\beta_0$ × intercept + $\beta_1$ × predicted task response, with $H_0: \beta_1 = 0$. Example panels for a single HRF: model, image of predictors, data and fitted response.]

Example
In mathematical terms, we have

$$y(t) = b_0 + b_1 f_1(t) + e(t)$$

or $Y = X\beta + \varepsilon$.

Problems
The HRF shape depends both on the vasculature and the time course of neural activity. Assuming a fixed HRF is usually not appropriate.
[Figure: responses for checkerboard, thermal pain, aversive picture, and aversive anticipation stimuli, with the stimulus-on period marked.]

Temporal Basis Functions
To allow for different types of HRFs in different brain regions, it is typically better to use temporal basis functions. A linear combination of functions can be used to account for delays and dispersions in the HRF. The stimulus function is convolved with each of the basis functions to give a set of regressors. The parameter estimates give the coefficients that determine the combination of basis functions that best models the HRF for the trial type and voxel in question.

In an LTI system the BOLD response is modeled as

$$x(t) = (s * h)(t)$$

where $s(t)$ is a stimulus function, $h(t)$ is the HRF, and $*$ denotes convolution. Model the HRF as a linear combination of temporal basis functions $f_i(t)$, such that

$$h(t) = \sum_i \beta_i f_i(t)$$

[Figure: with three basis functions, $h(t) = \beta_1 f_1(t) + \beta_2 f_2(t) + \beta_3 f_3(t)$.]

The BOLD response can then be rewritten:

$$x(t) = \sum_i \beta_i (s * f_i)(t)$$

In the GLM framework the convolution of the stimulus function with each basis function makes up a separate column of the design matrix. Each corresponding $\beta_i$ describes the weight of that component.

Typically used models vary in the degree to which they make a priori assumptions about the shape of the response. In the most extreme case, the shape of the HRF is fixed and only the amplitude is allowed to vary. By contrast, a finite impulse response (FIR) basis set contains one free parameter for every time point following stimulation for every cognitive event type.

Finite Impulse Response
The model estimates an HRF of arbitrary shape for each event type in each voxel of the brain
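A minimal sketch of an FIR design matrix for one event type follows: each column is an indicator for one post-stimulus time point, so the fitted coefficients trace out the response shape directly. The onsets, the number of lags, the "true" response `h_true`, and the noise level are hypothetical values chosen for this example.

```python
import numpy as np

def fir_design(onsets, n_scans, n_lags):
    """One indicator column per post-stimulus time point (lag)."""
    X = np.zeros((n_scans, n_lags))
    for onset in onsets:
        for lag in range(n_lags):
            if onset + lag < n_scans:
                X[onset + lag, lag] = 1.0
    return X

onsets = [5, 30, 55]                      # hypothetical event onsets, in scans
X_fir = fir_design(onsets, n_scans=80, n_lags=10)

# Simulate a voxel whose response has an arbitrary (assumed) shape.
h_true = np.array([0.0, 0.2, 0.7, 1.0, 0.8, 0.4, 0.1, -0.1, -0.1, 0.0])
rng = np.random.default_rng(4)
y = X_fir @ h_true + 0.1 * rng.standard_normal(80)

# OLS fit: one free parameter per lag, no shape assumed in advance.
h_hat, *_ = np.linalg.lstsq(X_fir, y, rcond=None)
print(np.round(h_hat, 2))
```

With only three events, each lag is estimated from three observations, which illustrates the price of this flexibility: many free parameters and correspondingly noisier estimates.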

[Figure: basis sets. Panels (model, image of predictors, data and fitted response) for a single HRF, HRF + derivatives, and finite impulse response (FIR); x-axis: time (s). The FIR panel is annotated "Overfitting?".]

Nuisance Covariates
Often model factors associated with known sources of variability, but not related to the experimental hypothesis, need to be included in the GLM. Examples of possible 'nuisance regressors':
- Signal drift
- Physiological (e.g., respiration) artifacts
- Head motion, e.g. six regressors comprising three translations and three rotations. Sometimes transformations of the six regressors are also included.

Drift
Slow changes in voxel intensity over time (low-frequency noise) are present in the fMRI signal. Scanner instabilities, rather than motion or physiological noise, may be the main cause of the drift, as drift has been seen in cadavers. We need to include drift parameters in our models, e.g. using splines, a polynomial basis, or a discrete cosine basis.

Drift
[Figure: the blue curve is the data; the red curve is what we expected.]

Model with Drift
$$Y = X\beta + \varepsilon$$
Select the drift regressors $f_i$ as $\cos(i \omega t)$ and $\sin(i \omega t)$, for example (as in MP3, only cos is used).
[Figure: Y = Xβ + ε with the drift columns of X shown as images.]
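As a sketch of a cosine drift basis, the code below builds a few low-frequency cosine columns and appends them to a block-design matrix. The specific DCT-II form of the cosines, the number of components, and the simulated linear drift are assumptions for illustration; the slides only say that cosines of increasing frequency are used.

```python
import numpy as np

def dct_basis(n_scans, n_components):
    """Columns cos(pi * k * (2t + 1) / (2N)) for k = 1..n_components."""
    t = np.arange(n_scans)
    k = np.arange(1, n_components + 1)
    return np.cos(np.pi * np.outer(2 * t + 1, k) / (2 * n_scans))

N = 100
task = np.tile(np.r_[np.zeros(10), np.ones(10)], 5)   # hypothetical block design
D = dct_basis(N, 3)                                   # three drift regressors
X_no_drift = np.column_stack([np.ones(N), task])
X_drift = np.column_stack([X_no_drift, D])

# Simulated voxel: task effect plus a slow linear drift plus noise (all assumed).
rng = np.random.default_rng(5)
y = 50 + 2 * task + 0.05 * np.arange(N) + rng.standard_normal(N)

def rss(X, y):
    """Residual sum of squares of the OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

print("RSS without drift terms:", rss(X_no_drift, y))
print("RSS with drift terms:   ", rss(X_drift, y))
```

The cosine columns are mutually orthogonal and orthogonal to the intercept, so they absorb slow trends without changing the mean level.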

High Pass Filtering
[Figure legend: blue = data; black = mean + low-frequency drift; green = predicted response, taking into account low-frequency drift; red = predicted response (with low-frequency drift explained away).]

Physiological Noise
Respiration and heart beat give rise to high-frequency noise. This type of noise is difficult to remove and is often left in the data, giving rise to temporal autocorrelations.

3: Noise Models

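GLM
A standard GLM can be written:

$$Y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, V)$$

where

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & \cdots & X_{1p} \\ 1 & X_{21} & \cdots & X_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

Here $Y$ is the fMRI data, $X$ the design matrix, $\beta$ the regression coefficients, and $\varepsilon$ the noise. V is the covariance matrix whose format depends on the noise model. The quality of the model depends on our choice of X and V.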

Design Matrix
We have previously discussed various signal and nuisance components that can be included in the design matrix to improve the model:
- Temporal basis functions: allow for a flexible HRF
- Motion parameters: correct for 'spin history' artifacts

fMRI Noise
Functional MRI data typically exhibit significant autocorrelation, caused by physiological noise and low-frequency drift that have not been appropriately modeled. The autocorrelation is typically modeled using an AR(p) process. Single-subject statistics are not valid without an accurate model of the noise.

AR(1) model
Serial correlation can be modeled using a first-order autoregressive model, i.e.

$$\varepsilon_t = \phi\, \varepsilon_{t-1} + u_t, \qquad u_t \sim N(0, \sigma^2)$$

The error term $\varepsilon_t$ depends on the previous error term $\varepsilon_{t-1}$ and a new disturbance term $u_t$.

AR(1) model
The autocorrelation function (ACF) for an AR(1) process:

$$\rho(h) = \begin{cases} 1, & h = 0 \\ \phi^{|h|}, & h \ne 0 \end{cases}$$

[Figure: ACF for $\phi = 0.7$, lags 1 to 16, vertical axis from -1.0 to 1.0.]

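Error Term
The format of V will depend on what noise model is used.

IID case:

$$V \propto \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$

AR(1) case:

$$V \propto \begin{bmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{n-1} \\ \phi & 1 & \phi & \cdots & \phi^{n-2} \\ \phi^2 & \phi & 1 & \cdots & \phi^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi^{n-1} & \phi^{n-2} & \phi^{n-3} & \cdots & 1 \end{bmatrix}$$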
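The AR(1) form of V can be built directly from the rule that entry (i, j) equals $\phi^{|i-j|}$. This is a small sketch; the function name `ar1_corr` and the values n = 5 and $\phi$ = 0.7 are chosen here for illustration.

```python
import numpy as np

def ar1_corr(n, phi):
    """n x n matrix with entries phi**|i - j| (the AR(1) form of V)."""
    idx = np.arange(n)
    return phi ** np.abs(idx[:, None] - idx[None, :])

V_ar1 = ar1_corr(5, 0.7)
V_iid = ar1_corr(5, 0.0)      # phi = 0 reduces to the identity (IID case)

print(np.round(V_ar1, 3))
```

Each row of `V_ar1` is the ACF $\rho(h) = \phi^{|h|}$ shifted so that lag 0 sits on the diagonal.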

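GLM Summary
model: $Y = X\beta + \varepsilon$
estimate: $\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} Y$
fitted values: $\hat Y = X\hat\beta$
residuals: $r = Y - \hat Y$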

Estimating V
In general the form of the covariance matrix is unknown, which means it has to be estimated. Estimating V depends on the $\beta$'s, and estimating the $\beta$'s depends on V, so an iterative procedure is needed.

Iterative Procedure
1. Assume that V = I and calculate the OLS solution.
2. Estimate the parameters of V using the residuals.
3. Re-estimate the $\beta$ values using the estimated covariance matrix $\hat V$ from step 2.
4. Iterate until convergence.
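The four steps can be sketched for the special case of AR(1) noise. This is one possible implementation under stated assumptions, not the procedure used by any particular fMRI package: $\phi$ is estimated as the lag-1 autocorrelation of the residuals, and the simulated design, true coefficients, and $\phi$ = 0.5 are made up for the example.

```python
import numpy as np

def ar1_corr(n, phi):
    """Correlation matrix with entries phi**|i - j| (the AR(1) form of V)."""
    idx = np.arange(n)
    return phi ** np.abs(idx[:, None] - idx[None, :])

def fit_glm_ar1(X, y, max_iter=20, tol=1e-6):
    """Alternate between GLS for beta and a lag-1 estimate of phi."""
    n = len(y)
    phi = 0.0                                    # step 1: V = I, so the first pass is OLS
    for _ in range(max_iter):
        V_inv = np.linalg.inv(ar1_corr(n, phi))
        beta = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)  # step 3 (OLS when phi = 0)
        r = y - X @ beta
        phi_new = (r[1:] @ r[:-1]) / (r @ r)     # step 2: lag-1 autocorrelation of residuals
        converged = abs(phi_new - phi) < tol     # step 4: check convergence
        phi = phi_new
        if converged:
            break
    return beta, phi

# Simulated data: block design plus AR(1) noise with assumed phi = 0.5.
rng = np.random.default_rng(1)
n = 300
task = np.tile(np.r_[np.zeros(10), np.ones(10)], 15)
X = np.column_stack([np.ones(n), task])
u = rng.standard_normal(n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.5 * eps[t - 1] + u[t]
y = X @ np.array([100.0, 2.0]) + eps

beta_hat, phi_hat = fit_glm_ar1(X, y)
print("beta_hat:", beta_hat, "phi_hat:", phi_hat)
```

Inverting the full n × n matrix on every pass keeps the sketch close to the formulas; for long time series a whitening filter would be the more economical choice.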

Spatio-temporal Behavior
The spatiotemporal behavior of these noise processes is complex. Spatial maps of the model parameters from an AR(2) model estimated for each voxel’s noise data.

4: Inference

Inference
After fitting the GLM, use the estimated parameters to determine whether there is significant activation present in the voxel. Inference is based on the fact that:

$$\hat\beta \sim N\left(\beta, (X^T V^{-1} X)^{-1}\right)$$

Use a t or F test to perform tests on effects of interest.

Contrasts
It is often of interest to see whether a linear combination of the parameters is significant. The term $c^T\beta$ specifies a linear combination of the estimated parameters, i.e.

$$c^T\beta = c_1\beta_1 + c_2\beta_2 + \dots + c_n\beta_n$$

Here c is called a contrast vector.

Example
Experiment with two types of stimuli.
[Figure: data = design matrix × ($\beta_1$, $\beta_2$, $\beta_3$) + noise.]
To test $H_0: \beta_2 = \beta_3$, use the contrast vector $c^T = (0, 1, -1)$, so that the null hypothesis becomes $H_0: c^T\beta = 0$.

T-test
To test

$$H_0: c^T\beta = 0 \quad \text{against} \quad H_a: c^T\beta \ne 0$$

use the t-statistic:

$$T = \frac{c^T\hat\beta}{\sqrt{\mathrm{Var}(c^T\hat\beta)}}$$

Under $H_0$, T is approximately $t(\nu)$ with

$$\nu = \frac{(\mathrm{tr}(RV))^2}{\mathrm{tr}((RV)^2)}$$
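A minimal sketch of the contrast t-test for the two-stimulus example, under the simplifying assumption V = I. In that case R is a projection matrix and the degrees-of-freedom formula reduces to the number of scans minus the number of columns of X. The design, true coefficients, and noise level are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 120
# Hypothetical design: two stimulus types presented in non-overlapping blocks.
s1 = np.tile(np.r_[np.ones(10), np.zeros(30)], 3)
s2 = np.tile(np.r_[np.zeros(20), np.ones(10), np.zeros(10)], 3)
X = np.column_stack([np.ones(N), s1, s2])
beta_true = np.array([10.0, 3.0, 1.0])        # assumed, so the two stimulus effects differ
y = X @ beta_true + rng.standard_normal(N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
R = np.eye(N) - X @ XtX_inv @ X.T             # residual-forming matrix for V = I
r = R @ y

nu = np.trace(R) ** 2 / np.trace(R @ R)       # tr(RV)^2 / tr((RV)^2) with V = I
sigma2_hat = r @ r / np.trace(R)

c = np.array([0.0, 1.0, -1.0])                # H0: the two stimulus effects are equal
var_c = sigma2_hat * (c @ XtX_inv @ c)        # Var(c^T beta_hat)
T_stat = (c @ beta_hat) / np.sqrt(var_c)

print("T =", T_stat, "with nu =", nu)
```

Note that the code indexes the coefficients from 0, so `c` picks out the second and third entries, matching $\beta_2$ and $\beta_3$ in the slides.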

Multiple Contrasts
We often want to make simultaneous tests of several contrasts at once. c is now a contrast matrix. Suppose C = [matrix lost in extraction]; then $c^T\beta = (\beta_1, \beta_2)^T$.

Example
Consider a model with box-car shaped activation and drift modeled using the discrete cosine basis.
[Figure: Y = Xβ + ε, with the columns of X shown as images.]

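Example
Do the drift components add anything to the model? Test:

$$H_0: c^T\beta = 0$$

where C is the contrast matrix [not recoverable from the transcript]. This is equivalent to testing:

$$H_0: \beta_3 = \beta_4 = \beta_5 = \beta_6 = \beta_7 = \beta_8 = \beta_9 = 0$$

To understand what this implies, we split the design matrix into two parts:

$$X = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{19} \\ 1 & X_{21} & X_{22} & \cdots & X_{29} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n9} \end{bmatrix} = \begin{bmatrix} X_0 & X_1 \end{bmatrix}$$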

Example Do the drift components add anything to the model?
The X1 matrix explains the drift. Does it contribute in a significant way to the model? Compare the results using the full model, with design matrix X, with those obtained using a reduced model, with design matrix X0.

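F-test
Test the hypothesis using the F-statistic:

$$F = \frac{r_0^T r_0 - r^T r}{\hat\sigma^2\, \mathrm{tr}((R_0 - R)V)}$$

Assuming the errors are normally distributed, F has an approximate F-distribution with $(\nu_0, \nu)$ degrees of freedom, where

$$\nu_0 = \frac{\left(\mathrm{tr}((R_0 - R)V)\right)^2}{\mathrm{tr}\left(((R_0 - R)V)^2\right)} \quad \text{and} \quad \nu = \frac{(\mathrm{tr}(RV))^2}{\mathrm{tr}((RV)^2)}$$

Here $r_0$ and $R_0$ are the residuals and residual-forming matrix of the reduced model (design matrix $X_0$), and $r$ and $R$ are those of the full model (design matrix $X$).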
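The full-versus-reduced comparison can be sketched under the simplifying assumption V = I, where the numerator degrees of freedom reduce to the number of drift columns. The design, the three cosine drift columns, and the simulated coefficients are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100
t = np.arange(N)
task = np.tile(np.r_[np.zeros(10), np.ones(10)], 5)       # hypothetical box-car
# Three cosine drift columns (an assumed DCT-II form).
drift = np.cos(np.pi * np.outer(2 * t + 1, np.arange(1, 4)) / (2 * N))

X0 = np.column_stack([np.ones(N), task])   # reduced model: intercept + task
X = np.column_stack([X0, drift])           # full model [X0 X1]
# Simulated voxel with an assumed slow drift on the first cosine column.
y = X @ np.array([50.0, 2.0, 4.0, 0.0, 0.0]) + rng.standard_normal(N)

def residual_matrix(A):
    """Residual-forming matrix I - A A^+ for V = I."""
    return np.eye(len(A)) - A @ np.linalg.pinv(A)

R0, R = residual_matrix(X0), residual_matrix(X)
r0, r = R0 @ y, R @ y

nu = np.trace(R)                           # denominator df when V = I
nu0 = np.trace(R0 - R)                     # numerator df when V = I
sigma2_hat = r @ r / nu
F = (r0 @ r0 - r @ r) / (sigma2_hat * nu0)

print("F =", F, "with df =", (nu0, nu))
```

A large F here says the drift columns explain variance that the reduced model leaves in the residuals.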

Statistical Images For each voxel a hypothesis test is performed. The statistic corresponding to that test is used to create a statistical image over all voxels. T-value

Localizing Activation
1. Construct a model for each voxel of the brain (the "massive univariate approach"). Regression models (GLM) are commonly used: $Y = X\beta + \varepsilon$.

2. Perform a statistical test to determine whether task-related activation is present in the voxel: $H_0: c^T\beta = 0$. Statistical image: map of t-tests across all voxels (a.k.a. t-map).

3. Choose an appropriate threshold for determining statistical significance. Statistical parametric map: (SPM) Each significant voxel is color-coded according to the size of its p-value.

Statistical Images How do we determine which voxels are actually active? Problems: The statistics are obtained by performing a large number of hypothesis tests. Many of the test statistics will be artificially inflated due to the noise. This leads to many false positives.

Multiple Comparisons
Which of 100,000 voxels are significant? With $\alpha = 0.05$, about 5,000 false positive voxels are expected. Choosing a threshold is a balance between sensitivity (true positive rate) and specificity (true negative rate).
[Figure: statistical maps thresholded at t > 1, t > 2, t > 3, t > 4, t > 5.]
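The 5,000 figure is simply the number of tests times $\alpha$. The sketch below reproduces that arithmetic and checks it with a simulation in which no voxel is truly active, so every p-value is uniform on [0, 1]; the all-null setting and the random seed are assumptions of this example.

```python
import random

n_voxels = 100_000
alpha = 0.05

# Expected number of false positives if every voxel is null.
expected_false_positives = n_voxels * alpha

# Simulate null p-values and count how many fall below alpha.
random.seed(0)
p_values = [random.random() for _ in range(n_voxels)]
observed_false_positives = sum(p < alpha for p in p_values)

print("expected:", expected_false_positives)
print("observed in one simulated null brain:", observed_false_positives)
```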
