Download presentation
Presentation is loading. Please wait.
1
Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department of Statistics, LMU Spatial Bayesian Variable Selection with Application to fMRI
2
Econometrics & Business Statistics @ The University of Sydney Key References Smith and Fahrmeir (2004) ‘Spatial Bayesian variable selection with application to fMRI’ (under review) Smith, Putz, Auer and Fahrmeir, (2003), Neuroimage Smith and Smith (2005), ‘Estimation of binary Markov Random Fields using Markov chain Monte Carlo’, to appear in JCGS Kohn, Smith and Chan (2001), ‘Nonparametric regression using linear combinations of basis functions’, Statistics and Computing, 11, 313-322
3
Econometrics & Business Statistics @ The University of Sydney Bayesian Variable Selection in Regression Bayesian variable selection is now widely used in statistics Assume there are i=1,..,N separate regressions: y i = X i β i + e i each located at sites on a lattice Each with n i observations and the same p independent variables For each regression, introduce a vector of indicator variables γ i with elements γ ij = 0 iff β ij = 0 γ ij = 1 iff β ij ≠ 0 where j=1,…,p
4
Econometrics & Business Statistics @ The University of Sydney Proper Prior for Coefficients Each regression can be restated as y i = X i (γ i )β i (γ i ) + e i where β i (γ i ) are the non-zero elements of the β i vector. A proper prior is employed for the non-zero elements of the β vector with This results in a prior (g-prior)
5
Econometrics & Business Statistics @ The University of Sydney Joint Model Posterior The posterior of interest is where (see Smith and Kohn ‘96) Binary MRFs priors are placed on p(γ (j) ) to spatially smooth the indicators for each regressor j
6
Econometrics & Business Statistics @ The University of Sydney The Ising Prior One of the most popular binary MRFs is due to Ernst Ising Let γ = ( γ 1,…, γ N )’ be a vector of binary indicators over a lattice Then the pdf of γ is: Here: –α i are the external field coefficients –i~j indicates site i neighbors site j –θ is the smoothing parameter –ω ij are the weights (taken here as reciprocal of distance between sites i,j)
7
Econometrics & Business Statistics @ The University of Sydney The Ising Prior Other binary MRF priors could be used For example, latent variables with Gaussian MRFs (see Smith and Smith ’05 for a comparison) However, the Ising model has three strong advantages in the fMRI application: –Through the external field, anatomical prior information can be incorporated –Single-site sampling is much faster than with the latent GMRF based priors –The edge-preservation properties are strong
8
Econometrics & Business Statistics @ The University of Sydney The Ising Prior However, the joint distribution p(γ,θ)=p(γ|θ)p(θ) can be tricky…. When α=0 p(γ i =1 | θ) = p(γ i =1) = ½, so that θ is a smoothing parameter only However, when α≠0 (which proves important in the fMRI application) then p(γ i =1 | θ) ≠ p(γ i =1), so that θ is both a smoothing parameter and (along with fixed variables α) determines the marginal prior probability
9
Econometrics & Business Statistics @ The University of Sydney Posterior Distribution The posterior distribution (conditional on θ) can be computed in closed form, with However, in general this is not a recognisable density, so that inference has to be undertaken via simulation Possible to employ a MH step using a binary MRF approximation as a proposal (see Nott & Green) However, single-site sampling proves very hard to beat!
10
Econometrics & Business Statistics @ The University of Sydney Sampling Scheme Can use the Gibbs sampler, generating directly from p(γ ij |γ \ij,θ,y) Alternatively, can employ a MH step at each site based on the conditional prior p(γ ij |γ \ij,θ) Even under uniform priors can prove significantly faster (see Kohn et al. ‘01) This is because it avoids the need for evaluation of the likelihood when there is no switch (that is, 0→0 or 1→1) In the case where informative spatial binary MRF priors are used, the MH step can prove even faster because the proposal is a better approximation to the posterior
11
Econometrics & Business Statistics @ The University of Sydney Co-estimation with Smoothing Parameters Assume a uniform hyperprior The smoothing parameters can be generated one-at-a-time from Where C j (α j,θ j ) is the normalising constant for the Ising density It is pre-computed using a B-spline numerical approximation, with points evaluated using a Monte Carlo method (see Green and Richardson, ‘02) This proves extremely accurate The element θ j is then generated using a RW MH
12
Econometrics & Business Statistics @ The University of Sydney Monte Carlo Estimates The following Monte Carlo mixture estimates are of interest Note that if primary interest is in the first estimate, then evaluation of the conditional posteriors is undertaken analytically at each step of a Gibbs sampler In this case, this negates the speed improvements of the MH step suggested in Kohn et al. ’01 If interest is in the second estimate, then the faster sampler is preferable
13
Econometrics & Business Statistics @ The University of Sydney Introduction to brain mapping using fMRI data fMRI is a powerful tool for the non-invasive assessment of the functionality of neuronal networks The crux of this analysis is the distinction between active and inactive voxels from a functional time series This data is typically massive and involves a time series of observations at many voxels in the brain The data is usually highly noisy, and the issue facing statisticians is the balancing of false-positive with false-negative classifications This issue is addressed by exploiting the known high degree of spatial correlation in activation profiles If exploited effectively, this greatly enhances the activation maps and reduces both f-p and f-n classifications
14
Econometrics & Business Statistics @ The University of Sydney Example of MR time series (strongly active, weakly active, inactive voxels)
15
Econometrics & Business Statistics @ The University of Sydney Regression Modeling of fMRI data Define the following variables: –y it : the magnetic resonance time series at voxel i and time t –a it : baseline trend at voxel i and time t –z it : transformed stimulus at voxel i and time t –β i : activation amplitude Then the regression model below is a popular model in the literature y it = a it + z it β i + e it The baseline trend is due to background influence on the patient’s neuronal activity during the experiment A simple model is a parametric expansion a it = w t ’α i, where we use a quadratic polynomial and 4 low order Fourier terms The errors are modelled as iid N(0,σ i 2 )
16
Econometrics & Business Statistics @ The University of Sydney Anatomically Informed Activation Prior The prior for the activation indicators (only) can be informed by all sorts of information Here, we only consider the fact that activation can only occur in areas of grey matter Let g=(g 1,…,g N ) denote whether, or not, each voxel is grey matter We assume p(g)=p(g 1 )…p(g N ), where p(g i ) are known Then, if p(γ i =1 | g i ) = a (0.1 in our empirical experiment), p(γ i =1) = p(g i ) a = c i When θ=0 we can equate this with the marginal prior from the Ising density to obtain values for the external field to get p(γ i =1) = exp(α i )/exp(α i +1) = c i
17
Econometrics & Business Statistics @ The University of Sydney Two-Step Procedure However, the drawback of using non-zero external field coefficients in the Ising prior for the activation effect is that θ is no longer simply a smoothing parameter Therefore, we use a two-step procedure: Step (1) Set α=0, estimate and obtain a point estimate θ * =E(θ|y) Step (2) Refit with an anatomically informed Ising prior, but conditional on fixed level of smoothing θ * This overcomes the complex inter-relationship between θ & the marginal probability of activation with non-zero α
18
Econometrics & Business Statistics @ The University of Sydney Posterior Analysis using 3D Neighborhood We fit the data using a 3D neighborhood (9+9+8 neighbors) For simplicity here we do not undertake variable selection on the parametric trend terms, just the activation variable Three priors are employed (see table 1) The activation and amplitude maps obtained using prior (i) are in given in fig 2 The activation maps using prior (ii) for 8 contiguous slices are given in fig 4- they reveal the important of using anatomical prior information The activation mpas using prior (iii) for the same 8 slices are given in fig 6- they reveal the importance of spatial smoothing
19
Econometrics & Business Statistics @ The University of Sydney Three Ising Priors Prior (i) corresponds to step 1 in the two stage procedure Prior (ii) corresponds to step 2 in the two stage procedure Prior (iii) is for comparison with prior (ii) to demonstrate the impact of spatial smoothing
20
Econometrics & Business Statistics @ The University of Sydney
21
Econometrics & Business Statistics @ The University of Sydney
22
Econometrics & Business Statistics @ The University of Sydney
23
Econometrics & Business Statistics @ The University of Sydney Some comparisons An obvious alternative is to simply smooth spatially the activation amplitudes (β i, i=1,…,N) and then classify Fig. 5 shows the amplitude map using this approach What happens when variable selection is also undertaken on trend terms? Therefore, using only a 2D neighborhood structure two estimates were obtained: one with BVS on all terms, and the other with BVS on only one term. The resulting activation maps are similar and are in fig 8.
24
Econometrics & Business Statistics @ The University of Sydney
25
Econometrics & Business Statistics @ The University of Sydney
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.