Spatial Bayesian Variable Selection with Application to fMRI
Michael Smith, Econometrics & Business Statistics, The University of Sydney
Ludwig Fahrmeir, Department of Statistics, LMU

Key References
– Smith and Fahrmeir (2004), 'Spatial Bayesian variable selection with application to fMRI' (under review).
– Smith, Putz, Auer and Fahrmeir (2003), Neuroimage.
– Smith and Smith (2005), 'Estimation of binary Markov Random Fields using Markov chain Monte Carlo', to appear in JCGS.
– Kohn, Smith and Chan (2001), 'Nonparametric regression using linear combinations of basis functions', Statistics and Computing, 11.

Bayesian Variable Selection in Regression
– Bayesian variable selection is now widely used in statistics.
– Assume there are i = 1,…,N separate regressions, y_i = X_i β_i + e_i, each located at a site on a lattice.
– Each has n_i observations and the same p independent variables.
– For each regression, introduce a vector of indicator variables γ_i with elements γ_ij = 0 iff β_ij = 0 and γ_ij = 1 iff β_ij ≠ 0, for j = 1,…,p.

Proper Prior for Coefficients
– Each regression can be restated as y_i = X_i(γ_i) β_i(γ_i) + e_i, where β_i(γ_i) are the non-zero elements of the β_i vector.
– A proper prior is employed for the non-zero elements of the coefficient vector, resulting in a g-prior.

Joint Model Posterior
– The posterior of interest is p(γ | y), where the marginal likelihood p(y | γ) is available in closed form under the g-prior (see Smith and Kohn '96).
– Binary MRF priors are placed on p(γ_(j)) to spatially smooth the indicators for each regressor j.
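The closed-form marginal likelihood can be sketched in code. This is a minimal illustration in the Smith–Kohn style, assuming the g-prior β(γ) | σ² ~ N(0, c σ² (X(γ)'X(γ))⁻¹) with c = 100 and the improper prior p(σ²) ∝ 1/σ²; the function name and default are illustrative, not the paper's code.

```python
import numpy as np

def log_marginal_likelihood(y, X, gamma, c=100.0):
    """Log p(y | gamma) up to an additive constant, under a g-prior with
    scale c and the improper prior p(sigma^2) ~ 1/sigma^2."""
    n = len(y)
    q = int(gamma.sum())
    if q == 0:
        return -0.5 * n * np.log(y @ y)
    Xg = X[:, gamma.astype(bool)]
    # shrunken regression sum of squares: (c/(1+c)) * y' Xg (Xg'Xg)^{-1} Xg' y
    b = np.linalg.solve(Xg.T @ Xg, Xg.T @ y)
    ssr = (c / (1.0 + c)) * (y @ Xg @ b)
    return -0.5 * q * np.log(1.0 + c) - 0.5 * n * np.log(y @ y - ssr)
```

Models are compared by evaluating this quantity at different indicator vectors γ; the (1 + c)^(-q/2) factor penalises larger models.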

The Ising Prior
– One of the most popular binary MRFs is due to Ernst Ising.
– Let γ = (γ_1,…,γ_N)' be a vector of binary indicators over a lattice.
– The pdf of γ is p(γ) = C(α, θ)⁻¹ exp( Σ_i α_i γ_i + θ Σ_{i~j} ω_ij I(γ_i = γ_j) ), where:
  – α_i are the external field coefficients;
  – i~j indicates that site i neighbours site j;
  – θ is the smoothing parameter;
  – ω_ij are the weights (taken here as the reciprocal of the distance between sites i and j).
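As a sketch, the unnormalised Ising log-density can be evaluated on a regular 2D lattice with first-order (4-nearest) neighbours and unit weights ω_ij = 1; the agreement-indicator form and the function below are illustrative assumptions.

```python
import numpy as np

def ising_log_prior(gamma, alpha, theta):
    """Unnormalised log p(gamma | theta) on a 2D lattice with first-order
    neighbours and unit weights. gamma, alpha: 2D arrays of equal shape,
    with gamma entries in {0, 1}."""
    field = float(np.sum(alpha * gamma))
    # count neighbouring pairs with equal labels (horizontal + vertical)
    agree = np.sum(gamma[:, :-1] == gamma[:, 1:]) + np.sum(gamma[:-1, :] == gamma[1:, :])
    return field + theta * float(agree)
```

Larger θ rewards configurations whose neighbouring indicators agree, which is exactly the spatial smoothing role described above.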

The Ising Prior
– Other binary MRF priors could be used; for example, latent variables with Gaussian MRFs (see Smith and Smith '05 for a comparison).
– However, the Ising model has three strong advantages in the fMRI application:
  – Through the external field, anatomical prior information can be incorporated.
  – Single-site sampling is much faster than with the latent GMRF-based priors.
  – The edge-preservation properties are strong.

The Ising Prior
– However, the joint distribution p(γ, θ) = p(γ | θ) p(θ) can be tricky.
– When α = 0, p(γ_i = 1 | θ) = p(γ_i = 1) = ½, so θ is a smoothing parameter only.
– However, when α ≠ 0 (which proves important in the fMRI application), p(γ_i = 1 | θ) ≠ p(γ_i = 1), so θ is both a smoothing parameter and (along with the fixed external field α) determines the marginal prior probability.
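The dependence of the marginal prior probability on θ can be checked by brute-force enumeration on a tiny lattice. This is an illustrative sketch, not a method from the talk: it enumerates all 2^N configurations of an Ising density with a constant external field α and unit neighbour weights.

```python
import itertools
import numpy as np

def marginal_prob(alpha, theta, shape=(2, 2)):
    """Exact p(gamma_1 = 1 | theta) on a tiny lattice, by enumerating all
    configurations of exp(alpha * sum(g) + theta * #{equal neighbour pairs})."""
    n = shape[0] * shape[1]
    num = den = 0.0
    for bits in itertools.product([0, 1], repeat=n):
        g = np.array(bits).reshape(shape)
        agree = np.sum(g[:, :-1] == g[:, 1:]) + np.sum(g[:-1, :] == g[1:, :])
        w = np.exp(alpha * g.sum() + theta * agree)
        den += w
        num += w * g[0, 0]
    return num / den
```

With α = 0 the 0↔1 relabelling symmetry forces the marginal to ½ for every θ; with α ≠ 0 the marginal moves as θ changes, which is the complication the two-step procedure later addresses.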

Posterior Distribution
– The posterior distribution (conditional on θ) can be computed in closed form.
– However, in general this is not a recognisable density, so inference has to be undertaken via simulation.
– It is possible to employ an MH step using a binary MRF approximation as a proposal (see Nott and Green).
– However, single-site sampling proves very hard to beat!

Sampling Scheme
– One can use the Gibbs sampler, generating directly from p(γ_ij | γ_\ij, θ, y).
– Alternatively, one can employ an MH step at each site based on the conditional prior p(γ_ij | γ_\ij, θ).
– Even under uniform priors this can prove significantly faster (see Kohn et al. '01), because it avoids evaluating the likelihood when there is no switch (that is, 0→0 or 1→1).
– When informative spatial binary MRF priors are used, the MH step can prove even faster, because the proposal is a better approximation to the posterior.
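A minimal sketch of the single-site MH step with a conditional-prior proposal, in the spirit of Kohn et al. '01: because the proposal is the conditional prior, the prior terms cancel in the acceptance ratio, leaving only the likelihood ratio, and no likelihood evaluation is needed when the proposal does not switch the indicator. The interface (`log_ml`, `prior_p1`) is hypothetical.

```python
import numpy as np

def mh_site_update(j, gamma, log_ml, prior_p1, rng):
    """One MH update of gamma[j], proposing from the conditional prior
    p(gamma_j = 1 | rest) = prior_p1. log_ml(gamma) returns log p(y | gamma)
    and is only evaluated when the proposal switches the indicator."""
    proposal = 1 if rng.random() < prior_p1 else 0
    if proposal == gamma[j]:
        return gamma          # no switch: accepted w.p. 1, no likelihood call
    cand = gamma.copy()
    cand[j] = proposal
    # prior terms cancel with the proposal, leaving the likelihood ratio
    log_accept = log_ml(cand) - log_ml(gamma)
    if np.log(rng.random()) < log_accept:
        return cand
    return gamma
```

Sweeping j over all sites gives one iteration of the sampler; the savings come from how often `proposal == gamma[j]` under an informative prior.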

Co-estimation with Smoothing Parameters
– Assume a uniform hyperprior.
– The smoothing parameters can be generated one at a time from their conditional posteriors, which involve C_j(α_j, θ_j), the normalising constant of the Ising density.
– C_j is pre-computed using a B-spline numerical approximation, with points evaluated using a Monte Carlo method (see Green and Richardson '02); this proves extremely accurate.
– The element θ_j is then generated using a random-walk MH step.
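A toy sketch of the θ update: here the log normalising constant is tabulated exactly on a tiny lattice (with α = 0) and interpolated linearly, standing in for the B-spline/Monte Carlo approximation described on the slide; the grid range [0, 3] for the uniform hyperprior support is an assumption.

```python
import itertools
import numpy as np

def log_C(theta, shape=(2, 2)):
    """Exact log normalising constant of the Ising density (alpha = 0) on a
    tiny lattice; stands in for the B-spline / Monte Carlo approximation."""
    n = shape[0] * shape[1]
    total = 0.0
    for bits in itertools.product([0, 1], repeat=n):
        g = np.array(bits).reshape(shape)
        agree = np.sum(g[:, :-1] == g[:, 1:]) + np.sum(g[:-1, :] == g[1:, :])
        total += np.exp(theta * agree)
    return np.log(total)

# pre-computed grid of log C(theta), interpolated linearly at sampling time
grid = np.linspace(0.0, 3.0, 31)
logC_grid = np.array([log_C(v) for v in grid])

def rw_mh_theta(theta, gamma, step, rng):
    """One random-walk MH update of theta given the current indicator field
    gamma, using the interpolated log C; uniform hyperprior on [0, 3]."""
    agree = np.sum(gamma[:, :-1] == gamma[:, 1:]) + np.sum(gamma[:-1, :] == gamma[1:, :])
    cand = theta + step * rng.standard_normal()
    if not (0.0 <= cand <= 3.0):
        return theta                      # outside the uniform prior support
    def logp(t):
        return t * agree - np.interp(t, grid, logC_grid)
    if np.log(rng.random()) < logp(cand) - logp(theta):
        return cand
    return theta
```

Pre-computing the constant once is what makes the per-iteration θ update cheap, since only an interpolation is needed inside the chain.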

Monte Carlo Estimates
– Two Monte Carlo mixture estimates are of interest.
– If primary interest is in the first estimate, evaluation of the conditional posteriors is undertaken analytically at each step of a Gibbs sampler; this negates the speed improvements of the MH step suggested in Kohn et al. '01.
– If interest is in the second estimate, the faster sampler is preferable.
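The mixture estimates themselves are simple averages over MCMC iterates; a sketch, with assumed array shapes (M iterates by N sites) and hypothetical function names:

```python
import numpy as np

def activation_prob(draws):
    """Monte Carlo estimate of p(gamma_i = 1 | y) as the average of sampled
    indicators; draws has shape (M, N), one row per MCMC iterate."""
    return draws.mean(axis=0)

def amplitude_estimate(draws, betas):
    """Mixture estimate of E[beta_i | y], averaging per-iterate coefficient
    values (zero whenever gamma_i = 0 in that iterate)."""
    return (draws * betas).mean(axis=0)
```

Thresholding `activation_prob` at ½ gives the activation map, while `amplitude_estimate` gives the amplitude map.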

Introduction to brain mapping using fMRI data
– fMRI is a powerful tool for the non-invasive assessment of the functionality of neuronal networks.
– The crux of the analysis is distinguishing active from inactive voxels using a functional time series.
– The data are typically massive, comprising a time series of observations at many voxels in the brain.
– The data are also highly noisy, and the issue facing statisticians is balancing false-positive against false-negative classifications.
– This issue is addressed by exploiting the known high degree of spatial correlation in activation profiles; exploited effectively, this greatly enhances the activation maps and reduces both false-positive and false-negative classifications.

Example of MR time series (strongly active, weakly active, inactive voxels)

Regression Modeling of fMRI data
– Define the following variables:
  – y_it: the magnetic resonance time series at voxel i and time t;
  – a_it: baseline trend at voxel i and time t;
  – z_it: transformed stimulus at voxel i and time t;
  – β_i: activation amplitude.
– A popular regression model in the literature is y_it = a_it + z_it β_i + e_it.
– The baseline trend is due to background influences on the patient's neuronal activity during the experiment.
– A simple model is a parametric expansion a_it = w_t' α_i, where we use a quadratic polynomial and four low-order Fourier terms.
– The errors are modelled as iid N(0, σ_i²).
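The baseline design w_t can be sketched as a quadratic polynomial plus four low-order Fourier terms; interpreting "four terms" as two sine/cosine pairs with period equal to the scan length is an assumption made for this illustration.

```python
import numpy as np

def baseline_design(T, n_fourier=4):
    """Baseline trend regressors w_t for a scan of T time points:
    intercept, linear and quadratic trend, plus n_fourier low-order
    Fourier terms (sine/cosine pairs with period T)."""
    t = np.arange(T) / T
    cols = [np.ones(T), t, t ** 2]
    for k in range(1, n_fourier // 2 + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    return np.column_stack(cols)
```

Stacking this design with the transformed stimulus column z_i gives the per-voxel regressor matrix X_i of the variable-selection setup.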

Anatomically Informed Activation Prior
– The prior for the activation indicators (only) can be informed by many sources of information; here, we use only the fact that activation can occur solely in areas of grey matter.
– Let g = (g_1,…,g_N) denote whether or not each voxel is grey matter. We assume p(g) = p(g_1)…p(g_N), where the p(g_i) are known.
– Then, if p(γ_i = 1 | g_i = 1) = a (0.1 in our empirical experiment), the marginal probability is p(γ_i = 1) = p(g_i = 1) a = c_i.
– When θ = 0 we can equate this with the marginal prior from the Ising density to obtain values for the external field: p(γ_i = 1) = exp(α_i) / (1 + exp(α_i)) = c_i.
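Matching the marginal activation probability c_i to the Ising external field amounts to a logit transform. A sketch; the function name and the clipping guard for numerical safety are additions, not from the talk.

```python
import numpy as np

def external_field(p_grey, a=0.1):
    """External field alpha_i matching the marginal prior activation
    probability c_i = p(grey_i) * a through the logistic link
    c_i = exp(alpha_i) / (1 + exp(alpha_i))."""
    c = np.clip(np.asarray(p_grey) * a, 1e-6, 1 - 1e-6)
    return np.log(c / (1.0 - c))
```

Voxels with low grey-matter probability get a strongly negative field, pushing their prior activation probability toward zero.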

Two-Step Procedure
– The drawback of using non-zero external field coefficients in the Ising prior for the activation effect is that θ is no longer simply a smoothing parameter.
– Therefore, we use a two-step procedure:
  – Step (1): set α = 0, estimate the model, and obtain a point estimate θ* = E(θ | y).
  – Step (2): refit with an anatomically informed Ising prior, but conditional on the fixed level of smoothing θ*.
– This overcomes the complex inter-relationship between θ and the marginal probability of activation when α ≠ 0.

Posterior Analysis using a 3D Neighborhood
– We fit the data using a 3D neighborhood (9 + 9 + 8 neighbors).
– For simplicity, we do not undertake variable selection on the parametric trend terms, just on the activation variable.
– Three priors are employed (see Table 1).
– The activation and amplitude maps obtained using prior (i) are given in Fig. 2.
– The activation maps using prior (ii) for 8 contiguous slices are given in Fig. 4; they reveal the importance of using anatomical prior information.
– The activation maps using prior (iii) for the same 8 slices are given in Fig. 6; they reveal the importance of spatial smoothing.

Three Ising Priors
– Prior (i) corresponds to step 1 of the two-step procedure.
– Prior (ii) corresponds to step 2 of the two-step procedure.
– Prior (iii) is for comparison with prior (ii), to demonstrate the impact of spatial smoothing.

Some Comparisons
– An obvious alternative is simply to spatially smooth the activation amplitudes (β_i, i = 1,…,N) and then classify; Fig. 5 shows the amplitude map using this approach.
– What happens when variable selection is also undertaken on the trend terms? Using only a 2D neighborhood structure, two estimates were obtained: one with BVS on all terms, and the other with BVS on only one term. The resulting activation maps are similar and are given in Fig. 8.
