SGPP: Spatial Gaussian Predictive Process Models for Neuroimaging Data Yimei Li Department of Biostatistics St. Jude Children’s Research Hospital Joint.


SGPP: Spatial Gaussian Predictive Process Models for Neuroimaging Data Yimei Li Department of Biostatistics St. Jude Children’s Research Hospital Joint work with Dr. JungWon Hyun and Dr. Hongtu Zhu

Outline
Motivation
Spatial Gaussian Predictive Process Models for Neuroimaging Data
Simulation Studies
Real Data Analysis
Discussion

Motivation
Figure panels: images with missing values and images with predicted values; baseline images and future images. Covariates: gender, age, treatment group, etc.

Motivation
Voxel-wise analysis: the preprocessed data at each single voxel are regressed on a design matrix to estimate β.
The power of this approach is not optimal (Li et al., 2011; Polzehl et al., 2010).
Voxel-wise analysis is also not optimal for prediction, since it does not account for the spatial dependence of imaging data.
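A minimal sketch of the voxel-wise baseline criticized above, assuming an ordinary least squares fit at every voxel. The function name `voxelwise_ols`, the array shapes, and the simulated data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def voxelwise_ols(X, Y):
    """Fit y(d) = X beta(d) + error independently at every voxel d.

    X: (n_subjects, p) design matrix.
    Y: (n_subjects, n_voxels) image data, one column per voxel.
    Returns beta_hat with shape (p, n_voxels).
    """
    # lstsq solves all voxels at once; each column of Y is a separate regression.
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta_hat

# Hypothetical data: 50 subjects, intercept plus 2 covariates, 900 voxels.
rng = np.random.default_rng(0)
n, p, n_vox = 50, 3, 900
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = rng.normal(size=(p, n_vox))
Y = X @ beta_true + 0.1 * rng.normal(size=(n, n_vox))
beta_hat = voxelwise_ols(X, Y)
```

Each column is fitted without reference to its neighbours, which is exactly why this baseline ignores spatial dependence.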

Spatial Gaussian Predictive Process Model
The first aim is to develop SGPP to delineate the association between high-dimensional imaging data and a set of covariates of interest, such as age, while accurately approximating the spatial dependence of imaging data.
The second aim is to develop a simultaneous estimation and prediction framework for the analysis of neuroimaging data.

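Models
The model separates medium- to long-range dependence from short-range dependence. The slide defines the following terms (the symbols were rendered as images and are not preserved in the transcript):
the neuroimaging measures at voxel d_m;
the covariates of interest;
a term for individual imaging variation and the medium- to long-range spatial dependence;
spatially correlated errors that capture the short-range, local dependence;
a term described only as a copy of another quantity.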
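One plausible way to write the decomposition this slide describes is the additive form below. All symbols are assumptions chosen for illustration, since the slide's own notation did not survive extraction: i indexes subjects, j indexes components of the image measure, and d indexes voxels.

```latex
y_{i,j}(d) \;=\; x_i^{\top}\beta_j(d) \;+\; \eta_{i,j}(d) \;+\; \epsilon_{i,j}(d)
```

Under this assumed notation, y is the neuroimaging measure, x_i the covariates of interest, η the term carrying individual variation and medium- to long-range dependence, and ε the spatially correlated error carrying short-range dependence.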

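Medium- to long-range dependence: FPCA
The term carrying this dependence admits a spectral decomposition and therefore a Karhunen-Loève expansion, which the model replaces with an approximation (the equations are not preserved in the transcript).
The slide defines, with symbols that are likewise not preserved:
the (j,l)th functional principal component score of the ith subject;
the Lebesgue measure;
uncorrelated random variables, whose stated moments are missing.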
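A minimal sketch of functional principal component analysis on a discrete grid, assuming the eigenfunctions are estimated from the singular value decomposition of centered residual images. The function `fpca`, the sine and cosine basis, and the score variances are hypothetical choices for illustration; the slides do not specify the estimation details.

```python
import numpy as np

def fpca(R, n_components):
    """Discrete FPCA of residual images.

    R: (n_subjects, n_voxels) residuals after removing covariate effects.
    Returns (eigvals, psi, scores):
      eigvals: leading eigenvalues of the sample covariance,
      psi:     (n_voxels, n_components) orthonormal eigenvectors,
      scores:  (n_subjects, n_components) principal component scores.
    """
    Rc = R - R.mean(axis=0)               # center across subjects
    n = Rc.shape[0]
    _, s, Vt = np.linalg.svd(Rc, full_matrices=False)
    eigvals = s[:n_components] ** 2 / n   # eigenvalues of Rc.T @ Rc / n
    psi = Vt[:n_components].T
    scores = Rc @ psi                     # projections onto the eigenvectors
    return eigvals, psi, scores

# Hypothetical rank-2 field: two smooth basis functions with random scores.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
xi = rng.normal(size=(40, 2)) * np.array([2.0, 1.0])
R = xi @ basis
eigvals, psi, scores = fpca(R, n_components=2)
```

Because the simulated field has rank two, keeping two components reproduces the centered data exactly; with real images the number of retained components is a modeling choice.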

Short-range dependence: multivariate simultaneous autoregressive (SAR) model
The slide defines the following quantities (the symbols are not preserved in the transcript):
an autocorrelation parameter that controls the strength of the local spatial dependence; it is the same across the brain and takes a value between 0 and 1;
the cardinality of the neighborhood N(d);
independent and identically distributed copies of a random term, whose stated distribution is missing;
a vector of unknown parameters;
the variance structure for the short-range dependence.
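A minimal sketch of a simultaneous autoregressive error field, assuming the common form in which the error at voxel d equals rho times the average error over its neighbors N(d) plus an independent innovation. That form, the 4-neighbor grid, and the names `grid_neighbor_weights` and `simulate_sar` are assumptions; the slide's own equation is not available.

```python
import numpy as np

def grid_neighbor_weights(nrow, ncol):
    """Row-normalised weights: W[d, d2] = 1/|N(d)| if d2 is a 4-neighbour of d."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            d = r * ncol + c
            nbrs = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs = [(a, b) for a, b in nbrs if 0 <= a < nrow and 0 <= b < ncol]
            for a, b in nbrs:
                W[d, a * ncol + b] = 1.0 / len(nbrs)
    return W

def simulate_sar(W, rho, Sigma_e, rng):
    """Draw eps solving eps = rho * W @ eps + E, with rows of E iid N(0, Sigma_e)."""
    n = W.shape[0]
    J = Sigma_e.shape[0]
    E = rng.multivariate_normal(np.zeros(J), Sigma_e, size=n)  # (n_voxels, J)
    # For 0 < rho < 1 and row-stochastic W, I - rho * W is invertible.
    eps = np.linalg.solve(np.eye(n) - rho * W, E)
    return eps, E

rng = np.random.default_rng(2)
W = grid_neighbor_weights(10, 10)
rho = 0.6
Sigma_e = np.array([[1.0, 0.3], [0.3, 1.0]])
eps, E = simulate_sar(W, rho, Sigma_e, rng)
```

Larger values of rho pull each voxel's error closer to its neighborhood average, which is the sense in which the parameter controls local dependence.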

Variance and Model Approximation

Path Diagram for SGPP Estimation Process

Path Diagram for SGPP Prediction Process
Diagram labels: training data set; testing data set; estimate SGPP model parameters.

Model Validation
We evaluate the prediction accuracy of the proposed model by quantifying the prediction error at all voxels with missing data. Specifically, an rtMSPE is computed for each j over the set of all subjects in the test set (the formula is not preserved in the transcript).
Models compared: VWLM, GLM+FPCA, GLM+SAR, and GLM+FPCA+SAR.
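A minimal sketch of a prediction-error score of this kind, assuming rtMSPE denotes the square root of the mean squared prediction error taken over the missing voxels of the test subjects. The exact normalization used in the talk is not shown in the transcript, so the function `rt_mspe` below is an illustration rather than the authors' definition.

```python
import numpy as np

def rt_mspe(y_true, y_pred, missing_mask):
    """Root mean squared prediction error over held-out (missing) voxels.

    y_true, y_pred: (n_test_subjects, n_voxels) arrays for one component j.
    missing_mask:   boolean array of the same shape, True where a voxel was
                    hidden from the model and then predicted.
    """
    sq_err = (y_pred - y_true) ** 2
    return float(np.sqrt(sq_err[missing_mask].mean()))

# Tiny hypothetical example: two subjects, three voxels, one missing voxel each.
y_true = np.zeros((2, 3))
y_pred = np.array([[1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])
mask = np.array([[True, False, False],
                 [False, True, False]])
score = rt_mspe(y_true, y_pred, mask)  # sqrt((1 + 4) / 2)
```

Computing the score separately for each component j, as the slide describes, amounts to calling the function once per component.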

Simulation Studies I
We simulated 900 pixels on a 30 × 30 phantom for 50 subjects. At a given pixel, the data were generated from a bivariate Gaussian process model. One component was generated from a Uniform(1, 2) distribution, and other components were generated independently; the regression coefficients and eigenfunctions were set to fixed values. The generating equations and these settings are not preserved in the transcript.

Simulation Results
(1) β estimates. Figure panels: true β and estimated β.
(2) The first 10 relative eigenvalues of the simulated data.

Simulation Results
(3) Estimation of the eigenfunctions. Figure panels: true eigenfunctions and estimated eigenfunctions.
(4) Estimation: REML.

Simulation Results
(5) Prediction.
Test set (TE): 15 randomly selected subjects. Training set (TR): the other 35 subjects.
For each subject in the test set, we considered imaging data with 10%, 30%, and 50% missingness. The missing pixels were randomly sampled according to the missingness level.
We fitted the SGPP model to the training set and estimated all the components of the model.
We then predicted the missing data in the test set and obtained the rtMSPE.
The slide's table compares the rtMSPE of the proposed model with those of the VWLM, GLM+FPCA, and GLM+SAR models.
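A minimal sketch of the data-splitting step of this prediction experiment: a random split into 15 test and 35 training subjects, followed by a random missingness mask for each test subject. The function name `split_and_mask` and the choice to hide exactly the same number of pixels per subject are assumptions made for illustration.

```python
import numpy as np

def split_and_mask(n_subjects, n_pixels, n_test, missing_frac, rng):
    """Randomly split subjects and hide a fraction of pixels per test subject.

    Returns (train_idx, test_idx, mask), where mask has shape
    (n_test, n_pixels) and is True at the pixels treated as missing.
    """
    perm = rng.permutation(n_subjects)
    test_idx = np.sort(perm[:n_test])
    train_idx = np.sort(perm[n_test:])
    n_missing = int(round(missing_frac * n_pixels))
    mask = np.zeros((n_test, n_pixels), dtype=bool)
    for k in range(n_test):
        # Sample the hidden pixels without replacement for each test subject.
        hidden = rng.choice(n_pixels, size=n_missing, replace=False)
        mask[k, hidden] = True
    return train_idx, test_idx, mask

rng = np.random.default_rng(3)
train_idx, test_idx, mask = split_and_mask(
    n_subjects=50, n_pixels=900, n_test=15, missing_frac=0.3, rng=rng
)
```

Repeating the call with `missing_frac` set to 0.1, 0.3, and 0.5 gives the three missingness levels considered on the slide.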

Simulation II: Non-Gaussian Random Field
The setup is similar to the previous simulation, except that a class of non-Gaussian random fields was generated by squaring Gaussian random fields whose correlations are the square roots of the desired correlations.
The slide then reports the rtMSPE for this setup.
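The square-root rule follows from a standard fact: for two standard Gaussian variables with correlation rho, the correlation of their squares is rho squared. The sketch below checks this numerically for a single pair of locations; the function name `squared_gaussian_pair`, the target value 0.5, and the sample size are illustrative choices.

```python
import numpy as np

def squared_gaussian_pair(target_corr, n_samples, rng):
    """Draw squared standard Gaussian pairs whose correlation is target_corr.

    The underlying Gaussian pair is given correlation sqrt(target_corr),
    because corr(Z1**2, Z2**2) = corr(Z1, Z2)**2 for standard Gaussians.
    """
    rho = np.sqrt(target_corr)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n_samples)
    return z ** 2

rng = np.random.default_rng(0)
target_corr = 0.5
w = squared_gaussian_pair(target_corr, 200_000, rng)
empirical_corr = np.corrcoef(w[:, 0], w[:, 1])[0, 1]
```

The resulting marginals are chi-squared with one degree of freedom, so the field is clearly non-Gaussian while keeping the intended correlation structure.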

Real Data Analysis
The data set consists of surface data from 43 infants close to 1 year old.
The responses are based on the SPHARM-PDM representation of left lateral ventricle surfaces.
The left lateral ventricle surface of each infant is represented by 1002 location vectors, each consisting of the spatial x, y, and z coordinates of the corresponding vertex on the SPHARM-PDM surface.
Gender and gestational age are the covariates considered in the model.
Notation: i indexes the ith subject and j the jth coordinate; the coefficient estimates for the jth coordinate include the intercept, gender, and gestational age.

Real Data Analysis
(1) β estimates for the intercept, gender, and gestational age effects. Figure panels: intercept, gender, and gestational age, each for the X, Y, and Z coordinates.

Real Data Analysis (2) Eigenvalues

Real Data Analysis
(3) Eigenfunctions. Figure panels: components 1, 2, and 3, each for the X, Y, and Z coordinates.

Real Data Analysis
(4) -log p-value maps for β. Figure panels: gender and gestational age, each for the X, Y, and Z coordinates, shown for raw and corrected p-values.

Prediction
Thirteen infants were randomly selected as the test set to estimate the prediction error.

Discussion
SGPP and our prediction method can be used to directly address missing-data problems in neuroimaging studies.
The proposed model can be extended to neuroimaging data obtained from clustered studies.
For instance, SGPP can be extended to predict follow-up structural alterations and neural activity based on an individual's baseline image and covariate information.
It can also be extended to disease diagnosis and prevention.