

GREML in OpenMx. Rob Kirkpatrick (Virginia Commonwealth University). Boulder Workshop 2015.

What’s GREML? Genomic-Relatedness-Matrix Restricted Maximum Likelihood (Benjamin et al., 2012). It is what GCTA does to estimate the heritability attributable to measured SNPs. Benjamin, D. J., et al. (2012). PNAS, 109(21), 8026-8031. doi: 10.1073/pnas.1120666109
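For readers new to the "GRM" part of the name, the following base-R sketch shows one common way a genomic relatedness matrix is computed: center and scale each SNP by its allele frequency, then average the cross-products over SNPs. The helper name `make_grm` and the simulated genotypes are illustrative assumptions, not something taken from the slides or from GCTA's source.

```r
# Hypothetical helper: build a GRM from an n x m matrix of allele counts (0/1/2).
make_grm <- function(geno) {
  p <- colMeans(geno) / 2                       # estimated allele frequencies
  keep <- p > 0 & p < 1                         # drop monomorphic SNPs
  geno <- geno[, keep, drop = FALSE]
  p <- p[keep]
  Z <- sweep(geno, 2, 2 * p, "-")               # center each SNP
  Z <- sweep(Z, 2, sqrt(2 * p * (1 - p)), "/")  # scale by expected SD
  tcrossprod(Z) / ncol(Z)                       # A = Z Z' / m
}

# Simulated genotypes for 50 unrelated people and 500 SNPs (assumed toy data).
set.seed(1)
n <- 50
m <- 500
freqs <- runif(m, 0.2, 0.8)
geno <- sapply(freqs, function(f) rbinom(n, 2, f))
A <- make_grm(geno)
```

For unrelated people the diagonal of `A` should sit near 1 and the off-diagonals near 0, which is the sense in which participants are "classically unrelated" yet still carry small, measurable genomic similarity.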

What’s OpenMx? Package for the R statistical computing platform. R “frontend.” C++/Fortran “backend.” Combines matrix operations with numerical optimization. Purpose (broadly): structural equation modeling. http://openmx.psyc.virginia.edu/

Models in OpenMx 2.0.x. Each model is split between: the model’s expectation (its specification) and the model’s fitfunction (the loss function to be minimized). The GREML implementation introduces a GREML expectation and a GREML fitfunction. IMPORTANT: the GREML implementation is not yet officially released and is subject to change! To use GREML in OpenMx right now, you would need to build a “bleeding edge” revision of OpenMx from the source repository.

GREML in OpenMx is flexible. Key distinguishing characteristic from other analyses in OpenMx (e.g., FIML): phenotype vector y is a single realization of a random vector that cannot, in general, be partitioned into independent subvectors. Applicable to analyses in disciplines other than genetics. For instance, GREML enables a quite flexible, albeit computationally inefficient, way of doing multilevel analyses in OpenMx. Simple GREML-type models should be done in GCTA, because GCTA is optimized for the purpose; there is not much point in using OpenMx for things that GCTA can do more quickly.
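To make the multilevel remark concrete, here is a minimal base-R sketch of how a random-intercept model can be written as a single dense covariance matrix of the form V = var_between * ZZ' + var_within * I. The cluster labels, variable names, and variance values are all illustrative assumptions; the cluster-sharing matrix simply plays the role that a GRM plays in a genetic analysis.

```r
# Six observations nested in three clusters (assumed toy layout).
cluster <- factor(c("a", "a", "b", "b", "b", "c"))
Z <- model.matrix(~ 0 + cluster)   # n x (number of clusters) indicator matrix
K_cluster <- tcrossprod(Z)         # 1 where two rows share a cluster, else 0

var_between <- 0.5                 # illustrative variance components
var_within  <- 1.0
V <- var_between * K_cluster + var_within * diag(length(cluster))
```

Observations in the same cluster covary by `var_between`, observations in different clusters are independent, and every observation has total variance `var_between + var_within`.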

GREML in OpenMx: assumptions. Conditional on covariates X, phenotype vector y is a single draw from a multivariate-normal distribution having (in general) a dense covariance matrix, V. The parameters of V (variance components, genetic correlations, etc.) are of primary interest. Random effects, including the residual, are assumed to be normally distributed. WLS regression, using weight matrix V⁻¹, is an adequate model for the phenotypic mean. V can be sparse, but at present OpenMx does not know how to represent it internally as a sparse matrix, with respect to either storage or operations.
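The assumptions above can be sketched in a few lines of base R. This is a generic illustration of the restricted likelihood for y | X ~ N(Xb, V), not OpenMx's backend code; the function name `greml_fit` is hypothetical. With a dense V, the weighted regression on V⁻¹ is what is usually called generalized least squares.

```r
# Hypothetical helper: GLS fixed effects and -2 * restricted log-likelihood
# (up to an additive constant) for a given covariance matrix V.
greml_fit <- function(V, X, y) {
  Vinv <- solve(V)
  XtVinvX <- crossprod(X, Vinv %*% X)
  b <- solve(XtVinvX, crossprod(X, Vinv %*% y))  # regression weighted by V^-1
  r <- y - X %*% b                               # residuals from the mean model
  minus2ll <- as.numeric(determinant(V)$modulus) +        # log|V|
    as.numeric(determinant(XtVinvX)$modulus) +            # log|X' V^-1 X|
    as.numeric(crossprod(r, Vinv %*% r))                  # r' V^-1 r
  list(b = b, bcov = solve(XtVinvX), minus2ll = minus2ll)
}

# Toy check: intercept-only model with V = s * I (assumed example data).
set.seed(2)
y <- rnorm(30, mean = 5)
X <- matrix(1, 30, 1)
fit <- greml_fit(diag(30) * 2, X, y)
opt <- optimize(function(s) greml_fit(diag(30) * s, X, y)$minus2ll, c(0.01, 10))
```

In this toy case the fixed effect is the sample mean, and the variance that minimizes the restricted fit is the usual unbiased sample variance, which is the defining property of REML as opposed to ordinary maximum likelihood.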

mxExpectationGREML()

Notes N.B. y can contain an arbitrary number of stacked phenotypes(!). Need to use NPSOL or Newton-Raphson to benefit from derivatives. “Definition variables” not compatible (and not necessary) with GREML. Ordinal variables not compatible with REML. GREML expectation does not use MxData, but it needs dummy MxData for technical reasons…
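As an illustration of the stacked-phenotype note, the sketch below builds the covariance matrix for two phenotypes stacked one after the other, y = c(y1, y2), using Kronecker products. The relatedness matrix, the 2 x 2 genetic and residual covariance matrices, and all names are assumed toy values; how V is actually specified in an OpenMx model is left to the user, as the slides state.

```r
n <- 4
A <- diag(n)
A[1, 2] <- A[2, 1] <- 0.5                  # toy relatedness matrix (assumed)

G <- matrix(c(0.6, 0.2, 0.2, 0.4), 2, 2)   # genetic (co)variances of the 2 traits
E <- matrix(c(0.4, 0.1, 0.1, 0.6), 2, 2)   # residual (co)variances

# Block (i, j) of V is G[i, j] * A + E[i, j] * I when phenotypes are stacked.
V <- G %x% A + E %x% diag(n)

# Genetic correlation implied by G.
r_g <- G[1, 2] / sqrt(G[1, 1] * G[2, 2])
```

The off-diagonal block of `V` carries the cross-trait covariance, so a genetic correlation becomes just another parameter of V.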

mxGREMLStarter() Recommended interface for initializing a GREML model. User provides: Raw “wide-format” data. Column names of phenotype(s) and covariates. Function returns MxModel with: X & y, with missing observations removed. GREML fitfunction. Dummy MxData. User still has to specify V. Raw data = dataframe or matrix (in R). No simple interface for specifying ‘V’ at present, for the sake of flexibility.
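The idea of removing missing observations from X and y, and trimming V to match, can be sketched in base R as follows. The helper `drop_missing` is hypothetical and only mirrors the concept; it is not the code inside mxGREMLStarter().

```r
# Hypothetical helper: keep only rows with complete y and X,
# and trim the matching rows and columns of V.
drop_missing <- function(y, X, V) {
  keep <- complete.cases(cbind(y, X))
  list(y = y[keep],
       X = X[keep, , drop = FALSE],
       V = V[keep, keep, drop = FALSE],
       kept = which(keep))
}

# Assumed toy data: row 2 lacks y, row 3 lacks a covariate.
y <- c(1.2, NA, 0.7, 2.1)
X <- cbind(1, c(0.5, 1.0, NA, 2.0))
V <- diag(4) + 0.1
trimmed <- drop_missing(y, X, V)
```

Doing this once, before the model is run, avoids repeating the same trimming at every evaluation of the fit function.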

GREML fitfunction Can trim rows and columns of V (and derivatives) that correspond to missing observations trimmed from X & y… …but better performance if trimming done before runtime. Backend: Uses Eigen C++ library for linear algebra system. With Newton-Raphson, does average-information REML (Johnson & Thompson, 1995). Frontend object holds results for fixed effects. Johnson, D. L., & Thompson, R. (1995). Journal of Dairy Science, 78, 449-456.
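For orientation, here is a compact base-R sketch of an average-information REML update in general, assuming V is a weighted sum of known symmetric matrices, V = sum_k theta[k] * K[[k]]. It is a textbook-style illustration under that assumption, with a hypothetical function name, and makes no claim about how the OpenMx backend is written.

```r
# Hypothetical helper: one average-information REML update of theta.
ai_reml_step <- function(theta, K, X, y) {
  V <- Reduce(`+`, Map(function(th, Kk) th * Kk, theta, K))
  Vinv <- solve(V)
  VinvX <- Vinv %*% X
  # P projects out the fixed effects: V^-1 - V^-1 X (X' V^-1 X)^-1 X' V^-1
  P <- Vinv - VinvX %*% solve(crossprod(X, VinvX), t(VinvX))
  Py <- P %*% y
  KPy <- lapply(K, function(Kk) Kk %*% Py)
  q <- length(K)
  grad <- numeric(q)
  AI <- matrix(0, q, q)
  for (i in seq_len(q)) {
    # Gradient of the restricted log-likelihood: -0.5 * (tr(P K_i) - y'P K_i P y)
    grad[i] <- -0.5 * (sum(P * K[[i]]) - sum(Py * KPy[[i]]))
    for (j in seq_len(q)) {
      # Average information: 0.5 * y'P K_i P K_j P y
      AI[i, j] <- 0.5 * sum(KPy[[i]] * (P %*% KPy[[j]]))
    }
  }
  pmax(theta + solve(AI, grad), 1e-6)   # Newton-type step, kept positive
}

# Toy run: intercept-only model with a single residual variance (assumed data).
set.seed(3)
y <- rnorm(50)
X <- matrix(1, 50, 1)
K <- list(diag(50))
theta <- 0.5 * var(y)
for (iter in 1:20) theta <- ai_reml_step(theta, K, X, y)
```

The average-information matrix replaces the trace terms of the observed and expected information with a quadratic form in y, which is much cheaper to compute when V is large and dense. Like any Newton-type step, it can overshoot from a poor starting value, hence the lower bound in the last line of the helper.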

TODO—short term UI for viewing fixed-effects results. More data-structuring choices for mxGREMLStarter() (e.g., repeated-measures). Sparse V option? Option for MxAlgebras not to return a result to frontend (because huge). The way the data are structured in the current mxGREMLStarter() is appropriate if the multiple phenotypes really are different phenotypes. A different structuring would be preferable when the “multiple” phenotypes are actually the same phenotype measured multiple times from the same unit (e.g., repeated measures, participants clustered within families).

TODO—long term Allow matrices (esp. GRMs) to be read directly into the backend from disk. User-friendly interface for specifying V (e.g., from paths)?

Concluding Notes Validation of OpenMx GREML: Reproduced FIML and GCTA results from Lindon’s simulation. Reproduced GCTA results reported in Kirkpatrick et al. (2014). There’s a simple example script in extra slides. I’m looking for interested users! Let’s talk! Kirkpatrick, R. M., et al. (2014). PLoS ONE 9(11): e112390. doi:10.1371/journal.pone.0112390

Acknowledgements NIH grant DA026119 Mike Neale (PI) Lindon Eaves Joshua Pritikin, Mahsa Zahery, & Mike Hunter The rest of the OpenMx Development Team

Trivially Simple Example Script (Part 1)

This is a trivially simple model (so it’s not that substantively interesting).

    library(OpenMx)
    # Randomly generate data:
    dat <- cbind(rnorm(100), rep(1, 100))
    colnames(dat) <- c("y", "x")
    # Initialize MxModel:
    start <- mxGREMLStarter("GREMLmod", data = dat, Xdata = "x", ydata = "y",
                            addOnes = FALSE, dropNAfromV = TRUE)
    # Create a custom compute plan:
    plan <- mxComputeSequence(freeSet = c("Ve"), steps = list(
      mxComputeNewtonRaphson(fitfunction = "fitfunction"),
      mxComputeOnce('fitfunction', c('fit', 'gradient', 'hessian', 'ihessian')),
      mxComputeStandardError(),
      mxComputeReportDeriv()
    ))

Trivially Simple Example Script (Part 2)

    # Finish constructing MxModel:
    testmod <- mxModel(
      start,
      plan,
      mxExpectationGREML(V = "V", X = "X", y = "y", dV = c(ve = "I")),
      # The single free parameter: residual variance, bounded away from zero.
      mxMatrix(type = "Full", nrow = 1, ncol = 1, free = TRUE, values = 2,
               labels = "ve", lbound = 0.0001, name = "Ve"),
      mxMatrix("Iden", nrow = 100, name = "I", condenseSlots = TRUE),
      # V = I * ve, so its derivative with respect to ve is the identity matrix.
      mxAlgebra(I %x% Ve, name = "V")
    )
    testrun <- mxRun(testmod)   # Run it.
    summary(testrun)            # See results.
    # See results for fixed effects:
    cbind(testrun$fitfunction$info$b,
          sqrt(testrun$fitfunction$info$bcov))

Within-Person Model (for a not-so-trivial analysis). This is a bit more interesting: a biometric latent-growth curve model. Imagine that we want to fit it to a genotyped sample of classically unrelated participants. [Path diagram: latent factors AL, AS, EL, ES; growth factors L and S; observed variables Y₁, Y₂, Y₃.]

Between-Person Model. There will be n double-headed paths coming from each person’s AL and AS! This fact is represented with a GRM. [Path diagram: AL, AS, EL, ES.]
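One way to picture how the within-person and between-person models combine is the following base-R sketch. It assumes three occasions with slope loadings 0, 1, 2, an occasion-specific residual variance, data stacked person by person, and made-up parameter values; none of these details are given in the slides, so treat every number and name here as hypothetical.

```r
n <- 5                                          # persons
Lambda <- cbind(level = 1, slope = 0:2)         # 3 occasions x (L, S); assumed loadings
Sigma_A <- matrix(c(0.5, 0.1, 0.1, 0.2), 2, 2)  # covariance of (AL, AS), assumed
Sigma_E <- matrix(c(0.4, 0.0, 0.0, 0.1), 2, 2)  # covariance of (EL, ES), assumed
resid_var <- 0.3                                # occasion-specific residual, assumed

GRM <- diag(n)
GRM[1, 2] <- GRM[2, 1] <- 0.05                  # toy genomic relatedness

# Within-person covariance implied by the growth model.
within_A <- Lambda %*% Sigma_A %*% t(Lambda)
within_E <- Lambda %*% Sigma_E %*% t(Lambda) + resid_var * diag(3)

# Genetic factors covary across persons through the GRM;
# environmental factors and residuals do not.
V <- GRM %x% within_A + diag(n) %x% within_E
```

Each 3 x 3 off-diagonal block of `V` is the genetic within-person covariance scaled by the corresponding GRM entry, which is the matrix form of the "n double-headed paths" from each person's AL and AS.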