Univariate Model Fitting

Univariate Model Fitting Sarah Medland QIMR – openMx workshop Brisbane 16/08/10

Univariate Twin Models = using the twin design to assess the aetiology of one trait (univariate)
1. Path Diagrams
2. Basic ACE R Script

1. Path Diagrams for Univariate Models

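Basic Twin Model - MZ
[Path diagram: in each twin, latent factors A, C and E load on the trait through paths a, c and e. Across twins, the A factors correlate 1.0 and the C factors correlate 1.]

Basic Twin Model - DZ
[Path diagram: same structure as for MZ pairs. Across twins, the A factors correlate 0.5 and the C factors correlate 1.0.]

Basic Twin Model
[Path diagram: combined model. Correlation between the A factors: rMZ = 1.0, rDZ = 0.5. Correlation between the C factors: 1.0 in both MZ and DZ pairs.]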

The Variance
Since the variance of a variable is the covariance of the variable with itself, the expected variance is the sum of all paths from the variable back to itself that follow Wright's rules.

Variance for Twin 1 - A
a*1*a = a^2 (trait to A, through the variance of A (1), and back to the trait)

Variance for Twin 1 - C
a*1*a = a^2
c*1*c = c^2

Variance for Twin 1 - E
a*1*a = a^2
c*1*c = c^2
e*1*e = e^2

Total Variance for Twin 1
a*1*a = a^2
c*1*c = c^2
e*1*e = e^2
Total Variance = a^2 + c^2 + e^2

Covariance - MZ
a^2 + c^2 (the A factors of the two twins correlate 1.0 and the C factors correlate 1.0)

Covariance - DZ
0.5a^2 + c^2 (the A factors of the two twins correlate 0.5 and the C factors correlate 1.0)

Variance-Covariance Matrices
MZ
          Twin 1             Twin 2
Twin 1    a^2 + c^2 + e^2    a^2 + c^2
Twin 2    a^2 + c^2          a^2 + c^2 + e^2
DZ
          Twin 1             Twin 2
Twin 1    a^2 + c^2 + e^2    0.5a^2 + c^2
Twin 2    0.5a^2 + c^2       a^2 + c^2 + e^2
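The expected matrices above can be checked numerically. The following is a minimal base R sketch, not part of the original script: it plugs in the rounded variance components reported on the "Standardising data" slide (A = 73, C = 90, E = 70), and the object names are chosen here only for illustration.

```r
# Rounded variance components from the "Standardising data" slide
A <- 73   # a^2, additive genetic variance
C <- 90   # c^2, shared environmental variance
E <- 70   # e^2, non-shared environmental variance
V <- A + C + E

twins <- c("Twin 1", "Twin 2")

# MZ pairs: covariance = a^2 + c^2
expCovMZ <- matrix(c(V,     A + C,
                     A + C, V),
                   nrow = 2, byrow = TRUE, dimnames = list(twins, twins))

# DZ pairs: covariance = 0.5*a^2 + c^2
expCovDZ <- matrix(c(V,           0.5 * A + C,
                     0.5 * A + C, V),
                   nrow = 2, byrow = TRUE, dimnames = list(twins, twins))

# Twin correlations implied by these expected covariance matrices
rMZ <- cov2cor(expCovMZ)[1, 2]
rDZ <- cov2cor(expCovDZ)[1, 2]

print(expCovMZ)
print(expCovDZ)
print(round(c(rMZ = rMZ, rDZ = rDZ), 2))
```

The diagonal is the same in both groups because the total variance does not depend on zygosity; only the off-diagonal element changes.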

Why isn't e^2 included in the covariance? Because e^2 refers to environmental influences UNIQUE to each twin, so it cannot explain similarity between twins.
Why is a^2 multiplied by 0.5 for DZs but not for MZs? Because DZ twins share on average half of their segregating genes, whereas MZ twins share all of their genes.

2. Basic openMx ACE Script

Overview
OpenMx script
Running the script
Describing the output

ACE model
Specify the matrices you need to build the model, then do some algebra to get the variances.
[Path diagram: Twin 1 trait with latent factors A, C and E and paths a, c and e.]
# Fit ACE Model with RawData and Matrices Input
# -----------------------------------------------------------------------
univACEModel <- mxModel("univACE",
  mxModel("ACE",
    # Matrices a, c, and e to store a, c, and e path coefficients
    mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE, values=10, label="a11", name="a" ),
    mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE, values=10, label="c11", name="c" ),
    mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE, values=10, label="e11", name="e" ),
    # Matrices A, C, and E compute variance components
    mxAlgebra( expression=a %*% t(a), name="A" ),
    mxAlgebra( expression=c %*% t(c), name="C" ),
    mxAlgebra( expression=e %*% t(e), name="E" ),
    # Algebra to compute total variances and standard deviations (diagonal only)
    mxAlgebra( expression=A+C+E, name="V" ),

Start values?
a^2 = additive genetic variance (A)
c^2 = shared environmental variance (C)
e^2 = non-shared environmental variance (E)
Their sum is modelled to equal the expected total variance, so a reasonable start value for each of a, c and e is sqrt(Total Variance / 3).
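As a rough illustration of the start-value rule, here is a small base R sketch. The helper name `start_value` is hypothetical, and the total variance of 233 is an assumption taken from the rounded estimates shown later in this presentation, not from the raw data.

```r
# Hypothetical helper: split the total variance equally over the
# variance components and return the corresponding path coefficient.
start_value <- function(total_variance, n_components = 3) {
  sqrt(total_variance / n_components)
}

total_variance <- 233   # assumed value, for illustration only
sv <- start_value(total_variance)
print(sv)

# The three squared start values add back up to the total variance
print(all.equal(3 * sv^2, total_variance))
```

Under this assumption the result is a little under 9, which is in the same range as the `values=10` used in the script.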

Standardize parameter estimates
# Algebra to compute total variances and standard deviations (diagonal only)
mxAlgebra( expression=A+C+E, name="V" ),
mxMatrix( type="Iden", nrow=nv, ncol=nv, name="I"),
mxAlgebra( expression=solve(sqrt(I*V)), name="iSD"),
# Algebra to compute standardized path estimates and variance components
mxAlgebra( expression=a%*%iSD, name="sta"),
mxAlgebra( expression=c%*%iSD, name="stc"),
mxAlgebra( expression=e%*%iSD, name="ste"),
mxAlgebra( expression=A/V, name="h2"),
mxAlgebra( expression=C/V, name="c2"),
mxAlgebra( expression=E/V, name="e2"),
The regression coefficient a is standardized by (a * SD(A)) / SD(Trait), where SD(Trait) is the standard deviation of the dependent variable and SD(A) is the standard deviation of the predictor, the latent factor A (= 1). The standardized path is therefore a * 1/SD = a/SD.
In the script: I*V keeps the diagonal of V (1 * V = V in the univariate case), sqrt() turns the variance into the SD, and solve() inverts it to give 1/SD.

Standardize parameter estimates
The script fragment is the same as on the previous slide; the lines that matter here are:
mxAlgebra( expression=A/V, name="h2"),
mxAlgebra( expression=C/V, name="c2"),
mxAlgebra( expression=E/V, name="e2"),
The heritability "h2" is the proportion of the total variance due to A (additive genetic effects): h2 = A / V. Note: this is "sta" squared. The standardized variance components for C and E are C / V and E / V.

Standardising data
V = A + C + E = [73] + [90] + [70] = [233]
A/V = 73/233 = .31
C/V = 90/233 = .39
E/V = 70/233 = .30
a = 8.55, c = 9.49, e = 8.35
SD = sqrt(V) = 15.26
sta = 8.55 / 15.26 = 0.56, squared = .31
stc = 9.49 / 15.26 = 0.62, squared = .39
ste = 8.35 / 15.26 = 0.55, squared = .30
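The same arithmetic can be reproduced with ordinary R matrices. This is a sketch that mirrors the algebra in the script using base R only (no OpenMx objects); the path estimates are the ones printed in the summary output of this presentation, and the names `a_est`, `c_est` and `e_est` are chosen here to avoid masking R's `c()` function.

```r
nv <- 1   # number of variables per twin (univariate model)

# Unstandardized path estimates as reported in the OpenMx summary
a_est <- matrix(8.546504, nv, nv)
c_est <- matrix(9.488454, nv, nv)
e_est <- matrix(8.352197, nv, nv)

# Variance components and total variance
A <- a_est %*% t(a_est)
C <- c_est %*% t(c_est)
E <- e_est %*% t(e_est)
V <- A + C + E

# Inverse of the standard deviations, as in solve(sqrt(I*V))
I   <- diag(nv)
iSD <- solve(sqrt(I * V))

# Standardized paths and standardized variance components
sta <- a_est %*% iSD
stc <- c_est %*% iSD
ste <- e_est %*% iSD
h2  <- A / V
c2  <- C / V
e2  <- E / V

print(round(c(sta = sta, stc = stc, ste = ste), 2))
print(round(c(h2 = h2, c2 = c2, e2 = e2), 2))
```

Squaring each standardized path gives the corresponding standardized variance component, and the three components sum to 1.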

# Algebra for expected variance/covariance matrix in MZ
mxAlgebra( expression= rbind ( cbind(A+C+E , A+C),
                               cbind(A+C   , A+C+E)), name="expCovMZ" ),
# Algebra for expected variance/covariance matrix in DZ
mxAlgebra( expression= rbind ( cbind(A+C+E     , 0.5%x%A+C),
                               cbind(0.5%x%A+C , A+C+E)), name="expCovDZ" )
MZ
          Twin 1             Twin 2
Twin 1    a^2 + c^2 + e^2    a^2 + c^2
Twin 2    a^2 + c^2          a^2 + c^2 + e^2
DZ
          Twin 1             Twin 2
Twin 1    a^2 + c^2 + e^2    0.5a^2 + c^2
Twin 2    0.5a^2 + c^2       a^2 + c^2 + e^2
[Path diagram: the A factors of the two twins correlate 1 (MZ) / 0.5 (DZ); the C factors correlate 1.0.]

mxModel("MZ",
  mxData( observed=mzData, type="raw" ),
  mxFIMLObjective( covariance="ACE.expCovMZ", means="ACE.expMean", dimnames=selVars )
),
mxModel("DZ",
  mxData( observed=dzData, type="raw" ),
  mxFIMLObjective( covariance="ACE.expCovDZ", means="ACE.expMean", dimnames=selVars )
),   # closes the DZ sub-model
# Sum the -2 log likelihoods of the MZ and DZ groups and optimize the total
mxAlgebra( expression=MZ.objective + DZ.objective, name="m2ACEsumll" ),
mxAlgebraObjective("m2ACEsumll"),
# Request confidence intervals for the variance components
mxCI(c('ACE.A', 'ACE.C', 'ACE.E'))
)
univACEFit <- mxRun(univACEModel, intervals=TRUE)
univACESumm <- summary(univACEFit)
univACESumm

You can fit sub-models to test the significance of your parameters: you simply drop the parameter and see whether the model fit is significantly worse than that of the full model.
[Path diagram: AE model, each twin's trait with latent factors A and E only.]
# Fit AE model
# -----------------------------------------------------------------------
univAEModel <- mxModel(univACEFit, name="univAE",
  mxModel(univACEFit$ACE,
    # Drop C by fixing the c path to 0
    mxMatrix( type="Lower", nrow=1, ncol=1, free=FALSE, values=0, label="c11", name="c" )
  )
)
univAEFit <- mxRun(univAEModel)
univAESumm <- summary(univAEFit)
univAESumm

Sub-models
[Path diagram: CE model, each twin's trait with latent factors C and E only.]
# Fit CE model
# -----------------------------------------------------------------------
univCEModel <- mxModel(univACEFit, name="univCE",
  mxModel(univACEFit$ACE,
    # Drop A by fixing the a path to 0
    mxMatrix( type="Lower", nrow=1, ncol=1, free=FALSE, values=0, label="a11", name="a" )
  )
)
univCEFit <- mxRun(univCEModel)
univCESumm <- summary(univCEFit)
univCESumm
[Path diagram: E model, each twin's trait with latent factor E only.]
# Fit E model
# -----------------------------------------------------------------------
univEModel <- mxModel(univAEFit, name="univE",
  mxModel(univAEFit$ACE,
    # Start from the AE fit (c already fixed to 0) and also fix the a path to 0
    mxMatrix( type="Lower", nrow=1, ncol=1, free=FALSE, values=0, label="a11", name="a" )
  )
)
univEFit <- mxRun(univEModel)
univESumm <- summary(univEFit)
univESumm
The E parameter can never be dropped, because it includes measurement error.

OpenMx Output

univACESumm
free parameters:
  name  matrix    row col   Estimate
1 a11   ACE.a       1   1   8.546504
2 c11   ACE.c       1   1   9.488454
3 e11   ACE.e       1   1   8.352197
4 mean  ACE.Mean    1   1  94.147803
observed statistics: 2198
estimated parameters: 4
degrees of freedom: 2194
-2 log likelihood: 17639.91
saturated -2 log likelihood: NA
number of observations: 1110
chi-square: NA
p: NA
AIC (Mx): 13251.91
BIC (Mx): 1127.663
adjusted BIC:
RMSEA: NA
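Two of the summary lines can be recomputed by hand. The base R sketch below assumes, as the printed numbers suggest, that the degrees of freedom are the observed statistics minus the estimated parameters and that "AIC (Mx)" is the -2 log likelihood minus twice the degrees of freedom; the variable names are illustrative only.

```r
# Values copied from the summary output above
minus2LL             <- 17639.91
observed_statistics  <- 2198
estimated_parameters <- 4

# Degrees of freedom: observed statistics minus estimated parameters
df <- observed_statistics - estimated_parameters

# Assumed definition of "AIC (Mx)": -2LL - 2 * df
aic_mx <- minus2LL - 2 * df

print(c(df = df, AIC_Mx = aic_mx))
```

Both values agree with the output, which reports 2194 degrees of freedom and an AIC (Mx) of 13251.91.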

tableFitStatistics: models compared to saturated model
Name               ep  -2LL      df    AIC       diffLL  diffdf  p
M1 : univTwinSat   10  17637.98  2188  13261.98  -       -       -
M2 : univACE        4  17639.91  2194  13251.91  1.93    6       0.93
M3 : univAE         3  17670.38  2195  13280.38  32.4    7       0
M4 : univCE         3  17665.27  2195  13275.27  27.3    7       0
M5 : univE          2  18213.28  2196  13821.28  575.3   8       0
Smaller -2LL means better fit. The -2LL of a sub-model is never lower than that of the model it is nested in (the sub-model cannot fit better). The question is: is the fit significantly worse? Chi-square test: the difference in -2LL is chi-square distributed, with degrees of freedom equal to the difference in df. Evaluate the significance of the chi-square test. A non-significant p-value means that the model is consistent with the data.
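The diffLL, diffdf and p columns can be rebuilt from the -2LL and df columns alone. This is a base R sketch under that assumption; the data frame and its column names are made up for the example and are not produced by the tableFitStatistics helper itself.

```r
# -2LL and df as printed in the table above
fit <- data.frame(
  name     = c("univTwinSat", "univACE", "univAE", "univCE", "univE"),
  minus2LL = c(17637.98, 17639.91, 17670.38, 17665.27, 18213.28),
  df       = c(2188, 2194, 2195, 2195, 2196)
)

# Differences relative to the saturated model (first row)
fit$diffLL <- fit$minus2LL - fit$minus2LL[1]
fit$diffdf <- fit$df - fit$df[1]

# Chi-square p-value for every model that has fewer parameters
fit$p <- NA_real_
is_sub <- fit$diffdf > 0
fit$p[is_sub] <- pchisq(fit$diffLL[is_sub], df = fit$diffdf[is_sub],
                        lower.tail = FALSE)

print(fit, digits = 7)
```

For the ACE model the p-value comes out at about 0.93, matching the table; the p-values printed as 0 in the table are very small rather than exactly zero.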

Nested.fit: models compared to ACE model
Name      ep  -2LL      df    diffLL  diffdf  p
univACE    4  17639.91  2194  NA      NA      NA
univAE     3  17670.38  2195  30.47   1       0
univCE     3  17665.27  2195  25.36   1       0
univE      2  18213.28  2196  573.37  2       0
Smaller -2LL means better fit. The -2LL of a sub-model is never lower than that of the full model (the sub-model cannot fit better). The question is: is the fit significantly worse? Chi-square test: the difference in -2LL is chi-square distributed. Evaluate the significance of the chi-square test. The critical chi-square value for 1 df is 3.84. A non-significant p-value means that the dropped parameter(s) are non-significant.
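The likelihood-ratio comparison against the ACE model can be sketched in base R as follows. The function name `lrt` is hypothetical, the 0.05 significance level is an assumption implied by the 3.84 critical value, and the -2LL and df values are copied from the table.

```r
alpha <- 0.05   # assumed significance level

# Critical chi-square value for 1 degree of freedom
critical_1df <- qchisq(1 - alpha, df = 1)

# Hypothetical helper: likelihood-ratio test of a sub-model against a
# fuller model, given each model's -2LL and degrees of freedom.
lrt <- function(m2ll_sub, df_sub, m2ll_full, df_full) {
  diffLL <- m2ll_sub - m2ll_full
  diffdf <- df_sub - df_full
  p      <- pchisq(diffLL, df = diffdf, lower.tail = FALSE)
  c(diffLL = diffLL, diffdf = diffdf, p = p)
}

ace_m2ll <- 17639.91
ace_df   <- 2194

res_ae <- lrt(17670.38, 2195, ace_m2ll, ace_df)   # drop C
res_ce <- lrt(17665.27, 2195, ace_m2ll, ace_df)   # drop A
res_e  <- lrt(18213.28, 2196, ace_m2ll, ace_df)   # drop A and C

print(round(critical_1df, 2))
print(rbind(univAE = res_ae, univCE = res_ce, univE = res_e))
```

Every difference in -2LL here is far above the critical value, so dropping A, C, or both makes the fit significantly worse in this example.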

Estimates ACE model
> univACEFit$ACE.h2
          [,1]
[1,] 0.3137131
> univACEFit$ACE.c2
          [,1]
[1,] 0.3866762
> univACEFit$ACE.e2
          [,1]
[1,] 0.2996108