Mx modeling of methylation data: twin correlations [means, SD, correlation] ACE / ADE latent factor model regression [sex and age] genetic association.

Slides:

Advertisements

Similar presentations

Bivariate analysis HGEN619 class 2007.

Advertisements

Elizabeth Prom-Wormley & Hermine Maes

Multivariate Mx Exercise D Posthuma Files: \\danielle\Multivariate.

Structural Equation Modeling

Estimating “Heritability” using Genetic Data David Evans University of Queensland.

Path Analysis Danielle Dick Boulder Path Analysis Allows us to represent linear models for the relationships between variables in diagrammatic form.

Multivariate Analysis Nick Martin, Hermine Maes TC21 March 2008 HGEN619 10/20/03.

Bivariate Model: traits A and B measured in twins cbca.

Extended sibships Danielle Posthuma Kate Morley Files: \\danielle\ExtSibs.

ACDE model and estimability Why can’t we estimate (co)variances due to A, C, D and E simultaneously in a standard twin design?

Heterogeneity Hermine Maes TC19 March Files to Copy to your Computer Faculty/hmaes/tc19/maes/heterogeneity  ozbmi.rec  ozbmi.dat  ozbmiysat(4)(5).mx.

Univariate Analysis in Mx Boulder, Group Structure Title Type: Data/ Calculation/ Constraint Reading Data Matrices Declaration Assigning Specifications/

Continuous heterogeneity Shaun Purcell Boulder Twin Workshop March 2004.

Multivariate Analysis Hermine Maes TC19 March 2006 HGEN619 10/20/03.

Genetic Dominance in Extended Pedigrees: Boulder, March 2008 Irene Rebollo Biological Psychology Department, Vrije Universiteit Netherlands Twin Register.

Heterogeneity II Danielle Dick, Tim York, Marleen de Moor, Dorret Boomsma Boulder Twin Workshop March 2010.

Univariate Analysis Hermine Maes TC19 March 2006.

Gene x Environment Interactions Brad Verhulst (With lots of help from slides written by Hermine and Liz) September 30, 2014.

Regressing SNPs on a latent variable Michel Nivard & Nick Martin.

Path Analysis Frühling Rijsdijk SGDP Centre Institute of Psychiatry King’s College London, UK.

Mx Practical TC18, 2005 Dorret Boomsma, Nick Martin, Hermine H. Maes.

Introduction to Multivariate Genetic Analysis Kate Morley and Frühling Rijsdijk 21st Twin and Family Methodology Workshop, March 2008.

Path Analysis Frühling Rijsdijk. Biometrical Genetic Theory Aims of session:  Derivation of Predicted Var/Cov matrices Using: (1)Path Tracing Rules (2)Covariance.

Raw data analysis S. Purcell & M. C. Neale Twin Workshop, IBG Colorado, March 2002.

Discriminant Analysis Testing latent variables as predictors of groups.

Karri Silventoinen University of Helsinki Osaka University.

Karri Silventoinen University of Helsinki Osaka University.

Karri Silventoinen University of Helsinki Osaka University.

Institute of Psychiatry King’s College London, UK

Cholesky decomposition May 27th 2015 Helsinki, Finland E. Vuoksimaa.

Univariate modeling Sarah Medland. Starting at the beginning… Data preparation – The algebra style used in Mx expects 1 line per case/family – (Almost)

Longitudinal Modeling Nathan Gillespie & Dorret Boomsma \\nathan\2008\Longitudinal neuro_f_chol.mx neuro_f_simplex.mx jepq6.dat.

Whole genome approaches to quantitative genetics Leuven 2008.

The importance of the “Means Model” in Mx for modeling regression and association Dorret Boomsma, Nick Martin Boulder 2008.

Regression-Based Linkage Analysis of General Pedigrees Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis.

Gene-Environment Interaction & Correlation Danielle Dick & Danielle Posthuma Leuven 2008.

Combined Linkage and Association in Mx Hermine Maes Kate Morley Dorret Boomsma Nick Martin Meike Bartels Boulder 2009.

Univariate Analysis Hermine Maes TC21 March 2008.

Linkage and association Sarah Medland. Genotypic similarity between relatives IBS Alleles shared Identical By State “look the same”, may have the same.

Attention Problems – SNP association Dorret Boomsma Toos van Beijsterveldt Michel Nivard.

Tutorial I: Missing Value Analysis

Means, Thresholds and Moderation Sarah Medland – Boulder 2008 Corrected Version Thanks to Hongyan Du for pointing out the error on the regression examples.

Mx Practical TC20, 2007 Hermine H. Maes Nick Martin, Dorret Boomsma.

David M. Evans Multivariate QTL Linkage Analysis Queensland Institute of Medical Research Brisbane Australia Twin Workshop Boulder 2003.

Continuous heterogeneity Danielle Dick & Sarah Medland Boulder Twin Workshop March 2006.

Welcome  Log on using the username and password you received at registration  Copy the folder: F:/sarah/mon-morning To your H drive.

Introduction to Multivariate Genetic Analysis Danielle Posthuma & Meike Bartels.

Developmental Models/ Longitudinal Data Analysis Danielle Dick & Nathan Gillespie Boulder, March 2006.

Chapter 17 STRUCTURAL EQUATION MODELING. Structural Equation Modeling (SEM)  Relatively new statistical technique used to test theoretical or causal.

QTL Mapping Using Mx Michael C Neale Virginia Institute for Psychiatric and Behavioral Genetics Virginia Commonwealth University.

March 7, 2012M. de Moor, Twin Workshop Boulder1 Copy files Go to Faculty\marleen\Boulder2012\Multivariate Copy all files to your own directory Go to Faculty\kees\Boulder2012\Multivariate.

Multivariate Genetic Analysis (Introduction) Frühling Rijsdijk Wednesday March 8, 2006.

Categorical Data HGEN

Extended Pedigrees HGEN619 class 2007.

Invest. Ophthalmol. Vis. Sci ;57(1): doi: /iovs Figure Legend:

Univariate Twin Analysis

Introduction to Multivariate Genetic Analysis

Correlation, Regression & Nested Models

Re-introduction to openMx

MRC SGDP Centre, Institute of Psychiatry, Psychology & Neuroscience

Path Analysis Danielle Dick Boulder 2008

Univariate modeling Sarah Medland.

Why general modeling framework?

(Re)introduction to Mx Sarah Medland

Sarah Medland faculty/sarah/2018/Tuesday

Lucía Colodro Conde Sarah Medland & Matt Keller

Power Calculation for QTL Association

BOULDER WORKSHOP STATISTICS REVIEWED: LIKELIHOOD MODELS

Multivariate Genetic Analysis: Introduction

Presentation transcript:

Mx modeling of methylation data: twin correlations [means, SD, correlation] ACE / ADE latent factor model regression [sex and age] genetic association analysis [SNP] Dorret Boomsma, Nick Martin, Irene Rebollo Leuven 2008 Heijmans BT, Kremer D, Tobi EW, Boomsma DI, Slagboom PE. Heritable rather than age-related environmental and stochastic factors dominate variation in DNA methylation of the human IGF2/H19 locus. Hum Mol Genet (5):

This session Obtain MZ and DZ twin correlations; obtain ML estimates of means and variances Estimate heritability from ACE model Extension 1: regression analysis: how well do sex and age predict methylation? Extension 2: regression analysis: how well do SNPs in IGF2 predict methylation? (i.e. genetic association test)

Correlation (covariance) Model

Correlation Model MZ & DZ MZ twinsDZ twins 3 x 2 parameters: 1 mean, 1 variance, 1 covariance for MZ and DZ (also possible to estimate separate means/variances for first- and second-born twins)

MX Mx script can be divided into several “groups” Parameters are estimated in matrices Matrices are defined in groups (locally) Matrices (or matrix elements) can be equated across groups To estimates MZ and DZ correlations I used 3 groups: a data definition group and 2 data groups

First group G1: calculation group Data Calculation NGroups=3 Begin matrices; X dia 2 2 Free! (standard deviation) MZ Y stand 2 2 Free! correlation MZ S dia 2 2 Free! (standard deviation) DZ T stand 2 2 Free! correlation DZ G Full 1 1 free! grand mean phenotypes MZ H Full 1 1 free! grand mean phenotypes DZ End matrices; Script: correlatiejob igf2_mp2_2008.mx Data: mx_igf2_aug08_mp2.dat

Model (matrix notation) Covariances X*Y*X' ;! model for MZs Covariances S*T*S';! model for DZs X dia 2 2 Free! (standard deviation) MZ Y stand 2 2 Free! correlation MZ S dia 2 2 Free! (standard deviation) DZ T stand 2 2 Free! correlation DZ

MX Output: MZ, DZ means MZ, DZ SDs MZ, DZ correlations MeanSDCorr MZ DZ parameters estimated; -2 * log-likelihood of data = What is the heritability of this trait?

ACE Model, based on MZ & DZ data Square = observed phenotype; circle = latent variable (standardized); a, c, e are factor loadings; if fitted to raw data model also includes means

ACE Model (+ Means) MZ twinsDZ twins parameters: means, 3 path coefficients: a, c, e

ADE Model When DZ correlations are much lower than MZ correlations; common environment is unlikely and genetic non-additivity (genetic dominance) may explain the data better

Individual Model & Variation T1= aA1 + cC1 + eE1 T2= aA2 + cC2 + eE2 Var(T1) = a*a*Var(A1) + c*c*var(C1) + e*e*Var(E1) = a 2 +c 2 +e 2 Covar (T1,T2) = a*a*covar(A1,A2) + c*c*var(C1,C2) = αa 2 +γc 2

Fit ACE model (and submodels) Modify the correlation script Or: take ACE igf2_mp2_2008.mx; fits ACE, AE, CE and E models to the data

Saturated and ACE Correlation script: -2LL = (df = 6) ACE model: -2LL = (df = 4) Does the ACE model fit worse?? Why???

Back to correlation model Correlation model: -2LL = (df = 6) Correlation model with equal means and variances for MZ & DZ: -2 LL = (df = 4) ACE model: -2LL = (df = 4) MeanSDMZ / DZ Corr / 0.26 Heritability from ACE model = 0.78% CE (-2LL = ) or E model (-2LL = ) do not fit

Means testing, regression analysis etc. for clustered data (e.g. from twins or sibs) If Ss are unrelated any statistical package can be used for regression analysis, tests of mean differences, estimation of variance. If data come from related Ss (e.g. twins) we need to model the covariance structure between Ss to obtain the correct answer.

MX Mx allows us to model means and covariance structures (for dependent variables) If means are modeled input must be “raw data” (Full Information Maximum Likelihood - FIML) In addition, the user can specify “definition variables” (these are the predictors in a regression equation (= independent variables)) The independent variables are not modeled in the covariance matrix

The likelihood of the i th family x is vector of observed values (dependent variables) for twin1, twin2 etc μ is vector of expected values, given observed independent variables (predictors, regressors) such as age, sex, genotype etc. Σ is the variance covariance matrix of residual values after the regression effects on the expected values have been removed

Testing assumptions (1) Are means same for MZ & DZ? Are SD the same for MZ & DZ? Are means same for men and women? Is there an effect of age?

Using definition variables Dependent variables (phenotypes) may have missing values (e.g ) Independent (definition) variables are NOT allowed to have missing values (if they do, all data for that case (twin pair) are removed)

Individual differences Our ultimate goal is to be able to measure all causal variables so the residual variance approaches zero – except for measurement error. Until that time we have to continue to model variance components in terms of A, C – and E (latent (=unmeasured) constructs). However, if causal variables are also influenced by genes, we want to use multivariate modeling (and not correct the dependent variable)

“Means model” – or preferably model for expected individual values X i = M + B*P i + e i –M = grand mean –B = regression –P = predictor(s), e.g. age and sex –e = residual term i stands for individual (M and B are invariant over individuals) read in the predictor variables in Mx

Importance of getting the means model right Age regression can look like C in twin model (twins are of the same age) If pooling data from 2 sexes, sex differences in means can create C Model grand mean (female) + male deviation Correcting for age, sex effects on means does not mean that residual variance components are necessarily homogeneous between groups – need GxE modeling (later in the week)

Definition variables: age and sex Definition variables cannot be missing, even if dependent variable is missing in FIML - if dependent variable is missing, supply a valid dummy value (doesn’t matter which value, as long as it is not the same as the missing code for the dependent variable!) - if dependent variable is not missing, supply e.g. the population mean for the definition variable, or the co-twin’s value – i.e. impute with care! Or delete data of this person

Mx script for age/sex correction Script = Covar correlatiejob igf2_mp2_2008.mx Data file = mx_igf2_aug08_mp2.dat Or modify your old script Dependent variable is a quantitative methylation score Data were collected on adolescent twins Definition variables: age and sex Are effects of age and sex significant?

Saving residuals Mx allows you to save residuals after baseline run and then use these as input variables for batch runs Option saveres

Including genotypes in the means model Allelic model for SNPs (2 alleles), or genotypic model (3 genotypes (0,1,2 alleles)) For microsatellites with k alleles, k-1 deviations, k(k-1)/2 – 1 deviations! Missing genotypes?  not allowed

Including genotypes in the means model Phenotype = methylation scores Predictors: 1 SNP Coding: 0,1,2 (N of alleles) Script: SNP correlatiejob igf2_mp2_2008.mx Data: mx_igf2_aug08_mp2_noMis.dat OR: modify one of the existing scripts NB slightly different dataset

Including genotypes in the means model Is there evidence that this SNP has a main effect? No