Why a general modeling framework?


A General Modeling Framework for Studying Candidate Genes
Copy files from f:\edwin\example

Why a general modeling framework?
Candidate genes for quantitative traits are usually studied as a “main effect” on the mean.
Genetic advantages of a more extensive modeling framework:
- Some candidate genes may be more likely to be detected
  - One reason is power, e.g. pleiotropic genes are easier to detect in a multivariate study
  - Some genes may not work in a simple “main effect” fashion, e.g. they exert their effects in severely deprived environments only, or influence the sensitivity to environmental fluctuations (variance)
- Correct tests? E.g. different genotypic variances in selected samples

More extensive picture of genetic effects
Substantive advantages of a general modeling framework:
- Shed new light on traditional research questions
  - Continuity, change, and heterotypy
  - Comorbidity/pleiotropy
  - Complex traits: causal mechanisms involving multiple factors
- New issues: the interplay between genotypes and environment
  - Vulnerability, resilience, and protective factors
  - Risk behavior and the construction of favorable environments
  - Sensitivity to environmental fluctuations
Instrumental function due to unique properties

Requirements for the modeling framework
- Genetic effects on the means, variances, and relations between variables
- Stratification effects on all these components
- Nuclear families of various sizes
- Interpretable parameterization
- Di- and multi-allelic loci, marker haplotypes, multiple loci simultaneously, and parental genotypes
- Easy to fit in existing (Mx) software

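LISREL-based model
$\eta_{(s)} = \alpha_{jk(s)} + B_{jk(s)}\,\eta_{(s)} + \Gamma_{jk(s)}\,\xi + \zeta_{jk(s)}$
$y_{(s)} = \nu_{y,jk(s)} + \Lambda_{y,jk(s)}\,\eta_{(s)} + \varepsilon_{y,jk(s)}$
$x = \nu_{x,k} + \Lambda_{x,k}\,\xi + \varepsilon_{x,k}$
y: subject variables
x: family variables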

Alternative models
Conditional model
$\eta_{(s)} = \alpha_{jk(s)} + B_{jk(s)}\,\eta_{(s)} + \Gamma_{jk(s)}\,x_s + \zeta_{jk(s)}$
$y_{(s)} = \nu_{jk(s)} + \Lambda_{jk(s)}\,\eta_{(s)} + K_{jk(s)}\,x_s + \varepsilon_{jk(s)}$
- The x-variables are independent subject plus family variables
- Relaxes the assumption of full multivariate normality
- Allows curvilinear or non-linear effects of the x-variables
Disadvantages:
- Optimization
- Measurement model for the x-variables
Other modeling frameworks

Partitioning parameter matrices
Most matrices are partitioned into:
a) general matrices, not subscripted, that represent the overall model in all genotype groups and population strata
b) genetic matrices, subscripted j, that represent deviations from the general model caused by locus effects
c) stratification matrices, subscripted k, that represent deviations from the general model caused by population stratification

How?
Example: matrix Beta, the causal effects of subject variables on each other
$B_{jk(s)} = B + B_j (g_s \otimes I) + B_k (f \otimes I)$
Main effects are in $B$, which has dimension $n_\eta \times n_\eta$.

Genetic effects are in the term $B_j (g_s \otimes I)$
- The $n_g \times 1$ vector $g_s$ contains the $n_g$ dummy variables coding the genotype (haplotype) of subject s
- These are deviations from $B$, thus the maximum is #genotypes - 1
- Sets of dummy variables can be used to study multiple loci simultaneously or the effects of parental genotypes
- $B_j = [\, B_1 \mid B_2 \mid \dots \mid B_{n_g} \,]$ has dimension $n_\eta \times (n_g \times n_\eta)$, where $B_1$ is the $n_\eta \times n_\eta$ submatrix containing the effects of the first dummy variable, etc.

Example

A1A1 subjects

A1A2 subjects

A2A2 subjects
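
The genotype-specific Beta matrices for the three groups can be sketched numerically. This is only an illustration under assumptions that are not in the slides: two subject variables, A1A1 taken as the reference group with dummy vector (0, 0), A1A2 coded (1, 0), A2A2 coded (0, 1), and made-up effect sizes. The helper names are hypothetical and the stratification term is left out.

```python
# Sketch of B_j(s) = B + B_j (g_s ⊗ I) for a di-allelic locus.
# Dummy coding and all numeric values are assumptions for illustration.

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    p, q = len(b), len(b[0])
    out = [[0.0] * (len(a[0]) * q) for _ in range(len(a) * p)]
    for i, row_a in enumerate(a):
        for j, a_ij in enumerate(row_a):
            for k in range(p):
                for l in range(q):
                    out[i * p + k][j * q + l] = a_ij * b[k][l]
    return out

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def genotype_beta(B, Bj, g):
    """Return B + Bj (g ⊗ I), where g is the list of genotype dummies."""
    g_col = [[float(v)] for v in g]           # n_g x 1 column vector
    selector = kron(g_col, identity(len(B)))  # (n_g * n_eta) x n_eta
    return matadd(B, matmul(Bj, selector))

# General part: y1 affects y2 with beta21 = 0.5 (hypothetical value).
B = [[0.0, 0.0],
     [0.5, 0.0]]
# Bj = [B1 | B2]: deviations of beta21 for the first and second dummy.
Bj = [[0.0, 0.0, 0.0, 0.0],
      [0.25, 0.0, -0.5, 0.0]]

if __name__ == "__main__":
    for label, g in [("A1A1", [0, 0]), ("A1A2", [1, 0]), ("A2A2", [0, 1])]:
        print(label, genotype_beta(B, Bj, g))
```

With these values the effect of y1 on y2 is 0.5 in the reference group, 0.75 in A1A2 subjects, and 0 in A2A2 subjects, which is the kind of genotype-dependent relation that the framework treats as an interaction effect.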

Stratification effects are in the term $B_k (f \otimes I)$
- The $n_f \times 1$ vector $f$ contains the $n_f$ dummy variables used to code family types
- These are deviations, thus the maximum is #family types - 1
- $B_k = [\, B_1 \mid B_2 \mid \dots \mid B_{n_f} \,]$ has dimension $n_\eta \times (n_f \times n_\eta)$, where $B_1$ is the $n_\eta \times n_\eta$ submatrix containing the effects of the first dummy variable, etc.
- $\otimes$ and $I$ select the proper matrix for each dummy variable

Sibling pairs
[Figure: family types F1 to F5 for subjects A and B, each marked as informative or not informative of stratification]

Two parents, one “child”
[Figure: family types F1 to F5 for parents A and B and the subject, each marked as informative or not informative of stratification]

Other matrices are partitioned in the same way

General interpretation
Genetic effects on:
- means are “main” effects
- relations between variables are interaction effects
- residuals are variance effects

Simple example

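[Path diagram and matrices: y1 and y2 with residuals $\zeta_1$ and $\zeta_2$; alpha with general elements $\alpha_1$, $\alpha_2$ and genetic deviations $\alpha_{1(1)}$, $\alpha_{1(2)}$, $\alpha_{2(1)}$, $\alpha_{2(2)}$; beta with off-diagonal elements $\beta_{12}$, $\beta_{21}$ and zeros on the diagonal]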

Interactions

$\beta_{21(1)} > 0$ and $\beta_{21(2)} = 0$

$\beta_{21(1)} > 0$ and $\beta_{21(2)} > 0$

Estimation and specification in Mx

Expected means and covariances single subject
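
The expressions on this slide did not survive in the transcript. As a hedged reconstruction, the standard reduced form of the conditional model gives the moments below. The subscripts jk(s) are dropped for readability, and the symbols $\Psi$ (covariance matrix of $\zeta$) and $\Theta$ (covariance matrix of $\varepsilon$) are assumptions that do not appear elsewhere in the transcript. The derivation also assumes that $I - B$ is nonsingular and that $\zeta$ and $\varepsilon$ are uncorrelated with each other and with $x_s$.

```latex
\begin{align*}
\eta_{(s)} &= (I - B)^{-1}\left(\alpha + \Gamma x_s + \zeta\right) \\
\mathrm{E}\left[y_{(s)} \mid x_s\right]
  &= \nu + \Lambda (I - B)^{-1}\left(\alpha + \Gamma x_s\right) + K x_s \\
\mathrm{Cov}\left[y_{(s)} \mid x_s\right]
  &= \Lambda (I - B)^{-1}\, \Psi\, (I - B)^{-\top} \Lambda^{\top} + \Theta
\end{align*}
```

The first line solves the structural equation for $\eta_{(s)}$; substituting it into the equation for $y_{(s)}$ gives the mean and covariance.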

Expected means and covariances whole family

Maximize the log-likelihood function given the observed data by raw maximum likelihood, where the log-likelihood is built up from the individual log-likelihoods.
Minus two times the difference between the log-likelihoods of two nested models is chi-square distributed, with the difference in the number of estimated parameters as the degrees of freedom.
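
The nested-model test can be sketched as a short calculation. This is a minimal sketch, not part of Mx or MxScript: the function names are hypothetical and the log-likelihoods and parameter counts in the usage example are made up. The chi-square tail probability uses the closed-form series for integer degrees of freedom, so no external library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in chi = sqrt(x)
    chi = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total

def likelihood_ratio_test(loglik_restricted, loglik_full,
                          n_par_restricted, n_par_full):
    """Return (statistic, df, p-value) for two nested models."""
    statistic = -2.0 * (loglik_restricted - loglik_full)
    df = n_par_full - n_par_restricted
    if df < 1:
        raise ValueError("the full model must have more free parameters")
    return statistic, df, chi2_sf(max(statistic, 0.0), df)

if __name__ == "__main__":
    # Hypothetical fits: the restricted model drops two genetic effects.
    print(likelihood_ratio_test(-1005.0, -1002.0, 10, 12))
```

In practice the restricted model would be the one with the genetic or stratification parameters fixed to zero, and the full model the one in which they are free.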

Specification
- In most instances, specification is a selection of matrices
- Dimensioning the matrices is boring and error-prone
- Get started
Therefore a simple program: batch or questions

MxScript
Data structure
- Number of (latent) subject variables?
- Number of subjects in largest family?
- Number of dummy variables for genotypes?
Matrices to be used
- Do the subject variables have causal effects on each other? BETA?
- GENETIC: causal relations between subject variables? BETA?
- STRATIFICATION: means of subject variables? ALPHA?
File names
- Name of file with your data? (DOS name)
- Name of the file for the Mx script? (DOS name)

Structure of the Mx script
In most instances there are four groups:

Group  Function                Free parameters  Starting values
1      General part            yes              yes
2      Genetic effects         yes
3      Stratification effects  yes
4      Fit model to data

Type from DOS-prompt: MxScript <ENTER>
Type from DOS-prompt: MxScript input.dat <ENTER>

Example
- Name of data file: example.dat
- Sibling pairs, no parents
- Three genotype groups
- Family variables in data file (indicate that you want to specify admixture effects)
- Starting values: sample drawn from a multivariate distribution with means 0 and variances 1.5

General part
[Path diagram: latent variable exercise with indicators Intensity and Duration; latent variable BMD with indicators Arm, Spine, and Hip]

Identification measurement model:

[Path diagram: exercise (Intensity, Duration) and BMD (Arm, Spine, Hip), with genetic + stratification effects acting either on BMD (common pathway?) or directly on its indicators (independent pathway?)]

Tests
Common pathway: estimate the model with genetic and stratification effects on the means of the second latent variable and test for significance of:
- Genetic effects
- Stratification effects
- Genetic + stratification effects
Independent pathway: estimate the model with genetic and stratification effects on the means of the indicators of the second latent variable and test for significance of:

Free elements
a Full 2 1 Free        [Matrices-End matrices section]
Free a 1 1 a 2 1       [After End matrices - free elements]
Free a 1 1 to a 2 1    [After End matrices - free range]

Copy files from f:\edwin\solution