A General Modeling Framework for Studying Candidate Genes
Copy files from f:\edwin\example
Why a general modeling framework?
Candidate genes for quantitative traits are usually studied as a "main effect" on the mean.
Genetic advantage of a more extensive modeling framework:
- Some candidate genes may be more likely to be detected. One reason is power, e.g. pleiotropic genes are easier to detect in a multivariate study.
- Some genes may not work in a simple "main effect" fashion, e.g. they exert their effects in severely deprived environments only, or influence the sensitivity to environmental fluctuations (variance).
- Correct tests? e.g. genotypic variances may differ in selected samples.
Substantive advantage of a general modeling framework
More extensive picture of genetic effects
- Shed new light on traditional research questions
  - Continuity, change, and heterotypy
  - Comorbidity/pleiotropy
  - Complex traits: causal mechanisms involving multiple factors
- New issues: the interplay between genotypes and environment
  - Vulnerability, resilience, and protective factors
  - Risk behavior and the construction of favorable environments
  - Sensitivity to environmental fluctuations
- Instrumental function due to unique properties
Requirements for the modeling framework
- Genetic effects on the means, variances, and relations between variables
- Stratification effects on all these components
- Nuclear families of various sizes
- Interpretable parameterization
- Di- and multi-allelic loci, marker haplotypes, multiple loci simultaneously, and parental genotypes
- Easy to fit in existing (Mx) software
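LISREL based model
$\eta_{(s)} = \alpha_{jk(s)} + B_{jk(s)}\eta_{(s)} + \Gamma_{jk(s)}\xi + \zeta_{jk(s)}$
$y_{(s)} = \nu_{y,jk(s)} + \Lambda_{y,jk(s)}\eta_{(s)} + \epsilon_{y,jk(s)}$
$x = \nu_{x,k} + \Lambda_{x,k}\xi + \epsilon_{x,k}$
y: subject variables
x: family variables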
Alternative Models
Conditional model
$\eta_{(s)} = \alpha_{jk(s)} + B_{jk(s)}\eta_{(s)} + \Gamma_{jk(s)}x_s + \zeta_{jk(s)}$
$y_{(s)} = \nu_{jk(s)} + \Lambda_{jk(s)}\eta_{(s)} + K_{jk(s)}x_s + \epsilon_{jk(s)}$
The x-variables are the independent subject plus family variables.
Advantages:
- Relaxes the assumption of full multivariate normality
- Allows curvilinear or non-linear effects of the x-variables
Disadvantages:
- Optimization
- Measurement model for the x-variables
Other modeling frameworks
Partitioning parameter matrices
Most matrices are partitioned into:
a) general matrices that are not subscripted and represent the overall model in all genotype groups and population strata
b) genetic matrices that are subscripted j and represent deviations from the general model caused by locus effects
c) matrices that are subscripted k and represent deviations from the general model caused by population stratification
How? Example matrix Beta: causal effects of subject variables on each other
$B_{jk(s)} = B + B_j (g_s \otimes I) + B_k (f \otimes I)$
Main effects are in $B$, which has dimension $n_\eta \times n_\eta$.
Genetic effects in term $B_j (g_s \otimes I)$
- The $n_g \times 1$ vector $g_s$ contains the $n_g$ dummy variables coding the genotype (haplotype) of subject s.
- The effects are deviations from $B$, thus maximum $n_g$ = #genotypes - 1.
- Sets of dummy variables can be used to study multiple loci simultaneously or effects of parental genotypes.
- $B_j = [B_1 | B_2 | … | B_{n_g}]$ has dimension $n_\eta \times (n_g n_\eta)$, where $B_1$ is the $n_\eta \times n_\eta$ submatrix containing the effects of the first dummy variable, etc.
Example
A1A1 subjects
A1A2 subjects
A2A2 subjects
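The three genotype groups can be written out from the partitioning $B_{jk(s)} = B + B_j (g_s \otimes I) + B_k (f \otimes I)$. The worked case below is a sketch under two assumptions that the slides do not state: A1A1 is the reference category coded by zeros on both dummy variables, and stratification effects are set aside ($f = 0$).

```latex
% Assumed dummy coding: A1A1 -> g_s = (0,0)', A1A2 -> g_s = (1,0)', A2A2 -> g_s = (0,1)'
% With B_j = [B_1 | B_2], the product B_j (g_s \otimes I) picks out one submatrix.
\begin{aligned}
\text{A1A1:}\quad & B_{jk(s)} = B \\
\text{A1A2:}\quad & B_{jk(s)} = B + B_1 \\
\text{A2A2:}\quad & B_{jk(s)} = B + B_2
\end{aligned}
```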
Stratification effects in term $B_k (f \otimes I)$
- The $n_f \times 1$ vector $f$ contains the $n_f$ dummy variables used to code family types.
- The effects are deviations, thus maximum $n_f$ = #family types - 1.
- $B_k = [B_1 | B_2 | … | B_{n_f}]$ has dimension $n_\eta \times (n_f n_\eta)$, where $B_1$ is the $n_\eta \times n_\eta$ submatrix containing the effects of the first dummy variable, etc.
- The Kronecker product with $I$ selects the proper submatrix for each dummy variable.
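The partitioning of Beta into a general part, a genetic part, and a stratification part can be checked numerically. The following is a minimal pure-Python sketch, not the Mx implementation: the function names and all numeric values are hypothetical, and it assumes two subject variables, two genotype dummy variables, and one family-type dummy variable.

```python
def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a)) for k in range(len(b))
    ]

def matmul(a, b):
    """Ordinary matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def add(*mats):
    """Elementwise sum of equally sized matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) for j in range(cols)] for i in range(rows)]

def beta_for_subject(B, Bj, Bk, g, f):
    """B + Bj (g kron I) + Bk (f kron I) for one subject.

    g: genotype dummy variables of the subject (length ng)
    f: family-type dummy variables (length nf)
    """
    I = identity(len(B))
    g_col = [[float(v)] for v in g]   # ng x 1 column vector
    f_col = [[float(v)] for v in f]   # nf x 1 column vector
    return add(B, matmul(Bj, kron(g_col, I)), matmul(Bk, kron(f_col, I)))

# Hypothetical values: general part, then [B1 | B2] for two genotype dummies,
# then a single submatrix for one family-type dummy.
B = [[0.0, 0.1], [0.5, 0.0]]
Bj = [[0.0, 0.0, 0.0, 0.0], [0.2, 0.0, -0.5, 0.0]]
Bk = [[0.0, 0.0], [0.1, 0.0]]

reference = beta_for_subject(B, Bj, Bk, g=[0, 0], f=[0])    # general part only
first_dummy = beta_for_subject(B, Bj, Bk, g=[1, 0], f=[0])  # adds B1
second_dummy = beta_for_subject(B, Bj, Bk, g=[0, 1], f=[1]) # adds B2 and the family effect
```

Because each dummy vector is multiplied by an identity matrix, a dummy value of 1 switches on exactly one submatrix of deviations, and a vector of zeros leaves the general matrix unchanged.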
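Sibling pairs
[Figure: family types F1 to F5 formed from the genotypes of siblings A and B, with labels marking which subjects are informative and which are not informative of stratification.]
Two parents, one "child"
[Figure: family types F1 to F5 formed from the genotypes of parents A and B and the subject, with labels marking which subjects are informative and which are not informative of stratification.]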
Other matrices are partitioned in the same way
General interpretation
Genetic effects on:
- means are "main" effects
- relations between variables are interaction effects
- residuals are variance effects
Simple example
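[Path diagram: $y_1$ and $y_2$ with residuals $\zeta_1$ and $\zeta_2$.]
$\alpha_j = \begin{pmatrix} \alpha_1(1) & \alpha_1(2) \\ \alpha_2(1) & \alpha_2(2) \end{pmatrix}$, $\alpha = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}$, $B = \begin{pmatrix} 0 & \beta_{12} \\ \beta_{21} & 0 \end{pmatrix}$
Interactions
$\beta_{21}(1) > 0$ and $\beta_{21}(2) = 0$
$\beta_{21}(1)$ and $\beta_{21}(2) > 0$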
Estimation and specification in Mx
Expected means and covariances single subject
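For orientation, the standard LISREL expressions for the expected means and covariances of the y-variables of one subject are sketched below. This is an assumption-laden sketch rather than the slide's own formula: it leaves out the $\Gamma$ term, and the symbols $\Psi$ (covariance matrix of $\zeta$) and $\Theta$ (covariance matrix of $\epsilon$) are introduced here for illustration and do not appear in the source.

```latex
% Sketch for a single subject, ignoring the \Gamma term.
% \Psi = Cov(\zeta) and \Theta = Cov(\epsilon) are assumed symbols.
\begin{aligned}
\mu_{(s)}    &= \nu_{jk(s)} + \Lambda_{jk(s)} (I - B_{jk(s)})^{-1} \alpha_{jk(s)} \\
\Sigma_{(s)} &= \Lambda_{jk(s)} (I - B_{jk(s)})^{-1} \Psi_{jk(s)}
                (I - B_{jk(s)})^{-1\prime} \Lambda_{jk(s)}^{\prime} + \Theta_{jk(s)}
\end{aligned}
```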
Expected means and covariances whole family
The log-likelihood function is maximized given the observed data by raw maximum likelihood, as the sum of the individual log-likelihoods.
Minus two times the difference between the log-likelihoods of two nested models is asymptotically chi-square distributed, with the difference in the number of estimated parameters as the degrees of freedom.
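The nested-model comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of Mx or MxScript: the function names and the fit values in the example are hypothetical, and the inputs are taken to be the minus-two log-likelihoods of the two fitted models. The chi-square tail probability is computed with the exact recurrence over the degrees of freedom, so only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1   # closed form for df = 1
    else:
        q, k = math.exp(-half), 2              # closed form for df = 2
    while k < df:
        # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def likelihood_ratio_test(m2ll_restricted, m2ll_full, n_par_restricted, n_par_full):
    """Compare two nested models from their -2 log-likelihoods."""
    statistic = m2ll_restricted - m2ll_full   # -2 * (lnL_restricted - lnL_full)
    df = n_par_full - n_par_restricted        # difference in estimated parameters
    return statistic, df, chi2_sf(statistic, df)

# Hypothetical fit values: dropping two genetic parameters raises -2lnL by 6.0.
statistic, df, p_value = likelihood_ratio_test(1000.0, 994.0, 10, 12)
```

A small p-value indicates that the restricted model (for example, one without the genetic effects) fits significantly worse than the full model.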
Specification
- In most instances, specification is a selection of matrices
- Dimensioning the matrices is boring and error-prone
- Help to get started
Therefore: a simple program, run in batch mode or by answering questions
MxScript
Data structure
- Number of (latent) subject variables?
- Number of subjects in largest family?
- Number of dummy variables for genotypes?
Matrices to be used
- Do the subject variables have causal effects on each other? BETA?
- GENETIC: causal relations between subject variables? BETA?
- STRATIFICATION: means of subject variables? ALPHA?
File names
- Name of file with your data? (DOS name)
- Name of the file for the Mx script? (DOS name)
Structure of the Mx script
In most instances there are four groups:

Group  Function                Free parameters  Starting values
1      General part            yes              yes
2      Genetic effects         yes
3      Stratification effects  yes
4      Fit model to data

Type from DOS-prompt: MxScript <ENTER>
Type from DOS-prompt: MxScript input.dat <ENTER>
Example
- Name of data file: example.dat
- Sibling pairs, no parents
- Three genotype groups
- Family variables in data file (indicate that you want to specify admixture effects)
- Starting values: sample drawn from a multivariate distribution with means 0 and variances 1.5
General part
[Path diagram: latent variables exercise and BMD, with observed variables Intensity, Duration, Arm, Spine, and Hip.]
Identification measurement model:
Genetic + Stratification effects
[Path diagram: latent variables exercise and BMD, with observed variables Intensity, Duration, Arm, Spine, and Hip.]
Common pathway? Independent pathway?
Tests
Common pathway: estimate the model with genetic and stratification effects on the means of the second latent variable and test for significance of:
- Genetic effects
- Stratification effects
- Genetic + stratification effects
Independent pathway: estimate the model with genetic and stratification effects on the means of the indicators of the second latent variable and test for significance of:
Free elements
a Full 2 1 Free        [Matrices-End matrices section]
Free a 1 1 a 2 1       [After End matrices - free elements]
Free a 1 1 to a 2 1    [After End matrices - free range]
Copy files from f:\edwin\solution