Published by Tapani Saaristo. Modified over 5 years ago.
1
Simulating and Modeling Genetically Informative Data
Matthew C. Keller, Sarah E. Medland
2
Outline
The usefulness of simulation in behavioral genetics
Using GeneEvolve to simulate genetically informative data
Practical: simulating different designs
  Classical Twin Design (CTD)
  Nuclear Twin Family Design (NTFD)
3
Simulation provides knowledge about processes that are difficult or impossible to work out analytically
Independent check of models. Especially important for complex (e.g., extended twin family) models.
  Model verification: check that your models work as they are supposed to.
  Sensitivity analysis: check the effect on parameter estimates when assumptions are violated (e.g., different modes of assortative mating, vertical transmission, or genetic action).
Method for predicting complex dynamics in population genetics
4
Using complex models without independent verification (e.g., simulation) is like…
5
Process of model verification
1. Simulate a dataset that has parameters that your model can estimate.
2. Run your model on the simulated dataset.
3. Obtain and store the parameter estimates.
4. Repeat steps 1-3 many (e.g., 1000) times.
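The verification loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "model" here is Falconer's formulas applied to MZ and DZ twin correlations, standing in for a full structural equation model, and the function names (`simulate_pairs`, `falconer`, `verify`) and the default values are hypothetical, not part of GeneEvolve.

```python
import random

def pearson(pairs):
    """Pearson correlation between the two members of each pair."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

def simulate_pairs(n_pairs, a2, c2, r_a, rng):
    """Simulate twin pairs under an ACE model with unit total variance.
    r_a is the additive genetic correlation (1.0 for MZ, 0.5 for DZ)."""
    e2 = 1.0 - a2 - c2
    pairs = []
    for _ in range(n_pairs):
        shared_a = rng.gauss(0, 1)  # additive part common to both twins
        c = rng.gauss(0, 1)         # shared environment, identical within a pair
        twins = []
        for _ in range(2):
            a = r_a ** 0.5 * shared_a + (1 - r_a) ** 0.5 * rng.gauss(0, 1)
            e = rng.gauss(0, 1)     # unique environment
            twins.append(a2 ** 0.5 * a + c2 ** 0.5 * c + e2 ** 0.5 * e)
        pairs.append((twins[0], twins[1]))
    return pairs

def falconer(mz, dz):
    """Stand-in 'model': Falconer estimates of a2 and c2 from twin correlations."""
    r_mz, r_dz = pearson(mz), pearson(dz)
    return 2 * (r_mz - r_dz), 2 * r_dz - r_mz

def verify(n_reps=200, n_pairs=400, a2=0.5, c2=0.2, seed=1):
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):                             # step 4: repeat
        mz = simulate_pairs(n_pairs, a2, c2, 1.0, rng)  # step 1: simulate
        dz = simulate_pairs(n_pairs, a2, c2, 0.5, rng)
        estimates.append(falconer(mz, dz))              # steps 2-3: fit and store
    return estimates

estimates = verify()
mean_a2 = sum(a for a, _ in estimates) / len(estimates)
mean_c2 = sum(c for _, c in estimates) / len(estimates)
print(f"mean a2 = {mean_a2:.3f}, mean c2 = {mean_c2:.3f}")
```

Because the simulated data satisfy the estimator's assumptions, the mean estimates should sit close to the simulated values of 0.5 and 0.2.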
6
Results of model verification
If the mean parameter estimate = the simulated (true) parameter value, the estimate is unbiased. If your model has no mistakes, parameters should generally be unbiased (there are exceptions).
The standard deviation of an estimate across replicates corresponds to its standard error, and its distribution to its sampling distribution.
You can also easily study the multivariate sampling distribution and statistics, e.g., how correlated the parameter estimates are.
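A minimal sketch of how the stored estimates might be summarized: bias as the mean estimate minus the simulated value, the empirical standard error as the standard deviation across replicates, and the correlation between two parameters' estimates. The helper names and the four toy values are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from any simulation.

```python
import statistics

def summarize(estimates, true_value):
    """Bias and empirical standard error of one parameter across replicates."""
    return {
        "bias": statistics.mean(estimates) - true_value,
        "empirical_se": statistics.stdev(estimates),
    }

def correlation(xs, ys):
    """Pearson correlation between two parameters' estimates across replicates."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Toy numbers for illustration only (NOT output of a real simulation).
a2_hat = [0.48, 0.55, 0.45, 0.52]
c2_hat = [0.21, 0.16, 0.24, 0.19]

a2_summary = summarize(a2_hat, true_value=0.5)
c2_summary = summarize(c2_hat, true_value=0.2)
r_ac = correlation(a2_hat, c2_hat)
print(a2_summary, c2_summary, round(r_ac, 2))
```

In a real run the lists would hold one entry per replicate (e.g., 1000), and a strongly negative correlation like the one in these toy values would indicate that the model trades one parameter off against the other.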
7
Process of sensitivity analysis
1. Simulate a dataset that has one or more parameters that your model cannot estimate.
2. Run your model on the simulated dataset.
3. Obtain and store the parameter estimates.
4. Repeat steps 1-3 many (e.g., 1000) times.
8
Results of sensitivity analysis
Because we are simulating violations of assumptions, we expect parameters to be biased. The question becomes: how biased? I.e., how big of a deal are these violations? We should be able to quantify the answers to these questions.
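To make "how biased?" concrete, here is a small sketch that simulates twin data with dominance (D) present and then applies an estimator that assumes D = 0. As an assumption for illustration, Falconer's formulas stand in for an ACE structural model, the simulated values A = .40, D = .15, S = .15 are used as the "reality", and all function names are hypothetical.

```python
import random

def pearson(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

def simulate_pairs(n_pairs, components, rng):
    """components: list of (variance, twin_correlation); variances sum to 1."""
    pairs = []
    for _ in range(n_pairs):
        t1 = t2 = 0.0
        for var, r in components:
            shared = rng.gauss(0, 1)  # part of this component common to both twins
            t1 += var ** 0.5 * (r ** 0.5 * shared + (1 - r) ** 0.5 * rng.gauss(0, 1))
            t2 += var ** 0.5 * (r ** 0.5 * shared + (1 - r) ** 0.5 * rng.gauss(0, 1))
        pairs.append((t1, t2))
    return pairs

A, D, S = 0.40, 0.15, 0.15   # simulated "reality"
E = 1.0 - A - D - S
mz_components = [(A, 1.0), (D, 1.0), (S, 1.0), (E, 0.0)]
dz_components = [(A, 0.5), (D, 0.25), (S, 1.0), (E, 0.0)]

rng = random.Random(2)
a2_hats, c2_hats = [], []
for _ in range(100):                      # replicate the simulation
    r_mz = pearson(simulate_pairs(400, mz_components, rng))
    r_dz = pearson(simulate_pairs(400, dz_components, rng))
    a2_hats.append(2 * (r_mz - r_dz))     # estimator that assumes D = 0
    c2_hats.append(2 * r_dz - r_mz)

mean_a2 = sum(a2_hats) / len(a2_hats)
mean_c2 = sum(c2_hats) / len(c2_hats)
print(f"A simulated {A}, mean estimate {mean_a2:.3f}")
print(f"S simulated {S}, mean shared-environment estimate {mean_c2:.3f}")
```

Under these assumptions the expected estimates are A + 1.5 D = .625 for the additive component and S - 0.5 D = .075 for the shared environment, so the run quantifies the direction and size of the bias caused by ignoring D.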
9
Reality: A=.4, D=.15, S=.15
10
A, D, & F estimates are highly correlated in Stealth & Cascade models
11
Simulation is not a panacea
Simulation can be said to provide “knowledge without understanding.” It is a helpful tool for understanding, but doesn’t provide understanding in and of itself. Simulations themselves rely on assumptions about how processes work. If these are wrong, our simulation results may not reflect reality.
12
Simulation program: GeneEvolve
13
GeneEvolve 0.73
Implemented in R, open-source, user modifiable
User specifies 31 basic parameters up front (and 17 advanced ones); no need to alter the script after that.
Fast: with 10 genes and N=20,000, takes ~20 seconds per generation (on an AMD Opteron 3.2GHz dual 64-bit processor, 2GB RAM; OS = RHEL AS4)
Download:
14
How GeneEvolve works: User specifies:
population size, # generations for population to evolve, threshold effects, mechanisms of assortative mating, vertical transmission, etc.
3 types of genetic effects
5 types of environmental effects
13 types of moderator/covariate effects
15
Diagram of GeneEvolve Model
[Figure: path diagram of the GeneEvolve model. Phenotypes: PFa, PMa, PT1, PT2. Latent factors: A, AxA, D, F, S, E. Path labels: a, aa, d, f, s, e, m, w, q, x, zs, zd.]
16
Diagram: GeneEvolve Age-by-A Interactions
[Figure: side-by-side path diagrams labeled "Purcell Model" and "Our Model" for a phenotype P. Labels appearing in the diagrams: A, AL, AS, r, age, βA + βint(age), βage, βAL, βAS, β0.]
Open Purcell-vs-Ours.pdf; Purcell-vs-OursCorrelation.pdf
17
How GeneEvolve works (cont):
At adulthood, ~x% find mates such that the phenotypic correlation between mating phenotypes = AM
Pairs have children: rate determined by user-specified population growth
Process iterated n times
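One way to pair mates so that their phenotypes correlate at roughly AM is rank matching on a noisy target. This is a sketch of a generic technique, offered as an assumption for illustration; the slides do not say how GeneEvolve itself forms pairs, and `assortative_pairs` is a hypothetical name.

```python
import random

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def assortative_pairs(males, females, am, rng):
    """Pair two equal-length lists of phenotypes so mates correlate at about `am`."""
    n = len(males)
    mean_m = sum(males) / n
    sd_m = (sum((x - mean_m) ** 2 for x in males) / (n - 1)) ** 0.5
    # Each male gets a noisy "target" that correlates `am` with his own phenotype.
    targets = [am * (x - mean_m) / sd_m + (1 - am ** 2) ** 0.5 * rng.gauss(0, 1)
               for x in males]
    # Rank matching: k-th lowest target is paired with k-th lowest female phenotype.
    male_order = sorted(range(n), key=lambda i: targets[i])
    female_order = sorted(range(n), key=lambda j: females[j])
    return [(males[i], females[j]) for i, j in zip(male_order, female_order)]

rng = random.Random(3)
males = [rng.gauss(0, 1) for _ in range(10000)]
females = [rng.gauss(0, 1) for _ in range(10000)]
pairs = assortative_pairs(males, females, am=0.3, rng=rng)
r_mates = pearson([m for m, _ in pairs], [f for _, f in pairs])
print(f"mate correlation = {r_mates:.3f}")
```

The noise term is what keeps the match imperfect: with am = 1 the pairing would be purely by rank, and with am = 0 it would be random with respect to the male phenotype.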
18
How GeneEvolve works (cont):
After n iterations, population split into two:
  Parents of spouses
  Parents of twins
Parents of twins have offspring (MZ/DZ twins & their sibs)
Twins mate with spousal population & have offspring
19
What you get:
3 generations of phenotypic data written out (one row per family), potentially across repeated measures
These data (& subsets of them) can be entered into structural models for model verification and sensitivity analysis
A summary PDF at end shows:
  Basic simulation statistics
  Changes in variance components across time
  Correlations between 10 relative types
23
Structural Equation Modeling (SEM) in BG
SEM is great because…
  Directs focus to effect sizes, not “significance”
  Forces consideration of causes and consequences
  Explicit disclosure of assumptions
Potential weakness…
  Parameter reification: “Using the CTD we found that 50% of variation is due to A and 20% to C.”
  NO! Only true under strong assumptions that probably aren’t met (e.g., D=0) and usually go untested. To the degree that the assumptions are wrong, estimates are biased.
24
Classical Twin Design (CTD)
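[Figure: CTD path diagram. Phenotypes PT1 and PT2, each with latent factors A, C, D, E and paths a, c, d, e. Cross-twin correlations shown: 1, 1/.25.]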
25
Classical Twin Design (CTD)
Assumption               biased up    biased down
Either D or C is zero    A            C & D
No assortative mating    C            D
No A-C covariance        C            D & A
[Figure: CTD path diagram, repeated from the previous slide.]
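The "Either D or C is zero" assumption follows from the standard biometrical expectations for twin data, written here with the path coefficients a, c, d, e from the diagram. As an assumption for this sketch, the A factors are taken to correlate 1 in MZ and 1/2 in DZ pairs (random mating), and the 1/.25 label is read as the MZ/DZ correlation between the D factors.

```latex
\begin{align}
\mathrm{Var}(P)    &= a^2 + c^2 + d^2 + e^2 \\
\mathrm{Cov}_{MZ}  &= a^2 + c^2 + d^2 \\
\mathrm{Cov}_{DZ}  &= \tfrac{1}{2}a^2 + c^2 + \tfrac{1}{4}d^2
\end{align}
```

Twins alone supply three observed statistics (one variance and two covariances) for four unknowns, so either c or d has to be fixed to zero before the model is identified.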
26
Adding parents gets us around all these assumptions
Assumptions we don’t have to make:
  Either D or C is zero
  No assortative mating
  No A-C covariance
[Figure: path diagram adding parents (PFa, PMa) to the twins (PT1, PT2). Latent factors A, C, D, E; path labels a, c, d, e, m, q, w, x; correlation label 1/.25.]
27
Parents also allow differentiation of S & F
With parents, we can break “C” up into:
  S = env. factors shared only between sibs
  F = familial env. factors passed from parents to offspring
[Figure: two twin path diagrams (PT1, PT2), one with a single C factor (paths a, c, d, e) and one with C split into S and F (paths a, s, f, d, e). Correlations shown: 1, 1/.25.]
28
Nuclear Twin Family Design (NTFD)
[Figure: NTFD path diagram. Phenotypes PFa, PMa, PT1, PT2; latent factors A, D, S, F, E; path labels a, d, s, f, e, m, q, w, x, zd, zs.]
Note: m estimated and f fixed to 1
Assumptions:
  Can estimate only 3 of the 4 components A, D, S, and F (bias is variable)
  Assortative mating is due to primary phenotypic assortment (bias is variable)
29
Stealth
Include twins and their sibs, parents, spouses, and offspring…
Gives 17 unique covariances (MZ, DZ, Sib, P-O, Spousal, MZ avunc, DZ avunc, MZ cous, DZ cous, GP-GO, and 7 in-laws)
88 covariances with sex effects
30
Additional obs. covs with Stealth allow estimation of A, S, D, F, T
A, S, D, F, and T can be estimated simultaneously
T = env. factors shared only between twins
[Figure: twin path diagram (PT1, PT2) with latent factors A, D, S, F, T, E and paths a, d, s, f, t, e. Correlations shown: 1, 1/.25, 1/0.]
(Remember: we’re not just estimating more effects. More importantly, we’re reducing the bias in estimated effects!)
31
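Stealth
[Figure: Stealth path diagram. Phenotypes PFa, PMa, PT1, PT2, PCh; latent factors A, D, S, F, T, E; path labels a, d, s, f, t, e, m, q, w, x; correlation labels 1/0, 1/.25.]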
33
Stealth
Assumption                    biased up     biased down
Primary assortative mating    A, D, or F    A, D, or F
No epistasis                  A, D          S
No AxAge                      D, S          A
Primary AM: mates choose each other based on phenotypic similarity
Social homogamy: mates choose each other due to environmental similarity (e.g., religion)
Convergence: mates become more similar to each other (e.g., becoming more conservative when dating a conservative)
34
Cascade
[Figure: Cascade path diagram. Phenotypes PFa, PMa, PSp, PT1, PT2, PCh; latent factors A, D, S, F, T, E; path labels a, d, s, f, t, e, m, q, w, x (some marked with ~); correlation labels 1, 1/.25, 1/0.]
35
Reality: A=.5, D=.2
36
Reality: A=.5, S=.2
37
Reality: A=.4, D=.15, S=.15
38
Reality: A=.35, D=.15, F=.2, S=.15, T=.15, AM=.3
39
Reality: A=.45, D=.15, F=.25, AM=.3 (Soc Hom)
40
Reality: A=.4, A*A=.15, S=.15
41
Reality: A=.4, A*Age=.15, S=.15
42
Conclusions
All models require assumptions. Generally, more assumptions = more biased estimates
For the first time, we have demonstrated independent assessments of the NTFD, Stealth, and Cascade models
  These complicated models work as designed!
In all models, but especially the CTD, please don’t REIFY A, C, & D!
43
Acknowledgments
Those who conceived of these models originally: Jinks, Fulker, Eaves, Cloninger, Reich, Rice, Heath, Neale, Maes, etc.
And to Nick Martin: for his energy and enthusiasm, and for encouraging us to do this to begin with
44
Why use it? Modeling aid
Check bias & identification:
  Feed PE the parameters you are modeling, simulate data, & see if your model recovers the parameters
Check model’s sensitivity to assumptions:
  Simulate violations of assumptions & note their effects on estimates
Estimate power & multivariate sampling dist’s of estimates under very general conditions:
  Run PE multiple times given whatever condition you want
45
Why use it? Predictor of population / evolutionary genetics dynamics
Find changes in variance parameters & relative covariances under different modes of AM, VT, & genetic effects
Simulate random genetic drift by varying population size
Introduce selection (coming) to test theories on maintenance of genetic variation
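The link between population size and drift can be illustrated with the textbook Wright-Fisher model, in which each generation's allele frequency is a binomial sample of 2N gene copies from the previous generation. This is a generic sketch and an assumption on my part; it is not a description of how GeneEvolve simulates its populations, and the function names are hypothetical.

```python
import random
import statistics

def drift(pop_size, generations, p0, rng):
    """Allele frequency after repeated binomial sampling of 2N gene copies."""
    p = p0
    copies = 2 * pop_size
    for _ in range(generations):
        # Each gene copy in the next generation carries the allele with probability p.
        p = sum(rng.random() < p for _ in range(copies)) / copies
    return p

def final_freq_variance(pop_size, n_reps=100, generations=20, p0=0.5, seed=1):
    """Variance of the final allele frequency across independent replicate populations."""
    rng = random.Random(seed)
    finals = [drift(pop_size, generations, p0, rng) for _ in range(n_reps)]
    return statistics.variance(finals)

var_small = final_freq_variance(pop_size=20)
var_large = final_freq_variance(pop_size=500)
print(f"variance of final frequency: N=20 -> {var_small:.4f}, N=500 -> {var_large:.4f}")
```

Replicate populations of 20 individuals wander much further from the starting frequency than populations of 500, which is the sense in which varying population size lets a simulation dial drift up or down.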