Download presentation
Presentation is loading. Please wait.
1
Summarizing Variation Michael C Neale PhD Virginia Institute for Psychiatric and Behavioral Genetics Virginia Commonwealth University
2
Overview )Mean )Variance )Covariance )Not always necessary/desirable
3
Computing Mean Formula E (x i )/N )Can compute with 5Pencil 5Calculator 5SAS 5SPSS 5Mx
4
One Coin toss 2 outcomes HeadsTails Outc ome 0 0.1 0.2 0.3 0.4 0.5 0.6 Probab ility
5
Two Coin toss 3 outcomes HHHT/THTT Outc ome 0 0.1 0.2 0.3 0.4 0.5 0.6 Probab ility
6
Four Coin toss 5 outcomes HHHHHHHTHHTTHTTTTTTT Outc ome 0 0.1 0.2 0.3 0.4 Probab ility
7
Ten Coin toss 9 outcomes Outc ome 0 0.05 0.1 0.15 0.2 0.25 0.3 Probab ility
8
Pascal's Triangle Pascal's friendChevalier de Mere 1654; Huygens 1657; Cardan 1501-1576 1 1 2 1 1 3 3 1 1 4 6 4 1 1 5 10 10 5 1 1 6 15 20 15 6 1 1 7 21 35 35 21 7 1 1/1 1/2 1/4 1/8 1/16 1/32 1/64 1/128 Frequency Probability
9
Fort Knox Toss 01234-2-3-4 Heads-Tails 0 0.1 0.2 0.3 0.4 0.5 Gauss 1827 Series 1 Infinite outcomes
10
Variance )Measure of Spread )Easily calculated )Individual differences
11
Average squared deviation Normal distribution 0123-2-3 : xixi didi Variance = G d i 2 /N
12
Measuring Variation )Absolute differences? )Squared differences? )Absolute cubed? )Squared squared? Weighs & Means
13
Measuring Variation )Squared differences Ways & Means Fisher (1922) Squared has minimum variance under normal distribution
14
Covariance ) Measure of association between two variables ) Closely related to variance ) Useful to partition variance
15
Deviations in two dimensions :x:x :y:y
16
:x:x :y:y dx dy
17
Measuring Covariation )A square, perimeter 4 )Area 1 Area of a rectangle 1 1
18
Measuring Covariation )A skinny rectangle, perimeter 4 )Area.25*1.75 =.4385 Area of a rectangle 1.75.25
19
Measuring Covariation )Points can contribute negatively )Area -.25*1.75 = -.4385 Area of a rectangle 1.75 -.25
20
Measuring Covariation Covariance Formula F = E (x i - : x )(y i - : y ) xy (N-1)
21
Correlation )Standardized covariance )Lies between -1 and 1 r = F xy 2 2 y x F * F
22
Summary Formulae : = ( E x i )/N F x = E (x i - :)/(N-1) 2 2 r = F xy 2 2 y x F * F F xy = E (x i -: x )(y i -: y )/(N-1)
23
Variance covariance matrix Several variables Var(X) Cov(X,Y) Cov(X,Z) Cov(X,Y) Var(Y) Cov(Y,Z) Cov(X,Z) Cov(Y,Z) Var(Z)
24
Conclusion )Means and covariances )Conceptual underpinning )Easy to compute )Can use raw data instead
25
Biometrical Model of QTL m d +a-a
26
Biometrical model for QTL Diallelic locus A/a with p as frequency of a
27
Classical Twin Studies )Summary: rmz & rdz )Basic model: A C E )rmz = A + C )rdz =.5A + C )var = A + C + E )Solve equations Information and analysis
28
Contributions to Variance )Additive QTL variance 5VA = 2p(1-p) [ a - d(2p-1) ]2 )Dominance QTL variance 5VD = 4p2 ( 1- p) 2 d2 )Total Genetic Variance due to locus VQ = V A + VD Single genetic locus
29
Origin of Expectations )P = aA + cC + eE )Standardize A C E )V P = a 2 + c 2 + e 2 )Assumes A C E independent Regression model
30
Path analysis )Two sorts of variable 5Observed, in boxes 5Latent, in circles )Two sorts of path 5Causal (regression), one-headed 5Correlational, two-headed Elements of a path diagram
31
Rules of path analysis )Trace path chains between variables )Chains are traced backwards, then forwards, with one change of direction at a double headed arrow )Predicted covariance due to a chain is the product of its paths )Predicted total covariance is sum of covariance due to all possible chains
32
ACE model MZ twins reared together
33
ACE model DZ twins reared together
34
ACE model DZ twins reared apart
35
Model fitting )Takes care of replicate statistics )Maximum likelihood estimates )Confidence intervals on parameters )Overall fit of model )Comparison of nested models
36
Fitting models to covariance matrices )MZ covariances 53 statistics V1 CMZ V2 )DZ covariances 53 statistics V1 CDZ V2 )Parameters: a c e )Df = nstat - npar = 6 - 3 = 3
37
Model fitting to covariance matrices )Inherently compares fit to saturated model )Difference in fit between A C E model and A E model gives likelihood ratio test with df = difference in number of parameters
38
Confidence intervals )Two basic forms 5covariance matrix of parameters 5likelihood curve )Likelihood-based has some nice properties; squares of CIs on a give CI's on a 2 Meeker & Escobar 1995; Neale & Miller, Behav Genet 1997
39
Multivariate analysis )Comorbidity 5Partition into relevant components 5Explicit models 5One disorder or two or three )Longitudinal data analysis 5Partition into new/old 5Explicit models 5Markov 5Growth curves
40
Cholesky Decomposition )Provides a way to model covariance matrices )Always fits perfectly )Doesn't predict much else Not a model
41
Perverse Universe A E.7 P NOT!
42
Perverse Universe A E Y X.7 -.7 r(X,Y)=0; Problem for almost any multivariate method
43
Analysis of raw data )Awesome treatment of missing values )More flexible modeling 5Moderator variables 5Correction for ascertainment 5Modeling of means )QTL analysis
44
Technicolor Likelihood Function For raw data in Mx j=1 ln L i = f i 3 ln [w j g(x i,: ij, G ij )] m x i - vector of observed scores on n subjects :ij - vector of predicted means Gij - matrix of predicted covariances - functions of parameters
45
Pihat Linkage Model for Siblings Each sib pair i has different COVARIANCE
46
Mixture distribution model Each sib pair i has different set of WEIGHTS p(IBD=2) x P(LDL1 & LDL2 | rQ = 1 ) p(IBD=1) x P(LDL1 & LDL2 | rQ =.5 ) p(IBD=0) x P(LDL1 & LDL2 | rQ = 0 ) Total likelihood is product of weighted likelihoods rQ=1rQ=.5 rQ=.0 weight j x Likelihood under model j
47
Conclusion )Model fitting has a number of advantages )Raw data can be analysed with greater flexibility )Not limited to continuous normally distributed variables
48
Conclusion II )Data analysis requires creative application of methods )Canned analyses are of limited use )Try to answer the question!
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.