p(A) = p(AA) + ½ p(Aa) p(a) = p(aa)+ ½ p(Aa)

Slides:



Advertisements
Similar presentations
Bivariate analysis HGEN619 class 2007.
Advertisements

Practical H:\ferreira\biometric\sgene.exe. Practical Aim Visualize graphically how allele frequencies, genetic effects, dominance, etc, influence trait.
Outline: 1) Basics 2) Means and Values (Ch 7): summary 3) Variance (Ch 8): summary 4) Resemblance between relatives 5) Homework (8.3)
Biometrical genetics Manuel Ferreira Shaun Purcell Pak Sham Boulder Introductory Course 2006.
Some Terms Y =  o +  1 X Regression of Y on X Regress Y on X X called independent variable or predictor variable or covariate or factor Which factors.
Biometrical genetics Manuel Ferreira Shaun Purcell Pak Sham Boulder Introductory Course 2006.
Genetic Theory Manuel AR Ferreira Egmond, 2007 Massachusetts General Hospital Harvard Medical School Boston.
Path Analysis Danielle Dick Boulder Path Analysis Allows us to represent linear models for the relationships between variables in diagrammatic form.
Quantitative Genetics
Reminder - Means, Variances and Covariances. Covariance Algebra.
Biometrical Genetics Pak Sham & Shaun Purcell Twin Workshop, March 2002.
Gene x Environment Interactions Brad Verhulst (With lots of help from slides written by Hermine and Liz) September 30, 2014.
Mx Practical TC18, 2005 Dorret Boomsma, Nick Martin, Hermine H. Maes.
Basic Mathematics for Portfolio Management. Statistics Variables x, y, z Constants a, b Observations {x n, y n |n=1,…N} Mean.
Analysis of Covariance Goals: 1)Reduce error variance. 2)Remove sources of bias from experiment. 3)Obtain adjusted estimates of population means.
Correlation 1. Correlation - degree to which variables are associated or covary. (Changes in the value of one tends to be associated with changes in the.
Lecture 2: Basic Population and Quantitative Genetics.
PBG 650 Advanced Plant Breeding Module 5: Quantitative Genetics – Genetic variance: additive and dominance.
Managerial Economics Demand Estimation. Scatter Diagram Regression Analysis.
Regression. Population Covariance and Correlation.
Univariate modeling Sarah Medland. Starting at the beginning… Data preparation – The algebra style used in Mx expects 1 line per case/family – (Almost)
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
Environmental Modeling Basic Testing Methods - Statistics III.
Lecture 24: Quantitative Traits IV Date: 11/14/02  Sources of genetic variation additive dominance epistatic.
Lecture 21: Quantitative Traits I Date: 11/05/02  Review: covariance, regression, etc  Introduction to quantitative genetics.
Mx modeling of methylation data: twin correlations [means, SD, correlation] ACE / ADE latent factor model regression [sex and age] genetic association.
Lecture 22: Quantitative Traits II
Lecture 23: Quantitative Traits III Date: 11/12/02  Single locus backcross regression  Single locus backcross likelihood  F2 – regression, likelihood,
Mx Practical TC20, 2007 Hermine H. Maes Nick Martin, Dorret Boomsma.
Introduction to Genetic Theory
Genetic principles for linkage and association analyses Manuel Ferreira & Pak Sham Boulder, 2009.
Welcome  Log on using the username and password you received at registration  Copy the folder: F:/sarah/mon-morning To your H drive.
I. Statistical Methods for Genome-Enabled Prediction of Complex Traits OUTLINE THE CHALLENGES OF PREDICTING COMPLEX TRAITS ORDINARY LEAST SQUARES (OLS)
Genetic Theory Manuel AR Ferreira Boulder, 2007 Massachusetts General Hospital Harvard Medical School Boston.
Extended Pedigrees HGEN619 class 2007.
Invest. Ophthalmol. Vis. Sci ;57(1): doi: /iovs Figure Legend:
Chapter 4: Basic Estimation Techniques
Genetic simplex model: practical
Regression Analysis AGEC 784.
PBG 650 Advanced Plant Breeding
Basic Estimation Techniques
Univariate Twin Analysis
B&A ; and REGRESSION - ANCOVA B&A ; and
Genotypic value is not transferred from parent to
Introduction to Multivariate Genetic Analysis
Chapter 11: Simple Linear Regression
Linear Mixed Models in JMP Pro
Re-introduction to openMx
MRC SGDP Centre, Institute of Psychiatry, Psychology & Neuroscience
Path Analysis Danielle Dick Boulder 2008
Regression model with multiple predictors
Genotypic value is not transferred from parent to
Slope of the regression line:
15 The Genetic Basis of Complex Inheritance
Basic Estimation Techniques
Univariate modeling Sarah Medland.
CHAPTER 29: Multiple Regression*
The regression model in matrix form
Biometrical model and introduction to genetic analysis
Pak Sham & Shaun Purcell Twin Workshop, March 2002
(Re)introduction to Mx Sarah Medland
Detecting variance-controlling QTL
Sarah Medland faculty/sarah/2018/Tuesday
Pi = Gi + Ei Pi = pi - p Gi = gi - g Ei = ei - e _ _ _ Phenotype
Introduction to Biometrical Genetics {in the classical twin design}
Ch 4.1 & 4.2 Two dimensions concept
Genotypic value is not transferred from parent to
Power Calculation for QTL Association
BOULDER WORKSHOP STATISTICS REVIEWED: LIKELIHOOD MODELS
Presentation transcript:

p(A) = p(AA) + ½ p(Aa) p(a) = p(aa)+ ½ p(Aa) Practical Part 1. Single QTL model 1.1 Given QTL data Calculate allele frequencies p(A) & p(a) assuming H-W equilibrium from genotype frequencies p(A) = p(AA) + ½ p(Aa) p(a) = p(aa)+ ½ p(Aa) 1.2 Regress phenotype on QTL in two models: A) additive model B) additive + dominance model

A) The linear regression model. xi = a0 + a*QTLi + ei s2(x) = a2*s2(QTL) + s2(e) QTL is coded -1 (aa) 0 (Aa) 1 (AA) a2*s2(QTL) is the additive genetic variance: contribution of the QTL to phenotypic individual differences. s2(e) is the residual (dominance + environmental) ……. a2*s2(QTL) = s2QTL(A) parameter a is the the effect of the QTL given addititve model +a – a aa Aa AA s2QTL(A) =2*pq[a]2

B) The linear regression model (non-linear). xi = a0 + a*QTL_Ai + d*QTL_Di + ei QTL_A is coded -1,0,1 QTL_D is coded 1 0 1 (p=q=.5) explained variance = a2*s2(QTL_A) + d2*s2(QTL_D) = s2QTL(A) + s2QTL(A) d +a – a aa Aa AA s2QTL(A) =2*pq[a+(q-p)d]2 s2QTL(D) = (2pqd)2

s2QTL(A) =2*pq[a+(q-p)d]2 s2QTL(D) = (2pqd)2 Given estimates of p(A), q(a), a and d, calculate expected contribution to the phenotypic variance (A and D components) 2) expected covariance of MZ, DZ and Parent-Offspring s2QTL(A) =2*pq[a+(q-p)d]2 s2QTL(D) = (2pqd)2

Practical Part 2. Suppose we have twin data, but not the measured QTL? Fit the ADE twin model If the QTL is the only source of genetic differences in the phenotypethen the variance components estimated in the ADE twin model should equal the variance components obtained in the regression analyses. Try it out using data. Generalizes from regression model with 1 QTL to M QTLs. The twin model gives you the explained variance expected in the regression of phenotype on M QTLs (but none of the QTLs are measured)....

# annotated RCODE to do the practical rm(list=ls(all=TRUE)) # clear memory data1=read.table('data1.dat',header=TRUE,row.names=NULL) # read the data # to read the data you have to navigate to the dir containing the data # menu: File. Option: Change Dir..... head(data1) # variable names: T1QTL and T1pheno N=dim(data1)[1] # sample size print(N) # using “table function to get genotype counts Gcounts=table(data1$T1QTL) # genotype coding -1=aa, 0=Aa, 1=AA Gfreq=Gcounts/N # genotype frequencies # estimate allele frequencies p_est=Gfreq[3]+.5*Gfreq[2] # allele freq A q_est=Gfreq[1]+.5*Gfreq[2] # allele freq a # p allele freq print(Gfreq ) # genotype freqs print(p_est) # allele freq (A)

variable names + first 6 cases. total sample size genotype frequencies T1QTL T1pheno 1 0 1.77 2 1 0.80 3 0 -0.12 4 0 0.55 5 1 0.13 0 1.13 # sample size [1] 5000 # genotype freqs -1 0 1 0.2568 0.4986 0.2446 > print(p_est) # allele freq (A) 1 0.4939 variable names + first 6 cases. total sample size genotype frequencies allele (A) frequency (HW equilibrium)

# plot the data pheno on QTL – scatter plot plot(data1$T1pheno~data1$T1QTL,col='grey') # calculate the phenotypic means within each genotype # the phenotypic means “conditional on genotype” m_aa=mean(data1$T1pheno[data1$T1QTL==-1]) m_Aa=mean(data1$T1pheno[data1$T1QTL==0]) m_AA=mean(data1$T1pheno[data1$T1QTL==1]) # show the conditional means in the scatter plot lines(c(-1,0,1),c(m_aa,m_Aa,m_AA),type='p',lwd=8,col='red') #

scatter plot of the data conditonal means in red

# regress phenotype on QTL - first the addititve model mlin1=lm(T1pheno~T1QTL,data=data1) summary(mlin1) # show regression results bs1=mlin1$coefficients # extract the regression coefficients print(bs1) # parameters # plot the regression line as fitted lines(c(-1,0,1),bs1[1]+bs1[2]*c(-1,0,1),type='l',lwd=3,col='blue')

linear model parameters (no dominance) Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.37866 0.01252 30.25 <2e-16 *** T1QTL 0.99134 0.01768 56.08 <2e-16 *** Multiple R-squared: 0.3863 > print(bs1) # parameters (Intercept) T1QTL 0.3786623 0.9913396 linear model (no dominance) linear model parameters (no dominance) Model does not fit the data really.

# regress phenotype on QTL - A+D model # note that I(T1QTL^2) = c(-1 0 1)^2 = c(1 0 1) # mlin2=lm(T1pheno~T1QTL+I(T1QTL^2),data=data1) bs2=mlin2$coefficients print(bs2) # show the parameters b0, b1 (a) and b2 (d) # plot the regression line lines(c(-1,0,1),bs2[1]+bs2[2]*c(-1,0,1)+bs2[3]*c(1,0,1),type='l',lwd=3,col='green') # # get the explained variances model A and model B expl1v=var(mlin1$fitted.values) # S2A expl2v=var(mlin2$fitted.values) # S2D + S2A # show the total explained variance (covariance components) print(c(expl1v, expl2v))

Accurate model ! additive + dominance (Intercept) T1QTL I(T1QTL^2) 0.7664862 0.9819503 -0.7737103 additive + dominance > print(c(expl1v, expl2v)) [1] 0.4927052 0.6423467 Accurate model !

# print(c(expl1v,expl2v)) a=bs2[2] d=bs2[3] print(c(p_est,q_est,a,d)) S_A=2*p_est*q_est*(a+(q_est-p_est)*d)^2 S_D=(2*p_est*q_est*d)^2 S_G=S_A + S_D print(S_A) print(S_D) print(c(S_G,expl2v)) print(var(data1$T1pheno))

s2QTL(A) =2*pq[a+(q-p)d]2 s2QTL(D) = (2pqd)2 p(A) p(a) a and d > print(c(p_est,q_est,a,d)) 0.4939000 0.5061000 0.9819503 -0.7737103 > print(S_A) 0.4728184 > print(S_D) 0.1496124 > print(c(S_G,expl2v)) 0.6224308 0.6423467 > print(var(data1$T1pheno)) [1] 1.275602 s2QTL(A) =2*pq[a+(q-p)d]2 s2QTL(D) = (2pqd)2 s2QTL(A) + s2QTL(D) phenotypic variance

S_phen=var(data1$T1pheno) covPO=.5*S_A # expected cov dz1 - dz2 covDZ=.5*S_A + .25*S_D # expected cov mz1 - mz2 covMZ=S_A + S_D print(c(S_phen,S_A,S_D,S_G,covPO,covDZ,covMZ)) #

0.6224308 total genetic variance > print(c(S_phen,S_A,S_D,S_G,covPO,covDZ,covMZ)) 1.2756021 0.4728184 0.1496124 0.6224308 0.2364092 0.2738123 0.6224308 1.2756021 Phenotypic variance 0.4728184 add gen variance 0.1496124 dominance variance 0.6224308 total genetic variance 0.2364092 contribution to parent-offspring covariance 0.2738123 contribution to DZ covariance 0.6224308 contribution to MZ covariance

library(umx) data4=read.table('data4.dat',header=TRUE,row.names=NULL) head(data4) N=dim(data4)[1] # sample size # mzData=data4[data4$zyg==1,6:9] # create MZ and dzData=data4[data4$zyg==2,6:9] # DZ dataframes round(cov(mzData),3) # the covariance matrices round(cov(dzData),3) # the covariance matrices

based on regression results > round(cov(mzData),3) Mpheno Fpheno T1pheno T2pheno Mpheno 1.238 0.010 0.299 0.295 Fpheno 0.010 1.331 0.266 0.270 T1pheno 0.299 0.266 1.297 0.678 T2pheno 0.295 0.270 0.678 1.282 > round(cov(dzData),3) Mpheno 1.257 -0.001 0.215 0.201 Fpheno -0.001 1.272 0.218 0.261 T1pheno 0.215 0.218 1.269 0.294 T2pheno 0.201 0.261 0.294 1.284 phenotypic covariance matrix MZ phenotypic covariance matrix DZ based on regression results 0.2364092 contribution to parent-offspring covariance 0.2738123 contribution to DZ covariance 0.6224308 contribution to MZ covariance

# fit the ADE variance component model selDVs=colnames(mzData)[3:4] ADE=umxACEv(name = "ADE", selDVs, selCovs=NULL, dzData, mzData, dzCr = .25, covMethod=NULL) # Baseline=mxRefModels(ADE,run=TRUE) # mxCompare(Baseline,ADE) summary(ADE)$parameters #

based on direct regression with measured QTL > summary(ADE)$parameters name matrix row col Estimate 2 A_r1c1 top.A 1 1 0.5138449 3 C_r1c1 top.C 1 1 0.1581402 4 E_r1c1 top.E 1 1 0.6101012 based on twin model QTL not measured based on direct regression with measured QTL S_A = .4728 S_D = .1496 Message: Suppose a phenotype is subject to effects of M QTLs. QTLs of unknown location, not measured What the Twin Model does: Twin model provides estimates of explained variance in the regression of phenotype on M QTLS – without having measured the QTLs