Lecture 18: Heritability and P3D

Slides:



Advertisements
Similar presentations
Software for Incorporating Marker Data in Genetic Evaluations Kathy Hanford U.S. Meat Animal Research Center Agricultural Research Service U.S. Department.
Advertisements

GBS & GWAS using the iPlant Discovery Environment
Sampling: Final and Initial Sample Size Determination
PAG 2011 TASSEL Terry Casstevens 1, Peter Bradbury 2,3, Zhiwu Zhang 1, Yang Zhang 1, Edward Buckler 1,2,4 1 Institute.
Estimating “Heritability” using Genetic Data David Evans University of Queensland.
The appropriateness of three different association analysis models, the least square solution to the fixed effects General Linear Model (GLM-Q, Searle.
Statistical Genomics Zhiwu Zhang Washington State University Lecture 26: Kernel method.
Statistical Genomics Zhiwu Zhang Washington State University Lecture 19: SUPER.
Statistical Genomics Zhiwu Zhang Washington State University Lecture 25: Ridge Regression.
Washington State University
Statistical Genomics Zhiwu Zhang Washington State University Lecture 29: Bayesian implementation.
Statistical Genomics Zhiwu Zhang Washington State University Lecture 16: CMLM.
Statistical Genomics Zhiwu Zhang Washington State University Lecture 20: MLMM.
Statistical Genomics Zhiwu Zhang Washington State University Lecture 5: Linear Algebra.
Statistical Genomics Zhiwu Zhang Washington State University Lecture 4: Statistical inference.
Statistical Genomics Zhiwu Zhang Washington State University Lecture 11: Power, type I error and FDR.
Statistical Genomics Zhiwu Zhang Washington State University Lecture 27: Bayesian theorem.
Genome Wide Association Studies Zhiwu Zhang Washington State University.
Lecture 5: Linear Algebra
Lecture 28: Bayesian methods
Lecture 10: GWAS by correlation
Washington State University
Math 4030 Probability and Statistics for Engineers
Lecture 28: Bayesian Tools
Washington State University
Washington State University
Lecture 22: Marker Assisted Selection
Lecture 10: GWAS by correlation
Washington State University
Washington State University
Genome Wide Association Studies using SNP
Washington State University
Lecture 5: Linear Algebra
Marker heritability Biases, confounding factors, current methods, and best practices Luke Evans, Matthew Keller.
Washington State University
Washington State University
Washington State University
Washington State University
Washington State University
Washington State University
Lecture 10: GWAS by correlation
Washington State University
Lecture 23: Cross validation
Lecture 23: Cross validation
Washington State University
OVERVIEW OF LINEAR MODELS
Washington State University
Washington State University
Lecture 10: GWAS by correlation
What are BLUP? and why they are useful?
Lecture 16: Likelihood and estimates of variances
Washington State University
OVERVIEW OF LINEAR MODELS
Lecture 26: Bayesian theory
Lecture 11: Power, type I error and FDR
Washington State University
Lecture 11: Power, type I error and FDR
Washington State University
Lecture 27: Bayesian theorem
Washington State University
Washington State University
Lecture 17: Likelihood and estimates of variances
Washington State University
Lecture 23: Cross validation
Lecture 29: Bayesian implementation
Lecture 22: Marker Assisted Selection
Hunting for Celiac Disease Genes
Washington State University
The Basic Genetic Model
Results from a GWAS of prostate cancer in the KP population (8,399 cases and 38,745 controls), highlighting key chromosomal regions. Results from a GWAS.
Presentation transcript:

Lecture 18: Heritability and P3D Statistical Genomics Lecture 18: Heritability and P3D Zhiwu Zhang Washington State University

Administration Homework 4, due March 23, Wednesday, 3:10PM Midterm evaluation

Outline Complexity of computing time Idea of P3D Consideration of heritability

Computing time of inverse a=matrix(sample(100,n*n,replace=T),n,n) ptm <- proc.time() b=solve(a) proc.time() - ptm proc.time() - ptm user system elapsed 0.284 0.004 0.291 n 500 1000 2000 Time 0.291 2.235 16.762 Ratio (n) 1 2 Ratio (Time) 7.7 7.5 T ∝ n3

Computing time of MLM GWAS T ∝ m n3 p Number of markers Number of individuals Estimation of variances EMMA Kang, 2008 Literature h2 J Deckker, 2009

Computing time of MLM GWAS Literature h2 -50%, -10%, +10 and +50%

Similar effects, SE, LL, but different P

MLM Tree Charles Henderson 1983 Dick Quaas John Pollak Ivan Mao Larry Schaeffer Brian Kennedy Jim Wilton Charles Henderson 1983

http://www.limousin-international.com/genetic_evaluation.htm

Computing time of genetic evaluation T ∝ m n3 p Number of markers (1) Number of individuals (10M) Estimation of variances (1) A week!

One step further from Decker y = X𝛽+𝑍𝑢+𝑒 y = Wv + X𝛽+𝑍𝑢+𝑒

Completely different story Full optimization: estimate h2 for every SNPs P3D: Estimate h2 without SNP and fix it for testing SNPs

P3D y = Wv + X𝛽+𝑍𝑢+𝑒 Population Parameters Previously Determined C is compression level in CMLM

Compression + P3D Compression levels 100~10,000X faster

P3D/EMMAx Kang Zhang Zhang, Z. et al. Mixed linear model approach adapted for genome-wide association studies. Nat Genet 42, 355–360 (2010). Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat Genet 42, 348–354 (2010).

EMMA kinship Non singular Diagonals: 1 What is the impact oh h2?

Estimated < True Estimated = 2 True Estimated = ½ True A kinship matrix was incorrectly used as Numerator Relationship Matrix (NRM), which is twice of kinship matrix, to estimate heritability. In this case, the estimated and true heritabilities have relationship of: Estimated < True Estimated = 2 True Estimated = ½ True Estimated > True

By definition of heritability Estimated < True Estimated = 2 True Estimated = ½ True Estimated > True ℎ 2 = 𝜎 𝑎 2 𝜎 𝑎 2 + 𝜎 𝑒 2

Genetic variance Var(u)=G=2K 𝜎 𝑎 2 =A 𝜎 𝑎 2 =𝐾 𝜎 𝑜 2 2 𝜎 𝑎 2 = 𝜎 𝑜 2 ℎ 𝑜 2 = 𝜎 𝑜 2 𝜎 𝑜 2 + 𝜎 𝑒 2 = 2𝜎 𝑎 2 2𝜎 𝑎 2 + 𝜎 𝑒 2 = 𝜎 𝑎 2 𝜎 𝑎 2 + 1 2 𝜎 𝑒 2 > 𝜎 𝑎 2 𝜎 𝑎 2 + 𝜎 𝑒 2 =ℎ 2

By definition of heritability Estimated < True Estimated = 2 True Estimated = ½ True Estimated > True ℎ 𝑜 2 = 𝜎 𝑎 2 𝜎 𝑎 2 + 1 2 𝜎 𝑒 2 > 𝜎 𝑎 2 𝜎 𝑎 2 + 𝜎 𝑒 2 =ℎ 2

Find out with GAPIT library('MASS') # required for ginv library(multtest) library(gplots) library(compiler) #required for cmpfun library("scatterplot3d") source("http://www.zzlab.net/GAPIT/emma.txt") source("http://www.zzlab.net/GAPIT/gapit_functions.txt") source("~/Dropbox/GAPIT/Functions/gapit_functions.txt") myGD=read.table(file="http://zzlab.net/GAPIT/data/mdp_numeric.txt",head=T) myGM=read.table(file="http://zzlab.net/GAPIT/data/mdp_SNP_information.txt",head=T) setwd("~/Dropbox/Current/ZZLab/WSUCourse/CROPS545/Demo") source("G2P.R") X=myGD[,-1] index1to5=myGM[,2]<6 X1to5 = X[,index1to5] set.seed(99164) mySim=G2P(X= X1to5,h2=.75,alpha=1,NQTN=10,distribution="norm") setwd("~/Desktop/temp") myY=cbind(as.data.frame(myGD[,1]), mySim$y) myGAPIT=GAPIT( Y=myY, GD=myGD, GM=myGM, QTN.position=mySim$QTN.position, PCA.total=3, group.from=1000000, group.to=1000000, group.by=10, memo="MLM") myGAPIT$kinship[1:5,1:6] Find out with GAPIT

Kinship from GAPIT myK=myGAPIT$kinship myK[,-1]=myK[,-1]/2

Using half of relationship myGAPIT2=GAPIT( Y=myY, GD=myGD, GM=myGM, KI=myK, QTN.position=mySim$QTN.position, PCA.total=3, group.from=1000000, group.to=1000000, group.by=10, memo="MLM2")

Estimated vs true 2x >

K and 2K are the same for P value

Highlight Complexity of computing time Idea of P3D Consideration of heritability