Statistical Genomics Zhiwu Zhang Washington State University Lecture 19: SUPER.



Administration
• Homework 5, due April 13, Wednesday, 3:10PM
• Final exam: May 3, 120 minutes (3:10-5:10PM), 50

Reading material
• Statistics (lecture slides)
• R programming (lecture slides)
• Genetics: GBS, population structure, kinship
• Imputation
• GWAS: GLM, MLM, CMLM, ECMLM, SUPER, MLMM, EMMA, EMMAX/P3D, FarmCPU, PC+K
• GS: gBLUP

Outline
• Kinship based on QTN
• Confounding between QTN and kinship
• Complementary kinship
• SUPER

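More covariates

$$y = Xb + Zu + e$$

where $y$ is the vector of observations and

$$X = \begin{bmatrix} 1 & x_1 & x_2 \end{bmatrix} \ (\text{columns: mean, PC2, SNP}), \qquad b = \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}$$

$$Z = \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix} \ (\text{columns: Ind1, Ind2, \ldots, Ind9, Ind10}), \qquad u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_9 \\ u_{10} \end{bmatrix}$$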
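As a minimal sketch of the matrices on this slide, the R code below builds X (mean, PC2, SNP) and Z (identity, one random effect per individual) for 10 individuals and assembles y. All values are simulated, and the names `pc2`, `snp`, and the chosen effect sizes are illustrative assumptions, not data from the lecture.

```r
# Toy example: 10 individuals, each observed once (all values are simulated)
set.seed(1)
n   <- 10
pc2 <- rnorm(n)                          # a principal component used as covariate (x1)
snp <- sample(0:2, n, replace = TRUE)    # SNP genotype coded 0/1/2 (x2)

X <- cbind(mean = 1, PC2 = pc2, SNP = snp)   # fixed-effect design matrix, n x 3
Z <- diag(n)                                 # one record per individual, so Z is the identity
dimnames(Z) <- list(NULL, paste0("Ind", 1:n))

b <- c(b0 = 10, b1 = 0.5, b2 = 2)        # fixed effects: mean, PC2 effect, SNP effect
u <- rnorm(n, sd = 1)                    # random individual (polygenic) effects
e <- rnorm(n, sd = 1)                    # residuals

y <- X %*% b + Z %*% u + e               # y = Xb + Zu + e
dim(X); dim(Z)
```

With repeated records per individual, Z would have more rows than columns, with a single 1 in each row marking the individual that record belongs to.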

Variance in MLM
y = Xb + Zu + e
b: estimated by the Best Linear Unbiased Estimate (BLUE)
u: predicted by the Best Linear Unbiased Prediction (BLUP)
Var(y) = V = Var(Zu) + Var(e), which reduces to Var(u) + Var(e) when Z is the identity matrix
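To make BLUE and BLUP concrete, here is a small sketch that computes both from the standard mixed-model formulas, assuming the variance components are known. In practice they are estimated (for example by REML). The data, the identity kinship `K`, and the values of `sigma2.u` and `sigma2.e` are illustrative assumptions.

```r
# Toy BLUE/BLUP computation with known variance components (all values simulated)
set.seed(2)
n <- 10
X <- cbind(1, rnorm(n))                  # mean and one covariate
Z <- diag(n)                             # one record per individual
K <- diag(n)                             # kinship; identity here (unrelated individuals)
sigma2.u <- 2                            # assumed genetic variance
sigma2.e <- 1                            # assumed residual variance
u <- rnorm(n, sd = sqrt(sigma2.u))
y <- X %*% c(5, 1) + Z %*% u + rnorm(n, sd = sqrt(sigma2.e))

# Var(y) = V = Z Var(u) Z' + Var(e), with Var(u) = sigma2.u * K
V    <- sigma2.u * Z %*% K %*% t(Z) + sigma2.e * diag(n)
Vinv <- solve(V)

# BLUE of the fixed effects (generalized least squares)
b.hat <- solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y)
# BLUP of the random effects
u.hat <- sigma2.u * K %*% t(Z) %*% Vinv %*% (y - X %*% b.hat)
```

Because K and Z are both identity matrices in this toy case, V is a multiple of the identity, so the BLUE coincides with ordinary least squares and each BLUP is the residual shrunk by sigma2.u / (sigma2.u + sigma2.e).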

Kinship defined by single marker
[Figure: individuals S1-S4 and R1-R4, each classified as S or R by a single marker; group labels: Sensitive, Resistance]
Adding additional markers blurs the picture
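The point of this slide can be reproduced with a toy simulation. The sketch below assumes a simple kinship definition (cross-product of centered genotypes averaged over markers), which is one common choice and not necessarily the one used in the lecture. The genotypes and the helper name `kinship` are illustrative.

```r
# Toy illustration (simulated genotypes): 4 sensitive (S) and 4 resistant (R) individuals
set.seed(3)
ids    <- c(paste0("S", 1:4), paste0("R", 1:4))
causal <- c(rep(0, 4), rep(2, 4))        # the single marker that separates S from R

# Simple kinship: cross-product of column-centered genotypes, averaged over markers
kinship <- function(M) {
  M <- scale(M, center = TRUE, scale = FALSE)
  tcrossprod(M) / ncol(M)
}

# Kinship from the causal marker alone: two clean blocks (+1 within, -1 between)
K1 <- kinship(matrix(causal, ncol = 1, dimnames = list(ids, "QTN")))

# Kinship after adding 50 unrelated random markers: the blocks are blurred
noise <- matrix(sample(0:2, 8 * 50, replace = TRUE), nrow = 8)
K51   <- kinship(cbind(causal, noise))
dimnames(K51) <- list(ids, ids)

round(K1, 2)
round(K51, 2)
```

Comparing the two printed matrices shows the within-group versus between-group contrast shrinking once markers unrelated to the trait are included.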

Derivation of kinship
[Diagram labels: All SNPs; QTNs; Non-QTNs; SNP; Kinship]

Statistical power of kinship from

Kinship evolution
[Diagram labels: Average, Realized; Single trait, All traits; Pedigree, Markers, QTNs; Remove QTN one at a time]

Statistical power of kinship from

Bin approach

Mimic QTN-1
1. Choose t associated SNPs as QTNs, each representing an interval of size s.
2. Build kinship from the t QTNs.
3. Optimize t and s.
4. For a SNP, remove the QTNs in LD with the SNP, e.g. R-squared > 1%.
5. Use the remaining QTNs to build kinship for testing the SNP.
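The steps above can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not GAPIT's implementation: the helper names `select.pseudo.qtn` and `kinship.for.snp` are hypothetical, the genotypes and P values are simulated, the kinship is a plain centered cross-product, and step 3 (optimizing t and s) is left out.

```r
# Minimal sketch of the bin approach (hypothetical helper names, simulated data)
set.seed(4)
n <- 500; m <- 200
G    <- matrix(sample(0:2, n * m, replace = TRUE), nrow = n)   # genotypes, n x m
chr  <- rep(1:2, each = m / 2)                                  # chromosome of each SNP
pos  <- rep(seq(1e4, by = 1e4, length.out = m / 2), times = 2)  # position (bp)
pval <- runif(m)                                                # P values from an initial scan

# Steps 1-2: the most significant SNP in each bin of size s represents the bin;
# the t most significant representatives serve as pseudo QTNs
select.pseudo.qtn <- function(pval, chr, pos, s, t) {
  bin  <- paste(chr, floor(pos / s), sep = "_")
  best <- tapply(seq_along(pval), bin, function(i) i[which.min(pval[i])])
  best <- unname(best[order(pval[best])])
  head(best, t)
}

# Steps 4-5: drop pseudo QTNs in LD with the tested SNP, build kinship from the rest
kinship.for.snp <- function(G, snp, qtn, r2.max = 0.01) {
  r2   <- as.vector(cor(G[, snp], G[, qtn, drop = FALSE]))^2
  keep <- qtn[r2 <= r2.max]
  if (length(keep) == 0) stop("no pseudo QTN left after LD filtering")
  M <- scale(G[, keep, drop = FALSE], center = TRUE, scale = FALSE)
  list(K = tcrossprod(M) / length(keep), used = keep)
}

qtn <- select.pseudo.qtn(pval, chr, pos, s = 1e5, t = 10)
res <- kinship.for.snp(G, snp = qtn[1], qtn = qtn)   # test the top pseudo QTN itself
```

Testing a pseudo QTN against a kinship that excludes it (and anything in LD with it) is what removes the confounding between the tested SNP and the kinship built from it.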

Statistical power of kinship from
Qishan Wang, PLoS One, 2014
SUPER (Settlement of kinship Under Progressively Exclusive Relationship)

Threshold of excluding pseudo QTNs

Impact of initial P values

Sandwich Algorithm in GAPIT
[Flow diagram labels: Input; GD, GK, GP, KI; CMLM/MLM/GLM; Optimization of bin size and number; SUPER/FaST]
GD: Genotype Data
GK: Genotype for Kinship
GP: Genotype Probability
KI: Kinship of Individual

SUPER in GAPIT

#GAPIT
library('MASS') # required for ginv
library(multtest)
library(gplots)
library(compiler) #required for cmpfun
library("scatterplot3d")
source("…") # URL truncated in the transcript
source("…") # URL truncated in the transcript
source("~/Dropbox/GAPIT/Functions/gapit_functions.txt")
myGD=read.table(file="…") # path and remaining arguments truncated in the transcript
myGM=read.table(file="…") # path and remaining arguments truncated in the transcript

#Simulate 10 QTN on the first 5 chromosomes
X=myGD[,-1]
index1to5=myGM[,2]<6
X1to5 = X[,index1to5]
taxa=myGD[,1]
set.seed(99164)
GD.candidate=cbind(taxa,X1to5)
mySim=GAPIT.Phenotype.Simulation(GD=GD.candidate,GM=myGM[index1to5,],h2=.5,NQTN=10,QTNDist="norm")

#RUN SUPER
myGAPIT=GAPIT(
  Y=mySim$Y,
  GD=myGD,
  GM=myGM,
  QTN.position=mySim$QTN.position,
  PCA.total=3,
  sangwich.top="MLM", #options are GLM,MLM,CMLM, FaST and SUPER
  sangwich.bottom="SUPER", #options are GLM,MLM,CMLM, FaST and SUPER
  LD=0.1,
  memo="SUPER")

GAPIT.FDR.TypeI Function

myStat=GAPIT.FDR.TypeI(
  WS=c(1e0,1e3,1e4,1e5),
  GM=myGM,
  seqQTN=mySim$QTN.position,
  GWAS=myGAPIT$GWAS)

Return

Area Under Curve (AUC)

par(mfrow=c(1,2),mar = c(5,2,5,2))
plot(myStat$FDR[,1],myStat$Power,type="b") #power vs. FDR
plot(myStat$TypeI[,1],myStat$Power,type="b") #power vs. type I error
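For readers unfamiliar with the AUC summary, the sketch below computes the area under a power-versus-FDR curve with the trapezoidal rule. Whether GAPIT.FDR.TypeI uses exactly this rule is an assumption. The helper `auc.trapezoid` and the numbers are hypothetical and are not results from the lecture.

```r
# Trapezoidal area under a power-vs-FDR curve (toy numbers, not lecture results)
auc.trapezoid <- function(x, y) {
  o <- order(x)                          # integrate from left to right
  x <- x[o]; y <- y[o]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

fdr   <- c(0, 0.1, 0.3, 0.6, 1.0)        # hypothetical FDR levels
power <- c(0, 0.4, 0.6, 0.8, 1.0)        # hypothetical power at each FDR level
auc.trapezoid(fdr, power)
```

A method with no discriminating ability traces the diagonal and has an area of 0.5. Larger areas mean more power at the same FDR.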

Replicates

nrep=3
set.seed(99164)
statRep=replicate(nrep, {
  mySim=GAPIT.Phenotype.Simulation(GD=GD.candidate,GM=myGM[index1to5,],h2=.5,NQTN=10,QTNDist="norm")
  myGAPIT=GAPIT(
    Y=mySim$Y,
    GD=myGD,
    GM=myGM,
    QTN.position=mySim$QTN.position,
    PCA.total=3,
    sangwich.top="MLM", #options are GLM,MLM,CMLM, FaST and SUPER
    sangwich.bottom="SUPER", #options are GLM,MLM,CMLM, FaST and SUPER
    LD=0.1,
    memo="SUPER")
  myStat=GAPIT.FDR.TypeI(WS=c(1e0,1e3,1e4,1e5),GM=myGM,seqQTN=mySim$QTN.position,GWAS=myGAPIT$GWAS)
})

str(statRep)

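Means over replicates

power=statRep[[2]] #power levels (2nd component of the first replicate)

#FDR
s.fdr=seq(3,length(statRep),7) #3rd of the 7 components in each replicate
fdr=statRep[s.fdr]
fdr.mean=Reduce ("+", fdr) / length(fdr)

#AUC: power vs. FDR
s.auc.fdr=seq(6,length(statRep),7) #6th of the 7 components in each replicate
auc.fdr=statRep[s.auc.fdr]
auc.fdr.mean=Reduce ("+", auc.fdr) / length(auc.fdr)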
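The indexing above relies on how replicate() stores list results. The toy sketch below shows the mechanism without GAPIT. The function `one.rep` is a hypothetical stand-in that returns seven components, and its component names are assumptions chosen only to mirror the positions used on the slide (power at 2, FDR at 3, AUC at 6).

```r
# Toy stand-in for one simulation replicate: returns a list of 7 components
# (component names are hypothetical; only their positions matter here)
one.rep <- function() {
  list(P = runif(5), Power = seq(0.2, 1, by = 0.2),
       FDR = matrix(runif(20), 5, 4), TypeI = matrix(runif(20), 5, 4),
       NQTN = 10, AUC.FDR = runif(4), AUC.T1 = runif(4))
}

set.seed(5)
nrep    <- 3
statRep <- replicate(nrep, one.rep())    # a 7 x nrep list-matrix

# Linear indexing runs down the columns, so component 3 of every replicate
# sits at positions 3, 10, 17, ...
s.fdr    <- seq(3, length(statRep), 7)
fdr      <- statRep[s.fdr]               # list of nrep matrices
fdr.mean <- Reduce("+", fdr) / length(fdr)

# Equivalent and less index-dependent: select the row by name
fdr.mean2 <- Reduce("+", statRep["FDR", ]) / nrep
all.equal(fdr.mean, fdr.mean2)
```

Selecting by row name keeps working if the number or order of returned components changes, whereas the stride of 7 does not.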

Plots of power vs. FDR

theColor=rainbow(4)
plot(fdr.mean[,1],power, type="b", col=theColor[1],xlim=c(0,1))
for(i in 2:ncol(fdr.mean)){
  lines(fdr.mean[,i], power, type="b", col=theColor[i])
}

Highlight
• Kinship based on QTN
• Confounding between QTN and kinship
• Complementary kinship
• SUPER