Washington State University


Statistical Genomics
Lecture 13: GLM
Zhiwu Zhang
Washington State University

Administration
Homework 3 due Mar 2, Wednesday, 3:10PM
Midterm exam: February 26, Friday, 50 minutes (3:35-4:25PM), 25 questions
Homework 4 posted with pending modification

Outline
Spurious association
Covariates
LS concept
Linear model

QTNs on CHR 1-5, leave 6-10 empty
# Load the numeric genotypes and the SNP map
myGD=read.table(file="http://zzlab.net/GAPIT/data/mdp_numeric.txt",head=T)
myGM=read.table(file="http://zzlab.net/GAPIT/data/mdp_SNP_information.txt",head=T)
setwd("~/Dropbox/Current/ZZLab/WSUCourse/CROPS545/Demo")
source("G2P.R")
source("GWASbyCor.R")
X=myGD[,-1]
index1to5=myGM[,2]<6 # TRUE for markers on chromosomes 1-5
X1to5 = X[,index1to5]
set.seed(99164)
# Simulate a phenotype whose 10 QTNs all sit on chromosomes 1-5
mySim=G2P(X= X1to5,h2=.75,alpha=1,NQTN=10,distribution="norm")
# Test every marker, including those on chromosomes 6-10
p= GWASbyCor(X=X,y=mySim$y)

Visualization
color.vector <- rep(c("deepskyblue","orange","forestgreen","indianred3"),10)
m=nrow(myGM)
plot(t(-log10(p))~seq(1:m),col=color.vector[myGM[,2]])
abline(v=mySim$QTN.position, lty = 2, lwd=2, col = "black")

QQ plot
p.obs=p[!index1to5] # p-values of markers on chromosomes 6-10 (no QTNs)
m2=length(p.obs)
p.uni=runif(m2,0,1)
order.obs=order(p.obs)
order.uni=order(p.uni)
plot(-log10(p.uni[order.uni]), -log10(p.obs[order.obs]))
abline(a = 0, b = 1, col = "red")
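The QQ plot above draws the expected p-values at random with runif, so the picture changes slightly from run to run. A minimal sketch of a deterministic variant follows, assuming simulated p-values in place of the lecture's p.obs; ppoints is base R, and p.exp is a name introduced here for illustration only.

```r
# Simulated p-values stand in for p.obs (assumption: the lecture uses GWAS output)
set.seed(1)
p.obs <- runif(1000)
m2 <- length(p.obs)

# Fixed expected quantiles of Uniform(0,1) instead of a random uniform sample
p.exp <- ppoints(m2)

# Sort both on the p-value scale, then move to the -log10 scale
x <- -log10(sort(p.exp))
y <- -log10(sort(p.obs))

plot(x, y, xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
abline(a = 0, b = 1, col = "red")
```

With real results, points rising above the red line at the right end would suggest more small p-values than expected by chance.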

Phenotypes by genotypes
order.obs=order(p.obs)
X6to10=X[,!index1to5]
Xtop=X6to10[,order.obs[1]] # most significant marker on chromosomes 6-10
boxplot(mySim$y~Xtop)

Association with phenotypes
PCA=prcomp(X)
plot(mySim$y,PCA$x[,2])
cor(mySim$y,PCA$x[,2]) # r = -0.32

Least Square Error
y = a + cx + e
set.seed(99164)
s=sample(length(mySim$y),10) # ten random individuals
plot(mySim$y[s],PCA$x[s,2])
cor(mySim$y[s],PCA$x[s,2]) # r = -0.18
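The model y = a + cx + e can be fitted by hand for a single x. The sketch below uses simulated data (an assumption; it does not use mySim or PCA) and checks the closed-form least-squares estimates against lm. The names a.hat and c.hat are introduced here for illustration.

```r
# Simulated stand-in for ten individuals (not the lecture's data)
set.seed(99164)
n <- 10
x <- rnorm(n)
y <- 2 - 0.5 * x + rnorm(n)

# Closed-form least-squares estimates for y = a + c*x + e
c.hat <- cov(x, y) / var(x)
a.hat <- mean(y) - c.hat * mean(x)
e <- y - (a.hat + c.hat * x)   # residuals

# Same fit with lm()
fit <- lm(y ~ x)
print(c(a = a.hat, c = c.hat))
print(coef(fit))

# The slope is the correlation rescaled by the two standard deviations
print(c(r = cor(x, y), slope.rescaled = c.hat * sd(x) / sd(y)))
```

The last line shows why a correlation test and a one-covariate regression carry the same information about the association.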

GLM for GWAS
General Linear Model (GLM)
Y = SNP + Q (or PCs) + e
Y: phenotype on individuals
SNP: fixed effect
Q (or PCs): population structure, fixed effect
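A small simulation can illustrate why Q enters the model. In the sketch below (all data simulated; q, snp, p.naive and p.adj are names introduced here), the phenotype depends only on subpopulation membership, while the SNP merely differs in frequency between subpopulations.

```r
set.seed(4)
n <- 1000
q <- rbinom(n, 1, 0.5)               # subpopulation membership (structure)
snp <- rbinom(n, 2, 0.1 + 0.7 * q)   # allele frequency differs by subpopulation
y <- 3 * q + rnorm(n)                # phenotype driven by structure, not by the SNP

# Y = SNP + e: the SNP picks up the structure signal (spurious association)
p.naive <- summary(lm(y ~ snp))$coefficients["snp", "Pr(>|t|)"]

# Y = SNP + Q + e: structure is absorbed by the covariate
p.adj <- summary(lm(y ~ q + snp))$coefficients["snp", "Pr(>|t|)"]

print(c(naive = p.naive, adjusted = p.adj))
```

In this setup the SNP has no effect of its own, so the naive p-value reflects confounding only.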

Example from the ten individuals
cbind(mySim$y[s],1, PCA$x[s,2],Xtop[s])
Columns: observation (y), mean (1), PC2 (x1), SNP (x2)
b = [b0 b1 b2]'
X = [1 x1 x2]
y = Xb + e

Linear model
$y = b_0 + x_1 b_1 + x_2 b_2 + \dots + x_p b_p + e$
y: observation, dependent variable
x: explanatory/independent variables
e: residuals/errors
$d = e_1^2 + e_2^2 + \dots + e_n^2 = e'e = (y - Xb)'(y - Xb)$

Optimization
$d = e'e = (y - Xb)'(y - Xb)$
$\partial d / \partial b = -2X'(y - Xb) = -2X'y + 2X'Xb = 0$
$X'Xb = X'y$
$b = (X'X)^{-1} X'y$
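The normal equations can be checked numerically. The sketch below uses simulated data (an assumption; x1 and x2 stand in for a PC and a SNP) and confirms that the solution of X'Xb = X'y matches lm and that the gradient of d vanishes at that solution.

```r
set.seed(1)
n <- 50
x1 <- rnorm(n)                          # stand-in for a PC
x2 <- sample(0:2, n, replace = TRUE)    # stand-in for a SNP coded 0/1/2
y <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)
X <- cbind(1, x1, x2)

# Solve X'X b = X'y directly
b <- solve(t(X) %*% X, t(X) %*% y)

# Reference fit
b.lm <- coef(lm(y ~ x1 + x2))

# Gradient of d = (y - Xb)'(y - Xb) at the solution; should be numerically zero
grad <- -2 * t(X) %*% (y - X %*% b)

print(cbind(normal.eq = as.vector(b), lm = unname(b.lm)))
print(max(abs(grad)))
```

Passing the right-hand side to solve avoids forming the inverse explicitly; the lecture code computes the inverse because it is reused for the variance of b.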

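Statistical test
$\hat{y} = X\hat{b}$
$\hat{\sigma}_e^2 = (y - \hat{y})'(y - \hat{y}) / (n - k)$
$\mathrm{Var}(\hat{b}) = (X'X)^{-1} \hat{\sigma}_e^2$
$t = \hat{b} / \sqrt{\mathrm{Var}(\hat{b})} \sim t(n - k)$
where $k$ is the number of columns of $X$ (intercept included).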
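One way to sanity-check the test is to compare a hand calculation with summary(lm(...)). The sketch below uses simulated data (pc and snp are stand-ins, not the lecture's variables) and takes the residual degrees of freedom as n minus the number of columns of X, which is what lm reports.

```r
set.seed(2)
n <- 60
pc <- rnorm(n)
snp <- sample(0:2, n, replace = TRUE)
y <- 1 + 0.8 * pc + rnorm(n)
X <- cbind(1, pc, snp)

C <- solve(crossprod(X))          # (X'X)^-1
b <- C %*% crossprod(X, y)        # least-squares estimates
e <- y - X %*% b                  # residuals
dfe <- n - ncol(X)                # residual degrees of freedom
ve <- sum(e^2) / dfe              # residual variance
se <- sqrt(diag(C) * ve)          # standard errors of b
t.stat <- as.vector(b) / se
p.val <- 2 * pt(abs(t.stat), dfe, lower.tail = FALSE)

ref <- summary(lm(y ~ pc + snp))$coefficients
print(cbind(b = as.vector(b), se = se, t = t.stat, p = p.val))
print(ref)
```

Using lower.tail = FALSE instead of 1 - pt(...) keeps very small p-values from rounding to zero.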

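Action in R
y=mySim$y
X=cbind(1, PCA$x[,2],Xtop)
LHS=t(X)%*%X
C=solve(LHS) # (X'X)^-1
RHS=t(X)%*%y
b=C%*%RHS # least-squares estimates
yb=X%*%b # fitted values
e=y-yb # residuals
n=length(y)
dfe=n-ncol(X) # residual degrees of freedom
ve=sum(e^2)/dfe # residual variance
vt=C*ve # variance-covariance matrix of b
t=b/sqrt(diag(vt))
p=2*(1-pt(abs(t),dfe)) # two-sided p-values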

Phenotypes by genotypes
LM=cbind(b, t, sqrt(diag(vt)), p)
rownames(LM)=c("Mean", "PC2","Xtop")
colnames(LM)=c("b", "t", "SD","p")
LM

Loop through genome
G=myGD[,-1]
n=nrow(G)
m=ncol(G)
P=matrix(NA,1,m)
for (i in 1:m){
  x=G[,i]
  if(max(x)==min(x)){
    p=1 # monomorphic marker: nothing to test
  }else{
    X=cbind(1, PCA$x[,2],x)
    LHS=t(X)%*%X
    C=solve(LHS)
    RHS=t(X)%*%y
    b=C%*%RHS
    yb=X%*%b
    e=y-yb
    n=length(y)
    dfe=n-ncol(X) # residual degrees of freedom
    ve=sum(e^2)/dfe
    vt=C*ve
    t=b/sqrt(diag(vt))
    p=2*(1-pt(abs(t),dfe))
  } #end of testing variation
  P[i]=p[length(p)] # keep the p-value of the marker (last column of X)
} #end of looping for markers

QQ plot (panels: GWASbyCor; GLM with PC2)
p.obs=P[!index1to5]
m2=length(p.obs)
p.uni=runif(m2,0,1)
order.obs=order(p.obs)
order.uni=order(p.uni)
plot(-log10(p.uni[order.uni]), -log10(p.obs[order.obs]), ylim=c(0,7))
abline(a = 0, b = 1, col = "red")

Using three PCs
G=myGD[,-1]
n=nrow(G)
m=ncol(G)
P=matrix(NA,1,m)
for (i in 1:m){
  x=G[,i]
  if(max(x)==min(x)){
    p=1 # monomorphic marker: nothing to test
  }else{
    X=cbind(1, PCA$x[,1:3],x)
    LHS=t(X)%*%X
    C=solve(LHS)
    RHS=t(X)%*%y
    b=C%*%RHS
    yb=X%*%b
    e=y-yb
    n=length(y)
    dfe=n-ncol(X) # residual degrees of freedom
    ve=sum(e^2)/dfe
    vt=C*ve
    t=b/sqrt(diag(vt))
    p=2*(1-pt(abs(t),dfe))
  } #end of testing variation
  P[i]=p[length(p)] # keep the p-value of the marker (last column of X)
} #end of looping for markers
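The two genome loops differ only in which PCs go into X, so the scan can be wrapped in a function that takes the covariates as an argument. This is a sketch on simulated genotypes; glmScan is a hypothetical helper, not part of the lecture code or of GAPIT, and it assumes residual degrees of freedom equal to n minus the number of columns of X.

```r
# Hypothetical helper: one GLM test per marker, covariates supplied by the caller
glmScan <- function(y, G, covariates = NULL) {
  n <- length(y)
  P <- rep(NA_real_, ncol(G))
  for (i in seq_len(ncol(G))) {
    x <- G[, i]
    if (max(x) == min(x)) {          # monomorphic marker: nothing to test
      P[i] <- 1
      next
    }
    X <- cbind(1, covariates, x)     # intercept, covariates, marker (last)
    C <- solve(crossprod(X))
    b <- C %*% crossprod(X, y)
    e <- y - X %*% b
    dfe <- n - ncol(X)
    ve <- sum(e^2) / dfe
    t.stat <- b / sqrt(diag(C) * ve)
    # Keep the two-sided p-value of the marker only
    P[i] <- 2 * pt(abs(t.stat[length(t.stat)]), dfe, lower.tail = FALSE)
  }
  P
}

# Simulated genotypes and phenotype (assumption: not the mdp data set)
set.seed(3)
n <- 100
m <- 20
G <- matrix(sample(0:2, n * m, replace = TRUE), n, m)
G[, 5] <- 0                          # force one monomorphic marker
PCs <- prcomp(G)$x[, 1:3]
y <- rnorm(n)

P <- glmScan(y, G, covariates = PCs)
print(round(P, 3))
```

Calling glmScan(y, G, covariates = PCs[, 2]) would correspond to the single-PC scan, and covariates = NULL to a scan with no adjustment.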

QQ plot (panels: GLM with PC2; GLM with PC1:3)
p.obs=P[!index1to5]
m2=length(p.obs)
p.uni=runif(m2,0,1)
order.obs=order(p.obs)
order.uni=order(p.uni)
plot(-log10(p.uni[order.uni]), -log10(p.obs[order.obs]), ylim=c(0,7))
abline(a = 0, b = 1, col = "red")

QQ plot
p.obs=P # all markers, including chromosomes 1-5
m2=length(p.obs)
p.uni=runif(m2,0,1)
order.obs=order(p.obs)
order.uni=order(p.uni)
plot(-log10(p.uni[order.uni]), -log10(p.obs[order.obs]))
abline(a = 0, b = 1, col = "red")

color.vector <- rep(c("deepskyblue","orange","forestgreen","indianred3"),10)
m=nrow(myGM)
plot(t(-log10(P))~seq(1:m),col=color.vector[myGM[,2]])
abline(v=mySim$QTN.position, lty = 2, lwd=2, col = "black")

Highlight
Spurious association
Covariates
LS concept
Linear model