Washington State University


Statistical Genomics, Lecture 20: MLMM. Zhiwu Zhang, Washington State University

Administration: Homework 4 graded. Homework 5 due Wednesday, April 13, 3:10 PM. Final exam: May 3, 120 minutes (3:10-5:10 PM), 50. Department seminar (March 28): Brigid Meints, "Breeding Barley and Beans for Western Washington".

My beliefs: after the final exam, some things remain lifelong:
$X_i \sim N(0,1)$, $Y = \sum_{i=1}^{n} X_i^2 \sim \chi^2(n)$
$y = Xb + Zu + e$, $\mathrm{Var}(y) = 2K\sigma^2_A + I\sigma^2_E$ (taking $Z = I$)
rep(rainbow(7), 100)
sample(100, 5, replace = FALSE)
Warning signs: QTNs simulated on CHR1-5, but signals pop out on CHR6-10; 100% "prediction accuracy" on a trait with h2 = 0
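The chi-square fact can be checked by simulation. A minimal base-R sketch (n = 5 and 100,000 draws are arbitrary illustrative choices): the sum of n squared N(0,1) variables should have mean near n and variance near 2n, whereas the sum of the unsquared variables is N(0, n), not chi-square.

```r
# Monte Carlo check that a sum of n squared standard normals is chi-square(n)
set.seed(99164)
n <- 5            # degrees of freedom (illustrative)
nsim <- 100000    # number of simulated Y values
Xmat <- matrix(rnorm(n * nsim), nrow = nsim, ncol = n)

Y <- rowSums(Xmat^2)                 # Y = sum of X_i^2 over i = 1..n
c(mean = mean(Y), var = var(Y))      # chi-square(n): mean n, variance 2n
ks.test(Y, "pchisq", df = n)$p.value # goodness of fit to chi-square(n)

S <- rowSums(Xmat)                   # unsquared sum: N(0, n), not chi-square
c(mean = mean(S), var = var(S))
```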

Core values behind statistics, programming, genetics, GWAS and GS in CROPS545: Doing >> looking; Reasoning; Learn = (re)invent; Creative; Self-confidence

Doing >> looking

Reasoning: the teaching model. Hypothesis: there is no room to improve. Objective: reject the null hypothesis. Method: increase statistical power.

Learn = (re)Invent

Creative: dare to break the rules, with judgment.

Self-confidence: Chongqing questioned why decreasing the missing rate does not improve the accuracy of stochastic imputation; Joe questioned what "u" is in MLM; Louisa's finding about setting the seed in the impute (KNN) package; and one more example of my own.

Evaluation comment: much more work than other WSU courses. Adjustments: assignments reduced from 9 to 6. Requirements: no prior experience with statistics or programming. Easy to pass, or a grade of C- after the first assignment, unless there is unusual behavior or a recommendation to withdraw.

Outline: Stepwise regression; Criteria; MLMM; Power vs. FDR and Type I error; Replicates and means

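Testing SNPs one at a time: Y = SNP + Q (or PCs) + Kinship + e, where Y is the phenotype, SNP and Q (or PCs, accounting for population structure) are fixed effects, and Kinship (accounting for unequal relatedness) is a random effect. Without the kinship term this is the General Linear Model (GLM); with it, the Mixed Linear Model (MLM) (Yu et al. 2005, Nature Genetics).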
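A minimal sketch of the GLM half of this model in base R: lm() is fitted one SNP at a time, with principal components standing in for Q. The two subpopulations, allele frequencies, and effect sizes are made up for illustration. The MLM adds the kinship random effect, which needs a variance-component solver such as EMMA or GAPIT and is not shown here.

```r
# GLM scan: y ~ SNP + PCs, testing one SNP at a time (fixed effects only)
set.seed(99164)
n <- 200   # individuals (illustrative)
m <- 50    # markers (illustrative)

# Two hypothetical subpopulations with different allele frequencies
pop  <- rep(c(0, 1), each = n / 2)
freq <- ifelse(pop == 1, 0.7, 0.3)
geno <- sapply(seq_len(m), function(j) rbinom(n, 2, freq))  # n x m, coded 0/1/2
pcs  <- prcomp(geno)$x[, 1:2]   # first two PCs as a proxy for Q

# Phenotype: SNP 10 is causal; pop adds a structure effect; plus noise
y <- 1.0 * geno[, 10] + 2 * pop + rnorm(n)

glm_p <- sapply(seq_len(m), function(j) {
  fit <- lm(y ~ geno[, j] + pcs)
  coef(summary(fit))[2, 4]      # p-value of the SNP term
})
which.min(glm_p)
```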

GWAS does not work for traits associated with structure. Panels: test without correction vs. correction with MLM (Nature 2010; Magnus Nordborg).

Two years later

Stepwise regression: choose m predictive variables from M variables (M >> m). The challenges: choosing the best m of M is an NP-hard problem, so the practical option is approximation; and the criteria for choosing are not unique.
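To see why exhaustive search is impractical, a short base-R count of the candidate models when exactly m of M variables are chosen (M = 1000 and k = 10 are arbitrary illustrative values):

```r
# Number of candidate models when choosing exactly m of M variables
M <- 1000
for (m in c(1, 2, 5, 10)) {
  cat("m =", m, " models:", format(choose(M, m), scientific = TRUE), "\n")
}

# Forward stepwise with k steps fits roughly k * M single-variable tests instead
k <- 10
k * M
```

The count grows roughly like M^m / m!, which is why stepwise procedures settle for a greedy approximation.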

Stepwise regression criteria: a sequence of F-tests or t-tests; adjusted R-square; Akaike information criterion (AIC); Bayesian information criterion (BIC); Mallows's Cp; PRESS; false discovery rate (FDR). Why so many?
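Several of these criteria can be computed side by side for two nested linear models. A sketch on simulated data (x3 and x4 are pure noise). Cp and PRESS follow the textbook formulas Cp = SSE_p / σ̂²_full - n + 2p and PRESS = Σ (e_i / (1 - h_ii))²; criteria() is a hypothetical helper, not part of any package.

```r
# Compare a small and a full linear model on several selection criteria
set.seed(99164)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 1.5 * dat$x2 + rnorm(n)   # x3, x4 are noise

full  <- lm(y ~ x1 + x2 + x3 + x4, data = dat)
small <- lm(y ~ x1 + x2, data = dat)
sigma2_full <- summary(full)$sigma^2   # error variance from the full model, used by Cp

criteria <- function(fit) {
  p     <- length(coef(fit))            # number of parameters, intercept included
  sse   <- sum(residuals(fit)^2)
  press <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)  # leave-one-out residuals
  c(adjR2 = summary(fit)$adj.r.squared,
    AIC   = AIC(fit),
    BIC   = BIC(fit),
    Cp    = sse / sigma2_full - n + 2 * p,
    PRESS = press)
}
round(rbind(small = criteria(small), full = criteria(full)), 3)
```

For the full model Cp equals p by construction, so Cp is only informative for submodels; each criterion trades fit against complexity differently, which is one reason there are so many.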

Forward stepwise regression (t or F test):
1. Test the M variables one at a time.
2. Fit the most significant variable as a covariate.
3. Test the remaining variables one at a time.
4. Is the most influential variable significant? Yes: fit it as a covariate and return to step 3. No: end.
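The forward procedure can be sketched in base R with lm() and t-test p-values. forward_select() and its alpha threshold are illustrative assumptions; MLMM itself uses a mixed model and a different stopping rule.

```r
# Forward stepwise selection by t-test p-values (alpha is an illustrative threshold)
forward_select <- function(y, X, alpha = 0.001) {
  selected <- integer(0)
  repeat {
    candidates <- setdiff(seq_len(ncol(X)), selected)
    if (length(candidates) == 0) break
    # Test each remaining variable with the already-selected ones as covariates
    pvals <- sapply(candidates, function(j) {
      fit <- lm(y ~ X[, c(selected, j)])
      coef(summary(fit))[length(selected) + 2, 4]   # p-value of the new variable
    })
    best <- which.min(pvals)
    if (pvals[best] >= alpha) break   # most influential is not significant: end
    selected <- c(selected, candidates[best])
  }
  selected
}

set.seed(99164)
n <- 200
M <- 30
X <- matrix(rnorm(n * M), n, M)
y <- 1.5 * X[, 3] - 1.0 * X[, 17] + 0.8 * X[, 25] + rnorm(n)
sel <- forward_select(y, X, alpha = 0.001)
sel
```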

Backward stepwise regression (t or F test):
1. Test the m variables simultaneously.
2. Is the least influential variable significant? Yes: end. No: remove it, test the remaining variables simultaneously, and repeat step 2.
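A matching base-R sketch of backward elimination; backward_eliminate() and alpha are hypothetical, and the data are simulated with three true predictors among 30.

```r
# Backward elimination by t-test p-values (alpha is an illustrative threshold)
backward_eliminate <- function(y, X, alpha = 0.001) {
  keep <- seq_len(ncol(X))
  while (length(keep) > 0) {
    fit   <- lm(y ~ X[, keep, drop = FALSE])   # test all kept variables simultaneously
    pvals <- coef(summary(fit))[-1, 4]         # drop the intercept row
    worst <- which.max(pvals)                  # least influential variable
    if (pvals[worst] < alpha) break            # it is significant: end
    keep <- keep[-worst]                       # otherwise remove it and refit
  }
  keep
}

set.seed(99164)
n <- 200
M <- 30
X <- matrix(rnorm(n * M), n, M)
y <- 1.5 * X[, 3] - 1.0 * X[, 17] + 0.8 * X[, 25] + rnorm(n)
kept <- backward_eliminate(y, X)
kept
```

Backward elimination needs n larger than the number of variables to fit the starting model, which is why GWAS methods such as MLMM start forward and only eliminate among the few pseudo QTNs.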

Hint from MHC (major histocompatibility complex)

Nature Genetics, 2012, 44, 825-830: two QTNs analyzed with GLM, MLM and MLMM

MLMM:
y = SNP + Q + K + e; the most significant SNP becomes pseudo QTN1.
y = SNP + QTN1 + Q + K + e; the most significant SNP becomes pseudo QTN2.
y = SNP + QTN1 + QTN2 + Q + K + e; and so on, until the stopping rule is met.

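Forward regression: y = SNP + QTN1 + QTN2 + … + Q + K + e. Stop when the ratio Var(u)/Var(y) is close to zero, that is, when the pseudo QTNs have absorbed nearly all of the variance previously explained by the kinship random effect u.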

Backward elimination: starting from y = QTN1 + QTN2 + … + QTNt + Q + K + e, remove the least significant pseudo QTN and refit y = QTN1 + QTN2 + … + QTNt-1 + Q + K + e. Repeat until all pseudo QTNs are significant.

Final p values:
Pseudo QTNs: y = QTN1 + QTN2 + … + Q + K + e
Other markers: y = SNP + QTN1 + QTN2 + … + Q + K + e
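How the final p-values are assigned can be illustrated with ordinary least squares. This sketch deliberately drops Q and K (plain lm() instead of a mixed model), so it shows only the bookkeeping: pseudo QTNs get their p-values from one joint model, and every other marker is tested with all pseudo QTNs as covariates. final_pvalues() is a hypothetical helper.

```r
# Assign final p-values given pseudo QTNs (column indices of X); no Q or K here
final_pvalues <- function(y, X, qtn) {
  p <- numeric(ncol(X))
  # Pseudo QTNs: one joint model y = QTN1 + QTN2 + ... + e
  fit_q  <- lm(y ~ X[, qtn, drop = FALSE])
  p[qtn] <- coef(summary(fit_q))[-1, 4]
  # Other markers: y = SNP + QTN1 + QTN2 + ... + e, one SNP at a time
  for (j in setdiff(seq_len(ncol(X)), qtn)) {
    fit  <- lm(y ~ X[, qtn, drop = FALSE] + X[, j])
    p[j] <- coef(summary(fit))[length(qtn) + 2, 4]   # p-value of the tested SNP
  }
  p
}

set.seed(99164)
n <- 200
M <- 30
X <- matrix(rnorm(n * M), n, M)
y <- 1.5 * X[, 3] - 1.0 * X[, 17] + rnorm(n)
p <- final_pvalues(y, X, qtn = c(3, 17))
```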

MLMM R on GitHub

# Setup (run first): clear the workspace, then load MLMM and GAPIT
rm(list = ls())
setwd('/Users/Zhiwu/Dropbox/Current/ZZLab/WSUCourse/CROPS545/mlmm-master')
source('mlmm_cof.r')
library("MASS")       # required for ginv
library(multtest)
library(gplots)
library(compiler)     # required for cmpfun
library("scatterplot3d")
source("http://www.zzlab.net/GAPIT/emma.txt")
source("http://www.zzlab.net/GAPIT/gapit_functions.txt")
source("/Users/Zhiwu/Dropbox//GAPIT/functions/gapit_functions.txt")   # local copy of the GAPIT functions

# Demo data: numeric genotypes and SNP map
setwd("/Users/Zhiwu/Dropbox/Current/ZZLab/WSUCourse/CROPS512/Demo")
myGD <- read.table("mdp_numeric.txt", header = TRUE)
myGM <- read.table("mdp_SNP_information.txt", header = TRUE)

# PCs and kinship (K) from GAPIT
setwd("~/Desktop/temp")
myGAPIT0 = GAPIT(GD = myGD, GM = myGM, PCA.total = 3)
myPC = as.matrix(myGAPIT0$PCA[, -1])
myK = as.matrix(myGAPIT0$kinship[, -1])
myX = as.matrix(myGD[, -1])

# Simulate 10 QTNs on the first five chromosomes
X = myGD[, -1]
index1to5 = myGM[, 2] < 6
X1to5 = X[, index1to5]
taxa = myGD[, 1]
set.seed(99164)
GD.candidate = cbind(taxa, X1to5)
mySim = GAPIT.Phenotype.Simulation(GD = GD.candidate, GM = myGM[index1to5, ], h2 = .5, NQTN = 10, QTNDist = "norm")
myy = as.numeric(mySim$Y[, -1])

# MLMM with the first two PCs as cofactors
myMLMM <- mlmm_cof(myy, myX, myPC[, 1:2], myK, nbchunks = 2, maxsteps = 20)
myP = myMLMM$pval_step[[1]]$out[, 2]
myGI.MP = cbind(myGM[, -1], myP)

# Manhattan and QQ plots (written to the current working directory)
GAPIT.Manhattan(GI.MP = myGI.MP, seqQTN = mySim$QTN.position)
GAPIT.QQ(myP)

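GAPIT.FDR.TypeI function
myGWAS = cbind(myGM, myP, NA)
# WS: window sizes (bp) within which a significant SNP counts as detecting a QTN
myStat = GAPIT.FDR.TypeI(WS = c(1e0, 1e3, 1e4, 1e5), GM = myGM, seqQTN = mySim$QTN.position, GWAS = myGWAS)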
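The idea behind power and FDR with a window can be sketched without GAPIT. In this hypothetical power_fdr() helper, a significant SNP counts as a true positive if it lies within `window` bp of a simulated QTN on the same chromosome, and a QTN counts as detected if any significant SNP lies within its window. This is an assumption about the general approach, not the actual code of GAPIT.FDR.TypeI; the eight SNPs below are toy data.

```r
# Power and FDR at one p-value threshold and one window size (toy sketch)
power_fdr <- function(chr, pos, p, qtn_idx, threshold, window) {
  sig <- which(p <= threshold)
  # Is significant SNP i within the window of any QTN on the same chromosome?
  near_qtn <- function(i) {
    any(chr[qtn_idx] == chr[i] & abs(pos[qtn_idx] - pos[i]) <= window)
  }
  tp <- vapply(sig, near_qtn, logical(1))
  # A QTN is detected if at least one significant SNP falls within its window
  detected <- vapply(qtn_idx, function(q) {
    any(chr[sig] == chr[q] & abs(pos[sig] - pos[q]) <= window)
  }, logical(1))
  c(power = mean(detected),
    FDR   = if (length(sig) == 0) 0 else sum(!tp) / length(sig))
}

chr <- c(1, 1, 1, 1, 2, 2, 2, 2)
pos <- c(100, 500, 2000, 9000, 100, 800, 5000, 20000)
p   <- c(1e-8, 1e-3, 0.5, 1e-6, 0.2, 1e-7, 0.9, 0.04)
qtn_idx <- c(1, 6)   # SNPs 1 and 6 are the simulated QTNs

power_fdr(chr, pos, p, qtn_idx, threshold = 1e-5, window = 1000)
power_fdr(chr, pos, p, qtn_idx, threshold = 1e-5, window = 1e4)
```

Widening the window turns SNP 4 (8,900 bp from QTN 1) from a false into a true positive, which is why results are reported for several window sizes.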

Return

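Area Under Curve (AUC)
par(mfrow = c(1, 2), mar = c(5, 2, 5, 2))          # two panels side by side
plot(myStat$FDR[, 1], myStat$Power, type = "b")     # power vs. FDR (column 1)
plot(myStat$TypeI[, 1], myStat$Power, type = "b")   # power vs. Type I error (column 1)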
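One common way to summarize such a curve is the area under it by the trapezoid rule over the observed points. A small base-R sketch; auc_trapezoid() is hypothetical, and GAPIT may compute its AUC differently.

```r
# Area under a power-vs-FDR (or power-vs-Type-I-error) curve, trapezoid rule
auc_trapezoid <- function(x, y) {
  o <- order(x)          # integrate along increasing x
  x <- x[o]
  y <- y[o]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

auc_trapezoid(c(0, 0.5, 1), c(0, 1, 1))   # 0.25 + 0.5
auc_trapezoid(c(0, 1), c(0, 1))           # diagonal line
```

The area covers only the range of x actually observed, so curves compared this way should span the same FDR or Type I error range.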

Replicates
nrep = 10
set.seed(99164)
statRep = replicate(nrep, {
  mySim = GAPIT.Phenotype.Simulation(GD = GD.candidate, GM = myGM[index1to5, ], h2 = .5, NQTN = 10, QTNDist = "norm")
  myy = as.numeric(mySim$Y[, -1])
  myMLMM <- mlmm_cof(myy, myX, myPC[, 1:2], myK, nbchunks = 2, maxsteps = 20)
  myP = myMLMM$pval_step[[1]]$out[, 2]
  myGWAS = cbind(myGM, myP, NA)
  # The value of the last expression (myStat) is kept for each replicate
  myStat = GAPIT.FDR.TypeI(WS = c(1e0, 1e3, 1e4, 1e5), GM = myGM, seqQTN = mySim$QTN.position, GWAS = myGWAS)
})

str(statRep)

Means over replicates
power = statRep[[2]]   # component 2 (power), taken from the first replicate
# FDR: component 3 of the 7 returned per replicate
s.fdr = seq(3, length(statRep), 7)
fdr = statRep[s.fdr]
fdr.mean = Reduce("+", fdr) / length(fdr)
# AUC of power vs. FDR: component 6
s.auc.fdr = seq(6, length(statRep), 7)
auc.fdr = statRep[s.auc.fdr]
auc.fdr.mean = Reduce("+", auc.fdr) / length(auc.fdr)
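The positional indexing works because replicate() simplifies a list of seven-component lists into a 7 x nrep list-matrix stored column by column, so component k of replicate r sits at position k + 7(r - 1). A toy sketch with a made-up toy_stat() in place of GAPIT.FDR.TypeI: its component names and contents are invented, and only the count of seven and the positions of FDR (3) and AUC (6) mirror the lecture code.

```r
# Toy stand-in for GAPIT.FDR.TypeI: each replicate returns a list of 7 components
toy_stat <- function() {
  list(a = 1, power = seq(0, 1, 0.5), fdr = matrix(runif(6), 3, 2),
       d = 0, e = 0, auc.fdr = runif(2), g = 0)
}

set.seed(99164)
nrep <- 4
statRep <- replicate(nrep, toy_stat())   # simplified to a 7 x nrep list-matrix
dim(statRep)

# Column-major storage: component 3 of every replicate
fdr <- statRep[seq(3, length(statRep), 7)]
fdr.mean <- Reduce("+", fdr) / length(fdr)   # element-wise mean of the matrices
fdr.mean
```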

Plots of power vs. FDR
theColor = rainbow(4)   # one color per window size
plot(fdr.mean[, 1], power, type = "b", col = theColor[1], xlim = c(0, 1))
for (i in 2:ncol(fdr.mean)) {
  lines(fdr.mean[, i], power, type = "b", col = theColor[i])
}

Highlights: Stepwise regression; Criteria; MLMM; Power vs. FDR and Type I error; Replicates and means