Statistical Genomics, Lecture 20: MLMM. Zhiwu Zhang, Washington State University.

Administration: Crop and Soil Science Seminar, Monday, March 26, 1:10 pm, Johnson Hall 343. Student presentations: Yvonne Thompson, Alexandra Davis, Jacob Lamkey.

Outline: stepwise regression, selection criteria, MLMM, power vs. FDR and type I error, replicates and means.

Testing SNPs one at a time: the phenotype is modeled as Y = SNP + Q (or PCs) + Kinship + e, where the SNP and Q (population structure) terms are fixed effects and the kinship term (unequal relatedness) is a random effect. Without the kinship term this is the General Linear Model (GLM); with it, the Mixed Linear Model (MLM) (Yu et al. 2006, Nature Genetics).
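A minimal sketch of the GLM part of this scan on toy data (simulated genotypes with principal components standing in for Q; fitting the kinship random effect of the MLM additionally requires mixed-model software such as GAPIT, used later in the demo):

# Minimal sketch (toy data, not the lecture data): the GLM single-SNP scan,
# fitting y = SNP + PCs + e for each marker in turn.
set.seed(1)
n <- 200; M <- 50
snp <- matrix(rbinom(n * M, 2, 0.3), n, M)        # genotypes coded 0/1/2
pcs <- prcomp(snp)$x[, 1:2]                       # stand-in for Q (population structure)
y <- drop(0.8 * snp[, 5] + pcs %*% c(0.5, -0.3) + rnorm(n))   # marker 5 is causal
pvals <- apply(snp, 2, function(s) {
  fit <- lm(y ~ s + pcs)                          # one fixed-effect test per SNP
  summary(fit)$coefficients["s", "Pr(>|t|)"]      # p-value of the SNP term
})
which.min(pvals)                                  # most significant marker (likely 5)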

Hint from the MHC (major histocompatibility complex).

Stepwise regression: choose m predictive variables from M candidate variables (M >> m). The challenges: selecting the best subset of m out of M is an NP-hard problem (with M markers there are 2^M possible subsets, so exhaustive search is infeasible), leaving approximation as the practical option, and there is no unique selection criterion.

Criteria used by stepwise regression procedures: a sequence of F-tests or t-tests, adjusted R-square, Akaike information criterion (AIC), Bayesian information criterion (BIC), Mallows's Cp, PRESS, and false discovery rate (FDR). Why so many?
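As a small illustration (toy data, not the lecture example), several of these criteria can be computed for the same candidate models, and they need not agree on which model is best:

# Toy comparison of model-selection criteria.
set.seed(1)
n <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)    # x3 is pure noise
y <- 1 + 2 * x1 - x2 + rnorm(n)
m1 <- lm(y ~ x1)
m2 <- lm(y ~ x1 + x2)
m3 <- lm(y ~ x1 + x2 + x3)
# Different criteria can rank the same candidate models differently.
data.frame(model = c("x1", "x1+x2", "x1+x2+x3"),
           adj.R2 = c(summary(m1)$adj.r.squared, summary(m2)$adj.r.squared, summary(m3)$adj.r.squared),
           AIC = AIC(m1, m2, m3)$AIC,
           BIC = BIC(m1, m2, m3)$BIC)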

Forward stepwise regression: test the M variables one at a time; fit the most significant variable as a covariate; test the remaining variables one at a time; if the most influential remaining variable is significant, add it and repeat; otherwise, end. (A small step() sketch follows the backward procedure below.)

Backward stepwise regression: fit the m variables simultaneously; if the least influential variable is significant, end; otherwise, remove it and test the remaining variables again.
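As mentioned above, a minimal sketch of both directions using R's step(), which selects by AIC by default (toy data, not the lecture code):

# Forward and backward stepwise selection with step().
set.seed(2)
n <- 100; M <- 8
X <- as.data.frame(matrix(rnorm(n * M), n, M))
names(X) <- paste0("x", 1:M)
X$y <- 2 * X$x1 - 1.5 * X$x3 + rnorm(n)                  # only x1 and x3 matter
full.formula <- reformulate(paste0("x", 1:M), response = "y")
null.model <- lm(y ~ 1, data = X)
full.model <- lm(full.formula, data = X)
# Forward: start from the intercept-only model and add variables.
fwd <- step(null.model, scope = full.formula, direction = "forward", trace = 0)
# Backward: start from the full model and drop variables.
bwd <- step(full.model, direction = "backward", trace = 0)
formula(fwd)
formula(bwd)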

Example with two QTNs compared across GLM, MLM, and MLMM (Segura et al., Nature Genetics, 2012, 44:825-830).

MLMM: start with y = SNP + Q + K + e. Add the most significant SNP as a pseudo QTN: y = SNP + QTN1 + Q + K + e. Add the next most significant SNP as another pseudo QTN: y = SNP + QTN1 + QTN2 + Q + K + e. And so on, until…

Forward regression: y = SNP + QTN1 + QTN2 + … + Q + K + e. Stop when the ratio Var(u)/Var(y), the fraction of phenotypic variance still captured by the kinship random effect, is close to zero.
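A minimal sketch of that stopping quantity, assuming emma.REMLE() from the EMMA code sourced later in the demo returns REML variance estimates $vg and $ve (this signature is an assumption, not lecture code):

pseudo.h2 <- function(y, covariates, K) {
  X <- cbind(1, covariates)        # intercept plus current fixed effects (Q, pseudo QTNs)
  fit <- emma.REMLE(y, X, K)       # assumed to return $vg (genetic) and $ve (residual)
  fit$vg / (fit$vg + fit$ve)       # share of variance left in the kinship term
}
# In the forward loop, stop adding pseudo QTNs once
# pseudo.h2(y, cbind(Q, qtn.matrix), K) approaches zero.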

Backward elimination: fit y = QTN1 + QTN2 + … + QTNt + Q + K + e; remove the least significant pseudo QTN and refit y = QTN1 + QTN2 + … + QTNt-1 + Q + K + e; repeat until all remaining pseudo QTNs are significant.
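Putting the two phases together, a conceptual sketch of the forward/backward scheme (an illustration of the procedure above, not the mlmm_cof() implementation; scan.mlm(), joint.pvalues(), and pseudo.h2() are hypothetical helpers standing for a per-SNP mixed-model scan, joint p-values of the fitted pseudo QTNs, and the Var(u)/Var(y) ratio):

mlmm.sketch <- function(y, X, Q, K, max.steps = 20, alpha = 0.05) {
  qtn <- integer(0)
  # Forward: add the most significant SNP as a pseudo QTN each step,
  # stopping when the kinship term explains almost no variance.
  for (s in 1:max.steps) {
    if (pseudo.h2(y, cbind(Q, X[, qtn]), K) < 0.01) break   # Var(u)/Var(y) near zero
    p <- scan.mlm(y, X, covariates = cbind(Q, X[, qtn]), K) # hypothetical per-SNP scan
    qtn <- c(qtn, which.min(p))
  }
  # Backward: drop the least significant pseudo QTN until all remaining ones are significant.
  repeat {
    if (length(qtn) == 0) break
    pq <- joint.pvalues(y, X[, qtn, drop = FALSE], Q, K)    # hypothetical joint test
    if (all(pq <= alpha)) break
    qtn <- qtn[-which.max(pq)]
  }
  qtn
}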

Final p values. Pseudo QTNs: y = QTN1 + QTN2 + … + Q + K + e. Other markers: y = SNP + QTN1 + QTN2 + … + Q + K + e.

MLMM R on GitHub

# Clear the workspace and load the MLMM code, required packages, and GAPIT
rm(list=ls())
setwd('/Users/Zhiwu/Dropbox/Current/ZZLab/WSUCourse/CROPS545/mlmm-master')
source('mlmm_cof.r')
library("MASS") # required for ginv
library(multtest)
library(gplots)
library(compiler) # required for cmpfun
library("scatterplot3d")
source("http://www.zzlab.net/GAPIT/emma.txt")
source("http://www.zzlab.net/GAPIT/gapit_functions.txt")
source("/Users/Zhiwu/Dropbox//GAPIT/functions/gapit_functions.txt") # local copy of the GAPIT functions

# Genotype data and SNP map
setwd("/Users/Zhiwu/Dropbox/Current/ZZLab/WSUCourse/CROPS512/Demo")
myGD <- read.table("mdp_numeric.txt", head = TRUE)
myGM <- read.table("mdp_SNP_information.txt", head = TRUE)

# PCs and kinship from GAPIT
setwd("~/Desktop/temp")
myGAPIT0=GAPIT(GD=myGD,GM=myGM,PCA.total=3)
myPC=as.matrix(myGAPIT0$PCA[,-1])
myK=as.matrix(myGAPIT0$kinship[,-1])
myX=as.matrix(myGD[,-1])

# Simulate 10 QTNs on the first five chromosomes
X=myGD[,-1]
index1to5=myGM[,2]<6
X1to5=X[,index1to5]
taxa=myGD[,1]
set.seed(99164)
GD.candidate=cbind(taxa,X1to5)
mySim=GAPIT.Phenotype.Simulation(GD=GD.candidate,GM=myGM[index1to5,],h2=.5,NQTN=10,QTNDist="norm")
myy=as.numeric(mySim$Y[,-1])

# Run MLMM and plot the Manhattan and QQ plots
myMLMM<-mlmm_cof(myy,myX,myPC[,1:2],myK,nbchunks=2,maxsteps=20)
myP=myMLMM$pval_step[[1]]$out[,2]
myGI.MP=cbind(myGM[,-1],myP)
setwd("~/Desktop/temp")
GAPIT.Manhattan(GI.MP=myGI.MP,seqQTN=mySim$QTN.position)
GAPIT.QQ(myP)

GAPIT.FDR.TypeI function
myGWAS=cbind(myGM,myP,NA)
myStat=GAPIT.FDR.TypeI(WS=c(1e0,1e3,1e4,1e5), GM=myGM, seqQTN=mySim$QTN.position, GWAS=myGWAS)

Returned object of GAPIT.FDR.TypeI

Area Under Curve (AUC)
par(mfrow=c(1,2),mar=c(5,2,5,2))
plot(myStat$FDR[,1],myStat$Power,type="b")
plot(myStat$TypeI[,1],myStat$Power,type="b")
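The area under each of these power curves can be approximated numerically, for example with the trapezoidal rule; a minimal sketch (an illustration, not lecture code):

# Trapezoidal-rule approximation of the area under a power curve.
auc <- function(x, y) {
  o <- order(x)                                  # sort by the x axis (FDR or type I error)
  x <- x[o]; y <- y[o]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}
auc(myStat$FDR[,1], myStat$Power)                # area under power vs. FDR
auc(myStat$TypeI[,1], myStat$Power)              # area under power vs. type I error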

Replicates
nrep=10
set.seed(99164)
statRep=replicate(nrep, {
  mySim=GAPIT.Phenotype.Simulation(GD=GD.candidate,GM=myGM[index1to5,],h2=.5,NQTN=10,QTNDist="norm")
  myy=as.numeric(mySim$Y[,-1])
  myMLMM<-mlmm_cof(myy,myX,myPC[,1:2],myK,nbchunks=2,maxsteps=20)
  myP=myMLMM$pval_step[[1]]$out[,2]
  myGWAS=cbind(myGM,myP,NA)
  myStat=GAPIT.FDR.TypeI(WS=c(1e0,1e3,1e4,1e5),GM=myGM,seqQTN=mySim$QTN.position,GWAS=myGWAS)
})

str(statRep)

Means over replicates
power=statRep[[2]]
#FDR
s.fdr=seq(3,length(statRep),7)
fdr=statRep[s.fdr]
fdr.mean=Reduce("+", fdr) / length(fdr)
#TypeI
s.t1=seq(4,length(statRep),7)
t1=statRep[s.t1]
t1.mean=Reduce("+", t1) / length(t1)

Area Under Curve (AUC)
par(mfrow=c(1,2),mar=c(5,2,5,2))
plot(fdr.mean[,1],power,type="b")
plot(t1.mean[,1],power,type="b")

Highlight: stepwise regression, selection criteria, MLMM, power vs. FDR and type I error, replicates and means.