Linear Models for Microarray Data

FDR Comparisons

> FDRpvalue<-p.adjust(LimmaFitPvalue,method="fdr")
> FDRTPvalue<-p.adjust(TPvalue,method="fdr")
> plot(LimmaFit$coefficients[complete.data],-log(FDRTPvalue[complete.data],base=10),type="p",main="Two-Sample t-test",xlab="Mean Difference",ylab="-log10(FDRTPvalue)",ylim=c(0,0.6))
> plot(LimmaFit$coefficients[complete.data],-log(FDRpvalue[complete.data],base=10),type="p",main="Paired t-test",xlab="Mean Difference",ylab="-log10(FDRpvalue)",ylim=c(0,0.6))
> source("

False Discovery Rate

Alternatively, adjust the p-values as q_i = (N / rank(p_i)) * p_i. The following decision-making procedure then keeps the FDR below q*: reject every hypothesis whose adjusted p-value is at most q*.
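The Benjamini-Hochberg adjustment and decision rule described above can be checked in plain R. The p-values below are made up for illustration; the manual calculation is compared against the built-in `p.adjust()`.

```r
# Toy p-values standing in for LimmaFitPvalue (no ties, already sorted).
p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.950)
N <- length(p)

# Raw BH ratios: (N / rank(p_i)) * p_i, as on the adjustment slide.
q.raw <- (N / rank(p)) * p

# p.adjust() additionally enforces monotonicity: a running minimum taken
# from the largest p-value downward, capped at 1.
o <- order(p, decreasing = TRUE)
q.adj <- pmin(1, cummin(q.raw[o]))[order(o)]

stopifnot(isTRUE(all.equal(q.adj, p.adjust(p, method = "fdr"))))

# Decision rule: reject every hypothesis with adjusted p-value <= q*.
qstar <- 0.05
which(q.adj <= qstar)
```

With these toy values only the first two hypotheses are rejected at q* = 0.05.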

FDR p-value adjustments

> N<-sum(!is.na(LimmaFitPvalue))
> Qvalue<-(N/rank(LimmaFitPvalue))*LimmaFitPvalue
> FDRpvalue<-p.adjust(LimmaFitPvalue,method="fdr")
> BONFpvalue<-p.adjust(LimmaFitPvalue,method="bonferroni")
> All.Sigs<-data.frame(list(p.value=LimmaFitPvalue,Qvalue=Qvalue,FDRpvalue=FDRpvalue,BONFpvalue=BONFpvalue))
> OrderingPvalues<-order(LimmaFitPvalue)
> All.Sigs<-All.Sigs[OrderingPvalues,]

FDR p-value adjustments

> All.Sigs[1:10,]
[table of the 10 smallest p.value rows with their Qvalue, FDRpvalue, and BONFpvalue columns; numeric values not recovered]

Loess Normalization

Loess Normalization

> NickelLoess<-loess(NNBLimmaDataNickel$M[,1]~NNBLimmaDataNickel$A[,1])
> attributes(NickelLoess)
$names
 [1] "n"         "fitted"    "residuals" "enp"       "s"         "one.delta"
 [7] "two.delta" "trace.hat" "divisor"   "pars"      "kd"        "call"
[13] "terms"     "xnames"    "x"         "y"         "weights"
$class
[1] "loess"
> par(mfrow=c(1,1))
> plotMA(NNBLimmaDataNickel,array=1,zero.weights=F,legend=F)
> lines(c(-5,100),c(0,0),lwd=5,col="blue")
> points(NNBLimmaDataNickel$A[,1],NickelLoess$fitted,col="red")

Loess Normalization

> NLNBLimmaDataNickel<-normalizeWithinArrays(NBLimmaDataNickel,method="loess")
> par(mfrow=c(1,1))
> plotMA(NLNBLimmaDataNickel,array=1,zero.weights=F,legend=F)
> lines(c(-5,100),c(0,0),lwd=5,col="blue")
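What `normalizeWithinArrays(method="loess")` does can be sketched in base R: fit a loess curve of M on A and subtract the fitted values. The simulated M and A values below are stand-ins for the Nickel arrays.

```r
# Minimal sketch of within-array loess normalization (base R only).
set.seed(1)
A <- runif(500, 4, 14)             # average log-intensities
trend <- 0.5 * sin(A / 2)          # an intensity-dependent dye bias (assumed shape)
M <- trend + rnorm(500, sd = 0.3)  # log-ratios contaminated by that bias

fit <- loess(M ~ A)                # same call as on the earlier slide
M.norm <- M - fitted(fit)          # subtract the fitted trend

# After normalization the M values are centred near zero across A:
# refitting loess to the normalized values should find almost no trend.
mean(abs(fitted(loess(M.norm ~ A))))
```

Removing the trend also shrinks the overall spread of M, which is why MA plots look flatter after normalization.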

Loess Normalized Analysis

> LoessLimmaFit<-lmFit(NLNBLimmaDataNickel,design)
> LoessLimmaFitTstat<-abs(LoessLimmaFit$coefficients/(LoessLimmaFit$sigma*LoessLimmaFit$stdev.unscaled))
> LoessLimmaFitPvalue<-2*pt(LoessLimmaFitTstat,LoessLimmaFit$df.residual,lower.tail=FALSE)
> LoessFDRpvalue<-p.adjust(LoessLimmaFitPvalue,method="fdr")
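The two-sided p-value computation used above is the usual one: twice the upper-tail probability of the absolute t-statistic. A quick check on an arbitrary t-value confirms the `lower.tail=FALSE` form matches the textbook `2*(1 - pt(...))` form.

```r
# Two-sided p-value from a t-statistic (toy numbers, not from the data).
tstat <- 2.5
df <- 4

p1 <- 2 * pt(abs(tstat), df, lower.tail = FALSE)  # form used on the slide
p2 <- 2 * (1 - pt(abs(tstat), df))                # equivalent textbook form

stopifnot(isTRUE(all.equal(p1, p2)))
p1
```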

Empirical Bayes Estimation

> EBayesLoessLimmaFit<-eBayes(LoessLimmaFit)
> attributes(EBayesLoessLimmaFit)
$names
 [1] "coefficients"     "stdev.unscaled"   "sigma"            "df.residual"
 [5] "cov.coefficients" "pivot"            "method"           "design"
 [9] "genes"            "Amean"            "df.prior"         "s2.prior"
[13] "var.prior"        "proportion"       "s2.post"          "t"
[17] "p.value"          "lods"             "F"                "F.p.value"
$class
[1] "MArrayLM"
attr(,"package")
[1] "limma"
> EBayesFDR<-p.adjust(EBayesLoessLimmaFit$p.value,method="fdr")

Homework: Compare "coefficients" for EBayesLoessLimmaFit and LoessLimmaFit
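The key quantity `eBayes()` adds is the posterior variance `s2.post`: a degrees-of-freedom-weighted average of each gene's variance and a prior variance estimated from all genes (Smyth's moderated t). The numbers below are invented for illustration, not taken from the Nickel data.

```r
# Sketch of the variance shrinkage behind eBayes() (base R only).
d  <- 4           # residual df per gene (df.residual); assumed value
d0 <- 3.2         # prior df (df.prior); assumed value
s2.prior <- 0.05  # prior variance (s2.prior); assumed value
s2 <- c(0.01, 0.05, 0.40)  # three genes: small, typical, large variance

# Posterior (moderated) variance: weighted average of prior and gene-wise.
s2.post <- (d0 * s2.prior + d * s2) / (d0 + d)
s2.post
```

Each moderated variance lies between the gene-wise value and the prior, so genes with extreme sample variances are pulled toward the common value; this is what stabilizes the moderated t-statistics.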

Empirical Bayes Estimation

Pretending That Data is Single Channel Again

> LinearModelData<-cbind(NLNBLimmaDataNickel,NLNBLimmaDataNickel)
> dim(LinearModelData)
> LinearModelData$M[1,]
[named vector of six M-values for the first gene, columns Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2, Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3, Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1 duplicated; numeric values not recovered]
> LinearModelData$M<-cbind((NLNBLimmaDataNickel$A+NLNBLimmaDataNickel$M/2),NLNBLimmaDataNickel$A-(NLNBLimmaDataNickel$M/2))
> LinearModelData$M[1,]
[the same six columns, now holding the reconstructed single-channel intensities; numeric values not recovered]

This "splits" the log-ratios back into the two channels. We could use the original R and G data, but we want to keep the benefit of the loess normalization.
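The "split" above is exact algebra: since M = R - G and A = (R + G)/2 (on the log scale), the channels are recovered as R = A + M/2 and G = A - M/2. A round-trip check on simulated intensities:

```r
# Round-trip check of the M/A <-> two-channel identities (base R;
# simulated log2 intensities stand in for the Nickel arrays).
set.seed(2)
R <- matrix(rnorm(12, mean = 10), nrow = 4)  # log2 red-channel intensities
G <- matrix(rnorm(12, mean = 10), nrow = 4)  # log2 green-channel intensities

M <- R - G        # log-ratio
A <- (R + G) / 2  # average log-intensity

# the "split" performed on the slide:
R2 <- A + M / 2
G2 <- A - M / 2

stopifnot(isTRUE(all.equal(R, R2)), isTRUE(all.equal(G, G2)))
```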

Simple Linear Model

Statistical Model: x_ij = Trt_j + e_ij

Trt_j = "effect due to treatment j" (Ctl for j=1, Nic for j=2); x_ij = log-intensity for replicate i under treatment j, where i=1,2,3 and j=1,2.

Parameters are estimated in the general linear model framework; for this simple design the estimate of each Trt_j turns out to be just the mean of the replicates in treatment j. We are interested in whether Trt_2 - Trt_1 = 0.
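The claim that the cell-means estimates are just the treatment means can be verified with `lm()` on made-up data for a single gene:

```r
# Toy check: under the no-intercept (cell-means) model, the fitted
# parameters equal the per-treatment means (hypothetical numbers).
x   <- c(5.1, 4.9, 5.3,  6.2, 6.0, 6.4)  # log-intensities, one gene
trt <- factor(rep(c("Ctl", "Nic"), each = 3))

fit <- lm(x ~ 0 + trt)  # no intercept: one parameter per treatment
coef(fit)

stopifnot(isTRUE(all.equal(unname(coef(fit)),
                           c(mean(x[1:3]), mean(x[4:6])))))
```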

Describing a Linear Model to Limma

Statistical Model: x_ij = Trt_j + e_ij

Design Matrix:
> designMatrix<-cbind(Ctl=c(1,0,0,0,1,1),Nic=c(0,1,1,1,0,0))
> designMatrix
     Ctl Nic
[1,]   1   0
[2,]   0   1
[3,]   0   1
[4,]   0   1
[5,]   1   0
[6,]   1   0
> contrast<-c(-1,1)
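The hand-built design matrix above can equivalently be produced with `model.matrix()` from a factor of treatment labels; the label order below is an assumption read off the rows of the slide's matrix (arrays 1, 5, 6 are Ctl).

```r
# Rebuild the slide's design matrix from treatment labels (base R).
trt <- factor(c("Ctl", "Nic", "Nic", "Nic", "Ctl", "Ctl"))
designMatrix <- cbind(Ctl = c(1,0,0,0,1,1), Nic = c(0,1,1,1,0,0))

auto <- model.matrix(~ 0 + trt)  # no-intercept: one indicator column per level
colnames(auto) <- levels(trt)

stopifnot(all(auto == designMatrix))
```

Either matrix can be handed to `lmFit()`; `model.matrix()` simply scales better when there are many arrays.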

Describing a Linear Model to Limma

> FitLM<-lmFit(LinearModelData,designMatrix)
> FitContrast<-contrasts.fit(FitLM,contrast)
> attributes(FitContrast)
$names
[1] "coefficients"     "stdev.unscaled"   "sigma"            "df.residual"      "cov.coefficients" "method"           "design"
[8] "genes"            "Amean"            "contrasts"
$class
[1] "MArrayLM"
attr(,"package")
[1] "limma"
> ContrastTstat<-abs(FitContrast$coefficients)/(FitContrast$sigma*FitContrast$stdev.unscaled)
> ContrastPvalue<-2*pt(ContrastTstat,FitContrast$df.residual,lower.tail=FALSE)
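For a single gene, the contrast t-statistic computed above (|coefficient| divided by sigma times the unscaled standard deviation) is exactly the equal-variance two-sample t-statistic. A check with `lm()` and `t.test()` on toy data:

```r
# Sanity check: the linear-model contrast t equals the pooled two-sample t
# (toy numbers for one gene, base R only).
x   <- c(5.1, 4.9, 5.3, 6.2, 6.0, 6.4)
trt <- factor(rep(c("Ctl", "Nic"), each = 3))

fit   <- lm(x ~ trt)  # coefficient "trtNic" estimates Nic - Ctl
tstat <- abs(coef(summary(fit))["trtNic", "t value"])

tt <- t.test(x[4:6], x[1:3], var.equal = TRUE)  # pooled-variance t-test
stopifnot(isTRUE(all.equal(tstat, abs(unname(tt$statistic)))))
```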

Alternative Formulations of the Two-way ANOVA

In the no-intercept (cell-means) model the parameter estimates are simply the treatment means; once an intercept or additional factors are included, interpreting the individual estimates gets more complicated. Regardless of how the model is parametrized, however, certain quantities remain unchanged, in particular the difference Trt_2 - Trt_1. In this sense all formulations are equivalent: they lead to the same null hypotheses and the same null distributions.
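The equivalence can be demonstrated directly: fitting the same toy data with and without an intercept changes the individual coefficients but not the estimated Trt_2 - Trt_1.

```r
# Two parameterizations, one answer (toy data, base R only).
x   <- c(5.1, 4.9, 5.3, 6.2, 6.0, 6.4)
trt <- factor(rep(c("Trt1", "Trt2"), each = 3))

fit.cell <- lm(x ~ 0 + trt)  # cell-means: one parameter per treatment
fit.ref  <- lm(x ~ trt)      # reference coding: intercept + difference

diff.cell <- unname(coef(fit.cell)["trtTrt2"] - coef(fit.cell)["trtTrt1"])
diff.ref  <- unname(coef(fit.ref)["trtTrt2"])  # already the difference

stopifnot(isTRUE(all.equal(diff.cell, diff.ref)))
```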