1 Example Analysis of a Two-Color Array Experiment Using LIMMA 3/30/2011 Copyright © 2011 Dan Nettleton.

Slides:



Advertisements
Similar presentations
Linear Models for Microarray Data
Advertisements

Multi-way Anova Identifying and quantifying sources of variation Ability to "factor out" certain sources - ("adjusting") For the beginning, we.
1 Parametric Empirical Bayes Methods for Microarrays 3/7/2011 Copyright © 2011 Dan Nettleton.
Limma: Linear Models for Microarray Data R user group 21 June 2005 Judith Boer.
Dahlia Nielsen North Carolina State University Bioinformatics Research Center.
LimmaGUI A Point-and-Click Interface for cDNA Microarray Analysis James Wettenhall and Gordon Smyth Division of Genetics and Bioinformatics Walter and.
Microarray technology and analysis of gene expression data Hillevi Lindroos.
OHRI Bioinformatics Introduction to the Significance Analysis of Microarrays application Stem.
Sandrine Dudoit1 Microarray Experimental Design and Analysis Sandrine Dudoit jointly with Yee Hwa Yang Division of Biostatistics, UC Berkeley
Getting the numbers comparable
DNA Microarray Bioinformatics - #27612 Normalization and Statistical Analysis.
Preprocessing Methods for Two-Color Microarray Data
Microarray Data Preprocessing and Clustering Analysis
Differentially expressed genes
1 Test of significance for small samples Javier Cabrera.
1 Normalization Methods for Two-Color Microarray Data 1/13/2009 Copyright © 2009 Dan Nettleton.
The following slides have been adapted from to be presented at the Follow-up course on Microarray Data Analysis.
Multiple testing in high- throughput biology Petter Mostad.
DNA microarray technology allows an individual to rapidly and quantitatively measure the expression levels of thousands of genes in a biological sample.
CDNA Microarrays MB206.
Panu Somervuo, March 19, cDNA microarrays.
Randomization issues Two-sample t-test vs paired t-test I made a mistake in creating the dataset, so previous analyses will not be comparable.
1 Two Color Microarrays EPP 245/298 Statistical Analysis of Laboratory Data.
Applying statistical tests to microarray data. Introduction to filtering Recall- Filtering is the process of deciding which genes in a microarray experiment.
Differential Expression II Adding power by modeling all the genes Oct 06.
We calculated a t-test for 30,000 genes at once How do we handle results, present data and results Normalization of the data as a mean of removing.
Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.
A A R H U S U N I V E R S I T E T Faculty of Agricultural Sciences Introduction to analysis of microarray data David Edwards.
Maximum Likelihood - "Frequentist" inference x 1,x 2,....,x n ~ iid N( ,  2 ) Joint pdf for the whole random sample Maximum likelihood estimates.
1 Identifying differentially expressed genes from RNA-seq data Many recent algorithms for calling differentially expressed genes: edgeR: Empirical analysis.
Model-based analysis of oligonucleotide arrays, dChip software Statistics and Genomics – Lecture 4 Department of Biostatistics Harvard School of Public.
Introduction to Statistical Analysis of Gene Expression Data Feng Hong Beespace meeting April 20, 2005.
1 Example Analysis of an Affymetrix Dataset Using AFFY and LIMMA 4/4/2011 Copyright © 2011 Dan Nettleton.
Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments Presented by Nan Lin 13 October 2002.
Microarray Data Pre-Processing
1 Introduction to Mixed Linear Models in Microarray Experiments 2/1/2011 Copyright © 2011 Dan Nettleton.
1 ArrayTrack Demonstration National Center for Toxicological Research U.S. Food and Drug Administration 3900 NCTR Road, Jefferson, AR
Suppose we have T genes which we measured under two experimental conditions (Ctl and Nic) in n replicated experiments t i * and p i are the t-statistic.
Microarray Technology. Introduction Introduction –Microarrays are extremely powerful ways to analyze gene expression. –Using a microarray, it is possible.
Statistical Analysis of Microarray Data By H. Bjørn Nielsen.
A Quantitative Overview to Gene Expression Profiling in Animal Genetics Armidale Animal Breeding Summer Course, UNE, Feb GP3xCLI GenePix Post-Processing.
Comp. Genomics Recitation 10 4/7/09 Differential expression detection.
1 Estimation of Gene-Specific Variance 2/17/2011 Copyright © 2011 Dan Nettleton.
SPH 247 Statistical Analysis of Laboratory Data 1 May 5, 2015 SPH 247 Statistical Analysis of Laboratory Data.
Empirical Bayes Analysis of Variance Component Models for Microarray Data S. Feng, 1 R.Wolfinger, 2 T.Chu, 2 G.Gibson, 3 L.McGraw 4 1. Department of Statistics,
Henrik Bengtsson Mathematical Statistics Centre for Mathematical Sciences Lund University Plate Effects in cDNA Microarray Data.
Preprocessing Bioconductor. Overview Data set Salmo, color flip design Name Cy5 Cy3 FileName array1 self1 self txt array2.
Variability & Statistical Analysis of Microarray Data GCAT – Georgetown July 2004 Jo Hardin Pomona College
The microarray data analysis Ana Deckmann Carla Judice Jorge Lepikson Jorge Mondego Leandra Scarpari Marcelo Falsarella Carazzolle Michelle Servais Tais.
Distinguishing active from non active genes: Main principle: DNA hybridization -DNA hybridizes due to base pairing using H-bonds -A/T and C/G and A/U possible.
Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009.
> FDRpvalue FDRTPvalue plot(LimmaFit$coefficients[complete.data],-log(FDRTPvalue[complete.data],base=10),type="p",main="Two-Sample.
Microarray Data Analysis Xuming He Department of Statistics University of Illinois at Urbana-Champaign.
Micro array Data Analysis. Differential Gene Expression Analysis The Experiment Micro-array experiment measures gene expression in Rats (>5000 genes).
基于 R/Bioconductor 进行生物芯片数据分析 曹宗富 博奥生物有限公司
Lab 5 Unsupervised and supervised clustering Feb 22 th 2012 Daniel Fernandez Alejandro Quiroz.
limma Data to import:
Estimation of Gene-Specific Variance
Mixture Modeling of the p-value Distribution
Differential Gene Expression
Copyright © 2007 Dan Nettleton
Normalization Methods for Two-Color Microarray Data
Uniform-Beta Mixture Modeling of the p-value Distribution
Significance Analysis of Microarrays (SAM)
Elizabeth Garrett Giovanni Parmigiani
Significance Analysis of Microarrays (SAM)
Parametric Empirical Bayes Methods for Microarrays
Taming Human Genetic Variability: Transcriptomic Meta-Analysis Guides the Experimental Design and Interpretation of iPSC-Based Disease Modeling  Pierre-Luc.
Analysis of Microarray Data Using Z Score Transformation
Presentation transcript:

1 Example Analysis of a Two-Color Array Experiment Using LIMMA 3/30/2011 Copyright © 2011 Dan Nettleton

2 Example Dataset Two-color microarray experiment described in Brooks, L., Strable, J., Zhang, X., Ohtsu, K., Zhou, R., Sarkar, A., Hargreaves, S., Eudy, D., Pawlowska, T., Ware, D., Janick-Buckner, D., Buckner, B., Timmermans, M.C.P., Schnable, P.S., Nettleton, D., Scanlon, M.J. (2009). Microdissection of shoot meristem functional domains. PLoS Genetics. 5(5): e

3 Comparison of Two Maize Cell Types (p vs. s) s p s p s p s p s p s p

4 ###### # #Load Limma package. (Must first be installed.) # ###### library(limma) ###### # #Examine the limma users guide. # ###### limmaUsersGuide()

5 ###### # #Set the working directory to the directory containing #the raw data files. The files are available at # #You will need to download them to your hard drive #and change the path accordingly. # ###### setwd("C:\\z\\sam3.0")

6 ###### # #Read the Targets.txt file that contains information #about the slides and what was hybridized to each #channel. # ###### targets=readTargets("Targets.txt") targets SlideNumber FileName Cy3 Cy5 1 1 sam31.gpr s p 2 2 sam32.gpr s p 3 3 sam33.gpr s p 4 4 sam34.gpr p s 5 5 sam35.gpr p s 6 6 sam36.gpr p s

7 ###### # #Read the data files into R and examine content. # ###### RG=read.maimages(targets,source="genepix") attributes(RG) $class [1] "RGList" attr(,"package") [1] "limma" $names [1] "R" "G" "Rb" "Gb" "targets" "genes" "source" [8] "printer"

8 head(RG$R) sam31 sam32 sam33 sam34 sam35 sam36 [1,] [2,] [3,] [4,] [5,] [6,] head(RG$Rb) sam31 sam32 sam33 sam34 sam35 sam36 [1,] [2,] [3,] [4,] [5,] [6,]

9 ###### # #Examine boxplots of red and green backgrounds. # ###### boxplot(data.frame(cbind(log2(RG$Gb),log2(RG$Rb))), main="Background", col=rep(c("green","red"),each=6))

10

11 ###### # #For slide 6, examine an image of the green background #and green signal corresponding to the slide layout. # ###### imageplot(log2(RG$Gb[,6]),RG$printer) x11() imageplot(log2(RG$G[,6]),RG$printer)

12

13

14 ###### # #Plot signal vs. background for the green channel of #slide 6. # ###### plot(log2(RG$Gb[,6]),log2(RG$G[,6])) lines(c(-9,99),c(-9,99),col=2)

15

16 ###### # #Perform background correction for all slides. # ###### RG=backgroundCorrect(RG,method="normexp") attributes(RG) $names [1] "R" "G" "targets" "genes" "source" "printer" $class [1] "RGList" attr(,"package") [1] "limma"

17 head(RG$R) sam31 sam32 sam33 sam34 sam35 sam36 [1,] [2,] [3,] [4,] [5,] [6,] ###### # #Examine side-by-side boxplots of #background-corrected data. # ###### boxplot(log2(data.frame( RG$R,RG$G))[,as.vector(rbind(1:6,7:12))], col=rep(c("red","green"),6))

18

19 ###### # #Examine background corrected signal densities #for each channel. # ###### plotDensities(RG)

20

21 ###### # #Examine plots of the difference vs. the average #log background corrected signal. # ###### plotMA(RG) plotMA(RG[,2]) plotMA(RG[,3]) plotMA(RG[,4]) plotMA(RG[,5]) plotMA(RG[,6])

22

23

24

25

26

27

28 Points that fall along a diagonal line have same value of log R for different values of log G.

29 log R – log G vs. (log R + log G)/2 becomes c – log G vs. (c + log G)/2 for some constant c.

30 Multiple spots have the same value of log R after background correction due to ties in the original R-Rb values.

31 ###### # #Loess normalize and median center the data. #Examine the resulting Red-Green differences. # ###### MA=normalizeWithinArrays(RG,method="loess") MA=normalizeWithinArrays(MA,method="median") attributes(MA) $class [1] "MAList" attr(,"package") [1] "limma" $names [1] "targets" "genes" "source" "printer" "M" "A"

32 head(MA$M) sam31 sam32 sam33 sam34 sam35 sam36 [1,] [2,] [3,] [4,] [5,] [6,] head(MA$A) sam31 sam32 sam33 sam34 sam35 sam36 [1,] [2,] [3,] [4,] [5,] [6,]

33 plotMA(MA[,6]) plotDensities(MA) boxplot(data.frame(MA$M))

34

35

36

37 ###### # #Scale normalize the differences. # ###### MA=normalizeBetweenArrays(MA,method="scale") boxplot(data.frame(MA$M))

38

39 #Create a design matrix for a gene-specific vector of #differences as the response. targets SlideNumber FileName Cy3 Cy5 1 1 sam31.gpr s p 2 2 sam32.gpr s p 3 3 sam33.gpr s p 4 4 sam34.gpr p s 5 5 sam35.gpr p s 6 6 sam36.gpr p s design=cbind(rep(1,6),rep(c(1,-1),each=3)) colnames(design)=c("Cy5minusCy3","PminusS") design Cy5minusCy3 PminusS [1,] 1 1 [2,] 1 1 [3,] 1 1 [4,] 1 -1 [5,] 1 -1 [6,] 1 -1

40 #Use the limma function lmFit to fit the model. fit=lmFit(MA, design) #Set up contrasts and compute p-values #using the limma approach. contrast.matrix=makeContrasts(PminusS, levels=design) fit2=contrasts.fit(fit, contrast.matrix) efit=eBayes(fit2) attributes(efit) $names [1] "coefficients" "rank" "assign" "qr" [5] "df.residual" "sigma" "cov.coefficients" "stdev.unscaled" [9] "genes" "Amean" "method" "design" [13] "contrasts" "df.prior" "s2.prior" "var.prior" [17] "proportion" "s2.post" "t" "p.value" [21] "lods" "F" "F.p.value" $class [1] "MArrayLM" attr(,"package") [1] "limma"

41 #Examine estimates of the prior df and variance. efit$df.prior [1] efit$s2.prior [1] #Compare variance estimates before and after shrinkage. plot(log(efit$sigma),log(sqrt(efit$s2.post)), xlab="Log Sqrt Original Variance Estimate", ylab="Log Sqrt Empirical Bayes Variance Estimate") lines(c(-99,99),c(-99,99),col=2) lsrs20=log(sqrt(efit$s2.prior)) abline(h=lsrs20,col=4) abline(v=lsrs20,col=4)

42

43 hist(efit$p.value,col=4) box()

44 #Get log base 2 fold change estimates log2fc=efit$coefficients #Create a volcano plot. p=efit$p.value[,1] plot(log2fc,-log(p,base=10), xlab=expression(paste(log[2]," fold change")), ylab=expression(paste(-log[10]," p-value")), cex.lab=1.7)

45

46 #Load some functions for multiple testing. source(" ~dnett/microarray/multtest.txt") #Estimate the number of EE (m0) and DE (m1) genes. m=length(p) m0=estimate.m0(p) m1=m-m0 m [1] m0 [1] 6084 m1 [1] 6716

47 #Convert p-values to q-values. qvals=jabes.q(p) #Find number of genes declared to be DE #for various FDR thresholds. fdrcuts=c(0.001,0.01,0.05) cbind(fdrcuts,NumberOfGenes= apply(outer(qvals,fdrcuts,"<="),2,sum)) fdrcuts NumberOfGenes [1,] [2,] [3,]

48 #Examine position of genes declared to be DE at FDR=0.01 #in MA plot for slide 6. plot(MA$A[,6],MA$M[,6], col=1+(qvals<=0.01), pch=3+14*(qvals<=0.01),cex=.6)

49

50 These points have small values of log R and relatively big values of log G, but these differences are not replicated on other slides.

51 #Fit uniform-beta mixture model. out=ub.mix(p) out [1] plot.ub.mix(p,out)

52

53 #Change starting values and refit #uniform-beta mixture model. out=ub.mix(p,c(.5,1,10)) out [1] plot.ub.mix(p,out)

54

55 #Compute estimated ppde for each gene. eppde=ppde(p,out) plot(p,eppde,xlab="p-value", ylab=“Estimated Posterior Probability of DE", cex.lab=1.5)

56

57 ppdecuts=seq(.6,.95,by=0.05) cbind(ppdecuts,NumberOfGenes= apply(outer(eppde,ppdecuts,">="),2,sum)) ppdecuts NumberOfGenes [1,] [2,] [3,] [4,] [5,] [6,] [7,] [8,]