Outline
Randomization issues
Two-sample t-test vs paired t-test
I made a mistake in creating the dataset, so previous analyses will not be comparable with new ones
Issues of the background subtraction
Limma as the general tool for analyzing microarray data
limma... is a package for the analysis of microarray data, especially the use of linear models for analyzing designed experiments and the assessment of differential expression.
Specially constructed data objects to represent various aspects of microarray data
Specially constructed "object methods" for importing, normalizing, displaying and analyzing microarray data
All objects and methods are transparent
All objects can be accessed and modified outside of limma
Unique in the implementation of the empirical Bayes procedure for identifying differentially expressed genes by "borrowing" information from different genes (everything so far has been gene by gene)
Measurement Error Model With Additive Background
There are other models for accounting for the background signal
Simple subtraction of the background intensities often introduces additional variability in the observed signal
The problem is that we use a single-observation estimate for B
With this in mind, various strategies have been proposed to pool background information from more than one spot to estimate B
[Figure: Foreground (F) and Background (B); Old Model vs. New Model]
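The point about a single-observation estimate of B adding variability can be checked with a small simulation. This is only a sketch on simulated data: the signal level, the noise levels, and the choice of pooling eight background readings per spot are illustrative assumptions, not values from the experiment.

```r
# Simulated illustration (all numbers are assumptions, not real array data).
set.seed(1)
n_spots     <- 10000
true_signal <- 500   # assumed true signal above background
true_bg     <- 100   # assumed true background level
sd_fg       <- 30    # assumed noise in a foreground measurement
sd_bg       <- 30    # assumed noise in a single background measurement

fg <- true_signal + true_bg + rnorm(n_spots, sd = sd_fg)

# Single-observation background estimate: one noisy value per spot
bg_single <- true_bg + rnorm(n_spots, sd = sd_bg)

# Pooled background estimate: mean of 8 noisy values per spot
# (standing in for background readings pooled from several spots)
bg_pooled <- rowMeans(matrix(true_bg + rnorm(n_spots * 8, sd = sd_bg), ncol = 8))

var_raw    <- var(fg)              # roughly sd_fg^2
var_single <- var(fg - bg_single)  # roughly sd_fg^2 + sd_bg^2
var_pooled <- var(fg - bg_pooled)  # roughly sd_fg^2 + sd_bg^2 / 8

round(c(raw = var_raw, single = var_single, pooled = var_pooled))
```

Subtracting a pooled estimate still adds some variance, but much less than subtracting one noisy background value per spot.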
limma
Data to import:
File descriptions:
Spot descriptions:
Importing data: source("
limma
library(limma)
data.directory<-"
targets<-readTargets("
spottypes<-readSpotTypes("
LimmadataC<-read.maimages(files=targets$FileName, source="genepix", path=data.directory,
    columns=list(Gf="F532 Median", Gb="B532 Median", Rf="F635 Median", Rb="B635 Median"),
    annotation=c("Name","ID","Block","Row","Column"), wt.fun=wtflags(0))
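The `wt.fun=wtflags(0)` argument turns the GenePix `Flags` column into per-spot quality weights. The sketch below mimics that rule in base R under the assumption that spots with a negative flag receive the given weight and all other spots receive weight 1. `flag_weights` and the flag values are hypothetical stand-ins, not limma's own code.

```r
# Hypothetical stand-in for the weighting rule; not limma's implementation.
flag_weights <- function(flags, weight = 0, cutoff = 0) {
  flagged <- flags < cutoff      # assumed: negative flags mark bad spots
  ifelse(flagged, weight, 1)
}

# Made-up flag values for five spots
flags <- c(0, -50, 0, -100, 0)
w <- flag_weights(flags)
w
```

Spots with weight 0 are the "bad" spots that later steps leave out.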
RGList class
> attributes(LimmadataC)
$names
[1] "R" "G" "Rb" "Gb" "weights" "targets" "genes"
$class
[1] "RGList"
attr(,"package")
[1] "limma"
RGList class
> LimmadataC$genes[1,]
  Name ID Block Row Column
1 no name Rn
> LimmadataC$R[1:3,]
     51-C1-3-vs-W W2-3-vs-C C3-3-vs-W W4-3-vs-C C5-3-vs-W W6-3-vs-C6-5
[1,]
[2,]
[3,]
> LimmadataC$Rb[1:3,]
     51-C1-3-vs-W W2-3-vs-C C3-3-vs-W W4-3-vs-C C5-3-vs-W W6-3-vs-C6-5
[1,]
[2,]
[3,]
RGList class
LimmadataC$genes$Status<-controlStatus(spottypes,LimmadataC)
LimmadataC$weights[LimmadataC$genes$ID=="Blank"]<-0
LimmadataC$printer<-getLayout(LimmadataC$genes)
> attributes(LimmadataC)
$names
[1] "R" "G" "Rb" "Gb" "weights" "targets" "genes" "printer"
$class
[1] "RGList"
attr(,"package")
[1] "limma"
> LimmadataC$genes[1,]
  Name ID Block Row Column Status
1 no name Rn cDNA
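The assignment `LimmadataC$weights[LimmadataC$genes$ID=="Blank"]<-0` indexes a spots-by-arrays matrix with a logical vector that has one entry per spot. R recycles that vector down each column, so the blank spots are zeroed on every array. The toy example below (made-up IDs and a 4 x 3 weight matrix, both assumptions) shows that this matches the more explicit row-indexing form.

```r
ids <- c("Rn1", "Blank", "Rn2", "Blank")     # made-up spot IDs
w_recycled <- matrix(1, nrow = 4, ncol = 3)  # 4 spots x 3 arrays
w_rows     <- w_recycled

# Form used on the slide: logical vector recycled over the whole matrix
w_recycled[ids == "Blank"] <- 0

# Explicit form: select the blank rows, all columns
w_rows[ids == "Blank", ] <- 0

identical(w_recycled, w_rows)
```

The recycled form relies on the logical vector having exactly one entry per row, so the explicit form may be easier to read.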
Plotting data in an RGList object
> plotMA(LimmadataC,array=1,xlim=c(-1,16),ylim=c(-3,8))
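For reference, an MA plot shows the log-ratio M = log2(R) - log2(G) against the average log-intensity A = (log2(R) + log2(G))/2, computed here from background-subtracted intensities. The intensities below are made up for illustration.

```r
# Made-up foreground/background intensities for four spots on one array
R  <- c(1200, 300, 5000, 400)   # red foreground
Rb <- c( 100, 120,  110, 160)   # red background
G  <- c( 600, 310, 1300, 200)   # green foreground
Gb <- c(  90, 100,  100, 110)   # green background

r <- R - Rb                     # background-subtracted red
g <- G - Gb                     # background-subtracted green

M <- log2(r) - log2(g)          # log-ratio
A <- (log2(r) + log2(g)) / 2    # average log-intensity

cbind(A = A, M = M)
```

Calling `plot(A, M)` on these vectors draws the same kind of scatter for this toy array.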
limma
plotMA automatically subtracts the background intensities before plotting data
It does not plot data with weight 0
If you want to plot all data, or the data without subtracting background, you need to do a little work
source("
limma
> NBLimmadataC<-backgroundCorrect(LimmadataC,method="none")
> attributes(NBLimmadataC)
$names
[1] "R" "G" "weights" "targets" "genes" "printer"
$class
[1] "RGList"
attr(,"package")
[1] "limma"
Note that the background measurements are gone
Scatter with and without background subtraction
Background-subtracted data are more spread out
More data points are retained without background subtraction
Plotting all data points
Want to plot data points with weight 0 as well
Create datasets with and without subtracting background and set all weights to 1
SpotsPerArray<-dim(LimmadataC$R)[1]
Narrays<-dim(LimmadataC$R)[2]
Limmadata<-LimmadataC
Limmadata$weights[1:SpotsPerArray,1:Narrays]<-1
NBLimmadata<-NBLimmadataC
NBLimmadata$weights[1:SpotsPerArray,1:Narrays]<-1
Plotting all data points
[Figure: MA plots. Panels: Background Subtracted, Raw; All data, Zero-weight data removed]
Which one to use?
Removing points with weight zero seems reasonable
Subtracting background costs us some data points even when one channel is above background, because only differences of log-transformed measurements are used and the logarithm is undefined whenever either background-subtracted channel is not positive
Subtracting background seems to increase the variability, but it is unclear how this would affect the results
For now proceed without background subtraction, but compare results at the end
Exploring other proposed background-adjustment methods also seems like a good idea
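The loss of data points under background subtraction can be seen on a toy example. The intensities are made up: in spot 3 the red foreground is below its background while the green channel is well above background, and the spot is still lost.

```r
# Made-up intensities; spot 3 has red below background, green well above it
R  <- c(1200, 300, 140, 900)
Rb <- c( 100, 120, 160, 110)
G  <- c( 600, 310, 800, 450)
Gb <- c(  90, 100, 100, 100)

# A log-ratio exists only if both channels are positive
usable <- function(r, g) r > 0 & g > 0

n_raw <- sum(usable(R, G))             # spots usable without subtraction
n_sub <- sum(usable(R - Rb, G - Gb))   # spots usable after subtraction

keep  <- usable(R - Rb, G - Gb)
M_sub <- rep(NA_real_, length(R))
M_sub[keep] <- log2((R - Rb)[keep] / (G - Gb)[keep])

c(raw = n_raw, subtracted = n_sub)
```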
Data Analysis
Loess normalization
source("eh3.uc.edu/LimmaLoess.R")
> NNBLimmadataC<-normalizeWithinArrays(NBLimmadataC, method="loess")
> attributes(NNBLimmadataC)
$names
[1] "weights" "targets" "genes" "printer" "M" "A"
$class
[1] "MAList"
attr(,"package")
[1] "limma"
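The idea behind loess normalization is to estimate the intensity-dependent trend of M as a function of A on each array and subtract it. The sketch below does this with base R's `loess` on simulated data. The simulated dye bias, the span of 0.3, and the robust `family="symmetric"` setting are assumptions chosen for illustration, not a description of what `normalizeWithinArrays` does internally.

```r
# Simulated one-array example (all values are assumptions)
set.seed(2)
n    <- 2000
A    <- runif(n, 6, 14)              # average log-intensity
bias <- 0.6 * exp(-(A - 6) / 3)      # assumed intensity-dependent dye bias
M    <- bias + rnorm(n, sd = 0.3)    # log-ratios with no true differences

# Fit a smooth trend of M on A and subtract it
fit    <- loess(M ~ A, span = 0.3, degree = 1, family = "symmetric")
M_norm <- M - fitted(fit)

c(mean_before = mean(M), mean_after = mean(M_norm))
```

After subtraction the log-ratios are centered near zero across the whole intensity range, which is what the normalized MA plot should show.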
Loess-normalized data
Paired t-test using limma
source("
> design<-modelMatrix(targets, ref="C")
Found unique target names:
 C W
> design
                 W
51-C1-3-vs-W
W2-3-vs-C
C3-3-vs-W
W4-3-vs-C
C5-3-vs-W
W6-3-vs-C
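For a dye-swap design with C as the reference, the single design column W is +1 on arrays where W is in the Cy5 (red) channel and -1 where it is in Cy3, because M = log2(Cy5/Cy3). The base-R sketch below builds such a column by hand. The `targets_demo` table is hypothetical, with the channel assignment inferred from the array names on the slide.

```r
# Hypothetical targets table for six dye-swapped arrays
targets_demo <- data.frame(
  Cy3 = c("C", "W", "C", "W", "C", "W"),
  Cy5 = c("W", "C", "W", "C", "W", "C"),
  stringsAsFactors = FALSE
)

# With M = log2(Cy5/Cy3) and C as the reference:
# +1 when W is in Cy5 (M estimates W - C), -1 when W is in Cy3 (M estimates C - W)
design_demo <- cbind(W = (targets_demo$Cy5 == "W") - (targets_demo$Cy3 == "W"))
design_demo
```

The resulting column is the sign vector c(1,-1,1,-1,1,-1) that the manual calculations multiply into the M values.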
Paired t-test using limma
> LimmaPTT<-lmFit(MA,design)
Error in .class1(object) : Object "MA" not found
> LimmaPTT<-lmFit(NNBLimmadataC,design)
>
> attributes(LimmaPTT)
$names
 [1] "coefficients" "stdev.unscaled" "sigma" "df.residual"
 [5] "cov.coefficients" "pivot" "method" "design"
 [9] "genes" "Amean"
$class
[1] "MArrayLM"
attr(,"package")
[1] "limma"
Paired t-test using limma
> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])
[1]
> var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])^0.5
[1]
> 1/(6^0.5)
[1]
> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])/((var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])^0.5)*(1/(6^0.5)))
[1]
>
> LimmaPTT$coefficients[2]
[1]
> LimmaPTT$stdev.unscaled[2]
[1]
> LimmaPTT$sigma[2]
[1]
> LimmaPTT$coefficients[2]/(LimmaPTT$sigma[2]*LimmaPTT$stdev.unscaled[2])
[1]
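The correspondence between the hand calculation and the fitted model can be reproduced in base R without limma. In the sketch below the log-ratios are simulated, and `lm` with no intercept stands in for the per-gene fit. The quantities `coef_hat`, `sigma_hat` and `stdev_unscaled` are named here by analogy with the `coefficients`, `sigma` and `stdev.unscaled` components.

```r
set.seed(3)
d <- c(1, -1, 1, -1, 1, -1)          # dye-swap signs, as on the slide
M <- d * 0.8 + rnorm(6, sd = 0.4)    # simulated log-ratios for one gene

# By hand: one-sample t-statistic on the sign-corrected log-ratios
y <- d * M
t_manual <- mean(y) / (sd(y) * (1 / sqrt(6)))

# Same quantity from a no-intercept linear model M ~ d
fit <- lm(M ~ 0 + d)
s   <- summary(fit)
coef_hat       <- unname(coef(fit))            # estimated log-ratio
sigma_hat      <- s$sigma                      # residual standard deviation
stdev_unscaled <- sqrt(s$cov.unscaled[1, 1])   # equals 1/sqrt(6) here
t_lm <- coef_hat / (sigma_hat * stdev_unscaled)

c(manual = t_manual, lm = t_lm, t.test = unname(t.test(y)$statistic))
```

All three numbers agree, which is the check the slide performs gene by gene.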
Paired t-test using limma
> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])
[1]
> var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])^0.5
[1]
> 1/(6^0.5)
[1]
> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])/((var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])^0.5)*(1/(6^0.5)))
[1]
>
> LimmaPTT$coefficients[1]
[1]
> LimmaPTT$stdev.unscaled[1]
[1] 0.5
> LimmaPTT$sigma[1]
[1]
> LimmaPTT$coefficients[1]/(LimmaPTT$sigma[1]*LimmaPTT$stdev.unscaled[1])
[1]
> NNBLimmadataC$weights[1,]
51-C1-3-vs-W W2-3-vs-C C3-3-vs-W W4-3-vs-C C5-3-vs-W W6-3-vs-C
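An unscaled standard deviation of 0.5 rather than 1/sqrt(6) is what one would expect if only four of the six arrays carry nonzero weight for this spot, since 1/sqrt(4) = 0.5. The sketch below illustrates this with a weighted no-intercept `lm` fit. Which two arrays have weight 0 is a hypothetical choice, and the log-ratios are simulated.

```r
d <- c(1, -1, 1, -1, 1, -1)        # dye-swap signs
w <- c(1, 1, 0, 1, 1, 0)           # hypothetical: two arrays flagged for this spot
set.seed(4)
M <- d * 0.5 + rnorm(6, sd = 0.3)  # simulated log-ratios

fit_w <- lm(M ~ 0 + d, weights = w)
stdev_unscaled <- sqrt(summary(fit_w)$cov.unscaled[1, 1])
df_resid       <- fit_w$df.residual   # usable arrays minus one coefficient

c(stdev_unscaled = stdev_unscaled, df_resid = df_resid)
```

This also explains why the hand calculation with 1/(6^0.5) does not match the model-based statistic for such a spot.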
Paired t-test using limma
> dfp<-LimmaPTT$df.residual>0
> LimmaPTT$LimmaTStat<-LimmaPTT$coefficients/(LimmaPTT$sigma*LimmaPTT$stdev.unscaled)
> LimmaPTT$LimmaTPvalue<-rep(NA,SpotsPerArray)
> LimmaPTT$LimmaTPvalue[dfp]<-2*pt(abs(LimmaPTT$LimmaTStat[dfp]),LimmaPTT$df.residual[dfp],lower.tail=FALSE)
> attributes(LimmaPTT)
$names
 [1] "coefficients" "stdev.unscaled" "sigma" "df.residual"
 [5] "cov.coefficients" "pivot" "method" "design"
 [9] "genes" "Amean" "LimmaTStat" "LimmaTPvalue"
$class
[1] "MArrayLM"
attr(,"package")
[1] "limma"
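A two-sided p-value needs the absolute value of the t-statistic: doubling the upper-tail probability of a negative statistic gives a value above 1. The short base-R check below uses made-up sign-corrected log-ratios with a negative mean and compares against `t.test`.

```r
# Made-up sign-corrected log-ratios for one gene (negative mean)
y  <- c(-0.9, -0.4, -0.7, -0.2, -0.8, -0.5)
df <- length(y) - 1
t_stat <- mean(y) / (sd(y) / sqrt(length(y)))

p_no_abs    <- 2 * pt(t_stat, df, lower.tail = FALSE)       # exceeds 1 when t < 0
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)  # valid two-sided p-value
p_reference <- t.test(y)$p.value

c(no_abs = p_no_abs, two_sided = p_two_sided, t.test = p_reference)
```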
limma so far
Facilitates easy data import and normalization
Keeps track of "bad" spots
To run the basic t-test, it takes a bit of additional work
If we were to use the empirical Bayes statistics as implemented in limma, it would be even easier
Empirical Bayes is generally BETTER than simple t-test
Will talk about this type of analysis next week
Limma also allows fitting models with multiple factors, which we will also talk about next week
Next time: multiple hypothesis testing and p-value adjustments