Two-Color Microarrays
EPP 245/298 Statistical Analysis of Laboratory Data
November 10, 2004, EPP 245 Statistical Analysis of Laboratory Data

Two-Color Arrays
Two-color arrays are designed to account for variability in slides and spots by using two samples on each slide, each labeled with a different dye.
If a spot is too large, for example, both signals will be too large, and the difference or ratio of the two signals will remove that source of variability.
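The cancellation argument can be illustrated with a small simulation. This is a minimal sketch with made-up numbers: the spot-size factor `spot`, the channel means, and the noise levels are all assumptions chosen for illustration, not values from the course data.

```r
set.seed(3)
n <- 1000
# Hypothetical multiplicative spot-size effect, shared by both channels of a spot
spot <- exp(rnorm(n, sd = 0.5))
# Simulated channel intensities: true red/green ratio is 2, plus channel-specific noise
red   <- 2000 * spot * exp(rnorm(n, sd = 0.1))
green <- 1000 * spot * exp(rnorm(n, sd = 0.1))
sd.single <- sd(log2(red))           # one channel alone, spot effect included
sd.ratio  <- sd(log2(red / green))   # log ratio, spot effect cancels
c(single = sd.single, ratio = sd.ratio)
```

Because `spot` multiplies both channels, it drops out of `red / green`, so the log ratio varies much less than either channel on its own.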
Dyes
The most common dye pair is Cy3 (green) and Cy5 (red), which absorb maximally near 550 nm and 649 nm and emit near 570 nm and 670 nm, respectively (green light is around 550 nm and red light around 700 nm).
The dyes are excited with lasers at 532 nm (Cy3, green) and 635 nm (Cy5, red).
The emissions are read through filters using a CCD device.
File Format
A slide scanned with Axon GenePix produces a results file with the extension .gpr.
In our example files, this file contains 29 rows of headers followed by 43 columns of data.
A full analysis may also need a .gal file, which describes the layout of the arrays.
Columns 1 to 17 (spot identification and the 635 nm channel):
    "Block" "Column" "Row" "Name" "ID" "X" "Y" "Dia."
    "F635 Median" "F635 Mean" "F635 SD" "B635 Median" "B635 Mean" "B635 SD"
    "% > B635+1SD" "% > B635+2SD" "F635 % Sat."
Columns 18 to 26 (the 532 nm channel):
    "F532 Median" "F532 Mean" "F532 SD" "B532 Median" "B532 Mean" "B532 SD"
    "% > B532+1SD" "% > B532+2SD" "F532 % Sat."
Columns 27 to 43 (ratios and derived quantities):
    "Ratio of Medians (635/532)" "Ratio of Means (635/532)"
    "Median of Ratios (635/532)" "Mean of Ratios (635/532)"
    "Ratios SD (635/532)" "Rgn Ratio (635/532)" "Rgn R² (635/532)"
    "F Pixels" "B Pixels" "Sum of Medians" "Sum of Means" "Log Ratio (635/532)"
    "F635 Median - B635" "F532 Median - B532" "F635 Mean - B635" "F532 Mean - B532"
    "Flags"
Analysis Choices
Mean or median foreground intensity
Background corrected or not
Log transform (base 2, e, or 10) or glog transform
The log transform is compatible only with no background correction, since background-corrected intensities can be zero or negative.
The glog transform works best with background correction.
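The difference between the two transforms can be sketched as follows. The function below uses one common form of the generalized log, glog(y) = log(y + sqrt(y^2 + lambda)); the name `glog`, the value of `lambda`, and the intensities in `bc` are assumptions made for this illustration, and in practice `lambda` would be estimated from the data.

```r
# One common form of the generalized log transform (lambda is a tuning parameter)
glog <- function(y, lambda) log(y + sqrt(y^2 + lambda))

# Hypothetical background-corrected intensities, including non-positive values
bc <- c(-50, 0, 50, 1000, 50000)

# The ordinary log fails for non-positive values (NaN and -Inf)
plain <- suppressWarnings(log(bc))

# The glog is finite and increasing over the whole range
g <- glog(bc, lambda = 100^2)

rbind(intensity = bc, log = plain, glog = g)
```

For large intensities the glog behaves like log(2y), so it differs from the ordinary log only by a constant there, while remaining defined near and below zero.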
    # Read each slide, skipping the 29 header rows, and keep ten columns.
    # File names are incomplete in the source (only the extension survives).
    d41 <- read.table(" gpr",header=TRUE,skip=29)
    d41 <- d41[,c(4,5,9,10,12,13,18,19,21,22)]
    d50 <- read.table(" gpr",header=TRUE,skip=29)
    d50 <- d50[,c(4,5,9,10,12,13,18,19,21,22)]
    d46 <- read.table(" gpr",header=TRUE,skip=29)
    d46 <- d46[,c(4,5,9,10,12,13,18,19,21,22)]
    d47 <- read.table(" gpr",header=TRUE,skip=29)
    d47 <- d47[,c(4,5,9,10,12,13,18,19,21,22)]
    d48 <- read.table(" gpr",header=TRUE,skip=29)
    d48 <- d48[,c(4,5,9,10,12,13,18,19,21,22)]
    d49 <- read.table(" gpr",header=TRUE,skip=29)
    d49 <- d49[,c(4,5,9,10,12,13,18,19,21,22)]
    d43 <- read.table(" gpr",header=TRUE,skip=29)
    d43 <- d43[,c(4,5,9,10,12,13,18,19,21,22)]
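To see which quantities the index vector c(4,5,9,10,12,13,18,19,21,22) selects, it can be matched against the 43 column names listed in the file-format slides. This is a small sketch; the vector names `gpr.cols` and `keep` are introduced here for illustration only.

```r
# The 43 column names of the .gpr data block, in the order listed on the slides
gpr.cols <- c("Block", "Column", "Row", "Name", "ID", "X", "Y", "Dia.",
  "F635 Median", "F635 Mean", "F635 SD", "B635 Median", "B635 Mean", "B635 SD",
  "% > B635+1SD", "% > B635+2SD", "F635 % Sat.",
  "F532 Median", "F532 Mean", "F532 SD", "B532 Median", "B532 Mean", "B532 SD",
  "% > B532+1SD", "% > B532+2SD", "F532 % Sat.",
  "Ratio of Medians (635/532)", "Ratio of Means (635/532)",
  "Median of Ratios (635/532)", "Mean of Ratios (635/532)",
  "Ratios SD (635/532)", "Rgn Ratio (635/532)", "Rgn R\u00b2 (635/532)",
  "F Pixels", "B Pixels", "Sum of Medians", "Sum of Means",
  "Log Ratio (635/532)", "F635 Median - B635", "F532 Median - B532",
  "F635 Mean - B635", "F532 Mean - B532", "Flags")

# Indices kept when reading each slide
keep <- c(4, 5, 9, 10, 12, 13, 18, 19, 21, 22)
kept <- gpr.cols[keep]
kept
```

After subsetting, columns 1 and 2 are the gene name and ID, columns 3 to 6 are the 635 nm foreground median, foreground mean, background median, and background mean, and columns 7 to 10 are the same four quantities for 532 nm.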
    dataprep <- function(method="median",bc=FALSE) {
      # Columns 3:6 and 7:10 of each slide hold, per channel:
      # foreground median, foreground mean, background median, background mean.
      # cvec picks the foreground summary and, if bc is TRUE, subtracts the background.
      if ((method=="median") && bc) cvec <- c(1,0,-1,0)
      if ((method=="mean") && bc) cvec <- c(0,1,0,-1)
      if ((method=="median") && !bc) cvec <- c(1,0,0,0)
      if ((method=="mean") && !bc) cvec <- c(0,1,0,0)
      d41a <- as.matrix(d41[,3:6]) %*% cvec    # 635 nm channel
      d41b <- as.matrix(d41[,7:10]) %*% cvec   # 532 nm channel
      d50a <- as.matrix(d50[,3:6]) %*% cvec
      d50b <- as.matrix(d50[,7:10]) %*% cvec
      d46a <- as.matrix(d46[,3:6]) %*% cvec
      d46b <- as.matrix(d46[,7:10]) %*% cvec
      # The slide shows only some of the per-slide lines; the remaining
      # objects used in cbind() below are built in the same way.
      # As printed on the slide, d45a and d45b are built from d43.
      d45a <- as.matrix(d43[,3:6]) %*% cvec
      d45b <- as.matrix(d43[,7:10]) %*% cvec
      alldata <- cbind(d41a,d41b,d50a,d50b,d46a,d46b,d47a,d47b,
                       d48a,d48b,d49a,d49b,d43a,d43b,d44a,d44b,
                       d42a,d42b,d43a,d43b)
      return(alldata)
    }
    alldata <- dataprep(method="median",bc=FALSE)
    rownames(alldata) <- d41[,1]
    # Columns alternate Cy5, Cy3 within each of the 10 slides
    dye <- as.factor(rep(c("Cy5","Cy3"),10))
    slide <- as.factor(rep(1:10,each=2))
    treat <- c(1,0,0,1,0,1,1,0,0,3,3,0,0,3,3,0,0,1,1,0)
    geneID <- d41[,1:2]
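As a sketch of how the dye and slide factors index the columns of a matrix laid out like alldata, the example below forms one log ratio per spot per slide. The matrix `toy` is a synthetic stand-in with 5 genes and 2 slides (an assumption for illustration, not the course data), and `M` is a name introduced here.

```r
set.seed(2)
# Synthetic stand-in for alldata: 5 genes x 4 columns (2 slides, Cy5 then Cy3)
toy <- matrix(runif(20, 100, 5000), nrow = 5)
dye <- as.factor(rep(c("Cy5", "Cy3"), 2))
slide <- as.factor(rep(1:2, each = 2))

# One log2(Cy5/Cy3) ratio per gene per slide
M <- log2(toy[, dye == "Cy5"]) - log2(toy[, dye == "Cy3"])
colnames(M) <- paste0("slide", levels(slide))
M
```

The same indexing would apply to the 20-column matrix, giving a 10-column matrix of log ratios.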
Array Normalization
Array normalization is meant to increase the precision of comparisons by adjusting for variations that affect entire arrays.
Without normalization, the analysis would still be valid, but possibly less sensitive.
However, a poor normalization method can be worse than none at all.
Possible Normalization Methods
We can equalize the mean or median intensity across arrays by adding or multiplying by a correction term.
We can use different normalizations at different intensity levels (intensity-based normalization), for example by lowess or quantiles.
We can normalize for other factors, such as print tips.
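Intensity-based normalization by lowess can be sketched as below. This is an illustration on simulated data, not the course's procedure: the log ratio `M`, the average log intensity `A`, the simulated bias, and the smoother span `f = 0.4` are all assumptions.

```r
set.seed(1)
n <- 500
# Simulated average log2 intensity per spot
A <- runif(n, 6, 14)
# Simulated log2 ratio with an intensity-dependent dye bias plus noise
M <- 0.5 + 0.1 * (A - 10) + rnorm(n, sd = 0.2)

# Smooth trend of M as a function of A
fit <- lowess(A, M, f = 0.4)
# Evaluate the trend at each spot's own intensity (lowess returns sorted x)
trend <- approx(fit$x, fit$y, xout = A)$y
# Remove the trend, so the correction differs by intensity level
M.norm <- M - trend

c(before = cor(A, M), after = cor(A, M.norm))
```

Unlike a single additive or multiplicative constant per array, the correction here changes with intensity, which is what distinguishes this family of methods.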
Example for Normalization
               Group 1             Group 2
            Array 1  Array 2    Array 3  Array 4
    Gene 1     1100      900        425      550
    Gene 2      110       95         85      110
    Gene 3       80       65         55       80
    > normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80),ncol=4)
    > normex
         [,1] [,2] [,3] [,4]
    [1,] 1100  900  425  550
    [2,]  110   95   85  110
    [3,]   80   65   55   80
    > group <- as.factor(c(1,1,2,2))
    > anova(lm(normex[1,] ~ group))
    Analysis of Variance Table
    Response: normex[1, ]
              Df Sum Sq Mean Sq F value Pr(>F)
    group                                      *
    Residuals
    Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
    > anova(lm(normex[2,] ~ group))
    Analysis of Variance Table
    Response: normex[2, ]
              Df Sum Sq Mean Sq F value Pr(>F)
    group
    Residuals
    > anova(lm(normex[3,] ~ group))
    Analysis of Variance Table
    Response: normex[3, ]
              Df Sum Sq Mean Sq F value Pr(>F)
    group
    Residuals
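The three ANOVAs on the unnormalized example can be rerun in one step to recover the p-values. This is a minimal sketch; the helper `group.p` and the vector `pvals` are names introduced here.

```r
normex <- matrix(c(1100,110,80, 900,95,65, 425,85,55, 550,110,80), ncol = 4)
group <- as.factor(c(1, 1, 2, 2))

# p-value of the group effect for one gene (one row of the matrix)
group.p <- function(y) anova(lm(y ~ group))[["Pr(>F)"]][1]

# One p-value per gene
pvals <- apply(normex, 1, group.p)
round(pvals, 4)
```

Without normalization, only gene 1 shows a group effect at the 0.05 level; genes 2 and 3 do not.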
Additive Normalization by Means
[Table: Genes 1 to 3 (rows) by Arrays 1 to 4 (columns); Arrays 1 and 2 are in Group 1, Arrays 3 and 4 in Group 2]
    > mn <- mean(cmn)
    > normex - rbind(cmn,cmn,cmn)+mn
        [,1] [,2] [,3] [,4]
    cmn
    cmn
    cmn
    > normex.1 <- normex - rbind(cmn,cmn,cmn)+mn
    > mn <- mean(cmn)
    > anova(lm(normex.1[1,] ~ group))
    Analysis of Variance Table
    Response: normex.1[1, ]
              Df Sum Sq Mean Sq F value Pr(>F)
    group                                      *
    Residuals
    > anova(lm(normex.1[2,] ~ group))
    Analysis of Variance Table
    Response: normex.1[2, ]
              Df Sum Sq Mean Sq F value Pr(>F)
    group                                      *
    Residuals
    > anova(lm(normex.1[3,] ~ group))
    Analysis of Variance Table
    Response: normex.1[3, ]
              Df Sum Sq Mean Sq F value Pr(>F)
    group                                      *
    Residuals
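The additive normalization can be reproduced as follows. The slides shown here do not define `cmn`; this sketch assumes it is the vector of column (array) means, which is consistent with the slide title. The helper `group.p` is a name introduced for illustration.

```r
normex <- matrix(c(1100,110,80, 900,95,65, 425,85,55, 550,110,80), ncol = 4)
group <- as.factor(c(1, 1, 2, 2))

cmn <- colMeans(normex)   # assumed definition: mean intensity of each array
mn <- mean(cmn)           # overall target mean

# Subtract each array's mean and add back the overall mean;
# equivalent to normex - rbind(cmn,cmn,cmn) + mn
normex.1 <- sweep(normex, 2, cmn) + mn

# p-value of the group effect for each gene after normalization
group.p <- function(y) anova(lm(y ~ group))[["Pr(>F)"]][1]
pvals.1 <- apply(normex.1, 1, group.p)
round(pvals.1, 4)
```

Under this assumption all three genes come out significant at the 0.05 level, and some normalized values are negative. Since genes 2 and 3 showed no effect before normalization, this looks like an instance of a poor normalization doing harm: the large gene 1 dominates the array means, and subtracting them shifts the small genes by far more than their own size.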
Multiplicative Normalization by Means
[Table: Genes 1 to 3 (rows) by Arrays 1 to 4 (columns); Arrays 1 and 2 are in Group 1, Arrays 3 and 4 in Group 2]
    > normex*mn/rbind(cmn,cmn,cmn)
        [,1] [,2] [,3] [,4]
    cmn
    cmn
    cmn
    > normex.2 <- normex*mn/rbind(cmn,cmn,cmn)
    > anova(lm(normex.2[1,] ~ group))
    Response: normex.2[1, ]
              Df Sum Sq Mean Sq F value Pr(>F)
    group                                      **
    Residuals
    > anova(lm(normex.2[2,] ~ group))
    Response: normex.2[2, ]
              Df Sum Sq Mean Sq F value Pr(>F)
    group                                      **
    Residuals
    > anova(lm(normex.2[3,] ~ group))
    Response: normex.2[3, ]
              Df Sum Sq Mean Sq F value Pr(>F)
    group                                      *
    Residuals
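A sketch of the multiplicative version, again assuming that `cmn` holds the column (array) means, since its definition is not shown. Here each array is rescaled by a factor rather than shifted by a constant; `group.p` is a helper name introduced for illustration.

```r
normex <- matrix(c(1100,110,80, 900,95,65, 425,85,55, 550,110,80), ncol = 4)
group <- as.factor(c(1, 1, 2, 2))

cmn <- colMeans(normex)   # assumed definition: mean intensity of each array
mn <- mean(cmn)           # overall target mean

# Scale each array so that its mean equals mn;
# equivalent to normex * mn / rbind(cmn,cmn,cmn)
normex.2 <- sweep(normex, 2, mn / cmn, "*")

# p-value of the group effect for each gene after normalization
group.p <- function(y) anova(lm(y ~ group))[["Pr(>F)"]][1]
pvals.2 <- apply(normex.2, 1, group.p)
round(pvals.2, 4)
```

Scaling keeps all intensities positive, but the array means are still driven by the large, differentially expressed gene 1, so genes 2 and 3 again appear significant after normalization.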
Multiplicative Normalization by Medians
[Table: Genes 1 to 3 (rows) by Arrays 1 to 4 (columns); Arrays 1 and 2 are in Group 1, Arrays 3 and 4 in Group 2]
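A sketch of the median-based version follows. The code for this slide is not shown, so the details are assumptions: each array is rescaled so that its median equals the mean of the four array medians, and the names `cmd`, `ref`, and `normex.3` are introduced here.

```r
normex <- matrix(c(1100,110,80, 900,95,65, 425,85,55, 550,110,80), ncol = 4)

cmd <- apply(normex, 2, median)   # median intensity of each array
ref <- mean(cmd)                  # assumed reference value: mean of the medians

# Scale each array so that its median equals ref
normex.3 <- sweep(normex, 2, ref / cmd, "*")
round(normex.3, 1)
```

With only three genes, the median of each array is its gene 2 value, so gene 2 becomes constant across arrays and the scaling factors no longer depend on the large gene 1. The median is less sensitive than the mean to a few strongly expressed or differentially expressed genes.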