Two Color Microarrays. EPP 245/298 Statistical Analysis of Laboratory Data.



November 10, 2004. EPP 245 Statistical Analysis of Laboratory Data.

Two-Color Arrays
Two-color arrays are designed to account for variability in slides and spots by using two samples on each slide, each labeled with a different dye.
If a spot is too large, for example, both signals will be too big, and taking the difference or ratio of the two signals eliminates that source of variability.
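The cancellation argument can be illustrated with a small simulation. This is only a sketch: the intensities, the ratios, and the names `spot_effect`, `cy3`, and `cy5` are made up for the illustration, and the spot effect is assumed to be purely multiplicative and identical in both channels.

```r
# Simulated illustration; every number here is invented for the sketch.
set.seed(1)
n_spots     <- 5
true_ratio  <- c(1, 2, 0.5, 1, 4)        # hypothetical Cy5/Cy3 expression ratios
base        <- c(200, 500, 1000, 300, 800) # hypothetical Cy3 signal of a nominal spot
spot_effect <- runif(n_spots, 0.5, 2)    # spot size effect, shared by both channels

cy3 <- base * spot_effect
cy5 <- base * true_ratio * spot_effect

# The shared spot effect cancels in the log ratio.
log_ratio <- log2(cy5) - log2(cy3)
print(cbind(cy3, cy5, log_ratio, expected = log2(true_ratio)))
```

The single-channel intensities vary with the spot effect, while the log ratio recovers the assumed expression ratio for every spot.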

Dyes
The most common dye pair is Cy3 (green) and Cy5 (red), with absorption maxima at approximately 550 nm and 649 nm respectively and emission peaks about 20 nm longer, near 570 nm and 670 nm. (For reference, green light is around 550 nm and red light around 700 nm.)
The dyes are excited with lasers at 532 nm (Cy3, green) and 635 nm (Cy5, red).
The emissions are read through filters using a CCD device.


File Format
A slide scanned with Axon GenePix produces a results file with the extension .gpr.
In our example files, this contains 29 rows of headers followed by 43 columns of data.
For a full analysis one may also need a .gal file that describes the layout of the arrays.

"Block" "Column" "Row" "Name" "ID" "X" "Y" "Dia."
"F635 Median" "F635 Mean" "F635 SD" "B635 Median" "B635 Mean" "B635 SD" "% > B635+1SD" "% > B635+2SD" "F635 % Sat."

"F532 Median" "F532 Mean" "F532 SD" "B532 Median" "B532 Mean" "B532 SD" "% > B532+1SD" "% > B532+2SD" "F532 % Sat."

"Ratio of Medians (635/532)" "Ratio of Means (635/532)" "Median of Ratios (635/532)" "Mean of Ratios (635/532)" "Ratios SD (635/532)" "Rgn Ratio (635/532)" "Rgn R² (635/532)"
"F Pixels" "B Pixels" "Sum of Medians" "Sum of Means" "Log Ratio (635/532)"
"F635 Median - B635" "F532 Median - B532" "F635 Mean - B635" "F532 Mean - B532" "Flags"
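As a quick check on the column list, the following sketch rebuilds the 43 names as a character vector and looks up the positions 4, 5, 9, 10, 12, 13, 18, 19, 21, 22 that the read-in code in these slides keeps. The names `gpr_cols` and `keep` are introduced here for illustration only.

```r
# Column names of the example .gpr files, in the order listed on the slides.
gpr_cols <- c(
  "Block", "Column", "Row", "Name", "ID", "X", "Y", "Dia.",
  "F635 Median", "F635 Mean", "F635 SD",
  "B635 Median", "B635 Mean", "B635 SD",
  "% > B635+1SD", "% > B635+2SD", "F635 % Sat.",
  "F532 Median", "F532 Mean", "F532 SD",
  "B532 Median", "B532 Mean", "B532 SD",
  "% > B532+1SD", "% > B532+2SD", "F532 % Sat.",
  "Ratio of Medians (635/532)", "Ratio of Means (635/532)",
  "Median of Ratios (635/532)", "Mean of Ratios (635/532)",
  "Ratios SD (635/532)", "Rgn Ratio (635/532)", "Rgn R\u00b2 (635/532)",
  "F Pixels", "B Pixels", "Sum of Medians", "Sum of Means",
  "Log Ratio (635/532)",
  "F635 Median - B635", "F532 Median - B532",
  "F635 Mean - B635", "F532 Mean - B532", "Flags"
)

keep <- c(4, 5, 9, 10, 12, 13, 18, 19, 21, 22)  # positions kept by the read-in code
length(gpr_cols)   # number of data columns
gpr_cols[keep]     # Name, ID, then median/mean foreground and background per channel
```

After subsetting, columns 3 to 6 are the 635 nm (Cy5) foreground median, foreground mean, background median, and background mean, and columns 7 to 10 are the same four quantities at 532 nm (Cy3).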

Analysis Choices
Mean or median foreground intensity
Background corrected or not
Log transform (base 2, e, or 10) or glog transform
Log is compatible only with no background correction, since background-corrected intensities can be zero or negative
Glog is best with background correction
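The contrast between log and glog can be sketched as follows. The form used here, log(y + sqrt(y^2 + lambda)), is one common way to write the generalized log; the value of `lambda` and the intensities in `bc` are assumptions chosen for the illustration (in practice lambda would be estimated from the data).

```r
# Generalized log with a tuning constant lambda (hypothetical value below).
glog <- function(y, lambda) log(y + sqrt(y^2 + lambda))

bc     <- c(-50, 0, 10, 100, 1000, 10000)  # made-up background-corrected intensities
lambda <- 100^2                            # assumption for illustration only

suppressWarnings(log(bc))          # NaN for the negative value, -Inf for zero
glog(bc, lambda)                   # finite for every value
glog(1e4, lambda) - log(2 * 1e4)   # near 0: glog behaves like log(2y) for large y
```

For large intensities the two transforms differ only by a constant, so log ratios are essentially unchanged; the difference matters near and below zero, which is where background correction puts many spots.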

# Read each slide's .gpr file, skipping the 29 header rows, and keep
# Name, ID and the median/mean foreground and background columns for both channels.
# The file names are truncated in this transcript.
d41 <- read.table(" gpr",header=TRUE,skip=29)
d41 <- d41[,c(4,5,9,10,12,13,18,19,21,22)]
d50 <- read.table(" gpr",header=TRUE,skip=29)
d50 <- d50[,c(4,5,9,10,12,13,18,19,21,22)]
d46 <- read.table(" gpr",header=TRUE,skip=29)
d46 <- d46[,c(4,5,9,10,12,13,18,19,21,22)]
d47 <- read.table(" gpr",header=TRUE,skip=29)
d47 <- d47[,c(4,5,9,10,12,13,18,19,21,22)]
d48 <- read.table(" gpr",header=TRUE,skip=29)
d48 <- d48[,c(4,5,9,10,12,13,18,19,21,22)]
d49 <- read.table(" gpr",header=TRUE,skip=29)
d49 <- d49[,c(4,5,9,10,12,13,18,19,21,22)]
d43 <- read.table(" gpr",header=TRUE,skip=29)
d43 <- d43[,c(4,5,9,10,12,13,18,19,21,22)]
# The remaining slides used later (d44, d42, d45) are read the same way.

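dataprep <- function(method="median",bc=FALSE)
{
  # Columns 3:6 (635 nm) and 7:10 (532 nm) of each slide hold, in order:
  # foreground median, foreground mean, background median, background mean.
  # cvec picks the foreground summary and optionally subtracts the background.
  if ((method=="median")&(bc)) cvec <- c(1,0,-1,0)
  if ((method=="mean")&(bc)) cvec <- c(0,1,0,-1)
  if ((method=="median")&(!bc)) cvec <- c(1,0,0,0)
  if ((method=="mean")&(!bc)) cvec <- c(0,1,0,0)
  d41a <- as.matrix(d41[,3:6]) %*% cvec
  d41b <- as.matrix(d41[,7:10]) %*% cvec
  d50a <- as.matrix(d50[,3:6]) %*% cvec
  d50b <- as.matrix(d50[,7:10]) %*% cvec
  d46a <- as.matrix(d46[,3:6]) %*% cvec
  d46b <- as.matrix(d46[,7:10]) %*% cvec
  # The slide elides the lines for d47 to d42; they follow the same pattern.
  d47a <- as.matrix(d47[,3:6]) %*% cvec
  d47b <- as.matrix(d47[,7:10]) %*% cvec
  d48a <- as.matrix(d48[,3:6]) %*% cvec
  d48b <- as.matrix(d48[,7:10]) %*% cvec
  d49a <- as.matrix(d49[,3:6]) %*% cvec
  d49b <- as.matrix(d49[,7:10]) %*% cvec
  d43a <- as.matrix(d43[,3:6]) %*% cvec
  d43b <- as.matrix(d43[,7:10]) %*% cvec
  d44a <- as.matrix(d44[,3:6]) %*% cvec
  d44b <- as.matrix(d44[,7:10]) %*% cvec
  d42a <- as.matrix(d42[,3:6]) %*% cvec
  d42b <- as.matrix(d42[,7:10]) %*% cvec
  d45a <- as.matrix(d45[,3:6]) %*% cvec
  d45b <- as.matrix(d45[,7:10]) %*% cvec
  # One column per slide and channel: "a" is 635 nm (Cy5), "b" is 532 nm (Cy3).
  alldata <- cbind(d41a,d41b,d50a,d50b,d46a,d46b,d47a,d47b,
    d48a,d48b,d49a,d49b,d43a,d43b,d44a,d44b,d42a,d42b,d45a,d45b)
  return(alldata)
}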

alldata <- dataprep(method="median",bc=F)
rownames(alldata) <- d41[,1]
dye <- as.factor(rep(c("Cy5","Cy3"),10))
slide <- as.factor(rep(1:10,each=2))
treat <- c(1,0,0,1,0,1,1,0,0,3,3,0,0,3,3,0,0,1,1,0)
geneID <- d41[,1:2]

Array normalization
Array normalization is meant to increase the precision of comparisons by adjusting for variations that affect entire arrays.
Without normalization, the analysis would be valid, but possibly less sensitive.
However, a poor normalization method will be worse than none at all.

Possible normalization methods
We can equalize the mean or median intensity by adding or multiplying a correction term.
We can use different normalizations at different intensity levels (intensity-based normalization), for example by lowess or quantiles.
We can normalize for other things, such as print tips.
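Intensity-based normalization by lowess can be sketched on simulated data. This is not the course's own code: the simulated dye bias, the span `f = 0.3`, and the names `M`, `A`, `trend`, and `M_norm` are assumptions made for the illustration. The idea is to smooth the log ratio against the average log intensity and subtract the fitted trend.

```r
# Simulated two-channel data with an intensity-dependent dye bias (made up).
set.seed(42)
n      <- 2000
A_true <- runif(n, 6, 14)               # average log2 intensity
M_true <- rnorm(n, 0, 0.3)              # most genes unchanged
bias   <- 0.8 - 0.1 * (A_true - 6)      # hypothetical bias that depends on intensity
cy5 <- 2^(A_true + (M_true + bias) / 2)
cy3 <- 2^(A_true - (M_true + bias) / 2)

M <- log2(cy5) - log2(cy3)              # log ratio
A <- (log2(cy5) + log2(cy3)) / 2        # average log intensity

fit    <- lowess(A, M, f = 0.3)         # smooth trend of M against A
trend  <- approx(fit$x, fit$y, xout = A, ties = mean)$y
M_norm <- M - trend                     # remove the intensity-dependent trend

c(before = median(M), after = median(M_norm))
```

A constant shift (equalizing means or medians) would remove only the average bias here; the lowess fit also removes the part that changes with intensity. The approach assumes that most genes are not differentially expressed, so that the trend reflects dye bias rather than biology.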

Example for Normalization

          Group 1           Group 2
          Array 1  Array 2  Array 3  Array 4
Gene 1       1100      900      425      550
Gene 2        110       95       85      110
Gene 3         80       65       55       80

> normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80),ncol=4)
> normex
     [,1] [,2] [,3] [,4]
[1,] 1100  900  425  550
[2,]  110   95   85  110
[3,]   80   65   55   80
> group <- as.factor(c(1,1,2,2))
> anova(lm(normex[1,] ~ group))
Analysis of Variance Table
Response: normex[1, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1                               *
Residuals  2
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
[the Sum Sq, Mean Sq, F and p values are not preserved in this transcript; only the significance flag survives]

> anova(lm(normex[2,] ~ group))
Analysis of Variance Table
Response: normex[2, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1
Residuals  2
> anova(lm(normex[3,] ~ group))
Analysis of Variance Table
Response: normex[3, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1
Residuals  2
[numeric values are not preserved in this transcript; neither gene is flagged as significant]
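The three gene-by-gene ANOVAs on the raw example data can be run in one pass. This sketch reuses `normex` and `group` as defined on the slides; the helper `gene_p` is a name introduced here, not part of the original material.

```r
normex <- matrix(c(1100,110,80, 900,95,65, 425,85,55, 550,110,80), ncol = 4)
group  <- as.factor(c(1, 1, 2, 2))

# p-value of the one-way ANOVA F test for a single gene (one row of the matrix)
gene_p <- function(y, g) anova(lm(y ~ g))[["Pr(>F)"]][1]

p_raw <- apply(normex, 1, gene_p, g = group)
names(p_raw) <- paste("Gene", 1:3)
round(p_raw, 3)
```

On the unnormalized data, gene 1 comes out just under the 0.05 level, while genes 2 and 3 are far from significant, in line with the significance flags on the slides.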

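Additive Normalization by Means (each array shifted so that its mean equals the overall mean, 304.58; values rounded to two decimals)

          Group 1           Group 2
          Array 1  Array 2  Array 3  Array 4
Gene 1     974.58   851.25   541.25   607.92
Gene 2     -15.42    46.25   201.25   167.92
Gene 3     -45.42    16.25   171.25   137.92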

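> cmn <- apply(normex,2,mean)   # array (column) means; assumed definition, not shown on the slide
> mn <- mean(cmn)
> normex - rbind(cmn,cmn,cmn)+mn
[output: a 3 x 4 matrix with rows labelled cmn; the numeric values are not preserved in this transcript]
> normex.1 <- normex - rbind(cmn,cmn,cmn)+mn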

> anova(lm(normex.1[1,] ~ group))
Analysis of Variance Table
Response: normex.1[1, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1                               *
Residuals  2
> anova(lm(normex.1[2,] ~ group))
Analysis of Variance Table
Response: normex.1[2, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1                               *
Residuals  2
> anova(lm(normex.1[3,] ~ group))
Analysis of Variance Table
Response: normex.1[3, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1                               *
Residuals  2
[numeric values are not preserved in this transcript; all three genes are flagged at the 0.05 level]

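Multiplicative Normalization by Means (each array scaled so that its mean equals the overall mean, 304.58; values rounded to two decimals)

          Group 1           Group 2
          Array 1  Array 2  Array 3  Array 4
Gene 1     779.17   775.83   687.33   679.14
Gene 2      77.92    81.89   137.47   135.83
Gene 3      56.67    56.03    88.95    98.78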

> normex*mn/rbind(cmn,cmn,cmn)
[output: a 3 x 4 matrix with rows labelled cmn; the numeric values are not preserved in this transcript]
> normex.2 <- normex*mn/rbind(cmn,cmn,cmn)
> anova(lm(normex.2[1,] ~ group))
Response: normex.2[1, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1                               **
Residuals  2
> anova(lm(normex.2[2,] ~ group))
Response: normex.2[2, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1                               **
Residuals  2
> anova(lm(normex.2[3,] ~ group))
Response: normex.2[3, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1                               *
Residuals  2
[numeric values are not preserved in this transcript; genes 1 and 2 are flagged at the 0.01 level and gene 3 at the 0.05 level]
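To see the effect of the two mean-based normalizations side by side, the sketch below recomputes the per-gene p-values for the raw, additively normalized, and multiplicatively normalized example data. It uses `sweep` instead of the `rbind` construction on the slides (the two are equivalent here), and the names `gene_p`, `add_norm`, `mult_norm`, and `pvals` are introduced for the illustration.

```r
normex <- matrix(c(1100,110,80, 900,95,65, 425,85,55, 550,110,80), ncol = 4)
group  <- as.factor(c(1, 1, 2, 2))
gene_p <- function(y, g) anova(lm(y ~ g))[["Pr(>F)"]][1]

cmn <- colMeans(normex)   # array means
mn  <- mean(cmn)          # overall target mean

add_norm  <- sweep(normex, 2, cmn - mn, "-")  # same as normex - rbind(cmn,cmn,cmn) + mn
mult_norm <- sweep(normex, 2, mn / cmn, "*")  # same as normex * mn / rbind(cmn,cmn,cmn)

pvals <- cbind(
  raw            = apply(normex,    1, gene_p, g = group),
  additive       = apply(add_norm,  1, gene_p, g = group),
  multiplicative = apply(mult_norm, 1, gene_p, g = group)
)
rownames(pvals) <- paste("Gene", 1:3)
round(pvals, 4)
```

Genes 2 and 3 barely differ between groups in the raw data, yet both become nominally significant after either mean-based normalization, because the array means are dominated by gene 1. This is a small instance of a poor normalization being worse than none.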

Multiplicative Normalization by Medians (table of Genes 1 to 3 across Arrays 1 to 4 in Groups 1 and 2; the values are not preserved in this transcript)
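The code for this slide is not part of the transcript, so the following is only a sketch of what a multiplicative normalization by medians could look like on the same example. It assumes the target is the mean of the four array medians; the slide may have used a different target. The names `cmed`, `target`, and `normex.3` are hypothetical.

```r
normex <- matrix(c(1100,110,80, 900,95,65, 425,85,55, 550,110,80), ncol = 4)
group  <- as.factor(c(1, 1, 2, 2))
gene_p <- function(y, g) anova(lm(y ~ g))[["Pr(>F)"]][1]

cmed   <- apply(normex, 2, median)  # array medians
target <- mean(cmed)                # assumed target; not confirmed by the slides

normex.3 <- sweep(normex, 2, target / cmed, "*")
round(normex.3, 1)

# Gene 2 is constant after this normalization, so an F test is not meaningful
# for it; only genes 1 and 3 are tested here.
p_med <- c(gene1 = gene_p(normex.3[1, ], group),
           gene3 = gene_p(normex.3[3, ], group))
round(p_med, 3)
```

Under this assumption gene 1 remains clearly different between the groups, gene 2 becomes identical on all four arrays, and gene 3 shows no group effect, because the median is not pulled around by the one strongly changing gene.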