Henrik Bengtsson Mathematical Statistics Centre for Mathematical Sciences Lund University Plate Effects in cDNA Microarray Data
Outline Intensity dependent effects A new way of plotting microarray data Plate effects Plate normalization Measure of Fitness Results Discussion
Data Matt Callow’s ApoAI experiment (2000):
– (8 ApoAI-KO mice vs. pool of 8 control mice), 8 control mice vs. pool of 8 control mice.
– 5357 ESTs/genes (6 triplicates, 175 duplicates, 4989 single spotted) & 840 blanks => 6384 spots in all.
– Labeled using Cy3-dUTP and Cy5-dUTP.
– Signals extracted from images by Spot.
Intensity dependent effects The log-ratio, M, depends on the intensity of the spot, A.
Print-tip effects The log-ratio (and its variance) depends on the print-tip group. How are the spots printed…?
Print order plot The spots are ordered according to when they were spotted/dipped onto the glass slide(s).
Plate effects The log-ratios depend on the plate the spotted clone comes from. (384-well plates from 6 different labs were used)
Plate Normalization Assumption: The genes from one plate are on average non-differentially expressed. Correctness? Are clones on the plates selected randomly? Spots on plates are less random than, for instance, spots in print-tip groups. The ApoAI mouse experiment is a comparison between 8 control mice and the pool of them. Even if clones on plates were from different tissues, e.g. plates 9-12 from brain, in this setup it should not affect the ratios, just the strength of the signals.
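Under the assumption above, removing a plate bias amounts to centering the log-ratios of each plate at zero. The following is a minimal Python sketch, not the implementation used for these slides: the function name `remove_plate_bias` is hypothetical, and estimating the plate bias by the plate-wise median of M (rather than, say, the mean) is an assumption made here for robustness.

```python
from collections import defaultdict
from statistics import median


def remove_plate_bias(m_values, plate_ids):
    """Center the log-ratios M of every plate at zero.

    m_values  -- log-ratios M, one per spot
    plate_ids -- plate label of each spot, same order as m_values
    Assumption: the plate bias is estimated by the plate-wise median.
    """
    # Collect the log-ratios belonging to each plate.
    by_plate = defaultdict(list)
    for m, plate in zip(m_values, plate_ids):
        by_plate[plate].append(m)

    # Estimate one bias per plate.
    plate_bias = {plate: median(ms) for plate, ms in by_plate.items()}

    # Subtract the bias of its own plate from every spot.
    return [m - plate_bias[plate] for m, plate in zip(m_values, plate_ids)]


if __name__ == "__main__":
    m = [0.5, 0.7, 0.9, -0.2, 0.0, 0.2]
    plates = [1, 1, 1, 2, 2, 2]
    print(remove_plate_bias(m, plates))
```

After this step every plate has median log-ratio zero, which is only appropriate if the clones on a plate really are non-differentially expressed on average.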
Removing plate biases
Intensity normalization Intensities (A) also have plate effects. Intensity normalization => plate biases again! Should we normalize A for plate? Probably not! Blanks and “brain” spots have lower intensities, whereas the “liver” spots have higher intensities.
Sources of Artifacts [Figure: microarray workflow. Production: cDNA clones, PCR product amplification, purification, printing. Samples: RNA from test sample and reference sample converted to cDNA and hybridized. Scanning: excitation by red and green lasers, emission, overlay images, data (R,G,...). Annotated artifact sources: plate effects (?), intensity effects (labelling efficiency), intensity effects (quenching).]
Several possible approaches ;( Decisions to make: Background correction? Plate normalization? Intensity (slide, print-tip or scaled print-tip) normalization? Platewise-intensity normalization? If both plate and intensity normalization, in what order? Maybe plate-intensity-plate-intensity-plate-... and so on? Need a way to compare different approaches...
Measure of Fitness Median absolute deviation (MAD) for gene i: $d_i = c \cdot \operatorname{median}_j |r_{ij}|$, where $r_{ij} = M_{ij} - \operatorname{median}_k M_{ik}$ is residual j for gene i and $c$ is the MAD scale constant (conventionally 1.4826). The measure of fitness is defined as the mean of the genewise MADs: $\text{m.o.f.} = \frac{1}{N} \sum_{i=1}^{N} d_i$, where N is the number of genes (...or look at the density of the $d_i$’s). Important: compare on the same scale!
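The measure of fitness can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code behind the reported numbers: the names `gene_mad` and `measure_of_fitness` are hypothetical, each gene is assumed to come with a list of replicate log-ratios (for example one per slide), and the scale constant 1.4826 is the conventional MAD constant, assumed here.

```python
from statistics import mean, median

# Assumption: conventional constant making the MAD comparable to a
# standard deviation for normally distributed data.
MAD_SCALE = 1.4826


def gene_mad(m_replicates, scale=MAD_SCALE):
    """Scaled median absolute deviation d_i of one gene's log-ratios."""
    center = median(m_replicates)
    residuals = [abs(m - center) for m in m_replicates]
    return scale * median(residuals)


def measure_of_fitness(m_by_gene, scale=MAD_SCALE):
    """Mean of the genewise MADs; lower means less gene variability."""
    return mean([gene_mad(ms, scale) for ms in m_by_gene])


if __name__ == "__main__":
    # Two genes with three replicate log-ratios each (made-up values).
    m_by_gene = [[0.0, 1.0, 2.0], [1.0, 1.0, 4.0]]
    print(measure_of_fitness(m_by_gene, scale=1.0))
```

Two normalization strategies are comparable with this number only if their log-ratios are on the same scale, since a method that merely shrinks all M values would also lower the m.o.f.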
Visual comparison between the “best” approaches. Slidewise intensity normalization: m.o.f. = 0.228. Plate + print-tip intensity + plate normalization: m.o.f. = 0.188.
Results bg – background corrected, P – plate biases removed, S – slide-intensity normalized, B – print-tip-intensity normalized, sB – scaled print-tip-intensity normalized. [Figure; axis label: m.o.f.] Removing plate biases first significantly lowers the gene variabilities (15-20% lower than with intensity normalization only). It is critical not to do background correction. Using the measure of fitness is helpful in deciding what to do.
Discussion What are the reasons for plate effects and where do they actually occur? i) On the plates, ii) during printing or iii) at hybridization? How should one best standardize the measure of fitness? i) Based on all spots, ii) on a subset (blanks?), or iii) ?
Acknowledgements Statistics Dept, UC Berkeley: * Sandrine Dudoit * Terry Speed * Yee Hwa Yang Lawrence Berkeley National Laboratory: * Matt Callow Ernest Gallo Research Center, UCSF: * Karen Berger Mathematical Statistics, Lund University: * Ola Hössjer com.braju.sma - object oriented extension to sma (free): [R] Software (free): The Statistical Microarray Analysis (sma) library (free):
Data Transformation “Observed” data $\{(R_n, G_n)\}$ for spots n: R = red channel signal, G = green channel signal (background corrected or not). Transformed data $\{(M_n, A_n)\}$: $M = \log_2(R/G)$ (log-ratio), $A = \log_2 (R \cdot G)^{1/2} = \tfrac{1}{2} \log_2(R \cdot G)$ (intensity signal). Inverse: $R = (2^{2A+M})^{1/2}$, $G = (2^{2A-M})^{1/2}$.
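The transformation between (R,G) and (M,A) and its inverse can be written directly from the formulas on this slide. The Python sketch below is only an illustration; the function names `to_ma` and `to_rg` are hypothetical, and both signals are assumed to be strictly positive so that the logarithms exist.

```python
import math


def to_ma(r, g):
    """Map channel signals (R, G) to log-ratio M and log-intensity A.

    Assumption: r > 0 and g > 0 (e.g. no negative values left after
    background correction).
    """
    m = math.log2(r / g)
    a = 0.5 * math.log2(r * g)
    return m, a


def to_rg(m, a):
    """Inverse map: R = (2^(2A+M))^(1/2), G = (2^(2A-M))^(1/2)."""
    r = 2.0 ** ((2.0 * a + m) / 2.0)
    g = 2.0 ** ((2.0 * a - m) / 2.0)
    return r, g


if __name__ == "__main__":
    m, a = to_ma(1024.0, 256.0)
    print(m, a)
    print(to_rg(m, a))
```

A spot with equal red and green signals gets M = 0, and doubling both channels raises A by one while leaving M unchanged.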
Normalization Biased towards the green channel & Intensity dependent artifacts
Blanks / Empty spots blanks 99%