Henrik Bengtsson Mathematical Statistics Centre for Mathematical Sciences Lund University, Sweden Plate Effects in cDNA Microarray Data.

Presentation transcript:

Plate Effects in cDNA Microarray Data. Henrik Bengtsson, Mathematical Statistics, Centre for Mathematical Sciences, Lund University, Sweden

2 of 21 Outline
–Data
–Known systematic variation / artifacts
–New way of plotting microarray data
–Print order / plate effects
–Normalization of plate effects
–Normalization strategies
–Finding the best strategy: Measure of Reproducibility
–Results
–Discussion

3 of 21 Data
Matt Callow’s apoAI experiment (2000):
–(8 apoAI-KO mice vs. pool of 8 control mice), 8 control mice vs. pool of 8 control mice, i.e. eight hybridized slides.
–5357 ESTs/genes (6 triplicates, 175 duplicates, 4989 single spotted) & 840 blanks => 6384 spots in all.
–Labeled using Cy3-dUTP and Cy5-dUTP.
–Signals extracted from the images by Spot.

4 of 21 Intensity dependent effects The log-ratio, M, depends on the intensity of the spot, A.

5 of 21 Print-tip/spatial intensity effects The log-ratio (and its variance) varies with print-tip group. But, how are the spots printed…?

6 of 21 6384 spots printed onto N slides, in total 399 print turns using 4x4 print-tips: 4·4·399 = 6384

7 of 21 Print order plot The spots are ordered according to when they were spotted/dipped onto the glass slide(s). Note that it takes hours/days to print all spots on all slides.

8 of 21 Print dip plot Median values of the 16 log-ratios at each dip from each of the 399 print turns.
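The print dip summary described above can be sketched in a few lines. This is a minimal illustration, not the code behind the slides: the function name `print_dip_medians` and the `dip_of_spot` mapping (which print turn each spot belongs to) are hypothetical, and the toy data uses 2 dips of 4 spots instead of 399 dips of 16 spots.

```python
from statistics import median

def print_dip_medians(log_ratios, dip_of_spot):
    """Return {dip index: median log-ratio M of the spots printed at that dip}.

    log_ratios  -- list of M values, one per spot
    dip_of_spot -- list of dip (print turn) indices, one per spot;
                   this mapping is an assumed input, not given in the slides
    """
    groups = {}
    for m, dip in zip(log_ratios, dip_of_spot):
        groups.setdefault(dip, []).append(m)
    # Sort by dip index so the result follows the print order.
    return {dip: median(ms) for dip, ms in sorted(groups.items())}

# Toy data: 2 dips of 4 spots each.
M = [0.1, -0.2, 0.3, 0.0, 1.0, 1.2, 0.8, 1.1]
dips = [0, 0, 0, 0, 1, 1, 1, 1]
dip_medians = print_dip_medians(M, dips)
```

Plotting the values of `dip_medians` against the dip index would give a print dip plot in which a drift over the print run shows up as a trend.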

9 of 21 Sources of artifacts [Diagram. Production: cDNA clones, PCR product amplification, purification, printing. Hybridization: test sample RNA and reference sample RNA, each converted to cDNA and hybridized. Scanning: excitation (red laser, green laser), emission, overlay images, data (R fg, G fg, R bg, G bg, ...). Annotated artifacts: plate effects (clone sets, ..., ?); intensity effects (labeling efficiency); intensity effects (quenching); print order effects (climate, print-tips, ...).]

10 of 21 Plate effects The log-ratios depend on the plate the spotted clone comes from. (384-well plates from 6 different labs were used.)

11 of 21 Normalizing plate by plate Assumption: The genes from one plate are on average non-differentially expressed. Correctness? Are clones on the plates selected randomly? Spots on plates are less random than, for instance, spots in print-tip groups. Recall that in the current setup we compare 8 control mice with the pool of them.

12 of 21 Removing (constant) plate biases Will remove some of the intensity dependent effects... ...and some of the spatial artifacts.
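A constant platewise normalization can be sketched as subtracting one bias estimate per plate from the log-ratios of that plate. The slides do not say which location estimate is used; the median below is an assumption, and the function name `normalize_plate_constant` and the plate labels are hypothetical.

```python
from statistics import median

def normalize_plate_constant(M, plate):
    """Subtract a per-plate constant bias from the log-ratios M.

    M     -- list of log-ratios, one per spot
    plate -- list of plate labels, one per spot
    The bias of a plate is estimated by the median of its log-ratios
    (an assumption; the slides do not name the estimator).
    """
    by_plate = {}
    for m, p in zip(M, plate):
        by_plate.setdefault(p, []).append(m)
    bias = {p: median(values) for p, values in by_plate.items()}
    return [m - bias[p] for m, p in zip(M, plate)]

# Toy data: plate "A" is shifted upwards, plate "B" downwards.
M = [0.5, 0.7, 0.6, -0.3, -0.1, -0.2]
plate = ["A", "A", "A", "B", "B", "B"]
M_norm = normalize_plate_constant(M, plate)
```

After this step each plate has median log-ratio zero, which is exactly the assumption that the genes from one plate are on average non-differentially expressed.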

13 of 21 ...and then an intensity normalization? Intensity normalization => reintroduced plate biases! Why? Because the intensities of the spots, A, also show plate effects. Should we normalize A for plate effects? No! Less DNA hybridized to the blanks and to the “brain” spots, compared to the rest (“liver” clones).

14 of 21 Intensity dep. normalization plate by plate Removes the plate effects... ...and most of the spatial artifacts. ...plus a print-tip normalization?
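An intensity dependent normalization plate by plate estimates, within each plate, how M varies with A and subtracts that trend. The sketch below deliberately swaps in a simple running median over spots of neighbouring intensity rank in place of whatever smoother the slides used (not stated here); the function names and the window size `k` are hypothetical.

```python
from statistics import median

def running_median_trend(A, M, k=5):
    """Estimate the trend of M as a function of A.

    For each spot, the trend is the median M within a window of about k
    spots whose intensities A are adjacent in rank (a stand-in smoother).
    """
    order = sorted(range(len(A)), key=lambda i: A[i])
    half = k // 2
    trend = [0.0] * len(A)
    for pos, i in enumerate(order):
        lo = max(0, pos - half)
        hi = min(len(order), pos + half + 1)
        trend[i] = median([M[j] for j in order[lo:hi]])
    return trend

def normalize_plate_intensity(M, A, plate, k=5):
    """Subtract the intensity dependent trend of M, separately per plate."""
    spots_of_plate = {}
    for i, p in enumerate(plate):
        spots_of_plate.setdefault(p, []).append(i)
    out = list(M)
    for idx in spots_of_plate.values():
        trend = running_median_trend([A[i] for i in idx],
                                     [M[i] for i in idx], k)
        for i, t in zip(idx, trend):
            out[i] = M[i] - t
    return out

# Toy data: on plate "P1" the log-ratio grows with intensity.
A = [7.0, 8.0, 9.0, 10.0, 11.0, 7.5, 8.5, 9.5]
M = [0.1, 0.2, 0.3, 0.4, 0.5, -0.2, -0.2, -0.2]
plate = ["P1"] * 5 + ["P2"] * 3
M_norm = normalize_plate_intensity(M, A, plate, k=3)
```

Because the trend is fitted per plate, a constant plate bias is removed as a special case, while A itself is left untouched.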

15 of 21 Multiple ways to normalize
–Component-wise normalization methods, e.g. print-tip normalization + constant plate normalization, or plate intensity normalization + print-tip normalization. ...will work in the general case.
–Simultaneous normalization methods (not covered here), e.g. print-tip & plate intensity normalization (two dimensions). ...requires a model and will not be applicable to the general case.
Need a way to compare the different outcomes...

16 of 21 Measure of Reproducibility Median absolute deviation (MAD) for gene $i$ with replicates $j = 1, 2, \ldots, J$: $d_i = c \cdot \operatorname{median}_j |r_{ij}|$, where $c$ is a constant scale factor and $r_{ij} = M_{ij} - \operatorname{median}_j M_{ij}$ is residual $j$ for gene $i$. The measure of reproducibility (small is good) is a scalar defined as the mean of all genewise MADs: $\text{M.O.R.} = \sum_i d_i / N$, where $N$ is the number of genes. Ex: two different genes: $d_a < d_b$
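The measure of reproducibility translates directly into code. In this sketch the scale constant in front of the median is an assumption: 1.4826 is the conventional factor that makes the MAD comparable to a standard deviation under normality, and it is exposed as a parameter. The function names and the toy genes `a` and `b` are hypothetical.

```python
from statistics import mean, median

MAD_CONSTANT = 1.4826  # assumed scale factor; conventional choice for the MAD

def genewise_mad(replicates, c=MAD_CONSTANT):
    """d_i = c * median_j |M_ij - median_j M_ij| for one gene."""
    center = median(replicates)
    return c * median([abs(m - center) for m in replicates])

def measure_of_reproducibility(M_by_gene, c=MAD_CONSTANT):
    """M.O.R. = mean of the genewise MADs over all N genes."""
    return mean([genewise_mad(reps, c) for reps in M_by_gene.values()])

# Toy data: gene "a" replicates agree closely, gene "b" replicates do not.
M_by_gene = {
    "a": [0.0, 0.1, -0.1, 0.05],
    "b": [1.0, -1.0, 0.5, -0.5],
}
d_a = genewise_mad(M_by_gene["a"])
d_b = genewise_mad(M_by_gene["b"])
mor = measure_of_reproducibility(M_by_gene)
```

Here `d_a < d_b`, and a normalization strategy that lowers `mor` across the replicated slides is judged to be the more reproducible one.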

17 of 21 Results 21 different normalization strategies were performed on both background-subtracted and non-background-subtracted data, i.e. 42 runs in total. Legend: Pl = constant platewise normalization, Pl(A) = intensity dependent platewise normalization, Sl(A) = intensity dependent slidewise normalization, Pr(A) = intensity dependent print-tip-wise normalization, sPr(A) = scaled intensity dependent print-tip-wise normalization, bg = background corrected data.

18 of 21 Results Doing platewise intensity dependent normalization lowers the gene variability by another ~10% compared with print-tip normalization. In all cases it is better not to do background correction. Using the measure of reproducibility is helpful in deciding what to do. Legend: Pl = constant platewise norm., Pl(A) = intensity dep. platewise norm., Sl(A) = intensity dep. slidewise norm., Pr(A) = intensity dep. print-tip-wise norm., sPr(A) = scaled intensity dep. print-tip-wise norm., bg = background corrected data.

19 of 21 Visual comparison No normalization: (M.O.R.=0.270; 100%) Scaled print-tip intensity normalization: (M.O.R.=0.123; 46%) Scaled print-tip followed by plate intensity normalization: (M.O.R.=0.110; 41%)

20 of 21 Discussion What are the reasons for seeing plate effects and where do they actually occur? i) in clone setup, ii) on the plates, iii) during printing, iv) at hybridization or where? Look at the behavior of the variance in addition to the bias. Are there any reasons for doing platewise normalization of variances too? How general is the result that not doing background subtraction performs better than doing it?

21 of 21 Acknowledgements
Statistics Dept, UC Berkeley: Sandrine Dudoit, Terry Speed, Yee Hwa Yang
Lawrence Berkeley National Laboratory: Matt Callow
Mathematical Statistics, Lund University: Ola Hössjer, Jan Holst
Ernest Gallo Research Center, UCSF: Karen Berger
Software: com.braju.sma, an object-oriented extension to sma (free); [R] software (free); the Statistical Microarray Analysis (sma) library (free)

22 of 21 Extra slides

23 of 21 Data Transformation “Observed” data $\{(R,G)\}_n$: $R$ = red channel signal, $G$ = green channel signal (background corrected or not). Transformed data $\{(M,A)\}_n$: $M = \log_2(R/G)$ (log-ratio), $A = \log_2 (R \cdot G)^{1/2} = \tfrac{1}{2} \log_2(R \cdot G)$ (intensity signal) $\Leftrightarrow$ $R = (2^{2A+M})^{1/2}$, $G = (2^{2A-M})^{1/2}$
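The transformation between (R, G) and (M, A) and its inverse can be checked with a short sketch. The function names `to_ma` and `to_rg` are hypothetical; the formulas are the ones on the slide, and both signals are assumed to be strictly positive so that the logarithms exist.

```python
import math

def to_ma(R, G):
    """(R, G) -> (M, A): M = log2(R/G), A = 1/2 * log2(R*G). Needs R, G > 0."""
    M = math.log2(R / G)
    A = 0.5 * math.log2(R * G)
    return M, A

def to_rg(M, A):
    """(M, A) -> (R, G): R = (2^(2A+M))^(1/2), G = (2^(2A-M))^(1/2)."""
    R = 2.0 ** ((2.0 * A + M) / 2.0)
    G = 2.0 ** ((2.0 * A - M) / 2.0)
    return R, G

# A spot four times brighter in red than in green.
M, A = to_ma(1024.0, 256.0)   # M is 2 (fourfold), A is 9
R, G = to_rg(M, A)            # recovers the original signals
```

A spot with equal red and green signals has M = 0 whatever its intensity, which is why artifacts are easiest to see as deviations of M from zero along A.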

24 of 21 Normalization Biased towards the green channel & Intensity dependent artifacts

25 of 21 Blanks / empty spots blanks 99%