Henrik Bengtsson Mathematical Statistics Centre for Mathematical Sciences Lund University Plate Effects in cDNA Microarray Data.


Outline
– Intensity-dependent effects
– A new way of plotting microarray data
– Plate effects
– Plate normalization
– Measure of fitness
– Results
– Discussion

Data
Matt Callow’s ApoAI experiment (2000):
– 8 ApoAI-KO mice vs. pool of 8 control mice, and 8 control mice vs. pool of 8 control mice.
– 5357 ESTs/genes (6 triplicates, 175 duplicates, 4989 single spotted) & 840 blanks => 6384 spots in all.
– Labeled using Cy3-dUTP and Cy5-dUTP.
– Signals extracted from images by Spot.

Intensity dependent effects The log-ratio, M, depends on the intensity of the spot, A.

Print-tip effects The log-ratio (and its variance) depends on the print-tip group. How are the spots printed?

Print order plot The spots are ordered according to when they were spotted/dipped onto the glass slide(s).

Plate effects The log-ratios depend on the plate the spotted clone comes from. (384-well plates from 6 different labs were used.)

Plate Normalization Assumption: The genes from one plate are on average non-differentially expressed. Correctness? Are clones on the plates selected randomly? Spots on plates are less random than, for instance, spots in print-tip groups. The ApoAI mouse experiment is a comparison between 8 control mice and the pool of them. Even if clones on plates were from different tissues, e.g. plates 9-12 from brain, in this setup it should not affect the ratios, just the strength of the signals.

Removing plate biases
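
The plate normalization assumption above (genes from one plate are on average non-differentially expressed) suggests centering the log-ratios plate by plate. The slides do not say which estimator was used, so the following is only a minimal sketch that assumes the plate bias is estimated by the plate-wise median of M. The function name `remove_plate_bias` and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import median

def remove_plate_bias(m_values, plate_ids):
    """Subtract each plate's median log-ratio from the spots on that plate.

    Assumption (not stated in the slides): the plate bias is estimated
    by the median of M over all spots originating from the same plate.
    """
    by_plate = defaultdict(list)
    for m, plate in zip(m_values, plate_ids):
        by_plate[plate].append(m)
    plate_median = {plate: median(vals) for plate, vals in by_plate.items()}
    return [m - plate_median[plate] for m, plate in zip(m_values, plate_ids)]

# Toy log-ratios for five spots coming from two plates (made-up numbers).
m_values = [0.5, 0.7, 0.6, -0.2, -0.4]
plate_ids = [1, 1, 1, 2, 2]
m_normalized = remove_plate_bias(m_values, plate_ids)
```

After this step the median log-ratio within each plate is zero, which is exactly what the assumption demands. If the assumption fails for a plate (for example, a plate holding mostly differentially expressed clones), this centering would remove real signal.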

Intensity normalization Intensities (A) also have plate effects. Intensity normalization => plate biases again! Should we normalize A for plate? Probably not! Blanks and ”brain” spots have lower intensities, whereas the ”liver” spots have higher...

Sources of Artifacts
[Flow diagram. Production: cDNA clones, PCR product amplification, purification, printing. Hybridization: test sample (RNA, cDNA) and reference sample (RNA, cDNA). Scanning: excitation by red and green lasers, emission, overlay images. Data: (R, G, ...). Annotated artifacts: plate effects (?), intensity effects (labelling efficiency), intensity effects (quenching).]

Several possible approaches ;( Decisions to make: Background correction? Plate normalization? Intensity (slide, print-tip or scaled print-tip) normalization? Platewise-intensity normalization? If both plate and intensity normalization, in what order? Maybe plate-intensity-plate-intensity-plate-... and so on? Need a way to compare different approaches...

Measure of Fitness
Median absolute deviation (MAD) for gene i: $d_i = c \cdot \operatorname{median}_j |r_{ij}|$, where $c$ is a constant scale factor and $r_{ij} = M_{ij} - \operatorname{median}_j M_{ij}$ is residual j for gene i.
The measure of fitness is defined as the mean of the genewise MADs: $\text{m.o.f.} = \sum_i d_i / N$, where N is the number of genes (or look at the density of the $d_i$'s).
Important: compare on the same scale!
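
The measure of fitness described above can be sketched in a few lines. This is an illustrative sketch, not the author's implementation: the function names are hypothetical, and the scale constant in front of the median is not legible in the slide, so the default of 1.4826 (the usual factor that makes the MAD consistent with the standard deviation under normality) is an assumption.

```python
from statistics import mean, median

def gene_mad(log_ratios, scale=1.4826):
    """Scaled median absolute deviation of one gene's log-ratios.

    `scale` is an assumed constant; the slide's value is not recoverable.
    """
    center = median(log_ratios)
    residuals = [abs(m - center) for m in log_ratios]
    return scale * median(residuals)

def measure_of_fitness(m_by_gene, scale=1.4826):
    """Mean of the genewise MADs over all genes."""
    return mean([gene_mad(values, scale) for values in m_by_gene.values()])

# Toy replicated log-ratios for two genes (made-up numbers).
m_by_gene = {
    "gene1": [0.1, 0.3, 0.2, 0.6],
    "gene2": [-1.0, -1.0, -1.0],
}
mof_unscaled = measure_of_fitness(m_by_gene, scale=1.0)
mof_default = measure_of_fitness(m_by_gene)
```

Because the measure depends on the scale of M, two normalization approaches should only be compared after their log-ratios have been brought to the same scale, as the slide stresses.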

Visual comparison between the ”best” Slidewise intensity normalization: (m.o.f.=0.228) Plate+print-tip int.+plate normalization: (m.o.f.=0.188)

Results
Legend: bg – background corrected, P – plate biases removed, S – slide-intensity normalized, B – print-tip-intensity normalized, sB – scaled print-tip-intensity normalized. (Figure axis: m.o.f.)
– Removing plate biases first significantly lowers the gene variabilities (15-20% lower than intensity normalization only).
– It is critical not to do background correction.
– Using the measure of fitness is helpful in deciding what to do.

Discussion What are the reasons for plate effects and where do they actually occur? i) On the plates, ii) during printing, or iii) at hybridization? How should one best standardize the measure of fitness? i) Based on all spots, ii) on a subset (blanks?), or iii) ?

Acknowledgements
Statistics Dept, UC Berkeley: Sandrine Dudoit, Terry Speed, Yee Hwa Yang
Lawrence Berkeley National Laboratory: Matt Callow
Ernest Gallo Research Center, UCSF: Karen Berger
Mathematical Statistics, Lund University: Ola Hössjer
Software:
– com.braju.sma, an object-oriented extension to sma (free)
– [R] software (free)
– The Statistical Microarray Analysis (sma) library (free)

Data Transformation
“Observed” data $\{(R_n, G_n)\}_n$: R = red channel signal, G = green channel signal (background corrected or not).
Transformed data $\{(M_n, A_n)\}_n$: $M = \log_2(R/G)$ (log-ratio), $A = \log_2 (R \cdot G)^{1/2} = \tfrac{1}{2} \log_2(R \cdot G)$ (intensity signal).
Inverse: $R = (2^{2A+M})^{1/2}$, $G = (2^{2A-M})^{1/2}$.
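
The transformation between (R, G) and (M, A) and its inverse can be checked numerically. The sketch below follows the formulas on this slide; the function names `to_ma` and `to_rg` and the example signals are hypothetical, and both channel signals are assumed to be strictly positive.

```python
import math

def to_ma(r, g):
    """Map channel signals (R, G) to log-ratio M and log-intensity A."""
    m = math.log2(r / g)           # M = log2(R/G)
    a = 0.5 * math.log2(r * g)     # A = 1/2 * log2(R*G)
    return m, a

def to_rg(m, a):
    """Inverse map: recover (R, G) from (M, A)."""
    r = math.sqrt(2.0 ** (2.0 * a + m))   # R = (2^(2A+M))^(1/2)
    g = math.sqrt(2.0 ** (2.0 * a - m))   # G = (2^(2A-M))^(1/2)
    return r, g

# Example spot: red signal four times the green signal (made-up numbers).
m, a = to_ma(1024.0, 256.0)
r, g = to_rg(m, a)
```

For this spot M = 2 (a fourfold ratio) and A = 9, and the inverse recovers the original signals. Note that background correction can yield non-positive signals, for which the logarithms are undefined.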

Normalization Biased towards the green channel & Intensity dependent artifacts

Blanks / Empty spots blanks 99%