Normalization for cDNA Microarray Data


Normalization for cDNA Microarray Data
Originally by Yee Hwa Yang, Sandrine Dudoit, Percy Luu and Terry Speed.
Ho Kim

Normalization
Normalization is the process of removing systematic variation such as:
- Physical properties of the dyes
- Efficiency of dye incorporation
- Experimental variability in probe coupling and processing procedures
- Scanner settings

Normalization issues
- Within-slide: what genes to use; location; scale
- Paired-slides (dye swap): self-normalization
- Between slides

Within-Slide Normalization
Normalization balances the red and green intensities. Imbalances can be caused by:
- Different incorporation of dyes
- Different amounts of mRNA
- Different scanning parameters
In practice, we usually need to increase the red intensity a bit to balance the green.

Methods? Global normalization
log2(R/G) -> log2(R/G) - c = log2(R/(kG)), where c = log2(k)
- Standard practice (in most software): c is a constant such that the normalized log-ratios have zero mean or median.
- Our preference: c is a function of overall spot intensity and print-tip-group.
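The standard-practice version of global normalization can be sketched in a few lines. This is a minimal illustration, not the authors' software: the function name `global_median_normalize` and the toy intensities are assumptions made for the example.

```python
import math
from statistics import median

def global_median_normalize(red, green):
    """Return (normalized log-ratios, c) where c is the median of log2(R/G)."""
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    c = median(log_ratios)  # a single constant for the whole slide
    return [m - c for m in log_ratios], c

# Made-up intensities: green is uniformly twice as bright as red.
red = [100.0, 200.0, 400.0, 800.0, 1600.0]
green = [200.0, 400.0, 800.0, 1600.0, 3200.0]
normalized, c = global_median_normalize(red, green)
k = 2 ** c  # equivalent rescaling of green: log2(R/G) - c = log2(R/(kG))
```

Using the median rather than the mean keeps c from being pulled by a few strongly changed genes.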

What genes to use?
- All genes on the array: appropriate when only a small proportion of genes are expected to be differentially expressed; symmetry between up- and down-regulation is also assumed.
- Constantly expressed genes (housekeeping), e.g. beta-actin.
- Controls:
  - Spiked controls (e.g. synthetic DNA sequences, plant genes): should have equal red and green intensities.
  - Genomic DNA titration series.
- Other set of genes.

Experiment (slide KO #8)
- mRNA samples: R = Apo A1 KO mouse liver; G = control (all C57Bl/6).
- Probes: ~6,000 cDNAs, including 200 related to lipid metabolism.

M vs. A
- M = log2(R / G): log intensity ratio
- A = log2(R*G) / 2: mean log-intensity
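The two quantities plotted in an M vs. A plot follow directly from the definitions above. A minimal sketch, assuming background-corrected positive intensities; the helper name `ma_values` is hypothetical.

```python
import math

def ma_values(red, green):
    """Compute M (log intensity ratio) and A (mean log-intensity) per spot."""
    M = [math.log2(r / g) for r, g in zip(red, green)]
    A = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return M, A

# One made-up spot: R = 8, G = 2.
M, A = ma_values([8.0], [2.0])
```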

Normalization - Median
Assumption: changes are roughly symmetric.
- First panel: smoothed densities of log2(G) and log2(R).
- Second panel: M vs. A plot with the median set to zero.

Nonparametric Smoothing (1)
Consider an X-Y scatter plot and draw a regression curve that requires no parametric assumptions:
- The regression curve need not be a straight line.
- The regression curve is determined entirely by the data.
Two components of smoothing:
- Kernel function: how the weighted mean is calculated.
- Bandwidth: the width of the window (span), which determines the smoothness of the regression curve; wider means smoother.

Nonparametric Smoothing (2) Uniform Kernel

Nonparametric Smoothing (3) Triangular Kernel

Nonparametric Smoothing (4) Normal Kernel
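The three kernels named on these slides can be compared with a simple kernel-weighted mean (a Nadaraya-Watson style estimate). This is a sketch under the assumption that the slides use the usual textbook kernel definitions; the function names and toy data are illustrative only.

```python
import math

def uniform(u):
    """Equal weight inside the window, zero outside."""
    return 0.5 if abs(u) <= 1 else 0.0

def triangular(u):
    """Weight falls linearly from 1 at the centre to 0 at the window edge."""
    return max(0.0, 1.0 - abs(u))

def normal(u):
    """Standard normal density; every point gets some weight."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kernel_smooth(x, y, x0, bandwidth, kernel):
    """Kernel-weighted mean of y at x0."""
    weights = [kernel((xi - x0) / bandwidth) for xi in x]
    total = sum(weights)
    if total == 0:
        return float("nan")  # no data inside the window
    return sum(w * yi for w, yi in zip(weights, y)) / total

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 4.0, 9.0, 16.0]
narrow = kernel_smooth(x, y, 2.0, 1.0, triangular)  # only the centre point counts
wide = kernel_smooth(x, y, 2.0, 2.0, triangular)    # neighbours pull the estimate
```

Widening the bandwidth averages over more neighbours, which is why the wider window gives the smoother curve.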

Nonparametric Smoothing (5) Default Lowess line : Span=0.5

Nonparametric Smoothing (6) Lowess line : Span=0.2

Nonparametric Smoothing (7) Lowess line : Span=0.1
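The effect of the span can be explored with a simplified lowess-style fit: a local straight line with tricube weights over the nearest fraction `span` of the points. This is only a sketch; full lowess also adds robustness iterations that down-weight outliers, which are omitted here, and the function name `local_linear_fit` is an assumption for the example.

```python
import math

def local_linear_fit(x, y, x0, span):
    """Tricube-weighted local linear estimate of y at x0 (no robustness step)."""
    n = len(x)
    k = min(n, max(2, math.ceil(span * n)))      # number of neighbours used
    h = sorted(abs(xi - x0) for xi in x)[k - 1]  # window half-width
    if h == 0.0:
        # The k nearest points all sit exactly at x0: average their y values.
        ties = [yi for xi, yi in zip(x, y) if xi == x0]
        return sum(ties) / len(ties)
    w = [(1 - min(1.0, abs(xi - x0) / h) ** 3) ** 3 for xi in x]
    sw = sum(w)
    if sw == 0.0:
        # Every neighbour lies exactly on the window edge: average them.
        edge = [yi for xi, yi in zip(x, y) if abs(xi - x0) <= h]
        return sum(edge) / len(edge)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:
        return my  # only one distinct x carries weight
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return my + (sxy / sxx) * (x0 - mx)

# Made-up data with a bump at x = 5; compare the three spans from the slides.
x = [float(i) for i in range(10)]
y = [0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0]
fits = {span: local_linear_fit(x, y, 5.0, span) for span in (0.5, 0.2, 0.1)}
```

With a small span the fit follows the bump closely; with a larger span the bump is averaged away.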

Normalization - lowess
Global lowess. Assumption: changes are roughly symmetric at all intensities.

Normalisation - print-tip-group
Assumption: for every print-tip-group, changes are roughly symmetric at all intensities.
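Print-tip-group normalization subtracts an intensity-dependent curve c(A) from M separately within each print-tip-group. The sketch below uses a simplified tricube local-linear smoother as a stand-in for lowess (no robustness iterations); the names `local_fit` and `print_tip_normalize` and the toy data are assumptions for the example, not the authors' code.

```python
import math
from collections import defaultdict

def local_fit(x, y, x0, span):
    """Tricube-weighted local line at x0; x0 must be one of the x values."""
    k = min(len(x), max(2, math.ceil(span * len(x))))
    h = sorted(abs(xi - x0) for xi in x)[k - 1]
    w = [(1 - min(1.0, abs(xi - x0) / h) ** 3) ** 3 if h > 0 else float(xi == x0)
         for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:
        return my
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return my + (sxy / sxx) * (x0 - mx)

def print_tip_normalize(M, A, tip, span=0.5):
    """Return M - c(A), with c fitted separately for each print-tip-group."""
    groups = defaultdict(list)
    for idx, t in enumerate(tip):
        groups[t].append(idx)
    out = [0.0] * len(M)
    for idxs in groups.values():
        a = [A[i] for i in idxs]
        m = [M[i] for i in idxs]
        for i in idxs:
            out[i] = M[i] - local_fit(a, m, A[i], span)
    return out

# Made-up data: two print-tip-groups with different intensity-dependent bias.
A = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0] * 2
tip = [1] * 6 + [2] * 6
M = [0.5 * a - 3.0 for a in A[:6]] + [-0.2 * a + 1.0 for a in A[6:]]
normalized = print_tip_normalize(M, A, tip)
```

Passing a single group label for every spot reduces this to the global (whole-slide) version.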

M vs. A - after print-tip-group normalization

Effects of Location Normalisation
Panels: before normalisation; after print-tip-group normalisation.

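Box Plot
- The box runs from Q1 to Q3, with the median (Q2) marked inside.
- IQR = Q3 - Q1.
- Points more than 1.5*IQR beyond the quartiles are drawn as outliers.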
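The quantities labelled on the box plot can be computed as follows. This is a sketch: quartile conventions differ between packages, and the `inclusive` method of Python's `statistics.quantiles` is just one choice; the function name `box_plot_summary` is hypothetical.

```python
from statistics import quantiles

def box_plot_summary(values, whisker=1.5):
    """Quartiles, IQR and the points flagged as outliers by the 1.5*IQR rule."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"Q1": q1, "median": q2, "Q3": q3, "IQR": iqr, "outliers": outliers}

# Made-up data with one extreme value.
summary = box_plot_summary([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0])
```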

QQ-plot: compares a sample distribution with a reference distribution (e.g. normal).
Example: t (df = 9) vs. standard normal.

Within print-tip-group box plots for print-tip-group normalized M

Taking scale into account
Assumption: all print-tip-groups have the same spread.
- The true log-ratio is mij, where i indexes print-tip-groups and j indexes spots.
- The observed log-ratio is Mij, where Mij = ai mij.
- A robust estimate of ai (up to a constant common to all groups) is MADi = medianj { |Mij - medianj(Mij)| }.
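Scale normalization can be sketched by computing the MAD of each print-tip-group and rescaling so that every group ends up with the geometric mean of the MADs. The function names and toy values are assumptions for the example, and the code assumes every group has a non-zero MAD.

```python
import math
from statistics import median

def mad(values):
    """Median absolute deviation from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])

def scale_normalize(groups):
    """groups maps print-tip-group -> location-normalized M values."""
    mads = {t: mad(v) for t, v in groups.items()}  # each must be > 0
    geo_mean = math.exp(sum(math.log(m) for m in mads.values()) / len(mads))
    # Dividing by MADi / geo_mean gives every group the same MAD.
    return {t: [m * geo_mean / mads[t] for m in v] for t, v in groups.items()}

# Made-up log-ratios: group "b" is four times as spread out as group "a".
scaled = scale_normalize({"a": [-1.0, 0.0, 1.0], "b": [-4.0, 0.0, 4.0]})
```

The MAD is used instead of the standard deviation so that a few genuinely changed genes do not inflate the scale estimate.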

Effect of location + scale normalization

Effect of location + scale normalization

Comparing different normalisation methods

Follow-up Experiment
- 50 distinct clones with the largest absolute t-statistics from the first experiment.
- 72 other clones.
- Each clone spotted 8 times.
- Two hybridizations:
  - Slide 1: treatment (ttt) -> red, control (ctl) -> green.
  - Slide 2: treatment (ttt) -> green, control (ctl) -> red.

Follow-up Experiment

Paired-slides: dye swap
- Slide 1: M = log2(R/G) - c
- Slide 2: M’ = log2(R’/G’) - c’
Combine by subtracting the normalized log-ratios:
(M - M’)/2 = [ (log2(R/G) - c) - (log2(R’/G’) - c’) ] / 2
           ≈ [ log2(R/G) + log2(G’/R’) ] / 2
           = [ log2(RG’/(GR’)) ] / 2
provided c = c’.
Assumption: the separate normalizations are the same.
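Self-normalization with a dye-swap pair needs no fitted constant: if both slides share the same dye bias, it cancels when the two raw log-ratios are subtracted. A minimal sketch with made-up intensities; the function name `self_normalize` is hypothetical.

```python
import math

def self_normalize(red1, green1, red2, green2):
    """Return (M - M')/2 per spot from the raw log-ratios of a dye-swap pair."""
    out = []
    for r, g, r2, g2 in zip(red1, green1, red2, green2):
        m = math.log2(r / g)     # slide 1: treatment in red
        m2 = math.log2(r2 / g2)  # slide 2: treatment in green
        out.append((m - m2) / 2)
    return out

# True treatment/control ratio is 4; the red channel reads 2x too bright on both slides.
result = self_normalize([800.0], [100.0], [200.0], [400.0])
```

Here M = 3 and M’ = -1, so (M - M’)/2 = 2 = log2(4): the shared bias of one log2 unit drops out.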

Verify Assumption

Result of Self-Normalization Plot of (M - M’)/2 vs. (A + A’)/2

Summary
Case 1: A few genes that are likely to change
- Within-slide:
  - Location: print-tip-group lowess normalization.
  - Scale: for all print-tip-groups, adjust the MAD to equal the geometric mean of the MADs over all print-tip-groups.
- Between slides (experiments): an extension of within-slide scale normalization (future work).
Case 2: Many genes changing (paired-slides)
- Self-normalization: take the difference of the two log-ratios.
- Check using controls or known information.

Technical Reports from Terry’s group: http://www.stat.berkeley.edu/users/terry/zarray/Html/
Index of papers: http://www.stat.Berkeley.EDU/users/terry/zarray/Html/papersindex.html
- Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data.
- Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments.
- Comparison of methods for image analysis on cDNA microarray data.
- Normalization for cDNA Microarray Data.
Statistical software R: http://lib.stat.cmu.edu/R/CRAN/