Getting the numbers comparable

Normalization: Getting the numbers comparable

The DNA Array Analysis Pipeline: Question; Experimental Design; Array design; Probe design; Buy Chip/Array; Sample Preparation; Hybridization; Image analysis; Normalization; Expression Index Calculation; Comparable Gene Expression Data; Statistical Analysis; Fit to Model (time series); Advanced Data Analysis (Clustering, PCA, Classification, Promoter Analysis, Meta analysis, Survival analysis, Regulatory Network).

Expression intensities are not just target concentrations. Other contributions: sample contamination; RNA quality; sample preparation; dye effect (Cy3/Cy5); probe affinity; hybridization; unspecific signal (background); saturation; spotting; other issues related to array manufacturing; image segmentation; array spatial effects.

Two kinds of variation in the signal. Global variation (systematic effect): RNA quality, sample preparation, dye, hybridization, photodetection. Gene-specific variation (stochastic noise): spotting (size and shape), cross-hybridization, dye, biological variation.

Sources of variation. Global variation (systematic): similar effect on many measurements; corrections can be estimated from the data; handled by normalization. Gene-specific variation (stochastic): too random to be explicitly accounted for ("noise"); handled by statistical testing.

Calibration = Normalization = Scaling
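The simplest reading of "normalization = scaling" is a single multiplicative factor per array. The sketch below is a minimal illustration, not the method of any particular package: the function name `scale_to_common_median` and the choice of the median as the reference statistic are assumptions made for this example.

```python
from statistics import median

def scale_to_common_median(arrays, target=None):
    """Rescale every array by one factor so that all medians coincide.

    arrays: list of lists of raw intensities, one list per array.
    target: common median to scale to; defaults to the mean of the
            per-array medians (an arbitrary choice for this sketch).
    """
    medians = [median(a) for a in arrays]
    if target is None:
        target = sum(medians) / len(medians)
    return [[x * target / m for x in a] for a, m in zip(arrays, medians)]

# Two hypothetical arrays; the second is uniformly half as bright.
arrays = [[100.0, 200.0, 400.0], [50.0, 100.0, 200.0]]
scaled = scale_to_common_median(arrays)
```

A single factor can only remove a global brightness difference. It cannot correct a bias that depends on intensity, which is what motivates the nonlinear methods on the following slides.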

Nonlinear normalization

Lowess normalization. [Figure: MA plot] One of the most commonly used normalization techniques is the LOcally WEighted Scatterplot Smoothing (LOWESS) algorithm.
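As a rough sketch of the idea for a two-channel array: compute M (log ratio) and A (mean log intensity) for each spot, fit a smooth curve of M against A, and subtract it. The code below is a simplified stand-in for LOWESS, assuming tricube-weighted local linear fits without the robustness iterations of the full algorithm. The names `ma_transform` and `local_linear_fit` and the span `frac=0.5` are hypothetical choices for illustration.

```python
import math

def ma_transform(red, green):
    """Return (M, A): log2 ratio and mean log2 intensity per spot."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

def local_linear_fit(x, y, frac=0.5):
    """Tricube-weighted local linear fit of y on x, evaluated at each x."""
    n = len(x)
    k = max(2, math.ceil(frac * n))          # neighbours in each window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12            # bandwidth: k-th nearest distance
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Hypothetical spots where the red channel is uniformly twice as bright.
green = [100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0]
red = [2.0 * g for g in green]
m, a = ma_transform(red, green)
trend = local_linear_fit(a, m)
m_norm = [mi - ti for mi, ti in zip(m, trend)]   # normalized log ratios
```

In this toy case the dye bias is constant, so the fitted trend is flat at M = 1 and the normalized log ratios are all close to zero. On real data the trend typically bends with A, which a single scaling factor could not remove.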

The Qspline method. From the empirical distribution, a number of quantiles are calculated for each of the channels to be normalized (one channel shown in red) and for the reference distribution (shown in black). A QQ-plot is made, and a normalization curve is constructed by fitting a cubic spline function. As reference one can use an artificial "median array" for a set of arrays, or a log-normal distribution, which is a good approximation.

Once again: qspline, accumulating quantiles. When many microarrays are to be normalized to each other, an average array can be used as the target.
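The quantile-matching idea can be sketched as follows. Note the swap: qspline fits a cubic spline through the QQ-plot, whereas this example uses piecewise-linear interpolation between quantiles to stay within the standard library. The function names and the number of quantiles `n_q` are assumptions for illustration, and the target is the average of the arrays' own quantiles, standing in for the "average array".

```python
def quantiles(values, n_q):
    """n_q evenly spaced empirical quantiles, from the minimum to the maximum."""
    s = sorted(values)
    out = []
    for k in range(n_q):
        pos = k * (len(s) - 1) / (n_q - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        out.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return out

def interpolate(x, xs, ys):
    """Piecewise-linear curve through (xs, ys); constant beyond the ends."""
    if x <= xs[0]:
        return ys[0]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            span = xs[i] - xs[i - 1]
            t = (x - xs[i - 1]) / span if span > 0 else 0.0
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]

def normalize_to_average(arrays, n_q=5):
    """Map each array's quantiles onto the average quantiles of all arrays."""
    q = [quantiles(a, n_q) for a in arrays]
    target = [sum(col) / len(col) for col in zip(*q)]
    return [[interpolate(x, qa, target) for x in a] for a, qa in zip(arrays, q)]

# Two hypothetical arrays with the same shape but different scale.
arrays = [[3.0, 1.0, 5.0, 2.0, 4.0], [2.0, 4.0, 6.0, 8.0, 10.0]]
normalized = normalize_to_average(arrays)
```

After normalization both arrays share the same distribution, while each value keeps its rank within its own array.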

Invariant set normalization (Li and Wong). An invariant set of probes is used: probes that do not change intensity rank between arrays. A piecewise linear running median line is calculated through these probes, and this curve is used for normalization.
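A single-pass sketch of the idea, under simplifying assumptions: probes count as "invariant" when their rank differs by at most `max_rank_shift` between the array and a baseline array, and the normalization curve is a running median through those probes with linear interpolation in between. The threshold, the window size, and all function names are hypothetical, and this is not a reproduction of the published procedure.

```python
from statistics import median

def ranks(values):
    """Rank of each value (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def invariant_set(array, baseline, max_rank_shift):
    """Indices of probes whose rank barely changes between the two arrays."""
    ra, rb = ranks(array), ranks(baseline)
    return [i for i in range(len(array)) if abs(ra[i] - rb[i]) <= max_rank_shift]

def running_median_curve(x, y, window=3):
    """Sort points by x and smooth y with a running median."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    half = window // 2
    smooth = [median(ys[max(0, i - half): i + half + 1]) for i in range(len(ys))]
    return xs, smooth

def interpolate(x, xs, ys):
    """Piecewise-linear curve through (xs, ys); constant beyond the ends."""
    if x <= xs[0]:
        return ys[0]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            span = xs[i] - xs[i - 1]
            t = (x - xs[i - 1]) / span if span > 0 else 0.0
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]

# Hypothetical data: the baseline is twice as bright, except for the last
# probe, which changes rank and so is excluded from the invariant set.
array = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
baseline = [20.0, 40.0, 60.0, 80.0, 100.0, 10.0]
keep = invariant_set(array, baseline, max_rank_shift=1)
xs, ys = running_median_curve([array[i] for i in keep], [baseline[i] for i in keep])
normalized = [interpolate(x, xs, ys) for x in array]
```

The curve is estimated only from probes assumed not to be differentially expressed, but it is then applied to every probe on the array.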

Spatial normalization. [Figure panels: raw data; after intensity normalization; spatial bias estimate; after spatial normalization]

Expression index value. Some microarrays have multiple probes addressing the expression of the same target. Affymetrix GeneChips have 11-20 probe pairs per gene, each consisting of a Perfect Match (PM) and a MisMatch (MM) probe:
PM: CGATCAATTGCACTATGTCATTTCT
MM: CGATCAATTGCAGTATGTCATTTCT
For downstream analysis, however, we often want to deal with only one value per gene. We therefore collapse the intensities from many probes into one value: a gene expression index value.

Expression index calculation. Simplest method? The median. But more sophisticated methods exist: dChip, RMA and MAS 5.
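A minimal sketch of the simplest option, collapsing each probe set to its median. The gene names and intensities are invented for illustration.

```python
from statistics import mean, median

# Hypothetical PM intensities for two probe sets on one array.
probe_sets = {
    "gene_a": [120.0, 95.0, 2300.0, 110.0],   # one probe is an obvious outlier
    "gene_b": [800.0, 760.0, 820.0, 790.0],
}

# One expression index value per gene.
index_median = {gene: median(pm) for gene, pm in probe_sets.items()}
index_mean = {gene: mean(pm) for gene, pm in probe_sets.items()}
```

For `gene_a` the median is 115.0 while the mean is 656.25: a single aberrant probe barely moves the median, which is why it is a reasonable baseline before turning to model-based methods.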

dChip (Li & Wong). Model: $PM_{ij} = \theta_i \phi_j + \varepsilon_{ij}$. Outlier removal: identify extreme residuals, remove them, re-fit, and iterate. The distribution of the errors $\varepsilon_{ij}$ is assumed independent of signal strength (Li and Wong, 2001).
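The multiplicative model above (an array-specific expression level times a probe-specific affinity, plus error) can be fitted by alternating least squares. The sketch below shows only that fitting step, under stated assumptions: rows are arrays, columns are probes, the probe affinities are scaled so that their squares sum to the number of probes, and the outlier-removal loop is omitted. The function name and iteration count are hypothetical.

```python
def fit_multiplicative_model(pm, n_iter=50):
    """Least-squares fit of pm[i][j] ~ theta[i] * phi[j].

    pm: PM intensities for one probe set, one row per array i and
        one column per probe j.
    Returns (theta, phi) with sum(phi_j ** 2) equal to the number of probes.
    """
    n_arrays, n_probes = len(pm), len(pm[0])
    phi = [1.0] * n_probes
    theta = [0.0] * n_arrays
    for _ in range(n_iter):
        # Update expression levels with probe affinities held fixed.
        denom = sum(p * p for p in phi)
        theta = [sum(row[j] * phi[j] for j in range(n_probes)) / denom for row in pm]
        # Update probe affinities with expression levels held fixed.
        denom = sum(t * t for t in theta)
        phi = [sum(pm[i][j] * theta[i] for i in range(n_arrays)) / denom
               for j in range(n_probes)]
        # Fix the scale so that theta and phi are identifiable.
        scale = (sum(p * p for p in phi) / n_probes) ** 0.5
        phi = [p / scale for p in phi]
        theta = [t * scale for t in theta]
    return theta, phi

# Hypothetical noise-free probe set: 3 arrays by 3 probes.
true_theta = [100.0, 200.0, 400.0]
true_phi = [0.5, 1.0, 1.5]
pm = [[t * p for p in true_phi] for t in true_theta]
theta, phi = fit_multiplicative_model(pm)
residuals = [[pm[i][j] - theta[i] * phi[j] for j in range(3)] for i in range(3)]
```

In a fuller implementation, the residuals would be inspected after each fit, extreme ones dropped, and the model re-fitted, as the slide describes.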

RMA: Robust Multi-array Average (RMA) expression measure (Irizarry et al., Biostatistics, 2003). For each probe set, re-write $PM_{ij} = \theta_i \phi_j$ as $\log(PM_{ij}) = \log(\theta_i) + \log(\phi_j)$. Fit this additive model by iteratively re-weighted least squares or median polish.
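Median polish, one of the two fitting options named above, can be sketched in a few lines. This example covers only the summarization step on log2 PM values for a single probe set. It assumes rows are arrays and columns are probes, uses a fixed number of sweeps, and leaves out any background correction or normalization applied beforehand. The function name is hypothetical.

```python
import math
from statistics import median

def median_polish(table, n_iter=10):
    """Fit table[i][j] ~ overall + row_eff[i] + col_eff[j] by median polish."""
    resid = [row[:] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row medians out of the residuals.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep column medians out of the residuals.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid

# Hypothetical probe set: 3 arrays by 3 probes, with one corrupted probe value.
pm = [[16.0, 32.0, 4096.0],
      [32.0, 64.0, 128.0],
      [64.0, 128.0, 256.0]]
log_pm = [[math.log2(v) for v in row] for row in pm]
overall, row_eff, col_eff, resid = median_polish(log_pm)
expression = [overall + r for r in row_eff]   # one log2 value per array
```

The corrupted value ends up as a large residual rather than distorting the estimate for the first array, which is the robustness property that makes median polish attractive here.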

MAS 5: MicroArray Suite version 5 uses $\text{signal} = \text{TukeyBiweight}\{\log(PM_j - MM_j^*)\}$. MM* is an adjusted MM that is never bigger than PM. Tukey biweight is a robust averaging procedure with weights and outlier rejection.
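A one-step Tukey biweight average can be sketched as below. The inputs are assumed to be log-transformed, already MM*-adjusted probe values. How MM* is derived is not shown, and the tuning constants `c=5.0` and `eps=1e-4` should be treated as assumptions of this sketch.

```python
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that ignores far outliers."""
    m = median(values)
    mad = median(abs(v - m) for v in values)      # median absolute deviation
    u = [(v - m) / (c * mad + eps) for v in values]
    # Weights fall smoothly to zero; values with |u| >= 1 are rejected.
    w = [(1 - ui * ui) ** 2 if abs(ui) < 1 else 0.0 for ui in u]
    return sum(wi * vi for wi, vi in zip(w, values)) / sum(w)

# Hypothetical log2(PM - MM*) values for one probe set; the last is an outlier.
log_values = [7.0, 7.1, 6.9, 7.0, 12.0]
signal = tukey_biweight(log_values)
plain_mean = sum(log_values) / len(log_values)
```

Here the outlying probe receives zero weight, so `signal` stays close to 7 while the plain mean is pulled up to 8.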

Methods compared on expression variance. [Figure: standard deviation of gene measures from 20 replicate arrays, plotted against expression level. RMA: blue and red; MAS5: green; dChip: black. From Terry Speed.]

Robustness: MAS 5.0. [Figure: log fold change estimate from 1.25 µg cRNA plotted against log fold change estimate from 20 µg cRNA] (Irizarry et al., Biostatistics, 2003)

Robustness: dChip. [Figure: log fold change estimate from 1.25 µg cRNA plotted against log fold change estimate from 20 µg cRNA] (Irizarry et al., Biostatistics, 2003)

Robustness: RMA. [Figure: log fold change estimate from 1.25 µg cRNA plotted against log fold change estimate from 20 µg cRNA] (Irizarry et al., Biostatistics, 2003)

All of this is implemented in R, in the Bioconductor package 'affy' (Gautier et al., 2003).

References
Li and Wong (2001). Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biology 2:1–11.
Irizarry, Bolstad, Collin, Cope, Hobbs and Speed (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research 31(4):e15.
Affymetrix (2001). Affymetrix Microarray Suite User Guide, version 5 edition. Affymetrix, Santa Clara, CA.
Gautier, Cope, Bolstad and Irizarry (2003). affy: an R package for the analysis of Affymetrix GeneChip data at the probe level. Bioinformatics