Bias, Variance, and Fit for Three Measures of Expression: AvDiff, Li & Wong’s, and AvLog(PM-BG)
Rafael A. Irizarry, Department of Biostatistics, JHU
(joint work with Bridget Hobbs and Terry Speed, Walter & Eliza Hall Institute of Medical Research)

Summary
- Summarize the expression level of a probe set by AvLog(PM-BG), the average of log2(PM - BG)
- PMs need to be normalized
- The background estimate (BG) makes no use of probe-specific MM values
- Evaluate AvLog(PM-BG) and compare it with AvDiff and the Li & Wong algorithm through bias, variance and model fit
- Use the Gene Logic spike-in and dilution studies
- All three expression measures performed well
- AvLog(PM-BG) is arguably the best of the three

SD vs. Avg of Defective Probes

Normalization at Probe Level

Spike-In Experiments
- Add 11 cRNAs from foreign species to the hybridization mixture, at concentrations from 0.5 pM to 100 pM
- Set A: the 11 control cRNAs were all spiked in at the same concentration, which varied across chips
- Set B: the 11 control cRNAs were spiked in at different concentrations, which varied across chips
- The concentrations were arranged in a 12x12 cyclic Latin square (with 3 replicates)
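The cyclic structure can be illustrated with a short sketch. It assumes that rows index chip groups and columns index spiked-in cRNAs, with each cell holding a concentration level index. That mapping and the function name `cyclic_latin_square` are hypothetical, since the slide gives neither the actual concentrations nor the layout.

```python
import numpy as np

def cyclic_latin_square(n: int) -> np.ndarray:
    """Return an n x n cyclic Latin square of level indices 0..n-1.

    Hypothetical layout: row r is a chip group, column c a spiked-in cRNA,
    and cell (r, c) = (r + c) mod n is the concentration level it receives.
    """
    rows = np.arange(n)[:, None]
    cols = np.arange(n)[None, :]
    return (rows + cols) % n

square = cyclic_latin_square(12)
# Each level appears exactly once in every row and every column.
assert all(sorted(row) == list(range(12)) for row in square.tolist())
assert all(sorted(col) == list(range(12)) for col in square.T.tolist())
```

Each row is the previous one shifted by one position, which is what makes the square cyclic. Every column therefore passes through every level once, and every row contains every level once.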

Set A: Probe Level Data (12 chips)

What Did We Learn?
- Don’t subtract or divide by MM
- The probe effect is additive on the log scale
- Take logs
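The following noise-free sketch shows why an additive probe effect on the log scale matters. It assumes a multiplicative model in which a probe's PM intensity is the product of an expression level `theta` and a probe affinity `phi`, with no background or noise. Under that assumption, log2 turns the product into a sum, and the probe effect cancels in a between-chip log ratio. The function name is hypothetical.

```python
import numpy as np

def probe_log_ratios(theta1: float, theta2: float, phi) -> np.ndarray:
    """Probe-level log2 ratios between two chips under PM = theta * phi.

    log2(theta2 * phi) - log2(theta1 * phi) = log2(theta2) - log2(theta1),
    so every probe gives the same ratio regardless of its affinity phi.
    """
    phi = np.asarray(phi, dtype=float)
    log_pm1 = np.log2(theta1 * phi)  # log2 theta1 + log2 phi
    log_pm2 = np.log2(theta2 * phi)  # log2 theta2 + log2 phi
    return log_pm2 - log_pm1

ratios = probe_log_ratios(100.0, 400.0, [0.5, 2.0, 10.0, 37.0])
print(ratios)  # each entry is (up to rounding) log2(400 / 100) = 2
```

On the raw scale, the PM differences theta2 * phi - theta1 * phi still scale with phi, so probes with different affinities would disagree.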

Why Remove Background?

Background Distribution

Average Log2(PM-BG)
1. Normalize the probe-level data
2. Compute BG, the background mean, by estimating the mode of the MM distribution
3. Subtract BG from each PM
4. If PM - BG < 0, use the minimum of the positive values divided by 2
5. Take the average of log2(PM - BG) over the probes in the probe set
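The five steps can be sketched in Python as follows. The sketch rests on several assumptions:
- The probe-level data are already normalized (step 1). The slide does not specify the normalization method.
- The MM mode is estimated with a simple histogram. The slide does not say which mode estimator was used.
- The "minimum of positives divided by 2" rule is applied across all probes passed in.

The function names are hypothetical.

```python
import numpy as np

def estimate_bg(mm: np.ndarray, bins: int = 100) -> float:
    """Step 2: BG = mode of the MM distribution (histogram estimate, an assumption)."""
    counts, edges = np.histogram(mm, bins=bins)
    k = int(np.argmax(counts))
    return 0.5 * (edges[k] + edges[k + 1])  # midpoint of the most populated bin

def bg_correct(pm: np.ndarray, bg: float) -> np.ndarray:
    """Steps 3-4: subtract BG; replace non-positive values with min(positive) / 2."""
    adj = pm - bg
    positive = adj[adj > 0]
    if positive.size == 0:
        raise ValueError("no PM value exceeds the background estimate")
    floor = positive.min() / 2.0
    # Zero is treated like a negative value here so that log2 stays finite.
    return np.where(adj > 0, adj, floor)

def avlog_pm_bg(pm, mm, probe_set_ids, bins: int = 100) -> dict:
    """Step 5: average of log2(PM - BG) over the probes of each probe set.

    pm, mm: normalized probe-level intensities from one chip (step 1 assumed done).
    probe_set_ids: probe set label for each PM probe.
    """
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    ids = np.asarray(probe_set_ids)
    logs = np.log2(bg_correct(pm, estimate_bg(mm, bins)))
    return {str(ps): float(logs[ids == ps].mean()) for ps in np.unique(ids)}

expr = avlog_pm_bg([47.0, 79.0, 14.0, 10.0], [10.0, 10.0, 10.0, 50.0],
                   ["a", "a", "b", "b"], bins=4)
print(expr)  # {'a': 5.5, 'b': 4.0}
```

In the toy input, BG comes out as 15. The two negative PM - BG values are replaced by 32 / 2 = 16, giving log2 values 5, 6, 4, 4. Real arrays supply many thousands of MM values, so the mode estimate is far more stable than this tiny example suggests.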

Expression after Normalization

Expression Level Comparison

Spike-In B
[Table: Probe Set | Conc 1 | Conc 2 | Rank, for the spiked-in probe sets BioB, BioB, BioC, BioB-M, BioDn, DapX, CreX, CreX, BioC, DapX, DapX-M]
Later we consider 23 different combinations of concentrations.

Differential Expression

Observed Ranks
[Table: observed rank of each spiked-in gene (BioB, BioB, BioC, BioB-M, BioDn, DapX, CreX, CreX, BioC, DapX, DapX-M) under AvDiff, MAS 5.0, Li & Wong and AvLog(PM-BG)]
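Ranks like these can be produced by ordering all probe sets by the absolute log2 ratio between two chips and reading off where the spiked-in genes fall. This approach is assumed here, not taken from the slides. The sketch below uses hypothetical names and expects expression values already on the log2 scale, so the log ratio is a simple difference.

```python
import numpy as np

def rank_by_fold_change(expr1, expr2, names) -> dict:
    """Rank probe sets by |log2 fold change| between two chips (1 = largest).

    expr1, expr2: log2 expression values for the same probe sets on two chips.
    names: unique probe set names, in the same order.
    """
    m = np.abs(np.asarray(expr2, dtype=float) - np.asarray(expr1, dtype=float))
    order = np.argsort(-m, kind="stable")   # indices from largest to smallest change
    ranks = np.empty(len(m), dtype=int)
    ranks[order] = np.arange(1, len(m) + 1)  # rank position for each probe set
    return dict(zip(names, ranks.tolist()))

print(rank_by_fold_change([5.0, 5.0, 5.0], [5.1, 8.0, 3.0], ["g1", "g2", "g3"]))
```

A good expression measure should place the spiked-in genes with truly different concentrations near the top of this ordering.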

Observed vs True Ratio

Dilution Experiment
- cRNA was hybridized to the human chip (HGU95) in a range of proportions and dilutions
- The dilution series begins at 1.25 µg cRNA per GeneChip array and rises through 2.5, 5.0, 7.5 and 10.0 to 20.0 µg per array
- 5 replicate chips were used at each dilution
- Normalize only within each set of 5 replicates
- For each probe set, compute the expression, its average and SD over replicates, and fit a line to log expression vs. log concentration
- The regression line should have slope 1 and a high R²
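The per-probe-set fit can be sketched as follows. The sketch makes three assumptions:
- log2 is used on both axes, so that slope 1 means expression is proportional to the amount of cRNA.
- The line is fitted to the replicate averages at each of the six dilution levels listed on the slide.
- Replicate SDs use the sample SD (ddof=1).

The function name is hypothetical.

```python
import numpy as np

# Dilution levels in micrograms of cRNA per array, as listed on the slide.
DILUTIONS_UG = np.array([1.25, 2.5, 5.0, 7.5, 10.0, 20.0])

def dilution_fit(log2_expr, dilutions=DILUTIONS_UG) -> dict:
    """Fit log2 expression vs. log2 concentration for one probe set.

    log2_expr: array of shape (n_dilutions, n_replicates).
    Returns replicate averages, replicate SDs, the slope and R^2.
    """
    log2_expr = np.asarray(log2_expr, dtype=float)
    avg = log2_expr.mean(axis=1)
    sd = log2_expr.std(axis=1, ddof=1)
    x = np.log2(dilutions)
    slope, intercept = np.polyfit(x, avg, 1)  # degree-1 least squares fit
    fitted = intercept + slope * x
    ss_res = np.sum((avg - fitted) ** 2)
    ss_tot = np.sum((avg - avg.mean()) ** 2)
    return {"avg": avg, "sd": sd, "slope": float(slope), "r2": float(1.0 - ss_res / ss_tot)}

# A perfectly proportional probe set with small replicate scatter.
example = np.log2(DILUTIONS_UG)[:, None] + 3.0 + np.array([-0.1, 0.0, 0.1, 0.0, 0.0])
fit = dilution_fit(example)
print(round(fit["slope"], 6), round(fit["r2"], 6))  # 1.0 1.0
```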

Dilution Experiment Data

Expression and SD

Slope Estimates and R 2

Model Check
- Compute the observed SD of the 5 replicate expression estimates
- Compute the RMS of the 5 nominal SDs
- Compare the two by taking the log ratio
- The closeness of the observed and nominal SDs is taken as a measure of the model's goodness of fit
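The comparison can be written out for a single probe set as below. The sketch makes three assumptions:
- The "nominal SD" is the standard error the model reports for each replicate estimate.
- The observed SD is the sample SD (ddof=1).
- The log ratio is taken in base 2.

The slide fixes none of these choices, and the function name is hypothetical.

```python
import numpy as np

def sd_log_ratio(estimates, nominal_sds) -> float:
    """log2(observed SD / RMS of nominal SDs) for one probe set.

    estimates: replicate expression estimates (5 in the dilution study).
    nominal_sds: the model's standard error for each replicate estimate.
    A value near 0 means the model's stated precision matches the observed spread.
    """
    observed_sd = np.std(np.asarray(estimates, dtype=float), ddof=1)
    rms_nominal = np.sqrt(np.mean(np.asarray(nominal_sds, dtype=float) ** 2))
    return float(np.log2(observed_sd / rms_nominal))

print(sd_log_ratio([1.0, 2.0, 3.0, 4.0, 5.0], [1.0] * 5))
```

A positive value means the model understates the replicate variability; a negative value means it overstates it.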

Observed vs. Model SE

Conclusion
- Take logs
- PMs need to be normalized
- Using a global background improves on the use of probe-specific MM
- The Gene Logic spike-in and dilution studies show that all three expression measures performed very well
- AvLog(PM-BG) is arguably the best in terms of bias, variance and model fit
- Future: better BG; robust/resistant summaries

Acknowledgements
- Gene Brown’s group at Wyeth/Genetics Institute, and Uwe Scherf’s Genomics Research & Development Group at Gene Logic, for generating the spike-in and dilution data
- Gene Logic, for permission to use these data
- Francois Collin (Gene Logic)
- Ben Bolstad (UC Berkeley)
- Magnus Åstrand (Astra Zeneca Mölndal)