Statistical Analyses of Microarray Data Rafael A. Irizarry Department of Biostatistics



Outline: Scientific questions; Review of technology; Role of statistics; Two case studies

Scientific Questions: expression, differential expression, expression patterns.
“To understand gene function, it is helpful to know when and where it is expressed and…” “…under what circumstances the expression level is affected.” “…questions concerning functional pathways and how cellular components work together to regulate and carry out cellular processes.” Lipshutz et al. (1999) Nature Genetics, 21, pp

What do Microarrays do? Interrogate labeled nucleic acid samples: model systems, microdissections, cell lines, human tissue bank. [Figure labels: RNA samples; oligonucleotide barcodes (kanR, UPTAG, DOWNTAG)]

How do they do it? [Figure labels: probes; labeled targets]

cDNA Arrays. [Figure labels: cDNA clones (probes); PCR product amplification, purification; printing (0.1 nl/spot); microarray; mRNA target; hybridize target to microarray; excitation (laser 1, laser 2); emission; scanning; overlay image and normalize; analysis]

High Density Oligonucleotide Arrays. [Figure labels: GeneChip Probe Array (1.28 cm); >200,000 different complementary probes; hybridized probe cell (24 µm); millions of copies of a specific oligonucleotide probe; single stranded, labeled RNA target; image of hybridized probe array] Compliments of D. Gerhold

Role of Statistics

[Flow chart nodes: Biological question (differentially expressed genes, sample class prediction, etc.); Experimental design; Microarray experiment; Image analysis; Quantify expression; Normalization; Estimation; Testing; Clustering; Discrimination; Biological verification and interpretation]

Part of the image of one channel, false-coloured on a scale from white (very high) and red (high), through yellow and green (medium), to blue (low) and black.

Does one size fit all?

Segmentation: limitation of the fixed circle method. [Figure panels: SRG; Fixed Circle] Inside the boundary is spot (fg), outside is not.

Some local backgrounds. We use something different again: a smaller, less variable value. [Figure: single channel grey scale]

Quantification of Expression. For each spot on the slide we calculate
Red intensity = Rfg - Rbg
Green intensity = Gfg - Gbg
(fg = foreground, bg = background) and combine them in the log (base 2) ratio
log2(Red intensity / Green intensity).
We now have one differential expression value for each gene on each array.
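The calculation above can be sketched in a few lines. This is a minimal illustration, not the software used for the slides: the function name `log2_ratio`, the example spot values, and the choice to return `None` when a background-corrected intensity is not positive are all assumptions made here.

```python
import math

def log2_ratio(r_fg, r_bg, g_fg, g_bg):
    """Return log2((Rfg - Rbg) / (Gfg - Gbg)) for one spot.

    Returns None when either background-corrected intensity is not
    positive, because the log ratio is undefined there (this handling
    is an assumption, not something the slides specify).
    """
    red = r_fg - r_bg      # background-corrected red intensity
    green = g_fg - g_bg    # background-corrected green intensity
    if red <= 0 or green <= 0:
        return None
    return math.log2(red / green)

# Made-up spots: (Rfg, Rbg, Gfg, Gbg)
spots = [(1200.0, 200.0, 700.0, 200.0), (450.0, 150.0, 1350.0, 150.0)]
ratios = [log2_ratio(*s) for s in spots]
```

A positive value means the red channel is brighter after background correction; a negative value means the green channel is brighter.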

Top 2.5% of ratios shown in red, bottom 2.5% of ratios shown in green. The red-green ratios can be spatially biased.
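One way to pick out the spots coloured in such a display is to rank the log ratios and label the extremes. The sketch below assumes a simple rank-based cutoff; the function name `flag_extremes`, the `percent` parameter, and the example data are hypothetical.

```python
def flag_extremes(log_ratios, percent=2.5):
    """Label the top `percent` of log ratios "red" and the bottom "green".

    All other spots are labelled "none". The number of spots in each
    tail is rounded down.
    """
    n = len(log_ratios)
    k = int(n * percent / 100)                      # spots per tail
    order = sorted(range(n), key=lambda i: log_ratios[i])
    labels = ["none"] * n
    for i in order[:k]:                             # smallest ratios
        labels[i] = "green"
    for i in order[n - k:]:                         # largest ratios
        labels[i] = "red"
    return labels

# Made-up log ratios for 40 spots
example = [float(i - 20) for i in range(40)]
labels = flag_extremes(example)
```

Plotting these labels at each spot's position on the slide is what would reveal a spatial pattern such as one corner being mostly red.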

Another example

Oligo Array Image Analysis. There are about 100 pixels per probe cell. These intensities are combined to form one number representing expression for the probe cell.

Normalization at Probe Level

Dilution Experiment Data

PM MM

Default until 2002: GeneChip® software uses Avg.diff, $\text{Avg.diff} = \frac{1}{|A|} \sum_{j \in A} (PM_j - MM_j)$, with A a set of “suitable” pairs chosen by the software. A log ratio version is also used. For differential expression, Avg.diffs are compared between chips.
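Avg.diff is the average of the PM minus MM differences over the probe pairs in A. The sketch below assumes the pair set is supplied by the caller, since the rule the software uses to choose “suitable” pairs is not given here; the function name `avg_diff`, the `pairs` argument, and the intensities are hypothetical.

```python
def avg_diff(pm, mm, pairs=None):
    """Average of PM - MM over the probe pairs indexed by `pairs`.

    `pairs` stands in for the set A of "suitable" pairs; if omitted,
    every probe pair in the probe set is used (an assumption).
    """
    if pairs is None:
        pairs = range(len(pm))
    pairs = list(pairs)
    return sum(pm[j] - mm[j] for j in pairs) / len(pairs)

# Made-up intensities for one probe set with four probe pairs
pm = [500.0, 800.0, 300.0, 1200.0]
mm = [200.0, 300.0, 350.0, 400.0]
all_pairs = avg_diff(pm, mm)               # uses every pair
subset = avg_diff(pm, mm, pairs=[0, 1, 3]) # drops the pair with MM > PM
```

Note that a pair with MM greater than PM contributes a negative difference, so the result depends on which pairs are included in A.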

What is the evidence? Lockhart et al. Nature Biotechnology 14 (1996)

Two case studies

Spike-In Experiments. Add concentrations (0.5 pM to 100 pM) of 11 foreign species cRNAs to the hybridization mixture.
Set A: 11 control cRNAs were spiked in, all at the same concentration, which varied across chips.
Set B: 11 control cRNAs were spiked in, all at different concentrations, which varied across chips. The concentrations were arranged in a 12x12 cyclic Latin square (with 3 replicates).
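In a cyclic Latin square each row is the previous row shifted by one position, so every level appears exactly once in each row and each column. The sketch below builds one of a given size; the function name and the placeholder labels `c0` to `c11` are hypothetical and do not reproduce the concentrations used in the experiment.

```python
def cyclic_latin_square(levels):
    """Return an n x n cyclic Latin square built from `levels`.

    Entry (row, col) is levels[(row + col) % n], so each row is the
    previous row shifted left by one position.
    """
    n = len(levels)
    return [[levels[(row + col) % n] for col in range(n)]
            for row in range(n)]

# Placeholder labels for 12 concentration levels
levels = ["c%d" % i for i in range(12)]
square = cyclic_latin_square(levels)
```

With rows read as one design dimension and columns as the other, the design guarantees that every concentration level is observed once per row and once per column.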

Set A: Probe Level Data (12 chips)

Spike-In B. [Table columns: Probe Set, Conc 1, Conc 2, Rank. Rows: BioB, BioB, BioC, BioB-M, BioDn, DapX, CreX, CreX, BioC, DapX, DapX-M] Later we consider 23 different combinations of concentrations.

Observed Ranks. [Table columns: Gene, AvDiff, MAS 5.0, Li&Wong, AvLog(PM-BG). Rows: BioB, BioB, BioC, BioB-M30363, BioDn, DapX, CreX, CreX, BioC, DapX, DapX-M]

[Figure, panels A and B. Labels: kanR, UPTAG, DOWNTAG; transformation into deletion pool; select for Ura+ transformants; genomic DNA preparation; PCR; Cy5 labeled PCR products; Cy3 labeled PCR products; oligonucleotide array hybridization; circular pRS416; EcoRI linearized pRS416 (MCS, CEN/ARS, URA3; ttaa, aatt); NHEJ defective]

Average Red and Green Scatter Plot

Average Red and Green MVA plot
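An MVA plot shows M, the log ratio of the two channels, against A, their average log intensity. The sketch below computes both quantities for paired red and green intensities; the function name `ma_values` and the example intensities are hypothetical, and all inputs are assumed to be positive.

```python
import math

def ma_values(red, green):
    """Return (M, A) lists for paired positive red and green intensities.

    M = log2(R) - log2(G)        (log ratio)
    A = (log2(R) + log2(G)) / 2  (average log intensity)
    """
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a = [(math.log2(r) + math.log2(g)) / 2 for r, g in zip(red, green)]
    return m, a

# Made-up average intensities for two spots
red = [1024.0, 256.0]
green = [256.0, 1024.0]
m, a = ma_values(red, green)
```

Compared with a scatter plot of red against green, this view spreads the points around the horizontal line M = 0, which makes intensity-dependent differences between the channels easier to see.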

Histograms

QQ-Plot

Z-Scores

Average Red and Green MVA Plot

Average Red and Green Scatter Plot

Summary. Simple data exploration is a useful tool for quality assessment. Statistical thinking is helpful for interpretation. Statistical models may help find signals in noise.

Acknowledgements
UC Berkeley Stat: Ben Bolstad, Sandrine Dudoit, Terry Speed, Jean Yang
MBG (SOM): Jef Boeke, Siew-Loon Ooi, Marina Lee, Forrest Spencer
Biostatistics: Karl Broman, Leslie Cope, Carlo Colantuoni, Giovanni Parmigiani, Scott Zeger
Gene Logic: Francois Colin, Uwe Scherf’s Group
PGA: Tom Cappola, Skip Garcia, Joshua Hare
WEHI: Bridget Hobbs, Natalie Thorne