Sandrine Dudoit1 Microarray Experimental Design and Analysis Sandrine Dudoit jointly with Yee Hwa Yang Division of Biostatistics, UC Berkeley www.stat.berkeley.edu/~sandrine.

Slides:



Advertisements
Similar presentations
Analysis by design Statistics is involved in the analysis of data generated from an experiment. It is essential to spend time and effort in advance to.
Advertisements

Linear Models for Microarray Data
Optimal designs for one and two-colour microarrays using mixed models
Experimental Design and Differential Expression Class web site: Statistics for Microarrays.
2 factor ANOVA. Drug ADrug BDrug C High doseSample 1Sample 2Sample 3 Low doseSample 4Sample 5Sample 6 2 X 3 Dose by drug (row by column)
1 Chapter 4 Experiments with Blocking Factors The Randomized Complete Block Design Nuisance factor: a design factor that probably has an effect.
M. Kathleen Kerr “Design Considerations for Efficient and Effective Microarray Studies” Biometrics 59, ; December 2003 Biostatistics Article Oncology.
1 Introduction to Experimental Design 1/26/2009 Copyright © 2009 Dan Nettleton.
Pre-processing in DNA microarray experiments Sandrine Dudoit PH 296, Section 33 13/09/2001.
1 MicroArray -- Data Analysis Cecilia Hansen & Dirk Repsilber Bioinformatics - 10p, October 2001.
Mathematical Statistics, Centre for Mathematical Sciences
Microarray technology and analysis of gene expression data Hillevi Lindroos.
Introduction to the design of cDNA microarray experiments Statistics 246, Spring 2002 Week 9, Lecture 1 Yee Hwa Yang.
Experimental design for microarrays Presented by Alex Sánchez and Carmen Ruíz de Villa Departament d’Estadística. Universitat de Barcelona.
DNA Microarray Bioinformatics - #27611 Program Normalization exercise (from last week) Dimension reduction theory (PCA/Clustering) Dimension reduction.
Normalization for cDNA Microarray Data Yee Hwa Yang, Sandrine Dudoit, Percy Luu and Terry Speed. SPIE BIOS 2001, San Jose, CA January 22, 2001.
Differentially expressed genes
L Berkley Davis Copyright 2009 MER301: Engineering Reliability Lecture 14 1 MER301: Engineering Reliability LECTURE 14: Chapter 7: Design of Engineering.
Experimental design and statistical analyses of data
‘Gene Shaving’ as a method for identifying distinct sets of genes with similar expression patterns Tim Randolph & Garth Tan Presentation for Stat 593E.
Some thoughts of the design of cDNA microarray experiments Terry Speed & Yee HwaYang, Department of Statistics UC Berkeley MGED IV Boston, February 14,
Gene Expression Data Analyses (1) Trupti Joshi Computer Science Department 317 Engineering Building North (O)
Genomics I: The Transcriptome RNA Expression Analysis Determining genomewide RNA expression levels.
Introduction to Bioinformatics Algorithms Clustering and Microarray Analysis.
Analysis of microarray data
B IOINFORMATICS Dr. Aladdin HamwiehKhalid Al-shamaa Abdulqader Jighly Lecture 8 Analyzing Microarray Data Aleppo University Faculty of technical.
Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012.
T WO WAY ANOVA WITH REPLICATION  Also called a Factorial Experiment.  Replication means an independent repeat of each factor combination.  The purpose.
(4) Within-Array Normalization PNAS, vol. 101, no. 5, Feb Jianqing Fan, Paul Tam, George Vande Woude, and Yi Ren.
Image Quantitation in Microarray Analysis More tomorrow...
The following slides have been adapted from to be presented at the Follow-up course on Microarray Data Analysis.
CDNA Microarrays Neil Lawrence. Schedule Today: Introduction and Background 18 th AprilIntroduction and Background 25 th AprilcDNA Mircoarrays 2 nd MayNo.
CDNA Microarrays MB206.
Design of Engineering Experiments Part 4 – Introduction to Factorials
Applying statistical tests to microarray data. Introduction to filtering Recall- Filtering is the process of deciding which genes in a microarray experiment.
Chapter 11 Multifactor Analysis of Variance.
Department of Statistics, University of California, Berkeley, and Division of Genetics and Bioinformatics, Walter and Eliza Hall Institute of Medical Research.
We calculated a t-test for 30,000 genes at once How do we handle results, present data and results Normalization of the data as a mean of removing.
Tips and tricks for performing standard meta-regression analysis with SPSS Giuseppe Biondi Zoccai Division of Cardiology, Department of Internal Medicine,
Design of Experiments Problem formulation Setting up the experiment Analysis of data Panu Somervuo, March 20, 2007.
A A R H U S U N I V E R S I T E T Faculty of Agricultural Sciences Introduction to analysis of microarray data David Edwards.
Microarrays and Gene Expression Analysis. 2 Gene Expression Data Microarray experiments Applications Data analysis Gene Expression Databases.
Summarization of Oligonucleotide Expression Arrays BIOS Winter 2010.
Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments Presented by Nan Lin 13 October 2002.
1 Global expression analysis Monday 10/1: Intro* 1 page Project Overview Due Intro to R lab Wednesday 10/3: Stats & FDR - * read the paper! Monday 10/8:
Extracting quantitative information from proteomic 2-D gels Lecture in the bioinformatics course ”Gene expression and cell models” April 20, 2005 John.
Statistics for Differential Expression Naomi Altman Oct. 06.
Henrik Bengtsson Mathematical Statistics Centre for Mathematical Sciences Lund University, Sweden Plate Effects in cDNA Microarray Data.
A Quantitative Overview to Gene Expression Profiling in Animal Genetics Armidale Animal Breeding Summer Course, UNE, Feb Analysis of (cDNA) Microarray.
Design of Micro-arrays Lecture Topic 6. Experimental design Proper experimental design is needed to ensure that questions of interest can be answered.
Suppose we have T genes which we measured under two experimental conditions (Ctl and Nic) in n replicated experiments t i * and p i are the t-statistic.
Microarray hybridization Usually comparative – Ratio between two samples Examples – Tumor vs. normal tissue – Drug treatment vs. no treatment – Embryo.
Introduction to Microarrays Kellie J. Archer, Ph.D. Assistant Professor Department of Biostatistics
Gene expression & Clustering. Determining gene function Sequence comparison tells us if a gene is similar to another gene, e.g., in a new species –Dynamic.
CSIRO Insert presentation title, do not remove CSIRO from start of footer Experimental Design Why design? removal of technical variance Optimizing your.
Lecture 6 Design Matrices and ANOVA and how this is done in LIMMA.
Comp. Genomics Recitation 10 4/7/09 Differential expression detection.
Hybridization Design for 2-Channel Microarray Experiments Naomi S. Altman, Pennsylvania State University), NSF_RCN.
Empirical Bayes Analysis of Variance Component Models for Microarray Data S. Feng, 1 R.Wolfinger, 2 T.Chu, 2 G.Gibson, 3 L.McGraw 4 1. Department of Statistics,
Henrik Bengtsson Mathematical Statistics Centre for Mathematical Sciences Lund University Plate Effects in cDNA Microarray Data.
1 Statistical Analysis Professor Lynne Stokes Department of Statistical Science Lecture 9 Review.
Distinguishing active from non active genes: Main principle: DNA hybridization -DNA hybridizes due to base pairing using H-bonds -A/T and C/G and A/U possible.
Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009.
NCode TM miRNA Analysis Platform Identifies Differentially Expressed Novel miRNAs in Adenocarcinoma Using Clinical Human Samples Provided By BioServe.
Statistics for Microarray Data Analysis – Lecture 3
Dimension reduction : PCA and Clustering
Introduction to Experimental Design
Normalization for cDNA Microarray Data
Design Issues Lecture Topic 6.
Presentation transcript:

Sandrine Dudoit1 Microarray Experimental Design and Analysis Sandrine Dudoit jointly with Yee Hwa Yang Division of Biostatistics, UC Berkeley CBMB and QB3 Short Course: Analysis of Gene Expression Microarray Data Genentech Hall Auditorium, Mission Bay, UCSF November 15, 2003

Sandrine Dudoit2 Combining data across arrays Genes Arrays M =log 2 ( Red intensity / Green intensity) expression measure, e.g, RMA … … … … … Data on G genes for n hybridizations Array1 Array2 Array3 Array4Array5 … Gene2 Gene1 Gene3 Gene5 Gene4 G x n genes-by-arrays data matrix …

Sandrine Dudoit3 Combining data across arrays D F B A C E … but columns have structure How can we design experiments and combine data across slides to provide accurate estimates of the effects of interest? Experimental design Regression analysis

Sandrine Dudoit4 Combining data across arrays cDNA array factorial experiment. Each column corresponds to a pair of mRNA samples with different drug x dose x time combinations. Clinical trial. Each column corresponds to a patient, with associated clinical outcome, such as survival and response to treatment. Linear models and extensions thereof can be used to effectively combine data across arrays for complex experimental designs.

Sandrine Dudoit5 Experimental design O A B AB

Sandrine Dudoit6 Experimental design Proper experimental design is needed to ensure that questions of interest can be answered and that this can be done accurately, given experimental constraints, such as cost of reagents and availability of mRNA.

Sandrine Dudoit7 Experimental design Design of the array itself –which cDNA probe sequences to print; –whether to use replicated probes; –which control sequences; –how many and where these should be printed. Allocation of target samples to the slides –pairing of mRNA samples for hybridization; –dye assignments; –type and number of replicates.

Sandrine Dudoit8 Graphical representation Multi-digraph Vertices: mRNA samples; Edges: hybridization; Direction: dye assignment. Cy3 sample Cy5 sample D F B A C E A design for 6 types of mRNA samples

Sandrine Dudoit9 Graphical representation The structure of the graph determines which effects can be estimated and the precision of the estimates. –Two mRNA samples can be compared only if there is a path joining the corresponding two vertices. –The precision of the estimated contrast then depends on the number of paths joining the two vertices and is inversely related to the length of the paths. Direct comparisons within slides yield more precise estimates than indirect ones between slides.

Sandrine Dudoit10 Comparing K treatments (i) Common reference design(ii) All-pairs design Question: Which design gives the most precise estimates of the contrasts A1-A2, A1-A3, and A2-A3? O A1A2A3 A1A2 A3

Sandrine Dudoit11 Comparing K treatments Answer: The all-pairs design is better, because comparisons are done within slides. For the same precision, the common reference design requires three times as many hybridizations or slides as the all-pairs design. In general, for K treatments Relative efficiency = 2K/(K-1) = 4, 3, 8/3, …  2. For the same precision, the common reference design requires 2K/(K-1) times as many hybridizations as the all-pairs design.

Sandrine Dudoit12 2 x 2 factorial experiment two factors, two levels each (1)(2)(3)(4)(5) Main effect A1214/31 Main effect B12111 Interaction AB334/38/32 Contrast A-B224/311 (1) Common ref. Scaled variances of estimated effects (2) Common ref.(4) Connected(5) All-pairs(3) Connected

Sandrine Dudoit13 Pooled reference T2T4T5T6T7T3T1 Ref Compare to T1 t vs. t+3 t vs. t+2 t vs. t+1 Time course Possible designs 1)All samples vs. common pooled reference 2)All samples vs. time 1 3)Direct hybridizations between timepoints

Sandrine Dudoit14 Design choices in time course experimentst vs. t+1t vs. t+2 T1T2T2T3T3T4T1T3T2T4T1T4Ave N=3A) T1 as common reference B) Direct hybridization N=4C) Common reference D) T1 as common ref + more E) Direct hybridization choice F) Direct hybridization choice T2 T3 T4 T1 T2 T3 T4 T1 Ref T2 T3 T4 T1 T2T3T4T1 T2T3T4T1 T2 T3 T4 T1

Sandrine Dudoit15 Experimental design In addition to experimental constraints, design decisions should be guided by the knowledge of which effects are of greater interest to the investigator. E.g. which main effects, which interactions. The experimenter should thus decide on the comparisons for which he wants the most precision and these should be made within slides to the extent possible.

Sandrine Dudoit16 Experimental design N.B. Efficiency can be measured in terms of different quantities –number of slides or hybridizations; –units of biological material, e.g. amount of mRNA for one channel.

Sandrine Dudoit17 Issues in experimental design Replication. Type of replication: –within or between slides replicates; –biological or technical replicates i.e., different vs. same extraction: generalizability vs. reproducibility. Sample size and power calculations. Dye assignments. Combining data across slides and sets of experiments: regression analysis … next.

Sandrine Dudoit18 2 x 2 factorial experiment O A B AB Study the joint effect of two treatments (e.g. drugs), A and B, say, on the gene expression response of tumor cells. There are four possible treatment combinations AB: both treatments are administered; A : only treatment A is administered; B : only treatment B is administered; O : cells are untreated. two factors, two levels each n=12

Sandrine Dudoit19 2 x 2 factorial experiment For each gene, consider a linear model for the joint effect of treatments A and B on the expression response.  : baseline effect;  : treatment A main effect;  : treatment B main effect;  : interaction between treatments A and B.

Sandrine Dudoit20 2 x 2 factorial experiment Log-ratio M for hybridization estimates O A B AB A Log-ratio M for hybridization estimates AB + 10 others.

Sandrine Dudoit21 Regression analysis For parameters , define a design matrix X so that E(M)=X  For each gene, compute least squares estimates of 

Sandrine Dudoit22 Regression analysis Combine data across slides for complex designs - can “link” different sets of hybridizations. Obtain unbiased and efficient estimates of the effects of interest (BLUE). Obtain measures of precision for estimated effects. Perform hypothesis testing. Extensions of linear models –generalized linear models; –robust weighted regression, etc.

Sandrine Dudoit23 Differential gene expression Identify genes whose expression levels are associated with a response or covariate of interest –clinical outcome such as survival, response to treatment, tumor class; –covariate such as treatment, dose, time. Estimation: estimate effects of interest and variability of these estimates. E.g. slope, interaction, or difference in means in a linear model. Testing: assess the statistical significance of the observed associations.

Sandrine Dudoit24 Use estimated effects in clustering genes x arrays matrix genes x estimated effects matrix Cluster analysis