Some thoughts on the design of cDNA microarray experiments. Terry Speed & Yee Hwa Yang, Department of Statistics, UC Berkeley. MGED IV, Boston, February 14, 2002

Presentation transcript:

Some thoughts on the design of cDNA microarray experiments
Terry Speed & Yee Hwa Yang, Department of Statistics, UC Berkeley
MGED IV, Boston, February 14, 2002

Some aspects of design
Layout of the array
– Which cDNA sequences to print? (library, controls)
– Spatial position
Allocation of samples to the slides
– Different design layouts: A vs B (treatment vs control), multiple treatments, factorial, time series
– Other considerations: replication; physical limitations (the number of slides and the amount of material); extensibility (linking)

Some issues to consider before designing cDNA microarray experiments
Scientific
– Aims of the experiment.
– Specific questions and priorities between them.
– How will the experiments answer the questions posed?
Practical (logistic)
– Types of mRNA samples: reference, control, treatment, mutant, etc.
– Amount of material. Count the amount of mRNA involved in one channel of hybridization as one unit.
– The number of slides available for the experiment.
Other information
– The experimental process prior to hybridization: sample isolation, mRNA extraction, amplification, labelling, ...
– Controls planned: positive, negative, ratio, etc.
– Verification method: Northern, RT-PCR, in situ hybridization, etc.

Natural design choice
Case 1: Meaningful biological control (C)
– Samples: liver tissue from four mice treated with cholesterol-modifying drugs.
– Question 1: Genes that respond differently between the treatment (T) and the control (C).
– Question 2: Genes that respond similarly across two or more treatments relative to control.
– [Diagram: C linked to T1, T2, T3 and T4]
Case 2: Use of a universal reference
– Samples: different tumor samples.
– Question: To discover tumor subtypes.
– [Diagram: Ref linked to T1, T2, ..., Tn-1, Tn]

Treatment vs Control
Two samples, e.g. KO vs. WT or mutant vs. WT

            Hybridizations          Estimate                     Variance
Direct      T vs C                  average(log(T/C))            σ²/2
Indirect    T vs Ref, C vs Ref      log(T/Ref) - log(C/Ref)      2σ²
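The two variances can be checked with a short derivation. This is a sketch under assumptions that the slide does not spell out: each design uses two slides, every single-slide log ratio has the same variance σ², and log ratios from different slides are uncorrelated. The symbols M_1 and M_2 (the two direct log ratios) are introduced here for illustration.

```latex
\begin{align*}
\text{Direct:}\quad
  & \operatorname{Var}\!\left[\tfrac{1}{2}\,(M_1 + M_2)\right]
    = \tfrac{1}{4}\,(\sigma^2 + \sigma^2) = \frac{\sigma^2}{2},
  \qquad M_i = \log(T/C) \text{ on slide } i, \\
\text{Indirect:}\quad
  & \operatorname{Var}\!\left[\log(T/\mathrm{Ref}) - \log(C/\mathrm{Ref})\right]
    = \sigma^2 + \sigma^2 = 2\sigma^2 .
\end{align*}
```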

Caveat
The advantage of direct over indirect comparisons was first pointed out by Churchill & Kerr, and in general we agree with the conclusion. However, you can see in the last M vs A plot that the difference is not a factor of 2, as theory predicts. Why? A likely explanation is that the assumption that log(T/Ref) and log(C/Ref) are uncorrelated is not valid, so the gains are less than predicted. The reason for the correlation is less obvious, but there are a number of possibilities. One is that we used mRNA from the same extraction; another is that we didn't dye-swap the two indirect comparisons, but did when we replicated the direct comparison. The answer is not yet clear.
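How much a correlation between the two reference hybridizations erodes the advantage can be sketched numerically. The sketch assumes each log ratio has variance sigma2, that the direct estimate averages two slides, and that log(T/Ref) and log(C/Ref) have correlation rho. The parameter rho and the function names are illustrative assumptions, not something taken from the slides.

```python
def direct_variance(sigma2=1.0, n_slides=2):
    """Variance of the average of n_slides independent direct log ratios."""
    return sigma2 / n_slides


def indirect_variance(sigma2=1.0, rho=0.0):
    """Variance of log(T/Ref) - log(C/Ref) when the two terms have correlation rho.

    Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y) = 2 * sigma2 * (1 - rho).
    """
    return 2.0 * sigma2 * (1.0 - rho)


def variance_ratio(rho):
    """Indirect variance divided by direct variance, for unit sigma2."""
    return indirect_variance(rho=rho) / direct_variance()


if __name__ == "__main__":
    for rho in (0.0, 0.25, 0.5, 0.75):
        print(f"rho = {rho:.2f}: indirect/direct variance ratio = {variance_ratio(rho):.2f}")
```

With rho = 0 the ratio is the one predicted under independence; any positive rho shrinks it, which is the direction of the discrepancy described in the caveat.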

Labeling
3 sets of self-self hybridizations (cerebellum vs cerebellum).
Data 1 and Data 2 were labeled together and hybridized on two slides separately. Data 3 were labeled separately.
[Plots: Data 1, Data 2, Data 3]

Extraction
Olfactory bulb experiment: 3 sets of Anterior vs Dorsal hybridizations performed on different days.
– #10 and #12 were from the same RNA isolation and amplification.
– #12 and #18 were from different dissections and amplifications.
– All 3 data sets were labeled separately before hybridization.

One-way layout: one factor, k levels

                     I) Common reference   II) Common reference   III) Direct comparison
Number of slides     N = 3                 N = 6                  N = 3
Units of material    A = B = C = 1         A = B = C = 2          A = B = C = 2
Ave. variance        2                     1                      0.67

[Design diagrams: I) and II) A, B and C each against ref; III) A, B and C compared directly]
With the same number of slides (N = 3), Design III has average variance 0.67 against 2 for Design I. With the same amount of material (2 units per sample), Design III has 0.67 against 1 for Design II.
For k = 3, efficiency ratio (Design I / Design III) = 3. In general, efficiency ratio = 2k / (k-1). However, remember the assumption that all log ratios are uncorrelated!
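The average variances 2, 1 and 0.67 can be reproduced with ordinary least squares on the design. The sketch below is an illustration under stated assumptions, not the authors' software: every slide gives one log ratio with variance sigma^2, all log ratios are uncorrelated, Design II is taken to be Design I with every slide replicated, and Design III is taken to be the loop A-B, B-C, C-A. The function names are hypothetical. Exact fractions are used so that 0.67 appears as 2/3.

```python
from fractions import Fraction
from itertools import combinations


def invert(matrix):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap a row with a non-zero pivot into place (fails if the design is disconnected).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pairwise_variances(samples, slides):
    """Variance, in units of sigma^2, of each estimated pairwise log ratio.

    samples: sample names; the first is the baseline of the linear model.
    slides:  (red, green) pairs; each slide measures log(red / green).
    """
    params = samples[1:]  # effects relative to the baseline sample
    p = len(params)

    def coefficients(name):
        c = [Fraction(0)] * p
        if name in params:
            c[params.index(name)] = Fraction(1)
        return c

    # Design matrix: one row per slide, coefficient(red) - coefficient(green).
    X = [[r - g for r, g in zip(coefficients(red), coefficients(green))]
         for red, green in slides]
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    cov = invert(XtX)  # covariance of the least squares estimates / sigma^2
    result = {}
    for a, b in combinations(samples, 2):
        c = [x - y for x, y in zip(coefficients(a), coefficients(b))]
        result[(a, b)] = sum(c[i] * cov[i][j] * c[j]
                             for i in range(p) for j in range(p))
    return result


def average_variance(samples, slides, treatments):
    """Average variance over all pairwise comparisons among the treatments."""
    variances = pairwise_variances(samples, slides)
    pairs = [pair for pair in variances
             if pair[0] in treatments and pair[1] in treatments]
    return sum(variances[pair] for pair in pairs) / len(pairs)


treatments = ["A", "B", "C"]
with_ref = ["Ref"] + treatments
reference_slides = [("A", "Ref"), ("B", "Ref"), ("C", "Ref")]
loop_slides = [("B", "A"), ("C", "B"), ("A", "C")]

design_1 = average_variance(with_ref, reference_slides, treatments)
design_2 = average_variance(with_ref, reference_slides * 2, treatments)
design_3 = average_variance(treatments, loop_slides, treatments)
print(design_1, design_2, design_3, design_1 / design_3)  # prints: 2 1 2/3 3
```

The same function accepts any connected set of slides, so other candidate layouts can be compared before committing material to them.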

Illustration from one experiment
[Design diagrams: Design I, A, B and C each against Ref; Design III, A, B and C compared directly]
Box plots of log ratios: we are still ahead!

Factorial experiments
Treated cell lines: CTL, OSM, EGF, OSM & EGF
[Diagram: possible experiments]
Here we are interested not in genes for which there is an O or an E effect, but in those for which there is an O × E interaction, i.e. genes for which log(O&E/O) - log(E/C) is large or small.

Other examples of factorial experiments
Suppose we have tumor cells T and standard cells S from the same tissue, and are interested in the impact of radiation R on gene expression. In general, genes for which log(RT/T) and log(RS/S) are large or small will be less interesting to us than those for which log(RT/T) - log(RS/S) is large or small, i.e. those with large interactions.
Next, suppose that our interest is in comparing gene expression in two mutants, say M and M', at two developmental stages, say E and P. Then we are probably more interested in those genes for which the temporal patterns in the two mutants differ than in the patterns themselves, i.e. interest focusses on genes for which log(ME/MP) - log(M'E/M'P) is large or small, again the ones with large interactions.
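Each interaction in these examples is a difference of two log ratios. A minimal sketch follows, assuming base-2 logarithms (the slides do not state the base) and made-up expression values; the function name and the gene labels are hypothetical.

```python
import math


def interaction(num1, den1, num2, den2):
    """Difference of log ratios: log2(num1/den1) - log2(num2/den2).

    For the radiation example this is log(RT/T) - log(RS/S) with
    num1 = RT, den1 = T, num2 = RS, den2 = S.
    """
    return math.log2(num1 / den1) - math.log2(num2 / den2)


# Made-up expression values for two genes, for illustration only.
genes = {
    # Radiation quadruples expression in tumor cells but only doubles it in standard cells.
    "gene_1": {"RT": 800.0, "T": 200.0, "RS": 400.0, "S": 200.0},
    # Radiation quadruples expression in both cell types: large effects, no interaction.
    "gene_2": {"RT": 400.0, "T": 100.0, "RS": 800.0, "S": 200.0},
}

scores = {name: interaction(v["RT"], v["T"], v["RS"], v["S"])
          for name, v in genes.items()}
print(scores)
```

In this toy example gene_2 responds strongly to radiation in both cell types yet has zero interaction, which is the kind of gene the slide describes as less interesting.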

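2 x 2 factorial: some design options
Column headings: Indirect; A balance of direct and indirect. Designs: I), II), III), IV).
# Slides: N = 6
Table rows: main effect A, main effect B, interaction A.B.
Table entry: variance (assuming all log ratios uncorrelated).
[Design diagrams over C, A, B and A.B. The table values are not preserved in this transcript, apart from an "NA" entry in the main effect A row.]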

Design choices in time series
Table entry: variance.
Table columns: t vs t+1 (T1T2, T2T3, T3T4); t vs t+2 (T1T3, T2T4); t vs t+3 (T1T4); Ave.
N = 3: A) T1 as common reference; B) Direct hybridization
N = 4: C) Common reference; D) T1 as common ref + more; E) Direct hybridization choice; F) Direct hybridization choice
[Design diagrams over T1, T2, T3, T4 and Ref. The table values are not preserved in this transcript.]
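For the two N = 3 designs the variances can be worked out by counting slides. The sketch below rests on assumptions: the layouts are inferred from the labels only (A uses slides T1-T2, T1-T3, T1-T4; B uses T1-T2, T2-T3, T3-T4), every log ratio has variance sigma^2, and log ratios are uncorrelated. With no cycles in the design, each comparison is estimated by adding log ratios along the single chain of slides that links the two time points, so its variance is the number of slides on that chain. The function names are hypothetical, and the computed values are not a transcription of the slide's table.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def path_lengths(slides, start):
    """Number of slides on the chain from `start` to every other sample (breadth-first search)."""
    neighbours = {}
    for a, b in slides:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


def tree_design_variances(times, slides):
    """Variance, in units of sigma^2, of each pairwise comparison.

    Valid only for designs without cycles (number of slides = number of
    samples - 1), where each comparison has exactly one linking chain.
    """
    return {(a, b): path_lengths(slides, a)[b] for a, b in combinations(times, 2)}


def average(variances):
    """Average variance over all pairwise comparisons, as an exact fraction."""
    return Fraction(sum(variances.values()), len(variances))


times = ["T1", "T2", "T3", "T4"]
design_a = tree_design_variances(times, [("T1", "T2"), ("T1", "T3"), ("T1", "T4")])
design_b = tree_design_variances(times, [("T1", "T2"), ("T2", "T3"), ("T3", "T4")])

for name, design in (("A", design_a), ("B", design_b)):
    print(name, design, "average:", average(design))
```

Under these assumptions design B is better for consecutive time points (t vs t+1) while design A is better for T1 vs T4, which illustrates why the choice depends on which comparisons matter most. The N = 4 designs contain cycles and need a least squares calculation instead.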

A recently designed factorial experiment
Mutant 1 (M1): M1.WT.P1, M1.MT.P1, M1.WT.P11, M1.MT.P11, M1.WT.P21, M1.MT.P21
Mutant 2 (M2): M2.WT.P1, M2.MT.P1, M2.WT.P11, M2.MT.P11, M2.WT.P21, M2.MT.P21
Question: Seek genes that are changing over time and are different in MT vs WT.
Analysis: Look at the interaction effect between time and type.

Summary
The balance of direct and indirect comparisons in a given context should be determined by optimizing the precision of the estimates among comparisons of interest, subject to the scientific and physical constraints of the experiment.

Acknowledgments
Jean Yee Hwa Yang, Sandrine Dudoit, Gary Glonek (Adelaide), Ingrid Lönnstedt (Uppsala), John Ngai's Lab (Berkeley), Jonathan Scolnick, Cynthia Duggan, Vivian Peng, Moriah Szpara, Percy Luu, Elva Diaz, Dave Lin (Cornell)

Some web sites
Technical reports, talks, software etc.
Statistical software R ("GNU's S")
Packages within R environment:
-- SMA (statistics for microarray analysis)
-- Spot