Designing Microarray Experiments Naomi Altman Oct. 06.

Slides:



Advertisements
Similar presentations
Statistical Issues in the Design of Microarray Experiments Lara Lusa U.O. Statistica Medica e Biometria Istituto Nazionale per lo Studio e la Cura dei.
Advertisements

Experiment Design for Affymetrix Microarray.
Optimal designs for one and two-colour microarrays using mixed models
M. Kathleen Kerr “Design Considerations for Efficient and Effective Microarray Studies” Biometrics 59, ; December 2003 Biostatistics Article Oncology.
Statistical tests for differential expression in cDNA microarray experiments (2): ANOVA Xiangqin Cui and Gary A. Churchill Genome Biology 2003, 4:210 Presented.
Variance reduction techniques. 2 Introduction Simulation models should be coded such that they are efficient. Efficiency in terms of programming ensures.
1 Introduction to Experimental Design 1/26/2009 Copyright © 2009 Dan Nettleton.
Minimum Information About a Microarray Experiment - MIAME MGED 5 workshop.
Microarray technology and analysis of gene expression data Hillevi Lindroos.
Introduction to the design of cDNA microarray experiments Statistics 246, Spring 2002 Week 9, Lecture 1 Yee Hwa Yang.
Experimental design for microarrays Presented by Alex Sánchez and Carmen Ruíz de Villa Departament d’Estadística. Universitat de Barcelona.
Sandrine Dudoit1 Microarray Experimental Design and Analysis Sandrine Dudoit jointly with Yee Hwa Yang Division of Biostatistics, UC Berkeley
Detecting Differentially Expressed Genes Pengyu Hong 09/13/2005.
Getting the numbers comparable
DNA Microarray Bioinformatics - #27612 Normalization and Statistical Analysis.
Analysis of Variance. Experimental Design u Investigator controls one or more independent variables –Called treatment variables or factors –Contain two.
‘Gene Shaving’ as a method for identifying distinct sets of genes with similar expression patterns Tim Randolph & Garth Tan Presentation for Stat 593E.
Microarray Technology Types Normalization Microarray Technology Microarray: –New Technology (first paper: 1995) Allows study of thousands of genes at.
RNA-Seq An alternative to microarray. Steps Grow cells or isolate tissue (brain, liver, muscle) Isolate total RNA Isolate mRNA from total RNA (poly.
Data analytical issues with high-density oligonucleotide arrays A model for gene expression analysis and data quality assessment.
Biol 500: basic statistics
Inferring the nature of the gene network connectivity Dynamic modeling of gene expression data Neal S. Holter, Amos Maritan, Marek Cieplak, Nina V. Fedoroff,
Microarrays: Theory and Application By Rich Jenkins MS Student of Zoo4670/5670 Year 2004.
Gene Expression Data Analyses (1) Trupti Joshi Computer Science Department 317 Engineering Building North (O)
Genomics I: The Transcriptome RNA Expression Analysis Determining genomewide RNA expression levels.
Microarrays: Basic Principle AGCCTAGCCT ACCGAACCGA GCGGAGCGGA CCGGACCGGA TCGGATCGGA Probe Targets Highly parallel molecular search and sort process based.
with an emphasis on DNA microarrays
Genome of the week - Deinococcus radiodurans Highly resistant to DNA damage –Most radiation resistant organism known Multiple genetic elements –2 chromosomes,
CDNA Microarrays Neil Lawrence. Schedule Today: Introduction and Background 18 th AprilIntroduction and Background 25 th AprilcDNA Mircoarrays 2 nd MayNo.
بسم الله الرحمن الرحيم * this presentation about :- “experimental design “ * Induced to :- Dr Aidah Abu Elsoud Alkaissi * Prepared by :- 1)-Hamsa karof.
QNT 531 Advanced Problems in Statistics and Research Methods
DNA Analysis Facility User Educational Series December 11, 2009.
The European Nutrigenomics Organisation Deciding and acting on quality of microarray experiments in genomics Chris Evelo BiGCaT Bioinformatics Maastricht.
Factors to Consider in Selecting a Genotyping Platform Elizabeth Pugh June 22, 2007.
CDNA Microarrays MB206.
Gene Expression Data Qifang Xu. Outline cDNA Microarray Technology cDNA Microarray Technology Data Representation Data Representation Statistical Analysis.
Some views on microarray experimental design Rainer Breitling Molecular Plant Science Group & Bioinformatics Research Centre University of Glasgow, Scotland,
Applying statistical tests to microarray data. Introduction to filtering Recall- Filtering is the process of deciding which genes in a microarray experiment.
Brian Kelly '06 Chapter 13: Experiments. Observational Study n Observational Study: A type of study in which individuals are observed or certain outcomes.
CSCE555 Bioinformatics Lecture 16 Identifying Differentially Expressed Genes from microarray data Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun.
Agenda Introduction to microarrays
The Scientific Method Formulation of an H ypothesis P lanning an experiment to objectively test the hypothesis Careful observation and collection of D.
We calculated a t-test for 30,000 genes at once How do we handle results, present data and results Normalization of the data as a mean of removing.
Design of Experiments Problem formulation Setting up the experiment Analysis of data Panu Somervuo, March 20, 2007.
Design of microarray gene expression profiling experiments Peter-Bram ’t Hoen.
DOX 6E Montgomery1 Unreplicated 2 k Factorial Designs These are 2 k factorial designs with one observation at each corner of the “cube” An unreplicated.
Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.
Lawrence Hunter, Ph.D. Director, Computational Bioscience Program University of Colorado School of Medicine
1 Doing Statistics for Business Doing Statistics for Business Data, Inference, and Decision Making Marilyn K. Pelosi Theresa M. Sandifer Chapter 14 Experimental.
Introduction to Statistical Analysis of Gene Expression Data Feng Hong Beespace meeting April 20, 2005.
1 Global expression analysis Monday 10/1: Intro* 1 page Project Overview Due Intro to R lab Wednesday 10/3: Stats & FDR - * read the paper! Monday 10/8:
Statistics for Differential Expression Naomi Altman Oct. 06.
Henrik Bengtsson Mathematical Statistics Centre for Mathematical Sciences Lund University, Sweden Plate Effects in cDNA Microarray Data.
Figure SOM1. Functional roles of the genes affected in zmet2-m1 mutants. Although the genes localized on the intracellular membranes were slightly over-represented.
Design of Micro-arrays Lecture Topic 6. Experimental design Proper experimental design is needed to ensure that questions of interest can be answered.
Suppose we have T genes which we measured under two experimental conditions (Ctl and Nic) in n replicated experiments t i * and p i are the t-statistic.
Microarray Technology. Introduction Introduction –Microarrays are extremely powerful ways to analyze gene expression. –Using a microarray, it is possible.
Microarray (Gene Expression) DNA microarrays is a technology that can be used to measure changes in expression levels or to detect SNiPs Microarrays differ.
Introduction to Microarrays Kellie J. Archer, Ph.D. Assistant Professor Department of Biostatistics
CSIRO Insert presentation title, do not remove CSIRO from start of footer Experimental Design Why design? removal of technical variance Optimizing your.
Overview of Microarray. 2/71 Gene Expression Gene expression Production of mRNA is very much a reflection of the activity level of gene In the past, looking.
ANALYSIS OF GENE EXPRESSION DATA. Gene expression data is a high-throughput data type (like DNA and protein sequences) that requires bioinformatic pattern.
Hybridization Design for 2-Channel Microarray Experiments Naomi S. Altman, Pennsylvania State University), NSF_RCN.
Distinguishing active from non active genes: Main principle: DNA hybridization -DNA hybridizes due to base pairing using H-bonds -A/T and C/G and A/U possible.
Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009.
Final Project Everybody still registered for the grade who did not have their own project will receive an with file names to be used for their project.
Microarray: An Introduction
Getting the numbers comparable
Introduction to Experimental Design
Design Issues Lecture Topic 6.
Presentation transcript:

Designing Microarray Experiments Naomi Altman Oct. 06

Outline Replicability 2 objectives of Microarray Experiments From organism to data Steps in Designing a Microarray Study –1. Determine the objective of your study. –2. Determine the experimental conditions (treatments). –3. Determine the measurement platform –4. Determine the biological sampling design. –5. Determine the biological and technical replicates. –6. Determine the RNA pool. –7. Determine the hybridization design. –8. And for custom arrays …

Understanding Variation The basic tenet of scientific inference: Phenomena worthy of scientific investigation are replicable. (But just because something is replicable does not mean it is worth investigating.)

Study Objectives Inference about the technology Biological Inference

Study Objectives Inference about the technology Biological Inference Technical Replication Biological Replication

From Organism to Data Grow Organism Sample Tissue Spike and Label Sample Extract RNA Hybridize Scan

From Organism to Data Grow Organism Sample Tissue Spike and Label Sample Extract RNA Hybridize Scan } Biological Sample } Biological Subsample Technical Replication

Steps in Designing a Microarray Study 1. Determine the objective of your study. 2. Determine the experimental conditions (treatments). 3. Determine the measurement platform 4. Determine the biological sampling design. 5. Determine the biological and technical replicates. 6. Determine the RNA pool. 7. Determine the hybridization design. 8. And for custom arrays …

Where do we replicate? Grow Organism Sample Tissue Spike and Label Sample Extract RNA Hybridize Scan } Biological Inference } Individual Platform

Doing it right: The cost of doing a properly replicated experiment is high, but not as high as the cost of doing a bad experiment.

2. Determine the experimental conditions (treatments) under study. Genotypes – if comparing mutants, useful to have the background wild types Tissues - if possible, take all from the same individuals “stimuli” such as hormone treatments or exposure What is the appropriate “control” condition? Time course – multiple observations on the same individuals? - synchronization - when does the “action” occur?

2. Determine the experimental conditions (treatments) under study. 1 factor (e.g. genotype or tissue or stimuli or time) Typically the experiment is a “one-way layout” and the treatments or conditions are just the unique levels of the factor. 2 or more factors Typically the experiment is arranged as a factorial design using all combinations of factor levels. The design is said to be balanced if there are equal numbers of replicates at all levels for all conditions. Balance makes analysis simpler, but is not necessary.

2. Determine the experimental conditions (treatments) under study. Doing a pilot experiment is extremely useful for picking the appropriate experimental conditions. The cost of doing a pilot experiment is (almost) always repaid in a better experiment design. And pilot experiments are great for training.

3. Determine the measurement platform -Affymetrix, radionucleotide, cDNA, oligonucleotide -custom or off-the-shelf The cost of arrays is high, but not as high as the cost of : Repeating the entire experiment. Validating all findings on another platform. Losing a publication, grant opportunity or precedence. The costs include: Personnel time Reagents Analysis time

3. Determine the measurement platform -Affymetrix, radionucleotide, cDNA, oligonucleotide -custom or off-the-shelf The cost of arrays is high, but not as high as the cost of : Repeating the entire experiment. Validating all findings on another platform. Losing a publication, grant opportunity or precedence. The costs include: Personnel time Reagents Analysis time “But I can get arrays free from my buddy …” It’s not free.

3. Determine the measurement platform My current advice: Use off-the-shelf whole genome arrays when available. - less variability - better informatics - easier to compare across studies Seek the advice of your local microarray center when available. Be consistent with your collaborators (and competitors) when possible. But, be open to newer and better technologies as they arise.

4. Determine the biological sampling design. A block is a set of samples that have a natural clustering that reduces variability e.g. tissues from the same plant plants grown together in May (versus Dec) plants grown in Madison versus Gainesville The most power comes from: A strong protocol for sample collection. As complete set of conditions in every block. Replicating blocks.

4. Determine the biological sampling design. e.g. A time-course study of leaf development in 2 genotypes. Protocol: All leaves collected at 8 a.m. at specified stages. Blocking: 1 tray of each genotype grown together starting every 6 weeks. (3 replicates) One plant collected from each genotype at each stage and leaves harvested.

5. Determine the number of biological and technical replicates. The correct analyses of biological experiments differentiate between technical and biological replicates. Technical replication reduces measurement error. Biological replication is used for statistical inference. The more biological replicates you have, the higher your power. You need at least 3 biological replicates for each experimental condition for biological inference.

From Organism to Data Grow Organism Sample Tissue Spike and Label Sample Extract RNA Hybridize Scan } Biological Sample } Biological Subsample Technical Replication

5. Determine the number of biological and technical replicates. Technical replication is not necessary, but it can be cost effective if it is much cheaper than biological replication. (But you still need 3 biological replicates.) (Dye-swaps can be done using biological replicates.) Spot replication is very cheap and reduces missing data. But it is not cheap if you need a set of arrays to cover the genome.

5. Determine the number of biological and technical replicates. How many replicates? Sample sizes can be determined from power analysis. This requires a measure of biological variation and measurement error. These can be based on previous experiments or pilot experiments, and the correct ratio of technical replicates to biological replicates can be determined by ANOVA. The minimum acceptable sample size: 3 biological replicates

Doing it right: Design defensively: Spots fail. (Duplicate spots if possible.) Hybridizations fail. (Store extra samples.)

5. Determine the number of biological and technical replicates. With good statistical assistance, you can develop good designs that require fewer replicates but... If you go this route, it would be wise to make your statistician a collaborator – not “just” a consultant!

6. Determine the RNA pool. a.Standardization b. Pooling c. Amplification d. Spiking

6. Determine the RNA pool. a.Standardization – Since we want to compare samples it would be nice if they were quantitatively similar #cells or total RNA or total mRNA b. Pooling c. Amplification d. Spiking

6. Determine the RNA pool. a.Standardization b. Pooling – Pooling biological samples is like averaging This decreases biological variation, without increasing the number of arrays, and thus increases statistical power. But we still need 3 independent pools. c. Amplification d. Spiking

Pooling e.g. We have 3 biological samples and 3 arrays: 1 sample per array reduces both technical and biological variance by 1/3 provides measure of biological variability for biological inference. Pool all samples, hybridize to 3 arrays reduces both technical and biological variance by 1/3 provides a measure of technical variability NOT suitable for biological inference

Pooling e.g. We have 6 biological samples and 3 arrays: Using 3 pools of 2 samples each we reduce the biological variance by 1/6 and the technical variance by 1/3 we have a measure of biological variation suitable for biological inference

Pooling But pooling is not always wonderful: if one of the samples entering the pool is “bad” the entire pool is contaminated.

6. Determine the RNA pool. a.Standardization b. Pooling c. Amplification – Amplification can be used if the quantity of RNA is too small for hybridization. Amplification is supposed to preserve expression ratios for individual genes. Rule of thumb: If you need to amplify one sample, amplify all. d. Spiking

6. Determine the RNA pool. a.Standardization b. Pooling c. Amplification d. Spiking – Foreign RNA may be spiked into the sample for quality control or normalization. If you expect a high degree of differential expression, it is a good idea to have spiking controls to check your normalization. I recommend spiking several controls, each at a different concentration.

7. Determine the hybridization design. 1-channel arrays 2-channel arrays

7. Determine the hybridization design. 1-channel arrays Usually simple balanced ANOVA layout. Analysis of differential expression proceeds by t-tests or ANOVA or variations such as SAM, LIMMA, Wilcoxon...

Hybridization design for 1-channel arrays Typical experiments will be balanced one- way layouts or factorial designs. Analysis is somewhat more complicated if there are both technical and biological replicates.

Hybridization Design for 2-channel arrays Array RedGreen Many choices for hybridization schemes. Control design: 1 treatment, 1 control Reference design: many treatments, 1 control Loop design: many treatments Replicated Loops: many treatments Others: yet another statistical collaborator needed

Control Design Purpose: Compare 2 conditions A and C A C Both conditions are on every array. Equal numbers have A red and A green (which can be biological replicates). Analysis is usually done on A-C using parametric or nonparametric “t-test”. If A and C can be measured on the same biological sample, each biological sample is one array.

Reference Design Purpose: Compare many conditions A, B, C,.... A B C R The reference sample is on every array. Comparisons are among conditions, not reference. Dye-swap is not needed (but is safer in case there is no expression of a gene in R). Analysis uses X-R, and proceeds like 1-channel analysis.

Purpose: Compare many conditions (or combinations, including time course) A, B, C,.... Loop Design AB CD The dye-swap is between arrays. 2 samples per condition per loop. Analysis is done for each channel.

Purpose: Compare many conditions (or combinations, including time course) A, B, C,.... Replicated Loop Design AB CD Each pair of replicates is in a different loop. These can be designed to minimize variance of certain comparisons. AC BD

The Best Design (2-channel) Control designs are always fine for 2 condition studies. Reference designs always have less power than loop designs for the same number of arrays (and platform) but: are simpler to run and to analyze

E.g. 8 tissues on 16 arrays. Reference design: 2 arrays per tissue – 2 biological replicates. Reference sample is either genomic DNA or tissue pool. Loop Design: 2 loops of 8 – 4 biological replicates.

E.g. Time course with 4 conditions. Other Designs AB CD AB CD AB CD Time 1Time 2Time 3 The loops do not need to be connected for analysis. This is much simpler than the loop of 12 combinations of condition and time. Principles of optimal design can be used to pick the loops.

Doing it right. Plan the statistical analysis at the same time you plan the hybridization design. The correlation structure induced by the design is critical to the analysis. Correlation is induced by: sample pairing on 2-channel arrays repeated measurements on the same organism blocking

8. And for custom arrays Platform cDNAs or oligos Choice of genetic material (gene/ oligo selection) Spatial layout on array Control spots Number of spot replicates

8. And for custom arrays Platform – The cost is coming down! cDNAs or oligos Choice of genetic material (gene/ oligo selection) Spatial layout on array Control spots Number of spot replicates

8. And for custom arrays Platform – The cost is coming down! cDNAs or oligos - ?? Choice of genetic material (gene/ oligo selection) Spatial layout on array Control spots Number of spot replicates

8. And for custom arrays Platform – The cost is coming down! cDNAs or oligos - ?? Choice of genetic material (gene/ oligo selection) You can do some very fancy experiments Spatial layout on array Control spots Number of spot replicates

8. And for custom arrays Platform – The cost is coming down! cDNAs or oligos - ?? Choice of genetic material (gene/ oligo selection) You can do some very fancy experiments Spatial layout on array – at random, separating spot, oligo replicates. DO NOT arrange by EST set. Control spots Number of spot replicates

8. And for custom arrays Platform – The cost is coming down! cDNAs or oligos - ?? Choice of genetic material (gene/ oligo selection) You can do some very fancy experiments Spatial layout on array – at random Control spots – if used for normalization, should be placed at random or systematically spatially dispersed Number of spot replicates

8. And for custom arrays Platform – The cost is coming down! cDNAs or oligos - ?? Choice of genetic material (gene/ oligo selection) You can do some very fancy experiments Spatial layout on array – at random Control spots – if used for normalization, should be placed at random Number of spot replicates – 2 to 4 if possible, widely spaced, different oligos

References Some examples of nicely done microarray studies Caldo, Nettleton and Wise (2004) Plant Cell Jin et al (2001) Nature Genetics General Principles of Experimental Design (Textbooks) Montgomery (2005) Design and Analysis of Experiments Kuehl (2000) Design of Experiments - Statistical Principles of Research Design and Analysis 2nd Edition Comparisons of Measurement Platforms Cao et al (2003) Alliance for Cellular Signaling Irizarry et al (2004) Berkeley Electronic Press Sample size Black and Doerge (2002) Bioinformatics Cui and Churchill (2003) Methods of microarray data analysis Gadbury et al (2004) Statistical Methods in Medical Research

References Sample pooling Kendziorski, Zhang and Attie (2003) Biostatistics Sample Amplification L. Petalidis et al (2003) Nucleic Acid Research Use of spiking controls Altman (2005) Applied Bioinformatics Yang et al (2001) Microarrays: Optical Technologies and Informatics, Experiment Design for 2-channel Arrays Kerr (2003) Biometrics Simon, Radmacher and Dobbin (2002) Genetic Epidemiology Vinciotti et al (2004) Bioinformatics Yang and Speed (2002) Nature Reviews: Genetics