Design of microarray gene expression profiling experiments Peter-Bram ’t Hoen.


2 Layout: practical considerations; pooling; randomization; one-color vs two-color; two-color hybridization designs; ratio-based vs intensity-based analysis.

3 Think before you start: research question; choice of technology; controls and replicates. Ref: Churchill Nature Genetics Supplement 32:

4 Research question. Limit your (initial) number of questions/conditions. Choose the best timepoint: the timing of mRNA regulation can differ from that of protein/activity, so run pilots using RT-qPCR. Experimental follow-up: what will you do with the data? Options: verification of differential gene expression; in vitro experiments to study mechanism; "in vivo" verification in tissue sections.

5 Choice of technology. What is affordable? Do a pilot to estimate the variance for your samples, experimental set-up and platform. Calculate your power: what is the smallest effect size that you can detect?
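A minimal sketch of the power calculation suggested on this slide, assuming a two-group comparison, a normal approximation, and a per-sample standard deviation taken from a pilot. The function name, the default significance level and the example numbers are illustrative assumptions, not values from the presentation.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_log2_fc(sd, n_per_group, alpha=0.001, power=0.8):
    """Smallest true log2 fold change detectable with the requested power.

    Normal approximation to a two-sided, two-sample comparison with
    n_per_group independent replicates per condition. sd is the per-sample
    standard deviation on the log2 scale (e.g. estimated from a pilot).
    alpha is small by default because many genes are tested at once
    (an assumption; choose it to match your multiple-testing procedure).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sd * sqrt(2 / n_per_group)


if __name__ == "__main__":
    # Hypothetical pilot estimate: sd = 0.4 on the log2 scale
    for n in (3, 5, 10):
        delta = min_detectable_log2_fc(sd=0.4, n_per_group=n)
        print(f"n = {n:2d} per group: detectable log2 fold change ~ {delta:.2f}")
```

With few replicates a t-distribution would give somewhat larger thresholds, so the numbers are best read as optimistic lower bounds.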

6 Controls. Positive, genes whose regulation is known: check on biological experiment and data analysis. Positive, spikes in mRNA and/or hyb mix: check labeling procedure and hybridization; detection range (sensitivity) and dynamic range; "landing lights" for gridding software. Negative controls for non-specific binding and cross-hybridization: buffer, non-homologous DNA.

7 Spikes [Figure: spike-in scheme. Two sets of control RNAs (RCA, Cab, rbcL, LTP4, LTP6 and XCP2, RPC1, NAC1, TIM, PRK) are spiked at 2-fold and 3-fold changes (copies/cell) into test RNA and reference RNA, followed by cDNA probe synthesis and hybridization to an array containing the DNA controls.]

8 Spikes Van de Peppel et al. EMBO Reports 4, 387 (2003)


10 Replicates. Include sufficient replicates, based on a pilot experiment. Biological replicates are preferred over technical replicates. Control experimental variables with possible unintended effects: genetic background, gender, age.

11 Randomization. Randomize samples with respect to experimental influences: experimenter, day of hybridization, batch of arrays, dye, etc.
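One way to put this into practice is to let a seeded random number generator decide the processing order. The sketch below is an illustration only: the function name, the day labels and the dye names are assumptions, and it randomizes samples without trying to balance treatment groups over days.

```python
import random


def randomize_run_order(samples, days, dyes=("Cy3", "Cy5"), seed=None):
    """Shuffle samples, then deal them out over days and dyes.

    Days are cycled first and dyes second, so every day/dye combination
    receives a similar number of samples while the sample order is random.
    """
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    plan = []
    for i, sample in enumerate(shuffled):
        plan.append({
            "sample": sample,
            "day": days[i % len(days)],
            "dye": dyes[(i // len(days)) % len(dyes)],
        })
    return plan


if __name__ == "__main__":
    # Hypothetical sample names
    samples = [f"mouse_{i}" for i in range(1, 9)]
    for row in randomize_run_order(samples, days=["Mon", "Tue"], seed=42):
        print(row)
```

Recording the seed alongside the lab notes lets the same plan be regenerated later.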

12 Pooling. Often done because of lack of sufficient amounts of RNA, but good amplification protocols are available. Advantages: dampening of individual variation; may increase statistical power. Generally not recommended: outliers in the population may result in large and significant effects; information on the differences within the population is lost, although it is probably biologically relevant; in fact, pooling is an artificial way to increase the significance of your findings.

13 Hybridization design. One color: not many difficulties expected. Two color: what to hybridize with what, and in which color? Options: reference design, paired design, loop design, mixed design. Read: Yang & Speed (2002). Design issues for cDNA microarray experiments. Nature Reviews Genetics 3,

14 Hybridization design: general issues. Comparisons on the same array are more precise than comparisons on different arrays: identify the most important comparisons and hybridize those on the same slide. Dye swap: a dye effect is always there, so balance designs with respect to dye (exception: some common reference designs).

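15 Common reference vs direct hybridizations. [Diagrams: direct design, A and B hybridized together on two slides; common reference design, A and B each hybridized against reference R.] Direct: if $\mathrm{Var}[\log(A/B)] = s^2$ for one slide, then the variance of the average of the two measurements is $s^2/2$. Common reference: $\log(A/B) = \log(A/R) - \log(B/R)$, so $\mathrm{Var}[\log(A/B)] = \mathrm{Var}[\log(A/R)] + \mathrm{Var}[\log(B/R)] = s^2 + s^2 = 2s^2$.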

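16 More samples: loop vs reference design, 6 arrays each, for samples A, B and C (reference R). Loop: the combined estimate is $\widehat{\log(A/B)} = \tfrac{2}{3}\log(A/B) + \tfrac{1}{3}\{\log(A/C) - \log(B/C)\}$, where the first term is the direct measurement. Assuming that all variances are equal, $\mathrm{Var}[\log(A/B)] = \tfrac{4}{9}(s^2/2) + \tfrac{1}{9}(s^2) = \tfrac{1}{3}s^2$. Reference: $\mathrm{Var}[\log(A/B)] = \mathrm{Var}[\log(A/C)] = \mathrm{Var}[\log(B/C)] = 0.5s^2 + 0.5s^2 = s^2$.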
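The loop weights 2/3 and 1/3 are what inverse-variance weighting of the direct and indirect estimates gives. The short check below works in units of s^2 and assumes the two estimates are independent and unbiased; the helper name `combine` is an illustrative choice.

```python
from fractions import Fraction


def combine(variances):
    """Inverse-variance weighting of independent, unbiased estimates.

    Returns the weights and the variance of the weighted average.
    """
    precisions = [1 / Fraction(v) for v in variances]
    total = sum(precisions)
    return [p / total for p in precisions], 1 / total


# All variances are expressed in units of s^2.
# Loop, 6 arrays: every pair of samples shares two slides.
direct = Fraction(1, 2)                     # average of two A-vs-B slides
indirect = Fraction(1, 2) + Fraction(1, 2)  # log(A/C) - log(B/C)
weights, var_loop = combine([direct, indirect])

# Reference, 6 arrays: every sample meets R on two slides.
var_reference = Fraction(1, 2) + Fraction(1, 2)

if __name__ == "__main__":
    print("loop weights:", weights)           # 2/3 and 1/3
    print("loop variance:", var_loop)         # 1/3
    print("reference variance:", var_reference)  # 1
```

With the same six arrays, the loop design estimates each pairwise comparison with one third of the variance of the reference design.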

17 Common reference vs direct hybridizations. Theoretical considerations: a design is optimal when it minimizes the variance of the effect of interest; look for designs leading to a small variance of log(A/B). Practical considerations: a common reference may be desired when the experiment is extended in the future or when many different conditions have to be compared. Choose a biologically relevant common reference (say, your control sample); in that case, the ratios themselves are of interest and easier to interpret.

18 Time-course designs. Take 4 time points: T1, T2, T3, T4. The best choice of design depends on the comparisons of interest and on the number of slides available.

19 Time-course designs. Using 3 slides to connect T1, T2, T3 and T4 [diagram]: this design is the best for estimating changes relative to the initial time point, T2/T1, T3/T1, T4/T1.

20 Time-course designs. Using 3 slides to connect T1, T2, T3 and T4 [diagram]: this design is the best for estimating relative changes between successive time points, T2/T1, T3/T2, T4/T3.
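With only 3 slides for 4 time points the design contains no loops, so each comparison is estimated along a single chain of slides and its variance is the number of slides on that chain times s^2. The sketch below assumes one slide per connection, independent slides with equal variance, and no dye effect. The two designs are assumed layouts that match the stated goals of the two slides above: every time point against T1, and successive time points against each other.

```python
from collections import deque


def path_variance(slides, a, b):
    """Variance of log(a/b), in units of s^2, for a design without loops.

    slides is a list of (sample, sample) pairs, one per array. The result
    is the number of slides on the path linking a and b (breadth-first search).
    """
    graph = {}
    for u, v in slides:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, []).append(u)
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist[b]


# Assumed layouts for the two 3-slide designs
star = [("T1", "T2"), ("T1", "T3"), ("T1", "T4")]
chain = [("T1", "T2"), ("T2", "T3"), ("T3", "T4")]

if __name__ == "__main__":
    for name, design in (("star", star), ("chain", chain)):
        print(name,
              "T4/T1:", path_variance(design, "T1", "T4"),
              "T3/T2:", path_variance(design, "T2", "T3"))
```

The star layout keeps every comparison with T1 at s^2 but pays 2s^2 for successive time points; the chain does the opposite and reaches 3s^2 for T4/T1.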

21 Time-course designs. Using 4 slides, each of T1, T2, T3 and T4 against R: this is the reference design. All comparisons have equal precision.

22 Time-course designs. Using 4 slides to connect T1, T2, T3 and T4 in a loop: this is the loop design, balanced with respect to dye. Distant comparisons have lower precision.

23 Time-course designs. Using 4 slides to connect T1, T2, T3 and T4 [diagram]: this design also uses exactly 2 hybridizations per treatment and is balanced with respect to dye. Most precise estimates: T1/T2, T1/T3, T2/T4, T3/T4.
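For a single loop of n slides, two samples that are k steps apart are linked by two independent paths of k and n - k slides, and combining them by inverse-variance weighting gives a variance of k(n - k)/n in units of s^2. The sketch below applies this to the two 4-slide loops. The second ordering (T1, T2, T4, T3) is inferred from the list of most precise estimates on the slide, not read from the diagram, and dye effects are ignored.

```python
from fractions import Fraction
from itertools import combinations


def loop_variance(order, a, b):
    """Variance of log(a/b), in units of s^2, in a single-loop design.

    order lists the samples around the loop; neighbours (and the last and
    first entries) share one slide. Two paths of k and n - k slides are
    combined by inverse-variance weighting: 1 / (1/k + 1/(n - k)).
    """
    n = len(order)
    k = abs(order.index(a) - order.index(b))
    return Fraction(k * (n - k), n)


successive_loop = ["T1", "T2", "T3", "T4"]
alternative_loop = ["T1", "T2", "T4", "T3"]  # inferred ordering, see lead-in

if __name__ == "__main__":
    for name, order in (("successive", successive_loop),
                        ("alternative", alternative_loop)):
        print(name)
        for a, b in combinations(sorted(order), 2):
            print(f"  {a} vs {b}: {loop_variance(order, a, b)} s^2")
```

Under the same assumptions, the 4-slide reference design gives 2 s^2 for every comparison, so both loops are more precise for all pairs while treating some pairs better than others.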

24 Factorial designs. Designs for studies that involve factors as explanatory variables: age group, gender, cell line, tumor type.

25 Factorial designs. Glonek & Solomon (2004). Admissible design: using the same number of arrays, there is no other design yielding smaller variances for all parameters. Glonek et al. Biostatistics 5, (2004)

26 Factorial design: example. Time: 0h, 24h. Cell lines: I (non-leukaemic), II (leukaemic). Find genes differentially expressed at 24h but not at 0h: interaction between time and cell line.

27 Factorial design; possible samples All combinations of factor levels. In this case, 4 are possible:

28 Factorial design: analysis model. A (log-)linear model is used; experimental conditions correspond to parameter combinations as in:

29 Factorial design: possible arrays. Samples: I,0; I,24; II,0; II,24. [Diagram: arrays (1) to (6) connecting pairs of these samples.]

30 Optimal admissible design. Designs that are not worse than others, and for which the variance of the parameter of interest is (one of) the smallest. In the example: we wish to find admissible designs for which the interaction term has one of the smallest variances.

31 Glonek et al. Biostatistics 5, (2004)

32 Optimal admissible design Glonek et al. Biostatistics 5, (2004)

33 Factorial designs: conclusions. The design with all pairwise comparisons is not the best in this case. The best design can only be found with respect to a model: if the model does not fit the data well, the design choice may not be the best, so make sure the chosen model is adequate.
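The first conclusion can be checked with a small least-squares calculation. The sketch below is an illustration under stated assumptions, not the calculation from Glonek et al.: it assumes the usual parameterisation of the 2x2 example (I,0 = mu; I,24 = mu + t; II,0 = mu + c; II,24 = mu + t + c + tc), independent arrays with equal variance s^2, and no dye effect. Each array measures the difference between its two samples, and the variance of the interaction tc is the corresponding diagonal entry of (X'X)^-1. The two 6-array designs being compared are hypothetical examples.

```python
from fractions import Fraction

# Coefficients of (t, c, tc) in each sample mean; mu cancels in every log-ratio.
SAMPLE = {
    "I,0": (0, 0, 0),
    "I,24": (1, 0, 0),
    "II,0": (0, 1, 0),
    "II,24": (1, 1, 1),
}


def invert(matrix):
    """Gauss-Jordan inverse with exact fractions (fails on a singular matrix)."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def interaction_variance(arrays):
    """Variance of the interaction estimate, in units of s^2."""
    rows = [tuple(x - y for x, y in zip(SAMPLE[a], SAMPLE[b])) for a, b in arrays]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    return invert(xtx)[2][2]


# Hypothetical design 1: all six pairwise comparisons, one array each
all_pairs = [("I,24", "I,0"), ("II,0", "I,0"), ("II,24", "I,0"),
             ("II,0", "I,24"), ("II,24", "I,24"), ("II,24", "II,0")]
# Hypothetical design 2: six arrays, none spent on the two "diagonal" pairs
no_diagonals = [("I,24", "I,0")] * 2 + [("II,24", "II,0")] * 2 + \
               [("II,0", "I,0"), ("II,24", "I,24")]

if __name__ == "__main__":
    print("all pairs:   ", interaction_variance(all_pairs), "s^2")
    print("no diagonals:", interaction_variance(no_diagonals), "s^2")
```

Under these assumptions the arrays on the diagonal pairs (I,0 vs II,24 and I,24 vs II,0) carry no information about the interaction, which is why the second design estimates it more precisely with the same number of arrays.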

34 How to compare many different conditions efficiently? Common reference: not efficient. Loop and mixed designs: not all comparisons have equal precision. GA Churchill, Nat Genet Dec;32 Suppl:490-5

35 Possible solution: randomized design with intensity-based rather than ratio-based calculations. Requires: hybridization of the two samples is independent (no competition for binding sites); absence of large spot and array effects. To be tested for each platform.

36 Our favourite platform: spotted collection of 65-mer oligonucleotides (Sigma-Compugen collection), 22K.

37 Design used to demonstrate independent hyb. ’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)

38 Distribution of signal intensities is similar. ’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)

39 Correlation of intensities is high. [Figure legend: classes of correlation coefficient R; legible thresholds 0.95 and 0.90.] ’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)

40 Effect of addition of unlabelled target: single target on microarray vs two targets on microarray. ’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)

41 Correlation of ratios calculated from different hyb designs. ’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)

42 Intensity-based analysis. Hybridizations of two targets on the array are independent: no saturation and no competition. Intensity readings show high inter-array correlation. Comparisons on the same array have the highest precision, and all other comparisons have equal precision. ’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)

43 Example of randomized design: mouse models for muscular dystrophy. Turk et al. FASEB J 20, (2006)

44 Our design. Randomly assign samples to the arrays, avoiding co-hybridization of samples from the same group. 2 biological replicates; 4 technical replicates (dye-swap + replicate spotting). Turk et al. FASEB J 20, (2006)
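A random assignment with this constraint can be drawn by simple rejection sampling: shuffle the samples, pair them up, and redraw if any array would carry two samples from the same group. The sketch below is an illustration of that idea and not the procedure used by Turk et al.; the function name and the group and sample labels are hypothetical.

```python
import random


def assign_to_arrays(groups, seed=None, max_tries=10_000):
    """Pair samples on two-color arrays, never two from the same group.

    groups maps each sample name to its group. Returns a list of
    (cy3_sample, cy5_sample) pairs; the dye order within a pair is random
    because it follows the shuffled order.
    """
    samples = list(groups)
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    rng = random.Random(seed)
    for _ in range(max_tries):
        rng.shuffle(samples)
        pairs = list(zip(samples[0::2], samples[1::2]))
        if all(groups[a] != groups[b] for a, b in pairs):
            return pairs
    raise RuntimeError("no valid assignment found; check the group sizes")


if __name__ == "__main__":
    # Hypothetical experiment: 4 groups with 2 biological replicates each
    groups = {f"{g}_{r}": g for g in ("wt", "mdx", "ko1", "ko2") for r in (1, 2)}
    for cy3, cy5 in assign_to_arrays(groups, seed=7):
        print(f"Cy3: {cy3:6s}  Cy5: {cy5}")
```

A dye-swap technical replicate of each array can then be added by reversing every pair.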

45 Intensity-based analysis can go wrong Vinciotti et al. Bioinformatics 21: (2005)

46 Intensity-based analysis can go wrong Vinciotti et al. Bioinformatics 21: (2005)

47 Some guidelines. First determine the main question, pointing out the effect of interest: log[A/B]. Then choose the analysis model, so that the effect variance can be computed: VAR{log[A/B]}. Practical constraints: amount of RNA available, number of hybridizations, number of slides. A good design measures the effect of interest as accurately as possible: small VAR{log[A/B]}.

48 Some useful links

49 Acknowledgements. Human and Clinical Genetics, LUMC: Judith Boer, Renée de Menezes, Rolf Turk, Ellen Sterrenburg, Johan den Dunnen, Gertjan van Ommen. Microarray facility: Leiden Genome Technology Center.

50 Case study. Two genetically modified zebrafish strains and one wild type. Defects mainly in muscle development, apparent at hours of development; early death. Question: which biological pathways are affected and responsible for defective myogenesis?

51 Possible platforms and budget. Affymetrix (1-color): 500 euro per chip; variance for the ratio of two samples on two chips: $s^2$. Home-spotted arrays (2-color): 100 euro per chip; variance for the ratio of two samples on one chip: $2s^2$. Budget: 12,000 euro.
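As a starting point for the case study, the arithmetic behind these numbers can be sketched as follows. The simplifying assumptions are mine: the whole budget is spent on replicating one two-sample comparison, replicates are independent, and s^2 covers all relevant variation. A real plan would split the budget over the three strains and account for biological variance.

```python
from fractions import Fraction

BUDGET = 12_000  # euro, from the slide


def replicated_ratio_variance(price_per_chip, chips_per_comparison,
                              variance_per_comparison):
    """Replicates affordable and variance of the mean log-ratio (units of s^2)."""
    n = BUDGET // (price_per_chip * chips_per_comparison)
    return n, Fraction(variance_per_comparison) / n


# One-color: a ratio needs two chips and has variance s^2
affymetrix = replicated_ratio_variance(500, 2, 1)
# Two-color: a ratio needs one chip and has variance 2 s^2
home_spotted = replicated_ratio_variance(100, 1, 2)

if __name__ == "__main__":
    print("Affymetrix:   n = %d, variance = %s s^2" % affymetrix)
    print("Home-spotted: n = %d, variance = %s s^2" % home_spotted)
```

Under these assumptions the cheaper platform yields the smaller variance for the mean ratio, because its tenfold gain in replicates outweighs the doubled per-comparison variance.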

52 Questions. Isolation of specific compartments or whole-animal lysates? Pooling? How many replicates? Which hybridization design? What is the variance of the most important comparisons?