Design of Micro-arrays: Lecture Topic 6


Experimental design
Proper experimental design is needed to ensure that questions of interest can be answered, and answered accurately, given experimental constraints such as the cost of reagents and the availability of mRNA.

Design considerations in micro-arrays
There are 2 main components where design comes into micro-arrays:
- Probe design
- Allocation of RNA to probes

Array/Probe design
- Which gene-representative sequence, from which gene collection, to print on the array? Where?
- Controls or not?
- Numbers: how many controls, how many genes?
- Duplicate or replicate spots within a slide; their position.

Commonly asked questions
- Should we put duplicates on a slide?
- What should be the percentage of control spots?
- Where should the control spots be placed?
[These relate to preprocessing, such as quality assessment and normalization.]

Probe Design
As statisticians we often have VERY little say in the probe design. The only input may be in the location of control spots. However, we may have some input in the allocation of RNA samples to the probes.

Idea behind Experimental Design
Experimental design was introduced by Sir Ronald Fisher in the 1920s to deal with systematic sources of variation in agricultural field trials. The same ideas hold TODAY for micro-arrays. Fisher's idea was divided into 3 main principles:
- Randomization
- Replication
- Local control or blocking
Let's discuss some terms USED in design.

Terms and Definitions
- Treatment/Condition: any attribute of primary interest.
- Unit: an independent replicate that is subject to the treatment.
- Block: any attribute that is believed to influence the response but is NOT of primary interest.
- Crossing: assigning all possible combinations of factors to units.
- Confounding: the situation in which the effect of one factor cannot be separated from the effect of another factor.

Designing using Principles
- Randomization: using a chance device to assign treatments to units; essential to reduce systematic bias.
- Replication: including more than one unit per condition; this lets us estimate random variation and improves the precision of the estimated effects.
- Local control/blocking: if we believe that a systematic source of variation may affect the response, we should identify this source (the block) and randomize treatments within each block.
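The blocking principle above can be sketched in a few lines of Python. This is only an illustration: the block names (hybridization days), the unit labels, and the function name randomize_within_blocks are hypothetical and do not come from the lecture.

```python
import random

def randomize_within_blocks(blocks, treatments, seed=None):
    """Assign each treatment once per block, in random order.

    blocks: dict mapping block name -> list of unit labels in that block
    treatments: list of treatment labels (one per unit in each block)
    """
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs {len(treatments)} units")
        shuffled = treatments[:]      # copy, so the caller's list is untouched
        rng.shuffle(shuffled)         # the "chance device"
        for unit, treatment in zip(units, shuffled):
            assignment[unit] = (block, treatment)
    return assignment

# Hypothetical example: two hybridization days (blocks), two units per day.
blocks = {"day1": ["slot1", "slot2"], "day2": ["slot3", "slot4"]}
plan = randomize_within_blocks(blocks, ["H", "C"], seed=1)
for unit, (block, treatment) in sorted(plan.items()):
    print(unit, block, treatment)
```

Every block receives both conditions, so a day effect cannot masquerade as a condition effect; only the order within each block is left to chance.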

Crossing and Confounding
- Crossing: assigning all possible combinations of factor levels to the units. A common case is the dye-swap, which exposes every experimental condition to both dyes.
- Confounding: happens when the effect of one factor cannot be told apart from that of another factor.
Example: you have two conditions, H and C, and 2 slides. You hybridize condition H with the Red dye on all slides and condition C with the Green dye on all slides. Here you cannot distinguish the effect of dye from the effect of condition. This is called confounding.

Example
Consider a two-channel micro-array experiment in which we are interested in comparing RNA samples from healthy mice (H) with those from cancerous mice (C). For the experiment we have 2 arrays (A1, A2).
- The treatment is the status of the mice (2 levels: H and C).
- The UNIT is the array. (There is some debate about this; some argue that each gene is the unit.)
- The block is the color (2 levels: red and green).
Now let's consider how the experiment was done.

DESIGN 1: ARRAY AND CONDITION CONFOUNDING
Design          Array A1   Array A2
Channel Red     H          C
Channel Green   H          C

DESIGN 2: NON DYE-SWAP (COLOR AND CONDITION CONFOUNDING)
Design          Array A1   Array A2
Channel Red     H          H
Channel Green   C          C

DESIGN 3: DYE-SWAP
Design          Array A1   Array A2
Channel Red     H          C
Channel Green   C          H

DESIGN 4: REFERENCE DESIGN
Design          Array A1   Array A2
Channel Red     H          C
Channel Green   REF (both arrays)

Example continued
Here we had two colors (blocks), hence each condition (healthy or cancer) should be randomized within each color. Finally, we want more than one slide for each treatment-block combination.
The idea is: Response = f(treatment effect, block effect, random error)
Randomization and replication allow us to believe that, once all systematic sources of variation are removed, we are left with random error only. Hence we assume no bias.
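A small numerical sketch may help show why Design 2 confounds dye with condition while Design 3 does not. It assumes the simplest additive form of the model above on the log scale (log-intensity = condition effect + dye effect, with no random error); the numbers and the names CONDITION, DYE, and log_ratio are made up for illustration.

```python
# Assumed true values on the log2 scale (hypothetical numbers).
CONDITION = {"H": 8.0, "C": 9.0}    # true log-expression per condition
DYE = {"red": 0.5, "green": 0.0}    # dye effect: red reads 0.5 higher

def log_ratio(red_sample, green_sample):
    """M = log(red channel) - log(green channel) for one array, noise-free."""
    red = CONDITION[red_sample] + DYE["red"]
    green = CONDITION[green_sample] + DYE["green"]
    return red - green

true_diff = CONDITION["H"] - CONDITION["C"]

# Design 2: H on red and C on green, on both arrays.
m_design2 = [log_ratio("H", "C"), log_ratio("H", "C")]
estimate_design2 = sum(m_design2) / 2    # dye effect stays in the estimate

# Design 3 (dye-swap): array A1 is H/C, array A2 is C/H.
m1 = log_ratio("H", "C")
m2 = log_ratio("C", "H")
estimate_design3 = (m1 - m2) / 2         # dye effect cancels

print(true_diff, estimate_design2, estimate_design3)
```

Under these assumptions the Design 2 estimate is off by exactly the dye effect, and no amount of repeating the same layout removes it, whereas the dye-swap estimate recovers the true H minus C difference.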

DESIGN WITH INDIVIDUAL BIOLOGICAL REPLICATION: 2 BIOLOGICAL REPLICATES, 4 SLIDES
Design          Array A1   Array A2   Array A3   Array A4
Channel Red     H(1)       C(1)       H(2)       C(2)
Channel Green   C(1)       H(1)       C(2)       H(2)

The main goal is: avoidance of bias
The conditions of an experiment (mRNA extraction and processing, the reagents, the operators, the scanners, and so on) can leave a “global signature” in the resulting expression data. Hence it is essential to follow the principles of proper experimentation to avoid bias.

Replication and related issues
What type of replicate is to be used?

Allocation of samples to the slides
A. Types of samples
- Replication: technical, biological. This always needs to be considered in microarrays, since we often do NOT have biological replication.
- Pooled vs individual samples.
- Pooled vs amplified samples.

Biological Replication
The number of organisms from which you have taken RNA is your number of biological replicates. If you used 3 mice and obtained RNA from each mouse, that is your biological replication. Biological replication allows us to make inferences about the general population of interest. According to McClure and Wit: “the only thing that is good enough to answer a biological questions are the so- called biological replicates”

Technical Replicate
Sometimes it is more convenient to take material from 3 organisms, pool it, extract the RNA, and then divide it into 3 RNA samples to be hybridized. This is NOT biological replication; rather, it is called technical replication after pooling. It is more convenient and has less variability (pooling always decreases variability), but it often leads to bias. Another way is to obtain RNA from one organism and divide the RNA into 3 batches for hybridization. This is an extreme technical replicate.

More on Technical and Biological Replication
Having a particular gene (or EST) repeated on a slide (as in Affy chips) is an example of technical replication. This is NOT biological replication, since the whole chip is exposed to the same experimental condition. However, technical replicates are useful, since they capture the variability due to measurement error and hybridization inequalities across a slide. The bottom line is: we are interested in the average expression level of a particular gene exposed to a particular condition for a specific biological organism.

How many Replicates?
This is where the theory of optimal design comes in. Deciding HOW many replicates to use depends on the questions you are interested in and the contrasts you want to estimate. A general rule of thumb is: “at least 3 arrays per condition”. One thing to keep in mind is that technical replicates are in general highly reproducible (r = 0.95), whereas biological replicates from the same condition often have r ≈ 0.30.
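One way to see why the type of replicate matters is the standard variance formula for a condition mean in a nested design: with n_bio biological replicates, each measured n_tech times, Var(mean) = sigma_bio^2 / n_bio + sigma_tech^2 / (n_bio * n_tech). The sketch below assumes this simple two-level model; the variance values are hypothetical, chosen only to reflect that biological variation is typically much larger than technical variation.

```python
def variance_of_condition_mean(sigma_bio2, sigma_tech2, n_bio, n_tech):
    """Variance of a condition mean under a nested two-level model.

    sigma_bio2:  variance between biological replicates (assumed known)
    sigma_tech2: variance between technical replicates of one sample
    n_bio:       number of biological replicates
    n_tech:      technical replicates per biological replicate
    """
    return sigma_bio2 / n_bio + sigma_tech2 / (n_bio * n_tech)

# Hypothetical variance components; six arrays are available either way.
SIGMA_BIO2, SIGMA_TECH2 = 1.0, 0.25

six_biological = variance_of_condition_mean(SIGMA_BIO2, SIGMA_TECH2, n_bio=6, n_tech=1)
two_by_three = variance_of_condition_mean(SIGMA_BIO2, SIGMA_TECH2, n_bio=2, n_tech=3)

print(f"6 biological x 1 technical: {six_biological:.3f}")
print(f"2 biological x 3 technical: {two_by_three:.3f}")
```

With the same number of arrays, spending them on biological replicates gives the smaller variance here, because technical replication only shrinks the technical term.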

Different design layout
- Scientific aim of the experiment.
- Robustness.
- Extensibility.
- Efficiency.

Taking physical limitations or cost into consideration:
- the number of slides.
- the amount of material.

Pooled vs. amplified samples
In cases where we do not have enough material from one biological sample to perform one array (chip) hybridization, pooling or amplification is necessary.
Amplification
- Introduces more noise.
- Non-linear amplification (??): different genes are amplified at different rates.
- Able to perform more hybridizations.
Pooling
- Fewer replicate hybridizations.

Pooled vs individual samples
Pooling is seen as “biological averaging”.
Trade-off between
- Cost of performing a hybridization.
- Cost of the mRNA samples.
Cost of mRNA samples << Cost per hybridization
Pooling can assist in reducing the number of hybridizations.
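The trade-off can be made concrete with a rough sketch. It assumes that pooling really does act as biological averaging on the measurement scale, so that n arrays each hybridized with a pool of k individuals give Var(mean) = sigma_bio^2 / (n * k) + sigma_tech^2 / n. That model, the function name pooled_design, and every number below are assumptions for illustration, not values from the lecture.

```python
def pooled_design(n_arrays, pool_size, sigma_bio2, sigma_tech2,
                  cost_sample, cost_hyb):
    """Return (variance of the condition mean, total cost) for one condition.

    Assumes each array is hybridized with a pool of `pool_size` individuals
    and that pooling averages biological variation within the pool.
    """
    variance = sigma_bio2 / (n_arrays * pool_size) + sigma_tech2 / n_arrays
    cost = n_arrays * (pool_size * cost_sample + cost_hyb)
    return variance, cost

# Hypothetical inputs: samples are cheap, hybridizations are expensive.
params = dict(sigma_bio2=1.0, sigma_tech2=0.25, cost_sample=10, cost_hyb=500)

# The same 12 individuals, used two ways.
var_individual, cost_individual = pooled_design(n_arrays=12, pool_size=1, **params)
var_pooled, cost_pooled = pooled_design(n_arrays=4, pool_size=3, **params)

print(f"individual: variance {var_individual:.3f}, cost {cost_individual}")
print(f"pooled:     variance {var_pooled:.3f}, cost {cost_pooled}")
```

Under these assumptions pooling cuts the cost sharply while giving up some precision, since fewer arrays means less averaging of technical error. It also removes the ability to estimate variation between individuals within a pool.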

To pool or not to pool?
Pooling is routinely done when a single organism does not yield enough RNA for hybridization, so several organisms are combined to get enough RNA. The alternative to pooling is PCR amplification, where you use PCR techniques to physically amplify the harvested RNA. The literature is not uniform in deciding which is better. Affy (GeneChip help notes) suggests that pooling causes too much averaging, and that we can sometimes average out less significant expression.

Design Issues: Single Channel
- Identifying conditions of interest
- Obtaining biological replicates
- Preparing hybridization samples
Each condition is considered to be a separate population from which biological replicates can be sampled.

Single Channel Issues
If each ARRAY is considered a “blocking factor”, then for one-channel oligo arrays it is NOT possible to apply more than one condition per array. Each array can only be exposed to ONE condition, hence the array effects are confounded with the condition effects. Pooling of biological replicates seems to be recommended to deal, at least in part, with some of the biases possible in single-channel arrays.