Hybridization Design for 2-Channel Microarray Experiments. Naomi S. Altman (Pennsylvania State University), NSF_RCN.


Hybridization Design for 2-Channel Microarray Experiments. Naomi S. Altman (Pennsylvania State University), NSF_RCN Meetings 04

Expt Design and Microarrays
- Microarrays are
  - Expensive
  - Noisy
- A perfect situation for optimal design

Outline
- Designing a Microarray Study
- Reference Design
- Loop Designs
- Replication
- Optimal Design/Analysis
- Incorporating Multiple Factors and Blocks

Designing a Microarray Experiment
- Define objectives
- Determine factors and treatments
- Determine appropriate analysis method
- Determine sample design (biological and technical replication)
- Determine platform
- Design spots for custom arrays
- Determine hybridization pairs
- Perform experiment

Designing a Microarray Experiment
- Define objectives
- Determine factors and treatments
- Determine appropriate analysis method
- Determine sample design (biological and technical replication)
- Determine platform
- Design spots for custom arrays
- Determine hybridization pairs ←
- Perform experiment

Arrow Notation
Introduced by Kerr and Churchill (2001).
Each array is represented by an arrow.
[Figure: an arrow whose two ends are labeled Red and Green.]

Reference Design
[Figure: treatments A, B, C and D, each joined by an arrow to a common Reference sample.]
4 arrays, 1 sample/treatment, 4 reference samples

Loop Design (Kerr and Churchill 2001)
[Figure: treatments A, B, C and D joined by arrows into a single loop.]
4 arrays, 2 samples/treatment
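The arrow notation maps naturally onto a list of pairs, one pair per array. The following is a minimal sketch of that idea, not code from the talk: the function names `reference_design`, `loop_design` and `samples_per_treatment` are hypothetical, and which end of each pair is the red or green channel is an assumption, since the slide figures do not survive in this transcript.

```python
from collections import Counter

def reference_design(treatments, reference="Ref"):
    """One array per treatment, each hybridized against the common reference."""
    return [(t, reference) for t in treatments]

def loop_design(treatments):
    """One array per treatment, pairing each treatment with the next one around the loop."""
    n = len(treatments)
    return [(treatments[i], treatments[(i + 1) % n]) for i in range(n)]

def samples_per_treatment(arrays):
    """Count how often each label appears over all arrays, both channels included."""
    return Counter(label for pair in arrays for label in pair)

treatments = ["A", "B", "C", "D"]
ref = reference_design(treatments)
loop = loop_design(treatments)
ref_counts = samples_per_treatment(ref)
loop_counts = samples_per_treatment(loop)

print(len(ref), dict(ref_counts))
print(len(loop), dict(loop_counts))
```

Both designs use 4 arrays. The counts show the difference stated on the slides: the reference design spends half of its channels on the reference sample, while the loop places two samples of every treatment on the same number of arrays.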

Replication
Often there is confusion among:
- Biological replicates
- Technical replicates
- repeated samples
- split sample and relabel
- spot replication
In this presentation:
- We consider only one spot/gene/array
- any technical replicates are averaged
- each sample is an independent biological replicate

Linear Mixed Model for Microarray Data
$Y_{ijk} = \mu + T_i + D_j + A_k + \varepsilon_{ijk}$
- $Y_{ijk}$ is the response of the gene in one channel
- $\mu$ is the mean response of the gene over all treatments, channels, arrays
- $T_i$ is the effect of treatment i
- $D_j$ is the effect of dye j
- $A_k$ is the effect of the array k (or spot on the array)
- $\varepsilon_{ijk}$ is the random deviation from the other effects and includes biological variation, technical variation and random error

Linear Mixed Model for Microarray Data The 2 channels on a single spot are correlated → array should be treated as a random effect
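A short worked equation shows why a random array effect produces this correlation. The notation is an assumption made here for illustration: $Y_{ijk}$ is the response for treatment i, dye j and array k, $A_k$ is the array effect with variance $\sigma_A^2$, and $\varepsilon_{ijk}$ is the residual with variance $\sigma^2$. It is further assumed that all random terms are independent and that both channels have the same residual variance.

```latex
Y_{ijk} = \mu + T_i + D_j + A_k + \varepsilon_{ijk}, \qquad
\operatorname{Var}(A_k) = \sigma_A^2, \quad \operatorname{Var}(\varepsilon_{ijk}) = \sigma^2

\operatorname{Var}(Y_{ijk}) = \sigma_A^2 + \sigma^2, \qquad
\operatorname{Cov}(Y_{ijk}, Y_{i'j'k}) = \sigma_A^2 \quad (j \neq j')

\rho = \operatorname{Corr}(Y_{ijk}, Y_{i'j'k}) = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2}
```

Under these assumptions the two channels of one array share the term $A_k$, so their correlation is positive whenever $\sigma_A^2 > 0$, while channels on different arrays are uncorrelated.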

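Differencing Channels on an Array
Often the difference between samples on a single array is the unit of analysis: $M$, the response in one channel minus the response in the other.
Normalization is almost always done on this quantity.
In a reference design, the difference between treatments A and B can be estimated from 2 arrays by $M_A - M_B$, where $M_A$ and $M_B$ are the treatment-minus-reference differences on the arrays carrying A and B.
But there can be a large loss of information.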

[Figure: Drosophila arrays, courtesy of Bryce MacIver, PSU. Variances shown: Var([symbol missing in transcript]) = 0.126, Var(M) = 0.453.]

Reference Design
- The reference sample is the same biological material on every array.
- T treatments, k replicates, kT arrays
- If there are technical dye-swaps, these are averaged to form 1 replicate.
- If all comparisons are between treatments, there is no need to dye-swap.
- If there are dye-swaps, these should be balanced by treatment.

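Reference Design – Usual Analysis
Usually the analysis is done on [expression missing in transcript].
E.g. [equation missing in transcript], and with k replicates, the variance of the estimated difference is [equation missing in transcript].
Using the linear mixed model, we see that the variance of one pair is [equation missing in transcript].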

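Reference Design – Optimal Weights
Consider using [expression missing in transcript]. Then [equation missing in transcript].
The optimal w is [equation missing in transcript].
The resulting variance for a single replicate is [equation missing in transcript], and with k replicates, the variance of the estimated difference is [equation missing in transcript].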
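One way to arrive at such a weight, under assumptions that go beyond what the slide states, is sketched below. It assumes the treatment channel $Y_T$ and the reference channel $Y_R$ on one array each have variance $\sigma_A^2 + \sigma^2$ and covariance $\sigma_A^2$, that the weighted quantity is $Y_T - w Y_R$, and that the same $w$ is used on every array so that the reference mean cancels when two treatments are compared. The symbols $\sigma_A^2$, $\sigma^2$ and $\rho$ are introduced here for illustration.

```latex
\operatorname{Var}(Y_T - w Y_R)
  = (1 + w^2)(\sigma_A^2 + \sigma^2) - 2 w \sigma_A^2

\frac{d}{dw}\operatorname{Var}(Y_T - w Y_R) = 0
  \;\Longrightarrow\;
  w^{*} = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2} = \rho

\operatorname{Var}(Y_T - w^{*} Y_R) = (\sigma_A^2 + \sigma^2)(1 - \rho^2) = \sigma^2 (1 + \rho)

\operatorname{Var}\bigl(\widehat{A - B}\bigr)
  = \frac{2 \sigma^2 (1 + \rho)}{k} \ \text{ for } w = w^{*},
  \qquad
  \frac{4 \sigma^2}{k} \ \text{ for } w = 1 \text{ (plain differencing)}
```

Under these assumptions the ratio of the two variances is $(1 + \rho)/2$, so plain differencing loses the most information when the correlation between channels is low and loses nothing only when $\rho = 1$.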

Reference Design – Optimal Weights
We do not know the optimal weights, but if we use a mixed-model ANOVA such as those available in SAS, S-PLUS or R, the weights are approximated from the data, leading to more efficient computations.

Loop Designs
[Figure: treatments A, B, C and D joined by arrows into a loop.]
- A loop is balanced for dye effects and has two replicates at each node.
- T treatments, 2k replicates, Tk arrays
- Recall: for a reference design we get only k replicates on Tk arrays

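Loop Designs
T=4, 4 arrays
[Figure: loop on treatments A, B, C and D.]
Using optimal weighting:
Var(A-B) = Var(A-D) = [value missing in transcript]
Var(A-C) = [value missing in transcript]
Both are smaller than the variance of the reference design with 4 arrays.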

Loop Designs, T=4
[Figures: three loops on treatments A, B, C and D, labeled Design L4C, Design L4B and Design L4D.]


T=4, 12 arrays
Assuming [values missing in transcript]
Loop Design – 3 loops = 6 replicates/treatment
- 3 × L4C: Var(A-B) = 0.46, Var(A-C) = 0.58
- L4B + L4C + L4D: Var(difference) = 0.47
Reference Design – 3 replicates/treatment
- Var(difference) = 0.83
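A comparison of this kind can be sketched with generalized least squares on the individual channels, treating the array as a random effect with known variance components. Everything below is an illustrative assumption rather than the calculation behind the slide: the helper names `solve` and `contrast_variance` are hypothetical, the variance components are set to 1 and 1 purely as placeholders, both channels are given the same residual variance, and the arrow lists for L4B, L4C and L4D assume each design is named after the treatment that sits opposite A in the loop. The numbers it prints are therefore not expected to match the slide.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve a square, non-singular linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def contrast_variance(arrays, labels, contrast, sigma2_array, sigma2_error, dye=True):
    """Variance of a contrast of sample means under GLS with known variance components.

    arrays   : list of (red_label, green_label) pairs, one per array (assumed convention)
    labels   : ordered list of sample labels that get a mean parameter
    contrast : dict mapping label -> coefficient (missing labels count as 0)
    dye      : include a red-dye indicator; turn off when dye is aliased, as in a
               reference design without dye-swaps
    """
    p = len(labels) + (1 if dye else 0)
    index = {lab: i for i, lab in enumerate(labels)}
    s, a = Fraction(sigma2_error), Fraction(sigma2_array)
    det = (s + a) ** 2 - a ** 2
    # Inverse of the within-array covariance [[s + a, a], [a, s + a]].
    v_inv = [[(s + a) / det, -a / det], [-a / det, (s + a) / det]]
    info = [[Fraction(0)] * p for _ in range(p)]
    for red, green in arrays:
        rows = []
        for channel, lab in enumerate((red, green)):
            x = [Fraction(0)] * p
            x[index[lab]] = Fraction(1)
            if dye and channel == 0:
                x[-1] = Fraction(1)  # red-dye indicator
            rows.append(x)
        # Accumulate X_k' V^{-1} X_k for this array.
        for r in range(2):
            for c in range(2):
                for i in range(p):
                    for j in range(p):
                        info[i][j] += rows[r][i] * v_inv[r][c] * rows[c][j]
    c_vec = [Fraction(contrast.get(lab, 0)) for lab in labels]
    if dye:
        c_vec.append(Fraction(0))
    x = solve(info, c_vec)
    return sum(ci * xi for ci, xi in zip(c_vec, x))

labels = ["A", "B", "C", "D"]
L4C = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]  # C opposite A (assumed)
L4B = [("A", "C"), ("C", "B"), ("B", "D"), ("D", "A")]  # B opposite A (assumed)
L4D = [("A", "B"), ("B", "D"), ("D", "C"), ("C", "A")]  # D opposite A (assumed)
reference = [(t, "Ref") for t in labels] * 3            # 3 replicates/treatment

sigma2_array, sigma2_error = 1, 1  # hypothetical values, not taken from the slides

for name, arrays in {"3 x L4C": L4C * 3, "L4B + L4C + L4D": L4B + L4C + L4D}.items():
    for other in ("B", "C"):
        v = contrast_variance(arrays, labels, {"A": 1, other: -1}, sigma2_array, sigma2_error)
        print(f"{name}: Var(A-{other}) = {float(v):.3f}")

v_ref = contrast_variance(reference, labels + ["Ref"], {"A": 1, "B": -1},
                          sigma2_array, sigma2_error, dye=False)
print(f"Reference, 3 replicates: Var(A-B) = {float(v_ref):.3f}")
```

With these placeholder values the sketch shows the same ordering as the slide: neighbouring treatments in a replicated loop are compared most precisely, the mixture of three different loops gives one common variance for all pairs, and the reference design on the same 12 arrays is the least precise.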

An 8 Treatment Example
[Figure: design diagram on treatments A through H.]

[Figure: the design on treatments A through H, arranged as 2 complete blocks.]

An 8 Treatment Example
[Figure: design diagram on treatments A through H, with a yellow loop and a red "loop" highlighted.]
Replication: Yellow loop? Red "loop"?

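Incorporating 2x2 Factorial in a Loop
[Figure: two loops on the factorial treatments, with nodes listed as GT, gt, gT, Gt in the first and GT, gT, gt, Gt in the second.]
Which Arrangement is Better?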

Incorporating 2x2 Factorial in a Loop
[Figure: loop on A, B, C and D.]
The contrasts of interest can be written (in terms of the means, not the observations):
- ½(A+B) - ½(C+D)
- ½(A+D) - ½(B+C)
- ½(A+C) - ½(B+D)

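Incorporating 2x2 Factorial in a Loop
[Figure: loop on A, B, C and D.]
The optimal variances are:
- ½(A+B) - ½(C+D): [value missing in transcript]
- ½(A+D) - ½(B+C): [value missing in transcript]
- ½(A+C) - ½(B+D): [value missing in transcript]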
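Why the three contrasts are not estimated equally well can be seen from a simpler calculation than the optimal one referred to on the slide. The sketch below swaps in an analysis that uses only within-array differences (array treated as a fixed effect), so it is not the slide's optimal-weight result. It also assumes the loop runs A, B, C, D and back to A, so that A and C are opposite, that the loop is dye-balanced so the dye term drops out, and that each channel has residual variance `sigma2_error`. The names `solve` and `intra_array_variance` are hypothetical.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve a square, non-singular linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def intra_array_variance(arrays, labels, contrast, sigma2_error=1):
    """Variance of a treatment contrast estimated from within-array differences only."""
    n = len(labels)
    index = {lab: i for i, lab in enumerate(labels)}
    # Laplacian of the design graph: one edge per array.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in arrays:
        i, j = index[u], index[v]
        lap[i][i] += 1
        lap[j][j] += 1
        lap[i][j] -= 1
        lap[j][i] -= 1
    c = [Fraction(contrast.get(lab, 0)) for lab in labels]
    if sum(c) != 0:
        raise ValueError("contrast coefficients must sum to zero")
    # Fix the last treatment at zero; a contrast is unaffected by this choice.
    reduced = [row[:-1] for row in lap[:-1]]
    x = solve(reduced, c[:-1])
    quad = sum(ci * xi for ci, xi in zip(c[:-1], x))
    # Each within-array difference has variance 2 * sigma2_error.
    return 2 * Fraction(sigma2_error) * quad

labels = ["A", "B", "C", "D"]
loop = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]  # assumed loop order
half = Fraction(1, 2)
contrasts = {
    "1/2(A+B) - 1/2(C+D)": {"A": half, "B": half, "C": -half, "D": -half},
    "1/2(A+D) - 1/2(B+C)": {"A": half, "D": half, "B": -half, "C": -half},
    "1/2(A+C) - 1/2(B+D)": {"A": half, "C": half, "B": -half, "D": -half},
}
variances = {name: intra_array_variance(loop, labels, c) for name, c in contrasts.items()}
for name, v in variances.items():
    print(name, float(v))
```

Under these assumptions the contrast that alternates in sign around the loop, ½(A+C) - ½(B+D), has half the variance of the other two, because every array compares one treatment from each side of it directly. Whichever factorial effect is assigned to that contrast by the placement of treatments in the loop is the one estimated most precisely.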

Incorporating 2x2 Factorial in a Loop
[Figure: loop with nodes listed as GT, gt, gT, Gt.] Best arrangement for estimating interaction
[Figure: loop with nodes listed as GT, gT, gt, Gt.] Best arrangement for estimating time main effect

And now for the rest of the story
- Missing arrays – not fatal but reduce efficiency
- Added treatments
[Figures: a loop on A, B, C and D, and a loop on A, B, C and D extended with treatment E.]


Optimal Design?
- The loop design has not been shown to be optimal
- There are lots of other BIBDs for 2 samples/block
- General BIBDs can be adapted as more channels become available
- Loop designs are particularly appealing due to the dye balance and graphical representation

The Moral of the Story
- Loop designs are very efficient
- Can incorporate factorial arrangements
- Can incorporate blocks
- Can be replicated in various ways to improve efficiency
- Optimal design ideas can help determine which BIBD to use
- ANOVA-type analyses on the individual channels – not differencing – should be used for analysis.

References
- Kerr and Churchill (2001), Experimental design for gene expression microarrays, Biostatistics, 2:
- Kerr (2003), Design considerations for efficient and effective microarray studies, Biometrics, 59:
- Yang and Speed (2002), Design issues for cDNA microarray experiments, Nature Reviews Genetics, 3,

[Figure: design diagram with nodes A1, A2, B1, B2, C1 and C2.]