Factor-allocation in gene-expression microarray experiments
Chris Brien
Phenomics and Bioinformatics Research Centre, University of South Australia



Outline
1. Establishing the analysis for a design
2. Analysis based on factor-allocation description
3. Analysis based on single-set description
4. Microarray experiment (second phase)
5. Conclusions

1. Establishing the analysis for a design
The aim is to:
i. Formulate the mixed model.
ii. Get the skeleton ANOVA table.
iii. Derive the E[MSq] and use them to obtain the variances of treatment mean differences.

2. Analysis based on factor-allocation description
Milliken et al. (2007, SAGMB) discuss the design of microarray experiments applied to a pre-existing split-plot experiment:
- i.e. a two-phase experiment (McIntyre, 1955).
The first phase is a split-plot experiment on grasses in which:
- an RCBD with 6 Blocks is used to assign the 2-level factor Precip to the main plots;
- each main plot is split into 2 subplots to which the 2-level factor Temp is randomized.
First, investigate the analysis of a first-phase response, such as grass production.

2a. Factor-allocation description (Brien, 1983; Brien & Bailey, 2006; Brien et al., 2011)
Use factor-allocation diagrams. These have two panels, each with:
- a list of factors;
- their numbers of levels;
- their nesting relationships.
A set of factors is called a tier:
- {Precip, Temp} or {Blocks, MainPlots, Subplots};
- the factors in a set have the same status in the allocation, usually a randomization;
- textbook experiments are two-tiered, others are not.
[Factor-allocation diagram. Allocated panel: 2 Precip, 2 Temp (4 treatments). Unallocated panel: 6 Blocks, 2 MainPlots in B, 2 Subplots in B, M (24 subplots).]

2b. Mixed model
Terms in the mixed model correspond to generalized factors:
- A∧B is the ab-level factor formed from the combinations of A with a levels and B with b levels.
Display the generalized factors in Hasse diagrams that show the hierarchy of terms from each tier (Brien & Bailey, 2006; Brien & Demétrio, 2009).
[Hasse diagrams of generalized factors, with numbers of levels. Treatments tier: U (1), Precip (2), Temp (2), Precip∧Temp (4). Subplots tier: U (1), Blocks (6), Blocks∧MainPlots (12), Blocks∧MainPlots∧Subplots (24).]
Mixed model (fixed | random): P + T + P∧T | B + B∧M + B∧M∧S.
$\mathbf{Y} = \mathbf{X}_{P}\boldsymbol{\theta}_{P} + \mathbf{X}_{T}\boldsymbol{\theta}_{T} + \mathbf{X}_{P \wedge T}\boldsymbol{\theta}_{P \wedge T} + \mathbf{Z}_{B}\mathbf{u}_{B} + \mathbf{Z}_{B \wedge M}\mathbf{u}_{B \wedge M} + \mathbf{Z}_{B \wedge M \wedge S}\mathbf{u}_{B \wedge M \wedge S}$

2c. ANOVA sources
Add sources, with their degrees of freedom, to the Hasse diagrams:
- treatments tier: Mean (1), P (1), T (1), P#T (1);
- subplots tier: Mean (1), B (5), M[B] (6), S[B∧M] (12).
2d. ANOVA table (summarizes properties)
[Skeleton ANOVA table not preserved in the transcript.]
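The degrees of freedom attached to the Hasse diagrams can be reproduced with the usual Hasse-diagram rule: the df for a term is its number of levels minus the df of every term marginal to it. The following is a minimal sketch of that rule, not the author's software. The function name `source_df` and the representation of a generalized factor as a tuple of factor names are assumptions made for illustration, and the sketch assumes a simple orthogonal structure in which marginality is just "proper subset of factors".

```python
def source_df(levels):
    """Degrees of freedom for each generalized factor in one tier.

    `levels` maps a generalized factor, written as a tuple of factor names
    (a hypothetical representation), to its number of levels. A term is
    taken to be marginal to another when its factors are a proper subset
    of the other's factors.
    """
    df = {}
    # Work from terms with few factors to terms with many, so that every
    # marginal term already has its df when it is needed.
    for term in sorted(levels, key=len):
        marginal = [t for t in df if set(t) < set(term)]
        df[term] = levels[term] - sum(df[t] for t in marginal)
    return df


# Numbers of levels read from the two Hasse diagrams; () is the universe U.
treatments = {(): 1, ("P",): 2, ("T",): 2, ("P", "T"): 4}
subplots = {(): 1, ("B",): 6, ("B", "M"): 12, ("B", "M", "S"): 24}

treatments_df = source_df(treatments)
subplots_df = source_df(subplots)
print(treatments_df)
print(subplots_df)
```

The subplots tier gives 5, 6 and 12 df for B, M[B] and S[B, M], and each treatments source gets 1 df, which agrees with the sources listed on the slide.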

2e. E[MSq]
Add E[MSq] to the ANOVA table, tier by tier:
- use Hasse diagrams and standard rules (Lohr, 1995; Brien et al., 2011).
The variance of the difference between means from effects confounded with a single source is easily obtained as 2 × E[MSq] / r, where E[MSq] is that of the source for the means, ignoring q(·), and r is the replication of a mean.
For example, the variance of the difference between Precip means uses the E[MSq] for MainPlots[B], the source with which Precip is confounded.
Precip-Temp mean differences use extended rules.
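As a worked instance of this rule for the first-phase split-plot design, the expression below uses the standard split-plot expected mean square for main plots within blocks. The variance-component labels $\sigma^2_{BM}$ and $\sigma^2_{BMS}$ (for the random terms for main plots within blocks and subplots within main plots) are notation assumed here, not taken from the slides. Each Precip mean is based on r = 12 subplots (24 subplots over 2 Precip levels), and each main plot contains 2 subplots.

```latex
\[
  \mathrm{E}[\mathrm{MSq}]_{\text{MainPlots}[B]} = \sigma^2_{BMS} + 2\,\sigma^2_{BM},
  \qquad
  \operatorname{Var}\!\left(\bar{y}_{P_1} - \bar{y}_{P_2}\right)
    = \frac{2\left(\sigma^2_{BMS} + 2\,\sigma^2_{BM}\right)}{12}
    = \frac{\sigma^2_{BMS} + 2\,\sigma^2_{BM}}{6}.
\]
```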

3. Analysis based on single-set description
A single set of factors that uniquely indexes the observations is used (e.g. Searle, Casella & McCulloch, 1992; Littell et al., 2006):
- {Blocks, Precip, Temp} (MainPlots and Subplots omitted).
What are the EUs in the single-set approach?
- A set of units indexed by the Blocks-Precip combinations and another set indexed by the Blocks-Precip-Temp combinations.
- Of course, Blocks-Precip-(Temp) are not actual EUs, as Precip (Temp) are not randomized to those combinations.
- They act as a proxy for the unnamed EUs.
Factor allocation clearly shows that the EUs are MainPlots in B and Subplots in B, M.
[Factor-allocation diagram for the first phase: 2 Precip, 2 Temp (4 treatments) allocated to 6 Blocks, 2 MainPlots in B, 2 Subplots in B, M (24 subplots).]

Mixed model and ANOVA table
Mixed model: P + T + P∧T | B + B∧P + B∧P∧T.
- Previous model: P + T + P∧T | B + B∧M + B∧M∧S.
- The single-set model is more economical as M and S are not needed.
- However, B∧M and B∧P are different sources of variability: inherent variability vs block-treatment interaction.
An important difference is that in factor-allocation, initially at least, factors from different sets are taken to be independent.
The two descriptions give the same decomposition and E[MSq], but the single-set ANOVA does not display confounding and the identification of sources is blurred.
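One way to see why the two descriptions give the same decomposition is that, in this design, the Blocks-Precip combinations group the 24 subplots into exactly the same 12 sets as the Blocks-MainPlots combinations, and likewise for the subplot level. The sketch below checks this on a randomized split-plot layout. The helper names `split_plot_layout` and `partition` and the numeric level labels are assumptions for illustration only.

```python
import random
from collections import defaultdict


def split_plot_layout(seed=0):
    """Randomize Precip to main plots and Temp to subplots within 6 blocks."""
    rng = random.Random(seed)
    units = []
    for b in range(1, 7):
        precip = rng.sample([1, 2], 2)      # Precip levels for main plots 1, 2
        for m in (1, 2):
            temp = rng.sample([1, 2], 2)    # Temp levels for subplots 1, 2
            for s in (1, 2):
                units.append({"B": b, "M": m, "S": s,
                              "P": precip[m - 1], "T": temp[s - 1]})
    return units


def partition(units, factors):
    """Sets of unit indices that share a combination of the named factors."""
    groups = defaultdict(set)
    for i, unit in enumerate(units):
        groups[tuple(unit[f] for f in factors)].add(i)
    return {frozenset(g) for g in groups.values()}


units = split_plot_layout()
same_main_plots = partition(units, ["B", "M"]) == partition(units, ["B", "P"])
same_subplots = (partition(units, ["B", "M", "S"])
                 == partition(units, ["B", "P", "T"]))
print(same_main_plots, same_subplots)
```

The groupings coincide, so the sums of squares are the same. What differs is the interpretation of the term: the single-set labels name a block-treatment combination rather than the physical unit that was randomized.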

4. Microarray experiment: second phase
For this phase, Milliken et al. (2007) gave three designs that differ in the way P and T are assigned to an array:
A. Same T, different P;
B. Different T and P;
C. Different T, same P.
[Figure: design diagrams in which each arrow represents an array, with 2 arrays per block and Red at the head of the arrow.]
There are two Blktypes, depending on dye assignment: Blocks 1, 3, 5 and Blocks 2, 4, 6.

Randomization for Plan B
Milliken et al. (2007) are not explicit about the randomization.
We wish to retain MainPlots and Subplots in the allocation and analysis, so as to have a complete factor-allocation description:
- cannot just assign them ignoring treatments;
- need to assign combinations of the factors from both first-phase tiers, and so these form a pseudotier, which is indicated by the dashed oval.
The experiment is three-tiered.
[Factor-allocation diagram with three panels: 2 Precip, 2 Temp (4 treatments); 6 Blocks, 2 MainPlots in B, 2 Subplots in B, M (24 subplots); 12 Arrays, 2 Dyes (24 array-dyes). A dashed oval marks the pseudotier.]

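Microarray phase randomization
Randomized layout for first phase:
[Table with columns B, M, S, P, T, repeated in two column blocks, and a Green/Red key; entries not preserved in the transcript.]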

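Microarray phase randomization (cont'd)
Assignment to array-dyes:
[Table with a row for each Array and, under each of DyeR and DyeG, columns B, M, S, P, T; entries not preserved in the transcript.]
To do the randomization, permute Arrays and Dyes separately (as for a row-column design), and then re-order.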
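The permute-then-re-order step can be sketched as follows. This is an illustration under stated assumptions, not the procedure or software used for the slides: the function name `randomize_array_dyes` is hypothetical, and the labels `unit1` to `unit24` are placeholders standing in for the first-phase B, M, S, P, T combinations, because the actual systematic Plan B assignment is not preserved in the transcript.

```python
import random


def randomize_array_dyes(systematic, n_arrays, dyes, seed=None):
    """Permute Arrays and Dyes separately, as for a row-column design.

    `systematic` maps (array, dye) in the systematic plan to whatever was
    assigned to that array-dye. The result is re-ordered into standard
    array-dye order.
    """
    rng = random.Random(seed)
    new_array = list(range(1, n_arrays + 1))
    rng.shuffle(new_array)                  # random relabelling of Arrays
    new_dye = list(dyes)
    rng.shuffle(new_dye)                    # random relabelling of Dyes
    dye_map = dict(zip(dyes, new_dye))
    permuted = {
        (new_array[array - 1], dye_map[dye]): unit
        for (array, dye), unit in systematic.items()
    }
    return dict(sorted(permuted.items()))   # re-order by Array, then Dye


dyes = ["Green", "Red"]
# Placeholder systematic plan: 12 arrays x 2 dyes, hypothetical unit labels.
systematic = {
    (array, dye): f"unit{2 * (array - 1) + k + 1}"
    for array in range(1, 13)
    for k, dye in enumerate(dyes)
}
randomized = randomize_array_dyes(systematic, 12, dyes, seed=2024)
for key, unit in list(randomized.items())[:4]:
    print(key, unit)
```

Because whole arrays and whole dyes are relabelled, two units that share an array in the systematic plan still share an array afterwards, and units given the same dye stay together on one dye.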

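Microarray phase randomization (cont'd)
Randomized layout:
[Table with a row for each Array and, under each of DyeR and DyeG, columns B, M, S, P, T; entries not preserved in the transcript.]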

Mixed model for Plan B
Mixed model based on generalized factors from each panel: P + T + P∧T + D | B + B∧M + B∧M∧S + A + A∧D.
- However, Milliken et al. (2007) include intertier (block-treatment) interactions of D with P and T:
- P*T*D | B + B∧M + B∧M∧S + A + A∧D.
[Factor-allocation diagram: 2 Precip, 2 Temp (4 treatments); 6 Blocks, 2 MainPlots in B, 2 Subplots in B, M (24 subplots); 12 Arrays, 2 Dyes (24 array-dyes).]
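The shorthand P*T*D stands for all main effects and interactions of the three factors, so it adds the intertier terms involving D to the fixed part of the first model. A small sketch of the expansion is given below. The function name `expand_crossed` is hypothetical, and joining factor names with "∧" is an assumed rendering of the generalized-factor symbol used in the slides.

```python
from itertools import combinations


def expand_crossed(factors, sep="∧"):
    """Expand a crossing such as P*T*D into all of its model terms."""
    terms = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            terms.append(sep.join(combo))
    return terms


fixed_terms = expand_crossed(["P", "T", "D"])
first_model = {"P", "T", "D", "P∧T"}
# Terms that are in P*T*D but not in P + T + P∧T + D.
extra_terms = [t for t in fixed_terms if t not in first_model]
print(fixed_terms)
print(extra_terms)
```

The extra terms are the two-factor interactions of D with P and T and the three-factor interaction, which are the interactions that Milliken et al. (2007) include.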

ANOVA for Plan B
On examining the design, we see that a MainPlots[Blocks] contrast is confounded with Dyes:
- use a two-level pseudofactor, M_D, to capture it.
Also, some Subplots[Blocks∧MainPlots] contrasts are confounded with Arrays:
- use the pseudofactor S_A, for Subplots on the same array, to capture them.
[Two tables, each with a row for each Array and, under each of DyeR and DyeG, columns B, M, S, M_D, S_A, P, T; entries not preserved in the transcript.]

ANOVA table for Plan B

array-dyes tier    subplots tier             treatments tier
Source     df      Source             df     Source     df
Array      11      Blocks              5     P#D         1
                                             Residual    4
                   SubPlots[B∧M]_A     6     P#T         1
                                             T#D         1
                                             Residual    4
Dye         1      MainPlots[B]_D      1
A#D        11      MainPlots[B]_⊢      5     Precip      1
                                             Residual    4
                   SubPlots[B∧M]_⊢     6     Temp        1
                                             P#T#D       1
                                             Residual    4

Sources for array-dyes are straightforward.
Sources for subplots are as before, but split across the array-dyes sources using the pseudofactors M_D and S_A.
The treatments tier sources are confounded as shown:
- P#T, and the other two-factor interactions, are confounded with Arrays;
- P and T are confounded with the less variable A#D.

Comparison with single-set-description ANOVA
Instead of pseudofactors, the single-set description uses grouping factors (Blktype and ArrayPairs) that are unconnected to terms in the model; all factors are crossed or nested.
The ANOVAs are equivalent, but the labels differ: the rationale for the single-set decomposition is unclear and its table does not show confounding.
Thus, sources of variation are obscured (e.g. P#T), although their E[MSq]s show it.

array-dyes tier    subplots tier             treatments tier     single-set-description sources
Source     df      Source             df     Source     df      (Milliken et al., 2007)
Array      11      Blocks              5     P#D         1      Blktype (= P#D)
                                             Residual    4      Block[Blktype]
                   SubPlots[B∧M]_A     6     P#T         1
                                             T#D         1
                                             Residual    4      ArrayPairs#Block[Blktype]
Dye         1      MainPlots[B]_D      1                        Dye
A#D        11      MainPlots[B]_⊢      5     Precip      1
                                             Residual    4      P#Block[Blktype]
                   SubPlots[B∧M]_⊢     6     Temp        1
                                             P#T#D       1      Temp#Blktype
                                             Residual    4

Adding E[MSq] for Plan B
[ANOVA table for Plan B with the array-dyes, subplots and treatments tiers and an added E[MSq] column; the E[MSq] expressions are not preserved in the transcript.]
E[MSq] are synthesized using standard rules, as for the first phase:
- Milliken et al. (2007) use an ad hoc procedure that takes 4 journal pages.
Mixed model of convenience (drop B∧M∧S or A∧D to get a fit):
- P*T*D | B + B∧M + A + A∧D (no pseudofactors);
- equivalent to Milliken et al. (2007).

Variance of mean differences
[ANOVA table for Plan B with E[MSq] column, as on the previous slide; the E[MSq] expressions are not preserved in the transcript.]
Now, for Precip mean differences: [expression not preserved in the transcript]

5. Conclusions
Microarray designs are two-phase.
A single-set description can be confusing and so is a false economy.
Factor-allocation diagrams lead to explicit consideration of the randomization for the array design, which is important but often overlooked.
A general, non-algebraic method for synthesizing the skeleton ANOVA table, mixed model and variances of mean differences is available for orthogonal designs.
When allocation is randomized, mixed models are randomization-based (Brien & Bailey, 2006; Brien & Demétrio, 2009).
Using pseudofactors where necessary:
- retains all sources of variation;
- avoids substitution of artificial grouping factors for real sources of variation, so that sources in the decomposition and terms in the model are directly related.

References
Brien, C.J. (1983) Analysis of variance tables based on experimental structure. Biometrics, 39.
Brien, C.J. and Bailey, R.A. (2006) Multiple randomizations (with discussion). J. Roy. Statist. Soc., Ser. B, 68, 571–609.
Brien, C.J. and Demétrio, C.G.B. (2009) Formulating mixed models for experiments, including longitudinal experiments. J. Agr. Biol. Env. Stat., 14.
Brien, C.J., Harch, B.D., Correll, R.L. and Bailey, R.A. (2011) Multiphase experiments with at least one later laboratory phase. I. Orthogonal designs. Accepted for J. Agr. Biol. Env. Stat.
Lohr, S.L. (1995) Hasse diagrams in statistical consulting and teaching. The American Statistician, 49(4).
McIntyre, G.A. (1955) Design and analysis of two phase experiments. Biometrics, 11.
Milliken, G.A., Garrett, K.A., et al. (2007) Experimental Design for Two-Color Microarrays Applied in a Pre-Existing Split-Plot Experiment. Stat. Appl. in Genet. and Mol. Biol., 6(1), Article 20.
Web address for Multitiered experiments site: