Lecture 6 Design Matrices and ANOVA and how this is done in LIMMA.


1 Lecture 6 Design Matrices and ANOVA and how this is done in LIMMA

2 ANOVA: Some Examples Is there a difference in the mean hourly wages of three different ethnic groups? Is there a difference in the mean sugar content of five different brands of cereal? Is there a difference between the mutant and wild-type versions of an organism? Is there a dye effect as well as a treatment effect? In a time-course experiment, are there significant differences in gene expression between the time points?

3 Model for ANOVA The general linear model, which applies to ANOVA, regression and ANCOVA alike, is written in matrix form as:

$$\underset{(n \times 1)}{Y} = \underset{(n \times p)}{X}\,\underset{(p \times 1)}{\beta} + \underset{(n \times 1)}{\varepsilon}$$

Y: response vector (observed)
X: design matrix (observed)
β: parameter vector (to be estimated)
ε: error vector (unobserved, random)
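As a minimal sketch of the matrix formulation, the following R code builds a small design matrix and response vector and solves the least-squares normal equations. The data and the names `X`, `Y`, `beta_hat` and `resid_hat` are hypothetical and chosen only to illustrate the shapes involved.

```r
# Hypothetical data: n = 4 observations, p = 2 parameters (intercept and slope).
X <- cbind(intercept = 1, x = c(1, 2, 3, 4))   # design matrix (n x p)
Y <- c(3, 5, 7, 9)                              # response vector (n x 1)

# Least-squares estimate: solve (X'X) beta = X'Y.
beta_hat <- solve(crossprod(X), crossprod(X, Y))
print(beta_hat)

# Residuals are the estimate of the unobserved error vector.
resid_hat <- Y - X %*% beta_hat
print(resid_hat)
```

Here the response was constructed to lie exactly on a line, so the residuals are numerically zero; with real data they carry the random variation.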

4 How to write a Design Matrix Consider a data set in which we compare 3 different fertilizers, A, B and C. For each fertilizer we have two plots of land. Data:

Plot   Fertilizer   Yield (tonnes)
1      A            12
2      A            15
3      B            21
4      B            18
5      C            10
6      C             9

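5 Models: cell means model We can write this as:

$$Y_{ij} = \mu_i + \varepsilon_{ij}$$

This is the cell-means model, where $\mu_i$ is the mean response under treatment i. The corresponding design matrix, shown with one row per treatment, is:

1 0 0
0 1 0
0 0 1

Each row corresponds to a unit and each column to a treatment. With two plots per fertilizer, each of these rows appears twice, giving a 6 × 3 matrix.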
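The cell-means design matrix for the fertilizer data can be generated in R rather than typed by hand. This is a sketch using base R only; the object names (`fert`, `X_cell`, `fit_cell`, `mu_hat`) are illustrative.

```r
# Fertilizer data from the slide: two plots per fertilizer.
fert <- data.frame(
  plot       = 1:6,
  fertilizer = factor(c("A", "A", "B", "B", "C", "C")),
  yield      = c(12, 15, 21, 18, 10, 9)
)

# "~ 0 + fertilizer" drops the intercept, giving one indicator column per level.
X_cell <- model.matrix(~ 0 + fertilizer, data = fert)
print(X_cell)

# Least-squares fit; under this coding the coefficients are the group means.
fit_cell <- lm(yield ~ 0 + fertilizer, data = fert)
mu_hat <- coef(fit_cell)
print(mu_hat)
```

Under this coding each coefficient is simply the average yield of the two plots that received that fertilizer.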

6 Model: Factor effect Model We can write this as:

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

This is the factor effects model. Here $\mu$ is an overall mean and the $\tau_i$ are the differences of each treatment level from the overall mean. We add the requirement that $\sum_i \tau_i = 0$, so that $\tau_3 = -\tau_1 - \tau_2$. The corresponding design matrix, shown with one row per treatment, is:

1  1  0
1  0  1
1 -1 -1

Each row corresponds to a unit. The first column is the overall mean and the remaining columns are the treatment effects, with the last treatment expressed in terms of the other treatments.
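The sum-to-zero coding can be reproduced in R with `contr.sum`. This is a sketch on the fertilizer data from slide 4; the object names are illustrative, and R's default coding (treatment contrasts) would give a different matrix.

```r
# Fertilizer data from slide 4.
fert <- data.frame(
  fertilizer = factor(c("A", "A", "B", "B", "C", "C")),
  yield      = c(12, 15, 21, 18, 10, 9)
)

# Sum-to-zero coding: intercept column plus two effect columns;
# rows for the last level are coded -1 in both effect columns.
X_eff <- model.matrix(~ fertilizer, data = fert,
                      contrasts.arg = list(fertilizer = "contr.sum"))
print(X_eff)

fit_eff <- lm(yield ~ fertilizer, data = fert,
              contrasts = list(fertilizer = "contr.sum"))
overall_mean <- coef(fit_eff)[1]
tau_hat      <- coef(fit_eff)[2:3]

# The effect of the last level follows from the sum-to-zero constraint.
tau_all <- c(tau_hat, -sum(tau_hat))
print(tau_all)
```

Because the design is balanced, the intercept equals the average of the three group means and each effect is a group mean minus that average.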

7 Parameter Vectors
For the cell means model: $\beta' = (\mu_1, \mu_2, \mu_3)$, with $H_0: \mu_1 = \mu_2 = \mu_3$.
For the factor effects model: $\beta' = (\mu, \tau_1, \tau_2)$, with $H_0: \tau_1 = \tau_2 = 0$.

8 Usage Most statistical texts use the factor effects model because it makes the hypothesis easier to interpret: the null is that all the treatment effects are 0. In LIMMA in R, however, we will use the simpler cell-means model to construct the design matrix, and we then need to define a contrast matrix for the comparisons of interest.

9 LIMMA and Design Matrices This is what LIMMA says about constructing design Matrices: “The package limma uses an approach called linear models to analyse designed microarray experiments. This approach allows very general experiments to be analysed just as easily as a simple replicated experiment. The approach requires one or two matrices to be specified. The first is the design matrix which indicates in effect which RNA samples have been applied to each array. The second is the contrast matrix which specifies which comparisons you would like to make between the RNA samples. For very simple experiments, you may not need to specify the contrast matrix.”

10 More on Design Matrices The philosophy of the approach is as follows. You have to start by fitting a linear model to your data which fully models the systematic part of your data. The model is specified by the design matrix. Each row of the design matrix corresponds to an array in your experiment and each column corresponds to a coefficient which is used to describe the RNA sources in your experiment. With Affymetrix or single-channel data, or with two-color data with a common reference, you will need as many coefficients as you have distinct RNA sources, no more and no less. With direct-design two-color data you will need one fewer coefficient than you have distinct RNA sources, unless you wish to estimate a dye-effect for each gene, in which case the number of RNA sources and the number of coefficients will be the same. Any set of independent coefficients will do, providing they describe all your treatments. The main purpose of this step is to estimate the variability in the data, hence the systematic part needs to be modeled so it can be distinguished from random variation.

11 LIMMA: contrasts In practice the requirement to have exactly as many coefficients as RNA sources is too restrictive in terms of the questions you might want to answer. You might be interested in more or fewer comparisons between the RNA sources. Hence the contrasts step is provided so that you can take the initial coefficients and compare them in as many ways as you want to answer any questions you might have, regardless of how many or how few these might be.

12 Writing out Design and Contrast Matrices: Example 1: This is a one-factor ANOVA with 4 levels. The model is $Y_{ij} = \mu_i + \varepsilon_{ij}$, $i = 1, \dots, 4$, $j = 1, \dots, 3$. Write out the contrast matrix if we are interested in comparing level 1 to level 2, and level 3 to the mean of levels 1 and 2.

13 Example 1: Designs and Contrast Matrices

array   m1   m2   m3   m4
1       1    0    0    0
2       1    0    0    0
3       1    0    0    0
4       0    1    0    0
5       0    1    0    0
6       0    1    0    0
7       0    0    1    0
8       0    0    1    0
9       0    0    1    0
10      0    0    0    1
11      0    0    0    1
12      0    0    0    1

The contrast matrix C is defined so that B = C'D. For comparing level 1 to level 2 (c1), and level 3 to the mean of levels 1 and 2 (c2):

        m1     m2     m3   m4
c1      1     -1      0    0
c2     -1/2   -1/2    1    0
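The design and contrast matrices of Example 1 can be written down in R as follows. This is a sketch in base R; the cell means in `mu` are hypothetical numbers used only to show the arithmetic of applying the transposed contrast matrix to a vector of coefficients.

```r
# Group labels: 4 levels with 3 arrays each (arrays 1 to 12, as in the table).
level  <- factor(rep(paste0("m", 1:4), each = 3))
design <- model.matrix(~ 0 + level)
colnames(design) <- levels(level)

# Each column of the contrast matrix is one comparison of the cell means.
contrast_matrix <- cbind(
  c1 = c(1, -1, 0, 0),        # level 1 versus level 2
  c2 = c(-0.5, -0.5, 1, 0)    # level 3 versus the mean of levels 1 and 2
)
rownames(contrast_matrix) <- colnames(design)

# Hypothetical cell means, for illustration only.
mu <- c(m1 = 10, m2 = 8, m3 = 12, m4 = 7)
B  <- drop(t(contrast_matrix) %*% mu)
print(B)
```

Each contrast column sums to zero, which is what makes it a comparison between levels rather than an estimate of a level.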

14 Example 2 This is a two-factor ANOVA with 3 levels for Factor A and 2 levels for Factor B. The model is $Y_{ij} = \alpha_i + \beta_j + \varepsilon_{ij}$, $i = 1, \dots, 3$, $j = 1, 2$. Write out the contrast matrix for comparing levels 2 and 3 of Factor 1 (A), and levels 1 and 2 of Factor 2 (B).

15 Example 2: Design and Contrast Matrix The Design Matrix

array   a1b1   a1b2   a2b1   a2b2   a3b1   a3b2
1       1      0      0      0      0      0
2       1      0      0      0      0      0
3       0      1      0      0      0      0
4       0      1      0      0      0      0
5       0      0      1      0      0      0
6       0      0      1      0      0      0
7       0      0      0      1      0      0
8       0      0      0      1      0      0
9       0      0      0      0      1      0
10      0      0      0      0      1      0
11      0      0      0      0      0      1

Write out the contrast matrix for comparing: Factor 1, levels 2 and 3 (C1); Factor 1, levels 1 and 3 (C2); Factor 2, levels 1 and 2 (C3). Contrast:

        a1b1   a1b2   a2b1   a2b2   a3b1   a3b2
C1       0      0     -1     -1      1      1
C2      -1     -1      0      0      1      1
C3      -1      1     -1      1     -1      1
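A sketch of Example 2 in base R follows. It assumes two arrays in every cell, that is 12 arrays in total, which is an assumption on my part because the table in the transcript stops at array 11. The names `fa`, `fb` and `cell` are illustrative.

```r
# Factor A has 3 levels, factor B has 2 levels, two arrays per cell (assumed).
fa   <- rep(c("a1", "a2", "a3"), each = 4)
fb   <- rep(rep(c("b1", "b2"), each = 2), times = 3)
cell <- factor(paste0(fa, fb), levels = unique(paste0(fa, fb)))

# Cell-means design: one indicator column per factor combination.
design <- model.matrix(~ 0 + cell)
colnames(design) <- levels(cell)

contrast_matrix <- cbind(
  C1 = c( 0,  0, -1, -1,  1,  1),   # Factor A: level 3 versus level 2
  C2 = c(-1, -1,  0,  0,  1,  1),   # Factor A: level 3 versus level 1
  C3 = c(-1,  1, -1,  1, -1,  1)    # Factor B: level 2 versus level 1
)
rownames(contrast_matrix) <- colnames(design)

# Inner products between contrasts; a zero means the two are orthogonal.
print(crossprod(contrast_matrix))
```

The Factor B contrast is orthogonal to both Factor A contrasts, while the two Factor A contrasts overlap because both involve level 3.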

16 Differential Expressions for Factorial Designs: Design Matrices and Contrasts, using R. Example: The Estrogen Data set. Let us consider the Estrogen data set and look at how we use R to study differential expression using design matrices.

Name       FileName       Target
Abs10.1    low10-1.cel    EstAbsent10
Abs10.2    low10-2.cel    EstAbsent10
Pres10.1   high10-1.cel   EstPresent10
Pres10.2   high10-2.cel   EstPresent10
Abs48.1    low48-1.cel    EstAbsent48
Abs48.2    low48-2.cel    EstAbsent48
Pres48.1   high48-1.cel   EstPresent48
Pres48.2   high48-2.cel   EstPresent48

17 Description of Experiment There are 8 files in all, coming from a 2×2 factorial design, that is, a design with 2 factors each at 2 levels. The study was done to measure the changes in gene expression for breast cancer patients due to estrogen (two levels, presence and absence) at two time points (10 hr and 48 hr). The data from this experiment are available at the Bioconductor website.

18 Contrasts of Interest It is of interest to compare:
1. the effect of estrogen at 10 hours (compare presence to absence at 10 hours),
2. the effect of estrogen at 48 hours (compare presence to absence at 48 hours),
3. the effect of time in the absence of estrogen (compare Absent 10 to Absent 48).

19 Targets File Method There are different ways to do this in R. Let's use the Targets file method, as we did in the two-condition comparison before. So let's first put together a tab-delimited text file like the one above. I call it EstrogenTargets.txt. It gives a name, the file name, and the target, which contains the factor level information.
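As a sketch of what the tab-delimited targets file looks like and how it reads back into R, the code below writes the table from slide 16 to a temporary file and reads it with base R's `read.delim`. Writing to `tempdir()` is an assumption made so that the example is self-contained; in practice the file would sit next to the .cel files and could be read with limma's `readTargets`.

```r
# Contents of the targets file: header plus one tab-separated line per array.
targets_txt <- c(
  "Name\tFileName\tTarget",
  "Abs10.1\tlow10-1.cel\tEstAbsent10",
  "Abs10.2\tlow10-2.cel\tEstAbsent10",
  "Pres10.1\thigh10-1.cel\tEstPresent10",
  "Pres10.2\thigh10-2.cel\tEstPresent10",
  "Abs48.1\tlow48-1.cel\tEstAbsent48",
  "Abs48.2\tlow48-2.cel\tEstAbsent48",
  "Pres48.1\thigh48-1.cel\tEstPresent48",
  "Pres48.2\thigh48-2.cel\tEstPresent48"
)

# Write to a temporary location (assumption, for a self-contained example).
path <- file.path(tempdir(), "EstrogenTargets.txt")
writeLines(targets_txt, path)

# Read it back as a data frame with one row per array.
targets <- read.delim(path, stringsAsFactors = FALSE)
print(targets)
print(unique(targets$Target))
```

The order in which the targets first appear in this file determines the column order of the design matrix built from it.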

20 Design matrix method One way to do this in R (to me it's the simplest one in terms of design matrices) is to write a design matrix using the factor combinations, WITHOUT the intercept term. R (at least LIMMA) writes the design matrix as:

EstAbsent10   EstPresent10   EstAbsent48   EstPresent48
1             0              0             0
0             1              0             0
0             0              1             0
0             0              0             1

with each row repeated for the two replicate arrays of that target. So our model is $Y = X\alpha_g + \varepsilon$.

21 Contrast Matrix Now to define the contrasts we need to look at the transformation $\beta_g = C'\alpha_g$, so we define C as:

C' =
-1   1   0   0
 0   0  -1   1
-1   0   1   0

This will define:
(EstPresent10 - EstAbsent10)
(EstPresent48 - EstAbsent48)
(EstAbsent48 - EstAbsent10)

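22 In R using Targets file
library(limma)
# Read the targets file described on slide 19
targets <- readTargets("EstrogenTargets.txt")
# One indicator column per target, no intercept, in order of first appearance
design <- model.matrix(~ -1 + factor(targets$Target, levels = unique(targets$Target)))
colnames(design) <- unique(targets$Target)
numParameters <- ncol(design)
parameterNames <- colnames(design)
# Each group of four values is one contrast, stored as one column of the matrix
contrastMatrix <- matrix(c(-1, 1, 0, 0,  0, 0, -1, 1,  -1, 0, 1, 0), nrow = ncol(design))
Building the design from the Targets file is efficient once you know how R works, because you do not have to type in the matrix by hand.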

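23 In R using the design matrix directly
# Columns: EstAbsent10, EstPresent10, EstAbsent48, EstPresent48.
# Each group of eight values is one column, with the arrays in targets-file order.
design <- matrix(c(1, 1, 0, 0, 0, 0, 0, 0,
                   0, 0, 1, 1, 0, 0, 0, 0,
                   0, 0, 0, 0, 1, 1, 0, 0,
                   0, 0, 0, 0, 0, 0, 1, 1), nrow = 8)
contrastMatrix <- matrix(c(-1, 1, 0, 0,  0, 0, -1, 1,  -1, 0, 1, 0), nrow = ncol(design))
R constructs matrices column by column, so the values must be supplied one column at a time.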
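Once the design and contrast matrices exist, the usual next step in limma is to fit the model and then the contrasts. The slides stop before that step, so the following is only a sketch: the expression matrix `exprs_sim` is simulated noise standing in for normalized data, the contrast labels `Est10`, `Est48` and `TimeAbs` are hypothetical names, and the limma calls run only if the package is installed.

```r
# Targets as in the table on slide 16 (file names omitted here).
targets <- data.frame(
  Name   = c("Abs10.1", "Abs10.2", "Pres10.1", "Pres10.2",
             "Abs48.1", "Abs48.2", "Pres48.1", "Pres48.2"),
  Target = rep(c("EstAbsent10", "EstPresent10", "EstAbsent48", "EstPresent48"),
               each = 2),
  stringsAsFactors = FALSE
)

groups <- factor(targets$Target, levels = unique(targets$Target))
design <- model.matrix(~ 0 + groups)
colnames(design) <- levels(groups)

# One column per contrast; the column labels are hypothetical.
contrastMatrix <- matrix(c(-1, 1, 0, 0,  0, 0, -1, 1,  -1, 0, 1, 0),
                         nrow = ncol(design),
                         dimnames = list(colnames(design),
                                         c("Est10", "Est48", "TimeAbs")))

# Simulated log-expression values for 100 genes (stand-in for real data).
set.seed(1)
exprs_sim <- matrix(rnorm(100 * 8), nrow = 100,
                    dimnames = list(paste0("gene", 1:100), targets$Name))

if (requireNamespace("limma", quietly = TRUE)) {
  fit  <- limma::lmFit(exprs_sim, design)             # per-gene linear model
  fit2 <- limma::contrasts.fit(fit, contrastMatrix)   # apply the contrasts
  fit2 <- limma::eBayes(fit2)                         # moderated statistics
  print(limma::topTable(fit2, coef = "Est10", number = 5))
}
```

With simulated noise no gene should stand out; the point is only the order of the calls and how the two matrices enter them.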

24 An example for Optimal Designs Suppose we have 12 arrays in a single-channel framework and 5 conditions that we want to compare. Because 12 arrays cannot be split evenly over 5 conditions, the design is unbalanced and it is harder to construct an orthogonal design. Sometimes people use classes of designs that are already available and have properties like orthogonality. Designs of this kind include Margolin designs (fewer than 6 conditions), Plackett-Burman designs and other such designs.

25 Consider the following Margolin Design: orthogonal for 6 conditions and 12 arrays

Array   Col1   Col2   Col3   Col4   Col5   Col6   Col7
1        1      1      1      1      1      1      1
2        1      1     -1     -1     -1     -1      1
3        1     -1      1     -1     -1     -1      1
4        1     -1     -1      1      1     -1      1
5        1     -1     -1      1     -1      1      1
6        1     -1     -1     -1      1      1      1
7        1     -1     -1     -1     -1     -1     -1
8        1     -1      1      1      1      1     -1
9        1      1     -1      1      1      1     -1
10       1      1      1     -1     -1      1     -1
11       1      1      1     -1      1     -1     -1
12       1      1      1      1     -1     -1     -1

26 What if I have 5 conditions With only 5 conditions, one option is to drop one column of the design matrix and use the reduced matrix, so as to preserve some of the optimality properties. The question is which column to drop. The following R code helps us decide whether to drop column 2, 3 or 4.

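27
> A <- matrix(c(1, 1, 1, 1, 1, 1, 1,
+               1, 1,-1,-1,-1,-1, 1,
+               1,-1, 1,-1,-1,-1, 1,
+               1,-1,-1, 1, 1,-1, 1,
+               1,-1,-1, 1,-1, 1, 1,
+               1,-1,-1,-1, 1, 1, 1,
+               1,-1,-1,-1,-1,-1,-1,
+               1,-1, 1, 1, 1, 1,-1,
+               1, 1,-1, 1, 1, 1,-1,
+               1, 1, 1,-1,-1, 1,-1,
+               1, 1, 1,-1, 1,-1,-1,
+               1, 1, 1, 1,-1,-1,-1), nrow = 12)
> # Note: matrix() fills column by column unless byrow = TRUE is given,
> # so as written the rows of A are not the rows typed above.
> B <- t(A)
> C <- B %*% A      # X'X
> D <- solve(C)     # inverse of X'X
> det(D)
[1] 5.353961e-07
> sum(diag(D))
[1] 2.065789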

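28
> A1 <- A[, -2]          # drop column 2
> A2 <- A[, -3]          # drop column 3
> A4 <- A[, -4]          # drop column 4
> A1t <- t(A1)
> A2t <- t(A2)
> A4t <- t(A4)
> a1ta1 <- A1t %*% A1    # X'X for each reduced design
> a2ta2 <- A2t %*% A2
> a4ta4 <- A4t %*% A4
> b1 <- solve(a1ta1)     # inverse of X'X
> b2 <- solve(a2ta2)
> b3 <- solve(a4ta4)
> aa1 <- sum(diag(b1))   # trace of each inverse
> aa2 <- sum(diag(b2))
> aa4 <- sum(diag(b3))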

29 Results from dropping columns
> aa1
[1] 1.256966        (trace after dropping col 2)
> aa2
[1] 0.8231631       (trace after dropping col 3)
> aa4
[1] 1.322289        (trace after dropping col 4)
> det(b1)
[1] 3.023413e-06    (determinant after dropping col 2)
> det(b2)
[1] 1.216143e-06    (determinant after dropping col 3)
> det(b3)
[1] 2.941453e-06    (determinant after dropping col 4)
A smaller trace and a smaller determinant of the inverse of X'X both indicate more precise estimates, so on these numbers dropping column 3 is preferable under both criteria.
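The column-dropping comparison can be wrapped in a small helper so that the same two criteria are computed for any candidate column. This is a sketch: the function name `design_criteria` is hypothetical, and the 4 × 4 matrix below is a small orthogonal example chosen for illustration, not the Margolin design from the slides.

```r
# Trace and determinant of the inverse of X'X after dropping one column of X.
design_criteria <- function(X, drop_col) {
  Xd  <- X[, -drop_col, drop = FALSE]
  inv <- solve(crossprod(Xd))          # inverse of X'X for the reduced design
  c(trace = sum(diag(inv)), det = det(inv))
}

# Small orthogonal example, entered row by row (note byrow = TRUE).
X <- matrix(c(1,  1,  1,  1,
              1, -1,  1, -1,
              1,  1, -1, -1,
              1, -1, -1,  1), nrow = 4, byrow = TRUE)

# Compare the criteria for dropping column 2, 3 or 4.
res <- sapply(2:4, function(j) design_criteria(X, j))
colnames(res) <- paste0("drop", 2:4)
print(res)
```

For a fully orthogonal design every choice gives the same values, so the criteria only discriminate between columns when the design is not orthogonal, as in the output on this slide.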

