The Mixed Effects Model - Introduction
In many situations, one of the factors of interest will have its levels chosen because they are of specific interest to the researcher. On the other hand, there may be a second factor for which it is important to generalize to all possible levels; in this case the levels of the second factor might be chosen at random. This type of experiment is referred to as a mixed study design because one of the factors has fixed levels while the other has random levels.
STA305 Week 9
The Mixed Effects Model
Suppose that in a given study, a levels of factor A have been chosen because they are of particular interest. Further, suppose that b levels of factor B have been chosen at random from all possible levels of this factor. A total of abr experimental units will be used to conduct the study; r units will be randomly allocated to each of the ab experimental conditions. The form of the statistical model that we will study is identical to that for 2-factor studies with either fixed or random effects; the difference is in the assumptions about the factor levels and the interactions. The model equation is
$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$
As with the fixed and random effects models, we parameterize the model in such a way that μ is the overall mean of all of the responses, i.e. [equation missing in source]
Assumptions of the Mixed Effects Model
As in the fixed effects model, factor A has fixed effects and we therefore require that
$\sum_{i=1}^{a} \alpha_i = 0$
Since the levels of factor B have been chosen at random, we require instead that
$\beta_j \sim N(0, \sigma_\beta^2)$
Since one of the factors is random, the interactions must be random as well; however, since one of the factors is fixed, the sum over that component will be 0. Together these yield 2 constraints on the $\gamma_{ij}$:
$\gamma_{ij} \sim N\left(0, \frac{a-1}{a}\sigma_{\alpha\beta}^2\right)$ and $\sum_{i=1}^{a} \gamma_{ij} = 0$ for each $j$
The factor $(a-1)/a$ is for convenience in expressing the EMS only.
Sums of Squares
The observed variation in the data is measured in the same manner as for the fixed effects and the random effects cases. In other words, the total variation in the data is measured by
$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r} (Y_{ijk} - \bar{Y}_{...})^2$
The sums of squares and the degrees of freedom for the other sources of variation are also the same as in the 2-factor study with the fixed or random effects model. The only difference is in the expected mean squares.
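As a rough illustration of how the total variation splits up in a balanced two-factor layout, the sketch below computes the usual sums of squares for A, B, A×B, and error and checks that they add up to the total. The function name and the tiny data set are hypothetical, chosen only so the arithmetic is easy to follow; they do not come from the course.

```python
from statistics import mean

def two_factor_sums_of_squares(y):
    """y[i][j] is the list of r replicate responses at level i of A and level j of B.

    Assumes a balanced design (the same r in every cell).
    """
    a, b, r = len(y), len(y[0]), len(y[0][0])
    cell = [[mean(y[i][j]) for j in range(b)] for i in range(a)]   # cell means
    row = [mean(cell[i]) for i in range(a)]                        # means for levels of A
    col = [mean(cell[i][j] for i in range(a)) for j in range(b)]   # means for levels of B
    grand = mean(row)                                              # overall mean (balanced case)
    obs = [(i, j, v) for i in range(a) for j in range(b) for v in y[i][j]]
    return {
        "A": b * r * sum((m - grand) ** 2 for m in row),
        "B": a * r * sum((m - grand) ** 2 for m in col),
        "AB": r * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                      for i in range(a) for j in range(b)),
        "E": sum((v - cell[i][j]) ** 2 for i, j, v in obs),
        "T": sum((v - grand) ** 2 for _, _, v in obs),
    }

# Hypothetical data: a = 2 levels of A, b = 2 levels of B, r = 2 replicates per cell.
y = [[[10.0, 12.0], [14.0, 16.0]],
     [[20.0, 22.0], [18.0, 20.0]]]
ss = two_factor_sums_of_squares(y)
print(ss)
```

The same function applies whether the factors are fixed, random, or mixed, since only the expected mean squares (and therefore the F ratios) change.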
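Expected Mean Squares
The expected mean squares are as follows:
$E(MS_A) = \sigma^2 + r\sigma_{\alpha\beta}^2 + \frac{br\sum_{i=1}^{a}\alpha_i^2}{a-1}$
$E(MS_B) = \sigma^2 + ar\sigma_\beta^2$
$E(MS_{A\times B}) = \sigma^2 + r\sigma_{\alpha\beta}^2$
$E(MS_E) = \sigma^2$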
Hypothesis Testing
As in all of the other experimental designs that we have looked at, the motivation for the test statistics is derived from the EMS. As in the case of both the fixed and random effects models, the test for interactions is made by comparing $MS_{A\times B}$ to $MS_E$. The test for the fixed factor, factor A, is made by comparing $MS_A$ to $MS_{A\times B}$. The test for the random effect, factor B, is made by comparing $MS_B$ to $MS_E$.
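The choice of denominators can be sketched in a few lines of Python. This is only an illustration of the ratios described above for A fixed and B random; the function name and the mean square values are made up, and each F ratio would still need to be compared with the appropriate F critical value.

```python
def mixed_model_f_ratios(ms_a, ms_b, ms_ab, ms_e, a, b, r):
    """Return {effect: (F, numerator df, denominator df)} for A fixed, B random."""
    df_a, df_b = a - 1, b - 1
    df_ab = df_a * df_b
    df_e = a * b * (r - 1)
    return {
        "A": (ms_a / ms_ab, df_a, df_ab),   # fixed factor: tested against the interaction
        "B": (ms_b / ms_e, df_b, df_e),     # random factor: tested against error
        "AB": (ms_ab / ms_e, df_ab, df_e),  # interaction: tested against error
    }

# Hypothetical mean squares for a = 3, b = 4, r = 2.
tests = mixed_model_f_ratios(ms_a=60.0, ms_b=30.0, ms_ab=12.0, ms_e=4.0, a=3, b=4, r=2)
for effect, (f, df1, df2) in tests.items():
    print(f"{effect}: F = {f:.2f} on ({df1}, {df2}) df")
```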
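The ANOVA Table
It is useful to add the expected mean squares to the table in order to remember which ratios to form for the F-tests. The ANOVA table is given below:
Source   df            SS                MS                EMS                                                                F
A        a-1           $SS_A$            $MS_A$            $\sigma^2 + r\sigma_{\alpha\beta}^2 + br\sum_i\alpha_i^2/(a-1)$    $MS_A / MS_{A\times B}$
B        b-1           $SS_B$            $MS_B$            $\sigma^2 + ar\sigma_\beta^2$                                      $MS_B / MS_E$
A×B      (a-1)(b-1)    $SS_{A\times B}$  $MS_{A\times B}$  $\sigma^2 + r\sigma_{\alpha\beta}^2$                               $MS_{A\times B} / MS_E$
Error    ab(r-1)       $SS_E$            $MS_E$            $\sigma^2$
Total    abr-1         $SS_T$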
Estimating the Model Parameters
The effects for the levels of the fixed factor can be estimated as in the fixed effects model. That is,
$\hat{\alpha}_i = \bar{Y}_{i..} - \bar{Y}_{...}$
In the mixed model, however, confidence intervals for the effects of the levels of the fixed factor are constructed using $MS_{A\times B}$ as the variance estimate. That is, a CI for the effect of the ith level of factor A is: [equation missing in source]
Orthogonal contrasts can also be used to make inferences about the levels of factor A. The mixed effects model also contains components of variation, and these can be estimated as follows:
$\hat{\sigma}^2 = MS_E$, $\hat{\sigma}_{\alpha\beta}^2 = \frac{MS_{A\times B} - MS_E}{r}$, $\hat{\sigma}_\beta^2 = \frac{MS_B - MS_E}{ar}$
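A small sketch of how the components of variation might be computed from the mean squares. It assumes the expected mean squares E(MS_E) = σ², E(MS_A×B) = σ² + rσ²_αβ, and E(MS_B) = σ² + arσ²_β, and equates each observed mean square to its expectation; the function name and the numbers are hypothetical.

```python
def variance_components(ms_b, ms_ab, ms_e, a, r):
    """Method-of-moments estimates for the mixed model with A fixed and B random.

    Assumes E(MS_E) = s2, E(MS_AB) = s2 + r*s2_ab, E(MS_B) = s2 + a*r*s2_b.
    """
    return {
        "error": ms_e,
        "interaction": (ms_ab - ms_e) / r,
        "B": (ms_b - ms_e) / (a * r),
    }

# Hypothetical mean squares with a = 3 levels of A and r = 2 replicates.
components = variance_components(ms_b=30.0, ms_ab=12.0, ms_e=4.0, a=3, r=2)
print(components)
```

An estimate obtained this way can come out negative when a mean square happens to fall below MS_E; a common convention is to report such an estimate as zero.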
Random & Mixed Effects Using SAS - Example
Background: the goal of this study is to investigate the capability of a measurement system.
Design: 10 parts are randomly selected; 2 operators are randomly selected to measure each part 3 times.
The statements required to conduct the analysis in SAS are as follows:
    proc glm data = measurement ;
      class part operator ;
      model measure = part | operator ;
      /* both factors are random, so their interaction is random as well */
      random part operator part*operator ;
      /* test each main effect against the interaction mean square */
      test h=part e=part*operator ;
      test h=operator e=part*operator ;
    run ;
Suppose that parts were fixed and operators were random. The SAS code would be as follows:
    proc glm data = measurement ;
      class part operator ;
      model measure = part | operator ;
      /* only operator and the interaction are random; part is fixed */
      random operator part*operator ;
      /* the fixed factor is tested against the interaction */
      test h=part e=part*operator ;
    run ;
The ANOVA would look the same as above. The fixed factor "part" would be tested against the interaction. The random factor "operator" and the "part×operator" interaction would be tested against the error term, so these tests can be read directly from the ANOVA table.
Three-Factor Fixed Effects Design
Suppose that in a particular experiment, there are 3 factors that are of interest to the researcher. Assume that there are a levels of Factor A, b levels of Factor B, and c levels of Factor C. In this case, the researcher must also be concerned with the interactions among the 3 factors: A×B, A×C, B×C, and A×B×C. The model that we will use in this case is
$Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}$
In this notation, the interaction terms are denoted by, for instance, $(\alpha\beta)_{ij}$. This notation is used to avoid introducing more Greek letters, and does not mean that the interaction between $\alpha_i$ and $\beta_j$ is the product $\alpha_i\beta_j$.
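Model Assumptions
The assumptions about the parameters are similar to those for the 2-factor fixed effects model. We assume the following:
$\sum_i \alpha_i = \sum_j \beta_j = \sum_k \gamma_k = 0$
$\sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} = 0$, and similarly for $(\alpha\gamma)_{ik}$ and $(\beta\gamma)_{jk}$
$\sum_i (\alpha\beta\gamma)_{ijk} = \sum_j (\alpha\beta\gamma)_{ijk} = \sum_k (\alpha\beta\gamma)_{ijk} = 0$
$\varepsilon_{ijkl} \sim N(0, \sigma^2)$, independent of each other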
Sums of Squares and ANOVA Table
[ANOVA table missing in source]
Blocking - Introduction
In general, the goal of experimental design is to minimize haphazard variability so that differences between treatments can be seen. In some situations, a variable might have an impact on the response even though it is not the focus of the study, and we generally wish to exclude its effect. Such variables are called nuisance factors. The purpose of randomization is to average out the impact of these nuisance factors. In some cases, the nuisance factors may be both unknown and uncontrollable, in which case randomization is especially useful.
In other cases, factors which influence the response might be known, but possibly uncontrollable. Although such factors cannot be included in the design, we can at least observe their values. The analysis can then be adjusted to compensate for the effect of these variables using an analysis of covariance (to be discussed later in the course). In other situations, a nuisance factor may be both known and controllable, in which case we can reduce the risk of haphazard error by including this factor in the design of the experiment. Designs of this type, called blocked designs, can be used to reduce the variability of experimental error in such cases.
Example
A fleet manager wishes to compare 4 brands of tires to determine which has the least tread wear after 20,000 miles. Since there are 4 brands of tires to test, the study should ideally include at least 4 cars. Denote the tire brands by $T_1, T_2, T_3, T_4$ and the cars by $C_1, C_2, C_3, C_4$. One possible way to design the study is to randomly decide which car gets which brand of tire. Each car would then have 4 tires of the same brand.
However, if there is a difference between cars with respect to the wear they cause on the tires, then this design will not allow us to separate a difference between brands from a difference between cars. Although differences between cars are not of primary interest, they need to be taken into account. One possible way around this is to randomly assign the 16 tires (4 of each brand) to the 4 cars. The following allocation of tires to cars might result from such a randomization: [allocation table missing in source]
The goal of the design was to eliminate the confounding of tire effects with car differences, but this goal has not been met here. For example, brand $T_1$ is not used on car $C_3$, brand $T_2$ is not used on car $C_1$, and brand $T_4$ is not used on car $C_2$. So we need to ensure that there is no confounding and that random error does not include differences between cars. This could be accomplished by restricting the randomization so that each car must have one tire of each brand. That is, randomize the location of the tires within each car. An example of such a randomization scheme is as follows: [allocation table missing in source]
This design is known as a randomized complete block design.
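The restricted randomization can be sketched as follows: every car (block) receives each brand exactly once, and only the order in which brands go onto the wheel positions is randomized within the car. The function name, the seed, and the interpretation of list order as wheel position are assumptions made for this illustration.

```python
import random

def randomize_complete_blocks(treatments, blocks, seed=None):
    """Return {block: randomly ordered list containing every treatment exactly once}."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)  # every treatment appears once in every block
        rng.shuffle(order)        # randomization is restricted to within the block
        layout[block] = order
    return layout

brands = ["T1", "T2", "T3", "T4"]
cars = ["C1", "C2", "C3", "C4"]
layout = randomize_complete_blocks(brands, cars, seed=305)
for car, order in layout.items():
    print(car, order)  # list positions stand for the four wheel positions
```

Because each brand appears on every car, differences between cars affect all brands equally and no longer mask the brand comparison.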
Randomized Complete Block Design
A randomized complete block design is a restricted randomization. Experimental units are first organized into homogeneous groups called blocks. Treatments are then randomly allocated within each block. In the example above, cars were contributing to variation but were not of primary interest. The fact that each car requires 4 tires means that the 4 tires on one car form a natural blocking unit. The purpose of blocking is to ensure that experimental units within a block are as homogeneous as possible with respect to the response variable. Units in different blocks are more heterogeneous.
Advantages & Disadvantages
Using blocks allows us to control a factor not of primary interest. However, it requires that there be enough experimental units to ensure that each treatment can be used within each block. Further, it requires the researcher to assume that there is no interaction between blocks and treatments. Since block effects must be estimated in addition to treatment effects, the degrees of freedom available for estimating error are reduced.
Special Case: Paired t-test
The simplest example of a randomized complete block design is the design analyzed with a paired t-test. In this case there are 2 treatments to be studied, and each treatment is applied once within each pair of experimental units. For example, within each pair of twins, the 2 treatments might be randomly allocated to the two twins. Or the 2 treatments might be randomly allocated to left and right eyes, lungs, kidneys, hands, etc.
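For a concrete picture of the paired analysis, the sketch below computes the paired t statistic from within-pair differences, so that pair-to-pair (block) differences cancel out. The function name and the measurements are invented for illustration, for example responses for the left and right eye of four subjects.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(x, y):
    """Return (t, df) for paired samples x and y, based on the differences x - y."""
    d = [xi - yi for xi, yi in zip(x, y)]   # differencing removes the pair effect
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))      # mean difference over its standard error
    return t, n - 1

# Hypothetical responses: treatment 1 on the left eye, treatment 2 on the right eye.
left = [5.0, 7.0, 6.0, 9.0]
right = [4.0, 5.0, 6.0, 6.0]
t, df = paired_t_statistic(left, right)
print(f"t = {t:.3f} on {df} df")
```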
General Case: Two or More Treatments
Consider the case where there is one factor which will be studied for its effect on the response variable. Suppose that the number of levels of that factor is a. Further, suppose that it is known that there is a nuisance factor which can be controlled, and that this factor will be used to form blocks. Let b denote the number of blocks to be used in the experiment. The order in which the treatments will be allocated within blocks is randomized. The total number of experimental units required to conduct this experiment is N = ab.
The Model
We will use the following statistical model to express the response in terms of the treatment and block effects:
$Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$
where:
μ is the overall mean,
$\tau_i$ is the effect of the i-th treatment,
$\beta_j$ is the effect of the j-th block, and
$\varepsilon_{ij}$ is the residual or random error term.
It is possible that either or both of the treatments and blocks could be randomly chosen, but for now we assume that both are fixed.
Assumptions
As before, we will assume that $\varepsilon_{ij} \sim N(0, \sigma^2)$ and that the $\varepsilon_{ij}$ are independent of each other. Treatment and block effects are defined as deviations from the overall mean. Therefore, we require that
$\sum_{i=1}^{a} \tau_i = 0$ and $\sum_{j=1}^{b} \beta_j = 0$
Sources of Variation
When considered as a whole, the data from all treatment groups and all blocks will contain a certain amount of variability. Some of the variability might be due to the fact that the treatments have different effects on the response. Similarly, some of the variability might be due to the fact that blocks are quite heterogeneous with regard to the response. Finally, even if there were no treatment or block differences, there would still be chance variation. The total sum of squares is a measure of the overall variability in the sample, and it can be decomposed to allow us to determine how much variability is due to each source…
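One way to see this decomposition numerically is the sketch below, which assumes a randomized complete block layout with one observation per treatment-block combination and uses the standard split of the total sum of squares into treatment, block, and error parts. The function name and the data are hypothetical.

```python
from statistics import mean

def rcbd_sums_of_squares(y):
    """y[i][j] is the single response for treatment i in block j."""
    a, b = len(y), len(y[0])
    grand = mean(v for row in y for v in row)
    trt = [mean(row) for row in y]                                 # treatment means
    blk = [mean(y[i][j] for i in range(a)) for j in range(b)]      # block means
    cells = [(i, j) for i in range(a) for j in range(b)]
    return {
        "treatments": b * sum((m - grand) ** 2 for m in trt),
        "blocks": a * sum((m - grand) ** 2 for m in blk),
        "error": sum((y[i][j] - trt[i] - blk[j] + grand) ** 2 for i, j in cells),
        "total": sum((y[i][j] - grand) ** 2 for i, j in cells),
    }

# Hypothetical data: a = 3 treatments (rows) observed in b = 2 blocks (columns).
y = [[10.0, 14.0],
     [12.0, 18.0],
     [17.0, 19.0]]
ss = rcbd_sums_of_squares(y)
print(ss)
```

In this made-up data set the block sum of squares is a sizeable share of the total, which is the variability that blocking removes from the error term.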