Additive and multiplicative effects in a fixed 2 x 2 design using ANOVA can be difficult to differentiate
J. A. Landsheer & G. van den Wittenboer
Example two-factor ANOVA
A, B: arbitrarily coded independent variables {-1, +1}
Y: dependent variable
[Table of expected cell values by A and B (levels -1, +1); values as extracted: 4 8 12 16 24 36]

Several researchers have cast considerable doubt on whether ANOVA is sufficiently sensitive when interaction effects are involved. Wahlsten (1991) identified the problem that the sample size required to detect interaction effects is seven to nine times larger than that needed to detect main effects. McClelland and Judd (1993) estimated that the efficiency of interaction tests is reduced by 80% in comparison with main-effects tests, but unlike Wahlsten they attribute this reduction mainly to the use of field studies, while claiming that experimental studies are not so affected. Possibly, the reduced sensitivity has something to do with data of the kind examined here, for which the ANOVA model is not appropriate.

A two-factor ANOVA: an example
A behavioral researcher studies two treatments, A and B. Suppose the expected cell values are those of the example table above. In fact, we have assumed an interaction effect next to two main effects; that is, these results can be described as a simple interaction effect Y = A + B + A*B, where treatment A has the strengths 2 and 4 and treatment B also has the strengths 2 and 4. The ANOVA results, however, show a surprising picture:
Figure: Motivation * Ability interaction (Y = Performance). Means of the non-additive relation in the 2 x 2 design of Tables 1 and 2, produced by a single product term. The ANOVA does not show a significant interaction effect (N = 96).
Table 1: Exemplary ANOVA results of 96 cases. Demonstration of enlarged main effects and no interaction effect when testing the model Y = A + B + A*B.

Source             SS           df    MS         F        p       Partial eta squared
Corrected Model     4125.954     3     1375.3     35.3    .000    .535
Intercept          19615.168     1    19615.1    503.0    .000    .845
A                   2254.534     1     2254.5     57.8    .000    .386
B                   1868.284     1     1868.2     47.9    .000    .342
A * B                  3.136     1        3.1       .1    .777    .001
Error               3587.566    92       38.9
Total              27328.688    96
Corrected Total     7713.520    95

The researcher found two main effects and no significant interaction effect. The interaction effect is small and, in this example, not significant. Although a small significant interaction effect can be found in other trials, in all trials the main effects are bound to be much larger than the interaction. From a mathematical point of view these results are entirely valid. The statistically correct conclusion in this situation is that the interaction does not account for any variance above and beyond the effects of A and B. This interpretation does not exclude the existence of an interaction effect. A non-significant interaction effect only indicates that, once variance has been attributed to the main factors, too little variance is left to be attributed to an interaction.

In a practical sense the analysis is not very helpful, and it may put the researcher on the wrong track. The researcher may conclude that A and B have about the same treatment effects and that they can be used interchangeably. This conclusion would not be very useful, because the best treatment result arises only when both treatments are applied at the same time. If a new experiment used only a single treatment, the effect would be meager. Because we know how the dependent data have arisen, we have 20/20 hindsight.
Obviously, the ANOVA model is not the best model for the analysis of these data. But how can the researcher decide that this model is inappropriate?

Four questions
1. What happened? Why did the ANOVA result not indicate an interaction?
2. Is this a real situation, which can be expected in real experimenting, or is it just a simulation artefact?
3. Is there an analysis model that would be more appropriate for this kind of data?
4. How can the researcher decide that the chosen model of analysis does not fit this situation particularly well?
Statistical interpretation: rejection of the multiplicative model
Simulated data: Y = A’ * B’
A’, B’: treatment strengths, not directly observed
A, B: coded levels {-1, +1}
With two levels per factor, the codes and the treatment strengths are always linearly related. In a simulation this relationship is known:
A’ = A + 3
B’ = B + 3
Y = (A + 3)(B + 3) + error
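As a worked step, the simulated model can be expanded in terms of the coded levels. The expansion below is a sketch that follows directly from A’ = A + 3 and B’ = B + 3; the symbol e for the error term is introduced here for illustration only.

```latex
\begin{aligned}
Y &= A'B' + e = (A + 3)(B + 3) + e \\
  &= 9 + 3A + 3B + AB + e
\end{aligned}
```

In the coded variables, each main-effect term therefore carries a weight of 3 and the product term a weight of 1, although the data were generated by a single product of the treatment strengths.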
1. What happened? Diagnosis
Suppose the real treatment values of A’ are {a1, a2} and the real treatment values of B’ are {b1, b2}, with Y = A’ * B’ and n observations in each cell. The cell means are O1, O2, O3 and O4:

            A = -1    A = +1
B = -1      O1        O2
B = +1      O3        O4

Taking a1 and b1 as the strengths at level -1, the expected cell means are O1 = a1 b1, O2 = a2 b1, O3 = a1 b2 and O4 = a2 b2.
Then the grand mean is

$$\bar{X}_{tot} = \frac{O_1 + O_2 + O_3 + O_4}{4},$$

and the sums of squares are

$$SS_A = 4n\left[\frac{(b_1 + b_2)(a_1 - a_2)}{4}\right]^2,$$
$$SS_B = 4n\left[\frac{(a_1 + a_2)(b_1 - b_2)}{4}\right]^2,$$
$$SS_{AB} = 4n\left[\frac{(a_1 - a_2)(b_1 - b_2)}{4}\right]^2.$$

Both SSA and SSB have become dependent on a multiplicative part. Moreover, each of these sums of squares depends on the way the other variable is scaled: when (b1 + b2) or (a1 + a2) becomes larger in absolute value, the corresponding main effect is bound to become larger. However, when the levels have contrasting values (a1 = -a2 and b1 = -b2), both main-effect sums of squares become zero, because b1 + b2 = 0 and a1 + a2 = 0. SSAB does not depend on where the levels of either variable are located on the scale. It depends only on the product of both level differences, and it represents only the combinatory effect of both variables.
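The closed-form expressions can be checked numerically against sums of squares computed directly from the cell means. The following is a minimal sketch, assuming noise-free cell means Y = A’ * B’, a balanced design, and the cell layout O1 to O4 given in the diagnosis; the function names are hypothetical and chosen for this illustration.

```python
import numpy as np

def ss_from_cells(a1, a2, b1, b2, n):
    """Between-cell sums of squares computed from the four cell means."""
    # Rows are the levels of B, columns the levels of A (O1 O2 / O3 O4).
    cells = np.array([[a1 * b1, a2 * b1],
                      [a1 * b2, a2 * b2]], dtype=float)
    grand = cells.mean()
    a_means = cells.mean(axis=0)   # one mean per level of A
    b_means = cells.mean(axis=1)   # one mean per level of B
    ss_a = 2 * n * np.sum((a_means - grand) ** 2)
    ss_b = 2 * n * np.sum((b_means - grand) ** 2)
    interaction = cells - b_means[:, None] - a_means[None, :] + grand
    ss_ab = n * np.sum(interaction ** 2)
    return ss_a, ss_b, ss_ab

def ss_closed_form(a1, a2, b1, b2, n):
    """The same quantities from the closed-form expressions."""
    ss_a = 4 * n * ((b1 + b2) * (a1 - a2) / 4) ** 2
    ss_b = 4 * n * ((a1 + a2) * (b1 - b2) / 4) ** 2
    ss_ab = 4 * n * ((a1 - a2) * (b1 - b2) / 4) ** 2
    return ss_a, ss_b, ss_ab

# Strengths 2 and 4 for both treatments, 24 cases per cell.
print(ss_from_cells(2, 4, 2, 4, 24))
# Centered strengths -1 and +1: the main effects vanish, the interaction stays.
print(ss_from_cells(-1, 1, -1, 1, 24))
```

With strengths 2 and 4 the main-effect sums of squares are nine times the interaction sum of squares, while with centered strengths only the interaction remains.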
Figure: Illustration of the dependency of the expected mean squares of main factor A on the treatment results b1 and b2. n = 24 cases per cell (N = 96). The differences of the treatment results are kept constant: a2 - a1 = 0.5 and b2 - b1 = 0.5. The horizontal axis shows b1, ranging from -5 to 5; the vertical axis shows E[MSA]. The curves are:

Y = A + B + AB:
$$E[MS_A] = \frac{n\{(b_1 + b_2 + 2)(a_1 - a_2)\}^2}{4}, \qquad E[MS_{AB}] = \frac{n\{(b_1 - b_2)(a_1 - a_2)\}^2}{4} = 6$$

Y = AB:
$$E[MS_A] = \frac{n\{(b_1 + b_2)(a_1 - a_2)\}^2}{4}$$

Y = A + B:
$$E[MS_A] = \frac{n(2a_1 - 2a_2)^2}{4} = 6$$

In the model Y = A + B + AB, the terms A, B and A*B enter the result with equal weight, but their contributions to the ANOVA result are equal only when b1 + b2 = -1. When b1 + b2 grows smaller or larger, the contributions of the main effects grow, while the interaction effect remains constant. In effect, the main effects become overestimated, while the interaction effect is underestimated. This situation arises when the treatment strengths deviate further from a centered position.
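The dependency of E[MSA] on b1 can be tabulated from the three expressions for the main factor A. This is a minimal sketch under the stated settings (n = 24, a2 - a1 = 0.5, b2 - b1 = 0.5); like the expressions themselves it covers only the systematic part, without the error variance. The function name, the model labels, and the starting value a1 = 0 are assumptions made for the illustration (only the difference a1 - a2 enters the formulas).

```python
import numpy as np

N_PER_CELL = 24   # cases per cell (N = 96)
DIFF = 0.5        # a2 - a1 and b2 - b1, both kept constant

def expected_ms_a(model, b1, a1=0.0, n=N_PER_CELL, diff=DIFF):
    """Systematic part of E[MS_A] for the three data-generating models."""
    a2, b2 = a1 + diff, b1 + diff
    if model == "A+B":
        return n * (2 * a1 - 2 * a2) ** 2 / 4
    if model == "AB":
        return n * ((b1 + b2) * (a1 - a2)) ** 2 / 4
    if model == "A+B+AB":
        return n * ((b1 + b2 + 2) * (a1 - a2)) ** 2 / 4
    raise ValueError(f"unknown model: {model}")

# Tabulate the three curves over the range of b1 used on the horizontal axis.
print(f"{'b1':>6} {'A+B':>10} {'AB':>10} {'A+B+AB':>10}")
for b1 in np.linspace(-5, 5, 11):
    row = [expected_ms_a(m, b1) for m in ("A+B", "AB", "A+B+AB")]
    print(f"{b1:6.1f} {row[0]:10.3f} {row[1]:10.3f} {row[2]:10.3f}")
```

The additive curve is flat, whereas the two curves that contain a product term are parabolas in b1 whose minimum of zero lies where b1 + b2 = 0 (for Y = AB) or b1 + b2 = -2 (for Y = A + B + AB).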
Seriousness of the problem

Figure 2: ANOVA results of 21,000 simulations of a combinatory effect Performance = Ability * Motivation. The mean difference between the groups is kept constant (.5) for ability, while the mean manipulation difference of motivation is varied in 21 steps from 0 to 2. For each step the simulation is repeated 1000 times.

To demonstrate the seriousness of the problem, an extended simulation was created in Mathcad. Figure 2 shows the simulation results for a total of 21,000 runs. In all cases the performance results from the relationship performance = ability * motivation. Apart from the error component, we assumed a motivation of 1 for the untreated 'low motivation' subjects and an ability of 1 for the 'low ability' subjects. The ability of the 'high ability' subjects, who received a treatment to increase their ability, is assumed to be constant at 1.5. The motivation of the treated subjects is varied from 1 to 3 in 21 steps. For each step 1000 ANOVAs are calculated, each with a random error component. The error component is normally distributed with a mean of zero and a standard deviation of 1.

Figure 2 plots the proportion of statistically significant results (p < .05) for the main effect of motivation, the main effect of ability, and the interaction effect ability * motivation. The proportion of significant effects of ability grows from 60% to 100%, although the manipulation difference (apart from the error component) is constant at .5. The proportion of significant main effects of motivation rapidly grows to 100% as the simulated motivation difference grows from zero to about .75. In contrast, the proportion of significant ability * motivation interactions grows slowly, from about 5% (the nominal significance level, since no interaction is present when the motivation difference is zero) to about 60% when the maximum motivation difference of 2 between both groups has been reached.
The proportion of significant interaction effects remains clearly behind that of the main effects. This result is unexpected as the relationship between the factors is purely multiplicative.
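The simulation can be approximated in a few lines. The sketch below is not the Mathcad program used for Figure 2. It is a re-implementation under the stated settings (ability 1 versus 1.5, motivation 1 versus 1 to 3 in 21 steps, normal error with standard deviation 1, 1000 runs per step), and it assumes 24 cases per cell because the cell size of the simulation is not stated. The function names are hypothetical.

```python
import numpy as np
from scipy import stats

def anova_2x2_pvalues(y):
    """p-values (ability, motivation, interaction) for a balanced 2 x 2 design.

    y has shape (2, 2, n): ability level, motivation level, replicate.
    """
    n = y.shape[2]
    cell = y.mean(axis=2)
    grand = cell.mean()
    a_means = cell.mean(axis=1)          # per ability level
    b_means = cell.mean(axis=0)          # per motivation level
    ss_a = 2 * n * np.sum((a_means - grand) ** 2)
    ss_b = 2 * n * np.sum((b_means - grand) ** 2)
    inter = cell - a_means[:, None] - b_means[None, :] + grand
    ss_ab = n * np.sum(inter ** 2)
    df_err = 4 * (n - 1)
    ms_err = np.sum((y - cell[:, :, None]) ** 2) / df_err
    f_values = np.array([ss_a, ss_b, ss_ab]) / ms_err   # each effect has 1 df
    return stats.f.sf(f_values, 1, df_err)

def power_curve(n=24, reps=1000, steps=21, seed=0):
    """Proportion of significant effects per motivation step (n is an assumption)."""
    rng = np.random.default_rng(seed)
    ability = np.array([1.0, 1.5])
    rows = []
    for high in np.linspace(1.0, 3.0, steps):
        means = np.outer(ability, np.array([1.0, high]))  # purely multiplicative
        hits = np.zeros(3)
        for _ in range(reps):
            y = means[:, :, None] + rng.normal(0.0, 1.0, size=(2, 2, n))
            hits += anova_2x2_pvalues(y) < 0.05
        rows.append((high - 1.0, *(hits / reps)))
    return rows

if __name__ == "__main__":
    print("diff  ability  motivation  interaction")
    for diff, p_a, p_m, p_ab in power_curve():
        print(f"{diff:4.1f}  {p_a:7.3f}  {p_m:10.3f}  {p_ab:11.3f}")
```

Because the cell size is assumed, the proportions will not match Figure 2 exactly, but the ordering of the three curves can be inspected.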
Discussion

Can this situation arise in real experimenting?
Behavioral treatments, for instance, are often compared to attention-only control groups, and this control condition may have some positive effect. In medical studies, administering a medicine is regularly compared to a placebo, and it is well known that the latter condition often has a small positive effect. Moreover, a specific medicine may have a combinatory effect with a behavioral treatment, for instance because the medicine makes the patient receptive to the behavioral treatment. However, as we have demonstrated, the chance of finding the true interaction effect with ANOVA is limited. Most research using ANOVA would show strong main effects and a considerably smaller interaction, which may not be statistically significant. Obviously, there would be a mismatch between the ANOVA model and the actual data.

Is centering helpful?
It is easily concluded from the diagnosis presented above that, if the treatment strengths were centered (for instance -2 and +2), ANOVA would yield an interaction effect. But how can the researcher manipulate the treatments in such a manner that the treatment strengths are centered? That would be an extremely lucky shot. Conclusion: centering is not a real solution.

Sensitivity for interaction
Wahlsten (1991) found that the sample size required to detect interaction effects is seven to nine times larger than that needed to detect main effects. McClelland and Judd (1993) estimated the efficiency of interaction tests at about 20% of that of main-effects tests (a reduction of 80%), but unlike Wahlsten they attribute this reduction mainly to the use of field studies, while claiming that experimental studies are not so affected. Possibly, the reduced sensitivity has something to do with data of the kind discussed here, for which the ANOVA model is not appropriate.

Is multiple regression a better approach?
Yes and no. When the ANOVA model is mimicked, that is, when orthogonal coding is used for both independent variables, the results are the same.

Measurement of the independent variables
A better approach would be to estimate the strength of each treatment. If the researcher succeeds in obtaining measures that approach the real treatment strengths sufficiently (that is, 2 and 4 for both variables), then a multiple regression analysis indicates the interaction effect far more often.

Extended experimenting
Another way to improve on the situation is to run experiments in which each treatment is studied without any possible effect of the other treatment. It would then become clear that the experiments with the combination of treatments account for a considerably larger total amount of variance than either treatment on its own. However, this approach forfeits one of the advantages of the factorial design, which is supposed to allow the study of several factors at the same time.
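The difference between orthogonal coding and measured treatment strengths can be illustrated on noise-free cell means. The sketch below assumes the simulated relationship Y = (A + 3)(B + 3), so that the real strengths are 2 and 4 for both variables, and fits the same regression model with a product term twice. The function name is hypothetical, and the example leaves out the error term, so it shows where the effect loads rather than how often it is detected.

```python
import numpy as np

def fit_with_product_term(x_a, x_b, y):
    """Least-squares coefficients for y = c0 + c1*x_a + c2*x_b + c3*x_a*x_b."""
    design = np.column_stack([np.ones_like(x_a), x_a, x_b, x_a * x_b])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# One noise-free observation per cell of the 2 x 2 design.
code_a = np.array([-1.0, 1.0, -1.0, 1.0])
code_b = np.array([-1.0, -1.0, 1.0, 1.0])
y = (code_a + 3) * (code_b + 3)          # cell means 4, 8, 8, 16

# Orthogonal coding, as in ANOVA: most of the effect loads on A and B.
coded = fit_with_product_term(code_a, code_b, y)

# Measured strengths (2 and 4): the whole effect loads on the product term.
measured = fit_with_product_term(code_a + 3, code_b + 3, y)

print("orthogonal coding :", np.round(coded, 6))
print("measured strengths:", np.round(measured, 6))
```

With orthogonal codes the fitted weights of A and B are three times that of the product term, whereas with the measured strengths only the product term receives a non-zero weight.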