Published by Vivian McLaughlin. Modified over 9 years ago.
1
1 Psych 5510/6510 Chapter 13 ANCOVA: Models with Continuous and Categorical Predictors Part 2: Controlling for Confounding Variables Spring, 2009
2
2 Contexts In this chapter we are looking at three contexts in which we might want to combine continuous and categorical variables in our model: 1. Within a ‘true experimental’ design, where we can use this approach to increase the power of the design and to add sophistication to our model. 2. Within a ‘quasi-experimental’ or ‘static group’ design, where we can use this approach to control a confounding variable. 3. Within a correlational design, where we can introduce a categorical variable to better understand a continuous variable.
3
3 Context 2: Controlling for Confounding Variables In this context we include continuous variables in the model as a way of controlling confounding variables, particularly within quasi-experimental designs and static group designs.
4
4 Quasi-Experimental Designs In a quasi-experimental design subjects are not randomly assigned to groups, usually for some practical or ethical reason. The groups are still assumed to be essentially as equal as they would have been if random assignment had been possible. The independent variable is applied to the groups, and if the group means are subsequently found to be different, this difference is attributed to the independent variable.
5
5 Confounding Variables Because the subjects are not randomly assigned to groups, the door is open for a certain type of confounding variable: pre-existing differences between the groups that could account for the difference between the group means found after the independent variable is applied.
6
6 Static Group Design In a static group design the independent variable is what we use to divide subjects into groups. When, for example, the independent variable is ‘gender’, the subjects are assigned to groups based upon their gender, and we then measure some dependent variable to see if there is a difference between the genders. The independent variable ‘gender’ is a pre-existing property of the subjects; in a static group design the independent variable is not manipulated by the experimenter.
7
7 Quasi-experimental vs. Static Group Make sure you understand the difference between the two designs. Both involve non-random assignment to groups. In a quasi-experimental design you hope the groups start off as equal as they would have been had you been able to assign subjects randomly; you then manipulate the independent variable to see if you can make the groups different. In a static group design you use the independent variable to divide subjects into groups, then measure to see if the groups are different.
8
8 Example (Quasi-Experimental) We want to examine the effectiveness of two different curricula for teaching mathematics (curricula ‘b1’ and ‘b2’). We have two different teachers (teachers ‘a1’ and ‘a2’). Each teacher teaches both curricula in different classes, giving us four treatment combinations (classes) in a 2 by 2 factorial design. The dependent variable is the score on an identical final exam given in all four classes.
9
9 Design 2-factor design (note: as there are only two levels of each I.V., we can code each I.V. and the interaction term with one contrast each). I.V. A = teacher a1 or teacher a2 (i.e. $X_{1i}$). I.V. B = old curriculum or new curriculum (i.e. $X_{2i}$). Interaction of A and B (i.e. $X_{3i}$). D.V. = scores on a final exam (i.e. $Y_i$). If we could randomly assign students to treatment combinations it would be a true experimental design. In this case, however, students are allowed to select which class they take. Because 1) assignment to groups is non-random and 2) the independent variables (teacher and curriculum) are manipulated by the experimenter, this is a quasi-experimental design.
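The coding described on this slide can be sketched in a few lines of Python. The slide does not say which contrast values are used, so the ±1 codes below (and which level gets +1) are an assumption made for illustration. The interaction code is the product of the two main-effect codes.

```python
from itertools import product

# Assumed contrast codes: the +1/-1 assignment is illustrative, not from the slides.
TEACHER_CODE = {"a1": +1, "a2": -1}        # X1
CURRICULUM_CODE = {"old": +1, "new": -1}   # X2

def code_cell(teacher, curriculum):
    """Return (X1, X2, X3) for one class, where X3 = X1 * X2 codes the interaction."""
    x1 = TEACHER_CODE[teacher]
    x2 = CURRICULUM_CODE[curriculum]
    return x1, x2, x1 * x2

# One row of codes per treatment combination (class).
design = {cell: code_cell(*cell) for cell in product(TEACHER_CODE, CURRICULUM_CODE)}

for cell, codes in design.items():
    print(cell, codes)
```

With equal numbers of students per class, these three code columns are mutually uncorrelated. Under self-selection the class sizes (and the covariate) need not be balanced, which is where redundancy can enter.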
10
10 Analysis Without Controlling for Confounding Variables Model: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i}$
11
11 Analysis Without Controlling for Confounding Variables (cont.) Model: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i}$ The effects of teacher (X1) and of the teacher × curriculum interaction (X3) are statistically significant.
12
12 Confounding Variable But letting the students select which class they take introduces possible confounding variables. The one we will examine is the possibility that the students who are better at math may prefer one teacher over the other, or one approach over the other, and this in turn influences which class they take. If this is the case, then the difference between the cell means might not be due to the independent variables but instead be due to the better students deciding to enroll in a particular class. In other words, we would have gotten different results if the students were randomly assigned to the cells.
13
13 Covariate To control this confounding variable we decide to give everyone a pre-test that measures mathematical ability (our ‘covariate’ in ANOVA terms). We will call this variable ‘Z’ to make it easier to keep track of it when we include it in a model with the categorical variables X1, X2, and X3.
14
14 Redundancy If we had randomly assigned students to classes then we would expect the mean value of Z (math ability) to be about the same in each cell, which would make Z not redundant with our categorical variables. But in this case we are including Z specifically because we think its mean value is not the same in each cell. We think that knowing which cell a student is in would help us predict their Z score, and vice versa. In other words we think that Z and the categorical variables are at least somewhat redundant.
15
15 Testing to Determine Whether Z is a Confounding Variable To test whether math ability (measured by variable Z) is a confounding variable we will test whether it is redundant with the independent variables of the experiment. If we can use the independent variables to predict Z then Z is at least somewhat redundant with the independent variables. If it is redundant then it can account for some of the same error that the independent variables can (making it a confounding variable).
16
16 Testing to Determine Whether Z is a Confounding Variable Model C: $\hat{Z}_i = \beta_0$ Model A: $\hat{Z}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i}$ If the independent variables, coded by X1, X2, and X3, can predict Z then they are redundant with Z.
17
17 Model C: $\hat{Z}_i = \beta_0$ Model A: $\hat{Z}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i}$ Z is redundant with the independent variables and thus is a confounding variable. Note that $1 - R^2 = 1 - .068 = 0.932$ is the tolerance of Z in a design that contains Z, X1, X2, and X3 as predictors. This is not a lot of redundancy, but it is still statistically significant (p = .041).
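As a rough sketch of this redundancy test, the snippet below computes the PRE of Model A over Model C for a small, made-up set of pre-test scores (the numbers, cell labels, and ±1 codes are hypothetical, not the data behind the slides). Because X1, X2, and X3 together saturate a 2 × 2 design, Model A predicts each cell's mean of Z, so no regression routine is needed.

```python
from statistics import mean

# Hypothetical pre-test scores Z, keyed by (teacher code X1, curriculum code X2).
z_by_cell = {
    (+1, +1): [60, 70],
    (+1, -1): [60, 70],
    (-1, +1): [50, 60],
    (-1, -1): [50, 60],
}

all_z = [z for scores in z_by_cell.values() for z in scores]
grand_mean = mean(all_z)

# Model C predicts the grand mean of Z for every student.
sse_c = sum((z - grand_mean) ** 2 for z in all_z)
# Model A (X1, X2, X3) predicts each student's cell mean of Z.
sse_a = sum((z - mean(scores)) ** 2 for scores in z_by_cell.values() for z in scores)

pre = (sse_c - sse_a) / sse_c      # equals R-squared of Z on X1, X2, X3
tolerance = 1 - pre                # tolerance of Z alongside X1, X2, X3

n, pa, pc = len(all_z), 4, 1       # observations, parameters in Model A and Model C
f_stat = (pre / (pa - pc)) / ((1 - pre) / (n - pa))

print(f"PRE = {pre:.3f}, tolerance = {tolerance:.3f}, F({pa - pc}, {n - pa}) = {f_stat:.3f}")
```

In this toy data set the redundancy is far larger than the .068 reported on the slide; the point is only the mechanics of going from the two error sums to PRE, tolerance, and F.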
18
18 Remember that if the PRE of using X1, X2, and X3 to predict Z is statistically significant then we can conclude that Z is redundant. Given the nature of null hypothesis testing, however, a PRE that is not statistically significant does not prove that Z is not redundant (sorry about the double negative), for failure to reject H0 does not prove H0 is true. In other words, we can find evidence that Z is a confounding variable, but we cannot prove that it is not.
19
19 Including Z in the Model If we conclude that Z is a confounding variable then we will want to include it in our model. $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 Z_i$
20
20 Controlling the Confounding Variable $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 Z_i$ Remember that when we analyze $\beta_1$, $\beta_2$, and $\beta_3$ we are looking at how much their corresponding variables add to the model after all of the other terms (including Z) are included. So we are looking for the effects of the independent variables after individual differences in math ability have already been accounted for. In this way we have taken the confounding variable (Z) out of the analyses of the independent variables.
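The "added last" logic can be illustrated with a short NumPy sketch. The eight scores below are invented so that the two teachers are equally effective but the stronger students sit in teacher a1's classes; the ±1 codes and the helper names `sse` and `pre_added_last` are likewise assumptions for illustration, not part of the original analysis.

```python
import numpy as np

# Hypothetical data, 2 students per cell of the 2 x 2 design (made up for illustration).
x1 = np.array([+1, +1, +1, +1, -1, -1, -1, -1], dtype=float)  # teacher: a1 = +1, a2 = -1
x2 = np.array([+1, +1, -1, -1, +1, +1, -1, -1], dtype=float)  # curriculum contrast
x3 = x1 * x2                                                   # interaction contrast
z = np.array([60, 70, 60, 70, 50, 60, 50, 60], dtype=float)    # pre-test, higher for a1
y = np.array([58, 72, 62, 68, 48, 62, 52, 58], dtype=float)    # final exam

def sse(predictors, outcome):
    """Sum of squared errors of the least-squares fit (intercept included)."""
    design = np.column_stack([np.ones(len(outcome))] + predictors)
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    residuals = outcome - design @ coefs
    return float(residuals @ residuals)

def pre_added_last(target, others, outcome):
    """PRE from adding `target` to a model that already contains `others`."""
    sse_compact = sse(others, outcome)
    sse_augmented = sse(others + [target], outcome)
    return (sse_compact - sse_augmented) / sse_compact

pre_x1_without_z = pre_added_last(x1, [x2, x3], y)     # teacher effect, Z ignored
pre_x1_with_z = pre_added_last(x1, [x2, x3, z], y)     # teacher effect, Z controlled

print(f"PRE for X1 without Z: {pre_x1_without_z:.3f}")
print(f"PRE for X1 with Z:    {pre_x1_with_z:.3f}")
```

In this constructed example the teacher contrast looks substantial when Z is left out and contributes essentially nothing once Z is already in the model, which is the pattern the slide describes.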
21
21 Power We are including Z specifically because it is redundant with the categorical predictors, but this can hurt the power of our test to see if the independent variables had an effect (see next slide). This, however, is what we want, for we are removing the confounding variable that was giving us an unrealistic picture of the effects of the independent variables.
22
22 PRE When Predictors are Redundant
23
23 Analysis When Controlling for Confounding Variable Z Model: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 Z_i$
24
24 Analysis (cont.) 1) When we included the confounding variable in the model (thus removing its effect from the analysis of the independent variables), variable X1 (the difference between the two teachers) was no longer statistically significant. This makes sense (to me), for when I created the data I simulated the better students preferring to take the classes offered by teacher a1. When this confound was removed from the analysis, the difference in effectiveness between the two teachers was no longer as great.
25
25 Analysis (cont.) 2) The effect of curriculum (X2) remained nonsignificant. 3) The interaction of the two independent variables (X3) remained significant; in fact its p value went down a little when Z was included in the model. Why would that happen? In this case the answer probably lies in the tolerances. The tolerance of X3 (tolerances are not shown here) is 1.00, so it wasn’t redundant at all with the other predictor variables (including Z). If it wasn’t redundant with Z, then Z probably served to remove some of the within-group variance involved in that contrast.
26
26 What We Have Done (ANOVA) $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 Z_i$ In terms of ANOVA, we have used Z to adjust the mean of each cell in such a way that the effect of the confounding variable is removed before the analysis of each independent variable begins (because each variable is analyzed as if it were added last to a model that contains the other variables).
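One common way to picture "adjusting the mean of each cell" is the textbook ANCOVA adjusted mean: each cell's mean of Y is shifted by the pooled within-cell slope of Y on Z times the distance of that cell's mean Z from the grand mean of Z. The sketch below applies that formula to made-up scores; the data and cell labels are hypothetical, and the slides do not show this computation explicitly.

```python
from statistics import mean

# Hypothetical (pre-test Z, final exam Y) pairs, keyed by (teacher, curriculum).
cells = {
    ("a1", "old"): [(60, 58), (70, 72)],
    ("a1", "new"): [(60, 62), (70, 68)],
    ("a2", "old"): [(50, 48), (60, 62)],
    ("a2", "new"): [(50, 52), (60, 58)],
}

grand_mean_z = mean([z for pairs in cells.values() for z, _ in pairs])

# Pooled within-cell slope of Y on Z (the Z coefficient in the additive model).
numerator = denominator = 0.0
cell_means = {}
for cell, pairs in cells.items():
    z_bar = mean([z for z, _ in pairs])
    y_bar = mean([y for _, y in pairs])
    cell_means[cell] = (z_bar, y_bar)
    numerator += sum((z - z_bar) * (y - y_bar) for z, y in pairs)
    denominator += sum((z - z_bar) ** 2 for z, _ in pairs)
slope = numerator / denominator

# Adjusted mean: the cell's predicted Y if its students had average math ability.
adjusted = {
    cell: y_bar - slope * (z_bar - grand_mean_z)
    for cell, (z_bar, y_bar) in cell_means.items()
}

for cell in cells:
    print(cell, "raw mean:", cell_means[cell][1], "adjusted mean:", adjusted[cell])
```

In this toy data the raw cell means differ by teacher, but the adjusted means coincide, because the raw difference was produced entirely by the difference in pre-test scores.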
27
27 What We Have Done (Model-Comparison Approach) $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 Z_i$ In model-comparison terms we are simply interested in whether it is worthwhile to include the pre-test (math ability) scores in our model, and in how that influences whether the other variables are worth including.
28
28 Summary of the Two Approaches In the ANOVA perspective we are trying to remove a confounding variable from the analysis of the effects of our independent variables. In the model-comparison approach we are trying to come up with the best model of the dependent variable.
29
29 Caveat This provides a simple way to control for confounding variables in an experiment that does not have random assignment to groups. You identify possible confounding variables, measure them, determine if they are confounding, then see if the effects of the independent variable change when the effects of the confounding variables are controlled by adding them to the model. This is not, however, a sure-fire approach, for in a quasi-experimental design it is hard to know for sure whether or not you have thought of all possible confounding variables.
30
30 Further Explorations Following the model-comparison perspective, let’s say we find that it is worthwhile to have both the categorical variables (coding the independent variables) and the continuous variable (our confounding variable) in our model. It might be interesting to see if it would be worthwhile to move from an additive model to an interactive model: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 Z_i + \beta_5 Z_i X_{1i} + \beta_6 Z_i X_{2i} + \beta_7 Z_i X_{3i}$
31
31 Further Explorations Let’s examine what the interaction terms would measure: X1 codes which teacher is teaching the class, so $ZX_1$ would look at whether the difference in effectiveness between the teachers was dependent upon the math ability of the student. It could be that for the students of low math ability the difference between the teachers was quite high, but for those of high math ability there was little difference between the teachers. Or, it could be that one teacher was better with students of low ability while the other was better with students of high ability. Very interesting!
32
32 Further Explorations X2 codes which curriculum was used, so $ZX_2$ would look at whether the difference between the curricula was dependent upon the math ability of the student. That could be quite interesting. X3 codes the interaction between teacher and curriculum, so $ZX_3$ would look at whether the interaction between the variables was dependent upon the math ability of the student. That’s a little harder to conceptualize but that could be quite interesting as well.
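A short derivation may make these interpretations concrete. It assumes ±1 contrast codes with $X_3 = X_1 X_2$ (the slides do not state the code values). Collecting every term of the interactive model that contains $X_1$ and comparing teacher a1 ($X_1 = +1$) with teacher a2 ($X_1 = -1$) at a fixed curriculum code $X_2$ and ability $Z$ gives:

$$
\hat{Y}(X_1 = +1) - \hat{Y}(X_1 = -1) = 2\left[(\beta_1 + \beta_5 Z) + (\beta_3 + \beta_7 Z)\,X_2\right]
$$

Under this coding, $\beta_5$ describes how the teacher difference, averaged over the two curricula, changes with math ability, and $\beta_7$ describes how the teacher × curriculum interaction changes with math ability. The same argument applied to the terms containing $X_2$ shows $\beta_6$ playing the corresponding role for the curriculum difference.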
33
33 Effects of Confounding Variables It is important to note that confounding variables can hide real effects of the independent variable as well as create apparent effects of the independent variables that don’t really exist. So far we have taken a look at the latter case, where math ability created a difference between teacher effectiveness that went away when the confounding variable was included in the model.
34
34 Second Data Set In the first data set I had the better students choose to take classes from teacher a1, when actually teachers a1 and a2 were equally effective. The effect of the confounding variable was to artificially increase the difference between the apparent effectiveness of the two teachers. In this second data set I had the better students take the class from teacher a1 when teacher a2 was actually more effective. This greater effectiveness of teacher a2, however, is being hidden by the confounding variable of the better students selecting the class taught by the less effective teacher a1.
35
35 Analysis Without Controlling for Confounding Variable Z Model: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i}$ Notice that none of the variables that code the independent variables are statistically significant, including that of teacher effectiveness (X1). Despite my having created data where teacher a2 is better than teacher a1, that difference is being hidden by the confounding variable.
36
36 Analysis With Controlling for Confounding Variable Z Model: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 Z_i$ Notice that when we control the confounding variable by including it in the model, variable X1 (which codes the difference in effectiveness between the two teachers) is now statistically significant.
37
37 Static Group Designs The exact same procedures for controlling a confounding variable can be used in a static group design. In a static group design subjects are not randomly assigned to groups, and the independent variable is not manipulated by the experimenter. Instead, the independent variable is used to sort subjects into groups.
38
38 Example An example of a static group design would be to examine the effect of gender on salaries in some company. Gender is the independent variable, but it is not something that is ‘done’ to the subjects. Instead it is the criterion by which subjects are assigned to one group or the other.
39
39 Confounding Variable Confounding variables in static group designs have to do with other systematic differences between the groups that arise when you use the independent variable to sort the subjects. In our example, if we use gender to sort employees into two groups, we might also end up sorting them by how many years they have worked for the company. It could be that only within the last ten years has the company given women an equal opportunity in the hiring process. If that is the case and we find that the two groups have different mean incomes, is that attributable to our independent variable (gender) or is it due to the confounding variable (years of service)?
40
40 Controlling the Confounding Variable To control this confounding variable we simply add to our model how many years the employee has worked at the company. Independent variable: gender ($X_{1i}$). Confounding variable: years of employment ($X_{2i}$). Dependent variable: salary ($Y_i$). Model: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$ Now when we analyze the effect of gender (X1) on salary (Y) we will be doing so after years of employment (X2) has been accounted for. That is, we are looking at differences in salary between genders for people who have been there an equal amount of time.
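One way to see what "after years of employment has been accounted for" means is the partial-regression (Frisch-Waugh-Lovell) view: remove from both gender and salary whatever years of employment can predict, then regress the leftover salary on the leftover gender. The resulting slope equals the gender coefficient in the two-predictor model. The sketch below uses invented employees, salaries in thousands, and an assumed ±1 gender code; none of these numbers come from the slides.

```python
from statistics import mean

def simple_fit(x, y):
    """Return (intercept, slope) of the least-squares line predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def residuals(x, y):
    """Part of y that a straight line in x cannot predict."""
    intercept, slope = simple_fit(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Hypothetical employees: gender code (+1 = men, -1 = women) and years at the company.
gender = [+1, +1, +1, +1, -1, -1, -1, -1]
years = [12, 15, 18, 20, 2, 4, 6, 12]
# Made-up salaries built as 40 + 2 * years + 1.5 * gender code (thousands).
salary = [40 + 2 * yr + 1.5 * g for g, yr in zip(gender, years)]

# Gender slope when years of employment is ignored.
_, raw_gender_slope = simple_fit(gender, salary)

# Gender slope with years of employment held constant (partial regression).
_, adjusted_gender_slope = simple_fit(residuals(years, gender), residuals(years, salary))

print(f"raw slope: {raw_gender_slope:.2f}, adjusted slope: {adjusted_gender_slope:.2f}")
```

In this constructed example most of the apparent gender difference comes from the difference in years of service; the adjusted slope recovers the 1.5 that was built into the salaries.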
41
41 More Work to be Done This study is still pretty unsophisticated. For example, when we measure how many years each employee has worked for the company we are missing out on how much experience they had before they were hired. There also could be other confounding variables we have yet to add, for example, differences in educational levels between the genders. And, there are other variables that would be interesting to add as we progress. For example, whether it is worthwhile to add to the model the gender of the person making the decisions regarding promotions.
42
42 Interaction We might also include in our model the interaction of gender with years at the job. This could tell us whether the rate of getting salary increases over the years is dependent upon gender.
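Written out, and again assuming a ±1 code for gender ($X_1$) with years at the job as $X_2$, the interactive model and the slope of salary on years within each gender would be:

$$
\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i},
\qquad
\frac{\partial \hat{Y}}{\partial X_2} = \beta_2 + \beta_3 X_1
$$

Under that coding the yearly salary increase is $\beta_2 + \beta_3$ for the group coded $+1$ and $\beta_2 - \beta_3$ for the group coded $-1$, so testing $\beta_3$ asks whether the two rates differ.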