1
January 6, 2009 - afternoon session
Statistics Micro Mini
Multiple Regression
January 5-9, 2009
Beth Ayers
2
Tuesday 1pm-4pm Session
Dummy Variables
Multiple regression
‒ Using quantitative and categorical explanatory variables
‒ Interactions among explanatory variables
‒ Linear regression vs. ANCOVA
Two article critiques
3
Dummy Variables
Categorical explanatory variables can be used in a linear regression if they are coded as dummy variables
For binary variables, the most frequently used codes are 0/1 and -1/+1
For a nominal variable with k levels, create k-1 explanatory variables that are each 0/1
‒ Each subject can have a value of one for at most one of these explanatory variables; a subject in the baseline level has zero for all of them
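The k-1 coding rule can be sketched in a few lines of Python. This is only an illustration: the function name `dummy_code` and the sample tutor labels are made up for the example and do not come from the slides.

```python
def dummy_code(values, baseline):
    """Code a nominal variable with k levels as k-1 columns of 0/1 values.

    `baseline` is the level that receives 0 in every column.
    """
    levels = sorted(set(values))
    if baseline not in levels:
        raise ValueError("baseline must be one of the observed levels")
    non_baseline = [lev for lev in levels if lev != baseline]
    # One row per subject; at most one column in each row equals 1.
    return [{lev: int(v == lev) for lev in non_baseline} for v in values]


# Hypothetical data: five students and the tutor each one used.
tutors = ["A", "B", "C", "B", "A"]
rows = dummy_code(tutors, baseline="A")
for tutor, row in zip(tutors, rows):
    print(tutor, row)
```

With three levels the function produces two columns, and every student who used the baseline tutor gets zeros in both.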
4
Dummy Variables
Suppose we'd like to use a categorical variable that indicates which tutor (A or B) the student used.
Define: [dummy-variable definition shown as an image on the original slide]
5
Significance testing
To test if X₂ has an effect
‒ H₀: β₂ = 0
‒ H₁: β₂ ≠ 0
‒ This is the usual t-test for a regression coefficient; we don't need to do anything different for dummy variables
If β₂ = 0, then there is no difference between the mean response of Tutor A and Tutor B
If β₂ ≠ 0, then β₂ is the difference between the mean response for Tutor A and Tutor B
6
Interpretation
Y = β₀ + β₁*X₁ + β₂*X₂
Can think of this as two equations
‒ When X₂ = 0
  ‒ Y = β₀ + β₁*X₁
‒ When X₂ = 1
  ‒ Y = β₀ + β₁*X₁ + β₂*1 = (β₀ + β₂) + β₁*X₁
Then β₀ + β₂ is the new intercept for the case where X₂ = 1
7
Dummy Variables
Suppose we have three tutors (A, B, C). Define:
‒ X₂ = 1 if the student used Tutor B, 0 otherwise
‒ X₃ = 1 if the student used Tutor C, 0 otherwise
Tutor A is considered the baseline
8
Interpretation
Y = β₀ + β₁*X₁ + β₂*X₂ + β₃*X₃
Can think of this as three equations
‒ When X₂ = 0 and X₃ = 0
  ‒ Y = β₀ + β₁*X₁
‒ When X₂ = 1 and X₃ = 0
  ‒ Y = β₀ + β₁*X₁ + β₂*1 = (β₀ + β₂) + β₁*X₁
‒ When X₂ = 0 and X₃ = 1
  ‒ Y = β₀ + β₁*X₁ + β₃*1 = (β₀ + β₃) + β₁*X₁
9
Interpretation
β₂ is then the difference between the mean response for Tutor A and Tutor B
β₃ is then the difference between the mean response for Tutor A and Tutor C
To formally compare Tutor B to Tutor C, one must rerun the regression using either Tutor B or Tutor C as the baseline
‒ To informally compare them, one can look at the difference between β₂ and β₃
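A rough sketch of what changing the baseline does to the coefficients is given below. Everything in it is an assumption made for illustration: the data are made up and noise-free, and the small pure-Python least-squares solver (`solve`, `ols`, `design`) is there only to keep the example self-contained. A statistics package would normally do the fit and would also report the standard errors and p-values needed for the formal comparison, which this sketch does not compute.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def design(x1, tutors, baseline):
    """Columns: intercept, X1, then one 0/1 dummy per non-baseline tutor."""
    others = [t for t in ("A", "B", "C") if t != baseline]
    return [[1.0, x, *(float(t == o) for o in others)] for x, t in zip(x1, tutors)]


# Made-up, noise-free data: Y = 10 + 2*X1, shifted by -4 for Tutor B and +1 for Tutor C.
x1 = [1.0, 2.0, 3.0, 1.0, 2.0, 4.0, 2.0, 3.0, 5.0]
tutors = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
shift = {"A": 0.0, "B": -4.0, "C": 1.0}
y = [10 + 2 * x + shift[t] for x, t in zip(x1, tutors)]

fit_a = ols(design(x1, tutors, "A"), y)  # intercept, X1, Tutor B, Tutor C
fit_b = ols(design(x1, tutors, "B"), y)  # intercept, X1, Tutor A, Tutor C
print([round(v, 6) for v in fit_a])
print([round(v, 6) for v in fit_b])
```

In this toy data the Tutor C coefficient with Tutor B as baseline equals the difference between the Tutor C and Tutor B coefficients from the Tutor A fit, which is the informal comparison described on the slide. Only the refit attaches a t-test to that difference.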
10
Significance testing
Again, we can use the usual t-test for a regression coefficient; we don't need to do anything different
11
Example
Want to see if there is a gender effect in predicting efficiency
Efficiency = β₀ + β₁*WPM + β₂*Gender
‒ where WPM is words per minute and Gender = 0 for males, 1 for females
12
Example
13
Example
14
Example
Step 1
‒ F-statistic: 791
‒ P-value = 0.0000
So at least one of the two variables is important in predicting Efficiency
15
Example
Step 2
‒ Test words per minute
  ‒ T-statistic: -38.34
  ‒ P-value = 0.000
‒ Test Gender
  ‒ T-statistic: -11.32
  ‒ P-value = 0.000
Both words per minute and gender are important in predicting efficiency
16
Example
Regression Equations
‒ Males
  ‒ Efficiency = 84.77 - 0.49*WPM
‒ Females
  ‒ Efficiency = 84.77 - 0.49*WPM - 3.14*1
  ‒ Efficiency = 81.63 - 0.49*WPM
17
Interpretation of the Parameters
For words per minute: holding gender constant, for each additional word per minute that a student can type, their efficiency improves by about 0.5 minutes (the fitted coefficient is -0.49, so the predicted value falls by 0.49 minutes)
For Gender: holding words per minute constant, females are, on average, more efficient by 3.14 minutes (the fitted coefficient is -3.14)
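The fitted equation can be evaluated directly to check these interpretations. The sketch below assumes Gender is coded 0 for males and 1 for females, which is inferred from the regression equations on the previous slide. The function name and the typing speed of 40 words per minute are made up for the example.

```python
def predict_efficiency(wpm, gender):
    """Fitted equation from the slides; gender is assumed 0 for male, 1 for female."""
    return 84.77 - 0.49 * wpm - 3.14 * gender


# Same typing speed, different gender: the gap is the Gender coefficient.
male = predict_efficiency(40, 0)
female = predict_efficiency(40, 1)
print(round(male, 2), round(female, 2), round(female - male, 2))

# Same gender, one more word per minute: the change is the WPM coefficient.
print(round(predict_efficiency(41, 0) - predict_efficiency(40, 0), 2))
```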
18
Interaction
An interaction occurs between two or more explanatory variables (not between an explanatory variable and the response variable)
An interaction occurs when the effect of a change in the level or value of one explanatory variable depends on the level or value of another explanatory variable
In regression we account for an interaction by adding a variable that is the product of two existing explanatory variables
19
Interpretation
Y = β₀ + β₁*X₁ + β₂*X₂ + β₃*X₁*X₂
Assume that X₂ is a dummy variable and X₁*X₂ is the interaction
Again, can think of this as two equations
‒ When X₂ = 0
  ‒ Y = β₀ + β₁*X₁
‒ When X₂ = 1
  ‒ Y = β₀ + β₁*X₁ + β₂*1 + β₃*X₁*1 = (β₀ + β₂) + (β₁ + β₃)*X₁
We can think of this as a new intercept and a new slope for the case where X₂ = 1
20
Interpretation
Y = β₀ + β₁*X₁ + β₂*X₂ + β₃*X₁*X₂
β₃ is called the interaction effect
β₁ and β₂ are called main effects
21
Interpretation
If β₃ is not significant, drop the interaction and rerun the regression
‒ Including the interaction when it is not significant can alter the interpretations of the other variables
If β₃ is significant, we do not need to check whether β₁ and β₂ are significant; we always keep X₁ and X₂ in the regression when their interaction is included
22
Interaction Example
Suppose we have two versions of a tutor and we want to know which helps students study for a math test
In addition, we want to know if a student's SAT math score affects their exam score
We know which tutor each student used and we also have their SAT math score and their exam score
23
Interaction Example - EDA
24
Interaction Example
Sample output
25
Interaction Example
Step 1: are any of the variables significant in predicting exam score?
‒ F-statistic: 6025
‒ P-value = 0.000
Step 2: check the interaction first
‒ T-statistic: 15.980
‒ P-value = 0.000
Do not need to check main effects since the interaction is significant
26
Interaction Example
Regression equation
Tutor A (tutor = 1)
‒ Exam score = (2.62 + 6.39) + (0.06 + 0.05)*MathSAT
‒ Exam score = 9.01 + 0.11*MathSAT
Tutor B (tutor = 0)
‒ Exam score = 2.62 + 0.06*MathSAT
27
Interpretation of Coefficients
At a Math SAT score of 0, students using Tutor A score on average 6.39 points higher than students using Tutor B; because of the interaction, this gap grows by 0.05 points for each additional Math SAT point
For students using Tutor A, for each point that their Math SAT score increases, their exam score increases by 0.11
For students using Tutor B, for each point that their Math SAT score increases, their exam score increases by 0.06
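A short sketch using the fitted coefficients from the regression equation (2.62, 0.06, 6.39 and 0.05, with tutor = 1 for Tutor A and 0 for Tutor B) shows how the gap between the two tutors depends on the Math SAT score. The function names and the SAT values 0, 500 and 700 are chosen only for illustration.

```python
B0, B_SAT, B_TUTOR, B_INTERACTION = 2.62, 0.06, 6.39, 0.05


def exam_score(math_sat, tutor):
    """Fitted equation with the interaction; tutor = 1 for Tutor A, 0 for Tutor B."""
    return B0 + B_SAT * math_sat + B_TUTOR * tutor + B_INTERACTION * math_sat * tutor


def tutor_gap(math_sat):
    """Predicted Tutor A score minus Tutor B score at a given Math SAT score."""
    return exam_score(math_sat, 1) - exam_score(math_sat, 0)


# The gap is 6.39 only at a Math SAT score of 0 and widens as the score rises.
for sat in (0, 500, 700):
    print(sat, round(tutor_gap(sat), 2))
```

Because the interaction coefficient is positive, the two fitted lines are not parallel, so a single number cannot summarize the tutor difference across all SAT scores.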
28
Interaction Example
29
Example
Explanatory variables
‒ GPA (0-5 scale)
‒ Math SAT score
‒ Time on tutor (in hours)
‒ Tutor used (A, B, C)
Response variable
‒ Exam score
30
Exploratory Analysis
                Exam score   GPA     Math SAT   Time on Tutor
Exam score      1.00         0.14    0.95       0.02
GPA                          1.00    -0.07      -0.04
Math SAT                             1.00       0.00
Time on Tutor                                   1.00

Tutor           A     B     C
# students      15    18    17
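A correlation table like the one above can be produced with a few lines of code. The sketch below computes Pearson correlations in plain Python. The five rows of data are made up for the example and are not the data behind the slide.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map each pair of column names to the correlation between those columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values for five students.
data = {
    "Exam score": [62, 75, 81, 58, 90],
    "GPA": [3.1, 2.8, 3.9, 3.4, 3.0],
    "Math SAT": [520, 610, 680, 500, 720],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```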
31
Plots
32
The Regression
We think that time on tutor and the type of tutor may have an interaction
33
Analysis
Step 1
‒ F-stat = 769.5, p-value = 0.000
Step 2
‒ Test the interactions first
‒ Test Time * Tutor B
  ‒ T-statistic: -0.727, P-value = 0.471
‒ Test Time * Tutor C
  ‒ T-statistic: -0.195, P-value = 0.847
34
Next Steps
Since neither interaction is significant, I would drop those two variables and rerun the regression
Including the interaction when it is not significant can alter the interpretations of the other variables
35
Updated regression
36
Analysis
Step 1
‒ F-stat = 1111, p-value = 0.000
Step 2
‒ Test gpa
  ‒ T-statistic: 10.28, P-value = 0.000
‒ Test Math SAT score
  ‒ T-statistic: 70.03, P-value = 0.000
‒ Test time on tutor
  ‒ T-statistic: -0.43, P-value = 0.672
‒ Test Tutor B
  ‒ T-statistic: -10.52, P-value = 0.000
‒ Test Tutor C
  ‒ T-statistic: 2.60, P-value = 0.0128
37
Next step
Time on tutor is not significant
‒ Drop time and rerun
38
Analysis
Step 1
‒ F-stat = 1414, p-value = 0.000
Step 2
‒ Test gpa
  ‒ T-statistic: 10.51, P-value = 0.000
‒ Test Math SAT score
  ‒ T-statistic: 70.69, P-value = 0.000
‒ Test Tutor B
  ‒ T-statistic: -10.80, P-value = 0.000
‒ Test Tutor C
  ‒ T-statistic: 2.67, P-value = 0.011
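The sequence of analyses in this example follows a simple rule: test the interactions first, drop them if none is significant, and only then look at the main effects. The sketch below encodes that rule. It rests on several assumptions: the 0.05 cutoff is not stated on the slides, the function name `terms_to_drop` is made up, and keeping every term whenever any interaction is significant is a simplification. It also treats each dummy separately, whereas the dummies of one categorical variable would normally be kept or dropped together.

```python
def terms_to_drop(p_values, interactions=(), alpha=0.05):
    """Return the terms to remove before rerunning the regression.

    p_values maps a term name to the p-value of its t-test.
    interactions lists the names in p_values that are interaction terms.
    alpha is an assumed significance level; the slides do not state one.
    """
    if interactions:
        if all(p_values[t] >= alpha for t in interactions):
            return list(interactions)  # no interaction is significant: drop them all
        return []  # an interaction is significant: keep it and its main effects
    # No interactions left in the model: drop non-significant main effects.
    return [term for term, p in p_values.items() if p >= alpha]


# P-values reported on the three Analysis slides.
first = terms_to_drop(
    {"time*TutorB": 0.471, "time*TutorC": 0.847},
    interactions=["time*TutorB", "time*TutorC"],
)
second = terms_to_drop(
    {"gpa": 0.000, "MathSAT": 0.000, "time": 0.672, "TutorB": 0.000, "TutorC": 0.0128}
)
third = terms_to_drop({"gpa": 0.000, "MathSAT": 0.000, "TutorB": 0.000, "TutorC": 0.011})
print(first, second, third)
```

Applied to the reported p-values, the rule drops the two interactions, then time on tutor, and then leaves the final model unchanged, matching the steps taken on the slides.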
39
Interpretation
For each additional GPA point, a student scores on average 2.1 points higher on the final exam
For each additional Math SAT point, a student scores on average 0.11 points higher on the final exam
40
Interpretation of Dummy Variables
Students who used Tutor B scored on average 4.6 points lower on the final exam, compared to students using Tutor A
Students who used Tutor C scored on average 1.1 points higher on the final exam, compared to students using Tutor A
41
Interpretation of Dummy Variables
We can say that students who used Tutor C scored on average 1.10 - (-4.63) = 5.73 points higher than students who used Tutor B
However, to say whether this is a significant difference, one would need to rerun the regression with either Tutor B or Tutor C as the baseline
Although 5.73 is large, since we do NOT have a test statistic and p-value we cannot make any claims about significance
42
Check Assumptions
43
Example
Suppose we have the following regression
Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time + 1.01*TutorB - 1.44*TutorC + 1.8*time*TutorB - 1.7*time*TutorC
Assume that in Step 1 we reject the null and that in Step 2 gpa, Math SAT, and the interaction are significant.
Remember, since the interaction is significant, we are not concerned with the significance of time or tutor alone
44
Interpretation
Tutor A
‒ Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time
Tutor B
‒ Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time + 1.01*TutorB + 1.8*time*TutorB
‒ Exam Score = (2.7 + 1.01) + 3.21*gpa + 0.18*MathSAT + (1.3 + 1.8)*time
‒ Exam Score = 3.71 + 3.21*gpa + 0.18*MathSAT + 3.1*time
Tutor C
‒ Exam Score = 2.7 + 3.21*gpa + 0.18*MathSAT + 1.3*time - 1.44*TutorC - 1.7*time*TutorC
‒ Exam Score = (2.7 - 1.44) + 3.21*gpa + 0.18*MathSAT + (1.3 - 1.7)*time
‒ Exam Score = 1.26 + 3.21*gpa + 0.18*MathSAT - 0.40*time
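The per-tutor equations can be derived mechanically from the coefficients of the full model. The sketch below does this for the hypothetical regression on the previous slide. The dictionary layout and the function name `tutor_equation` are choices made for this illustration.

```python
# Coefficients of the hypothetical regression from the slides (Tutor A is the baseline).
COEF = {
    "intercept": 2.7, "gpa": 3.21, "MathSAT": 0.18, "time": 1.3,
    "TutorB": 1.01, "TutorC": -1.44,
    "time*TutorB": 1.8, "time*TutorC": -1.7,
}


def tutor_equation(tutor):
    """Intercept and time slope for tutor 'A', 'B' or 'C'.

    The baseline tutor has no dummy or interaction term, so .get() falls back to 0.
    """
    intercept = COEF["intercept"] + COEF.get("Tutor" + tutor, 0.0)
    time_slope = COEF["time"] + COEF.get("time*Tutor" + tutor, 0.0)
    return round(intercept, 2), round(time_slope, 2)


for tutor in "ABC":
    intercept, slope = tutor_equation(tutor)
    print(f"Tutor {tutor}: {intercept} + 3.21*gpa + 0.18*MathSAT + {slope}*time")
```

The gpa and MathSAT coefficients are the same for every tutor because neither variable appears in an interaction.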
45
Interpretation
For each additional point in GPA, a student's exam score increases by 3.21
For each additional point in Math SAT, a student's exam score increases by 0.18
With zero time on the tutor, students who use Tutor B score on average 1.01 points higher on the final exam than students using Tutor A
With zero time on the tutor, students who use Tutor C score on average 1.44 points lower on the final exam than students using Tutor A
46
Interpretation
Students using Tutor A
‒ For each additional minute on the tutor, students' exam scores increase by 1.3
Students using Tutor B
‒ For each additional minute on the tutor, students' exam scores increase by 3.1
Students using Tutor C
‒ For each additional minute on the tutor, students' exam scores decrease by 0.40
47
ANCOVA
Analysis of Covariance
‒ At least one quantitative and one categorical explanatory variable
‒ In general, the main interest is the effect of the categorical variable, and the quantitative variable is considered to be a control variable
‒ It is a blending of regression and ANOVA
48
ANCOVA
Can run the analysis either as a linear regression with a dummy variable or as an ANCOVA model, in which case the output is similar to that of ANOVA models
Will get the same results in either case!
Different statistical packages make one or the other easier to run
It is a matter of preference and interpretation