Published by Erika Preston. Modified over 9 years ago.
1
Here, pal! Regress this!
Presented by Miles Hamby, PhD, Principal, Ariel Training Consultants
MilesFlight.20megsfree.com | drhamby@cox.net
Part 2
2
The Equation (Model 3; DV = quarters to completion)

IV            B (Slope)
(Constant)    35.577
Age           -.117
Gender        -.110
Married       -4.05E-02
Black         .439
Native Am.    .719
Asian         -.553
Hispanic      -.830
Unknown       .531
Alien         -.618
GPA           -.277
Transfer Cr   4.285E-02
Undergrad     -3.259
Tutoring      -4.71E-07
Accounting    2.638
Business      2.651

Y = a + bAge + bGen + bMar + bBlk + bNA + bAsn + bHis + bUnk + bAln + bGPA + bXfer + bUndergrad + bTutor + bAcc + bBus

With the slopes cut to two decimals (Tutoring's slope, -4.71E-07, is effectively zero):
Y = 35.57 + (-.11)Age + (-.11)Gen + (-.04)Mar + (.43)Black + (.71)NatAm + (-.55)Asian + (-.83)Hisp + (.53)Unk + (-.61)Alien + (-.27)GPA + (.04)Xfer + (-3.25)Under + (-.0000005)Tutor + (2.63)Acc + (2.65)Bus
3
Let’s Predict!
What is the predicted number of quarters to completion for: age 36, male, single, Black, US citizen, 3.5 GPA, 35 transfer credits, undergraduate, no tutoring, Business major?
Y = 35.57 - (.11)Age - (.11)Gen - (.04)Mar + (.43)Black + (.71)NatAm - (.55)Asian - (.83)Hisp + (.53)Unk - (.61)Alien - (.27)GPA + (.04)Xfer - (3.25)Under - (.0000005)Tutor + (2.63)Acc + (2.65)Bus
Y = 35.57 - (.11)(36) - (.11)(0) - (.04)(0) + (.43)(1) + (.71)(0) - (.55)(0) - (.83)(0) + (.53)(0) - (.61)(0) - (.27)(3.5) + (.04)(35) - (3.25)(1) - (.0000005)(0) + (2.63)(0) + (2.65)(1)
Y = 35.57 - 3.96 - 0 - 0 + .43 + 0 - 0 - 0 + 0 - 0 - .94 + 1.40 - 3.25 - 0 + 0 + 2.65 = 31.90
4
What is the predicted number of quarters to completion for: age 45, female, married, White, alien, 3.0 GPA, no transfer credits, undergraduate, tutored, Computer major?
Y = 35.57 - (.11)Age - (.11)Gen - (.04)Mar + (.43)Black + (.71)NatAm - (.55)Asian - (.83)Hisp + (.53)Unk - (.61)Alien - (.27)GPA + (.04)Xfer - (3.25)Under - (.0000005)Tutor + (2.63)Acc + (2.65)Bus
Y = 35.57 - (.11)(45) - (.11)(1) - (.04)(1) + (.43)(0) + (.71)(0) - (.55)(0) - (.83)(0) + (.53)(0) - (.61)(1) - (.27)(3.0) + (.04)(0) - (3.25)(1) - (.0000005)(1) + (2.63)(0) + (2.65)(0)
Y = 35.57 - 4.95 - .11 - .04 + 0 + 0 - 0 - 0 + 0 - .61 - .81 + 0 - 3.25 - .0000005 + 0 + 0 ≈ 25.80
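The two worked predictions can be reproduced with a short Python sketch. It assumes the two-decimal coefficients written in the slide equations, with Tutoring's -4.71E-07 slope treated as 0; the dictionary keys are hypothetical labels, not the SPSS variable names. Unlike the hand calculation, the code does not round intermediate terms (e.g., .27 × 3.5 = .945), so its first result is 31.895 before rounding.

```python
# Two-decimal Model 3 coefficients as written on the slides (assumption:
# Tutoring's slope, -4.71E-07, is effectively zero and is set to 0 here).
INTERCEPT = 35.57
COEFFICIENTS = {
    "age": -0.11, "gender": -0.11, "married": -0.04,
    "black": 0.43, "native_am": 0.71, "asian": -0.55, "hispanic": -0.83,
    "unknown": 0.53, "alien": -0.61, "gpa": -0.27, "transfer_cr": 0.04,
    "undergrad": -3.25, "tutoring": 0.0, "accounting": 2.63, "business": 2.65,
}

def predict_quarters(profile: dict) -> float:
    """Predicted quarters to completion: a + sum of b_i * x_i.

    Variables missing from `profile` count as 0 (e.g., a dummy that is 'absent').
    """
    unknown = set(profile) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"no coefficient for: {sorted(unknown)}")
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in profile.items())

# Slide 3: 36-year-old single Black male US citizen, GPA 3.5,
# 35 transfer credits, undergraduate, no tutoring, Business major.
student_a = {"age": 36, "black": 1, "gpa": 3.5, "transfer_cr": 35,
             "undergrad": 1, "business": 1}
# Slide 4: 45-year-old married White female alien, GPA 3.0, no transfer
# credits, undergraduate, tutored, Computer major (the reference major).
student_b = {"age": 45, "gender": 1, "married": 1, "alien": 1, "gpa": 3.0,
             "undergrad": 1, "tutoring": 1}

print(round(predict_quarters(student_a), 1))  # → 31.9
print(round(predict_quarters(student_b), 1))  # → 25.8
```

Because White, US citizen, and Computer major are the omitted (reference) categories, they need no entry in a profile: their dummies are all 0.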
5
Example Profiles Excel
6
Variation in the DV
Each successive model explains more of the variation (R²) in the DV (Time to Completion).
All three models are significant (Sig. of F < .05).
But 84.6% or more of the variation is still unexplained.
7
Possible factors? Worklife, children, personal goals, financial aid, company sponsorship.
The point is: with an R² of only .154, other factors not in the model account for most of the variation in Time to Completion, and we need to find them!
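As a minimal Python sketch of what R² measures, the function below computes it as 1 - (residual sum of squares / total sum of squares), the standard definition for a least-squares model with an intercept; the function name is our own. The unexplained share is simply 1 - R².

```python
from statistics import mean

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = mean(observed)
    ss_total = sum((y - y_bar) ** 2 for y in observed)
    ss_resid = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return 1 - ss_resid / ss_total

# With R^2 = .154, the share of variation left unexplained is 1 - R^2.
unexplained = 1 - 0.154
print(f"{unexplained:.1%}")  # → 84.6%
```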
8
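Variation in the Slopes
We cannot compare effects by the unstandardized slopes; that is comparing apples to oranges. Is the slope of Age (-.117 per year) more or less than the slope of GPA (-.277 per grade point)? The units differ.
Apples to apples: use the standardized ‘Beta’ coefficients, which express every variable in standard-deviation units.
Beta for Age (-.162) is larger in magnitude than Beta for Accounting (.016); i.e., a one-standard-deviation change in Age is associated with a greater change in Time to Completion than a one-standard-deviation change in Accounting, even though Accounting's unstandardized slope (2.638) is much larger.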
9
Drawing Conclusions
Summarize the correlations (Pearson’s R).
Summarize the effects (coefficient B).
Summarize the variation (R²).
“Taken together, the variables have a statistically significant association with Time to Completion.”
“Academic major, transfer credits, and undergraduate status seem to have the greatest effects.”
“However, 84.6% of the variation in Time to Completion is still unexplained.”
Suggest what’s next: “Data on worklife, income, finances, and company sponsorship should be collected and analyzed.”
10
In Summary
Regression measures the strength of association (correlation) of all the variables considered at the same time.
Regression can predict the outcome for any given profile.
Regression measures the amount of effect (slope) of each variable on the dependent variable while controlling for all the other variables.
11
Regress it, Pal! It’s where it’s at!
12
Tests of Significance
Purpose: determine whether there is a significant difference between the means of the categories of a nominal variable.
t-test: for a dichotomous variable (two categories), e.g., is there a difference in GPA between men and women?
F-test (one-way ANOVA): for a polychotomous variable (more than two categories), e.g., is there a difference in GPA between African-American, Hispanic, Anglo, and Native American students?
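A minimal sketch of the two test statistics, using only Python's standard library and hypothetical GPA values. It computes the statistics only; the p-value (the ‘Sig.’ that SPSS reports) still comes from the t or F distribution. With exactly two groups, the one-way ANOVA F equals the square of the pooled-variance t, so the two tests agree.

```python
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Student's two-sample t statistic with pooled variance (assumes equal variances)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def one_way_f(*groups):
    """One-way ANOVA F: between-groups mean square / within-groups mean square."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical GPAs (illustration only): dichotomous IV.
women = [3.4, 3.7, 3.1, 3.6, 3.5]
men = [2.9, 3.2, 2.6, 3.0, 3.1]
print(pooled_t(women, men), one_way_f(women, men))

# Hypothetical GPAs for four ethnic groups: polychotomous IV.
print(one_way_f([3.1, 3.3, 2.9], [2.8, 3.0, 3.2], [3.4, 3.6, 3.5], [2.7, 3.0, 2.9]))
```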
14
Shortcoming of the t-test and F-test: they do not predict.
E.g., can we predict the GPA of a student based on gender? Can we predict the level of satisfaction with a course based on gender? Can we predict the likelihood of graduation of a student based on gender?
The t-test and F-test only tell us whether group means differ. Regression predicts!
15
Examples: Means Compared by the t-test and F-test
Dichotomous: find the mean GPA of males and that of females and compare them with a t-test.
Polychotomous: find the mean GPA for African-Americans, Hispanics, and Anglos and compare them with a one-way ANOVA.
16
Example 1 Data
Arbitrarily code ‘gender’ (a nominal variable): Female = 1, Male = 0

ID        SAT   GPA   Gender
Stud 1    5     3.2   F (1)
Stud 11   3     2.7   M (0)
17
Example 1
(a) Correlation r (SPSS ‘R’) = .846
Interpretation: GPA is strongly associated with gender type.
18
Example 1
(b) Significance of the difference in mean GPA by gender: ANOVA Sig. (p) of F < 0.05
Interpretation: reject H0; i.e., there is a statistically significant difference in GPA according to gender.
19
Example 1
(c) Regression model (y = a + bx): GPA = 1.9 + 2.3 (gender code)
Interpretation: Male SAT is 1.9, female SAT is 1.9 + 2.3 = 4.2; i.e., mean female GPA is higher than mean male GPA.
20
Example 1
(a) Correlation r (SPSS ‘R’) = .837
Interpretation: GPA is strongly associated with gender type.
22
Example 1
(c) Regression model (y = a + bx)

Coefficients (Dependent Variable: GPA)
Model 1       B       Std. Error   Beta   t        Sig.
(Constant)    2.620   .112                23.377   .000
Gender        1.030   .158         .837   6.498    .000
(B = unstandardized coefficients; Beta = standardized coefficients)

GPA = 2.62 + 1.03 (gender code)
Interpretation: male mean GPA is 2.62, female mean GPA is 2.62 + 1.03 = 3.65; i.e., mean female GPA is higher than mean male GPA.
23
Nonsense coding: assigning an arbitrary number to each category of a nominal variable, e.g.,
Male = 1, Female = 2
Male = 13, Female = 43
Male = 0, Female = 1
Hispanic = 35, African-American = 72, Anglo = 87
For a dichotomous variable, the coding numbers do not matter for the strength of association: r (correlation) and r² (coefficient of determination) are unaffected (reversing which category gets the higher number only flips the sign of r). The slope B, however, changes with the coding, because it is measured per unit of the code.
24
BUT slopes and intercepts from nonsense coding are difficult to interpret, unless the categories are coded ‘0’ and ‘1’.
E.g., Male = 0, Female = 1; mean GPA for males (Ym) = 2.62, mean GPA for females (Yf) = 3.65:
Slope B = (Ym - Yf) / (Xm - Xf) = (2.62 - 3.65) / (0 - 1) = -1.03 / -1 = 1.03
Result: the mean GPA of the category coded 0 is the Y-intercept.
25
[Figure: mean GPA (Y) by gender code (X): 0 (Male) = 2.62, 1 (Female) = 3.65; slope B = 1.03]
Slope B = (Ym - Yf) / (Xm - Xf) = (2.62 - 3.65) / (0 - 1) = -1.03 / -1 = 1.03
Interpretation: female GPAs tend to be predictably higher than male GPAs.
26
[Figure: mean GPA (Y) by gender code (X): 0 (Female) = 3.65, 1 (Male) = 2.62; slope B = -1.03]
Recode Male = 1, Female = 0:
Slope B = (Ym - Yf) / (Xm - Xf) = (2.62 - 3.65) / (1 - 0) = -1.03 / 1 = -1.03
Interpretation: same result; female GPAs tend to be predictably higher than male GPAs.
27
[Figure: mean GPA (Y) by gender code (X): 0 (Female) = 3.65, 1 (Male) = 2.62]
Interpretation: we can predict GPA based on male or female.
With one category coded 0 (e.g., female), the Y-intercept is the mean of that category, and the intercept plus the slope predicts the mean of the other category. Thus, the regression equation is:
Y = 3.65 - 1.03X
28
Fine, but what about polychotomous variables?
We cannot use a single dummy variable for more than two categories. Why? This would treat the nominal categories as if they were interval, i.e., as if each category had one more unit of something than the one before it. E.g., if the ethnic variable were coded Hispanic = 1, African-Am = 2, Anglo = 3, the regression would assume that Anglo is 2 units greater than Hispanic, etc.
29
Regression also interprets a dichotomous variable (e.g., male = 0, female = 1) as female being 1 unit more than male. With only two categories this is harmless: the mean score of code ‘0’ is the intercept, and the mean score of code ‘1’ is the intercept + the slope. With more than two categories, no such interpretation holds.
30
E.g., Ethnic Category
Therefore, we must treat each category as a unique variable, coded as:
1 = ‘presence of characteristic’ or 0 = ‘absence of characteristic’
i.e., a ‘dummy’ variable.

Category   Has it   Doesn’t have it
Hispanic   1        0
Afr-Am     1        0
Anglo      1        0

(Depicts 3 students, one in each ethnic category)
31
Coding Polychotomous Nominal Variables As Dummy Variables
For each case/subject, code each category as either ‘having it’ (1) or ‘not’ (0).

Case ID    Stud 1   Stud 2   Stud 3
Hispanic   0        1        0
Afr-Am     1        0        0
Anglo      0        0        1

E.g., Student 1 is African-American, Student 2 is Hispanic, Student 3 is Anglo.
(Depicts 3 students, one in each ethnic category)
32
The regression equation would look like: Y = a + bH + bAA + bAn
But the three dummies (H + AA + An) always sum to ‘1’ for every case:
Stud 1 (Afr-Am): 0 + 1 + 0 = 1
Stud 2 (Hispanic): 1 + 0 + 0 = 1
Stud 3 (Anglo): 0 + 0 + 1 = 1
So the three dummies together exactly reproduce the constant that goes with the intercept ‘a’.
Problem: ‘perfect multicollinearity’.
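The collinearity problem can be seen directly in the design matrix. In this minimal standard-library sketch (the six-student roster is hypothetical, and `matrix_rank` is our own helper), the constant column for the intercept plus all three dummies has rank 3 even though it has 4 columns, because H + AA + An equals the constant column; dropping one dummy restores full rank.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination on exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot: this column is a combination of earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][c] != 0:
                factor = m[i][c] / m[rank][c]
                m[i] = [v - factor * w for v, w in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

# Hypothetical roster of six students, two per ethnic group.
roster = ["Afr-Am", "Hispanic", "Anglo", "Afr-Am", "Hispanic", "Anglo"]

# Constant (intercept) column plus ALL three dummies: H, AA, An.
full = [[1, int(g == "Hispanic"), int(g == "Afr-Am"), int(g == "Anglo")] for g in roster]
# Constant plus g - 1 = 2 dummies (Anglo dropped as the reference group).
reduced = [row[:3] for row in full]

print(matrix_rank(full), "of", len(full[0]), "columns")        # → 3 of 4 columns
print(matrix_rank(reduced), "of", len(reduced[0]), "columns")  # → 3 of 3 columns
```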
33
The resulting regression equation would return a confusing Y-intercept (a): Y = a + bH + bAA + bAn
Any constant can be shifted between ‘a’ and the three slopes without changing a single prediction, so there is no unique answer; i.e., what is the reference point from which to determine the actual means of the other variables?
34
What to do: drop one category from the regression, i.e., use only g - 1 dummies (g = number of categories).
E.g., Y = a + bH + bAA (bAn dropped for all cases)
Reference group: the category/group chosen to be dropped.
Choosing the reference group: pick the group that has the most normative support.
35
By leaving out a group, not all cases will sum to ‘1’, and therefore the regression equation predicts the mean Y for the group to which the case/student belongs, in reference to the Y-intercept:
Student 1 (Afr-Am): Y_AA = a + bH(0) + bAA(1) = a + bAA
Student 2 (Hispanic): Y_H = a + bH(1) + bAA(0) = a + bH
Student 3 (Anglo): Y_An = a + bH(0) + bAA(0) = a
i.e., the mean Y of the reference group ‘Anglo’ is the intercept ‘a’; all other groups are then compared to ‘Anglo’.
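A minimal sketch of g - 1 dummy coding in Python; `dummy_code` is a hypothetical helper, not an SPSS or Excel function. Members of the reference group score 0 on every dummy, so their prediction is the intercept alone.

```python
def dummy_code(category, categories, reference):
    """Return g - 1 dummy variables (1 = has the characteristic, 0 = does not).

    The reference category gets no dummy; its members score 0 on every dummy.
    """
    if category not in categories:
        raise ValueError(f"unknown category: {category!r}")
    return {c: int(category == c) for c in categories if c != reference}

ETHNIC = ["Hispanic", "Afr-Am", "Anglo"]
students = {"Stud 1": "Afr-Am", "Stud 2": "Hispanic", "Stud 3": "Anglo"}

for sid, group in students.items():
    print(sid, group, dummy_code(group, ETHNIC, reference="Anglo"))
# Stud 1 Afr-Am {'Hispanic': 0, 'Afr-Am': 1}
# Stud 2 Hispanic {'Hispanic': 1, 'Afr-Am': 0}
# Stud 3 Anglo {'Hispanic': 0, 'Afr-Am': 0}
```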
36
Example: Satisfaction with a Course

Case ID        Stud 1   Stud 2   Stud 3
Satisfaction   4        2        5
Hispanic       0        1        0
Afr-Am         1        0        0
Anglo          0        0        1

Satisfaction coding: 5 = Very satisfied, 4 = Satisfied, 3 = Ambivalent, 2 = Dissatisfied, 1 = Very dissatisfied
37
Assume that a multiple regression of the effect of ethnicity on satisfaction returned a Y-intercept of 3.0 with slopes H = .4 and AA = .2 (An held out as the reference group), i.e., Y = 3.0 + .4H + .2AA.
The predicted satisfaction level for any Anglo student is the intercept ‘a’ = 3.0.
Thus, the predicted satisfaction level for any Hispanic student would be Y = 3.0 + 0.4*1 + 0.2*0 = 3.4.
Likewise, the predicted satisfaction level for any African-American student would be Y = 3.0 + 0.4*0 + 0.2*1 = 3.2.
38
Slopes indicate the difference between a specific category/group and the reference group: Y = 3.0 + 0.4H + 0.2AA
I.e., the 0.4 slope for Hispanic indicates that Hispanic satisfaction is 0.4 more than Anglo.
Likewise, African-American satisfaction is 0.2 more than Anglo.
Also, relative to each other, African-American satisfaction is 0.2 less than Hispanic (0.2 - 0.4 = -0.2).
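Using the assumed coefficients from these slides (a = 3.0, slope .4 for Hispanic and .2 for African-American, Anglo as reference), a small Python sketch can turn the equation into predicted group means and the pairwise difference described above; the dictionary keys and function name are our own labels.

```python
# Assumed model from the slides: Y = 3.0 + 0.4H + 0.2AA, with Anglo as the reference group.
INTERCEPT = 3.0
SLOPES = {"Hispanic": 0.4, "Afr-Am": 0.2}
REFERENCE = "Anglo"

def predicted_satisfaction(group):
    """Reference group -> the intercept; any other group -> intercept + that group's slope."""
    if group == REFERENCE:
        return INTERCEPT
    return INTERCEPT + SLOPES[group]  # KeyError for a group the model does not know

for group in ("Anglo", "Hispanic", "Afr-Am"):
    print(group, round(predicted_satisfaction(group), 2))

# Difference between two non-reference groups = difference of their slopes.
print(round(predicted_satisfaction("Afr-Am") - predicted_satisfaction("Hispanic"), 2))  # → -0.2
```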
39
Note: this does not predict the satisfaction level (or GPA, etc.) of a unique individual student, only that of a student of a particular ethnic background: every student in the same group gets the same prediction (the group mean). This is because there is no ‘degree’ of the characteristic ‘ethnicity’, i.e., you are either Anglo, or you are not.
40
Example (re Example Data): Satisfaction and Ethnic Group, Anglo as reference group
Regression model: Y = 3.0 - .333AA + .375H
Interpretation: mean Anglo satisfaction level is 3.0, mean Afr-Am level is 3.0 - .333 = 2.667, mean Hispanic level is 3.0 + .375 = 3.375.
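The slide's reading of the coefficients (intercept = reference-group mean, each slope = that group's mean minus the reference mean) holds for any least-squares fit whose only IVs are the g - 1 dummies. The sketch below checks it with exact fractions on hypothetical satisfaction scores (not the presentation's example data); `solve` and `ols` are small illustrative helpers, not library functions.

```python
from fractions import Fraction

def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination on exact fractions (A non-singular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for c in range(n):
        p = next(i for i in range(c, n) if M[i][c] != 0)  # StopIteration if singular
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [v - factor * w for v, w in zip(M[i], M[c])]
    return [row[n] for row in M]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical satisfaction scores (1-5), for illustration only.
groups = ["Anglo", "Anglo", "Afr-Am", "Afr-Am", "Afr-Am", "Hispanic", "Hispanic"]
satisfaction = [3, 4, 2, 3, 3, 4, 4]

# Columns: constant, Hispanic dummy, Afr-Am dummy (Anglo is the reference group).
X = [[1, int(g == "Hispanic"), int(g == "Afr-Am")] for g in groups]
a, b_hisp, b_afram = ols(X, satisfaction)
print(a, b_hisp, b_afram)  # → 7/2 1/2 -5/6
```

Here the Anglo mean is 7/2, the Hispanic mean is 4 (= 7/2 + 1/2), and the African-American mean is 8/3 (= 7/2 - 5/6), matching the pattern on the slide.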
41
Effect of Multicollinearity in SPSS: if SPSS detects perfect multicollinearity among the selected IVs, it drops one IV.
42
Adding Other Variables
To make a prediction more ‘individually unique’, add other variables, e.g., gender (nominal), age (ratio), time spent on homework (ratio):
Y = a + [b*H + b*AA] + b*Age + b*Homework
43
Example: given the data below, the regression prediction model would be:
Y (GPA) = a + [b*H + b*AA] + b*Age + b*Homework

Case ID             Stud 1   Stud 2   Stud 3
GPA                 3.9      3.1      3.2
Hispanic            0        1        0
Afr-Am              1        0        0
Anglo               0        0        1
Age                 19       28       23
Hours on Homework   14       5        8
44
Y (GPA) = a + [b*H + b*AA] + b*Age + b*Homework
However, things now change:
Intercept: the new ‘a’ is no longer the mean score for Anglo; it is now the predicted score for an Anglo student (the reference group) who scored ‘0’ on age and ‘0’ hours on homework.
Slopes: each ethnic slope now indicates the difference between that ethnic group and the reference group for individuals who do not differ in ‘age’ or ‘homework’.
45
Applications
Research Question: do the gender, culture, or age of a student have an effect on the student’s perception of his/her learning? [Retention?]
Student Opinion Polls [Retention?] at Strayer University
That is, can we predict a student’s [retention?] perception of his/her learning based on his/her gender, culture, and age? And if so, which variable has the greatest effect?
46
Applications: Methodology
Collect data from a survey asking students to indicate their satisfaction with the course, their perception of instructor effectiveness, and how they perceived their instructor.
The survey must be designed for a regression, i.e., it must have a DV and IVs.
47
Dependent Variables
Satisfaction (scale data, 4 through 1): How satisfying was this course? VERY SATISFYING / SATISFYING / NOT SATISFYING / DISAPPOINTING
Instructor Effectiveness (scale data, 4 through 1): How effective do you feel your instructor was? VERY EFFECTIVE / EFFECTIVE / SOMEWHAT / NOT EFFECTIVE
48
Independent Variables: nominal, four descriptors each
Which one of the following describes your instructor’s teaching technique?
Which one of the following describes your instructor’s involvement with students?
Which one of the following describes your instructor’s method of teaching?
Which one of the following best describes your instructor?
Response options:
FREE DISCUSSION / LECTURE BASED / THEORY BASED / ACTIVITY BASED
STUDENT CENTERED / LITTLE INVOLVEMENT / GAVE TIME TO THINK ALONE / ACTIVE PARTICIPATION
GOT US INVOLVED / MOSTLY INSTRUCTIONS / MOSTLY WRITTEN / MOSTLY ACTIONS
LISTENER / DIRECTOR / INTERPRETER / COACH
50
SPSS Regression Output – Correlation Instructor Descriptor on Satisfaction (all included) Interpretation – As all descriptors were included, the correlation (multiple R) is difficult to interpret.
51
SPSS Regression Output: Means & Slopes, Instructor Descriptor on Satisfaction (all included)
Interpretation: as all four descriptors were included, the means and slopes are difficult to interpret. However, because the regression ran, perfect multicollinearity must not exist, i.e., at least one record must have no ‘1’ on any of the four descriptors, so the descriptors do not sum to 1 for every case.
52
SPSS Regression Output: Instructor Descriptor on Satisfaction, Coach as Reference
Interpretation: with Coach as reference, the descriptors show a weak correlation (R = .105) with Satisfaction and explain only 1.1% (R² = .011) of the variation in Satisfaction.
53
SPSS Regression Output: Means and Slopes, Instructor Descriptor on Satisfaction (Coach as Reference)
Interpretation: with Coach as reference, the mean score for Coach is 3.313, Listener is 3.474 (slightly higher than Coach), Director is 3.272 (slightly lower than Coach), and Interpreter is 3.351 (slightly higher than Coach). The relatively small slopes (+.161, -.041, +.038) suggest that each descriptor has relatively little effect on Satisfaction.
54
Excel Outputs Examples of same regression analyses in MS Excel (Refer to handouts)
55
In Summary
Regression, as a primary tool for prediction, requires quantitative data, yet qualitative variables are widely used in social research.
Convert these qualitative variables to quantitative variables by ‘dummy’ coding them: ‘1’ = presence of the quality, ‘0’ = absence of the quality.
By so doing, the correlations, means, and slopes become meaningful.