Here, pal! Regress this! Or, How to Use Regression to Tell You Just About Everything, Part 1
Presented by Miles Hamby, PhD, Principal, Ariel Training Consultants
MilesFlight.20megsfree.com
Typical – Descriptive Statistics
Frequencies – numbers of things, eg – How many female students have graduated over the last 6 years?
Mean – measure of central tendency, eg – What is the average time to complete an academic program for students with 12 hours transfer credit?
Standard Deviation – measure of dispersion, eg – 68% of completing students graduate within how many terms?
Shortcoming of Descriptive Statistics They do not predict. They can tell you what it is – but they can’t tell you what it will be
Regression predicts! eg – Can we predict how many female students will graduate and when? Can we predict when a student with no transfer credit will graduate? Can we predict the likelihood of graduation of a student based on gender?
How to Use Regression to Predict Question – What kind of student takes the longest time to graduate? What kind of student never graduates?
Typical way – Start with specific cohort (eg, Fall 1993). Select a single group (eg, 1-12 transfer credits). Count number who graduate each term. Compute percentage ~ 25 graduated / 100 started = 25%. Conclusion – For Fall 93 cohort, graduation rate = 25% after 12 terms for those with 1-12 transfer credits
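The percentage step of the typical method can be sketched in a few lines of Python. The function name is hypothetical; the counts are the ones quoted on the slide.

```python
def graduation_rate(graduated, started):
    """Percentage of a cohort group that has graduated so far."""
    return 100.0 * graduated / started

# Fall 93 cohort, 1-12 transfer credits, after 12 terms (counts from the slide).
rate = graduation_rate(25, 100)
print(rate)
```

Note that this has to be re-run for every group and every time block, which is the shortcoming the next slide describes.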
Exiguousness of Typical Method – DV implied, not specified (and therefore not tested). Does not measure strength of association to graduation time (correlation) or amount of effect (slope) on graduation time, eg – compare age’s effect to transfer credits’ effect. Graduation Rate does not predict time-in-program or time-to-completion. Must repeat procedure for each time block.
Time to graduation for each variable not discrete - includes all other variables
Typical Method, e.g.
Variable        Time to Graduation
Females         X̄ = 16 terms, S = 5 terms
1-12 Xfer Cr    X̄ = 13 terms, S = 4 terms
Married         X̄ = 18 terms, S = 9 terms
But how about a single, black, man with 17 transfer credits? Must repeat procedure for single students, then repeat for black students, then repeat for males then repeat for 13 – 20 transfer credits, then ‘eyeball’ how they correlate. Is there a way to determine how much of the 16 terms time for females (previous ex.) would be ameliorated by being a single, black, male with 17 transfer credit hours?
There is a way! Regress it! Effects of gender, age, transfer credits, marital status, citizenship, ethnicity, and more, directly on time to complete are measurable and comparable Pick a profile and I’ll tell you how long it will take for that student to graduate!
Procedure –
1. Identify dependent variable (DV) – i.e., the question you are asking – eg, Time to Graduate (Time)
2. Identify independent variables (IV) that possibly affect graduation rates – gender, ethnicity, marital status, age, transfer credits, income
3. Collect data
4. Run linear regression to determine:
(a) correlations between Time and IVs
(b) significance of difference in means of IVs
(c) regression model ($y = a + b_1 X_1 + \dots + b_n X_n$) to predict Time by IVs
Regression can tell you everything!
# Terms = a + .4*marital + .2*Gender + .06*Age - .18*xfer
EG – For a single male, age 32, with 18 transfer credits, we can expect a graduation time of 32 terms:
# Terms = 33 terms + .4*0 + .2*0 + .06*32 - .18*18
# Terms = 33 + 0 + 0 + 1.92 - 3.24 = 31.68, or about 32 terms
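The example equation on this slide can be evaluated for any profile. The following is a minimal Python sketch: the slopes and the intercept of 33 terms are taken from the slide, while the function and dictionary names are hypothetical.

```python
# Coefficients copied from the slide's example equation (a = 33 terms).
INTERCEPT = 33.0
SLOPES = {"marital": 0.4, "gender": 0.2, "age": 0.06, "xfer": -0.18}

def predict_terms(profile):
    """Predicted number of terms to graduate for a dict of IV values."""
    return INTERCEPT + sum(SLOPES[name] * profile[name] for name in SLOPES)

# Single (marital = 0), male (gender = 0), age 32, 18 transfer credits.
student = {"marital": 0, "gender": 0, "age": 32, "xfer": 18}
print(round(predict_terms(student), 2))
```

Changing one value in `student` at a time shows how much each variable moves the prediction while the others stay fixed.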
Adding Variables
DV ~ Time to Graduation (# terms - ratio)
IV ~ Gender (F or M - nominal)
Ethnic (B, H, W, NA, API, Alien - nominal)
Alien (Alien or US - nominal)
Marital status (si, ma, di – nominal)
Age (# years - ratio)
Transfer credits (# hours - ratio)
Tutoring done (# sessions – ratio; Y/N - nominal)
Coding Your Variables Scale (ratio) variables (time to completion, age, etc) – use number directly eg, Age = 32 years, use ’32’ Time to Comp (terms) = 12 terms, use ’12’
Coding Your Variables Nominal Variables – use ‘dummies’ What are Dummy Variables? Variables used to quantify nominal variables i.e., Nominal (qualitative) variables assigned a quantitative number and treated as a quantitative variable.
Dummy Variables
Dichotomous variable – two categories, eg – Male or Female; Married or Single; Has had tutoring or hasn’t; US Citizen or Alien; Graduate student or Undergrad
Polychotomous variable – several categories of the variable, eg – Ethnic – African-American, Hispanic, White; Major – Bus, Account, Computers, English, LA; Religion – Christian, Jew, Muslim, Hindu
Dummy Variables
eg, ‘Gender’ – Code Male = 0, Female = 1 (or vice-versa): 1 = ‘presence of characteristic’ (femaleness), 0 = ‘absence of characteristic’
‘Ethnic’ – Make B, NA/AN, W, API, H, Unk unique variables; code each as 1 = ‘presence of characteristic’ (‘Black’-ness) or 0 = ‘absence of characteristic’
Dummy Variables
B: 1 = yes, 0 = no
AN: 1 = yes, 0 = no
W: 1 = yes, 0 = no
API: 1 = yes, 0 = no
H: 1 = yes, 0 = no
Unk: 1 = yes, 0 = no
Alien: 1 = yes, 0 = no
Marital: 1 = MA/DI, 0 = SI
Gender: 1 = F, 0 = M
Age: number years
Transfer credits: number
# Terms = 3 terms +.2*1 +.3* *10 +.4*3
As Used in the Regression
e.g. ~ Black, US Citizen, female, married, 32 years old, 10 transfer credits:
# Terms = 32 terms + [.2*1 + .2*0 + .2*0 + .2*0] (ethnic) + .5*0 (Alien) + .4*1 (marital) + .2*1 (gender) + .06*32 (age) - 1.7*10 (xfer credits)
Nominal Variables – Dichotomous - 2 values
Create new column for dummy variable or recode original
1 = presence of characteristic of interest
0 = not the characteristic of interest (absence of characteristic)
ALIEN  VISA   MARIT  MARITL  U/G  LEVEL  TUTRD  TUTSES  GENDR  SEX
1      F-4    9              0    G      0      0       1      F
0      US     0      SI      1    U      0      0       0      M
1      GREEN  1      MA      1    U      1      1       0      M
1      P-R    1      MA      0    G      0      0       1      F
0      US     1      DI      1    U      1      1       0      M
0      US     0      SI      1    U      1      2       1      F
1      F-1    0      SI      1    U      1      3       1      F
Nominal Variables – more than 2 values Create new columns for dummy variables – one for each value 1 = presence of characteristic (value) 0 = absence of characteristic
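The dummy-coding rules on these slides can be sketched in plain Python. The category codes follow the slides; the helper names and the sample student are hypothetical, chosen only for illustration.

```python
# Category values for 'Ethnic' as listed on the slides.
ETHNIC_VALUES = ["B", "AN", "W", "API", "H", "Unk"]

def dichotomous_dummy(value, characteristic):
    """1 = presence of the characteristic of interest, 0 = absence."""
    return 1 if value == characteristic else 0

def polychotomous_dummies(value, categories):
    """One 0/1 column per category value."""
    return {cat: dichotomous_dummy(value, cat) for cat in categories}

# Hypothetical student record in its original (nominal) coding.
student = {"gender": "F", "marital": "SI", "ethnic": "B"}

row = {
    "gender": dichotomous_dummy(student["gender"], "F"),       # 1 = F, 0 = M
    "marital": 1 if student["marital"] in ("MA", "DI") else 0,  # 1 = MA/DI, 0 = SI
}
row.update(polychotomous_dummies(student["ethnic"], ETHNIC_VALUES))
print(row)
```

When the columns go into a regression, one category of each nominal variable is normally left out as the reference group; the later slope slides show White and Computers with B = 0, which is consistent with that.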
Run the Regression SPSS
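The presentation runs the regression in SPSS. As a rough stand-in for readers without SPSS, the sketch below fits the same kind of model, y = a + b1*X1 + ... + bn*Xn, by ordinary least squares in plain Python. It is not SPSS's procedure and not the presenter's data: the eight profiles are made up, and their "terms" values are generated without noise from the example equation shown earlier in the deck, so the fit should simply recover those coefficients. The function name `fit_ols` is hypothetical.

```python
def fit_ols(rows, y):
    """Least-squares fit of y = a + b1*x1 + ... + bn*xn via the normal equations."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1.0 = intercept
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic, noise-free illustration (NOT the presenter's data).
TRUE = [33.0, 0.4, 0.2, 0.06, -0.18]  # a, marital, gender, age, xfer
profiles = [  # (marital, gender, age, xfer)
    (0, 0, 32, 18), (1, 0, 25, 0), (0, 1, 40, 12), (1, 1, 22, 6),
    (0, 0, 19, 3), (1, 0, 51, 15), (0, 1, 28, 9), (1, 1, 35, 21),
]
terms = [TRUE[0] + sum(b * x for b, x in zip(TRUE[1:], p)) for p in profiles]

coefficients = fit_ols(profiles, terms)
print([round(c, 3) for c in coefficients])
```

Real data contain noise, so a real fit will not reproduce any equation exactly; that residual error is what the ANOVA and R slides measure.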
The Results!
Regression Models
Variable Correlations Note – although some variables are highly correlated to each other, the correlation (R) may not be significant
The Regression ANOVA Test of significance of the F statistic indicates all three regression models are statistically significant (Sig. < .05), i.e., the variation was not by chance – another set of data would probably show the same results.
The Regression ANOVA The larger the F (ratio of the mean square of the Regression to the mean square of the Error/Residual), the more robust the regression equation. I.e., a smaller mean square residual indicates smaller error, or departure, from the regression line.
F = Mean Square Regression / Mean Square Residual
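The F ratio described on this slide can be computed directly from observed and predicted values. This is a minimal sketch with made-up numbers, not the presenter's output: `y_hat` is the least-squares line 5.5 + 3.5*x for x = 1..5, and the function name is hypothetical.

```python
def f_ratio(y, y_hat, n_predictors):
    """F = mean square regression / mean square residual."""
    n = len(y)
    y_bar = sum(y) / n
    ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ms_regression = ss_regression / n_predictors        # df = number of IVs
    ms_residual = ss_residual / (n - n_predictors - 1)  # df = n - IVs - 1
    return ms_regression / ms_residual

# Hypothetical observed values and their least-squares predictions.
y = [10, 12, 15, 19, 24]
y_hat = [9.0, 12.5, 16.0, 19.5, 23.0]
print(f_ratio(y, y_hat, n_predictors=1))
```

Shrinking the residuals (moving `y_hat` closer to `y`) lowers the mean square residual and raises F, which is the point the slide makes.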
Variation about the Regression Line
[Figure: two panels, Model 1 and Model 2, each showing observed y, predicted ŷ, and the error between them; Y-axis: QTRS to Completion.]
Interpretation – Mean Square Error/Residual of Model 1 is > Mean Square Error/Residual of Model 2
The Regression Correlation (R) Model 3 returns the highest correlation (R = .392), with 15.4% (R² = .154) of the variation in Time to Completion (in Qtrs) being explained by the variables Alien, Ethnicity, Marital status, Gender, Age, Tutoring, Transfer credits, U/G status, and Major.
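The link between the two figures on this slide is just squaring: the share of variation explained is the square of the multiple correlation. A quick check in Python, using the R value quoted for Model 3 (variable names are illustrative):

```python
r = 0.392                      # Model 3 correlation, from the slide
r_squared = r ** 2             # proportion of variation explained
percent_explained = 100 * r_squared
print(round(r_squared, 3), round(percent_explained, 1))
```

The remaining share of the variation in Time to Completion is not accounted for by the variables in the model.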
The Slopes Model 3 Interpretation The older the student, the shorter the time to completion (B = -.117)
[Figure: Model 3 Slopes Graph – Age. X-axis: Age (yrs, up to 70 yrs); Y-axis: QTRS to Completion.]
Interpretation – Age slope shallow, slight effect on Qtrs to Completion
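A slope is read as the expected change in Qtrs to Completion per one-unit change in the variable, with the other variables held fixed. The sketch below applies that reading to the Age slope reported for Model 3; the function name and the 10-year gap are illustrative assumptions.

```python
def slope_effect(b, change_in_iv):
    """Expected change in the DV for a given change in one IV, others held fixed."""
    return b * change_in_iv

B_AGE = -0.117  # Model 3 slope for Age, from the slides

# Two otherwise identical students who differ by 10 years of age.
print(round(slope_effect(B_AGE, 10), 2))
```

For a 0/1 dummy variable the change is always 1, so the slope itself is the expected difference between the two groups.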
The Slopes Model 3 Interpretation The older the student, the shorter the time to completion (B = -.117) Married/Divorced tends to shorten completion time (B= ), but is not significant (Sig. =.309, >.05)
[Figure: Model 3 Slopes Graph – Married/Divorced. X-axis: 0 (Single), 1 (Married/Divorced); Y-axis: QTRS to Completion.]
Interpretation – Married/Divorced slope very shallow, and not significant (Sig. = .309, > .05)
The Slopes Model 3 Interpretation The older the student, the shorter the time to completion (B = -.117) Married/Divorced tends to shorten completion time (B= ), but is not significant (Sig. =.309, >.05) Undergraduates tend to take considerably less time to complete than graduates (B = )
[Figure: Model 3 Slopes Graph – Undergraduate vs Graduate. X-axis: 0 (Graduate), 1 (Undergraduate); Y-axis: QTRS to Completion.]
Interpretation – Undergraduates steep, tend to shorten Qtrs to Completion considerably over Graduates
The Slopes Model 3 Interpretation The older the student, the shorter the time to completion (B = -.117) Married/Divorced tends to shorten completion time (B= ), but is not significant (Sig. =.309, >.05) Undergraduates tend to take considerably less time to complete than graduates (B = ) Tutoring shortens time very slightly (B = ), but is not significant (Sig. =.571)
[Figure: Model 3 Slopes Graph – Tutoring. X-axis: 0 (No Tutoring), 1 (Tutored); Y-axis: QTRS to Completion.]
Interpretation – Tutoring slope very shallow, shortens Qtrs to Completion very slightly, but not significant (Sig. = .571, > .05)
The Slopes Model 3 Interpretation Xfer lengthens time very slightly (B = .04285); GPA shortens time but is not significant (Sig. > .05)
[Figure: Model 3 Slopes Graph – GPA & Transfer Credits. X-axis: GPA, Xfer; Y-axis: QTRS to Completion.]
Interpretation – Xfer & GPA very shallow, but GPA not significant (Sig. > .05)
The Slopes Model 3 Interpretation Xfer lengthens slightly; GPA shortens, but not significant Female (neg) tends to shorten time (B = -.110) over Male
[Figure: Model 3 Slopes Graph – Gender. X-axis: 0 (Male), 1 (Female); Y-axis: QTRS to Completion.]
Interpretation – Female Qtrs to Completion tend to be predictably shorter than Male Qtrs
The Slopes Model 3 Interpretation Xfer lengthens slightly; GPA shortens, but not significant Female (neg) tends to shorten time (B = -.329) over Male Black, Nat Am & Unkn take longer than Whites (+ B) (NA not significant) Hisp & Asians tend to take shorter than Whites (-B)
[Figure: Model 3 Slopes Graph – Ethnicity. Y-axis: QTRS to Completion. White B = 0; Black B = .439; Unknown B = .531; Native Am B = .719; Asian B = -.553; Hispanic B not recoverable.]
Interpretation – Black, Native American & Unknown tend to take longer than Whites (+ B); Hispanic & Asian tend to take shorter than Whites (-B)
The Slopes Model 3 Interpretation Xfer lengthens slightly; GPA shortens, but not significant Female (neg) tends to shorten time (B = -.329) over Male Black, Nat Am & Unkn take longer than Whites (+ B); Hisp & Asians tend to take shorter than Whites (-B) Alien tends to take less time than US citizen (B = -.618)
[Figure: Model 3 Slopes Graph – Alien. X-axis: 0 (US), 1 (Alien); Y-axis: QTRS to Completion.]
Interpretation – Alien tends to take less time than US citizen (B =.279)
The Slopes Model 3 Interpretation Xfer lengthens slightly; GPA shortens, but not significant Female (neg) tends to shorten time (B = -.329) over Male Black, Nat Am & Unkn take longer than Whites (+ B); Hisp & Asians tend to take shorter than Whites (-B) Alien tends to take less time than US citizens (B = -.618) Acc & Bus considerable effect (B= 2.638, 2.651); pos. relative to CIS slope ‘0’
[Figure: Model 3 Slopes Graph – Major. Y-axis: QTRS to Completion. Computers B = 0; Business B = 2.651; Accounting B = 2.638.]
Interpretation – Accounting & Business steepest slopes (2.638, 2.651); positive relative to CIS slope ‘0’