1
Research & Training Consultants
Here, pal! Regress this! presented by Miles Hamby, PhD Research & Training Consultants MilesFlight.com Quantifying Qualitative Variables
2
Here, pal! Regress this! Miles Hamby, PhD presented by
Director of Institutional Research & Assessment Strayer University Quantifying Qualitative Variables
3
Typical – Descriptive Statistics
Frequencies: numbers of things. E.g., 70 out of 340 (21%) of female students have graduated over the last 6 years.
Mean: measure of central tendency. E.g., the average time to complete an academic program for students with 12 hours of transfer credit is 36 terms.
Standard Deviation: measure of dispersion. E.g., 68% of completing students graduate between 25 and 42 terms.
4
Shortcoming of Descriptive Statistics
They can tell you what it is, but they can’t tell you what it will be.
They do not predict.
5
Regression predicts!
E.g.:
Can we predict how many female students will graduate, and when?
Can we predict when a student with no transfer credit will graduate?
Can we predict the likelihood of graduation of a student based on gender?
6
How to Use Regression to Predict
Question – What kind of student takes the longest time to graduate? What kind of student never graduates?
7
Typical way
Start with a specific cohort (e.g., Fall 1993).
Select a single group (e.g., 1-12 transfer credits).
Count the number who graduate each term.
Compute the percentage: 25 graduated / 100 started = 25%.
Conclusion: for the Fall 93 cohort, the graduation rate is 25% after 12 terms for those with 1-12 transfer credits.
8
Exiguousness of Typical Method –
The DV is implied, not specified (and therefore not tested).
It does not measure strength of association (correlation) to graduation time, or amount of effect (slope) on graduation time; e.g., it cannot compare age’s effect to transfer credits’ effect.
Graduation rate does not predict time-in-program or time-to-completion, or even whether or not one will graduate.
The procedure must be repeated for each time block.
9
Typical Method, e.g.
The time to graduation for each variable is not discrete: it includes all other variables.
Variable        Time to Graduation
Females         Mean = 16 terms, S = 5 terms
1-12 Xfer Cr    Mean = 13 terms, S = 4 terms
Married         Mean = 18 terms, S = 9 terms
10
But how about a single, black man with 17 transfer credits?
The procedure must be repeated for single students, then for black students, then for males, then for 13-20 transfer credits, and then you ‘eyeball’ how they correlate.
Is there a way to determine how much the 16-term time for females (previous example) would change for a single, black male with 17 transfer credit hours?
11
Regress it! There is a way!
The effects of gender, age, transfer credits, marital status, citizenship, ethnicity, and more on time to complete are directly measurable and comparable.
Pick a profile and I’ll tell you how long it will take for that student to graduate!
12
Procedure
1. Identify the dependent variable (DV), i.e., the question you are asking; e.g., Time to Graduate (Time).
2. Identify independent variables (IVs) that possibly affect graduation rates: gender, ethnicity, marital status, age, transfer credits, income.
3. Collect data.
4. Run a linear regression to determine:
(a) correlations between Time and the IVs
(b) significance of the difference in means of the IVs
(c) a regression model (y = a + b1X1 + … + bnXn) to predict Time from the IVs
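Step 4(c) can be sketched in a few lines of Python. This is only an illustration of what a linear regression routine does, not the SPSS procedure used in the presentation: it solves the normal equations in pure Python, the helper names `solve` and `fit_ols` are hypothetical, and the six student rows are invented so that the fitted constant and slopes can be checked against a known model.

```python
# Pure-Python ordinary least squares via the normal equations (X'X) b = X'y.
# The student rows below are invented for illustration; they are not study data.

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def fit_ols(rows, y):
    """Return [a, b1, ..., bn] for the model y = a + b1*X1 + ... + bn*Xn."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1 gives the constant a
    cols = list(zip(*X))
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * t for p, t in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


# Columns: Gender (1 = F, 0 = M), Age in years, Transfer credits.
rows = [(0, 25, 0), (1, 30, 12), (0, 40, 6), (1, 22, 18), (0, 35, 24), (1, 45, 3)]
# Time generated from a known model (a = 30, slopes .5, -.1, -.2) so the fit can be checked.
y = [30 + 0.5 * g - 0.1 * age - 0.2 * xfer for g, age, xfer in rows]

coefs = fit_ols(rows, y)
print([round(c, 3) for c in coefs])
```

Because the invented Time values follow the model exactly, the fit should recover the constant and the three slopes to within rounding. Real data would leave residual error, which is what the ANOVA and R2 results measure.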
13
Regression can tell you everything!
For a single male, age 32, with 18 transfer credits, we can expect a graduation time of 32 terms.
# Terms = a + .4*Marital + .2*Gender + .06*Age - .18*xfer
# Terms = 33 + .4*0 + .2*0 + .06*32 - .18*18
# Terms = 31.68, or about 32 terms
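The arithmetic for this profile can be checked with a short Python sketch. The function name `predict_terms` is hypothetical; the constant 33 and the slopes are the ones on the slide, and the codes Marital = 0 (single) and Gender = 0 (male) are assumed from the dummy coding introduced later in the deck.

```python
def predict_terms(marital, gender, age, xfer, a=33.0):
    """# Terms = a + .4*Marital + .2*Gender + .06*Age - .18*xfer (slide coefficients)."""
    return a + 0.4 * marital + 0.2 * gender + 0.06 * age - 0.18 * xfer


# Single (0), male (0), age 32, 18 transfer credits.
terms = predict_terms(marital=0, gender=0, age=32, xfer=18)
print(round(terms, 2))  # 31.68, i.e. about 32 terms
```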
14
Adding Variables
DV: Time to Graduation (# terms, ratio)
IVs:
Gender (F or M; nominal)
Ethnic (B, H, W, NA, API, Alien, Unk; nominal)
Alien (Alien or US; nominal)
Marital status (si, ma, di; nominal)
Age (# years; ratio)
Transfer credits (# hours; ratio)
Tutoring done (# sessions, ratio; or Y/N, nominal)
15
Coding Your Variables
Scale (ratio) variables (time to completion, age, etc.): use the number directly.
E.g., Age = 32 years, use ‘32’; Time to Comp (terms) = 12 terms, use ‘12’.
16
Coding Your Variables: What are Dummy Variables?
Nominal variables: use ‘dummies’.
Dummy variables are variables used to quantify nominal variables; i.e., nominal (qualitative) variables are assigned a number and treated as quantitative variables.
17
Dummy Variables
Dichotomous variable: two categories. E.g.:
Male or Female
Married or Single
Has had tutoring or hasn’t
US Citizen or Alien
Graduate student or Undergrad
Polychotomous variable: several categories of the variable. E.g.:
Ethnic: African-American, Hispanic, White
Major: Bus, Account, Computers, English, LA
Religion: Christian, Jew, Muslim, Hindu
18
Dummy Variables
E.g., ‘Gender’: code Male = 0, Female = 1 (or vice versa).
1 = ‘presence of characteristic’ (femaleness)
0 = ‘absence of characteristic’
‘Ethnic’: make B, NA/AN, W, API, H, and Unk each its own variable.
Code each as 1 = ‘presence of characteristic’ (‘Black’-ness) or 0 = ‘absence of characteristic’.
19
Dummy Variables
Alien: 1 = yes, 0 = no
Marital: 1 = MA/DI, 0 = SI
Gender: 1 = F, 0 = M
Age: number of years
Transfer credits: number
B: 1 = yes, 0 = no
AN: 1 = yes, 0 = no
W: 1 = yes, 0 = no
API: 1 = yes, 0 = no
H: 1 = yes, 0 = no
Unk: 1 = yes, 0 = no
# Terms = 3 terms + .2*1 + .3* * *3
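As a rough sketch, the coding scheme above could be applied to a raw student record like this. The function `code_student` and the record field names (`citizenship`, `marital`, `gender`, `ethnic`, `age`, `transfer_credits`) are hypothetical, and the sketch assumes 0 means "no" for every dummy.

```python
ETHNIC_CODES = ["B", "AN", "W", "API", "H", "Unk"]


def code_student(record):
    """Turn a raw record into the numeric variables used in the regression."""
    coded = {
        "Alien": 1 if record["citizenship"] == "Alien" else 0,
        "Marital": 1 if record["marital"] in ("MA", "DI") else 0,  # 0 = SI
        "Gender": 1 if record["gender"] == "F" else 0,             # 0 = M
        "Age": record["age"],                                      # ratio: use directly
        "Xfer": record["transfer_credits"],                        # ratio: use directly
    }
    # One 0/1 variable per ethnicity code; exactly one of them is 1.
    for code in ETHNIC_CODES:
        coded[code] = 1 if record["ethnic"] == code else 0
    return coded


# Hypothetical record: a single, black, male US citizen, age 32, 17 transfer credits.
student = {"citizenship": "US", "marital": "SI", "gender": "M",
           "ethnic": "B", "age": 32, "transfer_credits": 17}
coded = code_student(student)
print(coded)
```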
20
As Used in the Regression
Black, US citizen, married, female, 32 years old, 10 transfer credits:
# Terms = 32 terms
  + [.2*1 + .2*0 + .2*0 + .2*0] (ethnic)
  + .5*0 (Alien)
  + .4*1 (marital)
  + .2*1 (gender)
  + .06*32 (age)
  - 1.7*10 (xfer credits)
21
Nominal Variables – Dichotomous - 2 values
Create a new column for the dummy variable, or recode the original.
1 = presence of the characteristic of interest
0 = absence of the characteristic of interest
[Table: sample data pairing each original column with its dummy column: SEX and GENDR, TUTSES and TUTRD, LEVEL and U/G, MARITL and MARIT, VISA and ALIEN]
22
Nominal Variables – more than 2 values
Create new columns for dummy variables, one for each value.
1 = presence of characteristic (value)
0 = absence of characteristic
[Table: sample data with the original columns MAJOR (ACC, BUS, CIS) and ETHNIC, and the ethnicity dummy columns 1BLACK, 2NATAM, 3WHITE, 4ASIAN, 5HISP, 0UNKN]
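A minimal sketch of coding a multi-valued nominal variable follows. It makes one assumption beyond this slide: one value is held out as the reference category, which is how the Model 3 results later report majors (Computers, B = 0) and ethnicity (White, B = 0). The helper name `dummies` is hypothetical.

```python
def dummies(value, categories, reference):
    """Return {category: 0 or 1} for every category except the reference."""
    return {c: int(value == c) for c in categories if c != reference}


MAJORS = ["ACC", "BUS", "CIS"]

# CIS is the reference, so a CIS student is coded 0 on both remaining columns.
acc_student = dummies("ACC", MAJORS, reference="CIS")
cis_student = dummies("CIS", MAJORS, reference="CIS")
print(acc_student, cis_student)
```

With this coding, each remaining slope reads as the difference from the reference category, all else equal.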
23
SPSS Run the Regression
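1. ANALYZE / REGRESSION / LINEAR
2. Move the DV to Dependent.
3. Move the first model’s IVs to Independent; click NEXT.
4. Move the 2nd model’s IVs to Independent; click NEXT to add another model, or go on to STATISTICS.
5. In STATISTICS, check Model Fit, Descriptives, and R Squared Change; click Continue, then OK.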
24
The Results!
25
Regression Models
26
Variable Correlations
[Table: variable correlations; surviving values .005 and .338]
Note: although all variables show correlation to each other, the correlation (R) may not be significant.
27
The Regression ANOVA
The test of significance of the F statistic indicates that all three regression models are statistically significant (Sig. < .05); i.e., the variation was not by chance, and another set of data would probably show the same results.
28
The Regression ANOVA
F = 893.215 / 38.960 = 22.926
The larger the F (the ratio of the mean square of the Regression to the mean square of the Error/Residual), the more robust the regression equation; i.e., a smaller mean square residual indicates smaller error, or departure from the regression line.
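The F ratio on this slide can be reproduced directly. In this small sketch the function name `f_statistic` is hypothetical, and the two mean squares are the values shown on the slide.

```python
def f_statistic(ms_regression, ms_residual):
    """F = mean square of the Regression / mean square of the Error (Residual)."""
    return ms_regression / ms_residual


f = f_statistic(893.215, 38.960)
print(round(f, 3))  # 22.926
```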
29
Variation about the Regression Line
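[Figure: variation about the regression line for Model 1 and Model 2; y-axis: Qtrs to Completion; each panel marks y, ŷ, and the error between them]
Interpretation: the Mean Square Error/Residual of Model 1 is greater than the Mean Square Error of Model 2.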
30
The Regression Correlation (R)
Model 3 returns the highest correlation (R = .392) with 15.4% (R2 = .154) of the variation in Time to Completion (in Qtrs) being explained by the variables Alien, Ethnicity, Marital status, Gender, Age, Tutoring, Transfer credits, U/G status, and Major.
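The link between R and R2 is simple arithmetic, sketched here with the Model 3 value of R from the slide. The variable names are illustrative only.

```python
r = 0.392                    # Model 3 correlation (R) from the slide
r_squared = r ** 2           # share of variation in Time to Completion explained
unexplained = 1 - r_squared  # share left unexplained

print(round(r_squared, 3), round(unexplained, 3))  # 0.154 0.846
```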
31
The Slopes: Model 3 Interpretation
The older the student, the shorter the time to completion (B = -.117).
32
Model 3 Slopes Graph – AGE
[Figure: Model 3 slope for Age; x-axis: Age, 0 to 70 yrs; y-axis: Qtrs to Completion; y-axis value shown: 35.577]
Interpretation: the Age slope is shallow, with a slight effect on Qtrs to Completion.
33
The Slopes: Model 3 Interpretation
The older the student, the shorter the time to completion (B = -.117).
Married/Divorced tends to shorten completion time (B = ), but is not significant (Sig. = .309, > .05).
34
Model 3 Slopes Graph – Married/Divorced
[Figure: Model 3 slope for Married/Divorced; x-axis: 0 (Single) to 1 (Married/Divorced); y-axis: Qtrs to Completion; y-axis value shown: 35.577]
Interpretation: the Married/Divorced slope is very shallow and not significant (Sig. = .309).
35
The Slopes: Model 3 Interpretation
The older the student, the shorter the time to completion (B = -.117).
Married/Divorced tends to shorten completion time (B = ), but is not significant (Sig. = .309, > .05).
Undergraduates tend to take considerably less time to complete than graduates (B = ).
36
Model 3 Slopes Graph – Undergraduate vs Graduate
[Figure: Model 3 slope for Undergraduate vs Graduate; x-axis: 0 (Graduate) to 1 (Undergraduate); y-axis: Qtrs to Completion; y-axis value shown: 35.577]
Interpretation: the Undergraduate slope is steep; undergraduates tend to take considerably fewer Qtrs to Completion than graduates.
37
The Slopes: Model 3 Interpretation
The older the student, the shorter the time to completion (B = -.117).
Married/Divorced tends to shorten completion time (B = ), but is not significant (Sig. = .309, > .05).
Undergraduates tend to take considerably less time to complete than graduates (B = ).
Tutoring shortens time very slightly (B = ), but is not significant (Sig. = .571).
38
Model 3 Slopes Graph – Tutoring
[Figure: Model 3 slope for Tutoring; x-axis: 0 (No Tutoring) to 1 (Tutored); y-axis: Qtrs to Completion; y-axis value shown: 35.577]
Interpretation: the Tutoring slope is very shallow; tutoring shortens Qtrs to Completion very slightly, but is not significant (Sig. = .571, > .05).
39
The Slopes: Model 3 Interpretation
Xfer lengthens time very slightly (B = .04285); GPA shortens time but is not significant (Sig. > .05).
40
Model 3 Slopes Graph – GPA & Transfer Credits
[Figure: Model 3 slopes for GPA and Transfer Credits; x-axes: GPA, 1.00 to 4.00, and Xfer, 50 to 150; y-axis: Qtrs to Completion; y-axis value shown: 35.577]
Interpretation: the Xfer and GPA slopes are very shallow, and GPA is not significant (Sig. > .05).
41
The Slopes: Model 3 Interpretation
Xfer lengthens slightly; GPA shortens, but not significant.
Female tends to shorten time (B = -.110) over Male.
42
Model 3 Slopes Graph - Gender
[Figure: Model 3 slope for Gender; x-axis: 0 (Male) to 1 (Female); y-axis: Qtrs to Completion; y-axis value shown: 35.577]
Interpretation: Female Qtrs to Completion tend to be predictably shorter than Male Qtrs.
43
The Slopes: Model 3 Interpretation
Xfer lengthens slightly; GPA shortens, but not significant.
Female tends to shorten time (B = -.110) over Male.
Black, Nat Am & Unkn take longer than Whites (+B) (NA not significant).
Hisp & Asians tend to take shorter than Whites (-B).
44
Model 3 Slopes Graph - Ethnicity
[Figure: Model 3 slopes for Ethnicity; y-axis: Qtrs to Completion; lines for White (B = 0), Black (B = .439), Native Am (B = .719), Unknown (B = .531), Asian (B = -.553), and Hispanic]
Interpretation: Black, Native American & Unknown tend to take longer than Whites (+B); Hispanic & Asian tend to take shorter than Whites (-B).
45
The Slopes: Model 3 Interpretation
Xfer lengthens slightly; GPA shortens, but not significant.
Female tends to shorten time (B = -.110) over Male.
Black, Nat Am & Unkn take longer than Whites (+B); Hisp & Asians tend to take shorter than Whites (-B).
Alien tends to take less time than US citizen (B = -.618).
46
Model 3 Slopes Graph - Alien
[Figure: Model 3 slope for Alien; x-axis: 0 (US) to 1 (Alien); y-axis: Qtrs to Completion]
Interpretation: Alien tends to take less time than US citizen (B = -.618).
47
The Slopes: Model 3 Interpretation
Xfer lengthens slightly; GPA shortens, but not significant.
Female tends to shorten time (B = -.110) over Male.
Black, Nat Am & Unkn take longer than Whites (+B); Hisp & Asians tend to take shorter than Whites (-B).
Alien tends to take less time than US citizens (B = -.618).
Acc & Bus have a considerable effect (B = 2.638, 2.651), positive relative to the CIS slope of ‘0’.
48
Model 3 Slopes Graph - Major
[Figure: Model 3 slopes for Major; y-axis: Qtrs to Completion; Computers B = 0, Business B = 2.651, Accounting B = 2.638]
Interpretation: Accounting & Business have the steepest slopes (2.638, 2.651), positive relative to the CIS slope of ‘0’.
49
Coffee-break!
50
The Equation
MODEL 3
IV             B (Slope)
(Constant)
Age            -.117
Gender         -.110
Married        -4.05E-02
Black          .439
Native Am      .719
Asian          -.553
Hispanic       -.830
Unknown        .531
Alien          -.618
GPA            -.277
Transfer Cr    4.285E-02
Undergrad
Tutoring       -4.71E-07
Accounting     2.638
Business       2.651
Y = a + bAge + bGen + bMar + bBlk + bNA + bAsn + bHis + bUnk + bAln + bGPA + bXfer + bUndergrad + bTutor + bAcc + bBus
Y = a + (-.11)Age + (-.11)Gen + (-.04)Mar + (.43)Black + (.71)NatAm + (-.55)Asian + (-.83)Hisp + (.53)Unk + (-.61)Alien + (-.27)GPA + (.04)Xfer + (-3.25)Under + (-.04)Tutor + (2.63)Acc + (2.65)Bus
51
Let’s Predict Someone!
What is the predicted Quarters to completion for: Age 36, Male, Single, Black, US citizen, 3.5 GPA, 35 Transfer credits, Undergraduate, no Tutoring, Business major?
Y = a - (.11)Age - (.11)Gen - (.04)Mar + (.43)Black + (.71)NatAm - (.55)Asian - (.83)Hisp + (.53)Unk - (.61)Alien - (.27)GPA + (.04)Xfer - (3.25)Under - (.04)Tutor + (2.63)Acc + (2.65)Bus
With a = 35.577 (the value marked on the y-axis of the slope graphs):
Y = 35.577 - (.11)(36) - (.11)(0) - (.04)(0) + (.43)(1) + (.71)(0) - (.55)(0) - (.83)(0) + (.53)(0) - (.61)(0) - (.27)(3.5) + (.04)(35) - (3.25)(1) - (.04)(0) + (2.63)(0) + (2.65)(1)
Y = 35.577 - 3.96 + .43 - .945 + 1.40 - 3.25 + 2.65 = 31.90
52
What is the predicted Quarters to completion for:
Age 45, Female, Married, White, Alien, 3.0 GPA, No Transfer credits, Undergraduate, Tutored, Computer major
Y = a - (.11)Age - (.11)Gen - (.04)Mar + (.43)Black + (.71)NatAm - (.55)Asian - (.83)Hisp + (.53)Unk - (.61)Alien - (.27)GPA + (.04)Xfer - (3.25)Under - (.04)Tutor + (2.63)Acc + (2.65)Bus
With a = 35.577 (the value marked on the y-axis of the slope graphs):
Y = 35.577 - (.11)(45) - (.11)(1) - (.04)(1) + (.43)(0) + (.71)(0) - (.55)(0) - (.83)(0) + (.53)(0) - (.61)(1) - (.27)(3.0) + (.04)(0) - (3.25)(1) - (.04)(1) + (2.63)(0) + (2.65)(0)
Y = 35.577 - 4.95 - .11 - .04 - .61 - .81 - 3.25 - .04 = 25.77, or about 25.8
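The prediction for this profile can be sketched in Python. Two things here are assumptions rather than part of the slide: the constant 35.577 is taken from the value marked on the slope graphs, and the names `COEF`, `INTERCEPT`, and `predict_quarters` are hypothetical. The slopes are the rounded ones from the slide's equation, with Unknown taken as positive to match the coefficient table.

```python
INTERCEPT = 35.577  # assumption: the value marked on the y-axis of the slope graphs

# Rounded Model 3 slopes as written in the slide's equation.
COEF = {
    "Age": -0.11, "Gen": -0.11, "Mar": -0.04,
    "Black": 0.43, "NatAm": 0.71, "Asian": -0.55, "Hisp": -0.83, "Unk": 0.53,
    "Alien": -0.61, "GPA": -0.27, "Xfer": 0.04,
    "Under": -3.25, "Tutor": -0.04, "Acc": 2.63, "Bus": 2.65,
}


def predict_quarters(profile):
    """Y = a + sum of slope * value; variables missing from the profile count as 0."""
    return INTERCEPT + sum(COEF[name] * profile.get(name, 0) for name in COEF)


# Age 45, Female, Married, White (reference), Alien, 3.0 GPA, no transfer credits,
# Undergraduate, Tutored, Computer major (reference).
profile = {"Age": 45, "Gen": 1, "Mar": 1, "Alien": 1, "GPA": 3.0,
           "Xfer": 0, "Under": 1, "Tutor": 1}
quarters = predict_quarters(profile)
print(round(quarters, 1))  # 25.8
```

White and Computers do not appear in the profile because they are the reference categories, so their dummies are all 0.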
53
Example Profiles Excel
54
Variation in the DV
The difference (change) with each successive Model is significant (Sig. F Change < .05).
Each successive Model explains more of the variation (R2) in the DV (Time to Completion).
But 84.6% or more of the variation is still unexplained.
55
Possible factors?
Worklife, children, personal goals, financial aid, company sponsorship.
The point is: with R2 only .154, there is some other factor out there contributing more to Time to Completion, and we need to find it!
56
Variation in the Slopes
Is the slope of Age (-.117) more or less than the slope of GPA (-.277)?
You cannot tell from the slopes; that would be comparing apples to oranges.
For apples to apples, use the standardized ‘Beta’.
Beta for Age (-.162) is larger in magnitude than Beta for Acc (.016); i.e., a Z unit of Age results in a greater change than a Z unit of GPA.
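A standardized Beta rescales a slope by the spread of its variable: Beta = B * (SD of the IV / SD of the DV). The sketch below uses the Age and GPA slopes from the slide, but the three standard deviations are invented for illustration, so the resulting Betas will not match the SPSS output. The function name `standardized_beta` is hypothetical.

```python
def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized slope B into a standardized Beta."""
    return b * sd_x / sd_y


SD_QTRS = 6.0  # hypothetical SD of Time to Completion (quarters)
SD_AGE = 8.0   # hypothetical SD of Age (years)
SD_GPA = 0.5   # hypothetical SD of GPA (points)

beta_age = standardized_beta(-0.117, SD_AGE, SD_QTRS)
beta_gpa = standardized_beta(-0.277, SD_GPA, SD_QTRS)
print(round(beta_age, 3), round(beta_gpa, 3))
```

Under these made-up spreads, Age has the larger Beta even though GPA has the larger raw slope, which is why raw slopes cannot be compared directly.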
57
Drawing Conclusions
Summarize the correlations (Pearson’s R): “There is a statistically significant association between all the variables and Time to Completion.”
Summarize the effects (coefficient B): “Academic major, Transfer credits, and Undergraduate status seem to have the greatest effects.”
Summarize the variation (R2): “However, 84.6% of the variation in Time to Completion is still unexplained.”
Suggest what’s next: “Data on worklife, income, finances, and company sponsorship should be collected and analyzed.”
58
In Summary
Regression measures the strength of association (correlation) for all variables considered at the same time.
Regression measures the amount of effect (slope) of each variable on the dependent variable, as adjusted for all other variables.
Regression can predict the outcome of any given profile.
59
Regress it, Pal! It’s where it’s at! Quantifying Qualitative Variables