Here, pal! Regress this! presented by Miles Hamby, PhD, Principal, Ariel Training Consultants MilesFlight.20megsfree.com Or, How to Use Regression to Tell You Just About Everything

Here, pal! Regress this! presented by Miles Hamby, PhD, Principal, Ariel Training Consultants MilesFlight.20megsfree.com Or, How to Use Regression to Tell You Just About Everything Part 1

Typical – Descriptive Statistics Frequencies – numbers of things eg – How many female students have graduated over the last 6 years? Mean – measure of central tendency eg – What is the average time to complete an academic program for students with 12 hours transfer credit? Standard Deviation – measure of dispersion eg – 68% of completing students graduate within how many terms?

Shortcoming of Descriptive Statistics They do not predict. They can tell you what it is – but they can’t tell you what it will be

eg - Can we predict how many female students will graduate and when? Regression predicts! Can we predict when a student with no transfer credit will graduate? Can we predict the likelihood of graduation of a student based on gender?

How to Use Regression to Predict Question – What kind of student takes the longest time to graduate? What kind of student never graduates?

Typical way – Start with specific cohort (eg, Fall 1993). Select a single group (eg, 1-12 transfer credits). Count number who graduate each term. Compute percentage ~ 25 graduated ÷ 100 started = 25%. Conclusion – For Fall 93 cohort, graduation rate = 25% after 12 terms for those with 1-12 transfer credits.

Exiguousness of Typical Method – DV implied, not specified (and therefore not tested) Does not measure strength of association to graduation time (correlation) or amount of effect (slope) on graduation time eg – compare age’s effect to transfer credits’ effect Graduation Rate does not predict time-in-program or time-to-completion Must repeat procedure for each time block

Typical Method, e.g. – Time to graduation for each variable not discrete - includes all other variables
Variable          Time to Graduation
Females ~         X̄ = 16 terms, S = 5 terms
1-12 Xfer Cr ~    X̄ = 13 terms, S = 4 terms
Married ~         X̄ = 18 terms, S = 9 terms

But how about a single, black, man with 17 transfer credits? Must repeat procedure for single students, then repeat for black students, then repeat for males then repeat for 13 – 20 transfer credits, then ‘eyeball’ how they correlate. Is there a way to determine how much of the 16 terms time for females (previous ex.) would be ameliorated by being a single, black, male with 17 transfer credit hours?

There is a way! Regress it! Effects of gender, age, transfer credits, marital status, citizenship, ethnicity, and more, directly on time to complete are measurable and comparable Pick a profile and I’ll tell you how long it will take for that student to graduate!

Procedure –
1. Identify dependent variable (DV) – i.e., the question you are asking – eg, Time to Graduate (Time)
2. Identify independent variables (IV) that possibly affect graduation rates – gender, ethnicity, marital status, age, transfer credits, income
3. Collect data
4. Run linear regression to determine: (a) correlations between Time and IVs (b) significance of difference in means of IVs (c) regression model ($y = a + b_1 X_1 + \dots + b_n X_n$) to predict Time by IVs
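As a rough illustration of step 4(c), the sketch below fits a model of the form y = a + b1*X1 + b2*X2 by ordinary least squares in plain Python. The presentation itself runs the regression in SPSS; the function name `fit_linear_regression`, the six student records, and the rule used to generate the Time values are all hypothetical, chosen only so the fitted coefficients can be checked against known values. Correlations and significance tests (steps 4a and 4b) are not covered here.

```python
def fit_linear_regression(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: one tuple of IV values per case; y: the DV value for each case.
    Returns [a, b1, b2, ...] with the intercept a first.
    """
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 estimates the intercept a
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(d[i] * d[j] for d in design) for j in range(k)]
        + [sum(d[i] * yi for d, yi in zip(design, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical students: (age in years, transfer credit hours).
students = [(22, 0), (35, 12), (28, 6), (41, 18), (30, 9), (25, 3)]
# Hypothetical Time values generated from a known rule, so the fit can be checked.
time_to_grad = [20 + 0.1 * age - 0.2 * xfer for age, xfer in students]

a, b_age, b_xfer = fit_linear_regression(students, time_to_grad)
print(f"a = {a:.2f}, b_age = {b_age:.2f}, b_xfer = {b_xfer:.2f}")
```

Because the made-up Time values follow the rule exactly, the fit recovers roughly a = 20, b_age = 0.1 and b_xfer = -0.2. Real data would leave residual error, which is what the ANOVA slides later in the presentation measure.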

Regression can tell you everything! # Terms = a + .4*marital + .2*Gender + .06*Age - .18*xfer EG – For a single male, age 32, with 18 transfer credits - we can expect a graduation time of 32 terms: # Terms = 33 terms + .4*0 + .2*0 + .06*32 - .18*18 = 31.68, or about 32 terms
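The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the function name `predict_terms` is hypothetical, the coefficients and the intercept of 33 terms are the ones shown on the slide, and the coding single = 0, male = 0 follows the slide's own substitution.

```python
def predict_terms(marital, gender, age, xfer, a=33.0):
    """# Terms = a + .4*marital + .2*Gender + .06*Age - .18*xfer (slide coefficients)."""
    return a + 0.4 * marital + 0.2 * gender + 0.06 * age - 0.18 * xfer


# Single (0) male (0), age 32, with 18 transfer credits.
terms = predict_terms(marital=0, gender=0, age=32, xfer=18)
print(round(terms, 2))  # 31.68, which rounds to the slide's 32 terms
```

Changing one argument at a time shows how much each variable moves the prediction, which is the comparison the typical method cannot make.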

Adding Variables
DV ~ Time to Graduation (# terms - ratio)
IV ~ Gender (F or M - nominal)
Ethnic (B, H, W, NA, API, Alien - nominal)
Alien (Alien or US - nominal)
Marital status (si, ma, di – nominal)
Age (# years - ratio)
Transfer credits (# hours - ratio)
Tutoring done (# sessions – ratio; Y/N - nominal)

Coding Your Variables Scale (ratio) variables (time to completion, age, etc) – use number directly eg, Age = 32 years, use ’32’ Time to Comp (terms) = 12 terms, use ’12’

Coding Your Variables Nominal Variables – use ‘dummies’ What are Dummy Variables? Variables used to quantify nominal variables i.e., Nominal (qualitative) variables assigned a quantitative number and treated as a quantitative variable.

Dummy Variables
Dichotomous variable – two categories, eg - Male or Female; Married or Single; Has had tutoring or hasn’t; US Citizen or Alien; Graduate student or Undergrad
Polychotomous variable – several categories of the variable, eg – Ethnic - African-American, Hispanic, White; Major – Bus, Account, Computers, English, LA; Religion – Christian, Jew, Muslim, Hindu

Dummy Variables
eg, ‘Gender’ – Code Male = 0, Female = 1 (or vice-versa): 1 = ‘presence of characteristic’ (femaleness), 0 = ‘absence of characteristic’
‘Ethnic’ – Make B, NA/AN, W, API, H, Unk unique variables. Code each as 1 = ‘presence of characteristic’ (‘Black’-ness) or 0 = ‘absence of characteristic’

Dummy Variables
B: 1 = yes, 0 = no
AN: 1 = yes, 0 = no
W: 1 = yes, 0 = no
API: 1 = yes, 0 = no
H: 1 = yes, 0 = no
Unk: 1 = yes, 0 = no
Alien: 1 = yes, 0 = no
Marital: 1 = MA/DI, 0 = SI
Gender: 1 = F, 0 = M
Age: number years
Transfer credits: number
# Terms = 3 terms +.2*1 +.3* *10 +.4*3

As Used in the Regression e.g. ~ Black, US Citizen, female, married, 32 years old, 10 transfer credits: # Terms = 32 terms + [.2*1 + .2*0 + .2*0 + .2*0] (ethnic) + .5*0 (Alien) + .4*1 (marital) + .2*1 (gender) + .06*32 (age) - 1.7*10 (xfer credits)

Nominal Variables – Dichotomous - 2 values. Create new column for dummy variable or recode original. 1 = presence of characteristic of interest, 0 = not the characteristic of interest (absence of characteristic)
ALIEN  VISA   MARIT  MARITL  U/G  LEVEL  TUTRD  TUTSES  GENDR  SEX
1      F-4    9              0    G      0      0       1      F
0      US     0      SI      1    U      0      0       0      M
1      GREEN  1      MA      1    U      1      1       0      M
1      P-R    1      MA      0    G      0      0       1      F
0      US     1      DI      1    U      1      1       0      M
0      US     0      SI      1    U      1      2       1      F
1      F-1    0      SI      1    U      1      3       1      F

Nominal Variables – more than 2 values Create new columns for dummy variables – one for each value 1 = presence of characteristic (value) 0 = absence of characteristic
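The coding rule for both the dichotomous and the multi-value case can be sketched in a few lines of Python. The records below are hypothetical and the helper name `dummy_code` is an assumption for illustration. The presentation does this recoding in SPSS.

```python
def dummy_code(values, categories):
    """Return one 0/1 column per category: 1 = presence, 0 = absence."""
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical records for illustration only.
sex = ["F", "M", "M", "F", "F"]
ethnic = ["B", "W", "H", "W", "API"]

# Dichotomous: a single new column, 1 = F, 0 = M.
gender = [1 if s == "F" else 0 for s in sex]

# More than two values: one new column per value.
ethnic_dummies = dummy_code(ethnic, ["B", "W", "H", "API"])

# Assumption: when the columns go into a regression, one category is usually
# left out as the baseline. The results slides report White with B = 0,
# which is consistent with White being that baseline.
regression_columns = {k: v for k, v in ethnic_dummies.items() if k != "W"}

print(gender)
print(regression_columns)
```

Each remaining column's slope is then read relative to the omitted category, which is how the later ethnicity and major slides phrase their interpretations.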

Run the Regression SPSS

The Results!

Regression Models

Variable Correlations Note – although some variables are highly correlated with each other, the correlation (R) may not be significant

The Regression ANOVA Test of significance of the F statistic indicates all three regression models are statistically significant (Sig. < .05), i.e., the variation was not by chance – another set of data would probably show the same results.

The Regression ANOVA The larger the F (ratio of the mean square of the Regression to the mean square of the Error/Residual), the more robust the regression equation. I.e., a smaller mean square residual indicates smaller error, or departure from the regression line. F = Mean Square Regression ÷ Mean Square Residual
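To make the ratio concrete, here is a small sketch that computes F from observed and predicted values. The five data points and their fitted values are hypothetical (a one-predictor least-squares fit worked by hand), and `f_statistic` is an illustrative name, not an SPSS function.

```python
def f_statistic(y, y_hat, n_predictors):
    """F = mean square regression / mean square residual."""
    n = len(y)
    mean_y = sum(y) / n
    ss_regression = sum((yh - mean_y) ** 2 for yh in y_hat)
    ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ms_regression = ss_regression / n_predictors         # df = number of IVs
    ms_residual = ss_residual / (n - n_predictors - 1)   # df = n - IVs - 1
    return ms_regression / ms_residual


# Hypothetical data: x = 1..5, with the least-squares line y_hat = 2.2 + 0.6*x.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

f = f_statistic(y, y_hat, n_predictors=1)
print(round(f, 2))  # 4.5
```

Here the regression sum of squares is 3.6 and the residual sum of squares is 2.4, so F = 3.6 ÷ 0.8 = 4.5. Shrinking the residuals while holding the regression part fixed makes F larger, which is the point the slide makes.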

Variation about the Regression Line Interpretation – Mean Square Error/Residual of Model 1 is > Mean Square Error of Model 2 [Graphs: Model 1 and Model 2, Y = QTRS to Completion, each showing the error between y and ŷ about the regression line]

The Regression Correlation (R) Model 3 returns the highest correlation (R = .392), with 15.4% (R² = .154) of the variation in Time to Completion (in Qtrs) being explained by the variables Alien, Ethnicity, Marital status, Gender, Age, Tutoring, Transfer credits, U/G status, and Major.
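The link between the two figures on this slide is just a square, which a short Python check makes explicit. The variable names are illustrative; the value of R is the one reported for Model 3.

```python
r = 0.392                     # Model 3 correlation from the slide
r_squared = r ** 2            # share of variation in Time to Completion explained
unexplained = 1 - r_squared   # share left to other factors or error

print(f"R squared = {r_squared:.3f}")      # R squared = 0.154
print(f"unexplained = {unexplained:.3f}")  # unexplained = 0.846
```

So even the best of the three models leaves roughly 85% of the variation in quarters to completion unexplained by the listed variables.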

The Slopes Model 3 Interpretation The older the student, the shorter the time to completion (B = -.117)

Model 3 Slopes Graph – AGE Interpretation – Age slope shallow, slight effect on Qtrs to Completion [Graph: Y = QTRS to Completion, X = AGE (yrs)]

The Slopes Model 3 Interpretation The older the student, the shorter the time to completion (B = -.117) Married/Divorced tends to shorten completion time (B= ), but is not significant (Sig. =.309, >.05)

Model 3 Slopes Graph – Married/Divorced Interpretation – Married/Divorced very shallow, and not significant (Sig. = .309, > .05) [Graph: Y = QTRS to Completion, X = 0 (Single) to 1 (Married/Divorced)]

The Slopes Model 3 Interpretation The older the student, the shorter the time to completion (B = -.117) Married/Divorced tends to shorten completion time (B= ), but is not significant (Sig. =.309, >.05) Undergraduates tend to take considerably less time to complete than graduates (B = )

Model 3 Slopes Graph – Undergraduate vs Graduate Interpretation – Undergraduates steep, tend to shorten Qtrs to Completion considerably over Graduates [Graph: Y = QTRS to Completion, X = 0 (Graduate) to 1 (Undergraduate)]

The Slopes Model 3 Interpretation The older the student, the shorter the time to completion (B = -.117) Married/Divorced tends to shorten completion time (B= ), but is not significant (Sig. =.309, >.05) Undergraduates tend to take considerably less time to complete than graduates (B = ) Tutoring shortens time very slightly (B = ), but is not significant (Sig. =.571)

Model 3 Slopes Graph – Tutoring Interpretation – Tutoring very shallow, shortens Qtrs to Completion very slightly, but not significant (Sig. = .571, > .05) [Graph: Y = QTRS to Completion, X = 0 (No Tutoring) to 1 (Tutored)]

The Slopes Model 3 Interpretation Xfer lengthens time very slightly (B = .04285); GPA shortens time but is not significant (Sig. > .05)

Model 3 Slopes Graph – GPA & Transfer Credits Interpretation – Xfer & GPA very shallow, and GPA not significant (Sig. > .05) [Graph: Y = QTRS to Completion, slope lines for Xfer and GPA]

The Slopes Model 3 Interpretation Xfer lengthens slightly; GPA shortens, but not significant Female (neg) tends to shorten time (B = -.329) over Male

Model 3 Slopes Graph - Gender Interpretation – Female Qtrs to Completion tend to be predictably shorter than Male Qtrs [Graph: Y = QTRS to Completion, X = 0 (Male) to 1 (Female)]

The Slopes Model 3 Interpretation Xfer lengthens slightly; GPA shortens, but not significant Female (neg) tends to shorten time (B = -.329) over Male Black, Nat Am & Unkn take longer than Whites (+ B) (NA not significant) Hisp & Asians tend to take shorter than Whites (-B)

Model 3 Slopes Graph - Ethnicity Interpretation – Black, Native American & Unknown tend to take longer than Whites (+ B); Hispanic & Asian tend to take shorter than Whites (-B) [Graph: Y = QTRS to Completion; White B = 0, Black B = .439, Hispanic B = , Unknown B = .531, Native Am B = .719, Asian B = -.553]

The Slopes Model 3 Interpretation Xfer lengthens slightly; GPA shortens, but not significant Female (neg) tends to shorten time (B = -.329) over Male Black, Nat Am & Unkn take longer than Whites (+ B); Hisp & Asians tend to take shorter than Whites (-B) Alien tends to take less time than US citizen (B = -.618)

Model 3 Slopes Graph - Alien Interpretation – Alien tends to take less time than US citizen (B = -.618) [Graph: Y = QTRS to Completion, X = 0 (US) to 1 (Alien)]

The Slopes Model 3 Interpretation Xfer lengthens slightly; GPA shortens, but not significant Female (neg) tends to shorten time (B = -.329) over Male Black, Nat Am & Unkn take longer than Whites (+ B); Hisp & Asians tend to take shorter than Whites (-B) Alien tends to take less time than US citizens (B = -.618) Acc & Bus considerable effect (B= 2.638, 2.651); pos. relative to CIS slope ‘0’

Model 3 Slopes Graph - Major Interpretation – Accounting & Business steepest slopes (2.638, 2.651); positive relative to CIS slope ‘0’ [Graph: Y = QTRS to Completion; Computers B = 0, Business B = 2.651, Accounting B = 2.638]