REGRESSION (G&W pp. 498-504) http://stattrek.com/AP-Statistics-1/Regression-Example.aspx?Tutorial=AP
THIS IS A TABLE YOU SHOULD KNOW BEFORE ANALYZING YOUR DATA:

Dependent Variable | Independent Variable(s)            | Method
Nominal/Ordinal    | Nominal/Ordinal                    | Chi-square test
Interval/Ratio     | Nominal/Ordinal                    | T-test (2 groups) or ANOVA (3+ groups); Multifactorial ANOVA (2+ IVs)
Interval/Ratio     | Interval/Ratio                     | Regression
Interval/Ratio     | Interval/Ratio AND Nominal/Ordinal | ANCOVA/GLM
Basics of bivariate regression: when do we need it?
- The dependent variable is interval or ratio level.
- The independent variable is interval or ratio (there are ways of incorporating ordinal and nominal variables).
- We posit a directional association between the two variables.
- We have a theory that supports the posited directional association.
Technique: algebra of the straight line
[Figure: plot of the line y = 2x + 3 through the points (3, 9) and (6, 15)]
Regression form: y = bx + c, where b = slope = rise/run = 6/3 = 2 and c = y-intercept = 3.
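The slope-and-intercept arithmetic from the figure can be sketched in a few lines of Python (a minimal illustration; the helper name `line_through` is made up here, and the points are the ones read off the plot):

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)   # slope = rise / run
    c = y1 - b * x1             # intercept: solve y1 = b*x1 + c for c
    return b, c

b, c = line_through((3, 9), (6, 15))
print(b, c)  # 2.0 3.0, i.e. y = 2x + 3
```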
Regression line: now we have multiple data points (y: Reading Scores; x: Family Income). What do we assume about deviations from the line?
Technique: statistics of the straight line. We have multiple data points (not just two) and need to find the line that best fits them. Define "best fit" as minimum squared deviations. The regression equation is

y = bx + a + ε

where y is the dependent variable, x is the independent variable, ε is the error, and the regression coefficients are b (slope) and a (intercept).
Technique: statistics of the straight line. The line that "best fits" has the following slope:

b = r × (s_y / s_x)

where b is the regression coefficient of y on x, r is the correlation coefficient of x and y, and s_y and s_x are the standard deviations of y and x. The line that "best fits" has the following intercept:

a = ȳ - b·x̄
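These two formulas can be checked directly in Python (a sketch on hypothetical data; the function name `fit_line` and the data values are illustrative, not from the source):

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Best-fit slope via b = r * (s_y / s_x); intercept via a = ybar - b * xbar."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    # Pearson correlation coefficient of x and y
    r = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b = r * sy / sx
    return b, ybar - b * xbar

# Hypothetical data scattered around y = 2x + 3
xs = [1, 2, 3, 4, 5]
ys = [5.1, 6.9, 9.2, 11.0, 12.8]
b, a = fit_line(xs, ys)  # b ≈ 1.95, a ≈ 3.15
```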
Things to know
- A significant regression coefficient does not "prove" causality.
- A regression can provide "predicted" values.
- Predicted values are "point" estimates.
- Confidence intervals around the point estimates provide very important information.
Basics of linear regression. Hypothesis: one or more variables x (x1, x2, x3, …) cause another variable y. Corollary: x can be used to partially predict y. Mathematical implication:

y = f(x) + ε

where f is linear and ε is minimal and random.
Example: the regression approach. Error in prediction is minimal and random.
What these assumptions look like in practice: Child's IQ = 20.99 + 0.78 × Mother's IQ
Prediction: Child's IQ = 20.99 + 0.78 × Mother's IQ. Predicted Case 3 IQ = 20.99 + 0.78 × 110 ≈ 106.79. Actual Case 3 IQ = 102, so the prediction error (residual) is about -4.79.
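The prediction step can be sketched as follows (the coefficients are the rounded values from the slide, so the point estimate differs slightly from any unrounded original; the function name is illustrative):

```python
def predict_child_iq(mother_iq, intercept=20.99, slope=0.78):
    """Point estimate of a child's IQ from the mother's IQ (rounded slide coefficients)."""
    return intercept + slope * mother_iq

pred = predict_child_iq(110)   # 20.99 + 0.78 * 110 ≈ 106.79
residual = 102 - pred          # actual minus predicted, ≈ -4.79
```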
How are the regression coefficients computed? By minimizing the squared deviations between actual and predicted values:

minimize Σ (y_i - (a + b·x_i))²

so that ε is "minimal" and random.
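This minimization can be demonstrated numerically: the closed-form least-squares coefficients give a smaller sum of squared deviations than any nearby line. A sketch on hypothetical data (function names `sse` and `ols` are made up for the illustration):

```python
from statistics import mean

def sse(xs, ys, a, b):
    """Sum of squared deviations between actual and predicted values."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def ols(xs, ys):
    """Closed-form coefficients (a, b) that minimize the SSE."""
    xbar, ybar = mean(xs), mean(ys)
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    return ybar - b * xbar, b

# Hypothetical data: nudging the fitted (a, b) in any direction raises the SSE
xs = [1, 2, 3, 4, 5]
ys = [5.1, 6.9, 9.2, 11.0, 12.8]
a, b = ols(xs, ys)
assert sse(xs, ys, a, b) < sse(xs, ys, a + 0.1, b)
assert sse(xs, ys, a, b) < sse(xs, ys, a, b - 0.1)
```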
Interpreting coefficients: the slope b is the expected change in y for a one-unit increase in x; the intercept a is the expected value of y when x = 0.
Error in estimation of b. The estimate of b will differ from sample to sample: there is sampling error in the estimate, and b is not equal to the population value of the slope (B). If we took many, many simple random samples and estimated b each time, the estimates would form a sampling distribution around B.
Standard error of b:

SE(b) = sqrt( MSE / Σ(x_i - x̄)² ), where MSE = SSE / (n - 2).
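A sketch of this standard-error formula on hypothetical data (the function name `slope_se` is illustrative; a tight fit yields a small standard error):

```python
from math import sqrt
from statistics import mean

def slope_se(xs, ys):
    """Standard error of the slope b: sqrt(MSE / Sxx), with MSE = SSE / (n - 2)."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)        # estimate of the residual variance
    return sqrt(mse / sxx)

xs = [1, 2, 3, 4, 5]
ys = [5.1, 6.9, 9.2, 11.0, 12.8]
se_b = slope_se(xs, ys)        # ≈ 0.05 for these data
```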
Partitioning the variance:

Total variance = variance due to regression + error variance

Dividing through by the total variance:

1 = (proportion of variance due to regression) + (proportion of variance due to error)
1 = R² + (1 - R²)

where R² = (sum of squared deviations due to regression) / (sum of squared deviations from the mean), and 1 - R² = (sum of squared errors of regression) / (sum of squared deviations from the mean).
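The decomposition SST = SSR + SSE, and hence R² + (1 - R²) = 1, can be verified numerically (a sketch on hypothetical data; variable names are the conventional SS abbreviations):

```python
from statistics import mean

# Hypothetical data scattered around a straight line
xs = [1, 2, 3, 4, 5]
ys = [5.1, 6.9, 9.2, 11.0, 12.8]
xbar, ybar = mean(xs), mean(ys)
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
a = ybar - b * xbar
preds = [a + b * x for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)              # total: deviations from the mean
ssr = sum((p - ybar) ** 2 for p in preds)           # due to regression
sse = sum((y - p) ** 2 for y, p in zip(ys, preds))  # error

assert abs(sst - (ssr + sse)) < 1e-9   # total variance splits exactly
r2 = ssr / sst
assert abs(r2 + sse / sst - 1) < 1e-9  # R^2 + (1 - R^2) = 1
```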