Linear Regression
Linear regression uses correlations to predict the value of one variable from the value of another. It computes unknown outcomes from present, known outcomes. If we know the correlation between two variables and the value of one of them, we can predict the value of the other. In other words, what value on Y would be predicted by a score on X?
You are examining a relationship between continuous variables. You wish to predict scores on one variable from scores on the other.
Fit a line between the two variables that best captures the scores
◦ Minimal distance between each data point and the line
◦ Allows for the best guess at a score on the second variable given some data point on the first
◦ Error in prediction: distance from each point to the regression line
◦ If the correlation were perfect, every data point would fall exactly on the line and there would be no error in prediction (the line is at a 45-degree angle only when r = +1 and both variables are expressed as standard scores)
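To make "minimal distance" concrete, here is a minimal sketch that compares a few candidate lines on a small data set. It assumes the usual least-squares criterion (squared vertical distances); the scores and the candidate lines are invented for illustration and do not come from the lecture data.

```python
# Hypothetical scores, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def sum_squared_error(b, a):
    """Total squared vertical distance between the points and the line Y' = bX + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


# Three candidate lines as (slope b, intercept a) pairs.
candidates = {
    "flat line at the mean of Y": (0.0, 4.0),
    "steeper line": (1.0, 1.0),
    "least-squares line": (0.6, 2.2),
}

errors = {name: sum_squared_error(b, a) for name, (b, a) in candidates.items()}
best = min(errors, key=errors.get)

for name, sse in errors.items():
    print(f"{name}: SSE = {sse:.2f}")
print("Smallest error:", best)
```

The regression line is the one candidate that no other straight line can beat on this error measure.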
Y’ = bX + a
◦ Y’ = predicted score of Y based on X
◦ b = slope of the line
◦ a = point where the line crosses the Y-axis
◦ X = score used as the predictor
b
◦ The value of b is the slope
◦ From this we can tell how much the Y variable will change when X increases by 1 point
a
◦ The Y-intercept
◦ This tells us what Y would be if X = 0
◦ This is where the line crosses the Y-axis
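The roles of b and a can be checked directly with a short sketch. The slope and intercept values below are hypothetical, chosen only to illustrate the equation Y’ = bX + a.

```python
def predict(x, b, a):
    """Predicted score Y' for a predictor score X, using Y' = bX + a."""
    return b * x + a


# Hypothetical slope and intercept, for illustration only.
b = 0.6
a = 2.2

y_when_x_is_zero = predict(0, b, a)                    # equals the intercept a
change_per_point = predict(4, b, a) - predict(3, b, a)  # equals the slope b

print(f"Y' at X = 0: {y_when_x_is_zero:.2f}")
print(f"Change in Y' when X rises by 1 point: {change_per_point:.2f}")
```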
$$b = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}$$
$$a = \frac{\sum Y - b \sum X}{n}$$
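The computational formulas for b and a translate almost term by term into code. The sketch below is one way to do it; the function name `fit_line` and the five data points are assumptions made for illustration, not part of the lecture.

```python
def fit_line(xs, ys):
    """Return (b, a) for Y' = bX + a using the sum-based formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # b = [ΣXY - (ΣX ΣY / n)] / [ΣX² - (ΣX)² / n]
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # a = (ΣY - b ΣX) / n
    a = (sum_y - b * sum_x) / n
    return b, a


# Hypothetical scores, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, a = fit_line(xs, ys)
print(f"b = {b:.2f}, a = {a:.2f}")
```

For these scores the sums are ΣX = 15, ΣY = 20, ΣXY = 66, and ΣX² = 55, so b = (66 - 60) / (55 - 45) = 0.6 and a = (20 - 9) / 5 = 2.2.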
We can examine how closely the actual Y values approximate the predicted Y values
◦ Averaged across all data points, this difference is the standard error of the estimate
◦ It estimates the imprecision of the line
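A minimal sketch of the standard error of the estimate follows. It assumes the common definition for a single predictor, the square root of the summed squared prediction errors divided by n - 2; the scores and the line (b = 0.6, a = 2.2) are hypothetical.

```python
import math

# Hypothetical scores and a hypothetical fitted line, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, a = 0.6, 2.2

# Error in prediction for each point: actual Y minus predicted Y'.
residuals = [y - (b * x + a) for x, y in zip(xs, ys)]
ss_residual = sum(e ** 2 for e in residuals)

# Assumed definition: divide by n - 2 (two estimated values, b and a).
n = len(xs)
standard_error_of_estimate = math.sqrt(ss_residual / (n - 2))

print(f"Sum of squared errors: {ss_residual:.2f}")
print(f"Standard error of the estimate: {standard_error_of_estimate:.3f}")
```

A smaller value means the actual scores sit closer to the line, so predictions are more precise.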
1. State hypotheses
◦ Null hypothesis: no relationship between years of education and income
H0: β = 0
◦ Research hypothesis: years of education predicts income
H1: β ≠ 0
We’ll use SPSS output to test whether X significantly predicts changes in Y
◦ The analysis partitions variance into variance accounted for by the predictors and variance unaccounted for by the predictors (the residual)
◦ The output will include a significance test of whether the variance accounted for significantly differs from zero (an F-statistic)
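The variance partition behind the F-statistic can be sketched by hand. This is an illustration on invented scores with a hypothetical fitted line, not a reproduction of the SPSS output; it stops at the F value, which would then be compared with a critical value for the stated degrees of freedom.

```python
# Hypothetical scores and their least-squares line, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, a = 0.6, 2.2

n = len(ys)
mean_y = sum(ys) / n
predicted = [b * x + a for x in xs]

# Total variation in Y, split into the part the line explains and the residual.
ss_total = sum((y - mean_y) ** 2 for y in ys)
ss_regression = sum((p - mean_y) ** 2 for p in predicted)
ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))

# Degrees of freedom: one predictor, and n - 2 for the residual.
df_regression = 1
df_residual = n - 2

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

print(f"SS regression = {ss_regression:.2f}, SS residual = {ss_residual:.2f}, "
      f"SS total = {ss_total:.2f}")
print(f"F({df_regression}, {df_residual}) = {f_statistic:.2f}")
```

The regression and residual sums of squares add up to the total, which is the partition the ANOVA table reports.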
5. Use SPSS output
ANOVA (b)
Columns: Model, Sum of Squares, df, Mean Square, F, Sig.
Rows for Model 1: Regression (a), Residual, Total
a. Predictors: (Constant), yrsed
b. Dependent Variable: income
5. Use SPSS output for the standardized beta and the test statistic
Coefficients
Columns: Model, Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), t, Sig.
Rows for Model 1: (Constant), yrsed
6. The output indicates that b = 3.54 and β = .95, with p < .05 (actually p < .01)
◦ So it does exceed the critical value
7. If the test statistic is over the critical value, reject the null and conclude that years of education significantly predicts income
In results
◦ Years of education significantly predicted income, b = 3.54, t = 6.11, p < .05, such that more years of education predicted greater income.
◦ Could further say that for every additional year of education, participants made an additional $35,400 per year (3.54 × 10,000 dollars).
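The dollar interpretation is a unit conversion of the unstandardized slope. The sketch below assumes, as the "3.54 × 10,000 dollars" remark implies, that income was recorded in units of $10,000; the variable names and the four-year comparison are illustrative only.

```python
# Reported unstandardized slope from the results statement.
b = 3.54

# Assumption: income is coded in units of $10,000, as the slide's arithmetic implies.
dollars_per_income_unit = 10_000

extra_dollars_per_year_of_education = b * dollars_per_income_unit
print(f"Each additional year of education: ${extra_dollars_per_year_of_education:,.0f}")

# Illustrative comparison: predicted income difference for 4 more years of education.
extra_years = 4
predicted_difference = extra_years * extra_dollars_per_year_of_education
print(f"{extra_years} additional years: ${predicted_difference:,.0f}")
```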
Predict an outcome Y value from multiple predictor X values
◦ This is the real advantage over a correlation coefficient
◦ Determine whether each predictor makes a unique improvement to the prediction of Y
Coefficients
Columns: Model, Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), t, Sig.
Rows for Model 1: (Constant), yrsed
Rows for the second model: (Constant), yrsed, pincome
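To show what a unique contribution looks like, here is a minimal sketch of a regression with two predictors, fitted by solving the least-squares normal equations in plain Python. The variable names yrsed, pincome, and income follow the output labels, but the numbers are invented: income was constructed as exactly 1 + 2 × yrsed + 0.5 × pincome so that the fit can be checked. The helper names `solve` and `fit_multiple` are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_multiple(predictors, y):
    """Return [constant, b1, b2, ...] minimizing squared prediction error."""
    rows = [[1.0] + list(values) for values in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented data: income = 1 + 2 * yrsed + 0.5 * pincome, with no noise.
yrsed = [10, 12, 12, 14, 16, 18]
pincome = [3, 2, 5, 4, 6, 5]
income = [22.5, 26.0, 27.5, 31.0, 36.0, 39.5]

model_1 = fit_multiple([yrsed], income)            # yrsed only
model_2 = fit_multiple([yrsed, pincome], income)   # yrsed and pincome

print("Model 1 (constant, yrsed):", [round(v, 3) for v in model_1])
print("Model 2 (constant, yrsed, pincome):", [round(v, 3) for v in model_2])
```

In the second model each coefficient describes the change in predicted income for a one-point increase in that predictor with the other held constant, which is why the yrsed coefficient can differ between the two models.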