STA291 Statistical Methods Lecture 11
2 LINEar Association o r measures “closeness” of data to the “best” line. What line is that? And best in terms of what? o In terms of least squared error:
3 “Best” line: least-squares, or regression line o Observed point: $(x_i, y_i)$ o Predicted value for given $x_i$: $\hat{y}_i = b_0 + b_1 x_i$ (interpretation in a minute) o “Best” line minimizes $\sum_i (y_i - \hat{y}_i)^2$, the sum of the squared errors.
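To make the "sum of the squared errors" criterion concrete, here is a minimal sketch that scores a few candidate lines on the study-time data used in this lecture's example. The function name `sum_squared_errors` and the candidate lines are illustrative assumptions. The third candidate uses coefficients rounded to three decimals, so it is only approximately the least-squares line.

```python
def sum_squared_errors(b0, b1, points):
    """Sum of squared vertical distances between the points and the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)

# Study-time data from this lecture's example: (hours studied, exam score)
points = [(1, 45), (5, 80), (12, 100)]

# Candidate lines as (intercept, slope). The first two are arbitrary guesses;
# the third is (approximately) the least-squares line for these data.
candidates = {
    "through first and last point": (40.0, 5.0),
    "flat line at the mean score": (75.0, 0.0),
    "least-squares line": (46.452, 4.758),
}

for name, (b0, b1) in candidates.items():
    print(f"{name}: SSE = {sum_squared_errors(b0, b1, points):.1f}")
```

Of the three candidates, the least-squares line gives the smallest sum of squared errors, which is what "best" means here.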
4 Interpretation of $b_0$, $b_1$ o $b_0$, intercept: predicted value of y when x = 0. o $b_1$, slope: predicted change in y when x increases by 1.
5 Calculation of $b_0$, $b_1$ [formulas not preserved]
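For reference, one standard way to write the least-squares coefficients is sketched below. The notation is an assumption and may differ from what was shown in lecture: $\bar{x}$ and $\bar{y}$ are the sample means, $s_x$ and $s_y$ are the sample standard deviations, and $r$ is the correlation coefficient.

```latex
b_1 = r \, \frac{s_y}{s_x}
    = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```

The first form shows why the slope shares its sign with $r$, and the formula for $b_0$ forces the line through $(\bar{x}, \bar{y})$.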
6 Least Squares, or Regression Line, Example STA291 study time example: (Hours studied, Score on First Exam) o Data: (1, 45), (5, 80), (12, 100) o In summary: o $b_1$ = o $b_0$ = o Interpretation?
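The coefficients for this example can be worked out directly from the three data points. The sketch below assumes the usual sums-of-squares form of the least-squares formulas, and the variable names are illustrative.

```python
# Study-time data from the slide: (hours studied, score on first exam)
data = [(1, 45), (5, 80), (12, 100)]
xs = [x for x, _ in data]
ys = [y for _, y in data]
n = len(data)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sum of squares of x and sum of cross-products, both about the means
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = s_xy / s_xx           # slope
b0 = y_bar - b1 * x_bar    # intercept

print(f"x_bar = {x_bar}, y_bar = {y_bar}")   # x_bar = 6.0, y_bar = 75.0
print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")       # b1 = 4.758, b0 = 46.452
```

Read as an interpretation: each additional hour studied goes with a predicted increase of about 4.76 points on the exam, and the predicted score for zero hours of study is about 46.45.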
7 Properties of the Least Squares Line o $b_1$, the slope, always has the same sign as r, the correlation coefficient, but they measure different things! o The sum of the errors (or residuals), $\sum_i (y_i - \hat{y}_i)$, is always 0 (zero). o The line always passes through the point $(\bar{x}, \bar{y})$.
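These three properties can be checked numerically on the study-time data. This is a minimal sketch, not a proof. The variable names are illustrative, and floating-point arithmetic means the residual sum comes out as a value extremely close to zero rather than exactly zero.

```python
import math

# Study-time data: hours studied and exam scores
xs = [1, 5, 12]
ys = [45, 80, 100]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = s_xy / s_xx                    # slope
b0 = y_bar - b1 * x_bar             # intercept
r = s_xy / math.sqrt(s_xx * s_yy)   # correlation coefficient

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

same_sign = (b1 > 0) == (r > 0)                        # slope and r agree in sign
residual_sum = sum(residuals)                          # zero up to rounding error
through_means = math.isclose(b0 + b1 * x_bar, y_bar)   # line hits (x_bar, y_bar)

print(f"b1 = {b1:.3f}, r = {r:.4f}, same sign: {same_sign}")
print(f"sum of residuals is negligible: {abs(residual_sum) < 1e-9}")
print(f"passes through the point of means: {through_means}")
```

Here the slope is about 4.76 while r is about 0.95: both are positive, but the slope is in points per hour and r is unitless.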
About those residuals 8 o When we use our prediction equation to “check” values we actually observed in our data set, we can find their residuals: the observed value minus the predicted value o For our STA291 study data earlier, one observation was (5, 80). Our prediction equation was: $\hat{y} = 46.452 + 4.758x$ o When we plug in x = 5, we get a predicted y of 70.24; our residual, then, is 80 - 70.24 = 9.76
Residuals 9 o Earlier, we pointed out that the sum of the residuals is always 0 (zero) o Residuals are positive when the observed y is above the regression line and negative when it is below o The smaller (in absolute value) the individual residual, the closer the predicted y was to the actual y.
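The residual calculation for the study-time data can be sketched as follows. The coefficients 46.452 and 4.758 are an assumption: they are the least-squares values computed from the three data points, rounded to three decimals. The helper names `predict` and `residual` are illustrative.

```python
# Assumed prediction equation for the study-time data (rounded coefficients):
# predicted score = 46.452 + 4.758 * hours
B0, B1 = 46.452, 4.758

def predict(hours):
    """Predicted exam score for a given number of hours studied."""
    return B0 + B1 * hours

def residual(hours, score):
    """Observed minus predicted: positive when the point lies above the line."""
    return score - predict(hours)

data = [(1, 45), (5, 80), (12, 100)]
residuals = [residual(x, y) for x, y in data]

for (x, y), e in zip(data, residuals):
    position = "above" if e > 0 else "below"
    print(f"({x}, {y}): predicted {predict(x):.2f}, residual {e:+.2f} ({position} the line)")
```

The observation (5, 80) sits above the line with a residual of about +9.76, while the other two points sit below it, and the three residuals cancel out.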
R-squared??? 10 o Gives the proportion of the variation in the y’s accounted for by the linear relationship with the x’s o So, this means?
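As a worked illustration, R-squared for the study-time data can be computed as one minus the ratio of residual variation to total variation in y. This sketch assumes that definition, and for a single-predictor line it matches the square of the correlation coefficient r. Variable names are illustrative.

```python
# Study-time data: hours studied and exam scores
xs = [1, 5, 12]
ys = [45, 80, 100]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Variation left over after fitting the line (sum of squared residuals)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / s_yy
r = s_xy / (s_xx * s_yy) ** 0.5

print(f"R-squared = {r_squared:.3f}, r squared = {r ** 2:.3f}")
```

Both values come out to about 0.906, so roughly 91% of the variation in exam scores is accounted for by the linear relationship with hours studied.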
Why “regression”? 11 o Sir Francis Galton (1880s): correlation between x = father’s height and y = son’s height is about 0.5 o Interpretation: If a father’s height is one standard deviation below average, then the predicted height of the son is 0.5 standard deviations below average o More interpretation: If a father’s height is two standard deviations above average, then the predicted height of the son is 0.5 × 2 = 1 standard deviation above average o Tall parents tend to have tall children, but not quite as tall o This is called “regression toward the mean,” the origin of the statistical term “regression”
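The arithmetic in both interpretations follows one rule: in standard-deviation units, the predicted y is r times the observed x. A minimal sketch, with the hypothetical helper name `predicted_z_score`:

```python
def predicted_z_score(r, z_x):
    """Predicted standardized y for a given standardized x when the correlation is r."""
    return r * z_x

r = 0.5  # approximate father-son height correlation quoted on the slide

for z_father in (-1, 2):
    z_son = predicted_z_score(r, z_father)
    print(f"father at {z_father:+} SD -> son predicted at {z_son:+.1f} SD")
```

Because r is between -1 and 1, the prediction is never farther from the mean (in standard deviations) than the x value it is based on.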
Looking back o Best-fit, or least-squares, or regression line o Interpretation of the slope, intercept o Residuals o R-squared o “Regression toward the mean”