Inference for Regression Lines Chapter 12
Conditions (LINER)
L: Linear relationship between x and y
I: Independent (10% rule)
N: Normal (check residuals)
E: Equal variance (equal scatter above and below the "residual = 0" line)
R: Random
WE MAY ALWAYS ASSUME ALL CONDITIONS ARE MET (EVEN ON THE AP EXAM)! WOOT WOOT!
Example
Women made significant gains in the 1970s in terms of their acceptance into professions that had traditionally been populated by men. To measure just how big these gains were, we will compare the percentage of professional degrees awarded to women in 1972-1973 to the percentage awarded in 1978-1979 for selected fields of study from two random samples. (Statistics and Data Analysis, Siegel, Morgan, p. 549)
Example continued
b) For every 1 percentage point increase in the percent of degrees awarded to women in 72-73, the predicted percent awarded in 78-79 increases by about 1.72 percentage points.
c) About 88.6% of the variation in the percent of degrees awarded in 78-79 can be explained by the linear regression on the percent awarded in 72-73.
d) Since the residual plot shows random scatter, the data are approximately linear.
Example continued
e) Residual = y(actual) - y(predicted) = 7.2 - (7.0 + 1.724(1.1)) = 7.2 - 8.9 = -1.7
f) Linear regression t-test
β = true slope for predicting the percent of degrees in 78-79 using the percent of degrees in 72-73
Ho: β = 0   Ha: β ≠ 0
Assume all conditions are met.
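The residual arithmetic in part e) can be checked with a few lines of Python. This is only a sketch: the function names `predict` and `residual` are made up for illustration, and the intercept and slope are the rounded values used on the slide (7.0 and 1.724).

```python
# Fitted line from the example: predicted % in 78-79 = 7.0 + 1.724 * (% in 72-73)
INTERCEPT = 7.0   # rounded intercept used on the slide
SLOPE = 1.724     # rounded slope used on the slide

def predict(x):
    """Predicted y for a given x using the fitted line."""
    return INTERCEPT + SLOPE * x

def residual(x, y_actual):
    """Residual = actual y minus predicted y."""
    return y_actual - predict(x)

x, y = 1.1, 7.2   # the data point used in part e)
print("predicted:", round(predict(x), 1))
print("residual: ", round(residual(x, y), 1))
```

A negative residual means the point sits below the regression line, so the line overpredicts for this field.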
Example continued
t = (b - 0)/SEb = 1.724/0.253 = 6.82   df = n - 2 = 6   p-value = .0005
Let α = .05
We reject Ho. Since the p-value is less than α, there is enough evidence to believe that there is a linear relationship between the percent of degrees in 72-73 and in 78-79.
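For anyone who wants to see where the t-score and p-value come from without a calculator's LinRegTTest, here is a rough sketch using only the Python standard library. It assumes a two-sided test, and it approximates the area under the t curve with Simpson's rule instead of calling a statistics package. The helper names `t_pdf` and `two_sided_p` are illustrative, not from any library.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=10_000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    upper = abs(t_stat)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total     # curve is symmetric, so double [0, |t|]
    return 1 - central_area

b, se_b, n = 1.7241, 0.2527, 8             # slope, SEb, sample size (df = n - 2 = 6)
t_stat = (b - 0) / se_b
p_value = two_sided_p(t_stat, df=n - 2)

print("t =", round(t_stat, 2))
print("p-value =", round(p_value, 4))
print("reject Ho at alpha = .05:", p_value < 0.05)
```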
Computer output

MTB> Regress 'F%78-79' 1 'F%72-73' 'SRES2' 'FITS2';

The regression equation is
F%78-79 = 7.01 + 1.72 F%72-73

PREDICTOR    COEF      STDEV     T-RATIO   P
Constant     7.007     1.882     3.72      0.010
F%72-73      1.7241    0.2527    6.82      0.000

s = 2.966    R-sq = 88.6%    R-sq(adj) = 86.7%

Analysis of Variance
SOURCE       DF    SS        MS        F        P
Regression   1     409.60    409.60    46.57    0.000
Error        6     52.77     8.80
Total        7     462.38

Reading the output:
Regression equation: F%78-79 = 7.01 + 1.72 F%72-73
Explanatory variable: F%72-73
Estimate of α: 7.007
Estimate of β: 1.7241
SEb: 0.2527
t-score: 6.82
p-value: 0.000
Standard error of the line: s = 2.966
Coefficient of determination: R-sq = 88.6%
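The numbers in the output are tied together by a few standard formulas, and it can help to see them recomputed. The sketch below only uses values copied from the output above; the variable names are made up for illustration.

```python
import math

# Values read from the Minitab output above
ss_regression, ss_error, ss_total = 409.60, 52.77, 462.38
df_error, df_total = 6, 7
coef_slope, se_slope = 1.7241, 0.2527

ms_error = ss_error / df_error                      # mean square error
r_sq = ss_regression / ss_total                     # coefficient of determination
r_sq_adj = 1 - ms_error / (ss_total / df_total)     # adjusted R-sq
s = math.sqrt(ms_error)                             # standard error of the line
t_ratio = coef_slope / se_slope                     # t-score for the slope
f_stat = ss_regression / ms_error                   # F (regression has 1 df)

print(f"R-sq      = {r_sq:.1%}")
print(f"R-sq(adj) = {r_sq_adj:.1%}")
print(f"s         = {s:.3f}")
print(f"t-ratio   = {t_ratio:.2f}")
print(f"F         = {f_stat:.2f}")
```

With one explanatory variable, F is the square of the slope's t-ratio, apart from rounding in the printed output.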
Example continued
g) The standard error (s) is the standard deviation of the residuals. "On average, the difference between the actual and predicted % of degrees awarded in '78-'79 (y) is about 2.97 percentage points."
h) The standard error of the slope (SEb) is .2527. "Over repeated sampling, the slope of the sample regression line would typically vary by about .2527 from the slope of the true regression line for predicting % of degrees awarded in '78-'79 using % of degrees awarded in '72-'73."
For Your Notes…
Standard Error (s) write-up: "On average, the difference between the actual and predicted [y-variable] is [#]."
Standard Error for the Slope (SEb) write-up: "Over repeated sampling, the slope of the sample regression line would typically vary by about [#] from the slope of the true regression line for predicting [y-variable] using [x-variable]."
2) The following is Minitab output for chocolate shakes, using ounces to predict calories.
a) On average, the difference between the actual and predicted calories is 50.80 calories.
b) Over repeated sampling, the slope of the sample regression line would typically vary by about 30.36 calories per ounce from the slope of the true regression line for predicting calories using ounces.
Example continued
c) Linear regression t-interval
β = true slope for predicting calories based on number of ounces
Assume all conditions are met.
We are 95% confident that the true slope of the regression line for predicting calories using ounces lies between -54.9 and 138.3 calories per ounce.
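A t-interval for the slope has the form b ± t*·SEb, so the reported interval can be unpacked to see its pieces. The sketch below works backward from the endpoints in part c) and the SEb from part b). The sample slope and t* are recovered from those numbers, not read from the output, so treat them as inferred values.

```python
# Reported 95% interval for the slope (calories per ounce) and SEb from part b)
lower, upper = -54.9, 138.3
se_b = 30.36

b = (lower + upper) / 2        # center of the interval = sample slope
margin = (upper - lower) / 2   # margin of error = t* times SEb
t_star = margin / se_b         # critical value implied by the interval

print("sample slope b  =", round(b, 1))
print("margin of error =", round(margin, 1))
print("implied t*      =", round(t_star, 3))
print("interval contains 0:", lower < 0 < upper)
```

The implied t* is close to 3.182, which is the t-table value for 95% confidence with df = 3. That would correspond to n = 5 shakes, although the sample size is not stated in these notes. Also note that the interval contains 0, so a true slope of 0 is among the plausible values at this confidence level.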