Regression Inference
[Scatterplot: weight vs. height]
How much would an adult male weigh if he were 5 feet tall? He could weigh varying amounts; in other words, there is a distribution of weights for adult males who are 5 feet tall. This distribution is normal (we hope).
What would you expect for other heights?
Where would you expect the TRUE LSRL to be?
What about the standard deviations of all these normal distributions? We want the standard deviations of all these normal distributions to be the same.
Regression Model
For a given x-value, the responses (y) are normally distributed.
x & y have a linear relationship, with the true LSRL going through each μ_y.
σ_y is the same for each x-value.
Regression Model
The mean response (μ_y) has a linear relationship with the explanatory variable (x): μ_y = α + βx, where α and β are unknown parameters.
For a fixed value of x, y has a normal distribution.
The standard deviation of y (σ_y, also unknown) is the same for all values of x.
b (slope of LSRL): unbiased estimator of β (true slope)
a (intercept of LSRL): unbiased estimator of α (true intercept)
s (SD of the residuals): estimator of σ_y (true SD of y), where s = √(Σ(y - ŷ)² / (n - 2))
Note: df = n - 2 because there are 2 unknowns (α and β).
We use ŷ = a + bx to estimate μ_y = α + βx.
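The estimators a, b, and s can be computed directly from their definitions. The sketch below is one minimal way to do it; the function name `fit_lsrl` and the five data points are made up for illustration and are not the body fat data.

```python
import math

def fit_lsrl(x, y):
    """Return (a, b, s): LSRL intercept, LSRL slope, and residual SD with df = n - 2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared x-deviations and sum of cross-products
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                 # estimates the true slope (beta)
    a = y_bar - b * x_bar         # estimates the true intercept (alpha)
    # Residual sum of squares, divided by n - 2 because two parameters were estimated
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # estimates sigma_y
    return a, b, s

# Hypothetical data, invented for this example only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, s = fit_lsrl(x, y)
print(f"a = {a:.3f}, b = {b:.3f}, s = {s:.3f}")
```

Dividing by n - 2 rather than n is what the "df = n - 2" note refers to.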
It is difficult to accurately determine a person's body fat percentage without immersing him or her in water. Researchers hoping to find ways to make a good estimate immersed 20 male subjects, then measured their weights.
a) Find the LSRL, correlation coefficient, and coefficient of determination.
Body fat = (weight)
r =
r² = 0.485
b) Interpret the slope in this context.
For every increase of a pound in weight, there is an approximate 0.25% increase in predicted body fat.
c) Interpret the coefficient of determination in this context.
Approximately 48.5% of the variation in body fat can be explained by the LSRL of body fat on weight.
d) Estimate α, β, and σ_y.
α ≈
β ≈ 0.25
σ_y ≈
e) Create a scatter plot and residual plot for the data.
[Plots: body fat vs. weight; residuals vs. weight]
Sampling Distribution of Slopes
[Scatterplot: weight vs. height]
Suppose you took many samples of the same size from this population & calculated the LSRL for each. Using the slope from each of these LSRLs, we can create a sampling distribution for the slope of the true LSRL.
[Plot: the slopes b from the many samples]
What shape will this distribution have?
What will the mean of the sampling distribution equal? μ_b = β
What is the standard deviation of the sampling distribution?
Let's review the regression model!
For a given x-value, the responses (y) are normally distributed.
x & y have a linear relationship, with the true LSRL going through each μ_y.
σ_y is the same for each x-value.
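The questions about the sampling distribution of b can be explored with a small simulation. This is only a sketch under stated assumptions: the "true" values `ALPHA`, `BETA`, and `SIGMA_Y` and the list of heights are invented, and the x-values are held fixed from sample to sample so that the textbook result SD(b) = σ_y / √Σ(x - x̄)² applies directly.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical "true" parameters, chosen only for illustration
ALPHA, BETA, SIGMA_Y = -100.0, 4.0, 10.0
X_VALUES = [60, 62, 64, 66, 68, 70, 72, 74]  # heights, held fixed across samples (assumption)

def sample_slope():
    """Draw one sample from the regression model and return the slope b of its LSRL."""
    ys = [random.gauss(ALPHA + BETA * x, SIGMA_Y) for x in X_VALUES]
    x_bar = statistics.mean(X_VALUES)
    y_bar = statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in X_VALUES)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X_VALUES, ys))
    return sxy / sxx

slopes = [sample_slope() for _ in range(5000)]
mean_b = statistics.mean(slopes)
sd_b = statistics.stdev(slopes)

# Theoretical SD of b when the x-values are fixed
sxx = sum((x - statistics.mean(X_VALUES)) ** 2 for x in X_VALUES)
theory_sd_b = SIGMA_Y / math.sqrt(sxx)

print(f"mean of slopes = {mean_b:.3f} (true slope = {BETA})")
print(f"SD of slopes   = {sd_b:.3f} (theory = {theory_sd_b:.3f})")
```

A histogram of `slopes` should look roughly normal and be centered near the true slope, which is the behavior the slide's questions are pointing toward.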
Conditions for Inference on Slope
Independent data: SRS
True relationship is linear: no pattern in residuals
S.D. of y is constant: points evenly spread about the LSRL
Responses vary normally about the true LSRL: graph of residuals is roughly symmetrical
[Scatterplot: weight vs. height]
Suppose the LSRL is a horizontal line. Would height be useful in predicting weight?
What is the slope of a horizontal line?
A slope of zero means there is NO linear relationship between x & y!
Hypotheses
H0: β = 0 (There is no linear relationship between x & y; x should not be used to predict y.)
Ha: β > 0, Ha: β < 0, or Ha: β ≠ 0
Be sure to define β in context!
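Formulas:
Confidence Interval: b ± t* · s_b
Hypothesis Test: t = (b - β) / s_b, which is t = b / s_b under H0: β = 0
Here s_b is the standard error of the slope, and df = n - 2.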
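The confidence interval and test statistic for the slope can be sketched as two small functions. The function names, the name `se_b` for the standard error of the slope, and the numbers below are all assumptions made for illustration; the critical value 3.182 is the usual table value of t* for 95% confidence with df = 3.

```python
import math

def slope_standard_error(s, x):
    """Standard error of the slope: s divided by sqrt of the summed squared x-deviations."""
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return s / math.sqrt(sxx)

def slope_t_statistic(b, se_b, beta_0=0.0):
    """t = (b - beta_0) / SE; beta_0 = 0 tests 'no linear relationship'."""
    return (b - beta_0) / se_b

def slope_confidence_interval(b, se_b, t_star):
    """b plus or minus t* times SE, returned as (lower, upper)."""
    margin = t_star * se_b
    return b - margin, b + margin

# Hypothetical numbers, not from the body fat study
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # n = 5, so df = n - 2 = 3
b, s = 2.0, 1.0
se_b = slope_standard_error(s, x)
t_stat = slope_t_statistic(b, se_b)
t_star = 3.182                  # table value for 95% confidence, df = 3
low, high = slope_confidence_interval(b, se_b, t_star)
print(f"SE = {se_b:.4f}, t = {t_stat:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Both procedures use the t distribution with n - 2 degrees of freedom, so the same standard error feeds the interval and the test.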
f) Is there evidence that weight can be used to predict body fat?
Conditions:
SRS
Weight & body fat are linear: no pattern in residuals
σ_y is approx. constant: points evenly spread about the LSRL
Responses are approx. normal: residual boxplot is approx. symmetrical
H0: β = 0
Ha: β ≠ 0
where β is the true slope of the LSRL of weight & body fat
p-value = .0006, significance level α = .05
Since p-value < α, we reject H0. There is sufficient evidence to suggest that weight can be used to predict body fat.
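The reported p-value can be sanity-checked from numbers already on the slides. In simple linear regression the slope's t statistic satisfies t = r√(n - 2) / √(1 - r²), so r² = 0.485 and n = 20 are enough. The sketch below is not how a calculator does it: it integrates the t density numerically with Simpson's rule to avoid any statistics library, and the helper names are made up.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value, using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total   # P(-|t| < T < |t|)
    return 1 - central_area

n = 20             # subjects in the body fat study
r_squared = 0.485  # coefficient of determination from part (a)
df = n - 2
# The slope is positive (about 0.25), so t takes the positive root
t_stat = math.sqrt(r_squared * df / (1 - r_squared))
p_value = two_sided_p_value(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, two-sided p-value = {p_value:.4f}")
```

The result should land close to the .0006 on the slide; small differences are expected because r² is rounded.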
g) Construct a 95% confidence interval for the true slope of the LSRL.
Conditions: Same!
We are 95% confident that the true slope of the LSRL of weight & body fat is between 0.12 and
If we made lots of intervals this way, 95% of them would contain the true slope.
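The interval can be roughly reproduced from values stated earlier, as a check on the arithmetic. This sketch assumes the rounded slope b ≈ 0.25, recovers the standard error from the t statistic implied by r² = 0.485 and n = 20, and uses the table value t* = 2.101 for 95% confidence with df = 18. Because the inputs are rounded, the endpoints are approximations only, not the slide's exact figures.

```python
import math

n = 20
r_squared = 0.485
b = 0.25                 # slope, rounded as given in part (d)
df = n - 2

# t statistic implied by r^2 and n, then SE(b) = b / t
t_stat = math.sqrt(r_squared * df / (1 - r_squared))
se_b = b / t_stat

t_star = 2.101           # table value: 95% confidence, df = 18
margin = t_star * se_b
lower, upper = b - margin, b + margin
print(f"SE(b) = {se_b:.4f}, margin = {margin:.4f}")
print(f"approximate 95% CI: ({lower:.2f}, {upper:.2f})")
```

The lower endpoint comes out close to the 0.12 reported above, which suggests the approach matches the one used on the slide.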