Published by Eunice Chrystal Mitchell. Modified over 8 years ago.
1
Inference for Regression: Simple Linear Regression. IPS Chapter 10.1. © 2009 W.H. Freeman and Company
2
Statistical model for linear regression: In the population, the linear regression equation is μ_y = β₀ + β₁x. Sample data then fits the model: Data = fit + residual, that is, y_i = (β₀ + β₁x_i) + (ε_i), where the ε_i are independent and Normally distributed N(0, σ). Linear regression assumes equal variance of y (σ is the same for all values of x).
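As a rough illustration of "Data = fit + residual", the sketch below generates data from the model. The function name `simulate_regression`, the parameter values, and the x values are made up for this example and do not come from the slides.

```python
import random

def simulate_regression(xs, beta0, beta1, sigma, seed=0):
    """Generate y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma)."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical population parameters, chosen only for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = simulate_regression(xs, beta0=2.0, beta1=0.5, sigma=1.0)

for x, y in zip(xs, ys):
    fit = 2.0 + 0.5 * x   # mean response at this x
    deviation = y - fit   # the random deviation eps_i
    print(f"x={x}  y={y:.2f}  fit={fit:.2f}  deviation={deviation:+.2f}")
```

The same σ is used for every x, which is the equal-variance assumption stated on the slide.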
3
Estimating the parameters: μ_y = β₀ + β₁x. The intercept β₀, the slope β₁, and the standard deviation σ of y are the unknown parameters of the regression model. We rely on the random sample data to provide unbiased estimates of these parameters. The value of ŷ from the least-squares regression line is really a prediction of the mean value of y (μ_y) for a given value of x. The least-squares regression line (ŷ = b₀ + b₁x) obtained from sample data is the best estimate of the true population regression line (μ_y = β₀ + β₁x). ŷ: unbiased estimate for mean response μ_y; b₀: unbiased estimate for intercept β₀; b₁: unbiased estimate for slope β₁.
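The slide does not show how the estimates are computed. The sketch below assumes the standard least-squares formulas (slope = Sxy / Sxx, intercept = ȳ minus slope times x̄); the function name `least_squares` and the data are made up for illustration.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx           # estimated slope
    b0 = y_bar - b1 * x_bar  # estimated intercept
    return b0, b1

# Made-up sample data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(xs, ys)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")  # y_hat = 0.05 + 1.99 x
```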
4
The regression standard error, s, for n sample data points is calculated from the residuals (y_i − ŷ_i): s = √( Σ(y_i − ŷ_i)² / (n − 2) ). s estimates the regression standard deviation σ (s² is an unbiased estimate of σ²). In JMP, this is Root Mean Square Error. The population standard deviation σ for y at any given value of x represents the spread of the Normal distribution of the ε_i around the mean μ_y.
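A minimal sketch of this calculation, assuming the usual formula s = sqrt(sum of squared residuals / (n − 2)). The function name and the data are hypothetical; `b0 = 0.05` and `b1 = 1.99` are the least-squares estimates for this particular made-up data set.

```python
import math

def regression_standard_error(xs, ys, b0, b1):
    """s = sqrt(sum of squared residuals / (n - 2)) for the line b0 + b1 * x."""
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(r ** 2 for r in residuals)
    return math.sqrt(sse / (len(xs) - 2))  # n - 2 degrees of freedom

# Made-up sample data and its least-squares line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
s = regression_standard_error(xs, ys, b0=0.05, b1=1.99)
print(f"s = {s:.4f}")
```

The divisor is n − 2 rather than n because two parameters, the intercept and the slope, were estimated from the same data.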
5
Conditions for inference: The observations are independent. The relationship is indeed linear. The standard deviation of y, σ, is the same for all values of x. The response y varies Normally around its mean.
6
Using residual plots to check for regression validity: The residuals (y − ŷ) give useful information about the contribution of individual data points to the overall pattern of scatter. We view the residuals in a residual plot. We may also look at a Normal quantile plot of the residuals to check the Normality assumption.
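The coordinates of a Normal quantile plot can be sketched as follows. The plotting positions (i − 0.5) / n are one common convention and an assumption here; JMP may use a different one. The function name and the residual values are made up for illustration.

```python
from statistics import NormalDist

def normal_quantile_points(residuals):
    """Pair each sorted residual with a standard Normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    z_scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z_scores, ordered))

# Hypothetical residuals from a fitted line.
points = normal_quantile_points([0.06, -0.13, 0.18, -0.21, 0.10])
for z, r in points:
    print(f"z = {z:+.3f}   residual = {r:+.2f}")
```

If the residuals are roughly Normal, these points fall close to a straight line when plotted.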
7
Residuals are randomly scattered → good! Curved pattern → the relationship is not linear. Change in variability across the plot → σ is not equal for all values of x.
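To see what a curved pattern looks like numerically, the sketch below fits a straight line to made-up data that actually follow y = x². The function name `line_residuals` is hypothetical, and the fit uses the standard least-squares formulas.

```python
def line_residuals(xs, ys):
    """Fit the least-squares line and return the residuals y - y_hat."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
curved_ys = [x ** 2 for x in xs]  # made-up data with a curved relationship
residuals = line_residuals(xs, curved_ys)
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals are positive at both ends and negative in the middle, which is the systematic curved pattern a residual plot would reveal.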
8
Confidence interval for regression parameters: Estimating the regression parameters β₀ and β₁ is a case of one-sample inference with unknown population variance. We rely on the t distribution with n − 2 degrees of freedom. A level C confidence interval for the slope, β₁, has a margin of error proportional to the standard error of the least-squares slope: b₁ ± t* SE_b1. A level C confidence interval for the intercept, β₀, has a margin of error proportional to the standard error of the least-squares intercept: b₀ ± t* SE_b0. Here t* is the critical value for the t(n − 2) distribution with area C between −t* and +t*.
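The slide does not give the standard error formulas. The sketch below assumes the standard ones: SE_b1 = s / sqrt(Sxx) and SE_b0 = s · sqrt(1/n + x̄² / Sxx). The critical value t* is passed in as a table lookup; 3.182 is the 95% value for 3 degrees of freedom. The function name and the data are made up for illustration.

```python
import math

def slope_intercept_intervals(xs, ys, t_star):
    """Level C intervals b ± t* · SE for the slope and the intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))                       # regression standard error
    se_b1 = s / math.sqrt(sxx)                         # assumed standard formula
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)    # assumed standard formula
    return {
        "slope": (b1 - t_star * se_b1, b1 + t_star * se_b1),
        "intercept": (b0 - t_star * se_b0, b0 + t_star * se_b0),
    }

# Made-up sample data; n = 5, so df = 3 and t* = 3.182 for C = 95%.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intervals = slope_intercept_intervals(xs, ys, t_star=3.182)
for name, (low, high) in intervals.items():
    print(f"{name}: ({low:.3f}, {high:.3f})")
```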
9
Significance test for the slope: We can test the hypothesis H₀: β₁ = 0 versus a one- or two-sided alternative. We calculate t = (b₁ − 0) / SE_b1, which, if H₀ is true, has the t(n − 2) distribution; use Table D to find the p-value of the test. JMP provides the numerator and denominator and the p-values when you Fit Y by X.
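A sketch of the test statistic, assuming the standard formula SE_b1 = s / sqrt(Sxx), which the slide does not spell out. Instead of computing a p-value, the example compares |t| with a critical value looked up in a t table (3.182 for df = 3, two-sided α = 0.05). The function name and the data are hypothetical.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (t, df) for testing H0: beta1 = 0, where t = (b1 - 0) / SE_b1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(sxx)  # assumed standard formula
    return (b1 - 0) / se_b1, n - 2

# Made-up sample data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
t, df = slope_t_statistic(xs, ys)
t_critical = 3.182  # table value for df = 3, two-sided alpha = 0.05
print(f"t = {t:.2f} with {df} degrees of freedom")
print("reject H0" if abs(t) > t_critical else "fail to reject H0")
```

Here the numerator is b₁ and the denominator is SE_b1, the two quantities the slide says JMP reports.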
10
Homework for Inference on Regression: Read over these notes and be prepared to use JMP to answer the homework questions and to do all the computations. Start with exercises #10.9-10.11; work these in class with JMP. Then try to work #10.12-10.19 in a similar way.