The Simple Linear Regression Model
Estimators in Simple Linear Regression and Sampling Distributions of the Estimators
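Recall that if $y_1, y_2, y_3, \ldots, y_n$ are
1. independent,
2. normally distributed with means $\mu_1, \mu_2, \mu_3, \ldots, \mu_n$ and standard deviations $\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_n$,
then $L = c_1 y_1 + c_2 y_2 + c_3 y_3 + \cdots + c_n y_n$ is normal with mean
$$\mu_L = c_1\mu_1 + c_2\mu_2 + \cdots + c_n\mu_n$$
and standard deviation
$$\sigma_L = \sqrt{c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \cdots + c_n^2\sigma_n^2}.$$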
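The rule for a linear combination of independent normal variables can be checked numerically. The following is a minimal sketch; the function name and the example values (four observations, each with mean 10 and standard deviation 2) are illustrative assumptions, not part of the original slides.

```python
import math

def linear_combination_mean_sd(c, mu, sigma):
    """Mean and standard deviation of L = sum(c_i * y_i) for independent y_i."""
    mean_L = sum(ci * mi for ci, mi in zip(c, mu))
    # For independent variables the variances add, not the standard deviations.
    sd_L = math.sqrt(sum((ci * si) ** 2 for ci, si in zip(c, sigma)))
    return mean_L, sd_L

# Hypothetical example: the sample mean of n = 4 observations,
# each normal with mean 10 and standard deviation 2 (weights c_i = 1/n).
n = 4
mean_L, sd_L = linear_combination_mean_sd([1 / n] * n, [10.0] * n, [2.0] * n)
print(mean_L, sd_L)  # mean 10, standard deviation 2 / sqrt(4) = 1
```

With equal weights $c_i = 1/n$ this reproduces the familiar result that the sample mean has standard deviation $\sigma/\sqrt{n}$.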
Sampling distribution of the slope
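Note: with $\bar{x} = \frac{1}{n}\sum x_i$ and $\bar{y} = \frac{1}{n}\sum y_i$,
$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum (x_i - \bar{x})\,x_i .$$
Also
$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum (x_i - \bar{x})\,y_i ,$$
because $\sum (x_i - \bar{x}) = 0$.
Thus the least squares slope is
$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \sum c_i y_i \quad\text{where}\quad c_i = \frac{x_i - \bar{x}}{S_{xx}} .$$
Hence, since $y_1, \ldots, y_n$ are independent normal with mean $\alpha + \beta x_i$ and standard deviation $\sigma$, $\hat{\beta}$ is normal with mean
$$\sum c_i(\alpha + \beta x_i) = \alpha\sum c_i + \beta\sum c_i x_i$$
and standard deviation
$$\sigma\sqrt{\sum c_i^2} .$$
Thus, since
$$\sum c_i = \frac{\sum (x_i - \bar{x})}{S_{xx}} = 0 \quad\text{and}\quad \sum c_i x_i = \frac{\sum (x_i - \bar{x})\,x_i}{S_{xx}} = 1 ,$$
the mean of $\hat{\beta}$ is $\beta$.
Also
$$\sum c_i^2 = \frac{\sum (x_i - \bar{x})^2}{S_{xx}^2} = \frac{1}{S_{xx}} .$$
Hence $\hat{\beta}$ is normal with mean $\beta$ and standard deviation $\dfrac{\sigma}{\sqrt{S_{xx}}}$.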
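The slope result rests on writing the least squares slope as a weighted sum $\sum c_i y_i$ with weights $c_i = (x_i - \bar{x})/S_{xx}$, which satisfy $\sum c_i = 0$, $\sum c_i x_i = 1$ and $\sum c_i^2 = 1/S_{xx}$. A small sketch that checks these three identities numerically; the x values and the exact line y = 3 + 2x are made up for illustration.

```python
def slope_weights(x):
    """Weights c_i = (x_i - x_bar) / S_xx, so that the slope equals sum(c_i * y_i)."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [(xi - x_bar) / s_xx for xi in x], s_xx

x = [1.0, 2.0, 4.0, 7.0]  # hypothetical design points
c, s_xx = slope_weights(x)

sum_c = sum(c)                                   # expected: 0
sum_cx = sum(ci * xi for ci, xi in zip(c, x))    # expected: 1
sum_c2 = sum(ci ** 2 for ci in c)                # expected: 1 / S_xx

# On an exact line y = 3 + 2x the weighted sum recovers the slope 2.
y = [3.0 + 2.0 * xi for xi in x]
slope = sum(ci * yi for ci, yi in zip(c, y))
print(sum_c, sum_cx, sum_c2 * s_xx, slope)
```

The identities depend only on the x values, which is why the mean and standard deviation of the slope estimator can be written without reference to the observed y values.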
Sampling distribution of the intercept
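The sampling distribution of the intercept of the least squares line: it can be shown that
$$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$$
has a normal distribution with mean $\alpha$ and standard deviation
$$\sigma\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} .$$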
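Proof:
$$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x} = \sum d_i y_i \quad\text{where}\quad d_i = \frac{1}{n} - \bar{x}\,c_i , \quad c_i = \frac{x_i - \bar{x}}{S_{xx}} .$$
Thus $\hat{\alpha}$ is normal with mean
$$\sum d_i(\alpha + \beta x_i) = \alpha\sum d_i + \beta\sum d_i x_i$$
and standard deviation $\sigma\sqrt{\sum d_i^2}$.
Also, because $\sum c_i = 0$ and $\sum c_i x_i = 1$,
$$\sum d_i = 1 - \bar{x}\sum c_i = 1 \quad\text{and}\quad \sum d_i x_i = \bar{x} - \bar{x}\sum c_i x_i = 0 .$$
Now, because $\sum c_i^2 = 1/S_{xx}$,
$$\sum d_i^2 = \sum\left(\frac{1}{n^2} - \frac{2\bar{x}}{n}c_i + \bar{x}^2 c_i^2\right) = \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} .$$
Hence $\hat{\alpha}$ is normal with mean $\alpha$ and standard deviation $\sigma\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}$.

Summary
1. $\hat{\beta}$ is normal with mean $\beta$ and standard deviation $\dfrac{\sigma}{\sqrt{S_{xx}}}$.
2. $\hat{\alpha}$ is normal with mean $\alpha$ and standard deviation $\sigma\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}$.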
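The intercept can be treated the same way as the slope: it is a weighted sum $\sum d_i y_i$ with $d_i = 1/n - \bar{x}(x_i - \bar{x})/S_{xx}$. The sketch below checks that $\sum d_i = 1$, $\sum d_i x_i = 0$ and $\sum d_i^2 = 1/n + \bar{x}^2/S_{xx}$; the x values and the exact line are hypothetical.

```python
def intercept_weights(x):
    """Weights d_i = 1/n - x_bar * (x_i - x_bar) / S_xx for the least squares intercept."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    d = [1.0 / n - x_bar * (xi - x_bar) / s_xx for xi in x]
    return d, x_bar, s_xx

x = [1.0, 2.0, 4.0, 7.0]  # hypothetical design points
n = len(x)
d, x_bar, s_xx = intercept_weights(x)

sum_d = sum(d)                                   # expected: 1
sum_dx = sum(di * xi for di, xi in zip(d, x))    # expected: 0
sum_d2 = sum(di ** 2 for di in d)                # expected: 1/n + x_bar^2 / S_xx
expected_sum_d2 = 1.0 / n + x_bar ** 2 / s_xx

# On an exact line y = 3 + 2x the weighted sum recovers the intercept 3.
y = [3.0 + 2.0 * xi for xi in x]
intercept = sum(di * yi for di, yi in zip(d, y))
print(sum_d, sum_dx, sum_d2, expected_sum_d2, intercept)
```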
Sampling distribution of the estimate of variance
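The sampling distribution of $s^2$
$$s^2 = \frac{\sum (y_i - \hat{\alpha} - \hat{\beta}x_i)^2}{n-2} = \frac{S_{yy} - S_{xy}^2/S_{xx}}{n-2}, \quad\text{where } S_{yy} = \sum (y_i - \bar{y})^2 .$$
This estimate of $\sigma^2$ is said to be based on $n - 2$ degrees of freedom.

Recall that $y_1, y_2, \ldots, y_n$ are independent, normal with mean $\alpha + \beta x_i$ and standard deviation $\sigma$. Let
$$z_i = \frac{y_i - \alpha - \beta x_i}{\sigma} .$$
Then $z_1, z_2, \ldots, z_n$ are independent, normal with mean 0 and standard deviation 1, and
$$\sum z_i^2 = \frac{\sum (y_i - \alpha - \beta x_i)^2}{\sigma^2}$$
has a $\chi^2$ distribution with $n$ degrees of freedom.

If $\alpha$ and $\beta$ are replaced by their estimators $\hat{\alpha}$ and $\hat{\beta}$, then
$$U = \frac{\sum (y_i - \hat{\alpha} - \hat{\beta}x_i)^2}{\sigma^2} = \frac{(n-2)\,s^2}{\sigma^2}$$
has a $\chi^2$ distribution with $n - 2$ degrees of freedom.

Note: the mean of a $\chi^2$ distribution equals its degrees of freedom, so $E[U] = n - 2$. Thus
$$E\left[\frac{(n-2)\,s^2}{\sigma^2}\right] = n - 2 \quad\text{and}\quad E[s^2] = \sigma^2 .$$
This verifies the statement made earlier that $s^2$ is an unbiased estimator of $\sigma^2$.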
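The unbiasedness of $s^2$ can be illustrated by simulation. The sketch below repeatedly generates data from a line with normal errors, computes $s^2$ with the divisor $n - 2$, and averages the results. The parameter values, sample size, number of replications and seed are arbitrary choices for illustration.

```python
import random

def residual_variance(x, y):
    """s^2 = (S_yy - S_xy^2 / S_xx) / (n - 2) for the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return (s_yy - s_xy ** 2 / s_xx) / (n - 2)

def simulate_mean_s2(x, alpha, beta, sigma, reps, seed=0):
    """Average of s^2 over `reps` simulated samples from y = alpha + beta*x + error."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
        total += residual_variance(x, y)
    return total / reps

# Hypothetical setting: n = 10 design points, sigma = 2, so sigma^2 = 4.
x = [float(i) for i in range(1, 11)]
mean_s2 = simulate_mean_s2(x, alpha=1.0, beta=0.5, sigma=2.0, reps=20000)
print(mean_s2)  # should be close to sigma**2 = 4.0
```

Dividing the residual sum of squares by $n$ instead of $n - 2$ would give an average that falls systematically below $\sigma^2$.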
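Recall that $\hat{\beta}$ is normal with mean $\beta$ and standard deviation $\dfrac{\sigma}{\sqrt{S_{xx}}}$. Therefore
$$z = \frac{\hat{\beta} - \beta}{\sigma/\sqrt{S_{xx}}}$$
has a standard normal distribution, and, replacing $\sigma$ by its estimate $s$,
$$t = \frac{\hat{\beta} - \beta}{s/\sqrt{S_{xx}}}$$
has a t distribution with $n - 2$ degrees of freedom.

$(1 - \alpha)100\%$ Confidence Limits for slope $\beta$:
$$\hat{\beta} \pm t_{\alpha/2}\,\frac{s}{\sqrt{S_{xx}}}$$
$t_{\alpha/2}$ = critical value for the t-distribution with $n - 2$ degrees of freedom. (In the subscript, $\alpha$ is the significance level, not the intercept.)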
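Also, $\hat{\alpha}$ is normal with mean $\alpha$ and standard deviation $\sigma\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}$. Therefore
$$z = \frac{\hat{\alpha} - \alpha}{\sigma\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}}$$
has a standard normal distribution, and
$$t = \frac{\hat{\alpha} - \alpha}{s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}}$$
has a t distribution with $n - 2$ degrees of freedom.

$(1 - \alpha)100\%$ Confidence Limits for intercept $\alpha$:
$$\hat{\alpha} \pm t_{\alpha/2}\,s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$$
$t_{\alpha/2}$ = critical value for the t-distribution with $n - 2$ degrees of freedom.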
Example
The following data show the per capita consumption of cigarettes per month (X) in various countries in 1930, and the death rates from lung cancer for men.

TABLE: Per capita consumption of cigarettes per month ($X_i$) in n = 11 countries in 1930, and the death rates, $Y_i$ (per 100,000), from lung cancer for men

Country (i)       X_i    Y_i
Australia          48     18
Canada             50     15
Denmark            38     17
Finland           110     35
Great Britain     110     46
Holland            49     24
Iceland            23      6
Norway             25      9
Sweden             30     11
Switzerland        51     25
USA               130     20
Fitting the Least Squares Line
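First compute the following three quantities (from the table, $\sum x_i = 664$, $\sum y_i = 226$, $\sum x_i^2 = 54404$, $\sum y_i^2 = 6018$, $\sum x_i y_i = 16914$):
$$S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n} = 54404 - \frac{664^2}{11} \approx 14322.55$$
$$S_{yy} = \sum y_i^2 - \frac{(\sum y_i)^2}{n} = 6018 - \frac{226^2}{11} \approx 1374.73$$
$$S_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} = 16914 - \frac{(664)(226)}{11} \approx 3271.82$$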
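Computing Estimates of Slope and Intercept
$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} \approx \frac{3271.82}{14322.55} \approx 0.2284$$
$$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x} \approx 20.545 - (0.2284)(60.364) \approx 6.756$$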
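Estimate of $\sigma$:
$$s = \sqrt{\frac{S_{yy} - S_{xy}^2/S_{xx}}{n-2}} \approx \sqrt{\frac{627.32}{9}} \approx 8.349$$

95% Confidence Limits for slope $\beta$:
$$\hat{\beta} \pm t_{.025}\,\frac{s}{\sqrt{S_{xx}}} \approx 0.2284 \pm (2.262)\frac{8.349}{\sqrt{14322.55}} \approx 0.2284 \pm 0.1578$$
that is, 0.0706 to 0.3862.
$t_{.025} = 2.262$ = critical value for the t-distribution with 9 degrees of freedom.

95% Confidence Limits for intercept $\alpha$:
$$\hat{\alpha} \pm t_{.025}\,s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} \approx 6.756 \pm (2.262)(8.349)\sqrt{\frac{1}{11} + \frac{(60.364)^2}{14322.55}} \approx 6.756 \pm 11.10$$
that is, -4.34 to 17.85.
$t_{.025} = 2.262$ = critical value for the t-distribution with 9 degrees of freedom.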
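The calculations for the cigarette data can be reproduced with a short script. This is a sketch using only the standard library; the critical value 2.262 is the tabulated $t_{.025}$ for 9 degrees of freedom and is entered by hand as an assumption rather than computed.

```python
import math

# Data from the table: cigarettes per capita per month (x)
# and male lung cancer death rate per 100,000 (y), n = 11 countries.
x = [48, 50, 38, 110, 110, 49, 23, 25, 30, 51, 130]
y = [18, 15, 17, 35, 46, 24, 6, 9, 11, 25, 20]
n = len(x)

T_CRIT = 2.262  # tabulated t_.025 with n - 2 = 9 degrees of freedom (entered by hand)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
s_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
s = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))

# Standard errors of the two estimators, with sigma replaced by s.
se_slope = s / math.sqrt(s_xx)
se_intercept = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)

slope_ci = (slope - T_CRIT * se_slope, slope + T_CRIT * se_slope)
intercept_ci = (intercept - T_CRIT * se_intercept, intercept + T_CRIT * se_intercept)

print(f"slope = {slope:.4f}, intercept = {intercept:.3f}, s = {s:.3f}")
print(f"95% limits for slope: {slope_ci[0]:.4f} to {slope_ci[1]:.4f}")
print(f"95% limits for intercept: {intercept_ci[0]:.2f} to {intercept_ci[1]:.2f}")
```

The interval for the slope excludes zero, while the interval for the intercept includes it.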
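$(1 - \alpha)100\%$ Confidence Limits for a point on the regression line, $\alpha + \beta x_0$:
[Figure: the regression line $y = \alpha + \beta x$ plotted against x, with the point $\alpha + \beta x_0$ marked at $x = x_0$.]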
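Let
$$\hat{y}_0 = \hat{\alpha} + \hat{\beta}x_0 ;$$
then $\hat{y}_0$ is normal with mean $\alpha + \beta x_0$ and standard deviation
$$\sigma\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} .$$
Proof:
$$\hat{y}_0 = \bar{y} + \hat{\beta}(x_0 - \bar{x}) = \sum e_i y_i \quad\text{where}\quad e_i = \frac{1}{n} + (x_0 - \bar{x})\,c_i , \quad c_i = \frac{x_i - \bar{x}}{S_{xx}} .$$
Note that, because $\sum c_i = 0$ and $\sum c_i x_i = 1$,
$$\sum e_i = 1 \quad\text{and}\quad \sum e_i x_i = \bar{x} + (x_0 - \bar{x}) = x_0 .$$
Thus the mean of $\hat{y}_0$ is
$$\sum e_i(\alpha + \beta x_i) = \alpha\sum e_i + \beta\sum e_i x_i = \alpha + \beta x_0 .$$
Also, because $\sum c_i^2 = 1/S_{xx}$,
$$\sum e_i^2 = \sum\left(\frac{1}{n^2} + \frac{2(x_0 - \bar{x})}{n}c_i + (x_0 - \bar{x})^2 c_i^2\right) = \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} .$$
Hence $\hat{y}_0$ is normal with mean $\alpha + \beta x_0$ and standard deviation $\sigma\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}$.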
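$(1 - \alpha)100\%$ Confidence Limits for a point on the regression line, $\alpha + \beta x_0$:
$$\hat{\alpha} + \hat{\beta}x_0 \pm t_{\alpha/2}\,s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
$t_{\alpha/2}$ = critical value for the t-distribution with $n - 2$ degrees of freedom.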
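Prediction in the linear regression model
A new observation at $x = x_0$ is $y = \alpha + \beta x_0 + \varepsilon$, where $\varepsilon$ is normal with mean 0 and standard deviation $\sigma$ and is independent of the data used to fit the line. The prediction error $y - (\hat{\alpha} + \hat{\beta}x_0)$ is therefore normal with mean 0 and standard deviation
$$\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} .$$

$(1 - \alpha)100\%$ Prediction Limits for y when $x = x_0$:
$$\hat{\alpha} + \hat{\beta}x_0 \pm t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
$t_{\alpha/2}$ = critical value for the t-distribution with $n - 2$ degrees of freedom.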
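Confidence limits for a point on the line and prediction limits for a new observation differ only by the extra 1 under the square root. The sketch below computes both at a chosen $x_0$ for the cigarette data from the table. The function name, the choice $x_0 = 60$ and the hand-entered critical value 2.262 (tabulated $t_{.025}$ for 9 degrees of freedom) are assumptions made for illustration.

```python
import math

def intervals_at(x, y, x0, t_crit):
    """Return (fitted value, confidence limits for the line, prediction limits) at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    s = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))

    fitted = intercept + slope * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / s_xx
    half_ci = t_crit * s * math.sqrt(leverage)        # point on the line
    half_pred = t_crit * s * math.sqrt(1 + leverage)  # new observation
    return (fitted,
            (fitted - half_ci, fitted + half_ci),
            (fitted - half_pred, fitted + half_pred))

# Data from the table; x0 = 60 is a hypothetical consumption level.
x = [48, 50, 38, 110, 110, 49, 23, 25, 30, 51, 130]
y = [18, 15, 17, 35, 46, 24, 6, 9, 11, 25, 20]
fitted, ci, pred = intervals_at(x, y, x0=60, t_crit=2.262)
print(f"fitted value at x0 = 60: {fitted:.2f}")
print(f"95% confidence limits: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"95% prediction limits: {pred[0]:.2f} to {pred[1]:.2f}")
```

The prediction limits are always wider than the confidence limits, because they account for the variability of the new observation as well as the uncertainty in the fitted line.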