1
Linear Regression: Hypothesis Testing and Estimation
2
Assume that we have collected data on two variables X and Y. Let (x₁, y₁), (x₂, y₂), (x₃, y₃), …, (xₙ, yₙ) denote the pairs of measurements on the two variables X and Y for n cases in a sample (or population).
3
The Statistical Model
4
Each yᵢ is assumed to be randomly generated from a normal distribution with mean μᵢ = α + βxᵢ and standard deviation σ (α, β and σ are unknown). [Figure: the line Y = α + βX with slope β; yᵢ is generated about the height α + βxᵢ above xᵢ.]
5
The Data and the Linear Regression Model: the data fall roughly about a straight line, the unseen true line Y = α + βX. [Figure: scatter plot of the data about the unseen line Y = α + βX.]
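The model can be illustrated with a short simulation. The following is a minimal sketch, assuming illustrative values for α, β and σ (they are not taken from the slides); it generates n pairs (xᵢ, yᵢ) where each yᵢ is normal with mean α + βxᵢ and standard deviation σ.

```python
import numpy as np

# Minimal simulation sketch of the model; alpha, beta and sigma below are
# assumed illustrative values, not values from the slides.
rng = np.random.default_rng(0)
alpha, beta, sigma = 5.0, 2.0, 3.0
n = 30

x = rng.uniform(0, 10, size=n)                       # the x_i values
y = alpha + beta * x + rng.normal(0, sigma, size=n)  # y_i ~ N(alpha + beta*x_i, sigma)
```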
6
The Least Squares Line: fitting the best straight line to “linear” data
7
Let Y = a + bX denote an arbitrary equation of a straight line, where a and b are known values. This equation can be used to predict, for each value of X, the value of Y. For example, if X = xᵢ (as for the i-th case), then the predicted value of Y is ŷᵢ = a + b·xᵢ.
8
The residual, rᵢ = yᵢ − ŷᵢ = yᵢ − (a + b·xᵢ), can be computed for each case in the sample. The residual sum of squares, RSS = Σᵢ rᵢ² = Σᵢ (yᵢ − a − b·xᵢ)², is a measure of the “goodness of fit” of the line Y = a + bX to the data.
9
The optimal choice of a and b will result in the residual sum of squares attaining a minimum. If this is the case, then the line Y = a + bX is called the least squares line.
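As a sketch of the residual computation, the helper below evaluates RSS for any candidate line Y = a + bX; the function name and interface are illustrative, not from the slides.

```python
import numpy as np

def residual_sum_of_squares(x, y, a, b):
    """RSS for the line Y = a + b*X: the sum of (y_i - (a + b*x_i))**2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    residuals = y - (a + b * x)      # observed minus predicted values
    return float(np.sum(residuals ** 2))
```

The least squares line is the choice of a and b that minimizes this quantity.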
10
The equation for the least squares line. Let x̄ = Σxᵢ/n and ȳ = Σyᵢ/n denote the sample means, and define S_xx = Σ(xᵢ − x̄)², S_yy = Σ(yᵢ − ȳ)² and S_xy = Σ(xᵢ − x̄)(yᵢ − ȳ).
11
Computing formulae: S_xx = Σxᵢ² − (Σxᵢ)²/n, S_yy = Σyᵢ² − (Σyᵢ)²/n, and S_xy = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n.
12
Then the slope of the least squares line can be shown to be b = S_xy / S_xx. This is an estimator of the slope, β, in the regression model.
13
and the intercept of the least squares line can be shown to be a = ȳ − b·x̄. This is an estimator of the intercept, α, in the regression model.
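A sketch of these two estimators in code (the function name is illustrative):

```python
import numpy as np

def least_squares_fit(x, y):
    """Return (a, b): the least squares intercept and slope."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    S_xx = np.sum((x - x.mean()) ** 2)
    S_xy = np.sum((x - x.mean()) * (y - y.mean()))
    b = S_xy / S_xx              # estimator of the slope beta
    a = y.mean() - b * x.mean()  # estimator of the intercept alpha
    return a, b
```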
14
The residual sum of squares. Computing formula: RSS = S_yy − S_xy²/S_xx = S_yy − b·S_xy.
15
Estimating σ, the standard deviation in the regression model: s = √(RSS/(n − 2)). This estimate of σ is said to be based on n − 2 degrees of freedom. Computing formula: s = √[(S_yy − S_xy²/S_xx)/(n − 2)].
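Continuing the sketch, the estimate of σ on n − 2 degrees of freedom can be computed as follows (function name illustrative):

```python
import numpy as np

def residual_std_error(x, y, a, b):
    """s = sqrt(RSS / (n - 2)), the estimate of sigma based on n - 2 df."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    rss = np.sum((y - (a + b * x)) ** 2)
    return float(np.sqrt(rss / (x.size - 2)))
```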
16
Sampling distributions of the estimators
17
The sampling distribution of the slope of the least squares line: it can be shown that b has a normal distribution with mean β and standard deviation σ/√S_xx.
18
Thus (b − β)/(σ/√S_xx) has a standard normal distribution, and (b − β)/(s/√S_xx) has a t distribution with df = n − 2.
19
The sampling distribution of the intercept of the least squares line: it can be shown that a has a normal distribution with mean α and standard deviation σ·√(1/n + x̄²/S_xx).
20
Thus (a − α)/(σ·√(1/n + x̄²/S_xx)) has a standard normal distribution, and (a − α)/(s·√(1/n + x̄²/S_xx)) has a t distribution with df = n − 2.
21
(1 − α)100% confidence limits for the slope β: b ± t_{α/2}·(s/√S_xx), where t_{α/2} is the critical value for the t-distribution with n − 2 degrees of freedom.
22
Testing the slope: H₀: β = 0 against Hₐ: β ≠ 0. The test statistic is t = b/(s/√S_xx), which has a t distribution with df = n − 2 if H₀ is true.
23
The critical region: reject H₀ if |t| > t_{α/2}, df = n − 2. This is a two-tailed test; one-tailed tests are also possible.
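These confidence limits and the two-tailed test can be sketched as follows; scipy.stats.t supplies the critical value and p-value (the function name and return layout are illustrative).

```python
import numpy as np
from scipy import stats

def slope_inference(x, y, conf=0.95):
    """Confidence limits for beta and the t test of H0: beta = 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    S_xx = np.sum((x - x.mean()) ** 2)
    b = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
    a = y.mean() - b * x.mean()
    s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))
    se_b = s / np.sqrt(S_xx)                         # standard error of b
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, n - 2)  # t_{alpha/2}, df = n - 2
    ci = (b - t_crit * se_b, b + t_crit * se_b)
    t_stat = b / se_b                                # test statistic for H0: beta = 0
    p_two_tailed = 2 * stats.t.sf(abs(t_stat), n - 2)
    return b, ci, t_stat, p_two_tailed
```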
24
(1 − α)100% confidence limits for the intercept α: a ± t_{α/2}·s·√(1/n + x̄²/S_xx), where t_{α/2} is the critical value for the t-distribution with n − 2 degrees of freedom.
25
Testing the intercept: H₀: α = 0 against Hₐ: α ≠ 0. The test statistic is t = a/(s·√(1/n + x̄²/S_xx)), which has a t distribution with df = n − 2 if H₀ is true.
26
The critical region: reject H₀ if |t| > t_{α/2}, df = n − 2.
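A matching sketch for the intercept, whose standard error is s·√(1/n + x̄²/S_xx) (again, the function name is illustrative):

```python
import numpy as np
from scipy import stats

def intercept_inference(x, y, conf=0.95):
    """Confidence limits for alpha and the t test of H0: alpha = 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    S_xx = np.sum((x - x.mean()) ** 2)
    b = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
    a = y.mean() - b * x.mean()
    s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))
    se_a = s * np.sqrt(1.0 / n + x.mean() ** 2 / S_xx)  # standard error of a
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, n - 2)
    ci = (a - t_crit * se_a, a + t_crit * se_a)
    t_stat = a / se_a                                   # test statistic for H0: alpha = 0
    return a, ci, t_stat
```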
27
Example
28
The following data showed the per capita consumption of cigarettes per month (X) in various countries in 1930, and the death rates from lung cancer for men in 1950.

TABLE: Per capita consumption of cigarettes per month (Xᵢ) in n = 11 countries in 1930, and the death rates, Yᵢ (per 100,000), from lung cancer for men in 1950.

Country (i)      Xᵢ     Yᵢ
Australia        48     18
Canada           50     15
Denmark          38     17
Finland         110     35
Great Britain   110     46
Holland          49     24
Iceland          23      6
Norway           25      9
Sweden           30     11
Switzerland      51     25
USA             130     20
30
Fitting the Least Squares Line
31
First compute the following three quantities: S_xx = Σxᵢ² − (Σxᵢ)²/n = 14,322.5, S_yy = Σyᵢ² − (Σyᵢ)²/n = 1,374.7, and S_xy = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n = 3,271.8.
32
Computing the estimates of the slope (β), the intercept (α) and the standard deviation (σ): b = S_xy/S_xx = 0.2284, a = ȳ − b·x̄ = 6.756, and s = √(RSS/(n − 2)) = √(627.3/9) = 8.35.
33
95% confidence limits for the slope β: 0.0706 to 0.3862, using t.025 = 2.262, the critical value for the t-distribution with 9 degrees of freedom.
34
95% confidence limits for the intercept α: −4.34 to 17.85, using t.025 = 2.262, the critical value for the t-distribution with 9 degrees of freedom.
35
The fitted line: Y = 6.756 + 0.228·X. 95% confidence limits for the slope: 0.0706 to 0.3862; 95% confidence limits for the intercept: −4.34 to 17.85.
36
Testing for a positive slope: H₀: β = 0 against Hₐ: β > 0. The test statistic is t = b/(s/√S_xx) = 0.2284/0.0698 = 3.27.
37
The critical region: reject H₀ if t > t.05 = 1.833, df = 11 − 2 = 9. This is a one-tailed test.
38
Since t = 3.27 > 1.833, we reject H₀: β = 0 and conclude Hₐ: β > 0.
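The whole cigarette example can be reproduced numerically. The sketch below recomputes the estimates, the 95% limits for the slope and the one-tailed test from the table above (printed values are approximate).

```python
import numpy as np
from scipy import stats

# Cigarette consumption X (1930) and lung cancer death rate Y (1950), from the table above.
x = np.array([48, 50, 38, 110, 110, 49, 23, 25, 30, 51, 130], float)
y = np.array([18, 15, 17, 35, 46, 24, 6, 9, 11, 25, 20], float)
n = x.size                                      # n = 11, so df = 9

S_xx = np.sum((x - x.mean()) ** 2)              # about 14,322.5
S_xy = np.sum((x - x.mean()) * (y - y.mean()))  # about 3,271.8
b = S_xy / S_xx                                 # about 0.228
a = y.mean() - b * x.mean()                     # about 6.756
s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))  # about 8.35

# 95% confidence limits for the slope (about 0.071 to 0.386).
t025 = stats.t.ppf(0.975, n - 2)                # 2.262
se_b = s / np.sqrt(S_xx)
print(b - t025 * se_b, b + t025 * se_b)

# One-tailed test of H0: beta = 0 against HA: beta > 0.
t_stat = b / se_b                               # about 3.27
t05 = stats.t.ppf(0.95, n - 2)                  # 1.833
print(t_stat > t05)                             # True: reject H0
```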
39
Confidence limits for points on the regression line. The intercept α is a specific point on the regression line: it is the y-coordinate of the point on the regression line when x = 0, i.e. the predicted value of y when x = 0. We may also be interested in other points on the regression line, e.g. when x = x₀. In this case the y-coordinate of the point on the regression line when x = x₀ is α + βx₀.
40
[Figure: the line y = α + βx with the point (x₀, α + βx₀) marked above x₀.]
41
(1 − α)100% confidence limits for α + βx₀: (a + b·x₀) ± t_{α/2}·s·√(1/n + (x₀ − x̄)²/S_xx), where t_{α/2} is the α/2 critical value for the t-distribution with n − 2 degrees of freedom.
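A sketch of these limits at a chosen x₀ (function name illustrative):

```python
import numpy as np
from scipy import stats

def mean_response_ci(x, y, x0, conf=0.95):
    """Confidence limits for alpha + beta*x0, the point on the line at x0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    S_xx = np.sum((x - x.mean()) ** 2)
    b = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
    a = y.mean() - b * x.mean()
    s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))
    fit = a + b * x0
    se_fit = s * np.sqrt(1.0 / n + (x0 - x.mean()) ** 2 / S_xx)
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, n - 2)
    return fit - t_crit * se_fit, fit + t_crit * se_fit
```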
42
Prediction limits for new values of the dependent variable y. An important application of the regression line is prediction: knowing the value of x (x₀), what is the value of y? The predicted value of y when x = x₀ is α + βx₀. This in turn can be estimated by ŷ = a + b·x₀.
43
The predictor ŷ = a + b·x₀ gives only a single value for y. A more appropriate piece of information would be a range of values: a range that has a fixed probability of capturing the value of y, i.e. a (1 − α)100% prediction interval for y.
44
(1 − α)100% prediction limits for y when x = x₀: (a + b·x₀) ± t_{α/2}·s·√(1 + 1/n + (x₀ − x̄)²/S_xx), where t_{α/2} is the α/2 critical value for the t-distribution with n − 2 degrees of freedom.
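The prediction limits differ from the confidence limits only by the extra 1 under the square root, which reflects the variability of a single new observation. A sketch (function name illustrative):

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, x0, conf=0.95):
    """Prediction limits for a new y observed at x = x0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    S_xx = np.sum((x - x.mean()) ** 2)
    b = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
    a = y.mean() - b * x.mean()
    s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))
    pred = a + b * x0
    se_pred = s * np.sqrt(1.0 + 1.0 / n + (x0 - x.mean()) ** 2 / S_xx)
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, n - 2)
    return pred - t_crit * se_pred, pred + t_crit * se_pred
```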
45
Example. In this example we are studying building fires in a city and are interested in the relationship between:
1. X = the distance between the closest fire hall and the building that sounds the alarm, and
2. Y = the cost of the damage (in $1000s).
The data were collected on n = 15 fires.
46
The Data
47
Scatter Plot
48
Computations
49
Computations Continued
52
95% confidence limits for the slope β: 4.07 to 5.77, using t.025 = 2.160, the critical value for the t-distribution with 13 degrees of freedom.
53
95% confidence limits for the intercept α: 7.21 to 13.35, using t.025 = 2.160, the critical value for the t-distribution with 13 degrees of freedom.
54
Least squares line: y = 4.92x + 10.28
55
(1 − α)100% confidence limits for α + βx₀: (a + b·x₀) ± t_{α/2}·s·√(1/n + (x₀ − x̄)²/S_xx), where t_{α/2} is the α/2 critical value for the t-distribution with n − 2 degrees of freedom.
56
95% confidence limits for α + βx₀:
57
95% confidence limits for α + βx₀. [Figure: the least squares line with 95% confidence limits.]
58
(1 − α)100% prediction limits for y when x = x₀: (a + b·x₀) ± t_{α/2}·s·√(1 + 1/n + (x₀ − x̄)²/S_xx), where t_{α/2} is the α/2 critical value for the t-distribution with n − 2 degrees of freedom.
59
95% prediction limits for y when x = x₀:
60
95% prediction limits for y when x = x₀. [Figure: the least squares line with 95% prediction limits.]
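Since the fire-damage data table did not survive in this transcript, the sketch below reuses the cigarette data from the earlier example to show how the confidence and prediction bands pictured on these slides would be computed over a grid of x₀ values.

```python
import numpy as np
from scipy import stats

# Cigarette data reused here only because the fire data table is not available.
x = np.array([48, 50, 38, 110, 110, 49, 23, 25, 30, 51, 130], float)
y = np.array([18, 15, 17, 35, 46, 24, 6, 9, 11, 25, 20], float)
n = x.size

S_xx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
a = y.mean() - b * x.mean()
s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))
t025 = stats.t.ppf(0.975, n - 2)

x0 = np.linspace(x.min(), x.max(), 50)          # grid of x0 values
fit = a + b * x0
half_conf = t025 * s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / S_xx)
half_pred = t025 * s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / S_xx)

conf_band = np.column_stack([fit - half_conf, fit + half_conf])   # 95% confidence band
pred_band = np.column_stack([fit - half_pred, fit + half_pred])   # 95% prediction band
```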