1
Linear Regression: Hypothesis Testing and Estimation
2
The equation for the least squares line
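Let S_xx = Σ(x_i − x̄)², S_yy = Σ(y_i − ȳ)² and S_xy = Σ(x_i − x̄)(y_i − ȳ), where each sum runs over i = 1, ..., n.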
3
Computing Formulae:
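The computing formulae avoid subtracting the means term by term. A sketch in standard notation, assuming the symbols S_xx, S_yy and S_xy for the corrected sums of squares and products (all sums run over i = 1, ..., n):

```latex
\begin{aligned}
S_{xx} &= \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}, \\
S_{yy} &= \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}, \\
S_{xy} &= \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}.
\end{aligned}
```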
4
Then the slope of the least squares line can be shown to be: β̂ = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²
This is an estimator of the slope, β, in the regression model
5
and the intercept of the least squares line can be shown to be: α̂ = ȳ − β̂ x̄, where β̂ is the least squares slope
This is an estimator of the intercept, α, in the regression model
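A minimal Python sketch of the two estimators, assuming the usual least squares results (slope equal to the corrected sum of products divided by the corrected sum of squares of x, intercept equal to ȳ minus the slope times x̄). The function name and the five data points are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x, sum_y = sum(xs), sum(ys)
    # Computing formulae for the corrected sums of squares and products.
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = s_xy / s_xx
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope = least_squares_line(xs, ys)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```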
6
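The residual sum of squares: RSS = Σ(y_i − ŷ_i)², where ŷ_i = α̂ + β̂ x_i is the fitted value
Computing formula: RSS = S_yy − S_xy² / S_xx, with S_yy = Σ(y_i − ȳ)², S_xy = Σ(x_i − x̄)(y_i − ȳ) and S_xx = Σ(x_i − x̄)²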
7
Estimating σ, the standard deviation in the regression model:
Computing formula: s = √[RSS / (n – 2)], where RSS is the residual sum of squares. This estimate of σ is said to be based on n – 2 degrees of freedom
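The estimate can be sketched in a few lines of Python. This assumes the standard computing formula for the residual sum of squares (corrected sum of squares of y minus the squared sum of products divided by the corrected sum of squares of x); the function name and data are hypothetical.

```python
import math


def residual_std_dev(xs, ys):
    """Return (rss, s): residual sum of squares and the estimate of sigma."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Guard against a tiny negative value caused by rounding in a perfect fit.
    rss = max(s_yy - s_xy ** 2 / s_xx, 0.0)
    # Two parameters (intercept and slope) were estimated: n - 2 d.f.
    return rss, math.sqrt(rss / (n - 2))


# Hypothetical data, for illustration only.
rss, s = residual_std_dev([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"RSS = {rss:.3f}, s = {s:.3f}")
```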
8
Confidence limits for the slope
9
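(1 – α)100% Confidence Limits for slope β: β̂ ± tα/2 · s / √S_xx, where β̂ is the least squares slope, s is the estimate of σ and S_xx = Σ(x_i − x̄)²
tα/2 is the α/2 critical value for the t-distribution with n – 2 degrees of freedom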
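A hedged sketch of these limits in Python, assuming the standard form (slope estimate plus or minus the critical value times s divided by the square root of the corrected sum of squares of x). The critical value is passed in from a t table rather than computed; the function name and data are hypothetical.

```python
import math


def slope_confidence_limits(xs, ys, t_crit):
    """Return (lower, upper) confidence limits for the regression slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    s = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))
    half_width = t_crit * s / math.sqrt(s_xx)
    return slope - half_width, slope + half_width


# Hypothetical data; 3.182 is the tabulated t value for
# alpha/2 = 0.025 with n - 2 = 3 degrees of freedom.
T_CRIT = 3.182
lower, upper = slope_confidence_limits([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], T_CRIT)
print(f"95% limits for the slope: ({lower:.3f}, {upper:.3f})")
```

With so few observations the interval is wide and, for this data, includes zero.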
10
Testing for the slope
11
Testing the slope: H0: β = β0 against HA: β ≠ β0. The test statistic is: t = (β̂ − β0) √S_xx / s, which has a t distribution with df = n – 2 if H0 is true.
12
The Critical Region: Reject H0 if |t| > tα/2, with df = n – 2. This is a two-tailed test. One-tailed tests are also possible.
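The two-tailed test can be sketched as follows, assuming the usual statistic (estimated slope minus hypothesized slope, divided by the standard error of the slope). The function name, the parameter `beta_0` and the data are hypothetical, and the critical value is read from a t table.

```python
import math


def slope_t_statistic(xs, ys, beta_0=0.0):
    """Return the t statistic for H0: slope = beta_0 (df = n - 2)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    s = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))
    return (slope - beta_0) * math.sqrt(s_xx) / s


# Hypothetical data; 3.182 is the tabulated two-tailed 5% critical value
# for n - 2 = 3 degrees of freedom.
T_CRIT = 3.182
t = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
reject_h0 = abs(t) > T_CRIT
print(f"t = {t:.3f}, reject H0: {reject_h0}")
```

For a one-tailed test, the comparison would use t (without the absolute value) against the α critical value instead.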
13
Testing for correlation
14
Recall: Let (x1,y1), (x2,y2), (x3,y3), … , (xn,yn) denote n observations on the variables X and Y. Then r = Σ(x_i − x̄)(y_i − ȳ) / √[Σ(x_i − x̄)² Σ(y_i − ȳ)²] = Pearson’s correlation coefficient
15
The test for zero correlation
H0: ρ = 0 (X and Y are uncorrelated). The test statistic is: t = r √(n – 2) / √(1 – r²), which has a t distribution with df = n – 2 if H0 is true.
16
The Critical Region: Reject H0 if |t| > tα/2, with df = n – 2. This is a two-tailed test. One-tailed tests are also possible.
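A short sketch of the test for zero correlation, assuming the standard statistic t = r √(n − 2) / √(1 − r²). The function name and data are hypothetical.

```python
import math


def correlation_t_statistic(xs, ys):
    """Return (r, t): Pearson's r and the t statistic for H0: rho = 0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = s_xy / math.sqrt(s_xx * s_yy)
    if abs(r) >= 1.0:
        raise ValueError("perfect correlation: the t statistic is undefined")
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return r, t


# Hypothetical data, for illustration only.
r, t = correlation_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.3f}, t = {t:.3f} on 3 degrees of freedom")
```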
17
Comment: The test for independence (zero correlation) is equivalent to the test for zero slope.
18
The test for zero slope
19
t = β̂ √S_xx / s = r √(n – 2) / √(1 – r²) = the test statistic for independence
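The equivalence can be verified in a few steps. The sketch below assumes the standard notation S_xx, S_yy, S_xy for the corrected sums of squares and products, the slope estimate S_xy / S_xx, and s² equal to the residual sum of squares divided by n − 2:

```latex
\begin{aligned}
r^2 &= \frac{S_{xy}^2}{S_{xx} S_{yy}}
\quad\Longrightarrow\quad
s^2 = \frac{S_{yy} - S_{xy}^2 / S_{xx}}{n-2} = \frac{S_{yy}\,(1 - r^2)}{n-2}, \\[4pt]
t &= \frac{\hat{\beta}\,\sqrt{S_{xx}}}{s}
   = \frac{S_{xy}}{\sqrt{S_{xx}}} \cdot \frac{\sqrt{n-2}}{\sqrt{S_{yy}\,(1 - r^2)}}
   = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \cdot \frac{\sqrt{n-2}}{\sqrt{1 - r^2}}
   = \frac{r\,\sqrt{n-2}}{\sqrt{1 - r^2}} .
\end{aligned}
```

Both tests therefore give the same t value and the same decision on n − 2 degrees of freedom.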
20
Confidence limits for the intercept
21
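(1 – α)100% Confidence Limits for intercept α: α̂ ± tα/2 · s · √(1/n + x̄² / S_xx), where α̂ is the least squares intercept, s is the estimate of σ and S_xx = Σ(x_i − x̄)²
tα/2 is the α/2 critical value for the t-distribution with n – 2 degrees of freedom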
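The factor 1/n + x̄²/S_xx in the standard error of the intercept follows from a short variance calculation. This is a sketch under the usual model assumptions (fixed x values, independent errors with common variance σ²), using hat notation for the estimators as an assumed convention:

```latex
\begin{aligned}
\hat{\alpha} &= \bar{y} - \hat{\beta}\,\bar{x}, \qquad
\operatorname{Var}(\bar{y}) = \frac{\sigma^2}{n}, \qquad
\operatorname{Var}(\hat{\beta}) = \frac{\sigma^2}{S_{xx}}, \qquad
\operatorname{Cov}(\bar{y}, \hat{\beta}) = 0, \\[4pt]
\operatorname{Var}(\hat{\alpha}) &= \operatorname{Var}(\bar{y}) + \bar{x}^2 \operatorname{Var}(\hat{\beta})
 = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right).
\end{aligned}
```

Replacing σ by its estimate s gives the standard error used in the confidence limits.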
22
Testing for the intercept
23
Testing the intercept: H0: α = α0 against HA: α ≠ α0. The test statistic is: t = (α̂ − α0) / [s √(1/n + x̄² / S_xx)], which has a t distribution with df = n – 2 if H0 is true.
24
The Critical Region: Reject H0 if |t| > tα/2, with df = n – 2
25
(1 – α)100% Confidence Limits for α + β x0: α̂ + β̂ x0 ± tα/2 · s · √(1/n + (x0 − x̄)² / S_xx)
tα/2 is the α/2 critical value for the t-distribution with n – 2 degrees of freedom
26
(1 – α)100% Prediction Limits for y when x = x0: α̂ + β̂ x0 ± tα/2 · s · √(1 + 1/n + (x0 − x̄)² / S_xx)
tα/2 is the α/2 critical value for the t-distribution with n – 2 degrees of freedom
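The difference between the two kinds of limits is easiest to see side by side. The sketch below assumes the standard formulas: the confidence limits for the mean response use the factor 1/n + (x0 − x̄)²/S_xx under the square root, and the prediction limits add 1 to it to account for the variability of a new observation. The function name and data are hypothetical.

```python
import math


def limits_at(xs, ys, x0, t_crit):
    """Return (fit, confidence_limits, prediction_limits) at x = x0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    s = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))
    fit = intercept + slope * x0
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / s_xx
    half_mean = t_crit * s * math.sqrt(leverage)        # mean response
    half_pred = t_crit * s * math.sqrt(1.0 + leverage)  # new observation
    return (fit,
            (fit - half_mean, fit + half_mean),
            (fit - half_pred, fit + half_pred))


# Hypothetical data; 3.182 is the tabulated t value for
# alpha/2 = 0.025 with 3 degrees of freedom.
fit, conf, pred = limits_at([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=4, t_crit=3.182)
print(f"fit = {fit:.3f}")
print(f"confidence limits: ({conf[0]:.3f}, {conf[1]:.3f})")
print(f"prediction limits: ({pred[0]:.3f}, {pred[1]:.3f})")
```

The prediction limits are always wider than the confidence limits at the same x0, and both widen as x0 moves away from x̄.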
27
The Multiple Linear Regression Model
28
Again we assume that we have a single dependent variable Y and p (say) independent variables X1, X2, X3, ..., Xp. The equation (model) that generally describes the relationship between Y and the independent variables is of the form: Y = f(X1, X2, ..., Xp | θ1, θ2, ..., θq) + ε, where θ1, θ2, ..., θq are unknown parameters of the function f and ε is a random disturbance (usually assumed to have a normal distribution with mean 0 and standard deviation σ).
29
In Multiple Linear Regression we assume the following model
Y = β0 + β1 X1 + β2 X2 + ... + βp Xp + ε. This model is called the Multiple Linear Regression Model. Again, β0, β1, β2, ..., βp are unknown parameters of the model and ε is a random disturbance assumed to have a normal distribution with mean 0 and standard deviation σ.
30
The importance of the Linear model
1. It is the simplest form of a model in which each independent variable has some effect on the dependent variable Y. When fitting models to data one tries to find the simplest form of a model that still adequately describes the relationship between the dependent variable and the independent variables. The linear model is often the first model to be fitted and is abandoned only if it turns out to be inadequate.
31
2. In many instances a linear model is the most appropriate model to describe the dependence relationship between the dependent variable and the independent variables. This will be true if the dependent variable increases at a constant rate as any one of the independent variables is increased while the other independent variables are held constant.
32
3. Many non-linear models can be put into the form of a linear model by appropriately transforming the dependent variable and/or any or all of the independent variables. This important fact (that many non-linear models are linearizable) ensures the wide utility of the linear model.
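As an illustration of a linearizable model, consider the exponential curve y = A·exp(B·x), which is not among the slides' examples and is used here only as an assumed case. Taking logarithms gives ln y = ln A + B·x, a straight line in x, so the simple least squares formulas apply to the pairs (x, ln y). The function name and data are hypothetical.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = A * exp(B * x) by least squares on (x, ln y); return (A, B)."""
    if any(y <= 0 for y in ys):
        raise ValueError("the log transform requires positive y values")
    logs = [math.log(y) for y in ys]
    n = len(xs)
    x_bar, l_bar = sum(xs) / n, sum(logs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xl = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs))
    b = s_xl / s_xx                    # slope on the log scale
    a = math.exp(l_bar - b * x_bar)    # back-transform the intercept
    return a, b


# Hypothetical noise-free data generated from A = 2, B = 0.5.
xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_hat, b_hat = fit_exponential(xs, ys)
print(f"A = {a_hat:.3f}, B = {b_hat:.3f}")
```

Fitting on the log scale minimizes squared errors in ln y rather than in y, which corresponds to assuming a multiplicative error on the original scale.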
33
An Example: The following data come from an experiment investigating the source from which corn plants in various soils obtain their phosphorus. The concentration of inorganic phosphorus (X1) and the concentration of organic phosphorus (X2) were measured in the soil of n = 18 test plots. In addition, the phosphorus content (Y) of corn grown in the soil was measured. The data are displayed below:
34
Inorganic phosphorus X1, Organic phosphorus X2, Plant-available phosphorus Y
0.4 53 64 12.6 58 51 23 60 10.9 37 76 3.1 19 71 23.1 46 96 0.6 34 61 50 77 4.7 24 54 21.6 44 93 1.7 65 56 95 9.4 81 1.9 36 10.1 31 26.8 168 11.6 29 29.9 99
35
Coefficients: Intercept 56.2510241 (β0), X1 1.78977412 (β1), X2 (β2)
Equation: Y = 56.2510241 + 1.78977412 X1 + β2 X2
37
Summary of the Statistics used in Multiple Regression
38
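The Least Squares Estimates: β̂0, β̂1, ..., β̂p
- the values of β0, β1, ..., βp that minimize Σ (y_i − β0 − β1 x_i1 − ... − βp x_ip)², summed over the n observations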
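In matrix form the minimization has a closed-form solution. The sketch below uses notation that is an assumption here rather than the slides' own: X is the n × (p + 1) design matrix whose first column is all ones, y is the vector of observations, and β is the vector of coefficients.

```latex
\begin{aligned}
S(\beta) &= (y - X\beta)^{\mathsf T} (y - X\beta), \\
\frac{\partial S}{\partial \beta} &= -2\, X^{\mathsf T} (y - X\beta) = 0
\quad\Longrightarrow\quad
X^{\mathsf T} X\, \hat{\beta} = X^{\mathsf T} y, \\
\hat{\beta} &= (X^{\mathsf T} X)^{-1} X^{\mathsf T} y
\qquad \text{(when } X^{\mathsf T} X \text{ is invertible).}
\end{aligned}
```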
39
The Analysis of Variance Table Entries
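a) Adjusted Total Sum of Squares: SSTotal = Σ(y_i − ȳ)²
b) Residual Sum of Squares: SSError = Σ(y_i − ŷ_i)², where ŷ_i is the fitted value for observation i
c) Regression Sum of Squares: SSReg = Σ(ŷ_i − ȳ)²
Note: Σ(y_i − ȳ)² = Σ(ŷ_i − ȳ)² + Σ(y_i − ŷ_i)², i.e. SSTotal = SSReg + SSError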
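The three sums of squares can be computed directly once the model has been fitted. The following is a minimal standard-library sketch, assuming the least squares estimates are obtained by solving the normal equations with Gaussian elimination; a statistical package would normally do this. All function names and the data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(rows, ys):
    """Return least squares coefficients [b0, b1, ..., bp]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def anova_sums(rows, ys, coefs):
    """Return (ss_total, ss_reg, ss_error) for a fitted model."""
    fitted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], r)) for r in rows]
    y_bar = sum(ys) / len(ys)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_reg = sum((f - y_bar) ** 2 for f in fitted)
    ss_error = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return ss_total, ss_reg, ss_error


# Hypothetical data with p = 2 predictors, for illustration only.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
ys = [9.5, 7.5, 19.0, 18.5, 28.5, 28.0]
coefs = fit_multiple_regression(rows, ys)
ss_total, ss_reg, ss_error = anova_sums(rows, ys, coefs)
print(f"SSTotal = {ss_total:.4f}, SSReg + SSError = {ss_reg + ss_error:.4f}")
```

The decomposition SSTotal = SSReg + SSError holds because the model includes an intercept and the coefficients are the least squares estimates.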
40
The Analysis of Variance Table
Source       Sum of Squares   d.f.      Mean Square                        F
Regression   SSReg            p         MSReg = SSReg/p                    MSReg/s²
Error        SSError          n-p-1     MSError = SSError/(n-p-1) = s²
Total        SSTotal          n-1
41
Uses of the ANOVA table:
1. To estimate σ² (the error variance). - Use s² = MSError to estimate σ².
2. To test the hypothesis H0: β1 = β2 = ... = βp = 0. Use the test statistic F = MSReg/MSError = MSReg/s². - Reject H0 if F > Fα(p, n-p-1).
42
3. To compute other statistics that are useful in describing the relationship between Y (the dependent variable) and X1, X2, ..., Xp (the independent variables).
a) R² = the coefficient of determination = SSReg/SSTotal = the proportion of variance in Y explained by X1, X2, ..., Xp
1 - R² = the proportion of variance in Y that is left unexplained by X1, X2, ..., Xp = SSError/SSTotal.
43
b) Ra² = "R² adjusted" for degrees of freedom
= 1 - [SSError/(n-p-1)] / [SSTotal/(n-1)] = 1 - [(n-1)/(n-p-1)] (1 - R²)
= 1 - [the proportion of variance in Y that is left unexplained by X1, X2, ..., Xp, adjusted for d.f.]
44
c) R = √R² = the multiple correlation coefficient of Y with X1, X2, ..., Xp = the maximum correlation between Y and a linear combination of X1, X2, ..., Xp
Comment: The statistics F, R², Ra² and R are equivalent statistics (for fixed n and p, each is an increasing function of the others).
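All of these statistics follow from the two sums of squares and the degrees of freedom. A minimal sketch, with a hypothetical function name and made-up input values:

```python
import math


def regression_summary(ss_reg, ss_error, n, p):
    """Return the ANOVA-based statistics for a model with p predictors."""
    df_error = n - p - 1
    if p < 1 or df_error < 1:
        raise ValueError("need p >= 1 and n > p + 1")
    ss_total = ss_reg + ss_error
    ms_reg = ss_reg / p
    ms_error = ss_error / df_error       # s^2, the estimate of sigma^2
    r2 = ss_reg / ss_total
    r2_adj = 1.0 - (ss_error / df_error) / (ss_total / (n - 1))
    return {
        "s2": ms_error,
        "F": ms_reg / ms_error,
        "R2": r2,
        "R2_adj": r2_adj,
        "R": math.sqrt(r2),
    }


# Hypothetical sums of squares, for illustration only.
summary = regression_summary(ss_reg=80.0, ss_error=20.0, n=23, p=2)
for name, value in summary.items():
    print(f"{name} = {value:.4f}")
```

The equivalence of F and R² can be seen from F = (R²/p) / ((1 − R²)/(n − p − 1)), which the sketch reproduces for any valid input.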
45
Using Statistical Packages
To perform Multiple Regression
46
Using SPSS
Note: The use of another statistical package such as Minitab is similar to using SPSS.
47
Example: The example we will use to illustrate multiple regression looks at n = 392 different automobiles. The variables measured are:
mpg – mileage (the dependent variable Y)
engine – engine size (independent variable X1)
horse – horsepower (independent variable X2)
weight – weight (independent variable X3)
The objective is to determine how the dependent variable Y (mileage) depends on the independent variables: engine size, horsepower and weight.
48
After starting the SPSS program, the following dialogue box appears:
49
If you select Opening an existing file and press OK the following dialogue box appears
50
The following dialogue box appears:
51
If the variable names are in the file, ask it to read the names. If you do not specify the Range, the program will identify the Range. Once you click OK, two windows will appear:
52
One that will contain the output:
53
The other containing the data:
54
To perform any statistical Analysis select the Analyze menu:
55
Then select Regression and Linear.
56
The following Regression dialogue box appears
57
Select the Dependent variable Y.
58
Select the Independent variables X1, X2, etc.
59
If you select the Method "Enter",
60
all variables will be put into the equation.
There are also several other methods that can be used: Forward Selection, Backward Elimination and Stepwise Regression.
61
Once the dependent variable, the independent variables and the Method have been selected, press OK and the analysis will be performed.
62
The output will contain the following table
R² and adjusted R² measure the proportion of variance in Y that is explained by X1, X2, X3, etc. (67.6% and 67.3%). R is the multiple correlation coefficient (the maximum correlation between Y and a linear combination of X1, X2, X3, etc.).
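The reported adjusted value can be checked from R² alone, using n = 392 automobiles and p = 3 predictors. The short sketch below assumes the standard adjustment formula; the F value it derives is only approximate because it starts from the rounded R² of 0.676, so the figure in the package output may differ slightly.

```python
n, p = 392, 3   # automobiles and predictors in the example
r2 = 0.676      # R-squared as reported (rounded to three decimals)

# Adjusted R-squared: penalize the unexplained proportion for d.f.
r2_adj = 1 - (n - 1) / (n - p - 1) * (1 - r2)

# F statistic implied by R-squared (approximate, since r2 is rounded).
f_stat = (r2 / p) / ((1 - r2) / (n - p - 1))

print(f"adjusted R-squared = {r2_adj:.3f}")  # prints 0.673, matching 67.3%
print(f"implied F on ({p}, {n - p - 1}) d.f. = {f_stat:.0f} (approximate)")
```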
63
The next table is the Analysis of Variance Table
The F test tests whether the regression coefficients of the predictor variables are all zero, namely that none of the independent variables X1, X2, X3, etc. has any effect on Y.
64
The final table in the output
Gives the estimates of the regression coefficients, their standard errors and the t tests for testing whether they are zero. Note: Engine size has no significant effect on Mileage
65
The estimated equation from the table below:
Is:
66
Note: according to the equation, mileage decreases with increases in Engine Size (not significant, p = 0.432), with increases in Horsepower (significant, p < 0.001) and with increases in Weight (significant, p < 0.001).