Summary of the Statistics used in Multiple Regression
The Least Squares Estimates: the values β̂_0, β̂_1, ..., β̂_p that minimize the residual sum of squares Σ_i [y_i − (β_0 + β_1 x_i1 + ... + β_p x_ip)]².
The Analysis of Variance Table Entries
a) Adjusted Total Sum of Squares (SS_Total)
b) Residual Sum of Squares (SS_Error)
c) Regression Sum of Squares (SS_Reg)
Note: SS_Total = SS_Reg + SS_Error.
The Analysis of Variance Table

Source       Sum of Squares   d.f.     Mean Square                        F
Regression   SS_Reg           p        SS_Reg/p = MS_Reg                  MS_Reg/s²
Error        SS_Error         n−p−1    SS_Error/(n−p−1) = MS_Error = s²
Total        SS_Total         n−1
Uses:
1. To estimate σ² (the error variance). Use s² = MS_Error to estimate σ².
2. To test the hypothesis H_0: β_1 = β_2 = ... = β_p = 0. Use the test statistic F = MS_Reg/s² = [(1/p)SS_Reg]/[(1/(n−p−1))SS_Error]. Reject H_0 if F > F_α(p, n−p−1).
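The estimation and testing steps above can be sketched numerically. This is a minimal sketch with made-up data (the coefficients, sample size and noise level are all assumptions, not from the notes):

```python
import numpy as np

# Hypothetical data: n = 8 observations, p = 2 independent variables.
rng = np.random.default_rng(0)
n, p = 8, 2
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Design matrix with a leading column of ones for the intercept.
D = np.column_stack([np.ones(n), X])

# Least squares estimates: the values minimizing the residual sum of squares.
beta_hat, _, _, _ = np.linalg.lstsq(D, y, rcond=None)

# ANOVA decomposition: SS_Total = SS_Reg + SS_Error.
fitted = D @ beta_hat
ss_total = np.sum((y - y.mean()) ** 2)
ss_error = np.sum((y - fitted) ** 2)
ss_reg = ss_total - ss_error

# F statistic for H_0: beta_1 = ... = beta_p = 0.
ms_reg = ss_reg / p
s2 = ss_error / (n - p - 1)   # MS_Error, the estimate of sigma^2
F = ms_reg / s2
```

Because the intercept is included, SS_Reg can equivalently be computed as Σ(ŷ_i − ȳ)², which is a useful internal check.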
3. To compute other statistics that are useful in describing the relationship between Y (the dependent variable) and X_1, X_2, ..., X_p (the independent variables).
a) R² = the coefficient of determination = SS_Reg/SS_Total = the proportion of variance in Y explained by X_1, X_2, ..., X_p.
1 − R² = the proportion of variance in Y that is left unexplained by X_1, X_2, ..., X_p = SS_Error/SS_Total.
b) R_a² = "R² adjusted" for degrees of freedom
= 1 − [the proportion of variance in Y left unexplained by X_1, X_2, ..., X_p, adjusted for d.f.]
= 1 − [(1/(n−p−1))SS_Error]/[(1/(n−1))SS_Total]
= 1 − [(n−1)SS_Error]/[(n−p−1)SS_Total]
= 1 − [(n−1)/(n−p−1)][1 − R²].
c) R = √R² = the multiple correlation coefficient of Y with X_1, X_2, ..., X_p = the maximum correlation between Y and a linear combination of X_1, X_2, ..., X_p.
Comment: The statistics F, R², R_a² and R are equivalent statistics.
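The "equivalent statistics" comment can be made concrete: F is a function of R² alone (given n and p), since F = [R²/p] / [(1 − R²)/(n − p − 1)]. A small check, using made-up sums of squares:

```python
# Illustrative (assumed) sums of squares, not values from the notes.
ss_reg, ss_error = 80.0, 20.0
n, p = 25, 2

ss_total = ss_reg + ss_error
r2 = ss_reg / ss_total

# F computed two ways: from the sums of squares, and purely from R^2.
F_from_ss = (ss_reg / p) / (ss_error / (n - p - 1))
F_from_r2 = (r2 / p) / ((1 - r2) / (n - p - 1))
```

The two expressions agree because dividing SS_Reg and SS_Error by SS_Total converts them to R² and 1 − R² without changing their ratio.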
Properties of the Least Squares Estimators:
1. Normally distributed (if the error terms are Normally distributed).
2. Unbiased estimators of the linear parameters β_0, β_1, β_2, ..., β_p.
3. Minimum variance (minimum standard error) of all unbiased estimators of the linear parameters β_0, β_1, β_2, ..., β_p.
Comments: The standard error of β̂_i, S.E.(β̂_i) = s_{β̂_i}, depends on:
1. The error variance σ² (and σ, estimated by s).
2. s_{X_i}, the standard deviation of X_i (the i-th independent variable).
3. The sample size n.
4. The correlations between all pairs of variables.

S.E.(β̂_i):
– decreases as s decreases;
– decreases as s_{X_i} increases;
– decreases as n increases;
– increases as the correlation between pairs of independent variables increases.
In fact, the standard error of the least squares estimates can be extremely high if there is a high correlation between one of the independent variables and a linear combination of the remaining independent variables (the problem of multicollinearity).
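The multicollinearity effect can be seen directly from σ²(XᵀX)⁻¹. A minimal sketch with simulated data (the variable names, sample size and noise levels are assumptions for illustration):

```python
import numpy as np

# Standard errors of the slope estimates blow up under multicollinearity.
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                  # nearly uncorrelated with x1
x2_coll = x1 + rng.normal(scale=0.01, size=n)  # almost a copy of x1

def se_of_slopes(x1, x2, sigma2=1.0):
    """Standard errors of beta_1-hat, beta_2-hat from sigma^2 (X'X)^{-1}."""
    D = np.column_stack([np.ones(n), x1, x2])
    cov = sigma2 * np.linalg.inv(D.T @ D)
    return np.sqrt(np.diag(cov))[1:]

se_indep = se_of_slopes(x1, x2_indep)
se_coll = se_of_slopes(x1, x2_coll)
# se_coll is far larger than se_indep: the design is nearly singular.
```

The same σ² gives wildly different standard errors purely because of the correlation structure of the X's.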
The Covariance Matrix, the Correlation Matrix and the (XᵀX)⁻¹ matrix

The Covariance Matrix: Cov(β̂) = σ²(XᵀX)⁻¹, the matrix whose (i, j) entry is Cov(β̂_i, β̂_j) and whose diagonal entries are the variances Var(β̂_i).
The Correlation Matrix: the matrix whose (i, j) entry is the correlation Cov(β̂_i, β̂_j)/[S.E.(β̂_i) S.E.(β̂_j)].
The (XᵀX)⁻¹ matrix
If we multiply each entry in the (XᵀX)⁻¹ matrix by s² = MS_Error, this matrix turns into the estimated covariance matrix for β̂:
These matrices can be used to compute standard errors for linear combinations of the regression coefficients. Namely, for L = a_0β̂_0 + a_1β̂_1 + ... + a_pβ̂_p,
S.E.(L) = √( Σ_i Σ_j a_i a_j Cov(β̂_i, β̂_j) ).
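The steps above can be sketched as follows. This uses simulated data (coefficients, n, and the chosen combination a are all assumptions for illustration):

```python
import numpy as np

# Standard error of a linear combination L = a' beta-hat, using
# Var(L) = a' Cov(beta-hat) a with Cov(beta-hat) = s^2 (X'X)^{-1}.
rng = np.random.default_rng(2)
n, p = 30, 2
D = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = D @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.solve(D.T @ D, D.T @ y)
s2 = np.sum((y - D @ beta_hat) ** 2) / (n - p - 1)   # MS_Error
cov_beta = s2 * np.linalg.inv(D.T @ D)               # estimated Cov(beta-hat)

a = np.array([0.0, 1.0, -1.0])      # e.g. L = beta_1-hat - beta_2-hat
se_L = np.sqrt(a @ cov_beta @ a)    # quadratic form = double sum above
```

The quadratic form a @ cov_beta @ a is exactly the double sum Σ_i Σ_j a_i a_j Cov(β̂_i, β̂_j).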
An Example

Suppose one is interested in how the cost per month (Y) of heating a plant is determined by the average atmospheric temperature in the month (X_1) and the number of operating days in the month (X_2). Data on these variables were collected for n = 25 months selected at random and are given on the following page.
Y = cost per month of heating the plant
X_1 = average atmospheric temperature in the month
X_2 = the number of operating days for the plant in the month
The Least Squares Estimates: the estimates and their standard errors for the Constant, X_1 and X_2, together with the Covariance Matrix, the Correlation Matrix and the (XᵀX)⁻¹ matrix of the estimates.
The Analysis of Variance Table for the example (Source, df, SS, MS and F for Regression, Error and Total).
Summary Statistics (R², R² adjusted = R_a², and R)
R² = SS_Reg/SS_Total = .8491 (84.91% of the variance in Y explained)
R_a² = 1 − [1 − R²][(n−1)/(n−p−1)] = 1 − [.1509][24/22] = .8354 (83.54%)
R = √.8491 = .9215 = multiple correlation coefficient
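These summary statistics can be checked from the reported R² alone, with n = 25 months and p = 2 independent variables:

```python
# Check of the summary statistics above, starting from the reported R^2.
r2 = 0.8491
n, p = 25, 2

r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)   # adjusted R^2
R = r2 ** 0.5                                    # multiple correlation
```

Rounding to four decimals reproduces the reported values .8354 and .9215.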
Three-dimensional Scatter-plot of Cost, Temp and Days.
Example: the Motor Vehicle data

Variables:
1. (Y) mpg – Mileage
2. (X_1) engine – Engine size
3. (X_2) horse – Horsepower
4. (X_3) weight – Weight
Select Analyze->Regression->Linear.
To print the correlation matrix or the covariance matrix of the estimates, select Statistics.
Check the box for the covariance matrix of the estimates.
Here is the table giving the estimates and their standard errors.
Here is the table giving the correlation matrix and covariance matrix of the regression estimates. What is missing in the SPSS output are the covariances and correlations with the intercept estimate (the constant).
This can be found by using the following trick:
1. Introduce a new variable (called constnt).
2. The new "variable" takes on the value 1 for all cases.
Select Transform->Compute
The following dialogue box appears. Type in the name of the target variable – constnt – and type in '1' for the Numeric Expression.
This variable is now added to the data file
Add this new variable (constnt) to the list of independent variables
Under Options, make sure the box 'Include constant in equation' is unchecked. The coefficient of the new variable will then be the constant.
Here are the estimates of the parameters with their standard errors. Note the agreement of the parameter estimates and their standard errors with those previously calculated.
Here is the correlation matrix and the covariance matrix of the estimates.
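The same constant-column trick is easy to verify outside SPSS. A minimal numpy sketch with simulated motor-vehicle-style data (variable names match the example, but the coefficients and data are made up):

```python
import numpy as np

# The "constnt" trick: fit with no built-in intercept but include an
# explicit column of ones, so s^2 (X'X)^{-1} covers the constant too.
rng = np.random.default_rng(3)
n = 40
engine = rng.normal(size=n)       # hypothetical predictors
horse = rng.normal(size=n)
weight = rng.normal(size=n)
mpg = 30 - 2 * engine - 1.5 * horse - 3 * weight + rng.normal(scale=1.0, size=n)

constnt = np.ones(n)              # the new "variable": 1 for all cases
X = np.column_stack([constnt, engine, horse, weight])

beta_hat = np.linalg.solve(X.T @ X, X.T @ mpg)
s2 = np.sum((mpg - X @ beta_hat) ** 2) / (n - X.shape[1])
cov_beta = s2 * np.linalg.inv(X.T @ X)   # includes row/column for constnt
se = np.sqrt(np.diag(cov_beta))
```

The first row and column of cov_beta are the covariances involving the intercept, which is exactly what the SPSS default output omits.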
Testing Hypotheses related to Multiple Regression

The General Linear Hypothesis
H_0: h_11β_1 + h_12β_2 + h_13β_3 + ... + h_1pβ_p = h_1
     h_21β_1 + h_22β_2 + h_23β_3 + ... + h_2pβ_p = h_2
     ...
     h_q1β_1 + h_q2β_2 + h_q3β_3 + ... + h_qpβ_p = h_q
where h_11, h_12, h_13, ..., h_qp and h_1, h_2, h_3, ..., h_q are known coefficients.
Examples
1. H_0: β_1 = 0
2. H_0: β_1 = 0, β_2 = 0, β_3 = 0
3. H_0: β_1 = β_2
4. H_0: β_1 = β_2, β_3 = β_4
5. H_0: β_1 = (1/2)(β_2 + β_3)
6. H_0: β_1 = (1/2)(β_2 + β_3), β_3 = (1/3)(β_4 + β_5 + β_6)
The Complete Model: Y = β_0 + β_1X_1 + β_2X_2 + β_3X_3 + ... + β_pX_p + ε
The Reduced Model: the model implied by H_0. We are interested in knowing whether the complete model can be simplified to the reduced model.
Testing the General Linear Hypothesis The F-test for H 0 is performed by carrying out two runs of a multiple regression package.
Run 1: Fit the complete model, resulting in the following ANOVA table:

Source             df       Sum of Squares
Regression         p        SS_Reg
Residual (Error)   n−p−1    SS_Error
Total              n−1      SS_Total
Run 2: Fit the reduced model (q parameters eliminated), resulting in the following ANOVA table:

Source             df         Sum of Squares
Regression         p−q        SS¹_Reg
Residual (Error)   n−p+q−1    SS¹_Error
Total              n−1        SS_Total
The Test: the test is carried out using the test statistic
F = [SS_H0/q]/s²
where SS_H0 = SS¹_Error − SS_Error = SS_Reg − SS¹_Reg and s² = SS_Error/(n−p−1). The test statistic F has an F-distribution with ν_1 = q d.f. in the numerator and ν_2 = n − p − 1 d.f. in the denominator if H_0 is true.
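The two-run procedure can be sketched as follows, using simulated data under a true H_0 (the data, the helper fit_sse, and the particular hypothesis tested are all illustrative assumptions):

```python
import numpy as np

def fit_sse(D, y):
    """Residual sum of squares from a least squares fit of y on D."""
    beta, _, _, _ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    return resid @ resid

# Hypothetical data with p = 3 predictors; test H_0: beta_2 = beta_3 = 0 (q = 2).
rng = np.random.default_rng(4)
n, p, q = 60, 3, 2
X = rng.normal(size=(n, p))
y = 1 + 2 * X[:, 0] + rng.normal(size=n)   # H_0 is actually true here

full = np.column_stack([np.ones(n), X])           # Run 1: complete model
reduced = np.column_stack([np.ones(n), X[:, 0]])  # Run 2: reduced model

ss_error = fit_sse(full, y)
ss_error_1 = fit_sse(reduced, y)

ss_h0 = ss_error_1 - ss_error        # SS_H0 = SS1_Error - SS_Error
s2 = ss_error / (n - p - 1)
F = (ss_h0 / q) / s2                 # ~ F(q, n - p - 1) under H_0
```

Since the models are nested, SS¹_Error ≥ SS_Error always, so SS_H0 and F are nonnegative.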
Distribution of the test statistic F when H_0 is true.
The Critical Region: Reject H_0 if F > F_α(q, n − p − 1).
The ANOVA Table for the Test:

Source                       df       Sum of Squares   Mean Square                 F
Regression (reduced model)   p−q      SS¹_Reg          [1/(p−q)]SS¹_Reg = MS¹_Reg  MS¹_Reg/s²
Departure from H_0           q        SS_H0            (1/q)SS_H0 = MS_H0          MS_H0/s²
Residual (Error)             n−p−1    SS_Error         s²
Total                        n−1      SS_Total
Some Examples: four independent variables X_1, X_2, X_3, X_4
The Complete Model: Y = β_0 + β_1X_1 + β_2X_2 + β_3X_3 + β_4X_4 + ε
1) a) H_0: β_3 = 0, β_4 = 0 (q = 2)
   b) The Reduced Model: Y = β_0 + β_1X_1 + β_2X_2 + ε
      Dependent variable: Y
      Independent variables: X_1, X_2
2) a) H_0: β_3 = 4.5, β_4 = 8.0 (q = 2)
   b) The Reduced Model: Y − 4.5X_3 − 8.0X_4 = β_0 + β_1X_1 + β_2X_2 + ε
      Dependent variable: Y − 4.5X_3 − 8.0X_4
      Independent variables: X_1, X_2
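Example 2) uses an offset: the hypothesized coefficient values are moved to the dependent-variable side before fitting the reduced model. A minimal sketch with simulated data in which H_0 holds (the data and coefficients are made up for illustration):

```python
import numpy as np

# To test H_0: beta_3 = 4.5, beta_4 = 8.0, regress Y - 4.5*X3 - 8.0*X4
# on X1 and X2 (the reduced model).
rng = np.random.default_rng(5)
n = 50
X = rng.normal(size=(n, 4))
y = (2 + 1.0 * X[:, 0] - 0.5 * X[:, 1]
     + 4.5 * X[:, 2] + 8.0 * X[:, 3]
     + rng.normal(size=n))                    # H_0 holds in this fake data

y_star = y - 4.5 * X[:, 2] - 8.0 * X[:, 3]    # new dependent variable
D_reduced = np.column_stack([np.ones(n), X[:, 0], X[:, 1]])
beta_reduced, _, _, _ = np.linalg.lstsq(D_reduced, y_star, rcond=None)
```

Since the offset removes the X_3 and X_4 terms exactly under H_0, the reduced fit should recover β_0, β_1 and β_2 (up to sampling error).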
Example: the Motor Vehicle data

Variables:
1. (Y) mpg – Mileage
2. (X_1) engine – Engine size
3. (X_2) horse – Horsepower
4. (X_3) weight – Weight
Suppose we want to test H_0: β_1 = 0 against H_A: β_1 ≠ 0, i.e. engine size (engine) has no effect on mileage (mpg).
The full model: Y = β_0 + β_1X_1 + β_2X_2 + β_3X_3 + ε
               (mpg)      (engine)  (horse)  (weight)
The reduced model: Y = β_0 + β_2X_2 + β_3X_3 + ε
The ANOVA Table for the Full model:
The reduction in the residual sum of squares: SS_H0 = SS¹_Error − SS_Error.
The ANOVA Table for the Reduced model:
The ANOVA Table for testing H_0: β_1 = 0 against H_A: β_1 ≠ 0
Now suppose we want to test H_0: β_1 = 0, β_2 = 0 against H_A: β_1 ≠ 0 or β_2 ≠ 0, i.e. engine size (engine) and horsepower (horse) have no effect on mileage (mpg).
The full model: Y = β_0 + β_1X_1 + β_2X_2 + β_3X_3 + ε
               (mpg)      (engine)  (horse)  (weight)
The reduced model: Y = β_0 + β_3X_3 + ε
The ANOVA Table for the Full model
The reduction in the residual sum of squares: SS_H0 = SS¹_Error − SS_Error.
The ANOVA Table for the Reduced model:
The ANOVA Table for testing H_0: β_1 = 0, β_2 = 0 against H_A: β_1 ≠ 0 or β_2 ≠ 0