Anareg Week 10: Multicollinearity, Interesting Special Cases, Polynomial Regression
Multicollinearity Numerical analysis problem: the matrix X′X is close to singular and is therefore difficult to invert accurately. Statistical problem: there is too much correlation among the explanatory variables, and it is therefore difficult to determine the individual regression coefficients.
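The numerical side can be sketched in a few lines of numpy (an illustration with made-up data, not the course's SAS): when two columns of X are nearly collinear, the condition number of X′X explodes, and inverting it amplifies rounding error by roughly that factor.

```python
import numpy as np

# Made-up design matrix whose last two columns are nearly collinear:
# x2 differs from x1 only by a perturbation of size 1e-6.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = x1 + 1e-6 * np.array([1.0, -1.0, 1.0, -1.0, 1.0])
X = np.column_stack([np.ones_like(x1), x1, x2])

XtX = X.T @ X
# An enormous condition number means X'X is numerically close to
# singular: small rounding errors are hugely amplified on inversion.
print(np.linalg.cond(XtX))
```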
Multicollinearity (2) Solve the statistical problem and the numerical problem will also be solved. The statistical problem is more serious than the numerical one: we want to refine a model that has redundancy in the explanatory variables even if X′X can be inverted without difficulty.
Multicollinearity (3) Extreme cases can help us understand the problem: if all X's are uncorrelated, Type I SS and Type II SS will be the same, i.e., the contribution of each explanatory variable to the model will be the same whether or not the other explanatory variables are in the model; if some linear combination of the explanatory variables is constant (e.g. X1 = X2, so X1 - X2 = 0), then the Type II SS for the X's involved will be zero.
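The first extreme can be checked numerically. Below is a small numpy sketch (illustrative data, not from the lecture) using a balanced design in which x1 and x2 are uncorrelated: the Type I (sequential) SS for x1 entered first equals its Type II SS with x2 already in the model.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares after regressing y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

# Balanced design: x1 and x2 are uncorrelated (orthogonal) predictors.
one = np.ones(4)
x1 = np.array([-1.0, -1.0, 1.0, 1.0])
x2 = np.array([-1.0, 1.0, -1.0, 1.0])
y = np.array([1.0, 2.0, 3.0, 5.0])

# Type I SS for x1 (entered first) and Type II SS for x1 (entered after x2):
type1 = sse(np.column_stack([one]), y) - sse(np.column_stack([one, x1]), y)
type2 = sse(np.column_stack([one, x2]), y) - sse(np.column_stack([one, x1, x2]), y)
print(type1, type2)  # identical: x1's contribution does not depend on x2
```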
An example Y = gpa, X1 = hsm, X3 = hss, X4 = hse, X5 = satm, X6 = satv, X7 = genderm. Define: sat = satm + satv. We will regress Y on sat, satm, and satv.
Output

Source           DF
Model            2
Error            221
Corrected Total  223

Something is wrong: dfM = 2, but there are 3 X's.
Output (2) NOTE: Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.
Output (3) NOTE: The following parameters have been set to 0, since the variables are a linear combination of other variables as shown. satv = sat - satm
Output (4)

Var   DF  Par Est  St Err  t      P
Int   1   1.28     0.37    3.43   0.0007
sat   B   -0.00    0.00    -0.04  0.9684
satm  B   0.00     0.00    2.10   0.0365
satv  0   0        .       .      .
Extent of multicollinearity Our CS example had one explanatory variable equal to a linear combination of other explanatory variables. This is the most extreme case of multicollinearity and is detected by statistical software because X′X does not have an inverse. We are concerned with cases less extreme.
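This detection is easy to reproduce. A numpy sketch with hypothetical SAT scores (made-up numbers, not the course data): because sat = satm + satv exactly, the design matrix loses a column of rank and X′X has no inverse.

```python
import numpy as np

# Hypothetical scores; sat is an exact linear combination of satm and satv.
satm = np.array([580.0, 620.0, 540.0, 660.0, 600.0])
satv = np.array([540.0, 580.0, 520.0, 600.0, 560.0])
sat = satm + satv

# Four columns, but only three are linearly independent,
# so X'X is singular and the least-squares solution is not unique.
X = np.column_stack([np.ones(5), sat, satm, satv])
print(np.linalg.matrix_rank(X))  # 3, not 4
```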
Effects of multicollinearity Regression coefficients are not well estimated and may be meaningless, and similarly for the standard errors of these estimates. Type I SS and Type II SS will differ. R² and predicted values are usually OK.
Two separate problems Numerical accuracy: X′X is difficult to invert; need good software. Statistical problem: results are difficult to interpret; need a better model.
Polynomial regression We can do linear, quadratic, cubic, etc. by defining squares, cubes, etc. in a data step and using these as predictors in a multiple regression. We can do this with more than one explanatory variable. When we do this we generally create a multicollinearity problem.
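In the course this is done in a SAS data step; the same idea in numpy (illustrative data) is to add a column of squares and run an ordinary multiple regression, which recovers a quadratic exactly when the data are noiseless:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x - 1.0 * x**2  # noiseless quadratic, for illustration

# "Data step": build the squared term, then fit a multiple regression.
X = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # recovers the coefficients 2, 3, -1
```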
Polynomial Regression (2) We can remove the correlation between explanatory variables and their squares: center (subtract the mean) before squaring. NKNW rescale by standardizing (subtract the mean and divide by the standard deviation).
Interaction Models With several explanatory variables, we need to consider the possibility that the effect of one variable depends on the value of another variable. Special cases: one indep variable – second order; one indep variable – third order; two indep variables – second order.
One Independent Variable – Second Order The regression model: Yi = β0 + β1xi + β11xi² + εi. The mean response E{Y} = β0 + β1x + β11x² is a parabola and is frequently called a quadratic response function. β0 represents the mean response of Y when x = 0, β1 is often called the linear effect coefficient, and β11 is called the quadratic effect coefficient.
One Independent Variable – Third Order The regression model: Yi = β0 + β1xi + β11xi² + β111xi³ + εi. The mean response is E{Y} = β0 + β1x + β11x² + β111x³.
Two Independent Variables – Second Order The regression model: Yi = β0 + β1xi1 + β2xi2 + β11xi1² + β22xi2² + β12xi1xi2 + εi. The mean response is the equation of a conic section. The coefficient β12 is often called the interaction effect coefficient.
NKNW Example p 330 Response variable is the life (in cycles) of a power cell. Explanatory variables are charge rate (3 levels) and temperature (3 levels). This is a designed experiment.
Check the data

Obs  cycles  chrate  temp
1    150     0.6     10
2    86      1.0     10
3    49      1.4     10
4    288     0.6     20
5    157     1.0     20
6    131     1.0     20
7    184     1.0     20
8    109     1.4     20
9    279     0.6     30
10   235     1.0     30
11   224     1.4     30
Create the new variables and run the regression

Create new variables:
chrate2 = chrate*chrate;
temp2 = temp*temp;
ct = chrate*temp;

Then regress cycles on chrate, temp, chrate2, temp2, and ct.
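An equivalent fit in numpy, using the eleven observations from the data slide (a sketch, not the course's SAS run). The total SS and error SS are invariant to centering or rescaling the predictors, so they can be checked against the output even though the individual coefficients depend on how the variables are coded:

```python
import numpy as np

cycles = np.array([150.0, 86, 49, 288, 157, 131, 184, 109, 279, 235, 224])
chrate = np.array([0.6, 1.0, 1.4, 0.6, 1.0, 1.0, 1.0, 1.4, 0.6, 1.0, 1.4])
temp = np.array([10.0, 10, 10, 20, 20, 20, 20, 20, 30, 30, 30])

# Second-order model: linear terms, squares, and the cross product.
X = np.column_stack([np.ones(11), chrate, temp,
                     chrate**2, temp**2, chrate * temp])
beta, *_ = np.linalg.lstsq(X, cycles, rcond=None)

resid = cycles - X @ beta
sse = resid @ resid                         # error SS, about 5240
ssto = ((cycles - cycles.mean())**2).sum()  # total SS, 60606
print(round(ssto), round(sse, 1))
```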
Output

a. Regression Coefficients

Var      b        s(b)    t      Pr>|t|
Int      162.84   16.61   9.81   <.0002
Chrate   -55.83   13.22   -4.22  <0.01
Temp     75.50    13.22   5.71   <0.005
Chrate2  27.39    20.34   1.35   0.2359
Temp2    -10.61   20.34   -0.52  0.6244
ct       11.50    16.19   0.71   0.5092
Output (2)

b. ANOVA Table

Source                df  SS     MS
Regression            5   55366  11073
  X1                  1   18704
  X2|X1               1   34201
  X1²|X1,X2           1   1646
  X2²|X1,X2,X1²       1   285
  X1X2|X1,X2,X1²,X2²  1   529
Error                 5   5240   1048
Total                 10  60606
Conclusion We have a multicollinearity problem. Let's look at the correlations (use proc corr). There are some very high correlations: r(chrate, chrate2) = 0.99103, r(temp, temp2) = 0.98609.
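These correlations can be reproduced from the data slide with numpy (a check, not the course's proc corr output):

```python
import numpy as np

chrate = np.array([0.6, 1.0, 1.4, 0.6, 1.0, 1.0, 1.0, 1.4, 0.6, 1.0, 1.4])
temp = np.array([10.0, 10, 10, 20, 20, 20, 20, 20, 30, 30, 30])

# Correlation of each predictor with its own square.
r_c = np.corrcoef(chrate, chrate**2)[0, 1]
r_t = np.corrcoef(temp, temp**2)[0, 1]
print(round(r_c, 5), round(r_t, 5))  # 0.99103 0.98609
```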
A remedy We can remove the correlation between explanatory variables and their squares: center (subtract the mean) before squaring. NKNW rescale by standardizing (subtract the mean and divide by the standard deviation).
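For the power-cell temperatures the remedy works exactly: because the design is balanced and symmetric about the mean, centering before squaring drives the correlation all the way to zero. A numpy sketch:

```python
import numpy as np

temp = np.array([10.0, 10, 10, 20, 20, 20, 20, 20, 30, 30, 30])

r_raw = np.corrcoef(temp, temp**2)[0, 1]   # about 0.986

tc = temp - temp.mean()                    # center first (subtract the mean) ...
r_centered = np.corrcoef(tc, tc**2)[0, 1]  # ... then square
print(r_raw, r_centered)
```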
Last slide Read NKNW 7.6 to 7.7 and the problems on pp 317-326. We used programs cs4.sas and NKNW302.sas to generate the output for today.
Last slide Read NKNW 8.5 and Chapter 9