Anareg Week 10: Multicollinearity, Interesting Special Cases, Polynomial Regression


Multicollinearity
The numerical problem: the matrix X'X is close to singular and is therefore difficult to invert accurately.
The statistical problem: there is too much correlation among the explanatory variables, so it is difficult to determine the individual regression coefficients.
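
The numerical side of the problem can be seen directly. The sketch below (hypothetical simulated data, not from the course) builds a design matrix with two nearly identical predictors and compares the condition number of X'X against a design with unrelated predictors; a huge condition number means inversion is numerically unstable.

```python
import numpy as np

# Hypothetical data: x2 is almost a copy of x1, so X'X is nearly singular.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.001, size=n)   # almost a copy of x1
x3 = rng.normal(size=n)                     # unrelated predictor

X_collinear = np.column_stack([np.ones(n), x1, x2])
X_clean = np.column_stack([np.ones(n), x1, x3])

# Condition number of X'X: large values signal unstable inversion
cond_collinear = np.linalg.cond(X_collinear.T @ X_collinear)
cond_clean = np.linalg.cond(X_clean.T @ X_clean)
print(cond_collinear, cond_clean)   # the collinear design is vastly worse
```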

Multicollinearity (2)
Solve the statistical problem and the numerical problem will also be solved.
The statistical problem is more serious than the numerical problem.
We want to refine a model that has redundancy in the explanatory variables even if X'X can be inverted without difficulty.

Multicollinearity (3)
Extreme cases can help us understand the problem:
If all X's are uncorrelated, Type I SS and Type II SS will be the same; i.e., the contribution of each explanatory variable to the model is the same whether or not the other explanatory variables are in the model.
If a linear combination of the explanatory variables is constant (e.g. X1 = X2, so X1 - X2 = 0), then the Type II SS for the X's involved will be zero.

An example
Y = gpa; X1 = hsm, X3 = hss, X4 = hse, X5 = satm, X6 = satv, X7 = genderm.
Define: sat = satm + satv.
We will regress Y on sat, satm, and satv.

Output
Source            DF
Model              2
Error            221
Corrected Total  223

Something is wrong: dfM = 2, but there are 3 X's.

Output (2) NOTE: Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.

Output (3) NOTE: The following parameters have been set to 0, since the variables are a linear combination of other variables as shown. satv = sat - satm

Output (4) Par St Var DF Est Err t P Int sat B satm B satv

Extent of multicollinearity
Our CS example had one explanatory variable equal to a linear combination of other explanatory variables.
This is the most extreme case of multicollinearity and is detected by statistical software because (X'X) does not have an inverse.
We are concerned with less extreme cases.
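
This extreme case can be reproduced in a few lines. The sketch below uses hypothetical simulated scores (not the actual CS data) to show that when sat = satm + satv, the design matrix has deficient rank, which is why the software reports a non-full-rank model and sets one parameter to zero.

```python
import numpy as np

# Hypothetical stand-in data; only the exact linear dependence matters.
rng = np.random.default_rng(1)
n = 224                                  # corrected total df + 1 in the example
satm = rng.normal(600.0, 80.0, size=n)
satv = rng.normal(550.0, 80.0, size=n)
sat = satm + satv                        # exact linear combination

# Four columns but only three independent directions: (X'X) has no inverse
X = np.column_stack([np.ones(n), sat, satm, satv])
print(np.linalg.matrix_rank(X), "of", X.shape[1], "columns are independent")
```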

Effects of multicollinearity
Regression coefficients are not well estimated and may be meaningless; similarly for the standard errors of these estimates.
Type I SS and Type II SS will differ.
R2 and predicted values are usually ok.
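
The first effect can be demonstrated by simulation. This sketch (hypothetical data, not from the course) repeatedly refits the same model and compares the sampling spread of a coefficient under a collinear design versus an independent design; the collinear coefficient is far noisier even though both models fit the data equally well.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
x1 = rng.normal(size=n)
x2_col = x1 + rng.normal(scale=0.05, size=n)   # highly correlated with x1
x2_ind = rng.normal(size=n)                    # essentially uncorrelated

def b1_spread(x2, reps=300):
    """Sampling standard deviation of the x1 coefficient over repeated fits."""
    X = np.column_stack([np.ones(n), x1, x2])
    estimates = []
    for _ in range(reps):
        y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)   # same true model each time
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(beta[1])
    return np.std(estimates)

sd_col = b1_spread(x2_col)
sd_ind = b1_spread(x2_ind)
print(sd_col, sd_ind)   # collinear design: much larger spread in b1
```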

Two separate problems
Numerical accuracy: (X'X) is difficult to invert; we need good software.
Statistical problem: results are difficult to interpret; we need a better model.

Polynomial regression
We can do linear, quadratic, cubic, etc. models by defining squares, cubes, etc. in a data step and using these as predictors in a multiple regression.
We can do this with more than one explanatory variable.
When we do this we generally create a multicollinearity problem.

Polynomial Regression (2)
We can remove the correlation between explanatory variables and their squares:
Center (subtract the mean) before squaring.
NKNW rescale by standardizing (subtract the mean and divide by the standard deviation).
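
A quick check of why centering helps, using hypothetical simulated data: a predictor whose values sit far from zero is almost perfectly correlated with its own square, but after subtracting the mean the correlation nearly vanishes.

```python
import numpy as np

# Hypothetical predictor centered far from zero
rng = np.random.default_rng(2)
x = rng.normal(loc=10.0, scale=2.0, size=500)

r_raw = np.corrcoef(x, x**2)[0, 1]        # x and x^2: near-perfect correlation
xc = x - x.mean()                         # center before squaring
r_centered = np.corrcoef(xc, xc**2)[0, 1] # correlation largely removed
print(r_raw, r_centered)
```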

Interaction Models
With several explanatory variables, we need to consider the possibility that the effect of one variable depends on the value of another variable.
Special cases:
One independent variable – second order
One independent variable – third order
Two independent variables – second order

One Independent Variable – Second Order
The regression model: Y = β0 + β1 x + β11 x^2 + ε
The mean response is a parabola and is frequently called a quadratic response function. β0 represents the mean response of Y when x = 0; β1 is often called the linear effect coefficient, while β11 is called the quadratic effect coefficient.

One Independent Variable – Third Order
The regression model: Y = β0 + β1 x + β11 x^2 + β111 x^3 + ε
The mean response is E{Y} = β0 + β1 x + β11 x^2 + β111 x^3.

Two Independent Variables – Second Order
The regression model: Y = β0 + β1 x1 + β2 x2 + β11 x1^2 + β22 x2^2 + β12 x1 x2 + ε
The mean response is the equation of a conic section. The coefficient β12 is often called the interaction effect coefficient.

NKNW Example, p. 330
Response variable is the life (in cycles) of a power cell.
Explanatory variables are:
Charge rate (3 levels)
Temperature (3 levels)
This is a designed experiment.

Check the data
Obs   cycles   chrate   temp
(the data values were lost in the transcript)

Create the new variables and run the regression
Create new variables:
chrate2 = chrate*chrate;
temp2 = temp*temp;
ct = chrate*temp;
Then regress cycles on chrate, temp, chrate2, temp2, and ct.
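
The same steps can be sketched outside SAS. The code below (hypothetical stand-in data, not the NKNW power-cell data) creates the squared and cross-product terms and fits the full second-order model by least squares.

```python
import numpy as np

# Hypothetical data with roughly the structure of the example
rng = np.random.default_rng(3)
n = 60
chrate = rng.uniform(0.5, 1.5, size=n)
temp = rng.uniform(10.0, 30.0, size=n)
cycles = 400.0 - 100.0 * chrate + 5.0 * temp + rng.normal(scale=20.0, size=n)

# Create the new variables, as in the data step
chrate2 = chrate * chrate
temp2 = temp * temp
ct = chrate * temp

# Regress cycles on chrate, temp, chrate2, temp2, and ct
X = np.column_stack([np.ones(n), chrate, temp, chrate2, temp2, ct])
beta, *_ = np.linalg.lstsq(X, cycles, rcond=None)
print(beta)   # intercept plus five coefficients
```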

Output
a. Regression Coefficients
Variable    b   s(b)   t   Pr > |t|
Intercept                  <.0002
chrate                     <0.01
temp                       <0.005
chrate2
temp2
ct
(the numeric estimates were lost in the transcript; only some p-value bounds survive)

Output (2)
b. ANOVA Table
Source                             df   SS   MS
Regression
  X1
  X2 | X1
  X1^2 | X1, X2
  X2^2 | X1, X2, X1^2
  X1X2 | X1, X2, X1^2, X2^2
Error
Total
(the numeric values were lost in the transcript)
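
The sequential (Type I) extra sums of squares in this table can be computed by hand: each term's SS is the drop in error SS when that term enters the model, in order. A sketch with hypothetical data (not the power-cell data):

```python
import numpy as np

def sse(X, y):
    """Residual (error) sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

rng = np.random.default_rng(4)
n = 60
x1 = rng.uniform(0.5, 1.5, size=n)
x2 = rng.uniform(10.0, 30.0, size=n)
y = 400.0 - 100.0 * x1 + 5.0 * x2 + rng.normal(scale=20.0, size=n)

# Order of entry: X1, X2|X1, X1^2|..., X2^2|..., X1X2|...
order = [x1, x2, x1**2, x2**2, x1 * x2]
cols = [np.ones(n)]
sse_prev = sse(np.column_stack(cols), y)   # intercept-only model
sst = sse_prev                             # corrected total SS
seq_ss = []
for term in order:
    cols.append(term)
    sse_cur = sse(np.column_stack(cols), y)
    seq_ss.append(sse_prev - sse_cur)      # Type I SS for this term
    sse_prev = sse_cur

print(seq_ss)
print(sum(seq_ss), sst - sse_prev)   # sequential SS sum to the regression SS
```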

Conclusion
We have a multicollinearity problem.
Let's look at the correlations (use proc corr).
There are some very high correlations: r(chrate, chrate2) and r(temp, temp2) are both very high (the numeric values were lost in the transcript).

A remedy
We can remove the correlation between explanatory variables and their squares:
Center (subtract the mean) before squaring.
NKNW rescale by standardizing (subtract the mean and divide by the standard deviation).

Last slide
Read NKNW 7.6 to 7.7 and the problems on pp.
We used programs cs4.sas and NKNW302.sas to generate the output for today.

Last slide
Read NKNW 8.5 and Chapter 9.