Multicollinearity in Regression: Principal Components Analysis

Standing Heights and Physical Stature Attributes Among Female Police Officer Applicants

Source: S.Q. Lafi and J.B. Kaneene (1992). "An Explanation of the Use of Principal Components Analysis to Detect and Correct for Multicollinearity," Preventive Veterinary Medicine, Vol. 13, pp. 261-275.

Data Description

Subjects: 33 females applying for police officer positions
Dependent variable: Y ≡ Standing Height (cm)
Independent variables:
  X1 ≡ Sitting Height (cm)
  X2 ≡ Upper Arm Length (cm)
  X3 ≡ Forearm Length (cm)
  X4 ≡ Hand Length (cm)
  X5 ≡ Upper Leg Length (cm)
  X6 ≡ Lower Leg Length (cm)
  X7 ≡ Foot Length (inches)
  X8 ≡ BRACH (100·X3/X2)
  X9 ≡ TIBIO (100·X6/X5)

Data

Standardizing the Predictors
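The standardization formulas on this slide were not preserved in the transcript; the standard correlation transformation, which the later slides appear to rely on (it makes X*'X* equal the predictor correlation matrix R), is

\[
x_{ij}^{*} = \frac{x_{ij} - \bar{x}_j}{\sqrt{n-1}\, s_j},
\qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_{ij} - \bar{x}_j\right)^2}.
\]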

Correlation Matrix of Predictors and Inverse

Variance Inflation Factors (VIFs)

The VIF measures the extent to which a regression coefficient's variance is inflated by correlations among the predictors:

  VIFj = 1/(1 − Rj^2),

where Rj^2 is the coefficient of multiple determination when Xj is regressed on the remaining predictors. Values above 10 are often considered problematic. The VIFs can be obtained as the diagonal elements of R^{-1}. Not surprisingly, X2, X3, X5, X6, X8, and X9 are problems (recall that X8 and X9 are ratios constructed from X3/X2 and X6/X5).
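A minimal NumPy sketch (not from the slides) of this computation; the array X below is placeholder data standing in for the actual 33×9 measurement matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(33, 9))       # placeholder for the real measurements

R = np.corrcoef(X, rowvar=False)   # 9 x 9 predictor correlation matrix
vif = np.diag(np.linalg.inv(R))    # VIF_j = [R^{-1}]_jj = 1/(1 - R_j^2)
print(np.round(vif, 2))
```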

Regression of Y on [1|X*]

Note the surprising negative coefficients for X3*, X5*, and X9*: segment lengths such as the forearm (X3) and upper leg (X5) would be expected to contribute positively to standing height, so these sign reversals are a classic symptom of multicollinearity.

Principal Components Analysis

While the columns of X* are highly correlated, the columns of W = X*V are uncorrelated, where V is the matrix of eigenvectors of R. The eigenvalues λj give the variance carried by each principal component.
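A sketch of this step in NumPy (placeholder data again; the correlation transformation above makes X*'X* = R):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(33, 9))               # placeholder measurement matrix
n, p = X.shape

# Correlation transformation: X*'X* equals the correlation matrix R
Xstar = (X - X.mean(axis=0)) / (np.sqrt(n - 1) * X.std(axis=0, ddof=1))
R = Xstar.T @ Xstar

lam, V = np.linalg.eigh(R)                 # eigenvalues/eigenvectors of R
order = np.argsort(lam)[::-1]              # largest-variance component first
lam, V = lam[order], V[:, order]

W = Xstar @ V                              # uncorrelated component scores
print(np.round(W.T @ W, 8))                # diagonal matrix of the lam values
```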

Police Applicants Height Data - I

Police Applicants Height Data - II

Regression of Y on [1|W]

Note that W8 and W9 have very small eigenvalues and very small t-statistics. Their condition indices are 63.5 and 85.2, both well above the usual threshold of 30.
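For reference (not stated on the slide), the condition index of the j-th component is conventionally computed from the eigenvalues as

\[
\kappa_j = \sqrt{\lambda_{\max}/\lambda_j},
\]

so the near-zero eigenvalues of the last components are exactly what produce the large indices.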

Reduced Model

Remove the last two principal components, which have small eigenvalues, insignificant t-statistics, and high condition indices. Let V(g) be the p×g matrix of eigenvectors for the g retained principal components (here p = 9, g = 7), and let W(g) = X*V(g). Then regress Y on [1|W(g)] to obtain the reduced fit.
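A sketch of the reduced regression, continuing the NumPy example above (y is a placeholder for the 33 standing heights):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(33, 9))               # placeholder predictors
y = rng.normal(loc=170.0, size=33)         # placeholder standing heights (cm)
n, p = X.shape

Xstar = (X - X.mean(axis=0)) / (np.sqrt(n - 1) * X.std(axis=0, ddof=1))
lam, V = np.linalg.eigh(Xstar.T @ Xstar)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

g = 7                                      # retain the first 7 components
Vg = V[:, :g]                              # p x g matrix of eigenvectors
Wg = Xstar @ Vg                            # W(g) = X* V(g)
Z = np.column_stack([np.ones(n), Wg])      # design matrix [1 | W(g)]
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
gamma0, gamma = coef[0], coef[1:]          # intercept and component slopes

beta_star = Vg @ gamma                     # coefficients back on the X* scale
```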

Reduced Regression Fit

Transforming Back to the Transformed (X*) Scale
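A hedged reconstruction of the back-transformation (the slide's own equations were lost in the transcript): since W(g) = X*V(g), the fitted component model can be rewritten on the X* scale,

\[
\hat{Y} = \hat{\gamma}_0 + W^{(g)}\hat{\gamma}
        = \hat{\gamma}_0 + X^{*} V^{(g)}\hat{\gamma}
\quad\Longrightarrow\quad
\hat{\beta}^{*} = V^{(g)}\hat{\gamma},
\qquad
\widehat{\operatorname{Var}}\bigl(\hat{\beta}^{*}\bigr)
  = V^{(g)}\,\widehat{\operatorname{Var}}(\hat{\gamma})\,V^{(g)\prime},
\]

which is what the coefficient/SE comparison on the next slide uses.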

Comparison of Coefficients and SEs: Original Model vs. Principal Components

Predicted Values