Dr. C. Ertuna
Issues Regarding Regression Models (Lesson 06/C)


Collinearity
A perfect linear relationship between two (or more) independent variables is called collinearity (multicollinearity). Under this condition, the least-squares regression coefficients cannot be uniquely determined.

Collinearity
A strong but less-than-perfect linear relationship between the independent variables can cause:
1. Regression coefficients to be unstable,
2. Standard errors of the coefficients to become large; hence confidence intervals for the coefficients become wide and the coefficient estimates imprecise.

Collinearity Measurement
One of the measures of the impact of collinearity on the precision of the estimates is the Variance Inflation Factor (VIF): VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing the j-th predictor on the remaining predictors.
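As a concrete illustration, VIF can be computed directly from its definition. This is a minimal Python/NumPy sketch (the slides use Excel/SPSS; the function name `vif` is ours, not a library routine):

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of predictor matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from
    regressing column j on all the other predictors (with intercept).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # add intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # OLS fit
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1 - ss_res / ss_tot
        out.append(1.0 / (1.0 - r2))
    return out
```

A pair of nearly collinear columns produces large VIFs, while unrelated columns give VIFs near 1.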

Collinearity Detection
– Wrong signs for the coefficients,
– Drastic changes in the coefficients, in size and/or sign, as a new variable is added to the equation,
– High VIF values (a common rule of thumb flags VIF > 10).

Collinearity: Remedies
There is no quick fix for collinearity. Some strategies:
1. Variable selection for the model: based on the correlation matrix, some of the highly correlated variables could be excluded from the model,
2. Ridge regression instead of Ordinary Least Squares (OLS) regression.
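Ridge regression adds a penalty lam * ||b||^2 to the least-squares criterion, which keeps the coefficients finite even when the predictors are nearly collinear. A minimal Python/NumPy sketch of the closed-form solution (centered data, unpenalized intercept; the function name is illustrative):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge regression: minimize ||y - b0 - X b||^2 + lam * ||b||^2.

    Solved in closed form on centered data; the intercept b0 is
    not penalized. lam = 0 reduces to ordinary least squares.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    Xc = X - X.mean(axis=0)                 # center predictors
    yc = y - y.mean()                       # center response
    k = X.shape[1]
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(k), Xc.T @ yc)
    b0 = y.mean() - X.mean(axis=0) @ b      # recover intercept
    return b0, b
```

With lam = 0 this reproduces OLS on well-conditioned data; increasing lam shrinks the slope coefficients toward zero.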

Unusual Data
A single observation that is substantially different from all other observations can make a large difference in the results of a regression analysis. If a single observation (or a small group of observations) substantially changes your results, you would want to know about it and investigate further. There are three ways in which an observation can be unusual.

Unusual Data
Outliers: In linear regression, an outlier is an observation with a large residual; in other words, an observation whose dependent-variable value is unusual given its values on the predictor variables. An outlier may indicate a sample peculiarity, a data-entry error, or some other problem.

Unusual Data
Leverage: An observation with an extreme value on a predictor variable is called a high-leverage point. Leverage measures how far an independent-variable value deviates from its mean. High-leverage points can have an unusually large effect on the estimates of the regression coefficients.

Unusual Data
Influence: An observation is said to be influential if removing it substantially changes the coefficient estimates. Influence can be thought of as the product of leverage and outlierness.

Outliers and Influential Data
An outlier is an observation whose dependent-variable value is unusual given the value of the independent variable. Not all outliers have an important effect on the intercept and/or slope of the regression. For an outlier to be influential, it should be far from the mean of the independent variable.

Influential Data: Diagnosis
Cook's D: If Cook's distance for a particular observation is greater than a cutoff point, then that observation could be considered influential. One such cutoff is
– D_i > 4 / (n - k - 1),
– where k = number of independent variables and n = number of observations.
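The cutoff above can be applied to Cook's distances computed from an OLS fit. A Python/NumPy sketch using the standard hat-matrix formula (the function name is ours; an intercept is assumed):

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's D for each observation in an OLS fit with intercept.

    D_i = (e_i^2 / (p * s^2)) * h_i / (1 - h_i)^2, where
    p = k + 1 parameters, h_i = leverage (hat-matrix diagonal),
    e_i = residual, s^2 = residual variance.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]                         # k + 1 parameters
    H = A @ np.linalg.inv(A.T @ A) @ A.T   # hat matrix
    h = np.diag(H)                         # leverages
    e = y - H @ y                          # residuals
    s2 = (e @ e) / (n - p)                 # residual variance
    return e ** 2 / (p * s2) * h / (1 - h) ** 2
```

For example, perturbing the last point of an otherwise perfectly linear series makes its D the largest and pushes it past the 4/(n-k-1) cutoff.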

Influential Data Diagnostics in SPSS
Standardized DfBETA(s): the change in a regression coefficient that results from the deletion of the i-th case. A standardized DfBETA value is computed for each case, for each regression coefficient generated by the model.
Cut-off points:
– DfBETA > 0 means case i increases the slope,
– DfBETA < 0 means case i decreases the slope,
– |DfBETA(s)| > 2 is a strong indication of influence,
– |DfBETA(s)| > 2/sqrt(n) may indicate a problem.

Influential Data Diagnostics in SPSS
Leverage "h":
– max(h) <= 0.2: OK, no problem,
– 0.2 < max(h) <= 0.5: may be a problem,
– max(h) > 0.5: usually a problem of too much leverage for one case,
– h > 2k/n flags the top few percent of cases.

Influential Data Diagnostics in SPSS
Standardized DfFIT: the change in the predicted value when the i-th case is deleted.
Cut-off point: |DfFIT| > 2*sqrt(k/n) indicates a problem.

Influential Data: Remedies
Unusual data need to be investigated; for example, they may stem from a data-entry error. The model could be re-specified, or robust estimation methods could be used. An influential observation should be discarded only if it is truly bad data that cannot be corrected.

Checking the Assumptions
Several assumptions need to be met to accept the results of a regression analysis and use the model for future decision making:
– Linearity,
– Independence of errors (no autocorrelation),
– Normality of errors,
– Constant variance of errors (homoscedasticity).

Tests for Linearity
Linearity: plot the dependent variable against each of the independent variables separately, and decide whether linear regression is a "reasonable" description of the tendency in the data.
– Consider curvilinear patterns,
– Consider undue influence of one data point on the regression line, etc.

Nonlinear Relationships
[Figure: diminishing-returns relationship of Advertising (x-axis) versus Sales (y-axis)]

Analysis of Residuals
[Figure: residual plots showing (a) a nonlinear pattern and (b) a linear pattern]

Tests for Independence
Independence of errors: plot residuals against time (residual-time plot),
– residuals on the y-axis, time on the x-axis,
– if the residuals group alternately into positive and negative clusters, that indicates autocorrelation.
Ljung-Box test (note that only the one-lag version is applied here).

Residuals-Time Plot
Notice the tendency of the residuals to group alternately into positive and negative clusters. That is an indication that the residuals are not independent but autocorrelated.

Analysis of Residuals
[Figure: residual-time plots showing (a) independent residuals and (b) residuals that are not independent]

Ljung-Box Test
Compute the LB test statistic for one lag, Q(1):
– Q(1) = (n(n+2)/(n-1)) * CORREL(Data_Range_1, Data_Range_2)^2, where Data_Range_2 is Data_Range_1 shifted by one period.
Compare Q(1) against the chi-square critical value:
– CHIINV(alpha, 1) (the chi-square test is one-tailed).
H0: no autocorrelation; fail to reject H0 when Q(1) < the chi-square critical value.
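The one-lag statistic can also be checked outside Excel. A Python sketch using the standard lag-1 autocorrelation estimator (which differs slightly from CORREL of the two shifted ranges in small samples; the function name is ours):

```python
import numpy as np

def ljung_box_q1(resid):
    """Ljung-Box statistic for lag 1: Q(1) = n(n+2) * r1^2 / (n-1).

    Under H0 (no autocorrelation) Q(1) is approximately chi-square
    with 1 degree of freedom, so at the 5% level reject H0 when
    Q(1) > 3.841.
    """
    e = np.asarray(resid, float)
    n = len(e)
    d = e - e.mean()
    r1 = (d[:-1] @ d[1:]) / (d @ d)   # lag-1 autocorrelation
    return n * (n + 2) * r1 ** 2 / (n - 1)
```

A strongly alternating residual series (a classic sign of negative autocorrelation) yields a large Q(1), while a series with zero lag-1 autocorrelation yields Q(1) near zero.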

Non-Independence: Remedies
EGLS (Estimated Generalized Least Squares) methods:
– Prais-Winsten,
– Cochrane-Orcutt.
(Note that these are effective only for first-order autocorrelation.)
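For a single predictor, the Cochrane-Orcutt idea can be sketched as: estimate rho from the OLS residuals, quasi-difference the data, refit, and iterate. This is an illustrative Python implementation under those assumptions, not the SPSS routine:

```python
import numpy as np

def cochrane_orcutt(x, y, iters=10):
    """One-predictor Cochrane-Orcutt for AR(1) errors.

    Repeatedly: (1) estimate rho from lag-1 regression of the
    residuals, (2) quasi-difference x and y, (3) refit by OLS.
    Effective only for first-order autocorrelation.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)

    def ols(xv, yv):
        A = np.column_stack([np.ones(len(xv)), xv])
        return np.linalg.lstsq(A, yv, rcond=None)[0]

    b0, b1 = ols(x, y)
    rho = 0.0
    for _ in range(iters):
        e = y - b0 - b1 * x
        rho = (e[:-1] @ e[1:]) / (e[:-1] @ e[:-1])  # AR(1) estimate
        ys = y[1:] - rho * y[:-1]                   # quasi-difference
        xs = x[1:] - rho * x[:-1]
        a0, b1 = ols(xs, ys)
        b0 = a0 / (1 - rho)                         # original-scale intercept
    return b0, b1, rho
```

On data with a true slope of 2 and deterministic AR(1) noise, the procedure recovers the slope and a rho with the correct sign.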

Tests for Normality
Normality of errors:
– Normal-quantile plot of the residuals (errors),
– Compute skewness,
– Compute kurtosis,
– Jarque-Bera test.

Normal-Quantile Plot of Residuals
– Sort the residuals (min => max),
– Create a rank column,
– Compute z-scores: =NORMINV((rank - 0.5)/N, 0, 1),
– Plot z-scores (x) against residuals (y).
For normality, the plot should be reasonably linear.
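The same recipe in Python, using the standard library's inverse normal CDF in place of NORMINV (the function name is ours):

```python
from statistics import NormalDist

def normal_quantile_points(residuals):
    """(z-score, residual) pairs for a normal-quantile plot.

    The z-score for rank i uses the plotting position
    (i - 0.5) / N, matching the Excel recipe
    NORMINV((rank - 0.5)/N, 0, 1). Plot z on x, residual on y;
    an approximately straight line suggests normal errors.
    """
    e = sorted(residuals)
    n = len(e)
    nd = NormalDist()  # standard normal: mean 0, sd 1
    return [(nd.inv_cdf((i + 0.5) / n), r) for i, r in enumerate(e)]
```

For an odd number of residuals, the middle point lands at z = 0, and the z-scores increase monotonically with the sorted residuals.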

Jarque-Bera Test (in Excel)
Compute the JB test statistic:
– JB = (n/6) * SKEW(Data_Range)^2 + (n/24) * KURT(Data_Range)^2.
Compute the p-value with the formula:
– CHIDIST(JB, 2).
H0: the data are normally distributed.
Note that JB is very sensitive to sample size, and its p-values are not uniformly distributed; hence there is a danger of committing a Type I error.
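A Python version of the JB statistic, using the simple moment estimators of skewness and excess kurtosis (Excel's SKEW and KURT apply small-sample corrections, so values differ slightly):

```python
def jarque_bera(x):
    """Jarque-Bera statistic: JB = (n/6)*S^2 + (n/24)*K^2,
    where S is skewness and K is excess kurtosis.

    Uses the plain moment estimators; compare JB against a
    chi-square distribution with 2 degrees of freedom.
    """
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n   # variance (population)
    m3 = sum((v - m) ** 3 for v in x) / n   # third central moment
    m4 = sum((v - m) ** 4 for v in x) / n   # fourth central moment
    s = m3 / m2 ** 1.5                      # skewness
    k = m4 / m2 ** 2 - 3                    # excess kurtosis
    return (n / 6) * s ** 2 + (n / 24) * k ** 2
```

For the symmetric sample [1, 2, 3, 4, 5], skewness is 0 and excess kurtosis is -1.3, giving JB = (5/24) * 1.69 ≈ 0.352.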

Non-Normality: Remedies
One of the most frequently used techniques is data transformation. X and/or Y values can be transformed by raising the variables to a power:
y (or x) => y^p (or x^p), where p = -2, -1, -1/2, 1/2, 2, 3.

Tests for Constant Variance
Constant variance of errors:
– Plot residuals against the y-estimates: residuals on the y-axis, estimated y-values on the x-axis. When the errors get larger (or smaller) as the y-values increase, that indicates non-constant variance.
– Plot residuals against each x: residuals on the y-axis, x-values on the x-axis.

Analysis of Residuals
[Figure: residual plots against x1 showing (a) variance decreasing as x increases, (b) variance increasing as x increases, and (c) constant variance]

Non-Constant Variance: Remedies
– Transform the dependent variable (y): y => y^p, where p = -2, -1, -1/2, 1/2, 2, 3,
– Weighted Least Squares (WLS) regression.
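Weighted least squares down-weights the noisier observations, typically with weights proportional to 1/variance of each error. A minimal Python/NumPy sketch of the weighted normal equations (function name is illustrative):

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares with intercept.

    Minimizes sum_i w_i * (y_i - b0 - x_i'b)^2 by solving the
    weighted normal equations (A'WA) beta = A'Wy. Weights w_i are
    typically 1 / Var(error_i).
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    w = np.asarray(w, float)
    A = np.column_stack([np.ones(len(y)), X])  # add intercept column
    W = np.diag(w)                             # weight matrix
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
```

On data that fit a line exactly, any positive weights recover the same coefficients as OLS, which is a quick sanity check of the implementation.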

Next Lesson (Lesson 07/A)
Qualitative & Judgmental Forecasting Methods