Part 19: Residuals and Outliers 19-1/27 Statistics and Data Analysis Professor William Greene Stern School of Business IOMS Department Department of Economics.

Statistics and Data Analysis
Part 19: Residuals, Outliers and Elasticities

Linear Regression Models
- Analyzing residuals
  - Violations of assumptions
  - Unusual data points
  - Hints for improving the model
- Model building
  - Linear models: cost functions
  - Semilog models: growth models
  - Logs and elasticities

Using the Residuals
- How do you know whether the model is “good”?
- The first place to look is at the residuals, e = y - (a + bx): the differences between the observed and the fitted values.
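A minimal sketch, in standard-library Python, of what "the residuals" are: fit the least squares line y = a + bx, then compute e_i = y_i - (a + b x_i). The data and the helper names `fit_line` and `residuals` are hypothetical, chosen only for illustration.

```python
def fit_line(x, y):
    """Least squares intercept a and slope b for y = a + b*x + e."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def residuals(x, y):
    """Residuals e_i = y_i - (a + b*x_i) from the least squares line."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical toy data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
e = residuals(x, y)
print([round(v, 2) for v in e])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(e)) < 1e-9)        # True: with an intercept, residuals sum to zero
```

Because least squares residuals always sum to zero when the model has an intercept, it is their pattern (curvature, unusually large values), not their average, that signals a problem.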

Residuals Can Signal a Flawed Model
- Standard application: a cost function for the output of a production process
- Compare a linear equation to a quadratic model (in logs)
- Data: 123 American electric utilities

Electricity (log) Cost Function

Candidate Model for Cost
$\log c = a + b \log q + e$

A Better Model?
$\log \text{Cost} = \alpha + \beta_1 \log \text{Output} + \beta_2 (\log \text{Output})^2 + \varepsilon$

Candidate Models for Cost
The quadratic equation is the appropriate model:
$\log c = a + b_1 \log q + b_2 (\log q)^2 + e$

Missing Variable Included
[Figure panels: residuals from the quadratic cost model; residuals from the linear cost model]
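To see how residuals reveal a missing variable, the sketch below generates hypothetical data that follow an exactly quadratic relationship in logs (not the 123-utility data, which are not reproduced here) and fits both candidate models. `lstsq` is a hypothetical helper that solves the normal equations by Gaussian elimination; that is adequate for a tiny example, but a QR-based solver is numerically safer for real data.

```python
def lstsq(columns, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y.

    `columns` is a list of regressor columns; include a column of ones for
    the intercept. Solved by Gaussian elimination with partial pivoting.
    """
    k = len(columns)
    # Augmented matrix [X'X | X'y].
    m = [
        [sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(u * v for u, v in zip(columns[i], y))]
        for i in range(k)
    ]
    for p in range(k):
        # Swap in the row with the largest pivot, then eliminate below it.
        pivot = max(range(p, k), key=lambda r: abs(m[r][p]))
        m[p], m[pivot] = m[pivot], m[p]
        for r in range(p + 1, k):
            f = m[r][p] / m[p][p]
            for c in range(p, k + 1):
                m[r][c] -= f * m[p][c]
    # Back substitution.
    coef = [0.0] * k
    for p in reversed(range(k)):
        tail = sum(m[p][c] * coef[c] for c in range(p + 1, k))
        coef[p] = (m[p][k] - tail) / m[p][p]
    return coef


def residuals(columns, y):
    """Observed minus fitted values for the least squares fit of y on columns."""
    coef = lstsq(columns, y)
    return [yi - sum(b * col[i] for b, col in zip(coef, columns)) for i, yi in enumerate(y)]


# Hypothetical data: log cost is exactly quadratic in log output.
logq = [-2.0, -1.0, 0.0, 1.0, 2.0]
logc = [1.0 + 0.5 * u + 0.3 * u ** 2 for u in logq]
ones = [1.0] * len(logq)
logq_sq = [u ** 2 for u in logq]

e_linear = residuals([ones, logq], logc)
e_quadratic = residuals([ones, logq, logq_sq], logc)
print([round(v, 2) for v in e_linear])          # [0.6, -0.3, -0.6, -0.3, 0.6]
print(max(abs(v) for v in e_quadratic) < 1e-9)  # True
```

The linear model's residuals form a U shape: here they equal 0.3[(log q)^2 - 2], so they track the omitted squared term exactly. Including that term removes the pattern.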

Unusual Data Points
Outliers have (what appear to be) very large disturbances, ε.
[Figure: the 500 most successful movies]

Outliers
Remember the empirical rule: for roughly normal data, about 99.7% of observations lie within the mean ± 3 standard deviations. (The figure shows the bands (a + bx) ± 3s_e.)
- Titanic is 8.1 standard deviations from the regression line!
- Only 0.86% of the 466 observations lie outside the bands. (We will refine this later.)
- These points might deserve a closer look.
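A sketch of the ±3s_e check, assuming s_e = sqrt(SSE/(n - 2)) is the standard error of the regression. It also uses `math.erfc` to compute the share of a normal distribution lying beyond three standard deviations. The data are hypothetical: an exact line with one contaminated point, and `standardized_residuals` is an illustrative helper.

```python
import math


def fit_line(x, y):
    """Least squares intercept a and slope b for y = a + b*x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def standardized_residuals(x, y):
    """e_i / s_e, with s_e = sqrt(SSE / (n - 2))."""
    a, b = fit_line(x, y)
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_e = math.sqrt(sum(v * v for v in e) / (len(x) - 2))
    return [v / s_e for v in e]


# Share of a normal distribution beyond mean +/- 3 standard deviations.
tail = math.erfc(3 / math.sqrt(2))
print(f"{100 * tail:.2f}% outside, {100 * (1 - tail):.2f}% inside")  # 0.27% outside, 99.73% inside

# Hypothetical data: the exact line y = 1 + 2x, with the point at x = 0 raised by 10.
x = list(range(-10, 11))
y = [1 + 2 * xi for xi in x]
y[10] += 10
z = standardized_residuals(x, y)
outside = [i for i, zi in enumerate(z) if abs(zi) > 3]
print(outside, round(z[10], 2))  # [10] 4.25
```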

$\log \text{Price} = a + b \log \text{Area} + e$
[Figure: prices paid at auction for Monet paintings vs. surface area (in logs)]
- Not an outlier: Monet chose to paint a small painting.
- Possibly an outlier: why was the price so low?

What to Do About Outliers
(1) Examine the data.
(2) Are they due to measurement error or obvious “coding errors”? If so, delete the observations.
(3) Are they just unusual observations? If so, do nothing.
(4) Generally, resist the temptation to remove outliers, especially if the sample is large. (500 movies is a large sample.)
(5) Question why you think it is an outlier. Is it really?

Regression Options

Minitab’s Opinions
Minitab uses ±2s as the threshold for flagging “large” residuals.
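Standardized residuals are often adjusted for leverage: r_i = e_i / (s sqrt(1 - h_i)), where in simple regression h_i = 1/n + (x_i - x̄)^2/Sxx. We assume here that a "large residual" flag works on this leverage-adjusted scale; Minitab's own documentation should be checked for its exact definition. The hypothetical sketch shows why the adjustment matters: a point far from x̄ pulls the fitted line toward itself, so its raw residual understates how unusual it is.

```python
import math


def standardized_with_leverage(x, y):
    """Return (e_i / s, e_i / (s * sqrt(1 - h_i))) for y = a + b*x + e.

    h_i = 1/n + (x_i - xbar)**2 / Sxx is the leverage of observation i.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(v * v for v in e) / (n - 2))
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    raw = [v / s for v in e]
    r = [v / (s * math.sqrt(1 - hi)) for v, hi in zip(e, h)]
    return raw, r


# Hypothetical data: the exact line y = 1 + 2x, with the last
# (highest-leverage) point raised by 3.
x = [0, 1, 2, 3, 4, 5, 6]
y = [1 + 2 * xi for xi in x]
y[6] += 3
raw, r = standardized_with_leverage(x, y)
print([i for i, v in enumerate(raw) if abs(v) > 2])  # []: the raw +/-2s rule misses it
print([i for i, v in enumerate(r) if abs(v) > 2])    # [6]: the leverage-adjusted rule flags it
```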

On Removing Outliers
- Be careful about singling out particular observations this way. The resulting model might reflect your opinions rather than the real relationship in the data.
- Removing outliers might create new outliers that were not outliers before.
- Statistical inferences from the model will be incorrect.
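Why can removing an outlier create a new one? Dropping a large residual shrinks s_e, so the ±2s_e bands tighten around the remaining points. The hypothetical sketch below applies a ±2s_e rule (an assumed threshold, matching the one mentioned for Minitab), removes the flagged point once, refits, and applies the rule again. `flag_large` is an illustrative helper.

```python
import math


def fit_line(x, y):
    """Least squares intercept a and slope b for y = a + b*x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def flag_large(x, y, k=2.0):
    """Indices i with |e_i| > k * s_e, where s_e = sqrt(SSE / (n - 2))."""
    a, b = fit_line(x, y)
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_e = math.sqrt(sum(v * v for v in e) / (len(x) - 2))
    return [i for i, v in enumerate(e) if abs(v) > k * s_e]


# Hypothetical data: the exact line y = 1 + 2x with two contaminated points.
x = [-4, -3, -2, -1, 0, 0, 1, 2, 3, 4]
y = [1 + 2 * xi for xi in x]
y[4] += 20  # a gross outlier
y[5] += 4   # a mild deviation, hidden while s_e is inflated by the outlier

first = flag_large(x, y)
print(first)  # [4]

# Drop the flagged point, refit, and map new indices back to the original ones.
keep = [i for i in range(len(x)) if i not in first]
second = [keep[i] for i in flag_large([x[i] for i in keep], [y[i] for i in keep])]
print(second)  # [5]: a point that was not flagged before
```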

Using and Interpreting the Model
- Interpreting the linear model
- Semilog and growth models
- Log-log model and elasticities

Statistical Cost Analysis
Generation cost ($M) and output (millions of kWh) for 123 American electric utilities (1970).
The units of the left-hand and right-hand sides must be the same:
$\text{Cost} = a + b\,\text{MKWH}$
- Y = Cost is measured in $M.
- a is measured in $M.
- b is measured in $M/MKWH, so b × MKWH is in $M.
So:
- a = fixed cost = total cost if MKWH = 0
- b = marginal cost = dCost/dMKWH
- b × MKWH = variable cost
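A small sketch of the unit bookkeeping in Cost = a + b MKWH. The coefficients are hypothetical values chosen only to show the units, not estimates from the 123-utility data, and `cost_breakdown` is an illustrative helper.

```python
def cost_breakdown(a, b, mkwh):
    """Split predicted cost ($M) from Cost = a + b * MKWH into its parts.

    a: fixed cost in $M (predicted cost when MKWH = 0)
    b: marginal cost in $M per MKWH (dCost/dMKWH)
    mkwh: output in millions of kWh
    """
    variable = b * mkwh          # $M/MKWH * MKWH = $M
    total = a + variable         # $M
    return {"fixed": a, "variable": variable, "total": total, "average": total / mkwh}


# Hypothetical coefficients: a = $2M fixed cost, b = $0.005M per million kWh.
parts = cost_breakdown(a=2.0, b=0.005, mkwh=10_000)
for name, value in parts.items():
    print(f"{name:>8}: {value:.4f}")
```

Because a > 0, average cost (a/MKWH + b) exceeds marginal cost b at every output level in this linear model.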

Semilog Models and Growth Rates
$\log \text{Salary} = \alpha + \beta\,\text{Years} + \varepsilon$

Growth Rate
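In a semilog model log Y = α + βx, β = d log Y/dx is the proportional change in Y per unit of x, so 100β is approximately the percentage growth rate; the exact change per unit is e^β - 1. A sketch comparing the two for a few hypothetical values of β (`growth_rate` is an illustrative helper):

```python
import math


def growth_rate(beta):
    """Exact proportional change in Y per unit of x implied by log Y = alpha + beta*x."""
    return math.exp(beta) - 1.0


for beta in (0.02, 0.10, 0.50):
    approx = 100 * beta
    exact = 100 * growth_rate(beta)
    print(f"beta = {beta:.2f}: approx {approx:.1f}%, exact {exact:.2f}%")
```

The approximation 100β is close for small β and drifts further from the exact value as β grows.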

Semilog Model for Fuel Bills

Using Semilog Models for Trends
Frequent Flyer Flights for 72 Months (Text, Ex. 11.1, p. 508)

Regression Approach
$\log \text{Flights} = \alpha + \beta\,\text{Months} + \varepsilon$
a = 2.770, b = , s =
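A sketch of the trend regression log Flights = α + β Months + ε on a hypothetical 72-month series. The textbook's frequent-flyer data are not reproduced here, so the generating parameters below (intercept 3.0, slope 0.01, alternating ±5% noise) are assumptions, and `fit_line` is an illustrative least squares helper.

```python
import math


def fit_line(x, y):
    """Least squares intercept a and slope b for y = a + b*x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


# Hypothetical monthly series: about 1% growth per month with +/-5% alternating noise.
months = list(range(1, 73))
flights = [math.exp(3.0 + 0.01 * t + 0.05 * (-1) ** t) for t in months]

a, b = fit_line(months, [math.log(f) for f in flights])
monthly_growth = math.exp(b) - 1.0   # implied growth per month
forecast_73 = math.exp(a + b * 73)   # naive point forecast for month 73
print(f"a = {a:.3f}, b = {b:.4f}, growth = {100 * monthly_growth:.2f}% per month")
print(f"forecast for month 73: {forecast_73:.1f}")
```

exp(a + b × 73) is a naive forecast: with normal errors it estimates the median of Flights rather than the mean, so it tends to understate the mean.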

Elasticity and Loglinear Models
- $\log Y = \alpha + \beta \log x + \varepsilon$
- Elasticity measures the “responsiveness” of one variable to changes in another.
- E.g., in economics, demand elasticity = (%ΔQ)/(%ΔP).
- Math: a ratio of percentage changes,
  $\%\Delta Q / \%\Delta P = \dfrac{100\%\,(\Delta Q/Q)}{100\%\,(\Delta P/P)}$
  The units of measurement and the 100% cancel out of this equation, so
  $\text{Elasticity} = (\Delta Q/\Delta P)(P/Q)$
- Elasticities are unit-free.
- In the log-log model, β = d log Y / d log x is the elasticity of Y with respect to x.
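In the log-log model the elasticity is the same at every point. A sketch that checks this numerically for a hypothetical constant-elasticity demand curve Q = exp(α) P^β (α = 4.0 and β = -1.5 are assumed values), computing (dQ/dP)(P/Q) with a central finite difference, then re-measuring price in cents to show that the elasticity is unit-free. `point_elasticity`, `demand`, and `demand_cents` are illustrative names.

```python
import math


def point_elasticity(f, p, rel_step=1e-6):
    """(dQ/dP) * (P/Q) at P = p, using a central finite difference for dQ/dP."""
    h = rel_step * p
    dq_dp = (f(p + h) - f(p - h)) / (2 * h)
    return dq_dp * p / f(p)


# Hypothetical demand curve: log Q = alpha + beta * log P.
alpha, beta = 4.0, -1.5


def demand(price_dollars):
    return math.exp(alpha) * price_dollars ** beta


def demand_cents(price_cents):
    # The same demand curve, with price measured in cents instead of dollars.
    return demand(price_cents / 100.0)


for p in (1.0, 5.0, 20.0):
    print(f"P = ${p:.2f}: elasticity = {point_elasticity(demand, p):.4f}")
print(f"P = 500 cents: elasticity = {point_elasticity(demand_cents, 500.0):.4f}")
```

Every line reports the same elasticity, β, regardless of the price level or the units in which price is measured.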

Monet Regression

Summary
- Residual analysis
  - Consistent with model assumptions?
  - Suggest missing elements in the model
- Building the regression model
  - Interpreting the model: cost function
  - Growth model: semilog
  - Double log and estimating elasticities