
Chapter 17 Understanding Residuals © 2010 Pearson Education

Examining Residuals for Groups Consider the following study of the Sugar content vs. the Calorie content of breakfast cereals. The scatterplot shows no obvious departure from the linearity assumption.

© 2010 Pearson Education Examining Residuals for Groups The histogram of residuals looks fairly normal…

© 2010 Pearson Education Examining Residuals for Groups …but the distribution shows signs of being a composite of three groups of cereal types. The mean Calorie content may depend on some factor besides sugar content.

© 2010 Pearson Education Examining Residuals for Groups Examining the residuals within three groups of cereals (puffed cereals, which have high air content per serving; cereals with fruits and/or nuts, which have high fat/oil content per serving; and all others) suggests factors other than sugar content that may be important in determining Calorie content. Puffing: replacing cereal with “air” lowers the Calorie content, even for high-sugar cereals. Fat/oil: fats add to the Calorie content, even for low-sugar cereals.

© 2010 Pearson Education Examining Residuals for Groups Conclusion: It may be better to report three regressions, one for puffed cereals, one for high-fat cereals, and one for all others.
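As a rough illustration of reporting separate regressions, here is a minimal Python sketch. The DataFrame, the column names (sugar, calories, group), and the simulated calorie structure are all hypothetical assumptions, not the textbook's cereal data.

```python
# Sketch: fit one regression per cereal group instead of a single pooled model.
# The data and column names below are simulated/hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(17)
cereals = pd.DataFrame({
    "sugar": rng.uniform(0, 15, 60),
    "group": rng.choice(["puffed", "fruit_nut", "other"], 60),
})
# Hypothetical calorie structure: each group has its own baseline calorie level.
offsets = {"puffed": 80, "fruit_nut": 130, "other": 105}
cereals["calories"] = (cereals["group"].map(offsets)
                       + 2.5 * cereals["sugar"]
                       + rng.normal(0, 5, 60))

# Pooled fit versus one fit per group
pooled_slope, pooled_intercept = np.polyfit(cereals["sugar"], cereals["calories"], 1)
print(f"pooled   : calories = {pooled_intercept:.1f} + {pooled_slope:.2f} * sugar")
for name, grp in cereals.groupby("group"):
    slope, intercept = np.polyfit(grp["sugar"], grp["calories"], 1)
    print(f"{name:9s}: calories = {intercept:.1f} + {slope:.2f} * sugar")
```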

© 2010 Pearson Education Extrapolation and Prediction Extrapolating – predicting a y-value by extending the regression model to regions outside the range of the x-values of the data.

© 2010 Pearson Education Extrapolation and Prediction Why is extrapolation dangerous?  It introduces the questionable and untested assumption that the relationship between x and y does not change.

© 2010 Pearson Education Extrapolation and Prediction Cautionary Example: Oil Prices in Constant Dollars. Model prediction (extrapolation): on average, the price of a barrel of oil will increase by $7.39 per year from 1983 to 1998.

© 2010 Pearson Education Extrapolation and Prediction Cautionary Example: Oil Prices in Constant Dollars. Actual price behavior: extrapolating the model to the ’80s and ’90s led to grossly erroneous forecasts.

© 2010 Pearson Education Extrapolation and Prediction Remember: Linear models ought not to be trusted beyond the span of the x-values of the data. If you extrapolate far into the future, be prepared for the actual values to be (possibly quite) different from your predictions.
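A small sketch of one way to guard against this in code: flag any prediction whose x-value falls outside the range used to fit the model. The fitted years, slope, and intercept below are purely illustrative assumptions, not the oil-price data.

```python
# Sketch: flag predictions that would extrapolate beyond the observed x-range.
# x_obs, slope, and intercept are illustrative values only.
import numpy as np

x_obs = np.arange(1971, 1983)          # years (hypothetically) used to fit the model
slope, intercept = 7.39, -14_600.0     # illustrative coefficients

def predict(x_new):
    x_new = np.asarray(x_new, dtype=float)
    y_hat = intercept + slope * x_new
    outside = (x_new < x_obs.min()) | (x_new > x_obs.max())
    for x, y, flag in zip(x_new, y_hat, outside):
        note = "  <-- extrapolation: treat with caution" if flag else ""
        print(f"x = {x:.0f}: predicted y = {y:.2f}{note}")
    return y_hat

predict([1980, 1990, 1998])            # the last two predictions are extrapolations
```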

© 2010 Pearson Education Unusual and Extraordinary Observations In regression, an outlier can stand out in two ways. It can have… 1) a large residual:

© 2010 Pearson Education Unusual and Extraordinary Observations In regression, an outlier can stand out in two ways. It can have… 2) a large distance from x̄, the mean of the x-values: a “high-leverage point.” A high-leverage point is influential if omitting it gives a regression model with a very different slope.

© 2010 Pearson Education Unusual and Extraordinary Observations Tell whether the point is a high-leverage point, whether it has a large residual, and whether it is influential.  Not high-leverage  Large residual  Not very influential

© 2010 Pearson Education Unusual and Extraordinary Observations Tell whether the point is a high-leverage point, whether it has a large residual, and whether it is influential.  High-leverage  Small residual  Not very influential

© 2010 Pearson Education Unusual and Extraordinary Observations Tell whether the point is a high-leverage point, whether it has a large residual, and whether it is influential.  High-leverage  Medium residual  Very influential (omitting the red point will change the slope dramatically!)

© 2010 Pearson Education Unusual and Extraordinary Observations What should you do with a high-leverage point?  Sometimes, these points are important. They can indicate that the underlying relationship is in fact nonlinear.  Other times, they simply do not belong with the rest of the data and ought to be omitted. When in doubt, create and report two models: one with the outlier and one without.

© 2010 Pearson Education Unusual and Extraordinary Observations WARNING: Influential points do not necessarily have high residuals! So, use scatterplots rather than residual plots to identify high-leverage outliers. (Residual plots work well, of course, for identifying high-residual outliers.)
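A minimal sketch of checking leverage and influence numerically, using statsmodels' influence diagnostics (hat values, studentized residuals, Cook's distance). The data are simulated, and Cook's distance is an extra measure not mentioned on the slides.

```python
# Sketch: identify high-leverage and influential points with statsmodels.
# The data are simulated; in practice, use your own x and y.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(17)
x_core = rng.uniform(0, 10, 30)
y_core = 3 + 2 * x_core + rng.normal(0, 2, 30)
x = np.append(x_core, 25.0)   # far from the mean of x: high leverage
y = np.append(y_core, 20.0)   # and well off the line: likely influential

fit = sm.OLS(y, sm.add_constant(x)).fit()
infl = fit.get_influence()

leverage = infl.hat_matrix_diag            # distance from x-bar (leverage)
cooks_d, _ = infl.cooks_distance           # combines leverage and residual size
student_resid = infl.resid_studentized_internal

for i in np.argsort(cooks_d)[::-1][:3]:    # three most influential points
    print(f"point {i}: leverage={leverage[i]:.2f}, "
          f"studentized residual={student_resid[i]:.2f}, Cook's D={cooks_d[i]:.2f}")
```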

© 2010 Pearson Education Working with Summary Values Scatterplots of summarized (averaged) data tend to show less variability than the un-summarized data. Example: wind speeds at two locations, collected at 6 AM, noon, 6 PM, and midnight. The R² of the regression increases as the data are summarized, from the raw data to daily-averaged data to monthly-averaged data (R² = 0.942 for the monthly averages).

© 2010 Pearson Education Working with Summary Values WARNING: Be suspicious of conclusions based on regressions of summary data. Regressions based on summary data may look better than they really are! In particular, the strength of the correlation will be misleading.
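The effect is easy to demonstrate with simulated data: averaging within days strips out much of the within-day noise, so R² on the daily means looks far stronger than on the raw readings. Everything below (the data and the grouping) is a hypothetical illustration, not the wind-speed example.

```python
# Sketch: R^2 looks stronger on averaged (summary) data than on the raw data.
# Simulated readings only.
import numpy as np
import pandas as pd

rng = np.random.default_rng(17)
days, per_day = 30, 12
day = np.repeat(np.arange(days), per_day)
day_level = np.repeat(rng.normal(10, 3, days), per_day)   # day-to-day drift in x
x = day_level + rng.normal(0, 1, days * per_day)          # raw x readings
y = 2 + 0.8 * x + rng.normal(0, 4, days * per_day)        # noisy raw relationship

raw = pd.DataFrame({"day": day, "x": x, "y": y})
daily = raw.groupby("day")[["x", "y"]].mean()              # summary (averaged) data

r2_raw = np.corrcoef(raw["x"], raw["y"])[0, 1] ** 2
r2_daily = np.corrcoef(daily["x"], daily["y"])[0, 1] ** 2
print(f"R^2 on raw readings: {r2_raw:.3f}")
print(f"R^2 on daily means:  {r2_daily:.3f}")              # typically much larger
```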

© 2010 Pearson Education Autocorrelation Time-series data are sometimes autocorrelated, meaning points near each other in time will be related. First-order autocorrelation: adjacent measurements are related. Second-order autocorrelation: every other measurement is related. And so on. Autocorrelation violates the independence condition. Regression analysis of autocorrelated data can produce misleading results.

© 2010 Pearson Education Autocorrelation Autocorrelation can sometimes be detected by plotting residuals versus time, but don’t rely on plots alone to detect it. Rather, use the Durbin-Watson statistic.

© 2010 Pearson Education Autocorrelation Durbin-Watson Statistic – estimates the first-order autocorrelation: D = Σ (e_t − e_{t−1})² / Σ e_t², where the e_t are the residuals in time order. The value of D will always be between 0 and 4, inclusive. D = 0: perfect positive autocorrelation (e_t = e_{t−1} for all points). D = 2: no autocorrelation. D = 4: perfect negative autocorrelation (e_t = −e_{t−1} for all points).

© 2010 Pearson Education Autocorrelation Whether the calculated Durbin-Watson statistic D indicates significant autocorrelation depends on the sample size, n, and the number of predictors in the regression model, k. Table W of Appendix C provides critical values for the Durbin-Watson statistic (d_L and d_U) based on n and k.

© 2010 Pearson Education Autocorrelation Testing for positive first-order autocorrelation: if D < d_L, there is evidence of positive autocorrelation; if d_L < D < d_U, the test is inconclusive; if D > d_U, there is no evidence of positive autocorrelation. Testing for negative first-order autocorrelation: if D > 4 − d_L, there is evidence of negative autocorrelation; if 4 − d_U < D < 4 − d_L, the test is inconclusive; if D < 4 − d_U, there is no evidence of negative autocorrelation.
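A minimal sketch of computing D in practice with statsmodels (durbin_watson implements the formula above). The time series here is simulated with positively autocorrelated errors, so D should come out well below 2; the critical values d_L and d_U still have to be looked up in a table for your n and k.

```python
# Sketch: compute the Durbin-Watson statistic D for a regression on time-series data.
# D = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2; the data below are simulated.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(17)
t = np.arange(100)
e = np.zeros(100)
for i in range(1, 100):                 # AR(1) errors -> positive autocorrelation
    e[i] = 0.7 * e[i - 1] + rng.normal(0, 1)
y = 5 + 0.3 * t + e

fit = sm.OLS(y, sm.add_constant(t)).fit()
D = durbin_watson(fit.resid)
print(f"Durbin-Watson D = {D:.2f}  (near 2: no autocorrelation; below 2: positive)")
# Compare D with the tabled critical values d_L and d_U for n = 100, k = 1.
```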

© 2010 Pearson Education Autocorrelation Dealing with autocorrelation:  Time series methods (Chapter 20) attempt to deal with the problem by modeling the errors.  Or, look for a predictor variable (Chapter 19) that removes the dependence in the residuals.  A simple solution: sample from the time series to minimize first-order autocorrelation (sampling may do nothing to minimize higher-order autocorrelation, though).

© 2010 Pearson Education Linearity Some data show departures from linearity. Example: Auto Weight vs. Fuel Efficiency, where the linearity condition is not satisfied.

© 2010 Pearson Education Linearity In cases involving upward bends of negatively-correlated data, try analyzing −1/y (the negative reciprocal of y) vs. x instead. The linearity condition now appears satisfied.
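A small sketch of that re-expression, with simulated weights and fuel-efficiency values rather than the textbook data: fit −1/mpg against weight and compare, very roughly, how straight the relationship looks before and after.

```python
# Sketch: re-express fuel efficiency as -1/mpg before fitting.
# Simulated weights and mpg values -- not the textbook data set.
import numpy as np

rng = np.random.default_rng(17)
weight = rng.uniform(2.0, 4.5, 50)                              # weight, 1000s of pounds
mpg = 1.0 / (0.005 + 0.009 * weight) + rng.normal(0, 1.5, 50)   # curved relationship

neg_recip = -1.0 / mpg                                          # re-expressed response
slope, intercept = np.polyfit(weight, neg_recip, 1)
print(f"-1/mpg = {intercept:.4f} + {slope:.4f} * weight")

# |correlation| as a rough proxy for straightness, before vs. after re-expression
print(f"corr(weight, mpg)    = {np.corrcoef(weight, mpg)[0, 1]:.3f}")
print(f"corr(weight, -1/mpg) = {np.corrcoef(weight, neg_recip)[0, 1]:.3f}")
```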

© 2010 Pearson Education Transforming (Re-expressing) Data The auto weight vs. fuel economy example (17.6) illustrates the principle of transforming data. There is nothing sacred about the way x-values or y-values are measured. From the standpoint of measurement, all of the following may be equally reasonable: x vs. y; x vs. −1/y; x² vs. y; x vs. log(y). One or more of these transformations may be useful for making data more linear, more normal, etc.

© 2010 Pearson Education Transforming (Re-expressing) Data Goals of Re-expression Goal 1 Make the distribution of a variable more symmetric.

© 2010 Pearson Education Transforming (Re-expressing) Data Goals of Re-expression Goal 2 Make the spread of several groups more alike. We’ll see methods later in the book that can be applied only to groups with a common standard deviation.

© 2010 Pearson Education Transforming (Re-expressing) Data Goals of Re-expression Goal 3 Make the form of a scatterplot more nearly linear.

© 2010 Pearson Education Transforming (Re-expressing) Data Goals of Re-expression Goal 4 Make the scatter in a scatterplot or residual plot spread out evenly rather than following a fan shape.

© 2010 Pearson Education The Ladder of Powers Ladder of Powers – a collection of frequently useful re-expressions.
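A minimal sketch of walking the ladder on a response variable and scoring each re-expression by how straight it makes the plot, using |correlation| as a rough stand-in for straightness. The rungs listed (square, raw, square root, log, negative reciprocal square root, negative reciprocal) are the commonly cited ones, and the data are simulated, not from the textbook's table.

```python
# Sketch: try several rungs of the Ladder of Powers on y and compare straightness.
# Scoring by |correlation| is a simplification; the data below are simulated.
import numpy as np

rng = np.random.default_rng(17)
x = rng.uniform(1, 10, 80)
y = np.exp(0.4 * x) * rng.lognormal(0, 0.1, 80)     # exponential-ish growth in y

ladder = {
    "power 2    (y^2)":        lambda v: v ** 2,
    "power 1    (y)":          lambda v: v,
    "power 1/2  (sqrt y)":     np.sqrt,
    "power 0    (log y)":      np.log,
    "power -1/2 (-1/sqrt y)":  lambda v: -1 / np.sqrt(v),
    "power -1   (-1/y)":       lambda v: -1 / v,
}
for name, f in ladder.items():
    r = np.corrcoef(x, f(y))[0, 1]
    print(f"{name:25s} |r| = {abs(r):.3f}")
# Expect the log re-expression to score best for this exponential pattern.
```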

© 2010 Pearson Education The Ladder of Powers You want to model the relationship between prices for various items in Paris and Hong Kong. The scatterplot of Hong Kong prices vs. Paris prices shows a generally straight pattern with a small amount of scatter. What re-expression (if any) of the Hong Kong prices might you start with? No re-expression is needed to strengthen the linearity assumption. More information is needed to decide whether re-expression might strengthen the normality assumption or the equal-variance assumption.

© 2010 Pearson Education The Ladder of Powers You want to model the population growth of the United States over the past 200 years with a percentage growth that’s nearly constant. The scatterplot of population vs. year shows a strongly upward-curving pattern. What re-expression (if any) of the population values might you start with? Try a “Power 0” (logarithmic) re-expression of the population values. This should strengthen the linearity assumption.
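For instance, a quick sketch: with a log re-expression, roughly constant-percentage growth plots as a nearly straight line, and the fitted slope converts back to an average growth rate. The population figures below are approximate U.S. census totals (in millions), used only for illustration.

```python
# Sketch: a log ("power 0") re-expression turns constant-percentage growth into a line.
# Approximate U.S. census totals in millions, for illustration only.
import numpy as np

year = np.array([1800, 1850, 1900, 1950, 2000])
pop = np.array([5.3, 23.2, 76.0, 150.7, 281.4])

slope, intercept = np.polyfit(year, np.log(pop), 1)   # fit log(pop) vs. year
growth_rate = np.exp(slope) - 1                       # slope -> average % growth
print(f"log(pop) = {intercept:.2f} + {slope:.4f} * year")
print(f"implied average growth of about {growth_rate:.2%} per year")
```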

© 2010 Pearson Education What Can Go Wrong?  Make sure the relationship is straight enough to fit a regression model. Be alert for extreme residuals and what they have to say about the data.  Be on guard for data that are a composite of values from different groups. If you find data subsets that behave differently, consider fitting a different model to each group.  Beware of extrapolating. Be particularly wary of extrapolating far into the future.  Look for unusual points: points with large residuals and high-leverage points.

© 2010 Pearson Education What Can Go Wrong?  Beware of high-leverage points, especially those that are influential.  Consider setting aside outliers and re-running the regression.  Treat unusual points honestly. You must not eliminate points simply to “get a good fit”.  Be alert for autocorrelation. A Durbin-Watson test can be useful for revealing first-order autocorrelation.

© 2010 Pearson Education What Can Go Wrong?  Watch out when dealing with data that are summaries. These tend to inflate the impression of the strength of the correlation.  Re-express your data when necessary.

© 2010 Pearson Education What Have We Learned?  Watch out for more than one group hiding in your regression analysis.  The Linearity Condition says that the relationship should be reasonably linear to fit a regression. The satisfaction of this condition is best assessed after performing the regression and examining the residuals.  The Outlier Condition refers to two kinds of points: those with large residuals and those with high leverage. It’s a good idea to perform the regression analysis both with them and without them.