Statistical Methods Lecture 28

Extrapolation and Prediction
Extrapolating: predicting a y value by extending the regression model to regions outside the range of the x-values of the data.

Extrapolation and Prediction
Why is extrapolation dangerous? It introduces the questionable and untested assumption that the relationship between x and y does not change.

Extrapolation and Prediction
Cautionary Example: Oil Prices in Constant Dollars
Model: Price = -0.85 + 7.39 × Time
Model prediction (extrapolation): on average, the price of a barrel of oil will increase by $7.39 per year from 1983 to 1998.

Extrapolation and Prediction
Cautionary Example: Oil Prices in Constant Dollars (continued)
Actual price behavior: extrapolating the 1971-1982 model, Price = -0.85 + 7.39 × Time, to the '80s and '90s led to grossly erroneous forecasts.

Extrapolation and Prediction
Remember: linear models ought not be trusted beyond the span of the x-values of the data. If you extrapolate far into the future, be prepared for the actual values to be (possibly quite) different from your predictions.
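To make the "span of the data" advice concrete, here is a minimal Python sketch (not from the slides; the oil-price-style numbers are made up) that fits a least-squares line and flags any prediction whose x-value falls outside the observed range:

```python
import numpy as np

# Hypothetical oil-price-style data: Time (years since 1970) and Price (constant dollars).
time = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], dtype=float)
price = np.array([3.1, 3.6, 3.9, 9.3, 10.1, 11.2, 12.6, 12.8, 25.1, 32.5, 34.0, 31.8])

slope, intercept = np.polyfit(time, price, 1)  # least-squares line

def predict(t):
    """Predict price at time t, warning when t lies outside the observed x-range."""
    if t < time.min() or t > time.max():
        print(f"Warning: t = {t} is outside [{time.min()}, {time.max()}]; this is extrapolation.")
    return intercept + slope * t

print(predict(6))    # within the span of the data
print(predict(25))   # extrapolation: the assumption of a constant slope is untested here
```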

Unusual and Extraordinary Observations
Outliers, Leverage, and Influence
In regression, an outlier can stand out in two ways. It can have…
1) a large residual.

Unusual and Extraordinary Observations
Outliers, Leverage, and Influence
2) a large distance from x̄, the mean of the x-values: a "high-leverage point." A high-leverage point is influential if omitting it gives a regression model with a very different slope.
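The slides judge leverage visually; as an illustrative aside (not part of the original deck), leverage can also be computed directly. A short Python sketch with made-up data, using the simple-regression formula for the leverage of each point:

```python
import numpy as np

# Hypothetical data with one point far from the mean of x (a potential high-leverage point).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 15.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 7.0])

# Leverage for simple regression: h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)²
n = len(x)
leverage = 1 / n + (x - x.mean()) ** 2 / ((x - x.mean()) ** 2).sum()

# A common rule of thumb flags points with leverage above 2 * (number of coefficients) / n.
threshold = 2 * 2 / n
for xi, hi in zip(x, leverage):
    flag = "  <-- high leverage" if hi > threshold else ""
    print(f"x = {xi:5.1f}  leverage = {hi:.3f}{flag}")
```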

Unusual and Extraordinary Observations
Outliers, Leverage, and Influence
Tell whether the point is a high-leverage point, whether it has a large residual, and whether it is influential: not high-leverage; large residual; not very influential.

Unusual and Extraordinary Observations
Outliers, Leverage, and Influence
Tell whether the point is a high-leverage point, whether it has a large residual, and whether it is influential: high-leverage; small residual; not very influential.

Unusual and Extraordinary Observations
Outliers, Leverage, and Influence
Tell whether the point is a high-leverage point, whether it has a large residual, and whether it is influential: high-leverage; medium (arguably large) residual; very influential (omitting the red point changes the slope dramatically!).
Note: many students will answer "large" for the residual, because they judge it against a line fit without that point included.

Unusual and Extraordinary Observations
Outliers, Leverage, and Influence
What should you do with a high-leverage point? Sometimes these points are important: they can indicate that the underlying relationship is in fact nonlinear. Other times they simply do not belong with the rest of the data and ought to be omitted. When in doubt, create and report two models: one with the outlier and one without.
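A minimal sketch of the "report two models" advice in Python (the numbers are made up for illustration; the last point plays the role of the suspect observation):

```python
import numpy as np

# Hypothetical data; the last point is suspected of being a high-leverage outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 15.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 2.0])

slope_all, intercept_all = np.polyfit(x, y, 1)
slope_wo, intercept_wo = np.polyfit(x[:-1], y[:-1], 1)

# Report both models; a large change in slope indicates the point is influential.
print(f"With the point:    y = {intercept_all:.2f} + {slope_all:.2f} x")
print(f"Without the point: y = {intercept_wo:.2f} + {slope_wo:.2f} x")
```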

Unusual and Extraordinary Observations
Example: Hard Drive Prices
Prices for external hard drives are linearly associated with Capacity (in GB). A least-squares regression line was fit without a 200 GB drive that sold for $299.00, and another was fit to the original data including it.
How are the two equations different? The intercepts are different, but the slopes are similar.
Does the new point have a large residual? Explain. Yes: the hard drive's price doesn't fit the pattern; it pulled the line up but didn't change the slope very much.

Working with Summary Values
Scatterplots of summarized (averaged) data tend to show less variability than the un-summarized data.
Example: wind speeds at two locations, collected at 6 AM, noon, 6 PM, and midnight.
Raw data: R² = 0.736
Daily-averaged data: R² = 0.844
Monthly-averaged data: R² = 0.942
Note: summarized data do not violate any of the conditions for linear regression per se, but you must take into account that the association you are describing is among summarized data.

Working with Summary Values
WARNING: be suspicious of conclusions based on regressions of summary data. Regressions based on summary data may look better than they really are! In particular, the strength of the correlation will be misleading.
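As an illustrative aside (not from the slides), the following Python sketch simulates made-up wind-speed readings at two sites and shows how averaging within each day inflates R²:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical wind-speed readings at two sites, 4 readings per day for a year.
# Each day has its own typical wind level; individual readings add extra noise.
n_days, per_day = 365, 4
daily_level = rng.normal(10, 3, size=n_days)                        # day-to-day variation
a = np.repeat(daily_level, per_day) + rng.normal(0, 1, n_days * per_day)
b = 2 + 0.8 * a + rng.normal(0, 2.5, n_days * per_day)              # site B tracks site A + noise

def r_squared(x, y):
    return np.corrcoef(x, y)[0, 1] ** 2

# Averaging within each day smooths out much of the reading-to-reading noise,
# so the summarized scatterplot looks stronger than the raw one.
a_daily = a.reshape(n_days, per_day).mean(axis=1)
b_daily = b.reshape(n_days, per_day).mean(axis=1)

print(f"Raw readings:   R² = {r_squared(a, b):.3f}")
print(f"Daily averages: R² = {r_squared(a_daily, b_daily):.3f}")
```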

Autocorrelation
Time-series data are sometimes autocorrelated, meaning points near each other in time are related.
First-order autocorrelation: adjacent measurements are related.
Second-order autocorrelation: measurements two time periods apart are related, and so on.
Autocorrelation violates the independence condition, so regression analysis of autocorrelated data can produce misleading results.
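One simple diagnostic (an aside, not from the slides) is the lag-1 correlation of the residuals after fitting the line. A Python sketch with simulated autocorrelated data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical monthly series: a linear trend plus AR(1) (autocorrelated) errors.
t = np.arange(120, dtype=float)
errors = np.zeros_like(t)
for i in range(1, t.size):
    errors[i] = 0.7 * errors[i - 1] + rng.normal(0, 1)
y = 5 + 0.3 * t + errors

# Fit the usual least-squares line, then inspect the residuals for autocorrelation.
slope, intercept = np.polyfit(t, y, 1)
residuals = y - (intercept + slope * t)

# Lag-1 autocorrelation of the residuals; values well away from 0 signal trouble.
lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
print(f"Lag-1 autocorrelation of residuals: {lag1:.2f}")
```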

Transforming (Re-expressing) Data
An aside on using technology: when you fit a regression with software, be sure to look at the residuals and the residual plots, which are often the third item to pop up in the output.

Transforming (Re-expressing) Data
Linearity: some data show departures from linearity.
Example: auto weight vs. fuel efficiency. The linearity condition is not satisfied.

Transforming (Re-expressing) Data
Linearity: in cases involving upward bends of negatively correlated data, try analyzing -1/y (the negative reciprocal of y) vs. x instead. The linearity condition now appears satisfied.

Transforming (Re-expressing) Data
The auto weight vs. fuel economy example illustrates the principle of transforming data. There is nothing sacred about the way x-values or y-values are measured. From the standpoint of measurement, all of the following may be equally reasonable:
x vs. y
x vs. -1/y
x² vs. y
x vs. log(y)
One or more of these transformations may be useful for making data more linear, more normal, etc.
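As an illustration (not part of the deck), here is a Python sketch with simulated car-style data that tries each of these re-expressions and uses the correlation of the transformed variables as a rough screen for straightness; residual plots remain the better check:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical car data: weight (thousands of pounds) and fuel efficiency (mpg),
# with a curved (roughly reciprocal) relationship plus noise.
weight = rng.uniform(2.0, 5.0, size=60)
mpg = 100 / weight + rng.normal(0, 1.5, size=weight.size)

def r(x, y):
    return np.corrcoef(x, y)[0, 1]

# Compare how linear the relationship looks under several re-expressions.
print(f"x  vs. y      : r = {r(weight, mpg):.3f}")
print(f"x  vs. -1/y   : r = {r(weight, -1 / mpg):.3f}")
print(f"x² vs. y      : r = {r(weight ** 2, mpg):.3f}")
print(f"x  vs. log(y) : r = {r(weight, np.log(mpg)):.3f}")
```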

Transforming (Re-expressing) Data
Goals of Re-expression
Goal 1: make the distribution of a variable more symmetric.

Transforming (Re-expressing) Data
Goals of Re-expression
Goal 2: make the spread of several groups more alike. We'll see methods later in the book that can be applied only to groups with a common standard deviation.
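A quick illustration of Goal 1 (an aside with made-up, right-skewed data, not from the slides): a log re-expression makes the distribution much more symmetric, which shows up as a skewness near zero.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical right-skewed variable (e.g., incomes); the log re-expression
# makes its distribution far more symmetric.
incomes = rng.lognormal(mean=10, sigma=0.8, size=1000)

def skewness(v):
    z = (v - v.mean()) / v.std()
    return (z ** 3).mean()

print(f"Skewness of raw values: {skewness(incomes):.2f}")
print(f"Skewness of log values: {skewness(np.log(incomes)):.2f}")
```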

Looking back
Make sure the relationship is straight enough to fit a regression model.
Beware of extrapolating.
Treat unusual points honestly; you must not eliminate points simply to "get a good fit."
Watch out when dealing with data that are summaries.
Re-express your data when necessary.
Notes on the upcoming inference material: the assumptions are quantitative data from a population that must be closer to normal the smaller the sample, and a random sample; the test statistic takes a particular form; for a two-tailed test, either double the tail probability to get the p-value or divide the given alpha between the two tails of the rejection region.